Posts filed under General (716)

July 29, 2015

NRL Predictions for Round 21

Team Ratings for Round 21

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Roosters               10.93                    9.09        1.80
Broncos                 9.29                    4.03        5.30
Cowboys                 8.18                    9.52       -1.30
Rabbitohs               6.42                   13.06       -6.60
Storm                   5.12                    4.36        0.80
Bulldogs                1.11                    0.21        0.90
Sea Eagles              0.49                    2.68       -2.20
Warriors               -0.22                    3.07       -3.30
Raiders                -1.16                   -7.09        5.90
Dragons                -1.80                   -1.74       -0.10
Sharks                 -1.93                  -10.76        8.80
Panthers               -3.77                    3.69       -7.50
Eels                   -5.78                   -7.19        1.40
Knights                -6.37                   -0.28       -6.10
Wests Tigers           -8.78                  -13.13        4.30
Titans                -10.39                   -8.20       -2.20

 

Performance So Far

So far there have been 144 matches played, 84 of which were correctly predicted, a success rate of 58.3%.

Here are the predictions for last week’s games.

   Game                       Date    Score    Prediction  Correct
1  Broncos vs. Titans         Jul 24  34 – 0        20.80  TRUE
2  Wests Tigers vs. Roosters  Jul 24  8 – 33       -15.30  TRUE
3  Rabbitohs vs. Knights      Jul 25  52 – 6        11.10  TRUE
4  Storm vs. Dragons          Jul 25  22 – 4         8.60  TRUE
5  Warriors vs. Sea Eagles    Jul 25  12 – 32        7.00  FALSE
6  Bulldogs vs. Sharks        Jul 26  16 – 18        7.40  FALSE
7  Panthers vs. Raiders       Jul 26  24 – 34        2.10  FALSE
8  Cowboys vs. Eels           Jul 27  46 – 4        13.00  TRUE
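
As a rough illustration of how the Correct column and the running success rate can be checked, here is a minimal Python sketch. It assumes the first-named team is the home team, so that a positive predicted margin means a home win, and it treats a draw or a predicted margin of exactly zero as incorrect; that last rule is an assumption, not something stated in the post.

```python
# Last week's games: (home, away, home score, away score, predicted margin).
# Assumes the first-named team in each row of the table is the home team.
results = [
    ("Broncos", "Titans", 34, 0, 20.80),
    ("Wests Tigers", "Roosters", 8, 33, -15.30),
    ("Rabbitohs", "Knights", 52, 6, 11.10),
    ("Storm", "Dragons", 22, 4, 8.60),
    ("Warriors", "Sea Eagles", 12, 32, 7.00),
    ("Bulldogs", "Sharks", 16, 18, 7.40),
    ("Panthers", "Raiders", 24, 34, 2.10),
    ("Cowboys", "Eels", 46, 4, 13.00),
]


def prediction_correct(home_score, away_score, predicted_margin):
    """True when the predicted and actual margins have the same sign.

    A drawn game or a predicted margin of exactly zero counts as
    incorrect here (an assumption made for this sketch).
    """
    actual_margin = home_score - away_score
    return actual_margin * predicted_margin > 0


correct = [prediction_correct(h, a, m) for _, _, h, a, m in results]
n_correct = sum(correct)
print(f"{n_correct} of {len(results)} correct last week")

# Season totals quoted in the post: 84 correct out of 144 matches.
season_rate = 100 * 84 / 144
print(f"Season success rate: {season_rate:.1f}%")
```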

 

Predictions for Round 21

Here are the predictions for Round 21. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                       Date    Winner     Prediction
1  Roosters vs. Bulldogs      Jul 31  Roosters        12.80
2  Wests Tigers vs. Storm     Jul 31  Storm          -10.90
3  Warriors vs. Sharks        Aug 01  Warriors         5.70
4  Cowboys vs. Raiders        Aug 01  Cowboys         12.30
5  Dragons vs. Knights        Aug 02  Dragons          7.60
6  Rabbitohs vs. Panthers     Aug 02  Rabbitohs       13.20
7  Titans vs. Eels            Aug 03  Eels            -1.60

 

Hadley Wickham

Dan Kopf from Priceonomics has written a nice article about one of Auckland’s famous graduates, Hadley Wickham. The article can be found on the Priceonomics site.

July 27, 2015

Cheat sheet on polling margin of error

The “margin of error” in a poll is the number you add and subtract to get a 95% confidence interval for the underlying proportion (under the simplest possible mathematical model for polling). Pollsters typically quote the “maximum margin of error”, which is the margin of error when the reported value is 50%. When the reported value is 0.7%, reporting the maximum margin of error (3.1%) is not helpful. The Conservative Party is unpopular, but it’s not possible for them to have negative support, and not likely that they have nearly 4%.

Here is a cheat sheet, an expanded version of one I posted last year. The first column is the reported percentage and the remaining columns (l and u) are the lower and upper ends of the 95% confidence interval, also in percent, for a sample of size 1000 (Here’s the code). The Conservative Party interval is (0.3%, 1.4%), not (-2.4%, 3.8%).

       l    u
0.1  0.0  0.6
0.2  0.0  0.7
0.3  0.1  0.9
0.4  0.1  1.0
0.5  0.2  1.2
0.6  0.2  1.3
0.7  0.3  1.4
0.8  0.3  1.6
0.9  0.4  1.7
1.0  0.5  1.8
1.5  0.8  2.5
2.0  1.2  3.1
2.5  1.6  3.7
3.0  2.0  4.3
3.5  2.4  4.8
4.0  2.9  5.4
4.5  3.3  6.0
5.0  3.7  6.5
10   8.2 12.0
15  12.8 17.4
20  17.6 22.6
25  22.3 27.8
30  27.2 32.9
35  32.0 38.0
50  46.9 53.1

As you can see, the margin downwards is smaller than the margin upwards for small numbers (because you can’t have fewer than no supporters). By the time you get to 30% or so, the interval is pretty close to what you’d get with the maximum margin of error, but below 10% the maximum margin of error is seriously misleading.
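
For readers who want to compute intervals like these themselves, here is a minimal Python sketch. The code linked from the post is not reproduced here, so the method is an assumption: this uses the standard "exact" (Clopper-Pearson) binomial interval, found by bisection with only the standard library. The function names are made up for this illustration.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        # Work on the log scale so the binomial coefficient cannot overflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def bisect_root(f, lo=1e-12, hi=1 - 1e-12, iterations=100):
    """Find p with f(p) = 0, assuming f is negative at lo and positive at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(x, n, level=0.95):
    """Clopper-Pearson interval for a proportion, given x supporters out of n."""
    alpha = 1 - level
    # Lower end: the p at which seeing x or more supporters has probability alpha/2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper end: the p at which seeing x or fewer supporters has probability alpha/2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# 0.7% support in a poll of 1000 is 7 supporters.
low, high = exact_interval(7, 1000)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Other interval methods (Wilson, for example) give slightly different ends, so small disagreements with the table in the last decimal place would not be surprising.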

You can get a reasonable approximation to these numbers by taking the number (not percent) of supporters (eg, 0.7% is 7 out of 1000), taking the square root, adding and subtracting 1, squaring again, and then converting back into percent (ie, dividing by 10 for a poll of 1000):

    approx l approx u
0.1     0.00     0.40
0.2     0.02     0.58
0.3     0.05     0.75
0.4     0.10     0.90
0.5     0.15     1.05
0.6     0.21     1.19
0.7     0.27     1.33
0.8     0.33     1.47
0.9     0.40     1.60
1       0.47     1.73
1.5     0.83     2.37
2       1.21     2.99
2.5     1.60     3.60
3       2.00     4.20
3.5     2.42     4.78
4       2.84     5.36
4.5     3.26     5.94
5       3.69     6.51
10      8.10    12.10
15     12.65    17.55
20     17.27    22.93
25     21.94    28.26
30     26.64    33.56
35     31.36    38.84
50     45.63    54.57

which is pretty easy on a calculator, or with an Excel formula. For example, for 1000-person polls, if you put the reported percentage in the A1 cell, use =(sqrt(A1*10)-1)^2/10 for the lower end and =(sqrt(A1*10)+1)^2/10 for the upper end.
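
The same square-root rule as a short Python sketch, for polls of any size. The function name is made up for this illustration, and clamping the lower end at zero when there is fewer than one supporter is an added assumption, not part of the rule as described.

```python
import math


def sqrt_rule_interval(percent, n=1000):
    """Approximate 95% interval (in percent) using the square-root rule."""
    count = percent / 100 * n          # number of supporters in the sample
    root = math.sqrt(count)
    # Subtract and add 1 on the square-root scale, then square again.
    # Clamping at zero for counts below 1 is an assumption of this sketch.
    lower_count = max(root - 1, 0.0) ** 2
    upper_count = (root + 1) ** 2
    # Convert counts back to percentages.
    return 100 * lower_count / n, 100 * upper_count / n


for reported in (0.7, 5, 50):
    low, high = sqrt_rule_interval(reported)
    print(f"{reported}%: {low:.2f}% to {high:.2f}%")
```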

Briefly

  • Profile of Auckland Stats alumnus Hadley Wickham at Priceonomics
  • The kiwi (Apteryx, not Actinidia) genome was recently sequenced by a non-NZ research group. There’s a push for NZ-led sequencing of nationally-significant genomes: a taonga genomes project
  • Linguist Jack Grieve (@JWGrieve) has been tweeting maps of various swearwords on (US) Twitter. These are relative to total number of tweets, so they don’t have the usual problem
  • From Jonathan Marshall, the age distribution of NZ electorates, and their political hue: there’s a clear trend, and Ilam seems a bit of an outlier.

 

 

 

July 22, 2015

NRL Predictions for Round 20

Team Ratings for Round 20

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Roosters               10.23                    9.09        1.10
Broncos                 8.36                    4.03        4.30
Cowboys                 6.22                    9.52       -3.30
Storm                   4.44                    4.36        0.10
Rabbitohs               4.09                   13.06       -9.00
Bulldogs                1.78                    0.21        1.60
Warriors                1.61                    3.07       -1.50
Dragons                -1.12                   -1.74        0.60
Sea Eagles             -1.34                    2.68       -4.00
Raiders                -2.02                   -7.09        5.10
Sharks                 -2.60                  -10.76        8.20
Panthers               -2.91                    3.69       -6.60
Eels                   -3.82                   -7.19        3.40
Knights                -4.03                   -0.28       -3.80
Wests Tigers           -8.09                  -13.13        5.00
Titans                 -9.46                   -8.20       -1.30

 

Performance So Far

So far there have been 136 matches played, 79 of which were correctly predicted, a success rate of 58.1%.

Here are the predictions for last week’s games.

   Game                       Date    Score    Prediction  Correct
1  Storm vs. Panthers         Jul 17  52 – 10        5.50  TRUE
2  Eels vs. Bulldogs          Jul 17  4 – 28         0.80  FALSE
3  Dragons vs. Rabbitohs      Jul 18  8 – 24         0.00  FALSE
4  Knights vs. Titans         Jul 18  30 – 2         5.30  TRUE
5  Raiders vs. Sharks         Jul 18  20 – 21        4.40  FALSE
6  Roosters vs. Warriors      Jul 19  24 – 0        10.80  TRUE
7  Broncos vs. Wests Tigers   Jul 19  42 – 16       18.40  TRUE
8  Sea Eagles vs. Cowboys     Jul 20  12 – 30       -2.40  TRUE

 

Predictions for Round 20

Here are the predictions for Round 20. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                       Date    Winner     Prediction
1  Broncos vs. Titans         Jul 24  Broncos         20.80
2  Wests Tigers vs. Roosters  Jul 24  Roosters       -15.30
3  Rabbitohs vs. Knights      Jul 25  Rabbitohs       11.10
4  Storm vs. Dragons          Jul 25  Storm            8.60
5  Warriors vs. Sea Eagles    Jul 25  Warriors         7.00
6  Bulldogs vs. Sharks        Jul 26  Bulldogs         7.40
7  Panthers vs. Raiders       Jul 26  Panthers         2.10
8  Cowboys vs. Eels           Jul 27  Cowboys         13.00

 

July 19, 2015

Briefly

  • In the interests of balance, a post at Public Address by Rob Salmond, who did the analysis in the ‘Chinese names’ real-estate leak.  And a robust twitter discussion with him, Keith Ng, and Tze Ming Mok.
  • Stats New Zealand has a new standard question about gender identity (as distinguished from sex), acknowledging that it isn’t as simple as some people would like it to be.
  • The most important aspects of health seem to vary by age: “older raters gave significantly more weight to functional limitations and social functioning and less to morbidities and pain experience, compared to younger raters.” (via @hildabast)
  • Priceonomics has a post on the most common and most distinctive ingredients in recipes from around the world. The list illustrates the problem with the ‘distinctiveness’ metric (as Kieran Healy pointed out: whiskey is really not the distinctive signature of Irish food). It also shows up other problems: for example, “African” and “Asian” are both listed as cuisines. Fundamentally, the limitation is in the recipe lists and the approximations made: galangal shows up as a reasonable candidate for most-distinctive Thai ingredient partly because there aren’t any substitutes; cayenne is the most widely used ingredient in the Mexican recipes because it’s being substituted for other chillis.

July 15, 2015

NRL Predictions for Round 19

Team Ratings for Round 19

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Roosters                8.23                    9.09       -0.90
Broncos                 7.81                    4.03        3.80
Cowboys                 5.13                    9.52       -4.40
Rabbitohs               2.97                   13.06      -10.10
Warriors                2.54                    3.07       -0.50
Storm                   2.00                    4.36       -2.40
Panthers                0.60                    3.69       -3.10
Bulldogs                0.10                    0.21       -0.10
Dragons                -0.01                   -1.74        1.70
Sea Eagles             -0.26                    2.68       -2.90
Raiders                -1.62                   -7.09        5.50
Eels                   -2.14                   -7.19        5.00
Sharks                 -3.00                  -10.76        7.80
Knights                -5.58                   -0.28       -5.30
Wests Tigers           -7.54                  -13.13        5.60
Titans                 -7.91                   -8.20        0.30

 

Performance So Far

So far there have been 128 matches played, 73 of which were correctly predicted, a success rate of 57%.

Here are the predictions for last week’s games.

   Game                       Date    Score    Prediction  Correct
1  Raiders vs. Knights        Jul 10  36 – 22        5.80  TRUE
2  Bulldogs vs. Broncos       Jul 11  8 – 16        -4.10  TRUE
3  Warriors vs. Storm         Jul 12  28 – 14        3.00  TRUE
4  Sharks vs. Dragons         Jul 12  28 – 8        -3.20  FALSE
5  Titans vs. Sea Eagles      Jul 13  6 – 38        -0.40  TRUE

 

Predictions for Round 19

Here are the predictions for Round 19. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                       Date    Winner     Prediction
1  Storm vs. Panthers         Jul 17  Storm            4.40
2  Eels vs. Bulldogs          Jul 17  Eels             0.80
3  Dragons vs. Rabbitohs      Jul 18  Dragons          0.00
4  Knights vs. Titans         Jul 18  Knights          5.30
5  Raiders vs. Sharks         Jul 18  Raiders          4.40
6  Roosters vs. Warriors      Jul 19  Roosters         9.70
7  Broncos vs. Wests Tigers   Jul 19  Broncos         18.40
8  Sea Eagles vs. Cowboys     Jul 20  Cowboys         -2.40

 

A modest proposal

Positive-looking results are more likely to be published in scientific journals, much more likely to get press releases, and hugely more likely to end up in the news. This trend is exaggerated if the size of the association is large.  The most likely way to get a large association is to do a very small study and be lucky enough (by chance or sloppiness) to overestimate the strength of association, so the news selects for small, early-stage, and poorly-done research.

One way to reduce this bias would be for media to quote the lower (less impressive) end of the uncertainty interval (confidence interval, credibility interval) rather than quoting the midpoint of the interval as scientists usually do. In small studies, the lower end of the interval will be close to no association, even if the midpoint of the interval is a strong association. In large, well-designed studies the change in practice would have little impact.
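
To make the contrast concrete, here is a small Python sketch with entirely made-up numbers: two hypothetical studies with the same estimated relative risk of 2.5, one with 20 people per group and one with 1000 per group. It uses the usual large-sample (Wald) interval on the log scale, which is one common choice rather than the only one.

```python
import math


def relative_risk_interval(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A vs group B with an approximate 95% interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(relative risk), large-sample approximation.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts, chosen only for illustration.
small_study = relative_risk_interval(10, 20, 4, 20)
large_study = relative_risk_interval(500, 1000, 200, 1000)

for label, (rr, lower, upper) in [("small", small_study), ("large", large_study)]:
    print(f"{label} study: estimate {rr:.2f}, interval {lower:.2f} to {upper:.2f}")
```

In the small study the lower end of the interval falls just below 1 (no association), while in the large study it stays close to the estimate, so quoting the lower end changes the headline for the first and barely changes it for the second.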

Isn’t that biased?

If you assume that in most cases the association being tested is smaller than the uncertainty in the experiment (ie, close to zero), and that positive results are more likely to make the news, then it’s less biased than using the middle of the interval.

Scientists wouldn’t be able to use tests that don’t produce confidence intervals.

How sad. Anyway, they would, they just wouldn’t be able to get their press releases into the papers.

Press releases often don’t report uncertainty estimates.

So those ones wouldn’t get in the papers. The silver linings are just piling up.

 

 

July 11, 2015

What’s in a name?

The Herald was, unsurprisingly, unable to resist the temptation of leaked data on house purchases in Auckland.  The basic points are:

  • Data on the names of buyers for one agency, representing 45% of the market, for three months
  • Based on the names, an estimate that nearly 40% of the buyers were of Chinese ethnicity
  • This is more than the proportion of people of Chinese ethnicity in Auckland
  • Oh Noes! Foreign speculators! (or Oh Noes! Foreign investors!)

So, how much of this is supported by the various data?

First, the surnames.  This should be accurate for overall proportions of Chinese vs non-Chinese ethnicity if it was done carefully. The vast majority of people called, say, “Smith” will not be Chinese; the vast majority of people called, say, “Xu” will be Chinese; people called “Lee” will split in some fairly predictable proportion.  The same is probably true for, say, South Asian names, but Māori vs non-Māori would be less reliable.

So, we have fairly good evidence that people of Chinese ancestry are over-represented as buyers from this particular agency, compared to the Auckland population.

Second: the representativeness of the agency. It would not be at all surprising if migrants, especially those whose first language isn’t English, used real estate agents more than people born in NZ. It also wouldn’t be surprising if they were more likely to use some agencies than others. However, the claim is that these data represent 45% of home sales. If that’s true, people with Chinese names are over-represented compared to the Auckland population no matter how unrepresentative this agency is. Even if every Chinese buyer used this agency, the proportion among all buyers would still be about 18% (nearly 40% of 45%).

So, there is fairly good evidence that people of Chinese ethnicity are buying houses in Auckland at a higher rate than their proportion of the population.

The Labour claim extends this by saying that many of the buyers must be foreign. The data say nothing one way or the other about this, and it’s not obvious that it’s true. More precisely, since the existence of foreign investors is not really in doubt, it’s not obvious how far it’s true. The simple numbers don’t imply much, because relatively few people are housing buyers: for example, house buyers named “Wang” in the data set are less than 4% of Auckland residents named “Wang.” There are at least three other competing explanations, and probably more.

First, recent migrants are more likely to buy houses. I bought a house three years ago. I hadn’t previously bought one in Auckland. I bought it because I had moved to Auckland and I wanted somewhere to live. Consistent with this explanation, people with Korean and Indian names, while not over-represented to the same extent, are also more likely to be buying than selling houses, by about the same ratio as Chinese.

Second, it could be that (some subset of) Chinese New Zealanders prefer real estate as an investment to, say, stocks (to an even greater extent than Aucklanders in general).  Third, it could easily be that (some subset of) Chinese New Zealanders have a higher savings rate than other New Zealanders, and so have more money to invest in houses.

Personally, I’d guess that all these explanations are true: that Chinese New Zealanders (on average) buy both homes and investment properties more than other New Zealanders, and that there are foreign property investors of Chinese ethnicity. But that’s a guess: these data don’t tell us — as the Herald explicitly points out.

One of the repeated points I make on StatsChat is that you need to distinguish between what you measured and what you wanted to measure. Using ‘Chinese’ as a surrogate for ‘foreign’ will capture many New Zealanders and miss out on many foreigners.

The misclassifications aren’t just unavoidable bad luck, either. If you have a measure of ‘foreign real estate ownership’ that includes my next-door neighbours and excludes James Cameron, you’re doing it wrong, and in a way that has a long and reprehensible political history.

But on top of that, if there is substantial foreign investment and if it is driving up prices, that’s only because of the artificial restrictions on the supply of Auckland houses. If Auckland could get its consent and zoning right, so that more money meant more homes, foreign investment wouldn’t be a problem for people trying to find somewhere to live. That’s a real problem, and it’s one that lies within the power of governments to solve.

July 9, 2015

Followup: vitamin D and diabetes

Quite some time ago, I wrote about a story on vitamin D and diabetes:

A: Someone needs to do a randomized trial, where half the participants get vitamin D and half get a dummy pill. If the effect is real, fewer people getting vitamin D will end up with diabetes.

Q: That sounds like a good idea. Is someone doing a trial?

A: Yes, Professor Peter Ebeling, of the University of Melbourne.

Q: Is there some useful website where I can find more information about the trial?

A: Indeed.

Q: Will it work?

A: No.

Q: Are you sure?

A: No, that’s why we need the trial.[…]

While the clinical trial registry hasn’t been updated, there are now published results from this trial.  The researchers didn’t get to their planned 160 participants; they gave up at 95 because of slow recruitment.  Even so, if the results had been as dramatic as in the observational studies, they would have been able to see the benefit.

They didn’t:

In this 6-month RCT of vitamin D and calcium supplementation in which over 90% of the participants reached the target serum 25(OH)D concentration of 75 nmol/L, there was no effect of supplementation on any measure of insulin sensitivity, insulin secretion or β-cell function in multi-ethnic vitamin D-deficient individuals at risk of type 2 diabetes (with prediabetes or an AUSDRISK score ≥15).

These results are no fun, so they have not received the same media attention as the observational correlations that prompted the trial, even though they are more reliable and more relevant to individual health choices.