April 14, 2015

Cumulative totals go up

From ThinkProgress (graph from Wikipedia): “U.S. plug-in electric vehicle cumulative sales have soared in the past few years, thanks in part to rapidly falling battery prices” and “A major reason for the rapid jump in EV sales is the rapid drop in the cost of their key component – batteries.”


From a cumulative graph it’s hard to tell whether the cumulative sales have soared due to rapidly falling battery prices or just due to the fact that cumulative sales have to increase, but the past few years look pretty much like straight lines to me.

Here are the noncumulative monthly sales, with the same colour-coding: there hasn’t been a big increase in the rate of sales during 2013 or 2014, so it’s not clear there’s much for falling battery prices to explain. Beyond the graph, for the first three months of 2015 there have been slightly fewer sales than in the first three months of 2014.


Cumulative sales of a new technology with sizeable network effects are important: it matters how many plug-in vehicles are out there. A cumulative graph is still a bad way to see patterns.
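
As a rough sketch of why a steady rate of sales shows up as a straight line on a cumulative graph, here is a toy calculation. The monthly figures are made up for illustration (they are not the real EV numbers), and the variable names are my own.

```python
from itertools import accumulate

# Hypothetical monthly sales (NOT the real EV figures): a flat rate of
# 10,000 vehicles a month for two years.
monthly = [10_000] * 24

# The cumulative total is the running sum of the monthly figures,
# so it has to go up as long as anything at all is sold.
cumulative = list(accumulate(monthly))

# Differencing the cumulative series recovers the monthly rate.
recovered = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

print(cumulative[0], cumulative[-1])  # the total keeps rising...
print(set(recovered))                 # ...while the rate never changes
```

The cumulative series climbs every month even though nothing is changing; the differenced series is the one that would show an acceleration if there were one.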


Northland school lunch numbers

Last week’s Stat of the Week nomination for the Northern Advocate didn’t, we thought, point out anything particularly egregious. However, it did provoke me to read the story; I’d previously only seen the headline 22% statistic on Twitter. The story starts:

Northland is in “crisis” as 22 per cent of students from schools surveyed turn up without any or very little lunch, according to the Te Tai Tokerau Principals Association.

‘Surveyed’ is presumably a gesture in the direction of the non-response problem: it’s based on information from about 1/3 of schools, which is made clear in the story. And it’s not as if the number actually matters: the Te Tai Tokerau Principals Association basically says it would still be a crisis if the truth was three times lower (ie, if there were no cases in schools that didn’t respond), and the Government isn’t interested in the survey.

More evidence that the number doesn’t matter is that no-one seems to have done simple arithmetic. Later in the story we read:

The schools surveyed had a total of 7352 students. Of those, 1092 students needed extra food when they came to school, he said.

If you divide 1092 by 7352 you don’t get 22%. You get 15%.  There isn’t enough detail to be sure what happened, but a plausible explanation is that 22% is the simple average of the proportions in the schools that responded, ignoring the varying numbers of students at each school.
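
Here is a small sketch of the arithmetic. The first part uses the two totals quoted in the story. The second part uses three made-up schools (the story doesn’t give per-school figures) just to show how a simple average of school proportions can sit well above the pooled proportion when small schools have high rates. It is one plausible mechanism, not a reconstruction of what the survey actually did.

```python
# Totals reported in the story.
students_needing_food = 1092
students_surveyed = 7352

pooled = students_needing_food / students_surveyed
print(f"{pooled:.1%}")  # 14.9%

# Hypothetical schools as (students needing food, roll): invented numbers,
# chosen so that the smallest school has the highest rate.
schools = [(30, 60), (40, 200), (50, 740)]

# Simple average of the per-school proportions: every school counts equally.
simple_average = sum(need / roll for need, roll in schools) / len(schools)

# Pooled proportion: every student counts equally.
pooled_hypothetical = sum(need for need, _ in schools) / sum(roll for _, roll in schools)

print(f"simple average {simple_average:.1%}, pooled {pooled_hypothetical:.1%}")
```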

The other interesting aspect of this survey (again, if anyone cared) is that we know a lot about schools and so it’s possible to do a lot to reduce non-response bias.  For a start, we know the decile for every school, which you’d expect to be related to food provision and potentially to response. We know location (urban/rural, which district). We know which are State Integrated vs State schools, and which are Kaupapa Māori. We know the number of students, statistics about ethnicity. Lots of stuff.

As a simple illustration, here’s how you might use decile and district information.  In the Far North district there are (using Wikipedia because it’s easy) 72 schools.  That’s 22 in decile one, 23 in decile two, 16 in decile three, and 11 in deciles four and higher.  If you get responses from 11 of the decile-one schools and only 4 of the decile-three schools, you need to give each student in those decile-one schools a weight of 22/11=2 and each student in the decile-three schools a weight of 16/4=4. To the extent that decile predicts shortage of food you will increase the precision of your estimate, and to the extent that decile also predicts responding to the survey you will reduce the bias.
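
The weights in that illustration can be sketched in a few lines. The school counts and response counts are the ones from the paragraph above; the per-stratum student totals are invented purely to show the weights being used, and the dictionary names are my own.

```python
# Schools in the Far North district, by decile (from the illustration above).
population = {"decile 1": 22, "decile 3": 16}
# Schools assumed to have responded (also from the illustration).
responded = {"decile 1": 11, "decile 3": 4}

# Each responding school stands in for population/responded schools.
weights = {d: population[d] / responded[d] for d in responded}
print(weights)  # {'decile 1': 2.0, 'decile 3': 4.0}

# Hypothetical totals over responding schools:
# (students needing food, students enrolled). Invented numbers.
sample = {"decile 1": (300, 1500), "decile 3": (60, 600)}

weighted_need = sum(weights[d] * sample[d][0] for d in sample)
weighted_roll = sum(weights[d] * sample[d][1] for d in sample)
weighted_estimate = weighted_need / weighted_roll

unweighted_estimate = (
    sum(need for need, _ in sample.values())
    / sum(roll for _, roll in sample.values())
)

print(f"weighted {weighted_estimate:.1%}, unweighted {unweighted_estimate:.1%}")
```

With these made-up numbers the under-represented decile-three schools get more say in the weighted estimate, which is the point of the reweighting.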

This basic approach is common in opinion polls. It’s the reason, for example, that the Green Party’s younger, mobile-phone-using support isn’t massively underestimated in election polls. In opinion polls, the main limit on this reweighting technique is the limited amount of individual information for the whole population. In surveys of schools there’s a huge amount of information available, and the limit is sample size.

April 13, 2015

Puppy prostate perception

The Herald tells us “Dogs have a 98 per cent reliability rate in sniffing out prostate cancer, according to newly-published research.” Usually, what’s misleading about this sort of conclusion is the base-rate problem: if a disease is rare, 98% accuracy isn’t good enough. Prostate cancer is different.

Blood tests for prostate cancer are controversial because prostate tumours are common in older men, but only some tumours progress to cause actual illness.  By “controversial” I don’t mean the journalistic euphemism for “there are a few extremists who aren’t convinced”, but actually controversial.  Groups of genuine experts, trying to do the best for patients, can come to very different conclusions on when testing is beneficial.

The real challenge in prostate cancer screening is to distinguish the tumours you don’t want to detect from the ones you really, really do want to detect. The real question for the canine sniffer test is how well it does on this classification.

Since the story doesn’t give the researchers’ names, finding the actual research takes more effort than usual. When you track the paper down it turns out that the dogs managed almost perfect discrimination between men with prostate tumours and everyone else. They detected tumours that were advanced and being treated, low-risk tumours that had been picked up by blood tests, and even minor tumours found incidentally in treatment for prostate enlargement. Detection didn’t depend on tumour size, on stage of disease, on PSA levels, or basically anything. As the researchers observed, “The independence of tumor volume and aggressiveness, and the dog detection rate is surprising.”

Surprising, but also disappointing. Assuming the detection rate is real — and they do seem to have taken precautions against the obvious biases — the performance of the dogs is extremely impressive. However, the 98% accuracy in distinguishing people with and without prostate tumours unavoidably translates into a much lower accuracy in distinguishing tumours you want to detect from those you don’t want to detect.
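
To see how that translation works, here is a back-of-envelope sketch. Every number in it is hypothetical: the paper doesn’t give the share of tumours you would want to detect, and I have simply assumed 98% sensitivity and 98% specificity for “any tumour”, with detection unrelated to aggressiveness as the researchers reported.

```python
# All inputs are hypothetical, chosen only to illustrate the argument.
p_tumour = 0.5                    # share of men tested who have any tumour
p_aggressive_given_tumour = 0.2   # share of tumours you really want to find
sensitivity = 0.98                # P(dog flags | any tumour), assumed
specificity = 0.98                # P(dog doesn't flag | no tumour), assumed

p_aggressive = p_tumour * p_aggressive_given_tumour
p_indolent = p_tumour - p_aggressive
p_none = 1 - p_tumour

# The dog flags tumours regardless of aggressiveness.
flagged_aggressive = p_aggressive * sensitivity
flagged_indolent = p_indolent * sensitivity
flagged_none = p_none * (1 - specificity)

# Accuracy for "tumour vs no tumour".
accuracy_any_tumour = flagged_aggressive + flagged_indolent + p_none * specificity

# Accuracy if a flag is read as "tumour you want to detect":
# correct when an aggressive tumour is flagged, or anything else is not flagged.
accuracy_wanted = (
    flagged_aggressive
    + (p_indolent - flagged_indolent)
    + (p_none - flagged_none)
)

# Share of flagged men who have a tumour you wanted to find.
share_wanted_among_flagged = flagged_aggressive / (
    flagged_aggressive + flagged_indolent + flagged_none
)

print(f"{accuracy_any_tumour:.3f} {accuracy_wanted:.3f} {share_wanted_among_flagged:.3f}")
```

Different assumed inputs give different numbers, but as long as many tumours are ones you don’t want to detect, the second accuracy has to be well below the first.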

Stat of the Week Competition: April 11 – 17 2015

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday April 17 2015.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of April 11 – 17 2015 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


April 12, 2015

Reductionism and the Petone-Grenada link

If you have to make a decision with several options, each with different types of positive and negative effects, it’s going to be hard. Techniques for breaking down complex decisions into sets of simpler questions are very valuable, but it’s important that the way you break down the problem and recombine the answers fits with how you answer the simpler questions.

I’ve been pointed to what looks like an unfortunate example from the NZTA, in assessing options for the Petone–Grenada link road to be constructed near Wellington. The road comes in two sections: from Petone to the eastern section of Lincolnshire Farm, and from there to Grenada. According to the scoping report (PDF), these can be decided independently of each other, so there’s an ideal opportunity to simplify the decision making.  NZTA describes four options P1 to P4 for the first section, and four options A to D for the second section.

I would have expected them to just make independent recommendations for the two sections, but what they actually did was more complicated. First, they looked at the P options and decided based on four criteria that P4 was best.  They then looked at A+P4, B+P4, C+P4, and D+P4 for the same four criteria, and said in a footnote (p172) “Upon combining one of Option P1, P2, P3 or P4 with one Option A, B, C or D the effect more towards the negative takes precedence.”

This can only make sense if the harms or benefits weren’t independent.  Sometimes that’s possible. In particular, one of the criteria was “resilience”, and you might argue that it doesn’t matter how robust the second part of the road is when the first part is under several metres of rock and mud, or filled with bumper-to-bumper traffic jams. It could make sense to take the worst value of the two sections when assessing resilience: but people who know more about Wellington-area transport than I do still seem dubious.

The same argument certainly doesn’t apply for the other criteria: archaeological,  ecological,  landscape/visual impact, and transport benefit/cost. If one section of the road is an environmental nightmare, that doesn’t make the environmental impact of the other section unimportant. If one section of the road is unavoidably ugly, that doesn’t excuse making the other section ugly. If one section destroys an important heritage site, it doesn’t mean the other section doesn’t have to care about preservation of the past. If one section is ridiculously expensive it doesn’t mean the costs are unimportant for the other section.

The impact of decomposing and recombining the evaluation as they did is that any criterion where P4 was bad becomes much less important in choosing among options A to D. P4 was very bad on the landscape/visual criterion, and moderately bad on ecology.
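
The masking can be sketched with a toy version of the “more negative takes precedence” rule. The scores below are invented (they are not the values in the scoping report), only three criteria are shown, and adding the criteria up with equal weight is my own simplification.

```python
CRITERIA = ["ecology", "landscape/visual", "transport benefit/cost"]

# Hypothetical scores; negative means harm. NOT the report's actual values.
first_section = {"ecology": -2, "landscape/visual": -3, "transport benefit/cost": 1}
second_section_options = {
    "A": {"ecology": 1, "landscape/visual": 0, "transport benefit/cost": -1},
    "D": {"ecology": -2, "landscape/visual": -3, "transport benefit/cost": 1},
}

def combine_worst(first, second):
    """Combine two sections by taking the more negative score per criterion."""
    return {c: min(first[c], second[c]) for c in CRITERIA}

alone = {}
combined = {}
for name, scores in second_section_options.items():
    # Equal-weight total for the option judged on its own.
    alone[name] = sum(scores[c] for c in CRITERIA)
    # Equal-weight total after combining with the first section.
    merged = combine_worst(first_section, scores)
    combined[name] = sum(merged[c] for c in CRITERIA)

print("alone:   ", alone)
print("combined:", combined)
```

In this toy example A beats D when judged on its own, but after combining, A’s advantages on ecology and landscape are overwritten by the first section’s bad scores and only its cost disadvantage survives.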

By now you should be expecting the punch line: evaluated independently, options A and B look good because they score well on ecology and landscape/visual criteria. Evaluated in combination with P4, they look terrible, because the ecology and landscape benefits are masked by the “more negative” combining rule. That’s a problem with the combining rule, not with the road. Here’s a colour-coded version of the information in Table 23-19, p182 (from T. Duran)


Not only is the combining rule obviously missing some information, it’s not even internally consistent. If the evaluation had been done in the opposite order they might well have chosen A first, and then looked at A+P1 to A+P4. Even if D was what they’d chosen first, P3+D would then look slightly better than P4+D.

It’s very tempting to look for ways of combining preferences that don’t rely on numbers, just on orderings, but in most cases they aren’t available, and attempts to do it leave you worse off than before.

This evaluation wasn’t set up to focus only on resilience — even assuming that the resilience assessment is valid, which I hear is also being questioned — it was set up to value the four criteria equally. It really looks as though a minor detail of the approach to simplifying the evaluation has had a large, accidental effect on the result.

April 10, 2015


  • A properly-conducted opinion poll in Cuba, done in secret. Impressive.
  • As the Herald reports, New Zealand moved from 1st to 5th on the index reported by Social Progress Imperative. The story also points out, helpfully, that a lot of this is changes in how things are measured.  It turns out this goes further:  a 2014 version of the index is available using the new measurements. When the same definitions are used for the two years, NZ stays at the same ranking (5th) and improves on the actual values (from 86.93 to 87.08).
  • JPMorgan is using workplace data to predict which employees are likely to ‘go rogue’. Matt Levine doesn’t really worry. The Bloomberg News story worries a bit, but only “Policing intentions can be a slippery slope. Do people get a scarlet letter for something they have yet to do?” They don’t seem to consider false positives: people who weren’t going to do anything wrong (or more wrong than is necessary if you work for an investment bank).
  • The NZ Association of Scientists is having a conference titled “Speaking Out: Going public on difficult issues”. There will probably be more stuff on line soon, but currently you can read an expanded version of Peter Gluckman’s talk, and listen to (NZAS President) Nicola Gaston on Radio NZ; the Twitter hashtag is 

Odds and probabilities

When quoting results of medical research there’s often confusion between odds and probabilities, but there are stories in the Herald and Stuff at the moment that illustrate the difference.

As you know (unless you’ve been on Mars with your eyes shut and your fingers in your ears), Jeremy Clarkson will no longer be presenting Top Gear, and the world is waiting with bated breath to hear about his successor.  Coral, a British firm of bookmakers, say that Sue Perkins is the current favourite.

The Herald quotes the Daily Mail, and so gives the odds as odds:

It has made her evens for the role, ahead of former X-factor presenter Dermot O’Leary who is 2-1 and British model Jodie Kidd who is third at 5-2.

Stuff translates these into NZ gambling terms, quoting the dividend, which is the reciprocal of the probability at which these would be regarded as fair bets:

Bookmaker Coral have Perkins as the equivalent of a $2 favourite after a flurry of bets, while British-Irish presenter Dermot O’Leary was at $3 and television personality and fashion model Jodie Kidd at $3.50.

Odds of 5-2 mean that betting £2 and winning gives you a profit of £5.  The NZ approach is to quote the total money you get back: a bet of $2 gets you $2 back plus $5 profit, for a total of $7, so a bet of $1 would get you $3.50.

The fair probability of winning for odds of 5-2 is 2/(5+2); the fair probability for a dividend of $3.50 is 1/3.50, the same number.
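
As a small sketch of the conversion, using exact fractions so the two routes can be compared directly (the function names are mine, not anything the bookmakers use):

```python
from fractions import Fraction

def dividend(profit, stake):
    """NZ-style dividend: total returned per $1 bet, for odds of profit-stake."""
    return Fraction(profit + stake, stake)

def fair_probability(profit, stake):
    """Probability of winning at which odds of profit-stake are a fair bet."""
    return Fraction(stake, profit + stake)

# Odds quoted in the Herald story: evens is 1-1.
quoted_odds = {
    "Sue Perkins": (1, 1),
    "Dermot O'Leary": (2, 1),
    "Jodie Kidd": (5, 2),
}

for name, (profit, stake) in quoted_odds.items():
    d = dividend(profit, stake)
    p = fair_probability(profit, stake)
    # The fair probability is the reciprocal of the dividend.
    print(f"{name}: dividend ${float(d):.2f}, fair probability {float(p):.3f}")
```

The three dividends come out as $2, $3 and $3.50, matching the figures Stuff quoted.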

Of course, if these were fair bets the bookies would go out of business: the actual implied probability for Jodie Kidd is lower than 1/3.5 and the actual implied probability for Sue Perkins is lower than 0.5.  On top of that, there is no guarantee the betting public is well calibrated on this issue.
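
One quick check that these can’t be fair bets, sketched under the assumption that only one person gets the job: add up the fair probabilities for just the three candidates quoted above.

```python
from fractions import Fraction

# Odds quoted in the Herald story, as (profit, stake); evens is 1-1.
quoted_odds = {
    "Sue Perkins": (1, 1),
    "Dermot O'Leary": (2, 1),
    "Jodie Kidd": (5, 2),
}

# Probability at which each bet would be fair: stake / (profit + stake).
fair = {name: Fraction(stake, profit + stake)
        for name, (profit, stake) in quoted_odds.items()}

total = sum(fair.values())
print(total, float(total))
```

The total is already above 1 before counting any other candidate, so at least some of the real probabilities have to be lower than the fair-bet values.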


April 9, 2015

Graph of the week


Number of learner license tests taken in New Zealand, according to One News.

We’ll follow up to see if the future prediction part of the graph turns out to be correct.

Height and heart attack: genetic determinism is still wrong

From the Herald (originally from the Independent)

Short people are at a greater risk of heart attack – and there’s little they can do about it because the link is genetic.

This one is partly the fault of the researchers and partly the fault of the journalists.  The press release says

“We have shown that the association between shorter height and higher risk of coronary heart disease is a primary relationship and is not due to confounding factors such as nutrition or poor socioeconomic conditions.”

That’s partly true, and new and interesting, but (a) it’s being oversold (“the” association?) and (b) even if it were completely true, it wouldn’t imply the “there’s little they can do about it” added by the journalists.

Taking the second point first: knowing that something has a genetic component tells you absolutely nothing about how easy or hard it is to change. At a biological level hair colour and eye colour have similar degrees of genetic influence, but one of them is very easy to change and the other is more difficult and inconvenient.

Also, it’s certainly not true that height is entirely genetically determined. There is a genetic component: tall people have tall children. There is also an environmental component: most people are taller than their grandparents.  Here’s a graph (source) showing how the heights of Dutch people changed over sixty years: the Dutch went from some of the shortest people in Europe to some of the tallest, and this was an environmental change, not a genetic change.


The research paper doesn’t even claim that among modern Westerners the association between height and heart attack risk is all genetic, though if you only have the press release you have to read carefully to avoid getting that impression. Even within the (fairly homogeneous) groups of people being studied, the genetic variants they used explain only about 10% of the variation in height.

What’s new in this research is that some of the relationship between height and heart attack risk is genetic. Until now, it was possible that all the association was explained by environmental factors in childhood or before birth that made people shorter and also, separately, increased their heart attack risk.

For the part of the relationship explained by genetic variation there are basically three possible sorts of explanation:

  • Being short has some direct biological effect on risk: for example, smaller people have smaller blood vessels, which might get blocked by smaller blood clots.
  • Being short subjects you to different environmental risks: for example, if shorter people had lower incomes (on average) they might have higher risk for various social and lifestyle reasons.
  • The genetic variants that make you shorter also have some separate effect on heart attack risk: for example, the same variant might affect growth in infancy and also affect diabetes risk in later life.

These are all interesting, and there’s a reasonable hope of being able to separate them out with more data and experiments.

The last sentence of the research paper is a good counterpoint to the media coverage

More generally, our findings underscore the complexity underlying the inherited component of CAD.



[Disclosure: I work with one of the cohorts that is part of one of the consortia that is part of the whole Cardiogram group and I know some of the researchers — but that would be true of anyone in the field]

April 8, 2015

NRL Predictions for Round 6

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team            Current Rating   Rating at Season Start   Difference
Rabbitohs                11.30                    13.06        -1.80
Roosters                  9.06                     9.09        -0.00
Cowboys                   6.51                     9.52        -3.00
Storm                     5.32                     4.36         1.00
Broncos                   4.51                     4.03         0.50
Panthers                  2.88                     3.69        -0.80
Bulldogs                  1.52                     0.21         1.30
Warriors                  1.47                     3.07        -1.60
Knights                   0.29                    -0.28         0.60
Dragons                  -1.66                    -1.74         0.10
Sea Eagles               -2.21                     2.68        -4.90
Eels                     -5.31                    -7.19         1.90
Raiders                  -6.34                    -7.09         0.70
Wests Tigers             -7.54                   -13.13         5.60
Sharks                   -8.87                   -10.76         1.90
Titans                   -9.60                    -8.20        -1.40


Performance So Far

So far there have been 40 matches played, 22 of which were correctly predicted, a success rate of 55%.

Here are the predictions for last week’s games.

No.   Game                       Date     Score     Prediction   Correct
1     Bulldogs vs. Rabbitohs     Apr 03   17 – 18        -7.80   TRUE
2     Titans vs. Broncos         Apr 03   16 – 26       -11.30   TRUE
3     Knights vs. Dragons        Apr 04   0 – 13          7.80   FALSE
4     Sea Eagles vs. Raiders     Apr 04   16 – 29        10.30   FALSE
5     Roosters vs. Sharks        Apr 05   12 – 20        25.40   FALSE
6     Eels vs. Wests Tigers      Apr 06   6 – 22          8.60   FALSE
7     Panthers vs. Cowboys       Apr 06   10 – 30         2.40   FALSE
8     Storm vs. Warriors         Apr 06   30 – 14         6.50   TRUE
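
The “Correct” column can be reproduced with a short sketch, assuming (as in the table) that the home team is listed first and that a prediction counts as correct when its sign matches the sign of the actual home-minus-away margin. Draws are not covered by the post, so this sketch simply counts them as incorrect.

```python
# (home, away, home score, away score, predicted home margin), from the table above.
results = [
    ("Bulldogs", "Rabbitohs", 17, 18, -7.80),
    ("Titans", "Broncos", 16, 26, -11.30),
    ("Knights", "Dragons", 0, 13, 7.80),
    ("Sea Eagles", "Raiders", 16, 29, 10.30),
    ("Roosters", "Sharks", 12, 20, 25.40),
    ("Eels", "Wests Tigers", 6, 22, 8.60),
    ("Panthers", "Cowboys", 10, 30, 2.40),
    ("Storm", "Warriors", 30, 14, 6.50),
]

def is_correct(home_score, away_score, prediction):
    """True when predicted and actual margins point to the same winner."""
    return (home_score - away_score) * prediction > 0

flags = [is_correct(hs, as_, pred) for _, _, hs, as_, pred in results]
print(flags, f"{sum(flags)} of {len(flags)} correct")
```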


Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

No.   Game                        Date     Winner      Prediction
1     Broncos vs. Roosters        Apr 10   Roosters         -1.50
2     Sharks vs. Knights          Apr 10   Knights          -6.20
3     Eels vs. Titans             Apr 11   Eels              7.30
4     Panthers vs. Sea Eagles     Apr 11   Panthers          8.10
5     Warriors vs. Wests Tigers   Apr 11   Warriors         13.00
6     Dragons vs. Bulldogs        Apr 12   Bulldogs         -0.20
7     Raiders vs. Storm           Apr 12   Storm            -8.70
8     Rabbitohs vs. Cowboys       Apr 13   Rabbitohs         7.80
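
The sign convention can be sketched as follows: the home team is listed first, a positive margin means the home team is picked and a negative margin means the away team is picked. A margin of exactly zero isn’t covered by the description above, so this sketch returns None for it.

```python
# (home, away, predicted home margin), from the Round 6 table above.
fixtures = [
    ("Broncos", "Roosters", -1.50),
    ("Sharks", "Knights", -6.20),
    ("Eels", "Titans", 7.30),
    ("Panthers", "Sea Eagles", 8.10),
    ("Warriors", "Wests Tigers", 13.00),
    ("Dragons", "Bulldogs", -0.20),
    ("Raiders", "Storm", -8.70),
    ("Rabbitohs", "Cowboys", 7.80),
]

def predicted_winner(home, away, margin):
    """Pick the home team for a positive margin, the away team for a negative one."""
    if margin > 0:
        return home
    if margin < 0:
        return away
    return None  # zero margin: no pick under the stated convention

winners = [predicted_winner(*fixture) for fixture in fixtures]
print(winners)
```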