Posts written by Thomas Lumley


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

September 22, 2014

Blame it all on mum

Says the Herald (reprinting the Daily Mail)

If you have always found doing sums a struggle, you might just be able to blame your mother.

Because research has linked a woman’s hormone levels in pregnancy with her child’s maths skills at age five.

Boys and girls whose mothers were very low in the hormone thyroxine were almost twice as likely to do badly in arithmetic tests, it found.

The hormone in question is thyroxine, produced by the thyroid, and the basic issue is that iodine deficiency is getting more common again. In Australia and New Zealand, iodine has been added to bread since September 2009 to address this problem. In Australia, the fortification of bread has been fairly successful; there doesn’t seem to be data for New Zealand, but there’s no reason to expect it to be different. So the story may not be applicable to New Zealanders.

Also, as with the cannabis paper a couple of weeks ago, the “twice as likely” is simply wrong. Doing badly in arithmetic was defined as being in the bottom 50%, and it’s not plausible that low-thyroxine kids are twice as likely to be in the bottom half, since that would put nearly all of them there. In fact, it’s the odds ratio for being in the lower 50% of students in maths that was 1.79. Since the overall odds of being in the bottom half are 1:1, multiplying by 1.79 gives odds of 1.79:1, which is a probability of 64% of being in the bottom half.
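As a check on that arithmetic, here is a minimal Python sketch that turns a baseline probability and an odds ratio into the implied probability. The function name `prob_from_odds_ratio` is just for illustration, and treating the comparison children as sitting at exactly 50% is an approximation.

```python
def prob_from_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a baseline probability and an odds ratio into the implied probability."""
    baseline_odds = baseline_prob / (1 - baseline_prob)  # 50% -> odds of 1:1
    new_odds = baseline_odds * odds_ratio                # 1:1 -> 1.79:1
    return new_odds / (1 + new_odds)                     # back to a probability

p_low = prob_from_odds_ratio(0.5, 1.79)
print(round(p_low, 2))        # 0.64
print(round(p_low / 0.5, 2))  # 1.28: the ratio of probabilities, nowhere near 2
```

The ratio of the two probabilities comes out around 1.28, which is what “twice as likely” would have to mean.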

A difference between 50% and 64% is not “almost twice as likely”, and “blame” is a completely inappropriate term. This is new research, so even if it’s true (it could be) and relevant to New Zealand (it probably isn’t), it would not be something mothers could fairly be blamed for. There’s entirely too much blaming of mothers already.

So, we had an election

Turnout of enrolled voters was up 3 percentage points over 2011, but enrolment was down, so as a fraction of the eligible population, turnout was only up half a percentage point.

From the Herald’s interactive, the remarkably boring trends through the count

There are a few electorates that are, arguably, still uncertain, but by 9pm the main real uncertainty at the nationwide level was whether Hone Harawira would win Te Tai Tokerau, and that wasn’t going to affect who was in government.  By 10pm it was pretty clear Harawira was out (though he hadn’t conceded) and that Internet Mana had been, in his opponent’s memorable phrase, “all steam and no hangi.”

Jonathan Marshall (@jmarshallnz) has posted swings in each electorate, for the party vote and electorate vote. He also has an interactive Sainte-Laguë seat allocation calculator and has published the data (complete apart from special votes) in a convenient form for y’all to play with.
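For anyone who wants to see what a seat allocation calculator is doing, the core of Sainte-Laguë is short. This Python sketch uses made-up vote counts and ignores the MMP threshold (5% of the party vote or an electorate seat) and overhang seats, so it illustrates the divisor method rather than reproducing the official calculation.

```python
def sainte_lague(votes: dict[str, int], seats: int) -> dict[str, int]:
    """Allocate seats by the Sainte-Laguë highest-averages method.

    Each seat goes to the party with the largest quotient votes / (2s + 1),
    where s is the number of seats that party has already won
    (divisors 1, 3, 5, 7, ...). Ties go to the party listed first.
    """
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        winner = max(votes, key=lambda p: votes[p] / (2 * allocation[p] + 1))
        allocation[winner] += 1
    return allocation

# Hypothetical vote counts for three parties and seven seats
print(sainte_lague({"A": 53000, "B": 24000, "C": 23000}, 7))
# {'A': 3, 'B': 2, 'C': 2}
```

The odd divisors are what distinguish Sainte-Laguë from D’Hondt (divisors 1, 2, 3, ...), which is more generous to the largest party.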

David Heffernan (@kiwipollguy) collected a bunch of poll, poll average, and pundit predictions, and writes about them here. The basic summary is that they weren’t very good, though there weren’t any totally loony ones, as there were for the last US Presidential election. Our pundits seem to be moderately well calibrated to reality, but there’s a lot of uncertainty in the system and the improvement from averaging seems pretty small.  The only systematic bias is that the Greens did a bit worse than expected.

On his criterion, which is squared prediction error scaled roughly by party vote, two single polls (3 News/Reid Research at the high end and the Herald DigiPoll at the low end) spanned almost the entire range of prediction error.

The variation between predictions isn’t actually much bigger than you’d expect by chance. The prediction errors have the mean you’d expect from a random sample of about 400 people, and apart from two outliers they have the right spread as well. On the graph, the red curve is a chi-squared distribution with 9 degrees of freedom, and the black curve is the distribution of the 23 estimates. The outliers are Wikipedia and the last 3 News/Reid Research poll.
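One way to see where the chi-squared comparison comes from: suppose the criterion is a Pearson-style sum over parties of (predicted share − actual share)² / actual share (an assumption; Heffernan’s exact scaling may differ). For a simple random sample of n people, n times that criterion is Pearson’s chi-squared statistic, whose mean is exactly the number of categories minus one. This Python simulation uses ten hypothetical party shares and n = 400; under these assumptions the average scaled error comes out near 9, so an average criterion near 9/400 = 0.0225 is what a 400-person random sample would give.

```python
import random

def prediction_error(predicted, actual):
    """Sum over parties of squared error in vote share, scaled by the actual share.
    (An assumed Pearson-style criterion, not necessarily Heffernan's exact formula.)"""
    return sum((p - a) ** 2 / a for p, a in zip(predicted, actual))

def simulate_scaled_errors(actual, n, reps, seed=1):
    """Scaled errors n * criterion for `reps` simulated simple random samples of size n."""
    rng = random.Random(seed)
    parties = range(len(actual))
    scaled = []
    for _ in range(reps):
        sample = rng.choices(parties, weights=actual, k=n)
        counts = [0] * len(actual)
        for party in sample:
            counts[party] += 1
        predicted = [c / n for c in counts]
        # n times the criterion is Pearson's chi-squared statistic for this sample
        scaled.append(n * prediction_error(predicted, actual))
    return scaled

# Hypothetical vote shares for ten parties (made up for illustration)
shares = [0.47, 0.25, 0.11, 0.09, 0.04, 0.012, 0.008, 0.007, 0.005, 0.008]
errors = simulate_scaled_errors(shares, n=400, reps=2000)
print(sum(errors) / len(errors))  # should be close to 9 = ten parties minus one
```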


About half the predictions were qualitatively wrong: they had National needing New Zealand First or the Conservatives for a majority. The Conservatives were clearly treated unfairly by the MMP threshold. If someone is going to be, I’m glad it’s them, but a party with more votes than the Māori Party, Internet Mana, ACT, United Future, and Legalise Cannabis put together should have a chance to prove their unsuitability in Parliament.


September 21, 2014


Data collection edition

  • Too Much Information: Clemson University, in South Carolina, was “requiring students and faculty to complete an online course through a third party website that asks invasive questions about sexual history.”
  • Too Little Information: New Zealand insurers are not willing to cooperate with price-comparison websites of the sort that exist elsewhere in the world.  These have led to lower prices where they have been introduced, but the insurers say their real concern is that people won’t get the most appropriate cover. (Herald today, Stuff back in March)
  • Just Right (maybe): Apple says that with the new iOS 8 operating system it is unable to unlock phones and decrypt data, and so will be able to refuse government demands to do so. Of course, the government can still grab lots of metadata, and as John Gilmore points out, there’s nothing but Apple’s bare word to go on.

September 19, 2014

Not how polling works

The Herald interactive for election results looks really impressive. The headline infographic for the latest poll, not so much. The graph is designed to display changes between two polls, for which the margin of error is about 1.4 times (√2 times) larger than for a single poll: the margin of error for National goes beyond the edge of the graph.
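The 1.4 comes from the fact that the variances of two independent polls add, so the standard error of the difference is √2 times that of either poll when they are the same size. A minimal Python sketch, assuming simple random samples of equal size (775, the size quoted for this poll) and the worst case p = 0.5:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a single proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

def margin_of_error_change(p: float, n1: int, n2: int, z: float = 1.96) -> float:
    """95% margin of error for the difference between two independent polls."""
    # Variances of independent estimates add, so standard errors combine in quadrature
    return z * math.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)

p = 0.5  # worst case, not the actual National figure
ratio = margin_of_error_change(p, 775, 775) / margin_of_error(p, 775)
print(round(ratio, 2))  # 1.41
```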



The lead for the story is worse

The Kim Dotcom-inspired event in Auckland’s Town Hall that was supposed to end John Key’s career gave the National Party an immediate bounce in support this week, according to polling for the last Herald DigiPoll survey.

Since both the Dotcom and Greenwald/Snowden Moments of Truth happened in the middle of polling, they’ve split the results into before/after Tuesday.  That is, rather than showing an average of polls, or even a single poll, or even a change from a single poll, they are headlining the change between the first and second halves of a single poll!

The observed “bounce” was 1.3%. The quoted margin of error at the bottom of the story is 3.5%, from a poll of 775 people. The actual margin of error for a change between the first and second halves of the poll is about 7%.
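The factor of two is √2 from halving the sample size times another √2 from taking a difference. This Python sketch reproduces both numbers, assuming the worst case p = 0.5 and, since the story doesn’t give the actual split, two halves of equal size:

```python
import math

def max_margin_of_error(n: float, z: float = 1.96) -> float:
    """Worst-case (p = 0.5) 95% margin of error for one proportion."""
    return z * math.sqrt(0.25 / n)

n = 775
whole_poll = max_margin_of_error(n)

# Assumption: the poll splits into two equal halves before and after Tuesday
half = n / 2
# Margin of error for the difference between the two independent halves
split_half_change = 1.96 * math.sqrt(0.25 / half + 0.25 / half)

print(round(100 * whole_poll, 1))         # 3.5
print(round(100 * split_half_change, 1))  # 7.0
```

A 1.3-point bounce is well inside a margin of error of about 7 points.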

Only in the Internet Party’s wildest dreams could this split-half comparison have told us anything reliable. It would need the statistical equivalent of the CSI magic video-zoom enhance button to work.


September 18, 2014

Bald truth

From the Herald

Men who are bald at age 45 are more likely to develop aggressive prostate cancer compared with those who keep their hair.

US researchers found those who lose hair at the front of their heads and have moderate hair-thinning on the crown were 40 per cent more likely to develop a fast-growing tumour in their prostate.

This was compared with men with no baldness.

That’s all true, but what casts doubt on this finding is that you get the same results if you compare to men with severe baldness. That is, the research found a higher rate of aggressive prostate cancer in men who had ‘moderate’ baldness on the top of the head, but not in those who had milder or more severe forms, and no increase in non-aggressive prostate cancer. Here are the estimated relative increases in risk, with confidence intervals:


When you consider the lack of a consistent trend, and the fact that the evidence isn’t all that strong for the moderate-baldness/aggressive-cancer combination, I don’t think this is worth getting all that excited about.

This one can mostly be blamed on the journal: the American Society of Clinical Oncology press release isn’t too bad in itself, but if you compare it to the other recent occasions when ASCO have issued a press release, it doesn’t really measure up.

Interactive election results map

The Herald has an interactive election-results map, which will show results for each polling place as they come in, together with demographic information about each electorate.  At the moment it’s showing the 2011 election data, and the displays are still being refined — but the Herald has started promoting it, so I figure it’s safe for me to link as well.

Mashblock is also developing an election site. At the moment they have enrolment data by age. Half the people under 35 in Auckland Central seem to be unenrolled, which is a bit scary. Presumably some of them are students enrolled at home, and some haven’t been in NZ long enough to enrol, but still.

Some non-citizens probably don’t know that they are eligible; I almost missed out last time. So, if you know someone who is a permanent resident and has lived in New Zealand for a year, you might just ask if they know about the eligibility rules. Tomorrow is the last day to enrol.

September 15, 2014


  • From the Brainflapping blog at the Guardian, a set of classifications for science stories (Axe Grinding, Soapbox, Wild Extrapolation). My favourite: “Article has not been checked by anyone who knows how to communicate”.

September 10, 2014

Cannabis graduation exaggeration


Teenagers who use cannabis daily are seven times more likely to attempt suicide and 60 percent less likely to complete high school than those who don’t, latest research shows.

Me (via the Science Media Centre):

“The associations in the paper are summarised by estimated odds ratios comparing non-users to those who used cannabis daily. This can easily be misleading to non-specialists in two ways. Firstly, nearly all the statistical evidence comes from the roughly 1000 participants who used cannabis less than daily, not the roughly 50 daily users — the estimates for daily users are an extrapolation.

“Secondly, odds ratios are hard to interpret. For example, the odds ratio of 0.37 for high-school graduation could easily be misinterpreted as a 0.37 times lower rate of graduation in very heavy cannabis users. In fact, if the overall graduation rate matched the New Zealand rate of 75%, the rate in very heavy cannabis users would be 53%, and the rate in those who used cannabis more than monthly but less than weekly would be 65%.”
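The conversion from odds ratio to graduation rate can be sketched in a few lines of Python. Like the quote, it treats the 75% New Zealand figure as the non-user graduation rate, which is an approximation:

```python
baseline_rate = 0.75  # assumed graduation rate without daily cannabis use (NZ figure)
odds_ratio = 0.37     # daily users vs non-users, as reported

baseline_odds = baseline_rate / (1 - baseline_rate)  # 3:1
heavy_odds = baseline_odds * odds_ratio              # 1.11:1
heavy_rate = heavy_odds / (1 + heavy_odds)           # back to a probability

print(round(heavy_rate, 2))                  # 0.53
print(round(baseline_rate * (1 - 0.60), 2))  # 0.3: what "60% less likely" would mean
```

The last line shows what “60% less likely to complete high school” would actually mean: a graduation rate of about 30%, far below the 53% the odds ratio implies.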

That is, the estimated rate of completing high school is not 60% lower, it’s about 20% lower. This is before you worry about the extrapolation from moderate to heavy users and the causality question. The 60% figure is unambiguously wrong. It isn’t even what the paper claims. It’s an easy mistake to make, though the researchers should have done more to prevent it, and that’s why it was part of my comments last week.

You can read all the Science Media Centre commentary here.


[Update: The erroneous '60% less likely to complete high school' statement is in the journal press release. That's unprofessional at best.]

(I could also be picky and point out that 3News have the journal wrong: The Lancet Psychiatry, which started this year, is not the same as The Lancet, founded in 1823.)

September 8, 2014

Poll meta-analyses in NZ

As we point out from time to time, single polls aren’t very accurate and you need sensible averaging.

There are at least three sets of averages for NZ:

1. Peter Green’s analyses, which get published at DimPost (larger parties, smaller parties). The full code is here.

2. Pundit’s poll of polls. They have a reasonably detailed description of their approach and it follows what Nate Silver did for the US elections.

3. Curiablog’s time- and size-weighted average. Methodology described here.
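As a rough illustration of what a time- and size-weighted average can look like (not necessarily Curiablog’s actual method), this Python sketch weights each poll by its sample size times an exponential decay in its age. The 14-day half-life and the two polls are hypothetical.

```python
def weighted_poll_average(polls, half_life_days=14.0):
    """Time- and size-weighted average of poll results.

    polls: list of (days_ago, sample_size, share) tuples.
    Each poll's weight is its sample size, halved for every `half_life_days`
    of age (an assumed decay rule, chosen for illustration).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for days_ago, sample_size, share in polls:
        weight = sample_size * 0.5 ** (days_ago / half_life_days)
        total_weight += weight
        weighted_sum += weight * share
    return weighted_sum / total_weight

# Two hypothetical polls: a fresh one and one two weeks old
polls = [(0, 1000, 0.48), (14, 1000, 0.44)]
print(round(weighted_poll_average(polls), 3))                      # 0.467
print(round(sum(share for _, _, share in polls) / len(polls), 2))  # 0.46, unweighted
```

Down-weighting older polls lets the average follow real changes in support, at the cost of being noisier than an average over a longer window.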

The implementors of these cover a reasonable spectrum of NZ political affiliation. The results agree fairly closely except for one issue: Peter Green adds a correction to make the predictions go through the 2011 election results, which no-one else does.

According to Gavin White, there is a historical tendency for National to do a bit worse and NZ First to do a bit better in the election than in the polls, so you’d want to correct for this, but you could also argue that the effect was stronger than usual at the last election so this might overcorrect.

In addition to any actual changes in preferences over the next couple of weeks, there are three polling issues we don’t have a good handle on:

  • Internet Mana is new, and you could make a plausible case that their supporters might be harder for the pollers to get a good grip on (note: age and ethnicity aren’t enough here; the pollers do take account of those).
  • There seems to have been a big increase in ‘undecided’ respondents to the polls, apparently from former Labour voters. To the extent that this is new, no-one really knows what they will do on the day.
  • Polling for electorates is harder, especially when strategic voting is important, as in Epsom.


[Update: thanks to Bevan Weir in comments, there's also a Radio NZ average. It's a simple unweighted average with no smoothing, which isn't ideal for estimation but has the virtue of simplicity]


  • Interesting maps: a Moral Topography of Portland. “The [1913] report found, specified per type of dwelling, only 22 of 80 apartments, merely 5 out of 59 hotels and no more than 71 out of 408 rooming and lodging houses to be ‘moral’.” If you cynically expected that “immoral” didn’t refer to racial discrimination, rent-gouging, unhygienic conditions, or lack of fire escapes, you were right. (via


  • An interactive display of US lifetime earnings for various groups by education and gender. The underlying data are good, but there’s inevitably an assumption that the correlations with education are broadly a result of education (including social status and networking effects) rather than selection for existing differences.