Beyond the margin of error
From Twitter, this morning (the graphs aren’t in the online story)
— Lennart Nout (@lennartnout) March 19, 2014
Now, the Herald-Digipoll is supposed to be a real survey, with samples that are more or less representative after weighting. There isn’t a margin of error reported, but the standard maximum margin of error would be a little over 6%.
There are two aspects of the data that make it not look representative. Thr first is that only 31.3%, or 37% of those claiming to have voted, said they voted for Len Brown last time. He got 47.8% of the vote. That discrepancy is a bit larger than you’d expect just from bad luck; it’s the sort of thing you’d expect to see about 1 or 2 times in 1000 by chance.
More impressively, 85% of respondents claimed to have voted. Only 36% of those eligible in Auckland actually voted. The standard polling margin of error is ‘two sigma’, twice the standard deviation. We’ve seen the physicists talk about ‘5 sigma’ or ‘7 sigma’ discrepancies as strong evidence for new phenomena, and the operations management people talk about ‘six sigma’ with the goal of essentially ruling out defects due to unmanaged variability. When the population value is 36% and the observed value is 85%, that’s a 16 sigma discrepancy.
The text of the story says ‘Auckland voters’, not ‘Aucklanders’, so I checked to make sure it wasn’t just that 12.4% of the people voted in the election but didn’t vote for mayor. That explanation doesn’t seem to work either: only 2.5% of mayoral ballots were blank or informal. It doesn’t work if you assume the sample was people who voted in the last national election. Digipoll are a respectable polling company, which is why I find it hard to believe there isn’t a simple explanation, but if so it isn’t in the Herald story. I’m a bit handicapped by the fact that the University of Texas internet system bizarrely decides to block the Digipoll website.
So, how could the poll be so badly wrong? It’s unlikely to just be due to bad sampling — you could do better with a random poll of half a dozen people. There’s got to be a fairly significant contribution from people whose recall of the 2013 election is not entirely accurate, or to put it more bluntly, some of the respondents were telling porkies. Unfortunately, that makes it hard to tell if results for any of the other questions bear even the slightest relationship to the truth.
Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient See all posts by Thomas Lumley »