March 22, 2014

Polls and role-playing games

An XKCD classic

The mouseover text says “Also, all financial analysis. And, more directly, D&D.”

We’re getting to the point in the electoral cycle where opinion polls qualify as well. There will be lots of polls, and lots of media and blog writing that tries to tell stories about poll-to-poll fluctuations that fit in with their biases or their need to sell advertising. So, as an aid to keeping calm and believing nothing, I thought a reminder about variability would be useful.

The standard NZ opinion poll has 750–1000 people. The ‘maximum margin of error’ is about 3.5% for 750 and about 3% for 1000. If the poll is a different size, the pollster will usually quote the maximum margin of error. Out of 20 polls, 19 should get the overall left:right division to within the maximum margin of error.
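For a simple random sample, the maximum margin of error is the half-width of a 95% interval at 50% support, 1.96√(p(1−p)/n) with p = 0.5. A quick check in plain Python:

```python
import math

def max_margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# The 'maximum' margin is the worst case, at p = 0.5
print(round(max_margin_of_error(750) * 100, 1))   # → 3.6  (about 3.5% for a 750-person poll)
print(round(max_margin_of_error(1000) * 100, 1))  # → 3.1  (about 3% for a 1000-person poll)
```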

If you took 3.5% from the right-wing coalition and moved it to the left-wing coalition, or vice versa, you’d change the gap between them by 7% and get very different election results, so getting this level of precision 19 times out of 20 isn’t actually all that impressive unless you consider how much worse it could be. And in fact, polls likely do a bit worse than this: partly because voting preferences really do change, partly because people lie, and partly because random sampling is harder than it looks.

Often, news headlines are about changes in a poll, not about a single poll. The uncertainty in a change is higher than in a single value, because one poll might have been too low and the next one too high. To be precise, the uncertainty is about 1.4 times as large for a change. For the difference between two 750-person polls, the maximum margin of error is about 5%.
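The factor of 1.4 is √2: for two independent polls the variances add, so the standard error of a difference is √2 times that of a single poll. A quick check, assuming two independent 750-person polls:

```python
import math

moe_single = 1.96 * math.sqrt(0.5 * 0.5 / 750)  # ~3.6% for one 750-person poll
moe_change = math.sqrt(2) * moe_single          # independent errors add in quadrature
print(round(moe_change * 100, 1))               # → 5.1  (about 5%)
```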

You might want a less conservative margin than 19 out of 20. The ‘probable error’ is the error you’d expect to be within half the time. For a 750-person poll the probable error is 1.3% for a single party in a single poll, 2.6% for the difference between left and right in a single poll, and 1.9% for the difference between two polls for the same major party.
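The probable error is the half-width of a 50% interval, about 0.6745 standard errors rather than 1.96. A back-of-envelope version under the same simple-random-sampling assumptions, which rounds a touch below the quoted figures:

```python
import math

se = math.sqrt(0.5 * 0.5 / 750)       # standard error at 50% support, n = 750
pe_single = 0.6745 * se               # 50% interval half-width ('probable error')
pe_gap = 2 * pe_single                # the left-right gap moves twice as far as one bloc
pe_change = math.sqrt(2) * pe_single  # difference between two independent polls

print(round(pe_single * 100, 1), round(pe_gap * 100, 1), round(pe_change * 100, 1))
# → 1.2 2.5 1.7
```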

These are all for major parties.  At the 5% MMP threshold the margin of error is smaller: you can be pretty sure a party polling below 3.5% isn’t getting to the threshold and one polling about 6.5% is, but that’s about it.

If a party gets an electorate seat and you want to figure out if they are getting a second List seat, a national poll is not all that helpful. The data are too sparse, and the random sampling is less reliable because minor parties tend to have more concentrated support.   At 2% support the margin of error for a single poll is about 1% each way.
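Evaluating the same formula away from 50% support shows why the margin shrinks for small parties, though as noted above it understates the error when minor-party support is geographically concentrated:

```python
import math

def moe(n, p, z=1.96):
    """95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(moe(750, 0.05) * 100, 1))  # → 1.6  (either way, at the 5% threshold)
print(round(moe(750, 0.02) * 100, 1))  # → 1.0  (either way, at 2% support)
```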

Single polls are not very useful, but multiple polls are much better, as the last US election showed. All the major pundits who used sensible averages of polls were more accurate than essentially everyone else. That’s not to say expert opinion is useless, just that if you have to choose between statistical voodoo and gut instinct, statistics seems to work better.

In NZ there are several options. Peter Green does averages that get posted at Dim Post; his code is available. KiwiPollGuy does averages and also writes about the iPredict betting markets, and pundit.co.nz has a Poll of Polls. These won’t work quite as well as in the US, because the US has an insanely large number of polls and elections to calibrate them, but any sort of average is a big improvement over looking one poll at a time.
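As a sketch of the idea behind averaging (not any of these sites’ actual methods), here is a minimal weighted average with made-up poll numbers. With independent simple random samples of a common true proportion, weighting by sample size is the same as weighting by precision (inverse variance):

```python
# Hypothetical poll results for one party; figures are for illustration only.
polls = [
    {"support": 0.47, "n": 750},
    {"support": 0.50, "n": 1000},
    {"support": 0.45, "n": 900},
]

# Sample-size weighting = precision weighting for independent simple random samples
total_n = sum(p["n"] for p in polls)
average = sum(p["support"] * p["n"] for p in polls) / total_n
print(round(average, 3))  # → 0.475
```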

A final point: national polls tell you approximately nothing about single-electorate results. There’s just no point even looking at national polling results for ACT or United Future if you care about Epsom or Ohariu.

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

• Heh… Yeah, need to be careful making inferences from single polls. Good to remember though (especially in NZ) that without them, poll averages wouldn’t exist.

• Thomas Lumley

Sure. You need to update the averages each time a poll comes in, and each single poll has more impact than it would in the US.

• Any thoughts about poll volatility Thomas?

I’m thinking that polls carried out more frequently have a bigger say in polling averages. If, by design, some polls have greater variance than others, there’s the potential for polling averages in NZ to be a bit volatile too, especially if the more frequent polls are also the most volatile.

I’ve been wondering if some sort of weighting could be applied in poll average calculations, so that polls that deviate more from ‘all the others at around the same time’ have less influence over the average at that time.

I guess this wouldn’t matter so much in the US, where there are so many polls. However in NZ, where there are only a few, could something like this potentially make polling averages more reliable?

I haven’t given this a huge amount of thought really… Just putting it out there to see if my logic makes sense to others.

• Thomas Lumley

There are three issues:

* In the simple case of no ‘house effect’ and independent samples, it doesn’t matter who does a poll. Weighting by the sample size [strictly, the precision] will give each poll the right amount of weight.

* If you estimate the house bias and subtract it, again weighting by precision will be ideal [no longer weighting by sample size if you have different amounts of information about house bias].

* If you have a rotating panel rather than separate samples, you need to be able to estimate the correlation. The panel will be more informative about changes over time but less informative about current averages than the same number of independent samples.

Peter Green’s code basically subtracts off biases and then runs a smoother through it. Exponential weighting (as David Scott does for rugby) is the other sensible and straightforward approach and shouldn’t be too different.
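A toy version of the exponential-weighting idea (not Peter Green’s or David Scott’s actual code), with a made-up half-life and poll data:

```python
def exp_weighted_average(polls, half_life=30.0):
    """polls: list of (days_ago, support) pairs.
    Each poll's weight halves every `half_life` days, so recent polls count more."""
    weights = [0.5 ** (age / half_life) for age, _ in polls]
    return sum(w * s for w, (_, s) in zip(weights, polls)) / sum(weights)

# Hypothetical polls: today, two weeks ago, two months ago
print(round(exp_weighted_average([(0, 0.48), (15, 0.46), (60, 0.50)]), 3))  # → 0.475
```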

The big problem, due to our smaller number of elections, is that we don’t have much ground truth data to estimate house biases — it’s hard to do better than just subtracting the error at the last election, even though that’s obviously crude.

• Peter Green

Somewhere on my todo list I’m planning to add some outlier detection / rogue poll identification.

Excluding or downweighting contrarian results would fall under “robust” estimation. Given the characteristics of this dataset, I’d expect this to increase volatility. If you flip four coins, you will get one disagreeing with the other three 50% of the time.

• Peter Green

http://themonkeycage.org/2012/11/07/is-nate-silver-incentive-compatible/

As a statistician it’s always important to remember all the hard work that goes into the data we get to play with! If it ever sounds like I’ve forgotten feel free to tell me off on twitter :)

I’m trying to be careful with terminology now, especially using “house effect” instead of “bias”. The latter is fairly value-neutral in stats, but it’s bound to offend people in political discussions.

• Hey Peter

I’m a sucker for punishment really, as I read blogs way too often. The team I work with puts in a massive effort to get things ‘right’ – they take it really seriously and are constantly on the lookout for any potential sources of non-sampling error. I’m really proud of them, and for that reason I get on the defensive a little too easily when people try to explain away results by talking about single sources of error (when there are many, many sources which need to be considered).

Like other pollsters, I’m sure, I see a poll or survey as being a bit like a Swiss Watch or an America’s Cup boat. No single change will make it right or wrong – all the different interacting parts need to be considered.

• BTW I wasn’t referring to Statschat in my comment above. This is like my happy place – although always slightly worried that one of my charts or reports will end up being critiqued here one day :)

• Thanks!

I wasn’t thinking so much about house bias. More thinking about ‘random’ volatility resulting from factors other than sample size. For example, the design effect, sampling approach, and other things that might result in some polls having more variance than others.

• Thomas Lumley

If it’s random overdispersion rather than bias you should weight by the actual precision rather than the sample size. But Peter’s smoother-based model doesn’t actually show much overdispersion, I was surprised to see.

• With the way polls are reported by the media and subsequently interpreted on blogs (eg, Egads! Poll x shows this party has gone up 1pt, but poll y shows it had gone down!), even a little extra overdispersion in one poll over another could lead to a lot of confusion.

From an interpretation point of view, I’d prefer consistent bias over volatility.

• Thomas Lumley

Maybe. I’m not sure you get to choose that tradeoff, except in weighting vs not weighting and then at least you can use a standard error that reflects the impact of weighting.

• Nobody does though. In fact, as far as I can see, only two polls seem to try to approximate a random probability design.

Perhaps I’m being too purist, but when you’re using a non-probability design doesn’t MoE just go out the window?

• Thomas Lumley

No one in NZ. In the US they do.

I think you’re being too purist. Non-response means that there isn’t such a thing as a probability sample of human respondents, but the empirical overdispersion and bias are surprisingly low most of the time. That’s why the recent Digipoll was surprising.

• Heh… Well it wouldn’t be the first time I’ve been told that. :)

My personal philosophy is that non-response is a big issue in NZ surveys, but that maintaining tight fieldwork practices, approximating a probability sample (where possible), and targeting a consistently high response rate *should* yield more interpretable changes over time (but not necessarily without bias).

Whenever I’ve run quota surveys I’ve been staggered by the number of screen outs and how low the response rate has been. But I guess response rate is not the point in that approach. Anyway, I digress…

Thanks for the dialogue Thomas.

• Thomas Lumley

It’s a good point. All I mean about ‘too purist’ is that given non-response, probability sampling isn’t nearly enough to explain the good performance of surveys that attempt it.

That means I don’t think one can rule out good, reliable estimates from other methods. I’d always prefer a probability sample; I’m just not as confident as I used to be that they really are necessary. Old-fashioned ‘stand in the street and grab passersby’ quota samples were awful, but good ones could work.

In medical treatment evaluation I’m much more of a purist about randomisation, the equivalent idea, but we can afford much stricter standards for non-adherence, so it’s not such a problem.

• Yeah, I was *really* impressed with how close the pre-election polls were to the result in 2011, even with all the different approaches.

I came to survey research through a social psychology path, so I struggle with the idea that measuring voter sentiment is all about getting the right number of people by age, gender, location, etc (there are so many other variables). Clearly it’s a solution that seems to work though.

• Unrelated to the post, but if you like the xkcd comic’s view of sports you might like this comic as well.

http://cheezburger.com/7939487744