Posts filed under Random variation (85)

April 11, 2014

The favourite never wins?

From Deadspin, an analysis of accuracy in 11 million tournament predictions (‘brackets’) for the US college basketball competition, and 53 predictions by experts



Stephen Pettigrew’s analysis shows the experts average more points than the general public (651681 vs 604.4). What he doesn’t point out explicitly is that picking the favourites, which corresponds to the big spike at 680 points, does rather better than the average expert.


March 31, 2014

Election poll averaging

The DimPost posted a new poll average and trend, which gives an opportunity to talk about some of the issues in interpretation (you should also listen to Sunday’s Mediawatch episode)

The basic chart looks like this


The scatter of points around the trend line shows the sampling uncertainty.  The fact that the blue dots are above the line and the black dots are below the line is important, and is one of the limitations of NZ polls.  At the last election, NZ First did better, and National did worse, than in the polling just before the election. The trend estimates basically assume that this discrepancy will keep going in the future.  The alternative, since we’ve basically got just one election to work with, is to assume it was just a one-off fluke and tells us nothing.

We can’t distinguish these options empirically just from the poll results, but we can think about various possible explanations, some of which could be disproved by additional evidence.  One possibility is that there was a spike in NZ First popularity at the expense of National right at the election, because of Winston Peters’s reaction to the teapot affair.  Another possibility is that landline telephone polls systematically undersample NZ First voters. Another is that people are less likely to tell the truth about being NZ First voters (perhaps because of media bias against Winston or something).  In the US there are so many elections and so many polls that it’s possible to estimate differences between elections and polls, separately for different polling companies, and see how fast they change over time. It’s harder here. (update: Danyl Mclauchlan points me to this useful post by Gavin White)

You can see some things about different polling companies. For example, in the graph below, the large red circles are the Herald-Digipoll results. These seem a bit more variable than the others (they do have a slightly smaller sample size) but they don’t seem biased relative to the other polls.  If you click on the image you’ll get the interactive version. This is the trend without bias correction, so the points scatter symmetrically around the trend lines but the trend misses the election result for National and NZ First.


March 22, 2014

Polls and role-playing games

An XKCD classic



The mouseover text says “Also, all financial analysis. And, more directly, D&D.” 

We’re getting to the point in the electoral cycle where opinion polls qualify as well. There will be lots of polls, and lots media and blog writing that tries to tell stories about the fluctuations from poll to poll that fit in with their biases or their need to sell advertising. So, as an aid to keeping calm and believing nothing, I thought a reminder about variability would be useful.

The standard NZ opinion poll has 750-1000 people. The ‘maximum margin of error’ is about 3.5% for 730 and about 3% for 1000. If the poll is of a different size, they will usually quote the maximum margin of error. If you have 20 polls, 19 of them should get the overall left:right division to within the maximum margin of error.

If you took 3.5% from the right-wing coalition and moved it to the left-wing coalition, or vice versa, you’d change the gap between them by 7% and get very different election results, so getting this level of precision 19 times out of 20 isn’t actually all that impressive unless you consider how much worse it could be. And in fact, polls likely do a bit worse than this: partly because voting preferences really do change, partly because people lie, and partly because random sampling is harder than it looks.

Often, news headlines are about changes in a poll, not about a single poll. The uncertainty in a change is  higher than in a single value, because one poll might have been too low and the next one too high.  To be precise, the uncertainty is 1.4 times higher for a change.  For a difference between two 750-person polls, the maximum margin of error is about 5%.

You might want a less-conservative margin than 19 out of 20. The `probable error’ is the error you’d expect half the time. For a 750-person poll the probable error is 1.3% for a single party and single poll,  2.6% for the difference between left and right in a single poll, and 1.9% for a difference between two polls for the same major party.

These are all for major parties.  At the 5% MMP threshold the margin of error is smaller: you can be pretty sure a party polling below 3.5% isn’t getting to the threshold and one polling about 6.5% is, but that’s about it.

If a party gets an electorate seat and you want to figure out if they are getting a second List seat, a national poll is not all that helpful. The data are too sparse, and the random sampling is less reliable because minor parties tend to have more concentrated support.   At 2% support the margin of error for a single poll is about 1% each way.

Single polls are not very useful, but multiple polls are much better, as the last US election showed. All the major pundits who used sensible averages of polls were more accurate than essentially everyone else.  That’s not to say experts opinion is useless, just that if you have to pick just one of statistical voodoo and gut instinct, statistics seems to work better.

In NZ there are several options. Peter Green does averages that get posted at Dim Post; his code is available. KiwiPollGuy does averages and also writes about the iPredict betting markets, and has a Poll of Polls. These won’t work quite as well as in the US, because the US has an insanely large number of polls and elections to calibrate them, but any sort of average is a big improvement over looking one poll at a time.

A final point: national polls tell you approximately nothing about single-electorate results. There’s just no point even looking at national polling results for ACT or United Future if you care about Epsom or Ohariu.

March 20, 2014

Beyond the margin of error

From Twitter, this morning (the graphs aren’t in the online story)

Now, the Herald-Digipoll is supposed to be a real survey, with samples that are more or less representative after weighting. There isn’t a margin of error reported, but the standard maximum margin of error would be  a little over 6%.

There are two aspects of the data that make it not look representative. Thr first is that only 31.3%, or 37% of those claiming to have voted, said they voted for Len Brown last time. He got 47.8% of the vote. That discrepancy is a bit larger than you’d expect just from bad luck; it’s the sort of thing you’d expect to see about 1 or 2 times in 1000 by chance.

More impressively, 85% of respondents claimed to have voted. Only 36% of those eligible in Auckland actually voted. The standard polling margin of error is ‘two sigma’, twice the standard deviation.  We’ve seen the physicists talk about ’5 sigma’ or ’7 sigma’ discrepancies as strong evidence for new phenomena, and the operations management people talk about ‘six sigma’ with the goal of essentially ruling out defects due to unmanaged variability.  When the population value is 36% and the observed value is 85%, that’s a 16 sigma discrepancy.

The text of the story says ‘Auckland voters’, not ‘Aucklanders’, so I checked to make sure it wasn’t just that 12.4% of the people voted in the election but didn’t vote for mayor. That explanation doesn’t seem to work either: only 2.5% of mayoral ballots were blank or informal. It doesn’t work if you assume the sample was people who voted in the last national election.  Digipoll are a respectable polling company, which is why I find it hard to believe there isn’t a simple explanation, but if so it isn’t in the Herald story. I’m a bit handicapped by the fact that the University of Texas internet system bizarrely decides to block the Digipoll website.

So, how could the poll be so badly wrong? It’s unlikely to just be due to bad sampling — you could do better with a random poll of half a dozen people. There’s got to be a fairly significant contribution from people whose recall of the 2013 election is not entirely accurate, or to put it more bluntly, some of the respondents were telling porkies.  Unfortunately, that makes it hard to tell if results for any of the other questions bear even the slightest relationship to the truth.




March 18, 2014

Seven sigma?

The cosmologists are excited today, and there is data visualisation all over my Twitter feed

That’s a nice display of uncertainty at different levels of evidence, before (red) and after (blue) adding new data.  To get some idea of what is greater than zero and why they care, read the post by our upstairs neighbour Richard Easther (head of the Physics department)

March 1, 2014

It’s cold out there, in some places

Next week I’m visiting Iowa State University, one of the places where the discipline of statistics was invented. It’s going to be cold — the overnight minimum on Sunday is forecast at -25C — because another of the big winter storms is passing through.

The storms this year have been worse than usual. Minneapolis (where they know from cold) is already up to its sixth-highest number of days with the maximum below 0F (-18C, the temperature in your freezer). The Great Lakes have 88% ice cover, more than they have had for twenty years.

Looking at data from NOAA, this winter has been cold overall in the US, very slightly below the average for the past century or so.


However, that’s just the US. For the northern hemisphere as a whole, it’s been an unusually warm winter, well above historical temperatures



This has been your periodic reminder that weather news, for good reasons, gives you a very selective view of global temperature.



February 13, 2014

How stats fool juries

Prof Peter Donnelly’s TED talk. You might want to skip over the first few minutes of vaguely joke-like objects

Consider the two (coin-tossing) patterns HTH and HTT. Which of the following is true:

  1. The average number of tosses until HTH is larger than the average number of tosses until HTT
  2. The average number of tosses until HTH is the same as  the average number of tosses until HTT
  3. The average number of tosses until HTH is smaller than the average number of tosses until HTT?

Before you answer, you should know that most people, even mathematicians, get this wrong.

Also, as Prof Donnelly doesn’t point out, if you have a programming language handy, you can find out the answer very easily.

February 4, 2014

What an (un)likely bunch of tosse(r)s?

It was with some amazement that I read the following in the NZ Herald:

Since his first test in charge at Cape Town 13 months ago, McCullum has won just five out of 13 test tosses. Add in losing all five ODIs against India and it does not make for particularly pretty reading.

Then again, he’s up against another ordinary tosser in MS Dhoni, who has got it right just 21 times out of 51 tests at the helm. Three of those were in India’s past three tests.

The implication of the author seems to be that five out of 13, or 21 out of 51 are rather unlucky for a set of random coin tosses, and that the possibility exists that they can influence the toss. They are unlucky if one hopes to win the coin toss more than lose it, but there is no reason to think that is a realistic expectation unless the captains know something about the coin that we don’t.

Again, simple application of the binomial distribution shows how ordinary these results are. If we assume that the chance of winning the toss is 50% (Pr(Win) = 0.5) each time, then in 13 throws we would expect to win, on average, 6 to 7 times (6.5 for the pedants). Random variation would mean that about 90% of the time, we would expect to see four to nine wins in 13 throws (on average). So McCullum’s five from 13 hardly seems unlucky, or exceptionally bad. You might be tempted to think that the same may not hold for Dhoni. Just using the observed data, his estimated probability of success is 21/51 or 0.412 (3dp). This is not 0.5, but again, assuming a fair coin, and independence between tosses, it is not that unreasonable either. Using frequentist theory, and a simple normal approximation (with no small sample corrections), we would expect 96.4% of sets of 51 throws to yield somewhere between 18 and 33 successes. So Dhoni’s results are somewhat on the low side, but they are not beyond the realms of reasonably possibility.

Taking a Bayesian stance, as is my wont, yields a similar result. If I assume a uniform prior – which says “any probability of success between 0 and 1 is equally likely”, and binomial sampling, then the posterior distribution for the probability of success follows a Beta distribution with parameters a = 21+ 1 = 22, and b = 51 – 21 + 1 = 31. There are a variety of different ways we might use this result. One is to construct a credible interval for the true value of the probability of success. Using our data, we can say there is about a 95% chance that the true value is between 0.29 and 0.55 – so again, as 0.5 is contained within this interval, it is possible. Alternatively, the posterior probability that the true probability of success is less than 0.5 is about 0.894 (3dp). That is high, but not high enough for me. It says there at about a 1 in 10 chance that the true probability of success could actually be 0.5 or higher.

January 14, 2014

Causation, counterfactuals, and Lotto

A story in the Herald illustrates a subtle technical and philosophical point about causation. One of Saturday’s Lotto winners says

“I realised I was starving, so stopped to grab a bacon and egg sandwich.

“When I saw they had a Lotto kiosk, I decided to buy our Lotto tickets while I was there.

“We usually buy our tickets at the supermarket, so I’m glad I followed my gut on this one,” said one of the couple, who wish to remain anonymous.

Assuming it was a random pick, it’s almost certainly true that if they had not bought the ticket at that Lotto kiosk at that time, they would not have won.  On the other hand, if Lotto is honest, buying at that kiosk wasn’t a good strategy — it had no impact on the chance of winning.

There is a sense in which buying the bacon-and-egg sandwich was a cause of the win, but it’s not a very useful sense of the word ’cause’ for most statistical purposes.

The dangers of better measurement

An NPR News story on back pain and its treatment

One reason invasive treatments for back pain have been rising in recent years, Deyo says, is the ready availability of MRI scans. These detailed, color-coded pictures that can show a cross-section of the spine are a technological tour de force. But they can be dangerously misleading.

This MRI shows a mildly herniated disc. That's the sort of thing that looks abnormal on a scan but may not be causing pain and isn't helped by surgery.

This MRI shows a mildly herniated disc. That’s the sort of thing that looks abnormal on a scan but may not be causing pain and isn’t helped by surgery.

“Seeing is believing,” Deyo says. “And gosh! We can actually see degenerated discs, we can see bulging discs. We can see all kinds of things that are alarming.”

That is, they look alarming. But they’re most likely not the cause of the pain.