Posts filed under Surveys (175)

August 6, 2016

Momentum and bounce

Momentum is an actual property of physical objects, and explanations of flight, spin, and bounce in terms of momentum (and other factors) genuinely explain something.  Electoral poll proportions, on the other hand, can only have ‘momentum’ or ‘bounce’ as a metaphor — an explanation based on these doesn’t explain anything.

So, when US pollsters talk about convention bounce in polling results, what do they actually mean? The consensus facts are that a party's polling results improve just after its convention, that the improvement tends to be temporary, and that polls taken during this period have larger errors around the final outcome.

Andrew Gelman and David Rothschild have a long piece about this at Slate:

Recent research, however, suggests that swings in the polls can often be attributed not to changes in voter intention but to changing patterns of survey nonresponse: What seems like a big change in public opinion turns out to be little more than changes in the inclinations of Democrats and Republicans to respond to polls.

As usual, my recommendation is the relatively boring 538 polls-plus forecast, which discounts the ‘convention bounce’ very strongly.

July 31, 2016

Lucifer, Harambe, and Agrabah

Public Policy Polling has a history of asking … unusual … questions in their political polls.  For example: whether you are in favour of bombing Agrabah (the fictional country of Disney’s Aladdin), whether you think Hillary Clinton has ties to Lucifer, and whether you would vote for Harambe (the dead 17-year-old gorilla) if he ran as an independent against Trump and Clinton.

Of these three questions, the Lucifer one stands out: it comes from a familiar news issue and isn’t based on tricking the respondents. People may not answer honestly, but at least they know roughly what they are being asked and how it’s likely to be understood.  Since they know what they are being asked, it’s possible to interpret the responses in a reasonably straightforward way.

Now, it’s fairly common when asking people (especially teenagers) about drug use to include some non-existent drugs for an estimate of the false-positive response rate.  It’s still pretty clear how to interpret the results: if the name is chosen well, no respondents will have a good-faith belief that they have taken a drug with that name, but they also won’t be confident that it’s a ringer.  You’re not aiming to trick honest respondents; you’re aiming to detect those that aren’t answering honestly.

The Agrabah question is different. There had been extensive media discussion of the question of bombing various ISIS strongholds (eg Raqqa), and this was the only live political question about bombing in the Middle East. Given the context of a serious opinion poll, it would be easy to have a good-faith belief that ‘Agrabah’ was the name of one of these ISIS strongholds and thus to think you were being asked whether bombing ISIS there was a good idea. Because of this potential confusion, we can’t tell what the respondents actually meant — we can be sure they didn’t support bombing a fictional city, but we can’t tell to what extent they were recklessly supporting arbitrary Middle-Eastern bombing versus just being successfully trolled. Because we don’t know what respondents really meant, the results aren’t very useful.

The Harambe question is different again. Harambe is under the age limit for President, from the wrong species, and dead, so what could it even mean for him to be a candidate?  The charitable view might be that Harambe’s 5% should be subtracted from the 8-9% who say they will vote for real, living, human candidates other than Trump and Clinton. On the other hand, that interpretation relies on people not recognising Harambe’s name — on almost everyone not recognising the name, given that we’re talking about 5% of responses.  I can see the attraction of using a control question rather than a half-arsed correction based on historical trends. I just don’t believe the assumptions you’d need for it to work.

Overall, you don’t have to be very cynical to suspect the publicity angle might have some effect on their question choice.

July 27, 2016

In praise of NZ papers

I whinge about NZ papers a lot on StatsChat, and even more about some of the UK stories they reprint. It’s good sometimes to look at some of the UK stories they don’t reprint.  From the Daily Express:


The Brexit enthusiast and Cabinet minister John Redwood says “The poll is great news, well done to the Daily Express.” As he seems to be suggesting, you don’t get results like this just by chance — having a bogus online poll on the website of an anti-Europe newspaper is a good start.

(via Antony Unwin)

May 24, 2016


Headline: “Newshub poll: Key’s popularity plummets to lowest level”

Just 36.7 percent of those polled listed the current Prime Minister as their preferred option — down 1.6 percent from a Newshub poll in November.

National though is steady on 47 percent on the poll — a drop of just 0.3 percent — and similar to the Election night result.

So, apparently, 0.3% is “steady” and 1.6% is a “plummet”.

The reason we quote ‘maximum margin of error’, even though it’s a crude summary, not a good way to describe evidence, underestimates variability, and is a terribly misleading phrase, is that it at least gives some indication of what is worth headlining.  The maximum margin of error for this poll is 3%, but the margin of error for a change between two polls is larger by a factor of √2 ≈ 1.4, because the sampling variances of the two independent polls add: about 4.3%.

That’s the maximum margin of error, for a true value of 50%, but at 37% it doesn’t make that much difference — I did a quick simulation to check. If nothing happened, the Prime Minister’s measured popularity would plummet or soar by more than 1.6% between two polls about half the time purely from sampling variation.
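The quick simulation can be sketched in a few lines. The poll’s sample size isn’t given in the story, so n = 1000 (roughly what a 3% maximum margin of error implies) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000        # assumed poll size; not stated in the story
p = 0.367       # the PM's measured popularity
sims = 100_000  # number of simulated pairs of polls

# Two independent polls with the same true level of support
poll1 = rng.binomial(n, p, sims) / n
poll2 = rng.binomial(n, p, sims) / n

# How often does support 'plummet' or 'soar' by more than 1.6 points
# purely from sampling variation?
frac = np.mean(np.abs(poll1 - poll2) > 0.016)
print(frac)  # roughly 0.45: about half the time, with no real change at all
```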


May 20, 2016

Depends who you ask

There’s a Herald story about sleep

A University of Michigan study using data from Entrain, a smartphone app aimed at reducing jetlag, found Kiwis on average go to sleep at 10.48pm and wake at 6.54am – an average of 8 hours and 6 minutes sleep.

It quotes me as saying the results might not be all that representative, but it just occurred to me that there are some comparison data sets for the US at least.

  • The Entrain study finds people in the US go to sleep on average just before 11pm and wake up on average between 6:45 and 7am.
  • SleepCycle, another app, reports a bedtime of 11:40 for women and midnight for men, with both men and women waking at about 7:20.
  • The American Time Use Survey is nationally representative, but not that easy to get stuff out of. However, Nathan Yau at Flowing Data has an animation showing that 50% of the population are asleep at 10:30pm and awake at 6:30am.
  • And Jawbone, who don’t have to take anyone’s word for whether they’re asleep, have a fascinating map of mean bedtime by county of the US. It looks like the national average is after 11pm, but there’s huge variation, both urban-rural and position within your time zone.

These differences partly come from who is deliberately included and excluded (kids, shift workers, the very old), partly from measurement details, and partly from oversampling of the sort of people who use shiny gadgets.

March 11, 2016

Getting to see opinion poll uncertainty

Rock’n Poll has a lovely guide to sampling uncertainty in election polls, guiding you step by step to see how approximate the results would be in the best of all possible worlds. Highly recommended.

Of course, we’re not in the best of all possible worlds, and in addition to pure sampling uncertainty we have ‘house effects’ due to different methodology between polling firms and ‘design effects’ due to the way the surveys compensate for non-response.  And on top of that there are problems with the hypothetical question ‘if an election were held tomorrow’, and probably issues with people not wanting to be honest.

Even so, the basic sampling uncertainty gives a good guide to the error in opinion polls, and anything that makes it easier to understand is worth having.


(via Harkanwal Singh)

February 28, 2016

How I met your mother

Via Jolisa Gracewood on Twitter, a graph from Stanford sociologist Michael Rosenfeld on how people met their partners (click to embiggen)


Obviously the proportion who met online has increased — in the old days there weren’t many people online. It’s still dramatic how fast the change happened, considering that ‘the September that never ended’, when AOL subscribers gained access to Usenet, was only 1993.  It’s also notable how everything else except ‘in a bar or restaurant’ has gone down.

Since this is StatsChat you should be asking how they got the data: it was a reasonably good survey. There’s a research paper, too (PDF).

You should also be worrying about the bump in ‘online’ in the mid-1980s. It’s ok. The paper says “This bump corresponds to two respondents. These two respondents first met their partners in the 1980s without the assistance of the Internet, and then used the Internet to reconnect later”.



February 7, 2016

Zombie bogus surveys

From Food Network magazine, via Twitter, via Julie Blommaert


There’s no more detail than “Kellogg’s” as the source, and the Kellogg’s website is very sensibly not admitting to anything.

Some more Googling finds two stories from September last year — getting the factoid into a real paper magazine, because of the publication time lag, gives it another chance to roam the earth looking for brains.

Even though it has to be the same survey, the story from Vice says “a full one-fifth of Americans are using orange juice in their cereal instead of milk,” though Bustle says “More than 10 percent of Americans admitted to using orange juice or coffee”.  It’s not just that the numbers are inconsistent: the phrasing in one case suggests “do you usually?” as the question, and in the other “have you ever?”  It matters, or at least it would if anything about this mattered.

We’re also not told whether these are really supposed to be proportions of “Americans” or of “Americans who eat cereal”, or “Americans who eat cereal for breakfast”, or whatever.

Usefully, the Vice story does give a bit more detail about the survey

Two thousand US consumers and college students from all over the country participated in the study, with about 30 percent male subjects and 70 percent female. The participants were of all ages, with half being college students and the rest varied (14 percent between the ages of 25 and 34 years old, 16 percent between 35 and 44 years old, about a quarter between 45 and 54 years old, and the rest scattered in older or younger age groups). 

They don’t say how the participants were recruited or surveyed, but there’s enough information there to make it clear the data would be meaningless even if we knew what the questions were and what percentages the survey actually found.

January 15, 2016

When you don’t find any

The Icelandic Ethical Humanist Association commissioned a survey on religion. For people who don’t want to read the survey report (in PDF, in Icelandic), there’s a story at Iceland Magazine. The main point is in the headline: 0.0% of Icelanders 25 years or younger believe God created the world, new poll reveals.

That’s a pretty strong claim, so what did the survey actually do? Well, here you do need to read the survey report (or at least feed snippets of it to Google Translate). Of the people they sampled, 109 were in the lowest age category, which is ‘younger than 25’.  None of the 109 reported believing “God created the world” vs “The world was created in the Big Bang”.

Now, that’s not a completely clean pair of alternatives, since a fair number of people — the Pope, for example — say they believe both, but it’s still informative to some extent. So what can we say about sampling uncertainty?

A handy trick for situations like this one is the ‘rule of 3’.  If you ask N people and none of them is a creationist, a 95% confidence upper bound for the population proportion is 3/N. So, “fewer than 3% of Icelanders under 25 believe God created the world”.
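As a quick check on the arithmetic, the rule of 3 can be compared with the exact one-sided 95% upper bound for zero successes in N trials (solve (1 − p)^N = 0.05 for p):

```python
n = 109  # respondents under 25; none chose "God created the world"

rule_of_3 = 3 / n              # approximate 95% upper bound
exact = 1 - 0.05 ** (1 / n)    # exact one-sided 95% upper bound for 0 successes

print(f"rule of 3: {rule_of_3:.4f}")  # 0.0275
print(f"exact:     {exact:.4f}")      # 0.0271
```

Both round to the ‘fewer than 3%’ in the headline claim.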

Who got the numbers, how, and why?

The Dominion Post has what I’m told is a front page story about school costs, with some numbers:

For children starting state school this year, the total cost, including fees, extracurricular activities, other necessities, transport and computers, by the time they finish year 13 in 2028 is estimated at $35,064 by education-focused savings trust Australian Scholarship Group.

That increases to $95,918 for a child at a state-integrated school, and $279,807 for private school.

Given that the figures involve extrapolation of both real cost increases and inflation thirteen years into the future, I’m not convinced that a whole-education total is all that useful. I would have thought estimates for a single year would be more easily interpreted.  However, that’s not the main issue.

ASG do this routinely. They don’t have the 2016 numbers on their website yet, but they do have last year’s version. Important things to note about the numbers, from that link:

ASG conducted an online education costs survey among its members during October 2013. The surveys covered primary and secondary school. In all, ASG received more than 1000 survey responses.

So, it’s a non-random, unweighted survey, probably with a low response rate, among people signed up for an education-savings programme. You’d expect it to overestimate, but it’s not clear how much. Also

Figures have been rounded and represent the upper ranges that parents can reasonably expect to pay

‘Rounded’ is good, even though they don’t actually show much sign of having been rounded. ‘Represent the upper ranges’ is a bit more worrying when there’s no indication of how this was done — and when the Dom Post didn’t include this caveat in their story.