Posts filed under Surveys (172)

May 24, 2016

Microplummeting

Headline: “Newshub poll: Key’s popularity plummets to lowest level”

Just 36.7 percent of those polled listed the current Prime Minister as their preferred option — down 1.6 percent — from a Newshub poll in November.

National though is steady on 47 percent on the poll — a drop of just 0.3 percent — and similar to the Election night result.

So, apparently, 0.3% is “steady” and 1.6% is a “plummet”.

The reason we quote ‘maximum margin of error’, even though it’s a crude summary, not a good way to describe evidence, underestimates variability, and is a terribly misleading phrase, is that it at least gives some indication of what is worth headlining.  The maximum margin of error for this poll is 3%, but the margin of error for a change is 1.4 times higher, about 4.3%.

That’s the maximum margin of error, for a 50% true value, but it doesn’t make that much difference: I did a quick simulation to check. If nothing had really changed, the Prime Minister’s measured popularity would still plummet or soar by more than 1.6% between two polls about half the time, purely from sampling variation.
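A check along those lines can be sketched in a few lines of Python. This is not the simulation used for the post, just a minimal illustration under stated assumptions: two independent simple random samples of 1000 people each (the sample size is an assumption, since only the 3% margin of error is quoted), and a true value of 36.7% in both polls. The function names are made up for the example.

```python
import math
import random


def prob_change_exceeds(threshold, p, n):
    """Normal approximation: chance that two independent polls of size n
    differ by more than `threshold` when the true proportion p is unchanged."""
    # Standard error of the difference between two independent proportions.
    se_diff = math.sqrt(2 * p * (1 - p) / n)
    # Two-sided tail area of a standard normal beyond threshold / se_diff.
    return math.erfc(threshold / se_diff / math.sqrt(2))


def simulate_change_exceeds(threshold, p, n, reps=2000, seed=2016):
    """Same quantity by brute force: draw pairs of polls and count big changes."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        first = sum(rng.random() < p for _ in range(n)) / n
        second = sum(rng.random() < p for _ in range(n)) / n
        if abs(first - second) > threshold:
            exceed += 1
    return exceed / reps


if __name__ == "__main__":
    # Assumed inputs: n = 1000 per poll, true support 36.7%, change of 1.6 points.
    print(prob_change_exceeds(0.016, 0.367, 1000))
    print(simulate_change_exceeds(0.016, 0.367, 1000))
```

Under these assumptions both numbers come out a little under one half. Real polls are weighted and not simple random samples, so the true variability is, if anything, larger.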

May 20, 2016

Depends who you ask

There’s a Herald story about sleep

A University of Michigan study using data from Entrain, a smartphone app aimed at reducing jetlag, found Kiwis on average go to sleep at 10.48pm and wake at 6.54am – an average of 8 hours and 6 minutes sleep.

It quotes me as saying the results might not be all that representative, but it just occurred to me that there are some comparison data sets for the US at least.

  • The Entrain study finds people in the US go to sleep on average just before 11pm and wake up on average between 6:45 and 7am.
  • SleepCycle, another app, reports a bedtime of 11:40 for women and midnight for men, with both men and women waking at about 7:20.
  • The American Time Use Survey is nationally representative, but not that easy to get stuff out of. However, Nathan Yau at Flowing Data has an animation saying that 50% of the population are asleep at 10:30pm and awake at 6:30am.
  • And Jawbone, who don’t have to take anyone’s word for whether they’re asleep, have a fascinating map of mean bedtime by county of the US. It looks like the national average is after 11pm, but there’s huge variation, both urban-rural and position within your time zone.

These differences partly come from who is deliberately included and excluded (kids, shift workers, the very old), partly from measurement details, and partly from oversampling of the sort of people who use shiny gadgets.

March 11, 2016

Getting to see opinion poll uncertainty

Rock’n Poll has a lovely guide to sampling uncertainty in election polls, guiding you step by step to see how approximate the results would be in the best of all possible worlds. Highly recommended.

Of course, we’re not in the best of all possible worlds, and in addition to pure sampling uncertainty we have ‘house effects’ due to different methodology between polling firms and ‘design effects’ due to the way the surveys compensate for non-response.  And on top of that there are problems with the hypothetical question ‘if an election were held tomorrow’, and probably issues with people not wanting to be honest.

Even so, the basic sampling uncertainty gives a good guide to the error in opinion polls, and anything that makes it easier to understand is worth having.

(via Harkanwal Singh)

February 28, 2016

How I met your mother

Via Jolisa Gracewood on Twitter, a graph from Stanford sociologist Michael Rosenfeld on how people met their partners.

Obviously the proportion who met online has increased: in the old days there weren’t many people online. It’s still dramatic how fast the change happened, considering that ‘the year September never ended’, when AOL subscribers gained access to Usenet, was only 1993. It’s also notable how everything else except ‘in a bar or restaurant’ has gone down.

Since this is StatsChat you should be asking how they got the data: it was a reasonably good survey. There’s a research paper, too (PDF).

You should also be worrying about the bump in ‘online’ in the mid-1980s. It’s ok. The paper says “This bump corresponds to two respondents. These two respondents first met their partners in the 1980s without the assistance of the Internet, and then used the Internet to reconnect later”

February 7, 2016

Zombie bogus surveys

From Food Network magazine, via Twitter, via Julie Blommaert

There’s no more detail than “Kellogg’s” as the source, and the Kellogg’s website is very sensibly not admitting to anything.

Some more Googling finds two stories from September last year. Getting the factoid into a real paper magazine, because of the publication time lag, gives it another chance to roam the earth looking for brains.

Even though it has to be the same survey, the story from Vice says “a full one-fifth of Americans are using orange juice in their cereal instead of milk,” though Bustle says “More than 10 percent of Americans admitted to using orange juice or coffee”. It’s not just that the numbers are inconsistent: the phrasing in one case suggests “do you usually?” as the question, and in the other “have you ever?” It matters, or at least it would if anything about this mattered.

We’re also not told whether these are really supposed to be proportions of “Americans” or of “Americans who eat cereal”, or “Americans who eat cereal for breakfast”, or whatever.

Usefully, the Vice story does give a bit more detail about the survey

Two thousand US consumers and college students from all over the country participated in the study, with about 30 percent male subjects and 70 percent female. The participants were of all ages, with half being college students and the rest varied (14 percent between the ages of 25 and 34 years old, 16 percent between 35 and 44 years old, about a quarter between 45 and 54 years old, and the rest scattered in older or younger age groups). 

They don’t say how the participants were recruited or surveyed, but there’s enough information there to make it clear the data would be meaningless even if we knew what the questions were and what percentages the survey actually found.

January 15, 2016

When you don’t find any

The Icelandic Ethical Humanist Association commissioned a survey on religion. For people who don’t want to read the survey report (in PDF, in Icelandic), there’s a story at Iceland Magazine. The main point is in the headline: 0.0% of Icelanders 25 years or younger believe God created the world, new poll reveals.

That’s a pretty strong claim, so what did the survey actually do? Well, here you do need to read the survey report (or at least feed snippets of it to Google Translate). Of the people they sampled, 109 were in the lowest age category, which is ‘younger than 25’. None of the 109 reported believing “God created the world” rather than “The world was created in the Big Bang”.

Now, that’s not a completely clean pair of alternatives, since a fair number of people — the Pope, for example — say they believe both, but it’s still informative to some extent. So what can we say about sampling uncertainty?

A handy trick for situations like this one is the ‘rule of 3’. If you ask N people and none of them is a creationist, an approximate 95% confidence upper bound for the population proportion is 3/N. Here 3/109 is about 2.8%, so: “fewer than 3% of Icelanders under 25 believe God created the world”.
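For the curious, the rule comes from asking how large the true proportion p could be while still leaving a 5% chance of seeing zero out of N: solving (1 − p)^N = 0.05 gives p ≈ −ln(0.05)/N, and −ln(0.05) is very nearly 3. A minimal sketch comparing the shortcut with the exact bound, assuming a simple random sample (the function names are made up for the example):

```python
def rule_of_three(n):
    """Approximate 95% upper bound on a proportion when 0 of n said yes."""
    return 3 / n


def exact_upper_bound(n, confidence=0.95):
    """Exact bound: the p at which the chance of 0 out of n is 1 - confidence,
    i.e. the solution of (1 - p) ** n == 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)


if __name__ == "__main__":
    n = 109  # respondents under 25 in the Icelandic survey
    print(f"rule of 3: {rule_of_three(n):.4f}")
    print(f"exact:     {exact_upper_bound(n):.4f}")
```

For N = 109 the two agree to within a small fraction of a percentage point, and both are under 3%.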

Who got the numbers, how, and why?

The Dominion Post has what I’m told is a front page story about school costs, with some numbers:

For children starting state school this year, the total cost, including fees, extracurricular activities, other necessities, transport and computers, by the time they finish year 13 in 2028 is estimated at $35,064 by education-focused savings trust Australian Scholarship Group.

That increases to $95,918 for a child at a state-integrated school, and $279,807 for private school.

Given that the figures involve extrapolation of both real cost increases and inflation thirteen years into the future, I’m not convinced that a whole-education total is all that useful. I would have thought estimates for a single year would be more easily interpreted.  However, that’s not the main issue.

ASG do this routinely. They don’t have the 2016 numbers on their website yet, but they do have last year’s version. Important things to note about the numbers, from that link:

ASG conducted an online education costs survey among its members during October 2013. The surveys covered primary and secondary school. In all, ASG received more than 1000 survey responses.

So, it’s a non-random, unweighted survey, probably with a low response rate, among people signed up for an education-savings programme. You’d expect it to overestimate, but it’s not clear how much. Also

Figures have been rounded and represent the upper ranges that parents can reasonably expect to pay

‘Rounded’ is good, even though they don’t actually show much sign of having been rounded. ‘Represent the upper ranges’ is a bit more worrying when there’s no indication of how this was done — and when the Dom Post didn’t include this caveat in their story.

November 22, 2015

Helpful context

From the Herald

A study by sleep experts at Sealy UK found that those who kip on the right-hand side of the mattress are far more pessimistic than those who doze on the left.

 

Neil Robinson, Sealy’s top snooze analyst, said: “The research certainly highlights an interesting trend – could it be possible that the left side of bed is the ‘right’ side?”

 

November 16, 2015

Measuring gender

So, since we’re having a Transgender Week of Awareness at the moment, it seems like a good time to look at how statisticians ask people about gender, and why it’s harder than it looks.

By ‘harder than it looks’ I don’t just mean that it isn’t a binary question; we’re past that stage, I hope.  Also, this isn’t about biological sex — in genetics I do sometimes care how many X chromosomes someone has, but most questionnaires don’t need to know. It’s harder than it looks because there isn’t just one question.

The basic Male/Female binary question can be extended in (at least) two directions.  The first is to add categories to represent other ways people identify their gender beyond just male/female, which can be fluid over time, or can have more than two categories. Here a write-in option is useful since you almost certainly don’t know all the distinctions people care about across different cultures. In a specialised questionnaire you might even want to separate out questions about fluid/constant identity from non-binary/diversity, but for routine use that might be more than you need.

A second direction is to ask about transgender status, which is relevant for discrimination and (or thus) for some physical and mental health risks. (Here you might also want to find out about people who, say, identify as female but present as male.) We have very little idea how many people are transgender (it makes data on sexual orientation look really precise), and that’s a problem for service provision and in many other areas.

Life would get simpler for survey collectors if you combined these into a single question, or if you had a Male/Female/It’s Complicated question with follow-up questions for the third group. On the other hand, it’s pretty clear why trans people don’t like that approach. These really are different questions. For people whose answer to the first question is something like “it depends” or a culturally specific third option, the combination may not be too bad. The problem comes when the answer to the second type of question might be “Trans (and yes I sometimes get comments behind my back at work but most people are fine)”, but the answer to the first is “Female (and just as female as people with ovaries and a birth certificate, ok)”.

Earlier this year Stats New Zealand ran a discussion and had a go at a better gender question, and it is definitely better than the old one, especially when it allows for multiple answers and for a write-in answer. They also have a ‘synonym list’ to help people work with free-text answers, although that’s going to be limited if all it does is map back to binary or three-way groups. What they didn’t do was to ask for different types of information separately. [edit: i.e., they won’t let you unambiguously say ‘female’ in an identity question then ‘trans’ in a different question]

It’s true that for a lot of purposes you don’t need all this information. But then, for a lot of purposes you don’t actually need to know anything about gender.

(via Writehanded and Jennifer Katherine Shields)

November 9, 2015

Inelegant variation

These graphs are from the (US) National Cable & Telecommunications Association (the cable guys)

Apart from the first graph, they are based on five-point agree-disagree scales, and show the many ways you can make pie and bar charts more interesting, especially if you don’t care much about the data. I think my favourites are the bendy green barchart-orbiting-a-black-hole and the green rectangles, where the bars disagree with the printed numbers.

Since it’s a bogus poll, using the results basically to generate artwork is probably the right approach.