Posts filed under Surveys (135)

August 13, 2014

When are self-selected samples worth discussing?

From recent weeks, three examples of claims from self-selected samples:

In all three cases, you’d expect the pattern to generalise to some extent, but not quantitatively. The dating site in question specifically boasts about the non-representativeness of its members; the NZAS survey was sent to people who’d be likely to care, and there wasn’t much time to respond; scientists who had experienced or witnessed harassment would be more likely to respond and to pass the survey along to others.

I think two of these are worth presenting and discussing, and the other one isn’t, and that’s not just because two of them agree with my political prejudices.

The key question to ask when looking at this sort of probably non-representative sample is whether the response you see would still be interesting if no-one outside the sample shared it. That is, the surveys tell us at a minimum:

  • there exist 350 women in New Zealand who wouldn’t marry a man earning less than them, and are prepared to say so
  • there exist 200-odd scientists in NZ who think the National Science Challenges were badly chosen or conducted, and are prepared to say so
  • there exist 417 scientists who have experienced verbal sexual harassment, and 139 who have experienced unwanted physical contact from other research staff during fieldwork, and are prepared to say so.

I would argue that the first of these is completely uninteresting, but the second is contrary to the impressions being given by the government, and the third should worry scientists who participate in or organise fieldwork.


August 6, 2014

Income statistics

The Herald has a story headlined “Where to work if it’s money you’re after,” giving estimated median incomes across a range of job areas.  Sadly, if you read to the end, two of the sources are summaries of advertised salaries for advertised jobs on Seek and TradeMe.  That is, they are neither actual incomes, nor for the country as a whole.

Rather than just whinge about unrepresentative data, I looked at StatsNZ. They divide things up differently, so there was only one job group in the story that exactly matched one on NZ.Stat. People working in construction have a median weekly income of $840 and mean weekly income of $956 according to the NZ Income Survey. If most people in construction worked all year, without periods of unemployment, this would come to a median annual income of  $43,680 or a mean of $49,712.
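The annual figures are just the weekly figures multiplied by 52. As a minimal sketch of that arithmetic (the function name and the `weeks_worked` parameter are made up for illustration, and the full-year assumption is the one stated above):

```python
WEEKS_PER_YEAR = 52


def annualise(weekly_income, weeks_worked=WEEKS_PER_YEAR):
    """Scale a weekly income to a year, assuming it is earned every week worked."""
    return weekly_income * weeks_worked


# Weekly construction figures quoted above from the NZ Income Survey.
median_weekly = 840
mean_weekly = 956

print(f"median annual: ${annualise(median_weekly):,}")  # $43,680
print(f"mean annual:   ${annualise(mean_weekly):,}")    # $49,712
```

Passing a smaller `weeks_worked` shows how periods of unemployment would pull the annual figures down further.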

The Herald thinks the median annual income in construction is $60,000-$78,000.



July 24, 2014

Infographic of the month

Alberto Cairo and pointed me to the infographic on the left, a summary of a residents’ survey from the town of Flower Mound, Texas (near Dallas/Fort Worth airport). The highlight of the infographic is the 3-D piecharts nesting in the tree, ready to hatch out into full-fledged misinformation.

At least, they look like 3-D pie charts at first glance.  When you look more closely, the data are three-year trends in approval ratings for a variety of topics, so pie charts would be even more inappropriate than usual as a display method.  When you look even more closely, you see that that’s ok, because the 3-D ellipses are all just divided into three equal wedges — the data aren’t involved at all.

[Images: "flower_mound" infographic (left); "2014 Citizen Survey Infographic" from the town government (right)]

The infographic on the right comes from the town government.  It’s much better, especially by the standards of infographics.

If you follow the link, you can read the full survey results, and see that the web page giving survey highlights actually describes how the survey was done — and it was done well.  They sent questionnaires to a random sample of households, got a 35% response rate (not bad, for this sort of thing) and reweighted it based on age, gender, and housing tenure (ie rent, own, etc) to make it more representative.  That’s a better description (and a better survey) than a lot of the ones reported in the NZ media.
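To give a feel for what reweighting does, here is a minimal sketch of post-stratification on a single variable (housing tenure). It is not the town's actual procedure, which used age, gender, and tenure together; the respondent counts, approval numbers, and population shares below are all invented for illustration.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight for each group = population share / share of the sample."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_proportion(responses, groups, weights):
    """Weighted share of respondents whose response is True."""
    total = sum(weights[g] for g in groups)
    positive = sum(weights[g] for r, g in zip(responses, groups) if r)
    return positive / total


# Hypothetical sample: owners are over-represented among respondents.
groups = ["own"] * 80 + ["rent"] * 20
approves = [True] * 60 + [False] * 20 + [True] * 8 + [False] * 12

# Hypothetical population shares (in practice these would come from a census).
shares = {"own": 0.6, "rent": 0.4}

weights = poststratification_weights(groups, shares)
raw = sum(approves) / len(approves)
adjusted = weighted_proportion(approves, groups, weights)
print(f"raw approval: {raw:.0%}, reweighted approval: {adjusted:.0%}")
```

With these made-up numbers the raw approval is about 68% and the reweighted figure about 61%, because the under-represented renters approve less often. Reweighting only corrects for the variables you weight on; it cannot fix differences between responders and non-responders within a group.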


[update: probably original, higher resolution version, via Dave Bremer.]

June 25, 2014

Not even wrong

The Readers’ Digest “Most Trusted” lists are out again. Sigh.

Before we get to the actual complaint in the Stat-of-the-Week recommendation, we should acknowledge that there’s no way the “most trusted” list could make sense.

Firstly, ‘trusted’ requires more detail. What is it that we’re trusting these people with? Of course, making the question more specific wouldn’t help, since people will still answer on some vague ‘niceness’ scale anyway: we saw this problem with a Herald poll at the beginning of the year, which asked opinions about five notable people and found that the only one notable for his commitment to animal safety had the lowest rating for “who would you trust to feed your cat?”. Secondly, there’s no useful way to get an accurate rating of dozens of people (or other items) in an opinion poll. People’s brains overload. Thirdly, even if you could get a rating from each respondent, the overall ranking will be sensitive to how you combine the individual ratings.
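The third point is easy to demonstrate. In this small sketch the people, the 1 to 10 scale, and the ratings are all invented, and mean and median are just two of many ways of combining ratings; it says nothing about how Readers’ Digest actually does it.

```python
from statistics import mean, median

# Hypothetical ratings (1-10) from five respondents for three made-up people.
ratings = {
    "Person A": [10, 10, 10, 1, 1],  # polarising: loved by most, disliked by some
    "Person B": [7, 7, 7, 7, 7],     # uniformly "fairly trusted"
    "Person C": [8, 8, 5, 5, 5],
}


def rank_by(summary, ratings):
    """Order names from highest to lowest value of the given summary statistic."""
    return sorted(ratings, key=lambda name: summary(ratings[name]), reverse=True)


print("by mean:  ", rank_by(mean, ratings))
print("by median:", rank_by(median, ratings))
```

Ranking by the mean puts Person B on top; ranking by the median puts Person A on top, using exactly the same responses.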

So how does Readers’ Digest do it? They say (shouting in the original)


That is, the list is determined in advance, and the polling just addresses the ordering on the list. There is some vague sense in which Willie Apiata is the most trusted person, or at least the most highly-regarded person, or at least the most highly-regarded famous person, in New Zealand, but there really isn’t any useful sense in which Hone Harawira is the least trusted person in New Zealand. There are many people in NZ who you’d expect to be less trusted than Mr Harawira; they didn’t get put on the list, and the survey respondents weren’t asked about them.

It’s not surprising that stories keep coming out about this list, and I suppose it’s not surprising that people try to interpret being on the bottom of the list. Perhaps more surprising, no-one has yet complained that there are actually 101 well-known people, not 100, on the list.

June 9, 2014

Chasing factoids

The Herald says

Almost four in 10 young UK adults describe themselves as digital addicts, according to research published by Foresters, the financial services company.

The story does quote an independent expert who is relatively unimpressed with the definition of ‘digital addict’, but it doesn’t answer the question ‘what sort of research?’

Via Google, I found a press release of a digital addiction survey promoted by Foresters. It’s not clear if the current story is based on a new press release from this survey or a new version of the survey, but the methodology is presumably similar.

So, what is the methodology?

Over 1,100 people across the UK responded to an online survey in November 2013, conducted by Wriglesworth Research

There’s also a related press release from Wriglesworth, but without any more methodological detail. If I Google for “wriglesworth survey”, this is what comes up


That is, the company is at least in the habit of conducting self-selected online polls, advertised on web forums and Twitter.

I tried, but I couldn’t find any evidence that the numbers in this online survey were worth the paper they aren’t written on.

May 16, 2014

Smarter than the average bear

Online polling company YouGov asked people in the US and Britain about how their intelligence compared to other people.

For the US, the results were



They pulled that graph only seconds after I found it, and replaced it with the more plausible


The British appear to be slightly more reluctant than the Americans to say they’re smarter than average, though it would be unwise to assume they are less likely to believe it.



April 14, 2014

What do we learn from the Global Drug Use Survey?



That’s the online summary at Stuff.  When you point at one of the bubbles it jumps out at you and tells you what drug it is. The bubbles make it relatively hard to compare non-adjacent numbers, especially as you can only see the name of one at a time. It’s not even that easy to compare adjacent bubbles, eg, the two at the lower right, which differ by more than two percentage points.

More importantly, this is the least useful data from the survey.  Because it’s a voluntary, self-selected online sample, we’d expect the crude proportions to be biased, probably with more drug use in the sample than the population. To the extent that we can tell, this seems to have happened: the proportion of past-year smokers is 33.5% compared to the Census estimate of 15% active smokers.  It’s logically possible for both of these to be correct, but I don’t really believe it.  The reports of cannabis use are much higher than the (admittedly out of date) NZ Alcohol and Drug Use Survey.  For this sort of data, the forthcoming drug-use section of the NZ Health Survey is likely to be more representative.

Where the Global Drug Use Survey will be valuable is in detail about things like side-effects, attempts to quit, and strategies people use for harm reduction. That sort of information isn’t captured by the NZ Health Survey, and presumably it is still being processed and analysed.  Some of the relative information might be useful, too: for example, synthetic cannabis is much less popular than the real thing, with past-year use nearly five times lower.

April 2, 2014

Census meshblock files: all the datas

Statistics New Zealand has just released the meshblock-level data from last year’s Census, together with matching information for the previous two censuses (reworked to use the new meshblock boundaries).

Mashblock shows one thing that can be built with this sort of data; there are many others.

Get your meshblock files here

Drug use trends

There’s an interesting piece in Stuff about Massey’s Illegal Drug Monitoring System. I’d like to make two points about it.

First, the headline is that synthetic cannabis use is declining. That’s good, but it’s in a survey of frequent users of illegal drugs.  If you have the contacts and willingness to buy illegal drugs, it isn’t surprising that you’d prefer real cannabis to the synthetics — there seems to be pretty universal agreement that the synthetics are less pleasant and more dangerous.  This survey won’t pick up trends in more widespread casual use, or in use by teenagers, which are probably more important.

Second, the study describes the problems caused by much more toxic new substitutes for Ecstasy and LSD. This is one of the arguments for legalisation. On the other hand, they are also finding increased abuse of prescription oxycodone. This phenomenon, much more severe in the US, weakens the legalisation argument somewhat.  Many people (including me) used to believe, based on reasonable evidence, that a substantial fraction of the adverse health impact of opioid addiction was due to the low and unpredictably-varying purity of street drugs, and that pure, standardised drugs would reduce overdoses. As Keith Humphreys describes, this turns out not to be the case.



March 25, 2014

Political polling code

The Research Association New Zealand has put out a new code of practice for political polling (PDF) and a guide to the key elements of the code (PDF).

The code includes principles for performing a survey, reporting the results, and publishing the results, eg:

Conduct: If the political questions are part of a longer omnibus poll, they should be asked early on.

Reporting: The report must disclose if the questions were part of an omnibus survey.

Publishing: The story should disclose if the questions were part of an omnibus survey.

There is also some mostly good advice for journalists:

  1. If possible, get a copy of the full poll report and do not rely on a media release.
  2. The story should include the name of the company which conducted the poll, and the client the poll was done for, and the dates it was done.
  3. The story should include, or make available, the sample size, sampling method, population sampled, if the sample is weighted, the maximum margin of error and the level of undecided voters.
  4. If you think any questions may have impacted the answers to the principal voting behaviour question, mention this in the story.
  5. Avoid reporting breakdown results from very small samples as they are unreliable.
  6. Try to focus on statistically significant changes, which may not just be from the last poll, but over a number of polls.
  7. Avoid the phrase “This party is below the margin of error” as results for low polling parties have a smaller margin of error than for higher polling parties.
  8. It can be useful to report on what the electoral results of a poll would be, in terms of likely parliamentary blocs, as the highest polling party will not necessarily be the Government.
  9. In your online story, include a link to the full poll results provided by the polling company, or state when and where the report and methodology will be made available.
  10. Only use the term “poll” for scientific polls done in accordance with market research industry approved guidelines, and use “survey” for self-selecting surveys such as text or website surveys.

Some statisticians will disagree with the phrasing of point 6 in terms of statistical significance, but would probably agree with the basic principle of not ‘chasing the noise’.
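Point 7 is worth a quick calculation. The sketch below uses the textbook 95% margin of error for a proportion, 1.96 × sqrt(p(1 − p)/n), which assumes a simple random sample; real polls are weighted, so their margins will differ somewhat, and the normal approximation is rough for very small parties. The sample size of 1000 and the party shares are illustrative, not taken from any particular poll.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of n."""
    return z * sqrt(p * (1 - p) / n)


n = 1000  # illustrative sample size
for share in (0.50, 0.30, 0.05, 0.03):
    moe = margin_of_error(share, n)
    print(f"party on {share:.0%}: +/- {moe * 100:.1f} percentage points")
```

The ‘maximum margin of error’ that polls quote is the value at 50%, about 3.1 percentage points for a sample of 1000. For a party on 3% the same formula gives about 1.1 points, so a minor party can be well under the headline margin of error and still be clearly above zero.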

I’m not entirely happy with point 10, since outside politics and market research, “survey” is the usual word for scientific polls, eg, the New Zealand Income Survey, the Household Economic Survey, the General Social Survey, the National Health and Nutrition Examination Survey, the British Household Panel Survey, etc, etc.

As StatsChat readers know, I like the term “bogus poll” for the useless website clicky surveys. Serious Media Organisations who think this phrase is too frivolous could solve the problem by not wasting space on stories about bogus polls.