Posts filed under Surveys (133)

July 24, 2014

Infographic of the month

Alberto Cairo and wtfviz.net pointed me to the infographic on the left, a summary of a residents’ survey from the town of Flower Mound, Texas (near Dallas/Fort Worth airport). The highlight of the infographic is the 3-D pie charts nesting in the tree, ready to hatch out into full-fledged misinformation.

At least, they look like 3-D pie charts at first glance.  When you look more closely, the data are three-year trends in approval ratings for a variety of topics, so pie charts would be even more inappropriate than usual as a display method.  When you look even more closely, you see that that’s ok, because the 3-D ellipses are all just divided into three equal wedges — the data aren’t involved at all.

[Image: Flower Mound 2014 Citizen Survey infographic]

The infographic on the right comes from the town government.  It’s much better, especially by the standards of infographics.

If you follow the link, you can read the full survey results, and see that the web page giving survey highlights actually describes how the survey was done — and it was done well.  They sent questionnaires to a random sample of households, got a 35% response rate (not bad, for this sort of thing) and reweighted it based on age, gender, and housing tenure (ie rent, own, etc) to make it more representative.  That’s a better description (and a better survey) than a lot of the ones reported in the NZ media.
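For readers curious what that sort of reweighting looks like, here’s a minimal sketch of cell-based post-stratification in pandas, weighting on age group alone for brevity (the real survey weighted on age, gender, and housing tenure, and all of the data and column names below are invented for illustration, not taken from the Flower Mound report):

```python
import pandas as pd

# Invented respondent data: one row per returned questionnaire
respondents = pd.DataFrame({
    "age_group": ["18-34", "35-54", "55+", "35-54", "55+"],
    "approve_parks": [1, 1, 0, 1, 1],
})

# Invented population counts for the same cells (e.g. from Census tables)
population = pd.Series({"18-34": 12000, "35-54": 18000, "55+": 10000})

# Weight = population share of the cell / sample share of the cell
sample_share = respondents["age_group"].value_counts(normalize=True)
pop_share = population / population.sum()
weights = respondents["age_group"].map(pop_share / sample_share)

# Weighted approval estimate corrects for under- or over-represented groups
weighted_approval = (respondents["approve_parks"] * weights).sum() / weights.sum()
print(round(weighted_approval, 3))
```

In this toy sample the unweighted approval is 0.8; after weighting the under-represented 18-34 cell back up it comes out at about 0.875, which is the kind of correction the survey describes.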


[Update: probably the original, higher-resolution version, via Dave Bremer.]

June 25, 2014

Not even wrong

The Reader’s Digest “Most Trusted” lists are out again. Sigh.

Before we get to the actual complaint in the Stat-of-the-Week recommendation, we should acknowledge that there’s no way the “most trusted” list could make sense.

Firstly, ‘trusted’ requires more detail. What is it that we’re trusting these people with? Of course, it wouldn’t help to make the question more specific, since people will still answer on some vague ‘niceness’ scale anyway: we saw this problem with a Herald poll at the beginning of the year, which asked opinions about five notable people and found that the only one notable for his commitment to animal safety had the lowest rating for “who would you trust to feed your cat?”. Secondly, there’s no useful way to get an accurate rating of dozens of people (or other items) in an opinion poll. People’s brains overload. Thirdly, even if you could get a rating from each respondent, the overall ranking will be sensitive to how you combine the individual ratings.
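On that last point, here’s a toy illustration (all numbers invented) of how the same ratings can produce different “most trusted” orderings depending on the summary you use:

```python
import statistics

# Invented ratings: each list is one person's scores (out of 10)
# from the same five respondents.
ratings = {
    "A": [9, 9, 2, 2, 9],   # polarising: some love, some distrust
    "B": [7, 7, 7, 7, 7],   # solidly middling
}

for name, scores in ratings.items():
    mean = statistics.mean(scores)
    share_high = sum(s >= 8 for s in scores) / len(scores)
    print(f"{name}: mean = {mean:.1f}, share rating 8+ = {share_high:.1f}")

# B has the higher mean (7.0 vs 6.2), but A comes out on top if you rank
# by the share of respondents giving 8 or more (0.6 vs 0.0), so the
# "most trusted" ordering depends on how the ratings are combined.
```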

So how does Reader’s Digest do it? They say (shouting in the original):

READER’S DIGEST COMMISSIONED CATALYST CONSULTANCY & RESEARCH TO POLL A REPRESENTATIVE SAMPLE OF NEW ZEALANDERS ABOUT TRUSTED PEOPLE AND PROFESSIONS. A TOTAL OF 603 ADULTS RANKED 100 WELL-KNOWN PEOPLE AND 50 JOB TYPES ON A SCALE OF ONE TO TEN IN MARCH 2014.

That is, the list is determined in advance, and the polling just addresses the ordering on the list. There is some vague sense in which Willie Apiata is the most trusted person, or at least the most highly-regarded person, or at least the most highly-regarded famous person, in New Zealand, but there really isn’t any useful sense in which Hone Harawira is the least trusted person in New Zealand. There are many people in NZ whom you’d expect to be less trusted than Mr Harawira; they didn’t get put on the list, and the survey respondents weren’t asked about them.

It’s not surprising that stories keep coming out about this list, and I suppose it’s not surprising that people try to interpret being at the bottom of the list. Perhaps more surprisingly, no-one has yet complained that there are actually 101 well-known people, not 100, on the list.

June 9, 2014

Chasing factoids

The Herald says

Almost four in 10 young UK adults describe themselves as digital addicts, according to research published by Foresters, the financial services company.

The story does quote an independent expert who is relatively unimpressed with the definition of ‘digital addict’, but it doesn’t answer the question “what sort of research?”

Via Google, I found a press release of a digital addiction survey promoted by Foresters. It’s not clear if the current story is based on a new press release from this survey or a new version of the survey, but the methodology is presumably similar.

So, what is the methodology?

Over 1,100 people across the UK responded to an online survey in November 2013, conducted by Wriglesworth Research

There’s also a related press release from Wriglesworth, but without any more methodological detail. If I Google for “wriglesworth survey”, this is what comes up:

[Image: Google search results for “wriglesworth survey”]

That is, the company is at least in the habit of conducting self-selected online polls, advertised on web forums and Twitter.

I tried, but I couldn’t find any evidence that the numbers in this online survey were worth the paper they aren’t written on.

May 16, 2014

Smarter than the average bear

Online polling company YouGov asked people in the US and Britain how their intelligence compared to other people’s.

For the US, the results were

[Image: YouGov graph of US respondents’ self-assessed intelligence]


They pulled that graph only seconds after I found it, and replaced it with the more plausible:

[Image: corrected YouGov graph of US respondents’ self-assessed intelligence]

The British appear to be slightly more reluctant than the Americans to say they’re smarter than average, though it would be unwise to assume they are less likely to believe it.


[Image: YouGov graph of British respondents’ self-assessed intelligence]

April 14, 2014

What do we learn from the Global Drug Use Survey?

[Image: Stuff’s interactive bubble chart of drug use from the Global Drug Use Survey]


That’s the online summary at Stuff.  When you point at one of the bubbles it jumps out at you and tells you what drug it is. The bubbles make it relatively hard to compare non-adjacent numbers, especially as you can only see the name of one at a time. It’s not even that easy to compare adjacent bubbles, eg, the two at the lower right, which differ by more than two percentage points.

More importantly, this is the least useful data from the survey.  Because it’s a voluntary, self-selected online sample, we’d expect the crude proportions to be biased, probably with more drug use in the sample than in the population. To the extent that we can tell, this seems to have happened: the proportion of past-year smokers is 33.5%, compared to the Census estimate of 15% active smokers.  It’s logically possible for both of these to be correct, but I don’t really believe it.  The reports of cannabis use are much higher than in the (admittedly out-of-date) NZ Alcohol and Drug Use Survey.  For this sort of data, the forthcoming drug-use section of the NZ Health Survey is likely to be more representative.

Where the Global Drug Use Survey will be valuable is in detail about things like side-effects, attempts to quit, and strategies people use for harm reduction. That sort of information isn’t captured by the NZ Health Survey, and presumably it is still being processed and analysed.  Some of the relative information might be useful, too: for example, synthetic cannabis is much less popular than the real thing, with past-year use nearly five times lower.

April 2, 2014

Census meshblock files: all the datas

Statistics New Zealand has just released the meshblock-level data from last year’s Census, together with matching information for the previous two censuses (reworked to use the new meshblock boundaries).

Mashblock shows one thing that can be built with this sort of data; there are many others.

Get your meshblock files here

Drug use trends

There’s an interesting piece in Stuff about Massey’s Illicit Drug Monitoring System. I’d like to make two points about it.

First, the headline is that synthetic cannabis use is declining. That’s good, but it’s in a survey of frequent users of illegal drugs.  If you have the contacts and willingness to buy illegal drugs, it isn’t surprising that you’d prefer real cannabis to the synthetics — there seems to be pretty universal agreement that the synthetics are less pleasant and more dangerous.  This survey won’t pick up trends in more widespread casual use, or in use by teenagers, which are probably more important.

Second, the study describes the problems caused by much more toxic new substitutes for Ecstasy and LSD. This is one of the arguments for legalisation. On the other hand, they are also finding increased abuse of prescription oxycodone. This phenomenon, much more severe in the US, weakens the legalisation argument somewhat.  Many people (including me) used to believe, based on reasonable evidence, that a substantial fraction of the adverse health impact of opioid addiction was due to the low and unpredictably-varying purity of street drugs, and that pure, standardised drugs would reduce overdoses. As Keith Humphreys describes, this turns out not to be the case.


March 25, 2014

Political polling code

The Research Association New Zealand has put out a new code of practice for political polling (PDF) and a guide to the key elements of the code (PDF).

The code includes principles for performing a survey, reporting the results, and publishing the results, eg:

Conduct: If the political questions are part of a longer omnibus poll, they should be asked early on.

Reporting: The report must disclose if the questions were part of an omnibus survey.

Publishing: The story should disclose if the questions were part of an omnibus survey.

There is also some mostly good advice for journalists:

  1. If possible, get a copy of the full poll report and do not rely on a media release.
  2. The story should include the name of the company which conducted the poll, and the client the poll was done for, and the dates it was done.
  3. The story should include, or make available, the sample size, sampling method, population sampled, whether the sample is weighted, the maximum margin of error, and the level of undecided voters.
  4. If you think any questions may have impacted the answers to the principal voting behaviour question, mention this in the story.
  5. Avoid reporting breakdown results from very small samples as they are unreliable.
  6. Try to focus on statistically significant changes, which may not just be from the last poll, but over a number of polls.
  7. Avoid the phrase “This party is below the margin of error”, as results for low-polling parties have a smaller margin of error than for higher-polling parties (see the sketch after this list).
  8. It can be useful to report on what the electoral results of a poll would be, in terms of likely parliamentary blocs, as the highest polling party will not necessarily be the Government.
  9. In your online story, include a link to the full poll results provided by the polling company, or state when and where the report and methodology will be made available.
  10. Only use the term “poll” for scientific polls done in accordance with market research industry approved guidelines, and use “survey” for self-selecting surveys such as text or website surveys.
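To see why point 7 matters, here’s a quick back-of-the-envelope calculation. The sample size of 1,000 is my assumption (typical for NZ political polls), not a figure from the code:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # assumed sample size
for p in [0.50, 0.10, 0.02]:
    print(f"party polling at {p:.0%}: +/- {margin_of_error(p, n):.1%}")

# The often-quoted "maximum margin of error" is the 50% case (about 3.1%);
# a party on 2% has a margin of error under 1%, so saying it is
# "below the margin of error" is misleading.
```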

Some statisticians will disagree with the phrasing of point 6 in terms of statistical significance, but would probably agree with the basic principle of not ‘chasing the noise’.

I’m not entirely happy with point 10, since outside politics and market research, “survey” is the usual word for scientific polls, eg, the New Zealand Income Survey, the Household Economic Survey, the General Social Survey, the National Health and Nutrition Examination Survey, the British Household Panel Survey, etc, etc.

As StatsChat readers know, I like the term “bogus poll” for the useless website clicky surveys. Serious Media Organisations who think this phrase is too frivolous could solve the problem by not wasting space on stories about bogus polls.

On a scale of 1 to 10

Via @neil_, an interactive graph of ratings for episodes of The Simpsons:

[Image: graphtv scatterplot of ratings for Simpsons episodes]


This comes from graphtv, which lets you do this for all sorts of shows (eg, Breaking Bad, which strikingly gets better ratings as the season progresses, then resets).

The reason the Simpsons graph has extra relevance to StatsChat is the distinctive horizontal line.  For the first ten seasons an episode basically couldn’t get rated below 7.5; after that, it basically couldn’t get rated above 7.5.  In the beginning there were ‘typical’ episodes and ‘good’ episodes; now there are ‘typical’ episodes and ‘bad’ episodes.

This could be a real change in quality, but it doesn’t match up neatly with the changes in personnel and style.  It could be a change in the people giving the ratings, or in the interpretation of the scale over time. How could we tell? One clue is that (based on checking just a handful of points) in the early years the high-rating episodes were rated by more people, and this difference has vanished or even reversed.
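One way to check that clue systematically is sketched below. The file and column names are hypothetical; you would first have to assemble episode ratings and vote counts into a table like this yourself:

```python
import pandas as pd

# Hypothetical file with one row per episode:
# columns: season, rating (site average), votes (number of raters)
episodes = pd.read_csv("simpsons_episodes.csv")

# Within each season, do higher-rated episodes attract more raters?
corr_by_season = episodes.groupby("season").apply(
    lambda df: df["rating"].corr(df["votes"])
)
print(corr_by_season)

# A clear positive correlation in the early seasons that weakens or reverses
# later would support the idea that the raters, not just the episodes,
# have changed over time.
```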

March 20, 2014

Beyond the margin of error

From Twitter, this morning (the graphs aren’t in the online story)

Now, the Herald-Digipoll is supposed to be a real survey, with samples that are more or less representative after weighting. There isn’t a margin of error reported, but the standard maximum margin of error would be a little over 6%.
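That 6% comes from the usual worst-case formula. The story doesn’t report the sample size, so the n below is a back-calculation of mine, not a published figure:

```python
import math

def max_margin_of_error(n, z=1.96):
    # Worst case is p = 0.5, which maximises p * (1 - p)
    return z * math.sqrt(0.25 / n)

# A sample of roughly 250 gives a maximum margin of error a little over 6%
print(f"{max_margin_of_error(250):.1%}")   # about 6.2%
```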

There are two aspects of the data that make it not look representative. The first is that only 31.3% of respondents, or 37% of those claiming to have voted, said they voted for Len Brown last time. He got 47.8% of the vote. That discrepancy is a bit larger than you’d expect just from bad luck; it’s the sort of thing you’d expect to see about 1 or 2 times in 1000 by chance.
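Roughly how that ‘1 or 2 times in 1000’ figure arises, assuming about 210 respondents claimed to have voted (my assumption, consistent with a sample of around 250 and 85% claiming to have voted; the story doesn’t give the exact numbers):

```python
import math

p_true = 0.478   # Len Brown's actual share of the 2013 vote
p_obs = 0.37     # share of the poll's claimed voters who said they voted for him
n = 210          # assumed number of respondents claiming to have voted

se = math.sqrt(p_true * (1 - p_true) / n)
z = (p_obs - p_true) / se
# Two-sided tail probability under the normal approximation
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.1f}, p = {p_value:.4f}")   # roughly 1 to 2 in 1000
```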

More impressively, 85% of respondents claimed to have voted. Only 36% of those eligible in Auckland actually voted. The standard polling margin of error is ‘two sigma’, twice the standard deviation.  We’ve seen the physicists talk about ‘5 sigma’ or ‘7 sigma’ discrepancies as strong evidence for new phenomena, and the operations management people talk about ‘six sigma’ with the goal of essentially ruling out defects due to unmanaged variability.  When the population value is 36% and the observed value is 85%, that’s a 16 sigma discrepancy.
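And the 16 sigma arithmetic, again using an assumed sample size of about 250 (not reported in the story):

```python
import math

p_population = 0.36   # actual turnout among those eligible in Auckland
p_poll = 0.85         # share of respondents claiming to have voted
n = 250               # assumed sample size

sigma = math.sqrt(p_population * (1 - p_population) / n)
print(f"discrepancy = {(p_poll - p_population) / sigma:.0f} sigma")   # about 16
```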

The text of the story says ‘Auckland voters’, not ‘Aucklanders’, so I checked to make sure it wasn’t just that 12.4% of the people voted in the election but didn’t vote for mayor. That explanation doesn’t seem to work either: only 2.5% of mayoral ballots were blank or informal. It doesn’t work if you assume the sample was people who voted in the last national election.  Digipoll are a respectable polling company, which is why I find it hard to believe there isn’t a simple explanation, but if so it isn’t in the Herald story. I’m a bit handicapped by the fact that the University of Texas internet system bizarrely decides to block the Digipoll website.

So, how could the poll be so badly wrong? It’s unlikely to just be due to bad sampling — you could do better with a random poll of half a dozen people. There’s got to be a fairly significant contribution from people whose recall of the 2013 election is not entirely accurate, or to put it more bluntly, some of the respondents were telling porkies.  Unfortunately, that makes it hard to tell if results for any of the other questions bear even the slightest relationship to the truth.