December 8, 2014

Stat of the Week Competition: December 6 – 12 2014

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday December 12 2014.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of December 6 – 12 2014 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


December 7, 2014

Briefly

Bot or Not?

Turing had the Imitation Game, Philip K. Dick had the Voight-Kampff Test, and spammers gave us the CAPTCHA. The Truthy project at Indiana University has BotOrNot, which is supposed to distinguish real people on Twitter from automated accounts, ‘bots’, using analysis of their language, their social networks, and their retweeting behaviour. BotOrNot seems to sort of work, but not as well as you might expect.

@NZquake, a very obvious bot that tweets earthquake information from GeoNet, is rated at an 18% chance of being a bot. Siouxsie Wiles, for whom there is pretty strong evidence of existence as a real person, has a 29% chance of being a bot. I’ve got a 37% chance, the same as @fly_papers, which is a bot that tweets the titles of research papers about fruit flies, and slightly higher than @statschat, the bot that tweets StatsChat post links, or @redscarebot, which replies to tweets that include ‘communist’ or ‘socialist’. Other people at a similar probability include Winston Peters, Metiria Turei, and Nicola Gaston (President of the NZ Association of Scientists).

PicPedant, the twitter account of the tireless Paulo Ordoveza, who debunks fake photos and provides origins for uncredited ones, rates at 44% bot probability, but obviously isn’t.  Ben Atkinson, a Canadian economist and StatsChat reader, has a 51% probability, and our only Prime Minister (or his twitterwallah), @johnkeypm, has a 60% probability.


December 5, 2014

Bogus polls don’t work even for the good guys

There’s a story in the Times Higher Education Supplement about a Nuffield Council report, “The Culture of Scientific Research in the UK”. The lead in the THES story is:

More than a quarter of scientists have felt tempted or under pressure to compromise the integrity of their research, according to a report on the ethics culture at universities.

On the other hand, since the report also found that 56% of scientists were women, the UK must be doing something right.

Seriously, there is a lot to be concerned about — especially in the light of the recent case of Professor Stefan Grimm at Imperial College — but that makes it more important to be careful about facts, not less important.

The survey that formed the quantitative part of the report had just under 1000 responses over three months, which is a substantially lower fraction of the target population than the NZ Association of Scientists managed for similar surveys in much less time. Researchers in biosciences are over-represented (57% of respondents vs 34% of university scientists, according to the report), and I think postdocs probably are too (30% of respondents).

The report itself is careful to describe the percentages as “of survey respondents” — it’s THES that dropped this distinction. As usual, it’s the qualitative information in the report that is most useful, and it’s a pity it has been pushed aside by unreliable numbers.

December 4, 2014

The poker economy?

StatsChat spends a lot of time criticising the Herald and Stuff. That’s because they are readily available, not because they are particularly bad.

For a change, this graph is from the Waikato Business News (you can see the e-book version of the paper here).

[Graph: hamilton-is-coming]

So, there’s some measure of a city’s economy where Hamilton is 4/7 of Auckland, Christchurch is 6/7 of Auckland, and where ChCh and Auckland will be stable but Hamilton will increase and Wellington decrease by about the same amount over the next 10 years.

It’s a bit surprising that you can find a measure (other than construction expenditures, perhaps) where Christchurch’s economy is almost as big as Auckland’s. The graph doesn’t say what the measure is, or how big the poker chips are. Neither does the text of the story. The statistics and graphs are attributed to a report “Growing the Hamilton Economy” by Berl Economics, which I can’t find online — but the reader shouldn’t have to do this much work to decode a front-page headline graphic.

Fortune cookie science reporting

[Image: fortune cookies]

For science, the appropriate addition is “in mice.”

The Herald’s story (from the Daily Telegraph) “The latest 12 hour diet backed by science” has exactly this problem. It begins

Dieters hoping to shed the kilos should watch the clock as much as their calorie intake after scientists discovered that limiting the time span in which food is consumed can stop weight gain.

Confining meals to a 12-hour period, such as 8am to 8pm, and fasting for the remainder of the day, appears to make a huge difference to whether fat is stored, or burned up by the body.

It’s not until paragraph 6 that we find out this isn’t about dieters, it’s about mice.  The differences truly are huge — 5% of body weight within a few days, 25% by the end of the study — so you’d think it would be easy to demonstrate these benefits in humans if they were real.

Earlier this year, a different research group published a summary of studies on time-restricted feeding.  There are no controlled studies in humans. The uncontrolled studies aren’t especially high quality, and the ones with a 12-hour period mostly just take advantage of the no-daytime-eating rule observed by Muslims during the month of Ramadan. However, it’s still notable that the average weight reductions from a 4-week period of 12-hour food restrictions were 1-3%.


December 3, 2014

Briefly

  • The Economist has a piece on interactive graphics: “It is becoming clear that the native form for data is alive, not dead. Online, interactive charts will become the norm, nudging aside paper-based, static ones.”
  • Ampp3d is the data blog of (UK left-wing tabloid) The Mirror. I’m not sure how to phrase this without sounding more pretentious than I actually am, but it’s good to find data journalism in idioms other than California/Manhattan nerdy and New York Times/BBC/Guardian upper-middle-class liberal.
  • Datavisualization.ch claims to be “the premier news and knowledge resource for data visualization and infographics.” Despite that, it is really worth reading.

December 2, 2014

Bogus UK poll reporting

The Daily Express today: “Ukip is now MORE popular than LABOUR: Nigel Farage gets polls boost as Ukip surges ahead”. For NZ readers who aren’t familiar with Ukip, you can think of them as NZ First without all the tolerance and multiculturalism. It would be surprising, to put it mildly, for them to be doing that well.

I heard about this on Twitter, from Federica Cocco, who (among other things) writes about politics, data, and statistics for the Mirror. Here are her graphs:

[Graph: raw-poll]

and

[Graph: weighted-poll]

That’s a lot more plausible.

As she says, the problem is sampling bias. I had a long post drafted on reweighting and YouGov and non-response bias, but then I read her post more carefully (on a real computer, not on my phone) and realised the mistake wasn’t anything nearly as complicated or subtle.

YouGov report results broken down into a lot of subsets, because that’s how their methodology works. Rather than attempting to get a relatively random sample and fine-tuning it to be representative, they have given up on random samples and rely on statistical modelling to get representative results. Essentially, they give each respondent a different number of ‘votes’ depending on whether people like them are under-represented or over-represented in the sample.

For example, they report the results for people 18-24 (who were under-represented by about 1/3 in the sample and so will be given extra votes in the result), for Scots (who were over-represented by about 1/2 and so will be given fractional votes in the result), and for Sun readers (who were represented about right in the sample).
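To make the mechanics concrete, here is a minimal sketch of that kind of reweighting in Python. The group shares and support figures are made up (they loosely echo the examples above), and this is not YouGov’s actual data or code:

    # Toy illustration of the reweighting idea: each respondent gets
    # weight = (population share of their group) / (sample share of their group).
    # All numbers below are invented for illustration.

    sample_share = {"18-24": 0.08, "Scots": 0.12, "other": 0.80}  # share of respondents
    pop_share    = {"18-24": 0.12, "Scots": 0.08, "other": 0.80}  # share of population
    support      = {"18-24": 0.10, "Scots": 0.02, "other": 0.17}  # support within group

    # Under-represented groups get weights above 1 (extra 'votes'),
    # over-represented groups get weights below 1 (fractional votes).
    weight = {g: pop_share[g] / sample_share[g] for g in sample_share}

    unweighted = sum(sample_share[g] * support[g] for g in sample_share)
    weighted = (sum(weight[g] * sample_share[g] * support[g] for g in sample_share)
                / sum(weight[g] * sample_share[g] for g in sample_share))

    print(f"unweighted estimate: {unweighted:.1%}")  # ignores the imbalance
    print(f"weighted estimate:   {weighted:.1%}")    # what gets reported

The weighted estimate is just a population-share-weighted average of the group results, which is all the extra or fractional ‘votes’ amount to.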

Overall, accounting for the over- and under-representation, Ukip got 15% support; among the 18-24 year olds, Ukip got 10% support; among the Scots, they got 2%; and among Sun readers they got 28%. That’s pretty much the sort of variation you’d expect, and it shows YouGov have probably picked sensible categories for doing their statistical adjustments.

The question still remains as to how the Express managed to report the poll results only for Sun readers, a small, unrepresentative sample of people who support their competition. I don’t know, but my guess is that it’s because the ‘Sun readers’ column is on the right-hand edge of the table of results. If you aren’t paying attention, you might expect the overall totals to be there.


[for non-UK readers: a guide to British papers]

Known and unknown unknowns

This graph is from the Ministry of Transport Strategic Policy Programme, looking at forecasts of demand for transport infrastructure.

[Graph: Future Demand, Diagram 1]

The coloured lines show forecasts of driving (billion vehicle-km) made in the past; the black diamonds show actual driving. It’s clear that actual driving flattened out about ten years ago and the forecasts didn’t. What’s not clear is the implication. It could be that the old models need to be thrown out and that increases in driving are a twentieth-century phase we’ve grown out of. Or, it could be that growth in driving will restart soon. Or something else entirely.

The MoT report very sensibly accepts that we don’t really know what’s going on, and emphasises the importance of flexibility: if you aren’t confident about the future, you should be willing to accept extra costs to avoid premature lock-in, and you should also be prepared to pay for research to get better information.


December 1, 2014

Drug graphs

The Economist has a story on the changes in heroin abuse in the US (via @nzdrug).  It’s interesting to read, but I want to comment on the graphs.  The first one, and the one in the tweet, was this:

[Graph from The Economist story]

The source (if you use the clues in the story to search at JAMA Psychiatry) is here; the format of the graph is the same in the research paper.  I really don’t like this style with two lines for one proportion. At first glance it looks as though there’s information in the way one line mirrors the other, with the total staying approximately constant over time. Then you see that the total is exactly constant over time. It’s 100%.

The other interesting graph is different in the research paper and the story. The data are the same, but the visual impression is different.

[Graphs: drug-nozero (left), drug-zero (right)]

The graph on the left, from The Economist, has no zero. The graph on the right has a zero, making the change in mean age look a lot smaller.  In this case I think I’m with The Economist when it comes to the design, though I’d argue for a slightly wider and flatter graph. Barcharts must start at zero (defined appropriately for the data), but lines don’t have to, and an increase in mean age of first use from 16.5 to 22.9 is a pretty big change.
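If you want to see the effect of the baseline choice, here is a minimal matplotlib sketch. It uses only the two endpoints reported in the research paper (16.5 in the 1960s, 22.9 in the 2010s); the intermediate decades are omitted and the x-positions are chosen just for illustration:

    # Two versions of the same line: axis cut near the data vs axis from zero.
    # Only the two reported endpoints are used; x-positions are illustrative.
    import matplotlib.pyplot as plt

    decades = [1965, 2015]
    mean_age = [16.5, 22.9]

    fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

    left.plot(decades, mean_age, marker="o")
    left.set_ylim(15, 24)                      # axis cut near the data
    left.set_title("axis cut near the data")

    right.plot(decades, mean_age, marker="o")
    right.set_ylim(0, 24)                      # axis starting at zero
    right.set_title("axis starting at zero")

    for ax in (left, right):
        ax.set_xlabel("decade of first use")
        ax.set_ylabel("mean age at first use")

    plt.tight_layout()
    plt.show()

The panel with the axis cut near the data makes the increase easy to read; the panel starting at zero flattens it, which is the trade-off discussed above.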

Where I’m not with The Economist is the numbers. The research paper, as I said, gives the numbers as 16.5 in the 1960s and 22.9 in the 2010s. The graph from the story is definitely too high at the maximum and probably too low at the minimum.