Posts filed under Random variation (114)

May 28, 2015

Junk food science

In an interesting sting on the world of science journalism, John Bohannon and two colleagues, plus a German medical doctor, ran a small randomised experiment on the effects of chocolate consumption, and found better weight loss in those given chocolate. The experiment was real and the measurements were real, but the medical journal was the sort that published their paper two weeks after submission, with no changes.

Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.

Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a “significant” result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one “statistically significant” result were pretty good.
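The arithmetic behind the lottery-ticket image is easy to check. A minimal sketch, assuming 18 independent tests each at the usual 5% level (the real measurements were surely correlated, so this is only a rough guide):

```python
# Rough check of the 'lottery ticket' arithmetic: 18 independent tests,
# each with a 5% chance of a false positive. The study's measurements
# won't be exactly independent, so treat this as back-of-the-envelope.
alpha = 0.05
n_tests = 18
p_any = 1 - (1 - alpha) ** n_tests
print(f"P(at least one 'significant' result) = {p_any:.2f}")  # about 0.60
```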

Bohannon and his conspirators were doing this deliberately, but lots of people do it accidentally. Their study was (deliberately) crappier than average, but since the journalists didn’t ask, that didn’t matter. You should go read the whole thing.

Finally, two answers for obvious concerns: first, the participants were told the research was for a documentary on dieting, not that it was in any sense real scientific research. Second: no, neither Stuff nor the Herald fell for it.

[Update: Although there was participant consent, there wasn’t ethics committee review. An ethics committee probably wouldn’t have allowed it. (via Hilda Bastian on Twitter)]

Road deaths up (maybe)

“In Australia road deaths are going down but in New Zealand the number has shot up”, says the Herald, giving depressing-looking international comparisons from newly-announced OECD data. The percentage increase was highest in New Zealand. The story does go on to point out that the increase reverses a decrease the previous year, suggesting that it might be that 2013 was especially good, and says

An ITF spokesman said New Zealand’s relatively small size made percentage movements more dramatic.

Overall, it’s a good piece. Two things I want to add: first, it’s almost always useful to see more context in a time series if it’s available. I took the International Road Traffic Accident Database and picked out a group of countries with similar road toll to New Zealand in 2000: all those between 200 and 1000. The list is Austria, Denmark, Finland, Ireland, Israel, New Zealand, Norway, Slovenia, Sweden, Switzerland. Here are the data for 2000 and for 2010-2014; New Zealand is in red.

[Graph: annual road deaths in 2000 and 2010–2014 for the ten countries, New Zealand in red]

There’s a general downward trend, but quite a bit of bouncing around due to random variation. As we keep pointing out, there are lots of mistakes made when driving, and it takes bad luck to make one of these fatal, so there is a lot of chance involved. It’s clear from the graph that the increase is not much larger than random variation.

Calculations using the Poisson distribution (the simplest reasonable mathematical model, and the one with the smallest random variation) are, likewise, borderline. There’s only weak evidence that road risk was higher last year than in 2013. The right reference level, though, isn’t ‘no change’, it’s the sort of decrease that other countries are seeing.  The median change in this group of 10 countries was a 5% decrease, and there’s pretty good evidence that New Zealand’s risk did not decrease 5%.  Also, the increase is still present this year, making it more convincing.
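For anyone who wants to reproduce this sort of calculation: conditional on the two-year total, the later year’s count is binomial, which gives an exact Poisson comparison. A minimal sketch with made-up counts of roughly the right size (not the published toll figures):

```python
# Exact comparison of two Poisson counts: conditional on the total,
# the later count is Binomial(total, r/(1+r)) for true rate ratio r.
# The counts here are made up, roughly the right size for NZ road tolls.
from scipy.stats import binomtest

deaths_2013 = 250  # hypothetical count for the earlier year
deaths_2014 = 290  # hypothetical count for the later year

def rate_ratio_test(x_early, x_late, null_ratio):
    """One-sided p-value for 'rate went up relative to null_ratio'."""
    p_null = null_ratio / (1 + null_ratio)
    return binomtest(x_late, x_early + x_late, p_null,
                     alternative="greater").pvalue

print(rate_ratio_test(deaths_2013, deaths_2014, 1.00))  # vs 'no change'
print(rate_ratio_test(deaths_2013, deaths_2014, 0.95))  # vs a 5% decrease
```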

What we can’t really do is explain why. As the Herald story says, some of the international decrease is economic: driving costs money, so people do less of it in recessions. Since New Zealand was less badly hit by recession, you’d expect less decrease in driving here, and so less decrease in road deaths. Maybe.

One thing we do know: while it’s tempting and would be poetic justice, it’s not valid to use the increase as evidence that recent road-safety rule changes have been ineffective. That would be just as dishonest as the claims for visible success of the speed tolerance rules in the past.


May 20, 2015

Weather uncertainty

From the MetService warnings page

[Image: MetService severe-weather outlook map, hand-drawn, with a confidence level for each warning area]

The ‘confidence’ levels are given numerically on the webpage as 1 in 5 for ‘Low’, 2 in 5 for ‘Moderate’ and 3 in 5 for ‘High’. I don’t know how well calibrated these are, but it’s a sensible way of indicating uncertainty.  I think the hand-drawn look of the map also helps emphasise the imprecision of forecasts.
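Checking calibration would just need an archive of outlooks and outcomes: events flagged ‘High’ should eventuate about three times in five, and so on. A minimal sketch with made-up records:

```python
# A minimal sketch of a calibration check, using made-up outlook records:
# each record is (stated confidence, whether the warned weather eventuated).
from collections import defaultdict

outlooks = [
    (0.2, False), (0.2, False), (0.2, True),   # 'Low' = 1 in 5
    (0.4, True),  (0.4, False), (0.4, False),  # 'Moderate' = 2 in 5
    (0.6, True),  (0.6, True),  (0.6, False),  # 'High' = 3 in 5
]

outcomes = defaultdict(list)
for stated, happened in outlooks:
    outcomes[stated].append(happened)

# Well-calibrated forecasts: observed frequency tracks stated probability.
for stated in sorted(outcomes):
    observed = sum(outcomes[stated]) / len(outcomes[stated])
    print(f"stated {stated:.0%}: observed {observed:.0%} "
          f"over {len(outcomes[stated])} outlooks")
```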

(via Cate Macinnis-Ng on Twitter)

May 6, 2015

All-Blacks birth month

This graphic and the accompanying story in the Herald produced a certain amount of skeptical discussion on Twitter today.

[Graphic: birth months of All Blacks, grouped by quarter]

It looks a bit as though there is an effect of birth month, and the Herald backs this up with citations to Malcolm Gladwell on ice hockey.

The first question is whether there is any real evidence of a pattern. There is, though it’s not overwhelming. If you did this for random sets of 173 people, about 1 in 80 times there would be 60 or more in the same quarter (and yes, I did use actual birth frequencies rather than just treating all quarters as equal). The story also looks at the Black Caps, where evidence is a lot weaker because the numbers are smaller.
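The simulation is easy to reproduce in outline. I don’t have the actual birth-frequency figures, so this sketch makes the simplifying assumption of equal quarters, which will shift the answer somewhat:

```python
# Sketch of the simulation: for random groups of 173 people, how often
# does some quarter contain 60 or more birthdays? Equal quarters are a
# simplifying assumption; the post used actual birth frequencies, so
# its 1-in-80 figure will differ a little from this sketch's output.
import numpy as np

rng = np.random.default_rng(2015)
n_people, n_sims = 173, 100_000
quarter_probs = [0.25, 0.25, 0.25, 0.25]  # placeholder for real frequencies

counts = rng.multinomial(n_people, quarter_probs, size=n_sims)
print((counts.max(axis=1) >= 60).mean())
```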

On the other hand, we are comparing to a pre-existing hypothesis here. If you ask whether the data are a better fit to an equal distribution over quarters or to Gladwell’s ice-hockey statistic of a majority in the first quarter, they are a much better fit to the equal distribution.

The next step is to go slightly further than Gladwell, who is not (to put it mildly) a primary source. The fact that he says there is a study showing X is good evidence that there is a study showing X, but it isn’t terribly good evidence that X is true. His books are written to communicate an idea, not to provide balanced reporting or scientific reference.  The hockey analysis he quotes was the first study of the topic, not the last word.

It turns out that even for ice hockey, things are more complicated

Using publically available data of hockey players from 2000–2009, we find that the relative age effect, as described by Nolan and Howell (2010) and Gladwell (2008), is moderate for the average Canadian National Hockey League player and reverses when examining the most elite professional players (i.e. All-Star and Olympic Team rosters).

So, if you expect the ice-hockey phenomenon to show up in New Zealand, the ‘most elite professional players’, the All Blacks, might be the wrong place to look.

On the other hand Rugby League in the UK does show very strong relative age effects even into the national teams — more like the 50% in first quarter that Gladwell quotes for ice hockey. Further evidence that things are more complicated comes from soccer. A paper (PDF) looking at junior and professional soccer found imbalances in date of birth, again getting weaker at higher levels. They also had an interesting natural experiment when the eligibility date changed in Australia, from January 1 to August 1.

[Graph: birth-quarter distribution of junior soccer players before and after Australia’s eligibility-date change]

As the graph shows, the change in eligibility date was followed by a change in birth-date distribution, but not how you might expect. An August 1 cutoff saw a stronger first-quarter peak than the January 1 cutoff.

Overall, it really does seem to be true that relative age effects have an impact on junior sports participation, and possibly even high-level professional achievement. You still might not expect the ‘majority born in the first quarter’ effect to translate from the NHL as a whole to the All Blacks, and the data suggest it doesn’t.

Rather more important, however, are relative age effects in education. After all, there’s a roughly 99.9% chance that your child isn’t going to be an All Black, but education is pretty much inevitable. There’s similar evidence that the school-age cutoff has an effect on educational attainment, which is weaker than the sports effects, but impacts a lot more people. In Britain, where the school cutoff is September 1:

Analysis shows that approximately 6% fewer August-born children reached the expected level of attainment in the 3 core subjects at GCSE (English, mathematics and science) relative to September-born children (August born girls 55%, boys 44%; September born girls 61%, boys 50%)

In New Zealand, with a March 1 cutoff, you’d expect worse average school performance for kids born on the dates the Herald story is recommending.

As with future All Blacks, the real issue here isn’t when to conceive. The real issue is that the system isn’t working as well for some people. The All Blacks (or more likely the Blues) might play better if they weren’t missing key players born in the wrong month. The education system, at least in the UK, would work better if it taught all children as well as it teaches those born in autumn.  One of these matters.


March 12, 2015

Variation and mean

A lot of statistical reporting focuses on means, or other summaries of where a distribution lies. Often, though, variation is important. Vox.com has a story about variation in costs of lab tests at California hospitals, based on a paper in BMJ Open. Vox says

The charge for a lipid panel ranged from $10 to $10,169. Hospital prices for a basic metabolic panel (which doctors use to measure the body’s metabolism) were $35 at one facility — and $7,303 at another

These are basically standard lab tests, so there’s no sane reason for this sort of huge variation. You’d expect some variation with volume of tests and with location, but nothing like what is seen.

What’s not clear is how much this is really just variation in how costs are attributed. A hospital needs a blood lab, which has a lot of fixed costs. Somehow these costs have to be spread over individual tests, but there’s no unique way to do this.  It would be interesting to know if the labs with high charges for one test tend to have high charges for others, but the research paper doesn’t look at relationships between costs.
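That check would be straightforward with the charges laid out one row per hospital. A hypothetical sketch (the two extreme charges come from the story; the other rows are invented):

```python
# A sketch of the relationship the paper didn't examine: do hospitals
# with high charges for one test have high charges for others? The data
# below are hypothetical; the real charge data are in the BMJ Open paper.
import pandas as pd

charges = pd.DataFrame({
    "lipid_panel":     [10, 120, 340, 55, 10169],
    "metabolic_panel": [35, 150, 410, 80, 7303],
}, index=["A", "B", "C", "D", "E"])  # one row per hospital

# Charges are heavily right-skewed, so rank (Spearman) correlation is a
# safer summary than ordinary Pearson correlation.
print(charges.corr(method="spearman"))
```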

The Vox story also illustrates a point about reporting, with this graph

[Box plots of California hospital charges for common lab tests, from the research paper]

If you look carefully, there’s something strange about the graph. The brown box second from the right is ‘lipid panel’, and it goes up to a bit short of $600, not to $10,169. Similarly, the ‘metabolic panel’, the right-most box, goes up to $1,000 on the graph and $7,303 in the story.

The graph is taken from the research paper. In the research paper it had a caption explaining that the ‘whiskers’ in the box plot go to the 5th and 95th percentiles (a non-standard but reasonable choice). This caption fell off on the way to Vox.com, and no-one seems to have noticed.
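The convention itself is easy to reproduce, which is some protection against losing the caption. A sketch in matplotlib, where the whisker percentiles can be set directly (the data here are random, not the real charges):

```python
# A sketch of the box-plot convention used in the paper: whiskers at the
# 5th and 95th percentiles rather than matplotlib's default 1.5*IQR rule.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
fake_charges = rng.lognormal(mean=5, sigma=1.2, size=500)  # skewed, like prices

fig, ax = plt.subplots()
ax.boxplot(fake_charges, whis=(5, 95))  # whiskers at 5th/95th percentiles
ax.set_ylabel("charge ($)")
plt.show()
```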

February 27, 2015

Quake prediction: how good does it need to be?

From a detailed story in the ChCh Press (via Eric Crampton) about various earthquake-prediction approaches

About 40 minutes before the quake began, the TEC in the ionosphere rose by about 8 per cent above expected levels. Somewhat perplexed, he looked back at the trend for other recent giant quakes, including the February 2010 magnitude 8.8 event in Chile and the December 2004 magnitude 9.1 quake in Sumatra. He found the same increase about the same time before the quakes occurred.

Heki says there has been considerable academic debate both supporting and opposing his research.

To have 40 minutes warning of a massive quake would be very useful indeed and could help save many lives. “So, why 40 minutes?” he says. “I just don’t know.”

He says if the link were to be proved more firmly in the future it could be a useful warning tool. However, there are drawbacks in that the correlation only appears to exist for the largest earthquakes, whereas big quakes of less than magnitude 8.0 are far more frequent and still cause death and devastation. Geomagnetic storms can also render the system impotent, with fluctuations in the total electron count masking any pre-quake signal.

Let’s suppose that with more research everything works out, and there is a rise in this TEC before all very large quakes. How much would this help in New Zealand? The obvious place is Wellington. A quake over magnitude 8.0 was observed in the area in 1855, when it triggered a tsunami. A repeat would also shatter many of the earthquake-prone buildings. A 40-minute warning could save many lives. It appears that TEC shouldn’t be that expensive to measure: it’s based on observing the time delays in GPS satellite transmissions as they pass through the ionosphere, so it mostly needs a very accurate clock (in fact, NASA publishes TEC maps every five minutes). Also, it looks like it would be very hard to hack the ionosphere to force the alarm to go off. The real problem is accuracy.
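For the curious, the ‘accurate clock’ point is the standard dual-frequency trick: the ionospheric delay scales as 40.3·TEC/f², so comparing the delays on the two GPS carrier frequencies recovers the electron content along the path. A sketch with illustrative numbers:

```python
# Why TEC is cheap to measure: the ionosphere delays a GPS signal by
# about 40.3 * TEC / f^2 metres of extra path, so the delay difference
# between the two GPS frequencies gives the total electron content.
# (Standard dual-frequency formula; the inputs below are illustrative.)
F1 = 1575.42e6  # GPS L1 carrier frequency, Hz
F2 = 1227.60e6  # GPS L2 carrier frequency, Hz

def tec_from_pseudoranges(p1_m, p2_m):
    """TEC (electrons/m^2) from pseudoranges (metres) on the two bands."""
    return (p2_m - p1_m) * (F1**2 * F2**2) / (40.3 * (F1**2 - F2**2))

# Illustrative: an extra 2.7 m of delay on L2 relative to L1.
tec = tec_from_pseudoranges(20_000_000.0, 20_000_002.7)
print(f"{tec / 1e16:.1f} TECU")  # 1 TECU = 1e16 electrons/m^2
```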

The system will have false positives and false negatives. False negatives (missing a quake) aren’t too bad, since that’s where you are without the system. False positives are more of a problem. They come in two forms: when the alarm goes off completely in the absence of a quake, and when there is a quake but no tsunami or catastrophic damage.

Complete false predictions would need to be very rare. If you tell everyone to run for the hills and it turns out to be sunspots or the wrong kind of snow, they will not be happy: the cost in lost work (and theft?) would be substantial, and there would probably be injuries.  Partial false predictions, where there was a large quake but it was too far away or in the wrong direction to cause a tsunami, would be just as expensive but probably wouldn’t cause as much ill-feeling or skepticism about future warnings.

Now for the disappointment. The story says “there has been considerable academic debate”. There has. For example, in a (paywalled) paper from 2013 looking at the Japanese quake that prompted Heki’s idea

A detailed analysis of the ionospheric variability in the 3 days before the earthquake is then undertaken, where a simultaneous increase in foF2 and the Es layer peak plasma frequency, foEs, relative to the 30-day median was observed within 1 h before the earthquake. A statistical search for similar simultaneous foF2 and foEs increases in 6 years of data revealed that this feature has been observed on many other occasions without related seismic activity. Therefore, it is concluded that one cannot confidently use this type of ionospheric perturbation to predict an impending earthquake.

In translation: you need to look just right to see this anomaly, and there are often anomalies like this one without quakes. Over four years they saw 24 anomalies, only one shortly before a quake.  Six complete false positives per year is obviously too many.  Suppose future research could refine what the signal looks like and reduce the false positives by a factor of ten: that’s still evacuation alarms with no quake more than once every two years. I’m pretty sure that’s still too many.


February 12, 2015

Two types of brain image study

If a brain imaging study finds greater activation in the asymmetric diplodocus region or increased thinning in the posterior homiletic, what does that mean?

There are two main possibilities. Some studies look at groups who are different and try to understand why. Other studies try to use brain imaging as an alternative to measuring actual behaviour. The story in the Herald (from the Washington Post), “Benefit of kids’ music lessons revealed – study” is the second type.

The researchers looked at 334 MRI brain images from 232 young people (so mostly one each, some with two or three), and compared the age differences in young people who did or didn’t play a musical instrument.  A set of changes that happens as you grow up happened faster for those who played a musical instrument.

“What we found was the more a child trained on an instrument,” said James Hudziak, a professor of psychiatry at the University of Vermont and director of the Vermont Center for Children, Youth and Families, “it accelerated cortical organisation in attention skill, anxiety management and emotional control.

An obvious possibility is that kids who play a musical instrument have different environments in other ways, too.  The researchers point this out in the research paper, if not in the story.  There’s a more subtle issue, though. If you want to measure attention skill, anxiety management, or emotional control, why wouldn’t you measure them directly instead of measuring brain changes that are thought to correlate with them?

Finally, the effect (if it is an effect) on emotional and behavioural maturation (if it is on emotional and behavioural maturation) is very small. Here’s a graph from the paper
[Scatter plot from the paper: a behavioural-maturation measure against age, with summary lines for each group]

The green dots are the people who played a musical instrument; the blue dots are those who didn’t.  There isn’t any dramatic separation or anything — and to the extent that the summary lines show a difference it looks more as if the musicians started off behind and caught up.

January 31, 2015

Big buts for factoid about lying

At StatsChat, we like big buts, and an easy way to find them is unsourced round numbers in news stories. From the Herald (reprinted from the Telegraph, last November)

But it’s surprising to see the stark figure that we lie, on average, 10 times a week.

It seems that this number comes from an online panel survey in the UK last year (Telegraph, Mail) — it wasn’t based on any sort of diary or other record-keeping; people were just asked to come up with a number. Nearly 10% of them said they had never lied in their entire lives; this wasn’t checked with their mothers. A similar poll in 2009 came up with much higher numbers: 6/day for men, 3/day for women.

Another study, in the US, came up with an estimate of 11 lies per week: people were randomised to trying not to lie for ten weeks, and the 11/week figure was from the control group.  In this case people really were trying to keep track of how often they lied, but they were a quite non-representative group. The randomised comparison will be fair, but the actual frequency of lying won’t be generalisable.

The averages are almost certainly misleading, because there’s a lot of variation between people. So when the Telegraph says

The average Briton tells more than 10 lies a week,

or the Mail says

the average Briton tells more than ten lies every week,

they probably mean the average number of self-reported lies was more than 10/week, with the median being much lower. The typical person lies much less often than the average.
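A made-up model shows how big the gap can be when the distribution has a long right tail:

```python
# How a skewed distribution makes 'the average Briton' misleading: a few
# prolific liars pull the mean well above the median. Made-up model, not
# fitted to any of the surveys above.
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical: most people report few lies, a small minority report many.
lies_per_week = rng.lognormal(mean=1.0, sigma=1.5, size=100_000)

print(f"mean:   {lies_per_week.mean():.1f}")      # pulled up by the tail
print(f"median: {np.median(lies_per_week):.1f}")  # the 'typical' person
```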

These figures are all based on self-reported remembered lies, and all broadly agree, but another study, also from the US, shows that things are more complicated

Participants were unaware that the session was being videotaped through a hidden camera. At the end of the session, participants were told they had been videotaped and consent was obtained to use the video-recordings for research.

The students were then asked to watch the video of themselves and identify any inaccuracies in what they had said during the conversation. They were encouraged to identify all lies, no matter how big or small.

The study… found that 60 percent of people lied at least once during a 10-minute conversation and told an average of two to three lies.


January 16, 2015

Holiday road toll

Here are the data, standardised for population but not for the variation in the length of the period, the weather, or anything else

[Graph: holiday-period road deaths, standardised for population, over time]

As you can see, the numbers are going down, and there’s quite a bit of variability — as the police say

“It’s the small things that often contribute to having a significant impact. Small decisions, small errors…”

Fortunately, the random-variation viewpoint is getting a reasonable hearing this year:

  • Michael Wright, in the ChCh Press: “But the idea that a high holiday road toll exposed its flaws may be dumber still. A holiday week or weekend is too short a period to mean anything more.”
  • Eric Crampton, in the Herald: “People have a bad habit of wanting to tell stories about random low-probability events.”


December 7, 2014

Bot or Not?

Turing had the Imitation Game, Philip K. Dick had the Voight-Kampff Test, and spammers gave us the CAPTCHA. The Truthy project at Indiana University has BotOrNot, which is supposed to distinguish real people on Twitter from automated accounts, ‘bots’, using analysis of their language, their social networks, and their retweeting behaviour. BotOrNot seems to sort of work, but not as well as you might expect.

@NZquake, a very obvious bot that tweets earthquake information from GeoNet, is rated at an 18% chance of being a bot. Siouxsie Wiles, for whom there is pretty strong evidence of existence as a real person, has a 29% chance of being a bot. I’ve got a 37% chance, the same as @fly_papers, which is a bot that tweets the titles of research papers about fruit flies, and slightly higher than @statschat, the bot that tweets StatsChat post links, or @redscarebot, which replies to tweets that include ‘communist’ or ‘socialist’. Other people at a similar probability include Winston Peters, Metiria Turei, and Nicola Gaston (President of the NZ Association of Scientists).

PicPedant, the twitter account of the tireless Paulo Ordoveza, who debunks fake photos and provides origins for uncredited ones, rates at 44% bot probability, but obviously isn’t.  Ben Atkinson, a Canadian economist and StatsChat reader, has a 51% probability, and our only Prime Minister (or his twitterwallah), @johnkeypm, has a 60% probability.