Posts written by Thomas Lumley (1418)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient

March 2, 2015

A nice cuppa

Q: What do you think about this new research on tea preventing diabetes?

A: That’s not what it says

Q: Sure it is. Big black letters, right at the top: “Three cups of tea a day can cut your risk of diabetes… even if you add milk”

A: I mean that’s not what the research says

Q: The bit about milk?

A: Well, they didn’t study milk at all, but that’s not the main problem

Q: They didn’t study cups?

A: No. Or diabetes. Or, in one of the studies, tea.

Q: Hmm. Ok, so this “glucose-lowering effect” they write about, is that a lab study?

A: Yes.

Q: Mice?

A: One of the studies used rats, the other didn’t.

Q: Cells, then?

A: No, just enzymes in a test tube, and a highly processed chemical extract of tea.

Q: Ok, forget about that one. But the rat study, that measured actual glucose lowering and actual tea?

A: Almost. They gave the rats a high-sugar drink, and if they were given the tea first, their blood glucose didn’t go up as much.

Q: Which of the two studies was this one?

A: The one where the story just says the results were similar and doesn’t give the researchers’ names, only their institution.

Q: Wouldn’t you think the story would say more about this one, since it actually involves blood glucose and, like, living things?

A: In a perfect world, yes.

Q: The story says they don’t think milk would make a difference. What about sugar?

A: No mention of it.

Q: That’s strange. Quite a lot of British people have sugar in their tea. Wouldn’t it be helpful to say something?

A: You’d think.

Q: How much tea did the rats get?

A: The lowest effective dose they report is 62.5 mg/kg of freeze-dried tea powder.

Q: What’s that in cups?

A: The research paper says “corresponds to nine cups of black tea”.

Q: Per day?

A: No, all at once.
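The dose conversion is easy to sanity-check. A minimal sketch, with the caveat that every number except the 62.5 mg/kg comes from my own assumptions, not the paper: a 60 kg adult, direct per-kilogram scaling from rat to human (not the surface-area scaling often used for drug doses), and roughly 0.4 g of freeze-dried solids per brewed cup.

```python
# Back-of-envelope check of the "nine cups" figure.
# Assumptions (NOT from the paper): 60 kg adult, direct per-kg scaling,
# about 0.4 g of freeze-dried tea solids per cup.
dose_mg_per_kg = 62.5          # lowest effective dose reported in rats
body_weight_kg = 60            # assumed adult body weight
solids_per_cup_g = 0.4         # assumed freeze-dried solids in one cup

total_dose_g = dose_mg_per_kg * body_weight_kg / 1000   # 3.75 g
cups = total_dose_g / solids_per_cup_g                  # about 9.4 cups
```

Under those assumptions you land at nine-and-a-bit cups, consistent with the paper’s “nine cups of black tea”; with the usual surface-area scaling the human-equivalent dose would be several times smaller.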

Q: So we need to get bigger cups?

A: Or fewer reprinted British ‘health’ stories.

February 27, 2015

Quake prediction: how good does it need to be?

From a detailed story in the ChCh Press (via Eric Crampton) about various earthquake-prediction approaches:

About 40 minutes before the quake began, the TEC in the ionosphere rose by about 8 per cent above expected levels. Somewhat perplexed, he looked back at the trend for other recent giant quakes, including the February 2010 magnitude 8.8 event in Chile and the December 2004 magnitude 9.1 quake in Sumatra. He found the same increase about the same time before the quakes occurred.

Heki says there has been considerable academic debate both supporting and opposing his research.

To have 40 minutes warning of a massive quake would be very useful indeed and could help save many lives. “So, why 40 minutes?” he says. “I just don’t know.”

He says if the link were to be proved more firmly in the future it could be a useful warning tool. However, there are drawbacks in that the correlation only appears to exist for the largest earthquakes, whereas big quakes of less than magnitude 8.0 are far more frequent and still cause death and devastation. Geomagnetic storms can also render the system impotent, with fluctuations in the total electron count masking any pre-quake signal.

Let’s suppose that with more research everything works out, and there is a rise in this TEC before all very large quakes. How much would this help in New Zealand? The obvious place is Wellington. A quake over 8.0 magnitude has been observed in the area in 1855, when it triggered a tsunami. A repeat would also shatter many of the earthquake-prone buildings. A 40-minute warning could save many lives. It appears that TEC shouldn’t be that expensive to measure: it’s based on observing the time delays in GPS satellite transmissions as they pass through the ionosphere, so it mostly needs a very accurate clock (in fact, NASA publishes TEC maps every five minutes). Also, it looks like it would be very hard to hack the ionosphere to force the alarm to go off. The real problem is accuracy.

The system will have false positives and false negatives. False negatives (missing a quake) aren’t too bad, since that’s where you are without the system. False positives are more of a problem. They come in two forms: when the alarm goes off completely in the absence of a quake, and when there is a quake but no tsunami or catastrophic damage.

Complete false predictions would need to be very rare. If you tell everyone to run for the hills and it turns out to be sunspots or the wrong kind of snow, they will not be happy: the cost in lost work (and theft?) would be substantial, and there would probably be injuries.  Partial false predictions, where there was a large quake but it was too far away or in the wrong direction to cause a tsunami, would be just as expensive but probably wouldn’t cause as much ill-feeling or skepticism about future warnings.

Now for the disappointment. The story says “there has been considerable academic debate”. There has. For example, in a (paywalled) paper from 2013 looking at the Japanese quake that prompted Heki’s idea:

A detailed analysis of the ionospheric variability in the 3 days before the earthquake is then undertaken, where a simultaneous increase in foF2 and the Es layer peak plasma frequency, foEs, relative to the 30-day median was observed within 1 h before the earthquake. A statistical search for similar simultaneous foF2 and foEs increases in 6 years of data revealed that this feature has been observed on many other occasions without related seismic activity. Therefore, it is concluded that one cannot confidently use this type of ionospheric perturbation to predict an impending earthquake.

In translation: you need to look just right to see this anomaly, and there are often anomalies like this one without quakes. Over four years they saw 24 anomalies, only one shortly before a quake.  Six complete false positives per year is obviously too many.  Suppose future research could refine what the signal looks like and reduce the false positives by a factor of ten: that’s still evacuation alarms with no quake more than once every two years. I’m pretty sure that’s still too many.
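The arithmetic above can be checked in a few lines (a sketch, using only the counts quoted in the text):

```python
# False-alarm arithmetic: 24 anomalies over four years, one of which
# preceded a quake.
anomalies = 24
years = 4
quake_linked = 1

false_alarms_per_year = (anomalies - quake_linked) / years   # 5.75, "about six"

# Even a tenfold reduction in false positives leaves a no-quake
# evacuation alarm roughly every 21 months.
improved_rate = false_alarms_per_year / 10
years_between_false_alarms = 1 / improved_rate               # about 1.7 years
```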


Siberian hamsters or Asian gerbils

Every year or so there is a news story along the lines of “Everything you know about the Black Death is Wrong”. I’ve just been reading a couple of excellent posts by Alison Atkin on this year’s one.

The Herald’s version of the story (which they got from the Independent) is typical (but she has captured a large set of headlines)

The Black Death has always been bad publicity for rats, with the rodent widely blamed for killing millions of people across Europe by spreading the bubonic plague.

But it seems that the creature, in this case at least, has been unfairly maligned, as new research points the finger of blame at gerbils.


The scientists switched the blame from rat to gerbil after comparing tree-ring records from Europe with 7711 historical plague outbreaks.

That isn’t what the research paper (in PNAS) says. And it would be surprising if it did: could it really be true that Asian gerbils were spreading across Europe for centuries without anyone noticing?

The abstract of the paper says

The second plague pandemic in medieval Europe started with the Black Death epidemic of 1347–1353 and killed millions of people over a time span of four centuries. It is commonly thought that after its initial introduction from Asia, the disease persisted in Europe in rodent reservoirs until it eventually disappeared. Here, we show that climate-driven outbreaks of Yersinia pestis in Asian rodent plague reservoirs are significantly associated with new waves of plague arriving into Europe through its maritime trade network with Asia. This association strongly suggests that the bacterium was continuously reimported into Europe during the second plague pandemic, and offers an alternative explanation to putative European rodent reservoirs for how the disease could have persisted in Europe for so long.

If the researchers had found repeated, previously unsuspected, invasions of Europe by hordes of gerbils, they would have said so in the abstract. They don’t. Not a gerbil to be seen.

The hypothesis is that plague was repeatedly re-imported from Asia (where it affected lots of species, including, yes, gerbils) to European rats, rather than persisting at low levels in European rats between the epidemics. Either way, once the epidemic got to Europe, it’s all about the rats [update: and other non-novel forms of transmission].

In this example, for a change, it doesn’t seem that the press release is responsible. Instead, it looks like progressive mutations in the story as it’s transmitted, with the great gerbil gradually going from an illustrative example of a plague host in Asia to the rodent version of Attila the Hun.

Two final remarks. First, the erroneous story is now in the Wikipedia entry for the great gerbil (with a citation to the PNAS paper, so it looks as if it’s real). Second, when the story is allegedly about the confusion between two species of rodent, it’s a pity the Herald stock photo isn’t the right species.


[Update: Wikipedia has been fixed.]

What are you trying to do?


There’s a new ‘perspectives’ piece (paywall) in the journal Science, by Jeff Leek and Roger Peng (of Simply Statistics), arguing that the most common mistake in data analysis is misunderstanding the type of question. Here’s their flowchart


The reason this is relevant to StatsChat is that you can use the flowchart on stories in the media. If there’s enough information in the story to follow the flowchart you can see how the claims match up to the type of analysis. If there isn’t enough information in the story, well, you know that.


February 25, 2015

Measuring what you care about

If cannabis is safer than thought (as the Washington Post says), that might explain why the reporting is careful to stay away from thought.



The problem with this new research is that it’s looking at the acute toxicity of drugs — how does the dose people usually take compare to the dose needed to kill you right away.  It’s hard to overstate how unimportant this is in debates over regulation of alcohol, tobacco, and cannabis.  There’s some concern about alcohol poisoning (in kids, mostly), but as far as I can remember I have literally never seen anti-tobacco campaigns mentioning acute nicotine poisoning as a risk, and even the looniest drug warriors don’t push fatal THC overdoses as the rationale for banning marijuana.

Alcohol is dangerous not primarily because of acute poisoning, but because of car crashes, violence, cancer, liver failure, and heart damage. Tobacco is dangerous not primarily because of acute poisoning, but because of lung cancer, COPD, heart disease, stroke, and other chronic diseases.

It’s hard to tell how dangerous marijuana is. It certainly causes dependence in some users, and there are reasons to think it might have psychological and neurological effects. If smoked, it probably damages the lungs. In all these cases, though, the data on frequency and severity of long-term effects are limited.  We really don’t know, and the researchers didn’t even try to estimate.

The conclusions of the researchers — that cannabis is over-regulated and over-panicked-about relative to other drugs — are reasonable, but the data provide very little support for them.  If the researchers had used the same methodology on caffeine, it would have looked much more dangerous than cannabis, and probably more dangerous than methamphetamine. That would have been a bit harder to sell, even with a pretty graph.


[story now in Herald, too]


  • NZ papers have sensible coverage of the new peanuts/kids research (Herald, Stuff). NHS Behind The Headlines has a summary and takes some UK papers to task.
  • “Rich Data, Poor Data”: Nate Silver writes about why sports statistics works. Unfair summary: it’s an artificial problem in a controlled environment that people care about more than they should.
  • “the vast majority of health sites, from the for-profit to the government-run, are loaded with tracking elements that are sending records of your health inquiries to the likes of web giants like Google, Facebook, and Pinterest, and data brokers like Experian and Acxiom.” Story at, video summary from the researcher:
  • “A memo to the American people from US Chief Data Scientist Dr DJ Patil”.  More informative than you might expect given the source.
  • “the biggest problem facing the world of public opinion research isn’t online opt-in polls, but rather the temptation to troll twitter to ‘see what people are thinking’”, and other thoughts from Cathy O’Neil, based on the new report on Big Data from the American Association for Public Opinion Research.
  • another part of the increasing supply of open data
February 21, 2015

Another interesting thing about petrol prices

or What I Did At Open Data Day.

The government monitoring data on petrol prices go back to 2004, and while they are presented as time series, there are other ways to look at them.


The horizontal axis is the estimated cost of imported petrol plus all the taxes and levies. The vertical axis is the rest of the petrol price: it covers the cost of hauling the stuff around the country, the cost of running petrol stations, and profit for both the petrol stations and the companies.

There’s an obvious change in 2012. From 2005 to 2012, the importer margin varied around 15c/litre, more or less independent of the costs. From 2012, the importer margin started rising, without any big changes in costs.

Very recently, things changed again: the price of crude oil fell, with the importer margin staying roughly constant and the savings being passed on to consumers. Then the New Zealand dollar fell, and the importer margin has fallen — either the increased costs from the lower dollar are being absorbed by the vendors, or they have been hedged somehow.
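The quantity on the vertical axis is just a subtraction. A minimal sketch, where the cents-per-litre figures are hypothetical and chosen only to illustrate the pre-2012 level:

```python
def importer_margin(pump_price, import_cost, taxes):
    """Importer margin in cents/litre: the pump price minus the estimated
    imported cost and all taxes and levies (the plot's vertical axis)."""
    return pump_price - (import_cost + taxes)

# Hypothetical figures: 210 c/L at the pump, 95 c/L imported cost,
# 100 c/L of taxes and levies.
margin = importer_margin(210, 95, 100)  # 15 c/L, the typical pre-2012 level
```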


If it seems too good to be true

The Herald (from the Daily Telegraph) has a story about a new high-antioxidant chocolate

Its makers claim it can change the underlying skin of a 50 to 60-year-old into that of someone in their 20s or 30s.

Actually, in an uncontrolled short-term trial in 400 people they say

“We used people in their 50s and 60s and in terms of skin biomarkers we found it had brought skin back to the levels of a 20 or 30-year-old.”

The target market is

“elegant, educated and affluent” city-dwelling women in their 30s and businessmen “to support their appearance in a stressful environment and on their business travels”.

or, in other words, people who would be willing to bore on about how young and beautiful their skin biomarkers are, in case you can’t tell by looking.

To be fair, there is independent expert comment (which is not entirely convinced). If you read right to the last sentence you get the real highlight:

Nutrition experts at UCL also warned that previous trials showed that astaxanthin worked better when applied directly to the face rather than ingested.


Updated to add: the story was also on Prime News, where they made explicit the point that this really has nothing to do with the chocolate. They could have put the astaxanthin in a pill, but they thought it would be more attractive if they put it in chocolate. A spoonful of sugar makes the medicine go down, etc.

February 20, 2015

Why we have controlled trials



The graph is from a study — a randomised, placebo-controlled trial published in a top medical journal — of a plant-based weight loss treatment, an extract from Garcinia cambogia, as seen on Dr Oz. People taking the real Garcinia cambogia lost weight, an average of 3kg over 12 weeks. That would be at least a little impressive, except that people getting pretend Garcinia cambogia lost an average of more than 4kg over the same time period. It’s a larger-than-usual placebo response, but it does happen. If just being in a study where there’s a 50:50 chance of getting a herbal treatment can lead to 4kg weight loss, being in a study where you know you’re getting it could produce even greater ‘placebo’ benefits.

If you had some other, new, potentially-wonderful natural plant extract that was going to help with weight loss, you might start off with a small safety study. Then you’d go to a short-term, perhaps uncontrolled, study in maybe 100 people over a few weeks to see if there was any sign of weight loss and to see what the common side effects were. Finally, you’d want to do a randomised controlled trial over at least six months to see if people really lost weight and kept it off.

If, after an uncontrolled eight-week study, you report results for only 52 of 100 people enrolled and announce you’ve found “an exciting answer to one of the world’s greatest and fastest growing problems” you perhaps shouldn’t undermine it by also saying “The world is clearly looking for weight-loss products which are proven to work.”


[Update: see comments]

February 19, 2015

London card clash sensitivity analysis

The data blog of the Daily Mirror reports a problem with ‘card clash’ on the London Underground.  You can now pay directly with a debit card instead of buying a ticket — so if you have both a transport card and a debit card in your wallet, you have the opportunity to enter with one and leave with the other and get overcharged. Alternatively, you can take the card out of your wallet and drop it.  Auckland Transport has a milder version of the same problem: no-touch credit cards can confuse the AT HOP reader and make it not recognise your card, but you won’t get overcharged unless you don’t notice the red light.

They looked at the numbers of cards handed in at lost-and-found across the London Underground over the past two years (based on an FOI request).


If we’re going to spend time on this, we might also consider what the right comparison is. The data include cards on their own and cards with other stuff, such as a wallet. We shouldn’t combine them: the ‘card clash’ hypothesis would suggest a bigger increase in cards on their own.

Here’s a comparison using all the data: the pale points are the observations, the heavy lines are means.


Or, we might worry about trends over time and use just the most recent four months of comparison data:


Or, use the same four months of the previous year:



In this case all the comparisons give basically the same conclusion: more cards are being handed in, but the increase is pretty similar for cards alone and for cards with other stuff, which weakens the support for the ‘card clash’ explanation.

Also, in the usual StatsChat spirit of considering absolute risks: there are 3.5 million trips per day, and about 55 cards handed in per day: one card for about 64000 trips. With two trips per day, 320 days per year, that would average once per person per century.
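The absolute-risk calculation is quick to reproduce (a sketch, using the figures quoted above):

```python
# Absolute-risk arithmetic for 'card clash' losses.
trips_per_day = 3.5e6        # London Underground trips per day
cards_per_day = 55           # cards handed in per day

trips_per_card = trips_per_day / cards_per_day      # about 64,000 trips

# For one commuter: two trips a day, 320 days a year.
trips_per_year = 2 * 320                            # 640 trips
years_per_card = trips_per_card / trips_per_year    # roughly a century
```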