Posts written by Thomas Lumley (2017)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient

June 26, 2017


  • A map of 1.3 billion taxi trips in New York, taking advantage of the underappreciated principle that there’s no point having more detail than the screen can display. Also, GPS error naturally gives an attractive glowing effect that you’d usually have to add in afterwards.
  • “In the summer of 2015, Alexandra Franco got a letter in the mail from a company she had never heard of called AcurianHealth. The letter, addressed to Franco personally, invited her to participate in a study of people with psoriasis, a condition that causes dry, itchy patches on the skin.”  A story about creepy data-mining, from Gizmodo.
  • From Scientific American, graphics showing daily, weekly, yearly patterns in number of births.
  • From the New York Times: a new drug for muscular dystrophy. It costs about US$1 million per year, and the FDA is not really convinced it has an effect.
  • It’s time for the NZ Garden Bird Survey, which means it’s time for me to recommend their questions and answers page for its attention to principles of experimental design.
  • “Death when it comes will have no sheep”. Last week it was hamster names; this week it’s proverbs. Look, save yourself some effort and just go directly to Janelle Shane’s blog rather than waiting for each post to go viral.
  • “In science, probability is more certain than you think,” from Chad Orzel.
June 24, 2017

Cheese addiction: the book

I missed this a couple of weeks ago when it came out, but Stuff has a pretty good story on the ‘cheese addiction’ question.

As long-time readers will know, there’s been a persistent story circulating in the media claiming that a University of Michigan study found cheese was addictive because of substances called casomorphins.  The story is always unsourced (or sourced only to another copy), and the researchers at the University of Michigan have pointed out that this isn’t remotely like what their research found. The difference now is that Dr Neal Barnard, of the Physicians Committee for Responsible Medicine is fronting up. He’s written a book.

As the story on Stuff says (with added expert input), the cheese addiction claim doesn’t really stand up, but cheese is high in fat and there are things to not like about the dairy industry. And

While it’s not hard to pick holes in some of Barnard’s anti-cheese arguments, the book has good advice on what to eat instead

That could well be true but, as with paleo, you could find books that just give the recipes and leave out the scientifically-dubious propaganda.

June 19, 2017

What’s brown and sticky?

Q: What’s brown and sticky?

A: A stick!

Q: What do you call a cow on a trampoline?

A: A milk shake!

Q: Where does chocolate milk come from?

A: Brown cows!

There’s a popular news story around claiming that 7% of Americans think chocolate milk comes from brown cows.

It’s not true.

That is, it’s probably not true that 7% of Americans think chocolate milk comes from brown cows.  If you try to trace the primary source, lots of stories point to Food & Wine, who point to the Innovation Center for U.S. Dairy, who, in turn, point back to Food & Wine. Critically, none of the sources give the actual questions.  Was the question “Where does chocolate milk come from?” Was it “Lots of people say chocolate milk comes from brown cows, do you agree or disagree?” Was it “Does chocolate milk come from: (a) brown cows, (b) mutant sheep, (c) ordinary milk mixed with cocoa and sugar?” Was there a “Not sure” option?

This was clearly a question asked to get a marketing opportunity for carefully-selected facts about milk.  If the Innovation Center for US Dairy was interested in the factual question of what people believe about chocolate milk, they’d be providing more information about the survey and how they tried to distinguish actual believers from people who were just joking.

The Washington Post story does go into the more general issue of ignorance about food and agriculture: there’s apparently a lot of it about, especially among kids.  To some extent, though, this is what should happen. Via the NY Times

According to Agriculture Department estimates going back to 1910, however, the farm population peaked in 1916 at 32.5 million, or 32 percent of the population of 101.6 million.

It’s now down to 2%. Kids don’t pick up, say,  how cheese is made, from their day-to-day lives, and it’s not a top educational priority for schools.

The chocolate milk story, though, is bullshit: it looks like it’s being spread by people who don’t actually care whether the number is 7%.  And survey bullshit can be very sticky: a decade from now, we’ll probably find people citing this story as if it was evidence of something (other than contemporary news standards).

June 18, 2017

Unbiased anecdote is still anecdote

RadioNZ has a new “Healthy or Hoax” series looking at popular health claims. The first one, on coconut oil, is a good example both of what it does well, and of the difficulties in matching the claims and science.

The serious questions about coconut oil are about changes in blood fats and in insulin resistance when saturated fats replace various other components of the diet.  Replacing sugar and starch by  saturated fat is probably good; replacing, say,  monounsaturated fat by saturated fat probably isn’t. But in both cases the effects are small and are primarily on things you don’t notice, like your cholesterol level. That’s why there’s disagreement, because it’s actually hard to tell, given all the individual variability between people.

The superfood questions about coconut oil are about whether eating loads of it makes dramatic improvements in your health over a period of a few weeks.  There’s no reason to think it does, and the story quotes various people including Grant Schofield — who is at one end of the spectrum of respectable views on this subject — as saying so.

That’s all fine, but a big part of the story is about Kate Pereyra Garcia trying it for herself.  If the scientists — any subset of them — are right, a study on one person isn’t going to say anything helpful.  A one-person experience might disprove some of the extreme superfoodie claims, but no-one who believes those claims is likely to pay attention.

So, on one hand, the series looks like a great way to bring up the relatively boring evidence on a range of health topics. On the other hand, it’s reinforcing the concept of individual testimonials as a way of evaluating health effects.  If it was that easy to tell, we wouldn’t still be arguing about it.


  • A simulation of measles spreading through communities with different vaccination levels.
  • Update on the prosecution of the former government statistician of Greece, Andreas Georgiou, apparently because the right numbers weren’t popular.
  • Blind testing suggests wine tasters do much better than chance, but nowhere near as well as they’d like you to think.
  • “It’s not that people don’t like Mexican food. The reason was that the system had learned the word “Mexican” from reading the Web.” On reducing the ethnic and gender biases of automated text analysis
  • Herald Insights visualisation of crime patterns in New Zealand. Yes, there’s a denominator problem; no, the obvious fixes wouldn’t help.


June 15, 2017

One poll is not enough

As Patrick Gower said recently about the new Newshub/Reid Research polls:

“The interpretation of data by the media is crucial. You can have this methodology that we’re using and have it be bang on and perfect, but I could be too loose with the way I analyse and present that data, and all that hard work can be undone by that. So in the end, it comes down to me and the other people who present it.”

This evening, Newshub has the headline Poll: Labour crumbles, falling towards defeat. That’s based on a difference between two polls of 4.2% for Labour on its own, or 3.1% for a Labour/Greens alliance.

The poll has a ‘maximum margin of error’ of 3.1%, but that’s for support in this poll. For change between two polls, the maximum margin of error from the same assumptions is larger: 4.4%.
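The arithmetic can be checked with a minimal sketch, assuming the usual simple-random-sampling formula and a sample size of about 1,000 (which is roughly what a 3.1% maximum margin of error implies; the actual poll's sample size isn't given here):

```python
import math

n = 1000  # assumed sample size: 1.96 * sqrt(0.25 / n) ≈ 3.1% implies n ≈ 1000

# Maximum margin of error for support in a single poll (worst case, p = 0.5)
moe_single = 1.96 * math.sqrt(0.25 / n)

# For the change between two independent polls the variances add,
# so the maximum margin of error grows by a factor of sqrt(2)
moe_change = math.sqrt(2) * moe_single

print(round(100 * moe_single, 1))  # 3.1
print(round(100 * moe_change, 1))  # 4.4
```

The sqrt(2) factor is the whole point: a poll-to-poll change is the difference of two noisy numbers, so it is noisier than either one.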

There’s pretty good evidence the decrease for Labour is likely to be real: at 25-30% support the random variation is smaller.  Even so, an uncertainty interval based on the usual optimistic assumptions about sampling goes from a decrease of 0.3% to a decrease of 8.1%.
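That interval can be reproduced under the same optimistic sampling assumptions; the sample size and support levels below are illustrative guesses consistent with a 4.2-point drop, not the poll's actual figures:

```python
import math

n = 1000                       # assumed sample size
p_old, p_new = 0.300, 0.258    # illustrative support levels, a 4.2-point drop

# Standard error of the difference between two independent polls, using the
# observed support levels rather than the worst case p = 0.5
se = math.sqrt(p_old * (1 - p_old) / n + p_new * (1 - p_new) / n)
moe = 1.96 * se

drop = p_old - p_new
low, high = 100 * (drop - moe), 100 * (drop + moe)
print(round(low, 1), round(high, 1))  # 0.3 8.1
```

At 25–30% support the binomial variance p(1-p)/n is smaller than at 50%, which is why this interval is tighter than the 4.4% maximum would suggest.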

As for the smaller change for the Greens/Labour alliance, it could easily just be the sort of thing that happens with polling. Or it could be a real crumble. Or anything in between.

Certainly, even a 3.1% decrease in support is potentially a big deal, and could be news. The problem is that a single standard NZ opinion poll isn’t up to the task of detecting it reliably. Whether it’s news or not is up to the judgement (or guesswork) of the media, and the demands of the audience.  Even that would be ok, if everyone admitted the extent to which the data just serve to dilute the reckons, rather than glossing over all the uncertainty.

If anyone wants less-exciting summaries, my current recommendation for an open, transparent, well-designed NZ poll aggregator is this by Peter Ellis.

We’re not number three

From the Twitter, via Graeme Edgeler


As Graeme points out, the nice thing about having a link included is that you can check the report (PDF, p8) and find out the claim isn’t true — at least by the source’s definitions.

This one is redrawn to use all the data, with the countries previously left out coloured grey. There’s a pattern.



June 14, 2017

Comparing sources

The Herald has a front-page-link “Daily aspirin deadlier than we thought”, for a real headline “Daily aspirin behind more than 3000 deaths a year, study suggests”.  The story (from the Daily Telegraph) begins

Taking a daily aspirin is far more dangerous than was thought, causing more than 3000 deaths a year, a major study suggests.

Millions of pensioners should reconsider taking pills which are taken by almost half of elderly people to ward off heart attacks and strokes, researchers said.

The study by Oxford University found that those over the age of 75 who take the blood-thinning pills are 10 times more likely than younger patients to suffer disabling or fatal bleeds.

The BBC also has a report on this research. Their headline is Aspirin ‘major bleed’ warning for over-75s, and the story starts

People over 75 taking daily aspirin after a stroke or heart attack are at higher risk of major – and sometimes fatal – stomach bleeds than previously thought, research in the Lancet shows.

Scientists say that, to reduce these risks, older people should also take stomach-protecting PPI pills.

But they insist aspirin has important benefits – such as preventing heart attacks – that outweigh the risks.

The basic message from the same underlying research seems very different. Sadly, neither story links to the open-access research paper, which has very good sections on the background to the research and what this new study added.

Basically, we know that aspirin reduces blood clotting.  This has good effects — reducing the risk of heart attacks and strokes — and also bad effects — increasing the risk of bleeding.   We do randomised trials to find out whether the benefits exceed the risks, and in the randomised trials they did for aspirin. However, the randomised trials were mostly in people under 75.

The new study looks at older people, but it wasn’t a randomised trial: everyone in the study was taking aspirin, and there was no control group.  The main comparisons were by age. Serious stomach bleeding was a lot more common in the oldest people in the study, so unless the beneficial effects of aspirin were also larger in these people, the tradeoff might no longer be favourable.

In particular, as the Herald/Telegraph story says, the tradeoff might be unfavourable for old-enough people who hadn’t already had a heart attack or stroke. That’s one important reason for the difference between the two stories.  The research only looked at people who had previously had a heart attack or stroke (or some similar reason to take aspirin). The BBC story focused mostly on these people (who should still take aspirin, but also maybe an anti-ulcer drug); the Herald/Telegraph story focused mostly on those taking aspirin purely as a precaution.

So, even though the Herald/Telegraph story was going for the scare headlines, the content was potentially helpful: absent any news coverage, the healthy aspirin users would be less likely to bring up the issue with their doctors.


June 13, 2017

Appropriate subdivisions

From Public Policy Polling on Twitter, a finding that voters are less likely to vote for a member of Congress if they supported the Republican anti-healthcare bill


The problem with this sort of claim, as we’ve seen for NZ examples in the past, is that more than 24% of voters already have ‘not in a million years’ as the baseline willingness-to-support for some candidates. Maybe this vote would just change that to ‘not in two million years’.

Since Public Policy Polling are a reputable survey company (even though I’m not a fan), they publish detailed survey results (PDF). In these results, they break down the healthcare question by self-reported vote in the 2016 election.

And, as you’d expect, the detailed story is different.  People who voted for Clinton think the Republican healthcare bill is terrible; people who voted for Trump think it’s basically ok. The net 24% who might change their vote might be better described as a mixture of a net 50% imaginary ‘loss’ of people who already weren’t voting Republican, and a net 20% imaginary ‘gain’ of people who already were.

What’s more striking than the 24% vs 48% overall percentage is that as many as 23% of Trump voters are willing to say something negative about the bill. Still, as an indication that even the hopeful news is unclear, consider this table:

Only 13% of Trump voters prefer the current healthcare law, so the 23% who would penalise a Congressperson who voted for the new law includes at least 10% who actually prefer the new law or who aren’t sure.
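That 10% is just inclusion-exclusion arithmetic, sketched here with the percentages as quoted in the post:

```python
# Trump voters who would penalise a Congressperson for supporting the bill
penalise = 0.23
# Trump voters who prefer the current (pre-bill) healthcare law
prefer_old = 0.13

# Even if every voter who prefers the old law is among the penalisers,
# the penalisers must still include at least this fraction of all Trump
# voters who prefer the new law or aren't sure:
lower_bound = penalise - prefer_old
print(round(100 * lower_bound))  # 10
```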


June 7, 2017

Fraud or typos?

The Guardian says “Dozens of recent clinical trials may contain wrong or falsified data, claims study”.

A UK anaesthetist, John Carlisle, has scraped 5000 clinical-trial publications in which patients are divided randomly into two groups before treatment is assigned, and looked at whether the two groups are more similar or more different than you’d expect by chance.  His motivation appears to be that having groups which are too similar can be a sign of incompetent fraud by someone who doesn’t understand basic statistics. However, the statistical hypothesis he’s testing isn’t actually about fraud, or even about incompetent fraud.
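The statistical idea can be illustrated with a small simulation; this is a hypothetical sketch of the principle, not the method in the paper (which works from reported summary statistics, not raw data). Under genuine randomisation, p-values for baseline comparisons between the two groups are uniform on (0, 1), so an excess of values near 1 — groups “too similar” — is the anomaly being hunted:

```python
import math
import random

random.seed(1)

def baseline_pvalue(n=50):
    """P-value for a baseline comparison between two genuinely randomised groups."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # Two-sample z-test on the baseline means (known variance of 1)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

pvals = [baseline_pvalue() for _ in range(2000)]

# Under genuine randomisation roughly 10% of baseline p-values fall in each
# decile; fabricated "too similar" groups would pile up near p = 1 instead.
prop_near_one = sum(p > 0.9 for p in pvals) / len(pvals)
print(prop_near_one)
```

In simulated honest trials the proportion above 0.9 hovers around 0.10; a collection of trials where it is much higher is what flags a paper for a closer look.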

As the research paper notes, some of the anomalous results can be explained by simple writing errors: saying “standard deviation” when you mean “standard error” — and this would, if anything, be evidence against fraud.  Even in the cases where that specific writing error isn’t plausible, looking at the paper can show data fabrication to be an unlikely explanation.  For example, in one of the papers singled out as having a big difference not explainable by the standard deviation/standard error confusion, the difference is in one blood chemistry measurement (tPA) that doesn’t play any real role in the conclusions. The data are not consistent with random error, but they also aren’t consistent with deliberate fraud.  They are more consistent with someone typing 3.2 when they meant 4.2. This would still be a problem with the paper, both because some relatively unimportant data are wrong and because it says bad things about your workflow if you are still typing Table 1 by hand in the 21st century, but it’s not of the same scale as data fabrication.

You’d think the Guardian might be more sympathetic to typos as an explanation of error.