Posts written by Thomas Lumley (1980)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

April 24, 2017


  • The Herald (from the Daily Mail) recommends drinking beetroot juice, based on a study of brain waves: “This finding could help people who are at-risk of brain deterioration to remain functionally independent, such as those with a family history of dementia”.  The NHS Choices blog commented on a similar study by the same research group in 2010; their comments still apply.
  • Testimonials and motivational speakers tell you “I did this and look how it turned out”.  As XKCD illustrates, results may not be typical.
  • “Data made available for reanalysis, a journal that promptly responded to the outcomes of that reanalysis, and a finding that could save lives.” (from Stat). Another moral to the story: don’t edit data by copy-and-paste.
  • “The company says it has studies that back up its claims, but refused to release them on the grounds that they are commercial-in-confidence.” It appears that Johnson & Johnson would rather pull their ad than let people look at the evidence. (from The Age)
  • “it’s not acceptable if you’ve got the information readily available to leave it to the last minute for release, that’s not what the Act says you can do”.  The Chief Ombudsman, interviewed by Newsroom about the Official Information Act.

And finally

If you give a mouse a strawberry…


So, the Herald (from the Daily Mail) has a headline: Why women should eat a punnet of strawberries a day. That seems a little extreme, especially as punnets of strawberries are fairly seasonal.

The story leads off with

Eating just 15 strawberries a day protected mice from aggressive breast cancer in a new medical study.

So, first of all, mice, not women.  Also, when you go to the open-access research paper, it didn’t exactly ‘protect’ the mice.  The mice had cells from a breast cancer cell culture implanted under their skins, and the study looked at the change in size of those implanted tumours, not at spread within the mouse or health of the mouse or anything like that.  It’s a useful approach to learning about cancer cell biology, but not all that close to preventing or treating human cancers.

More surprisingly, though, “15 strawberries a day” seems quite a lot for a mouse — several times its body weight. The story changes a bit later:

In total, the strawberries made up 15 percent of the mice’s diet. That is just shy of the recommended daily amount of fruit we should eat each day, and would be equivalent to a punnet of strawberries, reported the Daily Mail.

A figure of 15% seems more plausible than 15 strawberries, though it’s still not quite true, since actually the mice were given concentrated strawberry extract in their food rather than strawberries.  Using the standard (lowish) estimate of 2000 kcal/day, 15% of calories would  be 300 kcal/day  which would take nearly a kilogram of strawberries.
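As a rough check on that arithmetic, here is a minimal sketch. The energy density of about 32 kcal per 100 g of fresh strawberries is an assumption (a commonly quoted food-composition figure), not a number from the story or the research paper.

```python
# Rough check: how many grams of strawberries supply 15% of daily calories?
DAILY_KCAL = 2000         # the standard (lowish) daily estimate used above
STRAWBERRY_SHARE = 0.15   # share of the diet, as reported in the story
KCAL_PER_100G = 32        # ASSUMED energy density of fresh strawberries


def strawberry_grams(daily_kcal=DAILY_KCAL,
                     share=STRAWBERRY_SHARE,
                     kcal_per_100g=KCAL_PER_100G):
    """Grams of strawberries needed to supply `share` of `daily_kcal`."""
    kcal_needed = daily_kcal * share
    return kcal_needed / kcal_per_100g * 100


print(f"{strawberry_grams():.0f} g of strawberries per day")
```

Under that assumption the answer comes out a bit over 900 g, which is where "nearly a kilogram" comes from.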

Previous studies have already shown that eating between 10 and 15 strawberries a day can make arteries healthier by reducing blood cholesterol levels.

There isn’t a reference, but the same researcher has studied strawberries and cholesterol (this time even in humans). The ‘between 10 and 15 strawberries a day’ was actually 500g per day.

[via Sam Warburton]

April 17, 2017

Slow on the uptake

Q: Did you see gin can increase your metabolism?

A: Um…

Q: Here, in the Herald, new research from Latvia!

A:  Not really convincing.

Q: Why? Is it in mice?

A: Up to a point.

Q: <reads> Yes, it’s in mice: “In fact, the mice who were fed regular doses of the spirit saw a 17 percent increase in their metabolic rate”.  That’s a lot, isn’t it?

A: Indeed. One might almost say an incredible amount.

Q:  Ok, were these some special sort of mutant mouse with a weird metabolism?

A: The story doesn’t seem to say.

Q: Of course it doesn’t, but can’t you find the original research paper? The story says it’s in Food & Nature. Doesn’t University of Auckland subscribe to it?

A: No.

Q: That’s usually a bad sign, isn’t it?

A: Especially in this case. The journal doesn’t exist, the university doesn’t exist, and Professor Thisa Lye is, apparently, a lie.

Q:  😕?

A: The story is two weeks old. It was an April Fool’s hoax. Thanks to Elle Hunt I was saved potentially quite a bit of time looking for the journal. She tweeted a link from Latvian Public Broadcasting, who have tracked the story down.

Q: So the Herald got it from the Daily Mail who got it from Yahoo who got it from Prima. And none of them checked that the research existed? I mean, ok, checking science isn’t what journalists are trained to do, but checking that sources actually exist? With Google?

A:  On the positive side, no mice were harmed in conducting the research.

April 14, 2017

Cyclone uncertainty

Cyclone Cook ended up a bit east of where it was expected, and so Auckland had very little damage.  That’s obviously a good thing for Auckland, but it would be even better if we’d had no actual cyclone and no forecast cyclone.  Whether the precautions Auckland took were necessary (at the time) or a waste  depends on how much uncertainty there was at the time, which is something we didn’t get a good idea of.

In the southeastern USA, where they get a lot of tropical storms, there’s more need for forecasters to communicate uncertainty and also more opportunity for the public to get to understand what the forecasters mean.  There’s scientific research into getting better forecasts, but also into explaining them better. Here’s a good article at Scientific American.

Here’s an example (research page):


On the left is the ‘cone’ graphic currently used by the National Hurricane Center. The idea is that the current forecast puts the eye of the hurricane on the black line, but it could reasonably be anywhere in the cone. It’s like the little blue GPS uncertainty circles for maps on your phone — except that it also could give the impression of the storm growing in size.  On the right is a new proposal, where the blue lines show a random sample of possible hurricane tracks taking the uncertainty into account — but not giving any idea of the area of damage around each track.

There’s also uncertainty in the predicted rainfall.  NIWA gave us maps of the current best-guess predictions, but no idea of uncertainty.  The US National Weather Service has a new experimental idea: instead of giving maps of the best-guess amount, give maps of the lower and upper estimates, titled: “Expect at least this much” and “Potential for this much”.

In New Zealand, uncertainty in rainfall amount would be a good place to start, since it’s relevant a lot more often than cyclone tracks.

Update: I’m told that the Met Service do produce cyclone track forecasts with uncertainty, so we need to get better at using them.  It’s still likely more useful to experiment with rainfall uncertainty displays, since we get heavy rain a lot more often than cyclones. 

April 12, 2017

Criteria for criteria for mānuka honey

There’s a new proposed definition of NZ Mānuka Honey, as you may have seen. The MPI page on the topic is here; no-one is linking it, which is sad because it’s interesting if you’re enough of a nerd.

I’m not going to comment on the biochemistry or botany, but there are two statistically-interesting parts of the proposal.  First, how the statistical method for classifying honey was constructed. The document says:

A classification modelling approach (CART – classification and regression tree) was the most suitable method of analysis for determining the identification criteria for mānuka honey because:

  • test results for several different attributes were available and needed to be assessed in combination;
  • the identification criteria needed to be related to the attributes tested;
  • the identification criteria needed to be straightforward, transparent and easily interpreted
  • the outputs would enable an unknown honey sample to be authenticated as monofloral or multifloral mānuka honey.

CART is a relatively old classification method, developed in the early 1980s by adding statistical ‘pruning’ to automated methods for building decision trees. It hasn’t been the most accurate method in head-to-head prediction competitions for a long time now, but it remains very useful for basically the reasons the MPI scientists gave.  CART tends to end up with simple rules based on whether a small selection of variables all or mostly exceed some thresholds, and while building a good CART prediction rule takes experience and statistical knowledge, using it doesn’t.

Using a collection of honey samples from known origins, and other information about chemical composition of the plants, a rule was developed for distinguishing mānuka honey from other NZ honeys such as kānuka or pōhutukawa, and from Leptospermum species other than mānuka. The resulting rule for monofloral (‘pure’) mānuka honey is a threshold that four chemicals have to exceed, plus the presence of mānuka DNA.  For multifloral mānuka honey, the threshold for one of the four chemicals is lowered.
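To show why a rule like this is easy to apply once someone else has built it, here is a minimal sketch of its shape. The marker names and threshold values are hypothetical placeholders, not the ones in the MPI proposal, and requiring the DNA test for multifloral honey as well is my reading of the description above.

```python
from dataclasses import dataclass

# HYPOTHETICAL marker names and cutoffs, for illustration only.
MONOFLORAL_THRESHOLDS = {
    "marker_a": 400.0,
    "marker_b": 1.0,
    "marker_c": 1.0,
    "marker_d": 5.0,
}
# Multifloral: the same rule with the threshold for one marker lowered.
MULTIFLORAL_THRESHOLDS = {**MONOFLORAL_THRESHOLDS, "marker_a": 20.0}


@dataclass
class HoneySample:
    levels: dict       # measured level of each chemical marker
    manuka_dna: bool   # True if the manuka DNA test is positive


def exceeds_all(sample, thresholds):
    """True if every marker level is above its cutoff (missing counts as 0)."""
    return all(sample.levels.get(name, 0.0) > cutoff
               for name, cutoff in thresholds.items())


def classify(sample):
    """Apply the threshold-plus-DNA rule; returns a plain label."""
    if not sample.manuka_dna:
        return "neither"
    if exceeds_all(sample, MONOFLORAL_THRESHOLDS):
        return "monofloral"
    if exceeds_all(sample, MULTIFLORAL_THRESHOLDS):
        return "multifloral"
    return "neither"
```

Applying the rule is a handful of comparisons; all the statistical work went into choosing which markers and cutoffs to use.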

The second interesting aspect of the criteria is that none of the four chemicals have anything to do with real or imagined medical benefits of mānuka honey.  Methylglyoxal, the leading candidate for a somewhat mānuka-specific antimicrobial, isn’t in there.  The rule attempts to identify honey produced by bees foraging on mānuka flowers — scientists know what a mānuka flower is. It doesn’t try to identify honey that prevents miscellaneous diseases when you eat it, because no-one knows what characteristics that honey would have, or even if it exists.

As I’ve noted before, the largest controlled trial of eating mānuka honey to prevent minor illness was conducted by a London primary school. On the other hand, people are willing to pay a lot of money for honey from NZ mānuka, and as long as MPI isn’t officially supporting the health arguments I’m definitely in favour of that money going to NZ apiarists rather than counterfeiters.

Are you related to your ancestors?

Two people have emailed me this story (one via Stuff, one via the Herald) about the DNA ancestry of Oriini Kaipara, a TV presenter:

An analysis of the DNA of Oriini Kaipara, 33, has shown that – despite her having both Maori and Pakeha ancestry – her genes only contain Maori DNA. That makes her, in her own words, a “full-blooded Maori”.

Culturally, people identify as Maori through their whakapapa, while legally a person is defined as Maori if they are of Maori descent, even through one long-distant ancestor.

However, the intermingling of different ethnicities in New Zealand over the past 200 years means all Maori people are thought to have some non-Maori ancestry, so would not be expected to have 100 per cent Maori DNA.

It seems strange that someone could have an ancestor from whom they got no DNA, but while most ‘ancestry and genetics’ news stories are completely bogus, this one probably isn’t.

Ignoring the X and Y chromosomes to start with, you have 22 chromosomes from your mother and 22 from your father (except for some rare cases such as people with Down syndrome, who have an extra copy of one of them, usually from their mothers).  Each of your maternal chromosomes is a combination of DNA from your mother’s father and mother’s mother, in chunks averaging about 1/4 chromosome long. Each of your paternal chromosomes is a combination of DNA from your father’s father and father’s mother, in chunks averaging about 1/4  chromosome long.  So, on average, you have 1/4 of your DNA from each grandparent, but it’s random.  You might have only tiny chunks from one grandparent and almost 50% from another.

As we go back further, after N generations you have 2^N direct ancestors, but the chunks of DNA being inherited are about 1/(2N) chromosomes long.  So, going back 10 generations you have 1024 ancestors and you’re inheriting DNA chunks about 1/20th of a chromosome long.   But with 22 pairs of chromosomes, that only allows you to fit in chunks from 20×2×22=880 of your great⁸-grandparents.   So, you almost certainly have DNA from all your grandparents, and very likely from all your great-grandparents, but it’s unlikely you have DNA from all your ancestors ten generations back, and the proportion you have DNA from goes down and down the further back you go.  Europeans in NZ don’t go all that far back, so the probability is pretty high for any given European ancestor of a modern Māori, but it’s not 100%.
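The counting argument can be sketched in a few lines. This uses only the rough figures from the paragraph above: the number of ancestors doubles each generation, and chunks are about 1/(2 × generations) of a chromosome long. The function names are mine, and the result is only an upper bound, since several chunks can come from the same ancestor.

```python
AUTOSOME_PAIRS = 22  # ignoring the X and Y chromosomes


def ancestors(generations):
    """Direct ancestors that many generations back (doubling each time)."""
    return 2 ** generations


def chunk_slots(generations, pairs=AUTOSOME_PAIRS):
    """Roughly how many inherited chunks fit across all the autosomes.

    Chunks are about 1/(2 * generations) of a chromosome long, so each
    chromosome holds about 2 * generations of them, and there are
    2 chromosomes in each of the 22 pairs.
    """
    return (2 * generations) * 2 * pairs


for n in (2, 3, 5, 10, 15):
    a, s = ancestors(n), chunk_slots(n)
    share = min(1.0, s / a)  # upper bound on the share of ancestors represented
    print(f"{n:2d} generations: {a:6d} ancestors, ~{s:4d} chunks, "
          f"at most {share:.0%} represented")
```

For the first few generations there are far more chunks than ancestors; at ten generations the chunks (880) have already fallen behind the ancestors (1024), and the gap widens quickly after that.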

In modern New Zealand, most Māori will have more non-Māori ancestors than Ms Kaipara does, and most people with only two non-Māori ancestors will have inherited DNA from at least one of them, so it would be unusual for someone to have no non-Māori DNA, but certainly not impossible.

The next question is how the genetic testing people can know which DNA came from Māori ancestors.  The DNA bases that end up in a saliva sample are synthesised in your body from the food you eat: they don’t come with little labels saying which ancestor’s DNA they are copies of.  One adenine base looks just like any other.  The approach to this problem is statistical: there are many, many positions in the DNA sequence where particular variations are more common in one part of the world than in others. Some of these are well known because of what they do, but those are a tiny minority; nearly all of them are unimportant copying errors. In any case, two people who share the variant probably got it from the same distant ancestor, so if you collect enough DNA variants from enough people around the world, you can tell with surprising reliability where people’s ancestors came from.

Here’s a picture from research in the USA, showing three genetic summaries for people identifying with various Hispanic/Latinx groups:


There’s pretty clear separation: in this sample you can tell quite a lot about a typical person’s ancestry from their genes.  No single genetic variant will tell you much, but thousands or millions of them together tell you a lot.  In this example, the three summaries correspond roughly to amounts of ancestry from the Americas before Columbus, from Europe, and from western Africa via the slave trade. are the most important variation after the first three summaries giving basic continental ancestry are taken  out.

The test used measures 700,000 DNA variants, which is a respectable number.  It’s probably a bit short on markers for Polynesian ancestry, because there hasn’t been much genetic study of Polynesians. It will be very short on markers that distinguish Māori from other people with Polynesian ancestry, but in this example, family history was enough to make that unnecessary.  So, it’s plausible that some Māori have little or no non-Māori DNA, and it’s plausible that the test could determine that with reasonable reliability: the story is making a claim that has some content and could very well be true.  As the story says, the result doesn’t actually matter much, but it is interesting.

Without Ms Kaipara’s family history, just using genetic data, the video clip says her Polynesian ancestry was estimated as between 93% and 100%: there’s quite a bit of uncertainty.   For someone with a less clearly known family history, or from somewhere that mixing of populations happened longer ago than two centuries, the test will be less informative, but will still give some general information about what parts of the world your ancestors may have come from.  You might still want to know.

What this story should make you concerned about, though, is other news stories talking about someone’s descent from, say, Genghis Khan.  If Ms Kaipara can have recent ancestors whose DNA she doesn’t appear to carry, how can claims from 1000 years in the past be credible? And indeed they aren’t.  As you go back further and further in time,  you have more and more ancestors. By the time of Genghis Khan, there would be tens of billions of them.  Obviously there must be huge overlap, but that still allows you to be descended from a lot of people. Pretty much everyone in Europe and Asia has Genghis Khan as an ancestor; a fraction of them carry DNA descended from his; and a tiny fraction of these have copies of his Y chromosome.  The test results that more often make headlines are the last sort, which are pretty meaningless.


April 10, 2017

Attack of the killer sofa

From the Herald (from the Daily Mail)

Materials used to fireproof sofas are linked to a 74% rise in thyroid tumours

From the American Cancer Society

The chance of being diagnosed with thyroid cancer has risen in recent years and is the most rapidly increasing cancer in the US tripling in the past three decades. Much of this rise appears to be the result of the increased use of thyroid ultrasound, which can detect small thyroid nodules that might not otherwise have been found in the past.

That is, thyroid cancer looks as if it’s more common at least partly because diagnosis has improved. It could potentially still be true that fire retardants are a problem as well, but the “killer sofa” people either don’t know about the changes in diagnosis or do know but don’t think we need to be told.  Either way, I don’t think it increases their credibility.


  • Good piece at Stuff about what a 500-year flood is. The concept isn’t quite as shaky as it sounds — there’s some independent information from comparing different river systems — but it’s inevitably uncertain.
  • 23andme is back providing genetic risk information, but in a much more restricted way after FDA review.  A lot of the risk information you can get this way isn’t useful for treatment, but it’s the sort of thing some people like to know.  So, sometimes, do their insurance companies.
  • The concept of ‘net tax’ — tax paid minus cash benefits and transfers (but not non-cash ones such as Pharmac subsidies) can be a useful concept.  However, I don’t think it’s as useful when ‘tax’ leaves out GST, as in this story at Stuff.  Admittedly, it’s not trivial to calculate how much GST people pay, but I’m sure the Treasury had looked at it.
  • Scientists and journalists need to get better at communicating uncertainty, and people need to accept it’s there. (Ed Yong, in the Atlantic)

April 5, 2017

Extrapolation, much?

Headline: Research has found that Marmite could help prevent dementia

Research article:  A group of 28 adult volunteers (10 males, mean age 22 years) completed the study after providing written informed consent.

We could just stop there, but it gets better (not better).

The study found that the people getting Marmite had, as hypothesised, less response by their brains to flickering visual stimuli.  The research paper does not mention dementia (or memory, or Alzheimer’s). At all. It concludes

“This demonstrates that the balance of excitation and inhibition in the brain can be influenced by dietary interventions, suggesting possible clinical benefits in conditions (e.g. epilepsy) where inhibition is abnormal.”

Even the story doesn’t come close to the headline claims, saying just

It could also prompt further research to see if Marmite, and its effect on the brain’s GABA chemical, might provide a treatment for dementia.

And, right at the end of the story, the quote from an independent expert

“there’s no way to say from this study whether eating Marmite does affect your dementia risk.”

If it does, and if that’s because of the vitamin B12, it might also have been worth mentioning that there are other foods with as much or more vitamin B12 per serving, such as beef, and lamb, and many types of fish.



  • If someone told me a longstanding problem in mathematical statistics had been solved, but then admitted the proof was short, used fairly elementary techniques, was written with Microsoft Word, and was published in the Far East Journal of Theoretical Statistics, I might not be in a hurry to look it up.  These are all genuinely reasonable filters for mathematical papers that are worth putting effort into. But, in this case, they were all false positives. Quanta Magazine has the story.
  • From The Conversation, “The seven deadly sins of statistical misinterpretation, and how to avoid them”.
  • From Newsroom (who seem to be quite good so far): interaction of recreational genotyping and health insurance in NZ.
  • From The Conversation, how website terms of use (and their potential criminal enforcement in the US) affect research into fairness and transparency of algorithms.
  • Good Herald interview on air pollution with NIWA scientist Elizabeth Somervell