Posts written by Thomas Lumley (1511)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient

June 25, 2015

Poetry about statistics

On Twitter, Evelyn Lamb pointed me to the poem “A contribution to Statistics”, by Wisława Szymborska (who won the 1996 Nobel Prize for Literature). It begins

Out of every hundred people

those who always know better:
— fifty-two,

doubting every step
   — nearly all the rest,

glad to lend a hand
if it doesn’t take too long:
— as high as forty-nine,

Read all of it here

The same blog, “Poetry with Mathematics”, has some other statistically themed poems:

The last was written in honour of Florence Nightingale, who was the first female member of the Royal Statistical Society, and also an honorary member of the American Statistical Association.

June 23, 2015

Refugee numbers

Brent Edwards on Radio NZ’s Checkpoint has done a good job of fact-checking claims about refugee numbers in New Zealand.  Amnesty NZ tweeted this summary table


If you want the original sources for the numbers, the Immigration Department Refugee Statistics page is here (and Google finds it easily).

The ‘Asylum’ numbers are in the Refugee and Protection Status Statistics Pack, the “Approved” column of the first table. The ‘Family reunification’ numbers are in the Refugee Family Support Category Statistics Pack in the ‘Residence Visas Granted’ section of the first table. The ‘Quota’ numbers are in the Refugee Quota Settlement Statistics Pack, in the right-hand margin of the first table.

Update: @DoingOurBitNZ pointed me to the appeals process, which admits about 50 more refugees per year: 53 in 2013/4; 57 in 2012/3; 63 in 2011/2; 27 in 2010/11.


June 21, 2015

Sunbathing and babies

The Herald (from the Daily Mail)

A sunshine break is the perfect way to unwind, catch up on your reading and top up that tan.

But it seems a week soaking up the rays could also offer a surprising benefit – helping a woman have a baby.

Increased exposure to sunshine could raise the odds of becoming a mother by more than a third, a study suggests.


If you read StatsChat regularly, you probably won’t be surprised to hear that the study had nothing to do with holidays or sunbathing, or with fertility in the usual sense.

As the story goes on to say, it was about the weather and IVF success rates. The researchers looked for correlations between a variety of weather measurements and a variety of ways of measuring IVF success. They didn’t find evidence of correlations with the weather at the time of conception. As they said (conference abstract, since this isn’t published)

When looking for a linear correlation between IVF results and the mean monthly values for the weather, the results were inconsistent.

So, following the ‘try, try again’ strategy they looked at weather a month earlier

However, when the same analysis was repeated with the weather results of 1 month earlier, there was a clear trend towards better IVF outcome with higher temperature, less rain and more sunshine hours. 

It helps, here, to know that “a clear trend” is jargon for “unimpressive statistical evidence, but at least in the direction we wanted”. That’s not the only problem, though. Since these are honest researchers, you find the other big problem in the section of the abstract labelled “limitations”

Because of the retrospective design of the study, further adjusting for possible confounding factors such as age of the woman, type of infertility and indication for IVF is mandatory. 

That is, their analysis lumped together women of different ages, types of infertility, and reasons for using IVF, even though these have a much bigger impact on success than is being claimed for the weather.
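The ‘try, try again’ strategy is easy to illustrate by simulation: test enough weather variables against enough outcome measures and pure noise will produce some impressive-looking correlations. A minimal Python sketch (the study sizes here are invented, not taken from the abstract):

```python
import random
import math

random.seed(2015)

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

n_months = 24     # two years of monthly data (made-up size)
n_weather = 10    # temperature, rain, sunshine hours, at various lags...
n_outcomes = 5    # several ways of measuring success

# Everything is independent noise, so any 'clear trend' found is spurious
correlations = []
for _ in range(n_weather):
    w = [random.gauss(0, 1) for _ in range(n_months)]
    for _ in range(n_outcomes):
        y = [random.gauss(0, 1) for _ in range(n_months)]
        correlations.append(pearson_r(w, y))

biggest = max(correlations, key=abs)
print(f"{len(correlations)} tests, largest |r| = {abs(biggest):.2f}")
```

With fifty chances, the largest correlation among pure-noise pairs will usually look respectable on its own.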

I don’t have any problem with these analyses being performed and presented to other consenting scientists who are trying to work out ways to improve IVF.  On the other hand,  I’m pretty sure the Daily Mail didn’t get these results by reading the abstract book or sitting through the conference. Someone made a deliberate decision to get publicity for this research, at this stage, in a form where all the cautionary notes would be lost. 



  • From Vox, for map nerds: “Countries that are South Sudan” in red, “Countries that are not South Sudan” in green, “No data” in grey
  • Medical marijuana laws didn’t lead to an increase in teenage marijuana use in the US (MinnPost, Lancet Psychiatry). This is less surprising than it sounds, because in some of the states (eg, California) the medical-use requirement is not all that stringent.
  • Survey research finds that people who (claim to) have more sex are (or claim to be) happier. As XKCD has pointed out, this is something you can’t easily do a double-blind randomised trial of.  But you could do a trial of encouraging people to have more sex. It didn’t make them happier. This is an issue more generally with lifestyle changes: just because a lifestyle difference would be good, it doesn’t necessarily mean that a doctor telling you to make the change will be good. (via Tim Harford)
  • Google Research was looking for ways of visualising what actually happens in different layers of a computational neural network. They used feedback of images and amplification of layers to get things like
  • Nice story in the Herald about second languages in Auckland. (though also note the Herald search page finds 514 stories with the phrase “melting pot”)
June 18, 2015

Bogus poll story again

For a while, the Herald largely gave up basing stories on bogus clicky poll headlines. Today, though, there was a story about Gurpreet Singh,  who was barred from the Manurewa Cosmopolitan Club for refusing to remove his turban.

The headline is “Sikh club ban: How readers reacted”, and the first sentence says:

Two thirds of respondents to an online NZ Herald poll have backed the controversial Cosmopolitan Club that is preventing turbaned Sikhs from entering due to a ban on hats and headgear.

In some ways this is better than the old-style bogus poll stories that described the results as a proportion of Kiwis or readers or Aucklanders. It doesn’t make the number mean anything much, but presumably the sentence was at least true at the time it was written.

A few minutes ago I looked at the original story and the clicky poll next to it


There are two things to note here. First, the question is pretty clearly biased: to register disagreement with the club you have to say that they were completely in the wrong and that Mr Singh should take his complaint further. Second, the “two thirds of respondents” backing the club has fallen to 40%. Bogus polls really are even more useless than you think they are, no matter how useless you think they are.

But it’s worse than that. Because of anchoring bias, the “two thirds” figure has an impact even on people who know it is completely valueless: it makes you less informed than you were before. As an illustration, how did you feel about the 40% figure in the new results? Reassured that it wasn’t as bad as the Herald had claimed, or outraged at the level of ignorance and/or bigotry represented by 40% support for the club?


June 17, 2015

Chocolate: the new health food?

A UK cohort study published a paper yesterday with the title “Habitual chocolate consumption and risk of cardiovascular disease among healthy men and women.”  In contrast to the last chocolate study to make headlines, this is actual research, involving 20,000 people followed up for twelve years, and was published in a respectable medical journal.

The study found that people who ate more chocolate back in the mid-1990s had less cardiovascular disease over the period to 2008. Of course, this makes for a great press release and headlines

  • Stuff: Eating chocolate every day linked to lower heart disease and stroke
  • NZ Herald (from the Telegraph): Two choc bars a day keeps doctor away
  • One News: Is chocolate good for you? New study suggests 100g a day may be beneficial

Are the findings of the paper true? Well, that depends on what you mean by ‘true’, which is important to remember when you see claims that 90% of scientific results are false.

On the one hand, it’s true that the EPIC study recruited all these people and asked them questions about their diet in the 1990s, and it’s presumably true that the proportion getting cardiovascular disease was lower among those who ate more chocolate and that the study included people who ate up to 100g/day.  These are historical facts, not health claims.

The conclusion of the research paper was

Cumulative evidence suggests that higher chocolate intake is associated with a lower risk of future cardiovascular events, although residual confounding cannot be excluded. There does not appear to be any evidence to say that chocolate should be avoided in those who are concerned about cardiovascular risk.

This is also probably true: there is a correlation; it could be due to confounding; there doesn’t seem to be any big extra risk from eating chocolate (instead of something else with similar calorie content).

At the other extreme, though, the Herald and One News headlines are misleading: they imply that adding 100g/day of chocolate to your existing diet would be beneficial. First, while the maximum consumption was 100g/day, 95% of the study participants consumed less than 40g/day, and 90% less than 25g/day. Second, and more important, the study looked at people’s normal diet, not at changes in diet.

If you add 100g/day of chocolate to your diet, you either need to cut more than 500 Calories of other foods, or exercise a lot more, to avoid gaining weight. The study participants basically did this: the high-chocolate and low-chocolate groups had similar BMI and waist:hip ratio, and the high-chocolate group exercised more.
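The energy arithmetic is simple to check; a quick sketch, using an assumed energy density of about 5.4 kcal per gram for milk chocolate and a rough 60 kcal per kilometre for walking (both are my round numbers, not the study’s):

```python
# Assumed energy density for milk chocolate (~5.4 kcal/g); real bars vary
KCAL_PER_GRAM = 5.4

extra_grams_per_day = 100
extra_kcal = extra_grams_per_day * KCAL_PER_GRAM
print(f"Extra energy: {extra_kcal:.0f} kcal/day")

# Offsetting that by exercise, at a rough 60 kcal per km of walking
km_per_day = extra_kcal / 60
print(f"Roughly {km_per_day:.0f} km of extra walking per day")
```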

Third, there is confounding. The people who ate more chocolate might have been healthier for other reasons. For example, those who ate no chocolate were more likely to have diabetes, which is probably why some of them ate no chocolate. The difference in cardiovascular disease rates was far too small for confounding to be ruled out as an explanation, no matter how carefully the analysis was done (and it was done pretty well).

Fourth, and in some ways most important, is the role of chance. This was a big study, but it still came up with only moderately strong evidence that the correlation was real, and that’s considering the study on its own. We don’t know whether there was any publication bias leading positive results about chocolate to be easier to publish in the scientific journals, but we can be sure there was publication bias in the media coverage. Always, if you see a diet and health study on TV, it must have had unusually interesting results. Even when it’s valuable as a component in the cumulative scientific literature, the biased selection for interesting results usually means you can’t believe it in application to your own life.

I got up at 5:15 today in order to be on breakfast TV to talk about this study. That would never have happened if the results had been different.

June 16, 2015


  • NZ Defence Force say they have seized “260kg of high-grade heroin worth about $235m” (via Stuff). That would be $900/gram. Presumably the figure is supposed to be street price ignoring any distribution costs and assuming it’s all sold. Even so it seems steep. NZDF also say “they were destined for east Africa and likely into Europe.” European street prices for heroin vary depending on who you ask, but they aren’t anywhere near that high.
  • 3News had a story on the rise in Indian-style weddings in New Zealand. I noticed the line “The average Indian wedding costs up to $100,000 – that is more than three times the average New Zealanders spend on tying the knot.” We know the figure of $30,000 for an average NZ wedding is bogus; it’s hard to tell whether the figure for Indian weddings is more or less inflated.
  • When the FDA doesn’t approve an application to market a new drug, it sends what it calls a “complete response letter”.  An analysis in the medical journal BMJ compares the letters to company press releases. In 21% of cases no information in the press release matched the letter. In 19 of 32 cases where the FDA had said more clinical trials would be needed, there was no mention of this in the press release.
  • The top ten finalists from a competition for optical illusions. Knowing about optical illusions is useful for data visualisation: they are extreme versions of things you want to avoid.
  • From the Washington Post, an illustration of why you want to avoid the optical illusion of 3-d in your graphics. The box on the right is 21% smaller than the box on the left. Really. Do the maths.
  • A good example of risk communication, from the British NHS:
    (via David Spiegelhalter who also writes about the conflict between targeting information at the people who theoretically need it versus the people who will actually take advantage of it)
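The street-price arithmetic in the NZDF item above is easy to check:

```python
total_value_nzd = 235_000_000   # reported street value, $235m
seized_kg = 260

price_per_gram = total_value_nzd / (seized_kg * 1000)
print(f"${price_per_gram:.0f} per gram")
```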
June 13, 2015

In a bit of a pickle

Q: Isn’t this microbiome stuff cool?

A: <suspiciously> Yes?

Q: In the Herald. “Eating sauerkraut, pickles and yoghurt may be the answer for young adults suffering from social anxiety.” We didn’t know that before scientists started measuring gut microbes, did we?

A: I’m not sure we know that now.

Q: They even found “the effect was greatest among those at genetic risk for social anxiety disorder as measured by neuroticism”.  This sort of interdisciplinary approach is a real step forward, surely? Genes and microbes and personality changes?

A: Yes, in principle, but there weren’t any genes or microbes or personality changes measured in this research.

Q: Not any?

A: No

Q: Oh.

A: They asked, on one occasion, about what they called ‘fermented foods’ and measured social anxiety, and found a correlation.

Q: How strong?

A: ‘Fermented food’ intake explained nearly 2% of the variation in social anxiety

Q: You mean nearly 20%?

A: I mean nearly 2%. The correlation was -0.13, and you square it to get proportion of variation explained.
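For the record, the conversion from correlation to variance explained really is just squaring:

```python
r = -0.13                  # reported correlation
variance_explained = r ** 2
print(f"{variance_explained:.1%} of variation explained")
```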

Q: Oh. Um. Sometimes the story says ‘fermented’ and sometimes it says ‘pickles’ and then there’s this mention of chocolate? What’s up with that?

A: They asked about ten food classes: fruit and veg, and nine things they lumped together into ‘fermented foods’. From the research paper

2. yogurt, kefir, or foods or beverages that contain yogurt; 3. soy milk, or foods or beverages that contain soy milk; 4. miso soup; 5. sauerkraut; 6. dark chocolate; 7. juices that contain microalgae; 8. pickles; 9. tempeh; and 10. kimchi

Q: But soy milk isn’t fermented. Or dark chocolate. And what do they include in ‘pickles’?

A: Whatever a US undergraduate student would include, so probably more vinegar-preserved cucumbers and peperoncini than real lactic-fermented pickles

Q: And they just added these all up?

A: Yes. And then took the inverse hyperbolic sine.

Q: The what now?

A: Some people reported eating much more fermented food than the rest, so they used a mathematical transformation to reduce the impact of these measurements. The effect is that they’re focusing mostly on low levels of consumption (weekly), not high levels (multiple per day).
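The inverse hyperbolic sine behaves like the raw value near zero and like a logarithm for large values, which is why it squashes the heavy consumers. A quick illustration (the serving counts are made up):

```python
import math

# Hypothetical 'servings per week' of fermented food: mostly low, one heavy
servings = [0, 1, 2, 4, 30]
for s in servings:
    print(f"{s:3d} servings -> asinh = {math.asinh(s):.2f}")

# Near zero, asinh(x) is approximately x; for large x it is about ln(2x),
# so the gap between 4 and 30 servings is compressed far more than
# the gap between 0 and 4
```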

Q: Is that a problem?

A: No, it’s fine. It’s just that they seemed to have done it because they have a thing about Normal distributions rather than because that’s what they wanted to focus on.

Q: I thought it was statisticians who had a thing about Normal distributions?

A: Not for a few decades now, but yes, our bad originally.

Q: Ok, how about the genes. How did they avoid measuring any genes?

A: Look more carefully at what you quoted. They defined “at genetic risk of social anxiety” by a measure of neuroticism

Q: How genetic is … no, wait, we’ve been there. You’re going to tell me the estimated heritability, then explain it doesn’t answer my question.

A: Glad to see someone’s paying attention.

Q: Could it just be that there are cultural differences in reporting social anxiety and also in eating things like tempeh, miso, and kimchi?

A: It’s not out of the question.

June 11, 2015

Comparing all the treatments

This story didn’t get into the local media, but I’m writing about it because it illustrates the benefit of new statistical methods, something that’s often not visible to outsiders.

From a University of Otago press release about the work of A/Prof Suetonia Palmer

The University of Otago, Christchurch researcher together with a global team used innovative statistical analysis to compare hundreds of research studies on the effectiveness of blood-pressure-lowering drugs for patients with kidney disease and diabetes. The result: a one-stop-shop, evidence-based guide on which drugs are safe and effective.

They link to the research paper, which has interesting looking graphics like this:


The red circles represent blood-pressure-lowering treatments that have been tested in patients with kidney disease and diabetes, with the lines indicating which comparisons have been done in randomised trials. The circle size shows how many trials have used a drug; the line width shows how many trials have compared a given pair of drugs.

If you want to compare, say, endothelin inhibitors with ACE inhibitors, there aren’t any direct trials. However, there are two trials comparing endothelin inhibitors to placebo, and ten trials comparing placebo to ACE inhibitors. If we estimate the advantage of endothelin inhibitors over placebo and subtract off the advantage of ACE inhibitors over placebo we will get an estimate of the advantage of endothelin inhibitors over ACE inhibitors.

More generally, if you want to compare any two treatments A and B, you look at all the paths in the network between A and B, add up differences along the path to get an estimate of the difference between A and B, then take a suitable weighted average of the estimates along different paths. This statistical technique is called ‘network meta-analysis’.
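A minimal numerical sketch of this (the effect sizes and standard errors are invented for illustration, not taken from the paper):

```python
import math

# Hypothetical log odds ratios versus placebo, as (estimate, SE)
endothelin_vs_placebo = (-0.30, 0.20)
ace_vs_placebo = (-0.20, 0.08)

# Indirect comparison: subtract the estimates; the variances (SE^2) add
indirect = endothelin_vs_placebo[0] - ace_vs_placebo[0]
indirect_se = math.sqrt(endothelin_vs_placebo[1] ** 2
                        + ace_vs_placebo[1] ** 2)
print(f"endothelin vs ACE (indirect): {indirect:.2f}, SE {indirect_se:.2f}")

def combine(paths):
    """Inverse-variance weighted average of (estimate, SE) pairs --
    one 'suitable weighted average' across paths."""
    weights = [1 / se ** 2 for _, se in paths]
    pooled = sum(w * est for (est, _), w in zip(paths, weights)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))
```

If a direct trial of the two drugs also existed, `combine` would pool the direct and indirect paths, giving more weight to the more precise one.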

Two important technical questions remain: what is a suitable weighted average, and how can you tell if these different estimates are consistent with each other? The first question is relatively straightforward (though quite technical). The second question was initially the hard one. It could be, for example, that the trials involving placebo had very different participants from the others, or that old trials had very different participants from recent trials, and their conclusions just could not be usefully combined.

The basic insight for examining consistency is that the same follow-the-path approach could be used to compare a treatment to itself. If you compare placebo to ACE inhibitors, ACE inhibitors to ARB, and ARB to placebo, there’s a path (a loop) that gives an estimate of how much better placebo is than placebo. We know the true difference is zero; we can see how large the estimated difference is.
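The same arithmetic gives the loop check (again with invented numbers): add up the estimated differences around a closed loop, and the total should be zero apart from sampling error.

```python
import math

# Hypothetical direct estimates (log odds ratio, SE) around one loop
placebo_vs_ace = (0.20, 0.10)
ace_vs_arb = (0.05, 0.12)
arb_vs_placebo = (-0.22, 0.09)

# Adding around the loop compares placebo with itself: the truth is zero
loop_est = placebo_vs_ace[0] + ace_vs_arb[0] + arb_vs_placebo[0]
loop_se = math.sqrt(sum(se ** 2 for _, se in
                        (placebo_vs_ace, ace_vs_arb, arb_vs_placebo)))
print(f"loop estimate {loop_est:+.2f}, SE {loop_se:.2f}, "
      f"z = {loop_est / loop_se:.2f}")
# |z| is well under 2 here, so no sign of inconsistency in this made-up loop
```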

In this analysis, there wasn’t much evidence of inconsistency, and the researchers combined all the trials to get results like this:


The ‘forest plot’ shows how each treatment compares to placebo (vertical line) in terms of preventing death. We can’t be absolutely sure that any of them are better, but it definitely looks as though ACE inhibitors plus calcium-channel blockers or ARBs, and ARBs alone, are better. It could be that aldosterone inhibitors are much better, but it also could be that they are worse. This sort of summary is useful as an input to clinical decisions, and also in deciding what research should be prioritised in the future.

I said the analysis illustrated progress in statistical methods. Network meta-analysis isn’t completely new, and its first use was also in studying blood pressure drugs, but in healthy people rather than people with kidney disease. Here are those results


There are different patterns for which drug is best across the different events being studied (heart attack, stroke, death), and the overall patterns are different from those in kidney disease/diabetes. The basic analysis is similar; the improvements since this 2003 paper are more systematic and flexible ways of examining inconsistency, and new displays of the network of treatments.

‘Innovative statistical techniques’ are important, but the key to getting good results here is a mind-boggling amount of actual work. As Dr Palmer put it in a blog interview

Our techniques are still very labour intensive. A new medical question we’re working on involves 20-30 people on an international team, scanning 5000-6000 individual reports of medical trials, finding all the relevant papers, and entering data for about 100-600 reports by hand. We need to build an international partnership to make these kind of studies easier, cheaper, more efficient, and more relevant.

At this point, I should confess the self-promotion aspect of the post.  I invented the term “network meta-analysis” and the idea of using loops in the network to assess inconsistency.  Since then, there have been developments in statistical theory, especially by Guobing Lu and A E Ades in Bristol, who had already been working on other aspects of multiple-treatment analysis. There have also been improvements in usability and standardisation, thanks to Georgia Salanti and others in the Cochrane Collaboration ‘Comparing Multiple Interventions Methods Group’.  In fact, network meta-analysis has grown up and left home to the extent that the original papers often don’t get referenced. And I’m fine with that. It’s how progress works.


Women and dementia risk

A Herald story headlined “Women face greater dementia risk – study” has been nominated for Stat of the Week, I think a bit unfairly. Still, perhaps it’s worth clarifying the points made in the nomination.

People diagnosed with dementia are more likely to be women, and the story mentions three reasons. The first is overwhelmingly the most important from the viewpoint of population statistics: dementia is primarily a disease of old people, the majority of whom are women because women live longer.

In addition, and importantly from the viewpoint of individual health, women are more likely to have diagnosed dementia than men in a given age range:

European research has indicated that although at age 70, the prevalence of dementia is the same for men and women, it rapidly diverges in older age groups. By 85, women had a 40 per cent higher prevalence than men.

There could be many reasons for this. A recent research paper lists possibilities related to sex (differences in brain structure, impact of changes in hormones after menopause) and to gender (among current 85-year-olds, women tend to be less educated and less likely to have had intellectually demanding careers).

The third statistic mentioned in the Stat of the Week nomination was that “Women with Alzheimer’s disease (AD) pathology have a three-fold risk of being diagnosed with AD than men.”  This is from research looking at people’s brains.  Comparing people with similar amounts of apparent damage to their brains, women were more likely to be diagnosed with Alzheimer’s disease.

So, the differences in the summary statistics are because they are making different comparisons.

Statistical analysis of Alzheimer’s disease is complicated because the disease happens in the brain, where you can’t see. Definitive diagnosis and measurement of the biological disease process can only be done at autopsy. Practical clinical diagnosis is variable because dementia is a very late stage in the process, and different people take different amounts of neurological damage to get to that point.