Posts written by Thomas Lumley (1221)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

July 23, 2014

Human statisticians not obsolete

There’s a website, OnlyBoth, that, as it says

Discovers New Insights from Data.
Writes Them Up in Perfect English.
All Automated.

You can test this by asking it for ‘insights’ in some example areas. One area is baseball, so naturally I selected the Seattle Mariners, and 2009, when I still lived in Seattle. OnlyBoth returns several names where it found insights, and I chose ‘Matt Tuiasosopo’ — the most obvious thing about him is that he comes from a famous local football family, but I was interested in what new insight the data revealed.

Matt Tuiasosopo in 2009 was the 2nd-youngest (23 yrs) of the 25 hitters who were born in Washington and played for the Seattle Mariners.

outdone by Matt Tuiasosopo in 2008 (22 yrs).

I don’t think our students need to be too worried yet.

Average and variation

Two graphs from the NZ influenza surveillance weekly update (PDF, via Mark Hanna)


Both show that the seasonal epidemic has started.  I think the second graph is more helpful in comparing this year to the past, because it shows the actual history for a range of years rather than an average.  This sort of graph could handle a larger number of past years if they were all or mostly in, eg, thin grey lines, perhaps with this year, last year, and the worst recent year in colour.

The other news in the surveillance update is that the flu viruses that have been examined have overwhelmingly been H1N1 or H3N2, and both these groups are covered in this year’s vaccine.

The self-surveillance world

See anyone you know? (click to embiggen)



This is a screenshot from I know where your cat lives, a project at Florida State University that is intended to illustrate the amount of detailed information available from location-tagged online photographs, without being too creepy — just creepy enough.

(via Robert Kosara and Keith Ng)

July 22, 2014

Lack of correlation does not imply causation

From the Herald

Labour’s support among men has fallen to just 23.9 per cent in the latest Herald-DigiPoll survey and leader David Cunliffe concedes it may have something to do with his “sorry for being a man” speech to a domestic violence symposium.

Presumably Mr Cunliffe did indeed concede it might have something to do with his statement; and there’s no way to actually rule that out as a contributing factor. However

Broken down into gender support, women’s support for Labour fell from 33.4 per cent last month to 29.1 per cent; and men’s support fell from 27.6 per cent last month to 23.9 per cent.

That is, women’s support for Labour fell by 4.3 percentage points (give or take about 4.2) and men’s by 3.7 percentage points (give or take about 4.2). This can’t really be considered evidence for a gender-specific Labour backlash. Correlations need not be causal, but here there isn’t even a correlation.
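For anyone who wants to check the arithmetic, here is a minimal Python sketch. The support figures are the ones quoted from the Herald, and the margin on each change is the "give or take about 4.2" above. The margin on the gap between the two changes is my own rough addition: it assumes the two changes are independent, which is a simplification.

```python
import math

# Percent support for Labour, as quoted from the Herald story.
women_before, women_after = 33.4, 29.1
men_before, men_after = 27.6, 23.9

# Margin of error on each change, taken from the "give or take about 4.2".
margin_each = 4.2

women_drop = round(women_before - women_after, 1)
men_drop = round(men_before - men_after, 1)
gap = round(women_drop - men_drop, 1)

# Assumption: the two changes are independent, so their margins
# combine in quadrature. This is a rough guide, not a formal test.
margin_gap = math.sqrt(2) * margin_each

print(f"women fell {women_drop}, men fell {men_drop}, gap {gap}")
print(f"approximate margin on the gap: {margin_gap:.1f}")
print("gap larger than its margin:", abs(gap) > margin_gap)
```

The gap between the two falls is a small fraction of its margin of error, which is the point: the poll can't distinguish the change among men from the change among women.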

July 14, 2014


Why supermoons aren’t a big deal for earthquakes, based on XKCD


Multiple testing, evidence, and football

There’s a Twitter account, @FifNdhs, that has five tweets, posted well before today’s game

  • Prove FIFA is corrupt
  • Tomorrow’s scoreline will be Germany win 1-0
  • Germany will win at ET
  • Gotze will score
  • There will be a goal in the second half of ET

What’s the chance of getting these four predictions right, if the game isn’t rigged?

Pretty good, actually. None of these events is improbable on its own, and  Twitter lets you delete tweets and delete accounts. If you set up several accounts, posted a few dozen tweets on each, describing plausible events, and then deleted the unsuccessful ones, you could easily come up with an implausible-sounding remainder.
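A rough Python sketch of the delete-the-failures strategy. Every number in it is hypothetical: the per-prediction probabilities and the count of prediction sets are made up for illustration and are not estimates for the actual match. It also treats the four predictions as independent, which the real ones are not.

```python
# Hypothetical inputs, chosen only for illustration; none come from the post.
p_each = [0.5, 0.3, 0.15, 0.4]  # assumed chance each of four predictions comes true
n_sets = 200                     # assumed number of distinct prediction sets posted

# Chance that a single set gets all four right, treating the predictions
# as independent (a simplification; the real ones overlap).
p_all = 1.0
for p in p_each:
    p_all *= p

# Chance that at least one posted set is entirely correct and can be kept
# while the rest are deleted.
p_any = 1 - (1 - p_all) ** n_sets

print(f"one set all correct: {p_all:.3f}")
print(f"at least one of {n_sets} sets all correct: {p_any:.2f}")
```

Under these made-up inputs a single set of four predictions is a long shot, but someone posting a couple of hundred sets would usually end up with at least one perfect-looking survivor.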

Twitter can prove you made a prediction, but it can’t prove you didn’t also make a different one, so it’s only good evidence of a prediction if either the predictions were widely retweeted before they happened, or the event described in a single tweet is massively improbable.

If @FifNdhs had predicted a 7-1 victory for Germany over Brazil in the semifinal, that would have been worth paying attention to. Gotze scoring, not so much.

July 13, 2014

Age/period/cohort voting

From the New York Times, an interactive graph showing how political leanings at different ages have changed over time


Yes, voting preferences for kids are problematic. Read the story (and this link) to find out how they inferred them. There’s more at Andrew Gelman’s blog.

100% accurate medical testing

The Wireless has a story about a fatal disease where there’s an essentially 100% accurate test available.

Alice Harbourne has a 50% chance of Huntington’s Disease. If she gets tested, she will have either a 0% or 100% chance, and despite some recent progress on the mechanism of the disease, there is no treatment.

July 11, 2014

Another prostate cancer study

Today’s prostate cancer risk factor, per the Herald, is vasectomy. The press release is here; the paper isn’t open-access.

This is a much more reliable study than the one earlier in the week about cycling, and there’s a reasonable case that this one is worth a press release.

In 1986, the researchers recruited about 50000 men (health professionals: mostly dentists and vets), then followed them up to see how their health changed over time.  This research involves the 43000 who hadn’t had any sort of cancer at the start of the study. As the Herald says, about a quarter of the men had a vasectomy, and there have been 6000 prostate cancer diagnoses. So there’s a reasonable sample size, and there is a good chance you would have heard about this result if no difference had been found (though probably not via the Daily Mail).

The relative increase in risk is estimated as about 10% overall and about 20% for ‘high-grade’ tumours, which is much more plausible than the five-fold increase claimed for cycling.  The researchers had information about the number of prostate cancer tests the men had had, so they can say this isn’t explained by a difference in screening — the cycling study only had total number of doctor visits in the past year. Also, the 20% difference is seen in prostate cancer deaths, not just in diagnoses, though if you only consider deaths the evidence is borderline.  Despite all this, the researchers quite rightly don’t claim the result is conclusive.

There are two things the story doesn’t say. First, if you Google the name of the lead researcher and ‘prostate cancer’, one of the top hits is another paper on prostate cancer (and coffee, protective). That is, the Health Professionals Followup Study, like its sister cohort, the Nurses Health Study, is in the business of looking for correlations between a long list of interesting exposures and potential effects. Some of what it finds will be noise, even if it appears to pass sanity checks and statistical filters. They aren’t doing anything wrong, that’s just what life is like.

Second, there were 167 lethal prostate cancers in men with vasectomies. If the excess risk of 20% is really due to vasectomy, rather than something else, that would mean about 27 cancers caused by 12000 vasectomies. Combining lethal and advanced cases, the same approach gives an estimated 38 cases from 12000 vasectomies. So, if this is causation, the risk is 2 or 3 serious prostate cancers for every 1000 vasectomies. That’s not trivial, but I think it sounds smaller than “20% raised risk”.
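The conversion from relative to absolute risk can be sketched in a few lines of Python. The counts are the ones in the paragraph above. The formula (RR - 1) / RR is the usual attributable fraction among the exposed, and using a rounded relative risk of 1.2 is my assumption, so the result lands slightly above the 27 quoted, presumably because of rounding.

```python
# Figures from the paragraph above; the 20% is a rounded relative increase.
lethal_cases_exposed = 167   # lethal prostate cancers among men with vasectomies
relative_risk = 1.2          # assumed rounded value; the exact estimate may differ
vasectomies = 12000

# If the excess is causal, the share of exposed cases due to the exposure
# is (RR - 1) / RR.
attributable_fraction = (relative_risk - 1) / relative_risk
excess_lethal = lethal_cases_exposed * attributable_fraction

# Absolute risk per 1000 vasectomies for the two estimates in the text:
# 27 lethal cases, and 38 lethal or advanced cases.
per_1000 = [1000 * cases / vasectomies for cases in (27, 38)]

print(f"excess lethal cases with RR 1.2: {excess_lethal:.1f}")
print("excess cases per 1000 vasectomies:", [f"{x:.2f}" for x in per_1000])
```

Both per-1000 figures fall between 2 and just over 3, which is where "2 or 3 serious prostate cancers for every 1000 vasectomies" comes from.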

July 10, 2014

Summaries of income

I don’t want to get into the general business of election fact-checking, but we have a Stat-of-the-Week  nomination for a statement that is (a) about a specifically statistical issue at the high-school level, and (b) unambiguously wrong.  From Richard Prebble’s “The Letter”:

 Cunliffe is basing Labour’s election campaign around the claim that inequality is growing. Fact check: inequality is falling and New Zealand remains a very equal country. The claim that around a quarter of a million children are in poverty is dubious, to say the very least. Cunliffe says households in poverty have less than 60 percent of the medium income after housing costs. If Bill Gates came to live in New Zealand, the medium income of the country would rise and, according to that logic, more children would be in poverty.

David Cunliffe, as you presumably know, talked about the median, not “medium”; the use of a fraction of median income as a relative poverty threshold is very common internationally. The reason for using the median is precisely that the median income of the country would not rise if a few billionaires were added to the population. The median, the income of the household in the middle of the income distribution, is very insensitive to changes in or additions of a few values. That’s what it’s for.
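A small Python illustration of the difference between the median and the mean. The incomes are entirely made up: an evenly spread population of 1001 households plus one hypothetical arrival with a ten-billion-dollar income. Nothing here is New Zealand data.

```python
from statistics import mean, median

# Made-up household incomes: 1001 households spread evenly
# from 20,000 to 120,000. Purely illustrative, not real data.
incomes = list(range(20_000, 120_001, 100))

# One hypothetical new arrival with a very large income.
billionaire_income = 10_000_000_000
with_billionaire = incomes + [billionaire_income]

median_before, median_after = median(incomes), median(with_billionaire)
mean_before, mean_after = mean(incomes), mean(with_billionaire)

print("median before:", median_before, "after:", median_after)
print("mean before:", round(mean_before), "after:", round(mean_after))

# A relative poverty line set at 60 percent of the median barely moves.
print("poverty line before:", 0.6 * median_before, "after:", 0.6 * median_after)
```

In this toy population the mean jumps by a factor of more than a hundred, while the median, and with it any poverty line defined as a fraction of the median, shifts by a trivial amount.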

While I’m writing, I might as well mention the inequality statistics.  Mr Cunliffe isn’t making up his figures on children in poverty; they can be found in the 2014 Household Incomes Report from the Ministry of Social Development [update: that figure is 260000, which matches what The Letter reported was said, but the actual speech said 285000]. The report also gives trends in the Gini index of inequality and in the proportion of income spent on housing.  StatsNZ gives trends in the ratio of 80th to 20th percentile of income, before and after housing costs. The details of trends in inequality depend on how you measure it, but by these measures it is neither falling, nor notably low internationally.