Posts written by Thomas Lumley (1873)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient

October 24, 2016

Why so negative?

My StatsChat posts, and especially the ‘Briefly’ links, tend to be pretty negative about big data and algorithmic decision-making. I’m a statistician, and I work with large-scale personal genomic data, so you’d expect me to be more positive. This post is about why.

The phrase “devil’s advocate” has come to mean a guy on the internet arguing insincerely, or pretending to argue insincerely, just for the sake of being a dick. That’s not what it once meant. In the early eighteenth century, Pope Clement XI created the position of “Promoter of the Faith” to provide a skeptical examination of cases for sainthood. By the time a case for sainthood got to the Vatican, there would be a lot of support behind it, and one wouldn’t have to be too cynical to suspect there had been a bit of polishing of the evidence. The idea was to have someone whose actual job it was to ask the awkward questions — “devil’s advocate” was the nickname.  Most non-Catholics and many Catholics would argue that the position obviously didn’t achieve what it aimed to do, but the idea was important.

In the research world, statisticians are often regarded this way. We’re seen as killjoys: people who look at your study and find ways to undermine your conclusions. And we do. In principle you could imagine statisticians looking at a study and explaining why the results were much stronger than the investigators thought, but since people are really good at finding favourable interpretations without help, that doesn’t happen so much.

Machine learning includes some spectacular achievements, and has huge potential for improving our lives. It also has a lot of built-in support both because it scales well to making a few people very rich, and because it fits in with the human desire to know things about the world and about other people.

It’s important to consider the risks and harms of algorithmic decision making as well as the very real benefits. And it’s important that this isn’t left to people who can be dismissed as not understanding the technical issues.  That’s why Cathy O’Neil’s book Weapons of Math Destruction is important, and on a much smaller scale it’s why you’ll keep seeing stories about privacy or algorithmic prejudice here on StatsChat. As Section 162 (4) (a) (v) of the Education Act indicates, it’s my actual job.



  • I would never have guessed this was a problem, but “Data from three national surveys indicated that people are unaware that age is a risk factor for cancer. Moreover, those who were least aware perceived the highest risk of cancer regardless of age.” (free abstract but paywalled paper, via @RolfDegen)
  • Useful graph of uncertainty in vote margin and winner from Nate Silver on Twitter.
  • There’s a computer-personalised education system supported by Facebook that seems to be getting good results. On the other hand, the evidence for the effectiveness isn’t very good quality, and the handling of data privacy is weak. There’s going to be a lot of this sort of issue coming up in the data-based policy world. (Washington Post)

Not the Nobel Prize for Statistics

Q: There isn’t a Nobel Prize for Statistics, is there?

A: No. We already talked about that.

Q: But there is a new big prize?

A: Yes, a group of five statistics organisations collaborated to create the “International Prize in Statistics”.

Q: And did someone win it?

A: Yes. To the vast surprise of no-one, it was won by Sir David Cox. (PDF)

Q: So what did he do?

A: He invented the Cox model. (And the other Cox model, but it was the Cox model he got the prize for.)

Q: And what is the Cox model?

A: It’s a regression model for censored time-to-event data. That is, you’re interested in modelling the time until something happens (death, unemployment, graduation) and you don’t get to observe the actual time for some people — they were still alive, employed, or studying when you stopped collecting data.

Q: That sounds useful. But why hadn’t someone already done it?

A: It was 1972.

Q: Oh.

A: And they had, it’s just Cox’s model was better in some ways. In particular, it didn’t make assumptions about the rate of events over time, just about how different groups of people compared.

Q: Um…

A: Consider smokers and non-smokers. The model might say smokers get cancer at ten times the rate of non-smokers, but not have to assume anything about how those rates change with age.  Earlier models would have assumed the rates were constant over time, or that they had simple mathematical forms.

Q: And they don’t?

A: Exactly.

Q: Ok, that sounds like a step forward. The model was popular, I suppose.

A: Yes, the paper presenting it has over 30,000 citations. It has more citations with a typo in the page number than my most-cited first-author statistics paper has in total.

Q: That many people have read it?

A: I didn’t say they’d read it. Nowadays, they mostly haven’t; they have read other papers or textbooks that mention it.

Q: So why hasn’t someone come up with a better model since 1972?

A: They have, but the Cox model is good enough to stay popular. And it was helped to popularity by being computationally well-behaved and mathematically interesting.

Q: Mathematically interesting?

A: The model is “semiparametric”: it has both rigid constraints (the ratios of rates are constant over time) and completely flexible parts (the pattern of events over time).  The estimator that Cox proposed is very simple, and in particular doesn’t involve estimating the flexible part of the model. It’s very unusual for that to work well, so mathematical statisticians wanted to study it and work out how to duplicate its success.

Q: And did they?

A: Not really. They understand how it works, but it’s not something you can make work in general. Cox was lucky and/or brilliant.

Q: Did Cox do anything else important?

A: Lots. He wrote or co-wrote 17 books on different areas of statistics, several of which became classics. He’s written a few hundred other research papers. He’s had 63 PhD students (he was my advisor’s advisor’s advisor’s advisor). And…

Q: Ok, enough already. Where did he study statistics?

A: He didn’t really. He got a degree in maths (in two years, because there was a war on), then went to work for the Wool Industry Research Association before doing a PhD. Later, he moved to the US for 15 years because he couldn’t get a long-term job in Britain.

Q: Well, that part of his experience is still easy to duplicate in many countries.

A: Sadly, yes.
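To make the earlier answer about the partial likelihood concrete, here is a toy computation. The data, covariate, and grid search below are all invented for illustration — this is a minimal sketch of the idea, not how real survival software fits the model.

```python
from math import exp, log

# Toy survival data: (time, event, x), where event=1 means the event was
# observed, event=0 means the subject was censored, and x is a binary
# covariate (say, smoker). These numbers are made up for illustration.
data = [(2, 1, 1), (3, 1, 0), (4, 1, 1), (5, 0, 1), (6, 1, 0), (7, 0, 0)]

def log_partial_likelihood(beta):
    """Cox's log partial likelihood: one term per observed event,
    comparing the subject who failed with everyone still at risk."""
    ll = 0.0
    for t_i, event, x_i in data:
        if not event:
            continue  # censored subjects contribute only through risk sets
        risk_set = [x for t, _, x in data if t >= t_i]
        ll += beta * x_i - log(sum(exp(beta * x) for x in risk_set))
    return ll

# Note that no baseline hazard appears anywhere above -- that's the
# "flexible part" the estimator never has to touch.
# Maximise by a crude grid search over beta.
betas = [i / 100 for i in range(-300, 301)]
beta_hat = max(betas, key=log_partial_likelihood)
print(round(beta_hat, 2))  # roughly 0.84: x=1 subjects fail faster
```

The estimated rate ratio is exp(beta_hat), about 2.3 for this toy data set — the comparison between groups is estimated without ever saying what the event rate itself is at any time.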

October 23, 2016

Psychic meerkats and Halloween masks

Prediction is hard — especially,  as the Danish proverb says, when it comes to the future. In the Rugby World Cup we had psychic meerkats. For the US elections the new bogus prediction trend is Halloween masks: allegedly, more masks are sold with the face of the candidate who goes on to win.

The first question with a claim like this one, especially given some of the people making it, is whether the historical claim is true.  In this case it’s true-ish.  The claim was made before the 2012 election, and while the data aren’t comprehensive, they are from the same big chain of stores each year. From 1980 to 2012, the mask rule has predicted the eventual winner of the presidency.  That’s actually an argument against it.

If there’s more to the mask sales than there is to psychic meerkats, it would have to be as a prediction of the popular vote — you’d need data from individual states to predict the weird US Electoral College. But if the mask rule got the 2000 election right, it must have got the popular vote wrong that year — George W. Bush won the electoral college, but lost the popular vote to Al Gore. From that point of view, we’re looking at 8 out of 9.

More importantly, 9 out of 9 isn’t all that impressive. Suppose you got your predictions by flipping a coin.  Your chance of getting either all heads for the Republican wins or all heads for the Democratic wins is 1 in 256, increasing to 1 in 128 if you’re allowed to choose which way to treat the 2000 election.  The chance of getting 8 of 9 agreement is much better: about 1 in 13.  If only one in a million people in the US had tried coming up with just one prediction rule each, you’d expect someone to get it perfect and dozens to get it nearly right.
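The arithmetic above can be checked by brute-force enumeration of all 512 coin-flip patterns. The counting conventions in the code are my own reading of the paragraph (direction of the rule free to choose; the 2000 election optionally counted as correct either way), so the near-miss figure is in the ballpark of the post's "about 1 in 13" rather than an exact reproduction of it.

```python
from itertools import product

# Nine presidential elections, 1980-2012. Enumerate every possible
# pattern of a coin-flip rule being right (1) or wrong (0).
patterns = list(product([0, 1], repeat=9))  # 512 equally likely patterns

# A rule scores perfectly if it gets all nine right -- or all nine wrong,
# since we're allowed to pick which candidate's masks signal a win.
perfect = sum(1 for p in patterns if sum(p) in (0, 9))
print(perfect / len(patterns))  # 2/512 = 1/256

# Counting the split-verdict 2000 election (index 5) as correct either
# way leaves eight elections that have to match.
free_2000 = sum(1 for p in patterns if sum(p[:5] + p[6:]) in (0, 8))
print(free_2000 / len(patterns))  # 4/512 = 1/128

# Near misses: at least seven of the remaining eight in agreement,
# in either direction, with 2000 still free.
near = sum(1 for p in patterns if sum(p[:5] + p[6:]) in (0, 1, 7, 8))
print(near / len(patterns))  # 36/512, about 7%
```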

Given these odds, it wouldn’t be surprising if, say, a US professional sports team had results agreeing with the Presidential results — and in fact, there was a rule based on the results for the Washington Redskins football team that worked from 1940 to 2000, was fudged to work in 2004, and then failed completely in 2012.    That’s 17/19 correct, but since the rule was first publicised in the run-up to the 2000 election, it’s 2/4 correct in actual use.

If you’re allowed to combine multiple variables it gets even easier to find rules. With anything from basic linear regression to a neural network you’d expect to get perfect prediction from five unrelated variables. Even restricting the models to be simple doesn’t help much.  I downloaded some OECD data on national GDP for various countries, and found that since 1980 the Republicans have won the popular vote precisely in years when the GDP of Sweden increased more than the GDP of Norway.

My advice is to stick with the psychic meerkats for entertainment and the opinion poll aggregators or the betting markets for prediction.

October 22, 2016

Stat of the Week fixed

Because of changes at WordPress, the Stat of the Week competition has been eating the URLs you submitted.




We’ve fixed it now.

Cheese addiction hoax again

Three more sites have fallen for the cheese addiction hoax.

As you may remember, this story is very very loosely based on real research from the University of Michigan. However, the hoax version misrepresents which foods were most addictive and makes up an explanation based on the milk protein casein that isn’t mentioned in the real research at all.

The reason I’m calling this a hoax is that it wasn’t the fault of the researchers, their institution, or the journal, and it’s obvious to anyone who makes any attempt to scan the research paper that it doesn’t support the story. It isn’t an innocent mistake, and it isn’t a simple exaggeration like most misleading health science stories.

There’s a good post at Science News describing what was actually found.

October 20, 2016

Brute force and ignorance

At a conference earlier this week, a research team from Microsoft described a computer system for speech transcription. For the first time ever, this system did better than humans on a standard set of recordings.

What’s more impressive — and StatsChat relevant — is that this computer system does not understand anything about the conversations it writes down. The system does not know English, or any other human language, even in the sense that Siri does.

It has some preconceived notions about what tends to follow a particular word, pair of words, or triple of words, and about what sequences of sounds tend to follow each other, but nothing about nouns or verbs or how colorless green ideas sleep. As with modern image recognition, the system is just based on heaps and heaps of data and powerful computers.  It’s computing and statistics, not linguistics.
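A stripped-down sketch of the "what tends to follow a particular word" idea, using a tiny invented corpus: count word bigrams, then predict the most frequent successor. Real systems use vastly larger corpora, longer contexts, and acoustic models on top — but, as with this sketch, no grammar anywhere.

```python
from collections import Counter, defaultdict

# A made-up toy corpus, already split into tokens.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate the fish .").split()

# Count, for each word, what follows it.
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict(word):
    """Most frequent word seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict("sat"))  # "on" -- both times "sat" appeared, "on" followed
print(predict("the"))  # "cat" -- the most common successor of "the"
```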

In a comment to a post at Language Log, the linguist Geoffrey Pullum says

I must confess that I never thought I would see this day. In the 1980s, I judged fully automated recognition of connected speech (listening to connected conversational speech and writing down accurately what was said) to be too difficult for machines, far more difficult than syntactic and semantic processing (taking an error-free written sentence as input, recognizing which sentence it was, analysing it into its structural parts, and using them to figure out its literal meaning). I thought the former would never be accomplished without reliance on the latter.

For many problems, there isn’t enough data available to get away with a model that understands nothing about the subject. There won’t be a shortage of work for human statisticians or linguists any time soon. But there are problems where brute force and ignorance works, and they aren’t always the ones we expect.

October 18, 2016

Evidence-based policy chants

An old one, seen at the ‘Rally To Restore Sanity and/or Fear’:

What do we want?

When do we want it?


A new one, from @zentree and @bex_stevenson on Twitter

What do we want?

When do we want them?

(this sort of thing is why we have a ‘Silly’ tag on StatsChat)

The lack of change is the real story

The Chief Coroner has released provisional suicide statistics for the year to June 2016.  As I wrote last year, the rate of suicide in New Zealand is basically not changing.  The Herald’s story, by Martin Johnston, quotes the Chief Coroner on this point

“Judge Marshall interpreted the suicide death rate as having remained consistent and said it showed New Zealand still had a long way to go in turning around the unacceptably high toll of suicide.”

The headline and graphs don’t make this clear.

Here’s the graph from the Herald:


If you want a bar graph, it should go down to zero, and it would then show how little is changing:


I’d prefer a line graph showing expected variation if there wasn’t any underlying change: the shading is one and two standard deviations around the average of the nine years’ rates.
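The shaded bands described above are simple to compute. The rates below are illustrative placeholders, not the official New Zealand figures — the point is only how the one- and two-standard-deviation bands are constructed from nine yearly values.

```python
from statistics import mean, stdev

# Nine years of illustrative rates per 100,000 (made-up numbers, not the
# official NZ suicide statistics).
rates = [11.6, 12.2, 11.9, 12.4, 11.5, 12.1, 11.8, 12.3, 11.7]

m, s = mean(rates), stdev(rates)
# Bands at one and two sample standard deviations around the average.
bands = {k: (m - k * s, m + k * s) for k in (1, 2)}

# If there's no underlying trend, nearly every year should land inside
# the two-standard-deviation band.
inside_2sd = [r for r in rates if bands[2][0] <= r <= bands[2][1]]
print(f"mean {m:.2f}, sd {s:.2f}, {len(inside_2sd)}/9 within 2 sd")
```

With data like these, every year sits inside the wider band — which is exactly the visual argument that the year-to-year wiggles are noise, not news.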


As Judge Marshall says, the suicide death rate has remained consistent. That’s our problem.  Focusing on the year to year variation misses the key point.

October 17, 2016


  • Beautiful weather maps from Ventusky, via Jenny Bryan
  • From BusinessInsider: 90% of executive board members think the ideal proportion of women on boards is higher than the current 20%, but the majority think it should still be 40% or less.
  • The Ministry for Social Development is collecting more data on people who use government-supported community services. On one hand, they’re less likely to misuse it than a lot of internet companies; on the other hand, it might well deter people from seeking help. And while the Ministry is getting written consent, the people obtaining it won’t get paid by the Ministry if consent isn’t given.
  • If you only read one summary of the state of the US elections, the 538 update is a relatively painless and informative one.
  • People might be worrying too much about hackers (techy)

Moreover, we find that cyber incidents cost firms only 0.4% of their annual revenues, much lower than retail shrinkage (1.3%), online fraud (0.9%), and overall rates of corruption, financial misstatements, and billing fraud (5%).


“Kind of” being an important qualifier here.