Posts written by Thomas Lumley (1857)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

September 6, 2016


  • A preview of Cathy O’Neil’s book about data science and its potential dangers, coming out tomorrow.
  • A map of the world’s languages — showing the difficulties of definition, since all the Chinese languages are lumped together while arguably equally distinct languages from different countries are listed separately.
September 2, 2016

A game changer?

There are stories on Stuff and the Herald about early studies of a potential Alzheimer’s drug. There was also a story on One News last night, but the video doesn’t seem to be up, and there’s one on Newshub.

The drug, aducanumab, reduced amyloid plaque buildup in people with early-stage disease. According to the most widely believed theory about Alzheimer’s, that could slow or even stop the progression of disease. And, as the stories say, if the treatment turns out to be successful in future trials, it will be a game changer.


We’ve never had a successful treatment that modifies the disease process in Alzheimer’s, but we’ve had a range of promising candidates that failed as soon as the test went beyond biochemistry to improvements in memory or the ability to handle daily life.  Aducanumab might be different. Let’s hope so.

September 1, 2016

Transport numbers

Auckland Transport released new patronage data, and FigureNZ tidied it up to make it easily computer-readable, so I thought I’d look at some of it.  What I’m going to show is a decomposition of the data into overall trends, seasonal variation, and random stuff just happening. As usual, click to embiggen the pictures.

First, the trends: rides are up.


It’s hard to see the trend in ferry use, so here’s a version on a log scale — meaning that the same proportional trend would look the same for all three modes of transport.


Train use is increasing (relatively) faster than bus or ferry use.  There’s also an interesting bump in the middle that we’ll get back to.

Now, the seasonal patterns. Again, these are on a logarithmic scale, so they show relative variation.


The clearest signal is that ferry use peaks in summer, when the other modes are at their minimum. Also, the Christmas minimum is a bit lower for trains: to see this, we can combine the two graphs:


It’s not surprising that train use falls by more: they turn the trains off for a lot of the holiday period.

Finally, what’s left when you subtract the seasonal and trend components:


The highest extra variation in both train and ferry rides was in September and October 2011: the Rugby World Cup.
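The decomposition above — trend, seasonal pattern, and leftover variation, additive on the log scale — can be sketched on synthetic data. This is just the idea, not the actual analysis: the real post uses the FigureNZ data and presumably a standard seasonal-decomposition routine, and all numbers below are invented.

```python
import math

# Synthetic monthly patronage: a rising trend times a summer peak.
# (Invented numbers; the real data come from Auckland Transport.)
months = 72
series = [1000 * (1 + 0.01 * t) * (1 + 0.2 * math.sin(2 * math.pi * t / 12))
          for t in range(months)]

# Work on the log scale, so multiplicative components become additive
log_s = [math.log(x) for x in series]

def centered_ma(xs, window=12):
    """Centered 2x12 moving average: a 13-point window with half weights
    at the ends, so each calendar month gets equal total weight."""
    half = window // 2
    out = []
    for t in range(half, len(xs) - half):
        w = xs[t - half:t + half + 1]
        out.append((0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / window)
    return out

# Trend: smooth over a whole year so the seasonal pattern averages out
trend = centered_ma(log_s)

# Seasonal: average detrended value for each calendar month
detrended = [log_s[t + 6] - trend[t] for t in range(len(trend))]
seasonal, counts = [0.0] * 12, [0] * 12
for i, d in enumerate(detrended):
    m = (i + 6) % 12
    seasonal[m] += d
    counts[m] += 1
seasonal = [s / c for s, c in zip(seasonal, counts)]

# Residual: the "random stuff just happening"
resid = [log_s[t + 6] - trend[t] - seasonal[(t + 6) % 12]
         for t in range(len(trend))]
```

On the log scale the three pieces add back up to the original series, which is exactly why a log scale makes proportional trends comparable across bus, train, and ferry.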


August 31, 2016

Be afraid

From the Herald (from the Daily Mail)

Patients should be warned about the dangers of chemotherapy after research showed that cancer drugs are killing up to 50 per cent of patients in some UK hospitals.

That’s almost completely untrue.

Firstly, the research looked at deaths from any cause within 30 days of starting treatment, and did not claim these were all due to chemotherapy. Secondly, the 50% figure was in one hospital. Thirdly, it was for a subset of one particular type of cancer. Finally, the conclusion drawn in the news story cannot be found anywhere in the research paper.

The researchers do think that chemotherapy is probably being used suboptimally in some of the hospitals, including the one where about 5 out of 10 of the patients being treated ‘with curative intent’ for lung cancer died within 30 days. That hospital stood out, despite the tiny numbers, because the average death rate across all hospitals for similar patients was about 3%.
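A back-of-envelope check shows why that hospital stood out despite the tiny numbers: if the true 30-day death rate for these patients were really about 3%, seeing 5 or more deaths among 10 patients would be astonishing. (The n = 10 denominator is an illustrative assumption — the story only says "about 5 out of 10".)

```python
from math import comb

# Binomial tail probability: P(X >= 5) when X ~ Binomial(n=10, p=0.03).
# n = 10 and p = 0.03 are taken loosely from the story, not the paper.
n, p = 10, 0.03
prob_5_or_more = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                     for k in range(5, n + 1))
# The answer is a few parts per million -- far too small to be
# explained away as small-sample noise.
```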

As the researchers say

The identification of hospitals with significantly higher 30-day mortality rates will promote review of clinical decision making in these hospitals.

It probably will, but that doesn’t tell us much about risks here on the other side of the world.


August 29, 2016

Lucky lotto stores

From the Northern Advocate

An unprecedented run of success in selling winning Lotto second division tickets has a Whangarei store on tenterhooks expecting an even bigger win soon.

Now, in one sense this is rubbish: lotto is drawn randomly. Previous wins can’t function as an outward and visible sign of an inward propensity to sell lucky tickets, because there is no such thing.

On the other hand, statistically, you would expect a store that has sold a lot of winning tickets in the past to sell a lot of winning tickets in the future. That’s because a store that has sold a lot of winning tickets has probably just sold a lot of tickets.

A ‘lucky’ lotto vendor will usually be one that’s made a lot of profit for Lotto New Zealand. As to whether its customers are lucky, well, you don’t tend to see stories like this set in Herne Bay or Thorndon.
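The sales-volume argument is easy to check in a toy simulation: give every ticket the same chance of winning, let stores differ only in how many tickets they sell, and the stores with many past winners go on selling many future winners. All the numbers below are invented for illustration.

```python
import math
import random

random.seed(2016)

WIN_RATE = 1e-4  # chance any single ticket is a second-division winner (made up)

def wins(tickets_sold):
    """Winning tickets a store sells in one period: Poisson with mean
    tickets_sold * WIN_RATE, sampled with Knuth's method (fine for small means)."""
    lam = tickets_sold * WIN_RATE
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# 500 stores selling anywhere from 1,000 to 100,000 tickets per period
sales = [random.randint(1_000, 100_000) for _ in range(500)]
past = [wins(n) for n in sales]
future = [wins(n) for n in sales]

# Stores that looked "lucky" in the past keep winning in the future --
# not because of luck, but because they sell more tickets.
lucky_future = [f for p, f in zip(past, future) if p >= 5]
other_future = [f for p, f in zip(past, future) if p < 5]
mean = lambda xs: sum(xs) / len(xs)
```

With these settings the previously "lucky" stores average clearly more future wins than the rest, even though no store has any propensity to sell lucky tickets.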


  • 538 has a new Twitter bot, censusAmericans, which tweets little descriptions of individuals from what I think must be the American Community Survey, though they describe it as the census.
  • “Relaxing Privacy Vow, WhatsApp Will Share Some Data With Facebook” (NY Times). “Relaxing” is such a nice way to put that, but as various people have pointed out, this is what happens when companies build up volumes of data.
  • A nice app for exploring how differences in some measurement (‘biomarker’) between groups of people (fail to) translate into reliable tests
August 20, 2016


  • Mining data from Lending Club.  And Matt Levine’s comments: “Here are 50 data points about this loan. Do what you want…. And if there’s no field for ‘does this person have another LendingClub loan,’ and if that data point would have been helpful, well, sometimes that happens.”
  • It’s just gone Saturday in the US, so it is no longer National Potato Day, and it won’t be National Spumoni Day until Sunday. Nathan Yau has a graphic of the 214 days that are National <some food> Day.
  • Because genetic association studies are (or were) largely done in people of European ancestry, they can overpredict risks in everyone else. (NY Times). (The implication that this is also true of non-genetic research is, at least, exaggerated)

The statistical significance filter

Attention conservation notice: long and nerdy, but does have pictures.

You may have noticed that I often say about newsy research studies that they are barely statistically significant or that they found only weak evidence, but that I don’t say that about large-scale clinical trials. This isn’t (just) personal prejudice. There are two good reasons why any given evidence threshold is more likely to be met in lower-quality research — and while I’ll be talking in terms of p-values here, getting rid of them doesn’t solve this problem (it might solve other problems).  I’ll also be talking in terms of an effect being “real” or not, which is again an oversimplification but one that I don’t think affects the point I’m making.  Think of a “real” effect as one big enough to write a news story about.


This graph shows possible results in statistical tests, for research where the effect of the thing you’re studying is real (orange) or not real (blue).  The solid circles are results that pass your statistical evidence threshold, in the direction you wanted to see — they’re press-releasable as well as publishable.

Only about half the ‘statistically significant’ results are real; the rest are false positives.

I’ve assumed the proportion of “real” effects is about 10%. That makes sense in a lot of medical and psychological research — arguably, it’s too optimistic.  I’ve also assumed the sample size is too small to reliably pick up plausible differences between blue and orange — sadly, this is also realistic.


In the second graph, we’re looking at a setting where half the effects are real and half aren’t. Now, of the effects that pass the threshold, most are real.  On the other hand, there’s a lot of real effects that get missed.  This was the setting for a lot of clinical trials in the old days, when they were done in single hospitals or small groups.


The third case is relatively implausible hypotheses — 10% true — but well-designed studies.  There are still the same number of false positives, but many more true positives.  A better-designed study means that positive results are more likely to be correct.


Finally, the setting of well-conducted clinical trials intended to be definitive, the sort of studies done to get new drugs approved. About half the candidate treatments work as intended, and when they do, the results are likely to be positive.   For a well-designed test such as this, statistical significance is a reasonable guide to whether the effect is real.

The problem is that the media only show a subset of the (exciting) solid circles, and typically don’t show the (boring) empty circles. So, what you see is


where the columns are 10% and 50% proportion of studies having a true effect, and the top and bottom rows are under-sized and well-designed studies.


Knowing the threshold for evidence isn’t enough: the prior plausibility matters, and the ability of the study to demonstrate effects matters. Apparent effects seen in small or poorly-designed studies are less likely to be true.
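The four scenarios above reduce to a simple piece of arithmetic: the proportion of "significant" results that are real depends on the prior proportion of real effects and on the study's power. The alpha and power values below are illustrative assumptions, not the exact settings behind the figures.

```python
# Proportion of threshold-passing results that reflect a real effect,
# as a function of prior plausibility and study power.
# alpha = 0.05 and the power values are assumptions for illustration.
alpha = 0.05  # chance a null effect passes the threshold anyway

def ppv(prior, power):
    """Fraction of 'significant' results that are real (solid orange
    circles as a fraction of all solid circles)."""
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)

scenarios = {
    "implausible, under-sized": ppv(prior=0.1, power=0.2),
    "plausible, under-sized":   ppv(prior=0.5, power=0.2),
    "implausible, well-designed": ppv(prior=0.1, power=0.8),
    "plausible, well-designed":   ppv(prior=0.5, power=0.8),
}
```

With these numbers, barely a third of significant results from implausible, under-sized studies are real, against well over 90% for plausible hypotheses tested in well-designed studies — which is the whole point of the graphs.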

August 19, 2016

Has your life improved since 1966?

From Pew Research, is life better than 50 years ago for people like you?


The answers aren’t going to tell us much about reality; more about the sort of people we are or want to think we are.  As Fred Clark puts it

If you ask those of us who are 18-53 years old for our opinions about what life was like before we either existed or have any memory, we’ll give you an answer. And that speculative, possibly even informed, opinion may mean something or other in the aggregate. Maybe it tells us something fuzzy about general optimism or pessimism. Or maybe something about the dismal state of history, social studies, civics and science education.

Or, for the people who do have memories of the mid-sixties…

Age 65-70: I peaked in high school. Go away, nerd, or I’ll give you a swirlie.

August 18, 2016

Post-truth data maps

The Herald has a story “New map compares breast sizes around the world”. They blame another site as the immediate cause, but a very similar story at the Daily Mail actually links to where it got the map.  You might wonder how the data were collected (you might wonder why, too). The journalist did get as far as that:

The breast map doesn’t reveal how the cup sizes were measured, but it’s fair to say tracking bra purchases per country would be an ideal – and maybe a little weird – approach.