Posts written by Thomas Lumley (1213)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

July 11, 2014

Another prostate cancer study

Today’s prostate cancer risk factor, per the Herald, is vasectomy. The press release is here; the paper isn’t open-access.

This is a much more reliable study than the one earlier in the week about cycling, and there’s a reasonable case that this one is worth a press release.

In 1986, the researchers recruited about 50000 men (health professionals: mostly dentists and vets), then followed them up to see how their health changed over time.  This research involves the 43000 who hadn’t had any sort of cancer at the start of the study. As the Herald says, about a quarter of the men had a vasectomy, and there have been 6000 prostate cancer diagnoses. So there’s a reasonable sample size, and there is a good chance you would have heard about this result if no difference had been found (though probably not via the Daily Mail).

The relative increase in risk is estimated as about 10% overall and about 20% for ‘high-grade’ tumours, which is much more plausible than the five-fold increase claimed for cycling.  The researchers had information about the number of prostate cancer tests the men had had, so they can say this isn’t explained by a difference in screening — the cycling study only had total number of doctor visits in the past year. Also, the 20% difference is seen in prostate cancer deaths, not just in diagnoses, though if you only consider deaths the evidence is borderline.  Despite all this, the researchers quite rightly don’t claim the result is conclusive.

There are two things the story doesn’t say. First, if you Google the name of the lead researcher and ‘prostate cancer’, one of the top hits is another paper on prostate cancer (and coffee, protective). That is, the Health Professionals Follow-up Study, like its sister cohort, the Nurses’ Health Study, is in the business of looking for correlations between a long list of interesting exposures and potential effects. Some of what it finds will be noise, even if it appears to pass sanity checks and statistical filters. They aren’t doing anything wrong, that’s just what life is like.

Second, there were 167 lethal prostate cancers in men with vasectomies. If the excess risk of 20% is really due to vasectomy, rather than something else, that would mean about 27 cancers caused by 12000 vasectomies. Combining lethal and advanced cases, the same approach gives an estimated 38 cases from 12000 vasectomies. So, if this is causation, the risk is 2 or 3 serious prostate cancers for every 1000 vasectomies. That’s not trivial, but I think it sounds smaller than “20% raised risk”.
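The arithmetic above can be sketched in a few lines. The figures (167 lethal cancers, roughly 12000 vasectomies, 20% excess relative risk) are the post’s; the attributable-cases formula is the standard back-of-the-envelope one, not the paper’s own analysis.

```python
# Converting a relative risk into excess cases, using the post's figures.
# If RR = 1.2 is causal, the cases expected without the exposure are
# observed / RR, and the excess is the difference.

def excess_cases(observed_cases, relative_risk):
    baseline = observed_cases / relative_risk   # cases expected without exposure
    return observed_cases - baseline

lethal_excess = excess_cases(167, 1.2)            # ~27.8 excess lethal cancers
per_1000 = lethal_excess / 12_000 * 1_000         # ~2.3 per 1000 vasectomies
print(round(lethal_excess), round(per_1000, 1))
```

Adding the advanced cases the same way is what gets to the post’s “2 or 3 serious prostate cancers for every 1000 vasectomies”.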

July 10, 2014

Summaries of income

I don’t want to get into the general business of election fact-checking, but we have a Stat-of-the-Week  nomination for a statement that is (a) about a specifically statistical issue at the high-school level, and (b) unambiguously wrong.  From Richard Prebble’s “The Letter”:

 Cunliffe is basing Labour’s election campaign around the claim that inequality is growing. Fact check: inequality is falling and New Zealand remains a very equal country. The claim that around a quarter of a million children are in poverty is dubious, to say the very least. Cunliffe says households in poverty have less than 60 percent of the medium income after housing costs. If Bill Gates came to live in New Zealand, the medium income of the country would rise and, according to that logic, more children would be in poverty.

David Cunliffe, as you presumably know, talked about the median, not “medium”; the use of a fraction of median income as a relative poverty threshold is very common internationally. The reason for using the median is precisely that the median income of the country would not rise if a few billionaires were added to the population. The median, the income of the household in the middle of the income distribution, is very insensitive to changes in or additions of a few values. That’s what it’s for.
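A tiny numerical illustration of that robustness (the incomes below are invented for the example):

```python
from statistics import mean, median

# Made-up household incomes for illustration
incomes = [30_000, 45_000, 52_000, 60_000, 75_000, 90_000, 120_000]
with_gates = incomes + [10_000_000_000]   # add one billionaire

print(mean(incomes), median(incomes))         # both in the mid-60,000s
print(mean(with_gates), median(with_gates))   # mean passes $1bn; median barely moves
```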

While I’m writing, I might as well mention the inequality statistics.  Mr Cunliffe isn’t making up his figures on children in poverty; they can be found in the 2014 Household Incomes Report from the Ministry of Social Development [update: that figure is 260000, which matches what The Letter reported was said, but the actual speech said 285000]. The report also gives trends in the Gini index of inequality and in the proportion of income spent on housing.  StatsNZ gives trends in the ratio of 80th to 20th percentile of income, before and after housing costs. The details of trends in inequality depend on how you measure it, but by these measures it is neither falling, nor notably low internationally.

July 9, 2014

Would I have heard if the results were different?

The story about cycling and prostate cancer in the Herald (or the Daily Mail) is a good opportunity to look at some of the rules of thumb for deciding which stories to read or believe:

First, would you have heard if the results were the other way around? Almost certainly not: prostate cancer wasn’t the main point of this study, and there wasn’t a previously-suspected relationship.

Second, for cancer specifically, is this mortality or diagnosis data? That is, are we seeing an increase in detection or in cancer? This is diagnosis data; so it could be just an increase in detection. The researchers were confident it wasn’t, but we must remember the immortal words of Mandy Rice-Davies.

Third, what sort of study is it? Obviously it can’t be experimental, but a good study design would be to ask people about cycling (or even better, measure cycling) and then see whether it’s the bike fanatics who develop cancer. This study was a self-selected survey of cyclists, getting self-reported data about past cycling and past diagnosis of prostate cancer. It’s a fairly extreme sample, too: half of them cycle more than 5.75 hours per week.

Fourth, how strong is the evidence of association, and what sort of sample size are we looking at? The association is just barely statistically significant (p=0.046 in one model, p=0.025 in a second), and there are only 36 prostate cancer cases in the sample.  It’s pretty borderline.  The estimated relative risk is huge, because it has to be given the sample size, but the uncertainty range is also huge. The confidence interval on the relative risk of 5 reported by the Herald goes from 1.5 to 18.

Fifth, what does previous research say? This is in the story:

‘To the best of our knowledge, this is the first study to demonstrate an association between prostate cancer and cycling, so there are no studies hypothesizing a pathophysiological mechanism for such a link.’

Sixth, what do other experts think? We don’t know. The closest thing to an independent comment is this, in the press release:

“Physicians should discuss the potential risks and health benefits of cycling with their patients, and how it may impact their overall health,” says Ajay Nehra, MD, Editor-in-Chief of Journal of Men’s Health and Chair, Department of Urology, Director, Men’s Health, Rush University Medical Center, Chicago, IL.

He could have said that without reading the paper.

In summary, there’s borderline evidence from a weak study design for a sensational finding that isn’t supported by any prior evidence. This is fine as research, but it shouldn’t be in the headlines.

You can read the research paper here for the next month, and the journal press release here.


In March, I wrote

The Herald has a story about a potential blood test for dementia, which gives the opportunity to talk about an important statistical issue. The research seems to be good, and the results are plausible, though they need to be confirmed in a separate, larger sample before they can really be believed. …

 But it’s the description of the accuracy of the test that might be misleading.

There’s a Herald story today about a new test; the same comments apply, except that the research paper is open-access.

July 5, 2014

Once is accident, twice is coincidence

Back in 2010, a piece in Slate pointed out that a country’s success in the 2010 and 2006 World Cup knock-out rounds was strongly correlated with the proportion of the population infected by Toxoplasma gondii.  In 2010, Toxoplasma seroprevalence predicted all eight knockout-round wins; in 2006 it predicted seven of eight.

Toxoplasma, in case you weren’t introduced when you met it, is a single-celled organism that can live, and reproduce asexually, in pretty much any warm-blooded animal, but can only reproduce sexually in the guts of cats. That’s not the interesting part. The interesting part is that in rodents the parasite has effects on the brain, making the animal less cautious and more likely to end up in the gut of a cat. There’s some evidence Toxoplasma also has effects on human behaviour, though that’s still controversial.

Now, in 2014, I see a Tweet from Australian biologist Michael Whitehead

So, three times is enemy action?

There are good reasons to be sceptical: the football rankings haven’t changed all that much since 2006, so this isn’t really three independent tests. Also, the seroprevalence data is for the countries as a whole, not for the team members.  Still, in contrast to the predictions using Octopus vulgaris in the last World Cup, it’s not completely out of the question that there could be a real effect.
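As a rough sketch of why the record looks striking in the first place (treating each knockout match as a fair coin flip — which, as noted above, overstates how independent the tests are):

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Binomial tail: probability of at least k successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(8, 8))   # all eight in 2010: 1/256, about 0.004
print(prob_at_least(7, 8))   # seven or more of eight in 2006: 9/256, about 0.035
```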

July 4, 2014


  • “There are advantages and disadvantages for reporting the median, but over time it has become common practice worldwide to report market benchmark prices as the median.” from a good explanation at
  • The maximum, on the other hand, is rarely a good summary on its own. This Herald story on slow licence suspensions is not an exception. A median or 90th percentile, or proportion taking longer than a reasonable duration would have been good additions. Also, how many licence suspensions are there for accumulated demerits?
  • TheWireless is having a ‘theme’ on Risk. It’s pretty much non-quantitative, which I think misses something, but they aren’t trying to draw unwarranted generalisations from qualitative data. I liked the story on yellow-stickered earthquake-risk buildings; an interesting counterpoint is Eric Crampton’s post on house-hunting in Wellington.
  • “Measure twice; cut once” is the old saying. It’s good that the government’s welfare reform program is being evaluated. Not so good that the evaluation plan is secret even under OIA.
  • Interestingly, the retracted and recently republished paper on GMOs and Roundup (previous StatsChat coverage) wasn’t peer-reviewed. Or, rather, it wasn’t peer-reviewed again — the journal decided that the initial review before the retraction was enough. This was not made very clear in the paper or press material.

Measuring accuracy

From the Herald

A new scientific test is able to detect which 14-year-olds will become binge drinkers by the time they hit 16.

A study published in the journal Nature describes how scientists have developed a system that weighs up a range of risk factors and predicts – with about 70 per cent accuracy – which teens will become heavy drinkers.

That’s true, but the definition of accuracy is doing quite a bit of work here.

We don’t have figures for 16 year olds, but according to the Ministry of Health about 20% of 15-17 year olds have ‘hazardous drinking patterns.’ That means I can predict with 80% accuracy without even needing to weigh up a range of risk factors — I just need to predict “No” each time. Parents, teachers, or people working with youth could probably do better than my 80% accuracy.

The researchers found that their test correctly classified 73% of the non-binge-drinkers and 67% of the binge drinkers, which means it would get about 72% of people classified correctly. That’s rather worse than my trivial “the kids are ok” predictor. In order to be any use, the new test, which combines brain imaging and detailed interviews, needs to be set to a higher threshold, so it predicts fewer drinkers.  The researchers could have done this, but they didn’t.
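The comparison is easy to check, assuming the 20% prevalence and the reported 73%/67% classification rates:

```python
# Accuracy of the published test versus the trivial "predict No for
# everyone" rule, in a population where 20% are binge drinkers.

prevalence = 0.20
sensitivity = 0.67   # binge drinkers correctly flagged
specificity = 0.73   # non-drinkers correctly cleared

test_accuracy = (1 - prevalence) * specificity + prevalence * sensitivity
trivial_accuracy = 1 - prevalence   # always predict "not a binge drinker"

print(f"test: {test_accuracy:.0%}, always-No: {trivial_accuracy:.0%}")
# test: 72%, always-No: 80%
```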

Also, in order to be any use, the test needs to identify a group who will selectively benefit from some feasible intervention, and there needs to be funding to supply both this intervention, and the cost of doing long interviews and fMRI brain imaging on large groups of teenagers. And that needs to be the best way to spend the money.

July 2, 2014

What’s the actual margin of error?

The official maximum margin of error for an election poll with a simple random sample of 1000 people is 3.099%. Real life is more complicated.
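That 3.099% is just the usual 95% margin of error for a sample proportion, evaluated at its worst case, p = 0.5:

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion under simple random sampling."""
    return z * sqrt(p * (1 - p) / n)

print(f"{margin_of_error(0.5, 1000):.3%}")   # prints: 3.099%
```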

In reality, not everyone is willing to talk to the nice researchers, so they either have to keep going until they get a representative-looking number of people in each group they are interested in, or take what they can get and reweight the data — if young people are under-represented, give each one more weight. Also, they can only get a simple random sample of telephones, so there are more complications in handling varying household sizes. And even once they have 1000 people, some of them will say “Dunno” or “The Conservatives? That’s the one with that nice Mr Key, isn’t it?”

After all this has shaken out it’s amazing the polls do as well as they do, and it would be unrealistic to hope that the pure mathematical elegance of the maximum margin of error held up exactly.  Survey statisticians use the term “design effect” to describe how inefficient a sampling method is compared to ideal simple random sampling. If you have a design effect of 2, your sample of 1000 people is as good as an ideal simple random sample of 500 people.

We’d like to know the design effect for individual election polls, but it’s hard. There isn’t any mathematical formula for design effects under quota sampling, and while there is a mathematical estimate for design effects after reweighting it isn’t actually all that accurate.  What we can do, thanks to Peter Green’s averaging code, is estimate the average design effect across multiple polls, by seeing how much the poll results really vary around the smooth trend. [Update: this is Wikipedia's graph, but I used Peter's code]


I did this for National because it’s easiest, and because their margin of error should be close to the maximum margin of error (since their vote is fairly close to 50%). The standard deviation of the residuals from the smooth trend curve is 2.1%, compared to 1.6% for a simple random sample of 1000 people. That would be a design effect of (2.1/1.6)², or 1.8.  Based on the Fairfax/Ipsos numbers, about half of that could be due to dropping the undecided voters.
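The design-effect arithmetic, using the post’s numbers (a 2.1% residual standard deviation against the ideal standard error for a party polling near 50% with n = 1000):

```python
from math import sqrt

ideal_se = sqrt(0.5 * 0.5 / 1000)             # about 0.0158, i.e. 1.6%
observed_sd = 0.021                           # residual SD around the smooth trend
design_effect = (observed_sd / ideal_se) ** 2
real_moe = 1.96 * observed_sd                 # real-world margin of error

print(round(design_effect, 1))                # prints: 1.8
print(f"{real_moe:.1%}")                      # 4.1-4.2% depending on rounding
```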

In principle, I could have overestimated the design effect this way because sharp changes in party preference would look like unusually large random errors. That’s not a big issue here: if you re-estimate using a standard deviation estimator that’s resistant to big errors (the median absolute deviation) you get a slightly larger design effect estimate.  There may be sharp changes, but there aren’t all that many of them, so they don’t have a big impact.

If the perfect mathematical maximum-margin-of-error is about 3.1%, the added real-world variability turns that into about 4.2%, which isn’t that bad. This doesn’t take bias into account — if something strange is happening with undecided voters, the impact could be a lot bigger than sampling error.


July 1, 2014

Does it make sense?

From the Herald (via @BKDrinkwater on Twitter)

Wages have only gone up $34.53 annually against house prices, which are up by $38,000.

These are the findings of the Home Affordability Report quarterly survey released by Massey University this morning.

At face value, that first sentence doesn’t make any sense, and also looks untrue. Wages have gone up quite a lot more than $34.53 annually. It is, however, almost a quote from the report, which the Herald embeds in their online story

 There was no real surprise in this result because the average annual wage increase of $34.53 was not enough to offset a $38,000 increase in the national median house price and an increase in the average mortgage interest rate from 5.57% to 5.64%. 

If you look for income information online, the first thing you find is the NZ Income Survey, which reported a $38 increase in median weekly salary and wage income for those receiving any. That’s a year old and not the right measure, but it suggests the $34.53 is probably an increase in some measure of average weekly income. Directly comparing that to the increase in the cost of a house would be silly.

Fortunately, the Massey report doesn’t do that. If you look at the report, on the last page it says

Housing affordability for housing in New Zealand can be assessed by comparing the average weekly earnings with the median dwelling price and the mortgage interest rate

That is, they do some calculation with weekly earnings and expected mortgage payments. It’s remarkably hard to find exactly what calculation, but if you go to their website, and go back to 2006 when the report was sponsored by AMP, there is a more specific description.

If I’ve understood it correctly, the index is annual interest payment for an 80% mortgage  on the median house price at the average interest rate, divided by the average weekly wage.  That is, it’s the number of person-weeks of average wage income it would take to pay the mortgage interest for a year.  An index of 30 in Auckland means that the mortgage interest for the first year on 80% mortgage on the median house would take 30 weeks of average wage income to pay. A household with two people earning the average Auckland wage would spend 15/52 or nearly 30% of their income on mortgage interest to buy the median Auckland house.
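If that reading is right, the index is a one-line calculation. The numbers below are invented for illustration, not taken from the report:

```python
def affordability_index(median_price, annual_rate, weekly_wage):
    """Weeks of average wage income to pay a year's interest on an 80% mortgage."""
    annual_interest = 0.8 * median_price * annual_rate
    return annual_interest / weekly_wage

# Hypothetical inputs: $600,000 median house, 5.64% interest,
# $1,000 average weekly wage
print(round(affordability_index(600_000, 0.0564, 1_000), 1))   # prints: 27.1
```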

Two final notes: first, the “There was no real surprise” claim in the report is pretty meaningless. Once you know the inputs there should never be any real surprise in a simple ratio. Second, the Herald’s second paragraph

These are the findings of the Home Affordability Report quarterly survey released by Massey University this morning.

is just not true. Those are the inputs to the report, from, respectively, Stats New Zealand and REINZ. The findings are the changes in the affordability indices.

Graph of the week

From Deadspin. No further comment needed.