Posts written by Thomas Lumley (1250)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient

August 17, 2014

Health evidence: quality vs quantity

From the Sunday Star-Times, on fish oil

Grey and colleague Dr Mark Bolland studied 18 randomised controlled trials and six meta-analyses of trials on fish oil published between 2005 and 2013. Only two studies showed any benefit but most media coverage of the studies was very positive for the industry.

On the other hand, the CEO of a fish-oil-supplement company disagrees

Keeley said more than 25,000 peer-reviewed scientific papers supported the benefits of omega-3. “With that extensive amount of robust study to be then challenged by a couple of meta-analyses where negative reports are correlated together dumbfounds me.”

In fact, it happens all the time that large numbers of research papers and small experiments find that something is associated with health, and then small numbers of large randomised trials show it doesn’t really help.  If it didn’t happen, medical and public health research would be much faster, cheaper, and more effective. I’m a coauthor on at least a couple of those 25000 peer-reviewed papers, and I’ve worked with people who wrote a bunch more of them, and I’m not dumbfounded. You don’t judge weight of evidence by literally weighing the papers.

Mr Keeley takes fish oil himself, and believes he will “live to 70, or 80 or 90 and not suffer from Alzheimer’s.”  That’s actually about what you’d expect without fish oil. He’s 60 now, so his statistical life expectancy is another 23 years, and by 83, less than 10% of people have developed dementia.

I wouldn’t say there was compelling evidence that fish-oil capsules are useless, but the weight of evidence is not in favour of them doing much good.

August 16, 2014

Lotto and concrete implementation

There are lots of Lotto strategies based on trying to find patterns in numbers.

Lotto New Zealand televises its draws, and you can find some of them on YouTube.

If you have a strategy for numerological patterns in the Lotto draws, it might be a good idea to watch a few Lotto draws and ask yourself how the machine knows to follow your pattern.

If you’re just doing it for entertainment, go in good health.

August 15, 2014

Cancer statistics done right

I’ve mentioned a number of times that statistics on cancer survival are often unreliable for the conclusion people want to draw, and that you need to look at cancer mortality.  Today’s story in Stuff is about Otago research that does it right:

The report found that, over an 11-year timeframe, cancer-specific death rates decreased in both countries and cancer mortality fell in both countries. But there was no change in the difference between the death rates in New Zealand and Australia, which remained 10 per cent higher in New Zealand.

That is, they didn’t look at survival after diagnosis; they looked at the rate of deaths. They also looked at the rate of cancer diagnoses:

“The higher mortality from all cancers combined cannot be attributed to higher incidence rates, and this suggests that overall patient survival is lower in New Zealand,” Skegg said.

That’s not quite as solid a conclusion — it’s conceivable that New Zealand really has higher incidence, but Australia compensates by over-diagnosing tumours that wouldn’t ever cause a problem — but it would be a stretch to have that happen over all types of cancer combined, as they observed.


August 14, 2014

Breast cancer risk and exercise

Stuff has a story from the LA Times about exercise and breast cancer risk.  There’s a new research paper based on a large French population study, in which women who went on to have a breast cancer diagnosis were less likely to have exercised regularly over the previous five years.  This is just observational correlation, and although it’s a big study, with 2000 breast cancer cases in over 50000 women, the evidence is not all that strong (the uncertainty range around the 10% risk reduction given in the paper goes from an 18% reduction down to a 1% reduction).  Given that, I’m a bit unhappy with the strength of the language in the story:

For women past childbearing age, a new study finds that a modest amount of exercise — four hours a week of walking or more intensive physical activity such as cycling for just two hours a week — drives down breast cancer risk by roughly 10 per cent.

There’s a more dramatically wrong numerical issue towards the end of the story, though:

The medications tamoxifen and raloxifene can also drive down the risk of breast cancer in those at higher than average risk. They come with side effects such as an increased risk of deep-vein thrombosis or pulmonary embolism, and their powers of risk reduction are actually pretty modest: If 1000 women took either tamoxifen or raloxifene for five years, eight breast cancers would be prevented.

By comparison, regular physical activity is powerful.

Using relative risk reduction for the (potential) benefits of exercise and absolute risk reduction for the benefits of the drugs is misleading. Using the breast cancer risk assessment tool from the National Cancer Institute, the five-year breast cancer risk for a typical 60-year-old is perhaps 2%. That agrees with the study’s 2000 cases in 52000 women followed for at least nine years.  If 1000 women with that level of risk took up regular exercise for five years, and if the benefits were real, two breast cancers would be prevented.
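To make the comparison concrete, here’s the arithmetic from the paragraph above as a short Python sketch, using the figures quoted in the post (2% baseline five-year risk, 10% relative reduction):

```python
# Relative vs absolute risk reduction, with the numbers from the story.

baseline_risk = 0.02        # five-year breast cancer risk, typical 60-year-old
relative_reduction = 0.10   # claimed benefit of regular exercise

absolute_reduction = baseline_risk * relative_reduction
prevented_per_1000 = absolute_reduction * 1000

# about 2 cancers prevented per 1000 women over five years,
# versus the 8 per 1000 quoted for tamoxifen/raloxifene
print(round(prevented_per_1000, 1))  # 2.0
```

The same 10% relative reduction would look much more impressive applied to a high-risk group, which is why mixing relative and absolute numbers in one comparison is misleading.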

Exercise is much less powerful than the drugs, but it’s cheap, doesn’t require a doctor’s prescription, and the side-effects on other diseases are beneficial, not harmful.

August 13, 2014

Most things don’t work

A nice story in the Herald about a randomised trial of hand sanitiser dispensers in schools.

The study, published today in the journal PLoS Medicine, found absence rates at schools that installed dispensers in classrooms as part of the survey were similar to those at “control” schools which did not.

There’s even a good description of the right way to do sample size calculations for a clinical trial:

Beforehand, the authors believed a 20 per cent reduction in absences due to illness would be important enough to merit schools considering making hand sanitiser available, so designed the study to detect such a difference.

“Some previous studies suggested that there could be a bigger effect than that, but we wanted to be sure of detecting an effect of that size if it was there,” Dr Priest told the Herald.

That is, the study not only failed to find a benefit, it ruled out any worthwhile benefit. Either Kiwi kids are already washing their hands enough, or they didn’t use the supplied sanitiser.
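The design logic Dr Priest describes can be sketched with a standard two-proportion sample-size formula. The 5% baseline absence rate below is an invented illustration (the story doesn’t report one), and the real trial randomised whole schools rather than individual children, which inflates the required numbers; this is just the individual-level approximation:

```python
# Sketch: sample size per group to detect a 20% relative reduction
# in illness absence, using the usual normal-approximation formula
# for comparing two proportions. The 5% baseline rate is made up.
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-proportion comparison."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_b = NormalDist().inv_cdf(power)           # desired power
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * var / (p1 - p2) ** 2

p_control = 0.05            # assumed baseline absence rate (hypothetical)
p_treat = p_control * 0.8   # 20% relative reduction
print(round(n_per_group(p_control, p_treat)))
```

Under these made-up numbers the formula asks for roughly 6,700 children per group, which shows why detecting modest relative effects on uncommon outcomes needs large studies.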

My only quibble is that the story didn’t link to the open-access research paper.


When are self-selected samples worth discussing?

From recent weeks, three examples of claims from self-selected samples:

In all three cases, you’d expect the pattern to generalise to some extent, but not quantitatively. The dating site in question specifically boasts about the non-representativeness of its members; the NZAS survey was sent to people who’d be likely to care, and there wasn’t much time to respond; scientists who had experienced or witnessed harassment would be more likely to respond and to pass the survey along to others.

I think two of these are worth presenting and discussing, and the other one isn’t, and that’s not just because two of them agree with my political prejudices.

The key question to ask when looking at this sort of probably non-representative sample, is whether the response you see would still be interesting if no-one outside the sample shared it. That is, the surveys tell us at a minimum

  • there exist 350 women in New Zealand who wouldn’t marry a man earning less than them, and are prepared to say so
  • there exist 200-odd scientists in NZ who think the National Science Challenges were badly chosen or conducted, and are prepared to say so
  • there exist 417 scientists who have experienced verbal sexual harassment, and 139 who have experienced unwanted physical contact from other research staff during fieldwork, and are prepared to say so.

I would argue that the first of these is completely uninteresting, but the second is contrary to the impressions being given by the government, and the third should worry scientists who participate in or organise fieldwork.


August 9, 2014


Limits of measurement edition

  • “So you can either believe that Germany has no billionaires or that European statisticians aren’t very good at finding them.” Stories from Slate and Bloomberg on the difficulty of estimating wealth inequality
  • “Big data really only has one unalloyed success on its track record, and it’s an old one: Google, specifically its Web search.” Another story from Slate, on Big Data and creepy experiments.
  • Even for the best drink-driving propaganda, such as the famous ‘Ghost Chips’ ad, the evaluation is basically in terms of public perception, because it’s too hard to evaluate actual impact on drink driving.  A nice piece from TheWireless
August 8, 2014

History of NZ Parliament visualisation

One frame of a video showing NZ party representation in Parliament over time,


made by Stella Blake-Kelly for TheWireless. Watch (and read) the whole thing.

August 7, 2014

Vitamin D context

There’s a story in the Herald about Alzheimer’s Disease risk being much higher in people with low vitamin D levels in their blood. This is observational data, where vitamin D was measured and the researchers then waited to see who would get dementia. That’s all in the story, and the problems aren’t the Herald’s fault.

The lead author of the research paper is quoted as saying

“Clinical trials are now needed to establish whether eating foods such as oily fish or taking vitamin D supplements can delay or even prevent the onset of Alzheimer’s disease and dementia.”

That’s true, as far as it goes, but you might have expected the person writing the press release to mention the existing randomised trial evidence.

The Women’s Health Initiative, one of the largest, and probably the most expensive, randomised trials ever, included randomisation to calcium and vitamin D or placebo. The goal was to look at prevention of fractures, with prevention of colon cancer as a secondary question, but they have data on dementia and they have published it:

During a mean follow-up of 7.8 years, 39 participants in the treatment group and 37 in the placebo group developed incident dementia (hazard ratio (HR) = 1.11, 95% confidence interval (CI) = 0.71-1.74, P = .64). Likewise, 98 treatment participants and 108 placebo participants developed incident [mild cognitive impairment] (HR = 0.95, 95% CI = 0.72-1.25, P = .72). There were no significant differences in incident dementia or [mild cognitive impairment] or in global or domain-specific cognitive function between groups.

That’s based on roughly 2000 women in each treatment group.
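As a rough illustration of why the comparison is so uncertain, a crude rate ratio from those event counts (ignoring the paper’s adjustments and assuming equal person-time in the two groups) already has a wide confidence interval:

```python
# Back-of-envelope rate ratio for the WHI dementia counts.
# This is an approximation, not the paper's adjusted analysis.
from math import log, exp, sqrt

events_treatment = 39   # incident dementia, calcium + vitamin D group
events_placebo = 37     # incident dementia, placebo group

rr = events_treatment / events_placebo          # assumes equal person-time
se_log_rr = sqrt(1 / events_treatment + 1 / events_placebo)
lo = exp(log(rr) - 1.96 * se_log_rr)
hi = exp(log(rr) + 1.96 * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

This crude calculation gives a ratio of roughly 1.05 with an interval of about (0.67, 1.65), in the same ballpark as the paper’s adjusted 1.11 (0.71–1.74): with fewer than 40 events per group, even a trial of thousands of women can’t pin the effect down very tightly.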

The Women’s Health Initiative data doesn’t nail down all the possibilities. It could be that a higher dose is needed. It could be that the women were too healthy (although half of them had low vitamin D levels by usual criteria). The research paper mentions the Women’s Health Initiative and these possible explanations, so the authors were definitely aware of them.

If you’re going to tell people about a potential way to prevent dementia, it would be helpful to at least mention that one form of it has been tried and didn’t work.

Non-bogus non-random polling

As you know, one of the public services StatsChat provides is whingeing about bogus polls in the media, at least when they are used to anchor stories rather than just being decorative widgets on the webpage. This attitude doesn’t (or doesn’t necessarily) apply to polls that make no attempt to collect a random sample but do make serious efforts to reduce bias by modelling the data. Personally, I think it would be better to apply these modelling techniques on top of standard sampling approaches, but that might not be feasible. You can’t do everything.

I’ve been prompted to write this by seeing Andrew Gelman and David Rothschild’s reasonable and measured response (and also Andrew’s later reasonable and less measured response) to a statement from the American Association for Public Opinion Research.  The AAPOR said

This week, the New York Times and CBS News published a story using, in part, information from a non-probability, opt-in survey sparking concern among many in the polling community. In general, these methods have little grounding in theory and the results can vary widely based on the particular method used. While little information about the methodology accompanied the story, a high level overview of the methodology was posted subsequently on the polling vendor’s website. Unfortunately, due perhaps in part to the novelty of the approach used, many of the details required to honestly assess the methodology remain undisclosed.

As the responses make clear, the accusation about transparency of methods is unfounded. The accusation about theoretical grounding is the pot calling the kettle black.  Standard survey sampling theory is one of my areas of research. I’m currently writing the second edition of a textbook on it. I know about its grounding in theory.

The classical theory applies to most of my applied sampling work, which tends to involve sampling specimen tubes from freezers. The theoretical grounding does not apply when there is massive non-response, as in all political polling. It is an empirical observation based on election results that carefully-done quota samples and reweighted probability samples of telephones give pretty good estimates of public opinion. There is no mathematical guarantee.

Since classical approaches to opinion polling work despite massive non-response, it’s reasonable to expect that modelling-based approaches to non-probability data will also work, and reasonable to hope that they might even work better (given sufficient data and careful modelling). Whether they do work better is an empirical question, but these model-based approaches aren’t a flashy new fad. Rod Little, who pioneered the methods AAPOR is objecting to, did so nearly twenty years before his stint as Chief Scientist at the US Census Bureau, an institution not known for its obsession with the latest fashions.
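For readers unfamiliar with the idea, here is a toy sketch of post-stratification, the simplest form of the model-based adjustment being discussed. All the numbers are invented, and real analyses of the kind Gelman and Rothschild describe fit regression models across many small demographic cells rather than three big ones:

```python
# Toy post-stratification: reweight an opt-in sample so its
# demographic mix matches known population shares. All numbers
# here are invented for illustration.

# Invented opt-in sample: (respondents, supporters) by age group.
sample = {
    "18-34": (600, 360),   # young people over-represented online
    "35-64": (300, 120),
    "65+":   (100, 30),
}

# Invented population shares for the same age groups.
population_share = {"18-34": 0.30, "35-64": 0.45, "65+": 0.25}

# Naive estimate: ignore who responded.
raw = sum(s for _, s in sample.values()) / sum(n for n, _ in sample.values())

# Post-stratified estimate: within-group support, weighted by
# each group's share of the population.
poststrat = sum(
    population_share[g] * (s / n) for g, (n, s) in sample.items()
)

print(f"raw: {raw:.3f}, post-stratified: {poststrat:.3f}")
```

In this invented example the raw opt-in estimate is 51% support, but reweighting to the population’s age mix pulls it down to 43.5%, because the over-represented young respondents were also the most supportive. That correction is the basic move; the modelling effort goes into making the within-cell estimates stable when cells are small.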

In some settings modelling may not be feasible because of a lack of population data. In a few settings non-response is not a problem. Neither of those applies in US political polling. It’s disturbing when the president of one of the largest opinion-polling organisations argues that model-based approaches should not be referenced in the media, and that’s even before considering some of the disparaging language being used.

“Don’t try this at home” might have been a reasonable warning to pollsters without access to someone like Andrew Gelman. “Don’t try this in the New York Times” wasn’t.