September 22, 2012

Why most research in the news is wrong

There’s a new paper in the journal PLoS One (free access), looking at what research makes the news.

We focused on attention deficit hyperactivity disorder (ADHD). Using Factiva and PubMed databases, we identified 47 scientific publications on ADHD published in the 1990s and soon echoed by 347 newspaper articles. We selected the ten most echoed publications and collected all their relevant subsequent studies until 2011. We checked whether findings reported in each “top 10” publication were consistent with previous and subsequent observations. We also compared the newspaper coverage of the “top 10” publications to that of their related scientific studies.

The media are more likely to report the more surprising findings, and surprising findings are more likely to be wrong. The relatively boring research that contradicts them doesn’t get reported, because it would put people to sleep.

To a lesser extent, the same bias is found in the scientific literature. If you find something surprising and dramatic, you’re more likely to work nights and weekends to get it written up, and to submit it to a top journal. Research with methodologic problems is more likely to get past peer review if it’s interesting than if it’s boring.

In the case of clinical trials, this publication bias has been recognised as a clear and present danger to patient care, and steps have been introduced to reduce its impact. More generally, it’s hard to see what to do about the scientific literature: we don’t actually want a lot of boring, low-quality research in academic journals (we have to read them, after all), and we don’t want to suppress interesting findings just because the research wasn’t perfect.

For newspapers, a bit more restraint in scientific press releases would help. The philosopher Daniel Dennett had a nice phrasing of this. I don’t remember it exactly, but it was along the lines of “Preliminary scientific results should be treated like any other potentially hazardous laboratory waste and not irresponsibly discharged into the environment”.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

    This has been fairly widely discussed but I suspect is not all that well known. A prominent paper is
    Ioannidis, J.P. (2005) “Why most published research findings are false”, PLoS Medicine 2(8): e124 (http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124)

    The question of how to ensure that appropriate credit is given for replicating experiments with headline-grabbing outcomes, even when the replication shows that the exciting claim is not valid, goes to the heart of the messed-up reward system of science.


    • Thomas Lumley

      Yes, but Ioannidis is making a much stronger claim, and in my opinion he seriously overestimates the size of the effects he is talking about (even though the phenomenon is real).

      For instance, one of his examples says “Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia”, and goes on to work out what would happen at a p-value threshold of 0.05, or at the effectively higher threshold that results from analytic fudging.

      In fact, in the days when people published genetic associations at p=0.05 without replication, the cost of measurement meant that very few variants were studied, and the candidates were much more strongly pre-selected than a field of 100,000 possibilities. Some of us did replication before publishing even then. In the modern world, people use a threshold like p = 5×10⁻⁸, often together with replication or follow-up in lab models.
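
      As a rough illustration of how much the threshold matters (with made-up numbers: say 10 real associations among 100,000 variants, and 80% power to detect each one), the expected yield at the two thresholds works out like this:

```python
# Back-of-the-envelope sketch with illustrative numbers (not from the paper):
# 100,000 variants tested, 10 truly associated, 80% power for each true one.
n_variants = 100_000
n_true = 10
power = 0.8

for alpha in (0.05, 5e-8):  # lenient threshold vs genome-wide threshold
    false_pos = (n_variants - n_true) * alpha  # expected false positives among null variants
    true_pos = n_true * power                  # expected true positives detected
    ppv = true_pos / (true_pos + false_pos)    # chance that a given "hit" is real
    print(f"alpha={alpha:g}: ~{false_pos:g} false positives, "
          f"~{true_pos:g} true positives, PPV ~ {ppv:.3f}")
```

      Under these assumptions, at p=0.05 almost every “hit” is a false positive; at the genome-wide threshold almost none are.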

      The same problem, to a lesser extent, runs through a lot of his computations: he’s basically assuming that statistical worst practices are followed uniformly, which just isn’t true. And the fields and journals where that’s closer to the truth are the ones whose findings we already don’t believe.
