June 7, 2017

Fraud or typos?

The Guardian says: “Dozens of recent clinical trials may contain wrong or falsified data, claims study”

A UK anaesthetist, John Carlisle, has scraped 5000 publications of randomised clinical trials, in which patients are divided randomly into two groups before treatment is assigned, and looked at whether the two groups are more similar or more different at baseline than you’d expect by chance. His motivation appears to be that groups which are too similar can be a sign of incompetent fraud by someone who doesn’t understand basic statistics. However, the statistical hypothesis he’s testing isn’t actually about fraud, or even about incompetent fraud.
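
To make “expected by chance” concrete: when allocation really is random, the p-value from comparing a continuous baseline variable between the two arms is roughly uniform on (0, 1), so very small p-values (groups too different) and p-values very close to 1 (groups too similar) should each be rare. Here is a minimal simulation sketch in R. The trial size, the number of trials and the age-like variable are made-up values for illustration, not anything taken from the research paper.

```r
set.seed(2017)                      # arbitrary seed, for reproducibility only

n_per_arm <- 50                     # hypothetical trial size
n_trials  <- 5000                   # hypothetical number of simulated trials

# One simulated trial: draw a baseline variable for all participants from
# the same distribution, allocate at random, and t-test the two arms.
baseline_p <- function(n) {
  x   <- rnorm(2 * n, mean = 60, sd = 10)      # e.g. age; values are invented
  arm <- sample(rep(c("A", "B"), each = n))    # random allocation
  t.test(x[arm == "A"], x[arm == "B"], var.equal = TRUE)$p.value
}

p <- replicate(n_trials, baseline_p(n_per_arm))

# Under randomisation each tail holds about its nominal share of trials.
mean(p < 0.01)    # proportion "too different"
mean(p > 0.99)    # proportion "too similar"
```

Both proportions should come out near 1%. A collection of trials with far more than that in either tail is anomalous, but the simulation says nothing about why.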

As the research paper notes, some of the anomalous results can be explained by simple writing errors, such as saying “standard deviation” when you mean “standard error”, and this would, if anything, be evidence against fraud. Even in the cases where that specific writing error isn’t plausible, looking at the paper can show data fabrication to be an unlikely explanation. For example, in one of the papers singled out as having a big difference not explainable by the standard deviation/standard error confusion, the difference is in one blood chemistry measurement (tPA) that doesn’t play any real role in the conclusions. The data are not consistent with random error, but they also aren’t consistent with deliberate fraud. They are more consistent with someone typing 3.2 when they meant 4.2. This would still be a problem with the paper, both because some relatively unimportant data are wrong and because it says bad things about your workflow if you are still typing Table 1 by hand in the 21st century, but it’s not on the same scale as data fabrication.
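
The standard deviation/standard error mix-up is easy to illustrate. If Table 1 prints a standard error but labels it a standard deviation, anyone recomputing the comparison divides by the square root of n a second time and the groups look wildly different. The reverse mislabelling makes them look implausibly similar. Here is a rough sketch in R with invented summary numbers. The helper `p_from_summary` and all the values are hypothetical, and this is a plain pooled two-sample t-test, not necessarily the calculation used in the paper.

```r
# Two-sided p-value for a difference in means, from per-arm summaries.
# Pooled-variance t-test; `sd1` and `sd2` are taken to be standard deviations.
p_from_summary <- function(m1, sd1, n1, m2, sd2, n2) {
  df     <- n1 + n2 - 2
  pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)
  se     <- pooled * sqrt(1 / n1 + 1 / n2)
  2 * pt(-abs(m1 - m2) / se, df)
}

n <- 50                              # hypothetical arm size
m1 <- 4.2; m2 <- 4.5                 # hypothetical arm means
true_sd <- 1.5                       # hypothetical true SD in both arms

# Correctly labelled table: an unremarkable baseline difference.
p_correct <- p_from_summary(m1, true_sd, n, m2, true_sd, n)

# Table prints the SE (sd / sqrt(n)) but calls it "SD": looks too different.
p_se_as_sd <- p_from_summary(m1, true_sd / sqrt(n), n, m2, true_sd / sqrt(n), n)

# Table prints the SD but calls it "SE", so the reader multiplies by sqrt(n)
# to recover an "SD": looks too similar.
p_sd_as_se <- p_from_summary(m1, true_sd * sqrt(n), n, m2, true_sd * sqrt(n), n)

c(correct = p_correct, se_as_sd = p_se_as_sd, sd_as_se = p_sd_as_se)
```

With these invented numbers the correctly labelled table gives an ordinary p-value, the first mislabelling gives a vanishingly small one, and the second gives one close to 1. The underlying data are the same in all three cases.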

You’d think the Guardian might be more sympathetic to typos as an explanation of error.



Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.


  • Thomas Lumley

    Also notable: the description of the method in the Guardian is completely wrong:

    “The tool works by comparing the baseline data, such as the height, sex, weight and blood pressure of trial participants, to known distributions of these variables in a random sample of the populations.”

    Comparing the trial participants to a random sample of the population would be useless: trial participants aren’t remotely a random sample of the population.

    4 months ago

  • Megan Pledger

    I’m still not entirely clear what he did, but from what I gather he scraped the means and variances/standard errors from the baseline table in published articles and calculated the p-value for the difference between treatment groups for each baseline variable.

    I guess my main worry would be the amount of rounding in the standard errors that he scraped from publications. I would generally report SEs rounded to 2 sig figures in an article but calculate the p-value on unrounded values. Even quite small changes in SE can cause quite large changes in the p-value, especially in the extremes, e.g.

    Rounded SE as for an article: df=100, Mean difference 1, se of difference 0.4
    p-value = pt(-1/0.40,100)*2 = 0.01404579

    Potential low value of se: df=100, Mean difference 1, se of difference 0.35
    p-value = pt(-1/0.35,100)*2 = 0.005200736
    p-value difference = 0.01404579 - 0.005200736 = 0.008845054

    Potential high value of se: df=100, Mean difference 1, se of difference 0.45
    p-value = pt(-1/0.45,100)*2 = 0.02852148
    p-value difference = 0.01404579 - 0.02852148 = -0.01447569

    If the SE is heavily rounded then you’d expect the resulting p-value distribution to be heavier in the tails, i.e. the p-value from the rounded SE is closer to the more extreme of the two possible p-values than to the less extreme one.
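
    The same arithmetic can be wrapped up for any reported SE. This is a small R sketch, assuming as in the example above that a published SE of 0.4 could stand for any unrounded value from 0.35 to 0.45. The helper name `p_two_sided` is made up here.

    ```r
    # Two-sided p-value from a mean difference, its standard error, and df.
    p_two_sided <- function(diff, se, df) 2 * pt(-abs(diff) / se, df)

    diff <- 1
    df   <- 100
    se   <- c(low = 0.35, reported = 0.40, high = 0.45)

    # One p-value per candidate SE; vapply keeps the names from `se`.
    p <- vapply(se, function(s) p_two_sided(diff, s, df), numeric(1))
    p

    # Shift in the p-value if the true SE sat at either end of the rounding range.
    c(low  = p[["reported"]] - p[["low"]],
      high = p[["reported"]] - p[["high"]])
    ```

    This should reproduce the figures quoted above. Changing `diff`, `df` or the range in `se` shows how the asymmetry depends on where the p-value sits.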

    3 months ago

    • Thomas Lumley

      Yes, that’s right, but he’s interested in bigger anomalies than those. I still think he’s hugely overselling fraud as a cause of them compared to transcription error, and it would only catch really incompetent fraud, but the idea is useful for quick pre-screening of submitted papers. You could then automate a “Please check Table 1 and resubmit with an explanation of the reason for the errors” message.

      3 months ago