July 4, 2012

Physicists using statistics

Traditionally, physics was one of the disciplines whose attitude was “If you need statistics, you should have designed a better experiment”. If you look at the CERN webcast about the Higgs boson, though, you see that it’s full of statistics: improved multivariate signal processing, boosted decision trees, random variations in the background, and so on.

Increasingly, physicists have found, like molecular biologists before them, and physicians before that, that sometimes you can’t afford to do a better experiment. When your experiment costs billions of dollars, you really have to extract the maximum possible information from your data.

As you have probably heard by now, CERN is reporting that they have basically found the Higgs boson: the excess production of certain sets of particles deviates from a non-Higgs model by 5 times the statistical uncertainty: 5σ. Unfortunately, a few other sets of particles don’t quite match, so when all the data are combined they have 4.9σ, just below their preferred threshold.

So what does that mean? Any decision procedure requires some threshold for making a decision. For drug approval in the US, you need two trials that each show the drug is more effective than placebo by twice the statistical uncertainty: i.e., two replications of 2σ, which works out to a combined exceedance of 2 × √2 ≈ 2.8 times the statistical uncertainty: 2.8σ. This threshold is based on a tradeoff between the risk of missing a treatment that could be useful and the risk of approving a useless drug. In the context of drug development this works well: drugs get withdrawn from the market for safety, or because the effect on a biological marker doesn’t translate into an effect on actual health, but it’s very unusual for a drug to be approved when it just doesn’t work.
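The 2.8σ figure can be checked with a short sketch. It assumes the two trials are independent and equally weighted, so their z-scores are summed and divided by the square root of the number of trials (often called Stouffer's method; that label and the function name `combine_z` are additions for illustration, not something the post specifies).

```python
import math

def combine_z(z_scores):
    """Equal-weight combination of independent z-scores.

    The sum of n independent standard normal variables has standard
    deviation sqrt(n), so dividing by sqrt(n) puts the combined
    statistic back on the sigma scale.
    """
    return sum(z_scores) / math.sqrt(len(z_scores))

# Two trials, each exceeding placebo by 2 sigma.
combined = combine_z([2.0, 2.0])
print(f"combined evidence: {combined:.2f} sigma")
```

With two 2σ results the combined value is 2 × √2, which rounds to the 2.8σ quoted above.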

In the case of particle physics, false positives could influence research for many years, so once you’ve gone to the expense of building the Large Hadron Collider, you might as well be really sure of the results. Particle physics uses a 5σ threshold, which means that in the absence of any signal they have only about a 1 in 3.5 million chance per analysis of deciding they have found a Higgs boson. Despite what some of the media say, that’s not quite the same as a 1 in 3.5 million chance of being wrong: if nature hasn’t provided us with a 125 GeV Higgs boson, an analysis that finds one has a 100% chance of being wrong; if there is one, it has a 0% chance of being wrong.
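The step from a σ threshold to a false-positive rate can be sketched in a few lines. This assumes a one-sided tail of a standard normal distribution, which is the usual particle-physics convention; the function name `one_sided_tail` is made up for the example.

```python
import math

def one_sided_tail(z):
    """Probability that a standard normal variable exceeds z.

    Uses the complementary error function, which stays accurate
    far out in the tail where 1 - cdf(z) would lose precision.
    """
    return 0.5 * math.erfc(z / math.sqrt(2))

# Thresholds mentioned in the post: drug trials and particle physics.
for z in (2.0, 2.8, 4.9, 5.0):
    p = one_sided_tail(z)
    print(f"{z:.1f} sigma: p = {p:.3g}, about 1 in {1 / p:,.0f}")
```

At 5σ the tail probability is close to 2.9 × 10⁻⁷, or roughly 1 in 3.5 million. That is the chance of crossing the threshold when there is no signal. It is not the probability that a claimed discovery is wrong.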

 

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • People don’t really use decision thresholds in fundamental science do they? It’s not like they actually have to make a public policy decision now. Instead, they just have a subjective degree of confidence that varies smoothly with the number of sigmas.

    “Despite what some of the media says, that’s not quite the same as a 1 in 30 million chance of being wrong”

    Absolutely. Many scientists will make such a statement too, not just the media.

    “If nature hasn’t provided us with a 125GeV Higgs Boson, an analysis that finds the result has a 100% chance of being wrong, if there is one, it has a 0% chance of being wrong.”

    That’s all true, but not super useful. People want to know the probability that the Higgs boson exists given the data. Not the probability that it exists given that it exists. :-)

    12 years ago

    • Thomas Lumley

      People really do use decision thresholds in particle physics. They argue about them, but they do use them, at least for deciding who gets to make the “we’ve found it” announcement. Of course they don’t use thresholds for deciding what they believe about the Standard Model. That would be silly.

      On the p-value interpretation, I agree that people want to know the probability that the Higgs Boson exists, but that number cannot be one minus the p-value, and the p-value is the number being quoted. Nor will it be the same as the probability that the Higgs Boson exists and has mass roughly 125 GeV, or the probability that the Higgs Boson exists and CERN has found it.

      The exact values of all of these would be subjective, but I think we can agree on “pretty bloody likely”.

      12 years ago

  • Agreed. Most physicists seem pretty convinced by the result. Also the NZ Herald has “scientists 99.999% sure of result” in the headline. That’s about how sure I am that they’re incorrectly interpreting one minus the p-value as the posterior probability.

    12 years ago