March 5, 2013

Biomarkers and the underpants gnomes

The Gnomes appeared in an episode of South Park. They had a detailed business plan:

  1. Steal underpants
  2. ???
  3. Profit!

I’ve just been pointed to a story, ‘Make your own cancer diagnostic test’, from a Stanford Medical School newsletter about a year ago. The idea seems to be

  1. Find a biomarker
  2. ???
  3. Diagnostic test!

That is, the story describes how you could use the massive databases of knowledge about gene expression, and the ability to order up inexpensive samples and assays, to find a cancer biomarker, a protein that was present in large quantities in people with a specific type of cancer, but not in healthy people.

There are a few problems before you even get that far, like the fact that most proteins don’t wander around in the blood but stay inside cells or attached to membranes, but those issues could be handled without too much difficulty.  There’s also the possibility that the particular type of cancer you’re looking at doesn’t put large quantities of any unique protein into the blood, but let’s ignore that one.

The real problem is that what you end up with is a strategy for diagnosing cancer in people who already know they have it. For a diagnostic test to be useful, it has to diagnose cancer accurately, with few false positives, and do it well before you would otherwise know about it. That’s hard. There are plenty of known protein biomarkers for cancer, but very few of them (some people would say none of them) are currently useful for early detection.
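To make the false-positive point concrete, here is a small Python sketch of the positive predictive value (the fraction of positive results that really are cancer), computed from sensitivity, specificity and prevalence with Bayes’ theorem. The sensitivity, specificity and prevalence figures are hypothetical, chosen only for illustration; they do not describe any real biomarker. A 50% prevalence stands in for a discovery study with equal numbers of cases and healthy controls, and 0.5% stands in for a screening population.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of positive test results that are true positives (Bayes' theorem)."""
    true_pos = sensitivity * prevalence                # cancer and test positive
    false_pos = (1 - specificity) * (1 - prevalence)   # no cancer but test positive
    return true_pos / (true_pos + false_pos)


# Hypothetical numbers for illustration only, not from any real biomarker study.
settings = [
    ("case-control study (50% cases)", 0.5),
    ("screening (0.5% prevalence)", 0.005),
]
for label, prevalence in settings:
    for specificity in (0.95, 0.99):
        ppv = positive_predictive_value(0.9, specificity, prevalence)
        print(f"{label:32s} specificity={specificity:.2f}  PPV={ppv:.1%}")
```

With these made-up numbers, a test that is 95% specific looks excellent when half the people tested have cancer, but in the screening setting most of its positive results would be false, and even 99% specificity leaves roughly two out of three positives false.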

To drive this point home: ten years ago, a paper appeared in Proceedings of the National Academy of Sciences describing a better version of this proposed search strategy for biomarkers. It worked, in the sense that they discovered new biomarkers for multiple types of cancer. With a decade of follow-up, how many of these have been turned into new diagnostic tests? Not a lot.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.