May 1, 2020

The right word

Scientists often speak their own language. They sometimes use strange words, and they sometimes use normal words but mean something different by them. Toby Morris & Siouxsie Wiles have an animation of some examples.

The goal of scientific language is usually to be precise, to make distinctions that aren’t important in everyday speech. Scientists aren’t trying to confuse you or keep you out, though those effects can happen — and they aren’t always unwelcome. I’ve written on my blog about two examples: bacteria vs virus (where the scientists are right) and organic (where they need to get over themselves).

This week’s example of the conflict between being approachable and being precise is the phrase “false positive rate”. When someone gets a COVID test, whether it looks for the virus itself or for antibodies they’ve made in reaction to it, the result could be positive or negative. We can also divide people up by whether they really have (or had) a COVID infection or never had one. This gives four possibilities:

True positives: positive test, have/had COVID
True negatives: negative test, really no COVID
False positives: positive test, really no COVID
False negatives: negative test, have/had COVID
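As a small illustration, here is a hypothetical Python sketch that sorts tested people into those four groups. The function name `classify` and the five example people are made up; each person is a pair of (test positive?, really has/had COVID?).

```python
from collections import Counter


def classify(test_positive: bool, has_covid: bool) -> str:
    """Label one tested person as TP, FP, TN or FN."""
    if test_positive:
        # A positive test is a true positive only if the person really has/had COVID.
        return "TP" if has_covid else "FP"
    # A negative test is a false negative if the person really has/had COVID.
    return "FN" if has_covid else "TN"


# Hypothetical example data: (test positive?, really has/had COVID?)
people = [(True, True), (True, False), (False, False), (False, False), (False, True)]
counts = Counter(classify(test, truth) for test, truth in people)
print(counts)
```

Every summary discussed in this post is some ratio of these four counts.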

If you encounter something called the “false positive rate”, what is it? It obviously involves the false positives, divided by something, but it could be false positives as a proportion of all positive tests, or false positives as a proportion of people who don’t have COVID, or even false positives as a proportion of all tests. It turns out that the first two of these definitions are both in common use.
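To see how much the choice of denominator matters, here is a sketch using made-up counts for 1000 people tested (these numbers are illustrative assumptions, not data from any real study). The same 40 false positives give three quite different “false positive rates”.

```python
# Hypothetical counts for 1000 people tested (illustrative only).
tp, fp, tn, fn = 90, 40, 860, 10

# Candidate 1: false positives as a share of all positive tests.
fp_over_positive_tests = fp / (tp + fp)
# Candidate 2: false positives as a share of people who don't have COVID.
fp_over_no_covid = fp / (fp + tn)
# Candidate 3: false positives as a share of all tests.
fp_over_all_tests = fp / (tp + fp + tn + fn)

print(f"{fp_over_positive_tests:.3f} {fp_over_no_covid:.3f} {fp_over_all_tests:.3f}")
# prints "0.308 0.044 0.040"
```

With these counts, nearly a third of positive results are wrong, even though fewer than 5% of uninfected people test positive.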

Scientists (statisticians and epidemiologists) would define two pairs of accuracy summaries:

Sensitivity: true positives divided by people with COVID
Specificity: true negatives divided by people without COVID
Positive Predictive Value (PPV): true positives divided by all positive tests
Negative Predictive Value (NPV): true negatives divided by all negative tests
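The four summaries translate directly into code. This is a minimal sketch; the `Counts` container and the `accuracy_summaries` function are hypothetical names introduced here, and the example counts are made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    tp: int  # positive test, have/had COVID
    fp: int  # positive test, really no COVID
    tn: int  # negative test, really no COVID
    fn: int  # negative test, have/had COVID


def accuracy_summaries(c: Counts) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from the four counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positives / people with COVID
        "specificity": c.tn / (c.tn + c.fp),  # true negatives / people without COVID
        "ppv": c.tp / (c.tp + c.fp),          # true positives / all positive tests
        "npv": c.tn / (c.tn + c.fn),          # true negatives / all negative tests
    }


# Usage with hypothetical counts for 1000 people tested.
example = accuracy_summaries(Counts(tp=90, fp=40, tn=860, fn=10))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity divide by true disease status; PPV and NPV divide by test result. That is the distinction the two “false positive rate” definitions blur.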

The first ‘false positive rate’ definition is 1-PPV (not 1-NPV, as this post originally said); the second is 1-specificity.
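Writing each summary in terms of the four counts (TP, FP, TN and FN are shorthand for the four groups listed above) shows that the two definitions share a numerator and differ only in the denominator:

```latex
\begin{aligned}
1 - \text{PPV} &= 1 - \frac{TP}{TP + FP} = \frac{FP}{TP + FP}
  &&\text{(false positives / all positive tests)}\\
1 - \text{specificity} &= 1 - \frac{TN}{TN + FP} = \frac{FP}{FP + TN}
  &&\text{(false positives / people without COVID)}
\end{aligned}
```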

If you write about the antibody studies carried out in the US, you can either use the precise terms, which will put off readers who don’t know the jargon, or use the vague terms, in which case readers who know a bit about the topic may misunderstand and think you’ve got them wrong.

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

Andrew Matheson

An interesting post, and very helpful in understanding some of the current discussion.

I’m not a statistician of any sort, but this statement seems odd: “The first ‘false positive rate’ definition is 1-NPV”. Shouldn’t it be 1-PPV?

Thanks

4 years ago
- Thomas Lumley
  
  No, it’s false positives/(false positives + true negatives)
  
  4 years ago
  - Rob Sagetti
    
    Would FP / (FP + TN) not be 1 – specificity, which is the second definition?
    The first definition, FP as a proportion of all positives, or FP / (TP + FP), would be 1 – PPV, no?
    
    4 years ago
  - Tommy Jones
    
    I think Andrew is right unless I’m misunderstanding something fundamental. (Always probable in my case.)
    
    1 – NPV is the false omission rate, a type of false negative. 1 – PPV is the false discovery rate, a type of false positive.
    
    For reference: https://en.wikipedia.org/wiki/Sensitivity_and_specificity
    
    There’s a long table down and to the right that has the definitions of just about anything one would care to calculate from a confusion matrix.
    
    4 years ago
