April 18, 2020

Prevalence estimation: is it out there?

One of the known unknowns about the NZ coronavirus epidemic is the number of cases we have not detected. There will have been a mixture of people who didn’t get any symptoms, people who are going to show symptoms but haven’t yet, people who got moderately sick but didn’t get tested, and people whose deaths were attributed to some pre-existing condition without testing.

For the decision to loosen restrictions, we care mostly about people who are currently infected, who aren’t (currently) sick enough to get testing, and who aren’t known contacts of previous cases. What can we say about this number — the ‘community prevalence’ of undetected coronavirus infection in New Zealand?

One upper bound is that we’re currently seeing about 1% positive tests in people who either have symptoms or are close contacts of cases. The prevalence in close contacts of cases must be higher than in the general population — this is an infectious disease — so we can be fairly confident the population prevalence is less than 1%.

Are there any other constraints? Well, infection isn’t a static process. If you have coronavirus in 1% of Kiwis (about 50,000 people in a population of roughly 5 million), they will pass it on to other people and they themselves will recover. At the moment, under level 4, the epidemic modellers at Te Pūnaha Matatini are estimating a reproduction number of about 0.5, so 50,000 cases will infect half that many new people. Now, if we’re missing nearly all the cases, the modelling might not be all that accurate, but there would have to be tens of thousands of new infections. And at least a few percent of those new cases will be sick enough to need medical treatment. We would quickly notice that many people showing up to hospitals with (by assumption) no known contacts. It isn’t happening. Personally, I have a hard time believing in a prevalence as high as 0.2%, which would mean we’re missing over 80% of cases.
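
As a rough check on the arithmetic in this paragraph, here is a minimal Python sketch. Only the reproduction number of 0.5 comes from the modelling mentioned above. The population of 5 million and the 3% treatment fraction are illustrative assumptions standing in for "Kiwis" and "a few percent", and the figure of 1,400 detected cases is the one quoted in the comment thread below.

```python
# Back-of-envelope check. NZ_POPULATION and TREATMENT_FRACTION are
# illustrative assumptions, not figures from the modelling.
NZ_POPULATION = 5_000_000    # assumed round figure for the NZ population
R_EFFECTIVE = 0.5            # reproduction number quoted in the post
TREATMENT_FRACTION = 0.03    # assumed stand-in for "a few percent"
DETECTED_CASES = 1_400       # approximate known cases, per the comments


def implied_burden(prevalence):
    """Return (current infections, next-generation infections,
    next-generation infections needing treatment)."""
    current = prevalence * NZ_POPULATION
    new = R_EFFECTIVE * current
    return current, new, TREATMENT_FRACTION * new


def fraction_missed(prevalence):
    """Share of current infections that have not been detected."""
    return 1 - DETECTED_CASES / (prevalence * NZ_POPULATION)


for prev in (0.01, 0.002):
    current, new, treated = implied_burden(prev)
    print(f"prevalence {prev:.1%}: {current:,.0f} infected, "
          f"{new:,.0f} new infections, {treated:,.0f} needing treatment, "
          f"{fraction_missed(prev):.0%} of cases missed")
```

Under these assumptions a 0.2% prevalence implies about 10,000 current infections, of which roughly 86% would be undetected.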

The other constraint would come from testing of healthy people, which is why the government has started doing that. If you wanted an accurate estimate for the population as a whole, you’d need some sort of random population sample, but in the short term it makes more sense to take a sensibly-constructed random sample of supermarkets and then test their customers and employees. If there’s major undetected spread, supermarkets are one of the likely places for it to happen, and they’re also a convenient place to find people who are already leaving home, so you can test them without barging into their bubbles. So, we aren’t getting a true population prevalence estimate, but we are getting an estimate of something a bit like it but probably higher.

How many do we need to test? It depends on how sure you want to be. If we sample 10,000 people and 4 are positive, we could estimate* prevalence at 4/10,000, or 0.04%. But what if no-one is positive? The best estimate clearly isn’t zero!

The question gets more extreme with smaller sample sizes: if we sample 350 people (as was done at the Queenstown PakNSave) and find no cases, what can we say about the prevalence? The classical answer, a valuable trick for hallway statistical consulting, is that if the true rate is 3/N or higher, the chance of seeing no cases in N tests is less than 5%. So, if we see no cases in 350 people, we can be pretty sure the prevalence was less than 3/350, or about 1%. Since we were already pretty sure the prevalence was way less than 1%, that hasn’t got us much further forward. We’re eventually going to want thousands, or tens of thousands, of tests. The Queenstown testing was only a start.
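
The 3/N rule can be checked numerically. The sketch below assumes a simple random sample, independent tests, and a perfect test (none of which holds exactly for supermarket sampling), and compares the 3/N shortcut with the exact prevalence at which the chance of zero positives falls to 5%. The function names are illustrative.

```python
def prob_no_positives(prevalence, n_tests):
    """Chance that n_tests independent tests all come back negative."""
    return (1 - prevalence) ** n_tests


def upper_bound_exact(n_tests, alpha=0.05):
    """Prevalence at which the chance of zero positives equals alpha."""
    return 1 - alpha ** (1 / n_tests)


def upper_bound_rule_of_three(n_tests):
    """The hallway-consulting shortcut: 3/N."""
    return 3 / n_tests


for n in (350, 10_000):
    print(f"N={n}: rule of three {upper_bound_rule_of_three(n):.3%}, "
          f"exact {upper_bound_exact(n):.3%}, "
          f"P(no positives at 3/N) {prob_no_positives(3 / n, n):.3f}")
```

For 350 tests the exact bound is roughly 0.85%, against 0.86% from the shortcut, so the rule is slightly conservative and close enough for quick use.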

After that introduction, you’ll understand my reaction when Radio NZ’s Checkpoint said there had been a positive test in the Queenstown supermarket, with only two-thirds of the samples run through the lab. Fortunately, it turns out there had been a misunderstanding and there has not yet been a positive result from this community testing. If the true rate is 0.1% there’s a good chance we’ll see a community-positive test soon; if it’s 0.01%, not for a while. And if we’re really at the level of eliminating community transmission, even longer.

Update: Statistical uncertainty in the other direction also matters. If the true prevalence is p and you test N people, you get pN positive tests on average, but your chance of getting no positive tests is approximately e^(-pN). So, if you test 350 people and the true prevalence is 0.1%, your chance of getting no positive tests is about 70% and your chance of at least one positive is about 30%. And a positive test in Queenstown would have been surprising, but shouldn’t have been a complete shock. Two positive tests should be a shock.
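
The same Poisson approximation gives the chance of two or more positives, which is what makes two positives more surprising than one. This is a minimal sketch assuming independent tests and a perfect test; the function name is illustrative.

```python
import math


def poisson_probs(prevalence, n_tests):
    """Poisson approximation to the chance of 0, exactly 1,
    and 2 or more positives among n_tests tests."""
    mean = prevalence * n_tests          # expected positives, pN
    p_zero = math.exp(-mean)
    p_one = mean * math.exp(-mean)
    p_two_plus = 1 - p_zero - p_one
    return p_zero, p_one, p_two_plus


p_zero, p_one, p_two_plus = poisson_probs(0.001, 350)
print(f"no positives: {p_zero:.1%}")
print(f"exactly one:  {p_one:.1%}")
print(f"two or more:  {p_two_plus:.1%}")
```

With 350 tests and a true prevalence of 0.1%, two or more positives would turn up only about 5% of the time, and much less often at lower prevalences.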

* There’s another complication, for another post, in that the test isn’t perfect. The estimate would actually be more like 0.05% or 0.06%.
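
One simple way to see how an imperfect test could move 0.04% up to 0.05% or 0.06% is to divide the observed rate by the test's sensitivity. This is only a sketch of one possible reading of the footnote: the sensitivity values of 80% and 70% are assumptions chosen for illustration, and false positives are assumed not to occur.

```python
def adjust_for_sensitivity(observed_prevalence, sensitivity):
    """Scale up an observed positive rate for missed infections,
    assuming the test gives no false positives."""
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    return observed_prevalence / sensitivity


observed = 4 / 10_000
for sens in (0.8, 0.7):   # assumed sensitivities, for illustration only
    adjusted = adjust_for_sensitivity(observed, sens)
    print(f"sensitivity {sens:.0%}: adjusted prevalence {adjusted:.3%}")
```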

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

Owen Watson

Any comment on the Santa Clara study, as in https://medium.com/@balajis/peer-review-of-covid-19-antibody-seroprevalence-in-santa-clara-county-california-1f6382258c25

4 years ago
- Thomas Lumley
  
  Those comments look reasonable to me. I haven’t had a chance to read the original study results yet.
  
  4 years ago
Peter Davis

Can you just clarify the run of the argument in para 4? You start by talking about the reproduction rate TPM are using (0.5). Then later in the paragraph you talk about a prevalence rate of 0.2% and missing 80% of cases. How do you get from one to the other? Also, I take it that the reproduction rate is a function of the dynamics of the disease epidemic (e.g. subject to adequate policy interventions to reduce its spread). But what is the infectivity or contagion rate in the disease’s “natural history”? If you look at, say, the Bluff, Matamata or Marist clusters, the infectious spread from a single source is startling!

4 years ago
- Thomas Lumley
  
  The point of the reproduction number is to argue that a rate of close to 1% is impossible, because 1% would mean 50,000 cases, which would lead to new cases.
  
  The same thing happens, pro rata, at lower rates. At 0.2%, there are 10,000 cases, of which we’ve seen 1,400, less than 20%. Those 10,000 would be expected to cause obvious new community transmission.
  
  4 years ago
Megan Pledger

I know two groups of people who traveled overseas, had symptoms and got tested (along with very close contacts). They were all negative.

They are all the people I personally know who returned from overseas during this time.

So, in the initial testing there could have been a lot of worried well.

4 years ago
- Geoffrey Platt
  
  People could have arrived with a cold or flu. I have arrived home from a trip with a cold more than once, and once I had flu when I got home and was sick for a week.
  
  4 years ago
  - Megan Pledger
    
    Perhaps I should have said “worried (almost) well” i.e. they are the usual maladies that people get from plane travel.
    
    4 years ago
Martyn Fields

Now we have moved to asymptomatic community testing – I have been wondering why all medical / nursing / hospital staff / rest home staff are not being tested at this stage? They appear to be a group at higher risk of being positive. If all were tested – it would provide more info of the incidence of asymptomatic carriers plus the added benefit of picking up someone that could cause significant issues in a hospital.
Is there a reason that this group is not being tested as a priority?

4 years ago
- Thomas Lumley
  
  I’m the wrong person to ask; I don’t have any inside information. They are another plausible essential-worker group apart from supermarket employees. I don’t know how high-risk they actually are in NZ — they have better training and equipment than other essential workers, and they haven’t been overwhelmed the way health workers in some other countries have been — but it would make sense.
  
  4 years ago
Robert McLachlan

See these discussions of some possible problems with the Santa Clara study, especially the false positive rate and a possibly enriched sample:

https://medium.com/@balajis/peer-review-of-covid-19-antibody-seroprevalence-in-santa-clara-county-california-1f6382258c25
https://twitter.com/mattsheffield/status/1251285817735208962

On the face of it, it is hard to understand why they didn’t ask the participants if they had been tested, had any contacts with positives, or had shown any symptoms.

4 years ago
Adrian Bunk

Why would two positive tests be a shock for you?

Undetected cases might not be evenly distributed over the country, there might be undetected clusters. Of course not the same size as the undetected cluster in Lombardy/Italy was, but hitting two people who are part of a small cluster of asymptomatic cases would be plausible.

4 years ago
- Thomas Lumley
  
  The existence of a small cluster wouldn’t be a shock, but sampling a cluster with 300 tests at one location would be — it would suggest there were a lot of small clusters around the country.
  
  4 years ago
  - Adrian Bunk
    
    But then you should be shocked already about one positive test.
    
    You are getting closer to the point in time where the findings will be either “zero new cases” or “undetected cluster”.
    
    4 years ago