February 26, 2017

This bit is even nerdier

Nick Smith, the Environment Minister, on Stuff

This bit is very nerdy. We are saying at 540 E.coli the risk is one in 20 (of getting sick).  But that one in 20 is at the 95 per cent confidence level. So there is an extra level of cautiousness. Even if you put 20 people in water and it has a 540 E.coli level it’s not saying on average one person gets sick out of 20. It’s saying one in 20 of 20 groups will have one in 20 get sick.

No, it’s not saying that.

Let’s step back a bit. First, why is such a baroque description of the risk (less than 1/20, 95% of the time) even being used?

As Dr Smith does convey in the interview, the problem is that risk varies. There are two sources of uncertainty if you go swimming in the Hutt River. First, the bacteria count varies over time (with rain, temperature, and whatever else), so you don’t know what it will be at precisely the time you stick your head under. Second, if you end up swallowing some Campylobacter you still only have a chance of getting infected.
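The two layers of uncertainty can be sketched as a small simulation. This is a minimal Python sketch, not the model used in the report: the function name `simulate_infections` and the daily risks are made up for illustration, and it assumes each swimmer on a given day is infected independently with that day's risk.

```python
import random

def simulate_infections(daily_risks, swimmers_per_day, seed=1):
    """Hypothetical two-layer simulation.

    Layer 1: the infection risk differs from day to day (daily_risks).
    Layer 2: on a given day, each swimmer is infected independently
    with that day's risk. Returns the number of infections per day.
    """
    rng = random.Random(seed)
    infections = []
    for p in daily_risks:
        # Each swimmer is a Bernoulli trial with probability p.
        count = sum(1 for _ in range(swimmers_per_day) if rng.random() < p)
        infections.append(count)
    return infections

# Made-up risks: nine clean days and one bad day.
risks = [0.001] * 9 + [0.2]
print(simulate_infections(risks, swimmers_per_day=100))
```

Even with the day's risk known exactly, the count on that day is still random; not knowing the day's risk in advance adds the second layer on top.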

Summarising these two types of uncertainty in a single number is hard. One sensible approach is to pick a threshold risk, such as 1/20. If we want to say that the chance of getting infected is less than 1/20, we need to handle both the variation in shittiness of the water and the basically random risk of infection for a given level of contamination.
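The ‘less than 1/20, 95% of the time’ criterion only counts days. A minimal Python sketch of the check (the function name and the risks are hypothetical) shows that it puts no limit on how bad the remaining 5% of days can be.

```python
def meets_guideline(daily_risks, threshold=1/20, coverage=0.95):
    """Hypothetical check: True if the daily infection risk is below
    `threshold` on at least a `coverage` fraction of days. It says
    nothing about how high the risk is on the other days."""
    good_days = sum(1 for p in daily_risks if p < threshold)
    return good_days / len(daily_risks) >= coverage

# 95 clean days and 5 days on which everyone gets infected still pass.
print(meets_guideline([0.0] * 95 + [1.0] * 5))  # True
```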

Suppose we imagine a slightly implausible extreme sports facility that sends 100 backpackers on one-day swimming parties each day. On 95% of days (347 days per year), they’d expect fewer than 5 to get infected. On 5% of days (18 days per year) they’d expect more than 5 to get infected, but it couldn’t possibly be more than 100. So the total number of infections across the year is less than 5×347 + 100×18 = 3,535 out of 36,500 swims, or about 10% of swimmers. That sounds bad, but it’s an extremely conservative upper bound. In fact, when the risk is less than 5% it’s often much less, and when it’s greater than 5% it’s usually nowhere near 100%. To say more, though, you’d need to know more about how the risk varies over time.
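The upper-bound arithmetic, written out as a short Python sketch (the 100 swimmers a day belong to the imaginary facility, not to any real data):

```python
DAYS_PER_YEAR = 365
SWIMMERS_PER_DAY = 100

good_days = round(0.95 * DAYS_PER_YEAR)   # 347 days with risk below 1/20
bad_days = DAYS_PER_YEAR - good_days      # 18 days with unknown risk

# Worst case: up to 5 infections on each good day, all 100 on each bad day.
max_infections = 5 * good_days + SWIMMERS_PER_DAY * bad_days
total_swimmers = SWIMMERS_PER_DAY * DAYS_PER_YEAR

# 3535 infections out of 36500 swims, just under 10%
print(max_infections, total_swimmers, max_infections / total_swimmers)
```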

There are statistical models for all of this, and since everyone seems to be using the same models we can just stipulate that they’re reasonable. The detailed report is here (PDF), and Jonathan Marshall, a statistician who knows about this sort of thing, has scripts to reproduce some calculations here.

Using those models, a ‘yellow’ river, with risk less than 1/20 95% of the time, actually has risk less than 1/1000 about half the time, but occasionally has risks well over 10%. Our imaginary extreme sports facility will have about 3 infections per 100 customers, averaged over the year. About half these infections will happen on the worst 5% of days.
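Given a list of daily risks, both summaries in that paragraph (average risk over the year, and the share of infections on the worst 5% of days) are easy to compute. This is a sketch with a hypothetical function name and toy risks, not output from the report's models; it assumes the same number of swimmers every day, so expected infections on a day are proportional to that day's risk.

```python
def summarise_year(daily_risks, worst_fraction=0.05):
    """Hypothetical helper: return (average daily risk, share of expected
    infections that fall on the worst `worst_fraction` of days)."""
    ranked = sorted(daily_risks, reverse=True)  # worst days first
    n_worst = max(1, round(worst_fraction * len(ranked)))
    total = sum(ranked)
    if total == 0:
        return 0.0, 0.0
    average = total / len(ranked)
    share_worst = sum(ranked[:n_worst]) / total
    return average, share_worst

# Toy year of 20 days: 19 clean days and one bad one.
avg, share = summarise_year([0.001] * 19 + [0.2])
print(avg, share)  # average ≈ 0.011, share ≈ 0.91
```

With these made-up numbers the single bad day dominates; the report's models give a milder split, with about half the infections on the worst 5% of days.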

So the ‘1/20, 95% of the time’ standard doesn’t by itself guarantee anything better than a 10% infection risk for people swimming on randomly chosen days but, combined with knowledge of the actual bacteria distribution in NZ rivers, it seems to work out at about a 3% risk averaged over all days. Also, if you can detect and avoid the worst few days each year, your risk will be reduced quite a lot.
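To see why avoiding the worst days helps so much, here is a sketch that drops the worst days before averaging. It assumes, optimistically, that the worst days can be identified perfectly in advance; the function name and the risks are hypothetical.

```python
def risk_if_avoiding_worst(daily_risks, days_avoided):
    """Hypothetical helper: average risk for a swimmer who picks a random
    day but skips the `days_avoided` highest-risk days."""
    if not 0 <= days_avoided < len(daily_risks):
        raise ValueError("days_avoided must leave at least one day")
    ranked = sorted(daily_risks)  # lowest risk first
    kept = ranked[:len(ranked) - days_avoided]
    return sum(kept) / len(kept)

risks = [0.001] * 19 + [0.2]
print(risk_if_avoiding_worst(risks, 0))  # ≈ 0.011, all days
print(risk_if_avoiding_worst(risks, 1))  # ≈ 0.001, bad day skipped
```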

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.