Posts written by Thomas Lumley (2640)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

April 21, 2026

Green and full of terrors

If you want to get a health story into the papers it helps if it sounds controversial and it especially helps if it tells people they can eat food they want to eat: thus the frequency of stories about chocolate, wine, and beer.

This week’s version is a not-very-detailed abstract from a conference in the US that purportedly says healthy food gives you lung cancer. And when I say that’s what the study purportedly shows, I mean:

  • MSN: Eating fruits, vegetables and whole grains may increase chance of early onset lung cancer
  • The Independent: Eating more fruits and vegetables could put you at risk for this cancer
  • Newsweek: Fruits and Vegetables May Increase Your Cancer Risk, New Research Shows
  • and even the researchers’ own press release: Eating fruits, vegetables and whole grains may increase chance of early onset lung cancer

One exception is Ars Technica, headlining the story as Absurd study suggests eating fruits and vegetables leads to cancer.

The press release says

“Our research shows that younger non-smokers who eat a higher quantity of healthy foods than the general population are more likely to develop lung cancer,” said Jorge Nieva, MD,

and, no, it really doesn’t show that. For a start, we shouldn’t really be broadcasting health advice to the general public based on just a conference abstract, with so little detail. In this case even the limited detail we have is enough to say that this shouldn’t be a big health story.

The research is part of a project to study lung cancer in younger people who don’t smoke. It used to be that nearly all lung cancer cases were in older people who had been smokers, but one of the victories of global public health has been to reduce the number of cases like this. Clearly, if someone has lung cancer at age 30 it isn’t because they’ve been smoking for fifty years — and in fact, many of them haven’t smoked at all. So, there’s interest in studying what causes their lung cancer. The Epidemiology of Young Lung Cancer study wants to look at genetic attributes and environmental risk factors for lung cancer before age 40.

Finding a control group is hard. You can’t just recruit a whole bunch of young people and see who gets lung cancer, because it’s a very rare disease: you won’t find anyone. You need to look for diagnosed cases, but then you need to decide who to compare them to. In this research the people with lung cancer were compared to people in a big national survey series, NHANES, which asks about diet.

The primary reported finding is that young people diagnosed with lung cancer had healthier diets (according to one measure) than the average of the US population. The researchers don’t say they expected to see this, and my guess is they didn’t. Their theory is that pesticides — in some generic holistic sense — are responsible. It’s obviously not impossible that pesticides could be carcinogenic, but this doesn’t seem like a very good way to find out. In particular, while the people in the study all have lung cancer, they don’t all have similar mutations in their tumours — they don’t have the same sort of lung cancer. And there’s literally zero actual data on pesticides, just an assumption that they’re present in healthier food, so this isn’t picking out some sort of ultra-selective cancer effect.

Everything here is correlations, but better correlational studies with controls consistently find that people with lung cancer eat less fruit, vegetables, and high-fibre food than people without lung cancer (eg here). There’s a theoretical argument that a diet high in anti-oxidants might reduce the body’s ability to destroy cancer, but you wouldn’t look at a small, unusual subset of lung cancers to study this question. There are perfectly good alternative reasons why the young lung cancer patients might have healthier diets than the US average. They’re young, for a start. They have had their cancers diagnosed early enough to end up in a study like this one, which will correlate with income and interest in medical science. They’re non-smokers.

If unreliable evidence of a healthier diet in a subset of people with lung cancer is taken as evidence of harm from pesticides, should we take evidence of a less healthy diet in other groups of people with serious illness as evidence that pesticides are beneficial?

April 16, 2026

Top five wealthiest?

From the NZ$ Herald New Zealand ranks among world’s top five wealthiest countries per capita in rich list report. I don’t think it really makes sense to call this a “rich list” report, but New Zealand does indeed rank “among” the world’s top five wealthiest countries (obviously we’re in fifth place, otherwise we’d be “among” the top four).

As Damien Venuto at Stuff notes, this doesn’t sound right. Is NZ really wealthier than Norway or Denmark or Japan or the UK?

The report being quoted is here; nobody links. There are at least three things going on.

First, the numbers are means when we usually prefer medians for this sort of comparison. The means are much more strongly influenced by the richest people, and also are just larger.  The use of means isn’t some evil capitalist plot by Allianz — it’s just easier to find out the mean, since you get it by taking the total and dividing by the population.  Working out the median per capita financial assets would take some serious survey-based research.  I will note that they aren’t completely clear about how they define the population, but it won’t make much difference to comparisons.
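Since the mean/median distinction is doing real work here, a toy example (all numbers invented) of how one very wealthy household moves the mean a long way while barely touching the median:

```python
import statistics

# Hypothetical financial assets for ten households (thousands of NZ$):
# one very wealthy household drags the mean far above the median.
assets = [40, 55, 60, 70, 80, 90, 110, 130, 160, 5000]

mean = statistics.mean(assets)      # total / population: easy to get from national totals
median = statistics.median(assets)  # needs the whole distribution, i.e. survey data

print("mean:  ", mean)    # 579.5 -- pulled up by the outlier
print("median:", median)  # 85.0
```

In a wealth distribution with a long right tail, the national mean can sit well above what a typical household actually holds, which is exactly why the means are “just larger”.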

The second issue, which is important in the Stuff piece, is that a big chunk of the ‘wealth’ in New Zealand and Australia is over-valued real estate.  Real estate is problematic for wealth because it’s hard to extract the wealth that is nominally generated and use it to pay for stuff.  It’s even harder for large chunks of society to extract their real-estate wealth, since doing so would tend to bring prices back in line with reality.

A third issue, especially when comparing with the USA on one hand and the Scandinavian countries on the other hand, is what expenses need to be covered by that wealth.  In the USA, private assets pay for a larger fraction of healthcare and education than they do in New Zealand, and in turn we pay privately for more of these than they do in Norway.  When the public sector provides less, it will tend to use less  money, leaving private households more money to spend on the services they now have to buy.   The per-capita mean is not the best statistic for tracking this sort of thing: distribution of wealth and income matters.

As a final note, there is a whole chapter on distribution in the report that neither NZ paper mentioned. The chapter isn’t very positive — inequality between countries seems to have stopped decreasing, and it hasn’t improved within countries either.

April 9, 2026

Kōkako goneburger?

The North Island kōkako is one of Aotearoa’s most elegantly beautiful birds, and while rare, they still exist in the wild as well as in sanctuaries. I’ve seen them on Tiritiri, near Auckland. The South Island kōkako is a bit more controversial. It has been regarded as extinct, but is officially classified at the moment as “Don’t Know”.

This week, the Press published a story about the South Island kōkako, based on a publication in a regional ornithology journal, claiming there was a 48% chance that the species is still around.  The story raises two questions: what even does that mean, and is it reasonable?

We know that many of the reported sightings of South Island kōkako must be wrong: if they were not, the bird would be everywhere and there would be no uncertainty about its survival. The question is whether any of the reported sightings are correct. Now, if there are no South Island kōkako then clearly all the sightings are mistaken, no matter how skilled and careful the reporters are — just like all the sightings of Bigfoot. If South Island kōkako are rare, then most of the sightings are mistaken, but some of them are probably correct. Most of the sightings aren’t all that convincing anyway, but some of them do look pretty convincing.

There are various ways to approach this problem statistically. One is to try to pick out some sightings that you are sure are correct and see whether these stop at some point, or become less frequent. Another is to look at sightings during the period we know the bird existed and see what the ratio of convincing to dodgy sightings was like then, and see if it changed.  These (described more elegantly and formalised with maths) are the methods of three papers by Andrew Solow and co-workers (a,b,c — the copyright industry probably won’t let you read them).

One of the key bits of data in this calculation is a 2007 observation of the South Island kōkako that the Ornithological Society of NZ thought was convincing. According to the research paper, the last reliable sightings from the period when  the bird was uncontroversially still around were five over the period 1954-1967.  The 48% kōkako probability in the new report relies very heavily on the bird not being extinct in 2007. Without that one report, the estimated survival probability would be basically zero.  The isolated 2007 sighting, if true, would also provide evidence that real sightings are rare even when the species is still present.
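To see how much is riding on that single 2007 report, here’s a back-of-envelope version of the first method, Solow’s 1993 test, which gives the probability of so long a gap after the last sighting if the species were still being sighted at a constant rate. This is a simplified sketch, not the paper’s actual calculation, and the individual years within 1954-1967 are my invention — only the count of five sightings and the endpoints come from the reporting above.

```python
def solow_pvalue(sighting_times, t_start, t_end):
    """Solow (1993): under a stationary Poisson sighting process, the
    p-value for 'still extant', given no sightings after the last one,
    is (t_n / T)**n, with times measured from the start of observation."""
    n = len(sighting_times)
    t_n = max(sighting_times) - t_start
    T = t_end - t_start
    return (t_n / T) ** n

# Five reliable sightings over 1954-1967 (individual years invented
# for illustration), with the observation window running to 2025.
old = [1954, 1957, 1961, 1964, 1967]

print(solow_pvalue(old, 1954, 2025))           # ~2e-4: looks extinct
print(solow_pvalue(old + [2007], 1954, 2025))  # ~0.17: can't rule out survival
```

Dropping the 2007 sighting sends the p-value for “still extant” to essentially zero, which is the point made above: everything hinges on that one record.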

There’s a problem with the formulation of the extinction models. The original paper describing the first method, the one that gives the 48% probability, says “The methods described in this note assume that, prior to extinction, sightings follow a stationary Poisson process”. In English: we assume that (true) sightings occur independently at a constant underlying rate. They probably don’t. There are a lot more people out there now than in 1967, so the rate is probably not constant. Also, there will be clustering: if someone convincingly reports seeing a South Island kōkako, the birding community will descend on the area with cameras at the ready and the chance of true sightings should go up[1]. And if the population is diminishing slowly (as it would have to be), the true sightings will also diminish slowly. This method also requires that you can tell which sightings are true.

The third method I linked above allows for uncertain sightings, so you don’t have to be able to tell in advance which sightings are true. However, to make the maths tractable, it still models both true and false sightings as being stationary Poisson processes: there’s a constant random rate of true sightings before extinction and a constant random rate of false sightings before and after extinction. Under this model, if the kōkako is extinct then at least 99.75% of the sightings since 1967 are false.

That’s less impressive than it sounds. To start with, obviously if there aren’t any kōkako now and there were no reliable sightings between 1967 and 2007, then nearly all sightings are false. Also, this doesn’t mean that people’s accuracy in distinguishing kōkako from other things is less than 1 in 100. The iNaturalist site records 350,000 observations with photo or sound recording of securely-identified birds that aren’t South Island kōkako over just the time since 2012, and people may have seen birds and not posted about them to iNaturalist. The proportion of times someone sees something and wrongly thinks it’s a South Island kōkako could still be tiny — it’s just large compared to the (possibly zero) number of true sightings.

So, overall the paper says that if there were South Island kōkako in 2007 it’s not unreasonable that there still are a few. Which is fair. If they exist, they’re probably in one remote area rather than all over the South Island.   The 48% probability was correctly presented in the research paper as the output of the statistical method they used, but you shouldn’t put a lot of weight on the precise number. When you don’t have good data to put into the model you aren’t going to get much certainty out of it, and the statistical modelling had to make some pretty big approximations.  In particular, the model is leaning quite hard on the approximation that the search effort (and number of false sightings) has not increased over time.


[1] there’s a type of statistical model called a “self-exciting point process”, whose name is very appropriate here.

April 4, 2026

NZTA much better?

This is an expansion from the “Briefly” post about an NZTA summary of public comment on their SH1 Wellington proposals.

On Bluesky, @gwynebs pointed out that some of the bars indicating levels of support didn’t appear to match the numbers attached to them — the “much better” category seemed inflated.

A couple of days ago I noted there was a pattern to the distortion: it really was only the “much better” bar that was inflated and the other four were compressed in the same proportion. That is, some varying percentage was effectively being added to the “much better” level.  This is true for all five of the specific sections of the proposal,  but is not true for the two overall ratings in the middle of page 2, which appear correct. The bars are also correct in the much more detailed community engagement report; it’s just the summary that is wrong — which should indicate something about where things went wrong.

This is not rounding error. It’s much larger than that.

I went and measured the widths of all the bars in the five charts. These are in the same order as in the report: from top to bottom we have “2nd Terrace tunnel”, “Te Aro”, “Basin Reserve”, “2nd Mt Victoria tunnel”, and “Hataitai and Kilbirnie”. The lower bar for each is cut from the NZTA summary. The upper bar has the correct percentages plus the necessary additional amount to make the bars line up — so the red is the amount that has been added to the “much better” category in the graph compared to the numbers. My bars and their bars don’t line up perfectly; that is probably rounding error. One possible explanation is that the red is some sort of “Don’t know” value that has inadvertently been put into the last bar — I could see that happening if the bars were drawn as pictures rather than as charts.
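A sketch of the implied-padding arithmetic, with invented numbers standing in for my actual measurements: if the other four categories are all compressed by the same factor, that factor (and hence the amount added to “much better”) can be read straight off the ratios of drawn width to reported percentage.

```python
# Reported percentages (invented) for the five response categories,
# and hypothetical measured widths of the drawn bars (% of full bar).
reported = [30, 25, 20, 15, 10]           # "much better" ... "much worse"
measured = [44.0, 20.0, 16.0, 12.0, 8.0]

# Model: a share `a` of the bar is added to "much better" and the other
# categories are compressed in the same proportion: w_i = (1 - a) * p_i.
shrink = [w / p for w, p in zip(measured[1:], reported[1:])]
a = 1 - sum(shrink) / len(shrink)   # average implied compression

print(f"implied padding: {100 * a:.0f}% of the bar")
print(f"'much better' reported as {reported[0]}%, drawn as {measured[0]}%")
```

If the four shrink factors come out nearly equal, as they did in the real charts, that’s the signature of a constant amount being added to one category rather than random drawing error.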

How much should we care about this? On the one hand, this sort of thing is probably corrosive to public trust in government data. On the other hand, this purports to be quantitative analysis of a self-selecting survey of the sort that attracts highly motivated and unrepresentative minorities*, so there’s a real limit to how seriously you should be taking the numbers.

Arguably, the point of this sort of survey is to see if there are surprising results — either something NZTA didn’t know about, or stronger opposition than they expected.  Even so, most people who aren’t the Advertising Standards Authority would think there’s something wrong with graphs that don’t match the data they purport to present.


*eg, people such as me

April 2, 2026

Briefly

  • For the day between March 31 and April 2nd, Andrew Gelman takes on an app that claims to find patterns in lotto numbers and make you money.
  • RNZ reports the plans for tolls on the Road of Northland Significance, a charge of $4.50 each way from Warkworth to Te Hana (you will see some quotes of $14.20, which includes current tolls on the already-existing road to Puhoi). They don’t report what fraction of the cost the tolls will cover. Greater Auckland looked at the NZTA consultation papers about the tolling and say 35 years of tolling will raise $391m. That would be nearly 10% of the (phase 1) cost if you didn’t include interest; it’s a much smaller fraction when you do. And this is phase 1 — there are two more phases in the planned road to Whangārei.  Whether the road is worth the cost isn’t my specialty, but it’s a lot of cost.
  • Len Cook (former Government Statistician) is in the Otago Daily Times disapproving of the planned removal of the census enumerations. We’ve covered this topic before. The changes to the Data and Statistics Act are up for public comment, as are the necessary changes to the Electoral Act. The electoral changes are not intrinsically controversial but are needed because electoral redistricting is currently triggered by the census; they matter because they need a 75% supermajority in Parliament.
  • RNZ reports on an NZTA report on public consultation about road changes in Wellington. First, the usual whinge: please link to this sort of report, so we can read it if your summary gets us interested!  Second, and the StatsChat motivation, the NZTA report displays pretty graphics of the public feedback, which are systematically wrong! For example, on the question “will a second Terrace Tunnel make things worse or better for you?” the lower bar is from the report and the upper bar is correct based on the percentages. The right end of the bar is “better”, and is exaggerated.

    Or the next question, about Te Aro improvements (original above, correct version below). Again, the “better” end is exaggerated.

    I don’t think this is likely to be deliberate, but it’s a bad look.
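On the toll arithmetic in the second item above: the point that the fraction is “much smaller when you do” include interest is easy to make concrete. A sketch assuming an even revenue stream and a 5% discount rate (both assumptions are mine, not from the consultation papers):

```python
def present_value(annual_amount, years, rate):
    """Discounted value today of a constant annual cash flow."""
    return sum(annual_amount / (1 + rate) ** t for t in range(1, years + 1))

total = 391_000_000        # 35 years of tolling, per Greater Auckland
annual = total / 35        # assume an even stream, for simplicity

pv = present_value(annual, 35, 0.05)   # 5% discount rate: my assumption
print(f"nominal total: ${total / 1e6:.0f}m")
print(f"present value: ${pv / 1e6:.0f}m")   # roughly half the nominal total
```

So in today’s money the tolls cover only around half of the nominal $391m, which in turn was only around a tenth of the phase 1 cost.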

Oily rag

The Ministry of Transport have put up a fuel monitoring dashboard. It shows estimates of demand, supply, and price.

At the moment, the reduction in demand is less than 10%, a level of demand that’s probably not sustainable in the medium term when global supply is down at least 25%. On the other hand, we are still at level 1 of the alert system, and even level 2 doesn’t ask for any real reductions in demand.

What this display doesn’t show is any sort of “time to running out”.  That’s probably sensible, because it’s not even well-defined, let alone predictable. If you define “running out” as some petrol stations being out of supplies then it’s already happened. If you define it as “no fuel in the country”, it probably won’t happen. And if you define it as level 3 or level 4 restrictions on supply then it’s a choice by the government based on unknown criteria, and so is hard to forecast statistically.


March 31, 2026

Dangers of opt-in surveys

There have been two stories just recently in the Guardian about the dangers of opt-in surveys.  A survey from the respectable polling organisation YouGov reported a big increase in (Christian) church attendance among young people.  This was a bit of a  surprise, and didn’t seem to match up with other polling data (or with attendance counts by denominations that count attendance), but it was YouGov and it was what some people wanted to hear.

Apparently the problem was opt-in respondents.  This isn’t the completely useless opt-in clicky polls that our news sites put up from time to time; YouGov is a serious polling organisation.  However, I think it’s fair to say YouGov has tried to get accurate poll results by focusing more on statistical modelling of who responds and less on trying to get a good sample.  Again, that’s a perfectly reasonable strategy and has historically been competitive. You can’t get real random samples of people any more — not like in the 1950s — and so you get samples that are representative in some qualitative sense and reweight them to match the groups you’re trying to study.
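A minimal sketch of the reweighting idea (post-stratification), with invented numbers: each respondent group gets weight population share ÷ sample share, so an over-sampled group counts for proportionately less.

```python
# Invented numbers: the sample over-represents older respondents.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
sample_share     = {"18-34": 0.10, "35-54": 0.30, "55+": 0.60}
attends_church   = {"18-34": 0.16, "35-54": 0.10, "55+": 0.12}  # sample rates

# The raw estimate uses the (distorted) sample mix; the reweighted one
# gives each group weight population_share / sample_share, which is the
# same as averaging the group rates with population weights.
raw      = sum(sample_share[g] * attends_church[g] for g in sample_share)
weighted = sum(population_share[g] * attends_church[g] for g in sample_share)

print(f"raw sample estimate: {raw:.3f}")
print(f"reweighted estimate: {weighted:.3f}")
```

The catch, which is roughly what went wrong here: reweighting can fix the demographic mix, but it can’t fix self-selection *within* a demographic cell — if the young people who opt in are unusually religious, no amount of age-weighting helps.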

You might think it’s strange that people would try to get into survey samples. It is strange, and that’s exactly the problem. Only a small fraction of people will try to get into surveys for the money, so those people are very unrepresentative, and while they are only a small fraction of the population that’s still a lot of people.   In the future, there’s the potential for LLM-based fake people to take surveys for the money (or just to be inconvenient), and they will be still worse.

When you start with a reasonably well-controlled sample and some people opt out, you have a subset of a reasonably well-controlled sample. It looks as though allowing too much self-selection can be qualitatively worse (though this is a one-off so far, and only provides limited evidence).

I also want to note that CNN reports a response from the Bible Society to the withdrawal of the survey report

The Bible Society said in a statement it was “deeply disappointed” by what had happened, but insisted the “wider picture” from other surveys pointed to “an increased engagement in faith among young adults compared to older generations.”

This isn’t a good reaction: the reason we found out the report was inaccurate was precisely that other evidence didn’t point the same way.

March 26, 2026

As and when it looks supportive

Via Russell Brown on Bluesky, the Herald has a report on the increases in people being charged with cannabis possession. Charges fell by about 1/3 from 2017 to 2021, in parallel with increasing evidence that arrests for possession didn’t really have social license, but then started rising and now are back at nearly 2017 levels.

So what do the police say? Well, the Herald reports

Director of the National Organised Crime Group Detective Superintendent Greg Williams says wastewater testing in the Auckland and Northland region shows cannabis consumption spiking in July 2024.

“If you look at that charging data, it actually perfectly almost reflects what looks like a significant increase in cannabis consumption.”

We can look at the charging data, and the Herald does. We can’t look at the wastewater cannabis data, though.  On the same day in the Herald there was a story on the newest results from wastewater drug analyses. The story reported estimates of meth, MDMA, and cocaine use. As expected, there’s a lot more meth than anything else, but there’s a potentially worrying increase in cocaine (it’s not so much that cocaine is worse than meth, but it’s a new supply chain).  There was no comment in the story on cannabis use.  There were related stories at One News and RNZ and Newstalk ZB.

If you go to the NZ Police webpage on wastewater drug testing you see

The drugs tested for include methamphetamine, MDMA, cocaine, fentanyl, and heroin. These reports focus on methamphetamine, MDMA and cocaine as these drugs are routinely detected by the programme.

At PHF Science (former ESR) you can find plenty of pages talking about their efforts in testing for meth, MDMA and cocaine, such as this one on the 2024 spike in meth, or this research paper with the mind-numbing details of how they do the testing, or this drug harm page where they say

To date, wastewater testing has been used to measure consumption of illicit drugs including methamphetamine, MDMA, cocaine, heroin and fentanyl. 

Neither the police nor PHF Science publish cannabis-use estimates from wastewater.  The reason they don’t publish the estimates is they aren’t very good.  According to a research report from PHF Science,

However, certain characteristics of cannabis – such as it being lipophilic, not dissolving well in water and its tendency to stick to surfaces such as wastewater pipes – have made analysis in wastewater more difficult. Additionally, due to the considerable chemical differences between cannabis and the other illicit substances being monitored it cannot be added to the same analysis workflow. At this stage there is still too much uncertainty for cannabis measurements to be reliably quantifiable. However, the monitoring data can still be used in trend analyses

They do measure cannabis at five sites around the country, and as the research report says, the data could still be used in trend analyses. But popping up with a claim about two regions from undisclosed data about one time period isn’t a credible trend analysis.

What other data are there?

I don’t find the NZ Drug Trends Survey all that convincing on a detailed level, but its questions asking people who admit to using illegal drugs about which drugs they use should also be ok for trend analyses, and their cocaine reports show a similar trend to the wastewater data. They see a decrease and then increase in daily or weekly cannabis use over the time period we’re talking about, but to a much smaller extent: 68% of respondents at the peak, then down to 57%, then up to 70% for the most recent data. That’s about a 15% relative decrease and a corresponding increase in regular cannabis use among regular drug users. Also, a big spike in population cannabis use would increase the number of regular drug users, and show up as a decrease in the proportion regularly using other drugs, which we don’t see.

The NZ Health Survey asks about drug use. The Drug Foundation has collected their data (along with other data sources) and it doesn’t show a pattern anything like the police charging data (click to embiggen, as always)

So, I’m not convinced by the bare assertion that wastewater data show the police are just picking up the same fraction of a varying drug-user population. If the police want to use trends in the cannabis wastewater data to influence public policy they should publish the complete data series, with all the attached caveats from the scientists behind the testing (who I do trust).

How cats vote

Joao Barbosa posted these two maps of Paris on Bluesky: votes in the mayoral election, and cat ownership. If you’re one of the dozens of people on the internet who aren’t American, the political colour scheme is the way you expect.

You can probably come up with explanations for the left-wing lean of the cats if you’re a cat person. And even more so if you’re a dog person.

Another useful map is this one from a report on the risk of gentrification in Paris caused by the 2024 Olympics. It’s a map of income: light colours are high income, dark colours are low income.

There seems to be a general rule that all choropleth maps of a given place reduce to one of a very small number of basic patterns. There’s an XKCD comic about this for the USA, and Kieran Healy has also written about the two basic US maps.

March 20, 2026

Cars vs public transport

Yesterday I noted RNZ had just quoted an Auckland Transport claim about the cost of driving that was implausible on the face of it, and didn’t seem to have done any checking or provided any explanation.  Today, the same story is in the Herald, with the same  lack of explanation.

Here’s the RNZ quote; the Herald one is almost identical

Auckland Transport said before the Iran conflict began late last month, the cost of public transport was roughly the same as the cost of driving a vehicle with single occupancy in Auckland.

It’s now costing people nearly double to drive their own cars.

“The cost of petrol has risen at least 50 cents per litre since then, with a 15-kilometre single person commute now costing roughly 80 cents per kilometre, which is equal to about $12 for the total trip.”

So where does this come from? In comments to yesterday’s post, David Welch pointed me to the IRD page on the cost of driving.  The “Tier 1” cost in 2024-5 was $1.17/km for petrol cars. That’s higher than 80c/km, and it’s also not the right comparison — it’s an average cost per km. That is, it includes a per-km share of the fixed costs of having a car. Auckland Transport are (or should be) talking about just the extra per-km cost of using the car to commute. Taking a bus won’t make your car loan go away.

The IRD view on the cost of running a car is their “Tier 2” number, which is only 37c/km. That, interestingly, is close to half the 80c/km that Auckland Transport is claiming. Since they say that this is double what last year’s cost was, their estimate of last year’s cost is close to the IRD Tier 2 value and might come from the same method.

I found an estimate of national average petrol prices as $2.66/L  from December last year.  That would be an increase of 44c/L compared to the one Auckland station I checked yesterday.  If, instead, we take Auckland Transport’s “at least 50c/L”, the increase in running costs would be 5c/km for a vehicle that gets 10 km/L, and less than that for a more typical single-commuter vehicle, so again we can’t get the AT figure.
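The back-of-envelope version of that last calculation is short enough to write down (the 10 km/L fuel economy is my round-number assumption, not an AT figure):

```python
price_increase = 0.50   # $/L, Auckland Transport's "at least 50 cents"
fuel_economy   = 10     # km per litre: my round-number assumption
commute_km     = 15     # the single-person commute in the AT quote

extra_per_km   = price_increase / fuel_economy   # $/km
extra_per_trip = extra_per_km * commute_km       # $ for the whole commute

print(f"extra running cost: ${extra_per_km:.2f}/km")
print(f"extra per 15 km commute: ${extra_per_trip:.2f}")
```

A 50c/L rise moves running costs by about 5c/km, nowhere near the roughly 40c/km jump implied by going from “about the same as public transport” to 80c/km.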

Even without trying to work out and replicate their calculations, however, we can say one simple thing.  Petrol prices have not yet nearly doubled, so they can’t have caused driving costs, however defined, to have nearly doubled.

On the other hand, the conclusion that people should consider switching to public transport is true: we want to save the potentially scarce supply of oil for people and industries who don’t have any alternatives.


Update: Greater Auckland have also reprinted the claim from AT, again without comment.