Posts written by Thomas Lumley (2609)

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

June 21, 2025

Census roundup

Not necessarily endorsed by me, but many of these people do know what they are talking about.

I do also want to emphasize that no expert thinks this is a proposal to stop collecting data for the government. Administrative data already marks when you are born or die, when you enter or leave New Zealand, when you pay taxes or go to school or get health care. This information is more reliably and rapidly collected administratively than in the Census. What we risk losing is not that, but other things.

Reeling them in

Q:  One News says fishing can improve your mental health!

A: That sounds fairly plausible, actually. Did they say how they know?

Q: “research from the UK”

A: A bit non-specific, innit?

Q:

A: I think it’s this paper. The number matches (“Almost 17% less likely”), it’s from the UK, and there doesn’t seem to be a better match.

Q: And people who fished more had less mental illness?

A: People who fished more often had less history of depression, suicidal thoughts, and self-harm. People who fished longer had more suicidal thoughts.

Q: How often did people have to fish to be in the “17% less likely” group?

A: It’s not clearly described.  The model in the paper actually has 17% more likely, so maybe it’s a model for “not mental health problem”.  If the 17% is for a one-step difference in the survey question then it’s a surprisingly large effect of a very small difference: 5-6 times a week is a different category from 3-4 times; once every two weeks is different from once per month.

Q: Could the anglers just be healthier anyway, or richer or something? Did they collect that information?

A: They did collect it, but they didn’t use it in the analysis, at least in this paper.

Q: How did they recruit the people?

A: “an online survey  that was advertised through the Instagram, Facebook, and Twitter accounts of Angling Direct and Tackling Minds. Angling Direct also sent the survey link to their mailing list, and the link was distributed via the Anglia Ruskin University Twitter account, as well as the authors’ own networks.”

Q: That … sounds like it might not be perfectly representative

A: 98% of the respondents were men, for example. And 40% were in the top 20% of household income nationally.

Q: Would I be right in guessing that Angling Direct is some sort of fishing magazine?

A: It’s actually a chain of fishing supply stores in the UK. It claims to be the UK’s leading fishing-tackle retailer.

Q: Ok, and Tackling Minds is maybe some sort of fishing education thing?

A: It’s a charity that uses fishing as a mental health intervention.

Q: Couldn’t that have some impact on the correlations between fishing and mental health in the sample?

A: Indeed it could.

June 19, 2025

Compared to what?

Via Bluesky from Instagram, and attributed to Chris Hipkins

When StatsNZ produced the data shown here, it was purely descriptive: number go sideways, number go down, number go up. The use on @nzlabour’s Instagram and with a Chris Hipkins electoral authorisation obviously intends a comparison, even without the annotations. A simple comparison to the past (butter is more expensive now) is true, but it’s not what’s implied. We can tell it isn’t, because it would have no political implications and so wouldn’t be worth marketing.

The implied comparison here is to a scenario where the price of butter stays constant (or keeps decreasing?) in 2024. The comparison is clearly bogus (which is why the graph is such an effective way to present it). You might approve or disapprove of NZ butter prices following global trends, and of the NZ supermarket duopoly having substantial pricing power, but these are ongoing issues and neither one is the fault of the current government. A Labour government that committed to not increasing taxes isn’t going to introduce price caps or government subsidies for butter!

The graph has the opposite problem to a lot of Covid comparisons.  Here, the problem is comparing to a hypothetical world that is unrealistically different. For Covid, it’s comparing to a hypothetical world that’s unrealistically similar: talking as if we could have skipped lockdowns and just had a normal economy, when the real alternative is lots of illness and death and a much worse economic problem.  The usefulness of counterfactual comparisons relies on making realistic choices about what would have been the same or different.

June 18, 2025

Tatau tātou, eh?

According to the Herald, the government has decided to stop doing the Census after the next (2028) round and switch to yearly administrative data from 2030.  The press release is here, and StatsNZ’s page is here.  There’s no commitment so far to get the necessary legislative changes passed before the election, but that may come.

This was inevitable at some point.  Door-to-door enumeration is getting less effective and administrative data are getting more complete: eventually the two lines will cross. There are quite a few countries that have more detailed and thorough government data collection than us and don’t bother with censuses. They get on fine. I’m not sure we’re there yet, but maybe we will be in 2030?

At the crude level of “how many people are there and roughly where do they live and what work do they do?”, administrative data is great.  The use of administrative data in the 2018 and 2023 Censuses improved the counts of people by region, and especially improved the counts for Māori.  There are some important weaknesses, though.

First, the ‘administrative’ data used to augment the 2018 and 2023 Censuses included past Census data, not just routinely-collected government data. In 2018, the first-priority source for additional data was the 2013 Census, and it was often important. For example, when creating the “Māori descent for electoral purposes” variable, StatsNZ found 15% of the “Yes” values and 7.7% of the “No” values in 2013 Census data. [Table 4.2, Initial report of Census data quality panel]. If we stop doing Censuses, the existing Census data will rapidly become less useful.

Second, administrative data is much less effective for household statistics than for individual statistics.  Most routine government data collection is about individuals.  If Chris reported a particular Auckland address in March 2025 and Pat reported that address in December 2024 and Sandy reported it in July 2024 and Alex in June 2024, how do you work out which subset of these folks were ever living there together? And that’s before you get to situations like if you’ve just started flatting but your doctor has your Mum’s address and your boss has your Dad’s address.   In 2018, household data were a big weakness of the Census — nearly 8% of the census population didn’t have an assigned household. StatsNZ did a lot of work on this subsequently, but it’s hard.

Third, there are data that just aren’t collected routinely. Iwi affiliation, disabilities, and housing quality variables were examples from 2018. If these variables are wanted, they will have to be collected in other surveys, and there’s no clear reason to expect the other surveys to be more accurate than the Census. In particular, they may have worse non-response rates for Māori and for minority groups.

There’s also a potential social license issue.  People understand the Census and have some idea of what it’s for, and mostly approve.  The IDI is much less well understood, and I think is less popular. Replacing the Census with surveys and vacuuming up of data collected for other purposes could well have a negative effect on public willingness to give up their data and public trust in the results.

Good sources if you want to read about this include the StatsNZ page, whatever Len Cook writes, and also the reports of the 2018 Census Data Quality Panel (there’s a 2023 report, but it’s much smaller and mostly talks about minor improvements in methods).

June 7, 2025

Auckland is larger than Wellington

The Herald is reporting on the Government’s new road-cone hotline. Apparently there were 236 reports in the first four days, and the article usefully points out that this was about 100 for each of the first two days, so it dropped pretty fast after that.

A bit less usefully, there’s a graph of reports by miscellaneous administrative category. Auckland (region), as usual, is at the top.

It’s not completely clear what the right scaling is, but raw reports aren’t it. The traditional Kiwi favourite of per capita shows Wellington (city) way ahead of Auckland (region), with more than half as many reports from about an eighth as many people.  NZTA is there to represent state highways, and it doesn’t really have a population in the same sense as the other areas.
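As a rough sketch of the per-capita arithmetic: the Herald's actual counts aren't reproduced here, so the figures below are placeholders, chosen only to match "half as many reports from an eighth as many people". The function name `rate_ratio` is just for illustration.

```python
def rate_ratio(reports_a, pop_a, reports_b, pop_b):
    """Per-capita report rate of area A divided by that of area B."""
    return (reports_a / pop_a) / (reports_b / pop_b)

# Placeholder figures, NOT the Herald's actual counts: Wellington with
# half as many reports as Auckland and one eighth of the population.
auckland_reports, auckland_pop = 40, 1_600_000
wellington_reports, wellington_pop = 20, 200_000

ratio = rate_ratio(wellington_reports, wellington_pop,
                   auckland_reports, auckland_pop)
print(f"Wellington per-capita rate is {ratio:.1f} times Auckland's")
```

Half the reports from an eighth of the people works out to four times the per-capita rate, and "more than half" pushes it higher still.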

We might also scale by the amount of road to have cones on. This map, from Figure NZ, isn’t quite what we want because it uses consistent geographical units (regions), but it does show how state highway and local roads compare. It looks like NZTA is coning above its weight per km of road.

FigureNZ also has a graph by territorial authority,  a better match to the Herald’s graph, showing Auckland has well over twice the sealed road of any other authority (if you aren’t a Kiwi: this is ordered north to south).  Wellington has much less road, so it is attracting comparatively the most reports.

So, the data do show a “hotspot for cone concerns”, but it’s Wellington, not Auckland.  This could mean more cones, or streets with less spare room, or fewer alternate routes, or people who just whinge more. The data do not suffice to tell us.

June 4, 2025

Briefly

  • Ask Stuff’s BudgetBot anything you need to know about Budget 2025. Or, perhaps, don’t.  The budget is an occasion where there’s actually quite a lot of expert analysis both in mainstream media and social media, as well as all the unreliable vibes-based commentary anyone could want
  • I’m not saying that AI is useless.  Mike Caulfield has some very impressive demonstrations of forcing Claude to help with fact checking.
  • Various sources report that smartphones cause haemorrhoids.  This is based on unpublished research presented at a conference, of a sort that can’t show more than a correlation, and that just barely provides evidence of a correlation (this is the closest we have to details).  The problem is that the study has no ability to tell whether using a smartphone on the toilet increases the risk of haemorrhoids, or having haemorrhoids increases the chance that you’ll use a smartphone on the toilet, or that something else affects both things.  Also, it’s about time for this story to reappear, since we had it in 2023 and 2020 and 2018, though those times it was at least presented as expert opinion rather than research.
  • RadioNZ have launched a political poll with Reid Research (hat tip to @danxduran on Bluesky).  Notably, the article describes not just the maximum margin of error, for proportions near 50%, but margins of error for smaller proportions such as 10% or 20% (not for 5%, unfortunately). These are uncertainty estimates for an idealised mathematical model of polling, and underestimate the true uncertainty a bit, but they are a big step forward.  I’ve written about the uncertainty for smaller proportions on StatsChat before.
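
To see why the margin of error shrinks for smaller proportions, here is a minimal sketch of the usual simple-random-sample formula, which is presumably the sort of idealised model meant in the polling item above. The sample size of 1000 is an assumption for illustration, not the poll's actual size, and the 1.96 multiplier is the standard normal approximation for 95% intervals.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # assumed sample size, for illustration only
for p in (0.5, 0.2, 0.1, 0.05):
    print(f"p = {p:.0%}: +/- {margin_of_error(p, n):.1%}")
```

The margin is widest at 50% and narrows as the proportion moves towards zero, though it becomes larger relative to the proportion itself.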

June 3, 2025

Cancer and exercise

There’s a new study of cancer and exercise that’s just been reported at a cancer conference in the USA and published in a major scientific journal, and which has made it to the media.  It’s good news, and actually real good news.

The study finds that exercise actually improves survival in colon cancer.  More precisely, it finds that providing an exercise coach improves survival compared to just providing the usual “exercise good; junk food bad” information.  Obviously this wasn’t a double-blind trial — you can’t make people exercise without them knowing — but it measured objective health outcomes. In particular, the medical profession can measure death very reliably.

There have been lots of papers in the past showing that exercise is correlated with better health in people with colon cancer. The correlation is robustly unsurprising: the less well you are, the less you are able to exercise, and there was no way to be confident anything more than this was going on.  This study was different, because people were randomly assigned to higher or lower pressure to exercise.  It’s pretty unusual to have a study that actually changes people’s level of exercise over a long period, and even more unusual to show that it actually improves their health. We don’t know if the effect translates to other cancers — previous studies have had hypotheses about mechanisms that are specific to colorectal cancer and others that aren’t.

Since this is StatsChat, I do want to compare what the research paper and the news said about the size of the effect. Here’s the graph from the research paper

At the planned 8-year follow-up point, the difference in survival was 7 percentage points. Basically the same was true at the planned 5-year point for survival cancer-free.  The overall survival difference narrowed a bit if you took the data out to ten years, and the cancer-free survival widened a bit.   You could also quote the average ratio of the death rates (or cancer recurrence rates) in the two groups, which is common in the statistical analysis of cancer but is a bit harder to translate into real-world impact (and which gives much bigger numbers: a 28% or 36% reduction).
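
To see how a relative reduction of about a third can sit alongside an absolute difference of only a few percentage points, here is a small sketch. It assumes proportional hazards, under which treated survival equals control survival raised to the power of the rate ratio. The control-group survival of 83% is a made-up illustrative value, not a number taken from the paper.

```python
def treated_survival(control_survival, rate_ratio):
    """Survival in the treated group under proportional hazards:
    S_treated = S_control ** rate_ratio."""
    return control_survival ** rate_ratio

control = 0.83     # hypothetical 8-year survival with usual care
rate_ratio = 0.64  # a 36% reduction in the death rate

treated = treated_survival(control, rate_ratio)
difference = treated - control
print(f"treated survival {treated:.1%}, "
      f"absolute difference {100 * difference:.1f} percentage points")
```

When most people survive anyway, even a large relative reduction in the death rate can only move survival by a handful of percentage points.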

The Guardian just reported the relative rates. The BBC reported both, very clearly.  Ars Technica reported both, but didn’t link the absolute and relative numbers as clearly as the BBC.

The Guardian also made a lot of the “better than a drug” comment by the chief medical officer of the American Society of Clinical Oncology:

“It’s the same magnitude of benefit of many drugs that get approved for this kind of magnitude of benefit – 28% decreased risk of occurrence, 37% decreased risk of death. Drugs get approved for less than that, and they’re expensive and they’re toxic.”

I think it’s worth noting that this is not saying exercise is better than chemotherapy. It’s saying exercise plus chemotherapy is better than chemotherapy alone, and the margin is large enough that if it were new drug+chemo vs chemo alone you’d easily get approval.

May 28, 2025

Typical wedding

From RNZ this week

The Wedding Planner director Susannah Reid said in 2023 $58,000 was a typical budget for a wedding across New Zealand but this year, the average cost was more like $87,600.

As we’ve seen in the past, this use of ‘typical’ by people in the wedding industry doesn’t have a lot to do with how the word is normally used.  In that post, in 2014, the typical value from the marital-industrial complex was $30,000.  In 2025 dollars that’s about $40,000, so apparently the real cost of “typical” weddings has more than doubled over the time StatsChat has been running.
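
The arithmetic behind "more than doubled" can be checked directly from the figures quoted above. This is only a sketch: the 2025-dollar equivalent is the rough $40,000 given in the post, not a fresh CPI calculation.

```python
typical_2014 = 30_000          # "typical" cost quoted in the 2014 post
typical_2014_in_2025 = 40_000  # the post's rough 2025-dollar equivalent
typical_2025 = 87_600          # figure quoted by RNZ for this year

# Inflation implied by the post's rough conversion
implied_inflation = typical_2014_in_2025 / typical_2014
# Growth of the "typical" cost after removing inflation
real_increase = typical_2025 / typical_2014_in_2025

print(f"implied price inflation: {implied_inflation - 1:.0%}")
print(f"real increase: {real_increase:.2f} times")
```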

Radio NZ also have this week: How New Zealand couples saved on their wedding (and what they splurged on). The most expensive wedding here is less than a quarter of the “typical” budget; the others are less than 10%.

If you think about it, estimating the actual cost of a typical wedding is quite hard.  Technically it’s not that difficult: the government keeps track of marriages, so you could do a survey of a sample of marriages and ask people, but the Births, Deaths, and Marriages people will only let you browse arbitrary marriages from the distant past. Contemporary marriages are public records, but public access to them is by name rather than just by year.   If you don’t have a genuine sample, you’ll tend to notice big weddings more than small ones.
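
As a rough illustration of how noticing big weddings more than small ones inflates a "typical" figure, the sketch below compares a plain average with a size-biased one. The costs are invented, and the assumption that a wedding's chance of being noticed is exactly proportional to its cost is only a stand-in for "bigger weddings are more visible".

```python
def mean(values):
    """Ordinary average: every wedding counts equally."""
    return sum(values) / len(values)

def size_biased_mean(values):
    """Average when each item's chance of being noticed is
    proportional to its size, so weights are the values themselves."""
    return sum(v * v for v in values) / sum(values)

costs = [2_000, 5_000, 8_000, 15_000, 90_000]  # made-up wedding costs

print(f"genuine sample average: ${mean(costs):,.0f}")
print(f"size-biased average:    ${size_biased_mean(costs):,.0f}")
```

With these invented numbers, one very large wedding pulls the size-biased average far above what most couples in the list actually spent.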

May 24, 2025

Redrawing a graph

Newsroom has an article about tertiary funding in the new budget.  There’s a graph showing how the government fee cap and inflation have competed over the past few years. The graph in the article (ignoring the impact of a fee-free year) is

I find this a bit hard to interpret — it’s not easy to see the cumulative effect of the two changes and work out whether fees have shot ahead or lagged behind.  I think this is clearer, showing the cumulative effect of inflation and the (maximum) fee increase; the highlight is the two Covid border-closure years.

We can see that fees had increased in real terms but had now fallen behind inflation. Based (obviously) on a projection for 2026, fees might be slightly ahead again.

Another option is to use the inflation series to deflate the fee cap and just show the real-terms fee changes. Again, fees were up in real terms, then down, and are right at the 2014 level. Again, this ignores the impact of a fee-free year, which will depend on when one started uni.

What I think this shows is that adding up yearly changes (especially multiplicative ones) in your head is hard, and it’s probably harder than the opposite task of estimating changes in slopes.  If you need both scales, you might be better off with a cumulative graph.   The big disadvantage of a cumulative graph is that the visual impression can be quite sensitive to when you start adding.
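
Since adding up multiplicative changes in your head is hard, here is a minimal sketch of the cumulative and real-terms calculations described above. The yearly percentages are invented for illustration and are not the Newsroom or StatsNZ figures, and `cumulative_index` is a hypothetical helper name.

```python
from itertools import accumulate
from operator import mul

def cumulative_index(yearly_pct_changes, base=100.0):
    """Turn yearly percentage changes into a cumulative index
    that starts at `base` in the year before the first change."""
    factors = [1 + pct / 100 for pct in yearly_pct_changes]
    return list(accumulate(factors, mul, initial=base))

# Invented yearly changes (percent), NOT the real series.
fee_cap_changes = [2, 2, 2, 2, 6]
inflation_changes = [1, 2, 4, 7, 3]

fees = cumulative_index(fee_cap_changes)
prices = cumulative_index(inflation_changes)

# Deflate the fee index by the price index to get real-terms fees.
real_fees = [100 * f / p for f, p in zip(fees, prices)]

for year, (f, p, r) in enumerate(zip(fees, prices, real_fees)):
    print(f"year {year}: fees {f:.1f}, prices {p:.1f}, real fees {r:.1f}")
```

Changing the starting year changes where both lines are pinned to 100, which is the sensitivity to the starting point mentioned above.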

May 20, 2025

How many coffees is a house?

RadioNZ have How many years would you have to skip coffee to save enough to buy a house?

The story correctly says (a) a lot, even if house prices didn’t go up, and (b) that’s not really the question.

There’s one point they miss, which is important to a lot of the narrative on house prices and saving (perhaps because this is a personal finance article, not a housing prices article).

The reason for housing unaffordability isn’t that Kiwis aren’t spending enough on housing.  You could imagine a world where housing was readily available and affordable, but people in general, or some group of people, couldn’t buy houses because they were spending all their money on beer and holidays and not saving anything for a deposit. In a world like that, advising people to save might be useful (if they listened).

That’s not the problem in New Zealand. Kiwis, collectively, are spending far too much on housing. If one person gave up coffee or avocado toast to save faster it might help them a little bit. If we collectively gave up coffee and avocado toast to save faster, housing prices would just increase faster to compensate.