Posts written by Thomas Lumley (2609)

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

November 16, 2025

50-year mortgages and avocado toast

Sometimes society or government faces a problem where not enough money is being spent on something.  Kiwis on average aren’t allocating enough money to retirement, or councils aren’t investing enough money in water infrastructure. In that scenario, you want more money spent.  KiwiSaver was supposed to get people to save more; making people save more was an early focus of the “nudge” industry. No-one seems to really know how to make councils plan for water infrastructure, but it would be nice if we could.

The housing industry is not like that. The problem with housing prices is not that we are spending too little on houses. We (collectively) are spending too much on houses. That’s why avocado toast is not a housing issue* — if abolishing avocado toast would increase total expenditure on housing, we’d be worse off, not better off.

This week, in the US, 50-year fixed mortgages have been proposed.  A 50-year mortgage would increase the amount you could spend on a house for a given level of savings and income (at the cost of dramatically reducing its value as an investment). If the problem with housing were insufficient money being spent, this might help, but that’s not the problem.  A change in financing that lets people bid more for houses doesn’t help.  Like abolishing avocado toast, extending mortgage terms is trying to solve the wrong problem.  Unlike abolishing avocado toast, it might have a real effect on the market.
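For concreteness, here is the standard fixed-rate amortisation arithmetic, as a sketch with made-up numbers (the 6% rate and $3,000/month budget are assumptions for illustration, not figures from any actual proposal):

```python
# Standard fixed-rate amortisation: payment = P*r / (1 - (1+r)^-n)
# All the specific numbers here are invented for illustration.
def monthly_payment(principal, annual_rate, years):
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of payments
    return principal * r / (1 - (1 + r) ** -n)

def affordable_principal(payment, annual_rate, years):
    # Invert the formula: the loan size a given monthly payment can service
    r = annual_rate / 12
    n = years * 12
    return payment * (1 - (1 + r) ** -n) / r

rate = 0.06       # assumed 6% fixed rate
budget = 3000.0   # assumed monthly payment the buyer can afford

p30 = affordable_principal(budget, rate, 30)
p50 = affordable_principal(budget, rate, 50)
print(f"30-year loan supported: ${p30:,.0f}")
print(f"50-year loan supported: ${p50:,.0f}")
print(f"extra bidding power: {p50 / p30 - 1:.0%}")
# Total interest paid over the life of each loan
print(f"interest, 30y: ${budget * 360 - p30:,.0f}")
print(f"interest, 50y: ${budget * 600 - p50:,.0f}")
```

With these assumed numbers, the 50-year term lets the same monthly payment service about 14% more debt, while total interest paid over the life of the loan more than doubles: more bidding power, much worse investment.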

* and because the basic arithmetic doesn’t make sense, but that’s a different post

October 20, 2025

Briefly

  • It’s World Statistics Day (which only happens every five years). Well, because of time zones, it’s not actually World Statistics Day for another half-hour as I write this.
  • The US Secretary of Health has claimed teenage boys have sperm counts half those of 65-year-old men. Angela Rasmussen looks at this claim. If you think about it, as she points out, there’s no plausible way there could be good worldwide evidence on sperm counts in teenagers — how would you get those data?!
  • Backblaze, who sell cloud storage, have periodically reported on the long-term survival of hard drives (and released data, too) — these are the old-fashioned “spinning rust” hard drives, not solid-state drives.  Their new report says that drives are getting better, and that they don’t see the “bathtub” risk curve of folklore, where the newest and oldest drives are most likely to fail.
  • Consumer Reports has published on heavy metal content in protein powders. They say “more than two-thirds of them contain more lead in a single serving than our experts say is safe to have in a day”.  One issue here is that lead can be measured very sensitively with modern technology, and is notoriously said to have no safe level, so in a sense all food has more lead than is safe.  Consumer Reports does acknowledge this; their threshold, which one could perhaps describe as ‘not really unsafe’, is 0.5 micrograms per day.  I think it’s useful to have some historical context. In the 1980s, the “provisional tolerable weekly intake” was 25 ug/kg/week, or about 250 ug/day for a 70 kg adult. For infants, even breast milk added up to 0.5 ug/kg/day, well above the modern limit, and formula was much higher. So, yes, we know more about lead now and we’re right to be more scared of it, but there are a lot of people in the world who have been exposed to way more lead than these protein supplements would give you.
  • This map from USA Today is misleadingly labelled, as often happens.  It’s what I call a “caricature map”. It doesn’t show each state’s most ordered Halloween candy. It doesn’t even show each state’s most ordered Halloween candy from this specific online retailer. It shows, for each state, which candy is most over-ordered relative to the rest of the country.  Like a caricature, where you find the distinctive features of a person’s face and exaggerate them, the map finds what’s different about candy purchases in each state and promotes that to the state norm.  These maps aren’t bad — the most common candy/side dish/toy/whatever in each state is often a fairly boring map — but they would be better with an accurate description (this was from USA Today on Bluesky — the story on their website does a bit better).
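The candy-map calculation in the last bullet can be sketched in a few lines. The counts below are invented; the point is only the gap between “most ordered” and “most over-ordered relative to the rest of the country” (a ratio sometimes called a location quotient):

```python
# Toy "caricature map" computation: for each state, find the candy whose share
# of that state's orders most exceeds its national share. Counts are made up.
orders = {
    "Texas":   {"candy corn": 50, "chocolate": 400, "gummies": 100},
    "Ohio":    {"candy corn": 90, "chocolate": 350, "gummies": 80},
    "Vermont": {"candy corn": 10, "chocolate": 60,  "gummies": 40},
}

# National totals across all states
national = {}
for counts in orders.values():
    for candy, n in counts.items():
        national[candy] = national.get(candy, 0) + n
grand_total = sum(national.values())

result = {}
for state, counts in orders.items():
    state_total = sum(counts.values())
    # "location quotient": state share of a candy divided by its national share
    lq = {c: (n / state_total) / (national[c] / grand_total)
          for c, n in counts.items()}
    most_common = max(counts, key=counts.get)   # what an honest map would show
    most_over = max(lq, key=lq.get)             # what the caricature map shows
    result[state] = (most_common, most_over)
    print(f"{state}: most ordered = {most_common}, "
          f"most over-ordered = {most_over}")
```

In this toy data, chocolate is the most ordered candy in every state (a boring map), but the caricature map would label Ohio with candy corn and Vermont with gummies.
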

September 28, 2025

Briefly

  • From the Guardian: Exclusive: Study gives 85.7% probability Badminton House version of The Lute Player is by 17th-century master. As I said about a previous rating from the same company, there’s no way this probability is meaningful to three significant digits (except potentially to the computer). The company’s head, Dr Carina Popovici, told the Guardian: “Everything over 80% is very high”, which is, um, reasonable.  Importantly, we’re not told any of the “compared to what” information. Is this 85.7% taking into account that the painting was previously described as a fake and doesn’t have good provenance, or is it 85.7% if the painting had been selected from a training set of half real paintings and half fakes? Or what?
  • From The Xylom via Flowing Data, a map of H-1B visa holders at US universities, including what fraction of the research budget it would take to keep hiring at the same rate under the new rules.  I’m not sure the research budget is the right comparison — yes, a lot of H-1B holders are postdoctoral researchers, but I was in a regular academic job when I had an H-1B.
  • Voting has just closed in Bird of the Year, the only online bogus clicky poll endorsed by StatsChat.  Bird of the Year takes a lot more care than most online bogus polls to clamp down on virtual ballot-box stuffing, and its results are more trustworthy than the typical online clicky poll.  You should definitely be more confident that it has identified the truly most popular bird in Aotearoa than that the average unweighted opt-in survey is telling you the truth.

September 23, 2025

Panadol scare

R.F. Kennedy Jr managed to predict almost perfectly the day on which his research initiative would “find”  “the” “cause” of autism.  Of course, it’s easier when you don’t have to actually do any new research.

What do we actually know about paracetamol and autism or ADHD?

About a decade ago, there was a surprise finding of a fairly weak but not negligible correlation between paracetamol use during pregnancy and ADHD symptoms in the infant.  A New Zealand study repeated this analysis and found the same answer, at which point it became a bit more interesting.  There have been other replications since then.  The correlation is reasonably well established. The problem is deciding what we can say about causation.

Clearly no-one has done a randomised trial where some pregnant people take paracetamol and others don’t, because that would be unethical and also no-one would volunteer to be in the trial.  In the absence of randomisation, the question is how comparable the paracetamol and non-paracetamol infants would be otherwise. ADHD and autism diagnosis varies in frequency by all sorts of social factors, and there’s good evidence for a genetic basis in at least some cases of autism, so comparability is not automatic.  Also, one thing we do know about all the pregnant people who took paracetamol is that they had a reason to take paracetamol (probably pain or fever).  In contrast to alcohol or tobacco,  no-one’s taking paracetamol just for fun.

So, at that point, things were all a bit unclear. On the one hand, maybe you’d want to avoid paracetamol during pregnancy if you didn’t need it; on the other hand, you probably already were avoiding it.

Last year, a very large study in Sweden reported its results. They also found a weak correlation between paracetamol use and ADHD and autism symptoms in the whole population. However, they went further than this.  They did a study restricted to comparisons between siblings.  Oversimplifying massively, you could imagine taking all the families with two children where paracetamol was used in pregnancy for just one child and not the other. You could then count up the number of families where the paracetamol-exposed infant had ADHD or autism and not the unexposed child, and vice versa.  The point is that any other factor that differs between families will be the same for the two kids in the comparison and so can’t cause a correlation. This could be a genetic factor, or some ethnic or social class difference, or access to health care, or many other things.  (My description was oversimplified in the sense that they didn’t just use families with two kids, but also those with more than two, and they adjusted for variables that they know about and are different within a family.)

Importantly, this isn’t just a case of preferring a newer study or a bigger study.  The fact that the Swedish study saw broadly the same whole-population correlations as other research argues that there isn’t something different about Sweden or about their data collection. The fact that they didn’t see the same correlation in within-family comparisons argues that the correlation is caused by something that varies between families, not something about individual pregnancies such as paracetamol use.
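A tiny simulation illustrates the logic of the sibling comparison. All the numbers are invented, and there is deliberately no causal effect of exposure on the outcome; a family-level factor drives both:

```python
# Simulated confounding by a family-level factor, with NO causal effect of
# exposure on outcome. All the probabilities are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n_families = 200_000

u = rng.random(n_families) < 0.5        # family-level factor (e.g. genetics)
p_exp = np.where(u, 0.6, 0.2)           # the factor raises chance of exposure...
p_out = np.where(u, 0.15, 0.05)         # ...and, separately, of the outcome

# Two children per family; exposure and outcome drawn independently given u
e1 = rng.random(n_families) < p_exp
e2 = rng.random(n_families) < p_exp
y1 = rng.random(n_families) < p_out
y2 = rng.random(n_families) < p_out

# Whole-population comparison, pooling all children
exposed = np.concatenate([e1, e2])
outcome = np.concatenate([y1, y2])
p_y_exp = outcome[exposed].mean()
p_y_unexp = outcome[~exposed].mean()
or_pop = (p_y_exp / (1 - p_y_exp)) / (p_y_unexp / (1 - p_y_unexp))

# Within-family comparison: families where exactly one child is exposed
disc = e1 != e2
y_exp_child = np.where(e1, y1, y2)[disc]    # outcome of the exposed child
y_unexp_child = np.where(e1, y2, y1)[disc]  # outcome of the unexposed child
# McNemar-style ratio of discordant outcomes
n10 = np.sum(y_exp_child & ~y_unexp_child)  # only the exposed child affected
n01 = np.sum(~y_exp_child & y_unexp_child)  # only the unexposed child affected
print(f"population odds ratio: {or_pop:.2f}")    # well above 1
print(f"within-family ratio:   {n10 / n01:.2f}") # near 1
```

The pooled comparison shows a solid association (an odds ratio around 1.6 in this setup) even though the exposure does nothing, while the discordant-sibling ratio sits near 1, because the family-level factor is identical for the two children being compared.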

Estimating rare proportions

There is a statistic circulating on social media claiming that the average person in the USA thinks 21% of the population is transgender.  Obviously this isn’t true (both obviously it isn’t 21% and obviously that isn’t what the average person believes). It’s similar in some ways to the claim that some Americans think Iran is in the middle of the Atlantic Ocean, which I’ve dealt with before, except that estimating small proportions is an extensively studied problem in psychology, so a lot is known about the biases. In fact, if you look at the original source for the claim, demonstrating this phenomenon was the actual point of the story.

As Danielle Navarro explains, all small proportions are overestimated and all large proportions underestimated when people aren’t certain of the true value. This is an extremely consistent phenomenon, to the extent that we can actually say Americans are better informed about the proportion of transgender people than they are about other comparably extreme proportions.
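One simple way to capture this pattern, as a sketch rather than the specific model Navarro describes, is shrinkage toward a midpoint on the log-odds scale (the gamma and anchor values below are made up for illustration):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def logistic(x):
    return 1 / (1 + math.exp(-x))

def perceived(p, gamma=0.3, anchor=0.5):
    """Shrink the true proportion toward an anchor on the log-odds scale.
    gamma < 1 compresses the scale: small proportions come back inflated,
    large ones deflated. gamma and anchor are invented illustration values."""
    return logistic(gamma * logit(p) + (1 - gamma) * logit(anchor))

for p in [0.01, 0.05, 0.5, 0.9]:
    print(f"true {p:4.0%} -> perceived about {perceived(p):4.0%}")
```

With these invented parameters a true 1% comes back as roughly 20%, the ballpark of the figure above, while a true 90% is reported as only about 66%: small proportions up, large proportions down.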

[Update: Andrew Gelman writes about a slightly different but related phenomenon, in the context of people reporting having been present at mass shootings.  It’s slightly different because people are reporting their own experience, which they presumably do know, rather than estimating a proportion they have no way of knowing. We’d expect the bias to be smaller in this setting, but still present — it’s like the estimate of the frequency of virgin birth from the National Longitudinal Study of Youth.]

August 8, 2025

Success rates

Complicated interventions benefit from pilot studies, where you try to implement the intervention and see how feasible it is.  These are not designed as evaluations of how good the intervention is; they’re typically too small for that, and they may have insufficient attention paid to representativeness.  You typically would still look at the outcome of the intervention, and you would have some idea of what you hoped to see.  As Dan Davies says, if you don’t make predictions, you won’t know what to be surprised by (and if you don’t make recommendations, you won’t know what to be disappointed by).

In the new young-offenders bootcamp program, there has been a pilot with ten participants.  According to the news, 7 out of 10 have reoffended so far. Since one out of ten died, it would be generous to summarise the proportion with bad outcomes as 8 out of 10.

Speaking to RNZ, acting senior manager in charge Iain Chapman said at the time the pilot began, the 10 participants were the “most serious and persistent young offenders in the country”.

Going into the pilot and expecting no reoffending would have been naive, he said.

This is absolutely true.  What he didn’t say — and should have — was how much reoffending was reasonable to expect. Did he expect better results than two out of ten? Maybe he didn’t. Perhaps one out of ten is what he expected and getting two out of ten is an amazing success. That wasn’t the impression that the government and the media were giving when the program was announced, though. In particular, getting two out of ten not to reoffend doesn’t stack up well against the death.

If the ten pilot participants had been a representative sample of the sort of people who would go into the program, we could do some statistics.   However, we can’t really do this because the pilot program is so small and we don’t know how the participants were chosen. They presumably weren’t chosen specifically because they were unlikely to benefit, but we can’t say much more.
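What we can do is ask how surprising 2 successes out of 10 would be under various hypothetical expected success rates (none of which, to be clear, are stated anywhere public):

```python
from math import comb

def prob_at_most(k, n, p):
    """Exact probability of at most k successes in n independent trials,
    each with success probability p (binomial lower tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 10  # pilot participants
k = 2   # observed "successes" (did not reoffend)
for p in [0.2, 0.3, 0.5]:   # hypothetical expected success rates
    print(f"if the true success rate were {p:.0%}: "
          f"P(2 or fewer successes) = {prob_at_most(k, n, p):.3f}")
```

Two or fewer successes would be fairly surprising (about a 5% chance) if half the participants were expected to stay out of trouble, but unremarkable if the expected rate were 20–30%, which is exactly why it matters what rate was expected.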

I would have expected that somewhere on a server in Wellington there is a business case for this program that has someone’s best guess at the likely success rate. It would be good to know if that person is surprised, or disappointed.

BLS accuracy

From economist Justin Wolfers on Bluesky, the record of payroll employment revisions by the US Bureau of Labor Statistics.

First: look at 2020!

Next, though, the purple and green lines are quite close together compared to the scale of year-to-year change even when there isn’t a pandemic.

On the other hand, people do actually care about differences of the size we see between the initial and revised estimates, as is demonstrated by the stock market reaction to the revisions.  What this really shows is how difficult the estimation problem is.  People care deeply about changes that are almost invisibly small on this graph, and that are right at the limit of what’s feasible statistically.
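A back-of-envelope calculation suggests the scale of the problem. Treat the survey, very crudely, as a simple random sample of workers; every number below is invented, and the real establishment survey design is far more sophisticated:

```python
import math

# Crude back-of-envelope: pretend the survey is a simple random sample of
# workers. All numbers are invented; the real survey design is more complex.
population = 160_000_000   # rough scale of total US payroll employment
sample = 600_000           # assumed sample size
p = 0.5                    # assumed proportion with the status being estimated

se_prop = math.sqrt(p * (1 - p) / sample)   # SE of an estimated proportion
se_count = population * se_prop             # scaled up to a count of jobs
print(f"SE of the estimated count: about {se_count:,.0f} jobs")
```

Even with an enormous assumed sample, the standard error of the estimated count is on the order of 100,000 jobs, the same order as the monthly changes and revisions that move markets.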

The ideal solution is probably for people to be more relaxed about small changes in estimated payroll employment, just as the ideal solution for political opinion polling discourse is for people to have a more realistic view of the limits of estimation. Alternatively, if people want to be unrelaxed about small differences, they need to be willing to pay more to get better estimates.

August 5, 2025

Official statistics

As you may have heard, President Trump has dismissed the head of the US Bureau of Labor Statistics, claiming that payroll employment figures presented by the BLS were faked to make him look bad.

Politicians meddling with official statistics is a bad idea.  This isn’t because official statistics are Pure and Holy and True and above mere political concerns; it’s because official statistics are messy and difficult and hard to get right, but also very valuable.   The benefit-cost ratio of good official statistics is very high; for the NZ Census the ratio was estimated some years back as 10.  National and local governments, non-profits, and businesses use official statistics to make decisions and the stock market responds to the numbers.  On the other hand, the benefit-cost ratio of bad official statistics is very low — if no-one believes the numbers, there’s not a lot to be gained by publishing them.  Since estimation is messy and difficult and hard to get right, trust in official statistics agencies is critical for trust in official statistics.

Agencies don’t always do it perfectly.  It can definitely be necessary to have some sort of independent review at times.  I was on the External Data Quality Panel looking at the 2018 NZ Census, and there has just been an independent review of the UK Office for National Statistics.  The goal is to make sure the agencies have good procedures, evaluated carefully, to produce the best feasible answers.   Political interference, on the other hand, is discouraged by national and international principles for official statistics.  It’s hard to get rid of once you have it, and very hard to prove you’ve gotten rid of it — like black mould.

The American Economic Association put out a statement on Friday saying that getting rid of the BLS head this way was  a bad idea.  They don’t do this sort of statement very often.  The International Statistical Institute put out a statement today — they do this more often, but it still takes a fairly significant event to get them moving.  The American Statistical Association haven’t said anything yet, but they were all travelling to their annual conference over the weekend, so it might come soon.

August 3, 2025

Where are they now: asthma

From 2015 in the Herald and in StatsChat

Asthma could be cured within five years after scientists discovered what causes the condition and how to switch it off

As I noted at the time: nope, and nope.

Following up, the drugs in question, called calcilytics, continue to not be used to treat asthma. Hope is not entirely lost — a 2022 research paper says

these data firmly suggest that first-in-human studies will be feasible, desirable and achievable in the short term.

So it might still be true that this research eventually leads to useful treatments, but it certainly didn’t happen five years ago.

July 23, 2025

Most commonly reported

From XKCD, the most commonly reported plant and animal on iNaturalist for each US state (click to embiggen)

This is mostly about selection biases of various types: how recognisable and how interesting the plant or animal is.  Saguaro, in Arizona, are not exactly rare, but they aren’t the most commonly seen plant. They are famously associated with the southwest desert and immediately recognisable, so they get reported often.  In other states, the most common plant is genuinely common: yarrow in Montana, California poppy in California, Amur honeysuckle (sadly) in Kentucky and Indiana.

So what is it in New Zealand? Looking at the “research-grade” reports, because it’s easier, the most common animal seems to be the kererū, with the pīwakawaka second. Easy to recognise, popular, interesting. For plants, it’s the māhoe, which I wasn’t expecting.