Posts written by Thomas Lumley (2612)

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

January 9, 2026

Baby names

The top baby names from 2025 are out, together with historical data to look at trends. Sadly, the historical data for boys’ names and for girls’ names come as three-page PDF tables with very small print, not in a conveniently computer-readable or human-readable format. We can still see some interesting trends.

The top names last year were Noah for boys (244 times) and Isla for girls (179 times). There has always been more variability in girls’ names: there are always more boys with the most popular name than girls with the most popular name.

The total number of births in NZ has been broadly stable since the 1950s but the number of babies with the most popular name has steadily been decreasing, implying increasing name diversity. In 1954 there were 1389 Johns; in 1979 there were 707 Michaels; in 2004 there were 504 Joshuas. For girls, the numbers were 779 Christines in 1954, 578 Sarahs in 1979, and 352 Emmas in 2004.
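
To make the decline concrete, here is a small sketch that expresses each year's top-name count relative to 1954, using only the counts quoted in this post. Because total births are described only as "broadly stable", these ratios are a rough stand-in for the share of babies given the top name, not an exact measure.

```python
# Counts of the most popular name in each year, as quoted in the post.
top_boys = {1954: 1389, 1979: 707, 2004: 504, 2025: 244}
top_girls = {1954: 779, 1979: 578, 2004: 352, 2025: 179}

def relative_to_first(counts):
    """Return each year's count as a fraction of the earliest year's count."""
    base = counts[min(counts)]
    return {year: round(n / base, 2) for year, n in sorted(counts.items())}

boys_rel = relative_to_first(top_boys)
girls_rel = relative_to_first(top_girls)
for year in boys_rel:
    print(year, boys_rel[year], girls_rel[year])
```

On these figures the top boys' name in 2025 was given to fewer than a fifth as many babies as in 1954, and the top girls' name to fewer than a quarter as many.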

There aren’t any names that appear in the top 100 for both boys and girls. There are a few names that, over history, have been in both lists, but I haven’t found any that were in both lists in the same year. The closest was Kim, a top-100 name for boys in 1961 and 1962 and for girls in quite a few years starting from 1968.

January 8, 2026

Pie chart issues

This was on a real-estate agent’s advertising leaflet at a local café.

If you aren’t from around here, those are neighbourhoods in south central Auckland.

Statisticians often complain about pie charts because it’s hard to make numerical comparisons between the categories, especially compared to a bar chart.

The poor visual comparison might actually be a virtue in this case if the point is just that these neighbourhoods are similar.  In any case, there’s a deeper problem: pie charts are fundamentally about the relationship between portions and a total — slices and the whole pie.  In this example there is no meaningful total that the separate medians are components of.  There isn’t a pie for these to be slices of.

January 6, 2026

Vibe graphs

From Nicola Rennie on Bluesky, a bad graph found on LinkedIn:

and a correct version of the same graph

The bad version is probably from generative AI — as Nicola says, it is bad in ways that would take substantial effort to achieve in commonly-used software, ranging from the weird bar alignment to the incorrect lengths to the incoherent choice of colours to the Slovenioid flag to the spelling of Belgıun.  It’s also a bit vague about the data source, but that’s easy to achieve by hand.

The corrected version is a lot better, but brings out that this is actually hard to interpret. What’s a “foreign” language?  If you’re Welsh or Irish, can English count? Can Spanish count for the Basques? Less politically, if you’re Czech and you speak Slovak, is that a foreign language? Is it still a foreign language if you learned it before 1992?  If you grew up in Ghent, speaking Flemish and French, and learned English at school then you know one foreign language, but if you move to London do you suddenly know two foreign languages?

You might say “language that is not an official language of where you live”, which is less ambiguous but does require identifying all the official languages of where you live. These are typically well-known within any country or region (though there are people who profess to be confused about whether English is an official language of New Zealand), but they can be hard to determine by database search.

Kieran Healy, of Duke, gave an excellent talk last year about “Trustworthy Data Visualisation”: having graphs you can trust is not just about reproducibility in a simple sense, but about the systems that allow you to trust what you see: The important thing is not to lose sight of the collective, cooperative character of the whole enterprise.

Cross-national comparisons require that someone in each country has collected data, that the data answer the question you are interested in, that the biases and edge cases are either unimportant or the same across the countries, that the data have been accumulated, and that someone has drawn a graph.  In the past, all these steps were done by accountable people or organisations who were (or perhaps weren’t) honestly trying to provide good information. All these steps became more accessible over the past few decades, but we may be about to lose it all again.

You might well have good and sufficient reasons to trust your vibe graphics for your purposes.  It’s hard to see how other people can have good and sufficient reasons to trust them, though.

November 16, 2025

50-year mortgages and avocado toast

Sometimes society or government faces a problem where not enough money is being spent on something. Kiwis on average aren’t allocating enough money to retirement, or councils aren’t investing enough money in water infrastructure. In that scenario, you want to spend more money. Kiwisaver was supposed to get people to save more. Making people save more was the initial effort of the “nudge” industry. No-one seems to really know how to make councils plan for water infrastructure, but it would be nice if we could.

The housing industry is not like that. The problem with housing prices is not that we are spending too little on houses. We (collectively) are spending too much on houses. That’s why avocado toast is not a housing issue* — if abolishing avocado toast would increase total expenditure on housing, we’d be worse off, not better off.

This week, in the US, 50-year fixed mortgages have been proposed.  A 50-year mortgage would increase the amount you could spend on a house for a given level of savings and income (at the cost of dramatically reducing its value as an investment). If the problem with housing were insufficient money being spent this might help, but that’s not the problem.  A change in financing that lets people bid more for houses doesn’t help.  Like abolishing avocado toast, extending mortgage terms is trying to solve the wrong problem.  Unlike abolishing avocado toast, it might have a real effect on the market.
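
The arithmetic behind "increase the amount you could spend" is the standard annuity formula: for a fixed monthly payment, a longer term supports a larger loan. Here is a rough sketch, where the 6% rate and the 3000-a-month payment are made-up illustrative numbers, not figures from any actual proposal.

```python
def max_loan(monthly_payment, annual_rate, years):
    """Largest fixed-rate loan that a given monthly payment can repay.

    Standard annuity formula: P = pmt * (1 - (1 + r)**-n) / r,
    with r the monthly rate and n the number of monthly payments.
    """
    r = annual_rate / 12
    n = years * 12
    return monthly_payment * (1 - (1 + r) ** -n) / r

def total_interest(monthly_payment, annual_rate, years):
    """Total paid over the life of the loan minus the amount borrowed."""
    paid = monthly_payment * years * 12
    return paid - max_loan(monthly_payment, annual_rate, years)

# Hypothetical inputs, chosen only for illustration.
payment, rate = 3000, 0.06
for years in (30, 50):
    print(years,
          round(max_loan(payment, rate, years)),
          round(total_interest(payment, rate, years)))
```

On these made-up numbers the 50-year term supports roughly 14% more borrowing while more than doubling the total interest paid, which is the sense in which it lets people bid more without making the house a better investment.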

* and because the basic arithmetic doesn’t make sense, but that’s a different post

October 20, 2025

Briefly

  • It’s World Statistics Day (which only happens every five years). Well, because time zones it’s not actually World Statistics Day for another half-hour as I write this.
  • The US Secretary of Health has claimed teenage boys have sperm counts half of those in 65-year-old men. Angela Rasmussen looks at this claim. If you think about it, as she points out, there’s no plausible way there could be good worldwide evidence on sperm counts in teenagers: how would you get those data?!
  • Backblaze, who sell cloud storage, have periodically reported on the long-term survival of hard drives (and released data, too) — these are the old-fashioned “spinning rust” hard drives, not solid-state drives.  Their new report says that drives are getting better, and that they don’t see the “bathtub” risk curve of folklore, where the newest and oldest drives are most likely to fail.
  • Consumer Reports has published on heavy metal content in protein powders. They say “more than two-thirds of them contain more lead in a single serving than our experts say is safe to have in a day”.  One issue here is that lead can be measured sensitively with modern technology, and is notoriously said to have no safe level, so in a sense all food will have more lead than is safe.  Consumer Reports does acknowledge this; their threshold, which one could perhaps describe as ‘not really unsafe’, is 0.5 micrograms per day.  I think it’s useful to have some historical context. In the 1980s, the “provisional tolerable weekly intake” was 25 ug/kg/week, or about 250 ug/day for a 70kg adult. For infants, even breast milk added up to 0.5ug/kg/day, well above the modern limit, and formula was much higher. So, yes, we know more about lead now and we’re right to be more scared of it, but there are a lot of people in the world who have been exposed to way more lead than these protein supplements would give you.
  • This map from USA Today is misleadingly labelled, as often happens.  It’s what I call a “caricature map”. It doesn’t show each state’s most ordered Halloween candy. It doesn’t even show each state’s most ordered Halloween candy from this specific online retailer. It shows, for each state, which candy is most over-ordered relative to the rest of the country.  Like a caricature, where you find the distinctive features of a person’s face and exaggerate them, the map finds what’s different about candy purchases in each state and promotes that to the state norm.  These maps aren’t bad — the most common candy/side dish/toy/whatever in each state is often a fairly boring map — but they would be better with an accurate description (this was from USAToday on Bluesky — the story on their website does a bit better).
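
The difference between "most ordered" and "most over-ordered" in the caricature-map item above is easy to see in a toy example. The states, candies, and counts below are invented for illustration, and the over-ordering measure used here (state share divided by national share, with the state itself included in the national total) is one plausible choice, not necessarily what the retailer computed.

```python
# Invented order counts: state -> candy -> number of orders.
orders = {
    "State A": {"chocolate bar": 900, "candy corn": 60, "sour worms": 40},
    "State B": {"chocolate bar": 800, "candy corn": 20, "sour worms": 180},
    "State C": {"chocolate bar": 700, "candy corn": 250, "sour worms": 50},
}

def most_ordered(state_counts):
    """Candy with the largest raw count in one state."""
    return max(state_counts, key=state_counts.get)

def most_over_ordered(state, orders):
    """Candy whose share in this state is highest relative to its national share."""
    national = {}
    for counts in orders.values():
        for candy, n in counts.items():
            national[candy] = national.get(candy, 0) + n
    national_total = sum(national.values())
    state_total = sum(orders[state].values())

    def lift(candy):
        state_share = orders[state][candy] / state_total
        national_share = national[candy] / national_total
        return state_share / national_share

    return max(orders[state], key=lift)

for state in orders:
    print(state,
          "| most ordered:", most_ordered(orders[state]),
          "| most over-ordered:", most_over_ordered(state, orders))
```

In this toy data every state orders chocolate bars most often, which would make a boring map, while the over-ordered version gives State B sour worms and State C candy corn.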
September 28, 2025

Briefly

  • From the Guardian: Exclusive: Study gives 85.7% probability Badminton House version of The Lute Player is by 17th-century master. As I said about a previous rating from the same company, there’s no way this probability is meaningful to three significant digits (except potentially to the computer). The company’s head, Dr Carina Popovici, told the Guardian: “Everything over 80% is very high.” which is, um, reasonable.  Importantly, we’re not told any of the “compared to what” information. Is this 85.7% considering that it was previously described as fake and doesn’t have good provenance, or is it 85.7% if the painting was selected from a training set of half real and half fakes?  Or what?
  • From The Xylom via Flowing Data, a map of H1B visa holders at US universities, including what fraction of the research budget it would take to keep hiring at the same rate under the new rules.  I’m not sure the research budget is the right comparison: yes, a lot of H1B holders are postdoctoral researchers, but I was in a regular academic job when I had an H1B.
  • Voting has just closed in Bird of the Year, the only online bogus clicky poll endorsed by StatsChat.  Bird of the Year takes a lot more care than most online bogus polls to clamp down on virtual ballot-box stuffing. Its results are more trustworthy than the typical online clicky poll.  You should definitely be more confident that it’s identified the truly most popular bird in Aotearoa than you are that the average unweighted opt-in survey is telling you the truth.
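
On the "compared to what" question about the 85.7% figure: if that number came from a classifier calibrated on a half-real, half-fake set, then the probability for this particular painting depends on the prior odds you bring to it. The sketch below applies the usual Bayes odds adjustment. The 50% training prior and the alternative priors are assumptions for illustration; we are not told how the company's number was actually produced.

```python
def adjust_for_prior(p_model, training_prior, new_prior):
    """Re-express a calibrated probability under a different prior.

    Converts to odds, divides out the training prior odds to recover the
    likelihood ratio, then multiplies by the new prior odds.
    """
    def odds(p):
        return p / (1 - p)

    likelihood_ratio = odds(p_model) / odds(training_prior)
    posterior_odds = likelihood_ratio * odds(new_prior)
    return posterior_odds / (1 + posterior_odds)

# 0.857 is the reported figure; everything else here is hypothetical.
for prior in (0.5, 0.2, 0.05):
    print(prior, round(adjust_for_prior(0.857, 0.5, prior), 3))
```

Under these assumptions the same reported 85.7% would correspond to roughly 60% for someone who started at 20%, and roughly 24% for someone who started at 5%.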
September 23, 2025

Panadol scare

R.F. Kennedy Jr managed to predict almost perfectly the day on which his research initiative would “find”  “the” “cause” of autism.  Of course, it’s easier when you don’t have to actually do any new research.

What do we actually know about paracetamol and autism or ADHD?

About a decade ago, there was a surprise finding of a fairly weak but not negligible correlation between paracetamol use during pregnancy and ADHD symptoms in the infant.  A New Zealand study repeated this analysis and found the same answer, at which point it became a bit more interesting.  There have been other replications since then.  The correlation is reasonably well established. The problem is deciding what we can say about causation.

Clearly no-one has done a randomised trial where some pregnant people take paracetamol and others don’t, because that would be unethical and also no-one would volunteer to be in the trial.  In the absence of randomisation, the question is how comparable the paracetamol and non-paracetamol infants would be otherwise. ADHD and autism diagnosis varies in frequency by all sorts of social factors, and there’s good evidence for a genetic basis in at least some cases of autism, so comparability is not automatic.  Also, one thing we do know about all the pregnant people who took paracetamol is that they had a reason to take paracetamol (probably pain or fever).  In contrast to alcohol or tobacco,  no-one’s taking paracetamol just for fun.

So, at that point things were all a bit unclear. On the one hand, maybe you would want to avoid paracetamol during pregnancy if you didn’t need it; on the other hand, you probably already were.

Last year, a very large study in Sweden reported its results. They also found a weak correlation between paracetamol use and ADHD and autism symptoms in the whole population. However, they went further than this.  They did a study restricted to comparisons between siblings.  Oversimplifying massively, you could imagine taking all the families with two children where paracetamol was used in pregnancy for just one child and not the other. You could then count up the number of families where the paracetamol-exposed infant had ADHD or autism and not the unexposed child, and vice versa.  The point is that any other factor that differs between families will be the same for the two kids in the comparison and so can’t cause a correlation. This could be a genetic factor, or some ethnic or social class difference, or access to health care, or many other things.  (My description was oversimplified in the sense that they didn’t just use families with two kids, but also those with more than two, and they adjusted for variables that they knew about and that differ within a family.)

Importantly, this isn’t just a case of preferring a newer study or a bigger study.  The fact that the Swedish study saw broadly the same whole-population correlations as other research studies argues that there isn’t something different about Sweden or about their data collection. The fact that they didn’t see the same correlation when doing within-family comparisons argues that the correlation is caused by something that varies between families, not something about individual pregnancies such as paracetamol use.
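
The logic of the sibling comparison can be sketched with a small simulation of the oversimplified two-child version described above. This is not the Swedish study's method or data: every probability is invented, and the exposure is given no causal effect at all. A shared family-level factor raises both the chance of exposure and the chance of the outcome.

```python
import random

def simulate(n_families=200_000, seed=1):
    """Two-child families where a shared family factor, not the exposure,
    drives the outcome. All probabilities are invented for illustration."""
    rng = random.Random(seed)
    exposed_cases = exposed_n = unexposed_cases = unexposed_n = 0
    exposed_only = unexposed_only = 0  # discordant sibling pairs
    for _ in range(n_families):
        high_risk = rng.random() < 0.3           # family-level factor
        p_exposure = 0.6 if high_risk else 0.3   # raises exposure...
        p_outcome = 0.10 if high_risk else 0.03  # ...and outcome, separately
        kids = [(rng.random() < p_exposure, rng.random() < p_outcome)
                for _ in range(2)]
        for exposed, outcome in kids:
            if exposed:
                exposed_n += 1
                exposed_cases += outcome
            else:
                unexposed_n += 1
                unexposed_cases += outcome
        (e1, o1), (e2, o2) = kids
        if e1 != e2 and o1 != o2:
            # Exactly one sibling exposed and exactly one with the outcome.
            if (e1 and o1) or (e2 and o2):
                exposed_only += 1
            else:
                unexposed_only += 1
    return {
        "risk_exposed": exposed_cases / exposed_n,
        "risk_unexposed": unexposed_cases / unexposed_n,
        "pairs_exposed_child_affected": exposed_only,
        "pairs_unexposed_child_affected": unexposed_only,
    }

result = simulate()
print(result)
```

In the whole simulated population the exposed children have a clearly higher risk, but among discordant sibling pairs the affected child is about equally likely to be the exposed or the unexposed one. That is the pattern you expect when the correlation comes from something that varies between families.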

Estimating rare proportions

There is a statistic circulating on social media claiming that the average person in the USA thinks 21% of the population is transgender.  Obviously this isn’t true (both obviously it isn’t 21% and obviously that isn’t what the average person believes). It’s similar in some ways to the claim that some Americans think Iran is in the middle of the Atlantic Ocean, which I’ve dealt with before, except that estimating small proportions is an extensively studied problem in psychology, so a lot is known about the biases. In fact, if you look at the original source for the claim, demonstrating this phenomenon was the actual point of the story.

As Danielle Navarro explains, all small proportions are overestimated and all large proportions underestimated when people aren’t certain of the true value. This is an extremely consistent phenomenon, to the extent that we can actually say Americans are better informed about the proportion of transgender people than they are about other comparably extreme proportions.
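
One simple way to produce this pattern is to suppose that an uncertain person blends the true value with a vague "could be anything" guess on the log-odds scale. The blending model, the weight of 0.5, and the 50% prior guess below are all illustrative assumptions, not necessarily the model in the explanation cited above.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: map log-odds back to a proportion."""
    return 1 / (1 + math.exp(-x))

def uncertain_estimate(true_p, weight=0.5, prior_p=0.5):
    """Blend the true value with a vague prior guess on the log-odds scale.

    weight=1 means the person knows the true value exactly;
    weight=0 means they just report the prior guess.
    """
    return inv_logit(weight * logit(true_p) + (1 - weight) * logit(prior_p))

for true_p in (0.01, 0.05, 0.5, 0.95, 0.99):
    print(true_p, round(uncertain_estimate(true_p), 3))
```

With a weight of 0.5, a true 1% comes out near 9% and a true 99% near 91%, so small proportions are overestimated and large ones underestimated without anyone holding a strange belief about any particular group.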

[Update: Andrew Gelman writes about a slightly different, but related phenomenon, in the context of people reporting having been present for mass shootings.  It’s slightly different because people are reporting their own experience, which they presumptively do know, rather than their estimates of some proportion they have no way of knowing. We’d expect the bias to be smaller in this setting, but to still be present — it’s like the estimate of the frequency of virgin birth from the National Longitudinal Study of Youth]

August 8, 2025

Success rates

Complicated interventions benefit from pilot studies, where you try to implement the intervention and see how feasible it is.  These are not designed as evaluations of how good the intervention is; they’re typically too small for that and they may have insufficient attention paid to representativeness.  You typically still would look at the outcome of the intervention, and you would have some idea of what you hoped to see.  As Dan Davies says, if you don’t make predictions, you won’t know what to be surprised by (and if you don’t make recommendations, you won’t know what to be disappointed by).

In the new young-offenders bootcamp program, there has been a pilot with ten participants.  According to the news, 7 out of 10 have reoffended so far. Since one out of ten died, it would be generous to summarise the proportion with bad outcomes as 8 out of 10.

Speaking to RNZ, acting senior manager in charge Iain Chapman said at the time the pilot began, the 10 participants were the “most serious and persistent young offenders in the country”.

Going into the pilot and expecting no reoffending would have been naive, he said.

This is absolutely true.  What he didn’t say — and should have — was how much reoffending was reasonable to expect. Did he expect better results than two out of ten? Maybe he didn’t. Perhaps one out of ten is what he expected and getting two out of ten is an amazing success. That wasn’t the impression that the government and the media were giving when the program was announced, though. In particular, getting two out of ten not to reoffend doesn’t stack up well against the death.

If the ten pilot participants had been a representative sample of the sort of people who would go into the program, we could do some statistics.   However, we can’t really do this because the pilot program is so small and we don’t know how the participants were chosen. They presumably weren’t chosen specifically because they were unlikely to benefit, but we can’t say much more.
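
To see how little ten participants could tell us even in the best case, here is a sketch of the exact (Clopper-Pearson) binomial interval for 2 successes out of 10. It relies on the counterfactual assumption that the ten were a representative sample, which we cannot actually make here, so the result shows the imprecision and is not an estimate for the program.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, lo=0.0, hi=1.0, iters=60):
    """Bisection for a root of f on [lo, hi], assuming f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(successes, n, level=0.95):
    """Clopper-Pearson (exact) confidence interval for a binomial proportion."""
    alpha = 1 - level
    # Lower limit solves P(X >= successes | p) = alpha/2 (zero if no successes).
    lower = 0.0 if successes == 0 else solve(
        lambda p: alpha / 2 - (1 - binom_cdf(successes - 1, n, p)))
    # Upper limit solves P(X <= successes | p) = alpha/2 (one if all successes).
    upper = 1.0 if successes == n else solve(
        lambda p: binom_cdf(successes, n, p) - alpha / 2)
    return lower, upper

low, high = exact_interval(2, 10)
print(round(low, 3), round(high, 3))
```

The 95% interval runs from about 2.5% to about 56%, so even under ideal sampling a pilot this size could not distinguish a program that almost never works from one that works more than half the time.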

I would have expected that somewhere on a server in Wellington there is a business case for this program that has someone’s best guess at the likely success rate. It would be good to know if that person is surprised, or disappointed.

BLS accuracy

From economist Justin Wolfers on Bluesky, the record of payroll employment revisions by the US Bureau of Labor Statistics:

First: look at 2020!

Next, though, the purple and green lines are quite close together compared to the scale of year-to-year change even when there isn’t a pandemic.

On the other hand, people do actually care about differences of the size we see between the initial and revised estimates, as is demonstrated by the stock market reaction to the revisions.  What this really shows is how difficult the estimation problem is.  People care deeply about changes that are almost invisibly small on this graph, and that are right at the limit of what’s feasible statistically.

The ideal solution is probably for people to be more relaxed about small changes in estimated payroll employment, just as the ideal solution for political opinion polling discourse is for people to have a more realistic view of the limits of estimation. Alternatively, if people want to be unrelaxed about small differences, they need to be willing to pay more to get better estimates.