Posts written by Thomas Lumley (2609)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

July 21, 2025

Briefly

  • In “where are they now” followup: a 2016 StatsChat post examining a claim that dementia cures were just five years away. They weren’t.
  • Data Strips: A nice look at ways to show the distribution of a single numeric variable “in-line”
  • As foreshadowed by XKCD, mobile phone acceleration sensors are now genuinely warning of earthquakes-in-progress (media, research paper)
  • Stuff reports a reported sighting of the South Island kōkako. From the South Island Kōkako Trust (via Mike Dickison), a map just of probable encounters with South Island kōkako (ie, leaving out their large collection of “possible” encounters). They’re everywhere!

    That’s the problem. If they really existed in any significant fraction of the places they’ve been reported, you would expect them to be seen a lot more often including by people with cameras at the ready.  It’s not really feasible for a bird to be on the edge of vanishing, all over the South Island simultaneously, for decades, which is why I think it’s an ex-parrot.
July 19, 2025

Good nitrate, bad nitrate

As you probably have heard, tap water in Gore has been running just above the regulatory maximum of 11.3mg/L of nitrate. Nitrate in high enough doses converts hemoglobin to an inactive form, and babies are more susceptible. There are other risks, but they are more speculative according to the CDC. The risk should be pretty low when the concentration is just above the regulatory limit — that’s the point of regulatory limits — and it’s mostly for babies and pregnant women. Still: not ideal, and  you don’t want to let water standards start to slip.

That first link, though, is from 1News.  A couple of months ago they ran a story headed Seven things to eat or avoid to lower your blood pressure. Number 2 on the list is beetroot, based on its high nitrate content. Yes, the same nitrate that’s in the water in Gore. Beetroot juice contains quite a lot more nitrate. A research paper on sports supplements suggests the effective dose would be about 5 mmol per serving, which translates to a bit more than 300 mg, or more than 25 litres of Gore tapwater.
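The conversion in that last sentence can be checked with a rough sketch. It assumes, as the paragraph does, that the 11.3 mg/L figure is milligrams of nitrate ion per litre; if the limit is instead expressed as nitrate-nitrogen, an extra conversion step would be needed and the volume would come out smaller. The function names are made up for illustration.

```python
# Molar mass of the nitrate ion NO3-: N (14.007) + 3 x O (15.999), in g/mol.
NITRATE_G_PER_MOL = 14.007 + 3 * 15.999


def nitrate_mg(mmol):
    """Convert millimoles of nitrate to milligrams (mmol x g/mol = mg)."""
    return mmol * NITRATE_G_PER_MOL


def litres_for_dose(dose_mg, concentration_mg_per_l):
    """Volume of water (litres) that contains dose_mg at the given concentration."""
    return dose_mg / concentration_mg_per_l


# 5 mmol per serving is the figure quoted from the sports supplement paper.
dose_mg = nitrate_mg(5.0)

# ASSUMPTION: 11.3 mg/L is treated as mg of nitrate ion per litre.
litres = litres_for_dose(dose_mg, 11.3)

print(f"5 mmol of nitrate is about {dose_mg:.0f} mg")
print(f"That is about {litres:.0f} litres of water at 11.3 mg/L")
```

Under those assumptions the dose lands a bit above 300 mg and the volume above 25 litres, in line with the numbers in the text.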

Obviously there’s a difference between sports nerds deliberately drinking beetroot for the nitrates and nitrates turning up as an unwanted contaminant for young and old. The risks are different, and consent matters. Still, it’s a recurrent irritation that the “nitrates good in massive doses” and “nitrates bad even in small doses” stories don’t get cross-referenced a bit more in the media, and we don’t have a more quantitative approach to the risk. If we’re supposed to believe it’s actually risky to brush your teeth with water that’s 1% above the regulatory limit, the limit is in the wrong place.

 

July 17, 2025

Briefly

July 16, 2025

Chicken soup for the body?

From 1News (from the Conversation): Your mum was right: soup can aid recovery from winter illnesses

My mum was right about many things, but she did not say that soup could aid recovery from winter illnesses.  To feel better with a cold she would recommend hot lemon and honey (for adults, perhaps with a splash of whisky), but she didn’t ascribe any particular therapeutic powers to it.  My preference is Tom Yum soup, again without any claims about faster recovery.

The original story at The Conversation, unlike the 1News piece, actually links to things (The Conversation has mastered the technology of the hyperlink).  One thing it links to is the review article by the author of the story; another is a medical encyclopedia snippet saying soup may make you feel better but won’t affect your recovery.

There’s not as much to the review article as one might hope, even though the researchers seem to have done a very thorough search. People tend not to do high-quality randomised trials of non-standardised home interventions, and they also tend not to do high-quality randomised trials in common cold, so when you combine the two, there’s not a lot to find.  The researchers say they found four studies that looked at symptom duration, one of which showed evidence of a benefit.

I went to look at that one. It’s available from the National Library of Medicine, and it was published in the European Journal of Integrative Medicine. It does describe a randomised trial, done in Iran. Let’s look at how the abstract of the paper starts:

SARS-CoV-2 causes severe acute respiratory syndrome prompting worldwide demand for new antiviral treatments and supportive care for organ failure caused by this life-threatening virus. This study aimed to help develop a new Traditional Persian Medicine (TPM) -based drug and assess its efficacy and safety in COVID-19 patients with major symptoms.

So it’s not really soup in the usual sense. Some of it sounds quite nice: chicken and barley soup with rosewater and saffron and fig. Other parts maybe not so much: your day started with a tablespoon of herb-Sophia seeds, a relative of cabbage and mustard.  The control group was normal hospital food, so this certainly wasn’t blinded.

In any case, the claim is that this soup has therapeutic effects on Covid-19, and there I’m prepared to be substantially more skeptical than with the common cold.  There wasn’t any sign that the treatment  led to earlier cure or prevented the illness getting worse. The study claims a reduction in four self-assessed symptoms, but the reduction isn’t big and the study decided to use twice the normal threshold for false-positive results — only one of the four reductions would barely meet the usual threshold.   There’s nothing magical about the usual threshold, but there also doesn’t seem to be anything magical about the soup.

If you have Covid, consult your doctor. If you have a cold, your favourite soup might well be a good way to feel better for a bit, especially if someone else makes it for you.

July 14, 2025

Counting homelessness

We’ve seen in the past that NZ has very high estimated numbers of homeless people by international standards, and that this is at least in part because we have a very broad definition of “homeless”.

In this podcast, journalist Elizabeth Spiers talks to Brian Goldstone about his new book on homelessness in the US, and in part about how the problem is a lot broader than the official homelessness statistics.  His book takes into account the same sorts of people without a home that the NZ statistics do, and his estimate that the true number is about six times the official number would rate the USA as a little worse than NZ.

July 8, 2025

A fine line

As you probably know, there are four sorts of horizontal line separating symbols in text: the minus sign, −; the hyphen, -; the en-dash, –; and the em-dash, —.  Some people use just hyphens — or perhaps paired hyphens to indicate an em-dash — because that’s what a standard English keyboard provides.

Recently, the em-dash has been touted as an outward and visible sign of ChatGPT output.  This annoys people who deliberately use the full variety of English punctuation marks. The em-dash is in LLM output, they retort, because LLMs are trained on English writing and so will extrude em-dashes and semicolons, just as they will extrude metaphor and metonymy, zeugma and syllepsis.

On the other hand, there do seem to be a lot of em-dashes in ChatGPT output nowadays.

An analysis by Maria Sukhareva suggests a compromise explanation. Yes, the stretch hyphens come from the training data, and yes, they are somewhat new, but there are also too many of them.  We’re seeing a combination of two factors: addition of older books— with more em-dashes— to the training materials, and the fact that an em-dash is fewer tokens than other ways of setting off parenthetical comments.

July 7, 2025

Just one hot dog a day!

Q: Did you see there’s no safe level of processed meat!

A: I saw the New York Post

Q: <side eye>Really?

A: Via Google. And RNZ and Stuff

Q: Didn’t we have this story a couple of months ago?

A: Not quite. That was ultra-processed. This is only processed.  And just meat.

Q: Is it true?

A: What did we say last time?

Q: “But think about it. How do they measure people’s consumption of ultraprocessed food down to the single bite level? How do they find a comparison group with just one bite less consumption? What does it even mean?”

A:  Exactly. Even though it’s not “one bite” this time, there’s still the question of what levels they actually compared

Q: So, um, what levels did they actually compare? And is there a research paper?

A: The paper is here.  They said (for diabetes; colorectal cancer was similar):

The mean relative risk (RR) of developing type 2 diabetes was 1.30 (1.12–1.52) at a daily intake of 50 g of processed meat compared with the theoretical minimum risk exposure level (TMREL; equal here to 0 g d−1 or no consumption)… consuming processed meat in the range of the 15th to 85th percentiles of exposure (0.6–57 g d−1), compared with consuming no processed meat, was associated on average with at least an 11% higher risk of type 2 diabetes.

Q: Is 50g a lot?

A: About one hot dog

Q: So eating one hot dog will increase your risk of diabetes by 30%?

A: One hot dog per day

Q: And they’re comparing to a “theoretical minimum risk” at zero hot dogs per day?

A: Yes

Q: Doesn’t that kind of assume there is no safe level?

A: Pretty much.  They’d be able to see if moderate levels of consumption are actually protective, though if you think about how long people argued over that with alcohol, they obviously can’t tell very clearly

Q: So it’s not harmful at lower levels of consumption?

A: It’s correlated with diabetes (and cancer) at lower levels, too. Here’s the picture. The blue line is their estimated relative risk

Q: It gets a bit noisy down near 10 or 20g/day. Hard to tell the shape of the curve.

A: It is.

Q: And the red crosses?

A: The bluish dots are data; the red crosses are data they didn’t use

Q: So what does it actually mean that there’s no safe level?

A: If you’re eating hot dogs primarily for the health benefits, you’re doing it wrong.

July 4, 2025

Briefly

  • Health Nerd has similar views about coffee to mine from last week
  • Royal Statistical Society blog post on the future of the British Office of National Statistics
  • Graeme Edgeler argues that getting rid of the Census may require amending entrenched provisions of the Electoral Act, which takes a 75% supermajority of Parliament
  • The Economist/YouGov had a poll being run on bombing Iran at the time the US did it, and reported this interesting shift in opinions: Republicans approved more; Democrats approved less.
  • As a StatsChat reader you should be looking at this and wondering where the uncertainty estimates are. Owen Winter responded to my BlueSky query with this. If you take into account model uncertainty it’s a bit less impressive, but it’s not nothing
June 23, 2025

Briefly

  • ‘Kids in sport stay out of court’ – Sport NZ to help curb youth offending from RNZ.  This is another one of these cause and effect ones. Is it that being pushed into sport makes kids less likely to offend? Is it that kids with the qualities — self-control, work ethic, parents who drive you to games — to engage with sport are less likely to commit dumb crimes? Or (as anonymous law commenter @StrictlyObiter suggests) is it that “promising young sportsmen” are more likely to get a discharge without conviction?  Or, more likely, all of the above in some complicated mixture
  • From the Bennett Institute for Applied Data Science at Oxford, another example of counting being hard. They wanted to find out how much of each medication was used across the National Health Service
  • On pizza as a leading indicator of US military activity
  • It’s fifteen years since XKCD did a big colour survey, showing people coloured patches and asking for colour names.  Nicola Rennie made this poster of the top (ie, most agreed on) colour names.

Evidence of things not seen

A couple of studies out recently look at coffee and health.  One from Harvard (reported by CNBC) says coffee (but not decaf or tea) increases the chance of healthy aging in women. Another, from Tufts (reported by Newsweek and The Independent), says that unsweetened black coffee, or coffee with very small amounts of sugar or normal coffee amounts of milk, reduces death rates slightly, but that coffee with more sugar or milk (as in everything from a Kiwi small flat white to American-style lattes) does not.

There are two problems with these studies.  The first is that I can’t see them.  One is an abstract from a conference presentation; the other is a paper in an academic journal, but not one the University of Auckland gives me access to.

Compounding this, the two abstracts only give information for their preferred beverages. It’s not possible to tell whether “Decaffeinated coffee and tea intake were not significantly associated with odds of HA nor any domains” means that there’s evidence the correlations were different for tea and decaf or whether there was just a bit more uncertainty around plausibly the same correlation.  Similarly, “However, the mortality benefits were restricted to black coffee [HR (95% CI): 0.86 (0.77, 0.97)] and coffee with low added sugar and saturated fat content [HR (95% CI): 0.86 (0.75, 0.99)]” doesn’t tell us what they found for other coffee types. Nor is the information in the press releases I could find. Since the difference between ways of drinking coffee was the main news tag for these studies, that’s a bit unsatisfactory.

I’ll also note that the Tufts team published an abstract in 2020 with a slightly smaller version of the data from the same survey series, and concluded “Adding milk/cream, alone or with sugar/sweetener, did not significantly change the results.”

A basic principle for studies like these is that conclusions about difference require evidence of difference.  This applies to conclusions in the paper, and even more so to conclusions you want the press to report.
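To make the principle concrete, here is a minimal sketch of what "evidence of difference" would look like if the missing numbers were available. The black-coffee hazard ratio is the one quoted from the abstract; the sweetened-coffee estimate is entirely HYPOTHETICAL, invented only to show the calculation. The sketch also assumes the two estimates are independent and that the intervals are symmetric on the log scale, which may not hold for two groups compared against a shared reference group in one study.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_estimate(hr, lower, upper):
    """Return (log HR, standard error), recovering the SE from a 95% CI.

    Assumes the interval is symmetric on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(hr), se


def difference_test(a, b):
    """z statistic and two-sided p-value for the difference of two log HRs.

    ASSUMPTION: the two estimates are independent.
    """
    (log_a, se_a), (log_b, se_b) = a, b
    z = (log_a - log_b) / math.sqrt(se_a**2 + se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# From the quoted abstract: black coffee, HR 0.86 (0.77, 0.97).
black = log_estimate(0.86, 0.77, 0.97)

# HYPOTHETICAL: a "not significant" estimate for sweetened coffee.
sweet = log_estimate(0.95, 0.80, 1.13)

z, p = difference_test(black, sweet)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

With these made-up numbers, one interval excludes 1 and the other does not, yet the direct comparison between the two is nowhere near conventional significance. That is the gap between "one is significant and the other isn't" and "the two are different".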