Posts written by Thomas Lumley (1418)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

February 19, 2015

West Island census under threat?

From the Sydney Morning Herald

Asked directly whether the 2016 census would go ahead as planned on August 9, a spokeswoman for the parliamentary secretary to the treasurer Kelly O’Dwyer read from a prepared statement.

It said: “The government and the Bureau of Statistics are consulting with a wide range of stakeholders about the best methods to deliver high quality, accurate and timely information on the social and economic condition of Australian households.”

Asked whether that was an answer to the question: “Will the census go ahead next year?” the spokeswoman replied that it was.

Unlike in Canada, it’s suggested that dropping the census would at least save money in the short term. It’s the longer-term consequences of reduced information quality that are a concern: not just directly for Census questions, but for all surveys that use Census data to compensate for sampling bias. How bad this would be depends on what replaces the Census: if it’s a reasonably large mandatory-response survey (as in the USA), it could work well. If it’s primarily administrative data, probably not so much.

In New Zealand, the current view is that we do still need a census.

Key findings are that existing administrative data sources cannot at present act as a replacement for the current census, but that early results have been sufficiently promising that it is worth continuing investigations.


February 18, 2015

Petrol prices

From time to time I like to remind people about the national petrol price monitoring program, for example when there’s a call for a review of fuel prices.

The Ministry of Business, Innovation & Employment (Economic Development Information) carries out weekly monitoring of “importer margins” for regular petrol and automotive diesel.  The weekly oil prices monitoring report is reissued each week with the previous week’s data.

The importer margin is the amount available to retailers to cover domestic transportation, distribution and retailing costs, and profit margins.

The purpose of this monitoring is to promote transparency in retail petrol and diesel pricing and is a key recommendation from the New Zealand Petrol Review

The importer margin for petrol over the past three years looks like this:


The wiggly blue line is the week-by-week estimated margin; the shaded area is centered around the red trend line and covers 50% of the data. The margin had been going up; the calls for a review came just after it plummeted.
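The post doesn’t say exactly how the trend line and shaded band were computed. A minimal sketch of one plausible way, assuming a centred moving average for the trend and a constant-width band set to the median absolute residual, so that about half the weekly points fall inside. The margin series is made up for illustration (hypothetical values in cents per litre), and `trend_and_band` is a hypothetical helper, not anything from the MBIE site:

```python
import math
import statistics

def trend_and_band(values, window=13):
    """Centred moving average as the trend, plus a band half-width chosen so
    that about half of the points lie within the band."""
    half = window // 2
    trend = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        trend.append(statistics.fmean(values[lo:hi]))
    # Half of the absolute residuals are at most their median, so a band of
    # trend +/- median(|residual|) covers about 50% of the data.
    half_width = statistics.median(abs(v - t) for v, t in zip(values, trend))
    return trend, half_width

# Hypothetical weekly importer margins (cents per litre) over three years.
weeks = range(156)
margins = [22 + 0.03 * w + 2.5 * math.sin(w / 4) + 1.5 * math.cos(w * 1.7) for w in weeks]

trend, half_width = trend_and_band(margins)
inside = sum(abs(m - t) <= half_width for m, t in zip(margins, trend))
print(f"band half-width {half_width:.2f}; {inside / len(margins):.0%} of weeks inside")
```

A constant-width band is the simplest choice; a band that widens and narrows with local volatility would need local quantiles instead.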

At the same site, but updated only quarterly, is an international comparison of the cost of fuel broken down into tax and everything else.


February 16, 2015


Pot and psychosis

The Herald has a headline “Quarter of psychosis cases linked to ‘skunk’ cannabis”, saying

People who smoke super-strength cannabis are three times more likely to develop psychosis than people who have never tried the drug – and five times more likely if they smoke it every day.

The relative risks are surprisingly large, but could be true; the “quarter” attributable fraction needs to be qualified substantially. As the abstract of the research paper (PDF) says, in the convenient ‘Interpretation’ section

Interpretation The ready availability of high potency cannabis in south London might have resulted in a greater proportion of first onset psychosis cases being attributed to cannabis use than in previous studies

Let’s unpack that a little.  The basic theory is that some modern cannabis is very high in THC and low in cannabidiol, and that this is more dangerous than more traditional pot. That is, the ‘skunk’ cannabis has a less extreme version of the same problem as the synthetic imitations now banned in NZ. 

The study compared people admitted as inpatients in a particular area of London (analogous to our DHBs) to people recruited by internet and train advertisements, and leaflets (which, of course, didn’t mention that the study was about cannabis). The control people weren’t all that well matched to the psychosis cases, but it wasn’t too bad.  The psychosis cases were somewhat more likely to smoke cannabis, and much more likely to smoke the high-THC type. In fact, smoking of other cannabis wasn’t much different between cases and controls.

That’s where the relative risks of 3 and 5 come from.  It’s still possible that these are due at least in part to some other factor; you can’t tell from just this sort of data. The attributable fraction (a quarter of cases) comes from combining the relative risk with the proportion of the population who are exposed.

Suppose ‘skunk-type’ cannabis triples your risk, and 20% of people in the population use it, as was seen for controls in the sample. General UK data (eg) suggest the rate in non-users might be 5 cases per 10,000 people per year. So, in 100,000 people, 80,000 would be non-users and you’d expect 40 cases per year. The other 20,000 would be users, and you’d expect a background rate of 10 cases plus 20 extra cases caused by the cannabis. So, in the 100,000 people, you’d get 70 cases per year, 50 of which would have happened anyway and 20 due to cannabis. That’s not exactly the calculation the researchers did — they used a trick where they don’t need the background rate as long as it’s low, and I rounded more — but it’s basically the same. I get 28%; they got 24%.
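The back-of-envelope calculation above can be written out directly. A minimal sketch using the same assumed numbers (a background rate of 5 cases per 10,000 per year, a relative risk of 3, and 20% exposed). It also checks the answer against Levin’s formula for the population attributable fraction, p(RR - 1) / (1 + p(RR - 1)), which shows why the background rate drops out. This is a standard textbook formula, not necessarily the one the researchers used, and the function names are hypothetical:

```python
def attributable_fraction_by_counts(p_exposed, relative_risk,
                                    baseline_rate=5 / 10_000, population=100_000):
    """Share of cases that would not occur without the exposure, worked out by
    counting expected cases in a hypothetical population."""
    exposed = population * p_exposed
    background = population * baseline_rate                # cases that happen anyway
    extra = exposed * baseline_rate * (relative_risk - 1)  # cases caused by exposure
    return extra / (background + extra), background, extra

def levin_paf(p_exposed, relative_risk):
    """Levin's population attributable fraction; no background rate needed."""
    excess = p_exposed * (relative_risk - 1)
    return excess / (1 + excess)

paf, background, extra = attributable_fraction_by_counts(0.20, 3)
print(f"{background:.0f} background cases + {extra:.0f} extra -> {paf:.1%} attributable")
print(f"Levin's formula: {levin_paf(0.20, 3):.1%}")
```

Both routes give 20/70, about 28.6%: the background rate multiplies both the numerator and the denominator, so it cancels.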

The figures illustrate two things. First, the absolute risk increase is roughly 20 cases per 20,000 users per year, or about 1 in 1,000. Second, the ‘quarter’ estimate is very sensitive to the proportion exposed. If 5% of people used ‘skunk-type’ cannabis, you can run the numbers again and you get 5 cases due to cannabis out of 55 in 100,000 people: only 9% of cases due to exposure.
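The sensitivity to the proportion exposed is easy to see by running the same calculation over a range of exposure levels. A minimal sketch, keeping the assumed relative risk of 3 and background rate of 5 per 10,000 per year fixed; `cases_per_100k` is a hypothetical helper, and the exposure levels other than 5% and 20% are just illustrative:

```python
def cases_per_100k(p_exposed, relative_risk=3, baseline_rate=5 / 10_000,
                   population=100_000):
    """Expected cases per year: (all cases, cases caused by the exposure)."""
    background = population * baseline_rate
    extra = population * p_exposed * baseline_rate * (relative_risk - 1)
    return background + extra, extra

# Absolute risk increase for a user: (RR - 1) times the baseline rate per year.
excess_per_user = (3 - 1) * 5 / 10_000
print(f"extra risk for a user: about 1 in {1 / excess_per_user:,.0f} per year")

for p in (0.05, 0.10, 0.20, 0.40):
    total, extra = cases_per_100k(p)
    print(f"{p:4.0%} exposed: {extra:4.1f} of {total:5.1f} cases -> {extra / total:5.1%}")
```

The absolute risk for an individual user doesn’t change across the rows; only the share of all cases attributed to the exposure does.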

Now we’re at the ‘interpretation’ quote from the research paper.  In this South London area, 20% of people have mostly used the high-potency cannabis and 44% have mostly used other types, with 37% non-users. That’s a lot of pot.  Even if the relative risks are correct, the population attributable proportion will be much lower for the UK as a whole (or for NZ as a whole).

Still, the research does tend to support the idea of regulated legalisation, the sort of thing that Mark Kleiman advocates, where limits on THC and/or higher taxes for higher concentrations can be used to push cannabis supply to lower-risk varieties.


February 15, 2015

Caricatures and credits


A lot of surprisingly popular accounts on Twitter just tweet pictures, without giving any sources, and often with captions that are misleading or just wrong. One from yesterday had a picture of a picnic on a highway in the Netherlands in 1973 and described it as being from the US.

Here’s one that came from @AmazingMaps, today, captioned “Most popular word used in online dating profiles by state”



Could it really be true that ‘NASCAR’ is the most popular word in Indiana dating profiles? Or that ‘oil’ is the most popular word in Texas? Have the standard personal-ad clichés become completely outdated? Aren’t Americans easy-going any more? Doesn’t anyone care about romance or honesty or humour?

We’ve seen this sort of analysis before on StatsChat. It’s designed to produce a caricature, though not necessarily in a bad way. This one comes from Mashable, based on analysis by […]. The original post says:

Essentially, they broke down which words are used with relative frequency in certain states, as compared to relative infrequency in the rest of the country.

That is, the map has ‘oil’ for Texas and ‘NASCAR’ for Indiana not because these words were used very often in those states, but because they were used much less often in other states. Most Indiana dating profiles probably don’t mention NASCAR, but a much higher proportion do than in, say, New York or Oregon. Most Texas dating profiles don’t talk about oil, but it’s more common in Texas than in Maine or Tennessee. It’s not that everyone in Oregon or Idaho kayaks, but a lot more do than in Iowa or Kansas.
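The difference between ‘most common’ and ‘most distinctive’ is easy to see in a toy example. A minimal sketch, assuming each state’s word was picked by comparing its share of words in that state with its share in all other states combined (the quote above is all we know about the actual method). The word counts are invented, the add-one smoothing is an assumption, and `distinctive_word` is a hypothetical helper:

```python
from collections import Counter

def distinctive_word(counts_by_state, state):
    """Word whose share in `state` is largest relative to its share elsewhere."""
    here = counts_by_state[state]
    elsewhere = Counter()
    for other, counts in counts_by_state.items():
        if other != state:
            elsewhere.update(counts)
    vocab_size = len(set(here) | set(elsewhere))
    n_here, n_else = sum(here.values()), sum(elsewhere.values())

    def ratio(word):
        # Add-one smoothing (an assumption here) avoids dividing by zero
        # for words never used outside the state.
        share_here = (here[word] + 1) / (n_here + vocab_size)
        share_else = (elsewhere[word] + 1) / (n_else + vocab_size)
        return share_here / share_else

    return max(here, key=ratio)

# Invented word counts from dating profiles in three states.
profiles = {
    "Indiana": Counter({"fun": 500, "family": 400, "nascar": 30}),
    "New York": Counter({"fun": 520, "family": 380, "nascar": 1}),
    "Oregon": Counter({"fun": 480, "family": 410, "nascar": 2, "kayak": 40}),
}

for state, counts in profiles.items():
    common = counts.most_common(1)[0][0]
    print(f"{state}: most common {common!r}, most distinctive "
          f"{distinctive_word(profiles, state)!r}")
```

In this toy data ‘nascar’ is only about 3% of Indiana’s words, yet it is Indiana’s most distinctive word because it is so rare elsewhere.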


When this map first came out, in November, there were lots of stories about it, typically getting things wrong (eg an NBC motor sports site had the headline “‘NASCAR’ is most frequently used word among Indiana online dating profiles”). That’s still bad, but most of these sites had links or at least mentioned the source of the map, so that people who care could find out what the facts are. @AmazingMaps seems confident none of its followers care.

February 14, 2015

Run and find out, but guess first

A recent post looks at calendar patterns. Yuri Victor noticed that, this year, February is a nice rectangular shape on a calendar (as happens whenever it starts on Sunday in a non-leap year), and wondered how often this happened. This is the sort of question where you can easily find out the answer, so he did:

I decided to see if this occurs often so I wrote some code and found out it happens more than I thought. In the past 100 years, there have been 11 Februaries that make a rectangle.

He also noticed that February 13th would be Friday when this happened and wondered how often we got a Friday 13th:

Friday the 13ths also happen more than I thought. In the past 100 years there have been 171 Friday the 13ths, which means there is one to two a year.

This is a Good Thing. We want journalists wondering about patterns and looking up data to check them. We don’t want them being required to call an expert in calendars to give a quote. It’s also a Good Thing that he tells us his expectations were wrong.

It would be even better, though, if he’d tried to work out a quantitative guess and tell us. The simplest guess would be that, in the long run, February 1 is a Sunday as often as any other day, and that the 13th of a month is a Friday as often as any other day.  These are natural guesses because there’s no special reason the year or a particular month should start on a particular day of the week. 

In  100 years there are 1200 months, and 1200/7 is 171.4, so it looks as though Friday 13th happens in almost exactly 1/7 of months.  In the past 100 years there are 75 Februaries with 28 days, and 75/7 is 10.7, so 28-day Februaries begin on Sunday almost exactly 1/7 of the time.
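Both the counts and the 1/7 guesses can be checked in a few lines. A minimal sketch, assuming ‘the past 100 years’ means 1916 to 2015 and that a rectangular February is one with 28 days starting on a Sunday (for calendars whose weeks start on Sunday). Python’s `date.weekday()` numbers Monday as 0 and Sunday as 6; the helper names are hypothetical:

```python
import calendar
from datetime import date

FRIDAY, SUNDAY = 4, 6  # date.weekday(): Monday is 0

def rectangular_februaries(first_year, last_year):
    """Years whose February has 28 days and starts on a Sunday."""
    return [y for y in range(first_year, last_year + 1)
            if not calendar.isleap(y) and date(y, 2, 1).weekday() == SUNDAY]

def friday_13ths(first_year, last_year):
    """(year, month) pairs where the 13th falls on a Friday."""
    return [(y, m) for y in range(first_year, last_year + 1) for m in range(1, 13)
            if date(y, m, 13).weekday() == FRIDAY]

first, last = 1916, 2015  # assumed reading of "the past 100 years"
months = 12 * (last - first + 1)
short_febs = sum(not calendar.isleap(y) for y in range(first, last + 1))

print(f"Friday 13ths: {len(friday_13ths(first, last))}, guess {months / 7:.1f}")
print(f"Rectangular Februaries: {len(rectangular_februaries(first, last))}, "
      f"guess {short_febs / 7:.1f}")
```

With this window the rectangular-February count comes out at 11, as in the quote; the Friday 13th count shifts slightly depending on exactly which 100 years you take.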

You wouldn’t always expect the simplest possible explanation to hold. For example, the date of Passover is set based on the solar and lunar calendars, in a 19-year cycle. Since 7 doesn’t divide 19, you’d expect either that the days of the week didn’t divide up equally or that they took a long time (requiring lots of leap years) to do so.


February 13, 2015

Misunderstanding genetic heritability

From the Herald, under the headline “Is this why we’re all getting fat?”

According to the UN’s World Health Organisation, obesity nearly doubled worldwide from 1980 to 2008.

More than 2.8 million adults die each year as a result of being overweight or obese, it says. A full 42 million children under the age of five are considered to be obese.

Diet and a sedentary lifestyle have long been fingered as causes of obesity, but in recent years, advances in gene sequencing have turned attention to inheritance.

Previous studies have variously estimated genes as being to blame for between 40 and 70 per cent of the problem.

Every sentence here is true, but the impression is completely wrong.

The 40-70% genetic contribution to weight is comparing different individuals in basically the same environment.  The ‘obesity epidemic’ is comparing whole populations over time.  One thing we know can’t possibly explain the recent increases in obesity is genetics: there hasn’t been time for the genes of these populations to change.
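A small simulation makes the distinction concrete. A minimal sketch, assuming a purely additive model (BMI = 25 + genes + environment) with made-up spreads; the second ‘era’ keeps exactly the same people and genes but shifts everyone’s environment by the same amount. Heritability here is the share of BMI variance due to genes, and `era` is a hypothetical helper:

```python
import random
import statistics

rng = random.Random(2015)
N = 20_000

# Same people, same genes in both eras (hypothetical spreads, in BMI units).
genes = [rng.gauss(0, 3.0) for _ in range(N)]
env_noise = [rng.gauss(0, 2.5) for _ in range(N)]

def era(env_shift):
    """Mean BMI and heritability (share of variance due to genes) when every
    person's environment is shifted by env_shift."""
    bmi = [25 + g + e + env_shift for g, e in zip(genes, env_noise)]
    heritability = statistics.pvariance(genes) / statistics.pvariance(bmi)
    return statistics.fmean(bmi), heritability

mean_then, h2_then = era(0.0)
mean_now, h2_now = era(3.0)
print(f"then: mean {mean_then:.1f}, heritability {h2_then:.2f}")
print(f"now:  mean {mean_now:.1f}, heritability {h2_now:.2f}")
```

Heritability stays at roughly 0.6 in both eras, inside the 40 to 70 per cent range, while average BMI rises by 3 units with no genetic change at all.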

Looking under the lamppost

Harkanwal Singh, at the Herald, has a very nice animation of known meteorite locations around the world and over time, as part of the report on Wednesday night’s fireball.  Here’s a still of the last frame.


This is basically a map of sampling bias. Meteorites actually hit the Earth uniformly by longitude and over time, though with a preference for the tropics over the poles. The bias towards the tropics is fairly slight by real area, but the Mercator projection will amplify it. From a 1964 paper by Ian Halliday:


That’s not what the map looks like.
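Why Mercator amplifies a tropical preference: the projection stretches distances by a factor of sec(latitude) both east-west and north-south, so a patch of ground at latitude φ takes up sec²(φ) times as much map area as the same patch at the equator. A minimal sketch of what that does to the apparent density of points, using invented true densities and a hypothetical helper `apparent_density`:

```python
import math

def apparent_density(true_density, latitude_deg):
    """Points per unit of Mercator map area, given points per unit of real area.

    Mercator inflates areas by sec^2(latitude), so the density seen on the map
    is the true density multiplied by cos^2(latitude).
    """
    return true_density * math.cos(math.radians(latitude_deg)) ** 2

# Invented true densities: a mild preference for low latitudes.
true_density = {0: 1.00, 30: 0.95, 60: 0.85}

for lat, dens in true_density.items():
    print(f"{lat:2d} deg: true {dens:.2f}, on the map {apparent_density(dens, lat):.2f}")
```

In this toy example a 15% real difference between the equator and 60° becomes nearly a five-fold difference on the map.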

The first part of the sampling bias is that a meteorite basically has to hit land to be counted: if it hits ocean it will sink without a trace.

It’s easier to find meteorites in places where they don’t bury themselves in soil or get eroded, so we see lots of them in desert or in ice. You don’t get many found in the Amazon, but there are lots just to the west in the Atacama desert of Chile.

In non-ideal circumstances it helps if there’s a fairly dense population of observers and scientists: meteorites in the modern US have a reasonable chance of being found even in non-ideal countryside.  And finally, some places are easier to search than others. There’s a sharp drop-off in meteorite finds between Oman and Yemen. This isn’t due to a dramatic geological or weather boundary; it has the same causes as the 13-year difference in life expectancy between the two countries.

February 12, 2015

Eat food

From the Herald, based on this paper

Dietary advice issued to tens of millions had warned that fat consumption should be strictly limited to cut the risk of heart disease and death.

But experts say the recommendations, which have been followed for the past 30 years, were not backed up by scientific evidence and should not have been issued.

First, the “not backed up by scientific evidence” actually means “not backed up by randomised trials”. When there’s a shortage of randomised trials on a topic it doesn’t mean there is no evidence. Randomised trials are ideal, but they are very hard to do usefully for effects of diet.  The same issue of the scientific journal has a useful commentary piece talking about the evidence and policy questions.

Second,  it’s true that there were real gaps in knowledge on the difference between types of fat back then. All fat isn’t the same, and neither is all saturated fat, or all polyunsaturated fat. Since I wasn’t in epidemiology back then, I don’t know how much this was a known unknown that should have led to more caution versus an unknown unknown.

Third, in the US at least, people didn’t really reduce their fat consumption as a result of the guidelines. For example, in a paper in the American Journal of Clinical Nutrition

In a comparison of NHANES 2005–2006 with NHANES I, men had a decreased absolute daily fat intake (by 20 ± 23 kcal, from 909 to 889 kcal), whereas women had an increased absolute daily fat intake (by 27 ± 14 kcal, from 577 to 605 kcal).

Fat intake as a proportion of calories decreased quite a lot, because calories went up, but absolute fat intake stayed fairly stable. Saying the recommendations ‘have been followed for the past 30 years’ is misleading.

Fourth, as this shows, we don’t know a lot about how to make recommendations that translate into the right sort of behaviour changes. This is another area where there’s a shortage of randomised trials, and of scientific evidence generally.

And finally, there was a good story by Martin Johnston in the Herald in December that gives more background on the issue. There’s genuine disagreement, but the establishment view isn’t what the caricatures suggest:

Professor Jackson reckons the Japanese and traditional Mediterranean diets offer insights. He says the balance of carbs and fats is probably unimportant as long as most fat is not saturated and most carb is the complex variety, not sugar and white flour-based refined carbs.


Two types of brain image study

If a brain imaging study finds greater activation in the asymmetric diplodocus region or increased thinning in the posterior homiletic, what does that mean?

There are two main possibilities. Some studies look at groups who are different and try to understand why. Other studies try to use brain imaging as an alternative to measuring actual behaviour. The story in the Herald (from the Washington Post), “Benefit of kids’ music lessons revealed – study” is the second type.

The researchers looked at 334 MRI brain images from 232 young people (so mostly one each, some with two or three), and compared the age differences in young people who did or didn’t play a musical instrument.  A set of changes that happens as you grow up happened faster for those who played a musical instrument.

“What we found was the more a child trained on an instrument,” said James Hudziak, a professor of psychiatry at the University of Vermont and director of the Vermont Center for Children, Youth and Families, “it accelerated cortical organisation in attention skill, anxiety management and emotional control.”

An obvious possibility is that kids who play a musical instrument have different environments in other ways, too.  The researchers point this out in the research paper, if not in the story.  There’s a more subtle issue, though. If you want to measure attention skill, anxiety management, or emotional control, why wouldn’t you measure them directly instead of measuring brain changes that are thought to correlate with them?

Finally, the effect (if it is an effect) on emotional and behavioural maturation (if it is on emotional and behavioural maturation) is very small. Here’s a graph from the paper


The green dots are the people who played a musical instrument; the blue dots are those who didn’t.  There isn’t any dramatic separation or anything — and to the extent that the summary lines show a difference it looks more as if the musicians started off behind and caught up.