Posts written by Thomas Lumley (1256)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

September 1, 2014

Sometimes there isn’t a (useful) probability

In this week’s Slate Money podcast (starting at about 2:50), there’s an example of a probability puzzle that mathematically trained people tend to get wrong.  In summary, the story is

You’re at a theatre watching a magician. The magician hands a pack of cards to one member of the audience  and asks him to check that it is an ordinary pack, and to shuffle it. He asks another member of the audience to name a card. She says “Ace of Hearts”.  The magician covers his eyes, reaches out to the pack of cards, fumbles around a bit, and pulls out a card. What’s the probability that it is the Ace of Hearts?

It’s very tempting to say 1 in 52, because the framing of the puzzle prompts you to think in terms of equal-probability sampling.  Of course, as Felix Salmon points out, this is the only definitively wrong answer. The guy’s a magician. Why would he be doing this if the probability was going to be 1 in 52?

With an ordinary well-shuffled pack of cards and random selection we do know the probability: if you like the frequency interpretation of probability it’s an unknown number quite close to 1 in 52, if you like the subjective interpretation it should be a distribution of numbers quite close to 1 in 52.

With a magic trick we’d expect the probability (in the frequency sense) to be close to either zero or one, depending on the trick, but we don’t know which.  Under the subjective interpretation of probability you do know what the probability is for you, but you’ve got no real reason to expect it to be similar for other people.


August 30, 2014

Funding vs disease burden: two graphics

You have probably seen the graphic from vox.com.


There are several things wrong with it. From a graphics point of view it doesn’t make any of the relevant comparisons easy. The diameter of the circle is proportional to the deaths or money, exaggerating the differences. And the donation data are basically wrong — the original story tries to make it clear that these are particular events, not all donations for a disease, but it’s the graph that is quoted.

For example, the graph lists $54 million for heart disease, based on the ‘Jump Rope for Heart’ fundraiser. According to Forbes magazine’s list of top charities, the American Heart Association actually received $511 million in private donations in the year to June 2012, almost ten times as much.  Almost as much again came in grants for heart disease research from the National Institutes of Health.

There’s another graph I’ve seen on Twitter, which shows what could have been done to make the comparisons clearer:



It’s limited, because it only shows government funding, not private charity, but it shows the relationship between funding and the aggregate loss of health and life for a wide range of diseases.

There are a few outliers, and some of them are for interesting reasons. Tuberculosis is not currently a major health problem in the US, but it is in other countries, and there’s a real risk that it could spread to the US.  AIDS is highly funded partly because of successful lobbying, partly because it — like TB — is a foreign-aid issue, and partly because it has been scientifically rewarding and interesting. COPD and lung cancer are going to become much less common in the future, as the victims of the century-long smoking epidemic die off.

Depression and injuries, though?


Update: here’s how distorted the areas are: the purple number is about 4.2 times the blue number
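As a rough check on that distortion, here is a minimal sketch of the geometry. It assumes, as described above, that each circle's diameter (rather than its area) is drawn proportional to the number; the function names are just for illustration.

```python
import math

def circle_area(diameter):
    """Area of a circle with the given diameter."""
    return math.pi * (diameter / 2) ** 2

def area_ratio(value_ratio):
    """Ratio of circle areas when the diameters are proportional to the values."""
    return circle_area(value_ratio) / circle_area(1.0)

def honest_diameter_ratio(value_ratio):
    """Diameter ratio needed for the areas to be proportional to the values."""
    return math.sqrt(value_ratio)

ratio = 4.2  # the purple number is about 4.2 times the blue number
print(f"area ratio when diameter scales with value: {area_ratio(ratio):.1f}")
print(f"diameter ratio that keeps areas honest: {honest_diameter_ratio(ratio):.2f}")
```

With diameters proportional to the values, a 4.2-fold difference in the numbers turns into a difference in area of 4.2 squared, between 17 and 18 times.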


Flying vs driving costs

To complement the Herald’s flying Air New Zealand vs driving costs for various NZ cities, I thought I’d work out similar comparisons for the Pacific Northwest, where I used to live.  It’s a reasonable comparison — both have relatively sparsely spaced cities, though the roads are better there.

I used Alaska Airlines for the flying costs; they are the main local airline in the region. The costs are the cheapest flight on a random weekday in September — there will be some days and seasons when it’s cheaper or more expensive.  The driving cost  is based on the actual driving distance, not the straight-line distance, and uses the cost per mile specified for business tax deductions.

  from      to        distance (km)  US$flying  US$driving  NZ$flying  NZ$driving
1 Seattle   Portland            278        368          97        438         116
2 Seattle   Spokane             449        398         157        474         187
3 Seattle   Calgary            1146        455         401        542         478
4 Seattle   Kelowna             507        440         177        524         211
5 Portland  Kelowna             785        487         275        580         327
6 Spokane   Calgary             698        581         244        692         291
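The driving and NZ$ columns can be roughly reconstructed from the distances and US$ fares. This is a sketch under assumptions that are not stated in the post: a rate of US$0.56 per mile (the 2014 US business mileage rate), a rounded conversion of 1.6 km per mile, and an exchange rate of about US$0.84 per NZ$1. These values were picked because they appear to match the table, not because they are known to be the ones used.

```python
KM_PER_MILE = 1.6     # assumption: rounded conversion factor
USD_PER_MILE = 0.56   # assumption: 2014 US business mileage deduction rate
USD_PER_NZD = 0.84    # assumption: exchange rate inferred from the table

# (from, to, driving distance in km, cheapest fare in US$), copied from the table
routes = [
    ("Seattle", "Portland", 278, 368),
    ("Seattle", "Spokane", 449, 398),
    ("Seattle", "Calgary", 1146, 455),
    ("Seattle", "Kelowna", 507, 440),
    ("Portland", "Kelowna", 785, 487),
    ("Spokane", "Calgary", 698, 581),
]

def driving_cost_usd(distance_km):
    """Driving cost in US$ at a flat per-mile rate."""
    return distance_km / KM_PER_MILE * USD_PER_MILE

def to_nzd(usd):
    """Convert US$ to NZ$ at the assumed exchange rate."""
    return usd / USD_PER_NZD

for origin, dest, km, fare in routes:
    drive = driving_cost_usd(km)
    print(f"{origin:8} {dest:8} US$ fly {fare:4d}  US$ drive {drive:6.1f}  "
          f"NZ$ fly {to_nzd(fare):6.1f}  NZ$ drive {to_nzd(drive):6.1f}")
```

Swapping in a different mileage rate or exchange rate changes the dollar figures but not the pattern: on every route in the table, driving costs less than the cheapest fare.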


The results aren’t that different from NZ, except that the impact of competition is clearer: the Seattle–Calgary flight is much less expensive than you’d predict from the others, probably because there are lots of one-stop alternatives via Vancouver.

August 29, 2014

Getting good information to government

On the positive side: there’s a conference of science advisers and people who know about the field here in Auckland at the moment. There’s a blog, and there will soon be videos of the presentations.

On the negative side: Statistics Canada continues to provide an example of how a world-class official statistics agency can go downhill with budget cuts and government neglect.  The latest story is the report on how the Labour Force Survey (which is how unemployment is estimated) was off by 42000 in July. There’s a shorter writeup in Maclean’s magazine, and their archive of stories on StatsCan is depressing reading.

August 28, 2014

Age, period, um, cohort

A recurring issue with trends over time is whether they are ‘age’ trends, ‘period’ trends, or ‘cohort’ trends.  That is, when we complain about ‘kids these days’, is it ‘kids’ or ‘these days’ that’s the problem? Mark Liberman at Language Log has a nice example using analyses by Joe Fruehwald.


If you look at the frequency of “um” in speech (in this case in Philadelphia), it decreases with age at any given year



On the other hand, it increases over time for people in a given age cohort (for example, the line that stretches right across the graph is for people born in the 1950s)



It’s not that people say “um” less as they get older, it’s that people born a long time ago say “um” less than people born recently.


‘Dodgy use of data’ edition [Background: the Washington Post is the serious DC paper. The Washington Times, not so much]

  • From the Washington Post  “But really, is it possible that more than 1 in 6 people in France could “back” Islamic State? When you look at the numbers closely, something doesn’t add up.”
  • From journalism/editing blog HeadsUp: “Too bad a clear conscience and a pure heart can’t turn correlation into cause, no matter what your first named source says.”
  • From economics blog TVHE:  “For example, [Tourism Industry Association New Zealand] claims that 15% of Upper Hutt residents’ jobs depend on the tourism industry, while only 9% of residents’ jobs in Queenstown-Lakes District depend on tourism.”



August 26, 2014


Infographic edition

1. Thomson Reuters illustrated the importance of fine detail in graphics in one of their ads. It looks like a Venn diagram. Oops.




Removing the transparent overlap and changing the colours makes it less Venn-ish




2. Kevin Schaul in the Washington Post came up with this neat graphical summary of state data



Because the basic outline of the US is so familiar (especially to people who live there), the huge spatial distortions aren’t actually all that disturbing.  Mark Monmonier, a geographer, seems to have been the first person to move in this direction (eg). I suggested to Kevin, on Twitter, that this technique would also allow Alaska to be moved from the tropical Pacific to its proper home in the north, and he agreed.


3. That’ll wake you up


Jawbone, who make products that tell you if you are awake and walking around, looked at the impact of this week’s Napa earthquake. The data resolution isn’t quite fine enough to see the time taken for the ground waves to propagate — compare XKCD on the Twitter event horizon


August 22, 2014

Margin of error for minor parties

The 3% ‘margin of error’ usually quoted for polls is actually the ‘maximum margin of error’, and is an overestimate for minor parties. On the other hand, it also assumes simple random sampling and so tends to be an underestimate for major parties.
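To see where the ‘maximum’ comes from, here is a minimal sketch using the usual normal-approximation formula under simple random sampling. The function name and the 1.96 multiplier (for a 95% interval) are choices made for this illustration, not taken from the linked code.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion p from n independent responses."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000
for percent in (50, 30, 10, 5, 1):
    moe = margin_of_error(percent / 100, n)
    print(f"{percent:3d}% support: +/- {100 * moe:.1f} percentage points")
```

The margin is largest at 50%, where it is about 3.1 percentage points for a sample of 1000, and shrinks as the percentage moves towards zero. The symmetric formula is a poor approximation for very small percentages, which is one reason to quote separate lower and upper limits for minor parties.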

In case anyone is interested, I have done the calculations for a range of percentages (code here), both under simple random sampling and under one assumption about real sampling.


Lower and upper ‘margin of error’ limits for a sample of size 1000 and the observed percentage, under the usual assumptions of independent sampling

Percentage lower upper
1 0.5 1.8
2 1.2 3.1
3 2.0 4.3
4 2.9 5.4
5 3.7 6.5
6 4.6 7.7
7 5.5 8.8
8 6.4 9.9
9 7.3 10.9
10 8.2 12.0
15 12.8 17.4
20 17.6 22.6
30 27.2 32.9
50 46.9 53.1


Lower and upper ‘margin of error’ limits for a sample of size 1000 and the observed percentage, assuming that complications in sampling inflate the variance by a factor of 2, which empirically is about right for National.

Percentage lower upper
1 0.3 2.3
2 1.0 3.6
3 1.7 4.9
4 2.5 6.1
5 3.3 7.3
6 4.1 8.5
7 4.9 9.6
8 5.8 10.7
9 6.6 11.9
10 7.5 13.0
15 12.0 18.4
20 16.6 23.8
30 26.0 34.2
50 45.5 54.5
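For anyone who wants to experiment, here is one way to get limits close to those in the two tables. It is a sketch, not the linked code: it assumes exact binomial (Clopper-Pearson) 95% limits, and it handles the variance inflation by dividing the sample size by the inflation factor to get an effective sample size. Both choices are assumptions that seem consistent with the tables.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def solve_increasing(f, target, iters=60):
    """Bisection on (0, 1) for p with f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(percent, n=1000, deff=1.0, alpha=0.05):
    """Clopper-Pearson limits (in percent) for an observed percentage.

    deff is the assumed variance inflation factor; the effective sample
    size n / deff is used in place of n (an assumption of this sketch).
    """
    n_eff = round(n / deff)
    x = round(n_eff * percent / 100)
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n_eff, p), alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2,
    # i.e. P(X > x) equals 1 - alpha / 2.
    upper = 1.0 if x == n_eff else solve_increasing(
        lambda p: 1 - binom_cdf(x, n_eff, p), 1 - alpha / 2)
    return 100 * lower, 100 * upper

for deff in (1.0, 2.0):
    print(f"variance inflation factor {deff}")
    for percent in (1, 2, 3, 4, 5, 10, 15, 20, 30, 50):
        lo, hi = exact_interval(percent, n=1000, deff=deff)
        print(f"{percent:3d}  {lo:5.1f}  {hi:5.1f}")
```

The limits are asymmetric for small percentages, and halving the effective sample size widens every interval, most noticeably for the larger parties.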

California drought visualisation


From XKCD. Both the data and the display technique are worth looking at



Presumably you could do something similar with New Zealand, which is roughly the same shape.

August 21, 2014

Auckland rates arithmetic

In today’s Herald story about increases in rates and impact on renters it’s not that the numbers are wrong, it’s that they haven’t been subjected to the right sorts of basic arithmetic.

The lead is

Auckland landlords are hiking rents amid fears of big rates increases next year on the back of spiralling property values.

and later on

Increases in landlords’ expenses, including rates, mortgage interest rates and insurance premiums, could push up rent on a three-bedroom Auckland house by between $20 and $40 a week, he said.

‘Including’ is doing a lot of work in that sentence. The implications are particularly unfortunate in a story targeted at renters, who don’t get sent rates information directly and are less likely to know the details of the system.

The first place to start is with a rough estimate of how much money we’re looking at. One of the few useful things the Taxpayers’ Union has done is to collate data on rates, hosted now at Stuff. The average Auckland rates bill was $2636.  That’s all residences, not three-bedroom houses, but the order of magnitude should be right. An annual bill of $2636 is $50/week. If the average total weekly rates payment is around $50, the average increase can’t reasonably be a big fraction of $20-$40/week or there’d be a lot more rioting in the streets.

Anyone who owns a house in Auckland or checks the Council website should know there is a cap on rates increases to cover the neighbourhoods where prices are increasing fastest. The cap is 10%/year; no rates increase faster than that, and most increase slower.  To get more detailed information you’d need to look at the website describing 2014/2015 rates changes, and find that the average increase for residential properties is 3.7%, then calculate that 3.7% of $50/week is about $2/week.

According to the Reserve Bank, both floating and two-year-fixed mortgage interest rates have gone up 0.5 percentage points since last year.  That’s $9.60/week per $100,000 of mortgage, so it’s likely to be a much bigger component of the rental cost increase than the rates are.
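The arithmetic so far fits in a few lines. This sketch uses only the figures quoted above; the function and variable names are made up for the illustration.

```python
WEEKS_PER_YEAR = 52

def per_week(annual_amount):
    """Convert an annual dollar amount to a weekly one."""
    return annual_amount / WEEKS_PER_YEAR

average_annual_rates = 2636   # average Auckland rates bill, $/year
average_increase = 0.037      # average residential rates increase, 2014/2015
capped_increase = 0.10        # cap on any one property's increase
mortgage_rate_rise = 0.005    # rise in mortgage interest rates since last year
mortgage_unit = 100_000       # per $100,000 of mortgage

weekly_rates = per_week(average_annual_rates)
rates_rise_average = weekly_rates * average_increase
rates_rise_capped = weekly_rates * capped_increase
mortgage_rise = per_week(mortgage_rate_rise * mortgage_unit)

print(f"average rates bill:          ${weekly_rates:.2f}/week")
print(f"average rates increase:      ${rates_rise_average:.2f}/week")
print(f"rates increase at the cap:   ${rates_rise_capped:.2f}/week")
print(f"mortgage interest, per $100k: ${mortgage_rise:.2f}/week")
```

Even at the cap, the increase on an average rates bill is about $5/week, well short of the $20 to $40 a week quoted for rent, while the interest rate rise alone comes to nearly $10/week for each $100,000 borrowed.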

The average increase in rates is a lot slower than the increase in property prices (10% in the year to July), but you’d expect it to be. The council doesn’t set a fixed percentage of value from year to year and live with real-estate price fluctuations. It sets a budget for total rates income, and then distributes the cost using a combination of a fixed charge and a proportion of value. In other words, the increase in average real-estate prices in Auckland has no direct impact on average increase in rates — it’s just that if your house value has gone up more than average, your rates will tend to go up more than average.   Increases in average real-estate price obviously do lead to increases in rental price, but rates are not the mechanism.
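Here is a toy version of that mechanism, to show why a general rise in prices has no direct effect on bills. The three property values, the budget and the fixed charge are invented for the illustration, and the real rating formula has more components than this.

```python
def rates_bills(values, total_budget, fixed_charge):
    """Split a fixed rates budget into a flat charge plus a share proportional to value."""
    variable_pool = total_budget - fixed_charge * len(values)
    total_value = sum(values)
    return [fixed_charge + variable_pool * v / total_value for v in values]

# Hypothetical three-house city; all numbers are made up.
values = [400_000, 600_000, 1_000_000]
budget = 7_500
fixed = 400

before = rates_bills(values, budget, fixed)

# Every house goes up 10%: the shares of total value are unchanged.
uniform = rates_bills([v * 1.10 for v in values], budget, fixed)

# The first house goes up 20%, the others only 5%.
uneven = rates_bills([values[0] * 1.20, values[1] * 1.05, values[2] * 1.05], budget, fixed)

for label, bills in (("before", before), ("uniform +10%", uniform), ("uneven", uneven)):
    print(f"{label:13}", [round(b, 2) for b in bills])
```

With the budget held fixed, a uniform rise in values leaves every bill where it was. Only the house that rose faster than average pays more, and the others pay slightly less.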

The Council is currently working on a ten-year plan, including the total rates income over that period of time. It will be open for public comment in January.