Posts written by Thomas Lumley (1755)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

May 3, 2016

Bright lights, big city

From NewsHub: “NZ’s most violent city spots revealed”

The approximately one square kilometre grid follows part of Queen St and includes the area around the Sky Tower and casino, as well as the eclectic entertainment strip of Karangahape Rd.

Last calendar year 550 people were the victims of assaults, sexual attacks and robberies in this area.

That’s a rate for these violent crimes more than six-and-a-half times the national average.

The other top locations included two more areas in central Auckland, and a chunk of central Wellington including Cuba St and Courtenay Place.  One thing these four (and quite possibly some of the other top locations) have in common is that a lot of people who don’t live there spend time there — and some of these people commit or suffer violent crimes.  Auckland Central West has a very high violent crime rate for its local population, but some of that is because the relevant population isn’t just the local residents, it’s workers by day and revellers by night.   The area is presumably more dangerous than the national average, but it’s not six and a half times more dangerous.

April 29, 2016

Looking up the index


Q: Did you hear that Auckland housing affordability is better now than when the government came to office?

A: No. Surely not.

Q: That’s what Nick Smith says: listen, it’s at 4:38. Is it true?

A: Up to a point.

Q: Up to what point?

A:  As he says, the Massey University Housing Affordability Index for February 2016 is lower than it was for November 2008, for Auckland and everywhere else in the country. For Auckland it was 38.44 then and is 33.8 now.

Q: But The Spinoff says one of the people behind the Index says Nick Smith is wrong, that housing isn’t more affordable than it was then.

A: Indeed she does. That’s because housing isn’t more affordable.

Q: But you said the index was lower?

A: Yes, it is.

Q: And lower is supposed to be better?

A: Yes.

Q: But how can the Housing Affordability Index be lower when housing isn’t more affordable? What is the index?

A: If it’s the same as it was in 2006 (which would make sense) it’s median selling price multiplied by a weighted-average interest rate and divided by the mean individual weekly earnings.

Q: Can you translate that?

A: Roughly,  the number of weeks of average earnings you’d need to pay the first year’s interest on a 100% mortgage.

Q: So if it’s 34, and you’ve got two people making the average, it’s 17 weeks each out of 52 going to mortgage interest? About 32% of income?

A: That’s right, only you don’t get 100% mortgages, so it’s more like 26% of income. And there’s taxes and insurance and you actually pay off a bit of the principal even in the first year, so it’s more complicated. But it’s a simple summary of the interest cost.

Q: And that’s lower now than in November 2008?

A: So it seems. I wasn’t living in New Zealand then, but it looks like mortgage interest rates were near 9%. The combination of the increase in incomes and the fall in interest rates has been slightly more than the increase in house prices, even in Auckland.

Q: But what if rates go back up?

A: Then a lot of houses will retroactively become much less affordable.

Q: And what about saving for down payments? That’s what all the snake people have been complaining about, and low interest rates don’t help there.

A: Down payments don’t go into the affordability index.

Q: But they go into actual affordability!

A: Which is presumably why the Minister was talking about the affordability index.
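
The arithmetic in the answers above can be sketched in a few lines. This is only a sketch of the index as described in the dialogue (median price times interest rate, divided by mean weekly earnings); the price, rate, and earnings inputs and the function names are made up for illustration and are not Massey University's actual inputs or code.

```python
WEEKS_PER_YEAR = 52


def affordability_index(median_price, interest_rate, mean_weekly_earnings):
    """Weeks of average earnings needed to pay one year's interest on a 100% mortgage."""
    return median_price * interest_rate / mean_weekly_earnings


def share_of_income(index, earners=2, loan_fraction=1.0):
    """Fraction of household income going to first-year mortgage interest."""
    return index * loan_fraction / (earners * WEEKS_PER_YEAR)


# Hypothetical inputs, chosen only so the index comes out near 34.
example_index = affordability_index(
    median_price=800_000, interest_rate=0.05, mean_weekly_earnings=1_175
)

full_mortgage = share_of_income(34)                     # 100% mortgage, two earners
with_deposit = share_of_income(34, loan_fraction=0.8)   # 20% down payment

print(f"index: {example_index:.1f}")
print(f"share of income, 100% mortgage: {full_mortgage:.1%}")
print(f"share of income, 80% mortgage: {with_deposit:.1%}")
```

Because the interest rate multiplies the price, a fall in rates can offset a large rise in prices, which is how the index can go down while down payments get harder to save for.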


Bar chart of the week

From the IMF, using OECD data (via Sam Warburton)


Bar charts should start at zero (and probably shouldn’t  have distracting house/arrow/tree reflections in the background), but this graph would look even worse if the y-axis went down to zero. The problem is that ‘zero’ isn’t 0 for this sort of measurement.  The index is the price:income ratio now, divided by the price:income ratio in 2010, multiplied by 100.  The “no change” value is 100, which suggests using that for the floor of the bars.  Making the bars wider relative to the spaces gives easier comparisons and makes the graph less busy.  The colour scheme isn’t ideal for dichromats, but it only reinforces the information, it’s not needed to interpret anything.



The next step, as Sam suggested on Twitter, would be to give up on the ‘index’, which is really economist jargon, and just describe the change in %.  He also suggested putting the two labels in colour (which required some fiddling: for the text colour to look like the bar colour it has to actually be darker).
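
The conversion from the index to a percentage change is simple enough to sketch. The price:income ratios below are invented for illustration, not taken from the IMF or OECD data, and the function names are my own.

```python
def price_income_index(ratio_now, ratio_base):
    """Price:income ratio now, relative to the base year, scaled so no change is 100."""
    return 100 * ratio_now / ratio_base


def percent_change(index):
    """Change since the base year in percent; 100 on the index means 0% change."""
    return index - 100


# Hypothetical price:income ratios for two made-up countries.
rising = price_income_index(ratio_now=6.0, ratio_base=5.0)
falling = price_income_index(ratio_now=4.5, ratio_base=5.0)

print(rising, percent_change(rising))
print(falling, percent_change(falling))
```

Plotting the percentage change puts the natural floor of the bars at zero again, which is the same thing as using 100 as the floor for the index.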


One might also go back to the full names of the countries, but I quite like the abbreviations.


April 28, 2016


Most of Auckland is within walking distance of a school: there are over 500 schools in the 560 km² of Auckland’s urban area. That’s usually regarded as a Good Thing, and Healthy. Auckland Transport’s “walking school bus” program takes advantage of it to get kids more active and to get cars off the roads. The coverage is pretty impressive: in this map by Stephen Davis, the circles show an 800m (half-mile) radius, an area of about 2 km², around each school:


However, as a story at Stuff notes,  if most everywhere in Auckland is close to a school, the schools are going to be close to other establishments.  With a school on most square kilometres of urban land, there will be shops in the square kilometre around most schools selling fast food, or junk food.
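
A rough check of the numbers, as a sketch: it uses only the figures quoted above (about 500 schools, 560 km² of urban area, 800m circles) and knows nothing about where the schools actually are, so the ratio it computes is total circle area over urban area, not the fraction of the city covered.

```python
import math

radius_km = 0.8        # 800m walking distance
schools = 500          # "over 500", so this is a lower bound
urban_area_km2 = 560

circle_area_km2 = math.pi * radius_km ** 2
total_circle_area_km2 = schools * circle_area_km2

# More than 1 means the circles cannot fit in the urban area without overlapping.
coverage_ratio = total_circle_area_km2 / urban_area_km2

print(f"area of one circle: {circle_area_km2:.2f} km2")
print(f"total circle area / urban area: {coverage_ratio:.2f}")
```

The circles add up to well over the whole urban area, so a typical spot in the city is "near a school" in this sense, and so is whatever shop happens to be on it.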

That’s going to be even more true in denser, more walkable cities elsewhere, from Amsterdam to New York.  “Near schools” isn’t a thing in cities. To reduce the number of these shops near schools, you have to reduce them everywhere.

This isn’t to say that all restrictions on fast-food sales are unreasonable, but having lots of things in a relatively small area is hard to avoid in cities. It’s how cities work.

Marking beliefs to market

Back in August, I wrote

Trump’s lead isn’t sampling error. He has an eleven percentage point lead in the poll averages, with sampling error well under one percentage point. That’s better than the National Party has ever managed. It’s better than the Higgs Boson has ever managed.

Even so, no serious commentator thinks Trump will be the Republican candidate. It’s not out of the question that he’d run as an independent — that’s a question of individual psychology, and much harder to answer — but he isn’t going to win the Republican primaries.

Arguably that was true: no serious commentator, as far as I know, did think Trump would be the Republican candidate.  But he is going to win the Republican primaries, and the opinion polls haven’t been all that badly wrong about him — better than the experts.

Māori imprisonment statistics: not just age

Jarrod Gilbert had a piece in the Herald about prisons

Fifty per cent of the prison population is Maori. It’s a fact regularly cited in official documents, and from time to time it garners attention in the media. Given they make up 15 per cent of the population, it’s immediately clear that Maori incarceration is highly disproportionate, but it’s not until the numbers are given a greater examination that a more accurate perspective emerges.

The numbers seem dystopian, yet they very much reflect the realities of many Maori families and neighbourhoods.

to know what he was talking about, qualitatively. I mean, this isn’t David Brooks.

It turns out that while you can’t easily get data on ethnicity by age in the prison population, you can get data on age, and that this is enough to get a good idea of what’s going on, using what epidemiologists call “indirect standardisation”.

Actually, you can’t even easily get data on age, but you can get a graph of age:

and I resorted to software that reconstructs the numbers.

Next, I downloaded Māori population estimates by age and total population estimates by age from StatsNZ, for ages 15-84.  The definition of Māori won’t be exactly the same as in Dr Gilbert’s data. Also, the age groups aren’t quite right because we’d really like the age when the offence happened, not the current age.  The data still should be good enough to see how big the age bias is. In these age groups, 13.2% of the population is Māori by the StatsNZ population estimate definition.

We know what proportion of the prison population is in each age group, and we know what the population proportion of Māori is in each age group, so we can combine these to get the expected proportion of Māori in the prison population accounting for age differences. It’s 14.5%.  Now, 14.5% is higher than 13.2%, so the age-adjustment does make a difference, and in the expected direction, just not a very big difference.

We can also see what happens if we use the Māori population proportion from the next-younger five-year group, to allow for offences being committed further in the past. The expected proportion is then 15.3%, which again is higher than 13.2%, but not by very much. Accounting for age, it looks as though Māori are still more than three times as likely to be in prison as non-Māori.
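
The calculation is just a weighted average, and can be sketched as below. The age groups and proportions in the example are invented to show the mechanics; they are not the reconstructed prison figures or the StatsNZ estimates, and the function name is my own.

```python
def expected_proportion(prison_age_shares, group_share_by_age):
    """Indirect standardisation: the proportion of prisoners expected to belong
    to a group if, within each age band, prisoners looked like the population."""
    if len(prison_age_shares) != len(group_share_by_age):
        raise ValueError("need one value per age band in both lists")
    total = sum(prison_age_shares)
    weighted = sum(p * g for p, g in zip(prison_age_shares, group_share_by_age))
    return weighted / total


# Made-up example with four age bands (youngest first).
prison_age_shares = [0.30, 0.40, 0.20, 0.10]    # share of prisoners in each band
maori_share_by_age = [0.20, 0.15, 0.12, 0.07]   # share of population that is Māori

expected = expected_proportion(prison_age_shares, maori_share_by_age)
print(f"expected proportion: {expected:.1%}")

# Observed over expected, using the figures quoted in the post.
observed = 0.50
for adjusted in (0.145, 0.153):
    print(f"observed / expected at {adjusted:.1%}: {observed / adjusted:.2f}")
```

Prisoners are concentrated in the younger bands, where the Māori share of the population is higher, so the expected proportion comes out above the overall population share. The post's point is that it comes out only a little above.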

You might then say there are lots of other variables to be looked at. But age is special.  If it turned out that Māori incarceration rates could be explained by poverty, that wouldn’t mean their treatment by society was fair, it would suggest that poverty was how it was unfair. If the rates could be explained by education, that wouldn’t mean their treatment by society was fair; it would suggest education was how it was unfair. But if the rates could be explained by age, that would suggest the system was fair. They can’t be.

April 27, 2016

Not just an illusion

There’s a headline in the Independent: “If you think more celebrities are dying young this year, you’re wrong – it’s just a trick of the mind”. And, in a sense, Ben Chu is right. In a much more important sense, he’s wrong.

He argues that there are more celebrities at risk now, which there are. He says a lot of these celebrities are older than we realise, which they are. He says that the number of celebrity deaths this year is within the scope of random variation looking at recent times, which may well be the case. But I don’t think that’s the question.

Usually, I’m taking the other side of this point. When there’s an especially good or especially bad weekend for road crashes, I say that it’s likely just random variation, and not evidence for speeding tolerances or unsafe tourists or breath alcohol levels. That’s because usually the question is whether the underlying process is changing: are the roads getting safer or more dangerous.

This time there isn’t really a serious question of whether karma, global warming, or spiders from Mars are killing off celebrities.  We know it must be a combination of understandable trends and bad luck that’s responsible.  But there really have been more celebrities dying this year.   Prince is really dead. Bowie is really dead. Victoria Wood, Patty Duke, Ronnie Corbett, Alan Rickman, Harper Lee — 2016 has actually happened this way,  it hasn’t been (to steal a line from Daniel Davies) just a particularly inaccurate observation of the underlying population and mortality patterns.

April 24, 2016

On numbers meaning something

From a 2014 interview of Randall Munroe (XKCD) at 538,

We’re always seeing things like, “This canal project will require 1.15 million tons of concrete.” It’s presented as if it should mean something to us, as if numbers are inherently informative. So we feel like if we don’t understand it, it’s our fault.


 …Or is this just easy, space-filling trivia? A good rule of thumb might be, “If I added a zero to this number, would the sentence containing it mean something different to me?” If the answer is “no,” maybe the number has no business being in the sentence in the first place.

via Jenny Bryan


  • An example of bad forms design. 73% of members of the “American Independent Party” in California didn’t realise they were members of a party. They won’t be able to vote in the Democratic primary, though unaffiliated voters will be able to.  This ‘73%’ is also an example of the denominator mattering: the errors are estimated at 73% of AIP members but only 12% of independents.
  • Herald (Daily Mail) headline “Meditation can knock 7 years off age of your brain”. Text: “those who meditate may lead healthier lifestyles in general. It is also possible that some inherent difference in brain structure makes some people more likely to take up meditating. Those studied had practised various types of traditional meditation for an average of 20 years.”
  • Amazon has been distinctive for making the same prices available to rich and poor Americans. But the same-day free delivery service is becoming an exception. Bloomberg looks at why (with graphics) (via Harkanwal Singh)
  • Maps of electorate-level odds for the Australian election, with an interesting attempt to solve the problem of a continent made up mostly of empty space
  • A data proofreading app designed for data journalists (via Kristin Henry)
April 20, 2016

Housing affordability graphics

Another nice Herald interactive, this time of housing affordability.


Affordability comes in two parts: down payment and monthly mortgage costs. The affordability index from Massey University looks at monthly payments; this one looks at the 20% down payment.

The difference between Auckland and the rest of the country is pretty dramatic, but there are other things to see. Above, the centre of Auckland is much less expensive than the rest of the city: 75% of properties are valued at under $500,000 by CoreLogic.  That’s the apartments, but they mostly aren’t the sort of apartments people are planning to stay in long-term.

Another interesting feature for Auckland is that the neighbourhoods really are ordered in price — you don’t see the spatial trends changing as you move the slider, so there aren’t areas where the low-end houses are especially cheap and the high-end houses especially expensive.

You can also see the difficulty of relating valuations to prices. In Point Chev, the valuations say 70% of homes are valued at over $1 million. On the other hand, the median sale price is $990,000, so less than half the homes that changed hands went for over a million.


Both those numbers are correct. Well, ok,  I assume they are both correct; they are both what they are supposed to be.  It’s just that home sales aren’t a random sample of all homes.  But if the median sale price is $990k and the median valuation for all homes is $1.2m, you can see that interpreting these numbers is harder than it looks.