Posts filed under Just look it up (267)

October 18, 2016

The lack of change is the real story

The Chief Coroner has released provisional suicide statistics for the year to June 2016. As I wrote last year, the rate of suicide in New Zealand is basically not changing. The Herald’s story, by Martin Johnston, quotes the Chief Coroner on this point:

“Judge Marshall interpreted the suicide death rate as having remained consistent and said it showed New Zealand still had a long way to go in turning around the unacceptably high toll of suicide.”

The headline and graphs don’t make this clear.

Here’s the graph from the Herald:


If you want a bar graph, it should go down to zero, and it would then show how little is changing:


I’d prefer a line graph showing the expected variation if there were no underlying change: the shading is one and two standard deviations around the average of the nine years’ rates.


As Judge Marshall says, the suicide death rate has remained consistent. That’s our problem.  Focusing on the year to year variation misses the key point.
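For anyone who wants to reproduce that kind of band, here’s a minimal sketch in Python. The rates below are invented placeholders, not the actual coronial figures:

```python
# Illustrative rates per 100,000 for nine years (made up, not the real data).
rates = [12.2, 11.9, 12.6, 11.3, 12.0, 11.6, 12.4, 11.8, 12.1]

n = len(rates)
mean = sum(rates) / n
# Sample standard deviation of the nine annual rates.
sd = (sum((r - mean) ** 2 for r in rates) / (n - 1)) ** 0.5

# Shaded bands: one and two standard deviations around the average.
band1 = (mean - sd, mean + sd)
band2 = (mean - 2 * sd, mean + 2 * sd)
```

If the underlying rate really isn’t changing, most years should fall inside the one-SD band and almost all inside the two-SD band, so year-to-year wobbles of that size are exactly what you’d expect.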

September 1, 2016

Transport numbers

Auckland Transport released new patronage data, and FigureNZ tidied it up to make it easily computer-readable, so I thought I’d look at some of it.  What I’m going to show is a decomposition of the data into overall trends, seasonal variation, and random stuff just happening. As usual, click to embiggen the pictures.

First, the trends: rides are up.


It’s hard to see the trend in ferry use, so here’s a version on a log scale, meaning that the same proportional trend would look the same for all three modes of transport:


Train use is increasing (relatively) faster than bus or ferry use.  There’s also an interesting bump in the middle that we’ll get back to.

Now, the seasonal patterns. Again, these are on a logarithmic scale, so they show relative variation:


The clearest signal is that ferry use peaks in summer, when the other modes are at their minimum. Also, the Christmas minimum is a bit lower for trains: to see this, we can combine the two graphs:


It’s not surprising that train use falls by more: they turn the trains off for a lot of the holiday period.

Finally, what’s left when you subtract the seasonal and trend components:


The highest extra variation in both train and ferry rides was in September and October 2011: the Rugby World Cup.
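For readers who want the mechanics, here’s a minimal sketch of this kind of trend/seasonal/residual decomposition on a log scale. The monthly ride counts are invented, and a real analysis would use a proper seasonal decomposition routine; this just shows the idea:

```python
import math

# Invented monthly ride counts: 2% growth per month, with a bump every January.
months = 36
rides = [1000 * (1.02 ** t) * (1.2 if t % 12 == 0 else 1.0) for t in range(months)]
log_rides = [math.log(r) for r in rides]

# Trend: centred 12-month moving average of the log series.
trend = [None] * months
for t in range(6, months - 6):
    trend[t] = sum(log_rides[t - 6 : t + 6]) / 12

# Seasonal: average detrended value for each calendar month.
detrended = [(t % 12, log_rides[t] - trend[t])
             for t in range(months) if trend[t] is not None]
seasonal = {m: sum(v for mm, v in detrended if mm == m)
               / max(1, sum(1 for mm, _ in detrended if mm == m))
            for m in range(12)}

# Residual: whatever is left after removing trend and seasonal components.
residual = [log_rides[t] - trend[t] - seasonal[t % 12]
            for t in range(months) if trend[t] is not None]
```

Because the series is on a log scale, the seasonal component is a proportional effect: here the January bump shows up as about log(1.2) in the seasonal term, and the residual is essentially zero because the fake data has no random component.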


August 17, 2016

Official statistics

There has been some controversy about changes to how unemployment is computed in the Household Labour Force Survey. As StatsNZ had explained, the changes would be back-dated to March 2007, to allow for comparisons.  However, from Stuff earlier this week:

In a media release Robertson, Labour’s finance spokesman, said National was “actively massaging official unemployment statistics” by changing the measure for joblessness to exclude those using websites, such as Seek or TradeMe.

Robertson was referring to the Household Labour Force Survey, due to be released on Wednesday, which he says would “almost certainly show a decrease in unemployment” as a result of the Government “manipulating official data to suit its own needs”.

Mr Robertson has since withdrawn this claim, and is now saying:

“I accept the Chief Statistician’s assurances on the reason for the change in criteria but New Zealanders need to be aware that National Ministers have a track record of misusing and misrepresenting statistics.”

That’s a reasonable position — and some of the examples have appeared on StatsChat — but I don’t think the stories in the media have made it clear how serious the original accusation was (even if perhaps unintentionally).

Official statistics such as the unemployment estimates are politically sensitive, and it’s obvious why governments would want to change them. Argentina, famously, did this to their inflation estimates. As a result, no-one believed Argentinian economic data, which gets expensive when you’re trying to borrow money. For that reason, sensible countries structure their official statistics agencies to minimise political influence, and maximise independence.  New Zealand does have a first-world official statistics system — unlike many countries with similar economic resources — and it’s a valuable asset that can’t be taken for granted.

The system is set up so the Government shouldn’t have the ability to “actively massage” official unemployment statistics for minor political gain. If they did, well, ok, it was hyperbole when I said on Twitter ‘we’d need to go through StatsNZ with fire and the sword’, but the Government Statistician wouldn’t be the only one who’d need replacing.

August 4, 2016

Garbage numbers

This appeared on Twitter


Now, I could just about believe NZ was near the bottom of the OECD, but to accept zero recycling and composting is a big ask.  Even if some of the recycling ends up in landfill, surely not all of it does.  And the garden waste people don’t charge enough to be putting all my wisteria clippings into landfill.

So, I looked up the source. It says to see the Annex Notes. Here’s the note for New Zealand:

New Zealand: Data refer to amount going to landfill

The data point for New Zealand is zero by definition — they aren’t counting any of the recycling and composting.

When the most you can hope for is that the lies in the graph will be explained in the footnotes, you need to read the footnotes.


May 26, 2016

Budget visualisations

This will likely be updated as I find them

  1. From Keith Ng. Budget now and over time. This gets special mention for being inflation-adjusted (it’s in 2014 dollars). Doesn’t work on my phone, but works well on a small laptop screen
  2. NZ Herald. Works (though hard to read) on a mobile. Still hard to read on a small laptop screen, but attractive on a large screen. I still have reservations about the bubbles.
  3. Stuff has a set of charts. The surplus/deficit one is nicely clear, though there’s nothing about the financial crisis/recession as an explanation for a lot of it.
  4. The government has interactive charts of Core Crown Revenue, Core Crown Expenditure, and breakdown for a taxpayer. On the last one, they lose points for displaying just income tax, when the Treasury are about the only people who could easily do better.
May 7, 2016

Open data: baby names

The Herald has a headline “Emma and Noah continue to be tops for baby names”, with this link from the web front page:


In fact, Noah was number 11 as a baby boy’s name, and Emma didn’t make the top hundred names for baby girls in New Zealand.  The top names in NZ, as in this Stuff story from the first week of January, were Oliver and Olivia. That story also had tables and graphs from the Dept of Internal Affairs data.

The new Herald story is about the USA, where they take longer to accumulate and release the baby-name data, but where they have the indefatigable Laura Wattenberg to make sure it gets publicised.

In fact, it’s kind of surprising how much difference there is between the US and NZ lists. Enough to make it worth pointing out in the story.  UK data won’t be out for another few months. Based on last year, it’s a bit more similar to NZ. Maybe we’ll get another story then.


April 29, 2016

Looking up the index


Q: Did you hear that Auckland housing affordability is better now than when the government came to office?

A: No. Surely not.

Q: That’s what Nick Smith says: listen, it’s at 4:38. Is it true?

A: Up to a point.

Q: Up to what point?

A:  As he says, the Massey University Housing Affordability Index for February 2016 is lower than it was for November 2008, for Auckland and everywhere else in the country. For Auckland it was 38.44 then and is 33.8 now.

Q: But The Spinoff says one of the people behind the Index says Nick Smith is wrong, that housing isn’t more affordable than it was then.

A: Indeed she does. That’s because housing isn’t more affordable.

Q: But you said the index was lower?

A: Yes, it is.

Q: And lower is supposed to be better?

A: Yes.

Q: But how can the Housing Affordability Index be lower when housing isn’t more affordable? What is the index?

A: If it’s the same as it was in 2006 (which would make sense), it’s the median selling price multiplied by a weighted-average interest rate and divided by mean individual weekly earnings.

Q: Can you translate that?

A: Roughly,  the number of weeks of average earnings you’d need to pay the first year’s interest on a 100% mortgage.

Q: So if it’s 34, and you’ve got two people making the average, it’s 17 weeks each out of 52 going to mortgage interest? About 32% of income?

A: That’s right, only you don’t get 100% mortgages, so it’s more like 26% of income. And there’s taxes and insurance and you actually pay off a bit of the principal even in the first year, so it’s more complicated. But it’s a simple summary of the interest cost.

Q: And that’s lower now than in November 2008?

A: So it seems. I wasn’t living in New Zealand then, but it looks like mortgage interest rates were near 9%. The combination of the increase in incomes and the fall in interest rates has been slightly more than the increase in house prices, even in Auckland.

Q: But what if rates go back up?

A: Then a lot of houses will retroactively become much less affordable.

Q: And what about saving for down payments? That’s what all the snake people have been complaining about, and low interest rates don’t help there.

A: Down payments don’t go into the affordability index.

Q: But they go into actual affordability!

A: Which is presumably why the Minister was talking about the affordability index.
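The arithmetic in the Q&A above is easy to check directly. These inputs are invented placeholders chosen to give an index near 34, not the actual Massey figures:

```python
# Back-of-envelope version of the index arithmetic (all inputs made up).
median_price = 800_000     # median selling price, NZD
interest_rate = 0.049      # weighted-average mortgage interest rate
weekly_earnings = 1_150    # mean individual weekly earnings, NZD

# Weeks of average earnings needed to pay the first year's interest
# on a 100% mortgage.
index = median_price * interest_rate / weekly_earnings

# Two average earners, 100% mortgage: share of joint income going to interest.
share_100 = index / (2 * 52)

# More realistic 80% mortgage.
share_80 = 0.8 * share_100
```

With these inputs the index comes out near 34, the interest share of a two-earner household’s income is about 33% for a 100% mortgage, and about 26% at 80%, matching the rough figures in the dialogue.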


April 28, 2016

Māori imprisonment statistics: not just age

Jarrod Gilbert had a piece in the Herald about prisons:

Fifty per cent of the prison population is Maori. It’s a fact regularly cited in official documents, and from time to time it garners attention in the media. Given they make up 15 per cent of the population, it’s immediately clear that Maori incarceration is highly disproportionate, but it’s not until the numbers are given a greater examination that a more accurate perspective emerges.

The numbers seem dystopian, yet they very much reflect the realities of many Maori families and neighbourhoods.

Dr Gilbert researches crime for a living, so you’d expect him to know what he was talking about, qualitatively. I mean, this isn’t David Brooks.

It turns out that while you can’t easily get data on ethnicity by age in the prison population, you can get data on age, and that this is enough to get a good idea of what’s going on, using what epidemiologists call “indirect standardisation”.

Actually, you can’t even easily get data on age, but you can get a graph of age:

so I resorted to software that reconstructs the numbers from it.

Next, I downloaded Māori population estimates by age and total population estimates by age from StatsNZ, for ages 15-84.  The definition of Māori won’t be exactly the same as in Dr Gilbert’s data. Also, the age groups aren’t quite right because we’d really like the age when the offence happened, not the current age.  The data still should be good enough to see how big the age bias is. In these age groups, 13.2% of the population is Māori by the StatsNZ population estimate definition.

We know what proportion of the prison population is in each age group, and we know what the population proportion of Māori is in each age group, so we can combine these to get the expected proportion of Māori in the prison population accounting for age differences. It’s 14.5%.  Now, 14.5% is higher than 13.2%, so the age-adjustment does make a difference, and in the expected direction, just not a very big difference.

We can also see what happens if we use the Māori population proportion from the next-younger five-year group, to allow for offences being committed further in the past. The expected proportion is then 15.3%, which again is higher than 13.2%, but not by very much. Accounting for age, it looks as though Māori are still more than three times as likely to be in prison as non-Māori.
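The indirect standardisation here is just a weighted sum. Here’s a sketch with invented age bands and proportions (the real calculation used the reconstructed prison-age distribution and StatsNZ population estimates):

```python
# Share of the prison population in each age band (made-up numbers).
prison_age_share = {"15-24": 0.20, "25-34": 0.35, "35-44": 0.25,
                    "45-64": 0.17, "65-84": 0.03}

# Māori share of the general population in the same bands (made up,
# but with the real qualitative pattern: younger bands have higher shares).
maori_share = {"15-24": 0.20, "25-34": 0.16, "35-44": 0.14,
               "45-64": 0.11, "65-84": 0.06}

# Expected Māori proportion of prisoners if imprisonment depended only on age:
# weight each age band's Māori share by that band's share of the prison population.
expected = sum(prison_age_share[a] * maori_share[a] for a in prison_age_share)

# Compare the observed 50% with the age-based expectation.
observed = 0.50
ratio = observed / expected
```

The point of the post is exactly this comparison: the age-adjusted expectation moves only a little away from the crude population share, so the observed/expected ratio stays above three.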

You might then say there are lots of other variables to be looked at. But age is special.  If it turned out that Māori incarceration rates could be explained by poverty, that wouldn’t mean their treatment by society was fair, it would suggest that poverty was how it was unfair. If the rates could be explained by education, that wouldn’t mean their treatment by society was fair; it would suggest education was how it was unfair. But if the rates could be explained by age, that would suggest the system was fair. They can’t be.

April 17, 2016

Overcounting causes

There’s a long story in the Sunday Star-Times about a 2007 report on cannabis from the National Drug Intelligence Bureau (NDIB):

“Perhaps surprisingly,” Maxwell wrote, “cannabis related hospital admissions between 2001 and 2005 exceeded admissions for opiates, amphetamines and cocaine combined”, with about 2000 people a year ending up in hospital because of the drug.

The problem was with hospital diagnostic codes. Discharge summaries include both the primary cause of admission and a lot of other things to be noted. That’s a good thing — you want to know what all was wrong with a patient both for future clinical care and for research and quality control.  For example, if someone is in hospital for bleeding, you want to know they were on warfarin (which is why the bleeding happened), and perhaps why they were on warfarin. It’s not even always the case that the primary cause is the primary cause — if someone has Parkinson’s Disease and is admitted with pneumonia as a complication, which one should be listed? This is a difficult and complex field, and is even slightly less boring than it sounds.

As a result, if you just count up all the discharge summaries where ‘cannabis dependence’ was somewhere on the laundry list of codes, you’re going to get a lot of people who smoke pot but are in hospital for some completely different reason.  And since there’s a lot of cannabis consumption out there, you will get a lot of these false positives.
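A toy version of the over-counting, with entirely fictitious discharge records:

```python
# Each discharge lists a primary diagnosis plus secondary codes.
# Counting a code anywhere in the record inflates the total relative
# to counting it only as the primary cause of admission.
discharges = [
    {"primary": "appendicitis", "secondary": ["cannabis dependence"]},
    {"primary": "cannabis dependence", "secondary": []},
    {"primary": "fracture", "secondary": ["cannabis dependence", "asthma"]},
    {"primary": "pneumonia", "secondary": []},
]

code = "cannabis dependence"
any_mention = sum(1 for d in discharges
                  if d["primary"] == code or code in d["secondary"])
primary_only = sum(1 for d in discharges if d["primary"] == code)
```

Here counting any mention gives three admissions against one genuinely admitted for the code; with a common exposure like cannabis, real hospital data inflates in just this way.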

There are some other things to note about this report, though. The National Drug Foundation says (on Twitter) that they made the same point when it first came out. They also claim that the Ministry of Health argued against its being published.

Perhaps now the multiple-counting problem has been publicised in the context of hospital admissions the same mistake will be made less often for road crashes, where multiple factors from foreign drivers to speed to alcohol to drugs are repeatedly counted up as ‘the’ cause of any crash where they are present.

April 11, 2016

Missing data

Sometimes…often…practically always… when you get a data set there are missing values. You need to decide what to do with them. There’s a mathematical result that basically says there’s no reliable strategy, but different approaches may still be less completely useless in different settings.

One tempting but usually bad approach is to replace them with the average, and it’s especially bad with geographical data. We’ve seen this go badly wrong with kidnappings in Nigeria, we’ve seen maps of vaccine-preventable illness at epidemic proportions in the west Australian desert, and we’ve seen Kansas misidentified as the porn centre of the United States.
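A tiny illustration of why mean-filling coordinates fails. The three cities are real; everything else is contrived:

```python
# Approximate (latitude, longitude) for three cities.
cities = {
    "Auckland":   (-36.85, 174.76),
    "Wellington": (-41.29, 174.78),
    "Perth":      (-31.95, 115.86),
}

# "Impute" an unknown location as the average of the known ones.
lat = sum(p[0] for p in cities.values()) / len(cities)
lon = sum(p[1] for p in cities.values()) / len(cities)
# The averaged point lands in the Tasman Sea, far from all three cities.
```

The averaged coordinate is a place nobody lives, which is exactly how a default “don’t know” location ends up pinning events to an empty desert or one unlucky farm.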

The data problem that attributed porn to Kansas has more serious consequences. There’s a farm not far from Wichita that, according to the major database providing this information, has 600 million IP addresses.  Now think of the reasons why someone might need to look up the physical location of an internet address. Kashmir Hill, at Fusion, looks at the consequences, and at how a better “don’t know” address is being chosen.