Posts filed under Just look it up (262)

May 7, 2016

Open data: baby names

The Herald has a headline “Emma and Noah continue to be tops for baby names”, with this link from the web front page

baby

In fact, Noah was number 11 as a baby boy’s name, and Emma didn’t make the top hundred names for baby girls in New Zealand.  The top names in NZ, as in this Stuff story from the first week of January, were Oliver and Olivia. That story also had tables and graphs from the Dept of Internal Affairs data.

The new Herald story is about the USA, where they take longer to accumulate and release the baby-name data, but where they have the indefatigable Laura Wattenberg to make sure it gets publicised.

In fact, it’s kind of surprising how much difference there is between the US and NZ lists. Enough to make it worth pointing out in the story.  UK data won’t be out for another few months. Based on last year, it’s a bit more similar to NZ. Maybe we’ll get another story then.

 

April 29, 2016

Looking up the index

 

Q: Did you hear that Auckland housing affordability is better now than when the government came to office?

A: No. Surely not.

Q: That’s what Nick Smith says: listen, it’s at 4:38. Is it true?

A: Up to a point.

Q: Up to what point?

A:  As he says, the Massey University Housing Affordability Index for February 2016 is lower than it was for November 2008, for Auckland and everywhere else in the country. For Auckland it was 38.44 then and is 33.8 now.

Q: But The Spinoff says one of the people behind the Index says Nick Smith is wrong, that housing isn’t more affordable than it was then.

A: Indeed she does. That’s because housing isn’t more affordable.

Q: But you said the index was lower?

A: Yes, it is.

Q: And lower is supposed to be better?

A: Yes.

Q: But how can the Housing Affordability Index be lower when housing isn’t more affordable? What is the index?

A: If it’s the same as it was in 2006 (which would make sense) it’s median selling price multiplied by a weighted-average interest rate and divided by the mean individual weekly earnings.

Q: Can you translate that?

A: Roughly,  the number of weeks of average earnings you’d need to pay the first year’s interest on a 100% mortgage.

Q: So if it’s 34, and you’ve got two people making the average, it’s 17 weeks each out of 52 going to mortgage interest? About 32% of income?

A: That’s right, only you don’t get 100% mortgages, so it’s more like 26% of income. And there’s taxes and insurance and you actually pay off a bit of the principal even in the first year, so it’s more complicated. But it’s a simple summary of the interest cost.
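For anyone who wants to check the arithmetic in this exchange, here is a minimal sketch. The function names are mine, the price, rate and earnings fed to the index are made-up round numbers, and the 80% loan is an assumption: it is simply the loan-to-value ratio that turns about 32% into about 26%.

```python
WEEKS_PER_YEAR = 52


def affordability_index(median_price, interest_rate, mean_weekly_earnings):
    """Weeks of average individual earnings needed to pay one year's
    interest on a 100% mortgage at the median selling price."""
    annual_interest = median_price * interest_rate
    return annual_interest / mean_weekly_earnings


def interest_share_of_income(index_weeks, earners=2, loan_to_value=1.0):
    """Fraction of household income going to first-year mortgage interest."""
    weeks_per_earner = index_weeks * loan_to_value / earners
    return weeks_per_earner / WEEKS_PER_YEAR


# HYPOTHETICAL inputs, chosen only to give a round index value.
example_index = affordability_index(700_000, 0.05, 1_000)

# The worked example from the conversation: index of 34, two average earners.
full_mortgage = interest_share_of_income(34)
with_deposit = interest_share_of_income(34, loan_to_value=0.8)  # assumed 80% loan

print(f"index: {example_index:.1f} weeks")
print(f"100% mortgage: {full_mortgage:.1%} of income")
print(f"80% mortgage: {with_deposit:.1%} of income")
```

Down payments, taxes, insurance and principal repayments are all left out, just as they are left out of the index.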

Q: And that’s lower now than in November 2008?

A: So it seems. I wasn’t living in New Zealand then, but it looks like mortgage interest rates were near 9%. The combination of the increase in incomes and the fall in interest rates has been slightly more than the increase in house prices, even in Auckland.

Q: But what if rates go back up?

A: Then a lot of houses will retroactively become much less affordable.

Q: And what about saving for down payments? That’s what all the snake people have been complaining about, and low interest rates don’t help there.

A: Down payments don’t go into the affordability index.

Q: But they go into actual affordability!

A: Which is presumably why the Minister was talking about the affordability index.

 

April 28, 2016

Māori imprisonment statistics: not just age

Jarrod Gilbert had a piece in the Herald about prisons

Fifty per cent of the prison population is Maori. It’s a fact regularly cited in official documents, and from time to time it garners attention in the media. Given they make up 15 per cent of the population, it’s immediately clear that Maori incarceration is highly disproportionate, but it’s not until the numbers are given a greater examination that a more accurate perspective emerges.

The numbers seem dystopian, yet they very much reflect the realities of many Maori families and neighbourhoods.

to know what he was talking about, qualitatively. I mean, this isn’t David Brooks.

It turns out that while you can’t easily get data on ethnicity by age in the prison population, you can get data on age, and that this is enough to get a good idea of what’s going on, using what epidemiologists call “indirect standardisation”.

Actually, you can’t even easily get data on age, but you can get a graph of age:
ps_ages_3_16

and I resorted to software that reconstructs the numbers.

Next, I downloaded Māori population estimates by age and total population estimates by age from StatsNZ, for ages 15-84.  The definition of Māori won’t be exactly the same as in Dr Gilbert’s data. Also, the age groups aren’t quite right because we’d really like the age when the offence happened, not the current age.  The data still should be good enough to see how big the age bias is. In these age groups, 13.2% of the population is Māori by the StatsNZ population estimate definition.

We know what proportion of the prison population is in each age group, and we know what the population proportion of Māori is in each age group, so we can combine these to get the expected proportion of Māori in the prison population accounting for age differences. It’s 14.5%.  Now, 14.5% is higher than 13.2%, so the age-adjustment does make a difference, and in the expected direction, just not a very big difference.
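The calculation is just a weighted average: weight the Māori population proportion in each age group by the share of prisoners in that age group. Here is a minimal sketch. The age groups and all the numbers in it are invented for illustration (they are not the reconstructed prison data or the StatsNZ estimates), so the result will not match the 14.5% above.

```python
def expected_share(prison_age_shares, group_share_by_age):
    """Indirect standardisation: expected proportion of a group among
    prisoners if, within each age group, imprisonment did not depend on
    group membership."""
    total = sum(prison_age_shares.values())
    return sum(
        prison_age_shares[age] / total * group_share_by_age[age]
        for age in prison_age_shares
    )


# HYPOTHETICAL inputs, for illustration only.
prison_age_shares = {"15-24": 0.20, "25-39": 0.45, "40-59": 0.30, "60-84": 0.05}
maori_share_by_age = {"15-24": 0.20, "25-39": 0.15, "40-59": 0.11, "60-84": 0.06}

expected = expected_share(prison_age_shares, maori_share_by_age)
print(f"expected share accounting for age: {expected:.1%}")
```

If the population proportion were the same in every age group, the expected share would equal that proportion whatever the prison age distribution, which is a useful sanity check.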

We can also see what happens if we use the Māori population proportion from the next-younger five-year group, to allow for offences being committed further in the past. The expected proportion is then 15.3%, which again is higher than 13.2%, but not by very much. Accounting for age, it looks as though Māori are still more than three times as likely to be in prison as non-Māori.
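One simple way to read “more than three times” is as the ratio of the observed share (the 50% quoted from Dr Gilbert) to the age-adjusted expected share. This is a sketch of that reading using only the figures above. It is not necessarily the exact comparison behind the sentence.

```python
observed_share = 0.50  # proportion of the prison population that is Māori

# Expected shares from the two age adjustments described above.
expected_shares = {
    "same age group": 0.145,
    "next-younger age group": 0.153,
}

ratios = {label: observed_share / e for label, e in expected_shares.items()}

for label, ratio in ratios.items():
    print(f"{label}: observed / expected = {ratio:.2f}")
```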

You might then say there are lots of other variables to be looked at. But age is special.  If it turned out that Māori incarceration rates could be explained by poverty, that wouldn’t mean their treatment by society was fair, it would suggest that poverty was how it was unfair. If the rates could be explained by education, that wouldn’t mean their treatment by society was fair; it would suggest education was how it was unfair. But if the rates could be explained by age, that would suggest the system was fair. They can’t be.

April 17, 2016

Overcounting causes

There’s a long story in the Sunday Star-Times about a 2007 report on cannabis from the National Drug Intelligence Bureau (NDIB)

“Perhaps surprisingly,” Maxwell wrote, “cannabis related hospital admissions between 2001 and 2005 exceeded admissions for opiates, amphetamines and cocaine combined”, with about 2000 people a year ending up in hospital because of the drug.

The problem was with hospital diagnostic codes. Discharge summaries include both the primary cause of admission and a lot of other things to be noted. That’s a good thing — you want to know what all was wrong with a patient both for future clinical care and for research and quality control.  For example, if someone is in hospital for bleeding, you want to know they were on warfarin (which is why the bleeding happened), and perhaps why they were on warfarin. It’s not even always the case that the primary cause is the primary cause — if someone has Parkinson’s Disease and is admitted with pneumonia as a complication, which one should be listed? This is a difficult and complex field, and is even slightly less boring than it sounds.

As a result, if you just count up all the discharge summaries where ‘cannabis dependence’ was somewhere on the laundry list of codes, you’re going to get a lot of people who smoke pot but are in hospital for some completely different reason.  And since there’s a lot of cannabis consumption out there, you will get a lot of these false positives.

There are some other things to note about this report, though. The National Drug Foundation says (on Twitter) that they made the same point when it first came out. They also claim


that the Ministry of Health argued against its being published.

Perhaps now the multiple-counting problem has been publicised in the context of hospital admissions, the same mistake will be made less often for road crashes, where multiple factors from foreign drivers to speed to alcohol to drugs are repeatedly counted up as ‘the’ cause of any crash where they are present.

April 11, 2016

Missing data

Sometimes…often…practically always… when you get a data set there are missing values. You need to decide what to do with them. There’s a mathematical result that basically says there’s no reliable strategy, but different approaches may still be less completely useless in different settings.

One tempting but usually bad approach is to replace them with the average — it’s especially bad with geographical data.  We’ve seen fivethirtyeight.com get this badly wrong with kidnappings in Nigeria, we’ve seen maps of vaccine-preventable illness at epidemic proportions in the west Australian desert, we’ve seen Kansas misidentified as the porn centre of the United States.

The data problem that attributed porn to Kansas has more serious consequences. There’s a farm not far from Wichita that, according to the major database providing this information, has 600 million IP addresses.  Now think of the reasons why someone might need to look up the physical location of an internet address. Kashmir Hill, at Fusion, looks at the consequences, and at how a better “don’t know” address is being chosen.
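The averaging problem is easy to reproduce. In this minimal sketch the coordinates are made up and the variable names are mine. Filling in unknown locations with the average of the known ones creates a “hotspot” at a place where nothing was actually recorded.

```python
from collections import Counter
from statistics import mean

# HYPOTHETICAL records: (latitude, longitude), or None where location is unknown.
records = [(-36.8, 174.8), None, (-41.3, 174.8), None, None, (-43.5, 172.6)]

known = [r for r in records if r is not None]
centre = (mean(lat for lat, _ in known), mean(lon for _, lon in known))

# The tempting approach: replace every missing location with the average.
imputed = [r if r is not None else centre for r in records]
hotspot, hotspot_count = Counter(imputed).most_common(1)[0]
print(f"busiest location after imputation: {hotspot} ({hotspot_count} records)")

# An alternative: keep the missing values and report them as unknown.
missing_count = sum(r is None for r in records)
print(f"{len(known)} located records, {missing_count} with unknown location")
```

Counting the unknowns separately does not recover the missing information, but it does avoid inventing a location for them.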

March 22, 2016

Counting sheep

From the Guardian (slightly outside our usual beat, but noted by Robin Evans on Twitter)

The UK is the world’s third largest lamb exporter – after Australia and New Zealand – with just over a third of the market.

That can’t be true. Even if Australia and New Zealand and the UK were the only exporters, the UK being in third place would mean it had to have less than a third of the market.  The (UK) Agriculture & Horticulture Development Board (PDF) thinks it’s about 9% — yes, that’s not just lamb, but lamb makes up most of the NZ and Oz exports.
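The argument is that everyone ranked above you has at least your share, so the exporter in place k can have at most 1/k of the market. Here is a small sketch of that bound. The function names are mine, and 34% is just an illustrative stand-in for “just over a third”.

```python
from fractions import Fraction


def max_share_at_rank(rank):
    """Largest possible market share for the exporter ranked `rank`.

    Each of the `rank` largest exporters has at least this exporter's
    share, so rank * share <= 1.
    """
    return Fraction(1, rank)


def share_is_possible(share, rank):
    """True if an exporter in place `rank` could hold this share."""
    return share <= max_share_at_rank(rank)


just_over_a_third = Fraction(34, 100)  # illustrative stand-in
ahdb_estimate = Fraction(9, 100)       # the roughly 9% quoted above

print(share_is_possible(just_over_a_third, rank=3))
print(share_is_possible(ahdb_estimate, rank=3))
```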

sheep

I’m not sure what the ‘just over a third’ really is. It might be the proportion of UK-raised lamb that is exported.

It’s also interesting to see the Guardian slant on the story: that supermarkets should refuse to stock any imported lamb at this time of the year and insist on English lamb raised indoors, out of season.

 

March 3, 2016

Soft drink doses

From the Herald today

Coca-Cola would prefer to see more people drinking less of its products rather than a few people drinking a lot. So one can a week is quite alright, according to the folks from Coke.

So, how does that compare to current consumption? We don’t know specifically for Coca-Cola, but Stuff gave figures a year ago for fizzy soft drinks

New Zealanders drank just under 73 litres of carbonated drinks each in 2014 – a fraction lower than Australia where the per-capita consumption sat just under 75 litres.

The figure excludes sports drinks, tea and coffee, and other soft drinks. At 73 litres a year, that breaks down to nearly four cans a week, averaged over the whole population. Averaged over just those who drink carbonated soft drinks it’s obviously going to be more.
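The conversion from litres a year to cans a week depends on the can size, which the quoted story does not give. This sketch tries two common sizes. Both sizes are assumptions, and “nearly four” corresponds to the larger one.

```python
WEEKS_PER_YEAR = 52


def cans_per_week(litres_per_year, can_ml):
    """Average number of cans per week for a given annual volume."""
    return litres_per_year * 1000 / can_ml / WEEKS_PER_YEAR


# Assumed can sizes in millilitres; 73 litres is the per-capita figure quoted above.
for can_ml in (330, 355):
    print(f"{can_ml} mL cans: {cans_per_week(73, can_ml):.2f} per week")
```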

Coca-Cola Amatil would probably be happy if people who don’t currently drink Coke started drinking a can a week, or if people switched to Coke from L&P, Fanta, or Six Barrel Soda Celery Tonic, but if everyone who drinks fizzy soft drinks regularly were to cut down to one can a week, the market would shrink a lot.

February 24, 2016

Home ownership comparisons

Two graphs to help people on Twitter who are arguing about home ownership trends in Auckland vs rest of NZ or in generational differences.

Both are percentages of home ownership based on the census question “Do you own or partly own your home?”, with data from the last three censuses.

First, comparisons between Auckland and the Rest of NZ by age, over time. Blue is Auckland, pink is RONZ

tenure-1

Second, trends over 12 years, by age, for three census years. Blue is 2001, pink is 2006, green is 2013.

tenure-2

Data from the nzdotstat table “Tenure holder by age group and sex, for the census usually resident population count aged 15 years and over, 2001, 2006 and 2013 Censuses (RC, TA, AU)”

 

Update: And one more. Here the lines connect roughly the same group of people (birth cohort) over time (only approximately because the planned 2011 census didn’t happen until 2013).

tenure-3

February 23, 2016

Population density: drawing the lines

David Seymour, on the Herald website

 Auckland is already denser than New York, and most American and Australian cities.  The 1.6 million people in Manhattan may live cheek-by-jowl, but not the other 20 million inhabiting the wider urban area.

An intelligent politician wouldn’t say something as apparently bizarre as this first sentence if it wasn’t true, so of course it is. The question is going to be: true in what sense?

Based on the population figure, Mr Seymour is talking about the New York Metropolitan Statistical Area, aka, New York Urban Area,  which has a population of 20.1 million and a population density of 724/km2[*]. The Auckland Urban Area has a population of 1.45 million and a density of 2,600/km2, and, yes, 2600 is larger than 724. However, as the scenic photos in the Wikipedia page for the New York Metropolitan Area suggest, that might not be a fair comparison.

In fact, it’s true almost by definition that the New York metropolitan area has a lower density than urban Auckland:

Urban areas in the United States are defined by the U.S. Census Bureau as contiguous census block groups with a population density of at least 1,000/sq mi (390/km2) with any census block groups around this core having a density of at least 500/sq mi (190/km2).  [Wikipedia, or see full legal definition]

That is, the metropolitan area is defined as the area around New York City all the way out until the local population density is below 190/km2. It’s a sensible statistical unit (the US Census Bureau wasn’t trying to make a political point about urban infill when they defined it), but it’s not the same sort of unit as Stats New Zealand’s definition of urban or metro Auckland.

So, what other comparisons could we do? We could compare the New York Metropolitan Area to the Auckland Supercity, whose population density of 320/km2 is less than half as high. That might be unfair in the other direction — the Supercity is designed with the future expansion of Auckland in mind, while the US definitions are only intended for a ten-year period between censuses.

We can’t quite do the perfect comparison of redrawing Auckland Urban Area by the US rules, because NZ Area Units are bigger than US Census Block Groups, and NZ meshblocks are smaller, but someone with more time than me could try.

We could compare the Auckland urban area to genuinely urban parts of  the New York metro:  Mr Seymour mentioned Manhattan (density 27,673/km2, three times that of the Auckland CBD, nine times that of the Epsom electorate) but the other four boroughs of New York City all have higher density than urban Auckland. Two of them (the Bronx, and Brooklyn) have higher density than the Auckland CBD, Queens (8237/km2) is closer in density to the Auckland CBD than to the rest of Auckland, and even Staten Island is denser than urban Auckland as a whole. In the metropolitan area but across the river from New York City proper we have Hudson County (density 5,241/km2) and Newark (density about 4500/km2). The whole of Long Island, part of the New York metropolitan area, but also known for places like Fire Island and the Hamptons, has population density 2,151/km2, not far below urban Auckland.

And finally, an alternative way to do this whole comparison, which is much less sensitive to where the lines are drawn, is to look at population-weighted densities. That is, for the average person in a city, how dense is the population near them? For the whole New York metropolitan area the population-weighted density is 12000/km2 (or 120/hectare). For Auckland it is 43/hectare. In other words, while people near the edges of the New York metro area have a lot of space, most New Yorkers don’t. The average person in the broad New York metropolitan area sees three times the local population density of the average Aucklander.
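Here is a minimal sketch of why the population-weighted version is less sensitive to where the lines are drawn. The two-zone “city” is entirely made up, and a real calculation would use small census units. Adding a large empty area drags the conventional density down but leaves the population-weighted density unchanged.

```python
def conventional_density(zones):
    """Total population divided by total area (people per km2)."""
    return sum(pop for pop, _ in zones) / sum(area for _, area in zones)


def population_weighted_density(zones):
    """Average, over people, of the density of the zone each person lives in."""
    total_pop = sum(pop for pop, _ in zones)
    return sum(pop * (pop / area) for pop, area in zones) / total_pop


# HYPOTHETICAL zones as (population, area in km2): a dense core and a sparse fringe.
city = [(1_000_000, 50), (200_000, 2_000)]
# The same city with the boundary drawn to include 5,000 km2 of empty land.
wider = city + [(0, 5_000)]

for name, zones in (("city", city), ("wider", wider)):
    print(
        f"{name}: conventional {conventional_density(zones):.0f}/km2, "
        f"population-weighted {population_weighted_density(zones):.0f}/km2"
    )

# Unit check on the figures quoted above: 1 km2 is 100 hectares.
new_york_per_hectare = 12_000 / 100
ratio_to_auckland = new_york_per_hectare / 43
print(f"New York / Auckland population-weighted ratio: {ratio_to_auckland:.1f}")
```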

 

Update: * Mr Seymour tells me he was referring to the definition of metropolitan areas from Demographia, which trims some of the low-density parts of the Census Bureau definition of New York to give a population density of 1800, and agrees well with the StatsNZ definition of urban Auckland.  So, while the difficulty of defining things comparably is still an issue, it is less his fault than I had assumed.

February 11, 2016

Anti-smacking law

Family First has published an analysis that they say shows the anti-smacking law has been ineffective and harmful.  I think the arguments that it has worsened child abuse are completely unconvincing, but as far as I can tell there isn’t any good evidence that it has helped.  Part of the problem is that the main data we have are reports of (suspected) abuse, and changes in the proportion of cases reported are likely to be larger than changes in the underlying problem.

We can look at  two graphs from the full report. The first is notifications to Child, Youth and Family

ff-1

The second is ‘substantiated abuse’ based on these notifications

ff-2

For the first graph, the report says “There is no evidence that this can be attributed simply to increased reporting or public awareness.” For the second, it says “Is this welcome decrease because of an improving trend, or has CYF reached ‘saturation point’ i.e. they simply can’t cope with the increased level of notifications and the amount of work these notifications entail?”

Notifications have increased almost eight-fold since 2001. I find it hard to believe that this is completely real: that child abuse was rare before the turn of the century and became common in such a straight-line trend. Surely such a rapid breakdown in society would be affected to some extent by the unemployment  of the Global Financial Crisis? Surely it would leak across into better-measured types of violent crime? Is it no longer true that a lot of abusing parents were abused themselves?

Unfortunately, it works both ways. The report is quite right to say that we can’t trust the decrease in substantiated abuse;  without supporting evidence it’s not possible to disentangle real changes in child abuse from changes in reporting.

Child homicide rates are also mentioned in the report. These have remained constant, apart from the sort of year to year variation you’d expect from numbers so small. To some extent that argues against a huge societal increase in child abuse, but it also shows the law hasn’t had an impact on the most severe cases.

Family First should be commended on the inclusion of long-range trend data in the report. Graphs like the ones I’ve copied here are the right way to present these data honestly, to allow discussion. It’s a pity that the infographics on the report site don’t follow the same pattern, but infographics tend to be like that.

The law could easily have had quite a worthwhile effect on the number and severity of cases of child abuse, or not. Conceivably, it could even have made things worse. We can’t tell from this sort of data.

Even if the law hasn’t “worked” in that sense, some of the supporters would see no reason to change their minds — in a form of argument that should be familiar to Family  First, they would say that some things are just wrong and the law should say so.  On the other hand, people who supported the law because they expected a big reduction in child abuse might want to think about how we could find out whether this reduction has occurred, and what to do if it hasn’t.