Posts filed under Just look it up (235)

May 6, 2015

All-Blacks birth month

This graphic and the accompanying story in the Herald produced a certain amount of skeptical discussion on Twitter today.


It looks a bit as though there is an effect of birth month, and the Herald backs this up with citations to Malcolm Gladwell on ice hockey.

The first question is whether there is any real evidence of a pattern. There is, though it’s not overwhelming. If you did this for random sets of 173 people, about 1 in 80 times there would be 60 or more in the same quarter (and yes, I did use actual birth frequencies rather than just treating all quarters as equal). The story also looks at the Black Caps, where evidence is a lot weaker because the numbers are smaller.
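A rough version of that calculation can be sketched in a few lines. This sketch ASSUMES all four quarters are equally likely, which is a simplification the post explicitly avoided, so it will not reproduce the 1-in-80 figure exactly. It sums the binomial tail for one quarter and multiplies by four (a union bound, so a very slight overestimate).

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_PLAYERS = 173    # number of players, from the post
THRESHOLD = 60     # "60 or more in the same quarter"
P_QUARTER = 0.25   # ASSUMPTION: equal quarters, not actual birth frequencies

# Chance that one particular quarter has 60 or more births
one_quarter = binom_tail(N_PLAYERS, THRESHOLD, P_QUARTER)

# Union bound over the four quarters: chance that any quarter has 60 or more
any_quarter = min(1.0, 4 * one_quarter)

print(f"P(some quarter has >= {THRESHOLD}) is roughly {any_quarter:.4f}")
```

Swapping in real quarterly birth proportions would mean one tail sum per quarter, each with its own probability, rather than a single sum multiplied by four.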

On the other hand, we are comparing to a pre-existing hypothesis here. If you asked whether the data were a better fit to equal distribution over quarters or to Gladwell’s ice-hockey statistic of a majority in the first quarter, they are a much better fit to equal distribution over quarters.
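One way to make "better fit" concrete is a log-likelihood comparison. The sketch below ASSUMES exactly 60 of the 173 players were born in the favoured quarter (the post only says "60 or more") and ASSUMES "a majority in the first quarter" means a probability of 0.5. Both numbers are illustrative stand-ins, not figures taken from the Herald data.

```python
from math import log

N_PLAYERS = 173
IN_PEAK_QUARTER = 60   # ASSUMPTION: the post says "60 or more"
P_EQUAL = 0.25         # equal distribution over quarters
P_GLADWELL = 0.50      # ASSUMPTION: stand-in for "a majority in the first quarter"


def log_likelihood(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood, dropping the constant that is the same for every p."""
    return k * log(p) + (n - k) * log(1 - p)


ll_equal = log_likelihood(IN_PEAK_QUARTER, N_PLAYERS, P_EQUAL)
ll_gladwell = log_likelihood(IN_PEAK_QUARTER, N_PLAYERS, P_GLADWELL)

# Positive values favour the equal-quarters hypothesis
log_ratio = ll_equal - ll_gladwell
print(f"log-likelihood ratio (equal vs majority): {log_ratio:.2f}")
```

Under these assumptions the ratio comes out positive, which is the direction the paragraph above describes.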

The next step is to go slightly further than Gladwell, who is not (to put it mildly) a primary source. The fact that he says there is a study showing X is good evidence that there is a study showing X, but it isn’t terribly good evidence that X is true. His books are written to communicate an idea, not to provide balanced reporting or scientific reference.  The hockey analysis he quotes was the first study of the topic, not the last word.

It turns out that even for ice hockey, things are more complicated:

Using publically available data of hockey players from 2000–2009, we find that the relative age effect, as described by Nolan and Howell (2010) and Gladwell (2008), is moderate for the average Canadian National Hockey League player and reverses when examining the most elite professional players (i.e. All-Star and Olympic Team rosters).

So, if you expect the ice-hockey phenomenon to show up in New Zealand, the ‘most elite professional players’, the All Blacks, might be the wrong place to look.

On the other hand, Rugby League in the UK does show very strong relative age effects even into the national teams, more like the 50% in first quarter that Gladwell quotes for ice hockey. Further evidence that things are more complicated comes from soccer. A paper (PDF) looking at junior and professional soccer found imbalances in date of birth, again getting weaker at higher levels. They also had an interesting natural experiment when the eligibility date changed in Australia, from January 1 to August 1.


As the graph shows, the change in eligibility date was followed by a change in birth-date distribution, but not how you might expect. An August 1 cutoff saw a stronger first-quarter peak than the January 1 cutoff.

Overall, it really does seem to be true that relative age effects have an impact on junior sports participation, and possibly even high-level professional achievement. You still might not expect the ‘majority born in the first quarter’ effect to translate from the NHL as a whole to the All Blacks, and the data suggest it doesn’t.

Rather more important, however, are relative age effects in education. After all, there’s a roughly 99.9% chance that your child isn’t going to be an All Black, but education is pretty much inevitable. There’s similar evidence that the school-age cutoff has an effect on educational attainment, which is weaker than the sports effects, but impacts a lot more people. In Britain, where the school cutoff is September 1:

Analysis shows that approximately 6% fewer August-born children reached the expected level of attainment in the 3 core subjects at GCSE (English, mathematics and science) relative to September-born children (August born girls 55%; boys 44%; September born girls 61% boys 50%)

In New Zealand, with a March 1 cutoff, you’d expect worse average school performance for kids born on the dates the Herald story is recommending.

As with future All Blacks, the real issue here isn’t when to conceive. The real issue is that the system isn’t working as well for some people. The All Blacks (or more likely the Blues) might play better if they weren’t missing key players born in the wrong month. The education system, at least in the UK, would work better if it taught all children as well as it teaches those born in autumn.  One of these matters.



May 5, 2015

Civil unions down: not just same-sex

The StatsNZ press release on marriages, civil unions, and divorces to December 2014 points out the dramatic fall in same-sex civil unions with 2014 being the first full year of marriage equality. Interestingly, if you look at the detailed data, opposite-sex civil unions have also fallen by about 50%, from a low but previously stable level.


May 4, 2015

US racial disparity: how do we compare?

There’s a depressing chart at Fusion, originally from the Economist, that shows international comparisons for infant mortality, homicide, life expectancy, and imprisonment, with White America and Black America broken out as if they were separate countries.

Originally, I was just going to link to the chart, but I thought I should look at how Māori/Pākehā  disparities compare. European-ancestry New Zealanders and Māori make up roughly the same proportions of the NZ population as self-identified White and Black do in the US. The comparison is depressing, but also interesting: showing how ratios and differences give you different results.

First, infant mortality. Felix Salmon writes:

A look at infant mortality, a key indicator of development, is just as grim. Iceland has 1.6 deaths per 1,000 births; South Korea has 3.2. “White America” is pretty bad — by developed-country standards — with 5.1 deaths per 1,000 births. But “Black America,” again, is much, much worse: at 11.2 deaths per 1,000 births, it’s worse than Romania or China.

According to the Ministry of Health, Māori infant mortality was 7.7/1000 in 2011 compared to 3.7/1000 for non-Māori, non-Pacific. According to StatsNZ, the rate for Māori was lower in 2012 (the numbers don’t quite match: different definitions or provisional data). So, the Māori/Pākehā ratio is similar to the Black/White ratio in the US, but the difference is quite a bit smaller here.
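The ratio-versus-difference point is easy to check with the infant mortality figures quoted above. This is just arithmetic on those four numbers; the dictionary layout and function name are my own.

```python
# Infant deaths per 1,000 births, as quoted in the text:
# (higher-rate group, lower-rate group)
infant_mortality = {
    "US (Black vs White)": (11.2, 5.1),
    "NZ (Maori vs non-Maori, non-Pacific)": (7.7, 3.7),
}


def ratio_and_difference(higher: float, lower: float) -> tuple[float, float]:
    """Return the rate ratio and the rate difference for two groups."""
    return higher / lower, higher - lower


us_ratio, us_diff = ratio_and_difference(*infant_mortality["US (Black vs White)"])
nz_ratio, nz_diff = ratio_and_difference(
    *infant_mortality["NZ (Maori vs non-Maori, non-Pacific)"]
)

for label, (hi, lo) in infant_mortality.items():
    ratio, diff = ratio_and_difference(hi, lo)
    print(f"{label}: ratio {ratio:.2f}, difference {diff:.1f} per 1,000")
```

The two ratios land close together while the differences do not, which is the pattern described in the paragraph above.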

Incarceration rates show a similar pattern. In the US, the rate is 2207/100k for Blacks and 380/100k for Whites.  In New Zealand, the rates are (about) 700/100k for Māori and 100/100k for European-ancestry. The NZ figures include people on remand; I don’t know if the US figures do. The ratio is a bit lower in New Zealand, but the difference is dramatically lower.

Homicide rates are harder to compare, because New Zealand only started collecting ethnicity of victims last year, and because NZ.Stat will only show you one month of data at a time. However, it looks as though the ratio is a lot less than the nearly 9 in the US. More importantly, the overall rate is much lower here: our rate is 0.9 per 100k, the overall US rate is 4.5 per 100k.

If the Māori/Pākehā disparities are slightly less serious than US Black/White disparities as ratios but much less serious as differences, which comparison is the right one? To some extent this depends on the question: risk ratios may be more relevant as indicators of structural problems, but risk differences are what actually matter to individuals.

May 1, 2015

Have your say on the 2018 census


StatsNZ has a discussion forum on the 2018 Census.


They say:

The discussion on Loomio will be open from 30 Apr to 10 Jun 2015.

Your discussions will be considered as an input to final decision making.

Your best opportunity to influence census content is to make a submission. Statistics NZ will use this 2018 Census content determination framework to make final decisions on content. The formal submission period will be open from 18 May until 30 Jun 2015 via

So, if you have views on what should be asked and how it should be asked, join in the discussion and/or make a submission.


April 30, 2015

Half the median

From the Herald, under the headline “First-home buyers nab new home subsidies”

The AMP 360 First Home Buyer Affordability Report, published yesterday, shows housing remains “affordable” in all regions except Auckland and Queenstown.

The index tracked the lower-quartile (halfway between zero and the median) selling prices of houses and the median after-tax income of typical first-home buyers (a working couple both aged 25 to 29).

The lower quartile is not “halfway between zero and the median”. The lower quartile is the price that 25% of sales are below and 75% are above.

What’s more, the interpretation is obviously wrong. If you take the first Google link, there’s a table by region, and it lists the lower quartile price for Auckland metro as $587,000 and Auckland City as $681,000. If those figures really were half the median, the median would be well over a million dollars, and the Herald reports the median price often enough that they must know it isn’t.
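To see the difference between the two definitions, here is a small sketch. The list of sale prices is HYPOTHETICAL, invented purely for illustration. The two Auckland lower-quartile figures are the ones quoted above. Note that `statistics.quantiles` uses one particular interpolation rule by default, and other rules give slightly different cut points.

```python
from statistics import median, quantiles

# HYPOTHETICAL sale prices in $000, invented for illustration only
prices = [300, 420, 480, 510, 550, 600, 650, 720, 800, 950, 1200]

med = median(prices)
lower_quartile = quantiles(prices, n=4)[0]  # first of the three quartile cut points
half_median = med / 2                       # the Herald's (incorrect) definition

print(f"median: {med}, lower quartile: {lower_quartile}, half the median: {half_median}")

# If the quoted lower quartiles really were "half the median",
# the implied medians would be twice those figures.
quoted_lower_quartiles = {"Auckland metro": 587_000, "Auckland City": 681_000}
implied_medians = {region: 2 * q for region, q in quoted_lower_quartiles.items()}

for region, implied in implied_medians.items():
    print(f"{region}: implied median ${implied:,}")
```

In this made-up sample the lower quartile sits well above half the median, and the implied Auckland medians both come out over a million dollars.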

While I’m complaining: the data table is in a state of sin. It’s not actually a data table; it’s a picture of a data table, a GIF image.

March 20, 2015

Ideas that didn’t pan out

One way medical statisticians are trained into skepticism over their careers is seeing all the exciting ideas from excited scientists and clinicians that don’t turn out to work. Looking at old hypotheses is a good way to start. This graph is from a 1986 paper in the journal Medical Hypotheses, and the authors are suggesting pork consumption is important in multiple sclerosis, because there’s a strong correlation between rates of multiple sclerosis and pork consumption across countries:


This wasn’t a completely silly idea, but it was never anything but suggestive, for two main reasons. First, it’s just a correlation. Second, it’s not even a correlation at the level of individual people — the graph is just as strong support for the idea that having neighbours who eat pork causes multiple sclerosis. Still, dietary correlations across countries have been useful in research.

If you wanted to push this idea today, as a Twitter account claiming to be from a US medical practice did, you’d want to look carefully at the graph rather than just repeating the correlation. There are some countries missing, and other countries that might have changed over the past three decades.

In particular, the graph does not have data for Korea, Taiwan, or China. These have high per-capita pork consumption, and very low rates of multiple sclerosis — and that’s even more true of Hong Kong, and specifically of Chinese people in Hong Kong.  In the other direction, the hypothesis would imply very low levels of multiple sclerosis among US and European Jews. I don’t have data there, but in people born in Israel the rate of multiple sclerosis is moderate among those of Ashkenazi heritage and low in others, which would also mess up the correlations.

You might also notice that the journal is (or was) a little non-standard, or as it said  “intended as a forum for unconventional ideas without the traditional filter of scientific peer review”.

Most of this information doesn’t even need a university’s access to scientific journals — it’s just out on the web.  It’s a nice example of how an interesting and apparently strong correlation can break down completely with a bit more data.

March 17, 2015

Bonus problems

If you hadn’t seen this graph yet, you probably would have soon.


The claim “Wall Street bonuses were double the earnings of all full-time minimum wage workers in 2014” was made by the Institute for Policy Studies (which is where I got the graph) and fact-checked by the Upshot blog at the New York Times, so you’d expect it to be true, or at least true-ish. It probably isn’t, because the claim being checked was missing an important word and is using an unfortunate definition of another word. One of the first hints of a problem is the number of minimum wage workers: about a million, or about 2/3 of one percent of the labour force. Given the usual narrative about the US and minimum-wage jobs, you’d expect this fraction to be higher.

The missing word is “federal”. The Bureau of Labor Statistics reports data on people paid at or below the federal minimum wage of $7.25/hour, but 29 states have higher minimum wages so their minimum-wage workers aren’t counted in this analysis. In most of these states the minimum is still under $8/hr. As a result, the proportion of hourly workers earning no more than federal minimum wage ranges from 1.2% in Oregon to 7.2% in Tennessee (PDF). The full report, and even the report infographic, say “federal minimum wage”, but the graph above doesn’t, and neither does the graph from Mother Jones magazine (it even omits the numbers of people).

On top of those getting state minimum wage, we’re still short quite a lot of people, because “full-time” is defined by 35 or more hours per week at your principal job. If you have multiple part-time jobs, even if you work 60 or 80 hours a week, you are counted as part-time and not included in the graph.

Matt Levine writes:

There are about 167,800 people getting the bonuses, and about 1.03 million getting full-time minimum wage, which means that ballpark Wall Street bonuses are 12 times minimum wage. If the average bonus is half of total comp, a ratio I just made up, then that means that “Wall Street” pays, on average, 24 times minimum wage, or like $174 an hour, pre-tax. This is obviously not very scientific but that number seems plausible.

That’s slightly less scientific than the graph, but as he says, is plausible. In fact, it’s not as bad as I would have guessed.
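Levine's back-of-the-envelope arithmetic can be retraced directly from the numbers in the quote. The only inputs are the two head counts, the "double" from the headline claim, the $7.25 federal minimum wage mentioned earlier, and his self-described made-up bonus share of one half. The variable names are mine.

```python
BONUS_RECIPIENTS = 167_800        # people getting Wall Street bonuses (from the quote)
MIN_WAGE_WORKERS = 1_030_000      # full-time federal-minimum-wage workers (from the quote)
TOTAL_BONUS_TO_TOTAL_WAGES = 2.0  # the "double" in the headline claim
FEDERAL_MIN_WAGE = 7.25           # dollars per hour
BONUS_SHARE_OF_COMP = 0.5         # Levine's admittedly made-up ratio

# Average bonus as a multiple of average full-time minimum-wage earnings
per_person_ratio = TOTAL_BONUS_TO_TOTAL_WAGES * MIN_WAGE_WORKERS / BONUS_RECIPIENTS

# Average total compensation as a multiple of minimum-wage earnings
total_comp_ratio = per_person_ratio / BONUS_SHARE_OF_COMP

# Levine rounds to 12 and 24 before converting to an hourly figure
rounded_hourly = 24 * FEDERAL_MIN_WAGE

print(f"bonus multiple: {per_person_ratio:.1f}")
print(f"total compensation multiple: {total_comp_ratio:.1f}")
print(f"hourly pay at 24 times minimum wage: ${rounded_hourly:.0f}")
```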

What’s particularly upsetting is that you don’t need to exaggerate or use sloppy figures on this topic. It’s not even that controversial. Lots of people, even technocratic pro-growth economists, will tell you the US minimum wage is too low.  Lots of people will argue that Wall St extracts more money from the economy than it provides in actual value, with much better arguments than this.

By now you might think to check carefully that the original bar chart is at least drawn correctly.  It’s not. The blue bar is more than half the height of the red bar, not less than half.

March 9, 2015

Not all there

One of the most common problems with data is that it’s not there. Families don’t answer their phones, over-worked nurses miss some forms, and even tireless electronic recorders have power failures.

There’s a large field of statistical research devoted to ways of fixing the missing-data problem. None of them work (that’s not my cynical opinion, that’s a mathematical theorem), but many of them are more likely to make things better than worse. The best way to handle data you don’t have depends on what sort of data and why you don’t have it, but even the best ways can confuse people who aren’t paying attention.

Just ignoring the missing data problem and treating the data you have as all the data is effectively assuming the missing data look just like the observed data. This is often very implausible. For example, in a weight-loss study it is much more likely that people who aren’t losing weight will drop out. If you just analyse data from people who stay in the study and follow all your instructions, unless this is nearly everyone, they will probably have lost weight (on average) even if your treatment is just staring at a container of felt-tip pens.
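A small simulation shows how this happens. Everything in it is an ASSUMPTION made for illustration: a treatment with no effect at all, weight changes with a standard deviation of 3 kg, and dropout chances of 10% for people who lost weight and 60% for people who did not.

```python
import random
from statistics import mean


def simulate(n: int = 2000, seed: int = 1) -> tuple[float, float, int]:
    """Simulate a weight-loss study with a useless treatment and selective dropout."""
    rng = random.Random(seed)
    # True weight change in kg: centred on zero because the treatment does nothing
    changes = [rng.gauss(0.0, 3.0) for _ in range(n)]
    # ASSUMED dropout rule: people who lost weight stay 90% of the time, others 40%
    completers = [c for c in changes if rng.random() < (0.9 if c < 0 else 0.4)]
    return mean(changes), mean(completers), len(completers)


all_mean, completer_mean, n_completers = simulate()
print(f"mean change, everyone enrolled: {all_mean:+.2f} kg")
print(f"mean change, completers only ({n_completers} people): {completer_mean:+.2f} kg")
```

The average over everyone enrolled stays near zero, while the completers-only average is clearly negative, even though nothing in the simulated treatment causes weight loss.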

That’s why it is often sensible to treat missing observations as if they were bad. The Ministry of Health drinking water standards do this. For example, they say that only 96.7% of New Zealand received water complying with the bacteriological standards. That sounds serious. Of the 3.3% failures, however, more than half (2.0%) were just failures to monitor thoroughly enough, and only 0.1% had E. coli transgressions that were not followed up by immediate corrective action.

From a regulatory point of view, lumping these together makes sense. The Ministry doesn’t want to create incentives for data to ‘accidentally’ go missing whenever there’s a problem. From a public health point of view, though, you can get badly confused if you just look at the headline compliance figure and don’t read down to page 18.

The Ministry takes a similarly conservative approach to the other standards, and the detailed explanations are more reassuring than the headline compliance figures. There are a small number of water supplies with worrying levels of arsenic — enough to increase lifetime cancer risk by a tenth of a percentage point or so — but in general the biggest problem is inadequate fluoride concentrations in drinking water for nearly half of Kiwi kids.


March 5, 2015

Showing us the money

The Herald is running a project to crowdsource data entry and annotation for NZ political donations and expenses: it’s something that’s hard to automate and where local knowledge is useful. Today, they have an interactive graph for 2014 election donations and have made the data available.


February 25, 2015

Wiki New Zealand site revamped

We’ve written before about Wiki New Zealand, which aims to ‘democratise data’. WNZ has revamped its website to make things clearer and cleaner, and you can browse here.

As I’m a postgraduate scarfie this year, the table on domestic students in tertiary education interested me – it shows that women (grey) are enrolled in greater numbers than men at every single level.

Founder Lillian Grace talks about the genesis of Wiki New Zealand here, and for those who love the techy side, here’s a video about the backend.