Posts filed under Politics (174)

March 9, 2017

Causation, correlation, and gaps

It’s often hard to establish whether a correlation between two variables is cause and effect, or whether it’s due to other factors.  One technique that’s helpful for structuring one’s thinking about the problem is a causal graph: bubbles for variables, and arrows for effects.

I’ve written about the correlation between chocolate consumption and number of Nobel prizes for countries.  The ‘chocolate leads to Nobel Prizes’ hypothesis would be drawn like this:


One of several more-reasonable alternatives is that variations in wealth explain the correlation, which looks like


As another example, there’s a negative correlation between the number of pirates operating in the world’s oceans and atmospheric CO2 concentration.  It could be that pirates directly reduce atmospheric CO2 concentration:


but it’s perhaps more likely that both technology and wealth have changed over time, leading to greater CO2 emissions and also to nations with the ability and motivation to suppress piracy:


The pictures are oversimplified, but they still show enough of the key relationships to help with reasoning.  In particular, in these alternative explanations, there are arrows pointing into both the putative cause and the effect. There are arrows from the same origin into both ‘chocolate’ and ‘Nobel Prizes’; there are arrows from the same origins into both ‘pirates’ and ‘CO2’.  Confounding — the confusion of relationships that leads to causes not matching correlations — requires arrows into both variables (or selection based on arrows out of both variables).
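To make the confounding picture concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the variable names, the effect sizes, and the noise are assumptions, not estimates from any real data. ‘Wealth’ has arrows into both ‘chocolate’ and ‘Nobel Prizes’ and there is no arrow between the two, yet they end up correlated.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """Residuals from a least-squares line of ys on xs (removes the linear effect of xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [y - my - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(2017)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical causal graph: wealth -> chocolate, wealth -> nobel, and nothing else.
wealth = [rng.gauss(0, 1) for _ in range(n)]
chocolate = [w + rng.gauss(0, 1) for w in wealth]
nobel = [w + rng.gauss(0, 1) for w in wealth]

raw = pearson(chocolate, nobel)
adjusted = pearson(residuals(chocolate, wealth), residuals(nobel, wealth))

print(f"correlation ignoring wealth:      {raw:.2f}")
print(f"correlation adjusting for wealth: {adjusted:.2f}")
```

With these made-up effect sizes the raw correlation should come out near 0.5 and the adjusted one near zero: the whole association runs through the common cause.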

So, when we see a causal hypothesis like this one:


and ask if there’s “really” a gender pay gap, the answer “No” requires finding a variable with arrows into both gender and pay.  Which in your case you have not got. The pay gap really is caused by gender.

There are still interesting and important questions to be asked about mechanisms. For example, consider this graph


We’d like to know how much of the pay gap is direct underpayment, how much goes through the mechanism of women doing more childcare, and how much goes through the mechanism of occupations with more women being  paid less.  Information about mechanisms helps us think about how to reduce the gap, and what the other costs of reducing it might be.  The studies I’ve seen suggest that all three of these mechanisms do contribute, so even if you think only the direct effects matter there’s still a problem.

You can also think of all sorts of things and stuff I’ve left out of that graph, and you could put some of them back in


But you’re still going to end up with a graph where there are only arrows out of gender.  Women earn less, on average, and this is causation, not mere correlation.

August 17, 2016

Official statistics

There has been some controversy about changes to how unemployment is computed in the Household Labour Force Survey. As StatsNZ had explained, the changes would be back-dated to March 2007, to allow for comparisons.  However, from Stuff earlier this week:

In a media release Robertson, Labour’s finance spokesman, said National was “actively massaging official unemployment statistics” by changing the measure for joblessness to exclude those using websites, such as Seek or TradeMe.

Robertson was referring to the Household Labour Force Survey, due to be released on Wednesday, which he says would “almost certainly show a decrease in unemployment” as a result of the Government “manipulating official data to suit its own needs”.

Mr Robertson has since withdrawn this claim, and is now saying

“I accept the Chief Statistician’s assurances on the reason for the change in criteria but New Zealanders need to be aware that National Ministers have a track record of misusing and misrepresenting statistics.”

That’s a reasonable position — and some of the examples have appeared on StatsChat — but I don’t think the stories in the media have made it clear how serious the original accusation was (even if perhaps unintentionally).

Official statistics such as the unemployment estimates are politically sensitive, and it’s obvious why governments would want to change them. Argentina, famously, did this to their inflation estimates. As a result, no-one believed Argentinian economic data, which gets expensive when you’re trying to borrow money. For that reason, sensible countries structure their official statistics agencies to minimise political influence, and maximise independence.  New Zealand does have a first-world official statistics system — unlike many countries with similar economic resources — and it’s a valuable asset that can’t be taken for granted.

The system is set up so the Government shouldn’t have the ability to “actively massage” official unemployment statistics for minor political gain. If they did, well, ok, it was hyperbole when I said on Twitter ‘we’d need to go through StatsNZ with fire and the sword’, but the Government Statistician wouldn’t be the only one who’d need replacing.

July 27, 2016

In praise of NZ papers

I whinge about NZ papers a lot on StatsChat, and even more about some of the UK stories they reprint. It’s good sometimes to look at some of the UK stories they don’t reprint.  From the Daily Express


The Brexit enthusiast and former cabinet Minister John Redwood says “The poll is great news, well done to the Daily Express.” As he seems to be suggesting, you don’t get results like this just by chance — having an online bogus poll on the website of an anti-Europe newspaper is a good start.

(via Antony Unwin)

July 19, 2016

Polls over petitions

I mentioned in June that Generation Zero were trying to crowdfund an opinion poll on having a rail option in Auckland’s new harbour crossing.

Obviously they’re doing this because they think they know what the answer will be, but it’s still a welcome step towards evidence-based lobbying.

The results are out, in a poll conducted by UMR. Well, a summary of the results is out, in a story at The Spinoff, and we can hope the rest of the information turns up on Generation Zero’s website at some point. A rail crossing is popular, even when its cost is presented as part of the question:


The advantage of proper opinion polls over petitions or other sorts of bogus polls is the representativeness.  If 50,000 people sign a petition, all you know is that the true number of supporters is at least 50,000 (and maybe not even that).  Sometimes there will be one or two silent supporters for each petition vote (as with Red Peak); sometimes many more; sometimes fewer.

Petitions do have the advantage that you feel as if you’re doing something when you sign, but we can cope without that: after all, we still have social media.

May 26, 2016

Budget visualisations

This list will likely be updated as I find more.

  1. From Keith Ng. Budget now and over time. This gets special mention for being inflation-adjusted (it’s in 2014 dollars). Doesn’t work on my phone, but works well on a small laptop screen.
  2. NZ Herald. Works (though hard to read) on a mobile. Still hard to read on a small laptop screen, but attractive on a large screen. I still have reservations about the bubbles.
  3. Stuff has a set of charts. The surplus/deficit one is nicely clear, though there’s nothing about the financial crisis/recession as an explanation for a lot of it.
  4. The government has interactive charts of Core Crown Revenue, Core Crown Expenditure, and breakdown for a taxpayer. On the last one, they lose points for displaying just income tax, when the Treasury are about the only people who could easily do better.

April 29, 2016

Looking up the index


Q: Did you hear that Auckland housing affordability is better now than when the government came to office?

A: No. Surely not.

Q: That’s what Nick Smith says: listen, it’s at 4:38. Is it true?

A: Up to a point.

Q: Up to what point?

A:  As he says, the Massey University Housing Affordability Index for February 2016 is lower than it was for November 2008, for Auckland and everywhere else in the country. For Auckland it was 38.44 then and is 33.8 now.

Q: But The Spinoff says one of the people behind the Index says Nick Smith is wrong, that housing isn’t more affordable than it was then.

A: Indeed she does. That’s because housing isn’t more affordable.

Q: But you said the index was lower?

A: Yes, it is.

Q: And lower is supposed to be better?

A: Yes.

Q: But how can the Housing Affordability Index be lower when housing isn’t more affordable? What is the index?

A: If it’s the same as it was in 2006 (which would make sense) it’s median selling price multiplied by a weighted-average interest rate and divided by the mean individual weekly earnings.

Q: Can you translate that?

A: Roughly,  the number of weeks of average earnings you’d need to pay the first year’s interest on a 100% mortgage.

Q: So if it’s 34, and you’ve got two people making the average, it’s 17 weeks each out of 52 going to mortgage interest? About 32% of income?

A: That’s right, only you don’t get 100% mortgages, so it’s more like 26% of income. And there’s taxes and insurance and you actually pay off a bit of the principal even in the first year, so it’s more complicated. But it’s a simple summary of the interest cost.

Q: And that’s lower now than in November 2008?

A: So it seems. I wasn’t living in New Zealand then, but it looks like mortgage interest rates were near 9%. The combination of the increase in incomes and the fall in interest rates has been slightly more than the increase in house prices, even in Auckland.

Q: But what if rates go back up?

A: Then a lot of houses will retroactively become much less affordable.

Q: And what about saving for down payments? That’s what all the snake people have been complaining about, and low interest rates don’t help there.

A: Down payments don’t go into the affordability index.

Q: But they go into actual affordability!

A: Which is presumably why the Minister was talking about the affordability index.
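The arithmetic in the dialogue above can be written out as a short sketch. The index formula follows the description given there; the function names are hypothetical, and the 80% loan-to-value ratio is an assumption, chosen because it reproduces the ‘more like 26%’ figure rather than because it is stated anywhere in the index definition.

```python
def affordability_index(median_price, annual_interest_rate, mean_weekly_earnings):
    """Weeks of average earnings needed to pay a year's interest on a 100% mortgage."""
    return median_price * annual_interest_rate / mean_weekly_earnings

def share_of_income(index, earners=2, loan_to_value=1.0, weeks_per_year=52):
    """Fraction of household income going to first-year mortgage interest."""
    return index * loan_to_value / (earners * weeks_per_year)

index_feb_2016 = 33.8  # Auckland figure quoted in the dialogue

# Two earners on the average wage, 100% mortgage
full_mortgage = share_of_income(index_feb_2016)
# Same household, assumed 80% mortgage (an illustrative assumption)
typical_mortgage = share_of_income(index_feb_2016, loan_to_value=0.8)

print(f"100% mortgage: {full_mortgage:.1%} of income")
print(f"80% mortgage:  {typical_mortgage:.1%} of income")
```

Note that nothing in either function involves the size of the deposit, which is the point of the last few answers: the index summarises interest cost only.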


April 28, 2016

Marking beliefs to market

Back in August, I wrote

Trump’s lead isn’t sampling error. He has an eleven percentage point lead in the poll averages, with sampling error well under one percentage point. That’s better than the National Party has ever managed. It’s better than the Higgs Boson has ever managed.

Even so, no serious commentator thinks Trump will be the Republican candidate. It’s not out of the question that he’d run as an independent — that’s a question of individual psychology, and much harder to answer — but he isn’t going to win the Republican primaries.

Arguably that was true: no serious commentator, as far as I know, did think Trump would be the Republican candidate.  But he is going to win the Republican primaries, and the opinion polls haven’t been all that badly wrong about him — better than the experts.

Māori imprisonment statistics: not just age

Jarrod Gilbert had a piece in the Herald about prisons

Fifty per cent of the prison population is Maori. It’s a fact regularly cited in official documents, and from time to time it garners attention in the media. Given they make up 15 per cent of the population, it’s immediately clear that Maori incarceration is highly disproportionate, but it’s not until the numbers are given a greater examination that a more accurate perspective emerges.

The numbers seem dystopian, yet they very much reflect the realities of many Maori families and neighbourhoods.

to know what he was talking about, qualitatively. I mean, this isn’t David Brooks.

It turns out that while you can’t easily get data on ethnicity by age in the prison population, you can get data on age, and that this is enough to get a good idea of what’s going on, using what epidemiologists call “indirect standardisation”.

Actually, you can’t even easily get data on age, but you can get a graph of age:

and I resorted to software that reconstructs the numbers.

Next, I downloaded Māori population estimates by age and total population estimates by age from StatsNZ, for ages 15-84.  The definition of Māori won’t be exactly the same as in Dr Gilbert’s data. Also, the age groups aren’t quite right because we’d really like the age when the offence happened, not the current age.  The data still should be good enough to see how big the age bias is. In these age groups, 13.2% of the population is Māori by the StatsNZ population estimate definition.

We know what proportion of the prison population is in each age group, and we know what the population proportion of Māori is in each age group, so we can combine these to get the expected proportion of Māori in the prison population accounting for age differences. It’s 14.5%.  Now, 14.5% is higher than 13.2%, so the age-adjustment does make a difference, and in the expected direction, just not a very big difference.

We can also see what happens if we use the Māori population proportion from the next-younger five-year group, to allow for offences being committed further in the past. The expected proportion is then 15.3%, which again is higher than 13.2%, but not by very much. Accounting for age, it looks as though Māori are still more than three times as likely to be in prison as non-Māori.
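The calculation is just a weighted average, and a minimal sketch looks like this. The age bands and every number in the two dictionaries are made up for illustration (they are not the StatsNZ or prison figures, though they were picked to land in a similar ballpark); only the method follows the description above.

```python
# Hypothetical inputs: share of the prison population in each age band,
# and share of the general population in that band who are Māori.
prison_share_by_age = {"15-29": 0.40, "30-44": 0.35, "45-59": 0.20, "60-84": 0.05}
maori_share_by_age = {"15-29": 0.18, "30-44": 0.14, "45-59": 0.11, "60-84": 0.06}

def expected_proportion(prison_share, group_share):
    """Indirect standardisation: weight each age band's population proportion
    by that band's share of the prison population."""
    total = sum(prison_share.values())
    return sum(prison_share[age] * group_share[age] for age in prison_share) / total

expected = expected_proportion(prison_share_by_age, maori_share_by_age)
observed = 0.50  # the 'fifty per cent' figure quoted from the Herald piece

print(f"expected proportion from age alone: {expected:.1%}")
print(f"observed / expected: {observed / expected:.1f}")
```

Swapping in the population proportions from the next-younger age band, as in the sensitivity check above, only means changing the second dictionary.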

You might then say there are lots of other variables to be looked at. But age is special.  If it turned out that Māori incarceration rates could be explained by poverty, that wouldn’t mean their treatment by society was fair, it would suggest that poverty was how it was unfair. If the rates could be explained by education, that wouldn’t mean their treatment by society was fair; it would suggest education was how it was unfair. But if the rates could be explained by age, that would suggest the system was fair. They can’t be.

April 17, 2016

Overcounting causes

There’s a long story in the Sunday Star-Times about a 2007 report on cannabis from the National Drug Intelligence Bureau (NDIB)

“Perhaps surprisingly,” Maxwell wrote, “cannabis related hospital admissions between 2001 and 2005 exceeded admissions for opiates, amphetamines and cocaine combined”, with about 2000 people a year ending up in hospital because of the drug.

The problem was with hospital diagnostic codes. Discharge summaries include both the primary cause of admission and a lot of other things to be noted. That’s a good thing — you want to know what all was wrong with a patient both for future clinical care and for research and quality control.  For example, if someone is in hospital for bleeding, you want to know they were on warfarin (which is why the bleeding happened), and perhaps why they were on warfarin. It’s not even always the case that the primary cause is the primary cause — if someone has Parkinson’s Disease and is admitted with pneumonia as a complication, which one should be listed? This is a difficult and complex field, and is even slightly less boring than it sounds.

As a result, if you just count up all the discharge summaries where ‘cannabis dependence’ was somewhere on the laundry list of codes, you’re going to get a lot of people who smoke pot but are in hospital for some completely different reason.  And since there’s a lot of cannabis consumption out there, you will get a lot of these false positives.
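A toy sketch of the difference between the two ways of counting. The records and labels below are invented for illustration; they are not real diagnostic codes or real patients. The point is only that counting a code anywhere on the list gives a larger number than counting primary causes.

```python
# Each made-up discharge summary has a primary cause and a list of other noted conditions.
discharges = [
    {"primary": "broken leg", "other": ["cannabis dependence"]},
    {"primary": "pneumonia", "other": ["parkinson's disease"]},
    {"primary": "cannabis dependence", "other": []},
    {"primary": "bleeding", "other": ["warfarin use", "cannabis dependence"]},
    {"primary": "appendicitis", "other": []},
]

def count_any_mention(records, code):
    """Count summaries where the code appears anywhere (primary or other)."""
    return sum(code == r["primary"] or code in r["other"] for r in records)

def count_primary(records, code):
    """Count summaries where the code is the primary cause of admission."""
    return sum(code == r["primary"] for r in records)

code = "cannabis dependence"
print("any mention:", count_any_mention(discharges, code))
print("primary cause:", count_primary(discharges, code))
```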

There are some other things to note about this report, though. The National Drug Foundation says (on Twitter) that they made the same point when it first came out. They also claim that the Ministry of Health argued against its being published.

Perhaps, now that the multiple-counting problem has been publicised in the context of hospital admissions, the same mistake will be made less often for road crashes, where multiple factors from foreign drivers to speed to alcohol to drugs are repeatedly counted up as ‘the’ cause of any crash where they are present.

March 24, 2016

The fleg

Two StatsChat-relevant points to be made.

First, the opinion polls underestimated the ‘change’ vote — not disastrously, but enough that they likely won’t be putting this referendum at the top of their portfolios.  In the four polls for the second phase of the referendum after the first phase was over, the lowest support for the current flag (out of those expressing an opinion) was 62%. The result was 56.6%.  The data are consistent with support for the fern increasing over time, but I wouldn’t call the evidence compelling.

Second, the relationship with party vote. The Herald, as is their wont, have a nice interactive thingy up on the Insights blog giving results by electorate, but they don’t do party vote (yet — it’s only been an hour).  Here are scatterplots for the referendum vote and main(ish) party votes (the open circles are the Māori electorates, and I have ignored the Northland byelection). The data are from here and here.


The strongest relationship is with National vote, whether because John Key’s endorsement swayed National voters or whether it did whatever the opposite of swayed is for anti-National voters.

Interestingly, given Winston Peters’s expressed views, electorates with higher NZ First vote and the same National vote were more likely to go for the fern.  This graph shows the fern vote vs NZ First vote for electorates divided into six groups based on their National vote. Those with low National vote are on the left; those with high National vote are on the right.

There’s an increasing trend across panels because electorates with higher National vote were more fern-friendly. There’s also an increasing trend within each panel, because electorates with similar National vote but higher NZ First vote were more fern-friendly.  For people who care, yes, this is backed up by the regression models.