Posts filed under General (1063)

December 5, 2016

Do snake people hate our freedom?

We had a Stat of the Week nomination for this graph from Stuff showing attitudes to democracy changing for people born more recently:


The complaint was that the non-NZ lines were indistinguishable. They do get pop-up descriptions on mouse-over, but the coloured circles in the legend are certainly not doing much work.

This is the original graph, from the New York Times:


The Times version is more elegant and clearer, and also provides uncertainty intervals around the lines. On the other hand, the higher-than-wide panels are going to make any decrease look more dramatic.

There are two more important problems with the graph. The first is that it uses only the highest category, “Essential”, on a ten-point scale.  A decrease in the proportion of people using the top rating could be due to the whole distribution moving down, but it could also just be a trend in people’s tendency to use the extreme values on a scale.
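As a minimal sketch of that second possibility, the made-up response counts below (not data from either graph) give two groups the same mean rating on a ten-point scale while the share choosing the top category falls from 50% to 30%. The group names and counts are purely hypothetical.

```python
# Hypothetical response counts (out of 100) on a 1-10 scale.
# These are invented for illustration, not taken from any survey.
older = {10: 50, 6: 50}
younger = {10: 30, 8: 40, 6: 30}

def mean_rating(counts):
    """Average rating, weighting each scale point by its count."""
    total = sum(counts.values())
    return sum(rating * n for rating, n in counts.items()) / total

def top_share(counts, top=10):
    """Proportion of respondents choosing the top category."""
    return counts.get(top, 0) / sum(counts.values())

for label, counts in [("older", older), ("younger", younger)]:
    print(label, "mean:", mean_rating(counts), "top share:", top_share(counts))
```

Both groups average 8 out of 10, so a graph of the top category alone would show a decline that a graph of means would not.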

Here’s a related graph using other data, tweeted by (Prof) Pippa Norris:


The trend looks weaker when using means on a four-point scale. It’s also less universal than the New York Times graph suggests.

There’s another problem, though. The source for the first graph is given as Yascha Mounk and Roberto Stefan Foa, “The Signs of Democratic Deconsolidation,” Journal of Democracy. The paper doesn’t exist yet at the journal’s website (or anywhere else that I’ve been able to find). According to Dr Mounk’s CV, it’s coming out in the first edition next year.

Part of the point of peer-reviewed publications is that they include the details that don’t make it into a media story. This is, potentially, significant research on an important topic. If we’re going to have a full-on panic about millennials and the end of democracy, we could at least wait a couple of months for the research to be published.


December 2, 2016

Polling accuracy

It’s worth remembering sometimes that the Daily Mail is far from the worst UK paper statistically, and that US election polling and reporting could be a lot worse.

There was a by-election today in the electorate of Richmond Park. The Liberal Democrats won, with 49.7% of the vote to ex-Conservative Zac Goldsmith’s 45.2%.

On Monday, the Evening Standard published a poll showing Goldsmith was leading 56% to 29%.

On Tuesday, the Standard reported as controversial a claim that the Liberal Democrats were “within three to four points” of Mr Goldsmith, with a Conservative source saying  “These are the usual claims from the LibDem national by-election machine – that’s not what we are finding on the doorstep.”
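To put a number on how far off the Monday poll was, here is a small arithmetic sketch. It assumes the 29% in the poll was the Liberal Democrat share, which the text implies but does not state; the variable and function names are illustrative.

```python
# Vote shares in percent.
# ASSUMPTION: the poll's 29% is the Liberal Democrat share.
poll = {"Goldsmith": 56.0, "Liberal Democrat": 29.0}
result = {"Goldsmith": 45.2, "Liberal Democrat": 49.7}

def goldsmith_lead(shares):
    """Goldsmith's lead over the Liberal Democrats, in percentage points."""
    return round(shares["Goldsmith"] - shares["Liberal Democrat"], 1)

poll_lead = goldsmith_lead(poll)      # lead reported by the poll
result_lead = goldsmith_lead(result)  # negative: Goldsmith lost
miss = round(poll_lead - result_lead, 1)

print("Poll lead:", poll_lead)
print("Actual lead:", result_lead)
print("Poll overstated Goldsmith's lead by", miss, "points")
```

On these figures the "controversial" claim of a three-to-four point gap was far closer to the result than the published poll.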


  • Beautiful pictures of food popularity over season and year, based on Google Trends data (via @kamal_hothi)
  • Despite the Sydney Morning Herald, Sydney high-school kids did not synthesise Daraprim. They synthesised pyrimethamine, and the difference is what matters. First, there’s the manufacturing quality control criteria that they don’t come close to meeting. More importantly, though, there’s the whole regulatory failure that let Shkreli overprice his brand of the drug in the US. In New Zealand, for comparison, Pharmac buys pyrimethamine for less than a dollar a pill, and in Australia it’s about the same (maybe cheaper).
  • Figure.NZ has a ‘festive data calendar’ with one NZ fact each day
  • In the past few months, global mean temperatures have decreased. Or even “plummeted”
    That’s because it’s winter in the northern hemisphere, and the northern hemisphere has more land than the southern hemisphere, and land temperatures vary more with season than ocean temperatures. It happens every year, and no-one would take this year’s fall as special evidence against climate change. Except, apparently, the US House of Representatives Committee on Science, Space, and Technology (or at least their Twitter account)
December 1, 2016

Praedictio mortis conturbat me

Q: Did you see scientists have found a way to predict immediate death?

A: What? Lack of pulse?

Q: Very droll. No, it says interleukin-6. What is that?

A: It’s a messenger protein that some white blood cells use to stimulate other white blood cells to do stuff. If there’s a lot of it around, there’s probably inflammation, which is probably bad.

Q: And it’s new?

A: No.

Q: The story says it’s new.

A: Yes. Yes, it does.

Q: So what’s new?

A: Interleukin 6 and another marker of inflammation called C-reactive protein used to be thought of as the best things to measure if you cared about inflammation. Some researchers came up with another, called α1-acid glycoprotein, and said it was better. This research is arguing that, no, α1-acid glycoprotein isn’t better.

Q: Why isn’t α1-acid glycoprotein mentioned in the story?

A: It is: the Herald’s just having font problems and calling it Î±1-acid glycoprotein.

Q: Are they right? Is interleukin 6 really better than α1-acid glycoprotein?

A: We can’t really tell just from this one study, any more than we could really tell α1-acid glycoprotein was better from the study that liked it.

Q: How accurate is the prediction?

A: Well, suppose you were given the name of a 55-year-old and had to guess whether they’d die in the next five years. What would you guess?

Q: Umm. No?

A: Very good. In this study, over 98% of the people didn’t die in the first five years of followup, so you’d be about 98% accurate knowing nothing.

Q: And knowing their interleukin 6 levels?

A: About 98% accurate.

Q: So it’s useless?

A: No, not at all. Comparing people at the top and bottom of the middle 50% of the distribution for interleukin-6 was like comparing smokers to non-smokers for short-term death rate. It’s just that will you/won’t you die in five years is not the right question for reasonably healthy middle-aged people.

Q: So it could be important for insurance, then?

A: In principle, if you wanted to undermine the usefulness of insurance.  It’s more useful for science — either understanding how inflammation has its effects, or trying to rule it out as an explanation of a correlation.
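The "about 98% accurate either way" point can be illustrated with a small sketch. The cohort below is invented: two equal-sized groups, with the higher-marker group given twice the death rate of the lower one. None of these numbers come from the study; they are chosen only so that the overall five-year death rate is just under 2%.

```python
# Hypothetical cohort: (number of people, deaths within five years).
# Invented numbers for illustration, not results from the study.
groups = {
    "lower IL-6": (5000, 65),
    "higher IL-6": (5000, 130),
}

total_people = sum(n for n, _ in groups.values())
total_deaths = sum(d for _, d in groups.values())

# Knowing nothing: guess "survives" for everyone.
baseline_accuracy = (total_people - total_deaths) / total_people

# Knowing the marker: within each group, guess the more common outcome.
correct_with_marker = sum(max(d, n - d) for n, d in groups.values())
marker_accuracy = correct_with_marker / total_people

# Relative risk of death, higher group versus lower group.
n_lo, d_lo = groups["lower IL-6"]
n_hi, d_hi = groups["higher IL-6"]
relative_risk = (d_hi * n_lo) / (d_lo * n_hi)

print("Accuracy knowing nothing:", baseline_accuracy)
print("Accuracy knowing the marker:", marker_accuracy)
print("Relative risk:", relative_risk)
```

Because death is rare in both groups, the best guess is still "survives" for everyone, so accuracy does not move even though the risk doubles. That is why accuracy of a yes/no prediction is the wrong yardstick here.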


November 25, 2016


It’s well into Thanksgiving Day in the US now, and that’s a nice tradition to export. So, today, I’m thankful for geophysics.

In the Late Bronze Age, it made perfect sense that earthquakes were caused by God or gods getting upset. That, on a larger scale, is how people often behave, and whether we are made in God’s image or he in ours, you’d expect some similarities.  And when an earthquake destroys a city, well, whether you think God is more offended by homosexuality or homelessness, by not giving enough to the temple or not giving enough to the poor, there’s going to be something in any major city to piss him off.

Now we have maps like this one from GNS Science:
and this one, which I made for a very early StatsChat post, showing all sufficiently-large earthquakes from 1973 to mid-2011.


Working from travellers’ tales in the Middle East it would be impossible to see the patterns, but technologies including GPS, helicopters, the internet, and a worldwide network of seismometers make them much clearer. Earthquakes mostly happen along a small set of lines, and scientists can measure the strains in the rock around those lines that lead to the earth rupturing. The global pattern, together with a vast network of other evidence, fits an explanation where whole continents are pushed around on the Earth by convection deep inside, bumping and grinding as they collide. It doesn’t fit an explanation based on human behaviour being different in different places, even though that might seem a less grandiose explanation before we got the data.

There’s a lot we don’t know about earthquakes, but we understand them well enough to make high-risk/low-risk predictions, to describe the patterns of aftershocks, to do tsunami warnings (on a good day), and to buy and sell earthquake insurance.  We don’t know exactly why one building is destroyed and another is spared, but there aren’t any mysteries about it: it’s the sort of thing we could work out given time and money.

Science isn’t a pure good; there are many things we can do with more knowledge of the world, and the blue circles on the world map above show some seismic events that are the result of human action. But even they have become less frequent.

And now that God has gotten out of the natural-disaster business, many people in this country don’t believe in him, and those that do still believe mostly (with sad exceptions) have a higher opinion of him than their ancestors did.

November 24, 2016


  • “The problem scientists have to face here isn’t whether the data is real, but whether this is an appropriate way to represent it.” On the sea-ice graphic that’s going around.
  • “Using the language of economics, judgment is a complement to prediction and therefore when the cost of prediction falls demand for judgment rises. We’ll want more human judgment.” Harvard Business Review
  • Apps blamed for rise in road deaths (NY Times)
  • The sort of basic search skills Tim O’Reilly describes can also be applied to non-political fake news. If you start with “Ice cream for breakfast makes you smarter, claims scientist” from the Herald you can easily find the Japanese story that’s the source. If you look a little harder, as my brother did, you can find the 2013 story on the same Japanese site, which has a little more detail. Using Google Translate, the research was sponsored by an ice-cream company and the source for the story is the company website. The researcher is real, but the research appears not to have been published — and there has been plenty of time since 2013.   Ice-cream doesn’t really matter, but the question of which stories in the newspaper we’re supposed to take seriously does matter.
November 23, 2016

Indigenous data – why is it important?

In a data-driven world, indigenous peoples are becoming increasingly concerned about who owns and represents statistics about indigenous people: that is, who has access to the data, its cultural integrity, and how people’s privacy and autonomy are protected.

Not only do governments collect data about their citizens, but so, too, do indigenous peoples about themselves – just think of the data that iwi need to collect about their own people in this post-settlement era. As an example, I’m a registered member of Waikato-Tainui. The central administration knows six or so generations of my whakapapa because becoming registered means putting your links on paper that a kaumatua then signs off. It knows my home marae and all sorts of personal details such as where I live and my birth date. As I have been the privileged recipient of educational scholarships from the iwi, it also knows my academic record and quite a lot of personal stuff about my goals and aspirations.

So why is this important? Indigenous people have historically had a problematic relationship with researchers, academics and other data collectors. Researcher Andrew Sporle, pictured at right (Rangitāne, Ngāti Apa, Te Rārawa) recently told me that “From a Māori perspective, we were all too often the researched, not the researchers, and Māori realities were often portrayed as a strange and inferior ‘other’. Indigenous peoples are asserting the right to govern and protect the data that are so important to our development. We cannot afford to lose control of data about us.”

Data, he added, is a “highly valuable strategic asset” for Māori development. “In the age of big data, Māori want access to data to support our decision‐making and to be involved when big data is used to make decisions about us.”

In this field, things have been moving fast of late, and New Zealand statisticians are among the leaders. Andrew and Tahu Kukutai, pictured left (Ngāti Maniapoto, Te Aupōuri), Associate Professor at the Institute of Demographic and Economic Analysis, University of Waikato, are among the founding members of Te Mana Raraunga (the Māori Data Sovereignty Network), which was set up last year to assert Māori rights and interests in relation to data.

The group’s guiding motto is “He whenua hou, Te Ao Raraunga; Te Ao Raraunga, He whenua hou”, or “Data is a new world, a world of opportunity.”  It advocates “for the development of capacity and capability across the Māori data ecosystem, including data rights and interests, data governance, data storage and security, and data access and control”.

Andrew and Tahu attended last month’s  Indigenous Open Data Summit in Madrid, Spain, alongside independent statisticians Kirikowhai Mikaere (Tūhourangi, Ngāti Whakaue) and James Hudson (Ngāti Pukeko, Ngāti Awa, Ngāi Tai, Tūhoe), a researcher for Auckland Council’s Independent Māori Statutory Board. The summit, a first of its kind, provided a forum to discuss what action was being taken to protect the use of data about indigenous peoples.

Tahu and John Taylor, Emeritus Professor at the Centre for Aboriginal Economic Policy Research at the Australian National University,  have edited the just-released first book on indigenous data, titled Indigenous Data Sovereignty – Towards an Agenda, published by ANU Press.

It’s free to download and provides a comprehensive overview of why indigenous oversight of data is important, focusing largely on Australasia. It’s an interesting read and provides a perspective on data that has been missing for too long.

The local contributors include Darin Bishop (Ngāruahine, Taranaki), team leader of organisational knowledge at Te Puni Kōkiri, the Ministry of Māori Development; Dickie Farrar (Whakatōhea, Te Whānau ā Apanui, Te Aitanga ā Mahaki), CEO of the Whakatōhea Māori Trust Board;  James Hudson, mentioned above; Maui Hudson (Ngāruahine, Te Mahurehure, Whakatōhea), Associate Professor in the Faculty of Māori and Indigenous Studies at the University of Waikato; GP Rawiri Jansen (Ngati Hinerangi); Lesley McLean (Whakatōhea, Te Whānau ā Apanui), tribal database coordinator for the Whakatōhea Māori Trust Board; and leading demographer Ian Pool, Emeritus Professor at Waikato University.



November 20, 2016

Gained in translation

From a talk  at the workshop on Fairness, Accountability, and Transparency in Machine Learning, via Twitter


There’s obviously something wrong with these translations, but it’s also hard to do better.

To step back, there has classically been a translation problem where Greek and Latin have separate words for man as distinguished from woman and for man ‘as distinguished from beasts and angels’. It can be quite hard to guess which word was in the original source, if you’re working from the English translation.  This problem has a simple solution, since modern English also has a clear (and increasingly unavoidable) distinction between ‘man’ on the one hand and  ‘human’ or ‘person’ on the other.

This isn’t that problem.  It’s kind of the opposite.

The correct translation of “O bir doktor” is one of “He is a doctor”, “She is a doctor”, and “They are a doctor” and the correct translation of “O bir hemşire” is one of “He is a nurse”, “She is a nurse”, and “They are a nurse”. Without more context, though, you can’t tell which, and none of them is unmarked or neutral. “He” and “She” are obviously too narrow, and while singular “They” has always been standard English for an unspecified individual, it is only recently standard for a specific individual if they have asked to be referred to that way because of non-binary gender identification.

This is an example where the ambiguities probably have to be put back in by humans, because predictive analytics is unavoidably going to follow the stereotypes. Or, as a new Harvard Business Review article rather optimistically says about the impacts of machine learning:

Using the language of economics, judgment is a complement to prediction and therefore when the cost of prediction falls demand for judgment rises. We’ll want more human judgment.

November 18, 2016


  • “So what I got from reading some of Clinton’s email is another piece of evidence confirming my intuition that political systems scale poorly.”
  • Cathy O’Neil on a program at Georgia State University: Here’s the thing. One of the hallmark characteristics of a WMD is that it punishes the poor, the unlucky, the sick, or the marginalized. This algorithm does the opposite – it offers them help.
November 15, 2016

Fake news and AI

From Russell Brown at Public Address

As Facebook moved from human curation to trust artificial intelligence to sift its stories, fakery exploded. It was a Google algorithm, not an editor, that made a wholly false claim about the popular vote the “top” story in its rankings. The idea that AI will actually write most of the news we see is genuinely horrifying.