Showing us the money

The Herald is running a project to crowdsource data entry and annotation for NZ political donations and expenses: it’s something that’s hard to automate and where local knowledge is useful. Today, they have an interactive graph for 2014 election donations and have made the data available.


Wiki New Zealand site revamped

We’ve written before about Wiki New Zealand, which aims to ‘democratise data’. WNZ has revamped its website to make things clearer and cleaner, and you can browse here.

As I’m a postgraduate scarfie this year, the table on domestic students in tertiary education interested me – it shows that women (grey) are enrolled in greater numbers than men at every single level. Click the graph to embiggen.

Founder Lillian Grace talks about the genesis of Wiki New Zealand here, and for those who love the techy side, here's a video about the backend.












Another interesting thing about petrol prices

or What I Did At Open Data Day.

The government monitoring data on petrol prices go back to 2004, and while they show their data as time series, there are other ways to look at it.


The horizontal axis is the estimated cost of imported petrol plus all the taxes and levies. The vertical axis is the rest of the petrol price: it covers the cost of hauling the stuff around the country, the cost of running petrol stations, and profit for both petrol stations and companies.
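
For anyone who wants to redo the split, here is a minimal sketch of the arithmetic behind each point on the graph. The function name and the weekly figures are made up for illustration; the real numbers would come from the government monitoring data.

```python
def importer_margin(retail_price, import_cost, taxes_and_levies):
    """Return the importer margin in cents per litre.

    All arguments are in cents per litre. The margin is whatever is left
    of the pump price once the estimated cost of the imported petrol and
    all taxes and levies have been taken out.
    """
    return retail_price - (import_cost + taxes_and_levies)


# Hypothetical weekly figures (cents per litre), not real monitoring data.
weeks = [
    {"retail": 200.0, "import_cost": 95.0, "taxes": 90.0},
    {"retail": 210.0, "import_cost": 100.0, "taxes": 90.0},
    {"retail": 195.0, "import_cost": 80.0, "taxes": 90.0},
]

# Each point is (horizontal axis, vertical axis) = (costs, margin).
points = [
    (
        w["import_cost"] + w["taxes"],
        importer_margin(w["retail"], w["import_cost"], w["taxes"]),
    )
    for w in weeks
]

for costs, margin in points:
    print(f"costs {costs:.0f}c/litre, importer margin {margin:.0f}c/litre")
```

Plotting the second number against the first, one point per week, gives the kind of picture described here.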

There’s an obvious change in 2012. From 2005 to 2012, the importer margin varied around 15c/litre, more or less independent of the costs. From 2012, the importer margin started rising, without any big changes in costs.

Very recently, things changed again: the price of crude oil fell, with the importer margin staying roughly constant and the savings being passed on to consumers. Then the New Zealand dollar fell, and the importer margin has fallen — either the increased costs from the lower dollar are being absorbed by the vendors, or they have been hedged somehow.


Petrol prices

From time to time I like to remind people about the national petrol price monitoring program. For example, when there’s a call for a review of fuel prices.

The Ministry of Business, Innovation & Employment (Economic Development Information) carries out weekly monitoring of “importer margins” for regular petrol and automotive diesel.  The weekly oil prices monitoring report is reissued each week with the previous week’s data.

The importer margin is the amount available to retailers to cover domestic transportation, distribution and retailing costs, and profit margins.

The purpose of this monitoring is to promote transparency in retail petrol and diesel pricing, and it is a key recommendation from the New Zealand Petrol Review.

The importer margin for petrol over the past three years looks like this:


The wiggly blue line is the week-by-week estimated margin; the shaded area is centered around the red trend line and covers 50% of the data. The margin had been going up; the calls for a review came just after it plummeted.
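
One way to build a band like that is sketched below, under some loud assumptions: the trend here is a straight least-squares line (the graph above may well use a smoother), the half-width of the band is the median absolute distance from the trend, and the weekly margins are invented. The function names are mine, not anything from the monitoring site.

```python
from statistics import fmean, median


def linear_trend(xs, ys):
    """Fitted values from an ordinary least-squares straight line."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in xs]


def fifty_percent_band(xs, ys):
    """Return (lower, trend, upper) for each x.

    The band is centred on the trend, and its half-width is the median
    absolute residual, so about half the observations fall inside it.
    """
    trend = linear_trend(xs, ys)
    half_width = median(abs(y - t) for y, t in zip(ys, trend))
    return [(t - half_width, t, t + half_width) for t in trend]


# Invented weekly margins (cents per litre), for illustration only.
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
margins = [15, 17, 14, 19, 18, 22, 20, 25]

band = fifty_percent_band(weeks, margins)
inside = sum(lo <= y <= hi for y, (lo, _, hi) in zip(margins, band))
print(f"{inside} of {len(margins)} weeks fall inside the band")
```

Using the median absolute residual is what makes the band "centred on the trend and covering 50% of the data"; taking the upper and lower quartiles of the residuals separately would also cover half the data, but the band would no longer be symmetric around the line.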

At the same site, but updated only quarterly, is an international comparison of the cost of fuel broken down into tax and everything else.


Caricatures and credits


A lot of surprisingly popular accounts on Twitter just tweet pictures, without giving any sources, and often with captions that are misleading or just wrong. One from yesterday had a picture of a picnic on a highway in the Netherlands in 1973 and described it as being from the US.

Here’s one that came from @AmazingMaps, today, captioned “Most popular word used in online dating profiles by state”



Could it really be true that ‘NASCAR’ is the most popular word in Indiana dating profiles? Or that ‘oil’ is the most popular word in Texas? Have the standard personal-ad clichés become completely outdated? Aren’t Americans easy-going any more? Doesn’t anyone care about romance or honesty or humour?

We’ve seen this sort of analysis before on StatsChat. It’s designed to produce a caricature, though not necessarily in a bad way. This one comes from Mashable, based on analysis by The original post says

Essentially, they broke down which words are used with relative frequency in certain states, as compared to relative infrequency in the rest of the country.

That is, the map has ‘oil’ for Texas and ‘NASCAR’ for Indiana not because these words were used very often in those states, but because they were used much less often in other states. Most Indiana dating profiles probably don’t mention NASCAR, but a much higher proportion do than in, say, New York or Oregon. Most Texas dating profiles don’t talk about oil, but it’s more common in Texas than in Maine or Tennessee. It’s not that everyone in Oregon or Idaho kayaks, but a lot more do than in Iowa or Kansas.
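
To see the difference between "most common" and "most distinctive", here is a small sketch with invented word counts. I don’t know exactly how the original analysis was done, so the ratio of relative frequencies and the crude add-one smoothing below are my assumptions, as are the function names and every number.

```python
from collections import Counter

# Invented word counts from dating profiles, for illustration only.
profiles = {
    "Indiana": Counter({"love": 900, "fun": 800, "NASCAR": 60}),
    "Texas": Counter({"love": 2000, "fun": 1900, "oil": 150, "NASCAR": 20}),
    "Oregon": Counter({"love": 700, "fun": 750, "kayak": 50, "NASCAR": 2}),
}


def most_common_word(state):
    """The word used most often in the state's profiles."""
    return profiles[state].most_common(1)[0][0]


def most_distinctive_word(state, smoothing=1.0):
    """The word whose rate in this state is highest relative to elsewhere."""
    here = profiles[state]
    elsewhere = Counter()
    for other, counts in profiles.items():
        if other != state:
            elsewhere.update(counts)
    here_total = sum(here.values())
    elsewhere_total = sum(elsewhere.values())

    def ratio(word):
        rate_here = here[word] / here_total
        # Smoothing avoids dividing by zero for words unseen elsewhere.
        rate_elsewhere = (elsewhere[word] + smoothing) / (elsewhere_total + smoothing)
        return rate_here / rate_elsewhere

    return max(here, key=ratio)


for state in profiles:
    print(state, most_common_word(state), most_distinctive_word(state))
```

In this toy data ‘NASCAR’ makes up only about 3% of the Indiana words, but it still wins on the ratio because it is so much rarer everywhere else. That’s the caricature effect.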


When this map first came out, in November, there were lots of stories about it, typically getting things wrong (eg an NBC motor sports site had the headline “‘NASCAR’ is most frequently used word among Indiana online dating profiles”). That’s still bad, but most of these sites had links or at least mentioned the source of the map, so that people who care could find out what the facts are. @AmazingMaps seems confident none of its followers care.

Looking under the lamppost

Harkanwal Singh, at the Herald, has a very nice animation of known meteorite locations around the world and over time, as part of the report on Wednesday night’s fireball.  Here’s a still of the last frame: click to expand.


This is basically a map of sampling bias. That is, meteorites hit the Earth uniformly by longitude and over time, though with a preference for the tropics over the poles. The bias towards the tropics is fairly slight by real area, but the Mercator projection will amplify it. From a 1964 paper by Ian Halliday:


That’s not what the map looks like.

The first part of the sampling bias is that a meteorite basically has to hit land to be counted: if it hits ocean it will sink without a trace.

It’s easier to find meteorites in places where they don’t bury themselves in soil or get eroded, so we see lots of them in desert or in ice. You don’t get many found in the Amazon, but there are lots just to the west in the Atacama desert of Chile.

In non-ideal circumstances it helps if there’s a fairly dense population of observers and scientists: meteorites in the modern US have a reasonable chance of being found even in non-ideal countryside. And finally, some places are easier to search than others. There’s a sharp drop-off in meteorite finds between Oman and Yemen. This isn’t due to a dramatic geological or weather boundary; it has the same causes as the 13-year difference in life expectancy.

Two types of brain image study

If a brain imaging study finds greater activation in the asymmetric diplodocus region or increased thinning in the posterior homiletic, what does that mean?

There are two main possibilities. Some studies look at groups who are different and try to understand why. Other studies try to use brain imaging as an alternative to measuring actual behaviour. The story in the Herald (from the Washington Post), “Benefit of kids’ music lessons revealed – study” is the second type.

The researchers looked at 334 MRI brain images from 232 young people (so mostly one each, some with two or three), and compared the age differences in young people who did or didn’t play a musical instrument.  A set of changes that happens as you grow up happened faster for those who played a musical instrument.

“What we found was the more a child trained on an instrument,” said James Hudziak, a professor of psychiatry at the University of Vermont and director of the Vermont Center for Children, Youth and Families, “it accelerated cortical organisation in attention skill, anxiety management and emotional control.”

An obvious possibility is that kids who play a musical instrument have different environments in other ways, too.  The researchers point this out in the research paper, if not in the story.  There’s a more subtle issue, though. If you want to measure attention skill, anxiety management, or emotional control, why wouldn’t you measure them directly instead of measuring brain changes that are thought to correlate with them?

Finally, the effect (if it is an effect) on emotional and behavioural maturation (if it is on emotional and behavioural maturation) is very small. Here’s a graph from the paper:

The green dots are the people who played a musical instrument; the blue dots are those who didn’t.  There isn’t any dramatic separation or anything — and to the extent that the summary lines show a difference it looks more as if the musicians started off behind and caught up.

Thousand words edition:

  • From the Sydney Morning Herald (I’m in the West Island at the moment), new recommendations for amounts of sleep now have extra ‘may be appropriate’ uncertainty fringes around the central band, representing our lack of real knowledge about sleep. If you are an adult and get five or fewer hours of sleep a night, you aren’t getting enough. On the other hand, you probably have a small child and know you aren’t getting enough, or are Margaret Thatcher.


  • A graph for showing inequality. This has potential, but it would be more convincing if the examples involved real data. School decile data would be one possibility.
  • Orange and blue: A circular histogram of the colour profiles in film trailers (from, via)

Meet Statistics summer scholar Christopher Pearce

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Christopher, right, is working on the OpenAPI project with Associate Professor Paul Murrell. Chris explains:

“Government data is becoming increasingly available. However, this does not mean it is readable – few individuals possess the knowledge and skills to make use of these data by themselves.

“In an ideal world, the code used by fellow statisticians would be available to everyone. It would be even more ideal if it were transferable. Sites like Wiki New Zealand  are doing a remarkable job of displaying some of New Zealand’s trends, but with no source code it can sometimes be impossible to recreate.

“The OpenAPI project is developing a flow-based framework that is primarily aimed at lowering the barriers to use of open data by the general public. My project is about creating an architecture for programmers and statisticians of all levels. Our goal is for anyone interested to have the ability to perform analyses on open government data. The idea is that there are publicly available snippets of code from fellow statisticians that can be easily linked in a meaningful way. The less expertise required by the end user, the better.

“My job is to come up with questions I am interested in answering, then figure out how a potential lay observer would solve them. So far it has yielded some interesting results.

“I’m a third-year student at the University of Auckland, studying a Bachelor of Laws/Bachelor of Science conjoint. My skills lie in statistics and computer science, but I need the literal side to keep a balanced life.

“I got hooked on statistics when I discovered the Poisson distribution. There’s something about statistics that never seems to get old, and I’m discovering new things every day. It’s nice knowing I can actually attempt an answer to the curiosities in my head.”

Meet Statistics summer scholar Daniel van Vorsselen

Every year, the Department of Statistics offers summer scholarships to a number of students so they can work with staff on real-world projects. Daniel, right, is working on a project called Working with data from conservation monitoring schemes with Associate Professor Rachel Fewster. Daniel explains:

“The university is involved in a project called CatchIT, an online system that aims to help community conservation schemes by providing users with a place where they can input and store their data for reference. The project also produces maps and graphics so that users can assess the effectiveness of their conservation schemes and identify areas where changes can be made.

“My role in the project is to help analyse the data that users put into the project. This involves correctly formatting and cleaning the data so that it is usable. I assist users in the technical aspects relating to their data and help them communicate their data in a meaningful way.

“It’s important to maintain and preserve the wildlife and plant species we have in New Zealand so that future generations have the opportunity to experience them as we have. Our environments are a defining factor of our culture and lifestyles as New Zealanders and we have a large number of native species in New Zealand. It would be a shame to see them eradicated.

“I am currently studying a BCom/BA conjoint, majoring in Statistics, Economics and Finance. I’m hoping to do Honours in statistics and I am looking at a career in banking.

“Over summer, I hope to enjoy the nice weather, whether out on the boat fishing, at the beach or going for a run.”