Posts filed under Social Media (90)

March 8, 2017

Yes, November 19


The graph is from a Google Trends search for “International Men’s Day”.

There are two peaks. In the majority of years, the larger peak is on International Women’s Day, and the smaller peak is on the day itself.

November 26, 2016

Garbage numbers from a high-level source

The World Economic Forum (the people who run the Davos meetings) are circulating this graph:

According to the graph, New Zealand is at the bottom of the OECD, with 0% waste composted or recycled.  We’ve seen this graph before, with a different colour scheme. The figure for NZ is, of course, utterly bogus.

The only figure the OECD report had on New Zealand was for landfill waste, so obviously landfill waste was 100% of that figure, and other sources were 0%.   If that’s the data you have available, NZ should just be left out of the graph — and one might have hoped the World Economic Forum had enough basic cluefulness to do so.

A more interesting question is what the denominator should be. The definition the OECD was going for was all waste sent for disposal from homes and from small businesses that used the same disposal systems as homes. That’s a reasonable compromise, but it’s not ideal. For example, it excludes composting at home. It also counts reuse and reduced use of recyclable or compostable materials as bad rather than good.

But if we’re trying to approximate the OECD definition, roughly where should NZ be? I can’t find figures for the whole country, but there’s some relevant, if outdated, information in Chapter 3 of the Waste Assessment for the Auckland Council Waste Management Plan. If you count just kerbside recycling pickup as a fraction of kerbside recycling plus waste pickup, the diversion figure is 35%. That doesn’t count composting, and it’s from 2007-8, so it’s an underestimate. Based on this, NZ is probably between the USA and Australia on the graph.
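
As a rough illustration of the arithmetic, here is a minimal Python sketch of the diversion calculation. The tonnages are hypothetical, chosen only so the result matches the 35% above, and the function name `diversion_rate` is made up for this example. It also shows one way to keep a missing figure as "not available" rather than letting it turn into 0%.

```python
from typing import Optional


def diversion_rate(recycled: Optional[float],
                   landfilled: Optional[float]) -> Optional[float]:
    """Share of kerbside pickup that is recycled rather than landfilled.

    Returns None when either component is unknown, so that a missing
    figure is never silently reported as 0%.
    """
    if recycled is None or landfilled is None:
        return None
    total = recycled + landfilled
    if total <= 0:
        raise ValueError("total pickup must be positive")
    return recycled / total


# Hypothetical tonnages, chosen only so the result matches the 35% in the text.
rate = diversion_rate(recycled=35_000, landfilled=65_000)
print(f"diversion rate: {rate:.0%}")

# A country with only a landfill figure: the answer is unknown, not zero.
unknown = diversion_rate(recycled=None, landfilled=65_000)
print("diversion rate:", "not available" if unknown is None else f"{unknown:.0%}")
```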

May 29, 2016

I’ma let you finish

Adam Feldman runs the blog Empirical SCOTUS, with analyses of data on the Supreme Court of the United States. He has a recent post (via Mother Jones) showing how often each judge was interrupted by other judges last year:


For those of you who don’t follow this in detail, Elena Kagan and Sonia Sotomayor are women.

Looking at the other end of the graph, though, shows something that hasn’t been taken into account. Clarence Thomas wasn’t interrupted at all. That’s not primarily because he’s a man; it’s primarily because he almost never says anything.

Interpreting the interruptions really needs some denominator. Fortunately, we have denominators. Adam Feldman wrote another post about them.

Here’s the number of interruptions per 1000 words, with the judges sorted in order of how much they speak:


And here’s the same thing with interruptions per 100 ‘utterances’:


It’s still pretty clear that the female judges are interrupted more often (yes, this is statistically significant (though not very)). Taking the amount of speech into account makes the differences smaller, but, interestingly, also shows that Ruth Bader Ginsburg is interrupted relatively often.
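
The two normalisations are simple enough to sketch in Python. The speakers and counts below are invented for illustration and are not Adam Feldman’s data; the helper name `rate_per` is also made up. The sketch shows how a speaker with fewer raw interruptions can still have the higher rate, and why someone who never speaks has no meaningful rate at all.

```python
from typing import Optional


def rate_per(count: int, denominator: int, unit: int) -> Optional[float]:
    """Events per `unit` of the denominator, or None if there is no denominator."""
    if denominator == 0:
        return None  # someone who never speaks cannot have an interruption rate
    return unit * count / denominator


# Hypothetical counts for made-up speakers (not real court data).
speakers = {
    "Speaker A": {"interruptions": 30, "words": 20_000, "utterances": 400},
    "Speaker B": {"interruptions": 10, "words": 2_000, "utterances": 50},
    "Speaker C": {"interruptions": 0, "words": 0, "utterances": 0},
}

rates = {
    name: {
        "per_1000_words": rate_per(d["interruptions"], d["words"], 1000),
        "per_100_utterances": rate_per(d["interruptions"], d["utterances"], 100),
    }
    for name, d in speakers.items()
}

for name, r in rates.items():
    print(name, r)
```

Speaker A has three times as many interruptions as Speaker B in raw counts, but Speaker B is interrupted more often per word and per utterance.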

Denominators do matter.

April 28, 2016

Māori imprisonment statistics: not just age

Jarrod Gilbert had a piece in the Herald about prisons:

Fifty per cent of the prison population is Maori. It’s a fact regularly cited in official documents, and from time to time it garners attention in the media. Given they make up 15 per cent of the population, it’s immediately clear that Maori incarceration is highly disproportionate, but it’s not until the numbers are given a greater examination that a more accurate perspective emerges.

The numbers seem dystopian, yet they very much reflect the realities of many Maori families and neighbourhoods.

to know what he was talking about, qualitatively. I mean, this isn’t David Brooks.

It turns out that while you can’t easily get data on ethnicity by age in the prison population, you can get data on age, and that this is enough to get a good idea of what’s going on, using what epidemiologists call “indirect standardisation”.

Actually, you can’t even easily get data on age, but you can get a graph of age:

and I resorted to software that reconstructs the numbers.

Next, I downloaded Māori population estimates by age and total population estimates by age from StatsNZ, for ages 15-84.  The definition of Māori won’t be exactly the same as in Dr Gilbert’s data. Also, the age groups aren’t quite right because we’d really like the age when the offence happened, not the current age.  The data still should be good enough to see how big the age bias is. In these age groups, 13.2% of the population is Māori by the StatsNZ population estimate definition.

We know what proportion of the prison population is in each age group, and we know what the population proportion of Māori is in each age group, so we can combine these to get the expected proportion of Māori in the prison population accounting for age differences. It’s 14.5%.  Now, 14.5% is higher than 13.2%, so the age-adjustment does make a difference, and in the expected direction, just not a very big difference.

We can also see what happens if we use the Māori population proportion from the next-younger five-year group, to allow for offences being committed further in the past. The expected proportion is then 15.3%, which again is higher than 13.2%, but not by very much. Accounting for age, it looks as though Māori are still more than three times as likely to be in prison as non-Māori.
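
For anyone who wants to see the mechanics, here is a minimal Python sketch of the indirect standardisation step. The age bands and shares are hypothetical placeholders, not the reconstructed prison data or the StatsNZ estimates, and the function name `expected_share` is made up. The only number taken from the text is the observed 50%.

```python
def expected_share(prison_share_by_age, group_share_by_age):
    """Expected share of a group in prison if only age structure mattered.

    prison_share_by_age: fraction of the prison population in each age band.
    group_share_by_age:  fraction of the general population in each age band
                         that belongs to the group of interest.
    """
    if abs(sum(prison_share_by_age.values()) - 1.0) > 1e-9:
        raise ValueError("prison shares must sum to 1")
    return sum(prison_share_by_age[band] * group_share_by_age[band]
               for band in prison_share_by_age)


# Hypothetical age bands and shares, NOT the figures used in the post.
prison_share_by_age = {"15-24": 0.25, "25-39": 0.40, "40-59": 0.30, "60-84": 0.05}
maori_share_by_age = {"15-24": 0.20, "25-39": 0.15, "40-59": 0.11, "60-84": 0.06}

expected = expected_share(prison_share_by_age, maori_share_by_age)
observed = 0.50  # the "fifty per cent" quoted from the Herald piece

print(f"expected share from age alone: {expected:.1%}")
print(f"observed / expected: {observed / expected:.1f}")
```

The weighted sum is the whole method: each age band contributes its share of the prison population times the Māori share of that band in the general population. Comparing the observed share with this expected share is what shows how little of the gap age can explain.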

You might then say there are lots of other variables to be looked at. But age is special. If it turned out that Māori incarceration rates could be explained by poverty, that wouldn’t mean their treatment by society was fair; it would suggest that poverty was how it was unfair. If the rates could be explained by education, that wouldn’t mean their treatment by society was fair; it would suggest education was how it was unfair. But if the rates could be explained by age, that would suggest the system was fair. They can’t be.

April 27, 2016

Not just an illusion

There’s a headline in the Independent: “If you think more celebrities are dying young this year, you’re wrong – it’s just a trick of the mind”. And, in a sense, Ben Chu is right. In a much more important sense, he’s wrong.

He argues that there are more celebrities at risk now, which there are. He says a lot of these celebrities are older than we realise, which they are. He says that the number of celebrity deaths this year is within the scope of random variation looking at recent times, which may well be the case. But I don’t think that’s the question.

Usually, I’m taking the other side of this point. When there’s an especially good or especially bad weekend for road crashes, I say that it’s likely just random variation, and not evidence for speeding tolerances or unsafe tourists or breath alcohol levels. That’s because usually the question is whether the underlying process is changing: are the roads getting safer or more dangerous.

This time there isn’t really a serious question of whether karma, global warming, or spiders from Mars are killing off celebrities. We know it must be a combination of understandable trends and bad luck that’s responsible. But there really have been more celebrities dying this year. Prince is really dead. Bowie is really dead. Victoria Wood, Patty Duke, Ronnie Corbett, Alan Rickman, Harper Lee: 2016 has actually happened this way. It hasn’t been (to steal a line from Daniel Davies) just a particularly inaccurate observation of the underlying population and mortality patterns.

April 11, 2016

Missing data

Sometimes…often…practically always… when you get a data set there are missing values. You need to decide what to do with them. There’s a mathematical result that basically says there’s no reliable strategy, but different approaches may still be less completely useless in different settings.

One tempting but usually bad approach is to replace them with the average; it’s especially bad with geographical data. We’ve seen this go badly wrong with kidnappings in Nigeria, we’ve seen maps of vaccine-preventable illness at epidemic proportions in the west Australian desert, and we’ve seen Kansas misidentified as the porn centre of the United States.

The data problem that attributed porn to Kansas has more serious consequences. There’s a farm not far from Wichita that, according to the major database providing this information, has 600 million IP addresses.  Now think of the reasons why someone might need to look up the physical location of an internet address. Kashmir Hill, at Fusion, looks at the consequences, and at how a better “don’t know” address is being chosen.
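
The averaging problem is easy to reproduce. In this minimal Python sketch the coordinates are invented and `None` stands for a record with no known location; nothing here comes from the database discussed above. Filling the gaps with the average turns the middle of the map into the busiest place on it.

```python
from collections import Counter
from statistics import mean

# Hypothetical (latitude, longitude) records; None means the location is unknown.
records = [(-36.8, 174.8), (-41.3, 174.8), None, None, None, (-43.5, 172.6)]

known = [p for p in records if p is not None]
centre = (mean(lat for lat, _ in known), mean(lon for _, lon in known))

# The tempting approach: replace every missing location with the average.
imputed = [p if p is not None else centre for p in records]

# The average location now looks like the busiest place in the data,
# although no record is actually known to be there.
hotspot, hotspot_count = Counter(imputed).most_common(1)[0]
print("apparent hotspot:", hotspot, "with", hotspot_count, "records")

# Keeping the gaps explicit avoids inventing a hotspot.
print("known locations:", len(known), "unknown:", records.count(None))
```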

April 9, 2016

Movie stars broken down by age and sex

The folks at Polygraph have a lovely set of interactive graphics of the number of speaking lines in 2000 movie screenplays, with IMDB look-ups of actor age and gender. If you haven’t been living in a cave on Mars, the basic conclusion won’t be surprising, but the extent of the differences might. Frozen, for example, gave more than half the lines to male characters.

They’ve also made a lot of data available on Github for other people to use. Here’s a graph combining the age and gender data in a different way than they did: total number of speaking lines by age and gender.


Men and women have similar numbers of speaking lines up to about age 30, but after that there’s a huge separation and much less opportunity for female actors. We can all think of exceptions: Judi “M” Dench, Maggie “Minerva” Smith, Joanna “Absolutely no relation” Lumley, but they are exceptions.
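
The aggregation behind a graph like this is just a grouped sum. Here is a minimal Python sketch. The records and field names (`gender`, `age`, `lines`) are hypothetical and do not reflect the layout of the Polygraph files, and the ten-year age bands are an arbitrary choice for the example.

```python
from collections import defaultdict

# Hypothetical role records; the real data set has its own column names.
roles = [
    {"gender": "f", "age": 24, "lines": 120},
    {"gender": "m", "age": 27, "lines": 110},
    {"gender": "f", "age": 45, "lines": 30},
    {"gender": "m", "age": 48, "lines": 150},
    {"gender": "m", "age": 52, "lines": 90},
    {"gender": "f", "age": 61, "lines": 20},
]

# Total speaking lines for each (ten-year age band, gender) pair.
totals = defaultdict(int)
for role in roles:
    band = 10 * (role["age"] // 10)  # e.g. age 27 falls in the band starting at 20
    totals[(band, role["gender"])] += role["lines"]

for (band, gender), lines in sorted(totals.items()):
    print(f"{band}-{band + 9} {gender}: {lines}")
```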

March 18, 2016

What they aren’t telling you is a beautiful visualisation of what news topics are less covered in your country (or any selected country) than on average for the world:


For a lot of these topics it will be obvious why they’re just not that relevant, but not always.

(via Harkanwal Singh)

November 1, 2015

Twitter polls and news feeds


I don’t know why this feels worse than the bogus clicky polls on newspaper websites. Maybe it’s the thought of someone actually believing the sampling scheme says something useful. Maybe it’s being in Twitter, where following a news headline feed usually gets you news headlines. Maybe it’s that the polls are so bad: restricting a discussion of Middle East politics to two options with really short labels makes even the usual slogan-based dialogue look good in comparison.

In any case, I really hope this turns out to be a failed experiment, and that we can keep Twitter polls basically as jokes.


October 30, 2015

Pie charts “a menace”, study shows

StatsChat can reveal exclusive study results showing that pie charts are a menace to over 75% of us.

Although these round, delicious, data metaphors have been maligned in the past, this is the first research of its kind, based on newly-available survey technology.

Researchers used an online, multi-wave, respondent-driven sampling scheme to reach thousands of potential respondents. 77% of responses agreed that pie charts are a menace.


Aren’t these new Twitter polls wonderful?