Posts filed under Social Media (88)

May 29, 2016

I’ma let you finish

Adam Feldman runs the blog Empirical SCOTUS, with analyses of data on the Supreme Court of the United States. He has a recent post (via Mother Jones) showing how often each judge was interrupted by other judges last year:


For those of you who don’t follow this in detail, Elena Kagan and Sonia Sotomayor are women.

Looking at the other end of the graph, though, shows something that hasn’t been taken into account. Clarence Thomas wasn’t interrupted at all. That’s not primarily because he’s a man; it’s primarily because he almost never says anything.

Interpreting the interruptions really needs some denominator. Fortunately, we have denominators. Adam Feldman wrote another post about them.

Here’s the number of interruptions per 1000 words, with the judges sorted in order of how much they speak:


And here’s the same thing with interruptions per 100 ‘utterances’:


It’s still pretty clear that the female judges are interrupted more often (yes, this is statistically significant, though not very). Taking the amount of speech into account makes the differences smaller, but, interestingly, also shows that Ruth Bader Ginsburg is interrupted relatively often.

Denominators do matter.
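As a minimal sketch of the denominator idea: the same raw count can point in opposite directions once it is divided by how much someone speaks. The function name and all the counts below are made up for illustration; they are not Feldman's figures.

```python
from typing import Optional


def rate_per(events: int, exposure: float, per: float = 1000.0) -> Optional[float]:
    """Events per `per` units of exposure; None when there is no exposure."""
    if exposure <= 0:
        # Someone who never speaks has no meaningful interruption rate.
        return None
    return events * per / exposure


# Hypothetical counts, for illustration only (not the Empirical SCOTUS data).
talkative = rate_per(events=40, exposure=20_000)  # more interruptions, lower rate
quiet = rate_per(events=10, exposure=2_000)       # fewer interruptions, higher rate
silent = rate_per(events=0, exposure=0)           # no speech, so no rate at all
```

With these invented numbers the person with fewer interruptions has the higher rate per 1000 words, and the silent one drops out of the comparison rather than appearing as "never interrupted".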

April 28, 2016

Māori imprisonment statistics: not just age

Jarrod Gilbert had a piece in the Herald about prisons

Fifty per cent of the prison population is Maori. It’s a fact regularly cited in official documents, and from time to time it garners attention in the media. Given they make up 15 per cent of the population, it’s immediately clear that Maori incarceration is highly disproportionate, but it’s not until the numbers are given a greater examination that a more accurate perspective emerges.

The numbers seem dystopian, yet they very much reflect the realities of many Maori families and neighbourhoods.

to know what he was talking about, qualitatively. I mean, this isn’t David Brooks.

It turns out that while you can’t easily get data on ethnicity by age in the prison population, you can get data on age, and that this is enough to get a good idea of what’s going on, using what epidemiologists call “indirect standardisation”.

Actually, you can’t even easily get data on age, but you can get a graph of age:

and I resorted to software that reconstructs the numbers.

Next, I downloaded Māori population estimates by age and total population estimates by age from StatsNZ, for ages 15-84.  The definition of Māori won’t be exactly the same as in Dr Gilbert’s data. Also, the age groups aren’t quite right because we’d really like the age when the offence happened, not the current age.  The data still should be good enough to see how big the age bias is. In these age groups, 13.2% of the population is Māori by the StatsNZ population estimate definition.

We know what proportion of the prison population is in each age group, and we know what the population proportion of Māori is in each age group, so we can combine these to get the expected proportion of Māori in the prison population accounting for age differences. It’s 14.5%.  Now, 14.5% is higher than 13.2%, so the age-adjustment does make a difference, and in the expected direction, just not a very big difference.
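The calculation described above can be sketched as a weighted average: weight each age group's Māori population share by that age group's share of the prison population. The function name, the age bands, and every number in this example are hypothetical placeholders; they are not the StatsNZ estimates or the reconstructed prison figures.

```python
def expected_share(prison_age_dist, group_share_by_age):
    """Expected share of a group among prisoners if, within every age band,
    the group were imprisoned at the same rate as everyone else."""
    total = sum(prison_age_dist.values())
    return sum(
        (weight / total) * group_share_by_age[age]
        for age, weight in prison_age_dist.items()
    )


# Hypothetical inputs, for illustration only.
prison_age_dist = {"15-29": 0.5, "30-49": 0.4, "50-84": 0.1}        # prisoners by age
maori_share_by_age = {"15-29": 0.20, "30-49": 0.14, "50-84": 0.08}  # population share

expected = expected_share(prison_age_dist, maori_share_by_age)
```

Because prisoners skew young and the Māori population share is higher at younger ages, the expected share comes out above the overall population share, which is the direction of adjustment the text describes.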

We can also see what happens if we use the Māori population proportion from the next-younger five-year group, to allow for offences being committed further in the past. The expected proportion is then 15.3%, which again is higher than 13.2%, but not by very much. Accounting for age, it looks as though Māori are still more than three times as likely to be in prison as non-Māori.
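A rough check of the scale of that last claim, using only the percentages quoted in this post. It assumes the comparison is the observed Māori share of prisoners divided by the age-expected share, which is one reading of the arithmetic rather than a statement of exactly how the figure was derived; the variable names are illustrative.

```python
observed = 0.50            # Māori share of the prison population, as quoted from the Herald
expected_same_age = 0.145  # age-adjusted expected share, same age group
expected_younger = 0.153   # expected share using the next-younger five-year group

# Ratio of observed to expected share under each assumption about age.
ratio_same_age = observed / expected_same_age
ratio_younger = observed / expected_younger
```

Both ratios are above three, so shifting the age groups changes the picture only slightly.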

You might then say there are lots of other variables to be looked at. But age is special.  If it turned out that Māori incarceration rates could be explained by poverty, that wouldn’t mean their treatment by society was fair, it would suggest that poverty was how it was unfair. If the rates could be explained by education, that wouldn’t mean their treatment by society was fair; it would suggest education was how it was unfair. But if the rates could be explained by age, that would suggest the system was fair. They can’t be.

April 27, 2016

Not just an illusion

There’s a headline in the Independent: “If you think more celebrities are dying young this year, you’re wrong – it’s just a trick of the mind”. And, in a sense, Ben Chu is right. In a much more important sense, he’s wrong.

He argues that there are more celebrities at risk now, which there are. He says a lot of these celebrities are older than we realise, which they are. He says that the number of celebrity deaths this year is within the scope of random variation looking at recent times, which may well be the case. But I don’t think that’s the question.

Usually, I’m taking the other side of this point. When there’s an especially good or especially bad weekend for road crashes, I say that it’s likely just random variation, and not evidence for speeding tolerances or unsafe tourists or breath alcohol levels. That’s because usually the question is whether the underlying process is changing: are the roads getting safer or more dangerous.

This time there isn’t really a serious question of whether karma, global warming, or spiders from Mars are killing off celebrities. We know it must be a combination of understandable trends and bad luck that’s responsible. But there really have been more celebrities dying this year. Prince is really dead. Bowie is really dead. Victoria Wood, Patty Duke, Ronnie Corbett, Alan Rickman, Harper Lee: 2016 has actually happened this way; it hasn’t been (to steal a line from Daniel Davies) just a particularly inaccurate observation of the underlying population and mortality patterns.

April 11, 2016

Missing data

Sometimes…often…practically always… when you get a data set there are missing values. You need to decide what to do with them. There’s a mathematical result that basically says there’s no reliable strategy, but different approaches may still be less completely useless in different settings.

One tempting but usually bad approach is to replace them with the average; it’s especially bad with geographical data. We’ve seen this go badly wrong with kidnappings in Nigeria, we’ve seen maps of vaccine-preventable illness at epidemic proportions in the west Australian desert, we’ve seen Kansas misidentified as the porn centre of the United States.

The data problem that attributed porn to Kansas has more serious consequences. There’s a farm not far from Wichita that, according to the major database providing this information, has 600 million IP addresses.  Now think of the reasons why someone might need to look up the physical location of an internet address. Kashmir Hill, at Fusion, looks at the consequences, and at how a better “don’t know” address is being chosen.
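A toy sketch of why filling in an average is especially bad for locations: every unknown record lands on the same point, which then looks like the busiest place on the map. The coordinates are made up, and this is not a description of how the IP database actually picked its default location.

```python
from collections import Counter
from statistics import mean

# Hypothetical (latitude, longitude) records; None marks an unknown location.
records = [(47.6, -122.3), (25.8, -80.2), None, None, None, (40.7, -74.0)]

known = [r for r in records if r is not None]

# "Replace missing values with the average": the centre of the known points.
centre = (mean(lat for lat, _ in known), mean(lon for _, lon in known))
imputed = [r if r is not None else centre for r in records]

# The single most common location is now a place nobody was recorded at.
busiest, n = Counter(imputed).most_common(1)[0]
```

Keeping the unknowns as an explicit "don't know" category, rather than a real-looking coordinate, avoids manufacturing a hotspot.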

April 9, 2016

Movie stars broken down by age and sex

The folks at Polygraph have a lovely set of interactive graphics of number of speaking lines in 2000 movie screenplays, with IMDB look-ups of actor age and gender.  If you haven’t been living in a cave on Mars, the basic conclusion won’t be surprising, but the extent of the differences might. Frozen, for example, gave more than half the lines to male characters.

They’ve also made a lot of data available on Github for other people to use. Here’s a graph combining the age and gender data in a different way than they did: total number of speaking lines by age and gender.


Men and women have similar numbers of speaking lines up to about age 30, but after that there’s a huge separation and much less opportunity for female actors. We can all think of exceptions: Judi “M” Dench, Maggie “Minerva” Smith, Joanna “Absolutely no relation” Lumley, but they are exceptions.

March 18, 2016

What they aren’t telling you is a beautiful visualisation of what news topics are less covered in your country (or any selected country) than on average for the world:


For a lot of these topics it will be obvious why they’re just not that relevant, but not always.

(via Harkanwal Singh)

November 1, 2015

Twitter polls and news feeds


I don’t know why this feels worse than the bogus clicky polls on newspaper websites. Maybe it’s the thought of someone actually believing the sampling scheme says something useful. Maybe it’s being in Twitter, where following a news headline feed usually gets you news headlines. Maybe it’s that the polls are so bad: restricting a discussion of Middle East politics to two options with really short labels makes even the usual slogan-based dialogue look good in comparison.

In any case, I really hope this turns out to be a failed experiment, and that we can keep Twitter polls basically as jokes.


October 30, 2015

Pie charts “a menace”, study shows

StatsChat can reveal exclusive study results showing that pie charts are a menace to over 75% of us.

Although these round, delicious, data metaphors have been maligned in the past, this is the first research of its kind, based on newly-available survey technology.

Researchers used an online, multi-wave, respondent-driven sampling scheme to reach thousands of potential respondents. 77% of responses agreed that pie charts are a menace.


Aren’t these new Twitter polls wonderful?

October 22, 2015

Early NZ data visualisation

From the National Library of New Zealand, via Jolisa Gracewood


Types of motor-vehicle accidents in rural areas vary considerably from those occurring in urban areas, as shown in the above chart. The percentages are based on figures of the Transport Department in respect of accidents causing fatalities during the twelve months, April 1, 1932, to March 31, 1933.

The text goes on to say “The black section representing collisions with tram and train forms only 1 per cent. of the whole, though this type of accident appeals to the popular imagination from its spectacular nature.” Some things don’t change.

September 21, 2015

It’s bad enough without exaggerating

This UK survey report is being a bit loose with the details, in a situation where that’s not even needed


The survey of more than 4,000 girls, young women, parents and teachers, demonstrates clearly that there is a perception that STEM subjects and careers are better suited to male personalities, hobbies and brains. Half (51 percent) of the teachers and 43 percent of the parents surveyed believe this perception helps explain the low uptake of STEM subjects by girls. [emphasis added]

Those aren’t the same thing at all. I believe this perception helps explain the low uptake of STEM subjects by girls. Michelle ‘Nanogirl’ Dickinson believes this perception helps explain the low uptake of STEM subjects by girls. It’s worrying that nearly half of UK teachers don’t believe this perception helps explain the low uptake of STEM subjects by girls.

On the other hand, this is depressing and actually does seem to be what the survey said:

Nearly half (47 percent) of the young girls surveyed said they believe such subjects are a better match for boys.

as does this

It would fit with NZ experience if a lot of boys felt the same about the difficulty of science and maths, but that wouldn’t actually make it any better.