January 14, 2018


  • Metropolitan Museum of Art President: “For various reasons, over the past 10 or 12 years, the pay-as-you-wish policy has failed. It has declined by 71% in the amount people pay.” Felix Salmon: “It’s worth fact-checking this, because it turns out that it’s not really true”
  • Cloudflare, a company that distributes websites across the world, has a wall of lava lamps that it uses for random number generation (presumably to seed computational pseudorandom generators)
  • “Do algorithms reveal sexual orientation or just expose our stereotypes?”— on last year’s ‘gaydar’ paper.
  • 538 looks at how they got an analysis of broadband internet availability wrong, due to bad data.
  • “The projects tried to show hidden patterns of our daily shopping….Unfortunately, it shows only the internal categorization and sorting of the supermarket.” Another example of data not meaning what you think it means. Christian Laesser (via FlowingData)
  • Child protective agencies are haunted when they fail to save kids. Pittsburgh officials believe a new data analysis program is helping them make better judgment calls. From the New York Times.
  • The NZ government has released a review of the handling of weather data (PDF)
  • From the LSE Impact blog: “Academics looking to communicate the findings and value of their research to wider audiences are increasingly going through the media to do so. But poor or incomplete reporting can undermine respect for experts by misrepresenting research, especially by trivialising or sensationalising it, or publishing under inappropriate headlines and with cherry-picked statistics.” As StatsChat readers will know, a lot of this is public-relations people, but some of it is definitely the researchers.
  • The scientific reporting of some pre-clinical research is disturbingly crap: a report in the BMJ; Siouxsie Wiles commenting at The Spinoff
  • Constructing optical illusions for AI visual systems: (gory technical details)
  • You may have seen reports of research saying that Australian hawks spread bushfires…

January 10, 2018

Complete balls

The UK’s Metro magazine has a dramatic story under the headline Popping ibuprofen could make your balls shrivel up

Got a pounding headache?

You might just want to give a big glass of water and a nap a go before reaching for the painkillers. Scientists warn that ibuprofen could be wrecking men’s fertility by making their balls shrivel up.

Sounds pleasant.

Fortunately, that’s not what the study showed.

The story goes on

Researchers looked at 31 male participants and found that taking ibuprofen reduced production of testosterone by nearly a quarter in the space of around six weeks.

That’s also almost completely untrue. In fact, the research paper says (emphasis added)

We investigated the levels of total testosterone and its direct downstream metabolic product, 17β-estradiol. Administration of ibuprofen did not result in any significant changes in the levels of these two steroid hormones after 14 d or at the last day of administration at 44 d. The levels of free testosterone were subsequently analyzed by using the SHBG levels. Neither free testosterone nor SHBG levels were affected by ibuprofen.

Stuff has a much better take on this one:

Men who take ibuprofen for longer than the bottle advises could be risking their fertility, according to a new study.

Researchers found that men who took ibuprofen for extended periods had developed a condition normally seen in elderly men and smokers that, over time, can lead to fertility problems

Ars Technica has the more accurately boring headline Small study suggests ibuprofen alters testosterone metabolism.

The study involved 14 men taking the equivalent of six tablets a day of ibuprofen for six weeks (plus a control group). Their testosterone levels didn’t change, but the interesting research finding is that this was due to compensation for what would otherwise have been a decrease. That is, a hormone signalling to increase testosterone production was elevated.  There’s a potential risk that if the men kept taking ibuprofen at this level for long enough, the compensation process might give up. And that would potentially lead to fertility problems — though not (I don’t think) to the problems Metro was worried about.

So, taking ibuprofen for months on end without a good reason? Probably inadvisable. Like it says on the pack.


January 9, 2018

Election maps: what’s the question?

XKCD has come out with a new map of the 2016 US election

In about 2008 I made a less-artistic one of the 2004 elections on similar principles

These maps show some useful things about the US vote:

  1. the proportions for the two parties are pretty close, but
  2. most of the land area has very few voters, and
  3. most areas are relatively polarised
  4. but not as polarised as you think, eg, look at the cities in Texas

What these maps are terrible at is showing changes from one election to the next. The maps for 2004 (Republicans ahead by about 2.5%) and 2016 (Republicans behind by about 3%) look very similar. And even 2008 (Republicans behind by 7%) wouldn’t look that different.

Like a well-written thousand words, a well-drawn picture needs to be about something. Questions matter. The data don’t speak for themselves.

January 8, 2018

Long tail of baby names

The Dept of Internal Affairs has released the most common baby names of 2017 (NZ is, I think, the first country each year to do this), and Radio NZ has a story.  A lot of names popular last year were also popular in the past; a few (eg Arlo) are changing fast.

If you look at the sixty-odd years of data available, there’s a dramatic trend. In 1954, ‘John’ was the top boy’s name, with 1389 uses. In 2017 the top was ‘Oliver’, but with only 314 uses — not enough to make 1954’s top twenty. According to the government, there were nearly 13,000 different names given last year, so the mean number of babies per name is under 5; the most popular names are still much more popular than average. But less so than in the past.

Here’s the trend in the number of babies given the top name

and the top ten names

and the top hundred names

That decrease is despite an increase in the total population: here’s the top 10 names as a percentage of all babies (assuming 53% of babies are boys)

and the top 100 names

The proportion with any of the top 100 names has been going down consistently, and the gap between boys and girls has been narrowing.


Not dropping every year

Stuff has a story on road deaths, where Julie Anne Genter claims the Roads of National Significance are partly responsible for the increase in death rates. Unsurprisingly, Judith Collins disagrees. The story goes on to say (it’s not clear if this is supposed to be indirect quotation from Judith Collins)

From a purely statistical viewpoint the road toll is lowering – for every 10,000 cars on the road, the number of deaths is dropping every year.

From a purely statistical viewpoint, this doesn’t seem to be true. The Ministry of Transport provides tables that show a rate of fatalities per 10,000 registered vehicles of 0.077 in 2013, 0.086 in 2014,  0.091 in 2015, and  0.090 in 2016. Here’s a graph, first raw

and now with a fitted trend (on a log scale, since the trend is straighter that way)
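As a rough sketch of what fitting a trend on a log scale means, here is a least-squares line through the logarithms of just the four rates quoted above. This is an illustration under that assumption (four points only, not the full Ministry of Transport series behind the graph), and the function name is made up for the example.

```python
import math

# Fatalities per 10,000 registered vehicles, as quoted in the text above.
# Only these four years are used; the full official series is longer.
years = [2013, 2014, 2015, 2016]
rates = [0.077, 0.086, 0.091, 0.090]


def log_linear_slope(xs, ys):
    """Least-squares slope of log(y) against x (hypothetical helper)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    log_mean = sum(logs) / n
    sxy = sum((x - x_mean) * (l - log_mean) for x, l in zip(xs, logs))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


slope = log_linear_slope(years, rates)
# On a log scale a constant slope is a constant percentage change per year.
annual_change = math.exp(slope) - 1
print(f"fitted change per year: {annual_change:.1%}")
```

A positive slope here says the fitted rate is rising over these four years; with so few points it says nothing about how much of that is random variation.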

Now, it’s possible there’s some other way of defining the rate that doesn’t show it going up each year. And there’s a question of random variation as always. But if you scale for vehicles actually on the road, by using total distance travelled, we saw last year that there’s pretty convincing evidence of an increase in the underlying rate, over and above random variation.

The story goes on to say “But Genter is not buying into the statistics.” If she’s planning to make the roads safer, I hope that isn’t true.


  • “Every now and then a story appears in the media about how boffins (and it is always “boffins”) have worked out an equation for something: the perfect cup of tea, the most depressing day of the year, the best way to make pancakes, the perfect handshake, or in the most recent case, the perfect cheese on toast.” The equation for the perfect bullshit equation.
  • The BBC’s statistics-in-the-media radio program More or Less has a special ‘statistics of the year’ episode
  • Some interesting student projects from a data visualisation class
  • How Spotify picks your music.
  • “Average London”: averages of tourist photos of the same London attraction.
  • Displaying uncertainty in the UK unemployment rate
  • One of the problems in training modern neural network classifiers is that they will pick up on anything, sensible or not. Luke Oakden-Rayner writes about a popular set of data from chest x-rays and why it won’t teach the computers the right things.
  • The American Academy of Family Physicians is not endorsing new blood pressure standards that would increase the proportion of US adults defined as having hypertension from about 1/3 to about 1/2.

January 2, 2018

Consider a spherical cow

Part of the point of mathematical modelling is discarding unimportant features of a problem to make it tractable. But you have to discard the right features. Here are two recent stories about mathematical optimisation.

Jason Steffen has invented a more efficient way of getting passengers on to planes — not just more efficient than what US airlines actually do, but even more efficient than letting passengers board at random. He writes

So, why isn’t this optimum method of airplane boarding being adopted by any carrier in the industry? One significant reason may be the challenge of its implementation — lining passengers up in such a rigid order. 

I’d argue that a much more important reason is you’d have to get rid of priority boarding, making frequent flyers queue with everyone else and depriving them of their chance to get the lion’s share of overhead luggage space. A model that doesn’t account for the power of frequent flyers is solving the wrong optimisation problem if the goal is to get implemented.

The same sort of issue often turns up in US discussions of partisan gerrymandering, where you’ll see mathematicians write about algorithms for perfect electorate design. These don’t solve an existing problem, because they don’t take into account who actually draws districts: it isn’t impartial mathematicians.  The main theoretical limitation on gerrymandering in the US is the power of courts to declare a partisan redistricting plan unconstitutional — but they aren’t willing to do so.  Justice Scalia wrote in 2004

    Eighteen years of judicial effort with virtually nothing to show for it justify us in revisiting the question whether the standard promised by Bandemer exists. As the following discussion reveals, no judicially discernible and manageable standards for adjudicating political gerrymandering claims have emerged. Lacking them, we must conclude that political gerrymandering claims are nonjusticiable and that Bandemer was wrongly decided.

There’s a new effort to change this, from Wisconsin. In 2015, the state was sued in federal District Court over its redistricting plan, and lost. The case focused on the ‘efficiency gap’: the difference in the number of ‘wasted’ votes between the two parties (as a percentage of all votes cast). The Supreme Court heard an appeal in October 2017 and is thinking about it.
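To make the definition concrete, here is a minimal sketch of an efficiency-gap calculation. It assumes the usual convention that a losing party wastes all its votes and a winning party wastes whatever it got beyond half the votes cast in the district. The four districts are invented for illustration, not Wisconsin data, and ties are not given any special treatment.

```python
def wasted_votes(votes_a, votes_b):
    """Return (wasted_a, wasted_b) for one two-party district.

    Assumed convention: the loser's votes are all wasted; the winner
    wastes its votes in excess of half the total cast.
    """
    needed = (votes_a + votes_b) / 2
    if votes_a > votes_b:
        return votes_a - needed, votes_b
    return votes_a, votes_b - needed


def efficiency_gap(districts):
    """Difference in wasted votes (A minus B) as a share of all votes cast."""
    wasted_a = wasted_b = total = 0
    for votes_a, votes_b in districts:
        wa, wb = wasted_votes(votes_a, votes_b)
        wasted_a += wa
        wasted_b += wb
        total += votes_a + votes_b
    return (wasted_a - wasted_b) / total


# Hypothetical plan: party A is packed into one district and narrowly
# loses the other three, despite having 205 of the 400 votes overall.
plan = [(70, 30), (45, 55), (45, 55), (45, 55)]
print(f"efficiency gap: {efficiency_gap(plan):.1%}")
```

In this made-up plan party A wastes 155 votes and party B wastes 45, so the gap is 110 out of 400 votes, in B’s favour.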

Patrick Honner wrote about the efficiency-gap proposal for Quanta, but there’s a lot more detail in a 2015 expert-witness report by Simon Jackman (PDF), an Australian political scientist at Stanford.

December 31, 2017

Sweet as

Today, Stuff’s “Well and Good” section has

There’s nothing wrong with the content, which describes some interesting dry sparkling wines one might want to try (if one liked that sort of thing enough to spend that much).  But it’s not a health story.

The very-low-sugar wines differ from ordinary ‘brut’ champagne by less than 10 grams of sugar per litre, in a drink that has more than 120 grams of alcohol per litre.   The sugar in an ordinary ‘brut’ bubbly is maybe 5% of the calorie content.
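The arithmetic behind that last figure can be sketched as follows, assuming the standard approximate energy values of about 4 kcal per gram of sugar and 7 kcal per gram of alcohol, taking 10 g and 120 g per litre as round numbers from the text, and ignoring any other source of calories in the wine.

```python
# Approximate energy densities (standard nutritional values, assumed here).
KCAL_PER_G_SUGAR = 4
KCAL_PER_G_ALCOHOL = 7


def sugar_share_of_calories(sugar_g_per_l, alcohol_g_per_l):
    """Fraction of calories from sugar, counting only sugar and alcohol."""
    sugar_kcal = sugar_g_per_l * KCAL_PER_G_SUGAR
    alcohol_kcal = alcohol_g_per_l * KCAL_PER_G_ALCOHOL
    return sugar_kcal / (sugar_kcal + alcohol_kcal)


# Round figures from the text: about 10 g/l sugar, about 120 g/l alcohol.
share = sugar_share_of_calories(10, 120)
print(f"sugar share of calories: {share:.1%}")
```

That is 40 kcal from sugar against 840 kcal from alcohol per litre, which is where a figure of around 5% comes from.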

Not everything has to be about health.

Hangover cures that work

December 30, 2017

Bitter and twisted

From the New York Daily News: Study finds gin and tonic drinkers are more likely to be psychopaths, sadists

That’s not quite what the study finds. A slightly revised version is a couple of paragraphs into the story (credit for linking, but with a penalty for not mentioning it’s from 2015)

Researchers at Innsbruck University found that people who enjoy bitter flavors like the tonic water in a gin and tonic, black coffee, and dark chocolate are more prone to “Machiavellianism, psychoticism, and narcissism,” among other traits.

Here’s the list of ‘bitter’ flavoured foods they used

bitter melon, cabbage, coffee, cottage cheese, grapefruit, radishes, rye bread, tea, and tonic water.

You might well think that preferences for these foods had a lot of other cultural associations on top of bitterness, and that added sugar or salt would make a big difference. And the researchers agreed, writing

Thus, due to the bitter items’ poor face validity, we refrained from formulating precise predictions regarding them. Moreover, previous research has shown that assessing taste preference is not a simple endeavor. For example, many preference measures often yield low reproducibility or are influenced by social desirability. Thus, we included this list for exploratory reasons.

They did find correlations between preferences for this list of ‘bitter’ foods and the negative personality traits (to the extent that they’re measurable on Mechanical Turk workers) — but the correlation predicted about 2% of the variability in psychopathy and sadism, and about 1% of the variability in Machiavellianism. And those are probably over-estimates given the selection bias of the news process.
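To put those percentages on the more familiar correlation scale: assuming they are squared correlation coefficients (the usual meaning of “variability explained” for a simple linear correlation), the implied correlations are just their square roots. The trait labels below are only tags for the numbers quoted above.

```python
import math

# Share of variability explained, as quoted in the text above.
variance_explained = {
    "psychopathy and sadism": 0.02,
    "Machiavellianism": 0.01,
}

# Assuming each share is a squared correlation, r is its square root.
implied_r = {trait: math.sqrt(share) for trait, share in variance_explained.items()}

for trait, r in implied_r.items():
    print(f"{trait}: r is about {r:.2f}")
```

So the correlations involved are in the region of 0.1 to 0.14, which is small even before allowing for over-estimation.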

There’s a more important problem, though, with the idea that ordering a gin and tonic at the bar reveals your friend’s hidden psychopathic nature. As always, the question in statistics is “compared to what”, and a G&T is not the only notably bitter beverage often consumed at the pub.