November 16, 2014

John Oliver on the lottery

When statisticians get quoted on the lottery it’s pretty boring, even if we can stop ourselves mentioning the Optional Stopping Theorem.

This week, though, John Oliver took on the US state lotteries: “…more than Americans spent on movie tickets, music, porn, the NFL, Major League Baseball, and video games combined.”

October 13, 2014

Context from everyday units

From @JohnDonoghue64 on Twitter


From the Guardian, a few years ago

Perhaps, as with metric and imperial measurements, such comparisons should be given convenient abbreviations: SoWs (size of Wales), SoBs (size of Belgium), OSPs (Olympic swimming pools), DDBs (buses) and so on. Thus the Kruger national park in South Africa measures 1 SoW (Daily Telegraph), as do Lesotho (London Evening Standard) and Israel (Times), whereas Lake Nzerakera in Tanzania is 2 SoBs (Observer).

At times the most carefully calibrated calculations can go awry. So we learn that Helmand province in Afghanistan is “four times the size of Wales” (Daily Telegraph, 2 December 2009) only to find a few weeks later that it has apparently shrunk to “the size of Wales” (Daily Telegraph, 29 January 2010).

For the benefit of NZ readers, a badger appears to weigh about the same as three female North Island brown kiwi, two typical merino fleeces, or half a case of Marlborough sav blanc. That should help you get a grasp on the size of the Lindisfarne Gospels.

September 24, 2014

That’s just a guess


While it’s nowhere near as annoying as Phoenix Organics’ “Don’t drink science”, Charlie’s could do better than ‘just a guess’ as to whether there are a million oranges in this truck.

If there are ten oranges in a litre of juice, there are ten thousand in a cubic metre of juice, so a million oranges would make 100 cubic metres of juice. The little juice bottles probably don’t pack that efficiently, so you’d need more than 100 cubic metres of truck.

So, how big is a truck?  A standard twenty-foot container is 6.1m long, 2.44m wide, and 2.59m high, with a volume of 38.5 cubic metres.  That truck doesn’t look three times as big as a twenty-foot container to me.

There could be a hundred thousand oranges in that truck. I don’t think a million is feasible.
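For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate. The ten-oranges-per-litre figure and the container dimensions are the ones used above; the function names are just illustrative, and the calculation ignores packing losses, so it is a lower bound on the space needed.

```python
# Rough check of the orange-truck estimate, using the figures from the post.
ORANGES_PER_LITRE = 10          # assumption stated in the post
LITRES_PER_CUBIC_METRE = 1000


def juice_volume_m3(oranges):
    """Cubic metres of juice produced by a given number of oranges."""
    return oranges / (ORANGES_PER_LITRE * LITRES_PER_CUBIC_METRE)


def container_volume_m3(length=6.1, width=2.44, height=2.59):
    """Volume of a standard twenty-foot container, in cubic metres."""
    return length * width * height


def containers_needed(oranges):
    """How many twenty-foot containers of pure juice the oranges would fill."""
    return juice_volume_m3(oranges) / container_volume_m3()


if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        print(f"{n:>9,} oranges: {juice_volume_m3(n):6.1f} m^3 of juice, "
              f"{containers_needed(n):.2f} containers")
```

A million oranges comes out at a bit over two and a half containers of juice before allowing for bottles, which is why the truck looks too small.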

September 23, 2014

I’m not even sure where to begin on this highly important topic

New Zealand’s favourite biscuit.

I just clicked on a link on the NZ Herald homepage that says “NZ@Noon: NZ’s favourite biscuit revealed”, which took me to an article with a snippet saying:

Bay of Plenty voters have taken to the polls. Find out which biscuit triumphed in the annual nationwide biscuit election.

This led to another article with the headline “Mallowpuffs voted Bay’s best biscuit”, which includes the following (emphasis mine):

Bay of Plenty voters have taken to the polls and voted Mallowpuffs Original their favourite biscuit in an annual nationwide biscuit election.

Around the country, close to 5,000 votes were cast by biscuit-lovers who also voted Mallowpuffs Original as the national favourite, ahead of 57 other contenders.

Kiwi women were once again more passionate about pledging their support, contributing 94 percent of the votes nationwide.

The 2014 Bikkielections poll was conducted via an application on Griffin’s Facebook page from September 9 to 21 following weeks of campaigning via billboards, radio promotions, polling booths and street sampling. The poll has a margin of error of plus or minus zero percent.

That’s a first, right?

July 14, 2014


Why supermoons aren’t a big deal for earthquakes, based on XKCD


May 30, 2014

Trusting your data or your model

Even with large amounts of data, automated predictions must usually incorporate explicit or implicit prior understanding of the structure of the problem. “Look for anything” is not good enough: “anything” is too big.

Here, for your weekend light entertainment, are some examples where the prior structure was too strong or too weak:

The example that prompted this post, from the blog of Melville House Press, is about automated scanning of books to create digital editions:

 in many old texts the scanner is reading the word ‘arms’ as ‘anus’ and replacing it as such in the digital edition. As you can imagine, you don’t want to be getting those two things mixed up.

A similar phenomenon was pointed out at Language Log a decade ago:

Fear not your toes, though they are strong,
The conquest doth to you belong;

Daniel Dennett recounts two anecdotes of speech recognition, one human and one computer, which err in the opposite direction to the text recognition example. The computer one:

An AI speech-understanding system whose development was funded by DARPA (Defense Advanced Research Projects Agency), was being given its debut before the Pentagon brass at Carnegie Mellon University some years ago. To show off the capabilities of the system, it had been attached as the “front end” or “user interface” on a chess-playing program. The general was to play white, and it was explained to him that he should simply tell the computer what move he wanted to make. The general stepped up to the mike and cleared his throat–which the computer immediately interpreted as “Pawn to King-4.” 

And, the example that is frustratingly familiar to so many of us: mobile phone autocorrupt, which you can search for yourself.

May 16, 2014

Smarter than the average bear

Online polling company YouGov asked people in the US and Britain how their intelligence compared with other people’s.

For the US, the results were



They pulled that graph only seconds after I found it, and replaced it with the more plausible


The British appear to be slightly more reluctant than the Americans to say they’re smarter than average, though it would be unwise to assume they are less likely to believe it.



March 16, 2014

The only way he knows how

Q: Did you see the story about aphrodisiacs on Stuff this weekend?

A: Yes

Q: How did they find out which ones worked?

A: It says “Richard Cornish investigates the only way he knows how.”

Q: Randomised n-of-1 trials with independent evaluation by someone who doesn’t know what he’s eaten?

A: Sadly, no.

Q: Allocating different foods, and some control foods, to a large group of people and collecting their reports?

A: No

Q: Getting a librarian to help him review the scientific research on the topic? Or the traditional knowledge?

A: Not really, though there are some biochemical or historical anecdotes for many of the items.

Q: Um. Did he just try each food as you would if you wanted to use it as an aphrodisiac?

A: Not that, either.

Q: I give up. What did he do?

A: “It was my task to consume them in a bland environment, with no chance of any stimulation or excitement.”

Q: What a waste. But aren’t you being a bit harsh?  He’s a food writer and TV producer. He does sustainability and Spanish food. He’s not a science journalist or an investigative reporter.  They didn’t expect anyone to take it seriously.

A: Ok, but some of the nutrition stories and sex stories they run are supposed to be taken seriously. It should be easier to tell which is which online.

Q: Wait, isn’t it March now?

A: Yes.

Q: That sounds more like a Valentine’s Day column.

A: An interesting point. You thought of that faster than I did.

Q: Well?

A: It is a Valentine’s Day column. From the Southland Times. Except they took out the foie gras and truffles to make it suitable for the national audience. Reruns aren’t just for The Simpsons, you know.

January 9, 2014

Infographic of the week

Via @keith_ng, this masterpiece showing that more searches for help lead to more language. Or something.


It’s not, sadly, unusual to see numbers being used just for ordering, but in this case the numbers don’t even agree with the vertical ordering.  And several of them aren’t, actually, languages. And the headline is just bogus.

This version, by Kevin Marks (@kevinmarks), at least is accurate and readable.


but it’s hard to tell how much of Java’s dominance is due to it being popular versus being confusing.

Adam Bard has data on the most popular languages on the huge open-source software repository GitHub. This isn’t quite the right denominator, since Stack Overflow users aren’t quite the same population as GitHub users, but it’s something.  Assigning iOS, Android, and Rails to Objective-C, Java, and Ruby respectively, and scaling by GitHub popularity, we find that C# has the most Stack Overflow queries per GitHub commit; Objective-C and Java have about two-thirds as many.  In the end, though, this data isn’t going to tell you much about either high-demand programming skills or the relative friendliness of different programming languages.



December 29, 2013

Brute force and ignorance

My grandfather, a high school maths teacher, characterised a mathematician as someone who would rather spend an hour working out the quick way to solve a problem than fifteen minutes doing it the slow way.

Computers are so fast nowadays that many traditional ‘recreational maths’ problems can be solved by some brute-force approach. Christian Robert translates an example from Le Monde,

A regular die takes the values 4, 8 and 2 on three adjacent faces. Summit values are defined by the product of the three connected faces, e.g., 64 for the above. What values do the three other faces take if the sum of the eight summit values is 1768? 

and provides R code that just tries lots of possibilities. On my laptop, the code runs in about a quarter of a second.
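To give a flavour of the approach, here is a rough Python sketch of the same kind of search. It is not Robert’s R code. It assumes that each unknown face sits opposite one of the known faces (in the order 4, 8, 2), that the unknown values are positive integers, and that nothing bigger than an arbitrary limit of 50 needs to be tried. Under those loose assumptions it reports every triple that fits rather than a single answer.

```python
from itertools import product

KNOWN_FACES = (4, 8, 2)   # the three adjacent faces given in the puzzle
TARGET = 1768             # required sum of the eight summit values


def summit_sum(hidden):
    """Sum of the eight corner products for a candidate set of hidden faces.

    hidden[i] is assumed to be the face opposite KNOWN_FACES[i], so each
    corner touches exactly one face from each opposite pair.
    """
    pairs = list(zip(KNOWN_FACES, hidden))
    total = 0
    for corner in product(*pairs):      # 2 * 2 * 2 = 8 corners
        total += corner[0] * corner[1] * corner[2]
    return total


def brute_force(limit=50):
    """Try every triple of values in 1..limit and keep those that hit TARGET."""
    values = range(1, limit + 1)
    return [h for h in product(values, repeat=3) if summit_sum(h) == TARGET]


if __name__ == "__main__":
    for solution in brute_force():
        print(solution)
```

No cleverness is involved: the code simply computes the sum for all 125,000 candidate triples and keeps the ones that match.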

More practically, the same applies to a lot of calculations in statistics. For example, if you need to work out what sample size is needed for an experiment, it’s often easier to simulate the experiment at different sizes and see what happens than to work out the solution mathematically.
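As an illustration of sample size by simulation, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: a two-group experiment with normally distributed outcomes, a known standard deviation of 1, a true difference of 0.5, a two-sided z-test at the 5% level, and a target of 80% power. The function names are hypothetical, and a real experiment would need its own model.

```python
import random
from statistics import NormalDist


def simulated_power(n_per_group, effect=0.5, sd=1.0, alpha=0.05,
                    n_sims=2000, seed=1):
    """Estimate power by simulating the experiment n_sims times.

    Each simulated experiment draws two groups of size n_per_group and
    applies a two-sided z-test (sd treated as known) to the difference
    in means. Power is the fraction of experiments that reject.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_err = sd * (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / std_err) > z_crit:
            rejections += 1
    return rejections / n_sims


def smallest_adequate_size(candidates, target=0.8, **kwargs):
    """Return the first candidate size whose simulated power reaches target."""
    for n in candidates:
        if simulated_power(n, **kwargs) >= target:
            return n
    return None


if __name__ == "__main__":
    for n in (10, 25, 50, 75, 100):
        print(n, simulated_power(n))
```

The attraction is that changing the design (unequal groups, skewed outcomes, dropouts) only means changing the few lines that generate and analyse the fake data, not redoing the algebra.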

There’s a similar problem for quizzes that are often made trivial by Google. Often, but not always. The famous Christmas quiz from King William’s College, on the Isle of Man, is made easier by search engines, but still takes effort. For example, the first question:

In the year 1913: what famous club was founded at Vrijstraat 20?

You won’t get the answer just by Googling “Vrijstraat 20”, at least not yet (eventually Google will pick up on it), but with a bit of extra effort you can determine it must be PSV Eindhoven (select the white text, if you want the answer).