Posts written by Thomas Lumley (1690)


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient

February 7, 2016

Why do we care?

From the history of the Manchester Statistical Society

Manchester Statistical Society was a pioneering organisation: It was the first organisation in Britain to study social problems systematically and to collect statistics for social purposes. In 1834 it was the first organisation to carry out a house-to-house social survey.

The Society was formed in September 1833 at a time of severe social problems. Few of the founders were statisticians in the modern, technical sense. But, they were interested in improving the state of the people and believed that establishing the facts regarding social problems was a necessary first step. 

From an earlier organisation in London (via)

‘The privation and misery endured by the productive classes of society in Great Britain in 1816 and 1817, led to the formation of an Association in London, for the purpose of investigating the nature and extent of that misery; and of ascertaining, if possible, how far it resulted from avoidable or from unavoidable causes; and how far repetitions of similar ills were likely or not to occur’.

Groups of wealthy men saying they want to improve society isn’t new.  Nor is it new that they don’t know enough to do much good.  What was different in the early nineteenth century was that they recognised they didn’t know enough. The Statistical Societies were founded to provide information about social problems that went beyond any individual’s range of anecdotes, because the truth mattered.

The range of statistics has broadened immensely since then, especially with the help of computers. At the foundation is still the principle that one person’s reckons aren’t enough: the world is more complicated than that and the truth matters.

I’m not arguing that statistics has to be Important and Serious. If you want to know whether The Rock plays the same music at the same time each day or who is likely to win the rugby, statistics can help there. If you care enough about how other people eat their cereal, that’s a valid topic for investigation. The bottom line is that you do actually care about the answer; that the truth matters.

For a depressingly large fraction of surveys in the news today, no-one really cares whether the answer is accurate or even what the question means. Maybe it’s ok to have sections of the newspaper where facts aren’t really relevant — you need to ask a journalist, not me. But when the truth doesn’t matter, stop pretending to use statistics.

Zombie bogus surveys

From Food Network magazine, via Twitter, via Julie Blommaert


There’s no more detail than “Kellogg’s” as the source, and the Kellogg’s website is very sensibly not admitting to anything.

Some more Googling finds two stories from September last year — getting the factoid into a real paper magazine, because of the publication time lag, gives it another chance to roam the earth looking for brains.

Even though it has to be the same survey, the story from Vice says “a full one-fifth of Americans are using orange juice in their cereal instead of milk,” while Bustle says “More than 10 percent of Americans admitted to using orange juice or coffee”.  It’s not just that the numbers are inconsistent: the phrasing in one case suggests “do you usually?” as the question, in the other “have you ever?” It matters, or at least it would if anything about this mattered.

We’re also not told whether these are really supposed to be proportions of “Americans” or of “Americans who eat cereal”, or “Americans who eat cereal for breakfast”, or whatever.

Usefully, the Vice story does give a bit more detail about the survey:

Two thousand US consumers and college students from all over the country participated in the study, with about 30 percent male subjects and 70 percent female. The participants were of all ages, with half being college students and the rest varied (14 percent between the ages of 25 and 34 years old, 16 percent between 35 and 44 years old, about a quarter between 45 and 54 years old, and the rest scattered in older or younger age groups). 

They don’t say how the participants were recruited or surveyed, but there’s enough information there to make it clear the data would be meaningless even if we knew what the questions were and what percentages the survey actually found.
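Even the reported demographics don’t obviously hang together. On one natural reading — the quoted percentages being fractions of the whole sample — the three named age groups already exceed the non-student half, before counting anyone “scattered in older or younger age groups”. (On the other reading, fractions of the non-student remainder, they sum to 55% of that group and the description is merely ambiguous.) A quick sketch of the first reading, using only the numbers as quoted:

```python
# Figures from the Vice story, read as percentages of the full sample.
# (The wording is ambiguous; they may instead be percentages of the
# non-student remainder, in which case they'd sum to 55% of that group.)
total = 2000

college_students = 0.50 * total   # "half being college students"
age_25_34 = 0.14 * total          # "14 percent between ... 25 and 34"
age_35_44 = 0.16 * total          # "16 percent between 35 and 44"
age_45_54 = 0.25 * total          # "about a quarter between 45 and 54"

non_students = total - college_students
named_age_groups = age_25_34 + age_35_44 + age_45_54

print(non_students)       # 1000.0
print(named_age_groups)   # 1100.0 -- already more than "the rest",
                          # with "the rest scattered in older or
                          # younger age groups" still to come
```

Either the percentages refer to different denominators, or they simply don’t add up — and nothing in the story lets you tell which.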


  • Locksmiths gaming Google and Google Maps: from the New York Times. We have “lead generation systems” here too, but I haven’t seen any suggestion that these are scams, just somewhat misleading advertising aiming to look local. For example, from a site that’s appealingly honest about how it works:

If your business covers the whole of Auckland for example then we would set up 200 websites for you which is one for every suburb in the city covering 6 districts

  • What’s the first word that comes to your mind starting with SUPP? If you said “SURGERY”, you’re not alone — or maybe you’re not real. “Disturbing oddities” in a research paper.
  • A map of the longest flights — Auckland-Houston and Auckland-Vancouver don’t make it.
  • Herald Insights has an exploration of unemployment rates over the recession and recovery, by census ethnic group and age.
  • A company that may be taking measurement too far: “Any meeting of at least three people is expected to hold at least one poll.”
February 6, 2016

Bogus polls from Nature

Soon after Twitter polls were introduced, Al-Jazeera News used one to poll its followers about US intervention in Syria. Fortunately, it seems to have been a failed experiment (the poll, that is).

Now, Nature News is doing it:


Twitter polls can work as jokes and commentary, and they might work for gathering opinions from your friends. They shouldn’t be allowed to masquerade as data collection.

February 5, 2016

DIY investigations

So, the Herald has a story headlined Men’s DIY skills ‘dying out’ – study. Here are the first three paragraphs:

Men are no longer able to carry out traditional DIY – with most now opting to call in tradesmen, research shows.

Most men cannot change a tyre, while only half can wire a plug and just one in five are able to fix a dripping tap.

Three in five men would need to call in a plumber to unblock a toilet, while only a third feel confident about putting together flat-pack furniture.

You might think this raises some questions. For example:

  • Is there any information in the story suggesting the numbers are reliable? (not really)
  • Do they have comparable data from the past to support the ‘dying out’, ‘no longer able’ and so on? (no)
  • Isn’t this whole gendered housework thing mildly offensive and about five decades out of date? (yes)
  • Hasn’t it been a cliche since flat-pack furniture was invented that most people don’t feel confident assembling it? (yes)

A question that might not spring immediately to mind: “What country are these numbers from?”

The stock photo isn’t much use. It shows a man carrying a plank while smiling — maybe one of the other dying skills?  Google Image Search finds a vendor who has it tagged as Calgary, Alberta (Canada).

However, there are two clues in the text I’ve quoted.  The first clue is that there aren’t any specifically NZ journalistic cliches — it’s hard to imagine a Kiwi writing this sort of crap without referencing, say, number 8 wire.

The second clue is more subtle: “only half can wire a plug”.  Until 1992, appliances in the UK were often sold without plugs attached, and wiring a plug was a more important household skill than in the rest of the world.

Reading on, the fourth paragraph confirms that this is a UK story. It’s from the Daily Mail and it exists to advertise a UK men’s clothing company.  If you found the story annoying, I would encourage you not to investigate their blog further.

A modest proposal: it appears that news sites have to publish a certain number of these surveys. Maybe they could trim out the name of the sponsoring company, and just provide it in a link for readers who really cared? Then the story could be assessed on its true news (or perhaps entertainment) value.


Returning-from-travel edition

  • Wordbank is an open database of information about children’s vocabulary growth. With pointy-clicky apps.
  • The Ethical Data Scientist. Cathy O’Neil, writing at Slate. “As long as our world is not perfect, and as long as data is being collected on that world, we will not be building models that are improvements on our past unless we specifically set out to do so.”
  • Billboards that use DNA on discarded cigarette butts or gum to recreate images of the litterer. “Obviously we have no photos of the original litterers to compare the sketches to, but according to Ogilvy, the results are accurate.” [trust us, we’re an advertising agency]
  • Headline: “Revealed: The drug that keeps you young”. Story: “Experts say they had extended the lifespan of mice by 35 per cent” and “The one used on mice is not suitable for people”.
February 3, 2016

Units matter

I think this is just a typo in the Herald story about testing for methamphetamine in state houses:

Forensic scientist Dr Nick Powell said any meth contamination above 0.5mcg per 100sq m of surface risked headaches, coughs and sleeplessness, and poorer appetite and infant brain growth.

The NZ standard for cleanups is 0.5μg per 100cm², which is designed to be safe, rather than just on the borderline of danger, so even with square centimetres this is a strong claim. Changing the denominator to square metres makes the claim ten thousand times stronger.
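The factor of ten thousand is pure unit conversion: 1m² is 100cm × 100cm = 10,000cm², so the same 0.5μg spread over 100m² instead of 100cm² is a threshold concentration 10,000 times lower. A minimal sketch of the arithmetic:

```python
# NZ cleanup standard: 0.5 micrograms per 100 square centimetres.
# The Herald's version: 0.5 micrograms per 100 square metres.
CM2_PER_M2 = 100 * 100               # 1 m^2 = 10,000 cm^2

standard_area_cm2 = 100              # 100 cm^2 (the actual standard)
herald_area_cm2 = 100 * CM2_PER_M2   # 100 m^2, expressed in cm^2

# Same mass (0.5 mcg) over a much larger area: the printed threshold
# concentration is lower -- i.e. the claim is stronger -- by this factor:
factor = herald_area_cm2 / standard_area_cm2
print(factor)   # 10000.0
```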

February 2, 2016

Not everything is a breakthrough. But.

Jack Scanlan, at Lateral

Steps could be taken to reduce the hype associated with every tiny step towards curing a disease or developing a new technology, but the trade-off would be a reduction in the amount of science making its way to the public.

Helpful prediction vs recycled prejudice

From What World Are We Building?  by danah boyd

One of the perennial problems with the statistical and machine learning techniques that underpin “big data” analytics is that they rely on data entered as input. When the data you input is biased, what you get out is just as biased. These systems learn the biases in our society, and they spit them back out at us.


January 25, 2016


  • “Claims that forensic experts can match a bullet or shell casing found at a crime scene to a specific weapon lack a scientific basis and should be barred from criminal trials as misleading, a D.C. Court of Appeals judge wrote this week.”  The judge objected to claims that the evidence proved a ‘unique’ match. And quite right, too.
  • You can prove that a treatment works without knowing how it works, but it’s much harder to find treatments that way. Lithium for bipolar and other mood disorders is an excellent example.
  • You might have heard of CRISPR in the news and wondered what exactly it was.  It’s a technique for cutting DNA at very easily customised locations, for example to allow new sequences to be inserted. Good references.