March 9, 2015

Stat of the Week Competition: March 7 – 13 2015

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday March 13 2015.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of March 7 – 13 2015 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Not all there

One of the most common problems with data is that it’s not there. Families don’t answer their phones, over-worked nurses miss some forms, and even tireless electronic recorders have power failures.

There’s a large field of statistical research devoted to ways of fixing the missing-data problem. None of them work (that’s not my cynical opinion, that’s a mathematical theorem), but many of them are more likely to make things better than worse. The best way to handle data you don’t have depends on what sort of data it is and why you don’t have it, but even the best methods can confuse people who aren’t paying attention.

Just ignoring the missing data problem and treating the data you have as all the data is effectively assuming the missing data look just like the observed data. This is often very implausible. For example, in a weight-loss study it is much more likely that people who aren’t losing weight will drop out. If you just analyse data from people who stay in the study and follow all your instructions, unless this is nearly everyone, they will probably have lost weight (on average) even if your treatment is just staring at a container of felt-tip pens.
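To see how large this effect can be, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not data from any real study: the treatment has no effect at all, weight changes are normally distributed with a standard deviation of 3 kg, and people who lost weight stay in the study 80% of the time while everyone else stays only 40% of the time.

```python
import random
import statistics


def simulate_dropout(n=10_000, seed=1):
    """Simulate a useless treatment where staying in the study depends on the outcome.

    Returns (mean change for everyone, mean change for completers only), in kg.
    """
    rng = random.Random(seed)
    all_changes = []
    completer_changes = []
    for _ in range(n):
        # Assumed outcome model: true average change is zero, SD 3 kg.
        change = rng.gauss(0.0, 3.0)
        all_changes.append(change)
        # Assumed dropout rule: losers of weight stay 80% of the time, others 40%.
        p_stay = 0.8 if change < 0 else 0.4
        if rng.random() < p_stay:
            completer_changes.append(change)
    return statistics.mean(all_changes), statistics.mean(completer_changes)


mean_all, mean_completers = simulate_dropout()
print(f"everyone: {mean_all:+.2f} kg, completers only: {mean_completers:+.2f} kg")
```

Under these made-up numbers the whole group averages about zero, while the completers should show an apparent average loss of roughly 0.8 kg, produced entirely by who dropped out.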

That’s why it is often sensible to treat missing observations as if they were bad. The Ministry of Health drinking water standards do this. For example, they say that only 96.7% of New Zealand received water complying with the bacteriological standards. That sounds serious. Of the 3.3% failures, however, more than half (2.0%) were just failures to monitor thoroughly enough, and only 0.1% had E. coli transgressions that were not followed up by immediate corrective action.

From a regulatory point of view, lumping these together makes sense. The Ministry doesn’t want to create incentives for data to ‘accidentally’ go missing whenever there’s a problem. From a public health point of view, though, you can get badly confused if you just look at the headline compliance figure and don’t read down to page 18.

The Ministry takes a similarly conservative approach to the other standards, and the detailed explanations are more reassuring than the headline compliance figures. There are a small number of water supplies with worrying levels of arsenic — enough to increase lifetime cancer risk by a tenth of a percentage point or so — but in general the biggest problem is inadequate fluoride concentrations in drinking water for nearly half of Kiwi kids.

 

March 5, 2015

Showing us the money

The Herald is running a project to crowdsource data entry and annotation for NZ political donations and expenses: it’s something that’s hard to automate and where local knowledge is useful. Today, they have an interactive graph for 2014 election donations and have made the data available.


Briefly

March 4, 2015

NRL Predictions for Round 1

Team Ratings for Round 1

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Rabbitohs              13.06                   13.06       -0.00
Cowboys                 9.52                    9.52       -0.00
Roosters                9.09                    9.09       -0.00
Storm                   4.36                    4.36        0.00
Broncos                 4.03                    4.03       -0.00
Panthers                3.69                    3.69       -0.00
Warriors                3.07                    3.07       -0.00
Sea Eagles              2.68                    2.68        0.00
Bulldogs                0.21                    0.21        0.00
Knights                -0.28                   -0.28       -0.00
Dragons                -1.74                   -1.74       -0.00
Raiders                -7.09                   -7.09       -0.00
Eels                   -7.19                   -7.19       -0.00
Titans                 -8.20                   -8.20        0.00
Sharks                -10.76                  -10.76       -0.00
Wests Tigers          -13.13                  -13.13       -0.00

 

Predictions for Round 1

Here are the predictions for Round 1. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game                         Date    Winner      Prediction
1  Broncos vs. Rabbitohs     Mar 05  Rabbitohs        -6.00
2  Eels vs. Sea Eagles       Mar 06  Sea Eagles       -6.90
3  Cowboys vs. Roosters      Mar 07  Cowboys           3.40
4  Knights vs. Warriors      Mar 07  Knights           0.70
5  Titans vs. Wests Tigers   Mar 07  Titans            7.90
6  Panthers vs. Bulldogs     Mar 08  Panthers          6.50
7  Sharks vs. Raiders        Mar 08  Raiders          -0.70
8  Dragons vs. Storm         Mar 09  Storm            -3.10
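One plausible reading of these numbers, which the post itself does not confirm, is that each prediction is roughly the home team's rating minus the away team's rating plus some allowance for playing at home. The sketch below only checks that reading against the published figures: it subtracts the raw rating gap from each predicted margin and prints what is left over. The function name `implied_home_edge` is made up for this illustration.

```python
# Ratings and Round 1 predictions copied from the two NRL tables in this post.
ratings = {
    "Rabbitohs": 13.06, "Cowboys": 9.52, "Roosters": 9.09, "Storm": 4.36,
    "Broncos": 4.03, "Panthers": 3.69, "Warriors": 3.07, "Sea Eagles": 2.68,
    "Bulldogs": 0.21, "Knights": -0.28, "Dragons": -1.74, "Raiders": -7.09,
    "Eels": -7.19, "Titans": -8.20, "Sharks": -10.76, "Wests Tigers": -13.13,
}

# (home team, away team, predicted margin from the home team's point of view)
predictions = [
    ("Broncos", "Rabbitohs", -6.00),
    ("Eels", "Sea Eagles", -6.90),
    ("Cowboys", "Roosters", 3.40),
    ("Knights", "Warriors", 0.70),
    ("Titans", "Wests Tigers", 7.90),
    ("Panthers", "Bulldogs", 6.50),
    ("Sharks", "Raiders", -0.70),
    ("Dragons", "Storm", -3.10),
]


def implied_home_edge(home, away, margin):
    """Predicted margin minus the raw rating gap (home rating minus away rating)."""
    return margin - (ratings[home] - ratings[away])


edges = {
    f"{home} vs. {away}": round(implied_home_edge(home, away, margin), 2)
    for home, away, margin in predictions
}
for game, edge in edges.items():
    print(f"{game}: {edge:+.2f}")
```

With these figures the leftover is about 3 points for seven of the games and about 4 for Knights vs. Warriors. The post does not explain the difference, so treat this as a consistency check on the tables rather than a description of the actual method.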

 

Super 15 Predictions for Round 4

Team Ratings for Round 4

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Waratahs                8.16                   10.00       -1.80
Crusaders               7.07                   10.42       -3.30
Hurricanes              5.52                    2.89        2.60
Chiefs                  4.08                    2.23        1.90
Brumbies                3.84                    2.20        1.60
Sharks                  2.81                    3.91       -1.10
Stormers                2.80                    1.68        1.10
Bulls                   1.82                    2.88       -1.10
Blues                   0.43                    1.44       -1.00
Highlanders            -2.53                   -2.54        0.00
Cheetahs               -4.29                   -5.55        1.30
Lions                  -4.33                   -3.39       -0.90
Force                  -5.17                   -4.67       -0.50
Reds                   -5.91                   -4.98       -0.90
Rebels                 -7.29                   -9.53        2.20

 

Performance So Far

So far there have been 21 matches played, 13 of which were correctly predicted, a success rate of 61.9%.

Here are the predictions for last week’s games.

Game                       Date    Score    Prediction  Correct
1  Highlanders vs. Reds    Feb 27  20 – 13        8.10  TRUE
2  Force vs. Hurricanes    Feb 27  13 – 42       -3.40  TRUE
3  Cheetahs vs. Blues      Feb 27  25 – 24       -0.50  FALSE
4  Chiefs vs. Crusaders    Feb 28  40 – 16       -1.80  FALSE
5  Rebels vs. Brumbies     Feb 28  15 – 20       -7.60  TRUE
6  Bulls vs. Sharks        Feb 28  43 – 35        2.20  TRUE
7  Lions vs. Stormers      Feb 28  19 – 20       -3.60  TRUE
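The Correct column can be reproduced from the scores and predicted margins. The sketch below assumes, as the table layout suggests, that the first team listed is the home team and that a prediction counts as correct when the predicted margin and the actual margin have the same sign. The function name is made up for this illustration.

```python
# (home, away, home score, away score, predicted margin), copied from last week's table
last_week = [
    ("Highlanders", "Reds", 20, 13, 8.10),
    ("Force", "Hurricanes", 13, 42, -3.40),
    ("Cheetahs", "Blues", 25, 24, -0.50),
    ("Chiefs", "Crusaders", 40, 16, -1.80),
    ("Rebels", "Brumbies", 15, 20, -7.60),
    ("Bulls", "Sharks", 43, 35, 2.20),
    ("Lions", "Stormers", 19, 20, -3.60),
]


def prediction_correct(home_score, away_score, margin):
    """True when the predicted and actual margins point to the same winner.

    Draws and zero margins are not handled; none occur in this table.
    """
    actual_margin = home_score - away_score
    return (actual_margin > 0) == (margin > 0)


results = [prediction_correct(hs, aw, m) for _, _, hs, aw, m in last_week]
print(f"last week: {sum(results)} of {len(results)} correct")

# Season-to-date figure quoted in the post: 13 correct out of 21 matches.
season_rate = 100 * 13 / 21
print(f"season so far: {season_rate:.1f}%")
```

This gives 5 of 7 for last week, matching the TRUE and FALSE entries, and 61.9% for the season so far.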

 

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game                         Date    Winner      Prediction
1  Chiefs vs. Highlanders    Mar 06  Chiefs           10.60
2  Brumbies vs. Force        Mar 06  Brumbies         13.00
3  Blues vs. Lions           Mar 07  Blues             9.30
4  Reds vs. Waratahs         Mar 07  Waratahs        -10.10
5  Cheetahs vs. Bulls        Mar 07  Bulls            -2.10
6  Stormers vs. Sharks       Mar 07  Stormers          4.00

 

March 2, 2015

A nice cuppa

Q: What do you think about this new research on tea preventing diabetes?

A: That’s not what it says.

Q: Sure it is. Big black letters, right at the top: “Three cups of tea a day can cut your risk of diabetes… even if you add milk”

A: I mean that’s not what the research says.

Q: The bit about milk?

A: Well, they didn’t study milk at all, but that’s not the main problem.

Q: They didn’t study cups?

A: No. Or diabetes. Or, in one of the studies, tea.

Q: Hmm. Ok, so this “glucose-lowering effect” they write about, is that a lab study?

A: Yes.

Q: Mice?

A: One of the studies used rats, the other didn’t.

Q: Cells, then?

A: No, just enzymes in a test tube, and a highly processed chemical extract of tea.

Q: Ok, forget about that one. But the rat study, that measured actual glucose lowering and actual tea?

A:  Almost. They gave the rats a high-sugar drink, and if they were given the tea first, their blood glucose didn’t go up as much.

Q: Which of the two studies was this one?

A: The one where the story just says the results were similar and doesn’t give the researchers’ names, only their institution.

Q: Wouldn’t you think the story would say more about this one, since it actually involves blood glucose and, like, living things?

A: In a perfect world, yes.

Q: The story says they don’t think milk would make a difference. What about sugar?

A: No mention of it.

Q: That’s strange. Quite a lot of British people have sugar in their tea. Wouldn’t it be helpful to say something?

A: You’d think.

Q: How much tea did the rats get?

A: The lowest effective dose they report is 62.5 mg/kg of freeze-dried tea powder.

Q: What’s that in cups?

A: The research paper says “corresponds to nine cups of black tea”.

Q: Per day?

A: No, all at once.

Q: So we need to get bigger cups?

A: Or fewer reprinted British ‘health’ stories.

Stat of the Week Competition: February 28 – March 6 2015

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday March 6 2015.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of February 28 – March 6 2015 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


February 27, 2015

Quake prediction: how good does it need to be?

From a detailed story in the ChCh Press (via Eric Crampton) about various earthquake-prediction approaches:

About 40 minutes before the quake began, the TEC in the ionosphere rose by about 8 per cent above expected levels. Somewhat perplexed, he looked back at the trend for other recent giant quakes, including the February 2010 magnitude 8.8 event in Chile and the December 2004 magnitude 9.1 quake in Sumatra. He found the same increase about the same time before the quakes occurred.

Heki says there has been considerable academic debate both supporting and opposing his research.

To have 40 minutes warning of a massive quake would be very useful indeed and could help save many lives. “So, why 40 minutes?” he says. “I just don’t know.”

He says if the link were to be proved more firmly in the future it could be a useful warning tool. However, there are drawbacks in that the correlation only appears to exist for the largest earthquakes, whereas big quakes of less than magnitude 8.0 are far more frequent and still cause death and devastation. Geomagnetic storms can also render the system impotent, with fluctuations in the total electron count masking any pre-quake signal.

Let’s suppose that with more research everything works out, and there is a rise in this TEC before all very large quakes. How much would this help in New Zealand? The obvious place is Wellington. A quake over 8.0 magnitude was observed in the area in 1855, when it triggered a tsunami. A repeat would also shatter many of the earthquake-prone buildings. A 40-minute warning could save many lives. It appears that TEC shouldn’t be that expensive to measure: it’s based on observing the time delays in GPS satellite transmissions as they pass through the ionosphere, so it mostly needs a very accurate clock (in fact, NASA publishes TEC maps every five minutes). Also, it looks like it would be very hard to hack the ionosphere to force the alarm to go off. The real problem is accuracy.

The system will have false positives and false negatives. False negatives (missing a quake) aren’t too bad, since that’s where you are without the system. False positives are more of a problem. They come in two forms: when the alarm goes off completely in the absence of a quake, and when there is a quake but no tsunami or catastrophic damage.

Complete false predictions would need to be very rare. If you tell everyone to run for the hills and it turns out to be sunspots or the wrong kind of snow, they will not be happy: the cost in lost work (and theft?) would be substantial, and there would probably be injuries.  Partial false predictions, where there was a large quake but it was too far away or in the wrong direction to cause a tsunami, would be just as expensive but probably wouldn’t cause as much ill-feeling or skepticism about future warnings.

Now for the disappointment. The story says “there has been considerable academic debate”. There has. For example, in a (paywalled) paper from 2013 looking at the Japanese quake that prompted Heki’s idea:

A detailed analysis of the ionospheric variability in the 3 days before the earthquake is then undertaken, where a simultaneous increase in foF2 and the Es layer peak plasma frequency, foEs, relative to the 30-day median was observed within 1 h before the earthquake. A statistical search for similar simultaneous foF2 and foEs increases in 6 years of data revealed that this feature has been observed on many other occasions without related seismic activity. Therefore, it is concluded that one cannot confidently use this type of ionospheric perturbation to predict an impending earthquake.

In translation: you need to look just right to see this anomaly, and there are often anomalies like this one without quakes. Over four years they saw 24 anomalies, only one shortly before a quake.  Six complete false positives per year is obviously too many.  Suppose future research could refine what the signal looks like and reduce the false positives by a factor of ten: that’s still evacuation alarms with no quake more than once every two years. I’m pretty sure that’s still too many.
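The arithmetic behind those rates, using only the figures quoted in the paragraph above (24 anomalies in four years, one of them before a quake) and the hypothetical tenfold improvement:

```python
anomalies = 24        # anomalies seen, as stated above
quake_related = 1     # the one that came shortly before a quake
years_observed = 4

# Complete false positives per year with the current signal definition.
false_alarms_per_year = (anomalies - quake_related) / years_observed

# Hypothetical: future research cuts false positives by a factor of ten.
refined_per_year = false_alarms_per_year / 10
years_between_alarms = 1 / refined_per_year

print(f"now: {false_alarms_per_year:.2f} false alarms per year")
print(f"after tenfold improvement: one every {years_between_alarms:.2f} years")
```

That is 5.75 false alarms a year now, and one roughly every 1.7 years after the improvement, which is where "more than once every two years" comes from.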

 

Siberian hamsters or Asian gerbils

Every year or so there is a news story along the lines of “Everything you know about the Black Death is Wrong”. I’ve just been reading a couple of excellent posts by Alison Atkin on this year’s one.

The Herald’s version of the story (which they got from the Independent) is typical (though she has captured a large set of other headlines):

The Black Death has always been bad publicity for rats, with the rodent widely blamed for killing millions of people across Europe by spreading the bubonic plague.

But it seems that the creature, in this case at least, has been unfairly maligned, as new research points the finger of blame at gerbils.

and

The scientists switched the blame from rat to gerbil after comparing tree-ring records from Europe with 7711 historical plague outbreaks.

That isn’t what the research paper (in PNAS) says. And it would be surprising if it did: could it really be true that Asian gerbils were spreading across Europe for centuries without anyone noticing?

The abstract of the paper says

The second plague pandemic in medieval Europe started with the Black Death epidemic of 1347–1353 and killed millions of people over a time span of four centuries. It is commonly thought that after its initial introduction from Asia, the disease persisted in Europe in rodent reservoirs until it eventually disappeared. Here, we show that climate-driven outbreaks of Yersinia pestis in Asian rodent plague reservoirs are significantly associated with new waves of plague arriving into Europe through its maritime trade network with Asia. This association strongly suggests that the bacterium was continuously reimported into Europe during the second plague pandemic, and offers an alternative explanation to putative European rodent reservoirs for how the disease could have persisted in Europe for so long.

If the researchers had found repeated, previously unsuspected, invasions of Europe by hordes of gerbils, they would have said so in the abstract. They don’t. Not a gerbil to be seen.

The hypothesis is that plague was repeatedly re-imported from Asia (where it affected lots of species, including, yes, gerbils) to European rats, rather than persisting at low levels in European rats between the epidemics. Either way, once the epidemic got to Europe, it’s all about the rats [update: and other non-novel forms of transmission].

In this example, for a change, it doesn’t seem that the press release is responsible. Instead, it looks like progressive mutations in the story as it’s transmitted, with the great gerbil gradually going from an illustrative example of a plague host in Asia to the rodent version of Attila the Hun.

Two final remarks. First, the erroneous story is now in the Wikipedia entry for the great gerbil (with a citation to the PNAS paper, so it looks as if it’s real). Second, when the story is allegedly about the confusion between two species of rodent, it’s a pity the Herald stock photo isn’t the right species.

 

[Update: Wikipedia has been fixed.]