Stats Chat

March 9, 2015

Stat of the Week Competition: March 7 – 13 2015

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday March 13 2015.
Statistics can be bad, exemplary or fascinating.
The statistic must be in the NZ media during the period of March 7 – 13 2015 inclusive.
Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

Not all there

By Thomas Lumley

One of the most common problems with data is that it’s not there. Families don’t answer their phones, over-worked nurses miss some forms, and even tireless electronic recorders have power failures.

There’s a large field of statistical research devoted to ways of fixing the missing-data problem. None of them work (that’s not my cynical opinion, that’s a mathematical theorem), but many of them are more likely to make things better than worse. The best ways to handle data you don’t have depend on what sort of data and why you don’t have it, but even the best ways can confuse people who aren’t paying attention.

Just ignoring the missing data problem and treating the data you have as all the data is effectively assuming the missing data look just like the observed data. This is often very implausible. For example, in a weight-loss study it is much more likely that people who aren’t losing weight will drop out. If you just analyse data from people who stay in the study and follow all your instructions, unless this is nearly everyone, they will probably have lost weight (on average) even if your treatment is just staring at a container of felt-tip pens.
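A small simulation makes the point concrete. It is only a sketch under made-up assumptions: the treatment does nothing, weight changes are drawn from a normal distribution centred on zero, and the chances of staying in the study (90% for people losing weight, 40% for everyone else) are invented for illustration.

```python
import random
import statistics


def simulate_dropout(n=10_000, seed=1):
    """Simulate a weight-loss study where the treatment has no effect.

    All numbers here are illustrative assumptions, not data from any study.
    Returns (mean change for everyone, mean change for completers) in kg.
    """
    rng = random.Random(seed)
    everyone = []
    completers = []
    for _ in range(n):
        change = rng.gauss(0.0, 3.0)  # kg; negative means weight lost
        everyone.append(change)
        # Assumed dropout rule: people who are not losing weight leave more often.
        p_stay = 0.9 if change < 0 else 0.4
        if rng.random() < p_stay:
            completers.append(change)
    return statistics.mean(everyone), statistics.mean(completers)


mean_all, mean_completers = simulate_dropout()
print(f"everyone enrolled: {mean_all:+.2f} kg")
print(f"completers only:   {mean_completers:+.2f} kg")
```

Under these assumptions the completers show an average loss of a little under a kilogram, even though the average change across everyone enrolled is close to zero.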

That’s why it is often sensible to treat missing observations as if they were bad. The Ministry of Health drinking water standards do this. For example, they say that only 96.7% of New Zealand received water complying with the bacteriological standards. That sounds serious. Of the 3.3% failures, however, more than half (2.0%) were just failures to monitor thoroughly enough, and only 0.1% had E. coli transgressions that were not followed up by immediate corrective action.
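The breakdown is simple arithmetic on the percentages quoted above. In this sketch the variable names are illustrative, the categories are assumed not to overlap, and the remainder is labelled only as "other" because its composition isn't given here.

```python
# Percentages of the population, as quoted in the paragraph above.
compliant = 96.7
monitoring_failures = 2.0      # failures to monitor thoroughly enough
ecoli_not_followed_up = 0.1    # E. coli transgressions without immediate corrective action

# Round to one decimal place to avoid floating-point noise in the subtraction.
non_compliant = round(100 - compliant, 1)
other = round(non_compliant - monitoring_failures - ecoli_not_followed_up, 1)
monitoring_share = monitoring_failures / non_compliant

print(f"non-compliant: {non_compliant}%")
print(f"monitoring failures as a share of non-compliance: {monitoring_share:.0%}")
print(f"E. coli not followed up: {ecoli_not_followed_up}%")
print(f"other (not broken down here): {other}%")
```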

From a regulatory point of view, lumping these together makes sense. The Ministry doesn’t want to create incentives for data to ‘accidentally’ go missing whenever there’s a problem. From a public health point of view, though, you can get badly confused if you just look at the headline compliance figure and don’t read down to page 18.

The Ministry takes a similarly conservative approach to the other standards, and the detailed explanations are more reassuring than the headline compliance figures. There are a small number of water supplies with worrying levels of arsenic — enough to increase lifetime cancer risk by a tenth of a percentage point or so — but in general the biggest problem is inadequate fluoride concentrations in drinking water for nearly half of Kiwi kids.

March 5, 2015

Showing us the money

By Thomas Lumley

The Herald is running a project to crowdsource data entry and annotation for NZ political donations and expenses: it’s something that’s hard to automate and where local knowledge is useful. Today, they have an interactive graph for 2014 election donations and have made the data available.

Briefly

By Thomas Lumley

“Yesterday the Herald reported on some confusion about what’s happening to New Zealand property prices. QV said prices were continuing their seemingly inexorable rise, while Barfoot & Thompson said that in Auckland prices had dipped, based on its own sales.” Aaron Schiff, arguing that housing price data should be open
Another tea story, but this one does say what experiments were done and what the equivalent dose would be in humans.
Predictive models in language: autocompleting to ‘Qatar’
Predictive models in language: detecting ‘fake’ (or Gaelic) names on Facebook
Predictive vs causal explanations in language, from XKCD

March 4, 2015

NRL Predictions for Round 1

By David Scott

Team Ratings for Round 1

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Rabbitohs	13.06	13.06	0.00
Cowboys	9.52	9.52	0.00
Roosters	9.09	9.09	0.00
Storm	4.36	4.36	0.00
Broncos	4.03	4.03	0.00
Panthers	3.69	3.69	0.00
Warriors	3.07	3.07	0.00
Sea Eagles	2.68	2.68	0.00
Bulldogs	0.21	0.21	0.00
Knights	-0.28	-0.28	0.00
Dragons	-1.74	-1.74	0.00
Raiders	-7.09	-7.09	0.00
Eels	-7.19	-7.19	0.00
Titans	-8.20	-8.20	0.00
Sharks	-10.76	-10.76	0.00
Wests Tigers	-13.13	-13.13	0.00

Predictions for Round 1

Here are the predictions for Round 1. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Broncos vs. Rabbitohs	Mar 05	Rabbitohs	-6.00
2	Eels vs. Sea Eagles	Mar 06	Sea Eagles	-6.90
3	Cowboys vs. Roosters	Mar 07	Cowboys	3.40
4	Knights vs. Warriors	Mar 07	Knights	0.70
5	Titans vs. Wests Tigers	Mar 07	Titans	7.90
6	Panthers vs. Bulldogs	Mar 08	Panthers	6.50
7	Sharks vs. Raiders	Mar 08	Raiders	-0.70
8	Dragons vs. Storm	Mar 09	Storm	-3.10
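The method itself isn't spelled out in this post. As a rough sketch only, most of the predictions in the table look like the home team's rating minus the away team's rating plus a home-ground allowance. The function names and the 3-point allowance below are assumptions made for illustration, not the published method; they come close for most rows but not all (game 4, for instance, does not fit).

```python
def predict_margin(home_rating, away_rating, home_advantage=3.0):
    """Expected points difference from the home team's point of view.

    home_advantage is an assumed value for illustration; the post does not state it.
    """
    return home_rating - away_rating + home_advantage


def predicted_winner(home, away, margin):
    """Positive margin: home team wins. Negative margin: away team wins."""
    return home if margin > 0 else away


# Ratings taken from the table above.
ratings = {"Broncos": 4.03, "Rabbitohs": 13.06, "Cowboys": 9.52, "Roosters": 9.09}

for home, away in [("Broncos", "Rabbitohs"), ("Cowboys", "Roosters")]:
    margin = predict_margin(ratings[home], ratings[away])
    print(f"{home} vs. {away}: {predicted_winner(home, away, margin)} by {abs(margin):.1f}")
```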

Super 15 Predictions for Round 4

By David Scott

Team Ratings for Round 4

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Waratahs	8.16	10.00	-1.80
Crusaders	7.07	10.42	-3.30
Hurricanes	5.52	2.89	2.60
Chiefs	4.08	2.23	1.90
Brumbies	3.84	2.20	1.60
Sharks	2.81	3.91	-1.10
Stormers	2.80	1.68	1.10
Bulls	1.82	2.88	-1.10
Blues	0.43	1.44	-1.00
Highlanders	-2.53	-2.54	0.00
Cheetahs	-4.29	-5.55	1.30
Lions	-4.33	-3.39	-0.90
Force	-5.17	-4.67	-0.50
Reds	-5.91	-4.98	-0.90
Rebels	-7.29	-9.53	2.20

Performance So Far

So far there have been 21 matches played, 13 of which were correctly predicted, a success rate of 61.9%.

Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Highlanders vs. Reds	Feb 27	20 – 13	8.10	TRUE
2	Force vs. Hurricanes	Feb 27	13 – 42	-3.40	TRUE
3	Cheetahs vs. Blues	Feb 27	25 – 24	-0.50	FALSE
4	Chiefs vs. Crusaders	Feb 28	40 – 16	-1.80	FALSE
5	Rebels vs. Brumbies	Feb 28	15 – 20	-7.60	TRUE
6	Bulls vs. Sharks	Feb 28	43 – 35	2.20	TRUE
7	Lions vs. Stormers	Feb 28	19 – 20	-3.60	TRUE
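The Correct column can be reproduced from the other two: a prediction counts as correct when its sign matches the sign of the actual home-minus-away margin. A minimal sketch, with the data typed in from the table above and helper names that are purely illustrative (draws are not handled, since none occur here):

```python
# (game, home score, away score, predicted margin) from the table above.
last_week = [
    ("Highlanders vs. Reds", 20, 13, 8.10),
    ("Force vs. Hurricanes", 13, 42, -3.40),
    ("Cheetahs vs. Blues", 25, 24, -0.50),
    ("Chiefs vs. Crusaders", 40, 16, -1.80),
    ("Rebels vs. Brumbies", 15, 20, -7.60),
    ("Bulls vs. Sharks", 43, 35, 2.20),
    ("Lions vs. Stormers", 19, 20, -3.60),
]


def is_correct(home_score, away_score, predicted_margin):
    """True when the predicted winner (by sign) matches the actual winner."""
    actual_margin = home_score - away_score
    return (actual_margin > 0) == (predicted_margin > 0)


results = [is_correct(h, a, p) for _, h, a, p in last_week]
print(results.count(True), "of", len(results), "correct last week")

# Season to date, using the counts quoted above.
season_rate = 13 / 21
print(f"season success rate: {season_rate:.1%}")
```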

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Chiefs vs. Highlanders	Mar 06	Chiefs	10.60
2	Brumbies vs. Force	Mar 06	Brumbies	13.00
3	Blues vs. Lions	Mar 07	Blues	9.30
4	Reds vs. Waratahs	Mar 07	Waratahs	-10.10
5	Cheetahs vs. Bulls	Mar 07	Bulls	-2.10
6	Stormers vs. Sharks	Mar 07	Stormers	4.00

March 2, 2015

A nice cuppa

By Thomas Lumley

Q: What do you think about this new research on tea preventing diabetes?

A: That’s not what it says

Q: Sure it is. Big black letters, right at the top: “Three cups of tea a day can cut your risk of diabetes… even if you add milk”

A: I mean that’s not what the research says

Q: The bit about milk?

A: Well, they didn’t study milk at all, but that’s not the main problem

Q: They didn’t study cups?

A: No. Or diabetes. Or, in one of the studies, tea.

Q: Hmm. Ok, so this “glucose-lowering effect” they write about, is that a lab study?

A: Yes.

Q: Mice?

A: One of the studies used rats, the other didn’t

Q: Cells, then?

A: No, just enzymes in a test tube, and a highly processed chemical extract of tea.

Q: Ok, forget about that one. But the rat study, that measured actual glucose lowering and actual tea?

A: Almost. They gave the rats a high-sugar drink, and if they were given the tea first, their blood glucose didn’t go up as much.

Q: Which of the two studies was this one?

A: The one where the story just says the results were similar and doesn’t give the researchers’ names, only their institution.

Q: Wouldn’t you think the story would say more about this one, since it actually involves blood glucose and, like, living things?

A: In a perfect world, yes.

Q: The story says they don’t think milk would make a difference. What about sugar?

A: No mention of it.

Q: That’s strange. Quite a lot of British people have sugar in their tea. Wouldn’t it be helpful to say something?

A: You’d think.

Q: How much tea did the rats get?

A: The lowest effective dose they report is 62.5 mg/kg of freeze-dried tea powder

Q: What’s that in cups?

A: The research paper says “corresponds to nine cups of black tea”.

Q: Per day?

A: No, all at once.

Q: So we need to get bigger cups?

A: Or fewer reprinted British ‘health’ stories.

Stat of the Week Competition: February 28 – March 6 2015

By Rachel Cunliffe

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday March 6 2015.
Statistics can be bad, exemplary or fascinating.
The statistic must be in the NZ media during the period of February 28 – March 6 2015 inclusive.
Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

February 27, 2015

Quake prediction: how good does it need to be?

By Thomas Lumley

From a detailed story in the ChCh Press (via Eric Crampton) about various earthquake-prediction approaches:

About 40 minutes before the quake began, the TEC in the ionosphere rose by about 8 per cent above expected levels. Somewhat perplexed, he looked back at the trend for other recent giant quakes, including the February 2010 magnitude 8.8 event in Chile and the December 2004 magnitude 9.1 quake in Sumatra. He found the same increase about the same time before the quakes occurred.

Heki says there has been considerable academic debate both supporting and opposing his research.

To have 40 minutes warning of a massive quake would be very useful indeed and could help save many lives. “So, why 40 minutes?” he says. “I just don’t know.”

He says if the link were to be proved more firmly in the future it could be a useful warning tool. However, there are drawbacks in that the correlation only appears to exist for the largest earthquakes, whereas big quakes of less than magnitude 8.0 are far more frequent and still cause death and devastation. Geomagnetic storms can also render the system impotent, with fluctuations in the total electron count masking any pre-quake signal.

Let’s suppose that with more research everything works out, and there is a rise in this TEC before all very large quakes. How much would this help in New Zealand? The obvious place is Wellington. A quake over 8.0 magnitude was observed in the area in 1855, when it triggered a tsunami. A repeat would also shatter many of the earthquake-prone buildings. A 40-minute warning could save many lives. It appears that TEC shouldn’t be that expensive to measure: it’s based on observing the time delays in GPS satellite transmissions as they pass through the ionosphere, so it mostly needs a very accurate clock (in fact, NASA publishes TEC maps every five minutes). Also, it looks like it would be very hard to hack the ionosphere to force the alarm to go off. The real problem is accuracy.
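To give a feel for the measurement: the ionosphere delays a GPS signal by an amount proportional to TEC and inversely proportional to the square of the signal frequency, so comparing the delay on two frequencies gives TEC. The sketch below uses the standard first-order approximation (range error in metres of about 40.3 × TEC / f²) and the GPS L1 and L2 frequencies. The function names, the arbitrary true range, and the example value of 20 TEC units are invented for illustration, and real receivers also have to deal with biases and noise that are ignored here.

```python
L1_HZ = 1575.42e6   # GPS L1 carrier frequency
L2_HZ = 1227.60e6   # GPS L2 carrier frequency
K = 40.3            # first-order ionospheric constant, m^3 / s^2
TECU = 1e16         # one TEC unit, electrons per square metre
C = 299_792_458.0   # speed of light, m/s


def iono_delay_m(tec, freq_hz):
    """Extra apparent range (metres) caused by the ionosphere at one frequency."""
    return K * tec / freq_hz**2


def tec_from_ranges(range_l1_m, range_l2_m):
    """Estimate TEC (electrons/m^2) from pseudoranges measured on L1 and L2."""
    return (range_l2_m - range_l1_m) / (K * (1 / L2_HZ**2 - 1 / L1_HZ**2))


# Round trip with an assumed TEC of 20 TECU and an arbitrary true range.
true_range = 20_200_000.0
tec = 20 * TECU
p1 = true_range + iono_delay_m(tec, L1_HZ)
p2 = true_range + iono_delay_m(tec, L2_HZ)

print(f"L1 delay: {iono_delay_m(tec, L1_HZ) / C * 1e9:.1f} ns")
print(f"recovered TEC: {tec_from_ranges(p1, p2) / TECU:.1f} TECU")
```

The delay in this example is around ten nanoseconds, which is why the timing has to be so precise.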

The system will have false positives and false negatives. False negatives (missing a quake) aren’t too bad, since that’s where you are without the system. False positives are more of a problem. They come in two forms: when the alarm goes off completely in the absence of a quake, and when there is a quake but no tsunami or catastrophic damage.

Complete false predictions would need to be very rare. If you tell everyone to run for the hills and it turns out to be sunspots or the wrong kind of snow, they will not be happy: the cost in lost work (and theft?) would be substantial, and there would probably be injuries. Partial false predictions, where there was a large quake but it was too far away or in the wrong direction to cause a tsunami, would be just as expensive but probably wouldn’t cause as much ill-feeling or skepticism about future warnings.

Now for the disappointment. The story says “there has been considerable academic debate”. There has. For example, in a (paywalled) paper from 2013 looking at the Japanese quake that prompted Heki’s idea:

A detailed analysis of the ionospheric variability in the 3 days before the earthquake is then undertaken, where a simultaneous increase in foF2 and the Es layer peak plasma frequency, foEs, relative to the 30-day median was observed within 1 h before the earthquake. A statistical search for similar simultaneous foF2 and foEs increases in 6 years of data revealed that this feature has been observed on many other occasions without related seismic activity. Therefore, it is concluded that one cannot confidently use this type of ionospheric perturbation to predict an impending earthquake.

In translation: you need to look just right to see this anomaly, and there are often anomalies like this one without quakes. Over four years they saw 24 anomalies, only one shortly before a quake. Six complete false positives per year is obviously too many. Suppose future research could refine what the signal looks like and reduce the false positives by a factor of ten: that’s still evacuation alarms with no quake more than once every two years. I’m pretty sure that’s still too many.
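The arithmetic behind those rates, as a quick sketch using only the counts quoted above (variable names are illustrative):

```python
years = 4
anomalies = 24
before_quake = 1

false_alarms = anomalies - before_quake        # anomalies with no quake
false_per_year = false_alarms / years          # roughly six a year
share_followed_by_quake = before_quake / anomalies

# Hypothetical improvement: ten times fewer false positives.
improved_per_year = false_per_year / 10
years_between_false_alarms = 1 / improved_per_year

print(f"false alarms per year: {false_per_year:.2f}")
print(f"anomalies followed by a quake: {share_followed_by_quake:.1%}")
print(f"after a tenfold improvement: one false alarm every {years_between_false_alarms:.1f} years")
```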

Siberian hamsters or Asian gerbils

By Thomas Lumley

Every year or so there is a news story along the lines of “Everything you know about the Black Death is Wrong”. I’ve just been reading a couple of excellent posts by Alison Atkin on this year’s one.

The Herald’s version of the story (which they got from the Independent) is typical (she has also captured a large set of headlines):

The Black Death has always been bad publicity for rats, with the rodent widely blamed for killing millions of people across Europe by spreading the bubonic plague.

But it seems that the creature, in this case at least, has been unfairly maligned, as new research points the finger of blame at gerbils.

and

The scientists switched the blame from rat to gerbil after comparing tree-ring records from Europe with 7711 historical plague outbreaks.

That isn’t what the research paper (in PNAS) says. And it would be surprising if it did: could it really be true that Asian gerbils were spreading across Europe for centuries without anyone noticing?

The abstract of the paper says:

The second plague pandemic in medieval Europe started with the Black Death epidemic of 1347–1353 and killed millions of people over a time span of four centuries. It is commonly thought that after its initial introduction from Asia, the disease persisted in Europe in rodent reservoirs until it eventually disappeared. Here, we show that climate-driven outbreaks of Yersinia pestis in Asian rodent plague reservoirs are significantly associated with new waves of plague arriving into Europe through its maritime trade network with Asia. This association strongly suggests that the bacterium was continuously reimported into Europe during the second plague pandemic, and offers an alternative explanation to putative European rodent reservoirs for how the disease could have persisted in Europe for so long.

If the researchers had found repeated, previously unsuspected, invasions of Europe by hordes of gerbils, they would have said so in the abstract. They don’t. Not a gerbil to be seen.

The hypothesis is that plague was repeatedly re-imported from Asia (where it affected lots of species, including, yes, gerbils) to European rats, rather than persisting at low levels in European rats between the epidemics. Either way, once the epidemic got to Europe, it’s all about the rats [update: and other non-novel forms of transmission].

In this example, for a change, it doesn’t seem that the press release is responsible. Instead, it looks like progressive mutations in the story as it’s transmitted, with the great gerbil gradually going from an illustrative example of a plague host in Asia to the rodent version of Attila the Hun.

Two final remarks. First, the erroneous story is now in the Wikipedia entry for the great gerbil (with a citation to the PNAS paper, so it looks as if it’s real). Second, when the story is allegedly about the confusion between two species of rodent, it’s a pity the Herald stock photo isn’t the right species.

[Update: Wikipedia has been fixed.]
