October 5, 2017

Strength of evidence and type of evidence

The last 127 people to win a Nobel Prize for Physics have been men. Various people have calculated probabilities for this under a model where the true probability each time is 50:50.  Those probabilities are very small: they start with 37 zeros. Now, as people who analyse coincidences will tell you, there’s potential for cherry-picking. Gender of Nobel Laureates in Physics isn’t the only possible comparison.  On the other hand, if the comparison were picked from a thousand million billion trillion competing comparisons, the probability would still be tiny. The hypothesis that the committee chooses people at random for the Nobel Prize in Physics is not statistically defensible.  In fact, even the hypothesis that the committee chooses people with physics PhDs at random isn’t statistically defensible.
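
As a rough check on the arithmetic above, here is a minimal Python sketch of the coin-flip calculation and the cherry-picking allowance. The Bonferroni-style multiplication is one simple way to allow for multiple comparisons, chosen here for illustration; the variable names are not from any published calculation.

```python
from fractions import Fraction

# Probability of 127 men in a row if each laureate were a fair coin flip.
p_all_men = Fraction(1, 2) ** 127

# Generous allowance for cherry-picking: a thousand million billion trillion
# (10^3 * 10^6 * 10^9 * 10^12 = 10^30) competing comparisons.
n_comparisons = 10 ** 30

# Bonferroni-style upper bound on the chance that at least one of the
# comparisons looks this extreme purely by chance.
p_adjusted = min(Fraction(1), p_all_men * n_comparisons)

print(f"unadjusted: {float(p_all_men):.2e}")
print(f"adjusted:   {float(p_adjusted):.2e}")
```

Even after multiplying by 10^30 the bound is still only a few parts in a billion, which is why the cherry-picking objection does not rescue the random-choice hypothesis.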

The reason there’s still controversy is that ‘choosing people at random’ isn’t anyone’s claim about how the Nobel Prize in Physics works. Roughly speaking, there are three explanations for it just being given to men:

  1. Physics is too hard for women, who should just stick to biostatistics
  2. Women don’t get many opportunities to lead really ground-breaking physics research, because science is sexist
  3. The Nobel Committee for Physics (or the people eligible to nominate) are less likely to choose women who have contributed just as much

The p-value with a ridiculous number of zeros doesn’t provide any basis for assessing how important the three explanations are.  You need more data — and a different type of data.

So, for example, it’s relevant that Lise Meitner was nominated 29 times for the prize (and 19 times for the Chemistry prize) and didn’t win, but that Otto Hahn did win for their joint work. It’s relevant that Chien-Shiung Wu was nominated 7 times and that a prize was awarded for the discovery she worked on. It’s relevant that Vera Rubin received lots of other prizes and awards and was routinely mentioned as a possible Nobelist; we don’t yet know how often she was nominated, because there’s a 50-year secrecy rule.

Personally (though I’m not a physicist) I think that explanation 1 can be largely discounted and explanation 2 has to stretch a lot to cover the situation, so explanation 3 is looking plausible. But the numbers with 37 zeros aren’t a relevant summary of the data.

October 4, 2017


  • Data leakage: Bluetooth sex toys do not have a good sense of what’s a private activity (probably NSFW)
  • “Science in Society” award winners from the (US) National Association of Science Writers
  • The Nobel Prize for Physics went to gravitational wave astronomy. That’s a more statistical area than usual — extracting minute gravitational-wave signals from the background noise is a statistical challenge as well as an engineering nightmare. Renate Meyer, from the UoA Statistics department, and her co-workers, did some of the early work on this problem, and Matt Edwards (who we’re hoping to get back after a postdoc overseas) is a member of the LIGO Consortium.

Slip, slop, slap

From Stuff, the front-page link:


As Betteridge’s Law of Headlines implies, the answer is “No.” Even the vendor doesn’t make a claim like that.

The story says (with the advertising redacted)

The key ingredient in the capsules is 100mg of … a blend of grapefruit and rosemary extracts. An independent lab trial of [the stuff] in Italy in 2015 found the onset of sunburn was delayed by 30 percent after two months of daily use.

It appears to be a still-unpublished study. According to an advertising white paper, it’s actually better than a lot of nutraceutical research: it was blinded and had 35 people in each group. If we assume there aren’t any hidden problems, the study says that people who take this stuff daily for a couple of months end up needing about 30% more UV light to get a mild sunburn.

That is, the optimistic view is we’re looking at the equivalent of SPF 1.3 sunscreen.
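
The conversion is simple enough to sketch. This assumes the usual idealised reading of SPF as the ratio of the UV dose needed to burn with protection to the dose needed without it; the function names are made up for illustration.

```python
def spf_from_extra_dose(extra_dose_fraction):
    """SPF implied by needing this much extra UV dose to burn (0.30 = 30% more)."""
    return 1.0 + extra_dose_fraction


def fraction_blocked(spf):
    """Idealised share of burning UV that a given SPF keeps out."""
    return 1.0 - 1.0 / spf


capsule_spf = spf_from_extra_dose(0.30)
print(f"capsules: SPF {capsule_spf:.1f}, blocks {fraction_blocked(capsule_spf):.0%}")
print(f"SPF 30 sunscreen blocks {fraction_blocked(30):.0%}")
```

On that reading the capsules keep out a bit under a quarter of the burning dose, against well over nine-tenths for an ordinary SPF 30 sunscreen.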

October 3, 2017

Mitre 10 Cup Predictions for Round 8

Team Ratings for Round 8

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team              Current Rating  Rating at Season Start  Difference
Canterbury                 18.69                   14.78        3.90
Wellington                  9.62                   -1.62       11.20
Taranaki                    8.28                    7.04        1.20
Tasman                      4.19                    9.54       -5.40
Otago                       3.93                   -0.34        4.30
North Harbour               3.39                   -1.27        4.70
Counties Manukau           -0.67                    5.70       -6.40
Auckland                   -0.73                    6.11       -6.80
Manawatu                   -2.55                   -3.59        1.00
Bay of Plenty              -3.03                   -3.98        1.00
Waikato                    -3.41                   -0.26       -3.10
Northland                  -4.57                  -12.37        7.80
Hawke’s Bay               -13.02                   -5.85       -7.20
Southland                 -22.74                  -16.50       -6.20


Performance So Far

So far there have been 54 matches played, 38 of which were correctly predicted, a success rate of 70.4%.
Here are the predictions for last week’s games.

#  Game                            Date    Score    Prediction  Correct
1  Northland vs. Otago             Sep 27  32 – 30       -4.90  FALSE
2  Taranaki vs. Tasman             Sep 28  40 – 26        6.80  TRUE
3  North Harbour vs. Hawke’s Bay   Sep 29  33 – 30       24.20  TRUE
4  Southland vs. Manawatu          Sep 30  20 – 25      -18.60  TRUE
5  Auckland vs. Bay of Plenty      Sep 30  38 – 19        3.50  TRUE
6  Canterbury vs. Waikato          Sep 30  37 – 17       27.40  TRUE
7  Wellington vs. Otago            Oct 01  27 – 24       10.50  TRUE
8  Counties Manukau vs. Northland  Oct 01  25 – 16        8.30  TRUE


Predictions for Round 8

Here are the predictions for Round 8. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

#  Game                            Date    Winner         Prediction
1  Tasman vs. North Harbour        Oct 04  Tasman               4.80
2  Manawatu vs. Counties Manukau   Oct 05  Manawatu             2.10
3  Canterbury vs. Taranaki         Oct 06  Canterbury          14.40
4  Otago vs. Bay of Plenty         Oct 07  Otago               11.00
5  Northland vs. Hawke’s Bay       Oct 07  Northland           12.40
6  Southland vs. Wellington        Oct 07  Wellington         -28.40
7  Tasman vs. Auckland             Oct 08  Tasman               8.90
8  Waikato vs. North Harbour       Oct 08  North Harbour       -2.80
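
The method itself is described elsewhere, but the Round 8 numbers above look consistent with a simple rule: home rating minus away rating, plus a fixed home advantage. The sketch below checks that reading against the published table. The 4-point home advantage is an assumption inferred from the tables in this post, not a stated parameter, and the real method may well differ.

```python
# Current ratings copied from the Mitre 10 Cup table in this post.
ratings = {
    "Canterbury": 18.69, "Wellington": 9.62, "Taranaki": 8.28,
    "Tasman": 4.19, "Otago": 3.93, "North Harbour": 3.39,
    "Counties Manukau": -0.67, "Auckland": -0.73, "Manawatu": -2.55,
    "Bay of Plenty": -3.03, "Waikato": -3.41, "Northland": -4.57,
    "Hawke's Bay": -13.02, "Southland": -22.74,
}

HOME_ADVANTAGE = 4.0  # assumption: inferred from the tables, not stated in the post


def predict(home, away):
    """Expected points difference; positive means a home win."""
    return ratings[home] - ratings[away] + HOME_ADVANTAGE


# (home, away, published prediction) for Round 8.
fixtures = [
    ("Tasman", "North Harbour", 4.80),
    ("Manawatu", "Counties Manukau", 2.10),
    ("Canterbury", "Taranaki", 14.40),
    ("Otago", "Bay of Plenty", 11.00),
    ("Northland", "Hawke's Bay", 12.40),
    ("Southland", "Wellington", -28.40),
    ("Tasman", "Auckland", 8.90),
    ("Waikato", "North Harbour", -2.80),
]

for home, away, published in fixtures:
    margin = predict(home, away)
    winner = home if margin > 0 else away
    print(f"{home} vs. {away}: {margin:+.2f} (published {published:+.2f}), pick {winner}")

# Largest disagreement between the sketch and the published table.
max_gap = max(abs(predict(h, a) - p) for h, a, p in fixtures)
```

Under that assumption every Round 8 margin is reproduced to within the rounding shown in the table.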


Currie Cup Predictions for Round 13

Team Ratings for Round 13

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team              Current Rating  Rating at Season Start  Difference
Sharks                      5.89                    2.15        3.70
Western Province            3.98                    3.30        0.70
Lions                       2.44                    7.41       -5.00
Cheetahs                    1.59                    4.33       -2.70
Blue Bulls                  0.01                    2.32       -2.30
Pumas                      -6.51                  -10.63        4.10
Griquas                   -10.15                  -11.62        1.50


Performance So Far

So far there have been 36 matches played, 24 of which were correctly predicted, a success rate of 66.7%.
Here are the predictions for last week’s games.

#  Game                            Date    Score    Prediction  Correct
1  Sharks vs. Lions                Sep 29  24 – 10        6.80  TRUE
2  Griquas vs. Cheetahs            Sep 30  59 – 24       -9.50  FALSE
3  Blue Bulls vs. Western Province Oct 01  45 – 46        0.80  FALSE


Predictions for Round 13

Here are the predictions for Round 13. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

The Cheetahs have been playing in the Pro 14 competition and fielding a second or third team in the Currie Cup. They have another such game this weekend causing problems with prediction. After 4 such games a quick estimate is that they might score 27 points less than if they were fielding their best team, but losses so far will have dropped their rating a couple of points already. I would guess that a difference of 25 points might be appropriate, so instead of the points difference being 6.1 below, it might be -19 and a win to the Blue Bulls.

#  Game                            Date    Winner         Prediction
1  Cheetahs vs. Blue Bulls         Oct 06  Cheetahs             6.10
2  Griquas vs. Pumas               Oct 07  Griquas              0.90
3  Lions vs. Western Province      Oct 08  Lions                3.00
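
The manual adjustment for the Cheetahs game described above amounts to one subtraction. A small sketch, using the 25-point guess from the text (the variable names are illustrative only):

```python
published_margin = 6.10       # Cheetahs vs. Blue Bulls, from the table
weakened_team_penalty = 25.0  # the guessed cost of fielding a second or third team

# A positive margin is a home (Cheetahs) win, a negative one an away win.
adjusted_margin = published_margin - weakened_team_penalty
adjusted_winner = "Cheetahs" if adjusted_margin > 0 else "Blue Bulls"

print(f"adjusted margin: {adjusted_margin:+.1f}, pick {adjusted_winner}")
```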


October 2, 2017

Denominators (when cellphones attack)

A question that is very unlikely to be interesting: were there more cellphone-related injuries in Dunedin or Auckland last year?

Auckland has a lot more people. Of course it has more cellphone-related injuries.

A question that is moderately unlikely to be interesting, but, ok, you might need to write a story: were people in Auckland more likely to have cellphone-related injuries than people in Dunedin?

So, where the Herald website (and presumably the ODT originally) has

In the three years to the end of 2016, ACC received 23 claims for cellphone injuries from Dunedin people and paid claimants a total of $10,436…

Statistics provided by ACC show Aucklanders made the highest number of claims at 190, costing a total of $76,159

the second paragraph might be better as

Although Auckland has more than ten times as many people, the home of the Vodafone Warriors had only 190 claims, costing a total of $76,159

(Someone who can actually write might do better than me here.)
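
The denominator point can be made with the numbers in the story alone. In this sketch the population ratio of 10 is a lower bound taken from "more than ten times as many people" rather than an actual census figure, so the per-person comparison is an upper bound for Auckland.

```python
dunedin_claims, dunedin_cost = 23, 10_436
auckland_claims, auckland_cost = 190, 76_159

# Auckland has roughly eight times as many claims as Dunedin...
claims_ratio = auckland_claims / dunedin_claims

# ...but more than ten times as many people (assumption: 10 used as a lower bound).
population_ratio_lower_bound = 10

# Upper bound on Auckland's per-person claim rate relative to Dunedin's.
relative_rate_upper_bound = claims_ratio / population_ratio_lower_bound

print(f"claims ratio: {claims_ratio:.2f}")
print(f"Auckland rate relative to Dunedin: at most {relative_rate_upper_bound:.2f}")
print(f"cost per claim: Dunedin ${dunedin_cost / dunedin_claims:.0f}, "
      f"Auckland ${auckland_cost / auckland_claims:.0f}")
```

So per person, Aucklanders made fewer claims than Dunedin people, even though the raw count is the highest in the country.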

Stat of the Week Competition: September 30 – October 6 2017

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday October 6 2017.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of September 30 – October 6 2017 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


September 30, 2017

Simple and ineffective

Q: Did you see there’s a new test to predict dementia?

A: Another one?

Q: Yes, the Herald says it “would allow drugs and lifestyle changes, such as a healthy diet and more exercise, to be more effective before the devastating condition takes hold.”

A: That would make more sense if there were drugs and lifestyle changes that actually worked to stop the disease process.

Q: At least it’s a simple and accurate test. It’s just based on your sense of smell.

A: <dubious noises>

Q: But “almost all the participants, aged 57 to 85, who were unable to name a single scent had been diagnosed with dementia. And nearly 80 per cent of those who provided just one or two correct answers also had it.”

A: That’s not what the research says.

Q: It’s what the story says.

A: Yes. Yes, it is.

Q: Ok, what does the research say? It’s behind a paywall.

A: Here’s a graph

Q: That matches the story, doesn’t it?

A: Check the axis labels.

Q: Oh. 8% and 10%? But couldn’t the labels just be wrong?

A: Rather than the Daily Mail? It’s possible, but the research paper also says “9% positive predictive value”, meaning that only 9% of those who are predicted to get dementia actually do, and that matches the graph.

Q: Um

A: And there’s a commentary in the same issue of the journal, headlined  Screening Is Not Benign and saying “No test with such a low [positive predictive value] would be taken seriously as a way to identify any disease in a population”

Q: But it’s still a big difference, isn’t it?

A: Yes, and it’s scientifically interesting that the nerves or brain cells related to smell seem to be damaged relatively early in the disease, but it’s not a predictive test.
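
To see how a test can pick up a real difference and still have a positive predictive value around 9%, here is a small Bayes' rule sketch. The prevalence, sensitivity and specificity below are hypothetical values chosen only to land near the 9% quoted from the paper; they are not figures from the research.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of people who test positive who actually go on to have the disease."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs for illustration only (not from the paper).
ppv = positive_predictive_value(prevalence=0.04, sensitivity=0.47, specificity=0.80)
base_rate = 0.04

print(f"PPV: {ppv:.1%}, versus a base rate of {base_rate:.0%}")
```

With inputs like these, people who fail the test are about twice as likely as average to develop dementia, yet more than nine in ten of them still do not.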


[Update: the source for the error seems to be the University of Chicago press release.]

[Update: It’s on Stuff, too]

September 27, 2017

Stat Soc of Australia on Marriage Survey

The Statistical Society of Australia has put out a press release on the Australian Marriage Law Postal Survey.  Their concern, in summary, is that if this is supposed to be a survey rather than a vote, the Government has required a pretty crap survey and this isn’t good.

The SSA is concerned that, as a result, the correct interpretation of the Survey results will be missed or ignored by some community groups, who may interpret the resulting proportion for or against same-sex marriage as representative of the opinion of all Australians. This may subsequently, and erroneously, damage the reputation of the ABS and the statistical community as a whole, when it is realised that the Survey results can not be understood in these terms.


The SSA is not aware of any official statistics based purely on unadjusted respondent data alone. The ABS routinely adjusts population numbers derived from the census to allow for under and over enumeration issues via its post-enumeration survey. However, under the Government direction, there is no scope to adjust for demographic biases or collect any information that might enable the ABS to even indicate what these biases might be.

If the aim was to understand the views of all Australians, an opinion survey would be more appropriate. High quality professionally-designed opinion surveys are routinely carried out by market research companies, the ABS, and other institutions. Surveys can be an efficient and powerful tool for canvassing a population, making use of statistical techniques to ensure that the results are proportioned according to the demographics of the population. With a proper survey design and analysis, public opinion can be reliably estimated to a specified accuracy. They can also be implemented at a fraction of the cost of the present Postal Survey. The ABS has a world-class reputation and expertise in this area.

(They’re not actually saying this is the most important deficiency of the process, just that it’s the most statistical one)
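
For a sense of what “proportioned according to the demographics of the population” means in practice, here is a minimal post-stratification sketch. All the numbers are invented for illustration, and the two age groups are a deliberately crude stand-in for real demographic cells. The SSA’s point is that the Postal Survey collects nothing that would allow even this much adjustment.

```python
# Invented example: two age groups, with known population shares.
population_share = {"under_40": 0.40, "40_plus": 0.60}

# Invented responses: older people are over-represented among respondents.
responses = {
    "under_40": {"n": 200, "yes": 140},
    "40_plus": {"n": 800, "yes": 400},
}

# Raw proportion: every response counts equally.
total_n = sum(group["n"] for group in responses.values())
raw_yes = sum(group["yes"] for group in responses.values()) / total_n

# Post-stratified estimate: weight each group's proportion by its population share.
weighted_yes = sum(
    population_share[name] * group["yes"] / group["n"]
    for name, group in responses.items()
)

print(f"raw: {raw_yes:.0%}, post-stratified: {weighted_yes:.0%}")
```

In this made-up case the unadjusted figure understates support by four percentage points, purely because of who chose to respond.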


  • “This 30-point shift could be because attitudes changed rapidly. Villasenor’s study was immediately after Charlottesville, for example, and students might be more primed to think about Nazi’s marching on their campus…It could also be because of differences in survey methods. Surveying college students is really hard.”
  • From the Ottawa Citizen: In six high-profile cases documented by the Citizen, searching the name of a young offender or victim online pointed to media coverage of their court cases, even though their names do not appear anywhere in the news articles themselves.