August 11, 2014

Stat of the Week Competition: August 9 – 15 2014

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday August 15 2014.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of August 9 – 15 2014 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


August 9, 2014

Briefly

Limits of measurement edition

  • “So you can either believe that Germany has no billionaires or that European statisticians aren’t very good at finding them.” Stories from Slate and Bloomberg on the difficulty of estimating wealth inequality.
  • “Big data really only has one unalloyed success on its track record, and it’s an old one: Google, specifically its Web search.” Another story from Slate, on Big Data and creepy experiments.
  • Even for the best drink-driving propaganda, such as the famous ‘Ghost Chips’ ad, the evaluation is basically in terms of public perception, because it’s too hard to evaluate actual impact on drink driving. A nice piece from TheWireless.

August 8, 2014

History of NZ Parliament visualisation

One frame of a video showing NZ party representation in Parliament over time,

[Image: nzparties]

made by Stella Blake-Kelly for TheWireless. Watch (and read) the whole thing.

August 7, 2014

Vitamin D context

There’s a story in the Herald about Alzheimer’s Disease risk being much higher in people with low vitamin D levels in their blood. This is observational data, where vitamin D was measured and the researchers then waited to see who would get dementia. That’s all in the story, and the problems aren’t the Herald’s fault.

The lead author of the research paper is quoted as saying

“Clinical trials are now needed to establish whether eating foods such as oily fish or taking vitamin D supplements can delay or even prevent the onset of Alzheimer’s disease and dementia.”

That’s true, as far as it goes, but you might have expected the person writing the press release to mention the existing randomised trial evidence.

The Women’s Health Initiative, one of the largest and probably the most expensive randomised trials ever, included randomisation to calcium and vitamin D or placebo. The goal was to look at prevention of fractures, with prevention of colon cancer as a secondary question, but they have data on dementia and they have published it:

During a mean follow-up of 7.8 years, 39 participants in the treatment group and 37 in the placebo group developed incident dementia (hazard ratio (HR) = 1.11, 95% confidence interval (CI) = 0.71-1.74, P = .64). Likewise, 98 treatment participants and 108 placebo participants developed incident [mild cognitive impairment] (HR = 0.95, 95% CI = 0.72-1.25, P = .72). There were no significant differences in incident dementia or [mild cognitive impairment] or in global or domain-specific cognitive function between groups.

That’s based on roughly 2000 women in each treatment group.
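The quoted intervals can be sanity-checked against the quoted p-values. This is a minimal sketch, assuming the usual normal approximation on the log hazard ratio scale; the function name `p_value_from_ci` is made up for illustration and the calculation is not taken from the paper.

```python
import math

def p_value_from_ci(hr, lower, upper, z_crit=1.96):
    """Back out an approximate two-sided p-value from a ratio and its 95% CI.

    Assumes the interval is symmetric on the log scale (normal approximation).
    """
    # Standard error of log(HR), recovered from the interval width on the log scale.
    se_log_hr = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se_log_hr
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    return math.erfc(abs(z) / math.sqrt(2))

# Figures quoted from the Women's Health Initiative paper above.
p_dementia = p_value_from_ci(1.11, 0.71, 1.74)
p_mci = p_value_from_ci(0.95, 0.72, 1.25)

print(f"dementia: p is roughly {p_dementia:.2f}")
print(f"mild cognitive impairment: p is roughly {p_mci:.2f}")
```

Because the inputs are rounded to two decimal places, the recovered p-values should only be expected to agree with the published ones approximately.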

The Women’s Health Initiative data doesn’t nail down all the possibilities. It could be that a higher dose is needed. It could be that the women were too healthy (although half of them had low vitamin D levels by usual criteria). The research paper mentions the Women’s Health Initiative and these possible explanations, so the authors were definitely aware of them.

If you’re going to tell people about a potential way to prevent dementia, it would be helpful to at least mention that one form of it has been tried and didn’t work.

Non-bogus non-random polling

As you know, one of the public services StatsChat provides is whingeing about bogus polls in the media, at least when they are used to anchor stories rather than just being decorative widgets on the webpage. This attitude doesn’t (or doesn’t necessarily) apply to polls that make no effort to collect a random sample but do make serious efforts to reduce bias by modelling the data. Personally, I think it would be better to apply these modelling techniques on top of standard sampling approaches, but that might not be feasible. You can’t do everything.

I’ve been prompted to write this by seeing Andrew Gelman and David Rothschild’s reasonable and measured response (and also Andrew’s later reasonable and less measured response) to a statement from the American Association for Public Opinion Research. The AAPOR said:

This week, the New York Times and CBS News published a story using, in part, information from a non-probability, opt-in survey sparking concern among many in the polling community. In general, these methods have little grounding in theory and the results can vary widely based on the particular method used. While little information about the methodology accompanied the story, a high level overview of the methodology was posted subsequently on the polling vendor’s website. Unfortunately, due perhaps in part to the novelty of the approach used, many of the details required to honestly assess the methodology remain undisclosed.

As the responses make clear, the accusation about transparency of methods is unfounded. The accusation about theoretical grounding is the pot calling the kettle black.  Standard survey sampling theory is one of my areas of research. I’m currently writing the second edition of a textbook on it. I know about its grounding in theory.

The classical theory applies to most of my applied sampling work, which tends to involve sampling specimen tubes from freezers. The theoretical grounding does not apply when there is massive non-response, as in all political polling. It is an empirical observation based on election results that carefully-done quota samples and reweighted probability samples of telephones give pretty good estimates of public opinion. There is no mathematical guarantee.

Since classical approaches to opinion polling work despite massive non-response, it’s reasonable to expect that modelling-based approaches to non-probability data will also work, and reasonable to hope that they might even work better (given sufficient data and careful modelling). Whether they do work better is an empirical question, but these model-based approaches aren’t a flashy new fad. Rod Little, who pioneered the methods AAPOR is objecting to, did so nearly twenty years before his stint as Chief Scientist at the US Census Bureau, an institution not known for its obsession with the latest fashions.
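To make “reducing bias by modelling” concrete, here is a minimal sketch of post-stratification, the simplest version of the idea. Everything in it is hypothetical: the two age groups, the sample sizes, the support levels and the population shares are invented for illustration, and this is not the method used in the New York Times/CBS poll. Serious model-based approaches use many more cells and a regression model for the cell estimates.

```python
# Hypothetical opt-in sample that over-represents older respondents.
sample = {
    "under_45": {"n": 200, "support": 0.60},
    "45_plus": {"n": 800, "support": 0.40},
}

# Hypothetical population shares, assumed known from census-type data.
population_share = {"under_45": 0.55, "45_plus": 0.45}

def raw_estimate(sample):
    """Unadjusted estimate: every respondent counts equally."""
    total = sum(group["n"] for group in sample.values())
    return sum(group["n"] * group["support"] for group in sample.values()) / total

def poststratified_estimate(sample, population_share):
    """Reweight each group's estimate to its share of the population."""
    return sum(population_share[name] * sample[name]["support"] for name in sample)

raw = raw_estimate(sample)
adjusted = poststratified_estimate(sample, population_share)
print(f"raw: {raw:.2f}, post-stratified: {adjusted:.2f}")
```

The adjustment needs population data for the groups, which is why a lack of population data can make modelling infeasible.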

In some settings modelling may not be feasible because of a lack of population data. In a few settings non-response is not a problem. Neither of those applies in US political polling. It’s disturbing when the president of one of the largest opinion-polling organisations argues that model-based approaches should not be referenced in the media, and that’s even before considering some of the disparaging language being used.

“Don’t try this at home” might have been a reasonable warning to pollsters without access to someone like Andrew Gelman. “Don’t try this in the New York Times” wasn’t.

New breast cancer gene

The Herald has a pretty good story about a gene, PALB2, where there are mutations that cause a substantially raised risk of breast cancer.  It’s not as novel as the story implies (the first sentence of the abstract is “Germline loss-of-function mutations in PALB2 are known to confer a predisposition to breast cancer.”), but the quantified increase in risk is new and potentially a useful thing to know.

Genetic testing for BRCA mutations is funded in NZ for people with a sufficiently strong family history, but the policy is to test one of the affected relatives first. This new gene demonstrates why.

If you had a high-risk family history of breast cancer, and tested negative for BRCA1 and BRCA2 mutations, you might assume you had missed out on the bad gene. It’s possible, though, that your family’s risk was due to some other mutation (in PALB2, or in another undiscovered gene), and in that case the negative test didn’t actually tell you anything. By testing an affected family member first, you can be sure you are looking in the right place for your risks, rather than just in the place that’s easiest to test.

August 6, 2014

With friends like these…

Via Alberto Cairo on Twitter, a picture from an introductory statistics text being sold at the big statistics conference in Boston this week:

[Image: picture from the textbook]

Income statistics

The Herald has a story headlined “Where to work if it’s money you’re after,” giving estimated median incomes across a range of job areas. Sadly, if you read to the end, two of the sources are summaries of advertised salaries for jobs listed on Seek and TradeMe. That is, they are neither actual incomes, nor for the country as a whole.

Rather than just whinge about unrepresentative data, I looked at StatsNZ. They divide things up differently, so there was only one job group in the story that exactly matched one on NZ.Stat. People working in construction have a median weekly income of $840 and mean weekly income of $956 according to the NZ Income Survey. If most people in construction worked all year, without periods of unemployment, this would come to a median annual income of $43,680 or a mean of $49,712.

The Herald thinks the median annual income in construction is $60,000-$78,000.
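The annualised construction figures above are just the weekly figures times 52. A minimal sketch of that arithmetic, assuming paid work in every week of the year (the function name `annualise` is made up for illustration):

```python
WEEKS_PER_YEAR = 52  # assumption: no periods of unemployment during the year

def annualise(weekly_income, weeks=WEEKS_PER_YEAR):
    """Convert a weekly income to an annual one, assuming `weeks` paid weeks."""
    return weekly_income * weeks

median_annual = annualise(840)  # NZ Income Survey median weekly income, construction
mean_annual = annualise(956)    # NZ Income Survey mean weekly income, construction

print(f"median: ${median_annual:,}  mean: ${mean_annual:,}")
```

Any weeks without work would pull the annual figures down, so this is an upper bound for a given weekly income.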

 

 

NRL Predictions for Round 22

Team Ratings for Round 22

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

| Team | Current Rating | Rating at Season Start | Difference |
| --- | --- | --- | --- |
| Rabbitohs | 9.79 | 5.82 | 4.00 |
| Sea Eagles | 9.07 | 9.10 | -0.00 |
| Warriors | 6.94 | -0.72 | 7.70 |
| Roosters | 6.48 | 12.35 | -5.90 |
| Cowboys | 5.45 | 6.01 | -0.60 |
| Storm | 4.45 | 7.64 | -3.20 |
| Broncos | 0.92 | -4.69 | 5.60 |
| Panthers | 0.91 | -2.48 | 3.40 |
| Bulldogs | -1.70 | 2.46 | -4.20 |
| Dragons | -1.84 | -7.57 | 5.70 |
| Knights | -3.61 | 5.23 | -8.80 |
| Titans | -5.63 | 1.45 | -7.10 |
| Eels | -6.41 | -18.45 | 12.00 |
| Wests Tigers | -8.33 | -11.26 | 2.90 |
| Raiders | -8.79 | -8.99 | 0.20 |
| Sharks | -9.50 | 2.32 | -11.80 |

 

Performance So Far

So far there have been 152 matches played, 85 of which were correctly predicted, a success rate of 55.9%.

Here are the predictions for last week’s games.

| # | Game | Date | Score | Prediction | Correct |
| --- | --- | --- | --- | --- | --- |
| 1 | Sea Eagles vs. Broncos | Aug 01 | 16 – 4 | 12.90 | TRUE |
| 2 | Bulldogs vs. Panthers | Aug 01 | 16 – 22 | 3.80 | FALSE |
| 3 | Sharks vs. Eels | Aug 02 | 12 – 32 | 5.90 | FALSE |
| 4 | Cowboys vs. Titans | Aug 02 | 28 – 8 | 14.50 | TRUE |
| 5 | Roosters vs. Dragons | Aug 02 | 30 – 22 | 14.00 | TRUE |
| 6 | Raiders vs. Warriors | Aug 03 | 18 – 54 | -6.10 | TRUE |
| 7 | Rabbitohs vs. Knights | Aug 03 | 50 – 10 | 13.30 | TRUE |
| 8 | Wests Tigers vs. Storm | Aug 04 | 6 – 28 | -5.20 | TRUE |
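The “Correct” column and the running success rate can be reproduced from the scores and predicted margins. This is a minimal sketch, assuming a prediction counts as correct when the predicted margin and the actual home-minus-away margin have the same sign; how a draw would be scored is not stated here, so the sketch arbitrarily treats it as incorrect.

```python
# (home team, away team, home score, away score, predicted home margin)
last_week = [
    ("Sea Eagles", "Broncos", 16, 4, 12.90),
    ("Bulldogs", "Panthers", 16, 22, 3.80),
    ("Sharks", "Eels", 12, 32, 5.90),
    ("Cowboys", "Titans", 28, 8, 14.50),
    ("Roosters", "Dragons", 30, 22, 14.00),
    ("Raiders", "Warriors", 18, 54, -6.10),
    ("Rabbitohs", "Knights", 50, 10, 13.30),
    ("Wests Tigers", "Storm", 6, 28, -5.20),
]

def is_correct(home_score, away_score, predicted_margin):
    """True when the predicted and actual margins point to the same winner."""
    actual_margin = home_score - away_score
    # Assumption: a drawn game (actual margin 0) counts as an incorrect prediction.
    return actual_margin * predicted_margin > 0

correct_flags = [is_correct(hs, as_, pred) for _, _, hs, as_, pred in last_week]
print(f"{sum(correct_flags)} of {len(correct_flags)} correct last week")

# Season to date, using the counts quoted above.
season_rate = 100 * 85 / 152
print(f"season success rate: {season_rate:.1f}%")
```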

 

Predictions for Round 22

Here are the predictions for Round 22. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

| # | Game | Date | Winner | Prediction |
| --- | --- | --- | --- | --- |
| 1 | Rabbitohs vs. Sea Eagles | Aug 08 | Rabbitohs | 5.20 |
| 2 | Broncos vs. Bulldogs | Aug 08 | Broncos | 7.10 |
| 3 | Cowboys vs. Wests Tigers | Aug 09 | Cowboys | 18.30 |
| 4 | Knights vs. Storm | Aug 09 | Storm | -3.60 |
| 5 | Eels vs. Raiders | Aug 09 | Eels | 6.90 |
| 6 | Warriors vs. Sharks | Aug 10 | Warriors | 20.90 |
| 7 | Dragons vs. Panthers | Aug 10 | Dragons | 1.80 |
| 8 | Roosters vs. Titans | Aug 11 | Roosters | 16.60 |
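The sign convention for the predicted margin maps directly onto the “Winner” column. A minimal sketch of that mapping (the function name `predicted_winner` is made up for illustration, and it says nothing about how the margins themselves are estimated):

```python
def predicted_winner(home, away, margin):
    """Positive margin: home team predicted to win; negative: away team."""
    if margin > 0:
        return home
    if margin < 0:
        return away
    return None  # a margin of exactly zero predicts no winner

# (home team, away team, predicted home margin) for Round 22
round_22 = [
    ("Rabbitohs", "Sea Eagles", 5.20),
    ("Broncos", "Bulldogs", 7.10),
    ("Cowboys", "Wests Tigers", 18.30),
    ("Knights", "Storm", -3.60),
    ("Eels", "Raiders", 6.90),
    ("Warriors", "Sharks", 20.90),
    ("Dragons", "Panthers", 1.80),
    ("Roosters", "Titans", 16.60),
]

winners = [predicted_winner(home, away, margin) for home, away, margin in round_22]
for (home, away, margin), winner in zip(round_22, winners):
    print(f"{home} vs. {away}: {winner} by {abs(margin):.2f}")
```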

 

Currie Cup Predictions for Round 1

Team Ratings for Round 1

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season. Note that new teams are given a rating of -10. This is somewhat arbitrary, but has proved reasonably satisfactory in predicting Super Rugby games.

| Team | Current Rating | Rating at Season Start | Difference |
| --- | --- | --- | --- |
| Sharks | 5.09 | 5.09 | 0.00 |
| Western Province | 3.43 | 3.43 | -0.00 |
| Cheetahs | 0.33 | 0.33 | 0.00 |
| Lions | 0.07 | 0.07 | 0.00 |
| Blue Bulls | -0.74 | -0.74 | 0.00 |
| Griquas | -7.49 | -7.49 | 0.00 |
| Kings | -10.00 | -10.00 | 0.00 |
| Pumas | -10.00 | -10.00 | 0.00 |

 

Predictions for Round 1

Here are the predictions for Round 1. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

| # | Game | Date | Winner | Prediction |
| --- | --- | --- | --- | --- |
| 1 | Kings vs. Western Province | Aug 08 | Western Province | -8.40 |
| 2 | Griquas vs. Sharks | Aug 09 | Sharks | -7.60 |
| 3 | Lions vs. Blue Bulls | Aug 09 | Lions | 5.80 |
| 4 | Pumas vs. Cheetahs | Aug 09 | Cheetahs | -5.30 |