Posts from June 2014 (44)

June 25, 2014

Something to listen to

Two people we have linked to a lot, Felix Salmon and Cathy O'Neil, now have a podcast on money and finance at Slate.

Not even wrong

The Reader's Digest "Most Trusted" lists are out again. Sigh.

Before we get to the actual complaint in the Stat-of-the-Week nomination, we should acknowledge that there's no way the "most trusted" list could make sense.

Firstly, 'trusted' requires more detail. What is it that we're trusting these people with? Of course, making the question more specific wouldn't help, since people will still answer on some vague 'niceness' scale anyway: we saw this problem with a Herald poll at the beginning of the year, which asked opinions about five notable people and found that the only one notable for his commitment to animal safety had the lowest rating for "who would you trust to feed your cat?".

Secondly, there's no useful way to get an accurate rating of dozens of people (or other items) in an opinion poll: people's brains overload.

Thirdly, even if you could get a rating from each respondent, the overall ranking will be sensitive to how you combine the individual ratings.

So how does Reader's Digest do it? They say (shouting in the original)

READER’S DIGEST COMMISSIONED CATALYST CONSULTANCY & RESEARCH TO POLL A REPRESENTATIVE SAMPLE OF NEW ZEALANDERS ABOUT TRUSTED PEOPLE AND PROFESSIONS. A TOTAL OF 603 ADULTS RANKED 100 WELL-KNOWN PEOPLE AND 50 JOB TYPES ON A SCALE OF ONE TO TEN IN MARCH 2014.

That is, the list is determined in advance, and the polling just determines the ordering within the list. There is some vague sense in which Willie Apiata is the most trusted person, or at least the most highly-regarded person, or at least the most highly-regarded famous person, in New Zealand, but there really isn't any useful sense in which Hone Harawira is the least trusted person in New Zealand. There are many people in NZ whom you'd expect to be less trusted than Mr Harawira; they didn't get put on the list, and the survey respondents weren't asked about them.

It's not surprising that stories keep coming out about this list, and I suppose it's not surprising that people try to interpret being at the bottom of it. Perhaps more surprisingly, no-one has yet complained that there are actually 101 well-known people on the list, not 100.

June 24, 2014

Beyond clinical trials?

From The Atlantic

And with reliable simulations for what’s happening at the cellular level, this approach could be used to treat patients and also to test new drugs and devices. Dassault Systèmes is focusing on that level of granularity now, trying to simulate propagation of cholesterol in human cells and building oncological cell models. “It’s data science and modeling,” Charlès told me. “Coupling the two creates a new environment in medicine.”

Charlès and his colleagues believe that a shift to virtual clinical trials—that is, testing new medicines and devices using computer models before or instead of trials in human patients—could make new treatments available more quickly and cheaply. 

From pharmaceutical chemist Derek Lowe, in response

Speed the day. The cost of clinical trials, coupled with their low success rate, is eating us alive in this business (and it’s getting worse every year). This is just the sort of thing that could rescue us from the walls that are closing in more tightly all the time. But this talk of shifts and revolutions makes it sound as if this sort of thing is happening right now, which it isn’t. No such simulated clinical trial, one that could serve as the basis for a drug approval, is anywhere near even being proposed. How long before one is, then? If things go really swimmingly, I’d say 20 to 25 years from now, personally, but I’d be glad to hear other estimates.

We do, potentially, have the tools to use current treatments more effectively, and data science can help. Even there, the biggest opportunities have nothing to do with subtle individual differences — for example, both here and in the US, only about half of people with hypertension are being treated.

June 23, 2014

Possibly underreported

From Stuff, the headline “Cheating on the rise at Massey.” The basis for the story is that there were 56 incidents from 56 separate students reported in 2012, and 72 incidents from 51 separate students reported last year.

We aren't told whether that's out of the 35000 total students, the 18000 on-campus students, or the 9000 at the Manawatu campus. Even with the smallest denominator, the cheating rate is only about half a percent. Taking this at face value requires a touching faith in the honesty of Massey students, since the rate is a couple of orders of magnitude lower than self-report surveys often find for ever having plagiarised in college, and five times lower than a careful experiment found for a single assignment in US colleges (PDF).
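
As a quick check on that arithmetic, here is a minimal sketch using the figures quoted above (the 51 is last year's count of separate students):

    # Reported-cheating rate under each candidate denominator from the story.
    students_reported = 51  # separate students reported last year

    for label, n in [("all Massey students", 35000),
                     ("on-campus students", 18000),
                     ("Manawatu campus", 9000)]:
        print(f"{label}: {students_reported / n:.2%}")

    # Manawatu campus: 0.57% -- the "about half a percent" above, and
    # roughly two orders of magnitude below typical self-report rates.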

Since reported incidents of cheating are a small minority of actual incidents, it’s hard to say anything sensible about trends from two years at a single university, especially as the story says Massey is taking new steps to combat cheating. There’s no way to disentangle changes in reporting from changes in cheating.

Briefly

  • From The Functional Art, ethics in infographics
  • From Scott Aaronson, is it possible to define morality or trust the way Google defines reliability?
  • “Ethics in Graphic Design” is a forum for the exploration of ethical issues in graphic design. It is intended to be used as a resource and to create an open dialogue among graphic designers about these critical issues. 

Undecided?

My attention was drawn on Twitter to this post at The Political Scientist, arguing that the election poll reporting is misleading because it doesn't report the results for the relatively popular 'Undecided' party. The post is making a good point, but there are two things I want to comment on. Actually, three things. The zeroth thing is that the post contains the numbers, but only as screenshots, not as anything useful.

The first point is that the post uses correlation coefficients to do everything, and these really aren’t fit for purpose. The value of correlation coefficients is that they summarise the (linear part of the) relationship between two variables in a way that doesn’t involve the units of measurement or the direction of effect (if any). Those are bugs, not features, in this analysis. The question is how the other party preferences have changed with changes in the ‘Undecided’ preference — how many extra respondents picked Labour, say, for each extra respondent who gave a preference. That sort of question is answered  (to a straight-line approximation) by regression coefficients, not correlation coefficients.

When I do a set of linear regressions, I estimate that changes in the Undecided vote over the past couple of years have split approximately  70:20:3.5:6.5 between Labour:National:Greens:NZFirst.  That confirms the general conclusion in the post: most of the change in Undecided seems to have come from  Labour. You can do the regressions the other way around and ask where (net) voters leaving Labour have gone, and find that they overwhelmingly seem to have gone to Undecided.
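
Since the post's numbers exist only as screenshots, here is a minimal sketch of the regression-versus-correlation point with invented poll figures (nine polls, to match the real series; the actual data are in the re-typed table linked at the end):

    import numpy as np

    # Invented percent-support figures for nine successive polls.
    undecided = np.array([8.5, 9.1, 9.8, 10.4, 11.0, 11.9, 12.5, 13.2, 14.0])
    labour = np.array([34.0, 33.6, 33.1, 32.6, 32.2, 31.5, 31.1, 30.6, 30.0])

    # Correlation: unit-free and symmetric, so it says the two move
    # together but not by how much.
    r = np.corrcoef(undecided, labour)[0, 1]

    # Regression of Labour on Undecided: the slope has units -- points
    # of Labour per point of Undecided -- which is the actual question.
    slope, intercept = np.polyfit(undecided, labour, 1)

    print(f"correlation: {r:.2f}")  # near -1 whatever the size of the split
    print(f"slope: {slope:.2f}")    # about -0.7: a 70% share, in these data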

What can we conclude from this? The conclusion is pretty limited because of the small number of polls (9) and the fact that we don’t actually have data on switching for any individuals. You could fit the data just as well by saying that Labour voters have switched to National and National voters have switched to Undecided by the same amount — this produces the same counts, but has different political implications. Since the trends have basically been a straight line over this period it’s fairly easy to get alternative explanations — if there had been more polls and more up-and-down variation the alternative explanations would be more strained.
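
Here is a toy version of that point, with invented numbers:

    # Two switching stories that yield identical poll totals
    # (per 100 respondents; all numbers invented).
    start = {"Labour": 34, "National": 46, "Undecided": 10}

    # Story A: 4 Labour supporters move straight to Undecided.
    story_a = {"Labour": 34 - 4, "National": 46, "Undecided": 10 + 4}

    # Story B: 4 Labour supporters move to National, while 4 (different)
    # National supporters move to Undecided.
    story_b = {"Labour": 34 - 4, "National": 46 + 4 - 4, "Undecided": 10 + 4}

    assert story_a == story_b  # the polls alone cannot tell A from B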

The other limitation on what we can conclude is illustrated by the conclusion of the post

There’s a very clear story in these two correlations: Put simply, as the decided vote goes up so does the reported percentage vote for the Labour Party.

Conversely, as the decided vote goes up, the reported percentage vote for the National party tends to go down.

The closer the election draws the more likely it is that people will make a decision.

But then there’s one more step – getting people to put that decision into action and actually vote.

We simply don’t have data on what happens when the decided vote goes up — it has been going down over this period — so that can’t be the story. Even if we did have data on the decided vote going up, and even if we stipulated that people are more likely to come to a decision near the election, we still wouldn’t have a clear story. If it’s true that people tend to come to a decision near the election, this means the reason for changes in the undecided vote will be different near an election than far from an election. If the reasons for the changes are different, we can’t have much faith that the relationships between the changes will stay the same.

The data provide weak evidence that Labour has lost support to 'Undecided' rather than to National over the past couple of years, which should be encouraging for them. In their current form, though, the data don't really provide any evidence for extrapolation to the election.


[here’s the re-typed count of preferences data, rounded to the nearest integer]

Stat of the Week Competition: June 21 – 27 2014

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday June 27 2014.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of June 21 – 27 2014 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


Stat of the Week Competition Discussion: June 21 – 27 2014

If you’d like to comment on or debate any of this week’s Stat of the Week nominations, please do so below!

June 18, 2014

NRL Predictions for Round 15

Team Ratings for Round 15

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.
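
The details are on the linked page; purely to illustrate the flavour of margin-based rating systems of this kind, here is a minimal sketch with invented constants (it is not the actual model):

    # A generic margin-based rating sketch -- NOT the actual model,
    # whose details are on the linked Department page. All constants
    # here are invented for illustration.
    K = 0.1           # step size for rating updates (invented)
    SHRINKAGE = 0.7   # between-season shrinkage toward the mean (invented)
    HOME_EDGE = 4.5   # home advantage in points (inferred from the tables below)

    def predicted_margin(home: float, away: float) -> float:
        """Expected points difference; positive means a home win."""
        return home - away + HOME_EDGE

    def update(home: float, away: float, actual_margin: float):
        """Nudge both ratings toward what the actual margin implies."""
        error = actual_margin - predicted_margin(home, away)
        return home + K * error, away - K * error

    def start_of_season(rating: float) -> float:
        """Shrink ratings between seasons, as mentioned above."""
        return SHRINKAGE * rating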

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team           Current Rating   Rating at Season Start   Difference
Roosters                 9.74                    12.35        -2.60
Rabbitohs                7.89                     5.82         2.10
Sea Eagles               5.23                     9.10        -3.90
Broncos                  5.07                    -4.69         9.80
Cowboys                  4.44                     6.01        -1.60
Panthers                 2.25                    -2.48         4.70
Warriors                 1.83                    -0.72         2.50
Bulldogs                 1.30                     2.46        -1.20
Storm                    0.27                     7.64        -7.40
Knights                 -2.98                     5.23        -8.20
Eels                    -4.24                   -18.45        14.20
Titans                  -4.66                     1.45        -6.10
Wests Tigers            -4.77                   -11.26         6.50
Dragons                 -6.84                    -7.57         0.70
Raiders                 -7.27                    -8.99         1.70
Sharks                  -9.04                     2.32       -11.40


Performance So Far

So far there have been 104 matches played, 59 of which were correctly predicted, a success rate of 56.7%.

Here are the predictions for last week’s games.

Game   Match                        Date     Score     Prediction   Correct
1      Rabbitohs vs. Wests Tigers   Jun 13   32 – 10        15.90   TRUE
2      Panthers vs. Dragons         Jun 14   18 – 14        15.80   TRUE
3      Roosters vs. Knights         Jun 14   29 – 12        17.30   TRUE
4      Bulldogs vs. Eels            Jun 15   12 – 22        14.30   FALSE
5      Titans vs. Storm             Jun 16   20 – 24         0.50   FALSE


Predictions for Round 15

Here are the predictions for Round 15. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game   Match                   Date     Winner       Prediction
1      Raiders vs. Bulldogs    Jun 20   Bulldogs          -4.10
2      Warriors vs. Broncos    Jun 21   Warriors           1.30
3      Sharks vs. Sea Eagles   Jun 21   Sea Eagles        -9.80
4      Storm vs. Eels          Jun 22   Storm              9.00
5      Titans vs. Dragons      Jun 22   Titans             6.70
6      Knights vs. Cowboys     Jun 23   Cowboys           -2.90
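
For what it's worth, the published margins above are consistent with a simple formula, home rating minus away rating plus a constant home advantage of about 4.5 points; the 4.5 is my inference from the tables, not a stated part of the method:

    # Reconstructing the Round 15 predictions from the ratings table,
    # assuming margin = home rating - away rating + home advantage.
    HOME_EDGE = 4.5  # inferred by matching the published margins

    ratings = {"Raiders": -7.27, "Bulldogs": 1.30, "Warriors": 1.83,
               "Broncos": 5.07, "Sharks": -9.04, "Sea Eagles": 5.23,
               "Storm": 0.27, "Eels": -4.24, "Titans": -4.66,
               "Dragons": -6.84, "Knights": -2.98, "Cowboys": 4.44}

    fixtures = [("Raiders", "Bulldogs"), ("Warriors", "Broncos"),
                ("Sharks", "Sea Eagles"), ("Storm", "Eels"),
                ("Titans", "Dragons"), ("Knights", "Cowboys")]

    for home, away in fixtures:
        margin = ratings[home] - ratings[away] + HOME_EDGE
        print(f"{home} vs. {away}: {margin:+.2f}")
    # Agrees with the published predictions to within rounding.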


Counts and proportions

Phil Price writes (at Andrew Gelman’s blog) on the impact of bike-share programs:

So the number of head injuries declined by 14 percent, and the Washington Post reporter — Lenny Bernstein, for those of you keeping score at home — says they went up 7.8%.  That’s a pretty big mistake! How did it happen?  Well, the number of head injuries went down, but the number of injuries that were not head injuries went down even more, so the proportion of injuries that were head injuries went up.


To be precise, the research paper found 638 hospitalised head injuries in the 24 months before the bike-share program started, and 273 in the 12 months afterwards. In a set of control cities that didn't start a bike-share program there were 712 head injuries in the 24 months before the matching date and 342 in the 12 months afterwards. Halving the 24-month counts to put everything on an annual basis, that is a 14.4% decrease in the cities that added bike-share programs and a 4% decrease in those that didn't.
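
The arithmetic, using the counts above:

    # Head-injury counts: 24 months before, 12 months after.
    bike_before, bike_after = 638, 273  # cities that added bike-share
    ctrl_before, ctrl_after = 712, 342  # control cities

    def annual_change(before_24mo, after_12mo):
        before_annual = before_24mo / 2  # put both periods on a per-year basis
        return (after_12mo - before_annual) / before_annual

    print(f"bike-share cities: {annual_change(bike_before, bike_after):+.1%}")  # -14.4%
    print(f"control cities:    {annual_change(ctrl_before, ctrl_after):+.1%}")  # -3.9%

    # The *count* of head injuries fell; only their share of all injuries
    # rose, which is how "down 14%" got reported as "up 7.8%".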