April 24, 2016

Briefly

  • An example of bad forms design. 73% of members of the “American Independent Party” in California didn’t realise they were members of a party. They won’t be able to vote in the Democratic primary, though unaffiliated voters will be able to. This ‘73%’ is also an example of the denominator mattering: the errors are estimated at 73% of AIP members but only 12% of independents.
  • Herald (Daily Mail) headline: “Meditation can knock 7 years off age of your brain”. Text: “those who meditate may lead healthier lifestyles in general. It is also possible that some inherent difference in brain structure makes some people more likely to take up meditating. Those studied had practised various types of traditional meditation for an average of 20 years.”
  • Amazon has been distinctive for making the same prices available to rich and poor Americans. But the same-day free delivery service is becoming an exception. Bloomberg looks at why (with graphics) (via Harkanwal Singh)
  • Maps of electorate-level odds for the Australian election, with an interesting attempt to solve the problem of a continent made up mostly of empty space
  • A data proofreading app designed for data journalists (via Kristin Henry)
April 20, 2016

Housing affordability graphics

Another nice Herald interactive, this time of housing affordability.

[Map: Herald housing affordability interactive]

Affordability comes in two parts: down payment and monthly mortgage costs. The affordability index from Massey University looks at monthly payments; this one looks at the 20% down payment.
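To make the two parts concrete, here is a minimal sketch that computes both for a single hypothetical house. The $800,000 price, 5% interest rate and 30-year term are assumptions for illustration only. The monthly figure uses the standard fixed-rate amortisation formula, not necessarily whatever the Massey index or the Herald interactive actually use.

```python
def deposit(price, fraction=0.20):
    """Cash needed up front for a given deposit fraction (20% by default)."""
    return price * fraction


def monthly_payment(principal, annual_rate, years):
    """Fixed monthly repayment on a standard amortising (table) mortgage."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


price = 800_000                   # hypothetical house price
down = deposit(price)
payment = monthly_payment(price - down, annual_rate=0.05, years=30)  # assumed rate and term
print(f"deposit: ${down:,.0f}, monthly payment: ${payment:,.0f}")
```

The two functions respond differently to the same change: a higher price raises both, but a higher interest rate only affects the monthly payment, which is one reason an index built on one part can tell a different story from an index built on the other.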

The difference between Auckland and the rest of the country is pretty dramatic, but there are other things to see. Above, the centre of Auckland is much less expensive than the rest of the city: 75% of properties are valued at under $500,000 by CoreLogic. That’s the effect of the apartments, and most of them aren’t the sort of apartments people plan to stay in long-term.

Another interesting feature for Auckland is that the neighbourhoods really are ordered in price — you don’t see the spatial trends changing as you move the slider, so there aren’t areas where the low-end houses are especially cheap and the high-end houses especially expensive.

You can also see the difficulty of relating valuations to prices. In Point Chev, the valuations say 70% of homes are valued at over $1 million. On the other hand, the median sale price is $990,000, so less than half the homes that changed hands went for over a million.


Both those numbers are correct. Well, OK, I assume they are both correct; they are both what they are supposed to be. It’s just that home sales aren’t a random sample of all homes. But if the median sale price is $990k and the median valuation for all homes is $1.2m, you can see that interpreting these numbers is harder than it looks.
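A small simulation shows how both kinds of number can be right at once. Everything in it is hypothetical: valuations are drawn from a log-normal distribution with a median of $1.2m, and the chance that a home sells is assumed to fall with its value (for instance, if cheaper homes turn over more often). That selling mechanism is an illustrative assumption, not something the post or CoreLogic states.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical valuations: log-normal with a median of about $1.2m (assumed).
valuations = [1.2e6 * math.exp(random.gauss(0, 0.35)) for _ in range(20000)]


def sells_this_year(value):
    # Assumed selling probability that falls with value; purely illustrative.
    return random.random() < 0.10 * (1.2e6 / value) ** 2


sold = [v for v in valuations if sells_this_year(v)]

median_all = statistics.median(valuations)
median_sold = statistics.median(sold)
share_over_1m = sum(v > 1e6 for v in valuations) / len(valuations)

print(f"median valuation, all homes: ${median_all:,.0f}")
print(f"median of homes that sold:   ${median_sold:,.0f}")
print(f"share of all homes over $1m: {share_over_1m:.0%}")
```

Under these assumptions about 70% of all homes are valued over $1m while the median of the homes that sold is under $1m, with nothing wrong in either calculation.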

Super 18 Predictions for Round 9

Team Ratings for Round 9

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

| Team | Current Rating | Rating at Season Start | Difference |
|---|---|---|---|
| Crusaders | 7.89 | 9.84 | -1.90 |
| Hurricanes | 6.70 | 7.26 | -0.60 |
| Chiefs | 6.51 | 2.68 | 3.80 |
| Highlanders | 6.25 | 6.80 | -0.50 |
| Brumbies | 3.97 | 3.15 | 0.80 |
| Waratahs | 1.80 | 4.88 | -3.10 |
| Stormers | 1.72 | -0.62 | 2.30 |
| Lions | 0.73 | -1.80 | 2.50 |
| Bulls | -0.32 | -0.74 | 0.40 |
| Sharks | -1.31 | -1.64 | 0.30 |
| Blues | -3.92 | -5.51 | 1.60 |
| Cheetahs | -5.17 | -9.27 | 4.10 |
| Rebels | -6.40 | -6.33 | -0.10 |
| Jaguares | -8.44 | -10.00 | 1.60 |
| Force | -9.31 | -8.43 | -0.90 |
| Reds | -9.56 | -9.81 | 0.30 |
| Sunwolves | -17.01 | -10.00 | -7.00 |
| Kings | -17.37 | -13.66 | -3.70 |

Performance So Far

So far there have been 62 matches played, 42 of which were correctly predicted, a success rate of 67.7%.
Here are the predictions for last week’s games.

| Game | Match | Date | Score | Prediction | Correct |
|---|---|---|---|---|---|
| 1 | Crusaders vs. Jaguares | Apr 15 | 32 – 15 | 20.80 | TRUE |
| 2 | Rebels vs. Hurricanes | Apr 15 | 13 – 38 | -6.90 | TRUE |
| 3 | Cheetahs vs. Sunwolves | Apr 15 | 92 – 17 | 7.80 | TRUE |
| 4 | Blues vs. Sharks | Apr 16 | 23 – 18 | 0.90 | TRUE |
| 5 | Waratahs vs. Brumbies | Apr 16 | 20 – 26 | 2.30 | FALSE |
| 6 | Bulls vs. Reds | Apr 16 | 41 – 22 | 12.40 | TRUE |
| 7 | Lions vs. Stormers | Apr 16 | 29 – 22 | 1.90 | TRUE |
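The Correct column and the running success rate are easy to recompute. This sketch assumes a prediction counts as correct when the predicted and actual home margins have the same sign. The post doesn't say how a draw would be scored, so a draw is treated as incorrect here. The earlier totals (55 matches, 36 correct) come from the Round 8 post.

```python
# Last week's Super Rugby games, copied from the table above:
# (home, away, home score, away score, predicted home margin)
LAST_WEEK = [
    ("Crusaders", "Jaguares", 32, 15, 20.8),
    ("Rebels", "Hurricanes", 13, 38, -6.9),
    ("Cheetahs", "Sunwolves", 92, 17, 7.8),
    ("Blues", "Sharks", 23, 18, 0.9),
    ("Waratahs", "Brumbies", 20, 26, 2.3),
    ("Bulls", "Reds", 41, 22, 12.4),
    ("Lions", "Stormers", 29, 22, 1.9),
]


def is_correct(home_score, away_score, predicted_margin):
    """True when the predicted and actual home margins have the same sign.

    Assumption: a draw (actual margin 0) counts as an incorrect prediction.
    """
    return (home_score - away_score) * predicted_margin > 0


def update_record(played, correct, games):
    """Add a week's games to the running totals of matches and correct picks."""
    hits = sum(is_correct(home_pts, away_pts, pred)
               for _, _, home_pts, away_pts, pred in games)
    return played + len(games), correct + hits


# Totals before this week, from the Round 8 post.
played, correct = update_record(55, 36, LAST_WEEK)
print(played, correct, f"{correct / played:.1%}")
```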

 

Predictions for Round 9

Here are the predictions for Round 9. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

| Game | Match | Date | Winner | Prediction |
|---|---|---|---|---|
| 1 | Highlanders vs. Sharks | Apr 22 | Highlanders | 11.60 |
| 2 | Rebels vs. Cheetahs | Apr 22 | Rebels | 2.80 |
| 3 | Sunwolves vs. Jaguares | Apr 23 | Jaguares | -4.60 |
| 4 | Hurricanes vs. Chiefs | Apr 23 | Hurricanes | 3.70 |
| 5 | Force vs. Waratahs | Apr 23 | Waratahs | -7.60 |
| 6 | Stormers vs. Reds | Apr 23 | Stormers | 15.30 |
| 7 | Kings vs. Lions | Apr 23 | Lions | -14.60 |
| 8 | Brumbies vs. Crusaders | Apr 24 | Brumbies | 0.10 |
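The full method is on the author's Department home page, so this is only a reverse-engineered approximation. It assumes the predicted margin is the home team's rating minus the away team's rating plus a home-ground advantage. The two advantage values are inferred by backing them out of this week's ratings and predictions, not taken from any description of the method: 4.0 points when the visitors come from another country, 3.5 when both teams are from the same country. With those values it reproduces all eight predictions in the table above to one decimal place, but the real model may well differ.

```python
# Ratings before Round 9, copied from the ratings table above.
RATINGS = {
    "Crusaders": 7.89, "Hurricanes": 6.70, "Chiefs": 6.51, "Highlanders": 6.25,
    "Brumbies": 3.97, "Waratahs": 1.80, "Stormers": 1.72, "Lions": 0.73,
    "Bulls": -0.32, "Sharks": -1.31, "Blues": -3.92, "Cheetahs": -5.17,
    "Rebels": -6.40, "Jaguares": -8.44, "Force": -9.31, "Reds": -9.56,
    "Sunwolves": -17.01, "Kings": -17.37,
}

COUNTRY = {
    "Crusaders": "NZ", "Hurricanes": "NZ", "Chiefs": "NZ", "Highlanders": "NZ", "Blues": "NZ",
    "Brumbies": "AUS", "Waratahs": "AUS", "Rebels": "AUS", "Force": "AUS", "Reds": "AUS",
    "Stormers": "RSA", "Lions": "RSA", "Bulls": "RSA", "Sharks": "RSA",
    "Cheetahs": "RSA", "Kings": "RSA",
    "Jaguares": "ARG", "Sunwolves": "JPN",
}

# Assumed home-ground advantages, inferred from the published predictions.
HOME_ADVANTAGE_OVERSEAS_VISITOR = 4.0
HOME_ADVANTAGE_SAME_COUNTRY = 3.5


def predicted_margin(home, away):
    """Expected home margin: rating difference plus an assumed home advantage."""
    same_country = COUNTRY[home] == COUNTRY[away]
    advantage = HOME_ADVANTAGE_SAME_COUNTRY if same_country else HOME_ADVANTAGE_OVERSEAS_VISITOR
    return RATINGS[home] - RATINGS[away] + advantage


def predicted_winner(home, away):
    """A positive margin is a win to the home team, a negative one to the away team."""
    return home if predicted_margin(home, away) > 0 else away


for home, away in [("Highlanders", "Sharks"), ("Hurricanes", "Chiefs"), ("Kings", "Lions")]:
    print(f"{home} vs. {away}: {predicted_winner(home, away)} {predicted_margin(home, away):.1f}")
```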

 

NRL Predictions for Round 8

Team Ratings for Round 8

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

| Team | Current Rating | Rating at Season Start | Difference |
|---|---|---|---|
| Cowboys | 12.83 | 10.29 | 2.50 |
| Broncos | 11.48 | 9.81 | 1.70 |
| Roosters | 2.88 | 11.20 | -8.30 |
| Sharks | 2.84 | -1.06 | 3.90 |
| Storm | 1.83 | 4.41 | -2.60 |
| Bulldogs | 1.77 | 1.50 | 0.30 |
| Eels | 1.15 | -4.62 | 5.80 |
| Rabbitohs | 0.21 | -1.20 | 1.40 |
| Panthers | -0.44 | -3.06 | 2.60 |
| Sea Eagles | -0.96 | 0.36 | -1.30 |
| Dragons | -3.03 | -0.10 | -2.90 |
| Raiders | -3.37 | -0.55 | -2.80 |
| Warriors | -4.58 | -7.47 | 2.90 |
| Wests Tigers | -4.99 | -4.06 | -0.90 |
| Titans | -5.41 | -8.39 | 3.00 |
| Knights | -10.53 | -5.41 | -5.10 |

Performance So Far

So far there have been 56 matches played, 28 of which were correctly predicted, a success rate of 50%.
Here are the predictions for last week’s games.

| Game | Match | Date | Score | Prediction | Correct |
|---|---|---|---|---|---|
| 1 | Sea Eagles vs. Eels | Apr 14 | 10 – 22 | 3.00 | FALSE |
| 2 | Cowboys vs. Rabbitohs | Apr 15 | 44 – 18 | 13.90 | TRUE |
| 3 | Titans vs. Dragons | Apr 16 | 14 – 19 | 1.60 | FALSE |
| 4 | Bulldogs vs. Warriors | Apr 16 | 20 – 24 | 12.70 | FALSE |
| 5 | Broncos vs. Knights | Apr 16 | 53 – 0 | 20.70 | TRUE |
| 6 | Raiders vs. Sharks | Apr 17 | 16 – 40 | 0.10 | FALSE |
| 7 | Wests Tigers vs. Storm | Apr 17 | 18 – 19 | -4.40 | TRUE |
| 8 | Roosters vs. Panthers | Apr 18 | 16 – 20 | 8.00 | FALSE |

Predictions for Round 8

Here are the predictions for Round 8. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

| Game | Match | Date | Winner | Prediction |
|---|---|---|---|---|
| 1 | Broncos vs. Rabbitohs | Apr 22 | Broncos | 14.30 |
| 2 | Bulldogs vs. Titans | Apr 23 | Bulldogs | 10.20 |
| 3 | Raiders vs. Wests Tigers | Apr 23 | Raiders | 4.60 |
| 4 | Cowboys vs. Eels | Apr 23 | Cowboys | 14.70 |
| 5 | Sharks vs. Panthers | Apr 24 | Sharks | 6.30 |
| 6 | Knights vs. Sea Eagles | Apr 25 | Sea Eagles | -6.60 |
| 7 | Dragons vs. Roosters | Apr 25 | Roosters | -5.90 |
| 8 | Storm vs. Warriors | Apr 25 | Storm | 10.40 |

April 18, 2016

Being precise


There are stories in the Herald about home buyers being forced out of Auckland by house prices, and about the proportion of homes in other regions being sold to Aucklanders.  As we all know, Auckland house prices are a serious problem and might be hard to fix even if there weren’t motivations for so many people to oppose any solution.  I still think it’s useful to be cautious about the relevance of the numbers.

We don’t learn from the story how CoreLogic works out which home buyers in other regions are JAFAs; we should, but we don’t. My understanding is that they match names in the LINZ title registry. That means the 19.5% of Tauranga buyers identified as Aucklanders last quarter is made up of three groups:

  1. Auckland home owners moving to Tauranga
  2. Auckland home owners buying investment property in Tauranga
  3. Homeowners in Tauranga who have the same name as a homeowner in Auckland.
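A minimal sketch of exact name linkage shows why the three groups can't be told apart. It assumes, as the post suggests, that the matching uses owner names alone. The names and the notes about each buyer are hypothetical, and the linkage never sees the notes.

```python
def normalise(name):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


# Hypothetical Auckland owners and Tauranga buyers, for illustration only.
auckland_owners = {normalise(n) for n in ["Aroha Ngata", "John Smith", "Wei Zhang"]}

tauranga_buyers = {
    # buyer name: what a follow-up survey might reveal (invisible to the linkage)
    "Aroha Ngata": "moved from Auckland (group 1)",
    "Wei Zhang": "Auckland owner buying a rental (group 2)",
    "John  Smith": "Tauranga local who shares a name (group 3)",
    "Priya Patel": "moved from Hamilton (not matched)",
}

flagged = [b for b in tauranga_buyers if normalise(b) in auckland_owners]
share = len(flagged) / len(tauranga_buyers)
print(flagged, f"{share:.0%}")
```

Three of the four buyers get counted as Aucklanders, but only one of them is the kind of buyer the affordability story is about.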

Only the first group is really relevant to the affordability story.  In fact, it’s worse than that. Some of the first group will be moving to Tauranga just because it’s a nice place to live (or so I’m told).  Conversely, as the story says, a lot of the people who are relevant to the affordability problem won’t be included precisely because they couldn’t afford a home in Auckland.

For data from recent years the problem could have been reduced a lot by some calibration to ground truth: contact people living at a random sample of the properties and find out if they had moved from Auckland and why.  You might even be able to find out from renters if their landlord was from Auckland, though that would be less reliable if a property management company had been involved.  You could do the same thing with a sample of homes owned by people without Auckland-sounding names to get information in the other direction.  With calibration, the complete name-linkage data could be very powerful, but on its own it will be pretty approximate.
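Here is a sketch of the calibration idea under simple assumptions. The 19.5% linkage figure comes from the story and is treated as exact, since it is based on the complete data. The two validation surveys (one of name-matched properties, one of unmatched properties) and their counts are entirely hypothetical. The estimate combines what fraction of matched buyers really moved from Auckland with what fraction of unmatched buyers did.

```python
import math

# Share of Tauranga purchases linked to an Auckland owner by name (from the story).
p_matched = 0.195

# Hypothetical validation surveys: (number who had moved from Auckland, number surveyed).
matched_sample = (120, 200)     # random sample of name-matched properties
unmatched_sample = (4, 200)     # random sample of properties with no name match


def calibrated_share(p_matched, matched_sample, unmatched_sample):
    """Estimate the share of purchases by people who moved from Auckland.

    share = P(matched) * P(moved | matched) + P(unmatched) * P(moved | unmatched).
    Also returns a rough standard error that treats p_matched as known exactly
    and the two survey samples as independent binomials.
    """
    moved_m, n_m = matched_sample
    moved_u, n_u = unmatched_sample
    ppv = moved_m / n_m            # matched buyers who really moved from Auckland
    miss = moved_u / n_u           # movers that the name linkage failed to catch
    estimate = p_matched * ppv + (1 - p_matched) * miss
    variance = (p_matched ** 2 * ppv * (1 - ppv) / n_m
                + (1 - p_matched) ** 2 * miss * (1 - miss) / n_u)
    return estimate, math.sqrt(variance)


est, se = calibrated_share(p_matched, matched_sample, unmatched_sample)
print(f"calibrated share: {est:.1%} (roughly ±{1.96 * se:.1%})")
```

Asking the surveyed movers why they moved would add one more factor of the same kind, separating the affordability movers from the people who just like Tauranga.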

 

Stat of the Week Competition: April 16 – 22 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday April 22 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of April 16 – 22 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.


April 17, 2016

Briefly

  • “Statistical bullshit: how politicians poisoned statistics”, by Tim Harford in the Financial Times. An important piece to read, but it’s also worth bearing in mind Daniel Davies’s response on Twitter: “it wasn’t politicians that gave us the replicability crisis…”
  • Graphics: every attempted scoring shot Kobe Bryant made during his career. Some features (like the three-point line) are obvious; more would probably be clear to basketball fans. It still misses out a bit on the ‘compared to what?’ scale: how did Kobe’s shots compare to other players’, for example?
  • The dark side of comments. The Guardian analysed the 70 million comments on their website: results not surprising, but depressing.

Although the majority of our regular opinion writers are white men, we found that those who experienced the highest levels of abuse and dismissive trolling were not. The 10 regular writers who got the most abuse were eight women (four white and four non-white) and two black men. Two of the women and one of the men were gay. And of the eight women in the “top 10”, one was Muslim and one Jewish.

And the 10 regular writers who got the least abuse? All men.

  • At Slate: maps of cholera: “at CDC headquarters today, five-and-a-half years into the epidemic, they are proudly displaying two historic maps that have everything to do with each other, but they are not telling you why” 
  • From 538: when people in the US file their taxes.

Evil within?

The headline: “Sex and violence ‘normal’ for boys who kill women in video games: study”. That’s a pretty strong statement, and the quote marks around ‘normal’ imply we’re going to find out who made it. We don’t.

The (much-weaker) take-home message:

The researchers’ conclusion: Sexist games may shrink boys’ empathy for female victims.

The detail:

The researchers then showed each student a photo of a bruised girl who, they said, had been beaten by a boy. They asked: On a scale of one to seven, how much sympathy do you have for her?

The male students who had just played Grand Theft Auto – and also related to the protagonist – felt least bad for her, with an empathy mean score of 3. Those who had played the other games, however, exhibited more compassion. And female students who played the same rounds of Grand Theft Auto had a mean empathy score of 5.3.

The important part is between the dashes: male students who related more to the protagonist in Grand Theft Auto had less empathy for a female victim.  There’s no evidence given that this was a result of playing Grand Theft Auto, since the researchers (obviously) didn’t ask about how people who didn’t play that game related to its protagonist.

What I wanted to know was how the empathy scores compared by which game the students played, separately by gender. The research paper didn’t report the analysis I wanted, but thanks to the wonders of Open Science, their data are available.

If you just compare which game the students were assigned to (and their gender), here are the means; the intervals are set up so there’s a statistically significant difference between two groups when their intervals don’t overlap.

[Figure: mean empathy score by assigned game and gender, with comparison intervals]

The differences between games are too small to pick out reliably at this sample size, and in any case are less than half a point on the scale. While the ‘violent/sexist’ games might reduce empathy, there’s just as much evidence (i.e., not very much) that the ‘violent’ ones increase it.
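The post doesn't say exactly how its intervals were constructed. One standard way to get the "no overlap means a significant difference" property assumes the groups have roughly equal standard errors and uses intervals of about ±1.39 standard errors instead of the usual ±1.96, as in this sketch. The group means and standard error below are hypothetical, not the study's data.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)   # about 1.96 for a two-sided 5% test


def comparison_interval(mean, se):
    """Interval whose non-overlap matches a 5% test for two groups with equal SEs.

    With a common standard error se, the difference is significant when
    |m1 - m2| > Z * sqrt(2) * se, and intervals of mean +/- h fail to overlap
    when |m1 - m2| > 2 * h, so h = Z * se / sqrt(2), about 1.39 se.
    """
    half_width = Z * se / math.sqrt(2)
    return mean - half_width, mean + half_width


def overlap(a, b):
    """True if the two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def significant(m1, se1, m2, se2):
    """Two-sided 5% z-test for a difference between independent means."""
    return abs(m1 - m2) > Z * math.hypot(se1, se2)


# Hypothetical group means and a common standard error, for illustration only.
se = 0.3
for m1, m2 in [(3.0, 3.9), (3.0, 3.7)]:
    a, b = comparison_interval(m1, se), comparison_interval(m2, se)
    print(m1, m2, "overlap:", overlap(a, b), "significant:", significant(m1, se, m2, se))
```

When the standard errors differ a lot between groups, the equivalence is only approximate, which is one reason the complete data are worth looking at too.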

Here’s the complete data, because means can be misleading:

[Figure: individual empathy scores by game and gender]

The data are consistent with a small overall impact of the game, or no real impact. They’re consistent with a moderately large impact on a subset of susceptible men, but equally consistent with some men just being horrible people.

If this is an issue you’ve considered in the past, this study shouldn’t be enough to alter your views much, and if it isn’t an issue you’ve considered in the past, it wouldn’t be the place to start.

Overcounting causes

There’s a long story in the Sunday Star-Times about a 2007 report on cannabis from the National Drug Intelligence Bureau (NDIB):

“Perhaps surprisingly,” Maxwell wrote, “cannabis related hospital admissions between 2001 and 2005 exceeded admissions for opiates, amphetamines and cocaine combined”, with about 2000 people a year ending up in hospital because of the drug.

The problem was with hospital diagnostic codes. Discharge summaries include both the primary cause of admission and a lot of other things to be noted. That’s a good thing — you want to know what all was wrong with a patient both for future clinical care and for research and quality control.  For example, if someone is in hospital for bleeding, you want to know they were on warfarin (which is why the bleeding happened), and perhaps why they were on warfarin. It’s not even always the case that the primary cause is the primary cause — if someone has Parkinson’s Disease and is admitted with pneumonia as a complication, which one should be listed? This is a difficult and complex field, and is even slightly less boring than it sounds.

As a result, if you just count up all the discharge summaries where ‘cannabis dependence’ was somewhere on the laundry list of codes, you’re going to get a lot of people who smoke pot but are in hospital for some completely different reason.  And since there’s a lot of cannabis consumption out there, you will get a lot of these false positives.
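A toy example shows the size of the effect. The discharge summaries below are hypothetical and use plain-language labels, not real diagnostic codes. Counting an admission whenever cannabis appears anywhere on the list gives three times the count based on the primary cause.

```python
# Hypothetical discharge summaries: primary diagnosis plus other recorded conditions.
ADMISSIONS = [
    {"primary": "pneumonia", "other": ["cannabis dependence", "asthma"]},
    {"primary": "cannabis psychosis", "other": ["cannabis dependence"]},
    {"primary": "fractured wrist", "other": ["cannabis dependence", "alcohol intoxication"]},
    {"primary": "opioid overdose", "other": []},
    {"primary": "bleeding", "other": ["warfarin use"]},
]

CANNABIS_CODES = {"cannabis dependence", "cannabis psychosis"}


def mentions(admission, codes):
    """True if any of the given conditions appears anywhere on the summary."""
    return admission["primary"] in codes or any(c in codes for c in admission["other"])


any_mention = sum(mentions(a, CANNABIS_CODES) for a in ADMISSIONS)
primary_only = sum(a["primary"] in CANNABIS_CODES for a in ADMISSIONS)

# Treating every recorded condition as 'a cause' also overcounts in total.
total_codes = sum(1 + len(a["other"]) for a in ADMISSIONS)

print(f"cannabis anywhere on the list: {any_mention}")
print(f"cannabis as the primary cause: {primary_only}")
print(f"{total_codes} conditions attributed across {len(ADMISSIONS)} admissions")
```

The last line is the same arithmetic that goes wrong when every factor present at a road crash is tallied as its cause: the per-factor counts add up to far more than the number of events.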

There are some other things to note about this report, though. The National Drug Foundation says (on Twitter) that they made the same point when it first came out. They also claim that the Ministry of Health argued against its being published.

Perhaps, now that the multiple-counting problem has been publicised in the context of hospital admissions, the same mistake will be made less often for road crashes, where factors ranging from foreign drivers to speed to alcohol to drugs are repeatedly counted as ‘the’ cause of any crash where they are present.

April 13, 2016

Super 18 Predictions for Round 8

Team Ratings for Round 8

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

| Team | Current Rating | Rating at Season Start | Difference |
|---|---|---|---|
| Crusaders | 8.12 | 9.84 | -1.70 |
| Chiefs | 6.51 | 2.68 | 3.80 |
| Highlanders | 6.25 | 6.80 | -0.50 |
| Hurricanes | 5.62 | 7.26 | -1.60 |
| Brumbies | 3.47 | 3.15 | 0.30 |
| Waratahs | 2.30 | 4.88 | -2.60 |
| Stormers | 2.02 | -0.62 | 2.60 |
| Lions | 0.42 | -1.80 | 2.20 |
| Bulls | -0.72 | -0.74 | 0.00 |
| Sharks | -1.06 | -1.64 | 0.60 |
| Blues | -4.16 | -5.51 | 1.30 |
| Rebels | -5.31 | -6.33 | 1.00 |
| Jaguares | -8.67 | -10.00 | 1.30 |
| Reds | -9.17 | -9.81 | 0.60 |
| Cheetahs | -9.21 | -9.27 | 0.10 |
| Force | -9.31 | -8.43 | -0.90 |
| Sunwolves | -12.98 | -10.00 | -3.00 |
| Kings | -17.37 | -13.66 | -3.70 |

Performance So Far

So far there have been 55 matches played, 36 of which were correctly predicted, a success rate of 65.5%.
Here are the predictions for last week’s games.

| Game | Match | Date | Score | Prediction | Correct |
|---|---|---|---|---|---|
| 1 | Chiefs vs. Blues | Apr 08 | 29 – 23 | 15.30 | TRUE |
| 2 | Force vs. Crusaders | Apr 08 | 19 – 20 | -15.10 | TRUE |
| 3 | Stormers vs. Sunwolves | Apr 08 | 46 – 19 | 17.90 | TRUE |
| 4 | Hurricanes vs. Jaguares | Apr 09 | 40 – 22 | 18.30 | TRUE |
| 5 | Reds vs. Highlanders | Apr 09 | 28 – 27 | -13.10 | FALSE |
| 6 | Sharks vs. Lions | Apr 09 | 9 – 24 | 4.30 | FALSE |
| 7 | Kings vs. Bulls | Apr 09 | 6 – 38 | -10.60 | TRUE |

Predictions for Round 8

Here are the predictions for Round 8. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

| Game | Match | Date | Winner | Prediction |
|---|---|---|---|---|
| 1 | Crusaders vs. Jaguares | Apr 15 | Crusaders | 20.80 |
| 2 | Rebels vs. Hurricanes | Apr 15 | Hurricanes | -6.90 |
| 3 | Cheetahs vs. Sunwolves | Apr 15 | Cheetahs | 7.80 |
| 4 | Blues vs. Sharks | Apr 16 | Blues | 0.90 |
| 5 | Waratahs vs. Brumbies | Apr 16 | Waratahs | 2.30 |
| 6 | Bulls vs. Reds | Apr 16 | Bulls | 12.40 |
| 7 | Lions vs. Stormers | Apr 16 | Lions | 1.90 |