Posts filed under General (653)

March 31, 2015

Beautiful and trustworthy

The Herald has pictures of the most beautiful faces in the world

and NPR reports on a computer algorithm that can tell if you sound trustworthy or calming or engaging.

The Herald story at least admits these faces are only world-famous in New Zealand (or, rather, the UK):

“It’s important to note that these are the idealised faces according to those living in the UK, so a study in Asia or Africa for example would no doubt have different results.”

The NPR story instead doubles down by saying:

“But algorithms have stamina, and they do not factor in things like age, race, gender or sexual orientation.”

There’s a sense in which this is true, but it’s not a very useful sense. If we can guess age, race, or gender from the sound of someone’s voice, and these perceptions affect whether we think the voice is engaging, calming or trustworthy, our prejudices will show up in the training data and any competent black-box algorithm will learn them.
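To make that argument concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: the hidden `group` attribute, the `pitch` feature that happens to correlate with it, and the size of the raters' bias are all assumptions, not anything from the NPR story. The model only ever sees `pitch` and the human ratings, never `group`.

```python
import random
from statistics import mean

def simulate(n=2000, seed=0):
    """Synthetic raters: 'group' is hidden from the model, 'pitch' is a proxy for it."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.choice([0, 1])
        # Voice feature whose average differs between the two groups.
        pitch = rng.gauss(1.0 if group else -1.0, 1.0)
        # Human "trustworthiness" rating: pure noise, minus a prejudice against group 1.
        rating = rng.gauss(0.0, 1.0) - 1.0 * group
        rows.append((group, pitch, rating))
    return rows

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

rows = simulate()
# The "algorithm" is trained on pitch and rating only; group is never an input.
slope, intercept = fit_line([p for _, p, _ in rows], [r for _, _, r in rows])

# Average predicted rating within each hidden group.
predicted = {
    g: mean(intercept + slope * p for gg, p, _ in rows if gg == g)
    for g in (0, 1)
}
gap = predicted[0] - predicted[1]
print(f"slope on pitch: {slope:.2f}")
print(f"predicted rating gap between groups: {gap:.2f}")
```

The fitted model never "factors in" group membership, but because the ratings carry the raters' prejudice and pitch carries information about group, its predictions still differ systematically between the two groups.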

 

March 30, 2015

Briefly

  • Two data-related notes about the Northland by-election: the polls were amazingly accurate given how hard by-elections are to predict, and the Electoral Commission did a wonderful job in getting the vote counted and reported fast.
  • The Medical Council of New Zealand has released a Discussion Paper on the value of performance and outcome data.

March 26, 2015

Understanding Ebola

From the BBC, Hans Rosling on the Ebola epidemic

[Image: rosling-ebola]

(That’s a diagram of the data collection system behind him)

(via Harkanwal Singh)

March 25, 2015

Gimme that old time nutrition

Q: Did you see that eating a bowl of quinoa every day helps you live longer?

A: No.

Q: There’s a story on Stuff (well, from the West Island branches). Is it true?

A: Hard to say.

Q: Well, does the research claim it’s true?

A: Hard to say.

Q: Why? Didn’t they link?

A: No, they linked, and the paper is even open-access. It just doesn’t say anything about the effects of quinoa.

Q: But the story said “A new study by Harvard Public School of Health has found that eating a daily bowl of the protein-packed, gluten-free grain significantly reduces the risk of premature death from cancer, heart disease, respiratory disease and diabetes.”

A: Sadly, yes.

Q: This is your correlation and causation thing again, isn’t it?

A: No, the paper just doesn’t mention quinoa. It talks about grains and cereals.

Q: Ok. So they just didn’t break out the data for quinoa separately. It’s still a grain and a cereal, isn’t it?

A: Yes, as long as you aren’t even more pedantic than me. But it’s not just data analysis. They didn’t even ask their study participants about eating quinoa.

Q: So? Some of the grain they ate must have been quinoa, and there’s no reason to expect it’s different from other grains, is there? Won’t it all get averaged in somehow?

A: I suppose so. But there can’t have been that much of it getting “averaged in”.

Q: Why not? You old folks may not have caught on, but quinoa’s getting popular now.

A: The study was in people over 50. That’s older than both of us. Even assuming we weren’t the same person.

Q: Even so. Things are changing. People have more adventurous diets. It’s not the twentieth century any more.

A: It is in the study.

Q: Huh?

A: The dietary data were collected in 1995 and 1997, from people with average age 61 years.

Q: Oh.

NRL Predictions for Round 4

Team Ratings for Round 4

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Rabbitohs     13.78           13.06                   0.70
Roosters      10.81           9.09                    1.70
Panthers      5.37            3.69                    1.70
Cowboys       5.19            9.52                    -4.30
Storm         4.43            4.36                    0.10
Broncos       3.83            4.03                    -0.20
Warriors      2.94            3.07                    -0.10
Bulldogs      1.56            0.21                    1.40
Knights       0.77            -0.28                   1.00
Sea Eagles    0.01            2.68                    -2.70
Dragons       -3.71           -1.74                   -2.00
Eels          -5.62           -7.19                   1.60
Raiders       -7.45           -7.09                   -0.40
Wests Tigers  -9.74           -13.13                  3.40
Titans        -10.02          -8.20                   -1.80
Sharks        -10.80          -10.76                  -0.00

 

Performance So Far

So far there have been 24 matches played, 16 of which were correctly predicted, a success rate of 66.7%.

Here are the predictions for last week’s games.

Game  Teams                       Date    Score     Prediction  Correct
1     Broncos vs. Cowboys         Mar 20  44 – 22   -1.60       FALSE
2     Sea Eagles vs. Bulldogs     Mar 20  12 – 16   2.40        FALSE
3     Raiders vs. Dragons         Mar 21  20 – 22   -0.50       TRUE
4     Storm vs. Sharks            Mar 21  36 – 18   18.30       TRUE
5     Warriors vs. Eels           Mar 21  29 – 16   12.50       TRUE
6     Rabbitohs vs. Wests Tigers  Mar 22  20 – 6    28.60       TRUE
7     Titans vs. Knights          Mar 22  18 – 20   -8.80       TRUE
8     Roosters vs. Panthers       Mar 23  20 – 12   8.50        TRUE
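
The Correct column above appears to mark a prediction as right when the predicted margin and the actual margin (home score minus away score) have the same sign. A minimal Python sketch of that bookkeeping follows, using the scores and predictions from the table. The sign rule and the helper name `is_correct` are assumptions made for illustration; the post does not spell the rule out.

```python
# (home score, away score, predicted margin) for last week's eight NRL games,
# copied from the table above.
games = [
    (44, 22, -1.60),   # Broncos vs. Cowboys
    (12, 16, 2.40),    # Sea Eagles vs. Bulldogs
    (20, 22, -0.50),   # Raiders vs. Dragons
    (36, 18, 18.30),   # Storm vs. Sharks
    (29, 16, 12.50),   # Warriors vs. Eels
    (20, 6, 28.60),    # Rabbitohs vs. Wests Tigers
    (18, 20, -8.80),   # Titans vs. Knights
    (20, 12, 8.50),    # Roosters vs. Panthers
]

def is_correct(home, away, predicted_margin):
    """Assumed rule: correct when predicted and actual margins share a sign."""
    actual_margin = home - away
    return actual_margin * predicted_margin > 0

round_correct = sum(is_correct(*g) for g in games)

# Running totals before this round, as reported in the Round 3 post:
# 10 correct out of 16 matches.
total_correct = 10 + round_correct
total_played = 16 + len(games)
success_rate = 100 * total_correct / total_played

print(f"{round_correct} of {len(games)} correct this round")
print(f"{total_correct} of {total_played} overall ({success_rate:.1f}%)")
```

Under that rule the round gives 6 correct out of 8, which together with the earlier 10 of 16 reproduces the reported 16 of 24 (66.7%).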

 

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Teams                       Date    Winner        Prediction
1     Eels vs. Rabbitohs          Mar 27  Rabbitohs     -16.40
2     Wests Tigers vs. Bulldogs   Mar 27  Bulldogs      -8.30
3     Dragons vs. Sea Eagles      Mar 28  Sea Eagles    -0.70
4     Knights vs. Panthers        Mar 28  Panthers      -1.60
5     Sharks vs. Titans           Mar 28  Sharks        2.20
6     Roosters vs. Raiders        Mar 29  Roosters      21.30
7     Warriors vs. Broncos        Mar 29  Warriors      3.10
8     Cowboys vs. Storm           Mar 30  Cowboys       3.80
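
The rating method itself is described elsewhere, but the published numbers can be checked against each other. The Python sketch below assumes (this is a guess from the two tables, not the author's stated method) that each prediction is roughly the home team's current rating minus the away team's rating, plus some home-ground allowance, and it prints the allowance implied by each Round 4 game.

```python
# Current ratings before Round 4, copied from the ratings table.
ratings = {
    "Rabbitohs": 13.78, "Roosters": 10.81, "Panthers": 5.37, "Cowboys": 5.19,
    "Storm": 4.43, "Broncos": 3.83, "Warriors": 2.94, "Bulldogs": 1.56,
    "Knights": 0.77, "Sea Eagles": 0.01, "Dragons": -3.71, "Eels": -5.62,
    "Raiders": -7.45, "Wests Tigers": -9.74, "Titans": -10.02, "Sharks": -10.80,
}

# (home, away, published predicted margin) for Round 4.
fixtures = [
    ("Eels", "Rabbitohs", -16.40),
    ("Wests Tigers", "Bulldogs", -8.30),
    ("Dragons", "Sea Eagles", -0.70),
    ("Knights", "Panthers", -1.60),
    ("Sharks", "Titans", 2.20),
    ("Roosters", "Raiders", 21.30),
    ("Warriors", "Broncos", 3.10),
    ("Cowboys", "Storm", 3.80),
]

# Implied allowance = published prediction minus the raw rating difference.
offsets = []
for home, away, predicted in fixtures:
    rating_gap = ratings[home] - ratings[away]
    implied = predicted - rating_gap
    offsets.append(implied)
    print(f"{home} vs. {away}: rating gap {rating_gap:+.2f}, "
          f"prediction {predicted:+.2f}, implied allowance {implied:+.2f}")
```

On these figures the implied allowance is about 3 points for most games and about 4 for the Warriors' home game. The post does not say what drives that difference, and the ratings are rounded to two decimals, so small discrepancies are expected.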

 

Super 15 Predictions for Round 7

Team Ratings for Round 7

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Crusaders     9.22            10.42                   -1.20
Waratahs      8.43            10.00                   -1.60
Hurricanes    5.61            2.89                    2.70
Brumbies      4.50            2.20                    2.30
Chiefs        4.29            2.23                    2.10
Stormers      2.70            1.68                    1.00
Sharks        2.68            3.91                    -1.20
Bulls         2.06            2.88                    -0.80
Blues         -0.07           1.44                    -1.50
Highlanders   -1.26           -2.54                   1.30
Lions         -3.93           -3.39                   -0.50
Force         -4.98           -4.67                   -0.30
Rebels        -7.07           -9.53                   2.50
Cheetahs      -7.48           -5.55                   -1.90
Reds          -7.72           -4.98                   -2.70

 

Performance So Far

So far there have been 40 matches played, 26 of which were correctly predicted, a success rate of 65%.

Here are the predictions for last week’s games.

Game  Teams                       Date    Score     Prediction  Correct
1     Highlanders vs. Hurricanes  Mar 20  13 – 20   -2.20       TRUE
2     Rebels vs. Lions            Mar 20  16 – 20   2.20        FALSE
3     Crusaders vs. Cheetahs      Mar 21  57 – 14   18.50       TRUE
4     Bulls vs. Force             Mar 21  25 – 24   13.00       TRUE
5     Sharks vs. Chiefs           Mar 21  12 – 11   3.30        TRUE
6     Waratahs vs. Brumbies       Mar 22  28 – 13   6.90        TRUE

 

Predictions for Round 7

Here are the predictions for Round 7. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Teams                       Date    Winner        Prediction
1     Hurricanes vs. Rebels       Mar 27  Hurricanes    17.20
2     Reds vs. Lions              Mar 27  Reds          0.70
3     Chiefs vs. Cheetahs         Mar 28  Chiefs        16.30
4     Highlanders vs. Stormers    Mar 28  Highlanders   0.50
5     Waratahs vs. Blues          Mar 28  Waratahs      13.00
6     Sharks vs. Force            Mar 28  Sharks        12.20
7     Bulls vs. Crusaders         Mar 28  Crusaders     -2.70

 

March 23, 2015

Briefly

The “It’s not paranoia if..” issue

  • A new initiative, Data Justice, concerned with widespread commercial data collection and analysis as a threat to privacy and equality.
  • Trying to get “open” data in New Jersey: “initially refused to answer The Jersey Journal’s OPRA request because it didn’t make it on the agency’s standardized OPRA form, which wasn’t available on the NBHA website. Even after a reporter noted that in 2009 the state Supreme Court ruled standardized forms aren’t necessary, Earl wouldn’t accept a request on anything but the agency’s form.”

March 19, 2015

More on petrol prices

I posted a version of this graph with ten years of weekly data, and Mark Stockdale pointed out there are quarterly data back to 1983 (isn’t official data wonderful?). You’ll need to click the graph to embiggen for easy viewing.

[Image: petrol-long]

 

The horizontal axis is the import cost plus freight and insurance (with CPI adjustments to 2013 NZ dollars), and the vertical axis is the importer margin, which covers transport and sale costs within New Zealand, and profit. The idea is that local costs are typically slowly varying, so that short-term variation in margin tracks short-term variation in profit. The label for each year is on the data point for June.

The import cost plummeted in the early 1980s, soon followed by a drop in the importer margin. That’s presumably Rogernomics and its consequences. The cost stayed fairly stable and low in the 1990s and the margin drifted down. Then the cost increased after 1999, with the margin staying stable. We’ve recently entered a new pattern, with margin drifting upwards.

A final note: the import cost is about the same as in 1983, and so is the retail price (in real terms). The reduction in importer margin since 1983 has been almost exactly matched by an increase in taxes, though the taxes would probably be higher under a realistic world carbon price.

March 18, 2015

Briefly

  • Large-scale data cleaning: the US Social Security Administration has social security records but no death records for 6.5 million people over 112, i.e., about 6.5 million more than the number of people over 112 in the world. Nearly 4000 of these people are trying to get jobs: “During Calendar Years 2008 through 2011, employers made 4,024 E-Verify inquiries using 3,873 SSNs belonging to numberholders born before June 16, 1901.”
  • First FDA approval of a ‘biosimilar’ drug, the analogue of ‘generic’ for biologicals. Copying a biologic treatment such as a protein hormone or an antibody is much harder than copying a small molecule (where the patent gives the necessary details), so the makers can charge more for it: in this case, only a 30% discount relative to the brand-name version. Biosimilars will be an important issue for Pharmac in the future: its second and third biggest medication expenses are for two biologicals.
  • Census at School (or, in this context, Tatauranga Ki Te Kura) was on Māori TV’s news program Te Kāea yesterday, with StatsChat contributor Julie Middleton explaining. The story (from 11:10 in this video) was headlined by the inclusion of questions on bullying in this year’s survey.


NRL Predictions for Round 3

Team Ratings for Round 3

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team          Current Rating  Rating at Season Start  Difference
Rabbitohs     14.80           13.06                   1.70
Roosters      10.85           9.09                    1.80
Cowboys       6.80            9.52                    -2.70
Panthers      5.32            3.69                    1.60
Storm         4.46            4.36                    0.10
Warriors      2.89            3.07                    -0.20
Broncos       2.21            4.03                    -1.80
Knights       1.26            -0.28                   1.50
Bulldogs      1.10            0.21                    0.90
Sea Eagles    0.48            2.68                    -2.20
Dragons       -3.83           -1.74                   -2.10
Eels          -5.58           -7.19                   1.60
Raiders       -7.33           -7.09                   -0.20
Titans        -10.51          -8.20                   -2.30
Wests Tigers  -10.76          -13.13                  2.40
Sharks        -10.82          -10.76                  -0.10

 

Performance So Far

So far there have been 16 matches played, 10 of which were correctly predicted, a success rate of 62.5%.

Here are the predictions for last week’s games.

Game  Teams                       Date    Score     Prediction  Correct
1     Bulldogs vs. Eels           Mar 13  32 – 12   8.00        TRUE
2     Sharks vs. Broncos          Mar 13  2 – 10    -10.40      TRUE
3     Cowboys vs. Knights         Mar 14  14 – 16   10.30       FALSE
4     Panthers vs. Titans         Mar 14  40 – 0    15.50       TRUE
5     Sea Eagles vs. Storm        Mar 14  24 – 22   -1.50       FALSE
6     Rabbitohs vs. Roosters      Mar 15  34 – 26   6.70        TRUE
7     Raiders vs. Warriors        Mar 15  6 – 18    -5.20       TRUE
8     Wests Tigers vs. Dragons    Mar 16  22 – 4    -7.40       FALSE

 

Predictions for Round 3

Here are the predictions for Round 3. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game  Teams                       Date    Winner        Prediction
1     Broncos vs. Cowboys         Mar 20  Cowboys       -1.60
2     Sea Eagles vs. Bulldogs     Mar 20  Sea Eagles    2.40
3     Raiders vs. Dragons         Mar 21  Dragons       -0.50
4     Storm vs. Sharks            Mar 21  Storm         18.30
5     Warriors vs. Eels           Mar 21  Warriors      12.50
6     Rabbitohs vs. Wests Tigers  Mar 22  Rabbitohs     28.60
7     Titans vs. Knights          Mar 22  Knights       -8.80
8     Roosters vs. Panthers       Mar 23  Roosters      8.50