April 9, 2016

Movie stars broken down by age and sex

The folks at Polygraph have a lovely set of interactive graphics of the number of speaking lines in 2000 movie screenplays, with IMDB look-ups of actor age and gender.  If you haven’t been living in a cave on Mars, the basic conclusion won’t be surprising, but the extent of the differences might. Frozen, for example, gave more than half the lines to male characters.

They’ve also made a lot of data available on Github for other people to use. Here’s a graph combining the age and gender data in a different way than they did: total number of speaking lines by age and gender.

hollywood

Men and women have similar numbers of speaking lines up to about age 30, but after that there’s a huge separation and much less opportunity for female actors.  We can all think of exceptions: Judi “M” Dench, Maggie “Minerva” Smith, Joanna “Absolutely no relation” Lumley, but they are exceptions.

Compared to what?

Two maps via Twitter:

From the Sydney Morning Herald, via @mlle_elle and @rpy

creativemap

The differences in population density swamp anything else. For the map to be useful we’d need a comparison between ‘creative professionals’ and ‘non-creative unprofessionals’.  There’s an XKCD about this.

Peter Ellis has another visualisation of the last election that emphasises comparisons. Here’s a comparison of Green and Labour votes (by polling place) across Auckland.

votemap

There’s a clear division between the areas where Labour and Green polled about the same, and those where Labour did much better.

 

April 8, 2016

Briefly

  • A lottery in the US was rigged by subverting the random number generator.  That’s harder to do with the complicated balls-from-a-machine system we use, and it’s also more obvious when drawing balls from a machine that betting systems based on sophisticated numerical sequences won’t work.
  • The (US) Transportation Security Administration has a ‘fast lane’ for more-trusted travellers, who get chosen for screening randomly. They use a randomizer app to make sure it really is random, which is a good idea, since people are very bad at random choices. But perhaps it shouldn’t have cost $50k.
  • The Panama Papers are an example of the importance of data skills to journalists.
  • University of Otago research on microRNA may help with Alzheimer’s Disease diagnosis, which is interesting and potentially very useful, but there have been a lot of ‘potential tests’ recently. Also the research is unpublished and they aren’t disclosing yet which microRNAs are involved, so perhaps the publicity could have waited.

April 6, 2016

Super 18 Predictions for Round 7

Team Ratings for Round 7

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team Current Rating Rating at Season Start Difference
Crusaders 8.97 9.84 -0.90
Highlanders 7.10 6.80 0.30
Chiefs 7.07 2.68 4.40
Hurricanes 5.64 7.26 -1.60
Brumbies 3.47 3.15 0.30
Waratahs 2.30 4.88 -2.60
Stormers 1.48 -0.62 2.10
Sharks 0.10 -1.64 1.70
Lions -0.74 -1.80 1.10
Bulls -2.00 -0.74 -1.30
Blues -4.72 -5.51 0.80
Rebels -5.31 -6.33 1.00
Jaguares -8.69 -10.00 1.30
Cheetahs -9.21 -9.27 0.10
Reds -10.01 -9.81 -0.20
Force -10.16 -8.43 -1.70
Sunwolves -12.43 -10.00 -2.40
Kings -16.08 -13.66 -2.40
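
The Difference column looks like the current rating minus the season-start rating, rounded to one decimal place. That rounding rule is my inference from the numbers in the table, not something stated in the method description. A minimal sketch on four rows copied from the table (the name `rating_change` is just for illustration):

```python
# Ratings copied from the table above: team -> (current, season start).
ratings = {
    "Crusaders": (8.97, 9.84),
    "Chiefs": (7.07, 2.68),
    "Hurricanes": (5.64, 7.26),
    "Kings": (-16.08, -13.66),
}


def rating_change(current, start):
    """Change since the season start, rounded to one decimal place.

    Assumption: the one-decimal rounding is inferred from the table.
    """
    return round(current - start, 1)


changes = {team: rating_change(cur, start) for team, (cur, start) in ratings.items()}

for team, change in changes.items():
    # Two decimals in the display, to match the table's formatting.
    print(f"{team:12s} {change:+.2f}")
```

On these four rows the result agrees with the Difference column.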

 

Performance So Far

So far there have been 48 matches played, 31 of which were correctly predicted, a success rate of 64.6%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Highlanders vs. Force Apr 01 32 – 20 22.50 TRUE
2 Lions vs. Crusaders Apr 01 37 – 43 -5.70 TRUE
3 Blues vs. Jaguares Apr 02 24 – 16 8.00 TRUE
4 Brumbies vs. Chiefs Apr 02 23 – 48 3.90 FALSE
5 Kings vs. Sunwolves Apr 02 33 – 28 -0.30 FALSE
6 Bulls vs. Cheetahs Apr 02 23 – 18 11.50 TRUE
7 Waratahs vs. Rebels Apr 03 17 – 21 13.20 FALSE
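
The Correct column can be reproduced by checking whether the predicted margin and the actual margin favour the same team. The sketch below assumes the score is listed home team first, which is consistent with the table, and it ignores draws, since none occurred in this round. The function name `is_correct` is mine.

```python
# Last week's games, copied from the table above:
# (home, away, home score, away score, predicted home margin)
games = [
    ("Highlanders", "Force", 32, 20, 22.5),
    ("Lions", "Crusaders", 37, 43, -5.7),
    ("Blues", "Jaguares", 24, 16, 8.0),
    ("Brumbies", "Chiefs", 23, 48, 3.9),
    ("Kings", "Sunwolves", 33, 28, -0.3),
    ("Bulls", "Cheetahs", 23, 18, 11.5),
    ("Waratahs", "Rebels", 17, 21, 13.2),
]


def is_correct(home_score, away_score, predicted_margin):
    """True when predicted and actual margins favour the same team."""
    actual_margin = home_score - away_score
    return (actual_margin > 0) == (predicted_margin > 0)


flags = [is_correct(h, a, p) for _, _, h, a, p in games]
n_correct = sum(flags)
print(f"{n_correct} of {len(games)} correct this round")

# Season totals quoted in the text above.
season_correct, season_played = 31, 48
success_rate = season_correct / season_played
print(f"Season success rate: {success_rate:.1%}")
```

This gives 4 correct out of 7 for the round, and 31/48 works out to the 64.6% quoted above.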

 

Predictions for Round 7

Here are the predictions for Round 7. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Chiefs vs. Blues Apr 08 Chiefs 15.30
2 Force vs. Crusaders Apr 08 Crusaders -15.10
3 Stormers vs. Sunwolves Apr 08 Stormers 17.90
4 Hurricanes vs. Jaguares Apr 09 Hurricanes 18.30
5 Reds vs. Highlanders Apr 09 Highlanders -13.10
6 Sharks vs. Lions Apr 09 Sharks 4.30
7 Kings vs. Bulls Apr 09 Bulls -10.60
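
The sign convention can be spelled out in a few lines. This is only an illustration of how to read the table, not the rating method itself, and `predicted_winner` is a made-up name. A margin of exactly zero is arbitrarily given to the away team here; the table has no such case.

```python
# Round 7 predictions copied from the table above:
# (home, away, predicted home margin)
predictions = [
    ("Chiefs", "Blues", 15.3),
    ("Force", "Crusaders", -15.1),
    ("Stormers", "Sunwolves", 17.9),
    ("Hurricanes", "Jaguares", 18.3),
    ("Reds", "Highlanders", -13.1),
    ("Sharks", "Lions", 4.3),
    ("Kings", "Bulls", -10.6),
]


def predicted_winner(home, away, margin):
    """Positive margin: home team wins. Negative margin: away team wins."""
    return home if margin > 0 else away


winners = [predicted_winner(home, away, margin) for home, away, margin in predictions]

for (home, away, margin), winner in zip(predictions, winners):
    print(f"{home} vs. {away}: {winner} by {abs(margin):.1f}")
```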

 

NRL Predictions for Round 6

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team Current Rating Rating at Season Start Difference
Cowboys 12.42 10.29 2.10
Broncos 8.47 9.81 -1.30
Roosters 3.11 11.20 -8.10
Storm 2.78 4.41 -1.60
Bulldogs 2.25 1.50 0.80
Rabbitohs 1.67 -1.20 2.90
Sharks 1.59 -1.06 2.60
Raiders 0.23 -0.55 0.80
Sea Eagles -1.06 0.36 -1.40
Panthers -1.74 -3.06 1.30
Eels -1.87 -4.62 2.80
Dragons -2.67 -0.10 -2.60
Warriors -4.59 -7.47 2.90
Wests Tigers -5.05 -4.06 -1.00
Titans -5.32 -8.39 3.10
Knights -8.56 -5.41 -3.10

 

Performance So Far

So far there have been 40 matches played, 21 of which were correctly predicted, a success rate of 52.5%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Sea Eagles vs. Rabbitohs Mar 31 12 – 16 1.10 FALSE
2 Titans vs. Broncos Apr 01 16 – 24 -11.30 TRUE
3 Storm vs. Knights Apr 02 18 – 14 16.00 TRUE
4 Wests Tigers vs. Sharks Apr 02 26 – 34 -2.80 TRUE
5 Cowboys vs. Dragons Apr 02 36 – 0 15.20 TRUE
6 Roosters vs. Warriors Apr 03 28 – 32 14.20 FALSE
7 Eels vs. Panthers Apr 03 18 – 20 3.80 FALSE
8 Bulldogs vs. Raiders Apr 04 8 – 22 8.00 FALSE

 

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Broncos vs. Dragons Apr 07 Broncos 14.10
2 Rabbitohs vs. Roosters Apr 08 Rabbitohs 1.60
3 Eels vs. Raiders Apr 09 Eels 0.90
4 Warriors vs. Sea Eagles Apr 09 Warriors 0.50
5 Panthers vs. Cowboys Apr 09 Cowboys -11.20
6 Sharks vs. Titans Apr 10 Sharks 9.90
7 Knights vs. Wests Tigers Apr 10 Wests Tigers -0.50
8 Storm vs. Bulldogs Apr 11 Storm 3.50

 

April 4, 2016

Stat of the Week Competition: April 2 – 8 2016

Each week, we would like to invite readers of Stats Chat to submit nominations for our Stat of the Week competition and be in with a chance to win an iTunes voucher.

Here’s how it works:

  • Anyone may add a comment on this post to nominate their Stat of the Week candidate before midday Friday April 8 2016.
  • Statistics can be bad, exemplary or fascinating.
  • The statistic must be in the NZ media during the period of April 2 – 8 2016 inclusive.
  • Quote the statistic, when and where it was published and tell us why it should be our Stat of the Week.

Next Monday at midday we’ll announce the winner of this week’s Stat of the Week competition, and start a new one.

April 2, 2016

One weird trick increases donating tenfold?

From the Herald:

US researchers have confirmed a strange link between touching rough surfaces and feeling for others, which could help charities raise more money.

Based on my usual complaints about this sort of claim, you might expect that the research didn’t look at donating money  or that it saw only a tiny difference. No.

There were five experiments, but only one that involved actual money. People were approached on the street and given a description of  a health-related charity, and asked to donate. One charity was real, working in a familiar disease; the other was fake, working in a real but obscure disease (all the money actually ended up with the real charity).  Half the participants were given the information and donation envelope on a clipboard with rough sandpaper on the back; the other half weren’t.

When asked to donate to the National Breast Cancer Foundation there was no difference between the rough and smooth clipboards (as you’d expect). When asked to donate to the National Sjögren’s Foundation, 10/34 with sandpaper-backed clipboards said yes compared to only 1/32 with smooth clipboards.

I’m going to go very slightly out on a limb here to say there is no way this ten-fold increase is a real and generalisable phenomenon.  So, what went wrong?  Part of the problem is what Andrew Gelman calls the ‘garden of forking paths’, after the Jorge Luis Borges story — there are many, many possible analyses and they don’t all show this dramatic difference.

For example, there wasn’t a difference in donation probability with the familiar charity. This was consistent with the researchers’ theory, but I’m pretty sure if there had been a difference the researchers wouldn’t have considered it as evidence refuting the theory. Also, the researchers note that they didn’t see a difference in donation amount with the sandpaper, just in donation probability.

Also, if you assume the ten-fold increase was overestimated even a bit, you then get into the problem of sample size. Suppose that the effect was only a two-fold increase rather than ten-fold. That still seems implausibly large to me, but the comparison would then be something like 2/34 vs 1/32 and would be completely unimpressive.  You’d need a sample size something like ten times larger.  And that’s if a bit of sandpaper on the back of a clipboard doubled the number of people who donated.
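
To make the sample-size point concrete, here is a rough sketch comparing the observed 10/34 vs 1/32 with the hypothetical 2/34 vs 1/32, using a two-sided Fisher exact test written with only the standard library. Fisher's test is my choice for illustration and is not necessarily the analysis the researchers ran.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the first row holds x of the col1 "yes" answers.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Small tolerance so ties with the observed table are not lost to rounding.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Observed: 10 of 34 said yes with sandpaper, 1 of 32 without.
p_observed = fisher_exact_two_sided(10, 24, 1, 31)
# Hypothetical two-fold effect from the text: 2 of 34 versus 1 of 32.
p_twofold = fisher_exact_two_sided(2, 32, 1, 31)

print(f"10/34 vs 1/32: p = {p_observed:.4f}")
print(f" 2/34 vs 1/32: p = {p_twofold:.4f}")
```

The observed split gives a small p-value, while 2/34 vs 1/32 is the single most likely outcome under no effect at all, so its p-value is 1. A small p-value for the observed split does nothing about the forking-paths problem above.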

Still, these findings could have “significant implications for less well-known charities”, as the researchers suggest. If I got approached by a charity using sandpaper on the back of their clipboards, I would tend to think they were (a) poor at evaluating evidence, and (b) not all that honest. I could see that having an impact.

March 30, 2016

Hold the lettuce

Q: Did you see vegetarian diets cause cancer now?

A: No.

Q: The Herald site front page: headline “Vegetarianism can lead to cancer”?

vege

A: No.

Q: The teaser: “Scientists have found there can be long-term health risks associated with a vegetarian diet, that could outweigh the benefits.”?

A: Well, it depends on what you mean by ‘long-term’, for a start.

Q: How long-term?

A: Centuries, perhaps thousands of years.

Q: How did they find people who were thousands of years old? And why isn’t that the headline?

A: Not people.

Q: I refuse to believe in century-old lab mice.

A: Human populations.

Q: Ok, so if we click through to the story (from the Telegraph) it seems they’re saying your great-grandparents eating lettuce gives you harmful mutations?

A: That’s what the story says, but it’s not what the research says. The research suggests that a mutation that, with a modern diet, might increase cancer risk arose randomly a long time in the past and became common in a South Asian population where vegetarian diets have been common.

Q: How did the mutation become common?

A: Because it wasn’t true that the long-term health risks outweighed the benefits — there’s genetic evidence of  ‘selection’ in the evolutionary sense, meaning that people with the mutation had more descendants on average.

Q: How much health risk did they find?

A: They weren’t looking at health risks.

Q: But “long-term health risks” and “can lead to cancer”?

A: Sadly, yes.

Q: Ok, what were they looking at?

A: They were looking at enzymes that turn one type of fatty acid into another. The mutation makes it easier for the body to synthesise long-chain polyunsaturated fatty acids.

Q: Aren’t they good?

A: Some of them, like the DHA and EPA also found in fish, are thought to reduce inflammation and heart disease. But arachidonic acid is thought to increase inflammation, though the American Heart Association isn’t convinced.

Q: That’s heart disease. What about cancer?

A: The only links to cancer are pretty speculative — that the mutation could reinforce effects of modern diet in increasing cancer risk.  The contribution of arachidonic acid to that is controversial. But it could be real.

Q: Is there actually a higher cancer rate where they got their vegetarian population from, compared to the control population?

A: No.

Q: That ‘arachidonic acid’ thing. Why does that make me think of spiders?

A: Yes, me too. It’s a false cognate: Latin ‘arachis’, ‘peanut’, not the mythic Greek technologist Αράχνη that arachnids were named for.

Super 18 Predictions for Round 6

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team Current Rating Rating at Season Start Difference
Crusaders 8.95 9.84 -0.90
Highlanders 7.73 6.80 0.90
Hurricanes 5.64 7.26 -1.60
Chiefs 5.34 2.68 2.70
Brumbies 5.21 3.15 2.10
Waratahs 3.34 4.88 -1.50
Stormers 1.48 -0.62 2.10
Sharks 0.10 -1.64 1.70
Lions -0.72 -1.80 1.10
Bulls -1.61 -0.74 -0.90
Blues -4.72 -5.51 0.80
Rebels -6.35 -6.33 -0.00
Jaguares -8.69 -10.00 1.30
Cheetahs -9.60 -9.27 -0.30
Reds -10.01 -9.81 -0.20
Force -10.79 -8.43 -2.40
Sunwolves -12.12 -10.00 -2.10
Kings -16.40 -13.66 -2.70

 

Performance So Far

So far there have been 41 matches played, 27 of which were correctly predicted, a success rate of 65.9%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Hurricanes vs. Kings Mar 25 42 – 20 26.60 TRUE
2 Chiefs vs. Force Mar 26 53 – 10 17.00 TRUE
3 Rebels vs. Highlanders Mar 26 3 – 27 -8.20 TRUE
4 Sunwolves vs. Bulls Mar 26 27 – 30 -7.00 TRUE
5 Cheetahs vs. Brumbies Mar 26 18 – 25 -11.30 TRUE
6 Sharks vs. Crusaders Mar 26 14 – 19 -4.80 TRUE
7 Jaguares vs. Stormers Mar 26 8 – 13 -6.30 TRUE
8 Reds vs. Waratahs Mar 27 13 – 15 -10.90 TRUE

 

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Highlanders vs. Force Apr 01 Highlanders 22.50
2 Lions vs. Crusaders Apr 01 Crusaders -5.70
3 Blues vs. Jaguares Apr 02 Blues 8.00
4 Brumbies vs. Chiefs Apr 02 Brumbies 3.90
5 Kings vs. Sunwolves Apr 02 Sunwolves -0.30
6 Bulls vs. Cheetahs Apr 02 Bulls 11.50
7 Waratahs vs. Rebels Apr 03 Waratahs 13.20

 

NRL Predictions for Round 5

Team Ratings for Round 5

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team Current Rating Rating at Season Start Difference
Cowboys 11.00 10.29 0.70
Broncos 8.74 9.81 -1.10
Roosters 4.37 11.20 -6.80
Bulldogs 3.76 1.50 2.30
Storm 3.63 4.41 -0.80
Rabbitohs 1.27 -1.20 2.50
Sharks 1.17 -1.06 2.20
Sea Eagles -0.65 0.36 -1.00
Dragons -1.25 -0.10 -1.20
Raiders -1.28 -0.55 -0.70
Eels -1.40 -4.62 3.20
Panthers -2.21 -3.06 0.80
Wests Tigers -4.64 -4.06 -0.60
Titans -5.59 -8.39 2.80
Warriors -5.85 -7.47 1.60
Knights -9.41 -5.41 -4.00

 

Performance So Far

So far there have been 32 matches played, 17 of which were correctly predicted, a success rate of 53.1%.
Here are the predictions for last week’s games.

Game Date Score Prediction Correct
1 Rabbitohs vs. Bulldogs Mar 25 12 – 42 5.20 FALSE
2 Broncos vs. Cowboys Mar 25 21 – 20 0.70 TRUE
3 Raiders vs. Titans Mar 26 20 – 24 9.20 FALSE
4 Roosters vs. Sea Eagles Mar 26 20 – 22 9.70 FALSE
5 Dragons vs. Panthers Mar 27 14 – 12 4.30 TRUE
6 Warriors vs. Knights Mar 28 40 – 18 5.20 TRUE
7 Wests Tigers vs. Eels Mar 28 0 – 8 1.10 FALSE
8 Sharks vs. Storm Mar 28 14 – 6 -0.70 FALSE

 

Predictions for Round 5

Here are the predictions for Round 5. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

Game Date Winner Prediction
1 Sea Eagles vs. Rabbitohs Mar 31 Sea Eagles 1.10
2 Titans vs. Broncos Apr 01 Broncos -11.30
3 Storm vs. Knights Apr 02 Storm 16.00
4 Wests Tigers vs. Sharks Apr 02 Sharks -2.80
5 Cowboys vs. Dragons Apr 02 Cowboys 15.20
6 Roosters vs. Warriors Apr 03 Roosters 14.20
7 Eels vs. Panthers Apr 03 Eels 3.80
8 Bulldogs vs. Raiders Apr 04 Bulldogs 8.00