Posts filed under General (575)

October 29, 2014

Briefly

  • The Herald reports on a genetic study in Finland that found a couple of rare genetic variants which were about 2.5 times more common in people who had committed multiple violent crimes.  I don’t have anything to criticise about the story, just a point about genetics. When you’re trying to interpret an association like this one from a philosophical or policy point of view, it’s helpful to note that roughly 95% of their extremely violent criminals carried a genetic variant present in only 50% of the population: an odds ratio closer to 20 than to 2.5.
  • A story and interactive tool at Fusion, showing how changes in youth turnout would affect the US election results next week (if they happened, which they probably won’t).
  • From Anthony Tockar at Neustar, how anonymised taxi ride data from New York could be used to track passengers, not just drivers.
  • And the same taxi data being used for good, via mathbabe.org
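As a quick check on the odds-ratio arithmetic in the first item above, here is a minimal Python sketch. It treats the 50% population figure as the comparison group, which is an approximation, and the function names are just for illustration.

```python
def odds(p):
    """Convert a proportion (between 0 and 1) into odds."""
    return p / (1 - p)


def odds_ratio(p_cases, p_comparison):
    """Odds of carrying the variant among cases, relative to the comparison group."""
    return odds(p_cases) / odds(p_comparison)


# Figures quoted in the item: 95% of offenders vs 50% of the population.
carrier_among_offenders = 0.95
carrier_in_population = 0.50

print(round(odds_ratio(carrier_among_offenders, carrier_in_population), 1))
```

Odds of 19 to 1 against odds of 1 to 1 gives a ratio of about 19.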

October 24, 2014

Something in the air

There’s a story “Pollution can cause lung problems in unborn baby – research” in the Herald, which I’m not convinced by, but the reasons are relatively subtle.

The researchers compared levels of traffic-related air pollution exposure for different pregnant women, and looked at the lung function of the children at age four and a half (press release).  The story gets the name of the main pollutant (nitrogen dioxide) wrong in two different ways, but is otherwise a good summary.  It’s all correlation, but weaker associations than this are fairly reliably estimated for short-term exposures to air pollution. Long-term exposure is different, and that’s what’s interesting.

Studies of short-term effects of air pollution compare the number of people dying or going to hospital on days when pollution is high to the number on days when pollution is low.  That is, the comparisons of pollution are for the same people and for the same air pollution monitors. There is a fairly limited selection of other factors that could explain the association, the main ones being related to weather.

Studies of longer-term effects compare people with high exposure to pollution and people with low exposure to pollution.  Actually, they don’t quite do that, because air pollution monitoring is expensive in labour and equipment. They compare people with high estimated exposure and low estimated exposure. Since we’re comparing different people, any factor that affects health and also affects where people live could cause a bias, and it’s very well established that poorer people tend to get exposed to more pollution, at least in cities. Also, since we’re comparing different air pollution monitors, there can be biases from how representative the monitors are of the local area.
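To see how this sort of bias can arise, here is a toy calculation in Python. All the numbers are invented for illustration: pollution has no effect on lung function within either income group, but lower-income families are more likely to live in high-exposure areas and also have slightly lower scores for other reasons.

```python
# Hypothetical cells: (income group, exposure, number of children, mean lung-function score)
cells = [
    ("lower", "high", 80, 95.0),
    ("lower", "low", 20, 95.0),
    ("higher", "high", 20, 100.0),
    ("higher", "low", 80, 100.0),
]


def mean_score(rows):
    """Weighted mean score over a list of cells."""
    total_n = sum(n for _, _, n, _ in rows)
    return sum(n * score for _, _, n, score in rows) / total_n


def crude_difference(cells):
    """High-exposure mean minus low-exposure mean, ignoring income."""
    high = [c for c in cells if c[1] == "high"]
    low = [c for c in cells if c[1] == "low"]
    return mean_score(high) - mean_score(low)


def within_income_differences(cells):
    """The same comparison made separately inside each income group."""
    diffs = {}
    for income in sorted({c[0] for c in cells}):
        high = [c for c in cells if c[0] == income and c[1] == "high"]
        low = [c for c in cells if c[0] == income and c[1] == "low"]
        diffs[income] = mean_score(high) - mean_score(low)
    return diffs


print(crude_difference(cells))
print(within_income_differences(cells))
```

The crude comparison shows a three-point deficit for high exposure even though the difference inside each income group is zero. Real studies adjust for measured factors like this, but the adjustment can only be as good as the measurements.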

These problems mean that it’s much harder to be confident about effects of longer-term air pollution exposure, even though these effects are likely to be bigger than the short-term ones. Fortunately, we don’t need to be sure of these effects in setting public policy. The main source of the pollution is traffic, and there are other independent reasons why we want to have fewer cars burning less fuel.

On the statistical generalisability of personal experience

Going by people I know in real life or on Twitter, you would think the majority of people brought up in the Mormon church become scientists, though I am informed this is not actually the case.

There’s an interview with one of them, Heather Hendrickson, in the Herald.

October 23, 2014

Official Information and Open Data

In recent years it has become much easier to just go and get routine government data. It’s now easy to put data up online, and organisations do it. We might whinge about how often the URLs and layouts change, but you can get and reuse information in ways that used to be impossible. For examples in just one field, see the blog of the NZ geodata company Koordinates.

On the other hand, non-routine requests seem to be increasingly difficult. David Fisher, of the Herald, gave a talk in Wellington last week on the Official Information Act. The talk has been published at Public Address:

When I started, if I wanted to know about something, I would ring and ask. For example, if I want to know about how Kauri stumps were exported, I would ring up the equivalent of the MPI and ask how Kauri stumps get exported. I would then spend half an hour on the phone to the guy who oversaw the exporting – often the guy who was physically down at the docks – and I would be informed.

It seems a novel idea now. I can barely convey to you now what a wonderful feeling that is, to be a man with a question the public wants answering connecting with the public servant who has the information.

Things have changed, he says.

October 22, 2014

Currie Cup Predictions for the Currie Cup Final

Team Ratings for the Currie Cup Final

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team               Current Rating   Rating at Season Start   Difference
Lions                        7.39                     0.07         7.30
Western Province             5.87                     3.43         2.40
Sharks                       3.00                     5.09        -2.10
Blue Bulls                   1.17                    -0.74         1.90
Cheetahs                    -3.53                     0.33        -3.90
Pumas                       -8.20                   -10.00         1.80
Griquas                    -10.10                    -7.49        -2.60
Kings                      -14.91                   -10.00        -4.90
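The Difference column appears to be the current rating minus the rating at the start of the season, rounded to one decimal place. A minimal Python sketch, assuming that reading, reproduces the column from the other two:

```python
# (current rating, rating at season start), copied from the table above
ratings = {
    "Lions": (7.39, 0.07),
    "Western Province": (5.87, 3.43),
    "Sharks": (3.00, 5.09),
    "Blue Bulls": (1.17, -0.74),
    "Cheetahs": (-3.53, 0.33),
    "Pumas": (-8.20, -10.00),
    "Griquas": (-10.10, -7.49),
    "Kings": (-14.91, -10.00),
}


def rating_change(current, start):
    """Change in rating since the start of the season, to one decimal place."""
    return round(current - start, 1)


for team, (current, start) in ratings.items():
    print(f"{team:<17}{rating_change(current, start):6.2f}")
```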

 

Performance So Far

So far there have been 42 matches played, 31 of which were correctly predicted, a success rate of 73.8%.
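The success rate is just the number of correct predictions divided by the number of matches played. As a small Python check (the function name is only for illustration):

```python
def success_rate(correct, played):
    """Percentage of matches whose winner was predicted correctly."""
    return 100.0 * correct / played


print(f"{success_rate(31, 42):.1f}%")
```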

Here are the predictions for last week’s games.

   Game                             Date    Score    Prediction  Correct
1  Lions vs. Sharks                 Oct 18  50 – 20        6.70  TRUE
2  Western Province vs. Blue Bulls  Oct 18  31 – 23       10.00  TRUE

 

Predictions for the Currie Cup Final

Here are the predictions for the Currie Cup Final. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                             Date    Winner            Prediction
1  Western Province vs. Lions       Oct 25  Western Province        3.50
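The actual method is the one described on the Department home page, and it is not spelled out here. Purely as an illustration, the prediction above is consistent with taking the home team's rating minus the away team's rating and adding a fixed home-ground advantage. The value of 5 points below is my assumption, chosen because it reproduces the published number, and the function is a hypothetical sketch rather than the real model.

```python
ASSUMED_HOME_ADVANTAGE = 5.0  # assumption inferred from the published numbers, not a stated parameter


def predicted_margin(home_rating, away_rating, home_advantage=ASSUMED_HOME_ADVANTAGE):
    """Expected points difference; positive favours the home team."""
    return home_rating - away_rating + home_advantage


# Current ratings from the table above: Western Province (home) 5.87, Lions (away) 7.39
margin = predicted_margin(5.87, 7.39)
print(round(margin, 1))
```

So a lower-rated home team can still be the predicted winner when the ratings gap is smaller than the assumed home advantage.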

 

ITM Cup Predictions for the ITM Cup Finals

Team Ratings for the ITM Cup Finals

I have created a brief description of the method I use for predicting rugby games. Go to my Department home page to see this.

Here are the team ratings prior to the ITM Cup Finals, along with the ratings at the start of the season.

Team               Current Rating   Rating at Season Start   Difference
Tasman                      12.36                     5.78         6.60
Canterbury                  12.25                    18.09        -5.80
Counties Manukau             6.58                     2.40         4.20
Taranaki                     6.27                    -3.89        10.20
Auckland                     4.84                     4.92        -0.10
Hawke’s Bay                  0.47                     2.75        -2.30
Wellington                  -2.74                    10.16       -12.90
Manawatu                    -2.90                   -10.32         7.40
Otago                       -3.98                    -1.45        -2.50
Northland                   -4.05                    -8.22         4.20
Waikato                     -5.74                    -1.20        -4.50
Southland                   -6.07                    -5.85        -0.20
Bay of Plenty               -9.23                    -5.47        -3.80
North Harbour              -10.13                    -9.77        -0.40

 

Performance So Far

So far there have been 74 matches played, 48 of which were correctly predicted, a success rate of 64.9%.

Here are the predictions for last week’s games.

   Game                             Date    Score    Prediction  Correct
1  Hawke’s Bay vs. Northland        Oct 17  26 – 21        9.30  TRUE
2  Manawatu vs. Southland           Oct 18  23 – 18        7.70  TRUE
3  Taranaki vs. Auckland            Oct 18  49 – 30        3.00  TRUE
4  Tasman vs. Canterbury            Oct 18  26 – 6         1.30  TRUE

 

Predictions for the ITM Cup Finals

Here are the predictions for the ITM Cup Finals. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                             Date    Winner            Prediction
1  Manawatu vs. Hawke’s Bay         Oct 24  Manawatu                0.60
2  Taranaki vs. Tasman              Oct 25  Tasman                 -2.10

 

October 19, 2014

Broadening your data display palate — multivariate beer?

Nathan Yau at Flowing Data has a project page on multivariate beer. That is, he wants to use beer recipes to encode information about US counties taken from the American Community Survey:

The great thing about beer is that it has plenty of dimensions to work with: body, bitterness, head retention, hop profile, color, aroma, alcohol by volume, and plenty more. The amount of various ingredients affects how beer looks, tastes, and smells.

Still a work in progress, here’s how a beer recipe is formed.

  • Greater head retention should increase with higher education, so a grain called Carapils is added.
  • More hop aroma represents higher employment. This comes from more hops at the end of a boil and dry hopping.
  • Rye adds spice and complexity to the beer as health care coverage increases.
  • A darker-colored and more full-bodied beer comes from higher median household income and Crystal Malt 40.
  • More hop bitterness and flavor means more people per square mile, and the type of hops — Cascade, Centennial, Citra, Warrior, and Magnum — represents the races of the population.
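To make the encoding idea in the list above concrete, here is a minimal Python sketch of mapping a county statistic onto an ingredient quantity with a linear scale. The ranges, the units and the function name are all hypothetical. The project page describes the idea, not these numbers.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi], clamping at both ends."""
    frac = (value - lo) / (hi - lo)
    frac = min(1.0, max(0.0, frac))
    return out_lo + frac * (out_hi - out_lo)


# Hypothetical encoding: percentage with a degree (0 to 60) -> grams of Carapils (0 to 500)
carapils_grams = scale(30, 0, 60, 0, 500)
print(carapils_grams)
```

Decoding is the hard part: a drinker would have to turn head retention back into a percentage, which is a much less precise operation than reading a bar off a chart.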

That sounds fun, but I’m not convinced by its possibilities for data communication.

People often want to use other senses than vision for data communication, because they would provide more dimensions.  There are a couple of problems with this. First, the bandwidth and resolution of the other senses aren’t as good — for example, even a professional tea-taster can’t manage much over a thousand data points per day. Second, there’s encoding: the idea is to take advantage of the richness of experience from using all the senses, but it’s hard enough to work out how to encode numbers visually, and it will be much harder to come up with encodings for the other senses that convey accurate quantitative information.

October 18, 2014

Briefly

1. There’s a conference coming up in Canada on “Fairness, Accountability, and Transparency in Machine Learning”, a topic I wrote a little about for the Listener.

Questions to the machine learning community include:

  • How can we achieve high classification accuracy while eliminating discriminatory biases? What are meaningful formal fairness properties?
  • How can we design expressive yet easily interpretable classifiers?
  • Can we ensure that a classifier remains accurate even if the statistical signal it relies on is exposed to public scrutiny?
  • Are there practical methods to test existing classifiers for compliance with a policy?

(via mathbabe.org)

2. From Nate Silver at fivethirtyeight.com:

Democrats may not be wrong. The polls could very well be biased against their candidates. The problem is that the polls are just about as likely to be biased against Republicans, in which case the GOP could win more seats than expected.

This sort of slowly varying bias is probably one of the reasons the NZ election polls weren’t very good: not only did they have more variability than you’d expect given the sample sizes, but averaging didn’t cancel out much of the error.

3. Yesterday was Spreadsheet Day. Flee in terror! (via @kara_woo)

4. An informative visualisation of what the world eats, over time. (via Harkanwal Singh)
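The point in item 2 about averaging not cancelling a shared bias can be made concrete with a small Python sketch. It assumes a deliberately simple model: every poll has the same bias, drawn once with standard deviation `shared_bias_sd`, plus its own independent sampling error with standard deviation `sampling_sd`. The numbers are hypothetical, in percentage points.

```python
import math


def rmse_of_poll_average(n_polls, sampling_sd, shared_bias_sd):
    """Root-mean-square error of the average of n_polls polls.

    Sampling error shrinks as polls are averaged; the shared bias does not.
    """
    return math.sqrt(shared_bias_sd ** 2 + sampling_sd ** 2 / n_polls)


for n in (1, 4, 16, 100):
    print(n, round(rmse_of_poll_average(n, sampling_sd=1.5, shared_bias_sd=2.0), 2))
```

However many polls go into the average, the error under this model never drops below the shared bias of 2 points.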

 

October 16, 2014

Do you feel lucky?

I’m glad to say it’s been quite a while since we’ve had this sort of rubbish from the NZ papers, but it’s still going across the Tasman (the Sydney Morning Herald):

If you’re considering buying a lottery ticket, you’d better make sure it’s from either Gladesville or Cabramatta, which are now officially Sydney’s luckiest suburbs when it comes to winning big. 

NSW Lotteries has released statistics that show the luckiest suburbs across all lotto games in NSW and the ACT, as well as other tips for amateurs hoping to ring their bosses tomorrow morning to say they wouldn’t be coming in to work. 

Of course, the ‘luckiest’ suburbs are nothing of the sort: just the ones where the most money is lost on the lotteries. Cabramatta has improved a lot in recent years, but it’s still not the sort of place you’d expect to see called ‘lucky’.
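A minimal Python sketch of why this happens, with invented numbers: if every ticket has the same chance of winning, the expected number of big wins in a suburb is simply proportional to the number of tickets sold there. The suburbs, sales figures and odds below are all hypothetical.

```python
ODDS_AGAINST = 1_000_000  # hypothetical: each ticket wins with probability 1 in a million

# Hypothetical ticket sales over some period
ticket_sales = {"Suburb A": 2_000_000, "Suburb B": 200_000}


def expected_big_wins(tickets_sold, odds_against=ODDS_AGAINST):
    """Expected number of winning tickets when every ticket has the same chance."""
    return tickets_sold / odds_against


for suburb, sold in ticket_sales.items():
    print(suburb, expected_big_wins(sold))
```

Suburb A expects ten times as many winners as Suburb B, but a ticket bought in either place has exactly the same chance.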

October 15, 2014

Currie Cup Predictions for the Semi-Finals

Team Ratings for the Semi-Finals

The basic method is described on my Department home page. I have made some changes to the methodology this year, including shrinking the ratings between seasons.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team               Current Rating   Rating at Season Start   Difference
Lions                        6.03                     0.07         6.00
Western Province             6.02                     3.43         2.60
Sharks                       4.36                     5.09        -0.70
Blue Bulls                   1.02                    -0.74         1.80
Cheetahs                    -3.53                     0.33        -3.90
Pumas                       -8.20                   -10.00         1.80
Griquas                    -10.10                    -7.49        -2.60
Kings                      -14.91                   -10.00        -4.90

 

Performance So Far

So far there have been 40 matches played, 29 of which were correctly predicted, a success rate of 72.5%.

Here are the predictions for last week’s games.

   Game                             Date    Score    Prediction  Correct
1  Kings vs. Pumas                  Oct 10  26 – 25       -2.20  FALSE
2  Lions vs. Cheetahs               Oct 11  47 – 7        11.30  TRUE
3  Western Province vs. Sharks      Oct 11  20 – 28        8.70  FALSE
4  Blue Bulls vs. Griquas           Oct 11  46 – 12       13.70  TRUE
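The Correct column can be read as asking whether the predicted margin has the same sign as the actual home-minus-away margin. A short Python sketch of that check, using the four games above (treating a draw or a zero prediction as incorrect is my assumption):

```python
def prediction_correct(home_score, away_score, predicted_margin):
    """True when the predicted winner (the sign of the margin) matches the result."""
    actual_margin = home_score - away_score
    # Assumption: a drawn game or a predicted margin of exactly zero counts as incorrect.
    return actual_margin * predicted_margin > 0


# (home team, away team, home score, away score, predicted margin)
games = [
    ("Kings", "Pumas", 26, 25, -2.20),
    ("Lions", "Cheetahs", 47, 7, 11.30),
    ("Western Province", "Sharks", 20, 28, 8.70),
    ("Blue Bulls", "Griquas", 46, 12, 13.70),
]

for home, away, home_score, away_score, margin in games:
    print(f"{home} vs. {away}: {prediction_correct(home_score, away_score, margin)}")
```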

 

Predictions for the Semi-Finals

Here are the predictions for the Semi-Finals. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

   Game                             Date    Winner            Prediction
1  Lions vs. Sharks                 Oct 18  Lions                   6.70
2  Western Province vs. Blue Bulls  Oct 18  Western Province       10.00