Stats Chat

April 17, 2018

Aviva Premiership Predictions for Round 21

Team Ratings for Round 21

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Saracens	11.35	7.47	3.90
Exeter Chiefs	10.08	7.99	2.10
Wasps	5.98	5.89	0.10
Leicester Tigers	3.92	4.64	-0.70
Sale Sharks	1.07	-1.73	2.80
Bath Rugby	0.15	1.23	-1.10
Gloucester Rugby	0.08	0.21	-0.10
Newcastle Falcons	-1.36	-3.33	2.00
Harlequins	-2.13	0.84	-3.00
Northampton Saints	-2.15	1.53	-3.70
Worcester Warriors	-5.67	-4.37	-1.30
London Irish	-5.89	-4.94	-1.00

Performance So Far

So far there have been 120 matches played, 83 of which were correctly predicted, a success rate of 69.2%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Newcastle Falcons vs. Sale Sharks	Apr 13	35 – 30	-0.00	FALSE
2	Gloucester Rugby vs. Harlequins	Apr 14	37 – 9	3.70	TRUE
3	Leicester Tigers vs. Northampton Saints	Apr 14	21 – 27	10.20	FALSE
4	Wasps vs. Worcester Warriors	Apr 14	30 – 15	14.60	TRUE
5	London Irish vs. Exeter Chiefs	Apr 15	5 – 45	-11.20	TRUE
6	Saracens vs. Bath Rugby	Apr 15	41 – 6	12.80	TRUE
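
For readers who want to check the Correct column themselves, here is a minimal Python sketch. It assumes a prediction counts as correct when the sign of the predicted margin matches the actual result. The function names are illustrative and not part of the author's method, and treating the displayed -0.00 as a tiny negative margin (an away pick) is an assumption.

```python
import math

# (home team, away team, home score, away score, predicted margin),
# copied from the table above.
games = [
    ("Newcastle Falcons", "Sale Sharks", 35, 30, -0.00),
    ("Gloucester Rugby", "Harlequins", 37, 9, 3.70),
    ("Leicester Tigers", "Northampton Saints", 21, 27, 10.20),
    ("Wasps", "Worcester Warriors", 30, 15, 14.60),
    ("London Irish", "Exeter Chiefs", 5, 45, -11.20),
    ("Saracens", "Bath Rugby", 41, 6, 12.80),
]

def predicted_home_win(margin):
    # Assumption: "-0.00" is a small negative margin that rounded to zero.
    # copysign preserves the sign of -0.0, which a plain "< 0" test would lose.
    return math.copysign(1.0, margin) > 0

def is_correct(home_score, away_score, margin):
    # Draws are not handled here; none appear in this round.
    return predicted_home_win(margin) == (home_score > away_score)

correct = [is_correct(hs, aws, m) for _, _, hs, aws, m in games]
season_rate = round(100 * 83 / 120, 1)  # season totals quoted above
```

Here `correct` lines up with the TRUE/FALSE column, and `season_rate` is the percentage quoted for the season so far.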

Predictions for Round 21

Here are the predictions for Round 21. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Leicester Tigers vs. Newcastle Falcons	Apr 27	Leicester Tigers	8.30
2	Exeter Chiefs vs. Sale Sharks	Apr 28	Exeter Chiefs	12.00
3	Gloucester Rugby vs. Bath Rugby	Apr 28	Gloucester Rugby	2.90
4	Worcester Warriors vs. Harlequins	Apr 28	Harlequins	-0.50
5	London Irish vs. Saracens	Apr 29	Saracens	-14.20
6	Wasps vs. Northampton Saints	Apr 29	Wasps	11.10
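
The published numbers look consistent with a simple rule: home rating minus away rating, plus a home advantage of roughly 3 points. That is an inference from the tables in this post, not the method described on the author's page, so the sketch below is only a plausibility check. `HOME_ADVANTAGE` and `predict_margin` are hypothetical names.

```python
# Current ratings from the Round 21 table above.
ratings = {
    "Saracens": 11.35, "Exeter Chiefs": 10.08, "Wasps": 5.98,
    "Leicester Tigers": 3.92, "Sale Sharks": 1.07, "Bath Rugby": 0.15,
    "Gloucester Rugby": 0.08, "Newcastle Falcons": -1.36,
    "Harlequins": -2.13, "Northampton Saints": -2.15,
    "Worcester Warriors": -5.67, "London Irish": -5.89,
}

# (home team, away team, published prediction) from the table above.
fixtures = [
    ("Leicester Tigers", "Newcastle Falcons", 8.30),
    ("Exeter Chiefs", "Sale Sharks", 12.00),
    ("Gloucester Rugby", "Bath Rugby", 2.90),
    ("Worcester Warriors", "Harlequins", -0.50),
    ("London Irish", "Saracens", -14.20),
    ("Wasps", "Northampton Saints", 11.10),
]

HOME_ADVANTAGE = 3.0  # assumption: inferred from the tables, not stated

def predict_margin(home, away):
    # Positive means a home win, negative an away win.
    return ratings[home] - ratings[away] + HOME_ADVANTAGE

# Gap between this guess and each published prediction.
gaps = [abs(predict_margin(h, a) - p) for h, a, p in fixtures]
```

Every entry in `gaps` is small compared with the rounding in the published table, which is what you would expect if the rule is roughly right.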

April 13, 2018

Nibiru watch

By Thomas Lumley

The Herald website has just published another story about the imaginary and invisible planet Nibiru. That’s in addition to the nine from last year.

18 Nov: Conspiracy theorists claim mysterious planet Nibiru will trigger apocalyptic earthquakes
25 Sep: Nibiru: How the nonsense Planet X Armageddon and Nasa fake news theories spread globally
30 Oct: The end of the world as we know it (again)
23 Sep: How the nonsense Planet X Armageddon, Nasa fake news spread globally
19 Sep: The world is about to end, if you believe this doomsday claim
30 Aug: Claims new planet ‘about to destroy Earth’ and clues written on pyramids
23 Sep: ‘Extremely violent times will come’: TV broadcast interrupted with end-of-world prediction
9 Aug: Rogue planet about to smash into Earth, claim conspiracy theorists
4 Jan: Conspiracy theorist claims mysterious planet Nibiru will smash into Earth and the world will end in October 2017

As you see, two of the stories ask how this nonsense theory spread globally.

One says, with a splendid lack of self-awareness:

Despite absolutely no scientific evidence to back up the suggestions of a rogue planet getting rapidly closer to Earth, myths about Planet X continue to be perpetuated online, according to the Telegraph UK.

Indeed they do.

Briefly

By Thomas Lumley

An ‘insufficient data’ edition

Data quality matters: the high rate of death reported for women giving birth in Texas seems to have been partly a data entry error. Researchers say “Approximately half (50.3%) of obstetric-coded deaths showed no evidence of pregnancy within 42 days, and a large majority of these incorrectly indicated pregnancy at the time of death.” That is, these were real deaths, but not related to pregnancy. The research paper also says “Texas’ current electronic death registration system displays pregnancy status options as a dropdown list. The ‘pregnant at the time of death’ option is directly below the ‘not pregnant within the past year’ option.” Via Ars Technica.
The new cancer drug pembrolizumab (Keytruda) is spectacularly effective across a wide range of tumours, but typically for a minority of patients. In the US, the FDA has approved its marketing for any tumour with a particular defect in DNA repair, but testing for that defect is not as reliable as one would like. The story in Nature News focuses on false negatives: people who would benefit but aren’t found by the test. In New Zealand, false positives are also important: these new drugs would be more cost-effective and so more likely to be subsidised if you could avoid giving them to people who wouldn’t benefit.
There’s a new claim that kumara got to the Pacific Islands before people did, in the New York Times, based on this research. Basically, the DNA from samples collected by the first European botanists in Polynesia has quite a lot of minor differences from modern sweet potatoes in the Americas, suggesting that its ancestors had been separated from the rest of the sweet potato lineage for over 100,000 years. However, Lisa Matisoo-Smith and Michael Knapp from Otago argue that the samples are old enough — nearly 250 years — that the DNA will have been degraded and needs to be analysed with special obsessively-detailed protocols for old DNA. That is, the evidence isn’t nearly strong enough to overturn the other reasons for thinking kumara were brought from South America by humans.


April 11, 2018

Aviva Premiership Predictions for Round 20

By David Scott

I have been meaning to add some additional competitions to my predictions. I have now implemented predictions for the Aviva Premiership, starting a bit late in the season, with only a few rounds to go.

Team Ratings for Round 20

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Saracens	10.63	7.47	3.20
Exeter Chiefs	9.19	7.99	1.20
Wasps	5.96	5.89	0.10
Leicester Tigers	4.48	4.64	-0.20
Sale Sharks	1.37	-1.73	3.10
Bath Rugby	0.86	1.23	-0.40
Gloucester Rugby	-0.69	0.21	-0.90
Harlequins	-1.36	0.84	-2.20
Newcastle Falcons	-1.66	-3.33	1.70
Northampton Saints	-2.70	1.53	-4.20
London Irish	-5.00	-4.94	-0.10
Worcester Warriors	-5.65	-4.37	-1.30

Performance So Far

So far there have been 114 matches played, 79 of which were correctly predicted, a success rate of 69.3%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Sale Sharks vs. Wasps	Apr 07	28 – 27	-1.90	FALSE
2	Northampton Saints vs. Saracens	Apr 08	13 – 63	-7.90	TRUE
3	Bath Rugby vs. Leicester Tigers	Apr 07	19 – 34	-2.70	TRUE
4	Worcester Warriors vs. Newcastle Falcons	Apr 08	27 – 13	-2.10	FALSE
5	Harlequins vs. London Irish	Apr 08	5 – 35	8.90	FALSE
6	Exeter Chiefs vs. Gloucester Rugby	Apr 09	46 – 10	11.30	TRUE

Predictions for Round 20

Here are the predictions for Round 20. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Newcastle Falcons vs. Sale Sharks	Apr 13	Sale Sharks	-0.00
2	Gloucester Rugby vs. Harlequins	Apr 14	Gloucester Rugby	3.70
3	Leicester Tigers vs. Northampton Saints	Apr 14	Leicester Tigers	10.20
4	Wasps vs. Worcester Warriors	Apr 14	Wasps	14.60
5	London Irish vs. Exeter Chiefs	Apr 15	Exeter Chiefs	-11.20
6	Saracens vs. Bath Rugby	Apr 15	Saracens	12.80

April 10, 2018

Super 15 Predictions for Round 9

By David Scott

Team Ratings for Round 9

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Hurricanes	15.91	16.18	-0.30
Crusaders	15.52	15.23	0.30
Highlanders	10.17	10.29	-0.10
Chiefs	9.45	9.29	0.20
Lions	8.80	13.81	-5.00
Sharks	1.44	1.02	0.40
Stormers	-0.23	1.48	-1.70
Blues	-0.46	-0.24	-0.20
Brumbies	-1.19	1.75	-2.90
Waratahs	-1.38	-3.92	2.50
Bulls	-3.07	-4.79	1.70
Jaguares	-4.82	-4.64	-0.20
Reds	-8.83	-9.47	0.60
Rebels	-9.23	-14.96	5.70
Sunwolves	-19.51	-18.42	-1.10

Performance So Far

So far there have been 48 matches played, 33 of which were correctly predicted, a success rate of 68.8%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Hurricanes vs. Sharks	Apr 06	38 – 37	20.80	TRUE
2	Sunwolves vs. Waratahs	Apr 07	29 – 50	-13.20	TRUE
3	Chiefs vs. Blues	Apr 07	21 – 19	15.00	TRUE
4	Brumbies vs. Reds	Apr 07	45 – 21	9.40	TRUE
5	Lions vs. Stormers	Apr 07	52 – 31	11.40	TRUE
6	Jaguares vs. Crusaders	Apr 07	14 – 40	-15.00	TRUE

Predictions for Round 9

Here are the predictions for Round 9. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Hurricanes vs. Chiefs	Apr 13	Hurricanes	10.00
2	Sunwolves vs. Blues	Apr 14	Blues	-15.00
3	Rebels vs. Jaguares	Apr 14	Jaguares	-0.40
4	Highlanders vs. Brumbies	Apr 14	Highlanders	15.40
5	Waratahs vs. Reds	Apr 14	Waratahs	10.90
6	Sharks vs. Bulls	Apr 14	Sharks	8.00

NRL Predictions for Round 6

By David Scott

Team Ratings for Round 6

The basic method is described on my Department home page. Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

Team	Current Rating	Rating at Season Start	Difference
Storm	12.57	16.73	-4.20
Panthers	4.33	2.64	1.70
Dragons	4.23	-0.45	4.70
Sharks	1.74	2.20	-0.50
Roosters	1.39	0.13	1.30
Cowboys	0.80	2.97	-2.20
Raiders	0.43	3.50	-3.10
Broncos	0.31	4.78	-4.50
Wests Tigers	-0.22	-3.63	3.40
Sea Eagles	-0.36	-1.07	0.70
Warriors	-1.32	-6.97	5.70
Rabbitohs	-2.07	-3.90	1.80
Eels	-4.24	1.51	-5.80
Bulldogs	-4.48	-3.43	-1.10
Knights	-7.65	-8.43	0.80
Titans	-7.78	-8.91	1.10

Performance So Far

So far there have been 40 matches played, 21 of which were correctly predicted, a success rate of 52.5%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Raiders vs. Bulldogs	Apr 05	26 – 10	6.60	TRUE
2	Sharks vs. Roosters	Apr 06	10 – 28	6.80	FALSE
3	Dragons vs. Rabbitohs	Apr 06	16 – 12	10.20	TRUE
4	Wests Tigers vs. Storm	Apr 07	11 – 10	-15.00	FALSE
5	Warriors vs. Cowboys	Apr 07	22 – 12	1.10	TRUE
6	Knights vs. Broncos	Apr 07	15 – 10	-6.60	FALSE
7	Titans vs. Sea Eagles	Apr 08	32 – 20	-7.10	FALSE
8	Eels vs. Panthers	Apr 08	6 – 12	-5.50	TRUE

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Roosters vs. Rabbitohs	Apr 12	Roosters	6.50
2	Storm vs. Knights	Apr 13	Storm	23.20
3	Dragons vs. Sharks	Apr 13	Dragons	5.50
4	Warriors vs. Broncos	Apr 14	Warriors	2.90
5	Cowboys vs. Bulldogs	Apr 14	Cowboys	8.30
6	Raiders vs. Eels	Apr 14	Raiders	7.70
7	Panthers vs. Titans	Apr 15	Panthers	15.10
8	Sea Eagles vs. Wests Tigers	Apr 15	Sea Eagles	2.90


Algorithmic Impact Assessments

By Thomas Lumley

There’s a new report from New York University’s AI Now Institute, giving recommendations for algorithmic impact assessments (PDF). Worth reading for anyone who is or should be interested in criteria for automated decision systems. As the researchers say:

AIAs will not solve all of the problems that automated decision systems might raise, but they do provide an important mechanism to inform the public and to engage policymakers and researchers in productive conversation. With this in mind, AIAs are designed to achieve four key policy goals:

Respect the public’s right to know which systems impact their lives by publicly listing and describing automated decision systems that significantly affect individuals and communities;

Increase public agencies’ internal expertise and capacity to evaluate the systems they build or procure, so they can anticipate issues that might raise concerns, such as disparate impacts or due process violations;

Ensure greater accountability of automated decision systems by providing a meaningful and ongoing opportunity for external researchers to review, audit, and assess these systems using methods that allow them to identify and detect problems; and

Ensure that the public has a meaningful opportunity to respond to and, if necessary, dispute the use of a given system or an agency’s approach to algorithmic accountability.

(via Harkanwal Singh)

The Immigration NZ model: recap

By Thomas Lumley

Original post

To begin with: Yes, everyone being evaluated was already eligible for deportation.

There were two main categories of feedback on this point: the ‘Manus Island’ tendency, arguing they’re all guilty and so it doesn’t matter how you treat them, and the people pointing out that a model could perhaps make better decisions than an individual immigration officer. The first group have, I think, missed an important issue: the arguments given by Immigration NZ for this model being a good thing would apply anywhere else in the immigration system or the justice system where there is currently discretion — eg, police discretion to prosecute.

The second group do have a good point (which is why it’s a point I made in my original post), but only if the model is constructed well and, ideally, audited. As I said, it didn’t look like we had that sort of model. Today, we got more information about the model, thanks to Radio NZ’s Morning Report. Here’s a PDF of the spreadsheet and the briefing document (dated April 6, so potentially cleaned up after the initial publicity). It’s a spreadsheet, simply adding up points for a bunch of categories, with minimal scaling for importance based on Immigration NZ’s expert knowledge or fitting to empirical data.

It’s not especially surprising that the harm model is a bit crap. What is surprising is that the Minister thinks this is a good thing:

He said he was concerned about misconceptions around the pilot programme.

“Some people were talking about a sophisticated algorithm some people were talking about racial profiling, both of those are incorrect and I think it’s very important that the public know exactly what this is, and what it isn’t,” he said.

“This is not modelling or a predictive tool – this is a spreadsheet that they put some information into and they rank people based on that information.”

That’s not a defence; it’s an indictment.


April 9, 2018

Briefly

By Thomas Lumley

Puerto Rico’s official statistics agency is being dismantled and privatised
A good Herald medical science story, on the McLeod family, the stomach-cancer risk gene many of them carry, and what’s being done about it
It’s hard to do survey research on small subgroups of people, since even a small fraction of false responses can dominate the real ones. Especially with teenagers:

In a 2003 study, 19 percent of teens who claimed to be adopted actually weren’t, according to follow-up interviews with their parents. When you excluded these kids (who also gave extreme responses on other items), the study no longer found a significant difference between adopted children and those who weren’t on behaviors like drug use, drinking and skipping school

“If your data is bad, your machine learning tools are useless” – from the Harvard Business Review, so the message is getting out.
From the South China Morning Post: “Jaywalkers under surveillance in Shenzhen soon to be punished via text messages” — new automated system based on facial recognition.
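
The survey point above is easy to see with a little arithmetic. The sketch below uses made-up numbers (a 2% true subgroup and a 0.5% rate of false claims among everyone else), chosen only for illustration and not taken from the 2003 study; `false_share` is a hypothetical helper name.

```python
def false_share(prevalence, false_claim_rate):
    """Fraction of people claiming subgroup membership who are not members.

    prevalence: true proportion of the population in the subgroup.
    false_claim_rate: proportion of non-members who wrongly claim membership.
    Assumes, for simplicity, that every real member reports truthfully.
    """
    real_claims = prevalence
    false_claims = false_claim_rate * (1 - prevalence)
    return false_claims / (real_claims + false_claims)

# Hypothetical inputs: 2% truly in the subgroup, 0.5% of others say they are.
share = false_share(prevalence=0.02, false_claim_rate=0.005)
```

With these inputs, just under a fifth of the apparent subgroup is not really in it, even though only one in two hundred of the other respondents gave a false answer.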


April 5, 2018

Immigration NZ and the harm model

By Thomas Lumley

Immigration NZ, by and large, has been good at transparency in the past: you may think some of their policies are inhumane or arbitrary, but you can easily find out what their policies are. That’s a pleasant contrast to the other place I’ve lived as an immigrant. Even their operational manual is available online. So, when you hear in this morning’s Radio NZ story “Immigration NZ using data system to predict likely troublemakers”, you might want to give them the benefit of the doubt and assume they are just taking more steps to make their decision procedures explicit.

But then you get to the quotes:

“We will model the data sets we have available to us and look at who or what’s the demographic here that we’re looking at around people who are likely to commit harm in the immigration system or to New Zealand,” he said.

“Things like who’s incurring all the hospital debt or the debt to this country in health care, they’re not entitled to free healthcare, they’re not paying for it.

“So then we might take that demographic and load that into our harm model and say even though person A is doing this is there any likelihood that someone else that is coming through the system is going to behave in the same way and then we’ll move to deport that person at the first available opportunity so they don’t have a chance to do that type of harm.

At the very least, they are saying that you can have two people with the same record of what they’ve done in New Zealand, in the same circumstances, and one of them will be deported and the other not deported based on, say, country of origin or age. It’s true that to be deported you have to have done something that gives them a justification — but “at the first available opportunity” is fairly broad when you’re Immigration NZ. And if they’re talking about people who are “not entitled to free health care”, then “immigrants” is the wrong term. [update: Radio NZ have now changed the first word of the story from “Immigrants” to “Overstayers”. Apart from that issue of terminology the same comments still apply]

So, how does this differ from, say, the IRD using statistical models to target people with higher probability of having committed tax fraud for auditing? There are two important differences in principle. The first is that the IRD is interested in auditing people who have already committed tax fraud, not people who might do so in the future. The second is that the consequences of being caught don’t depend on the predicted probability. Immigration NZ, on the other hand, seems to be interested in treating people differently based on things they haven’t done but might do in the future.

Now, Immigration NZ has to deport some people. It has to make decisions about who to let into the country in the first place, and who to give extensions of visas, or grant residency. That’s what it’s for. These decisions will have serious impacts on the lives of would-be immigrants — ranging from those who have an application for residency denied to those who don’t even bother applying because there’s no hope.

Since Immigration NZ does make these sorts of decisions, do we want them to do it based on a statistical model? That’s actually a serious question. It depends. There are at least three issues with the model: the ‘transparency‘ issue, the ‘audit‘ issue and the ‘allowable information‘ issue. All of these are also a problem with decisions made by humans.

The ‘allowable information‘ issue is ‘racial profiling’. As a society, we’ve decided that some information just should not be used to make certain types of decisions — regardless of whether it’s genuinely predictive. For anyone other than Immigration NZ, country of origin would be in that category. Invoking a statistical model — essentially, writing it down in a flowchart — wouldn’t be a justification. To some extent Immigration NZ is required to treat prospective immigrants differently based on their country of origin; the question is how far they can go. The Human Rights Commission is likely to have an opinion here, and it’s quite possible they’ll say Immigration NZ has gone too far.

The ‘transparency‘ issue is that the model should be public. Voters should be able to find out their government’s policy on deportations; people trying to immigrate should know their chances. The tax office have an argument for keeping their model secret; they don’t want people to be able to tweak their accounts to escape detection. The immigration office don’t.

The ‘audit‘ issue is related but more complicated. Immigration NZ need to know (and should have independent verification, and should tell us) how accurate the model is and what inputs it’s sensitive to, and how reliable the data are. How many of the deported people does the model say would have committed serious crimes? How much unnecessary government expenditure does it predict they will require? How well do these predictions match up to reality? Are there relevant groups of people for whom the model is importantly less accurate — people from particular countries, people with or without family in NZ, etc — so that the costs of automated decision making aren’t justified by benefits? And to what extent do the inputs to the model suffer from self-reinforcing bias?

The classic problem of self-reinforcing bias comes from a different context, predictions of future offences by convicted criminals. We don’t have data on who commits crimes, only on who is arrested, charged, or convicted. To the extent that people from particular demographic groups are more likely to attract the notice of the justice system, it will look as if they are more likely to commit crime, and this will lead to more targeted enforcement. And so on, round and round.
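
A toy version of that loop, as a sketch only: two groups offend at exactly the same true rate, enforcement starts out looking slightly harder at group A, and each round enforcement shifts further toward whichever group has more recorded offences. The starting share, the step size, and the update rule are all hypothetical choices made for illustration, not a description of any real system.

```python
def simulate_feedback(initial_share=0.55, rounds=5, step=0.5):
    """Return group A's share of enforcement attention after each round."""
    share_a = initial_share
    history = [share_a]
    for _ in range(rounds):
        # Both groups offend at the same true rate, so recorded offences
        # split in proportion to where enforcement is looking.
        recorded_share_a = share_a
        # Enforcement then moves further toward the group that looks worse,
        # kept within the valid range of 0 to 1.
        share_a = min(1.0, max(0.0, share_a + step * (recorded_share_a - 0.5)))
        history.append(share_a)
    return history

history = simulate_feedback()
```

In this toy setup group A's share of attention grows every round, although nothing about the groups' actual behaviour differs. With no initial tilt (`initial_share=0.5`) the shares stay equal.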

In the immigration setting, we’d be concerned about any of the criteria that can be affected by current immigration enforcement practice — if people are currently more likely to be deported or more likely to have applications refused based subjectively on country of origin, this will tend to show up in the new models. Healthcare costs, on the other hand, aren’t directly affected by Immigration NZ decisions and so don’t have the same self-reinforcing vicious circle — though failing to pay the bills might.

Having a statistical model isn’t necessarily a bad thing, just like having a formal flowchart or points system isn’t necessarily a bad thing. The model can have various sorts of bias, but so can actual human immigration officers. In contrast to some of the social policy models, this model isn’t being used to make new distinctions in a setting where everyone used to be treated uniformly — the immigration system has always made individual decisions about visas and deportations.

In principle, a model could be developed with care to include only the right sorts of inputs, to predict outputs that aren’t subject to vicious circles, to have clear and reliably estimated costs and benefits associated with decisions, and to be open to independent audit. Such a model would be more accountable to the Minister, Parliament, and the nation than the decisions of individual immigration officers.

The fact that we, and the incoming Minister, only found out about the system this morning doesn’t suggest we’ve got that sort of model. Neither does the disappearance of data from their website, where they’ve just discovered privacy problems (without all that much effect, since the data are still up at archive.org). Nor the explicit use of country of origin. Nor the spokesperson’s complete lack of reference to safeguards in the modelling process, or the argument that they can’t be doing racial profiling because they also use gender, age and type of visa in the model.

