Stats Chat

April 5, 2018

Immigration NZ and the harm model

Immigration NZ, by and large, has been good at transparency in the past: you may think some of their policies are inhumane or arbitrary, but you can easily find out what their policies are. That’s a pleasant contrast to the other place I’ve lived as an immigrant. Even their operational manual is available online. So, when you hear in this morning’s Radio NZ story “Immigration NZ using data system to predict likely troublemakers”, you might want to give them the benefit of the doubt and assume they are just taking more steps to make their decision procedures explicit.

But then you get to the quotes:

“We will model the data sets we have available to us and look at who or what’s the demographic here that we’re looking at around people who are likely to commit harm in the immigration system or to New Zealand,” he said.

“Things like who’s incurring all the hospital debt or the debt to this country in health care, they’re not entitled to free healthcare, they’re not paying for it.

“So then we might take that demographic and load that into our harm model and say even though person A is doing this is there any likelihood that someone else that is coming through the system is going to behave in the same way and then we’ll move to deport that person at the first available opportunity so they don’t have a chance to do that type of harm.”

At the very least, they are saying that you can have two people with the same record of what they’ve done in New Zealand, in the same circumstances, and one of them will be deported and the other not deported based on, say, country of origin or age. It’s true that to be deported you have to have done something that gives them a justification — but “at the first available opportunity” is fairly broad when you’re Immigration NZ. And if they’re talking about people who are “not entitled to free health care”, then “immigrants” is the wrong term. [update: Radio NZ have now changed the first word of the story from “Immigrants” to “Overstayers”. Apart from that issue of terminology the same comments still apply]

So, how does this differ from, say, the IRD using statistical models to target people with higher probability of having committed tax fraud for auditing? There are two important differences in principle. The first is that the IRD is interested in auditing people who have already committed tax fraud, not people who might do so in the future. The second is that the consequences of being caught don’t depend on the predicted probability. Immigration NZ, on the other hand, seems to be interested in treating people differently based on things they haven’t done but might do in the future.

Now, Immigration NZ has to deport some people. It has to make decisions about who to let into the country in the first place, and who to give extensions of visas, or grant residency. That’s what it’s for. These decisions will have serious impacts on the lives of would-be immigrants — ranging from those who have an application for residency denied to those who don’t even bother applying because there’s no hope.

Since Immigration NZ does make these sorts of decisions, do we want them to do it based on a statistical model? That’s actually a serious question. It depends. There are at least three issues with the model: the ‘transparency’ issue, the ‘audit’ issue and the ‘allowable information’ issue. All of these are also a problem with decisions made by humans.

The ‘allowable information’ issue is ‘racial profiling’. As a society, we’ve decided that some information just should not be used to make certain types of decisions, regardless of whether it’s genuinely predictive. For anyone other than Immigration NZ, country of origin would be in that category. Invoking a statistical model (essentially, writing it down in a flowchart) wouldn’t be a justification. To some extent Immigration NZ is required to treat prospective immigrants differently based on their country of origin; the question is how far they can go. The Human Rights Commission is likely to have an opinion here, and it’s quite possible they’ll say Immigration NZ has gone too far.

The ‘transparency’ issue is that the model should be public. Voters should be able to find out their government’s policy on deportations; people trying to immigrate should know their chances. The tax office have an argument for keeping their model secret; they don’t want people to be able to tweak their accounts to escape detection. The immigration office don’t.

The ‘audit’ issue is related but more complicated. Immigration NZ need to know (and should have independent verification, and should tell us) how accurate the model is and what inputs it’s sensitive to, and how reliable the data are. How many of the deported people does the model say would have committed serious crimes? How much unnecessary government expenditure does it predict they will require? How well do these predictions match up to reality? Are there relevant groups of people for whom the model is importantly less accurate (people from particular countries, people with or without family in NZ, etc.), so that the costs of automated decision making aren’t justified by the benefits? And to what extent do the inputs to the model suffer from self-reinforcing bias?
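
One of the audit questions, whether the model is importantly less accurate for some groups, could be checked with a per-group accuracy table. What follows is only a minimal sketch on made-up records: the group labels, the record layout and the function name `accuracy_by_group` are all hypothetical, and none of it reflects Immigration NZ’s actual data or model.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return {group: fraction of records where prediction == outcome}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {g: hits[g] / totals[g] for g in totals}

# Made-up audit sample: (group, model predicted harm?, harm actually occurred?)
records = [
    ("A", True, True), ("A", False, False), ("A", False, False), ("A", True, False),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", True, True),
]

by_group = accuracy_by_group(records)
for group, acc in sorted(by_group.items()):
    print(f"group {group}: accuracy {acc:.2f}")
```

In this invented sample the pooled accuracy is 5 out of 8, which hides the gap between the two groups. A real audit would also have to deal with the fact that outcomes are never observed for people who were deported.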

The classic problem of self-reinforcing bias comes from a different context, predictions of future offences by convicted criminals. We don’t have data on who commits crimes, only on who is arrested, charged, or convicted. To the extent that people from particular demographic groups are more likely to attract the notice of the justice system, it will look as if they are more likely to commit crime, and this will lead to more targeted enforcement. And so on, round and round.
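
The loop can be illustrated with a toy calculation. This is a sketch under stated assumptions, not a model of any real justice system: two groups commit exactly the same number of offences, recorded offences scale with the share of enforcement attention a group gets, and attention is then reallocated from the recorded figures. The starting 60/40 split, the `focus` exponent and the function name `simulate_attention` are all invented for illustration.

```python
def simulate_attention(share_a=0.6, true_offences=100, rounds=5, focus=1.0):
    """Track group A's share of enforcement attention over several rounds.

    Both groups commit the same number of true offences; recorded offences
    are true offences scaled by the attention a group receives.
    """
    shares = []
    for _ in range(rounds):
        shares.append(share_a)
        recorded_a = true_offences * share_a
        recorded_b = true_offences * (1 - share_a)
        # Attention is reallocated using recorded, not true, offences.
        weight_a = recorded_a ** focus
        weight_b = recorded_b ** focus
        share_a = weight_a / (weight_a + weight_b)
    return shares

proportional = simulate_attention(focus=1.0)  # attention proportional to records
concentrated = simulate_attention(focus=2.0)  # attention concentrates on the "worse" group
print([round(s, 3) for s in proportional])
print([round(s, 3) for s in concentrated])
```

With proportional reallocation the initial imbalance never corrects itself, because the records keep appearing to confirm it. If attention is concentrated on whichever group looks worse (`focus` above 1), the imbalance grows each round, even though the underlying behaviour of the two groups is identical.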

In the immigration setting, we’d be concerned about any of the criteria that can be affected by current immigration enforcement practice — if people are currently more likely to be deported or more likely to have applications refused based subjectively on country of origin, this will tend to show up in the new models. Healthcare costs, on the other hand, aren’t directly affected by Immigration NZ decisions and so don’t have the same self-reinforcing vicious circle — though failing to pay the bills might.

Having a statistical model isn’t necessarily a bad thing, just like having a formal flowchart or points system isn’t necessarily a bad thing. The model can have various sorts of bias, but so can actual human immigration officers. In contrast to some of the social policy models, this model isn’t being used to make new distinctions in a setting where everyone used to be treated uniformly — the immigration system has always made individual decisions about visas and deportations.

In principle, a model could be developed with care to include only the right sorts of inputs, to predict outputs that aren’t subject to vicious circles, to have clear and reliably estimated costs and benefits associated with decisions, and to be open to independent audit. Such a model would be more accountable to the Minister, Parliament, and the nation than the decisions of individual immigration officers.

The fact that we, and the incoming Minister, only found out about the system this morning doesn’t suggest we’ve got that sort of model. Neither does the disappearance of data from their website, where they’ve just discovered privacy problems (without all that much effect, since the data are still up at archive.org). Nor the explicit use of country of origin. Nor the spokesperson’s complete lack of reference to safeguards in the modelling process, or the argument that they can’t be doing racial profiling because they also use gender, age and type of visa in the model.

April 3, 2018

Super 15 Predictions for Round 8

By David Scott

Team Ratings for Round 8

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Hurricanes	17.10	16.18	0.90
Crusaders	14.86	15.23	-0.40
Chiefs	10.23	9.29	0.90
Highlanders	10.17	10.29	-0.10
Lions	8.22	13.81	-5.60
Stormers	0.35	1.48	-1.10
Sharks	0.25	1.02	-0.80
Blues	-1.24	-0.24	-1.00
Waratahs	-1.85	-3.92	2.10
Brumbies	-2.06	1.75	-3.80
Bulls	-3.07	-4.79	1.70
Jaguares	-4.16	-4.64	0.50
Reds	-7.95	-9.47	1.50
Rebels	-9.23	-14.96	5.70
Sunwolves	-19.04	-18.42	-0.60

Performance So Far

So far there have been 42 matches played, 27 of which were correctly predicted, a success rate of 64.3%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Chiefs vs. Highlanders	Mar 30	27 – 22	3.40	TRUE
2	Rebels vs. Hurricanes	Mar 30	19 – 50	-21.10	TRUE
3	Blues vs. Sharks	Mar 31	40 – 63	6.00	FALSE
4	Brumbies vs. Waratahs	Mar 31	17 – 24	4.70	FALSE
5	Bulls vs. Stormers	Mar 31	33 – 23	-1.30	FALSE
6	Lions vs. Crusaders	Apr 01	8 – 14	-2.20	TRUE
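
As a rough sketch of how the Correct column and the success rate can be reproduced from the table above: a prediction counts as correct when the predicted margin and the actual margin (home score minus away score) have the same sign. The function name `prediction_correct` is made up for this example, and treating a draw as an incorrect prediction is an assumption here, not something the post specifies.

```python
def prediction_correct(home_score, away_score, predicted_margin):
    """True when the predicted margin has the same sign as the actual margin."""
    actual_margin = home_score - away_score
    # A draw (margin 0) gives a product of 0 and so counts as incorrect here.
    return actual_margin * predicted_margin > 0

# (game, home score, away score, predicted home margin) from the table above
last_week = [
    ("Chiefs vs. Highlanders", 27, 22, 3.40),
    ("Rebels vs. Hurricanes", 19, 50, -21.10),
    ("Blues vs. Sharks", 40, 63, 6.00),
    ("Brumbies vs. Waratahs", 17, 24, 4.70),
    ("Bulls vs. Stormers", 33, 23, -1.30),
    ("Lions vs. Crusaders", 8, 14, -2.20),
]

results = [prediction_correct(h, a, p) for _, h, a, p in last_week]
for (game, *_), ok in zip(last_week, results):
    print(f"{game}: {ok}")

# Season to date, using the totals quoted in the post: 27 correct out of 42.
success_rate = 100 * 27 / 42
print(f"success rate: {success_rate:.1f}%")
```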

Predictions for Round 8

Here are the predictions for Round 8. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Hurricanes vs. Sharks	Apr 06	Hurricanes	20.80
2	Sunwolves vs. Waratahs	Apr 07	Waratahs	-13.20
3	Chiefs vs. Blues	Apr 07	Chiefs	15.00
4	Brumbies vs. Reds	Apr 07	Brumbies	9.40
5	Lions vs. Stormers	Apr 07	Lions	11.40
6	Jaguares vs. Crusaders	Apr 07	Crusaders	-15.00

NRL Predictions for Round 5

By David Scott

Team Ratings for Round 5

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Storm	13.70	16.73	-3.00
Dragons	4.66	-0.45	5.10
Panthers	4.29	2.64	1.70
Sharks	3.48	2.20	1.30
Cowboys	1.42	2.97	-1.50
Broncos	1.12	4.78	-3.70
Sea Eagles	0.98	-1.07	2.00
Raiders	-0.23	3.50	-3.70
Roosters	-0.35	0.13	-0.50
Wests Tigers	-1.35	-3.63	2.30
Warriors	-1.94	-6.97	5.00
Rabbitohs	-2.50	-3.90	1.40
Bulldogs	-3.82	-3.43	-0.40
Eels	-4.21	1.51	-5.70
Knights	-8.46	-8.43	-0.00
Titans	-9.12	-8.91	-0.20

Performance So Far

So far there have been 32 matches played, 17 of which were correctly predicted, a success rate of 53.1%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Cowboys vs. Panthers	Mar 29	14 – 33	3.20	FALSE
2	Rabbitohs vs. Bulldogs	Mar 30	20 – 16	4.40	TRUE
3	Sharks vs. Storm	Mar 30	14 – 4	-10.00	FALSE
4	Roosters vs. Warriors	Mar 31	6 – 30	11.00	FALSE
5	Sea Eagles vs. Raiders	Mar 31	32 – 16	2.30	TRUE
6	Dragons vs. Knights	Apr 01	30 – 12	15.80	TRUE
7	Broncos vs. Titans	Apr 01	14 – 26	17.30	FALSE
8	Wests Tigers vs. Eels	Apr 02	30 – 20	5.20	TRUE

Predictions for Round 5

Here are the predictions for Round 5. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Raiders vs. Bulldogs	Apr 05	Raiders	6.60
2	Sharks vs. Roosters	Apr 06	Sharks	6.80
3	Dragons vs. Rabbitohs	Apr 06	Dragons	10.20
4	Wests Tigers vs. Storm	Apr 07	Storm	-15.00
5	Warriors vs. Cowboys	Apr 07	Warriors	1.10
6	Knights vs. Broncos	Apr 07	Broncos	-6.60
7	Titans vs. Sea Eagles	Apr 08	Sea Eagles	-7.10
8	Eels vs. Panthers	Apr 08	Panthers	-5.50

March 28, 2018

Cycling for work or play

By Thomas Lumley

Auckland Transport publish data from cycle counters on various bike paths. They’re most interested in trends over time (increasing) and perhaps in seasonal variation (more in summer).

Here’s a look at weekday vs weekend counts using data from the start of 2016 to now.

There are some paths that are clearly used primarily by commuters, with more than twice the average traffic on a weekday vs weekend. There are also some that are mostly used at the weekend, such as Matakana, Upper Harbour, and Mangere Bridge. And some, like the Lightpath, that get used all the time.
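
The comparison behind that description is a ratio of average weekday count to average weekend count for each counter. Here is a minimal sketch of that calculation on invented numbers; the counts, the dates and the function name `weekday_weekend_ratio` are hypothetical, and nothing is assumed about the format of the real Auckland Transport files.

```python
from datetime import date, timedelta
from statistics import mean

def weekday_weekend_ratio(daily_counts):
    """daily_counts: iterable of (date, count) pairs.

    Returns mean weekday count divided by mean weekend count.
    """
    daily_counts = list(daily_counts)
    # date.weekday() is 0 for Monday through 6 for Sunday.
    weekday = [c for d, c in daily_counts if d.weekday() < 5]
    weekend = [c for d, c in daily_counts if d.weekday() >= 5]
    return mean(weekday) / mean(weekend)

# Two invented weeks of counts for a hypothetical commuter path.
start = date(2018, 3, 5)  # a Monday
counts = [500, 520, 510, 530, 490, 200, 220] * 2
daily = [(start + timedelta(days=i), c) for i, c in enumerate(counts)]

ratio = weekday_weekend_ratio(daily)
print(f"weekday/weekend ratio: {ratio:.2f}")
```

A ratio above 2 would put a path in the commuter group described above, a ratio well below 1 would mark a weekend path, and a ratio near 1 would look like the Lightpath.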

Note: while it’s great that Auckland Transport publishes these data, the data would be easier to reuse if the names they used for each counter were consistent over time (eg: “Tamaki Dr” vs “Tamaki Drive”, or “Nelson Street Lightpath Counter Cyclists” vs “Nelson Street Lightpath Cyclists”)

March 27, 2018

Super 15 Predictions for Round 7

By David Scott

Team Ratings for Round 7

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Hurricanes	16.51	16.18	0.30
Crusaders	14.63	15.23	-0.60
Highlanders	10.27	10.29	-0.00
Chiefs	10.13	9.29	0.80
Lions	8.45	13.81	-5.40
Stormers	1.02	1.48	-0.50
Blues	0.50	-0.24	0.70
Brumbies	-1.36	1.75	-3.10
Sharks	-1.49	1.02	-2.50
Waratahs	-2.55	-3.92	1.40
Bulls	-3.74	-4.79	1.10
Jaguares	-4.16	-4.64	0.50
Reds	-7.95	-9.47	1.50
Rebels	-8.64	-14.96	6.30
Sunwolves	-19.04	-18.42	-0.60

Performance So Far

So far there have been 36 matches played, 24 of which were correctly predicted, a success rate of 66.7%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Crusaders vs. Bulls	Mar 23	33 – 14	22.80	TRUE
2	Rebels vs. Sharks	Mar 23	46 – 14	-7.90	FALSE
3	Sunwolves vs. Chiefs	Mar 24	10 – 61	-21.70	TRUE
4	Hurricanes vs. Highlanders	Mar 24	29 – 12	8.80	TRUE
5	Stormers vs. Reds	Mar 24	25 – 19	13.90	TRUE
6	Jaguares vs. Lions	Mar 24	49 – 35	-11.70	FALSE

Predictions for Round 7

Here are the predictions for Round 7. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Chiefs vs. Highlanders	Mar 30	Chiefs	3.40
2	Rebels vs. Hurricanes	Mar 30	Hurricanes	-21.10
3	Blues vs. Sharks	Mar 31	Blues	6.00
4	Brumbies vs. Waratahs	Mar 31	Brumbies	4.70
5	Bulls vs. Stormers	Mar 31	Stormers	-1.30
6	Lions vs. Crusaders	Apr 01	Crusaders	-2.20

NRL Predictions for Round 4

By David Scott

Team Ratings for Round 4

The basic method is described on my Department home page.
Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Storm	15.10	16.73	-1.60
Dragons	4.51	-0.45	5.00
Broncos	3.17	4.78	-1.60
Cowboys	2.98	2.97	0.00
Panthers	2.73	2.64	0.10
Roosters	2.10	0.13	2.00
Sharks	2.08	2.20	-0.10
Raiders	0.73	3.50	-2.80
Sea Eagles	0.02	-1.07	1.10
Wests Tigers	-1.68	-3.63	2.00
Rabbitohs	-2.47	-3.90	1.40
Bulldogs	-3.84	-3.43	-0.40
Eels	-3.87	1.51	-5.40
Warriors	-4.39	-6.97	2.60
Knights	-8.30	-8.43	0.10
Titans	-11.17	-8.91	-2.30

Performance So Far

So far there have been 24 matches played, 13 of which were correctly predicted, a success rate of 54.2%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Storm vs. Cowboys	Mar 22	30 – 14	15.00	TRUE
2	Bulldogs vs. Panthers	Mar 23	20 – 18	-4.50	FALSE
3	Wests Tigers vs. Broncos	Mar 23	7 – 9	-1.80	TRUE
4	Raiders vs. Warriors	Mar 24	19 – 20	11.40	FALSE
5	Rabbitohs vs. Sea Eagles	Mar 24	34 – 6	-4.00	FALSE
6	Eels vs. Sharks	Mar 24	4 – 14	-1.80	TRUE
7	Titans vs. Dragons	Mar 25	8 – 54	-7.30	TRUE
8	Roosters vs. Knights	Mar 25	38 – 8	10.70	TRUE

Predictions for Round 4

Here are the predictions for Round 4. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Cowboys vs. Panthers	Mar 29	Cowboys	3.20
2	Rabbitohs vs. Bulldogs	Mar 30	Rabbitohs	4.40
3	Sharks vs. Storm	Mar 30	Storm	-10.00
4	Roosters vs. Warriors	Mar 31	Roosters	11.00
5	Sea Eagles vs. Raiders	Mar 31	Sea Eagles	2.30
6	Dragons vs. Knights	Apr 01	Dragons	15.80
7	Broncos vs. Titans	Apr 01	Broncos	17.30
8	Wests Tigers vs. Eels	Apr 02	Wests Tigers	5.20

March 26, 2018

Accurate graphical rhetoric

By Thomas Lumley

This graph comes from the Twitter account of Jill Hennessy, Victoria’s Minister for Health. It’s obviously intended to make a particular point — and one that’s politically supportive to her. However, it’s actually a pretty good graph.

The baseline isn’t zero, but this is clearly an example where a zero baseline would be silly: zero is not a relevant value of the vaccination rate. The 95% top line is also not arbitrary: it’s the government target for vaccination, chosen because it’s thought to be high enough for herd immunity even to measles. Having the line break out of the box is done without distorting the numerical values. I might want some earlier data than 2013 to see the trends under the previous government, but that’s not a terrible omission.

The causal attribution of the increase to the “No Jab No Play” laws — restricting kindergarten, preschool, and daycare attendance for kids who are missing vaccinations — is obviously less solid, but it’s not implausible. And there are some regions of Victoria where rates are still low. And there’s obviously room to argue about whether the laws denying benefits and restricting preschool/kindergarten/daycare enrolment are worth it even if they were responsible. But the graph itself, unusually for something from a minister, isn’t bad.

The data speak for themselves?

By Thomas Lumley

This graph was on Twitter this morning. There’s nothing wrong with the graph: good data, clear presentation, but it does provide a nice illustration of the difficulties in official statistics — you have to decide what categories to use, and it makes a difference.

The second leading cause, motor vehicles, is straightforward enough. The first, firearms, is more complicated. A majority of the firearm deaths are suicides, and it’s controversial whether firearm access increases the suicide rate or just affects the method. Poisoning is also complicated: you might well want to treat both suicide and accidental recreational-drug overdose separately. And so on.

Sometimes you want to break down the data by intent, sometimes by physical cause, sometimes by medical type of injury or damage. You can’t define the ‘correct’ answer in the absence of a question.

Ihaka Lecture Series 2018 – collected here for your viewing pleasure

By Atakohu Middleton

The second annual edition of the Ihaka Lecture Series has just ended, and we are, once again, delighted with the turnout and engagement, in person and online. Our final speaker was Alberto Cairo, Knight Chair in Visual Journalism at the University of Miami, whose lecture on the dubious uses of data was thought-provoking and a bit worrying.

If you want to see how Trump supporters deluded themselves and misled others with graphics, it’s all laid bare here in Alberto’s lecture. And this brand of Trumpery is not the only example of statistics willfully used to mislead – Alberto delivers a few other eye-openers. And some laughs, as well – he is a very entertaining and engaging speaker. By the way, it’s not all bad news – there is much useful and thoughtful work being done, and Alberto shows what that is and where.

Alberto’s lecture is accessible to all. He uses non-technical language and, as Alberto says, he’s not a statistician. So if you are teaching secondary-school statistics (or citizenship or social studies … ) this would be a really good resource for your students.

Also, Alberto was yesterday interviewed by Colin Peacock, the long-time host of Radio New Zealand’s Mediawatch, and it’s recommended listening. The pic Mediawatch ran of Alberto on its webpage was so nice, we stole it. Nice image, RNZ’s Claire Eastham-Farrelly!

Of course, we also had two other incomparable speakers: our own Associate Professor Paul Murrell, one of the movers and shakers behind R, on the BrailleR package, which generates text descriptions of R plots (watch here) and Monash Professor Dianne Cook, who described some simple tools for helping to decide if patterns you think you are seeing in the data are really there (watch here).

And … in breaking news, the theme of next year’s Ihaka Lecture Series is … machine learning! Speakers will be announced at a later date.

+ Useful link: The 2017 Ihaka Lecture Series.

March 20, 2018

Super 15 Predictions for Round 6

By David Scott

Team Ratings for Round 6

The basic method is described on my Department home page.

Here are the team ratings prior to this week’s games, along with the ratings at the start of the season.

	Current Rating	Rating at Season Start	Difference
Hurricanes	16.02	16.18	-0.20
Crusaders	14.86	15.23	-0.40
Highlanders	10.76	10.29	0.50
Lions	9.99	13.81	-3.80
Chiefs	8.37	9.29	-0.90
Stormers	1.50	1.48	0.00
Sharks	0.91	1.02	-0.10
Blues	0.50	-0.24	0.70
Brumbies	-1.36	1.75	-3.10
Waratahs	-2.55	-3.92	1.40
Bulls	-3.98	-4.79	0.80
Jaguares	-5.70	-4.64	-1.10
Reds	-8.43	-9.47	1.00
Rebels	-11.03	-14.96	3.90
Sunwolves	-17.28	-18.42	1.10

Performance So Far

So far there have been 30 matches played, 20 of which were correctly predicted, a success rate of 66.7%.
Here are the predictions for last week’s games.

	Game	Date	Score	Prediction	Correct
1	Chiefs vs. Bulls	Mar 16	41 – 28	16.80	TRUE
2	Highlanders vs. Crusaders	Mar 17	25 – 17	-1.80	FALSE
3	Brumbies vs. Sharks	Mar 17	24 – 17	1.00	TRUE
4	Stormers vs. Blues	Mar 17	37 – 20	3.40	TRUE
5	Lions vs. Sunwolves	Mar 17	40 – 38	35.30	TRUE
6	Jaguares vs. Reds	Mar 17	7 – 18	9.10	FALSE
7	Waratahs vs. Rebels	Mar 18	51 – 27	10.30	TRUE

Predictions for Round 6

Here are the predictions for Round 6. The prediction is my estimated expected points difference with a positive margin being a win to the home team, and a negative margin a win to the away team.

	Game	Date	Winner	Prediction
1	Crusaders vs. Bulls	Mar 23	Crusaders	22.80
2	Rebels vs. Sharks	Mar 23	Sharks	-7.90
3	Sunwolves vs. Chiefs	Mar 24	Chiefs	-21.70
4	Hurricanes vs. Highlanders	Mar 24	Hurricanes	8.80
5	Stormers vs. Reds	Mar 24	Stormers	13.90
6	Jaguares vs. Lions	Mar 24	Lions	-11.70
