January 18, 2018

Measuring what you care about

There’s a story in the Guardian saying

The credibility of a computer program used for bail and sentencing decisions has been called into question after it was found to be no more accurate at predicting the risk of reoffending than people with no criminal justice experience provided with only the defendant’s age, sex and criminal history.

They even link to the research paper.

That’s all well and good, or rather, not good. But there’s another issue that doesn’t even get raised.  The algorithms aren’t trained and evaluated on data about re-offending. They’re trained and evaluated on data about re-conviction: they have to be, because that’s all we’ve got.

Suppose two groups of people have the same rate of re-offending, but one group are more likely to get arrested, tried, and convicted than the other. The group with a higher re-conviction rate will look to the algorithm as if they have a higher chance of re-offending.   They’ll get a higher predicted probability of re-offending. Evaluation will confirm they’re more likely to have the “re-offending” box ticked in their subsequent data.  The model can look like it’s good at discriminating between re-offenders and those who go straight, when it’s actually just good at discriminating against the same people as the justice system.
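The mechanism is easy to demonstrate with a toy simulation. The numbers below are made up purely for illustration: both groups re-offend at the same rate, but offences in group A are assumed to lead to conviction more often than in group B.

```python
import random

random.seed(0)

N = 100_000                # people simulated per group
REOFFEND_RATE = 0.4        # identical true re-offending rate in both groups
# Illustrative assumption: an offence by someone in group A is more
# likely to end in arrest, trial, and conviction than one in group B.
CONVICT_PROB = {"A": 0.7, "B": 0.3}

counts = {g: {"reoffend": 0, "reconvict": 0} for g in CONVICT_PROB}
for group, p_convict in CONVICT_PROB.items():
    for _ in range(N):
        reoffends = random.random() < REOFFEND_RATE
        convicted = reoffends and random.random() < p_convict
        counts[group]["reoffend"] += reoffends
        counts[group]["reconvict"] += convicted

for group, c in counts.items():
    print(group,
          "re-offend rate:", round(c["reoffend"] / N, 3),
          "re-conviction rate:", round(c["reconvict"] / N, 3))
```

Both groups re-offend at about 0.4, but group A's re-conviction rate comes out roughly twice group B's. A model trained on the re-conviction labels will assign group A a higher predicted risk, and because the evaluation data carry the same bias, it will appear well calibrated.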

This isn’t an easy problem to fix: re-conviction data are what you’ve got. But when you don’t have the measurement you want, it’s important to be honest about it. You’re predicting what you measured, not what you wanted to measure.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.



    In terms of recidivism this is a long-standing finding, going back nearly 100 years now. Improvements in prediction are pretty minimal after taking into account those “static” characteristics. Bernard Harcourt’s book Against Prediction is a good source on that history.

    This measurement point applies just as well if you use re-arrest instead of re-conviction. I think re-conviction is probably worse, though, because the bias accumulates at two stages instead of one. This is maybe backwards from what people expect, since re-conviction requires more evidence than arrest.
