October 19, 2015

Thinking about public Big Data

There are useful pieces by David Fisher in the Herald, and Tom Pullar-Strecker at Stuff, about the new NZ Data Futures Partnership and its chair, Dame Diane Robertson. The idea is that the government has access to a lot of data, which could be used in all sorts of ways, but that New Zealand society needs to make decisions about which uses are OK. At least, that’s the idea in David Fisher’s story. In the Stuff piece it sounds more as though the idea is to educate people so they agree with the desired uses. [Update: in the print version of the Herald there’s also something by Harkanwal Singh on the social consent issue.]

Detailed individual data can be used for predicting things, and while there’s obviously a problem if the predictions are inaccurate, there can be even more of a problem if they are accurate. The Herald story mentions the use of predictions of re-offending to give people longer prison terms, so-called ‘evidence-based sentencing’.

It isn’t just a question of whether data will be used to do Bad Things, though. There’s a broader problem of maintaining trust in government data. If you think your information is going to be used to do complex, mysterious, and potentially creepy things, you’re going to be less likely to talk to the nice StatsNZ interviewer. Reliable government data collection is important for the private sector as well as the public sector, and it’s much more difficult and expensive if the public don’t trust the data collectors.

In the US, according to a recent analysis, confidence in federal statistical agencies is fairly low — they’re rated more highly than the politicians, but below the military and universities, and level with newspapers.

In this survey, it mattered how much people knew about the statistical agencies, but in a complicated way. For people who didn’t know much about what the agencies did, confidence in them was moderately well correlated with confidence in the military, newspapers, Congress, and universities. For people who knew more about the agencies, these correlations were weaker. The better-informed respondents might like or dislike the Census Bureau or the CDC, but they didn’t see the agencies as part of a vague and powerful Them.

There are plenty of cynical explanations you can give for these results, but there’s also an obvious positive explanation: it’s good for people to understand what government does with their data and why.

 

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.