What the NSA can’t do by data mining
In the Herald, in late May, there was a commentary on the importance of freeing-up the GCSB to do more surveillance. Aaron Lim wrote
The recent bombings at the Boston Marathon are a vivid example of the fragmented nature of modern warfare, and changes to the GCSB legislation are a necessary safeguard against a similar incident in New Zealand.
Ceding a measure of privacy to our intelligence agencies is a small price to pay for safe-guarding the country against a low-probability but high-impact domestic incident.
Unfortunately for him, it took only a couple of weeks for this to be proved wrong: in the US, vastly more information was being routinely collected, and it did nothing to prevent the Boston bombing. Why not? The NSA and FBI have huge resources and talented and dedicated staff, and have managed to hook into a vast array of internet sites. Why couldn’t they stop the Tsarnaevs, or the Undabomber, or other threats?
The statistical problem is that terrorism is very rare. The IRD can catch tax evaders, because their accounts look like the accounts of many known tax evaders, and because even a moderate rate of detection will help deter evasion. The banks can catch credit-card fraud, because the patterns of card use look like the patterns of card use in many known fraud cases, and because even a moderate rate of detection will help deter fraud. Doctors can predict heart disease, because the patterns of risk factors and biochemical meausurements match those of many known heart attacks, and because even a moderate level of accuracy allows for useful gains in public health.
The NSA just doesn’t have that large a sample of terrorists to work with. As the FBI pointed out after the Boston bombing, lots of people don’t like the United States, and there’s nothing illegal about that. Very few of them end up attempting to kill lots of people, and it is so rare that there aren’t good patterns to match against. It’s quite likely that the NSA can do some useful things with the information, but it clearly can’t stop `low-probability, high-impact domestic incidents’, because it doesn’t. The GCSB is even more limited, because it’s unlikely to be able to convince major US internet firms to hand over data or the private keys needed to break https security.
Aaron Lim’s piece ended with the typical surveillance cliche
And if you have nothing to hide from the GCSB, then you have nothing to fear
Freedom’s just another word for nothing left to lose.
Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient See all posts by Thomas Lumley »