June 9, 2013

What the NSA can’t do by data mining

In the Herald, in late May, there was a commentary on the importance of freeing up the GCSB to do more surveillance. Aaron Lim wrote:

The recent bombings at the Boston Marathon are a vivid example of the fragmented nature of modern warfare, and changes to the GCSB legislation are a necessary safeguard against a similar incident in New Zealand.


Ceding a measure of privacy to our intelligence agencies is a small price to pay for safe-guarding the country against a low-probability but high-impact domestic incident.

Unfortunately for him, it took only a couple of weeks for this to be proved wrong: in the US, vastly more information was being routinely collected, and it did nothing to prevent the Boston bombing.  Why not?  The NSA and FBI have huge resources and talented and dedicated staff, and have managed to hook into a vast array of internet sites. Why couldn’t they stop the Tsarnaevs, or the Undabomber, or other threats?

The statistical problem is that terrorism is very rare. The IRD can catch tax evaders, because their accounts look like the accounts of many known tax evaders, and because even a moderate rate of detection will help deter evasion. The banks can catch credit-card fraud, because the patterns of card use look like the patterns of card use in many known fraud cases, and because even a moderate rate of detection will help deter fraud. Doctors can predict heart disease, because the patterns of risk factors and biochemical measurements match those of many known heart attacks, and because even a moderate level of accuracy allows for useful gains in public health.

The NSA just doesn’t have that large a sample of terrorists to work with. As the FBI pointed out after the Boston bombing, lots of people don’t like the United States, and there’s nothing illegal about that. Very few of them end up attempting to kill lots of people, and it is so rare that there aren’t good patterns to match against. It’s quite likely that the NSA can do some useful things with the information, but it clearly can’t stop ‘low-probability, high-impact domestic incidents’, because it doesn’t. The GCSB is even more limited, because it’s unlikely to be able to convince major US internet firms to hand over data or the private keys needed to break HTTPS security.
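A related way to see why rarity matters is the base-rate arithmetic. The sketch below is only an illustration: the function name and every number in it are made up for the example, not estimates of any real system. It compares a screening rule applied to a very rare target with the same rule applied to a more common one, such as card fraud.

```python
def flagged_breakdown(population, true_cases, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives, precision) for a screening rule.

    All inputs are hypothetical; precision is the share of flagged
    cases that are real.
    """
    innocents = population - true_cases
    true_positives = true_cases * sensitivity
    false_positives = innocents * false_positive_rate
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Hypothetical rare-event case: 100 real plotters among 300 million people.
rare_tp, rare_fp, rare_precision = flagged_breakdown(
    population=300_000_000,
    true_cases=100,
    sensitivity=0.99,
    false_positive_rate=0.001,
)

# Hypothetical common-event case: 1,000 fraudulent transactions in a million.
fraud_tp, fraud_fp, fraud_precision = flagged_breakdown(
    population=1_000_000,
    true_cases=1_000,
    sensitivity=0.99,
    false_positive_rate=0.001,
)

print(f"rare event:  {rare_fp:,.0f} false alarms, precision {rare_precision:.4%}")
print(f"card fraud:  {fraud_fp:,.0f} false alarms, precision {fraud_precision:.2%}")
```

With these assumed inputs, the rare-event screen produces roughly 300,000 false alarms for about 99 real cases, while the same rule applied to the fraud example is right about half the time it raises a flag. An accuracy this high is also generous for the rare case, since there are few known examples to learn the pattern from.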

Aaron Lim’s piece ended with the typical surveillance cliche:

And if you have nothing to hide from the GCSB, then you have nothing to fear

Computer security expert Bruce Schneier has written about this one extensively, so I’ll just add that if you believe that, you can easily deduce Kristofferson’s Corollary:

Freedom’s just another word for nothing left to lose.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.


  • Tim White

    The fact that the Boston bombers or the Unabomber were not apprehended by US intelligence agencies not only does not “prove” that their approach of large-scale trawling of private data does not work, it provides no evidence whatsoever to support this conclusion. The crucial pieces of information that we would need to gauge the (in)effectiveness of their approach are (1) the number of (or more accurately, the total damage that would have been wreaked by) terrorist plots that have been foiled using their approach, and (2) (an estimate of) what it would cost to foil the same terrorist plots using some alternative approach. (Or if cost is deemed to be no object: whether some alternative approach could have foiled as many as the current approach.) If (as many people suspect, myself included) every, or nearly every, terrorist plot that they have succeeded in foiling would have been foiled without their secretly snooping through everyone’s private data, and without incurring enormously greater costs, then their secret snooping is indeed ineffective. But we simply do not have this information, or at least I don’t, and if you do, you haven’t told us here.

    4 years ago

    • Thomas Lumley

      On its own it doesn’t prove anything, but combined with the lack of prosecutions of attempted terrorism that aren’t due to FBI sting operations, it is definite evidence.

      4 years ago

      •

        We have no idea whether or not there was a lack of evidence (the ol’ absence of evidence is not evidence of absence). The NSA data at best can provide leads for investigation. The FBI or Police still need to investigate, and it could certainly be the case that any evidence that was obtained is so weak and prone to false alarms that it wasn’t worth investigating further. We just don’t know.

        We are told that the NSA has thwarted terrorist attacks through these data collections. I don’t doubt that is true. A separate question is whether it is worth it to collect the data.

        For Megan: if those lads thought a system would trigger when someone put “bomb” in an email, they must think NSA analysts and algorithm designers are really, really stupid. I can’t imagine this would be the case, based on the fraud detection systems I’ve worked on.

        4 years ago

  • megan pledger

    Back in the early ’90s when I was doing comp sci, the more anarchic lads would have a program that automatically put key words such as “bomb” and “terrorism” in the footer of their e-mails. They assumed their e-mails were being sniffed somewhere on their route through the net and wanted to create as many false positives as possible, to make the flagged e-mails so time-consuming to investigate manually that sniffing would be useless.

    4 years ago