September 4, 2012

False positives

All decision rules based on observed data make mistakes, whether they are detecting the Higgs boson or breast cancer. And the more resources you can invest in each decision, the lower you can (typically) push the error rate.
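As a rough illustration of that trade-off, here is a minimal sketch under assumptions that are mine rather than anything from a real detector: each observation is Gaussian noise with unit variance, a true signal would shift the mean by `effect`, and the rule flags a signal when the average of `n_obs` observations exceeds half that shift. The function name and parameters are hypothetical.

```python
import math

def false_positive_rate(n_obs, effect=1.0):
    """Chance of flagging a signal when none is present.

    Assumed setup (illustrative only): unit-variance Gaussian noise,
    decision threshold at effect / 2 on the mean of n_obs observations.
    """
    # Under the no-signal case the sample mean has standard deviation
    # 1 / sqrt(n_obs), so the threshold sits z standard deviations out.
    z = math.sqrt(n_obs) * effect / 2.0
    # Upper tail of the standard normal, written with the complementary
    # error function: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# More observations per decision means more resources per decision.
rates = {n: false_positive_rate(n) for n in (1, 4, 16, 64)}
for n, rate in rates.items():
    print(f"{n:3d} observations per decision: false positive rate {rate:.2e}")
```

In this toy setup the false positive rate falls from about 31% with a single observation to about 2% with sixteen, and keeps dropping as each decision gets more data.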

Copyright enforcement has been a dramatic example of the opposite trend.  A century ago, individuals basically couldn’t infringe copyright in any meaningful way, so the occasional disputes got settled by expensive lawsuits.  In the modern world, copying is easy, so the companies that own lots of copyrights are looking for cheap and dirty ways to detect and stop copying.  Of course, the error rate goes up.

A striking example happened last night, when the live video feed for the Hugo Awards ceremony at the World Science Fiction Convention was shut down by automated filtering systems at the company doing the broadcasting. One of the awards, for “Best Dramatic Presentation, Short Form”, was being given for a Doctor Who script. Clips from the TV show were shown (with full permission of the copyright owners, not that permission would have been necessary for this purpose), and some computer program made a false positive decision.

[Update: the same thing happened, briefly, to the YouTube feed of the Democratic National Convention later the same week.]


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.