August 7, 2012

Data mining sees faces in the clouds

One of the problems with looking for patterns in Big Data is that there are a lot of things that look like patterns. It’s easy to see things that aren’t there.

In a dramatic example, Phil McCarthy feeds a random polygon generator into an automatic face recogniser, and makes random changes to the polygons to improve the recognition score.
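The loop described there (start from something random, make a small random change, keep it if the score goes up) is a random-mutation hill climber. Here is a minimal sketch of that idea in Python. It is not McCarthy's code: the "polygons" are just a list of numbers, and `stand_in_score` is a hypothetical replacement for the face recogniser that rewards closeness to an arbitrary hidden template. All names and parameter values are assumptions for illustration.

```python
import random


def stand_in_score(params, template):
    """Hypothetical stand-in for a recogniser score: higher means 'more face-like'.

    Here it is just the negative squared distance to an arbitrary template.
    """
    return -sum((p - t) ** 2 for p, t in zip(params, template))


def hill_climb(score, n_params=30, steps=2000, step_size=0.1, seed=0):
    """Random-mutation hill climbing: keep a random change only if the score improves."""
    rng = random.Random(seed)
    current = [rng.random() for _ in range(n_params)]  # random starting "polygons"
    current_score = score(current)
    history = [current_score]
    for _ in range(steps):
        candidate = list(current)
        i = rng.randrange(n_params)              # pick one parameter at random
        candidate[i] += rng.gauss(0, step_size)  # nudge it by a small random amount
        candidate_score = score(candidate)
        if candidate_score > current_score:      # accept only improvements
            current, current_score = candidate, candidate_score
        history.append(current_score)
    return current, history


# The template is itself random: there is nothing "real" for the search to find.
template_rng = random.Random(1)
template = [template_rng.random() for _ in range(30)]

best, history = hill_climb(lambda p: stand_in_score(p, template))
print(f"start score {history[0]:.3f}, final score {history[-1]:.3f}")
```

The score can only go up, so with enough steps and enough adjustable parameters the search will end up matching whatever the scorer happens to reward, whether or not that corresponds to anything real.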

(via Ben Goldacre and Prosthetic Knowledge)

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • avatar

    I would have thought this is more a good story about the power of evolutionary algorithms than a cautionary tale about the perils of data mining…

    12 years ago

    • avatar
      Thomas Lumley

      They are the same issue: given sufficient flexibility and a good algorithm, you can get a good fit to anything, whether it’s really there or not.

      When it is there, this is a good thing. When it isn’t, not so much.

      12 years ago

  • avatar

    My view on overfitting is that it tends to happen if you optimise. If you explore instead of optimising, you tend not to overfit even if the model is very flexible.

    12 years ago