November 12, 2012

The rise of the machines

The Herald has an interesting story on the improvements in prediction displayed last week: predicting the US election results, but more importantly, predicting the path of Hurricane Sandy.  They say

In just two weeks, computer models have displayed an impressive prediction prowess.

 The math experts came out on top thanks to better and more accessible data and rapidly increasing computer power.

It’s true that increasing computer power has been important in both these examples, but it’s been important in two quite different ways.  Weather prediction uses the most powerful computers that the meteorologists can afford, and they are still nowhere near the point of diminishing returns.  There aren’t many problems like this.

Election forecasting, on the other hand, uses simple models that could even be run on hand calculators, if you were sufficiently obsessive and knowledgeable about computational statistics and numerical approximations.  The importance of increases in computer power is that anyone in the developed world has access to computing resources that make the actual calculations trivial.  Ubiquitous computing, rather than supercomputing, is what has revolutionised statistics.  If you combine the cheapest second-hand computer you can find with free software downloaded from the Internet, you have the sort of modelling resources that the top academic and industry research groups were just starting to get twenty years ago.

Cheap computing means that we can tackle problems that wouldn’t have been worthwhile before.  For example, in a post about the lottery, I wanted to calculate the probability that distributing 104 wins over 12 months would give 15 or more wins in one of the months.  I could probably work this out analytically, at least to a reasonable approximation, but it would be too slow and too boring to do for a blog post.  In less than a minute I could write code to estimate the probabilities by simulation, and run 10,000 samples.  If more accuracy was needed I could easily extend this to millions of samples.  That particular calculation doesn’t really matter to anyone, but the same principles apply to real research problems.
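For the curious, here is roughly what that sort of simulation could look like. This is a minimal sketch in Python, not the code used for the lottery post. It assumes each of the 104 wins is equally likely to land in any of the 12 months (ignoring differences in month length), and the function name, its parameters, and the random seed are all made up for illustration.

```python
import random
from collections import Counter


def prob_busy_month(n_wins=104, n_months=12, threshold=15,
                    n_sims=10_000, seed=2012):
    """Estimate P(at least one month has `threshold` or more wins).

    Assumption: every win falls in each month with equal probability.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    hits = 0
    for _ in range(n_sims):
        # Assign each win to a month uniformly at random and tally by month.
        counts = Counter(rng.randrange(n_months) for _ in range(n_wins))
        # Months with no wins are absent from the tally, which is harmless
        # because only the busiest month matters here.
        if max(counts.values()) >= threshold:
            hits += 1
    return hits / n_sims


estimate = prob_busy_month()
print(f"Estimated probability: {estimate:.3f}")
```

With 10,000 samples the standard error of an estimated probability is at most 0.005, which is plenty for a blog post. Passing a larger n_sims shrinks the error in proportion to one over the square root of the number of samples.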


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.