October 1, 2012

Computer hardware failure statistics

Ed Nightingale, John Douceur, and Vince Orgovan at Microsoft Research have analyzed hardware failures in a million ordinary consumer PCs, using data from automated crash-reporting systems.

Their main finding is that if something goes wrong with your computer, you should panic immediately, rather than being relieved when it seems to recover. Machines that accumulated at least 5 days of full-time use over eight months had a 1 in 470 chance of a hard disk failure, but those that had one hard disk failure had a 30% chance of a second failure, and those with a second failure had nearly a 60% chance of a third. Do you feel lucky?

It’s obvious that the set of computers that have a failure is basically doomed, but this still leaves open an interesting statistical question. Does the risk of a second failure increase because the first failure damages the computer, or because the first failure picks out a set of computers that were always a bit dodgy? I think the researchers missed something here: they tested whether the times between failures have an exponential distribution (which is the distribution for events that don’t have any memory), and found that they didn’t. That test doesn’t distinguish between the situation where each computer has its own constant risk of failure, and the situation where each machine starts off the same but some of them have risk increasing over time. Pooled over many machines, both situations give times between failures that are not exponential.
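To see why the two explanations are hard to tell apart, here is a small simulation sketch. Everything in it is an assumption made up for illustration, not anything from the Microsoft paper: the function name `simulate`, the failure rates, the 20% share of "dodgy" machines, and the one-unit observation window. In the "heterogeneous" scenario each machine keeps its own constant failure rate; in the "damage" scenario every machine starts with the same rate, which jumps after the first failure. For exponential times between failures, the coefficient of variation (standard deviation divided by mean) would be 1.

```python
import math
import random


def simulate(scenario, n_machines=100_000, window=1.0, seed=1):
    """Simulate the first two failure gaps for each machine.

    scenario: "heterogeneous" (each machine has its own constant rate)
              or "damage" (same starting rate, higher after a failure).
    All rates are illustrative assumptions, not estimates from real data.
    """
    rng = random.Random(seed)
    gaps = []
    n_first = 0
    n_second = 0
    for _ in range(n_machines):
        if scenario == "heterogeneous":
            # 20% of machines were always dodgy; the rate never changes.
            rate_before = 2.0 if rng.random() < 0.2 else 0.05
            rate_after = rate_before
        else:
            # Every machine starts the same; the first failure damages it.
            rate_before = 0.2
            rate_after = 2.0
        gap1 = rng.expovariate(rate_before)  # time to first failure
        gap2 = rng.expovariate(rate_after)   # time from first to second
        gaps.append(gap1)
        gaps.append(gap2)
        if gap1 <= window:
            n_first += 1
            if gap1 + gap2 <= window:
                n_second += 1
    mean = sum(gaps) / len(gaps)
    variance = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return {
        "p_first": n_first / n_machines,
        "p_second_given_first": n_second / n_first,
        # Coefficient of variation of pooled gaps; 1.0 for an exponential.
        "gap_cv": math.sqrt(variance) / mean,
    }


if __name__ == "__main__":
    for name in ("heterogeneous", "damage"):
        print(name, simulate(name))
```

With these made-up numbers, both scenarios show a second failure being much more likely than a first, and both give pooled gaps that are more variable than an exponential distribution allows. A test for exponential gaps rejects in either case, so on its own it says nothing about which mechanism is at work.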

For computers, it doesn’t matter very much which of these possibilities is true, but in some other contexts it does. For example, if young people sent to prison are more likely to reoffend, we want to know whether the prison exposure was partly responsible, or whether these particular people were likely to reoffend anyway. Unfortunately, telling the two apart turns out to be hard.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.
