A new research paper with the cheeky title “Power failure: why small sample size undermines the reliability of neuroscience” has come out in a neuroscience journal. The basic idea isn’t novel, but it’s one of those statistical points that make your life more difficult (if more productive) once you understand them. Small research studies, as everyone knows, are less likely to detect differences between groups. What is less widely appreciated is that even if a small study does see a difference between groups, that difference is more likely to be a false positive than one seen in a large study.
The ‘power’ of a statistical test is the probability that you will detect a difference if there really is a difference of the size you are looking for. If the power is 90%, say, then you are pretty sure to see a difference if there is one, and based on standard statistical techniques, pretty sure not to see a difference if there isn’t one. Either way, the results are informative.
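To make the definition concrete, here is a rough sketch of a power calculation for comparing two group means. It uses a normal approximation with a known standard deviation, which is my simplification rather than anything from the paper, and the function name and the effect size of 0.5 standard deviations are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the true difference in means divided by the
    (assumed known, common) standard deviation.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)           # critical value, about 1.96 for alpha = 0.05
    shift = effect_size * sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability that the statistic lands in either rejection region.
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Hypothetical effect of half a standard deviation at several sample sizes.
for n in (10, 20, 64, 100):
    print(n, round(power_two_sample(0.5, n), 2))
```

Under these assumptions, 10 people per group gives power of roughly 20%, and you need somewhere around 64 per group to get to about 80%. With a true effect of zero the function returns alpha, which is the "pretty sure not to see a difference if there isn't one" part.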
Often you can’t afford to do a study with 90% power given the current funding system. If you do a study with low power, and the difference you are looking for really is there, you still have to be pretty lucky to see it — the data have to, by chance, be more favorable to your hypothesis than they should be. But if you’re relying on the data being more favorable to your hypothesis than they should be, you can see a difference even if there isn’t one there.
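The arithmetic behind this can be sketched in a few lines. Suppose some fraction of the hypotheses people test are actually true; the 10% used here is purely an illustrative assumption, not a figure from the paper, and the function name is mine. The chance that a "significant" result reflects a real difference then depends heavily on power.

```python
def positive_predictive_value(power, prior, alpha=0.05):
    """Probability that a statistically significant result is a true positive.

    power: chance of detecting a real difference
    prior: assumed fraction of tested hypotheses that are really true
    alpha: chance of a false positive when there is no difference
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Illustrative assumption: 1 in 10 tested hypotheses is actually true.
print(round(positive_predictive_value(0.9, 0.1), 2))  # high-powered study
print(round(positive_predictive_value(0.2, 0.1), 2))  # low-powered study
```

With those made-up inputs, about two-thirds of the positive findings from 90%-power studies are real, but fewer than a third of those from 20%-power studies are. The false positives arrive at the same rate either way; the low-powered studies just contribute far fewer true positives to set against them.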
Combine this with publication bias: if you find what you are looking for, you get enthusiastic and send it off to high-impact research journals. If you don’t see anything, you won’t be as enthusiastic, and the results might well not be published. After all, who is going to want to look at a study that couldn’t have found anything, and didn’t? The result is that we get lots of exciting neuroscience news, often with very pretty pictures, that isn’t true.
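A small simulation shows how this filter distorts what gets read. It is a deliberately crude sketch: each study's estimate is drawn directly from a normal distribution around a true effect, only the statistically significant ones are "published", and every number in it (true effect of 0.2 standard deviations, 20 people per group, 20,000 studies) is a hypothetical choice of mine.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_published(true_effect=0.2, n_per_group=20,
                       n_studies=20000, alpha=0.05, seed=1):
    """Simulate many small two-group studies and 'publish' only significant ones.

    Returns (mean of all estimates, mean of published estimates,
    fraction of studies published).
    """
    rng = random.Random(seed)
    se = sqrt(2 / n_per_group)                  # standard error of the difference, SD = 1
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Publication filter: keep only estimates that reach significance.
    published = [e for e in estimates if abs(e) / se > crit]
    return mean(estimates), mean(published), len(published) / n_studies


all_mean, published_mean, fraction = simulate_published()
print("mean of all estimates:      ", round(all_mean, 2))
print("mean of published estimates:", round(published_mean, 2))
print("fraction published:         ", round(fraction, 2))
```

Averaged over every study, the estimates sit close to the true effect. Averaged over only the published ones, they have to be much bigger, because in a study this small nothing near the true effect can reach significance.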
The same is true for nutrition: I have a student doing an Honours project looking at replicability (in a large survey database) of the sort of nutrition and health stories that make it to the local papers. So far, as you’d expect, the associations are a lot weaker when you look in a separate data set.
Clinical trials went through this problem a while ago, and while they often have lower power than one would ideally like, there’s at least no way you’re going to run a clinical trial in the modern world without explicitly working out the power.
Other people’s reactions