January 24, 2016

Tracing a data factoid

I saw on Twitter this morning the claim “90% of the world’s data was created in the last 12 months”, raising the twin questions “Says who?” and “What does that even mean?”

The tweet linked to a story at the Huffington Post, which attributed the claim to Constellation Research, but with no further information on how they found it out or when, or what they meant by it. Further Google searches find that most occurrences of the claim are either completely unsourced, attributed to the guy who wrote the HuffPo story, or also just say “Constellation Research”.

IBM has claimed “Everyday, we create 2.5 quintillion bytes of data–so much that 90% of the data in the world today has been created in the last two years alone.” But that’s two years, not 12 months. And they’ve been saying it for some time:
[Image: IBM]

In 2012, the skeptics site at stackexchange.com discussed the IBM claim and found it plausible then — assuming that ‘data has been created’ meant that ‘data has been stored in some permanent medium of the sort IBM sells’. It’s not just business data, as some of the citations imply — it will include cat videos on mobile phones, data from the Large Hadron Collider, the NSA’s archive of your email, and lots and lots of porn.
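As a rough check on what the two versions of the claim would imply, here is a minimal sketch of the arithmetic. It assumes (my assumption, not IBM's or Constellation Research's) that the "90%" share holds steadily over time and that IBM's 2.5 quintillion bytes per day is a constant rate across the two years; the function name and variables are made up for illustration.

```python
def annual_growth_factor(recent_share, window_years):
    """Annual growth factor implied if `recent_share` of all data
    was created within the last `window_years` years."""
    # If a share s of today's total is newer than the window, the total
    # at the start of the window was (1 - s) of today's total.
    total_multiplier = 1.0 / (1.0 - recent_share)
    return total_multiplier ** (1.0 / window_years)


# IBM's quoted figure: 2.5 quintillion bytes per day.
BYTES_PER_DAY = 2.5e18
DAYS_IN_TWO_YEARS = 2 * 365

# Crude assumption: the daily rate is constant over the two years.
created_in_two_years = BYTES_PER_DAY * DAYS_IN_TWO_YEARS

# If that is 90% of everything, the implied world total is:
implied_total_bytes = created_in_two_years / 0.9

two_year_version = annual_growth_factor(0.9, 2)  # IBM's wording
one_year_version = annual_growth_factor(0.9, 1)  # the tweet's wording

print(f"Implied total: {implied_total_bytes:.3g} bytes")
print(f"Two-year claim: x{two_year_version:.2f} per year")
print(f"One-year claim: x{one_year_version:.2f} per year")
```

Under those assumptions, the two-year version means the world's stored data grows a bit more than threefold each year and totals roughly 2 zettabytes, while the 12-month version needs a tenfold increase every year. Those are quite different claims, which is one reason the distinction between "two years" and "12 months" matters.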

It still could be that Constellation Research has a new estimate, not the recycled one from the days before Siri. However, a 2015 Oracle presentation (PDF, slide 7) says “90% of the world’s data has been created in the last two years”, and cites Constellation Research: “Businesses Must Answer the Call for Cloud Based Integration”. So I’m guessing maybe not.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.
