
Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

There is an oft-quoted statistic about the growth rate of data volumes that I wanted to use in some context, so I started searching for a source. I googled "data volumes" +"double every" to see what I could find. To my surprise, there were lots of hits, but it is difficult to pin down the exact parameters. Lots of folks are using the statistic:

"Data doubles every year"
"The amount of stored data from corporations nearly doubles every year"
"...the amount of data stored by businesses doubles every year to 18 months."
"In his book “Simplicity,” business management expert and author Bill Jensen indicates that the most conservative estimates show business information doubling every three years, while some estimates say data doubles every year."
"Unstructured data doubles every three months"

I am still following links from the first page of results, and we are doubling our data every 3 to 18 months.
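To get a feel for how far apart those claims really are, here is a quick back-of-the-envelope sketch in Python. It assumes steady compound growth, which none of the quoted sources actually promise, and the function name `annual_growth_factor` is my own invention for illustration.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which a quantity grows in 12 months,
    assuming it doubles every `doubling_months` months (steady compounding)."""
    return 2 ** (12 / doubling_months)


# The doubling periods quoted above: 3 months, 12 months, 18 months.
for months in (3, 12, 18):
    factor = annual_growth_factor(months)
    print(f"doubles every {months:>2} months -> grows {factor:.2f}x per year")
```

Under that assumption, doubling every 3 months means 16x growth per year, while doubling every 18 months means roughly 1.59x. Same catchphrase, very different planning problem.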

"Reed's Law states that the volume of data doubles every 12 months."

OK, so there is actually a law about it. Hold on a second: according to Wikipedia, this law is about the utility of (social) networks, so perhaps the law doesn't apply in all jurisdictions.

Anyway, these may all be references to a UC Berkeley study on the growth of data, which said that the amount of information stored on media such as hard disk drives doubled between 2000 and 2003.
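Taking that figure at face value, a doubling over three years works out to a much gentler annual rate than "doubles every year." A minimal sketch of the arithmetic, assuming smooth compound growth over the period (the study as described here only gives the endpoints, and the helper name is hypothetical):

```python
def implied_annual_rate(growth_multiple: float, elapsed_years: float) -> float:
    """Compound annual growth rate implied by growing `growth_multiple` times
    over `elapsed_years` years (assumes smooth compounding)."""
    return growth_multiple ** (1 / elapsed_years) - 1


# Stored information doubled (2x) between 2000 and 2003 (3 years).
rate = implied_annual_rate(2.0, 2003 - 2000)
print(f"implied annual growth: {rate:.1%}")
```

That comes to roughly 26% per year, which is a long way from 100%.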

So let's look at this a little more carefully - we have a scientific study that looks not at the creation of data, but rather at the use of storage media to hold what is out there. And out there is a lot of stuff needing a lot of storage, like images, music, videos, etc.: things that contain information, yet from which it is still a challenge to extract data. Also, consider that for each thing out there, there are likely to be a lot of copies! I am sure that a scan of all the TiVos in the country would demonstrate that lots of people are still catching up on older episodes of 24 and American Idol.

I need to refine my question a little bit, then, but I am afraid it will be difficult to track down defensible sources for it. I am more interested in knowing about the growth rate for data that can be integrated into an actionable information environment. I may not care about the bits comprising that specific episode of 24 that is sitting on millions of DVRs, but as an advertiser, I might be interested in profiling which households have watched which episodes and at what kind of time shift.

Anyone have any ideas?

Posted January 23, 2008 10:48 AM