
Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, I increasingly find myself asking, more than 25 years later: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today spans the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog.


I've just been reading the 5th annual Digital Universe study from IDC, released by EMC last month.  This year's study seems to have attracted less media attention than previous versions.  Perhaps we've grown blasé about the huge numbers of bytes involved - 1.8 ZB (zettabytes, or 1.8 trillion gigabytes) in 2011 - or perhaps the fact that the 2011 number is exactly the same as predicted in the previous study is not newsworthy.  However, the subtitle of this year's study, "Extracting Value from Chaos", should place it close to the top of every BI strategist's reading list.  Here, and in my next blog entry, are a few of the key takeaways, some of which have emerged in previous versions of the study, but all of which together reemphasize that business intelligence needs to undergo a radical rebirth over the next couple of years.

1.8 ZB is a big number, but consider that it's also a rapidly growing number, more than doubling every two years.  That's faster than Moore's Law.  By 2015, we're looking at 7.5-8 ZB.  More than 90% of this information is already soft (aka unstructured), and that percentage is growing.  Knowing that the vast majority of this data is generated by individuals, and that much of it consists of video, images and audio, you may ask: what does this matter to my internal BI environment?  The answer is: it matters a lot!  Hidden in that vast, swirling and ever-changing cosmic miasma of data are occasional nuggets of valuable insight.  And whoever gets to them first, you or your competitors, will potentially gain significant business advantage.
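The growth figures above can be checked with simple compound-growth arithmetic. The Python sketch below assumes steady exponential growth with a fixed doubling period, which is a simplification and not IDC's actual forecasting method; the function names `project_volume` and `implied_doubling_period` are hypothetical, chosen for illustration.

```python
import math


def project_volume(base_zb: float, base_year: int, target_year: int, doubling_years: float) -> float:
    """Project a volume forward, assuming steady exponential growth with a fixed doubling period."""
    elapsed = target_year - base_year
    return base_zb * 2 ** (elapsed / doubling_years)


def implied_doubling_period(base_zb: float, target_zb: float, elapsed_years: float) -> float:
    """Doubling period (in years) needed to grow from base_zb to target_zb in elapsed_years."""
    return elapsed_years * math.log(2) / math.log(target_zb / base_zb)


# Doubling exactly every two years from 1.8 ZB in 2011 gives 7.2 ZB in 2015.
print(round(project_volume(1.8, 2011, 2015, 2.0), 2))  # 7.2

# Reaching 7.5-8 ZB by 2015 implies a doubling period slightly under two years.
for target in (7.5, 8.0):
    print(target, round(implied_doubling_period(1.8, target, 2015 - 2011), 2))
```

Doubling exactly every two years would give only 7.2 ZB by 2015, so a 7.5-8 ZB forecast corresponds to a doubling period of a little under two years (roughly 1.9 years), which is what "more than doubling every two years" amounts to in practice.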

With such volumes of information and such rapid growth, it is simply impossible to examine (never mind analyse) it manually.  This demands an automated approach.  Such tools are emerging: for example, facial recognition in photos on Facebook and elsewhere, or IBM Watson's extraction of Jeopardy answers from a huge, locally stored collection of documents that included much Web content, such as Wikipedia.  Conceptually, what such tools do is generate data about data, which, as we know and love in BI, means metadata.  According to IDC, metadata is growing at twice the rate of the digital universe as a whole.  If the digital universe is more than doubling every two years, that means metadata is more than doubling every year!
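To see why a shorter doubling period matters, the sketch below compares relative growth indices (2011 = 1.0) for data and metadata. It assumes that "twice the rate" means half the doubling period, i.e. one year for metadata versus two for data; because IDC hasn't published an absolute metadata volume, only relative sizes are shown. The `growth_index` helper and the constant names are hypothetical.

```python
def growth_index(years: float, doubling_years: float) -> float:
    """Relative size after `years` of steady exponential growth, starting from an index of 1.0."""
    return 2 ** (years / doubling_years)


DATA_DOUBLING_YEARS = 2.0      # digital universe: doubling roughly every two years
METADATA_DOUBLING_YEARS = 1.0  # assumption: "twice the rate" read as half the doubling period

for year in range(2011, 2016):
    t = year - 2011
    data = growth_index(t, DATA_DOUBLING_YEARS)
    meta = growth_index(t, METADATA_DOUBLING_YEARS)
    # The metadata/data ratio shows how fast metadata's share of the whole is rising.
    print(f"{year}: data x{data:.2f}, metadata x{meta:.2f}, metadata/data x{meta / data:.2f}")
```

Under these assumptions the metadata index reaches 16 by 2015 against 4 for the digital universe as a whole, so metadata's share of the total grows fourfold in four years, even if it starts from a small base.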

So, while we may well ask what you're doing about managing and storing soft information, an even more pressing question is: what are you going to do about metadata?  Of course, the volumes of metadata are probably still relatively small (IDC hasn't published an absolute value), but that growth rate means they will get large, fast.  And we currently have far more limited infrastructure and weaker methodologies for handling metadata than those we've built up over the years for data.  Moreover, the value to be found in the chaos can be discovered only through the lens of the metadata that characterizes the data itself.

For BI, this shift in focus from hard to soft information is only one of the changes we have to manage.  Another major change involves the nature and sources of the hard data itself.  A growing quantity of hard data is being collected from machine sensors as more and more of the physical world goes on the Web.  RFID readers are generating ever-increasing volumes of data.  (According to VDC Research, nearly 4 billion RFID tags were sold in 2010, a 35% increase over the previous year.)  From electricity meters to automobiles, intelligent, connected devices are pumping out data that is being used in a wide variety of new applications, and almost all of these applications can be characterized as operational BI.  So, the move from traditional, tactical BI to the near real-time world of operational BI is accelerating, with all of the challenges that entails.

Next time, I'll be looking at some of the implications of the changes in sourcing on security and privacy, as well as the interesting fact that although the stored digital universe is huge, the transient data volumes are a number of orders of magnitude higher.

Posted July 27, 2011 8:32 AM