Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today extends to the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry's latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog.

Recently in Hybrid Database Category

Any acquisition in the database market, in this case the July 6 announcement of EMC's plan to acquire Greenplum, generates a flurry of analyst activity speculating about the financial or technical rationale for the deal, the winners and losers among other database vendors, and the effect of the move on customers' buying patterns.  Personally, I find these opinions very interesting and highly informative, and I invite you to check out, for example, Curt Monash or Merv Adrian to explore these aspects of the acquisition.

However, I'd like to take the opportunity to focus our minds once again on a more fundamental question: how is IT going to manage data quality and reliability in a rapidly expanding data environment, both in terms of data volumes and places to store the data?  I'm currently describing a logical enterprise architecture, Business Integrated Insight (BI2), that focuses on this.

So, for me, what the acquisition emphasizes, like that of Sybase by SAP, is that specialized databases, with their sophisticated features and functions, are rapidly entering the mainstream of database usage.  Their ability to handle large data volumes with vast improvements in query performance has become increasingly valuable in a wide range of industries that want to analyze enormous quantities of very detailed data at relatively low cost.  How to do this?  Vendors of these systems typically have a simple answer: copy all the required data into our machine and away you go!

My concern is that IT ends up with yet another copy of the corporate data, and a very large copy at that, which must be kept current in meaning, structure and content on an ongoing basis.  Any slippage in maintaining one or more of these characteristics leads inevitably to data quality problems and eventually to erroneous decisions.  Such issues typically emerge unexpectedly, in time-constrained or high-risk situations, and lead to expensive and highly visible firefighting actions by IT.  Unfortunately, such occurrences are common in BI environments, but they typically relate to unmanaged spreadsheets or relatively small data marts.  We have just jumped the problem size up by a couple of orders of magnitude.

So, am I suggesting that you shouldn't be using these specialized databases?  Would I recommend that you stand in front of a speeding freight train?  Clearly not!

There are two ways that these problems will be addressed.  One falls upon customer IT departments, while the other comes back to the database industry and the vendors, whether acquiring or acquired.  These paths will need to be followed in parallel.

IT departments need to define and adopt stringent "data copy minimization" policies.  The purist in me would like to say "elimination" rather than "minimization".  However, that's clearly impossible.  Minimization of data copies, in the real world, requires IT to evaluate the risks of yet another copy of data, the possibility of using an existing set of data for the new requirement and, if a new copy of the data is absolutely needed, whether existing analytic solutions could be migrated to this new copy of data and the existing data copies eliminated.
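To make that evaluation concrete, here is a minimal sketch, in Python, of how such a "data copy minimization" checklist might be expressed.  It is purely illustrative: the CopyRequest class and evaluate_copy_request function are hypothetical names, and the rules encode only the considerations outlined above, not a complete policy.

# A toy sketch of a "data copy minimization" checklist, not a real tool.
from dataclasses import dataclass

@dataclass
class CopyRequest:
    purpose: str                    # business requirement driving the request
    existing_source_fits: bool      # could an existing data set serve the need?
    copies_retired_if_granted: int  # existing copies that could be migrated and eliminated
    sync_risk: str                  # "low", "medium" or "high" risk of drift in meaning, structure or content

def evaluate_copy_request(req: CopyRequest) -> str:
    """Apply the minimization policy: reuse first, then net reduction, else justify the risk."""
    if req.existing_source_fits:
        return "reject: point the new requirement at an existing copy of the data"
    if req.copies_retired_if_granted > 0:
        return "accept: migrating existing solutions onto the new copy reduces total duplication"
    if req.sync_risk == "high":
        return "escalate: a new copy with a high risk of drift needs explicit business sign-off"
    return "accept with conditions: document ownership, refresh cadence and a retirement date"

print(evaluate_copy_request(CopyRequest(
    purpose="low-cost analysis of detailed sales data",
    existing_source_fits=False,
    copies_retired_if_granted=2,
    sync_risk="medium")))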

Meanwhile, it is incumbent upon the database industry to take a step back and look at the broader picture of data management needs in the context of emerging technologies and the explosive growth in data volumes.  The basic question that needs to be asked is: how can the enormous power and speed of these emerging technologies be crafted into solutions that equally support divergent data use cases on a single copy of data?  And, if not on a single copy, how can multiple copies of data be kept completely consistent, invisibly, within the database technology itself?

Tough questions, perhaps, but ones that the acquirers in this industry, with their deep pockets, need to invest in answering.  As the database market re-converges, the vendors that solve this architectural conundrum will become the market leaders in highly consistent, pervasive and minimally duplicated data that enables IT to focus on solving real business needs rather than managing data quality.  Wouldn't that be wonderful?

Posted July 7, 2010 1:18 PM
Preparing materials for a seminar really forces you to think!  I just finished the slides for my two-day class in Rome next week, and after I got over my need for a strong drink (a celebration, of course), I got to reflect on some of what I had discovered.

Perhaps the most interesting were the amazing changes in the database area that have been happening over the past couple of years.  A combination of hardware advances and software innovations has come together with a recognition that data is no longer what it once was, posing some fundamental questions about how databases should be constructed.

Let's start on the business side - always a good place to start.  Users now think that their internal IT systems should behave like a combination of Google, Facebook and Twitter.  Want an answer to the CEO's question on plummeting sales?  Just do a "search", maybe "call a friend", join it all together and voila!  We have the answer. 

From an information viewpoint, this brings up some very challenging questions about the intersection of soft (aka unstructured) information and hard (structured) data and how one ensures consistency and quality in that set.  IT's problem is no longer just combining hard data from different sources; it's about parsing and qualifying soft information as well.  This is not a truly new problem.  Data modelers have struggled with it for years.  It's the speed with which it needs to be done that causes the problem.
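As a toy illustration of that intersection, here is the CEO's question expressed in a few lines of Python.  Everything in it is invented for the example: the data, the naive keyword "search" standing in for a real search engine, and the join on region; a real solution would need proper text analytics and far more careful qualification of the soft information.

sales = [  # hard data: structured, typed, numerical
    {"region": "EMEA", "quarter": "2010Q1", "revenue": 4.2},
    {"region": "EMEA", "quarter": "2010Q2", "revenue": 3.1},
    {"region": "APAC", "quarter": "2010Q2", "revenue": 5.0},
]

field_reports = [  # soft information: free text, tagged with a region
    {"region": "EMEA", "text": "Key distributor lost; competitor undercutting our prices"},
    {"region": "APAC", "text": "New channel partner driving strong demand"},
]

def search(docs, keyword):
    """Naive full-text search standing in for a real search engine."""
    return [d for d in docs if keyword.lower() in d["text"].lower()]

# "Why are sales plummeting in EMEA?": query the hard data, then join in the soft evidence.
emea_trend = [r["revenue"] for r in sales if r["region"] == "EMEA"]
evidence = [d["text"] for d in search(field_reports, "competitor") if d["region"] == "EMEA"]
print("EMEA revenue by quarter:", emea_trend)
print("Possible explanations:", evidence)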

So, what has this got to do with new software and hardware for databases?  Well, the key point is that database thinking has suddenly moved on from strict adherence to the relational paradigm.  The relational model is an extraordinarily structured view of data.  Relational algebra is a very precise tool for querying data.  You need to have a strong understanding of both to make valid queries, but do you really want your users to think that way?  Should you necessarily store the information physically in that model?  When you free yourself of these assumptions, you can begin to think in new ways.  Store the data in columns instead of rows?  Perfect!  A mix of row- and column-oriented data, and maybe some in memory only?  Yes, can do!  And then there's mixing searching (a soft information concept) with querying (a hard data thought) to create a hybrid result.  That's easy too!
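For anyone who hasn't met the column-oriented idea before, here is a deliberately tiny Python sketch of the two physical layouts.  It is not any vendor's engine, just an illustration of why an analytic scan favours columns while an operational lookup favours rows.

# Row-oriented: each record kept together, ideal for "fetch the whole order" lookups.
row_store = [
    {"order_id": 1, "customer": "Acme", "amount": 120.0, "country": "IT"},
    {"order_id": 2, "customer": "Beta", "amount": 75.5,  "country": "IE"},
    {"order_id": 3, "customer": "Acme", "amount": 310.0, "country": "IT"},
]

# Column-oriented: each attribute kept together, ideal for scanning one column of millions of rows.
column_store = {
    "order_id": [1, 2, 3],
    "customer": ["Acme", "Beta", "Acme"],
    "amount":   [120.0, 75.5, 310.0],
    "country":  ["IT", "IE", "IT"],
}

# Analytic query "total amount by country" touches only two columns in the column store;
# the row store has to drag every attribute of every record along with it.
totals = {}
for country, amount in zip(column_store["country"], column_store["amount"]):
    totals[country] = totals.get(country, 0.0) + amount
print(totals)  # {'IT': 430.0, 'IE': 75.5}

# Operational lookup "show me order 2" is a single, complete record in the row store.
print(next(r for r in row_store if r["order_id"] == 2))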

And on the edges of the field, there are even more fundamental questions being asked.  Do we always need consistency in our databases?  Can we do databases without going to disk for the data?  Could we do away with physically modeling the data and just let the computer look after it?  The answers to these questions and more like them are not what you might expect if you've been around the database world for 20 years.  And with those different answers, the overall architecture of your IT systems is suddenly open to dramatic change.

Believe me, the first businesses to adopt some of these approaches are going to gain some extraordinary competitive advantages.  Watch this space!

Posted April 8, 2010 9:58 AM
I'm presenting a two-day seminar for Technology Transfer in Rome in mid-April, entitled "BI2--From Business Intelligence to Enterprise IT Integration" and am currently researching and preparing the material.  And the more I research, the more excited I get about the prospects for the next wave of development from BI to... what?  Well, that's the real question for me!

It's my belief, and I've been writing and speaking about this for quite a while now, that the way we do BI today has reached its limits.  Business today demands information that is ever closer to real time and that is consistent and meaningfully integrated across ever wider scopes.  These demands simply cannot be satisfied by our current concept of a layered, triplicated (and more) data warehouse of hard information--largely numerical data arranged in neat tables--along with some soft information thrown in as an afterthought.  The only way forward that I can see is to begin to treat all business information as a conceptually single, integrated, modelled resource with minimal duplication of data.  I've described this business information resource (BIR) to a first approximation elsewhere, and my seminar will, among other things, dig deeper into the structure of the BIR and the technology needed to create and maintain it.

My current excitement stems from the growing reality of "hybrid" databases--combining the features and strengths of row-oriented and columnar relational databases.  Now, I know that academia proposed approaches to this as long as eight years ago, but it's only in the last year that commercial databases have begun introducing it.  I wrote about Vertica's FlexStore feature, introduced in 2009, in my last post.  The latest announcement I've found is of a technology preview program for Ingres VectorWise, the newest entrant in the hybrid database arena.  Add Oracle's Exadata V2, announced last year with typical modesty by Larry Ellison as the "fastest machine in the world for data warehousing, but now by far the fastest machine in the world for online transaction processing", and we can see that the approach is finally gaining market traction.

Why is this important?  Well, despite the hype, Larry hit the nail on the head.  If we finally have databases that can handle both operational and informational workloads equally well, we can begin to define an architecture that doesn't insist on copying vast quantities of data from one database to another.  That doesn't mean the death of the data warehouse any time soon, but it does mean that a much more integrated IT environment is coming your way.
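To picture how one database might serve both workloads without a wholesale copy of the data, here is a minimal Python sketch of the general pattern: recent writes land in a row-oriented buffer that suits transactions, and are periodically reorganised into a column-oriented layout that suits analytic scans.  The HybridTable class is entirely hypothetical and says nothing about how Vertica, VectorWise or Exadata actually implement the idea.

class HybridTable:
    def __init__(self, columns):
        self.columns = columns
        self.write_buffer = []                        # row-oriented, transaction-friendly
        self.column_store = {c: [] for c in columns}  # column-oriented, scan-friendly

    def insert(self, row):
        """Operational path: append a whole record, cheap and immediate."""
        self.write_buffer.append(row)

    def merge(self):
        """Background reorganisation: move buffered rows into the columnar layout."""
        for row in self.write_buffer:
            for c in self.columns:
                self.column_store[c].append(row[c])
        self.write_buffer.clear()

    def column(self, name):
        """Analytic path: a single consistent view spanning both layouts."""
        return self.column_store[name] + [row[name] for row in self.write_buffer]

t = HybridTable(["product", "qty"])
t.insert({"product": "widget", "qty": 3})
t.insert({"product": "gadget", "qty": 7})
print(sum(t.column("qty")))  # analytics sees fresh operational data: 10
t.merge()                    # later, the same rows are reorganised for scan performance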

Posted March 18, 2010 10:12 AM