Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation published by Addison-Wesley in 1997.

Over the past few years, Barry has extended his interest to cover the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT.

Barry has worked in the IT industry for more than 25 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog.

Recently in Business Integrated Insight Category

Autumn has arrived here in Cape Town and with it a slight chill in the air.  In a couple of weeks, I head up to Europe where spring is warming the land.  It's a time of transitions.  And so it is too for the IT industry and BI in particular.

I'm presenting three seminars on the transformation of BI into Enterprise IT Integration in Europe: a half day each in Copenhagen and Helsinki (4 and 5 April) and a full two-day deep dive in Rome (11-12 April).  I started teaching these classes in early 2010, and as I review and update the contents, I'm struck by two impressions.  First, a lot has changed in the BI / Data Warehousing market in the past year.  Second, the new architecture, BI2, that I've been developing since 2009 continues to closely map the emerging technology. That's hugely gratifying!

So, what are the big changes?  One is the exploding interest in "Big Data".  Now, as I've discussed elsewhere, the name is a total misnomer--the stuff is neither necessarily big nor data in the strict meaning of the words--but, the interesting thing is how the concepts and technologies around this area have energized thinking about BI.  Pre-defined reports have become even less interesting in the face of innovative data mining and predictive analytics in the big data environment.  And the information we can use to gain analytic value has expanded in variety and volumes beyond our wildest dreams.

Another important change has been the acquisitions of innovative start-ups, particularly in the analytic appliances field by both BI and other-than-BI giants--Netezza by IBM, Vertica by HP and Sybase by SAP to name a few. The acquisitions highlight the importance of these emerging technologies in a sea change in the way BI will be and increasingly is being delivered.

The third change is at the hardware level.  While much more is going on, two aspects are important to BI: the increasing number of cores in commodity processors and the move towards silicon-based data storage, whether on solid state disks or in memory.  Both of these trends suggest that internal database design approaches established since the 1970s are due for a big shake-up.

None of these trends is particularly new.  But their speed-up and convergence in the past year or so are rapidly pushing the BI environment to a tipping point--the architecture we've successfully used for over twenty years is rapidly being made irrelevant!  That architecture, dating from a 1988 paper that I co-authored, was based on business needs and technological solutions that were appropriate for their time.  Today's bus-tech ecosystem is so different that I have been arguing for a number of years that we need to look beyond our current silos of operational systems, BI and office/collaborative systems to a new architecture that crosses all of IT.  This architecture, which I call Business Integrated Insight, proposes that we treat the entire information asset of the business as a single, logical entity--the Business Information Resource (BIR).

Such a change of focus has widespread implications for the entire IT support infrastructure.  The challenges are big, but the technology as it is emerging is enabling this change.  My European seminars describe the new architecture in depth but, more importantly, link what needs to be done to current technologies and provide practical guidance on how to begin the transformation.

Posted March 15, 2011 9:15 AM
Yesterday's Information Management webinar, "Information Overload? 3 ways real-time information is changing decision management," hosted by Eric Kavanagh, got me into a philosophical mood.  Robin Bloor of the Bloor Group and David Olson of Progress Software provided a fascinating overview of the development of the complex event processing (CEP) market and its increasing importance for business competitiveness and, perhaps, even survival.

Robin's message was twofold.  From a business viewpoint, decision making is moving from reactive to predictive, driven by competition in business to best understand upcoming market opportunities right to the edge of what can be foreseen.  In technology, the exponential growth in speed of processors and, indeed, the majority of the hardware infrastructure is driving or enabling a shift in application architectures from batch orientation, through transaction processing and into real-time event handling.  David provided some interesting examples of how CEP is being used by his customers in manufacturing, logistics and airlines to monitor business events in real time, to react earlier to changing circumstances and to drive process improvement.  Their bottom line: CEP is a major architectural transition that is rapidly becoming mainstream; if you're not on board, you risk severe competitive disadvantage.

From one point of view, the message makes sense.  It's yet another twist of the screw towards speedier decision making.  Operational BI promotes the use of near real-time data either copied into the warehouse or accessed in transaction-processing systems via federated query.  CEP goes one step further and says let's access and analyze the data as it flows through the network; we need to make decisions before we land the data on a disk, if we even land it at all.  In the financial markets, with the data volumes and reaction speeds involved, once the technology became available the approach seemed like a no-brainer.  In financial systems, CEP enables high-value applications such as fraud detection in credit card transactions.
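
To make the idea of deciding "as the data flows" a little more concrete, here is a minimal sketch in Python.  It is not how any CEP product works; it assumes a hypothetical stream of (timestamp, card id) events arriving in time order, and an invented rule that more than three transactions on one card within 60 seconds is suspicious.  The function name, the thresholds and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def detect_bursts(events, window_seconds=60, max_events=3):
    """Yield (card_id, timestamp) whenever a card has more than max_events
    transactions inside a sliding window of window_seconds.

    events: iterable of (timestamp_seconds, card_id), assumed to be in time order.
    Only the timestamps still inside the window are held in memory;
    nothing is landed on disk.
    """
    recent = defaultdict(deque)  # card_id -> timestamps still inside the window
    for ts, card_id in events:
        window = recent[card_id]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_events:
            yield card_id, ts


# Invented sample stream: card-A makes four transactions within 30 seconds.
stream = [
    (0, "card-A"), (5, "card-B"), (10, "card-A"),
    (20, "card-A"), (30, "card-A"), (200, "card-A"),
]
alerts = list(detect_bursts(stream))
```

The decision is made on each event as it passes, using only a small rolling state per card.  An operational BI approach would typically land the events first and query them afterwards.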

However, looking at some of the applications presented in the webinar, operational BI has also been used effectively to solve similar business issues.  The boundary between the more traditional operational BI approach and CEP depends on the required speed of decision making and the volumes of events involved.  CEP certainly extends the high end of pattern recognition and trend detection to higher speeds and volumes.  And in the middle range, it provides another set of implementation options alongside operational BI.

So, what were my philosophical musings?  Robin presented a very interesting scale of human decision-making timescales, from months and years at one extreme to one tenth of a second at the other.  That latter number is the fastest human reaction time and, by the way, slower than a cobra's strike!  CEP and, to some extent, operational BI operate in the range of decision speeds faster than one tenth of a second: that is, entirely beneath human radar.  While it is clear that some decisions--collision avoidance on the highway, for example--naturally fall in this timescale, my concern is the implications that arise from pushing more and more decisions into this realm and, by definition, beyond human oversight.  We've already seen the consequences of this approach in the financial markets, where computer-based trading has driven wild, unpredictable and potentially dangerous swings in the markets.  Decision-making algorithms are only as good as the assumptions that have been encoded in them, which depend, in turn, on the knowledge available and the business requirements--both explicit and implicit--when they were created.  Is it really sensible to design systems that unnecessarily exclude human wisdom?

The current business mindset that competitiveness is next to godliness is, in many cases, driving decision making into tighter and tighter circles, removing wisdom, insight, intuition and basic humanity from the loop.  Are we prepared to learn any lessons from the recent financial market fluctuations?  And is it wise to arbitrarily remove more and more important decision making from human oversight just because the technology is available?  Just asking...

Posted August 27, 2010 7:05 AM
Wow - what a saga!  I published the first article of the series "From Business Intelligence to Enterprise IT Architecture" in March and part 5 has just appeared, part 6 is under construction and I can see at least another couple to come.  So, what's it all about?

Business Integrated Insight* (BI2 - BI to the power of 2) is an architectural effort I've committed to undertake in order to extend and update the long-serving data warehouse / business intelligence architecture. Why?  It's very clear to me that the business requirements and expectations for business intelligence have expanded dramatically over the past ten years.  And that expansion has been into areas more usually thought of as operational applications and collaborative systems.  Both of these areas have also encroached into BI.  So, requirement boundaries have blurred significantly, and an architecture that was defined in the 1980s based on the then current definition of decision support could clearly do with a substantial overhaul.

One approach could be to try to address the changed needs from within the boundaries of data warehousing, as Bill Inmon, for example, has done with DW 2.0™.  However, in my opinion, this effort has too narrow a focus.  Today's business users and decision-makers come from an entirely different generation to those that were the target audience of the original data warehouse efforts in the 1980s (Devlin & Murphy, 1988).  Modern business users are tech savvy, internet-aware, social networking animals.  Technology boundaries such as operational / informational / collaborative are simply not their scene.  My belief is that any new architecture for business intelligence must, by definition, extend over the entire IT infrastructure for business.

Is this a tall order?  Well, it's certainly ambitious.  But so was data warehousing way back in the mid-1980s when relational databases were still young.  And the technology available today has advanced by leaps and bounds since then, especially in the last few years.  Take a look at columnar / parallel databases such as Netezza, Vertica, ParAccel, Aster Data, Infobright and more, not to mention Oracle's Exadata V2.  While the hype is about query speed and large data volumes, the potential is surely for a paradigm shift (there! I've said it) in data storage architectures.  The same drive is evident in the mushrooming interest in unstructured information, which I prefer to call soft information.  I wrote recently about Attivio, who are building a bridge between the worlds of unstructured and structured information.  From beyond the database world, SOA-like and Web 2.0 technologies are also fundamentally changing the way the old operational and collaborative environments are structured.

These technologies and tools, and more, will enable the leap to BI2 over the coming few years.  And if you know of particularly innovative solutions that are coming to market, I'd love to hear about them!

*For the record, I coined the term Business Integrated Insight and drew the first architectural diagram in an August 2009 white paper sponsored by Teradata.


Posted August 5, 2010 12:16 PM
Synchronicity is a wonderful thing! I get yet another follower notice from Twitter today, and for the first time in ages I am curious enough to check the profile. It turns out that @LaurelEarhart is marketing director for the Smart Content Conference, among other things, including Biz Dev Maven! And there, I read "Perfect storm: #Google acquired #Metaweb" announced on July 16. Having just done a webinar with Attivio yesterday on the topic "Beyond the Data Warehouse: A Unified Information Store for Data and Content" my interest was piqued. Let me tell you why.

I suspect that very few data warehouse vendors or developers have paid much attention to Metaweb or its acquisition. As far as I can tell, it hasn't turned up on the data warehouse or BI analyst blogs either. Perhaps the reason is that Metaweb's business is in providing a semantic data storage infrastructure for the web, and Freebase, an "open, shared database of the world's knowledge". For data warehouse geeks, the former is probably a bit off-message, while the latter may sound like Wikipedia, although the mention of a shared database may raise the interest level slightly.

But, if you're thinking about what lies beyond data warehousing (as I am), and wondering how on earth we're ever going to truly integrate relevant content with the data in our warehouses, what Metaweb and now Google are doing should be of some interest. Here's a quote from Jack Menzel, director of product management at Google on his blog:

"Type [barack obama birthday] in the search box and see the answer right at the top of the page. Or search for [events in San Jose] and see a list of specific events and dates. We can offer this kind of experience because we understand facts about real people and real events out in the world. But what about [colleges on the west coast with tuition under $30,000] or [actors over 40 who have won at least one oscar]? These are hard questions, and we've acquired Metaweb because we believe working together we'll be able to provide better answers."

For me, the interesting point here is the inclusion in the hard questions of conditions that would make sense to even the most inexperienced BI user. Take either of these two hard questions and you can easily imagine the SQL statements required, provided you defined and populated the right columns in your tables. The problem is that you need to have defined the columns and tables in advance of somebody asking the questions.
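
As a rough illustration of the first hard question, here is a minimal sketch using Python's built-in sqlite3 module.  The table name, the column names (region, tuition_usd) and the three rows are all invented for the example; the point is only that the query is trivial once somebody has decided, ahead of time, that those columns should exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: region and tuition_usd must be defined and populated
# before anyone can ask the question.
conn.execute(
    "CREATE TABLE college (name TEXT, state TEXT, region TEXT, tuition_usd INTEGER)"
)
conn.executemany(
    "INSERT INTO college VALUES (?, ?, ?, ?)",
    [
        # Invented sample rows, not real institutions or fees.
        ("Example State University", "CA", "west coast", 12000),
        ("Sample Private College", "OR", "west coast", 45000),
        ("Demo Institute", "NY", "east coast", 20000),
    ],
)

# [colleges on the west coast with tuition under $30,000]
rows = conn.execute(
    "SELECT name FROM college "
    "WHERE region = 'west coast' AND tuition_usd < 30000 "
    "ORDER BY name"
).fetchall()
names = [name for (name,) in rows]
```

Had nobody modeled a region or tuition column, the same question could not be asked without first changing the schema and reloading the data.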

What Metaweb on the Internet and Attivio on the intranet (and, of course, other vendors in both areas) are trying to do is to bridge the gap between data and content, so that users can ask mixed search and BI queries based on the implicit understanding that exists in the data/content stores of the semantics of the information. And, perhaps more importantly, to be able to do that in a fully ad hoc manner that doesn't require prior definition of a data model and its instantiation in columns and tables of a relational database. If you want to dig deeper, I invite you to take a look at my recent white paper.
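
One generic way to picture querying without a predefined data model is to hold facts as (subject, property, value) triples.  The Python sketch below is only that: a picture.  It is not a description of how Metaweb, Freebase or Attivio store or query information, and the helper names (subjects_where, ask) and all the sample facts are invented for illustration.

```python
# Facts as (subject, property, value) triples; invented sample data.
facts = [
    ("person-1", "type", "actor"),
    ("person-1", "age", 52),
    ("person-1", "won", "oscar"),
    ("person-2", "type", "actor"),
    ("person-2", "age", 35),
    ("person-2", "won", "oscar"),
    ("person-3", "type", "actor"),
    ("person-3", "age", 61),
]


def subjects_where(facts, prop, predicate):
    """Return the subjects that have at least one value of prop
    for which predicate(value) is true."""
    return {s for s, p, v in facts if p == prop and predicate(v)}


def ask(facts, *conditions):
    """Return the subjects that satisfy every (prop, predicate) condition."""
    matches = [subjects_where(facts, prop, pred) for prop, pred in conditions]
    return set.intersection(*matches) if matches else set()


# [actors over 40 who have won at least one oscar]
winners_over_40 = ask(
    facts,
    ("type", lambda v: v == "actor"),
    ("age", lambda v: v > 40),
    ("won", lambda v: v == "oscar"),
)

# A new kind of fact needs no schema change, only another triple.
facts.append(("person-3", "nominated_for", "oscar"))
nominated = ask(facts, ("nominated_for", lambda v: v == "oscar"))
```

The trade-off is that nothing here enforces types or consistency, which is exactly the work a relational schema does up front.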

In the meantime, my thanks to @LaurelEarhart and the wonder of synchronicity.

Posted July 22, 2010 3:39 AM
Any acquisition in the database market, in this case, the July 6 announcement of EMC's plan to acquire Greenplum, generates a flurry of analyst activity speculating about the financial or technical rationale for the acquisition, winners and losers among other database vendors and the effect of the move on customers' buying patterns.  Personally, I find these opinions very interesting and highly informative.  And I invite you to check out, for example, Curt Monash or Merv Adrian to explore these aspects of the acquisition.

However, I'd like to take the opportunity to focus our minds once again on a more fundamental question: how is IT going to manage data quality and reliability in a rapidly expanding data environment, both in terms of data volumes and places to store the data?  I'm currently describing a logical enterprise architecture, Business Integrated Insight (BI2), that focuses on this.

So, for me, what the acquisition emphasizes, like that of Sybase by SAP, is that specialized databases, with their sophisticated features and functions, are rapidly entering the mainstream of database usage.  Their ability to handle large data volumes with vast improvements in query performance has become increasingly valuable in a wide range of industries that want to analyze enormous quantities of very detailed data at relatively low cost.  How to do this?  Vendors of these systems typically have a simple answer: copy all the required data into our machine and away you go!

My concern is that IT ends up with yet another copy of the corporate data, and a very large copy at that, that must be kept current in meaning, structure and content on an ongoing basis.  Any slippage in maintaining one or more of these characteristics leads inevitably to data quality problems and eventually to erroneous decisions.  Such issues typically emerge unexpectedly, in time-constrained or high-risk situations and lead to expensive and highly visible firefighting actions by IT.  Unfortunately, such occurrences are common in BI environments, but typically relate to unmanaged spreadsheets or relatively small data marts.  We have just jumped the problem size up by a couple of orders of magnitude.
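
To show what "kept current in structure and content" might mean in practice, here is a minimal Python sketch that compares a source table with a copy of it.  It uses SQLite purely for convenience, and the function names, the sales table and its rows are assumptions made for the example.  Hashing every row is an illustration only and would not be practical at the data volumes discussed here.  Drift in meaning cannot be detected this way at all.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table):
    """Return (columns, content_hash) for a table.

    columns captures structure; content_hash captures content.
    The table name is interpolated into the SQL, so pass trusted names only.
    """
    columns = [
        (row[1], row[2])  # (column name, declared type)
        for row in conn.execute(f"PRAGMA table_info({table})")
    ]
    # Order by every column position so that load order does not matter.
    order_by = ", ".join(str(i + 1) for i in range(len(columns)))
    digest = hashlib.sha256()
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}"):
        digest.update(repr(row).encode("utf-8"))
    return columns, digest.hexdigest()


def compare_copy(source, copy, table):
    """Report whether a copy still matches its source in structure and content."""
    src_cols, src_hash = table_fingerprint(source, table)
    cpy_cols, cpy_hash = table_fingerprint(copy, table)
    return {
        "structure_matches": src_cols == cpy_cols,
        "content_matches": src_hash == cpy_hash,
    }


def make_db(rows):
    """Build a hypothetical in-memory sales table for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


source = make_db([(1, 10.0), (2, 25.5)])
fresh_copy = make_db([(2, 25.5), (1, 10.0)])  # same rows, different load order
stale_copy = make_db([(1, 10.0)])             # missed an update

fresh = compare_copy(source, fresh_copy, "sales")
stale = compare_copy(source, stale_copy, "sales")
```

Every additional copy of the data adds another comparison of this kind that somebody has to run, interpret and act upon.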

So, am I suggesting that you shouldn't be using these specialized databases?  Would I recommend that you stand in front of a speeding freight train?  Clearly not!

There are two ways that these problems will be addressed.  One falls upon customer IT departments, while the other comes back to the database industry and the vendors, whether acquiring or acquired.  These paths will need to be followed in parallel.

IT departments need to define and adopt stringent "data copy minimization" policies.  The purist in me would like to say "elimination" rather than "minimization".  However, that's clearly impossible.  Minimization of data copies, in the real world, requires IT to evaluate the risks of yet another copy of data, the possibility of using an existing set of data for the new requirement and, if a new copy of the data is absolutely needed, whether existing analytic solutions could be migrated to this new copy of data and the existing data copies eliminated.

Meanwhile, it is incumbent upon the database industry to take a step back and look at the broader picture of data management needs in the context of emerging technologies and the explosive growth in data volumes.  The basic question that needs to be asked is: how can the enormous power and speed of these emerging technologies be crafted into solutions that equally support divergent data use cases on a single copy of data?  And, if not on a single copy, how can multiple copies of data be managed to complete consistency invisibly within the database technology?

Tough questions, perhaps, but ones that the acquirers in this industry, with their deep pockets, need to invest in.  As the database market re-converges, the vendors that solve this architectural conundrum will become the market leaders in highly consistent, pervasive and minimally duplicated data that enables IT to focus on solving real business needs rather than managing data quality.  Wouldn't that be wonderful?

Posted July 7, 2010 1:18 PM