Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today extends to the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry's latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

October 2011 Archives

In the Information part of Information Technology, Big Data is the Big Hit of 2011.  It's also a wonderful phrase to play with: take the "big", place it in front of a few other words and suddenly you have a strapline... or a blog title!  So, is it a big change for IT, or is it just big hype?

There's no doubt in my mind that big data describes a real and novel phenomenon; unfortunately, there are also many existing and well-understood phenomena in the world of business intelligence and data warehousing that are getting sucked into marketing stories and, indeed, even into respectable articles about big data.


The recent McKinsey Quarterly article "Are you ready for the era of 'big data'?" (registration required) opens with the following example: "The top marketing executive at a sizable US retailer recently [discovered that a] major competitor was steadily gaining market share across a range of profitable segments...  [This] competitor had made massive investments in its ability to collect, integrate, and analyze data from each store and every sales unit and had used this ability to run myriad real-world experiments.  At the same time, it had linked this information to suppliers' databases, making it possible to adjust prices in real time, to reorder hot-selling items automatically, and to shift items from store to store easily.  By constantly testing, bundling, synthesizing, and making information instantly available across the organization... the rival company had become a different, far nimbler type of business.  What this executive team had witnessed first hand was the game-changing effects of big data" [my emphasis].

With all due respect to the authors, I believe that anybody who has been involved in business intelligence over the past ten years will be underwhelmed by this story.  It describes, almost entirely, a common scenario: a pervasive data warehousing implementation combined with operational BI excellence.  I suspect this example was tagged as big data because of the reference to running myriad real-world experiments.  That is a behavior often associated with big data; on its own, however, it is generally not a sufficient characteristic.

The remainder of the article provides many interesting examples and possible consequences, both beneficial and cautionary, of using big data.  For the business executive, it clearly whets the appetite.  But, from an IT perspective, it misses a key aspect--a viable definition of what big data really is.  This is hardly surprising; big data has reached the point on the hype curve where definitions are considered unnecessary.  We all seem to have an assumed definition that neatly meets our needs, be it selling a product or initiating a project.  Hear me clearly, though.  Despite the hype, there is something real going on here.  And it's fundamentally about the underlying characteristics of the information involved; characteristics that differ significantly from the data we in IT have stored and used over the years.

I contend that there are four types of information that together make up big data:
1.    Machine-generated data, such as RFID data, physical measurements and geolocation data, from monitoring devices
2.    Computer log data, such as clickstreams
3.    Textual social media information from sources such as Twitter and Facebook
4.    Multimedia social and other information from the likes of Flickr and YouTube

They are as different from traditional transactional data (the mainstay of BI) as they are from one another.  They have little in common, beyond their volume.  How business extracts value from them and how IT processes them vary widely.
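To make the contrast concrete, here's a minimal Python sketch--the device names, sample readings and the tweet are all invented for illustration, not drawn from any real system.  A handful of machine-generated readings yields value through simple numeric aggregation, while a single piece of social media text must be tokenized before it can be analyzed at all.

```python
# A hedged illustration: two of the four "big data" types demand very
# different processing, even at this toy scale.

from collections import Counter
import re

# Type 1: machine-generated data from monitoring devices.
#         Value comes from straightforward numeric aggregation.
sensor_readings = [
    {"device": "rfid-gate-07", "metric": "temp_c", "value": 21.4},
    {"device": "rfid-gate-07", "metric": "temp_c", "value": 22.1},
    {"device": "truck-gps-12", "metric": "speed_kmh", "value": 87.0},
]
by_metric = {}
for r in sensor_readings:
    by_metric.setdefault(r["metric"], []).append(r["value"])
averages = {m: sum(v) / len(v) for m, v in by_metric.items()}
print(averages)  # {'temp_c': 21.75, 'speed_kmh': 87.0}

# Type 3: textual social media. There are no numeric fields at all;
#         even a crude analysis means tokenizing free text first.
tweet = "Queued 40 minutes at @BigRetailCo again. Never shopping there!"
tokens = re.findall(r"[a-z@#']+", tweet.lower())
print(Counter(tokens).most_common(3))
```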

While closely related to traditional BI and data warehousing, big data projects require additional and often very different skills in business and IT.  Their value is first to drive innovative change in business processes; only afterwards can their use become ongoing and operational.  These are topics I'll return to in the coming months.  But, in the meantime, join me for my webinar "Big Data Drives Tomorrow's Business Intelligence" on 25th October for further insights into this rapidly evolving area.



Posted October 21, 2011 11:03 AM
I believe it was Forrester Research that coined the phrase "Unified Information Access (UIA)" back in a 2008 publication as the convergence of search and BI.  It was picked up by a few startups, among them Attivio and Coveo, and largely ignored by the bigger players... until recently, that is.  HP's move into the information management space with acquisitions of first Vertica and then Autonomy signaled the first interest by a major player.  Oracle snapping up Endeca marks the opening of the second phase of the campaign: a leader in the hard information (structured data) relational database market takes the plunge.

Am I surprised?  Not at all.  With the hype around big data and the conflation (partially erroneous, in my view) of big data and soft (unstructured) information, it was obvious that the big relational database vendors would need a beachhead.  Oracle has landed.  Further speculation has already begun as to the location of the second front.  IBM targeting Coveo?  Teradata aiming at MarkLogic?  Microsoft and Attivio (the founders of Attivio previously sold FAST to Microsoft in 2008)?  The guessing game is fun, but my concerns about this war run far deeper.

Leaving aside the other aspects of big data for now, let's focus on soft information, and in particular on the textual subset, including information that can be easily converted to text, such as audio and document scans.  While leaving out image and video reduces volumes dramatically, text is a large and highly valuable segment of the overall information market.  In the past, this segment has stood largely separate from database management under the umbrella of enterprise content management.  What Forrester named in 2008 (and it was already becoming evident even then) was that business users neither know nor care about some division between content and data, between search and BI.  They simply want to find and benefit from the burgeoning wealth of digital information that is being stored in computers everywhere.  I wrote in a 2010 white paper that I see "data and content as two ends of a continuum of the same business information asset... [with a] depth of integration required for full business value."

The problem I see in this likely battle for acquisitions by database vendors is as follows:  UIA is a comparatively young technology coming largely from the content management space, with "unstructured" search bridging over into the BI query world.  This direction of innovation flow makes sense; soft information is more complex and extensive than relational data, and for business users more "natural" and easier to understand.  In simple terms, the concept of search must be enhanced with query rather than query extended to search.  Hard data flows from soft information, both conceptually and in its implementation.  Can you imagine converting a relational database in its entirety into an engaging novel?
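To illustrate what I mean by search enhanced with query, here's a deliberately tiny Python sketch--the documents, field names and function are my own invention, not any vendor's API.  Free text is the entry point; structured facets then narrow the results, rather than the other way around.

```python
# A minimal sketch of "search enhanced with query": keyword search over
# soft information first, then optional structured (hard) filters.

documents = [
    {"id": 1, "text": "Warranty claim: gearbox failure on model X200",
     "region": "EMEA", "year": 2011},
    {"id": 2, "text": "Customer praises X200 fuel economy",
     "region": "EMEA", "year": 2010},
    {"id": 3, "text": "Gearbox recall notice drafted for dealers",
     "region": "AMER", "year": 2011},
]

def search(terms, **facets):
    """Free-text search is the entry point; facets refine the hits."""
    words = [t.lower() for t in terms.split()]
    hits = [d for d in documents
            if all(w in d["text"].lower() for w in words)]
    for field, value in facets.items():
        hits = [d for d in hits if d.get(field) == value]
    return hits

# Start from soft information, then refine with hard attributes.
print(search("gearbox"))                            # documents 1 and 3
print(search("gearbox", region="EMEA", year=2011))  # document 1 only
```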

The risk is that large database vendors will try to shoehorn search into their existing query-centric view of the world; that the innovative solutions we need to gain real value from the explosion of soft information will be stifled.  There are some small startups (such as NeutrinoBI) that come from the hard data space with an understanding of the primacy of search as an entry point into analytics, but in my experience, the larger players either focus solely on hard data or split hard and soft information management into very separate organizational silos.

In the Oracle acquisition, I expect that the likely BI outcome is the positioning of Endeca's Latitude component behind OBIEE.  My challenge to Larry (if he is listening!) is to conceptually put the two components the other way around and see what it offers to business users.


Posted October 19, 2011 2:37 AM
Having looked at timeliness in Part 1, let's turn our attention to consistency.

Early proponents of data warehousing, including myself, majored on the role of the Enterprise Data Warehouse (EDW) as a repository of a consistent, integrated and historical view of the business.  Leaving aside the historical aspect for now, the desire for consistency and integration can be traced directly to one of the main concerns of decision makers in the 1980s.  There existed a growing proliferation of applications--operational systems--that were responsible for running the business.  These systems were being introduced in an ad hoc manner throughout the business, often on different platforms and addressing different but overlapping aspects of the same process.  

In a bank, for example, a mainframe-based application running against an IMS database handled checking accounts.  A new relational database system running on a minicomputer was introduced to handle savings accounts.  The difficulty for decision makers was to understand the combined account position for individual customers.  The need, stated in a nutshell, was for a "single version of the truth".
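A toy Python sketch makes the integration problem concrete--the system names, customer keys and balances are invented, and the key matching is wildly simplified compared with what real ETL work had to do:

```python
# Hypothetical sketch: the same customer appears in two operational
# systems, and the decision maker wants one combined position.

checking_extract = [   # from the mainframe / IMS checking system
    {"cust_no": "C-1001", "name": "J. MURPHY",  "balance": 1250.00},
    {"cust_no": "C-1002", "name": "A. O'BRIEN", "balance": -340.50},
]
savings_extract = [    # from the minicomputer relational savings system
    {"customer_id": "1001", "surname": "Murphy", "balance": 8000.00},
]

def consolidate(checking, savings):
    """Match customers across systems and sum their balances.
    Real EDWs needed enterprise models and ETL rules for this matching;
    here the key transformation ('C-1001' -> '1001') is simply assumed."""
    position = {}
    for row in checking:
        key = row["cust_no"].removeprefix("C-")
        position[key] = position.get(key, 0.0) + row["balance"]
    for row in savings:
        key = row["customer_id"]
        position[key] = position.get(key, 0.0) + row["balance"]
    return position

print(consolidate(checking_extract, savings_extract))
# {'1001': 9250.0, '1002': -340.5}
```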

This divergence of sources, the often poor data quality in individual operational systems, and the need for a single truth led EDW designers and developers to focus almost maniacally on how to achieve consistency and integration of information in the warehouse.  Enterprise modeling, ETL tools and intricate, often lengthy projects were all used in service of this goal.

Today, we need to pose two important questions.  First, is there really a single version of the truth that can be created and stored in the EDW?  Second, do we have the time and the money to create it?

On the first question, I feel that we have become blinded by our unswerving belief in a universal truth.  Yes, there do exist "truths" in the business that need to be universally agreed.  The quarterly figures announced to the stock markets absolutely need to be internally consistent and well-integrated.  The underlying numbers that lead to these results are similarly constrained.  But, it is equally clear that some numbers can exist as best estimates, close approximations or even "swag" (some wild-assed guess!).  As a culture, we have become obsessed with the second or third decimal point on many numbers.  How many times have you heard election polls being reported with candidates separated by half a percentage point, while the 2% margin of error on the poll is hidden in a footnote?
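The poll example is worth a moment's arithmetic.  With illustrative numbers (the sample size and candidate shares below are my own), the margin of error alone dwarfs the reported gap:

```python
# Back-of-the-envelope arithmetic with invented poll figures: a half-point
# gap between candidates is meaningless next to a ~2% margin of error.

from math import sqrt

n = 2400                   # respondents
p_a, p_b = 0.455, 0.450    # reported candidate shares

margin = 1.96 * sqrt(0.5 * 0.5 / n)   # worst-case margin of error at 95%
gap = p_a - p_b

print(f"margin of error: +/- {margin:.1%}")   # roughly +/- 2%
print(f"reported gap:        {gap:.1%}")      # 0.5% -- well inside the noise
```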

Answering the first question as we just have leads easily to an answer to the second.  We need to divert resources from seeking complete consistency to achieving consistency where it matters and timeliness where that is important.  And, more than that, we need to get the best return on investment in both areas--timeliness and consistency.  The real business value in some data lies in its early availability to decision makers; the value in other data resides in its consistency and integrity.

Distinguishing between the two is the key to success.

Join me for my upcoming webinar, "Business Intelligence: the Quicker, the Better", on October 25th for further insights into this important issue.

And for my European readers, allow me to remind you that Larissa Moss is presenting a two-day seminar in Rome on October 20-21st, entitled "Agile Approach to Data Warehousing & Business Intelligence" which will also show how to address this dilemma.


Posted October 18, 2011 8:46 AM
Over the past few years, we've heard increasingly that business needs ever more up-to-the-minute information to compete effectively.  First mover advantage, sometimes equated with technological leadership, is frequently cited as a driving force for business success, especially in new or emerging market segments, where the first entrant can gain early control of resources that followers miss.  In BI, this thinking drives a number of distinct patterns.

The first pattern, which has been around for some time now, is operational, or near real-time, BI.  The premise is that ever earlier identification of patterns and trends in customer behavior or needs allows the business to respond faster and thus gain competitive advantage.  A second, more recent, example is seen in big data, where advanced data mining and analytic techniques are applied to clickstream and other web-sourced information on an ongoing basis, for purposes similar to those of operational BI.  The data sources and tools are different, but the driving thought is the same.

Such thinking, as well as exciting success stories or interesting discoveries, has led many people to the conclusion that the business value of BI is centered on timeliness.  We see this in BI tools and appliances that offer ever faster load, access and query speeds.  We observe it in the self-service argument--who has the time to wait for IT to deliver?  We can hear it in arguments for agile BI, whether in development or in use.  

The focus on timeliness and the value it can deliver is very fair, of course, especially given the history of data warehousing development, where extended delivery times have been all too common.  It is particularly appealing to unfortunate business users who sit in front of BI screens waiting for the paint to dry while their query plods on in some treacle-like database.  However, that is only half the story.

Let me pose you a question.   Which would your business users prefer:
1.    The wrong answer immediately
2.    The right answer too late, or
3.    A "sufficiently correct" answer "soon enough" to affect the outcome of decision-making?
I suspect most will opt for answer number 3.

This search for a sufficiently right and timely enough answer leads directly to the other central driver of BI--consistency.  Once upon a time, in the early days of data warehousing, consistency was the primary goal for the data warehouse.  Its ultimate expression is in the now well-worn phrase, "a single version of the truth".  And it's to that phrase I'll turn in part 2 of this post.

And check out my upcoming webinar, "Business Intelligence: the Quicker, the Better", on October 25th, which will also address this issue.
 
 

Posted October 12, 2011 8:21 AM
"Oracle Exalytics enables organizations to make decisions faster... through the introduction of interactive visualization capabilities that make every user an analyst" from Oracle's Exalytics press release of 3 October.

"The ultimate challenge... is putting enough useful Big Data capabilities into the hands of the largest number of workers. The organizations that figure out this part will reap corresponding rewards." Dion Hinchcliffe's recent post "The enterprise opportunity of Big Data: Closing the 'clue gap'".

Sorry to sound like Ebenezer Scrooge of Dickens' "A Christmas Carol", but... Bah, humbug!

Some business users have been doing analytics for years... in Excel.  Do we consider that an Information Management success story?  Have the business benefits far outweighed the costs in terms of users' time, IT's efforts in trying to provide data or, indeed, the numerous spreadsheet-induced business mistakes and mis-stated statutory reports?  In a word, no!

So, what do you think?  Will providing growing volumes of increasingly diverse data through ever more sophisticated and speedy statistical analysis tools make this situation better or worse?  Furthermore, does every user want or need to be a statistician?

I believe that we are in danger of being caught on a hype wave here.  Extreme analytics and big data certainly have an important role to play in modern business.  But that role lies in exploring new opportunities for, and threats to, the business, and in innovating around them.  For many managers, regular reports and the ability to drill down into exceptions and outliers are as much as they need.  In other words, traditional BI.  For much of the business, the focus is on the minutiae and the mundane.  For daily decisions--and such decisions are the heartbeat of the business--the information required and the implications of the vast majority of possible circumstances are already largely known.  Big data and extreme analytics are unnecessary.  What is required is faster access to current transaction data or easier access to background content.

We've known about this fork in BI for many years.  It's the difference between tactical/strategic and operational BI.  And while analytics and big data are getting the publicity, much is going on to restructure and re-architect the foundations of traditional BI.  One of these advances is data virtualization.

The emergence of big data has, of course, made data virtualization a mandatory technology for BI.  Given the volumes of data involved, it makes less and less sense to duplicate data on the scale we do today.  And reduced duplication means that remote access, federation, EII or whatever term you like becomes a key component of any modern BI architecture.  I'll be discussing this today at the kickoff webinar of Denodo's Data Virtualization World Series, also available on demand from B-eye-Network.
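For the curious, here's a much-simplified Python sketch of the federation idea--the "sources" are just in-memory stand-ins, not any particular virtualization product.  The consumer sees one logical view, while the data stays where it lives and is joined at query time:

```python
# Federation in miniature: combine consolidated warehouse history with
# live operational data at query time, instead of copying it into the EDW.

warehouse_history = [          # consolidated, slower-changing EDW data
    {"cust": "1001", "sales_ytd": 14200.0},
    {"cust": "1002", "sales_ytd": 9100.0},
]

def fetch_todays_orders(cust):
    """Stand-in for a live call to the operational order system."""
    live_orders = {"1001": [320.0, 45.5]}
    return live_orders.get(cust, [])

def virtual_customer_view():
    """One logical view; no data is duplicated into the warehouse."""
    for row in warehouse_history:
        today = sum(fetch_todays_orders(row["cust"]))
        yield {"cust": row["cust"],
               "sales_ytd": row["sales_ytd"],
               "sales_today": today,
               "sales_total": row["sales_ytd"] + today}

for record in virtual_customer_view():
    print(record)
```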

So while we're dreaming dreams of extreme analytics and big data in Christmas Future, let us also keep our eyes firmly fixed on Christmas Present and how we meet the current needs of the majority of ordinary business users.


Posted October 5, 2011 6:54 AM