Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today covers the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

June 2011 Archives

I was one of the speakers at the International Data Warehousing and Business Intelligence Summit 2011: New and Emerging Technologies in Rome last week. A host of excellent speakers, including Colin White, Claudia Imhoff, Cindi Howson and James Taylor, not to mention my modest self, covered a wide range of topics that are shaping the future outlook for BI. The event has been running for many years, and this year an additional, related event in Rome has been added to the schedule: the International Data Management and Data Warehousing Conference 2011, Nov 30 - Dec 2. Highly recommended!

The main theme of this blog post, however, was sparked by a very interesting conversation with a small English startup exhibiting at the event, called Neutrino Concepts. Founded in 2007, the company aims to deliver, in its own words, "next generation business intelligence software based on technology designed to enhance decision-making across all levels; allowing organisations to become more agile and efficient, to gain faster and exponential return on investment." The bottom line is encapsulated in a single concept: "Google-like". Wayne Eckerson sums it up: "Neutrino Concepts' interface enables users to search existing data warehouses using words and phrases - similar to Google - instead of submitting complicated queries."

The concept is hardly new. Natural-language search and artificial intelligence have been around for decades. More recently, Google's legendary, minimalistic interface has been held up for years to BI tool vendors and developers as the ultimate goal in usability: just type in a few keywords or phrases and the system will auto-magically find the results you're looking for. The reality behind the marketing hype is somewhat different. BI, its users and its context differ dramatically from those of Google. Google deals mainly with documents - soft information - while BI deals with highly structured hard information. Google depends on the "wisdom of crowds" - very large crowds indeed - and statistical analysis of their behaviors to determine what is likely to be important in a particular search; BI lacks both crowds of users and the extensive cross-referencing (hyperlinks) needed to make similar inferences. So, how can BI become Google-like?

Until now, much of the thinking in this area has come from vendors in the content space, who seek to extend their inverted indexing approach from documents to relational databases. This approach is essentially post-cognitive - meaning and relationships are discovered after the information has been created and ingested (as content vendors say) into the system. This, of course, is the only viable approach where documents can contain any information, in any context and in any structure. That condition does not hold for traditional business intelligence information, where structure and meaning are defined in advance - essentially a pre-cognitive approach. Pat Foody, Technical Director of Neutrino Concepts, clearly comes at the problem from this latter direction, based on extensive experience in, dare I say it, traditional hard data.

Neutrino Concepts' product, NIRA (the poetically named Neutrino Information Release Appliance!), demonstrates an impressive ability to interpret free-form user input and return highly relevant sets of data from the demonstration hard data set. Such hard data can come from the warehouse, but also from user-defined sources such as CSV files and spreadsheets. Soft data can be included in the search too, where required. An intuitive user interface allows easy filtering, querying and joining of results to produce the required answers. The success of this approach depends entirely on the quantity, breadth and quality of the metadata describing the data sources. Where such sources are part of the enterprise data warehouse environment, it may be expected that such metadata exists; unfortunately, it is often limited. For user-defined sources, it is likely to be missing entirely. NIRA, in its present incarnation, takes the pessimistic view: it assumes such metadata is unavailable and requires that it be entered during the setup phase. This approach is probably sufficient for small projects; for larger, enterprise-wide projects it is unlikely to suffice.
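
To make this concrete, here is a hypothetical sketch in Python of the pre-cognitive, metadata-driven idea: free-form keywords are matched against a hand-entered metadata catalogue and translated into SQL. NIRA's actual internals are not public, so every name and mapping below is invented purely for illustration:

    # Setup-phase metadata: business terms mapped to table and column names.
    # A toy, single-table stand-in for the catalogue entered during setup.
    CATALOG = {
        "customer": ("orders", "customer_name"),
        "revenue":  ("orders", "order_total"),
        "region":   ("orders", "sales_region"),
    }

    def keywords_to_sql(user_input):
        """Resolve recognized keywords to columns and build a simple query."""
        hits = [CATALOG[w] for w in user_input.lower().split() if w in CATALOG]
        if not hits:
            return None  # nothing recognized: the metadata is too thin
        table = hits[0][0]                           # toy: assume one table
        columns = ", ".join(col for _, col in hits)
        return f"SELECT {columns} FROM {table}"

    print(keywords_to_sql("revenue by customer"))
    # -> SELECT order_total, customer_name FROM orders

The quality of the answer is bounded by the quality of the catalogue, which is exactly the dependency noted above.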

NIRA shows how BI can begin to approach "Google Nirvana" within the context of largely well-defined and well-documented sets of traditional business data.  One needs to recognize, however, that there is more work to be done to enable automatic extraction of metadata, both business and technological, from existing metadata stores and similar repositories.


Posted June 28, 2011 1:21 PM
A chat with Max Schireson, President of 10gen, makers of MongoDB (from "humongous" database), yesterday provided some food for thought on the topic of our assumptions about the best database choices for different applications. Such thinking is particularly relevant for BI at the moment, as the range of database choices expands rapidly.

But first, for traditionalist BI readers, a brief introduction to MongoDB, which is one of the growing horde of so-called NoSQL "databases", some of which have very few of the characteristics of databases. NoSQL stores come in half a dozen generic classes, and MongoDB is in the class of "document stores", along with tools such as Apache's CouchDB and Terrastore. Documents? If you're confused, you are not alone. In this context, we don't mean textual documents used by humans; rather, we use the word in a programming sense: a collection of data items stored together for convenience and ease of processing, generally without a predefined schema. From a database point of view, such a document is rather like the pre-First Normal Form set of data fields that users present to you as what they need in their application. Think, for example, of an order consisting of an order header and multiple line items. In the relational paradigm, you'll make two (or more) tables and join them via foreign keys. In a document paradigm, you'll keep all the data related to that order in one document.
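
As a concrete illustration of the two paradigms, here is a minimal Python sketch; the schema and the data are invented for the purpose:

    import sqlite3

    # Relational paradigm: the order header and its line items live in
    # separate tables, tied together by a foreign key.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_lines (
            order_id INTEGER REFERENCES orders(order_id),
            product TEXT,
            qty INTEGER
        );
    """)
    db.execute("INSERT INTO orders VALUES (1, 'ACME Ltd')")
    db.executemany("INSERT INTO order_lines VALUES (?, ?, ?)",
                   [(1, 'widget', 10), (1, 'sprocket', 4)])

    # Reassembling the order requires a join at query time.
    rows = db.execute("""
        SELECT o.customer, l.product, l.qty
        FROM orders o JOIN order_lines l ON o.order_id = l.order_id
    """).fetchall()

    # Document paradigm: the same order kept together as one nested record,
    # with no predefined schema and no join needed to read it back.
    order_doc = {
        "_id": 1,
        "customer": "ACME Ltd",
        "lines": [
            {"product": "widget", "qty": 10},
            {"product": "sprocket", "qty": 4},
        ],
    }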

Two characteristics--the lack of a predefined schema and the absence of joins--are very attractive in certain situations, and these turn out to be key design points for MongoDB. The lack of a schema makes it very easy to add new data fields to an existing database without having to reload the old data; so if you are in an emerging industry or application space, especially where data volumes are large, this is very attractive. The absence of joins also plays well for large data volumes; if you have to shard your data over multiple servers, joins can be very expensive. So MongoDB, like most NoSQL tools, plays strongly in the Web space with companies needing fast processing of large volumes of data with emergent processing needs. Sample customers include Shutterfly, foursquare, Intuit, IGN Entertainment, Craigslist and Disney. In many cases, the use of the database would be classed as operational by most BI experts. However, some customers are using it for real-time analytics, and that leads us to the question of using non-relational databases for BI.
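
To see why the schema-less design point appeals, here is a minimal sketch of adding a new field with no migration, assuming a MongoDB instance on localhost and the pymongo driver; the database, collection and field names are all invented:

    from pymongo import MongoClient

    orders = MongoClient("localhost", 27017).shop.orders

    # Early documents carry only the fields known at the time.
    orders.insert_one({"_id": 1, "customer": "ACME Ltd", "total": 140.0})

    # A new requirement adds a field; newer documents simply include it.
    # No ALTER TABLE, no reload of the old data.
    orders.insert_one({"_id": 2, "customer": "Globex", "total": 55.0,
                       "channel": "mobile"})

    # Old and new documents coexist in one collection; queries simply
    # treat the missing field as absent.
    for doc in orders.find({"channel": {"$exists": True}}):
        print(doc["customer"], doc["channel"])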

When considering Operational BI solutions, the first instinct of many implementers is to copy the operational data to an operational data store (ODS), data warehouse or data mart and analyse it there. They are immediately faced with the problem of how to update the informational environment fast enough to satisfy users' timeliness requirements. As those requirements approach real time, traditional ETL tools begin to struggle. Furthermore, in the case of the data warehouse, the question arises of the level of consistency among these real-time updates, and between the updates and the existing content. The way MongoDB is used points immediately to an alternative, viable approach--go directly against the operational data.

As always, there are pros and cons.  Avoiding storing and maintaining a second copy of large volumes of data is always a good thing.  And if the analysis doesn't require joining with data from another source, using the original source data can be advantageous.  There are always questions about performance impacts on the operational source, and sometimes security implications as well.  However, the main question is around the types of query possible against a NoSQL store in general or a document-oriented database in this case.  It is generally accepted that normalizing data in a relational database leads to a more query-neutral structure, allowing a wider variety of queries to be handled.  On the other hand, as we saw with the emergence of dimensional schemas and now columnar databases, query performance against normalized databases often leaves much to be desired.  In the case of Operational BI, however, most experience indicates that the queries are usually relatively simple, and closely related to the primary access paths used operationally for the data concerned.  The experience with MongoDB bears this out, at least in the initial analyses users have required.
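
As an illustration of the kind of simple, access-path-aligned query meant here, the following sketch runs directly against the operational MongoDB store rather than a warehouse copy. It assumes a local MongoDB instance and the pymongo driver; the collection and field names are invented:

    from collections import defaultdict

    from pymongo import MongoClient

    orders = MongoClient("localhost", 27017).shop.orders

    # Revenue per customer across shipped orders: a query shaped by the
    # collection's primary access path, not an arbitrary cross-source join.
    revenue = defaultdict(float)
    for doc in orders.find({"status": "shipped"},
                           {"customer": 1, "total": 1}):
        revenue[doc["customer"]] += doc["total"]

    for customer, total in sorted(revenue.items(), key=lambda kv: -kv[1]):
        print(customer, total)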

I'd be interested to hear details of readers' experience with analytics use of this and other non-relational approaches.


Posted June 17, 2011 5:50 AM
Last week I blogged on the Future of Business Intelligence, and today, as I make final preparations for two events dealing with this topic, I'd like to add a few further thoughts that struck me and deserve some emphasis. (The events are in Brussels tomorrow, 15 June, and in Rome on 22-24 June. Between them they bring together an outstanding line-up of international speakers, including Rick van der Lans and Jan Henderyckx at the first event and Colin White, Claudia Imhoff, Cindi Howson and James Taylor at the second. Not to mention yours truly at both!)

To start, two trends in addition to last week's five:

(6)    Operational vs. informational distinctions are disappearing.  In the past, decision support was seen as a relatively relaxed process, typically based on a stable, point-in-time view of the business.  Today, many decisions need to be made based on near real-time data.  This process is known as operational BI and breaks down the boundaries between operational and informational systems, a distinction on which the original BI architecture is based.

(7)    Service oriented architecture (SOA) is reinventing the application landscape. Although SOA has been promoted since the early 2000s, its uptake has been relatively slow. However, most application developers recognize that it is the only concept with the potential to address the business need for more rapid and flexible application development. The impact on BI is significant: ETL techniques will need to change radically as applications become plug-and-play. On the positive side, SOA offers the best chance in twenty years to finally address the metadata problem in BI--simply because SOA can only work if complete, live metadata exist in the operational environment.

It is my firm belief that these seven trends, taken together, spell the end of our current way of thinking about and implementing Business Intelligence. I'm not suggesting that BI is at the end of its useful life - far from it. In business terms, BI is growing dramatically in popularity and in the value delivered to business users; its influence has extended across the entire spectrum of business activities - from the board room all the way down to the factory floor - and exciting and profitable new uses are constantly being invented. However, this success and ever-broadening remit has significant consequences.

First, it becomes clear that the boundary assumptions and resulting structures that framed BI twenty years ago have become totally obsolete. For example, what forward-looking business manager today would be satisfied with traditional month-end BI data when making daily operational decisions? Or be willing to operate with only internally sourced data? The rules have changed. The timeliness and breadth of information required today are dramatically different from those of two decades ago, when the BI architecture was emerging.

Second, BI today encroaches into areas traditionally covered by other IT disciplines.  Operational BI and collaborative BI are obvious examples of the blurring of boundaries that were once clearly demarcated in both technological and organizational terms.

My personal conclusion, about which I've written extensively elsewhere, is that we need a new architecture for BI that encompasses the entire gamut of IT support for business.  Essentially, we need to move our thinking up a level from separate operational, collaborative and BI architectures to a comprehensive, joined-up Enterprise IT Architecture that I call Business Integrated Insight (BI to the power of two).

Of course, I'm not alone in thinking about the future of BI in this time of dramatic change.  So, come along to the conferences in Brussels or Rome to hear what other leaders in the field think.

Posted June 14, 2011 6:36 AM
It's an exciting time in BI! The extent and speed of change in technological possibilities is, by my reckoning, greater than anything we've seen since the early 1990s. Some of these changes have seemed like minor ripples in the past, but they are all rolling up together, over the last couple of years and the next, into one huge tsunami. Here are five trends that are changing the future of business intelligence:

  1. Information volumes and variety are exploding. This is creating exciting new business opportunities, but posing huge issues for traditional tools and methodologies for managing, storing and using data. Traditional hard (structured) data, for which relational database technology is best suited, is now less than 5% of the total. How do we manage the rest? How do we combine the new soft ("unstructured") information sensibly with the old? Is virtualization the answer - or even the only viable solution?
  2. Hardware is undergoing a seismic shift. Gigabytes of cheap memory and multi-core processors change the design point for applications and databases. Many of today's mid-size data warehouses could already run entirely in memory... if we had database software that could take advantage of it. The software architecture needs to change, and we can already see the beginnings.
  3. Database structures are diverging rapidly.  Row-based, general purpose databases have dominated the market for almost two decades.  They won't go away, but more specialized databases such as columnar varieties and even non-databases like Hadoop are taking on important roles and rapidly encroaching on the old way of doing BI.  The performance benefits of this software and the hardware advances above give some hope of tackling the information tsunami, but don't ignore how they enable new ways of speeding up old tasks.
  4. Tablets and smart phones are ramping up the mobile BI information mess.  Most people focus on the usage of tablets in business and on the user interface implications, but there is another aspect.  The sheer volume of these devices and the fact that increasing amounts of information will be stored on them will make today's PC spreadsheet proliferation problem look like nothing.  How will we manage information quality, consistency and auditability in this new world?
  5. Web 2.0 is reinventing how people work. This has been the real slow burner, but it's reaching critical mass as more and more Generation Y people reach positions of power in the workforce. The ways that business information is shared, used and manipulated are set for major change. This will have an even bigger impact on information quality, consistency and auditability than mobile BI.
To hear more, join me, Rick van der Lans and Jan Henderyckx in Brussels on June 15th to explore "The Future of Business Intelligence".


Posted June 6, 2011 9:22 AM