Data warehousing/business intelligence (BI) has played a remarkable role in IT over the past few decades. It has remained a relatively closed ecosystem, both in terms of the vendors involved and the delivery teams within IT organizations. Signs are that that’s all about to change as the boundaries between operational and informational computing become increasingly blurred and Enterprise 2.0 finally catches on in business. This series of articles explores the business and technological drivers for this change and its far-reaching implications for architects, developers and vendors alike. And it describes the architecture that’s needed to move the focus from business intelligence to the wider world of enterprise architecture.
Consistency, Integration and Other Dirty Words
Reading some vendor materials and even analyst commentaries these days, you’d be forgiven for thinking that data consistency, cleansing, reconciliation and integration were swear words. Or maybe that all enterprise data has been made so clean at the source in recent years that business intelligence can just get on with analysis of whatever presents itself, and focusing on ever sexier ways to display the results. If only that were so…
It’s worth recalling that among the key drivers for data warehousing, and later business intelligence, were information consistency and integration. Users required consistent reports and answers to decision support questions so that different departments could give agreed answers to the CEO. IT wanted an integrated set of base data to avoid time-consuming and costly reworking when the CEO got conflicting answers. This was a key emphasis I placed in the first-ever article describing the data warehouse architecture, published in 1988 in the IBM Systems Journal.[1]
Now, more than two decades later, far more weight is placed on speed and flexibility of decision making, and ever more sophisticated analytics. We see this in diverse technology trends from operational BI through data appliances to “post-relational” databases and stand-alone analytic and dashboard environments. Sometimes information consistency is assumed – “we have a data warehouse; that will take care of it.” To the extent that the data in question passes through the warehouse, that’s fine, but a closer look often reveals that key parts of the informational environment stand stubbornly apart from the warehouse.
Financial, technical or data timeliness concerns often lead to “warehouse bypasses.” In such cases, the requirement for clean, consistent data is conveniently forgotten as exciting new data marts and appliances are sourced directly from potentially inconsistent or unclean operational systems. And, of course, contradictory and incompatible results and reports emerge later to haunt the development and maintenance teams. In the history of data warehousing, this is at least the third iteration of the “independent data mart” disease. It emerges with particular virulence in times of economic stress, when vendors need quick sales and buyers are more willing to listen to stories of great bargains. And the cause remains the same: insufficient attention to cleanliness and consistency of the source data.
And then there are the resurgent direct query proponents who seem to believe that “mashups” – to use the modern term for an old approach – of any data, anywhere, performed on the fly and all mixed together can solve all users’ information needs. Enterprise information integration (EII), federation, virtual data warehousing and a variety of similar terms for the same technique have been around since the earliest days of business intelligence. In the past, eliminating the need for a big store of reconciled data was a key driver. Today, it is the increasing business demand for ever closer to real-time data for decision making. Again, it also suits vendors of such tools to declare that little or no data modeling is required and deployment can therefore be rapid and low-cost. But the oft-forgotten question is: Does the data being combined in the real-time query actually belong together?
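To make that question concrete, here is a minimal sketch in Python of the kind of check a federated query rarely performs before joining two sources on the fly. Everything in it is an assumption made for illustration: the `orders` and `crm` records, the `customer_id` formats, and the `key_overlap` and `normalize` helpers are hypothetical and do not come from any particular EII or mashup product.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class KeyOverlap:
    matched: int      # keys present in both sources
    left_only: int    # keys found only in the first source
    right_only: int   # keys found only in the second source

    @property
    def match_rate(self) -> float:
        total = self.matched + self.left_only + self.right_only
        return self.matched / total if total else 0.0


def key_overlap(left_keys: Iterable[str], right_keys: Iterable[str]) -> KeyOverlap:
    """Compare the join keys of two sources before combining them."""
    left, right = set(left_keys), set(right_keys)
    return KeyOverlap(len(left & right), len(left - right), len(right - left))


def normalize(customer_id: str) -> str:
    """Hypothetical reconciliation rule: drop hyphens and ignore case."""
    return customer_id.replace("-", "").lower()


# Hypothetical sources: both hold a "customer id", formatted differently.
orders = [
    {"customer_id": "C-001", "amount": 120.0},
    {"customer_id": "C-002", "amount": 75.5},
    {"customer_id": "C-003", "amount": 310.0},
]
crm = [
    {"customer_id": "c001", "segment": "gold"},
    {"customer_id": "c002", "segment": "silver"},
]

# Joining on the raw keys, as an on-the-fly mashup would.
naive = key_overlap(
    (o["customer_id"] for o in orders),
    (c["customer_id"] for c in crm),
)

# Joining after applying the reconciliation rule.
reconciled = key_overlap(
    (normalize(o["customer_id"]) for o in orders),
    (normalize(c["customer_id"]) for c in crm),
)

print(f"naive match rate: {naive.match_rate:.2f}")
print(f"reconciled match rate: {reconciled.match_rate:.2f}")
```

In this toy case the raw keys never match, so the mashup would silently return nothing, or worse, partial results. Only after a reconciliation rule is applied does the remaining gap (one customer missing from the CRM source) become visible as a genuine data quality issue rather than a formatting accident. Deciding on that rule is exactly the modeling work that is claimed to be unnecessary.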
Add to this soft information.[2] Such information, from emails to voicemails, from Word documents to YouTube videos, from tweets to blogs, is growing dramatically in volume and is seen as increasingly important as a source of valuable knowledge for many aspects of business. It is also notoriously unclean, inconsistent and difficult to integrate. Typically, vendors coming from the search and enterprise content markets have only a limited understanding of the extent of data integration needed and expected in business intelligence. And their assumptions about the level of automation required in assignment of meaning, selection of relevant records and integration with hard information are often far less stringent than those encountered in business intelligence.
What’s happening in all cases is that a fundamental assumption that has underpinned data warehousing from the earliest days is coming under increasing stress. It was clear from the beginning, and it remains so today, that when decision making is taken to an enterprise level, a largely consistent, enterprise-wide base of information is required. The original data warehouse architecture assumed that this consistency could be created as data enters the warehouse. Modern business needs and long-standing practices largely invalidate this assumption. The practice of bypassing the warehouse when need be removes any opportunity to enforce consistency or enable integration. The business need for ever closer to real-time decision making creates significant technical barriers to reconciliation of data if it does enter the warehouse. And the proliferation of soft data, especially from unreliable sources such as the Web, adds an even greater issue of defining what consistency means in the first place.
As this and other original postulates of data warehousing[3] come under increasing strain, the time has come to ask whether the original architecture for data warehousing needs to be revisited. And it is to be hoped that the process will not be as painful as it was for Galileo when he suggested moving the earth from the center of the universe!
A Changing Landscape for IT
While ongoing issues of consistency and integration graphically illustrate some of the changing information landscape that data warehouse developers encounter, the rest of IT is facing equally daunting changes in business expectations and technology. Among the most important is service oriented architecture (SOA). Driven initially by a need for flexibility and adaptability in the operational environment, SOA is creating a new process-oriented, plug-and-play integration approach for operational applications. While there continues to be ongoing debate about the practicality of vendor approaches and the viability of fully implementing SOA in the short to medium term, there is little doubt that the days of enormous, custom-built, monolithic applications are numbered. They are simply too inflexible for modern business. Commercial off-the-shelf (COTS) applications such as SAP and Oracle Applications are already promoting SOA as a way to integrate and innovate within and around these packages.
Initially, SOA was aimed specifically at the operational environment, but its scope certainly extends to informational and collaborative applications in the medium to longer term. As business users become used to the concept that they can (or should be able to) link together existing services into a workflow they need to do their job, they will correctly begin to question the difference between these classes of function: Why can’t we plug an analysis step into the workflow to understand the likely impact of delaying this shipment? How do we link into the e-mail system to notify a customer automatically of an order fulfillment problem? By now, these questions are commonplace and vendors increasingly offer point solutions in dashboards, business process management flows and so on. But the underlying architectural questions are often avoided. What does all this mean for the traditional division between operational and informational data? Is process finally becoming a strong requirement in informational applications?
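The users’ questions above can be illustrated with a small Python sketch in which an operational step, an analysis step and a notification step are simply three services plugged into one workflow. It is an illustration under stated assumptions, not a rendering of any SOA product: the step names, the order fields and the `ASSUMED_CANCEL_PROBABILITY` figure are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Order = Dict[str, Any]
Step = Callable[[Order], Order]

# Assumption for illustration only: a quarter of delayed orders are cancelled.
ASSUMED_CANCEL_PROBABILITY = 0.25


def check_stock(order: Order) -> Order:
    """Operational step: flag the order as delayed if it is not in stock."""
    return {**order, "delayed": not order["in_stock"]}


def estimate_delay_impact(order: Order) -> Order:
    """Informational step: estimate the revenue at risk from a delay."""
    at_risk = order["value"] * ASSUMED_CANCEL_PROBABILITY if order["delayed"] else 0.0
    return {**order, "revenue_at_risk": at_risk}


def notify_customer(order: Order) -> Order:
    """Collaborative step: queue an e-mail to the customer if delayed."""
    if not order["delayed"]:
        return order
    message = f"Order {order['id']} is delayed; we will confirm a new date shortly."
    return {**order, "outbox": [(order["customer_email"], message)]}


def run_workflow(order: Order, steps: List[Step]) -> Order:
    """Run the steps in sequence, passing each result to the next step."""
    for step in steps:
        order = step(order)
    return order


result = run_workflow(
    {"id": "SO-1", "value": 4000.0, "in_stock": False,
     "customer_email": "buyer@example.com"},
    [check_stock, estimate_delay_impact, notify_customer],
)
print(result["revenue_at_risk"], result["outbox"][0][0])
```

From the workflow’s point of view the three steps are indistinguishable: each takes an order and returns an enriched order. The old labels of operational, informational and collaborative survive only in the comments, which is the architectural point at issue.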
Web/Enterprise 2.0 approaches are also dissolving the old boundaries between operational, informational and collaborative function by reframing user interactions in a looser and more user-directed social environment. And as these old (and artificial) functional boundaries break down, so too does our traditional division between operational, informational and collaborative information.
Today’s Business Needs and Technologies Demand a New Architecture
The above considerations, and more, lead to a daunting but obvious conclusion. Modern business needs and current technological trends demand a fully consistent, integrated set of information that spans the formerly separate worlds of operational, informational and collaborative activities. For the informational world, the original data warehouse provided such a store. But, unlike data warehousing, this information cannot be stored in a single database, but must be distributed throughout the entire IT infrastructure. Furthermore, Enterprise 2.0 makes it clear that the nature of this information is expanding from hard information to include a wide variety of more complex, soft information such as documents, audio and images. Consistency and integrity needs apply (to varying extents) to these information types as well. So, the key question being posed about information is: how can we create a new base of integrated and consistent information for the entire
The simpler the answer, the better the solution. If you want to create a consistent, integrated information resource, you must stop creating duplicates of existing information that have to be managed to consistency, and you must eliminate, or substantially reduce, existing data duplication. The original data warehouse architecture did this. It proposed a logically single data store – the business data warehouse – modeled at the enterprise level as the consistent and integrated source of all information for decision making. This simplicity was ultimately lost with the emergence of the layered architecture, due to a combination of database performance and enterprise modeling issues.
Nonetheless, the approach remains valid for the current much-expanded needs for integration. First, model all the information according to an enterprise-level model and then implement as far as possible in alignment to that model with minimal duplication. This is the approach proposed in Business Integrated Insight (BI2), an architecture that gathers all the information of the enterprise, hard and soft, operational, informational and collaborative into a single logical component. In addition, a second integration point, largely absent from the original data warehouse architecture, is absolutely mandatory today. A logically single, consistent and integrated set of processes that spans all aspects of business and IT needs is required to enable the flexibility and adaptability that modern business requires. This is also included as part of BI2.
What does BI2 look like? And if you’re a little impatient, a substantial preview[3] is already available!
References:
1. Devlin, B. A. and Murphy, P. T., “An architecture for a business and information system,” IBM Systems Journal, Volume 27, Number 1, (1988).
2. I use the terms “hard” and “soft” information to distinguish between data that has been structured for traditional computer uses, such as data entry, computation, analysis and summarization, and data that is stored and used in a more free-form way, often called unstructured information. However, the phrase “unstructured information” is actually an oxymoron – information, by definition, has a structure that imbues meaning. Unstructured data is, in reality, nothing more than random noise!
3. Devlin, B. “Business Integrated Insight (BI2) – Reinventing enterprise information management.”
Recent articles by Barry Devlin