

Business unIntelligence—Insight and Innovation Beyond Analytics and Big Data

Originally published January 8, 2014

One of the key reasons I decided to write Business unIntelligence was my growing conviction that the original data warehouse architecture was showing its age and struggling to meet modern business needs for near real-time business intelligence (BI) and support for big data. I first conceived the data warehouse, then called the Business Data Warehouse (BDW), in 1986, and subsequently published it in the IBM Systems Journal in 1988 (Devlin & Murphy, 1988), in an era with far simpler business needs and technology than at present. Despite such significant change, it was my aim to create a new architecture that could benefit from previous data warehouse work, expanding and building on that in terms of technology, methodology and organizational aspects.

While the new IDEAL conceptual and REAL logical architectures stretch far beyond the bounds of traditional BI systems and data, the concepts of enterprise data warehouse (EDW) and data marts remain at the core. The following excerpt from Chapter 5 of my new book discusses how these original components are positioned in the new logical architecture.

Business unIntelligence—Insight and Innovation Beyond Analytics and Big Data is published by Technics Publications, New Jersey.
_________________________________________________

Beyond the Data Warehouse

Given the extent and rate of changes in business and technology described thus far, it is somewhat unexpected that the term data warehouse and the architectural structures and concepts described in Sections 5.2 and 5.3 of my book still carry considerable weight after more than a quarter of a century. However, this resistance to change cannot endure much longer. Indeed, one goal of this book is to outline what a new, pervasive information architecture looks like, within the scope of data-based decision making and the traditional data sources of BI for the past three decades.

Reports of my death have been greatly exaggerated [1]

Of course, the data warehouse has been declared terminally ill before now. BI and data warehouse projects have long had a poor reputation for delivering on time or within budget. While these difficulties have clear and well-understood causes (project scope and complexity, external dependencies, organizational issues, and more), vendors have regularly proposed quick fixes to businesses seeking fast and reliable answers to their BI needs. These fixes, as we’ve seen, range from data marts and analytic appliances to spreadsheets and big data. As each of these approaches has gained traction in the market, the death of the data warehouse has been repeatedly, and incorrectly, pronounced. The underlying reason for such faulty predictions is a misunderstanding of the consistency vs. timeliness conundrum described in section 5.4 of the book. The data warehouse is primarily designed for consistency; the other solutions are more concerned with timeliness, in development and/or operation. And data consistency remains a valid business requirement, alongside timeliness, which has growing importance in a fully interconnected world. Nonetheless, as the biz-tech ecosystem evolves to become essentially real-time, the data warehouse cannot retain its old role of being all things to all informational needs. As a consequence, while it will not die, the data warehouse concept faces a shrinking role in decision support as the business demands increasing quantities of information of a structure or speed that are incompatible with the original architecture or relational technology.

REAL Architecture: Core Business Information

In essence, the data warehouse must return to its roots, as represented by the three-layer data warehouse architecture (Figure 5-3 in my book). This requires separate consideration of the two main architectural components of today’s data warehouses—the enterprise data warehouse and the data mart environment. In the case of the EDW, this means an increasing focus on its original core value propositions of consistency and historical depth, where they have business value, including:
  1. The data to be loaded is process-mediated data, sourced from the operational systems of the organization

  2. This loaded data provides a fully agreed, cross-functional view of the one consistent, historical record of the business at a detailed, atomic level as created through operational transactions

  3. Data is cleansed and reconciled based on an enterprise data model (EDM), and stored in a largely normalized, temporally based representation of that model; star-schemas, summarizations and similar derived data and structures are defined to be data mart characteristics

  4. The optimum structure is a “single relational database” using the power of modern hardware and software to avoid the copying, layering and partitioning of data common in the vast majority of today’s data warehouses

  5. The EDM and other metadata describing the data content are considered an integral, logical component of the data warehouse, although their physical storage mechanism may need to be non-relational for performance reasons

The first business role of any “modern data warehouse” is thus to present a historically consistent, legally binding view of the business to both the internal and outside worlds, including finance and audit departments, regulatory bodies for reporting results, and business partners for committed business transactions. This corresponds to the original positioning of the enterprise data warehouse, before it became “report generation central” in broad usage. I discussed the meaning, extent and limits of “largely normalized” in Chapter 8 of my book on data warehousing (Devlin, 1996). A detailed description of the temporal database concept is best left to Chris Date, Hugh Darwen, and Nikos Lorentzos (Date, et al., 2002), and Tom Johnston and Randall Weis (Johnston & Weis, 2010).

The second business role is as the source of consistent data used to “anchor” other less reliable or rapidly changing data used in reporting, exploration and other BI applications. In some sense, this is an expansion of the concept of master data management, as applied in the purely operational world to the real-time informational needs of the biz-tech ecosystem, and reflects the fact that operational and informational data usage are rapidly converging.
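
As a rough illustration only, and not taken from the book, the “temporally based representation” of detailed, atomic data can be pictured as rows that carry a validity period, so that a change closes the current version rather than overwriting it. The sketch below is in Python; every name in it (CustomerVersion, TemporalStore, record, as_of) is hypothetical, it tracks a single time dimension only, and it assumes changes arrive in date order. The full temporal treatment is in the works cited above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


# All names here are hypothetical and chosen for illustration only.
@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    credit_limit: int
    valid_from: date            # inclusive start of validity
    valid_to: Optional[date]    # exclusive end; None means "still current"


class TemporalStore:
    """Keeps every version of a record; nothing is overwritten or deleted."""

    def __init__(self) -> None:
        self._versions: List[CustomerVersion] = []

    def record(self, customer_id: int, credit_limit: int, effective: date) -> None:
        # Assumes changes are recorded in increasing order of effective date.
        for i, v in enumerate(self._versions):
            if v.customer_id == customer_id and v.valid_to is None:
                # Close the open version instead of updating it in place.
                self._versions[i] = CustomerVersion(
                    v.customer_id, v.credit_limit, v.valid_from, effective
                )
        self._versions.append(
            CustomerVersion(customer_id, credit_limit, effective, None)
        )

    def as_of(self, customer_id: int, when: date) -> Optional[CustomerVersion]:
        # Return the version that was valid on the given date, if any.
        for v in self._versions:
            if (
                v.customer_id == customer_id
                and v.valid_from <= when
                and (v.valid_to is None or when < v.valid_to)
            ):
                return v
        return None


store = TemporalStore()
store.record(42, 1000, date(2012, 1, 1))
store.record(42, 5000, date(2013, 6, 1))

old = store.as_of(42, date(2012, 12, 31))   # version valid at the end of 2012
new = store.as_of(42, date(2013, 12, 31))   # version valid at the end of 2013
```

Because earlier versions are retained, a question such as “what was this customer’s credit limit at year end 2012?” can still be answered after the limit has changed, which is the kind of historical consistency the first business role depends on.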

These two roles suggest that a name distinct from the wider and multiple interpretations of the data warehouse brand may be more appropriate. While some traditional BI / data warehousing activities focused on regulatory and business reporting certainly continue to be supported here, more operational, legally guaranteed activities with external business entities are equally important. With the added consideration that context-setting information (CSI), including its model and other metadata, is also explicitly included in its scope, this component is called the core business information repository (CBIR), shown in the figure to the right.

Despite the earlier discussion in the book’s Section 5.4 of the emerging possibility of physical re-integration of the operational and informational environments using in-memory database technology, the CBIR remains a valid, long-term component of any new information architecture for two distinct reasons. First, historical data is being stored for increasing periods of time, but as it ages, access usually diminishes. Keeping the full historical record in memory becomes impractical or financially unviable as data volumes grow. Second, despite some vendors’ desire to integrate the entire gamut of business information processing within a single system, many businesses will continue to run multiple operational applications, albeit in smaller numbers than before. Given both considerations, a physically separate instantiation of the CBIR remains a necessity. In that case, data in the combined operational/informational in-memory store that meet the requirement of cross-functional consistency are considered part of the logical scope of the CBIR.

While the role and content of the EDW are thus transformed significantly by this evolution, data marts are rather differently affected. Their content and purpose remain the same, but their sourcing changes. For consistency with the CBIR, we call these components core analysis and reporting stores (CARS). In the purer architectural form of the data warehouse, experts recommended that data marts should be dependent, i.e., sourced from the EDW. Independent data marts were often frowned upon. As shown in the figure, the new architecture reverses this advice, feeding all such data marts as directly as possible from their original sources. The reasons are two-fold: to minimize the number of copies of data held and to maximize timeliness. Done indiscriminately, however, such multi-sourcing leads to a spaghetti dish of feeds; a common, model-defined, and CSI-driven information pre-integration function is thus mandatory. In addition, these two components share data, via the assimilation process described in Chapter 7, to create and maintain ongoing semantic and temporal consistency.
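
To make the contrast with a “spaghetti dish of feeds” concrete, here is a minimal sketch in Python of a single, model-driven pre-integration step that feeds both the CBIR and a CARS directly from the sources. It is an assumption-laden illustration, not the book’s design: the source names, field names, COMMON_MODEL mapping and PreIntegrationHub class are all hypothetical, and the assimilation process of Chapter 7 is not modeled.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical common model: for each source, map its field names
# to the agreed model names. In practice this would be driven by CSI.
COMMON_MODEL: Dict[str, Dict[str, str]] = {
    "orders_app": {"cust": "customer_id", "amt": "amount"},
    "billing_app": {"customer_no": "customer_id", "total": "amount"},
}


def pre_integrate(source: str, raw: Record) -> Record:
    """Rename source fields to model fields and tag the record's origin."""
    mapping = COMMON_MODEL[source]
    record: Record = {model: raw[field] for field, model in mapping.items()}
    record["source"] = source
    return record


class PreIntegrationHub:
    """One shared entry point, instead of one hand-built feed per store."""

    def __init__(self) -> None:
        self._stores: List[Callable[[Record], None]] = []

    def attach(self, load: Callable[[Record], None]) -> None:
        self._stores.append(load)

    def ingest(self, source: str, raw: Record) -> None:
        record = pre_integrate(source, raw)
        for load in self._stores:
            load(dict(record))  # each store receives its own copy


cbir_rows: List[Record] = []   # stands in for the CBIR
cars_rows: List[Record] = []   # stands in for one CARS

hub = PreIntegrationHub()
hub.attach(cbir_rows.append)
hub.attach(cars_rows.append)

hub.ingest("orders_app", {"cust": 7, "amt": 120})
hub.ingest("billing_app", {"customer_no": 7, "total": 95})
```

The point of the sketch is that the mapping is defined once, in the model, so adding another store means one more attach call rather than a new set of source-specific feeds.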

The lower part of the figure also shows a necessary change in thinking about operational systems. In expositions of data warehousing, the data and process components of an operational application are generally presented together and positioned outside the architectural boundary of interest of the warehouse. As we’ve already seen, this separation is no longer logically tenable; the needs of the biz-tech ecosystem for (near) real-time information in analytics and for analysis results to be applied immediately in production mean that informational and operational needs must be designed, developed and maintained strictly in tandem. The emergence of in-memory database approaches further supports this direction. In terms of physical implementation, relational databases are the most likely platform for all three components. With the convergence of operational and informational systems in the biz-tech ecosystem, we must therefore separate operational data and process and bring the operational data within the boundary of interest of REAL information. Operational processes are likewise identified explicitly as the components that generate transactions, the legally defined and binding interactions of the enterprise. The CBIR, CARS and transactional data are represented as a single logical component, indicating that they must be considered as a whole. Together and individually, they create, store and manage the definitive, transactional reality—current and historical—of the business.

These components represent the first tranche of a new REAL logical architecture, which will unfold in subsequent chapters. First in the sense of this initial introduction, but also first in the sense of a starting point for any migration from a current BI environment to one that fully supports the biz-tech ecosystem—in both informational and operational needs.

End Notes:

  1. Mark Twain’s actual written reaction in 1897 was: “The report of my death was an exaggeration.”

Bibliography

Date, C. J., Darwen, H. & Lorentzos, N. A., 2002. Temporal Data & the Relational Model. San Francisco(CA): Morgan Kaufmann.

Devlin, B., 1996. Data Warehousing—From Architecture to Implementation. Reading(MA): Addison-Wesley.

Devlin, B. A. & Murphy, P. T., 1988. An architecture for a business and information system. IBM Systems Journal, 27(1).

Johnston, T. & Weis, R., 2010. Managing Time in Relational Databases: How to Design, Update and Query Temporal Data. 1st ed. Burlington(MA): Morgan Kaufmann.

 

  • Barry Devlin
    Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

    Barry's interest today covers the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data, now available in print and eBook editions.

    Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

