Originally published March 31, 2005
This article attempts to draw out the similarities and differences between the Inmon and Kimball approaches to the data warehouse.
On the subject of what the data warehouse is and what the data marts are, both Kimball and Inmon have spoken:
“… The data warehouse is nothing more than the union of all the data marts …” (Ralph Kimball, Dec. 29, 1997)
“You can catch all the minnows in the ocean and stack them together and they still do not make a whale.” (Bill Inmon, Jan. 8, 1998)
Inmon's Corporate Information Factory (CIF) and the Kimball Data Warehouse Bus (BUS) are considered the two main data warehousing architectures. Because both are built to meet the same enterprise need, they have some elements in common.
All enterprises need a means to store, analyze and interpret the data they generate and accumulate in order to make critical decisions, which range from “continuing to exist” to maximizing prosperity. Corporations must develop operating and feedback systems that use the underlying data store (the data warehouse) to achieve their goals.
Both the CIF and BUS architectures satisfy these criteria.
Another requirement of any data warehouse architecture is that users can depend on the accuracy and timeliness of the data. Users must also be able to access the data according to their particular needs, through a straightforward and easily understood way of making queries.
Data extracted in this manner by one user should be consistent with, and translatable to, the data used by other operations and users in the same group or enterprise who rely on the same data.
Both Inmon and Kimball share the opinion that stand-alone or independent data marts or data warehouses do not satisfy the needs for accurate and timely data and ease of access for users on an enterprise or corporate scale.
In an article for the Business Intelligence Network, Mr. Inmon writes:
“… Independent data marts may work well when there are only a few data marts. But over time there are never only a few data marts ... Once there are … a lot of data marts, the independent data mart approach starts to fall apart. There are many reasons why … independent data marts built directly from a legacy/source environment fall apart …”
In “Differences of Opinion” (previously cited), Mr. Kimball gives his opinion of independent data marts:
“Finally stand-alone data marts or warehouses … are problematic. These independent silos are built to satisfy specific needs, without regard to other existing or planned analytic data. They tend to be departmental in nature, often loosely dimensionally structured.
Although often perceived as the path of least resistance because no coordination is required, the independent approach is unsustainable in the long run. Multiple, uncoordinated extracts from the same operational sources are inefficient and wasteful.
They generate similar, but different variations with inconsistent naming conventions and business rules. The conflicting results cause confusion, rework and reconciliation. In the end, decision-making based on independent data is often clouded by fear, uncertainty and doubt.”
It appears from the above that both Inmon and Kimball are of the opinion that independent or stand-alone data marts are of marginal use.
However, this is largely where the apparent similarity ends. You may discover later, as I have, that there are more similarities, but each of our data warehouse architects expresses them in a very different way.
Inmon believes that Kimball’s star-schema-only approach is inflexible and therefore leads to a “brittle” structure. He writes: “… this basic lack of flexibility is at the heart of the weakness of the star schema model as the basis of the data warehouse ... When there is an enterprise need for data the star schema is not at all optimal.
Taken together, a series of star schemas and multi-dimensional tables are brittle ... [They] cannot change gracefully over time …” Mr. Inmon believes that his approach, in which star schemas live in dependent data marts fed from the central data warehouse, solves the problem of giving the whole enterprise access to the same data, even as that data changes over time.
“The relational data warehouse is best served by a relational [3NF] database design running on relational technology … This should be no surprise since the dbms technology the data warehouse runs on works the best with a relational database design.”
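To make the Inmon flow concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table names, columns and sample rows are hypothetical, chosen only for illustration: a small normalized (3NF) warehouse is loaded first, and a dependent data mart with a star schema is then built from the warehouse rather than from the source systems. It illustrates the general pattern and is not Inmon's own design.

```python
import sqlite3

# Hypothetical 3NF tables standing in for an Inmon-style central data warehouse.
WAREHOUSE_DDL = """
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sales_order (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer,
    order_date TEXT
);
CREATE TABLE order_line (
    order_id INTEGER REFERENCES sales_order,
    product_id INTEGER REFERENCES product,
    quantity INTEGER,
    amount REAL,
    PRIMARY KEY (order_id, product_id)
);
"""

# Hypothetical dependent data mart: a star schema populated only from the
# warehouse tables above, never directly from the operational sources.
MART_DDL = """
CREATE TABLE dim_customer AS SELECT customer_id, name, region FROM customer;
CREATE TABLE dim_product AS SELECT product_id, name, category FROM product;
CREATE TABLE fact_sales AS
    SELECT o.order_date AS order_date, o.customer_id AS customer_id,
           l.product_id AS product_id, l.quantity AS quantity, l.amount AS amount
    FROM order_line AS l
    JOIN sales_order AS o ON o.order_id = l.order_id;
"""


def build(conn: sqlite3.Connection) -> None:
    """Load the normalized warehouse, then derive the star-schema mart from it."""
    conn.executescript(WAREHOUSE_DDL)
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Acme", "East"), (2, "Globex", "West")])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(10, "Widget", "Hardware"), (20, "Manual", "Books")])
    conn.executemany("INSERT INTO sales_order VALUES (?, ?, ?)",
                     [(100, 1, "2005-01-15"), (101, 2, "2005-02-03")])
    conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                     [(100, 10, 3, 30.0), (100, 20, 1, 15.0), (101, 10, 5, 50.0)])
    conn.executescript(MART_DDL)


def sales_by_region(conn: sqlite3.Connection) -> list:
    """A typical star-schema query run against the dependent mart."""
    return conn.execute("""
        SELECT c.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()


conn = sqlite3.connect(":memory:")
build(conn)
print(sales_by_region(conn))  # [('East', 45.0), ('West', 50.0)]
```

The point of the sketch is the direction of flow: every star-schema table in the mart is derived from the normalized warehouse, so any other dependent mart built the same way draws on one consistent copy of the data.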
In describing the BUS architecture, Kimball writes that “raw data is transformed into presentable information in the staging area, ever mindful of throughput and quality. Staging begins with coordinated extracts from the operational source systems.
Some staging ‘kitchen’ activities are centralized, such as maintenance and storage of common reference data, while others may be distributed.” (“Data Warehouse Dining Experience,” Intelligent Enterprise, Jan. 1, 2004.)
This indicates to this author that Kimball has gone beyond the individual star schema approach criticized by Inmon and has, in fact, described his multi-dimensional data warehouse. In this approach, the model contains both atomic and summarized data, but its construction is based on business measurements, which enable disparate business departments to query the data from summary levels down to the lowest level of detail without reprogramming.
Although this description appears to indicate that the Kimball “staging area” is VERY similar to the Inmon data warehouse, the Kimball approach does not recommend a real, physically implemented data warehouse. His “data warehouse” is still the collection of data marts with their conformed dimensions.
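Kimball's “collection of data marts with their conformed dimensions” can be sketched the same way. In this hypothetical example (Python's sqlite3 module, with invented table names and rows), a sales mart and a shipments mart share one conformed product dimension, and a drill-across query combines them by summarizing each fact table separately and merging the results on the shared category attribute. It is an illustration of the idea under these assumptions, not Kimball's own code.

```python
import sqlite3

# Two hypothetical Kimball-style data marts, sales and shipments, that share
# one conformed dimension: dim_product has the same keys and labels in both.
DDL = """
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product, amount REAL);
CREATE TABLE fact_shipments (product_key INTEGER REFERENCES dim_product, units INTEGER);
"""


def load(conn: sqlite3.Connection) -> None:
    """Create both marts and load a few illustrative rows."""
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Hardware"), (2, "Books")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                     [(1, 30.0), (2, 15.0), (1, 50.0)])
    conn.executemany("INSERT INTO fact_shipments VALUES (?, ?)",
                     [(1, 3), (1, 4), (2, 2)])


def drill_across(conn: sqlite3.Connection) -> dict:
    """Summarize each mart separately by the conformed attribute, then merge."""
    sales = dict(conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category"""))
    shipped = dict(conn.execute("""
        SELECT p.category, SUM(f.units)
        FROM fact_shipments AS f JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category"""))
    # The category labels line up only because both marts share dim_product.
    return {cat: (sales.get(cat, 0.0), shipped.get(cat, 0))
            for cat in sorted(sales.keys() | shipped.keys())}


conn = sqlite3.connect(":memory:")
load(conn)
print(drill_across(conn))  # {'Books': (15.0, 2), 'Hardware': (80.0, 7)}
```

Merging separately summarized results, rather than joining the two fact tables directly, matters here: a direct join on product_key would pair each Hardware sales row with each Hardware shipment row and overstate both Hardware totals. The merge works only because both marts use the same dimension keys and labels, which is the role conformed dimensions play in the BUS architecture.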
In Mastering Data Warehouse Design: Relational and Dimensional Techniques, by Claudia Imhoff, Nicholas Galemmo and Jonathan Geiger (Wiley, 2003), the authors characterize the Kimball approach as relying on star schemas for both atomic and aggregated storage.
Summarizing this point of their research, the Data Warehouse Bus Architecture is said to consist of two types of data marts: atomic data marts and aggregated data marts.
In both the Atomic and Aggregated Data Marts, the data is stored in a star schema design.
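The atomic and aggregated data marts described above can also be sketched in a few lines. The following hypothetical example (Python's sqlite3 module; all table names and rows are invented) keeps an atomic star schema at order-line grain and derives an aggregated star schema summarized to month and product category, then checks that the two agree on the grand total. The aggregated mart's month and category tables are simplified “shrunken” dimensions keyed on their text values, an assumption made only to keep the sketch short.

```python
import sqlite3

# Hypothetical atomic data mart: one fact row per order line (lowest grain).
ATOMIC_DDL = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales_atomic (date_key INTEGER, product_key INTEGER, amount REAL);
"""

# Hypothetical aggregated data mart: still a star schema, but its fact table is
# pre-summarized to month x category. The shrunken dimensions are keyed on
# their text values here, a simplification for the sketch.
AGGREGATE_DDL = """
CREATE TABLE dim_month AS SELECT DISTINCT month FROM dim_date;
CREATE TABLE dim_category AS SELECT DISTINCT category FROM dim_product;
CREATE TABLE fact_sales_monthly AS
    SELECT d.month AS month, p.category AS category, SUM(f.amount) AS amount
    FROM fact_sales_atomic AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month, p.category;
"""


def build(conn: sqlite3.Connection) -> None:
    """Load the atomic mart, then derive the aggregated mart from it."""
    conn.executescript(ATOMIC_DDL)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, "2005-01-15", "2005-01"), (2, "2005-01-20", "2005-01"),
                      (3, "2005-02-03", "2005-02")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(10, "Widget", "Hardware"), (11, "Gadget", "Hardware"),
                      (20, "Manual", "Books")])
    conn.executemany("INSERT INTO fact_sales_atomic VALUES (?, ?, ?)",
                     [(1, 10, 30.0), (2, 11, 20.0), (2, 20, 15.0), (3, 10, 50.0)])
    conn.executescript(AGGREGATE_DDL)


conn = sqlite3.connect(":memory:")
build(conn)
monthly = conn.execute(
    "SELECT month, category, amount FROM fact_sales_monthly "
    "ORDER BY month, category").fetchall()
atomic_total = conn.execute("SELECT SUM(amount) FROM fact_sales_atomic").fetchone()[0]
monthly_total = conn.execute("SELECT SUM(amount) FROM fact_sales_monthly").fetchone()[0]
print(monthly)  # [('2005-01', 'Books', 15.0), ('2005-01', 'Hardware', 50.0), ('2005-02', 'Hardware', 50.0)]
print(atomic_total == monthly_total)  # True
```

Both marts are star schemas; they differ only in grain. Queries that need order-line detail go to the atomic mart, while summary queries can read the much smaller aggregated fact table.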
Their description of the Kimball Bus Architecture seems to indicate that the Kimball approach still neither recognizes a need for nor requires a central data warehouse repository.
The next article will highlight the differences between the two models regarding relational vs. multidimensional data.
Recent articles by Katherine Drewek