Data Warehousing: Relational vs. Multi-Dimensional Data

Originally published April 6, 2005

Continuing with our view of the similarities and differences between the Inmon and Kimball designs, we turn to a "mixed" view, or controversy, over whether data in the data warehouse should be relationally designed (Inmon) or should follow a multi-dimensional design and be used in a logical collection of data marts (Kimball). Each architect addresses the level of granularity of the data required for his design, namely the Inmon CIF or the Kimball BUS.

According to Bill Inmon: "The paradigm for relational data in the data warehouse should be at a low level of granularity and should be in third normal form... then it is possible to 'lightly denormalize' the data if [it] is commonly used in the denormalized form... Relationally designed data warehouse data stored at a low level of granularity can be used in a wide variety of ways... [It]... can be used to support a wide variety of structures of data:
The Inmon data warehouse is a physical repository of data, which can be used to build data marts in which the data can be in multi-dimensional or other forms.

Mr. Kimball also considers granularity and atomic-level data to be key to his multi-dimensional data warehouse design, or BUS. Kimball University Design Tip #21, "Declaring the Grain," states: "The most important step in a dimensional design is declaring the grain of the fact table. Declaring the grain means saying EXACTLY what a fact table represents... When you make a grain declaration, you can have a very precise discussion of which dimensions are possible and which ones are not... Atomic data has the most dimensionality and so it can be constrained and rolled up in every way that is possible for that data source. Atomic data is a perfect match for the dimensional approach... higher levels of aggregation will almost always have smaller dimensions... Since useful aggregations necessarily shrink dimensions and remove dimensions, it leads to the realization that aggregated data must always be used in conjunction with base atomic data, because aggregated data has less dimensional detail."

At this point in the comparison, Inmon and Kimball appear to agree on two things: the need for atomic data, and the need for that atomic data to remain available whenever aggregated data is being used.

Mr. Kimball is emphatic on this point in defending data marts: "Some authors get confused on this point and after declaring that data marts necessarily consist of aggregated data, they criticize the data marts for 'anticipating business questions.'" He then points out that the misunderstanding can be cleared up by providing the atomic data along with the derivative aggregated data.

The next and last article will be a summary and will provide reader feedback and answers to readers' questions received during this series.

SOURCE: Data Warehousing: Relational vs. Multi-Dimensional Data

Recent articles by Katherine Drewek
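As an illustration of the point in Design Tip #21 that aggregations shrink or remove dimensions, the following is a minimal Python sketch, not taken from either author. The sales rows, the column names (`date`, `store`, `product`, `amount`), and the `roll_up` helper are all hypothetical; the assumed grain is one row per product per store per day.

```python
from collections import defaultdict

# Hypothetical atomic fact rows. Declared grain (an assumption for this
# sketch): one row per product per store per day.
ATOMIC_SALES = [
    {"date": "2005-04-01", "store": "S1", "product": "P1", "amount": 10.0},
    {"date": "2005-04-01", "store": "S1", "product": "P2", "amount": 5.0},
    {"date": "2005-04-01", "store": "S2", "product": "P1", "amount": 7.5},
    {"date": "2005-04-02", "store": "S1", "product": "P1", "amount": 2.5},
]


def roll_up(rows, dims, measure="amount"):
    """Sum the measure, keeping only the dimensions listed in dims.

    Every dimension left out of dims is removed from the result, which is
    why an aggregate carries less dimensional detail than the atomic rows.
    """
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


# Two aggregates built from the same atomic data.
by_store = roll_up(ATOMIC_SALES, ["store"])
by_date_store = roll_up(ATOMIC_SALES, ["date", "store"])

print(by_store)
print(by_date_store)
```

In this sketch `by_store` no longer has a `product` or `date` dimension, so a question such as "how much of P1 did S1 sell?" can only be answered by going back to `ATOMIC_SALES`. That is the sense in which aggregated data has to be used together with the base atomic data.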
Copyright 2004 — 2012. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC