Data Warehousing: Relational vs. Multi-Dimensional Data

Originally published April 6, 2005

Continuing our look at the similarities and differences between the Inmon and Kimball designs, we turn to the controversy over whether data in the data warehouse should be relationally designed (Inmon) or multi-dimensionally designed and used in a logical collection of data marts (Kimball).

Each architect addresses the level of data granularity required for his design: the Inmon Corporate Information Factory (CIF) or the Kimball bus architecture (BUS).

According to Bill Inmon: "The paradigm for relational data in the data warehouse should be at a low level of granularity and should be in third normal form... then it is possible to 'lightly denormalize' the data if [it] is commonly used in the denormalized form...

Relationally designed data warehouse data stored at a low level of granularity can be used in a wide variety of ways... [It]... can be used to support a wide variety of structures of data:

  • Exploration warehouses;
  • Data mining warehouses; and
  • OLTP data bases, etc."

The Inmon data warehouse is a physical repository of data, which can be used to build data marts in which the data can be in multi-dimensional or other forms.
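To make the idea concrete, here is a minimal sketch of normalized, low-granularity data being used to derive a "lightly denormalized" structure of the kind a data mart might consume. The table names, columns and sample rows are hypothetical and chosen only for illustration; they do not come from Inmon's writing, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical normalized tables; all names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id  INTEGER REFERENCES product(product_id),
        sale_date   TEXT,
        amount      REAL
    );
    INSERT INTO customer VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
    INSERT INTO product  VALUES (10, 'Widget', 'Hardware'), (11, 'Manual', 'Books');
    INSERT INTO sale VALUES
        (100, 1, 10, '2005-04-01', 25.0),
        (101, 1, 11, '2005-04-01', 10.0),
        (102, 2, 10, '2005-04-02', 25.0);

    -- A lightly denormalized structure derived from the normalized base:
    -- descriptive attributes are repeated on every sale row.
    CREATE VIEW sale_flat AS
    SELECT s.sale_id, s.sale_date, s.amount,
           c.name AS customer, c.region,
           p.name AS product,  p.category
    FROM sale s
    JOIN customer c ON c.customer_id = s.customer_id
    JOIN product  p ON p.product_id  = s.product_id;
""")

flat_rows = conn.execute("SELECT * FROM sale_flat ORDER BY sale_id").fetchall()
for row in flat_rows:
    print(row)
```

The base tables stay at the lowest level of detail, so other structures (an exploration copy, a mining extract, a dimensional mart) could be derived from the same rows in the same way.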

Mr. Kimball also considers granularity and atomic-level data to be key to his multi-dimensional data warehouse design, or BUS. Kimball University Design Tip #21, "Declaring the Grain," states: "The most important step in a dimensional design is declaring the grain of the fact table. Declaring the grain means saying EXACTLY what a fact table represents... When you make a grain declaration, you can have a very precise discussion of which dimensions are possible and which ones are not...

Atomic data has the most dimensionality and so it can be constrained and rolled up in every way that is possible for that data source. Atomic data is a perfect match for the dimensional approach... higher levels of aggregation will almost always have smaller dimensions...

Since useful aggregations necessarily shrink dimensions and remove dimensions, it leads to the realization that aggregated data must always be used in conjunction with base atomic data, because aggregated data has less dimensional detail."
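The point about aggregates losing dimensional detail can be illustrated with a small sketch. The grain declared here (one row per product per store per day), the field names and the sample figures are all assumptions made for the example, and `roll_up` is a hypothetical helper, not part of any Kimball tooling.

```python
from collections import defaultdict

# Assumed grain: one row per product, per store, per day (illustrative data).
atomic_facts = [
    {"date": "2005-04-01", "store": "S1", "product": "Widget", "amount": 25.0},
    {"date": "2005-04-01", "store": "S1", "product": "Manual", "amount": 10.0},
    {"date": "2005-04-01", "store": "S2", "product": "Widget", "amount": 25.0},
    {"date": "2005-04-02", "store": "S1", "product": "Widget", "amount": 50.0},
]

def roll_up(facts, dimensions):
    """Sum 'amount' over every dimension not listed in `dimensions`."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)

# Atomic data can be rolled up along any combination of its dimensions.
by_date_store = roll_up(atomic_facts, ["date", "store"])
by_date = roll_up(atomic_facts, ["date"])
by_product = roll_up(atomic_facts, ["product"])

print(by_date_store)
print(by_date)
print(by_product)
```

Once the data has been reduced to `by_date`, the store and product dimensions are gone, so a question such as "how much did Widget sell?" can only be answered by going back to `atomic_facts`. That is why the aggregate needs the base atomic data alongside it.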

At this point in the comparison between Inmon and Kimball, it seems they agree on the need for atomic data and the need for it to be available when aggregated data is being used.

Mr. Kimball is emphatic on this point in defending data marts: "Some authors get confused on this point and after declaring that data marts necessarily consist of aggregated data, they criticize the data marts for 'anticipating business questions.'"

He then points out that the misunderstanding can be clarified by providing the atomic data along with the derivative aggregated data.

The next and last article will be a summary and will provide reader feedback and answers to readers' questions received during this series.

  • Katherine Drewek

    Katherine (1950-2010) had more than 30 years of experience in the editorial and corporate law environment. She was responsible for the content review, editing and formatting of international newsletters focusing on business intelligence and data warehousing. She was a frequent lecturer and panelist for the American Bar Association, the National Association of Credit Managers and the Corporate Practice Institute. She had been a mentor for the Women's Leadership Conference and served as Managing Editor for BeyeNetwork.
