The Single Version Of The Truth

Originally published September 9, 2004

One of the great appeals of the data warehousing concept is that with a properly installed warehouse, there is a single version of the truth. There are many reasons why a single version of the truth is so appealing, including: 

  • there is a basis for reconciliation;
  • there is always a starting point for new analyses;
  • there is less redundant data; and
  • there is integrity of data. 

The appeal of the single version of the truth is valid and strong. It is a worthy goal for organizations everywhere. 

But, does the notion that there should be a single version of the truth mean that there should be one physical data warehouse? In a large organization, there may be many physical renditions of data without violating the concept of the single version of the truth. 

Take a large, complex multinational organization such as IBM. There are simply too many products, product types and customers, spread over different geographical areas and time zones, for there to be a single data warehouse. One imagines that a company as successful as IBM would have separate data warehouses, such as: 

  • a hardware data warehouse for US customers;
  • a consulting data warehouse for European customers; and
  • a software data warehouse for Asian and Australian customers.
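The partitioning in the IBM example can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a description of IBM's actual systems: the partition table, the warehouse names (`hw_us`, `consulting_eu`, `sw_apac`) and the `SaleRecord` fields are all assumptions. What it shows is that each record is owned by exactly one physical warehouse.

```python
from dataclasses import dataclass

# Hypothetical partition table modeled on the example above:
# (product line, region) -> the one physical warehouse that owns such data.
PARTITIONS = {
    ("hardware", "US"): "hw_us",
    ("consulting", "Europe"): "consulting_eu",
    ("software", "Asia"): "sw_apac",
    ("software", "Australia"): "sw_apac",
}


@dataclass(frozen=True)
class SaleRecord:
    """Hypothetical atomic sales record."""
    customer_id: str
    product_line: str
    region: str
    amount: float


def route(record: SaleRecord) -> str:
    """Return the name of the single warehouse that owns this record.

    Each (product line, region) pair maps to exactly one warehouse, so a
    record can never be routed to two places. Unmapped pairs raise KeyError
    rather than being silently dropped.
    """
    return PARTITIONS[(record.product_line, record.region)]


if __name__ == "__main__":
    sale = SaleRecord("C-1001", "software", "Australia", 250.0)
    print(route(sale))  # prints: sw_apac
```

Failing loudly on an unmapped pair is a deliberate choice in this sketch: data that no warehouse owns is surfaced at load time instead of quietly falling outside the single version of the truth.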

There is simply too much data to imagine IBM having a single data warehouse. This principle applies not just to IBM but to every large, complex organization. 

If there are legitimate installations of physically separate data warehouses all residing within the concept of a single system of record, what then are the “rules of the road” for this phenomenon? Some of these are: 

  • there is no overlap of non-key data between the data warehouses. For example, the data element “contract cost for US software” does not appear in more than one place;
  • there is connectivity between the physical data warehouses at the key level. For example, the key customer id appears in more than one physical data warehouse and has the same meaning in all of them;
  • these rules apply to granular, atomic data, and the physical data warehouses contain data at the atomic level;
  • each unit of data in a data warehouse is attached to, or associated with, some unit of time (as is the case for all data warehouse data); and
  • data within each physical data warehouse is integrated uniformly across all physical data warehouses. That is, every physical data warehouse uses the same rules and procedures for integration. 
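Several of these rules can be checked mechanically against warehouse metadata. The Python sketch below is a hypothetical illustration, not a tool described in the article: `WarehouseSchema`, its fields and `check_rules` are assumed names, and the last rule (uniform integration) is reduced to comparing an assumed rule-set identifier, which only approximates "the same rules and procedures for integration."

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass
class WarehouseSchema:
    """Hypothetical metadata describing one physical data warehouse."""
    name: str
    key_meanings: dict[str, str]  # key element -> its agreed, enterprise-wide meaning
    non_key: set[str]             # non-key data elements stored in this warehouse
    time_element: str | None      # element tying each unit of data to a unit of time
    grain: str                    # "atomic" for granular data
    integration_rules: str        # assumed identifier of the integration rule set


def check_rules(warehouses: list[WarehouseSchema]) -> list[str]:
    """Return violations of the rules of the road; an empty list means none found."""
    problems: list[str] = []

    for a, b in combinations(warehouses, 2):
        # Rule 1: no non-key data element appears in more than one warehouse.
        for element in sorted(a.non_key & b.non_key):
            problems.append(f"non-key element {element!r} appears in {a.name} and {b.name}")
        # Rule 2: a key shared by two warehouses has the same meaning in both.
        for key in sorted(a.key_meanings.keys() & b.key_meanings.keys()):
            if a.key_meanings[key] != b.key_meanings[key]:
                problems.append(f"key {key!r} means different things in {a.name} and {b.name}")

    for w in warehouses:
        others = [o for o in warehouses if o is not w]
        # Rule 2 (connectivity): each warehouse shares at least one key with another.
        if others and not any(w.key_meanings.keys() & o.key_meanings.keys() for o in others):
            problems.append(f"{w.name} shares no key with any other warehouse")
        # Rule 3: data is held at the granular, atomic level.
        if w.grain != "atomic":
            problems.append(f"{w.name} does not hold atomic-level data")
        # Rule 4: every unit of data is associated with a unit of time.
        if w.time_element is None:
            problems.append(f"{w.name} has no time element")

    # Rule 5: all warehouses integrate data under the same rule set.
    if len({w.integration_rules for w in warehouses}) > 1:
        problems.append("warehouses use different integration rule sets")

    return problems


if __name__ == "__main__":
    meaning = "enterprise-wide customer number"
    hw_us = WarehouseSchema("hw_us", {"customer_id": meaning},
                            {"us_hardware_revenue"}, "sale_date", "atomic", "rules-v1")
    sw_us = WarehouseSchema("sw_us", {"customer_id": meaning},
                            {"us_software_contract_cost"}, "contract_date", "atomic", "rules-v1")
    consulting_eu = WarehouseSchema("consulting_eu", {"customer_id": meaning},
                                    {"consulting_hours", "us_software_contract_cost"},
                                    "engagement_date", "atomic", "rules-v1")
    for problem in check_rules([hw_us, sw_us, consulting_eu]):
        print(problem)
    # prints: non-key element 'us_software_contract_cost' appears in sw_us and consulting_eu
```

Passing a check like this is necessary but not sufficient: whether a shared key truly carries the same meaning, or whether two warehouses really apply the same integration procedures, still depends on people agreeing on and following those definitions.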

There is a surprisingly small set of rules for creating an environment of data integrity across a large and complex organization. Implementing these rules, however, requires organizational discipline and a willingness of every organizational unit to work in harmony with the others, which is easier said than done.

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.
