
Metadata in the Data Warehouse: Draining the Swamp

Originally published May 20, 2011

A subtle but steady transition has been taking place right before our eyes. Old-style, first-generation data warehouses are giving way to modern data warehouses. This usually means that corporations have discovered that the real essence of a data warehouse is the creation of the single version of the truth.

The first wave of data warehouses entailed people building data marts. After about the 50th data mart and after about five or ten years, those corporations woke up to the fact that with data marts there is no single version of the truth in the organization. Soon the transition began from a data mart-based approach to a true data warehousing approach, in which there is a single version of the truth.

Along the way, during the transition process, the data architect discovers DW 2.0, which is the architectural blueprint for a “modern” data warehouse. There are many novel aspects of DW 2.0, and the most salient include:

  • The recognition of the lifecycle of data within the data warehouse. Once data enters the data warehouse, it enters into its own unique lifecycle.

  • The recognition of the need for inclusion of unstructured textual data into the data warehouse. Over 80% of the data in the corporation is in the form of text. Because it is text, it is not included in first-generation data warehouses.

  • The need for tight integration of metadata in the data warehouse.

And there are many other lesser features of a data warehouse that are included in DW 2.0.

Now let’s focus on just one of these aspects: the need to tightly integrate metadata into the data warehouse. Does it seem odd that the tight integration of metadata into a data warehouse did not appear earlier? Stated differently, why does the tight integration of metadata into the data warehouse appear only in DW 2.0? Shouldn’t it have appeared long ago, in the first days of first-generation data warehousing? In hindsight, the inclusion of metadata in the data warehouse appears to be what should have happened long ago. But it didn’t. Why?

Consider what was going on years ago when data warehousing was in its infancy. In the early days of data warehousing, there was much confusion as to what a data warehouse was or was not. Many people tried to tell the world that a data mart and a data warehouse were the same thing. Many people said that a data warehouse was merely a copy of operational data. Many people said a data warehouse was a collection of unintegrated data. So the first issue the pioneers of data warehousing faced was the question of what a data warehouse actually was. There were (and still are) many pretenders to the world of data warehousing.

But the real melee came when people discovered that in order to build a data warehouse, data had to be integrated. Telling people that they had to go back into old legacy systems and wallow around was unwelcome news. It was sort of like telling a person that he needed to catch a really bad case of the flu. Voluntarily. Talk about your hard sell. So the early pioneers of data warehousing also had their hands full selling the need for the integration of data.

But there were other issues as well. There was the issue of cost. Data warehouses have never been inexpensive to build and maintain. Then there was the issue of ETL. What was ETL and why was it needed? Then there was the issue of the volumes of data that came with a data warehouse. Wasn’t a data warehouse just a massive replication of data found elsewhere in the corporation? Then there was the issue of business intelligence. How do you demonstrate the value of having a single version of the truth?

And the list goes on.

The early pioneers of data warehousing – starting with myself – had a lot of explaining to do. While it seems obvious today that metadata should have played a much more prominent role in the early days of data warehousing, we were so busy addressing other pressing and important issues that we just did not have the bandwidth to stop and address the issues centering around the role of metadata.

There is an old Louisiana saying – “You don’t worry about draining the swamp when you are up to your neck in alligators.” And that is exactly what was going on with metadata in the early days of data warehousing.

But thanks to DW 2.0, metadata is now an integral part of the data warehouse.

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

