
Data Warehouse Stabilization

Originally published December 7, 2006

When are we finished building a data warehouse? A data warehouse is finished when the business of the enterprise stops changing. As long as the business is growing and changing, the data warehouse will also be growing and changing. Therefore, a data warehouse is never finished. Unfortunately, this answer is not palatable to many people.

These are words that management does not like to hear, however true they are. There is something unnerving about working on a project that never ends, especially from the managerial perspective of funding such an effort.

While it is true that a data warehouse is never finished, it is also true that a data warehouse reaches a point that can be termed stabilization, or “semantic stabilization.” Stabilization concerns the entry of new types of data into the system and changes to the existing data structures found in the data warehouse. In the sense discussed here, stabilization does not refer to the passage of occurrences of data into and out of the data warehouse. It is about the stability of the semantics of data, not the content of data.

Figure 1 shows how the semantics of data stabilize from one iteration of development to another.

Figure 1 shows that when the first iteration of the data warehouse is built, everything is new (not surprisingly!). All data definitions and all data types are new upon the initial building of the data warehouse. Then, as time passes and the second iteration of the data warehouse is completed, the data that was once new becomes either existing data that has not changed or existing data that has changed.

In order to see the dynamics, suppose that on day one, the data elements of revenue and order date are added into the data warehouse. As the data warehouse is initially built, these two data elements are new. However, as the second iteration of data is being built, there is a change. Order date is split into two data elements: initial order date and approved order date. Now revenue – in the second iteration – is data whose semantics have not changed, and order date is data whose semantics have changed in the second iteration of development.

In addition, as the second iteration of the data warehouse is built, new semantic requirements (as opposed to changes to old semantic requirements) are added. This creates a new segmentation of data types for the second iteration. As the second iteration of development proceeds, the data elements are shuffled among these segments, as seen in Figure 1.
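The segmentation described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `Element` record, its `derived_from` field, the `classify` function, the definition strings and the `sales_region` element are all hypothetical and do not come from the article. Only revenue and the splitting of order date are taken from the example in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """One data element type in an iteration (hypothetical structure)."""
    name: str
    definition: str
    # Name of the earlier element this one replaces or refines, if any.
    derived_from: Optional[str] = None


def classify(previous: list[Element], current: list[Element]) -> dict[str, str]:
    """Label each element of the current iteration as new, unchanged or changed."""
    prev_by_name = {e.name: e for e in previous}
    labels: dict[str, str] = {}
    for e in current:
        old = prev_by_name.get(e.name)
        if old is not None and old.definition == e.definition:
            labels[e.name] = "unchanged"   # same name, same semantics
        elif old is not None or e.derived_from in prev_by_name:
            labels[e.name] = "changed"     # semantics of existing data changed
        else:
            labels[e.name] = "new"         # a new semantic requirement
    return labels


iteration_1 = [
    Element("revenue", "revenue amount"),
    Element("order_date", "date of the order"),
]
iteration_2 = [
    Element("revenue", "revenue amount"),
    Element("initial_order_date", "date the order was placed", derived_from="order_date"),
    Element("approved_order_date", "date the order was approved", derived_from="order_date"),
    Element("sales_region", "region credited with the sale"),  # invented new requirement
]

labels = classify(iteration_1, iteration_2)
for name, label in labels.items():
    print(f"{name}: {label}")
```

In this sketch, revenue lands in the unchanged segment, the two elements split out of order date land in the changed segment, and the invented element lands in the new segment.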

This process of shuffling data element types over time continues throughout the life of the data warehouse. It never ends. However, how it is done and the results of reshuffling do, in fact, change over time. Figure 2 shows the effects of reshuffling over time.

Figure 2 shows that the amount of data entered as new data elements decreases over time. On the sixth or seventh iteration of data warehouse development, very little new data is added. Over time, the amount of data that is changed starts to decrease as well. In short, the data warehouse reaches a point of relative semantic stability: each new iteration merely adds some more data but leaves much of the data warehouse untouched.
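One way to make this trend concrete is to track, for each iteration, the share of data element types that are new or changed. The sketch below is an assumption-laden illustration, not a method from the article: the `churn` and `first_stable_iteration` functions, the 10 percent threshold and every count in `history` are invented for the example and are not the values behind Figure 2.

```python
from typing import Optional


def churn(counts: dict[str, int]) -> float:
    """Fraction of element types in one iteration that are new or changed."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts.get("new", 0) + counts.get("changed", 0)) / total


def first_stable_iteration(history: list[dict[str, int]],
                           threshold: float = 0.10) -> Optional[int]:
    """Return the 1-based number of the first iteration whose churn is at or
    below the threshold, or None if no iteration qualifies.
    The threshold is an arbitrary assumption."""
    for number, counts in enumerate(history, start=1):
        if churn(counts) <= threshold:
            return number
    return None


# Invented counts of element types per iteration, for illustration only.
history = [
    {"new": 40, "changed": 0, "unchanged": 0},    # first iteration: all new
    {"new": 20, "changed": 10, "unchanged": 30},
    {"new": 10, "changed": 8, "unchanged": 52},
    {"new": 2, "changed": 3, "unchanged": 67},
]

for number, counts in enumerate(history, start=1):
    print(f"iteration {number}: churn = {churn(counts):.2f}")
print("first stable iteration:", first_stable_iteration(history))
```

The measure deliberately ignores how many rows flow into or out of the warehouse. It counts only element types, in keeping with the distinction between the semantics of data and the content of data.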

While it is true that the data warehouse is continually modified and new data elements are constantly being added, the rate and the number of these modifications decrease dramatically. When this occurs, the data warehouse has matured.

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

