Migrating to the Next Generation of Data Warehousing

Originally published March 23, 2006

In my article, An Introduction to the DW 2.0 Architecture, I described the challenges that first-generation data warehouses were encountering, explained why a new data warehousing architecture is now required and described several features of DW 2.0. This article builds on that one and briefly discusses the migration from first- to second-generation data warehouses.

The migration to the first generation of data warehousing was slow and tortuous. The biggest obstacle was the need for integrated data. Even with extract, transform and load (ETL) technology, going back into old corporate applications was tough sledding. Indeed, even today there are organizations that suggest that integration of data is just too complex and too difficult to accomplish. There have been many schemes to “buy your way to heaven” and bypass the arduous and tedious work of integrating old, legacy data, but those schemes have failed because there simply is no substitute for integrated data.

In the early days of data warehousing, a gigabyte or two of data was considered a lot of data. As first-generation data warehouses matured, a terabyte of data became the norm. Along with these volumes came the realization that different data had very different probabilities of access: some data was accessed frequently, while much of it was accessed rarely.

Another factor in the maturing of the data warehouse environment was the need to look at data in many different ways. The same store of data had to meet the needs of accounting, sales, marketing and finance. In addition, as new requirements came through the door (and they always came), it became necessary to look at data in ways no one had thought of before. The only way to achieve the required flexibility was to store the data at a granular level.

Because the view of data warehousing is different today than it was ten years ago, there is a new vision for data warehousing. That vision is called DW 2.0, the architecture for the next generation of data warehousing. A complete description of the DW 2.0 architecture is available free of charge on the Web site http://www.inmoncif.com/.

One of the interesting and important considerations in the migration between first- and second-generation data warehouses is the amount of work an organization has to do to get from one point to the next. Fortunately, the evolution from a first-generation data warehouse to a second-generation data warehouse is smooth, and there are few speed bumps along the way.

The high-level differences between first- and second-generation data warehouses include the following:

  • Second-generation data warehouses recognize the differences among the life cycle stages of data.

  • Second-generation data warehouses recognize the need for mixing structured and unstructured data.

  • Second-generation data warehouses recognize the need for tying metadata closely and intimately with the actual data in the data warehouse.

The last feature, tying metadata closely to the data warehouse, can be accomplished by creating an enterprise metadata repository. Metadata already exists at the local level; it is metadata at the enterprise level that needs to be created. Therefore, building an enterprise metadata repository is the primary challenge in moving to a second-generation data warehouse.

The second feature, integrating unstructured data and structured data, depends entirely on the ability to access and condition the unstructured environment. Once the unstructured environment is accessed and conditioned, the unstructured data can be added to the structured data warehouse environment in a "bolt on" manner.

The first feature of second-generation data warehouses is the most problematic. To become a second-generation data warehouse, the first-generation data warehouse must be divided according to the age of its data. Near-line data and archival data can usually be added independently, but separating the first-generation data warehouse into integrated and interactive sectors is another matter. If the organization has built a separate operational data store (ODS), such a separation is not difficult. However, if the organization has done online transaction processing inside the first-generation data warehouse, the separation is difficult. To create a second-generation data warehouse, the designer must create a clear integrated sector and a clear interactive sector, and those sectors must not overlap in the slightest.
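To make the idea of dividing data by age more concrete, here is a minimal Python sketch that places each record in exactly one DW 2.0 sector according to its age. The four sector names come from the discussion above; the age cutoffs (`INTERACTIVE_MAX_AGE`, `INTEGRATED_MAX_AGE`, `NEAR_LINE_MAX_AGE`), the `assign_sector` and `partition` functions and the `(record_id, record_date)` format are hypothetical illustrations, not values or interfaces prescribed by DW 2.0. The sketch also simplifies matters: the interactive sector is distinguished by the kind of processing it supports (such as transaction processing), not by age alone.

```python
from datetime import date, timedelta
from enum import Enum


class Sector(Enum):
    INTERACTIVE = "interactive"
    INTEGRATED = "integrated"
    NEAR_LINE = "near-line"
    ARCHIVAL = "archival"


# Hypothetical cutoffs chosen only for illustration; DW 2.0 does not fix these values.
INTERACTIVE_MAX_AGE = timedelta(days=2)
INTEGRATED_MAX_AGE = timedelta(days=5 * 365)
NEAR_LINE_MAX_AGE = timedelta(days=10 * 365)


def assign_sector(record_date: date, today: date) -> Sector:
    """Return the single sector a record belongs to, based only on its age."""
    age = today - record_date
    if age <= INTERACTIVE_MAX_AGE:
        return Sector.INTERACTIVE
    if age <= INTEGRATED_MAX_AGE:
        return Sector.INTEGRATED
    if age <= NEAR_LINE_MAX_AGE:
        return Sector.NEAR_LINE
    return Sector.ARCHIVAL


def partition(records: list[tuple[str, date]], today: date) -> dict[Sector, list[str]]:
    """Split (record_id, record_date) pairs into sectors; each id lands in exactly one."""
    sectors: dict[Sector, list[str]] = {s: [] for s in Sector}
    for record_id, record_date in records:
        sectors[assign_sector(record_date, today)].append(record_id)
    return sectors


if __name__ == "__main__":
    sample = [
        ("a", date(2006, 3, 23)),
        ("b", date(2005, 1, 1)),
        ("c", date(1998, 1, 1)),
        ("d", date(1990, 1, 1)),
    ]
    for sector, ids in partition(sample, today=date(2006, 3, 23)).items():
        print(sector.value, ids)
```

Because `assign_sector` returns one sector for every record, the sectors produced this way cannot overlap, which mirrors the requirement that the integrated and interactive sectors not overlap in the slightest.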

The migration to the second-generation data warehouse environment is natural and evolutionary. Only under a few circumstances is the transition difficult.

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

    Editor's Note: More articles, resources and events are available in Bill's BeyeNETWORK Expert Channel.
