Originally published March 23, 2006
In my article, An Introduction to the DW 2.0 Architecture, I described the challenges that first-generation data warehouses were encountering, explained why a new data warehousing architecture is now required and described several of the features of DW 2.0. This article builds on that one and briefly discusses the migration from first- to second-generation data warehouses.
The migration to the first generation of data warehousing was slow and tortuous. The biggest obstacle was the need for integrated data. Even with extract, transform and load (ETL) technology, going back into old corporate applications was tough sledding. Indeed, even today there are organizations that suggest that integration of data is just too complex and too difficult to accomplish. There have been many schemes to “buy your way to heaven” and bypass the arduous and tedious work of integrating old, legacy data, but those schemes have failed because there simply is no substitute for integrated data.
In the early days of data warehousing, a gigabyte or two of data was considered to be a lot of data. In the world of first-generation data warehousing, a terabyte of data became the norm. Along with the volumes of data in the first-generation data warehouses came the realization that different data within the warehouse had very different probabilities of access.
Another factor of the maturing of the data warehouse environment was the need for looking at data in many different ways. The same store of data had to meet the needs of accounting, sales, marketing and finance. In addition, as new requirements came through the door (and they always came), it became necessary to look at data in a manner that no one had ever thought of before. The only way to have the required flexibility of data was to store the data in a granular manner.
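The value of granular storage can be illustrated with a minimal sketch. The records, field names and groupings below are hypothetical and are not drawn from any particular warehouse; the point is only that one detailed store can answer questions that were not anticipated when the data was loaded.

```python
from collections import defaultdict

# Hypothetical granular records: one row per individual sale.
SALES = [
    {"date": "2006-01-15", "region": "East", "product": "A", "amount": 100.0},
    {"date": "2006-01-20", "region": "West", "product": "A", "amount": 250.0},
    {"date": "2006-02-03", "region": "East", "product": "B", "amount": 75.0},
    {"date": "2006-02-10", "region": "East", "product": "A", "amount": 50.0},
]


def summarize(rows, key_fn):
    """Total the amount of each row under whatever grouping key_fn yields."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row["amount"]
    return dict(totals)


# The same granular store serves several departmental views.
by_region = summarize(SALES, lambda r: r["region"])
by_month = summarize(SALES, lambda r: r["date"][:7])
# A view nobody planned for can still be produced, because no detail was lost.
by_region_product = summarize(SALES, lambda r: (r["region"], r["product"]))
```

Had the data been stored only as monthly totals, the region-by-product view could not have been derived from it.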
Because the view of data warehousing is different today than it was ten years ago, there is a new vision for data warehousing. That vision is called DW 2.0, the architecture for the next generation of data warehousing. The complete description of the DW 2.0 architecture is available free of charge on the Web site http://www.inmoncif.com/.
One of the interesting and important considerations of the migration between first- and second-generation data warehouses is the amount of work an organization has to do in order to get from one point to the next. Fortunately, the evolution from a first-generation data warehouse to a second-generation data warehouse is smooth. There are few speed bumps as an organization migrates from one generation to the next.
The high-level differences between first- and second-generation data warehouses include:
The last feature – tying metadata closely to the data warehouse – can be accomplished by the creation of an enterprise metadata repository. Metadata already exists at the local level. It is metadata at the enterprise level that needs to be created. Therefore, building an enterprise metadata repository is the primary challenge in accelerating the move to a second-generation data warehouse.
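As a rough illustration of what raising local metadata to the enterprise level might involve, the following sketch folds per-system field definitions into a single repository keyed by enterprise term. The system names, field names and the mapping table are all hypothetical, and the approach shown is only one simple way to picture the task, not a description of any specific repository product.

```python
# Hypothetical local metadata, as each system might already hold it.
LOCAL_METADATA = {
    "billing": {"cust_no": "integer", "inv_amt": "decimal"},
    "crm": {"customer_id": "integer", "cust_name": "string"},
}

# Hypothetical mapping from (system, local field) to an enterprise term.
ENTERPRISE_TERMS = {
    ("billing", "cust_no"): "customer_id",
    ("crm", "customer_id"): "customer_id",
    ("billing", "inv_amt"): "invoice_amount",
    ("crm", "cust_name"): "customer_name",
}


def build_enterprise_repository(local_metadata, term_map):
    """Group local field definitions under their enterprise-level term.

    Returns the repository plus a list of local fields with no mapping,
    so that gaps in enterprise coverage are visible rather than hidden.
    """
    repository = {}
    unmapped = []
    for system, fields in local_metadata.items():
        for field, data_type in fields.items():
            term = term_map.get((system, field))
            if term is None:
                unmapped.append((system, field))
                continue
            repository.setdefault(term, []).append(
                {"system": system, "field": field, "type": data_type}
            )
    return repository, unmapped


repository, unmapped = build_enterprise_repository(LOCAL_METADATA, ENTERPRISE_TERMS)
```

In this sketch the local metadata is reused as is; the new work lies in the mapping table, which is the enterprise-level knowledge that did not exist before.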
The second feature – integrating unstructured data and structured data – depends entirely on the ability to access and condition the unstructured environment. Once the unstructured environment is accessed and conditioned, the unstructured data can be added to the structured data warehouse environment in a “bolt on” manner.
The first feature of second-generation data warehouses is the most problematic. This feature requires that the first-generation data warehouse be divided along the lines of the aging of data in order to become a second-generation data warehouse. Usually near-line data and archival data can be added independently, but separating the first-generation data warehouse into integrated sectors and interactive sectors is another matter. If the organization has built a separate operational data store (ODS), then such a separation is not difficult. However, if the organization has done online transaction processing inside the first-generation data warehouse, such a separation is difficult. In order to create a second-generation data warehouse, the designer must create a clear integrated sector and a clear interactive sector, and those sectors must not overlap in the slightest.
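The requirement that sectors be divided by the aging of data and never overlap can be sketched as follows. The sector names come from the discussion above; their ordering from newest to oldest and the age limits in days are assumptions made purely for illustration, and real boundaries would depend on the organization.

```python
from datetime import date

# Hypothetical age limits in days. Each sector covers ages below its limit
# and at or above the previous limit, so no age can fall into two sectors.
SECTOR_LIMITS = [
    ("interactive", 30),
    ("integrated", 3 * 365),
    ("near-line", 7 * 365),
]
OLDEST_SECTOR = "archival"  # everything older than the last limit


def assign_sector(record_date, today):
    """Return the single sector that a record of this age belongs to."""
    age_days = (today - record_date).days
    if age_days < 0:
        raise ValueError("record_date is in the future")
    for sector, limit in SECTOR_LIMITS:
        if age_days < limit:
            return sector
    return OLDEST_SECTOR
```

Because the limits are checked in ascending order and the first match wins, every record lands in exactly one sector, which is the non-overlap property the designer must guarantee.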
The migration to the second-generation data warehouse environment is natural and evolutionary. Only under a few circumstances is the transition difficult.