Originally published July 19, 2007
First came applications. Then came data warehouses. Then came the corporate information factory (CIF). The idea of an integrated, granular, single version of the truth was just what many corporations needed for the foundation of decision support system (DSS) processing and corporate decisions. Data warehousing provided the basis for many forms of sophisticated analysis.
Figure 1 shows that the data warehouse has become the foundation of DSS processing.
Indeed, in many organizations, the data warehouse was just what was needed to open up information to the entire organization.
As powerful as the notion of a data warehouse is, a single data warehouse is not sufficient or appropriate for every organization. Some organizations need what is termed a “network” of data warehouses. Their information needs vary so much across geography, products and services that a single data warehouse just doesn’t make sense. So what is a network of data warehouses, and what kind of organization can make use of networked data warehouses?
A network of data warehouses consists of multiple data warehouses in which data is linked from one data warehouse to another, and all data across all of the data warehouses participates in what can be called a global system of record.
One type of organization that can make use of a network of data warehouses is the geographically dispersed organization. In a geographically dispersed organization, there are separate data warehouses that serve different geographic locations. Figure 2 shows the different processing locations of the geographically dispersed organization.
Figure 2 shows that there are processing centers all over the world. And while there are some uniformities of data, business function and processing across the different locations, there are also differences. The laws and business customs in Tokyo are different from the laws and business customs in Riyadh. And the laws and business customs in Riyadh are different from the laws and business customs in Paris, and so forth. In fact, there are many data and business function differences around the world.
Where there is great geographical diversity, it simply makes sense that there be networked data warehouses. A single data warehouse has a difficult time accounting for the many worldwide differences.
Another case for networked data warehouses comes from organizations that have great diversity in their product and service lines. Figure 3 shows this case.
Figure 3 depicts the different lines of business for a large and diverse organization. There are indeed some commonalities of customer, product, services and so forth between the different lines of business. But there are some major differences as well.
For organizations such as these, a network of data warehouses makes much more sense than a single data warehouse, as seen in Figure 4.
So exactly what does a networked data warehouse environment look like and what are the dynamics? Figure 5 shows what a typical networked data warehouse environment might look like.
In Figure 5, there is an enterprise data warehouse (the EDW), a European manufacturing data warehouse, a Far East shipping data warehouse and a Latin American retail data warehouse. In addition to housing their own data, each of these data warehouses regularly exchanges data with the others. Figure 6 shows a free exchange of data among the different networked data warehouses.
Note that the same or related data may exist in one or more networked data warehouses. Figure 7 shows the existence of similar or the same data in different networked data warehouses.
In order to keep the integrity of data where the same data is found in different places, it is necessary to create and maintain a global system of record. The global system of record is needed in order to prevent data chaos – where the same item of data is found in many places with different values.
As an example of the system of record, suppose that the EDW is charged as the “owner” of several elements of data, as seen in Figure 8.
The EDW is then the place where those data elements are created or entered into the network, updated or deleted. This gives a single point of control: one and only one data warehouse controls the values of the data elements found in its own system of record, as seen in Figure 9.
If any changes are to be made to an element of data, the changes are made in the EDW. Once the data in the system of record is brought up to date and is accurate, then from the EDW, the data can be distributed across the network. Any other data warehouse in the network can access and copy (but not change) the data found in the EDW system of record.
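The single point of control described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular product: the `Warehouse` class, the `OwnershipError` exception and the element names are all hypothetical, and real warehouses would enforce the rule through load procedures and permissions rather than an in-memory dictionary.

```python
class OwnershipError(Exception):
    """Raised when a warehouse tries to change data it does not own."""


class Warehouse:
    def __init__(self, name, owned_elements):
        self.name = name
        # Elements for which this warehouse is the system of record (assumed names).
        self.owned = set(owned_elements)
        # Owned values plus read-only copies received from other warehouses.
        self.values = {}

    def update(self, element, value):
        # Changes are made only in the warehouse that owns the element.
        if element not in self.owned:
            raise OwnershipError(
                f"{self.name} is not the system of record for {element}"
            )
        self.values[element] = value

    def copy_from(self, owner, element):
        # Other warehouses may access and copy the owner's value, but not change it.
        self.values[element] = owner.values[element]


edw = Warehouse("EDW", owned_elements={"customer_name", "credit_limit"})
far_east = Warehouse("Far East shipping", owned_elements=set())

edw.update("credit_limit", 5000)         # allowed: the EDW owns this element
far_east.copy_from(edw, "credit_limit")  # allowed: distribution from the EDW
```

In this sketch, a call such as `far_east.update("credit_limit", 1)` raises `OwnershipError`, because the Far East warehouse holds only a copy.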
But if there are ever any discrepancies in the values of data found in the EDW system of record and the same data found elsewhere, then the data found in the EDW system of record is correct, by definition. In other words, the data found in the system of record is always deemed to be correct and it is the responsibility of other networked data warehouses to reconcile the discrepancy in favor of the values found in the system of record. Figure 10 shows this resolution of discrepancies.
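As a rough illustration of reconciling in favor of the system of record, the following Python sketch overwrites any differing local value with the system-of-record value. The `reconcile` function, the key format and the sample values are assumptions made for the example.

```python
def reconcile(system_of_record, local_copy):
    """Return a corrected version of local_copy in which every element that
    is also held in the system of record takes the system-of-record value.
    Elements the system of record does not hold are left untouched."""
    corrected = dict(local_copy)
    for element, value in local_copy.items():
        if element in system_of_record and system_of_record[element] != value:
            corrected[element] = system_of_record[element]
    return corrected


# Hypothetical data: the EDW is the system of record for credit limits.
edw_record = {
    "customer:1001:credit_limit": 5000,
    "customer:1002:credit_limit": 7500,
}
far_east_copy = {
    "customer:1001:credit_limit": 4500,  # stale copy, disagrees with the EDW
    "shipment:88:port": "Osaka",         # not owned by the EDW
}

reconciled = reconcile(edw_record, far_east_copy)
```

Here the stale credit limit is replaced by the EDW value, while the shipment data, which the EDW does not own, is left alone.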
One of the characteristics of data residing in the system of record is that it is stored at the lowest level of granularity. In other words, data in the system of record is not summarized or aggregated. Figure 11 shows that data found in the system of record is stored at the lowest level of granularity.
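One way to picture the granularity rule: the system of record keeps one row per individual event, and any summary is computed from those rows rather than stored in their place. The shipment rows and the `tonnes_by_port` helper below are hypothetical.

```python
from collections import defaultdict

# Hypothetical granular rows: one row per individual shipment.
shipments = [
    {"shipment_id": 1, "port": "Osaka", "tonnes": 12.5},
    {"shipment_id": 2, "port": "Osaka", "tonnes": 7.5},
    {"shipment_id": 3, "port": "Busan", "tonnes": 20.0},
]


def tonnes_by_port(rows):
    """Derive a summary from the granular rows without modifying them."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["port"]] += row["tonnes"]
    return dict(totals)


summary = tonnes_by_port(shipments)
```

Because the granular rows are retained, a different summary (by month, by customer and so on) can be derived later without going back to the source applications.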
Another characteristic of the networked data warehouse environment is that a data warehouse may receive data it does not yet hold from another data warehouse. The source of that data is always the data warehouse where the data resides as the system of record. A copy of the data held in a data warehouse that is not its system of record should not be used as a source. The only legitimate source of data is data that is in a system of record.
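The sourcing rule can be sketched as a lookup against a registry of owners. The registry contents, the warehouse names used as keys and the `pick_source` function are assumptions for illustration only.

```python
# Hypothetical registry: element -> warehouse that is its system of record.
SYSTEM_OF_RECORD = {
    "credit_limit": "EDW",
    "port_of_loading": "Far East shipping",
}


def pick_source(element, holders):
    """Given the warehouses that currently hold `element`, return the only
    legitimate source: the warehouse registered as its system of record."""
    owner = SYSTEM_OF_RECORD.get(element)
    if owner is None:
        raise LookupError(f"no system of record registered for {element!r}")
    if owner not in holders:
        raise LookupError(f"{owner} owns {element!r} but does not hold it")
    return owner


# A copy exists in the Latin American warehouse, but only the EDW may be the source.
source = pick_source("credit_limit", holders=["Latin American retail", "EDW"])
```

Even though the Latin American retail warehouse is listed first among the holders, the function returns the EDW, since a non-owning copy is never treated as a source.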
In addition, data that has no system of record can be placed in a data warehouse. In doing so, a new system of record is created. Figure 12 shows that data may be placed in the Far East data warehouse that is not in the EDW.
This means that the networked data warehouse environment contains data that is outside the EDW system of record. But when data enters the networked data warehouse environment, it must go to some system of record. Figure 13 shows that each networked data warehouse may have its own system of record.
Indeed, every unit of data in the networked data warehouse environment needs to belong to some system of record. This is what is meant by a global system of record. And of course, no unit of data can belong to more than one system of record. Every unit of granular data belongs to one and only one system of record. The global system of record can be physically mapped to many different data warehouses, but there is no overlap in the ownership of data anywhere in the networked data warehouse environment. Figure 14 shows this structuring of data.
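The "one and only one system of record" rule lends itself to a simple consistency check. The sketch below assumes each warehouse declares the units of data it claims to own; the `check_global_system_of_record` function and the sample claims are hypothetical.

```python
from collections import defaultdict


def check_global_system_of_record(claims, all_units):
    """claims maps a warehouse name to the set of units it claims to own.
    Returns (unowned, multiply_owned): units with no owner, and units
    claimed by more than one warehouse."""
    owners = defaultdict(list)
    for warehouse, units in claims.items():
        for unit in units:
            owners[unit].append(warehouse)
    unowned = sorted(u for u in all_units if u not in owners)
    multiply_owned = {u: sorted(ws) for u, ws in owners.items() if len(ws) > 1}
    return unowned, multiply_owned


claims = {
    "EDW": {"customer_name", "credit_limit"},
    # Deliberate conflict for the example: credit_limit is claimed twice.
    "Far East shipping": {"port_of_loading", "credit_limit"},
}
all_units = {"customer_name", "credit_limit", "port_of_loading", "invoice_total"}

unowned, multiply_owned = check_global_system_of_record(claims, all_units)
```

A valid global system of record is one for which both results are empty. In this example, `invoice_total` has no owner and `credit_limit` has two, so both violations would need to be resolved.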
Once the networked data warehouses are arranged with the global system of record that has been described, the result is integrity of data and data content. Because of the establishment and ongoing maintenance of the global system of record, the resulting networked data warehouse environment is very unlike the stovepipe environment, as seen in Figure 15.
In the stovepipe environment, there is no discipline over the content of data values or over how data values are shared. Data goes anywhere, data can be updated anywhere, and duplication of data, whether exact or closely similar, can occur anywhere at any time. There is no basis for reconciliation and no discipline for enforcing integrity in the classical stovepipe environment.
Recent articles by Bill Inmon