Data Warehouse Construction: Forces Driving Architectural Evolution
Originally published May 2, 2013
Roughly speaking, defining the architecture of a data warehouse means deciding which functionality is placed on which of its components. To understand your data warehouse today and to form a meaningful picture of its tomorrow, it helps to review the architectural evolution that data warehouses have generally followed in the past and to recognize the possibilities leading into the future. It is particularly helpful to understand the major forces that drove this evolution.
The Old Time: Manual ELT

In the good old days, before the term "data warehouse" was established, such systems went by the name of decision support systems. At that time, there were no specific, professional tools for constructing these information platforms, on which the operational data was collected, integrated, stored, and analyzed. On these platforms, data was treated in a natural way: it was extracted from the sources, loaded unchanged into the platform's database, and then transformed there, typically with hand-written SQL.
Figure 1: The Architecture in the Old Time with Manual ELT
The main strength of this ELT architecture is its high performance, particularly in the transformation phase on the information platform. The chief reason is that the transformations run as set-oriented SQL directly inside the database engine, so large data volumes never have to be moved through an external processing engine.
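The manual ELT pattern just described can be sketched as follows. This is a hypothetical illustration, not code from the article; the table names, columns, and sample rows are invented, and SQLite stands in for the warehouse database.

```python
import sqlite3

# Hypothetical sketch of manual ELT: raw source rows are first loaded
# unchanged into a staging table, and the transformation then runs as one
# set-oriented SQL statement inside the database engine itself.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stg_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)")

# E and L: extract from the source and load 1:1 into staging.
source_rows = [(1, 1250, "OK"), (2, 9900, "ok"), (3, 400, "CANCELLED")]
con.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_rows)

# T: a single set-oriented statement transforms all rows at once inside the
# database -- no row-by-row round trips through an external engine.
con.execute("""
    CREATE TABLE dwh_orders AS
    SELECT order_id,
           amount_cents / 100.0 AS amount_eur,
           UPPER(status)        AS status
    FROM stg_orders
    WHERE UPPER(status) <> 'CANCELLED'
""")

rows = con.execute(
    "SELECT order_id, amount_eur, status FROM dwh_orders ORDER BY order_id"
).fetchall()
print(rows)
```

The performance point is the `CREATE TABLE ... AS SELECT` statement: the whole transformation is delegated to the database engine in one step.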
The New Time: Tool-Aided ETL

Increasing the productivity and quality of a large number of programs, and easing their administration and documentation, were the driving forces behind the invention of numerous data integration tools in recent decades, an instance of the so-called CASE (computer-aided software engineering) movement. Most of these tools shared the same traits: graphical development environments, a central metadata repository describing the transformations, and a proprietary engine that executed those transformations outside the target database, hence the order extract, transform, load (ETL).
Figure 2: The Architecture in the New Time with Tool-Aided ETL
This so-called ETL architecture, as illustrated in Figure 2, was conceived essentially in favor of the tool vendors: the complex and sophisticated functionalities of the tools could be designed and developed independently of the diverse requirements of the multifarious information platform systems. In practice, this architecture had serious snags, above all poor performance, since large data volumes had to be moved through the external transformation engine. As a consequence, developers frequently bypassed the tools through a "side door" of hand-coded SQL that the tools' metadata did not capture.
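The round trip that makes tool-aided ETL slow can be sketched as follows. Again a hypothetical illustration with invented names; real tools add network hops and engine overhead on top of this pattern.

```python
import sqlite3

# Hypothetical sketch of the ETL round trip: the tool's own engine sits
# between source and target, so every row is extracted out of the database,
# transformed record by record in the tool process, and only then loaded.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 9900)])

tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE dwh_orders (order_id INTEGER, amount_eur REAL)")

# E: pull the data out of the database into the tool's engine ...
for order_id, amount_cents in src.execute("SELECT order_id, amount_cents FROM orders"):
    # T: ... transform it row by row in the tool process ...
    transformed = (order_id, amount_cents / 100.0)
    # L: ... and push each result back over a second connection.
    tgt.execute("INSERT INTO dwh_orders VALUES (?, ?)", transformed)

result = tgt.execute("SELECT * FROM dwh_orders ORDER BY order_id").fetchall()
print(result)
```

Contrast this with the single in-database statement of the ELT sketch earlier: here every row crosses the process boundary twice, which is exactly the performance challenge the article describes.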
The Modern Time: Tool-Aided ELT

Efforts to completely overcome the performance challenge finally led, in recent years, to a fundamental revision of the existing data integration tools and to the invention of new ones. Many representative existing data integration tools, just like the new ones, have been moving in the revised ELT direction shown in Figure 3.
In this way, the advantages expected from graphical user interfaces appear to be realized, and the performance challenge has been overcome. However, the issues of productivity, documentation, software quality, and administration are not really resolved, even for completely new data warehouses.
Figure 3: The Architecture in the Modern Time with Tool-Aided ELT
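The essence of tool-aided ELT is that the tool no longer transforms rows itself: it uses its metadata to generate SQL and pushes that SQL down to the target database for execution. A minimal sketch, with an invented metadata layout and generator function:

```python
import sqlite3

# Hypothetical sketch of tool-aided ELT: the tool holds metadata describing
# the mapping and *generates* a SQL statement from it, which the target
# database then executes. The metadata structure is invented for this sketch.
metadata = {
    "target": "dwh_orders",
    "source": "stg_orders",
    "columns": {"order_id": "order_id",
                "amount_eur": "amount_cents / 100.0"},
}

def generate_elt_sql(md):
    """Render the transformation described by the metadata as one SQL statement."""
    select_list = ", ".join(f"{expr} AS {col}" for col, expr in md["columns"].items())
    return f"CREATE TABLE {md['target']} AS SELECT {select_list} FROM {md['source']}"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stg_orders (order_id INTEGER, amount_cents INTEGER)")
con.executemany("INSERT INTO stg_orders VALUES (?, ?)", [(1, 1250), (2, 9900)])

sql = generate_elt_sql(metadata)
con.execute(sql)  # the database engine, not the tool, does the heavy lifting
result = con.execute("SELECT * FROM dwh_orders ORDER BY order_id").fetchall()
print(result)
```

This also shows why, as the next paragraph notes, hand-written "side door" SQL is a dead end for such tools: the generation only works forward, from metadata to SQL, not backward from existing SQL to metadata.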
It has to be pointed out that for existing data warehouses built with the "new time" ETL tools, the new generation of tools, i.e., the "modern time" ELT tools, cannot help much. The SQL statements hidden behind the "side door" cannot be converted automatically into the metadata these tools require. The expense of manually reengineering all such hidden SQL statements amounts to more than half the expense of constructing a completely new data warehouse of the same scale.
The Future Time: MGO-Based ELT

In my previous articles in this series, a new approach to constructing data warehouses, the metadata-driven generic operator (MGO) approach, was introduced in detail. Its workflow, as illustrated in Figure 4, is exactly the "old time" architecture described at the beginning of this article, i.e., ELT without tools. Thus it delivers the same, notably the best, performance, yet without any of the weaknesses of its ancestor. As detailed in the above-mentioned articles, it is even much better than the approaches supported by the modern, professional data integration tools on the market regarding all critical aspects: performance, productivity, documentation, software quality, and administration.
Figure 4: The Architecture in the Future Time with MGO-Based ELT
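The MGO idea can be loosely illustrated as follows. The actual operator set is defined in the book cited below; here a single invented generic "map" operator interprets metadata rows at run time, so that no program code, hand-written or tool-generated, exists per interface.

```python
import sqlite3

# Loose illustration of the metadata-driven generic operator (MGO) idea:
# instead of one program per interface, a single generic operator reads
# metadata at run time and issues the corresponding set-oriented SQL inside
# the warehouse. The operator and metadata table here are invented; the real
# MGO operators are described in the author's book and article series.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE meta_mapping (target TEXT, source TEXT, select_list TEXT)")
con.execute("CREATE TABLE stg_customers (id INTEGER, name TEXT)")
con.executemany("INSERT INTO stg_customers VALUES (?, ?)", [(1, "ada"), (2, "bob")])

# The whole "program" for this interface is one metadata row.
con.execute("INSERT INTO meta_mapping VALUES (?, ?, ?)",
            ("dwh_customers", "stg_customers", "id, UPPER(name) AS name"))

def generic_map_operator(con, target):
    """Generic operator: look up the metadata for `target` and run the load."""
    src, select_list = con.execute(
        "SELECT source, select_list FROM meta_mapping WHERE target = ?",
        (target,)).fetchone()
    con.execute(f"CREATE TABLE {target} AS SELECT {select_list} FROM {src}")

generic_map_operator(con, "dwh_customers")
result = con.execute("SELECT * FROM dwh_customers ORDER BY id").fetchall()
print(result)
```

Because the operator executes set-oriented SQL directly in the database, the ELT performance profile of the "old time" architecture is preserved, while the per-interface logic shrinks to declarative metadata.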
In fact, this approach makes constructing data warehouses with a sophisticated architecture affordable, which in turn leads to high data quality and usability even with very unfavorable data sources. Data warehouse construction thus ceases to be a big issue, even for warehouses of any type described in Defining Data Warehouse Variants by Classification that must meet all requirements enumerated in Refining the Enterprise Data Warehouse Definition by Requirements Categorization.
It is noteworthy that, from the architectural perspective, there is no essential evolutionary movement or improvement here. All the substantial improvements enumerated above are achieved by applying a completely new constructional approach, or paradigm, introduced in my article series mentioned above. In other words, once the architecture is determined, the constructional approach is decisive for the real success of the undertaking. A detailed and extensive description of the MGO paradigm can be found in my book Constructing Data Warehouses with Metadata-driven Generic Operators.
A Pictorial Summary

Figure 5 summarizes the four evolutionary phases of data warehouse architecture discussed in this article, along with the five driving forces. In fact, the validity of these results is not limited to data warehouses; they also hold for so-called enterprise data integration in general.
Figure 5: Four Evolutional Phases of Data Warehouse Architecture (B. Jiang, 2011)
A Stock Market Recommendation

Go short on the data integration tool vendors if they do not change their constructional paradigms, even in the context of so-called "big data." (Note that it may take a relatively long time for this to play out.)
Copyright 2004 — 2019. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC