
Data Warehouse Construction: Constructional Efficiency

Originally published April 4, 2013

Computational Algorithm, Problem Complexity and Computational Efficiency

In computer science, we usually use the term problem complexity to express the cost that solving a given problem incurs when the best available computational algorithm for that problem is used. The more complex a problem is, the higher the cost of solving it with that algorithm may be. Here, we are interested mainly in the time and storage that the algorithm may need. From the opposite perspective, we can investigate the time and storage that different algorithms require for solving a given problem, and thereby compare their computational efficiency.
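
To make this concrete, here is a minimal, purely illustrative Python sketch (the function names are my own, not taken from any particular source): two algorithms that solve the same membership-search problem correctly but with very different time costs. Comparing them is exactly the kind of investigation of computational efficiency meant above.

    from bisect import bisect_left

    def linear_search(sorted_values, key):
        # O(n): inspects the elements one by one.
        for value in sorted_values:
            if value == key:
                return True
            if value > key:
                return False
        return False

    def binary_search(sorted_values, key):
        # O(log n): exploits the ordering and needs far fewer comparisons.
        position = bisect_left(sorted_values, key)
        return position < len(sorted_values) and sorted_values[position] == key

    data = list(range(0, 1_000_000, 2))
    assert linear_search(data, 123456) == binary_search(data, 123456)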

Constructional Approach and Constructional Efficiency or Expense

Constructing a mechanism for updating a target table in the data warehouse using data in the corresponding source tables is also a problem (just as constructing an automobile or a hotel is). We are now interested not only in the time and storage necessary for executing the update operation, as in the classical view sketched above, but especially in the expense, i.e., the time and money, that constructing the related programs with a given constructional approach induces. In other words, we are now interested in the constructional efficiency of different constructional approaches.

Keeping this in mind, let us analyze the constructional efficiency, or the constructional expense if expressed a little pessimistically, in more detail, namely at the level of updating a single target table.

Traditional Constructional Expense Determiners

As a matter of fact, apart from the number of attributes of the target table, the following factors largely determine the constructional expense when the traditional constructional approach is applied:
  1. Are there surrogate keys in the target table?

  2. Are there foreign keys in the target table, and does the related referential integrity have to be ensured?

  3. Are there additional joins required apart from surrogate keys and foreign keys?

  4. Which type of historicizing or archiving is involved?

  5. How is error-rejection handling to be treated?

  6. How is data quality to be treated?

  7. Is it possible that the target table contains overlaps?

  8. Do rows have to wait if their parents are not yet available in the data warehouse?

  9. Is a cleaning of the source data necessary?

  10. Is a filtering of the source data required?

  11. Is the target table multi-linguistic?

  12. Is the source row involved in updating more than one target row?

  13. Does the updating of the target table require more than one source table?

  14. Is the source table involved in updating more than one target table?

  15. Does the updating of the target table require more than one source application?

  16. Is delta identification required?

  17. Is there some non-standard logic involved?

  18. Is there some nested processing?
Note that the treatment of the target table attributes in the form of representation transformation generally induces only a small fraction of the entire constructional expense, usually less than 15/85 relative to the other factors. It is thus negligible in comparison to the other expense factors enumerated above.
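
To give an impression of where this expense comes from, the following is a minimal, purely illustrative Python sketch of two fragments of a traditionally hand-coded update mechanism for a single target table. All table and column names (stg_customer, dwh_customer, dwh_region and so on) are hypothetical and the SQL is PostgreSQL-flavored; the point is only that each of the factors above turns into code that must be written, tested and maintained anew for every target table.

    def surrogate_and_foreign_key_lookup_sql() -> str:
        # Factors 1 and 2: resolve the surrogate key of the row itself and of
        # its parent (referential integrity). Factor 8 would additionally
        # require a waiting area for rows whose parents are not yet loaded.
        return """
        SELECT s.*,
               COALESCE(t.customer_sk, nextval('seq_customer_sk')) AS customer_sk,
               r.region_sk
        FROM   stg_customer s
        LEFT JOIN dwh_customer t
               ON t.customer_bk = s.customer_bk AND t.valid_to IS NULL
        LEFT JOIN dwh_region r
               ON r.region_bk = s.region_bk
        """

    def historicization_sql() -> str:
        # Factors 4 and 16: delta identification plus type-2 historicization,
        # i.e., close the currently valid row before inserting the new version.
        return """
        UPDATE dwh_customer t
        SET    valid_to = CURRENT_DATE
        FROM   stg_customer s
        WHERE  t.customer_bk = s.customer_bk
          AND  t.valid_to IS NULL
          AND  (t.name IS DISTINCT FROM s.name
                OR t.region_bk IS DISTINCT FROM s.region_bk)
        """

    # Error rejection (factor 5), data quality treatment (factor 6), filtering
    # (factor 10), multi-linguality (factor 11), nested processing (factor 18)
    # and the rest would each add further hand-written statements of this kind.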

Mechanism Complexity Measured with the MGO Approach

From the standpoint of the new approach, i.e., the metadata-driven generic operator (MGO) approach, we now have a completely new and qualitatively different situation concerning the expense assessment. That is, once the relevant MGOs are in place, the constructional expense is determined virtually exclusively by the related operative metadata. This is due to the fundamental fact that the treatment of most of the "traditional" expense factors listed above is already accomplished as the treatment of domain-generic knowledge. Stated another way, it has already been coded in the MGOs and, most importantly, it has been coded only once and for all. Consequently, it is no longer an expense consideration.
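
The following Python sketch illustrates the principle only; it is not the author's actual implementation, and all metadata fields and names in it are hypothetical. The domain-generic logic lives once in a generic operator, while the per-table constructional work shrinks to supplying operative metadata.

    OPERATIVE_METADATA = {
        "target_table":    "dwh_customer",
        "source_table":    "stg_customer",
        "business_key":    ["customer_bk"],
        "surrogate_key":   "customer_sk",                 # factor 1
        "parents":         [("dwh_region", "region_bk")], # factor 2
        "historicization": "scd2",                        # factor 4
        "wait_for_parent": True,                          # factor 8: a mere flag
        "delta_detection": True,                          # factor 16
    }

    def generic_update_operator(metadata: dict) -> None:
        """One generic operator, coded and tested once, applied to every
        target table; at run time it interprets the operative metadata."""
        if metadata.get("delta_detection"):
            pass  # generic delta identification against the target table
        if metadata.get("historicization") == "scd2":
            pass  # generic closing of the current row, insert of the new version
        if metadata.get("wait_for_parent"):
            pass  # generic waiting area for rows whose parents are missing
        # ... all remaining factors are handled in the same generic way

    generic_update_operator(OPERATIVE_METADATA)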

Based on this fact, we can classify the mechanism complexity, or the related constructional expense, of an updating mechanism for a given target table by employing the MGO approach, for instance, as follows:
  • Simple, if the updating involves no more than one user-defined metadata table1;

  • Medium, if the updating involves no more than two user-defined metadata tables;

  • Complex, if the updating involves no more than three user-defined metadata tables;

  • Very complex, if the updating involves more than three user-defined metadata tables or it has to be treated as a special case,2 a very infrequent situation.
In other words, apart from the number of target table attributes, the constructional expense with the MGO approach is mainly determined by the number of metadata tables involved in the construction. In contrast to the traditional approach, the treatment of the target table attributes in the form of representation transformation here amounts to a significant fraction of the entire constructional expense, usually larger than 85/15. It is important to point out that the respective expenses of the two approaches for this part of the treatment are comparable. Based on all this information, we should now actually be able to figure out the expense ratio between the two approaches.
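
As a small illustration only (the function below is hypothetical and not part of any tool), the classification above can be written down directly:

    def classify_mechanism(metadata_tables: int, special_case: bool = False) -> str:
        # Mechanism complexity under the MGO approach, following the
        # classification given above.
        if special_case or metadata_tables > 3:
            return "very complex"
        if metadata_tables == 3:
            return "complex"
        if metadata_tables == 2:
            return "medium"
        return "simple"

    assert classify_mechanism(1) == "simple"
    assert classify_mechanism(4) == "very complex"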

Expense Assessment Incommensurability between Constructional Paradigms

It should be noted that the two constructional approaches have essentially different determiners for their respective constructional expenses. A factor that is extremely challenging and expensive for one approach may represent only a banality for the other. For instance, constructing and thoroughly testing a mechanism for treating the eighth factor listed above for a target table may mean several working days with the traditional approach, whereas it may be only a flag setting with the MGO approach, which, in turn, means only a finger movement within a fraction of a second. As mentioned above, this is because this mechanism has already been implemented generically in the related MG operator and is applicable to every target table if required. The same holds for all factors listed above. Last but not least, it is worth mentioning that even a "very complex" updating, as classified above, with the MGO approach is generally not more complex than a usual "simple" one with the traditional approach.

It may be noted that the two expense assessment systems presented above are in fact incommensurable with each other. The reason is that they are defined in two paradigmatically3 different contexts. Between two different paradigms, things cannot always be compared meaningfully with each other, as observed by Thomas Samuel Kuhn.

Nevertheless, constructional practice delivers an effective solution to this dilemma: measuring the time and money actually needed for constructing the respective mechanisms with the same functionality and of the same quality.

Conclusion

This and my previous articles are dedicated to a thorough analysis and detailed explanation of the reasons why the MGO approach can save so much time and money in constructing sophisticated data warehouses compared with the traditional approaches. As a matter of fact, the result of this conceptual work has also been confirmed, with a productivity improvement factor of twenty in terms of time and money, by large-scale realizations (B. Jiang, 2011). Last but not least, the MGO approach ensures a significantly better quality extract-transform-load mechanism, the core of any data warehouse, in terms of the mechanism's:
  • Documentability

  • Maintainability

  • Administratability

  • Agility

  • Stability

  • Performance
In my next article, I will have a look at some of the benefits of this approach that we have not yet addressed in detail.

References
  1. User-defined metadata tables are described, as user/tool-defined catalogs for operative metadata, in Metathink: An Enterprise-Wide Single Version of the Truth, and Beyond and in Data Warehouse Construction: Generator, Generic Knowledge and Operative Metadata.

  2. The treatment of special cases is detailed in Constructing Data Warehouses with Metadata-Driven Generic Operators, and More (B. Jiang, DBJ Publishing, 2011).

  3. The paradigm discussion is given in Data Warehouse Construction: A Paradigm Shift?


  • Bin Jiang, Ph.D.
    Dr. Bin Jiang received his master's degree in Computer Science from the University of Dortmund / Germany in 1986. In 1992, he received his doctorate in Computer Science from ETH Zurich / Switzerland. During the research period, two of his publications in the field of database management systems were awarded as the best student papers at the IEEE Conference on Data Engineering in 1990 and 1992.

    Afterward, he worked for several major Swiss banks, insurance companies, retailers, and with one of the largest international data warehousing consulting firms as a system engineer, software developer, and application analyst in the early years, and then as a senior data warehouse consultant and architect for almost twenty years.

    Dr. Bin Jiang is a Distinguished Professor of a large university in China, and the author of the book Constructing Data Warehouses with Metadata-driven Generic Operators (DBJ Publishing, July 2011), which Dr. Claudia Imhoff called "a significant feat" and for which Bill Inmon provided a remarkable foreword. Dr. Jiang can be reached by email at bin.jiang@bluewin.ch.

    Editor's Note: You can find more articles from Dr. Bin Jiang and a link to his blog in his BeyeNETWORK expert channel, Data Warehouse Realization.

