Data Warehouse Construction: Compiler, Interpreter and Operator

Originally published February 7, 2013

In my article Data Warehouse Construction: Generator, Generic Knowledge and Operative Metadata, I used a straightforward statement generator as a didactic aid for introducing several basic concepts of a new constructional paradigm for data warehouses. That paradigm was discussed further in my article Data Warehouse Construction: A Constructional Paradigm Shift? The generator chosen is so trivial, however, that it may be hard to believe such a thing represents a serious paradigm for constructing sophisticated enterprise data warehouses. In fact, it does.
 
In data warehousing practice, we often encounter self-developed "script generators," "program generators" or the like for certain tasks. Although they can be significantly more complex than the statement generator mentioned above (for instance, I know of a "code generator" of 40,000 lines that generates SQL code for updating bitemporally historicized complex dimensions), they are the same in essence: they are all centralizing containers of some kind of domain-generic knowledge. Generators, as metadata-driven generic programs, are one way of centralizing generic knowledge. In this article, we introduce and analyze another: metadata-driven generic operators.

Essence of Operative Metadata

In Metathink: An Enterprise-Wide Single Version of the Truth, and Beyond, I mentioned that operative metadata is almost always stored in a "structured" form in the system for performance reasons. In our context, the most commonly used "structured" form is the table, such as the tables holding mappings for constructing extract-transform-load (ETL) programs. Here, the structure of a table is nothing but the syntax for the rows stored in the table:
  • Each row in the table is a syntactically correct sentence of the "language" defined by the underlying syntax/structure of the table.

  • The order of and the relationships among the attributes of a row, i.e., the components of the corresponding sentence, are meaningful and context-sensitive. They are stipulated by the underlying syntax/structure of the table.

  • The contextual semantics of such a sentence/row can only be realized/understood by utilizing a corresponding interpreting mechanism such as an interpreter or a compiler.

  • Different interpreting mechanisms can pick out different semantics from the same sentence/row. Theoretically, all of them are meaningful. In other words, it is the interpreting mechanisms that determine the semantics of the given sentence/row.

  • On the other hand, interpreting mechanisms with different appearances may pick out the same semantics from a given sentence/row. In other words, it is not the appearances but the contents of the interpreting mechanisms that are decisive for the semantics of the sentence/row.

  • In fact, only part of the whole semantics of a sentence/row is determined by its content. The rest is contained in the corresponding interpreting mechanism as some kind of generic knowledge and is thus determined by that mechanism.
We call such special tabular languages for formulating operative metadata "metadata languages," and the tables holding operative metadata "metadata tables."
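The points above can be illustrated with a small sketch. It assumes a hypothetical four-column mapping table; the names `MappingRow`, `as_load_statement` and `as_lineage_note`, as well as the table and column names, are invented for this illustration and do not come from any particular system. The same row is read by two interpreting mechanisms, each of which contributes its own generic knowledge and therefore realizes different semantics.

```python
from typing import NamedTuple


class MappingRow(NamedTuple):
    """One sentence of a hypothetical metadata language.

    The field order plays the role of the table's syntax.
    """
    source_table: str
    source_column: str
    target_table: str
    target_column: str


def as_load_statement(row: MappingRow) -> str:
    """Interpreting mechanism 1: read the row as an instruction to copy data."""
    return (
        f"INSERT INTO {row.target_table} ({row.target_column}) "
        f"SELECT {row.source_column} FROM {row.source_table}"
    )


def as_lineage_note(row: MappingRow) -> str:
    """Interpreting mechanism 2: read the same row as a data lineage statement."""
    return (
        f"{row.target_table}.{row.target_column} is derived from "
        f"{row.source_table}.{row.source_column}"
    )


# A single row of operative metadata (illustrative values).
row = MappingRow("stg_customer", "cust_name", "dim_customer", "name")

print(as_load_statement(row))
print(as_lineage_note(row))
```

The row itself contains only four names. That it means "copy this column" in one case and "this column depends on that one" in the other is knowledge held entirely by the respective function.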

Essence of Metadata-Driven Generic Programs

Compiler and Generator

A compiler is a program that takes programs written in the corresponding language as input, and translates this input into other programs that can be understood by the computer or by a program at a lower level. The programs generated this way are not executed by the compiler itself. They can be distributed to locations where the corresponding lower level program or the computer for their execution is available. There, these programs are then executed. In fact, a compiler of a language is a generator of programs in a language at a lower level. These generated lower level programs represent the semantics of the corresponding original input programs.

Script generators, often encountered in data warehousing practice, basically have the following characteristics:
  • They are intended only for a specific type of task and are usable in a given domain.

  • The task-type-specific, domain-generic knowledge is encapsulated in the generators.

  • They generate object-specific, executable scripts composed of SQL statements for a specific task based on the following input information:
      • The object-specific parameter values provided with the invocation

      • The object-specific operative metadata meeting certain specific tabular syntax, i.e., rows stored in certain metadata tables or sentences of the corresponding metadata language

      • The task-type-specific, domain-generic knowledge encapsulated in the generators
  • The generated scripts, representing the semantics of the input sentences, are then distributed for later execution.
In this sense, script generators are the compilers of the corresponding metadata languages, while SQL is the lower level language.
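A minimal sketch of such a generator, assuming a hypothetical mapping table and a "full reload" task type, might look as follows. The names `MappingRow`, `METADATA` and `generate_load_script` and all table names are illustrative assumptions. The generator only returns a script as text; executing it is left to whoever receives it, which is what makes it a compiler in the sense described above.

```python
from typing import Iterable, NamedTuple


class MappingRow(NamedTuple):
    """One row of a hypothetical metadata table describing a column mapping."""
    target_table: str
    target_column: str
    source_table: str
    source_column: str


# Object-specific operative metadata (illustrative values).
METADATA = [
    MappingRow("dim_customer", "id", "stg_customer", "cust_id"),
    MappingRow("dim_customer", "name", "stg_customer", "cust_name"),
    MappingRow("dim_product", "code", "stg_product", "prod_code"),
]


def generate_load_script(target_table: str, metadata: Iterable[MappingRow]) -> str:
    """Generate, but do not execute, a full-reload script for one target table.

    The task-type-specific, domain-generic knowledge encapsulated here is:
    "a full reload empties the target and refills it from its source".
    """
    rows = [m for m in metadata if m.target_table == target_table]
    if not rows:
        raise ValueError(f"no mapping metadata for {target_table!r}")
    source_tables = {m.source_table for m in rows}
    if len(source_tables) != 1:
        raise ValueError("this sketch supports exactly one source table per target")
    (source_table,) = source_tables
    target_cols = ", ".join(m.target_column for m in rows)
    source_cols = ", ".join(m.source_column for m in rows)
    return (
        f"DELETE FROM {target_table};\n"
        f"INSERT INTO {target_table} ({target_cols})\n"
        f"SELECT {source_cols} FROM {source_table};\n"
    )


# The parameter value "dim_customer" selects the object; the result is a
# plain-text SQL script that can be stored and distributed for later execution.
print(generate_load_script("dim_customer", METADATA))
```

Calling the same function with `"dim_product"` yields a different script from the same generic knowledge, with no change to the generator itself.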

Interpreter and Operator 

An interpreter is a program too, but it does a little more than a compiler. An interpreter of a language not only generates executable programs from the input program, but it also executes these programs by itself for certain processing tasks immediately after the generation. This way, it realizes the semantics of the input program.

In algebras, operators operate on operands and generate results that can, in turn, serve as operands for further operations. In relational algebra, for instance, operators like selection, projection and product operate on tables and generate new tables that can again be used as operands.

In the data warehouse context, the operands are mostly files and tables, and the operators are ETL programs. With the traditional paradigm, ETL programs can be regarded as operand-specific operators. That is, each ETL program can be applied only to certain specific operands. With the new paradigm, each operator is domain-generic and can thus be applied to all operands in a given domain, as far as this is meaningful. This is quite similar to the relational algebraic operators. It is the corresponding operand-specific knowledge, i.e., the operative metadata, that makes the effect of the operation operand-specific. As discussed in Metathink: An Enterprise-Wide Single Version of the Truth, and Beyond, we can regard this effect as the semantics of the related operative metadata, i.e., of the corresponding sentences of the underlying metadata language. From this viewpoint, the (operative) metadata-driven generic operators (MGOs) of the new paradigm are essentially nothing but task-type-specific interpreters of these metadata languages.

In the data warehouse context, we regard operands as objects. In these terms, the interpreters/operators under discussion have the following characteristics:
  • They generate object-specific, executable scripts composed of SQL statements for specific tasks exactly as the corresponding compilers/generators do.

  • They pass the generated SQL statements to the SQL interpreter of the underlying database management system. The latter then executes these SQL statements immediately to accomplish the tasks.
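The two characteristics above can be sketched with Python's built-in `sqlite3` module standing in for the underlying database management system. The metadata table `etl_mapping`, the operator `full_reload` and all other table and column names are hypothetical and chosen only for this illustration. The operator generates the statements from the metadata and hands them to the database for immediate execution.

```python
import sqlite3


def generate_statements(conn: sqlite3.Connection, target_table: str) -> list[str]:
    """Generate the SQL statements for a full reload from the metadata table."""
    rows = conn.execute(
        "SELECT source_table, source_column, target_column "
        "FROM etl_mapping WHERE target_table = ? ORDER BY position",
        (target_table,),
    ).fetchall()
    if not rows:
        raise ValueError(f"no mapping metadata for {target_table!r}")
    if len({r[0] for r in rows}) != 1:
        raise ValueError("this sketch supports exactly one source table per target")
    source_table = rows[0][0]
    source_cols = ", ".join(r[1] for r in rows)
    target_cols = ", ".join(r[2] for r in rows)
    return [
        f"DELETE FROM {target_table}",
        f"INSERT INTO {target_table} ({target_cols}) "
        f"SELECT {source_cols} FROM {source_table}",
    ]


def full_reload(conn: sqlite3.Connection, target_table: str) -> None:
    """Generic operator: generate the statements and execute them at once."""
    with conn:  # one transaction; committed on success, rolled back on error
        for statement in generate_statements(conn, target_table):
            conn.execute(statement)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE etl_mapping (
    target_table TEXT, position INTEGER,
    source_table TEXT, source_column TEXT, target_column TEXT
);
CREATE TABLE stg_customer (cust_id INTEGER, cust_name TEXT);
CREATE TABLE dim_customer (id INTEGER, name TEXT);
CREATE TABLE stg_product (prod_code TEXT);
CREATE TABLE dim_product (code TEXT);
INSERT INTO etl_mapping VALUES
    ('dim_customer', 1, 'stg_customer', 'cust_id', 'id'),
    ('dim_customer', 2, 'stg_customer', 'cust_name', 'name'),
    ('dim_product', 1, 'stg_product', 'prod_code', 'code');
INSERT INTO stg_customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO stg_product VALUES ('P-100');
""")

# The same operator is applied to two different operands (objects);
# only the operative metadata makes its effect object-specific.
full_reload(conn, "dim_customer")
full_reload(conn, "dim_product")

print(conn.execute("SELECT id, name FROM dim_customer ORDER BY id").fetchall())
print(conn.execute("SELECT code FROM dim_product").fetchall())
```

Because the operator reads the metadata at the moment of invocation, a change to a row in `etl_mapping` takes effect on the next call without any script having to be regenerated or redistributed.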

Compiler/Generator or Interpreter/Operator

In theory, there is no essential difference between compilers and interpreters or between generators and operators. In practice, however, the difference is substantial, especially from the perspective of software administration and operations. In my next article, I will analyze this aspect in detail.

  • Bin Jiang, Ph.D.
    Dr. Bin Jiang received his master’s degree in Computer Science from the University of Dortmund / Germany in 1986. In 1992, he received his doctorate in Computer Science from ETH Zurich / Switzerland. During the research period, two of his publications in the field of database management systems were awarded as the best student papers at the IEEE Conference on Data Engineering in 1990 and 1992.

    Afterward, he worked for several major Swiss banks, insurance companies, retailers, and with one of the largest international data warehousing consulting firms as a system engineer, software developer, and application analyst in the early years, and then as a senior data warehouse consultant and architect for almost twenty years.

    Dr. Bin Jiang is a Distinguished Professor of a large university in China, and the author of the book Constructing Data Warehouses with Metadata-driven Generic Operators (DBJ Publishing, July 2011), which Dr. Claudia Imhoff called “a significant feat” and for which Bill Inmon provided a remarkable foreword. Dr. Jiang can be reached by email at bin.jiang@bluewin.ch

    Editor's Note: You can find more articles from Dr. Bin Jiang and a link to his blog in his BeyeNETWORK expert channel, Data Warehouse Realization.
