Data Warehouse Construction: Generator, Generic Knowledge and Operative Metadata
Originally published December 6, 2012
In my article Data Warehouse Construction: Behavior Pattern Analysis, we observed the usual constructional behaviors with a lab task of emptying all 1000 tables in a database "my_db." There, we showed that the approach employed was not effective and analyzed the reasons why. In my article Data Warehouse Construction: The Real Life, we showed that the insights obtained there are relevant to real-life data warehouse construction today. In this article, we go back to the laboratory to consider some fundamentals of another type of constructional approach: the metadata-driven generic one.
Generator: Another Solution for the Lab Task

Assume that you know some basics of SQL (Structured Query Language). Then you should know what the following SQL statement does when it is executed in an SQL execution environment:
SELECT 'DELETE FROM my_db.' || tablename || ';'
FROM my_catalog.tables
WHERE databasename = 'my_db';

Actually, this statement generates as its execution result, in a fraction of a second, all of the DELETE statements listed in the lab example discussed in Data Warehouse Construction: Behavior Pattern Analysis. They are namely:
DELETE FROM my_db.tab_adfhkfha;

and so on, one statement for each of the 1000 tables.

Here is a little technical detail for a better understanding of the story.
Now, let us have a closer look at the components of this statement generator. The first portion, 'DELETE FROM my_db.', and the third portion, ';', of the complete character string constructed by the generator are constant. With the traditional approach, they are the repeated content; in particular, they are the main motivation for applying the operations pair "copy & paste" 999 times. In the terms introduced in the article mentioned above, they are carriers of certain domain-generic knowledge, in our case regarding SQL syntax and the programming style conventions. The second portion is variable. Its concrete value, i.e., the table name, is specific to each individual table. With the traditional approach, this portion is where the operations chain "search-replace-adjust-verify" is applied, again 999 times. In the terms introduced in the article mentioned above, this portion represents the carrier of certain object-specific knowledge, i.e., the table name in our case.
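To make the mechanics concrete, here is a minimal runnable sketch of the same generator idea, using Python with an in-memory SQLite database as a stand-in for the article's SQL environment. The SQLite catalog "sqlite_master" plays the role of the catalog table, and the extra table names are invented for illustration; the catalog and column names in a production database system will differ.

```python
import sqlite3

# In-memory SQLite database standing in for "my_db"; the table
# names here (beyond tab_adfhkfha from the article) are invented.
conn = sqlite3.connect(":memory:")
for name in ("tab_adfhkfha", "tab_foo", "tab_bar"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.execute(f"INSERT INTO {name} VALUES (1)")

# The generator: the constant portions 'DELETE FROM my_db.' and ';'
# are written exactly once; the variable portion (the table name)
# is supplied by the system catalog, here sqlite_master.
rows = conn.execute(
    "SELECT 'DELETE FROM my_db.' || name || ';' "
    "FROM sqlite_master WHERE type = 'table'"
).fetchall()
statements = [r[0] for r in rows]
print(statements)
```

Note that the constant carriers of generic knowledge appear once in the generator, no matter whether the catalog lists 3 tables or 1000.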
What are the essential differences between the two approaches regarding constructional behaviors?
Domain-Generic Knowledge

With the new approach, the carriers of domain-generic knowledge are touched and edited only once, instead of an additional 999 times of manual "copy & paste" as with the traditional approach. In other words, with the new approach the domain-generic knowledge is centralized rather than distributed. Although the 1000 generated DELETE statements contain these carriers, they are in principle not meant to be read, understood or changed, just like a piece of binary code. This way, the carriers of generic knowledge remain exactly that, and no noxious "adjust" can change this state.
In Data Warehouse Construction: Behavior Pattern Analysis, we introduced domain-generic knowledge in a very thrifty form and used it to analyze the behaviors in data warehouse construction. In general, by generic knowledge we mean knowledge that has general validity and applicability in a given domain of interest; therefore, we call it domain-generic knowledge. A domain in our context is a well-defined system area such as a source application, the data warehouse, and so forth. Representatives of domain-generic knowledge are the data warehouse architecture guidelines, the programming style conventions and, in particular, the algorithms employed. In fact, the latter generally represent the major portion of the domain-generic knowledge involved in data warehouse construction. Notably, the architect team produces nothing but domain-generic knowledge.
Object-Specific Knowledge

With the new approach, we know exactly where the object-specific knowledge is to be placed; thus, there is no need for the operations pair "search & replace." Moreover, we know exactly which object-specific knowledge is to be used, and we always get the correct piece; hence, there is no need for the operations pair "adjust & verify." In short, there is no need for the expensive and time-consuming repetition of operations analyzed in Data Warehouse Construction: Behavior Pattern Analysis. The new approach is therefore substantially more effective than the traditional one.
But how do we obtain such object-specific knowledge? The explanation above could have left the impression that we manually put information about the related tables into the catalog table. If this were the case, our life as data warehouse constructors would not be improved substantially by the new approach, because manual acquisition is generally just as error-prone, and we would have to "adjust & verify" its results individually.
Actually, we obtain this information for free! When a table is created in a given host database such as "my_db" using any database management system available today, information such as the table name and the name of the database hosting the table is stored by the system automatically and reliably as a record in the so-called system catalog table "tables" or the like. If the table is dropped later, this record is removed from the system catalog automatically and completely. In short, a table exists in a database if and only if there is a corresponding record in the system catalog table "tables." This information exists in the system catalog without our noticing it, and many people are indeed not aware of its existence. To make the story perfect, therefore, the above statement generator should look like:
SELECT 'DELETE FROM my_db.' || tablename || ';'
FROM system_catalog.tables
WHERE databasename = 'my_db';

That is, "my_catalog" in the earlier generator is replaced by "system_catalog."
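The automatic bookkeeping described above can be observed directly. The following sketch again uses SQLite's system catalog "sqlite_master" as a stand-in (the catalog name and columns in other database systems differ), checking the catalog before creating, after creating, and after dropping a table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def in_catalog(name):
    # A table exists iff it has a record in the system catalog
    # (sqlite_master in SQLite; "system_catalog.tables" in the article).
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None

before = in_catalog("tab_demo")            # not created yet: no record
conn.execute("CREATE TABLE tab_demo (id INTEGER)")
created = in_catalog("tab_demo")           # record added automatically
conn.execute("DROP TABLE tab_demo")
dropped = in_catalog("tab_demo")           # record removed automatically
```

No manual acquisition step appears anywhere: the system maintains this object-specific knowledge as a side effect of CREATE and DROP.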
Actually, the "information" mentioned above, regardless of whether it is stored in "my_catalog" or in "system_catalog," is nothing but our object-specific knowledge. And the object-specific knowledge, in turn, is nothing but the so-called operative metadata.
Operative Metadata

Operative metadata defines the operative/system objects and the relationships among them in a system, and thereby determines the system's behavior or state. It is thus indispensable for the proper functioning of the system under consideration. Examples of operative metadata are the column list of a table, or the column mappings from a source table to a target table. The former is usually stored in the system catalog, e.g., in "system_catalog" above, and maintained by the system automatically, whereas the latter is stored in a user- or tool-defined catalog, e.g., in "my_catalog" above, and maintained by the system constructors manually. For business users, operative metadata is simply data stored somewhere in the systems: data they do not always understand and are therefore not interested in.
It is noteworthy that operative metadata is object-specific by definition. In fact, the non-repetitive activities in constructing data warehouses are precisely those related to operative metadata: it has to be treated individually and specifically for each concrete object, e.g., each table or mapping. This is why we said above that the object-specific knowledge is nothing but the operative metadata.
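As an illustration of the user-maintained kind of operative metadata driving generation, the following Python sketch reads a hypothetical mapping catalog (all table and column names are invented for this example) and generates one INSERT…SELECT statement per source/target pair; the SQL skeleton plays the role of the centralized domain-generic knowledge:

```python
from collections import defaultdict

# Hypothetical user-defined catalog of column mappings (operative
# metadata): (source table, source column, target table, target column).
mappings = [
    ("src_orders", "ord_id",  "dwh_orders", "order_id"),
    ("src_orders", "ord_amt", "dwh_orders", "amount"),
]

# Group the mappings by (source, target) table pair.
pairs = defaultdict(list)
for src_t, src_c, tgt_t, tgt_c in mappings:
    pairs[(src_t, tgt_t)].append((src_c, tgt_c))

# The INSERT..SELECT skeleton below is the domain-generic knowledge,
# written exactly once; the catalog supplies all object-specific names.
statements = []
for (src_t, tgt_t), cols in pairs.items():
    tgt_cols = ", ".join(t for _, t in cols)
    src_cols = ", ".join(s for s, _ in cols)
    statements.append(
        f"INSERT INTO {tgt_t} ({tgt_cols}) SELECT {src_cols} FROM {src_t};"
    )
```

Adding a new mapping row to the catalog extends the generated load code with no "copy & paste" and no "search-replace-adjust-verify."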
In the terms explained above, the new type of constructional approach mentioned at the beginning of this article, i.e., the metadata-driven generic one, could now be called an operative-metadata-driven, domain-generic-knowledge-centralized approach type. This designation is a little long, but complete.
In this article, we actually examined only one of the possibilities of this approach type. In upcoming articles, we will investigate another, even more effective possibility. In fact, this approach type is nothing but a new constructional paradigm in the sense of Thomas Samuel Kuhn (1922 – 1996), as will be discussed in my next article.
By Bin Jiang, Ph.D.
Copyright 2004 — 2019. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC