In my article Data Warehouse Construction: Behavior Pattern Analysis, we observed the usual constructional behaviors with a lab task of emptying all 1000 tables in a database "my_db." There, we showed that the employed approach was not effective and analyzed the reasons for this. In my article Data Warehouse Construction: The Real Life, we showed that the insights obtained there are relevant to the real life data warehouse construction today. In this article, we go back to the laboratory to consider some fundamentals of another type of constructional approach, the metadata-driven generic one.
Generator: Another Solution for the Lab Task
Assume that you know some basics of SQL (structured query language). Then, you should know what the following SQL statement does if it is executed in an SQL execution environment:
SELECT 'DELETE FROM my_db.' || tablename || ';'
WHERE databasename = 'my_db';
Actually, this statement perfectly generates all DELETE statements listed in the lab example discussed in Data Warehouse Construction: Behavior Pattern Analysis
as the execution result in a fraction of one second! They are namely:
DELETE FROM my_db.tab_adfhkfha;
DELETE FROM my_db.tab_lkeasr;
DELETE FROM my_db.tab_dajpgh;
Here is a little technical detail for a better understanding of the story.
- When being executed, the above SQL statement looks up table "tables" located in the catalog database "my_catalog," fetches all records in it about the tables that are hosted in the database "my_db" and takes the names of these tables (tablename) in the hand. For each of these table names, it constructs a character string by concatenating three string portions: "DELETE FROM my_db.," the current value of the variable "tablename" from the 1000 table names just taken in the hand and the character ";" for a syntactically correct end of every SQL statement.
- Before we do this, we need to put information about the related tables into the table catalog, i.e., table "tables" located in the catalog database "my_catalog." Each of these tables has a corresponding record in this catalog table with at least two columns, "databasename" and "tablename," denoting the hosting database and the table in consideration, respectively.
It is easy to verify that each of such a character string corresponds to one of the DELETE statements desired. In this sense, the SQL statement in consideration is a statement generator
, although it is a simple one. What you still need to do is simply execute these generated statements. Note that there is no cold-blooded "copy," no powerful "paste," no clever "search," no elegant "replace," no noxious "adjust" and no skeptical "verify" anymore. In short, there is no operations repetition (discussed in Data Warehouse Construction: Behavior Pattern Analysis
) at all!
Now, let us have a close look at the components of this statement generator. The first portion 'DELETE FROM my_db.' and the third one ';' of the complete character string to be constructed by the generator are constant
. If the traditional approach is applied, they are the repeated contents
; in particular, they are the major motivation for applying the operations pair "copy & paste" 999 times. In the terms introduced in the article mentioned above, they are carriers of certain domain-generic knowledge
, regarding the SQL-syntax and regarding the programming style conventions in our case. The second portion is variabl
e. Its concrete value, i.e., the table name, is specific for each individual table. With the traditional approach, this portion is where operations chain "search-replace-adjust-verify" is applied, again 999 times. In the terms introduced in the article mentioned above, this portion represents the carriers of certain object-specific knowledge
, i.e., the table name in our case.
What are the essential differences between the two approaches regarding constructional behaviors?
With the new approach, the carriers of domain-generic knowledge are touched/edited
only once, instead of the additional 999 times of manual
"copy & paste" with the traditional approach. In other words, the domain-generic knowledge is centralized and, thus, not distributed with the new approach. Although the 1000 DELETE statements generated contain these carriers, they are in principle not to be read, understood and changed, just as with a piece of binary code. This way, the carriers of generic knowledge remain to be such, and no noxious "adjust" can change this state.
In Data Warehouse Construction: Behavior Pattern Analysis
, we introduced domain-generic knowledge in a very thrifty form and used it to analyze the behaviors in data warehouse construction. In general, with generic knowledge we mean such that shows a general validity and applicability in a given domain of interest
. Therefore, we call it domain-generic knowledge. A domain in our context is a well-defined system area such as a source application, the data warehouse, and so forth. Representatives of the domain-generic knowledge are the data warehouse architecture guidelines, the programming style conventions and, in particular, the algorithms employed. In fact, the latter generally represents the major portion of the domain-generic knowledge involved in data warehouse construction. As an important matter of fact, the architect team produces nothing but the domain-generic knowledge
With the new approach, we know exactly where
the object-specific knowledge is to be placed. Thus, there is no need for the operations pair "search & replace." Moreover, we know exactly which
object-specific knowledge is to be used, and we always get the correct one. Hence, there is no need for the operations pair "adjust & verify." In short, there is no need for the expensive and time-consuming operations repetition analyzed in Data Warehouse Construction: Behavior Pattern Analysis
. Therefore, the new approach is substantially more effective than the traditional one.
But how do we obtain such object-specific knowledge? With the second point of the above explanation, we could have left an impression that we would manually put information about the related tables into the catalog table. If this would be the case, our life as data warehouse constructors would not be improved substantially with the new approach. This is because the manual acquisitions are generally error-prone as well, and we have to "adjust & verify" their results individually.
Actually, we obtain this information perfectly for free! When a table is created in a given host database such as "my_db" using any database management system available today, information like table name and the name of the database hosting this table is stored by the system automatically
as a record in the so-called system catalog
table "tables" or the like. If the table is dropped later, this record will be removed from the system catalog table automatically
. In short, a table exists in a database, if and only if there is a corresponding record in the system catalog table "tables." This information exists in the system catalog without being perceived by us, and many people are indeed not aware of its existence. To make the story perfect, therefore, the above statement generator should look like
SELECT 'DELETE FROM my_db.' || tablename || ';'
WHERE databasename = 'my_db';
_catalog" is replaced by "system
Actually, the so-called "information" mentioned above, regardless of whether it is stored in "my_catalog" or in "system_catalog," is nothing but our object-specific knowledge. As a matter of fact, the object-specific knowledge, in turn, is nothing but the so-called operative metadata.
Operative Metadata Operative metadata
defines operative/system objects and the relationships among them in the system, and determines the system behavior or state therewith. It is, thus, indispensable for the well-functioning of the system in consideration. Examples of operative metadata are the column list of a table, or the column mappings from a source table to a target table. The former is usually stored in the system catalog
, e.g., in "system_catalog" above, and maintained by the system automatically
, whereas the latter is stored in the user/tool-defined catalog
, e.g., in "my_catalog" above, and maintained by the system constructors manually
. For business users, operative metadata is data that is stored in the systems somewhere. To them, it is data they do not always understand and are, therefore, not interested in.
It is noteworthy that operative metadata is object-specific
by definition. As a matter of fact, the activities in constructing data warehouses that are not repetitive are related with operative metadata. It has to be treated individually and specifically for each of the concrete objects, e.g., tables or mappings. This is the reason why we said previously that the object-specific knowledge is nothing but the operative metadata
In the terms explained above, the new constructional approach type mentioned at the beginning of this article, i.e., the metadata-driven generic one, could be now called an operative-metadata-driven domain-generic-knowledge-centralized
approach type. This designation is a little long but complete.
In this article, we observed actually only one of the possibilities of this approach type. In the upcoming articles, we will investigate another possibility, an even more effective one. In fact, this approach type is nothing but a new constructional paradigm
, in the sense of Thomas Samuel Kuhn (1922 – 1996)
, as will be discussed in my next article. References
- "Domain-generic knowledge" is a concept introduced in Constructing Data Warehouses with Metadata-Driven Generic Operators, and More (B. Jiang, DBJ Publishing, 2011). In this book, you can find a huge amount of domain-generic knowledge in forms of, for instance, sophisticated detailed reference architecture for enterprise data warehouses and more than two dozen generally applicable algorithms frequently found in diverse data warehouse realizations.
- "Operative metadata," a concept introduced in Constructing Data Warehouses with Metadata-Driven Generic Operators, and More (B. Jiang, DBJ Publishing, 2011) as well, is more detailed discussed in Metathink: An Enterprise-Wide Single Version of the Truth, and Beyond on BeyeNETWORK.com.
SOURCE: Data Warehouse Construction: Generator, Generic Knowledge and Operative Metadata
Recent articles by Bin Jiang, Ph.D.