Refining the Enterprise Data Warehouse Definition by Requirements Categorization

Originally published July 12, 2012

As mentioned in my last article, the enterprise data warehouse is of strategic importance for every organization striving to improve its performance. To accomplish its mission, the enterprise data warehouse must meet a number of fundamental requirements, which are discussed in this article.

In 1001 Data Warehouse Definitions: Is This One Accurate?, I proposed using verbs for specifying functionalities of engineered objects to make their definitions more objective, restrictive, concrete, direct and precise. With such definitions, it is easy to judge whether an object fits the definition or not. In engineering practice, however, this is usually not sufficient. We want to know how good the engineered product is and how satisfied our customers are with it. For this, we usually need (nominalized) adjectives that are more or less measurable and thus stable and objective.

As a matter of fact, the fundamental requirements a system has to meet help us characterize the system, frequently by using adjectives, and thus refine its definition qualitatively and generally. Moreover, the degree of fulfillment of the requirements represents the goodness of the realized system quantitatively and individually.

To make the system requirements of diverse natures more comprehensible and easier to remember, I put them into five categories: functional requirements, informational requirements, operational requirements, economical requirements and security requirements.

Functional/CTO Requirements

This category considers the functionalities that an enterprise data warehouse is required to provide for the enterprise technical infrastructure of the organization.

Data collection. The data warehouse collects operational data, i.e., the data produced by the operational applications, and represents a repository of the business history of the organization. On the other hand, it should not be regarded as a classical (offline) archive because of its other functionalities.

Data integration. In general, the business enabler of the organization consists of more than one operational application. For various reasons, the data generated by these operational applications differs in format, structure and semantics. Thus, the data warehouse is required to integrate all this data so that it has a uniform appearance and a consistent interpretation and can be easily compared and combined.

Data preparation. The data in the data warehouse will be used for subsequent analysis. Thus, it has to be prepared so the analysis can be carried out without much additional effort, such as data reorganization and quality improvement. In other words, the data must be made usable for the analysis as directly as possible. 

The person in the organization who is most concerned about these functionalities from the perspective of the enterprise technical infrastructure is the chief technical officer (CTO). Therefore, we also call this category “CTO requirements.”

Informational/CEO Requirements

This category takes the content of the data warehouse into consideration, i.e., the information that the data in the data warehouse carries.

Correctness. The data in the data warehouse must be correct, consistent and trustworthy. It must semantically mirror its counterpart in the source applications. A data warehouse is not required to correct data errors produced by the source applications, although doing so is desirable. On the other hand, data in the data warehouse must not be modified without a guarantee of correctness.

Completeness. The scope of the data in the data warehouse must meet the business needs. There are three dimensions regarding this aspect:
  • All analysis-relevant data objects of sufficient accuracy must be available in the data warehouse. If not, new data sources must be added to the data warehouse.

  • The history of each object must be sufficiently long to satisfy the analysis’ needs.

  • Within the history of each object, no erroneous gaps are allowed.

Traceability. Every data element stored in the data warehouse must be traceable. For instance, for any given column value stored in the data warehouse, we can figure out from which source columns, corresponding tables, databases, and source applications it originated. Moreover, it is desirable to be able to reproduce every data element stored in the data warehouse, regardless of whether it is big or small, with affordable effort.
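The traceability requirement can be illustrated with a small lineage lookup. This is only a minimal sketch under stated assumptions: the `SourceColumn` record, the `LINEAGE` mapping, the `trace` function, and all table, column, database, and application names are hypothetical and chosen for illustration; a real data warehouse would typically keep such information in its metadata repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceColumn:
    """One source column a warehouse column value originates from."""
    application: str
    database: str
    table: str
    column: str


# Hypothetical lineage metadata: (warehouse table, warehouse column)
# mapped to the source columns it is derived from.
LINEAGE = {
    ("dw_customer", "full_name"): [
        SourceColumn("crm", "crm_db", "person", "first_name"),
        SourceColumn("crm", "crm_db", "person", "last_name"),
    ],
    ("dw_customer", "credit_limit"): [
        SourceColumn("billing", "billing_db", "account", "limit_amount"),
    ],
}


def trace(table, column):
    """Return the source columns recorded for a warehouse column.

    Raises LookupError if no lineage is recorded, i.e., the column
    is not traceable.
    """
    try:
        return LINEAGE[(table, column)]
    except KeyError:
        raise LookupError(f"no lineage recorded for {table}.{column}") from None


for source in trace("dw_customer", "full_name"):
    print(source.application, source.database, source.table, source.column)
```

A column without an entry in the mapping raises an error, which makes a traceability gap visible instead of silently returning nothing.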

The quality of the information provided by the data warehouse is directly related to the quality of the business decisions, which ultimately concerns the chief executive officer (CEO): The CEO expects nothing from the data warehouse other than good and trustworthy information derived from the data stored there. Thus, we also call this category “CEO requirements.”

Operational/CIO Requirements

Like other IT infrastructures, the data warehouse must be operated by the organization. This category concerns its operational aspects.

Availability. The data warehouse must be available when the business needs it. Furthermore, the data must be current. How current, however, depends strongly on the concrete business needs. For some business needs, the data must be no more than five minutes old, whereas for others, data that is up to one week old is satisfactory.

Performance. The data warehouse must be able to answer the business requests for information fairly quickly. This is an availability of fine granularity.

Safety. The data stored in the data warehouse must be safe against any incidents. This is an availability of coarse granularity.

If the enterprise data warehouse, like any other IT infrastructure, does not run well or is not effectively available, the most embarrassed person is the chief information officer (CIO) of the organization. For this reason, we also call this category “CIO requirements.”

Economical/CFO Requirements

This category embraces the usual system engineering aspects.

Solidity. The data warehouse software and its documentation must be of high quality. The software must be stable and easy to understand. The documentation must be available and current.

Agility. The software and structures must be designed so that they can be easily and quickly extended and scaled to fulfill new business needs, so as to minimize the time and cost to market and maximize the market share for the organization.

Affordability. The total cost of ownership of the data warehouse must be low. The following are the major cost factors for a data warehouse:
  • Hardware. The computer systems with all relevant subsystems like storage, backup, and so forth.

  • Software. The fundamental software like the operating systems, the database management systems, the special software tools for data warehouse development, administration, and so on.

  • Development. The data warehouse software has to be developed, maintained and extended. For this cost factor, solidity and agility are decisive.

  • Administration and operations.

All these requirements appear to be purely technical. They are indeed. However, the very person who feels the ultimate impact of the fulfillment degree of these requirements is the chief financial officer (CFO). Therefore, we also call this requirements category the “CFO requirements.” A little surprised?

Security/CSO Requirements

This category is related to the security of the data stored in the data warehouse.

Privacy. The privacy of the business partners must be protected. If a breach occurs, the organization may face legal and public relations troubles, and a loss of general trustworthiness can follow.

Secrecy. The business secrets of the organization must be protected. If secrecy is lost, the organization loses value, and its competitive position can be weakened.

Access. The access to the data warehouse must be organized and controlled. Every business user should be able to obtain exactly what they really need: no more than is sufficient and no less than is necessary. This applies not only to the information but also to the system resources.

Clearly, the person most concerned about these requirements is the chief security officer (CSO) of the organization. Thus, we also call this category “CSO requirements.”

Figure 1 is a summary of the discussion above, and we call this requirements categorization system the CORC (Chief Officers Requirements Catalog) on enterprise data warehouses.


Figure 1: CORC – A Requirements Categorization System on Enterprise Data Warehouses (B. Jiang, 2011)

As a matter of fact, this categorization system can be exploited to evaluate existing data warehouses and to measure how good the data warehouse designs on hand are. This can be carried out as follows:
  • Prioritize the 15 requirements given above according to the actual business situation of your organization. Then, order them by these priorities, where the highest one has a weight of 15 and the lowest one has a weight of 1.

  • Assume that every requirement can earn a maximum of 10 points for its fulfillment degree. Determine the fulfillment degree of your current data warehouse for each of the 15 requirements, or the fulfillment potential of your current data warehouse design.

  • Multiply the weight by the fulfillment degree for each requirement and sum these products. The result can be considered the “goodness” of your data warehouse or data warehouse design.

The results obtained above can be visualized in the form of a spider-web (radar) chart.
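The three evaluation steps can be sketched in a few lines of code. This is an illustrative sketch only, not a reference implementation: the function name `goodness`, the lowercase requirement labels, and the input checks are assumptions made for the example. With weights from 15 down to 1 and at most 10 points per requirement, the highest possible score is 10 × (15 + 14 + … + 1) = 1200.

```python
# The 15 requirements of the five categories discussed in this article.
REQUIREMENTS = [
    "data collection", "data integration", "data preparation",  # CTO
    "correctness", "completeness", "traceability",              # CEO
    "availability", "performance", "safety",                    # CIO
    "solidity", "agility", "affordability",                     # CFO
    "privacy", "secrecy", "access",                             # CSO
]

MAX_POINTS = 10  # maximum fulfillment degree per requirement


def goodness(priority_order, fulfillment):
    """Weighted sum of fulfillment degrees.

    priority_order: the 15 requirement names, most important first.
    fulfillment: mapping from requirement name to points (0 to 10).
    """
    if sorted(priority_order) != sorted(REQUIREMENTS):
        raise ValueError("priority_order must contain each requirement exactly once")
    n = len(priority_order)
    total = 0
    for rank, name in enumerate(priority_order):
        weight = n - rank  # 15 for the highest priority, 1 for the lowest
        points = fulfillment[name]
        if not 0 <= points <= MAX_POINTS:
            raise ValueError(f"fulfillment degree for {name!r} is out of range")
        total += weight * points
    return total


# A warehouse that fulfills every requirement completely.
perfect = {name: MAX_POINTS for name in REQUIREMENTS}
print(goodness(REQUIREMENTS, perfect))  # 1200, the maximum possible score
```

Dividing a score by 1200 would give a relative value between 0 and 1, which may be convenient when comparing several designs; this normalization is an addition for illustration and is not part of the procedure described above.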

Postscript

In 1001 Data Warehouse Definitions: Is This One Accurate?, I proposed a definition of data warehouses. With such a definition, we can judge qualitatively whether a system is a true data warehouse or not. In philosophical terms, it considers the first aspect of the trinity “truth-goodness-beauty” of each data warehouse. In our engineering practice, this is not sufficient since we need further guidelines to judge quantitatively whether a data warehouse is a good one, and especially how good. This is about the second part of the trinity, i.e., the “goodness.” The requirements categorization system and its application to evaluating data warehouses discussed in this article are designed to answer this question. As shown, the answers to these two aspects in our case can be made more or less objectively. The answers to the third aspect, “beauty,” are subjective. For instance, you might find my beautiful house ugly. Assume that a data warehouse meets all 15 requirements discussed above to the highest fulfillment degree, i.e., it is perfect. Is it therefore objectively beautiful? If not, how can I judge the “beauty” of a data warehouse?

  • Bin Jiang, Ph.D.
    Dr. Bin Jiang received his master’s degree in Computer Science from the University of Dortmund / Germany in 1986. In 1992, he received his doctorate in Computer Science from ETH Zurich / Switzerland. During the research period, two of his publications in the field of database management systems were awarded as the best student papers at the IEEE Conference on Data Engineering in 1990 and 1992.

    Afterward, he worked for several major Swiss banks, insurance companies, retailers, and with one of the largest international data warehousing consulting firms as a system engineer, software developer, and application analyst in the early years, and then as a senior data warehouse consultant and architect for almost twenty years.

    Dr. Bin Jiang is a Distinguished Professor of a large university in China, and the author of the book Constructing Data Warehouses with Metadata-driven Generic Operators (DBJ Publishing, July 2011), which Dr. Claudia Imhoff called “a significant feat” and for which Bill Inmon provided a remarkable foreword. Dr. Jiang can be reached by email at bin.jiang@bluewin.ch

    Editor's Note: You can find more articles from Dr. Bin Jiang and a link to his blog in his BeyeNETWORK expert channel, Data Warehouse Realization.
