Originally published September 25, 2008
Storing historical data is important for improving performance monitoring, conducting special decision relevant studies, improving financial reporting and providing drill down. However, a data warehouse is not a complete decision support system (DSS). Rather, a data warehouse or data mart is commonly the "driver" and dominant component for a data-driven decision support system. The DSS adds a manager-friendly front end commonly built with a business intelligence (BI) product.
A data-driven DSS emphasizes access to and manipulation of a time series of internal company data and sometimes external data. Simple file systems accessed by query and retrieval tools provide the most elementary level of functionality. Data warehouse systems that allow the manipulation of data by computerized tools tailored to a specific task and setting or by more general tools and operators provide additional functionality. Data-driven DSS with online analytical processing (OLAP) provide the highest level of functionality.
A data warehouse is an organized collection of large amounts of structured data. It is a database designed and intended to support decision making in organizations. It is usually batch updated and structured for rapid online queries and managerial summaries of its contents. According to Bill Inmon (1993), who is often called the "father" of data warehousing, "a data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data". Ralph Kimball (1996), another data warehousing pioneer, notes, "A data warehouse is a copy of transaction data specifically structured for query and analysis." So, the data warehouse or the single subject data mart stores data for a data-driven DSS.
When a data warehouse is included as a component in a data-driven DSS, a DSS analyst or data modeler needs to develop a schema or structure for the database and identify analytic software and end-user presentation software to complete the DSS architecture and design. The DSS components need to be linked in an architecture that provides appropriate performance and scalability. In some data-driven DSS designs, a second multidimensional database management system (MDBMS) will be included and populated by a data warehouse built using a relational database management system (RDBMS). The MDBMS will provide data for online analytical processing (OLAP). It is common to build a data warehouse using an RDBMS from Oracle or IBM and then use query and reporting and analytical software from a vendor such as MicroStrategy or Business Objects as part of the overall data-driven DSS design. What some vendors call "business intelligence software" provides the analytics and user interface functionality for a data-driven DSS built with a data warehouse component.
In a data warehouse built using an RDBMS, the most common data model is called the star schema. A related model is called a snowflake schema. A star schema is organized around a central fact table that is joined to some dimension tables using foreign key references. The fact table contains data like price, discount values, number of units sold, and dollar value of sales. The fact table usually has some summarized and aggregated data, and it is usually very large in terms of both fields and records. The basic premise of a star schema is that information can be classified into two groups: facts and dimensions. Facts are the core data elements one is analyzing. For example, units of individual items sold are facts, while dimensions are attributes about the facts. The star schema has also been called a star-join schema, data cube, data list, grid file, and multidimensional schema. The name star schema comes from the pattern formed by the entities and relationships when they are represented as an entity-relationship diagram. Metaphorically, the results of a specific business activity are at the center of the star schema database and are surrounded by dimensional tables with data on the people, places, and things that come together to perform the business activity. These dimensional tables are the points of the star.
How does a snowflake schema differ from a star schema? A snowflake schema is an expansion and extension of a star schema to additional secondary dimensional tables. In a star schema, each dimension is typically stored in one table; the snowflake design principle expands a dimension and creates tables for each level of a dimensional hierarchy. For example, a Region dimension may contain the levels of Street, City, State and Country. In a star schema, all these attributes would be stored in one table; in a snowflake schema, one would expand the schema and a designer might add city and state secondary tables.
Creating the data model for a data-driven DSS is a complex task. Whether DSS data is stored in a flat file, a hierarchical or multidimensional database or a relational database management system, a large, well-organized database of business facts provides the functionality for a data-driven DSS. A data warehouse is only part of such a system; but when it is used, the data component is the "driver" for decision support.
Inmon, W. H. "What is a Data Warehouse?" PRISM Newsletter, Center for the Application of Information Technology, Washington University in St. Louis, vol. 1, no. 1, 1993.
Kimball, R. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, 1996.
Power, D. Is a Data Warehouse a DSS? What is a Star Schema? How Does a Snowflake Schema Differ from a Star Schema? DSS News, Vol. 3, No. 4, February 17, 2002.
Recent articles by Dan Power