What a Data Warehouse is Not
by Bill Inmon
Originally published October 29, 2009
Some topics just never seem to die, however much their demise is deserved. When you think you have heard the last of something, here it comes again, just like a bad penny.
Recently, I was at a conference, and I heard the following discussion about what a data warehouse was. One person suggested that a data warehouse was really all the old legacy systems connected by software that could access the data. By calling such a contraption a data warehouse, the organization could avoid having to do the hard and complex work of integration.
There are so many problems with this federated approach to a data warehouse that they are almost not worth repeating here. But (once again!) here goes.
A federated data warehouse:
A federated data warehouse is no data warehouse at all.
Another person suggested that a data mart was a data warehouse. In this case, it was suggested that an organization build a data mart for finance. Then, the data mart could be expanded with new requirements for marketing. Then sales could add on, and so forth.
The problem with this solution is that the requirements for data as found in a data mart vary considerably from one department to the next. Adding sales data to finance data cannot be done without restructuring data back down to its most basic level and rebuilding the structure. At this point, it would have been easier to just build a data warehouse in the first place.
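To make the point concrete, here is a minimal sketch in Python. The transaction records, field names, and helper functions are all hypothetical and invented for illustration. It assumes the finance data mart stores data summarized by month and account, which is one plausible shape for such a mart, not a description of any particular product.

```python
from collections import defaultdict

# Hypothetical granular transactions: (month, account, salesperson, amount).
# These records are illustrative assumptions, not data from the article.
transactions = [
    ("2009-09", "revenue", "alice", 100.0),
    ("2009-09", "revenue", "bob", 250.0),
    ("2009-10", "revenue", "alice", 300.0),
]


def build_finance_mart(rows):
    """Summarize by (month, account), the shape finance is assumed to want."""
    mart = defaultdict(float)
    for month, account, _salesperson, amount in rows:
        mart[(month, account)] += amount
    return dict(mart)


def build_sales_mart(rows):
    """Summarize by salesperson, the shape sales is assumed to want."""
    mart = defaultdict(float)
    for _month, _account, salesperson, amount in rows:
        mart[salesperson] += amount
    return dict(mart)


finance_mart = build_finance_mart(transactions)

# The finance mart has already discarded the salesperson, so the sales
# summary cannot be derived from it. It has to be rebuilt from the
# granular rows.
sales_mart = build_sales_mart(transactions)
```

In this sketch the sales summary can only be produced from `transactions`, never from `finance_mart`, which is the restructuring "back down to its most basic level" described above.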
Stated differently, a data mart has one set of DNA and a data warehouse has another. Planting tumbleweed seeds, watching them grow, and then calling the plant an oak tree does not make it an oak tree. The DNA of a tumbleweed and the DNA of an oak tree are as different as can be.
So a data warehouse is not a data mart, just as a federated data warehouse is not a data warehouse.
Data warehouses exist for the purpose of supporting management, not operations. As such, an active data warehouse is not a data warehouse. Processing transactions and keeping data accurate up to the second is not what a data warehouse is for. Management does not need or even care about detailed, up-to-the-second accurate transactions in order to make decisions. It is the clerical community that cares about that kind of detail. And a data warehouse does not support the clerical community. The data warehouse supports the managerial community.
So an active data warehouse is also not a data warehouse.
A data warehouse is not a dimensional database, where there is a star structure and fact tables. Star structures and fact tables are designed to optimize access and analysis for a single group of users and a single set of requirements. As long as the users do not change and the requirements do not change, everything is fine. The problem is that over time, users do change and requirements do change. That is the way of the world. And when requirements change, the star schema and the fact tables need to undergo change.
A much more rational way to build the data warehouse is to use the relational model. The relational model is able to handle change as gracefully as change can be handled. In addition, the relational model is so granular and basic that it is not optimized for any user at the expense of any other user.
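As a rough illustration of that granularity, the sketch below uses Python's standard `sqlite3` module. The `sale` table, its columns, and the sample rows are hypothetical. It keeps one row per sale in a plain relational table and lets two different groups of users ask their own questions of the same data.

```python
import sqlite3

# In-memory database; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sale (
        sale_id   INTEGER PRIMARY KEY,
        sale_date TEXT NOT NULL,
        region    TEXT NOT NULL,
        product   TEXT NOT NULL,
        amount    REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO sale (sale_date, region, product, amount) VALUES (?, ?, ?, ?)",
    [
        ("2009-10-01", "east", "widget", 10.0),
        ("2009-10-01", "west", "widget", 20.0),
        ("2009-10-02", "east", "gadget", 5.0),
    ],
)

# One group of users wants totals by region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sale GROUP BY region ORDER BY region"
).fetchall()

# Another group wants totals by product; the base table is unchanged.
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sale GROUP BY product ORDER BY product"
).fetchall()

conn.close()
```

Because the table holds data at its most basic level, a requirement that arrives later, such as totals by date, is simply another query over the same rows and does not require rebuilding the structure.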
And last but not least, you do not buy a technology and have a data warehouse. Instead, you design and build the proper structure, and then you seek out the best technology to help you access and analyze the data. Most vendors that offer to sell you a data warehouse are pulling your leg.
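So here is a short list of what a data warehouse is not: a federated collection of legacy systems, a data mart, an active data warehouse, a dimensional database, or a technology you buy from a vendor.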
Copyright 2004 — 2017. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC