<-- Back to full color view

What a Data Warehouse is Not

Originally published October 29, 2009

Some topics just never seem to die, however much their demise is deserved. When you think you have heard the last of something, here it comes again, just like a bad penny.

Recently, I was at a conference, and I heard the following discussion about what a data warehouse was. One person suggested that a data warehouse was really all the old legacy systems connected by software that could access the data. By calling such a contraption a data warehouse, the organization could avoid having to do the hard and complex work of integration.

There are so many problems with this federated approach to a data warehouse that they are almost not worth repeating here. But (once again!) here goes.

A federated data warehouse:

  • Has terrible performance problems,

  • Has no integration of data,

  • Has no reliable source of historical data,

  • Requires the cooperation of Armonk, Larry Ellison and Bill Gates, among others,

  • Has no repeatability of processing, and so forth.

A federated data warehouse is no data warehouse at all.

Another person suggested that a data mart was a data warehouse. In this case, it was suggested that an organization build a data mart for finance. Then, the data mart could be expanded with new requirements for marketing. Then sales could add on, and so forth.

The problem with this solution is that the requirements for data as found in a data mart vary considerably from one department to the next. Adding sales data to finance data cannot be done without restructuring data back down to its most basic level and rebuilding the structure. At this point, it would have been easier to just build a data warehouse in the first place.

Stated differently, a data mart has one set of DNA and a data warehouse has another set of DNA. Setting seeds for a tumbleweed in the ground, watching the seeds grow, then calling the plant an oak tree does not make the plant an oak tree. The DNA for a tumbleweed and an oak tree are as different as can be.

So a data warehouse is not a data mart, just as a federated data warehouse is not a data warehouse.

Data warehouses exist for the purpose of supporting management, not operations. As such, an active data warehouse is not a data warehouse. Doing transaction processing and up-to-the-second transactions is not what a data warehouse is. Management does not need or even care about detailed, up-to-the-second accurate transactions in order to make decisions. It is the clerical community that cares about these kinds of decisions. And a data warehouse does not support the clerical community. The data warehouse supports the managerial community.

So an active data warehouse is also not a data warehouse.

A data warehouse is not a dimensional database, where there is a star structure and fact tables. Star structures and fact tables are designed to optimize the access and analysis of a single group of users and a single set of requirements. As long as the users do not change and the requirements do not change, everything is fine. The problem is that over time, users do change and requirements do change. That is the way of the world. And when requirements change, the star schema and the fact tables need to undergo change.

A much more rational way to build the data warehouse is to use the relational model. The relational model is able to handle change as gracefully as change can be handled. In addition, the relational model is so granular and basic that it is not optimized for any user at the expense of any other user.

And last but not least, you do not buy a technology and have a data warehouse. Instead, you design and build the proper structure, and then you seek out the best technology to help you access and analyze the data. Most vendors that offer to sell you a data warehouse are pulling your leg.

So here is a short list of what a data warehouse is not:

  • A federation of data,

  • A data mart that can be grown into a data warehouse,

  • An active data warehouse, or

  • A dimensional table with star schemas and fact tables.

SOURCE: What a Data Warehouse is Not

  • Bill InmonBill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

    Editor's Note: More articles, resources and events are available in Bill's BeyeNETWORK Expert Channel. Be sure to visit today!

Recent articles by Bill Inmon



 

Comments

Want to post a comment? Login or become a member today!

Posted October 5, 2010 by Anonymous

In general, this makes a lot of sense, especially the part about data federation. That has seemed like a bad move on all kinds of levels, but a lot of businesses didn’t seem to see it. I get that an “active data warehouse” the way you’re defining it, up to the minute transactional level, is a bad idea.  But, to keep a data warehouse useful to management, it still has to be continuously maintained and updated. Automated integration processes that pull data from primary sources, scrub and reconcile the data, and update it regularly are crucial to keeping a data warehouse from becoming just an archive of out-of-date historical data.

I’m biased, of course, but I think the Pervasive Data Warehouse offering is the right way to go. Nobody’s pulling your leg there, offering a “one size fits all” kind of thing.  Basically, it’s just good solid ETL infrastructure, plus the expertise and best practices of folks who have done hundreds of data warehouse implementations put at your service to build exactly what the individual enterprise needs. Just my opinion.

-Paige

Is this comment inappropriate? Click here to flag this comment.

Posted November 10, 2009 by ANDREA VINCENZI andrea.vincenzi@tiscali.it

Hi, I already posted this comment about one week ago, but something must have gone wrong, because it doesn't show up on the site, so I'm trying again:

Well, I must admit that I would have been very embarassed if I had to give my definition of a Data Warehouse knowing that Bill Inmon was sitting among the audience. In my humble opinion however, deciding what can or can't be defined as a DW is a sterile discussion, while discussing the advantages and disadvantages of different architectures makes much more sense.

If I had the opportunity to be there, I would have proposed the following architecture, which I have successfully used in many projects: "a set of conformed fact tables, connected by conformed dimensions, built directly from a staging area, without an underlying 3NF structure".

But I wasn't there, so I guess I’ll never be able to know if Bill considers this a data warehouse.

With regards,

Andrea Vincenzi – andrea.vincenzi@olap.it

Is this comment inappropriate? Click here to flag this comment.

Posted November 4, 2009 by Anonymous

Curious where SAP BW fits in here then.  The Info Cube is an 'extended' dimensional mart and this is not a DW indeed.  Or is it the ODS layer that fits the bill of a relational database falling in to the category of 'the data warehouse'?

Is this comment inappropriate? Click here to flag this comment.

Posted November 3, 2009 by Ken Hausman ken.hausman@sas.com

As the man who created the concept and formal definition of the data warehouse, which spawned both a discipline and an industry, Bill Inmon well knows what a data warehouse is and isn't.  But I do have a slightly different take on the conversation he reports.  And kudos to the conference attendees for having the conversation!  Two thoughts...

First, conferences are where a vast number of folks learn about data warehousing.  How lucky for those folks to have had the opportunity to discuss their misconceptions with THE data warehouse expert.

Second, even sans a proper straightening out, I am encouraged by their interest/engagement in the subject.  The state of the industry has evolved to the point where organizations need analytics to be successful and the foundation for analytics is clean, organized and timely data.  A good data warehouse raises the bar, but it is only part of the "big picture" and regardless of the form this structure takes, it is guaranteed to evolve as business requirements and problems arise. 

Too many organizations still try to run their business based on a collection of spreadsheets.  Let's encourage the next generation of information managers to take the next step and engage in more powerful and longer term solutions even if those steps don’t start out strictly meeting the definition of a DW. 

 

 

Is this comment inappropriate? Click here to flag this comment.

Posted October 29, 2009 by Anonymous

Bill is the authority on data warehousing, he defined it and so far no-one has come up with a better definition. None of the technologies that call themselves data warehosues - stars, conformed stars(a nonsense), federated, BI tools pass the test of compliance with the definition of data warehouse.

Is this comment inappropriate? Click here to flag this comment.

 

Copyright 2004 — 2014. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC