Are you caught in the explosion of your life? Have you gotten to a point where federated star schemas aren't cutting it for Enterprise Data Warehousing? Is data volume pushing impacts on the star that can't be handled? What does a change to your single enterprise conformed dimension cost? How much time does it take, and how many impacts are there to integrate new systems or more data to your existing federated system?
Star schemas are GREAT ARCHITECTURES for solving subject oriented answers, they were and are not designed to be enterprise data warehouses! They get to a certain point where they get in the way of being nimble, creating workable solutions for the business, or constrain the business (because of cost or time to implement - due to impacts of changes). The business is now caught, in a super-nova exploding star. Let's take a look to see what happened, and what we can do to "fix" this situation.
Super-nova star schemas, ok: how did we get here?
We had no other way to build an architecture for an enterprise data warehouse, so we used the best we had.... True, that's how we got here. Now there's another, better, more nimble data modeling architecture on the block that solves these problems - and it's called: "COMMON FOUNDATIONAL ENTERPRISE MODELING ARCHITETURE" But for short, it's actually called: "Data Vault Modeling Architecture", and it's been available since 2000. It's been endorsed by Bill Inmon, Clive Finklestein, Claudia Imhoff, Lowell Fryman, Wayne Eckerson, and numerous others in the industry. Bill recently stated: "Data Vault is the optimal choice for DW2.0..."
Ok, but I don't have a super-nova on my hands, my federated data warehouse is working just fine... Why should I look at this?
If the architectural model is serving you today, cool - don't change. No problem. But at least bear with me to read about the problems that do occur with federated star-schemas that are currently going super-nova....
Signs of an impending super-nova:
* Business can't approve _business changes_ because the impacts cost too much or take too long for I.T. to implement into their Federated Star
* Business is constantly arguing over "what the conformed dimension means", causing either dimension-itis (too many dimensions), or duplicate copies of the data in the dimension, to be slightly tweaked to meet "individual needs."
* Business is constantly building their own spread-marts to answer their questions because I.T. is now "frozen" by an inability to change the federated mart (too many users, too many dependencies, too high a cost, too large an impact).
* Business is constantly building their own spread-marts because I.T. won't agree to provide data in a SUBJECT ORIENTED BUSINESS FOCUSED PURPOSE, because supposedly "it's already available in the enterprise federated star."
* 160 to 300 columns "appear" in a dimension, making it near impossible to manage or to provide full value to the business going forward.
* It becomes difficult if not impossible to integrate a new system into the federated system.
* "Copies" of the federated stars have propogated through-out the enterprise and then have been subsequently altered to meet specific business needs, now - the answers from those different federated stars no longer are consistent.
* The star has "JUNK" and "HELPER" tables embedded all over the place. What is a JUNK table anyway? And if it's JUNK, why is it in my model? Helper table? What's it helping? is it a band-aid for the architecture?
* Volumes have outgrown hardware capabilities, pushing the purchase of an appliance which only staves off the inevitable.
* Real-time processing conflicts with the batch processing
* Real-Time processing forces UPDATES to the data set, or causes orphaned facts, or worse yet, orphaned dimensions with keys that simply never make it to the rest of the business, and cannot be used.
* When was the last time that AUDITABLE and TRACEABLE data could be produced from your federated star-schema? Can you reproduce the source system data by reversing the algorithms (mathematically) to answer an auditors questions?
Well.... Hmmmmm... What on earth are you talking about?
Architecture, architecture, architecture.
Would you take a Porsche 911 and weld a trailer hitch to the back-end, to haul a camper up a 30 degree incline in 90 degree (farenheight) weather? COULD YOU DO THIS? YES, you can modify the ARCHITECTURE and adapt the car to actually do this, but what would happen? In this situation, you probably would blow out the motor right?
Any time you "adapt" an architecture, or apply band-aids to make it fit where it wasn't designed to fit, it inherits limitations and eventually "blows apart." (the wheels fall off, the band-aids don't stick anymore, it becomes un-managable).
Likewise would you take a Big-Rig, chop the top, leave the frame and motor in place, place a new fiberglass body on it and try to win an auto-race? No... Could you? Yes you could, but it doesn't make sense.
The Data Vault (Common Foundational Enterprise Architecture) provides this capability as a mathematically designed solution built to suit the needs of enterprise data warehousing. Star Schemas WERE and ARE designed to be a SUBJECT ORIENTED ANSWER DATABASE for use in OLAP solutions... They work wonders in pivots, and OLAP type processing, and yes - the architecture is valuable when applied appropriately, but for our sakes, don't apply the modeling paradigm to be your enterprise data warehouse.
Ok, so what is available? What can I do to stop the Super-Nova? I have one that I might end-up with, can I prototype something that will keep me from getting there?
Yes. The Data Vault modeling technique is flexible, highly scalable, and FREE - just like 3rd normal form, and star schema, the modeling techniques are freely available. It is the next step in data modeling for the enterprise data warehouse, you can build a very small data vault model as a Proof of concept, then proceed to build larger models on top of the existing foundation, and eventually "re-feed" your star-schemas as Type 1 dimensions.
With a Data Vault as the EDW, you no longer need Type 2 Dimensions to house history. Also the need for Conformed Dimensions is no longer an absolute necessity, you can do it where it makes sense. It releases the impacts of making changes because "everyone and everything" is using a particular conformed dimension.
Remember, like 3rd normal form or star-schema there is a full approach to using, building, implementing a Data Vault. More on that next time.
Posted September 1, 2007 3:37 PM
Permalink | 1 Comment |