There are problems in I.T. today with a lack of agility. There are issues with Business creating their own spread-marts in MS-Access, Excel, and OLAP Cubes. There is a widening gap between the "corporate Enterprise Data Warehouse" and what I.T. can provide, how quickly they can adapt, and how cost effective they can be going forward. There is a rise of something called 2nd generation data warehouses... Why? Because 1st generation warehouses are suffering from "stove-piped solutions" re-created by using the incorrect modeling techniques for your data warehouse. Bill Inmon has been writing lately about data modeling and how to do it properly. In this entry I'll dive in head first into these issues, and what's going on in the industry, and what you can do about it.
What' are you talking about "Willis..."?
(From an American '80's TV show called: Different Strokes)
I've seen it happening first hand at many many clients. The typical story is as follows:
First the company select "star-schema" modeling as the way to build their enterprise data warehouse. Then, they select conformed dimensions, and shared fact tables. The first implementation costs the business 90 days and maybe 5 consultants, and maybe $250,000 USD. If your lucky, it might be $150,000.
The business unit that this is built for becomes very happy, with quick delivery, apparently low cost, and super fast access to dimensional information that meets their business needs... But then, reality sets in... Other business units see this success, and want "one of their own" built.
There's trouble on the horizon sailor...
What do you mean? I don't see any trouble... Well, to tell you the truth, building a second or even a third "star schema" and then federating these together doesn't seem to be such a big deal. The cost may increase only slightly to maybe $180k or $275k, and the number of days to implement may increase only slightly to maybe 110-120 days. But what's happening here?
The reality of it is: I.T. (because of business needs) takes existing dimensions and begins to add different & loosely affiliated information to the same "dimension", thus, apparently attempting to "conform it."
So what's the bottom line?
As this process continues, and I.T. gets in to the 5th or 6th "project", the conformity of the dimensions becomes lost in the fray. Too many different kinds of data are added to the dimension "to conform it to the enterprise" which distorts it's original purpose, and in fact (if done improperly) can destroy the grain of the business key and fact tables to which the dimension is "hooked up." But more importantly, each time I.T. increases the size of this monster, it always creeps in to higher cost, and longer implementation time-frames.
I.T. becomes less agile in their implementation strategies, and a simple "change" that the business has to make (that used to cost $150k and take 90 days) now costs well in to the $350k range and takes 6 months or more. What was a conformed dimension now becomes a "deformed" dimension, and has trouble meeting the business needs.
What's the business impact of all of this?
The business begins to wonder how effective the "EDW" solution is... They need the changes made in order to stay effective in business, and since I.T. costs to much and takes to long, they make a copy of existing data sets, and build their own MS-Access databases, and Excel spreadsheets. The flip side is: I.T. mitigates this by beginning to construct singular star-schemas (back to quick delivery and smaller cost), which now means I.T. is reconstructing the stove pipes that they were trying to eliminate in the first place!!
You have one of two outcomes:
1) Wow! This is HUGE, I never realized - but this is exactly what's happening in my business... (You are now teetering on the brink of disaster unless you enter DW2.0, and 2nd generation warehousing).
2) I.T. has followed the strictest of standards all along, the system hasn't really "grown" beyond maintenance control costs because the data modelers did their job properly and volume, real-time, and compliance aren't issues here... So I see no reason to look for solutions. Congratulations, you're one of the few and I'd love to hear from you as to how you got your success to work.
How can we solve this problem?
There are several solutions to these problems, but they all stem from choosing the right data modeling architecture for your EDW, along with a solid foundational architecture and framework with which to build your system.
The short answer is to look at becoming DW2.0 compliant. DW2.0 brings with it fundamental tenants that we should adhere to in order to put the right architectural components in place. It also comes with the standard definitions that the industry has lacked over the years, finally and at last we have standards, definitions, and frameworks to follow. The second part of the short answer is that you need the right data model under the covers to make scalability, flexibility, and compliance a top-notch effort.
There are several different data modeling ideas and techniques floating around out there (all of them built off other's ideas). They can help you overcome these pains you may be feeling.
The first (and my favorite - but then again, I'm biased) is called the Data Vault, it's real name is Common Foundational Integration Model Architecture (but that doesn't sell). It's been10+ years of R&D from 1990 to 2000, it's been available for free since 2000, it's been endorsed last summer as:
"‚ÄúThe Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.‚ÄĚ, June 2007, Bill Inmon
It's currently in use in many BIG industries around the world, and has a community for it's seekers.
Data Vault "customers" (I use the term loosely because they pay for consulting, not for the modeling technique which is free), have repeatedly told me how it really helps make I.T. agile again, how it's made I.T. stars in the business eyes because it has lowered cost, and reduced time frames to delivery, and gotten I.T. back in to the EDW game of providing value to the business by keeping up with the changes. The Data Vault has proven itself to be the technique of 2nd Generation EDW efforts around the world.
Many customers (after following the whole approach on Data Vault) can actually produce Star Schemas for business (as an output/delivery mechanism) in about 24 hours turn-around from the time they receive their business requirements to the time full delivery occurs. If your lucky, then prototypes with data can be turned around in less than 1 hour.
There's another technique (which if you take the Data Vault model to 6th normal form-likeness, looks very similar) called Anchor Modeling from a company in the Netherlands called Intellibis. The creators say it's been around since 2002, I cannot yet find any customer stories, or reasons to normalize to that level, but it appears to possibly hold promise in the future. It's a different way of thinking.
And finally, an interesting thought pattern that has yet to be put into practice (as far as I can tell) is something called the Triadic Continuum. There appears to be a book written about the subject that approaches the theoretical (and some concrete) aspects of what this means. However it appears to be stated as a way to help machines understand context - I'm not sure exactly (yet) how this applies to EDW.
http://www.eruditor.com/books/item/9780595441129.html.en (the book)
You'll hear more from me in this upcoming series on data modeling, and architecture (and how it affects your business) as we move forward. In the mean time, check out the modeling efforts noted above, or respond to this entry, tell us what you are engaged in and how successful it is, or where some of the pains you are experiencing exist.
Posted December 21, 2007 3:59 AM
Permalink | 2 Comments |