Blog: Dan E. Linstedt Subscribe to this blog's RSS feed!

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata.  He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology, and the Data Vault Data modeling architecture. He has built expert training courses, and trained hundreds of industry professionals, and is the voice of Bill Inmons' Blog on http://www.b-eye-network.com/blogs/linstedt/.

If you have a large number of star schemas, or a large federated star schema as an enterprise data warehouse, then you might or might not have this issue. This is one of the issues affecting business today. In this entry we will explore the issue called Dimensionitis from a business perspective, in other words: how much does it cost to maintain, what happens when... and so on.

Have you been confronted with "Silos" of information? Does your IT team have a "logical box" drawn around a set of split up star schemas, and is it labeled EDW?

Does your "EDW" system look like this: http://intelligententerprise.com/channels/business_intelligence/showArticle.jhtml?articleID=206902663

or this:
http://www.dwmantra.com/dwconcepts.html

If you have one of these systems, then let me ask you this as a business user:
a) Does it continually cost more money to build new stars? (add on to the logical EDW)
b) Do you have "copies" of stars for different business units that produce different answers?
c) Does your EDW contain silos of information that business is demanding be reconciled, and consolidated because of management costs?

OR: do you find yourself saying to IT: "Just create a copy of the existing dimension, modify the data fields so they contain just what I need... Why try to conform it? It costs too much, or it will take too long. And by the way, if you (IT) can't do this, then I'll build it myself in Microsoft Access or Excel spreadsheets..."

If this is the case, then you may have Dimensionitis. Dimensionitis is the desire to extend your "EDW" but because of cost or time being prohibitive, you suggest IT simply "copy" the dimension to create a new one.

This needs to be fixed at a business level.

Don't get me wrong, please... I'm not saying that dimensions and star schemas are bad - I believe they are the best mechanism for presenting data to the business users for OLAP, drill down, and so on. What I am saying is that Star Schema modeling IS NOT SUITED to be an enterprise data warehouse. The data modeling architecture was never built nor intended to be an EDW. The original specifications did not have "type 1, type 2, or type 3, nor did it define a conformed dimension" - they only had a single star (no history) and were designed to be a subject oriented answer set.

If your lines of business create new dimensions because the COST of re-engineering or the TIME it takes IT to "conform the new data to an existing dimension" takes too long, then you've got a case of Dimensionitis running around. This also is a loss of governance and control over the data in the EDW.

Next time, we'll discuss the impact of IT agility on your EDW projects.

Thoughts? Comments? Questions?
Thanks,
Dan Linstedt
http://www.GeneseeAcademy.com


Posted November 18, 2008 3:54 AM
Permalink | 2 Comments |

2 Comments

I want to ask you something, in a MOLAP structure what is better? create a new "same" dimension for every cube? or share the same dimension in all cubes?,

Thanks in advance for your response,

Greetings,

For a MOLAP structure, it depends on performance. If too many users access the same object, and the Database cannot handle the load - then splitting a copy of the dimension off to another machine is a way to improve performance.

Remember: it's not "bad" to copy dimensions, as long as the dimensions are NOT being used as part of the EDW... History, and the EDW should be in a different layer.

Hope this helps,
Dan L

Leave a comment

    
Search this blog
Categories ›
Archives ›
Recent Entries ›