Blog: Dan E. Linstedt« ETL, ELT, EAI, EII and E-I-E-I-O | Main | Architecture, Standards, and Business » Automated Enterprise Data Warehouse ModelingIn my last blog in this category, I discussed repeatable architecture, and repeatable process to build a solid foundational enterprise architecture. I hope I was not giving the message out that the model elements must be repeated, for that is not the case. Data modeling is definitely a cross-combination of understanding the business need (the practitioner) and their ability to represent the business in a structured format. NOTE: This is a biased blog entry, based on a new data modeling technique called the Data Vault. I'll be talking more about the architecture in coming blogs. However, I believe that with an integrated non-aggregated, low-level detail architecture, there is a mechanism by which to achieve a standard data modeling architecture. Particularly when it comes to "integrating" different enterprises, why else would something like Universal Data Models ever have taken off? In this blog, I explore beyond the simple data model. I have suggested a new revolution in data modeling (available here: www.DanLinstedt.com) which is based on standard, repeatable architecture - an architecture that builds a granular, integrated, and foundational enterprise view of the facts (the data itself). It doesn't mean that this model should be used for information dissemination. It just means this model should be used to construct an enterprise data warehouse, non-accessible except to power users and data miners. From this point, we can generate information stores, star-schemas (turn data into information) through integration, quality, cleansing, and aggregation. At the end of this modeling effort, we begin to realize that it's nothing more than a STANDARD set forth on how to build a decent model. With any standard, the next evolution is automation. Well, I've done it. I've built a "Data Modeling Wizard", one that takes in multiple source data models from a number of relational databases, and spits out: Staging Areas, Data Vault data models, and ETL loading code to go with it. The next version of the software will actually produce all the mathematical combinations of "star-schemas" that appear to be "useful" to the end-user, and allow the end-user (IT modeler) to pick the stars to generate, then cross them with date/time aggregation options. In other words, I've automated the process of building a back-end enterprise data warehouse data model. One step closer to the truly "Dynamic Data Warehouse" or Dynamic restructuring of information in near-real-time. I can now produce a data model that is 60% to 80% of the final result that I want, in under 10 minutes (11,000 source tables). Of course the software has limitations, it reads only relational source data models, and relies heavily on Primary/Foreign keys. Also, the quality of the data model output depends directly on the quality of the data model input. But then again, if I can automate what used to take 3 months, down to 1 day - then I can use the rest of the week to manually tweak the data model to my liking. The point to this blog is not to sell the software (although I am looking for VC's/Angels), or it's usage, but to point out that there is another revolution coming: Automated build-outs of enterprise information stores, and dynamic model changes. For the first time that I can recall, I can play "what-if" games with my architecture before I sink tons of cost and time into it. After using and generating the Data Vault, a business has the responsibility of turning the DATA into INFORMATION, and actually writing the correct business rules into the processing engine to accomplish this task. This also requires different modeling techniques like Star Schema, and something Dave Wells recently wrote about in Flashpoint (November 2004): Master Dimensions, Master Fact Tables. |