
Welcome to the next installment of Data Vault Modeling and Methodology.  In this entry I will attempt to address the comment I received on the last entry, surrounding the Data Vault and master data.  I will continue posting as much information as I can to help spread the knowledge for those of you still questioning and considering the Data Vault.  I will also try to share more success stories as we go, as much of my industry knowledge has been accrued in the field - actually building systems that have turned into successes over the years.

Ok, let's discuss the health-care provider space, managed data and master data sets at a conceptual level, and a few other things along the way.

I have a great deal of experience in building Data Vaults to assist in managing health-care solutions.  I helped build a solution at Blue Cross Blue Shield (WellPoint, St. Louis), and another Data Vault was built and used for a part of the Centers for Medicare and Medicaid Services in Washington, DC.  Another Data Vault is currently being built for a Congressionally mandated US Government electronic health-records system that helps track US service personnel, and there are quite a few more in this space that I cannot mention or discuss.

Anyhow, what does this have to do with Data Vault modeling, and with building data warehouses for chaotic systems or immature organizations?

Well - let's see if we can cover this for you.  First, realize that we are discussing the Data Vault data modeling constructs (hub and spoke) here; we are not addressing the methodology components - those can come later if you like (although, having said that, I will introduce the parts of the project that help parallel yet independent team efforts meet, or link together, at the end).

Ok, so how does Data Vault Modeling truly work?

It starts with the business key - or should I say, the multiple business keys.  The business keys are the true identifiers of the information that lives and breathes at the fingertips of our applications.  These keys are what the business users apply to locate records, and to uniquely identify records across multiple systems.  There are plenty of keys to go around, and source systems often disagree as to what the keys mean, how they are entered, how they are used, and even what they represent.  You can have keys that look the same but represent two different individuals; you can have two of the same key that SHOULD represent the same individual, but whose details (for whatever reason) are different in each operational system; or you can have two of the same key representing duplicate records across multiple systems (the best-case scenario).

These business keys are the HUBS, or Hub entities.  If you want, the different project teams can each build their own Data Vault model constructed from the business keys in their own systems.  Once the data is loaded to a historical store (or multiple stores), you can then build Links across the business keys to represent "same-as" keys: i.e., keys that look the same, that the business user defines to be the same, but where the data disagrees with itself.
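
To make that concrete, here is a minimal sketch of a Hub and a "same-as" Link in plain SQL.  The names (HUB_PATIENT, LNK_PATIENT_SAME_AS, and so on) are illustrative assumptions on my part, not a prescribed standard; the pattern is what matters - the Hub holds nothing but the business key plus audit columns, and the Link relates two Hub keys.

-- Hub: one row per unique business key, with standard audit columns.
CREATE TABLE HUB_PATIENT (
    PATIENT_SQN  INTEGER      NOT NULL,  -- surrogate sequence key
    PATIENT_BK   VARCHAR(30)  NOT NULL,  -- the business key itself
    LOAD_DTS     TIMESTAMP    NOT NULL,  -- when the key first arrived
    RECORD_SRC   VARCHAR(20)  NOT NULL,  -- which source system supplied it
    CONSTRAINT PK_HUB_PATIENT PRIMARY KEY (PATIENT_SQN),
    CONSTRAINT UK_HUB_PATIENT UNIQUE (PATIENT_BK)
);

-- "Same-as" Link: relates two Hub keys the business declares identical,
-- even though the underlying detail data disagrees.
CREATE TABLE LNK_PATIENT_SAME_AS (
    PATIENT_SAME_AS_SQN   INTEGER     NOT NULL,
    MASTER_PATIENT_SQN    INTEGER     NOT NULL,  -- the surviving key
    DUPLICATE_PATIENT_SQN INTEGER     NOT NULL,  -- the key declared "same as"
    LOAD_DTS              TIMESTAMP   NOT NULL,
    RECORD_SRC            VARCHAR(20) NOT NULL,
    CONSTRAINT PK_LNK_PSA PRIMARY KEY (PATIENT_SAME_AS_SQN),
    CONSTRAINT FK_LNK_PSA_M FOREIGN KEY (MASTER_PATIENT_SQN)
        REFERENCES HUB_PATIENT (PATIENT_SQN),
    CONSTRAINT FK_LNK_PSA_D FOREIGN KEY (DUPLICATE_PATIENT_SQN)
        REFERENCES HUB_PATIENT (PATIENT_SQN)
);

Notice the Hub carries no descriptive attributes at all; those belong in Satellites.  That separation is exactly what lets parallel teams model their own keys without stepping on each other.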

Remember, Links are transitional: they represent the state of the current business relationships today.  They change over time; Links come and go.  They are the fluid, dynamic portion of the Data Vault - making "changes to structure" a real possibility... but I digress.
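
One common way to let Links come and go without losing history is an effectivity Satellite hanging off the Link.  Here is a sketch, again with assumed names, building on the same-as Link above:

-- Effectivity Satellite on the same-as Link: records when the business
-- considered the relationship active, without ever deleting Link rows.
CREATE TABLE SAT_PATIENT_SAME_AS_EFF (
    PATIENT_SAME_AS_SQN INTEGER     NOT NULL,
    LOAD_DTS            TIMESTAMP   NOT NULL,  -- when this status was recorded
    END_DTS             TIMESTAMP,             -- NULL while the relationship holds
    RECORD_SRC          VARCHAR(20) NOT NULL,
    CONSTRAINT PK_SAT_PSA_EFF PRIMARY KEY (PATIENT_SAME_AS_SQN, LOAD_DTS),
    CONSTRAINT FK_SAT_PSA_EFF FOREIGN KEY (PATIENT_SAME_AS_SQN)
        REFERENCES LNK_PATIENT_SAME_AS (PATIENT_SAME_AS_SQN)
);

-- When the business retires a relationship, end-date the current record
-- rather than deleting anything (42 is a hypothetical Link key).
UPDATE SAT_PATIENT_SAME_AS_EFF
   SET END_DTS = CURRENT_TIMESTAMP
 WHERE PATIENT_SAME_AS_SQN = 42
   AND END_DTS IS NULL;

The Link row itself is never deleted; only its effectivity record is closed, so "as-of" queries against history still work.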

Getting the different teams building their own Data Vaults in parallel is the first step.  Once they have built Hubs, Links, and their own Satellites - and they are loading and storing historical data - a "master data" team can begin to attack the cross-links from a corporate standpoint.  This must be done in much the same manner as building a corporate ontology: different definitions for different parts of the organization, even for different levels within the organization.  The master data team can then build the cross-links to provide the "corporate view" to the corporate customers, with the appropriate corporate definitions.

Think back to a scale-free architecture: it's often built like a B+ tree or binary tree, where nodes live inside of nodes, other nodes are stacked on top of nodes, and so on.  So we have Data Vault warehouse A and Data Vault warehouse B - now we need corporate Data Vault warehouse C to span the two.  Links are the secret, followed by Satellites on the Links.  There may even be some newly added Hub keys (as a result of a spreadsheet or two used at the corporate level) - again, business keys used at the corporate level that are not used at any other level of the organization.
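
As a hedged sketch of what warehouse C might add: assume vault A exposes HUB_MEMBER and vault B exposes HUB_SUBSCRIBER (both hypothetical, built on the same Hub pattern shown earlier).  The corporate layer then owns only the Link and its Satellite.  Foreign keys on the Link are omitted here, since the two Hubs may live in physically separate stores.

-- Assumed Hubs from the two lower-level vaults.
CREATE TABLE HUB_MEMBER (
    MEMBER_SQN  INTEGER     NOT NULL PRIMARY KEY,
    MEMBER_BK   VARCHAR(30) NOT NULL UNIQUE,
    LOAD_DTS    TIMESTAMP   NOT NULL,
    RECORD_SRC  VARCHAR(20) NOT NULL
);

CREATE TABLE HUB_SUBSCRIBER (
    SUBSCRIBER_SQN INTEGER     NOT NULL PRIMARY KEY,
    SUBSCRIBER_BK  VARCHAR(30) NOT NULL UNIQUE,
    LOAD_DTS       TIMESTAMP   NOT NULL,
    RECORD_SRC     VARCHAR(20) NOT NULL
);

-- Corporate-level Link spanning Hubs that belong to two separately
-- built Data Vaults.
CREATE TABLE LNK_CORP_MEMBER_SUBSCRIBER (
    CORP_MS_SQN    INTEGER     NOT NULL PRIMARY KEY,
    MEMBER_SQN     INTEGER     NOT NULL,  -- Hub key from vault A
    SUBSCRIBER_SQN INTEGER     NOT NULL,  -- Hub key from vault B
    LOAD_DTS       TIMESTAMP   NOT NULL,
    RECORD_SRC     VARCHAR(20) NOT NULL   -- e.g. 'CORP_MDM_TEAM'
);

-- Satellite on the corporate Link: carries the corporate-level definition
-- and context that no single source vault owns.
CREATE TABLE SAT_CORP_MEMBER_SUBSCRIBER (
    CORP_MS_SQN     INTEGER      NOT NULL,
    LOAD_DTS        TIMESTAMP    NOT NULL,
    CORP_DEFINITION VARCHAR(255),          -- the ontology entry for this pairing
    CONFIDENCE_PCT  DECIMAL(5,2),          -- how certain the corporate match is
    RECORD_SRC      VARCHAR(20)  NOT NULL,
    CONSTRAINT PK_SAT_CORP_MS PRIMARY KEY (CORP_MS_SQN, LOAD_DTS)
);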

Finally, at long last, a good use for ontologies marrying to enterprise data warehouses.  By the way, this is also the manner in which you develop a master data set.  Don't forget that MDM means Master Data Management - and MDM includes people, process, and technology.  The Data Vault only provides the means to easily construct master data; it is NOT an MDM solution, strictly an MD "Master Data" solution.

Governance doesn't have to be separate, and doesn't have to come before or after the Data Vaults are built - and again, disparate EDW Data Vaults can be built by parallel teaming efforts.  That said, once you embark on building master data sets, you *MUST* have governance in place to define the ontology, the access paths, and the corporate view (corporate Links, Hubs, and Satellites) that you want in the master data components.

In essence, you are using the Data Vault componentry (from the data modeling side) to bridge the lower-level Data Vaults - to feed back to operational systems (that's where master data begins to hit ROI, if done properly), and to provide feeds to corporate data marts.
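
A hypothetical corporate-view query, using the assumed corporate Link and Satellite above, might bridge the two vaults like this before handing the rows to a data mart load or an operational feed:

-- Bridge the two lower-level vaults through the corporate Link, picking
-- the most recent corporate definition for each relationship.
SELECT  hm.MEMBER_BK,
        hs.SUBSCRIBER_BK,
        sc.CORP_DEFINITION,
        sc.LOAD_DTS AS CORP_AS_OF
FROM    LNK_CORP_MEMBER_SUBSCRIBER l
JOIN    HUB_MEMBER     hm ON hm.MEMBER_SQN     = l.MEMBER_SQN
JOIN    HUB_SUBSCRIBER hs ON hs.SUBSCRIBER_SQN = l.SUBSCRIBER_SQN
JOIN    SAT_CORP_MEMBER_SUBSCRIBER sc ON sc.CORP_MS_SQN = l.CORP_MS_SQN
WHERE   sc.LOAD_DTS = (SELECT MAX(s2.LOAD_DTS)  -- latest Satellite row only
                       FROM   SAT_CORP_MEMBER_SUBSCRIBER s2
                       WHERE  s2.CORP_MS_SQN = l.CORP_MS_SQN);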

In fact, we are using this very same technique in a certain organization to protect specific data - keeping some data in the classified world while other data lives in the unclassified or commercial world.  Scale-free architecture works in many ways, and the Link table (aside from adding joins) is the sole construct that makes this possible; it is what makes the Data Vault model fluid.

It's also what helps IT be more agile and more responsive to business needs going forward.  The Link table houses the dynamic ability to adapt quickly and change on the fly.

I'm not sure if I mentioned it, but ING Real Estate is using Excel spreadsheets through Microsoft SharePoint to trigger Link changes and structural changes to the Data Vault on the fly.  Thus, when the spreadsheets change and the relationships change, the Link tables change - leaving the existing history intact, and creating new joins/new Links for future historical collection.  This is yet another example of dynamic alteration of structure (on the fly) that is helping companies overcome many obstacles.
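
I don't know the internals of ING's implementation, so take the following purely as an illustrative sketch of the general pattern: how a relationship change arriving from a spreadsheet can be applied without destroying history.  Every table name and key value here is hypothetical.

-- Hypothetical vault structures for a real-estate relationship.
CREATE TABLE LNK_PROPERTY_PORTFOLIO (
    PROPERTY_PORTFOLIO_SQN INTEGER     NOT NULL PRIMARY KEY,
    PROPERTY_SQN           INTEGER     NOT NULL,
    PORTFOLIO_SQN          INTEGER     NOT NULL,
    LOAD_DTS               TIMESTAMP   NOT NULL,
    RECORD_SRC             VARCHAR(20) NOT NULL
);

CREATE TABLE SAT_PROPERTY_PORTFOLIO_EFF (
    PROPERTY_PORTFOLIO_SQN INTEGER     NOT NULL,
    LOAD_DTS               TIMESTAMP   NOT NULL,
    END_DTS                TIMESTAMP,
    RECORD_SRC             VARCHAR(20) NOT NULL,
    PRIMARY KEY (PROPERTY_PORTFOLIO_SQN, LOAD_DTS)
);

-- Step 1: end-date the effectivity record for the old relationship.
UPDATE SAT_PROPERTY_PORTFOLIO_EFF
   SET END_DTS = CURRENT_TIMESTAMP
 WHERE PROPERTY_PORTFOLIO_SQN = 1001  -- old Link row, hypothetical key
   AND END_DTS IS NULL;

-- Step 2: insert the new Link row; the old row, and all Satellite history
-- attached to it, stays in place for historical queries.
INSERT INTO LNK_PROPERTY_PORTFOLIO
       (PROPERTY_PORTFOLIO_SQN, PROPERTY_SQN, PORTFOLIO_SQN, LOAD_DTS, RECORD_SRC)
VALUES (1002, 77, 12, CURRENT_TIMESTAMP, 'SHAREPOINT_XLS');

-- Step 3: open an effectivity record for the new relationship.
INSERT INTO SAT_PROPERTY_PORTFOLIO_EFF
       (PROPERTY_PORTFOLIO_SQN, LOAD_DTS, END_DTS, RECORD_SRC)
VALUES (1002, CURRENT_TIMESTAMP, NULL, 'SHAREPOINT_XLS');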

But I ramble.  There's another company, Tyson Foods, which has a very small Data Vault absorbing XML and XSD information from 50 or so external feeds, most of which change on a bi-weekly basis.  They had one team build this as a pilot project using the Data Vault, and they are now adapting easily and quickly to any of the external feed changes coming their way.  In fact, they were able to apply the master data/governance concepts at the data level, and "clean up" the XML quality of the feeds they were redistributing back to their suppliers.

So let me bring it home:  are governance and clean-up required up-front to build a Data Vault?

No - not now, not ever.   Is it a good thing?  Well, maybe; but by loading the data you do have into disparate Data Vaults, you can quickly and easily discover just where the business rules are broken, and where the applications don't synchronize when they are supposed to.  Can the Data Vault model help you in building your MDM?  Yes, but it's only a step on the master data side of the house.  You are still responsible for the "Data Management" part of MDM - the people, process, and technology, including the governance...  all part of project management at a corporate level.
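
For instance, once the same-as Links exist, a simple audit query can surface exactly where the applications disagree.  This sketch assumes a hypothetical SAT_PATIENT_DETAIL Satellite on the HUB_PATIENT shown earlier; for brevity it compares all historical rows, where a production version would restrict to the latest LOAD_DTS per key.

-- Assumed descriptive Satellite on HUB_PATIENT.
CREATE TABLE SAT_PATIENT_DETAIL (
    PATIENT_SQN INTEGER     NOT NULL,
    LOAD_DTS    TIMESTAMP   NOT NULL,
    LAST_NAME   VARCHAR(50),
    BIRTH_DATE  DATE,
    RECORD_SRC  VARCHAR(20) NOT NULL,
    PRIMARY KEY (PATIENT_SQN, LOAD_DTS)
);

-- For keys the business says are the same, list the attributes on which
-- the source systems still disagree - i.e., where the rules are broken.
SELECT  hm.PATIENT_BK AS master_key,
        hd.PATIENT_BK AS duplicate_key,
        sm.LAST_NAME  AS master_name,
        sd.LAST_NAME  AS duplicate_name
FROM    LNK_PATIENT_SAME_AS l
JOIN    HUB_PATIENT hm ON hm.PATIENT_SQN = l.MASTER_PATIENT_SQN
JOIN    HUB_PATIENT hd ON hd.PATIENT_SQN = l.DUPLICATE_PATIENT_SQN
JOIN    SAT_PATIENT_DETAIL sm ON sm.PATIENT_SQN = l.MASTER_PATIENT_SQN
JOIN    SAT_PATIENT_DETAIL sd ON sd.PATIENT_SQN = l.DUPLICATE_PATIENT_SQN
WHERE   sm.LAST_NAME  <> sd.LAST_NAME
     OR sm.BIRTH_DATE <> sd.BIRTH_DATE;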

This brings the second segment to a close.  I'd love to have your feedback: what else about the Data Vault are you interested in?  Again, these entries are meant to be high level - to explain the concepts.  Let me know if I'm meeting your needs, and feel free to contact me directly.

Thank you kindly,

Dan L,  DanL@DanLinstedt.com


Posted March 15, 2010 6:39 PM