Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best. Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

Most of you by now have heard the words: "Data Vault".  When you run it through your favorite search engine you get all kinds of different hits/definitions.  No surprise.  So what is it that I'm referring to when I discuss "Data Vault" with BI and EDW audiences?

This entry will try to answer such basic questions, just to provide a foundation of knowledge on which to build your fact finding.

Data Vault: Definitions vary - from security devices, to appliances that scramble your data, to other services that offer to "lock it up" for you...  That's NOT what I'm discussing.

I define the Data Vault as having two basic components:

COMPONENT 1: The Data Vault Model

The Modeling component is really (quite simply) a mostly normalized hub and spoke data model design with table structures that allow flexibility, scalability, and auditability at its core.
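To make "hub and spoke" a little more concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is an illustration under stated assumptions, not a reference implementation: the table names (hub_customer, sat_customer_details), the columns, and the load_customer helper are all hypothetical. The hub holds only the business key; the spoke (usually called a satellite in Data Vault terminology) holds descriptive attributes and is append-only, with a load date and record source on every row, which is where the auditability comes from. Relationships between hubs (links) are left out to keep it short.

```python
import sqlite3

# Hypothetical table and column names, chosen for illustration only.
SCHEMA = """
CREATE TABLE hub_customer (
    customer_key   INTEGER PRIMARY KEY,     -- surrogate key
    customer_id    TEXT NOT NULL UNIQUE,    -- business key
    load_date      TEXT NOT NULL,           -- when the key was first seen
    record_source  TEXT NOT NULL            -- which system supplied it
);
CREATE TABLE sat_customer_details (
    customer_key   INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL,
    name           TEXT,
    city           TEXT,
    PRIMARY KEY (customer_key, load_date)   -- one row per key per load
);
"""


def load_customer(conn, customer_id, name, city, source, load_date):
    """Insert the hub row once, then append a satellite row (never update)."""
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer "
        "(customer_id, load_date, record_source) VALUES (?, ?, ?)",
        (customer_id, load_date, source),
    )
    (key,) = conn.execute(
        "SELECT customer_key FROM hub_customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
        (key, load_date, source, name, city),
    )
    return key


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The same business key arrives twice with a changed attribute.
load_customer(conn, "C-100", "Acme", "Denver", "CRM", "2010-03-01")
load_customer(conn, "C-100", "Acme", "Boulder", "CRM", "2010-03-14")

hub_rows = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
history = conn.execute(
    "SELECT load_date, city FROM sat_customer_details ORDER BY load_date"
).fetchall()
print(hub_rows, history)
```

The business key is stored once in the hub, while both versions of the descriptive data stay in the satellite. Adding a new kind of descriptive data would mean adding another spoke table rather than altering existing ones, which is one way to read the flexibility claim.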

COMPONENT 2: The Data Vault Methodology

I've written a lot less about this piece. BUT: this piece is basically a project management component (project plan) + implementation standards + templates + data flow diagrams + statement of work objects + roles & responsibilities + dependencies + risk analysis + mitigation strategies + level of effort guesstimates + predictive type / expected outcomes + project binder, etc.

What's so special about that?

Well - what's special about the methodology is that it combines the best practices of Six Sigma, TQM, SEI/CMMI Level 5 (people and process automation/optimization), and PMP best practices (project reviews, etc.). Is it overkill? For some projects, yes; for others, no. It depends on how mature the culture of your organization is, and how far along the maturity path IT is - whether or not they are bound or decreed to create, then optimize the creation of, enterprise data warehouses.

OK - the project sounds a lot like "too huge to handle" - old, cumbersome, too big, too massive an infrastructure, etc. Yeah, I've heard it all before, and quite frankly I'm sick of it.

I built a project this way in the 1990s for Lockheed Martin Astronautics called the Manufacturing Information Delivery System (MIDS / MIDW for short), which, last I heard, is still standing, still providing value, and still growing today. I was an employee for them under their EIS (Enterprise Information Systems) company. My funding came from project levels, specifically through contracts. I couldn't get time from a fellow IT worker without giving them my project Charge Number (yes, CHARGEBACKS). So every minute we burned was monitored and optimized. We built this enterprise data warehouse in 6 months total with a core team of 3 people (me, a DBA, and a SME). We had a part-time data architect/data modeler helping us out. We wrote all our code in COBOL, SQL, and Perl scripts. Our DEC/ALPHA mainframe was one of our web servers, so we wrote scripts that generated HTML every 5 minutes to let our users know when our reports were ready.

OK - technology has come a long, long way since then, but the point is: we used this methodology successfully with limited time and limited resources. We combined both waterfall and spiral project methodologies to produce a repeatable project for enterprise data warehouse build-out. At the end of the project we were able to scale out our teams from our lessons learned, optimize our IT processes, and produce more successes in an agile time frame. We had a 2-page business requirements document; from the time the business user filled it in and handed it back to us to the time we delivered a new star schema was approximately 45 minutes to 1 hour ** as long as the data was already in the data warehouse, and we didn't have to source a new system **

This is efficiency. We had a backlog of work from around the company because we had quick turnaround. Is this Agile? Don't know - all I know is it was fast and business users loved it.

Anyhow, off track - so let's get back.

The methodology is what drove the team to success - it allowed us to learn from our mistakes, correct and optimize our IT business processes, manage risk, and apply the appropriate mitigation strategies. We actually got to a point where we began turning a profit for our initial stakeholders (they were re-selling our efforts to other business units, bringing in multiple funding projects across the companies because of our turnaround time). The first project integrated 4 major systems: Finance, HR, Planning, and Manufacturing. The second project integrated Re-work, Contracts, and a few others like launch-pad parts.

Anyhow, at the heart of the methodology was and is a good (I like to think it's great) data architecture: the Data Vault Modeling components.

This is just the introduction; there is more to come. I really am counting on your feedback to drive the next set of blog entries, so please comment on what you'd like to hear about and what you have heard (good/bad/indifferent) about the Data Vault Model and/or methodology. Or contact me directly with your questions - as always I'll try to answer them.

Thanks,

Dan Linstedt

DanL@DanLinstedt.com


Posted March 14, 2010 8:52 PM

3 Comments

Dan,

I always like hearing success stories about systems that survive the test of time. We often think about scalability merely in technical terms. Scalability in conceptual terms (how big a set of distinct ideas can a given concept consume) seems like a powerful trait of the data vault.

What I'd like to hear more about is how effective (or not) the data vault is for organizations that are immature in data and information management. Working in the health care provider space: there is a high degree of chaos in how we conceptually manage and organize data (MDM); there are a lot of different systems with data that we care about; and a lack of industry maturity. So, I'd like to hear some thoughts about how the data vault fits into that environment. Can it be a tool to help; or does practical implementation rely heavily on a separate governance effort to clean and organize first?

I hope that's merely a leading question and I know the answer already.


Dan,

This idea of a data vault makes a whole lot of sense. We completely concur with the idea; in our experience we are seeing more traction in virtual-agile and cloud use in delivering BI services. BIAAS is a term being used frequently to refer to a self-service product, but in our experience it seems that people and process have a very important role in the success of any BIAAS project.

Businesses today are more inclined to consider data vault when you consider the challenges of reporting from various silos of applications within an enterprise (SAAS, CRM, ERP, Legacy, XML and SQL). Most IT admins and CIOs we talk to seem to be interested in streamlining and consolidating reporting, visualization and notification from silos of information resident in various business applications within the enterprise.

The value of data vault comes from a re-usability point of view and consistency of metadata. The key problem to solve would be the idea of dynamic DW generation and metadata management/mapping/discovery.

Not sure if other concepts such as canonical metadata f, generic (EL) extract-load logic to push enterprise data into cloud warehouse, automation of transformation + mapping and model-driven visualization would play any role in this data vault concept. Repeatability of quality would require more role from the role and a minimalistic role from process and people.

It would be interesting to know more about how your data vault project does this?

