Most of you by now have heard the words: "Data Vault". When you run it through your favorite search engine you get all kinds of different hits/definitions. No surprise. So what is it that I'm referring to when I discuss "Data Vault" with BI and EDW audiences?
This entry will try to answer such basic questions, just to provide a foundation of knowledge on which to build your fact finding.
Data Vault: Definitions vary - from security devices, to appliances that scramble your data, to other services that offer to "lock it up" for you... That's NOT what I'm discussing.
I define the Data Vault as having two basic components:
COMPONENT 1: The Data Vault Model
The Modeling component is really (quite simply) a mostly normalized hub and spoke data model design with table structures that allow flexibility, scalability, and auditability at its core.
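To make "hub and spoke" a little more concrete, here is a rough sketch of what such table structures might look like. It uses the usual Data Vault table types (hubs for business keys, links for relationships between hubs, satellites for descriptive history). The table names, column names, and the `build_vault` / `load_customer` helpers are all made up for illustration, and SQLite is used only so the example runs anywhere. Treat it as a sketch under those assumptions, not as a reference implementation.

```python
import sqlite3


def build_vault(conn):
    """Create a tiny, hypothetical hub / link / satellite layout."""
    conn.executescript(
        """
        -- Hub: one row per business key, plus audit columns.
        CREATE TABLE hub_customer (
            customer_key    INTEGER PRIMARY KEY,
            customer_number TEXT NOT NULL UNIQUE,
            load_date       TEXT NOT NULL,
            record_source   TEXT NOT NULL
        );
        CREATE TABLE hub_order (
            order_key     INTEGER PRIMARY KEY,
            order_number  TEXT NOT NULL UNIQUE,
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        -- Link: the "spoke" relating two hubs to each other.
        CREATE TABLE link_customer_order (
            link_key      INTEGER PRIMARY KEY,
            customer_key  INTEGER NOT NULL REFERENCES hub_customer (customer_key),
            order_key     INTEGER NOT NULL REFERENCES hub_order (order_key),
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL,
            UNIQUE (customer_key, order_key)
        );
        -- Satellite: descriptive attributes, kept as history by load date.
        CREATE TABLE sat_customer (
            customer_key  INTEGER NOT NULL REFERENCES hub_customer (customer_key),
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL,
            name          TEXT,
            city          TEXT,
            PRIMARY KEY (customer_key, load_date)
        );
        """
    )


def load_customer(conn, customer_number, name, city, load_date, source):
    """Register the business key once, then append a satellite row."""
    # The hub row is inserted only the first time this key is seen.
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer "
        "(customer_number, load_date, record_source) VALUES (?, ?, ?)",
        (customer_number, load_date, source),
    )
    (key,) = conn.execute(
        "SELECT customer_key FROM hub_customer WHERE customer_number = ?",
        (customer_number,),
    ).fetchone()
    # Every load adds a new satellite row, so earlier values are never lost.
    conn.execute(
        "INSERT INTO sat_customer "
        "(customer_key, load_date, record_source, name, city) "
        "VALUES (?, ?, ?, ?, ?)",
        (key, load_date, source, name, city),
    )
    return key
```

Loading the same customer twice with a changed city leaves one hub row and two satellite rows. The load date and record source on every row are what give you the auditability, and new source systems can be added as new satellites or links without reworking the existing tables.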
COMPONENT 2: The Data Vault Methodology
I've written a lot less about this piece. BUT: This piece is basically a project management component (project plan) + implementation standards + templates + data flow diagrams + statement of work objects + roles & responsibilities + dependencies + risk analysis + mitigation strategies + level of effort guesstimates + predictive type / expected outcomes + project binder, etc.
What's so special about that?
Well - what's special about the methodology is that it combines the best practices of Six Sigma, TQM, SEI/CMMI Level 5 (people and process automation/optimization), and PMP best practices (project reviews, etc.). Is it overkill? For some projects, yes; for others, no. It depends on how mature the culture of your organization is, and how far along the maturity path IT is - whether or not they are mandated to create, and then optimize the creation of, enterprise data warehouses.
Ok - the project sounds a lot like "too huge to handle" - old, cumbersome, too big, too massive an infrastructure, etc. Yeah, I've heard it all before, and quite frankly I'm sick of it.
I built a project this way in the 1990s for Lockheed Martin Astronautics called the Manufacturing Information Delivery System (MIDS / MIDW for short), which, last I heard, is still standing, still providing value, and still growing today. I was an employee for them under their EIS (Enterprise Information Systems) company. My funding came from project levels, specifically through contracts. I couldn't get time from a fellow IT worker without giving them my project Charge Number (yes, CHARGEBACKS). So every minute we burned was monitored and optimized. We built this enterprise data warehouse in 6 months total with a core team of 3 people (me, a DBA, and a SME). We had a part-time data architect/data modeler helping us out. We wrote all our code in COBOL, SQL, and Perl scripts. Our DEC/ALPHA mainframe was one of our web servers, so we wrote scripts that generated HTML every 5 minutes to let our users know when our reports were ready.
Ok - technology has come a long, long way since then, but the point is: we used this methodology successfully with limited time and limited resources. We combined both waterfall and spiral project methodologies to produce a repeatable project for enterprise data warehouse build-out. At the end of the project we were able to scale out our teams from our lessons learned, optimize our IT processes, and produce more successes in an agile time frame. We had a 2-page business requirements document. From the time the business user filled it in and handed it back to us to the time we delivered a new star schema was approximately 45 minutes to 1 hour. ** as long as the data was already in the data warehouse, and we didn't have to source a new system **
This is efficiency. We had a backlog of work from around the company because we had quick turnaround. Is this Agile? Don't know - all I know is it was fast and business users loved it.
Anyhow, off track - so let's get back.
The methodology is what drove the team to success - it allowed us to learn from our mistakes, correct and optimize our IT business processes, manage risk, and apply the appropriate mitigation strategies. We actually got to a point where we began turning a profit for our initial stakeholders (they were re-selling our efforts to other business units, bringing in multiple funding projects across the companies because of our turnaround time). The first project integrated 4 major systems: Finance, HR, Planning, and Manufacturing. The second project integrated Re-work, Contracts, and a few others like launch-pad parts.
Anyhow, at the heart of the methodology was and is a good (I like to think it's great) data architecture: the Data Vault Modeling components.
This is just the introduction; there is more to come. I really am counting on your feedback to drive the next set of blog entries, so please comment on what you'd like to hear about and what you have heard (good/bad/indifferent) about the Data Vault Model and/or methodology. Or contact me directly with your questions - as always I'll try to answer them.
Posted March 14, 2010 8:52 PM