Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for Master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best, Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

August 2008 Archives

In my last entry in this category, I described automorphic data models and how the Data Vault modeling components make it one of the architectures/data models that support dynamic adaptation of structure. In this entry I will discuss a little bit about the research I'm currently involved in, and how I am working toward a prototype that makes this technology work.

If you're not interested in the Data Vault model, or you don't care about "Dynamic Data Warehousing," then this entry is not for you.

The Data Vault model reaches its height of flexibility through the Link tables. It is a linearly scalable architecture based on the same mathematics that underpins MPP. Individual Link tables represent associations: concepts linking two or more KEY ideas together at a point within the model. They also represent the GRAIN of those concepts.

Because the Link tables are always many-to-many, they are abstracted away from the traditional relationship types (one-to-many, one-to-one, and many-to-one). The Links become flexible, and in fact, dynamic. By adding strength and confidence ratings to the Link tables we can begin to gauge the STRENGTH of the relationship over time.
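
To make this concrete, here is a minimal sketch of what a Link table carrying strength and confidence ratings might look like. This is illustration only; the table and column names (hub_customer, hub_product, lnk_customer_product, strength_rating, confidence_rating) are my own assumptions, not a prescribed standard:

    -- Two Hubs hold the business keys; the Link relates them many-to-many.
    CREATE TABLE hub_customer (
        customer_sqn   NUMBER        PRIMARY KEY,      -- surrogate sequence key
        customer_num   VARCHAR2(30)  NOT NULL UNIQUE,  -- business key
        load_dts       DATE          NOT NULL,
        record_src     VARCHAR2(50)  NOT NULL
    );

    CREATE TABLE hub_product (
        product_sqn    NUMBER        PRIMARY KEY,
        product_num    VARCHAR2(30)  NOT NULL UNIQUE,
        load_dts       DATE          NOT NULL,
        record_src     VARCHAR2(50)  NOT NULL
    );

    -- The Link is always many-to-many; the strength and confidence ratings
    -- let us gauge how well the relationship holds up over time.
    CREATE TABLE lnk_customer_product (
        link_sqn           NUMBER       PRIMARY KEY,
        customer_sqn       NUMBER       NOT NULL REFERENCES hub_customer,
        product_sqn        NUMBER       NOT NULL REFERENCES hub_product,
        strength_rating    NUMBER(5,2),                -- e.g. 0 to 100
        confidence_rating  NUMBER(5,2),                -- e.g. 0 to 100
        load_dts           DATE         NOT NULL,
        record_src         VARCHAR2(50) NOT NULL
    );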

Dynamic mutability of data models is coming. In fact, I'd say it's already here. I'm working in my labs to make it happen, and believe me, it's exciting. (Only a geek would understand that one...) The ability to:

* Alter the model based on incoming WHERE clauses in queries (we can LEARN from what people are ASKING of the data sets and how they are joining items together)
* Alter the model based on incoming transactions in real time (by examining the METADATA) and the relative associativity/proximity to other data elements within the transaction
* Alter the model based on patterns DISCOVERED within the data set itself: patterns of data that were previously "un-connected" or not associated (a rough sketch of the first idea follows this list)
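
As a rough sketch of the first bullet, imagine the warehouse keeps a log of which tables each query joined together. Counting how often two Hubs appear joined in the same query surfaces candidate Link tables the model doesn't have yet. The query_join_log table and its columns are hypothetical; a real implementation might parse the database's own SQL or plan history instead:

    -- Hypothetical log: one row per (query_id, table touched in a join).
    -- Pairs of Hubs that are frequently joined together are candidates
    -- for new Link tables.
    SELECT a.table_name AS hub_a,
           b.table_name AS hub_b,
           COUNT(*)     AS times_joined_together
    FROM   query_join_log a
           JOIN query_join_log b
                ON  a.query_id   = b.query_id
                AND a.table_name < b.table_name   -- count each pair once
    WHERE  a.table_name LIKE 'HUB_%'
      AND  b.table_name LIKE 'HUB_%'
    GROUP BY a.table_name, b.table_name
    HAVING COUNT(*) > 100                         -- threshold is arbitrary
    ORDER BY times_joined_together DESC;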

The dynamic adaptability of the Data Vault modeling concepts shows up as a result of these discovery processes. I'm NOT saying that we can make machines "think," but I AM suggesting that we can "teach" the machines HOW the information is interconnected through auto-discovery processes over time. This mutability of the structure (without losing history) begins to create a "long-term memory store" of the notions and concepts we've applied to the data over time.

By recording a history of our ACTIONS (what data we load, and how we query it) we can GUIDE the neural network into better decision making and management of the structures underneath. This ranges from optimization of the model to the discovery of new relationships that we may not have considered in the past.

The mining tool is:
* Mining the data set,
* Mining the ARCHITECTURE,
* Mining the queries, AND
* Mining the incoming transactions

to make this happen. We've known for a very long time that mining the data can reap benefits, but what we are starting to realize NOW is that mining these other components drives home new benefits we haven't considered before. In the Data Vault book (the new business supermodel) I show a diagram of convergence (which Bill Inmon has bought off on). Convergence of systems is happening; Dynamic Data Warehousing is happening.

These neural networks work together to achieve a goal: creating and destroying Link tables over time (dynamic mutability of the data model) while leaving the KEYS (Hubs) and the history of the keys (Satellites) intact. Keep in mind that the Satellites surrounding Hubs and Links provide CONTEXT for the keys.

I've already prototyped this experiment at a customer, where I personally spent time mining the data, the relationships, and the business questions they wanted to ask. As a result I built one new Link table, containing a relationship they didn't have before. We used a data mining process to populate the table where strength and confidence were over 80%. The result? Their business increased its gross profit by 40%. They opened up a new market of prospects and sales that they didn't previously have visibility into.
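
A rough sketch of that kind of load, assuming the mining process writes its scored pairs to a staging table (mined_customer_product, a hypothetical name, as is the sequence) and that the Hub and Link tables look like the earlier sketch:

    -- Keep only relationships the mining process scored above 80% on both
    -- measures, and skip pairs the Link already holds.
    INSERT INTO lnk_customer_product
        (link_sqn, customer_sqn, product_sqn,
         strength_rating, confidence_rating, load_dts, record_src)
    SELECT lnk_customer_product_seq.NEXTVAL,
           m.customer_sqn,
           m.product_sqn,
           m.strength,
           m.confidence,
           SYSDATE,
           'MINING'
    FROM   mined_customer_product m
    WHERE  m.strength   > 80
      AND  m.confidence > 80
      AND  NOT EXISTS (SELECT 1
                       FROM   lnk_customer_product l
                       WHERE  l.customer_sqn = m.customer_sqn
                         AND  l.product_sqn  = m.product_sqn);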

Again, I'm building new neural nets and new algorithms using traditional off-the-shelf software and existing technology. It can be done; we can "teach" systems at a base level how to interact with us. They still won't think for themselves, but if they can discover relationships that might be important to us, and then alert us to the interesting ones, then we've got a pretty powerful sub-system for back offices.

More on the mathematics behind the Data Vault is on its way. I'll be publishing a white paper on the mathematics behind the Data Vault Methodology and Data Vault Modeling on B-Eye-Network.com very shortly.

Cheers,
Dan Linstedt


Posted August 27, 2008 5:54 AM

Here is another installment of the secrets of the masters. Quite frequently, customers and IT alike complain about how difficult it is to gather business requirements. They discuss the pain of having to "get together" for a day, or for a week-long process, to write down and document business processes and, ultimately, their needs and desires for a new BI/EDW system. Any good analyst worth their salt has battle scars from negotiating this treacherous ground.

We've all walked into an environment with a blank whiteboard and asked: "Business, please give me your requirements," only to be confronted with: "What can you provide to us?"

I'm here to describe a different way to you. This is an ancient technique, and it requires a flak jacket to be worn by the IT participants at all times. Remember: nothing is personal, this is only business. No, really, I'm not a Zen master, but a Zen master once said: it is far easier to tear something down than it is to build something up. So with that in mind, here we go... (I'm kidding about the ancient part...)

Many of you in the BI and data integration world handle BI requirements in a similar fashion: go through months and months of drudgery discovering business processes and "requests" for the new system's design. Then go through all kinds of long-lasting meetings to pull together a buy-off, and write up a business requirements document and a technical requirements document. Then, throughout the design and build phase of the project, have the users "slip in new requirements and larger scope because they forgot to tell you something."

But this process is incredibly painful, requires a lot of money up front before business users can "see" anything, and requires diligence on the part of the IT and business folks to see it through. Now don't get me wrong, I'm NOT saying this is a bad thing; I am saying that it simply takes too long and there is a better way up Mount Everest.

Ok - enough with the bad jokes already... How do I do this?
For one, you need to shift your thinking about the integration of data sets into your data warehouse. I've blogged on this before. There is a paradigm shift in the works for auditability and compliance, and it basically says: move your business rules DOWNSTREAM of the EDW. That's right, take a deep breath and swallow. Placing business rules upstream of the EDW will lead you back to the old techniques of waiting and waiting for business requirements. Moving the business rules DOWNSTREAM and implementing them on the way out of the EDW lets us see the gathering of business requirements in a new light.

Easy for you to say... I still don't quite get it....
Right. Have you ever walked into a room full of business users and started describing the data that their systems are "collecting today" from an integrated standpoint? If you haven't, you should try it. If you have, then you know: business users at that point are quick on the draw to point out why you're wrong, where the systems are wrong, what the problems are with the systems you are talking about, and, of course, why they have their own special Excel spreadsheet that they built to FIX this problem.

The point is, you'll be writing so fast you could practically start fires with your pen... These are the missing business requirements that you've been searching for so long and hard. It requires a flak jacket because you cannot take it personally.

By moving the business rules downstream of the EDW, we can load RAW and AUDITABLE data "as it stands" into the EDW. From there we can produce something called an "AS-IS STAR SCHEMA." The AS-IS star shows raw-level grain, with undoctored, uncensored, and unaggregated data sets. You can then share with the business users: "This is the way your source systems (once integrated) are currently capturing data, and, by the way, these are the results of your source systems executing your business processes."
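
For illustration, a minimal sketch of an AS-IS star built as views straight over the raw Data Vault, with no business rules applied. The Hub and Link names reuse the hypothetical structures sketched in the previous entry, and the Satellite name (sat_customer) is likewise an assumption:

    -- AS-IS dimension: the raw business key plus its descriptive attributes,
    -- exactly as the source systems delivered them (no cleansing).
    CREATE OR REPLACE VIEW dim_customer_asis AS
    SELECT h.customer_sqn,
           h.customer_num,
           s.customer_name,
           s.customer_status,
           s.load_dts
    FROM   hub_customer h
           JOIN sat_customer s ON s.customer_sqn = h.customer_sqn;

    -- AS-IS fact: one row per Link occurrence at raw grain,
    -- no aggregation and no business rules.
    CREATE OR REPLACE VIEW fact_customer_product_asis AS
    SELECT l.customer_sqn,
           l.product_sqn,
           l.load_dts,
           l.record_src
    FROM   lnk_customer_product l;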

They very quickly are more than happy to tear it down, shoot holes in it, and tell you why it's wrong and why it won't work. Again, if you're willing, you can gather nearly all the requirements you need for phase 1 within a one- or two-day session. This reduces the cycle time to delivery of your EDW environment and increases the visibility into all the "work-arounds" the business users are currently engaged in to "get the source systems to do what I want."

I've been using this technique for 16+ years, and it hasn't failed me yet. But again, it requires a strong will, and it requires moving the business rules downstream. AFTER you've collected the business requirements, you can then build integration processes to take the data from the AS-IS stars into the "business release" star schemas. Also, by moving the business rules downstream you can meet accountability, auditability, and compliance requirements in your EDW.

This is one of the most powerful secrets of the masters available within integration projects. Whether you are executing SOA, ESB, web services, or EDW/BI projects, it works, and yes, we teach this in our Data Vault Modeling and Certification course.

I'd love to hear your thoughts and comments.

Thank you,
Daniel Linstedt
http://www.DanLinstedt.com


Posted August 26, 2008 5:32 AM

It's not often in our industry that you get a chance to read about successes. Too much press is given to negative issues. This entry is about successful implementations.

Would you like your IT team to build "data marts in about an hour"? How about full EDWs with AS-IS star schemas in two weeks (regardless of the size of the source systems or the number of systems to integrate)? Would you, as a business user, like to hear that your new reporting requirements can be met within a two-day turnaround? How about your IT team becoming a profit center for the stakeholder rather than a cost center?

Sound too good to be true? It's NOT! Honestly, this is the first time in a long time that I'm excited again to be in IT. I'm working with several customers with whom we have made these things a reality. This entry is about how we did it, and how you can do it too.

The further I go, the more I realize that this approach is the key to success in any IT project. Of course hindsight is 20/20, so I should have realized this years and years ago... wait a minute... I did. This is why I built the Data Vault model, combined it with an SEI/CMMI Level 5 approach (a methodology for implementation), and now it's playing out.

What I've done over the years is combine, refine, and optimize everything from the project plan to the work breakdown structure to the risk analysis and mitigation strategies. We've also applied SEI/CMMI Level 5 and combined it with PMP, Six Sigma, TQM, and Lean initiatives to end up with drastically reduced cycle times, increased project quality, and massively leveraged team resources. We've reduced cost and improved delivery times of new projects by 10x.

At the specific customers I've been visiting over the past two years, we've seen the Data Vault modeling and methodology really take hold. We have happier customers, long-term relationships, and the CORPORATIONS and IT teams are winning together.

Let me explain: these are real case studies.
Customer A: Full AS-IS star schemas and EDW in 2 weeks
We took 198 tables across 4 source systems, combined the models (with some manual effort up front - about two weeks' worth of work before I came on-site), and within two days produced the following artifacts (a loading sketch follows the list):
* Staging models
* Data Vault (EDW) models
* AS-IS Star Schema models
* Master Data models
* Exploration Mart models
* Oracle stored procedures to load Source to Stage
* Oracle stored procedures to load Stage to Data Vault
* Oracle stored procedures to load Data Vault to AS-IS Star Schema
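
For a flavor of what a generated stage-to-Data-Vault procedure does, here is a minimal sketch of a Hub loader in Oracle PL/SQL. The table names (stg_customer, hub_customer) are hypothetical, the surrogate key is assumed to be filled by a sequence trigger or default, and the real generated procedures were considerably more involved:

    CREATE OR REPLACE PROCEDURE load_hub_customer AS
    BEGIN
      -- Insert any business keys present in staging that the Hub does not
      -- yet contain; existing keys are never updated or deleted.
      INSERT INTO hub_customer (customer_num, load_dts, record_src)
      SELECT s.customer_num, SYSDATE, MIN(s.record_src)
      FROM   stg_customer s
      WHERE  s.customer_num IS NOT NULL
        AND  NOT EXISTS (SELECT 1
                         FROM   hub_customer h
                         WHERE  h.customer_num = s.customer_num)
      GROUP BY s.customer_num;              -- one row per new business key
      COMMIT;
    END load_hub_customer;
    /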

At the end of two weeks, we had produced 3 cubes (that incorporated business rules) for the business users to access, see, feel, and touch. We did this all with 3 people on the team (myself and 2 others), keeping cost down, delivery high, and quality high. The customer decided this was such a success that they wanted to make a change while I was on-site. They fed us a new source system to combine. We had the integration done and the new system in place within 5 days, again producing 1 new cube.

The business decided that they'd rather use our team for new deliveries instead of building their own analysis and integration projects. We had successfully stemmed (or at least reduced) the tide of spread-marts.

Company B: (Data marts in about an hour)
Three months into production of the Data Vault EDW for Manufacturing, we had delivered 5 reporting tables (report collections). The business users wanted to build new "star schemas." We created a two-page requirements form with a sign-off. The business users usually took 1 week to "fill in the business requirements," but once they handed the form to our team, it only took us about 1 hour to turn it around and have a star schema available (a prototype filled with 5,000 rows of sample data).

If they liked it, we would load the full complement overnight. We did this over 15 years ago, and that Data Vault EDW is still in place today, and still running strong. We proved that "data marts in about an hour" was possible. Granted, the more complex the business rules, the longer it took to turn around a prototype. Our longest time to deliver in this situation was 1 week from receiving the requirements to prototype. The only other stipulation was that we already had the data and it didn't require integrating a new system.
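
The one-hour prototype is mostly mechanical: carve a small sample out of data the Data Vault already holds so the users have something to see and react to. A minimal Oracle-flavored sketch, with hypothetical table names and a deliberately simple join:

    -- Prototype mart: roughly 5,000 sample rows pulled from existing
    -- Data Vault structures; the full complement loads overnight.
    CREATE TABLE proto_mart_customer_product AS
    SELECT h.customer_num,
           s.customer_name,
           l.product_sqn,
           l.load_dts
    FROM   lnk_customer_product l
           JOIN hub_customer h ON h.customer_sqn = l.customer_sqn
           JOIN sat_customer s ON s.customer_sqn = h.customer_sqn
    WHERE  ROWNUM <= 5000;                  -- sample only, for sign-off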

We had 3 people on this team; we supported 5000 production reports on a daily basis at the end of 3 months.

Company C: (Turning IT into a profit center) - ANY good EDW should be able to do this!
At this particular company, we started the project 6 months in the hole. I was brought in to turn the project around. When I got there they had no requirements, no documentation, no tables, no loading systems, and almost no funding. The business users were fighting with IT over how it should even begin.

Well, long story short: inside of 3 weeks we had the business requirements written. But in addition to that, the stakeholder was concerned about the overall cost of the project - he couldn't identify hard ROI, and he feared that justifying hardware to grow the warehouse would continue to be a "money pit." We continued building the project this way for several months.

After 6 months we reached phase 1 production, and this is where our success begins. We were a cost center for our stakeholder at the time. We began receiving phone calls from other business units and other projects: could we build them mart X or mart Y, or could we provide them with reporting tables Z?

We said: yes, but here's our deal. We'll build the mart for you and give you a 10- or 30-day trial period. After that, if you don't like it or you don't use it, we'll tear it down and take it away. If you like it and use it, we will begin charging you for the disk space and CPU load cycles needed to support the hardware necessary to grow your efforts.

We were able to make accurate projections of the disk hardware required and the CPU cycles required to load, along with the RAM used. We also monitored their query usage to see what data they accessed, how much of it, and how often.
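
One simple piece of that chargeback picture, disk space per mart, can come straight from the database's own dictionary. A minimal sketch against Oracle's DBA_SEGMENTS view; the mart schema names and the per-gigabyte rate are made up for illustration:

    -- Monthly storage charge per data mart schema.
    SELECT owner                                             AS mart_schema,
           ROUND(SUM(bytes) / 1024 / 1024 / 1024, 2)         AS gb_used,
           ROUND(SUM(bytes) / 1024 / 1024 / 1024 * 25, 2)    AS monthly_charge_usd
    FROM   dba_segments
    WHERE  owner IN ('MART_SALES', 'MART_FINANCE')           -- illustrative schemas
    GROUP BY owner
    ORDER BY gb_used DESC;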

What ended up happening was that the new business unit would "sign up" for a data mart service and begin paying our stakeholder for the privilege of "renting" the machine resources. It got better from there: once they realized that this would work, they began asking for new systems to be incorporated. We would then begin a project costing and estimating phase in which they became the "stakeholder" of that part of the system and data set.

We replicated the business model across the entire enterprise. We constantly had more projects than we could staff, the business users were happy, and they were actually able to cross-charge business units for the use of their information. Voilà - our main stakeholder said we had become a profit center for him.

Goes to show you: if you can run your IT team (no matter how big or small) like a business, you will get more business going forward...

I'd love to hear your success stories, if you'd care to share.

Cheers,
Daniel Linstedt


Posted August 23, 2008 5:53 AM