Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best. Dan Linstedt, http://www.COBICC.com, danL@danLinstedt.com

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

In my last entry in this category, I described automorphic data models and how the Data Vault modeling components are one of the architectures/data models that will support dynamic adaptation of structure. In this entry I will discuss a little of the research I'm currently involved in, and how I am working towards a prototype that makes this technology work.

If you're not interested in the Data Vault model, or you don't care about "Dynamic Data Warehousing" then this entry is not for you.

The Data Vault model has reached the height of flexibility by applying the Link tables. It is an architecture that is linearly scalable and is based on the same mathematics that MPP is based on. Single Link tables represent associations: concepts linking two or more KEY ideas together at a point within the model. They also represent the GRAIN of those concepts.

Because the Link tables are always Many To Many, they are abstracted away from the traditional relationships (1 to many, 1 to 1, and many to 1). The Links become flexible and, in fact, dynamic. By adding strength and confidence ratings to the Link tables we can begin to gauge the STRENGTH of the relationship over time.
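To make the idea of rated links concrete, here is a minimal sketch. It is not the actual Data Vault DDL or the lab prototype; the names `LinkRow`, `hub_keys`, `strength`, `confidence`, and `ratings_over_time` are hypothetical, and the ratings are assumed to be numbers between 0 and 1.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LinkRow:
    """One many-to-many association between two or more hub keys."""
    hub_keys: tuple        # the business keys being associated (the grain)
    strength: float        # hypothetical rating, assumed to be in [0, 1]
    confidence: float      # hypothetical rating, assumed to be in [0, 1]
    load_date: datetime    # when this rating was recorded


def ratings_over_time(rows):
    """Return (load_date, strength, confidence) tuples, oldest first."""
    ordered = sorted(rows, key=lambda row: row.load_date)
    return [(row.load_date, row.strength, row.confidence) for row in ordered]
```

Because each rating is stored as a new row rather than an update, reading the rows in load-date order shows how the relationship strengthened or weakened over time.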

Dynamic mutability of data models is coming. In fact, I'd say it's already here. I'm working in my labs to make it happen, and believe me it's exciting. (only a geek would understand that one...) It brings the ability to:

* Alter the model based on incoming where clauses in queries (we can LEARN from what people are ASKING of the data sets and how they are joining items together)
* Alter the model based on incoming transactions in real-time (by examining the METADATA) and relative associativity / proximity to other data elements within the transaction.
* Alter the model based on patterns DISCOVERED within the data set itself: patterns of data that were previously "un-connected" or not associated.
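As a rough illustration of the first item in the list above (learning from how people join items together), the sketch below counts which tables are joined in incoming queries. It assumes simple equi-join predicates with table-qualified column names; the function name `count_joined_pairs` and the sample table names are hypothetical, and a real implementation would need a proper SQL parser and alias resolution.

```python
import re
from collections import Counter

# Matches simple equi-join predicates such as "customer.id = orders.customer_id".
# Assumption: columns are qualified with table names, not aliases.
JOIN_PREDICATE = re.compile(
    r"\b([A-Za-z_]\w*)\.\w+\s*=\s*([A-Za-z_]\w*)\.\w+"
)


def count_joined_pairs(queries):
    """Count how often each pair of tables is joined across the queries."""
    pairs = Counter()
    for sql in queries:
        for left, right in JOIN_PREDICATE.findall(sql):
            if left.lower() != right.lower():
                # Sort so (a, b) and (b, a) count as the same pair.
                pairs[tuple(sorted((left.lower(), right.lower())))] += 1
    return pairs
```

Pairs that show up often but have no Link between them yet would be candidates for a new Link table.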

The dynamic adaptability of the Data Vault modeling concepts shows up as a result of these discovery processes. I'm NOT saying that we can make machines "think" but I AM suggesting that we can "teach" the machines HOW the information is interconnected through auto-discovery processes over time. This mutability of the structure (without losing history) begins to create a "long term memory store" of notions and concepts that we've applied to the data over time.

By recording a history of our ACTIONS (what data we load, and how we query) we can GUIDE the neural network into better decision making and management of the structures underneath. This ranges from optimization of the model to discovery of new relationships that we may not have considered in the past.

The mining tool is:
* Mining the data set AND
* Mining the queries AND
* Mining the incoming transactions

to make this happen. We've known for a very long time that mining the data can reap benefits, but what we are starting to realize NOW is that mining these other components brings new benefits we've not considered before. In the Data Vault Book (the new business supermodel) I show a diagram of convergence (which Bill Inmon has signed off on). Convergence of systems is happening; Dynamic Data Warehousing is happening.

These neural networks work together to achieve a goal: creating and destroying Link tables over time (dynamic mutability of the data model) while leaving the KEYS (Hubs) and the history of the keys (Satellites) intact. Keep in mind that the Satellites surrounding Hubs and Links provide CONTEXT for the keys.
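The separation described here can be sketched as a small catalog that adds and retires Link definitions while never touching Hubs or Satellites. This is only an illustration under stated assumptions: the class `LinkCatalog`, its methods, and the sample names are hypothetical, and the neural networks that would decide when to create or retire a Link are left out entirely.

```python
from datetime import date


class LinkCatalog:
    """Tracks active Link tables; Hubs and Satellites live elsewhere, untouched."""

    def __init__(self):
        self.active = {}    # link name -> tuple of Hub names it connects
        self.history = []   # (date, action, link name); entries are never deleted

    def create_link(self, name, hubs, on):
        """Register a new Link between the given Hubs."""
        self.active[name] = tuple(hubs)
        self.history.append((on, "created", name))

    def retire_link(self, name, on):
        """Drop a Link definition; the history keeps a record of the change."""
        if self.active.pop(name, None) is not None:
            self.history.append((on, "retired", name))
```

A short usage example:

```python
catalog = LinkCatalog()
catalog.create_link("link_customer_product", ["hub_customer", "hub_product"], date(2008, 8, 1))
catalog.retire_link("link_customer_product", date(2008, 9, 1))
# catalog.active is now empty, but catalog.history still holds both events.
```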

I've already prototyped this experiment at a customer site, where I personally spent time mining the data, the relationships, and the business questions they wanted to ask. As a result I built 1 new Link table with a relationship they didn't have before. We used a data mining process to populate the table where strength and confidence were over 80%. The result? The business increased its gross profit by 40%. They opened up a new market of prospects and sales that they didn't previously have visibility into.
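The post does not say how the ratings were computed. One common reading, assumed here, is association-rule confidence: of the transactions that contain key A, the share that also contain key B. The sketch below keeps only the pairs above a threshold; "strength" is not modeled because the post does not define it, and the function name `confident_pairs` and the sample keys are hypothetical.

```python
from collections import Counter
from itertools import permutations


def confident_pairs(transactions, threshold=0.8):
    """Return {(a, b): confidence}, where confidence = P(b present | a present)."""
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = set(items)
        item_counts.update(unique)
        # Ordered pairs, because P(b | a) and P(a | b) can differ.
        pair_counts.update(permutations(sorted(unique), 2))
    result = {}
    for (a, b), together in pair_counts.items():
        confidence = together / item_counts[a]
        if confidence > threshold:
            result[(a, b)] = confidence
    return result
```

Each surviving pair would become a candidate row in the new Link table, with its confidence stored alongside the keys.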

Again, I'm building new neural nets and new algorithms using traditional off-the-shelf software and existing technology. It can be done: we can "teach" systems at a base level how to interact with us. They still won't think for themselves, but if they can discover relationships that might be important to us, then alert us to the interesting ones - then we've got a pretty powerful sub-system for back-offices.

More on the mathematics behind the Data Vault is on its way. I'll be publishing a white paper on the mathematics behind the Data Vault Methodology and Data Vault Modeling on B-Eye-Network.com very shortly.

Posted August 27, 2008 5:54 AM
Are you talking about an intersect type table?

I'm not great at "reading" word problems...any pics of the model (simple of course)?

I definitely like the idea of the neural network...


Hi Chet,

Yes, I am referring to an intersect type table. In 3rd normal form modeling it's also known as a hierarchical recursive relationship table.

Basically in mathematics it's referred to as a Tuple (where only the keys are a part of the Tuple), so Tuple math abounds...

I'm very interested in learning about Data Vault, but I find the book extremely overpriced. I can't find a lot of info on the web (examples, etc). How do you expect the approach to catch on like this?

Hi John,

I've reduced the cost significantly, to just enough to cover printing costs. It is a business book about the business value, not a technical book about technical aspects of building the Data Vault. The Data Vault series of articles is available for free on www.TDAN.com; please feel free to download those as well. Also, if you know of executives or others who might be able to make use of this book, feel free to share the PDF.

Also, I have a technical book with examples, how-to, and best practices on the way. It should be available next year. As for how I expect it to catch on: long before the book was ever published, I had already gotten a huge draw from the European Union, the US Government, and a number of companies here in the US and around the world. We now have well over 150 consultants trained and certified through http://www.GeneseeAcademy.com who build Data Vaults.

There is also a free forum where you can sign up and peruse some of the technical issues: http://www.DanLinstedt.com

I am from the Netherlands. Of course the book is great and you should read it (price is sooo relative if you consider the potential benefits).

Data Vault is an architectural style and methodology that is hugely exciting to learn. Every seasoned professional was excited when Dan and I taught them about the Data Vault in our 4-day seminar.

You actually see the 'light bulb' turning on.

It will happen...spread the word.

There is NO viable alternative......

