Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

It seems people have taken the term "Dynamic Data Warehousing" and abused it. They've made it about "Dynamic Data" and completely ignored "Dynamic Modeling", or dynamic restructuring as the case may be. Automorphic means self-changing and self-adapting. In this entry we'll talk about different capabilities of Dynamic Data Warehousing and the changes to data models as they grow.

First, let's define what we mean by Dynamic Data Warehousing (DDW). My definition has come to mean:
* Data models that adapt to the incoming data based on A.I. rule sets, learned data patterns, linguistics, metadata, and associativity.
* Load patterns that are driven by the changes themselves, moving data from point A (the source) directly to point B (the target, or the dynamic model).
* Indexes that shift, and are created and dropped on the fly, based on load patterns and query patterns.
* Learning systems attached to the firmware of the device that watch and learn about the metadata; tying the metadata together is an important step.
* Adaptable cubes: dynamic cubes or in-memory aggregations based on data "temperature" (hot, warm, cold). In other words, in-memory ROLAP solutions built from metadata, cube structures, the attribution of data sets, and the queries (the questions being asked).
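To make the "temperature" and shifting-index bullets a little more concrete, here is a minimal Python sketch. It assumes a hypothetical query log in which each entry records the table read, the columns used in predicates, and how many days ago the query ran. The `QueryEvent` record, both helper functions, and every threshold are illustrative assumptions, not a description of any product or of the lab work mentioned below; a real system would act on these suggestions automatically rather than just return them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    """One entry from a HYPOTHETICAL query log."""
    table: str
    filter_columns: tuple  # columns used in WHERE / JOIN predicates
    age_days: float        # how long ago the query ran


def classify_temperature(events, window_days=7.0, hot_min=10, warm_min=2):
    """Label each table hot / warm / cold by its number of reads inside the window.
    Thresholds are arbitrary, illustrative values."""
    recent = Counter(e.table for e in events if e.age_days <= window_days)
    labels = {}
    for table in sorted({e.table for e in events}):
        n = recent[table]  # Counter returns 0 for tables with no recent reads
        labels[table] = "hot" if n >= hot_min else "warm" if n >= warm_min else "cold"
    return labels


def plan_indexes(events, existing, create_min=5, window_days=7.0):
    """Suggest (table, column) indexes to create or drop from recent predicate usage.
    `existing` is the set of (table, column) pairs that are already indexed."""
    usage = Counter(
        (e.table, col)
        for e in events
        if e.age_days <= window_days
        for col in e.filter_columns
    )
    to_create = sorted(k for k, n in usage.items() if n >= create_min and k not in existing)
    to_drop = sorted(k for k in existing if usage[k] == 0)
    return to_create, to_drop


log = (
    [QueryEvent("sales", ("region",), 1.0)] * 12
    + [QueryEvent("customer", ("city",), 2.0)] * 3
    + [QueryEvent("archive", ("old_id",), 30.0)]
)
print(classify_temperature(log))  # {'archive': 'cold', 'customer': 'warm', 'sales': 'hot'}
print(plan_indexes(log, existing={("archive", "old_id")}))
# ([('sales', 'region')], [('archive', 'old_id')])
```

The hot/warm/cold labels are what would decide which data sets earn an in-memory aggregation; the create/drop lists are the "indexes that shift" bullet reduced to a frequency count.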

Now this may all sound really interesting, and extremely futuristic, but I can assure you it's not. I am currently working on solutions in my lab that implement portions of these elements. The hardest one (you might think) is the dynamic modeling, or dynamic restructuring of the database... well, let me tell you: nope! That's not the hardest piece (when the Data Vault modeling architecture is used)...

Keep in mind that the DDW is one or two steps beyond the Operational Data Warehouse, which I've just begun writing about. Also remember that a DDW retains all the responsibilities of a "Data Warehouse": time-variant, non-volatile, granular, etc. That said, the question for dynamic data modeling becomes: how do you keep history on massive volumes of information without losing value, and without reorganizing or altering existing structures?

The full answer will come in a later entry, but with the Data Vault modeling methodology it CAN be done...
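Without pre-empting that later entry, here is a minimal sketch of the general idea in Python with SQLite. It assumes the commonly described Data Vault split: hubs carry business keys, satellites carry descriptive attributes stamped with a load date. When new attributes arrive, a new satellite is created instead of running ALTER TABLE on an existing one, and a satellite row is written only when an MD5 hash of its attributes changes (one simple way to let loads be driven by the changes). The table names, the hash-diff scheme, and the `load` helper are illustrative assumptions, not the full methodology.

```python
import hashlib
import sqlite3


def hash_diff(values: dict) -> str:
    """Hash the attribute values so a change is detected with one comparison."""
    payload = "|".join(f"{k}={values[k]}" for k in sorted(values))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# NOTE: table names are interpolated into SQL; acceptable only for trusted demo names.
def ensure_hub(con, hub):
    con.execute(f"CREATE TABLE IF NOT EXISTS {hub} "
                "(business_key TEXT PRIMARY KEY, load_date TEXT, record_source TEXT)")


def ensure_satellite(con, sat, hub, columns):
    """Create a satellite for one set of attributes. Existing tables are never altered,
    so new attributes must go into a NEW satellite name."""
    cols = ", ".join(f"{c} TEXT" for c in columns)
    con.execute(f"CREATE TABLE IF NOT EXISTS {sat} ("
                f"business_key TEXT REFERENCES {hub}(business_key), "
                f"load_date TEXT, hash_diff TEXT, {cols}, "
                "PRIMARY KEY (business_key, load_date))")


def load(con, hub, sat, key, attrs, load_date, source="demo"):
    """Insert the business key once; add a satellite row only if the attributes changed."""
    cols = sorted(attrs)
    ensure_hub(con, hub)
    ensure_satellite(con, sat, hub, cols)
    con.execute(f"INSERT OR IGNORE INTO {hub} VALUES (?, ?, ?)", (key, load_date, source))
    new_hash = hash_diff(attrs)
    row = con.execute(f"SELECT hash_diff FROM {sat} WHERE business_key = ? "
                      "ORDER BY load_date DESC LIMIT 1", (key,)).fetchone()
    if row is None or row[0] != new_hash:
        placeholders = ", ".join("?" for _ in range(3 + len(cols)))
        con.execute(f"INSERT INTO {sat} (business_key, load_date, hash_diff, {', '.join(cols)}) "
                    f"VALUES ({placeholders})",
                    (key, load_date, new_hash, *[str(attrs[c]) for c in cols]))


con = sqlite3.connect(":memory:")
load(con, "hub_customer", "sat_customer_core", "C1", {"name": "Acme", "city": "Denver"}, "2008-04-01")
load(con, "hub_customer", "sat_customer_core", "C1", {"name": "Acme", "city": "Denver"}, "2008-04-02")   # unchanged: skipped
load(con, "hub_customer", "sat_customer_core", "C1", {"name": "Acme", "city": "Boulder"}, "2008-04-03")  # changed: new row
# A new attribute shows up: add a satellite, leave sat_customer_core exactly as it was.
load(con, "hub_customer", "sat_customer_credit", "C1", {"credit_limit": 5000}, "2008-04-10")

as_of = con.execute("SELECT city FROM sat_customer_core WHERE business_key = ? AND load_date <= ? "
                    "ORDER BY load_date DESC LIMIT 1", ("C1", "2008-04-02")).fetchone()[0]
print(as_of)  # Denver
```

History stays queryable by load date, and the structural change (the credit attribute) is purely additive: no existing table is copied or reorganized.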

So how about third normal form (3NF)?
Sorry, it seems to be incapable of handling dynamic structure change. Why? For the same reason that it fails as a data warehousing architecture in the first place: parent-child relationships are embedded in the tables, and time is then layered over that structure. Changing the structure of a 3NF DW is bad enough; altering it on the fly during loading while maintaining existing history requires superhuman strength and massive amounts of disk (to copy the elements), and it sometimes changes the MEANING of the data when the structure changes.

"Danger Will Robinson!" (a line from the 1960s U.S. TV show Lost in Space) http://en.wikipedia.org/wiki/Danger,_Will_Robinson

OK, so what about star schemas?
Well, if you read through the definitions of star schemas AS ENTERPRISE DATA WAREHOUSES, you quickly find that they are not the right fit; hence the new Generation 2 architecture, DW2.0(tm), and other new modeling concepts like the Data Vault.

Have you ever tried to change the structure of a conformed dimension? Does it get harder as the system grows, and the more conformed it becomes? Does it slow down your development efforts?

Yes to all of these (at least from my personal experience). Does that make star schemas bad? NO! Star schemas are AWESOME, WONDERFUL, and the ONLY solution that works for OLAP and drill-down... Do they have a place in the DDW? YES! ABSOLUTELY! Well then, where?

They have a place as adaptable cubes. Something funny happens when star schemas are used as SINGULAR STARS to LOGICALLY define VIRTUAL marts: they work extremely well, and as long as they remain logical (not physically implemented), dynamic in-memory cubes can become a reality. That's right: IN-MEMORY CUBING. It's already happening in certain DB engines, but it's not yet dynamic.
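As a rough illustration of a LOGICAL star (nothing physically materialized), the sketch below defines a dimension and a fact as SQL views over hub/satellite-style base tables in SQLite, then rolls the fact view up into a small in-memory aggregate. All table and view names are hypothetical, the dimension deliberately exposes only each product's current category, and a real in-memory cubing engine would do far more than a Python dictionary.

```python
import sqlite3
from collections import defaultdict

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hub_product (product_key TEXT PRIMARY KEY);
CREATE TABLE sat_product (product_key TEXT, load_date TEXT, category TEXT);
CREATE TABLE sale (product_key TEXT, sale_date TEXT, amount REAL);

-- Virtual dimension: latest category per product; no rows are copied.
CREATE VIEW dim_product AS
SELECT h.product_key, s.category
FROM hub_product h
JOIN sat_product s ON s.product_key = h.product_key
WHERE s.load_date = (SELECT MAX(s2.load_date) FROM sat_product s2
                     WHERE s2.product_key = h.product_key);

-- Virtual fact joined to the virtual dimension.
CREATE VIEW fact_sales AS
SELECT d.category, f.sale_date, f.amount
FROM sale f JOIN dim_product d ON d.product_key = f.product_key;
""")
con.executemany("INSERT INTO hub_product VALUES (?)", [("P1",), ("P2",)])
con.executemany("INSERT INTO sat_product VALUES (?, ?, ?)", [
    ("P1", "2008-01-01", "Tools"), ("P2", "2008-01-01", "Toys"), ("P2", "2008-03-01", "Games")])
con.executemany("INSERT INTO sale VALUES (?, ?, ?)", [
    ("P1", "2008-04-01", 10.0), ("P2", "2008-04-01", 5.0), ("P2", "2008-04-02", 7.5)])


def build_cube(con):
    """Aggregate the virtual fact in memory: per (category, date) plus a '*' roll-up over dates."""
    cube = defaultdict(float)
    for category, sale_date, amount in con.execute("SELECT category, sale_date, amount FROM fact_sales"):
        cube[(category, sale_date)] += amount
        cube[(category, "*")] += amount
    return dict(cube)


cube = build_cube(con)
print(cube[("Games", "*")])  # 12.5
```

Because the star exists only as view definitions, reshaping the mart is a DROP VIEW / CREATE VIEW followed by rebuilding the in-memory aggregate; the underlying tables are never touched.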

However, as a DDW foundational structure, we need something else. The Data Vault Model seems to be (today) the only other choice available that is actually capable of executing on this dream. We'll talk more about this in my Data Vault blog on http://www.BetterDataModel.com

Cheers for now,
Dan Linstedt
DanL@DanLinstedt.com


Posted April 26, 2008 7:56 AM

1 Comment

I think you are correct that most people ignore the 'dynamic modeling' part of dynamic data warehousing, although one major exception to that has been Kalido (which has been a very vocal advocate of that approach for years).

To reinforce this dedication, Kalido recently added a visual modeling component to their offering that allows Kalido customers to automatically drive their warehouse from a business-focused conceptual layer diagram that is automatically translated to metadata to drive warehouse processing. This includes everything from staging table creation to the storage mechanism in the relational database, to the generation of physical reporting structures, to the population of BI semantic layers like Business Objects Universes and Cognos Framework Manager models.

The underlying mechanism for this is similar (though not identical) to your Data Vault approach (e.g. Hub, Link, and Satellite -> Kalido Adaptive Services Core, Transaction Datasets, and Mapping Tables). This contributed to Kalido being the first software package to be certified compliant with DW 2.0 by Bill Inmon.

This is not to say Kalido addresses everything on your list--yet. But the footprint is certainly expanding. This leads me to a question, though. There is a class of tools in the market that I will call “Usage and Query Monitoring”. Examples include Teleran and BEZ. When deployed, these tools gather a lot of information about who is using what data, allowing you to manually make decisions about things you mention (tuning queries, adding indexes, building summaries) based on actual data points. Do you envision a place for these types of applications in addressing dynamic data warehousing (e.g. automatically acting on information gathered), or are you thinking about a new class of tool here? Thanks.
