Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

June 2007 Archives

I've had several good conversations with different folks in the industry lately about this term and what it means. Lou Agosta was nice enough to write a piece on DMReview, as well as have a really neat phone call with me. We both agree that there are more layers to this onion than originally thought. What a surprise!! But like anything, there are also steps, types, and classes of DDW to eventually get us there. In this entry I dive into the topic of "classifications of terms" and levels of DDW, and attempt to put down a rough road-map to get there. Keep in mind the definition is still in flux, and will be for some time to come.

Dynamic Data Warehousing terminology must be separated into its constituent parts in order to be understood properly (my belief anyhow).

Just as Master Data Management is pulled apart into Master Data, Data Management, and Master Data Management, so is the definition of DDW: Dynamic Data, Data Warehousing, and Dynamic Data Warehousing.

I think each stage of these terms warrants explanation, so here goes:

1. Dynamic Data - something that has been discussed for many many years, and has been affectionately known as Real-Time data, operational data, tactical data, active data, and so on... In other words, it's the data that is dynamic, changing, and responding to the business needs. This particular set of words has zip zero zilch to do with data warehousing, structural integrity, views, queries, ETL, ELT, EAI, and so on - it has everything to do with the data.
2. Data Warehousing - well, we all have a pretty good idea as to what this is, and Bill Inmon just solidified the definitions within the architecture of DW2.0. We'll write more on this topic later. Again, how is Data Warehousing dynamic? Well, if you go back to "real-time data" or dynamic data, or active data, you'll see some of the ties. The data warehouse simply becomes a place where both tactical and strategic data can exist. Does this mean that the EDW and the ODS are one and the same? NO - that's NOT what I said. What I said is that the ODS (Interactive Sector) and a strategic data storage area (Integrated Sector) exist within the same "architectural foundation" called a data warehouse.
3. So where does that leave Dynamic Data Warehousing? Good question, I hope this is one we can answer with many more discussions and entries to come.

Here are my thoughts: When I talk about a "DATA WAREHOUSE" I usually refer to all the moving parts and pieces that make up this "electronic data store", in other words, I include the structure (data model), execution code (SQL queries, loader scripts, unload scripts), data migration code layers (ETL, stored procedures, functions), and of course the metadata (semantic meaning that defines all these things, and hopefully how they all interact).

So when I talk about "DYNAMIC DATA WAREHOUSE", what I'm really saying is:

The following layers must be "included" in the term DYNAMIC:
1. structure and indexing (DDL)
2. integration code
3. SQL Queries
4. BI Reports (some use the queries, some house queries OUTSIDE the RDBMS)
5. Metadata and Semantic Ontologies
5a. This includes: Dependency chains
5b. Workflows (technical workflows)
5c. Data Model Definitions
5d. Aggregate definitions
5e. Security and access concerns .... and so on.

What I mean by Dynamic is the nature of handling change, or as Lou put it: "changing at the speed of business". I would also go so far as to say, "it is AUTOMATED ADAPTATION that enables change at the speed of business."

Now, what does all this mean? To me, Dynamic Data Warehousing is a solution requiring people, machines, standards, processes, architecture, and design elements. In the future once standardization can be executed properly (even within a single DDW within a vendor producing their own "flavor") then it will become possible to AUTOMATE many (not all, but many) of the functions that we currently do by hand.

In other words, like everything else - a "DDW" will follow the commoditization path, the same way appliances are following it. In fact, I'd also be so bold as to say I think Appliances are the right foundation to START with, to move in the direction of the purist DDW. I also think that DDW is a GOAL, not necessarily a solution - some of the parts to what I've outlined may never come to pass, we'll just have to wait and see.

What I will say is this: I know that semantic integration of structures can be done, I know that matching structures to semantic meanings of well-established ontologies can also be done. I also know through a variety of inference engines, neural net algorithms, and visualization that the strength of these relationships can be SEEN, EDITED, and TAUGHT (to the neural net to process this information better the next time). I've seen these software components, they exist today.

I also know that there is still a long way to go, that the results of these data model integrations must be checked manually and corrected - post generation. Furthermore, the only piece that this addresses is the structural component of the DDW. It does not address the automated adaptation of code layers, SQL layers, ETL layers, security layers, metadata layers, nor reporting layers.... these are all things that must be worked on.

For now, all I'm suggesting is "broaden your minds..." Harry Potter movie, art of divination scene. Let's go for the utopian vision, and make our efforts that much better as we progress. After all - a philosopher once said: you set your own limits, and another said: you can only execute to the goals that you set for yourself.

So why not stretch a little bit? In the next entry on DDW, I'll try to unfold the layers - but first, if I missed any layers, please let me know by responding. Furthermore, MARKETING DEPARTMENTS TAKE NOTICE: PLEASE make sure your wording matches your marketing statements, in my last post I took a shot at IBM marketing, I was _NOT_ taking shots at the DB2 UDB product.

Just don't say things to me like: "XXXXXXX is a SOLUTION, not a product, not a this, not a that.... [then later say] YYYYYY is our product, and it is a XXXXXXXX." This is contradictory.

Thanks,
Dan Linstedt
Come get a masters of science in BI at: http://www.COBICC.org


Posted June 27, 2007 6:27 PM

Thank-you everyone for the great feedback so far. Let's keep going on this track until someone says that it simply isn't possible. Why? Because as many of you know, I like to jump out beyond the horizon to see what might be done "outside the box", and if there is a remote chance that it will take hold (because of what we see happening), then great! If not, let's ditch the ideas in favor of something better... I must invent, but my ideas are based on many other individuals' work in the industry.

I blogged about temperature of data a while back, and recently I've been exploring metadata. Now I ask the question: what about temperature of metadata as it relates to the impact that the change would have? What about notions like Architecture Mining? Metadata Mining? Correlation analysis on structures? Are these things too far out to bend the brain on?

I think not. The time is coming where the next stage to get to (beyond active/real-time) is Dynamic Data Warehousing. I've borrowed from my good friend Stephen Brobst, and his diagram of the 5 stages of Data Warehousing, and added a 6th (see below).

[Figure: StagesofDDW.jpg - the 5 stages of Data Warehousing, with Dynamic Data Warehousing added as a 6th stage]


After all, this is the peak of what we are trying to get to: dynamic adaptation to the business. As the business changes, so do all the systems (especially the integration systems). When we look at the end-results of implementing SEI/CMMI Level 5, or ISO 9001/9002/9003 etc... or PMP best practices, or ITIL documents, or ISACA audits, or CoBIT Controls, or good governance - they all come to a similar conclusion about processing: the nature of the processing routines, once built, optimized, and well established, is to run seamlessly day in and day out, repeatably in the back-office, until such time as new processes or new structures must be introduced.

Automation is often left out when people talk about these levels of projects, however automation is really a fundamental goal of IT to begin with. We should be constantly thinking of new ways to automate repeatable and consistent processes. With this in mind, why can't the structures of the data warehouse (metadata, metrics, data itself, unstructured data, indexes, queries, code, etc..) all be subjected to the same repeatable rules? Even with code there's a finite sequence for execution defined in the compiler architectures.

Now, let's take the great leap off the edge of the horizon for a moment.... What happens to systems when the business changes? What happens to architectures, or Data Models to be precise? What happens to the business processing (code) built on the source capture systems? They all change - but do they change consistently and repeatably, and can the effect of the change be measured prior to being implemented?

Most of the time (today) the answer is no, there must be some level of human intelligence involved to figure out where the impact is, how big of an impact it is, where the changes need to be made, and then they architect a patch, a band-aid, a new section of code, or a complete rewrite to serve the needs. Well, this _process_ happens over and over and over again. Cost, Risk, and Mitigation analysis. We can safely assume that for most of the changes happening within standard operations that they can be "graded" in accordance with location, risk, and impact. We can safely say that impact measurement can be automatically determined by examining METADATA, or ontologies of metadata which define the pre-existing relationships and associations of that metadata for our tools.
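As a rough illustration of measuring impact from metadata, here is a minimal Python sketch. It assumes the metadata is available as a simple map from each object to the objects that read from it directly; the object names and the `DEPENDENTS` map are made up for this example and do not come from any particular tool.

```python
from collections import deque

# Hypothetical metadata: each object maps to the objects that depend on it directly.
DEPENDENTS = {
    "src.customer": ["stg.customer"],
    "stg.customer": ["dw.hub_customer", "dw.sat_customer"],
    "dw.sat_customer": ["mart.dim_customer"],
    "mart.dim_customer": ["report.churn_dashboard"],
}


def downstream_impact(changed, dependents=DEPENDENTS):
    """Return every object reachable from `changed`, with its distance in hops."""
    seen = {}
    queue = deque([(changed, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:  # breadth-first, so the first visit is the shortest path
                seen[child] = depth + 1
                queue.append((child, depth + 1))
    return seen


if __name__ == "__main__":
    for name, hops in downstream_impact("stg.customer").items():
        print(f"{name}: {hops} hop(s) downstream")
```

The number of affected objects and how far downstream they sit are two inputs that a "grade" for location, risk, and impact could use. Real metadata repositories are far richer than a single dictionary, so treat this only as the shape of the idea.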

When I see a new element being added to a table in the source databases, I will typically make the assumption that it must be related to the source key of the table to which it is being added; otherwise they would not have put the attribute there in the first place. Therefore there is no reason why I shouldn't be able to construct a program that recognizes the new element and where it appears on the source, and can "grade" it according to its impact and our downstream models' ability to handle the new element dynamically.

In other words, much like applying temperature of data, I maintain that Architecture Mining (or metadata mining or Mining of Ontology trees) can lead us to mathematical results that can apply temperature ratings to impacts of schema changes. What I am saying is that new attributes that appear within XML, XSD, XSL, object, web-services, or table structures can be run through these algorithms, and assigned a green, yellow or red flag based on the impending confidence that the change is "easy with low impact", "somewhat challenging, but we are confident", or "too difficult to achieve without human intervention."

I further assert that these temperature ratings can be placed on a gradient scale, so that alarms won't be raised unnecessarily. Like any A.I. or Neural Network it would have to be trained, and occasionally re-trained or corrected; and there might be the occasional false positives to deal with, but that sure beats adding 150 attributes to a table by hand, just because the ERP coder was up all night implementing that into the source system.
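To make the green/yellow/red idea concrete, here is a small Python sketch. It is not a neural network and not anyone's product: the hand-written weights, the thresholds, and the `NewAttribute` fields are all placeholders standing in for whatever a trained model would learn.

```python
from dataclasses import dataclass

# Assumption: types the downstream models are taken to absorb without redesign.
SIMPLE_TYPES = {"int", "decimal", "varchar", "date"}


@dataclass
class NewAttribute:
    name: str
    data_type: str
    nullable: bool
    downstream_objects: int  # views, loads, reports that read the table


def detect_new_attributes(known_columns, current_columns):
    """Names present in the current source schema but not in the recorded one."""
    return [c for c in current_columns if c not in known_columns]


def impact_score(attr):
    """Toy score in [0, 1]; the weights are invented placeholders, not trained values."""
    score = 0.0
    if not attr.nullable:
        score += 0.4  # mandatory columns force changes to existing loads
    if attr.data_type not in SIMPLE_TYPES:
        score += 0.3  # unfamiliar types may need modeling work
    score += 0.1 * min(attr.downstream_objects, 3)
    return round(score, 2)


def flag(score, yellow_at=0.4, red_at=0.7):
    """Map the gradient score onto the three flags; thresholds are adjustable."""
    if score >= red_at:
        return "red"
    if score >= yellow_at:
        return "yellow"
    return "green"


if __name__ == "__main__":
    added = detect_new_attributes(["id", "name"], ["id", "name", "loyalty_tier"])
    for column in added:
        attr = NewAttribute(column, "varchar", nullable=True, downstream_objects=1)
        print(column, flag(impact_score(attr)))
```

Keeping the score on a continuous scale and applying the thresholds separately is what lets the alarm levels be tuned without retraining or rewriting the scoring itself.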

This is only one piece of Dynamic Data Warehousing, this is the structural change adaptation of Dynamic Data Warehousing. I think that we cannot achieve true DDW without this component working first. After the structures are changing seamlessly, we can begin to work on automating the adaptation of views, reports, mart loads, and processing routines.

Do you have any futuristic thoughts about DDW? I'd like to hear about it.

Thanks,
Dan Linstedt
You can get a Masters of Science in BI at Daniels College of Business, Denver U. http://www.COBICC.org


Posted June 7, 2007 9:46 PM

I've been writing (scantily) about DDW in the past. In this entry we will take a look at what the definition appears to be in the industry, and then I will offer my opinion on what I think the definition _should_ be for DDW. If vendors believe that they have a DDW, or a DDW solution, then I openheartedly invite them to contact the COBICC board members and give us all a demonstration, along with definitions of what they've produced.

Dynamic Data warehousing, what does it mean to you?
Throughout the industry we've been getting up to speed on Active or Near-Real Time Warehousing lately, and recently we've also begun experimenting with getting to the next level: DW2.0 (which includes an ADW, structured and unstructured information, metadata, and so on). So what are researchers and folks in the industry saying DDW is?

The first link is to a student’s research project regarding what their view of DDW is:
http://64.233.167.104/search?q=cache:KGeKi0tFzFQJ:www.dblab.ntua.gr/~dwq/p44.pdf+dynamic+data+warehouse&hl=en&ct=clnk&cd=1&gl=us

DWs are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered by them. Some of the new queries can be answered by the views already materialized in the DW. Other new queries, in order to be answered by the DW, necessitate the materialization of new views. In any case, in order for a query to be answerable by the DW, there must exist a complete rewriting of it over the old and new materialized views.

Ok, this is an interesting look - but certainly not the complete picture of what I see DDW to be. To give them credit, they are attacking a difficult problem: how to answer a new query that doesn't have the appropriate data set available - by building new materialized views. The concept is decent, but the words "materialized view" make the approach locked in to Oracle, as other databases do not have the notion of a materialized view. They go on to discuss how to create new views that are needed, and they do a good job of expressing the mathematics behind the desire. Again, this is only one piece of Dynamic Data Warehousing.
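The decision the paper describes (can an existing view answer the new query, or must a new view be materialized?) can be hinted at with a deliberately naive Python sketch. It only checks whether a single view exposes every column the query needs, which is a much weaker test than the complete rewriting discussed in the quote; the view names and columns are invented for illustration.

```python
def views_covering(query_columns, materialized_views):
    """Return the views whose column set contains every column the query needs."""
    needed = set(query_columns)
    return [
        name
        for name, columns in materialized_views.items()
        if needed <= set(columns)
    ]


# Hypothetical catalog of views already materialized in the warehouse.
VIEWS = {
    "mv_sales_by_day": ["sale_date", "store_id", "total_amount"],
    "mv_sales_by_customer": ["customer_id", "total_amount"],
}

if __name__ == "__main__":
    candidates = views_covering(["store_id", "total_amount"], VIEWS)
    if candidates:
        print("answerable from:", candidates)
    else:
        print("a new materialized view would be needed")
```

A real rewriting test also has to consider joins, predicates, and aggregation levels, which is where the mathematics in the paper comes in.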

Here's another project:
http://xml.coverpages.org/xyleme.html
While they discuss some notions of dynamic data warehousing, they do not disclose all the pieces they will manage. They seem more interested in the fact that they can store vast quantities of XML, and rely on the notions that XML query can change with the XML document structure changing, true - but this still doesn't answer the questions about dynamic restructuring, dynamic indexing (changing indexes when a new one is needed), dynamic query building, dynamic security, and so on. I'll provide this list a little later. However, they are closer to a holistic solution than the first reference.

Here's another interesting look, they start out sounding very promising, but when it comes to brass tacks they are merely discussing Dynamic View Generation - still a worthy cause, but not quite a DDW (as they originally claim).
http://davis.wpi.edu/dsrg/EVE/idm2002-eve.html

IBM has been at it a while, and in this definition - they are defining (you guessed it) an appliance, with bundled software, but in their press release I blogged on yesterday they said DDW is not a tool, a product, or a service... yet again they contradict themselves. Besides that - what they really have is an ADW, not a DDW.... read on... http://www.intelligententerprise.com/showArticle.jhtml;jsessionid=MHPVP5URAXGSAQSNDLRSKH0CJUNN2JVN?articleID=198000675

And another post by Doug Henschen agrees with me.

My friend Lou Agosta does a decent job of discussing some of the background pieces involved in Dynamic Data Warehousing. I think what's missing here is the definition of what Dynamic really means... Should Dynamic mean the data warehouse is dynamic with near-real time data? Should it mean it is dynamic with query changes? Dynamic with unstructured data? What does it mean?

Here's another vendor (Axiom Software Labs) that claims to have a DDW. They are probably closer to the mark, but again, all of these solutions say they have dynamic abilities, yet none of them talk about HOW these abilities work, nor do they disclose what true DDW needs to be. Oh yes, a new acronym is emerging (unfortunately): DyDa - what?

Ok, here are my thoughts on what is required in order to be "the next level" or to be a DDW. We require that all of the following be recognized as dynamic:
* Structural changes to structured data sets are recognized, and changed as available - automated back-room basis.
* Views are adapted as needed when structures change
* Active and Batch loaded data is occurring on the same system at the same time
* Procedural Load routines are adapted to the structure changes when they occur
* Data Mining occurs to build new models against the data in a dynamic fashion
* Architecture mining occurs to determine if the structural changes are attached in the right place.
* Unstructured Data is attached, and searched - all data which can be inserted into a structural matrix will be.
* BI Reports and dashboards are dynamically altered to include the new elements.
* Web services are versioned and re-released to include the new elements.
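One item on this list, adapting views when structures change, is easy to picture in code. The Python sketch below simply regenerates a view definition from the table's current column list. The function name and the sample identifiers are hypothetical, `CREATE OR REPLACE VIEW` is assumed to be supported by the target database (not every RDBMS accepts it), and identifiers are not quoted or validated.

```python
def build_view_ddl(view_name, table_name, columns):
    """Regenerate a pass-through view so it exposes the table's current columns."""
    if not columns:
        raise ValueError("a view needs at least one column")
    column_list = ",\n    ".join(columns)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n    {column_list}\n"
        f"FROM {table_name};"
    )


if __name__ == "__main__":
    # After a new attribute (loyalty_tier) has been accepted into the structure:
    print(build_view_ddl(
        "v_customer",
        "dw.sat_customer",
        ["customer_id", "name", "loyalty_tier"],
    ))
```

Views that contain business logic, joins, or aggregates cannot be regenerated this mechanically, which is part of why the other items on the list are still open problems.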

And so on. Dynamic is a very versatile word, and DDW (in my mind) encompasses a whole lot more than just one piece of the pie (Dynamic Data or Dynamic Views). While these are noble efforts and steps in the right direction, they are _not_ qualified to be called a DDW environment, because they are only pieces of a larger puzzle.

I welcome your comments as always, do you have a definition of DDW that you can share? What is it in your mind?

Thank-you,
Daniel Linstedt
http://www.COBICC.org/


Posted June 6, 2007 4:04 AM

Warning: this is more of a rant, which I usually don't do - my apologies... This came across my desk this morning: IBM announces capabilities to support Dynamic Data Warehousing... It's always interesting when big companies don't give credit where credit is due. Of course, I'm not the first one to discuss DDW, and I certainly won't be the last. On the other hand, IBM is making claims that they came up with the definition... Well, I'll be! I had no idea that anyone was even talking about it. (See my post from 2005, here).

Bill Inmon, Claudia Imhoff, a few other peers on the COBICC board (Colorado Business Intelligence Community Connection), and I have been discussing these notions for years. Now don't get me wrong, I applaud IBM for beginning the effort to meet the needs of DDW, but I don't think they should be claiming credit for being the first to create a solution. Ralph Kimball has also been discussing DDW (in a different light)... It's the natural evolution of the warehousing industry.

Here's a quote directly from the IBM marketing brochure:

Dynamic warehousing is not a product, tool or simple one-off solution. It is an approach that enables you to deliver more dynamic business insights by integrating, transforming, harvesting and analyzing insights from structured and unstructured information. The result is a framework for delivering right-time, contextual information for both strategic planning and operational purposes. Enabling dynamic warehousing requires a set of services that extends beyond traditional data warehousing and reporting to support the increasing number of business processes and applications requiring analytic capabilities, and to address the demands for more dynamic business insight.

Interesting, they say it's not a product, tool or one off solution but rather an approach. Then they go on to state that IBM DB2 UDB is the only database that supports this approach, and that you have to hire their consultants to build it.

When will vendors "get it"? And begin to credit the resources that create the terminology and define the industry? You know, my credit goes back to all my mentors - for creating an industry I earn a living in. I can only hope that one day I'm as smart as they are.

They are right about one thing: DDW is the future, but they fail to include neural networking, AI, discovery, semantic ontologies, and dynamic integration. In my mind, they are using the term loosely to define their next level of "EDW appliance". This is what led my friend Bill Inmon to define and trademark DW2.0: the lack of a common definition for the term Data Warehouse.

Here's a quote about their new "DDW:"

Traditional data warehousing solutions have provided insurance companies with a valuable tool for analyzing paid claims for possible fraud. Unfortunately, companies often have difficulty recovering funds that have already been dispersed. Dynamic warehousing provides a way to transform this process by aggregating relevant information from across the organization. For example, it can include details that are potentially relevant to claims flight, and embed scoring and analytics directly into the claims review process to identify potential fraud prior to approval and payment.

Ok, if a Data Warehouse is built properly to begin with, it would have already put together levels of aggregation horizontally across the organization. Furthermore, it would already have had data mining capabilities built in to identify fraud. There is NOTHING dynamic about this definition, where's the automated adaptability to services? Where's the automated data model recognition changes? Where's the automated adaptability to BI queries? Where's the automated discovery of new BI queries and reports?

Here we go again:
Customer service

Most companies have a wealth of information about their customers stored in various systems across the organization but have focused their warehousing efforts on more traditional reporting of customer problems. However, if call center agents can combine the information they have collected about a customer with historical information in the data warehouse—and leverage dynamic analysis capabilities—they can identify other potential issues the customer may be facing. Armed with this knowledge, agents can better recognize the likelihood of a customer leaving while they are talking to the client, improving the chances that they can take steps to maintain the customer’s loyalty. They can even more easily identify appropriate cross-sell opportunities to turn customer support efforts into revenue-generating opportunities.

This is Real-Time Data Warehousing, Active Data Warehousing with a live neural-network on top. This is what banks, credit card companies, and airlines do today... Where's the DYNAMIC part of this warehouse? If the warehouse was built properly to begin with, it would scale, it would run in real-time, and it would have analytics (i.e.: live mining) built in.

Appliances, Teradata, Oracle, HP, Microsoft (SQLServer), and so on are all capable of Real-Time and Active Data Warehousing. Each of the cases listed in this marketing brochure reflects ADW, not DDW. There is nothing said in here that indicates that the new IBM "appliance" with DB2 UDB is dynamic in any way.

ICING ON THE CAKE:

The hub of dynamic warehousing is still the data warehouse. Moving forward, however, data warehouses will need to support increasingly mixed application requirements, including mission-critical operational activities in addition to traditional back-office reporting. Warehouses will also need to provide expanded functionality and work seamlessly with the other services required to enable dynamic warehousing. A data warehouse solution should be able to consolidate data marts and silo solutions across the enterprise into a unified warehouse where needed, while still enabling distributed data marts to address business-specific requirements. And as the demand for resources increases, companies will require more balanced and optimized performance from warehouses (balanced storage, hardware and software performance) to keep costs in check and meet varying service level requirements.

When will these big companies get it? Dynamic Data Warehousing IS NOT YET HERE! IBM Starts by saying that DDW is not a tool, a product, etc... but then at the end defines how IBM DB2 Balanced Warehouse is a DDW... Where's the definition of what DDW really is? In other words, where do they state what part is Dynamic about their system?

The sad part of all this is: IBM has a really great approach, it's just misnamed. It is not DDW, but ADW, it is a great approach to a next generation integrated appliance (with all the software and hardware bundled to make it go). I like the approach, but PLEASE don't call it dynamic data warehousing.

In my next entry I'll re-define Dynamic Data Warehousing and its evolving state. IBM - you cannot take credit for creating this term (neither can I). What you are defining is Active Data Warehousing, or Real-Time Data Warehousing, not Dynamic Data Warehousing...


Posted June 5, 2007 4:41 AM