Blog: Dan E. Linstedt

Dan Linstedt

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

About this blog

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can’t wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

Recently in Dynamic Data Warehousing Category

I've blogged about this topic for many years now; my first mention of it was in my www.TDAN.com articles regarding the Data Vault Modeling architecture. I've been blogging on everything from autonomic data models to dynamic data warehousing, but in my research I've come to realize I've left out some very critical components. I've lately been experimenting with building a self-adapting structured data warehouse. There are many moving pieces and not all the experiments are finished, so I cannot write (yet) about any of the findings. But here, I'll expose some more of the underbelly, as it were, that is necessary to make DDW a reality (in my labs anyhow)....

I've tried and tried to find a new name for this thing, but alas, it just seems to elude me. Dynamic Data Warehousing seems to have a nice ring, and is quite the nice fit. The term however evokes all kinds of different meanings to different companies and different people. So much so, that I've had open discussions with IBM in the past about their use of the term! Oh-well, water under the bridge.

But that brings me to my next point. There are missing components in my definition of DDW. I didn't get it all, and I'm sure that this is just another step in the definition (and that the definition will not be completed for another year or two). If I look back at what's going on I see the following:

Convergence of:
* Operational Processing and Data Warehousing.
* Master Data and Metadata to use the Master Data Properly
* Tactical decisions backed by strategic result sets
* Business, Technical, Architectural, and Process Metadata
* Real-Time and Batch processing
* Standard reporting technologies and "Live animated scenarios" with walk-throughs and 3D imagery
* Human-machine interfaces
* MPP RDBMS systems and Column Based Database solutions

Why then shouldn't we see convergence of "data models" and "business processes"?
or "Data Models" and "Systems Architecture"?

The point is: WE ARE. (or at least I am). Not only is this happening in my labs, but it's being requested of me when I visit client sites. The customers want "1 solution", or better yet, they want a solution that "appears to learn" based on the demands put upon the system.

Why do I say "appears to learn?"
Because machine learning and the appearance of machines translating context are two totally different things. I cannot and will not claim to have made a machine think. However, I can and have made a machine's enterprise data warehouse responsive to external stimuli - at least when it comes to the data model, loading routines, and queries. Please do NOT mistake this as anything more than AI applied in a new manner - mining metadata (structure and queries and load-code and web-services) rather than just mining data sets themselves. (more on that later, much later --- I still have a LOT of research to do).

Ok - so what's missing from the Dynamic Data Warehouse definition?
* Use of metadata: business, technical, and process during the model learning/adaptation phase
* Use of an ontology (part of business and technical metadata as described above)
* Use of a training model, all good neural nets need to be trained over time, and then corrected.
* Use of the queries to examine and compare HOW the data sets are being used and accessed against the current data model
* Use of a minimal load-code parser, again to assist in training the neural net to recognize the correct structure.

Anyhow, you get the point. Dynamic Data Warehousing is about a back office system that responds to changes in the structured data world - as the queries change, the indexes change. As the incoming data set changes, the model needs to change. Some queries (if consistent enough) can actually express new relationships that need to be built.
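The "queries change, indexes change" loop can be sketched in a few lines. This is only an illustration under stated assumptions, not the lab implementation: the function name `plan_index_changes`, the `min_hits` threshold, and the shape of the query log (a list of (table, column) pairs already extracted from WHERE clauses) are all hypothetical.

```python
from collections import Counter


def plan_index_changes(filter_log, current_indexes, min_hits=3):
    """Compare the columns queries filter on with the indexes that exist.

    filter_log      -- (table, column) pairs seen in WHERE clauses
                       (hypothetical input; a real system would parse SQL)
    current_indexes -- set of (table, column) pairs indexed today
    min_hits        -- assumed threshold: how often a column must be
                       filtered on before an index is worth having
    """
    hits = Counter(filter_log)
    wanted = {key for key, count in hits.items() if count >= min_hits}
    to_create = sorted(wanted - current_indexes)   # used often, not indexed
    to_drop = sorted(current_indexes - wanted)     # indexed, no longer used
    return to_create, to_drop


log = ([("sales", "region")] * 4
       + [("sales", "sku")] * 1
       + [("customer", "zip")] * 3)
create, drop = plan_index_changes(log, {("sales", "sku")})
print("create:", create)
print("drop:", drop)
```

Run periodically over a sliding window of the query log, a comparison like this yields the create and drop lists; whether to act on them automatically or hand them to an operator is a separate decision.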

This is an adaptable system, this is a dynamic system, this will eventually become a true Dynamic Data Warehouse.

Thoughts?
Dan Linstedt
DanL@DanLinstedt.com


Posted September 21, 2008 9:52 PM

In my last entry in this category, I described automorphic data models and how the Data Vault modeling components are one of the architectures/data models that will support dynamic adaptation of structure. In this entry I will discuss a little bit about the research I'm currently involved in, and how I am working towards a prototype of making this technology work.

If you're not interested in the Data Vault model, or you don't care about "Dynamic Data Warehousing" then this entry is not for you.

The Data Vault model has reached the height of flexibility by applying the Link tables. It is an architecture that is linearly scalable and is based on the same mathematics that MPP is based on. Single Link tables represent associations, concepts linking two or more KEY ideas together at a point within the model. They also represent the GRAIN of those concepts.

Because the link tables are always Many To Many, they are abstracted away from the traditional relationships (1 to many, 1 to 1, and many to 1). The Links become flexible, and in fact, dynamic. By adding strength and confidence ratings to the link tables we can begin to gauge the STRENGTH of the relationship over time.

Dynamic mutability of data models is coming. In fact, I'd say it's already here. I'm working in my labs to make it happen, and believe me it's exciting. (only a geek would understand that one...) The ability to:

* Alter the model based on incoming where clauses in queries (we can LEARN from what people are ASKING of the data sets and how they are joining items together)
* Alter the model based on incoming transactions in real-time (by examining the METADATA) and relative associativity / proximity to other data elements within the transaction.
* Alter the model based on patterns DISCOVERED within the data set itself. Patterns of data which were yet previously "un-connected" or not associated.
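As a rough illustration of the first bullet (learning from how people join items together), the sketch below counts which tables appear together in incoming queries and proposes link candidates that the model does not yet have. Everything here is an assumption made for the example: the `propose_links` name, the `min_count` threshold, the `hub_*` table names, and the deliberately naive regular expression, which ignores subqueries, schema prefixes, and quoted identifiers.

```python
import re
from collections import Counter
from itertools import combinations

# Naive: grab the identifier that follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", re.IGNORECASE)


def propose_links(queries, existing_links, min_count=2):
    """Return table pairs that are joined often but have no link table yet.

    existing_links -- set of alphabetically sorted (table, table) tuples
    min_count      -- assumed threshold for "consistent enough" usage
    """
    pair_counts = Counter()
    for sql in queries:
        tables = sorted({name.lower() for name in TABLE_REF.findall(sql)})
        pair_counts.update(combinations(tables, 2))
    return [pair for pair, n in pair_counts.most_common()
            if n >= min_count and pair not in existing_links]


queries = [
    "SELECT c.name FROM hub_customer c JOIN hub_store s ON c.store_no = s.store_no",
    "select count(*) from hub_customer c join hub_store s on c.store_no = s.store_no",
    "SELECT p.sku FROM hub_product p JOIN hub_customer c ON p.buyer = c.cust_no",
]
existing = {("hub_customer", "hub_product")}
candidates = propose_links(queries, existing)
print(candidates)
```

A candidate produced this way is only a suggestion; it still needs to be scored against the data before a link table is actually built.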

The dynamic adaptability of the Data Vault modeling concepts shows up as a result of these discovery processes. I'm NOT saying that we can make machines "think" but I AM suggesting that we can "teach" the machines HOW the information is interconnected through auto-discovery processes over time. This mutability of the structure (without losing history) begins to create a "long term memory store" of notions and concepts that we've applied to the data over time.

Through recording a history of our ACTIONS (what data we load, and how we query) we can GUIDE the neural network into better decision making and management over the structures underneath. This includes the optimization of the model, to discovery of new relationships that we may not have considered in the past.

The mining tool is:
* Mining the data set AND
* Mining the ARCHITECTURE
* Mining the queries AND
* Mining the incoming transactions

to make this happen. We've known for a very long time that mining the data can reap benefits, but what we are starting to realize NOW is that mining these other components really drives home new benefits we've not considered before. In the Data Vault Book (the new business supermodel) I show a diagram of convergence (which has been bought off on by Bill Inmon). Convergence of systems is happening, Dynamic Data Warehousing is happening.

These neural networks work together to achieve a goal: creating and destroying link tables over time (dynamic mutability of the data model) while leaving the KEYS (Hubs) and the history of the keys (Satellites) intact. Keep in mind that the Satellites surrounding Hubs and Links provide CONTEXT for the keys.

I've already prototyped this experiment at a customer, where I personally spent time mining the data, the relationships, and the business questions they wanted to ask. I built 1 new link table as a result with a relationship they didn't have before. We used a data mining process to populate the table where strength and confidence were over 80%. The result? Their business increased their gross profit by 40%. They opened up a new market of prospects and sales that they didn't previously have visibility to.
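The post does not say how strength and confidence were calculated, so the following is only a sketch under explicit assumptions: confidence is borrowed from association-rule mining (how often the second key appears given the first), and strength is taken to be the Jaccard overlap of the two keys. The function names, the sample keys, and the reading of "over 80%" as a strict `> 0.8` test on both measures are all hypothetical.

```python
from collections import Counter


def score_links(observations):
    """Score each observed (left_key, right_key) pair.

    Returns {(a, b): (strength, confidence)} where, by assumption,
      confidence = n(a, b) / n(a)
      strength   = n(a, b) / (n(a) + n(b) - n(a, b))   # Jaccard overlap
    """
    observations = list(observations)
    left = Counter(a for a, _ in observations)
    right = Counter(b for _, b in observations)
    pairs = Counter(observations)
    scored = {}
    for (a, b), n_ab in pairs.items():
        confidence = n_ab / left[a]
        strength = n_ab / (left[a] + right[b] - n_ab)
        scored[(a, b)] = (strength, confidence)
    return scored


def link_rows(observations, threshold=0.8):
    """Keep only the pairs whose strength AND confidence exceed the threshold."""
    return sorted(pair for pair, (s, c) in score_links(observations).items()
                  if s > threshold and c > threshold)


seen = ([("cust-1", "store-A")] * 9
        + [("cust-1", "store-B")] * 1
        + [("cust-2", "store-B")] * 5)
rows = link_rows(seen)
print(rows)
```

Only the surviving pairs would be inserted into the new link table, with their two scores stored alongside so the relationship can be re-graded as more data arrives.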

Again, I'm building new neural nets, new algorithms using traditional off the shelf software and existing technology. It can be done, we can "teach" systems at a base level how to interact with us. They still won't think for themselves, but if they can discover relationships that might be important to us, then alert us to the interesting ones - then we've got a pretty powerful sub-system for back-offices.

More on the mathematics behind the Data Vault is on its way. I'll be publishing a white paper on the mathematics behind the Data Vault Methodology and Data Vault Modeling on B-Eye-Network.com very shortly.

Cheers,
Dan Linstedt


Posted August 27, 2008 5:54 AM

I've just completed Bill Inmon's brand new course on Unstructured Data using his new Unstructured Data ETL tool. It's been very eye opening. Every time I meet Bill I learn something new. There was a discussion at the end of the class that asked the question: WHAT do you do if you FIND "structural definition elements" in unstructured data that AREN'T represented in the EDW??

So what is interesting here is the notion that mining unstructured data gains knowledge or metadata by association that provides definitional elements _about_ the information held within. What we can do, and what we need here, is truly Dynamic Data Warehousing - which includes the ability to BRIDGE structures, along with creating new structures.

To get to the unstructured data mart, we should be going through the Dynamic Data Warehouse. Remember, I'm using the term Dynamic to include STRUCTURAL CHANGE, INDEX CHANGE, etc... So what that means is: the structure which is discovered through unstructured data surfing is built via neural networks, then it is subsequently graded on strength and confidence, and finally - optimized and adjusted by the neural net as new unstructured data is fed in.

This is what the Dynamic Data Warehouse looks like to me. The follow on is dynamic cubes, and once the unstructured data reaches the dynamic cubes we (I'm sure) will be surprised at what we find...

Stay tuned for more info later.

Cheers,
Dan L


Posted May 2, 2008 1:42 PM

It seems people have taken the term "Dynamic Data Warehousing" and abused it. They've made it out to be about "Dynamic Data" and completely ignored "Dynamic Modeling", or dynamic restructuring as the case may be. Automorphic means self-changing, self-adapting. In this entry we'll talk about different capabilities of Dynamic Data Warehousing and the changes to data models as they grow.

First, let's define what we mean by Dynamic Data Warehousing:

My definition of DDW has come to mean:
* Data models that can adapt to the incoming data based on A.I. rule sets, learnt data patterns, linguistics, metadata, and associativity.
* Load patterns that are driven by the changes, to move data from "point a" (the source) directly to point b (the target, or the dynamic model)
* Indexes that shift, are created and dropped on the fly based on load patterns and query patterns
* Learning systems attached to the firm-ware of the device to watch, and learn about the metadata - tying the metadata together is an important step.
* Adaptable cubes - dynamic cubes or in-memory aggregations based on "temperature data" (hot, cold, warm). In other words, in-memory ROLAP solutions that are built based on metadata and cube structures, attribution of data sets, and queries or the questions being asked.
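To make the "temperature data" idea in the last bullet concrete, here is a minimal sketch that grades aggregates as hot, warm, or cold from access statistics and picks the hot ones as in-memory cube candidates. The thresholds, the `temperature` and `cube_candidates` names, and the sample aggregate names are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta


def temperature(last_access, hits_last_30d, now,
                hot_days=1, warm_days=14, hot_hits=100):
    """Grade a data set by how recently and how often it is queried.

    All thresholds are assumed values for the sketch.
    """
    age = now - last_access
    if age <= timedelta(days=hot_days) and hits_last_30d >= hot_hits:
        return "hot"
    if age <= timedelta(days=warm_days):
        return "warm"
    return "cold"


def cube_candidates(stats, now):
    """Names of aggregates hot enough to be worth holding in memory."""
    return sorted(name for name, (last, hits) in stats.items()
                  if temperature(last, hits, now) == "hot")


now = datetime(2008, 4, 26, 8, 0)
stats = {
    "sales_by_region": (now - timedelta(hours=2), 450),
    "returns_by_store": (now - timedelta(days=5), 30),
    "fy2003_archive": (now - timedelta(days=200), 2),
}
grades = {name: temperature(last, hits, now) for name, (last, hits) in stats.items()}
print(grades)
print(cube_candidates(stats, now))
```

The same grades could drive the index decisions in the third bullet: hot data earns indexes and memory, cold data gives them up.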

Now this may all sound really interesting, and extremely future based - but I can assure you - it's not. I am currently working on solutions in my lab which entail the execution of portions of these elements. The hardest one (you might think) is the dynamic modeling, or dynamic restructuring of the database... well, let me tell you - nope! That’s not the hardest piece (when the Data Vault modeling architecture is used)...

Keep in mind that the DDW is one or two steps beyond the Operational Data Warehouse, which I've just begun writing about. Also remember that the term: DDW retains all the responsibilities of the "Data Warehouse", as in: time-variant, non-volatile, granular, etc... That said, the question for Dynamic Data Modeling then becomes: how do you keep history on massive volumes of information without losing value, and without "reorganizing" or altering existing structures?

The answer is to come later, but with the Data Vault modeling methodology it CAN be done...

So how about 3rd normal form?
Sorry, it seems to be incapable of handling dynamic structure change. Why? The same reason that it fails as a Data Warehousing architecture in the first place: Parent-Child relationships embedded in the tables, and then placing the structure over time. Changing the structure of a 3NF DW is bad enough, let alone trying to alter it on-the-fly during loading while maintaining existing history. This requires super-human strength, massive amounts of disk (to copy the elements), and sometimes changes the MEANING of the data when the structure changes.

"Danger Will Robinson!" (quote from a U.S. T.V. show ... lost in space, from the 1960's) http://en.wikipedia.org/wiki/Danger,_Will_Robinson

Ok, so what about Star Schemas?
Well, if you read through the definitions of Star Schemas AS ENTERPRISE DATA WAREHOUSES you quickly find that it's not the right fit, hence the new Generation 2, DW2.0(tm), and other new modeling concepts like the Data Vault.

Have you ever tried to change the structure of a conformed dimension? Does it indeed get harder as the system grows, and/or the more conformity it has? Does it slow down your development efforts?

Yes to all of these (at least from my personal experience). Does that make Star Schema bad? NO! Star Schemas are AWESOME, WONDERFUL, and the ONLY solution to work for OLAP, and Drill Down... Do they have a place in the DDW? YES! ABSOLUTELY! well then, where?

They have a place as adaptable cubes. Something funny happens when Star Schemas are used as SINGULAR STARS to LOGICALLY define VIRTUAL marts. They work extremely well, and as long as they are logical (not physically implemented), then dynamic memory cubes can become a reality. That's right! IN-MEMORY CUBING, it's happening already in certain DB engines, but it's not yet dynamic.

However, as a DDW foundational structure, we need something else. The Data Vault Model seems to be (today) the only other choice available that is actually capable of executing on this dream. We'll talk more about this in my Data Vault blog on http://www.BetterDataModel.com

Cheers for now,
Dan Linstedt
DanL@DanLinstedt.com


Posted April 26, 2008 7:56 AM

Bill Inmon and I sat down the other day to discuss a system that we are building. We didn't have a good "name" for it, but what it amounts to is: Operational Data Warehousing. If you can believe it, what we've done is taken the Operational specifics of systems capturing data - and placed it on top of the Data Warehouse as a single integrated historical and operational data store. We are currently using the Data Vault model for this componentry. Some folks have called this "Active Data Warehousing" in the past, but we feel that this is one step beyond, in that it actually IS the operational store at the same time as being the Data Warehouse. Convergence has arrived...

I've blogged about convergence in the past, it's no secret that the world is converging, and I.T. is no different. It is also no secret that EDW technology is converging with operational technology. Well, if we look behind us (20/20 is always best) we can see the divergence path of data warehousing and operational systems, and the re-convergence of these systems. Active Data Warehousing coupled with SOA, and real-time alerts coming back from the ADW have begun to turn the tables.

We have closed the gap on this one. Using the principles of the Data Vault modeling (http://www.DanLinstedt.com) we've constructed an Operational Data Warehouse (right now, Bill and I do not have a better term for this, Bill also thought that this is a new approach).

What does Operational Data Warehouse do?
One way to describe it is as an Operational Data Store with history.

Another way to describe it is: as a data warehouse with operational (raw) data.

Why do it this way?
Well for one, it provides traceability in all the data. Bringing in the RAW operational data over a web-service (as generated by the upstream machines), provides us with accountability, auditability and pure traceability. By utilizing the notions of the HUB entity within the Data Vault structures, we achieve horizontal integration across the data sets. This operational data warehouse is front-ended by web-services, and has direct integration in to the business processes. It is not fed with any sort of "batch" system, it is however pre-loaded with master data.
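A small sketch may help show how a Hub plus an insert-only Satellite gives an operational store with history and traceability at the same time. The table and column names (`hub_device`, `sat_device_reading`, `load_dts`, `record_src`), the sample readings, and the use of SQLite are assumptions for the example; the systems described in this post run on SQL Server and are fed by web services.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_device (
    device_key  INTEGER PRIMARY KEY,
    device_no   TEXT NOT NULL UNIQUE,      -- business key
    load_dts    TEXT NOT NULL,             -- when we first saw the key
    record_src  TEXT NOT NULL              -- where it came from
);
CREATE TABLE sat_device_reading (
    device_key  INTEGER NOT NULL REFERENCES hub_device(device_key),
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL,
    reading     REAL NOT NULL,             -- raw operational value
    PRIMARY KEY (device_key, load_dts)
);
""")


def record_reading(device_no, reading, load_dts, record_src):
    """Insert-only load: the hub row is created once, readings are appended."""
    conn.execute(
        "INSERT OR IGNORE INTO hub_device (device_no, load_dts, record_src) "
        "VALUES (?, ?, ?)", (device_no, load_dts, record_src))
    (device_key,) = conn.execute(
        "SELECT device_key FROM hub_device WHERE device_no = ?",
        (device_no,)).fetchone()
    conn.execute(
        "INSERT INTO sat_device_reading VALUES (?, ?, ?, ?)",
        (device_key, load_dts, record_src, reading))
    conn.commit()


record_reading("D-100", 20.5, "2008-02-25T08:00:00", "ws:site-denver")
record_reading("D-100", 21.0, "2008-02-25T08:05:00", "ws:site-denver")

history = conn.execute(
    "SELECT s.load_dts, s.reading FROM sat_device_reading s "
    "JOIN hub_device h USING (device_key) "
    "WHERE h.device_no = ? ORDER BY s.load_dts", ("D-100",)).fetchall()
current = history[-1]        # the operational view is simply the latest row
print(history, current)
```

Nothing is ever updated in place: the "current" operational value and the full audit trail come from the same rows, and every row carries its load time and record source.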

The structures of the Data Vault have been set up within the databases in such a way as to allow tremendous scalability and flexibility. We have physically partitioned the machines for security and scalability purposes. We can join 800M rows to 300M rows to 100M rows, and bring back 10k rows in under 10 seconds when we know what we're looking for. This setup is housed on SQL Server 2005 on Windows 2003 R2, 32 bit, with 2 dual core CPUs at 2.8 GHz and 2GB RAM.

So what's this got to do with Operational Data Warehousing?
Plenty. Operational data warehouses (a very loose term today) consist of the following requirements:
* Must be accountable
* Must be auditable
* Must be a system-of-record
* Must interact with other operational systems
* Must house operational data
* Must house historical data
* Must NOT separate operational data from historical data in the data store.
* Must be the SOURCE for a major business function
* Must be real-time (can have batch feeds, but must be real-time in data streams)
* Must be part of the business process flows.

So what are the technical requirements?
* Must be scalable
* Must be flexible
* Must NOT break history when the business changes/data models change
* Must NOT break existing data feeds when the model changes
* Must be FAST access, fast insert, etc...

And of course it MUST follow the DW2.0 requirements:
* Must have historical data
* Must not be "updated" directly (would break auditability)
* Must maintain cross-functional relationships
* Must be GRANULAR (to the absolute lowest level of grain available)
* Must provide strategic and tactical value
* Must include indexes/pointers/links to unstructured information

So what? How do I get there?
We've used the Data Vault data modeling to get there. It meets all these needs and has been blessed by Bill Inmon as the "optimal choice for DW2.0" data modeling. Because of the structures, along with the foundational approaches to loading the Data Vault, and what the data in the Data Vault represent - we've been able to construct the system described above. In fact, we have two of these up and running. One in our facilities in Denver, and one in Washington DC.

So you mean to say there "is no operational system"?
There is partially, there are many "machines" that collect the information operationally, and pass it back to our Operational Data Warehouse (Data Vault), but - they do not house the information after they've released it to us. The ODW Data Vault actually stores all the operational information from around the country, and soon - around the world.

Next time we'll dive in a little deeper as to what it means to construct one of these, and how they work.

You might already have one of these, if you do - I'd love to hear about it. As always, thoughts, comments, corrections, are welcome.

Cheers,
Dan Linstedt


Posted February 25, 2008 8:26 AM

I've recently begun research in to this area, and am calling this "Automorphic data models" rather than dynamic data warehousing, because I think the concept lends itself better to this kind of term. Dynamic Data Warehousing seems to be an overly-used, slightly abused term in the industry, and raises quite a few questions as to how, and what it is. Vendors are also using this term to mean different things. We'll let the business and the vendors work out their definition of this term over the next few years. I'm going to write exclusively (for a while - in this section) on Automorphic Data Modeling. These entries are aimed at the researchers and the scientific people in the audience.

First, I must apologize to all those who _really_ know this stuff. I am an architect, and an Information modeler at heart. I believe these connections exist to the Data Model architecture I wrote up called the Data Vault Model, because it is based in spatial-temporal mathematics, and because it is based on the "poor man's definition of how the brain MIGHT store/use/retrieve information." Based on these hypotheses, I can see where the mathematics of these types apply to the model. I'd love to hear from you as to why these theories will or won't work; it will be interesting to see how this progresses.

If we start with Webster's definition of Automorphic we end up with the following:

Patterned after one's self.

The conception which any one frames of another's mind is more or less after the pattern of his own mind, -- is automorphic. --H. Spenser.
http://dictionary.reference.com/browse/automorphic

However, I prefer the mathematical definition of Automorphism:

In mathematics, an automorphism is an isomorphism from a mathematical object to itself. It is, in some sense, symmetry of the object, and a way of mapping the object to itself while preserving all of its structure. The set of all automorphisms of an object forms a group, called the automorphism group. It is, loosely speaking, the symmetry group of the object. http://encyclopedia.thefreedictionary.com/automorphism

Automorphic Groups: (Which is what I'd suggest the Data Vault model is built from)

In mathematics, the general notion of automorphic form is the extension to analytic functions, perhaps of several complex variables, of the theory of modular forms. It is in terms of a Lie group G, to generalize the groups SL2(R) or PSL2(R) of modular forms, and a discrete group Γ, to generalize the modular group, or one of its congruence subgroups. The formulation requires the general notion of factor of automorphy j for Γ, which is a type of 1-cocycle in the language of group cohomology. The values of j may be complex numbers, or in fact complex square matrices, corresponding to the possibility of vector-valued automorphic forms. The cocycle condition imposed on the factor of automorphy is something that can be routinely checked, when j is derived from a Jacobian matrix, by means of the chain rule. http://encyclopedia.thefreedictionary.com/Automorphic+form


Essentially what we are doing within the Data Vault data model is a form of Automorphism. The Data Vault Model is actually based on many different components of temporal mathematics and spatial mathematics. (I've listed a few of the research papers I used in the 1990s to help me construct the structural integrity of the Data Vault):

1. “Unifying Temporal Data Models via a Conceptual Model”, http://www.cs.arizona.edu/~rts/pubs/ISDec94.pdf
2. “Notions of Upward Compatibility of Temporal Query Language”, http://www.cs.arizona.edu/~rts/pubs/Wirtschafts.pdf
3. “Temporal Data Management”, http://oldwww.cs.aau.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-17.pdf
4. “Spatio-Temporal Data Types: An Approach to Modeling and Querying”, http://web.engr.oregonstate.edu/~erwig/papers/MovingObjects_GEOINF99.pdf
5. “Formal Semantics for Time in Databases”, http://portal.acm.org/citation.cfm?id=319986&coll=portal&dl=ACM&CFID=6511873&CFTOKEN=58729889

The Data Vault model is capable of adapting, changing on the fly, exhibiting the mathematical properties of automorphism, in that through architecture mining combined with data mining efforts we can "learn" what architecture flaws exist, where stronger relationships exist, and where the architecture can change itself or re-connect to itself to form a stronger data model.

How does this work?
The Data Vault LINK is made up of vectors. It houses a directional connection to each HUB that it is associated with. The vector of that connection can be assigned a strength and confidence coefficient to determine its usefulness within the data set contained within a link. Mining the data over time can produce a powerful combination of patterns of change, along with the discovery of patterns of association (possibly never known before), or as a result of a pre-known state.

The data mining tool can then be taught either "what to look for", or it can be set-off in discovery mode to associate information based on a Data Vault model already constructed (use the existing Data Vault model as a starting point for the learning process), and then it can determine if any "undiscovered" relationships exist. Furthermore the process of mining the data can then be used to assign strength and confidence coefficients to EACH of the vectors in each link, thus preparing for the architectural mining phase.

So how is the Data Vault automorphic?
The Data Vault is connected (within itself) to itself via the Links and the vectors within the links. Each vector can be considered a component within the mathematical matrix of the automorphic functions. Then, the mathematics of "groups" and vector analysis can be applied to dynamically alter the matrix for a potentially different outcome.

Thus, new links can be constructed on the fly, tested, and then removed (if no real value to the human on the other end of the computer is perceived). They can likewise be constructed, and then old linkages can be removed to produce an auto - morphing data structure, something akin to self-correcting. I will NOT go so far as to say it's actually learning, because it (the computer) will still not understand the CONTEXT to which it's applying the changes.
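The construct, test, and remove cycle described above can be sketched as a tiny in-memory model. This is an illustration under assumptions, not the actual mining engine: the class and method names, the 0.5 thresholds, and the choice to retire a link by flagging it inactive (so its history is kept) rather than deleting it are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    hubs: tuple            # the hubs this link connects
    strength: float        # coefficients assigned by the mining pass
    confidence: float
    active: bool = True


@dataclass
class Model:
    hubs: set = field(default_factory=set)      # keys: never altered here
    links: dict = field(default_factory=dict)   # the mutable part of the model

    def propose(self, hubs, strength, confidence):
        """Construct a trial link between hubs that already exist."""
        key = tuple(sorted(hubs))
        if not set(key) <= self.hubs:
            raise ValueError("links may only connect existing hubs")
        self.links[key] = Link(key, strength, confidence)

    def prune(self, min_strength=0.5, min_confidence=0.5):
        """Retire links whose coefficients fall below the assumed thresholds."""
        retired = []
        for key, link in self.links.items():
            if link.active and (link.strength < min_strength
                                or link.confidence < min_confidence):
                link.active = False
                retired.append(key)
        return retired


model = Model(hubs={"customer", "product", "store"})
model.propose(("customer", "product"), strength=0.90, confidence=0.85)
model.propose(("store", "product"), strength=0.20, confidence=0.40)
retired = model.prune()
active = sorted(key for key, link in model.links.items() if link.active)
print(retired, active, sorted(model.hubs))
```

Note that only the links change; the hubs are untouched throughout, and in practice an operator would review the retired list, since the machine does not understand the context of what it removed.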

This type of system STILL requires guidance, training, and tweaking from the operators in order to achieve the desired outcomes and modifications to the model that make sense to the business, even if the business itself is commercial or government oriented.

However, this type of system can be applied (easily) directly to the Data Vault modeling constructs in order to achieve a self-changing data store, something that appears to "point-out" different facts, or discover different unknown relationships without understanding what it has. The understanding part is still up to the human.

Ok, so how does this benefit business?
Well, if we can spot relationship changes automatically (on a simplistic level), or mathematically figure out POSSIBLE infancies in our business, then we might be able to adapt our business based on the information being collected (or in some cases, adapt the source systems to collect information WE MISSED that might be vital to our operations). The data sets and the architecture of the data sets can tell us as much about our business as the processes and the business models we use.

You can find out more about the Data Vault model (for free) at: http://www.DanLinstedt.com

Hope this is interesting,
Daniel Linstedt


Posted December 23, 2007 6:32 AM

I sat down with my good friend Jeff Jonas yesterday and discussed the nature and notion of contextual processing. Jeff is a phenomenal individual, and much smarter than I ever hope to be, but all that aside, we had a wonderful conversation about the nature of processing streaming data (one piece at a time, or possibly multiple pieces in parallel, but separated) and how to focus the notions of context.

How is this related to B.I.?
It has everything to do with Business Intelligence, and how we "experience" and use our data sets and the patterns within them to make sense of our business, especially in an Operational B.I. world.

Processing the context on a streaming basis (as Jeff says) requires the ability to "change" all that we know (perception) at run-time based on new facts arriving on the stream. His statements went a little like this:

1) Imagine we think our friend XYZ is a good person. We just met this person 3 days ago, so we don't know much about them, but they've been nice to us - so our current perception of this individual is: K, U, I, O, T - and so on. We've hung out with them, so we have a whole host of experiences to draw from (mostly fun).
2) Now, 3 days later we find out from another very good friend, someone we've trusted for over 25 years, that this person has done something horrible in the past...

At that instant, considering our relationship to our very good friend, all that we know about person XYZ (perceptively) changes; usually very quickly.

Now, this isn't so bad if we are dealing with one piece of information, and a very small series of memories that we are focused on, but imagine now: trying to do this at 10,000 transactions per second in a non-sequential order of arrival of facts, and then trying to affect data sitting within 100 billion rows in our database...

This brings me to my discussion. From here Jeff and I began discussing HOW this processing needed to take place, and it reminded me of some of the conversations I'm having here at Teradata Partners conference this week.

The questions on the table are:
1) How should the system determine the assigned context for a given fact? Well, we have to let go of the word "context" and from a systems perspective we have to work with the notion that the data has a strong correlation to a particular STACK or SET of facts/history or historical knowledge.
2) Once a perspective has been established for that incoming fact, what IMPACT does it or should it have against all the target data, or patterns that are already known? For instance, suppose an area code changes from 720 to 750 (Jeff's example) - what do you need to do to change ALL of the existing phone numbers? Inserting brand new rows isn't always the answer, it would cause too much data change, updating existing information also won't work - it too would take too long. REMEMBER: 10,000 transactions per second, means we have to process this information and execute on the history in millisecond response times.
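The post leaves the area-code question open, so the sketch below is just one possible reading, offered as an assumption rather than as the approach Jeff or I settled on: store the change once as a small fact and apply it when the history is read, so that no existing phone row is rewritten or duplicated. The row layout, the `phone_as_of` helper, the phone numbers, and the effective date are invented for the example.

```python
from datetime import date

# Stored history: never rewritten.
phone_rows = [
    {"customer": "C1", "phone": "720-555-0101", "load_date": date(2006, 3, 1)},
    {"customer": "C2", "phone": "303-555-0102", "load_date": date(2006, 5, 9)},
    {"customer": "C3", "phone": "720-555-0103", "load_date": date(2007, 1, 15)},
]

# The single incoming fact, recorded once (hypothetical effective date).
area_code_changes = [
    {"old": "720", "new": "750", "effective": date(2007, 10, 1)},
]


def phone_as_of(row, as_of, changes):
    """Return the phone number as it should be seen on the given date.

    Changes are assumed to be ordered by effective date.
    """
    area, rest = row["phone"].split("-", 1)
    for change in changes:
        if change["old"] == area and change["effective"] <= as_of:
            area = change["new"]
    return f"{area}-{rest}"


today = date(2007, 10, 8)
view = [phone_as_of(row, today, area_code_changes) for row in phone_rows]
print(view)
```

Recording the change costs one insert regardless of how many phone numbers are affected, which fits a millisecond budget; the price is a small amount of extra work on every read.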

Jeff and I began to discuss the notion of a LENS, through which focus on a particular pattern could be achieved. What's important here is the FOCUS - but again, remember the focus is for _this current piece of information_ and is not necessarily related to other currently arriving information or facts.

Well what the heck does this have to do with B.I.?
You should already be able to see it... In a VLDW where we have huge stores of time based information it is near impossible (without focus) to find what you're looking for, so the first problem is (again) establishing focus - where oh where does my data FIT? So if you're processing in REAL-TIME folks, listen up... Once we establish which data sets are affected, we need to understand IN A FRACTION OF A SECOND how to change the "known outcome" on the existing history - oh yes, and by the way, this all has to happen in PARALLEL with all the other arriving facts, or it simply won't be executed in a timely fashion.

Now what else am I saying about ALL THIS DATA we've stored?
HERE IT IS:

* Large volumes of data must be processed and learnt from.
* The combined "learned" knowledge (we'll call it a derivation on average) of a STACK of related information within a topic area IS MORE IMPORTANT than the parts, or all the history and individual facts, but without all the details, we can't create a combined image.
* This combined knowledge element must be used IN CONTEXT or AS A CONTEXT LENS to quickly establish the relevance of the incoming information, and how it will affect the "next" view or look at the information.

In other words:
* VLDB / VLDW data by itself is important when you're digging for detailed specifics that happened at a specific point in time, but the real value is having a "mined" collective perspective on all that detail that allows us to establish where and how our current "transaction" will affect the outcome.

A 24x7x365 neural network / data mining engine MUST be up and running consistently. It must first be trained, and then constantly adjusted for "drift" off topic, but the neural net should be receiving the transaction inflow for "context" application in order to establish our focus, or put a "lens" of information to our historical data set. This isn't your father's neural net, and not your mother’s data mining engine - no... this is a different way of "scoring" parts of interesting history that are within the interested perception bounds (Jeff's term) so that processing of "extraneous noise" is filtered away as one of the first steps.

This data "mining" engine or neural net is highly focused, real-time processing based on transactions, and it houses "the many different lenses" of focus (or combined derivations) of different but interesting views of history, so that based on the incoming transaction - it can change the "lens" to match and see where the impact is.
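To make the idea of a "lens" a little more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: a running mean stands in for the trained neural net or mining engine, the `Lens` class, the `bound` value, and the `phone_usage` topic are all made up for the example, and nothing here is tuned for 10,000 transactions per second. What it does show is the shape of the processing: pick the lens that matches the incoming transaction, filter out what falls outside the perception bounds, and adjust the lens without rescanning the detailed history.

```python
from dataclasses import dataclass


@dataclass
class Lens:
    """A combined derivation over one topic's history (here: a running mean)."""
    topic: str
    count: int = 0
    mean: float = 0.0
    bound: float = 50.0  # hypothetical "perception bound" around the mean

    def in_focus(self, value: float) -> bool:
        # Before any history exists, everything is considered relevant.
        return self.count == 0 or abs(value - self.mean) <= self.bound

    def absorb(self, value: float) -> None:
        # Incremental mean update: no rescan of the detailed history is needed.
        self.count += 1
        self.mean += (value - self.mean) / self.count


def process(lenses: dict[str, Lens], topic: str, value: float) -> bool:
    """Pick the lens for the incoming transaction; return True if it was in focus."""
    lens = lenses.setdefault(topic, Lens(topic))
    if not lens.in_focus(value):
        return False  # treated as extraneous noise for this lens
    lens.absorb(value)
    return True


lenses: dict[str, Lens] = {}
results = [process(lenses, "phone_usage", v) for v in (100.0, 110.0, 500.0, 90.0)]
print(results, lenses["phone_usage"].mean)
```

In this toy run the value 500.0 falls outside the bound and is filtered as noise, while the other three adjust the lens. A real engine would need the periodic re-training against the full history described above, which this sketch leaves out.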

From a B.I. perspective, I'm also saying that the sum of the whole may be more interesting and more valuable than the sum of the parts, but to get the sum of the whole, we have to have all the parts when we start. So the INTELLIGENT part of Business Intelligence is all about
1) Knowing which patterns are most interesting / most costly to the business - establishing the RIGHT LENS at the right time, and having that lens available ahead of the arrival of the transactions
2) understanding that changing the color of the lens is easy when the transaction arrives, but that over time, the "lens" needs to be replaced (due to virtual scratching / shifting of the answer set), and needs to be re-aligned with all the large set of facts included in the history.
3) real-time transaction processing IS 100% necessary in a VLDW / data warehousing environment.
4) ALL the facts that we collect are important, depending on the "viewing perspective" of the business user.

New kinds of systems like this are in development labs, and I can help you with your efforts (should you so desire) to focus the lens. But it's advances in technology beyond what we have today that make this so interesting.

Food for thought anyhow, I'd love to hear what you have to say.

Cheers,
Dan L
DanL@DanLinstedt.com


Posted October 9, 2007 7:33 AM

I've been out for a while; getting new companies started can be hard to do... But I think we're getting there. Anyhow, as everyone knows - I've got an interest in the real nature of things, how they work, how they change and adapt themselves, and how automated computing facilities can bring these systems closer and closer to the business users. Ultimately we'd like the business users to manage, and suggest change to the back-ends through uncomplicated interfaces. We have a lot of terms popping up on the internet (even just over the past two years) which I'd like to explore, and then we'll dive in to where I think the industry should be going. In particular, one term is driving me to sleepless nights: "Dynamic"

It seems to start and end with the term Dynamic. Throw the phrase on to the front of existing terms (like decision support, BI, DW, web services, etc...) and we have a whole new ball game. Just run a search in an engine like http://www.SurfWax.com - which uses a neural network to really find what you might be looking for... and see the hundreds of results that now pop up as a result.

I've been following some really interesting writing in ACM Communications journal, and IEEE on visualization, business tactics, and in their latest print volume, Dynamic Decision Support (caught my eye).

I'll try to summarize for you what they were getting at:
Dynamic Decision support (according to them) includes the notions of tying business process workflow to a visual design system on the front end, which allows the users full drag-and-drop access to their data, and to their metadata, and to the lineage - all connected within ontologies and sub-ontologies defined for each "workflow component." The business users would then design different workflows, and therefore re-organize the data sets underneath to produce their own reports, and their own "processes." In another article in this magazine, they continue on to discuss how the changes to these processes can be measured and managed in terms of impacts.

As you might imagine, there would be security around this entire process to keep them from blowing the larger business flows out of the water. However, they all live in this "world" and play with different levels of "process flow metadata", to help streamline and optimize the business itself. In other words, in their eyes, DSS is no longer just a reporting interface, nor is it actually developing dashboards, but an interactive process enriched with metadata, process analysis, and business workflows that manage dependencies, definitions, ontologies, and so on. The end result is getting the data out to a platform like "excel" or your favorite reporting tool - in an automated fashion.

Of course they also state that this direction is several years off; there's a lot of engineering to be done to ensure the quality, consistency, and usability of such interfaces. There is also a level of business education that must take place, alongside fundamental shifts in the way we do business. Another point they make is that business must stay nimble, and in order to compete, businesses (even large ones) must be flexible, and employees who do their daily jobs are the ones armed with the knowledge on how best to change the business for the future.

What does this have to do with DYNAMIC?
Everything. Too often this term is abused on the web - it's like the new buzz word used in front of all else. For instance: I went to lunch and ordered a dynamic pizza, or a dynamic menu that changes based on my dress style, or dynamic presentation, or dynamic speaker, or dynamic blog entry... You get the point, it's been tossed in front of Dynamic Data Warehouse, Dynamic Decision Support, Dynamic Reporting, Dynamic ScoreCards, Dynamic ETL, Dynamic Data Integration, Dynamic Web Services, Dynamic SOA.

But what does it _really_ mean?
In my humble opinion, the word "dynamic" should not be used lightly. Yes, there are standard definitions dating back to the Greeks, but I am of the opinion that when we use the word Dynamic in front of today's technologies we really are saying that structure (parts of metadata) changes with the data and the processes. Well, at least that's what I am suggesting. I contend that the nature of "dynamic systems" would expand to include "adaptive systems". To me, a dynamic system is one that enhances the user experience, and adapts to the user experience in a holistic fashion.

I think this is a paradigm shift, and there are different levels of getting to a Dynamic Data Warehouse, or a Dynamic Operational System, or even a Dynamic Enterprise Architecture. I think the levels are already built by standard ontological definitions of terms.

For instance: Dynamic Data Sets, Dynamic Information (a bit higher level), Dynamic Hardware, and so on. There is a difference in my mind between "active" and "dynamic." Again, I would take Dynamic Data Warehouse to encompass or include all the following components:
* Dynamic Data Sets
* Dynamic Hardware
* Dynamic Metadata
* Dynamic Structure
* Dynamic Technical Metadata
* Dynamic Business Metadata
* Dynamic Process Flows
* Dynamic Architecture
* Dynamic Reporting
* Dynamic Data Mining

These are the general components included at an architectural level in a non-dynamic or traditional Data Warehouse. They should (in my opinion) be defined as part of a Dynamic Data Warehouse. The same can be said for the term: "Real-Time" which I'd prefer to call "Right-Time", and the term "Active".

How do we define these components?
By defining each of the terms that are laid out with further definitional content I think we can walk further down the path. Dynamic to me, means capable of adapting to change in an automated fashion. Some of these terms are self-explanatory, but here are a few that I think have been missed by the industry:

Dynamic Data Warehouse - if we accept the fact that this term is a "complex system" which includes architecture, design, data, execution, software, and hardware, then we should also expect to see each of these components within the system to be "dynamic" and autonomous in their nature to change with the surrounding environment. A "Dynamic Data Warehouse" is _not_ (in my opinion) simply a Hardware Platform with a Database that accepts right-time data that might have a DSS thrown in.

There's a difference between providing Dynamic Access to right-time data, and Dynamic Data Warehousing in my opinion.

In my definition, the structure (metadata) must adapt, the business rules must be flexible and truly at the fingertips of the business users, those same rules must be part of the governance that helps the automated changes take place in the back-end, and so on. But if we look at a very simple example, let's see what a "Dynamic Data Model" might be:

Suppose I have an SOA in my organization, and at its heart is a common data model, surrounded by a layer of common data services. These services "listen" to the outside world for transactions. When the transaction is ready, some of these services are smart enough to check the metadata (the version of the sending service, and the definition of the transaction itself). In doing so, the service discovers a new "XML" element; a new piece of information has been added. Now, in order to be a dynamic service, it must check the common model it has - and decide "what" to do with the new data, where does it put it? How does it adapt the model? Can it adapt the model? What’s the impact of the change if it were to make one? Does the service simply "stop" in mid-stream and fire an alert?

Well, the easiest answer is it takes the XML as a blob, and sticks it into an "XML aware" database object, theoretically it would send a workflow notification to the business users that this new element arrived. It would then be up to the business users to "adapt" or change their processes, and queries to make use of the element.
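Here is a rough sketch of that "easiest answer" in Python, using only the standard library. Everything named here is hypothetical: `KNOWN_ELEMENTS` stands in for the common data model, a plain list stands in for the "XML aware" database object, and another list stands in for the workflow notification queue. It is meant to show the flow of the crude approach, not how any real service bus does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical common model: the element names the service already understands.
KNOWN_ELEMENTS = {"customer_id", "name", "phone"}


def receive(xml_text: str, blob_store: list[str], notifications: list[str]) -> dict[str, str]:
    """Load known elements; park the raw XML and notify if anything new appears."""
    root = ET.fromstring(xml_text)
    known = {child.tag: (child.text or "") for child in root if child.tag in KNOWN_ELEMENTS}
    unknown = sorted({child.tag for child in root if child.tag not in KNOWN_ELEMENTS})
    if unknown:
        blob_store.append(xml_text)  # stand-in for an "XML aware" database column
        notifications.append("new element(s) arrived: " + ", ".join(unknown))
    return known


blobs: list[str] = []
notes: list[str] = []
row = receive(
    "<customer><customer_id>42</customer_id><name>Ann</name>"
    "<loyalty_tier>gold</loyalty_tier></customer>",
    blobs,
    notes,
)
print(row)
print(notes)
```

The service keeps running, the known elements load as usual, and the unknown `loyalty_tier` element is preserved in the blob and flagged for the business users. Note that nothing in the model itself has adapted.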

This is a very crude interpretation of a Dynamic Services component. What's missing is the automated adaptation of the element through MODEL DISCOVERY, META DATA MINING, and ARCHITECTURAL MINING with scoring, and "approximation" of context for where the new element fits, and where it impacts the process flows downstream. Not to mention the automated CHANGE to the process flows (when appropriate).

This begins to touch some of the components in a Dynamic System - but doesn't get to a Dynamic Warehouse. These are daunting tasks - and yes, we need to take baby steps along the way (and if someone can make a giant leap, all the better - if the market is ready). I would urge you to think of "Dynamic" in a new light, let's be more specific about WHAT it is we are creating that is dynamic rather than just throwing the label on an entire system.

I'm curious to hear your thoughts, as always - I like to take this as far beyond the horizon as possible. This is just one way to look at "Dynamic Systems" and "Dynamic Data Warehousing", if you have other ideas, please let me know.

Have a wonderful day,
Daniel Linstedt
http://www.COBICC.org (Colorado Business Intelligence Community Connection)


Posted August 3, 2007 6:30 AM

I've had several good conversations with different folks in the industry lately about this term and what it means. Lou Agosta was nice enough to write a piece on DMReview, as well as have a really neat phone call with me. We both agree that there are more layers to this onion than originally thought. What a surprise!! But like anything, there are also steps, types, and classes of DDW to eventually get us there. In this entry I dive into the topic of "classifications of terms", and levels of DDW, and attempt to put down a rough road-map to get there. Keep in mind the definition is still in flux, and will be for some time to come.

Dynamic Data Warehousing terminology must be separated into its constituent parts in order to be understood properly (my belief anyhow).

Just like Master Data Management is pulled into: Master Data, and Data Management, and Master Data Management, such is the definition of DDW: Dynamic Data, and Data Warehousing, and Dynamic Data Warehousing

I think each stage of these terms warrants explanation, so here goes:

1. Dynamic Data - something that has been discussed for many many years, and has been affectionately known as Real-Time data, operational data, tactical data, active data, and so on... In other words, it's the data that is dynamic, changing, and responding to the business needs. This particular set of words has zip zero zilch to do with data warehousing, structural integrity, views, queries, ETL, ELT, EAI, and so on - it has everything to do with the data.
2. Data Warehousing - well we all have a pretty good idea as to what this is, and Bill Inmon just solidified the definitions within the architecture of DW2.0. We'll write more on this topic later. Again, how is Data Warehousing dynamic? Well, if you go back to "real-time data" or dynamic data, or active data, you'll see some of the ties. The data warehouse simply becomes a place where both tactical and strategic data can exist. Does this mean that the EDW and the ODS are one and the same? NO - that's NOT what I said. What I said is that the ODS (Interactive Sector) and a strategic data storage area (Integrated Sector) exist within the same "architectural foundation" called a data warehouse.
3. So where does that leave Dynamic Data Warehousing? Good question, I hope this is one we can answer with many more discussions and entries to come.

Here are my thoughts: When I talk about a "DATA WAREHOUSE" I usually refer to all the moving parts and pieces that make up this "electronic data store", in other words, I include the structure (data model), execution code (SQL queries, loader scripts, unload scripts), data migration code layers (ETL, stored procedures, functions), and of course the metadata (semantic meaning that defines all these things, and hopefully how they all interact).

So when I talk about "DYNAMIC DATA WAREHOUSE", what I'm really saying is:

The following layers must be "included" in the term DYNAMIC:
1. structure and indexing (DDL)
2. integration code
3. SQL Queries
4. BI Reports (some use the queries, some house queries OUTSIDE the RDBMS)
5. Metadata and Semantic Ontologies
5a. This includes: Dependency chains
5b. Workflows (technical workflows)
5c. Data Model Definitions
5d. Aggregate definitions
5e. Security and access concerns .... and so on.

What I mean by Dynamic is the nature of handling change, or as Lou put it: "changing at the speed of business". I would also go so far as to say, "it is AUTOMATED ADAPTATION that enables change at the speed of business."

Now, what does all this mean? To me, Dynamic Data Warehousing is a solution requiring people, machines, standards, processes, architecture, and design elements. In the future once standardization can be executed properly (even within a single DDW within a vendor producing their own "flavor") then it will become possible to AUTOMATE many (not all, but many) of the functions that we currently do by hand.

In other words, like everything else - a "DDW" will follow the commoditization path, the same way appliances are following it. In fact, I'd also be so bold as to say I think Appliances are the right foundation to START with, to move in the direction of the purist DDW. I also think that DDW is a GOAL, not necessarily a solution - some of the parts to what I've outlined may never come to pass, we'll just have to wait and see.

What I will say is this: I know that semantic integration of structures can be done, I know that matching structures to semantic meanings of well-established ontologies can also be done. I also know through a variety of inference engines, neural net algorithms, and visualization that the strength of these relationships can be SEEN, EDITED, and TAUGHT (to the neural net to process this information better the next time). I've seen these software components, they exist today.

I also know that there is still a long way to go, that the results of these data model integrations must be checked manually and corrected - post generation. Furthermore, the only piece that this addresses is the structural component of the DDW. It does not address the automated adaptation of code layers, SQL layers, ETL layers, security layers, metadata layers, nor reporting layers.... these are all things that must be worked on.

For now, all I'm suggesting is "broaden your minds..." Harry Potter movie, art of divination scene. Let's go for the utopian vision, and make our efforts that much better as we progress. After all - a philosopher once said: you set your own limits, and another said: you can only execute to the goals that you set for yourself.

So why not stretch a little bit? In the next entry on DDW, I'll try to unfold the layers - but first, if I missed any layers, please let me know by responding. Furthermore, MARKETING DEPARTMENTS TAKE NOTICE: PLEASE make sure your wording matches your marketing statements, in my last post I took a shot at IBM marketing, I was _NOT_ taking shots at the DB2 UDB product.

Just don't say things to me like: "XXXXXXX is a SOLUTION, not a product, not a this, not a that.... [then later say] YYYYYY is our product, and it is a XXXXXXXX." This is contradictory.

Thanks,
Dan Linstedt
Come get a masters of science in BI at: http://www.COBICC.org


Posted June 27, 2007 6:27 PM

Thank-you everyone for the great feedback so far. Let's keep going on this track until someone says that it simply isn't possible. Why? Because as many of you know, I like to jump out beyond the horizon to see what might be done "outside the box", and if there is a remote chance that it will take hold (because of what we see happening), then great! If not, let's ditch the ideas in favor of something better... I must invent, but my ideas are based on many other individuals' work in the industry.

I blogged about temperature of data a while back, and recently I've been exploring metadata. Now I ask the question: what about temperature of metadata as it relates to the impact that the change would have? What about notions like Architecture Mining? Metadata Mining? Correlation analysis on structures? Are these things too far out to bend the brain on?

I think not. The time is coming where the next stage to get to (beyond active/real-time) is Dynamic Data Warehousing. I've borrowed from my good friend Stephen Brobst, and his diagram of the 5 stages of Data Warehousing, and added a 6th (see below).

[Figure: StagesofDDW.jpg - the 5 stages of Data Warehousing, with Dynamic Data Warehousing added as a 6th stage]


After all, this is the peak of what we are trying to get to: dynamic adaptation to business. As the business changes, so do all the systems (especially the integration systems). When we look at the end-results of implementing SEI/CMMI Level 5, or ISO 9001/9002/9003 etc... or PMP best practices, or ITIL documents, or ISACA audits, or CoBIT Controls, or good governance - they all come to a similar conclusion about processing: the nature of the processing routines once built, optimized, and well established is to run seamlessly day in and day out, repeatably in the back-office until such time as new processes or new structures must be introduced.

Automation is often left out when people talk about these levels of projects, however automation is really a fundamental goal of IT to begin with. We should be constantly thinking of new ways to automate repeatable and consistent processes. With this in mind, why can't the structures of the data warehouse (metadata, metrics, data itself, unstructured data, indexes, queries, code, etc..) all be subjected to the same repeatable rules? Even with code there's a finite sequence for execution defined in the compiler architectures.

Now, let's take the great leap off the edge of the horizon for a moment.... What happens to systems when the business changes? What happens to architectures or Data Models to be precise? What happens to the business processing (code) built on the source capture systems? They all change - but do they change consistently, repeatably, and can the effect of the change be measured prior to being implemented?

Most of the time (today) the answer is no, there must be some level of human intelligence involved to figure out where the impact is, how big of an impact it is, where the changes need to be made, and then they architect a patch, a band-aid, a new section of code, or a complete rewrite to serve the needs. Well, this _process_ happens over and over and over again. Cost, Risk, and Mitigation analysis. We can safely assume that for most of the changes happening within standard operations that they can be "graded" in accordance with location, risk, and impact. We can safely say that impact measurement can be automatically determined by examining METADATA, or ontologies of metadata which define the pre-existing relationships and associations of that metadata for our tools.

When I see a new element being added to a table in the source databases, I will typically make the assumption that it must be related to the source key of the table to which it is being added, otherwise they would not have put the attribute there in the first place. Therefore there is no reason why I shouldn't be able to construct a program that recognizes the new element, where it appears on the source - and can "grade" it according to its impact, and our downstream models' ability to handle the new element dynamically.
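As a small illustration of the "recognize the new element and see where it lands" step, here is a Python sketch. The table names, the `DEPENDENTS` map, and the column sets are invented for the example; in practice this information would have to come from your own metadata repository or ontology. The sketch compares two snapshots of a source table's columns, then walks a dependency chain to list what sits downstream of the change.

```python
from collections import deque

# Hypothetical metadata: which downstream objects read from which upstream object.
DEPENDENTS = {
    "src.customer": ["stage.customer"],
    "stage.customer": ["dw.customer_dim"],
    "dw.customer_dim": ["mart.sales_report", "mart.churn_view"],
}


def new_attributes(old_columns: set[str], new_columns: set[str]) -> set[str]:
    """Attributes present in the latest source snapshot but not the previous one."""
    return new_columns - old_columns


def downstream(table: str) -> list[str]:
    """Breadth-first walk of the dependency chain starting at the changed table."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(DEPENDENTS.get(table, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(DEPENDENTS.get(node, []))
    return order


added = new_attributes({"customer_id", "name"}, {"customer_id", "name", "loyalty_tier"})
impacted = downstream("src.customer")
print(added, impacted)
```

The list of impacted objects is the raw material for any kind of grading: the longer and wider the chain, the more there is to adapt.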

In other words, much like applying temperature of data, I maintain that Architecture Mining (or metadata mining or Mining of Ontology trees) can lead us to mathematical results that can apply temperature ratings to impacts of schema changes. What I am saying is that new attributes that appear within XML, XSD, XSL, object, web-services, or table structures can be run through these algorithms, and assigned a green, yellow or red flag based on the impending confidence that the change is "easy with low impact", "somewhat challenging, but we are confident", or "too difficult to achieve without human intervention."

I further assert that these temperature ratings can be placed on a gradient scale, so that alarms won't be raised unnecessarily. Like any A.I. or Neural Network it would have to be trained, and occasionally re-trained or corrected; and there might be the occasional false positives to deal with, but that sure beats adding 150 attributes to a table by hand, just because the ERP coder was up all night implementing that into the source system.
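A gradient scale with green, yellow, and red flags might look something like the Python sketch below. To be clear about the assumptions: the three input features, the weights, and the thresholds are all made up, and a hand-written formula is standing in for the trained (and occasionally re-trained) neural net I describe above. The point is only to show a continuous score being mapped onto flags, so that alarms are raised by degree rather than all-or-nothing.

```python
def impact_score(impacted_objects: int, nullable: bool, type_known: bool) -> float:
    """Return a score in [0, 1]; higher means a riskier change. Weights are made up."""
    score = min(impacted_objects, 10) / 10 * 0.6  # breadth of downstream impact
    if not nullable:
        score += 0.2  # a mandatory attribute forces changes to existing loads
    if not type_known:
        score += 0.2  # an unrecognized data type needs a human to look at it
    return round(score, 2)


def flag(score: float, yellow_at: float = 0.35, red_at: float = 0.7) -> str:
    """Map the gradient score onto a temperature flag using hypothetical cut-offs."""
    if score >= red_at:
        return "red"
    if score >= yellow_at:
        return "yellow"
    return "green"


for args in [(2, True, True), (4, False, True), (9, False, False)]:
    s = impact_score(*args)
    print(args, s, flag(s))
```

Moving the `yellow_at` and `red_at` thresholds is the cheap way to quiet down false positives; correcting the scoring itself is the re-training step.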

This is only one piece of Dynamic Data Warehousing, this is the structural change adaptation of Dynamic Data Warehousing. I think that we cannot achieve true DDW without this component working first. After the structures are changing seamlessly, we can begin to work on automating the adaptation of views, reports, mart loads, and processing routines.

Do you have any futuristic thoughts about DDW? I'd like to hear about it.

Thanks,
Dan Linstedt
You can get a Masters of Science in BI at Daniels College of Business, Denver U. http://www.COBICC.org


Posted June 7, 2007 9:46 PM