Blog: Dan E. Linstedt
DDW - Detecting and Securing Changes
Thank you, everyone, for the great feedback so far. Let's keep going on this track until someone says that it simply isn't possible. Why? Because, as many of you know, I like to jump out beyond the horizon to see what might be done "outside the box", and if there is a remote chance that it will take hold (because of what we see happening), then great! If not, let's ditch the ideas in favor of something better... I must invent, but my ideas are based on the work of many other individuals in the industry.
I blogged about temperature of data a while back, and recently I've been exploring metadata. Now I ask the question: what about the temperature of metadata as it relates to the impact that a change would have? What about notions like Architecture Mining? Metadata Mining? Correlation analysis on structures? Are these things too far out to bend the brain on? I think not. The time is coming when the next stage to reach (beyond active/real-time) is Dynamic Data Warehousing. I've borrowed my good friend Stephen Brobst's diagram of the 5 stages of Data Warehousing and added a 6th (see below).
Automation is often left out when people talk about these levels of projects; however, automation is really a fundamental goal of IT to begin with. We should be constantly thinking of new ways to automate repeatable and consistent processes. With this in mind, why can't the structures of the data warehouse (metadata, metrics, the data itself, unstructured data, indexes, queries, code, etc.) all be subjected to the same repeatable rules? Even with code there's a finite sequence for execution defined in the compiler architectures.
Now, let's take the great leap off the edge of the horizon for a moment... What happens to systems when the business changes? What happens to architectures, or Data Models to be precise? What happens to the business processing (code) built on the source capture systems? They all change - but do they change consistently and repeatably, and can the effect of the change be measured before it is implemented? Most of the time (today) the answer is no: some level of human intelligence must be involved to figure out where the impact is, how big it is, and where the changes need to be made, and then someone architects a patch, a band-aid, a new section of code, or a complete rewrite to serve the needs.
Well, this _process_ happens over and over and over again: Cost, Risk, and Mitigation analysis. We can safely assume that most of the changes happening within standard operations can be "graded" in accordance with location, risk, and impact. We can safely say that impact measurement can be automatically determined by examining METADATA, or ontologies of metadata which define the pre-existing relationships and associations of that metadata for our tools. When I see a new element being added to a table in the source databases, I will typically assume that it must be related to the source key of the table to which it is being added; otherwise they would not have put the attribute there in the first place.
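To make the idea of detecting a new source element from metadata a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `TableMeta` snapshot format, the `downstream` lineage map, and the function name `detect_new_elements` are hypothetical and do not come from any particular tool. It simply compares two metadata snapshots, reports columns added to existing tables, ties each one to that table's source key (the assumption described above), and counts the downstream objects that might be affected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical metadata snapshot for one source table."""
    key: str               # source key column of the table
    columns: frozenset     # all column names in the table


@dataclass(frozen=True)
class NewElement:
    """A column that appeared between two snapshots."""
    table: str
    column: str
    assumed_parent_key: str  # assumption: new attribute hangs off the table's key
    downstream_count: int    # how many downstream objects read from the table


def detect_new_elements(before, after, downstream):
    """Compare two metadata snapshots and report columns added to existing tables."""
    found = []
    for table, meta in after.items():
        if table not in before:
            continue  # brand-new tables are out of scope for this sketch
        for column in sorted(meta.columns - before[table].columns):
            found.append(
                NewElement(
                    table=table,
                    column=column,
                    assumed_parent_key=meta.key,
                    downstream_count=len(downstream.get(table, ())),
                )
            )
    return found


# Illustrative, made-up metadata: one column is added to the customer table.
before = {"customer": TableMeta("customer_id", frozenset({"customer_id", "name"}))}
after = {
    "customer": TableMeta(
        "customer_id", frozenset({"customer_id", "name", "loyalty_tier"})
    )
}
downstream = {"customer": ["dim_customer", "rpt_sales"]}  # hypothetical lineage map

changes = detect_new_elements(before, after, downstream)
for change in changes:
    print(change)
```

The downstream count here is only a crude stand-in for impact; a real measurement would need far richer metadata than a flat lineage map.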
Therefore there is no reason why I shouldn't be able to construct a program that recognizes the new element and where it appears on the source, and can "grade" it according to its impact and our downstream models' ability to handle the new element dynamically. In other words, much like applying temperature of data, I maintain that Architecture Mining (or metadata mining, or Mining of Ontology trees) can lead us to mathematical results that can apply temperature ratings to the impacts of schema changes.
What I am saying is that new attributes that appear within XML, XSD, XSL, object, web-services, or table structures can be run through these algorithms and assigned a green, yellow, or red flag based on the confidence that the change is "easy with low impact", "somewhat challenging, but we are confident", or "too difficult to achieve without human intervention." I further assert that these temperature ratings can be placed on a gradient scale, so that alarms won't be raised unnecessarily. Like any A.I. or Neural Network, it would have to be trained, and occasionally re-trained or corrected; and there might be the occasional false positive to deal with, but that sure beats adding 150 attributes to a table by hand, just because the ERP coder was up all night implementing them in the source system.
This is only one piece of Dynamic Data Warehousing: the structural change adaptation. I think that we cannot achieve true DDW without this component working first. After the structures are changing seamlessly, we can begin to work on automating the adaptation of views, reports, mart loads, and processing routines. Do you have any futuristic thoughts about DDW? I'd like to hear about them.
Thanks,
Comments
Dan,
*Personal Disclosure - I work for a vendor, Informatica*
This topic is a good one. "Change" exists at so many levels within a solution implementation: metadata, data, quality levels; the list is endless.
Although I would tend to agree that structures historically have been difficult to "auto-change" and that many of the change processes are manual, the ability to measure the impact of structural change at the metadata level has existed for many years in the Informatica product platform and others.
Are these capabilities leveraged by the "masses"? Not to overwhelming effect. But the savvy, experienced practitioner uses this information in their daily change processes as a ruler to gauge "big project" to "minor edit".
In my mind, what you've outlined as "Architecture Mining" is really another name for "structural profiling", a functional addition to the content-focused Data Profiling tools.
Some of these capabilities already exist in Informatica's offerings and others; see http://www.informatica.com/products/data_explorer/data_mapping/default.htm for details.
Another area of change worth discussing is "transactional change capture"...
Don Tirsell
Sr. Director, Product Marketing
Informatica
Posted by: Don Tirsell | June 20, 2007 12:31 PM
Hi Don,
Thank you for your comments; maybe I didn't articulate myself well enough. I need to define the term "architecture mining", because there is a _significant_ difference in my mind between structural profiling and Arch. Mining.
Structural Profiling is (in my mind) statistically based, and can be computed based on statistical matching and probability.
Architectural Mining looks for relationships between structural elements where none may be defined. It also uses a neural network approach to "learn" and suggest changes to the architecture, with confidence ratings intact.
I'll blog on this going forward.
Thanks,
Dan L
Posted by: Dan Linstedt | June 25, 2007 3:20 PM