
Blog: Dan E. Linstedt

Dan Linstedt


About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMi Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

Thank you, everyone, for the great feedback so far. Let's keep going on this track until someone says it simply isn't possible. Why? Because, as many of you know, I like to jump out beyond the horizon to see what might be done "outside the box," and if there is a remote chance it will take hold (because of what we see happening), then great! If not, let's ditch the ideas in favor of something better... I must invent, but my ideas are built on the work of many other individuals in the industry.

I blogged about the temperature of data a while back, and recently I've been exploring metadata. Now I ask: what about the temperature of metadata as it relates to the impact a change would have? What about notions like Architecture Mining? Metadata Mining? Correlation analysis on structures? Are these things too far out to bend the brain on?

I think not. The time is coming when the next stage to reach (beyond active/real-time) is Dynamic Data Warehousing. I've borrowed the diagram of the five stages of Data Warehousing from my good friend Stephen Brobst and added a sixth (see below).


After all, this is the peak we are trying to reach: dynamic adaptation, so that as the business changes, so do all the systems (especially the integration systems). When we look at the end results of implementing SEI/CMMI Level 5, ISO 9001/9002/9003, PMP best practices, ITIL documents, ISACA audits, CoBIT controls, or good governance, they all come to a similar conclusion about processing: once processing routines are built, optimized, and well established, their nature is to run seamlessly and repeatably in the back office, day in and day out, until new processes or new structures must be introduced.

Automation is often left out when people talk about these kinds of projects, yet automation is a fundamental goal of IT in the first place. We should constantly be thinking of new ways to automate repeatable, consistent processes. With this in mind, why can't the structures of the data warehouse (metadata, metrics, the data itself, unstructured data, indexes, queries, code, and so on) be subjected to the same repeatable rules? Even code has a finite execution sequence defined by the compiler architecture.

Now, let's take the great leap off the edge of the horizon for a moment. What happens to systems when the business changes? What happens to architectures, or data models to be precise? What happens to the business processing (code) built on the source capture systems? They all change. But do they change consistently and repeatably, and can the effect of a change be measured before it is implemented?

Most of the time (today) the answer is no: some level of human intelligence must be involved to figure out where the impact is, how big it is, and where the changes need to be made; then someone architects a patch, a band-aid, a new section of code, or a complete rewrite to serve the need. This _process_ happens over and over and over again: cost, risk, and mitigation analysis. We can safely assume that most of the changes happening within standard operations can be "graded" according to location, risk, and impact. We can safely say that impact measurement can be determined automatically by examining METADATA, or ontologies of metadata that define the pre-existing relationships and associations of that metadata for our tools.
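That automated impact measurement can be sketched as a walk over a metadata lineage graph. The object names and the lineage dictionary below are invented for illustration; in a real tool, they would come from an ETL repository or metadata catalog:

```python
from collections import deque

# Hypothetical metadata lineage: each object maps to the downstream
# objects that consume it (all names are made up for this sketch).
LINEAGE = {
    "src.customer": ["stage.customer", "edw.hub_customer"],
    "stage.customer": ["edw.hub_customer", "edw.sat_customer"],
    "edw.hub_customer": ["mart.dim_customer"],
    "edw.sat_customer": ["mart.dim_customer"],
    "mart.dim_customer": ["report.customer_revenue"],
}

def downstream_impact(changed_object):
    """Walk the lineage graph breadth-first and return every object a
    change to `changed_object` could touch."""
    impacted, queue = set(), deque([changed_object])
    while queue:
        node = queue.popleft()
        for dep in LINEAGE.get(node, []):
            if dep not in impacted:
                impacted.add(dep)
                queue.append(dep)
    return impacted

print(sorted(downstream_impact("src.customer")))
```

The size (and depth) of the returned set is one crude way to "grade" a change by location and impact before anyone writes a line of remediation code.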

When I see a new element being added to a table in a source database, I will typically assume it must be related to the source key of the table to which it is being added; otherwise, they would not have put the attribute there in the first place. Therefore, there is no reason why I shouldn't be able to construct a program that recognizes the new element and where it appears on the source, and can "grade" it according to its impact and our downstream models' ability to handle the new element dynamically.
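A minimal sketch of that recognition step, assuming we can capture schema snapshots (say, from a database's information_schema catalog) as table-to-column mappings; the table and column names here are hypothetical:

```python
# Compare two schema snapshots (table -> set of column names) and
# report any columns that appear in the newer snapshot but not the
# older one. Snapshots are hard-coded here for illustration only.
def new_columns(before, after):
    changes = {}
    for table, cols in after.items():
        added = cols - before.get(table, set())
        if added:
            changes[table] = added
    return changes

yesterday = {"orders": {"order_id", "customer_id", "amount"}}
today = {"orders": {"order_id", "customer_id", "amount", "discount_code"}}

print(new_columns(yesterday, today))  # {'orders': {'discount_code'}}
```

Each detected column could then be handed to the grading step, alongside whatever the lineage metadata says about the table it landed in.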

In other words, much like applying temperature of data, I maintain that Architecture Mining (or metadata mining, or mining of ontology trees) can lead us to mathematical results that apply temperature ratings to the impacts of schema changes. What I am saying is that new attributes appearing within XML, XSD, XSL, object, web-service, or table structures can be run through these algorithms and assigned a green, yellow, or red flag based on the confidence that the change is "easy with low impact," "somewhat challenging, but we are confident," or "too difficult to achieve without human intervention."
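As a sketch, the traffic-light assignment might look like the following; the 0-to-1 confidence score and the threshold values are assumptions for illustration, not a prescribed scale:

```python
def impact_flag(confidence, green_at=0.9, red_below=0.5):
    """Translate a 0..1 confidence score into a traffic-light flag.
    Thresholds are illustrative and would be tuned (or learned) per shop."""
    if confidence >= green_at:
        return "green"   # easy with low impact: apply automatically
    if confidence >= red_below:
        return "yellow"  # somewhat challenging, but we are confident
    return "red"         # too difficult without human intervention

print(impact_flag(0.95))  # green
print(impact_flag(0.70))  # yellow
print(impact_flag(0.20))  # red
```

Because the thresholds are parameters rather than hard-coded cutoffs, they already form the gradient scale discussed next: nudging them is how the system would be "re-trained" after a false positive.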

I further assert that these temperature ratings can be placed on a gradient scale so that alarms won't be raised unnecessarily. Like any A.I. or neural network, it would have to be trained, and occasionally re-trained or corrected; there might be the occasional false positive to deal with, but that sure beats adding 150 attributes to a table by hand just because the ERP coder was up all night implementing them in the source system.

This is only one piece of Dynamic Data Warehousing: the structural change adaptation. I think we cannot achieve true DDW without this component working first. Once structures are changing seamlessly, we can begin to automate the adaptation of views, reports, mart loads, and processing routines.

Do you have any futuristic thoughts about DDW? I'd like to hear about them.

Dan Linstedt
You can get a Master of Science in BI at the Daniels College of Business, Denver University. http://www.COBICC.org

Posted June 7, 2007 9:46 PM
2 Comments



*Personal Disclosure - I work for a vendor, Informatica*

This topic is a good one. "Change" exists at so many levels within a solution implementation: metadata, data, quality levels; the list is endless.

Although I would tend to agree that structures have historically been difficult to "auto-change" and that many of the change processes remain manual, the ability to measure the impact of structural change at the metadata level has existed for many years in the Informatica product platform and others.

Are these capabilities leveraged by the "masses"? Not to overwhelming effect. But the savvy, experienced practitioner uses this information in their daily change processes as a ruler to gauge anything from "big project" to "minor edit."

In my mind, what you've outlined as "Architecture Mining" is really another name for "structural profiling," a functional addition to the content-focused data profiling tools.

Some of these capabilities already exist in Informatica's offerings and others; see http://www.informatica.com/products/data_explorer/data_mapping/default.htm for details.

Another area of change worth discussing is "transactional change capture".....

Don Tirsell

Sr. Director, Product Marketing

Hi Don,

Thank you for your comments; maybe I didn't articulate myself well enough. I need to define the term "architecture mining," because there is a _significant_ difference in my mind between structural profiling and Architecture Mining.

Structural Profiling is (in my mind) statistically based, and can be computed based on statistical matching and probability.

Architecture Mining looks for relationships between structural elements where none may be defined; it also uses a neural network approach to "learn" and suggest changes to the architecture, with confidence ratings intact.

I'll blog on this going forward.

Dan L
