
Bill Inmon has given me this wonderful opportunity to blog on his behalf. I cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, unstructured data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for master's students in IT. I can't wait to hear from you in the comments on my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect and has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

Context fascinates me, not to mention I love a good challenge... Perhaps it's the fever I have tonight, and perhaps it's just a wandering mind. In this entry I'm going to explore a couple of perspectives I've been developing lately, along with a few theories I'm proving out - and they have to do with (what else?) the obvious: metadata drives our data model structures, yet metadata is rarely synchronized with definitional context (derived automatically from unstructured data sources), and rarely visualized beyond the standard two-dimensional data models we are so used to seeing and working with.

This entry is a thought experiment that dives into a land of "what-if" analysis and attaches it to what I call Dynamic Data Warehousing - which in turn leads to Dynamic Automated Architecture Manageability. The problem is: how can we build a consistent, standardized, and solid foundational data model that will adapt itself going forward as the business and its needs change - all without losing sight of the history that has already been collected? Impossible, you say? Not at all...

A few of my good friends (much brighter than I) discuss semantic notions of context on a continuous basis. You can read some of these phenomenal thoughts here. One of them discusses notions of semantic neutrality, and argues that semantic reconciliation is the first step to success:

"It is important to address semantic reconciliation before other analytical processes (e.g., statistical analysis, market segmentation, link analysis, etc.). This is a 'first things first' principle because semantic reconciliation makes secondary analytic and computational problems that much easier and that much more accurate."

All too many "semantic-driven engines" on the market today take the data as their starting point, performing statistical analysis, market segmentation, and so on before ever addressing _any_ of the semantics of the data model (i.e., the naming conventions, prefixes, suffixes, abbreviations, definitions, correlations, and so on). This is fine if what you want is a model that will hold your current data set, but neither your future nor your past ones.

An engine that builds a data model based on just the data set is ignoring 80% of the problem. A semantic engine that "scans the data" to build a model might build the "most correct model" for today's snapshot of data, but it fails to be dynamic, adaptable, or even consistent (in its structural layers) going forward.
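To make the contrast concrete, here is a minimal sketch (in Python) of what reconciling a model's semantics before ever touching the data might look like. The glossary, prefix list, and column names below are my own invented examples, not anything from a real engine:

```python
# A minimal sketch of "model-first" semantic reconciliation: normalize the
# names in two source models using an abbreviation glossary BEFORE any data
# is scanned. Glossary, prefixes, and models are invented for illustration.

GLOSSARY = {"cust": "customer", "nbr": "number", "amt": "amount", "dt": "date"}
PREFIXES = ("src_", "stg_")  # structural prefixes that carry no business meaning

def normalize(column_name: str) -> str:
    """Strip structural prefixes and expand abbreviations in a column name."""
    name = column_name.lower()
    for prefix in PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
    return "_".join(GLOSSARY.get(token, token) for token in name.split("_"))

# Two source models that describe the same business concepts differently:
model_a = ["src_cust_nbr", "order_amt", "ship_dt"]
model_b = ["customer_number", "order_amount", "ship_date"]

print([normalize(c) for c in model_a])  # ['customer_number', 'order_amount', 'ship_date']
print([normalize(c) for c in model_b])  # identical -> the models reconcile by name semantics
```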

The problem with basing metadata or newly constructed models on data alone is that it captures only the values of TODAY. If you're lucky enough to have history, it might form a picture of yesterday - but then again, the tool may make compromises based on outliers and strange patterns that occurred (errors in the data). Or worse yet, it may not be able to make heads or tails of the data at all, and come to no conclusion about what the model should be.

There is a solution: Data Model Architectural Consolidation, based on the metadata and definitional elements held within the models. Coupling the existing models with semantic definitions and a few other elements not only increases the value of the metadata, but soon provides enough context to decide what the model really should be. This is what "Model Driven Architecture" is truly about; MDA is NOT "data driven architecture," as some vendors claim.
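Continuing the sketch above (it reuses normalize(), model_a, and model_b), consolidation might look something like this. The definitions dictionary stands in for a business glossary and is purely illustrative:

```python
# Consolidate two source models into one common model keyed by semantically
# normalized names, coupled with definitional context.

definitions = {
    "customer_number": "Unique identifier assigned to a customer at signup.",
    "order_amount": "Total monetary value of a single order.",
    "ship_date": "Calendar date the order left the warehouse.",
}

def consolidate(*models):
    common = {}
    for model in models:
        for column in model:
            key = normalize(column)
            entry = common.setdefault(
                key, {"source_names": set(), "definition": definitions.get(key)}
            )
            entry["source_names"].add(column)  # keep lineage back to every source spelling
    return common

common_model = consolidate(model_a, model_b)
# common_model["customer_number"] now carries both source spellings plus a
# definition - enough context to decide what the model "really should be".
```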

What I'm proposing here is that not only can an engine be developed to automate the consolidation of data models, but that it can in fact apply new changes to an existing consolidated data model based on semantic discovery and associability. This can lead to a common data model that lasts for years within a business and provides a solid, repeatable foundation from which to build.
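One way such an engine might apply a discovered change without destroying what has already been collected is to make every change additive. A sketch, again building on the code above; the change format is my own assumption:

```python
# Apply a newly discovered column to the consolidated model non-destructively:
# known concepts gain a new source spelling, genuinely new concepts are
# appended with an effective date, and nothing is ever overwritten.

from datetime import date

def apply_change(common, new_column, discovered_in):
    key = normalize(new_column)
    if key in common:
        # A concept we already know, arriving under a new spelling.
        common[key]["source_names"].add(new_column)
    else:
        # A genuinely new concept: append it; history under the old shape survives.
        common[key] = {
            "source_names": {new_column},
            "definition": None,             # to be reconciled semantically later
            "discovered_in": discovered_in,
            "added_on": date.today(),
        }
    return common

apply_change(common_model, "cust_loyalty_tier", discovered_in="crm_feed")
```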

It is also reasonable to assume that in order to reach Dynamic Data Warehousing, one must be willing to accept the idea that data model changes can be automated and applied dynamically to everything: queries, load routines, web-service interfaces, and yes (eventually) even security.
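If the model is the driving artifact, downstream pieces can be regenerated from it rather than hand-edited. A toy illustration of that idea, using the consolidated model from the sketches above (the view and table names are invented placeholders):

```python
# Emit a view definition from the consolidated model, so that a model change
# automatically flows into every query written against the view.

def generate_view(common, view_name, source_table):
    select_list = ",\n  ".join(
        f"{sorted(meta['source_names'])[0]} AS {key}"  # one source spelling -> common name
        for key, meta in sorted(common.items())
    )
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n  {select_list}\nFROM {source_table};"
    )

print(generate_view(common_model, "common_order_v", "raw_orders"))
```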

Metadata is what ties all of this together: when the business changes, the models change, and when the models change, the context of the data (in most cases) changes. Data discovery to build a model is a crucial technology that can lead to better understanding, but it should *not* be the driving factor in building common data models for the EDW or common web services going forward. Using models to drive models, and then making statements through data discovery, is the proper way to forge ahead in a model-driven world.

All of this should change the way we view metadata. We should begin to realize just how important metadata and models are, and the impact that naming conventions can have on our business for years to come.

I'd love to hear your thoughts, whimsical or otherwise - again this is a thought experiment.

All the best,
Dan Linstedt


Posted May 28, 2007 8:54 PM

2 Comments

In metadata management it is important to segregate naming from the meaning of names. It is not the naming of metadata that makes metadata functional; it is an adequate metadata management system, http://www.dataintegrityinstitute.com/Enterprise_Metadata_System.htm .

Let's take a database column as a typical metadata object. If we, with the energy of a headmaster, try to impose naming conventions every time a database column is created, we act like Old Testament people selecting names for children, rivers, etc. The column name should be unique inside a database table, and that is all. The database management system, and all other systems and conventions, should allow selection of any name for the column that is unique within the particular table. No other barrier should exist. Nobody can expect to inject into the column's name its transactional sense, its data warehouse sense, this report's sense, that report's sense, the corporate policy, etc. If that were possible, the column name would be a very long string of sausages, almost without beginning or end. And even if it worked for the moment, the next time the business changes (corporate name change, merger, etc.) the situation would be a headache.

What gives sense to metadata is a proper metadata management system. The metadata management system should link, and make functional, every structural, business, and operational property and aspect of the column, not just its name (a sketch of such a record as a data structure follows the list below), including:

- unique ID, this time across the entire enterprise
- real name
- long name
- short name
- abbreviation
- type
- SQL type
- C language type
- CSV type
- fixed length type
- size
- length
- scale
- nullability
- null value
- uniqueness
- lineage
- impact
- compliance
- etc.
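As a minimal illustration of the kind of record such a system would manage, the field names below simply mirror the property list above. This is a sketch of the idea only; it is not ASG Rochade's actual model:

```python
# One column's metadata record as a data structure, mirroring the list above.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ColumnMetadata:
    unique_id: str                  # unique across the entire enterprise
    real_name: str                  # must be unique only within its table
    long_name: str = ""
    short_name: str = ""
    abbreviation: str = ""
    sql_type: str = ""              # e.g. "DECIMAL(18,2)"
    c_type: str = ""                # e.g. "double"
    csv_type: str = ""
    fixed_length_type: str = ""
    size: Optional[int] = None
    length: Optional[int] = None
    scale: Optional[int] = None
    nullable: bool = True
    null_value: Optional[str] = None
    unique: bool = False
    lineage: list = field(default_factory=list)     # where the column comes from
    impact: list = field(default_factory=list)      # what depends on the column
    compliance: list = field(default_factory=list)  # applicable policies
```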

That is one reason why ASG Rochade, one of the most successful metadata management engines, is built on top of a hierarchical database system. Other metadata tools promise everything, provide nothing, and even require an external relational database engine.

Drago Pejic
info@DataIntegrityInstitute.com
www.dataintegrityinstitute.com

First, a declaration of interest: I am the Product Manager for ASG-Rochade! What caught my eye about this was the notion that metadata and "definitional context" are somehow distinct. I'm arguing internally that "definitional context" is a good definition of metadata! There is, for sure, a distinction between syntactic and semantic metadata. Both need to be available, though different communities of interest will often have a different balance of requirements. (Loosely: if I consider the enterprise as having IT at the center and the "real" business user at the boundary, then moving from the center to the boundary, the syntactic focus diminishes and the semantic focus increases.)
