Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for Master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com


Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5 and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

Metadata is an interesting piece: many corporations and individuals fight over the true meaning of metadata and the context to which it applies. This entry is a thought experiment that explores the question of context: deriving context, and resolving contextual fights within an organization as they relate to enterprise metadata. I believe everyone can have a metadata sit-in, and maybe finally work this thing out. Note: this is a tiny bit of light reading...

Why should I even have knickers? What are knickers anyway? And why would they be twisted? Well, if you've never visited England, I suggest maybe you do so. It's a beautiful country. Anyhow, knickers have multiple definitions depending on the time of reference and who's doing the referencing. For most of us who speak or understand English today, the word usually refers to undergarments worn around the waist area.

Ok, so what's changed?
The TYPE of undergarment that knickers used to be, versus what they are today. This is an example of a time-sensitive contextual piece of metadata.

According to Webster's Dictionary:

knick‧ers  Pronunciation[nik-erz]
–noun (used with a plural verb)
1. Also, knick‧er‧bock‧ers Pronunciation[nik-er-bok-erz] loose-fitting short trousers gathered in at the knees.
2. Chiefly British. a. a bloomers like undergarment worn by women.
b. panties.
3. British Informal. a woman's or girl's short-legged underpants.

–Idiom
4. to get one's knickers in a twist, British Slang. to get flustered or agitated: Don't get your knickers in a twist every time the telephone rings.
--------------------------------------------------------------------------------
[Origin: 1880–85; shortened form of knickerbockers, pl. of knickerbocker, special use of Knickerbocker]

Now notice something interesting: at the end of the definition, it doesn't even agree with itself. They've twisted their knickers and said, see the word KNICKERBOCKER. Let's see what KNICKERBOCKER has to say:

Knick‧er‧bock‧er  Pronunciation[nik-er-bok-er]
–noun
1. a descendant of the Dutch settlers of New York.
2. any New Yorker.
--------------------------------------------------------------------------------
[Origin: 1800–10, American; generalized from Diedrich Knickerbocker, fictitious author of Washington Irving's History of New York]

Which, not surprisingly, has NOTHING to do with knickers in the first place. Look at definition #1 in the first quote, and definition #1 in the second quote - they DON'T MATCH!!! They are from close to the same time period in origin. Ok, so we studied the root of the word; this is not so interesting...

But it gives rise to a contextual problem (one that we have throughout our enterprises today). We can't decide on how to define our own terms, and furthermore, the metadata (the definitions and contextual understanding) 1) changes over time, and 2) changes based on individual or line of business.

Our enterprise metadata (Master Metadata) needs to be set forth, and needs to be built from an enterprise (top-down) view. That's not to say that we can't all have our cake / definitions and eat them too! We can, and we should. The best way to describe this type of effort is to look at existing Semantic Mapping Technology, or the Semantic Web, or Semantic Integration. Normally these things are done by hand, and if you choose to do so I would highly suggest an investment in a tool that can track, develop, and visualize taxonomies and ontologies of words.

In order to make this work you might need:
* Clear taxonomy - defined at different Work Breakdown Structures (WBS)
* Clear taxonomy - defined at different Organizational Breakdown Structures (OBS)
* Clear ontology to manage the taxonomies; cross the WBS with the OBS for big success.
* Clear version control - each piece of metadata MUST be versioned, and tracked to the CHANGE REQUEST that triggered it within the business processes. Yes, (sigh) this too is tied to BPM and SEI/CMMI level 4.

Yes, I'm suggesting metadata at CMMI level 4, quantitatively tracked. Quality scores could be included, but they are subjective and depend on the individual scoring the metadata.
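The version-control requirement in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the change request IDs (CR-0001, CR-0002) and the effective dates are hypothetical placeholders, not taken from any real tool or from the dictionary entry.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DefinitionVersion:
    term: str
    definition: str
    effective_from: date
    change_request: str  # hypothetical ID of the change request that triggered this version


class VersionedTerm:
    """Keeps every version of one term's definition, oldest first."""

    def __init__(self, term: str) -> None:
        self.term = term
        self._versions: List[DefinitionVersion] = []

    def add_version(self, definition: str, effective_from: date, change_request: str) -> None:
        # Refuse any version that cannot be traced back to a change request.
        if not change_request:
            raise ValueError("every metadata version must cite a change request")
        self._versions.append(
            DefinitionVersion(self.term, definition, effective_from, change_request)
        )
        self._versions.sort(key=lambda v: v.effective_from)

    def as_of(self, when: date) -> Optional[DefinitionVersion]:
        """Return the version in force on the given date, or None if none existed yet."""
        current = None
        for version in self._versions:
            if version.effective_from <= when:
                current = version
        return current


# Illustrative data only: the dates and change request IDs are made up.
knickers = VersionedTerm("knickers")
knickers.add_version("loose-fitting short trousers gathered in at the knees", date(1885, 1, 1), "CR-0001")
knickers.add_version("a woman's or girl's short-legged underpants", date(1950, 1, 1), "CR-0002")
print(knickers.as_of(date(1900, 1, 1)).definition)
```

Asking for the definition "as of" a date is what makes the time-sensitive context explicit: the same term returns a different meaning, and each meaning points back to the change request that introduced it.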

Now on to your knicker problem, uhhh I mean - the Knickers Twisting problem... I mean - don't wear tight pants and then exercise if you don't want your knickers in a twist... Ok - I digress (sorry).

In all honesty, knickers are _not_ knickerbockers, although the word may have been derived from the original term. At an enterprise level, a pants manufacturing corporation such as Levi Strauss might accept knickers as the definition of PANTS or UNDERPANTS... but which is it?

In the real world of metadata this needs to be resolved by the executive team; they need to be the ones to define PRIMARY metadata. Using taxonomy trees, secondary and tertiary metadata can be defined based on LOB (lines of business) and work breakdown structures (roles & responsibilities or uses of the metadata), as long as the metadata is tied to the CURRENT VIEW of the organization and to what the data set represents. Then, when data is delivered to the enterprise, the metadata goes with it, and the organization can drill up, down, and across the metadata meanings (provided they have the proper security).
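One way to picture the primary / secondary / tertiary idea is a small tree in which a lookup falls back to the nearest level that actually defines the term. The sketch below is an assumption-laden illustration: the TermNode class, the scope names (menswear, buyer) and the definitions are all hypothetical, and security checks are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermNode:
    """One level of meaning for a term: enterprise, line of business, or role."""
    scope: str
    definition: str
    parent: Optional["TermNode"] = field(default=None, repr=False, compare=False)
    children: Dict[str, "TermNode"] = field(default_factory=dict)

    def add_child(self, scope: str, definition: str) -> "TermNode":
        child = TermNode(scope, definition, parent=self)
        self.children[scope] = child
        return child

    def drill_up(self) -> List[str]:
        """Definitions from this node up to the primary (enterprise) one."""
        node: Optional["TermNode"] = self
        path: List[str] = []
        while node is not None:
            path.append(f"{node.scope}: {node.definition}")
            node = node.parent
        return path


def resolve(root: TermNode, scopes: List[str]) -> TermNode:
    """Follow scopes down the tree; stop at the deepest level that defines the term."""
    node = root
    for scope in scopes:
        if scope not in node.children:
            break
        node = node.children[scope]
    return node


# Hypothetical example: the executive team fixes the primary meaning,
# one line of business and one role refine it.
enterprise = TermNode("enterprise", "underpants")
menswear = enterprise.add_child("menswear", "knee-length trousers")
menswear.add_child("buyer", "knee-length trousers, by supplier style code")

print(resolve(enterprise, ["menswear", "buyer"]).drill_up())
print(resolve(enterprise, ["womenswear"]).definition)  # no LOB entry, so the primary meaning applies
```

The fallback in resolve is the design choice that lets everyone keep their own definition without losing the enterprise one: a line of business that has not defined the term simply inherits the primary meaning.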

Unfortunately I do not know of any single tool that can accomplish this today. There is a set of open-source tools that manage semantic meaning, a set of other tools that manage taxonomies, and another set of tools that manage version control / document management, security, and so on. Metadata tool set vendors are still in their infancy; hopefully someone will rise to the challenge - and hopefully I have not put your knickers in a twist!

We can help you sort out the metadata mess, and establish a contextual, enterprise-based metadata system that will save you time and money. This is a serious issue and must be solved before the enterprise starts an SOA initiative, or before the enterprise claims to have completed one.

As always, I'd love to hear from you - your thoughts, comments, poetry, haiku, and tall tales are all welcome.

Thanks,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com


Posted November 7, 2006 7:23 AM

1 Comment

Greetings,

I am Harikrishna S Aravapalli, a Senior Technical Architect in the R&D division of Infosys Technologies Ltd., Bangalore, INDIA. I belong to the Information Management stream of the R&D labs here. My main focus of interest and 12 years of experience has been in databases, with a special focus on data warehousing and Business Intelligence for the past 9 years.

I am a member of the UDEF Forum in "The Open Group" and my Id is "h.aravapalli".

I was introduced to the concept of UDEF by my supervisor here at Infosys. I instantly liked it, as I strongly feel that we have matured enough to move from Data to Semantics in all our information processing, and UDEF is a step in the right direction towards achieving that.
I studied all the PowerPoint presentations on UDEF I could find and did some thinking on its possibilities for Enterprise Data Management, with a special focus on the Data Warehousing and Business Intelligence worlds.

I wanted to share a few of my thoughts on the same, which I feel enthusiastic about:

Exciting possibilities of the UDEF concept in the enterprise data world:

1) UDEF-enabled information search on the intranet portal (across structured and unstructured data)
2) UDEF-enabled information categorization which users can navigate (perhaps in a tree-structured format)
3) Performing queries at the UDEF ID level -- UQL --> to discover/pull other related UDEFIds and hence their meaning
4) Semantic BI --> My first thoughts are of a rule-based interpretation of the different UDEFIds (in relation to other occurring UDEFIds) within the enterprise, to derive automated business insight/meaning by machine, as compared to the current data-based BI, which has to be interpreted by humans (and can vary from person to person). Maybe it can help in real-time BI on operational data (by not touching the actual underlying data?)

For example: if a UDEFId for a product is found in the Purchase Order, Sales, Inventory and Delivery systems, then we can conclude that the product is manufactured and ready to be delivered.

5) Mapping RFID EPC codes to UDEFIds may help in easier data integration (of RFID data) with the different enterprise applications (ERP, DW etc.)

6) UDEF cache / semantic cache for the most frequently used UDEFIds
7) An OSI-type layered model (as in communications/networking) for UDEF-enabling the data-element concepts of the various data sources.
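The Semantic BI example under item 4 above (a product's UDEFId appearing in the Purchase Order, Sales, Inventory and Delivery systems) could be sketched as a simple rule check. This is a minimal illustration, not a UDEF or UQL implementation: the IDs (PRODUCT-A, PRODUCT-B) are placeholders rather than real UDEF IDs, and the occurrence catalogue is assumed to exist already.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical catalogue: which source systems carry which ID.
# "PRODUCT-A" and "PRODUCT-B" are placeholders, not real UDEF IDs.
OCCURRENCES: Dict[str, Set[str]] = {
    "PRODUCT-A": {"purchase_order", "sales", "inventory", "delivery"},
    "PRODUCT-B": {"purchase_order", "sales"},
}

# Each rule pairs the systems that must all carry the ID with the inference drawn.
RULES: List[Tuple[FrozenSet[str], str]] = [
    (
        frozenset({"purchase_order", "sales", "inventory", "delivery"}),
        "manufactured and ready to be delivered",
    ),
]


def infer(udef_id: str,
          occurrences: Dict[str, Set[str]] = OCCURRENCES,
          rules: List[Tuple[FrozenSet[str], str]] = RULES) -> List[str]:
    """Return every conclusion whose required systems all carry the given ID."""
    seen = occurrences.get(udef_id, set())
    return [conclusion for required, conclusion in rules if required <= seen]


print(infer("PRODUCT-A"))  # rule fires: present in all four systems
print(infer("PRODUCT-B"))  # rule does not fire: inventory and delivery are missing
```

Note that the check only looks at where an ID occurs, never at the underlying rows, which is the property the comment hopes would allow analysis without touching operational data.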

Some of the problems/challenges that need to be solved to implement the UDEF concept:

1) algorithms for finding the best set of UDEF names and IDs
2) algorithms for codifying the UDEF names
3) algorithms for de-coding the UDEFIds to their UDEF Names and then to Data Element Concepts and then to Data Elements ( if possible )
4) algorithms for developing "learnings" /"inferences" based on the UDEF names
5) algorithms for the different "UDEF Caching" schemes, to minimise UDEF re-construction every time
6) Evaluation of security/encryption algorithms that may be needed to combat the "easy access to semantically similar data from different sources that will be possible with UDEFIds"
7) UDEF can also be used as a bridge to convert data from one format to another using a "triangulation" approach, e.g. convert EDIFACT to UDEF and then UDEF to XML. This can be used in enterprises which have both EDIFACT and XML standards and would like to safely convert their processes entirely to XML or EDIFACT or whatever format they choose to convert their data assets to.
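The "triangulation" idea in item 7 can be illustrated with two lookup tables and a hub in the middle. This is a sketch under loud assumptions: the source field labels (SRC_BUYER_NAME, SRC_ORDER_DATE), the hub IDs (HUB-1, HUB-2) and the XML tag names are invented placeholders, not real EDIFACT elements or UDEF IDs, and no real EDIFACT parsing is attempted.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mappings; neither the field labels nor the hub IDs
# are real EDIFACT or UDEF values.
SOURCE_TO_HUB: Dict[str, str] = {"SRC_BUYER_NAME": "HUB-1", "SRC_ORDER_DATE": "HUB-2"}
HUB_TO_XML_TAG: Dict[str, str] = {"HUB-1": "BuyerName", "HUB-2": "OrderDate"}


def to_hub(record: Dict[str, str]) -> Dict[str, str]:
    """Step 1: re-key a source record by hub ID (raises KeyError on unmapped fields)."""
    return {SOURCE_TO_HUB[name]: value for name, value in record.items()}


def hub_to_xml(hub_record: Dict[str, str], root_tag: str = "Order") -> str:
    """Step 2: emit XML whose element names come from the hub-to-target mapping."""
    root = ET.Element(root_tag)
    for hub_id, value in hub_record.items():
        ET.SubElement(root, HUB_TO_XML_TAG[hub_id]).text = value
    return ET.tostring(root, encoding="unicode")


source_record = {"SRC_BUYER_NAME": "Acme", "SRC_ORDER_DATE": "2006-11-07"}
print(hub_to_xml(to_hub(source_record)))
```

The appeal of going through a hub is that each format needs only one mapping to and one mapping from the hub, instead of a separate converter for every pair of formats.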

UDEF evolution for the Information Management / Data Warehousing / Business Intelligence world:
1) Use UDEFIds to integrate structured and unstructured data and tie it to the Enterprise Information Elements
2) UDEFIds to enable automated data integration
3) Currently a UDEFId is based on "OBJECT" and "PROPERTY". For IM, we need to know not only the data element as a standalone, but also its inter-relation with the other entities that it is associated with and interacts with, in order to derive any relevant meaning.
Hence there could be additional terms such as "CONTAINMENT / RELATIONSHIPS", "DIRECTION (Source or Target)" and "ENCRYPTION".

e.g. Current state: ObjectQualifier.ObjectWord_PropertyQualifier.PropertyWord
Suggested state: [ ObjectQualifier.ObjectWord_PropertyQualifier.PropertyWord ].[ContainmentSetName.DataDirection].[Encryption]

This can result in the possibility of the DW transforming from being a standalone entity to being a collection of Sub-DWs , with each Sub-DW residing in the Source system itself. The central DW can then do the function of an exchange or registry or indexing machine. This may be feasible because of the uniform UDEF enabled data in the Source Systems.

Thus we can have multiple instances of DW exchanges, achieving a "Configurable DW" architecture, similar to database instances. This can also lead to real-time BI, with different sets of users/applications accessing different instances of the central DW.

4) Evaluate the possibility of analysing the UDEFIds to derive Inference ( at the Semantic level ) --> maybe using UQL and rules

5) Have a Semantic level BI ( Business Intelligence ) Architecture on top of the Current Data-Level BI architecture ( which consists of a source data layer , Data Integration layer and BI Analysis layer ).
A probable Semantic BI Architecture can have :
a) Semantically coded UDEF Data Sources at Lowest level
b) The data integration can be done at the Semantic level using the UDEF Ids
c) The BI analysis can be done on this UDEFIed data at a Semantic Level using RULES , i.e : meaning of A + meaning of B => C , where C is an inference
d) Further detailed analysis can be done by drilling across to the Data level using the standard BI/DW architectures

e) Thus BI (Business Intelligence) analysis can be done in two planes: (a) the Semantic Level BI Plane, done by computers using RULES, and (b) the Data Level BI Plane, prepared by computer and interpreted by humans
f) If (e) above can be achieved, then the next step can include "learning algorithms" to develop "Intuitive Analysis" processes (above the Semantic plane) using the "historical UDEF data" in the DW. This would be the third plane of BI, the "Learning/Intuitive" plane
g) Thus BI analysis can navigate between the Learning/Intuitive Plane, the Semantic Plane and the Data-Level Plane, as per the needs of the analysis

6) An easier option would be the Data/Semantic Hybrid Architecture where a Semantic UDEF'd layer is introduced between the "Source Data Layer" and "Data Integration Layer".
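The "suggested state" naming format from item 3 above is regular enough to be parsed mechanically. The following is a hypothetical sketch: the field names of ExtendedName, the assumption that every part is alphanumeric, and the sample name are all illustrative, and nothing here reflects an official UDEF syntax.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtendedName:
    object_qualifier: str
    object_word: str
    property_qualifier: str
    property_word: str
    containment_set: str
    direction: str   # "Source" or "Target"
    encryption: str


# Assumes each part is purely alphanumeric; brackets, dots and the single
# underscore are the only separators, as in the suggested state above.
_PART = r"([A-Za-z0-9]+)"
_PATTERN = re.compile(
    r"^\[\s*" + _PART + r"\." + _PART + r"_" + _PART + r"\." + _PART + r"\s*\]"
    r"\.\[\s*" + _PART + r"\.(Source|Target)\s*\]"
    r"\.\[\s*" + _PART + r"\s*\]$"
)


def parse_extended_name(text: str) -> ExtendedName:
    """Split a bracketed extended name into its seven parts, or raise ValueError."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in the suggested extended format: {text!r}")
    return ExtendedName(*match.groups())


# Made-up sample name, for illustration only.
name = parse_extended_name("[ Purchase.Order_Delivery.Date ].[OrderHeader.Source].[None]")
print(name.object_word, name.property_word, name.direction)
```

Once names are parsed into fields like these, the containment and direction parts could be used to group elements by source system, which is what the registry-style central DW described above would need.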


Note: Some of these concepts might have been inspired by the existing literature on UDEF and some are my own vision of the same. I am writing to you purely out of my interest and enthusiasm to share my insights on UDEF and its exciting possibilities.

I would be eager to get Mr. Bill Inmon's views on these ideas, or any pointers relating to them.

Regards,
Harikrishna S Aravapalli
