Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

Recently in Master Data Management Category

Well, it's happened again. IT is trying desperately to eliminate the value of the EDW from the business (at least this is what I see). Business is responding by demanding the creation of Master Data systems. There seems to be an age-old argument in the market space about the use of, definition of, and condition of: Business Keys. IT appears to be telling people to use surrogate keys and to ignore the business keys entirely. In this entry we will explore this single notion, and see what some folks have to say about it (me included!) Mind you, this is a bit of a rant; they seem to know how to "get my goat" as they say...

I start off by stating, Codd & Date designed normalized forms to have business meaning. They insisted that Business Keys be utilized in order to "make sense of" and "tag" the data sets appropriately so that relationships can be understood and maintained.

A simple link to a temporal database book houses a brief entry on the Information Principle: See it here.

I'd like to say a word or two about business keys (by the way, you can find additional information in my videos on YouTube: http://www.youtube.com/dlinstedt/).

What are business keys?
Business keys ARE the master information that unlocks context for business users. Business keys are (often) intelligent keys that have MEANING to the business. Business Keys are often alphanumeric, parts of which may be generated sequences, other parts have meaning based on position. In any event, business keys are USED by the business to locate, identify, and track information through the business life-cycle. Without them, business may not be able to "use" or apply the information properly.

What business keys are NOT:
Business keys are NOT surrogates, NOT sequences, NOT ordered numeric elements assigned based on technical insertion rates. Surrogate keys should NEVER be shown to the business, ever.... They should be used within a system (internally only) to identify rows to the machine, and provide optimal join paths, but they should NEVER appear on reports, screens, or anywhere that the business can see them.

Is there an argument around business keys versus surrogate keys?
You bet! Check out these comments:
http://www.mindfuldata.com/Modeling/modeling-pdf/DAMA%202008%20Speaker%20Notes.pdf
http://stackoverflow.com/questions/63090/surrogate-vs-naturalbusiness-keys


"Dimensions should always use a surrogate key that is generated within the warehouse. I went to a presentation a couple of years ago by Ralph Kimball (a data warehouse author), and he discussed the importance of removing the warehouse's dependency on business keys. The idea is a good one, because business keys change regularly and this will result in a long-term problem for the warehouse. However, when we discussed Slowly Changing Dimensions (especially ones that kept history), he said that we should use the business key to link them together. This went against what he had just said, so I decided that we needed to find another solution." http://expertanswercenter.techtarget.com/eac/blog/0,295203,sid63_tax298150,00.html

http://www.infoadvisors.com/Home/tabid/36/EntryID/191/Default.aspx
http://www.cerebiz.com/blog/index.php/2007/08/06/use-of-surrogate-keys-in-data-warehousing/

WHY are these people insisting that there is no value in business keys?
Because it's a very tough problem for business to overcome. Yet the business today is ASKING, Begging, pleading for answers from Master Data Sets. I maintain that you cannot build a master data system without looking at and using business keys as a central HUB of information.

Why not surrogates?
If I ask you to look up surrogate key 5, do you understand what this is? where it came from? what it is bound to? Does it give you _any_ context at all as to which system generated the number? Do you even know where to begin to find this key?

Surrogate numbers are generated today in EACH source system. In the Data Warehousing world we are responsible for integrating MULTIPLE systems at once into a single place. If we rely solely on these "surrogate keys" and completely ignore business keys as has been suggested by the links above, our EDW would never mesh or align for the business. Furthermore trying to build a master data system would be impossible. Some of these individuals I listed even went so far as to say: "ignore the business keys in your dimension entirely, because it is unruly (null) most of the time".
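To make the collision concrete, here is a minimal Python sketch. The two source systems, their rows, and the `customer_number` business key are all hypothetical, invented only for illustration. The point is simply that surrogate 5 means something different in each system, while the business key lines the records up.

```python
# Hypothetical rows from two source systems. Each system generated its
# own surrogate keys, so the same number shows up in both.
crm_rows = [
    {"surrogate_id": 5, "customer_number": "ACME-0042", "name": "Acme Corp"},
    {"surrogate_id": 6, "customer_number": "GLOB-0007", "name": "Globex"},
]
billing_rows = [
    {"surrogate_id": 5, "customer_number": "GLOB-0007", "balance": 1200},
    {"surrogate_id": 9, "customer_number": "ACME-0042", "balance": 300},
]


def integrate(key):
    """Merge the two systems' rows on the given key column."""
    merged = {}
    for source, rows in (("crm", crm_rows), ("billing", billing_rows)):
        for row in rows:
            merged.setdefault(row[key], {})[source] = row
    return merged


by_surrogate = integrate("surrogate_id")
by_business_key = integrate("customer_number")

# Joining on surrogate 5 pairs Acme's CRM record with Globex's balance.
mismatch = by_surrogate[5]
print(mismatch["crm"]["name"], mismatch["billing"]["customer_number"])

# Joining on the business key pairs each customer with its own balance.
acme = by_business_key["ACME-0042"]
print(acme["crm"]["name"], acme["billing"]["balance"])
```

Nothing here is specific to any tool; the same mismatch appears in any join that treats independently generated sequence numbers as if they were shared identifiers.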

I say rubbish. If your business is not properly synchronizing, populating, or utilizing business keys then it is hemorrhaging money along its business processes. Business keys are vital to the traceability of information ACROSS lines of business and ACROSS systems.

Take a look at what I say about business keys:
http://www.danlinstedt.com/AboutDV.php
http://www.tdan.com/view-articles/5285
http://www.b-eye-network.com/blogs/linstedt/archives/2005/09/between_inmon_a.php

Bottom line: it is imperative that business keys span the systems. If the business keys are changing, or are re-used, the business is LOSING MONEY. I will take that to the board of directors level every single time, and every time I can find busted and broken business processes and a lack of visibility ACROSS the organization, in proportion to the lack of regard for business keys.

The ONLY thing one has to do is look at the businesses that want master data systems - how are you (IT) going to integrate the data sets by surrogate if the surrogates generated by source system ARE THE SAME across multiple sources? WHICH surrogate are you going to show to the business as the "MASTER KEY" for which pieces of information? It's a near impossible problem to solve, the business units will fight over the definition, and it will come down to politics as to who is right/wrong, when the business REALLY should be deciding how to fix the source of the problem: lack of a single business key.

Auto manufacturers figured it out long ago: they use VINs (vehicle identification numbers) to uniquely identify make, model, manufacturer, date of manufacture, size of engine, and so on. Unless you are doing something illegal, the VIN does not change, nor does it go away. What would happen to the world of cars if the VIN disappeared?
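The VIN is a good example of an intelligent business key whose parts have meaning based on position. As a rough sketch, the Python below splits a 17-character VIN into the three standard sections (manufacturer identifier, descriptor section, identifier section). The function name `split_vin` is my own, the sample VIN is used purely for illustration, and what each section actually encodes varies by manufacturer and region, so treat this as a sketch and not a decoder.

```python
def split_vin(vin):
    """Split a 17-character VIN into its three standard sections."""
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError("a VIN has exactly 17 characters")
    return {
        "wmi": vin[0:3],    # world manufacturer identifier (positions 1-3)
        "vds": vin[3:9],    # vehicle descriptor section (positions 4-9)
        "vis": vin[9:17],   # vehicle identifier section (positions 10-17)
    }


parts = split_vin("1M8GDM9AXKP042788")  # sample VIN, illustration only
print(parts)
```

A surrogate like 5 offers nothing comparable: there is no position in it that tells you who made the thing or where to start looking.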

We have the SAME question in the world of counterfeit drugs... Unfortunately E-Pedigree as a country-wide solution has been lobbied down and pushed back. Each bottle was to be labeled and identified as a unique bottle using a very specific bar code. It would have allowed the entire industry to sort out MOST of the counterfeit drug problem, and save people's lives.

You can sit there and tell me that "Business Keys don't matter" but at the end of the day, I will say: you are losing money, and quite possibly people are dying without them.

Cheers,
Dan Linstedt
Check out WHY business keys are important, learn about the Data Vault Model.


Posted November 2, 2008 10:07 PM

In this entry we'll dive a little further into the pros and cons of master data as a service (MDaaS). We'll bring to light the different kinds of master data, and how it will evolve in the market place into a service oriented architecture, housed off-site (generically). MDaaS follows the standard curve of new ideas: individual creation (decentralization), then centralization, and then commodity based master data. I think the firm which undertakes master data as a commodity will be a hot property in the near future.

First, I'd like to discuss the definition of master data (which I've done in other blogs). From a 30,000 ft perspective, master data is operational, quality cleansed, singular in nature, and descriptive about a business key - it is in fact an operational data store for the enterprise (with a few rules twisted). By the way, come see me at TDWI in Orlando next week - I'm teaching on Master Data (how to implement within your enterprise).

Master data should not contain:
* Parent-child relationships (other than recursive hierarchies to itself).
* Degenerate dimensional information
* Junk
* Data that is unrelated or weakly related to the business key.
* multi-part business keys that represent relationships in the business world.

Master data structures should contain:
* The business key, the whole business key and nothing but the business key.
* In addition to the business key, all descriptive data ABOUT the business key (to provide the business key CURRENT CONTEXT)
* 1 to 1 relationship with a surrogate generated number to the business key.
* Load date, create date, last updated date, original record source, updated record source
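The list above can be sketched as a single record type. This is only an illustration in Python: the class name `MasterRecord`, the field names, and the in-memory sequence are my own assumptions, not a prescribed layout.

```python
from dataclasses import dataclass, field
from datetime import datetime
from itertools import count

_next_surrogate = count(1)  # internal sequence, never shown to the business


@dataclass
class MasterRecord:
    business_key: str                 # the key the business actually uses
    attributes: dict                  # descriptive data about that key
    original_record_source: str
    updated_record_source: str = ""
    # Exactly one generated surrogate per business key, for internal joins.
    surrogate_id: int = field(default_factory=lambda: next(_next_surrogate))
    load_date: datetime = field(default_factory=datetime.now)
    create_date: datetime = field(default_factory=datetime.now)
    last_updated_date: datetime = field(default_factory=datetime.now)


record = MasterRecord(
    business_key="ACME-0042",
    attributes={"name": "Acme Corp", "city": "Denver"},
    original_record_source="crm",
)
print(record.business_key, record.surrogate_id)
```

Note what is absent: no parent key, no hierarchy columns, nothing that is only weakly related to the business key.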

Basic rules:
* Master data can exist (as a historical record) within the warehouse.
* Master data in the ODS is always updated in place
* Master data can be built from a historical record in the warehouse (if done properly)
* Master data is NOT a materialized view within the warehouse
* Master data is usually stored in a separate data store for performance reasons. It is tuned to be operational in nature
* Each element or attribute within master data tables is defined by Master Metadata (enterprise metadata and ontologies for further context).
* Master data is hooked to 24x7x365 services layers for bi-directional data streams (updates in, pushed update notification out to subscribers of that service).
* Master data sets are cleansed prior to load into the ODS. This data is partially auditable as a System Of Record (once it is established and is used to update source systems). However, the caveat is: the cleansing and quality routines MUST provide auditable and traceable actions on what happened to the master data on the way in. These audit logs MUST be reversible.
* Master data updates are reversible
* Master data is a single copy within the enterprise, hence the term MASTER. If copied locally across geographical regions, then it is read-only, and each local copy of the MD is force-fed (is a subscriber) to all updates.
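A few of these rules (update in place, reversible updates backed by an audit log, and pushed notifications to subscribers) can be sketched in a toy in-memory store. Everything here is hypothetical: the `MasterStore` class, its method names, and the list standing in for an EDW subscriber are assumptions for illustration, and a real implementation would sit behind a services layer with durable logging.

```python
import copy


class MasterStore:
    """Toy in-memory master data store keyed by business key."""

    def __init__(self):
        self.current = {}      # business key -> current attributes
        self.audit_log = []    # (business key, prior attributes) per change
        self.subscribers = []  # callables notified on every change

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _notify(self, key, old, new):
        for callback in self.subscribers:
            callback(key, old, new)

    def update(self, key, attributes):
        """Update in place, logging the prior image so it can be undone."""
        old = copy.deepcopy(self.current.get(key))
        self.audit_log.append((key, old))
        self.current[key] = dict(attributes)
        self._notify(key, old, self.current[key])

    def reverse_last(self):
        """Undo the most recent update using the audit log."""
        key, old = self.audit_log.pop()
        new = self.current.pop(key)
        if old is not None:
            self.current[key] = old
        self._notify(key, new, old)


edw_history = []  # stands in for an EDW subscriber recording old images
store = MasterStore()
store.subscribe(lambda key, old, new: edw_history.append((key, old)))

store.update("ACME-0042", {"city": "Denver"})
store.update("ACME-0042", {"city": "Boulder"})
store.reverse_last()
print(store.current["ACME-0042"])
```

The master store itself only ever holds the current image; the history lives with the subscriber, which is where the warehouse comes in.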

Now, MDaaS requires that Master Data be housed off-site, on hosting services, in a remote database, connected through metadata and service layers. MDaaS can be specific by client (as SalesForce.com does with the sales data it houses for its client companies).

MDaaS attributes:
* Must be off-site
* Must be accompanied by discovery services
* Must be accessible through web services
* Must be secured through authentication
* Must be encrypted when traveling over the WAN
* Must be accompanied by Master Metadata (Enterprise Metadata)
* Must allow discovery services to query metadata.
* Must be updatable through services
* Must have minimal latency even though it's over a WAN
* Must have constant quality engines running to cleanse the data on the way in.
* Must be accessible via web-browser user interface in order for the business to monitor and manually adjust master data.
* One stream of changes (the old record prior to a new update) must be pushed out to an EDW subscriber for recording purposes.

MDaaS must NOT:
* be locked away within an ERP, or CRM system unless this is the ONLY source system this enterprise is using.
* be down at any time, down-time will kill SLA's and the operations of a company.

Some interesting items, there are some general master data sets that can and should be available to paying subscribers as shared data sets, these include:
* Postal Information
* SIC Codes
* Public records, like patents, locations of buildings, maps, geo-spatial information, public financial calendars and so on, some (regulated) tax / levy data.
* Government registries of registered businesses, and their corresponding names

Any data currently reported to the public and available on the web, should be turned into MDaaS - and in some cases already has.

Types of Master Data Entities might include:
* Portfolio Master Lists
* Invoice Master Lists
* Location Master Lists
* Address Master Lists
* Accounts Master Lists
* Employee Master Lists
* Customer Data Hubs
* Product Master Lists
* Service Master Lists
* Supplier Master Lists
* Manufacturer Master Lists
* Parts Master Lists

Some of these are protected and encrypted and relegated to authentication for access, some are not.

At long last, what are the pros and cons of MDaaS?
Pros:
* Centralized Master Data can improve global quality of information
* Off-Site Master Data can reduce the costs for each customer wanting to get in to the fray.
* Cycle time to attain Master Data for your enterprise is reduced as more vendors offer MDaaS (rapid build out)
* Standardized Metadata is hashed out for Master Data Sets that are shared. For instance, a zip code is a zip code is a zip code - no matter where in the world you live.
* It's already a proven technology: some companies (e.g. Acxiom) are already providing customer master lists with addresses this way.
* Low risk for implementation success

Cons:
* Could cost a lot of money for ensuring 9x9 uptime in a global environment.
* A breach of security in your MD hosting provider may be an uphill ethical battle in local governments.
* Round-trip time over the WAN for master data updates may be outside the desired or acceptable time-frame.
* A company hosting your Master Data may use it (without your knowledge) to help other companies achieve standardized master data.
* A question of "Who owns the Master Data" comes in to play - contract negotiations should mitigate this.
* Requires your business to have Metadata already defined for the master data sets, so that context can be established (basic context) when surfing the available MD services.
* Requires your business to be Services Enabled - you don't need to be at the SOA level (yet), but you need to have web-services in play, and operational within your organization. An SOA initiative under-way will help.

Do you have anything to add to this entry? Please share it. I'd love to hear your thoughts. Again, come see me next Friday at TDWI for Master Data Implementation.

Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com


Posted November 3, 2006 5:35 AM

Have you ever thought about Master Data as a Service? Well, some companies are thinking this way. If this happens, a major paradigm shift will occur. This entry looks at MDaaS and its possibilities for changing the way we do business entirely. Who knows, maybe EII vendors could play in this space very well. After all, they are the ones with niche technology that really fits this space to begin with.

I'll blog on Master Data, the hype - the shenanigans, and the fears in my next entry. For now, realize that master data is important to the enterprise for many reasons.

Master Data means a number of things to a number of people, and I'm no exception - master data to me are literally the keys to the standardized kingdom. The cycle repeats itself in everything we build: first there's a new idea, then everyone implements their own interpretation of this new idea as a gain, a benefit; the fact that they are different seemingly gives them "an edge." Then some of these edges fail, best practices and lessons learned emerge, and then all the smart industry implementers begin to follow what really works - common ground, standardization, convergence of thoughts - then the real players emerge.

This is what is happening with Master Data Management solutions. However I think there are a couple of companies who are thought leaders in this space who are making a difference today. One (of course) is my company, Myers-Holum, Inc. Another is IBM Professional Services, another is Cap-Gemini, Intelligent Solutions, Object River, Kalido, and of course my respective peers (here on B-Eye Network) like David Loshin who write about MDM implementations.

But something caught my eye the other day, Cap-Gemini was saying that as a best practice, they take their customers' master data and house it off-site, so that the customer is not impacted by the machines, hardware, extra support for master data. They enable the master-data set with web-services for their customer, and they surround it with Enterprise Metadata (or my term: Master Metadata).

When I first saw this, I thought: no, not possible that a company would release their intellectual capital (which master data really is like golden keys to a kingdom when implemented properly), and allow it to be stored off-site. Then I started thinking about differentiation and then about standardization.

I realized very quickly that the same thing applies to master data that applies to SaaS - standardization of particular parts, geographical locations, customers, and so on. As long as the data can be "secured", treated with integrity, delivered on time, standardized and made available - why not put it out as a service? Data Warehouses as a Service never really took off, and I'm not sure they ever will (maybe one day), but MD as a service, that's different - why? It's operational data when we look at it: we deal with transaction-based, "now" information - small numbers of rows going through a web-service request.

What a gold mine! Now imagine you get common data from Dun & Bradstreet, you clean it up, and you standardize it over a web-service request, then you get common local census data (like the post-office does), and address data, and you intermix these as master data sets, then release them as MDaaS: you've got an interesting solution for the industry.

Suppose you load company profiles, SIC codes, and other public information - what happens? You can serve many different customers at the same time with the same data (master data that is standardized). It is a "virtually compressed" image of the data, because you don't have to store different copies for each implementation that is built. Voilà - costs are kept down for the customers of the service, and the master data is updated and pushed, when changed, to the customers who have signed an SLA with you.

I think Cap-Gemini takes this one step further, by offering MDaaS for ALL the data sets the customer has, in agreement to keep certain company information confidential. Of course if Cap-Gemini or any other MDaaS system is compromised there will be a lot of stir in the ethics community, and compliance will become an issue. Cap-Gemini must abide by in-sourcing, and different country rules, particularly with a global enterprise.

I think transactional Master Data as a Service is one wave of the future that I would ride. It's potentially a huge wave if it can be implemented properly, and security concerns can be addressed with encryption, compression, and proper data access. After all, the true nature of SOA is services, regardless of whether or not they are in-house or out-of-house, the true nature of Master Data is consolidation and standardization, regardless of company utilizing that information.

If you have any thoughts on why this would work, or wouldn't work - or what you think it would take to make it successful, I'd love to hear from you.

Cheers for now,
Dan L
CTO, Myers-Holum, Inc (http://www.MyersHolum.com)


Posted October 28, 2006 6:13 AM

MDM data often is dispersed across the organization. This begs the question: how can the MDM be a viable asset to the business base? Is the Master Data reused throughout the organization the way it should be? Is it defined in the right context (Master Metadata)? But technically, Master Data should be consolidated into a single global data center. MDM is not a tool, not a toy, not a process - it's a way of doing business that includes tools, best practices, people, governance, metadata and single answers.

Remember: MASTER DATA is NOT, I repeat: NOT a single version of the facts, rather it IS a single version of today's corporately accepted TRUTH. Also remember: what's true today was NOT true yesterday (last week, last year, 5 years ago...) Master Data IS auditable as a system of record, but it's one of 3 system of record definitions (see my blog on System of Record).

Ok, so where does that leave me?
If you've got an SOA effort within your business, then you should have a space in the plan for Master Data, Governance (at different levels), Master Metadata, and EII. It means your EDW is already set up as an Active Data Warehouse and is operating in Near Real Time. It means your operational systems have already placed their data exchanges under web-services (or are undergoing this conversion as a part of the SOA project).

To get to Master Data, I would strongly suggest that you have already defined enterprise wide conformed dimensions, or if you've got a 3rd normal form warehouse, that you've already defined enterprise wide accepted metadata definitions. In other words, I don't believe that you can get to Master Data Management successfully without FIRST going through a Master Metadata Management effort.

So what is Master Data Exactly?
As I've suggested in the past, Master Data IS a single consistent, consolidated version of today's truth, based on a SINGLE BUSINESS KEY, and includes descriptive data of that key post-cleansing/aggregation. Keep in mind that Master Data means the data attached to that KEY is at the SAME semantic levels of metadata definition. For instance, a Master Data table is different from a Conformed Dimension in that Master Data Tables do NOT and should not contain hierarchies within the same table set. Hierarchical data is at a different semantic grain, and usually is keyed off a different business key - therefore requires a separated and different Master Data Table.

Can the Master Data Tables be linked together?
Yes - through many-to-many relationships. Master Data Tables should NOT be dependent on parents; each master data table should be fully independent, as each stands alone in definitional nature. Master Data only requires the business key - it does NOT require the existence of the parent in order to be "created, used, and referenced." The BUSINESS may require the parent key when enforcing business rules for reporting purposes. Referential integrity should be done through secondary processing, or through EII, or through BI query sets to determine what today's business rule is, and if the data is in error (according to context).
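Here is one way this could look, sketched with Python's built-in `sqlite3` module. The table and column names (`master_car`, `master_owner`, `link_car_owner`) and the sample rows are hypothetical. The link table deliberately carries no foreign keys, and the integrity check runs afterwards as a query, which is one reading of "secondary processing".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_car   (vin TEXT PRIMARY KEY, color TEXT);
    CREATE TABLE master_owner (owner_no TEXT PRIMARY KEY, name TEXT);
    -- Link table: many-to-many, deliberately without foreign keys.
    CREATE TABLE link_car_owner (vin TEXT, owner_no TEXT,
                                 PRIMARY KEY (vin, owner_no));
""")

# A car can be created with no owner at all.
conn.execute("INSERT INTO master_car VALUES ('1M8GDM9AXKP042788', 'red')")
conn.execute("INSERT INTO master_owner VALUES ('OWN-001', 'Pat Smith')")
# A link may arrive before the master row it points to.
conn.execute("INSERT INTO link_car_owner VALUES ('VIN-NOT-LOADED-YET', 'OWN-001')")

# Secondary processing: report links whose car is missing from the master.
orphans = conn.execute("""
    SELECT l.vin, l.owner_no
    FROM link_car_owner AS l
    LEFT JOIN master_car AS c ON c.vin = l.vin
    WHERE c.vin IS NULL
""").fetchall()
print(orphans)
```

Whether an orphaned link counts as an error is then a business rule applied at query time, not a constraint that blocks the load.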

Keep in mind that Master Data CHANGES CONTEXT depending on who's using it!!! That means that Master Metadata must be defined, and metadata definitions overridden at operational levels (as long as there exists a 100% dependency on the parent metadata chain) in order to determine context of the Master Data Set.

For example: A car is a car is a car, it has a VIN number - the VIN number doesn't change even though the car's color changes, or the seats change, or the radio is swapped out. The CAR CAN EXIST WITHOUT A DRIVER / OWNER! The business rule for "shipping" of that Car cares about a parent Container, and a Parent ship to that container. The Sales-floor cares about the "car" and the prospective buyer of that car, and the OWNER cares about the car itself. The context of the CAR, and of the CAR's master data, changes depending on who's using the data and how it's viewed.
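The idea of overriding metadata definitions at an operational level while keeping the parent chain intact can be sketched with Python's `collections.ChainMap`. The definitions below are invented for illustration; only the lookup behavior (local override first, then fall back to the enterprise definition) is the point.

```python
from collections import ChainMap

# Enterprise-level (parent) definitions for the CAR master data set.
enterprise = {
    "vin": "Vehicle identification number; the business key.",
    "location": "Where the car currently is.",
}

# Operational overrides; each view still chains back to the parent.
shipping = ChainMap({"location": "Container and ship carrying the car."}, enterprise)
sales = ChainMap({"location": "Sales-floor lot holding the car."}, enterprise)

print(shipping["location"])  # overridden for shipping
print(sales["vin"])          # inherited from the enterprise definition
```

The business key's definition is never overridden here, which mirrors the VIN: the key stays put while the context around it shifts.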

Master Data MUST be consolidated in a single data center. If it's to be utilized (for performance purposes) in other systems, it must flow downstream from the central synchronization point to a local copy. It's a read-only local copy in order to avoid stove-piping in the industry again. Master Metadata must also be centralized, AND the master metadata MUST be delivered along with the Master Data in order to make sense of it at run-time or access time.

Are there problems with the Master Data centralization effort?
YES, timing issues abound, locking, synchronization, and distribution issues abound. However - networks are getting faster, enabling data to centralize better - and be utilized all over the world. Here's where EII can help - EII can begin to compress data sets, store data sets in virtual tables on the server, AND in the near future (EII vendors, are you listening?) manage Master Metadata!! EII is in a prime position to solve the problem of connecting Master Data to Master Metadata definitions, and act as the delivery point for both.

Before you tackle Master Data, please please please spend time considering the architecture carefully, and I would suggest making sure that your ADW, EII, Metadata, and web-services components are in place for at least one component of the business.

We are a world-class implementation firm who builds solutions that scale to the Petabyte ranges. Come see my MDM night school course at TDWI next week, followed by my VLDW class the next day.

Thoughts? Questions? Comments?

Thanks,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com


Posted August 19, 2006 6:49 AM

My last entry introduced the concept. Making it work is the hard part. One of the comments I received asked about handling business changes, politics, and a variety of other circumstances. In this entry we'll begin diving in to the differences a little more, and discuss a small portion of the politics that surround this. I'll be presenting some of this information at the upcoming TDWI conference in August-2006, and the Upcoming Teradata Partners conference in November.

First, one must understand (and agree) that technical and business metadata are to be separated at a conceptual level. Then we can begin to implement projects that tackle finding and joining technical metadata. There are several repositories on the market today which are capable of housing technical metadata. My two personal favorites are MetaIntegration and SuperGlue (an Informatica product). There is also MetaStage (IBM/Ascential), Platinum (CA), and a few others that stretch the definition.

Regarding successes in the industry, we've had quite a few successes over the years building technical metadata repository stores inside of our 90-day prototypes and larger projects. It is important to remember that technical metadata projects must be scoped just like data warehousing projects must be scoped. Both have to have attainable goals, project charters, buy-off, statement of work, etc... In fact, one of the best practices we employ is: to treat the technical metadata project as a technical metadata data warehousing project. This works quite well - now we begin to capture metadata from different tools, stitch it together and produce it for business users from the BI side of the house.

Having it wrapped via a DW project allows us to use our best practices to get the technical metadata in-place, and delivered via reporting tools as fast as possible, and as accurately as possible. Standards, guidelines, lessons learned are all a part of the on-going project. Companies that we've deployed some successes at include a large travel based company, two different branches of federal government, a manufacturing sector of a large government contractor, and a large television cable provider.

From a business perspective, managing the on-going requirements is the same as the EDW project - risk analysis, mitigation strategies, and best practice policies; the same as any other project. Technical metadata is the easy part, because just like data warehousing, the data "arrives" from a source system in standard formats, and can be retrieved and merged together as master-metadata ontology. The one thing that SARBOX is missing is the ability to count technical metadata as auditable, so by putting both in a warehouse we also begin to make the technical metadata live up to the promises of compliance.

Business metadata is a completely different story. Business metadata is extremely complex, can reside in a hundred different sources, and lives in many different types of documents. Business metadata can (and must) be standardized with regard to a master registry, something that defines the layers and trees of metadata within the business, along with the dependency chains. We have set up such registries at two large branches of the Federal Government, and have other clients considering the same. Registries are just the first step - they define the structure and dependency in a standard format, along with generic terminology (mostly data elements and business definitions, and how-used definitions) within an organization at multiple levels. This MUST be a part of an MDM initiative - if not, MDM will lose its way, and its meaning, quickly after productionalization.

The second step is usually another phase in the project: taking the registries, and using EII tools to locate, scan, and feed in, from unstructured and semi-structured data sources, any kind of metadata we cannot get from the technical side of the house. We augment the registries (which are already housed within the metadata-data warehouse). Again, best practices, detailed processes, risk assessments, mitigation strategies, and user communication are key to the success of such projects. The business metadata is then "standardized" according to today's rules and logged as a snapshot (this is what we found... kind of thing). Only EII tools seem to be able to gather this kind of information automatically (once set up).

Of course there are complex tools, like the Rational Suite of products, which also handle Business Metadata, but they require lots of investment in time and money to get them working to your enterprise advantage. They pull requirements and business metadata from Word Documents, and so on - however today, we find it much more useful to implement EII tools that can grab both unstructured and semi-structured information and interrogate it/interpret it for metadata context. There are some EII/Data vendors that we commonly work with, including Ipedo, Informatica, Silver-Creek Systems, and IBM, that help make our job easier.

Handling the politics requires governance policies (see my articles on Governance here on B-Eye-Network). My firm has a proven track record of successful implementations of Master Metadata (including both technical and business metadata). Let us help you get your effort under-way successfully.

Do you have an initiative under-way? Do you have lessons learned you would like to share? Please, comment on the blog - let's discuss it. Or send me your questions privately.

Thank-you for your time,
Daniel Linstedt
Daniel.Linstedt@myersHolum.com


Posted July 31, 2006 7:18 AM

My oh my, we've thrown a lot of terms into the mix. These days when you read magazine articles or look through your local friendly blogger :) you find a slew of these terms used. Maybe it's time to refresh our memory on exactly what these terms mean. Why? Because they are pertinent to MDM and MMDM (Master Metadata Management). So read on...

These definitions have been pulled from http://www.websters.com

Taxonomy:
1. The classification of organisms in an ordered system that indicates natural relationships.
2. The science, laws, or principles of classification; systematics.
3. Division into ordered groups or categories: “Scholars have been laboring to develop a taxonomy of young killers” (Aric Press).

Classification:
1. The act, process, or result of classifying.
2. A category or class.
3. Biology. The systematic grouping of organisms into categories on the basis of evolutionary or structural relationships between them; taxonomy.

Ontology:
1. The branch of metaphysics that deals with the nature of being.

Registry:
1. The act of registering; registration.
2. The registered nationality of a ship.
3. A place for registering.
-- A book for official records.
-- The place where such records are kept.

Ok, what does this have to do with Master Data or Metadata or BI for that matter?
The industry is throwing the terms around too loosely. Registries are being used for Metadata, as they should be - at the bottom level of a Taxonomy is a registry. The first step to successful enterprise Metadata Management or governance is getting a handle on the Taxonomy of the business and the metadata used within the business. This is critical to identifying and governing specific components of the MDM strategy.

Taxonomies should be utilized to manage, govern, and view (visualize) the metadata from an enterprise perspective. However, the act of building a metadata management solution, or a Master Data Management solution requires the implementation of a classification with a registry or set of registries underneath.

It is vital that we all speak the same language here and not get confused. In some of my blog entries I've discussed the possibilities of VISUALIZING data sets. Well, guess what? An EASY way to visualize huge metadata collections is to use a Tree classification as the implementation side of the taxonomy. The registries are at the leaves in the trees and provide further drill down, but have nothing to do with the visualization.
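
As a rough illustration of that split, here is a small Python sketch, assuming the classification is a plain tree of named categories and a registry is just a lookup of element names to definitions hanging off a leaf. The `Node` class, the `render` function, and the sample categories are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category in the classification tree; leaves may carry a registry."""
    name: str
    children: list["Node"] = field(default_factory=list)
    registry: dict[str, str] = field(default_factory=dict)  # element -> definition


def render(node: Node, depth: int = 0) -> list[str]:
    """Return indented lines for the tree; registry contents are counted, not drawn."""
    label = node.name
    if node.registry:
        label += f" [registry: {len(node.registry)}]"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Sample tree (made up for illustration).
tree = Node("Enterprise", [
    Node("Finance", [
        Node("Billing", registry={"invoice_total": "Billed amount for an order"}),
    ]),
    Node("Sales", registry={
        "customer_id": "Business key for a customer",
        "order_id": "Business key for an order",
    }),
])

print("\n".join(render(tree)))
```

The tree is what gets visualized; the registry at each leaf is only opened when someone drills down, so the picture stays readable no matter how many elements are registered.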

Wait a minute, I can see this for Metadata, but how does that help my MDM effort?
Well, as I've blogged before - Metadata or Master Metadata Management needs to be a part of EVERY MDM initiative out there. Why? Because it provides the CONTEXT to understanding our Master Data. How it's used, where it's used, when it should / should not be used, and what the elements mean at varying levels within the organization.

Master Metadata (from a very simplistic viewpoint) really is a data-driven taxonomy (representation) of the BUSINESS. Without tying our master data back to the business it will lose value quickly within the company, and eventually end up where all master systems end up... in the sunset on the horizon...

Questions? Thoughts? Haikus? Incantations? I'll take them all, let me know what you think...

Thanks,
Dan Linstedt
Daniel.Linstedt@MyersHolum.com


Posted June 21, 2006 2:45 PM

The question? What does the new business initiative really need to focus on?

Today's business initiatives seem to be headed in many different directions, from SOA to MDM to registries, and business processes. The issue is that when different initiatives take on different directions (rather than a consolidated view and set of drivers) they all end up at different destinations. The cost is heart-ache, silo'd solutions, and a maintenance nightmare. The bottom line is that there is convergence afoot. I've written about this over the past 5 years in my convergence articles on TDAN, B-Eye Network, and Teradata Magazine. In this entry we'll explore what business should do, and how they should approach these very different initiatives (all with a common goal).

MDM - Master Data Management
MMDM - Master Metadata Management
SOA - Service Oriented Architectures
Registries - well, registries of web-services, taxonomies and hierarchies of access points, names, and security access restrictions, I guess one could say more metadata...
BPEL - Business Process Execution Language
BPM - Business Process Management

And of course the tools of the trade:
EAI - Enterprise Application Integration
EII - Enterprise Information Integration
ETL/ELT - Extract, Transform, Load / Extract, Load, Transform
RDBMS - Relational Database Management System

Ok now that we got that out of the way... Businesses have been divesting their interests for years (at least when it comes to I.T. projects). It's time to get a little convergence back into the mix. Businesses who start separate initiatives for each of the categories above will quickly find that they end up with one or more of the following:

* Silo'd answer sets
* Silo'd information assets
* Argumentative Fiefdoms within the kingdom (arguing over who's right and who's wrong and who has the best answers).
* IT Constrained Business - disparate projects, tons of sunk cost, high maintenance overhead
* Inconsistent standards
* Missing best practices
* Holes in the I.T. security wall (all over the place)
* Lack of IT business initiative
* Poorly motivated IT employees

And so on... Executive staff should realize that the good things in life don't come cheap, or easy. After all, they've worked extremely hard to get where they are. IT is no different, and should be treated as a single operational business unit. IT's initiatives should be aligned, but in a way that allows IT to work together rather than against each other.

So you've heard this all before have you?
I'm sure you have - it's been printed in the magazines for years; lately it was called IT alignment. Let's get back to the issues, shall we?

What does this have to do with lining up: MDM, MMDM, SOA, and Registries?
Everything. Businesses today should establish an overriding IT umbrella, and that umbrella is, in fact, an SOA initiative. One way to think about it: IT is a service-based organization, and SOA is a service-based architecture from which automated services make business information, processes and descriptions available (on-demand) to the business. Let's just say SOA does for IT what JIT does for manufacturing and supply chains.

Underneath the SOA are Master Data, Master Metadata, Web Services, Registries, Auditability, EDW, OLTP, data marts, and Information Integration. All of these are the components necessary to make SOA a success. But remember, SOA is a journey not a destination - just like alignment of IT is a continuous process (it never ends).

So what do all of these have in common?
* Shared business insight
* Shared executive level sponsorship
* Shared information and data sets
* Shared asset base
* Shared security model
* Shared business processes
* Shared Metadata
* Common information dissemination model

From a project standpoint:
* Shared milestones
* Shared Risks
* Shared training
* Shared knowledge

There is also a certain dependency (order) in which these items must be executed. If one is left out of the process chain, then the business stands to suffer at the end of the day. Convergence is upon us, and real-time (active), metadata (descriptive), data sets (asset base), registries (organization of all data and metadata underneath), security and services (access layers) are all a part of the enterprise initiative to bring IT into focus.

More to come on this topic - if you have questions, I'd like to try to answer them. Feel free to ask publicly or privately.

Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc


Posted May 15, 2006 5:26 AM

What do I mean by Local versus Localized copies of Master Data? In this entry I will try to explain my definition for each. This is a short entry, and as always, comments are welcome.

I did not say "local copies" should not exist; what I did suggest is that no "localized" copies should exist. If I did actually refer to "no local copies" then shame on me, I made a mistake.

Local copies of the data
Replicated Master Lists across geographically dispersed regions for fast access times. This is fine.

Localized copies of the data
Master Data that has been "changed, altered, or is different" in some way from the source of the Master Data Set. In other words, someone not only changed the master metadata (the definition of how the data set is used in context - which is OK to override), but also changed the MASTER DATA itself, to represent "their version" of the master data. This is NOT ok. It re-creates the silos and destroys the purpose of master data: to span the horizontal enterprise.
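
One way to picture the difference is a simple comparison against the master. The sketch below is in Python and assumes master data is keyed by business key and that a regional replica is fully in sync when checked; the function names and sample records are hypothetical.

```python
def changed_keys(master: dict, copy: dict) -> list:
    """Business keys whose records differ from the master or exist on only one side."""
    return sorted(k for k in set(master) | set(copy) if master.get(k) != copy.get(k))


def classify_copy(master: dict, copy: dict) -> str:
    """'local' for a faithful replica, 'localized' if any master record was altered."""
    return "localized" if changed_keys(master, copy) else "local"


# Sample master list (made up for illustration).
master = {
    "C100": {"name": "Acme Corp", "country": "US"},
    "C200": {"name": "Globex", "country": "DE"},
}

# A regional replica: same records, just stored closer to the users.
emea_replica = {key: dict(record) for key, record in master.items()}

# A regional "version": someone rewrote a master record to suit themselves.
emea_version = {key: dict(record) for key, record in master.items()}
emea_version["C200"]["name"] = "Globex GmbH (EMEA)"

print(classify_copy(master, emea_replica))
print(classify_copy(master, emea_version))
print(changed_keys(master, emea_version))
```

Note that the check only looks at the data itself. Overriding the master metadata (how a region uses the data in context) would not show up here, which matches the distinction above.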

I hope this clears things up.
Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com


Posted April 21, 2006 8:55 AM

When Claudia Imhoff, Shawn Rogers, and I got together for lunch the other day, we discussed this notion of SoR (System of Record) - it's a very interesting take. SoR has long been held to have a single definition, and has been defined as residing in the source systems. Today, there are multiple definitions (3 to be exact) of SoR, particularly since MDM evokes new notions of what SoR means to the business, as does a compliant and auditable enterprise warehouse. In this entry I'll walk through the multiple definitions of SoR. In my MDM night course in August at TDWI (2006, San Diego) I'll be discussing many of these things.

Let's start with the three types of definitions: The first definition is the widely accepted definition, based on the origination point of data (source systems). The second definition is not so well known, but for those of you with a normalized and AUDITABLE enterprise warehouse, you'll be happy to see the second definition. The third definition is based on the ever-present "single view of today's enterprise". Many people and vendors call this: "Single Version of the Truth", but truth is subjective to EACH individual user, so there's no possible way (in the purist sense) that truth actually exists!!

SoR Definition 1: The data that exists in the source system, in other words, where the data is entered, or originates for the first time. It contains a record of entry or creation or origination for the information it houses. Hopefully this system is auditable, and compliant. The data might not be "clean, quality checked, or integrated" unless it sits in some sort of master list (master data set). Most of this data is shipped across the enterprise to other systems, and to an enterprise data warehouse for integration.

SoR Definition 2: This data resides in a NORMALIZED enterprise data warehouse, and is auditable and compliant. This data is RAW data that is INTEGRATED by business keys (see Data Vault Data Modeling). This data is NOT cleansed, altered, modified, or quality checked - but it is auditable, meaning that an auditor can trace the data back to the source system from whence it came. The integration point is in fact the master key (business key) that horizontally integrates data across the enterprise. Duplicates reside within this data set, and so does dirty data. The data that resides in the normalized warehouse is captured by type of data and rate of change. This data is often used by the business to FIND broken business processes. If you don't have source system raw data in an integrated fashion, it will be harder to spot the trends, and the broken business processes will continue to go unnoticed. I've seen this save companies millions of dollars. This is a RAW system of record: it is the only place in the entire enterprise where this integrated version of uncleansed raw data exists.
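
To make "integrated but not cleansed" concrete, here is a minimal Python sketch. It is not Data Vault modeling itself, just the core idea under simple assumptions: rows arrive from several source systems, each carries a business key (here a hypothetical `customer_no` field) and a `source` tag for traceability, and nothing is altered or dropped on the way in.

```python
from collections import defaultdict


def integrate_by_business_key(rows, key_field="customer_no"):
    """Group raw source rows by business key without cleansing or deduplicating."""
    integrated = defaultdict(list)
    for row in rows:
        integrated[row[key_field]].append(row)  # row kept exactly as received
    return dict(integrated)


def conflicting_keys(integrated, field_name="name"):
    """Business keys where the sources disagree on a field: a hint of a broken process."""
    return sorted(
        key for key, rows in integrated.items()
        if len({row[field_name] for row in rows}) > 1
    )


# Sample raw rows (made up for illustration).
raw_rows = [
    {"customer_no": "C100", "name": "Acme Corp", "source": "billing"},
    {"customer_no": "C100", "name": "ACME corp.", "source": "crm"},
    {"customer_no": "C100", "name": "Acme Corp", "source": "billing"},  # duplicate, kept
    {"customer_no": "C200", "name": "Globex", "source": "crm"},
]

warehouse = integrate_by_business_key(raw_rows)
print(len(warehouse["C100"]))
print(conflicting_keys(warehouse))
```

Because the dirty and duplicate rows are still there, side by side under one key, the disagreement between systems is visible. Cleanse first and that evidence is gone.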

SoR Definition 3: Master Data, or Conformed Dimensions - data that has been cleansed, quality checked, duplicates removed - and is seen and used by the business as "single version of today's truth". It is covered by master metadata, and is understood by the organization to have meaning. In the master data set there is only a SINGLE copy of each (customer / part / work order / supplier etc.) item. This is an SoR by business standards because it represents value to the business in eliminating duplicates and understanding how the business looks "TODAY". It's a snapshot of the current, consistent, and quality-cleansed information that feeds the rest of the source systems.
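
By contrast, a master data set collapses those rows to one record per business key. The Python sketch below assumes a deliberately naive survivorship rule (trim whitespace, then let the most common spelling win). Real cleansing and quality rules are far richer, and the function and field names are hypothetical.

```python
from collections import Counter


def build_master(rows, key_field="customer_no", value_field="name"):
    """Collapse raw rows to a single value per business key."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key_field], []).append(row[value_field].strip())
    # Survivorship rule (an assumption for this sketch): most common spelling wins.
    return {key: Counter(values).most_common(1)[0][0] for key, values in grouped.items()}


# Sample raw rows (made up for illustration).
raw_rows = [
    {"customer_no": "C100", "name": "Acme Corp", "source": "billing"},
    {"customer_no": "C100", "name": "ACME corp.", "source": "crm"},
    {"customer_no": "C100", "name": "Acme Corp ", "source": "billing"},
    {"customer_no": "C200", "name": "Globex", "source": "crm"},
]

master = build_master(raw_rows)
print(master)
```

Four raw rows become two master records: one customer, one copy, as the business sees it today.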

So, as you can see - there is value to each type of System-of-Record. So what, exactly, is a system of record itself?

I would tend to suggest that a System of record has the following characteristics:
1. It's a data origination point (the only place in the enterprise where this "vision" of data exists).
2. It begins to feed other systems, providing automatic feedback to source systems, and becoming a part of the operational LOOP in business decision making.
3. In some cases it's auditable and traceable, in other cases it's quality cleansed - but in all cases it provides business value in different formats and assists the business in DOING business on a daily basis.

If you have questions or comments I'd love to hear them, please post them below.

Thank-you
Daniel Linstedt
CTO, Myers-Holum, Inc


Posted April 15, 2006 7:12 AM

I've been blogging about MDM for a while now, and in my last entry I defined what Master Data and Master Metadata should be. By the way, both of those definitions, along with the entry, have been certified by Bill Inmon and Clive Finkelstein as the standard definitions for MDM. In my sense of adventure I decided to take a look at 10 different vendors: what they claim MDM to be, how they define it (if they define it), and how they claim to implement it. What I discovered is not that shocking. MDM SOFTWARE: BUYER BEWARE!!

WARNING: THIS ENTRY IS NOT FOR THE FAINT OF HEART, IT IS MY BIASED OPINION ON WHAT MDM REALLY IS VERSUS WHAT VENDORS CLAIM IT TO BE. I'm not saying that vendors are all wrong or bad, quite the contrary - I'm saying that while Master data vendors have good software and provide ROI, not all solutions are built to meet your needs, and the marketing hype would have you believe otherwise.

Are you a vendor? Please feel free to comment, to counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment, share your experiences - anonymously if desired. I want to see where the tools have worked for you, and where they have not.

**** DISCLAIMER ******
I have not received demonstrations from any of the vendors. All I have done is read through their web-sites, looked at what they "claim", and looked for supporting information on their sites to see if they've included people, business process, governance, compliance, integration, master data, and metadata management. Please take all this advice with a grain of salt. The purpose of this entry is to raise awareness of the customer base, and to have you ask questions of the vendors so that your business expectations can be set appropriately.

First off, MDM is a business process; what the vendors are really selling is a piece of Master Data Management: the mechanical part of integrating, cleansing, and quality checking the master data. Most do not offer Business Rules Integration, applied Data Mining, registries, web services (to meet SOA), EAI, EII, ETL/ELT, and RDBMS. I've listed, as a comment to my last entry, a slew of vendor URLs that I've looked at. For reference, I'll list them again here:

LIST OF VENDORS SELLING MDM SOFTWARE SOLUTIONS:
www.ibm.com
www.sas.com
www.DataMirror.com
www.i2.com/solutionareas/sixone/scos/mdm.cfm
www.kalido.com/products/mdm
www.hyperion.com/products/bi_platform/core_data_integration/mdm_index.cfm
www.Stratature.com
www.Gemstone.com
www.SilverCreekSystems.com
www.Purisma.com
www.Nimaya.com
www.Netics.com
www.datafoundations.com
www.ObjectRiver.net
https://www.sdn.sap.com/irj/sdn/developerareas/mdm
http://ibm.ascential.com/solutions/master_data_management.html
www.metamatrix.com/pages/solutions/master_data_management.htm

In no particular order. Now let's look at a few of their definitions to see just what they say MDM is to them:

SAS:

Creating a master data environment enables organizations to provide a single source of truth around which enterprise systems can be synchronized...Reusable business rules clean, standardize and enhance data as it moves into the master reference file so all information is accurate.

IBM:

IBM WebSphere Information Integration is the master data integration offering that delivers authoritative master data for any industry or business function…support the full master data lifecycle…IBM defines MDM as the set of disciplines, technologies, and solutions used to create and maintain consistent, complete, contextual and accurate business data for all stakeholders

Kalido:

KALIDO 8M is an enterprise-wide master data management software solution for harmonizing, storing and managing master data over time…The master data management software produces a master data warehouse from which "golden-copy" master data can be distributed to enterprise applications and business people throughout the organization

Now that I've listed a few vendors, let's talk about the pros and cons of each vendor (taking from additional information on their web sites).
SAS:
Pros:
* They have an embedded data mining capability, and are best of breed for data mining (separate module)
* Embedded ETL engine (if you purchase this module)
* GUI integration (separate module)
* Reporting Engine (separate Module)
* They handle large scale data sets

Cons: (according to industry analyst groups)
* They are not best of breed when it comes to web-services
* They are not best of breed when it comes to answering SOA
* They are not best of breed when it comes to ETL
* They do not have EAI or EII embedded (or so it seems)
* They are a code driven solution
* They are not best of breed for Enterprise Master Metadata
* They do not appear to have a pure business rules processing/management engine like ILOG or JRULES
* Their web site does not provide enough surrounding information to describe their implementation methodology regarding the "data management" and "governance" processes needed to fully implement MDM.
* They want you to believe that "ETL" is Master Data Management, they are touting old-tools under a new skin without including Data Mining, Information Quality, Business Process Management, Compliance, Governance, EAI, and EII as a part of their solution.

IBM:
Pros:
* BIG COMPANY, Lots of information (freely available) on how they handle Governance, Data Management, Metadata and Enterprise Metadata Management.
* Implementation methodology documented (in overview form) for Master Data Management
* Include EAI (websphere / MQ series) as their solution
* Include Ascential Quality Stage for data quality/scrubbing and consolidation
* Include Ascential Meta Stage for Metadata
* Include Web Services (websphere) for Production and handling of master data
* Include Registries (websphere) for production and handling of master data

Cons:
* In one diagram and implementation methodology they claim that WebSphere and EAI are the entire solution for MDM; in another diagram and description, they describe additional needs for Quality Stage and Meta Stage
* They seem to be in conflict with themselves, no clear story as to the "Complete" vision of MDM. One document leads you to five or ten others that discuss governance, EII, EAI, Reference Tables, Information Quality, and so on. Different authors have different ideas as to what MDM really is.
* They would have you believe that Data mining, and business rules are not a part of Master Data Management (until you dig deeper into their implementation methodologies).
* Their solution seems to be: buy the entire SUITE of products from IBM, then buy all of their consulting to get the full and complete MDM solution. This is good if you have TONS of money to throw at the problem, and several years (3 to 5) to solve the problem.
* No single tool seems to be the shining star for helping tackle the Master Data issues.

Kalido 8M:
Pros:
* Easy integration
* Easy Logical Data Model Changes
* Contains an ETL tool
* Contains a data modeling tool
* Vendor says: it contains Data Quality, Profiling, ETL, EAI, EII, Web Services, and Business workflows.

Cons:
* It is not best of breed (according to industry analyst groups) in: EII, ETL, EAI, Web Services, and Business Process Management (ILOG, JRULES).
* The vendor makes it seem like they are the SINGLE tool for the entire suite of MDM - yet they don't document how the rest of Data Management takes place.
* They are missing methodology definition, compliance, governance procedures, implementation best practices, definition of scalability into the 50TB+ range.

No single tool can be-all-end-all for the MDM architecture. Again, Master Data is one thing, Master Data Management encompasses master data, and data management. The entire MDM is an enterprise initiative involving people, process, compliance, governance, data, systems, and a variety of best of breed tool sets.

Buyer beware: do your homework, interview your vendors. Sponsor your MDM initiative at the executive level, apply best practices for Project Management and SEI/CMM, follow standards for including your data warehouse as a part of your Master Data Management Initiative, and proceed. Soon, I'll have an MDM Vendor Scorecard available on our web-site at: http://www.MyersHolum.com

Are you a vendor? Please feel free to comment, to counter any of my opinions with facts. I'd love to learn more about your specific solution.

Are you a customer of an MDM "tool"? Please feel free to comment, share your experiences - anonymously if desired. I want to see where the tools have worked for you, and where they have not.

Thank-you kindly for your time,
Daniel Linstedt, CTO, Myers-Holum, Inc
Daniel.Linstedt@MyersHolum.com


Posted April 14, 2006 6:12 AM