Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology, and the Data Vault Data modeling architecture. He has built expert training courses, and trained hundreds of industry professionals, and is the voice of Bill Inmon's Blog on http://www.b-eye-network.com/blogs/linstedt/.

Recently in Master Data Management Category

Well, it's happened again. IT is trying desperately to eliminate the value of the EDW from the business (at least this is what I see). Business is responding by demanding the creation of Master Data systems. There seems to be an age-old argument in the market space about the use of, definition of, and condition of: Business Keys. IT appears to be telling people to use surrogate keys and to ignore the business keys entirely. In this entry we will explore this single notion, and see what some folks have to say about it (me included!) Mind you, this is a bit of a rant; they seem to know how to "get my goat" as they say...

I start off by stating, Codd & Date designed normalized forms to have business meaning. They insisted that Business Keys be utilized in order to "make sense of" and "tag" the data sets appropriately so that relationships can be understood and maintained.

A simple link to a temporal database book houses a brief entry on the Information Principle: See it here.

I'd like to say a word or two about business keys (by the way, you can find additional information in my videos on YouTube: http://www.youtube.com/dlinstedt/).

What are business keys?
Business keys ARE the master information that unlocks context for business users. Business keys are (often) intelligent keys that have MEANING to the business. Business Keys are often alphanumeric, parts of which may be generated sequences, other parts have meaning based on position. In any event, business keys are USED by the business to locate, identify, and track information through the business life-cycle. Without them, business may not be able to "use" or apply the information properly.

What business keys are NOT:
Business keys are NOT surrogates, NOT sequences, NOT ordered numeric elements assigned based on technical insertion rates. Surrogate keys should NEVER be shown to the business, ever.... They should be used within a system (internally only) to identify rows to the machine, and provide optimal join paths, but they should NEVER appear on reports, screens, or anywhere that the business can see them.

Is there an argument around business keys versus surrogate keys?
You bet! Check out these comments:
http://www.mindfuldata.com/Modeling/modeling-pdf/DAMA%202008%20Speaker%20Notes.pdf
http://stackoverflow.com/questions/63090/surrogate-vs-naturalbusiness-keys


"Dimensions should always use a surrogate key that is generated within the warehouse. I went to a presentation a couple of years ago by Ralph Kimball (a data warehouse author), and he discussed the importance of removing the warehouse's dependency on business keys. The idea is a good one, because business keys change regularly and this will result in a long-term problem for the warehouse. However, when we discussed Slowly Changing Dimensions (especially ones that kept history), he said that we should use the business key to link them together. This went against what he had just said, so I decided that we needed to find another solution." http://expertanswercenter.techtarget.com/eac/blog/0,295203,sid63_tax298150,00.html

http://www.infoadvisors.com/Home/tabid/36/EntryID/191/Default.aspx
http://www.cerebiz.com/blog/index.php/2007/08/06/use-of-surrogate-keys-in-data-warehousing/

WHY are these people insisting that there is no value to business keys?
Because it's a very tough problem for business to overcome. Yet the business today is ASKING, Begging, pleading for answers from Master Data Sets. I maintain that you cannot build a master data system without looking at and using business keys as a central HUB of information.

Why not surrogates?
If I ask you to look up surrogate key 5, do you understand what this is? where it came from? what it is bound to? Does it give you _any_ context at all as to which system generated the number? Do you even know where to begin to find this key?

Surrogate numbers are generated today in EACH source system. In the Data Warehousing world we are responsible for integrating MULTIPLE systems at once into a single place. If we rely solely on these "surrogate keys" and completely ignore business keys as has been suggested by the links above, our EDW would never mesh or align for the business. Furthermore trying to build a master data system would be impossible. Some of these individuals I listed even went so far as to say: "ignore the business keys in your dimension entirely, because it is unruly (null) most of the time".

I say rubbish. If your business is not properly synchronizing, populating, or utilizing business keys then they are hemorrhaging money along their business process. Business keys are vital to the traceability of information ACROSS lines of business and ACROSS systems.

Take a look at what I say about business keys:
http://www.danlinstedt.com/AboutDV.php
http://www.tdan.com/view-articles/5285
http://www.b-eye-network.com/blogs/linstedt/archives/2005/09/between_inmon_a.php

Bottom line: it is imperative that business keys span the systems. If the business keys are changing, or are re-used, the business is LOSING MONEY. I will take that to the board of directors level every single time, and every time I can find broken business processes and a lack of visibility ACROSS the organization that go hand in hand with a lack of regard for business keys.

The ONLY thing one has to do is look at the businesses that want master data systems - how are you (IT) going to integrate the data sets by surrogate if the surrogates generated by source system ARE THE SAME across multiple sources? WHICH surrogate are you going to show to the business as the "MASTER KEY" for which pieces of information? It's a near impossible problem to solve, the business units will fight over the definition, and it will come down to politics as to who is right/wrong, when the business REALLY should be deciding how to fix the source of the problem: lack of a single business key.
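To make the collision concrete, here is a minimal sketch. It assumes two hypothetical source systems ("crm" and "billing"), each generating its own surrogate ids from 1, and a shared business key column called customer_no. All names and rows are invented for illustration and do not come from any real system.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems. Each system generated its own
# surrogate ids starting at 1, so the same number means different things.
crm_rows = [
    {"surrogate_id": 1, "customer_no": "CUST-0042", "name": "Acme Corp"},
    {"surrogate_id": 2, "customer_no": "CUST-0107", "name": "Globex"},
]
billing_rows = [
    {"surrogate_id": 1, "customer_no": "CUST-0107", "name": "Globex Inc."},
    {"surrogate_id": 2, "customer_no": "CUST-0042", "name": "ACME Corporation"},
]


def integrate(sources, key_column):
    """Group rows from every source system under the value of key_column."""
    hub = defaultdict(list)
    for system_name, rows in sources.items():
        for row in rows:
            hub[row[key_column]].append((system_name, row["name"]))
    return dict(hub)


sources = {"crm": crm_rows, "billing": billing_rows}

by_surrogate = integrate(sources, "surrogate_id")
by_business_key = integrate(sources, "customer_no")

# Surrogate 1 lumps two different customers together.
print(by_surrogate[1])
# The business key lines up the same customer across both systems.
print(by_business_key["CUST-0042"])
```

Grouping on the surrogate puts Acme and Globex under the same number, while grouping on the business key brings both systems' views of the same customer together.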

Auto manufacturers figured it out long ago: they use the VIN (vehicle identification number) to uniquely identify make, model, manufacturer, date of manufacture, size of engine, and so on. Unless you are doing something illegal, the VIN does not change, nor does it go away. What would happen to the world of cars if the VIN disappeared?
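As an illustration of a business key whose parts carry meaning by position, the sketch below splits a 17-character VIN into the three sections of the standard VIN layout: positions 1-3 (world manufacturer identifier), 4-9 (vehicle descriptor section) and 10-17 (vehicle identifier section). The function name split_vin and the sample VIN string are assumptions for illustration only. The sketch does not decode manufacturer tables or validate the check digit.

```python
from typing import NamedTuple


class VinParts(NamedTuple):
    wmi: str  # positions 1-3: world manufacturer identifier
    vds: str  # positions 4-9: vehicle descriptor section
    vis: str  # positions 10-17: vehicle identifier section


def split_vin(vin: str) -> VinParts:
    """Split a 17-character VIN into its three positional sections."""
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError("a VIN must be exactly 17 characters long")
    # The letters I, O and Q are not used in VINs.
    if not vin.isalnum() or any(ch in "IOQ" for ch in vin):
        raise ValueError("a VIN may only contain digits and letters other than I, O, Q")
    return VinParts(wmi=vin[:3], vds=vin[3:9], vis=vin[9:])


# Sample string used only to show the positions.
parts = split_vin("1HGCM82633A004352")
print(parts)
```

The key itself is never rewritten; the business reads meaning out of it by position, which is very different from an opaque sequence number.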

We have the SAME question in the world of counterfeit drugs... Unfortunately E-Pedigree as a country-wide solution has been lobbied down, and pushed back. Each bottle was to be labeled and identified as a unique bottle using a very specific bar code. It would have allowed the entire industry to sort out MOST of the counterfeit drug problem, and save people’s lives.

You can sit there and tell me that "Business Keys don't matter" but at the end of the day, I will say: you are losing money, and quite possibly people are dying without them.

Cheers,
Dan Linstedt
Check out WHY business keys are important, learn about the Data Vault Model.


Posted November 2, 2008 10:07 PM

In this entry we'll dive a little further into the pros and cons of master data as a service (MDaaS). We'll bring to light the different kinds of master data, and how it will evolve in the market place into a service oriented architecture, housed offsite (generically). MDaaS follows the standard curve of new ideas: individual creation (decentralization), then centralization, and then commodity based master data. I think the firm which undertakes master data as a commodity will be a hot property in the near future.

First, I'd like to discuss the definition of master data (which I've done in other blogs). From a 30,000 ft perspective, master data is operational, quality cleansed, singular in nature, and descriptive about a business key - it is in fact an operational data store for the enterprise (with a few rules twisted). By the way, come see me at TDWI in Orlando next week - I'm teaching on Master Data (how to implement within your enterprise).

Master data should not contain:
* Parent-child relationships (other than recursive hierarchies to itself).
* Degenerate dimensional information
* Junk
* Data that is unrelated or weakly related to the business key.
* multi-part business keys that represent relationships in the business world.

Master data structures should contain:
* The business key, the whole business key and nothing but the business key.
* In addition to the business key, all descriptive data ABOUT the business key (to provide the business key CURRENT CONTEXT)
* 1 to 1 relationship with a surrogate generated number to the business key.
* Load date, create date, last updated date, original record source, updated record source
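The structure described in the list above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the class names MasterRecord and MasterTable, the field names, and the in-memory dictionary are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MasterRecord:
    surrogate_id: int            # internal only, 1-to-1 with the business key
    business_key: str
    attributes: dict             # descriptive data ABOUT the business key
    load_date: datetime
    create_date: datetime
    last_updated_date: datetime
    original_record_source: str
    updated_record_source: str


class MasterTable:
    """Hypothetical in-memory master table keyed by business key."""

    def __init__(self):
        self._by_key = {}
        self._next_surrogate = 1

    def upsert(self, business_key, attributes, record_source, now=None):
        now = now or datetime.now(timezone.utc)
        record = self._by_key.get(business_key)
        if record is None:
            # First sighting of this business key: assign its one surrogate.
            record = MasterRecord(
                surrogate_id=self._next_surrogate,
                business_key=business_key,
                attributes=dict(attributes),
                load_date=now,
                create_date=now,
                last_updated_date=now,
                original_record_source=record_source,
                updated_record_source=record_source,
            )
            self._next_surrogate += 1
            self._by_key[business_key] = record
        else:
            # Known key: update in place, the surrogate never changes.
            record.attributes.update(attributes)
            record.last_updated_date = now
            record.updated_record_source = record_source
        return record


table = MasterTable()
first = table.upsert("CUST-0042", {"name": "Acme Corp"}, "crm")
other = table.upsert("CUST-0107", {"name": "Globex"}, "crm")
again = table.upsert("CUST-0042", {"city": "Denver"}, "billing")
print(again)
```

The second upsert for CUST-0042 returns the same record with the same surrogate; only the descriptive attributes, the last updated date and the updated record source change.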

Basic rules:
* Master data can exist (as a historical record) within the warehouse.
* Master data in the ODS is always updated in place
* Master data can be built from a historical record in the warehouse (if done properly)
* Master data is NOT a materialized view within the warehouse
* Master data is usually stored in a separate data store for performance reasons. It is tuned to be operational in nature
* Each element or attribute within master data tables is defined by Master Metadata (enterprise metadata and ontologies for further context).
* Master data is hooked to 24x7x365 services layers for bi-directional data streams (updates in, pushed update notification out to subscribers of that service).
* Master data sets are cleansed prior to load into the ODS. This data is partially auditable as a System Of Record (once it is established and is used to update source systems). However, the caveat is: the cleansing and quality routines MUST provide auditable and traceable actions on what happened to the master data on the way in. These audit logs MUST be reversible.
* Master data updates are reversible
* Master data is a single copy within the enterprise, hence the term MASTER. If copied locally across geographical regions, then it is read-only, and each local copy of the MD is force-fed (is a subscriber) to all updates.
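Two of the rules above, update-in-place and reversible, audited changes, can be sketched together. This is one possible approach under stated assumptions: the class name ReversibleStore, the before-image audit log, and the action labels are hypothetical, and a real system would persist the log rather than hold it in memory.

```python
import copy


class ReversibleStore:
    """Hypothetical master data store: updates in place, logs before-images."""

    def __init__(self):
        self.current = {}    # business key -> current attributes
        self.audit_log = []  # (business key, before-image, action label)

    def apply(self, business_key, changes, action):
        # Record what the row looked like BEFORE the change, and why it changed.
        before = copy.deepcopy(self.current.get(business_key))
        self.audit_log.append((business_key, before, action))
        updated = dict(before or {})
        updated.update(changes)
        self.current[business_key] = updated

    def undo_last(self):
        """Reverse the most recent logged action and return its label."""
        business_key, before, action = self.audit_log.pop()
        if before is None:
            del self.current[business_key]   # the action was the first insert
        else:
            self.current[business_key] = before
        return action


store = ReversibleStore()
store.apply("CUST-0042", {"name": "acme corp"}, "load from crm")
store.apply("CUST-0042", {"name": "Acme Corp"}, "cleanse: title-case name")
undone = store.undo_last()
print(undone, store.current["CUST-0042"])
```

Because each log entry carries the before-image, the cleansing step can be traced and rolled back without touching any other record.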

Now, MDaaS requires that Master Data be housed off-site, on hosting services, in a remote database, connected through metadata and service layers. MDaaS can be specific by client (like SalesForce.com does with its sales companies' data that it houses).

MDaaS attributes:
* Must be off-site
* Must be accompanied by discovery services
* Must be accessible through web services
* Must be secured through authentication
* Must be encrypted when traveling over the WAN
* Must be accompanied by Master Metadata (Enterprise Metadata)
* Must allow discovery services to query metadata.
* Must be updatable through services
* Must have minimal latency even though it's over a WAN
* Must have constant quality engines running to cleanse the data on the way in.
* Must be accessible via web-browser user interface in order for the business to monitor and manually adjust master data.
* One stream of changes (the old record prior to a new update) must be pushed out to an EDW subscriber for recording purposes.
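The last attribute, pushing the old record out to an EDW subscriber before it is overwritten, might look something like the sketch below. Everything here is an assumption for illustration: the class MasterDataService, the callback-based subscription, and the in-process list standing in for the EDW. A real MDaaS would do this through authenticated web services rather than Python callbacks.

```python
class MasterDataService:
    """Hypothetical service: updates in place and notifies subscribers."""

    def __init__(self):
        self._current = {}
        self._change_subscribers = []   # receive the NEW image of a record
        self._history_subscribers = []  # receive the PRIOR image (e.g. an EDW)

    def subscribe_changes(self, callback):
        self._change_subscribers.append(callback)

    def subscribe_history(self, callback):
        self._history_subscribers.append(callback)

    def update(self, business_key, attributes):
        old = self._current.get(business_key)
        new = dict(attributes)
        self._current[business_key] = new
        if old is not None:
            # Push the record as it was before this update, for recording.
            for callback in self._history_subscribers:
                callback(business_key, old)
        for callback in self._change_subscribers:
            callback(business_key, new)


edw_history = []     # stand-in for the warehouse's historical record
local_copy = {}      # stand-in for a read-only regional copy

service = MasterDataService()
service.subscribe_history(lambda key, record: edw_history.append((key, record)))
service.subscribe_changes(lambda key, record: local_copy.update({key: record}))

service.update("ZIP-80202", {"city": "Denver"})
service.update("ZIP-80202", {"city": "Denver", "state": "CO"})
print(edw_history)
print(local_copy)
```

The master store keeps only the current image, while the history subscriber accumulates the prior images that a warehouse would need to rebuild the record over time.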

MDaaS must NOT:
* be locked away within an ERP, or CRM system unless this is the ONLY source system this enterprise is using.
* be down at any time; down-time will kill SLAs and the operations of a company.

Some interesting items, there are some general master data sets that can and should be available to paying subscribers as shared data sets, these include:
* Postal Information
* SIC Codes
* Public records, like patents, locations of buildings, maps, geo-spatial information, public financial calendars and so on, some (regulated) tax / levy data.
* Government registries of registered businesses, and their corresponding names

Any data currently reported to the public and available on the web, should be turned into MDaaS - and in some cases already has.

Types of Master Data Entities might include:
* Portfolio Master Lists
* Invoice Master Lists
* Location Master Lists
* Address Master Lists
* Accounts Master Lists
* Employee Master Lists
* Customer Data Hubs
* Product Master Lists
* Service Master Lists
* Supplier Master Lists
* Manufacturer Master Lists
* Parts Master Lists

Some of these are protected and encrypted and relegated to authentication for access, some are not.

At long last, what are the pros and cons of MDaaS?
Pros:
* Centralized Master Data can improve global quality of information
* Off-Site Master Data can reduce the costs for each customer wanting to get in to the fray.
* Cycle time to attain Master Data for your enterprise is reduced as more vendors offer MDaaS (rapid build out)
* Standardized Metadata is hashed out for Master Data Sets that are shared. For instance, a zip code is a zip code is a zip code - no matter where in the world you live.
* It's already a proven technology (some companies are providing customer master lists with addresses in this light), e.g.: Acxiom
* Low risk for implementation success

Cons:
* Could cost a lot of money for ensuring 9x9 uptime in a global environment.
* A breach of security in your MD hosting provider may be an uphill ethical battle in local governments.
* Round-trip time over the WAN for master data updates may be outside the desired or acceptable time-frame.
* A company hosting your Master Data may use it (without your knowledge) to help other companies achieve standardized master data.
* A question of "Who owns the Master Data" comes in to play - contract negotiations should mitigate this.
* Requires your business to have Metadata already defined for the master data sets, so that context can be established (basic context) when surfing the available MD services.
* Requires your business to be Services Enabled - you don't need to be at the SOA level (yet), but you need to have web-services in play, and operational within your organization. An SOA initiative under-way will help.

Do you have anything to add to this entry? Please share it. I'd love to hear your thoughts. Again, come see me next Friday at TDWI for Master Data Implementation.

Cheers,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com


Posted November 3, 2006 5:35 AM

Have you ever thought about Master Data as a Service? Well, some companies are thinking this way. If this happens, a major paradigm shift will occur. This entry looks at MDaaS - and its possibilities for changing the way we do business entirely. Who knows, maybe EII vendors could play in this space very very well. After all, they are the ones with niche technology that really fits this space to begin with.

I'll blog on Master Data, the hype - the shenanigans, and the fears in my next entry. For now, realize that master data is important to the enterprise for many reasons.

Master Data means a number of things to a number of people, and I'm no exception - master data to me are literally the keys to the standardized kingdom. The cycle repeats itself in everything we build: first there's a new idea, then everyone implements their interpretation of this new idea as a gain, a benefit; the fact that they are different seemingly gives them "an edge." Then some of these edges fail, best practices and lessons learned emerge, and then all the smart industry implementers begin to follow what really works - common ground, standardization, convergence of thoughts - then the real players emerge.

This is what is happening with Master Data Management solutions. However I think there are a couple of companies who are thought leaders in this space who are making a difference today. One (of course) is my company, Myers-Holum, Inc. Another is IBM Professional Services, another is Cap-Gemini, Intelligent Solutions, Object River, Kalido, and of course my respective peers (here on B-Eye Network) like David Loshin who write about MDM implementations.

But something caught my eye the other day, Cap-Gemini was saying that as a best practice, they take their customers' master data and house it off-site, so that the customer is not impacted by the machines, hardware, extra support for master data. They enable the master-data set with web-services for their customer, and they surround it with Enterprise Metadata (or my term: Master Metadata).

When I first saw this, I thought: no, not possible that a company would release their intellectual capital (which master data really is like golden keys to a kingdom when implemented properly), and allow it to be stored off-site. Then I started thinking about differentiation and then about standardization.

I realized very quickly the same thing applies to master data that applies to SaaS - standardization of particular parts, geographical locations, customers, and so on - as long as the data can be "secured", treated with integrity, delivered on time, standardized and made available - why not put it out as a service? Data Warehouses as a Service never really took off, and I'm not sure it ever will (maybe one day), but MD as a service, that's different - why? It's operational data when we look at it: we deal with transaction-based information, "now" information - small numbers of rows going through a web-service request.

What a gold mine! Now imagine you get common data from Dun & Bradstreet, you clean it up, and you standardize it over a web-service request, then you get common local census data (like the post-office does), and address data, and you intermix these as master data sets, then release them as MDaaS. You've got an interesting solution for the industry.

Suppose you load company profiles, SIC codes, and other public information - what happens? You can serve many different customers at the same time with the same data (master data that is standardized). It is a "virtually compressed" image of the data, because you don't have to store different copies for each implementation that is built. Voilà - costs stay down for the customers of the service, and the master data is updated and pushed, when changed, to the customers who have signed an SLA with you.

I think Cap-Gemini takes this one step further, by offering MDaaS for ALL the data sets the customer has, in agreement to keep certain company information confidential. Of course if Cap-Gemini or any other MDaaS system is compromised there will be a lot of stir in the ethics community, and compliance will become an issue. Cap-Gemini must abide by in-sourcing, and different country rules, particularly with a global enterprise.

I think transactional Master Data as a Service is one wave of the future that I would ride. It's potentially a huge wave if it can be implemented properly, and security concerns can be addressed with encryption, compression, and proper data access. After all, the true nature of SOA is services, regardless of whether or not they are in-house or out-of-house, the true nature of Master Data is consolidation and standardization, regardless of company utilizing that information.

If you have any thoughts on why this would work, or wouldn't work - or what you think it would take to make it successful, I'd love to hear from you.

Cheers for now,
Dan L
CTO, Myers-Holum, Inc (http://www.MyersHolum.com)


Posted October 28, 2006 6:13 AM

MDM data often is dispersed across the organization. This begs the question: how can the MDM be a viable asset to the business base? Is the Master Data reused throughout the organization the way it should be? Is it defined in the right context (Master Metadata)? But technically, Master Data should be consolidated into a single global data center. MDM is not a tool, not a toy, not a process - it's a way of doing business that includes tools, best practices, people, governance, metadata and single answers.

Remember: MASTER DATA is NOT, I repeat: NOT a single version of the facts, rather it IS a single version of today's corporately accepted TRUTH. Also remember: what's true today was NOT true yesterday (last week, last year, 5 years ago...) Master Data IS auditable as a system of record, but it's one of 3 system of record definitions (see my blog on System of Record).

Ok, so where does that leave me?
If you've got an SOA effort within your business, then you should have a space in the plan for Master Data, Governance (at different levels), Master Metadata, EII. It means your EDW is already setup as an Active Data Warehouse and is operating in Near Real Time. It means your operational systems have already placed their data exchanges under web-services (or are undergoing this conversion as a part of the SOA project).

To get to Master Data, I would strongly suggest that you have already defined enterprise wide conformed dimensions, or if you've got a 3rd normal form warehouse, that you've already defined enterprise wide accepted metadata definitions. In other words, I don't believe that you can get to Master Data Management successfully without FIRST going through a Master Metadata Management effort.

So what is Master Data Exactly?
As I've suggested in the past, Master Data IS a single consistent, consolidated version of today's truth, based on a SINGLE BUSINESS KEY, and includes descriptive data of that key post-cleansing/aggregation. Keep in mind that Master Data means the data attached to that KEY is at the SAME semantic levels of metadata definition. For instance, a Master Data table is different from a Conformed Dimension in that Master Data Tables do NOT and should not contain hierarchies within the same table set. Hierarchical data is at a different semantic grain, and usually is keyed off a different business key - therefore requires a separated and different Master Data Table.

Can the Master Data Tables be linked together?
Yes - through many-to-many relationships. Master Data Tables should NOT be dependent on parents; each master data table should be fully independent, as they stand alone in definitional nature. Master Data only requires the business key - it does NOT require the existence of the parent in order to be "created, used, and referenced." The BUSINESS may require the parent key when enforcing business rules for reporting purposes. Referential integrity should be done through secondary processing, or through EII, or through BI query sets to determine what today's business rule is, and if the data is in error (according to context).

Keep in mind that Master Data CHANGES CONTEXT depending on who's using it!!! That means that Master Metadata must be defined, and metadata definitions over-ridden at operational levels (as long as there exists a 100% dependency on the parent metadata chain) in order to determine context of the Master Data Set.

For example: A car is a car is a car, it has a VIN number - the VIN number doesn't change even though the car's color changes, or the seats change, or the radio is swapped out. The CAR CAN EXIST WITHOUT A DRIVER / OWNER! The business rule for "shipping" of that Car cares about a parent Container, and a Parent ship to that container. The Sales-floor cares about the "car" and the prospective buyer of that car, and the OWNER cares about the car itself. The context of the CAR, and of the CAR's master data, changes depending on who's using the data and how it's viewed.
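A minimal sketch of the car example, assuming two independent master tables (cars and containers), a many-to-many link table between them, and referential checks run as secondary processing rather than enforced at load time. The table contents, key values and function names are hypothetical.

```python
# Two independent master tables, each keyed only by its own business key.
car_master = {
    "VIN-AAA": {"color": "red"},
    "VIN-BBB": {"color": "blue"},   # exists with no container, owner or driver
}
container_master = {
    "CONT-01": {"size_ft": 40},
}

# Many-to-many link table: (car business key, container business key).
car_container_link = {("VIN-AAA", "CONT-01")}


def cars_failing_shipping_rule(cars, links):
    """Shipping's rule for today: every car must be linked to a container."""
    linked_cars = {car_key for car_key, _ in links}
    return sorted(key for key in cars if key not in linked_cars)


def dangling_links(links, cars, containers):
    """Links that point at a key missing from either master table."""
    return sorted(
        (car_key, container_key)
        for car_key, container_key in links
        if car_key not in cars or container_key not in containers
    )


print(cars_failing_shipping_rule(car_master, car_container_link))
print(dangling_links(car_container_link, car_master, container_master))
```

VIN-BBB is a perfectly valid master record for the sales floor, yet it is flagged under shipping's rule. The data did not change, only the context in which it is judged.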

Master Data MUST be consolidated in a single data center. If it's to be utilized (for performance purposes) in other systems, it must flow downstream from the central synchronization point to a local copy. It's a read-only local copy in order to avoid stove-piping in the industry again. Master Metadata must also be centralized, AND the master metadata MUST be delivered along with the Master Data in order to make sense of it at run-time or access time.

Are there problems with the Master Data centralization effort?
YES, timing issues abound, locking, synchronization, and distribution issues abound. However - networks are getting faster, enabling data to centralize better - and be utilized all over the world. Here's where EII can help - EII can begin to compress data sets, store data sets in virtual tables on the server, AND in the near future (EII vendors, are you listening?) manage Master Metadata!! EII is in a prime position to solve the problem of connecting Master Data to Master Metadata definitions, and act as the delivery point for both.

Before you tackle Master Data, please please please spend time considering the architecture carefully, and I would suggest that your ADW, EII, Metadata, and web-services components are in place for at least one component of the business.

We are a world-class implementation firm who builds solutions that scale to the Petabyte ranges. Come see my MDM night school course at TDWI next week, followed by my VLDW class the next day.

Thoughts? Questions? Comments?

Thanks,
Dan Linstedt
CTO, Myers-Holum, Inc
http://www.MyersHolum.com


Posted August 19, 2006 6:49 AM

My last entry introduced the concept. Making it work is the hard part. One of the comments I received asked about handling business changes, politics, and a variety of other circumstances. In this entry we'll begin diving in to the differences a little more, and discuss a small portion of the politics that surround this. I'll be presenting some of this information at the upcoming TDWI conference in August-2006, and the Upcoming Teradata Partners conference in November.

First, one must understand (and agree) that technical and business metadata are to be separated from a conceptual level. Then we can begin to implement projects that tackle finding and conjoining technical metadata. There are several repositories on the market today which are capable of housing technical metadata. My two personal favorites are: MetaIntegration, and SuperGlue (Informatica Product). There is also MetaStage (IBM/Ascential), Platinum (CA), and a few others that stretch the definition.

Regarding successes in the industry, we've had quite a few successes over the years building technical metadata repository stores inside of our 90-day prototypes and larger projects. It is important to remember that technical metadata projects must be scoped just like data warehousing projects must be scoped. Both have to have attainable goals, project charters, buy-off, statement of work, etc... In fact, one of the best practices we employ is: to treat the technical metadata project as a technical metadata data warehousing project. This works quite well - now we begin to capture metadata from different tools, stitch it together and produce it for business users from the BI side of the house.

Having it wrapped via a DW project allows us to use our best practices to get the technical metadata in-place, and delivered via reporting tools as fast as possible, and as accurately as possible. Standards, guidelines, lessons learned are all a part of the on-going project. Companies that we've deployed some successes at include a large travel based company, two different branches of federal government, a manufacturing sector of a large government contractor, and a large television cable provider.

From a business perspective, managing the on-going requirements is the same as the EDW project - risk analysis, mitigation strategies, and best practice policies; the same as any other project. Technical metadata is the easy part, because just like data warehousing, the data "arrives" from a source system in standard formats, and can be retrieved and merged together as master-metadata ontology. The one thing that SARBOX is missing is the ability to count technical metadata as auditable, so by putting both in a warehouse we also begin to make the technical metadata live up to the promises of compliance.

Business metadata is a completely different story. Business metadata is extremely complex, can reside in a hundred different sources, and lives in many different types of documents. Business metadata can (and must) be standardized with regard to a master registry, something that defines the layers and trees of metadata within the business, along with the dependency chains. We have set up such registries at two large branches of the Federal Government, and have other clients considering the same. Registries are just the first step - they define the structure and dependency in a standard format, along with generic terminology (mostly data elements and business definitions, and how-used definitions) within an organization at multiple levels. This MUST be a part of an MDM initiative - if not, MDM will lose its way, and its meaning, quickly after productionalization.

The second step is usually another phase in the project, it is taking the registries, and using EII tools to locate, scan, feed, from unstructured and semi-structured data sources any kind of metadata we cannot get from the technical side of the house. We augment the registries (which are already housed within the metadata-data warehouse). Again, best practices, detailed processes, risk assessments, mitigation strategies, and user communication are key to the success of such projects. The business metadata is then "standardized" according to today's rules and logged as a snapshot (this is what we found... kind of thing). Only EII tools seem to be able to gather this kind of information automatically (once setup).

Of course there are complex tools, like Rational Suite of products which also handle Business Metadata, but they require lots of investment in time and money to get them working to your enterprise advantage. They pull requirements and business metadata from Word Documents, and so on – however today, we find it much more useful to implement EII tools that can grab both unstructured and semi-structured information and interrogate it/interpret it for metadata context. There are some EII/Data vendors that we commonly work with including: Ipedo, Informatica, Silver-Creek Systems, and IBM that help make our job easier.

Handling the politics requires governance policies (see my articles on Governance here on B-Eye-Network). My firm has a proven track record of successful implementations of Master Metadata (including both technical and business metadata). Let us help you get your effort under-way successfully.

Do you have an initiative under-way? Do you have lessons learned you would like to share? Please, comment on the blog - let's discuss it. Or send me your questions privately.

Thank-you for your time,
Daniel Linstedt
Daniel.Linstedt@myersHolum.com


Posted July 31, 2006 7:18 AM