Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt,

About the author

Cofounder of Genesee Academy, RapidACE, and, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata.  He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology, and the Data Vault Data modeling architecture. He has built expert training courses, and trained hundreds of industry professionals, and is the voice of Bill Inmons' Blog on

In this entry we'll dive a little further into the pros and cons of master data as a service (MDaaS). We'll bring to light the different kinds of master data, and how it will evolve in the marketplace into a service-oriented architecture, housed off-site (generically). MDaaS follows the standard curve of new ideas: individual creation (decentralization), then centralization, and then commodity-based master data. I think the firm which undertakes master data as a commodity will be a hot property in the near future.

First, I'd like to discuss the definition of master data (which I've done in other blogs). From a 30,000 ft perspective, master data is operational, quality cleansed, singular in nature, and descriptive about a business key - it is in fact an operational data store for the enterprise (with a few rules twisted). By the way, come see me at TDWI in Orlando next week - I'm teaching on Master Data (how to implement within your enterprise).

Master data should not contain:
* Parent-child relationships (other than recursive hierarchies to itself).
* Degenerate dimensional information
* Junk
* Data that is unrelated or weakly related to the business key.
* Multi-part business keys that represent relationships in the business world.

Master data structures should contain:
* The business key, the whole business key and nothing but the business key.
* In addition to the business key, all descriptive data ABOUT the business key (to provide the business key CURRENT CONTEXT)
* 1 to 1 relationship with a surrogate generated number to the business key.
* Load date, create date, last updated date, original record source, updated record source
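The two lists above can be sketched as a record layout. This is only a minimal illustration, assuming an in-memory table; the names `MasterRecord`, `MasterTable` and `insert` are hypothetical and not taken from any product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass
class MasterRecord:
    surrogate_id: int             # generated number, 1 to 1 with business_key
    business_key: str             # single-part key only, no relationship keys
    attributes: dict              # descriptive data about the key (current context)
    original_record_source: str
    updated_record_source: str
    load_date: datetime
    create_date: datetime
    last_updated_date: datetime


class MasterTable:
    """In-memory stand-in for one master data table (hypothetical)."""

    def __init__(self):
        self._rows = {}            # business_key -> MasterRecord
        self._sequence = count(1)  # stands in for a database sequence

    def insert(self, business_key, attributes, record_source):
        # Refuse a second row for the same key so the key/surrogate
        # relationship stays 1 to 1.
        if business_key in self._rows:
            raise ValueError(f"business key already mastered: {business_key}")
        now = datetime.now(timezone.utc)
        row = MasterRecord(
            surrogate_id=next(self._sequence),
            business_key=business_key,
            attributes=dict(attributes),
            original_record_source=record_source,
            updated_record_source=record_source,
            load_date=now,
            create_date=now,
            last_updated_date=now,
        )
        self._rows[business_key] = row
        return row
```

Parent-child links and multi-part keys have no place in this layout on purpose: the row holds one business key and the facts about it, nothing else.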

Basic rules:
* Master data can exist (as a historical record) within the warehouse.
* Master data in the ODS is always updated in place
* Master data can be built from a historical record in the warehouse (if done properly)
* Master data is NOT a materialized view within the warehouse
* Master data is usually stored in a separate data store for performance reasons. It is tuned to be operational in nature
* Each element or attribute within master data tables is defined by Master Metadata (enterprise metadata and ontologies for further context).
* Master data is hooked to 24x7x365 services layers for bi-directional data streams (updates in, pushed update notification out to subscribers of that service).
* Master data sets are cleansed prior to load into the ODS. This data is partially auditable as a System Of Record (once it is established and is used to update source systems). However, the caveat is: the cleansing and quality routines MUST provide auditable and traceable actions on what happened to the master data on the way in. These audit logs MUST be reversible.
* Master data updates are reversible
* Master data is a single copy within the enterprise, hence the term MASTER. If copied locally across geographical regions, then it is read-only, and each local copy of the MD is force-fed (is a subscriber) to all updates.
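Two of the rules above, update in place and reversible updates backed by an audit trail, can be sketched together. This is a rough illustration under my own assumptions: `AuditedStore`, `upsert` and `undo_last` are hypothetical names, and a real ODS would keep the log in durable storage.

```python
from copy import deepcopy


class AuditedStore:
    """Single current copy per business key, plus an append-only audit log."""

    def __init__(self):
        self.current = {}      # business_key -> attributes (updated in place)
        self.audit_log = []    # (action, business_key, image_before_action, source)
        self._undo_stack = []  # before-images of updates not yet reversed

    def upsert(self, business_key, attributes, source):
        before = deepcopy(self.current.get(business_key))  # None if key is new
        self.audit_log.append(("update", business_key, before, source))
        self._undo_stack.append((business_key, before))
        self.current[business_key] = dict(attributes)      # overwrite in place

    def undo_last(self, source):
        business_key, before = self._undo_stack.pop()
        # The reversal is itself logged, so the trail is never rewritten.
        replaced = deepcopy(self.current[business_key])
        self.audit_log.append(("reversal", business_key, replaced, source))
        if before is None:
            del self.current[business_key]
        else:
            self.current[business_key] = before
        return business_key
```

The design choice here is that undoing an update appends a reversal entry rather than deleting the original one, which keeps the log usable as evidence of what happened to the data.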

Now, MDaaS requires that Master Data be housed off-site, on hosting services, in a remote database, connected through metadata and service layers. MDaaS can be specific by client (like does with its sales companies data it houses).

MDaaS attributes:
* Must be off-site
* Must be accompanied by discovery services
* Must be accessible through web services
* Must be secured through authentication
* Must be encrypted when traveling over the WAN
* Must be accompanied by Master Metadata (Enterprise Metadata)
* Must allow discovery services to query metadata.
* Must be updatable through services
* Must have minimal latency even though it's over a WAN
* Must have constant quality engines running to cleanse the data on the way in.
* Must be accessible via web-browser user interface in order for the business to monitor and manually adjust master data.
* One stream of changes (the old record prior to a new update) must be pushed out to an EDW subscriber for recording purposes.
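A few of these attributes (authenticated updates, pushed notifications, and the old-record stream to an EDW subscriber) can be sketched in a single process. This is only a toy under stated assumptions: `MasterDataService` and its methods are hypothetical, plain callbacks stand in for web services, and encryption, discovery and the browser UI are not modeled.

```python
import hmac


class MasterDataService:
    """Toy stand-in for a hosted master data service (hypothetical API)."""

    def __init__(self, api_token):
        self._api_token = api_token
        self._rows = {}                 # business_key -> current attributes
        self._update_subscribers = []   # receive the new image of a record
        self._history_subscribers = []  # e.g. the EDW; receive the old image

    def subscribe_updates(self, callback):
        self._update_subscribers.append(callback)

    def subscribe_history(self, callback):
        self._history_subscribers.append(callback)

    def update(self, token, business_key, attributes):
        # Constant-time comparison of the caller's token with the expected one.
        if not hmac.compare_digest(token.encode(), self._api_token.encode()):
            raise PermissionError("invalid token")
        old = self._rows.get(business_key)
        self._rows[business_key] = dict(attributes)
        if old is not None:
            # Push the record as it was prior to this update.
            for callback in self._history_subscribers:
                callback(business_key, old)
        for callback in self._update_subscribers:
            callback(business_key, dict(attributes))
```

A subscriber is just a function taking the business key and a record image, so an EDW loader and a read-only regional copy would register on different streams of the same service.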

MDaaS must NOT:
* be locked away within an ERP, or CRM system unless this is the ONLY source system this enterprise is using.
* be down at any time; downtime will kill SLAs and the operations of a company.

Some interesting items: there are general master data sets that can and should be available to paying subscribers as shared data sets. These include:
* Postal Information
* SIC Codes
* Public records, like patents, locations of buildings, maps, geo-spatial information, public financial calendars and so on, some (regulated) tax / levy data.
* Government registries of registered businesses, and their corresponding names

Any data currently reported to the public and available on the web, should be turned into MDaaS - and in some cases already has.

Types of Master Data Entities might include:
* Portfolio Master Lists
* Invoice Master Lists
* Location Master Lists
* Address Master Lists
* Accounts Master Lists
* Employee Master Lists
* Customer Data Hubs
* Product Master Lists
* Service Master Lists
* Supplier Master Lists
* Manufacturer Master Lists
* Parts Master Lists

Some of these are protected and encrypted and relegated to authentication for access, some are not.

At long last, what are the pros and cons of MDaaS?

Pros:
* Centralized Master Data can improve global quality of information.
* Off-site Master Data can reduce the costs for each customer wanting to get into the fray.
* Cycle time to attain Master Data for your enterprise is reduced as more vendors offer MDaaS (rapid build-out).
* Standardized Metadata is hashed out for Master Data Sets that are shared. For instance, a zip code is a zip code is a zip code - no matter where in the world you live.
* It's already a proven technology: some companies, e.g. Acxiom, already provide customer master lists with addresses this way.
* Low risk to implementation success.

Cons:
* Could cost a lot of money to ensure 9x9 uptime in a global environment.
* A breach of security at your MD hosting provider may mean an uphill ethical battle with local governments.
* Round-trip time over the WAN for master data updates may be outside the desired or acceptable time-frame.
* A company hosting your Master Data may use it (without your knowledge) to help other companies achieve standardized master data.
* A question of "Who owns the Master Data" comes into play - contract negotiations should mitigate this.
* Requires your business to have Metadata already defined for the master data sets, so that context can be established (basic context) when surfing the available MD services.
* Requires your business to be Services Enabled - you don't need to be at the SOA level (yet), but you need to have web services in play and operational within your organization. An SOA initiative under way will help.

Do you have anything to add to this entry? Please share it. I'd love to hear your thoughts. Again, come see me next Friday at TDWI for Master Data Implementation.

Dan Linstedt
CTO, Myers-Holum, Inc

Posted November 3, 2006 5:35 AM


"Any data currently reported to the public and available on the web, should be turned into MDaaS."

Overreaching just a wee bit?

Yes, I probably did overreach. Sorry about that. Must have been the fever that I've been fighting the past couple days.

My apologies.

Dan Linstedt

Why is it reaching? Is this not the integrated view of the enterprise? I guess it is possible for a company like Amazon to have more than one master list of products that is publicly available, but that would have to be different divisions and operated independently. Their locations would most certainly have to conform. So, I guess the overreaching is the SOA implications?

You did use the word "should" and not "must". I agree since the list of MDM entities that you have identified should be published and accessible over secure, reliable, documented, and enterprise accepted. Thus, if they are published as MDaaS then the enterprise has produced an integrated infrastructure reflecting the business. Isn't this the utopia that so many are pining to achieve?


This is a huge idea. We're driving enterprise data management in a big way and MDaaS looks like it could be a great way of getting the data to the right place at the right time.
