
Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author >

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

On Monday, March 15, I conducted a full-day master data management tutorial at the Enterprise Data World conference. As a forum for discussing pragmatic MDM best practices, one section of the day was set aside for a panel discussion among representatives from four vendors:

  • Dan Soceanu from DataFlux
  • Marty Moseley from Initiate Systems - An IBM Company
  • Ravi Shankar from Informatica (formerly Siperian)
  • Jim Walker from Talend

I posed the question to all four: What defines data as "master data"? The first round of answers focused on the standard response: data concepts that "are important" to the business and are shared by two or more applications. My reaction was that this is not a practical guide, so I rephrased the question: What can the people in the audience do when they get back from the conference to start identifying data entities as master data?

Again, I did not get the answers I was looking for: all four suggested that the task was not one that could be done at your desk, that it required knowledge of the business, and that subject matter experts had to be consulted.

All true, but again, not executable, so I reframed the question once more: knowing that there is bound to be variation, replication, duplication, redundancy, and differences in semantics, what is a process for reviewing the data to decide which data elements of which data entities belong in a unified master view?

At that point the answer became a little clearer: you can't tell unless you understand what each data element is, how it is used, what its definition is, how many applications use it, and in what types of usage scenarios. In addition, you need oversight of the process for analyzing the data, capturing and sharing the results, and having all that information validated by subject matter experts.

As moderator, I responded by summarizing: "In order to determine what data is master data, you need to analyze the data, document all the information about the data, and have policies for overseeing that process. That sounds like data profiling, metadata management, and data governance." (nods all around)

But it has to be more than that; there has to be a more operationalized method that results in a clear determination of which data elements of which data entities are to be mastered.
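To make the panel's answer a bit more concrete, here is a minimal sketch of what that element-level review might look like in code. All names and data here are hypothetical illustrations, not from any real system: for each data element you capture the metadata the panelists named (definition, owning applications, usage scenarios), then apply the "shared by two or more applications" test as a first-pass filter for master data candidates.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: record, per data element, the facts the panel said
# you need before you can judge it, then flag elements shared by two or
# more applications as master data candidates.

@dataclass
class DataElement:
    name: str
    definition: str
    applications: set = field(default_factory=set)   # apps that read/write it
    usage_scenarios: list = field(default_factory=list)

def master_data_candidates(elements):
    """First-pass test: an element used by 2+ applications is a candidate."""
    return [e for e in elements if len(e.applications) >= 2]

# Illustrative inventory (invented for this example)
elements = [
    DataElement("customer_name", "Legal name of the customer",
                {"CRM", "Billing", "Fulfillment"}, ["invoicing", "shipping"]),
    DataElement("session_id", "Web session identifier",
                {"WebApp"}, ["clickstream analytics"]),
]

for e in master_data_candidates(elements):
    print(e.name)  # only customer_name is shared across applications
```

The point of the sketch is only that the "importance" question becomes answerable once the metadata is captured somewhere queryable; validation by subject matter experts still has to happen outside the code.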

Posted March 17, 2010 1:01 PM
Permalink | 3 Comments |


Hi David -

It was a good panel for sure - thanks for inviting me to join you!

Here's the gist of what I think you're asking: "What are the principles and primary tests you apply to any set of data that will help you determine which data are master data to that enterprise and which are not?" It doesn't work to say "customer data are master data" because that is not true in all cases. That's why it must be an objective set of evaluation criteria, or principles, that drive the discovery process.

Now, I think you're on a good track, but I would suggest that people start with a top-down assessment of master data rather than a bottom-up profiling exercise. I suggest starting with what we've typically called "subject areas" and not by looking at profiling thousands of data elements. My experience is that most leaders in organizations can quickly agree on what kinds of data are master data and which aren't by the following criteria:

1) Is this subject area a building block of critical business transactions? By "critical business transactions" I mean those that significantly impact revenue generation, customer satisfaction, profitability, safety, efficiency, or whatever else is critical to the success of the organization.

2) Are the data in these subject areas created and managed in multiple systems, leading to possible redundancies and discrepancies in the structure and meaning of those data? If so, then there may be value in managing those data better via MDM.

3) If those data are incorrect, inaccurate, incomplete, mismanaged, etc., do they have the potential to do harm to the organization? In other words, how much do these data *really* matter? This helps prioritize which should be managed first.

I've found that most organizations can easily agree which of those subject areas fall into the category of "master data."
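The three tests above can be sketched as a simple assessment, just to show how mechanical the top-down pass can be once leaders have answered the questions. The subject areas, answers, and harm scores below are invented for illustration; nothing here comes from a real assessment.

```python
# Hypothetical sketch of the three top-down criteria: a subject area
# qualifies as master data when all three hold, and criterion 3 (harm
# potential) drives the priority order.

CRITERIA = ("critical_transactions", "multi_system", "harm_if_wrong")

def is_master_data(answers):
    """All three criteria must hold for a subject area to qualify."""
    return all(answers[c] for c in CRITERIA)

def prioritize(assessments):
    """Rank qualifying subject areas by harm potential, highest first."""
    qualifying = [s for s, a in assessments.items() if is_master_data(a)]
    return sorted(qualifying,
                  key=lambda s: assessments[s]["harm_score"], reverse=True)

# Illustrative answers a leadership workshop might produce
assessments = {
    "Customer":    {"critical_transactions": True,  "multi_system": True,
                    "harm_if_wrong": True,  "harm_score": 9},
    "Product":     {"critical_transactions": True,  "multi_system": True,
                    "harm_if_wrong": True,  "harm_score": 7},
    "Web session": {"critical_transactions": False, "multi_system": False,
                    "harm_if_wrong": False, "harm_score": 1},
}

print(prioritize(assessments))  # ['Customer', 'Product']
```

The code adds nothing the prose doesn't already say; it just makes the point that the criteria are crisp enough to be applied consistently across subject areas.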

Next, I think you construct lightweight business cases for those business issues, risks, and missed opportunities that matter most to your organization, and map those to the master data they require or use.

Once you do that, you'll have a great idea of the most important "domains" of master data, and which matter most to your enterprise. Then you can discuss which attributes or kinds of data within those subject areas are most meaningful or critical to those business cases. Then (finally) I think it's time to start profiling data. And your profiling can be much more focused and the results easier to analyze, because you're not looking at every value within every column in every table in every application.
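As a rough illustration of that focused profiling step, here is a minimal sketch that computes fill rate and distinct-value counts only for the attributes already tied to a business case. The records and attribute names are invented for the example; a real exercise would run against actual application data.

```python
# Hypothetical sketch: profile only the critical attributes identified by
# the business cases, instead of every column in every table.

records = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": None},
    {"customer_id": "C2", "email": "b@example.com"},  # repeated customer_id
]

CRITICAL_ATTRIBUTES = ["customer_id", "email"]  # from the business cases

def profile(records, attributes):
    """Per attribute: fill rate and count of distinct non-null values."""
    stats = {}
    for attr in attributes:
        values = [r.get(attr) for r in records]
        filled = [v for v in values if v is not None]
        stats[attr] = {
            "fill_rate": len(filled) / len(records),
            "distinct": len(set(filled)),
        }
    return stats

print(profile(records, CRITICAL_ATTRIBUTES))
```

Even this crude pass surfaces the kinds of findings the profiling step is after, such as missing values and repeated identifiers, while keeping the scope limited to attributes that matter.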

Hope this makes sense!


Excellent post - well done for putting the vendors "on the spot".

In my opinion organisations need an "Enterprise Wide Data Model" mapping out where all their data is - but most critically their "Master Data".

I have posted an item on my blog discussing this further.

Loraine Lawson called this blog entry out along with Marty's response, and I dropped a response there:

