Business Intelligence Network business intelligence resources

Blog: David Loshin


April 25, 2008

Whew...Wrapping up my MDM Book

Hey, sorry it has been a while since my last blog entry. I have been focused on my book on master data management (MDM), which thankfully is now finished. Some interesting thoughts gelled over the past 6 months, during which I was furiously assembling material for the book, which is now due to be published in the Fall by Elsevier:

- MDM is more of a means than an end, and it is more likely to be justified in the context of other enterprise activities such as CRM or ERP.

- I have started to bristle at the phrase "golden copy." I now think that MDM is more about providing universal transparent access to a single representation of uniquely identifiable entity data, but that does not mean that entity data has to sit in its own silo.

- Comprehensive master metadata should include more than just data dictionary information.

Stay tuned for more information on the book...

March 5, 2008

Insight on MDM

Earlier this week I attended the MDM Insight event that TDWI ran in Savannah, GA. The hosted event employed a different model than other TDWI events, in which qualified participants were invited to attend, and vendor sponsors were provided with direct access to demonstrate their products' capabilities.

One of my roles at the event was to moderate a short workshop session to help attendees articulate what they believed were their most critical needs for master data management. One interesting common reaction was confusion about what composed an MDM solution, and what the vendors were actually selling. Another frequent reaction was difficulty in lining up the requisite set of ducks within a reasonable amount of time to garner enough "horizontal support." Third, a general consensus was that instituting MDM was best done as an adjunct to existing application development (e.g. to support BI), focusing on small projects.

Actually, that last one confused me a bit: if it is only centered on a small application area (and not the whole enterprise), could it really be "master data" management?

Oh, one more thing - it may be worthwhile to consider the qualitative (and feasibility) differences between creating a "single golden source of truth" and an environment supporting the transparent access to a unified view of uniquely identifiable master objects (my current definition of what MDM is, by the way).

January 29, 2008

Products and MDM

Why do so many people directly link master data management with customer data? Maybe because we have been dealing with customer data so long, that when a new buzz word appears, we immediately try to link what we are doing to the "latest craze" to ensure our mindshare among the stakeholders.

However, the more I think about MDM and product data, the more intrigued I am. I have said this in a number of meetings: product names are curious because they often describe what they are. For example, a PHILLIPS SCREWDRIVER 6-3/4" is a Phillips screwdriver that is 6 and 3/4 inches long. What is more, product descriptions carry a lot of information that can be relatively easily parsed out using standard text analysis and text mining techniques. So I would very much be interested in hearing more about some product information MDM projects - email me or post your success stories!
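To make the parsing point concrete, here is a minimal sketch in Python of pulling structure out of a product name like the one above. The function name, the output fields, and the size pattern (a whole number, optionally followed by a hyphen and a fraction, then an inch mark) are all assumptions made for illustration; real product catalogs would need a much richer set of patterns.

```python
import re
from fractions import Fraction

# Assumed size format: whole inches, optional "-num/den" fraction, then an inch mark.
SIZE_PATTERN = re.compile(r'(\d+)(?:-(\d+)/(\d+))?"')

def parse_product_name(name):
    """Split a product name into descriptive tokens and a length in inches."""
    length = None
    match = SIZE_PATTERN.search(name)
    if match:
        whole, num, den = match.groups()
        length = Fraction(int(whole))
        if num is not None:
            length += Fraction(int(num), int(den))
        # Remove the size text so only the descriptive words remain.
        name = name[:match.start()] + name[match.end():]
    return {"tokens": name.lower().split(), "length_inches": length}

parsed = parse_product_name('PHILLIPS SCREWDRIVER 6-3/4"')
print(parsed["tokens"], float(parsed["length_inches"]))
```

Keeping the length as an exact fraction rather than a float avoids rounding differences when the same product is compared across data sets.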

November 20, 2007

Rights of Data Consolidation for Externally Supplied Data

I have been thinking about MDM and the need to incorporate all data sets that describe a specific master object, and some of the issues surrounding supplied data. The appeal of mastering disparate data sets that represent the same conceptual data objects often leads to an enthusiasm for consolidation in which individuals may neglect to validate that data ownership issues will not impede the program. In fact, many organizations use data sourced from external parties to conduct their business operations, and that external data may appear to suitably match the same business data objects that are to be consolidated into the master repository.

However, there may be issues regarding ownership of the data and contractual obligations relating to the ways that the data is used. These are some that might require care:
• Licensing arrangements – data providers probably license the use of the data that is being provided, as opposed to “selling” the data for general use. This means that the data provider contract will be precise in detailing the ways that the data is licensed, such as for review by named individuals, for browsing and review purposes directly through provided software, or for comparisons only, without being copied or stored. License restrictions of this kind will prevent consolidating the external data into the master.
• Usage restrictions – more precisely, some external data may be provided or shared for a particular business reason and may not be used for any other purpose. This differs subtly from the licensing restrictions in that many individuals may be allowed to see, use, or even copy the data, but only for the prescribed purpose. Therefore, using the data for any other purpose that would be enabled by MDM would violate the usage agreement.
• Segregation of information – in this situation, information provided to one business application must deliberately be quarantined from other business applications due to a “business-sensitive” nature, which also introduces complexity in terms of data consolidation.
• Obligations upon termination – typically, when the provider arrangement ends, the data customer is required to destroy all copies of provided data; if the provider data has been integrated into a master repository, to what degree does that co-mingling “infect” the master? This restriction would make it almost impossible to include external data in a master repository without introducing significant safeguards to identify data sources and to provide selective roll-back.
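
One way to picture the safeguards mentioned in the last item is to record the source of every contributed value, so that a single provider's data can be removed later without disturbing the rest. The following Python sketch is only an illustration under that assumption; the class, method names, and sample sources (`internal_crm`, `vendor_x`) are hypothetical and do not describe any particular MDM product.

```python
class MasterRepository:
    """Toy master repository that keeps the source of every contributed value."""

    def __init__(self):
        # master_id -> attribute -> list of (source, value), oldest first
        self._records = {}

    def add(self, master_id, attribute, value, source):
        attributes = self._records.setdefault(master_id, {})
        attributes.setdefault(attribute, []).append((source, value))

    def view(self, master_id):
        """Unified view: the most recently contributed value per attribute."""
        attributes = self._records.get(master_id, {})
        return {attr: entries[-1][1] for attr, entries in attributes.items()}

    def purge_source(self, source):
        """Selective roll-back: drop every value contributed by one source."""
        removed = 0
        for attributes in self._records.values():
            for attr in list(attributes):
                kept = [entry for entry in attributes[attr] if entry[0] != source]
                removed += len(attributes[attr]) - len(kept)
                if kept:
                    attributes[attr] = kept
                else:
                    del attributes[attr]
        return removed

repo = MasterRepository()
repo.add("cust-1", "name", "Acme Corp", source="internal_crm")
repo.add("cust-1", "credit_rating", "A", source="vendor_x")
repo.add("cust-1", "name", "ACME Corporation", source="vendor_x")
before = repo.view("cust-1")
removed = repo.purge_source("vendor_x")   # provider arrangement ends
after = repo.view("cust-1")
print(before, removed, after)
```

Because earlier values are retained alongside their sources, purging the external provider lets the internally sourced name resurface instead of leaving a hole in the master record.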

November 13, 2006

Curious MDM Activity, and Differentiating between Master and Reference Data

Last week at the TDWI conference in Orlando, I had the chance to briefly chat with Philip Russom, who has assembled some very nice research papers this year on data quality and on master data management. One comment about his latest effort on MDM that I found intriguing was that his research suggested that a large number of MDM projects are done on behalf of finance activities, often in the area of accounting (GL, chart of accounts, item lists, etc.). I thought a large part of that data was what we might call "reference data," not necessarily "master data." On the one hand, it is good to see the kinds of governance that are relevant for financial activities being applied to data.

However, his comment drives back to a question I must have heard 10 times down there - what is the difference between master data and reference data? This underlies an even more challenging question - how do you define "master data"? We have a lot of descriptions of master data, but nothing definitive. Dan Linstedt has done a good job of tracking some of these questions in his blog, but I think it is about time to nail this definition down. Any suggestions?

August 10, 2006

Anonymization and Identifiability

AOL admits their goof in publishing huge amounts of search data that is questionably anonymized. The New York Times describes some details of the person identified through analysis of the released search data, as was reported in Martin McKeay's blog entry.

I always have a dual reaction to the uproar over the privacy issues associated with the release of this kind of data. First, I am amused that a big company like AOL doesn't have the governance controls in place to assess the public's reaction to the publication of what might be considered sensitive data. Second, I am surprised that "The Public" is concerned over the exposure of what they suddenly consider to be private information, when in fact the privacy policy states that the data may be presented to others in a nonidentifiable way ("(others) ...receive aggregate data about groups of AOL Network users, but do not receive information that personally identifies you").

Of course, AOL thought that the released data was presented in a way that did not personally identify anyone. The fact that others are able to extract identifiable information from presumably anonymized data should be a wake-up call to AOL to review how their governance practices are deployed to ensure they are abiding by their own policies.

April 24, 2006

Master Data Management, Local Update, and Coherence

I am actually writing this entry in real time during my (and Malcolm Chisholm's) DAMA/Meta Data Conference tutorial on Effective Management of Master Data. A question was asked about allowing updates of local copies of master data objects within operational applications. I immediately commented that allowing this introduces coherence issues between the application copies and the master copy, and that one must ensure that policies exist for coherence management if local updates are to be allowed. Of course, we have to realize that this issue is not a new one - it has been around for a long time, both in the data world (transactional semantics) as well as the compiler world (cache and memory coherence).

I would be surprised if there were any MDM systems that allow for local update without having some embedded transactional semantics incorporated.
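
As a rough illustration of what a coherence policy for local updates could look like, here is a Python sketch of a version check at commit time (optimistic concurrency). This is just one possible policy, chosen for brevity; the class and method names are hypothetical and not taken from any MDM system.

```python
class StaleCopyError(Exception):
    """Raised when a local copy is older than the current master copy."""

class MasterStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def create(self, key, value):
        self._data[key] = (1, value)

    def read(self, key):
        """Return (version, value) so the application can hold a local copy."""
        return self._data[key]

    def commit(self, key, expected_version, new_value):
        """Accept a local update only if the local copy is still current."""
        current_version, _ = self._data[key]
        if current_version != expected_version:
            raise StaleCopyError(
                f"{key}: local copy is v{expected_version}, master is v{current_version}"
            )
        self._data[key] = (current_version + 1, new_value)
        return current_version + 1

master = MasterStore()
master.create("cust-1", {"city": "Savannah"})

# Two applications take local copies of the same master object.
version_a, copy_a = master.read("cust-1")
version_b, copy_b = master.read("cust-1")

# Application A updates its copy and commits first.
new_version = master.commit("cust-1", version_a, {"city": "Orlando"})

# Application B's copy is now stale, so its update is rejected.
try:
    master.commit("cust-1", version_b, {"city": "Atlanta"})
    rejected = False
except StaleCopyError:
    rejected = True
print(new_version, rejected, master.read("cust-1"))
```

The rejected application would then have to re-read the master copy and reapply its change, which is the same trade-off seen in cache coherence protocols.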

February 6, 2006

More on Data Quality Integration

Informatica's purchase of Similarity continues the consolidation in the data quality tools market that we have been tracking over the past year or so. Absent Pitney Bowes' aborted acquisition of Firstlogic, the absorption of DQ technology into a greater data integration environment seems to be, at this point, a fait accompli. Most major US DQ tools vendors are now part of something bigger: Trillium, which owns Avellino, is itself owned by Harte-Hanks; Dataflux is owned by SAS; Vality and Metagenix's tools have been sucked into Ascential, now a prime part of IBM's data motion strategy; and Evoke, tossed by CSI into Similarity's hands, is now part of Informatica.

Data profiling seems to be a key component - everybody's doin' it, including Oracle (with a sophisticated tool embedded within OWB) and Informatica, which has a nice profiling capability as part of PowerCenter. I hear in dark alleyways that Ab Initio has a profiling capability also. This does dovetail with my continuing observations on how profiling has numerous uses that span the information management lifecycle.

So what is next for the DQ tools industry? Here are my predictions: look for additional consolidation on the standardization and matching/record linkage side - smaller vendors may be absorbed by other larger concerns just to "stay in the game." Also: there are two relatively senior DQ tools vendors still unattached, Firstlogic and Innovative - one or both may be candidates for marriage. Lastly: look for innovation in the DQ tools space from Europe and Asia, where there is a more penetrating understanding of engineering data quality into the development process.

December 2, 2005

Search for Knowledge

A news item I read the other day reported that, according to a study done by the Pew Internet and American Life organization, the top 3 web activities are:

3) Reading news,
2) Using a search engine, and
1) E-mail

The interesting note is that searching has moved up into the number 2 spot, meaning that aside from sending and reading emails (which is standard operating procedure for most people these days), the thing that people do most on the web is using some kind of search engine to look for something. Of course, one might assume that the search is a prelude to some other action, but this fact establishes that the key to web activity is the search engine.

The ability to provide search capability really epitomizes master data management - it means being able to:
- identify entities
- maintain entity metadata in a browsable repository
- provide fast access to search through the metadata

The fact that search engines now provide some kind of spelling suggestions when your searches don't have significant results demonstrates how fundamental data quality, metadata, and record linkage techniques are being integrated into the "search business."
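
A small Python sketch of that kind of spelling suggestion, using the approximate string matching in the standard library's `difflib` module. The entity list and the `suggest` function are made up for illustration; actual search engines use far more elaborate techniques.

```python
import difflib

# Hypothetical list of known entity names held in a browsable repository.
KNOWN_ENTITIES = ["screwdriver", "hammer", "wrench"]

def suggest(query, known_names=KNOWN_ENTITIES, limit=3):
    """Return known names that closely resemble a possibly misspelled query."""
    return difflib.get_close_matches(query.lower(), known_names, n=limit, cutoff=0.6)

print(suggest("screwdrivr"))
```

The `cutoff` value controls how similar a candidate must be before it is offered, which is the same kind of threshold tuning that comes up in record linkage.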