Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions, including information quality consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

July 2009 Archives

I was just scanning Philip Russom's October 2007 monograph on "Unifying the Practices of Data Profiling, Integration, and Quality," and noticed this:

"The quality of data degrades as application update, add data to, or delete data from a database. Most estimates say that 10-12% of the data in an active database becomes dirty, nonstandard, or redundant each month. Hence, if you cleanse a database 100% today, it will only be 88-90% clean 30 days from now."

I love finding sentences like this, where a (hopefully objective) third party provides a hard statistic that can be used as ammunition for supporting a data quality initiative. On the other hand, I do get concerned when an unnamed source (the "most estimates" part, I mean) is used to provide the statistic.

Actually, what would happen if the data is never cleansed? How about for six months? Does the 10-12% degradation apply to all the records or only to the currently clean ones? OK, simple arithmetic: if each month 10% of the clean records become dirty, nonstandard, or redundant, then after 6 months (.9*.9*.9*.9*.9*.9) * 100%, or 53.1%, of the records are unsullied.
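
Just to make the compounding explicit, here is a quick back-of-the-envelope sketch (Python, purely illustrative) that applies a constant 10% monthly degradation rate to whatever records are still clean:

```python
# Back-of-the-envelope compounding: assume, as in the quote, that 10% of the
# currently clean records become dirty, nonstandard, or redundant each month.
def clean_fraction(monthly_rate: float, months: int) -> float:
    """Fraction of records still clean after the given number of months."""
    return (1.0 - monthly_rate) ** months

for months in (1, 6, 12):
    print(f"after {months:2d} month(s): {clean_fraction(0.10, months):.1%} still clean")
# after  1 month(s): 90.0% still clean
# after  6 month(s): 53.1% still clean
# after 12 month(s): 28.2% still clean
```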

I am not sure that I believe this to be the case. Perhaps the rate at which data becomes dirty slows each month, since the records with a higher propensity to become flawed (for whatever reason - multiple touchpoints, commonly-used records, etc.) will already have been subjected to an error, so they would not be counted the second month.
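
To see what that alternative would look like, here is a hypothetical little simulation (again Python, with made-up propensity numbers, not anything taken from Russom's monograph): each record gets its own monthly error probability, and the observed degradation rate among the remaining clean records drifts downward as the error-prone records drop out early.

```python
# Hypothetical simulation: each record has its own monthly error probability,
# drawn uniformly between 1% and 25% here (an assumed, illustrative range).
# More error-prone records tend to go bad in the early months, so the observed
# degradation rate of the remaining clean records declines over time.
import random

random.seed(2009)
propensities = [random.uniform(0.01, 0.25) for _ in range(100_000)]
clean = propensities  # error propensities of the records that are still clean

for month in range(1, 7):
    before = len(clean)
    clean = [p for p in clean if random.random() > p]  # each clean record goes bad with probability p
    monthly_rate = 1 - len(clean) / before
    print(f"month {month}: {monthly_rate:.1%} of the remaining clean records went bad")
```

Under that assumption the printed monthly rate shrinks month over month, rather than holding steady at the flat 10-12% the estimate implies.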

If anyone has any references to this 10-12% number, post a comment with a link!


Posted July 15, 2009 1:06 PM

