
Blog: Jill Dyche

Our Data Sucks!

In which Jill (and her co-author, Evan, by proxy) bemoan the state of data quality and call for a rapprochement.

Chapter 4 of my new CDI book, Customer Data Integration: Reaching a Single Version of the Truth (with Evan Levy), bemoans the poor state of data, particularly customer data, and I'm calling for a rapprochement.

That is, a meeting of the minds between the "process" crowd and the "technology" crowd when it comes to data quality. Both are well-intentioned. Both understand the impact bad data can have on a company and its ability to achieve strategic objectives. But both can tend to extremes when it comes to making their respective cases.

The technology people, for instance, come down hard on the side of automation. And don't assume that I'm just referring to the tool vendors here. You'd be surprised at how many internal IT people just want to "install" data quality--as if we ever could!--by immediately acquiring a tool and turning it on. "Voila! Clean data!" As if.

In contrast, the process people want us all to endure often-arduous design walkthroughs wherein stakeholders deconstruct the diversity of data cleansing steps. I've been to several of these sessions and the word "overkill" doesn't begin to describe them. Sort of reminds me of the scene in the movie Airplane! where a polite young woman chats with her seatmate non-stop, to the point where the poor guy ends up trying to commit suicide.

I like to compare data quality to the production support processes of old. These processes have evolved over the years. They didn't start out highly rigorous and well-defined. Instead, they began with a specific, core set of functions that evolved as system requirements evolved--and as stuff broke.

In my experience, a data quality effort needs both a deliberate process and automated tools to be successful. I've seen drawn-out process meetings deliver elegant workflows and root cause analyses that get rendered so much shelfware when a manager refuses to pony up budget money for a tool. ("No one said we'd have to automate it!") Sadly, it's much easier to do nothing at all than to implement a new process that mandates do-it-yourself (aka: manual) data cleansing.

Conversely, I've seen great data quality tools get underutilized because no one is really sure what to do or where to begin. ("Do we have to use data profiling?")

At Baseline, we like data quality pilots. These pilots are short but meaningful, and don't simply involve "flipping a switch." They explore the client's expectations for data improvement and accuracy, consider the viability of data quality as a service, force ownership and accountability discussions, and, yes, define high-level processes for determining the inputs, outputs, and rules for data cleansing. By establishing this level of clarity up front, we can then automate data quality processes with an assurance of the desired outcome. No one is forced to languish in onerous process meetings. Moreover, the results are clear and demonstrable to managers who have the budget to move forward with data quality on a larger scale.
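
As a rough illustration only of what "inputs, outputs, and rules" can look like once they are agreed and then automated, here is a minimal Python sketch. It is not Baseline's method or any particular tool's API: the field names (`name`, `phone`, `email`), the three rules, and the `cleanse` helper are all invented for the example.

```python
import re

# Hypothetical cleansing rules, one per field. These are assumptions made
# for illustration; a real pilot would agree them with the data owners.
def clean_name(value):
    # Collapse repeated whitespace and apply title case.
    return " ".join(value.split()).title()

def clean_phone(value):
    # Keep digits only, dropping punctuation and spaces.
    return re.sub(r"\D", "", value)

def clean_email(value):
    # Trim surrounding whitespace and lowercase the address.
    return value.strip().lower()

RULES = {"name": clean_name, "phone": clean_phone, "email": clean_email}

def cleanse(records, rules=RULES):
    """Return (cleaned_records, change_counts) without mutating the input."""
    cleaned = []
    changes = {field: 0 for field in rules}
    for record in records:
        new_record = dict(record)
        for field, rule in rules.items():
            if record.get(field) is not None:
                new_value = rule(record[field])
                if new_value != record[field]:
                    changes[field] += 1  # count how often each rule fired
                new_record[field] = new_value
        cleaned.append(new_record)
    return cleaned, changes

if __name__ == "__main__":
    sample = [
        {"name": "  jane   DOE ", "phone": "(555) 010-1234",
         "email": " Jane@Example.COM "},
        {"name": "John Smith", "phone": "5550105678",
         "email": "john@example.com"},
    ]
    cleaned, changes = cleanse(sample)
    print(cleaned[0])
    print(changes)
```

The per-field change counts are the kind of simple, demonstrable result a manager can look at, while the rule table itself is the artifact the process discussion produces.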

When it comes to process versus technology, there can be a sharp delineation of approaches. But when launching a data quality effort, one without the other just won't cut it.

Posted by Jill Dyche on August 3, 2006 7:24 PM

Comments

Jill,

I totally agree with you. We also like to start with a smaller pilot project on a well-defined amount of data, just to give the organisation the sense that data quality is not something you can buy, and that the organisation has to reach a common understanding of the steps and tasks for a successful data quality initiative.

Niels

I am not sure who is more dangerous, a team of technologists with a whiz-bang data quality tool or a single technologist hacking away at the data with SQL. Both can cause an amazing amount of damage via well-intentioned "fixes"! You certainly need a lot of input from those who know the business value of the data and the processes that make it safe.
