Consider this scenario: You are a team member on your company’s customer data integration project. One of your tasks, perhaps in a data stewardship role, is to evaluate customer records acquired from a third party to ensure that each record complies with certain corporate information policies. During this task, however, you see a customer name that you recognize, perhaps a relative or a new neighbor. As you review that customer’s record, you realize that the address is incorrect – perhaps the customer recently moved into a new home and the address provided is no longer valid. Should you correct the record?
I know what you are thinking – the answer is obvious. Unfortunately, half of you think the answer is clearly to correct the record and the other half think the answer is to absolutely leave the record alone. And both sides are justified in their stances, but in order to understand why, let’s abstract the scenario a bit into more direct questions:
The first question is a bit of a red herring – presumably, anyone in the organization would like critical data sets to reflect the highest levels of data quality, and if someone knows that there is a flaw in the data, correcting the error improves the quality of the record along with the downstream applications that depend on it. Rather, the issues have to do with the approach used for making that change: not just who changes the data, but under what authority that person is allowed to modify the record, who approves the modification, how the modification is logged, what happens if the record needs to be reverted to its earlier state, whether that copy of the record has become unsynchronized with other copies, and so on. In other words, directly modifying data requires a significant amount of oversight, control and auditing before anyone is given “modify access” to a data set.
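To make those control questions concrete, here is a minimal sketch in Python of a correction that is checked for authority, independently approved, logged, and reversible. Everything in it is an illustrative assumption and not part of any particular product or of a prescribed method: the `CustomerStore` and `CorrectionLog` names, the in-memory dictionary standing in for the data set, and the rule that the approver must differ from the requester are all hypothetical. Synchronization with other copies of the record is deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CorrectionLog:
    """One audit-trail entry for an applied correction (hypothetical structure)."""
    record_id: str
    field_name: str
    old_value: str
    new_value: str
    requested_by: str
    approved_by: str
    justification: str
    applied_at: datetime


@dataclass
class CustomerStore:
    """In-memory stand-in for a customer data set (illustrative only)."""
    records: dict
    stewards: set    # staff vetted to request corrections (assumed role)
    approvers: set   # staff allowed to approve corrections (assumed role)
    audit_trail: list = field(default_factory=list)

    def correct(self, record_id, field_name, new_value,
                requested_by, approved_by, justification):
        # Authority: only vetted staff may request a change, and a
        # different person must approve it.
        if requested_by not in self.stewards:
            raise PermissionError(f"{requested_by} may not request corrections")
        if approved_by not in self.approvers or approved_by == requested_by:
            raise PermissionError("correction needs an independent approver")
        if not justification.strip():
            raise ValueError("a justification is required")

        record = self.records[record_id]
        entry = CorrectionLog(
            record_id, field_name, record[field_name], new_value,
            requested_by, approved_by, justification,
            datetime.now(timezone.utc),
        )
        record[field_name] = new_value
        self.audit_trail.append(entry)  # logging: keep the prior value and the reason
        return entry

    def revert(self, entry):
        """Restore the value that was saved in the given log entry."""
        self.records[entry.record_id][entry.field_name] = entry.old_value


store = CustomerStore(
    records={"c1": {"address": "1 Old Rd"}},
    stewards={"ann"},
    approvers={"bob"},
)
entry = store.correct("c1", "address", "2 New St",
                      requested_by="ann", approved_by="bob",
                      justification="Customer confirmed the move in writing")
store.revert(entry)  # the earlier state is recoverable from the log entry
```

In a real environment the reversion would presumably be logged as well, and the same correction would need to be propagated to any other copies of the record.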
It is the authority of the source making the change that might be of greatest concern. An individual’s attempt to modify a record based on what could be termed “circumstantial evidence” is precarious at best and could have deeper ramifications. For example, that neighbor that just moved in next door may be using that house only as a vacation home, with a previous address remaining as her official residence. Modifying the address might trigger other events that could impact the customer, perhaps in very inappropriate ways.
However, instituting layers of hierarchical control means that the overhead and approvals required to correct known inaccuracies would lengthen the time it takes to make invalid information fit for its downstream purposes. And knowing that a record contains inaccurate data and not doing something about it is disconcerting. At some point, if the value is really not accurate, the record will need to be corrected. That brings us to the second question: Under what circumstances is data correction allowed?
Let’s boil it down to addressing some of our identified issues and asserting some basic concepts that would be the start of a nascent governance framework for data:
Does this constitute a data governance program? Not really, but it does provide a starting point for overseeing actions that are expected to happen and provides an audit trail that can demonstrate the justifications for any data correction as well as show that the changes were made by vetted staff members. Instituting some straightforward controls over data correction should prevent arbitrary modifications performed in the absence of any supervision.
Editor's note: More David Loshin articles, resources, news and events are available in the David Loshin Expert Channel on the BeyeNETWORK. Be sure to visit today!