The application of quality management principles is required for true information quality.
I recently read yet another article about data quality. It promoted extracting data from source databases, verifying and correcting it in a staging area, transforming it, “certifying” it (as valid according to a set of business rules) and publishing it in a reporting database. The author called this “the data quality process.”
Data cleansing is not information quality.
The problem is that this is neither information quality nor data quality. The author was simply describing “automated information scrap and rework.” The supposed “data quality process” fails to meet the criteria of a sound quality management system.
Here are the fundamental flaws in this approach as a “data quality process”:
- It attacks the symptom instead of the root cause. Because no attempt is made to correct the data at the source (corrective maintenance), corrections made downstream in the reporting database leave the defective records untouched in the source. Those records remain there and continue to cause failures in every process that uses the source data. Furthermore, the processes that produced the defective data will go on producing it.
- No automated process can identify and correct all errors. Automated data correction can only change values so that they conform to business rules, or replace them with values from a reference dataset, such as a file of postal addresses that are deemed correct. But the reference datasets themselves can contain errors. And although automated correction can change a value to a “valid” one, a valid value is not necessarily a “correct” one. For example, automatically assigning a gender of “male” and a personal title of “Mr.” to George A. Davis on the strength of the first name would actually have introduced an error: George A. Davis is a woman, and the “correction” could alienate her and possibly lose her as a customer.
- Focusing on data correction instead of process improvement suboptimizes all the money spent on the back end. Without a proper process improvement initiative on the source processes, errors will continue at the same rate. They might even increase if information producers know there is a back-end process that will correct them.
- Every sound quality management system has moved from an approach of “inspect and correct” (scrap and rework) to one of “defect prevention” (process improvement), which eliminates the causes of defects.
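To make the difference between “valid” and “correct” concrete, here is a minimal Python sketch. It assumes a hypothetical name-to-gender reference table and a single domain rule for the gender field; none of these names come from a real product. The automated fix passes the business rule, yet it still records the wrong fact about George A. Davis.

```python
# Hypothetical reference table guessing gender from a first name (illustration only).
FIRST_NAME_GENDER = {"george": "male", "mary": "female"}
TITLE_FOR_GENDER = {"male": "Mr.", "female": "Ms."}
VALID_GENDERS = {"male", "female", "unknown"}


def is_valid(record: dict) -> bool:
    """Business rule: the gender code must come from the allowed domain."""
    return record.get("gender") in VALID_GENDERS


def auto_correct(record: dict) -> dict:
    """Automated 'correction': fill a missing or invalid gender from the first name."""
    fixed = dict(record)
    if not is_valid(fixed):
        first = fixed["name"].split()[0].lower()
        fixed["gender"] = FIRST_NAME_GENDER.get(first, "unknown")
        fixed["title"] = TITLE_FOR_GENDER.get(fixed["gender"], "")
    return fixed


# The real-world fact the database is supposed to describe.
actual = {"name": "George A. Davis", "gender": "female"}
# What the source system captured: the gender was never recorded.
stored = {"name": "George A. Davis", "gender": None}

corrected = auto_correct(stored)
print(is_valid(corrected))                       # True: the record now passes the rule
print(corrected["gender"] == actual["gender"])   # False: but it is still wrong
print(corrected["title"])                        # Mr.
```

A validity check can only confirm that a value belongs to an allowed set; only comparison with the real-world fact, ideally at the point of capture, confirms that it is correct.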
Problems Caused by this Process
There are several problems caused by the aforementioned approach:
The Information Quality Process
To solve information quality problems when you have defective data:
- Treat all data cleansing efforts as one-time events. Correct the data at its source if it is still used there. The only exception is when legal requirements prevent you from altering the data at its source; in that case, maintain the corrected data as an alternate copy for the processes that require correct data.
- Conduct a process improvement initiative on the source processes. Use a Pareto approach to identify and attack the most problematic processes first. Each initiative follows a fundamental Plan-Do-Check-Act cycle to analyze root causes and define improvements that eliminate the causes of errors (Improving Data Warehouse and Business Information Quality).
- As Deming’s third point states, “Quality comes not from inspection, but from improvement of the production process. Inspection, scrap, downgrading, and rework are not corrective action on the process” (Out of the Crisis).
- As part of the improved process, establish monitoring and real-time feedback so that data is corrected at the point of creation. For example, the applications that create the data should invoke the same business rules used for downstream audit checks.
- Process improvement also examines the business process outside the application: procedure clarity, checklists, form design, training, management accountability and other sources of defects.
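A Pareto analysis of where defects originate can be sketched in a few lines of Python. This is an illustration under stated assumptions: it assumes each defective record has already been traced to the source process that created it, and the process names, counts and 80% cutoff are invented for the example.

```python
from collections import Counter


def pareto_vital_few(defects_by_process: Counter,
                     threshold: float = 0.8) -> list[tuple[str, int, float]]:
    """Rank source processes by defect count and return the smallest leading
    group whose cumulative share of all defects reaches `threshold`."""
    total = sum(defects_by_process.values())
    if total == 0:
        return []
    vital, cumulative = [], 0
    for process, count in defects_by_process.most_common():
        cumulative += count
        vital.append((process, count, cumulative / total))
        if cumulative / total >= threshold:
            break
    return vital


# Hypothetical defect log: one entry per defective record, tagged with the
# source process that created it. Names and counts are made up.
defect_log = (["order entry"] * 120 + ["customer onboarding"] * 60
              + ["address change"] * 12 + ["billing adjustment"] * 8)
counts = Counter(defect_log)

for process, count, share in pareto_vital_few(counts):
    print(f"{process:20} {count:4}  cumulative {share:.0%}")
```

With these made-up counts, order entry and customer onboarding account for 90% of the defects, so they would be the first candidates for a Plan-Do-Check-Act improvement cycle.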
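Invoking the same business rules at the point of creation and in downstream audit checks might look like the following minimal Python sketch. The rules, field names and functions (`check`, `create_customer`, `audit`) are hypothetical; the point is only that one rule set is defined once and called from both places, so the creating application gives immediate feedback while the audit measures how well the process is working.

```python
import re
from typing import Callable, Optional

# A rule inspects a record and returns an error message, or None if it passes.
Rule = Callable[[dict], Optional[str]]


def postal_code_present(rec: dict) -> Optional[str]:
    return None if rec.get("postal_code") else "postal code is required"


def email_well_formed(rec: dict) -> Optional[str]:
    email = rec.get("email", "")
    ok = re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email)
    return None if ok else "email is malformed"


# One shared rule set, defined once and used by both functions below.
CUSTOMER_RULES: list[Rule] = [postal_code_present, email_well_formed]


def check(rec: dict, rules: list[Rule] = CUSTOMER_RULES) -> list[str]:
    """Run every rule and collect the messages of the ones that fail."""
    return [msg for rule in rules if (msg := rule(rec)) is not None]


def create_customer(rec: dict, store: list[dict]) -> list[str]:
    """Point of creation: reject a defective record and report why, at once."""
    errors = check(rec)
    if not errors:
        store.append(rec)
    return errors


def audit(store: list[dict]) -> dict[int, list[str]]:
    """Downstream audit: the same rules, used to measure rather than to fix."""
    return {i: errs for i, rec in enumerate(store) if (errs := check(rec))}


store: list[dict] = []
print(create_customer({"email": "pat@example.com", "postal_code": "30301"}, store))  # []
print(create_customer({"email": "not-an-email", "postal_code": ""}, store))
# ['postal code is required', 'email is malformed']
print(audit(store))  # {}
```

Because defective records are stopped at entry, an empty audit result signals that the creating process is working, rather than triggering another round of scrap and rework.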
Conclusion
If you wish to address only data cleansing, call your method a “data cleansing,” “data inspection and correction” or “data corrective maintenance” process.
If you want a “data quality” or “information quality” process, however, you must understand and apply quality management principles, including a strong process improvement method based on the Shewhart Cycle of Plan-Do-Check/Study-Act or its Six Sigma variation, DMAIC (Define-Measure-Analyze-Improve-Control).
This is the main process that can lead you to become an Intelligent Learning Organization.
Please share your ideas for the intelligent learning organization at Larry.English@infoimpact.com.