Blog: Claudia Imhoff
Data Quality or Data Integration - which is more difficult?
I read an interesting article in the Business Intelligence Pipeline Newsletter recently asking which was the more difficult challenge: assuring data quality or integrating data from across your organization. They have a voting booth set up so you can cast your vote for whichever you believe is the more difficult task. I have my own opinion as well. I voted for assuring data quality and, at the time of my vote, it appeared that the majority of voters agreed with me.
Why? In my opinion, it is because of the assuring part of the task. Data integration seems to be a much more straightforward task, with more mature technologies, methodologies, and practical expertise among data integrators. Even the definition of data integration seems to be cut and dried. (Not always, but at least you have a solid standard to go from: a single version of the truth.)
I think we are still feeling our way through what it means to assure data quality. While there certainly is useful technology to help with data quality, so much of the assurance part is still heavily dependent on a human being (in this case, usually a business person) eyeballing the cleaned-up data to verify its "quality". There don't seem to be very clear, standard methodologies or processes to follow either. And what are the metrics of quality? When do we reach a state of "quality"? And what exactly does quality data even mean?
Without answers to these fundamental questions, it seems to me that we will continue to struggle with this challenge more than with the one faced by data integrators.
Your thoughts?
Yours in BI success,
Claudia
Comments
I agree; the quality of the data is where most of my data conversion issues arise. I can map the data, but when we try to compare results it can be a challenge to work out the minor details.
Posted by: Lonnie Reeder | July 25, 2005 11:00 AM
Data quality, by far, is the more difficult of the two tasks, and the reason is simple: data quality is a business-process-oriented concept, defined by each business in situ. This is how it should be, and this is how it will continue to be for quite some time at least. A competitive business has the inherent right to find its own place between the poles of incompetence and brilliance. From a data quality perspective, what fits the needs of a given business may seem either too much or too little to another business. This is just the nature of business. The data quality tools that will be the most successful are those that recognize this immutable truth and provide the most flexibility to the end user.
Posted by: Kelly M. McDaniel | July 29, 2005 6:46 AM