

Data Validation as Part of Your Data Governance Program

Originally published February 10, 2011

Data validation is an essential activity in loading and maintaining any decision support system, because sound reporting and decision making rest on accurate, timely data. This should not come as a surprise to anyone, and any data manager worth his or her salt knows it implicitly. Unfortunately, data validation is often handled in a reactive, haphazard way that is not coordinated with other parts of the system or the organization, which diminishes its effectiveness. This article will show that data validation is a component of a comprehensive data governance program, and that treating it as one makes both the individual validation processes and the overall governance program stronger.

Where It Fits

Data validation encompasses accuracy, the "…degree of conformity of a measure to a standard or a true value," and validity, "…the degree to which the data conforms to defined business rules," according to the IAIDQ. These definitions contain several key phrases that show that validation is dependent on a data governance program:
  • …degree of conformity…

  • …standard or true value…

  • …defined business rules…

Data governance programs address all of these items and more in an organized way that both standardizes policies and makes procedures easier to implement, saving development time and expense. Most IT professionals regard data validation as a fairly straightforward exercise that is independent of other groups, but it involves details that are often underappreciated and that frequently cause validation efforts to fail. Let's take a look at how validation is done and how data governance plays a large part in its success.

How Validation Is Done

A typical setting for data validation is the loading of staged source data into a data warehouse for long-term storage and transformation into efficient reporting structures. The mechanical details of validating data are:
  • Determining what data to validate. Not all data needs to be validated in all ways, but at a minimum each data element should be checked for conformance to the target data type and, where applicable, to its domain values or ranges. Additional checks can cover relational integrity, column sums, and the like. All of the definitional data for these checks should be stored centrally in a metadata repository where designated staff have made determinations on data modeling, valid domain values, and data relationships. The system implementers need only refer to this repository for these answers.

  • Determining when to validate data. Validation checks are usually applied when data moves from one data structure to another, and there is great flexibility in determining when each type of validation should be applied for maximum efficiency. The data governance program is the proper place to develop and disseminate the recommended placement of the various types of validations to relieve the business analysis and development teams of this burden. The result of this standardization is consistency in validation activities, which enhances confidence in the data product.

  • Determining what to do with data that fails validation. Identifying non-conforming data is only the first step of the validation process. Once the failed data is identified, decisions must be made on where to store the data, whom to notify in case of failure, and how the data will be corrected and resubmitted into the data acquisition process. All of these decisions fall outside the purview of the system implementation team, and the development team often treats them as "identify the bad data, set it aside, and someone will deal with it." A robust data governance program will address all of these decisions via a partnership between the information providers and the business users, and provide the processes for addressing and correcting bad data.

In the absence of a data governance program, these determinations fall to the business analysis and development teams for the system being implemented. This has two main effects: a loss of efficiency (insufficient coordination with other systems, inconsistent application of validation rules, and so on) and possible failure of the entire validation process, because invalid data is marked but never addressed. The result is lower trust in the system's data, which can lead business users to abandon the system, in effect a failure of the system as a whole.
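
The three steps above can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in rather than part of any real product: the in-memory `METADATA` dictionary plays the role of the governed metadata repository, the column names and domain values are invented, and the steward addresses are placeholders for whoever the governance program designates to be notified. The sketch checks each staged row for type and domain conformance and sets failing rows aside together with the reason and the contact responsible.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """One governed definition for a data element (hypothetical structure)."""
    cast: Callable[[Any], Any]     # conversion to the target data type
    domain: Callable[[Any], bool]  # valid values or range
    steward: str                   # whom to notify when the check fails


# Hypothetical stand-in for the metadata repository. In practice these
# definitions are maintained by designated staff, not by load developers.
METADATA = {
    "customer_id": Rule(int, lambda v: v > 0, "customer-steward@example.com"),
    "state": Rule(str, lambda v: v in {"CA", "NY", "TX"}, "geo-steward@example.com"),
    "order_total": Rule(float, lambda v: 0 <= v <= 1_000_000, "finance-steward@example.com"),
}


def validate_row(row):
    """Return (column, reason, steward) for every check the row fails."""
    failures = []
    for column, rule in METADATA.items():
        raw = row.get(column)
        try:
            value = rule.cast(raw)  # data type check
        except (TypeError, ValueError):
            reason = f"cannot convert {raw!r} to {rule.cast.__name__}"
            failures.append((column, reason, rule.steward))
            continue
        if not rule.domain(value):  # domain value / range check
            failures.append((column, f"{value!r} outside valid domain", rule.steward))
    return failures


def load(staged_rows):
    """Split staged rows into accepted rows and quarantined rows."""
    accepted, quarantined = [], []
    for row in staged_rows:
        failures = validate_row(row)
        if failures:
            # Failed rows are kept with their reasons so they can be
            # corrected and resubmitted rather than silently set aside.
            quarantined.append({"row": row, "failures": failures})
        else:
            accepted.append(row)
    return accepted, quarantined
```

Calling `load` on a list of staged dictionaries returns the rows that passed and a quarantine list. Because the rules live in one shared structure rather than in the load code, every system that reads from it applies the same definitions, which is the consistency the governance program is meant to provide.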

The One Reason You Need Governance

One overriding question underlies everything discussed thus far and must be answered for any data validation program to succeed: Who decides what is "valid"? It is the foundation of the entire concept of data validation, and a basic question that every organization must answer. When a data governance program is in place, an organization can identify the person or group responsible for each piece of data in the decision support system, charge them with defining the parameters that constitute "valid" data, and hold them responsible for the resulting quality of the data. In this scenario, the analysis and development teams simply implement these definitions. When data governance is absent, the definition is left to the individual teams, which may have conflicting ideas of what constitutes "validity" and will implement them accordingly. This results in incomplete and inefficient data validation, with varying degrees of data quality in the target system.
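
One way to picture the difference is a registry in which each data element has exactly one accountable owner and one definition of validity. The following Python sketch is illustrative only: `GovernedDefinition`, `REGISTRY`, `register`, and `is_valid` are hypothetical names, and a real governance program would back this with a managed repository and an approval process rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GovernedDefinition:
    """A single, owned definition of "valid" for one data element."""
    element: str
    owner: str                       # person or group accountable for the data
    is_valid: Callable[[str], bool]  # the agreed validity rule


# Hypothetical central registry: one definition per element.
REGISTRY = {}


def register(definition):
    """Add a definition, refusing a second, conflicting one for the same element."""
    if definition.element in REGISTRY:
        existing = REGISTRY[definition.element]
        raise ValueError(
            f"{definition.element!r} is already defined by {existing.owner}"
        )
    REGISTRY[definition.element] = definition


def is_valid(element, value):
    """Apply the governed rule; development teams call this instead of writing their own."""
    return REGISTRY[element].is_valid(value)
```

If a governance council registers a rule for, say, a `postal_code` element, every team that calls `is_valid("postal_code", ...)` gets the same answer, and a second team that tries to register its own rule for that element is stopped and pointed to the existing owner.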

Data validation is an integral part of any data governance effort. Coupling governance and validation will improve the quality and efficiency of each of your systems. This becomes increasingly important as more and more data is shared both inside and outside the organization. Make governance part of your validation efforts and make your data better!

  • Steve Putman
    Steve Putman is a consultant at Baseline Consulting. He specializes in data management projects, including hands-on delivery of data architectures, data profiling, and data quality solutions for global and mid-market companies. Prior to Baseline, Steve worked for several commercial and vendor firms, and has consistently solved business problems with information.
 
