Over the last several articles, I’ve stressed the virtues of robust data quality architectures and processes. In my last article, Certified Data and the Certification Process for Financial Institutions, I focused on certified data and the processes that support it. This month I will examine how quality management programs, such as Six Sigma, fit into this data certification message. Sid Frank, Senior Principal at Knightsbridge Solutions, has assisted me with this article. Sid has worked extensively in the TQM and Six Sigma worlds, both inside and outside of financial services.
Many companies have applied Six Sigma concepts and philosophies to numerous processes and achieved stunning results. Few, however, have applied Six Sigma to their data quality processes. Here, Sid and I will show how that can be done.
What is Six Sigma?
The narrowest definition of Six Sigma is a statistical one: controlling a process so that its output contains no more than 3.4 defects per million opportunities. A defect is anything that falls outside a customer's requirement specification; an opportunity is any chance for a defect to occur.
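As an illustrative sketch (not part of the original article), the sigma level implied by a defect rate can be computed from the normal distribution using the conventional 1.5-sigma long-term shift; the helper names `dpmo` and `sigma_level` are our own:

```python
from statistics import NormalDist

def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def sigma_level(dpmo_value: float) -> float:
    """Sigma level implied by a DPMO figure, applying the
    conventional 1.5-sigma long-term shift."""
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + 1.5

# 3.4 defects per million opportunities is the classic Six Sigma target
print(round(sigma_level(3.4), 2))  # → 6.0
```

The 1.5-sigma shift is the standard Six Sigma convention for long-term process drift; without it, 3.4 DPMO would correspond to roughly 4.5 sigma.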
Most companies use Six Sigma more broadly: as a goal of achieving near-ideal quality for a product or service through new or improved processes and tools, and as a mindset for satisfying customer requirements. Motorola, for example, has used Six Sigma to improve its mobile phone production processes. Similarly, GE has used Six Sigma to improve efficiency and reduce costs across a variety of its businesses, including aircraft engines, power, medical systems, and GE Capital. Here, however, we want to look at a much narrower application: Six Sigma on data.
Data Terms Reviewed
As a quick review, "certified data" is data that has been subjected to a structured quality process to ensure that it meets or exceeds the standards established by its intended consumers. Such standards are typically documented in service level agreements (SLAs) and administered by an organized data governance structure. Note the word "standards" in this definition; this is where Six Sigma begins to fit perfectly. Six Sigma is all about measurable standards that affect quality and/or productivity. Another affinity with certified data is that a formal Six Sigma program naturally links to measurable corporate results.
For example, in a ten-year period Motorola, the originator of Six Sigma, increased sales fivefold and grew profits by almost 20% per year. At GE, Six Sigma initiatives have driven $10 billion to the bottom line over the life of the program. Honeywell, another company that credits Six Sigma, reports a 6% increase in productivity.
Application to Data Certification
Since Six Sigma and data quality improvement share the same goal of reducing defects, data certification improvement programs are natural candidates for the application of Six Sigma methodologies.
Six Sigma Methodologies
The two commonly used Six Sigma methodologies, DMAIC and DMADV, are shown in Figure 1. DMAIC (Define, Measure, Analyze, Improve and Control) is used to provide incremental improvements to existing processes. DMADV (Define, Measure, Analyze, Design and Verify) is used to develop new processes or make radical changes to existing processes. Design for Six Sigma, or DFSS, is another name for DMADV. Both methodologies incorporate the philosophy of continuous improvement using a feedback loop.
Figure 1. DMAIC and DMADV Six Sigma Processes
Both methodologies also depend upon the ability to specify the desired output of a process, such as the quality and certification of data, the ability to measure the output or quality of the data, and the ability to control the process to affect the desired output. The ETL and quality process for this was discussed in last month’s article, Certified Data and the Certification Process for Financial Institutions.
Fitness of Use
Specifying data certification requirements can be compared to specifying quality requirements for a material input, such as steel to a manufacturing process. The quality and certification requirements are driven by intended use, or consumption. Steel used to manufacture aircraft bolts will have more stringent structural quality requirements than steel used to manufacture household appliances. Similarly, the data used for compliance reporting and corporate performance analysis will have more stringent quality requirements than data required for customer segmentation or other marketing functions.
It is important to define measurable attributes for data certification thoroughly. At least 50% of the attributes should be "use-specific" rather than general data quality metrics (e.g., percent of null fields).
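Use-driven requirements like these can be captured as per-consumer accuracy thresholds in an SLA. A minimal sketch, in which the department names, tolerances, and function names are illustrative assumptions rather than figures from the article:

```python
# Relative accuracy tolerance allowed for each consuming department
# (hypothetical values: marketing tolerates 5% error, audit only 0.1%).
ACCURACY_SLA = {"marketing": 0.05, "finance": 0.01, "audit": 0.001}

def meets_sla(reported: float, true_value: float, department: str) -> bool:
    """True if the reported value is within the department's tolerance."""
    relative_error = abs(reported - true_value) / abs(true_value)
    return relative_error <= ACCURACY_SLA[department]

# A balance off by 2% passes marketing's SLA but fails finance's.
print(meets_sla(1020.0, 1000.0, "marketing"))  # → True
print(meets_sla(1020.0, 1000.0, "finance"))    # → False
```

The point of the design is that the same data value can be certified for one consumer and rejected for another; the SLA table, not the data, encodes fitness of use.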
Measuring Data Quality
Central to Six Sigma and data certification is the ability to measure data quality throughout the entire process and compare the actual outputs to the desired, required or expected outputs. The ability to certify data is determined by how closely the data produced reflects the data that was required or expected. Some typical metrics used to certify data include:
Accuracy refers to how closely a data value agrees with the correct or "true" value. Precision is the ability of a measurement or analytical result to be consistently reproduced, or the number of significant digits to which a value has been measured or calculated. One can simultaneously be extremely precise and totally inaccurate.
Take, for example, an oil filter manufacturer that produces millions of filters annually. It can accurately calculate revenue by capturing the sales price of each filter sold to two significant digits. To calculate profit, the company collects and calculates costs for each filter to four significant digits. Due to errors in collecting, allocating, and calculating costs, the company found that its cost per filter was off by as much as 10 cents, yielding a profit calculation error of over $300,000.
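The magnitude of that error is simple arithmetic. The article says only "millions of filters" and "as much as 10 cents per filter"; the 3-million-unit volume below is an assumed figure consistent with the stated $300,000 result:

```python
# Assumed volume: the article states only "millions of filters annually".
units_sold = 3_000_000
cost_error_per_filter = 0.10  # dollars, per the example

# A small per-unit cost error compounds across the full production volume.
profit_error = units_sold * cost_error_per_filter
print(f"${profit_error:,.0f}")  # → $300,000
```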
Accuracy may also refer to non-quantitative data, such as customer names, customer addresses, customer segment categorization, product classifications and descriptions.
Completeness measures the presence or absence of data. For example, a top-five bank determined that the customer addresses it received from a third-party data provider were only 95% complete. The missing 5% cost the bank millions of dollars in missed cross-selling opportunities. Completeness also pertains to retention requirements for historical data: to perform historical trending, business analysts often require data spanning several years.
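A completeness metric of this kind reduces to the share of records with a populated value. A minimal sketch, where the field name, sample records, and helper name are all hypothetical:

```python
def completeness(records: list[dict], field: str) -> float:
    """Fraction of records in which `field` is present and non-empty."""
    populated = sum(1 for r in records if r.get(field) not in (None, ""))
    return populated / len(records)

# Hypothetical customer records; one is missing an address.
customers = [
    {"name": "A", "address": "1 Main St"},
    {"name": "B", "address": ""},
    {"name": "C", "address": "9 Oak Ave"},
    {"name": "D", "address": "2 Elm Rd"},
]
print(f"{completeness(customers, 'address'):.0%}")  # → 75%
```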
While reliability is closely related to accuracy, it is more a relative measure of how much confidence one can place in data values. Reliability is often applied to data supplied by external providers. For example, a bank receiving credit scores from a credit reporting firm believes the scores it receives are correct for 99.9% of its prospects; for a sample of 100,000 prospects, this means credit scores will be incorrect for about 100 of them.
Data reliability can pertain to the reliability of the data source as well. In competitive intelligence systems, sources are often rated for reliability. For example, a source like the New York Times will usually have a higher reliability rating than the National Enquirer. Primary sources generally receive higher ratings than secondary sources.
Availability reflects the fact that data is only useful if it can be accessed when needed. This is especially true for managers relying on decision support systems. Systems are often down, and data inaccessible, during maintenance periods or system failures. Data availability can be defined as the ratio of the amount of time data is available to the amount of time data is needed for access.
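Using the definition above, availability is just that ratio; a sketch in which the hours are assumed figures, not numbers from the article:

```python
def availability(hours_available: float, hours_needed: float) -> float:
    """Ratio of time data is accessible to time access is required."""
    return hours_available / hours_needed

# Assumed example: data needed 24x7 over a 30-day month (720 hours),
# with 6 hours lost to maintenance windows and outages.
print(f"{availability(714, 720):.2%}")  # → 99.17%
```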
Timeliness matters as well. The Wall Street Journal is often called the obituary column for Wall Street investors: by the time stock prices or stock "news" is published, the street has already accounted for the information in the price of the stock, and it is too late for investors to act on it. Data almost always has an associated timing, or "freshness," component. While timeliness may be measured in seconds for stock traders, the requirement for mortgage analysts may be years.
Data consistency refers to the common definition, understanding, interpretation and calculation of a data element. The following example illustrates this metric: in designing a performance data mart for a credit card bank, a multi-department survey was conducted on the interpretation of company "profit." Some departments equated profit to EBIT; others did not. Some departments included the cost of capital; others did not.
Another common example is the use and interpretation of "price." A manufacturer of mobile phones attempted to measure the performance of its sales force by using the price per phone negotiated by its salespeople with its distributors. The mix of incentives offered by each salesperson differed by customer, and sometimes by negotiated deal; incentives might include volume discounts, marketing allowances, rebates, or any combination of these. The sales price of each phone therefore did not reflect the true net, or "pocket," price, and price could not be used as a reliable measure of sales performance.
Uniqueness is closely related to consistency: for a data element to be consistent, it must also have a unique identity and definition. Before the lifetime value of large customers can be calculated, for example, both "lifetime value" and "large customer" must each have a single, agreed definition and method of calculation.
Typical Data Certification Program Challenges
When attempting to implement a data certification program, companies encounter common issues, challenges and problems. These include:
- Understanding the needs and fitness of use
Data requirements are neither absolute nor universal. They are defined by the users and consumers of the data and bounded by its intended use. A common challenge is gathering, understanding, and analyzing the diverse data needs of multiple users and consumer groups. Whereas marketing may only require individual monthly account balances to be accurate to within 5%, finance may require these same balances to be accurate to within 1%, and another department, such as audit, may require 0.1%.
- Measuring accuracy
Both the DMAIC and DMADV (or DFSS) methodologies require measuring the data quality attribute or metric, and data accuracy can be difficult to measure.
- Finding causes of defects
Even when the data quality metric is known and measurable, it can be difficult to trace the cause of a defect. This is especially true when there is a long and complex lineage from the source of the data to the data element of interest, or when the value of the data element is a function of other data elements.
- Size or data volumes
Measuring, analyzing and improving data quality often seems daunting in the face of multiple terabytes of data containing hundreds to thousands of interdependent data elements. Faced with this complex "ocean" of data, many companies find data quality improvement programs untenable.
- Where or how to start
Faced with these and other program challenges, such as cost-justifying the data improvement project, companies often struggle with how to initiate and implement a successful Six Sigma data quality improvement initiative.
Six Sigma Data Certification Benefits
Despite these challenges, companies can be well rewarded for implementing a data certification program. Access to high-quality certified data provides a sounder foundation for business decisions, and it helps executives avoid litigation by ensuring that the data used for regulatory compliance is of the highest quality.
The benefits of applying Six Sigma concepts to data certification are:
- A framework and methodology exists that can be applied to improving data quality.
- Six Sigma tools and techniques are available to support the methodology.
- Successes achieved in improving manufacturing and other processes can be realized in achieving data quality objectives.
Organizations applying Six Sigma concepts and methodologies to manufacturing and other processes have realized billions of dollars in benefits. The same can be true in the even more data-intensive world of financial services. By applying Six Sigma methodologies to improve their data quality processes, financial institutions can realize the compounded benefits of reduced costs, reduced risks, increased revenues, improved margins and regulatory compliance.
SOURCE: The Partnership of Six Sigma and Data Certification
Recent articles by Duffie Brunson, Sid Frank