
Practical Approaches to Data Quality Management in Business Intelligence and Performance Management

Originally published October 20, 2009

Many organizations struggle to justify the costs of business intelligence solutions, and these challenges are real and often difficult to overcome. Business intelligence and performance management (BI/PM) projects are very expensive in terms of tools, software licensing, administration, development, hardware, and business implementation. These “hard costs” are associated with any software development project, but they are not the only costs. BI/PM projects also incur “soft costs” or “soft challenges.” These projects can be very complex and expose many truths about how an organization operates at a strategic and tactical level, and these are truths that some companies and their executives may not want exposed. However, by exposing and repairing these problems, organizations can become much more profitable, scale their businesses, and delight both internal and external customers by exploiting the value of data throughout the enterprise. This article discusses these soft costs and provides insight into how to deal with them, as well as the potential value of repairing them.

Soft Cost – Data Quality

The value of any data warehouse or BI/PM solution is directly tied to the data contained within; the old term GIGO (garbage in, garbage out) applies. If the data in the marts or warehouse is not high quality and is riddled with errors and inconsistencies, then the trust the organization has in the BI/PM solution decreases: as data quality falls, system utilization falls with it. The organization may have invested heavily in business intelligence (BI) tools, such as MicroStrategy, Cognos, or BusinessObjects, but may have given very little thought to integration, source system analysis, governance, and other data issues. Data is at the heart of any BI/PM solution. There are many reasons why BI/PM solutions fail to deliver significant value to an organization, and this is definitely one of the most critical.

Since most business intelligence/performance management solutions gather data from internal source transaction systems, the real problem could lie in the way the business operates at a tactical level. For example, a call center has a customer relationship management (CRM) system that is nearly 20 years old, which the company uses to enter data about customers and their business activities. The system was built with the best of intentions by a small team and has evolved over the years; things like field masks and data validation were overlooked or never treated as critical needs. One user may enter a transaction date as Jan 30, 2009, another user may enter the date as 1/30/2009, and yet another may enter the date as 30-Jan-2009. They all mean the same thing, but the lack of standards and controls is problematic. So architectural and technical problems are at work. These nonfunctional requirements are often overlooked, left unbudgeted, or treated as low priority.
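To make the date problem concrete, here is a minimal sketch of how those three entry styles could be collapsed into one canonical format during cleansing. The function name and the format list are illustrative, assuming the three variants above are the only ones in play; a real system would need a far longer list.

```python
from datetime import datetime

# Hypothetical list of formats actually observed in the CRM extract.
KNOWN_FORMATS = ["%b %d, %Y", "%m/%d/%Y", "%d-%b-%Y"]

def normalize_date(raw: str) -> str:
    """Return the date in ISO 8601 (YYYY-MM-DD), or raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

# All three entry styles from the example collapse to one canonical value.
print(normalize_date("Jan 30, 2009"))   # 2009-01-30
print(normalize_date("1/30/2009"))      # 2009-01-30
print(normalize_date("30-Jan-2009"))    # 2009-01-30
```

The better fix, of course, is a field mask at entry time so only one format can ever be entered, which is the point of the governance discussion that follows.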

There is another, bigger problem with this organization. Call center workers often enter data in inappropriate but convenient fields in the CRM system. For example, the second line of the customer address field is rarely used, so some call center workers use it to enter order numbers or notes related to that customer. Call center workers may do this because someone in their organization encouraged it, or because the CRM system does not provide the fields they need. Another reason could be that there is a place where the data could be entered appropriately, but it is on another screen that is difficult or slow to navigate to. It could also simply be a shortcut born of lazy behavior and a lack of governance over the source systems.
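Misuse like this can at least be detected cheaply once someone looks for it. The sketch below, with a made-up order-number pattern and record layout, flags records whose second address line looks like an order number rather than an address:

```python
import re

# Hypothetical pattern: this CRM's order numbers look like "ORD-123456".
ORDER_NUMBER = re.compile(r"\bORD-\d{6}\b")

def flag_address_misuse(records):
    """Yield the IDs of records whose second address line
    appears to contain an order number instead of an address."""
    for rec in records:
        line2 = rec.get("address_line_2") or ""
        if ORDER_NUMBER.search(line2):
            yield rec["id"]

records = [
    {"id": 1, "address_line_2": "Suite 400"},
    {"id": 2, "address_line_2": "ORD-938271 called re: refund"},
    {"id": 3, "address_line_2": ""},
]
print(list(flag_address_misuse(records)))  # [2]
```

A report like this gives the business data owner a worklist, but it does not stop the behavior; only governance and better screens do that.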

Many organizations do not realize the costs of poor data quality or do not successfully justify the investment to repair data quality problems. The figure below can demonstrate just how critical data quality is to any organization’s bottom line — in some cases, life or death.


So how does an organization repair these problems? There are many solutions, but the ones discussed here are the do nothing approach (DNA), fix it in integration or ETL (FINE), and source system governance (EVOLVE). Let’s discuss each of these solutions.

DNA or do nothing approach. Unfortunately, this is a very real and widely accepted solution for some. Many large financial, healthcare, and insurance systems are riddled with source system data quality problems. Leaders of these organizations do not know how to approach the problem (fear), do not want to deal with the problem (priority), or simply do not see the value in fixing it (cost/benefit). They may not understand how it impacts their daily operations and how it impacts the end customer or patient.

For example, a regional bank in the southwestern United States had a data warehouse riddled with data quality problems around customer information. The bank had grown by acquiring other regional banks that offered non-competing financial solutions, and it built a data warehouse in which one of the entities was the customer. The bank was proud of its outstanding customer service and its high customer touch; this was how it competed with larger national banks. It sent out thank-you letters to clients based on business activities. This is a great practice; however, it was not working. Some customers received two or three copies of a letter, some received inappropriate letters, and some letters went to the wrong address. Jane Smith might have received two letters: one addressed to J. Smith and another addressed to Jane Smith. Customers also might have received inaccurate letters saying “Welcome to the Bank” even though they had been with the bank for 10 years. Not a big deal? Maybe not at an individual level, but at the scale of millions of customers it could mean hundreds of thousands of dollars annually in postage, mail room operations, paper, envelopes, printing services, and so forth. Letters sent to the wrong address also created back-end expenses, with mail rooms receiving and processing “Return to Sender” mail.
Jane Smith may not mind that she receives two letters, but some customers will see this as a problem and lose trust in the bank’s operations. If you do not think this matters, answer one question: How many times have you received a bill from your health insurance company that you had already paid? More than likely, this problem stems from a source system business and/or operational problem. It is also, at heart, a master data management problem. In short, the DNA approach is not highly recommended.
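The J. Smith/Jane Smith duplication above is a classic record-matching problem. A crude sketch of the idea, using an invented blocking key of surname, first initial, and normalized address (real matching uses far more sophisticated fuzzy techniques), looks like this:

```python
from collections import defaultdict

def match_key(name: str, address: str) -> tuple:
    """Crude blocking key: surname + first initial + normalized address.
    Illustrative only; real entity resolution is far more robust."""
    parts = name.replace(".", "").split()
    first, last = parts[0], parts[-1]
    return (last.lower(), first[0].lower(), address.lower().replace(" ", ""))

customers = [
    ("Jane Smith", "12 Elm St"),
    ("J. Smith",   "12 Elm St"),
    ("John Smith", "9 Oak Ave"),
]

# Group records sharing a key; groups of size > 1 are duplicate candidates.
groups = defaultdict(list)
for name, addr in customers:
    groups[match_key(name, addr)].append(name)

dupes = [names for names in groups.values() if len(names) > 1]
print(dupes)  # [['Jane Smith', 'J. Smith']]
```

Collapsing such candidates into one golden record is exactly what a master data management program formalizes.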

Fix it in integration or ETL (FINE) is when an information technology (IT) team develops special software programs that transform data on its way into a warehouse or target mart to repair damaged data or data that was entered into source systems improperly. This is fine (hence the acronym FINE), but it is expensive and very difficult to build and maintain. Take, for instance, the CRM example in the previous section. The source CRM system evolved over 20 years, which means the business and the people who operated the system evolved as well. There could be hundreds of different data entry patterns and improper data entry styles, which means hundreds of special transformation programs could be required to make the data reliable for a data warehouse. This adds time and complexity to the initial project, makes the solution harder to maintain, and can add significant time to the data warehouse load process.

Consider the case of a major national bank in the United States. This bank built a highly valued customer performance system; let’s call it BPS. BPS was a reporting system that received its data from roughly 10 source banking systems. Nearly 12,000 jobs were developed to extract, transform, and load data into the data warehouse and two data marts that served up reports on customer activity. There was not a large amount of data, only five terabytes in total covering five years of transaction data; remember, this covered the bank’s top 1% of client activities, not the general population. It took 22 days to complete all 12,000 integration jobs, and the maintenance and support costs were staggering as a result. Most of the 12,000 jobs existed to repair poor business operations around the data and the many anomalies that resulted.
This complexity resulted in longer load times, more people to operate, more people to support, and a significant increase in the annual total cost of ownership for the data warehouse. The fact that it took 22 days per month to load this data into the mart meant that the bank could not exploit the value of data from its most valuable customers; there were only eight days to report on the data inside the calendar month before the load cycle started again. This is all attributable to the lack of governance around source systems and to not fixing the real problem at its source. FINE, although better than the DNA approach discussed in the previous section, is not highly recommended.

Finally, there is the EVOLVE method, which incorporates the methods already discussed (DNA and FINE) and adds data governance. It is a hybrid model because this problem is very real and very complex and requires a complex solution. It can take a very long time to repair an organization’s data, especially in a major multinational organization, and even smaller organizations struggle because of a lack of resources. The point is that organizations have to prioritize EVOLVE and develop a program to drive the project. The customer is a critical entity for every for-profit business, so most organizations start there; other entities may have to live with the DNA or FINE approach in the short to mid term, hence the hybrid approach. EVOLVE is all about creating a data governance program and requires high-level executive ownership. The first crucial step, therefore, is gaining stakeholder recognition that data quality is critical to the organization’s success and that an investment of time, people, and resources is required.

The next phase is to build the leadership team that will own the project and report status up to the executives. This team should be owned by the business and should include members from specific business units or departments. The business should lead this effort, or the likelihood of success is greatly reduced. Some organizations put their IT leadership in control of these efforts. Having good IT architects, database administrators, and other resources is absolutely critical, but they should follow the lead of the business leadership team. Data quality projects like EVOLVE are more likely to fail if IT is leading the effort because IT lacks the authority to change business operations and priorities.

Once the team is established, a top-down approach and priorities are required. The approach should be made up of a high-level vision/mission statement, goals, and other high-level iron triangle estimations/definitions: time, scope, and resources. This should be detailed in a living document that will change as the project matures and new knowledge is gained. This is a good time to hold high-level critical success factor workshops and gain internal support from subject matter experts and other required personnel within the organization. The deliverable of this top-down approach would be a high-level business requirements document and program plan including people, time, and priorities. Priorities would include definition of critical business entities, such as: customer (could be patient/client/other), product (could be service or drug or parts), supplier, partner, organization, employee, and/or any other critical object relevant to the organization. Vertical industries will have different vernacular for defining entities. Once these entities are defined, the leadership team must decide on the priorities of entities and which entity requires attention first. Priority could be based on pain points, revenue impact, or cost avoidance indicators. Most organizations find that the customer entity is the most critical, because customers have direct impact on revenue. An organization may also want to take on a simple and less damaged entity first to demonstrate value and get a quick win. These priorities will determine who will need to be involved and what their roles and responsibilities are for the project.

The leadership team is established, priorities are determined, there is a top-down plan, and the team has determined which entity to take on first. Now what? For this example, let’s use customer as the first entity to which EVOLVE is applied. The leadership team will need to build the customer EVOLVE team and educate them on the goals of the project and the reasons it is being undertaken. This is a good time for the high-level executive to introduce the project, emphasize its importance, and gain team support and buy-in. Roles and responsibilities will have to be established, and a business data owner will have to be nominated. This person will have expertise in customer information and deep responsibility for ensuring that customer data is maintained and that customers are satisfied. The business data owner will have the power to repair broken business processes and data, to enforce data standards, and to set goals for repairing data in the systems that maintain customer information. This person should also have intimate knowledge of the customer and of the people who support the customer. There are two ways to repair data in source systems: manual repair by businesspeople, or automatic repair that enforces business rules on data in the source systems.

The customer owner will have to work with the business subject matter experts who manage the source systems and with the technology team to automate data repair. The technology team could include programmers, database administrators, or other system administrators. Automated data repair will still require business involvement to determine business rules, make data decisions, and manage exceptions. The customer owner can also set goals and standards for data quality by enforcing operations guidelines, data quality goals, and incentives, and can introduce process and workflow management with proper checks and balances. The customer owner can create incentives to encourage good behavior through proper employee goals. For example: the team or individual who moves data from the second address line to the proper field in the CRM system receives a $1,000 bonus each quarter.
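Enforcing business rules on source data can be sketched as a small rule table checked at entry or during audit runs. The field names and rules below are invented for illustration; in practice the business data owner defines them and manages the exception queue:

```python
import re

# Hypothetical entry-time rules a business data owner might enforce.
RULES = {
    "postal_code": lambda v: bool(re.fullmatch(r"\d{5}(-\d{4})?", v)),
    "email":       lambda v: "@" in v and "." in v.split("@")[-1],
    "order_date":  lambda v: bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v)),
}

def validate(record: dict) -> list:
    """Return (field, value) pairs that fail their rule,
    i.e., the entries destined for the exception queue."""
    return [(f, record.get(f, "")) for f, rule in RULES.items()
            if not rule(record.get(f, ""))]

rec = {"postal_code": "85004",
       "email": "jane@example.com",
       "order_date": "1/30/2009"}
print(validate(rec))  # [('order_date', '1/30/2009')]
```

The same rules can run in two places: as hard constraints at data entry, preventing bad records, and as audit sweeps over existing data, feeding the manual-repair worklist.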

The technology team will work with the business to uncover all sources of customer data and create a master data store for customers. The IT team will also have to work with the business to repair source data systems and to build field masks, data constraints, and other programs that enforce the goals set by the business. The IT team can also provide guidance on tools and technologies for managing technical and business metadata, as well as other master data management tools and processes. Other tactical repairs may be required using the FINE techniques mentioned in the previous section. Data architecture plays a critical role in determining future data stores and integration techniques. It is critical that business and technology leaders work as a team, with business executives taking the primary leadership roles and technology professionals providing the infrastructure and tools to enforce the requirements and goals.

There are many other soft challenges to be found throughout a business intelligence/performance management project. During a BI/PM project, many different truths will be uncovered, many of them ugly. Project teams will run into resistance for many reasons, and data quality is definitely one of them, because it is usually caused by a lack of good tactical management and practices. Don’t be surprised by the other things you find as well. BI/PM projects also reveal true business performance that may have been hidden in the past, including how well business lines really perform.

In conclusion, fixing data quality at the source enables businesses to grow, scale, and reduce risk. Data quality management also means fewer silos, less reliance on expensive human talent, and much less uncertainty. Businesses and organizations with high data quality in source systems are also more profitable and much more valuable to future acquisition partners. It also makes life much simpler for business intelligence/performance management project members.

  • John Thuma

    John Thuma has 20 years of experience in systems integration, large scale project management, and business intelligence. He has worked with various vertical industries, including pharmaceuticals, retail, manufacturing, healthcare, and banking/financial services.
