Blog: Jill Dyché

There you are! What took you so long? This is my blog and it's about YOU.

Yes, you. Or at least it's about your company. Or people you work with in your company. Or people at other companies that are a lot like you. Or people at other companies that you'd rather not resemble at all. Or it's about your competitors and what they're doing, and whether you're doing it better. You get the idea. There's a swarm of swamis, shrinks, and gurus out there already, but I'm just a consultant who works with lots of clients, and the dirty little secret - shhh! - is my clients share a lot of the same challenges around data management, data governance, and data integration. Many of their stories are universal, and that's where you come in.

I'm hoping you'll pour a cup of tea (if this were another Web site, it would be a tumbler of single-malt, but never mind), open the blog, read a little bit and go, "Jeez, that sounds just like me." Or not. Either way, welcome on in. It really is all about you.

About the author

Jill is a partner and co-founder of Baseline Consulting, a technology and management consulting firm specializing in data integration and business analytics. She is the author of three acclaimed business books, the latest of which is Customer Data Integration: Reaching a Single Version of the Truth, co-authored with Evan Levy. Her blog, Inside the Biz, focuses on the business value of IT.

Editor's Note: More articles and resources are available in Jill's BeyeNETWORK Expert Channel. Be sure to visit today!

June 2010 Archives

By Carol Newcomb, Senior Consultant

Diamond in the Rough: Data Quality

The third part of my summertime primer addresses Data Quality Analysis. Don’t even start a data quality analysis until you have completed the first two steps of this series: a Root Cause Analysis to investigate and prioritize potential causative factors, and your metadata assessment. Otherwise, you may be misled by your findings.

Data quality is defined as complete and accurate data that is ready for business consumption. Sources of poor data quality include missing data entry rules, unclear data element definitions, inconsistent metadata definitions for field type, format, or intent, and breakdowns in data transformation processes as data flows between systems or applications. Poor data quality results in bad business decisions; it contributes to major problems in using data effectively and costs companies millions of dollars a year in rework and inefficiency. Data quality, in combination with robust metadata definitions, is part of the foundation of good data governance.

Data Quality Analysis

A Data Quality Management process should be designed so that an organization can start with a simple approach and mature over time toward one that is more proactive and comprehensive. Initially, investigation may focus on single data elements or events. As patterns, data commonalities, and other relationships appear, the data quality management process will grow to support complete business processes. A mature process will not just resolve individual issues; it will also track relationships between data elements, ensure that business rules are consistent, and generate statistical analyses that monitor previously addressed issues, confirming that data quality is stable and that an early warning system is in place as part of the data governance program. The goal is to design a data quality management lifecycle, as shown in this diagram:

[Figure: the data quality management lifecycle]

Initial Data Quality Analysis Process

I. Define data scope
    • Determine data elements that are associated with or are direct results of the reported issue
    • Check that all metadata definitions are present and current
    • Enlist the involvement of the Data SME or Data Stewards
    • Identify all source systems where the data originates, is entered, or is derived
II. Extract and profile the data
    • Extract the relevant data from all key source systems.
    • Design the profile. At a minimum, a profile will consist of total record counts, min/max values, frequency of unique values, and frequency of invalid values (if a valid domain is defined) for each data element profiled.
    • Profile the data to determine key characteristics that are contributing to the issue (a minimal profiling sketch follows this list), such as:
      1. Wrong values
      2. Missing values
      3. Corrupt transformation processes
      4. Incorrect business rules
      5. Incorrect usage rules
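
Here’s a minimal sketch of such a profile in Python with pandas. The file name, column name, and valid-value domain are hypothetical placeholders, not from any particular client system.

```python
# Minimal data-profiling sketch: total record count, min/max values,
# frequency of unique values, and frequency of invalid values
# (when a valid domain is defined) for one data element.
import pandas as pd

def profile_column(series: pd.Series, valid_values=None) -> dict:
    profile = {
        "total_records": len(series),
        "missing_values": int(series.isna().sum()),
        "min_value": series.min(),
        "max_value": series.max(),
        "unique_value_frequency": series.value_counts().to_dict(),
    }
    if valid_values is not None:
        # Count non-missing values that fall outside the defined domain.
        invalid = series.notna() & ~series.isin(valid_values)
        profile["invalid_values"] = int(invalid.sum())
    return profile

# Hypothetical extract from a key source system.
df = pd.read_csv("member_extract.csv")
print(profile_column(df["gender_code"], valid_values=["M", "F", "U"]))
```
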
III. Analyze Data Profile Results
    • Summarize the key findings from the profile detail
    • Determine which key drivers are contributing to the issue’s impact
    • Determine accountability for the data quality issue
    • Involve other Data Stewards in troubleshooting and designing the data quality solution
IV. Design the Corrective and Preventive Action Plans

Two types of plans should be developed to address known data quality issues: a corrective action plan that fixes the immediate source of the identified problem, and a preventive plan for ongoing monitoring, in which thresholds have been determined and metrics are routinely collected and reported to data stakeholders. This monitoring process should scale with the number of data elements being tracked.

    1. Corrective Action Plan
      • Does the scope of the problem warrant a change in metadata definitions, business practices, or data entry rules?
      • Does the scope of the problem warrant a data governance standard?
      • Does the corrective action plan include details on how to fix the source of the problem, as well as ways to correct historical data in the system?
    2. Preventive Action Plan
      • This plan is designed to minimize the probability of data quality issues recurring
      • Determine ‘early warning triggers’ based on designated thresholds. These thresholds should reflect the business tolerance for inaccurate data (is 95% acceptable?); a sketch of such a trigger follows this list
      • If data latency is the source of a data quality issue, then latency thresholds should be included in the monitoring plan
      • Determine how frequently results of the monitoring plan will be reported to data stakeholders or governance oversight committees
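
As a concrete illustration of an early warning trigger, here’s a brief sketch in Python. The metric name and the 95% threshold are illustrative assumptions, echoing the tolerance question above.

```python
# Sketch of threshold-based monitoring: routinely collected metrics
# are compared against business-defined tolerances, and any breach
# becomes an early warning for data stakeholders.
from dataclasses import dataclass

@dataclass
class QualityMetric:
    element: str        # data element being monitored
    pct_valid: float    # percent of records passing validation
    threshold: float    # business tolerance, e.g. 95.0

def early_warnings(metrics: list[QualityMetric]) -> list[str]:
    return [
        f"WARNING: {m.element} is {m.pct_valid:.1f}% valid, "
        f"below the {m.threshold:.0f}% tolerance"
        for m in metrics
        if m.pct_valid < m.threshold
    ]

# Hypothetical nightly metrics; report breaches on the agreed schedule.
for alert in early_warnings([QualityMetric("member_id", 93.4, 95.0)]):
    print(alert)
```
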
So, now that summer is officially here, this wraps up my Data Governance Primer series. Time for some iced tea and my favorite beach towel. Come August, these little refreshers might be just the thing!

photo by Swamibu via Flickr (Creative Commons License)


Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.

Posted June 24, 2010 6:00 AM
Permalink | No Comments |

By Carol Newcomb, Senior Consultant

Minding Your Metadata

The second part of my summertime primer addresses ‘Minding Your Metadata’. I can just hear the collective groans and yawns now. Sorry, but metadata collection is one of those necessary evils that may not be fun in the doing, but having it available as a resource to understand your data and use it appropriately is invaluable. And you just might find some interesting surprises along the way!


Metadata: What Is It & Why Do I Need It?

As you start your Root Cause Analysis (see last week’s primer), you first need to examine existing data definitions (or the lack thereof). Metadata is the foundation of good data management and forms the basis for Data Governance. Pardon me for stating the obvious, but metadata is fundamental to resolving data issues, and it is the first place to start when investigating data quality problems.

Metadata is “data about data”. Plain and simple. It includes descriptive information about electronic data used in common daily business practice. Metadata includes the items usually found in a data dictionary: field name, field length, retention rules, and security access, as well as additional descriptive information that may include data origin (source or system), creation/entry date, method of creation (key-entry or the result of a calculation), purpose of the data (its intended use), how frequently it gets updated or refreshed, and current location in a database (table, view, schema). If a data element is the result of calculation logic or groupings (such as age categories), the business rules used to generate the resulting data values should be collected as part of the metadata.
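
To make that list concrete, here is what one metadata record might look like, sketched as a plain Python dictionary. Every value shown is invented for illustration.

```python
# Hypothetical metadata record for a single data element, carrying the
# data-dictionary basics plus the descriptive attributes listed above.
date_of_birth_metadata = {
    # data-dictionary items
    "field_name": "member_date_of_birth",
    "field_type": "DATE",
    "field_length": 10,
    "retention_rule": "retain 7 years after account closure",
    "security_access": "restricted",
    # additional descriptive metadata
    "origin": "enrollment system",
    "creation_method": "key-entry",
    "intended_use": "age calculation for eligibility reporting",
    "refresh_frequency": "nightly",
    "location": "warehouse.member_dim.date_of_birth",
    # business rule behind any derived grouping, e.g. age categories
    "derivation_rule": "age_category = 10-year bands computed from age",
}
```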

A good example of metadata that you may use every day is ‘document properties’ in a Word document. This feature captures the original document creation date, the most recent access and update times, the document creator, and counts of characters, words, and pages. If the document should be private, this will be indicated in its properties. You may also tag the document with key words to make it easier for you or others to find.
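
The same properties can be read programmatically; here is a short sketch using the python-docx library (the file name is hypothetical):

```python
# Reading Word 'document properties' -- everyday metadata in action.
from docx import Document

doc = Document("vacation_itinerary.docx")  # hypothetical file
props = doc.core_properties
print("created: ", props.created)    # original creation date
print("modified:", props.modified)   # most recent update time
print("author:  ", props.author)     # document creator
print("keywords:", props.keywords)   # tags that make the file findable
```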

A few of the benefits of Metadata Management include:

  • Clarify rules for data entry
  • Reduce ambiguity around appropriate use of data elements
  • Eliminate problems associated with not having data definitions, business rules or transformation logic available
  • Validate legitimate values at the data element level
  • Provide evidence to regulators that security and confidentiality are protected
  • Centralize the storage and accessibility of metadata for end-users
  • Reduce the amount of effort required to research data results

A Metadata Management Repository is a central location or system to collect and store metadata that may exist in disparate parts of the organization (data dictionaries, systems, spreadsheets, or people’s brains). The metadata repository stores detailed definitions centrally on a network where other users can find them.

There are three general categories of metadata that should be included in this repository:

Business Metadata – Business metadata attributes facilitate the identification, understanding, and appropriate use of existing data elements. These include clear business names and descriptions, relevant business rules, descriptions of the data sources, security and privacy rules, etc.
Technical Metadata – Describes the technical attributes of data, such as physical location (host server, database server, schema, etc.), data types, any transformations applied, the domain of valid values, relationships to other data elements, precision, and lineage. Technical metadata is used by business users and by IT staff to design efficient databases, queries, and applications, and to reduce duplication of data.
Operational Metadata – Describes the attributes of routine operations on data and related statistics. These include job schedules and descriptions; data movement and transformation processes; data read, update, and performance statistics; volume statistics; and backup and archival information. Operational metadata is used by operations staff and DBAs to tune the system and ensure its continued efficient operation. It is also used by business users to track such events as the “last use” of a field and the “last load” of a data element.
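
One possible shape for such a repository, sketched as Python dataclasses with one record per data element; a hedged illustration, not a reference design:

```python
# Sketch: a central repository keyed by data element, with the three
# metadata categories described above grouped on each record.
from dataclasses import dataclass, field

@dataclass
class BusinessMetadata:
    business_name: str
    description: str
    business_rules: list[str] = field(default_factory=list)
    security_privacy_rules: str = ""

@dataclass
class TechnicalMetadata:
    physical_location: str              # host server, database, schema
    data_type: str
    valid_domain: list[str] = field(default_factory=list)
    lineage: str = ""                   # transformations applied upstream

@dataclass
class OperationalMetadata:
    load_job: str
    last_load: str = ""                 # 'last load' of the element
    last_use: str = ""                  # 'last use' of the field
    volume_statistics: dict = field(default_factory=dict)

@dataclass
class MetadataRecord:
    element_name: str
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: OperationalMetadata

# The repository itself: centrally stored, searchable by element name.
repository: dict[str, MetadataRecord] = {}
```
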
Exciting stuff, huh? Well, the whole point of metadata is to have the information about data available to a multitude of users when they need it, to keep it current, and to avoid confusion around usage. So if you appreciate having a clean bathroom and knowing where you keep your antiperspirant, you will also appreciate having good metadata! The time for spring cleaning is well overdue.

Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.


Posted June 17, 2010 6:00 AM
Permalink | No Comments |

By Carol Newcomb, Senior Consultant


They say that Data Governance is about People, Process, and Organization. Much of the design work in planning for data governance is around people’s roles and responsibilities, and then around designing the organizational structure that will provide the authority for decisions to be made and enforced. The processes, however, are not new. They are probably already being practiced within your organization, just in a decentralized, informal way. In this blog series, I discuss the processes for 1) investigating and isolating data quality issues through Root Cause Analysis, 2) starting to collect complete metadata definitions, and 3) performing Data Quality Analysis. Only when your governance group has worked through each step, in order, will you be more likely to design the appropriate solution.

Root Cause Analysis

The process of data governance is fundamentally very simple: seven steps, sketched in code after the list below.

  1. Identify the data quality issues to address
  2. Prioritize the portfolio of issues to isolate/tackle the most important
  3. Perform Root Cause Analysis to determine the true source of the data issue
  4. Design the corrective action
  5. Formalize the correction through consideration & approval by the Data Governance organization
  6. Implement the fix
  7. Monitor the results
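
Sketched in code, the seven steps above form a simple, ordered lifecycle that every issue moves through; the stage names here are mine, not a formal standard.

```python
# Each data quality issue advances through the same ordered stages.
from enum import Enum

class IssueStage(Enum):
    IDENTIFIED = 1
    PRIORITIZED = 2
    ROOT_CAUSE_ANALYZED = 3
    CORRECTION_DESIGNED = 4
    APPROVED = 5        # formalized by the Data Governance organization
    IMPLEMENTED = 6
    MONITORED = 7

def advance(stage: IssueStage) -> IssueStage:
    """Move an issue to the next stage; stages are never skipped."""
    if stage is IssueStage.MONITORED:
        return stage    # terminal stage: monitoring is ongoing
    return IssueStage(stage.value + 1)
```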

When we start to map out the discrete steps involved in the data governance process, it becomes clear that much of the work is already being done informally throughout the organization. What some folks don’t realize is that data governance is often nothing more than formalizing a whole bunch of informal processes that either don’t get communicated or aren’t accepted as a data standard.

Root Cause Analysis is the process of identifying probable causes of a data issue and isolating the contributing factors. In order to resolve any particular issue, root cause analysis involves fact-finding, drilling into the details of the problem, talking to the right people, and separating out other associated (but not contributing) factors.

A standard tool for organizing the detailed findings is the Ishikawa (fishbone) diagram, shown below.

[Figure: Ishikawa diagram]
To conduct a thorough Root Cause Analysis, use the following checklist:
  • Diagnose the problem as if you were a physician or a detective. Consider all possible sources of the symptom. Don’t rule anything out yet!
  • Boil the ocean—be exhaustive and creative.
  • Don’t jump into problem solving before collecting all possible causes.
  • Practice the “5 Whys”—don’t stop asking “Why?” until you have exhausted every conceivable potential reason.
  • Rank the factors if possible. Identify the primary causes versus the secondary or associated factors.
  • Rule out each possible factor one at a time, and justify why (you may need to come back to this later).
  • Find all potential business process and data owners and involve them in your understanding of the possible sources of the problem.
  • Share the findings with everyone involved in troubleshooting; their knowledge may rule out certain factors.
  • Test your hypotheses with actual data.
  • Fix the problem and test again.
  • Publish and share your findings and fixes. Communicating your findings may reveal additional factors you hadn’t considered.

After a thorough Root Cause Analysis has been completed, Data Stewards should proceed to Metadata Analysis and Data Quality Analysis. These two techniques will be discussed in my next blogs.


Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.


Posted June 10, 2010 6:00 AM
Permalink | No Comments |

By Caryn Maresic, Senior Consultant

[Photo: Parents on Vacation, via Flickr (Creative Commons)]

Julia’s parents were planning a vacation. Her mother thought Pensacola would be a great destination—she’d heard so much about the wildlife, especially the dolphins! Her father wanted to see the National Naval Aviation Museum and the Blue Angels. Since Julia had traveled extensively, her parents asked her to make all the arrangements. While having dinner with them to discuss plans, she jotted down the following notes:

  • Location: moderately priced hotel close to water/sights
  • Budget: $3,000 for transportation and accommodations
  • Activities: beach and nature activities (Mom), science/historic sights (Dad)
  • Duration: 10 days

Julia felt honored that her parents trusted her to get the job done. After doing some online research, she made all the reservations, then met with her parents to review them. She eagerly awaited the look on her parents’ faces as they scanned the vacation itinerary and read through the glossy brochures.

“Hawaii?” they said in unison. “We didn’t want to go to Hawaii!”

"Honey, we chose Florida because we can drive there.   I don’t want to fly anymore.   Flying is such a pain,” Dad grumbled.

“I appreciate what you’ve done, Julia, but an old friend of mine lives near Pensacola, and I was hoping to visit while we were there,” said Mom.

“But, Mom!” exclaimed Julia. “You said you wanted beaches, dolphins, sunny weather. Dad, you like science and history—what about Pearl Harbor? You two can’t go to the Gulf Coast—what about the oil spill?”

What happened here is typical of IT projects. It’s easy to say that we wouldn’t do what Julia did. Would we? Don’t we oftentimes:

  • Interview the business and record the requirements in an abstract way.
  • Believe that we can deliver something better than what the business asked for.
  • Assume that the business lacks the capability to understand the technology.
  • Fail to get all of the requirements. (Not exactly our fault, but still a problem.)
  • Neglect to keep the business involved in the process.

There has been a lot of buzz about IT-business alignment of late, including this article on specific companies that are going the extra mile: Beyond Alignment—as well as this one on the lack of user involvement: Why IT Projects Fail: Lack of User Involvement. Most companies aren’t as progressive. The willingness to work together has to occur at all levels. Only when we let the business drive can we deliver, if not what they asked for, then at least something useful.

photo by stevendepolo via Flickr (Creative Commons license)


Caryn has over 20 years’ experience providing high-quality data solutions to clients in the areas of Business Intelligence, Data Warehousing, and System Integration. Caryn has expertise across industries, with an emphasis on Pharmaceutical, Manufacturing, and Insurance. Prior to joining Baseline, she ran her own consulting company.


Posted June 3, 2010 6:00 AM
Permalink | No Comments |