When Claudia Imhoff, Shawn Rogers, and I got together for lunch the other day, we discussed the notion of a System of Record (SoR) - it's a very interesting topic. SoR has long had a single definition: the data residing in the source systems. Today there are multiple definitions (three, to be exact), particularly since MDM evokes new notions of what SoR means to the business, as does a compliant and auditable enterprise warehouse. In this entry I'll walk through those definitions. I'll also be discussing many of these things in my MDM night course at TDWI in August (2006, San Diego).
Let's start with the three types of definitions. The first is the widely accepted definition, based on the origination point of the data (the source systems). The second is not so well known, but those of you with a normalized and AUDITABLE enterprise warehouse will be happy to see it. The third is based on the ever-present "single view of today's enterprise". Many people and vendors call this "Single Version of the Truth", but truth is subjective to EACH individual user, so in the purist sense a single truth cannot actually exist!
SoR Definition 1: The data that exists in the source system - in other words, where the data is entered or originates for the first time. The source system holds the record of entry, creation, or origination for the information it houses. Hopefully this system is auditable and compliant. The data might not be "clean, quality checked, or integrated" unless it sits in some sort of master list (master data set). Most of this data is shipped across the enterprise to other systems, and to an enterprise data warehouse for integration.
SoR Definition 2: This data resides in a NORMALIZED enterprise data warehouse, and is auditable and compliant. It is RAW data that is INTEGRATED by business keys (see Data Vault Data Modeling). It is NOT cleansed, altered, modified, or quality checked - but it is auditable, meaning that an auditor can trace the data back to the source system from which it came. The integration point is the master key (business key) that horizontally integrates data across the enterprise. Duplicates reside within this data set, and so does dirty data. The data in the normalized warehouse is captured by type of data and rate of change. This data is often used by the business to FIND broken business processes. If you don't have source system raw data in an integrated fashion, it will be harder to spot the trends, and the broken business processes will continue to go unnoticed. I've seen this save companies millions of dollars. This is a RAW system of record: it is the only place in the entire enterprise where this integrated version of uncleansed raw data exists.
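To make Definition 2 a little more concrete, here is a minimal Python sketch of raw rows being integrated by business key while staying uncleansed and traceable. Everything in it is an assumption made for illustration: the `RawRecord` fields, the `integrate_by_business_key` helper, and the sample customer rows are hypothetical, and this is not the Data Vault model itself, just the general idea of "integrate, don't alter".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RawRecord:
    # All field names here are hypothetical, chosen for illustration only.
    business_key: str   # the key the business uses across systems
    source_system: str  # where the row originated (the audit trail)
    load_date: str      # when the warehouse received the row
    payload: dict       # attributes exactly as delivered, uncleansed


def integrate_by_business_key(records):
    """Group raw rows by business key without changing or de-duplicating them."""
    integrated = defaultdict(list)
    for rec in records:
        integrated[rec.business_key].append(rec)
    return dict(integrated)


rows = [
    RawRecord("CUST-001", "crm", "2006-04-01", {"name": "Acme Corp"}),
    RawRecord("CUST-001", "billing", "2006-04-02", {"name": "ACME corp."}),
    RawRecord("CUST-001", "billing", "2006-04-02", {"name": "ACME corp."}),
    RawRecord("CUST-002", "crm", "2006-04-01", {"name": "Globex"}),
]

warehouse = integrate_by_business_key(rows)

# Duplicates and dirty values are kept, and every row still names its source.
for key, recs in warehouse.items():
    print(key, [(r.source_system, r.payload["name"]) for r in recs])
```

Note that the duplicate billing row and the inconsistent spellings survive on purpose. In this sketch, those are exactly the signals a business user would look at to find a broken process.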
SoR Definition 3: Master Data, or Conformed Dimensions - data that has been cleansed and quality checked, with duplicates removed. It is seen and used by the business as the "single version of today's truth", it is covered by master metadata, and it is understood by the organization to have meaning. In the master data set there is only a SINGLE copy of each item (customer, part, work order, supplier, etc.). This is an SoR by business standards because it represents value to the business in eliminating duplicates and understanding how the business looks "TODAY". It's a snapshot of the current, consistent, quality-cleansed information that feeds the rest of the source systems.
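As a rough illustration of Definition 3, the Python sketch below collapses raw rows into a single cleansed record per business key. The survivorship rule (latest load date wins) and the cleansing step (trim whitespace, title-case the name) are my assumptions for the example, not rules any particular MDM product prescribes, and `build_master` and the sample rows are hypothetical.

```python
def build_master(raw_rows):
    """Collapse raw rows to one cleansed record per business key.

    Assumed survivorship rule: the row with the latest load_date wins.
    Assumed cleansing rule: collapse whitespace and title-case the name.
    """
    master = {}
    # Sorting by load_date means later rows overwrite earlier ones.
    for row in sorted(raw_rows, key=lambda r: r["load_date"]):
        master[row["business_key"]] = {
            "business_key": row["business_key"],
            "name": " ".join(row["name"].split()).title(),
        }
    return master


# Hypothetical raw rows, with duplicates and inconsistent formatting.
raw_rows = [
    {"business_key": "CUST-001", "load_date": "2006-04-01", "name": "acme  corp"},
    {"business_key": "CUST-001", "load_date": "2006-04-02", "name": " ACME corp "},
    {"business_key": "CUST-002", "load_date": "2006-04-01", "name": "globex"},
]

master = build_master(raw_rows)
for record in master.values():
    print(record)
```

The result holds a single copy of each customer, which is the "TODAY" view. The trade-off is that the history and the duplicates are gone, which is why the raw, integrated copy in Definition 2 still matters for audit.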
So, as you can see, there is value in each type of System of Record. So what, exactly, is a system of record itself?
I would suggest that a System of Record has the following characteristics:
1. It's a data origination point (the only place in the enterprise where this "vision" of data exists).
2. It begins to feed other systems, providing automatic feedback to source systems, and becoming a part of the operational LOOP in business decision making.
3. In some cases it's auditable and traceable; in other cases it's quality cleansed. In all cases it provides business value in different formats and assists the business in DOING business on a daily basis.
If you have questions or comments, I'd love to hear them. Please post them below.
CTO, Myers-Holum, Inc.
Posted April 15, 2006 7:12 AM