Business Intelligence Network

Blog: Dan E. Linstedt


DW2.0 - Introductory Thoughts

I've been granted permission by Bill to discuss DW2.0 on the blog, and in other articles that I write. This entry is an introductory look at DW2.0: the overall definition, sections, and components. If you wish to use the terms yourself, you will need to contact Bill directly. I've included Bill Inmon's stringent legal terms below:

"The definition of DW2.0 is intended for the non commercial use of anyone who wants to use the material. However, any commercial use of the material and the trademark is strictly forbidden and will be vigorously monitored and prosecuted. Commercial usage of DW2.0 specifically pertains to (but is not limited to) commercial usage in seminars, presentations, books, articles, speeches, web sites, white papers, panel discussions, reports, and other written and oral forms is forbidden. If you wish to use material about DW2.0 commercially, licensing can be arranged for a fee."

There are four sectors of DW2.0, which together comprise the "data warehouse" in a disciplined format (note: all quoted material is from Bill Inmon's site and description of DW2.0):

Interactive Sector - The place where high performance data warehouse processing occurs
Integrated Sector - The place where integrated data resides
Near Line Sector - The place where data with a lower probability of access resides
Archival Sector - The place where data with a truly low probability of access resides

From a 3000 ft perspective, each "sector" looks (at first) like a separate copy of the data, but this may not turn out to be the case. In fact, the sectors can be made into logical divisions - particularly if the data model underneath supports the logical architecture in a physical format. I've created a public domain (freely available) data modeling architecture called the Data Vault which supports both the interactive and integrated sectors. The Near Line and Archival Sectors appear (at first glance) to be more physically related to storage. I'll dive into these in future blog entries.

In my opinion, the RDBMS vendors should be the first to stand up and take notice (along with the appliance vendors). They should be rushing to the table to support DW2.0 from a mechanical standpoint - offering the developers "seamless" integration across each of the four sectors. That would bring the reality of a logical model and metadata management to the implementation cycles. I long for the day when I can "logically model" the data and no longer care (or know) how the physical implementation takes place - the only addition to the logical model might be data types and field lengths from the physical world.

Let's switch gears and discuss DW2.0 compliance, auditability, and SOR (system of record) for a moment. Below is Bill's definition of SOR and the best place to identify data as arriving from an SOR.

Because the data that enters DW2.0 has its first appearance in the operational environment, great care needs to be taken with the data. In a word, the data that eventually finds its way into DW2.0 needs to be as accurate, up to date, and complete as possible. There needs to be defined what can be determined the source data system of record. The source data system of record is the data that is the best source of data.

I often ponder the question: what does SOR truly mean? Hmmm - by that I wonder about the following case study (which actually happened to me 10 years ago on a government data warehouse).

We built a data warehouse that contained a master parts list and a few other master lists (hence my recent entries on Master Data Management). Our warehouse also contained integrated data organized by business key, but stored at the lowest level of grain. Furthermore, the information was not "transformed" except in raw data type, and defaults were assigned only in specific cases documented by SLAs with the business.
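To make that loading rule concrete, here is a minimal sketch in Python. It is not the code from that project: the field names, the DOCUMENTED_DEFAULTS table, and the record_source and load_time columns are assumptions made purely for illustration. The only changes applied to a source row are a data type conversion and a default that was agreed on in advance, and the row records which fields were defaulted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical defaults, standing in for values agreed with the business in an SLA.
DOCUMENTED_DEFAULTS = {"unit_cost": Decimal("0.00")}


@dataclass(frozen=True)
class PartRow:
    part_number: str         # business key, stored exactly as received
    unit_cost: Decimal       # raw value, changed in data type only
    defaulted_fields: tuple  # names of fields filled from DOCUMENTED_DEFAULTS
    record_source: str       # which source system the row arrived from
    load_time: datetime      # when the warehouse received the row


def load_part(source_row: dict, record_source: str) -> PartRow:
    """Load one source row with no business-rule transformation."""
    defaulted = []
    raw_cost = source_row.get("unit_cost")
    if raw_cost is None or raw_cost.strip() == "":
        # A default is applied only where one has been documented.
        unit_cost = DOCUMENTED_DEFAULTS["unit_cost"]
        defaulted.append("unit_cost")
    else:
        # Data type conversion only: text to Decimal.
        unit_cost = Decimal(raw_cost.strip())
    return PartRow(
        part_number=source_row["part_number"],
        unit_cost=unit_cost,
        defaulted_fields=tuple(defaulted),
        record_source=record_source,
        load_time=datetime.now(timezone.utc),
    )
```

Because the row keeps track of where it came from and which values were defaulted, an auditor can tell apart "what the source said" from "what the warehouse filled in".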

Auditors were brought in because naysayers were stating that the warehouse was "wrong", and they wanted the project stopped. Three things happened. The first concerned data auditability. The auditors asked: why do the reports from the data marts not match the operational reports? Our team demonstrated the value of raw integrated data (both bad and good) stored within the warehouse, and showed that the warehouse reflected what was in the source system. The auditor passed the warehouse, and then proceeded to tell the business that the operational report (a financial calculation) was wrong and needed to be corrected. The business would not have had "accountability", much less found or fixed the problem, if our data warehouse had not been deemed a "reliable and compliant" source of data.

The second thing that happened (at the same time): the auditor saw the parts list, employee list, work order list, and so on... and then asked: does this "vision of integrated data" exist in any one source system? The answer was clearly no. The auditor then checked the individual data elements for auditability and traced them back to their source systems. Once satisfied, he labeled the warehouse suitable to become a "system of record", as it was the only place where that data existed.
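The kind of trace-back the auditor performed can be sketched roughly as follows. This is an illustrative Python sketch, not the audit procedure that was actually used: the Lineage tuple, the dictionary layouts, and the function name are all hypothetical. The idea is that every warehouse data element carries a pointer to the source system and source key it came from, and the check reports any element that cannot be found in that source with the same value.

```python
from typing import NamedTuple


class Lineage(NamedTuple):
    source_system: str  # name of the originating system
    source_key: str     # key of the originating record in that system


def untraceable_elements(warehouse_rows, source_systems):
    """Return (business_key, field) pairs that do not match their source."""
    problems = []
    for row in warehouse_rows:
        for field, (value, lineage) in row["fields"].items():
            system = source_systems.get(lineage.source_system, {})
            source_record = system.get(lineage.source_key)
            # Flag the element if the source record is missing or disagrees.
            if source_record is None or source_record.get(field) != value:
                problems.append((row["business_key"], field))
    return problems


# Hypothetical sample data: two source systems and one integrated row.
source_systems = {
    "inventory": {"17": {"description": "Hex bolt"}},
    "purchasing": {"A-9": {"unit_cost": "0.12"}},
}
warehouse_rows = [
    {
        "business_key": "P-100",
        "fields": {
            "description": ("Hex bolt", Lineage("inventory", "17")),
            "unit_cost": ("0.12", Lineage("purchasing", "A-9")),
        },
    }
]
```

With this sample data, untraceable_elements(warehouse_rows, source_systems) returns an empty list: the integrated row exists in no single source system, yet every element on it traces back to one.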

The third thing that happened: the auditor then asked for a source system that was called "the master system" for bill of materials to be re-loaded with 5-year-old data. But the business had changed, the models in the source system had changed, and the restore could not take place - making it impossible for the "master system" to be a system of record for historical data. The only place that data could be loaded was the warehouse.

As I read through the DW2.0 specification, I believe there is a place for accountability, SOR, and compliance within the warehouse. Again, it has a lot to do with the traceability of the data sets and creating audit trails where they didn't exist before. We'll dive into this more later.

For now, if you have thoughts or comments - I'd love to hear about them. What part of DW2.0 would you like to know about?

Thank you,
Dan L

Posted by Dan Linstedt on March 9, 2006 6:59 PM
