Data warehouses evolve over time, and data within the warehouse eventually reaches a maturity point beyond which it is obsolete. Is the simple answer to just delete or archive this data? No. Deleting the data frees up space but does not solve the underlying problem, which is the value of the data element (read: attribute, in the E-R world) itself.
Sit back and take a look at the data evolution in your data warehouse. You will see that your data architecture has been changing constantly to accommodate source system changes and end user demands, while some data elements from legacy source systems no longer reach the data warehouse at all.
When you scream, "My goodness, why do we have all this extra data?", you will realize that data within the data warehouse does lose its value and thereby reaches the end of its lifecycle. Now it makes sense, all of a sudden, that archiving the data is not the solution.
How do you determine the lifecycle of data in the data warehouse? To answer this question, you will need information about the following:
• Data Lineage
• Data Dependency Matrix
• Data Usage (This is an activity that needs more definition and methodology and will be addressed in subsequent blogs)
• Metadata – Business and Technical
• Report usage activity for reports with the legacy or obsolete data
Once you have gathered the relevant information about the data, you will compile a findings and recommendations document, meet with the data governance committee, and get final approval for the removal of the obsolete data.
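As a rough illustration of how the gathered inputs might be combined into a candidate list for the findings document, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `AttributeProfile` structure, its field names, the sample attributes, and the one-year staleness threshold are all hypothetical, and a real assessment would also draw on business and technical metadata. The idea is simply that an attribute is flagged only when lineage, dependency, and usage all point the same way.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class AttributeProfile:
    """Hypothetical summary of what was gathered for one warehouse attribute."""
    name: str
    last_loaded: date        # from data lineage: last time a source fed it
    queries_last_year: int   # from data usage monitoring
    dependents: list = field(default_factory=list)      # from the dependency matrix
    active_reports: list = field(default_factory=list)  # from report usage activity


def obsolete_candidates(profiles, as_of, stale_after_days=365):
    """Return names of attributes that look obsolete on every input at once."""
    cutoff = as_of - timedelta(days=stale_after_days)
    candidates = []
    for p in profiles:
        no_longer_fed = p.last_loaded < cutoff
        unused = p.queries_last_year == 0 and not p.active_reports
        nothing_depends = not p.dependents
        if no_longer_fed and unused and nothing_depends:
            candidates.append(p.name)
    return candidates


# Illustrative sample data only; not taken from any real warehouse.
profiles = [
    AttributeProfile("legacy_region_code", date(2004, 3, 1), 0),
    AttributeProfile("customer_segment", date(2007, 8, 1), 412,
                     dependents=["sales_mart"], active_reports=["weekly_sales"]),
    AttributeProfile("old_branch_id", date(2003, 1, 15), 0,
                     dependents=["branch_rollup"]),
]

print(obsolete_candidates(profiles, as_of=date(2007, 8, 6)))
# → ['legacy_region_code']
```

Note that `old_branch_id` is not flagged even though it is stale and unqueried, because something downstream still depends on it. A list like this is only a starting point for the governance discussion, not a decision.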
Removing obsolete data, along with its definition, benefits the data warehouse in the following areas:
• Data Movement
• Data Processing
• Data Presentation
• Data Warehouse Performance
Remember that this is not a simple task and requires elaborate planning and execution. You will be changing the data definitions for dimensions in the data warehouse, thereby requiring testing and user approval before this goes to production.
Fortunately for us, from a methodology perspective, Bill Inmon’s DW2.0 will serve as a blueprint in this exercise. For more information on DW2.0, please visit www.inmoncif.com or watch this space in upcoming months for articles around this topic.
A second part of this blog will address some technology specific ideas.
Posted August 6, 2007 12:31 PM