Business Intelligence Network business intelligence resources

Blog: Krish Krishnan


Does data have a lifecycle? - Part 1

Data warehouses evolve over time, and data within the warehouse eventually reaches a maturity point beyond which it becomes obsolete. Is the simple answer to just delete or archive this data? No. Deleting the data frees up space but does not solve the underlying problem, which is the value of the data element (read: attribute, in the E-R world) itself.


Sit back and take a look at the data evolution in your data warehouse. When you start examining it, you will see that your data architecture has evolved constantly to accommodate source system changes and end-user demands, while some data elements from legacy source systems no longer even reach the data warehouse.

When you scream, "My goodness, why do we have all this extra data?", you will realize that data within the data warehouse does lose its value, thereby reaching the end of its lifecycle. It also suddenly makes sense that archiving the data is not the solution.

How do you determine the lifecycle of data in the data warehouse? To answer this question, you will need information about the following:
• Data Lineage
• Data Dependency Matrix
• Data Usage (this activity needs more definition and methodology, and will be addressed in subsequent blog posts)
• Metadata – Business and Technical
• Report usage activity for reports with the legacy or obsolete data
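
As a rough illustration of how these inputs might be combined, the Python sketch below flags data elements that look like retirement candidates. Everything in it is an assumption made for illustration: the ColumnProfile record, its field names, the retirement_candidates function and the 365-day idle threshold are hypothetical and do not come from DW2.0 or any specific tool. Its output is only a shortlist to bring to the data governance committee, not a decision.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ColumnProfile:
    """Hypothetical per-element summary built from lineage, usage and report logs."""
    name: str
    last_loaded: Optional[date]    # data lineage: last time a source system fed this element
    last_queried: Optional[date]   # data usage: last time any query touched it
    dependent_reports: List[str] = field(default_factory=list)  # dependency matrix / report usage


def retirement_candidates(profiles, today, idle_days=365):
    """Return names of elements that are no longer loaded, no longer queried,
    and have no dependent reports. The idle_days threshold is an assumption."""
    candidates = []
    for p in profiles:
        load_idle = p.last_loaded is None or (today - p.last_loaded).days > idle_days
        query_idle = p.last_queried is None or (today - p.last_queried).days > idle_days
        if load_idle and query_idle and not p.dependent_reports:
            candidates.append(p.name)
    return candidates


# Example with made-up element names and dates.
profiles = [
    ColumnProfile("legacy_region_code", date(2005, 1, 1), date(2005, 6, 1)),
    ColumnProfile("customer_id", date(2007, 8, 1), date(2007, 8, 5), ["Monthly Sales"]),
    ColumnProfile("old_fax_number", None, date(2007, 7, 1)),
]
print(retirement_candidates(profiles, today=date(2007, 8, 6)))
```

In this sketch an element is shortlisted only when all three signals agree; an element that is still queried occasionally, such as old_fax_number above, stays off the list even though no source system feeds it anymore.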

Once you have gathered the relevant information about the data, you will compile a findings and recommendations document, meet with the data governance committee, and get final approval for the obsolete data removal.

Removing obsolete data, along with its definition, benefits the data warehouse in the following areas:
• Data Movement
• Data Processing
• Data Presentation
• Data Warehouse Performance

Remember that this is not a simple task; it requires elaborate planning and execution. You will be changing the data definitions for dimensions in the data warehouse, which requires testing and user approval before the change goes to production.

Fortunately for us, from a methodology perspective, Bill Inmon’s DW2.0 will serve as a blueprint for this exercise. For more information on DW2.0, please visit www.inmoncif.com or watch this space in the upcoming months for articles on this topic.

The second part of this blog will address some technology-specific ideas.

Posted by kkrishnan on August 6, 2007 12:31 PM

Comments

Thank You

hello sir,
i am a fresher who wants to start his career in data warehousing, but i hear that freshers are not considered for data warehousing right away and prefer people who have some domain knowledge (banking,telecom, etc) and have been working as data warehouse professionals. so could u please reply to this query of mine. please post ur reply to my email id (vyomkeshrishi3@gmail.com)
