Business Intelligence Network business intelligence resources

Blog: Krish Krishnan


Does data have a lifecycle? Part II

What impact does the data lifecycle within a data warehouse have on your overall costs? Let us start examining this by looking at a scenario.

ABC Corporation deployed an enterprise data warehouse about six years ago and has been using it ever since. Over time there have been a number of changes to the source systems that feed the data warehouse and to the business rules governing data transformation into it. The data warehouse is currently experiencing severe service level agreement issues with both data availability and warehouse availability. Its initial size was 500GB, and it has grown to over 5TB in six years.

At the most recent meeting, the CFO of the corporation asked the IT and IS departments to give him a total cost of ownership (TCO) report on the data warehouse. Based on the total size of the data warehouse and its importance across the company, we can estimate the following (figures shown are sample dollar values):

Initial cost of hardware - $750,000
Initial cost of software (ETL, RDBMS, OS) - $300,000
Initial cost of deployment (services, installation etc) - $1,200,000
Initial cost of backup and recovery solution - $300,000

Ongoing annual maintenance of hardware and software - $100,000
Ongoing annual cost of deployment (upgrades, new programs) - $500,000
Ongoing annual backup costs - $400,000
Ongoing annual spend on storage - $600,000

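Looking at the figures above, the TCO for the solution is well into the millions: the initial costs alone come to $2.55 million, and the ongoing costs add another $1.6 million every year. That is a lot of money spent on a solution that cannot sustain performance or meet deadlines on data availability.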
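The arithmetic behind the sample figures can be sketched in a few lines of Python. The dictionary keys and the function name are illustrative, and the six-year total assumes the ongoing costs stayed flat for all six years, which the scenario does not actually state.

```python
# Sample figures from the scenario (US dollars).
INITIAL_COSTS = {
    "hardware": 750_000,
    "software (ETL, RDBMS, OS)": 300_000,
    "deployment (services, installation)": 1_200_000,
    "backup and recovery solution": 300_000,
}
ANNUAL_COSTS = {
    "hardware and software maintenance": 100_000,
    "deployment (upgrades, new programs)": 500_000,
    "backup": 400_000,
    "storage": 600_000,
}


def total_cost_of_ownership(years: int) -> int:
    """Initial outlay plus ongoing costs over `years`.

    Assumption: ongoing costs are the same every year.
    """
    return sum(INITIAL_COSTS.values()) + years * sum(ANNUAL_COSTS.values())


initial_total = sum(INITIAL_COSTS.values())
annual_total = sum(ANNUAL_COSTS.values())
six_year_tco = total_cost_of_ownership(6)

print(f"Initial outlay:  ${initial_total:,}")
print(f"Annual run cost: ${annual_total:,}")
print(f"Six-year TCO:    ${six_year_tco:,}")
```

Under that flat-cost assumption, the ongoing spend overtakes the initial outlay before the end of the second year, which is why the storage and backup lines deserve the closest look.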

Given this situation, when you start looking at reducing costs and increasing the data warehouse availability, you start assessing which portions of the data warehouse are becoming expensive to maintain and how to mitigate the performance issues.

On closer examination, it is discovered that a significant portion of the data in the data warehouse is not used at all and can be completely removed, including its definitions and metadata. Removing it, however, affects the data movement and loading processes. On top of that, you would need to conduct another assessment of data usage to ensure that removing the data does not cause downstream issues. Given all of these complexities, a reasonable initial strategy is to archive the unused data and remove its metadata definitions, which relieves the immediate pain. The overall TCO question still remains to be solved. Archiving also brings an associated problem: how to provide user access to the legacy data and its metadata in an archived state while keeping the overall TCO manageable.
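As a rough illustration of the usage assessment described above, the sketch below flags tables that have not been queried for a given period and that no downstream process depends on. Everything in it is hypothetical: the `TableUsage` record, the table names, sizes, dates, and the one-year idle threshold are invented for the example and are not taken from ABC Corporation's warehouse.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TableUsage:
    """Hypothetical usage record for one warehouse table."""
    name: str
    size_gb: float
    last_queried: date
    feeds_downstream: bool  # True if another load or report depends on it


def archive_candidates(tables, as_of, idle_days=365):
    """Return tables idle for at least `idle_days` with no downstream dependents."""
    return [
        t for t in tables
        if (as_of - t.last_queried).days >= idle_days and not t.feeds_downstream
    ]


# Invented sample data, for illustration only.
tables = [
    TableUsage("orders_2001", 900.0, date(2005, 3, 1), False),
    TableUsage("customer_dim", 40.0, date(2007, 8, 1), True),
    TableUsage("legacy_rates", 300.0, date(2004, 11, 15), True),
]

candidates = archive_candidates(tables, as_of=date(2007, 8, 7))
reclaimable_gb = sum(t.size_gb for t in candidates)

for t in candidates:
    print(f"archive candidate: {t.name} ({t.size_gb:.0f} GB)")
print(f"reclaimable: {reclaimable_gb:.0f} GB")
```

Note that `legacy_rates` is idle but still excluded because something downstream depends on it. That is the kind of case where removing data without a usage assessment would cause the downstream issues mentioned above.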

Data lifecycle management within a data warehouse is a topic that needs attention and focus. It becomes a significant exercise considering the impact it has on the overall TCO of the data warehouse. TCO management for the data warehouse is a topic for another day.

Posted by kkrishnan on August 7, 2007 12:32 PM
