The Eternal Data Warehouse
by Bill Inmon
Originally published April 15, 2010
Every now and then it is worthwhile to think about what-ifs and why-nots. One of the interesting subjects for this exercise is how much historical data belongs in a data warehouse.
It is normal for a corporation to keep 3 to 5 years of data in its data warehouse. Occasionally you see a corporation with 10 years’ worth of data. There are some exceptions to this rule. In a life insurance company, it is normal to see a data warehouse go back 100 years. Actuaries need that deep historical data to calculate when we are all going to die (an odd job if there ever was one). And in certain engineering and research environments, experiments run and data collected years ago are every bit as relevant and useful as data collected yesterday. So there are some environments where data just doesn’t seem to age.
And perhaps the oldest historical data warehouse (if you can call it a warehouse) is that of the Mormon Church (The Church of Jesus Christ of Latter-day Saints). We are told that in some cases, the Mormons’ genealogical records go back 2,000 years.
In terms of history, how much data should a data warehouse contain? Let’s start with retail. How valuable is retail data when it is historical? Is it useful to compare a sale made today with a sale made last week? Last month? Last year? Ten years ago?
Comparing sales from week to week gives an early warning of trends that may be developing. It is hard to argue that an early warning of trends is not valuable. Comparing sales month to month is valuable for the same reason, except that the perspective is a little further removed in time. Looking at a whole month makes it easier to smooth out the daily peaks and valleys that may occur. Looking at sales from one year to the next is a very common technique used by management and accountants alike. Such comparisons are a good indicator of the progress (or lack thereof) being made by a company.
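The week-to-week and month-to-month comparisons described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular warehouse: the function name period_over_period, the dictionary of daily totals, and the sample figures are all assumptions made for the example.

```python
from datetime import date, timedelta


def period_over_period(daily_sales, period_days):
    """Percent change of the latest period versus the period just before it.

    daily_sales: dict mapping datetime.date -> sales total for that day.
    period_days: length of each period in days (7 for week over week).
    Returns None when the earlier period has no sales to compare against.
    """
    latest = max(daily_sales)
    current_start = latest - timedelta(days=period_days - 1)
    prior_start = current_start - timedelta(days=period_days)

    current = sum(v for d, v in daily_sales.items() if current_start <= d <= latest)
    prior = sum(v for d, v in daily_sales.items() if prior_start <= d < current_start)

    if prior == 0:
        return None
    return 100.0 * (current - prior) / prior


# Hypothetical data: 14 days of sales, 100 per day in the first week
# and 110 per day in the second.
start = date(2010, 4, 1)
sample_sales = {start + timedelta(days=i): (100 if i < 7 else 110) for i in range(14)}

week_over_week = period_over_period(sample_sales, 7)
```

Passing a longer period_days (say 30) gives the smoother, month-level view at the cost of a later warning.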
Now a comparison of sales from one year to another across a five-year period starts to become suspect. Over a five-year period, many factors have changed: inflation, product lines, competition, packaging, the customer base, and so forth. To be honest, a five-year comparison of sales data is questionable. And what about a 25-year comparison of sales data? Comparing sales data over 25 years may be an interesting thing to do, but it is doubtful that such a comparison will lead to any insight. The world is so different today from what it was 25 years ago that such a comparison is very dodgy.
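To see how just one of those factors can distort a five-year comparison, here is a small worked sketch that restates an old sales figure in today's money before comparing. The function names, the sales figures, and the price index values are all hypothetical, and the adjustment covers inflation only; it does nothing about changes in product lines, competition, or the customer base.

```python
def to_constant_currency(nominal_amount, index_then, index_now):
    """Restate an old amount in today's money using a price index ratio."""
    return nominal_amount * index_now / index_then


def real_change_percent(old_nominal, new_nominal, index_then, index_now):
    """Percent change after restating the old amount in today's money."""
    old_in_todays_money = to_constant_currency(old_nominal, index_then, index_now)
    return 100.0 * (new_nominal - old_in_todays_money) / old_in_todays_money


# Hypothetical figures: sales grew from 1,000 to 1,100 over five years
# while the price index rose from 100 to 120.
nominal_change = 100.0 * (1100 - 1000) / 1000
real_change = real_change_percent(1000, 1100, index_then=100, index_now=120)
```

With these invented numbers, the unadjusted figures show growth while the inflation-adjusted figures show a decline, which is the kind of reversal that makes a raw long-range comparison hard to trust.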
Now let’s stretch it out. What about sales data that is 100 years old? There may be some historical value in such a comparison, but it is highly likely that there is no business insight that can be gained by looking at such data.
Now let’s move away from sales data and go to a related (but different) topic. Let’s go to financial data. For obvious reasons, financial comparisons of data from month to month, quarter to quarter, and year to year make sense. These comparisons are so normal and common that they don’t deserve comment. But what about financial comparisons over five years? Or over 25 years? These comparisons are worthwhile if for no other reason than that they give insight into the corporate position in the marketplace. And over lengthy periods of time, this perspective can be quite useful.
But there is another reason why financial information over a lengthy period of time can be useful: it shows how a corporation reacts to the different phases of an economic cycle. How does a company do during a recession? How does a company do during expansion? For this reason, it makes sense to keep financial data in a data warehouse through at least one cycle of economic expansion and contraction.
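As a rough sketch of that retention rule, the check below asks whether a warehouse's retained history fully contains at least one expansion and at least one recession. The function covers_full_cycle, the phase labels, and the phase dates are assumptions invented for illustration; how the phases are dated is left entirely to whoever supplies them.

```python
from datetime import date


def covers_full_cycle(history_start, history_end, phases):
    """True if the retained history fully contains at least one expansion
    and at least one recession.

    phases: list of (kind, start_date, end_date) tuples, where kind is
    "expansion" or "recession". The dating of phases is supplied by the caller.
    """
    contained = {
        kind
        for kind, start, end in phases
        if history_start <= start and end <= history_end
    }
    return {"expansion", "recession"} <= contained


# Hypothetical phase dates, invented for illustration only.
phases = [
    ("expansion", date(2002, 1, 1), date(2007, 12, 31)),
    ("recession", date(2008, 1, 1), date(2009, 6, 30)),
]

five_years = covers_full_cycle(date(2005, 4, 1), date(2010, 4, 1), phases)
ten_years = covers_full_cycle(date(2000, 4, 1), date(2010, 4, 1), phases)
```

With these made-up dates, a five-year window misses the start of the expansion and fails the check, while a ten-year window passes.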
That observation brings up a related question: how long is an economic cycle? It can be argued that an economic cycle may be hundreds of years long. It can be argued that a recession and an expansion are merely small palpitations of the heart of a much longer economic cycle. If that is the case, it may be useful to keep corporate finance data for a long time, because economists seldom agree on when an economic cycle begins or ends.
Copyright 2004 — 2019. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC