Different Data Warehouses
by Bill Inmon
Originally published March 5, 2009
Anyone who says that all data warehouses look alike either hasn’t been paying attention or has been exposed to only a limited number of data warehouses. The truth is that different industries have different kinds of data warehouses.
So what are some of the characteristics of data warehouses in different industries?
Life insurance – The life insurance industry has one peculiarity not found in most other industries: it looks at data over a far longer period than almost anyone else. In most organizations, a five-year time horizon is about all that is needed for historical analysis. Not so for life insurance companies. It is not unusual to find a data warehouse holding a hundred years of data in the life insurance industry. The actuaries who calculate when we are all going to meet our demise need as much mortality information as they can lay their hands on. If ten years of mortality information is good, twenty is even better. So if you are looking for data warehouses with lots of history, look to the life insurance companies.
Telecommunications – There is a tremendous amount of data in the telephone industry. When it comes to volumes of data, the telephone industry simply overwhelms other industries. The primary factor inflating the volume of data is call level detail. Call level detail is the data regarding every phone call made – the date and time of the call, where the call was placed from, where the call went to, and how long the call lasted. The fact is that there are a LOT of phone calls being made, and keeping track of every one is a real challenge. Furthermore, for the most part, the call records are kept at the lowest level of granularity. The telephone companies feel that they need the flexibility that comes with having data at the lowest level of detail. So when it comes to gigantic amounts of data, telephone companies have pretty much everyone in the world beat.
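As a rough illustration of what call level detail looks like, here is a minimal Python sketch. The four fields come from the description above; the class name, field names, phone numbers, and the rollup function are hypothetical and do not reflect any particular telephone company's design. The rollup shows what is given up when detail is summarized: once calls are totaled by originating number, the individual times and destinations can no longer be recovered.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One row per phone call (field names are illustrative assumptions)."""
    started_at: datetime      # date and time of the call
    origin: str               # where the call was placed from
    destination: str          # where the call went to
    duration_seconds: int     # how long the call lasted


def total_seconds_by_origin(calls):
    """Roll call level detail up to one total per originating number."""
    totals = defaultdict(int)
    for call in calls:
        totals[call.origin] += call.duration_seconds
    return dict(totals)


# Made-up sample data, for illustration only.
calls = [
    CallDetailRecord(datetime(2009, 3, 5, 9, 0), "555-0100", "555-0199", 120),
    CallDetailRecord(datetime(2009, 3, 5, 9, 30), "555-0100", "555-0142", 45),
    CallDetailRecord(datetime(2009, 3, 5, 10, 15), "555-0142", "555-0100", 300),
]
summary = total_seconds_by_origin(calls)
```

Keeping the three detailed rows allows any later question to be answered (calls per hour, calls per destination, and so on); keeping only `summary` answers exactly one.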
Insurance (not just life insurance companies, but all insurance companies) – Insurance companies have a fixation on dates. Nowhere else are there as many dates as in the data that serves the insurance environment. There are effective dates, policy activation dates, coverage dates, expiration dates, and extension dates, and these barely scratch the surface of the kinds of dates that insurance companies have. About the only kind of date that they don’t have is the date that comes off a tree. All these dates make database design for the insurance environment somewhat of an adventure.
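To give a feel for why so many dates complicate design, here is a small Python sketch of a policy record carrying several of the dates named above. The class, its field names, and especially the rule for deciding whether a policy is in force are assumptions made for illustration; real insurers define these dates and their interactions in their own ways.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PolicyDates:
    """A few of the dates a policy might carry (hypothetical layout)."""
    effective: date
    activated: date
    coverage_start: date
    expiration: date
    extension: Optional[date] = None  # set only if the policy was extended

    def coverage_end(self) -> date:
        # Assumption: an extension can only push the end of coverage later.
        if self.extension is None:
            return self.expiration
        return max(self.expiration, self.extension)

    def in_force_on(self, day: date) -> bool:
        # Assumption: coverage is inclusive of both its first and last day.
        return self.coverage_start <= day <= self.coverage_end()


# Made-up sample policy, for illustration only.
policy = PolicyDates(
    effective=date(2009, 1, 1),
    activated=date(2009, 1, 5),
    coverage_start=date(2009, 1, 5),
    expiration=date(2009, 12, 31),
    extension=date(2010, 3, 31),
)
```

Even in this toy version, a simple question such as "was the policy in force on a given day?" depends on three of the five dates, which hints at why the full set makes design an adventure.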
Research – Research environments are in a unique position. Like life insurance companies, research organizations need historical data. But research organizations have a special need when it comes to historical information. In almost every other environment, as data ages, it becomes less valuable. A lot of organizations feel safe in summarizing data as it gets older because the data has less relevance to the business of the organization. This fading relevance of data is found in many places – retailing, finance, banking, insurance, manufacturing, and so forth. But in the research environment, it is possible that an experiment or measurement made ten years ago is as useful and as relevant as an experiment done yesterday. There is a certain agelessness to historical research data that is not found elsewhere.
Another characteristic of a data warehouse for research data is that the future use of the data is largely unknown. In most data warehouses, there is at least a general idea of how the data will be used and which data elements are important. But in a research environment, obscure data elements may end up being vitally important. This drives the data warehouse designer crazy because the elimination of almost any data element at the point of database design may well be a mistake.
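The practice of summarizing data as it ages, which most industries consider safe and research environments cannot afford, can be sketched in a few lines of Python. The function name, the `(day, amount)` record layout, and the sample figures are hypothetical; the five-year cutoff simply echoes the typical time horizon mentioned earlier.

```python
from collections import defaultdict
from datetime import date


def summarize_old_records(records, today, keep_detail_days=365 * 5):
    """Keep recent records in detail; roll older ones into monthly totals.

    Each record is a (day, amount) tuple. The five-year default cutoff
    is an illustrative assumption, not a recommended setting.
    """
    detail = []
    monthly_totals = defaultdict(float)
    for day, amount in records:
        if (today - day).days <= keep_detail_days:
            detail.append((day, amount))
        else:
            # Only the month and the amount survive; the exact day is lost.
            monthly_totals[(day.year, day.month)] += amount
    return detail, dict(monthly_totals)


# Made-up sample data, for illustration only.
records = [
    (date(2001, 3, 1), 10.0),
    (date(2001, 3, 15), 5.0),
    (date(2008, 11, 2), 7.5),
]
detail, totals = summarize_old_records(records, today=date(2009, 3, 5))
```

After the rollup, the two 2001 measurements exist only as a single monthly total. For a retailer that may be an acceptable trade; for a research organization, where a ten-year-old measurement can matter as much as yesterday's and any element may turn out to be important, the discarded detail is exactly what might be needed later.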
Medical – There is a crying need for data warehousing in the medical environment. If any environment has the potential for a real return from building and using a data warehouse, it is the medical environment. But the medical environment has at least two strikes against it. The first strike is that there is no General Motors of medicine. The medical environment is distributed to a fault. Getting the critical mass to build a medical data warehouse is difficult because there is no centralization of medical resources. And in the few places where there is enough critical mass to support a data warehouse, the medical environment is run by committee. Building a data warehouse by committee is a really slow and painful thing to do.
The second strike is that most of the interesting information in the medical environment is textual. Textual information is not nearly as easy to capture and manipulate as classical structured information. Therefore, building data warehouses in the world of medicine is difficult.
Inventory management – In the world of inventory management, there is a need to manage large amounts of information, much of it at the detailed level. Furthermore, the processing of the inventory data can be very difficult. Data warehouses for inventory management are of necessity very challenging.
These are some of the interesting environments where data warehousing has made inroads (and one, the medical environment, where it has not). It turns out that the data warehouses of the world take many different forms, each with its own idiosyncrasies.
Copyright 2004 — 2019. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC