

Data Degradation

Originally published April 16, 2009

The other day at a conference, the subject of data degradation/corrosion arose. The speaker at the conference said that data in a database degraded or corroded over time. The statement was made as if degradation over time applied to all databases. I found this blanket statement to be misguided. Indeed, I think that data does degrade in some databases, but not all.

First, what is meant by data degradation or corrosion? Data degradation or corrosion occurs when data is stored in a database and then, at a later point in time, the real-world facts that the data describes change while the data in the database remains constant. The data was captured and stored correctly, but changes in the real world have rendered the stored data incorrect.

So is it true that data degrades in a database in general? Is data corrosion a characteristic of databases? Let’s look at some typical databases.

Consider the classical customer database. It is fair to say that data degrades in customer databases with great regularity and predictability. Why does customer data degrade over time? There are lots of reasons, including:

  • Customers move

  • Customers die

  • Customers marry and change their names

  • Customers change jobs

Because of the changing nature of the customer base, it is a fair statement to say that indeed customer data does degrade over time.

Now let’s consider a different kind of database. Let’s consider a database made up of data collected by researchers. Once, I encountered a data warehouse made up of the results of tests and experiments made on oil. The experiments that were conducted sought to examine the outer limits of oil, looking at the conditions where oil changed its basic properties. The researchers looked at the temperature at which oil exploded, burned, turned to other substances and broke down. Oil was put through a torture chamber to find out what its limits were.

Much data was collected as a result of this research on the properties of oil. Interestingly, this data about experiments on oil did not degrade at all. In many cases, the results of an experiment made thirty years ago were as interesting and as important as if the experiment had been done yesterday. The research data was simply timeless. So while some data degrades and corrodes, other data does not degrade or corrode at all.

Now let’s consider the database created by a bank that manages the money in accounts. The data found in the accounts reflects real life business circumstances. A withdrawal is made and the value for the account in the database decreases. A deposit is made and the value for the account increases. In short, the bank account database is an exact statement of the affairs of an account.

What happens to the database for bank accounts? Does it degrade or corrode over time? Of course it doesn’t. The data is either accurate or not, and time doesn’t have anything to do with the accuracy of data. The data in the account is accurate as of the moment the data is accessed. So here is another database where data does not degrade over time.

Now let’s consider a simple inventory database. This inventory database holds information about products found inside the company. Part of the information kept is about the manufacturer of an item, including the manufacturer's name. For many years, the manufacturer of a product is stable. Then, one year, the manufacturer is bought and becomes known by a different name. At this point, the manufacturer name stored in the database is incorrect. It has degraded.

However, how often does manufacturer data change? Not often. And how much of the data degrades when degradation occurs? Not much. So there are databases where degradation of data occurs, but at a very, very slow rate.

From these simple examples, we can see that degradation of data can occur in a database. But it is not correct to say that degradation occurs as a general state of affairs in all databases. Some databases do corrode, some databases do not corrode.

An interesting aspect of database corrosion is that when databases corrode, they do not corrode at a constant rate. It is likely that the corrosion rate of a database takes the form of a Poisson distribution.

To illustrate this nonlinear rate of corrosion, consider a customer email address file. Suppose that in six months, 10% of the addresses in the file have corroded. If you believe that the rate of corrosion is constant, the implication is that in approximately five years, the list will be worthless.

Suppose, however, that ten years later it is found that 20% of the list has not corroded. There is still considerable useful data in the database. This means (using the Laffer technique) that data does not corrode at a steady rate.
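The arithmetic behind the email list example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model taken from the column: the function names are hypothetical, the "constant" case assumes a fixed 10% of the original list is lost every six months, and the alternative assumes that each six-month period loses 10% of whatever is still valid.

```python
def linear_remaining(months, loss_per_period=0.10, period_months=6):
    """Fraction still valid if a fixed share of the ORIGINAL list
    is lost every period (the constant-rate reading)."""
    lost = loss_per_period * (months / period_months)
    return max(0.0, 1.0 - lost)


def proportional_remaining(months, loss_per_period=0.10, period_months=6):
    """Fraction still valid if every period loses a fixed share of
    what is STILL valid (an assumed alternative, not from the column)."""
    return (1.0 - loss_per_period) ** (months / period_months)


# Compare the two assumptions at six months, five years, and ten years.
for years in (0.5, 5, 10):
    months = years * 12
    print(
        f"{years:>4} years: "
        f"constant rate {linear_remaining(months):.1%}, "
        f"proportional {proportional_remaining(months):.1%}"
    )
```

Under the constant-rate assumption nothing is left after five years. Under the proportional assumption roughly 12% of the list is still valid after ten years. Neither figure reproduces the 20% in the example exactly, but the comparison shows why a list can keep useful data long after a straight-line projection says it should be worthless.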

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.



Comments


Posted May 14, 2009 by tmkeller@gmail.com

Good topic; however, I feel there are some major points left out of the discussion. To me this is a data update issue, not degradation. My understanding of data degradation is that it occurs when malformed or incomplete data is inserted, when there is a problem storing or retrieving the data, or when complications arise during a migration or upgrade of the data storage (either structure or content). Most of these can be pinned down to user error, whether through data input, migration, or modification. But there is also the idea that the RDBMS engine causes the problem. This could simply be through a bug in the code, or a malformed insert that uncovers a previously unknown bug. If data is stored across pages that are not being checked for consistency, and the issue is not resolved immediately but rather uncovered at a later time, it is most likely that the issue will become more complex over time than if it had been solved immediately after it happened. Thus it degrades, at least in my opinion.

Kindest Regards!


Posted May 14, 2009 by bob.willsie@iongeo.com

Excellent topic. It is unfortunate that most users, managers, and executives have a mistaken belief that once data is entered it will always be pure, pristine, and accurate. After 30+ years of dealing with computers and data, I am convinced that data is subject to the same laws of physics as physical objects in the universe. That is, over the course of time, data will eventually degenerate into the electronic equivalent of dust.
