

The ODS and Teradata

Originally published June 9, 2005

The ODS has long been a standard part of the corporate information factory (CIF). The ODS is the place where real-time processing occurs. The ODS can support both integrated data and OLTP processing. Many organizations have both ODSs and data warehouses.

For a long time, it has been suggested that the ODS should be physically separate from the data warehouse. In most technologies, that physical separation makes sense, and there are several good reasons for it. These reasons include:

  • The mixed workload. When a data warehouse is physically separated from the ODS, the workload is separated into two classes: fast-running, small transactions are in one job stream and long-running, large transactions are in another. When these transactions are mixed together in one stream, the end result is poor performance. The question to ask is: “How fast can you drive a Porsche during rush hour in Mexico City?” The answer is less than 5 miles an hour. Never mind that the Porsche is capable of going much, much faster. The Porsche cannot go faster than the vehicle immediately in front of it. This illustrates why mixing fast-running transactions with slow-running transactions is a problem.
  • Update contention. When a transaction performs updates, care must be taken to allow the transaction to run to completion. Consider the classical case where a transaction has to update two records. The update to the first record is done, and the machine goes down just as the update to the second record is about to be done. Now the organization has a problem with data and transaction integrity: the transaction needed to update two records, but only one has been updated. The data is in an inconsistent state. If the organization allows the data to remain in that state, it is very likely that incorrect business decisions will be made at some point. A mechanism is needed to either complete the transaction or back it out completely. When ODS and data warehouse processing are separated, data warehouse processing does not have to bear the overhead and complexity of update processing.
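The two-record case in the second bullet can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module, not anything specific to an ODS product; the `account` table, the `transfer` function, and the `fail_midway` flag are hypothetical names invented for the example.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory database with two records (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn

def transfer(conn, from_id, to_id, amount, fail_midway=False):
    """Update two records as one unit of work; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
            if fail_midway:
                # Stand-in for the machine going down between the two updates.
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current committed balances as {id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

When the failure is simulated, the first update is backed out along with the rest of the transaction, so neither record changes. This bookkeeping is the overhead that a read-mostly data warehouse workload would rather not carry.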

There are probably other reasons why an ODS and a data warehouse do not work well together, but these two reasons (the mixed workload and the complexity and overhead of update integrity) are the main ones.
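The mixed-workload problem, the Porsche stuck in traffic, can be shown with a toy model. The sketch below assumes a single job stream that runs jobs strictly one at a time in arrival order; the job names and durations are made up for illustration.

```python
def completion_times(jobs):
    """Run (name, duration) jobs one at a time in arrival order.

    Returns a list of (name, finish_time) pairs.
    """
    clock = 0
    finished = []
    for name, duration in jobs:
        clock += duration
        finished.append((name, clock))
    return finished

def worst_finish(jobs, prefix):
    """Latest finish time among jobs whose name starts with `prefix`."""
    return max(t for name, t in completion_times(jobs) if name.startswith(prefix))

# One long report arrives just ahead of two one-second lookups.
mixed_stream = [("report", 600), ("lookup-1", 1), ("lookup-2", 1)]
# The same lookups in a stream of their own.
fast_stream = [("lookup-1", 1), ("lookup-2", 1)]
```

In the mixed stream the last lookup finishes at second 602, even though it needs only one second of work. In its own stream it finishes at second 2.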

Enter Teradata with its active data warehouse technology. Among other things, Teradata claims that with the active data warehouse the ODS can be mixed with the data warehouse. Teradata points out that keeping the two environments together has some benefits: ETL can be done in a single place, data can be shared across traditional ODS and data warehouse boundaries, and so on.

But if the issues of the mixed workload and data integrity cannot be addressed, then the ODS and the data warehouse should still be kept separate.

So what about these two crucial issues in the case of Teradata’s active data warehouse technology?

To address the different performance needs of the ODS and the data warehouse, Teradata allows separate streams or threads to be defined. Fast-running transactions are assigned to one set of threads and slow-running transactions are assigned to a different set of threads. In terms of the analogy, Porsches do not have to compete for road space with moving vans. That is the first step towards addressing the issue of speed of processing in the active warehouse.

But there is a second component to the management of the mixed workload that allows separation of the threads to work, and that component is dynamic reallocation of the resources dedicated to different threads based on the whole system workload. At one moment there may be very little high-performance activity. In that case, machine resources are available for heavy-duty background processing. This is the equivalent of a Mexico City road at 4:00 in the morning. Not much is going on, so big trucks can travel the streets of Mexico City during this early morning hour. But as traffic picks up, resources are dynamically taken away from the heavy lifters and given to the fast-moving cars. This would be like creating high-occupancy vehicle (HOV) lanes in Mexico City on the fly as the morning traffic picks up. This means that when needed, resources will be available for high-performance processing, thus ensuring good performance.
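The idea of dynamic reallocation can be sketched with a simple priority rule. This is not Teradata's actual algorithm, which the article does not describe; it only assumes that the fast class is served first and that background work gets whatever capacity is left. The function name, the parameters, and the notion of capacity "units" are hypothetical.

```python
def allocate(total_units, fast_demand, batch_demand):
    """Split capacity between two workload classes, favoring the fast class.

    total_units  -- capacity available in this interval (hypothetical units)
    fast_demand  -- units requested by short, high-performance transactions
    batch_demand -- units requested by heavy background processing
    """
    fast = min(fast_demand, total_units)
    batch = min(batch_demand, total_units - fast)
    return {"fast": fast, "batch": batch}

# 4:00 in the morning: almost no fast traffic, so the trucks get the road.
quiet = allocate(100, fast_demand=5, batch_demand=200)

# Rush hour: capacity shifts to the fast lane as demand rises.
busy = allocate(100, fast_demand=80, batch_demand=200)
```

Re-running the allocation every interval moves capacity between the two classes as demand changes, without anyone redefining the streams.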

The second issue is managing the complexity and the overhead of update integrity when ODS data is mixed with data warehouse data. Teradata makes some interesting arguments. In many cases, high-performance processing is read-only processing, so there is no overhead or complexity of transaction integrity. In other cases, the updates to the ODS/data warehouse are done in batch. And in yet other cases, if a machine interruption occurs during an update, the consequences are not severe. As an example of data integrity that is not critical, in the world of airline reservation processing, classical database update integrity is given up in the interest of gaining speed. This means that once in a while, when the system hiccups, the record being processed may be corrupted. This does not happen often. But when it does, two passengers may be assigned to the same seat. In that case the airline leaves it up to the gate agent to make a manual correction. As long as seat assignments are not messed up too often, the airline can get away with having less-than-perfect transaction integrity. Teradata takes a similar approach in that it allows data that is “in flight” (data that has not been completely updated) to be read. This keeps update contention from halting performance.

TECHNICAL NOTE:

According to the CTO at Teradata: “Teradata does not compromise transaction integrity for the writers. We have all the same ACID properties as any DBMS for the updates—any update transaction large or small is either completed/committed or it is rolled back completely to the state prior to transaction begin. The data will never be corrupted by an interruption or fault. Where we compromise is on the reader side. We allow the readers to read through partially completed transactions/load jobs. This means that a reader may see a partially completed transaction or may see updates from a transaction that is later rolled back. We call this Access Locking. ANSI calls this Read Uncommitted. ODBC calls this SQL_TXN_READ_UNCOMMITTED. All of these are very transient and occur only if the reader looks at the time the transaction is open. But it does not affect the permanent values or integrity of the data—transaction integrity ensures that the data has transactional integrity over any fault situation.”
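The distinction in the note, full integrity for writers and relaxed isolation for readers, can be illustrated with a toy model. This is not Teradata code or any real DBMS API; `ToyTable` and its methods are hypothetical, and the model supports only one open transaction at a time.

```python
class ToyTable:
    """Toy key/value table with one open transaction at a time."""

    def __init__(self, rows):
        self._committed = dict(rows)
        self._pending = None  # writes of the open transaction, if any

    def begin(self):
        self._pending = {}

    def write(self, key, value):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._pending[key] = value

    def commit(self):
        # All pending writes become permanent together.
        self._committed.update(self._pending)
        self._pending = None

    def rollback(self):
        # All pending writes are discarded together.
        self._pending = None

    def read(self, key, uncommitted=False):
        """Read a value; with uncommitted=True, see in-flight writes too."""
        if uncommitted and self._pending is not None and key in self._pending:
            return self._pending[key]
        return self._committed[key]
```

A reader that passes `uncommitted=True` while a transaction is open may see a value that is later rolled back. The committed data is never left half-updated, which is the point the quote makes.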

So Teradata, unlike other vendors, does have a legitimate story for physically combining an ODS with a data warehouse using its active data warehouse technology.

  • Bill Inmon

    Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

