Blog: Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful European Enterprise Data and Business Intelligence Conference, held annually in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

In a series of blog posts I answered questions on data virtualization that came from a particular organization. The following question did not come from them, but since I've heard it asked so many times, I decided to include it in this series.

Question: "If we adopt data virtualization, can we throw away the data warehouse, because we can access the data in the production databases straight on, right?"

Wrong! Data virtualization is not a data warehouse killer. In most projects where data virtualization is deployed, you will still need a data warehouse. In many systems, if no data warehouse is developed, it won't be possible to meet the information needs of many reports. Let me give the two key reasons:

  • Most production systems do not contain historical data, because they were not designed to keep track of it. If a value is changed, the old value is overwritten. Reports that do trend analysis may need those old values, so they have to be stored somewhere. This is where the data warehouse comes in: data warehouses are needed to store historical data.
  • Production systems may contain inconsistent data. One system may say that a customer is based in New York, while another indicates that he is based in Boston. Inconsistencies can't always be resolved by software; sometimes human intervention is required to indicate what the correct value is. The result of that intervention must be stored somewhere, so that it can be reused. Again, that's where a data warehouse comes in.
And there are more reasons why an additional database, the data warehouse, is needed. If that data warehouse did not exist, and the data virtualization server were connected directly to the production systems, it could not retrieve the historical data, because that data would not exist, and it could not determine which of the inconsistent values is the right one.
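
To make the first reason concrete, here is a minimal sketch in Python using the built-in sqlite3 module. It is only an illustration, not how any particular product works: the table names prod_customer and dw_customer_history, the column names, and the function names are all invented for this example. The production table overwrites a changed value, while the warehouse table closes the old row and adds a new one.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical production table
    and a hypothetical warehouse history table for the same customer."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE prod_customer (id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE dw_customer_history (
            id INTEGER, city TEXT, valid_from TEXT, valid_to TEXT
        );
        INSERT INTO prod_customer VALUES (1, 'New York');
        INSERT INTO dw_customer_history VALUES (1, 'New York', '2011-01-01', NULL);
    """)
    return con


def move_customer(con, customer_id, new_city, change_date):
    """Apply the same change to both tables."""
    # Production system: the old value is simply overwritten.
    con.execute("UPDATE prod_customer SET city = ? WHERE id = ?",
                (new_city, customer_id))
    # Warehouse: close the current row, then add a row for the new value.
    con.execute("UPDATE dw_customer_history SET valid_to = ? "
                "WHERE id = ? AND valid_to IS NULL",
                (change_date, customer_id))
    con.execute("INSERT INTO dw_customer_history VALUES (?, ?, ?, NULL)",
                (customer_id, new_city, change_date))


def cities_in_production(con, customer_id):
    """All city values the production table still knows about."""
    rows = con.execute("SELECT city FROM prod_customer WHERE id = ?",
                       (customer_id,))
    return [row[0] for row in rows]


def cities_in_warehouse(con, customer_id):
    """All city values the warehouse knows about, oldest first."""
    rows = con.execute("SELECT city FROM dw_customer_history "
                       "WHERE id = ? ORDER BY valid_from", (customer_id,))
    return [row[0] for row in rows]
```

After calling move_customer(con, 1, 'Boston', '2012-06-01'), the production table holds only the new city, whereas the warehouse table still holds both. A data virtualization server that sees only the production table has nothing to query for the old value.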

It is worth mentioning that if a data warehouse system includes a data warehouse and deploys data virtualization, (physical) data marts that contain data derived from the data warehouse may no longer be needed. Such data marts can be simulated by the data virtualization server. We usually refer to them as virtual data marts.
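
As a rough analogy for a virtual data mart, consider the sketch below, again in Python with sqlite3. It uses a plain SQL view as a stand-in; a real data virtualization server offers far more than a view, so treat this only as an illustration of the idea that the derived data is computed on demand instead of being stored a second time. The names dw_sales and vdm_sales_by_region, and the sample rows, are invented for this example.

```python
import sqlite3


def build_warehouse():
    """Create a hypothetical warehouse table and a view that plays the
    role of a virtual data mart on top of it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dw_sales (sale_date TEXT, region TEXT, amount REAL);
        INSERT INTO dw_sales VALUES ('2012-01-15', 'East', 100.0);
        INSERT INTO dw_sales VALUES ('2012-01-20', 'East', 50.0);
        INSERT INTO dw_sales VALUES ('2012-02-03', 'West', 75.0);

        -- No data is copied: the "data mart" is only a stored definition.
        CREATE VIEW vdm_sales_by_region AS
            SELECT region, SUM(amount) AS total_amount
            FROM dw_sales
            GROUP BY region;
    """)
    return con


def sales_by_region(con):
    """Query the virtual data mart and return {region: total_amount}."""
    rows = con.execute("SELECT region, total_amount FROM vdm_sales_by_region")
    return dict(rows.fetchall())
```

Because the view is evaluated when it is queried, a new row inserted into dw_sales shows up in sales_by_region(con) immediately, without a separate load step into a physical data mart.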

So, introducing data virtualization in a data warehouse system does not imply throwing away the data warehouse. The data warehouse is still needed.

Note: For more information on data virtualization, I refer to my new book "Data Virtualization for Business Intelligence Systems" available from Amazon.


Posted September 27, 2012 12:04 PM