Blog: Rick van der Lans Subscribe to this blog's RSS feed!

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author >

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful annual European Enterprise Data and Business Intelligence Conference held annually in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

In a series of blogs I am answering some of the questions a large US-based, health care organization had on data virtualization. I decided to share some of their questions with you, because some of them represent issues that many organizations struggle with.

Their question: "There is a theory being proposed within the organization that we should just replicate tables from all systems into a common database, then define views on top and we are home. Are we now oversimplifying the whole field of data integration?"

This question can be generalized to one I get very often: "Every SQL database server supports views and they are like virtual tables in data virtualization servers, so what's the difference?" On a very high abstraction level and if we leave out hundreds of (important) details, the two may look quite identical, but the differences are numerous. Here are a few features offered by data virtualization servers not normally found by views in SQL database servers:

  • In the definitions of virtual tables advanced cleansing operations can be invoked, which is impossible or hard to do in most SQL views.
  • Virtual tables can point to a wide range of data sources, including NoSQL data sources, HTML-based websites, and unstructured data sources. In many SQL database servers, views can only be defined on SQL tables or flat files.
  • Data virtualization servers support designer-friendly user interfaces for designing virtual tables. This does not apply to most SQL database servers.
  • Advanced caching mechanisms exist to store the contents of the virtual tables. This is comparable to materializing views. However, caches in data virtualization servers offer more advanced features for refreshing the contents.
  • Data virtualization servers offer on-demand data profiling capabilities to study the quality of the content of virtual tables.
  • Data virtualization servers support lineage and impact analysis to determine the relationships between all the virtual tables and the data sources. This is important for development and maintenance.
  • Virtual tables can be published in different technical forms, such as SQL tables, SOAP-based services, REST-based services, MDX tables, and so on. In most SQL systems, views can only be presented as SQL tables.

Hopefully, this convinces most people of the differences between the two. I heard someone once summarize the differences as follows: "Virtual tables are like SQL views on steroids." I agree.

Note: For more information on data virtualization, I refer to my new book "Data Virtualization for Business Intelligence Systems" available from Amazon.


Posted September 18, 2012 12:02 AM
Permalink | No Comments |

Leave a comment