
Blog: Rick van der Lans

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful European Enterprise Data and Business Intelligence Conference held annually in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

In a series of blog posts I am answering some of the questions a large, US-based health care organization had on data virtualization. I decided to share some of their questions with you, because they represent issues that many organizations struggle with.

One question they had was: "Our reporting tool supports data integration features, any thoughts on whether these features are in the same league as Composite, Informatica, Denodo?"

This is an understandable question, because many reporting and analytical tools come with their own built-in data integration capabilities. For example, BusinessObjects, Microsoft PowerPivot, and QlikView all allow users to enter data integration specifications. So, why buy a separate data virtualization server if you (think you) have that kind of functionality already in place?

There are various reasons why data virtualization servers are valuable:

  • All the data integration specifications entered in a reporting tool can only be used by that particular tool (or by tools of the same vendor). So, if an organization deploys different reporting and analytical tools (and many do), for example SAS, Excel, and Cognos, the data integration specifications are replicated in all of them. Keeping them consistent across all the tools is quite a challenge. With a data virtualization server, these specifications have to be entered only once and can be shared by all the reporting tools. This leads to more consistent reporting results, increases productivity, and simplifies maintenance.

  • Besides features for data federation, data cleansing, and data transformation, data virtualization servers offer a lot more functionality. Most of them support on-demand data profiling capabilities, invocation of data cleansing operations, master data management, special user interfaces for less technical business analysts, modules for lineage and impact analysis, and so on. It's not just data federation and data transformation anymore. Data virtualization servers support comprehensive design and development environments.

  • The technology and techniques for extracting data from data sources are usually more powerful in data virtualization servers than in reporting tools. This is not surprising, because extracting and transforming data is the core functionality of data virtualization servers, whereas it is not the core functionality of reporting tools. For example, data virtualization servers support advanced query optimization techniques, such as query expansion, query substitution, and ship joins, that are not supported by the data integration capabilities of reporting tools. They also have sophisticated caching mechanisms to improve performance or to offload query processing, powerful data protection features, and so on.

  • Most data virtualization servers support a wider range of data sources than reporting tools do. They can even extract data from NoSQL data sources, HTML-based websites, and unstructured data sources.

In other words, there are numerous reasons for deploying data virtualization servers in business intelligence systems, even if the reporting tools do support some of that functionality. I am not saying that the data integration capabilities of reporting tools are immature, but data virtualization servers are designed for on-demand data integration, whereas the strength of reporting tools is processing and presenting data.
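
To make the caching point from the list above a little more concrete, here is a minimal sketch in Python of a cached virtual view that offloads a slow source. It is an illustration only, not how any particular data virtualization product implements caching. The names CachedView and CountingSource, the time-to-live policy, and the sample rows are all assumptions made for this example.

```python
import time


class CachedView:
    """Hypothetical cached virtual view: serves rows from memory until a TTL expires."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that queries the underlying source
        self._ttl = ttl_seconds      # how long cached rows are considered fresh
        self._clock = clock          # injectable clock, handy for testing
        self._rows = None
        self._loaded_at = None

    def rows(self):
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            # Only here is the source system touched; all other calls are offloaded.
            self._rows = self._fetch()
            self._loaded_at = now
        return self._rows


class CountingSource:
    """Stand-in for a slow operational system; counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def fetch(self):
        self.calls += 1
        return [("Acme", 150.0), ("Globex", 75.0)]  # invented sample rows


source = CountingSource()
view = CachedView(source.fetch, ttl_seconds=60)
first = view.rows()   # goes to the source
second = view.rows()  # answered from the cache
```

However many reports ask for the view within the time-to-live window, the source system is queried only once. Real products offer far richer refresh and invalidation options than this sketch suggests.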

To me, the most important reason on the list above is the first one: data virtualization servers allow for the centralization of data integration specifications. These specifications are extremely valuable to an organization and should not be distributed and replicated all over the place. I assume every data governance and information management specialist will agree with me on this one.
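
The idea of entering an integration specification once and sharing it can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the way a real data virtualization server works: an in-memory SQLite database stands in for the virtualization layer, and the tables crm_customers and erp_orders, the view customer_revenue, and the two "tool" functions are invented names.

```python
import sqlite3

# Hypothetical integration specification, defined exactly once.
# It joins two (invented) source tables and aggregates revenue per customer.
INTEGRATION_SPEC = """
CREATE VIEW customer_revenue AS
SELECT c.customer_id, c.name, SUM(o.amount) AS revenue
FROM crm_customers AS c
JOIN erp_orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
"""


def build_virtual_layer():
    """Create the stand-in virtualization layer with sample source data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE crm_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE erp_orders (order_id INTEGER PRIMARY KEY,
                                 customer_id INTEGER, amount REAL);
        INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO erp_orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
    """)
    conn.execute(INTEGRATION_SPEC)
    return conn


def dashboard_tool(conn):
    """First consumer: lists revenue per customer using the shared view."""
    return conn.execute(
        "SELECT name, revenue FROM customer_revenue ORDER BY name"
    ).fetchall()


def spreadsheet_tool(conn):
    """Second consumer: totals revenue, reusing the same shared view."""
    return conn.execute("SELECT SUM(revenue) FROM customer_revenue").fetchone()[0]


conn = build_virtual_layer()
per_customer = dashboard_tool(conn)
total = spreadsheet_tool(conn)
```

Both consumers rely on the same view, so a change to the join or the aggregation is made in one place and both see it immediately. If each tool carried its own copy of that logic, the two results could silently drift apart.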

Note: For more information on data virtualization, I refer you to my new book "Data Virtualization for Business Intelligence Systems", available from Amazon.

Posted September 7, 2012 12:17 AM