

Blog: Rick van der Lans

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful annual European Enterprise Data and Business Intelligence Conference held in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

The last few weeks I've been talking to a large health care organization based in the US. They are considering introducing data virtualization in their business intelligence system. Some of their questions on data virtualization led to interesting discussions and insights. My feeling is that some of their questions represent issues that many organizations struggle with today. Therefore, I decided to share a few of their questions and my answers with you.

This was one of their first questions: "Is it accurate to think that in this new paradigm of data virtualization, reporting programs only have to deal with the user interface logic, whereas the data virtualization server does most of the business logic and data manipulation/data integration/data aggregation work? Consequently, does that mean that the task of programming the reports is easier than with a traditional report, where the report program contains all the logic (UI + business + data integration)?"

To me this is a great question. Let me share my answer with you.

More and more reporting and analytical tools support features for data federation, data aggregation, data manipulation, and data cleansing (for simplicity's sake, let's refer to these features with the term data integration). Technically this means that after the tools are hooked up to the required data sources, data integration specifications can be entered to transform all the data into the form the reports need, and finally, the development of the reports can start.

This approach has disadvantages. The first is that it becomes hard to guarantee that all users deploy the same data integration specifications. How do we enforce this? If they all use the same tool, that tool may offer features to share those data integration specifications (note that some tools do and some don't). However, not all users use the same tools, which can lead to inconsistent reporting results.

The second disadvantage relates to whether users are aware of all the intricacies of the data sources they access. Imagine that one of the data sources is an old production database in which some tricky structures are used. For example, if column A contains the code 1, then the value X in column B means New York, but if the code in column A is equal to 2, then the X in column B means New Jersey. Users have to be aware of all that data-related logic when doing their own data integration work.
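To make this concrete, here is a minimal sketch of the decoding logic every user would have to reproduce in SQL; the table and column names are hypothetical:

   -- Hypothetical production table: the meaning of the value 'X'
   -- in column_b depends on the code in column_a.
   SELECT order_id,
          CASE
             WHEN column_a = 1 AND column_b = 'X' THEN 'New York'
             WHEN column_a = 2 AND column_b = 'X' THEN 'New Jersey'
          END AS state_name
   FROM   production_orders;

If every report developer has to remember and retype this kind of logic, mistakes are almost guaranteed.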

For many situations it's recommended to enter the data integration specifications only once, in a centralized system, and let all reports share them. Together, these specifications define all the necessary data federation, data cleansing, data transformation, and data integration work; in other words, they hide the intricacies of all the data sources. This approach leads to more consistent reporting.
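As a sketch of what such a shared specification could look like (reusing the hypothetical production table from the example above), the tricky decoding can be captured once in a view-like definition:

   -- Entered once, centrally; every report queries this view
   -- instead of the raw production table.
   CREATE VIEW customer_orders AS
   SELECT order_id,
          CASE
             WHEN column_a = 1 AND column_b = 'X' THEN 'New York'
             WHEN column_a = 2 AND column_b = 'X' THEN 'New Jersey'
          END AS state_name
   FROM   production_orders;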

This is where data virtualization servers come in. With data virtualization servers, such as those from Composite, Denodo, and Informatica, all those data integration specifications can be entered in a more centralized way and are shared by all reports even in a heterogeneous reporting environment.
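For example, any reporting tool that can issue SQL could then simply query the shared view, unaware of the trickery underneath (again, the names are hypothetical):

   -- A report aggregates the already-integrated data.
   SELECT state_name, COUNT(*) AS number_of_orders
   FROM   customer_orders
   GROUP BY state_name;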

To come back to the question: the effect is that users can focus fully on the reporting and analysis of data (the UI), and they don't have to spend time on data integration. So yes, the task of programming the reports becomes easier than with a traditional report, where the report program contains all the logic (UI + business + data integration).

Note: For more information on data virtualization, I refer to my new book "Data Virtualization for Business Intelligence Systems," available from Amazon.


Posted September 4, 2012 7:07 AM

2 Comments

I guess another question is how data virtualization handles performance issues when data is accessed directly from multiple disparate systems at different levels of granularity, and the data needs to be integrated before it can be consumed by analytical tools.

How does data virtualization handle such difficulties?

Great point. Performance is almost always a concern when an organization considers data virtualization. Or maybe I should say, there are always performance-related concerns when new technology is introduced. Just remember the criticism relational technology had to endure when it was introduced.
Nevertheless, it's a challenge that has been addressed: several techniques and technologies exist today to achieve the right performance. As you can imagine, the customer I am writing about also had questions on performance, so I will address this topic in a separate blog post within a few days.
