Data Virtualization: Where Do We Stand Today?

Originally published July 16, 2013

Many blogs and articles are devoted to data virtualization. The topic has been discussed (sometimes heatedly) at many events and explained in numerous webinars and webcasts. It's on the radar of nearly all analyst organizations, all types of organizations are using the technology today, and the products have matured sufficiently to handle large and complex data environments. One can easily state that although the term data virtualization is not as popular as the terms SQL, data warehouse or big data, data virtualization has been accepted by the market. More and more organizations are deploying the technology to simplify access to their labyrinth of data sources.

But where exactly do we stand with data virtualization? What is the current status? Is it hype or reality? It's time to evaluate where data virtualization stands today. This article describes the current status of data virtualization and uses the results of various studies to quantify the market.

Data Virtualization in a Nutshell

Data virtualization allows an organization to make its enterprise data easily available to business users. From a more technical standpoint, data virtualization makes all the enterprise data that has been dispersed over a multitude of IT systems look like one logical database, even if that data is buried deeply in IT systems. The effect of using data virtualization is that organizations can increase the return on their investment in data processing.

Data virtualization is relatively young. Exactly when the term was first coined is not entirely clear. It looks as if Eric Broughton used the term first in a paper published in 2005 (end note 1). Although not as popular as terms such as big data and cloud, data virtualization has moved slowly into the spotlight over the last five years.

The history of data virtualization is strongly related to data federation, which has been around much longer. (For more information about that relationship, see my article, Clearly Defining Data Virtualization, Data Federation and Data Integration.) Data federation means combining a heterogeneous set of autonomous data stores to form one large data store. In principle, this is what data virtualization does as well. But this is where data federation stops and data virtualization continues. In addition to data federation technology, current data virtualization products also support data cleansing, data profiling and data modeling capabilities, impact and lineage analysis and so on. Some products that started out as pure data federation products have evolved into data virtualization products.
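
The federation idea described above can be illustrated with a minimal sketch. It is not how any of the commercial products work internally; it only assumes two hypothetical sources (a small relational database of orders and a CSV export of customers) and a hypothetical function, customer_totals, that presents them as one logical result without copying the data into a separate integrated store.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational system holding orders.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 75.5), (1, 30.0)],
)

# Source 2 (hypothetical): a flat file exported by another system.
CUSTOMERS_CSV = "customer_id,name\n1,Acme\n2,Globex\n"


def read_customers():
    """Read the CSV source on demand and map customer_id to name."""
    reader = csv.DictReader(io.StringIO(CUSTOMERS_CSV))
    return {int(row["customer_id"]): row["name"] for row in reader}


def customer_totals():
    """Present both sources as one logical result set.

    Each source is queried when the function is called; nothing is
    copied into a separate integrated store.
    """
    names = read_customers()
    rows = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return [(names[cid], total) for cid, total in rows]


print(customer_totals())
```

The consumer of customer_totals sees a single table-like result and does not need to know that the names come from a file and the amounts from a database. A real data virtualization server would add query optimization, caching, security and the cleansing and lineage features mentioned above.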

The market of data virtualization products includes:
  • Composite Information Server
  • Denodo Platform
  • IBM InfoSphere Federation Server
  • Informatica Data Services
  • Oracle BI Server
  • Red Hat Teiid
  • Stone Bond Enterprise Enabler Virtuoso
To summarize, data virtualization products and their forerunners have been around for some time. This technology has already matured.

The Current Market for Data Virtualization

The most recent study of the market for data virtualization was performed by Wayne Eckerson of TechTarget in April 2013 (see Data Virtualization: Perceptions and Market Trends). This study shows that 35% of the respondents have invested money in data virtualization, 27% of the respondents have partially deployed the software and 18% have it fully deployed. Furthermore, almost one-third of the organizations have data virtualization under consideration.

These numbers are comparable to the ones coming from a TDWI (The Data Warehousing Institute) study (end note 2). This study indicates that 19% of the organizations have data virtualization currently in use and 31% have plans to implement data virtualization. In July 2012, Ted Friedman of Gartner reported that approximately 27% of the respondents in a study were actively involved in or had plans for deployment of federated or virtualized views of data. Ventana Research indicated in an April 2012 study (end note 3) that data virtualization is an advancing priority in information management: 12% have completed data virtualization projects, 11% have initiated projects and 20% have planned a project in which data virtualization will be used. Finally, in a 2012 study, Forrester Research (end note 4) predicted that the total software revenues (licenses, maintenance and services) for data virtualization would grow to $8 billion by 2014.

These numbers are very promising, but clearly the data virtualization market is not growing explosively. It's growing as so many enterprise software products grow: slowly and steadily. In addition, we have to remind ourselves that it was only during the last five years that data virtualization vendors started to push and promote data virtualization heavily, and that period coincides with a global economy under stress. As is generally known, in a poor economy, organizations invest less in new technologies, even if those technologies may solve some of their problems and lower the total cost of ownership (TCO).

It is also noteworthy that, based on conversations with dominant data virtualization vendors, Europe is significantly behind the USA in adopting data virtualization.

How Organizations Are Using Data Virtualization

Although some of the data virtualization products were initially designed to support ESB/SOA type systems, today most organizations use data virtualization in business intelligence (BI) environments. This was also shown by Wayne Eckerson's previously mentioned study, which reported the following use cases for data virtualization:
  • Augment the data warehouse (77%)
  • Prototyping (45%)
  • Visualizing real-time data within an existing application (39%)
  • Drill into detailed data in another system, such as a data warehouse (30%)
  • Querying non-relational data sources (25%)
  • Creating an enterprise view of multiple data warehouses (24%)
  • Querying external data (19%)
  • Supporting ETL processing (18%)
  • Querying external data (16%)
  • Delivering a 360-degree view of customers (15%)
In many cases, organizations are attracted to data virtualization because of its agility: the speed with which data sources can be integrated and the speed with which integrated data becomes available to end users. Eckerson's study confirmed this: 66% of the respondents were interested in data virtualization because of agility.

In the long run, the fast-growing interest in self-service BI tools, such as QlikView, Spotfire and Tableau, will boost the adoption of data virtualization. The reason is that self-service BI tools can only deliver a certain level of agility. The moment users ask for a change to a data structure in a data mart or data warehouse, the IT department has to be involved. Developing a BI system using self-service BI tools together with data virtualization servers leads to a much more agile result. In other words, the same level of agility currently offered by self-service BI tools for reporting and analytics can be delivered by data virtualization servers for the data storage aspects of BI environments.
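
As a rough illustration of this agility argument, the sketch below uses an ordinary SQL view as a stand-in for a virtual table in a data virtualization server. The table sales, the view sales_summary and the sample rows are all hypothetical. The point is only that a requested change in structure is handled by redefining the view, not by reloading or restructuring stored data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", "A", 10.0), ("EU", "B", 5.0), ("US", "A", 8.0)],
)

# First version of the virtual table: totals per region.
conn.execute(
    "CREATE VIEW sales_summary AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
by_region = conn.execute(
    "SELECT * FROM sales_summary ORDER BY region"
).fetchall()

# Users now want totals per region and product. Only the view definition
# changes; the stored rows are not reloaded or restructured.
conn.execute("DROP VIEW sales_summary")
conn.execute(
    "CREATE VIEW sales_summary AS "
    "SELECT region, product, SUM(amount) AS total "
    "FROM sales GROUP BY region, product"
)
by_region_product = conn.execute(
    "SELECT * FROM sales_summary ORDER BY region, product"
).fetchall()

print(by_region)
print(by_region_product)
```

In a physical data mart, the same change would typically require altering tables and rerunning load jobs, which is where the IT department comes in.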

It is also noteworthy that those not too familiar with the technology often assume it can be deployed in small environments only. According to Eckerson's study, this is not true. The study shows that a majority of the organizations (59%) that have deployed data virtualization have implemented it on an enterprise scale, one-quarter (25%) have deployed it at the business unit level and just 14% have deployed it at the departmental level for one or more departments. In other words, the majority deploy data virtualization on an enterprise scale, which indicates that it is suitable for large-scale environments.

The Future of Data Virtualization

It is important that the data virtualization products continue to develop in the following three areas:
  1. The vendors have to continue their research into improving performance for all kinds of queries and on all kinds of data sources, including NoSQL systems. Also, their caching mechanisms have to be strengthened. Here, support for in-memory database servers could be beneficial. Furthermore, now that more and more data is moving to the cloud, data virtualization products have to improve the efficiency of moving data in, to and within the cloud.

  2. The products have to be extended to support the full system development cycle. The design modules of current data virtualization servers should allow designers and analysts to enter other more business-related specifications, such as business glossaries, data models and taxonomies. Data virtualization servers should support more and more features currently supported by tools such as data modeling, master data management and business glossary tools. Data virtualization servers should be able to support the whole process of information management, including information modeling, data governance, and logical database design, and not just the implementation phase. For more information see my book, Data Virtualization for Business Intelligence Systems.

  3. NoSQL systems are very powerful with respect to storage and processing of massive amounts of data. However, in BI environments most of the tools in use don't know how to handle data not stored in a relational way. This means that all that valuable (big) data stored in NoSQL systems is not available to everyone in the organization. To open up all that data, SQL interfaces are extremely important and useful. A SQL interface to a NoSQL system makes the data available to a larger user community and thus increases the potential business value of it. Lately, more and more SQL interfaces are becoming available for NoSQL. Unfortunately, these interfaces are still very young and most of them offer access to one NoSQL system only; they don't federate data from multiple systems. This is a potential area in which data virtualization products can play an important role. They have mature SQL optimizers capable of handling large amounts of data, they have been designed to federate data and they know how to handle non-relational data sources. They should be able to win a considerable portion of this market. Thus, the increasing NoSQL market may create a new market for data virtualization products.
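
To make the third point more concrete, here is a minimal sketch of what a SQL interface over non-relational data amounts to. It is an illustration under stated assumptions, not the mechanism of any particular product: the nested JSON documents, the flatten mapping and the query_documents function are all hypothetical, and the documents are loaded into a throwaway in-memory table for each query, whereas a real data virtualization server would push work down to the source and optimize the query.

```python
import json
import sqlite3

# Hypothetical documents, as a document store might return them.
DOCUMENTS = [
    '{"id": 1, "customer": {"name": "Acme", "country": "NL"}, "clicks": 12}',
    '{"id": 2, "customer": {"name": "Globex", "country": "US"}, "clicks": 7}',
    '{"id": 3, "customer": {"name": "Acme", "country": "NL"}, "clicks": 5}',
]


def flatten(document):
    """Map one nested document to a flat (id, name, country, clicks) row."""
    doc = json.loads(document)
    customer = doc["customer"]
    return (doc["id"], customer["name"], customer["country"], doc["clicks"])


def query_documents(sql):
    """Run a SQL query against the documents via a throwaway relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events "
        "(id INTEGER, name TEXT, country TEXT, clicks INTEGER)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)", map(flatten, DOCUMENTS)
    )
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


result = query_documents(
    "SELECT name, SUM(clicks) FROM events GROUP BY name ORDER BY name"
)
print(result)
```

Once the nested documents are exposed as rows and columns, any SQL-speaking BI tool can work with them, which is the business value argued for above.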

Conclusions

Data virtualization is an accepted technology whose adoption is growing steadily. Many IT specialists know what the technology has to offer and how it makes data architecture more flexible. Studies show that 30-35% of organizations are studying, investing in and/or deploying the technology today. The BI market will continue to push the deployment of data virtualization, but the fast adoption of NoSQL systems will also increase the demand for data virtualization.

In short, data virtualization is not hype; it's a reality. Most vendors can show an impressive list of organizations deploying the technology today. This is supported by the studies done by renowned analyst organizations.

For those who have just started evaluating this technology, I wish you all the best on your data virtualization journey.

End Notes:

  1. E. Broughton, Periscope: Access to Enterprise Data, TUSC, 2005.
  2. D. Stodder, Achieving Greater Agility with Business Intelligence, TDWI, January 2013; see http://tdwi.org/research/2013/01/tdwi-best-practices-report-achieving-greater-agility-with-business-intelligence.aspx
  3. Ventana Research, The Evolution of Information Management, April 2012; see http://www.ventanaresearch.com/infomgt/
  4. N. Yuhanna and M. Gilpin, The Forrester Wave: Data Virtualization Q1 2012, Forrester Research, January 2012; see http://www.informatica.com/Images/1888_forrester-wave-data-virtualization_ar.pdf

  • Rick van der Lans

    Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful European Enterprise Data and Business Intelligence Conference, held annually in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

    Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.
