Blog: Rick van der Lans


Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful annual European Enterprise Data and Business Intelligence Conference held in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

Some may have missed it, but at the end of last year Informatica entered the data virtualization/data federation market with Informatica Data Services (IDS). The product is built on top of the Informatica 9 platform, from which it inherits its robustness and scalability.


Besides all the features you would expect from a data virtualization product, it offers some unique ones. For example, virtual tables (views) are not defined with SQL or XQuery, but with a flow language that resembles the one used in PowerCenter for defining ETL scripts. The only difference is that in PowerCenter the result of a flow is stored in a table or file, while with IDS the result is "pushed" to a reporting or analytics tool. Under the hood, the flow language is translated into SQL and pushed down to the database servers; IDS tries to process as much of the data access as possible, as close to the data as possible.
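
To make the pushdown idea concrete, here is a minimal sketch in Python of the general principle only. It is not IDS code: the table and function names are made up, and a real data virtualization server generates the pushed-down SQL from the flow definition rather than from hand-written code.

    import sqlite3

    # One "source system", simulated here with an in-memory SQLite database.
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, "Alice", "NL"), (2, "Bob", "DE"), (3, "Carol", "NL")])

    def query_virtual_table(country):
        # Instead of fetching every row and filtering in the virtualization
        # layer, the filter is pushed down into the SQL sent to the source,
        # so only the matching rows travel across the wire.
        pushed_down_sql = "SELECT id, name FROM customers WHERE country = ?"
        return crm.execute(pushed_down_sql, (country,)).fetchall()

    print(query_virtual_table("NL"))   # [(1, 'Alice'), (3, 'Carol')]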


Another feature is that data profiling is an integrated part of the product and can be done on demand. This means that once a virtual table has been defined, its (virtual) contents can be profiled with a single click. If something looks incorrect, it can be fixed by adding or changing transformations, or by fixing the source data (if that is allowed and possible). This becomes an iterative process that continues until the virtual table returns the right data.
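
As an illustration of what such an on-demand profile computes (IDS's profiler is far richer and is driven from the designer, not from hand-written code), here is a sketch of simple per-column statistics over the rows a virtual table returns:

    from collections import Counter

    def profile(rows, columns):
        # Compute simple per-column statistics for a list of row tuples.
        stats = {}
        for i, col in enumerate(columns):
            values = [row[i] for row in rows]
            non_null = [v for v in values if v is not None]
            stats[col] = {
                "rows": len(values),
                "nulls": len(values) - len(non_null),
                "distinct": len(set(non_null)),
                "most_common": Counter(non_null).most_common(1),
            }
        return stats

    rows = [(1, "Alice", "NL"), (2, "Bob", None), (3, "Carol", "NL")]
    print(profile(rows, ["id", "name", "country"]))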


In addition, the developer can ask a user or business analyst to look at the virtual table as well. The user can check whether the contents are correct and, if not, add his or her own transformations using a simple Excel-like language. In this way, defining the right transformations becomes a collaborative process between users and developers.
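
Informatica's expression language and collaboration workflow are its own; the sketch below only illustrates the underlying idea that a user supplies a small, Excel-like expression which is then applied to every row of the virtual table (the function names are invented for the example):

    def TRIM(s): return s.strip()
    def UPPER(s): return s.upper()
    def CONCAT(a, b): return a + b

    SAFE_FUNCTIONS = {"TRIM": TRIM, "UPPER": UPPER, "CONCAT": CONCAT}

    def apply_expression(expression, rows):
        # Evaluate a user-supplied, Excel-like expression against each row (a dict),
        # exposing only the whitelisted functions and the row's own columns.
        results = []
        for row in rows:
            namespace = {**SAFE_FUNCTIONS, **row}
            results.append(eval(expression, {"__builtins__": {}}, namespace))
        return results

    rows = [{"first": " rick ", "last": "van der Lans"}]
    print(apply_expression('CONCAT(UPPER(TRIM(first)), CONCAT(" ", last))', rows))
    # ['RICK van der Lans']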


Complex cleansing operations can also be executed on demand. In other words, when data is retrieved by a report, IDS accesses the underlying data sources and executes all the cleansing operations.
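
The difference with a classic ETL approach is when the cleansing logic runs: at query time instead of during a nightly load. A minimal sketch, with a made-up cleansing rule:

    def cleanse(row):
        # Example rule: strip whitespace and standardize country names.
        country_map = {"Holland": "NL", "Netherlands": "NL", "Germany": "DE"}
        return {**row,
                "name": row["name"].strip(),
                "country": country_map.get(row["country"], row["country"])}

    def fetch_for_report(source_rows):
        # Cleansing is applied lazily, row by row, while the report consumes
        # the data; nothing is cleansed or stored in advance.
        return (cleanse(r) for r in source_rows)

    for row in fetch_for_report([{"name": " Alice ", "country": "Holland"}]):
        print(row)   # {'name': 'Alice', 'country': 'NL'}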


To summarize, IDS shows how feature-rich and mature data virtualization products are becoming. If you want to know more about how IDS works and what its features are, get my new technical whitepaper Developing a Data Delivery Platform with Informatica Data Services.



Posted April 8, 2011 1:57 AM

1 Comment

reviewing this for informatica 9.1 implementation in hp.com
