Blog: Rick van der Lans

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful European Enterprise Data and Business Intelligence Conference, held annually in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

In a series of blog posts I am answering some of the questions that a large, US-based health care organization had on data virtualization. I decided to share some of their questions with you, because they represent issues that many organizations struggle with.

For those not familiar with the concept or name, Wikipedia defines a Rube Goldberg machine (also called a contraption, invention, device, or apparatus) as a deliberately over-engineered or overdone machine that performs a very simple task in a very complex fashion, usually including a chain reaction. Examples of very simple tasks are pouring beer into a glass, opening a door, or switching on a TV. Rube Goldberg was an American cartoonist who was best known for drawing weird machines. Here you can find a photo showing an example of a Rube Goldberg machine. On YouTube you can find numerous films showing such machines at work. One you have to see is the one developed by a young kid called Audri.

Why a discussion on these weird and often useless machines? Quite recently, I received an email from my customer with the following remark: "I have been thinking about the complexities of physical integration of our systems [with physical integration he means using classic ETL and duplicating data in several databases]. I wish I had Audri's YouTube video when I was trying to urge my team to consider data virtualization. After seeing that little boy, it feels as if developing and testing a system based on physical integration, is like trying to develop and test a Rube Goldberg machine."

He continues with "Knowing what I know now [after studying data virtualization more seriously], if data virtualization is comparable to using a remote control to turn a TV on and off, then physical integration is comparable to developing and using a Rube Goldberg machine to turn a TV on and off."

Evidently, this is an exaggeration, because there are still various situations in which you have to or want to deploy a form of physical integration. But there is some truth in it. Software and hardware are so much more powerful today than they were ten years ago. In fact, there is so much more "power" available that if organizations had to design their current BI systems from scratch, they would probably come up with much simpler architectures, ones in which agility would be a fundamental design factor. Data virtualization would be one of the technologies that would clearly help to develop more agile BI systems.

So, years ago, when we designed the architectures of our BI systems, they were not considered Rube Goldberg machines. They were necessities; there was no other choice. But today there is. So, if we look at these architectures today, they do resemble Rube Goldberg machines. They are like machines in which the data values roll down a spiral, are thrown from one database to another, are changed occasionally, fall off some track sporadically, and sometimes even float a few inches, before they arrive in a report.

I have decided to use Audri's film from now on to explain what the differences are between developing BI systems with and without data virtualization.

Note: If you have questions related to data virtualization, send them in. I am more than happy to answer them.
 


Posted November 19, 2012 11:15 AM

1 Comment

Hi Rick,
What is your opinion on the three points below?

1) Organizations have invested so much over the past decade or so in physical integration and access to data for operational or analytical purposes. Why should they now look at this new trend instead of continuing to build on the infrastructure they already have? Is DV more suited to greenfield integrations than to large corporations with tons of data that has been integrated over the years?

2) Also, wouldn't using a DV layer to physically access the data through SQL and other direct data access methods mean tighter coupling between the data services layer and the backend data sources? Doesn't that go against the principles organizations have been following on loose coupling and other principles of SOA?

3) In the operational world, if DV is recommended as a means of providing data aggregation services, then why not use lightweight ESBs like DataPower for sewing together web services?
