Blog: Rick van der Lans Subscribe to this blog's RSS feed!

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author >

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful annual European Enterprise Data and Business Intelligence Conference held annually in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

June 2010 Archives

Most analytical tools process a large portion of the analytical logic themselves. For example, the logic to perform a regression analysis which determines the relationship between a dependent variable and one or more independent variables, is executed by the tool. The role of the database server is minimal, it's only used for retrieving all the required data from the database.

Because most of the analytic processing takes place on the machine where the tool runs, it's very likely that too much data is transmitted from the database server to the application, which is bad for performance. Additionally, the processing is not taking place on the most powerful machine.

With in-database analytics, the analytical processing is primarily done by the database server itself. The remaining task of the analytical tool is to present results on the screen and do some minimal processing. This approach has several performance advantages. For example, because the database server (almost certainly) runs on a more powerful machine, the analytical logic is processed more quickly. Secondly, because most of the analytical processing is executed very close to the where the data is stored, the I/O is optimal. And thirdly, because only the result set is transmitted back to the tool, minimal time is wasted on transmitting data from the database server to the tool.

But moving the analytical processing from the application to the database server by itself does not automatically lead to a considerable performance improvement. A serious performance improvement is realized when the analytical logic is executed in parallel by the database server.

A solution based on SQL-MapReduce does allow to push most of the analytical processing to the database server and most of that processing will be executed in parallel. My technical whitepaper Using SQL-MapReduce for Advanced Analytical Queries, which describes Aster Data's implementation of SQL-MapReduce, explains in detail how this works.

 


Posted June 21, 2010 11:42 PM
Permalink | No Comments |