Most analytical tools process a large portion of the analytical logic themselves. For example, the logic to perform a regression analysis which determines the relationship between a dependent variable and one or more independent variables, is executed by the tool. The role of the database server is minimal, it's only used for retrieving all the required data from the database.
Because most of the analytic processing takes place on the machine where the tool runs, it's very likely that too much data is transmitted from the database server to the application, which is bad for performance. Additionally, the processing is not taking place on the most powerful machine.
With in-database analytics, the analytical processing is primarily done by the database server itself. The remaining task of the analytical tool is to present results on the screen and do some minimal processing. This approach has several performance advantages. For example, because the database server (almost certainly) runs on a more powerful machine, the analytical logic is processed more quickly. Secondly, because most of the analytical processing is executed very close to the where the data is stored, the I/O is optimal. And thirdly, because only the result set is transmitted back to the tool, minimal time is wasted on transmitting data from the database server to the tool.
But moving the analytical processing from the application to the database server by itself does not automatically lead to a considerable performance improvement. A serious performance improvement is realized when the analytical logic is executed in parallel by the database server.
A solution based on SQL-MapReduce does allow to push most of the analytical processing to the database server and most of that processing will be executed in parallel. My technical whitepaper Using SQL-MapReduce for Advanced Analytical Queries, which describes Aster Data's implementation of SQL-MapReduce, explains in detail how this works.
Posted June 21, 2010 11:42 PM
Permalink | No Comments |