

Database Technology for the Web: Part 1 — The MapReduce Debate


Originally published June 24, 2009


Colin White weighs in on the MapReduce debate, reviewing this technology and describing its pros and cons.

Colin White




Posted June 29, 2009 by Shawn Rogers

Colin, great topic and great article. The buzz around MapReduce technology is at an all-time high; it's great to have you shed light on the topic.


Posted June 24, 2009 by Steve Wooledge

Great summary and balanced perspective, Colin. We share your concern for application developers. Aster's vision is to enable application developers to leverage the power of MPP data warehouses for rich analytic applications by using MapReduce to write expressions and having them execute in the database, close to the data. There are many more Java and .NET application developers in the world than there are SQL analysts. Both have to be empowered!

As data volumes in organizations grow, it becomes prohibitive to ship data out of the EDW to a middle tier for analytic processing; the network becomes a bottleneck. By enabling data miners and application developers to push their functions down into Aster nCluster, it speeds up the knowledge discovery process (less data sampling and movement), as well as the performance of applications. MapReduce is about processing power, fail-over, etc., but it's also about rich expressiveness and enabling a new class of developers to make use of the EDW in ways that were prohibitive before (which you pointed out, but I wanted to add emphasis).

A great example of this is one of our customers who has taken some Java code for a fraud detection algorithm they created and pushed it into our database.  Instead of running it once a week (because they have to decompress data, ship it to the application tier, and then process it), they can run it once every 15 minutes.  This lets them increase the frequency of their fraud detection (to catch more bad guys), and save costs/time in the process.
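The map/shuffle/reduce flow described in these comments can be sketched in miniature. The snippet below is a single-process illustration of the programming model only, not Aster nCluster's API; the fraud-style aggregation (`flag_large`, the threshold, the account data) is entirely hypothetical, chosen to echo the fraud-detection example above.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    # Each "node" applies map_fn to its local partition,
    # emitting (key, value) pairs.
    for record in records:
        yield from map_fn(record)

def shuffle(pairs):
    # Group intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    # Each reducer collapses one key's values to a result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Hypothetical fraud-style job: count transactions per account
# that exceed a threshold.
def flag_large(txn):
    account, amount = txn
    if amount > 1000:
        yield (account, 1)

transactions = [("acct1", 1500), ("acct2", 300),
                ("acct1", 2000), ("acct3", 1200)]
counts = reduce_phase(shuffle(map_phase(transactions, flag_large)),
                      lambda key, values: sum(values))
# counts == {"acct1": 2, "acct3": 1}
```

The point of running this inside the database, as the comment describes, is that `map_phase` executes on each partition where the data already lives, so only the small grouped intermediates move across the network instead of the raw transactions.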

There are some great example applications and educational materials here for folks interested.
