Blog: Colin White

Colin White

I like the various blogs associated with my many hobbies and even those to do with work. I find them very useful and I was excited when the Business Intelligence Network invited me to write my very own blog. At last I now have somewhere to park all the various tidbits that I know are useful, but I am not sure what to do with. I am interested in a wide range of information technologies and so you might find my thoughts will bounce around a bit. I hope these thoughts will provoke some interesting discussions.

About the author

Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor to several leading print- and web-based industry journals. For ten years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.

Editor's Note: More articles and resources are available in Colin's BeyeNETWORK Expert Channel. Be sure to visit today!

The use of cloud computing for data warehousing is getting a lot of attention from vendors. Following hot on the heels of Vertica's Analytic Database v3.0 for the Cloud announcement on June 1 was yesterday's Greenplum announcement of its Enterprise Data Cloud™ platform and today's announcement by Aster of .NET MapReduce support for its nCluster Cloud Edition.

I have interviewed all three vendors over the past week. Their approaches to cloud computing share some common characteristics, but there are also some differences.

Common characteristics include:
  • Software-only analytic DBMS solutions running on commodity hardware
  • Massively parallel processing
  • Focus on elastic scaling, high availability through software, and easy administration
  • Acceptance of alternative database models such as MapReduce
  • Very large databases supporting near-real-time user-facing applications, scientific applications, and new types of business solutions

The emphasis of Greenplum is on a platform that enables organizations to create and manage data warehouses and data marts using a common pool of physical, virtual or public cloud infrastructure resources. The concept here is that multiple data warehouses and data marts are a fact of life, and the best approach is to put these multiple data stores onto a common and flexible analytical processing platform that provides easy administration and fast deployment using good-enough data. Greenplum sees this approach being used initially on private clouds, with the use of public clouds growing over time.

Aster's emphasis is on extending analytical processing to the large audience of Java, C++ and C# programmers who don't know SQL. They see these developers creating custom analytical MapReduce functions for use by BI developers and analysts who can use these functions in SQL statements without any programming involved.

Although MapReduce has typically been used by Java programmers, there is also a large audience of Microsoft .NET developers who potentially could use MapReduce. A recent report by Forrester, for example, shows 64% of organizations use Java and 43% use C#. The objective of Aster is to extend the use of MapReduce from web-centric organizations into large enterprises by improving its programming, availability and administration capabilities over and above open source MapReduce solutions such as Hadoop.
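For readers who have not worked with MapReduce, the basic programming model can be sketched in a few lines of Python. This is only an illustration of the general map, shuffle and reduce pattern running in a single process. It is not Aster's API, nor that of Hadoop or any other vendor, and every name in it (`map_reduce`, `clicks_mapper`, `sum_reducer`, the sample click records) is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records, mapper):
    # Apply the mapper to every record; each call yields (key, value) pairs.
    return chain.from_iterable(mapper(record) for record in records)


def shuffle(pairs):
    # Group all values that share a key, as the framework would do
    # between the map and reduce steps.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    # Collapse each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical custom function: count page views per user.
def clicks_mapper(record):
    yield record["user"], 1


def sum_reducer(key, values):
    return sum(values)


# Made-up sample data for illustration only.
clicks = [
    {"user": "ann", "page": "/home"},
    {"user": "bob", "page": "/home"},
    {"user": "ann", "page": "/pricing"},
]

views_per_user = map_reduce(clicks, clicks_mapper, sum_reducer)
```

In a parallel database or a cluster framework, the map and reduce steps would run across many nodes and the shuffle would move data between them. The programmer still supplies only the two small functions, which is what makes the model attractive to developers who prefer a procedural language to SQL.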

Vertica sees its data warehouse cloud computing environment being used for proof-of-concept projects, spillover capacity for enterprise projects and for software-as-a-service (SaaS) applications. Like Greenplum it supports virtualization. Its Analytic Database v3.0 for the Cloud adds support for more cloud platforms including Amazon Machine Images and early support for the Sun Compute Cloud. It also adds several cloud-friendly administration features based on open source solutions such as Webmin and Ganglia.

It is important for organizations to understand where cloud computing and new approaches such as MapReduce fit into the enterprise data warehousing environment. Over the course of the next few months my monthly newsletter on the BeyeNETWORK will look at these topics in more detail and review the pros and cons of these new approaches.


Posted June 9, 2009 12:00 AM

3 Comments

Hi Colin:
Given your recent report on BI SaaS with Claudia, you know all about Kognitio and its Data Warehousing as a Service (DaaS) offering.

You're absolutely right in that data warehousing in the cloud environment is gaining momentum and traction. Kognitio, as you cited in your report, has been implementing various forms of DaaS for more than a decade with customers like British Telecom. In that sense, it's been doing data warehousing in the cloud for longer than the other three guys put together.

While the technology is rather mature, I keep seeing some issues with moving data to the cloud.
Data control, physical ownership, change management, SLAs etc. are still unresolved issues IMHO.

Nice summary of a busy week of announcements!

Just one minor point about the .NET support for MapReduce -- it is within Aster's core nCluster database product and not just the Aster nCluster Cloud Edition. As you pointed out, our main goal is to make MapReduce accessible to enterprises, most of whom are deploying our on-premise Aster nCluster database at this point.

Technical details here:
http://www.asterdata.com/mapreduce/index.php
