For a while, the Hadoop community was proselytizing the new open source distributed data processing platform as a relational database killer. But wiser minds have prevailed, among them Mike Olson, a long-time database executive and current CEO of Cloudera, a leading distributor of Hadoop and related open source add-ons.
I recently sat down with Olson and Jon Kreisa, Cloudera VP of Marketing, and heard loud and clear that Hadoop plays a complementary role to relational-oriented data warehouses and BI tools. "It would be foolish for us to duplicate the functionality of a relational database which has more than 20 years of development behind it," says Olson.
According to Olson, Hadoop's sweet spot is processing large volumes of semi-structured and unstructured data in batch-oriented programs written by developers. Many BI architects see Hadoop as a perfect environment for staging and processing large volumes of clickstream and other unconventional data not commonly stored in a data warehouse.
In effect, Hadoop serves as a staging area and ETL system that filters and processes "big data" so it can be loaded into a data warehouse and joined with other corporate data for reporting and analysis. Hadoop also makes a terrific low-cost archival system that lets companies keep all their data online without having to summarize it or migrate it to tape.
Last year, Cloudera notched partnerships with a bevy of relational database vendors, who also see Hadoop as complementary to their data warehousing business. This year, Olson says, Cloudera will establish partnerships with multiple ETL and BI vendors, solidifying Hadoop's position as a key component in a large-scale BI architecture. Already, Cloudera has partnered with database, ETL, and BI vendors to create bridges between the two worlds. Cloudera's database partners include Aster Data, Greenplum, Membase, Netezza, Quest, Teradata, and Vertica. Its ETL partners include Informatica, Pentaho Data Integration, and Talend. And its BI partners include Jaspersoft, MicroStrategy, and Pentaho.
In closing, Olson admitted that despite the current cooperation between the Hadoop and BI communities, each is aggressively developing capabilities offered by the other, which will eventually reduce the need for such partnerships. In fact, many large Internet companies, including eBay, which recently spoke on a Cloudera Webcast, say they are using Hadoop for reporting and analysis as well as for staging, archiving, and preprocessing.
So, while the two camps are playing nice today, the battle has only just begun!
Posted December 2, 2010 2:52 PM