Relational database systems, such as IBM DB2 and Oracle Database, have undergone more than a quarter century of development. During that time they have successfully fought off competing database technologies for mainstream database management. Do you remember the object/relational wars of the eighties?
MapReduce, a software framework introduced by Google to support parallel processing of very large (petabyte-scale) data sets, has garnered significant attention of late. IBM is experimenting with it in conjunction with Google, and Greenplum recently announced support for it.
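For readers unfamiliar with the programming model, here is a minimal, single-process Python sketch of the map, shuffle and reduce phases, using the classic word-count example. It is an illustration only: it assumes a small in-memory list of documents, the function names are hypothetical, and it does not reflect the actual APIs of Google's implementation or Hadoop, both of which distribute these phases across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Hypothetical map function: emit an intermediate (word, 1) pair per word.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework would do
    # between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Hypothetical reduce function: combine all values for one key.
    return key, sum(values)


def map_reduce(documents: list[str]) -> dict[str, int]:
    # Every document is read in full; no index is consulted.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(docs))
    # prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each map call reads its entire input, the model relies on scanning data in parallel rather than on indexes, which is the basis of the "brute force" criticism quoted below.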
The significant interest in MapReduce, and related technologies such as Hadoop and HDFS, has led to a backlash from the relational camp. David DeWitt and Michael Stonebraker have been especially outspoken (see www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html and www.databasecolumn.com/2008/01/mapreduce-continued.html).
Here is a short excerpt from their comments on the topic:
"As both educators and researchers, we are amazed at the hype that the MapReduce proponents have spread about how it represents a paradigm shift in the development of scalable, data-intensive applications. MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is:
1. A giant step backward in the programming paradigm for large-scale data intensive applications
2. A sub-optimal implementation, in that it uses brute force instead of indexing
3. Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago
4. Missing most of the features that are routinely included in current DBMS
5. Incompatible with all of the tools DBMS users have come to depend on"
Does this mean the database wars are starting up again?
My opinion is that MapReduce is not intended for general-purpose commercial database processing and is therefore not a major threat to relational systems. It does, however, have its uses (as Google has demonstrated) for certain types of high-volume processing. It also shows that as data volumes grow, and as data and data structures become more complex, other types of database technology may start to gain traction in certain niche markets. IBM's use of the SPADE language, rather than StreamSQL, in its InfoSphere Streams product (System S) is a further sign of the changes under way in the database market.
What do you think?
Posted November 25, 2008 4:20 PM