Oops, what is that I tripped on? Oh, it’s another new column-oriented data warehouse appliance. If you haven’t noticed, in 2007, Vertica, ParAccel and Calpont have emerged with a column orientation to their DBMS and the appliance model to their delivery. By the way, that makes 12 data warehouse appliances by my count.
A phrase I saw on the internet recently - “Pioneer calls RDBMS technology obsolete” - caught my eye, and the first thing that came to mind was “Michael Stonebraker?” My suspicions were correct. Vertica is his new venture, and he states: “my prediction is that column stores will take over the warehouse market over time, completely displacing row stores”.
Most IS professionals do not know about column (or “vector”) oriented DBMS. A column-oriented DBMS has several major architectural differences from other relational database management systems. The main difference is that it physically stores data by column rather than by row. Because all of a column’s values sit together, and adjacent rows of a table often hold similar values in a given column, the data compresses very well, column by column. The layout also provides excellent performance when you select a small subset of the columns in a table, since you do not perform I/O for data that is not needed.
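To make the two benefits above concrete, here is a toy sketch in Python. It is not how Vertica, ParAccel, Calpont or any other product works internally; the sample table, the column names and the choice of run-length encoding are all my own assumptions for illustration. It pivots a handful of rows into columns, compresses two columns by collapsing runs of adjacent equal values, and compares how many values a single-column query has to read under each layout.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, region, status).
column_names = ("order_id", "region", "status")
rows = [
    (1, "EAST", "SHIPPED"),
    (2, "EAST", "SHIPPED"),
    (3, "EAST", "OPEN"),
    (4, "WEST", "OPEN"),
    (5, "WEST", "OPEN"),
    (6, "WEST", "SHIPPED"),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of adjacent equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, column_names)

# Similar values in adjacent rows collapse into a few (value, count) pairs.
region_rle = run_length_encode(columns["region"])
status_rle = run_length_encode(columns["status"])

# A query that needs only "region" reads every value of every row in a
# row layout, but only the one column in a columnar layout.
values_read_row = len(rows) * len(column_names)
values_read_col = len(columns["region"])

print(region_rle)
print(status_rle)
print(values_read_row, values_read_col)
```

The six region values shrink to two pairs, and the region-only query touches a third of the values. Real products use more elaborate encodings and storage structures, so treat this only as a picture of the principle.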
Model 204 was sort of like this, and Sybase IQ is definitely column oriented. There have been special occasions where such products were more appropriate than a row-oriented DBMS.
It will be interesting to see where and how these approaches find merit in DW, whether they have overcome some of the problems of the past, such as those below (early indications are that they may have), and finally whether they intend to compete for the EDW, as Michael Stonebraker suggests in his quote above.
Former challenges of column-oriented DBMS:
Indexing overhead: it was recommended and common practice to index all columns at least once and, for some columns, more than once
Lack of parallelism
Query performance disadvantages for any query other than columnar functions
Insert performance disadvantages
Overcoming lack of market resources, lack of vendor ports and industry row-oriented mindsets
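Two of the challenges above, insert performance and queries that want whole rows, follow fairly directly from the layout. The sketch below is a deliberately naive illustration in Python; the `RowTable` and `ColumnTable` classes are hypothetical, and counting "structures touched" is only a stand-in for real I/O cost. Newer products may well batch or buffer writes to soften this.

```python
class RowTable:
    """Toy row store: each row is kept together as one tuple."""

    def __init__(self, names):
        self.names = tuple(names)
        self.rows = []

    def insert(self, row):
        self.rows.append(tuple(row))
        return 1  # one structure touched

    def fetch(self, index):
        return self.rows[index], 1  # whole row comes from one place


class ColumnTable:
    """Toy column store: one separate list per column."""

    def __init__(self, names):
        self.names = tuple(names)
        self.columns = {name: [] for name in self.names}

    def insert(self, row):
        # A single new row has to be split across every column.
        for name, value in zip(self.names, row):
            self.columns[name].append(value)
        return len(self.names)

    def fetch(self, index):
        # Rebuilding a full row means visiting every column.
        row = tuple(self.columns[name][index] for name in self.names)
        return row, len(self.names)


names = ("order_id", "region", "status", "amount")
row_table = RowTable(names)
col_table = ColumnTable(names)

new_row = (1, "EAST", "SHIPPED", 250)
row_insert_touches = row_table.insert(new_row)
col_insert_touches = col_table.insert(new_row)

row_result, row_fetch_touches = row_table.fetch(0)
col_result, col_fetch_touches = col_table.fetch(0)

print(row_insert_touches, col_insert_touches)
print(row_fetch_touches, col_fetch_touches)
```

With four columns, the toy column store touches four structures for every insert and every full-row fetch, against one for the row store. The wider the table, the bigger the gap.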
Posted October 5, 2007 8:26 AM