Links to the Columnar Database series:
For 99% of you, the database management systems you have been working with have been "row oriented". You may never have applied that label to them, know why it's important or that alternatives exist. And for many, even DBAs, the "need to know" down to the data page layout has not been there. Even the vendors who are columnar don't necessarily promote their products that way, but rather promote their systems as solving "analytic" queries.
The current main alternative to "row oriented" is "column oriented". "Column oriented" databases now permeate the landscape and claim to provide a better value proposition for a specialized analytic workload or even the entire data warehouse.
This is the orientation of....
Microsoft SQL Server 2008 R2 Gemini/Vertipaq
Oracle Exadata2 (both row- and columnar-)
Microsoft is using columnar in PowerPivot for SharePoint (Vertipaq) client-side collaborative result set manipulation. Oracle's Exadata2 accommodates both columnar and row orientation and Oracle has announced a "hybrid future". While scant on details, this should merit some attention. Also there is at least one vendor, Illuminate, which is a hybrid of row and column implementation, which I'll discuss later.
So what is columnar? Well, today, I'm going to start the literal answer and in later posts, I'll expand on the answer, the byproducts and ramifications. The literal answer will expand from a knowledge of the row-based data page layout.
Row- and columnar- have to do with how the data is laid out on a data page. DBAs will know about data page size, which can range from 4K to 64K. Every DBMS is slightly different, but inside the page is a page header, the records (with record headers), a row ID map and a small page footer. The row ID map is significant because it contains the offsets from the start of the page to the beginning of the record (well actually the record header.) If the DBMS wants to read the 4th column of the 3rd record on the 123rd page, it will go to the 123rd page and read the (let's say 2-byte) entry that is 8 bytes from the end of the page (2 byte footer plus 2 bytes for each row ID map entry times 3) to get to the 3rd record. That takes it to the offset and after (let's say) 6 bytes for the record header, the actual data begins.
In a row-oriented DBMS, EVERY field for the record is then found (in column order within the table.) If a certain column (say, the 4th column) is interesting, the DBMS will advance the number of bytes that's contained in the first 3 fields (according to the catalog notwithstanding variable length fields, for which the field length is contained IN the first 2 bytes of field itself) and there the read engine finds it's desired 4th column of the 3rd record on the 123rd page.
The time-consuming part of the operation was reading the data page in the first place. If you're interested in just one column, it would be great to be able to find a bunch of that column's values on the page, without having to skip over those other columns that aren't interesting. And that will set us up for the next entry in this series, which will be on the columnar data page.
Row-oriented data page:
Posted April 26, 2010 7:45 PM
Permalink | 2 Comments |