Blog: William McKnight Subscribe to this blog's RSS feed!

William McKnight

Hello and welcome to my blog!

I will periodically be sharing my thoughts and observations on information management here in the blog. I am passionate about the effective creation, management and distribution of information for the benefit of company goals, and I'm thrilled to be a part of my clients' growth plans and connect what the industry provides to those goals. I have played many roles, but the perspective I come from is benefit to the end client. I hope the entries can be of some modest benefit to that goal. Please share your thoughts and input to the topics.

About the author >

William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at wmcknight@mcknightcg.com.

Editor's Note: More articles and resources are available in William's BeyeNETWORK Expert Channel. Be sure to visit today!

 Links to the Columnar Database series:

 

Understanding Columnar Database Fundamentals

More on the Row-Oriented Data Page

The Columnar Data Page

How Columnar Addresses Modern Bottlenecks

Columnar Materialization

The Value of Performance and When to Consider Columnar Databases

For 99% of you, the database management systems you have been working with have been "row oriented".  You may never have applied that label to them, know why it's important or that alternatives exist.  And for many, even DBAs, the "need to know" down to the data page layout has not been there.  Even the vendors who are columnar don't necessarily promote their products that way, but rather promote their systems as solving "analytic" queries.

The current main alternative to "row oriented" is "column oriented".  "Column oriented" databases now permeate the landscape and claim to provide a better value proposition for a specialized analytic workload or even the entire data warehouse. 

This is the orientation of....

  • Sybase IQ
  • SAND
  • Vertica
  • ParAccel
  • InfoBright
  • Kickfire
  • Exasol
  • MonetDB
  • Microsoft SQL Server 2008 R2 Gemini/Vertipaq
  • Oracle Exadata2 (both row- and columnar-)
  • Calpont's InifiniDB

Microsoft  is using columnar in PowerPivot for SharePoint (Vertipaq) client-side collaborative result set manipulation.  Oracle's Exadata2 accommodates both columnar and row orientation and Oracle has announced a "hybrid future".  While scant on details, this should merit some attention.  Also there is at least one vendor, Illuminate, which is a hybrid of row and column implementation, which I'll discuss later.

 

So what is columnar?  Well, today, I'm going to start the literal answer and in later posts, I'll expand on the answer, the byproducts and ramifications.  The literal answer will expand from a knowledge of the row-based data page layout.

 

Row- and columnar- have to do with how the data is laid out on a data page.  DBAs will know about data page size, which can range from 4K to 64K.  Every DBMS is slightly different, but inside the page is a page header, the records (with record headers), a row ID map and a small page footer.  The row ID map is significant because it contains the offsets from the start of the page to the beginning of the record (well actually the record header.)  If the DBMS wants to read the 4th column of the 3rd record on the 123rd page, it will go to the 123rd page and read the (let's say 2-byte) entry that is 8 bytes from the end of the page (2 byte footer plus 2 bytes for each row ID map entry times 3) to get to the 3rd record.  That takes it to the offset and after (let's say) 6 bytes for the record header, the actual data begins.

 

In a row-oriented DBMS, EVERY field for the record is then found (in column order within the table.)  If a certain column (say, the 4th column) is interesting, the DBMS will advance the number of bytes that's contained in the first 3 fields (according to the catalog notwithstanding variable length fields, for which the field length is contained IN the first 2 bytes of field itself) and there the read engine finds it's desired 4th column of the 3rd record on the 123rd page.

 

The time-consuming part of the operation was reading the data page in the first place.  If you're interested in just one column, it would be great to be able to find a bunch of that column's values on the page, without having to skip over those other columns that aren't interesting.  And that will set us up for the next entry in this series, which will be on the columnar data page.

 

Row-oriented data page:datapage.JPG


Posted April 26, 2010 7:45 PM
Permalink | 2 Comments |

2 Comments

Bill,

great to hear you talking about this topic. I believe columnar is the way of the future for all non-volatile data which is why I joined SAND.

Over the next few months I believe you will find a number of new things at SAND which will be of interest to you. Keep an eye out around mid-May.

Mike

Hi Mike and thanks. I've been talking about columnar and will be giving a talk on "Columnar Databases 101: The Vectorizing of Information" at the Dallas chapter of TDWI on June 11. All are welcome, no fee.

http://tdwichapters.org/Dallas

Leave a comment

    
   VISIT MY EXPERT CHANNEL

Search this blog
Categories ›
Archives ›
Recent Entries ›