We use cookies and other similar technologies (Cookies) to enhance your experience and to provide you with relevant content and ads. By using our website, you are agreeing to the use of Cookies. You can change your settings at any time. Cookie Policy.


Blog: William McKnight Subscribe to this blog's RSS feed!

William McKnight

Hello and welcome to my blog!

I will periodically be sharing my thoughts and observations on information management here in the blog. I am passionate about the effective creation, management and distribution of information for the benefit of company goals, and I'm thrilled to be a part of my clients' growth plans and connect what the industry provides to those goals. I have played many roles, but the perspective I come from is benefit to the end client. I hope the entries can be of some modest benefit to that goal. Please share your thoughts and input to the topics.

About the author >

William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at wmcknight@mcknightcg.com.

Editor's Note: More articles and resources are available in William's BeyeNETWORK Expert Channel. Be sure to visit today!

Links to the Columnar Database series:

 

Understanding Columnar Database Fundamentals

More on the Row-Oriented Data Page

The Columnar Data Page

How Columnar Addresses Modern Bottlenecks

Columnar Materialization

The Value of Performance and When to Consider Columnar Databases

So far in this series on columnar, I have attempted to distinguish the columnar database from the row-wise database from the bottoms up, drawing distinctions in how the data is stored on disk, which is the major physical difference.  Once the data is stored in its new and unusual way in a column-based DBMS, it, of course, must be retrieved.

In row-wise data retrieval, we frequently ask for multiple tables to be joined in the result set.  Of course, in columnar (which works with SQL too by the way), there are joins, but there is also an analogous concept to joins in columnar within a single table.  It's a process I call "gluing" and it's the process of pulling together the necessary columns from within a table that are needed by the query, either for selection or projection (where clause).

In row-wise, unnecessary columns must be removed from the result set because inevitably (unless the query is a SELECT *) more columns were retrieved than are interesting to the query.  In columnar, since just enough columns were retrieved, that is not necessary.  However, what is necessary is a strategy that determines when to glue (materialize) the columns together.  There are two choices - early materialization or late materialization.

Early materialization retrieves all the columns necessary and glues them together up front, effectively creating a "row" with which to do the necessary processing.  If you asked...

SELECT CUSTID, PRICE

FROM SALES

WHERE PRODID = 4 AND STOREID = 1

...it would pull CUSTID, PRICE, PRODID, and STOREID from their respective vectors, glue them together*, apply the WHERE clause and return the selected columns by trimming PRODID and STOREID from the "row".  This strategy works best for complex joins.

The same query could also look into the PRODID vector and return a bitmap of the rows that met the condition (=4), do the same for STOREID (=1), then "AND" the bitmaps to determine which rows met both conditions.  The columns that matched the rows in those vectors can then be pulled and glued together at the very end of the processing, just before presentation.  This is called late materialization and it is usually best for less complex and non-join queries.

Ideally, the columnar optimizer is able to execute either strategy.  Ideally, it can analyze the query and determine and execute the best strategy.  It's important to analyze your workload in order to determine the value of this aspect of the columnar optimizer to your shop.  And that will lead to the next entry, which will be on the fit of columnar databases in an enterprise.

*Recalling from this post how it carefully connects columns from the same row


Posted June 1, 2010 1:51 PM
Permalink | 1 Comment |

1 Comment

Easy to understand article. Thanks!

Leave a comment

    
   VISIT MY EXPERT CHANNEL

Search this blog
Categories ›
Archives ›
Recent Entries ›