
Blog: William McKnight


About the author

William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at wmcknight@mcknightcg.com.

Editor's Note: More articles and resources are available in William's BeyeNETWORK Expert Channel.

It was columnar day for me at SQL PASS on Wednesday. On Tuesday, Microsoft announced that Denali, the code name for its next release, would have a columnar data store option. My talk Wednesday was on columnar databases. Here are some of the details I shared about Denali's column store, which has the project name Apollo. If you're interested in columnar databases in general, see my blog entries here.

In Denali, there will be an index type of "COLUMN STORE". I find this an interesting use of the term "index," because the resultant data stores that are created are not like traditional indexes. However, Microsoft has never been a conformist organization. The column stores are non-clustered indexes, and no indexes can be created on top of them.
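Based on what Microsoft has shown so far, creating one might look something like the sketch below. The table and column names are hypothetical, and the final DDL could well differ in the shipped release:

```sql
-- Hypothetical sketch of the Denali/Apollo column store DDL.
-- Each listed column is stored as its own independent set of segments,
-- so the order the columns are named in does not matter.
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactSales
ON dbo.FactSales (OrderDateKey, ProductKey, StoreKey, Quantity, SalesAmount);
```

Only one such index is allowed per table, but you can name as many columns as you want in it.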

Where the column store is like an index is that you still need the accompanying row-wise database. The column stores cannot exist independently of the row-wise tables. To my knowledge, this is the only database that requires this, and I don't expect it to be a long-term requirement. While this may seem like it expands your storage needs (and it does), it may not be by as much as you initially think, because some non-clustered indexes might become redundant in this architecture.

The good news about this is that the optimizer has been updated to route queries to the column stores or the row store accordingly.  This could prove to be a competitive differentiator.  Few other database systems have this.  An intelligent hybrid optimizer will be key to the success of databases that are at least partly columnar. 

Apollo's vectors (per the Sybase IQ language in my earlier posts) are called column segments, although there can be multiple segments per column, as explained below. You can have only one column store index per table, but you can name as many columns as you want. Of course, the order you list the columns in doesn't matter, because each column forms an independent segment. Only single-column segments are supported in Apollo.

Apollo leverages the intellectual property, patents, and good experiences that Microsoft has had with Vertipaq, the client-side, columnar, in-memory structure used with PowerPivot. Columnar remains the preferred, future, and only format for Vertipaq.

In Apollo, no inserts, updates or deletes are allowed on tables that have a COLUMN STORE index (this is the part of the talk where I did a mock exit). You can, however, do incremental loads, and you can switch partitions to add data. You can also sandwich your updates between a DISABLE and a REBUILD of the segments. I expect this will improve over time.
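The DISABLE/REBUILD sandwich might look something like this. Again, a sketch with hypothetical object names, and the exact sequence could change before release:

```sql
-- Hypothetical sketch: take the column store offline, do the updates,
-- then rebuild the segments from the row-wise data.
ALTER INDEX csi_FactSales ON dbo.FactSales DISABLE;

UPDATE dbo.FactSales
SET SalesAmount = SalesAmount * 1.05
WHERE OrderDateKey = 20101112;

ALTER INDEX csi_FactSales ON dbo.FactSales REBUILD;
```

The rebuild re-derives all the segments, which is why this is a batch-window technique rather than something you would do per transaction.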

As long as I'm on limitations, the columns selected have data type restrictions.  The columns must be integer, real, string, money, datetime or a decimal that is 18 digits or less.  No other data types are supported. [11/19 update: Apollo will support all the date and time types (including datetime) except that datetimeoffset is only supported for precisions of 0 to 2.]

As we know, getting more (relevant) information per I/O is one of the advantages of a columnar data store. Microsoft has taken this to a new level. While data is still stored in pages (blocks), the unit of I/O is actually one million data values, which form a "segment." You read that right: the unit of I/O is not a certain number of "K" but a NUMBER of data values. Inside those pages, the data is stored in blobs. Bitmapping is part of the storage somehow as well, although columnar data page layouts are not public information; neither is how it does materialization. As for I/O, the compression algorithms have been reengineered for columnar. These are not the same compression algorithms from the row-wise database.

If there are more than a million data values represented, there will be multiple segments. Apollo uses run-length encoding ("row-range" in this post), and the 'million' refers to the number of corresponding rows covered by the data values. If there are 500,000 unique values in a 2,000,000-row table, there will be 500,000 entries but 2 segments.

Apollo is not available in CTP1. I have no information on its licensing, except that it probably won't be available in the lower-tier SQL Server editions. As for which columns to store this way, it's going to be per the recommendations from this post.

Get ready as Microsoft begins its foray into columnar, taking the pressure off the core DBMS to handle all workloads and handle them all fast.

Posted November 12, 2010 10:22 AM