Potentially Teradata's most significant enhancement in a decade, Teradata Columnar, will be on display next week at the Teradata Partners conference. Few leading database players have altered the fundamental structure in which all of the columns of a table are stored consecutively on disk for each record. The innovations and practical use cases of "columnar databases" have come from the independent vendor world, where they have proven quite effective for an increasingly important class of analytic query. Here is the first in a series of blogs where I discussed columnar databases.
Teradata obviously is not a "columnar database" but would now be considered a hybrid, exhibiting columnar features on those columns that are chosen to participate. Teradata combines columnar capabilities with a feature-rich and requirements-matching DBMS already deployed by many large clients for their enterprise data warehouse. Columnar is available on all Teradata platforms: Teradata Active Enterprise Data Warehouse, Teradata Data Warehouse Appliance, Teradata Extreme Data Appliance and Teradata Extreme Performance Appliance.
Teradata's approach allows for the mixing of row structure, column structures and multi-column structures directly in the DBMS in "containers." The physical structure of each container can be either row format (extensive page metadata, including a map to offsets), which is referred to as "row storage format," or columnar format (the row "number" is implied by the value's relative position). All rows of the table will be treated the same way; for example, a table cannot use column structure or columnar format for the first 1 million rows and row structure for the rest. However, (row) partition elimination is still very much alive and, when combined with column structures, creates I/O that can now retrieve a very focused set of data for the price of a few metadata reads to facilitate the eliminations.
Each column goes in one container. A container can have one or multiple columns. Columns that are frequently accessed together should be put into the same container. Physically, multiple container structures are possible for columns with a large number of rows.
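To make the container idea concrete, here is a minimal Python sketch of a table whose columns are grouped into containers. It is a toy model under stated assumptions, not Teradata's implementation: the class `ContainerTable` and its methods are hypothetical names, and each container is simply a list in which position i holds the values for row i, so the row "number" is implied by position.

```python
from typing import Dict, List, Sequence, Tuple


class ContainerTable:
    """Toy model (hypothetical): columns are grouped into containers."""

    def __init__(self, groups: Sequence[Tuple[str, ...]]):
        # Each column goes in exactly one container.
        seen = set()
        for group in groups:
            for col in group:
                if col in seen:
                    raise ValueError(f"column {col!r} is in more than one container")
                seen.add(col)
        self.groups = [tuple(g) for g in groups]
        # One list per container; the list index is the implied row number.
        self.containers: List[List[tuple]] = [[] for _ in self.groups]

    def insert(self, row: Dict[str, object]) -> None:
        # Split the row so each container receives only its own columns.
        for group, store in zip(self.groups, self.containers):
            store.append(tuple(row[c] for c in group))

    def containers_needed(self, columns: Sequence[str]) -> List[int]:
        # Only containers holding a requested column have to be read.
        wanted = set(columns)
        return [i for i, g in enumerate(self.groups) if wanted & set(g)]

    def select(self, columns: Sequence[str]) -> List[Dict[str, object]]:
        idxs = self.containers_needed(columns)
        n_rows = len(self.containers[0]) if self.containers else 0
        result = []
        for r in range(n_rows):
            rec = {}
            for i in idxs:
                for col, val in zip(self.groups[i], self.containers[i][r]):
                    if col in columns:
                        rec[col] = val
            result.append(rec)
        return result
```

In this sketch, a query that touches only `city` and `state` reads one container if those two columns were grouped together, which is the reason for co-locating columns that are frequently accessed together.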
Teradata Columnar utilizes several compression methods that take advantage of the columnar orientation of the data. Methods include run-length encoding, dictionary encoding, delta compression, null compression, trim compression and the previously-available columnar-agnostic UTF8. Multiple methods can be used with each column.
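Three of the methods named above can be sketched in a few lines of Python. These are textbook versions of run-length, dictionary and delta encoding, written only to show why column-oriented data compresses well; the function names are hypothetical and nothing here reflects Teradata's internal formats.

```python
from itertools import groupby


def rle_encode(values):
    # Run-length encoding: collapse each run of equal values to (value, count).
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decode(pairs):
    return [v for v, count in pairs for _ in range(count)]


def dict_encode(values):
    # Dictionary encoding: store each distinct value once, then small codes.
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]


def delta_encode(numbers):
    # Delta compression: keep the first value, then successive differences.
    if not numbers:
        return []
    return [numbers[0]] + [b - a for a, b in zip(numbers, numbers[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```

Because a column holds values of one type, often with repeats or small steps between neighbors, these encodings tend to shrink it considerably. Methods can also be stacked, for example by run-length encoding the codes produced by `dict_encode`.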
The dictionary representations are fixed length, which allows the data pages to remain free of internal maps to where records begin. This small fact saves calculations at run-time for page navigation, another benefit of columnar. Variable-length records are handled similarly. Dictionaries are container-specific, which is advantageous in the usual case where column values are fairly unique to the column.
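The navigation benefit of fixed-length codes can be illustrated with a short Python sketch. The page layout here is an assumption made for illustration (codes packed back to back as big-endian integers of `width` bytes), and `pack_codes` and `read_row` are hypothetical helpers, not Teradata functions.

```python
def pack_codes(codes, width):
    # Store every dictionary code in exactly `width` bytes, back to back.
    return b"".join(code.to_bytes(width, "big") for code in codes)


def read_row(page, dictionary, row, width):
    # No offset map is needed: the position is computed from the row number.
    start = row * width
    code = int.from_bytes(page[start:start + width], "big")
    return dictionary[code]


# Usage: a container-specific dictionary and a page of 2-byte codes.
dictionary = ["CA", "NY", "TX"]
page = pack_codes([2, 0, 2, 1], width=2)
value = read_row(page, dictionary, row=3, width=2)
```

Since every entry has the same width, finding row 3 is a single multiplication rather than a lookup in per-page metadata.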
Table creation, with its automatic compression, starts with analyzing the workloads to be used with the data, focusing on column-specific workloads and then grouping the columns that are accessed together. Advantages will be seen in reduced storage needs and in improved performance for I/O-bound queries and scan operations.
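One simple way to approach the grouping step is to count how often pairs of columns appear in the same query and merge pairs that co-occur often. The Python sketch below is only one possible heuristic, not a Teradata tool or a recommended procedure; `co_access_counts`, `group_columns` and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(queries):
    # Count how many queries reference each pair of columns together.
    counts = Counter()
    for cols in queries:
        for a, b in combinations(sorted(set(cols)), 2):
            counts[(a, b)] += 1
    return counts


def group_columns(all_columns, queries, threshold):
    # Assumes every column named in `queries` also appears in `all_columns`.
    # Merge columns whose co-access count meets the threshold (union-find).
    parent = {c: c for c in all_columns}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), n in co_access_counts(queries).items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for c in all_columns:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())
```

Merging is transitive in this sketch, so a low threshold can produce very wide groups. A real design would also weigh query frequency and column width.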
Posted September 30, 2011 3:17 PM