Blog: William McKnight

William McKnight

Hello and welcome to my blog!

I will periodically be sharing my thoughts and observations on information management here in the blog. I am passionate about the effective creation, management and distribution of information in service of company goals, and I'm thrilled to be a part of my clients' growth plans and to connect what the industry provides to those goals. I have played many roles, but the perspective I write from is the benefit to the end client. I hope these entries can be of some modest benefit toward that goal. Please share your thoughts and input on the topics.

About the author

William is the president of McKnight Consulting Group, a firm focused on delivering business value and solving business challenges utilizing proven, streamlined approaches in data warehousing, master data management and business intelligence, all with a focus on data quality and scalable architectures. William functions as strategist, information architect and program manager for complex, high-volume, full life-cycle implementations worldwide. William is a Southwest Entrepreneur of the Year finalist, a frequent best-practices judge, has authored hundreds of articles and white papers, and given hundreds of international keynotes and public seminars. His team's implementations from both IT and consultant positions have won Best Practices awards. He is a former IT Vice President of a Fortune company, a former software engineer, and holds an MBA. William is author of the book 90 Days to Success in Consulting. Contact William at wmcknight@mcknightcg.com.

Editor's Note: More articles and resources are available in William's BeyeNETWORK Expert Channel. Be sure to visit today!

Recently in Business Intelligence/Data Warehousing Category

This week, Teradata introduced its in-memory capabilities.  The press release is here.   

In-memory is currently the fastest commercial medium for data storage, upwards of thousands of times faster than HDD.  Naturally, it is relatively expensive, or it would be more pervasive by now.  However, the price point has dropped recently and we are having an in-memory hardware renaissance.  As Tony Baer of Ovum said in a tweet response to me:  "w/DRAM cheap,takes SmartCaching to nextlevel."

Teradata takes in-memory a step further by considering the relative priority of data: though all data is backed by disk, the system automatically transfers data in and out of memory as appropriate, in a feature it calls Intelligent Memory.  This is the data "80/20 rule" at work.  After 7 days of usage, Teradata says the allocation will be fairly settled.  Teradata has found that, across multiple environments, 85% of I/O touches 10% of the cylinders.  It's in-memory driven by data temperature (and this extends to HDD and SSD).

You still need to allocate the memory (and SSD) initially, but whatever you allocate will be intelligently utilized.  Intelligent Memory will run across the Teradata family: the 670, 1650, 2700 and 6700 models.
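
To make the temperature idea concrete, here is a minimal Python sketch of temperature-driven placement. It is not Teradata's implementation; the tier names, thresholds and access-count heuristic are assumptions purely for illustration.

```python
from collections import Counter
from enum import Enum

class Tier(Enum):
    MEMORY = "memory"   # hottest data
    SSD = "ssd"         # warm data
    HDD = "hdd"         # cold data

class TemperatureTierer:
    """Toy model: place data blocks on a storage tier based on recent access counts."""

    def __init__(self, hot_threshold=100, warm_threshold=10):
        self.access_counts = Counter()
        self.hot_threshold = hot_threshold
        self.warm_threshold = warm_threshold

    def record_access(self, block_id):
        self.access_counts[block_id] += 1

    def tier_for(self, block_id):
        count = self.access_counts[block_id]
        if count >= self.hot_threshold:
            return Tier.MEMORY
        if count >= self.warm_threshold:
            return Tier.SSD
        return Tier.HDD

# After a settling-in period, the hottest blocks (which draw most of the I/O)
# end up in memory; the rest stay on SSD or HDD.
tierer = TemperatureTierer()
for _ in range(150):
    tierer.record_access("block_sales_2013")
tierer.record_access("block_sales_2004")
print(tierer.tier_for("block_sales_2013"))  # Tier.MEMORY
print(tierer.tier_for("block_sales_2004"))  # Tier.HDD
```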

It behooves us to seek out the fastest medium for data that can be utilized effectively in the organization and delivers maximum ROI.  Those who understand the possibilities for data today should be sharing and stimulating the possibilities for information in the organization.  All productive efforts at information management go to the bottom line.


Posted May 9, 2013 11:05 AM
I'm sitting in Dave Wells and John Myers' class on Data Virtualization at the Data Warehousing Institute World Conference in Las Vegas.  They presented a good list of business considerations for when to do data virtualization (versus materialization).  These include (a small scoring sketch follows the list):

  • High urgency
  • Limited budget
  • High volatility in requirements
  • Tight constraints on data replication
  • An "explore and learn" organizational personality
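
As a rough way to apply these considerations, here is a small, hypothetical scoring helper. The criteria come from the list above; the equal weighting and the majority tipping point are my own simplifying assumptions, not anything from the class.

```python
# Hypothetical checklist scorer for virtualization vs. materialization.
# Each True answer is one vote toward virtualizing the data source.
CRITERIA = [
    "high_urgency",
    "limited_budget",
    "high_requirements_volatility",
    "tight_replication_constraints",
    "explore_and_learn_culture",
]

def lean_toward_virtualization(answers: dict) -> bool:
    """Return True if a majority of the considerations point at virtualization."""
    votes = sum(1 for c in CRITERIA if answers.get(c, False))
    return votes > len(CRITERIA) / 2

# Example: an urgent, budget-constrained effort with volatile requirements.
print(lean_toward_virtualization({
    "high_urgency": True,
    "limited_budget": True,
    "high_requirements_volatility": True,
}))  # True
```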

Dave also mentioned, and I agree, that you buy data virtualization for infrastructure, not for one project.  This is a point I'll make again in my keynote tomorrow, where I regard data virtualization, data integration and governance as "blankets" over the "no-reference" architecture ("no" because how can it be a reference architecture when it keeps changing).

Posted February 20, 2013 10:25 AM

The story goes that Willie Sutton robbed banks because "that's where the money is."  While this attribution appears to be an urban legend, it's no myth that Oracle has the lion's share of databases - both transactional and analytic.

IBM started an advanced land grab for Oracle customer conversions by bringing a high degree of PL/SQL compatibility into the DB2 database.

Now, Teradata has invested resources in facilitating the migration away from Oracle.  With the Teradata Migration Accelerator (TMA), structures and SQL (PL/SQL) code can be converted to Teradata structures and code.  This is a different philosophy from IBM's, which requires few code changes for the move but also doesn't immediately optimize that code for DB2.

While data definition language (DDL) has only minor changes from DBMS to DBMS, such as putting quotes around keywords, Teradata's key activity and opportunity in the migration is to change Oracle cursors to Teradata set-based SQL. 

"Rule sets" - for how to do conversions - can be applied selectivity across the structure and code in the migration.  TMA supports selective data movement, if desired, with WHERE clauses for the data.  TMA also supports multiple users doing a coordinated migration effort.

TMA also works for DB2 migrations.

While they will not do the trick on their own, tools like these, which can convince a shop that the move could be less painful than originally thought, will support DBMS migrations.


Posted April 30, 2012 10:15 AM

Teradata rolled out Teradata Data Labs (TDL) in Teradata 14.  Though it is not a high-profile enhancement, it is worth understanding not only for Teradata data warehouse customers, but for all data warehouse programs, as a signal of how program architectures now look.  Teradata Data Labs supports how customers are actually playing with their resource allocations in production environments in an effort to support more agile practices under more control by business users.

TDL is part of Teradata Viewpoint, a portal-based system management solution.  TDL is used to manage "analytic sandboxes" for these non-traditional builders of data systems.  Companies can allocate a percentage of overall disk and other resources to the lab area, and the authorities can be managed with TDL.  By creating "data labs" and assigning them to requesting business users, TDL minimizes the potential dangers of the "can of worms" that has long been open: supporting production create, alter and delete activity - not just select activity - by business users.

These sandboxes must be managed since resources are limited.  Queries can be limited, various defaults set and, obviously, disk space is capped for each lab.  Expiration dates can be placed on the labs, which is not dissimilar to how a public library works.  Timeframes typically span a week to a year.  Users may also send a "promotion" request to the labs manager, asking that the entities within the lab be moved out of the lab and into production.
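
To picture the mechanics, here is a small, hypothetical Python model of a lab with a space quota and an expiration date; the class and field names are mine, not Viewpoint's or TDL's.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class DataLab:
    """Toy model of an analytic sandbox: a space quota plus an expiration date."""
    owner: str
    quota_gb: int
    expires: date
    used_gb: int = 0

    def allocate(self, gb: int) -> None:
        if date.today() > self.expires:
            raise RuntimeError(f"Lab for {self.owner} has expired")
        if self.used_gb + gb > self.quota_gb:
            raise RuntimeError("Quota exceeded; request more space or clean up")
        self.used_gb += gb

# A business user gets 500 GB for 90 days, library-book style.
lab = DataLab(owner="marketing_analytics", quota_gb=500,
              expires=date.today() + timedelta(days=90))
lab.allocate(120)
print(lab.used_gb, "GB of", lab.quota_gb, "GB used")
```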

Data labs can be joined to data in the regular data warehouse.  One Teradata customer has 25% of the data warehouse space allocated to TDL.

TDL can support temporary processing needs with strong resources - not what is usually found in development environments.  I can also see TDL supporting normal IT development.  Look into TDL, or home-grow the idea within your non-Teradata data warehouse environment.  It's an idea whose time has come.

TDL is backward-compatible to Teradata 13.


Posted April 17, 2012 9:38 AM

Potentially Teradata's most significant enhancement in a decade will be on display next week at the Teradata Partners conference: Teradata Columnar.  Few leading database players have altered the fundamental structure of having all of the columns of a table stored consecutively on disk for each record.  The innovations and practical use cases of "columnar databases" have come from the independent vendor world, where the approach has proven quite effective for an increasingly important class of analytic queries.  Here is the first in a series of blogs where I discussed columnar databases.

Teradata obviously is not a "columnar database" but would now be considered a hybrid, exhibiting columnar features for those columns that are chosen to participate.  Teradata combines columnar capabilities with a feature-rich and requirements-matching DBMS already deployed by many large clients for their enterprise data warehouse.  Columnar is available on all Teradata platforms - Teradata Active Enterprise Data Warehouse, Teradata Data Warehouse Appliance, Teradata Extreme Data Appliance and Teradata Extreme Performance Appliance.

Teradata's approach allows for the mixing of row structures, column structures and multi-column structures directly in the DBMS in "containers."  The physical format of each container can be row format (extensive page metadata, including a map to offsets), referred to as "row storage format," or columnar format (the row "number" is implied by the value's relative position).  All rows of the table are treated the same way, i.e., there is no column structure/columnar format for the first 1 million rows and row structure for the rest.  However, (row) partition elimination is still very much alive and, when combined with column structures, creates I/O that can retrieve a very focused set of data for the price of a few metadata reads to facilitate the eliminations.

Each column goes in one container, and a container can have one or multiple columns.  Columns that are frequently accessed together should be put into the same container.  Physically, multiple container structures are possible for columns with a large number of rows.
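
Here is a hypothetical Python sketch of the container idea: values are stored per container and the row "number" is simply the value's position, so columns accessed together can share a container. This is an illustration only, not Teradata code.

```python
# Illustration only: a columnar "container" holds one or more columns' values,
# and the row id is implied by position rather than stored with each value.
class Container:
    def __init__(self, *column_names):
        self.column_names = column_names
        self.values = []  # one tuple of values per row, in row order

    def append(self, *values):
        self.values.append(values)

    def row(self, row_id):
        # The row "number" is simply the position; no per-row id is stored.
        return dict(zip(self.column_names, self.values[row_id]))

# Columns frequently accessed together share a container.
qty_amount = Container("qty", "amount")
qty_amount.append(3, 29.97)
qty_amount.append(1, 9.99)
print(qty_amount.row(1))  # {'qty': 1, 'amount': 9.99}
```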

Teradata Columnar utilizes several compression methods that take advantage of the columnar orientation of the data.  Methods include run-length encoding, dictionary encoding, delta compression, null compression, trim compression and the previously-available columnar-agnostic UTF8.  Multiple methods can be used with each column.


The dictionary representations are fixed length, which allows the data pages to remain free of internal maps to where records begin.  This small fact saves run-time calculations for page navigation, another benefit of columnar.  Variable-length records are handled similarly.  Dictionaries are container-specific, which is advantageous in the usual case where column values are fairly unique to the column.
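
Here is a minimal Python sketch of two of the methods listed above, run-length encoding and dictionary encoding, applied to a single column; the functions are illustrative and say nothing about Teradata's actual on-disk formats.

```python
from itertools import groupby

def run_length_encode(column):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def dictionary_encode(column):
    """Replace each value with a small code into a per-container dictionary."""
    dictionary = {v: i for i, v in enumerate(dict.fromkeys(column))}
    return dictionary, [dictionary[v] for v in column]

states = ["TX", "TX", "TX", "CA", "CA", "TX"]
print(run_length_encode(states))   # [('TX', 3), ('CA', 2), ('TX', 1)]
print(dictionary_encode(states))   # ({'TX': 0, 'CA': 1}, [0, 0, 0, 1, 1, 0])
```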

Start by analyzing the workloads that will be used with the data, focusing on column-specific workloads, and then group columns that are accessed together; this lays the foundation for table creation, with its automatic compression.  The advantages will be seen in lower storage needs and improvements in I/O-bound query performance and scan operations.


Posted September 30, 2011 3:17 PM

