Originally published May 3, 2012
The information explosion has rocked the world of data over the last five years. The advancement of mobile technology, the availability of tablets and smartphones, and the rapid growth of social media have all contributed to both production and consumption of data at never-before-seen volumes. Other contributing factors have been recommendation engines, cool new visualization capabilities for business intelligence (BI), advances in software technology to enable machine learning, and smarter systems in hospitals, airports, airplanes, automobiles and more.
Different types of data with varying degrees of complexity are produced at multiple levels of velocity. All of this data is now used for learning more about an enterprise, its customers, the brand, its equity and clout, the competition – all of the factors that drive a business, now quantifiable and available for use in decision making. This explosion of information has caused a flurry of hyperactivity in the enterprise, and business users are wondering about adopting new and additional data for analysis and decision making.
As this story unfolds, IT and data warehouse (DW) teams will start dreading the very thought of more data. Why? Because more data means more processing work for existing systems. This is the challenge, and this is where the nuance of workload is introduced.
Workload can be defined as the execution and completion of a task that utilizes a mix of resources (e.g., processing power, storage, disk input/output and network bandwidth). At any given time, any system that is processing information is executing a workload. This is common to the world of data warehousing and business intelligence, and very applicable to the underlying information architecture.
A standard data warehouse typically carries three types of workload. Think for a minute about your data warehouse and the information processing associated with it: data from multiple sources being loaded and integrated, downstream systems and data marts accessing data from the same data warehouse, and analytical platforms executing large, complex queries.
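To make the definition concrete, here is a minimal sketch that models a workload as a mix of resource demands and checks which shared resources are oversubscribed when the three workload types run at once. The class names, job names and demand figures are hypothetical illustrations, not measurements from any real system; each demand is expressed as a fraction of the capacity of one shared resource.

```python
from dataclasses import dataclass
from enum import Enum


class WorkloadType(Enum):
    LOAD = "load and integrate source data"
    DOWNSTREAM = "feed downstream systems and data marts"
    ANALYTICS = "run large, complex analytical queries"


@dataclass(frozen=True)
class Workload:
    """One task and the share (0.0-1.0) of each shared resource it needs."""
    name: str
    kind: WorkloadType
    cpu: float       # processing power
    storage: float   # storage capacity
    disk_io: float   # disk input/output
    network: float   # network bandwidth


RESOURCES = ("cpu", "storage", "disk_io", "network")


def total_demand(workloads):
    """Sum the demand that concurrent workloads place on each resource."""
    return {r: sum(getattr(w, r) for w in workloads) for r in RESOURCES}


def oversubscribed(workloads):
    """Return the resources whose combined demand exceeds capacity (1.0)."""
    return sorted(r for r, d in total_demand(workloads).items() if d > 1.0)


# Hypothetical demand figures, chosen only for illustration.
running = [
    Workload("nightly fact load", WorkloadType.LOAD,
             cpu=0.25, storage=0.25, disk_io=0.5, network=0.25),
    Workload("data mart refresh", WorkloadType.DOWNSTREAM,
             cpu=0.25, storage=0.125, disk_io=0.25, network=0.25),
    Workload("ad hoc analysis", WorkloadType.ANALYTICS,
             cpu=0.25, storage=0.125, disk_io=0.5, network=0.125),
]

print(oversubscribed(running))
```

With these assumed numbers no single job saturates anything, yet the combined disk I/O demand exceeds capacity, which is the kind of contention that appears when workloads share infrastructure.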
Now, let’s load complexity into this equation: You have jobs that need to load large fact tables, large history tables and move data from one level to another depending on the age of the data. While executing these specific jobs, your storage environment (SAN) is showing signs of slowdown. You find out that the SAN is shared across online transaction processing (OLTP), data warehouse and analytical databases. In addition, the storage is sharing disk between your data warehouse and OLTP system.
Before we proceed further down this path, let’s pause for a minute and look at what we have. We have a three-tier architecture: application → data → storage. In most cases, two out of the three tiers are shared across the enterprise. When these systems are commissioned and designed, the basic premise is to provide sustained performance per service level agreements. In the first few months, while data volumes and user counts are still low, performance will meet and exceed all service level agreements. Within six months to a year, performance degrades, sometimes drastically. This is not uncommon, and it is not caused by user adoption of the data warehouse.
Figure 1 shows the different categories of activities that affect a typical data warehouse:
In this scenario, traffic will keep moving, albeit slowly, as long as the vehicles stay in their appropriate lanes. A data warehouse is similar: some queries are like fast cars, returning small result sets, and other queries are like trucks and buses, bringing back a lot of information.
In this second scenario, traffic is congested because everybody is driving across lanes. The same thing happens in a data warehouse when you allow all types of queries across all data types, without regard for the existing database structure, how the data is stored or how it will be processed. You end up with loss of adoption, lack of trust and overall failure. This is where understanding workload becomes mandatory for data warehouse design.
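The lane analogy can be sketched in a few lines: queries are sorted into lanes by the size of the result set they are expected to return, so that small lookups do not queue behind large extracts. The lane names, the row threshold and the sample queries below are assumptions made for illustration; a real warehouse would rely on its own workload management features and its own cost estimates.

```python
from collections import defaultdict

# Hypothetical cutoff: queries expected to return more rows than this
# are treated as "trucks and buses" rather than "fast cars".
HEAVY_ROW_THRESHOLD = 100_000


def assign_lane(estimated_rows):
    """Pick a lane from the estimated size of the result set."""
    return "heavy" if estimated_rows > HEAVY_ROW_THRESHOLD else "fast"


def group_by_lane(queries):
    """Group (name, estimated_rows) pairs into lanes, preserving order."""
    lanes = defaultdict(list)
    for name, estimated_rows in queries:
        lanes[assign_lane(estimated_rows)].append(name)
    return dict(lanes)


# Illustrative queries with made-up row estimates.
queries = [
    ("customer lookup", 50),
    ("history extract", 5_000_000),
    ("daily dashboard", 2_000),
]

print(group_by_lane(queries))
```

Once queries are grouped this way, each lane can be given its own queue or resource limits so that heavy queries stay in their lane instead of slowing everything else down.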
SOURCE: Workload-Driven Data Warehousing
Recent articles by Krish Krishnan