Print
Email
ADVERTISEMENT
business intelligence resources
Why Do We Need Data Warehouse Appliances? Part 2
The Challenges of High Performance Data Warehousing
Listen Podcast
(To subscribe to receive audio articles, click here for the audio article feed.)
Published: November 1, 2007
The key factors that drive effective data warehouse performance and challenge IT departments are data loading, availability, volume, storage and operational issues.

The most common challenges that constantly hit the user and management teams are as follows:

  • "My query that runs for 10 minutes in 100,000 records runs for 10 hours in 1,000,000 records."

  • "My ETL processes are slow or have failed due to large data volumes to process into the data warehouse."

  • “My CFO wants to know why we need another $500,000 for infrastructure when we invested a similar amount just 6 months ago.” 

  • “My database administration team is exhausted trying to keep the tuning process online for the databases to achieve speed.”

The key factors that drive effective data warehouse performance and challenge IT departments are data loading, availability, volume, storage and operational issues.

Loading Data

Loading data into a data warehouse is one of the longest processes in terms of time. The process of extracting various data feeds, processing them through data quality, and then data profiling and loading them with or without transformations to a final destination is time-consuming. This can be especially challenging when input volumes are low with smaller bursts of data and speed is impacted by the volume of data in the data warehouse.

Data Availability

Data availability service level agreements (SLAs) have a profound impact on the need to have a high-performance environment. End user requirements must be clearly documented for data to be pristine, integrated and seamlessly available for downstream applications like reporting and analytics. Additionally, organizations often fall short on data growth projections, data demand projections, data retention cycles and associated SLAs that have not been documented.

Data Volumes

Data volumes in the average data warehouse have been exploding by gigabytes every day. Growth rates for capturing and retaining granular details have been increasing over the past 3 years. A few reasons for this data volume explosion could include:

  • Compliance requirements

  • Legal mandates

  • Business mergers and acquisitions

  • Analytics

Storage

Disk and storage systems have consistently improved over the years, both in terms of speed and performance, while costs have been relatively stable and, in some cases, less expensive. In a traditional architecture, storage is shared across all areas of a data warehouse, making it a highly constrained area in terms of availability and performance. ETL and business intelligence (BI) queries produce large amounts of traffic and consuming a lot of space to compute the complex result sets. Shared disk architecture has never been an answer for the data warehouse.

Operational Issues

The cost of maintaining a data warehouse has become monumental in many organizations. With the need for granularity of the data growing and history retention growing, a two-way explosion has resulted in an unmanageable amount of information being processed by the data warehouse. In addition to this data volume, multiple kinds of related activities such as data mining, predictive analysis and other historical trending queries have left IT feeling numb and cold rather than energized and happy with the task of managing this infrastructure. Another indirect effect is heavy demand on resources, both in hardware and IT administration (DBA, system administrator, network administrator roles), which often leaves other tasks incomplete from a holistic perspective.

We have so far outlined the various pain points, the issues and challenges that are faced in managing and maintaining a “high performance” data warehouse. In my next article, we will start looking at why the data warehouse appliance can alleviate a lot of this pain.

If you found this article helpful and would like to receive the latest insights each week from the experts featured on the Business Intelligence Network, please subscribe to the Bill Inmon Newsletter.


Recent articles by Krish Krishnan

Krish Krishnan - Krish is an expert in the strategy, architecture and implementation of high performance data warehousing solutions. He is a recognized data warehouse thought leader, writing and speaking at industry leading conferences, user groups and trade publications. He is a certified Bill Inmon professional and holds all major DBA certifications. In his 18 years of professional experience, he has been solving complex solution architecture problems spanning all aspects of data warehousing and business intelligence for Fortune 1000 clients. With his “get it done” approach, he has implemented data warehouse solutions ranging up to hundreds of terabyte data volumes and drives performance tuning into existing BI/DW investments to realize greater than 90% performance gains. Krish leads the Data Warehouse Appliance Expert Channel at BeyeNETWORK.com and is helping drive and mature the data warehouse appliance market. Krish also serves as Associate Vice President of DAMA Chicago. He is a BI Principal at Daugherty Business Solutions in Oak Brook, Illinois.

Editor's note: More Krish Krishnan articles, resources, news and events are available in the Business Intelligence Network's Krish Krishnan Channel. Be sure to visit today!

showing all