<-- Back to full color view

Applying Agile Methods to Data Warehouse Projects

Originally published January 25, 2012

Rapidly gaining in popularity, the Agile approach to data warehousing solves many of the thorny problems typically associated with data warehouse development – most notably high costs, low user adoption, ever-changing business requirements and the inability to rapidly adapt as business conditions change.

The Agile approach can be used to develop any analytical database, so let’s begin with two familiar definitions:

  • A data warehouse (DW) is simply a database that contains integrated and homogenized information from one or more sources brought together to support analysis and reporting. These sources can be your internal online transactional processing (OLTP) systems such as finance, accounting, sales, marketing, payroll, supply chain, etc., or external sources such as supplier files, purchased marketing lists, Facebook, Twitter or census data, etc. In addition to the data warehouse, you may also be using additional types of databases for analysis and reporting. The most common types include data marts and operational data stores (ODSs).

  • Business intelligence (BI) refers to the different mechanisms used to leverage and interact with the data stored in the databases. Types of BI applications include query, reporting, analysis, dashboards, statistics, predictive models, key performance indicators (KPIs), etc.
These two concepts are depicted in the following diagram.


This article focuses on applying Agile methods to the creation of the databases. In order to simplify the discussion, I will use the generic term “analytical database” to refer to all types of data stores – data warehouse, data mart, operational data store, etc.

Why Agile?

There are several key reasons why Agile methods are well-suited to building analytical databases. Among them are:
  • Business-driven vs. data-driven development

  • Reduced risk and complexity
Agile Methods Take a Business-Driven Approach. Simply stated, creating analytical databases is complex, time-consuming and oftentimes overly expensive, especially when traditional data-driven methods are used. A fundamental truth of business intelligence and data warehousing is that data integration and homogenization account for 70% to 80% of the project budget and an even higher percentage of the risk.

Let’s take a simple example. Assume your organization has four internal OLTP systems and one external data source. On average, each system has 30 database tables and each table contains 30 columns. This means that:

(4 OLTP systems + 1 external system) x 30 tables x 30 columns = 4,500 data elements!

In a data-driven approach, it’s not uncommon to want to integrate and homogenize most, if not all, of the data before the first query or report can be written. This means that 70% to 80% of the project budget will be expended before any business value can be realized. Similarly, integrating thousands of fields can take upwards of 12 months to complete. What this means is that your goal should be to minimize the amount of effort associated with data integration and homogenization.

In an Agile delivery model, only data needed to answer specific business questions or to solve specific business problems is sourced (these need statements are captured in a series of business “stories”). So instead of first trying to “boil the ocean” via a massive data integration effort, Agile practitioners work with the business community to define the hundred or so data elements that drive performance. This means that the business will be in a position to receive value much more quickly – in weeks or months rather than quarters or years.


Agile Methods Reduce Risk and Produce Systems with High Adoption Rates.
Organizations that apply traditional waterfall methods to BI/DW projects accrue unnecessary risk and may find out what they’ve created does not satisfy the business’s needs. Waterfall methods mean it’s all or nothing. In other words, design cannot begin until all of the requirements are defined, and coding cannot begin until design is complete. This means that the project takes on ever-increasing levels of risk and that business value is delivered at the end of the project.

It’s not unusual to find that once an analytical database has been deployed using a data-driven and waterfall approach, it suffers from low adoption and usage. The primary culprit seems to be that business needs and priorities will have shifted between the time the requirements were originally defined and when the analytical database was deployed. For the most part, the delivery team’s efforts are seen as a black hole. What ends up being delivered is based on assumptions and interpretations of the requirements and may not contain the information the business really needs. Reality hits when that first report is written, oftentimes making the data integration effort moot.

When Agile methods are applied, value can be shown on a recurring basis. The key tasks of database design – data quality remediation, and data integration and homogenization – are broken into short, time-boxed and scope-boxed delivery cycles, or “sprints,” that generally last two to four weeks each. These data-focused tasks are paired with prototyping in the BI layer, allowing the business to interact with the data multiple times, helping to assure that the analytical database truly contains useful information.

The use of delivery sprints keeps business value at the forefront and drives project risk down to a minimum. At the end of each two- to four-week sprint cycle, the delivery team is required to demonstrate what they have produced, making their work much more visible to the business and allowing for midcourse corrections (another tenet of Agile is to “fail quickly”).


Agile methods can easily be applied to the building of analytical databases – data warehouses, data marts, operational data stores, etc. When the work associated with designing the databases and integrating the data is driven from the business side, project costs can be held to a minimum and business value can be delivered more quickly. By organizing the work into short sprint cycles, everyone can be assured that the expensive and risky part of the project – data integration and homogenization – will surely meet the business’s ever-increasing appetite for impactful and actionable information rather than suffer from misinterpreted requirements and low business adoption.

SOURCE: Applying Agile Methods to Data Warehouse Projects

  • Jim GalloJim Gallo
    Jim is  National Director, Vice President, Business Analytics at Information Control Corporation (ICC), a firm focused on reducing the cost of developing BI solutions. Jim and the ICC team have enabled companies to increase the velocity of their development by adopting Agile methods for BI, with a particular focus on turning theory into reality.

    Jim is a recognized expert and has published a number of articles on the practical realities of business intelligence and data warehousing. He is a regular speaker at conferences and industry-related events. He has led a number of large, complex BI projects for Fortune 1000 companies in addition to delivering value to federal and state governments and international clients. For two years in a row, Jim has been named IBM Champion for Information Integration and Federation. He can be reached at jgallo@iccohio.com.

    Editor's Note: More articles and resources are available in Jim's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

Recent articles by Jim Gallo



Want to post a comment? Login or become a member today!

Be the first to comment!


Copyright 2004 — 2020. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC