Can You Do Agile Data Integration?

Originally published March 28, 2011

It’s the debate that will not die: Can you or can you not deliver integration services for a data warehousing/business intelligence project using an Agile method? Those of us advocating Agile data warehousing keep running into well-credentialed and extremely experienced data warehouse (DW) and business intelligence (BI) professionals who insist that anyone who attempts Agile data warehousing is going to fail, no doubts about it.

We encounter these naysayers on discussion boards and speaking at DW/BI conventions. They have authored pamphlets and white papers published by our industry’s preeminent professional organizations. There’s only one problem with the adamant, negative message they offer: The rest of us have been successfully building data integration applications using Agile methods for over ten years now. I’ve achieved four-to-one increases in delivery speeds after introducing Agile methods into Fortune 500 shops, with quality defect rates dropping to zero. I know other DW/BI service companies in the U.S. that are so productive with Agile DW/BI methods that they can provide fixed bid contracts for data warehouse projects and effectively compete with offshore pricing. With this backdrop, how can anyone insist that you can’t perform Agile data integration?

Where’s the Beef?

It’s surprising how heated this particular debate can get – to the point you’d think we were discussing politics or religion. Usually it’s the proponents of Agile data integration who finally leave in frustration. “I could build an entire Agile data warehouse in half the time it would take to convince these enterprise data warehousing people that it’s possible,” an Agile advocate once explained.

Recently, I doggedly followed one particularly acrimonious exchange, determined to discover what lay behind the anxiety of the naysayers. The light went on only when one of the esteemed detractors wrote, “All of you Agile hippies are just doing iterative delivery, and I can believe that works. BUT ITERATIVE IS NOT AGILE!!!”

So, it isn’t the iterative or incremental delivery paradigms that have the naysayers so concerned – it’s the extra something that makes iterative methods “agile” that they can’t believe in.

What Is Agile?

Perhaps this is just a personal bias, but if you encounter someone who can rattle off without hesitation, “Agile is this, Agile is that,” I think you should be wary of them. At my last count, there were over a dozen “Agile” methods, and the differences between them are considerable. Some, like Scrum, are time-boxed and others, such as Kanban, are buffer-based. The Dynamic Systems Development Method (DSDM) and extreme programming (XP) discover the extent of the desired system as delivered components accumulate, whereas the Rational Unified Process (RUP) strives to scope an entire project before coding begins. Forget about apples and oranges...this range of variation between Agile methods is more like apples and zebras.

So, the first step in working with Agile naysayers is to ask them which Agile method they worked with that failed so miserably. Often, this question leads to the discovery that the Agile detractors have never actually used Agile on a DW/BI project...they’re speaking from theory after reading a book or attending a two-hour introduction at some conference.

That observation aside, I still want to discover the vital issue driving the naysayers’ pessimism. To get at it, we need to provide a working definition of “Agile,” at least for this article. In my experience, the most commonly employed Agile frameworks tend to be a set of attitudes, behaviors, principles, practices, tools, and techniques that emphasize close customer collaboration, quick delivery of increments in value, frequent reviews of project performance, innovation, and self-organized teams.

Heartburn Over Close Customer Collaboration

In the great debate over Agile data integration, naysayers say that the difference between “iterative” and “Agile” is the quick delivery and the close customer collaboration aspects of the common Agile methods. If I were to distill all the heat I’ve witnessed over this point, I would paraphrase their concerns like this: “You can’t have rapid, customer-driven delivery of new data warehouse features. It’s just too much work to take data all the way from source, through integration, and onto a dashboard quickly. And if you let the customer drive the process, they will change their direction so frequently you’ll never be able to build out the layers of data integration needed to form a comprehensive and complete enterprise data warehouse.”

I certainly agree that trying to rush new data all the way from source to dashboard in the span of a few weeks – before the customers change their priorities – would be a recipe for disaster. But isn’t this a misinterpretation of the Agile approach? Who says you have to go all the way from source to dashboard in one short time span? Remember, the mainstream Agile methods emphasize quick increments of value. Nothing in Agile says you can’t structure those increments in a way that makes sense.

Furthermore, who says that customers are going to be some child-like creatures constantly chasing after the next shiny object they see?  Such situations do exist, but they do not have to persist. If we’re leaders on these data warehouse projects, isn’t it up to us to properly charter, organize, and manage the DW/BI program so that there are as few major changes in requirements as possible? In my experience, those whose Agile DW/BI projects disappoint have failed to exert the leadership required to make the project succeed, and without such leadership it would have failed under any method, Agile or waterfall.

Of the many ingredients that my colleagues and I draw upon to guide Agile data warehousing projects to success, those that best address this risk of collaborating with an erratic business partner are light-weight requirements management, user epic decomposition schemes, pipelined delivery squads, two-tiered demo data sets, and proxy product owners. Let me sketch just the first two of these in order to provide a flavor of how they address the concerns of the Agile data integration detractors. Interested readers can find descriptions of the remaining techniques in our Agile Data Warehousing books at our website, www.ceregenics.com.

Light-Weight Requirements Management

I think it’s safe to agree that if we let the business dictate a rapid series of left and right turns in order to race after every squirrel that crosses the road, we’ll all end up in a ditch. Luckily, Agile lets us innovate, so my teams often borrow a streamlined version of requirements management from the world of traditional IT to ever so slightly dampen our customer’s churn on the requests they make of the development teams.

Our version of requirements management revolves around five stripped-down artifacts that form a progressive elaboration of requirements, starting with business requirements, progressing through solution requirements, and ending with technical requirements. The first three of these artifacts are designed specifically to dampen product-owner churn on requirements:

  • Business Concept Brief – half a page in which the business describes how it is going to make money with the help of data warehousing and business intelligence.

  • Stakeholder Requests – half a page each in which business partners describe what is wrong with their current DW/BI platform and how they would fix it.

  • Vision Document – five pages of consolidated problem statements with a list of major features planned, plus one diagram for each of the target systems proposed and their sources, the presentation-layer data model, and the high-level extract, transform, and load (ETL) process flow. Here, data warehousing and business intelligence describe the solution in enough detail to give a rough, order-of-magnitude estimate of cost and duration, allowing the business to decide whether to pursue the project. The vision document also lists the major stakeholders with interest in the features and performance of the DW/BI solution.
(Above descriptions taken from author’s forthcoming book, Agile Data Warehousing, Volume 2.)

All of these documents are short, but do convert the business’ requests into words so that the developers have recourse when the customer, the “product owner,” arrives with a wild request for some new feature. For example, if the team is building a billing data mart and the product owner announces that the next iteration will deliver a sales margin report, the developers can draw upon the project’s short requirements management documents and say, “Time out. Margins require cost data. This five-page vision document lists only revenue data sources. Your new request is both valuable and way out of the scope. Shouldn’t we touch base with all the stakeholders listed in the vision document before we start chasing after a whole new class of data?” That’s just enough dampening to eliminate changes in direction based upon your product owner’s whim, leaving only those driven by changes in the business, to which we definitely want to respond.
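
The "time out" check in the billing example can be pictured as a simple set comparison between the data a request needs and the data sources the vision document lists. The following is only a minimal sketch under assumptions: the class name, the function name, and the stakeholder labels are hypothetical and do not come from the Agile Data Warehousing books.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionDocument:
    """Hypothetical stand-in for the five-page vision document."""
    data_sources: frozenset   # classes of source data the project has scoped
    stakeholders: tuple       # parties to consult before changing scope


def out_of_scope(required_data, vision):
    """Return the data classes a request needs that the vision document lacks."""
    return sorted(set(required_data) - vision.data_sources)


# The billing data mart from the example lists only revenue sources.
# The stakeholder names are illustrative assumptions.
vision = VisionDocument(
    data_sources=frozenset({"revenue"}),
    stakeholders=("billing", "finance"),
)

# A sales margin report needs cost data as well as revenue data.
missing = out_of_scope({"revenue", "cost"}, vision)
if missing:
    print("Out of scope:", ", ".join(missing))
    print("Consult before proceeding:", ", ".join(vision.stakeholders))
```

The point of the sketch is that the documents only need to be precise enough to make the gap visible; the conversation with the stakeholders still decides whether the scope should change.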

User-Epic Decomposition Schemes

Learning from experience, I realized long ago that when my product owners make wild requests for huge features, this shows I have not taken the time to explain how data warehouses are built. The fact is we must teach our product owners a little about data warehousing, just as they must teach us a little about their business before we can envision and build an effective DW/BI solution together.

As described in our Agile Data Warehousing book, we employ “user epic decomposition schemes” to communicate the layers and components of DW/BI solutions so that our product owners can move past making huge, inactionable requests for new analytic capabilities. Typically there’s a decomposition scheme for both the back-end and front-end portions of the warehouse.

A back-end scheme might have four dimensions: architectural layer, transformation goal, refresh type, and refresh frequency. Each of these dimensions depicts a series of increasing sophistication for the DW/BI application. For example, the architectural layer dimension starts with staging, progresses through data quality and integration layers, and then arrives at enterprise and departmental dimensional layers. The transformation goal dimension starts with simple replication of columns, but progresses through aggregations, and finishes with complex business rules.

Once oriented to this back-end decomposition scheme, our product owners no longer make huge requests such as “I want to understand why we’re not billing for every product our service trucks have provisioned for customers.” Instead, they begin pinpointing where each of their requests falls along the dimensions of our shared decomposition scheme. For example, they might state, “I know we completed data quality for the western region billing system customers during the last iteration. With this iteration I would like to take that scrubbed data and add it to the customer list we’ve already got started in the warehouse. Forget about any business rules for figuring out what market segment each customer falls into. For this iteration, let’s just get them added to existing data without creating any duplicate records.”
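
To make the idea of "pinpointing a request along the dimensions" concrete, here is a small sketch that treats each dimension as an ordered list of steps. It is an illustration under assumptions, not the scheme from the book: the identifiers are hypothetical, only the two dimensions whose steps are spelled out above are modeled, and reading the product owner's request as an integration-layer, column-replication story is an interpretation.

```python
# Ordered steps for the two dimensions described in the text.
# Refresh type and refresh frequency are omitted because their
# steps are not listed in this article.
BACK_END_SCHEME = {
    "architectural_layer": [
        "staging",
        "data_quality",
        "integration",
        "enterprise_dimensional",
        "departmental_dimensional",
    ],
    "transformation_goal": [
        "column_replication",
        "aggregation",
        "complex_business_rules",
    ],
}


def locate(story):
    """Return the step index of a story along each dimension."""
    position = {}
    for dimension, steps in BACK_END_SCHEME.items():
        value = story[dimension]
        if value not in steps:
            raise ValueError(f"{value!r} is not a step of {dimension}")
        position[dimension] = steps.index(value)
    return position


def step_change(completed, requested):
    """Return how many steps the request advances along each dimension."""
    done, wanted = locate(completed), locate(requested)
    return {dimension: wanted[dimension] - done[dimension] for dimension in done}


# Last iteration: data quality finished for the western region customers.
completed = {
    "architectural_layer": "data_quality",
    "transformation_goal": "column_replication",
}
# This iteration: merge the scrubbed data into the warehouse, no business rules.
requested = {
    "architectural_layer": "integration",
    "transformation_goal": "column_replication",
}

print(step_change(completed, requested))
```

A request that moves one step along a single dimension is small enough to fit an iteration; a request that jumps several steps on several dimensions at once is the kind of sweeping demand the scheme is meant to expose.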

Our front-end decomposition schemes are similar to the back-end schemes, but have replaced a couple of the dimensions with a continuum of BI application features (such as menu systems, list of values for query choices, and multi-object dashboards) and application accessibility (such as on-demand on your workstation versus scheduled refreshes once posted to a departmental server).

I don’t want to imply that the decomposition schemes that we presented in our book will be the precise solution your team should use on your project. Instead, I’m suggesting that if you want to reduce requirements churn, you will need to collaborate with your business partners on compiling a decomposition scheme similar to those in Agile Data Warehousing.

With such decomposition schemes in place, you will have educated your business partners on the many steps required to deliver new data into the data warehouse. They can no longer rationally make sweeping demands because they can now visualize the physics behind data warehousing. They will be able to see for themselves that any capricious requirement churn on their part will cause a large chain of half-completed development work to be abandoned. Erratic changes in direction will now appear costly, and will occur only when truly dictated by business needs.


So, have we resolved any of the rancorous debate with our esteemed colleagues who claim, “You cannot deliver data integration using Agile methods”? I agree that trying to deliver full slices of data integration at breakneck speed before customers change their minds would be darn near impossible. But that scenario is a misreading of Agile, which says “collaborate with the business,” not “let them drive you insane.” We can structure that collaboration by asserting the best ways for the business to work with data warehousing and business intelligence. To wit, we can introduce a dollop of requirements management to slow down the pace of our customers’ change requests and we can provide them with an epic decomposition scheme that will cause them to walk through the data integration journey with us, one step at a time.

So, if we find ourselves jumping madly through every hoop the business holds before us, we cannot blame Agile for being incapable of handling data integration. Keeping up with such demands would be even harder under waterfall, where our response time is measured in years rather than Agile’s weeks. Instead, we’d need to admit we haven’t provided the leadership necessary to make the project doable.

The Agile data warehousing community has dozens of techniques, besides the two we’ve examined here, that instill a touch of discipline into the iterative development process so that it becomes data-integration friendly. Data integration certainly needs all the acceleration it can find, given the increasing global competition and shrinking retail product lifecycles in the world today. In the end, the real question is not “Can you do data integration with Agile methods?” but rather “Every aspect of my company’s data warehousing program needs to go Agile to survive, so what must I do to succeed?”

For more ideas on Agile data integration and information on our Agile Data Warehousing books, please visit the Ceregenics website. 

  • Ralph Hughes
    Ralph is the author of the first book on Agile Data Warehousing and currently serves as Chief Systems Architect for Ceregenics, Inc. A faculty member for The Data Warehouse Institute, he teaches Agile methods to hundreds of data warehousing and business intelligence practitioners each year and has converted enterprise data warehousing departments around the globe to Agile methods and tools.  He holds a master’s degree from Stanford University in computer modeling and econometric forecasting.  A certified Scrum Master (CSM) and a PMI Project Management Professional (PMP), he has been developing data warehousing projects since the early 1980s.


Posted May 27, 2011 by Ron Tijhaar

I’d like to add in another suggestion for the divide between proponents of traditional data warehousing and proponents of agile data warehousing. In my opinion the big challenge of building data warehouses is not about method or technique, not even about the project management or managerial techniques being used, but predominantly in organizing business support. In a fairly big company with a matrix organization it is pretty impossible to achieve anything in an agile manner, let alone building the corporate data warehouse. And that’s the trouble with method gurus, they believe in the one size fits all mantra or rather the one method fits all version of it. Sure, in a separated corner of that business you will be able to build a data warehouse the agile way. Just don’t involve all business parties and don’t bother too much about enterprise wide data integration. Of course, if that’s what agile is about, agile means the towel is in the ring before the match even started.


Data warehouse methods rarely address the real business support issue. Sure, on paper they do contain the disclaimers about business involvement being an essential prerequisite. The problem is that data warehousing methods don’t offer any guidance on business politics. Building an enterprise data warehouse requires involvement and support of multiple business parties which makes it a rather dynamical thing. There simply is no guarantee from any method that all business parties will give their support all the way down the line. Agility does not make all things malleable.

Posted April 5, 2011 by Anonymous

It usually does come down to precise understanding of the terms. And you're right: a good manager/team could succeed with waterfall or agile. One comment/question: I notice that your decomposition of epics *seems* like fairly technical stories. Scrum advocates user stories that are laser-cuts through the entire technical stack. Business value, yet no technical mocking. I'll hope to see more of what you mean in your book.
