Business Intelligence Network business intelligence resources

Blog: Claudia Imhoff

« Who Should the CIO Report to? | Main | AMD Goes After Intel on Antitrust Charges »

BI Schedule Busters; Mitigating ETL Surprises

There is one task in virtually every BI project that seems to cause the most surprises and to be the biggest schedule buster -- that is estimating how long the ETL or data integration process will take. While the reasons for this are numerous, it seems that a lack of business understanding leads the pack.

My friend, James Schardt, Chief Technologist for Advanced Concepts Center, sent me a suggestion for how he mitigates this serious problem in his DW projects.

From James Schardt:

"Here's the situation: A data model has been developed and the appropriate physical schemas for the DW have been designed. (This may or may not have been done using gathered documented requirements for the DW). The design team has some idea of which source system data records will be used to populate the DW. Now the design team must develop a set of transformations that must properly clean, transform, and/or merge the data into the right location in the DW. The team discovers they simply do not have an understanding of the business to correctly specify many of the more complex transformations. They make assumptions. They get it wrong. The users are not happy with what they get.

When I talk about gathering and documenting requirements for the DW, I mention the "Reverse Engineering" technique. During the Requirements and Analysis phases of DW development, have a couple of developers reverse engineer the important source systems. The result is a conceptual level ER or UML class diagram that forms the basis of bottom-up domain modeling. (This is then combined with top-down domain modeling to form the basis for subsequent data warehouse and ODS design).

While the development team is reverse engineering source systems, they will uncover problems with the data and the structure of the data. When they combine multiple reverse engineered models into an integrated bottom-up domain model, they will uncover many other problems with the operational data sources. But, these problems are discovered during analysis and before starting the design. The DW team has an opportunity to resolve these issues with users before they begin design work. Users have input and more buy-in to what the DW will provide them. We get the users to specify their requirements for clean and integrated data using business rules that make sense to them.

By performing reverse engineering early in the process the developers should be able to perform the ETL development without a lot of surprises."

Please add your comments to James' suggestion or make your own suggestions to this universal BI project schedule killer.

Yours in BI success!

Claudia

  Posted by Claudia Imhoff on June 24, 2005 1:29 PM |

Comments

Admittedly I have a tainted view on requirement gathering. Almost always when I ask a data warehouse designer how do they gather requirements, there is the inevitable answer "Ask the users"; ask them what? These SMEs have no and do not want to have any knowledge how the CIF works or will work. But we think they will give us some magic potion that will spell out requirements to us.

I am primarily involved in forensic data, I am a lawyer with substantial trial back ground but realize that what Management wants is to make decisions that comply with law. So what I want to know is what those decisions are.

We use reverse engineering as James suggests and other methods but user acceptance is best assured if the system supports the decisions the manager must make. Knowing and documenting those decisions is the first step to a project without surprises.

Hi Claudia,

I am concerned about the entire notion of DW in the industry, and it's thoughts as it pertains to ETL.

I think Compliance is a serious issue, and I believe those in the DW community have taken it too lightly. Furthermore, I believe that cleansing, changing, and migrating data definately makes it "wrong" according to the business requirements, and often breaks ultimate business initiatives.

Yes, there's something to be said about quality, and clean data being presented to the end-users from which a strategic decision is to be made, but I believe from a compliance aspect when we implement business rules on the way IN to a data warehouse, we lose much of the compliance we are after - not to mention it's impact on the engineering of the data model.

The data models have indeed divested themselves away from the business purposes, hence the problems of "meeting the business needs" and storing the raw data necessary to pass audits.

What I think needs to happen is two fold:
1. Businesses need to decide how and what needs to be "keyed" - most of the time, this is already done, by the way they identify information uniquely across their business.
2. IT needs to construct a DW model that reflects these grains of data across the enterprise, and brings the lowest level of detail possible (good, bad or indifferent) across into the EDW.
3. The business rules ("cleansing, quality, cleanup, merging, changing") should be implemented going FROM the EDW into the Data Marts.

Following these three rules allows businesses to provide traceability, enable Real-Time loads to their EDW, manage alerts on "bad data" affecting the business processes, and keep consistent quality data in place in the Data Marts used for strategic decisions. The data model that I'm suggesting is the Data Vault Architecture.

Thoughts? Thanks,
Dan Linstedt

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)