Blog: Claudia Imhoff

Welcome to my blog.

This is another way for me to communicate, educate, and participate in the Business Intelligence industry. It is a perfect forum for airing opinions, thoughts, vendor and client updates, problems, and questions. To maximize the blog's value, it must be a participative venue, so I look forward to hearing from you often; your input is vital to the blog's success. All I ask is that you treat me, the blog, and everyone who uses it with respect.

So...check it out every week to see what is new and exciting in our ever-changing BI world.

About the author

A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures that support these initiatives. Dr. Imhoff has co-authored five books on these subjects and has written more than 150 articles for technical and business magazines.

She is also the Founder of the Boulder BI Brain Trust, a consortium of independent analysts and consultants (www.BBBT.us). You can follow them on Twitter at #BBBT.

Editor's Note:
More articles and resources are available in Claudia's BeyeNETWORK Expert Channel.

One task in virtually every BI project seems to cause the most surprises and to be the biggest schedule buster: estimating how long the ETL or data integration process will take. The reasons for this are numerous, but a lack of business understanding leads the pack.

My friend, James Schardt, Chief Technologist for Advanced Concepts Center, sent me a suggestion for how he mitigates this serious problem in his DW projects.

From James Schardt:

"Here's the situation: A data model has been developed and the appropriate physical schemas for the DW have been designed. (This may or may not have been done using gathered documented requirements for the DW). The design team has some idea of which source system data records will be used to populate the DW. Now the design team must develop a set of transformations that must properly clean, transform, and/or merge the data into the right location in the DW. The team discovers they simply do not have an understanding of the business to correctly specify many of the more complex transformations. They make assumptions. They get it wrong. The users are not happy with what they get.

When I talk about gathering and documenting requirements for the DW, I mention the "Reverse Engineering" technique. During the Requirements and Analysis phases of DW development, have a couple of developers reverse engineer the important source systems. The result is a conceptual-level ER or UML class diagram that forms the basis of bottom-up domain modeling. (This is then combined with top-down domain modeling to form the basis for subsequent data warehouse and ODS design.)

While the development team is reverse engineering the source systems, they will uncover problems with the data and with the structure of the data. When they combine multiple reverse-engineered models into an integrated bottom-up domain model, they will uncover many other problems with the operational data sources. But these problems are discovered during analysis, before design starts, so the DW team has an opportunity to resolve them with users before design work begins. Users have input and more buy-in to what the DW will provide them. We get the users to specify their requirements for clean, integrated data using business rules that make sense to them.

By performing reverse engineering early in the process, the developers should be able to carry out ETL development without a lot of surprises."
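
James's reverse-engineering step can be partly automated. Below is a minimal sketch in Python using SQLAlchemy's schema inspector (my choice of tooling, not James's; the connection string and database are placeholders) that harvests tables, columns, and foreign keys from a source system as raw material for the bottom-up conceptual model he describes:

```python
# Sketch: harvest source-system metadata as raw material for a
# bottom-up conceptual model (input to an ER or UML class diagram).
# Assumes SQLAlchemy and a reachable source database; the connection
# string and schema here are placeholders, not a real system.
from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql://user:pass@source-host/orders_db")
insp = inspect(engine)

for table in insp.get_table_names():
    print(f"Entity: {table}")
    for col in insp.get_columns(table):
        nullable = "NULL" if col["nullable"] else "NOT NULL"
        print(f"  attribute: {col['name']} {col['type']} {nullable}")
    for fk in insp.get_foreign_keys(table):
        # Each foreign key suggests a relationship in the domain model.
        print(f"  relationship: {table}.{fk['constrained_columns']}"
              f" -> {fk['referred_table']}.{fk['referred_columns']}")
```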
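
Likewise, the data problems James says surface during this analysis can be flushed out with simple profiling checks before design begins. A minimal sketch, assuming pandas and hypothetical customer and order extracts:

```python
# Sketch: simple profiling checks that surface the kinds of problems
# James describes (nulls, orphaned keys, out-of-domain codes) during
# analysis, before design begins. File and column names are hypothetical.
import pandas as pd

customers = pd.read_csv("customers_extract.csv")
orders = pd.read_csv("orders_extract.csv")

# Null rates per column: a high rate often means a field the business
# believes is mandatory is not actually enforced in the source.
print(customers.isna().mean().sort_values(ascending=False))

# Orphaned foreign keys: orders pointing at customers that do not exist.
orphans = ~orders["customer_id"].isin(customers["customer_id"])
print(f"orphaned orders: {orphans.sum()}")

# Domain check: status codes outside the documented set need a business
# ruling before the corresponding transformation can be specified.
known_statuses = {"A", "I", "P"}
bad = customers.loc[~customers["status"].isin(known_statuses), "status"]
print(bad.value_counts())
```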

Please add your comments on James' suggestion, or offer your own remedies for this universal BI project schedule killer.

Yours in BI success!

Claudia


Posted June 24, 2005 1:29 PM

2 Comments

Admittedly, I have a tainted view of requirements gathering. Almost always, when I ask a data warehouse designer how they gather requirements, I get the inevitable answer, "Ask the users." Ask them what? These SMEs have no knowledge of how the CIF works or will work, and they do not want any. Yet we think they will hand us some magic potion that spells out the requirements for us.

I am primarily involved in forensic data. I am a lawyer with a substantial trial background, but I realize that what management wants is to make decisions that comply with the law. So what I want to know is what those decisions are.

We use reverse engineering as James suggests, along with other methods, but user acceptance is best assured when the system supports the decisions the manager must make. Knowing and documenting those decisions is the first step to a project without surprises.

Hi Claudia,

I am concerned about the entire notion of DW in the industry, and its thinking as it pertains to ETL.

I think Compliance is a serious issue, and I believe those in the DW community have taken it too lightly. Furthermore, I believe that cleansing, changing, and migrating data definitely makes it "wrong" according to the business requirements, and often breaks ultimate business initiatives.

Yes, there is something to be said for quality, and for presenting clean data to the end users who must make strategic decisions. But from a compliance standpoint, I believe that when we implement business rules on the way IN to a data warehouse, we lose much of the compliance we are after, not to mention the impact on the engineering of the data model.

The data models have indeed drifted away from the business purposes; hence the problems of "meeting the business needs" while storing the raw data necessary to pass audits.

What I think needs to happen is threefold:
1. Businesses need to decide how and what needs to be "keyed." Most of the time this is already done, in the way they identify information uniquely across their business.
2. IT needs to construct a DW model that reflects these grains of data across the enterprise and brings the lowest level of detail possible (good, bad, or indifferent) into the EDW.
3. The business rules ("cleansing, quality, cleanup, merging, changing") should be implemented going FROM the EDW into the data marts.

Following these three rules allows businesses to provide traceability, enable real-time loads to their EDW, manage alerts on "bad data" affecting the business processes, and keep consistent, quality data in the data marts used for strategic decisions. The data model I'm suggesting is the Data Vault architecture.

Thoughts? Thanks,
Dan Linstedt
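
For readers who haven't seen the pattern Dan describes, here is a minimal sketch of the load order he proposes (names and rules are hypothetical; this is a loose illustration, not Dan's code or a full Data Vault implementation): raw rows land in the EDW keyed by their business key, and cleansing runs only on the way out to a mart.

```python
# Sketch of the load order Dan describes: raw rows go INTO the EDW
# keyed by business key; business rules run only on the way OUT to the
# data marts. Names and rules are hypothetical; this is a loose
# illustration, not a full Data Vault implementation.
import hashlib
from datetime import datetime, timezone

def business_key_hash(*key_parts: str) -> str:
    """Stable surrogate derived from the business key (rule 1)."""
    return hashlib.md5("|".join(key_parts).encode()).hexdigest()

def load_edw(edw: dict, source_row: dict) -> None:
    """Rule 2: store the lowest-grain row as-is (good, bad, or
    indifferent) so audits can always trace back to the source."""
    hk = business_key_hash(source_row["customer_no"])
    edw.setdefault(hk, []).append(
        {"loaded_at": datetime.now(timezone.utc), "raw": dict(source_row)}
    )

def publish_to_mart(edw: dict) -> list:
    """Rule 3: cleansing, merging, and quality rules run here, FROM the
    EDW INTO the mart, so the EDW itself stays raw and auditable."""
    mart = []
    for hk, versions in edw.items():
        latest = max(versions, key=lambda v: v["loaded_at"])["raw"]
        mart.append({
            "customer_key": hk,
            # Example business rules applied downstream of the EDW:
            "name": latest["name"].strip().title(),
            "status": latest["status"] if latest["status"] in {"A", "I"} else "UNKNOWN",
        })
    return mart
```

The structural point is that the EDW keeps every raw version for traceability and compliance, while the rules Dan lists in step 3 are applied only when publishing to a mart.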
