ETL: Extract, Transform and Load for Data Warehousing
by Bill Inmon
Originally published January 10, 2008
Once upon a time in the not so distant past, there was no ETL (extract, transform and load) software. If you wanted to build a data warehouse, you had to write code in order to get data from one source to the appropriate target. There was lots of code – lots of repetitive code.
After you wrote your code, you had to maintain it. Every time a legacy system changed, you did manual maintenance to your code. Every time a target definition changed, you had to do manual maintenance to your code. Every time an end user wanted something new, you had to do maintenance to your code.
Keeping up with transformation coding was – in a word – a real drag in the early days of data warehousing. The code needed for transformation and loading was a real problem because everything had to be done manually.
Then the world was visited by ETL. Prism Solutions led the way, followed by Informatica, DataStage and a host of other ETL technologies.
When data warehousing and ETL first appeared on the scene, the coders of the world felt threatened. In case after case, the coders of the world went and found their most complicated, most arcane, most convoluted program that was needed for transformation and threw that program at the ETL vendor. When the ETL technology threw up, the programmers said – “See, we can’t use ETL here – it can’t handle the XYZ program.”
Fast forward to today, and there is ETL everywhere. It is found in almost every shop that has a data warehouse. No one thinks twice about bringing in ETL technology. The drudgery of writing and maintaining transformation code has mercifully been shifted to automation.
What happened when programmers were protecting their own turf? The programmers were selecting the most impossible example of code and transformation to use as a basis for selecting or not selecting ETL processing. Behind the one difficult program are oodles of easy and much more normal transformations. Selecting the most difficult program in the company as a basis for transformations is like selecting Yao Ming as a representative of Chinese people. Yao Ming plays in the NBA and is 7'5" tall. To draw the conclusion that most Chinese are 7'5" tall is to make a serious error in judgment because even though Yao Ming is Chinese and is tall, not all Chinese are nearly that tall. For that matter, neither are the Americans, the Russians, the English or anyone else. But that is exactly what the early programmers did in order to keep ETL out of the shop.
Today the pendulum has swung in a different direction. Today we say that all programs must be managed under an ETL tool. However, that assumption is as invalid as saying that all transformations are terribly difficult. Most transformations are pretty mundane.
A good middle ground is to say that most programs – say 90% to 95% – are to be written under ETL. But there is always going to be approximately 5% of transformation programs that are not going to fit comfortably under the ETL technology aegis.
However, if you can write 95% of your programs under an ETL tool, you are getting your money’s worth.
Recent articles by Bill Inmon
Copyright 2004 — 2019. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC