Archimedes would probably have considered the ETL program for each target table as a mirror shield, and thus divided the available program developers into small developer teams. Each of these teams would have been assigned to produce a small number of completely functioning ETL programs, one for each target table, according to a succinct functional specification: E, T and L, together with a tiny, pragmatic footnote: "White cat, black cat, whichever catches mice is a good cat."
Henry Ford (1863 – 1947) might have taken another way, the way of his automobile production, and applied his assembly line approach, as shown in Figure 2. He would have divided the entire task of ETL into smaller subtasks, like E, T and L, considered these as work step types, and assigned each of these step types to a single developer team. In other words, each developer team would have treated only one small member of the entire work step chain, but for all target tables.
In short, Archimedes would have partitioned the task horizontally, while Ford would have done this vertically.
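The two ways of dividing the work can be made concrete with a small sketch. It treats every combination of target table and work step as one unit of work and then groups the units either by table (Archimedes) or by step type (Ford). The table names, the team names and the team size of two tables are purely illustrative assumptions and do not come from any real project.

```python
from itertools import product

# Hypothetical target tables; the three step types are the ones named in the text.
TABLES = ["customer", "orders", "product", "invoice"]
STEPS = ["E", "T", "L"]

# Every unit of work is one step for one target table.
WORK_UNITS = list(product(TABLES, STEPS))


def assign_by_table(tables_per_team=2):
    """Archimedes-style: a team owns all steps for a few target tables."""
    teams = {}
    for table, step in WORK_UNITS:
        team = f"team_{TABLES.index(table) // tables_per_team + 1}"
        teams.setdefault(team, []).append((table, step))
    return teams


def assign_by_step():
    """Ford-style: a team owns one step type for all target tables."""
    teams = {}
    for table, step in WORK_UNITS:
        teams.setdefault(f"team_{step}", []).append((table, step))
    return teams


if __name__ == "__main__":
    for name, plan in [("by table", assign_by_table()), ("by step", assign_by_step())]:
        print(name)
        for team, units in plan.items():
            print("  ", team, units)
```

In the first plan every team sees the whole E, T and L chain, but only for its own tables; in the second, every team sees all tables, but only one link of the chain.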
Each of Archimedes’ program developers would have had an overview of their own ETL programs and would have been a generalist. Moreover, they would have had fun with their work, since the programs are not required to be identical and the succinct design specification can, and inevitably will, be interpreted individually, creatively and thus differently. As a consequence, the program production would have been expensive, time-consuming and, thus, ineffective; and the developers could not have been easily exchanged between the teams because the programs of each team would have looked quite different.
Because all the programs produced within a team would have been almost identical, Ford’s developers would have become highly trained specialists in their small assigned tasks. As a consequence, they would have delivered high quality results efficiently, i.e., they could have worked extremely effectively. On the other hand, they would not really have been aware of the big picture. Therefore, their work would have been monotonous and boring.
Archimedes was ingenious at designing his weapons and is regarded as one of the greatest engineering designers of all time. In principle, however, he was merely a great artisan, similar to Leonardo da Vinci (1452 – 1519). In contrast, Ford was smart not only at designing automobiles, but also at designing the whole production system for automobiles. He was a great meta-designer, who began the industrial epoch of the assembly line. Archimedes had fun, whereas Ford had effectiveness, i.e., high productivity with high quality and low costs.
For real-life data warehouses, the ETL mechanism is not as straightforward as the scenarios above suggest. We have identified more than two dozen functional task types for the ETL mechanism. Some of them are fairly sophisticated and challenging. Moreover, their applicability depends strongly on the data source situation and on the requirements for the data warehouse under consideration, which can be complex and challenging as well. Even for a single task type there may be several variants.
Working mostly in Archimedes’ manner over the past three decades, we had lots of fun constructing data warehouses, thanks to the freedom of interpretation. Quite frequently, however, our fun turned into a Mary Shelley nightmare. In other words, Archimedes’ way has proven to be the wrong one for us. Ford did not grant his employees any freedom of interpretation. His approach is based on the simple observation that the work assigned to each team can be carried out in an absolutely identical way every time. This is also the major design principle of his assembly lines. However, ETL programs are not identical, although they can be fairly similar. In short, Ford’s way is not the right one for us either. Does an effective and applicable way exist for us? In my next article, I will analyze what actually happens when we do ETL and give an answer to this question.
Gong Cheng Shi Constructing … Houses
There is some noteworthy "big data" to consider regarding the latest advancements in construction techniques by Chinese engineers (Gong Cheng Shi). Do you see any similarity between Ford's assembly line technique and the construction technique that Gong Cheng Shi applied to build a 30-story hotel in just 15 days (Figure 3)? Can we construct data warehouses at a similar velocity and with comparable quality?
Figure 3: The 30-story Ark Hotel in Changsha, China, was built in just 15 days