We often criticize users for defining their concepts poorly or not at all, and we may even get upset when different users within the same organization use different definitions for the same concept. But can we say with certainty that we do a good job with definitions in our own field? I am not so sure. It's more a case of the pot calling the kettle black. In the world of business intelligence and data warehousing, many concepts have been defined poorly or not at all, including concepts we use daily. Not surprisingly, this regularly leads to confusing discussions.
A good definition of a concept satisfies several requirements; one of them is reversibility. Suppose we have the following abstract definition: "A is text". Reversibility means that everything that satisfies the text is also an A. Take, for example, the concept of the African elephant (Loxodonta). A possible definition would go along the lines of "a big herbivore with a trunk, tusks, big ears, and big feet". So each animal satisfying these requirements is, by definition, an African elephant. Having only a trunk is not sufficient; it must have tusks, big ears, and big feet as well.
With a decent definition we want to include the correct concepts and exclude the wrong ones. For example, from the above definition of the African elephant we can conclude that the savannah elephant is indeed an African elephant. And by including big ears as a requirement, we rightly exclude the Asian elephant. By demanding that a concept's definition is reversible, we ensure that the wrong concepts are excluded.
Unfortunately, in our world not all definitions are reversible. Take as an example Bill Inmon's well-known and frequently used definition of a data warehouse: "A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data for management's decision making". This definition is not reversible. Suppose a user creates a spreadsheet containing customer data (subject-oriented) that has been brought together from different systems (integrated), that remains unchanged the entire time (non-volatile), and that contains historical data (time-variant). If, in addition, this spreadsheet has been developed to support decision making, then it satisfies all the requirements of the definition. Ergo, this spreadsheet is a data warehouse. In fact, many of the data marts that have been built would also satisfy this definition. However, I don't think this is what Inmon intended. In short, the definition is too "wide".
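To make the reversibility test concrete, a definition can be read as a predicate: something counts as an A exactly when the predicate holds. The sketch below is a minimal illustration in Python, assuming the properties in Inmon's definition can be reduced to simple yes/no flags; the class `DataCollection` and the function `satisfies_inmon_definition` are hypothetical names introduced only for this example. Encoding the user's spreadsheet shows that it passes the test, which is exactly the problem.

```python
from dataclasses import dataclass


@dataclass
class DataCollection:
    """Hypothetical record of the properties named in Inmon's definition.

    Assumption: each property can be judged as a simple True/False flag.
    """
    name: str
    subject_oriented: bool
    integrated: bool
    non_volatile: bool
    time_variant: bool
    supports_decision_making: bool


def satisfies_inmon_definition(c: DataCollection) -> bool:
    # Read the definition as a conjunction: every listed property must hold.
    return (
        c.subject_oriented
        and c.integrated
        and c.non_volatile
        and c.time_variant
        and c.supports_decision_making
    )


# The user's spreadsheet from the example
spreadsheet = DataCollection(
    name="customer spreadsheet",
    subject_oriented=True,          # contains customer data
    integrated=True,                # brought together from different systems
    non_volatile=True,              # remains unchanged the entire time
    time_variant=True,              # contains historical data
    supports_decision_making=True,  # built to support decision making
)

print(satisfies_inmon_definition(spreadsheet))  # True
```

A reversible definition would need at least one additional requirement that the spreadsheet (and a typical data mart) fails, so that the predicate returns False for them while still returning True for the systems the definition is meant to describe.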
Note that the definition of data warehouse is not the only one that fails the reversibility test. The same applies to the definitions of many other popular concepts.
Posted December 15, 2010 6:25 AM