Business Intelligence Network Business Intelligence Resource

Blog: Dan E. Linstedt


Averages and Outliers - Where's the REAL business BI?

My good friend Claudia recently blogged on the misuse of averages. I agree with her statements, particularly in light of what the "average" ignores in terms of the outliers in information. Too many times, averages are used to get a green or red colored "single point of light" (graph) on our executive dashboards.

Statements like "Our company is performing fine, we're in the green on our 'average graph'" can be extremely misleading. Warning: Opinionated statement in 3-2-1... Sometimes I wonder if the BI vendors in the industry are in business to sell software, or to actually make business better (there are those vendors who have real solutions, and those that just sell software with pretty dials).

Let's take a look at some of the facts about averages and averaging.

1. Averages hide outliers: a single number says nothing about the extremes behind it.
2. Very large data sets tend to produce clusters of outliers, which averages smooth out and remove.
3. If a VLDS (very large data set) is averaged, the really important details can be lost.
4. In a VLDS there are MORE needles in the haystack, not fewer (more gems in the rough).
5. Some of these needles are like gold, when you can find them. Averages hide these facts.
6. Producing more averages over smaller clusters (where the clusters are data points grouped by market-basket or neural-net mining) will produce a much better graph.
7. When was the last time you heard an executive say: "Yeah, I just made a 5 billion investment based on the average performance of the company over its customer base..."? Usually they make an investment for very specific reasons: the gold needle that will bring ROI fast.
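
To make points 1 and 6 concrete, here is a minimal Python sketch. The customer figures and the "retail" / "enterprise" segment labels are made up for illustration; in practice the grouping would come out of a mining step (market basket, neural net), which this sketch does not attempt. It only compares one overall average against an average per group.

```python
from statistics import mean

# Hypothetical data: (segment label, monthly revenue per customer).
# The labels stand in for clusters a mining tool would find; they are assumed here.
customers = [
    ("retail", 100), ("retail", 110), ("retail", 90), ("retail", 105),
    ("retail", 95), ("retail", 100), ("retail", 98), ("retail", 102),
    ("enterprise", 5000), ("enterprise", 5200),
]

def overall_average(rows):
    """One number for the whole data set: the 'single point of light'."""
    return mean(value for _, value in rows)

def average_by_segment(rows):
    """One average per segment, so small but valuable groups stay visible."""
    groups = {}
    for segment, value in rows:
        groups.setdefault(segment, []).append(value)
    return {segment: mean(values) for segment, values in groups.items()}

overall = overall_average(customers)
by_segment = average_by_segment(customers)

print("overall:", overall)
print("by segment:", by_segment)
```

The overall figure describes nobody in the data set: it sits far above every retail customer and far below both enterprise customers. The per-segment averages show the two groups for what they are.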

Here's an interesting thought... Many people today are of the opinion that too much data is a bad thing. Well, here's the news: good, bad, or indifferent, having plenty of data lets us learn more about our business specifics than not having enough.

Averages operate poorly over very large pools of data: they tell "less and less" about the data set underneath, whereas data mining operates very poorly on small data sets. Mining data, clustering data, and understanding the data are better done with too much data than with too little; the quality of the answers and assumptions, along with the confidence levels, rises when there's more data. Does a data mining engine reach its conclusions with "averages"? Not usually. It must go through ALL the details it's given (unless it's given a sample set) to find the correct answer.

Averages hide the gold-needle in the haystack of business data. I would suggest that a 3 dimensional landscape graph that pinpoints and monitors clusters of data (resulting from market-basket analysis, and neural networks) would be of more use to the BI world than the standard tank-full chart.
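
As a rough illustration of the data behind such a landscape graph, here is a small Python sketch. It does no market-basket analysis or neural-net clustering; it simply bins two-dimensional points into square grid cells and counts them, so that each count could be plotted as the "height" of the terrain. The function name `density_grid`, the sample points, and the cell size are all assumptions made for this example.

```python
from collections import Counter

def density_grid(points, cell_size):
    """Count points per square grid cell; each count is a 'height' on the landscape."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

# Hypothetical points: a main cluster near the origin and a small far-away group.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 3), (48, 47), (49, 48), (47, 49)]

grid = density_grid(points, cell_size=10)

# List the peaks, tallest first.
for cell, count in grid.most_common():
    print(cell, count)
```

The average position of these points lands in empty space between the two groups, while the grid shows both peaks, including the small one far from the main cluster.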

Posted by Dan Linstedt on April 22, 2005 10:37 AM
