Data Mining or Data Warehousing?

Originally published August 4, 2005

There is a lot of confusion surrounding the terms data mining and data warehousing (also referred to as business intelligence in today's marketplace). To my chagrin, many IT professionals use the two terms interchangeably, with little regard for the differences between the two types of applications. Although their goals are related and often overlap, data mining and data warehousing furnish different types of analytics for different types of users, and therefore each merits its own space.

By definition, data mining is intended for users who are statistically inclined. These analysts look for hidden patterns in data and extract them using statistical models. Data miners formulate questions that rely primarily on the "law of large numbers" to identify potentially useful relationships between data elements, relationships that can be profitable to companies.

For instance, car insurance companies sift through terabytes of data to link accident rates to demographic groups. They may start with the hypothesis that single men aged 18 to 25 who drive red sports cars are more prone to drive recklessly (as measured by the number of tickets they get and the number of accidents they have) than older men who drive minivans. After sifting through their data, they may find that this hypothesis does not necessarily hold. Instead, they may find that families with multiple cars, one of which is a sports car (of any color), have more accidents and tickets when a teenager lives in the house. Such cause-and-effect patterns help insurers set premiums fairly and, ideally, reduce premiums where appropriate.
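To make the hypothesis test concrete, here is a minimal Python sketch of one way a data miner might check whether a segment (households with a teenager and a sports car) has a higher accident rate than everyone else, using a pooled two-proportion z-test. The record fields (`teen_driver`, `sports_car`, `had_accident`), the helper names `split_by` and `two_proportion_z_test`, and all of the counts are hypothetical assumptions for illustration; real actuarial work would use far richer models over far more data.

```python
import math
from typing import Callable, Dict, List, Tuple

# A policy record is a plain dict; the field names used below are hypothetical.
Policy = Dict[str, object]


def split_by(policies: List[Policy],
             in_segment: Callable[[Policy], bool]) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Count (accidents, policies) inside the segment and for everyone else."""
    seg_hits = seg_n = rest_hits = rest_n = 0
    for p in policies:
        hit = 1 if p["had_accident"] else 0
        if in_segment(p):
            seg_hits, seg_n = seg_hits + hit, seg_n + 1
        else:
            rest_hits, rest_n = rest_hits + hit, rest_n + 1
    return (seg_hits, seg_n), (rest_hits, rest_n)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test: returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical toy data invented for illustration, not real claims data:
# 200 households with a teen driver and a sports car (60 had an accident),
# 800 other households (80 had an accident).
policies: List[Policy] = (
    [{"teen_driver": True, "sports_car": True, "had_accident": i < 60} for i in range(200)]
    + [{"teen_driver": False, "sports_car": False, "had_accident": i < 80} for i in range(800)]
)

seg, rest = split_by(policies, lambda p: bool(p["teen_driver"]) and bool(p["sports_car"]))
z, p = two_proportion_z_test(*seg, *rest)
print(f"segment rate {seg[0] / seg[1]:.2f} vs rest {rest[0] / rest[1]:.2f}, z = {z:.2f}, p = {p:.2g}")
```

A small p-value indicates that the gap between the segment and everyone else is unlikely to be chance alone; the test by itself measures association, so a data miner would still need to rule out other explanations before treating the pattern as cause and effect.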

Data warehouse users, on the other hand, tend to be data experts who analyze data directly by business dimensions. In the retail sector, for instance, they evaluate sales transactions by the products sold over a period of time. Data warehousing analysts are concerned with what kinds of purchases their customers make, and whether they can help those customers by improving the customer experience.

A good example of data warehousing analysis is a major retailer that stocked all of its stores with the same sizes and products. By warehousing its data, however, the retailer learned that its New York City stores sold through their smaller-size inventory much faster than its Chicago stores. As a result, the retailer could stock more small sizes in its NYC stores than in its Chicago stores.
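The retailer's analysis amounts to rolling sales facts up by two business dimensions, store and size, and comparing the resulting mix. The Python sketch below does that in memory over a handful of hypothetical rows; the city names come from the example, but the unit figures, the row layout, and the helpers `size_mix_by_store` and `restock_plan` are illustrative assumptions. In an actual warehouse, the same roll-up would typically be a SQL `GROUP BY` over a sales fact table joined to store and product dimension tables.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical fact rows (store_city, size, units_sold), invented for illustration.
sales: List[Tuple[str, str, int]] = [
    ("New York", "S", 120), ("New York", "M", 60), ("New York", "L", 20),
    ("Chicago", "S", 40), ("Chicago", "M", 80), ("Chicago", "L", 80),
]


def size_mix_by_store(rows: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, float]]:
    """Roll units up by (store, size), then express each size as a share of the store's total."""
    units: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for city, size, qty in rows:
        units[city][size] += qty
    mix: Dict[str, Dict[str, float]] = {}
    for city, by_size in units.items():
        total = sum(by_size.values())
        mix[city] = {size: qty / total for size, qty in by_size.items()}
    return mix


def restock_plan(mix: Dict[str, Dict[str, float]], shipment: int) -> Dict[str, Dict[str, int]]:
    """Split a per-store shipment across sizes in proportion to each store's sales mix.

    Uses simple rounding, so on real data the per-size counts may not sum exactly to the shipment.
    """
    return {city: {size: round(share * shipment) for size, share in shares.items()}
            for city, shares in mix.items()}


mix = size_mix_by_store(sales)
plan = restock_plan(mix, shipment=100)
for city, sizes in plan.items():
    print(city, sizes)
```

Allocating by observed sales mix is only the simplest possible rule; it shows how a dimensional slice (units by store and size) turns directly into a stocking decision like the one described above.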

To the naked eye, the two types of analysis appear to be of the same nature, since both aim to increase profitability based on history. However, there are key differences. In data mining, the analyst looks to support a hypothesis with the correlations, patterns and cause-and-effect relationships found in statistical models. The same is not necessarily true of data warehousing, which starts from a broader question (e.g., what did I sell in my store yesterday?) and slices and dices the data from that point onward to identify ways to improve the customer's shopping experience.

In conclusion, the two application types are similar in that they rely on historical data to drive future profitability. However, they employ different methods and require different skill sets from the analysts who use them. Both, moreover, fall short of delivering a predictive model. Algorithms and products are under development to improve predictive analytics, but we will have to wait and see whether they behave like a crystal ball.

  • Sanjeev Vohra

    Sanjeev is a Senior Consultant for Data Management Group who specializes in business intelligence and performance management. Throughout his career, Sanjeev has led multiple business intelligence and data warehousing implementations for commercial clients and government agencies. Sanjeev holds a double bachelor's degree: a B.S. in Information and Decision Science and Economics from Carnegie Mellon University.
