

Predictive Analytics: Classification Using Decision Trees

Originally published September 27, 2012

In recent months we have looked at different aspects of predictive analytics, which, to a large extent, rely on analysis of historical data to develop models that can help predict desired outcomes of selected business scenarios. In this month’s column, we look at a predictive technique that pairs a method, classification, with one type of algorithm commonly used to implement it: decision trees.

The predictive capability relies on the premise that, in a process with a desired outcome, the entities that behave in the desired way share some common characteristics. The expectation is that other entities sharing those same characteristics are likely to behave in the same way. Therefore, when assessing a candidate’s propensity to act in the desired manner, one can use the classification model to review the values of the critical characteristics and derive a probability of achieving the desired result.

One approach to developing this predictive classification model is the decision tree, which represents the sequence of questions one would ask to determine the class to which a candidate entity is assigned. Each node in the decision tree essentially represents a question, with a child node associated with each possible answer. At the bottom of the tree, the leaves represent the different classes into which records are ultimately sorted.

The implementation of the decision tree involves a data structure consisting of nodes and edges (or links), in which one node is identified as the parent of other nodes (the children) connected to it via the edges. When traversing the tree, the path taken from any specific node depends on the answer to that node’s question. At each step along the path from the root of the tree down to the leaves, the set of records that conform to the answers given so far grows smaller.

For example, Figure 1 depicts part of a decision tree for mobile telephone customers. The top-level node asks the question “Is the customer’s monthly bill greater than $200?” If the answer is no, that customer can be immediately classified as an “OK Customer.” On the other hand, if the answer is yes, the next question concerns the number of phones associated with the plan. One or two phones means the customer is classified as a “Good Customer,” while more than two phones indicates a “Great Customer.”

Figure 1: A piece of a classification tree for mobile telephone customers.
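
To make this structure concrete, here is a minimal sketch (in Python) of how the Figure 1 fragment might be represented and traversed. The Node layout and the classify function are illustrative assumptions, not a prescribed implementation.

# A node in the decision tree: an internal node holds a question (an
# attribute plus a yes/no test), while a leaf holds only a class label.
class Node:
    def __init__(self, label=None, attribute=None, test=None,
                 yes_branch=None, no_branch=None):
        self.label = label          # set only for leaf nodes
        self.attribute = attribute  # the attribute the question examines
        self.test = test            # predicate applied to that attribute's value
        self.yes_branch = yes_branch
        self.no_branch = no_branch

def classify(node, record):
    """Walk from the root to a leaf, following the answer at each node."""
    if node.label is not None:      # reached a leaf: return its class
        return node.label
    answer = node.test(record[node.attribute])
    return classify(node.yes_branch if answer else node.no_branch, record)

# The fragment from Figure 1: monthly bill > $200?, then phones <= 2?
tree = Node(attribute="monthly_bill", test=lambda v: v > 200,
            no_branch=Node(label="OK Customer"),
            yes_branch=Node(attribute="phones", test=lambda v: v <= 2,
                            yes_branch=Node(label="Good Customer"),
                            no_branch=Node(label="Great Customer")))

print(classify(tree, {"monthly_bill": 250, "phones": 3}))  # Great Customer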

The algorithmic approach takes a training set and begins a sequence of iterations that split the data set based on the values of the independent variables. At each iteration, the algorithm seeks the best of the remaining independent variables for splitting the records, which corresponds to determining the next question to use in classification. The next split is chosen by finding the criteria that divide the records into two or more sets where, in each set, a single characteristic predominates.
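
As an illustration, one common way to score how strongly a single characteristic predominates in a candidate subset is the Gini impurity. The sketch below shows that measure; it is only one choice among several possible success criteria (entropy-based information gain is another popular one).

from collections import Counter

def gini(records, target):
    """Gini impurity: 0.0 when a single class predominates completely,
    approaching 1.0 as the classes become evenly mixed."""
    counts = Counter(r[target] for r in records)
    total = len(records)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

A subset containing only “Great Customers” scores 0.0, so a split that produces such subsets is a strong candidate for the next question.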

In reality, finding the best choice means that the algorithm must evaluate all the possible ways to split the record set based on each independent variable, measure how good each split would be using predefined success criteria, and then choose the one that provides the best division. At the same time, if at any step all the records share the same value for a certain attribute, that attribute can be eliminated from future consideration, since it no longer has any measurable impact on the classification model.
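
Continuing the sketch, an exhaustive search over the candidate attributes might look like the following. Splitting on every distinct value of an attribute is a simplifying assumption here; real implementations consider many more split designs.

def best_split(records, attributes, target):
    """Evaluate every candidate attribute and return the one whose split
    yields the lowest weighted Gini impurity, i.e., the best division."""
    best_attr, best_score = None, float("inf")
    for attr in attributes:
        values = {r[attr] for r in records}
        if len(values) == 1:
            continue  # all values identical: no measurable impact, skip it
        score = 0.0
        for v in values:  # weighted impurity of the partition by this attribute
            subset = [r for r in records if r[attr] == v]
            score += gini(subset, target) * len(subset) / len(records)
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr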

When a point is reached where no appropriate split can be found, that node is designated a leaf node. Finally, when the tree is complete, the splitting criteria at each internal node can be reviewed and assigned some qualitative meaning. This provides a “white box” process for classification such as the one excerpted in Figure 1.
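
Tying the sketches together, a recursive builder along these lines emits a leaf when the records are pure or no appropriate split remains. It is a schematic outline rather than any particular published algorithm (CART, ID3, and C4.5 each refine these steps), and for brevity it represents the tree as nested dictionaries with categorical splits rather than reusing the Node class above.

from collections import Counter

def build_tree(records, attributes, target):
    """Grow the tree recursively; a node becomes a leaf when the records
    are pure or no attribute produces an appropriate split."""
    classes = {r[target] for r in records}
    if len(classes) == 1:
        return {"leaf": classes.pop()}  # pure node: done
    attr = best_split(records, attributes, target)
    if attr is None:                    # no appropriate split found
        majority = Counter(r[target] for r in records).most_common(1)[0][0]
        return {"leaf": majority}
    children = {}
    for v in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == v]
        remaining = [a for a in attributes if a != attr]
        children[v] = build_tree(subset, remaining, target)
    return {"question": attr, "children": children}

Because every internal node records the attribute it tests, printing the resulting structure yields exactly the kind of reviewable, “white box” rule set described above.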

The benefit of the decision tree approach is twofold. First, the algorithm selects the hierarchy of critical independent variables that are relevant to the outcomes of particular business processes. To use our mobile telephony example, if we know that “Good Customers” are likely to be converted into “Great Customers” when provided with a specific kind of offer, the classification tree can be integrated into the operational process that generates customer offers. Second, a review of those independent variables can shed some light on which characteristics of the analyzed entities (e.g., customers) are important and on their order of importance. In our example, the amount of the monthly bill is more important in distinguishing customer types than the number of phones, as shown by its higher position in the hierarchy.

One last point: I noted earlier that at each iteration the algorithm essentially has to evaluate all the possibilities. While this may be computationally restrictive on small-scale computing systems, the fact that these evaluations can be performed on shared data in parallel makes the process eminently adaptable to a scalable high-performance analytical platform, such as one that uses MapReduce and/or Hadoop. In future columns, I will look more closely at the breakdown of decision tree algorithms and how they can be implemented using MapReduce.
