

Gambling versus Probability: Predictive Analytics Requires Advanced Skills

Originally published September 22, 2009

Games are typically played by two or more participants. Every game has defined rules and boundaries. For every game, there is a method of keeping score. Games may be a short-term, one-time process. Sports contests are typical examples of this type of game. In this type of contest, the participants focus on the particular outcome of a single contest.

This article focuses on a longer-term game, one that is repeated indefinitely. The goal of the long-term game is to continue to play and continue to score with increasing frequency and value. Most businesses engage in this type of strategy.

Business can be viewed as a contest where we keep score with money. We develop rules for making decisions that are used to attract new customers, retain existing customers, reduce expenses, and minimize risks. The more effective we are with making these decisions, the more we score.

In the context of customer relationship management (CRM), small businesses engage in a relatively simple contest. Many times, they have the luxury of knowing their customers personally. They can adapt their strategies and decision making based on personal feedback with a high degree of accuracy. These types of businesses adapt to the variety of individual games played between the business and its customer base. Each experience is played with the intention of winning. Most times, these games are played to create a mutually beneficial scenario.

Large businesses, by contrast, must shift from this mind-set. They do not have the luxury of knowing each of their customers’ needs on an individual basis. They must make decisions based on group behavior. The outcome of any one interaction is relatively unimportant. The focus shifts to strategies that generate long-term success based on group behavior. We shift from a position of playing the game to the position of sponsoring the game.

Most of us have gambled at one point or another, whether at a charity fundraiser, in purchasing a lottery ticket, or on a trip to a casino. The games have many consistent characteristics; we pay a price for the opportunity to receive something more than we risk. As individuals, we generally don’t have accurate perceptions of the risk/reward ratios involved. We are willing to gamble a relatively small amount in hopes of becoming a big winner.

The sponsor of the game, however, fully understands the risks and rewards. The sponsor generally attempts to build an interaction with a particular public appeal. They are fully aware that some individuals will walk away winners. In fact, they need those winners to maintain the game. But they also know that, by playing the game consistently over a large number of occurrences, the probabilities guarantee that they are the true winners.
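The sponsor's advantage can be illustrated with a small simulation. This is only a sketch under stated assumptions: the even-money bet with an 18-in-38 chance of winning is a hypothetical example, and the function names `sponsor_edge` and `simulate_sponsor_average` are invented for illustration. It shows that the sponsor's average result per play is erratic over a handful of plays but settles near the expected edge as the number of plays grows.

```python
import random


def sponsor_edge(win_probability, payout=1.0):
    """Expected sponsor profit per 1-unit stake on a single play."""
    # The sponsor keeps the stake when the player loses
    # and pays out `payout` when the player wins.
    return (1 - win_probability) * 1.0 - win_probability * payout


def simulate_sponsor_average(win_probability, plays, payout=1.0, seed=0):
    """Average sponsor profit per play over a simulated run of plays."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(plays):
        if rng.random() < win_probability:
            total -= payout  # this player walks away a winner
        else:
            total += 1.0     # the sponsor keeps the stake
    return total / plays


P_WIN = 18 / 38  # hypothetical even-money bet, an assumption for this sketch

for n in (10, 1_000, 200_000):
    print(n, round(simulate_sponsor_average(P_WIN, n), 4))
print("expected edge:", round(sponsor_edge(P_WIN), 4))
```

Individual players still win in every run; the edge only becomes dependable for whoever is on the sponsor's side of a large number of plays.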

For large businesses, the game is only slightly modified. We are attempting to model human behavior that is highly inconsistent. The game is not played with fixed probabilities. Because these businesses cannot accurately analyze the risk/reward structure of each decision and business relationship, they must develop a strategy where the organization makes decisions that guarantee long-term success.

Some customers receive higher than anticipated value. Others may not receive the full value expected. Customers gamble that they can negotiate an arrangement that will provide them with a product or service at a fair value. But the probability of long-term success is still with the sponsor of the game – the business.

Game Creation

Imagine sitting in a high-stakes poker game. In this game, you are allowed to see your cards and the cards of every other player before you decide whether to bet. For a relatively minimal cost, you are allowed to sit in and simply observe. When a situation that is beneficial to you develops, you then exercise the privilege of participating.

This is the environment most large businesses enjoy. If they accurately evaluate their environment and have the discipline to only participate in probabilistically correct decision making, they are virtually guaranteed to be winners.

The experts at playing these games have an established set of decision rules for when to sit and watch, when to participate, and how to play when they do participate.

Successful large business executives create environments that guarantee success for their organizations. These contests range from sales to customer retention to loss prevention and fraud detection. All decision making is geared to selection of opportunities to increase the score or reduce the risk of loss.

Just as casinos have developed sophisticated games of chance to entertain their customers while guaranteeing their success, data mining and CRM have developed sophisticated techniques of data analysis in the business environment. And as with their gaming counterparts, the business implementation of advanced gaming technology requires an understanding of the characteristics of the tools being employed and advanced skills in decision making.

Data mining and CRM are the advanced technologies of the skilled business decision maker. It is no longer sufficient to simply review a report. With the development of advanced technology solutions in the business environment, it is necessary to increase the precision of the tactics we employ.

We can use a variety of tools to enhance our skills. Realistically, we use these tools to improve our decision making while playing the game. Our intent is to improve our position in order to achieve a higher score.

Keeping Score

In the world of advanced technology, performance is a subjective matter. That means we must take the time to define it on our terms.

At the inception of the project, it is important to fully and completely define the metrics by which the success of a project will be evaluated. The evaluation criteria should include the realistic constraints to be expected in the delivery environment, as well as the operational metrics of performance.

The key is to develop a mathematical formula that will make a significant contribution in live decision making. By being highly precise in defining our decision rules, and by applying them consistently, we are able to adapt the rules effectively as we gain additional experience.

Failure to completely and accurately define our performance criteria appropriately often leads to the development of good solutions to the wrong contest.

It is important to keep in mind that many of the advanced technologies employed in the data mining and CRM fields attempt to optimize performance based on the criteria utilized.

The Tools

There are many alternatives available, from traditional statistics to various qualitative tools to the advanced technologies employed in the data mining and CRM arenas. One of the keys to success is selecting the correct technology for the situation at hand.

When we strip away all of the hype and all of the mystique surrounding the advanced technologies, we are left with an array of tools. These tools perform a very simple function. They use a database of historical experience to build mathematical models intended to assist in future decision making.

Traditional statistical analysis is often of limited value. It is not that these tools are somehow flawed. Rather, it is that they are overly simplistic and, in many cases, inappropriate for the task of modeling human behavior.

Traditional statistical techniques are overly simplistic as they are suitable for only the most basic support of our decision making. They typically assume that the interactions in our decision variables are independent of each other, when, in fact, we are bombarded with multiple inputs that are highly interrelated.

Additionally, these simple modeling techniques generally attempt to build linear relationships between the inputs and the desired output. It is often the case that the basic recognition of the non-linear aspects of a solution space will generate improved decision making.

Traditional statistical analysis is often an inappropriate choice because we are attempting to model human behavior. Human behavior is typically not normally distributed, it rarely has a stable mean and standard deviation, and it never has inputs into a model that cause a particular type of behavior – conditions that are necessary for the correct application of traditional statistical tools.

The advanced modeling tools used in data mining are not “better” tools. They are simply better suited to modeling the realities of human behavior.

The techniques employed in data mining are often criticized as less rigorous and more complex than traditional statistical analysis tools. Both of these criticisms should be viewed realistically.

These techniques are less rigorous from the perspective of not offering “right” answers. However, this is not a deficiency of the tools. Rather, it is a reality of modeling human behavior that is inconsistent and constantly changing.

The tools associated with data mining are generally more complex than traditional statistics. The mathematical formulas derived are generally not “simple.” Again, this should not be viewed as undesirable. Our goal is to achieve more sophisticated and more accurate decision making in a highly complex environment.

The Environment

One of the pitfalls often encountered in data mining and CRM project management lies in a failure to recognize the limitations of the technologies employed due to the type of environment in which these technologies are expected to perform.

The environment in which most business decision makers function is not precise. In most cases where data mining and CRM are employed, we are modeling human behavior, not physical systems.

Human decision making is subject to inconsistencies, both between and within observations. This means that given the same set of factors influencing our decisions, the answers may be very different. And these differences in responses can be expected, not only from different individuals, but from the same individual at different points in time.

The implication of the inconsistency of human behavior is that, at best, we hope to identify a set of characteristics that allow us to expect a particular type of response at a probabilistically reliable rate. Further, our expectations of a particular behavior pattern can only be expected in a group of individuals displaying a common set of characteristics. We cannot expect to predict the performance of any one individual in any other than a probabilistic fashion.

We are often tempted to look for highly precise answers, to expect a solution to be right or wrong. We cannot win on every play. Human decision making is not precise. Our training in math and the physical sciences does not apply to anticipating human behavior. Our environments cannot be expected to meet our objectives in black and white terms.

Recognizing the limitations we are faced with in human behavior modeling is a first step in developing enhanced decision support mechanisms.

The Continuum – Where Do We Win and Where Do We Lose?

Traditional techniques often set out to find the most likely outcomes – ways of most accurately describing group behavior in aggregate. In doing so, annoying discrepancies in the group behavior are often assumed away or discarded completely. “Outliers,” commonly defined as observations more than three standard deviations from the mean, are typical examples.
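The three-standard-deviation rule mentioned above can be sketched in a few lines. This is an illustrative example only: the function name `split_by_extremity` and the transaction amounts are hypothetical, and the sketch sets the extreme observations aside for closer attention instead of discarding them.

```python
from statistics import mean, pstdev


def split_by_extremity(values, threshold=3.0):
    """Separate values into (typical, extreme) using a z-score cutoff."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # No spread at all, so nothing can be called extreme.
        return list(values), []
    typical, extreme = [], []
    for v in values:
        z = (v - mu) / sigma
        if abs(z) > threshold:
            extreme.append(v)  # keep it for review; do not throw it away
        else:
            typical.append(v)
    return typical, extreme


# Hypothetical transaction amounts: twenty ordinary ones and one unusual one.
amounts = [10] * 20 + [100]
typical, extreme = split_by_extremity(amounts)
print(extreme)  # prints [100]
```

Note that in very small samples no single observation can reach three standard deviations from the mean, so a cutoff like this only becomes meaningful with a reasonable number of observations.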

The astute business analyst, as with the astute gambler, will recognize that while many situations may appear to be similar in value, those at the extremes tend to have the most impact whether positive or negative. It is in accurately and reliably identifying the extremes in the continuum that we can make the most impact.

In most cases, we are focused on one tail or the other, such as fraud detection or credit screening. In some problems, we may benefit by identifying occurrences from both tails, such as in response modeling where it is possible to increase sales and reduce expenses simultaneously.

Business Opportunities

Ultimately, we want to know the impact of our decision models on our business performance. This is the ultimate set of metrics. Our goal may be to increase sales revenue, reduce expenses, increase net profit or reduce bad loans. Whatever we decide our priorities are, these metrics become our touchstone for all future decisions. It doesn’t matter how well our prospective models perform on a lift chart or how much we reduce our error metric. If we don’t meet the business criteria, complete with the realities of the constraint system in which we will operate, we will not have a winning model.

Analytic Opportunities

Our business goals are often implemented by using analytic surrogates. Can I increase my response rate? Can I reduce my false positives in my credit scoring model?

These analytic targets can be misleading. It must be remembered that they are, at best, surrogates for our true business metrics.

Technical Opportunities

Technical opportunities are what we see sitting at the table in front of the software during the model development process. The lift chart looks good. The r-squared continues to improve.

This is a good point to emphasize one of the key questions any model architect should be asking on a regular basis. So what!?! Remember, this is not an academic problem. This is real life – which generally involves real dollars.

Our technical enhancements may, or may not, directly translate into additional business benefits. The only way to know for sure is to periodically test our developing models using our true business metrics.

Baseline Performance

It should be apparent to any analyst developing data mining and CRM models that, in dealing with human behavior modeling, there are no right answers – there is no final solution upon which improvements cannot be made.

Instead, we hope to improve on what has been done in the past. To that end, we have to have measured our existing efforts using the same quantitative performance metrics we plan to use in the future.

Incremental Performance Enhancement

Our baseline gives us a point of reference. How is this model performing? Is it significantly better/worse than the techniques we have been employing? What level of improvement is significant? At what point am I willing to terminate this development effort and field a new model?
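One conventional yardstick for the question "is this better than the baseline?" is a two-proportion z-test on response rates. The sketch below is offered with caution: the function name `compare_to_baseline` and the counts are hypothetical, and the test carries its own statistical assumptions, so it is a starting point for judgment and not a substitute for the business metrics.

```python
import math


def compare_to_baseline(baseline_hits, baseline_n, model_hits, model_n):
    """Return (rate difference, z statistic, two-sided p-value)."""
    p_base = baseline_hits / baseline_n
    p_model = model_hits / model_n
    # Pooled rate under the assumption that both approaches perform the same.
    pooled = (baseline_hits + model_hits) / (baseline_n + model_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline_n + 1 / model_n))
    z = (p_model - p_base) / se if se > 0 else 0.0
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_model - p_base, z, p_value


# Hypothetical counts: 200 responses in 10,000 contacts for the existing
# approach versus 260 responses in 10,000 contacts for the candidate model.
diff, z, p_value = compare_to_baseline(200, 10_000, 260, 10_000)
print(round(diff, 4), round(z, 2), round(p_value, 4))
```

Whether a difference of this size justifies fielding a new model still depends on the cost of the change and the value of each additional response.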

The frustrating reality of modeling human behavior is that we are always putting ourselves at risk. We must determine how much or how little to devote to our development efforts.

There is no way to determine, in advance, what the payoff is going to be. If we knew that answer, we’d have enough information to not even need the development effort.

We do know that human behaviors change over time – sometimes gradually, and sometimes in very dynamic shifts. And with those changes, our existing decision models must evolve to meet new challenges as well.

What is often overlooked is that data mining and CRM are not projects. They are not something to be initiated and concluded. They are a dynamic game that continues over time. There is no one “right” answer. Our decision making and our model development are constantly evolving. This is a goal-directed process that must shift with changes in corporate priority and with experience. It must evolve to maintain value in helping us evaluate the complex realities in which we operate.

  • Thomas Rathburn

    Thomas A. "Tony" Rathburn is a senior consultant and director of training with The Modeling Agency. Tony has more than 25 years of predictive analytics development experience, and he is a regular speaker on data mining and predictive analytics at TDWI Conferences. He is also a co-presenter for a popular webinar entitled “Data Mining: Failure to Launch,” produced live monthly by The Modeling Agency. He can be contacted at Tony@The-Modeling-Agency.com.


Comments

Posted January 14, 2010 by vincentg@datashaping.com

Traditional statistical analysis is often of limited value. It is not that these tools are somehow flawed. Rather, it is that they are overly simplistic and, in many cases inappropriate for the task of modeling human behavior.

Traditional statistical techniques are overly simplistic as they are suitable for only the most basic support of our decision making. They typically assume that the interactions in our decision variables are independent of each other, when, in fact, we are bombarded with multiple inputs that are highly interrelated.

Additionally, these simple modeling techniques generally attempt to build linear relationships between the inputs and the desired output. It is often the case that the basic recognition of the non-linear aspects of a solution space will generate improved decision making.

Traditional statistical analysis is often an inappropriate choice because we are attempting to model human behavior. Human behavior is typically not normally distributed, it rarely has a stable mean and standard deviation and it never has inputs into a model that cause a particular type of behavior – conditions that are necessary for the correct application of traditional statistical tools.

My rebuttal:

  1. Most statistical models DO NOT assume a normal distribution. None of my models relies on the normal distribution; they deal with multimodal or highly skewed distributions (e.g. in the context of fraud detection).
  2. Most modern models do not assume that decision variables are independent. See e.g. my hidden decision tree technology that handles interaction, as well as many other models that include interactions.
  3. Models with linear relationships are just a very small subset of all models. Hierarchical Bayesian models and stochastic processes are examples of non-linear models.

The author seems to believe that statistics is just about linear regression and basic tests of hypotheses. This is what you actually study during the first 30 hours in any basic statistics curriculum, but there's much, much more than that. Follow the discussion at: http://www.analyticbridge.com/profiles/blogs/misconceptions-about