
Pinpointing Customer Behavior Predictors — A Step-By-Step Deconstruction Process

Originally published March 8, 2011

Increasingly, in the insurance, retail, telecom and banking industries, the ability to accurately pinpoint customers who have a greater chance of exhibiting certain behavior, and to proactively put interventions in place, is becoming a competitive differentiator. One of the most important tasks in modeling customer behavior is to zero in precisely on the predictors that influence a behavioral outcome, using advanced analytical models. For example, in the insurance industry, a company might want to identify the top five influence levers that determine whether or not a policyholder would cancel his or her policy. Some questions that arise are:

  • Have all the causal factors been figured into the analytical model for predicting a behavioral outcome (i.e., policy cancellations)?

  • How many predictors were obtained from field immersion vs. exit surveys vs. 90-day customer conversation mining?

  • What is the step-by-step methodology used to bubble up predictors?

  • What are the various best practices and techniques in place to harvest solid predictors that can then be statistically tested for significance?

This article outlines a step-by-step process of methodically harvesting the most influential set of predictors using six different methods.

In all the business scenarios across industries, the search for behavioral predictors is almost a forensic science where the behavioral investigator needs to methodically sift through terabytes of raw electronic and digital data trails left behind by customers with every interaction. He or she needs to investigate the sequence of transactional patterns leading to the outcome (such as policy cancellations), and to investigate if there are statistically significant patterns or triggers in the electronic data trail, which could have served as an early warning sign. Once the most important predictors are determined, their influence on customer behavior should be ranked and sensitivity analysis should be performed to quantitatively measure expected impact on customer behavior for changes in predictors.

While we have used the example of modelling cancellations in the insurance industry thus far, this process can be extended to business scenarios in other industries, such as telecom, retail or health care, in which companies are also trying to predict customer behavior.

The following schematic provides a three-step process which can be used to investigate behavioral drivers. The first step is to research the factors which influence a behavior. The second step consists of assembling quantitative data measuring the specific behavior, and the third step consists of administering statistical tests to confirm linkage between causal factors and the behavioral outcome.

Causal Behavior

Source 1: Field Immersion

In field immersion, the behavioral modelling/investigation team spends a day in the life of an agent or customer trying to get direct experience of the factors at play while an agent is trying to persuade an existing subscriber to keep his or her policy. The rationale for direct customer immersion is based on the fact that direct observations of these interactions can offer a window into agent-customer dynamics and yield new insights as to what causes a policyholder to renew a policy. This can then be statistically modeled into the policy scoring model, provided the data exists. Also, through field immersion, a behavioral investigator gets access to nuances and behavioral dynamics at play, which we may not be privy to with other methods.

For example, observing the agent-customer interaction during a field visit led us to hypothesize that agents had a greater influence over a policyholder if the agent-customer relationship was longer than three years and the customer’s age was greater than 47 years. This was statistically validated by a simple discriminant analysis and hypothesis testing and finally fed as an input to the scoring process. This was one practical example of a modeling variable obtained from field immersion where the statistical test of significance (chi-square value) was better than those obtained from hypothesis workshops. Similarly, in another interaction, it was observed that in certain zip codes, there was a unique word-of-mouth referral campaign being executed that was causing policyholders to cancel. This was an instance in which data to model this phenomenon was not readily available, and the modeling team brainstormed with the customer team to institutionalize a new data collection process to capture details about competitive campaigns that fuelled cancellations.
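A hypothesis of this kind can be checked with a chi-square test on a 2x2 table. The sketch below is only an illustration: the counts are invented, the function name `chi_square_2x2` is hypothetical, and it computes the plain Pearson statistic without a continuity correction, which may differ from the test the team actually ran.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if segment membership and renewal were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts, invented for illustration only.
# Rows: segment (relationship > 3 years AND age > 47) vs. all other policyholders.
# Columns: renewed, cancelled.
observed = [
    [90, 10],  # in segment
    [60, 40],  # everyone else
]

# Chi-square critical value for 1 degree of freedom at the 5% level.
CRITICAL_95_DF1 = 3.841

statistic = chi_square_2x2(observed)
is_significant = statistic > CRITICAL_95_DF1
print(f"chi-square = {statistic:.2f}, significant at 5%: {is_significant}")
```

A statistic above the critical value suggests that renewal rates differ between the segment and everyone else, which is the point at which the variable becomes a candidate input for scoring.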

Source 2: Contact Center (Text Mining Conversations)

Mining customer conversations can frequently yield clues as to the triggers for customer behavior. In this method, clues for policy cancellations are searched for among a subset of call center transcripts (a sub-sample of 90 days of customer conversations in which an outbound customer service representative records the key points of conversation regarding a policyholder’s reasons for not subscribing). Since the volume of conversations is in the thousands, it is difficult to filter these conversations manually. However, they can be fed through an unstructured text mining process, which can extract the top 10 themes. For example, we can track the frequency of occurrence of certain “Watch List” keywords, where a sudden increase in the frequency of keywords or themes such as “Poor Service” or “Premium Amount” can signal a subscriber’s intent to not renew the policy.
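The watch-list idea can be sketched in a few lines. This is simple phrase counting, not the theme extraction a text mining tool would perform, and everything here is an assumption made for illustration: the sample notes are invented, the function names are hypothetical, and the doubling threshold used to call something a "sudden increase" is arbitrary.

```python
from collections import Counter

# The two themes are taken from the article; treating them as literal phrases is an assumption.
WATCH_LIST = ("poor service", "premium amount")


def watch_list_counts(transcripts, watch_list=WATCH_LIST):
    """Count how many transcripts mention each watch-list phrase (case-insensitive)."""
    counts = Counter()
    for text in transcripts:
        lowered = text.lower()
        for phrase in watch_list:
            if phrase in lowered:
                counts[phrase] += 1
    return counts


def flag_spikes(previous, current, ratio=2.0):
    """Return phrases whose count grew by at least `ratio` between two periods."""
    return sorted(
        phrase
        for phrase in current
        if current[phrase] >= ratio * max(previous.get(phrase, 0), 1)
    )


# Invented notes standing in for call center transcripts.
last_month = ["Customer asked about premium amount due date."]
this_month = [
    "Complained about poor service from the agent.",
    "Poor service again; says premium amount is too high.",
    "Premium increased, considering alternatives.",
]

prev_counts = watch_list_counts(last_month)
curr_counts = watch_list_counts(this_month)
spikes = flag_spikes(prev_counts, curr_counts)
print(dict(curr_counts), spikes)
```

In practice the comparison window, the phrase list and the spike threshold would all need tuning against real transcripts.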

Source 3: Exit Survey

Another technique that can be utilized is to design a simple questionnaire that can probe for the causal factors that resulted in the behavior. For instance, in the insurance example, we could explore agent experience as a possible cause, competitive policy structures as another cause, etc.

Source 4: Advanced Analytics (Automatic Predictor Ranking)

In this method, a set of behavioral dimensions is identified, and a set of quantitative measures of behavior relevant to the outcome being predicted is assembled. In the case of the cancellation modelling problem in the insurance industry, this would consist of capturing payment frequency, premium share of salary, policy tenure, frequency of outbound reminder calls, and recency of the last inbound complaint call in a cancellation-specific analytical data mart.
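As a rough sketch of what one row of such a data mart might hold, the measures named above can be derived from raw policy records. The record layout, field names and values below are all hypothetical; a real data mart would draw them from several source systems.

```python
from datetime import date


def build_measures(policy, as_of):
    """Derive the behavioral measures named above for one policyholder."""
    annual_premium = policy["premium_per_payment"] * policy["payments_per_year"]
    complaints = policy["complaint_call_dates"]
    return {
        "payment_frequency": policy["payments_per_year"],
        "premium_share_of_salary": annual_premium / policy["annual_salary"],
        "policy_tenure_days": (as_of - policy["issue_date"]).days,
        "outbound_reminder_calls": len(policy["reminder_call_dates"]),
        # Recency is undefined for policyholders who never complained.
        "days_since_last_complaint": (as_of - max(complaints)).days if complaints else None,
    }


# Invented record for illustration only.
policy = {
    "premium_per_payment": 500.0,
    "payments_per_year": 4,
    "annual_salary": 40000.0,
    "issue_date": date(2008, 3, 1),
    "reminder_call_dates": [date(2011, 1, 10), date(2011, 2, 10)],
    "complaint_call_dates": [date(2010, 12, 1), date(2011, 2, 1)],
}

measures = build_measures(policy, as_of=date(2011, 3, 1))
for name, value in measures.items():
    print(name, value)
```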

The following list outlines a sample set of variables which are assumed to influence the probability of a policyholder cancelling. Each of these information components will have to be sourced from a myriad of IT applications to create a 360-degree view of the policyholder. Once a 360-degree view of the policyholder is built, it becomes easier to run the statistical process on the integrated dataset to rank the variables on the influence they exert independently on the cancellation rates.

Cancellation Trigger – Predictor Bank

  1. Recency of a claims denial

  2. Tenure of agent with insurance company

  3. Overall experience of agent (total experience)

  4. Automated deduction or cash/check based (payment mode)

  5. Number of unanswered call center calls in last 8 weeks

  6. Frequency of outbound triggers for renewal

  7. Recency of phone bound renewal trigger

  8. Percentage change in renewal commissions to the agent (driven by policy)

  9. Three-month ratio of inbound calls to outbound calls

  10. Range of channels for interaction – Agent / Internet / Mobile / Call Center

  11. Outbound watch list: frequency of occurrence of specific keywords in outbound call interaction

  12. Inbound watch list: frequency of occurrence of specific keywords in inbound call interaction

  13. Recency of last payment

  14. Policy attributes: type of policyholder / location / type of coverage/ policy cost / sum assured / issue age / policy tenure

  15. Range of products covered


Once a 360-degree view of the policyholder is assembled with regard to crucial behavioral elements, we can run two important analytical techniques that provide insight into the most important influencing elements. The first technique is attribute ranking, for which discriminant analysis can be used to explore the data. The objective of discriminant analysis here is to investigate differences between policyholders who cancelled and those who were retained by identifying the most important discriminating variables, such as policy tenure, timing (during tax period or not), financial advisor channel/agent channel, premium amount/share of salary, etc.
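One simple way to rank attributes by how well they separate the two groups is a per-variable Fisher score: the squared difference of the group means divided by the sum of the group variances. This is a univariate simplification and not a full discriminant analysis, and the rows below are invented for illustration.

```python
from statistics import mean, pvariance


def fisher_score(cancelled_values, retained_values):
    """Squared gap between group means divided by the sum of group variances."""
    gap = (mean(cancelled_values) - mean(retained_values)) ** 2
    spread = pvariance(cancelled_values) + pvariance(retained_values)
    if spread == 0:
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def rank_variables(rows, target="cancelled"):
    """Return (variable, score) pairs sorted from most to least discriminating."""
    variables = [name for name in rows[0] if name != target]
    scores = {}
    for name in variables:
        cancelled = [row[name] for row in rows if row[target]]
        retained = [row[name] for row in rows if not row[target]]
        scores[name] = fisher_score(cancelled, retained)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented policyholder rows for illustration only.
rows = [
    {"policy_tenure_years": 1, "premium_share_of_salary": 0.10, "cancelled": True},
    {"policy_tenure_years": 2, "premium_share_of_salary": 0.06, "cancelled": True},
    {"policy_tenure_years": 3, "premium_share_of_salary": 0.08, "cancelled": True},
    {"policy_tenure_years": 8, "premium_share_of_salary": 0.07, "cancelled": False},
    {"policy_tenure_years": 9, "premium_share_of_salary": 0.05, "cancelled": False},
    {"policy_tenure_years": 10, "premium_share_of_salary": 0.09, "cancelled": False},
]

ranking = rank_variables(rows)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this made-up sample, policy tenure separates the two groups far more cleanly than premium share of salary, so it ranks first. Because each variable is scored on its own, correlated variables are not accounted for; a multivariate discriminant analysis would handle that.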

The second technique is to run a logistic regression model using tools like SAS, SPSS and Oracle Data Miner to determine the relative rank of each influence. The logic used to interpret logistic regression is beyond the scope of this article, but a brief overview is given below:

Is the R square value greater than the statistically significant threshold?

If yes, start the interpretation process:

  • “Agent training has 3 times more effect than sending a reminder.” 

  • “If we ensure the reminder call from the agent goes out on time, then the chances of the policyholder renewing go up by 34%.”

If no:

  • Identify additional internal predictors, external predictors, structured predictors and unstructured predictors (keyword frequency), which can be introduced to boost the model's predictive power.
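The interpretation flow above can be illustrated with a small logistic regression fitted by gradient descent. This is a minimal sketch under several assumptions: the data is invented, the two predictor names are hypothetical, McFadden's pseudo R-squared is used as one possible reading of "R square" for a logistic model, and the relative effects are read from odds ratios. The tools named above would report these quantities directly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Predicted renewal probability; weights[0] is the intercept."""
    inputs = [1.0] + list(features)
    return sigmoid(sum(w * v for w, v in zip(weights, inputs)))


def fit_logistic(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights by batch gradient descent on the average log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(X, y):
            error = predict(weights, features) - label
            for k, v in enumerate([1.0] + list(features)):
                gradient[k] += error * v
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, gradient)]
    return weights


def log_likelihood(probabilities, y):
    return sum(math.log(p if label else 1.0 - p) for p, label in zip(probabilities, y))


# Invented data: (reminder_on_time, agent_trained) -> (renewed count, total count).
cells = {(0, 0): (1, 4), (1, 0): (2, 4), (0, 1): (3, 4), (1, 1): (3, 4)}
X, y = [], []
for features, (renewed, total) in cells.items():
    for i in range(total):
        X.append(features)
        y.append(1 if i < renewed else 0)

weights = fit_logistic(X, y)

# McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only model).
base_rate = sum(y) / len(y)
ll_model = log_likelihood([predict(weights, f) for f in X], y)
ll_null = log_likelihood([base_rate] * len(y), y)
pseudo_r2 = 1.0 - ll_model / ll_null

# exp(coefficient) is the multiplicative change in the odds of renewal.
names = ("reminder_on_time", "agent_trained")
odds_ratios = {name: math.exp(w) for name, w in zip(names, weights[1:])}

print(f"pseudo R-squared: {pseudo_r2:.3f}")
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.2f}")
```

Statements like the quoted examples come from comparing these odds ratios across predictors. Note that an odds ratio describes a change in odds, not directly a percentage-point change in the probability of renewal.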

Source 5: Deep Dive Workshops

Here, some of the most knowledgeable field agents who understand the markets very well huddle together for a brainstorming workshop, in which a facilitator guides the conversations trying to explore the reasons for a policyholder not renewing his or her policy. The deliverable at the end of this workshop is a prioritized set of behavioral policyholder predictors which can then be used for statistical testing.

The rationale for employing this technique is that the ability to tease out subjective behavioral hunches is essential to behavioral modeling. In many respects, a skilled behavioral analyst, while grounded in analytics, also employs the powers of perception and intuition. This allows the analyst to consider the significance of causal factors for cancellations that others would either fail to consider or quickly dismiss as insignificant.

Source 6: Field Hypothesis Harvester

The biggest source of intelligence in any industry is the sales “foot soldiers” (in the insurance industry, the financial advisors and agents) on the ground who are interacting every day with their customers. If there is a mechanism by which they can share what they are hearing from their customers, it can dramatically increase the number of causal factors one can model.

Behavioral “Triangulation” of Predictors

After the independent predictor harvesting exercise across different processes is completed, the behavioral investigation team collates the top 10 predictors. The top 10 behavioral predictors are broken up by the process used to collect them—Field Immersion, Contact Center Mining, Automated Detection of Patterns from Policy 360 Data Mart and Cancellation Exit Surveys. The objective is to see if certain policy cancellation predictors are being reinforced by multiple methods; if so, then it increases the confidence level in that predictor’s accuracy.

For example, in the scenario given below, the X-axis outlines all the cancellation predictors and the Y-axis outlines the process used to surface each predictor. The cancellation predictors identified by the behavioral research team are policy age, premium amount band, agent tenure, etc. The methods used for surfacing the predictors are field visits with agents, mining call center conversations, etc.

Key observations from the above predictor map are as follows:

  1. Policy tenure, premium amount and agent interaction frequency repeatedly came up as key influencers from three different sources

  2. Competitive products, coverage amount and premium came up as a factor from exit surveys of policyholders surrendering the policy

  3. Agent’s tenure as a factor came up only from field immersion, but data around agent’s tenure is not available for testing this statistically
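The triangulation step amounts to counting how many independent sources surface each predictor. The sketch below shows one way to build such a predictor map. The assignment of predictors to sources is hypothetical, chosen only to be consistent with the observations listed above, and the rule of treating two or more sources as "reinforced" is an assumption.

```python
from collections import defaultdict

# Hypothetical assignments of predictors to the sources that surfaced them.
surfaced_by = {
    "Field Immersion": ["policy tenure", "agent interaction frequency", "agent tenure"],
    "Contact Center Mining": ["policy tenure", "premium amount", "agent interaction frequency"],
    "Policy 360 Data Mart": ["policy tenure", "premium amount", "agent interaction frequency"],
    "Exit Surveys": ["premium amount", "competitive products", "coverage amount"],
}


def triangulate(surfaced_by):
    """Map each predictor to the set of sources that surfaced it, most reinforced first."""
    sources_for = defaultdict(set)
    for source, predictors in surfaced_by.items():
        for predictor in predictors:
            sources_for[predictor].add(source)
    return sorted(sources_for.items(), key=lambda item: (-len(item[1]), item[0]))


predictor_map = triangulate(surfaced_by)

# Predictors confirmed by at least two sources carry more confidence.
reinforced = [predictor for predictor, sources in predictor_map if len(sources) >= 2]

for predictor, sources in predictor_map:
    print(f"{predictor}: {len(sources)} source(s): {', '.join(sorted(sources))}")
```

Predictors that appear in only one source, such as agent tenure here, are not discarded. They are the ones that may need a new data collection process before they can be tested statistically.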


As can be seen from the examples above, broadening the range of sources you listen to when building the analytical model can yield rich dividends in the form of a more robust predictive model. The behavioral triangulation process reinforces the case for introducing a behavioral parameter into the analytical model to increase its predictive lift. As a saying often attributed to Abraham Lincoln goes, “If I had six hours to chop down a tree, I'd spend the first four sharpening the axe.” Similarly, in trying to analytically model customer behavior, the bulk of the time must be spent not on the process of statistical modelling itself, but on the preparatory process of researching causal factors and stitching together a 360-degree view of the customer’s behavior against which those factors can be quantitatively tested.


  • Derick Jose

    Derick Jose is the vice president of Advanced Analytics/Research within MindTree's Data & Analytic Solutions (DAS) Group, one of the world’s largest information management practices, which offers customers a one-stop-shop to capture, analyze, enhance, and view their business information. The DAS practice combines MindTree’s proven analytics, business intelligence, information management and research services for customers in the consumer packaged goods (CPG), retail, financial services, insurance, travel and media markets. Derick has 20 years of experience spanning consulting, advanced analytics and business intelligence solutions. He has worked extensively in the CPG, banking, telecom and retail industries. Derick can be contacted at Derick_Jose@mindtree.com.

    Editor's Note: More articles and resources are available in Derick's BeyeNETWORK Expert Channel. Be sure to visit today!
