Originally published December 4, 2007
Voice of the customer (VOTC) is a time-tested business concept that has gained new life through the application of text analytics.
VOTC researchers seek to understand the totality of customer needs and opinions, whether explicitly stated or indirectly implied. They probe both individual views and collective, market thinking. (We might term that latter type of analysis voice of the market.)
Findings are used by marketers and product designers, in customer support and in quality assurance. They are drawn from direct customer contacts and from information gathered for operational purposes in the course of everyday business. Both sources generate large volumes of both structured data and free-form text.
Traditional analytical approaches handle structured data well, but before the advent of text analytics, discerning customer voices in textual sources was laborious and expensive and yielded only qualitative, subjective, small-sample findings. It was also difficult to integrate the analysis of these text-discovered, qualitative findings with quantitative analyses of numerical operational data, and therefore quite difficult to deliver the much-talked-about but seldom-achieved 360-degree customer view. Text analytics helps organizations overcome these limitations: it enables systematic, automatic processing of individual cases and also the derivation of scientifically valid statistics from larger-scale data collection efforts.
In an enterprise information-technology context, VOTC techniques are part of the customer relationship management (CRM) toolbox. CRM is an enterprise application that automates customer interactions, bringing into play a customer’s transaction history as well as supplementary demographic and marketplace information. Advanced CRM utilizes predictive models that identify cross-sell and up-sell opportunities, customer-retention risks, and conditions that are likely to lead to desired goals. Text analytics can provide “lift” – it can enhance results – in all these facets of CRM.
Traditional CRM is operationally focused. Customer interactions generate both transactional data and free-form textual information – correspondence, call transcripts and call center notes, warranty claims, marketing responses, and the like are examples – that are captured by CRM tools. And CRM is complemented by a set of practices that have become collectively known as enterprise feedback management (EFM). EFM relies on direct data collection via surveys, interviews, and focus groups and on indirect harvesting of information from blogs, message boards and e-mail lists, online discussions and social media, and news articles. Other forms of customer input come into play as well, albeit more in business-to-business (B2B), and less in business-to-consumer (B2C), settings: requirements documents, product specifications, contracts, and reports.
Just as there is no single source or type of voice-of-the-customer data, there is rarely a single, time-fixed customer voice.
VOTC work must classify customers according to characteristics such as age, sex, education, income, geographic location, and type of customer (individual, company, government), and it must support segmentation according to outcomes such as propensity to buy a particular product, likelihood to cancel a service, and risk of default on debt.
Of course, certain characteristics and certain outcomes will apply to individual consumers, others to businesses, and some to both; some to private companies, others to government and non-profits, and so on. Certain information of interest will be discernible in some types of source material but not in others. For instance, product satisfaction might be detected in call-center notes and chat boards but not in news reports, or at least not before a widespread problem has emerged. So context and goals are important in the choice of sources and analytical approaches.
VOTC analytics should extract features – entities, concepts, and relationships – appropriate to context and expected usage. Typically relevant entities include names of individuals and companies, locations, products and their components, and dates. Concepts could include topics such as product quality, price, value, and ease of use. And relationships link extracted entities or concepts via some form of predicate. They capture identity, actions, and events and their attributes.
Take the sentences “I bought a new Prius last April. I am very happy with it.” We have entities “I” and “Prius,” which conceptually indicate a person and a car, the latter with an attribute, “new.” The two entities are linked by the action, “bought,” which in turn indicates an ownership relationship. The action statement is revealed to be an event by the presence of the modifier “last April.”
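The breakdown above can be sketched as a small data structure. This is only an illustration, not the output format of any particular text analytics product; the class names `Entity` and `Relation` and their fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    text: str                      # surface form found in the text
    concept: str                   # concept class the entity indicates
    attributes: dict = field(default_factory=dict)


@dataclass
class Relation:
    subject: Entity
    predicate: str                 # the action linking the two entities
    obj: Entity
    modifiers: dict = field(default_factory=dict)

    def is_event(self) -> bool:
        # Assumption for this sketch: an action with a time modifier is an event.
        return "time" in self.modifiers


buyer = Entity("I", "person")
car = Entity("Prius", "car", {"condition": "new"})
purchase = Relation(buyer, "bought", car, {"time": "last April"})

print(purchase.is_event())
```

A downstream step could then derive the ownership relationship from any `Relation` whose predicate is “bought.”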
Robust analyses often require deep linguistic and inferential capabilities. For example, the pronoun “I” of the last example would presumably refer to the person who has called, submitted a form, or written an e-mail message. It is an external reference. “It” in “I am very happy with it” is an anaphor, referring back to “Prius” in the prior sentence. “Last April” must be understood as indicating the April preceding the date of the communication, and Prius clearly belongs to the concept classes “car” and “vehicle,” where the fact that “truck” is also a “vehicle” may be relevant.
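As one small example of this kind of inference, a relative date such as “last April” can be anchored to the date of the communication. The sketch below assumes the reading given above (the most recent April before the month of the message); the function name `resolve_last_month` is hypothetical, and real temporal expressions need far more care.

```python
from datetime import date

MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}


def resolve_last_month(month_name: str, message_date: date) -> date:
    """Return the first day of the most recent such month that falls
    strictly before the month in which the message was written."""
    month = MONTHS[month_name.lower()]
    if month < message_date.month:
        year = message_date.year       # already passed this year
    else:
        year = message_date.year - 1   # fall back to the previous year
    return date(year, month, 1)


# A message dated December 4, 2007 that mentions "last April"
print(resolve_last_month("April", date(2007, 12, 4)))
```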
Similarly, term-reduction and disambiguation capabilities may be quite important. An example is the ability to understand that Jerry Ford and Gerald R. Ford are a single person, but that in a business context, Ford is likely to refer to a car rather than a former president or a place to cross a river.
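A minimal sketch of both ideas, assuming hand-built lookup tables: `ALIASES` reduces name variants to one canonical form, and `SENSES` picks a meaning from a context label. The tables, the context labels, and the function names are all hypothetical; production systems would more likely rely on curated lexicons or statistical models.

```python
# Variant spellings mapped to one canonical name (illustrative entries only)
ALIASES = {
    "jerry ford": "Gerald R. Ford",
    "gerald ford": "Gerald R. Ford",
    "gerald r. ford": "Gerald R. Ford",
}

# Possible senses of an ambiguous term, keyed by a context label
SENSES = {
    "ford": {
        "business": "Ford (car make)",
        "politics": "Gerald R. Ford",
        "geography": "ford (river crossing)",
    },
}


def canonical_name(mention: str) -> str:
    """Term reduction: collapse known variants to a single form."""
    key = " ".join(mention.lower().split())  # normalize case and spacing
    return ALIASES.get(key, mention)


def disambiguate(term: str, context: str) -> str:
    """Pick the sense registered for the context, else return the term as is."""
    return SENSES.get(term.lower(), {}).get(context, term)


print(canonical_name("Jerry Ford"))
print(disambiguate("Ford", "business"))
```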
Let’s look more closely at the sentiment expressed in our Prius example, “happy” with its modifier “very.” From them, we infer a strongly positive tone in the communication, which we would handle quite differently if it read, “I would be very happy with the car if the brakes weren’t so stiff.” The use of the words “would” and “if” in our example shows that it is not enough to simply consider the presence of terms like “very happy”: VOTC analysis demands deeper understanding. It should distinguish fact from opinion; extract sentiments; discern attributes such as intensity of feeling, duration, and sequence; and boast sufficient linguistic capabilities to process important language elements that indicate tone, negation, and even sarcasm.
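To make the point concrete, here is a deliberately naive sketch that checks for conditional markers before trusting a positive term. The word lists and the `assess` function are assumptions made for illustration; they do not handle negation, sarcasm, or anything else that real sentiment analysis must address.

```python
import re

POSITIVE = {"happy", "pleased", "satisfied"}
INTENSIFIERS = {"very", "extremely", "really"}
CONDITIONALS = {"would", "if", "unless"}


def assess(text: str) -> str:
    """Label a sentence as strongly positive, positive, qualified, or unknown."""
    # Lowercase, normalize curly apostrophes, and split into word tokens
    tokens = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
    if not any(t in POSITIVE for t in tokens):
        return "unknown"
    if any(t in CONDITIONALS for t in tokens):
        # A positive term inside a conditional is not a plain endorsement
        return "qualified"
    intensified = any(
        a in INTENSIFIERS and b in POSITIVE
        for a, b in zip(tokens, tokens[1:])
    )
    return "strongly positive" if intensified else "positive"


print(assess("I am very happy with it."))
print(assess("I would be very happy with the car if the brakes weren't so stiff."))
```

The first sentence comes back as strongly positive and the second as qualified, even though both contain “very happy.”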
Text analytics processes – for voice of the customer research and for other purposes – generally start with information retrieval: identifying sources and accessing documents that are likely to contain information of interest. They apply linguistic and statistical techniques to discern features as described previously, and they typically annotate documents using XML tags that are appropriate for the information type and business domain.
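The annotation step might look something like the following sketch, which wraps already-identified spans in XML elements using Python’s standard `xml.etree.ElementTree` module. The tag names (`doc`, `product`, `date`) and the `annotate` helper are hypothetical; actual tag sets depend on the information type and business domain.

```python
import xml.etree.ElementTree as ET


def annotate(text, spans):
    """Wrap each (start, end, tag, attrs) span of text in an XML element.

    Spans are assumed to be sorted and non-overlapping."""
    root = ET.Element("doc")
    cursor, last = 0, None
    for start, end, tag, attrs in spans:
        gap = text[cursor:start]          # untagged text before this span
        if last is None:
            root.text = gap
        else:
            last.tail = gap
        last = ET.SubElement(root, tag, attrs)
        last.text = text[start:end]
        cursor = end
    rest = text[cursor:]                  # untagged text after the last span
    if last is None:
        root.text = rest
    else:
        last.tail = rest
    return ET.tostring(root, encoding="unicode")


text = "I bought a new Prius last April."


def span(phrase, tag, **attrs):
    # Locate the phrase in the text; a stand-in for a real extraction step
    start = text.index(phrase)
    return (start, start + len(phrase), tag, attrs)


annotated = annotate(text, [span("Prius", "product", type="car"),
                            span("last April", "date")])
print(annotated)
```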
Text analytics can automate document processing, for instance, by classifying e-mail and other communications according to category – examples are compliments and complaints, product inquiries, orders, and requests for return authorization – and routing or responding appropriately. This type of automated, embedded analytics is complemented by interactive, exploratory analyses, for instance for market research and product design. Some vendors offer workbench-style interfaces that are designed for specialized tasks such as survey analysis and media monitoring, and some support VOTC work within generic data-mining workbenches.
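A toy version of classify-and-route, assuming simple keyword matching, is sketched below. The keyword sets, the queue names in `ROUTES`, and the functions are invented for illustration; commercial tools typically use richer linguistic and statistical classifiers.

```python
import re

# Illustrative keyword sets for the categories mentioned above
CATEGORY_KEYWORDS = {
    "complaint": {"broken", "disappointed", "unacceptable", "terrible"},
    "compliment": {"thanks", "great", "love", "excellent"},
    "product_inquiry": {"does", "compatible", "specifications", "available"},
    "order": {"order", "purchase", "buy", "quantity"},
    "return_authorization": {"return", "rma", "authorization"},
}

# Hypothetical destination queues
ROUTES = {
    "complaint": "support-escalation",
    "compliment": "customer-relations",
    "product_inquiry": "sales",
    "order": "order-desk",
    "return_authorization": "returns",
    "unclassified": "manual-review",
}


def classify(message: str) -> str:
    """Return the category whose keywords match the most words in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


def route(message: str) -> str:
    return ROUTES[classify(message)]


print(route("Thanks, the new model is excellent!"))
```

Messages that match no keywords fall through to a manual-review queue rather than being forced into a category.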
By extracting tagged entities, concepts and sentiments to databases, users can do their VOTC work with familiar business intelligence, predictive modeling, and visualization tools. In this scenario, we use text analytics essentially to enhance and extend traditional extract, transform, load (ETL) processes that were previously restricted to structured, fielded sources. With text-extracted information sitting in database tables alongside data generated by operational systems, we have the possibility of integrated analysis that enables those elusive 360-degree customer and market views. Integrated VOTC analytics is a route to an important, long-sought goal.
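The integration scenario can be sketched with an in-memory SQLite database: one table holds operational data, another holds text-extracted findings, and an ordinary join brings them together. The table names, columns, and rows are made up for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (customer_id INTEGER, product TEXT, amount REAL);
    CREATE TABLE text_findings (customer_id INTEGER, product TEXT,
                                concept TEXT, sentiment TEXT);
""")

# Operational data, as an order system might record it (invented values)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "Prius", 22000.0), (2, "Prius", 23500.0)],
)

# Findings extracted from text, e.g. e-mail or call-center notes (invented values)
conn.executemany(
    "INSERT INTO text_findings VALUES (?, ?, ?, ?)",
    [(1, "Prius", "overall", "positive"), (2, "Prius", "brakes", "negative")],
)

# Integrated view: each purchase alongside what the customer said about it
rows = conn.execute("""
    SELECT p.customer_id, p.product, p.amount, t.concept, t.sentiment
    FROM purchases AS p
    JOIN text_findings AS t
      ON t.customer_id = p.customer_id AND t.product = p.product
    ORDER BY p.customer_id
""").fetchall()

for row in rows:
    print(row)
```

Once the findings sit in tables like these, any business intelligence or modeling tool that reads the database can use them.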
Voice of the customer approaches have until recently been limited to an assemblage of structured data sources and free-form text analyzed solely with qualitative methods. Text analytics has opened up new vistas on customer information – on individuals and on broad market segments – by enabling combined analyses. The product is enhanced customer and market insights that help organizations act quickly and effectively and retain competitive advantage.
Recent articles by Seth Grimes