Text Analytics Opportunities and Challenges for 2010

Originally published February 8, 2010

My article from a year ago, Perspectives on Text Analytics in 2009, was my second-most-viewed BeyeNETWORK article last year. On the assumption that those readers weren’t all employees of the vendors I quoted, I thought I’d revisit the topic. After all, users benefit by knowing what vendor executives are seeing and planning for the year ahead. Given the nature of the text-analytics market – solutions are not standardized, much less commoditized – customers invest much more than money when they sign with a vendor, and they want assurance they’ll invest well.

No single solution provider dominates text analytics. No single provider dominates any significant text-analytics market segment: not customer experience, media and publishing, financial services and insurance, intelligence, life sciences, or e-discovery. This is good news for current and prospective users. To stay competitive, each provider has to innovate, keeping ahead of evolving customer requirements. Further good news, judging from provider financial results and from my discussions with end users, is that most vendors are on target. They have done a good job tracking and meeting emerging needs. They need to stay on track in 2010.
So I posed to vendor executives the same question as last year, just updated by a year:

What do you see as the 3 (or fewer) most important text-analytics technology, solution, or market challenges [or opportunities] in 2010?

I’ll again relay responses with minimal editing, condensed in just a few cases. Do keep in mind that the vendors quoted doubtless see the greatest opportunities and challenges as residing in areas their solutions address. This focus is completely understandable; it simply reflects what’s most important to them and their companies’ commercial success.

Text Analytics in 2010

Let’s start with Ian Hersey, Global Chief Technology Officer and EVP, Products of the Attensity Group. Ian cofounded Inxight and later joined Business Objects and then SAP in a succession of acquisitions a couple of years back. According to Ian, there are two big opportunity areas that will be front and center in 2010:
  • Moving text analytics from a standalone application into a component of other applications. As with its larger cousin, the traditional BI solution, a standalone platform has a limited set of very specialized users. But there's a need for analytics about textual content to be part of many applications that deal with that content. For example, applications that help automate customer service in the contact center should provide analytics about customer issues back to the business – not only to improve operational efficiency for call center management but also to share more real-time views of product or service issues with the rest of the business.

  • Incorporating social media into text analytics in ways that better benefit businesses. Today we see a number of players out there focused on pure social media aggregation and lightweight analytics (e.g., buzz analysis, media monitoring), but we see customers wanting to incorporate the social media into the same analytical models as they use for their internal data and, more important, plug that social media into business processes. An example again in the area of service is to be able to monitor forums for technical solutions to user issues and to vet and potentially incorporate those solutions into your own web self-service portal.
According to Ian, the biggest challenges in realizing these opportunities include:
  • The need to pre-build "out of the box" analytical models for the various applications. Today, high-end text analytics deployments involve defining both the dimensions you want to use in your analysis and tuning the back-end text analysis componentry in order to achieve high levels of accuracy. Making that more self-service while still achieving trust in the data you create is perhaps the industry's biggest overall challenge.

  • The "language" of social media is different, and often context is fairly limited, which makes deriving value from much of it quite difficult. There is also a LOT of it, so being able to effectively filter the data that's really valuable up front is very important and not easy.
Breck Baldwin of Alias-i is co-author of the LingPipe natural-language processing software. The first of his two points takes us 180 degrees away from the need Ian Hersey sees to “pre-build ‘out of the box’ analytical models” –

We are focusing on customizing text analytics solutions quickly and cheaply for customers. For example, most commercial sentiment classifiers are generic and performance suffers accordingly. We are focused on creating customized models at similar price points with a combination of active learning approaches (this maximizes the value of a human annotator's efforts) and better user interfaces for getting training data.

We have also started to use crowd sourcing as a means to create training data quickly [for use in model building via machine learning]. A nice example of this is our autism project which uses non-specialist volunteers to verify whether a highlighted phrase is a gene or a protein. Have a look at http://bionotate.hms.harvard.edu/autism/index.html. This project combines crowd sourcing with machine learning and supports the high degree of customization necessary for high quality analysis of the autism literature.
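The active-learning loop Breck describes can be sketched roughly as follows: train a classifier on a small labeled seed set, score a pool of unlabeled texts, and route the example the model is least certain about to a human annotator, so each annotation buys maximal information. This is a generic illustration, not LingPipe code; the toy data, the pure-Python Naive Bayes classifier, and the uncertainty-sampling criterion are all my own assumptions about how such a loop might look.

```python
import math
from collections import Counter

def train(labeled):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    counts = {"pos": Counter(), "neg": Counter()}
    docs = Counter()
    for text, label in labeled:
        docs[label] += 1
        counts[label].update(text.lower().split())
    vocab = set(counts["pos"]) | set(counts["neg"])
    return counts, docs, vocab

def prob_pos(model, text):
    """P(pos | text) via Bayes' rule with add-one smoothing."""
    counts, docs, vocab = model
    scores = {}
    for label in ("pos", "neg"):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    # convert log scores to a normalized probability
    m = max(scores.values())
    exp = {lab: math.exp(s - m) for lab, s in scores.items()}
    return exp["pos"] / (exp["pos"] + exp["neg"])

def most_uncertain(model, pool):
    """Uncertainty sampling: pick the text whose P(pos) is closest to 0.5."""
    return min(pool, key=lambda t: abs(prob_pos(model, t) - 0.5))

seed = [("great product love it", "pos"),
        ("terrible broken waste", "neg")]
pool = ["love the great service",                   # clear positive signals
        "shipping was slow but love the product"]   # mixed signals

model = train(seed)
query = most_uncertain(model, pool)  # this text goes to the human annotator
```

In a real deployment the annotator's label for `query` would be appended to the labeled set and the model retrained, repeating until accuracy plateaus.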

Breck’s a great guy for lending Ian Hersey one of his “3 (or fewer)” points (even though Breck didn't know he was doing that). Thanks Breck! One of Ian’s points related to moving text analytics from a standalone application into a solution component, and others of his and Breck’s points focus on adapting the technology for users.

Claire Thomas, text analytics lead at SAP, independently shares these views. Claire sees as notable 2010 text-analytics challenges –
  • Adoption by mainstream business users remains the biggest challenge for our industry. We have made strides in showing our relevance for new user groups, with marketing departments interested in customer feedback analytics, and sentiment analysis being broadly applicable to various industries and initiatives. However, the vast majority of business users still view text analytics as a specialized, emerging technology.

  • One big blocker to mainstream adoption is integrating text analytics into solutions that reach beyond the small number of specialists using pure text analytics applications. This requires text analytics to be accessible for the masses, but that can’t be at the expense of compelling results. That’s not an easy balance to achieve, especially given the common view that modern technology should provide instant value.

  • We still need to get the business world to see textual data as a valuable asset to be managed. The issue is that while most structured data is inherently valuable, with unstructured data, there’s no control on the input, so it’s often hard to tell which part of it is important to store and analyze, not to mention what should be trusted. Additionally, unstructured data requires an extra step for sorting and sifting before any profiling and reporting can be done, posing yet another barrier to mainstream adoption.
Aaron B. Brown’s responses echo Claire’s and others I’ve cited to this point. Perhaps SAP and IBM, Aaron’s employer (IBM Business Analytics & Process Optimization Software), have been passing around a pitcher of mega-vendor Kool-Aid. Aaron wrote me:

Market momentum has turned the spotlight onto three key challenges that face text analytics as it begins to inhabit [a] new mainstream role:
  • Meeting the growing solution need. Text analytics has traditionally found its niche in focused solutions for voice-of-customer insight, intelligence, and, more recently, e-discovery. As these solutions are becoming mainstream, the innovation frontier is moving rapidly outward, and forward-thinking organizations are looking for new text analytics solutions that address a broader set of problems, including healthcare optimization, fraud mitigation, proactive quality management, and more. This explosion of solution demand will challenge text analytics vendors and their technologies, in both the new demands that these applications put on the technology and the new requirements they impose for sophisticated analytical processing and seamless integration with broader information management capabilities (such as BI, BPM, collaboration, master data management, and predictive modeling) – both elements of creating complete solutions that fully address the business need.

  • Quantifying the business case for text analytics. Quantifying the ROI and creating the business case for text analytics projects is still mostly an art – particularly when the buyer knows there is insight trapped in unstructured information, but the exact value of that insight is unknown until well into the execution of the project. Organizations (and vendors) are still learning how to navigate these challenges, and this will remain a fundamental challenge to text analytics growth in 2010.

  • Closing the value loop by operationalizing the insight of text analytics. Historically, text analytics has served one of two usage patterns – exploratory text analytics to discover new insights from poorly understood content sets and operational text analytics to automate routine extraction of text for use in BI and reporting environments. To fully address its potential as a lever for business optimization, text analytics has to evolve to integrate these two patterns and in doing so deliver true closed-loop performance management of content-rich business processes. With exploratory text analytics identifying previously invisible process problems and bottlenecks, and operational text analytics enabling ongoing oversight of those problems through continuous delivery of text-based insight to reporting and predictive modeling tools, organizations can finally put a performance management lens on content-rich processes. Leading-edge organizations are already starting to do this to improve healthcare delivery, reduce fraud and crime, proactively avoid product quality issues, and more; the challenge in 2010 is to make this transformative capability a mainstream reality.

A Focus on Sentiment Analysis

The respondents I’ve quoted so far have repeatedly cited sentiment analysis. You’ll see that text-analytics application recur in additional responses. Next, the response of Olivier Jouve, vice president of corporate development at SPSS, an IBM company. Olivier relates two challenges that IBM and SPSS have anticipated through their solutions:
  • Multi-lingual, multi-source sentiment analysis: Twitter, Facebook and other Web 2.0 media are the new critical sources for marketing. They present a variety of challenges for text analytics: they are voluminous, cryptic, multi-lingual, and deeply interconnected. Text analytics solutions require sophisticated data collection mechanisms, advanced multi-lingual analysis and the infrastructure to manage daily terabytes of data.

  • Associate buzz and sentiment to operational data: For a CMO, extraction of buzz and sentiment has little value if it can't be connected to operational data like marketing spend, basket analysis, shopper cart, or marketing campaigns. Predictive analytics, when associated with text analytics, allows [users] to detect early signals, measure or even anticipate campaign effectiveness, and act upon buzz measurement, for instance, to turn detractors into influencers.
David Seuss, CEO of Northern Light, says that vendors lack imagination, however, in making the case to current and prospective sentiment-analysis users:

The most important challenge for the text analytics industry is developing applications beyond sentiment scoring that have a commercial market. There is little money being earned... in any area other than reputation management based on sentiment scoring. The industry has demonstrated a stunning lack of... new ideas that educate the market as to the potential text analytics has in business research, sales opportunity discovery, competitive intelligence, strategic planning, product development, purchasing evaluations, and technology research.

This is not an easy task, as witnessed by the Financial Times's recent shuttering of the highly innovative Newssift.com [site, a financial news portal featuring faceted navigation and document-level sentiment analysis].

If the text analytics industry is going to get itself out of the sentiment-scoring corner it has painted itself into, compelling applications... have to be developed, deployed, used, and publicized. Here is a fascinating statistic from one of our clients: At one large IT products and services firm, there are fewer than 50 staff members in the analyst relations/marketing communications department that might need sentiment scoring support. This same firm has 70,000 users in marketing, sales, competitive intelligence, strategic planning, product development, and technology research that can all benefit from text analytics applications applied to their needs.

More sentiment analysis – I warned you – but with a search-oriented twist, from Mattias Tyrberg, founder and CEO of Saplo, a Swedish start-up:

I believe that next year will focus on more advanced and personalized real-time filters. The increased amount of information makes it impossible to read everything, but users do not want to miss anything interesting or important. The filters need to be personalized because only then can you trust them and save time by not reading every text.

In the same way, sentiment analysis needs to be developed to be able to adapt to each individual’s (or company’s) opinions. I believe that in 2010 we will be able to use better and more accurate sentiment analysis models that learn and adapt from users.
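The adaptive sentiment models Mattias anticipates might work something like the following sketch: start from a shared, generic word-score lexicon and nudge a per-user copy of the weights whenever the user corrects a prediction. The class, the toy lexicon, and the perceptron-style update rule are hypothetical illustrations of mine, not Saplo's method.

```python
# Toy generic lexicon; a real one would hold thousands of scored terms.
GENERIC_LEXICON = {"great": 1.0, "love": 1.0, "slow": -1.0, "broken": -1.0}

class AdaptiveSentiment:
    def __init__(self, lexicon):
        self.weights = dict(lexicon)  # per-user copy of the shared lexicon
        self.rate = 0.5               # learning rate for corrections

    def score(self, text):
        """Sum of word weights; a score > 0 reads as positive."""
        return sum(self.weights.get(w, 0.0) for w in text.lower().split())

    def feedback(self, text, truth):
        """Perceptron-style update: when the user disagrees with the
        prediction, shift each word's weight toward the user's label
        (truth = +1 for positive, -1 for negative)."""
        predicted = 1 if self.score(text) > 0 else -1
        if predicted != truth:
            for w in text.lower().split():
                self.weights[w] = self.weights.get(w, 0.0) + self.rate * truth

model = AdaptiveSentiment(GENERIC_LEXICON)
# The generic lexicon knows nothing about "sick"; this user means it
# positively, and one correction teaches the personalized model that.
model.feedback("that demo was sick", truth=1)
```

Updating every word in a corrected text is deliberately naive; over many corrections, words that genuinely carry the user's sentiment accumulate consistent weight while incidental words wash out.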

Market Growth

Lexalytics CEO Jeff Catlin affirms other respondents’ themes. He sees search as a particular growth area and makes other points regarding text-analytics market growth:
  • Text analytics [TA] will become a mainstream feature set in enterprise search applications (though not by name). We’ve seen a steady march toward this in 2009, and it’s most notable in how accepted TA features are by the general public. When I’m at a party now and tell someone what we do, they say “Oh yeah, I read something about that sort of stuff last month” as opposed to the “Huh???” that I used to get. The effect of this is that there are a lot more opportunities for TA in enterprise applications, and I suspect it will mean that one or two of the players may get picked up by a big company.

  • Sentiment will complete its transition to a “checklist” feature that everyone who works in this space will have to provide. All of the vendors (big and small) will claim to have sentiment. The consumers of this technology will also get a bit more educated – we’re seeing this in RFP requests for particular capabilities of sentiment – which will help separate the wheat from the chaff. Unfortunately for us, sentiment won’t be a totally differentiating feature that you can hang a business on anymore, as there will be lots of competition on the sentiment front.

  • The [differentiation between] larger TA players and the niche players will become even more obvious. The bigger players will integrate a number of useful and useable semantic features into their engines which will help with things like ad hoc classification, concept roll-up, and relationship [extraction].

  • On the business side, we expect 2010 to be a “Home Run” year for all the TA vendors with growth rates of 75% to 200% not out of the norm. This is partly due to the mainstreaming of the technology, which is opening up a lot of additional verticals.

Technical Matters

Of all my respondents, only Manya Mayes, SAS chief text mining strategist, chose to respond with technical challenges. I’m going to conclude that Manya is quite confident about SAS’s ability to compete, hence her concern with further improving solution capabilities. According to Manya:
  • Handling data (or at least text) volumes is increasingly important. Semi-supervised learning could mean that rather than letting a model become dated and having to retrain it from scratch, new data could be added incrementally to the training data without requiring an entire rerun.

  • Text mining/text analytics will become critical enabling technologies to solutions such as risk management and customer intelligence.

  • Text cleansing is more important than ever with Web data being so messy (acronyms, synonyms, misspellings etc.).
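The kind of cleansing Manya mentions can be sketched as a normalization pass that runs before any analysis: lowercase the text, strip URLs, collapse exaggerated character repeats, and map known acronyms and misspellings to canonical forms. The lookup tables and rules below are toy illustrations of mine; production systems would draw on much larger resources.

```python
import re

# Hypothetical toy lookup tables for illustration only.
ACRONYMS = {"imo": "in my opinion", "gr8": "great"}
MISSPELLINGS = {"recieve": "receive", "teh": "the"}

def cleanse(text):
    """Normalize messy Web text: lowercase, drop URLs, collapse
    character repeats (soooo -> soo), fix misspellings, expand acronyms."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # collapse 3+ repeats to 2
    out = []
    for tok in re.findall(r"[a-z0-9']+", text):
        tok = MISSPELLINGS.get(tok, tok)        # fix known misspellings
        tok = ACRONYMS.get(tok, tok)            # expand known acronyms
        out.append(tok)
    return " ".join(out)

cleansed = cleanse("Teh product is gr8 IMO!!! http://x.co/abc sooooo happy")
```

Even this crude pass turns the sample into "the product is great in my opinion soo happy" – tokens a downstream profiling or sentiment step can actually match against.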

An Ecosystem

Finally, responses from Sid Banerjee, CEO of Clarabridge, who relayed his thoughts and then expanded on them a bit in his blog. Sid’s third point, about a partner ecosystem, is particularly interesting and matches something I’ve observed myself in the market. Sid’s responses:
  • Social media analysis will move from "fad" to "function": Sentiment analysis is key, and it's finally good enough to find real insights... Social media [provides] a critical early-warning (before a customer buys) and customer-support (when a customer is having issues) indicator of customer experience… There are stakeholders in the Fortune 1000 now, with budget and interest, that didn't exist 2 years ago.

  • Enterprise expansion of text analytics will drive… application requirements: Systems will scale to support multi-terabyte solutions, enterprise security, [and] authentication, shielding [personal] data… Capital expenditure budgets will grow and more customers will begin deploying in-house text analytics solutions (though SaaS deployments will also continue to grow).

  • Text Analytics will FINALLY be embraced by a rich partner ecosystem in 2010. In 2009 there was some partner activity (text analytics vendors partnering with marketing services providers, database platform vendors selling text analytics solutions), but partnership activity did not drive high revenue growth for either side... We expect to see strong and productive partnerships developing among text analytics vendors and data warehousing, CRM, and call center [vendors] and systems integrators.

Learning More

I’ll conclude with a suggestion for how you can learn more, whether you’re a new or experienced user or still evaluating solutions, whether your role is in business or in information technology. For the broad set of text-analytics technologies and solutions, there’s no better venue than the Text Analytics Summit. This year’s summit, the 6th Boston summit, is slated for May 25-26. I expect to reprise my role as chair.

And given intense interest in sentiment analysis – in opinions, attitudes, mood, and other subjective information in social media, news, and enterprise feedback – I’ve organized a new conference, the 2010 Sentiment Analysis Symposium, April 13 in New York. The symposium will have a practical, solutions focus, bridging technology and business concerns.

I hope to see many readers at these events. Whether you can make it or not, if you’re a current or prospective text-analytics user and would like to tell me about challenges (and opportunities) you face or expect to face – about your text-analytics plans or experiences – please do get in touch (grimes@altaplana.com or 301-270-0795).


  • Seth Grimes

    Seth is a business intelligence and decision systems expert. He is founding chair of the Text Analytics Summit and the Sentiment Analysis Symposium, and principal consultant at Washington, D.C.-based Alta Plana Corporation. Seth consults, writes, and speaks on information-systems strategy, data management and analysis systems, IT industry trends, and emerging analytical technologies.

    Editor’s Note: More articles and resources are available in Seth's BeyeNETWORK Expert Channel.
