I recently received an inquiry from a student at a European management school who is writing a thesis about the relationship between search technology and business intelligence. She sees the two technologies as having a meeting point at text analytics and asked to pose a few questions on the topic. Many folks share her interest so my BeyeNETWORK blog seemed like a great place to share my responses. Here goes!
Management student> I have been struggling to differentiate some terms and to understand them more clearly, so my questions relate to that confusion. I would also like to hear your opinion on these two technologies (BI as software and enterprise search) and their uses of text analytics.
MS> What is the difference between text analytics and text mining? Is it related to structured vs. unstructured data? Or is text mining a subset of text analytics?
Seth> There isn't a significant difference. I find that the term "text mining" is used in fields that have applied the technology longer and that also apply data mining. Examples include life sciences and intelligence (e.g., counter-terrorism). The term "text analytics" is more often used in business.
MS> Is content analysis the same as text analysis (if we look at textual documents, not rich data)?
Seth> To me, "content" generally indicates managed information that is typically found in a repository and that is often published on the Web. In this sense, e-mail and IM messages, survey responses, contact center notes and transcripts, and other forms of text generated during business operations are not content. It follows that content analysis that concerns text is a subset of text analysis.
But "content" does also cover video, audio, and other media, as you note. Content analysis would include these forms, whereas text analysis wouldn't, beyond work with textual tags.
MS> Is there a difference between text analytics done by search technology and BI applications?
Seth> Text analytics that backs up search is meant to support information retrieval: indexing, summarizing, and ranking documents in response to a search query. TA enables semantic indexing by topics and themes and relationships in order to go beyond indexing based solely on keywords. TA in support of search can also enable smarter, and natural-language, query processing. The example I'll give is that you can enter "map oslo" in Google and get a map of Oslo, because Google is doing a combination of named entity recognition for the geographic area, Oslo, and pattern matching that understands that "map
TA in BI (outside use of search for BI) is different. A complete definition of BI includes treatment of information in textual and other forms, in databases, repositories, and on the Web. Search is a BI tool, and so is information extraction (a text analytics technique; information = entities, facts, topics, themes, etc.) into structured databases -- some see IE from text as equivalent to ETL for traditional databases -- and so is analysis, in the sense of data mining, of text-extracted information. So when, for instance, you visualize a relationship network that includes people, companies, etc., based on text-extracted named entities and links (relationships, events, etc.), that's TA at work for BI.
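To make the IE-as-ETL idea concrete, here is a minimal Python sketch. It is an illustration under loud assumptions, not how any particular product works: the GAZETTEER lookup stands in for a real named entity recognizer, the sample documents and names are invented, and "two entities appear in the same document" stands in for real relationship extraction.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer (an assumption for this sketch); a real system
# would use a trained named entity recognizer instead of a fixed list.
GAZETTEER = {
    "Acme Corp": "COMPANY",
    "Globex": "COMPANY",
    "Jane Doe": "PERSON",
    "John Smith": "PERSON",
}


def extract_entities(text):
    """Return sorted (name, type) pairs for gazetteer entries found in text."""
    found = []
    for name, etype in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, etype))
    return sorted(found)


def build_links(documents):
    """Count how often each pair of entities appears in the same document."""
    links = Counter()
    for doc in documents:
        names = [name for name, _ in extract_entities(doc)]
        for a, b in combinations(names, 2):
            links[(a, b)] += 1
    return links


# Invented sample text, standing in for e-mail, notes, or news items.
documents = [
    "Jane Doe joined Acme Corp as chief analyst.",
    "Acme Corp and Globex announced a partnership.",
    "Jane Doe met John Smith of Globex.",
]

# "ETL for text": one structured row per entity mention, ready for a database.
entity_rows = [
    (doc_id, name, etype)
    for doc_id, doc in enumerate(documents)
    for name, etype in extract_entities(doc)
]

# Edges of a simple relationship network, ready for visualization.
links = build_links(documents)

for row in entity_rows:
    print(row)
for pair, count in sorted(links.items()):
    print(pair, count)
```

The entity rows are the kind of thing you would load into a table, and the link counts are the edges you would hand to a network visualization tool.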
MS> Which fields use text analytics the most? (Any industries in particular?)
Seth> Life sciences and intelligence (including counter-terrorism) were the earliest use cases with serious work going back to the late '90s and they're still very strong domains for TA. But now we're seeing use in a spectrum of business applications as well.
Let me refer you, for this question and the next, to a report I recently published, which you can download for free at http://altaplana.com/TA2009 .
MS> How would you describe the text analytics market?
[Seth> In my paper, I estimate the diversified, global 2008 market for text-analytics software and vendor-provided professional services at $350 million, representing 40% growth from 2007. I foresee sustained growth rates of up to 25% for 2009.]
MS> There is a lot of talk about eDiscovery, where text analytics plays a crucial role, but it is also one of the main markets for search technology. Are these two technologies (is it OK to call text analytics a technology?) coming together?
Seth> I believe that in e-discovery, the principal application of TA is (still) in support of search in the sense that I wrote about above, creating richer indexes that allow legal researchers (litigants) to respond faster and more comprehensively to discovery mandates. TA is only starting to be used by legal professionals for investigatory purposes, for what you could call "making the case." Compliance and fraud investigations, and risk management, are starting points in this type of use. But I don't think the technology is being used systematically by litigators yet. I do think we'll see a lot more of this investigatory type of use.
I hope you've found our Q&A useful! As always, if you have questions or comments, do get in touch.
Posted July 30, 2009 1:13 PM