Blog: Seth Grimes

Seth Grimes

Welcome to my BeyeNETWORK Blog, which will focus on text analytics and other matters related to making sense of unstructured information sources in support of better enterprise decision making.

About the author

Seth is a business intelligence and decision systems expert. He is founding chair of the Text Analytics Summit, chair of the Sentiment Analysis Symposium, and principal consultant at Washington, D.C.-based Alta Plana Corporation. Seth consults, writes, and speaks on information-systems strategy, data management and analysis systems, IT industry trends, and emerging analytical technologies.

Editor’s Note: More articles and resources are available in Seth's BeyeNETWORK Expert Channel. Be sure to visit today!

March 2009 Archives

Welcome to the first monthly text-analytics update. This and subsequent updates will cover software, market trends, conferences, and whatever else is new that will help readers better understand advances in Knowledge Discovery in Text.

Software news

The Nature Publishing Group announced January 27 that it is no longer actively pursuing the Open Text Mining Interface (OTMI), which had aimed to enable scholarly publishers, among others, to disclose their full text for indexing and text-mining purposes. Timo Hannay, publishing director at Nature.com, says "if interest returns then we're open to picking up OTMI again. And if anyone else should want to take it forward then we would be delighted, though I haven't yet heard of anyone wanting to do that." Send inquiries to otmi@nature.com.

NLTK 0.9.8 has been released, an updated version of the Python open-source Natural Language Toolkit with "a new off-the-shelf tokenizer, POS tagger, and named-entity tagger. A new metrics package includes inter-annotator agreement scores and various distance and word association measures. There's a new collocations package. There are many improvements to the WordNet package and browser and to the semantics and inference packages. The NLTK corpus collection now includes the PE08 Parser Evaluation data, and the CoNLL 2007 Basque and Catalan Dependency Treebanks. We have added an interface for dependency treebanks. Many chapters of the book have been revised in response to feedback from readers. For full details see the ChangeLog."

SAS has released SAS Content Categorization, based on text technologies from Teragram, which SAS acquired in 2008. According to SAS, the software applies natural language processing and advanced linguistic techniques to automatically categorize large volumes of multilingual content that is acquired, generated, or exists in a repository. It correctly parses and analyzes content for entities and events, which are then used to create metadata and trigger business processes.

Company news

Infonic, a UK text-analytics and document-management software publisher, merged with US sentiment-analysis specialist Lexalytics on December 1 and was subsequently declared insolvent on February 3. VC firm Lake House Capital bought Infonic out of administration on February 10. Lexalytics has continued operating independently, without disruption, in the interim. (Visit http://intelligententerprise.com/blog/archives/2009/02/infonic_reloade.html for a fuller examination of the story.)

Conferences

The fifth annual Text Analytics Summit has been announced for June 1-2, 2009, in Boston. I will reprise my role as chair of the summit, which is targeted to practitioners, users, solution providers, researchers, and industry observers.

SIGIR 2009, the annual conference of the Association for Computing Machinery's Special Interest Group on Information Retrieval, will convene July 19-23, 2009, in Boston. SIGIR focuses on all aspects of information storage, retrieval and dissemination, including research strategies, output schemes and system evaluations. The conference's Industry Track aims to bridge the gap between research and practice across a broad spectrum of topics in information retrieval.

The sixth International Workshop on Text-based Information Retrieval (TIR) will be held in conjunction with DEXA 2009 in Linz, Austria, August 31-September 4, 2009. The call for papers is open.

The 2009 NooJ conference and workshop is slated for June 8-10, 2009, in Tozeur, Tunisia. NooJ is a freeware linguistic-engineering development environment used to formalize various types of textual phenomena using a large gamut of computational devices, from Finite-State Automata to Augmented Recursive Transition Networks. Paper abstracts may be submitted through March 15.


Posted March 4, 2009 6:49 PM
I've been privileged to "staff" a BeyeNETWORK expert channel, covering Text Analytics, for over a year now.  I hope the articles published to my channel have been helpful in educating readers on text technologies and how to use them.

The addition of a blog to my channel will allow me to post more frequently and flexibly, complementing my more formal channel articles. After this welcome note, I'll start with the first of a monthly series of Text Analytics Updates. The updates will cover new and noteworthy items from the text-analytics universe: product announcements, conferences, notable publications, company news, and so on. I will include brief analyses as appropriate. Consider them a joint production with data-mining newsletter KDnuggets, whose editor, Gregory Piatetsky-Shapiro, asked me to compile them.

I plan to post other, original items to my blog here.  So thanks to the BeyeNETWORK folks for getting me set up, and stay tuned...

Posted March 4, 2009 10:13 AM