
Understanding Legal Information and E-Discovery

Originally published March 18, 2008

The December 2006 amendments to the U.S. Federal Rules of Civil Procedure (FRCP) are the most recent in a series of compliance mandates that IT and business alike must observe. The FRCP govern court procedures in civil lawsuits. What differentiates the FRCP from earlier mandates such as 2002’s Sarbanes-Oxley Act is the huge volume of textual information involved. (Other nations have similar legal and financial control mandates.) The ingredients include lots of text, exacting procedures, tight time frames and significant cost pressures. The result is widespread legal-sector adoption of text technologies to facilitate litigation-related processes: the collection, management, review and delivery of electronically stored information (ESI) that is or may be pertinent to lawsuits.

Information technology (IT) plays an important role in the legal sector just as in other industries. There are typical operational functions such as time accounting and billing, and also litigation-related case management. It’s in management of evidentiary materials for lawsuits and of legal information – searchable legal code, case law and directories – that text technologies can really shine.

The Legal Information Market

Legal-information providers such as Thomson and Lexis-Nexis use text technologies to support information extraction and advanced search functions. Thomson Vice President Peter Jackson describes interesting text-analytics applications at his firm that started a number of years ago with an initiative to address “the people side of the law,” which Jackson characterizes as important but (formerly) neglected. Thomson’s PeopleCite project identified references to people in case records, disambiguating names using the context and attempting to link them to external records. Thomson would focus especially on attorneys and judges and link identified names to the West Legal Directory and then back to other cases – a two-way street between case law and the directories. Thomson followed this work by looking for other people of interest such as expert witnesses by reviewing years of jury verdicts. Most expert witnesses have professional qualifications and licenses that can be tracked through public records, allowing Thomson to create enriched databases by joining text-extracted information with records in conventional database systems.

Thomson and other legal-information providers have only expanded their reliance on text analytics in recent years. The primary legal-sector application for the technology, outside of creating legal-information databases, is in management of evidentiary materials.

The E-Discovery Mandate

Electronic evidentiary materials may take a broad variety of forms: e-mail, instant-messaging traffic, telephone logs, transactional data in operational systems, depositions, sound and video recordings, and all manner of corporate communications, documents and records. (We won’t concern ourselves with physical objects.) Relevant information is overwhelmingly textual with a smaller amount of relevant information captured in audio form and a very small amount in video or other forms.

The treatment of these materials is highly formalized due to admissibility standards formulated in the course of hundreds of years of legal practice. The process whereby parties to a lawsuit request and provide documents and information that may be pertinent in litigation is called discovery. The management of discovery has been transformed by the computerization of evidentiary materials, that is, by the growth of electronically stored information (ESI) that may be pertinent in litigation, whether that information originates in electronic form (e.g., e-mail) or is scanned from paper and possibly transformed via optical character recognition (OCR). The result is termed e-discovery. While solutions responding to this shift focus on the U.S. market, similar trends are evident worldwide.

According to e-discovery consultant Tom Lidbury of law firm Mayer Brown, speaking at the February LegalTech conference, “…clients are adopting technology rapidly to manage all these processes.” Forrester estimates:

E-discovery technology spending will grow from $1.4 billion in 2006 to more than $4.8 billion in 2011 as enterprises realize that they have no choice but to prepare for electronic discovery. Amendments to the Federal Rules of Civil Procedure (FRCP) taking effect on December 1, 2006, will drive short-term growth for reactive e-discovery solutions, while the desire to wrap e-discovery into broader retention management strategies will drive significant market growth for years to come.

Software and service vendors and legal-sector organizations responded to then-pending FRCP-amendment e-discovery mandates by creating a framework for e-discovery solutions. From edrm.net:

Launched in May 2005, the Electronic Discovery Reference Model (EDRM) Project was created to address the lack of standards and guidelines in the electronic discovery market … a major concern for vendors and consumers alike. The completed reference model provides a common, flexible and extensible framework for the development, selection, evaluation and use of electronic discovery products and services.


The vast majority of attention in legal-sector IT is currently on e-discovery solutions, principally in the EDRM range from Records Management to Review. The greatest concern by far relates to management of e-mail, which is estimated to comprise 90% of discoverable ESI. Solutions that conform to EDRM are closely tied to formal discovery procedures, capturing specialized needs such as “legal hold” and concepts such as “custodian.”

Solutions for Records Management, Identification, Preservation and Collection processes – the left half of the EDRM picture – center on policy-driven content acquisition, management and stewardship.

Processing-Review-Analysis in the EDRM is retrieval, transformation and evaluation of individual documents or sets of documents to a) deduplicate and otherwise remove unneeded documents from consideration; b) convert documents to tractable format(s); c) determine the relevance of documents by considering metainformation such as date, custodian and content; and d) evaluate “a collection of electronic discovery materials to determine relevant summary information, such as key topics of the case, important people, specific vocabulary and jargon, and important individual documents.”
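The deduplication step in (a) can be illustrated with a minimal sketch. It assumes exact-duplicate detection by hashing whitespace-normalized, lowercased text; the document fields (`id`, `custodian`, `text`) and function names are hypothetical, and commercial products may rely on different techniques such as near-duplicate detection.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Hash lowercased, whitespace-normalized text so trivially different copies match."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first document seen for each fingerprint, preserving input order."""
    seen = set()
    unique = []
    for doc in documents:
        fingerprint = content_fingerprint(doc["text"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc)
    return unique


# Hypothetical sample records; "b" differs from "a" only in case and spacing.
docs = [
    {"id": "a", "custodian": "smith", "text": "Please review the attached contract."},
    {"id": "b", "custodian": "jones", "text": "please review  the attached contract."},
    {"id": "c", "custodian": "smith", "text": "Lunch on Friday?"},
]
unique_docs = deduplicate(docs)
```

Keeping the first copy is a simplification: in practice a reviewer may need to know every custodian who held a duplicate, so a fuller version would record the dropped copies' metadata rather than discard it.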

Lastly, Production as used in the EDRM is a technical term for the delivery of discovered material to counsel, and Presentation relates to form.

Moving Beyond Legal Review to Investigation

Reviewers typically interact only with metadata that describes document properties. Analytical support may include very limited entity extraction, in particular, of e-mail header information, typically using regular-expression pattern matching, and of topic(s) using statistical methods. A number of products will cluster documents according to identified topics, but these products do not provide capabilities such as term reduction – that is, the ability to decide by automated or manual means that a set of multiple inferred topic terms should be grouped and treated as a single term – because that function is not part of the conventional legal workflow.
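As a rough illustration of regular-expression extraction of e-mail header information, the following sketch pulls a few header fields and the addresses they contain from a raw message. The patterns and function names are assumptions made for this example, not any vendor's implementation; the sketch ignores folded (multi-line) headers and uses a deliberately simple address pattern.

```python
import re

# Match a handful of common header lines; [ \t]* avoids consuming newlines.
HEADER_PATTERN = re.compile(
    r"^(From|To|Cc|Date|Subject):[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)
# Simplified address pattern, adequate for illustration only.
ADDRESS_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_headers(raw_message: str) -> dict:
    """Return selected header fields, looking only above the first blank line."""
    header_block = raw_message.split("\n\n", 1)[0]
    return {
        name.lower(): value.strip()
        for name, value in HEADER_PATTERN.findall(header_block)
    }


def extract_addresses(header_value: str) -> list:
    """Return the e-mail addresses found in one header value, in order."""
    return ADDRESS_PATTERN.findall(header_value)


# Hypothetical sample message.
sample = (
    "From: Alice Smith <alice@example.com>\n"
    "To: bob@example.com, carol@example.org\n"
    "Date: Mon, 4 Dec 2006 09:15:00 -0500\n"
    "Subject: Contract draft\n"
    "\n"
    "Body text that mentions To: nobody@example.net\n"
)
headers = extract_headers(sample)
recipients = extract_addresses(headers["to"])
```

Restricting the search to the text above the first blank line keeps header-like strings in the message body from being mistaken for metadata.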

Several commercial review solutions allow the user to visualize document sets, rendering, for example, the flow of e-mail messages via a social-network display with time-based selection controls. Many users are still struggling to comply with e-discovery collection and archival mandates, so most are content with limited text-analytics capabilities. Nonetheless, a number of forward-looking vendors and their clients are looking beyond e-discovery to investigative capabilities that can help “make the case.”
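The data behind an e-mail flow display with time-based selection might be prepared along the following lines. This is a sketch under stated assumptions: the message fields (`from`, `to`, `sent`) and the `email_flow` function are hypothetical, and it produces only sender-to-recipient message counts for a chosen time window, leaving the drawing of the network to a visualization tool.

```python
from collections import Counter
from datetime import datetime


def email_flow(messages, start, end):
    """Count messages per (sender, recipient) pair sent within [start, end]."""
    edges = Counter()
    for msg in messages:
        if start <= msg["sent"] <= end:
            for recipient in msg["to"]:
                edges[(msg["from"], recipient)] += 1
    return edges


# Hypothetical sample messages.
messages = [
    {"from": "alice", "to": ["bob", "carol"], "sent": datetime(2006, 12, 4, 9, 15)},
    {"from": "alice", "to": ["bob"], "sent": datetime(2006, 12, 5, 14, 0)},
    {"from": "bob", "to": ["alice"], "sent": datetime(2007, 1, 10, 8, 30)},
]

# Select only December 2006 traffic.
december = email_flow(messages, datetime(2006, 12, 1), datetime(2006, 12, 31, 23, 59, 59))
```

Each key in the result is a directed edge and each value an edge weight, which is the form most graph-drawing tools expect as input.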

Text analytics can support legal analyses by discerning important entities, relationships and sentiments in textual sources by extracting important features, and by supporting classification and analysis of documents and extracted information.

Ian Black, head of global operations at analytics vendor Autonomy, cites the ability of his company’s tools to look at “information as a pattern,” whether based in text, rich media, voice or video, as providing powerful infrastructure that, when married to legal-sector needs, allows customers to undertake important investigatory functions. He cites as an example a joint New York Stock Exchange-SEC initiative, and also the ability to empower early case assessment, which can play a role in development of legal strategy.

His colleague Deborah Baron, director of corporate development for Autonomy’s Zantaz subsidiary, provides another example – a global enterprise facing a multibillion dollar loss that used her company’s software for rapid fact assessment in a fashion that ultimately protected the enterprise’s reputation.

Other text-technology vendors in the legal space include Attenex, EED, MetaLINCS (recently acquired by Seagate), Recommind, Stratify and Zylab. A number of vendors license linguistic, search, clustering and visualization modules from companies that include dtSearch, Engenium, Inxight (now part of Business Objects), ISYS and Vivisimo. The article “Q&A: IBM's Aaron Brown on Text Analytics for Legal Compliance” provides an in-depth look at IBM’s legal-sector strategy.

David Bayer, vice president of marketing at Stratify, says that his company, with a knowledge-management heritage focusing on intelligence, publishing, and oil and gas, first introduced information-extraction capabilities almost four years ago, but that only in the last year have judges and attorneys been interested in the ability to mine entities such as personal and corporate names from documents. Like the Autonomy representatives, he cites case assessments – he focuses on time and cost factors – as a promising area of application, and he further sees usage branching out into applications in adjacent, highly regulated areas such as financial services and health care.

Bayer says that Stratify, now a subsidiary of archival-services firm Iron Mountain, sees application of text analytics to e-discovery as the first in a series of value-added services. He sees an “interesting transition from thinking of stored information as being dead to thinking of it as being active.” Clients whose initial thought was compliance are asking a new question, “How do we make this information actionable?” The answer is text analytics, capabilities that will allow users to move beyond e-discovery compliance to business-value discovery.

  • Seth Grimes

    Seth is a business intelligence and decision systems expert. He is founding chair of the Text Analytics Summit and principal consultant at Washington, D.C., based Alta Plana Corporation. Seth consults, writes, and speaks on information-systems strategy, data management and analysis systems, IT industry trends, and emerging analytical technologies. Seth chairs the Sentiment Analysis Symposium and the Text Analytics Summit.

    Editor’s Note: More articles and resources are available in Seth's BeyeNETWORK Expert Channel. Be sure to visit today!
