Utilizing Unstructured Data to Provide Business Intelligence

Envisioning Business Intelligence as More Than Reporting Dashboards and Enterprise Content Management as More Than a Bucket of Stuff

Originally published June 2, 2008

The management of structured information has evolved over a relatively long history. Part of this evolution was the development of increasingly sophisticated review, roll-up and analysis capabilities that are generally categorized as business intelligence (BI). BI capabilities run the gamut from simple reporting to sophisticated multidimensional analysis. BI packages often contain services for next-best-action selection as well as presentation services for roll-up dashboards and at-a-glance reviews. Historically, though, business intelligence has been applied primarily to structured and semi-structured data sets.

That is changing, because the wealth of information in unstructured data sources such as documents, web pages and emails is simply too great to ignore. Uncovering that information for BI analysis has been challenging, but emerging solutions enable BI engines to use semi-structured and unstructured information. Unstructured data can now be analyzed to provide vital context that enriches traditional BI results.

Many organizations already have the ingredients for a robust enterprise information management (EIM) system, one that combines business intelligence with structured data stores and unstructured enterprise content management (ECM) repositories. Most modern enterprise application systems (e.g., CRM, ERP, SFA) are built on a service-oriented architecture (SOA), which means that BI capabilities can be invoked and consumed independently of the application that hosts them. ECM systems, meanwhile, are increasingly common infrastructure assets within an organization. But capability does not imply desirability: both tactical utility and strategic vision need to be taken into account, and the decision calculus for tactical utility is very different from that for strategic vision.

Tactical Utility

Tactically speaking, there are two important permutations of the BI-ECM combination. The first is exposing ECM transactions to traditional BI engines. The second is BI-enabling ECM systems and content-enabled vertical applications (CEVAs). Several compelling scenarios emerge when we look at CEVAs and how business intelligence can be applied to them. These scenarios not only fit vendor models and the governing BI strategy, but also deliver tactical business value to the customer. After all, it is very hard to find a compelling reason to BI-enable a repository that has no direct business value.

The key to exposing ECM transactions to traditional BI engines is to accept that ECM systems are more than sophisticated “buckets of stuff.” Modern ECM systems are both highly structured and transactional and, therefore, obvious candidates for business intelligence. They employ sophisticated taxonomic and metadata schemas, and they track and store what happens to each content item, from versioning to access requests to a host of specialized content transactions. These records are typically kept in a database, so this data about the content can be consumed by BI services. The difference lies in how the data is interpreted. For example, BI services can query for the total number of download requests for all white papers about a particular topic. But the appropriate intelligence systems must be in place to understand how white paper downloads map to sales leads and then to closed opportunities. To do this, the combined BI and ECM system (or EIM system) must also understand:

  • What a white paper is – this information is available from ECM metadata classification

  • What the download request is – this information is available from ECM service tracking

  • That the download requester is not someone from your company – this information is available from the web content management components of the ECM system

  • That the download requester is from an organization you are trying to sell to – this information is available from either the sales force automation (SFA), web content management or CRM systems

  • How successful you are at selling into that organization – this information is available from CRM and SFA systems
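The five questions above amount to a join across systems. The following is a minimal sketch of that join, assuming hypothetical record shapes: `Download`, `ContentMeta`, `INTERNAL_DOMAIN` and the `org_win_rates` mapping are illustrative stand-ins for ECM tracking, ECM metadata, web content management user data and CRM/SFA win rates, not the schema of any real product.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record shapes; field names are illustrative, not from any product.
@dataclass(frozen=True)
class Download:
    content_id: str        # ECM item that was requested (ECM service tracking)
    requester_email: str   # captured by the web content management layer

@dataclass(frozen=True)
class ContentMeta:
    content_type: str      # from ECM metadata classification, e.g. "white_paper"
    topic: str

INTERNAL_DOMAIN = "example.com"  # assumption: your own company's email domain

def white_paper_report(downloads, metadata, org_win_rates):
    """Per topic: count of qualifying downloads and mean win rate of requesting orgs.

    org_win_rates maps a prospect's email domain to its historical win rate
    (a stand-in for CRM/SFA data); domains absent from it are not sales targets.
    """
    counts = defaultdict(int)
    win_totals = defaultdict(float)
    for d in downloads:
        meta = metadata.get(d.content_id)
        if meta is None or meta.content_type != "white_paper":
            continue  # not a white paper
        domain = d.requester_email.rsplit("@", 1)[-1].lower()
        if domain == INTERNAL_DOMAIN:
            continue  # internal requester, not a lead
        if domain not in org_win_rates:
            continue  # not an organization we are selling to
        counts[meta.topic] += 1
        win_totals[meta.topic] += org_win_rates[domain]
    return {t: (counts[t], win_totals[t] / counts[t]) for t in counts}
```

Each `continue` corresponds to one of the filters in the list; in a real EIM deployment those lookups would be service calls into the respective systems rather than in-memory dictionaries.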

The transactional nature of ECM systems is rich with information that is relevant to the business context in which critical decisions are made. This alphabet soup of systems and programs leads inevitably to CEVAs.

Once the transactional aspects of ECM systems are recognized, BI engines can be applied to the business scenarios that matter most. At one end of the spectrum are simple imaging applications, which have a fairly simple one-to-one relationship between the document and the transaction. Much more common, and more interesting, is the case or event scenario, in which the content transaction has a life cycle all its own.

Cases, the collections of data and content related to a unique subject, are often driven by a process. But many of these governing processes only move the case and its aggregated content items from state to state (e.g., create case, add supporting materials, review, approve, close, archive). Rarely is there one end-to-end process that encompasses the states the case moves through, handles the ways in which disparate content items relate to the case, and recognizes the transactions that occur on individual items and on the case as a whole. When ECM systems are the de facto content store beneath a case management system, and when information, metadata and transaction data are housed in heterogeneous locations but associated with a case (i.e., the data is reused rather than re-created, which is very good ECM practice), BI systems are the only practical way to gather intelligence about what is going on inside such CEVAs.
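To make the limitation concrete, here is a minimal sketch of the kind of state-to-state governing process described above. The state names and the transition table are assumptions drawn from the example states in the text; the `REVIEW` to `GATHERING` "send back" transition is a hypothetical addition. Note what the case object does not know: it holds references to ECM items but records nothing about the transactions performed on them.

```python
from enum import Enum, auto

class CaseState(Enum):
    CREATED = auto()
    GATHERING = auto()  # adding supporting materials
    REVIEW = auto()
    APPROVED = auto()
    CLOSED = auto()
    ARCHIVED = auto()

# Assumed transition table; REVIEW -> GATHERING is an illustrative "send back" step.
ALLOWED = {
    CaseState.CREATED: {CaseState.GATHERING},
    CaseState.GATHERING: {CaseState.REVIEW},
    CaseState.REVIEW: {CaseState.APPROVED, CaseState.GATHERING},
    CaseState.APPROVED: {CaseState.CLOSED},
    CaseState.CLOSED: {CaseState.ARCHIVED},
    CaseState.ARCHIVED: set(),
}

class Case:
    def __init__(self, case_id):
        self.case_id = case_id
        self.state = CaseState.CREATED
        self.content_ids = set()  # references to items kept in the ECM repository
        self.history = [CaseState.CREATED]

    def attach(self, content_id):
        # Reuse existing ECM items by reference rather than re-creating them.
        if self.state is not CaseState.GATHERING:
            raise ValueError(f"cannot attach content in state {self.state.name}")
        self.content_ids.add(content_id)

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} is not allowed")
        self.state = new_state
        self.history.append(new_state)
```

The process only guarantees legal state changes; views, edits and check-ins of `doc-17` happen in the ECM system and are invisible here, which is the gap that cross-system BI analysis fills.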

To the detriment of the business, BI systems are rarely brought to bear on these CEVAs. Instead, first-generation reporting tools and basic database queries are the norm. These tools cannot span the cross-system context of CEVAs and so cannot bring interesting trends and analyses to light. By contrast, when BI and ECM systems are combined into EIM systems, the ECM transactions that occur within the application scenario are selected, analyzed and composited with application transactions. The result is a rich BI analysis that gives a far more comprehensive picture of what is going on than either a simple reporting tool or an application-centric BI tool could provide alone.
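Compositing ECM transactions with application transactions can be sketched as merging event streams by case ID into a single time-ordered timeline and then computing measures that neither system could produce alone. The event tuple layout `(case_id, timestamp, system, action)` and the sample data are hypothetical; a real EIM system would pull these events from each system's transaction store.

```python
from collections import defaultdict
from datetime import datetime

def composite_timeline(*sources):
    """Merge event streams keyed by case ID into one time-ordered timeline per case."""
    timeline = defaultdict(list)
    for source in sources:
        for case_id, ts, system, action in source:
            timeline[case_id].append((ts, system, action))
    for events in timeline.values():
        events.sort()  # tuples sort by timestamp first
    return dict(timeline)

def case_metrics(timeline):
    """Simple cross-system measures per case: elapsed days and actions per system."""
    metrics = {}
    for case_id, events in timeline.items():
        per_system = defaultdict(int)
        for _, system, _ in events:
            per_system[system] += 1
        metrics[case_id] = {
            "elapsed_days": (events[-1][0] - events[0][0]).days,
            "actions": dict(per_system),
        }
    return metrics

# Illustrative sample data: one case spanning the application and the ECM repository.
app_events = [("C-1", datetime(2008, 5, 1), "app", "create"),
              ("C-1", datetime(2008, 5, 6), "app", "close")]
ecm_events = [("C-1", datetime(2008, 5, 2), "ecm", "view doc-17"),
              ("C-1", datetime(2008, 5, 3), "ecm", "check in doc-18")]
```

Calling `case_metrics(composite_timeline(app_events, ecm_events))` reports five elapsed days with two application actions and two ECM actions for case `C-1`; a reporting tool pointed at only one of the two stores would see half of that activity.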

Each enterprise application has a type and a defined structure. Defined structures have predictable needs and markers against which progress, success and effectiveness can be measured programmatically. As soon as we think of ECM systems as intrinsic to information-handling applications, business intelligence becomes an obvious necessity. Furthermore, a pervasive BI perspective requires that all sources of BI data be included: when vast amounts of information about the business are held in ECM repositories, those systems must be part of BI calculations to give a comprehensive picture of the business and enable effective BI-driven decisions.

ECM also supports BI. BI systems not only use information from around the organization to provide an actionable view of the business, but also generate a great deal of information of their own. That output should be treated like other sensitive and important corporate information: secured, persisted and protected in the ECM system. There it can benefit from rich ECM services and can be transformed, parsed, mashed up and re-presented in appropriately secured contexts and channels.

Internal corporate sales portals benefit from published EIM reports that show the sales conversion rate for prospective clients who have read relevant white papers. Such information (if the white papers do positively affect sales) would show the corporate sales team that white paper collateral is important for closing deals, and the team could then make more use of that collateral to drive more revenue. The resulting activity feeds back into the system, creating a feedback loop that yields more granular and more thoroughly confirmed conversion measurements and gives the sales team sharper strategies for sealing the deal.
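The headline number in such a report can be sketched as two conversion rates: one for prospects who read a relevant white paper and one for those who did not. The `outcomes` mapping (CRM/SFA deal results) and the `readers` set (ECM download data) are hypothetical inputs. A gap between the two rates shows association only; whether the white papers actually cause the difference is the "if" in the paragraph above.

```python
def conversion_rates(outcomes, readers):
    """Return (rate among readers, rate among non-readers).

    outcomes: hypothetical mapping of prospect -> True if the deal closed (CRM/SFA data).
    readers:  set of prospects who downloaded a relevant white paper (ECM data).
    A rate is None when its group is empty.
    """
    def rate(group):
        # True counts as 1 and False as 0, so the sum is the number of closed deals.
        return sum(outcomes[p] for p in group) / len(group) if group else None

    read = [p for p in outcomes if p in readers]
    unread = [p for p in outcomes if p not in readers]
    return rate(read), rate(unread)
```

Re-running this as new deals close is the feedback loop described above: each cycle adds observations to both groups and tightens the comparison.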

Making ECM an integral part of the BI system lets BI draw on ECM's content-handling capabilities, which lowers the total cost of ownership and improves the capabilities of the overall BI system. Infrastructure managers and IT staff will immediately see the relevance to storage management and budget directives.

Strategic Vision

At its core, the strategic vision of EIM systems is to move from the highly accurate but retrospective, descriptive analyses of present-day BI systems to highly accurate, persuasive and forward-looking predictive systems. Such systems go beyond today's hyper-accurate, granular online analytical processing (OLAP) reports and dashboards. They begin to make semantic inferences based on organically and automatically generated thesauri and ontologies (hierarchies of meaning that pertain to a given topic), and those inferences drive real-world, automated decisions that affect the bottom line.

Real-time decisioning (RTD) is a BI technology that tracks user behavior and determines what information is most likely to benefit, persuade or be useful to the user. Consider this example. A user searches Google for “snibbling griblious” and selects the top result, which links to your organization’s web page. The page is not a simple, static HTML brochure; it is built by the web content management portions of your corporate ECM system, which is part of your larger EIM infrastructure and therefore BI-enabled. As the ECM system starts building the page, it sends the referring URL, search string and cookie ID to the RTD service of your EIM infrastructure. The ECM system then reports the user’s page visits to the RTD service. Whenever it constructs a page that includes implicit personalization, marketing or a test, the ECM system asks the RTD service to select specific content.

In this case, the ECM system asks the RTD service to select an appropriate, persuasive banner advertisement for this user. To determine which content is eligible for each decision, the RTD service consults the ECM system directly and caches the results. As the ECM system builds the landing page, the RTD service therefore retrieves all candidate banner ad IDs for the home page from its memory cache, so performance is fast. Based on previous BI analysis, the RTD service computes the likelihood of click-through for each banner ad and selects the most appropriate one. The user views the landing page, is intrigued by the banner ad and clicks it. The click is immediately reported back to the RTD service, which stores it in its own schema for use in further RTD analyses.
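The select-then-learn loop in this example can be sketched as a toy service. Everything here is hypothetical: `ToyRTDService` is not any product's API, the user segment is taken to be the search string, and the likelihood of click-through is estimated with a simple smoothed click/impression ratio rather than whatever model a real RTD engine uses.

```python
class ToyRTDService:
    """Hypothetical stand-in for an RTD service; not any product's API."""

    def __init__(self, prior_clicks=1, prior_views=2):
        self.candidates = {}  # placement -> banner IDs cached from the ECM system
        self.stats = {}       # (segment, banner ID) -> [clicks, impressions]
        self.prior_clicks = prior_clicks
        self.prior_views = prior_views

    def cache_candidates(self, placement, banner_ids):
        # Eligible content is fetched from ECM ahead of time and kept in memory.
        self.candidates[placement] = list(banner_ids)

    def _stats(self, segment, banner_id):
        return self.stats.setdefault((segment, banner_id), [0, 0])

    def click_estimate(self, segment, banner_id):
        clicks, views = self._stats(segment, banner_id)
        # Smoothed rate, so banners with no history are not scored as zero.
        return (clicks + self.prior_clicks) / (views + self.prior_views)

    def select(self, placement, segment):
        # Greedy choice: highest estimated click-through; ties go to the first candidate.
        best = max(self.candidates[placement],
                   key=lambda b: self.click_estimate(segment, b))
        self._stats(segment, best)[1] += 1  # record the impression
        return best

    def record_click(self, segment, banner_id):
        # Feedback from the page: the user clicked this banner.
        self._stats(segment, banner_id)[0] += 1
```

A usage pass: cache `["ad-a", "ad-b"]` for `"home_banner"`, call `select("home_banner", "snibbling griblious")` while building the page, and call `record_click` when the user clicks. The greedy rule is chosen for brevity; a production system would likely also need some exploration so that a banner with an unlucky start is not starved of impressions.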

Advanced BI analysis is performed in real time to generate persuasive, personalized results for the user automatically. In addition, every action the user takes is registered with the RTD service, which allows the EIM system to learn what is persuasive to different users. This is one example of a system that can be put in place today using capabilities that already exist in the BI and ECM systems of most organizations.

Compositing disparate systems into an EIM framework is important if we want a robust view of information and of the processes that intersect the data and transition its states. Imagine bringing the power of BI analysis to unstructured ECM information. Most ECM systems have full-text indexing built in or available to them, and that index is a pre-aggregated set of unstructured data that is ripe for BI-oriented text mining. At the very least, such text mining can produce a set of topical thesauri that represent the conceptual topography of the enterprise. Those conceptual maps become indispensable inputs to additional, more traditional BI analyses: they become the baselines and benchmarks against which other analyses are measured, compared or referenced.

Within such a mapped conceptual topography, a judge might check whether a sentence is consistent with published sentencing guidelines (structured data) as well as with decisions in previous cases. The decision is full-text indexed, and the BI engine then analyzes the index against the EIM conceptual topography to find cases with similar characteristics. The results are predictions and percentages of consistency in concepts, keywords and emphasis. Based on the results, the losing lawyer can decide whether to pursue an appeal, which is a good bet if the ruling scores poorly on precedent compatibility. All this is available from a fairly simple thesaurus-style analysis of the conceptual topography. This is descriptive BI evolving into a predictive EIM system, and it is, quite simply, what businesses need in order to become proactive rather than reactive organizations.
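One simple way to produce a "percentage of consistency" like the one in the sentencing example is to compare term-frequency vectors with cosine similarity. This is a swapped-in, deliberately crude technique (no stemming, no stop words, no concept weighting), and the function names are hypothetical; it only illustrates how an indexed decision can be scored against prior cases.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Bag-of-words term frequencies, a crude stand-in for a full-text index entry."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_similar_cases(decision, prior_cases, k=2):
    """Return the k prior cases closest to the decision as (score, case_id) pairs."""
    vec = term_vector(decision)
    scored = [(cosine(vec, term_vector(text)), case_id)
              for case_id, text in prior_cases.items()]
    scored.sort(reverse=True)
    return scored[:k]
```

A score of 0.75 against a prior case reads as 75 percent overlap in weighted vocabulary; the richer analysis the text describes would compare concepts from the conceptual topography rather than raw words.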

A key capability that is still needed is parsing a morass of unstructured data and relating its concepts to one another. Structured metadata and transactional data about content in ECM systems can already be leveraged by BI systems. What is often overlooked is that the information contained in the content itself can provide important context for traditional BI analyses. The challenge until now has been to uncover the information in a document, email or video. However, the transformation and full-text indexing capabilities of ECM systems provide the view into the content that BI engines need to begin the interpretive process. Transformation and templating capabilities can turn an unstructured memo into a semi-structured XML document, and full-text indexing brings the words, phrases and terms into a semi-structured format. A BI engine can then build a concept map from term frequencies, term proximities and term usage. When that map is combined with metadata classifications and transactional data, an organic classification structure begins to surface. Mapped against topics, groups or classification phyla, specific semantic ontologies emerge, and these organic ontologies can then serve as the basis for BI inferences that produce predictions for users.
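The term-proximity step can be sketched as counting how often two terms occur within a few tokens of each other across documents and keeping the frequent pairs as weighted edges of a concept map. The stop-word list, `window` and `min_count` values are illustrative assumptions; this produces only the raw co-occurrence graph, not the ontologies that the text says emerge once metadata and transactions are layered on.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}  # illustrative

def tokens(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def concept_map(documents, window=2, min_count=2):
    """Weighted term pairs that occur at most `window` tokens apart."""
    pairs = Counter()
    for doc in documents:
        toks = tokens(doc)
        for i, left in enumerate(toks):
            for right in toks[i + 1 : i + 1 + window]:
                if left != right:
                    pairs[tuple(sorted((left, right)))] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}

def related(cmap, term):
    """Neighbours of a term in the concept map, strongest first."""
    neighbours = [(b if a == term else a, n) for (a, b), n in cmap.items() if term in (a, b)]
    return sorted(neighbours, key=lambda p: (-p[1], p[0]))
```

Fed with documents such as "Contract renewal terms for the supplier contract" and "Supplier contract renewal was approved", `related(cmap, "contract")` links `contract` to `renewal`, `supplier` and `terms`, a first, very small piece of the organic classification structure described above.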

Looking Forward

Challenges still exist in combining BI and ECM capabilities into true EIM systems. The importance of the immediate and the tactical will continue to trump the more deliberate pace of strategic implementation. Yet with a strategic vision firmly in place, progress is inevitable. As tactical needs breed strategic opportunities, and as strategic pilots prove themselves in tactical situations, the blending of tactical BI capabilities into a strategic EIM framework will emerge. It is simply a matter of time and vision.

  • Billy Cripe
    Billy has been working with enterprise content management systems for more than 8 years as a customer, implementer, solution developer and now director of product management for Oracle. He has a passion for Web 2.0 and semantic technologies within the enterprise and is writing a book on the topic titled, Reshaping Your Business with Web 2.0, due out later this year.
  • Nick Tuson
    Nick has spent much of the last 13 years developing enterprise content management (ECM) software and working closely with global organizations to improve their information management strategies. Nick is currently a Senior Director on Oracle's Business Intelligence (BI) product management team where he is helping customers to understand the natural synergy between BI and ECM, BPM and SOA.


 
