Using Unstructured Business Content in Business Intelligence

Originally published April 25, 2007

Although estimates vary of how much unstructured and semi-structured business content exists in organizations, there is no question that it far exceeds the amount of structured business data managed by today’s data warehousing systems. To date, there has been limited interest in the use of business content in business intelligence (BI) and data warehousing, but this is beginning to change. The rapid growth in business content generated by web and collaborative applications, coupled with technology improvements in areas such as search and text mining, has created a situation where business content can now be used to extend the analytical power and decision support capabilities of business intelligence.

In this article, I look at the different types of business content that exist in organizations and their value in the decision-making process. I also review different approaches to accessing and analyzing this content in a BI environment. For simplicity, I use the term business content to refer to unstructured and semi-structured business information, and the term business data to refer to structured business data.

Types of Business Content
There are five main types of business process in IT systems: business transaction, master data, business intelligence, business planning, and business collaboration. Most business data is created and managed by business transaction and master data processes. Business content, on the other hand, is created and used by all five types of business process.

Unstructured and semi-structured information varies widely in both format and content. Unstructured information, for example, includes rich media files containing audio and video, and text files containing electronic forms, reports, and web pages. All of these file types may contain useful business information. Audio recordings of conversations between customers and support center staff provide valuable insight into the efficiency of support staff and into customers’ views on products and services. Similarly, electronic forms used by support staff may also contain information about customer attitudes and viewpoints. Product review web logs (blogs) offer valuable feedback on the acceptance of new products in the market, whereas product and services websites contain competitive pricing on everything from books and DVDs to airline fares.

A review of semi-structured information shows that a high percentage of this type of information is in an XML format. Tags in XML files provide some semantic information about the contents of the file. There is also an increasing number of industry XML vocabularies, or metamodels, that add further semantics to XML documents. One example is XBRL, which is used for reporting financial information. XML is becoming the standard approach for exchanging information between systems and between companies.
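To make the point about XML tags concrete, here is a minimal Python sketch of how tag names and attributes can be read as lightweight semantics. The document, element names, and attribute names below are invented for illustration; they are not taken from XBRL or any other industry vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured document. The element and attribute names
# are made up for this example and do not follow XBRL or any real vocabulary.
SAMPLE = """
<financialReport company="Example Corp" period="2006">
  <revenue currency="USD">1200000</revenue>
  <netIncome currency="USD">150000</netIncome>
  <note>Revenue growth was driven by new product lines.</note>
</financialReport>
"""


def extract_facts(xml_text):
    """Return (tag, attributes, text) for each child element of the root."""
    root = ET.fromstring(xml_text.strip())
    return [
        (child.tag, dict(child.attrib), (child.text or "").strip())
        for child in root
    ]


facts = extract_facts(SAMPLE)
for tag, attrs, text in facts:
    print(tag, attrs, text)
```

The tag name tells a program what each value means, and the attribute adds context such as the currency. The free-text note, however, still needs text analysis before it can be used analytically.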

Examples of applications that can analyze unstructured and semi-structured content and thus enhance BI processing include customer and market intelligence, pricing optimization, customer sentiment and complaint analysis, product safety and quality analysis, regulatory compliance, legal discovery, fraud detection, financial analysis, and IP protection.  

Processing Business Content
Information in business content can add value to the existing business data used by business intelligence applications and their underlying data warehouses. In some cases, the business content may be converted into structured business data and loaded into a data warehouse, while in other situations, the business content may be accessed dynamically and used in conjunction with the results obtained from BI processing.
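The first case, converting business content into structured business data and loading it alongside existing data, might be sketched as follows. This is only an illustration under stated assumptions: the support-form text, the keyword lists, the table names, and the crude scoring rule are all hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import re
import sqlite3

# Hypothetical support-form entries (business content).
FORMS = [
    "Customer 101: product arrived damaged, very unhappy with delivery.",
    "Customer 102: great service, happy with the quick response.",
    "Customer 101: replacement received, happy now.",
]

# Assumed keyword lists; a real system would use proper text mining.
NEGATIVE = {"damaged", "unhappy", "broken", "late"}
POSITIVE = {"great", "happy", "quick"}


def to_row(text):
    """Transform one free-text form into a structured (customer_id, score) row."""
    match = re.match(r"Customer (\d+):", text)
    customer_id = int(match.group(1))
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return customer_id, score


# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, annual_revenue REAL)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(101, 5000.0), (102, 12000.0)])
conn.execute("CREATE TABLE form_sentiment (customer_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO form_sentiment VALUES (?, ?)", [to_row(f) for f in FORMS]
)

# Combine structured business data with the derived content scores.
rows = conn.execute(
    "SELECT c.customer_id, c.annual_revenue, SUM(s.score) "
    "FROM customer c JOIN form_sentiment s ON s.customer_id = c.customer_id "
    "GROUP BY c.customer_id, c.annual_revenue ORDER BY c.customer_id"
).fetchall()
print(rows)
```

Once the derived scores sit in a table, they can be joined and aggregated like any other business data, which is the main attraction of the conversion approach.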

There are six main approaches to accessing and using business content in a business intelligence environment:

  • BI search where search technology is used to give users access to BI business content such as metadata, queries and analyses, and BI output such as reports and metrics. BI search can also supply connectors to a BI system for use by enterprise search. BI search is often provided as a component of a BI portal.

  • Enterprise search for locating and accessing corporate business content (including BI content) and business data. Compared with Internet search approaches, enterprise search adds techniques such as guided navigation, semantic search, and result clustering. Enterprise search is usually used in conjunction with an enterprise portal.

  • Business content analytics that are created by accessing and analyzing the contents of a search repository or the results of search operations. Often these analytics are delivered to business users in the form of a BI dashboard, or through a BI portal. 

  • Business content exploration that extracts metadata (facts, concepts, relationships) from business content. This extracted metadata may be used to build a business taxonomy, or by business content categorization tools, enterprise search tools, business content analytical applications, and data integration applications that capture and transform business content for loading into a data warehouse.

  • Business content federation techniques that use federated database queries to access and analyze business content and business data that may be maintained in multiple files and databases.

  • Business content integration applications that capture, transform, and integrate business content into the BI and data warehousing environment.

These six approaches are illustrated in Figure 1. (The difference between managed content and unmanaged content in the diagram is that managed content is usually maintained by a content management system and subject to governance procedures, whereas unmanaged content exists in standalone files and databases.) With all six approaches, it is important to understand the capture, transformation, and delivery techniques that are supported and used.

Figure 1
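As a rough illustration of the business content exploration approach listed above, the following Python sketch extracts simple concept metadata from a piece of text by matching it against a small taxonomy. The taxonomy, its terms, and the sample review are assumptions made for this example; commercial exploration tools rely on far richer linguistic and statistical techniques.

```python
import re
from collections import Counter

# Hypothetical taxonomy mapping each concept to the terms that signal it.
TAXONOMY = {
    "pricing": {"price", "expensive", "cheap", "discount"},
    "quality": {"broken", "defect", "sturdy", "quality"},
    "support": {"support", "helpdesk", "response"},
}


def extract_concepts(text):
    """Count how many words in the text match each concept's terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for word in words:
        for concept, terms in TAXONOMY.items():
            if word in terms:
                counts[concept] += 1
    return dict(counts)


review = "Good quality but too expensive; support response was slow."
metadata = extract_concepts(review)
print(metadata)
```

Metadata of this kind could then feed categorization, search, or data integration tools, as described in the list of approaches.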

The Impact of Business Content on Business Intelligence
Business content will have an increasing effect on business intelligence processing over the next few years. It is essential that business intelligence and data warehousing designers, architects, and specialists thoroughly understand how to use this type of information and exploit the significant benefits it offers to the business. It will also become increasingly important for BI staff to work closely with their counterparts who are responsible for building the systems that manage and process business content.

 

  • Colin White

    Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor to several leading print- and web-based industry journals. For ten years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.

    Editor's Note: More articles and resources are available in Colin's BeyeNETWORK Expert Channel.
