Originally published April 25, 2007
Estimates differ on how much unstructured and semi-structured business content exists in organizations, but there is no question that it far exceeds the amount of structured business data managed by today’s data warehousing systems. To date, there has been limited interest in using business content in business intelligence (BI) and data warehousing, but this is beginning to change. The rapid growth in business content generated by web and collaborative applications, coupled with technology improvements in areas such as search and text mining, has created a situation where business content can now be used to extend the analytical power and decision support capabilities of business intelligence.
In this article, I look at the different types of business content that exist in organizations and their value in the decision-making process. I also review different approaches to accessing and analyzing this content in a BI environment. For simplicity, I use the term business content to refer to unstructured and semi-structured business information, and the term business data to refer to structured business data.
Types of Business Content
There are five main types of business process in IT systems: business transaction, master data, business intelligence, business planning, and business collaboration. Most business data is created and managed by business transaction and master data processes. Business content, on the other hand, is created and used by all five types of business process.
Unstructured and semi-structured information varies widely in both format and content. Unstructured information, for example, includes rich media files containing audio and video, and text files containing electronic forms, reports, and web pages. All of these file types may contain useful business information. Audio recordings of conversations between customers and support center staff provide valuable insight into the efficiency of support staff and into customers’ views of products and services. Similarly, electronic forms used by support staff may also contain information about customer attitudes and viewpoints. Product review web logs (blogs) offer valuable feedback on the acceptance of new products in the market, whereas product and services websites contain competitive pricing on everything from books and DVDs to airline fares.
A review of semi-structured information shows that a high percentage of it is in XML format. Tags in XML files provide some semantic information about the contents of the file. There is also an increasing number of industry XML vocabularies, or metamodels, that add further semantics to XML documents. An example is XBRL for reporting financial information. XML is becoming the standard approach for exchanging information between systems and between companies.
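To make the point about XML tags concrete, here is a minimal Python sketch of how tag names can be read as field names when turning a semi-structured document into a flat record. The sample document, its element names, and the complaint_to_record helper are all hypothetical and are not taken from XBRL or any other real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured document. The element names are illustrative
# assumptions, not part of any industry XML vocabulary.
SAMPLE = """
<complaint id="c-101">
  <customer>Acme Corp</customer>
  <product>Router X</product>
  <received>2007-03-14</received>
  <text>The unit restarts every few hours.</text>
</complaint>
"""


def complaint_to_record(xml_text):
    """Flatten one tagged document into a dict keyed by tag name."""
    root = ET.fromstring(xml_text.strip())
    record = {"id": root.get("id")}
    for child in root:
        # The tag supplies the meaning of the value it wraps.
        record[child.tag] = (child.text or "").strip()
    return record


record = complaint_to_record(SAMPLE)
print(record)
```

The customer, product, and received fields map directly to columns of structured business data, while the free-form text element would still need search or text mining to be analyzed.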
Examples of applications that can analyze unstructured and semi-structured content and thus enhance BI processing include customer and market intelligence, pricing optimization, customer sentiment and complaint analysis, product safety and quality analysis, regulatory compliance, legal discovery, fraud detection, financial analysis, and IP protection.
Processing Business Content
Information in business content can add value to the existing business data used by business intelligence applications and their underlying data warehouses. In some cases, the business content may be converted into structured business data and loaded into a data warehouse, while in other situations, the business content may be accessed dynamically and used in conjunction with the results obtained from BI processing.
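The first of these options, converting business content into structured data and loading it into a warehouse, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the support notes, the keyword list, the support_note table, and both helper functions are hypothetical, an in-memory SQLite database stands in for a data warehouse, and simple keyword matching stands in for real text mining.

```python
import re
import sqlite3

# Hypothetical free-text support notes (the "business content").
NOTES = [
    "Ticket 1001: customer unhappy with Router X, keeps rebooting.",
    "Ticket 1002: customer praised Router X setup wizard.",
    "Ticket 1003: Switch Y arrived damaged, refund requested.",
]

# Assumed keyword list; a real system would use a text mining tool instead.
NEGATIVE_WORDS = {"unhappy", "damaged", "refund", "rebooting"}


def note_to_row(note):
    """Convert one free-text note into a (ticket_id, body, negative) row."""
    match = re.match(r"Ticket (\d+): (.*)", note)
    ticket_id = int(match.group(1))
    body = match.group(2)
    words = set(re.findall(r"[a-z]+", body.lower()))
    negative = int(bool(words & NEGATIVE_WORDS))
    return ticket_id, body, negative


def load_notes(notes):
    """Load the converted rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE support_note ("
        "ticket_id INTEGER PRIMARY KEY, body TEXT, negative INTEGER)"
    )
    conn.executemany(
        "INSERT INTO support_note VALUES (?, ?, ?)",
        [note_to_row(n) for n in notes],
    )
    conn.commit()
    return conn


conn = load_notes(NOTES)
negative_count = conn.execute(
    "SELECT COUNT(*) FROM support_note WHERE negative = 1"
).fetchone()[0]
print(negative_count)
```

Once the notes are rows in a table, they can be queried and joined with existing business data like any other structured source.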
There are six main approaches to accessing and using business content in a business intelligence environment:
These six approaches are illustrated in Figure 1. (The difference between managed content and unmanaged content in the diagram is that managed content is usually maintained by a content management system and subject to governance procedures, whereas unmanaged content exists in standalone files and databases.) With all six approaches, it is important to understand the capture, transformation, and delivery techniques that are supported and used.
The Impact of Business Content on Business Intelligence
Business content will have an increasing effect on business intelligence processing over the next few years, and it is essential that business intelligence and data warehousing designers, architects, and specialists thoroughly understand how to use this type of information and exploit the significant benefits it offers to the business. It will also become increasingly important for BI staff to work closely with their counterparts who are responsible for building the systems that manage and process business content.
Recent articles by Colin White