Search is attracting growing interest in the business intelligence (BI) space, and several people have asked about the difference between querying data (using a traditional database query language) and browsing data using a search tool, and which approach to use when. This blog entry is an attempt to put a stake in the ground and to encourage a discussion on this topic.
Let's first discuss traditional database queries. In the BI environment, a BI tool or application issues a query against a database system. The query is in a formalized database language such as SQL (note that S stands for structured). A GUI may be used to hide the complexities of SQL. The results of the query can be browsed, reported on, analyzed, etc. The emphasis of this style of processing is on the analysis of structured data. It is not designed for the ad hoc browsing of information from a bunch of unrelated data sources.
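To make the database query style concrete, here is a minimal sketch in Python using the standard sqlite3 module. The sales table, its columns, and the sample rows are invented for illustration; a real BI tool would generate similar SQL behind its GUI and run it against a production database.

```python
import sqlite3


def regional_sales_totals(rows):
    """Load (region, amount) rows into an in-memory table and total them by region."""
    conn = sqlite3.connect(":memory:")
    # The table structure must be known up front: this is the "structured" part.
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data, not taken from any real system.
sample_rows = [("East", 100.0), ("West", 250.0), ("East", 50.0)]
totals = regional_sales_totals(sample_rows)
for region, total in totals:
    print(region, total)
```

The query only works because the table and column names are known in advance, which is the point made above about structured data.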
Typically database query processing is formalized and standardized both in terms of the query language itself and the database structures accessed. There are natural language interfaces for doing this type of processing, but they simply convert the natural language requests into database language statements.
With database query processing you need some knowledge (i.e., metadata) about the structure of the data before you can access it. There has been a move to relax this strictly structured approach by adding database structures and language support for accessing and analyzing semi-structured data such as XML.
When it comes to unstructured data there are two options. The first is to transform all or some of the unstructured data into a structured or semi-structured format, and store the transformed data in a database system (together with any remaining unstructured data that has not been converted). The transformed data can be associated with existing structured data in the database. This transformed data can then be processed by database queries, and any associated unstructured data retrieved as a part of the result set.
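The first option can be sketched roughly as follows. This is only an illustration under simple assumptions: the notes, the orders table, and the regular expression for spotting an order number are all hypothetical, and real transformation tools are far more sophisticated than a single pattern match.

```python
import re
import sqlite3

# Assumed convention for this sketch: notes mention orders as "order #1234".
ORDER_PATTERN = re.compile(r"order\s+#?(\d+)", re.IGNORECASE)


def extract_order_id(text):
    """Pull an order number out of free text, or return None if there is none."""
    match = ORDER_PATTERN.search(text)
    return int(match.group(1)) if match else None


def load_notes(conn, notes):
    """Store each note with its extracted order_id next to the original raw text."""
    conn.execute("CREATE TABLE notes (order_id INTEGER, raw_text TEXT)")
    conn.executemany(
        "INSERT INTO notes VALUES (?, ?)",
        [(extract_order_id(note), note) for note in notes],
    )


conn = sqlite3.connect(":memory:")
# Existing structured data (hypothetical).
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1001, "Acme"), (1002, "Globex")]
)
load_notes(
    conn,
    [
        "Customer called about order #1001, package arrived damaged.",
        "General question about store hours.",
    ],
)
# A database query joins the transformed data to the structured data and
# returns the associated unstructured text as part of the result set.
matched = conn.execute(
    "SELECT o.customer, n.raw_text "
    "FROM orders o JOIN notes n ON n.order_id = o.order_id"
).fetchall()
print(matched)
```

The note that could not be tied to an order stays in the notes table with an empty order_id, which corresponds to the remaining unstructured data that has not been converted.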
A second option for unstructured data is to access it directly using a search tool. With a search tool information is accessed using search queries. As with database languages, these queries can be generated from a GUI. The results from search queries can be browsed to find the data of interest. The search results can also be passed to an analysis tool for further processing.
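As a rough sketch of the second option, the Python below builds a tiny keyword index over a few freeform documents and answers a search query by returning the documents that contain every query term. The documents and function names are made up for this example, and commercial search tools add ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


# Hypothetical unstructured content: no schema is needed before indexing it.
documents = {
    "memo": "Quarterly sales fell in the West region",
    "email": "West region manager asks about sales targets",
    "faq": "How to reset your password",
}
index = build_index(documents)
hits = search(index, "west sales")
print(hits)
```

Note that nothing had to be known about the structure of the documents before they could be searched, and the list of hits could be browsed or handed to an analysis tool.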
Search languages are designed for accessing freeform unstructured data. Of course, search queries can also be run against structured data. Search query languages are less complex than database query languages because, unlike languages such as SQL, they are not designed for complex data retrieval and analysis.
To improve the accuracy of search queries, metadata can be extracted from the unstructured data using utilities supplied with the search tool, or using taxonomy and information exploration tools from third-party vendors. The better the metadata, the more accurate the search results are likely to be. The metadata adds semantic meaning to the unstructured data. In some cases, the metadata can be used to build faceted or taxonomy-driven search interfaces that use the metadata to filter the search results. The metadata can also be used to aid in the transformation of unstructured data into a semi-structured or structured format. It is these types of capabilities that separate enterprise search tools from internet search tools.
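A faceted interface of the kind described above can be sketched in a few lines of Python. The search results and the metadata fields (department, doc_type) are hypothetical, and the sketch assumes the metadata has already been extracted and attached to each result.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of a metadata field."""
    return Counter(doc[field] for doc in results if field in doc)


def filter_by_facet(results, field, value):
    """Keep only the results whose metadata field equals the chosen value."""
    return [doc for doc in results if doc.get(field) == value]


# Hypothetical search results with extracted metadata attached.
results = [
    {"title": "Q3 sales memo", "department": "Sales", "doc_type": "memo"},
    {"title": "Pricing email thread", "department": "Sales", "doc_type": "email"},
    {"title": "Hiring plan", "department": "HR", "doc_type": "memo"},
    {"title": "Untitled scan"},  # no metadata could be extracted
]
by_department = facet_counts(results, "department")
sales_only = filter_by_facet(results, "department", "Sales")
print(dict(by_department))
print([doc["title"] for doc in sales_only])
```

The counts would drive the facet list shown to the user, and choosing a facet value narrows the results. The result with no metadata shows why better metadata tends to give better search results: it cannot be reached through any facet.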
The bottom line is that the database query and the search query approaches are starting to come together. However, search queries are designed for the browsing of less formalized and unstructured information, whereas database queries are intended for the analysis of structured and semi-structured data. As discussed, unstructured information can be transformed into a semi-structured format, and search results can be further analyzed by analysis tools. Both approaches use some form of query language.
Posted September 12, 2007 11:38 AM