Using Vocabularies to Improve Findability

Originally published August 14, 2008

Search has become the norm for finding information, yet it is far more complex than simply taking a word and finding all of its occurrences. Word indexing and keyword search alone are no longer sufficient for today’s sophisticated analyst. Buyers/customers/users expect relevant and useful results from a search, yet a high percentage of searches fail to return the answers they expect. Let’s look at some reasons why everyday searches are less effective than they could be:

  • Meaning of search terms can be specific to an industry. “Media” can refer to public relations or optical storage devices, which, in turn, may be categorized as DVDs or digital versatile disks.

  • Commonly used search terms, particularly acronyms, can have multiple meanings. “GPS” may mean gallons per second or global positioning system. The searcher entering “aids” may retrieve results for hearing aids, teaching aids or acquired immune deficiency syndrome.

  • Meaning varies by context. A yoga bag for exercise equipment has a different purpose than a purse, yet both are considered to be a “bag.” “Boxer” can be a dog, a fighter, underwear or a politician.

  • Information is indexed under one search term, but a searcher using a perfectly reasonable synonym can get zero results. "Central air conditioner" doesn’t return the same results as "airconditioner," particularly when the space is omitted. "Back pack" with a space is usually interpreted as an implied OR, with different results than "backpack" without the space.

  • Search terms vary by language and by regional dialect. Native Spanish speakers may not know colloquial English. Even native speakers of American English use different terms for common consumer items. "Soft drink," "soda" and "pop" are regional variants for carbonated beverages. A common vocabulary containing all these synonyms improves relevancy of search results.

  • Searchers can start with one search term and then discover they need to browse narrower or broader terms to find related information. Industrial equipment is complex. Items such as pressure valves have many different shapes, sizes and functions that depend on the industry. Golf can include golf instruction, golf balls, golf clubs and golf clothes.

Natural language is imprecise, unlike programming languages. Using a structured vocabulary can significantly improve findability, particularly in large, complex databases.

WAND Inc. developed such a structured vocabulary in response to its own business needs, and then repackaged that expertise for its clients. The core dictionary of search terms was developed for the consumer electronics industry. The company has now expanded to provide comprehensive taxonomies covering all products and all services in multiple industries.

The power of WAND vocabularies comes from using a numeric coding system for each search term. Descriptions for individual search terms are then translated into eleven different languages (both Asian and European) by experts in those languages, trained linguists and lexicographers.

This numeric coding scheme has a thesaurus structure to show synonyms (related terms), as well as broader and narrower terms. Controlled vocabularies are used to describe attributes for search terms, which can be utilized for faceted search.
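As a rough illustration of the structure described above, the following Python sketch models a vocabulary entry keyed by a numeric code, with per-language labels, synonyms, broader and narrower links, and controlled attribute values. It is not WAND's actual data model: the class name `Concept`, the helper functions, and every code and term in `VOCAB` are hypothetical, invented only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One vocabulary entry, identified by a language-neutral numeric code."""
    code: int
    labels: dict[str, str]  # language code -> preferred term
    synonyms: dict[str, list[str]] = field(default_factory=dict)  # language -> synonyms
    broader: list[int] = field(default_factory=list)   # codes of broader concepts
    narrower: list[int] = field(default_factory=list)  # codes of narrower concepts
    attributes: dict[str, list[str]] = field(default_factory=dict)  # facet -> allowed values


# Hypothetical codes and terms, for illustration only.
VOCAB = {
    1000: Concept(1000, {"en": "golf equipment", "es": "equipo de golf"},
                  narrower=[1001, 1002]),
    1001: Concept(1001, {"en": "golf ball", "es": "pelota de golf"},
                  broader=[1000], attributes={"color": ["white", "yellow"]}),
    1002: Concept(1002, {"en": "golf club", "es": "palo de golf"},
                  broader=[1000]),
    2000: Concept(2000, {"en": "soft drink", "es": "refresco"},
                  synonyms={"en": ["soda", "pop"]}),
}


def normalize(term: str) -> str:
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def build_lookup(vocab: dict[int, Concept]) -> dict[str, set[int]]:
    """Map every label and synonym, in every language, to its concept codes."""
    lookup: dict[str, set[int]] = {}
    for concept in vocab.values():
        terms = list(concept.labels.values())
        for synonym_list in concept.synonyms.values():
            terms.extend(synonym_list)
        for term in terms:
            lookup.setdefault(normalize(term), set()).add(concept.code)
    return lookup


LOOKUP = build_lookup(VOCAB)


def codes_for(term: str) -> set[int]:
    """Return the numeric codes a search term resolves to (empty if unknown)."""
    return LOOKUP.get(normalize(term), set())


def narrower_labels(code: int, lang: str = "en") -> list[str]:
    """Return the preferred labels of a concept's narrower terms in one language."""
    return [VOCAB[child].labels[lang] for child in VOCAB[code].narrower]
```

Because every term resolves to a numeric code, `codes_for("pop")` and `codes_for("refresco")` land on the same entry, and `narrower_labels(1000, "es")` lists the narrower golf terms in Spanish without any separate translation step at query time.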

Search results can be improved by applying this platform-independent approach at different levels of data management:

  • Adding search terms as metadata during the data collection process, using multiple terms and attributes. Using controlled vocabularies can significantly improve data quality and consistency.

  • Incorporating the thesaurus into the physical indexes during the crawling process. Numeric codes are more efficient and language-independent.

  • Providing smart query processing to utilize industry-specific and language-specific synonyms, related terms and suggestions for alternative terms using structured vocabularies and relationships.
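The three levels above can be sketched in a few lines of Python: tagging documents with concept codes at collection time, indexing on those codes rather than on raw words, and resolving the query to the same codes before lookup. This is a minimal sketch under stated assumptions, not a description of any vendor's implementation; the term table, the codes, the sample documents and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical term -> concept-code table (codes invented for illustration).
TERM_TO_CODES = {
    "backpack": {3000},
    "back pack": {3000},
    "rucksack": {3000},
    "mochila": {3000},      # Spanish synonym
    "soft drink": {2000},
    "soda": {2000},
    "pop": {2000},
    "gps": {4000, 4001},    # ambiguous acronym: two candidate concepts
}


def normalize(text):
    """Lowercase the text and keep only alphanumeric words, single-spaced."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def tag_document(text):
    """Level 1: attach concept codes as metadata when a document is collected."""
    padded = f" {normalize(text)} "
    codes = set()
    for term, term_codes in TERM_TO_CODES.items():
        if f" {term} " in padded:   # whole-word or whole-phrase match only
            codes |= term_codes
    return codes


def build_index(docs):
    """Level 2: index documents by numeric concept code instead of raw words."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for code in tag_document(text):
            index[code].add(doc_id)
    return index


def search(query, index):
    """Level 3: resolve the query to concept codes, then look up each code."""
    codes = TERM_TO_CODES.get(normalize(query), set())
    return {code: sorted(index.get(code, set())) for code in sorted(codes)}


DOCS = {
    "d1": "Lightweight back pack for hiking",
    "d2": "Mochila escolar con ruedas",
    "d3": "Rucksack with a GPS pocket",
    "d4": "Regional names for soda",
}
INDEX = build_index(DOCS)
```

With this setup, `search("backpack", INDEX)` finds `d1`, `d2` and `d3` even though none of them contains the literal string "backpack". An ambiguous query such as "GPS" returns one result list per candidate concept, which an interface could present as suggestions for the searcher to choose between.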

Organizing digital data is not easy. Yet organization and classification are crucial to finding relevant information for decision making. Text-based assets can be processed for keyword searching. Digital images, audio and video assets, however, can’t be found without adding search terms, either on the open web or inside corporate firewalls.

A common vocabulary and structure such as that provided by WAND can be used for digital assets and data in different business applications. Using the same industry vocabulary in multiple languages improves communication with international partners and offices.

Search failures are reduced by enabling buyers/customers/users to use the same body of search terms as internal customer service, developers and marketers. Unresolved search terms from web searches can be added to the vocabulary, as can the specialized terminology used by departmental software applications.

Search is an enabling technology, built by software specialists who speak their own language. Buyers/customers, however, have their own language, shaped by industry jargon, multiple languages, buying cycle and other factors. Utilizing a structured vocabulary to bridge these languages with synonyms and search term relationships significantly improves the search experience and reduces the failure rate.

  • Jean Bedord
    Jean focuses on strategies to improve web findability and business results for content producers and content technology companies. She brings a background in search technologies and managing database products, as well as IT management, to enterprise search implementation. Her consulting practice and publications are profiled at the eContent Strategies website. She is also part-time faculty at San Jose State University School of Library and Information Sciences. Her background includes over 15 years of management experience in the online information industry.
 
