
Business Analytics: The Importance of Semantic Metadata Processes

Originally published March 29, 2012

When building a business analytics program, there is no doubt that one requires the standard types of metadata for the physical design and implementation of a data warehouse and corresponding business intelligence delivery methods and tools. For example, it would be impossible to engineer the data integration and transformations needed to migrate data out of the source systems and into an operational data store or a data warehouse without knowledge of the structures of the sources and the target models. Similarly, without understanding the reference metadata (particularly the data types and units of measure!), the delivered reports might be difficult to understand, if not altogether undecipherable.
But even presuming that the technical, structural, and operational metadata are soundly managed, the absence of shared conceptual information will often lead to reinterpretation of the data sets’ meanings. The availability of business metadata, particularly semantic metadata, is something of a panacea, which means there must be well-defined processes in place for soliciting, capturing, and managing that semantic information. Some key processes focus on particular areas of concentration, as we explore here.

Management of Non-Persistent Data Elements

We often assume that the business terms, data element concepts, and entity concepts managed within a metadata framework are associated with persistent data sets, whether in operational or transactional systems or in a data mart or warehouse. It turns out that numerous data elements are used but never stored in a persistent database.

The simplest examples are those associated with the presentation of generated reports and other graphical representations such as column headers or labels on charts. Another example is interim calculations or aggregations that are used in preparing values for presentation. These data elements all have metadata characteristics – size, data type, associated data value domains, mappings to business terms – and there is value in managing that metadata along with metadata for persistent items.
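As a minimal sketch of that idea, the record below describes persistent and non-persistent elements with the same set of metadata characteristics. The class name, field names, and sample values are all assumptions made for illustration, not part of any particular metadata tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElementMetadata:
    """Metadata for one data element, persistent or not (hypothetical schema)."""
    name: str
    data_type: str
    size: int                     # maximum length or precision
    value_domain: Optional[str]   # name of the allowed-values domain, if any
    business_term: str            # glossary term this element maps to
    persistent: bool              # False for report labels and interim calculations


# One stored column and two non-persistent elements, described the same way.
# All names and values here are invented for the example.
elements = [
    DataElementMetadata("cust_region", "string", 20, "sales_regions", "sales region", True),
    # Column header on a generated report
    DataElementMetadata("Region", "string", 20, "sales_regions", "sales region", False),
    # Interim aggregation computed while preparing a report
    DataElementMetadata("qtr_total", "decimal", 12, None, "quarterly revenue", False),
]

non_persistent = [e.name for e in elements if not e.persistent]
print(non_persistent)
```

Because the report header and the stored column map to the same business term and value domain, a change to either definition can be traced to both.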

Business Term Glossary

The most opportune place to start is establishing a business term glossary, which is a catalog of the terms and phrases used across the different business processes within (as well as relevant external interfaces to) the enterprise. It would not be a surprise to learn that in most organizations the same or similar words and phrases are used (both in documentation and in conversation) based on corporate lore or personal experience, but many of these terms are never officially defined or documented. When the same terms are used as column headings or data element names in different source systems, there is a tendency to presume that they mean the same things, yet just as often as not there are slight (or sometimes significant) variations in the context and consequently in the definition of the term.

Establishing a business term glossary is a way to identify where the same terms are used with different meanings and to facilitate processes for harmonizing or differentiating the definitions. The metadata process involves reviewing documentation and business applications (their guidelines as well as the program code) and interviewing staff members to identify business terms that are either used by more than one party or presumed to have a meaning that is undocumented. Once the terms have been logged, the analysts can review the definitions and determine whether they can be resolved or whether they actually represent more than one concept, each of which requires qualification.
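The review step can be sketched in a few lines: log each term with its source and definition, then flag the terms whose definitions disagree. The sources, definitions, and the function name `find_conflicting_terms` are hypothetical, and a real review would need human judgment rather than exact string comparison.

```python
from collections import defaultdict

# (term, source, definition) entries as an analyst might log them.
# Every value here is invented for illustration.
logged_terms = [
    ("customer", "billing system", "party that has paid at least one invoice"),
    ("customer", "marketing database", "any individual who has received a promotion"),
    ("order date", "billing system", "date the order was entered"),
    ("order date", "order entry application", "date the order was entered"),
]


def find_conflicting_terms(entries):
    """Return the terms whose logged definitions differ across sources."""
    definitions = defaultdict(set)
    for term, _source, definition in entries:
        definitions[term.lower()].add(definition.strip().lower())
    return sorted(term for term, defs in definitions.items() if len(defs) > 1)


conflicts = find_conflicting_terms(logged_terms)
print(conflicts)
```

Terms flagged this way are the candidates for harmonization, or for qualification into separate concepts when the definitions cannot be reconciled.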

Managing Synonyms

The more unstructured and externally streamed data the analytical platforms consume, the greater the potential for synonyms, which are sets of different words or terms that share the same meaning. The synonym challenge is the inverse of the problem posed by varying definitions for the same term. For example, the words “car,” “auto,” and “automobile” in most situations will share the same meaning, and therefore can be considered synonyms.

This process becomes more challenging when the collections of terms are synonyms in one usage scenario but not in another. To continue the example, in some cases the words “truck,” “SUV,” and “minivan” might be considered synonyms for automobile, but in other cases, each of those terms has its own distinct meaning.
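One way to picture context-dependent synonyms is a lookup keyed by usage scenario. This is only a sketch: the context names (`general`, `sales_reporting`), the groupings, and the `canonical_term` helper are assumptions for the example.

```python
# Synonym groups keyed by usage context. The contexts and groupings are
# illustrative assumptions, not a recommended taxonomy.
SYNONYMS = {
    "general": {
        "automobile": {"car", "auto", "automobile", "truck", "suv", "minivan"},
    },
    "sales_reporting": {
        "automobile": {"car", "auto", "automobile"},
        "truck": {"truck"},
        "suv": {"suv"},
        "minivan": {"minivan"},
    },
}


def canonical_term(word, context):
    """Map a word to its canonical term in the given context, or None if unknown."""
    for canonical, members in SYNONYMS[context].items():
        if word.lower() in members:
            return canonical
    return None


print(canonical_term("SUV", "general"))          # automobile
print(canonical_term("SUV", "sales_reporting"))  # suv
```

The same word resolves differently depending on the scenario, which is why the synonym sets need to be managed together with the contexts in which they hold.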

Developing the Business Concept Registry

We can take the idea of a business term glossary one step further by combining it with the management of synonyms to create a business concept registry that captures the ways that the different business terms are integrated into business concepts. For example, we can define the business term “customer,” but augment that description with the enumeration of the business terms and concepts that are used to characterize a representation of a customer.

This can be quite complex, especially in siloed organizations with many implementations. Yet the outcome of the process is the identification of the key concepts that are ultimately relevant to both running the business (absorbed as a result of assessing the existing uses) and to improving the business, as the common concepts with agreed-to definitions can form the basis of a canonical data model supporting business reporting and analytics.
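A registry entry of this kind might combine a definition, its synonyms, and the terms that characterize the concept. The sketch below assumes a hypothetical `BusinessConcept` record and `resolve` helper; the definition, synonyms, and characterizing terms shown for “customer” are placeholders, not agreed definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class BusinessConcept:
    """One entry in a business concept registry (hypothetical structure)."""
    name: str
    definition: str
    synonyms: Set[str] = field(default_factory=set)
    characterizing_terms: List[str] = field(default_factory=list)


def resolve(registry: Dict[str, BusinessConcept], term: str) -> Optional[BusinessConcept]:
    """Find the concept that a term names, either directly or through a synonym."""
    wanted = term.lower()
    for concept in registry.values():
        if wanted == concept.name or wanted in concept.synonyms:
            return concept
    return None


# Placeholder content for illustration only.
registry = {
    "customer": BusinessConcept(
        name="customer",
        definition="party that has purchased a product (placeholder wording)",
        synonyms={"client", "account holder"},
        characterizing_terms=["customer name", "billing address", "customer since date"],
    )
}

match = resolve(registry, "Client")
print(match.name if match else None)
```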

Mappings from Concept to Use

If one goal of a business intelligence program is to accumulate data from the different areas of the business for reporting and analysis, the designers’ and developers’ understanding of the distribution of content in the source systems must be comprehensive enough to pinpoint the lineage of information as it flows into the data warehouse and then out through the reports or analytical presentations. That suggests going beyond the structural inventory of data elements to encompass the business term glossary, mapping those terms to their uses in the different systems across the organization.

For example, once you can specify a business term “customer” and establish a common meaning, it would be necessary to identify which business processes, applications, tables, and data elements are related to the business concept “customer.” Your metadata inventory can be adapted for this purpose by instituting a process for mapping the common concepts to their systemic instantiations.
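A mapping from concept to systemic instantiation could be recorded along these lines. The process, application, table, and element names are invented, and `Instantiation`, `applications_using`, and `concepts_for_element` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instantiation:
    """One place where a business concept shows up in a system."""
    business_process: str
    application: str
    table: str
    data_element: str


# Concept-to-use mappings; every name below is invented for illustration.
CONCEPT_USES = {
    "customer": [
        Instantiation("order fulfillment", "order_entry", "ORDERS", "CUST_ID"),
        Instantiation("invoicing", "billing", "ACCOUNTS", "ACCT_HOLDER"),
        Instantiation("campaign management", "crm", "CONTACTS", "CONTACT_ID"),
    ],
}


def applications_using(concept):
    """List the applications in which a concept is instantiated."""
    return sorted({use.application for use in CONCEPT_USES.get(concept, [])})


def concepts_for_element(table, data_element):
    """Reverse lookup: which concepts does a physical data element represent?"""
    return sorted(
        concept
        for concept, uses in CONCEPT_USES.items()
        if any(u.table == table and u.data_element == data_element for u in uses)
    )


print(applications_using("customer"))
print(concepts_for_element("ACCOUNTS", "ACCT_HOLDER"))
```

The forward lookup supports impact analysis when a definition changes, while the reverse lookup helps trace the lineage of a value that appears in a report.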

Semantic Hierarchy Management

The next level of business metadata complexity centers on the organization of business concepts within the contexts of their use. We can return to the previously mentioned automobile example: “car” and “auto” may be defined as equivalent terms, but the particular class of car (“SUV,” “minivan,” or “truck”) might be categorized as subsidiary to “car” in a conceptual hierarchy.

The hierarchical relationship implies inheritance – the child in the hierarchy shares the characteristics of the parent in the hierarchy. That means that any SUV is also a car (although not the other way around), and like a car it will have brakes and a rear-view mirror. On the other hand, the descendants in the hierarchy may have characteristics that are shared neither with the parent nor with their siblings. The hierarchy is expandable (you could have a “4WD SUV” and a “2WD SUV” subsidiary to an “SUV”). It is also not limited to a single-parent relationship – you could have a hybrid SUV that inherits from both hybrid cars and SUVs.
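The inheritance rules described above can be sketched as a small graph in which a concept may have more than one parent. The concept names follow the automobile example; the “electric motor” and “four-wheel drive” characteristics, and the helper names `ancestors`, `is_a`, and `characteristics`, are assumptions added for illustration.

```python
# Each concept lists its direct parents; more than one parent is allowed.
PARENTS = {
    "car": [],
    "suv": ["car"],
    "minivan": ["car"],
    "truck": ["car"],
    "hybrid car": ["car"],
    "4wd suv": ["suv"],
    "2wd suv": ["suv"],
    "hybrid suv": ["hybrid car", "suv"],
}

# Characteristics declared directly on a concept (illustrative values).
OWN_CHARACTERISTICS = {
    "car": {"brakes", "rear-view mirror"},
    "hybrid car": {"electric motor"},
    "4wd suv": {"four-wheel drive"},
}


def ancestors(concept):
    """Return every concept reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS[concept])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def is_a(child, parent):
    """True if child is the same concept as parent or descends from it."""
    return child == parent or parent in ancestors(child)


def characteristics(concept):
    """Own characteristics plus everything inherited from all ancestors."""
    result = set(OWN_CHARACTERISTICS.get(concept, set()))
    for ancestor in ancestors(concept):
        result |= OWN_CHARACTERISTICS.get(ancestor, set())
    return result


print(is_a("suv", "car"))   # True
print(is_a("car", "suv"))   # False
print(sorted(characteristics("hybrid suv")))
```

Tracking visited concepts in `ancestors` keeps a shared grandparent such as “car” from being processed twice when a concept has two parents.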

The hierarchies lay out the aggregation points along the dimensions. Continuing the example, an auto manufacturer might count the total number of cars sold, but also might want that broken down by the different subsidiary categories in the hierarchy.
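As a rough illustration of aggregation points, the sketch below rolls individual sales up a single-parent version of the hierarchy so that each level carries its own total. The sales records are invented, and restricting each category to one parent is a simplifying assumption; a hierarchy with multiple parents would need care to avoid counting a sale twice at a shared ancestor.

```python
from collections import Counter

# Single-parent hierarchy for simplicity (an assumption of this sketch).
PARENT = {
    "car": None,
    "suv": "car",
    "minivan": "car",
    "truck": "car",
    "4wd suv": "suv",
    "2wd suv": "suv",
}

# One entry per vehicle sold, recorded at its most specific category.
# These records are invented for the example.
sales = ["4wd suv", "2wd suv", "4wd suv", "minivan", "truck", "car"]


def roll_up(sold, parent):
    """Count each sale at its own category and at every ancestor category."""
    totals = Counter()
    for category in sold:
        node = category
        while node is not None:
            totals[node] += 1
            node = parent[node]
    return totals


totals = roll_up(sales, PARENT)
print(totals["car"], totals["suv"], totals["4wd suv"])  # 6 3 2
```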

Considerations: Entity Concepts and Master Dimensions

All of these metadata ideas converge when lining up the information in the analytical environment with that of the other data sources across the organization, especially when it comes to key master concepts that are relevant for transactional, operational, and analytical applications. The master entity concepts (such as “product”) are associated with master dimensions (such as the automobile hierarchy), but the value is the semantic alignment that allows the business analyst reading the report to be confident that the count of sold SUVs is consistent with the operational reporting coming out of the sales system. These aspects of semantic metadata enable that level of confidence.

Recent articles by David Loshin




Posted March 29, 2012 by mark.troester@sas.com

Hello David –

Interesting and comprehensive post! Even in a traditional environment, this is an extremely important topic to get right. If not, it can have a detrimental impact on analytics, reporting, etc., not to mention the churn that it causes between the various people involved from different organizations.

It’s also interesting to consider this along with the broader data governance topic when it comes to big data. We are doing a lot of work in this area and it definitely presents a different set of considerations. Relating to meta-data and text, the ability to generate meta-data out of textual sources after the data is dumped into Hadoop or other big data sources is an interesting topic!

Thanks again for the post.

Mark Troester, IT/CIO Thought Leader & Strategist, SAS

Twitter: @mtroester

Blog: http://blogs.sas.com/content/datamanagement/
