Business Analytics: The Importance of Semantic Metadata Processes
by David Loshin
Originally published March 29, 2012
When building a business analytics program, there is no doubt that one requires the standard types of metadata for the physical design and implementation of a data warehouse and corresponding business intelligence delivery methods and tools. For example, it would be impossible to engineer the data integration and transformations needed to migrate data out of the source systems and into an operational data store or a data warehouse without knowledge of the structures of the sources and the target models. Similarly, without understanding the reference metadata (particularly the data types and units of measure!), the delivered reports might be difficult to understand, if not altogether undecipherable.
Management of Non-Persistent Data Elements
We often assume that the business terms, data element concepts, and entity concepts managed within a metadata framework are associated with persistent data sets, whether in operational or transactional systems or in a data mart or warehouse. It turns out that numerous data elements are used but never stored in a persistent database.
The simplest examples are those associated with the presentation of generated reports and other graphical representations such as column headers or labels on charts. Another example is interim calculations or aggregations that are used in preparing values for presentation. These data elements all have metadata characteristics – size, data type, associated data value domains, mappings to business terms – and there is value in managing that metadata along with metadata for persistent items.
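One way to picture this is a single metadata record type that covers both persistent and non-persistent elements. The following is a minimal sketch, not a prescribed design: the class `DataElementMetadata`, its field names, and the sample entries are all hypothetical and chosen only to mirror the characteristics listed above (size, data type, value domain, mapping to a business term).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataElementMetadata:
    """Hypothetical metadata record for any data element, stored or not."""
    name: str
    data_type: str
    size: int
    value_domain: Optional[str]
    business_term: Optional[str]
    persistent: bool
    location: str  # table/column if persistent, report or chart context otherwise


# Illustrative entries only; none of these names come from a real system.
registry = [
    DataElementMetadata("cust_id", "integer", 8, None,
                        "customer", True, "sales.orders.cust_id"),
    DataElementMetadata("Total Sales (USD)", "decimal", 12, "non-negative currency",
                        "sales revenue", False, "Monthly Sales Report: column header"),
    DataElementMetadata("running_subtotal", "decimal", 12, "non-negative currency",
                        "sales revenue", False, "Monthly Sales Report: interim aggregation"),
]


def non_persistent(elements: List[DataElementMetadata]) -> List[DataElementMetadata]:
    """Return the elements that are used but never stored in a database."""
    return [e for e in elements if not e.persistent]


for element in non_persistent(registry):
    print(f"{element.name} -> business term: {element.business_term}")
```

Keeping the column header and the interim subtotal in the same registry as the stored column means all three can be traced to the same business terms.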
Business Term Glossary
The most opportune place to start is establishing a business term glossary, which is a catalog of the terms and phrases used across the different business processes within (as well as relevant external interfaces to) the enterprise. It would not be a surprise to learn that in most organizations the same or similar words and phrases are used (both in documentation and in conversation) based on corporate lore or personal experience, but many of these terms are never officially defined or documented. When the same terms are used as column headings or data element names in different source systems, there is a tendency to presume that they mean the same things, yet just as often as not there are slight (or sometimes significant) variations in the context and consequently in the definition of the term.
Establishing a business term glossary is a way to identify where the same terms are used with different meanings and to facilitate processes for harmonizing or differentiating the definitions. The metadata process involves reviewing documentation and business applications (their guidelines as well as the program code) and interviewing staff members to identify business terms that are either used by more than one party or presumed to have a meaning that is undocumented. Once the terms have been logged, the analysts can review the definitions and determine whether they can be resolved into a single definition or whether they actually represent more than one concept, each of which requires qualification.
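The logging and review step can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the sample terms, source names, and definitions are invented, and comparing definitions by normalized text is a deliberately naive stand-in for the analysts' judgment.

```python
from collections import defaultdict

# (term, where it was found, definition as documented or gathered in interviews)
# All entries are hypothetical examples.
usages = [
    ("customer", "billing system", "Party that has been invoiced at least once"),
    ("customer", "marketing database", "Any individual who has received a promotion"),
    ("order date", "billing system", "Date the order was entered"),
    ("order date", "order entry", "date the order was  entered"),
]


def build_glossary(usages):
    """Group each term's usages by definition (case and spacing normalized)."""
    glossary = defaultdict(lambda: defaultdict(list))
    for term, source, definition in usages:
        normalized = " ".join(definition.lower().split())
        glossary[term.lower()][normalized].append(source)
    return glossary


def terms_needing_review(glossary):
    """Terms logged with more than one distinct definition."""
    return sorted(term for term, defs in glossary.items() if len(defs) > 1)


glossary = build_glossary(usages)
for term in terms_needing_review(glossary):
    print(f"'{term}' has {len(glossary[term])} competing definitions:")
    for definition, sources in glossary[term].items():
        print(f"  {definition} (used by: {', '.join(sources)})")
```

Terms flagged this way are the candidates for harmonization or for splitting into separately qualified concepts.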
Managing Synonyms
The more unstructured and externally streamed data consumed by the analytical platforms, the greater the potential for synonyms, which are sets of different words or terms that share the same meaning. The synonym challenge is the opposite problem of the one posed by variation in definitions for the same term. For example, the words “car,” “auto,” and “automobile” in most situations will share the same meaning, and therefore can be considered synonyms.
This process becomes more challenging when the collections of terms are synonyms in one usage scenario but not in another. To continue the example, in some cases the words “truck,” “SUV,” and “minivan” might be considered synonyms for automobile, but in other cases, each of those terms has its own distinct meaning.
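Context-dependent synonyms can be modeled by keying the synonym sets on the usage scenario. The sketch below is only one possible arrangement; the context name "registration counts" and the function `canonical_term` are hypothetical, introduced here for illustration.

```python
# Synonym sets keyed by usage scenario. "default" holds synonyms that apply
# everywhere; other contexts add synonyms valid only in that scenario.
SYNONYMS = {
    "default": {
        "car": "automobile",
        "auto": "automobile",
        "automobile": "automobile",
    },
    "registration counts": {  # hypothetical scenario name
        "truck": "automobile",
        "suv": "automobile",
        "minivan": "automobile",
    },
}


def canonical_term(term: str, context: str = "default") -> str:
    """Resolve a term to its canonical form, trying the context before the default."""
    key = term.strip().lower()
    for ctx in (context, "default"):
        mapping = SYNONYMS.get(ctx, {})
        if key in mapping:
            return mapping[key]
    return key  # no synonym known: the term keeps its own distinct meaning


print(canonical_term("Auto"))
print(canonical_term("SUV"))
print(canonical_term("SUV", context="registration counts"))
```

In the default context "SUV" resolves to itself, while in the hypothetical registration scenario it collapses into "automobile".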
Developing the Business Concept Registry
We can take the idea of a business term glossary one step further by combining it with the management of synonyms to create a business concept registry that captures the ways that the different business terms are integrated into business concepts. For example, we can define the business term “customer,” but augment that description with the enumeration of the business terms and concepts that are used to characterize a representation of a customer.
This can be quite complex, especially in siloed organizations with many implementations. Yet the outcome of the process is the identification of the key concepts that are ultimately relevant to both running the business (absorbed as a result of assessing the existing uses) and to improving the business, as the common concepts with agreed-to definitions can form the basis of a canonical data model supporting business reporting and analytics.
Mappings from Concept to Use
If one goal of a business intelligence program is to accumulate data from the different areas of the business for reporting and analysis, the designers’ and developers’ understanding of the distribution of content in the source systems must be comprehensive enough to pinpoint the lineage of information as it flows into the data warehouse and then out through the reports or analytical presentations. That suggests going beyond the structural inventory of data elements to encompass the business term glossary, mapping those terms to their uses in the different systems across the organization.
For example, once you can specify a business term “customer” and establish a common meaning, it would be necessary to identify which business processes, applications, tables, and data elements are related to the business concept “customer.” Your metadata inventory can be adapted for this purpose by instituting a process for mapping the common concepts to their systemic instantiations.
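A concept-to-use mapping can be as simple as a lookup from each business concept to the places it is instantiated. The following is a minimal sketch with invented process, application, table, and column names; `Instantiation`, `where_used`, and `concepts_for` are hypothetical helpers, not part of any particular metadata tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Instantiation:
    """One place where a business concept appears in a system."""
    business_process: str
    application: str
    table: str
    data_element: str


# Hypothetical inventory: every name below is illustrative.
concept_map: Dict[str, List[Instantiation]] = {
    "customer": [
        Instantiation("order fulfillment", "order entry", "ORDERS", "CUST_ID"),
        Instantiation("invoicing", "billing", "ACCOUNTS", "ACCT_HOLDER"),
    ],
    "product": [
        Instantiation("order fulfillment", "order entry", "ORDER_LINES", "SKU"),
    ],
}


def where_used(concept: str) -> List[Instantiation]:
    """Forward lookup: all systemic instantiations of a business concept."""
    return concept_map.get(concept.lower(), [])


def concepts_for(table: str, data_element: str) -> List[str]:
    """Reverse lookup: which business concepts a physical element represents."""
    return sorted(
        concept
        for concept, uses in concept_map.items()
        if any(u.table == table and u.data_element == data_element for u in uses)
    )


for use in where_used("customer"):
    print(f"{use.application}: {use.table}.{use.data_element} ({use.business_process})")
```

The reverse lookup is what supports lineage questions, since it lets an analyst start from a column in a report and find the agreed-to concept behind it.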
Semantic Hierarchy Management
The next level of business metadata complexity centers on the organization of business concepts within the contexts of their use. We can return to the previously mentioned automobile example: “car” and “auto” may be defined as equivalent terms, but the particular class of car (“SUV,” “minivan,” or “truck”) might be categorized as subsidiary to “car” in a conceptual hierarchy.
The hierarchical relationship implies inheritance – the child in the hierarchy shares the characteristics of the parent in the hierarchy. That basically means that any SUV is also a car (although not the other way around), and like a car it will have brakes and a rear-view mirror. On the other hand, the descendants in the hierarchy may have characteristics that are neither shared with the parent nor with other siblings. The hierarchy is expandable (you could have a “4WD SUV” and a “2WD SUV” subsidiary to an “SUV”). It is also not limited to a single-parent relationship – you could have a hybrid SUV that inherits from both hybrid cars and SUVs.
The hierarchies lay out the aggregation points along the dimensions. Continuing the example, an auto manufacturer might count the total number of cars sold, but also might want that broken down by the different subsidiary categories in the hierarchy.
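The hierarchy and its aggregation points can be sketched as a parent map with a roll-up over ancestors. This is an illustrative sketch only: the sales figures are invented, and `ancestors` and `roll_up` are hypothetical helper names. Collecting ancestors in a set ensures that a category with two parents, such as the hybrid SUV, is counted only once at any shared ancestor.

```python
# Each category lists its direct parents; a category may have more than one.
PARENTS = {
    "car": [],
    "SUV": ["car"],
    "minivan": ["car"],
    "truck": ["car"],
    "hybrid car": ["car"],
    "4WD SUV": ["SUV"],
    "2WD SUV": ["SUV"],
    "hybrid SUV": ["SUV", "hybrid car"],
}


def ancestors(category):
    """All categories above the given one, each listed once."""
    seen = set()
    stack = list(PARENTS[category])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def roll_up(sales):
    """Aggregate unit sales at every level of the hierarchy."""
    totals = {category: 0 for category in PARENTS}
    for category, units in sales.items():
        for node in {category} | ancestors(category):
            totals[node] += units
    return totals


# Invented unit counts, recorded at the most specific category sold.
sales = {"4WD SUV": 10, "2WD SUV": 5, "hybrid SUV": 3, "minivan": 4, "truck": 6}
totals = roll_up(sales)
print(totals["car"], totals["SUV"], totals["hybrid car"])
```

With these sample figures the total for "car" is 28 and the total for "SUV" is 18, so the same sales can be reported at whichever level of the hierarchy the analyst needs.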
Considerations: Entity Concepts and Master Dimensions
All of these metadata ideas converge when lining up the information in the analytical environment with that of the other data sources across the organization, especially when it comes to key master concepts that are relevant for transactional, operational, and analytical applications. The master entity concepts (such as “product”) are associated with master dimensions (such as the automobile hierarchy), but the value is the semantic alignment that allows the business analyst reading the report to be confident that the count of sold SUVs is consistent with the operational reporting coming out of the sales system. These aspects of semantic metadata enable that level of confidence.
Copyright 2004 — 2020. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC