We use cookies and other similar technologies (Cookies) to enhance your experience and to provide you with relevant content and ads. By using our website, you are agreeing to the use of Cookies. You can change your settings at any time. Cookie Policy.


A Brief Review of Indicator and Flag Columns, Part 2

Originally published February 4, 2009

In Part 1 of this article, we looked at the basic characteristics of what are known as indicator or flag columns in databases. These are columns that, at least ideally, should record a value of true, false, or indeterminate for an attribute whose definition can be stated as a proposition that can be affirmed or denied.

Let us now look a little further into how these columns relate to business rules and to metadata.

Derived Indicators

All indicator columns have the same conceptual domain – i.e., the values true, false, and indeterminate. However, there are different classes or groupings of indicator columns. A particularly important class is derived indicators.

Not many years ago, there was a lot of resistance from data modelers to having any derived attributes in a data model, let alone derived columns in a physical database. It was thought that all computed or derived data should be computed or derived “on the fly.” Obviously, this led to practical difficulties like performance problems. In theory, a balance for an account can be calculated by summing up all the transactions against that account – but that might be many years of data. More insidiously, the rules to calculate the balance would have to be redundantly, and perhaps differently, implemented everywhere the balance had to appear. Today, this view has largely been abandoned, but it has not necessarily been replaced by rigor in dealing with business rules that specify derived data.

This is particularly important for indicator columns. If an indicator column is derived, then the derivation must be understood in addition to the definition of the column. In practice, definitions are often much more vague than they should be, and the business rules for the derivations are absent.

Consider a criminal justice database where we have a table Person with a column that has the logical name of Number of Prior Convictions. A person that has prior convictions is a person with a criminal record. Having a criminal record is a different piece of information to Number of Prior Convictions and is useful in relating to questions where it is asked directly if a person has or does not have a criminal record. We could thus implement an indicator called Criminal Record Indicator on this table. A data analyst might give it a definition such as:

The person has been convicted of one or more criminal offenses.

This is a good definition. But it contains no reference to the fact that the column is derived, and that it is derived solely from the column Number of Prior Convictions. Nor does it say how it is derived.

Usage of Syllogisms

Given that the definition of a derived indicator is different to the rule or rules for its derivation, how should we treat the rules? A clue comes from the fact that as we saw earlier, the indicator column answers true, false or indeterminate to a particular proposition. Therefore, we have to arrive at this proposition. The best way to do that is to use a traditional syllogism, such as the following:

  • Every person with a Number of Prior Convictions greater than zero has a criminal record.
  • This person is a person with a Number of Prior Convictions greater than zero.
  • Therefore, this person has a criminal record.

The Criminal Record Indicator affirms or denies the proposition:

This person has a criminal record.

We now see quite explicitly how we get from the column Number of Prior Convictions to the indicator column Criminal Record Indicator. We also have a methodology that goes with it, such as there only being three terms allowed in the syllogism. The syllogism should be stored as metadata representing the business rule, in addition to the definition.

Trying to do all of this with pseudocode can be trickier:

If Person Number of Prior Convictions is greater than zero,
Then Criminal Record Indicator =“Y”
Else Criminal Record Indicator =“N”

This never arrives at the proposition that the indicator column is affirming or denying, and it is implementation specific, saying what the value representation in the indicator column should be. It lacks the explanatory power of the syllogism and does not take advantage of the fact that an indicator is really a particular proposition.

Ease of Change

Suppose that we later find we made a mistake and that a person can only have a criminal record if they have been convicted as an adult, rather than as a juvenile. What now seems to be perhaps the rather tedious and long-winded way of developing our business rule allows us to easily understand it and spot what has to be changed. It now becomes:

  • Every person with a Number of Prior Adult Convictions greater than zero has a criminal record.
  • This person is a person with a Number of Prior Adult Convictions greater than zero.
  • Therefore, this person has a criminal record.

Of course, we will need another column for Number of Prior Adult Convictions, which is perfectly fine.

Indicators as Metadata

Besides derived indicators, there are other classes of indicator. One of these is metadata indicators. Sometimes these indicators are so familiar that it takes a little thought to understand that they are metadata.

Metadata, of course, is usually defined as “data about data,” which is hardly helpful. For the purposes of indicators, it is better to separate data from metadata as follows. A record should describe an entity instance in the real world (e.g., a record in the Person table describes are real individual person). The columns on the record that correspond to characteristics of the real individual person are data. However, the columns that correspond to the structure or behavior of the record, or how we manage the information about the person, are metadata. We usually colocate the metadata columns on the record with the data columns, at least for indicators. This is because if the columns were on other records in pure metadata tables, it would be more difficult to process them together with the data.

Currently Active Record Indicator is an example of a metadata indicator. It states whether the current record can be processed by current-time transactions versus being also available for historical reporting.

A problem with metadata indicators is differentiating them from data indicators on a record. It is simply not always easy to recognize them as metadata. A good approach to this is to group the metadata indicator columns all at the end of the record and have the data indicators grouped at the beginning. This is a technique called locality of reference – i.e., keeping similar related things together in one place. Of course, having metadata attributes in a repository that distinguishes metadata indicators from data indicators is good too – but does not help when one is simply browsing a table directly.

Conclusion

In these two articles, we have reviewed indicator columns and found that their simplicity is rather deceptive. They can be quite complex and require special attention. Understanding their nature and the different classes that compose them can be very helpful in overall data management in physical databases.

  • Malcolm ChisholmMalcolm Chisholm

    Malcolm Chisholm, Ph.D., has more than 25 years of experience in enterprise information management and data management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management, and business rules. His experience includes the financial, manufacturing, government, and pharmaceutical industries. He is the author of the books: How to Build a Business Rules Engine; Managing Reference Data in Enterprise Databases; and Definition in Information Management. Malcolm writes numerous articles and is a frequent presenter at industry events. He runs the websites http://www.refdataportal.com; http://www.bizrulesengine.com; and
    http://www.data-definition.com. Malcolm is the winner of the 2011 DAMA International Professional Achievement Award.

    He can be contacted at mchisholm@refdataportal.com.
    Twitter: MDChisholm
    LinkedIn: Malcolm Chisholm

    Editor's Note: More articles, resources, news and events are available in Malcolm's BeyeNETWORK Expert Channel. Be sure to visit today!

Recent articles by Malcolm Chisholm

 

Comments

Want to post a comment? Login or become a member today!

Be the first to comment!