In an earlier post I outlined the first step to becoming data-centric: determine what your business data is. This requires that business data be (1) named; (2) defined; and (3) published. It also requires IT to implement a data discipline and governance program that commits to the published business data as the means of interacting with the business community about its data and information usage and needs. Another way to think of this is as defining the semantic layer between the business and the data and information that IT delivers to it.
The second step to becoming data-centric focuses on the rules that govern business data. These rules take several forms:
Data Specification - these rules define the characteristics of a data element:
o specify the type of data (date, number, string)
o specify the form of the data (numeric, alphabetic, alphanumeric, and so forth)
o specify the format (identifying whether a date format is mm-dd-yyyy, dd-mm-yyyy, yyyy-mm-dd; number of decimal places in a number; and so forth)
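To make the specification rules concrete, here is a minimal sketch in Python of checking a value against a form and a format. The function names and the choice of yyyy-mm-dd as the default are my assumptions for illustration, not part of any particular tool or standard.

```python
import re
from datetime import datetime

# Assumed default format for a hypothetical date element: yyyy-mm-dd.
DEFAULT_DATE_FORMAT = "%Y-%m-%d"


def conforms_to_date_format(text: str, fmt: str = DEFAULT_DATE_FORMAT) -> bool:
    """Return True if text parses as a date written in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def conforms_to_decimal_places(text: str, places: int) -> bool:
    """Return True if text is a number with exactly `places` decimal digits."""
    pattern = rf"-?\d+\.\d{{{places}}}"
    return re.fullmatch(pattern, text) is not None


def is_alphabetic(text: str) -> bool:
    """Form check: True if text is non-empty and contains only letters."""
    return text.isalpha()
```

A value such as "04-27-2012" fails the default check but passes when the format is given as "%m-%d-%Y", which is why the format has to be part of the published specification rather than left to each application.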
Data Combination - these rules define data elements such as an Address that are a combination of specific data elements, such as Street (street number and name), City, State, and Zipcode
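One way to picture a combination rule is as a composite type built from the specific data elements. The sketch below assumes Python dataclasses; the class names, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Street:
    """Street is itself a combination of street number and street name."""
    number: str
    name: str


@dataclass(frozen=True)
class Address:
    """Address combines Street, City, State, and Zipcode."""
    street: Street
    city: str
    state: str
    zipcode: str  # kept as text so leading zeros are preserved

    def one_line(self) -> str:
        """Render the combined element as a single line."""
        return (
            f"{self.street.number} {self.street.name}, "
            f"{self.city}, {self.state} {self.zipcode}"
        )


# Illustrative values only.
sample = Address(Street("12", "Main Street"), "Edison", "NJ", "08817")
```

Here `sample.one_line()` returns "12 Main Street, Edison, NJ 08817". Keeping Zipcode as text rather than a number is a design choice in this sketch so that a value like 08817 does not lose its leading zero.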
Data Validation - these rules define the valid content of a data element:
o valid values, for example a list of states used to check that the state entered in an Address is valid
o valid ranges, for example a Zipcode that must fall in the range 08800 to 08899
o table validation, for example checking a medical procedure code for a person against the person's gender, age, and so forth
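The three kinds of validation rule above can be sketched as follows. This is only an illustration in Python: the state list is a deliberately small subset, and the procedure codes and their gender and age limits are invented placeholders, not real medical codes.

```python
# Valid values: an illustrative subset, not the full list of states.
VALID_STATES = {"NJ", "NY", "PA"}


def is_valid_state(state: str) -> bool:
    """Valid-values rule: the state must appear in the published list."""
    return state in VALID_STATES


def is_valid_zipcode(zipcode: str, low: str = "08800", high: str = "08899") -> bool:
    """Valid-range rule: a five-digit Zipcode between low and high inclusive."""
    if len(zipcode) != 5 or not (zipcode.isascii() and zipcode.isdigit()):
        return False
    # Equal-length digit strings compare in numeric order.
    return low <= zipcode <= high


# Table validation: hypothetical codes and limits, made up for this sketch.
PROCEDURE_RULES = {
    "P100": {"genders": {"F"}, "min_age": 18, "max_age": 120},
    "P200": {"genders": {"F", "M"}, "min_age": 0, "max_age": 17},
}


def is_valid_procedure(code: str, gender: str, age: int) -> bool:
    """Table rule: the code must exist and fit the person's gender and age."""
    rule = PROCEDURE_RULES.get(code)
    if rule is None:
        return False
    return gender in rule["genders"] and rule["min_age"] <= age <= rule["max_age"]
```

Holding the valid values and the table as data, rather than as conditions scattered through application code, is what lets them be published and governed alongside the data element definitions.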
Data Derivation - these rules define how data is derived from other data elements:
o life insurance premium, for example, may be derived from a person's age, weight, smoker status, and so forth
o salary paid, for example, may be derived from the annual salary divided by the number of pay periods in a year, or the hourly salary multiplied by the hours worked in the pay period
o more complicated algorithms may be required, depending on the derivation to be performed
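The salary example can be written down directly. The sketch below assumes Python's decimal module and assumes that amounts are rounded half-up to the cent; a real payroll rule would need to state its own rounding policy. The function names are mine.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def salaried_pay(annual_salary: Decimal, pay_periods_per_year: int) -> Decimal:
    """Salary paid = annual salary divided by the number of pay periods."""
    amount = annual_salary / pay_periods_per_year
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def hourly_pay(hourly_rate: Decimal, hours_worked: Decimal) -> Decimal:
    """Salary paid = hourly rate multiplied by hours worked in the period."""
    amount = hourly_rate * hours_worked
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)
```

For instance, an annual salary of 52,000 over 26 pay periods gives 2,000.00 per period, and 25.50 an hour for 80 hours gives 2,040.00. The point is that the derived element is defined once, by rule, from the elements it depends on.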
Data Usage - these rules define how data is used and presented:
o conditional display, for example entry of data elements such as passport number and passport country should not be allowed unless the response to a "Do you have a passport?" data element is "Yes"
o presentation format, to allow for presenting numbers, for example, in thousands or millions rather than the full number of digits
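Both usage rules can be sketched in a few lines. This is an illustration in Python under my own assumptions: the function names, the "K" and "M" suffixes, and the one-decimal display are choices made for the example.

```python
def passport_fields_enabled(has_passport_answer: str) -> bool:
    """Conditional display: allow passport entry only when the answer is "Yes"."""
    return has_passport_answer == "Yes"


# Assumed presentation units and their suffixes.
UNITS = {
    "thousands": (1_000, "K"),
    "millions": (1_000_000, "M"),
}


def present_in_units(value: float, unit: str) -> str:
    """Presentation format: show a number in thousands or millions."""
    divisor, suffix = UNITS[unit]
    return f"{value / divisor:,.1f}{suffix}"
```

With these definitions, `present_in_units(2_500_000, "millions")` returns "2.5M" and `present_in_units(1_234_000, "thousands")` returns "1,234.0K". The stored value is unchanged; only its presentation differs.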
These define the range of rules that govern data, what it should be, and how it is used. This information should be included in the Glossary of Business Terms and used in your data governance process. I'll discuss rules more in my next post.
Posted April 27, 2012 12:52 PM