The Data Audit Imperative

Originally published December 2, 2009

Data is now widely accepted as being something of value. In this respect it is analogous to money, and it seems to share another property of money in that it can "flow." Modern bookkeeping recognizes the ability of money to flow by identifying a credit and a debit for each side of a financial transaction, and treating the transaction as a flow of funds from the debit side to the credit. Should we do something similar for data? I think that we should, but I think that what we need to do with flows of data also bears a resemblance to another aspect of financial management: the need to audit.

The word "audit" comes from the word "auditor" which originally meant "one who listens." Once upon a time, auditors really did listen. For instance, it was quite common in the Spanish empire for the monarch to send auditors to the colonies to gather intelligence on what was happening in the colonial administrations. The auditors would report directly back to Madrid. In this way, the Emperor or Empress could be assured that the colonies were being run in compliance with his or her wishes. The idea of the auditor was gradually developed from this early arrangement into what we have today.

The Need to Audit

Today, compliance includes the idea that we are not simply expected to do something, but that we may also be expected to prove that we did it. Whatever mechanism we use to perform a task, be it manual, automated, or both, that mechanism cannot itself prove that the task was performed in the way that was expected. Suppose I write a function to move data from Table A to Table B, and this function outputs the number of records read from Table A and the number of records written to Table B. This is a very good feature, but it is not an audit. Perhaps the records were not really written to Table B in the Production environment, but were written to Table B in the Development environment because somebody forgot to change the connection string. The function might produce perfect record counts, but the process would still have failed. Of course, this is not to say that processes should not have their own internal controls like record counts. They should. However, there is also a need to independently verify that the process has functioned in the way it should have. There is a need to audit it.
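The Table A to Table B scenario can be made concrete with a minimal sketch. It assumes two in-memory SQLite databases standing in for Production and Development; the table names, the copy_a_to_b function, and the audit_counts function are all hypothetical and exist only for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an empty table_a and table_b."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, amount REAL)")
    return conn


def copy_a_to_b(source, target):
    """The data movement process, with its own internal control: record counts."""
    rows = source.execute("SELECT id, amount FROM table_a").fetchall()
    cursor = target.executemany(
        "INSERT INTO table_b (id, amount) VALUES (?, ?)", rows
    )
    target.commit()
    return len(rows), cursor.rowcount  # records read, records written


def audit_counts(source, target):
    """Independent check: query both ends directly instead of trusting the process."""
    src = source.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
    tgt = target.execute("SELECT COUNT(*) FROM table_b").fetchone()[0]
    return {"source_count": src, "target_count": tgt, "passed": src == tgt}


production = make_db()
development = make_db()
production.executemany(
    "INSERT INTO table_a VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)]
)
production.commit()

# Misconfiguration: the process writes to Development instead of Production.
read_count, written_count = copy_a_to_b(production, development)

# The audit looks at where the data was supposed to go (Production).
result = audit_counts(production, production)
print(read_count, written_count, result)
```

The process reports matching counts, yet the audit fails, because the audit is configured separately from the process and checks the intended target.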

The Flow of Data

Data movement is very common in modern IT environments. We expect that transaction applications will produce data, but that different informational applications will analyze it. The data must be moved from the transactional applications to the informational applications. But data movement is even more pervasive. Data is moved among transaction applications, in both real time and batch modes. Data is sent to external parties, such as regulators, and received from others, such as data vendors. We are all aware that myriads of data flows happen every day in the enterprises we work in, just as financial flows do. However, data flows are different. There is no real "debit" from the source of data. Data is nearly always copied, rather than moved. That is, the records which are written to the target do not result in the elimination of the corresponding records in the source. This makes data difficult to deal with. There is no single place in which a given record is located; it may have been copied to many places.

Given that data flows are now so common, it is worth considering if these flows should be audited. It would be nice to have an independent assurance that the data which we think we have moved actually got moved, and that the data came from where it was supposed to have come from, and has gone where it was supposed to go to.

What Can Go Wrong?

Actually, this is not really a "nice to have" feature for a modern enterprise. It is essential. Data flows can go wrong in all kinds of ways. Consider orchestration of data movement. We may have a nightly flow from a table in Transaction Application A to Staging Table B in Data Warehouse C, and a second flow from Staging Table B to Fact Table D in the warehouse. Suppose that the flow from A to B is scheduled to run at 1:30 a.m. every day, and the flow from B to D at 4:00 a.m. every day. Now suppose that the first flow is delayed and does not happen until after the B to D flow has completed. We obviously will have a problem.
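A check for this kind of ordering failure might look like the sketch below. It assumes the completion time of the A to B flow and the start time of the B to D flow are available from some run log; the check_dependency function and the timestamps are hypothetical.

```python
from datetime import datetime


def check_dependency(upstream_end, downstream_start):
    """Return None if the ordering is correct, else a description of the exception."""
    if upstream_end is None:
        return "upstream flow has not completed"
    if upstream_end > downstream_start:
        return (
            f"upstream flow finished at {upstream_end:%H:%M}, "
            f"after downstream flow started at {downstream_start:%H:%M}"
        )
    return None


# Hypothetical run-log entries for one night: the A to B flow was delayed.
a_to_b_end = datetime(2009, 12, 2, 4, 45)
b_to_d_start = datetime(2009, 12, 2, 4, 0)

problem = check_dependency(a_to_b_end, b_to_d_start)
print(problem)
```

The check runs outside both flows, so it does not rely on either flow knowing about the other.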

A single isolated example like this seems easy to comprehend and might not seem to really require an audit process to detect exceptions. Perhaps this could be done within the process itself. But when we have hundreds or thousands of data flows per day, figuring out everything that could go wrong and specifically coding it into the data movement processes is not scalable. Also, what happens if a data movement process, for whatever reason, simply is not run? It cannot detect its own failure. We are back to the need for independent verification, that is, for auditing.

What is Data Auditing?

We undoubtedly still have a lot of theoretical and practical work to do in the realm of data auditing, but it is possible to see the outlines of what it should consist of.

A data auditing tool should allow us to identify a source and a target. Data is going to flow from the source to the target. We should then be able to identify the records expected to have been moved in the source and the records expected to have arrived from the source in the target. This could be simple, or it could be complex. It can certainly involve identifying subsets of records in the source and the target. If this is the case, we will inevitably need a business rules approach. Logic will be needed to identify the subsets of records in the source and target. Perhaps this will be based on SQL queries. This logic will require metadata, such as description of what the logic is trying to do, who set it up, and how it corresponds to some kind of business reality. Governance processes will need to be overlain on all of this. Thus, we can quickly appreciate that a simplistic programming approach will not be sufficient.
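As a rough illustration of the business rules idea, the sketch below pairs a source query and a target query with descriptive metadata and compares the record keys each returns. The AuditRule class, its fields, the table names, and the sample data are all assumptions made for this example, not a description of any particular tool.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRule:
    name: str
    description: str  # what the logic is trying to do
    owner: str        # who set it up
    source_sql: str   # returns the keys of records expected to have moved
    target_sql: str   # returns the keys of records expected to have arrived


def run_rule(rule, source, target):
    """Compare the record keys selected in the source and in the target."""
    expected = {row[0] for row in source.execute(rule.source_sql)}
    arrived = {row[0] for row in target.execute(rule.target_sql)}
    return {
        "rule": rule.name,
        "missing_in_target": sorted(expected - arrived),
        "unexpected_in_target": sorted(arrived - expected),
    }


# Hypothetical source and target tables, held in one database for brevity.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE stg_orders (id INTEGER PRIMARY KEY)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "POSTED"), (2, "DRAFT"), (3, "POSTED")],
)
db.executemany("INSERT INTO stg_orders VALUES (?)", [(1,), (2,)])

rule = AuditRule(
    name="posted_orders_to_staging",
    description="Only posted orders should be copied to staging.",
    owner="data.governance@example.com",
    source_sql="SELECT id FROM orders WHERE status = 'POSTED'",
    target_sql="SELECT id FROM stg_orders",
)

outcome = run_rule(rule, db, db)
print(outcome)
```

Keeping the queries and their metadata as data, rather than burying them in code, is what lets governance processes review and change the rules.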

There must be other components in the architecture that supports data auditing. The results of the audit runs must be stored in a database. A notification service will be needed to send messages to stakeholders if exceptions are detected. This, in turn, requires elements for stakeholder management. Then there is orchestration. The audit processes have to run in the correct time windows and observe the proper dependencies. And then there are the governance processes to configure, monitor, and evaluate the auditing.
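Two of these components, the results store and the notification service, can be sketched in a few lines. The audit_result table layout and the record_result function are hypothetical, and the notifier is just a callable that collects messages; a real service would route them to stakeholders.

```python
import sqlite3
from datetime import datetime, timezone


def record_result(results_db, rule_name, passed, detail, notify):
    """Store one audit outcome and notify stakeholders if it is an exception."""
    results_db.execute(
        "INSERT INTO audit_result (rule_name, run_at, passed, detail) "
        "VALUES (?, ?, ?, ?)",
        (rule_name, datetime.now(timezone.utc).isoformat(), int(passed), detail),
    )
    results_db.commit()
    if not passed:
        notify(f"Audit exception in {rule_name}: {detail}")


results_db = sqlite3.connect(":memory:")
results_db.execute(
    "CREATE TABLE audit_result ("
    "rule_name TEXT, run_at TEXT, passed INTEGER, detail TEXT)"
)

sent = []  # stand-in for a notification service
record_result(
    results_db, "orders_to_staging", False, "3 records missing in target", sent.append
)
record_result(results_db, "staging_to_fact", True, "counts match", sent.append)

stored = results_db.execute("SELECT COUNT(*) FROM audit_result").fetchone()[0]
print(stored, sent)
```

Every run is stored, whether it passes or fails, so the results table doubles as evidence that the audit itself took place.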

None of this is ultimately easy. However, it needs to be addressed to stop the "data mess" from spiraling ever further out of control in the enterprises we work in. Automated tools are now making their appearance in this area. They will be part of any solution, but all data managers need to begin thinking in earnest about data auditing.

  • Malcolm Chisholm

    Malcolm Chisholm, Ph.D., has more than 25 years of experience in enterprise information management and data management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management, and business rules. His experience includes the financial, manufacturing, government, and pharmaceutical industries. He is the author of the books How to Build a Business Rules Engine; Managing Reference Data in Enterprise Databases; and Definition in Information Management. Malcolm writes numerous articles and is a frequent presenter at industry events. He runs the websites http://www.refdataportal.com; http://www.bizrulesengine.com; and http://www.data-definition.com. Malcolm is the winner of the 2011 DAMA International Professional Achievement Award.

    He can be contacted at mchisholm@refdataportal.com.
    Twitter: MDChisholm
    LinkedIn: Malcolm Chisholm

