Blog: Dan E. Linstedt« What's the big stink about anyway? | Main | Compliance, Data Integration, Part 2 » Compliance, Data Integration, Accountability?In this weeks' newsletter Bill discusses Sarbanes-Oxley and what it means to business. See Bill Inmons Newsletter article. In this blog we take it a step deeper - into the implementation world of data integration. What does compliance mean to those building ETL, EAI, EII and Web Services routines? What does it mean to the data set both IN the data warehouse and now being loaded into the data warehouse? What will data Integration have to endure in the coming year or two commercially? This category of blogs will explore these questions and more. For our first entry in this space, we will discuss the following question: What does compliance mean to data integration routines, and the implementors building EAI, ETL, EII, and Web Services components? For the purpose of this discussion, we will translate "Compliance at a business level" to mean "accountability and tracability of the data at the lowest grain." When an auditor holds a firm "in-compliance" or "out-of-compliance" that could potentially mean that their accounting numbers (bottom line) doesn't match up with their records or investigation. As Bill Inmon said: "In many ways, the financial transaction is the pie, and the audit needs to look at how the pie was baked and how the apples were cut up before being put in the pie." So what if we were to assume the apples and the cutting process were to include not only e-mail (along with other unstructured data), but also the actual integration routines that cleanse, check, and alter data on the way IN to our data warehouses? This would change the picture drastically. Suppose we are given a project to build a data warehouse that consolidates a single view of financial transactions from 3 source systems. The business signs-off on some requirements declaring "how to transform and cleans the data" to bring it into the warehouse. We build the data warehouse model, followed by the integration routines to load it. The story begins... Then, the business sponsor leaves the company, and a new individual "ABE" takes their place. ABE decides that he doesn't like the way the financial data is being rolled together, comes to the data warehouse team and says: change the business rules, this data isn't bad - this other data is bad. So we change the rules, and again the users, including ABE are happy. A few more months pass - and an auditor comes. The auditor asks questions like: how did this data get this way? There seems to be an error in this data, it doesn't seem to ever have been correct. Where is the problem ABE? ABE Says, check with JJJ he works for me - and so on down the line. Finally the auditor walks in to your office (you the implementor) and asks: did you write the routines that change this data? Yes? Hmmm, can you show me what the data looked like BEFORE your routine changed it? Did you know that the data your routine produces is not auditable? You struggle for a minute, and say: yes - but the source system is supposed to be the system of record, oh - and the business user JJJ signed off on these requirements... The auditor continues - JJJ said you built the routines that changed the data; he's only responsible for what they DO with the data and how they interpret the results. The auditor, smiles and gives you the benefit of the doubt: show me these source systems and if you can trace this information back to the source - then I'll go find out how this data was captured and where the break is. If not, then I'm afraid I'm going to involve you in this audit too. ----------- Ok - that's the most severe case, but as Bill said many times in his career: the Data Warehouse should NEVER be a system of record - he's right, we should make it a "statement-of-fact." One that tracks data coming into the warehouse on a granular level AS-IT-STOOD in the source system, to meet auditors needs, thus allowing traceability back to the real SOR - the source system. Finally, this means moving the business logic, cleansing, quality, and aggregation to the OUTPUT side of the EDW - changing data "because of todays version of the truth" lends itself well to single data marts, when it's backed with a full normalized EDW that carries all the details. What does this mean to the implementors of integration logic? If it is a requirement to be compliant, then it means that the data must be tracked: what it was before it was changed, when it was changed, and what it was changed into. Often times, a single source system just isn't enough, or it offloads the data onto un-restorable backup systems. If the data is not "auditable" by compliance rules then you have nothing to worry about... or do you? Watch this category, more coming soon. |