Blog: Dan E. Linstedt
Got Dirty SOX? EII & ADW & IQ
Recently my discussions in the field have centered on Information Quality (IQ), or the lack thereof, the EII tool set, and the Active Data Warehouse (ADW, the right-time data warehouse). We will explore this exceedingly dry (hopefully interesting) topic in this blog entry, particularly in relation to Compliance and Integration - though I felt that it fits under SOA as well - so here goes.
Information Quality (according to Larry English) includes business processes, data, reporting, and the people involved in interpreting the information. But Information Quality both helps and hurts compliance efforts, particularly when the corporation is audited. One over-simplified definition of SOX (at least at the data level) is: can your system show the "before it was changed, after it was changed, and when this change occurred" audit trail? Any "software product" that claims to be SOX compliant without being able to answer these questions is flat-out wrong.
What does this have to do with EII? I've written an EAB (executive action brief) on this site that talks about "making your data integration processes compliant" - click on B-Eye-Network, go to the HOME page, and look for the "education" link in the lower left corner. Now, if EII is _not_ transforming data, then the data set it pulls from should be "SOX compliant" - this shifts the onus back onto the source systems to maintain audit trails of any information they change. For source systems, this is a no-brainer: they are capture systems and are supposed to be the "systems of record" for the business, which means the business is already supposed to "trust" these systems - even though the data quality may or may not be there.
Time out - this doesn't make a lot of sense. Where's the quality in all of this?
Quality tools are nothing more than transformation engines (OK - they do a LOT more than that), but when it comes down to bare bones they are CHANGING DATA sets. Therefore, everything that applies to ETL/EAI and data mining (with respect to compliance) also applies to EII and to the processes that load active warehouses.
Wait a minute! Active warehouses have a refresh cycle that's too fast to put a quality trigger in play, right? What we will say is this: even ADWs still have "strategic" initiatives, which means that only the tactical side forgoes the quality settings (until the strategic quality engine cleanses the historical data, and sometimes that historical data is returned to the source during transactional/tactical processing).
Remember this: Information Quality is SUBJECTIVE. It is one version, one flavor, of the truth - truth is subjective and will change depending on the eye of the beholder (the end user). Therefore, quality engines MUST be held accountable and auditable by surrounding them with processes that capture before-after-when (BAW).
Can EII use IQ tools or data mining processes in-stream? Can ADW use IQ tools or data mining processes in-stream? Quality should come "after" the load of the raw data into the data warehouse, or "after" the load of the raw data into the EII engine. It should be secondary, and applied only if there is an audit trail mechanism in place to trace back to the original data.
Thoughts and comments are welcome; I'll blog more on the subject if there's an interest.
Cheers,
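P.S. For readers who want to see the before-after-when (BAW) idea in concrete terms, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular IQ or EII product works: the names `BawRecord`, `apply_with_audit`, and `original_value` are hypothetical, the "quality rule" is a toy string cleanup, and a real audit trail would live in durable storage rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BawRecord:
    """One before-after-when audit entry (hypothetical structure)."""
    key: str
    before: str
    after: str
    changed_at: datetime


def apply_with_audit(
    rows: Dict[str, str],
    rule: Callable[[str], str],
    audit_log: List[BawRecord],
) -> Dict[str, str]:
    """Apply a quality rule to already-loaded raw rows.

    Every value the rule changes is logged with its before value,
    its after value, and the time of the change.
    """
    cleansed = {}
    for key, raw in rows.items():
        new = rule(raw)
        if new != raw:
            audit_log.append(
                BawRecord(key, raw, new, datetime.now(timezone.utc))
            )
        cleansed[key] = new
    return cleansed


def original_value(key: str, current: str, audit_log: List[BawRecord]) -> str:
    """Trace a value back to the raw data via the audit log."""
    for rec in audit_log:  # log is in chronological order
        if rec.key == key:
            return rec.before  # earliest 'before' is the raw value
    return current  # never changed, so current is the raw value


# Raw data is loaded first; quality is applied afterwards.
raw_rows = {"cust-1": "  acme corp ", "cust-2": "Globex"}
audit_log: List[BawRecord] = []
clean_rows = apply_with_audit(raw_rows, lambda s: s.strip().title(), audit_log)
```

The point of the sketch is the ordering: the raw rows exist before the rule runs, and the rule cannot change a value without leaving a record that lets an auditor get back to what was originally loaded.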
Comments
I think I'd have to beg to differ with you on the claim that EII will have to perform a write-back to satisfy an audit.
Consider this. EII, even if it is transforming data in-stream, is effectively stateless. The only state of the EII operation is the metadata that describes the data source, the transformation rules, the exact request being issued, when that request was issued, by whom, etc. So, given a compliant set of source systems that can meet the "able to reproduce a replica of what the state of data was at a point in time" requirement, the EII tool should be able to reuse its own metadata to produce the exact same output that it produced originally. In which case, you have all of the data (from the source system) and process specification (from the metadata) to show the original results, explain how they were achieved, and explain why they might be different now.
Of course it isn't trivial to maintain metadata versioning like that. You've got to have a perfect record of the system configuration, either in source control or metadata backups. But I should think that you'd be able to fulfill compliance needs in an EII environment without forcing the system to actually log the data anywhere.
Posted by: Paul Boal | October 26, 2005 8:15 PM
Hi Paul, thanks for your comment. I appreciate the feedback, and I understand why you'd think this way.
In my opinion, no system is "safe" from audits. The truth - even in operational systems - can be questioned in court, which raises the fundamental question: is there ANY data stored anywhere that can be called "fact" without a doubt?
But beyond that, the issue I have with EII not recording what it saw before-after-when is the case where the OLTP data changes in between queries, which nullifies any chance of EII actually producing the same result twice.
In all reality though, if we assume that 80% of the data in OLTP systems stays the same and 20% changes, then I think we're safe with what you've stated, and I would agree. As you've said, given a set of source systems that can produce the data as it was at that point in time, write-back is not necessary within EII. In this case, it might just be an integrated warehouse that undertakes this task. When we get into unstructured data, we need to be careful about how and what is sourced - unless it too is included within the warehouse in one fashion or another.
Thanks Paul, great comment! :)
Dan L
Posted by: Dan Linstedt | October 27, 2005 6:52 AM