Recently my discussions in the field have centered on Information Quality (or the lack thereof) and the EII tool set as well as the Active Data Warehouse (right-time data warehouse). We will explore this exceedingly dry (hopefully interesting) aspect in this blog entry, particularly in relation to Compliance and Integration - but I felt that it fits under SOA as well - so here goes.
Information Quality (according to Larry English) includes business processes, data, reporting, and people involved in interpretation of the information. But Information Quality both helps and hurts compliance efforts, particularly when the corporation is audited.
One of the over-simplified definitions of SOX is (at least at the data level): Can your system show the "before it was changed, after it was changed, and when this change occurred" audit trail. Without being able to answer these questions, any "software product" that claims it is SOX compliant is flat out wrong.
What does this have to do with EII?
EII can be quality driven too - but in doing so, it can and will, break compliance - IF it is told to transform data in the middle without producing an audit trail of what it did, what it used, and when - this is where the Write-Back capability of EII comes in handy - we almost need a EII-SOX warehouse to record the information flowing THROUGH EII tool sets in order to meet compliance and auditability.
I've written an EAB (executive action brief) on this site that talks about "making your data integration processes compliant" - click on B-Eye-Network, go to the HOME page, and look for "education" link in the lower left corner.
Now, if EII is _not_ transforming data, then the data set it pulls from should be "sox compliant" - it shifts the owness back on to the source systems to maintain audit trails of any information it changes. For source systems, this is a no-brainer, they are capture systems and are supposed to be "systems of record" for the business, which means the business is already supposed to "trust" these systems - even though the data quality may or may not be there.
Time out - this doesn't make a lot of sense, where's the quality in all of this?
Ok - compliance is one thing, but the reason I talk about it is this: Information Quality tools CHANGE DATA under the covers, therefore in order meet compliance initiatives and be auditable, we must surround these tools with a before and after process. At the DATA level - this means that if we introduce "quality processes" in-stream with EII, we could be in serious trouble with Compliance - again, unless we record the effects (before/after and when).
Quality Tools are nothing more than transformation engines (ok - they do a LOT more than that), but when it comes to bare-bones they are CHANGING DATA sets. Therefore: everything that applies to ETL/EAI and data mining (in accordance with compliance) also applies to EII, and the processes that load active warehouses.
Wait a minute! Active Warehouses have a refresh cycle that's too fast to put a quality trigger in play, right?
Right and wrong - remember active warehouses are "right-time" warehouses, it's all about latency. However, there are active warehouses that cannot use quality initiatives in-stream, because the data decays too fast.
Now what we will say is this: even ADW's still have "strategic" initiatives to them, which means that only the tactical sides for-go the quality settings (until the strategic based quality engine cleanses the historical data, and sometimes that historical data is returned to the source during transactional/tactical processing).
Remember this: Information Quality is SUBJECTIVE, it is one version/one flavor of the truth - truth is subjective, and will change depending on the eye-of-the beholder (the end user). Therefore, quality engines MUST be held accountable and auditable by surrounding them with processes that capture before-after-when (BAW).
Can EII use IQ tools or Data mining processes in-stream?
Sure, and they probably should - especially when sourcing external or freely available data. I'm just saying that EII will have to take the extra hit, and write-back the BAW somewhere to be compliant. The challenge here is: when to initiate a quality process, and keep it so that it doesn't impact the query timing too significantly. Now if EII is pulling from the strategic side of the warehouse - wonderful, it should be pulling quality data (already altered/cleansed/patched).
Can ADW use IQ tools or data mining processes in stream?
Yes and no - depending on the latency requirements this may vary. Most ADW's at 5 minutes or less latency, don't run IQ processes in-stream, companies at this level use IQ tools plugged directly in to their source/capture systems, which raises other questions, like how do I find my broken business processes? But that's for another day, another entry.
Quality should come "after" the load of the raw data to the data warehouse, or "after" the load of the raw data into the EII engine, it should be secondary, and applied only if there is an audit trail mechanism in place to trace back to the original data.
Thoughts and comments are welcome; I'll blog more on the subject if there's an interest.
Posted October 26, 2005 6:33 AM
Permalink | 2 Comments |