Blog: Dan E. Linstedt

Compliance, Data Integration, Part 2

As strange as it sounds, statement-of-fact is exactly what the data warehouse should become. See Bill Inmon's article on his vision for a data warehouse. In this second installment we explore compliance, auditability and integration routines. Let's take a look at data we think is not auditable by compliance rules...

For a hypothetical example, let's assume we are a marketing company whose business is to gain magazine subscribers through postal mail. In our business, addresses are worth more than names, but we've seen subscription and response rates rise when we use the correct names. Of course, having the correct address saves us tons of money in postage, returns, and bounced mail.

Now suppose a goal of our new campaign is to give away a free subscription to a magazine, in hopes of gaining advertising dollars or new paid subscribers. To know what to "offer" in this free subscription, we decide to use one of our other data elements: the "gender" column, optionally provided of course. After profiling our data (basic analysis) we find that 60% of the gender values are missing or hold inappropriate values. This column is an important deciding factor for subscriptions, so we decide to run a data mining algorithm to "figure out" what the gender should be set to.

Uh-oh, we're bordering on changing data in the warehouse! Warning: this is neither compliant nor advisable. But what if we could insert the imputed value into a new table, and generate a system record source and a datetime that shows when we computed it? We've managed to leave the original data intact, and we can show the before and after, so I think we meet compliance. But it goes deeper than this....
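
To make the "new table" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything named in it is an assumption made for illustration: the tables `subscriber` and `subscriber_gender_imputed`, the columns `record_source` and `load_dts`, the source label `SYS.GENDER_MINING`, and the sample values. It is not a prescribed warehouse design. It only shows a derived value landing in its own row, stamped with where it came from and when, while the original row is never updated.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE subscriber (
    subscriber_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    gender        TEXT,              -- as received from the source; never updated
    record_source TEXT NOT NULL,
    load_dts      TEXT NOT NULL
);
CREATE TABLE subscriber_gender_imputed (
    subscriber_id  INTEGER NOT NULL REFERENCES subscriber(subscriber_id),
    imputed_gender TEXT NOT NULL,
    record_source  TEXT NOT NULL,    -- which process produced the value
    load_dts       TEXT NOT NULL,    -- when the value was computed
    PRIMARY KEY (subscriber_id, load_dts)
);
"""


def open_warehouse():
    """Create an in-memory database holding the two example tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_imputed_gender(conn, subscriber_id, imputed_gender,
                          record_source, load_dts=None):
    """Insert a derived value as a new row; the original row is left alone."""
    if load_dts is None:
        load_dts = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO subscriber_gender_imputed "
        "(subscriber_id, imputed_gender, record_source, load_dts) "
        "VALUES (?, ?, ?, ?)",
        (subscriber_id, imputed_gender, record_source, load_dts),
    )
    conn.commit()


def before_and_after(conn, subscriber_id):
    """Return (original, imputed, record_source, load_dts) rows for an audit."""
    return conn.execute(
        "SELECT s.gender, i.imputed_gender, i.record_source, i.load_dts "
        "FROM subscriber AS s "
        "LEFT JOIN subscriber_gender_imputed AS i "
        "  ON i.subscriber_id = s.subscriber_id "
        "WHERE s.subscriber_id = ? "
        "ORDER BY i.load_dts",
        (subscriber_id,),
    ).fetchall()


# Example values only: a subscriber loaded with no gender, then an imputed value.
conn = open_warehouse()
conn.execute(
    "INSERT INTO subscriber VALUES (?, ?, ?, ?, ?)",
    (1, "Example Subscriber", None, "SUBSCRIPTION_FORM",
     "2005-01-01T00:00:00+00:00"),
)
record_imputed_gender(conn, 1, "F", "SYS.GENDER_MINING",
                      "2005-06-01T00:00:00+00:00")
audit_rows = before_and_after(conn, 1)
```

Because each imputation is an insert keyed on the subscriber and the load datetime, re-running the mining process later adds another row rather than overwriting the earlier one, so the history of derived values stays available next to the untouched original.
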
After mining the data sets we discover two particular rows that we are struggling with: the data says both are Sam Smith, at two different addresses, with no gender filled in. Through mining we discovered that Sam Smith #1 subscribes to Hot-Rod Magazine, and Sam Smith #2 subscribes to Home Architecture. We mine a little further and discover that Sam Smith #1 also subscribes to Home Maker and Girls Fashion, while Sam Smith #2 also subscribes to Bowling Weekly and FHM. Finally, we produce the right gender and send out the correct invitations to additional magazine titles.

You get the idea. If one single column can hold financial value for the company, then most likely nearly all data elements can be included in this argument. Does this mean that ALL data can now be treated as an asset, valued, and insured? Well, that's another blog, but it certainly means the auditor can look at how the financial figures were produced, and within that audit they might very well look at the process that generates Gender values. Does that also mean that if the data is "incorrectly calculated" the implementer can be held accountable? That's for the auditors to decide, but I don't want to be left in that bucket without a solid, traceable set of original and unmodified information.