January 2009 Archives

We live in a world saturated with hand-held devices.  These devices can show TV, stream movies, browse the web, and provide interactivity via iconography and touch panels.  Yet we have serious problems with the delivery mechanism of an EDW and BI on these devices.  Yes, we can produce graphs, charts, and web-based reports - but it all appears to be back-ended by large-scale systems that must be up all the time, and we have to be connected to the web in order to "work" with our applications.

There are needs out there beyond the connected world, which I will explore in this entry.

I've used my daughter's Nintendo DS.  I've seen my business partner's iPhone look-alike, and of course, I've seen all of this technology out in the field.  What I marvel at is the advancement of database appliances on the server side.  I really like the column-based database approach to housing data - especially if it's a small data set (less than 100 terabytes).  I'm familiar with Netezza, Dataupia, Vertica, ParAccel, and a few others.

I've lately been hearing about a few needs from customers:

1) the need to have a centralized management architecture and framework

2) the need to have centralized governance processes

3) the need to have de-centralized data stores for privacy and ethics reasons (again controlled centrally), acting as a local cache of specific segmented data sets

4) the need to be able to access data on-network, and off the network - and when the network appears or is available, the device automatically synchronizes with the master EDW store.

5) the need to have real-time data, and an operational application directly on TOP of the EDW, so that history is available, but operational activities can take place in keeping the data fresh, or applying it to day to day activities (along the Operational Data Warehousing side).

6) the need for the centralized EDW data store keeping all history for compliance and accountability.

Now I think to myself, with all these wonderful advances in high-speed, MPP, and parallel computing we can easily achieve #6 (VLDW, high speed centralized EDW) etc... And with all these wonderful advances in hand-held devices, why can't we take advantage of them?

Well, I said quite a while ago that I believe individuals in BI will switch their delivery mechanisms to Adobe Flash-like platforms. We see it now with Microsoft's competing platform, Silverlight, and of course with things like QuickTime and Final Cut Studio.  Interactive video, and user interfaces on "video-like" delivery systems, will begin to make a difference in how we write our applications and interface with our data.

BUT: we still need local data stores.  We've seen a rise of in-memory databases and object-oriented data stores, but where-o-where are the column-based databases?  Given the massive compression ratios they get, the high-speed access they have, and the ability to load trickle data quickly (not to mention that adapting the physical data architecture by adding and removing columns is very easy), I would have expected them to have come to market on a hand-held device by now.

These are the capabilities I would love to see one of these column-based appliance vendors make available:

1) Column based data stores with in-memory pinning, pre-configured on a FLASH drive that can plug & play with an I-Phone like device

2) Application bundle on-top of the column based database for OLTP purposes.

3) partial historical data store acting as a local cache - available from column based data store

4) minimal configuration parameters, such as HTTPS addresses, and FTPS for auto-synchronization of data set changes when the network is available.

5) simple "on/off" switch to control when synchronization takes place

6) encryption/decryption of the data set both in storage, and in transit - while the device talks to the main EDW mothership.

7) each device data is scrambled by different keys (multiple keys).

8) Flash-based or interactive-video-based application (still with forms and such) to collect data and feed it via SOAP/XML or a web-service protocol to and from the column based database.

9) additional ability to define application logic with functions embedded in the column based appliance
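Items 4 and 5 above (auto-synchronization when the network is available, with a simple on/off switch) can be sketched roughly as follows. This is only an illustration under stated assumptions: `LocalCache`, `upload`, and `on_network_available` are hypothetical names, the local store is a plain Python list rather than a column-based database, and the transport (HTTPS, FTPS, or otherwise) is left to whatever `upload` function is supplied.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LocalCache:
    """Hypothetical on-device store that queues changes until it can sync."""
    upload: Callable[[List[dict]], bool]   # assumed transport to the central EDW
    sync_enabled: bool = True              # the simple on/off switch (item 5)
    rows: List[dict] = field(default_factory=list)     # local cache of data
    pending: List[dict] = field(default_factory=list)  # changes not yet sent

    def write(self, row: dict) -> None:
        # Every change is kept locally and also queued for the central store.
        self.rows.append(row)
        self.pending.append(row)

    def on_network_available(self) -> int:
        """Push queued changes to the EDW; return how many rows were sent."""
        if not self.sync_enabled or not self.pending:
            return 0
        batch = list(self.pending)
        if self.upload(batch):
            # Clear the queue only after the central store accepted the batch.
            del self.pending[:len(batch)]
            return len(batch)
        return 0
```

The device keeps working off the network because `write` never depends on the connection; only `on_network_available` talks to the central store, and a failed upload leaves the queue untouched for the next attempt.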

Now I could be behind the times here - maybe someone out there already has this platform, and if so - I'd really like to hear about it.  If not, then I'd like to hear who may be close to this.  The point is, I can think of at least a dozen of my large clients who can use this functionality today, and I simply don't have a solution to offer them.

Hope to hear from you soon,

Dan Linstedt,

Posted January 29, 2009
Posted January 24, 2009
In this entry we explore the nature and notion of compliance - specifically Sarbanes-Oxley and what it means to your EDW. I've been working with compliance-based systems for years. Over the years I've learned about data as an asset, that is: data in the EDW affecting the financial bottom line. I've learned about audits and auditability (been through a few of them myself). In this "series" I will first explore Sarbanes-Oxley, then follow with CoBIT, ITIL, SEI/CMMI Level 5, and a few other things. Please let me know what you think of this entry/series and if you'd like to see more.

Let's start with a few definitions. The Sarbanes-Oxley Act is organized into eleven titles:

1. Public Company Accounting Oversight Board (PCAOB)
2. Auditor Independence
3. Corporate Responsibility
4. Enhanced Financial Disclosures
5. Analyst Conflicts of Interest
6. Commission Resources and Authority
7. Studies and Reports
8. Corporate and Criminal Fraud Accountability
9. White Collar Crime Penalty Enhancement
10. Corporate Tax Returns
11. Corporate Fraud Accountability

How does this tie to my EDW/BI initiative?
Very interesting question, to which I have an opinion. My opinion is as follows: I believe that data is an asset within our organizations. Now before you run off to tell the world that "only good data is an asset", let me back up. Good, Bad, and Indifferent - data is an asset - regardless of how it's perceived. Data that is captured, or created on the fly is an asset. It doesn't matter if it's good or bad data. Besides, who determines which label to place on the data?

With data as an asset, it affects the bottom line financials. Financial decisions are made based on data every day, sometimes every second. In some cases (like NASA), data affects people's lives. Clearly, data is worth something on the financial books.

Ok - so how do you value it?
That's a discussion for another day.

Now that data is seen as an asset to the corporation, and that it's considered tied to financials, it should be available for audits, and compliance. The compliance must come from the people themselves within the organization; however the data can shed light on the firm's compliance or non-compliance abilities. In other words, the data can tell the auditors: "what the company knows, and how they are reacting to the situation." The data can also help determine the "net-worth" of the organization.

Sarbanes-Oxley Act (a little further down it says):
Auditing Standard No. 5
* Assess both the design and operating effectiveness of selected internal controls related to significant accounts and relevant assertions, in the context of material misstatement risks;
* Understand the flow of transactions, including IT aspects, sufficient enough to identify points at which a misstatement could arise;
* Evaluate company-level (entity-level) controls, which correspond to the components of the COSO framework;
* Perform a fraud risk assessment;
* Evaluate controls designed to prevent or detect fraud, including management override of controls;
* Evaluate controls over the period-end financial reporting process;
* Scale the assessment based on the size and complexity of the company;
* Rely on management's work based on factors such as competency, objectivity, and risk;
* Conclude on the adequacy of internal control over financial reporting.

Ok, I can see how source systems are affected, but how does this tie to my EDW?
The EDW must house "A SINGLE VERSION OF THE FACTS for a specific point in time" (see the Data Vault Modeling and Methodology e-learning). The data must tell a story of what the company DID and how they REACTED to a specific situation that occurred within the organization. The data in the EDW must create an AUDIT TRAIL of decision making along the way. The EDW is crucial to uncovering the facts about what people knew when. It MUST become a system of record "capture mechanism" in order to meet compliance initiatives.

Wait a minute, that's a big leap - I don't follow...
You're not alone. Many people around the world are now discovering that the only way to uncover corruption, fraud, or pure misjudgment is to look at the good, the bad, and the ugly data in the EDW - and how it changed (or didn't) over time. The EDW tells the story of the company's evolution, ranging from new source data to changes in the business rules. Ok, back to the point:

How can you "assess the effectiveness of audit controls" without looking into the EDW for a data trail of how the company is operating? Especially if you are warehousing the financial systems...

How can you "Understand the flow of transactions" without tracking how the flow's business rules changed the transactions along the way? An EDW should capture the history of the raw transactions BEFORE and AFTER the changes in order to meet compliance.
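As a rough illustration of capturing a transaction BEFORE and AFTER a business rule touches it, here is a minimal sketch. Everything in it is an assumption made for the example: the `AuditedRow` record, the `load_with_audit` function, and the sample rule are hypothetical, and a real EDW would persist this history in tables rather than in a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class AuditedRow:
    raw: dict            # the transaction exactly as the source delivered it
    transformed: dict    # the same transaction after the business rule ran
    rule_version: str    # which version of the rule produced the change
    loaded_at: datetime  # when the row arrived in the warehouse


def load_with_audit(raw_rows: Iterable[dict],
                    rule: Callable[[dict], dict],
                    rule_version: str,
                    history: List[AuditedRow]) -> List[AuditedRow]:
    """Append each row to history with both its raw and transformed form."""
    for raw in raw_rows:
        before = dict(raw)        # copy, so the rule cannot alter the raw record
        after = rule(dict(raw))   # the rule works on its own copy
        history.append(
            AuditedRow(before, after, rule_version, datetime.now(timezone.utc))
        )
    return history


def no_negative_amounts(row: dict) -> dict:
    # Hypothetical business rule: negative amounts are reported as zero.
    row["amount"] = max(row["amount"], 0)
    return row


history = load_with_audit([{"id": 7, "amount": -25}],
                          no_negative_amounts, "v1", [])
```

Because the raw value (-25) is stored next to the transformed value (0) and the rule version, an auditor can later see both what the source said and what the rule did to it.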

SOX 404 compliance costs represent a tax on inefficiency, encouraging companies to centralize and automate their financial reporting systems. This is apparent in the comparative costs of companies with decentralized operations and systems, versus those with centralized, more efficient systems. For example, the 2007 FEI survey indicated average compliance costs for decentralized companies were $1.9 million, while centralized company costs were $1.3 million.[28] Costs of evaluating manual control procedures are dramatically reduced through automation.

Regarding costs, the EDW is meant to be a centralized repository of information. The Sarbanes-Oxley auditor should be asking to view the financial reports from three directions - using triangulation to spot discrepancies.

Auditor to the firm:
Direction 1: Show me today's financial reports from today's data... (firm's response: ok, either from the EDW or from the operational systems) - usually this will come from an "OPERATIONAL DATA WAREHOUSE" or a system using operational BI.

Direction 2: Show me yesterday's financial reports - reproduce them for me using yesterday's routines, and yesterday's data.... don't just grab your "old hard-copy"... (firm's response: ok - from the EDW, and the backed-up routines, and yesterday's data mart).

Direction 3: I see errors, discrepancies between the two reports... Now, show me the RAW detail data that went into yesterday's report, and the RAW detail data that went into today's report. (Firm's response with a "version of the truth" warehouse is: UH-OH, we're in trouble.... Firm's response with a Data Vault is: No problem)
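The three directions can be sketched against an append-only store of raw detail rows. This is a simplified illustration, not a Data Vault implementation: the row layout `(load_date, account, amount)`, the function names, and the sample figures are all made up for the example, and it assumes the reporting routine itself did not change between the two days.

```python
from datetime import date
from typing import Dict, List, Tuple

# (load_date, account, amount): raw detail, appended and never updated in place.
RawRow = Tuple[date, str, int]


def rows_as_of(history: List[RawRow], as_of: date) -> List[RawRow]:
    """Raw rows that had been loaded on or before the given date."""
    return [row for row in history if row[0] <= as_of]


def report(history: List[RawRow], as_of: date) -> Dict[str, int]:
    """Directions 1 and 2: rebuild the totals as they stood on a given date."""
    totals: Dict[str, int] = {}
    for _, account, amount in rows_as_of(history, as_of):
        totals[account] = totals.get(account, 0) + amount
    return totals


def explain_difference(history: List[RawRow],
                       earlier: date, later: date) -> List[RawRow]:
    """Direction 3: the raw rows that account for the change between reports."""
    return [row for row in history if earlier < row[0] <= later]


history = [
    (date(2009, 1, 19), "cash", 100),
    (date(2009, 1, 19), "revenue", 250),
    (date(2009, 1, 20), "cash", -40),
]
yesterday = report(history, date(2009, 1, 19))
today = report(history, date(2009, 1, 20))
detail = explain_difference(history, date(2009, 1, 19), date(2009, 1, 20))
```

Here `yesterday` and `today` differ only in the cash total, and `detail` holds the single raw row that explains it. If rows were overwritten instead of appended, yesterday's report could not be reproduced and the discrepancy could not be traced.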

Data is an asset; data affects the financial bottom line. RAW data needs to be tracked in the EDW in order to be compliant with Sarbanes-Oxley. Auditors will ask to see this information, and the EDW had better have it.

*** Compliance initiatives are difficult (if not impossible) to meet without a historical tracking of RAW data sets, integrated, and stored in the EDW ***

Changing the data on the way IN to your EDW can cause a compliance audit failure in the future, especially if the source system is retired, is destroyed, or is unable to "restore" the system of record that created the data in question. The EDW is the ONLY place in the future to house this information.

I will be continuing my series on auditability and compliance here - but you can also find out more by registering and watching new on-line courses. I will have some courses available by February 15th, 2009 about auditability, compliance, and the EDW.

I will continue my series as well, in discussing governance controls, and accountability as we move forward.

I'd like to hear about your thoughts/experiences. Please reply with comments below.

Dan Linstedt
CIO, Genesee Academy, LLC

Posted January 20, 2009
