Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

January 2009 Archives

We live in a world full of hand-held devices.  These devices can play TV, stream movie content, browse the web, and provide interactivity via iconography and touch panels.  Yet, we have serious problems with the delivery mechanism of an EDW, and BI on these devices.  Yes, we can produce graphs and charts, and web-based reports - but it all appears to be back-ended by large scale systems that must be up all the time, and we have to be connected to the web in order to "work" with our applications.

There are needs out there beyond the connected world, which I will explore in this entry.

I've used my daughter's Nintendo DS.  I've seen my business partner's iPhone look-alike, and of course, I've seen all of this technology out in the field.  What I marvel at is the advancement of database appliances on the server side.  I really like the column based database approach to housing data - especially if it's a small data set (less than 100 Terabytes).  I'm familiar with Netezza, Dataupia, Vertica, ParAccel, and a few others.

I've lately been hearing about a few needs from customers:

1) the need to have a centralized management architecture and framework

2) the need to have centralized governance processes

3) the need to have de-centralized data stores for privacy and ethics reasons (again controlled centrally), acting as a local cache of specific segmented data sets

4) the need to be able to access data on-network, and off the network - and when the network appears or is available, the device automatically synchronizes with the master EDW store.

5) the need to have real-time data, and an operational application directly on TOP of the EDW, so that history is available, but operational activities can take place in keeping the data fresh, or applying it to day to day activities (along the Operational Data Warehousing side).

6) the need for the centralized EDW data store keeping all history for compliance and accountability.

Now I think to myself, with all these wonderful advances in high-speed, MPP, and parallel computing we can easily achieve #6 (VLDW, high speed centralized EDW) etc... And with all these wonderful advances in hand-held devices, why can't we take advantage of them?

Well, I said quite a while ago that I believe individuals in BI will switch their delivery mechanisms to Adobe Flash-like platforms. We see it now with Microsoft's competing platform, Silverlight, and of course with things like QuickTime and Final Cut Studio.  Interactive video, and user-interfaces on "video like" delivery systems, will begin to make a difference in how we write our applications and interface with our data.

BUT: we still need local data stores.  We've seen a rise of In-Memory databases and Object Oriented data stores, but where-o-where are the column based databases?  With the massive compression ratios they get, the high speed access they have, and the ability to load trickle data quickly (not to mention that adapting the physical data architecture by adding and removing columns is very easy), I strongly believe they should have come to market on a hand-held device by now.
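To make the compression and add/remove-column points concrete, here is a minimal sketch in Python. It is not any vendor's implementation: the sample rows, the field names, and the use of plain run-length encoding are all assumptions chosen for illustration, and real column stores use far more sophisticated encodings.

```python
from itertools import groupby

# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"region": "WEST", "status": "OPEN", "amount": 10},
    {"region": "WEST", "status": "OPEN", "amount": 12},
    {"region": "WEST", "status": "CLOSED", "amount": 12},
    {"region": "EAST", "status": "CLOSED", "amount": 7},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]

columns = to_columns(rows)
encoded = {name: rle_encode(col) for name, col in columns.items()}

# Adding or dropping a column touches only that column's storage;
# the other columns are not rewritten.
encoded["currency"] = rle_encode(["USD"] * len(rows))
del encoded["status"]
```

Because each column holds values of one kind, repeated values sit next to each other and collapse into short runs; the "region" column above shrinks from four values to two pairs.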

These are the requirements I think would be awesome to see one of these column based appliance vendors make available.

1) Column based data stores with in-memory pinning, pre-configured on a FLASH drive that can plug & play with an iPhone-like device

2) Application bundle on-top of the column based database for OLTP purposes.

3) partial historical data store acting as a local cache - available from column based data store

4) minimal configuration parameters, such as HTTPS addresses, and FTPS for auto-synchronization of data set changes when the network is available.

5) simple "on/off" switch to control when synchronization takes place

6) encryption/decryption of the data set both in storage, and in transit - while the device talks to the main EDW mothership.

7) each device's data is scrambled by different keys (multiple keys).

8) flash-based, or interactive video based application (still with forms and such) to collect data and feed it via SOAP/XML or web-service protocol to and from the column based database.

9) additional ability to define application logic with functions embedded in the column based appliance
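A few of the requirements above (the local cache, the on/off switch, synchronization when the network appears, and a different key per device) can be sketched in a few lines of Python. Everything here is hypothetical: the `DeviceStore` class, the device ids, and the `send` callback standing in for an HTTPS/FTPS transport are assumptions, not an existing product. Note also that the per-device key is only used to sign each payload with an HMAC; that shows the "different key per device" idea, but it is NOT encryption. Real encryption in storage and in transit would need a proper cryptography library and TLS.

```python
import hashlib
import hmac
import json

class DeviceStore:
    """Hypothetical local cache that queues changes while off-network."""

    def __init__(self, device_id, master_secret):
        self.device_id = device_id
        # Assumption: each device derives its own key from a shared secret,
        # so no two devices protect their data with the same key.
        self.key = hmac.new(master_secret, device_id.encode(), hashlib.sha256).digest()
        self.sync_enabled = True   # the simple on/off switch
        self.pending = []          # changes not yet sent to the central EDW

    def record(self, change):
        """Store a change locally; works with or without a network."""
        self.pending.append(change)

    def sync(self, network_up, send):
        """Flush queued changes if the switch is on and the network is up.

        `send` is a caller-supplied function standing in for the real
        transport; it receives (payload_bytes, signature_hex).
        Returns the number of changes sent.
        """
        if not (self.sync_enabled and network_up):
            return 0
        sent = 0
        while self.pending:
            payload = json.dumps(self.pending[0], sort_keys=True).encode()
            signature = hmac.new(self.key, payload, hashlib.sha256).hexdigest()
            send(payload, signature)
            self.pending.pop(0)    # dequeue only after the send succeeded
            sent += 1
        return sent

# Usage: changes pile up off-network, then flush when the network returns.
received = []
store = DeviceStore("handheld-001", b"demo-secret")
store.record({"order": 1, "qty": 5})
store.sync(network_up=False, send=lambda p, s: received.append((p, s)))
store.sync(network_up=True, send=lambda p, s: received.append((p, s)))
```

Dequeuing only after `send` returns means a failed transmission leaves the change in the local queue for the next attempt, which is the behavior you would want from a device that drops on and off the network.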

Now I could be behind the times here - maybe someone out there already has this platform, and if so - I'd really like to hear about it.  If not, then I'd like to hear who may be close to this.  The point is, I can think of at least a dozen of my large clients who can use this functionality today, and I simply don't have a solution to offer them.

Hope to hear from you soon,

Dan Linstedt, danL@geneseeAcademy.com


Posted January 29, 2009 5:09 AM

I've just read Jill Dyche's excellent entry about the value of communities, and I agree - communities are important for us to collaborate and communicate.  It's why I'm happy to be a part of B-Eye-Network.com, and LinkedIn where I belong to a number of communities.  I'd like to take a minute to tell you about other communities that we've launched recently.

By the way, I won't do this very often - most of the time I like to blog about business and IT.  I felt it necessary to let everyone know what I've been up to for the past 8 months, in addition to traveling to Europe to teach the Data Vault.

Now before you start commenting on how this is an advertisement, let me say - yes, it is in a way.  But as always, I am trying to provide the business with serious value.  The value here?  Budgets are being slashed by an average of 20%, training budgets are nearly gone, yet new innovation must continue.  You can no longer afford to "go to a class", nor to "pay" to have an instructor come to you. It's tougher and tougher to get budget to attend large conferences.  We are putting content on-line, and making it affordable, so you can continue to learn and innovate at your own pace without the high cost.  We believe in the community messages, and are fostering three new communities that will really help the DW/BI space.

One major effort centers on e-learning.  You can get to this community at http://inmoninstitute.com - Bill Inmon, Hans Hultgren, and I offer on-line training for you on subjects like DW2.0, Unstructured Data, VLDW, Data Vault, CIF, and business acumen for the DW/BI landscape.  We are posting new material every week.  Register for free, and watch a few courses now.

In this day and age where IT budgets are slashed, it becomes nearly impossible to "go to a class" or to bring an instructor in.  You can take these courses at your leisure from your desk, and it's a very reasonable fee.  Anyhow, the e-learning community is the place to go for new knowledge. You can register for free, see the quality of the video and sound that we produce, and watch a few free segments.

I also launched a more intense community around Data Vault modeling called the Data Vault Institute.  The new URL will come soon, but for now, you can reach it at: http://www.danlinstedt.com/datavaultinstitute/  This community is free to register.  In this community, you will find white papers, articles, downloads, and a host of customers and IT folks using the Data Vault and discussing the business practices all over the world.  This community is more than just a technical community; it's for business users too.  Here they can find out the business value of the Data Vault, and why it helps put agility back into their IT teams.  We will also offer a FREE 3-D data model visualizer for anyone who pays to upgrade as a subscribed member. Sign up now for free!

Finally, next week, I'm launching another e-learning community - with a focus on Software Tool training.  You can see the "old-site" now, at http://www.trainovation.com - the new site is completely revamped and will be launched next week.  It will start with Informatica based e-learning: my own custom courses, finally available on-line for a reasonable cost.  So take a look at this site in about another week or so to get access immediately to new content.  I'm also interested in talking to vendors who want to post their own training courses on-line with us - we have a full production video studio.

If you like this kind of entry please comment and let me know.  If you don't like this kind of entry, likewise, please let me know.  I am curious about the feedback - and I promise, if you like it, I'll keep you up to date only once every 6 months.

You can email me directly: danL@danLinstedt.com

Cheers,
Dan Linstedt


Posted January 24, 2009 8:22 AM

In this entry we explore the nature and notion of compliance - specifically Sarbanes-Oxley and what it means to your EDW. I've been working with compliance-based systems for years. Over the years I've learned about data as an asset, that is: data in the EDW affecting the financial bottom line. I've learned about audits and auditability (been through a few of them myself). In this "series" I will first explore Sarbanes-Oxley, then follow with COBIT, ITIL, SEI/CMMI Level 5, and a few other things. Please let me know what you think of this entry/series and if you'd like to see more.

Let's start with a few definitions:

Sarbanes-Oxley Act (its eleven titles):
1. Public Company Accounting Oversight Board (PCAOB)
2. Auditor Independence
3. Corporate Responsibility
4. Enhanced Financial Disclosures
5. Analyst Conflicts of Interest
6. Commission Resources and Authority
7. Studies and Reports
8. Corporate and Criminal Fraud Accountability
9. White Collar Crime Penalty Enhancement
10. Corporate Tax Returns
11. Corporate Fraud Accountability

How does this tie to my EDW/BI initiative?
Very interesting question, to which I have an opinion. My opinion is as follows: I believe that data is an asset within our organizations. Now before you run off to tell the world that "only good data is an asset", let me back up. Good, Bad, and Indifferent - data is an asset - regardless of how it's perceived. Data that is captured, or created on the fly is an asset. It doesn't matter if it's good or bad data. Besides, who determines which label to place on the data?

With data as an asset, it affects the bottom line financials. Financial decisions are made based on data every day, sometimes every second. In some cases (like NASA), data affects people's lives. Clearly, data is worth something on the financial books.

Ok - so how do you value it?
That's a discussion for another day.

Now that data is seen as an asset to the corporation, and that it's considered tied to financials, it should be available for audits, and compliance. The compliance must come from the people themselves within the organization; however the data can shed light on the firm's compliance or non-compliance abilities. In other words, the data can tell the auditors: "what the company knows, and how they are reacting to the situation." The data can also help determine the "net-worth" of the organization.

Sarbanes-Oxley Act (the Wikipedia article, a little further down, says):
Auditing Standard No. 5
* Assess both the design and operating effectiveness of selected internal controls related to significant accounts and relevant assertions, in the context of material misstatement risks;
* Understand the flow of transactions, including IT aspects, sufficient enough to identify points at which a misstatement could arise;
* Evaluate company-level (entity-level) controls, which correspond to the components of the COSO framework;
* Perform a fraud risk assessment;
* Evaluate controls designed to prevent or detect fraud, including management override of controls;
* Evaluate controls over the period-end financial reporting process;
* Scale the assessment based on the size and complexity of the company;
* Rely on management's work based on factors such as competency, objectivity, and risk;
* Conclude on the adequacy of internal control over financial reporting.

Ok, I can see how source systems are affected, but how does this tie to my EDW?
The EDW must house "A SINGLE VERSION OF THE FACTS for a specific point in time." (see Data Vault Modeling and Methodology e-learning on http://inmoninstitute.com) The Data must tell a story of what the company DID and how they REACTED to a specific situation that occurred within the organization. The data in the EDW must create an AUDIT TRAIL of decision making along the way. The EDW is crucial to uncovering the facts about what people knew when. It MUST become a system of record "capture mechanism" in order to meet compliance initiatives.

Wait a minute, that's a big leap - I don't follow...
You're not alone. Many people around the world are now discovering that the only way to uncover corruption, fraud, or pure misjudgment is to look at the good, the bad, and the ugly data in the EDW - and how it changed (or didn't) over time. The EDW tells the story of the company's evolution, ranging from new source data to changes in the business rules. Ok, back to the point:

How can you "assess the effectiveness of audit controls" without looking into the EDW for a data trail of how the company is operating? Especially if you are warehousing the financial systems...

How can you "Understand the flow of transactions" without tracking how the flow's business rules changed the transactions along the way? An EDW should capture the history of the raw transactions BEFORE and AFTER the changes in order to meet compliance.
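Here is a minimal sketch of what "capture the raw transactions BEFORE and AFTER the changes" can look like. It is a generic illustration, not the Data Vault model itself: the `RawRecord` and `RawHistory` names, the invoice data, and the rounding rule are all hypothetical. The raw store is append-only, every row carries its load time, and business rules produce a derived copy downstream instead of altering what was received.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class RawRecord:
    source: str          # which source system delivered the row
    business_key: str    # identifier of the business object
    payload: dict        # data exactly as received, untouched
    loaded_at: datetime  # when the warehouse captured it

class RawHistory:
    """Append-only store: rows are inserted, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, record):
        self._rows.append(record)

    def as_of(self, business_key, moment):
        """Return the latest raw row known for a key at a point in time."""
        known = [r for r in self._rows
                 if r.business_key == business_key and r.loaded_at <= moment]
        return max(known, key=lambda r: r.loaded_at) if known else None

def apply_rules_v1(payload):
    """Hypothetical business rule: round amounts to whole dollars.

    Returns a new dict so the raw payload is left unchanged.
    """
    return {**payload, "amount": round(payload["amount"])}

history = RawHistory()
history.load(RawRecord("billing", "INV-1", {"amount": 99.6}, datetime(2009, 1, 19)))
history.load(RawRecord("billing", "INV-1", {"amount": 120.4}, datetime(2009, 1, 20)))

# What did we know at noon on the 19th, before and after the rule?
before = history.as_of("INV-1", datetime(2009, 1, 19, 12)).payload
after = apply_rules_v1(before)
```

An auditor can then ask for the facts as of any moment and see both the raw value and the value the business rule produced, side by side.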

SOX 404 compliance costs represent a tax on inefficiency, encouraging companies to centralize and automate their financial reporting systems. This is apparent in the comparative costs of companies with decentralized operations and systems, versus those with centralized, more efficient systems. For example, the 2007 FEI survey indicated average compliance costs for decentralized companies were $1.9 million, while centralized company costs were $1.3 million. Costs of evaluating manual control procedures are dramatically reduced through automation.
http://en.wikipedia.org/wiki/Sarbanes-Oxley_Act

Regarding costs, the EDW is meant to be a centralized repository of information. The Sarbanes-Oxley auditor should be asking to view the financial reports from three directions - using triangulation to spot discrepancies.

Auditor to the firm:
Direction 1: Show me today's financial reports from today's data... (firm's response: ok, either from the EDW or from the operational systems) - usually this will come from an "OPERATIONAL DATA WAREHOUSE" or a system using operational BI.

Direction 2: Show me yesterday's financial reports - reproduce them for me using yesterday's routines, and yesterday's data.... don't just grab your "old hard-copy"... (firm's response: ok - from the EDW, and the backed-up routines, and yesterday's data mart).

Direction 3: I see errors, discrepancies between the two reports... Now, show me the RAW detail data that went into yesterday's report, and the RAW detail data that went into today's report. (Firm's response with a "version of the truth" warehouse is: UH-OH, we're in trouble.... Firm's response with a Data Vault is: No problem)
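The three directions can be walked through with a small sketch. All of it is hypothetical: the dates, the account number, the two report routines, and the rule change between them are invented for illustration. The only assumption that matters is the one argued above, namely that the warehouse kept the raw detail rows per load date and that yesterday's routine is still available to re-run.

```python
from datetime import date

YESTERDAY, TODAY = date(2009, 1, 19), date(2009, 1, 20)

# Hypothetical raw detail rows as they were loaded on each day.
raw_by_day = {
    YESTERDAY: [{"account": "4000", "amount": 100.0},
                {"account": "4000", "amount": 50.0}],
    TODAY: [{"account": "4000", "amount": 100.0},
            {"account": "4000", "amount": 50.0},
            {"account": "4000", "amount": -20.0}],
}

def routine_v2(rows):
    """Today's routine: sum every row per account."""
    totals = {}
    for r in rows:
        totals[r["account"]] = totals.get(r["account"], 0.0) + r["amount"]
    return totals

def routine_v1(rows):
    """Yesterday's routine: an older business rule that ignored credits."""
    return routine_v2([r for r in rows if r["amount"] >= 0])

# The backed-up routines, kept alongside the data they ran against.
routines_by_day = {YESTERDAY: routine_v1, TODAY: routine_v2}

def reproduce_report(day):
    """Directions 1 and 2: re-run that day's routine on that day's raw data."""
    return routines_by_day[day](raw_by_day[day])

def discrepancies(a, b):
    """Accounts whose totals differ between two reports."""
    return {k: (a.get(k), b.get(k)) for k in set(a) | set(b) if a.get(k) != b.get(k)}

def raw_delta(day_a, day_b):
    """Direction 3: raw rows present on one day but not on the other."""
    a, b = raw_by_day[day_a], raw_by_day[day_b]
    return [r for r in b if r not in a] + [r for r in a if r not in b]

report_yesterday = reproduce_report(YESTERDAY)
report_today = reproduce_report(TODAY)
diff = discrepancies(report_yesterday, report_today)
explanation = raw_delta(YESTERDAY, TODAY)
```

Here the two reports disagree on account 4000, and the raw delta points straight at the -20.0 credit row that arrived today. If the raw rows had been changed or filtered on the way in, `raw_delta` would have nothing to show the auditor.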

Data is an asset; data affects the financial bottom line. RAW data needs to be tracked in the EDW in order to be compliant with Sarbanes-Oxley. Auditors will ask to see this information, and the EDW had better have it.

*** Compliance initiatives are difficult (if not impossible) to meet without a historical tracking of RAW data sets, integrated, and stored in the EDW ***

Changing the data on the way IN to your EDW can cause a compliance audit failure in the future, especially if the source system is retired, is destroyed, or is unable to "restore" the system of record that created the data in question. The EDW is the ONLY place in the future to house this information.

I will be continuing my series on auditability and compliance here - but you can also find out more by registering and watching new on-line courses on http://inmoninstitute.com - I will have some courses available by February 15th, 2009 about auditability, compliance, and the EDW.

I will continue my series as well, in discussing governance controls, and accountability as we move forward.

I'd like to hear about your thoughts/experiences. Please reply with comments below.

Thank-you,
Dan Linstedt
CIO, Genesee Academy, LLC
DanL@GeneseeAcademy.com


Posted January 20, 2009 4:45 AM