We use cookies and other similar technologies (Cookies) to enhance your experience and to provide you with relevant content and ads. By using our website, you are agreeing to the use of Cookies. You can change your settings at any time. Cookie Policy.

Blog: Dan E. Linstedt Subscribe to this blog's RSS feed!

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata.  He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology, and the Data Vault Data modeling architecture. He has built expert training courses, and trained hundreds of industry professionals, and is the voice of Bill Inmons' Blog on http://www.b-eye-network.com/blogs/linstedt/.

Recently in Compliance and Integration Category

Ok folks, I'm back.  I've spent the last six months reading, writing (and arithmetic - that's a joke... ha-ha...)  seriously, implementing solutions, visiting customer sites, and seeing what's going on in the world; and to that end, the DW/BI world is changing.  You've seen me write about this before, but now it's on-top of your head.

Like it or not...  Your Data Warehouse IS a System Of Record, and if it isn't in 2009, it WILL BE before the end of 2010.  So what does that mean?

You've heard about it for the past 2 years (at least), it's called Operational BI, and my good friend Claudia Imhoff, and my other friend Colin White have published many many articles and blog entries about these topics.  But have you given any consideration to exactly what that means to the Data Warehouse?

Come on people, get with the times!

Ok, down to brass tacks.  To have "operational BI" you must have Operational Data, well, wait a minute - operational data means system of record doesn't it? 

Ding Ding Ding, you win the prize... come on down!

Wow, didn't know there was a prize... (there's not)...  Anyhow, Operational Data is required - and if you're talking operational BI, then your talking about business intelligence, and if you mention business intelligence, then you MUST consider data warehousing.... 

So what does that mean?

It means you've GOT NO CHOICE but to consider an Operational EDW under the covers.  That's right folks, I've said it for years, now I'm not going to hide this fact from the light anymore.  YOUR DATA WAREHOUSE WILL FROM NOW ON... HOUSE OPERATIONAL DATA RIGHT BESIDE YOUR HISTORICAL DATA, this means that the operational data will be captured from applications sitting directly ON-TOP of your Data Warehouse.

Think I'm foolin do ya?

Think again...  These are the systems I've been building, or helping folks build for the past 3 years - you don't think that I've been sitting around resting on my laurels eh?  Far from it.  I've been traveling the world, talking to folks about accountability, credibility, RAW OPERATIONAL DATA sitting right inside the data warehouse.  This means that the combined result produces Operational BI.  it also means that the resulting Data Warehouse or (ODW for short) IS a system-of-record.

Don't like it?  Tough.  Take it to the bank!

Customers like JP-Morgan-Chase, Lockheed Martin, Raytheon, World Bank, ABN-AMRO, RBS (royal bank of Scotland), Sisters of Mercy Healthcare, Blue-Cross Blue Shield, Department of Defense, Food and Drug Administration, Federal Aviation Administration, City of Charloette (North-Carolina), SNS Bank Netherlands, and more have all used these principles to create complete ODW's in house today.

What do they need?

Accountable, Auditable, Flexible and scalable data stores that act as an EDW while at the same time absorbing real-time transactional (operational) data directly in to the EDW.  Thus creating a RAW ODW.  It's here today folks, fed by web-services (SOA), batch, and ESB's - managed and edited by applications that control metadata, carrying changes and audit trails along with cube-write-backs and data edits, that's what ODW and Operational BI are all about.  It's making the systems more robust, more scalable, more complete.

At the risk of upsetting some folks: And ODW is the underpinning of successful OBI.

The ODW contains and inherits all the responsibilities of the data warehouse.  In addition, because it houses operational data it also inherits all the responsibilities of an operational system!!!  Yep, you guessed it: 9x9 up-time, reliability, fault-tolerance, fail-over, garaunteed access, high speed transaction handling, on-top of: strategic data handling, historical record keeping, integration, etc...  And don't forget: system-of-record.

But why is it a system-of-record?  I though other operational systems handled this?

Each operational system handles their own system of record, but with an ODW - think about it...  you're saying: My data warehouse/BI solution is an operational system in addition to being a data warehouse.  So, you're saying: this is the only place where a complete, single picture of the integrated operational and historical data lives.  Which in essence means: you have built what needs to be a system-of-record.

What's involved in creating a system-of-record for the ODW?

Good question, and I wish I had time to answer it....  Ok - here's my opinion: you need to worry about all the operational "rules and regulations and governance", you also need to worry about storing RAW (uncleansed, unchanged, unaltered) data sets in your data warehouse.  That means, your "business rules" to mainpulate data can no longer be executed up-stream of your data warehouse.  You must move your business rules downstream of the ODW, otherwise you're changing operational data to make it fit, and blowing up any chance of an auditable system-of-record.  ESPECIALLY when 3 months go by and the business rules change!!

Are you saying I need "RAW" data in my ODW to make this work?

That's exactly what I'm saying.  I'm not saying get rid of the business rules, I'm not saying put "bad" data in front of business users.... No, I am saying "move the processing downstream" to POST-ODW (between ODW and Data Mart delivery).  It will make your life MUCH easier.

This is absured, I've never heard of anything like this before.  Why should I care?

You don't have to care if you don't want, but the world is a changing place.  If you want an auditable and compliant ODW, or you want to add operational data to your data warehouse you will have to follow these rules, dare I say it: you might find yourself out of a job if you don't!

All the companies I mentioned above, and more, every day are learning this the hard way (through failed audits, through corporate restructuring, through spread-marts where cost has gotten out of hand)...  They have decided to build ODW's to solve these problems.

It just so happens, they are using the Data Vault Modeling to make this happen, you don't need a Data Vault model to get here, you must however follow the principles outlined in this article - but the Data Vault is an enabler that will help you get here faster.  You can read more about the Data Vault Model at: http://www.DataVaultInstitute.com

or contact me directly: DanL@DataVaultInstitute.com


Dan Linstedt

Posted October 28, 2009 12:46 AM
Permalink | No Comments |

There are many definitions of governance, compliance, and accountability in the industry, and it seems as though many of us are struggling to define it for the EDW/BI space in some generically acceptable way.  As I've recently researched these subjects (and been involved in them for years) I've noticed a trend that many of the definitions are vertically challenged (if you will).  They focus on a vertical industry rather than on horizontal enterprise data warehousing. 

In this entry I'll add my two cents to this noise as just another voice of opinion on these subjects.

Governance can be defined many different ways, there are hundreds of definitions there - and in B-Eye Best practices section of this site, there are discussions, white-papers, and so on.  Many resources to define governance.  My opinion on the matter is as follows:

* Governance for EDW and BI should be about managing, monitoring, optimizing, fixing whats broken, and producing ever increasing quality work product.  It involves people, process, and technology - all aspects of the game, and it's focus should be horizontally driven across lines of business in order to meet the needs of the enterprise and BI initiatives.

* Compliance for EDW and BI should be about the meeting agreements, showing progress in governance, between IT and internal business, and the enterprise and it's external customers.  Good "compliance" means (in my opinion) meeting service level agreements between organizations both inter-organization and intra-organization.  Applying compliance in the EDW BI landscape means upholding standards and delivering quality auditable information across the business.  Service level agreements should cover the "data" that is delivered, as well as the people and processes used to meet the needs.  Compliance means setting standards that multiple parties agree to, and then reaching those levels of delivery.


* Accountability for EDW and BI team should be about the ability of business to stand up and take ownership of mistakes made, errors and omissions in meeting both compliance and governance initiatives.  It should be horizontally focused across the enterprise, and should include the data warehouse information delivered in reports on a daily basis.  This includes the raw-data-store for good-bad-and-ugly data in the EDW, along with the "quality cleansed" data stores used in data marts, along with ensuring that numbers match at a corporate level when necessary.  Accountability is all about the people taking ownership of their mistakes, admitting errors, omissions, and optimization opportunities at the process level, then applying this to the systems.

These are my introductory thoughts, I would love to hear from you about what you would add, change, or remove from these definitions.


Dan L, DanL@GeneseeAcademy.com

Posted May 6, 2009 6:51 AM
Permalink | No Comments |

In this entry we explore the nature and notion of compliance - specifically Sarbanes-Oxley and what it means to your EDW. I've been working with compliant based systems for years. Over the years I've learned about data as an asset, that is: data in the EDW affecting the financial bottom line. I've learned about audits and auditability (been through a few of them myself). In this "series" I will first explore sarbanes oxley, then follow with CoBIT, ITIL, SEI/CMMI Level 5, and a few other things. Please let me know what you think of this entry/series and if you'd like to see more.

Let's start with a few definitions:

1. Public Company Accounting Oversight Board (PCAOB)
2. Auditor Independence
3. Corporate Responsibility
4. Enhanced Financial Disclosures
5. Analyst Conflicts of Interest
6. Commission Resources and Authority
7. Studies and Reports
8. Corporate and Criminal Fraud Accountability
9. White Collar Crime Penalty Enhancement
10. Corporate Tax Returns
11. Corporate Fraud Accountability

How does this tie to my EDW/BI initiative?
Very interesting question, to which I have an opinion. My opinion is as follows: I believe that data is an asset within our organizations. Now before you run off to tell the world that "only good data is an asset", let me back up. Good, Bad, and Indifferent - data is an asset - regardless of how it's perceived. Data that is captured, or created on the fly is an asset. It doesn't matter if it's good or bad data. Besides, who determines which label to place on the data?

With data as an asset, it affects the bottom line financials. Financial decisions are made based on data every day, sometimes every second. In some cases (like NASA), data affects peoples lives. Clearly, data is worth something on the financial books.

Ok - so how do you value it?
That's a discussion for another day.

Now that data is seen as an asset to the corporation, and that it's considered tied to financials, it should be available for audits, and compliance. The compliance must come from the people themselves within the organization; however the data can shed light on the firm's compliance or non-compliance abilities. In other words, the data can tell the auditors: "what the company knows, and how they are reacting to the situation." The data can also help determine the "net-worth" of the organization.

Sarbanes-Oxley_Act (a little further down says)
Auditing Standard No. 5
* Assess both the design and operating effectiveness of selected internal controls related to significant accounts and relevant assertions, in the context of material misstatement risks;
* Understand the flow of transactions, including IT aspects, sufficient enough to identify points at which a misstatement could arise;
* Evaluate company-level (entity-level) controls, which correspond to the components of the COSO framework;
* Perform a fraud risk assessment;
* Evaluate controls designed to prevent or detect fraud, including management override of controls;
* Evaluate controls over the period-end financial reporting process;
* Scale the assessment based on the size and complexity of the company;
* Rely on management's work based on factors such as competency, objectivity, and risk;
* Conclude on the adequacy of internal control over financial reporting.

Ok, I can see how source systems are affected, but how does this tie to my EDW?
The EDW must house "A SINGLE VERSION OF THE FACTS for a specific point in time." (see Data Vault Modeling and Methodology e-learning on http://inmoninstitute.com) The Data must tell a story of what the company DID and how they REACTED to a specific situation that occurred within the organization. The data in the EDW must create an AUDIT TRAIL of decision making along the way. The EDW is crucial to uncovering the facts about what people knew when. It MUST become a system of record "capture mechanism" in order to meet compliance initiatives.

Wait a minute, that's a big leap - I don't follow...
You're not alone. Many people around the world are now discovering that the only way to uncover corruption, fraud, or pure misjudgment is to look at the good, the bad, and the ugly data in the EDW - and how it changed (or didn't) over time. The EDW tells the story of the companies' evolution, ranging from new source data, to changing of the business rules. Ok, back to the point:

How can you "assess the effectiveness of audit controls" without looking into the EDW for a data trail of how the company is operating? Especially if you are warehousing the financial systems...

How can you "Understand the flow of transactions" without tracking how the flow's business rules changed the transactions along the way? An EDW should capture the history of the raw transactions BEFORE and AFTER the changes in order to meet compliance.

SOX 404 compliance costs represent a tax on inefficiency, encouraging companies to centralize and automate their financial reporting systems. This is apparent in the comparative costs of companies with decentralized operations and systems, versus those with centralized, more efficient systems. For example, the 2007 FEI survey indicated average compliance costs for decentralized companies were $1.9 million, while centralized company costs were $1.3 million.[28] Costs of evaluating manual control procedures are dramatically reduced through automation.

Regarding costs, the EDW is meant to be a centralized repository of information. The Sarbanes-Oxley auditor should be asking to view the financial reports from three directions - using triangulation to spot discrepancies.

Auditor to the firm:
Direction 1: Show me today's financial reports from today's data... (firms response: ok, either from the EDW or from the operational systems) - usually this will come from an "OPERATIONAL DATA WAREHOUSE" or a system using operational BI.

Direction 2: Show me yesterdays' financial reports - reproduce them for me using yesterdays' routines, and yesterday's data.... don't just grab your "old hard-copy"... (firms response: ok - from the EDW, and the backed-up routines, and yesterday's data mart).

Direction 3: I see errors, discrepancies between the two reports... Now, show me the RAW detail data that went in to yesterdays report, and the RAW detail data that went in to today's report. (Firms response with a "version of the truth" warehouse is: UH-OH, we're in trouble.... Firms response with a Data Vault is: No problem)

Data is an asset, data affects the financial bottom lines. RAW data needs to be tracked in the EDW in order to be compliant with Sarbanes-Oxley. Auditors will ask to see this information, and the EDW better have it.

*** Compliance initiatives are difficult (if not impossible) to meet without a historical tracking of RAW data sets, integrated, and stored in the EDW ***

Changing the data on the way IN to your EDW can cause a compliance audit failure in the future, especially if the source system is retired, is destroyed, or is unable to "restore" the system of record that created the data in question. The EDW is the ONLY place in the future to house this information.

I will be continuing my series on auditability, compliance here - but you can also find out more by registering, and watching new on-line courses on http://inmoninstitute.com - I will have some courses available by February 15th, 2009 about auditability and compliance and the EDW.

I will continue my series as well, in discussing governance controls, and accountability as we move forward.

I'd like to hear about your thoughts/experiences. Please reply with comments below.

Dan Linstedt
CIO, Genesee Academy, LLC

Posted January 20, 2009 4:45 AM
Permalink | 2 Comments |

Well, it's happened again. IT is trying desperately to eliminate the value of the EDW from the business (at least this is what I see). Business is responding by demanding the creation of Master Data systems. There seems to be an age-old argument in the market space about the use of, definition of, and condition of: Business Keys. IT appears to be telling people to use surrogate keys and to ignore the business keys entirely. In this entry we will explore this single notion, and see what some folks have to say about it (me included!) Mind you, this is a bit of a rant; they seem to know how to "get my goat" as they say...

I start off by stating, Codd & Date designed normalized forms to have business meaning. They insisted that Business Keys be utilized in order to "make sense of" and "tag" the data sets appropriately so that relationships can be understood and maintained.

A simple link to a temporal database book houses a brief entry on the Information Principle: See it here.

I'd like to say a word or two about business keys (which by the way, you'll be able to find additional information on my videos on YouTube: http://www.youtube.com/dlinstedt/

What are business keys?
Business keys ARE the master information that unlocks context for business users. Business keys are (often) intelligent keys that have MEANING to the business. Business Keys are often alphanumeric, parts of which may be generated sequences, other parts have meaning based on position. In any event, business keys are USED by the business to locate, identify, and track information through the business life-cycle. Without them, business may not be able to "use" or apply the information properly.

What business keys are NOT:
Business keys are NOT surrogates, NOT sequences, NOT ordered numeric elements assigned based on technical insertion rates. Surrogate keys should NEVER be shown to the business, ever.... They should be used within a system (internally only) to identify rows to the machine, and provide optimal join paths, but they should NEVER appear on reports, screens, or anywhere that the business can see them.

Is there an argument around business keys versus surrogate keys?
You bet! Check out these comments:

"Dimensions should always use a surrogate key that is generated within the warehouse. I went to a presentation a couple of years ago by Ralph Kimball (a data warehouse author), and he discussed the importance of removing the warehouse's dependency on business keys. The idea is a good one, because business keys change regularly and this will result in a long-term problem for the warehouse. However, when we discussed Slowly Changing Dimensions (especially ones that kept history), he said that we should use the business key to link them together. This went against what he had just said, so I decided that we needed to find another solution." http://expertanswercenter.techtarget.com/eac/blog/0,295203,sid63_tax298150,00.html



WHY are these people demanding that there is no value to business keys?
Because it's a very tough problem for business to overcome. Yet the business today is ASKING, Begging, pleading for answers from Master Data Sets. I maintain that you cannot build a master data system without looking at and using business keys as a central HUB of information.

Why not surrogates?
If I ask you to look up surrogate key 5, do you understand what this is? where it came from? what it is bound to? Does it give you _any_ context at all as to which system generated the number? Do you even know where to begin to find this key?

Surrogate numbers are generated today in EACH source system. In the Data Warehousing world we are responsible for integrating MULTIPLE systems at once into a single place. If we rely solely on these "surrogate keys" and completely ignore business keys as has been suggested by the links above, our EDW would never mesh or align for the business. Furthermore trying to build a master data system would be impossible. Some of these individuals I listed even went so far as to say: "ignore the business keys in your dimension entirely, because it is unruly (null) most of the time".

I say rubbish. If your business is not properly synchronizing, populating, or utilizing business keys then they are hemorrhaging money along their business process. Business keys are vital to the traceability of information ACROSS lines of business and ACROSS systems.

Take a look at what I say about business keys:

Bottom line, Business keys are imperative that they span the systems. If the business keys are changing, or are re-used, the business is LOSING MONEY. I will take that to the board of directors level every single time, and every time - I can find busted and broken business problems and lack of visibility ACROSS the organization in accordance with their lack of regard for business keys.

The ONLY thing one has to do is look at the businesses that want master data systems - how are you (IT) going to integrate the data sets by surrogate if the surrogates generated by source system ARE THE SAME across multiple sources? WHICH surrogate are you going to show to the business as the "MASTER KEY" for which pieces of information? It's a near impossible problem to solve, the business units will fight over the definition, and it will come down to politics as to who is right/wrong, when the business REALLY should be deciding how to fix the source of the problem: lack of a single business key.

Auto manufacturers figured it out long ago, they use VIN (vehicle identification numbers) to uniquely identify: make, model, manufacturer, date of manufacturer, size of engine, and so on. Unless you are doing something illegal, the VIN does not change, nor does it go away. What would happen to the world of car's if the VIN disappeared?

We have the SAME question in the world of counterfeit drugs... Unfortunately E-Pedigree as a country wide solution has been lobbied down, and pushed back. Each bottle was to be labeled and identified as a unique bottle using a very specific bar code. It would have allowed the entire industry to sort out the MOST of the counterfeit drug problem, and save people’s lives.

You can sit there and tell me that "Business Keys don't matter" but at the end of the day, I will say: you are losing money, and quite possibly people are dying without them.

Dan Linstedt
Check out WHY business keys are important, learn about the Data Vault Model.

Posted November 2, 2008 10:07 PM
Permalink | 3 Comments |

In part 7 of this series I mentioned that I would share how to number deliverables of the project to assist in monitoring progress, and managing metrics (KPA's and KPI's of the project). In this entry I provide some very simplistic starting blocks on how this is done within a project methodology, and hopefully - there's light at the end of the tunnel where we can begin to see the impact on the risk, estimations of hours, cost measurements/forecasts, and actuals for delivery. This entry is all about Project Management and deliverables - how to tie them together.

If you are SEI/CMMI certified, or are an auditor, I would love to have your feedback and comments regarding these subjects. If you are PMP or Six Sigma or TQM familiar, it would be good to hear from you as well - please, tear down these ideas if they do not fit with your experience. I can only learn from you if you respond. :)

In my past 15+ years (now going on 20 years) of IT experience I've spent maybe 6 to 8 years managing projects (technical project management). One of the hats I wore was not only business analyst, but also full project manager, and team enabler. Under this guise of lean-initiatives and cycle time reduction, I learned a few things that seemed to make sense at the time.

I got tired of estimates that didn't match the project plan, couldn't be scoped properly, or attached to actuals and deliverables. I got tired of risk running rampant and killing projects before they started. I got tired of always being asked: "how complete are you and your team on task X?" the real question I got was: "How close to done are you with the requirement... you know, the requirement that discusses ZZZZZ...."

I needed a way to track all of this, and furthermore to be able to press a button and run some analytical reports / graphs (every day) on the project as we moved along. So, taking from technical writing requirements, and from SEI/CMMI and from the legal profession (which I only know they number every paragraph).... I started numbering everything I could find.

For instance, I went through the business requirements, numbered all TITLES and SUB-TITLES, and paragraphs.
1.0 Requirements Overview
1.1 Requirement 1
1.1.1 Response time for req. 1 The expected response time........ (paragraph)
1.1.2 Types of queries...

You get the idea, next I numbered the technical requirements to mesh with the business requirements. I aligned the requirements to match up in a matrix of "this is what they want, and this is how we propose to build it." This was appropriately called "IT alignment" (at least it was in the '80s... Then, I took the technical requirements and began numbering EVERY line-item in the project plan. By the way, this became a GREAT way to spot requirements (stated) that were missed in the project plan... interesting loop-hole catch.

I then thought to myself: Self.... (just kidding)
I took the Project plan, and assigned roles rather than people (as responsible parties). To which I developed a roles & responsibilities document, and then numbered that too (independently of the requirements). RR1.0, RR1.1, RR2.0, etc.... I took the R&R numbering system and attached them back to the technical requirements, then assigned resources to the roles and responsibilities, and to each of the resources - I assigned resource loading. This ended up becomming the work-breakdown structure (Project + Tech requirements + RR)

Next, I created an organizational breakdown structure (org chart), and developed an escalation path for each role, numbering each element in the org-chart as I went, assigning RR1.1 to a specific org unit. Now we knew where the risk would be handled, or escalated as things got hot (if they got hot). Next on the list was a process breakdown structure (AS-IS process flows). We needed to know how the data currently moved from one business unit to another, from one system to another. We developed process flows at 30,000 feet and above - then numbered all of them with the appropriate business requirements number (which tied the artifacts to specific components of the project plan).

Then, we immediately began designing new (to-be) process flows, which re-defined some of the interfaces, and how the data would flow to the warehouse, out to the marts, and back to the reports in the business users' hands. We then numbered each of these to-be process flows with the "original process flow numbers" tying them together. As we built the "to-be flows" and completed the process re-design, we could attach these components to mile-stones reached within the project and produce deliverables consisting of data and process to the business users.

Finally, we went through each major section of the technical requirements and assigned risk analysis templates by applying expected skill sets (balanced against the R&R, and the org chart, and availablility) - we developed a low,medium, and high risk score. We then set a threshold for warning (approaching high risk) where we would begin escalation procedures up the Org Breakdown structure.

Needless to say, there were many other deliverables (all docs were versioned in keeping with CMM), all processes were measured, quantified, and then optimized, and the Data warehouse (now some 15 years later) is still running strong.

Ok - you think this is a lot of work? We did this with a team of 3 people, + 1 person from the PMO (proj. Mgmt Office), + 1 DBA part time, + 1 senior/expert data modeler/data architect part time. And we accomplished delivery of the full production warehouse inside 6 months for 3 source systems (Planning, Manufacturing and Finance). The EDW consisted of 60+ tables, source systems around 300+ tables with manufacturing bill of materials.

It can be done, with the right people, the right training and the right expertise - and the benefits can be enormous. It doesn't take a huge bankroll to institute this type of "project governance" or maturity model for EDW projects, just dedication and consistency.

Hope this helps,
Dan Linstedt

Posted September 24, 2008 8:51 PM
Permalink | 3 Comments |
PREV 1 2 3 4 5 6

Search this blog
Categories ›
Archives ›
Recent Entries ›