


Accountability (or lack thereof) has long been a problem with integration projects, including data warehouses. Business users tell IT: the warehouse is wrong, and you know what? They're right! Why? Because IT implemented business rules to merge data too far upstream. They've lost traceability, and therefore accountability, for the facts.

Not only is accountability a problem, but technically speaking, handling real-time data alongside batch data is also a problem. Most modeling architectures don't support this, which results in increased costs to re-architect when moving from an EDW to an ADW.

Here's a real-life case about accountability that I ran into over 10 years ago. Manufacturing costs had never matched finance costs. For some reason, financial reports of build processes had been off by X dollars for over 15 years. When we built the warehouse, we followed this paradigm:

Raw data pull, raw data integration (no change to the grain, no merging), then business rules to merge, mix, and match were applied going FROM the warehouse into two places: Error Marts and "real marts" - the business version of the truth. Some business users were stating that the "real marts" were right; others were complaining that the reports they were getting were wrong, and placing the blame incorrectly on the Data Warehouse project.
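Here is a minimal sketch of that downstream split - business rules applied only on the way OUT of the warehouse, routing rows into an Error Mart or a "real mart". This is not the original project's code; the table and column names (cost_center, build_cost, record_source, load_dts) and the sample rule are hypothetical, assumed purely for illustration.

```python
# Sketch: apply business rules downstream of the warehouse, routing rows
# into a "real mart" or an Error Mart. The raw warehouse row is never altered.
# All names and the rule itself are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class WarehouseRow:
    record_source: str   # which operational system the row came from
    load_dts: datetime   # when the warehouse loaded it
    cost_center: str
    build_cost: float

def passes_business_rules(row: WarehouseRow) -> bool:
    """Example rule: costs must be non-negative and tied to a cost center."""
    return row.build_cost >= 0 and bool(row.cost_center)

def route_to_marts(rows):
    """Split warehouse rows into the business mart and the error mart."""
    real_mart, error_mart = [], []
    for row in rows:
        (real_mart if passes_business_rules(row) else error_mart).append(row)
    return real_mart, error_mart

rows = [
    WarehouseRow("MFG_SYS", datetime(2005, 6, 1), "CC-100", 1250.00),
    WarehouseRow("FIN_SYS", datetime(2005, 6, 1), "", -37.50),  # fails the rules
]
real_mart, error_mart = route_to_marts(rows)
print(len(real_mart), "rows to the real mart,", len(error_mart), "to the error mart")
```

The point of the sketch is the placement: the rules live in the mart-building step, so the warehouse itself stays a raw, traceable statement of fact.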

This came to a head quickly when executive decision makers got wind that the business believed the warehouse was wrong because it didn't match the operational reports they were getting. The executives hired an auditor to come in and see whether the warehouse was wrong (this was long before SOX and Basel II). The auditor came to us and said: show me how these financial reports are produced, what the data looks like BEFORE it was changed for business usage, and what it looks like after it was changed - and furthermore, can you tell me when it was changed?

We walked the auditor through the separation of data into the Error Marts and the Financial Marts, and we showed him the business rules that the business users had asked for - in the cycle of loading both Error Marts and Financial Marts. We traced the data through the aggregation, integration, and cleansing components. He validated and verified our information.

Then we showed him the compliant STATEMENT OF FACT in our warehouse: each row with a record source and a load stamp - where it came from and when it came in from that source - and the fact that the data was not MERGED, nor was it cleansed; it was a raw snapshot of what it looked like in the source system AT THAT TIME. He then independently verified (for current data) that we were indeed loading raw data that matched the operational systems.
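A minimal sketch of that statement-of-fact load, assuming a simple list-of-dicts staging feed: each incoming row is kept exactly as the source system delivered it, with only the audit columns added. The field names ("MFG_SYS", record_source, load_dts) are illustrative, not the original project's schema.

```python
# Sketch: stamp each raw row with its record source and load time,
# with no merging and no cleansing. Names are illustrative assumptions.
from datetime import datetime, timezone

def load_statement_of_fact(source_rows, record_source):
    """Copy source rows unchanged, adding only the audit columns."""
    load_dts = datetime.now(timezone.utc)
    return [
        {**row, "record_source": record_source, "load_dts": load_dts}
        for row in source_rows
    ]

# Usage: the stored row still matches the operational system at load time.
raw = load_statement_of_fact(
    [{"work_order": "WO-123", "build_cost": 1250.00}],
    record_source="MFG_SYS",
)
print(raw[0])
```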

He walked over to the operational report on the operational system, looked at the aggregations and merging - and declared that the warehouse was right and the operational report was wrong; furthermore, it had been wrong for 15 years! On that day, the business learned the value of true warehousing (statement of fact) and the separation of data from information - where the business rules were placed into the building of the marts.

They found out that 1) if the warehouse team had loaded and matched to the operational report, the warehouse really would have been wrong (it would have propagated the accounting error), 2) they had mis-billed their client for 15 years, and 3) the only reason they could discover this error was that we had a single statement of fact in our warehouse, and we left the version of truth to the building of the marts. Needless to say, they corrected the problem and reported it to their customer - and were subsequently awarded more contracts because they were finding and fixing their own BUSINESS FLAWS.

For the first time in 15 years, we saw a tremendous rise in Change Requests on the source system to begin fixing Business Process Problems on the data capture side. We also saw change requests and investigations into the physical business processes for the same reason: they could see and audit the BAD DATA right alongside the Good Data. We had itemized visibility into our data supply chain.

End users were now accountable for business problems: explaining why they happened and then recommending fixes to the source systems, the business processes, or both. Error Marts helped separate raw data from truly usable information. This kept the queries clean and the reports accurate.

Next, we'll finish up this series with a discussion of real-time data sets. By now you should be getting the feeling that data modeling is very much a part of the success or "failure" to use information in the warehouse correctly. It should also be evident that traceability, statements of fact, and the separation of perceived "good" information from "bad" data are necessary for auditability, compliance, and business accountability. Nowhere is this more evident than with the source system data pulls and the data warehouse modeling techniques - along with the shift of the business rules to the OUTPUT side of the warehouse.

Business Rule #6:
To remain compliant and auditable, and to prove beyond a shadow of a doubt that the warehouse is NOT wrong, the correct architecture and data modeling must be put in place.

You can read more about the data modeling constructs called the Data Vault at www.DanLinstedt.com; in fact, you can ask questions of customers who are using these techniques today. See you next time.

Comments?
Dan L
CTO, Myers-Holum, Inc. http://www.myersholum.com


Posted June 2, 2005 5:34 AM