

Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for Masters students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best, Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

So you're curious, are you? Have I grabbed your attention yet, or is this not making any sense? In thinking about accountability, the data supply chain, and change requests, there is one key component to making this all happen: show the bad data alongside the good. Maybe not in the same reports, but physically separate the bad data from the good depending on the severity of the rule breakage.

There's one more thing to think about: if we accept the Data Supply Chain as a paradigm, then we should keep the business key on the data unique and stable. Once assigned, always assigned. The concept goes back to RFIDs and the manufacturing supply chain.

RFIDs are used to clean up data and provide visibility into the manufacturing supply chain; they drive accountability in business and provide the means and mechanisms with which to measure and improve supply chains around the world. If RFIDs (which are nothing more than unique identifier keys) can do this for our manufacturing supply chain, imagine what a CONSISTENT business key can do for our DATA SUPPLY CHAIN!

It can become the RFID for the Data within our systems. This means businesses MUST abide by the following rules:
1. A business key must be assigned mechanically.
2. A business key, once assigned, must be unique.
3. A business key, once assigned, must never be re-used.
4. The application (CRM, ERP, HR, etc.) that assigns DUIDs (Data Unique Identifiers) will provide incredible metrics, visibility, and consistency for improving the Data Supply Chain. Such applications can begin producing the ultimate application.
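The three key rules above can be sketched in a few lines of code. This is a minimal illustration, not a real DUID service: the class name and in-memory storage are my own assumptions, and a production assigner would persist its state durably.

```python
import itertools

class DuidRegistry:
    """Hypothetical sketch of the DUID rules: assignment is mechanical,
    each key is unique, and a key is never re-used or reassigned."""

    def __init__(self):
        self._next = itertools.count(1)  # rule 1: purely mechanical assignment
        self._assigned = {}              # entity -> DUID; once assigned, always assigned

    def duid_for(self, entity: str) -> int:
        if entity not in self._assigned:                  # rule 2: one unique key per entity
            self._assigned[entity] = next(self._next)     # rule 3: the counter never rewinds
        return self._assigned[entity]

registry = DuidRegistry()
a = registry.duid_for("customer:ACME")
b = registry.duid_for("customer:ACME")
assert a == b  # the key travels with the data for its whole life
```

The point of the sketch is the contract, not the mechanics: any assigning application (CRM, ERP, HR) that honors these three rules gives downstream systems a stable handle to track data with, exactly as an RFID tracks a part.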

I'm talking about more than just simple sequence numbers; we would need an international numbering board. What if (just a thought) the RFIDs could carry the same exact number or ID as the Data Supply Chain? Of course, that would mean tagging the very smallest of parts on all of our assembly lines. Service companies would have almost no RFIDs to speak of, except maybe on invoices or paper contracts that are printed.

Hmmm, does this mean data is trackable when printed to hard copy? You bet! Imagine a filing room filled with RFIDs: how easy would it be to track down a document? Maybe the US Patent Office or the Library of Congress could undertake something like this, saving billions of dollars a year (Yea! Lower taxes?).

Next step: printers that stamp RFIDs on documents according to document numbers derived from DUIDs. So you get the point: data identifiers (business keys) are just as important as any hard-coded identifier tags we put on products. Curve ball: if we can place a value on the products we produce, and we can begin uniquely identifying our data and its elements, then there's no reason why we can't place a value on our data as well.

Back to the real world: since today we have no commonly accepted notion of truly unique identifiers, business keys will have to suffice. So what's the problem with today's business intelligence reports and systems?

The problem is, by the time the business user sees the integrated data set, every effort has been made to adjust, clean, alter, move, remove, and merge data to make it usable by the business. This is fine until we begin to question what is meant by "One version of the truth." As I've stated in several other entries, TRUTH is in the eye of the beholder and is subjective. It has NOTHING to do with the FACTS of the way the data is captured, stored, and moved around the organization.

While there is value in producing "usable information," we (as BI implementers) have long overlooked the fact that there's also value in producing the unusable facts: the raw data that is messed up, wrong, or unmatched. Everything, however, starts with the analysis of the business keys. I propose that there are really two answers, both right at the same time. Ahhh, a conundrum? Yup.

I propose that along with DUIDs, we should be storing a single statement of FACT in our warehouses, then moving those FACTS into polarized/colorized versions of the truth in the data marts. This means two basic principles apply:

1. Business rules move to the "output" side of the warehouse, between the warehouse and the marts.
2. Raw data that breaks business rules ends up in one or more ERROR MARTS.
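The two principles above can be sketched as a single routing step that runs on the way OUT of the warehouse. This is an illustrative toy, not a real ETL tool: the `route` function, the rule names, and the row shapes are all my own assumptions.

```python
# Hypothetical sketch: raw FACTs are stored in the warehouse untouched;
# business rules run between the warehouse and the marts, splitting rows
# between the data mart and the error mart instead of discarding rejects.
def route(rows, rules):
    mart, error_mart = [], []
    for row in rows:
        broken = [name for name, rule in rules.items() if not rule(row)]
        if broken:
            # keep the raw row intact, annotated with which rules it broke
            error_mart.append({**row, "broken_rules": broken})
        else:
            mart.append(row)
    return mart, error_mart

rules = {
    "amount_positive": lambda r: r["amount"] > 0,
    "has_customer_key": lambda r: bool(r.get("customer_key")),
}
mart, errors = route(
    [{"customer_key": "C1", "amount": 10.0},
     {"customer_key": "",   "amount": -5.0}],
    rules,
)
```

Note the design choice this encodes: the rejected row is never altered or cleansed, only tagged. The error mart holds the raw statement of FACT, so the business can see exactly what the source system produced.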

Physical separation of the data is absolutely necessary: to begin pushing accountability back into the business, to begin the IQ (information quality) cycle and the business-process clean-up, to begin providing true visibility into ALL data that exists in the source systems, and to begin showing the FULL level of rejects in our data supply chain.

Manufacturing supply chains don't throw away "bad parts"; they put them in reject bins, record them, try to figure out why they went bad, and then improve the process so they don't make the same mistake again (because mistakes cost them money, time, and competitive advantage). Why shouldn't we treat our data this way? Why do so many implementation specialists INSIST on cleansing, mixing, merging, and constantly fine-tuning the "truth" so that these errors are hidden or disposed of?

By actually separating the bad data into "reject bins" at the lowest level of grain, before it is cleansed, mixed, or merged, we can really begin to take inventory of our source systems and the business processes. We can finally see where our businesses are HEMORRHAGING money, time, and competitive advantage.
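Once the rejects are binned at the lowest grain, "taking inventory" is straightforward: count broken rules per source system. A minimal sketch, assuming hypothetical reject rows annotated with a `source` and the rules they broke:

```python
from collections import Counter

# Illustrative reject-bin contents; in practice these would come
# from the error marts, one row per rejected source record.
rejects = [
    {"source": "CRM", "broken_rules": ["has_customer_key"]},
    {"source": "ERP", "broken_rules": ["amount_positive", "has_customer_key"]},
    {"source": "CRM", "broken_rules": ["has_customer_key"]},
]

# Inventory: how often each rule breaks, per source system.
inventory = Counter(
    (r["source"], rule) for r in rejects for rule in r["broken_rules"]
)
for (source, rule), n in sorted(inventory.items()):
    print(f"{source}: {rule} broken {n} times")
```

Even a tally this simple points straight at the source system and business process that need fixing, which is exactly the feedback loop a manufacturing reject bin provides.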

In our next entry, we'll walk through an example of how this worked at a real customer site. IT'S TIME for OUR DATA SUPPLY CHAIN to step up and begin working for us.

Comments?


Posted May 31, 2005 7:00 AM
