Blog: Dan E. Linstedt Subscribe to this blog's RSS feed!

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University where I participate on an academic advisory board for Masters Students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank-you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including: IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata.  He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology, and the Data Vault Data modeling architecture. He has built expert training courses, and trained hundreds of industry professionals, and is the voice of Bill Inmons' Blog on http://www.b-eye-network.com/blogs/linstedt/.

Well, it's happened again. IT is trying desperately to eliminate the value of the EDW from the business (at least this is what I see). Business is responding by demanding the creation of Master Data systems. There seems to be an age-old argument in the market space about the use of, definition of, and condition of: Business Keys. IT appears to be telling people to use surrogate keys and to ignore the business keys entirely. In this entry we will explore this single notion, and see what some folks have to say about it (me included!) Mind you, this is a bit of a rant; they seem to know how to "get my goat" as they say...

I start off by stating, Codd & Date designed normalized forms to have business meaning. They insisted that Business Keys be utilized in order to "make sense of" and "tag" the data sets appropriately so that relationships can be understood and maintained.

A simple link to a temporal database book houses a brief entry on the Information Principle: See it here.

I'd like to say a word or two about business keys (which by the way, you'll be able to find additional information on my videos on YouTube: http://www.youtube.com/dlinstedt/

What are business keys?
Business keys ARE the master information that unlocks context for business users. Business keys are (often) intelligent keys that have MEANING to the business. Business Keys are often alphanumeric, parts of which may be generated sequences, other parts have meaning based on position. In any event, business keys are USED by the business to locate, identify, and track information through the business life-cycle. Without them, business may not be able to "use" or apply the information properly.

What business keys are NOT:
Business keys are NOT surrogates, NOT sequences, NOT ordered numeric elements assigned based on technical insertion rates. Surrogate keys should NEVER be shown to the business, ever.... They should be used within a system (internally only) to identify rows to the machine, and provide optimal join paths, but they should NEVER appear on reports, screens, or anywhere that the business can see them.

Is there an argument around business keys versus surrogate keys?
You bet! Check out these comments:
http://www.mindfuldata.com/Modeling/modeling-pdf/DAMA%202008%20Speaker%20Notes.pdf
http://stackoverflow.com/questions/63090/surrogate-vs-naturalbusiness-keys


"Dimensions should always use a surrogate key that is generated within the warehouse. I went to a presentation a couple of years ago by Ralph Kimball (a data warehouse author), and he discussed the importance of removing the warehouse's dependency on business keys. The idea is a good one, because business keys change regularly and this will result in a long-term problem for the warehouse. However, when we discussed Slowly Changing Dimensions (especially ones that kept history), he said that we should use the business key to link them together. This went against what he had just said, so I decided that we needed to find another solution." http://expertanswercenter.techtarget.com/eac/blog/0,295203,sid63_tax298150,00.html

http://www.mindfuldata.com/Modeling/modeling-pdf/DAMA%202008%20Speaker%20Notes.pdf

http://www.infoadvisors.com/Home/tabid/36/EntryID/191/Default.aspx
http://www.cerebiz.com/blog/index.php/2007/08/06/use-of-surrogate-keys-in-data-warehousing/

WHY are these people demanding that there is no value to business keys?
Because it's a very tough problem for business to overcome. Yet the business today is ASKING, Begging, pleading for answers from Master Data Sets. I maintain that you cannot build a master data system without looking at and using business keys as a central HUB of information.

Why not surrogates?
If I ask you to look up surrogate key 5, do you understand what this is? where it came from? what it is bound to? Does it give you _any_ context at all as to which system generated the number? Do you even know where to begin to find this key?

Surrogate numbers are generated today in EACH source system. In the Data Warehousing world we are responsible for integrating MULTIPLE systems at once into a single place. If we rely solely on these "surrogate keys" and completely ignore business keys as has been suggested by the links above, our EDW would never mesh or align for the business. Furthermore trying to build a master data system would be impossible. Some of these individuals I listed even went so far as to say: "ignore the business keys in your dimension entirely, because it is unruly (null) most of the time".

I say rubbish. If your business is not properly synchronizing, populating, or utilizing business keys then they are hemorrhaging money along their business process. Business keys are vital to the traceability of information ACROSS lines of business and ACROSS systems.

Take a look at what I say about business keys:
http://www.danlinstedt.com/AboutDV.php
http://www.tdan.com/view-articles/5285
http://www.b-eye-network.com/blogs/linstedt/archives/2005/09/between_inmon_a.php

Bottom line, Business keys are imperative that they span the systems. If the business keys are changing, or are re-used, the business is LOSING MONEY. I will take that to the board of directors level every single time, and every time - I can find busted and broken business problems and lack of visibility ACROSS the organization in accordance with their lack of regard for business keys.

The ONLY thing one has to do is look at the businesses that want master data systems - how are you (IT) going to integrate the data sets by surrogate if the surrogates generated by source system ARE THE SAME across multiple sources? WHICH surrogate are you going to show to the business as the "MASTER KEY" for which pieces of information? It's a near impossible problem to solve, the business units will fight over the definition, and it will come down to politics as to who is right/wrong, when the business REALLY should be deciding how to fix the source of the problem: lack of a single business key.

Auto manufacturers figured it out long ago, they use VIN (vehicle identification numbers) to uniquely identify: make, model, manufacturer, date of manufacturer, size of engine, and so on. Unless you are doing something illegal, the VIN does not change, nor does it go away. What would happen to the world of car's if the VIN disappeared?

We have the SAME question in the world of counterfeit drugs... Unfortunately E-Pedigree as a country wide solution has been lobbied down, and pushed back. Each bottle was to be labeled and identified as a unique bottle using a very specific bar code. It would have allowed the entire industry to sort out the MOST of the counterfeit drug problem, and save people’s lives.

You can sit there and tell me that "Business Keys don't matter" but at the end of the day, I will say: you are losing money, and quite possibly people are dying without them.

Cheers,
Dan Linstedt
Check out WHY business keys are important, learn about the Data Vault Model.


Posted November 2, 2008 10:07 PM
Permalink | 3 Comments |

3 Comments

one of the most dangerous usage of surrogate keys I have seen is when report writers start using them as filters in the "where clause" of the SQL queries feeding their reports. Those reports get to production with the surrogate filters. If for any reason the DW is reloaded (exemple: new architecture, revamp of existing DW, etc) then there is a great chance that the surrogate keys will change and thus break all those reports...always filter on business keys..always!

I am going to respectfully disagree.

FIRST

If I needed to know about the first 5 cars that were produced because they all had a component attached that was incorrectly aligned and causes fatal accidents, does that number now have business value? What if I had a DateFinishedProduction column, but it is not a date time field or it is only to the granularity of Date? Can I determine which were the first 5 cars if I built 30,000 that day and they all finished within a few seconds of each other?


SECOND

Your VIN number argument is flawed. When you said "and so on" you left out that it has a sequential number attached to the end of it.

I did notice you said that a business key could have a sequence IN IT. But how do you get this sequence? Most of the time this sequence is a surrogate key (but it doesn't have to be). Does this negate that a business key could be formed from this value (that has no initial meaning to the user) and then some other values that do?

I don't believe so. And neither does the auto industry. Because they used a sequence at the end of all of the business information. The sequence is in fact useless. Or is it? If you don't have it, you don't have a UNIQUE value. Which vehicle is newer? Sequence 001 or 881? So the sequence is not useless. I bet there are other great uses for that sequence number as well.

To carry this argument further, the auto industry decided on a STANDARD KEY that everyone could agree on. Then the put the sequences in a central system and propagated that out to other systems. So the integration here is that other systems will use this sequence, this surrogate, as that system's business key for that particular automobile.

A business key is therefore a code that is defined manualy and commonly across all systems, in some cases in conjunction with a generated key (sequence date etc)

which is fine, however most source systems are old and licenced both issues may cause a problem with your suggestion.

further the issues with defining so many business keys. consider the trouble (and expense) found with defining a common across all systems key for customers.

Leave a comment

    
Search this blog
Categories ›
Archives ›
Recent Entries ›