
Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for Master's students in I.T. I can't wait to hear from you in the comments on my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

There are quite a few vendors out there who publish industry logical data models. I've had my fair share of direct, hands-on experience with them over the last 8 to 10 years. In this entry I will share a few of the pros and cons of this class of models. My objective is mostly an inquisitive one, in the hope that you will respond with your experiences; if you are a vendor, please post your opinions.

First, let's take a look at what these iLDMs are defined to be. Steve Hoberman offered some thoughts on the subject in 2004: http://www.teradata.com/tdmo/v06n04/Tech2Tech/InsidersWarehouse/DataModels.aspx

The right tools to capture how an organization within a particular industry works, or should work, are essential for in-depth knowledge of your business. These tools should also provide expert modeling structures for your enterprise to use as a guide. An industry logical data model (iLDM) offers this kind of information and guidance.

...The iLDM is a pre-built model that extends the EDM concept. Like the EDM, it is a subject-oriented and integrated view of the information needed for an enterprise data warehouse (EDW) to answer strategic and tactical business questions....

...While it does not contain history, the iLDM represents a snapshot of information (point in time). It also does not contain derived data or other calculations to assist with retrieving data more quickly. All model changes necessary for issues like speed, storage, security, backup and recovery must be added at the physical level....

Interesting... It does not contain history, nor derived data, nor other calculations... However, the iLDMs I've seen have been implemented one-to-one with the physical model at client sites, and DO contain history, DO contain calculations, and so on. So it appears, right at step one, that vendors are not necessarily following Steve's advice...

In my days of data modeling, I was taught, as Steve suggests, that Logical Data Models should stay Logical - and oftentimes do NOT represent the physical (due to partitioning, indexing, data types, super- and sub-types, and so on). However, in my experience, the vendors' professional services arms appear to want to implement these LDMs physically in a one-to-one fashion (meaning one logical table = one physical table).
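
To make the distinction concrete, here is a minimal sketch of what that one-to-one implementation tends to turn into. All table and column names here are hypothetical (my own, not taken from any vendor model); the point is simply that the physical table picks up history and housekeeping columns the logical model never mentions.

    -- Hypothetical supertyped PARTY entity from a logical model, implemented
    -- one-to-one as a physical table. The surrogate key, effective dating,
    -- and audit columns exist only at the physical level:
    CREATE TABLE party (
        party_sk        BIGINT       NOT NULL,   -- surrogate key (physical-only)
        party_id        VARCHAR(30)  NOT NULL,   -- business key from the logical model
        party_type_cd   CHAR(1)      NOT NULL,   -- 'I' = individual, 'O' = organization
        party_name      VARCHAR(200),
        effective_date  DATE         NOT NULL,   -- history: added at the physical level
        end_date        DATE,
        load_date       TIMESTAMP    NOT NULL,   -- audit column (physical-only)
        record_source   VARCHAR(50)  NOT NULL,
        PRIMARY KEY (party_sk)
    );
    CREATE INDEX party_bk_idx ON party (party_id, effective_date);

Once those columns are in place, the "logical" model on paper and the table people actually query are no longer the same thing - which is exactly the gap Steve describes.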

Next, I find this interesting: http://www.freepatentsonline.com/7359906.html - a patent, held by several individuals, for a particular LDM. I'll let you decide whether that is a good or a bad thing. Let's move on...

Here is more information on iLDMs from another individual: http://www.univdata.com/Portals/9/udm_Products_Repository.pdf

And another: http://publib.boulder.ibm.com/infocenter/rdahelp/v7r5/index.jsp?topic=/com.ibm.datatools.logical.ui.doc/topics/clogmod.html

A logical data model is a model that is not specific to a database that describes things about which an organization wants to collect data, and the relationships among these things.

A logical model contains representations of entities and attributes, relationships, unique identifiers, subtypes and supertypes, and constraints between relationships. A logical model can also contain domain model objects, or reference one or more domain or glossary models. After logical objects and relationships are defined in a logical data model, you can use the workbench to transform the logical model into a database-specific physical representation in the form of a physical data model.

What I find in my experience:

What I've found is that professional services organizations take these logical data models and implement them physically, directly on specific databases, in a one-to-one fashion. Of course, they add all the components necessary to make them work in a physical realm. However, I've found both pros and cons to this approach. So let's get to the heart of this entry and discuss (I'd love to hear from you about your pros and cons; I'm not here to tell you one way or the other what to do with these LDMs)...

Pros:

  • Coverage of the LDMs is wide
  • Years of industry experience on the part of the people who design these models
  • Relatively "fast" implementation if the scope of the project is kept to a minimum
  • As guides go, they are understandable (to a point) and contain many of the complex relationships needed to express information properly
  • They are "mostly" compressed (smaller models, smaller numbers of tables)
  • Easy to "Buy" versus "build"

(This is the part where you can add to the list)

Cons:

  • Consulting organizations often don't understand how to implement them properly
  • They have been mapped one-to-one to the physical (without the necessary changes)
  • The models are super-typed, and are oftentimes at too high a grain for customers to really grasp ALL the elements that are combined within
  • The super-types try to handle too many levels of "grain", mixing corporate customers, for example, with individual customers and changing the outcome of the queries on the reports (see the sketch after this list)
  • Often require years of expertise and training to implement properly
  • Sometimes are a nightmare to tune, partition, and get performance from
  • Sometimes require tons of staging areas to "prepare" the data before integrating into supertyped table structures
  • Nearly always need "extending" and customization, which doesn't fare well with customers, as this becomes a multi-year, high-cost effort - turning into a huge consulting services gig for big dollars.
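
To illustrate the grain-mixing point above, here is a small, hypothetical sketch (the table and column names are mine, not taken from any vendor model) of a supertyped party table holding both corporate and individual customers, and a count whose meaning quietly changes depending on which grains landed in the table.

    -- One supertyped table holding parties at two different grains:
    CREATE TABLE party (
        party_id         VARCHAR(30)  NOT NULL,
        party_type_cd    CHAR(1)      NOT NULL,   -- 'I' = individual, 'C' = corporate customer
        parent_party_id  VARCHAR(30),             -- individuals may roll up to a corporation
        annual_revenue   DECIMAL(18,2),           -- only meaningful at the corporate grain
        PRIMARY KEY (party_id)
    );

    -- "How many customers do we have?"  Individuals and corporations are counted
    -- together unless every report remembers to filter on the type code:
    SELECT COUNT(*) AS customer_count            FROM party;
    SELECT COUNT(*) AS corporate_customer_count  FROM party WHERE party_type_cd = 'C';

If even one report forgets the filter, the numbers on two reports no longer reconcile - which is usually how the grain problem is first noticed.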

Now, are they worth it?
Sometimes they are; other times they fall down. If you're a customer, and you're looking at these models, they certainly seem appealing.

What's your experience? *** WARNING: THIS IS MY OPINION ONLY! ***
I've seen these models take 3 to 5 years to implement correctly, and to modify and customize properly. To be fair, I've seen about a 50/50 success rate with the implementations: some have worked beautifully, others have failed miserably. Was it because of the model? I can't say. In some environments it was the politics - not the model, not the provider, not the hardware or the database. At other companies it was because of consulting services not understanding how to integrate, alter, upgrade, and customize. At still others it was because of the storage needed in the staging area in order to "cleanse/change/alter and super-type" the data before it was loaded.

The successful implementations I've seen worked because the scope was controlled, the modifications were extremely minimal, and data quality (from the source system) was extremely high. I've also seen successful implementations when there is only one source system to deal with. When this is the case, cross-mapping the information from the source to the target super-types is fairly easy to do. When there are multiple source systems involved, the task becomes tremendously complex very quickly.
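
As an illustration of why multiple sources complicate the cross-mapping, here is a minimal sketch. Everything in it is hypothetical (two made-up source systems and a made-up cross-reference table); the point is that with more than one source, every load has to resolve keys before the supertyped target can be populated.

    -- A minimal supertyped target, stated here so the sketch stands alone:
    CREATE TABLE party (
        party_id       VARCHAR(30)  NOT NULL PRIMARY KEY,
        party_type_cd  CHAR(1)      NOT NULL,
        party_name     VARCHAR(200)
    );

    -- Hypothetical staging tables, one per source system, each with its own key:
    CREATE TABLE stg_crm_customer (crm_cust_no VARCHAR(20), cust_name VARCHAR(200));
    CREATE TABLE stg_erp_customer (erp_cust_id INTEGER,     cust_name VARCHAR(200));

    -- A cross-reference is needed to decide when two source keys are the same party:
    CREATE TABLE party_key_xref (
        party_id       VARCHAR(30) NOT NULL,
        source_system  VARCHAR(10) NOT NULL,   -- 'CRM' or 'ERP'
        source_key     VARCHAR(30) NOT NULL,
        PRIMARY KEY (source_system, source_key)
    );

    -- With one source this is a straight rename; with two or more, every load
    -- has to pass through the cross-reference (and usually a staging area) first:
    INSERT INTO party (party_id, party_type_cd, party_name)
    SELECT x.party_id, 'C', s.cust_name
    FROM   stg_erp_customer s
    JOIN   party_key_xref x
           ON  x.source_system = 'ERP'
           AND x.source_key    = CAST(s.erp_cust_id AS VARCHAR(30));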

My conclusion (this is where I'd like your feedback)
I believe that sometimes buying the logical data models can really help move a project forward, but I also believe that breaking the super-type out into its constituent raw-level components for the physical model is an important act of service to the customer. I believe that if you "customize" the logical data model, you will have trouble when the vendor puts out a new version (one that alters the version you have in place). I've seen cases where a 2- to 3-year bid effort to customize becomes a 7-year services contract in order to maintain the customizations across multiple releases.
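
Here is a hypothetical sketch of what I mean by breaking the super-type out (again, all names are mine, not from any vendor model): instead of one physical party table, each constituent concept gets its own physical table, its own business key, and its own grain, and any combined picture is rebuilt on top rather than forced into one structure.

    -- One table per constituent concept, each with its own business key and grain:
    CREATE TABLE individual_customer (
        customer_no  VARCHAR(20) NOT NULL,    -- business key for individuals
        full_name    VARCHAR(200),
        birth_date   DATE,
        PRIMARY KEY (customer_no)
    );

    CREATE TABLE corporate_customer (
        duns_no         VARCHAR(15) NOT NULL, -- business key for corporations
        legal_name      VARCHAR(200),
        annual_revenue  DECIMAL(18,2),
        PRIMARY KEY (duns_no)
    );

    -- If a combined "customer" picture is still needed for reporting,
    -- it can be assembled as a view without mixing the two grains physically:
    CREATE VIEW all_customers AS
        SELECT customer_no AS business_key, 'INDIVIDUAL' AS customer_type FROM individual_customer
        UNION ALL
        SELECT duns_no     AS business_key, 'CORPORATE'  AS customer_type FROM corporate_customer;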

Don't get me wrong: LDMs are WONDERFUL GUIDES! But alas, they are "logical" data models, and should remain "logical", as their name implies. In the physical world, dealing with history, separation of grain, functional areas of business (crossing lines of business), and so on is very, very important. Customizations are nearly *ALWAYS* needed in a data warehousing world, even with pre-packaged implementations of "pre-built data warehouse solutions."

Finally, let me ask you data modelers out there: does it really make the physical ETL implementation that much easier if you "supertype" everything (customer, individual, employee, manager, corporation, organization, and so on) into "one single table"? What if each of these elements has its own business key, and the data is clearly at a different grain - what do you do in your LDM to ensure success?

Thank you for your comments and feedback,
Daniel Linstedt
DanL@DanLinstedt.com


Posted May 22, 2009 4:26 AM