
Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for Masters students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

Recently in Business Intelligence Category

Recently I gave a keynote presentation at the Array Communications conference in the Netherlands.  There were 50 people in the room, and it was a good crowd.  There are quite a few shifts occurring in the EDW/BI market space, and there are some reasons why these changes are happening.  Game-changing technology is coming to the market in ETL, services, hardware, databases, and applications.

Below are some game-changing technologies that will help shape 2010 and the years to come:

  • Solid State Disk (SSD)
  • Hosted Cloud Computing
  • Column Based Databases
  • Unstructured Information
  • Ontologies / Taxonomies
  • Mining Engines (in-database)
  • Broad Based Web Services
  • Data Visualization
  • Flash & Silverlight Front-Ends
  • Analytic functions in Database Engines
  • Business Rules Engines Melding with ETL and Web Services
  • DW Appliances

If you're not familiar with these technologies, it would be good to get a grasp on them as we move forward.  By the way, SSD is now available in Macintosh PowerBooks - did you know this?  Solid State Disk has no moving parts, and is anywhere from 75x to 150x faster than standard disk devices.

Hosted cloud computing will bring the rise of EDW/BI as a service - there will be open and closed clouds available.  Of course the open clouds are things like Amazon EC2; closed clouds will be hosted in data centers over VPN, sponsored by vendors and the like - but both will bring on data warehouse and BI as-a-service solutions.  Mark my words - combining cloud-based services with DW appliances will make things seamless on the back-end.  You will be able to mix and match technologies for appropriate purposes (column-based data stores mixed with standard or traditional database engines).  The whole game is changing!

Unstructured information is coming to the table; businesses are finally realizing the value hidden in their unstructured information, and the technology in this space is beginning to mature.  It's taken many years, but it's coming of age.  Unstructured information can be "hooked" dynamically to structured data sets through Data Vault Link structures.  It can be interpreted by applying ontologies and taxonomy break-downs.  It can then be used to build dynamic or RAM-based cubes - lending itself to rapid data exploration.
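The Data Vault "hook" between unstructured and structured data can be sketched as a Link table that ties a document key to a business key. This is a minimal, illustrative Python sketch rather than real DDL; all table, column, and key names (and the use of hashed keys) are my own assumptions for the example:

```python
from dataclasses import dataclass
from datetime import datetime
import hashlib

def hash_key(*parts: str) -> str:
    """Derive a deterministic surrogate key from business key parts."""
    return hashlib.md5("||".join(p.strip().upper() for p in parts).encode()).hexdigest()

@dataclass(frozen=True)
class HubCustomer:
    customer_hk: str        # hashed business key
    customer_number: str    # the business key itself
    load_date: datetime
    record_source: str

@dataclass(frozen=True)
class LinkCustomerDocument:
    link_hk: str            # hash of both parent keys
    customer_hk: str        # -> Hub_Customer
    document_hk: str        # -> keyed instance of the unstructured document
    load_date: datetime
    record_source: str

# Hook a scanned contract (unstructured) to a customer (structured):
now = datetime(2009, 11, 12)
cust = HubCustomer(hash_key("C-1001"), "C-1001", now, "CRM")
doc_hk = hash_key("contracts/C-1001/2009-04.pdf")
link = LinkCustomerDocument(
    hash_key("C-1001", "contracts/C-1001/2009-04.pdf"),
    cust.customer_hk, doc_hk, now, "DMS")
```

In a real warehouse these would be physical Hub and Link tables; the point is only that the Link row carries nothing but the two keys plus load metadata, so the unstructured side can be attached or detached dynamically without touching the structured side.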

In-database mining engines are making their way to the forefront.  This is going to continue: as database engines get stronger, they will absorb more and more transformation functionality.  Moving the transformations closer and closer to the data set - this is where the future is.  Real-time data mining will be available as just another transformation call.  Real-time mining will be a continually running neural network that takes transactional input and grades it on the fly.  This has been happening for years in the research community and in the robotics community - it's high time it moved into mainstream technology.
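To make the "grades it on the fly" idea concrete, here is a toy sketch of a continually updating scorer. It uses a simple running mean/variance (Welford's method) as a stand-in for a real neural network, purely for illustration; nothing here reflects any specific vendor's in-database mining API:

```python
class OnlineGrader:
    """Toy stand-in for an in-database scoring call: grades each
    incoming transaction against a running mean/variance (Welford)."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def grade(self, amount: float) -> float:
        """Return a z-score-like grade, then fold the value into the model."""
        if self.n > 1:
            var = self.m2 / (self.n - 1)
            score = abs(amount - self.mean) / (var ** 0.5) if var else 0.0
        else:
            score = 0.0  # not enough history yet to grade
        # incremental update of the running statistics
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)
        return score

g = OnlineGrader()
for amt in [100, 102, 98, 101, 99]:   # normal transaction stream
    g.grade(amt)
suspicious = g.grade(5000)            # graded on the fly as it arrives
```

The shape is what matters: each transaction is scored against everything seen so far, in one call, with no batch pass over history - exactly the behavior you would want from a scoring function living inside the engine.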

Broad-based web services will be used, especially with the rise of cloud technologies.  What we will see is cloud-to-cloud vendors: technology that specializes in cross-cloud communication through web services and grid components.  These technologies will include compression, encryption, and secure communication channels at high speeds over "smart" routers, hubs, and switches.  In fact, it wouldn't surprise me to see routers and switches get smarter and include layers of web services within their hardware that allow cloud-to-cloud communication over secure protocols.

On the BI side of the house, we will see a continued rise in the use of Adobe Flash and Microsoft Silverlight platforms for data visualization.  We will move into the realm of using multi-touch, fly-through, dynamic graphs and 3D landscapes for our data.  We will see an evolution of Second Life, or technologies similar to it, where virtual worlds will offer the BI meeting spaces of tomorrow.  But these graphs won't be static.  You'll be able to alter them, change them, play what-if games with them, add metadata ON-THE-FLY, and write the new data back to dynamic cubes.

As database engines, appliances, and vendors sort themselves out, more and more transformation logic will be included in these engines - to the point where some of the "transformation" notions will be engineered into firmware or hardware levels.  Continuing to move the processing closer and closer to the storage will be the end goal.  It won't matter if the storage is old-world (standard) disk or SSD.  In fact, I would argue that the database manufacturers will work a deal with the SSD manufacturers and have custom "smart storage" built to include transformation functionality at the data store level.

ETL engines will re-focus their efforts on managing and manipulating the metadata around the transformation logic.  Players like Teilhard (who just sued everyone in the landscape) will have to work deals with database vendors to provide transformation functionality at the hardware level.  "Traditional" ETL vendors will no longer exist - they will change to provide service-based, cloud-based technology; again, with a heavy focus on metadata management, process design, and controls.

Did I miss something?  Comment on what you see!

Dan Linstedt


Posted November 12, 2009 3:15 AM
Permalink | 1 Comment |

Ok folks, I'm back.  I've spent the last six months reading, writing (and arithmetic - that's a joke... ha-ha...)...  Seriously: implementing solutions, visiting customer sites, and seeing what's going on in the world.  To that end, the DW/BI world is changing.  You've seen me write about this before, but now it's right on top of you.

Like it or not...  Your Data Warehouse IS a System Of Record, and if it isn't in 2009, it WILL BE before the end of 2010.  So what does that mean?

You've heard about it for the past two years (at least): it's called Operational BI, and my good friend Claudia Imhoff and my other friend Colin White have published many, many articles and blog entries about these topics.  But have you given any consideration to exactly what that means for the Data Warehouse?

Come on people, get with the times!

Ok, down to brass tacks.  To have "operational BI" you must have operational data.  Well, wait a minute - operational data means system of record, doesn't it?

Ding Ding Ding, you win the prize... come on down!

Wow, didn't know there was a prize... (there's not)...  Anyhow, operational data is required - and if you're talking operational BI, then you're talking about business intelligence, and if you mention business intelligence, then you MUST consider data warehousing...

So what does that mean?

It means you've GOT NO CHOICE but to consider an Operational EDW under the covers.  That's right folks, I've said it for years, and now I'm not going to hide this fact from the light anymore.  YOUR DATA WAREHOUSE WILL FROM NOW ON... HOUSE OPERATIONAL DATA RIGHT BESIDE YOUR HISTORICAL DATA.  This means that the operational data will be captured from applications sitting directly ON TOP of your Data Warehouse.

Think I'm foolin do ya?

Think again...  These are the systems I've been building, or helping folks build, for the past three years - you don't think I've been sitting around resting on my laurels, eh?  Far from it.  I've been traveling the world, talking to folks about accountability, credibility, and RAW OPERATIONAL DATA sitting right inside the data warehouse.  This means that the combined result produces Operational BI.  It also means that the resulting Data Warehouse (or ODW for short) IS a system-of-record.

Don't like it?  Tough.  Take it to the bank!

Customers like JPMorgan Chase, Lockheed Martin, Raytheon, World Bank, ABN AMRO, RBS (Royal Bank of Scotland), Sisters of Mercy Healthcare, Blue Cross Blue Shield, the Department of Defense, the Food and Drug Administration, the Federal Aviation Administration, the City of Charlotte (North Carolina), SNS Bank Netherlands, and more have all used these principles to create complete ODWs in house today.

What do they need?

Accountable, auditable, flexible, and scalable data stores that act as an EDW while at the same time absorbing real-time transactional (operational) data directly into the EDW - thus creating a RAW ODW.  It's here today folks: fed by web services (SOA), batch, and ESBs; managed and edited by applications that control metadata, carrying changes and audit trails along with cube write-backs and data edits.  That's what ODW and Operational BI are all about.  It's making the systems more robust, more scalable, more complete.

At the risk of upsetting some folks: an ODW is the underpinning of successful OBI.

The ODW contains and inherits all the responsibilities of the data warehouse.  In addition, because it houses operational data, it also inherits all the responsibilities of an operational system!!!  Yep, you guessed it: 9x9 up-time, reliability, fault-tolerance, fail-over, guaranteed access, and high-speed transaction handling, on top of strategic data handling, historical record keeping, integration, etc...  And don't forget: system-of-record.

But why is it a system-of-record?  I thought other operational systems handled this?

Each operational system handles its own system of record, but with an ODW - think about it...  You're saying: my data warehouse/BI solution is an operational system in addition to being a data warehouse.  So you're saying: this is the only place where a complete, single picture of the integrated operational and historical data lives.  Which in essence means you have built what needs to be a system-of-record.

What's involved in creating a system-of-record for the ODW?

Good question, and I wish I had time to answer it fully...  Ok - here's my opinion: you need to worry about all the operational "rules and regulations and governance," and you also need to worry about storing RAW (uncleansed, unchanged, unaltered) data sets in your data warehouse.  That means your "business rules" to manipulate data can no longer be executed up-stream of your data warehouse.  You must move your business rules downstream of the ODW; otherwise you're changing operational data to make it fit, and blowing up any chance of an auditable system-of-record.  ESPECIALLY when three months go by and the business rules change!!

Are you saying I need "RAW" data in my ODW to make this work?

That's exactly what I'm saying.  I'm not saying get rid of the business rules, and I'm not saying put "bad" data in front of business users...  No, I am saying "move the processing downstream" to POST-ODW (between the ODW and Data Mart delivery).  It will make your life MUCH easier.
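A tiny sketch of what "raw upstream, rules downstream" means in practice (all field names and rules here are hypothetical, chosen only to illustrate the flow):

```python
# Raw layer: store exactly what the source sent, warts and all.
raw_odw = [
    {"customer": "c-1001", "country": "USA", "revenue": "1200.50"},
    {"customer": "C-1001", "country": "US",  "revenue": "-300"},
]

def business_rules(row):
    """Downstream (post-ODW) rules: standardize, cast, conform.
    When the rules change, re-run this over the untouched raw layer."""
    return {
        "customer": row["customer"].upper(),
        "country": "US" if row["country"] in ("US", "USA") else row["country"],
        "revenue": float(row["revenue"]),
    }

# Data mart delivery applies the rules; the raw layer is never modified.
data_mart = [business_rules(r) for r in raw_odw]
```

Because the raw layer is never altered, a rule change three months later is just a re-run of `business_rules` over the same raw rows - the auditable system-of-record stays intact.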

This is absurd; I've never heard of anything like this before.  Why should I care?

You don't have to care if you don't want to, but the world is a changing place.  If you want an auditable and compliant ODW, or you want to add operational data to your data warehouse, you will have to follow these rules.  Dare I say it: you might find yourself out of a job if you don't!

All the companies I mentioned above, and more every day, are learning this the hard way (through failed audits, through corporate restructuring, through spread-marts where cost has gotten out of hand)...  They have decided to build ODWs to solve these problems.

It just so happens they are using Data Vault modeling to make this happen.  You don't need a Data Vault model to get here - you must, however, follow the principles outlined in this article - but the Data Vault is an enabler that will help you get here faster.  You can read more about the Data Vault model at http://www.DataVaultInstitute.com

or contact me directly: DanL@DataVaultInstitute.com


Dan Linstedt

Posted October 28, 2009 12:46 AM
Permalink | No Comments |

There are many trends afoot in the industry, but I think there are two or three that are here to stay, and quite frankly, they will make a huge impact on the way BI/EDW is done.  Some of the technologies that are ground-shaking and game-changing include Solid State Disk, database appliances on the cloud, and data services on the cloud.  I've been watching the database market now for several years, and here are a few of my predictions for 2010.

Here's the skinny of it:

1) Solid State Disk.  It's not your grandfather's "RAM on a USB stick" plugged into the computer - no, it's high-speed, ultra-parallel, screaming-bus-performance RAM.  As one analyst recently put it: CPUs have gained 5 million times faster performance over the last 30 years, while disk drives have improved only about five times; SSDs offer a 150x improvement in random I/Os per second over spinning disk (Information Week, Oct 26, 2009, Douglas Henschen).

So what does this really mean in business terms?

In business terms this brings the ability to finally execute high-speed parallel operations for massive BI queries and massive bulk-loading systems.  But wait, there's more (yes, it gets better)...  This also means that we can finally take advantage of extreme flexibility in the logical modeling layers, and "worry less" about physical modeling components...  Which of course leads us right to the Data Vault modeling methodology - or 6th normal form, which can become a reality if you like.

In business terms, if we have extreme flexibility in our architecture, we can finally enable IT to become nimble and fast again.  It means we can reduce delivery times and execution cycle times.  It also means we can increase up-time, and - oh wait...  HOT DATA?  Did all of you forget about temperature-driven data?  Shame on you!  Hot data will reside in the SSD devices, allowing hyper-cubes to be built, AND allowing DYNAMIC changes to the data models, the queries, and the loading systems.

Why is that important to me?

Well, I'll tell you...  In my opinion, it's important to you because you should care about the architecture you are using, you should care about the accountability and auditability of the data set, and you should (and usually do) care about the performance of the system you're building your solution on.  And by the way: you're going to have MORE and MORE data to deal with.  Like it or not, the EDW/BI solutions of today are accountable (they have become a system of record in their own right).

Yea ok, so what...  Solid state disk, faster machines, phooey...

You can say what you like, but the IBM Smart Analytics System, Oracle's Exadata, and the Teradata Extreme Performance Appliance 4555 are just a few of these notions.  What would happen (do you think) if ParAccel, or Vertica, or Sybase IQ jumped on this ship?  Especially into the Amazon EC2 cloud...  Hmmmm.  Now it gets interesting!

Which brings me to my next point:

2) Database technology is changing for 2010.  RDBMS engines making use of SSD will change the game - it will be an uneven playing field - but then, with these devices made available on the EC2 cloud (whether it's Amazon or Google providing the services), based mostly on HADOOP, it won't matter.  The RDBMS technology will get faster, and in-company network bandwidth will be OUTSHINED by cloud-provisioned bandwidth.

What?  What does that mean outshined by cloud bandwidth?  I don't get it...

You will have to get it if you want to lower cost and succeed in business going forward.  What I mean is: Data Warehousing and BI solutions will be offered as an outsourced service.  That's right folks, for the first time in DW/BI history (with just these two advancements coupled together), Data Warehousing and BI as a service will become a reality.  Mark my words!  Ok, don't take my word for it...

All you need to do is call up a company called Quantivo; they are processing/ingesting 7 billion customer records and allowing 2- to 3-second query response times against this sizable data.  What do they sell?  Packaged data warehousing/BI analytics.  In other words, they do the loading of the raw data, they manage their internal data model, they perform the cross-mapping and the indexing, and they build the "data mart" availability of the data set.

So what's the catch, there's always a catch, and I want to know what it is!

Of course there's a catch...  Today there's an ethical, security, and privacy dilemma.  How do you "outsource" data to a company providing a service that is put into a cloud owned by a third-party company (say Amazon or Google), and keep it all safe and secure?  Well, I would argue there are several layers to making "DW/BI as a service" available.

Layer A is what Quantivo does: they don't keep (or receive) any specific or private information about individual customers that might risk identity theft if the security wall is broken.  They provide a VPN service with dedicated encrypted channels for the customer to get at their own data sets.  They also slice the data sets off - for each customer they set up different grids in the cloud, ensuring all data stays separate.

Layer B: the hardware vendor/RDBMS vendor offers use of their software/hardware within the cloud as a direct contract or managed service for upgrades.  The customer contracts them to execute the cloud installation, and then the customer manages and owns the data, the data modeling, the queries, and all other components within the cloud over VPN.

I think beyond this there may be some additional options for EDW/BI as a service; however, they have to be scrutinized carefully.

Ok, ok - so what's the real business benefit here?

Still don't see it eh?  Well, here are just a few of the business benefits to be received by executing plans like this:

  • Lower cost of ownership
  • Faster turn-around time (like in Quantivo's case - they are specialized in customer analytics and data mining)
  • More parallelism with "sloppier design" - wait, that's a risk!  Yes, it is...  but sloppy designs will get a significant performance boost from SSD and from cloud technology
  • More agile, more nimble (dynamic data marts) - here today, and possibly trashed tomorrow, only to build a new one!
  • Cube write-back - yes, because of the nature of cubes, LARGE CUBES on SSD in a CLOUD can now offer FAST write-back. Hey Microsoft, you listening?
  • More power in the hands of the business user - direct access to create on-demand data mart solutions IN-RAM.  That's right... SSD on a cloud over VPN lets users PLAY with their data directly (if the outsourced service has been implemented properly).  Finally: play with LARGE to HUMONGOUS sets of data on essentially a RAM drive!  Pay as you go; pay for only what you need.

I don't mean to dig on Microsoft - I think just the opposite.  They have cube write-back technology today, and their SQL Server 2008 R2 release is INCREDIBLE...  It is screaming fast and super scalable (even on standard 32-bit Windows servers, if you know how to tune it).  I just think it's an easy jump for them to implement something on SSD and in the cloud, and voila - what an INCREDIBLE solution.  It's so easy, I don't know why their Gold partners haven't done it yet...  Anyone want to invest in me?  I'll build you a company to do it.

Alright, what are the risks...

The risks include not managing the services effectively, and not setting the quotas and policies for each user appropriately (example: a low-rung business user fires off a query that expands the cloud to 12 to 20 machines for an hour, but can't authorize the payment for the service usage).  Ok - governance, governance, and more governance...  Oh, did I mention GOVERNANCE over the solution?  Absolutely required.

Another risk is letting yourself believe that this is the answer to "all the poor-quality data modeling" or "sloppy ETL" or "poorly behaved BI queries," because now the performance is incredible out of the gate.  Wrong, wrong, wrong.  Even though this opportunity affords HUGE gains in performance, it is no excuse for a lack of GOVERNANCE and DATA MANAGEMENT best practices.  Your data models can and should be improved, your sloppy ETL can and should be tuned, and your poorly behaved BI queries must be optimized.  Then, instead of seeing only a 10x gain, you will be able to see a 150x gain.

You know what they say: No PAIN, no GAIN... In this case, the pain is paying for the right architecture to be put in place, the right training for performance and tuning, and the right data modeling to be utilized under the covers.

You can find out more about Data Vault (550+ users, 2000 unique visitors a month, 150k hits a month) at http://www.DataVaultInstitute.com

You can find out more about ETL performance and Tuning at http://www.TrainOvation.com, and http://www.GeneseeAcademy.com

OR: Contact me directly: danL@DataVaultInstitute.com  (even if you're not using Data Vault)

Cheers for now,

Dan Linstedt


Posted October 28, 2009 12:03 AM
Permalink | No Comments |
If you've started a Master Data project, or you are working with Metadata at the business levels, then you probably are familiar with the need for ontological classification of terms.  If you aren't familiar with Ontologies (at an entry level), then I would say that you will have a tough time putting together successful MDM or Master Metadata components for your EDW/BI solution.

Ontologies are forms of classification for terminology. In other words, the idea is as follows:

  • Vehicle
    • Car
    • Bus
    • Train
    • Motorcycle
    • Auto
    • Bicycle

This is just one ontology - one set of terms used to describe a vehicle.  There are many other views (axes) of Vehicle on which to pivot this data set, depending on the classification strategy you want to use (enter: taxonomy).
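The multiple-axes idea can be sketched in a few lines of Python (the second "propulsion" axis is my own illustrative addition, not from the list above):

```python
# Two ontologies (axes) over the same terms; a taxonomy groups them.
by_kind = {
    "Vehicle": ["Car", "Bus", "Train", "Motorcycle", "Auto", "Bicycle"],
}
by_propulsion = {
    "Motorized":     ["Car", "Bus", "Train", "Motorcycle", "Auto"],
    "Human-powered": ["Bicycle"],
}
taxonomy = {"kind": by_kind, "propulsion": by_propulsion}

def classify(term, axis):
    """Return the class a term falls under along a given axis."""
    for cls, terms in taxonomy[axis].items():
        if term in terms:
            return cls
    return None

# "Bicycle" is a Vehicle on one axis and Human-powered on another --
# both classifications are right at the same time.
```

Nothing about the term changes between axes; only the classification context does, which is exactly why two business users with different definitions can both be correct.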

Business users often fight over what a term means.  Why?  Because they apply the term in a different context (along a different ontology or categorization).  The problem is: they are both right at the same time, even with different definitions of the metadata.

What does this mean to my Master Data Projects?

Well, if you're focused on creating a successful BI deployment and creating value in the business, then you MUST allow these multiple ontologies to exist.  However, good governance practice states that you SHOULD manage them through a taxonomy (a classification of groups of ontologies).  I guess you can think of it as a CUBE for metadata, with the elements in the cells and along the axis points.  (Someone should visualize metadata this way; it would be really interesting.)

In other words, you MUST be capable of creating "master" metadata classifications IF you are to be successful at creating, understanding, and deploying Master Data Management solutions across your company or organization.

Unfortunately there are hundreds of "spread-out" tool sets that manage ontologies and sets of metadata, but don't come close to hooking them into a BI/EDW or even a Master Data tool.

How does this impact my project?

If you have embarked on an MDM project, or something akin to it, and you have NOT looked at your metadata or ontological layout, you are already behind the 8-ball.  You are probably experiencing cost overruns, missed requirements, and missed delivery time frames.  You probably have a disgruntled staff and disillusioned business users.  If on the other hand you have a successful project, I'd like to hear how you did it without Master Metadata.

It's high time to put together the metadata needed to apply, understand, and decipher Master Data at the business level.  Oh yea - and one more shocker (probably not news to you)... if you're ever going to get value out of unstructured data, or you plan on mining unstructured data, then you will ABSOLUTELY need the ontological breakdown of the different languages your documents are written in.  It is a MUST for tying any of that information into a structural world where the context can SHIFT at any given time.

I hope this is helpful, I'd like to hear your experiences on the subject.


Dan Linstedt


By the way, we've just launched a (free) library of technical documents, in which we will be expanding on these subjects, at http://www.DataVaultInstitute.com


Posted July 24, 2009 4:27 AM
Permalink | 1 Comment |
There are quite a few vendors out there who publish industry logical data models.  I've had my fair share of direct, hands-on experience with them over the last 8 to 10 years.  In this entry I will share a few of the pros and cons of these classes of models.  My objective is more of an inquisitive one, in hopes that you will respond with your experiences - and if you are a vendor, please post your opinions.

First, let's take a look at what these iLDMs are defined to be.  Steve Hoberman offered some thoughts on the subject in 2004: http://www.teradata.com/tdmo/v06n04/Tech2Tech/InsidersWarehouse/DataModels.aspx

The right tools to capture how an organization within a particular industry works, or should work, are essential for in-depth knowledge of your business. These tools should also provide expert modeling structures for your enterprise to use as a guide. An industry logical data model (iLDM) offers this kind of information and guidance.

...The iLDM is a pre-built model that extends the EDM concept. Like the EDM, it is a subject-oriented and integrated view of the information needed for an enterprise data warehouse (EDW) to answer strategic and tactical business questions....

...While it does not contain history, the iLDM represents a snapshot of information (point in time). It also does not contain derived data or other calculations to assist with retrieving data more quickly. All model changes necessary for issues like speed, storage, security, backup and recovery must be added at the physical level....

Interesting... It does not contain history, nor derived data, nor other calculations... However, the iLDMs I've seen have been implemented 1-to-1 with the physical model at client sites, and DO contain history and DO contain calculations, and so on. So it appears in step 1 that vendors are not necessarily following Steve's advice...

In my days of data modeling, I was taught, as Steve suggests, that logical data models should stay logical - and often do NOT represent the physical (due to partitioning, indexing, data types, super- and sub-types, and so on). However, in my experience, the vendors' professional services teams appear to want to implement these LDMs physically in a 1-to-1 fashion (meaning 1 logical table = 1 physical table).

Moving on, I find this interesting: http://www.freepatentsonline.com/7359906.html - a patent by several individuals for a particular LDM. I'll let you decide if this is a good or a bad thing. Let's move on...

Here is more information on iLDM's from another individual: http://www.univdata.com/Portals/9/udm_Products_Repository.pdf

And another: http://publib.boulder.ibm.com/infocenter/rdahelp/v7r5/index.jsp?topic=/com.ibm.datatools.logical.ui.doc/topics/clogmod.html

A logical data model is a model that is not specific to a database that describes things about which an organization wants to collect data, and the relationships among these things.

A logical model contains representations of entities and attributes, relationships, unique identifiers, subtypes and supertypes, and constraints between relationships. A logical model can also contain domain model objects, or reference one or more domain or glossary models. After logical objects and relationships are defined in a logical data model, you can use the workbench to transform the logical model into a database-specific physical representation in the form of a physical data model.

What I find in my experience:

What I've found is that professional service organizations take the logical data models and physically implement them directly on the specific databases in a 1 to 1 fashion. Of course, they add all the components necessary to make them work in a physical realm. However, I've found both pros and cons to this approach. So let's get to the heart of this entry and discuss (I'd love to hear from you about your pros and cons, and I'm not here to try to tell you one way or the other what to do with the LDM's)...


Pros:

  • Coverage of the LDMs is wide
  • Years of experience in the industry from the people who design these models
  • Some "fast" implementation if the scope of the project is kept to a minimum
  • As guides go, they are understandable (to a point) and contain many of the complex relationships needed to express information properly
  • They are "mostly" compressed (smaller models, smaller numbers of tables)
  • Easy to "buy" versus "build"

(This is the part where you can add to the list)


Cons:

  • Consulting organizations often don't understand how to implement them properly
  • They have been mapped 1-to-1 to the physical (without the necessary changes)
  • The models are super-typed, and often at too high a grain for customers to really grasp ALL the elements that are combined within
  • The super-types try to handle too many levels of "grain," mixing corporate customers (for example) with individual customers and changing the outcome of the queries on the reports
  • They often require years of expertise and training to implement properly
  • They are sometimes a nightmare to tune, partition, and get performance from
  • They sometimes require tons of staging areas to "prepare" the data before integrating into super-typed table structures
  • They nearly always need "extending" and customization, but this doesn't fare well with customers, as it becomes a multi-year, high-cost effort - turning into a huge consulting services gig for big dollars.

Now are they worth it?
Sometimes they are; other times they fall down. If you're a customer, and you're looking at these models, they certainly seem appealing.

What's your experience? *** WARNING: THIS IS MY OPINION ONLY! ***
I've seen these models take 3 to 5 years to implement correctly, and to modify and customize properly. To be fair, I've seen 50/50 success rates with the implementations: some have worked beautifully, others have failed miserably. Was it because of the model? Can't say. In some environments it was the politics - not the model, not the provider, not the hardware or database. At other companies it was because consulting services didn't understand how to integrate, alter, upgrade, and customize. At still others it was because of the storage needed in the staging area in order to "cleanse/change/alter and super-type" the data before it was loaded.

Successful implementations I've seen were because the scope was controlled, the modifications were extremely minimal, and data quality (from the source system) was extremely high. I've also seen successful implementations when there is only one source system to deal with. When this is the case, cross-mapping the information from the source to the target super-types is fairly easy to do. When there are multiple source systems involved, the task quickly becomes tremendously complex.

My conclusion (this is where I'd like your feedback)
I believe that sometimes buying the logical data models can really help move the project forward, but I also believe that breaking out the super-type into its constituent raw-level components for the physical model is an important act of service to the customer. I believe that if you "customize" the logical data model, you will have trouble when the vendor puts out a new version (one that alters the version you have in place). I've seen cases where a 2- to 3-year bid effort to customize became a 7-year services contract in order to maintain customizations across multiple releases.

Don't get me wrong: LDMs are WONDERFUL GUIDES! But alas, they are "logical" data models, and should remain "logical," as their name implies. In the physical world, dealing with history, separation of grain, functional areas of business (crossing lines of business), and so on is very, very important. Customizations are nearly *ALWAYS* needed in a data warehousing world, even with pre-packaged implementations of "pre-built data warehouse solutions."

Finally, let me ask you data modelers out there: does it really make the physical ETL implementation that much easier if you "super-type" everything (customer, individual, employee, manager, corporation, organization, and so on) into one single table? What if each of these elements has its own business key, and the data is clearly at a different grain - what do you do in your LDM to ensure success?
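To make the grain problem concrete, here is a toy sketch (all keys and names are hypothetical) of how a super-typed table invites over-counting that separate, per-grain tables avoid:

```python
# Super-typed "Party" table mixing grains: a corporate customer sits
# beside the individuals who belong to it, in the same table.
party = [
    {"party_id": 1, "type": "ORG",    "business_key": "DUNS:123", "name": "Acme Corp"},
    {"party_id": 2, "type": "PERSON", "business_key": "SSN:999",  "name": "Jane Doe"},
    {"party_id": 3, "type": "PERSON", "business_key": "SSN:888",  "name": "John Roe"},
]

# A naive "how many customers?" against the super-type over-counts
# if the report meant corporate accounts only:
naive_count = len(party)
org_count = sum(1 for p in party if p["type"] == "ORG")

# Splitting by grain/business key makes each count unambiguous,
# and each table carries exactly one kind of business key:
organizations = [p for p in party if p["type"] == "ORG"]
persons       = [p for p in party if p["type"] == "PERSON"]
```

The two counts differ (3 vs. 1) precisely because the super-type mixes two grains behind one table - the report outcome changes depending on whether the query author remembers to filter on `type`.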

Thank you for your comments and feedback,
Daniel Linstedt

Posted May 22, 2009 4:26 AM
Permalink | No Comments |
