Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, unstructured data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for Masters students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best, Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

Before we get to Dynamic Data Warehousing, we first need to reach Operational Data Warehousing. Now, I realize that I'm not the first, nor will I be the last, to use or even possibly abuse this term. In fact, if you search on the term today you'll get tons and tons of hits. I do, however, believe that Data Warehousing and BI as an industry have gotten slow and become somewhat of a laggard in terms of keeping up with technology. Just look at the adoption curve of DW2.0... It simply isn't there yet (wish it were). Anyhow, in this blog let's take another look at the ODW as Bill Inmon and I are beginning to discuss it.

First, I must say thank you to Bill for being not only a great friend, but a wonderful mentor to me. I must also say thank you to Claudia Imhoff and Colin White for writing about Operational BI lately, and of course to all my other friends out there who continually amaze me by answering my simplistic and absurd questions.

On that note, I've been pondering (and asking Bill for help) Operational Data Warehousing. I've also been blogging on the subject lately, and as of last week - had the wonderful opportunity to share my questions with my good friend Jeff Jonas (see his blog here). More on that in my thought experiments section, where I'll also be blogging on Form versus Function and some new advances in computing sciences.

So, to the point: Operational Data Warehousing requires:
* good form
* strong functionality
* streaming real-time data
* scalability, and flexibility

When I talk about real-time data, I'm not talking about "every 3 to 5 seconds, I get 500 transactions or so..." No, I'm talking about: every 2 to 5 microseconds, the warehouse receives burst-rate mini-batches of 500 to 10,000 transactions across multiple feeds... In other words, AS the transaction is created and pushed across to other source systems, it is also pushed directly into the warehouse on an operational basis.
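To make the mini-batch idea concrete, here's a minimal Python sketch of a feed that accumulates incoming transactions and flushes them to the warehouse in burst-rate batches rather than row by row. The class and parameter names are illustrative assumptions, not a reference to any specific product:

```python
from collections import deque

class MiniBatchFeed:
    """Accumulates incoming transactions and flushes them to the
    warehouse in burst-rate mini-batches, rather than row by row."""

    def __init__(self, warehouse_writer, batch_size=500):
        self.writer = warehouse_writer      # callable that loads one batch
        self.batch_size = batch_size
        self.buffer = deque()

    def on_transaction(self, txn):
        # Called AS each source transaction is created and pushed across.
        self.buffer.append(txn)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Drain whatever has accumulated into a single warehouse load.
        if self.buffer:
            batch = list(self.buffer)
            self.buffer.clear()
            self.writer(batch)

# Usage: collect the batches as they land (small batch_size for illustration)
loaded = []
feed = MiniBatchFeed(loaded.append, batch_size=3)
for i in range(7):
    feed.on_transaction({"id": i})
feed.flush()  # drain the remainder
```

In a real ODW the writer would be a bulk loader, and the flush would also be driven by a timer so a slow feed can't hold data back indefinitely.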

In some cases, the objects doing the data collection and generating the transactions do NOT keep a copy of the transaction. In these instances, it is important to realize that the real-time data fed to the ODW IS a system of record. Now, as a DW2.0-compliant architecture, we are housing SOR data and NON-SOR data in the same structure, in the same place, at the same time. By non-SOR data I mean anything defined as "arriving from an operational source system which keeps transactional history." *** It has NOTHING to do with it being batch or non-batch ***
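The SOR / non-SOR distinction above boils down to one question about the feed, not about batch timing. A tiny Python sketch (feed names and the `keeps_history` flag are hypothetical, for illustration only):

```python
def classify_record_source(source):
    """Tag an arriving feed as SOR or NON-SOR for the ODW.

    Per the definition above: a feed is NON-SOR when it arrives from an
    operational source system that keeps its own transactional history;
    when the collector keeps no copy, the ODW copy IS the system of
    record. Batch vs. non-batch plays no part in the decision.
    """
    return "NON-SOR" if source["keeps_history"] else "SOR"

# Hypothetical feeds: a line collector that discards its transactions,
# and an ERP system that retains its own history.
feeds = [
    {"name": "line_collector", "keeps_history": False},
    {"name": "erp_system",     "keeps_history": True},
]
tags = {f["name"]: classify_record_source(f) for f in feeds}
```
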

Ok, so there was a comment on my blog from a good friend, Walter Smetsers, requesting clarification of a statement about whether "we will continue to need operational systems" once we build an ODW... The answer today is: maybe. In the future, as new systems are built, convergence will take hold and the answer may become: no.

In an ODW, we have not only the capacity but also the capability to program the operational applications themselves directly on top of the ODW. However, in order to make this happen, we also need a Master Data layer inside the ODW, along with a Metadata layer and a Master Metadata layer. All of this MUST be coupled together and managed through an ontological function.

Ok - enough blathering, do we have one of these or don't we?
Yes, we've built one, but it doesn't YET have all the components it needs.

Where is it?
Unfortunately I can no longer share the customer information; however, it's a Data Vault modeling architecture called the Serialization Vault that we've built for the national E-Pedigree mandates from the FDA. I've heard that the WHO (World Health Organization) is interested, among other parties.

How does it work?
We accept data in real-time from collectors on the manufacturing lines into a central Data Vault-modeled data warehouse. We keep the data itself separate, and through a logical model and metadata layer we can re-assemble the disparate data sets to provide the drug manufacturers with a complete picture.
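The "keep it separate, re-assemble through a logical model" idea can be sketched with Data Vault-style structures: a hub of business keys, satellites holding descriptive data per source, and a join that re-assembles the disparate sets into one picture. All table and column names here are assumptions for illustration, not the actual Serialization Vault schema:

```python
# Hub: one row per business key (here, a drug unit's NDC code).
hub_drug = [
    {"hub_key": 1, "ndc_code": "0001-0001"},
]

# Satellites: descriptive data kept separate, one per source feed,
# indexed by hub key.
sat_manufacturing = {1: {"line": "A7", "lot": "L-2008-14"}}
sat_distribution  = {1: {"carrier": "XYZ", "shipped": "2008-05-01"}}

def assemble_picture(hub, *satellites):
    """Join each hub business key to every satellite that describes it,
    yielding the re-assembled 'complete picture'."""
    picture = []
    for row in hub:
        merged = dict(row)
        for sat in satellites:
            merged.update(sat.get(row["hub_key"], {}))
        picture.append(merged)
    return picture

picture = assemble_picture(hub_drug, sat_manufacturing, sat_distribution)
```

The point of the separation is that each source feed can load its own satellite independently and in real-time, while the logical model decides, at query time, how the pieces fit together.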

We also have a layer of operational systems on top of the ODW, allowing data to be logically updated by the application. I say logically because, in keeping with DW2.0, it stores the history of the original transaction and merely inserts new information rather than updating in place.

There are a few other customers who've had one of these in process for years. I'd be happy to put you in touch with them.

BACK to ODW...
Are vendors supporting these concepts today? Not directly. You can build one on any RDBMS or column-based database, as long as application programming logic can access the underlying data directly. Operational BI is coming to the forefront, and there are a number of young tech vendors coming to the table to meet the challenge, but it will be a while before the market space "grows up".

Does this mean my ODW is treated just like an operational system?
YES! It is an OPERATIONAL SYSTEM that has converged with a DATA WAREHOUSE; therefore it has the same requirements as an operational system and an EDW at the same time. The best type of data modeling for this is a normalized format, for scalability, flexibility, and auditability.

I'd love to hear your opinions, thoughts and questions.

Thanks,
Dan L


Posted May 4, 2008 9:30 PM

3 Comments

I recently had a similar challenge while designing and implementing a subject-oriented data mart at a bank, for their currency exchange analytical needs.

The challenge started out requiring that data cubes be available to the bank's Treasury Front Office in no more than 30 seconds - the part of Treasury Banking that acts as front-line currency dealers.

When I accomplished this, I thought a lot about what I had done... it is not a warehouse - I had turned the warehouse into an operational transaction system, because it continually kept receiving transaction updates from OLTP.

But reading your blog, it seems definitive that in the near future, one would not need an operational system when equipped with an Operational Data Warehouse.

Dan,

Does this imply that the ODW uses a 2PC protocol? I can't imagine how that can perform queries with any sort of acceptable SLA.

-NR

Hi Neil,

You and I will chat about this before I post any further information. There is definitely an impact on two-phase commit - we will explore this together as we move forward.

Love to have your thoughts,
Dan Linstedt
