Before we get to Dynamic Data Warehousing, we first need to reach Operational Data Warehousing. I realize that I'm not the first, nor will I be the last, to use (or possibly abuse) this term. In fact, if you search on the term today you'll get a great many hits. I do, however, believe that Data Warehousing and BI as an industry have gotten slow and become something of a laggard in keeping up with technology. Just look at the adoption curve of DW2.0... It simply isn't there yet (I wish it were). Anyhow, in this blog let's take another look at the ODW (Operational Data Warehouse) as Bill Inmon and I are beginning to discuss it.
First, I must say thank you to Bill for being not only a great friend, but a wonderful mentor to me. I must also say thank you to Claudia Imhoff and Colin White for writing about Operational BI lately, and of course to all my other friends out there who continually amaze me by answering my simplistic and absurd questions.
On that note, I've been pondering Operational Data Warehousing (and asking Bill for help). I've also been blogging on the subject lately and, as of last week, had the wonderful opportunity to share my questions with my good friend Jeff Jonas (see his blog). More on that in my thought experiments section, where I'll also be blogging on Form versus Function and some new advances in computing sciences.
So, to the point: Operational Data Warehousing, as it were, requires:
* good form
* strong functionality
* streaming real-time data
* scalability and flexibility
When I talk about real-time data, I'm not talking about "every 3 to 5 seconds, I get 500 transactions or so..." No, I'm talking about every 2 to 5 microseconds, the warehouse receives burst-rate mini-batches of 500 to 10,000 transactions across multiple feeds... In other words, AS the transaction is created and pushed across to other source systems, it is also pushed directly into the warehouse on an Operational basis.
In some cases, the objects doing the data collection and generating the transactions do NOT keep a copy of the transaction. In these instances, it is important to realize that the data fed to the ODW in real time IS the system of record (SOR). Now, as a DW2.0 compliant architecture, we are housing SOR data and NON-SOR data in the same structure, in the same place, at the same time. By Non-SOR data I mean: anything defined as "arriving from an operational source system which keeps transactional history". *** It has NOTHING to do with the data being batch or non-batch. ***
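To make the SOR versus Non-SOR distinction concrete, here is a minimal Python sketch of a warehouse that accepts mini-batches from several feeds and tags each row by whether the source keeps its own transactional history. The names `Feed`, `Warehouse`, `receive_batch` and `is_sor` are hypothetical and exist only for illustration; this is not the actual implementation, and it ignores throughput entirely.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Feed:
    """A hypothetical real-time feed into the warehouse."""
    name: str
    source_keeps_history: bool  # True means the warehouse copy is Non-SOR


@dataclass
class Warehouse:
    """Holds SOR and Non-SOR rows in the same structure (illustrative only)."""
    rows: list = field(default_factory=list)
    _seq: count = field(default_factory=lambda: count(1), repr=False)

    def receive_batch(self, feed, transactions):
        """Append one mini-batch as it arrives; batch size is irrelevant to SOR status."""
        for txn in transactions:
            self.rows.append({
                "arrival_seq": next(self._seq),  # order of arrival across all feeds
                "feed": feed.name,
                # The warehouse is the system of record only when the
                # source does NOT keep the transaction itself.
                "is_sor": not feed.source_keeps_history,
                "payload": txn,
            })
        return len(transactions)
```

Note that `is_sor` is derived only from the source's behaviour, never from how the data was delivered, which is the point of the definition above.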
Ok, so there was a comment on my blog from a good friend, Walter Smetsers, requesting clarification of a statement about "will we continue to need operational systems" once we build an ODW... The answer today is: maybe. In the future, as new systems are built, convergence will take hold and the answer may become: no.
In an ODW, we have both the capacity and the capability to program the operational applications themselves directly on top of the ODW. However, in order to make this happen, we also need a Master Data layer inside the ODW, along with a Metadata Layer and a Master Metadata Layer. All of this MUST be coupled together and managed through an ontological function.
Ok - enough blathering, do we have one of these or don't we?
Yes, we've built one, but it doesn't YET have all the components it needs.
Where is it?
Unfortunately, I can no longer share the customer information; however, it's a Data Vault modeling architecture called the Serialization Vault that we've built for the national E-Pedigree mandates from the FDA. I've heard that the WHO (World Health Organization) is interested, among other parties.
How does it work?
We accept data in real-time from collectors on the manufacturing lines to a central Data Vault modeled data warehouse. We keep the data itself separate, and through a logical model and metadata layer we can re-assemble the disparate data sets to provide the drug manufacturers a complete picture.
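As a rough illustration of keeping the data sets separate and re-assembling them through a logical model, here is a small Python sketch using SQLite. It assumes a Data Vault style layout with one hub for the business key and one satellite per data set. All table names, column names and sample values (`hub_serial`, `sat_packaging`, `sat_shipment`, and so on) are hypothetical; the real Serialization Vault schema is not shown here.

```python
import sqlite3


def build_vault():
    """Create a tiny in-memory vault: one hub, two independent satellites."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hub_serial    (serial_no TEXT PRIMARY KEY);
        CREATE TABLE sat_packaging (serial_no TEXT, line_id TEXT);
        CREATE TABLE sat_shipment  (serial_no TEXT, carrier TEXT);
    """)
    return conn


def _register_key(conn, serial_no):
    # The hub holds each business key exactly once, whichever feed saw it first.
    conn.execute("INSERT OR IGNORE INTO hub_serial (serial_no) VALUES (?)", (serial_no,))


def load_packaging(conn, serial_no, line_id):
    _register_key(conn, serial_no)
    conn.execute("INSERT INTO sat_packaging VALUES (?, ?)", (serial_no, line_id))


def load_shipment(conn, serial_no, carrier):
    _register_key(conn, serial_no)
    conn.execute("INSERT INTO sat_shipment VALUES (?, ?)", (serial_no, carrier))


def complete_picture(conn, serial_no):
    """Re-assemble the separate data sets for one serial number."""
    return conn.execute("""
        SELECT h.serial_no, p.line_id, s.carrier
        FROM hub_serial h
        LEFT JOIN sat_packaging p ON p.serial_no = h.serial_no
        LEFT JOIN sat_shipment  s ON s.serial_no = h.serial_no
        WHERE h.serial_no = ?
    """, (serial_no,)).fetchone()
```

Each collector only ever writes to its own satellite; the join through the hub is what produces the combined view, and a data set that has not arrived yet simply shows up as a missing value.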
We also have a layer of operational systems on top of the ODW, allowing data to be logically updated by the application. I say logically because, in keeping with DW2.0, the ODW stores the history of the original transaction and merely inserts new information rather than updating in place.
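The "logical update" idea above can be sketched in a few lines of Python with SQLite: the application asks for an update, but the store only ever inserts a new version and leaves earlier rows untouched. The table `sat_item_status` and the functions `logical_update`, `current_status` and `history` are hypothetical names chosen for this example, and the simple per-key sequence number is an assumption standing in for whatever load stamping a real system would use.

```python
import sqlite3


def make_store():
    """Create an in-memory, insert-only history table (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE sat_item_status (
            item_key TEXT    NOT NULL,
            load_seq INTEGER NOT NULL,
            status   TEXT    NOT NULL,
            PRIMARY KEY (item_key, load_seq)
        )
    """)
    return conn


def logical_update(conn, item_key, status):
    """Record a new version of the item; never UPDATE or DELETE existing rows."""
    (next_seq,) = conn.execute(
        "SELECT COALESCE(MAX(load_seq), 0) + 1 FROM sat_item_status WHERE item_key = ?",
        (item_key,),
    ).fetchone()
    conn.execute(
        "INSERT INTO sat_item_status (item_key, load_seq, status) VALUES (?, ?, ?)",
        (item_key, next_seq, status),
    )
    return next_seq


def current_status(conn, item_key):
    """What the application sees: only the latest version."""
    row = conn.execute(
        "SELECT status FROM sat_item_status WHERE item_key = ? "
        "ORDER BY load_seq DESC LIMIT 1",
        (item_key,),
    ).fetchone()
    return row[0] if row else None


def history(conn, item_key):
    """What the warehouse keeps: every version, oldest first."""
    rows = conn.execute(
        "SELECT status FROM sat_item_status WHERE item_key = ? ORDER BY load_seq",
        (item_key,),
    )
    return [r[0] for r in rows]
```

The application layer reads through something like `current_status`, so it behaves as if the row had been updated, while the full audit trail stays available through `history`.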
There are a few other customers who've had one of these in process for years. I'd be happy to put you in touch with them.
BACK to ODW...
Are vendors supporting these concepts today? Not directly. You can build one on any RDBMS or column-based database, as long as application programming logic can access the data underneath directly. Operational BI is coming to the forefront, and there are a number of young tech vendors coming to the table to meet the challenge, but it will be a while before the market space "grows up".
Does this mean my ODW is treated just like an operational system?
YES! It is an OPERATIONAL SYSTEM that has converged with a DATA WAREHOUSE; therefore it has the same requirements as an operational system and an EDW at the same time. The best type of data modeling for this is a normalized format, for scalability, flexibility, and auditability.
I'd love to hear your opinions, thoughts and questions.