Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

Bill Inmon and I sat down the other day to discuss a system that we are building. We didn't have a good "name" for it, but what it amounts to is Operational Data Warehousing. If you can believe it, what we've done is take the operational specifics of the systems that capture data and place them on top of the Data Warehouse as a single, integrated historical and operational data store. We are currently using the Data Vault model for this componentry. Some folks have called this "Active Data Warehousing" in the past, but we feel this is one step beyond, in that it actually IS the operational store at the same time as being the Data Warehouse. Convergence has arrived...

I've blogged about convergence in the past; it's no secret that the world is converging, and I.T. is no different. It is also no secret that EDW technology is converging with operational technology. If we look behind us (hindsight is always 20/20), we can see the path along which data warehousing and operational systems diverged, and now the re-convergence of these systems. Active Data Warehousing coupled with SOA, and real-time alerts coming back from the ADW, have begun to turn the tables.

We have closed the gap on this one. Using the principles of Data Vault modeling (http://www.DanLinstedt.com), we've constructed an Operational Data Warehouse. (Right now, Bill and I do not have a better term for this; Bill also considers it a new approach.)

What does Operational Data Warehouse do?
One way to describe it is as an Operational Data Store with history.

Another way to describe it is as a data warehouse with operational (raw) data.

Why do it this way?
For one thing, it provides traceability for all of the data. Bringing in the RAW operational data over a web service (as generated by the upstream machines) provides us with accountability, auditability, and pure traceability. By utilizing the notion of the HUB entity within the Data Vault structures, we achieve horizontal integration across the data sets. This operational data warehouse is front-ended by web services and integrates directly into the business processes. It is not fed by any sort of "batch" system; it is, however, pre-loaded with master data.
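To make the Hub idea a little more concrete, here is a minimal Python sketch, assuming an in-memory store and a hypothetical customer feed. The `DataVault` class, the `load_message` method, the `customer_id` business key, and the record-source strings are all illustrative assumptions, not the structures of the system described above. Each raw message is inserted as delivered: the Hub gains one row per business key (which is what lets feeds from different sources integrate on the same key), and the Satellite keeps every payload stamped with a load date and record source for traceability.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataVault:
    """Hypothetical in-memory Hub + Satellite pair (illustration only)."""

    # Hub: one entry per business key -> (first load_date, first record_source).
    hub: dict = field(default_factory=dict)
    # Satellite: append-only rows of (business_key, load_date, record_source, payload).
    satellite: list = field(default_factory=list)

    def load_message(self, message: dict, record_source: str) -> None:
        """Insert one raw operational message; existing rows are never modified."""
        key = message["customer_id"]  # assumed business key field
        load_date = datetime.now(timezone.utc)
        # The Hub only gains a row the first time a business key is seen,
        # so data arriving from different sources lines up on the same key.
        if key not in self.hub:
            self.hub[key] = (load_date, record_source)
        # Keep the raw payload as delivered, stamped for traceability.
        payload = {k: v for k, v in message.items() if k != "customer_id"}
        self.satellite.append((key, load_date, record_source, payload))


vault = DataVault()
vault.load_message({"customer_id": "C-100", "name": "Acme"}, "web-service/orders")
vault.load_message({"customer_id": "C-100", "name": "Acme Corp"}, "web-service/billing")
print(len(vault.hub), len(vault.satellite))  # prints: 1 2
```

Two messages about the same customer from two sources produce one Hub entry and two Satellite rows; nothing is overwritten, so the origin of every value stays visible.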

The structures of the Data Vault have been set up within the databases in a way that allows tremendous scalability and flexibility. We have physically partitioned the machines for both security and scalability. We can join 800M rows to 300M rows to 100M rows and bring back 10k rows in under 10 seconds when we know what we're looking for. This setup is housed on SQL Server 2005 on Windows Server 2003 R2, 32-bit, with two dual-core CPUs at 2.8 GHz and 2 GB of RAM.

So what's this got to do with Operational Data Warehousing?
Plenty. Operational data warehouses (a very loose term today) have the following requirements:
* Must be accountable
* Must be auditable
* Must be a system-of-record
* Must interact with other operational systems
* Must house operational data
* Must house historical data
* Must NOT separate operational data from historical data in the data store.
* Must be the SOURCE for a major business function
* Must be real-time (can have batch feeds, but must be real-time in data streams)
* Must be part of the business process flows.

So what are the technical requirements?
* Must be scalable
* Must be flexible
* Must NOT break history when the business changes/data models change
* Must NOT break existing data feeds when the model changes
* Must provide FAST access, fast inserts, etc.
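One common Data Vault practice for the two "Must NOT break" requirements above is to absorb a model change by adding a new structure rather than altering an existing one. The sketch below assumes that convention: new attributes from the business go into a new Satellite table hanging off the same Hub, so the existing table, its history, and the feed that loads it are left untouched. It uses Python's built-in `sqlite3` purely for illustration (the system described here runs on SQL Server 2005), and the table names, columns, and `load_name` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_id   TEXT PRIMARY KEY,      -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_name (
    customer_id   TEXT NOT NULL REFERENCES hub_customer(customer_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    PRIMARY KEY (customer_id, load_date)  -- history: one row per load
);
""")


def load_name(customer_id, load_date, source, name):
    """Existing feed: writes only to the tables it already knows about."""
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?)",
                 (customer_id, load_date, source))
    conn.execute("INSERT INTO sat_customer_name VALUES (?, ?, ?, ?)",
                 (customer_id, load_date, source, name))


load_name("C-100", "2008-02-01", "crm", "Acme")

# The business starts tracking credit limits: model them as a NEW satellite
# instead of ALTERing sat_customer_name, so existing rows and feeds are untouched.
conn.execute("""
CREATE TABLE sat_customer_credit (
    customer_id   TEXT NOT NULL REFERENCES hub_customer(customer_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    credit_limit  REAL,
    PRIMARY KEY (customer_id, load_date)
)""")
conn.execute("INSERT INTO sat_customer_credit VALUES (?, ?, ?, ?)",
             ("C-100", "2008-02-10", "billing", 5000.0))

# The original feed keeps working, unchanged, after the model change.
load_name("C-100", "2008-02-15", "crm", "Acme Corp")
```

The trade-off is more tables to join at query time in exchange for never rewriting loaded history or existing load code.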

And of course it MUST follow the DW2.0 requirements:
* Must have historical data
* Must not be "updated" directly (doing so would break auditability)
* Must maintain cross-functional relationships
* Must be GRANULAR (to the absolute lowest level of grain available)
* Must provide strategic and tactical value
* Must include indexes/pointers/links to unstructured information

So what? How do I get there?
We've used Data Vault data modeling to get there. It meets all of these needs and has been endorsed by Bill Inmon as the "optimal choice for DW2.0" data modeling. Because of the structures, the foundational approaches to loading the Data Vault, and what the data in the Data Vault represents, we've been able to construct the system described above. In fact, we have two of these up and running: one in our facilities in Denver, and one in Washington, DC.

So you mean to say there "is no operational system"?
Partially. There are many "machines" that collect the information operationally and pass it back to our Operational Data Warehouse (Data Vault), but they do not keep the information after they've released it to us. The ODW Data Vault actually stores all of the operational information from around the country and, soon, from around the world.

Next time we'll dive a little deeper into what it means to construct one of these and how they work.

You might already have one of these; if you do, I'd love to hear about it. As always, thoughts, comments, and corrections are welcome.

Cheers,
Dan Linstedt


Posted February 25, 2008 8:26 AM

2 Comments

Dan,

Nice, very nice. I so agree with the convergence scenario between the transactional and informational worlds. However, I see two scenarios developing at the moment: the guys and gals who believe integration should be done through application integration (the federated approach; a lot of the SOA guys are keen on this one), and the guys and gals who believe in data integration. Needless to say... I vote for the second one.

The first is the federated one. Data is stored where it is created, and federation software (EII) retrieves it when asked.

To be frank, I don't like this one... In fact, it's intrinsically wrong. How about traceability? History? I don't think this is doable, especially if you hold it against the requirements you posted.

The second one is something you seem to have cracked. It is (as I call it) a data-centric scenario in which the transactional and informational worlds converge. The transactional (read: operational) system stores only the current transaction; once committed, it is fed directly into the data service/ODS/data warehouse/?? Voila... your data warehouse is active (you can now use this data for almost any purpose, not just BI!!)

In this future (the second scenario), the operational OLTP databases supporting operational systems will eventually die. At the very least, they will be very thin and will focus on the services they need to provide, which I would say is a good thing.

This scenario opens up a lot of doors of opportunity now. I can't even begin to imagine the impact of such an architecture on an Enterprise Architecture.

I am, of course, very interested in the technical solution you chose and the model underneath it.

Great post!

Dan (and Bill),

I must admit the concept looks appealing, but do I understand you both correctly if I say the following:

"We don't need an operational system anymore with this concept. We only need an interface to a very small database, where the information is 'stored' until it is 'extracted' by the ODW?"

The closed loop that results from this approach is very nice: the data warehouse supports the operational interface, which in turn delivers the information for that support...

But... doesn't this shift responsibility for data quality more and more toward the data warehouse? You might say data quality becomes more visible (and can be adjusted more quickly), but to err is human, and using the operational interface means human intervention...

My feeling is that in the future this might be a nice opportunity for highly mature companies, but with companies on the lower rungs of the BI / DWH ladder, we might face problems using this ODW. In my opinion it requires a very strong BI / DWH vision, which I find lacking at some customers (sorry about that).
