Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

February 2008 Archives

Bill Inmon and I sat down the other day to discuss a system that we are building. We didn't have a good "name" for it, but what it amounts to is Operational Data Warehousing. If you can believe it, what we've done is take the operational specifics of the systems capturing data and place them on top of the Data Warehouse as a single, integrated historical and operational data store. We are currently using the Data Vault model for this componentry. Some folks have called this "Active Data Warehousing" in the past, but we feel that this is one step beyond, in that it actually IS the operational store at the same time as being the Data Warehouse. Convergence has arrived...

I've blogged about convergence in the past; it's no secret that the world is converging, and I.T. is no different. It is also no secret that EDW technology is converging with operational technology. If we look behind us (hindsight is always 20/20), we can see the path along which data warehousing and operational systems diverged, and now the re-convergence of these systems. Active Data Warehousing coupled with SOA, and real-time alerts coming back from the ADW, have begun to turn the tables.

We have closed the gap on this one. Using the principles of Data Vault modeling (http://www.DanLinstedt.com), we've constructed an Operational Data Warehouse. (Right now, Bill and I do not have a better term for this; Bill also thought that this is a new approach.)

What does an Operational Data Warehouse do?
One way to describe it is as an Operational Data Store with history.

Another way to describe it is as a data warehouse with operational (raw) data.

Why do it this way?
Well, for one, it provides traceability in all the data. Bringing in the RAW operational data over a web service (as generated by the upstream machines) gives us accountability, auditability, and pure traceability. By utilizing the notion of the HUB entity within the Data Vault structures, we achieve horizontal integration across the data sets. This operational data warehouse is front-ended by web services and integrates directly into the business processes. It is not fed by any sort of "batch" system; it is, however, pre-loaded with master data.
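To make the HUB idea a little more concrete, here is a minimal in-memory sketch of how a hub gives horizontal integration: every feed that mentions the same business key resolves to one hub row, and the hub remembers when and from which source that key was first seen. It assumes the commonly described hub columns (surrogate key, business key, load date, record source); the class names, the sequence-style surrogate key, the key-normalization rule (trim and upper-case), and the source names are all hypothetical and are not taken from our SQL Server implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HubRow:
    """One row in a hypothetical Data Vault hub: a unique business key."""
    hub_id: int          # surrogate key (sequence-style; an assumption)
    business_key: str    # e.g. a customer number shared across systems
    load_date: datetime  # when the key was first seen
    record_source: str   # which upstream system first supplied it


class Hub:
    """Minimal in-memory hub: each business key is stored exactly once."""

    def __init__(self) -> None:
        self._rows: dict[str, HubRow] = {}

    def load(self, business_key: str, record_source: str) -> HubRow:
        # Hypothetical normalization so feeds that format the key
        # differently still land on the same hub row.
        key = business_key.strip().upper()
        existing = self._rows.get(key)
        if existing is not None:
            return existing  # key already known: no new row, no update
        row = HubRow(
            hub_id=len(self._rows) + 1,
            business_key=key,
            load_date=datetime.now(timezone.utc),
            record_source=record_source,
        )
        self._rows[key] = row
        return row

    def __len__(self) -> int:
        return len(self._rows)


# Two hypothetical web-service feeds send the same customer key.
customers = Hub()
first = customers.load("cust-1001", record_source="web_service_A")
again = customers.load(" CUST-1001 ", record_source="web_service_B")
other = customers.load("cust-2002", record_source="web_service_B")
```

Note that a repeat sighting returns the existing row rather than overwriting it, so the original record source stays intact for auditing.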

The structures of the Data Vault have been set up within the databases in such a way as to allow tremendous scalability and flexibility. We have physically partitioned the machines for both security and scalability. We can join 800M rows to 300M rows to 100M rows and bring back 10K rows in under 10 seconds when we know what we're looking for. This setup is housed on SQL Server 2005 on Windows Server 2003 R2 (32-bit), with two dual-core CPUs at 2.8 GHz and 2 GB of RAM.

So what's this got to do with Operational Data Warehousing?
Plenty. Operational data warehouses (a very loose term today) have the following requirements:
* Must be accountable
* Must be auditable
* Must be a system-of-record
* Must interact with other operational systems
* Must house operational data
* Must house historical data
* Must NOT separate operational data from historical data in the data store.
* Must be the SOURCE for a major business function
* Must be real-time (can have batch feeds, but must be real-time in data streams)
* Must be part of the business process flows.

So what are the technical requirements?
* Must be scalable
* Must be flexible
* Must NOT break history when the business changes/data models change
* Must NOT break existing data feeds when the model changes
* Must provide FAST access, fast inserts, etc.

And of course it MUST follow the DW2.0 requirements:
* Must have historical data
* Must not be "updated" directly (would break auditability)
* Must maintain cross-functional relationships
* Must be GRANULAR (to the absolute lowest level of grain available)
* Must provide strategic and tactical value
* Must include indexes/pointers/links to unstructured information

So what? How do I get there?
We've used Data Vault data modeling to get there. It meets all these needs and has been blessed by Bill Inmon as the "optimal choice for DW2.0" data modeling. Because of the structures, along with the foundational approaches to loading the Data Vault and what the data in the Data Vault represents, we've been able to construct the system described above. In fact, we have two of these up and running: one in our facilities in Denver and one in Washington, DC.

So you mean to say there "is no operational system"?
Partially. There are many "machines" that collect the information operationally and pass it back to our Operational Data Warehouse (Data Vault), but they do not retain the information after they've released it to us. The ODW Data Vault actually stores all the operational information from around the country and, soon, from around the world.

Next time we'll dive a little deeper into what it takes to construct one of these, and how they work.

You might already have one of these; if you do, I'd love to hear about it. As always, thoughts, comments, and corrections are welcome.

Cheers,
Dan Linstedt


Posted February 25, 2008 8:26 AM

In this entry I'm going to go off on a small tangent about "contracting" with companies that operate in the consulting realm: what to watch for, what to ask about, and how to negotiate with these companies. These companies are famous for "squeezing" you as a customer to pin you down on deliverables (which I see as completely fair): in order for them to get paid, they _must_ have a set of clearly defined deliverables and timelines signed off. However, these companies are also interesting in another light. I'll tell you a true story (without names) about a review of several companies that pitched to solve a problem for a very large customer. Our team was involved in "reviewing" their bids.

The scenario:
Customer "ABA" had a project scoped out: to build a warehouse of information within a one-year time frame, with real-time feeds from a multitude of systems that included HR, web capture, the customer base, click-stream analytics, data mining operations, and a few other components. They had specified the first phase around these pieces (scoped down quite a bit) and then opened it up for bids. Somehow, the "consulting companies" involved in the bid (all three of them) managed to find out how much time the customer had to implement, and how much money (the total budget) was available for the first phase.

The customer had told us that they hoped this would take 6 months and $1M (the numbers and timeline have been changed to protect the innocent). The customer brought us in to evaluate the pitches from these vendors.

So, to move on:
We evaluated the pitches, looked at all the PowerPoint decks and documentation, and then called the companies making the pitches to discuss it all with them. At the end of the day (strange as it may seem), the results were as follows:

Company "A" pitched 6 months at $1+ million (without expenses). They had pitched an elaborate scheme to get the company "integrated", but for these first six months they did NOT pitch any sort of relevant BI, reporting, or analytics. This was just to get the database installed and working, with the initial feeds in place. They claimed 25+ people were needed from a variety of backgrounds (some of which didn't make sense).

It turns out all three companies pitched EXACTLY to the time frame, and EXACTLY to the TOTAL dollar figures "available" for the implementation.

Ok, you get the point. So why is this case study in this blog anyhow? What's the relevance? I'm glad you asked...

The relevance is this: none of the three companies' pitches included (you guessed it) a) the artifacts that would make the project go, b) knowledge transfer on the artifacts, or c) training and knowledge transfer on the systems they were going to put in place, and d) NO involvement from the customer's employees was specified.

And it went on from there. In other words, none of the pitches said anything about scope control, delivery processes, the methodologies and knowledge used to build the systems, training, or mentoring. And of course, none of the pitches included _any_ of the customer's own employees, not even the business users.

So, at the end of the project (and yes, the consulting companies pitched this too), there would be three more *required* "phases" rounding out the next two years, topping $13M to $15M in revenue streams and piling on over 250 _different_ resources (rolling onto the job, off the job, and being shuffled around to different customers depending on resource needs) in order to complete what the business needed done inside of 6 months to stay competitive. At the end of the project, there would be NO knowledge transfer, NO training, and NO mentoring on the processes and methodologies used to get there, nor on the systems that were put in place.

The consulting companies "claimed" it would be easier for the customer this way, without getting involved: that it would be a "black box slam dunk" proposition, that the customer didn't need to care or worry about what or who they rolled in and out of the project, nor about HOW they built the solution, just that they got it done in time and "within the budget they specified."

This is a serious, serious mess. Now, I'm no dummy (at least I don't think so)... I've worked as a CUSTOMER project manager/employee, and as a consultant. I've seen it happen (and I'm probably guilty of doing these things in the past, which is one reason I'm writing these entries).

I no longer believe that this is the right way to do business. My full value proposition as an expert/consultant is really, truly as an educator/advisor/counselor to the customer: YOU! It is my responsibility not only to help you get the project done, guide its build-out, and put in place the people, processes, and knowledge needed to get it done, but ALSO to train, educate, and share with the employees (both business and IT) HOW we get things done, WHY we do things the way we do, and WHAT the methodologies are that make it work.

It is my civic duty to understand how to OPTIMIZE the methodologies, and to train you on how to optimize the methodologies that make I.T. more nimble, more successful, and, repeatably over time, faster to delivery at lower cost. THIS is the way I operate.

These methodologies are not rocket science; they are, however, quite special from a DW/BI perspective, as we do have _some_ unique value propositions to overcome. But as a customer, you need to ask for:

* Copies of the methodology used to build your systems
* Maps that tie the implementation roadmap to the features to the methodology and scope
* Involvement from your employees: how they "help" and when they "help".
* A training/mentoring schedule: at what points in the project will the employees become TRAINED to be efficient in the methodology used to construct the system that is critical to your business? (SOME consulting companies hate this, because it "appears" to be a loss of future revenue from their perspective; be wary of companies that won't work with you on this.)

My whole point is: UNDERSTAND not only what you are "paying for" in services, but also what the end product will be once delivered. Without that level of understanding, you (as a customer) will be beholden to the consulting company to come in and maintain it... there's the kicker, my friends... yup, beholden to the consulting company. At the end of the project, if it's done well, you shouldn't NEED the consulting company anymore, except possibly for additional training or "new projects" where you are resource-stretched.

Just be careful, my friends. Find the RIGHT consulting company/consultant that meets your needs and will train you on the deliverables and on HOW to use them effectively in the future.

In my next entry we'll get back on track and talk more about projects, the type of knowledge that needs to be transferred, and the kind of training required to make these things a success. As always, feel free to contact me directly or respond to this blog entry. What have been your experiences?

Cheers,
Dan L
DanL@GeneseeAcademy.com


Posted February 15, 2008 5:18 AM