Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best. Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

Recently I've been asked about Active Data Warehousing and (Real-Time) Right-Time Data Warehousing: what do these mean to the enterprise? In this short blog entry, I offer my opinion on the definition of each. In future entries I will define the basics of building one, the questions to ask, and the potential value to the enterprise. I lead an effort in Active Data Warehousing and Right-Time Data Warehousing for Myers-Holum, Inc. We have best practices surrounding these efforts, and will soon offer tips and tricks for free on our site.

Too often, we are confused by marketing literature and vendor hype. I'm going to set the line and offer my opinion in DEFINING just what an Active Data Warehouse is, and just what a Right-Time Data Warehouse is. Are they different? Yes. Why? We'll get to that in a minute. For now, this is the way I define each:

Active Data Warehousing (ADW)
The technical ability to capture transactions when they change and integrate them into the warehouse, while still maintaining batch or scheduled refresh cycles.

Right-Time Data Warehouse (RTDW) (Not REAL-TIME)
The ability to answer a specific, justifiable business question at the time it is asked. In other words: a pre-designed business question that requires heaps of pre-integrated data (from the warehouse) in order to answer it. The answer drives a competitive business decision, or a decision that already has a cost / benefit analysis tied to the answer (in other words, a quantifiable answer).

RTDW: If the business needs to answer a question at the end of the day, every day, then an RTDW would refresh on a daily basis. If the business needs to answer a question every 30 minutes, then an RTDW would refresh every 30 minutes - assuming the data is available.

Is an RTDW an Active Data Warehouse?
In my opinion: not always. The way I see it, an ADW refreshes WHEN TRANSACTIONS CHANGE (it also includes pre-scheduled batch cycles). An RTDW is an ADW when the two are in sync. In other words, if transactions change every 10 minutes, and are captured and integrated into the warehouse as they change, and I have a business question that must be answered every 10 minutes, then I have both an ADW and an RTDW.
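To make the two definitions concrete, here is a minimal Python sketch of how they could be expressed as checks. Everything in it is an assumption made for illustration: the Warehouse class, its field names, and in particular the rule that an RTDW is one whose refresh interval is no longer than the interval at which the business question must be answered. That rule is one reading of the definitions above, not a formal test.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warehouse:
    # All fields are hypothetical, invented for this illustration.
    captures_on_change: bool  # transactions are loaded as they change
    refresh_minutes: float    # how often fresh data lands in the warehouse
    # How often a justified business question must be answered;
    # None means no such question has been identified.
    question_minutes: Optional[float] = None


def is_adw(w: Warehouse) -> bool:
    """Active: transactions are captured and integrated when they change."""
    return w.captures_on_change


def is_rtdw(w: Warehouse) -> bool:
    """Right-Time (assumed rule): data is refreshed at least as often
    as the business question needs answering."""
    return w.question_minutes is not None and w.refresh_minutes <= w.question_minutes


def classify(w: Warehouse) -> List[str]:
    """Return the labels that apply, in a fixed order."""
    labels = []
    if is_adw(w):
        labels.append("ADW")
    if is_rtdw(w):
        labels.append("RTDW")
    return labels


# End-of-day question, daily batch refresh: Right-Time but not Active.
daily = Warehouse(captures_on_change=False, refresh_minutes=1440, question_minutes=1440)

# Transactions captured every 10 minutes, question asked every 10 minutes: both.
in_sync = Warehouse(captures_on_change=True, refresh_minutes=10, question_minutes=10)

# Captures on change, but nobody has defined a question to answer: Active only.
no_question = Warehouse(captures_on_change=True, refresh_minutes=10)

print(classify(daily), classify(in_sync), classify(no_question))
```

The third case is the expensive one: a warehouse that is fed as transactions change, with no quantified business question behind it.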

Is an ADW also an RTDW?
Not necessarily, although 99% of the time, yes - it should be. Why? Because it costs a lot of money to build and feed an ADW, it shouldn't be constructed without solid, quantifiable business questions. With those questions in place, the ADW also meets the RTDW criteria.

So then, what is a Real-Time Data Warehouse?
I've blogged before on my interpretation of Real-Time data warehousing. I still maintain that it does not exist, and that timing just can't be fast enough (due to the laws of physics) to make Real-Time decisions. That, of course, is the technical definition.

Is the business definition of Real-Time different from Right-Time?
Yes, there is a distinction between the two. Real-Time is geared more towards what I've defined here as ADW than towards Right-Time (as defined here).

Please don't mince words when going forward. Develop best-practices, metadata definitions, and terminology standards (business metadata). Too often our businesses are confused by all the vendor hype and marketing material that "throws a term in" just because it sounds cool.

I'd love to hear your thoughts on this topic, even if you disagree. How do you define Real-Time, Right-Time, and Active DW?

Thanks,
Dan L


Posted January 30, 2006 12:47 PM

5 Comments

Another aspect of Active Data Warehousing, in my opinion, is the ability of the warehouse environment to create activity in the overall transaction environment. Rather than just receiving and integrating transactions from other systems as they happen, the result of that incoming transaction needs to be the possibility of creating a new outgoing transaction destined to initiate some activity elsewhere in the enterprise. In this respect, I'd argue that active warehousing is more fringe than just transaction speed (versus batch) updates. That being said, I'm still wrapping the pragmatic part of my brain around some of the ideas.

Hi Paul,

I completely agree. I forgot to mention that Active Warehousing does indeed feed back to transactional systems, and in doing so CHANGES the definition of what a warehouse is... my next entry will discuss the nature of "just what does Data WAREHOUSE mean to you?"

Great comment! Thanks,
Dan L

I would be interested in the implementation challenges surrounding Right-Time data warehousing. For example, the typical batch approach to ETL often places a burden on the warehouse which may prevent extracts from running. How can this be managed if the refresh cycle is 10 minutes?

I agree that one must not damage the words. I have one point to highlight, though: in many cases, even though the data may be uploaded into the DWH in real time, it is used by business users only on a 'right time' basis. It's extremely important to analyze the reason for a real-time DWH, since that is costly and technically challenging.

Dan -

I read your blog with some interest. This is a very broad topic from my perspective. As I see BI and the information needed to support it, the critical pieces are not necessarily technological. The most critical functions are discovering the business questions and creating the capability for the business to answer them: not providing the answer, but providing the capabilities. Delivering the capability for the business to answer its most pressing and concerning questions, whether by ADW or RTDW or EDW or whatever means, is where it is at. Also, just getting to the real business questions can be a daunting task; it requires a patient discovery technique and becoming a trusted advisor to the business. The technologies, while important, are at best efficient in providing data unless you can find a way to deliver business value capabilities.

Your blog provided me some information that stimulated my real concerns around BI.
