Blog: Dan E. Linstedt

Push-Pull Pros and Cons

I've been asked about the pros and cons of ETL push-pull, so I thought I'd generalize the issue a little more into the pros and cons of push-pull technology in general. I'm including EII and EAI in this posting. It's not that push or pull is necessarily bad by itself; it's more about using the right notion for the right data access at the right time.
Ok - down to brass tacks. PUSH technology is basically the realm of EAI and message queuing. In this realm we deal with the publish/subscribe model, or broadcasting a message to anyone listening. It's really "easy" technology until you get to the engineering underneath. The real work is deciding WHICH transactions are important and WHICH are not. Then there are the decisions on how often, how fast, and how to write the drivers that "plug" in to each of the applications, or legacy apps, that service the transactions to begin with.

Ok - enough of the engineering talk, let's get back to the business aspects. Push technology is GREAT when you want to distribute transactions as they happen. Stock tickers and other types of financial institution transactions are classic cases for push technology. How about disasters and notification? Again, important. What about the different components?

Now wait just a minute - aren't ETL/ELT engines getting stronger and faster? Yes, they are. But they still aren't "architected" for real-time dynamic data integration. The world's BEST ETL/ELT engine will focus on transforming as many transactions as possible (in batch) in the shortest amount of time. That's their strength, and they should STICK to it (Stick to your ticket Harry, very important that you STICK to your ticket... - Harry Potter). We could learn a few things from this line; no really!

ETL/ELT is GREAT at PULL technology: go get the data on a scheduled timing interval, and not just some of the data but ALL the data, en masse. Bring me everything that meets criteria X, across ALL disparate systems, then integrate it all en masse (batch style), and do it as fast as possible so that I can replicate the system with new and transformed information.
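To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `Broker` class, the `batch_pull` function, the topic name, and the sample rows are hypothetical and do not correspond to any real EAI or ETL product. The push side delivers each transaction to subscribers the moment it is published; the pull side scans every row in every source on demand of a schedule and lands the matching rows in one place.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Broker:
    """Toy in-process publish/subscribe broker (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler that wants every message on this topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """PUSH: hand the message to each subscriber now; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


def batch_pull(sources: Dict[str, List[dict]], min_amount: float) -> List[dict]:
    """PULL: scan ALL rows in ALL sources, keep those meeting the criteria,
    and tag each kept row with the source it came from."""
    landed: List[dict] = []
    for name, rows in sources.items():
        for row in rows:
            if row["amount"] >= min_amount:
                landed.append({**row, "source": name})
    return landed


# PUSH: the subscriber acts on the transaction as it happens.
broker = Broker()
alerts: List[dict] = []
broker.subscribe("trades", alerts.append)
delivered = broker.publish("trades", {"symbol": "ABC", "amount": 500.0})

# PULL: a scheduled job would call this once per batch window.
sources = {
    "orders_db": [{"symbol": "ABC", "amount": 500.0}, {"symbol": "XYZ", "amount": 20.0}],
    "legacy_app": [{"symbol": "DEF", "amount": 900.0}],
}
landing_area = batch_pull(sources, min_amount=100.0)
```

In this sketch the push path never stores anything itself, while the pull path returns a list that stands in for a landing area. A real scheduler, queue, or database connection is deliberately left out.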
Ok - well, ETL/ELT engines will HAVE to process near real time in the near future in order to survive. Batch will not go away any time soon, but the windows are shrinking, the data sets are growing, and the timeliness of critical data is becoming more important. ETL/ELT engines are GREAT at static rules, parallelism, partitioning, and performance, and they require huge amounts of processing power to get the job done right (with very large data sets). This is the nature of PULL.

I guess one could speculate that PULL technologies require a place to "land" the data once it's been transformed. That is not something PUSH technology needs, nor wants. PUSH technology wants to ACT on the transaction as it stands, once it reaches its destination. This is a primary difference between PUSH and PULL.

Now let's not get confused! There's such a thing as IMMEDIATE PULL, or PULL ON DEMAND. This is new, and as a paradigm it's called EII. EII in this sense offers many different things and is a complementary technology to EAI and ETL/ELT. Pull on demand isn't (usually) interested in massive history sets, nor is it interested in "doing" something with the transaction, such as applying it to another system based on business process workflow (although this could change in the near future). It is more interested in managing the metadata layers in between the business and the data set, and more interested in immediate access to, and immediate integration of, CURRENT state than it is in history.

Now hold on! Don't get me wrong - EII can be used to access warehouses just the same as it can be used to access current OLTP/ODS, staging areas, and stock tickers. It's the FOCUS of what EII does that makes PULL ON DEMAND different from PULL on a batch schedule. That same focus makes it a complementary technology to the EAI and ETL/ELT world. Using the right tool for the right job makes all the difference. EII also can transform/conform, and write-back.
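As a rough illustration of pull on demand, the Python sketch below queries the sources only when a question is asked, conforms field names through a small metadata map, and lands nothing. The `FederatedView` class, the `FIELD_MAP` table, and the source and field names are all hypothetical assumptions made for this example; they do not describe any particular EII product.

```python
from typing import Callable, Dict, List

# Hypothetical metadata layer: maps each source's field names
# to the shared business names the caller sees.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "crm": {"cust_id": "customer_id", "bal": "balance"},
    "billing": {"customerId": "customer_id", "amount_due": "balance"},
}


def conform(source: str, row: dict) -> dict:
    """Rename a source row's fields to the shared business names."""
    mapping = FIELD_MAP[source]
    return {mapping[key]: value for key, value in row.items() if key in mapping}


class FederatedView:
    """Toy pull-on-demand view: fetches CURRENT state at query time, stores nothing."""

    def __init__(self, fetchers: Dict[str, Callable[[], List[dict]]]) -> None:
        self._fetchers = fetchers

    def query(self, customer_id: int) -> List[dict]:
        """Ask every source right now and return the conformed, matching rows."""
        results: List[dict] = []
        for source, fetch in self._fetchers.items():
            for row in fetch():
                conformed = conform(source, row)
                if conformed.get("customer_id") == customer_id:
                    results.append({**conformed, "source": source})
        return results


# Stand-ins for live systems; a real fetcher would call a database or service.
crm_rows = [{"cust_id": 7, "bal": 120.0}]
billing_rows = [{"customerId": 7, "amount_due": 35.5}, {"customerId": 8, "amount_due": 10.0}]

view = FederatedView({"crm": lambda: crm_rows, "billing": lambda: billing_rows})
current = view.query(7)

# The source changes; the next query sees the new state because nothing was landed.
crm_rows[0]["bal"] = 99.0
refreshed = view.query(7)
```

The design choice worth noticing is that the view holds fetch functions rather than data, so each query reflects whatever the sources contain at that moment, with no history kept.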
Write-back is something that EAI does, but that ETL frequently is not "architected" for, mostly because the "work" that ETL does must be checked before it is re-integrated with the source systems.

Now take Active or Right-Time Data Warehousing. There's a combination of technologies being utilized to get the data into the warehouse at the right time, and there's a combination (including data mining and scoring analysis) to re-deliver the transactions back to the source systems at the right time. Of course this is neither push nor pull, but rather "closed loop processing." Ok - it uses push to get the transaction to the warehouse, and push to get it back from the warehouse to the OLTP system.

So at the bottom of this blog entry, we are still left with the question: what are the pros and cons of push and pull? Let's see if we can sum it up (forgive me, I may forget a few):

Push Cons:

Pull Pros:

Pull Cons:

PULL ON DEMAND Pros:

PULL ON DEMAND Cons:

Ok, none of these are complete lists by any stretch of the imagination (*some might say I have none :) But hopefully they give a peek into what might be some of the top differentiators across these technologies.

Thoughts? Comments? Have some pros/cons you'd like to add? Please, feel free.

Thanks,
Comments
Dan, I run a BI/DW aggregation site over at www.biblogs.com. I've reviewed your blog and I'd like to ask your permission to add your content to the site. If you don't mind, please take a look at the site and let me know if you'd like to have your content included there.
Thanks,
Scott
Posted by: Scott Mitchell | October 24, 2005 7:44 PM