Business Intelligence Network Business Intelligence Resources

Blog: Dan E. Linstedt


What does ETL do that EAI Can't?

The reason for these posts under SOA is that this is where the convergence of these technologies is headed. In other words, the SOA "fabric" relies on all of these tool sets and paradigms working together to achieve best-of-breed integration. I was asked about this specific subject on another post, so here are my thoughts. My thoughts tend to be on the edge, and hopefully they will spur some comments from knowledgeable individuals in the field. With that, let's take a look.

Those who have lived in the EAI world (especially if they're from a vendor) will try to tell you that their EAI tool is the be-all and end-all of integration. This is flat-out wrong. If it were true, we wouldn't have seen the rise of ETL/ELT, and now the rise of EII engines and web services.

EAI DOESN'T do everything. One of the arguments they give us is: "The whole world should be real-time, so there's no need for batch at all..." That's flat-out untrue. The whole world can't be real-time, and nothing is truly real-time anyhow, since every transfer carries some latency (this is a fallacy all by itself). The CORRECT notion is RIGHT-TIME, not real-time (a blog for another day).

OK, so if we take their marketing statements, which are geared to sell their product, and apply them to business, here's what usually falls out:

EAI:
* Push solution: when a transaction is ready, it is pushed across to the required receiver.
* Geared STRICTLY toward APPLICATION INTEGRATION; it is not geared toward legacy systems or the history of the corporation.
* Great for business rule validation and manual processes on CURRENT DATA, not so good at handling historical information, which is what feeds our warehouse to begin with.
* USUALLY HIGHLY CODE-DRIVEN: proprietary coding/scripting is required inside a proprietary engine, increasing the build-out time for the overall solution.
* MAY NOT have a connector for the source system your business has in place, but the vendor will probably "sell you one" anyway and write it while the deal is being inked.
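To make the push model concrete, here is a minimal Python sketch assuming a toy in-process hub. `PushHub`, `Transaction`, and the topic names are hypothetical and do not correspond to any EAI vendor's API; a real product adds queuing, guaranteed delivery, and adapters. The only point shown is that the source decides when data moves, one transaction at a time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Transaction:
    txn_id: int
    account: str
    amount: float


# A receiver is any callable that accepts one transaction.
Receiver = Callable[[Transaction], None]


class PushHub:
    """Hypothetical EAI-style hub: each registered receiver is called once
    per transaction, at the moment the transaction is published."""

    def __init__(self) -> None:
        self._receivers: Dict[str, List[Receiver]] = {}

    def subscribe(self, topic: str, receiver: Receiver) -> None:
        self._receivers.setdefault(topic, []).append(receiver)

    def publish(self, topic: str, txn: Transaction) -> int:
        # Push: the sender drives delivery; receivers only react.
        delivered = 0
        for receiver in self._receivers.get(topic, []):
            receiver(txn)
            delivered += 1
        return delivered


if __name__ == "__main__":
    billing_inbox: List[Transaction] = []
    hub = PushHub()
    hub.subscribe("orders", billing_inbox.append)
    hub.publish("orders", Transaction(1, "ACME", 250.0))
    hub.publish("orders", Transaction(2, "Globex", 75.5))
    print(len(billing_inbox))  # prints 2
```

Note what is missing: there is no way for the receiver to ask for last year's transactions. It only sees what is pushed after it subscribes.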

Again, EAI is GREAT for application integration; it is GREAT for "transactional capture and movement". Now let's take a look at why not everyone needs this.

Right-Time Data Requirements:
1. To get to "right-time" requires money, time, and serious enterprise architecture initiatives.
2. Because of the amount of investment (usually double or triple the original "batch" investment for loading data warehouses), and because you are on a cost and difficulty curve that rises exponentially (the more you reduce the latency, the more the cost and difficulty rise), the business is REQUIRED to have a specific TACTICAL question that it has no choice but to answer in order to remain competitive. That question is what justifies the cost and time required to build the right-time system in the first place.
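A toy model can make the shape of that curve visible. The sketch below is illustrative only: the latency tiers and the assumption that cost doubles at each step down in latency are hypothetical numbers, not measurements. It also expresses the right-time idea in code: pick the slowest, cheapest tier that still answers the business question, rather than the fastest one available.

```python
from typing import List, Tuple

# Hypothetical latency tiers (label, maximum latency in minutes), ordered
# from the original nightly batch down to near real-time.
TIERS: List[Tuple[str, float]] = [
    ("nightly batch", 24 * 60),
    ("hourly batch", 60),
    ("15-minute micro-batch", 15),
    ("1-minute trickle feed", 1),
    ("near real-time", 0.1),
]

# Assumption for illustration only: each faster tier costs twice the previous one.
GROWTH_PER_TIER = 2.0


def relative_cost(tier_index: int, growth: float = GROWTH_PER_TIER) -> float:
    """Cost relative to the nightly batch (tier 0); grows exponentially."""
    return growth ** tier_index


def right_time_tier(required_latency_minutes: float) -> Tuple[str, float]:
    """Return the cheapest tier whose latency still meets the business need."""
    for index, (label, max_latency) in enumerate(TIERS):
        if max_latency <= required_latency_minutes:
            return label, relative_cost(index)
    raise ValueError("no tier is fast enough; re-examine the business question")
```

Under these made-up numbers, a business that can live with 30-minute-old data lands on the 15-minute tier at 4x the batch cost, while demanding near real-time costs 16x.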

There are other requirements (which I will blog about in the near future). BUT for now, the business needs to understand what it's getting into. Business users often ask for the world and a silver spoon, but when it comes down to brass tacks, can they pay for it and justify the cost with business value?

Again, EAI is a PUSH technology (see my post about Push vs. Pull).

EAI - WON'T HANDLE BIG BATCHES OF HISTORY.
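Some rough arithmetic shows why. If a per-transaction channel sustains some fixed rate, the time to replay history is simply rows divided by rate. The sketch below uses hypothetical figures (500 million historical rows, 1,000 transactions per second through EAI, 20,000 rows per second per bulk stream with 8 streams); plug in your own.

```python
SECONDS_PER_DAY = 86_400


def backfill_days(rows: int, rows_per_second: float, parallel_streams: int = 1) -> float:
    """Days needed to move `rows` when each stream sustains `rows_per_second`.

    A per-transaction channel is modelled as a single stream; a bulk loader
    can run several streams side by side.
    """
    if rows_per_second <= 0 or parallel_streams < 1:
        raise ValueError("rate must be positive and at least one stream is required")
    seconds = rows / (rows_per_second * parallel_streams)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    history_rows = 500_000_000  # hypothetical size of the history to load
    print(f"{backfill_days(history_rows, 1_000):.2f} days via one transactional stream")
    print(f"{backfill_days(history_rows, 20_000, 8) * 24 * 60:.0f} minutes via 8 bulk streams")
```

With those assumed rates the transactional route takes about 5.79 days, while the parallel bulk load finishes in roughly 52 minutes.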

ETL or ELT or ELTL:
* Can hook to your EAI system as just another feed if desired.
* Can hook to your source systems directly, bypassing the EAI and the need to "code" everything just to get the transaction from point A to point B in the first place.
* Can load massive batches of data in parallel, whereas EAI systems are limited by the network and by source-system speed to X transactions per second.
* Can be scheduled (for example, into maintenance windows) so that it does not impact the source systems.
* Is a VISUAL rather than a CODE-driven solution.
* Has consistent metadata; some EAI vendors provide it, but others offer no metadata visibility into the code executing inside the EAI engine, so the transformations cannot be traced.
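The pull side can be sketched in a few lines of Python, with an in-memory SQLite table standing in for a source system. The table and column names, the `updated_at` watermark column, and the transformation rules are all hypothetical. ETL tools typically let you build this visually rather than in code, so treat this only as an illustration of the mechanics: the warehouse pulls a whole batch on its own schedule, then processes partitions in parallel. (Threads stand in for an ETL engine's parallel sessions; in CPython they will not speed up CPU-bound transforms.)

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[int, str, float, str]  # (id, customer, amount, updated_at)


def extract_since(conn: sqlite3.Connection, watermark: str) -> List[Row]:
    """Pull: ask the source, in one set-based query, for everything changed
    since the last load's watermark."""
    cur = conn.execute(
        "SELECT id, customer, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    )
    return cur.fetchall()


def transform(partition: List[Row]) -> List[Tuple[int, str, float]]:
    # Hypothetical rules: trim and upper-case customer names, round amounts.
    return [(r[0], r[1].strip().upper(), round(r[2], 2)) for r in partition]


def load_batch(rows: List[Row], workers: int = 4) -> List[Tuple[int, str, float]]:
    """Split the extracted batch into partitions and transform them in parallel."""
    partitions = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(transform, partitions)
    return sorted(row for part in results for row in part)
```

Because the extract is driven by the warehouse and keyed on a watermark, the same code loads one day of changes or ten years of history; only the watermark changes.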

Most of all, ETL is often lower-cost with a quicker implementation time frame, and it comes down to using the right tool for the right job. There are hundreds of articles out there that discuss why ETL, why EAI, and what the purpose of each is; I'm just trying to give a glimpse into each.

Recap:
Again, the largest point is: right-time costs money, architecture, and time. The faster the system, the more (exponentially) it will cost to ensure transfer and delivery. ETL (in a lot of ways) handles what EAI SHOULD NOT even attempt to address, such as loading history or connecting to legacy systems. ETL (as I stated) provides a GUI development environment, whereas EAI is heavily code-driven, and EAI is not built for "data warehousing feeds" except in justified right-time data cases.

We typically use both tool sets in conjunction with each other, best of breed, to answer the questions the business has. Generally, though, this is only the case if the business has already justified the need for EAI; otherwise, ETL/ELT is the tool of choice for warehousing integration.
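That rule of thumb can be written down as a small decision function. This is a hypothetical sketch, not a formal method: the `Feed` fields and the one-day batch threshold are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: a feed that can wait this long is a batch feed.
BATCH_WINDOW_MINUTES = 24 * 60


@dataclass
class Feed:
    name: str
    required_latency_minutes: float
    right_time_justified: bool  # has the business justified the EAI cost?
    is_history_load: bool = False


def choose_tool(feed: Feed) -> str:
    """Return "EAI" only for justified, low-latency, non-history feeds."""
    if feed.is_history_load:
        return "ETL/ELT"  # EAI won't handle big batches of history
    if feed.right_time_justified and feed.required_latency_minutes < BATCH_WINDOW_MINUTES:
        return "EAI"
    return "ETL/ELT"
```

Even when a feed goes through EAI, ETL can still pick it up from the EAI system as just another source on its way into the warehouse.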

Posted by Dan Linstedt on November 2, 2005 5:03 AM

Comments

You're absolutely right in your assessment of EAI vs. ETL in the broadest sense. Most people don't understand their true needs and blindly apply a buzzword. But EAI is changing. The movement toward customer data integration, for example, is causing a lot of headaches for people who just want to "get the job done" with ETL: many in the organization will not put up with ANY latency in a single customer view for customer-facing applications, regardless of cost! I've particularly noticed this among the mainframe OLTP types, who are used to having a single database for everything and are not used to the latency that comes with distribution and replication. As I've discovered, this bias can be very frustrating when trying to pragmatically apply a hybrid ETL + EAI solution.

Having said that, there are approaches making the "real time single view" more of a reality.
Enterprise Service Buses (ESBs) from BEA, Microsoft, or IBM are making message-based integration much more standards-based and graphically oriented. And the latest "changed data capture" adapters from vendors like NEON, IBM, or Informatica are non-intrusive, taking advantage of logger exits to grab data from a legacy environment and propagate it in a standard format to a staging area.
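For readers unfamiliar with changed data capture, here is a minimal sketch of the apply side, assuming the capture adapter has already turned log entries into a simple standard record format. The `ChangeRecord` fields (`lsn`, `op`, `key`, `row`) are hypothetical and do not correspond to any vendor's product. The sketch shows two ideas: changes are applied in log order, and a high-water mark keeps a replayed batch from being applied twice.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    lsn: int             # position in the source log (hypothetical field)
    op: str              # "I" insert, "U" update, "D" delete
    key: int             # primary key of the changed row
    row: Optional[dict]  # new row image; None for deletes


def apply_changes(staging: Dict[int, dict], changes: Iterable[ChangeRecord],
                  last_applied_lsn: int) -> int:
    """Apply captured changes to a keyed staging area in log order.

    Records at or below last_applied_lsn are skipped, so replaying the same
    batch is harmless. Returns the new high-water LSN.
    """
    for change in sorted(changes, key=lambda c: c.lsn):
        if change.lsn <= last_applied_lsn:
            continue
        if change.op in ("I", "U"):
            staging[change.key] = dict(change.row or {})
        elif change.op == "D":
            staging.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation {change.op!r}")
        last_applied_lsn = change.lsn
    return last_applied_lsn
```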

Obviously, if your goal is to analyze historical data, an ETL solution supplemented by EAI/ESB for key subsets of data that require less latency is best. On the other hand, the desire for Business Activity Monitoring (BAM) and Complex Event Processing (CEP) is driving integration solutions to become better at what they do.

The problem is that most of these integration vendors do not understand the lessons learned from years of data warehousing. It's unfortunate that there has to be such an intellectual schism between the integration/CEP/BAM market and the BI/ETL/DW market, but it strikes me (having worked in both industries) that there is a general disdain for databases coming from the "developer" community and a general disdain for applications coming from the "warehousing" community.
