Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best, Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author >

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor of The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI / CMMi Level 5, and is the inventor of The Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, trained hundreds of industry professionals, and is the voice of Bill Inmon's blog on http://www.b-eye-network.com/blogs/linstedt/.

Pardon my ignorance; I'm still learning EII too - where it fits, how it's growing up, and what customers really can do with it in the long run. Lately I've been asked to describe EII vs EAI vs ETL in layman’s terms. I'll attempt to do that in this (short) entry. Again, if I misunderstand, please correct me for the benefit of the community, and I'll go seek additional definitions for better content next time.

By the way, I do enjoy the comments I've been getting from Tim Mathews (and others). He has a blog on Ipedo's web site: http://blogs.ipedo.com/integration_insider/

I work hard at translating terms into layman’s terms, and sometimes I don't quite get it right the first time ;-) Please correct me if I misstate some truths in translation. Just don't send the Babel Fish after me!

For each technology, I'll try my best to give a basic definition, a differentiator, and an oversimplified example of where it might fit. Hopefully this will clear the air a bit more...

EII - Enterprise Information Integration, crudely defined as a middle-tier query server, but it's much more than that. It contains a metadata layer with consolidated business definitions. It also (usually) has the ability to communicate through web services, database connections, or XQuery/XPath (XML translation). In fact, it relies heavily on the metadata layer to define "how and where" to get its data.

It's a PULL engine that waits for a request, splits the query (if it has to) across multiple heterogeneous source systems, gathers (mostly) transactional data sets, merges them together (again relying on the metadata layer for integration rules), then pushes them out to the requestor, which could be a web service, a BI query tool, Excel, or some other front end (like EAI or message queuing systems).

With EII it may be safe to say:
The more definition a business can provide for the metadata layer, the better the ROI the business will see, and the higher the utilization of the tool.

EII usually sits seamlessly between the requestor and the multiple scattered data sets. One final note: its job is NOT (as of today) to move massive batches of information on a scheduled basis from point A to point B through heavy translation layers.

An oversimplified example might be: A voter walks into a voting area, and the registrar needs to check his background, current address, phone numbers, driver’s license records, and any recent activity involving the law. Each system has its own interface, the systems are completely disparate and don't talk to one another, and the registrar only has a driver’s license number (maybe a current address) to look him up with. The registrar needs a response in a matter of seconds: Can this guy vote here now? EII is a perfect fit for getting this kind of job done, although the registrar uses a web interface and never "sees" the EII tool doing the work.
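For readers who think in code, here is a rough Python sketch of that pull pattern applied to the registrar example. Everything in it is an assumption made for illustration: the three in-memory "systems", the `METADATA` mapping, and the `federated_lookup` function are hypothetical stand-ins, not the API of any EII product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three disparate source systems,
# each keyed by driver's license number.
DMV = {"D123": {"name": "Pat Doe", "address": "12 Elm St"}}
PHONE = {"D123": {"phone": "555-0100"}}
COURTS = {"D123": {"open_cases": 0}}

SOURCES = {
    "dmv": lambda key: DMV.get(key, {}),
    "phone": lambda key: PHONE.get(key, {}),
    "courts": lambda key: COURTS.get(key, {}),
}

# The "metadata layer": each business term maps to (source, source field).
METADATA = {
    "voter_name": ("dmv", "name"),
    "current_address": ("dmv", "address"),
    "phone_number": ("phone", "phone"),
    "open_cases": ("courts", "open_cases"),
}


def federated_lookup(license_no, fields):
    """Answer one on-demand request by splitting it across the sources."""
    # Split: work out which sources this particular request touches.
    needed = {METADATA[field][0] for field in fields}
    # Gather: query only those sources, in parallel, at request time.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(SOURCES[name], license_no) for name in needed}
        rows = {name: future.result() for name, future in futures.items()}
    # Merge: use the metadata mapping to build one consolidated answer.
    return {
        field: rows[METADATA[field][0]].get(METADATA[field][1])
        for field in fields
    }


result = federated_lookup("D123", ["voter_name", "current_address", "open_cases"])
print(result)
```

Nothing is copied or stored ahead of time in this sketch; the sources are only touched when the request arrives, which is the "pull" part. Note that the phone system is never queried here because the request did not ask for a phone number.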

EAI - Enterprise Application Integration. This one's been around for a while. In layman’s terms: EAI connects your Siebel to your PeopleSoft, and your Oracle Financials to your SAP systems, and vice versa. Most EAI systems are PUSH driven: a transaction happens in your enterprise app, and an EAI listener "sees" it and pushes it out over the bus, or to a centralized queue for distribution to other applications. Most EAI engines are "workflow" and "process flow" driven rather than on-demand.

A simple example is: PeopleSoft is connected to Oracle Financials, and a salesperson enters a new customer order. The EAI application picks up the new customer / new order and sends it to Oracle Financials to be recorded. EAI is also transaction oriented. EAI's major flaw? It doesn't talk to "non-applications" like legacy systems, data warehouses, Excel spreadsheets, stock tickers, unstructured data, email, and so on (although some vendors have built custom "readers" for this information).
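A minimal sketch of the push pattern in the order example might look like the following. The `MessageBus` class, the `order.created` topic name, and both application stand-ins are hypothetical; real EAI products have their own buses, adapters, and queues.

```python
from collections import defaultdict


class MessageBus:
    """A toy in-process bus: applications subscribe to topics, events get pushed."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Push the event to every application listening on this topic.
        for handler in self._subscribers[topic]:
            handler(message)


# Stand-in for the financials application's ledger.
financials_ledger = []


def record_in_financials(order):
    financials_ledger.append(
        {"customer": order["customer"], "amount": order["amount"]}
    )


bus = MessageBus()
bus.subscribe("order.created", record_in_financials)


def enter_order(customer, amount):
    """Stand-in for the order-entry application."""
    order = {"customer": customer, "amount": amount}
    # The transaction itself triggers the push; no one requested this data.
    bus.publish("order.created", order)
    return order


enter_order("Acme Corp", 1200.0)
print(financials_ledger)
```

The contrast with the EII sketch is the trigger: here the receiving application never asks for anything. The event in the source application drives the flow, one transaction at a time.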

ETL - Extract, Transform and Load, sometimes known as ELT (extract, load, THEN transform). This also is an older paradigm (although somewhat newer than EAI from an acronym standpoint). ETL/ELT offer PUSH technology. They are usually geared towards huge volumes and highly parallel, repetitive tasks that are scheduled and continuous. These are a kind of heartbeat of many integration systems around the world today - they feed massive amounts of data from point A to point B in a timely fashion. They are responsible for performing that task on a consistent and repeatable basis. They handle massive transformations (sometimes in the database, sometimes in stream).

Most ETL/ELT engines today also run on metadata, but a different kind of metadata (compared to EII). I like to call the metadata they utilize PROCESS METADATA. It contains back-office workflow information. The end results of the data integration are often seen through data marts or by querying the database directly. Although rare, ETL/ELT can also be used as a device to synchronize systems around the organization on an hourly or nightly basis.

ELT/ETL engines often do NOT respond well to transaction-based requests, which is why ETL/ELT vendors are struggling with real-time integration today. An example of ELT/ETL would be: Integrate all customer data from 4 or 5 of my source systems overnight - produce a customer management table with all my customers in it. While you're at it, get me an ice cream with a cherry on top and a root beer... Just kidding.
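The overnight customer job above could be sketched as a tiny batch pipeline like this one. The two source row sets, the matching rule (customers are matched on lower-cased email), and the `customer` table layout are all assumptions made for illustration; a real job would read from the actual source systems and run under a scheduler.

```python
import sqlite3

# Hypothetical extracts from two source systems with different layouts.
CRM_ROWS = [{"id": "C1", "name": " alice smith ", "email": "ALICE@EXAMPLE.COM"}]
BILLING_ROWS = [
    {"cust_email": "alice@example.com", "full_name": "Alice Smith"},
    {"cust_email": "bob@example.com", "full_name": "Bob Jones"},
]


def extract():
    """Pull every customer record out of each source in one batch."""
    for row in CRM_ROWS:
        yield row["name"], row["email"]
    for row in BILLING_ROWS:
        yield row["full_name"], row["cust_email"]


def transform(records):
    """Standardize names and emails, then de-duplicate on email."""
    customers = {}
    for name, email in records:
        key = email.strip().lower()
        # Keep the first name seen for each email address.
        customers.setdefault(key, " ".join(name.split()).title())
    return sorted(customers.items())


def load(conn, customers):
    """Write the consolidated customer table to the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (email TEXT PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer (email, name) VALUES (?, ?)", customers
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
rows = conn.execute("SELECT email, name FROM customer ORDER BY email").fetchall()
print(rows)
```

Unlike the EII and EAI sketches, this one handles the whole data set in a single pass and lands the result in a persistent table, which is why it suits a schedule better than an individual request.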

Well, this brings this entry to a close. I hope you enjoyed "my version of the truth." Feel free to correct me, and I'll do more homework next time. Same B-Eye Time, Same B-Eye Channel, tune in next time for: For Whom the EII bell tolls??

Cheers,
Dan L


Posted September 20, 2005 7:30 PM

6 Comments

Hi Dan,

I'd like to add another viewpoint on the differences between EII, EAI and ETL. People often seem to think that the three compete, but in reality they really are solving different problems. By the way, I think your descriptions of the three are correct.

EII is typically used to collect related information from disparate systems. In some ways, it can be thought of as a souped-up join engine that happens to handle non-relational data as well as relational.
EAI is really a glue layer between applications that should talk to each other, but don't.
ETL is quite different. Its most common use is to populate a data mart or warehouse for use by analytical applications. This involves converting data from a system optimized for transactional systems to one designed to support dimensional analysis and ad-hoc querying. Another common use is to collect several data sources into a single data store that can be archived or used for auditing purposes. Unlike the other two systems, ETL isn't really intended to work with real-time information and is used to create systems where real-time is inappropriate.

Finally, another common front end that is used with EII systems is good old-fashioned reporting, which is near and dear to my heart.

-Barry Klawans
JasperSoft
http://www.jaspersoft.com

Hi Folks:
With respect to this discussion on EII, ETL and EAI terminology, it is important to understand that processes in the context of these solutions can be executed in real-time. EII can fire off a distributed query and the results from the query can be delivered to the "glass" in real-time. No different than sending a single query to an RDBMS.

ETL solutions are now equipped with the ability to fire off the ETL process in real-time or call the ETL transformation engine in real-time. These processes can be exposed as a web service, an enterprise java bean or an enterprise message bean. In addition, the ETL process can be fired off through an event trigger.

EAI has always had real-time support and can also be fired off by invoking the process via a web service call. In addition, they have supported event triggers for quite some time.

Also, let's not describe these solutions as "old" or "new". They have evolved for years based on user requirements and are aligned with modern software solutions.

-Bob Zurek
IBM

Hi,
What are the pros and cons of push and pull technologies using ETL?


Thiru

I am currently having some difficulty answering the questions being asked of me. We have a TIBCO EAI project in progress, and I am working on defining what the data warehouse infrastructure should look like. The issue that keeps coming up is: what does ETL do that EAI can't? According to the TIBCO folks, TIBCO can do everything; we just haven't got around to completing certain projects. My stance is that we will need an ETL tool to build the warehouse. Eventually, once we get all of the infrastructure in place, if we want to tie the ETL processes to the EAI processes, we can plug into the TIBCO bus and get real-time messages to load into the data warehouse. In the interim, we will need to get an ETL tool and start working on defining the extracts according to business rules. The first thing that comes to mind is that ETL is used for batch pulls, but the TIBCO perspective is that there is no batch; everything is done in real time. How do I convince an EAI world that we WILL need the ETL infrastructure, and that it does NOT compete with TIBCO because the two can work together?

Hello to Doug Needcham,

I am trying to get in touch with someone who is using TIBCO EAI. There are some issues I need input on. Is it possible to contact me at sujay_nair@jasubhai.com?

Greatly appreciate this...

