

Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, unstructured data, DW, and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I serve on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments on my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

May 2005 Archives

Now that I've blogged on the need for an ETL-T engine, I think it only fair to discuss what EL-T still leaves to be desired, and what is required to make EL-T perform. While ETL-T is the industry direction, EL-T has a ways to go before it can "take over." Of course, the notion of ELT "success" is highly dependent on the RDBMS engine into which it puts its data.

Let's explore these notions a little deeper...

EL-T (as I blogged recently) is where the integration industry is headed. Some of the comments I received were about specific tool sets in the integration space. In another blog this week, I'll explore what these tools will need in order to survive the next couple of years.

Let's start with the advantages of ETL over ELT:
1. ETL can off-load the transformation work. This is particularly helpful if you have a powerful ETL machine, or hardware and network pipes fast enough to perform parallel transformations.
2. ETL on 64-bit hardware has nearly unlimited reach into physical memory, allowing most of the transformations to be performed optimally in-stream, in memory.
3. ETL transformations done in-stream only have to pass over the data once, if architected properly (see the sketch after this list).
4. ETL can offer cross-RDBMS, best-of-breed features that the databases themselves sometimes lack (although the RDBMS engines are now catching up).
5. ETL can circumvent a poorly tuned RDBMS server, an overloaded RDBMS server, or an overloaded disk I/O channel that the RDBMS is using.
6. ETL has a tremendous leverage point for metadata - consistent and re-usable metadata, thanks to the in-stream processing. This makes it easier to track dependencies and data changes.
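To make point 3 concrete, here is a minimal Python sketch of the single-pass, in-stream idea: each record flows extract -> transform -> load exactly once and lives only in memory, never re-staged to disk. The column names and the read_source/write_target callables are hypothetical placeholders for illustration, not any vendor's actual API.

    # Single-pass, in-stream transformation sketch (hypothetical names).
    def extract(source_rows):
        # Yield one source record at a time (streaming, never staged).
        for row in source_rows:
            yield row

    def transform(rows):
        # Apply all business rules in one pass over the stream.
        for row in rows:
            row = dict(row)
            row["full_name"] = f'{row["first_name"]} {row["last_name"]}'.strip()
            row["country"] = (row.get("country") or "US").upper()
            yield row

    def load(rows, writer, batch_size=1000):
        # Buffer a small batch in memory, then hand it downstream.
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:
                writer(batch)
                batch.clear()
        if batch:
            writer(batch)

    # Usage (read_source and write_target are hypothetical):
    # load(transform(extract(read_source())), write_target)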

Now, before we knock ETL, let's just say there are still some big benefits to being able to perform the "T" in-stream, even though the ETL paradigm is indeed "dead" - or rather, morphing into something else.


What ETL traditionally has trouble with is:
1. Near-real-time processing. NRT features are typically "bolt-ons" to a batch engine architecture; unfortunately, the engines simply are not equipped for high-performance NRT processing (see the sketch after this list).
2. XML technology. This, too, is typically a "bolt-on" rather than an engineered core view, mainly because batch processing at high speeds requires highly structured data, and XML just isn't so. Even though XML has "structure," the parent-child, one-to-many, and many-to-many relationships are data-driven, along with optional structural components. Typical ETL engines fall down on high-performance XML and/or ease of use with XML.
3. Transaction processing, backup, distribution, logging, and all things transactional. Transaction processing isn't just near-real-time processing; it includes the business rules too. Unfortunately, ETL has a HUGE hole in the business-rules arena. ETL engines simply do not operate as "business process workflow engines"; today they operate more as "IT data integration process workflow engines." Transaction processing sometimes includes emails, escalation paths, time-outs, delays, queues, and manual interaction by business users. The ETL architecture simply isn't flexible enough to handle these needs.
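To illustrate point 1, here is a rough sketch (not any vendor's actual architecture) of the typical "bolt-on" NRT pattern: the batch engine is simply re-run on a short timer against a high-water mark. The poll_new_rows and run_batch callables are hypothetical stand-ins; the point is that latency can never drop below the polling interval plus the batch start-up cost.

    # "Bolt-on" near-real-time via micro-batch polling (hypothetical callables).
    import time

    def micro_batch_loop(poll_new_rows, run_batch, interval_seconds=60):
        high_water_mark = None
        while True:
            # Ask the source for anything newer than the last high-water mark.
            rows, high_water_mark = poll_new_rows(since=high_water_mark)
            if rows:
                run_batch(rows)           # full batch start-up cost, every cycle
            time.sleep(interval_seconds)  # latency floor: the polling interval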

OK, now let's talk about ELT and what its pros and cons are.
Pros:
1. ELT offers tremendous flexibility: as long as the RDBMS engine can be extended, the ELT engine can live on. When the RDBMS engine receives upgrades, performance tuning, or additional hardware, the ELT engine takes advantage of it right away (see the sketch after this list).
2. ELT engines can offer extreme performance for "copy-drop" data movement; they can parallelize the heck out of the threads that move data from point A to point B. There isn't much magic to think about there, except perhaps fault-tolerance and recovery.
3. ELT engines don't need to move data out of and back into a single RDBMS; they can work with the data inside the target RDBMS, within the RAM and CPU that the database engine offers.
4. ELT technology can be trigger-driven and real-time (near-real-time), and it is more apt to recognize, or be based on, transactional processing.
5. ELT technology is truly the "next generation" of data integration tools, although of course EII and EAI are vying for this space as well.
6. ELT doesn't require a middle tier (extra servers and so on) for deployment of jobs.
7. ELT engines typically have no trouble dealing with XML, mostly because the RDBMS engines now have that handling built in.
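To show what "let the engine do the T" means in practice, here is a minimal sketch of the ELT pattern: load the raw rows untransformed, then issue one set-based SQL statement that runs entirely inside the database, so any tuning or hardware the engine gains is inherited for free. Python's sqlite3 stands in for a real warehouse RDBMS (Oracle, Teradata, etc.), and the staging/target table names are made up for illustration.

    # ELT sketch: E and L are dumb copies; T is pushed down as SQL.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stage_customer (first_name TEXT, last_name TEXT, country TEXT);
        CREATE TABLE dim_customer   (full_name TEXT, country TEXT);
    """)

    # E and L: bulk-copy raw rows into staging, untransformed.
    conn.executemany("INSERT INTO stage_customer VALUES (?, ?, ?)",
                     [("Ada", "Lovelace", "uk"), ("Alan", "Turing", None)])

    # T: one set-based statement executed entirely inside the engine.
    conn.execute("""
        INSERT INTO dim_customer (full_name, country)
        SELECT TRIM(first_name || ' ' || last_name),
               UPPER(COALESCE(country, 'US'))
        FROM stage_customer
    """)
    print(conn.execute("SELECT * FROM dim_customer").fetchall())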

Some of the cons:
1. ELT relies heavily on the performance and tuning of the RDBMS instance. If the instance is slow, ELT has nowhere to go! It will run only as fast as the RDBMS server allows.
2. ELT with huge batches of data can eat tremendous resources on an RDBMS server. If you're running extremely large data sets, you had better have a super-duty RDBMS engine - water-cooled, twin-engine, air-intake with overhead camshaft. In other words, your DBAs have to be the cream of the crop, real whizzes at making your RDBMS hum.
3. Some ELT engines don't allow control over the "array batch size" within the RDBMS; this can easily blow out log segments, redo logs, or temp space (see the sketch after this list).
4. Some ETL vendors will tell you that their engine is an ELT engine; that claim holds only if the engine generates optimized, native RDBMS SQL code with advanced functionality.
5. ELT MUST stage the data in order to run deltas. If a vendor claims in-memory delta processing, then it is an ETL engine, not an ELT engine - unless, again, it generates native RDBMS SQL code, in which case it might be an ETL-T engine (the new breed).
6. ELT software today usually doesn't have all the connectivity options that ETL has (but that will change soon).
7. ELT engines frequently stage to flat files for bulk-loader processes. If your ELT engine loads through an OS pipe, be careful! OS pipe sizes can be limited and can become a bottleneck in the flow. In other words, loading through a pipe directly into an RDBMS bulk-load facility can be slower than staging to a flat file and blasting the bulk load with buffering mechanisms.
8. ELT engines REQUIRE extra RDBMS space to transform data, particularly when dealing with VLDB (very large databases). Why? Because READS must be processed in batch form so they don't conflict with WRITES, especially if the machine or the RDBMS cannot show a linear performance increase as the hardware grows.
9. ELT vendors (most of them) need to show their integration with business rules (this is where the EII vendors have really thrived lately - that, and metadata).
10. If you can't tune your SQL, then you're better off with an ETL engine (today). ELT requires technicians with a high proficiency in SQL tuning and RDBMS tuning.
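On con 3, when the engine won't control the array batch size for you, you can often impose it yourself. Here is a hedged sketch using the generic Python DB-API: insert and commit in fixed-size chunks so no single transaction holds an unbounded amount of redo/undo or temp space. The connection object, table name, and "?" placeholder style are assumptions; placeholder syntax varies by driver.

    # Chunked load sketch: bound the work any one transaction can hold.
    def chunked_load(conn, rows, chunk_size=10_000):
        cur = conn.cursor()
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) >= chunk_size:
                # "target_table" and the "?" paramstyle are hypothetical.
                cur.executemany("INSERT INTO target_table VALUES (?, ?, ?)", batch)
                conn.commit()  # release redo/undo held by this chunk
                batch.clear()
        if batch:
            cur.executemany("INSERT INTO target_table VALUES (?, ?, ?)", batch)
            conn.commit()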

OK, that said: at the end of the day, I would still like the option of an ETL-T engine at a lower cost, one flexible enough to deal with the situations that arise. More to come.

Cheers for now,
Dan L


Posted May 2, 2005 6:29 PM

