I've just blogged on ETL and ELT - the basics of some of the differences. While it starts with volumes of data within the system, it doesn't end there. While the power of SQL in the DBMS hardware can really show it's muscle, it's the metadata at the end of the day that makes all the difference in the maintenance world down the road.
Knowing SQL is critical, understanding each RDBMS specific set of Functionality is also critical, but more than that - building the solution within a tool that can capture the metadata is vital to the long-term success of handling huge volumes.
Why is metadata so important? Because coding in SQL for ELT can quickly get out of hand, especially on large scale projects where hundreds of jobs are necessary to perform the proper transformation. Also, to help in speeding time to delivery, as well as taking advantage of best practices and lessons learned in the industry. If we aren't allowed to leverage our knowledge in such a manner that is fitting to the metadata, then why use a tool at all?
ETLT (as I've described in articles 3 years ago) is a requirement - the ability to do both ELT and ETL depending on the needs of the design. However, designing the transformation or data flow in a single stream, then pushing that into SQL within the database, or even within the RDBMS engine, as SSIS (from Microsoft, SQLServer2005 does) is a necessary task for future use. See my articles on Teradata Magazine: A Symphony of Two, Additional thoughts #1, Additional Thoughts Part 2,
Tomorrows Real-Time ETL
In order for ELT to work, the power and functionality must be there within the RDBMS. As hardware becomes stronger, and is bundled with RDBMS software to become an "appliance", it becomes critical that these vendors begin to cross-implement functionality. We (as an enterprise) will have multiple levels of appliances from different vendors across the playing field in order to support different requirements. The "T" or transformations that are available should follow some standard like ANSI-SQL in their functions.
Today, each vendor offers individual functions specific to their own database environments which makes it nearly impossible to construct ELT in all environments from a single metadata source. The vendor that begins to implement cross-functionality will have a jump on the other vendors, because their "appliance" will fit in with the rest of the enterprise and can be better leveraged by existing Data Flow Diagramming tools (ETL/ELT) that exist.
Remember, metadata is king - proper use of metadata (IN ANY TOOL) allows me to build systems 3 times faster than hand-coding.
Lastly, don't be fooled by RDBMS vendors, some would have you believe they are powerful enough to handle Transformation (ELT) within the database when joining 100M rows to 100M rows to 100M rows, unless the vendor can prove it, AND they can do it while returning a number of queries, and running a batch load sequence - then they are not worth their weight. Today, all these things are happening within the RDBMS while transformation is taking place. The larger the data set, the more active the systems.
CTO, Myers-Holum, Inc
Posted October 17, 2006 4:03 AM
Permalink | 2 Comments |