Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in I.T. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best; Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor for The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

I've just blogged on ETL and ELT and the basics of some of their differences. The choice starts with the volume of data in the system, but it doesn't end there. The power of SQL running on the DBMS hardware can really show its muscle, yet at the end of the day it's the metadata that makes all the difference for maintenance down the road.

Knowing SQL is critical, and understanding each RDBMS's specific set of functionality is also critical, but more than that, building the solution within a tool that can capture the metadata is vital to long-term success in handling huge volumes.

Why is metadata so important? Because hand-coding SQL for ELT can quickly get out of hand, especially on large-scale projects where hundreds of jobs are needed to perform the proper transformations. Metadata also helps speed time to delivery and lets us take advantage of best practices and lessons learned in the industry. If we aren't able to leverage our knowledge through the metadata, then why use a tool at all?
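To make that concrete, here is a minimal sketch (in Python) of what metadata-driven ELT can look like: the mapping lives as data, and the SQL is generated from it rather than hand-written. The table names, column names, and filter are hypothetical; this only illustrates the idea, not any particular tool's implementation.

# Minimal sketch of a metadata-driven ELT mapping (hypothetical names throughout).
# The mapping is data, so it can be stored, versioned, and reported on as metadata,
# instead of living only inside hand-written SQL scripts.

mapping = {
    "source": "stg_orders",
    "target": "dw_orders",
    "columns": {
        # target column -> source expression
        "order_id":     "order_id",
        "order_date":   "CAST(order_dt AS DATE)",
        "total_amount": "COALESCE(amount, 0)",
    },
    "filter": "load_batch_id = :batch_id",
}

def generate_elt_sql(m: dict) -> str:
    """Render one INSERT ... SELECT pushdown statement from the mapping metadata."""
    target_cols = ", ".join(m["columns"].keys())
    source_exprs = ", ".join(m["columns"].values())
    sql = (
        f"INSERT INTO {m['target']} ({target_cols})\n"
        f"SELECT {source_exprs}\n"
        f"FROM {m['source']}"
    )
    if m.get("filter"):
        sql += f"\nWHERE {m['filter']}"
    return sql

if __name__ == "__main__":
    print(generate_elt_sql(mapping))

Because the mapping is data, it can be versioned, impact-analyzed, and regenerated across hundreds of jobs, which is exactly where hand-coded SQL starts to break down.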

ETLT (as I described in articles three years ago) is a requirement: the ability to do both ELT and ETL depending on the needs of the design. Designing the transformation or data flow as a single stream, then pushing it into SQL within the database, or even within the RDBMS engine (as SSIS in Microsoft SQL Server 2005 does), is a necessary capability going forward. See my articles in Teradata Magazine: A Symphony of Two, Additional Thoughts #1, Additional Thoughts Part 2, and Tomorrow's Real-Time ETL.
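As a rough illustration of the ETLT idea (not taken from SSIS or any specific product), the sketch below defines one logical transformation and then renders it either as a pushdown SQL statement (ELT) or applies it row by row in the integration engine (ETL). All table, column, and rule names are hypothetical.

# Sketch of ETLT: one logical transformation design, two execution paths.
# Each rule carries both a SQL expression (for pushdown) and a Python callable
# (for in-engine execution). Names and rules are illustrative only.

rules = [
    # (target column, SQL expression, Python callable)
    ("customer_id", "customer_id",
     lambda r: r["customer_id"]),
    ("full_name", "TRIM(first_nm) || ' ' || TRIM(last_nm)",
     lambda r: f"{r['first_nm'].strip()} {r['last_nm'].strip()}"),
    ("is_active", "CASE WHEN status = 'A' THEN 1 ELSE 0 END",
     lambda r: 1 if r["status"] == "A" else 0),
]

def as_elt(source: str, target: str) -> str:
    """ELT path: push the whole transformation into the database as one statement."""
    cols = ", ".join(t for t, _, _ in rules)
    exprs = ", ".join(e for _, e, _ in rules)
    return f"INSERT INTO {target} ({cols}) SELECT {exprs} FROM {source}"

def as_etl(rows):
    """ETL path: apply the same rules row by row in the integration engine."""
    return [{t: fn(r) for t, _, fn in rules} for r in rows]

if __name__ == "__main__":
    print(as_elt("stg_customer", "dw_customer"))
    print(as_etl([{"customer_id": 1, "first_nm": " Ann ", "last_nm": "Lee", "status": "A"}]))

The point is that the design exists once, as metadata, and the engine choice becomes a deployment decision rather than a rewrite.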

For ELT to work, the power and functionality must be there within the RDBMS. As hardware becomes stronger and is bundled with RDBMS software into an "appliance," it becomes critical that these vendors begin to cross-implement functionality. We (as an enterprise) will have multiple tiers of appliances from different vendors across the playing field to support different requirements. The "T," or the transformations available, should follow a standard such as ANSI SQL in their functions.

Today, each vendor offers functions specific to its own database environment, which makes it nearly impossible to construct ELT in all environments from a single metadata source. The vendor that begins to implement cross-functionality will have a jump on the others, because its "appliance" will fit in with the rest of the enterprise and can be better leveraged by the data flow diagramming (ETL/ELT) tools that already exist.
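A small, simplified sketch of that problem: the same logical function is spelled differently per engine, so any single metadata source needs a dialect layer before it can generate ELT everywhere. The dialect entries below are common forms chosen for illustration, not a complete or authoritative mapping.

# Sketch of why a single metadata source struggles across engines today:
# the generator needs a per-vendor dialect table for the same logical function.
# Entries are simplified, commonly seen forms, not an exhaustive reference.

DIALECTS = {
    "null_default": {
        "oracle":    "NVL({0}, {1})",
        "teradata":  "COALESCE({0}, {1})",
        "sqlserver": "ISNULL({0}, {1})",
        "ansi":      "COALESCE({0}, {1})",
    },
    "substring": {
        "oracle":    "SUBSTR({0}, {1}, {2})",
        "teradata":  "SUBSTRING({0} FROM {1} FOR {2})",
        "sqlserver": "SUBSTRING({0}, {1}, {2})",
        "ansi":      "SUBSTRING({0} FROM {1} FOR {2})",
    },
}

def render(func: str, dialect: str, *args: str) -> str:
    """Render one logical function call in the target engine's dialect."""
    return DIALECTS[func][dialect].format(*args)

if __name__ == "__main__":
    for d in ("oracle", "teradata", "sqlserver", "ansi"):
        print(d, "->", render("substring", d, "order_nbr", "1", "4"))

If every appliance honored an ANSI-SQL baseline for its transformation functions, this table would shrink toward a single row per function, and one metadata source could drive ELT across all of them.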

Remember, metadata is king: proper use of metadata (IN ANY TOOL) allows me to build systems three times faster than hand-coding.

Lastly, don't be fooled by RDBMS vendors. Some would have you believe they are powerful enough to handle transformation (ELT) within the database while joining 100M rows to 100M rows to 100M rows. Unless the vendor can prove it, and can do it while also returning a number of queries and running a batch load sequence, they are not worth their weight. In production, all of these things happen within the RDBMS while transformation is taking place, and the larger the data set, the more active the system.

Thoughts?
Dan L
CTO, Myers-Holum, Inc.


Posted October 17, 2006 4:03 AM

2 Comments

I clicked on Bill Inmon blog and got Dan E. Linstedt

That's because Bill has chosen me to blog for him.

Cheers,
Dan L
