Data Integration: Looking Beyond ETL
Originally published October 13, 2010
Data integration can be as simple or complex as an organization demands. It can move data from a source to a target or it can transform data according to established standards; it's all contingent upon enterprise requirements. Successful data integration projects will ultimately allow data to be accessed, profiled, enriched, de-duplicated and consolidated to provide a single view of customers, products or operations.
ETL: The Old School Approach
ETL (extract, transform and load) is a technology architecture that gathers and consolidates data from disparate data sources into a repository (such as a data warehouse or data mart) by integrating the data and providing it with a common structure. Since it often involves IT professionals doing their own custom coding, ETL is one of the most common data integration methods used in the marketplace. However, it's not always the best method. Depending on what you want to accomplish with a data integration project, there may be better alternatives to this traditional approach.
ETL is seen by many as the de facto solution for data integration because of its long list of benefits. It can handle large quantities of complex data as well as data transformations that require multiple passes. It's also handy when an organization requires data transformation, frequent access, analytical processing or longitudinal reporting. Organizations that prefer ETL are those that require data consolidation, since the technology can handle large batch migrations of data.
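The ETL flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two in-memory source lists, the `customers` table and the function names are hypothetical, and SQLite stands in for a real data warehouse. The point is only the order of operations: data is cleaned and de-duplicated before it reaches the repository.

```python
import sqlite3

# Hypothetical raw rows from two source systems with inconsistent formatting.
crm_rows = [("  Alice Smith ", "ALICE@EXAMPLE.COM"), ("Bob Jones", "bob@example.com")]
billing_rows = [("alice smith", "alice@example.com"), ("Carol White", "carol@example.com")]


def extract():
    """Gather raw rows from every source (here, two in-memory lists)."""
    return crm_rows + billing_rows


def transform(rows):
    """Standardize names and emails, then de-duplicate on email."""
    seen = {}
    for name, email in rows:
        clean_email = email.strip().lower()
        clean_name = " ".join(name.split()).title()
        seen.setdefault(clean_email, clean_name)  # keep the first occurrence
    return sorted((name, email) for email, name in seen.items())


def load(rows, conn):
    """Write the already-transformed rows into the target repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (name TEXT, email TEXT PRIMARY KEY)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run extract, transform and load in that order and return the result."""
    load(transform(extract()), conn)
    return conn.execute("SELECT name, email FROM customers ORDER BY name").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` consolidates the four source rows into three customer records, because the two spellings of the same email address collapse into one.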
However, for the majority of enterprises ETL may be way more trouble than it's worth. Some of the more important issues include:
ELT/In-Database: Quality, Actionable Data Delivered Quickly and Efficiently
Some organizations are now looking to the extract, load and transform (ELT) method, also known as in-database integration, as an alternative to ETL. With this process, most of the data transformations occur after the data has been loaded into its intended database or repository. While the data is still in its raw format, it is transformed and moved to tables before being made available to users.
At first glance, the main difference between ELT and ETL is the transposed order of transforming the data, but it's much more than that. Transforming the data after it has reached its destination helps optimize performance and minimize cost. In-database integration functions at the infrastructure level while ETL functions at the integration server level; therefore, in-database optimizes performance in most cases. Additionally, the in-database method leverages the convenience of virtualization and cloud computing, which are already part of the data warehousing infrastructure, and this helps to speed processes and control costs.
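To make the contrast with ETL concrete, here is a minimal ELT sketch under stated assumptions: SQLite stands in for the target database, and the `staging_customers` and `customers` table names, the sample rows and the `run_elt` function are all hypothetical. Raw rows are loaded untouched, and the cleanup then runs as SQL inside the database engine rather than on a separate integration server.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as they arrive from the sources.
raw_rows = [
    ("  Alice Smith ", "ALICE@EXAMPLE.COM"),
    ("alice smith", "alice@example.com"),
    ("Bob Jones", "bob@example.com"),
]


def run_elt(conn, rows):
    # Extract and load: raw rows land in a staging table untouched.
    conn.execute("CREATE TABLE staging_customers (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO staging_customers VALUES (?, ?)", rows)

    # Transform: the database engine trims, normalizes and de-duplicates,
    # then writes the result to the table that users will query.
    conn.execute(
        """
        CREATE TABLE customers AS
        SELECT MIN(TRIM(name)) AS name, LOWER(TRIM(email)) AS email
        FROM staging_customers
        GROUP BY LOWER(TRIM(email))
        """
    )
    conn.commit()
    return conn.execute("SELECT name, email FROM customers ORDER BY email").fetchall()
```

In this sketch the only work done outside the database is the insert into the staging table; a real deployment would rely on the warehouse's own bulk loader and SQL dialect.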
With in-database integration/ELT, organizations can:
Data Federation: Providing a Single Virtual View of Enterprise Information
Data federation is a relatively new approach to integration. It enables a virtual view of data across multiple data silos without needing to move or copy the data. While ETL moves data into a pre-determined central repository, data federation allows the data to remain wherever it happens to be without physically altering the data. When an organization wants to access the data for business use, it uses a query-processing system to create a virtual snapshot of that data. All a user needs to do is to specify the information he or she wishes to see, and the federation server will immediately deliver it as a virtual, integrated view. This is truly invaluable when data across multiple departments or lines of business must remain siloed for compliance reasons.
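The idea of a virtual, integrated view can be sketched as follows. This is an illustration under assumptions, not a description of any federation product: two separate in-memory SQLite databases play the role of the silos, and the `contacts` and `invoices` tables, the `make_source` helper and the `federated_customer_view` function are hypothetical. Each source is queried only when a request arrives, and nothing is copied into a central store.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create one independent in-memory database acting as a data silo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two silos that stay physically separate.
crm = make_source(
    "CREATE TABLE contacts (customer_id INTEGER, name TEXT)",
    "INSERT INTO contacts VALUES (?, ?)",
    [(1, "Alice Smith"), (2, "Bob Jones")],
)
billing = make_source(
    "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 120.0), (1, 30.0), (2, 75.5)],
)


def federated_customer_view(customer_id):
    """Query each source at request time and merge the results in memory."""
    name_row = crm.execute(
        "SELECT name FROM contacts WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if name_row is None:
        return None  # unknown customer: nothing to integrate
    total_row = billing.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    # The view is read-only: it is assembled on demand and never written back.
    return {
        "customer_id": customer_id,
        "name": name_row[0],
        "total_billed": total_row[0],
    }
```

A call such as `federated_customer_view(1)` returns a single combined record while the contact and invoice data remain in their own databases.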
The advantage to data federation is that it provides an intermediate layer between the data query and the source. It's useful for light-duty and read-only applications, where a user needs quick one-and-done reporting or wants to extract only certain parameters of business intelligence for a specific vantage point in analytics. Additionally, data federation is particularly helpful in scenarios where it's just too expensive to create and maintain a database specifically for the integrated data.
One disadvantage of data federation, however, is actually one of its strengths: it prevents data from being changed. This feature is great for retaining historical accuracy, but problematic for companies looking to continually improve data. For instance, with data federation organizations can generate a clear view of their customers on a single platform. However, if there's an inaccuracy in a customer record, it can't be fixed, because it's a mirrored version of data that exists in some other location.
Data federation for integration can be beneficial to companies for several reasons, among them:
As long as a company doesn't need to change the data, or doesn't need up-to-the-minute pristine customer information, data federation can be an excellent option for data integration. Being able to attain quick, real-time snapshots of customers can help companies create targeted direct marketing campaigns or even anticipate staffing and expansion needs.
Real-time/Near-Time Data Integration: Fast, Accurate Information On-Demand
When it comes to on-demand or event-driven applications, nothing beats real-time data integration. It offers distinct benefits, making it a popular method of integration in use cases like call center operations, or when a manufacturer needs continual insight into each step of the product development process. Real-time data integration is also important in tactical and strategic applications, where users require data that is always current and accessible the second it's generated.
Therefore, real-time data integration revolves around timeliness and efficiency, the point where data integration meets business applications. Whether a company needs to track its expenses by the minute to meet compliance regulations, or catch data mistakes quickly before they negatively impact a customer, real-time data integration is a wise choice.
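An event-driven flow of this kind can be sketched in a few lines. The `RealTimeIntegrator` class, the event shape and the simple email check are hypothetical choices made for illustration; a production system would typically sit behind a message queue or change-data-capture feed. Each change is validated and applied the moment it arrives, so mistakes are caught before they reach the integrated view.

```python
class RealTimeIntegrator:
    """Applies each change event to the integrated view as soon as it arrives."""

    def __init__(self):
        self.customers = {}  # integrated view: customer_id -> current email
        self.rejected = []   # events that failed validation, kept for review

    def on_event(self, event):
        """Validate one event; apply it immediately or set it aside."""
        email = str(event.get("email", "")).strip().lower()
        if "customer_id" not in event or "@" not in email:
            self.rejected.append(event)
            return False
        self.customers[event["customer_id"]] = email
        return True


integrator = RealTimeIntegrator()
for event in [
    {"customer_id": 1, "email": "Alice@Example.com"},
    {"customer_id": 2, "email": "not-an-email"},           # caught before it spreads
    {"customer_id": 1, "email": "alice.smith@example.com"},  # later update wins
]:
    integrator.on_event(event)
```

After the three events are processed, the view holds only the latest valid address for customer 1, and the malformed event sits in `rejected` instead of in customer-facing data.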
Situations are rarely so clear-cut that one solution can address all needs at once, and data integration is no different. While most organizations realize the critical role of effective data management in facilitating smooth business operations, they still struggle to see the "big picture" in terms of data integration. Consequently, many companies automatically turn to traditional solutions like ETL to get the job done, even if it's not done particularly well. This is where the mettle of data management providers is truly tested. By exploring ETL alternatives like data federation, in-database integration/ELT and real-time integration, customers can understand what they're really trying to accomplish and deliver solutions that will meet those objectives.