Channel: Information and Analytic Strategy - David Loshin

 

Data Integration: Looking Beyond ETL

Originally published October 13, 2010

Data integration can be as simple or complex as an organization demands. It can move data from a source to a target or it can transform data according to established standards – it’s all contingent upon enterprise requirements. Successful data integration projects will ultimately allow data to be accessed, profiled, enriched, de-duplicated and consolidated to provide a single view of customers, products or operations.
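As a rough illustration of the de-duplication and consolidation step, the sketch below merges customer records that share an email address into one record. The field names, the sample records and the rule that the first non-empty value wins are all assumptions made for the example; real matching rules are usually far more involved.

```python
def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so equivalent addresses compare equal."""
    return email.strip().lower()


def consolidate(records):
    """Merge records that share a normalized email into a single record.

    For every other field, the first non-empty value seen is kept
    (an assumed survivorship rule, chosen only for illustration).
    """
    merged = {}
    for rec in records:
        key = normalize_email(rec["email"])
        target = merged.setdefault(key, {"email": key})
        for field, value in rec.items():
            if field != "email" and value and not target.get(field):
                target[field] = value
    return list(merged.values())


# Hypothetical records from two systems describing overlapping customers.
records = [
    {"email": "Ann@Example.com ", "name": "Ann Lee", "phone": ""},
    {"email": "ann@example.com", "name": "", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob Ray", "phone": "555-0199"},
]

single_view = consolidate(records)
for customer in single_view:
    print(customer)
```

The two "Ann" records collapse into one entry that carries both the name and the phone number, which is the "single view" idea in miniature.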

With a plethora of data integration strategies to choose from, many organizations automatically look to ETL (extract-transform-load) as the “go-to” data integration tool. Although ETL is the most common approach, it may not be the best one for a specific business. There are numerous data integration options available today, including data federation, ELT (also known as in-database integration or SQL push-down) and real-time data integration.

ETL: The Old School Approach

ETL is a technology architecture that gathers and consolidates data from disparate data sources into a repository (such as a data warehouse or data mart) by integrating the data and providing it with a common structure. Because IT professionals can often implement it with their own custom coding, ETL is one of the most common data integration methods used in the marketplace. However, it’s not always the best method. Depending on what you want to accomplish with a data integration project, there may be better alternatives to this traditional approach.
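The extract, transform and load steps described above can be sketched in a few lines. This is a minimal illustration only: the two source layouts, the customer table and the in-memory SQLite database standing in for the warehouse are assumptions for the example, not a description of any particular ETL product.

```python
import sqlite3

# Hypothetical extracts from two sources with different layouts.
crm_rows = [{"CustomerName": "Ann Lee", "Email": "ANN@EXAMPLE.COM"}]
billing_rows = [("bob ray", "bob@example.com")]


def extract():
    """Yield (source, row) pairs from every source system."""
    yield from (("crm", row) for row in crm_rows)
    yield from (("billing", row) for row in billing_rows)


def transform(source, row):
    """Map each source layout onto one common (name, email) structure."""
    if source == "crm":
        name, email = row["CustomerName"], row["Email"]
    else:
        name, email = row
    return name.title(), email.lower()


def load(conn, rows):
    """Write the already-transformed rows into the target repository."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer (name TEXT, email TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(warehouse, [transform(source, row) for source, row in extract()])

result = warehouse.execute(
    "SELECT name, email FROM customer ORDER BY name"
).fetchall()
print(result)
```

Note that the transformation happens in the integration code before anything reaches the target, which is the defining trait of ETL.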

ETL is seen by many as the de facto solution for data integration because of its long list of benefits. It can handle large quantities of complex data as well as data transformations that require multiple passes. It’s also handy when an organization requires data transformation, frequent access, analytical processing or longitudinal reporting. Organizations that prefer ETL are those that require data consolidation, since the technology can handle large batch migrations of data.
    
However, for the majority of enterprises, ETL may be far more trouble than it’s worth. Some of the more important issues include:
  • It’s a poor fit for synchronization due to its inability to address high-concurrency, low-latency data needs.
  • Hand-coding limits efficiency in terms of scope and scale. Because there’s no set process, data integration can be inaccurate or incomplete, and the company becomes dependent on the specific style of its coders.
  • In addition to the risk of human error, the time and effort required to manually maintain a data management system renders the strategy cost-ineffective.
  • Its large deployment carries a heavy footprint that can affect the entire enterprise architecture.

ELT/In-Database: Quality, Actionable Data Delivered Quickly and Efficiently

Some organizations are now looking to the extract, load and transform (ELT) method, also known as in-database integration, as an alternative to ETL. With this process, most of the data transformations occur after the data has been loaded into its intended database or repository. The data arrives in its raw format, and it is then transformed and moved to tables inside the database before being made available to users.

At first glance, the main difference between ELT and ETL is the order in which the data is loaded and transformed, but it’s much more than that. Transforming the data after it has reached its destination helps optimize performance and minimize cost. In-database integration functions at the infrastructure level while ETL functions at the integration server level; therefore, in-database integration optimizes performance in most cases. Additionally, the in-database method leverages the convenience of virtualization and cloud computing, which are already part of the data warehousing infrastructure, to help speed processes and control costs.
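To make the contrast with ETL concrete, here is a small sketch in which raw data is loaded first and the transformation is pushed down to the database as SQL. The table names, the sample rows and the use of an in-memory SQLite database are assumptions for the illustration; a production warehouse would use its own engine and SQL dialect.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the target database

# Extract + Load: raw values go in untouched, inconsistencies included.
db.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
db.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(" ann ", "10.50"), ("ANN", "4.50"), ("bob", "7.00")],
)

# Transform: the database engine itself cleans and aggregates the data.
db.execute(
    """
    CREATE TABLE orders_by_customer AS
    SELECT lower(trim(customer))      AS customer,
           SUM(CAST(amount AS REAL))  AS total
    FROM raw_orders
    GROUP BY lower(trim(customer))
    """
)

totals = db.execute(
    "SELECT customer, total FROM orders_by_customer ORDER BY customer"
).fetchall()
print(totals)
```

No rows leave the database during the transformation, so the work runs where the data already sits rather than on a separate integration server.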

With in-database integration/ELT, organizations can:
  • Reduce time-to-market for new applications using a standardized enterprise data model.
  • Deliver constantly updated reports with real-time reporting.
  • Control costs through centralized development and reduction of core integration expenses.

Data Federation: Providing a Single Virtual View of Enterprise Information

Data federation is a relatively new approach to integration. It enables a virtual view of data across multiple data silos without needing to move or copy the data. While ETL moves data into a pre-determined central repository, data federation allows the data to remain wherever it happens to be without physically altering the data. When an organization wants to access the data for business use, it uses a query-processing system to create a virtual snapshot of that data. All a user needs to do is to specify the information he or she wishes to see, and the federation server will immediately deliver it as a virtual, integrated view. This is truly invaluable when data across multiple departments or lines of business must remain siloed for compliance reasons.
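A toy version of this idea is sketched below. Two separate in-memory SQLite databases play the role of the silos, and a hypothetical function named federated_customer_view stands in for the federation server: it queries both sources at request time and returns a combined result without copying any rows. All names and data here are assumptions for the example.

```python
import sqlite3

# Two independent silos; neither one is copied into the other.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Ann Lee')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)", [(1, 20.0), (1, 5.0)])


def federated_customer_view(customer_id):
    """Build a read-only, integrated view by querying each source on demand."""
    name_row = crm.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if name_row is None:
        return None
    total_row = billing.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return {"id": customer_id, "name": name_row[0], "billed": total_row[0]}


before = federated_customer_view(1)
# A change made in the billing silo shows up on the next request,
# because the view is assembled from the sources each time.
billing.execute("INSERT INTO invoices VALUES (1, 10.0)")
after = federated_customer_view(1)
print(before, after)
```

Because nothing is stored in the federation layer, the view is only as current and as clean as the underlying sources.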

The advantage to data federation is that it provides an intermediate layer between the data query and the source. It’s useful for light-duty and read-only applications, where a user needs quick one-and-done reporting or wants to extract only certain parameters of business intelligence for a specific vantage point in analytics. Additionally, data federation is particularly helpful in scenarios where it’s just too expensive to create and maintain a database specifically for the integrated data.

One disadvantage of data federation, however, is actually one of its strengths: it prevents data from being changed. This feature is great for retaining historical accuracy, but problematic for companies looking to continually improve data. For instance, with data federation organizations can generate a clear view of their customers on a single platform. However, if there’s an inaccuracy in a customer record, it can’t be fixed through the federated view, because that view is only a mirrored version of data that exists in some other location.

Data federation for integration can be beneficial to companies for several reasons, among them:
  • Helping businesses remain compliant in the midst of increasingly strenuous government oversight. For instance, if regulations forbid a financial institution from consolidating or altering certain types of data, staff can still get a snapshot of the information without needing to move or interact with it.
  • Providing a more cost-effective method for accessing data since federation rarely requires expensive permissions to a database.
  • Offering a more efficient, faster option for integration than other methods since it eliminates the need to create a separate data source for storing information.

As long as a company doesn’t need to change the data, or doesn’t need up-to-the-minute pristine customer information, data federation can be an excellent option for data integration. Being able to attain quick, real-time snapshots of customers can help companies create targeted direct marketing campaigns or even anticipate staffing and expansion needs.

Real-time/Near-Time Data Integration: Fast, Accurate Information On-Demand

When it comes to on-demand or event-driven applications, nothing beats real-time data integration. It offers distinct benefits, making it a popular method of integration in use cases like call center operations, or when a manufacturer needs continual insight into each step of the product development process. Real-time data integration is also important in tactical and strategic applications, where users require data that is always current and accessible the second it’s generated.

In short, real-time data integration revolves around timeliness and efficiency, the point where data integration meets business applications. Whether a company needs to track its expenses by the minute to meet compliance regulations, or catch data mistakes quickly before they negatively impact a customer, real-time data integration is a wise choice.
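One common way to picture event-driven integration is a publish/subscribe loop in which each change is applied to the integrated view the moment it is published, instead of waiting for a batch run. The sketch below is a simplified, in-process illustration; the EventBus class, the topic name and the event fields are assumptions for the example, and a real deployment would typically rely on a message broker or change data capture.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe hub (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        # Each subscriber sees the event as soon as it is published.
        for handler in self._handlers[topic]:
            handler(event)


# The integrated customer view that downstream applications read.
customer_view = {}


def apply_address_change(event):
    """Update the integrated view immediately when a source reports a change."""
    customer_view.setdefault(event["customer_id"], {})["address"] = event["address"]


bus = EventBus()
bus.subscribe("address_changed", apply_address_change)

# A call center agent corrects an address; the view is current right away.
bus.publish("address_changed", {"customer_id": 7, "address": "12 Oak St"})
print(customer_view)
```

The design choice to note is that the source pushes changes as they happen, so the target never has to poll or wait for a scheduled load.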

Situations are rarely so clear-cut that one solution can address all needs at once, and data integration is no different. While most organizations realize the critical role of effective data management in facilitating smooth business operations, they still struggle to see the “big picture” in terms of data integration. Consequently, many companies automatically turn to traditional solutions like ETL to get the job done, even if it’s not done particularly well. This is where the mettle of data management providers is truly tested. By exploring ETL alternatives like data federation, in-database integration/ELT and real-time integration, customers can understand what they’re really trying to accomplish and deliver solutions that will meet those objectives.
 

SOURCE: Data Integration: Looking Beyond ETL

  • Daniel Teachey
    Daniel Teachey is senior director of marketing for DataFlux Corporation. Daniel manages global marketing efforts for DataFlux and currently oversees public relations, product marketing, marketing programs, customer relations and marketing communications. He joined DataFlux in 2003 and oversaw corporate communications activities before taking his current role. Prior to DataFlux, he held positions in public relations and marketing with IBM, MicroMass Communications and Datastream Systems. Daniel received a bachelor’s degree in journalism as well as a master’s degree in public administration from the University of North Carolina at Chapel Hill. He may be contacted by email at daniel.teachey@dataflux.com. 


 
