An Intelligent Enterprise Requires Unified Data
Increasingly, companies rely on business intelligence and enterprise performance management (referred to in this article simply as BI) for mission-critical decisions and planning. However, many of today’s conventional BI solutions are limited to batch transfers of operational data into data marts that serve up the data in dashboard applications and batch-generated reports – refer to Figure 1. This is partly due to a lack of data integration core infrastructure in ETL (extract, transform, load). Another pain point is that BI systems are only as trustworthy as the data they present. As a result, these implementations will suffer from obsolete data and information inaccuracy. In addition, most data integration tools in the market sector today suffer from lack of interoperability with BI systems. Because of this, BI applications are limited by their isolation or require custom code to integrate data. Finally, because decision makers are demanding more actionable recourse in real-time, this batch-driven linear approach to information analysis is no longer sufficient.
Figure 1 (mouse over image to enlarge)
These five elements make up the critical factors that are needed in achieving compelling visibility through business intelligence. Each one of these factors is addressed by a unified data integration platform:
- Optimized data movement
- Data quality
- Real-time and consistent data
- Next generation interoperability with BI systems
- Actionable BI
A Short Overview of ETL and Why it Matters to Business Intelligence
ETL technology supports the extraction, transformation and integration of data from multiple datasources, including databases, data warehouses and data-centric applications. ETL allows businesses to consolidate their disparate data from any source while moving it from place to place. ETL can transform not only data from different departments but also data from different sources altogether. For example, order details from an enterprise resource planning (ERP) system and service history from a customer relationship management (CRM) application can be consolidated into a central data hub for a single view of the customer.
Today ETL technology is heavily used for stand-alone data warehousing. Data and knowledge management professionals for BI initiatives are increasingly using ETL and data warehousing together for the “heavy” lifting of their data, while at the same time keeping their BI tools with accurate and up-to-date information.
Traditional ETL tools tend to fall short on the performance, productivity and even cost savings aspects because additional hardware is required; it is this latter reason especially that has moved the industry away from traditional ETL vendors to support improved architecture considerations. Enter EL-T.
Optimized Data Movement with Next-Generation ELT
Solutions that optimize “where” data transformation is performed can distinguish themselves from conventional ETL, or extract, transform and load approaches (refer to Figure 2). These solutions are known as E-LT – or extract, load and transform – tools because they can optimize where transformations are deployed either on the target destination or even on the source. This also allows for greater flexibility, improved scalability and greater performance. ELT approaches can also reduce costly IT infrastructure costs. The following should also be considered when evaluating an ELT solution:
- Performance optimizations for set-based transformations.
- Heterogeneous relational databases.
- Optimizations for database appliances.
- No hardware requirements; run-time agents should be deployed on the databases themselves.
- Data that always goes from source to target through optimized database pathways; data should never move through the intermediary.
- Extensible support for standard Java and service-oriented architecture (SOA) environments.
- Design tools that support out-of-the-box optimizations. Users should not have to write special scripts or custom code to enable optimized performance.
Not all ELT approaches are equal. Unlike “pushdown optimization,” ELT architecture is not a mere afterthought. Most pushdown optimization transformations still occur inside the ETL engines and require the physical data to transit over the network and through their engines. When selecting an ELT solution, it is important to discern between brittle proprietary technologies that can easily break and open ELT platforms that can dramatically improve performance while simultaneously lowering cost of ownership.
The Data Quality and Profiling Life Cycle
All enterprise software projects are at risk from bad data; even worse, inaccurate and inconsistent data exists everywhere. In addition, the demand for trusted data continues to be driven by investments in packaged applications and especially BI software. This is partly because of the complexity of data typically used for performance management as well as the auditing requirements for that data. Many regulatory compliance initiatives – such as the Sarbanes-Oxley Act, the U.S. Patriot Act and Basel II – require tracing the source of the data used in financial reports, as well as examination, tracking (through snapshots) and certification of the state and quality of the business data.
Data quality is important for BI applications, but for different reasons than data warehousing initiatives. Technical data managers think of data quality differently than a business manager building out a business report. Database administrators care about the semantics of data, such as broken validations at the field level, whereas the business user considers holistic patterns or matches in the context of customer or performance data as part of a business process. It is this aspect that data architects should consider for comprehensive data quality and data profiling as a complete life cycle.
For example: the first step in a comprehensive data quality life cycle is to assess the quality of your data through data profiling. Profiling data means reverse-engineering metadata from various data stores, detecting patterns in the data so that additional metadata can be inferred and comparing the actual data values to expected data values. Profiling provides an initial baseline for understanding the ways in which actual data values in your systems fail to conform to expectations.
Figure 3 (mouse over image to enlarge)
Once data problems are well understood, the rules to repair those problems can be created and executed by data quality engines. Data quality rules can range from semantic integrity to sophisticated parsing, cleansing, standardization, matching and de-duplication. After data quality rules have been generated, fine-tuned and tested against data samples from within a design environment, those rules can be added to data integration processes so that data can be repaired either statically in the original systems or as part of a data flow. See Figure 3 for an example. By implementing data quality as a life cycle, flow-based control minimizes disruption to existing systems and ensures that downstream analysis and processing works on reliable, trusted data.
Real-time, Consistent Data
To understand how data integration fits within the BI landscape, look at detailed examples where applications consume real-time data and turn it into in-depth analytics and information for improved decision making. In many such scenarios, change data capture (CDC) plays a key role in keeping data consistently updated without impacting the target or source performance. In addition, these systems draw from a wide range of internal sales, customer and financial data applications as well as third-party systems. This requires a broad range of data integration connectivity options to support moving data across such a wide variety of enterprise applications.
There are many ways to extract data from a database management system (DBMS), including queries, replication, table dumps, storage snapshots and calls to the API of an application that sits over the database. Change data capture (CDC) is an alternate data extraction method that has recently become of interest, primarily because it enables data integration to operate closer to real time.
CDC can be applied to most database brands, including relational, legacy, mainframe and file-based DBMSs. A few vendors have built CDC into their products, but many organizations use the data modeling and log capabilities of a DBMS to build their own solutions. CDC has been around for many years, but its ability to solve some of the most difficult data integration challenges is driving interest among IT professionals today.
A simple example of CDC in action follows in Figure 4. Three separate data sources for a web storefront (one for customer data, one for order data, one for product data) are consolidated into a single data warehouse. To simply update the order details in real-time, only the delta (or set of orders and new customer info) needs to be propagated across to the data warehouse. This does not require moving all the data for both systems. Without CDC, business managers would not be able to see daily trends. In addition, business managers would be forced to wait for the next batch of data to load into the data warehouse before they could look at the results. By then, it might be too late to make important informed decisions.
Figure 4 (mouse over image to enlarge)
Next Generation Interoperability
Integrating data means communicating through several integration styles: batch, data, events and services. In some cases moving data directly from databases to different databases is more optimal than passing information over the wire as a service. Packaged applications, such as ERP or CRM demand complex integrations that require understanding metadata models, custom mappings and customer APIs. In some cases, the best form of data integration is a service, via enterprise service bus or a message-oriented middleware; data also needs to participate as part of an SOA by providing data services and transformation services.
But in addition to these generic modes of interoperability, BI applications require a more explicit form of connection that can help automate discovery and mapping of data into and out of systems. Without this type of automated step, companies would be forced to build these integrations by hand. Small changes in BI applications would resort to immense changes, re-coding and re-testing of the entire integration life cycle.
These next generation data integration solutions (see Figure 5) contain pluggable modules to better connect these disparate applications, sources, targets and systems.
Actionable Business Intelligence
Another key trend in the data integration market sector is actionable BI or also thought of as real-time data warehousing. Visibility into static data is no longer sufficient; users must be empowered to directly act on this data based on the available information. For example, a BI solution might report that a partner is no longer meeting a service-level agreement, but how does the company act on that data? The partner needs to be demoted from a platinum-level profile to a lower-profile category. This, in turn, means changing the data in the data warehouse or “closing the loop” among data integration, business process integration and business intelligence. See Figure 6 for an example. It is an important trend as more solutions take advantage of interoperability points in SOA, BI and data warehousing. But in addition to the interoperability requirements, to achieve real-time, actionable BI, the business user must be able to easily drill into the data behind a dashboard to see details about the data lineage – that is, where the data came from and what transformations were applied.
One of the key prerequisites for actionable BI is a comprehensive data management approach. Data management is often the forgotten requirement in a BI/DW solution, but it as important as the other components. At the core of any data integration solution is the need for metadata management, master data management (MDM) and data modeling.
Figure 6 (mouse over image to enlarge)
Metadata management improves data visibility so managers can understand how data is used and how it relates to other data within a global data-centric system. Metadata management and data relationship management are cornerstones for MDM-based solutions that reveal data relationships within a single source of truth.
Data lineage is a key example of metadata management often used by BI utilities to allow business users to independently track data sources. If the data lineage falls short of the actual source and has not integrated properly to the data integration solution, it will be unable to allow business users to identify gaps in the data.
When a BI process cuts across data silos, the project team needs metadata to understand the context of information, including terminology, calculations and methodologies – all prerequisites for a single version of the truth. Achieving actionable BI relies on accurate data throughout the data life cycle – from data origin to data retirement. Data integration is a critical enabler of this data integrity.
A growing manufacturing company implemented a BI solution with the goal of achieving near-real-time analysis of data and reducing the amount of time spent aggregating and extracting data. The methodology used required the company to implement data integration and data quality in conjunction with the BI solution.
The objective of the company’s BI project was to achieve near-real-time analysis of data and drastically reduce the amount of time spent aggregating and extracting data. Data quality is imperative for BI applications. Without it, business users build insight and reports on the wrong information. Hence, the methodology used required the company to implement data integration and data quality in conjunction with the BI solution.
The key issues facing the implementation team included:
- Understanding in-depth analytics related to sales and customer data
- Efficiently moving data into sales, customer and financial data systems, as well as third-party systems
- Improving audit controls around actual financial data
- Improving data quality for better visibility into enterprise wide data sources
As a result of the BI system and the associated data integration and data quality solutions, the manufacturing company has realized more efficient data integration, retrieval and reporting. Specifically, the company can generate standardized and ad hoc reports, freeing employees to focus on data analysis rather than data retrieval. This has led to an increase in data accuracy and the ability to slice and analyze it in many different ways. The company also has the ability to budget and forecast based on business drivers.
The Benefit of Cost Savings and Improved Responsiveness
Business leaders who demand the most from their BI projects can often be tangled up by the IT burden of a complex, fragmented and misbehaving data architecture. This puts additional burden on developers and IT managers to create custom code or point-based integrations. A unified data integration architecture can alleviate this burden by eliminating custom code, consolidating IT infrastructure and finally eliminating some of the key risk from poor quality and inconsistent data.
In addition, accurate, manageable and transparent data allows organizations to more quickly identify and respond to internal and external events. Data integration solutions foster this type of agility by uniting heterogeneous data sources across the enterprise. However, these solutions must be well governed to ensure that data is incorporated into business processes and that data integration becomes part of the change management process. Data quality, data profiling and data governance are essential components to establish and maintain the improved flexibility provided by complex data-centric architecture.
The best solutions available for unifying data across data-centric applications are ones that provide rapid productivity gains for the integration of data. Look for solutions that provide solution add-ons, or knowledge modules, for rapid-deployment and eliminate code creation. Also look for solutions that provide flexibility to work with multiple applications, databases, data warehouse frameworks and BI applications.
The strongest BI offerings embed versatile data integration solutions that increase the value of the information delivered to the business user. Optimizing data integration within a BI solution delivers consolidation across complex applications, clean and consistent data, real-time data access and actionable BI. Data integration and business intelligence are essential technologies to support enterprise-wide performance management, leveraging operational data to create smart, agile and aligned organizations that are achieving management excellence.