Originally published February 28, 2005
I have discussed in previous articles how organizations are becoming increasingly interested in using business intelligence, not only for strategic planning and tactical analysis, but also for driving intra-day business decision making. I use the term operational right-time business intelligence to describe the business intelligence used for this intra-day decision making. The word operational signifies that the business intelligence is being used to optimize everyday operational business processes, and the word right-time denotes that this optimization occurs at a frequency that matches business needs. Right-time may vary, for example, from right now (i.e., real-time) to several minutes or hours.
There are three types of right-time business intelligence processing: right-time data integration, right-time reporting and performance management, and right-time decisions and actions. The first type is concerned with gathering data in a timely fashion for analysis, whereas the latter two types are about analyzing the data, making business decisions, and taking actions in a timely manner. In this article, I talk about the first type, i.e., right-time data integration and the technologies that can be used to support it.
Right-Time Data Consolidation
Traditional data warehousing involves running regular, usually batch, ETL processes that extract data from operational data sources, transform it, and load it into a data warehouse. ETL processes can be thought of as doing data consolidation. Data replication is another technology that can be used for data consolidation. A third approach is enterprise content management (ECM), which consolidates unstructured and semi-structured content (documents, rich media, Web data) into a content repository. All three types of data consolidation move or copy data from a data source to a data target. The objective of data consolidation is to provide a shareable, clean, consistent, integrated and managed view of data for business users.
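The extract, transform and load steps described above can be sketched in a few lines of Python. This is only an illustration, not a description of any particular ETL product: the two in-memory SQLite databases stand in for an operational source and a data warehouse, and the table names (sales, sales_fact) and the cleanup rules are assumptions made up for the example.

```python
import sqlite3

# Hypothetical operational source with inconsistent formatting.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(" east ", "10.50"), ("WEST", "4"), ("East", "2.25")],
)

# Hypothetical data warehouse target.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (region TEXT, amount REAL)")


def extract(conn):
    """Extract: read the raw rows from the operational source."""
    return conn.execute("SELECT region, amount FROM sales").fetchall()


def transform(rows):
    """Transform: normalize region names and convert amounts to numbers."""
    return [(region.strip().lower(), float(amount)) for region, amount in rows]


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    conn.commit()


load(warehouse, transform(extract(source)))

# Business users now query one clean, consistent copy of the data.
totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
```

The point of the sketch is that the data is physically copied: after the load, queries run against the warehouse and never touch the source.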
One approach to doing right-time data integration is to consolidate data on a timelier basis. Many companies, for example, have introduced the operational data store (ODS) and operational data marts into their data warehouse architecture for handling integrated right-time operational data. These operational data stores are usually maintained by event-driven and right-time ETL (RT-ETL) tools.
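One common way to keep an operational data store current is to copy only the rows that have changed since the last refresh, and to run that refresh as often as the business requires. The sketch below assumes a high-water-mark technique driven by an updated_at column; that technique, the orders table, and the refresh_ods function are illustrative assumptions, not features of any specific RT-ETL tool.

```python
import sqlite3

SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"

# Hypothetical operational system and operational data store (ODS).
source = sqlite3.connect(":memory:")
source.execute(SCHEMA)
ods = sqlite3.connect(":memory:")
ods.execute(SCHEMA)


def refresh_ods(source_conn, ods_conn, watermark):
    """Copy rows changed after `watermark` into the ODS; return the new watermark."""
    changed = source_conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert keyed on the primary key.
    ods_conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", changed)
    ods_conn.commit()
    return max((row[2] for row in changed), default=watermark)


# First refresh picks up two new orders.
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, "new", 100), (2, "new", 105)]
)
watermark = refresh_ods(source, ods, 0)

# A later change in the source is picked up by the next refresh only.
source.execute("UPDATE orders SET status = 'shipped', updated_at = 110 WHERE id = 1")
watermark = refresh_ods(source, ods, watermark)

ods_rows = ods.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
```

How often refresh_ods runs is what makes the process right-time: every few seconds for near real-time needs, or every few hours where that is sufficient.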
The use of operational data stores in data warehousing projects has accelerated over the last few years. This has increased the complexity of data integration projects, and often results in the copying of large amounts of operational data, which in turn leads to very large data stores. It also requires data consolidation products that are capable of supporting sustained and high-volume right-time processing. This type of right-time environment can be expensive to deploy and maintain.
Right-Time Data Federation
Another approach to right-time data integration is to use a federated data approach, which is often easier and more cost effective than data consolidation for certain types of applications. Data federation provides the ability to present a single logical view of dispersed data to an application, without the need to physically copy or move the data into a consolidated data store.
In its basic form, access to federated data involves breaking down a federated query into subcomponents, and sending each subcomponent for processing to the location where the required data resides. The federated query server then combines the results and sends a reply to the application that issued the query. Data federation is provided by enterprise information integration (EII) software. EII products vary considerably in the features they offer. Performance and query optimization, for example, is one key area where products differ. Other areas include support for unstructured data, and for business data in application packages from vendors such as SAP and Siebel.
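The basic federated query flow described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how any EII product is implemented: two in-memory SQLite databases stand in for separate source systems, and the table names and the federated_order_totals function are hypothetical.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical source 1: customer master data.
customers_db = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# Hypothetical source 2: operational order data.
orders_db = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)


def federated_order_totals(customers_conn, orders_conn):
    """Answer 'total order amount per customer name' without copying any data."""
    # Subcomponent 1: sent to the system that holds customer names.
    names = dict(customers_conn.execute("SELECT id, name FROM customers"))
    # Subcomponent 2: sent to the order system, which aggregates locally.
    totals = orders_conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    # The federation layer combines the partial results into one reply.
    return sorted((names[cid], total) for cid, total in totals if cid in names)


result = federated_order_totals(customers_db, orders_db)
```

The calling application sees a single result set, as if the customers and orders lived in one database. Real products differ mainly in how well they decide which work to push to each source and how they combine large intermediate results.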
The objective of EII is to enable business users to see all of the information they need as though it resided in a single database. EII shields business users and applications from the complexities of retrieving data from multiple locations, where the data may differ in semantics and format and may be accessed through different APIs.
It is important to emphasize that data federation cannot replace the traditional data consolidation approach used for data warehousing. A fully federated, or virtual, data warehouse is not recommended for reasons of performance and data consistency. Data federation should be used instead to extend and enhance a data warehousing environment to address specific business needs.
Data federation is a powerful approach for solving certain types of data access problems, but it is essential to understand the trade-offs of using federated data. One issue is that federated queries may need access to an operational business transaction system. Complex EII query processing against such a system can degrade the performance of the operational applications running on that system. With the federated data approach, this impact can be reduced by sending only simple and specific queries to the operational system.
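As an illustration of what a simple and specific query might look like, the sketch below restricts the operational system to a primary-key lookup for a short list of accounts, leaving any joins or aggregation to the federation layer. The accounts table and the current_balances function are hypothetical, and an in-memory SQLite database stands in for the operational system.

```python
import sqlite3

# Hypothetical operational transaction system.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
operational.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [(1, 250.0), (2, 90.0), (3, 10.0)]
)


def current_balances(conn, account_ids):
    """Fetch balances for the given accounts only, using a primary-key lookup."""
    ids = list(account_ids)
    if not ids:
        return {}  # nothing to ask for, so do not touch the operational system
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, balance FROM accounts WHERE id IN ({placeholders})", ids
    )
    return dict(rows)


# The federation layer asks only for the accounts it actually needs.
balances = current_balances(operational, [1, 3])
```

A query of this shape touches a handful of indexed rows, whereas a full scan or a multi-table join issued against the same system would compete with its transaction workload.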
Another potential problem with data federation is how to logically and correctly relate data warehousing information to the data in operational and remote systems. This is similar to the problem that must be addressed when designing the ETL processes for building a data warehouse. The same detailed analysis and understanding of the data sources and their relationships to the targets is required. Sometimes, it will be clear that a data relationship is too complex, or the source data quality too poor, to allow federated access. Data federation does not, in any way, reduce the need for detailed modeling and analysis. It may in fact require more rigor in the design process, because data transformation and cleanup must happen at query time in a federated environment.
When to Use Data Federation
The following is a list of circumstances when data federation would be an appropriate approach to consider:
When to Use Data Consolidation
The arguments in favor of data consolidation are the opposite of those for data federation:
You can see, then, that both data consolidation and data federation have a role to play in data warehousing and right-time data integration, and companies will need to implement both of these technologies. Rather than buying two separate products, one for data consolidation and one for data federation, you should start looking at vendors that support both in a single integrated product with shared metadata. This trend is beginning to emerge in the market, and to be successful in the future, data integration vendors will need to offer both capabilities in a single product.
Recent articles by Colin White