Extending the Scope and Reach of Business Intelligence, Part 3: The Role of Data Federation

Originally published April 27, 2010

In Parts 1 and 2 of this series, we covered the ways in which companies can extend business intelligence (BI) using technologies that offer alternative approaches to managing data (analytic database systems and cloud computing). In this article, we discuss the use of data federation as a means of accessing data quickly and easily, the conditions under which it should be used, and when it should be avoided. Perhaps the best place to start is with a comparison to traditional data consolidation.

  • Data consolidation consists of the processes to physically capture, cleanse, integrate, transform, and load data from operational systems into a target data store (e.g., a data warehouse or mart). Data is consolidated using extraction, transformation, and load (ETL) technologies.
  • Data federation (sometimes referred to as data virtualization) consists of the processes that yield a real-time, virtually integrated view of disparate data from multiple sources, using a universal data access layer. Data federation uses enterprise information integration (EII) technologies to create virtual stores of data.
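The contrast between the two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Source` class, the `consolidate` and `federated_view` functions, and the sample records are all hypothetical and do not represent any vendor's ETL or EII product. The point is that consolidation produces a physical copy taken at load time, while federation reads the sources each time a query is run.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical stand-in for an operational system holding some rows."""
    name: str
    rows: list = field(default_factory=list)


def consolidate(sources):
    """ETL-style sketch: physically copy rows from every source into a target store."""
    target = []
    for src in sources:
        for row in src.rows:
            target.append({**row, "origin": src.name})
    return target  # a snapshot; later changes in the sources are not reflected


def federated_view(sources):
    """EII-style sketch: return a query function that reads the sources on demand."""
    def query(predicate=lambda row: True):
        return [
            {**row, "origin": src.name}
            for src in sources
            for row in src.rows
            if predicate(row)
        ]
    return query


if __name__ == "__main__":
    orders = Source("orders_system", [{"customer": "acme", "amount": 100}])
    billing = Source("billing_system", [{"customer": "acme", "amount": 40}])

    snapshot = consolidate([orders, billing])   # copied once, at load time
    view = federated_view([orders, billing])    # nothing is copied yet

    orders.rows.append({"customer": "zenith", "amount": 75})  # new operational data

    print(len(snapshot))  # 2: the physical copy is unchanged until the next load
    print(len(view()))    # 3: the virtual view sees the new row immediately
```

A real consolidation process would also cleanse and transform the data on the way in, which this sketch deliberately omits.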
These two technologies are not mutually exclusive. They work well together to form a more complete data integration platform, as evidenced by the recent acquisitions of data federation capabilities by major data consolidation vendors; most now offer both ETL and EII technologies in their solutions. The question for BI implementers becomes when to use data federation versus data consolidation. The following use cases illustrate the most popular uses of data federation. In all cases, the basic tenet is to create a data layer between the sources of the data and the application using the data.
  1. Complementing an existing data warehouse with current, operational data. Data federation is well suited to combining different sets of data, as long as they don’t require significant integration or complicated data quality processing. In this case, much of the integration and data quality processing has already occurred for the data in the data warehouse. A simple, virtual combination of current operational (real-time) data with the historical data and analytic results from the data warehouse is quite useful, especially for operational BI.
  2. Combining data from multiple data warehouses. Companies have multiple data warehouses for a number of reasons: mergers and acquisitions, political or cultural constraints, and security or privacy concerns. In most of these instances, the heavy lifting of physical data integration has already been performed, so combining historical data from the various warehouses becomes a simple matter requiring minimal data integration. Implementers often find this virtual combination of data to be much faster, and more immediately useful, than physically merging the various data warehouses.
  3. Creating virtual data marts. Creating a virtual data mart from an existing data warehouse is simple and fast. This can be used to prototype the ultimate physical mart or may continue as a virtual one that is easy to enhance or change as the analytic usage changes. 
  4. Combining structured and unstructured data. In today’s decision-making environments, the addition of unstructured data is mandatory. This “color commentary” information yields significantly more insight and explanation than the structured data alone. The ability to combine it with the analytic results from the BI environment can be a simple process for data federation technology to accomplish.
  5. Implementing a virtual operational data store (ODS). Integrating operational data from multiple systems into a physical ODS can be quite onerous. The ability to virtually create this repository has the advantages of speed and flexibility.
  6. Creating data services, data mashups, and caches. Data federation gives the business users significant flexibility and self-service capabilities. The ability to generate their own combinations of analytic and other data through these services extends the BI environment to bring non-traditional operational or external data into the users’ dashboards, scorecards, or portals. Data federation works well in a service-oriented architecture (SOA).
  7. Constructing a prototype. Perhaps one of the more useful functions for data federation is the creation of a mock-up or prototype of a data warehouse, mart, or ODS. The ease with which the virtual prototype can be made allows the implementation team to “see” the results quickly. They can then make a determination about the physical integration needs, the amount of data quality processing required, and the ultimate format of the data to be loaded physically into the target database.
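As a rough illustration of the first use case (complementing a warehouse with current operational data), the sketch below uses Python's built-in `sqlite3` module. It is an assumption-laden stand-in, not how a commercial EII product works: two in-memory SQLite databases play the roles of the warehouse and an operational system, and the table names (`sales_history`, `open_orders`), the view name (`customer_360`), and the sample data are all hypothetical. The view holds no data of its own; each query reads both sources at run time.

```python
import sqlite3


def build_federated_connection():
    """Attach two stand-in sources and define a virtual view spanning both."""
    conn = sqlite3.connect(":memory:")                 # stands in for the data warehouse
    conn.execute("ATTACH DATABASE ':memory:' AS ops")  # stands in for an operational system

    # Historical, already-integrated data in the "warehouse".
    conn.execute("CREATE TABLE main.sales_history (customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO main.sales_history VALUES (?, ?)",
        [("acme", 1200.0), ("zenith", 300.0)],
    )

    # Current data in the "operational system".
    conn.execute("CREATE TABLE ops.open_orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO ops.open_orders VALUES (?, ?)",
        [("acme", 50.0), ("acme", 25.0)],
    )

    # A TEMP view is used because, in SQLite, only temporary views may
    # reference tables in more than one attached database.
    conn.execute(
        """
        CREATE TEMP VIEW customer_360 AS
        SELECT h.customer,
               h.total AS historical_total,
               COALESCE(SUM(o.amount), 0) AS open_order_amount
        FROM main.sales_history AS h
        LEFT JOIN ops.open_orders AS o ON o.customer = h.customer
        GROUP BY h.customer, h.total
        """
    )
    return conn


def customer_view(conn):
    """Query the virtual view; both sources are read at this moment."""
    return conn.execute(
        "SELECT customer, historical_total, open_order_amount "
        "FROM customer_360 ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = build_federated_connection()
    print(customer_view(conn))
    # A new operational row appears in the view on the next query, with no reload.
    conn.execute("INSERT INTO ops.open_orders VALUES ('zenith', 10.0)")
    print(customer_view(conn))
```

The same pattern can serve as a quick prototype: if the view proves useful and the join logic stays this simple, it may remain virtual; if it needs heavy cleansing, that is a signal that physical consolidation may be the better fit.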
Now that you have an understanding of some use cases for data federation, you should also recognize some of the constraints or situations where data federation may not be advisable.
  • When combining different sets of data (whether from multiple warehouses, operational sources, or even unstructured data), it is doubtful that these sources were built using an overarching standard data model. They will most likely have inconsistent formats, names, data quality processes, calculations, etc. To make virtual connections between repositories, disparities must be abstracted or easily transformed during the federation process. If these are not easily accommodated, then data federation may not be appropriate.
  • In federating operational sources (e.g., for an ODS), we recommend two checks. First, examine the sources of data to determine the extent to which the data must be manipulated for integration purposes. Second, make sure the queries are fairly narrow in scope and do not require large volumes of operational data to be accessed. A virtual ODS does not make sense when the data volumes are too high, when the integration and quality issues cannot be resolved at run time, or when the performance of the underlying operational systems becomes compromised.
  • The performance of the underlying systems being federated must be monitored. Data federation generally leads to unusual or unexpected utilization of the federated systems. In particular, operational systems perform routine processes and are often adversely affected by multiple nontraditional queries. Make sure you can monitor their performance at all times.
  • Understand the business problem – the best way of discerning which technology and technique will work best in your IT environment comes from a careful study of the business requirements. Do the business users need real-time, low-latency data, historical data, or a combination of the two? For the business processes, what types of data and interconnectivity are needed? Who will be using the environment (analysts, executives, or front-line operations personnel)? What is the timeframe for decision making: immediate, tactical, or strategic?
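Two of the cautions above, keeping federated queries narrow and monitoring their effect on the underlying systems, can be approximated with a thin wrapper around each query. The following is a minimal sketch, assuming a `sqlite3` connection as a stand-in for a federated source; the `guarded_query` function, the `QueryTooBroad` exception, the default row cap, and the log format are all hypothetical choices made for illustration.

```python
import sqlite3
import time


class QueryTooBroad(Exception):
    """Raised when a federated query returns more rows than the cap allows."""


def guarded_query(conn, sql, params=(), max_rows=1000, log=None):
    """Run a query, record how long it took, and reject overly broad results."""
    start = time.perf_counter()
    cursor = conn.execute(sql, params)
    rows = cursor.fetchmany(max_rows + 1)  # one extra row reveals an overflow
    elapsed = time.perf_counter() - start

    if log is not None:
        # Keep a simple record so unusual load on the source can be reviewed later.
        log.append({"sql": sql, "seconds": elapsed, "rows": min(len(rows), max_rows)})

    if len(rows) > max_rows:
        raise QueryTooBroad(f"query returned more than {max_rows} rows")
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE open_orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO open_orders VALUES (?, ?)",
        [("acme", float(i)) for i in range(5)],
    )

    audit_log = []
    narrow = guarded_query(
        conn,
        "SELECT * FROM open_orders WHERE amount < ?",
        (2.0,),
        max_rows=3,
        log=audit_log,
    )
    print(len(narrow), "rows returned;", len(audit_log), "query logged")
```

A row cap applied after the fact does not stop the source from doing the work; in practice the limit would more likely be enforced by the federation layer or the source system itself, which this sketch does not attempt.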
Finally, we recommend that you keep a focus on future needs. No matter where you are today, the business will move rapidly in new and different directions. Keep your options open regarding physical data consolidation and virtual data federation. If you are heavily using data consolidation today, you may find that data federation makes more sense in the near future. Lower costs, shorter time to implementation, and easier maintenance are strong forces that could change your stance.

  • Claudia Imhoff
    A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures to support these initiatives. Dr. Imhoff has co-authored five books on these subjects and writes articles (totaling more than 150) for technical and business magazines.

    She is also the Founder of the Boulder BI Brain Trust, a consortium of independent analysts and consultants (www.BBBT.us). You can follow them on Twitter at #BBBT


  • Colin White

    Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor to several leading print- and web-based industry journals. For ten years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.

