The 2005 Extended Corporate Information Factory Architecture
The first article, Supporting the Smart Business: The Extended Corporate Information Factory, summarizes the business and technology changes that made reconfiguring the CIF necessary. Part two of this series
focuses on the Extended Corporate Information Factory (CIFe) itself, and the changes within this popular and well-established architecture for business intelligence. Over the years, we
have added or renamed certain components of the architecture as needs arose. For example, the data warehouse was originally called the atomic database and data marts were called departmental
databases. We added the popular operational data store (ODS) to the architecture in the mid-1990s. But overall the architecture has remained remarkably stable while accommodating new
breakthroughs in its implementation as they became available.
At first glance, the Extended Corporate Information Factory may appear to be a complete departure from the “old” CIF. However, a closer examination shows that we have maintained
the basic functionality and principles of the original Corporate Information Factory. These principles include:
- The program orientation of the business intelligence environment.
- The sourcing of data from the operational systems environment and external data stores.
- The storage of an enterprise view of the data, in both the data warehouse and operational data store (ODS).
- The delivery of that data to the business community through data marts or views into the main storage components tailored to the business users and their applications.
- The inclusion of metadata management throughout the environment.
Many of the changes deal with a new set of technologies and techniques used in the overall process of creating the CIFe databases (Data Integration and Data Delivery) and in extending
the focus of the Extended Corporate Information Factory into the more operational aspects of business intelligence. In addition, the outer ring of entities (Governance, Infrastructure
Management, Quality Management, etc.) was added to ensure that the Extended Corporate Information Factory remains an enterprise resource. These entities formalize the best practices that
we have learned over the years to support this focus. See Figure 1 for the complete new architecture – the Extended Corporate Information Factory.
Figure 1: The Extended Corporate Information Factory
We have divided this article into these two main categories of CIFe changes for further explanation. Before going further, though, we must revisit the purpose for the CIFe
architecture and the benefits it has brought to many business intelligence and data warehousing projects. We offer this latest edition of the Extended Corporate Information Factory freely and
without restrictions in the hopes that you will find it useful and informative.
CIFe – A Conceptual Architecture Supporting Business Intelligence for the Smart Business
There have been many articles, presentations and other intellectual property created about the Corporate Information Factory. At its core, the Extended
Corporate Information Factory is a logical or conceptual depiction of a business intelligence environment. How companies physically implement this is their own choice. In all our years of
designing and building these architectures, we have never seen two exactly alike. The generic nature of the architecture ensures that all technologies and techniques can be considered in its
creation; there is no bias toward any particular technological solution. However, the basic tenets of implementation must not be compromised. These tenets include:
- Data is captured from the operational systems, integrated to form the “one version of the truth,” and then made available for analysis or other business intelligence activities.
There are two main mechanisms for making the data available. The data can be made available virtually through views or data federation techniques, or physically through data consolidation and propagation
techniques. More information about this is given in the next section.
- The data warehouse must be a real, physical entity without exception. A virtual data warehouse does not make sense. The data warehouse is a constant in the architecture since it contains the
historical, integrated data for all strategic analytics.
- The data marts must be dependent upon the data warehouse or operational data store (ODS) as their source of data. These data marts may be virtual (e.g., through the use of views into the data
warehouse or ODS) or real physical entities. Physical marts are needed when the mart relies on proprietary technologies (e.g., MOLAP cubes) or on specialized data formats or subsets of data
(e.g., for certain data mining, statistical or exploration technologies).
- The ODS is the source of current, integrated data and plays an important role in operational business intelligence for many enterprises. This component could make use of both data federation and
consolidation technologies, allowing it to be partially physical and partially virtual. It is the advance in data federation technologies that allows the ODS to contain real-time data for operational
decision-making purposes.
- Metadata is the glue that holds the entire architecture together. Just like any other database component, it must be managed and maintained.
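To make the distinction between a virtual and a physical data mart concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and mart names (dw_sales, mart_sales_2005_v, mart_sales_2005_p) and the sample rows are hypothetical and chosen only for illustration; a real CIFe implementation would use whatever database and mart technologies fit the environment.

```python
import sqlite3


def build_marts(conn):
    """Create a hypothetical warehouse table plus one virtual and one physical mart."""
    cur = conn.cursor()
    # Hypothetical warehouse table: integrated, historical sales data.
    cur.execute("CREATE TABLE dw_sales (region TEXT, sale_year INTEGER, amount REAL)")
    cur.executemany(
        "INSERT INTO dw_sales VALUES (?, ?, ?)",
        [("East", 2004, 100.0), ("East", 2005, 150.0), ("West", 2005, 200.0)],
    )
    # Virtual mart: a view into the warehouse; no data is copied.
    cur.execute(
        "CREATE VIEW mart_sales_2005_v AS "
        "SELECT region, SUM(amount) AS total FROM dw_sales "
        "WHERE sale_year = 2005 GROUP BY region"
    )
    # Physical mart: a separately stored subset, still sourced from the warehouse.
    cur.execute(
        "CREATE TABLE mart_sales_2005_p AS "
        "SELECT region, SUM(amount) AS total FROM dw_sales "
        "WHERE sale_year = 2005 GROUP BY region"
    )
    conn.commit()


def read_mart(conn, mart_name):
    """Return the mart rows ordered by region (mart_name is trusted in this sketch)."""
    return conn.execute(
        f"SELECT region, total FROM {mart_name} ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_marts(conn)
    # A new fact arrives in the warehouse after both marts were created.
    conn.execute("INSERT INTO dw_sales VALUES ('West', 2005, 50.0)")
    print(read_mart(conn, "mart_sales_2005_v"))  # the view reflects the new row
    print(read_mart(conn, "mart_sales_2005_p"))  # the copy waits for a refresh
```

In this sketch both marts depend on the warehouse as their source. The view always reflects the current warehouse contents, while the physical copy stays as it was until the delivery process refreshes it.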
Another result of the conceptual or logical nature of the Extended Corporate Information Factory is that many physical technological components are left out. Some examples of this include:
- Staging areas (either persistent or temporary) must be understood as part of the overall technical environment for the data integration process. Therefore, no formal component called
“Staging Area” is shown in the Extended Corporate Information Factory.
- Disaster recovery, backup or archive technologies are also not explicitly depicted. Because these are considered to be part of the overall routine database maintenance, they do not need to show
up in such a conceptual architecture.
- There are many different forms of data marts, and there will be more in the future. Therefore, we chose to consolidate these into just a single set of generic data marts.
There are currently at least five different types of marts. Each has its own technology supporting it, and its own business purpose or problem requiring its existence. See Figure 2 for examples of
the five types of data marts.
Figure 2: Examples of various data marts and mart technologies.
- Finally, the Extended Corporate Information Factory does not distinguish between structured and unstructured data. While the processes to capture and integrate these important pieces of
content may be different, they are not shown as separate entities in a conceptual architecture such as the CIFe. It must be recognized that both are simply forms of data that must be
incorporated into a business intelligence environment in some fashion.
Using an architectural diagram like the CIFe has a number of valid and tangible benefits for business intelligence implementers. First, it offers a well-planned roadmap for using data
integration and business intelligence technologies. The back-end of the roadmap consists of operational systems, the processes to integrate and make data accessible, and the storage units (data
warehouse and ODS) that are the backbone for an environment’s maintainability and sustainability. The front end consists of the delivery mechanisms (data delivery, data marts and the various
access and display capabilities), which yield the tactical and strategic analytics. These are mandatory for today’s smart businesses.
The CIFe serves as an excellent blueprint for all systems that support and drive business analytics and operations. It ensures the coordinated deployment of CRM analytics, BPM and other
business intelligence technologies by mapping or documenting the overall data flows, which occur into and out of the various CIFe components and the corresponding process interactions.
Using this architecture as a business intelligence blueprint enables seamless integration across various architectural components and promotes the re-use of components thus reducing overall
development costs.
To extend the CIFe throughout the enterprise, the environment requires massive data collection, storage and access. If it is to perform correctly, the CIFe must be created
with scalable, interoperable and reliable technologies. Its success is measured by the ability of all the CIFe technologies to become so ubiquitous that they become
invisible to the customer, the business user and the overall enterprise. This transparency requires flexibility, and flexibility requires an appropriate architecture.
With this introduction, let’s turn our attention to the latest innovations in the 2005 Extended Corporate Information Factory.
Data Integration and Data Delivery
Perhaps the most significant change to the Extended Corporate Information Factory is in how we acquire and access the integrated data used throughout the entire environment. The
CIFe combines two formerly separated processes, data acquisition and data delivery, into a single process—Data Integration and Delivery. This new process contains three techniques
for acquiring and delivering data: data consolidation, data propagation and data federation (Figure 3). Each of these techniques, in turn, has its own applicable technologies for performing the
technique:
- Data Consolidation is the technique of integrating disparate pieces of data together to create a single record. The main technology used to perform this technique is ETL (extraction,
transformation and loading) software. This software was the first form of data integration technology used in business intelligence environments, and has considerably matured in the past decade. It
captures data from operational systems, performs matching and screening processes to integrate the data, converts the integrated data to the corporate standard formats and appends, inserts or
updates records into the data warehouse or ODS. It is also commonly used to selectively extract data from the warehouse or ODS for creation of analytical data marts. In the second case, the
selected data may need to be reformatted to fit the technology used in the mart, and then delivered to the mart environment. Data consolidation is typically an event-driven technique.
- Data Propagation is the technique of replicating large amounts of data from one source and delivering the replicated data into a target database. It too has been around for a number of years.
The technique uses EAI (enterprise application integration) technology to perform this task. Generally, minimal data transformation is needed. The most common uses for data propagation have been
between operational systems, e.g., bulk movement of billing information from the billing system to the general ledger system. However, EAI technology is increasingly being used to replicate data
from a data warehouse into a mart (if no reformatting or complex derivations are required). This technology can also replicate analytical results from a mart to the operational environment (e.g.,
replicating customer segment and lifetime value scores into a CRM system or ODS).
- Data Federation is the technique in which data is virtually combined and then presented to the requestor. This relatively new technique is supported by EII (enterprise information integration)
technologies. Data is not physically moved from source to target; rather, the data is accessed from a variety of databases, combined virtually and then presented as if it were physically
integrated. If the request is common, the data may be cached in the EII technology for better performance with minimal impact on the source applications. This software is commonly used to combine
historical information from a mart with the current data from an operational system. It may also be used to combine data from multiple ERP installations into a single view of the combined data.
Figure 3: Data integration and data delivery
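The differences among the three techniques can be sketched in a few lines of Python. This is an illustration only, not a depiction of any ETL, EAI or EII product: the function names, record layouts and field names (id, cust_id, balance, lifetime_value, open_orders) are all hypothetical, and in-memory lists and dictionaries stand in for real source and target databases.

```python
def consolidate(crm_rows, billing_rows):
    """Data consolidation (ETL-style): match records from two sources,
    convert them to a standard format and return single integrated records."""
    billing_by_id = {row["cust_id"]: row for row in billing_rows}
    integrated = []
    for crm in crm_rows:
        billing = billing_by_id.get(crm["id"], {})
        integrated.append({
            "customer_id": crm["id"],
            "name": crm["name"].strip().title(),            # standardize the name format
            "balance": float(billing.get("balance", 0.0)),  # standardize the data type
        })
    return integrated


def propagate(source_rows, target):
    """Data propagation: replicate rows into a target store with minimal transformation."""
    target.extend(dict(row) for row in source_rows)  # copy each row as-is
    return len(source_rows)


def federate(mart_history, operational_current, customer_id):
    """Data federation: combine data at request time; nothing is moved or stored."""
    return {
        "customer_id": customer_id,
        "lifetime_value": mart_history.get(customer_id),        # historical data from a mart
        "open_orders": operational_current.get(customer_id, 0), # current operational data
    }


if __name__ == "__main__":
    crm = [{"id": 1, "name": "  alice smith "}]
    billing = [{"cust_id": 1, "balance": "42.50"}]
    print(consolidate(crm, billing))

    ledger = []
    propagate([{"invoice": 7, "amount": 10.0}], ledger)
    print(ledger)

    print(federate({1: 900.0}, {1: 2}, 1))
```

Consolidation and propagation both leave a physical result behind (the integrated records and the replicated rows), whereas the federated result exists only for the duration of the request.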
All three techniques and technologies can be used (in some combination) to create the data integration and data delivery process in the new Extended Corporate Information Factory. This change from
the traditional CIF architecture gives the integration and delivery process much more flexibility, in terms of how components are created and maintained. It also gives the implementers more options
about what technologies or techniques work best in their particular environments. You must remember, though, that the basic tenets of good CIFe construction cannot be violated.
Enterprise Business Intelligence Best Practices
The outer band of the Extended Corporate Information Factory shows the major components of the environment management function. These components provide a more modern approach to the major
activities that must be performed to ensure that the business intelligence environment (1) operates smoothly, (2) operates cost-effectively and (3) increases in value to the organization as the
business learns to leverage and expand its application. These components are gleaned from the best practices learned over the past two decades. These best practices ensure that the business
intelligence environment is focused on supporting the entire enterprise, while efficiently satisfying the individual needs of each department or subdivision. They ensure that the environment is
sustainable and maintainable over the long haul and that new technologies and techniques can be easily incorporated. There are six major components within the environment management function. They are:
- Governance: This consists of the people and processes for controlling and coordinating the environment with the individual business intelligence projects. Governance ensures that the various
projects adhere to CIFe standards and nomenclature, that the data models are integrated and that the technologies are compatible and appropriate.
- Infrastructure Management: This consists of the people, processes and technologies for ensuring that the environment operates smoothly and reliably. Activities within this component include
version upgrades, incorporation of new technologies, retirement of older technologies, etc.
- Center of Excellence: This consists of the people, processes and technologies for promoting collaboration and applying best practices. Typical activities for the Center of Excellence are
maintaining source system expertise and data integration intelligence, gathering and understanding end-user requirements and preserving tool expertise.
- Quality Management: This consists of the people, processes and technologies that ensure that data quality meets business expectations. Many organizations today are creating data stewardship
functions to manage the overall data quality process. This function must interface with both the business intelligence and operational resources to ensure that data quality processes are fully
adopted throughout the enterprise.
- Application Management: This consists of the people, processes and technologies that create and coordinate application development within the business intelligence environment to provide
maximum business value. Understanding the business problem may not be enough to create a successful and satisfying application. It may take a more complete understanding of how the application or
technology fits into a person’s overall workflow. It means understanding the bigger picture of how people use the applications to perform their daily tasks. This function ensures that the
appropriate technology is used for a specific business problem as well. An example of this is using statistical technologies for statistical problems, multidimensional technology for
multidimensional problems, etc.
- Metadata Management: This consists of the people, processes, technologies and data stores for managing the information about the enterprise’s data resources and activities. This includes
not only the technical metadata generated from the data integration and delivery process, but also the metadata generated from administrative processes (who is using the environment, what data is
frequently used, what data can be archived) and business metadata containing business definitions and rules.
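As a rough illustration of how the three kinds of metadata might sit side by side for one data element, consider the following Python sketch. The MetadataEntry class, its field names, the access_count key and the archive_candidates helper are hypothetical and are not part of the CIFe itself; they simply show administrative metadata answering the question of what data can be archived.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """Hypothetical metadata record for one data element."""
    element: str
    technical: dict = field(default_factory=dict)       # from data integration and delivery
    administrative: dict = field(default_factory=dict)  # usage: who, how often
    business: dict = field(default_factory=dict)        # business definitions and rules


def archive_candidates(entries, min_accesses):
    """Return the elements whose recorded access count is below min_accesses."""
    return [
        entry.element
        for entry in entries
        if entry.administrative.get("access_count", 0) < min_accesses
    ]


if __name__ == "__main__":
    entries = [
        MetadataEntry(
            element="customer_lifetime_value",
            technical={"source": "billing system", "load": "nightly"},
            administrative={"access_count": 120},
            business={"definition": "Projected net revenue per customer"},
        ),
        MetadataEntry(
            element="legacy_region_code",
            technical={"source": "retired order system"},
            administrative={"access_count": 1},
            business={"definition": "Sales region code used before the reorganization"},
        ),
    ]
    print(archive_candidates(entries, min_accesses=10))
```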
Summary
It is with a great sense of accomplishment and happiness that we offer the extended version of the Corporate Information Factory. We see the CIFe as bringing together the best practices
learned from prior implementations with the latest technological innovations, pushing the frontiers of business intelligence. The last part of this series will describe how the CIFe
works with the Smart BI Framework to create the ultimate Smart Business.
Claudia Imhoff
Claudia Imhoff, Ph.D., is the President and Founder of Intelligent Solutions, a leading consultancy on data warehousing and business intelligence
technologies and strategies. She is a popular speaker and internationally recognized expert, and serves as an advisor to many corporations, universities and leading technology companies on these
topics. She has co-authored five books and more than 100 articles on these topics and has a popular blog at www.b-eye-network.com/blogs/imhoff/. She may be reached at CImhoff@IntelSols.com.
Colin White
Colin is the founder and president of BI Research. He is well known for his in-depth knowledge of business intelligence, data management and data
integration technologies and how they can be used for supporting smart and agile decision making. With 40 years of IT experience, he has consulted for dozens of companies throughout the world and
is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor
to several leading print- and web-based industry journals, including the BeyeNETWORK. Colin may be contacted by sending an email to info@bi-research.com.