
Business Drivers for Data Integration: Part 2 of a Series on Data Integration as an Enterprise Service

Originally published March 30, 2012

The first article in this series detailed the functionality and requirements for a data integration service or set of tools. This part of the series delves into the business cases driving the need for an SOA-implemented data integration (DI) service and a reduced cost of ownership. The discussion outlines two potential applications of data integration as a service (DIaaS), with the associated cost calculations and savings, followed by an introduction to the chargeback concept.

Redundant Applications Case Study

As a consultant, I see the inner workings of many client engagements: enterprise application integration, business intelligence (BI), data center migration and consolidation projects, enterprise resource planning (ERP) solutions, and so on. One thing almost all clients have in common is complex data integration patterns. I am going to put a stake in the ground and state that the root cause of many project failures traces back to planning below the enterprise level, which tends to inhibit constructive corporate collaboration. If planning considered the needs of the enterprise, there would be a standard, enforced path for data integration. A company with a mature collaboration level would be practicing knowledge management and capturing and utilizing metadata. If consumers knew how to locate the data they need, they would use the existing pathways rather than create new ones. Some companies compound the issue by having not only redundant integration pathways, but multiple products with the same functionality. In a few cases, I have even seen multiple versions of the same product.

For example, while on a one-week project to assist in the configuration of an enterprise data propagation (EDP) solution, it became evident that the client had three EDP projects, each with its own server, product license, and development and maintenance teams. Unfortunately – but typically – all three implementations were sourcing data from the same mainframe production system (in fact, the same set of 50 tables in the claims system) and loading it into three different staging areas before finally targeting a common enterprise data warehouse. A return on investment (ROI) can be calculated by comparing the cost of this client's design with what it would have been had they collaborated and employed data integration as a service.

In order to calculate the ROI, I like to separate the cost into three categories:
  1. Hardware
  2. Software
  3. Manpower
Each category can be further organized into subcategories; e.g., manpower can be subdivided into the cost of resources per hour, training, client support and IT help desk. Using the three categories listed above, we can walk through the high-level categories and see where money was spent that could have been consolidated.
  1. Hardware

    1. Since there were three distinct installations for each EDP project, three Dell servers were used. Let’s estimate that $15,000 was spent on each server.

    2. Each server takes space in the rack and requires power and cooling. Let’s estimate this as $4,900 per year per server.

    3. Each server has to be administered, managed, maintained and monitored. Let's estimate this at $5,000 per year per server, which includes patches, backup/recovery, monthly maintenance and support.

  2. Software

    1. Each instance of enterprise data propagation requires a separate license, since licensing is tied to each server's IP address; the initial investment is approximately $150,000 per instance.

    2. Each instance will require annual licensing, which costs 18% of initial cost or $27,000 per year per instance.

    3. Client software must also be taken into account. If this is exposed via Citrix or another shared server, then the cost for this must also be added.

  3. Manpower

    1. Resources. Each resource has an hourly rate, so the question is how many hours of each resource type are allocated to each instance.
      • Administration: This is the application administration. Let’s use $50 per hour as a rate and 15 hours/week as consumption rate.

      • Data Architect: Someone had to model each instance. Let’s use $50 per hour as a rate and 120 hours per instance.

      • Team lead: Each team has a super user or lead. Let’s use $50/hour and 30 hours per week consumption.

      • Developers: For simplicity, let’s assume there is one developer and use $30 per hour with 35 hours per week consumption.

    2. Training: Each user must go through some type of training, including administration. For this analysis, let's blend this to an average cost of $10,000 per resource.

    3. IT help desk and client support: Let's use the blended rate used by large corporations of $1,500 per user.
There are a number of other factors that could be added to this case. The design burns central processing unit (CPU) cycles on the mainframe: the administrator must source the same production tables three times! Additionally, the mainframe must host three sets of log tables to capture the changes. In this example, the company is paying to run three schemas, so the cost of database administrators (DBAs), disk space, CPU and memory for these services has to be added. All of this requires additional internal resources or consultants and professional services.

Let’s calculate the cost as implemented in a simplified way for discussion (Table 1).

                              Initial cost              Annual cost (recurring)   Total cost
 Hardware (servers)           $45,000 (purchase price)  $29,700 (support cost)    $74,700
 Software                     $450,000                  $81,000                   $450,000
 Manpower – administrative                              $39,000 (per instance)    $117,000
 Manpower – development                                 $18,200 (per instance)    $54,600
 Total                                                                            $696,300

Table 1: Costs as Implemented
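As a quick sanity check, the Table 1 figures can be reproduced from the rates above (a sketch; it assumes the per-instance manpower figures apply to all three instances and takes the $18,200 development figure directly from the text):

```python
# Reconciling Table 1: three redundant EDP instances, first-year view.
INSTANCES = 3

hardware_initial = 15_000 * INSTANCES            # purchase price: $45,000
hardware_annual = (4_900 + 5_000) * INSTANCES    # power/cooling + admin: $29,700
software_initial = 150_000 * INSTANCES           # licenses: $450,000
admin_annual = 50 * 15 * 52 * INSTANCES          # $50/hr x 15 hr/wk x 52 wk: $117,000
dev_annual = 18_200 * INSTANCES                  # article's per-instance figure: $54,600

total = (hardware_initial + hardware_annual + software_initial
         + admin_annual + dev_annual)
print(f"${total:,}")  # $696,300
```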

What should have been implemented? Let's review: Fifty tables were sourced from the claims system, sent to three different staging areas, then transformed and loaded into the enterprise data warehouse (EDW). What if these projects had been consolidated to use one server with one licensed copy of the EDP software, one staging area, and one administrator, DBA and data architect for support? Let's revisit the costs (Table 2).

                              Initial cost   Annual cost (recurring)   Total cost (first year)
 Hardware (servers)           $15,000        $9,900                    $24,900
 Software                     $150,000       $27,000                   $177,000
 Manpower – administrative                   $39,000                   $39,000
 Manpower – development                      $18,200                   $18,200
 Total                                                                 $259,100

Table 2: Costs as They Should Have Been Implemented

This conservative scenario (more savings could have been added) shows a reduction of over 60% in costs when the projects are integrated within a service.
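The reduction can be checked directly from the two tables (the Table 1 manpower totals assume the per-instance figures apply to all three instances):

```python
# Redundant design (Table 1) vs. consolidated DI service (Table 2).
redundant = 45_000 + 29_700 + 450_000 + 117_000 + 54_600   # $696,300
consolidated = 24_900 + 177_000 + 39_000 + 18_200          # $259,100
savings = 1 - consolidated / redundant
print(f"{savings:.0%} reduction")  # 63% reduction
```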

Also note that the lowest cost item is development. By sharing the EDP service, projects can easily be scaled and extended by adding development staff at reasonable expense.

As startling as this is, we are only looking at one service, not all of those that fall under the data integration umbrella. Let’s take a closer look at some common business drivers that are potential consumers of data integration services.

Business Drivers

Business drivers lead to initiatives that, in turn, instantiate projects. For instance, a business driver might be to discover the single view of the customer, identify the top ten customers or aggregate cost by vendor to increase purchase value. All of these drivers lead to a BI initiative. Upon approval of the business case for the initiative, the data warehouse project gets the green light. Other business drivers might include:
  • Mergers and acquisitions – Data consolidation project

  • Increase business profitability – Customer/product hubs

  • Increase competitiveness and real time market analytics – Big data

  • Mandate to modernize IT and reduce cost – Data migration project

  • Outsource IT – Software as a Service (SaaS)
These are all commonly occurring business drivers; upon project approval, each would spawn a project with phases for requirements, hardware/software purchase, design, training, development and deployment. A byproduct of data integration as a service is the ability to remove the time and cost of hardware/software procurement and to reduce other phases where reusability can be applied, such as design, development and testing.

Some of the standard types of projects and services covered under data integration as a service are reflected in Figure 1.

Figure 1: Data Integration Services

Another way to understand the full range of data integration services is to think about your IT investment and data flows. Often you will find that a single service consumes technology and staff resources sub-optimally. Sharing these resources can be very advantageous. To find these opportunities, identify any place in your environment where processing is transforming, migrating, cleansing or applying business value to data. These are good targets for your data integration service.

Diverse Applications Case Study

In data integration as a service, different teams use the same service to complete disparate projects on the same platform. Let’s take a closer look at a real world example: A client who heard me speak on this topic called me in because they were initiating an update to their data integration environment. In calculating the ROI, I built on the project integration case to see where the dollars were actually spent for a given service.

The server itself priced out at $15,000 and software for data integration at $260,000. In addition, they were using a redundant server for passive backup so the total cost of a production-ready server was $290,000.

The way I calculate resource cost is to baseline the cost per CPU or cost per hour of processing. For the server under discussion, the client's processing would be batched from 10 p.m. through 3 a.m. on weekdays.
  • Four CPUs

  • Total server hours = 24 hours × 365 days = 8,760

  • Divide the total cost ($290,000) by hours to get the cost per hour = $33.10.
    This is the cost of running the server for each hour it is operational using all cores.

  • Determine the batch window: 5 days per week × 5 hours per day × 52 weeks = 1,300 hours/year of processing time

  • Total cost = cost per hour × processing hours per year = $43,030
They spent $290,000 on a server that would process 1,300 hours of batch per year, valued at $43,000. It would take more than six years to recoup the initial investment. They needed a better way to get value from their investment and to be intelligent about server utilization.
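The baseline above can be sketched in a few lines (figures from the article; the final value uses the unrounded hourly rate, so it lands a few dollars above the article's $43,030):

```python
# Cost-per-hour baseline for the $290,000 production-ready server.
total_investment = 290_000
cores = 4
server_hours_per_year = 24 * 365                 # 8,760

cost_per_hour = total_investment / server_hours_per_year   # ~$33.10 (all cores)
cost_per_core_hour = cost_per_hour / cores                 # ~$8.28 (one core)

batch_window_hours = 5 * 5 * 52                  # 5 days x 5 hrs x 52 wks = 1,300
batch_value = cost_per_hour * batch_window_hours # ~$43,037
```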

The opportunity lay in many corporate initiatives, including those in the data integration space. Some particularly tasty ones under client consideration for inclusion were:
  • An enhancement to the operational data store (ODS) to bring in website data (real time messaging)

  • Reference table management system (synchronization project)

  • An application migrating off a legacy system that is scheduled to be retired (migration)

  • An initiative to bring in social data for real time marketing analytics
All of these were great candidates for leveraging the DI service. The calculations on the size and scope of the integration opportunities were:
  1. The real-time messaging project was an extension of the existing DI service. It would require one core for 24/7 processing. Although this was called a 24/7 process, the lion's share of messages would arrive between 7 a.m. and 11 p.m., with relatively few messages occurring during the batch window. The calculation was as follows:

    1. One core, so divide $33.10 by 4 to get the cost per hour for a single CPU = $8.28
    2. Processing was 24 hours × 7 days × 52 weeks = 8,736 hours
    3. Total annual processing cost = $66,739

  2. The legacy retirement project would require data migration on weekends over three months. An initial load would be followed by tape archive migrations to bring over historical data. This was planned as weekend work, since the go-live needed to happen while the system could be offline, and the historical movement was not critical enough to run during business hours. The calculation for this project was:

    1. Four cores; processing cost = $33.10 per hour
    2. Processing was 72 hours per week for three months = 864 hours
    3. Total processing cost for project = $28,598
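The migration arithmetic above can be checked the same way (using the article's rounded $33.10 full-server rate, which reproduces its $28,598):

```python
# Legacy retirement: all four cores, weekend batch windows for three months.
rate_per_hour = 33.10       # rounded full-server rate from the article
hours_per_weekend = 72
weeks = 12                  # "three months"

migration_hours = hours_per_weekend * weeks   # 864 hours
migration_cost = rate_per_hour * migration_hours
print(f"${migration_cost:,.0f}")  # $28,598
```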
I don’t have the space in this article to list all the costs, but you can see the process. The final outcome for a few of these projects is in Table 3.

 Project                        Type of data integration   Cost per hour   Cost per month   Cost of each service per year
 Data warehouse                 Business intelligence      $33.11          $3,310.50        $39,726
 Reference table                Data synchronization       $8.28           $1,324.20        $15,890
 Migration                      Legacy retirement          $33.11          $9,534.25        $28,602
 Social – real-time messaging   Big data                   $8.28           $5,561.64        $66,739

Table 3: Final Outcome

Leveraging the data integration services for multiple projects results in the following:
  • For the same initial investment, four projects are supported

  • A total of $150,958 is spent on processing capabilities for the year for all four services

  • If all four projects purchased their own similar hardware/software, the initial investment would total $1,160,000

  • Proof that data integration as a service is scalable and extensible
The benefits don’t stop there. The company is now better utilizing its infrastructure resources – cutting rack space, data center power and cooling requirements, manpower and software license fees – and can take advantage of reusability, which will make it more agile in delivering solutions. The last thought I’d like to leave you with is chargeback potential.

Chargeback Potential

We are accustomed to the business initiating a project and purchasing hardware/software, thus we assume this will be the case going forward. Instead, what if we consider charging the business for its usage of IT investments? If this is a shared service, then can’t we use a mechanism to bill for utilization? How about by CPU cycles, data volumes or time for service utilization?
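As a minimal sketch of what a utilization-based chargeback might look like (project names and usage figures are purely hypothetical; the per-core rate comes from the earlier baseline):

```python
# Bill each project for the core-hours it consumes on the shared DI platform.
CORE_HOUR_RATE = 8.28  # dollars per core-hour

def monthly_bill(core_hours_by_project):
    """Return each project's charge for the month, in dollars."""
    return {project: round(hours * CORE_HOUR_RATE, 2)
            for project, hours in core_hours_by_project.items()}

usage = {"ODS messaging": 728, "legacy migration": 288}  # illustrative hours
print(monthly_bill(usage))
```

Whether to meter by CPU cycles, data volume or elapsed time is a design decision; the core-hour version above is only one option.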

The next article in the series will discuss chargeback and data integration tool requirements in depth, which will lead to design considerations. Please contact me with any questions or comments so they can be addressed in the next article.

  • Calla Knopman
    Calla has more than 15 years of consulting expertise in data integration, including application integration, data warehousing, data quality, metadata management, and business intelligence, with an emphasis on system design and architecture. Calla is currently the managing member and founder of Knopman IT Consulting, LLC; her past roles have included senior positions at KPMG, BearingPoint, IBM and VIP. She is frequently a guest speaker at IUG, EDW and TDWI chapters. Calla can be reached at cknopman@knopmanit.com.
