The Green Data Warehouse, Part 1: How to Save Money – and the Environment

Originally published January 15, 2009

The subject of green computing is getting significant airtime these days, with most of the attention on reducing computer power consumption. This is, of course, an important part of a green initiative, but is only one component of an overall strategy to reduce operating costs and make a positive impact on the environment. This and subsequent articles will address the challenges of "going green" while planning your data warehouse initiative.

The term "green computing" dates back to 1992 and the launch of the Environmental Protection Agency's (EPA's) Energy Star program. The original charter for this program was to reduce greenhouse gases, and it included guidelines for a wide range of consumer electronic devices. The standard has since been updated and now includes a framework for classifying computers as "energy efficient." Unfortunately, it excludes mid-range and large servers, although the computer specification does define them.[1]

Recent trends in green computing have included server virtualization, cloud computing and power monitoring software. Everyone has seen the IBM commercial that starts in black and white, only to turn to color when the "executives" realize they can save big money by "going green." In the ad, IBM claims to save 40% on energy costs due to the green initiative. This is aggressive although not unrealistic.

There can be no doubt that IT is driving energy usage and that there are significant benefits to reducing consumption in this area. According to the Department of Energy, data centers consume 61 billion kWh of electricity a year at a cost of roughly $4.5 billion. Bruce Taylor from the Uptime Institute estimates that data centers will constitute three percent of power consumed in the U.S. by 2010, up from 1.5 percent today.[2] Furthermore, a 2007 EPA report estimates that data centers in the United States have the potential to save up to $4 billion in annual electricity costs (a 25 percent reduction in energy consumption) through more energy-efficient equipment and operations and the broad implementation of best management practices.[3] Data warehouses will no doubt make up a significant share of this growth, given increasing demands to support complex queries and the need to store and retain growing volumes of data. However, there is also the potential to significantly reduce consumption through substantial improvements in hardware, software and application design.

Green Computing Meets Data Warehousing

Green computing efforts in data warehousing have centered on the relatively new appliance market, which includes hardware, operating systems and databases. One of the benefits touted by the hardware appliance vendors (e.g., DATAllegro, Kickfire) is the ability to process far more transactions with a smaller hardware footprint. Meanwhile, specialty database management companies (e.g., ParAccel, Vertica) are advertising extreme data compression, which also results in a smaller hardware footprint. A smaller hardware footprint draws less power, occupies less space and requires less material to manufacture. By reducing storage requirements, it also reduces the need for peripherals such as tape and other off-line storage devices. Another, less glamorous approach involves reducing redundant data storage at the application level, either by eliminating redundant or duplicate tables or by purging duplicate backups. A colleague who works for a large business information aggregator was able to eliminate over 75 percent of the backup files that were generated, saving $750,000 per year from that change alone. The company has outsourced its data center, making it difficult to determine how much of that was due to energy savings, but it's a sure bet the outsourcer has that factored into its service fees.
 
Irrespective of the approach, it's important to understand the leverage achieved by reducing server dimensions. It is particularly important for an application like a data warehouse that has the potential to consume enormous resources. With the growth in data warehouse storage and processing demands, a smaller footprint should translate into big savings.

It should be fairly obvious that reducing the size of the server is a good thing. But how do we determine how well we're doing, and not just in terms of the individual server? The end game should be to reduce energy-related operating costs for the entire data center, approaching and even exceeding the 25 percent EPA estimate.

Measurement Approach

To evaluate a reduction in energy consumption, you must first establish a baseline. The baseline should be taken at two levels:

  1. Individual server (or cluster of servers)

  2. The entire data center

For an individual server, the measurement framework consists of the following components:

  • Power consumption

  • Heat generation

  • Footprint

  • Materials

Power consumption measures how much power the entire system consumes over a given period. A minimum benchmark period (say, 24 hours) is required to get an average consumption measurement. Savings are calculated as the difference in kilowatt-hours (kWh) versus the baseline, multiplied by your cost per kWh ($/kWh).
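As a rough illustration of the power calculation, the Python sketch below averages wattage readings taken over a benchmark period, converts them to kWh and prices the difference. The function names, the sample readings and the $0.10/kWh rate are hypothetical and chosen only for the example; substitute your own meter data and utility rate.

```python
from statistics import mean


def energy_kwh(avg_watts: float, hours: float) -> float:
    """Convert an average power draw in watts to energy in kWh over a period."""
    return avg_watts / 1000.0 * hours


def power_savings(baseline_watts, candidate_watts, hours, cost_per_kwh):
    """Cost difference between baseline and candidate for one benchmark period.

    Both arguments are sequences of wattage readings sampled at equal
    intervals, so a simple mean is a fair estimate of average draw.
    """
    baseline_kwh = energy_kwh(mean(baseline_watts), hours)
    candidate_kwh = energy_kwh(mean(candidate_watts), hours)
    return (baseline_kwh - candidate_kwh) * cost_per_kwh


# Hypothetical readings (watts) sampled during a 24-hour benchmark.
baseline_readings = [800, 820, 780, 800]
candidate_readings = [500, 520, 480, 500]

daily_savings = power_savings(
    baseline_readings, candidate_readings, hours=24, cost_per_kwh=0.10
)
print(f"Estimated savings per day: ${daily_savings:.2f}")
```

A longer benchmark period with more frequent samples should give a more representative average, particularly for data warehouse workloads that peak during nightly loads.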

Heat generation measures the average heat generated during the benchmark period. Savings result from no longer having to cool the additional heat that the baseline system generates.

Footprint measures the physical size of the entire system (height x width x length). Savings are calculated by multiplying the reduction in occupied floor area versus the baseline by the average cost of rack space per square foot of data center space.

Materials include the weight of the key materials in the system (the top 5 to 10 materials used in producing a typical platform). Savings can be calculated from the incremental cost to both produce and dispose of those materials, although the most significant savings is to the environment as a whole.

For a data center, the measurement framework consists of power consumption, emissions and floor space. Power consumption is measured by total kWh used on a monthly basis. Emissions are measured by the EPA standard, which captures the quantity of pollutant released into the atmosphere by the data center. The EPA has models for measuring emissions, but, most likely, this is a known quantity for most data centers. Floor space includes the total available square footage and the amount used.

The measurement framework is depicted in Figure 1 with individual components, including servers, off-line storage, and other devices, driving energy costs at the data center level.

Figure 1: Measurement Framework 

A baseline can be drawn for an existing system by measuring the current consumption levels of individual servers as well as the data center as a whole. For a greenfield (or new) system, the baseline must be calculated. One approach is to determine an "un-optimized" representation of a platform used to store and process information. This could be pulled from other systems within the enterprise, generated by the research department of your company, or calculated based on manufacturers' and software companies' specifications. The important concept is that the baseline should closely approximate your planned data warehouse computing environment. The baseline for the data center can be calculated based on industry data or by polling other organizations with similar-sized facilities. Once the baseline has been established, a measurement strategy must be determined. All evaluated platforms should be measured as a delta or offset from the baseline, regardless of the baseline source. Each potential computing platform can then be evaluated as a percentage of the baseline.
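The comparison described above can be sketched in a few lines of Python. This is only an illustration: the metric names, the platform names and all of the figures are hypothetical, and the function simply reports each candidate's delta from the baseline and its value as a percentage of the baseline.

```python
def compare_to_baseline(baseline: dict, platform: dict) -> dict:
    """Return {metric: (delta, percent_of_baseline)} for every baseline metric.

    A negative delta means the platform uses less than the baseline.
    """
    result = {}
    for metric, base_value in baseline.items():
        value = platform[metric]
        result[metric] = (value - base_value, 100.0 * value / base_value)
    return result


# Hypothetical baseline and candidate platforms (illustrative numbers only).
baseline = {"power_kwh_per_day": 20.0, "footprint_sq_ft": 10.0}
candidates = {
    "platform_a": {"power_kwh_per_day": 15.0, "footprint_sq_ft": 6.0},
    "platform_b": {"power_kwh_per_day": 12.0, "footprint_sq_ft": 8.0},
}

for name, metrics in candidates.items():
    for metric, (delta, pct) in compare_to_baseline(baseline, metrics).items():
        print(f"{name} {metric}: delta {delta:+.1f}, {pct:.0f}% of baseline")
```

Expressing every platform against the same baseline keeps the comparison fair even when the baseline itself came from vendor specifications rather than direct measurement.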

Business Case

The ultimate measurement, however, and the basis for your final evaluation, is the delta for the data center when compared to the baseline. The delta is then multiplied by the cost per kWh to determine your estimated cost savings related to power consumption. Add the reduction in floor space costs and you get total savings associated with your green initiative.

Reducing emissions can carry a financial benefit by avoiding fines, but the primary benefit is to the environment. The same can be said about the reduction in raw materials needed to build the servers and associated components, although this can be loosely translated into a reduction in the cost associated with those items.

The net result is represented by the following equations:

  1. Cost savings = Δ Power x $/kWh + Δ floor space x $/sq-ft

  2. Emissions impact = Δ Emissions

  3. Environmental impact = Δ Materials + Δ Emissions

Using the following definitions:

  • Δ Power = reduction in data center power consumption (kWh)

  • $/kWh = your cost for a kilowatt-hour of electricity

  • Δ floor space = reduction in data center floor space usage, either actual reduction or a reduction in the planned usage

  • $/sq-ft = the cost per square foot for your data center

  • Δ Emissions = reduction in the volume of pollutant generated by the data center

  • Δ Materials = reduction in the amount of materials required to support your data warehouse applications
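
To make the cost savings equation concrete, here is a minimal Python sketch. The class name, the field names and the sample figures are assumptions made for illustration, and the example assumes that power usage and the floor space rate cover the same period (monthly here).

```python
from dataclasses import dataclass


@dataclass
class DataCenterUsage:
    """Data center measurements for one period (assumed monthly)."""
    power_kwh: float          # total kWh consumed in the period
    floor_space_sq_ft: float  # square feet of floor space in use


def cost_savings(baseline: DataCenterUsage, planned: DataCenterUsage,
                 cost_per_kwh: float, cost_per_sq_ft: float) -> float:
    """Cost savings = delta power x $/kWh + delta floor space x $/sq-ft."""
    delta_power = baseline.power_kwh - planned.power_kwh
    delta_floor_space = baseline.floor_space_sq_ft - planned.floor_space_sq_ft
    return delta_power * cost_per_kwh + delta_floor_space * cost_per_sq_ft


# Hypothetical figures: a 25 percent power reduction and 1,000 sq ft freed.
baseline = DataCenterUsage(power_kwh=100_000, floor_space_sq_ft=5_000)
planned = DataCenterUsage(power_kwh=75_000, floor_space_sq_ft=4_000)

monthly_savings = cost_savings(
    baseline, planned, cost_per_kwh=0.10, cost_per_sq_ft=20.0
)
print(f"Estimated monthly savings: ${monthly_savings:,.2f}")
```

Emissions and materials are left out of the sketch because the equations above treat them as environmental impacts rather than dollar amounts.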

Conclusion

Hardware and software providers have recently advertised their energy efficiency, but have not provided specific, quantifiable evidence to support these claims. This article presents a framework that allows you to quantify the efficiency of various platforms. More importantly, the linkage between reducing server size and associated cost savings at the data center level has been established.

My next article will focus on the top 10 things you can do to make your data warehouse green.

Referenced works:

  1. www.energystar.org.tw/pdf/Computer_Spec_Final.pdf
  2. Utility IT: The Case for Going Green, Lynn Singleton, 9/18/08
  3. EPA Reports Significant Energy Efficiency Opportunities for U.S. Servers and Data Centers, 8/3/07

Rick Abbott

    Rick Abbott is President of 360DegreeView, LLC. He has over 20 years of information management and technology experience, including private and public sector work. He has significant experience in both the telecommunications and financial services industries, and has over 8 years of "Big 5" experience, including an associate partnership position with Deloitte Consulting. He has direct experience in all aspects of business intelligence and data warehouse projects, including business case development, strategic planning and business alignment, business requirements, and technical architecture and design. He also has significant experience in assisting clients in negotiating large technology product, service, and outsourcing contracts. Rick can be contacted at rick@360degreeview.com.

Comments

Posted January 30, 2009 by becky sheetz-runkle

Great read. Very thorough with a solid business case. Good metrics as well. Looking forward to your piece on the top 10 things to make your data center green.