
The Physics of Cooling

Chapter number: TBA
Chapter Name: Rack Level Cooling
Author Name: Phil Hughes, CEO
Company Name: Clustered Systems Company, Inc.
City, State: Santa Clara, California

The Physics of Server Cooling

The phenomenon that we humans perceive as heat or cold is produced by the

motion of molecules. Only at absolute zero (-273℃) do molecules have no

motion. As they become more energetic their temperature is perceived to rise

and their state can change from solid to liquid to gas, and even to plasma

when the molecules themselves shake apart. As energy states increase, the rate

of collisions between molecules increases and occasionally a photon is knocked

off, causing the phenomenon of radiation. At lower energy levels radiation is

in the infra-red part of the spectrum, increasing into the visible and beyond

at higher energy levels.

The first law of thermodynamics holds that energy cannot be created nor

destroyed but may change form. It is one of these changes that creates our

server heating problems. Electrical energy arrives in a chip as a flow of

electrons which bang into molecules and start them moving faster, producing

heat. Those molecules must be slowed down enough (cooled) to avoid damaging

the chip.

The second law of thermodynamics holds that when two systems are allowed to

interact they will achieve an energy equilibrium, that is, energy will flow

from the more energetic to the less energetic system. The question therefore

is what is the best transfer mechanism to remove excess energy. We can choose

from radiation, convection (forced or natural), conduction and phase change.

Radiation

At the time of writing, most electronics are solid state, so we can assume

that our high energy system is a solid. The lower energy system surrounding it

could be a vacuum, gas, liquid or another solid.

With a vacuum, the only way for energy to escape the first system is through

radiation. According to the Stefan-Boltzmann law, the energy radiated by a
black body is defined by: q = σ·T^4·A

Where: q = watts, σ = 5.67*10^-8 W/m^2K^4 (the Stefan-Boltzmann constant), T = absolute temperature in kelvin, and A = body area in square meters.

Grinding through that lot, assuming a 33mm x 33mm chip package at a temperature
of 70℃, we conclude we can dissipate only 0.75W through radiation with a
perfect black body and surroundings at absolute zero; definitely insufficient.
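As a sanity check, the Stefan-Boltzmann arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch assuming, as the text does, an emissivity of 1, a single radiating face and surroundings at absolute zero; the exact wattage depends on those assumptions.

SIGMA = 5.67e-8            # Stefan-Boltzmann constant, W/m^2K^4
area = 0.033 * 0.033       # 33mm x 33mm package face, m^2
T = 70 + 273.15            # lid temperature, K
q = SIGMA * T**4 * area    # radiated power, W
print(f"Radiated power: {q:.2f} W")   # well under one watt - far too little for a CPU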

Conduction

A gas is one step up from a vacuum. There are about 2.7 x 10^22 molecules in a
liter of air. Those molecules, if packed together at absolute zero, would
occupy only 4.7 x 10^-8 liter. Not surprisingly, thermal conductivity, k, is only

0.028 W/m-K at room temperature. For every watt removed from the 33mm x 33mm

chip, there would be a temperature difference of 800℃ per inch (25mm) of air

between the hot chip and the cold body.

Water, one of the more popular coolants, has 3.3 x 10^25 molecules per liter,

over a thousand times denser than air. Naturally this implies a higher

conductivity, 0.58 W/m-K, 20 times higher than air, dropping the temperature

difference to 40℃ per inch.

Aluminum has 6.02 x 10^25 molecules per liter. Its conductivity is 205 W/m-K,
350 times that of water. The one inch temperature gradient is just 0.11℃,

over 7,000 times better than air.

Clearly, aluminum or another high conductivity metal such as copper wins hands

down for conductivity. The molecules are trapped in a crystalline matrix where

they vibrate and pass energy to all their neighbors. Liquid, on the other

hand, is almost as dense but the molecules move freely (which is good for
convection); however, they don’t readily pass on their energy to other molecules. In

a gas the molecules are so few they rarely collide, reducing conductivity even

more.
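The per-inch gradients quoted above follow from the one-dimensional conduction relation ΔT = q·L/(k·A). A minimal sketch in Python, using the conductivities and the 33mm x 33mm chip area from the text:

conductivities = {"air": 0.028, "water": 0.58, "aluminum": 205.0}   # W/m-K
q = 1.0               # heat flow, W
L = 0.025             # path length, m (one inch)
A = 0.033 * 0.033     # chip area, m^2
for name, k in conductivities.items():
    dT = q * L / (k * A)     # temperature difference across the path
    print(f"{name}: {dT:.2f} C per inch per watt")
# roughly 820 C for air, 40 C for water and 0.11 C for aluminum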

Natural Convection

This type of convection occurs in both liquids and gases. When the fluid is
heated, its molecules closest to the heat source become more energetic and tend
to move above their less energetic neighbors. What we observe is that a
portion of the fluid expands and rises to the top.

To compute the heat removed by natural convection in air, no less than 14

parameters must be taken into account. Even then some are approximations or

simplifications by worthies from centuries past such as Rayleigh, Reynolds,

Prandtl, Nusselt and Grashof.

Fortunately there is a simplification, thus: h (heat transfer coefficient) =
C*((T1-T2)/L)^n = 3.77 W/m^2K

C and n are dimensionless coefficients, which can be assumed to be 0.59 and 0.25

respectively. T1 and T2 are the temperatures of the hot body and cold plate

respectively. L is the distance between the hot body and a cold plate, 25mm in this

example.

Thus for our 33mm x 33mm CPU, the gradient would be 5.8℃ per watt.

Conclusion: Natural convection may work well for lower power chips (<5W).

In systems using natural liquid convection, a fluid with a very high buoyancy

to viscosity ratio is required. This can be expressed as the Grashof number

which should be as high as possible.

Gr = buoyancy force / viscous force = g·β·ΔT·L^3/ν^2

Where: g = gravitational acceleration, β = volumetric expansion coefficient, L
= characteristic length (0.1m in this example), ΔT = temperature difference
between the vertical plane and the fluid (30℃), and ν = kinematic viscosity.

Typical Grashof numbers with the above length and temperature parameters are:
Fluorinert 3283 = 1.88 * 10^10, Fluorinert 70 = 5.19 * 10^5, mineral oil =
1.01 * 10^5.

Conclusion: Fluorinert 3283 or similar with high Grashof numbers should work

quite well in systems where a system board is immersed in fluid and has a

cooling surface a few millimeters away. Iceotope is the only company with such

a solution currently.
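The Grashof calculation can be checked directly. The sketch below (Python) uses the water properties quoted later in the oil section; the Fluorinert and mineral oil figures require manufacturer property data not reproduced here.

g = 9.81                 # gravitational acceleration, m/s^2
beta = 0.000291          # volumetric expansion coefficient of water, 1/K
delta_T = 30.0           # temperature difference, K
L = 0.1                  # characteristic length, m
nu = 0.000798 / 1000.0   # kinematic viscosity = dynamic viscosity / density, m^2/s
Gr = g * beta * delta_T * L**3 / nu**2
print(f"Grashof number for water: {Gr:.3g}")   # about 1.3 * 10^8, the value quoted in the oil section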

Forced Convection

Both gases and liquids can be used in forced convection systems. We will only

discuss air and water in this context. Air is generally ducted to where its

cooling effect is required but water is tightly constrained in piping and heat

exchangers.

Air

While the number of parameters required to derive the heat transfer
coefficient grows to about 18, there are some simplifications which can be
used for sanity checks. One of the simplest for air at standard temperature
and pressure that can be used for heat sinks is: Theta (Θ) = 916*(L/V)^0.5/A in
℃/W

Where L=heat sink length in inches, V=air velocity in ft/min and A= total surface area

in sq. in.
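The simplification is easy to apply; the sketch below (Python) uses an illustrative heat sink length, air velocity and surface area, which are assumptions rather than values from the text.

L_in = 4.0      # heat sink length along the air flow, inches (assumed)
V = 500.0       # air velocity, ft/min (assumed)
A = 200.0       # total surface area, sq. in. (assumed)
theta = 916.0 * (L_in / V) ** 0.5 / A    # thermal resistance, C/W
print(f"Estimated theta: {theta:.2f} C/W")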

However, this and even more sophisticated models are no substitute for in situ

measurement. Fig. 1 shows a representative example of the (very significant)

difference between the datasheet values and those derived using the heatsink

calculator provided on some manufacturers’ websites. The derived values use

exactly the same values as the formula above. Note the ~2x difference between

curves. dP shows the pressure drop required to achieve the stated air flow.

Figure 1 Heat Sink Characterization (2U heatsink: measured and calculated Theta in ℃/W, and pressure drop dP in inches H2O, plotted against air flow in CFM)

Adding to the complexity is the variation between servers. The same heatsink

will perform differently as the externalities vary. These include ducting,

positioning of each CPU (if more than one), DRAM and VRM layouts.

The other significant factor is the fans’ specifications. They must be capable

of providing sufficient volume and pressure to drive the air through the

heatsink(s) and not consume too much power doing so. To establish the

operating requirements, we need to look at

the maximum allowable CPU lid temperature

and CPU power. Typically, 70℃ has been

the allowable maximum, but excursions up

to 95℃ may be allowed in the future.

Maximum power for high performance CPUs is

commonly 130W (even though most servers

may be equipped with only 95W CPUs).

Assuming that the maximum operating inlet

temperature is 45℃, we’d have a margin of

25℃. Thus the allowable thermal resistance

would be 25/135 = 0.185℃/W. As can be seen from the graph, that is the
maximum capability of the heatsink. At that point, the fans must deliver 50CFM
with a static pressure of 0.35” of water.

Figure 2 Fan Curves 60mm x 60mm

Fig. 2 shows a typical set of

operating curves for two fans. When

operating at maximum power they

should be at the inflection point,

delivering 30-40CFM.
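The thermal budget arithmetic above reduces to a one-line calculation; a minimal sketch (Python) using the 45℃ inlet, 70℃ lid and the 135W load figure from the text:

t_lid_max = 70.0    # maximum allowable CPU lid temperature, C
t_inlet = 45.0      # maximum operating inlet air temperature, C
power = 135.0       # heat load used in the text's 25/135 division, W
theta_allowed = (t_lid_max - t_inlet) / power   # allowable heatsink thermal resistance
print(f"Allowable thermal resistance: {theta_allowed:.3f} C/W")   # ~0.185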

Typically a 2U server heatsink is

about 3.5” wide and 2.5” tall. Banks

of DRAMs will be deployed on one or

both sides of the CPU, see Fig. 3.

In the case of the half width board

on the left, there is room for only two fans, mandating the choice of the more

powerful fan 1. These will draw 60 watts. Further, at least 50% of the air

will bypass the heatsinks, producing borderline performance in normal

operation. A fan failure will cause the CPU to throttle in order to stay

within the thermal envelope, thus lose performance. The system on the right is

a little more forgiving but a fan failure still has the potential to affect

performance. Potentially, its fans could draw up to 150W, an additional 30%

load.

As the power consumed by a fan is proportional to the volume of air flow (CFM)

cubed, from an energy efficiency point of view it is better to have as many

fans as possible. For example if one fan could produce adequate air flow for

cooling at 32W, two of the same fans sharing the load would only consume 8W.

Note the energy of the fans adds slightly to the air temperature, but is

usually low enough (<1℃) so as not to be a significant factor.
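The fan cube law example works out as follows; a minimal sketch (Python) assuming identical fans sharing the flow equally:

single_fan_power = 32.0     # W for one fan delivering the full required flow
n_fans = 2
power_per_fan = single_fan_power * (1.0 / n_fans) ** 3   # each fan moves 1/n of the flow
print(f"{n_fans} fans sharing the load: {n_fans * power_per_fan:.0f} W total")   # 8 W vs 32 W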

Figure 3 Server Motherboards

After the heat is exhausted from the server it is either sucked into a cooling
unit, which is itself cooled by water or pumped refrigerant, then re-circulated
to the server inlets, or exhausted to the atmosphere. In the latter case, fresh
outside air is directed to the server inlets. For a rack with 80 server
motherboards (the left motherboard in Fig. 3) drawing 450W each, for a
component load of 36kW and a typical fan load of 6kW (75W/server),
approximately 445,000 cubic feet of air (12,600m3) per hour needs to be
recirculated by the fans to maintain a 10℃ air temperature rise at the server
exits.
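The recirculated air volume above can be estimated from the heat load and the allowed temperature rise; a minimal sketch (Python) assuming standard air properties (density about 1.2 kg/m^3, specific heat about 1005 J/kg-K):

heat_load = 36_000 + 6_000   # component load plus fan load, W
rho_air = 1.2                # air density, kg/m^3 (assumed)
cp_air = 1005.0              # air specific heat, J/kg-K (assumed)
delta_T = 10.0               # allowed air temperature rise, C
flow_m3_per_h = heat_load / (rho_air * cp_air * delta_T) * 3600
print(f"Required airflow: {flow_m3_per_h:,.0f} m^3/h")   # roughly 12,500 m^3/h (about 440,000 cu ft/h)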

It should be noted that the external environment can also affect fan

performance. Passive rear door heat exchangers and cabling are the two biggest

problems. They can block server exhaust and reduce cooling efficiency.

Water

Water is much easier to handle than air. It is piped exactly to where you want

it to go. Most systems consist of three components, in-server, in-rack and

exhaust. In all known systems the in-server component connects to the in-rack

distribution system via two quick connects.

They also come in two flavors, IBM and everybody else. The IBM version is very

solidly engineered with all cooling components connected with brazed copper

tubing. In Fig 4, it can be seen that each hot component has an individual

cooling block. Very little if any air cooling is required.

The representative of the “others” cools only the CPUs and is interconnected

with flexible tubing and plastic connectors. Air cooling is still required for

all other components including DIMMs.

Figure 4 Water Cooled Servers from IBM and Asetek

Fig. 5 shows the rack level plumbing of a typical water cooled system.

Most of these systems are advertised as

having the ability to be cooled with

hot water and they do remove heat quite

efficiently. The block in contact with

the CPU or other hot body is usually

copper with a conductivity of around

400 W/m-K so the temperature drop

across it is negligible. If the water

is pumped slowly enough, reducing

pumping power, flow is laminar. Because

water is not a very good conductor of heat, a temperature drop of around 5℃

can be expected across the water copper interface. This is usually negligible,

but if necessary, can be reduced by forcing turbulent flow by increasing flow

rate. This could be an expensive waste of energy.

Both server types have two CPUs plumbed in series. The maximum power

consumption of a CPU is around 130W. If we assume the maximum lid temperature

is 70℃ and the inlet water is 40℃, each CPU could heat the water 10℃ while

accommodating the thermal resistance of the water film and the cold block

itself. For a rack with 40 servers, 160 CPUs (21kW), about 1.8 cubic meters of

water per hour would be required. Pump energy would be around 80W. Of course,

another 15kW (450W total per server) remains to be removed by fans. Clearly,

the racks cannot be deployed at maximum density, resulting in a power density

of around 600W/sq ft, without special provision such as rear door heat

exchangers.
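The water flow figure above follows from a simple heat balance; a minimal sketch (Python) for 160 CPUs at 130W with a 10℃ water temperature rise across the rack:

heat_load = 160 * 130.0      # W, about 21 kW
cp_water = 4186.0            # J/kg-K
rho_water = 1000.0           # kg/m^3
delta_T = 10.0               # water temperature rise, C
flow_m3_per_h = heat_load / (cp_water * delta_T) * 3600 / rho_water
print(f"Required water flow: {flow_m3_per_h:.1f} m^3/h")   # about 1.8 m^3/h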

Figure 5 Water Pipes and Manifolds

While the physics of the system are workable, the statistics may not be. Let’s
be very optimistic and assume that the MTBF (mean time between failures) of a
liquid connector is 10^7 hours and the service life is 3 years, i.e. 26,280
hours. The probability of survival is e^(-26280/10^7) = 0.9974, i.e. a 0.26%
probability that it would fail. If there were 1000 servers, hence 2,000
connectors, about 5 would fail. This calculation would be reasonable for the
IBM system where all the connectors are brazed to the piping. Where flexible
tubing and plastic connectors are in the mix, together with the vibration of
fans, the probability of failure goes up.
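The connector reliability estimate is a standard exponential survival calculation; a minimal sketch (Python) using the MTBF and service life assumed in the text:

import math

mtbf_hours = 1e7            # assumed MTBF of a liquid connector, hours
service_life = 3 * 8760     # 3 years = 26,280 hours
p_fail = 1.0 - math.exp(-service_life / mtbf_hours)
connectors = 2000           # 1,000 servers, two connectors each
print(f"Failure probability per connector: {p_fail:.2%}")          # about 0.26%
print(f"Expected connector failures: {p_fail * connectors:.1f}")   # about 5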

Finally, water chemistry can be difficult. Described as the “universal
solvent”, it can eat through metals and plastics if it has not been pre-treated

properly. Another concern could be algae growth. A closed secondary loop to

the components is essential to reliably manage such issues. A leak in such a

loop might bring the entire loop and its associated servers down.

Oil

Light mineral oil has been applied in a couple of instances for cooling. In one

case, multiple servers are immersed in an oil bath and in the second, servers

are put into individual sealed cases. In both cases, the oil is forced through

the individual server containers using circulation pumps. Heat is removed from

the oil by passing it through a heat exchanger on a water loop.

Typical parameters for light oil are (water in parentheses): density 800 kg/m^3 (1000),
viscosity 0.0345 N-s/m^2 (0.000798), specific heat 1100 J/kg-K (4186),
thermal conductivity 0.15 W/m-K (0.615), thermal expansion coefficient 0.00064 K^-1 (0.000291),
Grashof number 1.01 * 10^5 (1.34 * 10^8).

This scheme is more energy efficient than air but suffers from two
disadvantages. Serviceability can be a problem when the system boards are
covered in an oil film, and more energy is required to drive the circulation
pumps than in a water based system due to the lower specific heat and higher
viscosity of the oil. Ride-through might also be an issue: as the oil has a
fairly low Grashof number and specific heat, there would be little natural
circulation if a pump failed. This may cause overheating.

Phase Change

Phase change based systems use the latent heat of evaporation to absorb heat

and remove it from the hot objects.

In one case, servers are placed in a bath open to the atmosphere and filled

with cooling fluid with a relatively low boiling point, in the other, the

coolant is delivered to the server though a cold plate in a sealed system.

Bath

A coil with coolant, usually water or water and glycol, circulating through it

is mounted in the lid of the bath. In operation, the liquid boils, the gas

rises and is re-condensed by the cooling coil and the liquid drops back into

the bath.

Originally designed for single phase sealed systems, fluids such as 3M’s Novec
7000 (boiling point 34℃) and Novec 649 (49℃) at normal atmospheric pressure

are being proposed for non-sealed systems. While Novec 7000 has the best

physical characteristics, such as latent heat of evaporation and boiling point

it has a significant global warming potential (GWP) which may be a problem in

some jurisdictions. On the other hand Novec 649 has an uncomfortably high

boiling point which may compromise reliability of some components but has a

very low GWP.

These and other similar fluids have been used for cleaning for years with no

apparent harm to operators as the liquids are always below boiling point.

Presumably most would have evacuation hoods over the cleaning baths so
inhalation is minimized. Precautions will be needed where operatives run the

risk of continuous exposure to the additional vapors released by the boiling

fluids until long term effects are understood. Additional precautions might be

necessary to guard against failure of the cooling loop which could cause the

room atmosphere to become saturated with coolant vapor.

Sealed System

In the open system, the fluid is directly in contact with the hot objects and

is insensitive to system topology and component height. In a sealed system

with flat, minimally flexible cold plates heat must be brought up to a single

plane. While convection is adequate for low wattage components, a conductive

path is required for high power devices.

In the implementation available from Clustered Systems, heat is conducted to a

single plane by a series of heat risers placed atop each component that

generates a significant amount of heat. In all cases, these include CPUs,

VRMs, DIMMs and system glue, plus, if merited, networking and other components

generating over ~2W. The heat risers can be seen at the top of Fig 6. For

clarity, only the bottom server is shown covered by a cold plate. The cold

plates are a chassis component and are all permanently soldered into

refrigerant distribution manifolds. This completely eliminates the probability

of leakage from connectors.

Figure 6 Clustered Systems’ Phase Change Cooled Blade with Two Half Width Servers

Liquid (R134A) is pumped through cold plates placed upon heat risers attached

to CPUs, DIMMs, VRMs, etc. The heat causes the liquid to boil, absorbing 93

times as much heat as the same weight of water.

Figure 7 Sixteen Blade Chassis

The liquid and gas mix is then passed to a heat

exchanger where it is re-converted to 100% liquid.

Unlike air cooled systems, the thermal resistance

between heat source and liquid is so small that

high coolant temperatures can be tolerated. No

chiller is required in most cases. The only energy

required is for circulation pumps and external

fans in a dry or adiabatic cooler. The cooling PUE

can be as low as 1.03.

Fig. 7 shows the front of the chassis. The cold plates can be seen at the

right of each non-populated slot. They slip into the blade and contact the

heat risers when the blade is inserted.

Fig. 8 shows the four rear switch blades and a partial view of the

distribution manifolds.

Figure 8 Chassis Rear

The maximum power consumption of a CPU is around 130W and we assume that the
maximum lid temperature is 70℃. As the system is isothermal, the cold plate
is the same temperature virtually everywhere. Heat input just causes liquid to
change to gas with no temperature rise. Assuming that the inlet refrigerant
was 40℃, and having established by measurement that the thermal resistance
from CPU lid to refrigerant is <0.2℃/watt, the CPU lid would reach 66℃
(40 + 130*0.2). Because the gasification causes bubble formation, hence
turbulence, laminar flow film formation is not a problem.
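The lid temperature estimate above is a simple product of power and measured thermal resistance; a minimal sketch (Python):

refrigerant_in = 40.0   # inlet refrigerant temperature, C
cpu_power = 130.0       # CPU power, W
theta = 0.2             # measured lid-to-refrigerant thermal resistance upper bound, C/W
lid_temp = refrigerant_in + cpu_power * theta
print(f"CPU lid temperature: {lid_temp:.0f} C")   # 66 C, within the 70 C limit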

For a whole rack with 160 servers (72kW @ 450W per server), about 0.66 cubic
meters of refrigerant per hour would be required. In practice, with the
viscosity of the refrigerant 25%, and the fluid flow 10%, of that of a water
based system, pump energy is very low, about 30W.

The benefits of such an efficient phase change cooling system are striking:

- Very high power densities can be achieved: 100kW racks enable a data center density of 4,000 W/ft2
- Rack floor space for a 10MW data center can be reduced from 50,000ft2 to about 2,500ft2
- Data center construction and facility costs drop ~50%.

Conclusions

Whatever liquid cooling technology is chosen, it will always be more efficient
than air for two reasons. The first and most important is that the amount of
energy required to move air will always be several times greater than that
required to move a liquid for the same amount of cooling.

Coolant | Density lb/cuft | Specific Heat BTU/lb | States | Delta T °F | lb/min/ton | CFM/ton | Static press PSI | Req. Watts | Fan/Pump Efficiency | Total W | % of load
Air (std day) | 0.075 | 0.205 | gas-gas | 18 | 54.11 | 722 | 0.036 | 84.6 | 30% | 282 | 8.0%
Water @ 50 F | 62.45 | 0.998 | liquid-liquid | 9 | 22.27 | 0.36 | 35 | 40.6 | 30% | 135 | 3.8%
R134a @ 95 F | 72.94 | 72 | liquid-gas 30% | 0 | 9.26 | 0.13 | 20 | 8.3 | 30% | 28 | 0.8%

The table illustrates some typical numbers. While the move to water reduces the energy by 50%, going to refrigerant cuts it by 90%.
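The "% of load" column can be checked against the transport power figures, taking one ton of refrigeration as 3,517W; a minimal sketch (Python):

TON_W = 3517.0   # one ton of refrigeration, W
for fluid, transport_watts in {"air": 282.0, "water": 135.0, "R134a": 28.0}.items():
    print(f"{fluid}: {transport_watts / TON_W:.1%} of the cooling load")
# approximately 8.0%, 3.8% and 0.8%, matching the table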

Rack Level Cooling

Discussion

Today (2013) a 1U server can contain a half width motherboard and four GPU cards. Typically a motherboard will require about 450W and a GPU 300W, a total of 1,650 watts per 1U. A fully populated rack with 40 such 1U boxes will require almost 70kW, way beyond the capability of air. Even with two half width servers per U the power requirement is 36kW (40 x 2 x 450W). Air for rack level cooling is practically obsolescent except possibly for some low power work like web serving front ends. This paper briefly reviews some currently available systems. Table 1 presents a summary of some of the various options.

Option | Passive Rear Door, Water | Passive Rear Door, Refrigerant | In Row Cooling, Water | Active Rear Door, Water | Cold Plate, Refrigerant
Rack Cooling Capability (kW) | 15 | 20-30 | 20 | 20 | 200
Cooling Efficiency | Low | Medium-High | Medium | Medium | High
Eliminate Hot Spots | Server fan dependent | Server fan dependent | Server fan dependent | Yes | Yes
Additional Cooling Required | Yes | Yes | Yes | No | No
CRAC Required | Yes | Yes | No | No | No
Aisle Containment Required | No | No | Yes | No | No
Additional Floor Space | Minimum | Minimum | 50% | Minimum | None
Isothermal | No | Yes | No | No | Yes
Redundant Pumps & Control | No | Yes | No | Yes | Yes
Chiller Required | Yes | Yes | Yes | Yes | No

Table 1 Cooling Options

Water or refrigerant?

Water Cooled Doors

Heat is extracted from the air passing through the system by warming water passing through the door’s coils. For each °F that one pound of water warms up, one BTU is absorbed (specific heat). However, due to this warming effect, the temperature differential between air and water drops, diminishing the effective cooling. To overcome this problem, very cold water has to be used whose temperature, in some cases, is below the dew point, causing condensation. While this can be accommodated, it is an extra installation expense. Chilled water can be connected to cooling units either under floor or overhead with hard or flexible tubing.

Heat can be removed from the water directly by a primary system, which could be either a stand-alone remote mechanical refrigeration unit or one that is part of the building system. A better solution, which avoids the entry of contaminants from the primary system into the unit cooling loop, is to insert a CDU (coolant distribution unit, a heat exchanger and pump assembly) between the primary and secondary (unit) loops.

Refrigerant

Heat is extracted from air by refrigerant passing through the door’s coils. Instead of making use of specific heat, the latent heat of evaporation is used. A refrigerant can capture 100 times more heat per unit of weight than water. As the evaporative process is isothermal, warmer coolant can be used, avoiding the condensation problems all too common with water based systems. Heat is removed from the refrigerant using a CDU (coolant distribution unit, a heat exchanger and pump assembly) connected to a primary cooling water system. Due to the higher efficiency of refrigerant, the use of a chiller to cool the primary water can be eliminated, or at least significantly curtailed, and replaced by a water tower, adiabatic or dry cooler.

Refrigerant can be brought to cooling units with hard or flexible overhead tubing. Refrigerant piping in a server room does require that most joints be soldered or brazed to ensure a leak-tight system. However, due to the extra work needed to protect against condensation drips in water based systems, costs are a wash. In the event of a leak, the refrigerant evaporates immediately, while water can drip onto and damage sensitive electronics.

Passive Rear Door

A water cooled passive rear door was originally developed by IBM for its own servers. These doors worked quite well at the 5kW rack power levels then current. Later the design was licensed to other companies. By bringing the cooling source closer to the heating source, energy savings can be realized both in the reduction of energy for air circulation fans and by reducing cool and return air mixing. The latter allows warmer coolant to be used, in some cases eliminating the need for chiller operation (Bell, 2010). While these doors are quite effective at relatively low rack power levels, recirculation becomes a problem as the fan speeds increase to maintain the servers’ internal temperature. The speed-up causes the differential pressure from back to front of the rack to increase. In one study this was shown to increase from 20% at 5kW to over 45% at 20kW (Khanakari, 2008). The intake air to servers at the bottom of the rack exceeded the maximum limit of 27℃ as then defined by ASHRAE. Counter-intuitively, decreasing the recirculation rate by adding blocking plates can increase the pressure at the server outlet, which will decrease the fans’ efficiency and thus the air flow rate, hence impeding cooling. In addition, cables can also partially block air flow, creating yet a further impediment.

It should also be noted that passive rear door coolers cannot be used to condition the data center space.

Active Rear Door Coolers

These systems are similar to passive coolers but with the addition of fans. These fans can eliminate some of the drawbacks encountered with passive doors. It is claimed that they can handle up to 45kW when added to a single rack. The pressure between the server outlet and the door is reduced, and this can cut down hot air recirculation and improve the efficiency of the servers’ internal fans. It is necessary that the door fans are synchronized with the servers’. If too slow, they can act as an impediment to air flow; if too fast, they can waste energy. The additional fans will increase power draw and create another layer of devices to be regularly serviced and repaired. The increased cooling efficiency may also mean that warmer water can be used, possibly eliminating humidification and dehumidification issues. Another benefit may be extended use of economizer modes, which cuts chiller energy expense.

In-row Coolers

In-row coolers are modular enclosures of the same height and depth as the server racks, interspersed between server racks depending on the density to provide increased cooling in the vicinity of the racks. In-row coolers function best when used in conjunction with aisle containment systems that force the cooled air from the in-row coolers to pass through the server racks from a cold aisle to a hot aisle. This also means that for optimum efficiency in-row coolers require modular aisle containment systems, adding cost. While moderately effective, in-row coolers cannot be used for cooling a specific rack or racks because they are not directly connected to any rack. They cannot determine the exact direction of 100% of their airflow or cooling capacity. The current capacity limitation is around 30kW per in-row cooler (not per rack). In-row coolers are connected to a central chilled water system via flexible hydraulic hoses. Pumped refrigerant in-row cooling units are also available; see the discussion of water and refrigerant based cooling above for the differences.

Conduction Two Phase Cooling

This was discussed above in the “Phase Change, Sealed System” section on server level cooling. Generally the cold plates are permanently affixed in the rack and are brought into contact with the server using a deflection mechanism. Both 1U pizza box and 8U blade versions have been developed. The cold plates are only 0.08” (2mm) thick but, using refrigerant, absorb well over 3kW of heat in the

21” x 22” format used in the 1U application and 1kW in the 6” x 22” format used in the blade server application. For standard 1U servers the TIM is integrated with the lid and an external mechanism presses the cold plate against it. Currently these are set up to cool 36 servers with a maximum heat load of 80kW. This is far less than the capability of the cold plates but is more heat than most 1U servers can generate. The blade server chassis is 8U tall and holds 16 blades. The cold plates are integrated with the chassis, two for each blade. When a blade is inserted, the cold plate slides under the lid. After seating the blade, the lid is pulled down to press the cold plate onto the components. A 42U rack can hold 5 chassis and cool 100kW. This can be further extended to 200kW.

How Much?

Overall, there is relatively little equipment capital cost difference among the various forms of “close in” air cooling. All will have the same infrastructure consisting of a chiller, economizer (possibly), CDU, air-liquid heat exchanger (with or without fans) and possibly a CRAC for humidification and dehumidification. The passive rear door solutions may cost $2,000 to $3,000 per instance less, about $85,000 per megawatt. The conduction cooling system consists of a rack with integrated cold plates (a solid to liquid heat exchanger), a CDU and a non chiller based heat disposal system; this could be a dry cooler, adiabatic cooler or cooling tower, depending on location. Table 2 below gives estimates of the build costs per megawatt for a data center using 30kW air cooled racks and Clustered Systems’ 100kW conduction cooled racks.

Item | Air | Clustered
W/sq ft data room | 430 | 2500
Required sq ft | 2320 | 400
Cost per sq ft | 250 | 80
Mechanical | 464 | 0
Built area | 2784 | 400
kW per cabinet | 30 | 100
Number of cabinets | 33 | 10
DC Construction | $696,000 | $32,000
Electrical system | $900,000 | $536,500
Static discharge protection | $34,800 | $6,000
Cooling Chiller/cooler | $167,402 | $73,400
Cooling CRAH/HX | $284,445 | $ -
Cooling CDU | $284,068 | $245,000
Fire suppression | $55,680 | $8,000
Physical security | $58,000 | $10,000
Cabinets | $41,667 | $1,000,000
Air/liquid HX | $225,883 | $ -
Sub total | $2,747,945 | $1,910,900
Contingency @ 10% | 10% | 10%
Architect & Engineering | 6% | 3%
Project Mgr/Consultant | 10% | 5%
Totals | $3,462,410 | $2,254,862
Savings | | 35%

Table 2, Building Cost Comparison

Of course there are also energy savings. These are shown in Table 3.

ENERGY USE COMPONENT (kW) | Air 30kW Rack | Clustered 100kW
Data Center IT load | 950 | 1000
DC internal cooling (server fans) | 50 | -
Data Centre Cooling Load (UPS) | 1000 | 1,000
Chiller Load @ 7°C | 230 | -
Electric Room Cooling Load | 122 | 30
Door power | 40 | -
Data Centre Cooling Load (lighting & skin) | 28 | 5
Back of House Skin Load | 10 | 2
Chilled Water Pump | 30 | 30
Refrigerant pump | 0 | 6
Pump Cooling Load | 30 | -
Condenser Water Load | 20 | -
Ventilation Latent Heat Load | 7 | -
Ventilation Sensible Load | 2 | -
Cooling Tower | 2 | 7
Chiller Heat Rejection to Ambient | 1 | -
Back of House Lighting | 1 | 0.2
Total | 1525 | 1,079
True PUE | 1.66 | 1.08
Cost of power | $0.10 | $0.10
Annual cost | $1,335,577 | $945,537

Table 3, Annual Energy Cost Comparison

What do I get for my money?

Servers with fans are specified differently from those without. At idle, the fans draw ~3 watts each (6-8 per server) and put out 50% of rated capacity. Under full load and 100% of capacity they will draw 8 times as much, about 150 watts. Assuming that a motherboard has a power rating of 500W, the server nameplate in an air cooled system will specify 650 watts. However, the contact cooled server is still specified at only 500 watts, eliminating unnecessary capital costs for the same number of servers or permitting more servers with the same infrastructure. Thus per megawatt you can cool 1,550 air cooled servers or 2,000 contact cooled servers. Table 4 shows the amortization computation for each DC component.

Item | Years | Air 30kW | Clustered 100kW
DC Construction | 39 | $17,846 | $821
Electrical system | 15 | $60,000 | $35,767
Static discharge protection | 15 | $2,320 | $400
Cooling Chiller/cooler | 15 | $11,160 | $4,893
Cooling CRAH/HX | 15 | $18,963 | $ -
Cooling CDU | 15 | $18,938 | $16,333
Fire suppression | 15 | $3,712 | $533
Physical security | 15 | $3,867 | $667
Cabinets | 3/6 | $13,889 | $166,667
Air/liquid HX | 6 | $18,824 | $ -
Subtotal, per year | | $169,518 | $226,081
Power/year | | $1,335,577 | $945,537
Totals | | $1,505,095 | $1,171,617
Number of servers | | 1550 | 2000
Cost per server | | $971 | $586

Savings | | | 40%

Table 4, Annual Amortization per Server Comparison

As the DC construction costs can be amortized over 39 years, Clustered’s system is slightly more expensive on an annual amortization basis. However, that is recovered in less than 2 months through energy savings. Further, as data centers are usually populated using nameplate power or power guides from the various OEMs, which have to take into account worst case power, 450 more servers can be accommodated using contact cooling, bringing the cost per 1U server to 60% of the air cooled ones.
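The per-server figures in Table 4 are simply the annual amortization plus energy cost divided by the server count; a minimal sketch (Python):

air = {"amortization": 169_518, "power": 1_335_577, "servers": 1550}
clustered = {"amortization": 226_081, "power": 945_537, "servers": 2000}
for name, d in (("air cooled", air), ("contact cooled", clustered)):
    cost = (d["amortization"] + d["power"]) / d["servers"]
    print(f"{name}: ${cost:,.0f} per server per year")
# about $971 versus $586, i.e. roughly 60% of the air cooled figure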

The Future

All of the systems discussed above have of necessity been designed to cool standard servers whose system boards and component pin outs were designed exclusively for air cooling. In the short term, with liquid cooling the constraint on component placement can be removed. For example, DIMMs are spaced 0.4” apart to facilitate air cooling. In a liquid cooled system they can be set at 0.3”. Being closer together means that additional DIMMs can be accommodated in the same space, or signal quality improved so that faster data transfers can be achieved. Another issue with air cooling is the “shadowing” effect, where components downwind of a hot component can overheat. Liquid cooling, and especially two phase cooling, does not have this problem.

Another big issue that is especially important in HPC (high performance computing) is the interconnect between individual servers. At the low power densities that air is capable of, these servers have to be so far apart that signal quality is decreased and links become speed limited, so that the system cannot be used to its maximum compute capability. In addition, the cost of cabling becomes a large part of the whole. Moving to a liquid cooled system enables power densities ten times or more those of air; cable runs are much shorter, resulting in higher network speeds, fewer errors, lower power and higher utilization of the available compute resources.

In the mid term, we can expect a move to stacked DIMMs, which will improve the compute to memory ratio and allow denser packing that only liquid cooling will be able to support. Current efforts are now focused on extreme density, which will leave very little space for power and cooling. One approach is to combine power and cooling, using the same pipes to carry electricity in and heat out. IBM has demonstrated a system that is analogous to the operation of a mammal’s brain: blood both delivers nourishment and removes heat. On the other hand, we may see a repeat of the transition from bipolar to CMOS technology, when air displaced water cooling. The next generation of computer technology may dissipate so little energy that we can move back to air again.