The Physics of Cooling
Chapter number: TBA
Chapter Name: Rack Level Cooling
Author Name: Phil Hughes, CEO
Company Name: Clustered Systems Company, Inc.
City, State: Santa Clara, California
The Physics of Server Cooling
The phenomenon that we humans perceive as heat or cold is produced by the
motion of molecules. Only at absolute zero (-273℃) do molecules have no
motion. As they become more energetic their temperature is perceived to rise
and their state can change from solid to liquid to gas, and even to plasma
when the molecules themselves shake apart. As energy states increase, the rate
of collisions between molecules increases and occasionally a photon is knocked
off, causing the phenomenon of radiation. At lower energy levels radiation is
in the infra-red part of the spectrum, increasing into the visible and beyond
at higher energy levels.
The first law of thermodynamics holds that energy can be neither created nor
destroyed, only changed in form. It is one of these changes that creates our
server heating problems. Electrical energy arrives in a chip as a flow of
electrons which bang into molecules and start them moving faster, producing
heat. Those molecules must be slowed down enough (cooled) to avoid damaging
the chip.
The second law of thermodynamics holds that when two systems are allowed to
interact they will achieve an energy equilibrium, that is, energy will flow
from the more energetic to the less energetic system. The question, therefore,
is which transfer mechanism best removes the excess energy. We can choose
from radiation, convection (forced or natural), conduction and phase change.
Radiation
At the time of writing, most electronics are solid state, so we can assume
that our high energy system is a solid. The lower energy system surrounding it
could be a vacuum, gas, liquid or another solid.
With a vacuum, the only way for energy to escape the first system is through
radiation. According to the Stefan-Boltzmann law, the energy radiated by a
black body is defined by: q = σT^4 * A
Where: q = power in watts, σ = 5.67 x 10^-8 W/m^2K^4 (the Stefan-Boltzmann
constant), T = absolute temperature in kelvin, and A = body area in square
meters.
Grinding through that lot, assuming a 33mm x 33mm chip package at a temperature
of 70℃, we conclude we can dissipate only 0.75W through radiation with a
perfect black body and surroundings at absolute zero; definitely insufficient.
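The estimate above can be reproduced in a few lines. A quick sketch using the 33mm package and 70℃ lid temperature from the text (whether one counts one face or the whole package surface moves the answer slightly, but it stays below one watt):

```python
# Stefan-Boltzmann estimate: power a perfect black body radiates into
# surroundings at absolute zero.
SIGMA = 5.67e-8                      # Stefan-Boltzmann constant, W/m^2K^4

def radiated_power(side_m: float, temp_c: float) -> float:
    """Radiated power for one face of a square package, in watts."""
    area = side_m ** 2               # one face of the package, m^2
    t_kelvin = temp_c + 273.15
    return SIGMA * t_kelvin ** 4 * area

q = radiated_power(0.033, 70.0)
print(f"{q:.2f} W")                  # well under a watt: radiation cannot cool a CPU
```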
Conduction
A gas is one step up from a vacuum. There are about 2.7 x 10^22 molecules in a
liter of air. Those molecules, if packed together at absolute zero, would
occupy only 4.7 x 10^-8 liter. Not surprisingly, thermal conductivity, k, is only
0.028 W/m-K at room temperature. For every watt removed from the 33mm x 33mm
chip, there would be a temperature difference of 800℃ per inch (25mm) of air
between the hot chip and the cold body.
Water, one of the more popular coolants, has 3.3 x 10^25 molecules per liter,
over a thousand times denser than air. Naturally this implies a higher
conductivity, 0.58 W/m-K, 20 times higher than air, dropping the temperature
difference to 40℃ per inch.
Aluminum has 6.02 x 10^25 molecules per liter. Its conductivity is 205 W/m-K,
350 times that of water. The one inch temperature gradient is just 0.11℃,
over 7,000 times better than air.
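The per-inch gradients quoted above follow directly from dT = q*L/(k*A); a short sketch using the conductivities from the text:

```python
# Temperature drop across one inch of material for 1 W leaving the
# 33 mm x 33 mm package: dT = q * L / (k * A).
AREA = 0.033 ** 2                    # package face, m^2
INCH = 0.025                         # one inch, in meters

def drop_per_inch(k_w_per_mk: float, q_watts: float = 1.0) -> float:
    return q_watts * INCH / (k_w_per_mk * AREA)

for name, k in [("air", 0.028), ("water", 0.58), ("aluminum", 205.0)]:
    print(f"{name:9s} {drop_per_inch(k):8.2f} C per inch")
```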
Clearly, aluminum or another high conductivity metal such as copper wins hands
down for conductivity. The molecules are trapped in a crystalline matrix where
they vibrate and pass energy to all their neighbors. Liquid, on the other
hand, is almost as dense, but its molecules move freely (which is good for
convection) and don't readily pass on their energy to other molecules. In
a gas the molecules are so few they rarely collide, reducing conductivity even
more.
Natural Convection
This type of convection occurs in both liquids and gases. When the fluid is
heated, its molecules closest to the heat source become more energetic and
tend to move above their less energetic neighbors. What we observe is that a
portion of the fluid expands and rises to the top.
To compute the heat removed by natural convection in air, no less than 14
parameters must be taken into account. Even then some are approximations or
simplifications by worthies from centuries past such as Rayleigh, Reynolds,
Prandtl, Nusselt and Grashof.
Fortunately there is a simplification:
h (heat transfer coefficient) = C*((T1-T2)/L)^n = 3.77 W/m^2K
C and n are dimensionless coefficients, which can be assumed to be 0.59 and 0.25
respectively. T1 and T2 are the temperatures of the hot body and cold plate
respectively. L is the distance between the hot body and a cold plate, 25mm in this
example.
Thus for our 33mm x 33mm CPU, the gradient would be 5.8℃ per watt.
Conclusion: Natural convection may work well for lower power chips (<5W).
In systems using natural liquid convection, a fluid with a very high buoyancy
to viscosity ratio is required. This can be expressed as the Grashof number
which should be as high as possible.
Gr = buoyancy force / viscous force = g*β*ΔT*L^3 / ν^2
Where: g = gravitational acceleration, β = volumetric expansion coefficient,
L = characteristic length (0.1m here), ΔT = temperature difference between the
vertical plane and the fluid (30℃ here), and ν = kinematic viscosity.
Typical Grashof numbers with the above length and temperature parameters are:
Fluorinert 3283 = 1.88 x 10^10, Fluorinert 70 = 5.19 x 10^5, mineral oil =
1.01 x 10^5
Conclusion: Fluorinert 3283 or similar with high Grashof numbers should work
quite well in systems where a system board is immersed in fluid and has a
cooling surface a few millimeters away. Iceotope is the only company with such
a solution currently.
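The mineral oil figure above can be checked with the fluid properties quoted later in the Oil section (density 800 kg/m^3, dynamic viscosity 0.0345 N-s/m^2, expansion coefficient 0.00064 K^-1) and the L = 0.1 m, ΔT = 30℃ parameters stated here:

```python
# Grashof number: Gr = g * beta * dT * L^3 / nu^2 (buoyancy over viscous force).
G = 9.81                             # gravitational acceleration, m/s^2

def grashof(beta: float, delta_t: float, length: float, nu: float) -> float:
    return G * beta * delta_t * length ** 3 / nu ** 2

# Kinematic viscosity nu = mu / rho for the light mineral oil of the Oil section.
nu_oil = 0.0345 / 800.0
print(f"Gr = {grashof(0.00064, 30.0, 0.1, nu_oil):.2e}")   # ~1.0e5, as quoted
```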
Forced Convection
Both gases and liquids can be used in forced convection systems. We will only
discuss air and water in this context. Air is generally ducted to where its
cooling effect is required but water is tightly constrained in piping and heat
exchangers.
Air
While the number of parameters required to derive the heat transfer
coefficient grows to about 18, there are some simplifications which can be
used for sanity checks. One of the simplest, for air at standard temperature
and pressure applied to heat sinks, is: Theta (Θ) = 916*(L/V)^0.5/A in
℃/W
Where L=heat sink length in inches, V=air velocity in ft/min and A= total surface area
in sq. in.
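Applied as a sanity check, the formula is one line of arithmetic. The 3-inch length, 400 ft/min velocity and 120 sq. in. area below are made-up illustrative inputs, not values from the chapter:

```python
# Sanity-check heat-sink formula: Theta = 916 * (L/V)^0.5 / A, with L in
# inches, V in ft/min and A in square inches, giving thermal resistance in C/W.
def theta(length_in: float, velocity_fpm: float, area_sqin: float) -> float:
    return 916.0 * (length_in / velocity_fpm) ** 0.5 / area_sqin

# Hypothetical heat sink: 3 in long, 400 ft/min air, 120 sq in of fin area.
print(f"{theta(3.0, 400.0, 120.0):.3f} C/W")
```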
However, this and even more sophisticated models are no substitute for in situ
measurement. Fig. 1 shows a representative example of the (very significant)
difference between the datasheet values and those derived using the heatsink
calculator provided on some manufacturers' websites. The derived values use
exactly the same inputs as the formula above. Note the ~2x difference between
curves. dP shows the pressure drop required to achieve the stated air flow.
Figure 1 Heat Sink Characterization (2U heatsink: measured vs. calculated
Theta in ℃/W against air flow in CFM; dP in inches of H2O)
Adding to the complexity is the variation between servers. The same heatsink
will perform differently as the externalities vary. These include ducting,
positioning of each CPU (if more than one), DRAM and VRM layouts.
The other significant factor is the fans’ specifications. They must be capable
of providing sufficient volume and pressure to drive the air through the
heatsink(s) and not consume too much power doing so. To establish the
operating requirements, we need to look at
the maximum allowable CPU lid temperature
and CPU power. Typically, 70℃ has been
the allowable maximum, but excursions up
to 95℃ may be allowed in the future.
Maximum power for high performance CPUs is
commonly 130W (even though most servers
may be equipped with only 95W CPUs).
Assuming that the maximum operating inlet temperature is 45℃, we'd have a
margin of 25℃. Thus the allowable thermal resistance would be 25/135 =
0.185℃/W. As can be seen from the graph, that is the maximum capability of
the heatsink. At that point, the fans must deliver 50CFM with a static
pressure of 0.35" of water.
Figure 2 Fan Curves, 60mm x 60mm
Fig. 2 shows a typical set of
operating curves for two fans. When
operating at maximum power they
should be at the inflection point,
delivering 30-40CFM.
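The thermal-resistance budget worked through above reduces to one line: lid-temperature limit minus inlet temperature, divided by the heat load (135W here, per the text).

```python
# Allowable heat-sink thermal resistance from the thermal budget.
def allowable_theta(lid_c: float, inlet_c: float, power_w: float) -> float:
    return (lid_c - inlet_c) / power_w

print(f"{allowable_theta(70.0, 45.0, 135.0):.3f} C/W")   # 0.185, as stated
```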
Typically a 2U server heatsink is
about 3.5” wide and 2.5” tall. Banks
of DRAMs will be deployed on one or
both sides of the CPU, see Fig. 3.
In the case of the half-width board
on the left, there is room for only two fans, mandating the choice of the more
powerful fan 1. These will draw 60 watts. Further, at least 50% of the air
will bypass the heatsinks, producing borderline performance in normal
operation. A fan failure will cause the CPU to throttle in order to stay
within the thermal envelope, thus lose performance. The system on the right is
a little more forgiving but a fan failure still has the potential to affect
performance. Potentially, its fans could draw up to 150W, an additional 30%
load.
As the power consumed by a fan is proportional to the volume of air flow (CFM)
cubed, from an energy efficiency point of view it is better to have as many
fans as possible. For example, if one fan could produce adequate air flow for
cooling at 32W, two of the same fans sharing the load would consume only 8W.
Note the energy of the fans adds slightly to the air temperature, but is
usually low enough (<1℃) so as not to be a significant factor.
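The cube-law example can be sketched directly: splitting a given flow across n identical fans cuts total power by a factor of n^2.

```python
# Fan affinity law: shaft power scales with the cube of air flow.
def shared_fan_power(single_fan_watts: float, n_fans: int) -> float:
    """Total power when n identical fans each deliver 1/n of the flow."""
    per_fan = single_fan_watts * (1.0 / n_fans) ** 3
    return per_fan * n_fans

print(shared_fan_power(32.0, 1))     # one fan doing all the work: 32 W
print(shared_fan_power(32.0, 2))     # the two-fan example from the text: 8 W
```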
Figure 3 Server Motherboards
After the heat is exhausted from the server it is either sucked into a cooling
unit, which is itself cooled by water or pumped refrigerant, then re-
circulated to the server inlets, or exhausted to the atmosphere. In the latter
case, fresh outside air is directed to the server inlets. For a rack with 80
server motherboards (left motherboard) drawing 450W each, for a component load
of 36kW and typical fan load of 6kW (75W/server), approximately 445,000 cubic
feet of air (12,600m^3) per hour needs to be recirculated by its fans to
maintain a 10℃ air temperature rise at the server exits.
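The recirculated-air figure can be cross-checked from the heat load: the volume flow needed to carry 42kW (36kW of components plus 6kW of fans) with a 10℃ air temperature rise.

```python
# Air volume flow needed to carry a given heat load at a given temperature rise.
RHO_AIR = 1.2                        # kg/m^3, roughly standard conditions
CP_AIR = 1005.0                      # J/kg-K

def airflow_m3_per_hour(power_w: float, delta_t: float) -> float:
    mass_flow = power_w / (CP_AIR * delta_t)      # kg/s
    return mass_flow / RHO_AIR * 3600.0           # m^3/h

print(f"{airflow_m3_per_hour(42_000, 10.0):,.0f} m^3/h")  # close to the 12,600 quoted
```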
It should be noted that the external environment can also affect fan
performance. Passive rear door heat exchangers and cabling are the two biggest
problems. They can block server exhaust and reduce cooling efficiency.
Water
Water is much easier to handle than air. It is piped exactly to where you want
it to go. Most systems consist of three components, in-server, in-rack and
exhaust. In all known systems the in-server component connects to the in-rack
distribution system via two quick connects.
They also come in two flavors, IBM and everybody else. The IBM version is very
solidly engineered with all cooling components connected with brazed copper
tubing. In Fig 4, it can be seen that each hot component has an individual
cooling block. Very little if any air cooling is required.
The representative of the “others” cools only the CPUs and is interconnected
with flexible tubing and plastic connectors. Air cooling is still required for
all other components including DIMMs.
Figure 4 Water Cooled Servers from IBM and Asetek
Fig. 5 shows the rack level plumbing of a typical water cooled system.
Most of these systems are advertised as having the ability to be cooled with
hot water, and they do remove heat quite efficiently. The block in contact
with the CPU or other hot body is usually copper, with a conductivity of
around 400 W/m-K, so the temperature drop across it is negligible. If the
water is pumped slowly enough, reducing pumping power, flow is laminar.
Because water is not a very good conductor of heat, a temperature drop of
around 5℃ can be expected across the water-copper interface. This is usually
negligible but, if necessary, can be reduced by forcing turbulent flow through
an increased flow rate. This could be an expensive waste of energy.
Both server types have two CPUs plumbed in series. The maximum power
consumption of a CPU is around 130W. If we assume the maximum lid temperature
is 70℃ and the inlet water is 40℃, each CPU could heat the water 10℃ while
accommodating the thermal resistance of the water film and the cold block
itself. For a rack with 40 servers, 160 CPUs (21kW), about 1.8 cubic meters of
water per hour would be required. Pump energy would be around 80W. Of course,
another 15kW (450W total per server) remains to be removed by fans. Clearly,
the racks cannot be deployed at maximum density, resulting in a power density
of around 600W/sq ft, without special provision such as rear door heat
exchangers.
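The rack-level water flow quoted above can be checked with the specific heat of water; this sketch assumes the 10℃ rise is taken across the whole loop, which reproduces the text's 1.8 m^3/h.

```python
# Water volume flow to carry the CPU heat load at a given temperature rise.
CP_WATER = 4186.0                    # J/kg-K
RHO_WATER = 1000.0                   # kg/m^3

def water_flow_m3_per_hour(power_w: float, delta_t: float) -> float:
    mass_flow = power_w / (CP_WATER * delta_t)    # kg/s
    return mass_flow / RHO_WATER * 3600.0

# 160 CPUs at 130 W each is about 21 kW.
print(f"{water_flow_m3_per_hour(160 * 130, 10.0):.2f} m^3/h")  # ~1.8, as stated
```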
Figure 5 Water Pipes and Manifolds
While the physics of the system are workable, the statistics may not be. Let's
be very optimistic and assume that the MTBF (mean time between failures) of a
liquid connector is 10^7 hours and the service life is 3 years, i.e. 26,280
hours. The probability of survival is e^-(26280/10^7) = 0.9974, i.e. a 0.26%
probability that any given connector fails. If there were 1000 servers, hence
2,000 connectors, about 5 would fail. This calculation would be reasonable for
the IBM system where all the connectors are brazed to the piping. Where
flexible tubing and plastic connectors are in the mix, together with the
vibration of fans, the probability of failure goes up.
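The reliability arithmetic above is the standard constant-failure-rate (exponential) model:

```python
# Exponential reliability model: survival probability e^(-t/MTBF), and the
# expected failure count across a population of connectors.
import math

def expected_failures(n_connectors: int, mtbf_hours: float,
                      life_hours: float) -> float:
    survival = math.exp(-life_hours / mtbf_hours)
    return n_connectors * (1.0 - survival)

print(f"survival = {math.exp(-26280 / 1e7):.4f}")                        # 0.9974
print(f"expected failures = {expected_failures(2000, 1e7, 26280):.1f}")  # ~5
```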
Finally, water chemistry can be difficult. Described as the "universal
solvent", water can eat through metals and plastics if it has not been
pre-treated properly. Another concern could be algae growth. A closed
secondary loop to the components is essential to reliably manage such issues.
A leak in such a loop might bring the entire loop and its associated servers
down.
Oil
Light mineral oil has been applied in a couple of instances for cooling. In
one case, multiple servers are immersed in an oil bath and in the second,
servers are put into individual sealed cases. In both cases, the oil is forced
through the individual server containers using circulation pumps. Heat is
removed from the oil by passing it through a heat exchanger on a water loop.
Typical parameters for light oil are (water in parentheses):
Density: 800 kg/m^3 (1000)
Viscosity: 0.0345 N-s/m^2 (0.000798)
Specific heat: 1100 J/kg-K (4186)
Thermal conductivity: 0.15 W/m-K (0.615)
Thermal expansion coefficient: 0.00064 K^-1 (0.000291)
Grashof number: 1.01 x 10^5 (1.34 x 10^8)
This scheme is more energy efficient than air but suffers from two
disadvantages. Serviceability can be a problem when the system boards are
covered in an oil film, and more energy is required to drive the circulation
pumps than in a water based system due to the lower specific heat and higher
viscosity of the oil. Ride-through might also be an issue: as the oil has a
fairly low Grashof number and specific heat, there would be little natural
circulation when a pump failed. This may cause overheating.
Phase Change
Phase change based systems use the latent heat of evaporation to absorb heat
and remove it from the hot objects.
In one case, servers are placed in a bath open to the atmosphere and filled
with a cooling fluid with a relatively low boiling point; in the other, the
coolant is delivered to the server through a cold plate in a sealed system.
Bath
A coil with coolant, usually water or water and glycol, circulating through it
is mounted in the lid of the bath. In operation, the liquid boils, the gas
rises and is re-condensed by the cooling coil and the liquid drops back into
the bath.
Originally designed for single phase sealed systems, fluids such as 3M's Novec
7000 (boiling point 34°C at normal atmospheric pressure) and Novec 649 (49°C)
are being proposed for non-sealed systems. While Novec 7000 has the better
physical characteristics, such as latent heat of evaporation and boiling
point, it has a significant global warming potential (GWP), which may be a
problem in some jurisdictions. On the other hand, Novec 649 has an
uncomfortably high boiling point, which may compromise the reliability of some
components, but has a very low GWP.
These and other similar fluids have been used for cleaning for years with no
apparent harm to operators, as the liquids are always below boiling point.
Presumably most users would have evacuation hoods over the cleaning baths so
that inhalation is minimized. Until long term effects are understood,
precautions will be needed where operatives run the risk of continuous
exposure to the additional vapors released by the boiling fluids. Additional
precautions might be necessary to guard against failure of the cooling loop,
which could cause the room atmosphere to become saturated with coolant vapor.
Sealed System
In the open system, the fluid is directly in contact with the hot objects and
is insensitive to system topology and component height. In a sealed system
with flat, minimally flexible cold plates heat must be brought up to a single
plane. While convection is adequate for low wattage components, a conductive
path is required for high power devices.
In the implementation available from Clustered Systems, heat is conducted to a
single plane by a series of heat risers placed atop each component that
generates a significant amount of heat. In all cases, these include CPUs,
VRMs, DIMMs and system glue, plus, if merited, networking and other components
generating over ~2W. The heat risers can be seen at the top of Fig 6. For
clarity, only the bottom server is shown covered by a cold plate. The cold
plates are a chassis component and are all permanently soldered into
refrigerant distribution manifolds. This completely eliminates the probability
of leakage from connectors.
Figure 6 Clustered Systems’ Phase Change Cooled Blade with Two Half Width Servers
Liquid (R134A) is pumped through cold plates placed upon heat risers attached
to CPUs, DIMMs, VRMs, etc. The heat causes the liquid to boil, absorbing 93
times as much heat as the same weight of water.
Figure 7 Sixteen Blade Chassis
The liquid and gas mix is then passed to a heat
exchanger where it is re-converted to 100% liquid.
Unlike air cooled systems, the thermal resistance
between heat source and liquid is so small that
high coolant temperatures can be tolerated. No
chiller is required in most cases. The only energy
required is for circulation pumps and external
fans in a dry or adiabatic cooler. The cooling PUE
can be as low as 1.03.
Fig. 7 shows the front of the chassis. The cold plates can be seen at the
right of each non-populated slot. They slip into the blade and contact the
heat risers when the blade is inserted.
Fig. 8 shows the four rear switch blades and a partial view of the
distribution manifolds.
Figure 8 Chassis Rear
The maximum power consumption of a CPU is around 130W and we assume that the
maximum lid temperature is 70℃. As the system is isothermal, the cold plate
is the same temperature virtually everywhere. Heat input just causes liquid to
change to gas with no temperature rise. Assuming that the inlet refrigerant
was 40℃, and having established by measurement that the thermal resistance
from CPU lid to refrigerant is <0.2℃/watt, the CPU lid would reach 66℃
(40 + 130*0.2). Because the gasification causes bubble formation, hence
turbulence, laminar flow film formation is not a problem.
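The lid-temperature estimate above is simply inlet temperature plus power times the measured thermal resistance:

```python
# Isothermal cold-plate estimate of CPU lid temperature.
def lid_temperature(inlet_c: float, power_w: float,
                    theta_c_per_w: float) -> float:
    return inlet_c + power_w * theta_c_per_w

print(f"{lid_temperature(40.0, 130.0, 0.2):.0f} C")   # 66, as computed in the text
```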
For a whole rack with 160 servers (72kW @ 450W per server) about 0.66 cubic
meters of refrigerant per hour would be required. In practice, with the
refrigerant's viscosity at 25% and its fluid flow at 10% of those of a water
based system, pump energy is very low, about 30W.
The benefits of such an efficient phase change cooling system are striking:
- Very high power densities can be achieved
  o 100kW racks enable data center density of 4,000 W/ft^2
- Rack floor space for a 10MW data center can be reduced from 50,000ft^2 to
  about 2,500ft^2
- Data center construction and facility costs drop ~50%.
Conclusions
Whatever liquid cooling technology is chosen, it will always be more efficient
than air for two reasons. The first and most important is that the amount of
energy required to move air will always be several times greater than that
needed to move a liquid for the same amount of cooling.
Fluid         Density  Heat    States            Delta T  lb/min  CFM/  Static    Req.   Fan/Pump    Total  % of
              lb/cuft  BTU/lb                    °F       /ton    ton   press PSI Watts  efficiency  W      load
Air (std day) 0.075    0.205   gas-gas           18       54.11   722   0.036     84.6   30%         282    8.0%
Water @ 50°F  62.45    0.998   liquid-liquid     9        22.27   0.36  35        40.6   30%         135    3.8%
R134a @ 95°F  72.94    72      liquid-gas        0        9.26    0.13  20        8.3    30%         28     0.8%
                               (30% evaporated)
The table illustrates some typical numbers. While the move to water reduces the energy by 50%, going to refrigerant cuts it by 90%.
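The pump and fan columns of the table can be reproduced from first principles: moving power is volume flow times pressure, and wire power assumes the table's 30% fan/pump efficiency. The unit-conversion constants below (one ton of cooling = 3,517 W; PSI to Pa; CFM to m^3/s) are standard values, not from the chapter.

```python
# Reproduce the table's fan/pump power arithmetic for each coolant.
TON_WATTS = 3517.0                   # one ton of cooling, in watts
PSI_TO_PA = 6894.76
CFM_TO_M3S = 4.719e-4

def pump_power(lb_per_min: float, density_lb_cuft: float, press_psi: float,
               efficiency: float = 0.30) -> float:
    """Wire power in watts: volume flow * pressure rise / efficiency."""
    cfm = lb_per_min / density_lb_cuft             # volume flow in CFM
    ideal_w = cfm * CFM_TO_M3S * press_psi * PSI_TO_PA
    return ideal_w / efficiency

for name, lb_min, rho, psi in [("air", 54.11, 0.075, 0.036),
                               ("water", 22.27, 62.45, 35.0),
                               ("R134a", 9.26, 72.94, 20.0)]:
    total = pump_power(lb_min, rho, psi)
    print(f"{name:6s} {total:5.0f} W  ({100 * total / TON_WATTS:.1f}% of load)")
```

Run against the table's flow and pressure figures, this recovers the 282 / 135 / 28 W totals and the percent-of-load column.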
Rack Level Cooling

Discussion
Today (2013) a 1U server can contain a half width motherboard and four GPU
cards. Typically a motherboard will require about 450W and a GPU 300W, a total
of 1,650 watts per 1U. A fully populated rack with 40 1U boxes will require
almost 70kW, way beyond the capability of air. Even with two half width
servers per U the power requirement is 36kW (40 x 2 x 450W). Air for rack
level cooling is practically obsolescent except possibly for some low power
work like web serving front ends. This paper briefly reviews some currently
available systems. Table 1 presents a summary of some of the various options.
                            Passive    Passive     In Row    Active     Cold Plate
                            Rear Door, Rear Door,  Cooling,  Rear Door, Refrigerant
                            Water      Refrigerant Water     Water
Rack Cooling Capability kW  15         20-30       20        20         200
Cooling Efficiency          Low        Medium-High Medium    Medium     High
Eliminates Hot Spots        Server fan Server fan  Server fan Yes       Yes
                            dependent  dependent   dependent
Additional Cooling required Yes        Yes         Yes       No         No
CRAC Required               Yes        Yes         No        No         No
Aisle Containment Required  No         No          Yes       No         No
Additional floor space      Minimum    Minimum     50%       Minimum    None
Isothermal                  No         Yes         No        No         Yes
Redundant pumps & control   No         Yes         No        Yes        Yes
Chiller Required            Yes        Yes         Yes       Yes        No
Table 1 Cooling Options

Water or refrigerant?

Water Cooled Doors
Heat is extracted from the air passing through the system by warming water
passing through the door's coils. For each °F that one pound of water warms
up, one BTU is absorbed (specific heat). However, due to this warming effect,
the temperature differential between air and water drops, diminishing the
effective cooling. To overcome this problem, very cold water has to be used
whose temperature, in some cases, is below the dew point, causing
condensation. While this can be accommodated, it is an extra installation
expense. Chilled water can be connected to cooling units either under floor or
overhead with hard or flexible tubing.
Heat can be removed from the water directly by a primary system, which could
be either a stand-alone remote mechanical refrigeration unit or one that is
part of the building system. A better solution, which avoids the entry of
contaminants from the primary system into the unit cooling loop, is to insert
a CDU (coolant distribution unit, a heat exchanger and pump assembly) between
the primary and secondary (unit) loops.

Refrigerant
Heat is extracted from air by refrigerant passing through the door's coils.
Instead of making use of specific heat, latent heat of evaporation is used. A
refrigerant can capture 100 times more heat per unit of weight than water. As
the evaporative process is isothermal, warmer coolant can be used, avoiding
the condensation problems all too common with water based systems. Heat is
removed from the refrigerant using a CDU connected to a primary cooling water
system. Due to the higher efficiency of refrigerant, the use of a chiller to
cool the primary water can be eliminated, or at least significantly curtailed,
and replaced by a water tower, adiabatic or dry cooler. Refrigerant can be
brought to cooling units with hard or flexible overhead tubing. Refrigerant
piping in a server room does require that most joints be soldered or brazed to
ensure a leak-tight system. However, due to the extra work needed to protect
against condensation drips in water based systems, costs are a wash. In the
event of a leak, the refrigerant evaporates immediately, while water can drip
onto and damage sensitive electronics.

Passive rear door
A water cooled passive rear door was originally developed by IBM for its own
servers. They worked quite well at the 5kW rack power levels then current.
Later it was licensed to other companies.
By bringing the cooling source closer to the heating source, energy savings
can be realized both in the reduction of energy for air circulation fans and
by reducing cool and return air mixing. The latter allows warmer coolant to be
used, in some cases eliminating the need for chiller operation. (Bell, 2010)
While these doors are quite effective at relatively low rack power levels,
recirculation becomes a problem as the fan speeds increase to maintain the
servers' internal temperature. The speed up causes the differential pressure
from back to front of the rack to increase. In one study recirculation was
shown to increase from 20% at 5kW to over 45% at 20kW (Khanakari, 2008). The
intake air to servers at the bottom of the rack exceeded the maximum limit of
27°C as then defined by ASHRAE. Counterintuitively, decreasing the
recirculation rate by adding blocking plates can increase the pressure at the
server outlet, which will decrease the fans' efficiency, thus the air flow
rate, hence impeding cooling. In addition, cables can also partially block air
flow, creating yet a further impediment.
It should also be noted that passive rear door coolers cannot be used to
condition the data center space.

Active Rear Door Coolers
These systems are similar to passive coolers but with the addition of fans.
These fans can eliminate some of the drawbacks encountered with passive doors.
It is claimed that they can handle up to 45kW when added to a single rack. The
pressure between the server outlet and door is reduced, and this can cut down
hot air recirculation and improve the efficiency of the servers' internal
fans. It is necessary that door fans are synchronized with the servers'. If
too slow, they can act as an impediment to air flow and, if too fast, can
waste energy. The additional fans will increase power draw and create another
layer of devices to be regularly serviced and repaired. The increased cooling
efficiency may also mean that warmer water can be used, possibly eliminating
humidification and dehumidification issues. Another benefit may include
extended use of economizer modes, which cuts chiller energy expense.

In-row coolers
In-row coolers are modular enclosures of the same height and depth as the
server racks, interspersed between server racks depending on the density to
provide increased cooling in the vicinity of the racks. In-row coolers
function best when used in conjunction with aisle containment systems that
force the cooled air from the in-row coolers to pass through the server racks
from a cold aisle to a hot aisle. This also means that for optimum efficiency
in-row coolers require modular aisle containment systems, adding cost. While
moderately effective, in-row coolers cannot be used for cooling a specific
rack or racks because they are not directly connected to any rack. They cannot
determine the exact direction of 100% of their airflow or cooling capacity.
Current capacity limitation is around 30kW per in-row cooler (not per rack).
In-row coolers are connected to a central chilled water system via flexible
hydraulic hoses. Pumped refrigerant in-row cooling units are also available;
see the discussion on water and refrigerant based cooling above for the
differences.

Conduction Two Phase Cooling
This was discussed above in the "Phase Change, Sealed" section on server level
cooling. Generally the cold plates are permanently affixed in the rack and are
brought into contact with the server using a deflection mechanism. Both 1U
pizza box and 8U blades have been developed. The cold plates are only 0.08"
(2mm) thick but, using refrigerant, absorb well over 3kW of heat in the 21" x
22" format used in the 1U servers and 1kW in the 6" x 22" format used in the
blade server application. For standard 1U servers the TIM is integrated with
the lid and an external mechanism presses the cold plate against it. Currently
these are set up to cool 36 servers with a maximum heat load of 80kW. This is
far less than the capability of the cold plates but is more heat than most 1U
servers can generate. The blade server chassis is 8U tall and holds 16 blades.
The cold plates are integrated with the chassis, two for each blade. When a
blade is inserted, the cold plate slides under the lid. After seating the
blade, the lid is pulled down to press the cold plate onto the components. A
42U rack can hold 5 chassis and cool 100kW. This can be further extended to
200kW.

How Much?
Overall, there is relatively little equipment capital cost difference among
the various forms of "close in" air cooling. All will have the same
infrastructure consisting of chiller, economizer (possibly), CDU, air-liquid
heat exchanger (with or without fans) and possibly a CRAC for humidification
and dehumidification. The passive rear door solutions may cost $2,000 to
$3,000 per instance less, about $85,000 per megawatt. The conduction cooling
system consists of a rack with integrated cold plates (solid to liquid heat
exchanger), a CDU and a non chiller based heat disposal system. This could be
a dry cooler, adiabatic cooler or cooling tower, depending on location. Table
2 below gives estimates of the build costs per megawatt for a data center
using 30kW air cooled racks and Clustered Systems' 100kW conduction cooled
racks.
                             Air          Clustered
W/sq ft Data room            430          2500
Required sq ft               2320         400
Cost per sq ft               250          80
Mechanical                   464          0
Built area                   2784         400
kW per cabinet               30           100
Number of cabinets           33           10
DC Construction              $696,000     $32,000
Electrical system            $900,000     $536,500
Static discharge protection  $34,800      $6,000
Cooling Chiller/cooler       $167,402     $73,400
Cooling CRAH/HX              $284,445     $-
Cooling CDU                  $284,068     $245,000
Fire suppression             $55,680      $8,000
Physical security            $58,000      $10,000
Cabinets                     $41,667      $1,000,000
Air/liquid HX                $225,883     $-
Sub total                    $2,747,945   $1,910,900
Contingency @ 10%            10%          10%
Architect & Engineering      6%           3%
Project Mgr/Consultant       10%          5%
Totals                       $3,462,410   $2,254,862
Savings                                   35%
Table 2, Building Cost Comparison

Of course there are also energy savings. These are shown in Table 3.
ENERGY USE COMPONENT                        Air 30kW Rack  Clustered 100kW
Data Center IT load                         950            1000
DC internal cooling (server fans)           50             -
Data Centre Cooling Load (UPS)              1000           1,000
Chiller Load @ 7°C                          230            -
Electric Room Cooling Load                  122            30
Door power                                  40             -
Data Centre Cooling Load (lighting & skin)  28             5
Back of House Skin Load                     10             2
Chilled Water Pump                          30             30
Refrigerant pump                            0              6
Pump Cooling Load                           30             -
Condenser Water Load                        20             -
Ventilation Latent Heat Load                7              -
Ventilation Sensible Load                   2              -
Cooling Tower                               2              7
Chiller Heat Rejection to Ambient           1              -
Back of House Lighting                      1              0.2
Total                                       1525           1,079
True PUE                                    1.66           1.08
Cost of power                               $0.10          $0.10
Annual cost                                 $1,335,577     $945,537
Table 3, Annual Energy Cost Comparison

What do I get for my money?
Servers with fans are specified differently from those without. At idle, the
fans draw ~3 watts each (6-8 per server) and put out 50% of rated capacity.
Under full load and 100% of capacity they will draw 8 times as much, about 150
watts. Assuming that a motherboard has a power rating of 500W, the server
nameplate in an air cooled system will specify 650 watts. However, the contact
cooled server is still specified at only 500 watts, eliminating unnecessary
capital costs for the same number of servers or permitting more servers with
the same infrastructure. Thus per megawatt you can cool 1,550 air cooled
servers or 2,000 contact cooled servers. Table 4 shows the amortization
computation for each DC component.
                             Years  Air 30kW    Clustered 100kW
DC Construction              39     $17,846     $821
Electrical system            15     $60,000     $35,767
Static discharge protection  15     $2,320      $400
Cooling Chiller/cooler       15     $11,160     $4,893
Cooling CRAH/HX              15     $18,963     $-
Cooling CDU                  15     $18,938     $16,333
Fire suppression             15     $3,712      $533
Physical security            15     $3,867      $667
Cabinets                     3/6    $13,889     $166,667
Air/liquid HX                6      $18,824     $-
Subtotal, per year                  $169,518    $226,081
Power/year                          $1,335,577  $945,537
Totals                              $1,505,095  $1,171,617
Number of servers                   1550        2000
Cost per server                     $971        $586
Savings                                         40%
Table 4, Annual Amortization per Server Comparison

As the DC construction costs can be amortized over 39 years, Clustered's
system is slightly more expensive. However, that is recovered in less than 2
months through energy savings. Further, as data centers are usually populated
using nameplate power or power guides from the various OEMs, which have to
take into account worst case power, 450 more servers can be accommodated using
contact cooling, bringing the cost per 1U server to 60% of that of the air
cooled ones.
The Future
All of the systems discussed above have of necessity been designed to cool
standard servers whose system boards and component pin outs were designed
exclusively for air cooling. In the short term, with liquid cooling, the
constraint on component placement can be removed. For example, DIMMs are
spaced 0.4" apart to facilitate air cooling. In a liquid cooled system they
can be set at 0.3". Being closer together means that additional DIMMs can be
accommodated in the same space, or signal quality improved so that faster data
transfers can be achieved. Another issue with air cooling is the "shadowing"
effect, where components downwind of a hot component can overheat. Liquid
cooling, and especially 2 phase cooling, does not have this problem. Another
big issue, especially important in HPC (high performance computing), is the
interconnect between individual servers. At the low power densities that air
is capable of, these servers have to be so far apart that signal quality is
decreased and links become speed limited, so the system cannot be used to its
maximum compute capability. In addition, the cost of cabling becomes a large
part of the whole. Moving to a liquid cooled system enables power densities
ten times or more those of air, and cable runs become much shorter, resulting
in higher network speeds, fewer errors, lower power and higher utilization of
the available compute resources. In the mid term, we can expect a move to
stacked DIMMs, which will improve the compute to memory ratio and allow denser
packing that only liquid cooling will be able to support. Current efforts are
now focused on extreme density, which will leave very little space for power
and cooling. One approach is to combine power and cooling, using the same
pipes to carry electricity in and heat out. IBM has demonstrated a system that
is analogous to the operation of a mammal's brain: blood both delivers
nourishment and removes heat. On the other hand, we may see a repeat of the
transition from bipolar to CMOS technology, where air displaced water cooling.
The next generation of computer technology may dissipate so little energy that
we can move back to air again.