High-Performance, Power-Aware Distributed Computing for Scientific Applications

Kirk W. Cameron, Rong Ge, and Xizhou Feng, Virginia Tech

The PowerPack framework enables distributed systems to profile, analyze, and conserve energy in scientific applications using dynamic voltage scaling. For one common benchmark, the framework achieves more than 30 percent energy savings with minimal performance impact.

Computer models deepen our understanding of complex phenomena and indirectly improve our quality of life. Biological simulations of DNA sequencing and protein folding advance healthcare and drug discovery. Mathematical simulations of world economies guide political policy. Physical fluid-flow simulations of global climate allow more accurate forecasting and life-saving weather alerts. Nanoscale simulations enhance underlying computing technologies.

Evolving parallel and distributed systems designed for these demanding scientific simulations will be enormously complex and power hungry. Computers capable of executing one trillion floating-point operations per second (Tflops) have already emerged, and petaflops (Pflops) systems that can execute one quadrillion floating-point operations per second are expected by the end of the decade.

Because supercomputing experts cannot agree on what the core technologies should be, these systems will likely be diverse as well. Nevertheless, most will be built from commercial components in integrated scalable clusters of symmetric multiprocessors. Such solutions will be highly parallel with tens of thousands of CPUs, tera- or petabytes of main memory, and tens of Pbytes of storage. Advanced software will synchronize computation and communication among multiple operating systems and thousands of components.

PERFORMANCE AND POWER DEMANDS

As Figure 1 illustrates, the growing computational demands of scientific applications have led to exponential increases in peak performance (shown as Top5 average Linpack peak, Rmax), power consumption (shown for several supercomputers), and complexity (shown relative to average number of microprocessors, µP) for large-scale systems using commercial components. Power-performance optimization involves minimizing the current efficiency gap in processing throughput and power consumption.

Performance

Complex parallel and distributed systems are difficult to program for efficient performance. As the number of components increases, such systems achieve a lower percentage of theoretical peak performance. Gordon Bell award-winning applications achieve between 35 and 65 percent of peak performance through aggressive optimizations by a team of experts working collaboratively for months.1 For the past decade’s five most powerful machines, the average percentage of peak performance for Linpack—arguably the most heavily optimized high-performance code suite—ranged between 54 and 71 percent. Typical scientific applications commonly achieve only 5 to 10 percent of peak performance on today’s high-performance systems.

Power consumption

Japan’s NEC Earth Simulator, which is capable of executing 40 Tflops, requires about 12 megawatts of power. Pflops distributed systems will require considerably more power. A 1998 NASA study2 predicted power needs of 25 megawatts for such a system, while more recent projections point to 100 megawatts3—enough power to supply 1.6 million 60-watt lightbulbs, the lighting requirements of a small city.

The estimated exponential increase in large-scale systems’ power requirements shown in Figure 1 is somewhat conservative in that the data, compiled from various sources, ignores air-cooling system power, and single-processor power consumption—for example, the Itanium2 consumes 130 watts—is expected to increase exponentially with clock rates until at least 2010.

Reliability. Reducing distributed system components’ power and energy consumption increases reliability. A growing number of scientific simulations require hundreds of hours of continuous execution.4 Current devices executing these applications fail at an annual rate of 2 to 3 percent. A hypothetical Pflops system of about 12,000 nodes would experience a hardware failure every 24 hours (at a 3 percent annual failure rate, 12,000 nodes suffer roughly 360 failures per year, or about one per day).

The Arrhenius life-stress model implies that the operating temperature of an electronic component reduces its life expectancy and thus its mean time to failure. Every 10ºC (18ºF) temperature increase reduces component life expectancy by half; reducing a component’s operating temperature by the same amount doubles its life expectancy. Decreasing high-performance systems’ energy consumption will reduce the loss of time, effort, and experimental data due to aborted program executions or incorrect results from component failure.
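As a back-of-the-envelope sketch (our notation, and a rule-of-thumb scaling law rather than the full Arrhenius equation), the doubling rule can be written as:

```latex
% Component life, and thus MTTF, halves for every 10 degrees C above a reference temperature T_0.
\mathrm{MTTF}(T) \approx \mathrm{MTTF}(T_0) \cdot 2^{-(T - T_0)/10\,^{\circ}\mathrm{C}}
```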

Cost. Reducing distributed system components’ power and energy consumption also decreases cost. Assuming a rate of $100 per megawatt-hour, a Pflops machine consuming 100 megawatts of power at peak operation would cost $10,000 per hour and could surpass $85 million in a year. More conservative rule-of-thumb predictions of 20 percent peak power consumption imply a lower, but not necessarily manageable, $17 million annual operational cost ($2,000 per hour). These rough estimates do not include air-cooling requirements, which commonly amount to 40 percent of system operating costs. Even small reductions in overall energy consumption would significantly impact Pflops systems’ operational costs.
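A quick check of the arithmetic at that rate:

```latex
100\,\mathrm{MW} \times \$100/\mathrm{MWh} = \$10{,}000/\mathrm{hour}
  \approx \$87.6\ \mathrm{million/year}
0.2 \times 100\,\mathrm{MW} \times \$100/\mathrm{MWh} = \$2{,}000/\mathrm{hour}
  \approx \$17.5\ \mathrm{million/year}
```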


Figure 1. Supercomputer performance and power trends. Scientific applications’ growing computational demands have led to exponential increases in system peak performance, power consumption, and complexity requirements. [Chart: performance in Gflops on a log scale (1 to 10^6) from June 1993 to June 2009, plotting Top5 average Linpack peak (Rmax), real application performance, the efficiency gap between them, and peak power requirements, with average processor counts ranging from 1,150 µP to 130,000 µP. Labeled power points include the TMC CM-5 (5 kW), Fujitsu Numerical Wind Tunnel (100 kW), Intel ASCI Red (850 kW), IBM SP ASCI White (6,000 kW), IBM ASCI Purple (8,000 kW), NEC Earth Simulator (12,000 kW), and IBM Blue Gene/L (?? kW); reference points include a residential air conditioner (15 kW), a commercial data center (1,374 kW), a high-speed electric train (10,000 kW), and a small power plant’s generating capacity (300,000 kW).]


Power-performance efficiency

Scientific applications demand more power-performance efficiency in parallel and distributed systems. The efficiency gap shown in Figure 1 could be reduced in two ways. Theoretically, improving performance efficiency would cause the red line to move upward. Improving power efficiency while maintaining performance would reduce theoretical peak performance when it is not required and thereby lower the green line without affecting the red line. Distributed applications that suffer from poor performance efficiency despite aggressive optimizations are candidates for increased power efficiency.

POWER-AWARE COMPUTING

Simulators are available to evaluate the power consumption of applications on emerging systems at the circuit-design level, such as Hspice (www.synopsys.com/products/mixedsignal/hspice/hspice.html), and at the architectural level—for example, Wattch,5 SimplePower,6 and SoftWatt.7 Direct measurement at the system level is more difficult; PowerScope,8 Castle,9 and other tools attempt to do so using contact resistors, multimeters, and software tools. Indirect approaches estimate power consumption based on hardware counter data, but these options focus on microarchitecture power and energy consumption and are limited in the context of distributed scientific applications.

Hard versus soft constraints

In direct contrast to low-power computing, performance is a hard constraint and power is a soft constraint in high-performance computing. The low-power approach is appropriate for designing systems that run on limited power supplies such as PDAs and laptops but not for overcoming most reliability and cost constraints in high-performance systems. Such attempts attain reasonable performance only insofar as they increase the number and improve the efficiency of low-power components—for example, IBM’s Blue Gene/L uses more than 100,000 processors and increases the throughput of floating-point operations in the embedded PowerPC architecture.

Power modes

Power-aware features provide opportunities for more innovative energy-saving techniques. Integrated power-aware components in high-performance systems provide fine-grained control over power consumption. Such components have various programmable energy settings or modes: PCs and monitors stand by and shut down automatically, disk drives spin down when not in use, network interface cards have low-power settings, many commercial microprocessors can dynamically scale down clock rates,10 and memory power can be conserved separately.11

The challenge of power-aware computing is to reduce power and energy consumption while maintaining good performance. Power modes affect a device’s rate of electrical consumption. Operating in low-power modes over time can conserve energy, but performance usually decreases during mode transition and low-power operation.

The energy a device consumes reflects the interaction of the application workload, the device’s internal structure, and power-mode timing. Determining the optimal duration and frequency of power modes requires predicting the effects of mode-transition penalties and low power on an application’s time to solution.
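As a hedged sketch of that reasoning (the symbols are ours, not the article’s): dropping from power P_high to P_low during a slack period of length t_slack saves energy only if the power gap outweighs the one-time transition cost, and only if the slack comfortably exceeds the transition delay in each direction.

```latex
\Delta E \approx (P_{\mathrm{high}} - P_{\mathrm{low}})\, t_{\mathrm{slack}} - E_{\mathrm{trans}} > 0,
\qquad t_{\mathrm{slack}} \gg 2\, t_{\mathrm{trans}}
```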

Dynamic voltage scaling

Dynamic voltage scaling (DVS) makes it possible to scale a processor’s frequency, thereby reducing CPU performance and conserving system power. Researchers have studied power-performance tradeoffs extensively using DVS in the context of real-time systems and interactive mobile applications. For example, Luca Benini and Giovanni De Micheli provide a tutorial on system-level optimization techniques including DVS,12 while Chung-Hsing Hsu and Ulrich Kremer have inserted DVS control calls at compile time in noninteractive, memory-bound applications.13

DVS techniques that exploit memory latency for power savings are very promising in the context of scientific applications because performance impact is controllable and minimal (between 0.69 and 6.14 percent for Spec95 codes), and energy savings are significant (between 9.17 and 55.65 percent).
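The leverage behind DVS comes from the standard CMOS dynamic-power approximation (a textbook relation, not a result from the article): dynamic power scales with the square of supply voltage and linearly with clock frequency, and attainable frequency scales roughly with voltage, so lowering the voltage/frequency pair cuts power roughly cubically while memory-bound execution time, set by memory latency rather than clock rate, barely changes.

```latex
P_{\mathrm{dynamic}} \approx C_{\mathrm{eff}}\, V^{2} f, \qquad f \propto V
\;\Rightarrow\; P_{\mathrm{dynamic}} \propto f^{3}
```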

Distributed systems

Energy savings are possible with controllable, minimal performance loss in interactive cluster-based commercial servers. These power-aware distributed clusters14 react to workload variations. When full nodes can be shut down, energy savings are significant. These approaches, however, do not meet the needs of scientific applications.

Interactive applications in Web servers are independent tasks that can be scheduled arbitrarily upon arrival. Soft real-time deadlines on task completion allow further flexibility. Power-related policy decisions are centralized in the distributed scheduler, which determines utilization using system-level data from the /proc file system and models system load using arrival distributions or predicts it based on past events. Matching system resources to anticipated requests enables power savings in interactive systems.
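For concreteness, the sketch below (ours, not code from the cited server work) shows one way a scheduler might derive CPU utilization from /proc on Linux by sampling the aggregate counters in /proc/stat twice; counters beyond the first seven fields are ignored.

```c
#include <stdio.h>
#include <unistd.h>

/* Read the aggregate "cpu" line from /proc/stat; later fields (steal, guest)
 * are ignored for simplicity. Returns 0 on success, -1 on failure. */
static int read_cpu_times(unsigned long long *idle, unsigned long long *total)
{
    unsigned long long user, nice, sys, idl, iowait, irq, softirq;
    FILE *fp = fopen("/proc/stat", "r");

    if (!fp)
        return -1;
    if (fscanf(fp, "cpu %llu %llu %llu %llu %llu %llu %llu",
               &user, &nice, &sys, &idl, &iowait, &irq, &softirq) != 7) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *idle = idl + iowait;
    *total = user + nice + sys + idl + iowait + irq + softirq;
    return 0;
}

int main(void)
{
    unsigned long long idle0, total0, idle1, total1;

    if (read_cpu_times(&idle0, &total0) != 0)
        return 1;
    sleep(1);                              /* sample again one second later */
    if (read_cpu_times(&idle1, &total1) != 0)
        return 1;

    /* Utilization = fraction of non-idle time over the sampling interval. */
    double busy = (total1 > total0)
        ? 1.0 - (double)(idle1 - idle0) / (double)(total1 - total0)
        : 0.0;
    printf("CPU utilization: %.1f%%\n", 100.0 * busy);
    return 0;
}
```

A server-side policy would sample these counters continuously and feed the measured utilization into its load model rather than printing it.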

Distributed scientific applications are noninteractive and often run as batch jobs. Because tasks are usually interdependent and have a user-defined synchronization scheme, future resource utilization cannot typically be predicted based on past observations. Given the low potential for idle nodes, power savings must be obtained by other means such as DVS. As time is critical in high-performance computing, power optimizations must not impact performance significantly.

POWERPACK FRAMEWORK

Reducing energy consumption requires having information about the significant details of an application’s performance and synchronization scheme. Centralized control is possible but, given the additional costs in latency to communicate power-aware control information across a distributed system, decentralized control is preferable. To meet these demands, we have created PowerPack, a framework for profiling, analyzing, and controlling energy consumption in parallel scientific applications.

As Figure 2 shows, PowerPack 1.0 combines direct power measurements using multimeters, smart power strips, and ACPI (advanced configuration and power interface)-enabled power supplies with software data collection, analysis, and control. With the framework, developers can profile component performance and power and throttle the CPU during distributed workload slack time to conserve energy while maintaining good performance.

In our profiling experiments, we measured individual power consumption of four node components—CPU, memory, disk, and network interface card—on slave nodes in a 32-node Beowulf cluster.15 We used an older machine because power-related experiments can damage components. Each slave node has one 933-MHz Intel Pentium III processor, four 256-Mbyte synchronous DRAM modules, one 15.3-Gbyte IBM DTLA-307015 DeskStar hard drive, and one Intel 82559 Ethernet Pro 100 onboard Ethernet controller.

In these studies, CPU power (about 30 watts) accounts for roughly 40 percent of nodal system power under load. On newer 130-watt Itanium machines, our techniques would result in greater energy savings because the CPU accounts for a larger percentage of total system power.

Figure 2. PowerPack 1.0. The framework combines direct power measurements using multimeters, smart power strips, and ACPI-enabled power supplies with software data collection, analysis, and control. [Diagram components include: AC power from the outlet, BayTech power strips and a BayTech management unit, a direct-power-measurement node, multimeters on the DC lines from the power supply with multimeter threads and a multimeter control thread (hardware power/energy profiling), PowerPack libraries (profile/control) with microbenchmarks and applications for data collection, DVS threads and a dynamic voltage scaling control thread (software power/energy control), and power/energy and performance profiling data flowing to a data log, data analysis, and a data repository for the high-performance, power-aware cluster.]

PowerPack library routines let users embed ACPI calls in applications and reduce the CPU’s processing speed via DVS. While in a lower voltage state, the processor consumes less power and performance suffers. To maintain good performance in scientific applications, PowerPack can throttle down CPU speed during idle or slack periods when memory- or communication-bound phases, not CPU speed, constitute the application bottleneck.
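PowerPack’s library interface is not reproduced in the article; purely as an illustration of the underlying mechanism, the sketch below sets a CPU clock rate through the Linux cpufreq sysfs files. It assumes the userspace cpufreq governor is active and the process may write to sysfs, and the function name set_cpu_speed is ours, a stand-in for (not a copy of) PowerPack’s ACPI-based calls.

```c
#include <stdio.h>

/* Request a new clock rate (in kHz) for one CPU through the Linux cpufreq
 * sysfs interface. Assumes the "userspace" governor is selected and the
 * caller has write permission. Returns 0 on success, -1 on failure. */
int set_cpu_speed(int cpu, unsigned long khz)
{
    char path[128];
    FILE *fp;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
    fp = fopen(path, "w");
    if (!fp)
        return -1;
    fprintf(fp, "%lu\n", khz);
    fclose(fp);
    return 0;
}

/* Typical use around a slack phase, on the 600 MHz to 1.4 GHz range
 * discussed later in the article:
 *     set_cpu_speed(0, 600000);    throttle down before the slack phase
 *     ...communication- or memory-bound work...
 *     set_cpu_speed(0, 1400000);   restore full speed afterward
 */
```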

BENCHMARK EXAMPLE

By carefully applying PowerPack, it is possible to achieve total system energy savings in scientific applications with minimal performance impact. We illustrate our power-aware approach with the Fourier transform (FT) benchmark from NASA’s Numerical Aerodynamic Simulation (NAS) parallel benchmark suite.

FT code is communication bound and generally performs poorly on distributed systems due to butterfly data exchanges that rely heavily on message passing interface (MPI) all-to-all communication. Other NAS benchmarks, such as embarrassingly parallel (EP) code, do not conserve energy due to high computational efficiency, or, like conjugate gradient (CG) code, offer significant energy savings but involve complex tradeoffs.16

Performance and power profiles

We first obtained performance and power profiles of FT on PowerPack to quantify the power and energy savings possible for a given application. Figure 3a shows a Jumpshot visualization of FT performance on four nodes using the MPI Parallel Environment tool from the MPICH implementation. A single FT performance cycle or phase consists of four basic steps: compute, reduce, compute, all-to-all. Figure 3b shows power profiles of FT on one of four nodes. All-to-all communication dominates execution time and results in long CPU idle periods, indicating potential for energy savings.

Figure 3. NAS FT benchmark performance and power profiles using PowerPack. (a) Parallel performance on four nodes (compute, reduce, compute, and all-to-all steps across nodes 1-4). (b) Power consumption on one of four nodes (CPU, memory, disk, and NIC power in watts over roughly 200 seconds, spanning startup, initialization, and three iterations).

To conserve energy with code like FT, the available slack periods must be long enough to outweigh the overhead for moving into and out of DVS power modes. For example, using PowerPack the switching overhead on our DVS-enabled system—composed of 16 1.4-GHz Pentium-M-based laptops—is about 40 microseconds. Such large penalties imply that DVS should be used sparingly. By observing the FT performance and power profiles, we determined that potential energy savings were high using function-level granularity—effectively wrapping calls to throttle the CPU down and back up around the MPI all-to-all communications.
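A hedged sketch of what that function-level wrapping could look like using the standard MPI profiling interface (PMPI): the wrapper below intercepts MPI_Alltoall, throttles down, calls the real routine, and throttles back up. The set_cpu_speed helper and the two frequencies are the illustrative values from the earlier sketch, not PowerPack’s actual API.

```c
#include <mpi.h>

/* Hypothetical helper from the earlier cpufreq sketch. */
int set_cpu_speed(int cpu, unsigned long khz);

/* Intercept MPI_Alltoall via the PMPI profiling interface: run the
 * communication-bound phase at low frequency, then restore full speed. */
int MPI_Alltoall(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 MPI_Comm comm)
{
    int rc;

    set_cpu_speed(0, 600000);               /* throttle down to 600 MHz    */
    rc = PMPI_Alltoall(sendbuf, sendcount, sendtype,
                       recvbuf, recvcount, recvtype, comm);
    set_cpu_speed(0, 1400000);              /* throttle back up to 1.4 GHz */
    return rc;
}
```

Linking such a wrapper ahead of the MPI library would apply the throttling without modifying application source, in the spirit of the transparent approaches discussed later under future work.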

Saving energy without impacting performance

Figure 4 shows the results of throttling down CPU speed for FT on our system. The bars show the normalized values for delay and energy of an internal algorithm that uses phased DVS scheduling based on the previously obtained performance and power profiles, external settings fixed for the duration of application execution (600-1,400 MHz), and cpuspeed—an automatic history-based daemon. The results are normalized to the best performing “machine”—in this case, all nodes running at 1,400 MHz without any DVS.

Figure 4. Normalized delay and energy values for the NAS FT benchmark on PowerPack. Delay greater than 1 indicates performance loss; energy consumption less than 1 indicates energy savings. (Configurations compared: cpuspeed, fixed settings from 600 to 1,400 MHz, and the internal algorithm.)

The figure shows that with PowerPack significant energy savings can be achieved for code such as FT. The amount of total system energy saved varies from 5 to 40 percent, while execution time increases from 0 to 15 percent. The internal scheduling algorithm achieves the most energy savings—about 36 percent—with less than 1 percent performance impact. Another 2 percent energy can be saved by setting all machines to the lowest operating point (600 MHz), but this increases execution time by more than 15 percent.

The results of cpuspeed and the static settings from 800 to 1,200 MHz show that the energy-performance tradeoff is more complicated than simply selecting a minimum value in each group. Users must determine the desired energy savings and the performance they are willing to sacrifice to achieve it.

COST IMPLICATIONS

Based on current trends in supercomputer design, system operating budgets will eventually rival hardware costs. Consider, for example, the NEC Earth Simulator, which has an estimated hardware price tag of $300 million. Assuming 20 percent of peak power consumption (12 megawatts), its annual operational cost exceeds $2 million ($240 per hour)—less than 1 percent of acquisition cost. However, the annual operating cost of a future 100-megawatt Pflops system, ignoring inflation, would be about $17 million; assuming a generous $300 million budget, this is equivalent to nearly 6 percent of acquisition cost.
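Both figures follow from the $100-per-megawatt-hour rate assumed earlier:

```latex
0.2 \times 12\,\mathrm{MW} \times \$100/\mathrm{MWh} = \$240/\mathrm{hour}
  \approx \$2.1\ \mathrm{million/year} \approx 0.7\%\ \mathrm{of}\ \$300\ \mathrm{million}
0.2 \times 100\,\mathrm{MW} \times \$100/\mathrm{MWh} = \$2{,}000/\mathrm{hour}
  \approx \$17.5\ \mathrm{million/year} \approx 6\%\ \mathrm{of}\ \$300\ \mathrm{million}
```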

On our Beowulf system, which consumes about 1.9 kilowatts under load at a cost of about $5 per day or $1,825 per year, 30 percent energy savings translates to a $550 annual cost savings. On a 100-megawatt Pflops system consuming 20 percent peak power at a cost of $2,000 per hour, or $17 million per year (ignoring air-cooling costs), this would result in a $5 million annual cost savings. Even half this amount would significantly reduce the operating budget of most US supercomputer facilities while maintaining the high performance that demanding scientific applications require.

FUTURE WORK

PowerPack is the first framework to enable application-driven power measurement and management at the component and cluster levels for scientific applications. We have successfully ported its tools and techniques to dual-processor AMD and Intel server-class machines, and we are now in the process of porting PowerPack to a cluster of AMD dual-processor, dual-core servers. Nonetheless, the software is in its early stages.

PowerPack has helped us identify how and when to conserve energy in scientific applications without significantly impacting performance. However, our manual approach requires user intervention and understanding of application performance to be effective. Parallel programming is already difficult, and legacy applications, which would need to be recompiled to conserve energy, will not benefit.

We are currently exploring ways to transparently automate DVS control for scientific applications. For example, we have created a power-aware MPI that transparently throttles DVS during MPI communications; however, communication frequency and power transition costs typically reduce performance considerably. Our current trace-based approach can be automated with existing parallel tools but is not transparent. We are therefore incorporating our lognP model17 of parallel computation into PowerPack to predict runtime performance and thereby transparently automate DVS throttling.

Power-aware components such as disks, network cards, and memory offer additional opportunities for power and energy savings in clusters. However, challenges arise in coordinating the use of these components to conserve energy without seriously affecting performance. Researchers have shown that the effects of coordinating multiple components for interactive workloads are often unpredictable.18 We plan to explore the tradeoffs of this approach in the future.

Critics of our power-aware approach may argue that operating cost and reliability are not critically important because supercomputers are primarily designed for high performance; slowing peak speed down to conserve power and cost is thus tantamount to designing a lower-performance machine. However, such logic is flawed.

The performance diversity of applications today implies that peak performance is sometimes but not always needed. Further, the number of system components is increasing nearly exponentially and component reliability is not improving as rapidly, making hardware failures more common and large systems less useful. Finally, operating cost and reliability are critical design factors if they prohibit creation of the highest-performance machine possible using commodity components under budget and space constraints.

Today, supercomputers are built from components that offer the highest performance from the commodity marketplace. We cannot rely solely on hardware to reduce the power consumed by cluster-based systems. Increasingly, these components have power-aware features. Only by exploiting these features using system software such as PowerPack is it possible to address the power-performance efficiency gap and maintain exponential increases in parallel application performance.

Acknowledgment

PowerPack is an open source project sponsored by the National Science Foundation (CCF #0347683) and the US Department of Energy (DE-FG02-04ER25608). Source code is available for download at the SCAPE Laboratory Web site (http://scape.cs.vt.edu).

References

1. J.C. Phillips et al., “NAMD: Biomolecular Simulation on Thousands of Processors,” Proc. 2002 ACM/IEEE Conf. Supercomputing, IEEE CS Press, 2002, p. 36.
2. F.S. Preston, A Petaflops Era Computing Analysis, tech. report CR-1998-207652, NASA, Mar. 1998; http://adipor.members.epn.ba/p2/NASA.pdf.
3. D.H. Bailey, “Performance of Future High-End Computers,” presentation, DOE Mission Computing Conf., 2003; http://perc.nersc.gov/docs/Workshop_05_03/dhb-future-perf.pdf.
4. S.C. Jardin, ed., DOE Greenbook: Needs and Directions in High Performance Computing for the Office of Science, tech. report PPPL-4090, Princeton Plasma Physics Lab., June 2005; www.nersc.gov/news/greenbook/N5greenbook-print.pdf.
5. D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A Framework for Architectural-Level Power Analysis and Optimizations,” Proc. 27th Ann. Int’l Symp. Computer Architecture, IEEE CS Press, 2000, pp. 83-94; www.eecs.harvard.edu/~dbrooks/isca2000.pdf.
6. M.J. Irwin, M. Kandemir, and N. Vijaykrishnan, “SimplePower: A Cycle-Accurate Energy Simulator,” IEEE CS Technical Committee on Computer Architecture Newsletter, Jan. 2001; http://tab.computer.org/tcca/NEWS/jan2001/irwin.pdf.
7. S. Gurumurthi et al., “Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach,” Proc. 8th Int’l Symp. High-Performance Computer Architecture, IEEE CS Press, 2002, pp. 141-150; www.cs.virginia.edu/~gurumurthi/papers/gurumurthi_softwatt.pdf.
8. J. Flinn and M. Satyanarayanan, “PowerScope: A Tool for Profiling the Energy Usage of Mobile Applications,” Proc. 2nd IEEE Workshop Mobile Computing Systems and Applications, IEEE CS Press, 1999, pp. 2-10.
9. R. Joseph and M. Martonosi, “Run-time Power Estimation in High-Performance Microprocessors,” Proc. 2001 Int’l Symp. Low-Power Electronics and Design, ACM Press, 2001, pp. 135-140; http://parapet.ee.princeton.edu/papers/rjoseph-islped2001.pdf.
10. Intel Corp., Intel 80200 Processor Based on Intel XScale Microarchitecture, developer’s manual, Mar. 2003; www.intel.com/design/iio/manuals/27341103.pdf.
11. D. Lammers, “IDF: Mobile Rambus Spec Unveiled,” EE Times Online, 25 Feb. 1999; www.eet.com/story/OEG19990225S0016.
12. L. Benini and G. De Micheli, “System-Level Power Optimization: Techniques and Tools,” ACM Trans. Design Automation of Electronic Systems, vol. 5, no. 2, 1999, pp. 115-192.
13. C-H. Hsu and U. Kremer, “The Design, Implementation, and Evaluation of a Compiler Algorithm for CPU Energy Reduction,” Proc. ACM SIGPLAN 2003 Conf. Programming Language Design and Implementation, ACM Press, 2003, pp. 38-48.
14. E.V. Carrera, E. Pinheiro, and R. Bianchini, “Conserving Disk Energy in Network Servers,” Proc. 17th Int’l Conf. Supercomputing, ACM Press, 2003, pp. 86-97.
15. X. Feng, R. Ge, and K.W. Cameron, “Power and Energy of Scientific Applications on Distributed Systems,” Proc. 19th IEEE Int’l Parallel and Distributed Processing Symp., IEEE CS Press, 2005, p. 34.
16. K.W. Cameron, X. Feng, and R. Ge, “Performance- and Energy-Conscious Distributed DVS Scheduling for Scientific Applications on Power-Aware Clusters,” to appear in Proc. 2005 ACM/IEEE Conf. Supercomputing, IEEE CS Press, 2005.
17. K.W. Cameron and R. Ge, “Predicting and Evaluating Distributed Communication Performance,” Proc. 2004 ACM/IEEE Conf. Supercomputing, IEEE CS Press, 2004, p. 43.
18. X. Fan, C.S. Ellis, and A.R. Lebeck, “The Synergy between Power-Aware Memory Systems and Processor Voltage Scaling,” Proc. 3rd Int’l Workshop Power-Aware Computing Systems, LNCS 3164, Springer-Verlag, 2003, pp. 164-179; www.cs.duke.edu/~alvy/papers/dvs-pacs03.pdf.

Kirk W. Cameron is an associate professor in the Department of Computer Science and director of the Scalable Performance (SCAPE) Laboratory at Virginia Polytechnic Institute and State University. His research interests include high-performance and grid computing, parallel and distributed systems, computer architecture, power-aware systems, and performance evaluation and prediction. Cameron received a PhD in computer science from Louisiana State University. He is a member of the IEEE and the IEEE Computer Society. Contact him at [email protected].

Rong Ge is a PhD candidate in the Department of Computer Science and a researcher at the SCAPE Laboratory at Virginia Tech. Her research interests include performance modeling and analysis, parallel and distributed systems, power-aware systems, high-performance computing, and computational science. Ge received an MS in fluid mechanics from Tsinghua University, China, and an MS in computer science from the University of South Carolina. She is a member of the ACM and Upsilon Pi Epsilon. Contact her at [email protected].

Xizhou Feng is a PhD candidate in the Department of Computer Science and Engineering at the University of South Carolina and a researcher at the SCAPE Laboratory at Virginia Tech. His research interests include parallel and distributed systems, high-performance and grid computing, algorithms, bioinformatics, and software engineering. Feng received an MS in engineering thermophysics from Tsinghua University. He is a member of the IEEE and the IEEE Computer Society. Contact him at [email protected].
