3rd International Conference on Computer and Knowledge Engineering (ICCKE 2013), October 31 & November 1, 2013, Ferdowsi University of Mashhad

Energy-aware Scheduling Algorithm for Precedence-Constrained Parallel Tasks

of Network-intensive Applications in a Distributed Homogeneous Environment

Vahid Ebrahimirad
Dept. of Computer Engineering
Sharif University of Technology, Iran
ebrahimirad@ce.sharif.ir

Aboozar Rajabi
School of Electrical and Computer Eng.
University of Tehran, Iran
ab.rajabi@ut.ac.ir

Maziar Goudarzi
Dept. of Computer Engineering
Sharif University of Technology, Iran
goudarzi@sharif.ir

Abstract—A wide range of scheduling algorithms used in data centers have traditionally concentrated on enhancement of performance metrics. Recently, with the rapid growth of data centers in terms of both size and number, power consumption has become a major challenge for both industry and society. At the software level, energy-aware task scheduling is an effective technique for power reduction in data centers. However, most of the currently proposed energy-aware scheduling approaches pay attention only to the computation cost. In other words, they ignore the energy consumed by the network equipment, namely the communication cost. In this paper, the problem of scheduling precedence-constrained parallel tasks of network-intensive applications on homogeneous physical machines in data centers is addressed. The proposed Energy-Aware Scheduling algorithm (EASy) takes both the computation cost and the communication cost into consideration with a low time complexity of O(n log n + z((e + n)mv)). The algorithm reduces the energy consumption of computation and communication by dynamic voltage frequency scaling (DVFS) and task packing, respectively. The goal of EASy is to minimize the completion time besides the energy consumption of the data center. Extensive experimental results using both synthetic benchmarks and real-world applications clearly demonstrate that EASy is capable of decreasing the energy consumption of physical machines and network devices by 4.5% and 15.06% on average, respectively.

Keywords—Energy-aware scheduling; Communication-awareness; Precedence-constrained parallel application; List Scheduling; Dynamic Voltage Frequency Scaling (DVFS)

I. INTRODUCTION

Precedence-constrained parallel applications have been used mostly in engineering and scientific fields. The problem of scheduling such applications with respect to make-span and time complexity has been studied extensively [27][32-36]. Recently, due to the impact of energy on operational cost and its destructive environmental effects, energy consumption has also been considered (e.g. [16-18][28]). In 2006, data centers consumed about 61 billion kWh in the United States, roughly 1.5% of the total electricity consumption. In addition, trends show that power consumption keeps growing at 18% per year [2]. A few more years of neglect would turn the energy problem into a crucial challenge. According to a McKinsey report, the total estimated energy bill for data centers, which is the most important part of their expenses, was $11.5 billion in 2010. Also, the energy cost in a typical data center doubles every five years [1].

While energy consumption has been improved by recent advances in hardware technologies (e.g. [5-10]), there are still serious concerns about green computing. The amount of energy consumed by the computing and auxiliary hardware resources is considerably affected by their usage patterns. This fact demonstrates that the development of software-level energy-aware task scheduling techniques is essential.

There are several techniques for reducing power consumption in data centers, such as dynamic voltage frequency scaling (DVFS), resource hibernation, PowerNap [19] and memory optimization. Among the techniques applied to processors, DVFS is the most common one for saving energy in computer systems (e.g., [20-24]). Also, it is available on most modern processors (e.g. [25,26]). Therefore, we propose a scheduling algorithm that uses DVFS and applies it when a task can take more time to execute. Although many algorithms and strategies have been developed, the scope of most of them is restricted to battery-powered embedded systems [11], applications with independent tasks [12,13,14] or single-processor systems [15]. In addition, most scheduling algorithms consider merely the computation cost and ignore the energy consumed by communications among the physical machines (PM) [16-18].
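To illustrate why DVFS pays off when a task has slack, the sketch below uses the standard CMOS dynamic-power relation P = A·C·V²·f (the same model adopted later in Section III) to compare running a task at full voltage/frequency against a scaled-down setting that still meets its deadline. The voltage/frequency operating points and constants are hypothetical illustrative values, not taken from the paper.

```python
# Minimal sketch of the DVFS trade-off, assuming P = A * C * V^2 * f.
# The (voltage, frequency) operating points below are illustrative only.
A, C = 0.5, 1.0e-9          # switching activity, capacitance (assumed)
cycles = 2.0e9              # work of the task in CPU cycles (assumed)
deadline = 2.5              # seconds of slack available (assumed)

def energy_and_time(V, f):
    t = cycles / f                      # execution time at frequency f
    return A * C * V**2 * f * t, t      # dynamic energy and execution time

for V, f in [(1.2, 2.0e9), (0.9, 1.0e9)]:   # full speed vs. scaled down
    E, t = energy_and_time(V, f)
    ok = "meets" if t <= deadline else "misses"
    print(f"V={V} V, f={f/1e9} GHz -> E={E:.3f} J, t={t:.2f} s ({ok} deadline)")
```

Because the dynamic energy scales with V², lowering the voltage (and the frequency with it) saves energy as long as the stretched execution time still fits inside the task's slack.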

As reported by [3], the energy consumption of PMs and network devices (ND) is about 40% of a data center's overall energy consumption. The case for network hardware becomes even more crucial when we notice that about one-third of the total energy consumed by IT is related to communication links, switching and aggregation elements [4]. Therefore, to reduce energy consumption, the energy consumed by both PMs and NDs should be considered together.

This paper investigates the problem of scheduling precedence-constrained parallel tasks of network-intensive applications on homogeneous PMs. We propose the Energy-Aware Scheduling (EASy) algorithm, which takes into account not only the make-span but also the energy consumption of both PMs and NDs. The algorithm is an extension of the well-known Energy-Conscious Scheduling (ECS) algorithm [16]. The goal of EASy is to minimize the completion time besides the energy consumption of the data center. The energy reduction of computation and communication is achieved by dynamic voltage frequency scaling (DVFS) and task packing, respectively.

In summary, the contributions of this paper are:

• Proposing a scheduling algorithm that considers the energy consumption of NDs as well as minimizing energy consumption of PMs and completion time.


• Reducing communication cost among PMs without increasing the time complexity.

The remainder of this paper is organized as follows. Section 2 presents an overview of the state of the art in the literature. Section 3 describes the data center, application and energy models. The scheduling algorithm is presented in Section 4, followed by the performance evaluation results of the proposed algorithm in Section 5. Finally, the conclusion and future works are reviewed in Section 6.

II. RELATED WORKS

In this section, we present the notable algorithms for scheduling precedence-constrained parallel tasks, particularly on distributed PMs. In addition, the well-known ECS scheduling algorithm for heterogeneous distributed PMs, which we have extended in our work, is discussed.

Scheduling a set of precedence-constrained parallel tasks is an NP-complete problem [29]. Although heuristic algorithms [46] are the most popular scheduling solutions, since they solve the scheduling problem in polynomial time, meta-heuristic approaches are also utilized [47][48]. Precedence-constrained parallel task scheduling has been studied with different assumptions and goals. Traditionally, researchers have presented their algorithms with two objectives: 1) low time complexity and 2) minimum completion time. Nowadays, a third objective has also been considered: researchers pay attention to the energy consumption of PMs and try to minimize it. However, the main objective of most works remains low time complexity, after which the remaining objectives must be balanced against each other.

The most common solution for scheduling tasks is the list-scheduling heuristic, in which tasks are sorted into a list and then picked from the sorted list and assigned to PMs one by one. In other words, list-scheduling algorithms consist of two typical phases, namely task prioritization and processor selection. Two well-known list-based scheduling algorithms are Heterogeneous Earliest Finish Time (HEFT) and Energy-Conscious Scheduling (ECS). The former does not consider the energy consumption of PMs, but the latter does.

Topcuoglu et al. [27] have presented a scheduling algorithm with low time complexity O(n²m) that aims to schedule the tasks of an application so as to shorten the completion time. The HEFT algorithm first sorts tasks based on their b-level values and then selects the task which minimizes its earliest finish time, using an insertion-based approach that does not violate the precedence constraints.

Y. C. Lee and A. Y. Zomaya [16] have proposed an energy-aware scheduling algorithm that uses DVFS. ECS is a list-scheduling algorithm that sorts tasks based on b-level values before assigning them to distributed PMs and then schedules tasks via an objective function that aims to minimize the energy consumption of PMs and the completion time.

Besides list-based scheduling algorithms, duplication-based algorithms are also well-known approaches to the scheduling problem. These algorithms usually have much higher time complexity than the list-based algorithms. The Task Duplication Based Bottom-Up Scheduling Algorithm (DBUS) and Energy Aware Scheduling by Minimizing Duplication (EAMD) are two scheduling algorithms that solve the scheduling problem with duplication-based heuristics.

D. Bozdag et al. [30] have proposed a duplication-based scheduling algorithm that minimizes the completion time. DBUS performs a CP-based listing of tasks and assigns them to PMs with task duplication and insertion. In contrast to traditional approaches, DBUS traverses the DAG in a bottom-up fashion and does not impose any constraint on the number of task duplications.

J. Mei and K. Li [31] have presented a new energy-aware scheduling algorithm which considers the energy consumption of PMs as well as the completion time. EAMD removes the redundant task copies in the schedules generated by duplication-based algorithms and then reduces energy consumption by using DVFS. In addition to the above algorithms, there are several other notable scheduling algorithms (e.g. [32-38]), some of which consider the energy consumption of PMs and others do not.

To summarize this section, most of the related works on scheduling parallel tasks aim either to reduce energy consumption while maintaining performance or to reduce the completion time. Nevertheless, they do not consider the communication cost among tasks and the energy consumption of the network equipment. As a result, our study takes into account the energy consumption of both PMs and network equipment. This paper presents an energy-aware scheduling algorithm for precedence-constrained tasks of network-intensive applications that considers the communication cost.

III. UNDERLYING MODELS

In this section, the assumed underlying models of the target data center, the application and the energy are presented. These models are prerequisites for the explanation of the proposed energy-aware algorithm. The notations used to model the system are listed in Tab. I.

A. Data Center Model

The target system assumed in this study consists of M PMs. The processors of each PM can scale the input voltage up or down; in other words, they are DVFS enabled. When a processor is in idle mode, it still consumes energy, but at the lowest voltage. Each scaling of the input voltage requires a clock frequency transition. Since the time needed for a clock frequency transition is trivial in comparison with the execution time of tasks [39][40], we ignore this overhead in our study.

The data center used in this work is based on the three-tier data center topology, which is used in modern data centers. In this model, PMs are not fully interconnected and communicate at various speeds. The speed of communication among PMs depends on the network topology and is determined by the Communication Expense Matrix (CEM).


TABLE I. NOTATIONS

Name              Description
M                 Number of PMs
N                 Number of Nodes (Tasks)
T                 Number of Switches (NDs)
E                 Number of Edges (Connections)
CEM               Communication Expense Matrix
CCN               Communication Cost in Network
w_i               Computation cost of task i
c_i,j             Communication cost between tasks i and j
P_PMj             Power consumption of PM j
NE                Normalized Total Energy
A                 Amount of switching activity of a processor
C                 Total capacitance load of a processor
V_i               Supply voltage of task i
f                 Frequency of a processor
t_busy_i          Busy time i of an ND or PM
Rport_i           Rate of port i
num_linecard      Number of linecards in a switch
EST(n_i)          Earliest Start Time of task i
EFT(n_i)          Earliest Finish Time of task i
AST(n_i)          Actual Start Time of task i
AFT(n_i)          Actual Finish Time of task i
taskOnPM_i,j      1 if task i is assigned to PM j, otherwise 0
readyForRun_i,j   Time at which PM j is ready to execute task i
MEC_PM            Maximum Energy Consumption of PMs
MEC_ND            Maximum Energy Consumption of NDs
P_chassis         Power consumption of a switch's chassis
P_linecard        Power consumption of a switch's linecard
P_port_i          Power consumption of a switch's port i
t_idle_i          Idle time i of an ND or PM

The CEM is a two-dimensional matrix consisting of M rows and M columns. Each cell (i, j) ∈ CEM represents the communication expense between PM_i and PM_j. The cells of the CEM are elicited by PacketTracer [41], a simulator for network communications, and represent a multiplicative factor of the communication time for transmitting one bit. For example, a network topology including four PMs and its CEM is shown in Fig. 1. The cells of the CEM are relative and depend on each other. For example, cell (0, 1) is equal to 1 and cell (0, 3) is equal to 2 if transmitting one bit from PM0 to PM1 takes 32 time units and from PM0 to PM3 takes 64 time units.

Figure 1. Communication Expense Matrix (CEM) for four PMs


B. Application Model

In this research, we assume that the applications are network-intensive and formed by precedence-constrained parallel tasks. Such applications can be represented by a directed acyclic graph (DAG) [39]. In the DAG representation, the application is divided into a set of tasks, where each task depends on the results of the execution of other tasks; in other words, a task can start executing only when its parents have completed their execution. A DAG, G = (N, E), consists of a set N of n nodes and a set E of e edges. The predecessors of a task are called its parents, and its children are its successors. The entry task, n_entry, is a task that has no parents, and a task with no children is called the exit task, n_exit. A simple task graph is represented in Fig. 2.

Figure 2. A simple task graph

As depicted in Fig. 2, a weight c_i,j is associated with each edge of the DAG, representing the communication cost between tasks n_i and n_j (e.g., the volume of data to be transmitted from task n_i to task n_j). Moreover, each node has a weight w_i that represents the computation cost (e.g., the required execution time) of the task. As a result, if two tasks n_i and n_j are assigned to PMs m_p and m_q, their Communication Cost in Network (CCN) is defined as:

(1)
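The body of equation (1) is not legible in this copy. A natural reading, given that c_i,j is the data volume of the edge and CEM[p][q] is a multiplicative per-bit factor, is that the CCN scales the edge weight by the expense between the two hosting PMs (and is zero when both tasks share a PM). The sketch below encodes that assumed definition; it is an interpretation, not the paper's verbatim formula.

```python
# Assumed CCN definition: edge data volume scaled by the CEM factor
# between the PMs hosting the two tasks; zero if they share a PM.
def CCN(cem, c_ij, p, q):
    """Communication cost in the network for edge (i, j) when task i is
    on PM p and task j is on PM q, under the assumed definition above."""
    if p == q:
        return 0
    return c_ij * cem[p][q]
```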

C. Power Model

In this work, the power consumption consists of two parts: the power consumption of PMs and the power consumption of NDs (e.g. core switches, aggregation switches, and rack switches). Thus, we present two models for calculating the total power consumption in a data center. The power consumption of a PM is derived from the power consumption model of complementary metal-oxide semiconductor (CMOS) logic circuits. As a result, the total energy consumption of the PMs is defined as:

Total\_E_{PM} = \sum_{j=1}^{M} \sum_{n_i \in m_j} A \cdot C \cdot V_i^2 \cdot f \cdot w_i + \sum_{j=1}^{M} \sum_{idle_i \in IDLE_j} A \cdot C \cdot V_{lowest}^2 \cdot f \cdot t_{idle_i}   (2)
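To make the PM energy model concrete, the short sketch below evaluates equation (2) for a given schedule: each task contributes A·C·V_i²·f·w_i at its assigned voltage, and each idle interval contributes the same expression at the lowest available voltage. The data structures and sample numbers are assumptions for illustration.

```python
# Sketch of equation (2): dynamic energy of busy and idle periods on PMs.
A, C, f = 0.5, 1.0e-9, 1.0e9     # switching activity, capacitance, frequency (assumed)
V_LOWEST = 0.8                    # lowest supported supply voltage (assumed)

# schedule[pm] = list of (task_voltage, task_exec_time); idle[pm] = list of idle durations
schedule = {0: [(1.2, 10.0), (1.0, 5.0)], 1: [(1.2, 8.0)]}
idle     = {0: [3.0],                     1: [10.0]}

def total_e_pm(schedule, idle):
    busy = sum(A * C * v**2 * f * w for tasks in schedule.values() for v, w in tasks)
    idl  = sum(A * C * V_LOWEST**2 * f * t for ts in idle.values() for t in ts)
    return busy + idl

print(total_e_pm(schedule, idle))
```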

On the other hand, we take into account the energy consumption of communications among PMs. The power model for calculating the power consumption of NDs is defined as [42]:

P_{Switch\_busy} = P_{chassis} + P_{linecard} \cdot num_{linecard} + \sum_{p_i \in Ports} P_{port_i} \cdot Rport_i   (3)

P_{Switch\_idle} = P_{chassis\_idle} + P_{linecard\_idle} \cdot num_{linecard}   (4)

Thus, the total energy consumption for T switches is computed by:

Total\_E_{ND} = \sum_{i=1}^{T} P_{Switch\_busy_i} \cdot t_{busy_i} + \sum_{i=1}^{T} P_{Switch\_idle_i} \cdot t_{idle_i}   (5)
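Equations (3)-(5) can be read as a simple accounting over each switch's chassis, linecards and active ports. The sketch below implements them for a single switch class using the linecard (35 W) and per-port (about 1 W at maximum rate) figures quoted later in Section V; the remaining numbers are placeholders, not values from the paper.

```python
# Sketch of equations (3)-(5) for one or more switches.
P_CHASSIS, P_CHASSIS_IDLE = 150.0, 100.0   # placeholder chassis power (W)
P_LINECARD, P_LINECARD_IDLE = 35.0, 20.0   # 35 W per active linecard (idle value assumed)
P_PORT_MAX = 1.0                           # ~1 W per port at maximum rate

def p_switch_busy(num_linecards, port_rates):
    # port_rates: utilization of each active port in [0, 1]
    return P_CHASSIS + P_LINECARD * num_linecards + sum(P_PORT_MAX * r for r in port_rates)

def p_switch_idle(num_linecards):
    return P_CHASSIS_IDLE + P_LINECARD_IDLE * num_linecards

def total_e_nd(switches):
    # switches: list of (num_linecards, port_rates, t_busy, t_idle)
    return sum(p_switch_busy(n, ports) * tb + p_switch_idle(n) * ti
               for n, ports, tb, ti in switches)

print(total_e_nd([(2, [1.0, 0.5, 0.5], 60.0, 40.0)]))
```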

IV. SCHEDULING ALGORITHM

A. Problem Statement

The problem is to allocate tasks to the PMs such that the completion time and the energy consumption of PMs and NDs are minimized, while the precedence constraints are satisfied. Therefore, the goal of our work is defined as:

Minimize \{ Total\_E_{PM} + Total\_E_{ND},\ \max(AFT(n_{exit})) \}   (6)

Subject to:

AST(n_i) \geq \max_{n_j \in parents(n_i)} ( AFT(n_j) + CCN(m_p, m_q, n_i, n_j) )   (7)

\sum_{j=0}^{M} taskOnPM_{i,j} = 1, \quad i = 0..N   (8)

B. EASy Solution

As the problem has two objectives (minimizing energy consumption and completion time) which conflict with each other, an objective function is required to balance between them. As a result, an objective function is proposed that considers all objectives of the problem. Then, a global optimization phase is performed to further reduce the energy consumption of PMs and NDs.

Before the scheduling begins, the b-level values of all tasks in the task graph are computed, and the tasks are sorted into a scheduling list in decreasing order of their b-level values. Then, tasks are picked from the sorted list and assigned to PMs one by one. If LPath_i = {n_i, n_{i+1}, ..., n_exit} is the longest path from node n_i to the exit node, the b-level value of task n_i is defined as:

b\text{-}level_i = \sum_{n_j \in node(LPath_i)} w_j + \sum_{e_j \in edge(LPath_i)} c_j   (9)
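Since the b-level of a task is the length of its longest downward path (computation plus communication weights), it can be computed bottom-up over the DAG with memoization. A minimal sketch, assuming the DAG is given as adjacency lists with hypothetical node weights w and edge weights c:

```python
from functools import lru_cache

# Hypothetical DAG: children[i] lists (child, edge_comm_cost); w[i] is computation cost.
w = {0: 4, 1: 3, 2: 2, 3: 1}
children = {0: [(1, 5), (2, 2)], 1: [(3, 1)], 2: [(3, 4)], 3: []}

@lru_cache(maxsize=None)
def b_level(i):
    """Longest path (computation + communication weight) from task i to the exit task."""
    if not children[i]:
        return w[i]                       # exit task: only its own computation cost
    return w[i] + max(c + b_level(j) for j, c in children[i])

order = sorted(w, key=b_level, reverse=True)  # scheduling list in decreasing b-level order
print([(i, b_level(i)) for i in order])       # [(0, 14), (2, 7), (1, 5), (3, 1)]
```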

In addition, it is necessary to define three attributes, namely the Earliest Start Time (EST), the Earliest Finish Time (EFT) and the Task Communication Cost (TCC).


EST(n_i, m_j) = 0, if n_i = n_entry; otherwise
EST(n_i, m_j) = \max\{ readyForRun_{i,j},\ \max_{n_k \in parents(n_i)} ( AFT(n_k) + c_{k,i} ) \}   (10)

(11)

TCC(n_i, m_j) = \sum_{n_k \in parents(n_i),\ n_k \in m_x} CCN(m_j, m_x, n_i, n_k) + \sum_{n_k \in children(n_i),\ n_k \in m_x} CCN(m_j, m_x, n_i, n_k)   (12)
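The sketch below shows how EST (equation 10) and TCC (equation 12) would be evaluated for a candidate placement, reusing the assumed CCN idea from above. The data structures, parameter names and the treatment of communication costs are illustrative interpretations rather than the paper's exact code.

```python
def est(task, pm, ready_for_run, parents, aft, comm_cost):
    """Earliest start time of `task` on `pm` (equation 10).
    parents[task] -> list of parent tasks; aft[k] -> actual finish time of k;
    comm_cost[(k, task)] -> communication cost c_{k,task} for the PMs involved."""
    if not parents[task]:                       # entry task
        return 0.0
    arrive = max(aft[k] + comm_cost[(k, task)] for k in parents[task])
    return max(ready_for_run[(task, pm)], arrive)

def tcc(task, pm, placement, parents, children, c, cem):
    """Task communication cost of `task` if placed on `pm` (equation 12),
    summing CCN over parents and children already placed on other PMs."""
    total = 0.0
    for k in parents[task] + children[task]:
        other_pm = placement.get(k)
        if other_pm is not None and other_pm != pm:
            total += c[(min(task, k), max(task, k))] * cem[pm][other_pm]
    return total
```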

The proposed objective function, the Scheduling Decision Maker (SDM), determines which task should be assigned to which PM. A positive SDM value indicates the finding of a new best scheduling alternative. For a given task n_i, when the CCR (Communication to Computation Ratio) is equal to or less than 1, the SDM value of allocating the combination of PM m_j with input voltage v_k, relative to the best combination m' and v', is computed by (14).

(13)

(14)

When the CCR is more than one, the SDM is defined as:

SDM(n_i, m_j, v_k, m', v') =

\frac{EFT(n_i, m', v') - EFT(n_i, m_j, v_k)}{EFT(n_i, m_j, v_k) - \min\{EFT(n_i, m', v'), EFT(n_i, m_j, v_k)\}} \times CCR + \frac{E_{PM}(n_i, m', v') - E_{PM}(n_i, m_j, v_k)}{E_{PM}(n_i, m_j, v_k)}, \quad \text{if } TCC(n_i, m_j) < \varepsilon

\frac{EFT(n_i, m', v') - EFT(n_i, m_j, v_k)}{EFT(n_i, m_j, v_k) - \min\{EFT(n_i, m', v'), EFT(n_i, m_j, v_k)\}} \times \frac{1}{CCR} + \frac{E_{PM}(n_i, m', v') - E_{PM}(n_i, m_j, v_k)}{E_{PM}(n_i, m_j, v_k)}, \quad \text{if } TCC(n_i, m') < \varepsilon

\frac{TCC(n_i, m') - TCC(n_i, m_j)}{TCC(n_i, m_j)} + \frac{E_{PM}(n_i, m', v') - E_{PM}(n_i, m_j, v_k)}{E_{PM}(n_i, m_j, v_k)} + \frac{EFT(n_i, m', v') - EFT(n_i, m_j, v_k)}{EFT(n_i, m_j, v_k) - \min\{EFT(n_i, m', v'), EFT(n_i, m_j, v_k)\}}, \quad \text{otherwise}

(15)

As shown in Alg. I, tasks are scheduled and assigned to the PMs by the SDM value (steps 1-14). Since each allocation decision the SDM makes is a local optimum, an additional energy reduction technique is needed for reducing energy in both PMs and NDs. The Task Packing for Energy Reduction (TAPE) is presented for this purpose (steps 15-34). Therefore, tasks are scheduled by EASy in the first phase, and then TAPE optimizes the schedule without degrading the completion time or increasing the time complexity. In this paper, two attributes, the Maximum Energy Consumption of PMs (MEC_PM) and the Maximum Energy Consumption of NDs (MEC_ND), are used for normalizing the energy consumption of


PMs and NDs. Hence, the Normalized Total Energy (NTE) is defined as:

NTE = \frac{Total\_E_{PM}}{MEC_{PM}} + \frac{Total\_E_{ND}}{MEC_{ND}}   (16)
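The normalization in equation (16) is straightforward; a minimal sketch, assuming the four quantities are already known:

```python
def normalized_total_energy(total_e_pm, total_e_nd, mec_pm, mec_nd):
    """Equation (16): energies of PMs and NDs, each normalized by its maximum."""
    return total_e_pm / mec_pm + total_e_nd / mec_nd

print(normalized_total_energy(120.0, 45.0, 300.0, 90.0))  # placeholder values -> 0.9
```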

Before rescheduling the tasks with TAPE, the tasks are sorted into a scheduling list in decreasing order of their Communication Cost level (cc-level) values. The cc-level of a task n_i allocated to PM m_i is computed by:

cc\text{-}level_i = \sum_{n_k \in parents(n_i),\ n_k \in m_x} CCN(m_i, m_x, n_i, n_k) + \sum_{n_k \in children(n_i),\ n_k \in m_x} CCN(m_i, m_x, n_i, n_k)   (17)

Algorithm I. The pseudo-code of the EASy and TAPE algorithms

Phase 1: EASy Algorithm
Input: DAG, set of PMs, CEM
Output: S-p1 - an order of tasks onto PMs
1.  Sort ∀ n_i ∈ N in decreasing order by b-level values
2.  Foreach n_i ∈ N Do
3.    m' = m_0
4.    v' = v_0
5.    Foreach m_j ∈ PMs Do
6.      Foreach v_k ∈ Voltages Do
7.        if SDM(n_i, m_j, v_k, m', v') > SDM(n_i, m', v', m', v') then
8.          m' = m_j
9.          v' = v_k
10.       endif
11.     endForeach
12.   endForeach
13.   S-p1 ← allocate n_i to m' with voltage v'
14. endForeach

Phase 2: TAPE Algorithm
Input: MEC_PM, MEC_ND, S-p1
Output: S-p2 - an order of tasks onto PMs
15. Sort ∀ n_i ∈ N in decreasing order by b-level values
16. Foreach n_i ∈ N Do
17.   m' ← the PM to which n_i was allocated in phase 1
18.   v' ← the voltage at which n_i was allocated in phase 1
19.   e_best ← compute NTE for S-p1
20.   Remove n_i from S-p1
21.   Foreach m_j ∈ PMs Do
22.     Foreach v_k ∈ Voltages Do
23.       Add n_i to S-p1 on m_j with voltage v_k
24.       e_new ← compute NTE for S-p1
25.       if e_new < e_best and no increase in AFT(n_exit) then
26.         m' = m_j
27.         v' = v_k
28.         e_best ← e_new
29.       endif
30.       Remove n_i from S-p1
31.     endForeach
32.   endForeach
33.   S-p2 ← allocate n_i to m' with voltage v'
34. endForeach
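To show how the two phases fit together in an implementation, the sketch below mirrors the loop structure of Alg. I in Python: phase 1 greedily picks the (PM, voltage) pair with the best SDM score for each task in b-level order, and phase 2 re-places each task wherever the normalized total energy drops without delaying the exit task. The helper functions (sdm, compute_nte, makespan) and their signatures are placeholders for the machinery defined earlier in this section, not an actual released implementation.

```python
# Skeleton of the EASy (phase 1) + TAPE (phase 2) loops, assuming the helpers
# sdm(), compute_nte() and makespan() exist as described in the text.
def easy_tape(tasks_by_blevel, pms, voltages, sdm, compute_nte, makespan):
    schedule = {}                                  # task -> (pm, voltage)

    # Phase 1: greedy SDM-driven placement (steps 1-14 of Alg. I).
    for t in tasks_by_blevel:
        best_pm, best_v = pms[0], voltages[0]
        for pm in pms:
            for v in voltages:
                # A positive SDM value signals a better (pm, voltage) alternative.
                if sdm(t, pm, v, best_pm, best_v, schedule) > 0:
                    best_pm, best_v = pm, v
        schedule[t] = (best_pm, best_v)

    # Phase 2: TAPE re-placement without worsening the makespan (steps 15-34).
    deadline = makespan(schedule)
    for t in tasks_by_blevel:
        best_pm, best_v = schedule[t]
        e_best = compute_nte(schedule)
        for pm in pms:
            for v in voltages:
                schedule[t] = (pm, v)              # tentatively move the task
                if compute_nte(schedule) < e_best and makespan(schedule) <= deadline:
                    best_pm, best_v, e_best = pm, v, compute_nte(schedule)
        schedule[t] = (best_pm, best_v)            # keep the best placement found
    return schedule
```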

V. PERFORMANCE EVALUATION

In this section, we present our experimental results and demonstrate how our algorithm can effectively save energy with only marginal performance degradation. Before presenting the experimental results, the comparison metrics and the experimental settings are defined.

A. Evaluation Metrics

The completion time is the most common metric for the performance evaluation of task scheduling algorithms. In addition, energy consumption is used as a metric, since we focus on minimizing the energy consumed by the PMs and NDs. In this work, three normalized evaluation metrics are also considered, which


are the Schedule Length Ratio (SLR), the Energy Consumption Ratio of PMs (ECR_PM) and the Energy Consumption Ratio of NDs (ECR_ND). Formally, the SLR, ECR_PM and ECR_ND are defined as:

SLR = \frac{CompletionTime}{\sum_{n_i \in CP} w_i}   (18)

ECR_{PM} = \frac{EnergyConsumptionOfPMs}{\sum_{n_i \in CP} w_i \cdot P_{idlePM}}   (19)

ECR_{ND} = EnergyConsumptionOfNDs   (20)

where CP is the set of tasks on the critical path, i.e., the path with the largest total computation and communication cost from the root of the task graph.
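A small helper for the two fully-specified metrics, SLR (18) and ECR_PM (19); the inputs (critical-path computation costs, measured completion time, PM energy and idle power) are assumed to come from the simulator, and the printed numbers are placeholders.

```python
def slr(completion_time, cp_weights):
    """Schedule Length Ratio: makespan normalized by critical-path computation cost (eq. 18)."""
    return completion_time / sum(cp_weights)

def ecr_pm(energy_pms, cp_weights, p_idle_pm):
    """Energy Consumption Ratio of PMs (eq. 19)."""
    return energy_pms / (sum(cp_weights) * p_idle_pm)

print(slr(120.0, [10, 20, 30]))            # 2.0 with placeholder values
print(ecr_pm(9000.0, [10, 20, 30], 50.0))  # 3.0 with placeholder values
```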

B. Experimental Settings

Simulation is used for the performance evaluation of the proposed EASy+TAPE in comparison with ECS+MCER. Therefore, we have designed and implemented TaskGraphSim. This simulator is able to schedule given task graphs on a set of defined heterogeneous or homogeneous resources using various algorithms and different network topologies. The tool generates a report of the desired values, such as the completion time and the energy consumption of PMs and NDs separately.

For evaluating our algorithm, we executed two sets of benchmarks: synthetic benchmarks (STG) [43] and the real-world applications Laplace Equation Solver (LES) [44] and LU-Decomposition (LUD) [45]. The synthetic benchmarks consist of 597 task graphs with various characteristics, as represented in Tab. II. Also, we have generated 500 task graphs with different sizes (numbers of nodes) for the real-world benchmarks.

TABLE II. THE CHARACTERISTICS OF SYNTHETIC BENCHMARKS

DAG's characteristics       Values
Number of Nodes             {50, 100, 300, 500, 750}
Number of Predecessors      Max = 177, Ave = 16.57, Min = 0
Parallelism                 Max = 94.84, Ave = 10.33, Min = 1.56

Since the three-tier architecture is the most popular network topology in modern data centers [49], the three-tier data center topology is used for the target data center; it is shown in Fig. 3. We assume that the capacity of each rack is eight PMs and that each cluster contains four racks. Furthermore, all PMs are connected to a rack switch; each rack switch is linked to two aggregation switches, and each aggregation switch is connected to two core switches. In addition, the power settings of the switches [42] are tabulated in Tab. III. We also assume that each active linecard consumes 35 watts and each port consumes about 1 watt at maximum rate.
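For reference, the device counts implied by that configuration can be derived mechanically; the sketch below does so for a given number of PMs. The text states 8 PMs per rack and 4 racks per cluster; the counts of 2 aggregation switches per cluster and 2 shared core switches are illustrative assumptions beyond what the text fixes.

```python
import math

# Device counts for the simulated three-tier topology.
# Stated in the text: 8 PMs per rack, 4 racks per cluster.
# Assumed for illustration: 2 aggregation switches per cluster, 2 core switches total.
def topology_counts(num_pms, pms_per_rack=8, racks_per_cluster=4,
                    agg_per_cluster=2, core_switches=2):
    racks = math.ceil(num_pms / pms_per_rack)          # one rack switch per rack
    clusters = math.ceil(racks / racks_per_cluster)
    return {
        "rack_switches": racks,
        "aggregation_switches": clusters * agg_per_cluster,
        "core_switches": core_switches,
    }

print(topology_counts(128))  # e.g. 128 PMs -> 16 rack, 8 aggregation, 2 core switches
```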

TABLE III. POWER CONSUMPTION SUMMARY FOR NETWORK DEVICES

Type of Switch         Idle Power Consumption (W)    Maximum Power Consumption (W)
Rack Switch            198                           150
Aggregation Switch     656                           555
Core Switch            656                           555


Figure 3. Three-Tier Data Center Network Topology

Figure 4. Experimental Results of the LUD (average SLR, ECR_PM and ECR_ND versus CCR for ECS+MCER and EASy+TAPE)

Figure 5. Experimental Results of the LES (average SLR, ECR_PM and ECR_ND versus CCR for ECS+MCER and EASy+TAPE)

Figure 6. Experimental Results of the STG (average SLR, ECR_PM and ECR_ND versus CCR for ECS+MCER and EASy+TAPE)


C. Experimental Results

In this section, the performance of EASy+TAPE is compared with a well-known energy-aware algorithm called ECS+MCER. In particular, we show the behavior of our proposed algorithm for applications with various CCRs in comparison with ECS+MCER. Task graphs with different CCRs {0.1, 0.5, 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20} are generated and executed on different numbers of PMs {2, 4, 8, 16, 32, 64, 128}. Thus, the number of experiments conducted with the two algorithms (ECS+MCER and EASy+TAPE) is 137774 (i.e., 68887 for each algorithm). The large set of task graphs with various characteristics prevents bias towards one specific scheduling algorithm. A wide range of simulation results shows that EASy reduces the energy consumption of PMs and NDs by 4.5% and 15.06% respectively on average.

TABLE IV. EXPERIMENTAL RESULTS OF EASY+TAPE IN COMPARISON WITH ECS+MCER

Benchmark Name                LUD       LES       STG
Completion time               0.45%     -6.13%    -2.13%
Energy Consumption of PMs     4.46%     7.00%     2.03%
Energy Consumption of NDs     11.72%    22.25%    11.22%

The entire results obtained from the extensive simulations are summarized in Tab. IV. In addition, the results of the different benchmarks using the SLR, ECR_PM and ECR_ND metrics with respect to various CCRs are shown in Fig. 4, 5 and 6. As the figures show, the proposed algorithm behaves differently in different CCR ranges:

1) Low CCRs: When the CCRs are less than one, the communication cost is not considered in the objective function since the applications are processor-intensive, so the results do not show any improvement.

2) Medium CCRs: The communication costs are included in the SDM. EASy+TAPE activates fewer PMs than ECS+MCER in order to minimize the communication cost between PMs. As a result, although the energy consumption of PMs and NDs is reduced because of the lower parallelism between tasks, EASy+TAPE produces slightly longer schedule lengths than ECS+MCER in some benchmarks. However, the increase in completion time varies across benchmarks because of their structure.

3) High CCRs: Our algorithm shows its best outcome, since applications with these CCRs are network-intensive. By considering the communication cost, the objective function reduces both the completion time and the energy consumption of PMs and NDs. This is the case because communications between tasks take a long time, so reducing the communication costs significantly decreases the completion time and also lessens the number of active PMs and NDs.


VI. CONCLUSION AND FUTURE WORKS

The problem of scheduling precedence-constrained parallel tasks has been widely studied, but most works consider only performance metrics such as the completion time. In recent years, researchers have presented several energy-aware heuristics; however, none of them pays attention to the energy consumption of NDs, considering only the energy consumption of the PMs. In this study, we investigated the energy issue in task scheduling by extending and optimizing a well-known algorithm called ECS. EASy+TAPE takes into account the energy consumption of NDs as well as the energy consumption of PMs, besides the completion time. We have evaluated EASy+TAPE with an extensive set of simulations and compared it with ECS+MCER. The experimental results from our comparative evaluation confirm the superior performance of EASy+TAPE over ECS+MCER, particularly in the energy saving of NDs. A wide range of simulation results shows that the proposed algorithm reduces the energy consumption of the PMs and NDs by 4.5% and 15.06% respectively on average.

We plan to extend the EASy+TAPE algorithm to support PMs with multicore processors. In addition, since some data centers have heterogeneous PMs, we will extend our algorithm to support such data centers.

VII. REFERENCES

[1] Available from: http://searchstorage.techtarget.com.au/.
[2] ENERGY STAR, Report to Congress on server and data center energy efficiency: public law 109-431. Public Law, 2007. 109: p. 431.
[3] Brown, R., et al., Report to Congress on server and data center energy efficiency: public law 109-431. Lawrence Berkeley National Laboratory, Berkeley, 2008.
[4] Shang, L., L.-S. Peh, and N.K. Jha, Dynamic voltage scaling with links for power optimization of interconnection networks. in Proceedings of the 9th International Symposium on High-Performance Computer Architecture, 2003.
[5] Deng, Q., et al., MemScale: active low-power modes for main memory. ACM SIGPLAN Notices, 2011. 46(3): p. 225-238.
[6] Grover, A., Modern system power management. Queue, 2003. 1(7): p. 66.
[7] Burd, T.D. and R.W. Brodersen, Energy efficient CMOS microprocessor design. in Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, 1995. IEEE.
[8] Kaxiras, S. and M. Martonosi, Computer architecture techniques for power-efficiency. Synthesis Lectures on Computer Architecture, 2008. 3(1): p. 1-207.
[9] Kaxiras, S., Z. Hu, and M. Martonosi, Cache decay: exploiting generational behavior to reduce cache leakage power. in Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001. IEEE.
[10] Narayanan, D., A. Donnelly, and A. Rowstron, Write off-loading: practical power management for enterprise storage. ACM Transactions on Storage (TOS), 2008. 4(3): p. 10.
[11] Tian, Y., et al., Real-time task mapping and scheduling for collaborative in-network processing in DVS-enabled wireless sensor networks. in Parallel and Distributed Processing Symposium (IPDPS 2006), 20th International, 2006. IEEE.
[12] Lee, Y.C. and A.Y. Zomaya, Energy efficient utilization of resources in cloud computing systems. J. Supercomput., 2012. 60(2): p. 268-280.
[13] Beloglazov, A., J. Abawajy, and R. Buyya, Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Generation Computer Systems, 2012. 28(5): p. 755-768.
[14] Pahlavan, A., M. Momtazpour, and M. Goudarzi, Data center power reduction by heuristic variation-aware server placement and chassis consolidation. in Computer Architecture and Digital Systems (CADS), 2012 16th CSI International Symposium on, 2012. IEEE.
[15] Zhong, X. and C.-Z. Xu, Energy-aware modeling and scheduling for dynamic voltage scaling with statistical real-time guarantee. Computers, IEEE Transactions on, 2007. 56(3): p. 358-372.
[16] Lee, Y.C. and A.Y. Zomaya, Energy conscious scheduling for distributed computing systems under different operating conditions. Parallel and Distributed Systems, IEEE Transactions on, 2011. 22(8): p. 1374-1381.
[17] Sharifi, M., S. Shahrivari, and H. Salimi, PASTA: a power-aware solution to scheduling of precedence-constrained tasks on heterogeneous computing resources. Computing, 2013. 95(1): p. 67-88.
[18] Wang, L., et al., Towards energy aware scheduling for precedence constrained parallel tasks in a cluster with DVFS. in Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, 2010. IEEE.
[19] Meisner, D., B.T. Gold, and T.F. Wenisch, PowerNap: eliminating server idle power. in ACM SIGPLAN Notices, 2009. ACM.
[20] Zhuravlev, S., et al., Survey of energy-cognizant scheduling techniques. 2012.
[21] Kim, K.H., R. Buyya, and J. Kim, Power aware scheduling of bag-of-tasks applications with deadline constraints on DVS-enabled clusters. in Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid, 2007.
[22] Zhu, D., R. Melhem, and B.R. Childers, Scheduling with dynamic voltage/speed adjustment using slack reclamation in multiprocessor real-time systems. Parallel and Distributed Systems, IEEE Transactions on, 2003. 14(7): p. 686-700.
[23] Ge, R., X. Feng, and K.W. Cameron, Performance-constrained distributed DVS scheduling for scientific applications on power-aware clusters. in Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, 2005. IEEE Computer Society.
[24] Rountree, B., et al., Bounding energy consumption in large-scale MPI programs. in Supercomputing, 2007 (SC'07), Proceedings of the 2007 ACM/IEEE Conference on, 2007. IEEE.
[25] Intel, Enhanced Intel SpeedStep® Technology for the Intel® Pentium® M Processor. 2004, Intel Corporation, Santa Clara, CA.
[26] AMD, PowerNow! Technology. AMD white paper, November 2000.
[27] Topcuoglu, H., S. Hariri, and M.-Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing. Parallel and Distributed Systems, IEEE Transactions on, 2002. 13(3): p. 260-274.
[28] Pahlavan, A., M. Momtazpour, and M. Goudarzi, Variation-aware server placement and task assignment for data center power minimization. in Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on, 2012. IEEE.
[29] Garey, M.R. and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. 1979.
[30] Bozdag, D., U. Catalyurek, and F. Ozguner, A task duplication based bottom-up scheduling algorithm for heterogeneous environments. in Parallel and Distributed Processing Symposium (IPDPS 2006), 20th International, 2006. IEEE.
[31] Mei, J. and K. Li, Energy-aware scheduling algorithm with duplication on heterogeneous computing systems. in Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on, 2012. IEEE.
[32] Ilavarasan, E. and P. Thambidurai, Low complexity performance effective task scheduling algorithm for heterogeneous computing environments. Journal of Computer Sciences, 2007. 3(2): p. 94-103.
[33] Daoud, M.I. and N. Kharma, A high performance algorithm for static task scheduling in heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing, 2008. 68(4): p. 399-409.
[34] Hagras, T. and J. Janecek, A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems. Parallel Computing, 2005. 31(7): p. 653-670.
[35] Liu, G., K. Poh, and M. Xie, Iterative list scheduling for heterogeneous computing. Journal of Parallel and Distributed Computing, 2005. 65(5): p. 654-665.
[36] Tang, X., et al., List scheduling with duplication for heterogeneous computing systems. Journal of Parallel and Distributed Computing, 2010. 70(4): p. 323-329.
[37] Li, K., Energy efficient scheduling of parallel tasks on multiprocessor computers. The Journal of Supercomputing, 2012. 60(2): p. 223-247.
[38] Zong, Z., et al., EAD and PEBD: two energy-aware duplication scheduling algorithms for parallel tasks on homogeneous clusters. Computers, IEEE Transactions on, 2011. 60(3): p. 360-374.
[39] Sinnen, O., Task scheduling for parallel systems. Vol. 60. 2007: Wiley-Interscience.
[40] Min, R., T. Furrer, and A. Chandrakasan, Dynamic voltage scaling techniques for distributed microsensor networks. in Proceedings of the IEEE Computer Society Workshop on VLSI, 2000. IEEE.
[41] Janitor, J., F. Jakab, and K. Kniewald, Visual learning tools for teaching/learning computer networks: Cisco Networking Academy and Packet Tracer. in Networking and Services (ICNS), 2010 Sixth International Conference on, 2010. IEEE.
[42] Mahadevan, P., et al., A power benchmarking framework for network devices. in NETWORKING 2009. 2009, Springer. p. 795-808.
[43] Tobita, T. and H. Kasahara, A standard task graph set for fair evaluation of multiprocessor scheduling algorithms. Journal of Scheduling, 2002. 5(5): p. 379-394.
[44] Kwok, Y.-K., I. Ahmad, and J. Gu, FAST: A low-complexity algorithm for efficient scheduling of DAGs on parallel processors. in Proceedings of the 1996 International Conference on Parallel Processing, 1996. IEEE.
[45] Van de Velde, E.F., Experiments with multicomputer LU-decomposition. Concurrency: Practice and Experience, 1990. 2(1): p. 1-26.
[46] Rajabi, A., V. Ebrahimirad, and N. Yazdani, "Decision Support-as-a-Service: An Energy-aware Decision Support Service in Cloud Computing." 5th International Conference on Information and Knowledge Technology (IKT), Iran, 2013.
[47] Rajabi, A., H.R. Faragardi, and N. Yazdani, "Communication-aware and Energy-efficient Resource Provisioning for Real-Time Cloud Services." 17th CSI Symposium on Computer Architecture & Digital Systems (CADS), Iran, 2013.
[48] Faragardi, H.R., A. Rajabi, R. Shojaee, and N. Yazdani, "Towards Energy-aware Resource Scheduling to Maximize Reliability in Cloud Computing Systems." 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), China, 2013.
[49] Cisco Data Center Infrastructure 2.5 Design Guide, 2007.