3rd International Conference on Computer and Knowledge Engineering (ICCKE 2013), October 31 & November 1, 2013, Ferdowsi University of Mashhad
Energy-aware Scheduling Algorithm for Precedence-Constrained Parallel Tasks
of Network-intensive Applications in a Distributed Homogeneous Environment
Vahid Ebrahimirad, Dept. of Computer Engineering, Sharif University of Technology, Iran
Aboozar Rajabi, School of Electrical and Computer Eng., University of Tehran, Iran
Maziar Goudarzi, Dept. of Computer Engineering, Sharif University of Technology, Iran (goudarzi@sharif.ir)
Abstract- A wide range of scheduling algorithms used in data centers have traditionally concentrated on improving performance metrics. Recently, with the rapid growth of data centers in both size and number, power consumption has become a major challenge for both industry and society. At the software level, energy-aware task scheduling is an effective technique for reducing power in data centers. However, most of the currently proposed energy-aware scheduling approaches consider only the computation cost; in other words, they ignore the energy consumed by the network equipment, namely the communication cost. In this paper, we address the problem of scheduling precedence-constrained parallel tasks of network-intensive applications on homogeneous physical machines in data centers. The proposed Energy-Aware Scheduling algorithm (EASy) takes both the computation cost and the communication cost into consideration with a low time complexity of O(n log n + z((e + n)mv)). The algorithm reduces the energy consumption of computation by dynamic voltage frequency scaling (DVFS) and that of communication by task packing. The goal of EASy is to minimize both the completion time and the energy consumption of the data center. Extensive experimental results using both synthetic benchmarks and real-world applications show that EASy decreases the energy consumption of physical machines and network devices by 4.5% and 15.06% on average, respectively.
Keywords- Energy-aware scheduling; Communication awareness; Precedence-constrained parallel application; List Scheduling; Dynamic Voltage Frequency Scaling (DVFS)
I. INTRODUCTION
Precedence-constrained parallel applications are mostly used in engineering and scientific fields. The problem of scheduling such applications with respect to make-span and time complexity has been studied extensively [27][32-36]. Recently, because of its impact on operational cost and its harmful environmental effects, energy consumption has also been considered (e.g., [16-18][28]). In 2006, data centers in the United States consumed about 61 billion kWh, roughly 1.5% of the country's total electricity consumption, and trends show that this consumption keeps growing by about 18% per year [2]. Neglecting the problem for even a few years would turn energy into a critical challenge. According to a McKinsey report, the total estimated energy bill of data centers, the most important part of their expenses, was $11.5 billion in 2010, and the energy cost of a typical data center doubles every five years [1].
While recent advances in hardware technologies have improved energy consumption (e.g., [5-10]), there are still serious concerns about green computing. The amount of energy consumed by computing and auxiliary hardware resources is considerably affected by their usage patterns, which shows that software-level energy-aware task scheduling techniques are essential.
978-1-4799-2093-8/13/$31.00 ©2013 IEEE
There are several techniques for reducing power consumption in data centers, such as dynamic voltage frequency scaling (DVFS), resource hibernation, PowerNap [19], and memory optimization. Among the techniques applied to processors, DVFS is the most common for saving energy in computer systems (e.g., [20-24]), and it is available on most modern processors (e.g., [25,26]). Therefore, we propose a scheduling algorithm that uses DVFS and applies it when a task can afford a longer execution time. Although many algorithms and strategies have been developed, the scope of most of them is restricted to battery-powered embedded systems [11], applications with independent tasks [12,13,14], or single-processor systems [15]. In addition, most scheduling algorithms consider only the computation cost and ignore the energy consumed by communication among physical machines (PMs) [16-18].
As reported by [3], the energy consumption of PMs and network devices (NDs) is about 40% of the overall energy consumption of data centers. Network equipment becomes even more important when we note that about one-third of the total energy consumed by IT is related to communication links, switching, and aggregation elements [4]. Therefore, to reduce energy consumption, the energy consumption of PMs and that of NDs should be considered together.
This paper investigates the problem of scheduling precedence-constrained parallel tasks of network-intensive applications on homogeneous PMs. We propose the Energy-Aware Scheduling (EASy) algorithm, which takes into account not only the make-span but also the energy consumption of both PMs and NDs. The algorithm is an extension of the well-known Energy-Conscious Scheduling (ECS) algorithm [16]. The goal of EASy is to minimize both the completion time and the energy consumption of the data center. The energy consumption of computation is reduced by dynamic voltage frequency scaling (DVFS), and that of communication by task packing.
In summary, the contributions of this paper are:
• Proposing a scheduling algorithm that considers the energy consumption of NDs in addition to minimizing the energy consumption of PMs and the completion time.
• Reducing communication cost among PMs without increasing the time complexity.
The remainder of this paper is organized as follows. Section 2 presents an overview of the state of the art. Section 3 describes the data center, application, and energy models. The scheduling algorithm is presented in Section 4, followed by the performance evaluation of the proposed algorithm in Section 5. Finally, Section 6 concludes the paper and outlines future work.
II. RELATED WORKS
In this section, we present notable algorithms for scheduling precedence-constrained parallel tasks, particularly on distributed PMs. In addition, we discuss the well-known ECS scheduling algorithm for heterogeneous distributed PMs, which is extended in our work.
Scheduling a set of precedence-constrained parallel tasks is an NP-complete problem [29]. Heuristic algorithms [46] are the most popular scheduling solutions because they find solutions in polynomial time, although meta-heuristic approaches are also used [47][48]. Precedence-constrained parallel task scheduling has been studied under different assumptions and goals. Traditionally, researchers have designed their algorithms with two objectives: 1) low time complexity and 2) minimum completion time. Nowadays, researchers also consider a third objective, the energy consumption of PMs, and try to minimize it. However, low time complexity remains the main objective of most works, and researchers then balance the remaining objectives.
The most common solution for scheduling tasks is the list-scheduling heuristic, in which tasks are sorted into a list and then picked from the sorted list and assigned to PMs one by one. In other words, list-scheduling algorithms consist of two typical phases, namely task prioritization and processor selection. Two well-known list-based scheduling algorithms are Heterogeneous Earliest Finish Time (HEFT) and Energy-Conscious Scheduling (ECS); the former does not consider the energy consumption of PMs, but the latter does.
Topcuoglu et al. [27] have presented a scheduling algorithm with a low time complexity of O(n^2 m) that aims to shorten the completion time of an application. The HEFT algorithm first sorts tasks by their b-level values and then assigns each selected task to the PM that minimizes its earliest finish time, using an insertion-based approach that does not violate the precedence constraints.
Y. C. Lee and A. Y. Zomaya [16] have proposed an energy-aware scheduling algorithm that uses DVFS. ECS is a list-scheduling algorithm that sorts tasks by their b-level values before assigning them to distributed PMs, and then schedules the tasks with an objective function that aims to minimize both the energy consumption of PMs and the completion time.
Besides list-based scheduling algorithms, duplication-based algorithms are well-known solutions to the scheduling problem. These algorithms usually have much higher time complexity than list-based algorithms. The Task Duplication Based Bottom-Up Scheduling Algorithm (DBUS) and Energy Aware Scheduling by Minimizing Duplication (EAMD) are two algorithms that solve the scheduling problem with duplication-based heuristics.
D. Bozdag et al. [30] have proposed a duplication-based scheduling algorithm that minimizes the completion time. DBUS performs a CP-based listing of tasks and assigns them to PMs with task duplication and insertion. In contrast to traditional approaches, DBUS traverses the DAG in a bottom-up fashion and does not impose any constraint on the number of task duplications.
J. Mei and K. Li [31] have presented an energy-aware scheduling algorithm that considers the energy consumption of PMs as well as the completion time. EAMD decides which redundant task copies to remove from the schedules generated by duplication-based algorithms and then reduces energy consumption by using DVFS. In addition to the above algorithms, there are several notable scheduling algorithms (e.g., [32-38]), some of which consider the energy consumption of PMs while others do not.
To summarize this section, most of the related works on scheduling parallel tasks aim either to reduce energy consumption while maintaining performance or to reduce completion time. Nevertheless, they consider neither the communication cost among tasks nor the energy consumption of network equipment. Therefore, our study takes into account the energy consumption of both PMs and network equipment. This paper presents an energy-aware scheduling algorithm for precedence-constrained tasks of network-intensive applications that considers communication cost.
III. UNDERLYING MODELS
In this section, the underlying models assumed for the target data center, the application, and energy are presented. These models are prerequisites for explaining the proposed energy-aware algorithm. The notations used to model the system are listed in Tab. I.
A. Data Center Model
The target system assumed in this study consists of M PMs. The processors of each PM can scale their supply voltage up or down; in other words, they are DVFS-enabled. When a processor is idle, it still consumes energy, but at the lowest voltage. Each voltage scaling step requires a clock frequency transition. Since the time needed for a clock frequency transition is negligible compared with the execution time of tasks [39][40], we ignore this overhead in our study.
The data center used in this work is based on the three-tier data center topology used in modern data centers. In this model, PMs are not fully interconnected and communicate at different speeds. The communication speed between PMs depends on the network topology and is determined by the Communication Expense Matrix (CEM).
TABLE I NOTATIONS
Name | Description
M | Number of PMs
N | Number of nodes (tasks)
T | Number of switches (NDs)
E | Number of edges (connections)
CEM | Communication Expense Matrix
CCN | Communication Cost in Network
w_i | Computation cost of task i
c_{i,j} | Communication cost between tasks i and j
P_{PM_j} | Power consumption of PM j
NTE | Normalized Total Energy
A | Amount of switching activity of a processor
C | Total capacitance load of a processor
V_i | Supply voltage of task i
f | Frequency of a processor
t_{busy_i} | i-th busy time of an ND or PM
R_{port_i} | Rate of port i
num_{linecard} | Number of linecards in a switch
EST(n_i) | Earliest Start Time of task i
EFT(n_i) | Earliest Finish Time of task i
AST(n_i) | Actual Start Time of task i
AFT(n_i) | Actual Finish Time of task i
taskOnPM_{i,j} | 1 if task i is assigned to PM j, otherwise 0
readyForRun_{i,j} | Time at which PM j is ready to execute task i
MEC_{PM} | Maximum Energy Consumption of PMs
MEC_{ND} | Maximum Energy Consumption of NDs
P_{chassis} | Power consumption of a switch's chassis
P_{linecard} | Power consumption of a switch's linecard
P_{port_i} | Power consumption of a switch's port i
t_{idle_i} | i-th idle time of an ND or PM
The CEM is a two-dimensional matrix with M rows and M columns. Each cell (i, j) ∈ CEM represents the communication expense between PM_i and PM_j. The cells of CEM are obtained with PacketTracer [41], a simulator for network communications, and represent a multiplicative factor of the communication time needed to transmit one bit. For example, a network topology with four PMs and its CEM are shown in Fig. 1. The cells of CEM are relative to one another: for example, cell (0, 1) is equal to 1 and cell (0, 3) is equal to 2 if transmitting one bit from PM0 to PM1 takes 32 time units and from PM0 to PM3 takes 64 time units.
$$CEM = \begin{bmatrix} 0 & 1 & 2 & 2 \\ 1 & 0 & 2 & 2 \\ 2 & 2 & 0 & 1 \\ 2 & 2 & 1 & 0 \end{bmatrix}$$
Figure 1. Communication Expense Matrix (CEM) for four PMs
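A minimal Python sketch of how relative CEM cells could be derived, assuming each cell is the per-bit transmission time between two PMs divided by the fastest (smallest) inter-PM time; this reproduces the 32 and 64 time-unit example in the text. The function name build_cem and its input layout are hypothetical, and the paper obtains the times from PacketTracer rather than from code like this.

```python
def build_cem(bit_times):
    """Build a relative Communication Expense Matrix (hypothetical helper).

    bit_times[i][j] is the time to transmit one bit from PM i to PM j;
    diagonal entries (same PM) are ignored and set to 0 in the result.
    Assumption: cells are normalized by the fastest inter-PM link.
    """
    m = len(bit_times)
    base = min(bit_times[i][j] for i in range(m) for j in range(m) if i != j)
    return [[bit_times[i][j] / base if i != j else 0 for j in range(m)]
            for i in range(m)]


# Hypothetical four-PM setup: PM0/PM1 and PM2/PM3 are close (32 time units
# per bit), while transfers between the two pairs take 64 time units per bit.
times = [[0, 32, 64, 64],
         [32, 0, 64, 64],
         [64, 64, 0, 32],
         [64, 64, 32, 0]]
cem = build_cem(times)
print(cem[0][1], cem[0][3])  # 1.0 2.0
```

Normalizing by the fastest link keeps every off-diagonal cell at least 1, so the matrix can be read directly as a slowdown factor relative to the best-connected pair of PMs.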
B. Application Model
In this research, we assume that applications are network-intensive and consist of precedence-constrained parallel tasks. Such applications can be represented by a directed acyclic graph (DAG) [39]. In a DAG representation, the application is divided into a set of tasks, each of which depends on the results of other tasks; in other words, a task starts executing only after all of its parents have completed their execution. A DAG, G = (N, E), consists of a set N of n nodes and a set E of e edges. The predecessors of a task are called its parents, and its successors are called its children. The entry task, n_entry, has no parents, and a task with no children is called the exit task, n_exit. A simple task graph is shown in Fig. 2.
Figure 2. A simple task graph (with its entry task and exit task labeled)
As depicted in Fig. 2, a weight c_ij is associated with each edge of the DAG, representing the communication cost between tasks n_i and n_j (e.g., the volume of data to be transmitted from task n_i to task n_j). Moreover, each node has a weight w_i that represents the computation cost (e.g., the required execution time) of task n_i. As a result, if two tasks n_i and n_j are assigned to PMs m_p and m_q, their Communication Cost in Network (CCN) is defined as:
(1)
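A minimal Python sketch of CCN, assuming (this form is our assumption, not read from Eq. (1)) that the network cost of an edge is its data volume c_ij multiplied by the CEM cell of the two hosting PMs, so that tasks placed on the same PM communicate at no network cost. The function and argument names are hypothetical, and the example matrix uses the values from the CEM example (cell (0, 1) = 1, cell (0, 3) = 2).

```python
def ccn(cem, comm_cost, assignment, i, j):
    """Communication Cost in Network for the edge between tasks i and j.

    ASSUMED form: CCN = c_ij * CEM[p][q], where p and q are the PMs hosting
    tasks i and j. Because CEM has zeros on its diagonal, co-located tasks
    incur no network cost.

    cem        -- M x M Communication Expense Matrix
    comm_cost  -- dict mapping an edge (i, j) to its data volume c_ij
    assignment -- dict mapping a task id to the index of its PM
    """
    p, q = assignment[i], assignment[j]
    c_ij = comm_cost.get((i, j), comm_cost.get((j, i), 0))
    return c_ij * cem[p][q]


cem = [[0, 1, 2, 2],
       [1, 0, 2, 2],
       [2, 2, 0, 1],
       [2, 2, 1, 0]]
edges = {(0, 1): 10}
print(ccn(cem, edges, {0: 0, 1: 0}, 0, 1))  # 0  (same PM)
print(ccn(cem, edges, {0: 0, 1: 1}, 0, 1))  # 10
print(ccn(cem, edges, {0: 0, 1: 3}, 0, 1))  # 20
```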
C. Power Model
In this work, power consumption consists of two parts: the power consumption of PMs and the power consumption of NDs (e.g., core switches, aggregation switches, and rack switches). Thus, we present two models for calculating the total power consumption of a data center. The power consumption of a PM is derived from the power consumption model of complementary metal-oxide-semiconductor (CMOS) logic circuits. As a result, the total energy consumption of PMs is defined as:
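$$Total\_E_{PM} = \sum_{j=1}^{M} \sum_{n_i \in m_j} A C V_i^2 f w_i + \sum_{j=1}^{M} \sum_{idle_i \in IDLE_j} A C V_{lowest}^2 f_{lowest} \, t_{idle_i}$$ (2)
where IDLE_j is the set of idle intervals of PM m_j, and V_lowest and f_lowest are the lowest supply voltage and the corresponding frequency, at which an idle processor runs.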
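A minimal Python sketch of Eq. (2) under stated assumptions: A and C are constants, each executed task is described by its (voltage, frequency at that voltage, computation cost) triple, and idle intervals are charged at the lowest voltage/frequency level. The double sum over PMs is flattened into one list of tasks and one list of idle intervals, which gives the same total. The function name and the example numbers are hypothetical.

```python
def pm_energy(busy, idle, A, C, v_low, f_low):
    """Total PM energy in the spirit of Eq. (2).

    busy  -- iterable of (V_i, f_i, w_i): voltage, frequency at that voltage
             and computation cost (execution time) of each executed task
    idle  -- iterable of idle-interval lengths t_idle on any PM
    A, C  -- switching activity and capacitance load (assumed constants)
    v_low, f_low -- lowest voltage/frequency level used while idle
    """
    busy_energy = sum(A * C * v * v * f * w for v, f, w in busy)
    idle_energy = sum(A * C * v_low * v_low * f_low * t for t in idle)
    return busy_energy + idle_energy


# One task at V=1.0, f=2.0 running for 3.0 time units, plus one idle gap of 2.0.
print(pm_energy([(1.0, 2.0, 3.0)], [2.0], A=1.0, C=1.0, v_low=0.5, f_low=1.0))  # 6.5
```

Because the dynamic power term grows with the square of the voltage, lowering V_i for a task that can afford a longer execution time is what lets DVFS reduce the busy-energy term.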
On the other hand, we also take into account the energy consumed by communication among PMs. The power model for NDs is defined as [42]:
$$P_{switch\_busy} = P_{chassis} + P_{linecard} \cdot num_{linecard} + \sum_{p_i \in Ports} P_{port_i} \cdot R_{port_i}$$ (3)
$$P_{switch\_idle} = P_{chassis\_idle} + P_{linecard\_idle} \cdot num_{linecard}$$ (4)
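Eqs. (3) and (4) translate directly into code. The Python sketch below assumes that R_port_i is a utilization factor between 0 and 1 that scales each port's power, and it multiplies the busy and idle power levels by the corresponding busy and idle durations of one switch. Apart from the 35 W linecard and 1 W port figures stated in the experimental settings, the example values (in particular the 100 W chassis) are hypothetical.

```python
def switch_busy_power(p_chassis, p_linecard, num_linecard, ports):
    """Eq. (3): ports is a list of (P_port_i, R_port_i) pairs.

    Assumption: R_port_i is a utilization factor in [0, 1]."""
    return p_chassis + p_linecard * num_linecard + sum(p * r for p, r in ports)


def switch_idle_power(p_chassis_idle, p_linecard_idle, num_linecard):
    """Eq. (4): idle chassis power plus the idle power of every linecard."""
    return p_chassis_idle + p_linecard_idle * num_linecard


def switch_energy(busy_power, idle_power, busy_times, idle_times):
    """Energy of one switch: busy power over its busy periods plus
    idle power over its idle periods."""
    return busy_power * sum(busy_times) + idle_power * sum(idle_times)


# Hypothetical switch: 100 W chassis, 2 linecards at 35 W, 48 ports at full rate (1 W each).
busy = switch_busy_power(100.0, 35.0, 2, [(1.0, 1.0)] * 48)
print(busy)  # 218.0
```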
Thus, the total energy consumption of T switches is computed by:
$$Total\_E_{ND} = \sum_{i=1}^{T} P_{switch\_busy} \, t_{busy_i} + \sum_{i=1}^{T} P_{switch\_idle} \, t_{idle_i}$$ (5)
IV. SCHEDULING ALGORITHM
A. Problem Statement
The problem is to allocate tasks to PMs such that the completion time and the energy consumption of PMs and NDs are minimized while the precedence constraints are satisfied. Therefore, the goal of our work is defined as:
$$\text{Minimize } \{Total\_E_{PM} + Total\_E_{ND},\ \max(AFT(n_{exit}))\}$$ (6)
Subject to:
$$AST(n_i) \geq \max_{n_j \in parents(n_i)} \left(AFT(n_j) + CCN(m_p, m_q, n_i, n_j)\right)$$ (7)
$$\sum_{j=0}^{M} taskOnPM_{i,j} = 1, \quad i = 0..N$$ (8)
B. EASy Solution
Since the problem has two conflicting objectives (minimizing energy consumption and minimizing completion time), an objective function is required to balance them equally. Therefore, we propose an objective function that considers all objectives of the problem. We then perform a global optimization phase to reduce the energy consumption of PMs and NDs.
Before scheduling begins, the b-level values of all tasks in the task graph are computed, and the tasks are sorted into a scheduling list in decreasing order of b-level. Then, tasks are picked from the sorted list and assigned to PMs one by one. If LPath_i = {n_i, n_i+1, ..., n_exit} is the longest path from node n_i to the exit node, the b-level of task n_i is defined as:
$$b\text{-}level_i = \sum_{n_j \in node(LPath_i)} w_j + \sum_{e_j \in edge(LPath_i)} c_j$$ (9)
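A minimal Python sketch of Eq. (9) and the resulting scheduling list. Instead of enumerating paths explicitly, it uses the equivalent recursion b-level(n_i) = w_i + max over children n_j of (c_ij + b-level(n_j)), with b-level(n_exit) = w_exit, memoized with functools.lru_cache. The DAG in the example is made up, and ties between equal b-levels simply keep the input order here; the text does not say how ties are broken.

```python
from functools import lru_cache


def b_levels(w, c, children):
    """b-level of every task, following Eq. (9).

    w        -- dict: task -> computation cost w_i
    c        -- dict: (parent, child) -> communication cost of that edge
    children -- dict: task -> list of child tasks (missing/empty for exit tasks)
    The b-level is the largest sum of node and edge weights over all paths
    from the task to an exit task, computed recursively with memoization.
    """
    @lru_cache(maxsize=None)
    def bl(t):
        succ = children.get(t, [])
        if not succ:
            return w[t]
        return w[t] + max(c[(t, s)] + bl(s) for s in succ)

    return {t: bl(t) for t in w}


# Diamond-shaped DAG: 0 -> {1, 2} -> 3
w = {0: 2, 1: 3, 2: 4, 3: 1}
c = {(0, 1): 1, (0, 2): 5, (1, 3): 2, (2, 3): 1}
children = {0: [1, 2], 1: [3], 2: [3], 3: []}
levels = b_levels(w, c, children)
order = sorted(w, key=lambda t: levels[t], reverse=True)
print(levels)  # {0: 13, 1: 6, 2: 6, 3: 1}
print(order)   # [0, 1, 2, 3]
```

Sorting by decreasing b-level guarantees that every parent precedes its children in the list, because a parent's b-level is always larger than any child's when weights are positive.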
In addition, three attributes need to be defined, namely the Earliest Start Time (EST), the Earliest Finish Time (EFT), and the Task Communication Cost (TCC).
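$$EST(n_i, m_j) = \begin{cases} 0, & \text{if } n_i = n_{entry} \\ \max\left\{readyForRun_{i,j},\ \max_{n_k \in parents(n_i)} \left(AFT(n_k) + c_{k,i}\right)\right\}, & \text{otherwise} \end{cases}$$ (10)
(11)
$$TCC(n_i, m_j) = \sum_{n_k \in parents(n_i),\ n_k \in m_x} CCN(m_j, m_x, n_i, n_k) + \sum_{n_k \in children(n_i),\ n_k \in m_x} CCN(m_j, m_x, n_i, n_k)$$ (12)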
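A Python sketch of Eqs. (10) and (12), under assumptions labeled in the comments: readyForRun is simplified to one ready time per PM, EFT is taken as EST + w_i (the form of Eq. (11) is not spelled out in the text), CCN is assumed to be c_ik × CEM[m_j][m_x], and neighbors that are not yet assigned to a PM contribute nothing to TCC. All function names are hypothetical.

```python
def est(task, pm, parents, aft, c, ready_for_run):
    """Eq. (10): 0 for the entry task; otherwise the later of the PM's ready
    time and the latest parent finish time plus the edge cost c_{k,i}.
    Simplification: ready_for_run maps each PM to a single ready time."""
    if not parents.get(task):
        return 0
    arrival = max(aft[k] + c[(k, task)] for k in parents[task])
    return max(ready_for_run[pm], arrival)


def eft(task, pm, parents, aft, c, ready_for_run, w):
    """ASSUMPTION: EFT = EST + w_i at the nominal voltage."""
    return est(task, pm, parents, aft, c, ready_for_run) + w[task]


def tcc(task, pm, parents, children, assignment, cem, c):
    """Eq. (12) with the ASSUMED form CCN = c_ik * CEM[pm][pm_of_k].
    Neighbors that are not yet assigned to a PM are skipped."""
    total = 0
    for k in parents.get(task, []):
        if k in assignment:
            total += c[(k, task)] * cem[pm][assignment[k]]
    for k in children.get(task, []):
        if k in assignment:
            total += c[(task, k)] * cem[pm][assignment[k]]
    return total


parents = {0: [], 1: [0]}
children = {0: [1], 1: []}
c = {(0, 1): 3}
aft = {0: 5}            # task 0 finished at time 5 on PM 0
ready = {0: 2, 1: 10}   # ready time of each PM
cem = [[0, 1], [1, 0]]
print(est(1, 0, parents, aft, c, ready))                # 8
print(eft(1, 0, parents, aft, c, ready, {0: 5, 1: 4}))  # 12
print(tcc(1, 1, parents, children, {0: 0}, cem, c))     # 3
```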
The proposed objective function, the Scheduling Decision Maker (SDM), determines which task should be assigned to which PM. A positive SDM value indicates that a new best scheduling alternative has been found. For a given task n_i, when the CCR (communication-to-computation ratio) is less than or equal to 1, the SDM value of assigning n_i to PM m_j with input voltage v_k, relative to the current best combination of m' and v', is computed by (14).
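(13)
(14)
When the CCR is greater than one, SDM is defined as:
$$SDM(n_i, m_j, v_k, m', v') = \begin{cases}
\dfrac{EFT(n_i, m', v') - EFT(n_i, m_j, v_k)}{EFT(n_i, m_j, v_k) - \min\{EFT(n_i, m', v'), EFT(n_i, m_j, v_k)\}} \times CCR + \dfrac{E_{PM}(n_i, m', v') - E_{PM}(n_i, m_j, v_k)}{E_{PM}(n_i, m_j, v_k)}, & TCC(n_i, m_j) < \varepsilon \\[2ex]
\dfrac{EFT(n_i, m', v') - EFT(n_i, m_j, v_k)}{EFT(n_i, m_j, v_k) - \min\{EFT(n_i, m', v'), EFT(n_i, m_j, v_k)\}} \times \dfrac{1}{CCR} + \dfrac{E_{PM}(n_i, m', v') - E_{PM}(n_i, m_j, v_k)}{E_{PM}(n_i, m_j, v_k)}, & TCC(n_i, m') < \varepsilon \\[2ex]
\dfrac{TCC(n_i, m') - TCC(n_i, m_j)}{TCC(n_i, m_j)} + \dfrac{E_{PM}(n_i, m', v') - E_{PM}(n_i, m_j, v_k)}{E_{PM}(n_i, m_j, v_k)} + \dfrac{EFT(n_i, m', v') - EFT(n_i, m_j, v_k)}{EFT(n_i, m_j, v_k) - \min\{EFT(n_i, m', v'), EFT(n_i, m_j, v_k)\}}, & \text{otherwise}
\end{cases}$$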
(15)
As shown in Alg. 1, tasks are scheduled and assigned to PMs according to their SDM values (steps 1-14). Since each allocation decision made by SDM is only locally optimal, an energy reduction technique is needed to reduce the energy of both PMs and NDs. For this purpose, we present Task Packing for Energy Reduction (TAPE) (steps 15-34). Thus, tasks are first scheduled by EASy, and then TAPE optimizes the schedule without degrading the completion time or increasing the time complexity. In this paper, two attributes, the Maximum Energy Consumption of PMs (MEC_PM) and the Maximum Energy Consumption of NDs (MEC_ND), are used to normalize the energy consumption of PMs and NDs. Hence, the Normalized Total Energy (NTE) is defined as:
$$NTE = \frac{Total\_E_{PM}}{MEC_{PM}} + \frac{Total\_E_{ND}}{MEC_{ND}}$$ (16)
Before TAPE reschedules the tasks, they are sorted into a scheduling list in decreasing order of their Communication Cost level (cc-level) values. The cc-level of a task n_i allocated to PM m_i is computed by:
$$cc\text{-}level_i = \sum_{n_k \in parents(n_i),\ n_k \in m_x} CCN(m_i, m_x, n_i, n_k) + \sum_{n_k \in children(n_i),\ n_k \in m_x} CCN(m_i, m_x, n_i, n_k)$$ (17)
Algorithm 1. The pseudo-code of the EASy and TAPE algorithms
Phase 1: EASy Algorithm
Input: DAG, set of PMs, CEM
Output: S_p1, a mapping of tasks onto PMs
1. Sort ∀ n_i ∈ N in decreasing order of b-level values
2. Foreach n_i ∈ N Do
3.   m' = m_0
4.   v' = v_0
5.   Foreach m_j ∈ PMs Do
6.     Foreach v_k ∈ Voltages Do
7.       if (SDM(n_i, m_j, v_k, m', v') > SDM(n_i, m', v', m', v')) then
8.         m' = m_j
9.         v' = v_k
10.      endif
11.    endForeach
12.  endForeach
13.  S_p1 ← allocate n_i to m' with voltage v'
14. endForeach
Phase 2: TAPE Algorithm
Input: MEC_PM, MEC_ND, S_p1
Output: S_p2, a mapping of tasks onto PMs
15. Sort ∀ n_i ∈ N in decreasing order of cc-level values
16. Foreach n_i ∈ N Do
17.   m' ← the PM to which n_i was allocated in Phase 1
18.   v' ← the voltage with which n_i was allocated in Phase 1
19.   e_g ← compute NTE for S_p1
20.   Remove n_i from S_p1
21.   Foreach m_j ∈ PMs Do
22.     Foreach v_k ∈ Voltages Do
23.       Add n_i to S_p1 on m_j with voltage v_k
24.       e_c ← compute NTE for S_p1
25.       if (e_c < e_g and no increase in AFT(n_exit)) then
26.         m' = m_j
27.         v' = v_k
28.         e_g ← e_c
29.       endif
30.       Remove n_i from S_p1
31.     endForeach
32.   endForeach
33.   S_p2 ← allocate n_i to m' with voltage v'
34. endForeach
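A compact Python sketch of the two phases of Alg. 1, assuming the caller supplies the SDM, NTE, and makespan evaluators as functions (the names sdm, nte, and makespan are hypothetical). Two simplifications are our assumptions: step 7 is replaced by the test SDM > 0, following the statement that a positive SDM value marks a new best alternative, and TAPE commits each improved placement before moving on to the next task. The toy evaluators in the example only illustrate the control flow.

```python
def easy_phase(order, pms, voltages, sdm):
    """Phase 1 (steps 1-14) sketch: order lists tasks by decreasing b-level.

    sdm(schedule, task, m, v, m_best, v_best) is caller-supplied; a positive
    value means (m, v) beats the current best (m_best, v_best)."""
    schedule = {}
    for task in order:
        m_best, v_best = pms[0], voltages[0]
        for m in pms:
            for v in voltages:
                if sdm(schedule, task, m, v, m_best, v_best) > 0:
                    m_best, v_best = m, v
        schedule[task] = (m_best, v_best)
    return schedule


def tape_phase(order, schedule, pms, voltages, nte, makespan):
    """Phase 2 (steps 15-34) sketch: order lists tasks by decreasing cc-level.

    Moves each task to the (PM, voltage) pair with the lowest normalized
    total energy, rejecting moves that increase the makespan."""
    schedule = dict(schedule)            # keep the phase-1 result intact
    for task in order:
        best = schedule[task]
        best_energy = nte(schedule)
        limit = makespan(schedule)
        for m in pms:
            for v in voltages:
                schedule[task] = (m, v)  # tentatively place the task
                energy = nte(schedule)
                if energy < best_energy and makespan(schedule) <= limit:
                    best, best_energy = (m, v), energy
        schedule[task] = best            # commit the best placement found
    return schedule


def spread(schedule, task, m, v, m_best, v_best):
    # Toy SDM: the k-th scheduled task prefers PM k.
    return 1 if m == len(schedule) and m != m_best else 0


def active_pms(s):
    # Toy energy: number of PMs that host at least one task.
    return len({m for m, _ in s.values()})


def busiest(s):
    # Toy makespan: largest number of tasks sharing one PM.
    counts = {}
    for m, _ in s.values():
        counts[m] = counts.get(m, 0) + 1
    return max(counts.values())


phase1 = easy_phase(["a", "b"], [0, 1], [1.0], spread)
print(phase1)  # {'a': (0, 1.0), 'b': (1, 1.0)}
print(tape_phase(["a", "b"], phase1, [0, 1], [1.0], active_pms, lambda s: 1))
# {'a': (1, 1.0), 'b': (1, 1.0)}  (packed onto one PM)
print(tape_phase(["a", "b"], phase1, [0, 1], [1.0], active_pms, busiest))
# {'a': (0, 1.0), 'b': (1, 1.0)}  (packing would lengthen the schedule)
```

Each phase visits every task once and tries every (PM, voltage) pair, which is where the n·m·v factor of the stated complexity comes from; the cost of each evaluator call is left to the caller in this sketch.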
V. PERFORMANCE EVALUATION
In this section, we present our experimental results and demonstrate how our algorithm saves energy without significant performance degradation. Before presenting the results, we define the comparison metrics and the experimental settings.
A. Evaluation Metrics
Completion time is the most common metric for evaluating the performance of task scheduling algorithms. In addition, since we focus on minimizing the energy consumed by PMs and NDs, energy consumption is also used as a metric. In this work, three normalized evaluation metrics are also considered: the Schedule Length Ratio (SLR), the Energy Consumption Ratio of PMs (ECR_PM), and the Energy Consumption Ratio of NDs (ECR_ND). Formally, SLR, ECR_PM, and ECR_ND are defined as:
$$SLR = \frac{CompletionTime}{\sum_{n_i \in CP} w_i}$$ (18)
$$ECR_{PM} = \frac{EnergyConsumptionOfPMs}{\sum_{n_i \in CP} w_i \times P_{idle}^{PM}}$$ (19)
$$ECR_{ND} = \frac{EnergyConsumptionOfNDs}{\cdots}$$ (20)
where CP is the set of tasks on the critical path, i.e., the path with the largest computation and communication cost from the root of a given task graph.
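SLR and ECR_PM are simple ratios; a Python sketch with hypothetical inputs follows. It takes the critical path as given (finding it is a separate longest-path computation) and interprets P_idle^PM in Eq. (19) as the idle power of a PM, which is an assumption. ECR_ND is omitted because its denominator is not given in the text.

```python
def slr(completion_time, cp_tasks, w):
    """Eq. (18): completion time divided by the total computation cost
    of the tasks on the critical path."""
    return completion_time / sum(w[t] for t in cp_tasks)


def ecr_pm(pm_energy, cp_tasks, w, p_idle_pm):
    """Eq. (19): PM energy divided by (critical-path computation cost x
    idle power of a PM). The idle-power reading is an assumption."""
    return pm_energy / (sum(w[t] for t in cp_tasks) * p_idle_pm)


w = {0: 2, 1: 3, 2: 5}
critical_path = [0, 2]  # assumed to be known in advance
print(slr(14, critical_path, w))          # 2.0
print(ecr_pm(700, critical_path, w, 50))  # 2.0
```

Because the critical-path cost is a lower bound on any schedule length, an SLR of 1 is the best achievable value; the same normalization makes ECR_PM comparable across task graphs of different sizes.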
B. Experimental Settings
Simulation is used to evaluate the performance of the proposed EASy+TAPE in comparison with ECS+MCER. To this end, we designed and implemented TaskGraphSim, a simulator that schedules given task graphs on a set of defined heterogeneous or homogeneous resources using various algorithms and network topologies. The tool reports the desired values, such as the completion time and the energy consumption of PMs and NDs separately.
To evaluate our algorithm, we executed two sets of benchmarks: synthetic benchmarks (STG) [43] and the real-world applications Laplace Equation Solver (LES) [44] and LU-Decomposition (LUD) [45]. The synthetic benchmarks consist of 597 task graphs with the characteristics summarized in Tab. II. In addition, we generated 500 task graphs of different sizes (numbers of nodes) for the real-world benchmarks.
TABLE II THE CHARACTERISTICS OF SYNTHETIC BENCHMARKS
DAG characteristic | Values
Number of nodes | {50, 100, 300, 500, 750}
Number of predecessors | Max = 177, Avg = 16.57, Min = 0
Parallelism | Max = 94.84, Avg = 10.33, Min = 1.56
Since the three-tier architecture is the most popular network topology in modern data centers [49], the three-tier data center topology is used for the target data center, as shown in Fig. 3. We assume that each rack holds eight PMs and each cluster contains four racks. Each PM is connected to a rack switch; each rack switch is linked to two aggregation switches, and each aggregation switch is connected to two core switches. The power settings of the switches [42] are listed in Tab. III. We also assume that each active linecard consumes 35 W and each port consumes about 1 W at its maximum rate.
TABLE III POWER CONSUMPTION SUMMARY FOR NETWORK DEVICES
Type of Switch | Idle Power Consumption (W) | Maximum Power Consumption (W)
Rack Switch | 198 | 150
Aggregation Switch | 656 | 555
Core Switch | 656 | 555
Figure 3. Three-Tier Data Center Network Topology (core switches, aggregation switches, and rack switches)
Figure 4. Experimental Results of the LUD (curves for ECS+MCER and EASy+TAPE plotted against CCR from 2 to 20)
Figure 5. Experimental Results of the LES (curves for ECS+MCER and EASy+TAPE plotted against CCR from 2 to 20)
Figure 6. Experimental Results of the STG (curves for ECS+MCER and EASy+TAPE plotted against CCR from 2 to 20)
C. Experimental Results
In this section, the performance of EASy+TAPE is compared with that of a well-known energy-aware algorithm, ECS+MCER. In particular, we show how the proposed algorithm behaves, compared with ECS+MCER, when applications have different CCRs. Task graphs with CCRs in {0.1, 0.5, 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20} are generated and executed on different numbers of PMs {2, 4, 8, 16, 32, 64, 128}. Thus, the number of experiments conducted with the two algorithms (ECS+MCER and EASy+TAPE) is 137,774 (i.e., 68,887 for each algorithm). The large set of task graphs with various characteristics prevents bias toward any specific scheduling algorithm. The simulation results show that EASy reduces the energy consumption of PMs and NDs by 4.5% and 15.06% on average, respectively.
TABLE IV. EXPERIMENTAL RESULTS OF EASY+TAPE IN COMPARISON WITH ECS+MCER
Benchmark Name | LUD | LES | STG
Completion time | 0.45% | -6.13% | -2.13%
Energy Consumption of PMs | 4.46% | 7.00% | 2.03%
Energy Consumption of NDs | 11.72% | 22.25% | 11.22%
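As a quick arithmetic check, the headline averages of 4.5% (PMs) and 15.06% (NDs) are reproduced by taking the unweighted mean of the three benchmark columns of Tab. IV; whether the reported averages were formed exactly this way is an assumption.

```python
from statistics import mean

# Energy savings of EASy+TAPE over ECS+MCER from Tab. IV (percent).
pm_savings = {"LUD": 4.46, "LES": 7.00, "STG": 2.03}
nd_savings = {"LUD": 11.72, "LES": 22.25, "STG": 11.22}

pm_avg = round(mean(list(pm_savings.values())), 2)
nd_avg = round(mean(list(nd_savings.values())), 2)
print(pm_avg, nd_avg)  # 4.5 15.06
```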
The overall results of the extensive simulations are summarized in Tab. IV. In addition, the results for the different benchmarks in terms of the SLR, ECR_PM, and ECR_ND metrics across CCRs are shown in Figs. 4, 5, and 6. As the figures show, the proposed algorithm behaves differently for different CCRs:
1) Low CCRs: When the CCR is less than one, the applications are processor-intensive, so the communication cost is not considered in the objective function and the results show no improvement.
2) Medium CCRs: The communication costs are included in the SDM. EASy+TAPE activates fewer PMs than ECS+MCER to minimize the communication cost between PMs. As a result, the energy consumption of PMs and NDs is reduced, but because of the reduced parallelism among tasks, EASy+TAPE produces slightly longer schedules than ECS+MCER in some benchmarks. The increase in completion time varies across benchmarks because of their different structures.
3) High CCRs: Our algorithm achieves its best results, since applications with these CCRs are network-intensive. By considering communication cost, the objective function reduces the completion time as well as the energy consumption of PMs and NDs. This is because communication between tasks takes a long time, so reducing communication costs decreases the completion time substantially and also reduces the number of active PMs and NDs.
VI. CONCLUSION AND FUTURE WORKS
The problem of scheduling precedence-constrained parallel tasks has been widely studied, but most studies consider only performance metrics such as completion time. In recent years, researchers have presented several energy-aware heuristics; however, none of them pays attention to the energy consumption of NDs, as they consider only the energy consumption of PMs. In this study, we investigated the energy issue in task scheduling by extending and optimizing a well-known algorithm called ECS. EASy+TAPE takes into account the energy consumption of NDs as well as that of PMs, in addition to completion time. We evaluated EASy+TAPE with an extensive set of simulations and compared it with ECS+MCER. The experimental results of our comparative study confirm the superior performance of EASy+TAPE over ECS+MCER, particularly in saving the energy of NDs. The simulation results show that the proposed algorithm reduces the energy consumption of PMs and NDs by 4.5% and 15.06% on average, respectively.
We plan to extend the EASy+TAPE algorithm to support PMs with multicore processors. In addition, since some data centers have heterogeneous PMs, we will extend our algorithm to support such data centers.
VII. REFERENCES
[1] Available from: http://searchstorage.techtarget.com.au/.
[2] ENERGY STAR, Report to congress on server and data center energy efficiency public law 109-431. Public law, 2007. 109: p. 431.
[3] Brown, R., et al.: Report to congress on server and data center energy efficiency: public law 109-431. Lawrence Berkeley National Laboratory, Berkeley, 2008.
[4] Shang, L., Peh, L.-S., Jha, K.N.: Dynamic voltage scaling with links for power optimization of interconnection networks. In: Proceedings of the 9th International Symposium on High Performance Computer Architecture, 2003.
[5] Deng, Q., et al., Memscale: active low-power modes for main memory. ACM SIGPLAN Notices, 2011. 46(3): p. 225-238.
[6] Grover, A., Modern system power management. Queue, 2003. 1(7): p. 66.
[7] Burd, T.D. and R.W. Brodersen. Energy efficient CMOS microprocessor design. in System Sciences, 1995. Proceedings of the Twenty-Eighth Hawaii International Conference on. 1995. IEEE.
[8] Kaxiras, S. and M. Martonosi, Computer architecture techniques for power-efficiency. Synthesis Lectures on Computer Architecture, 2008. 3(1): p. 1-207.
[9] Kaxiras, S., Z. Hu, and M. Martonosi. Cache decay: exploiting generational behavior to reduce cache leakage power. in Computer Architecture, 2001. Proceedings. 28th Annual International Symposium on. 2001. IEEE.
[10] Narayanan, D., A. Donnelly, and A. Rowstron, Write off-loading: Practical power management for enterprise storage. ACM Transactions on Storage (TOS), 2008. 4(3): p. 10.
[11] Tian, Y., et al. Real-time task mapping and scheduling for collaborative in-network processing in dvs-enabled wireless sensor networks. in Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. 2006. IEEE.
[12] Lee, Y.C. and A.Y. Zomaya, Energy efficient utilization of resources in cloud computing systems. J. Supercomput., 2012. 60(2): p. 268-280.
[13] Beloglazov, A., J. Abawajy, and R. Buyya, Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Generation Computer Systems, 2012. 28(5): p. 755-768.
[14] Pahlavan, A., M. Momtazpour, and M. Goudarzi. Data center power reduction by heuristic variation-aware server placement and chassis consolidation. in Computer Architecture and Digital Systems (CADS), 2012 16th CSI International Symposium on. 2012. IEEE.
[15] Zhong, X. and C.-Z. Xu, Energy-aware modeling and scheduling for dynamic voltage scaling with statistical real-time guarantee. Computers, IEEE Transactions on, 2007. 56(3): p. 358-372.
[16] Lee, Y.C. and A.Y. Zomaya, Energy conscious scheduling for distributed computing systems under different operating conditions. Parallel and Distributed Systems, IEEE Transactions on, 2011. 22(8): p. 1374-1381.
[17] Sharifi, M., S. Shahrivari, and H. Salimi, PASTA: a power-aware solution to scheduling of precedence-constrained tasks on heterogeneous computing resources. Computing, 2013. 95(1): p. 67-88.
[18] Wang, L., et al. Towards energy aware scheduling for precedence constrained parallel tasks in a cluster with DVFS. in Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on. 2010. IEEE.
[19] Meisner, D., B.T. Gold, and T.F. Wenisch. PowerNap: eliminating server idle power. in ACM SIGPLAN Notices. 2009. ACM.
[20] Zhuravlev, S., et al., Survey of energy-cognizant scheduling techniques. 2012.
[21] Kim, K.H., R. Buyya, and J. Kim. Power aware scheduling of bag-of-tasks applications with deadline constraints on DVS-enabled clusters. in Proceedings of the seventh IEEE international symposium on cluster computing and the grid. 2007.
[22] Zhu, D., R. Melhem, and B.R. Childers, Scheduling with dynamic voltage/speed adjustment using slack reclamation in multiprocessor real-time systems. Parallel and Distributed Systems, IEEE Transactions on, 2003. 14(7): p. 686-700.
[23] Ge, R., X. Feng, and K.W. Cameron. Performance-constrained distributed dvs scheduling for scientific applications on power-aware clusters. in Proceedings of the 2005 ACM/IEEE conference on Supercomputing. 2005. IEEE Computer Society.
[24] Rountree, B., et al. Bounding energy consumption in large-scale MPI programs. in Supercomputing, 2007. SC07. Proceedings of the 2007 ACM/IEEE Conference on. 2007. IEEE.
[25] Intel, E., SpeedStep® Technology for the Intel® Pentium® M Processor, 2004, Intel Corporation, Santa Clara, CA.
[26] PowerNOW, A., Technology, AMD white paper, November 2000.
[27] Topcuoglu, H., S. Hariri, and M.-Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing. Parallel and Distributed Systems, IEEE Transactions on, 2002. 13(3): p. 260-274.
[28] Pahlavan, A., M. Momtazpour, and M. Goudarzi. Variation-aware Server Placement and Task Assignment for Data Center Power Minimization. in Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on. 2012. IEEE.
[29] Garey, M.R. and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. 1979.
[30] Bozdag, D., U. Catalyurek, and F. Ozguner. A task duplication based bottom-up scheduling algorithm for heterogeneous environments. in Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. 2006. IEEE.
[31] Mei, J. and K. Li. Energy-Aware Scheduling Algorithm with Duplication on Heterogeneous Computing Systems. in Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on. 2012. IEEE.
978-1-4799-2093-8/13/$31.00 ©2013 IEEE
[32] Ilavarasan, E. and P. Thambidurai, Low complexity performance effective task scheduling algorithm for heterogeneous computing environments. Journal of Computer Science, 2007. 3(2): p. 94-103.
[33] Daoud, M.I. and N. Kharma, A high performance algorithm for static task scheduling in heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing, 2008. 68(4): p. 399-409.
[34] Hagras, T. and J. Janecek, A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems. Parallel Computing, 2005. 31(7): p. 653-670.
[35] Liu, G., K. Poh, and M. Xie, Iterative list scheduling for heterogeneous computing. Journal of Parallel and Distributed Computing, 2005. 65(5): p. 654-665.
[36] Tang, X., et al., List scheduling with duplication for heterogeneous computing systems. Journal of Parallel and Distributed Computing, 2010. 70(4): p. 323-329.
[37] Li, K., Energy efficient scheduling of parallel tasks on multiprocessor computers. The Journal of Supercomputing, 2012. 60(2): p. 223-247.
[38] Zong, Z., et al., EAD and PEBD: two energy-aware duplication scheduling algorithms for parallel tasks on homogeneous clusters. Computers, IEEE Transactions on, 2011. 60(3): p. 360-374.
[39] Sinnen, O., Task scheduling for parallel systems. Vol. 60. 2007: Wiley-Interscience.
[40] Min, R., T. Furrer, and A. Chandrakasan. Dynamic voltage scaling techniques for distributed microsensor networks. in VLSI, 2000. Proceedings. IEEE Computer Society Workshop on. 2000. IEEE.
[41] Janitor, J., F. Jakab, and K. Kniewald. Visual Learning Tools for Teaching/Learning Computer Networks: Cisco Networking Academy and Packet Tracer. in Networking and Services (ICNS), 2010 Sixth International Conference on. 2010. IEEE.
[42] Mahadevan, P., et al., A power benchmarking framework for network devices, in NETWORKING 2009. 2009, Springer. p. 795-808.
[43] Tobita, T. and H. Kasahara, A standard task graph set for fair evaluation of multiprocessor scheduling algorithms. Journal of Scheduling, 2002. 5(5): p. 379-394.
[44] Kwok, Y.-K., I. Ahmad, and J. Gu. FAST: A low-complexity algorithm for efficient scheduling of DAGs on parallel processors. in Parallel Processing, 1996., Proceedings of the 1996 International Conference on. 1996. IEEE.
[45] Van de Velde, E.F., Experiments with multicomputer LU-decomposition. Concurrency: Practice and Experience, 1990. 2(1): p. 1-26.
[46] A. Rajabi, V. Ebrahimirad, N. Yazdani. "Decision Support-as-a-Service: An Energy-aware Decision Support Service in Cloud Computing" 5th International Conference on Information and Knowledge Technology (IKT), Iran, 2013.
[47] A. Rajabi, H.R. Faragardi and N. Yazdani. "Communication-aware and Energy-efficient Resource Provisioning for Real-Time Cloud Services", 17th CSI Symposium on Computer Architecture & Digital Systems (CADS), Iran, 2013.
[48] H.R. Faragardi, A. Rajabi, R. Shojaee, N. Yazdani. "Towards Energy-aware Resource Scheduling to Maximize Reliability in Cloud Computing Systems" 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), China, 2013.
[49] Infrastructure, C.D.C., 2.5 Design Guide, 2007.