Scheduling Algorithm for Workflow-Based Applications in Optical Grid


JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 26, NO. 17, SEPTEMBER 1, 2008 3011

Scheduling Algorithm for Workflow-Based Applications in Optical Grid

Zhenyu Sun, Wei Guo, Member, IEEE, Zhengyu Wang, Yaohui Jin, Member, IEEE, Weiqiang Sun, Member, IEEE, Weisheng Hu, Member, IEEE, and Chunming Qiao, Senior Member, IEEE

Abstract—Grid is evolving to a more efficient global computing infrastructure by introducing optical network technology to support advanced data-intensive distributed applications. Scheduling such data-intensive applications includes assigning tasks to computational resources, routing lightpaths, and assigning wavelength channels for data communication. The scheduling problem is NP-hard in the traditional grid system, and in optical grids it is more complicated due to the characteristics of optical networks. In this paper, we formulate the scheduling problem in optical grids and propose a novel scheduling algorithm which modifies the scheduling order according to the actual importance of each task to search for a better solution. We call it the scheduled critical path (SCP) algorithm. We compare the scheduling results obtained by the SCP algorithm with the optimal results calculated by OPL studio software on a 3-node optical grid. To evaluate the performance of the proposed algorithm on more complicated systems, we construct a simulator which is able to schedule the application to the optical grid according to a certain scheduling algorithm. The simulation results demonstrate the efficiency of the SCP algorithm.

Index Terms—Grid computing, optical network, scheduling algorithm.

I. INTRODUCTION

GRID enables the integration of geographically distributed resources, such as computing resources, storage and visualization devices. This integration can provide advanced services to many data-intensive applications in scientific research, industrial design, home entertainment, etc. The bandwidth requirement of such applications has increased rapidly during recent years. For example, projects in aircraft aerodynamic simulation and optimization generate gigabytes or more of data and need to transport these data in almost real time [1]. Other data-intensive applications, such as high energy physics projects [2] and interactive HDTV, also have huge bandwidth requirements. Among the existing network technologies, optical networks are able to provide a larger amount of bandwidth with lower latency. They can dynamically control and allocate network bandwidth to support large-scale data-intensive applications [4]. It is, therefore, a promising solution to interconnect the grid resources directly with the optical network and dynamically provision lightpaths. We refer to this integrated infrastructure as optical grid or lambda grid [3], [5], [6].

Manuscript received October 28, 2007; revised February 19, 2008. Current version published December 19, 2008. This work was supported in part by the NSFC project under Grant 60672016 and in part by the China 863 project.

Z. Sun, W. Guo, Z. Wang, Y. Jin, W. Sun, and W. Hu are with State Key Lab on Fiber-Optic Local Area Networks and Advanced Optical Communication Systems, Shanghai Jiao Tong University, China (e-mail: sunyuz@sjtu.edu.cn; wguo@sjtu.edu.cn; wzy1983@sjtu.edu.cn; jinyh@sjtu.edu.cn; sunwq@sjtu.edu.cn; wshu@sjtu.edu.cn)

C. Qiao is with the Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JLT.2008.923935

Many applications, such as the aircraft collaborative design application [15], consist of successive computing tasks and data communications between tasks. These applications are viewed as workflow-based applications and can be modelled as a directed acyclic graph (DAG) [13]. A vertex in the DAG represents a task and an edge represents the data communication between two tasks. Some applications may run each task and each communication on a dedicated resource, but this greatly increases the cost of the application system. Most applications utilize resources only when necessary to optimize the cost. Therefore, task scheduling is a fundamental issue in achieving high performance [12]. In a traditional grid system, the scheduling algorithm maps the DAG to the grid system, which involves resource assignment and order decision for tasks and routing for data communication. Compared with the scheduling problem in a traditional grid environment, task scheduling for workflow-based applications is more complicated in optical grid. The optical network should be viewed as a resource [20], and some attributes of the optical network, such as circuit switching and the wavelength continuity constraint, should be taken into account when a grid application is scheduled.

In this paper, we address the task scheduling problem for workflow-based applications in optical grid. We assume that the grid resources are connected by a wavelength routed optical network without wavelength converters, so the wavelength continuity constraint must be satisfied when calculating the route for a communication. The grid application is modelled as a DAG and our objective is to minimize the scheduling length of the application (the scheduling length is defined in Section III). We formulate the scheduling problem for workflow-based applications in optical grid. Since this problem is NP-hard [4], [7], we focus on the heuristic approach. We first extend the list-scheduling algorithm, which is a classic algorithm in traditional grid systems. We then propose a novel scheduling algorithm based on the Scheduled Critical Path, which is defined later. We compare the results from both algorithms to the optimal results calculated by OPL studio software on a 3-node optical grid, and we compare the results from both algorithms on more complicated systems. Both experiments reveal that the proposed Scheduled Critical Path algorithm is able to find a better solution.

0733-8724/$25.00 © 2008 IEEE


The remainder of this paper is organized as follows. Section II describes several relevant works. Section III introduces the scheduling model, including the model of optical grid and grid application. The scheduling objective and the constraints used in this paper are also presented. Section IV proposes the scheduling algorithm based on the scheduled critical path. Section V evaluates the proposed algorithm against the optimal results on a 3-node optical grid. We also construct a simulation tool to evaluate the performance of the proposed algorithm on complex systems. The simulation results and analysis are presented. Finally, Section VI concludes the paper.

II. RELATED WORK

Task scheduling is a key issue in achieving high performance in a grid environment. For each task in a workflow-based grid application, the scheduling algorithm decides the resource that it will be allocated to and the execution order on that resource. In [4] and [7], task scheduling has been proved to be NP-hard, so much of the research on the scheduling problem in traditional grid systems focuses on the heuristic approach [8]–[10], [13]. These algorithms assume that the resources are connected directly with each other and that communication can be performed whenever needed. In [12], Sinnen et al. proposed a scheduling model of heterogeneous systems with communication contention awareness to achieve better scheduling accuracy than the ideal model.

In the optical network field, a large amount of research has been performed on the routing problem [16]–[18]. Much of this research is conducted under static or dynamic traffic models. Usually, in these models, the traffic demands are independent in both the time and space domains. However, for a workflow-based application, a communication represents a precedence constraint between two tasks, and the source and the destination of the communication are determined by the allocation of the tasks. Since the allocation of tasks is not determined before the application is scheduled, this dependency makes it impossible to capture a traffic model of a workflow-based application. The allocation of tasks and the routing for communication should be considered at the same time during the scheduling process.

In [14], Wang et al. addressed the scheduling problem in optical grid. This work viewed the optical network as a resource and proposed a joint scheduling model which extended the list-scheduling algorithm to achieve communication contention aware scheduling in optical grid. The work considered the impact of different routing policies and proposed an efficient adaptive routing scheme.

Most of the scheduling algorithms discussed above are based on the classic list-scheduling algorithm. The list-scheduling algorithm is the most common and effective approach. It first sorts the tasks into a list according to the priority of each task and then schedules each task to a specific resource. Usually, the algorithm schedules the important tasks first to minimize the overall execution time, so the priority scheme which decides the scheduling order is important to the scheduling result. Many priority schemes have been proposed in [9] and [11], such as bottom level, top level, a technique based on critical path, etc. Sinnen et al. compare the performance of several task priority schemes in a heterogeneous environment and conclude that the bottom level provides the best performance [11]. The bottom level of a task is the length of the longest path from the task to a sink task (which has no successor) in the DAG. It includes the expected cost, both the computation cost and the communication cost. The scheduling algorithm based on bottom level first sorts the tasks into a list in decreasing order of bottom level. After that, the algorithm schedules each task to an appropriate resource.
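The bottom-level priority scheme can be illustrated with a short sketch. It is not code from the paper: the function names, the dictionary layout, and the four-task DAG are assumptions made for illustration, and ties are broken by task name only to keep the output deterministic.

```python
from functools import lru_cache

def bottom_levels(exec_cost, comm_cost, successors):
    """Return {task: bottom level} for a DAG.

    exec_cost[t]      expected execution cost of task t
    comm_cost[(u, v)] expected communication cost of edge u -> v
    successors[t]     list of successors of t (empty for a sink task)
    """
    @lru_cache(maxsize=None)
    def bl(t):
        # Longest remaining path through any successor, including the edge cost.
        tails = [comm_cost[(t, s)] + bl(s) for s in successors[t]]
        return exec_cost[t] + (max(tails) if tails else 0)

    return {t: bl(t) for t in exec_cost}

def schedule_list(levels):
    # Decreasing bottom level; ties broken by task name.
    return sorted(levels, key=lambda t: (-levels[t], t))

# Hypothetical 4-task DAG (not the DAG of Fig. 2).
exec_cost = {"a": 2, "b": 3, "c": 1, "d": 2}
comm_cost = {("a", "b"): 4, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 5}
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

levels = bottom_levels(exec_cost, comm_cost, successors)
order = schedule_list(levels)
print(levels)
print(order)
```

With positive execution costs, a task always has a larger bottom level than any of its successors, so the sorted list respects the precedence constraints of the DAG.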

In optical grid, especially for data-intensive applications, the actual communication cost may differ greatly from the expected cost. If two tasks are assigned to the same resource, the communication between them costs no time. However, when the two tasks are assigned to different resources and the optical network does not have enough resources for the communication between them, the communication has to wait for network resources to be released. The waiting cost should also be considered as a part of the actual cost. In the data-intensive scenario, the waiting cost can be an important factor in the finish time of the application. Since the bottom level is calculated only with the expected cost and the contention caused by the limited resources in optical grid is not taken into account, scheduling a task with a smaller bottom level first may reduce the waiting cost and the finish time of the application.

On the other hand, the contention is not known before the scheduling, so we first generate a scheduling result according to the list-scheduling algorithm, and then try to modify the scheduling order according to the actual execution cost. Based on the actual cost of tasks and communications in the DAG, we define the Scheduled Critical Path (SCP) as the longest path in the DAG consisting of several tasks and communications. The length of the SCP is equal to the scheduling length of the application. Since our objective is to minimize the scheduling length of the application, we try to increase the priority of the tasks on the SCP to search for a better scheduling result.

III. MATHEMATICAL MODEL

In this section, we describe the mathematical model, including the model of optical grid and the grid application. We also present the objective function and some constraints that the scheduling algorithm should satisfy.

A. Model of Optical Grid

In this paper, two sets of resources are considered: one is grid resources, including computational, storage, and visualization devices; the other is optical network resources, such as optical switch nodes connected by optical links, and access links connecting the grid resources with the optical switches. We assume that the optical network is a wavelength routed optical network. Each optical link contains several wavelength channels. We assume that there is no wavelength converter in the system, so each lightpath must be established under the wavelength continuity constraint. The grid resources are connected to the optical switch nodes via the access links. Usually, the topology is logical because the service provider may only allow access to certain network resources and grid resources.


Fig. 1. Optical grid example.

The optical grid is modeled as a graph. The following notation is used to describe the model of optical grid.

1) : Set of grid resources.
2) : Set of optical switch nodes.
3) : Set of links, including optical links and access links.
4) : Type of a grid resource; for example, 1 represents a computation resource, 2 represents storage, 3 represents a visualization resource, etc.
5) : Bandwidth of an access link.
6) : The number of wavelength channels in the system.
7) : A wavelength channel in an optical link, identified by its index.
8) : Bandwidth of a wavelength channel.

Fig. 1 depicts an example of an optical grid with 9 grid resources connected by a 6-node optical network.

B. Model of Grid Application

A grid application is usually composed of many dependent tasks, so it can be represented as a directed acyclic graph (DAG). In this paper, the following notation is defined to describe the DAG.

1) : Set of vertices; a vertex represents a task of the grid application.
2) : Set of edges; an edge represents the data communication from one task to another.
3) : Type of a task; for example, 1 represents a computation task, 2 represents a storage task, 3 represents a visualization task, etc.
4) : Average execution time of a task.
5) : Data communication time from one task to another on one unit of bandwidth.
6) : Set of predecessors of a task. A task with no predecessor is defined as the source task.
7) : Set of successors of a task. A task with no successor is a sink task.

Fig. 2 depicts an example of a grid application represented by a DAG. In this example, there are three types of tasks. Among all the tasks, the one with no predecessor is the source task, and the one with no successor is the sink task.

Fig. 2. DAG example.

C. Scheduling Objective and Constraints

Based on the model of optical grid and grid application, the scheduling process maps the DAG to the optical grid, allocating a grid resource of the corresponding type to each task and assigning a lightpath to each data communication. The scheduling objective is to minimize the scheduling length (the execution time of the application), which is the time required for the last task to finish. To describe a scheduling result, the following terms are defined.

1) : The resource that a task is assigned to.
2) : Start time of a task on its resource.
3) : Finish time of a task on its resource.
4) : The heterogeneity factor, which shows the capability of a resource in processing a task. It is used to calculate the actual execution time of the task on the resource.
5) : The scheduling length, which is the finish time of the sink task.
6) : Lightpath for a data communication, which includes access links and a series of optical links. If the two tasks connected by the communication are assigned to the same resource, the lightpath is empty.
7) : Bandwidth of a lightpath. It is the minimum of the access links' bandwidth and the wavelength channel's bandwidth.
8) : The start time of a data communication.
9) : The finish time of a data communication. If the two tasks connected by the communication are assigned to the same resource, the finish time is equal to the start time.
10) : The time when a lightpath is ready for a communication.

Objective function

(a)

Constraints

if (b)

or if (c)

(d)

(e)

(f)

Equation (b) represents the resource type constraint. Tasks can only be assigned to resources of the corresponding type.

Inequalities (c) represent the resource no-time-overlap constraint. Tasks assigned to the same resource must not be processed at the same time.

Inequality (d) represents the task precedence constraint. A task can be processed only after all its predecessors have finished and all the data it needs have been transferred to the resource that it is assigned to.

Inequality (e) shows that a communication can only start after the task that sends the data has finished.

Inequality (f) represents the network resource constraint. Since the network resource may be occupied by other communications, a communication can only start after its lightpath is available.

From (e) and (f), the start time of a communication is the later of the finish time of the sending task and the time when its lightpath is ready.
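The objective and constraints described in this subsection can be summarized in one place. The notation below is chosen here for illustration and is an assumption, so it may differ from the symbols of the original formulation: $n_i$ is a task, $e_{ij}$ the communication from $n_i$ to $n_j$, $R(n_i)$ the resource assigned to $n_i$, $T(\cdot)$ a type, $ST$ and $FT$ start and finish times, $RT(e_{ij})$ the time when the lightpath of $e_{ij}$ is ready, and $SL$ the scheduling length.

```latex
\begin{align}
&\min\; SL = FT(n_{\mathrm{sink}}) \tag{a}\\
&T\bigl(R(n_i)\bigr) = T(n_i) \tag{b}\\
&ST(n_i) \ge FT(n_j)\ \text{or}\ ST(n_j) \ge FT(n_i)
  \quad \text{if } R(n_i) = R(n_j),\ i \ne j \tag{c}\\
&ST(n_j) \ge FT(e_{ij}) \quad \forall\, n_i \in pred(n_j) \tag{d}\\
&ST(e_{ij}) \ge FT(n_i) \tag{e}\\
&ST(e_{ij}) \ge RT(e_{ij}) \tag{f}\\
&ST(e_{ij}) = \max\bigl\{FT(n_i),\ RT(e_{ij})\bigr\} \nonumber
\end{align}
```

The last line is the earliest start time that satisfies both (e) and (f).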

IV. SCHEDULING ALGORITHM

In this section, we discuss the scheduling algorithms for optical grid. We first briefly describe the extended list-scheduling algorithm, which uses bottom level to set the task priority, and then present the proposed scheduled critical path (SCP) algorithm. A simple schedule example is presented to show the difference between the two algorithms.

Fig. 3. Extended list-scheduling algorithm.

Fig. 4. Procedure of calculating SCP.

A. Extended List-Scheduling Algorithm

The list-scheduling algorithm is the most common heuristic for scheduling a grid application that is modeled as a DAG. The extended list-scheduling algorithm with routing and wavelength assignment policy is shown in Fig. 3.

The scheduling order of each task is set according to the bottom level. The bottom level of a task is the length of the longest path from it to a sink task, including the communication cost. It is calculated recursively from the bottom levels of the task's successors.
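One way to write this recursion, using symbols chosen here for illustration (they are assumptions, not necessarily the paper's notation): $w(n_i)$ is the expected execution cost of task $n_i$, $c(e_{ij})$ the expected communication cost of the edge from $n_i$ to $n_j$, and $succ(n_i)$ the set of successors of $n_i$.

```latex
bl(n_i) =
\begin{cases}
  w(n_i), & succ(n_i) = \emptyset \\[4pt]
  w(n_i) + \max\limits_{n_j \in succ(n_i)} \bigl( c(e_{ij}) + bl(n_j) \bigr), & \text{otherwise}
\end{cases}
```

The first case covers a sink task, whose bottom level is just its own execution cost.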

For example, in Fig. 2, the bottom level of task is equal to 1 and the bottom level of task is equal to 13, calculated by the equation .

According to the value of the bottom level for each task, the scheduling order of DAG in Fig. 2 is as follows (the number in the parentheses is the bottom level of the task):

The lightpath for data communication is calculated by an adaptive routing and wavelength assignment policy. We calculate the path that provides the earliest start time for the data communication. For each wavelength channel, every optical link is associated with a label showing the time when the wavelength channel on the link will be available; the label of source node is

Fig. 5. SCP algorithm.

By modifying Dijkstra's shortest path algorithm subject to the constraints in (e) and (f), we can calculate the path with the earliest available time. The wavelength channel which provides the earliest available time will be used for the data communication.
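A minimal sketch of such a modified Dijkstra search is given below. It rests on several simplifying assumptions that are not taken from the paper: each wavelength on each link carries a single label `free_at` and is assumed to stay free from that time onwards, the label of the source node is assumed to be the finish time of the sending task (as constraint (e) suggests), access links are not modeled, and the names `earliest_path` and `route` are hypothetical.

```python
import heapq

def earliest_path(links, src, dst, ready_at):
    """Minimax Dijkstra on a single wavelength.

    links[u] is a list of (v, free_at) pairs for the directed link u -> v.
    Returns (earliest start time, path), or (None, []) if dst is unreachable.
    """
    best = {src: ready_at}
    prev = {}
    heap = [(ready_at, src)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return t, path[::-1]
        for v, free_at in links.get(u, []):
            # A path is usable only once every link on it is free.
            cand = max(t, free_at)
            if cand < best.get(v, float("inf")):
                best[v] = cand
                prev[v] = u
                heapq.heappush(heap, (cand, v))
    return None, []

def route(wavelengths, src, dst, ready_at):
    """Search each wavelength separately (wavelength continuity) and keep
    the (start time, wavelength index, path) with the earliest start."""
    results = []
    for w, links in enumerate(wavelengths):
        t, path = earliest_path(links, src, dst, ready_at)
        if t is not None:
            results.append((t, w, path))
    return min(results) if results else None

# Hypothetical 3-switch chain with two wavelengths.
wavelengths = [
    {"A": [("B", 5)], "B": [("C", 9)]},  # wavelength 0
    {"A": [("B", 7)], "B": [("C", 3)]},  # wavelength 1
]
print(route(wavelengths, "A", "C", ready_at=4))
```

Replacing the usual sum of edge weights with a maximum keeps Dijkstra's greedy argument valid, because the label of a path never decreases when a link is appended.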

The time complexity for calculating the bottom level of tasks is . The time complexity of Dijkstra's shortest path algorithm is . So the time complexity of the extended list-scheduling algorithm is .

B. Scheduled Critical Path Algorithm

As discussed in Section III, the scheduling length of the DAG is the finish time of the sink task, and the finish time of the sink task is partially decided by the time when all the data from its predecessors are ready. Based on this idea, we define the scheduled critical path (SCP) as a path of tasks which decide the actual scheduling length. Obviously, the SCP contains the sink task, whose finish time is equal to the scheduling length of the DAG. The SCP should also contain the predecessor of the sink task whose data is the last to arrive at the resource of the sink task, as the finish time of this predecessor is a major factor in deciding the start time of the sink task according to inequality (d). By repeating the search for the "last-arrived-data" predecessor, the SCP is built as a path from the source task to the sink task. Each task on the SCP is critical to the scheduling length of the application. Fig. 4 shows the procedure of calculating the SCP, which keeps a pointer to the current task on the SCP.
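The backward search for the "last-arrived-data" predecessor can be sketched as follows. This is an illustration rather than the procedure of Fig. 4 itself: the function name, the `data_arrival` mapping (actual arrival times taken from an existing schedule), and the example values are assumptions.

```python
def scheduled_critical_path(sink, predecessors, data_arrival):
    """Trace the SCP backwards from the sink task.

    predecessors[t]      list of predecessors of t (empty for the source task)
    data_arrival[(p, t)] actual time at which the data of edge p -> t reached
                         the resource of t in the existing schedule
    """
    path = [sink]
    current = sink  # pointer to the current task on the SCP
    while predecessors[current]:
        # Follow the predecessor whose data arrived last.
        current = max(predecessors[current],
                      key=lambda p: data_arrival[(p, current)])
        path.append(current)
    return path[::-1]  # ordered from source task to sink task

# Hypothetical schedule of a 4-task DAG.
predecessors = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
data_arrival = {("a", "b"): 6, ("a", "c"): 3, ("b", "d"): 11, ("c", "d"): 15}
print(scheduled_critical_path("d", predecessors, data_arrival))
```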

Because the tasks on the SCP mainly decide the scheduling length of the DAG, we adjust the scheduling order of these tasks to search for better scheduling results. Starting from the sink task on the SCP, each task is moved forward to the earliest possible position in the schedule list without violating the task precedence constraint. Under this constraint, a task can only be positioned after all of its predecessors.

After adjusting the positions of the tasks on the SCP, the DAG is scheduled according to the new schedule list. The heuristic for selecting a resource for each task is the same as in the algorithm described in Fig. 3. The scheduling result with the minimal scheduling length will be the final result.
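The list adjustment can be sketched as below, assuming the schedule list is a valid precedence order and reading "moved forward" as "moved towards the head of the list". The function name and the five-task example are hypothetical.

```python
def promote_scp_tasks(schedule, scp, predecessors):
    """Return a new schedule list with SCP tasks moved as early as allowed."""
    order = list(schedule)
    for task in reversed(scp):  # start from the sink task
        order.remove(task)
        # Earliest legal slot: just after the last predecessor in the list.
        slot = 1 + max((order.index(p) for p in predecessors[task]), default=-1)
        order.insert(slot, task)
    return order

# Hypothetical DAG: a feeds b, c and d, which all feed e.
predecessors = {"a": [], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["b", "c", "d"]}
schedule = ["a", "b", "c", "d", "e"]   # e.g. ordered by bottom level
scp = ["a", "d", "e"]                  # from a previous scheduling pass
print(promote_scp_tasks(schedule, scp, predecessors))
```

Because every predecessor already precedes the task in a valid list, the new slot is never later than the old one, and successors remain behind the task.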

The main procedure of the SCP algorithm is shown in Fig. 5. The time complexity of calculating the SCP of the DAG is . So, the time complexity of the SCP algorithm is the same as that of the extended list-scheduling algorithm.

C. Schedule Example

A schedule example showing the difference between the two scheduling algorithms is presented in Fig. 6. The optical grid includes 3 resources of different types. Each optical link contains 1 wavelength channel. In Fig. 6(b), each optical link between two optical switches is labeled by its first wavelength channel, and each access link connects a resource to an optical switch.

The scheduling result of the example generated by the list-scheduling algorithm is shown in Fig. 7(a). The list-scheduling algorithm schedules tasks according to the value of bottom level. The schedule list is

The DAG is scheduled according to this schedule list. The scheduling length obtained by the list-scheduling algorithm is 23.

Based on the scheduling result, SCP is calculated as. After adjusting the position of tasks on SCP, the schedule list is

The DAG is rescheduled according to this schedule list. The scheduling length obtained by the SCP algorithm is 17. Clearly, it is better than the scheduling length obtained by the list-scheduling algorithm.

Adjusting the scheduling order of tasks must not violate their dependency constraints. In the example, several tasks must be scheduled after the tasks they depend on. However, there is no dependency relationship among three of the tasks, so under the dependency constraint we are still able to adjust the scheduling order of these three tasks to search for a better result. For a workflow-based application modeled as a directed acyclic graph (DAG), there are some tasks without a dependency relationship as long as there is more than one path in the graph. So an application under this condition can benefit from the proposed algorithm.

V. SIMULATIONS AND COMPARISON

Fig. 6. Schedule example: (a) DAG; (b) optical grid.

In this section, we present the comparison and evaluation of the proposed scheduled critical path (SCP) algorithm with the extended list-scheduling (ELS) algorithm. We first study how close to the optimal result the SCP algorithm and the ELS algorithm perform on a 3-node optical grid. Then we introduce a comparison on more complex systems. In order to evaluate the performance of our algorithm, we construct a simulator using the Java kit. We develop a module to generate random DAGs that are used to compare the performance of different scheduling algorithms. Three key parameters of the DAG are considered during the simulation: 1) communication-computation ratio (CCR), 2) DAG size (number of tasks in the DAG), and 3) average number of edges per node. CCR is defined as the sum of communication cost divided by the sum of task execution cost. Even though we still call it CCR, all types of tasks are counted: not only the computation tasks but also the storage tasks and the display tasks. The average number of edges per node is defined as the number of edges divided by the number of nodes. For example, the CCR of the DAG in Fig. 6(a) is the sum of its communication costs divided by the sum of its task execution costs, and its average number of edges per node is its number of edges divided by its number of tasks.
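Both DAG parameters follow directly from their definitions, as the sketch below shows. The cost values belong to a hypothetical four-task DAG, not to the DAG of Fig. 6(a), and the function names are illustrative.

```python
def ccr(comm_cost, exec_cost):
    """Sum of communication costs divided by sum of task execution costs."""
    return sum(comm_cost.values()) / sum(exec_cost.values())

def avg_edges_per_node(comm_cost, exec_cost):
    """Number of edges divided by number of tasks."""
    return len(comm_cost) / len(exec_cost)

# Hypothetical DAG: one entry per task and one entry per edge.
exec_cost = {"a": 2, "b": 3, "c": 1, "d": 2}
comm_cost = {("a", "b"): 4, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 5}

print(ccr(comm_cost, exec_cost))                 # 12 / 8
print(avg_edges_per_node(comm_cost, exec_cost))  # 4 / 4
```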

A. Comparison With Optimal Scheduling Results

We use a 3-node optical grid for this comparison. This optical grid is similar to the one in Fig. 6(b), but all three resources are identical computational resources. The execution time of a task on any resource is the same. Only one wavelength channel is operating in each optical link. The bandwidth of each wavelength channel and each access link is the same.

We randomly generate 20 DAGs and compare the average scheduling length of the ELS algorithm and the SCP algorithm with the optimal scheduling length calculated by the OPL studio software. The results are given in Fig. 8. The scheduling results obtained by the SCP algorithm are closer to the optimal scheduling results. Furthermore, as the CCR increases, the SCP algorithm achieves much better results than the ELS algorithm.

B. Performance on Large System

We compare the performance of different algorithms on larger systems by simulation. As we are interested in data-intensive applications, the simulations focus on high CCR values (1, 2, 3, 4, 7). The DAG size, which is the number of tasks of the DAG, describes the size and complexity of the application. During the simulation, DAG size is set to 50, 100, 150, 200, 250, and 300. The average number of edges per node describes the number of edges in the DAG relative to the number of tasks. A larger average number of edges per node means more edges in the DAG. During the simulation, the average number of edges per node is set to 8, 10, 12, and 16.

Two typical network topologies are used in the simulations: the 14-node NSFnet [shown in Fig. 9(a)] and a mesh torus [shown in Fig. 9(b)]. To make the simulation less complicated, we assume that each optical switch has one grid resource connected via an access link. It is assumed that each optical link has 4 wavelength channels available for the grid application. All the access links and the wavelength channels have the same bandwidth. Three types of resources and tasks are considered.

The following metric is used to compare the performance of the SCP algorithm with the ELS algorithm.

C. Normalized Scheduling Length (NSL)

The NSL of an algorithm is obtained by dividing the scheduling length of the algorithm by the lower bound of the DAG. The lower bound of a DAG is the time needed for processing all the tasks on the critical path of the DAG on the fastest resource and completing the necessary communication with maximum bandwidth. The lower bound of the DAG in Fig. 6(a) is 14. In this way, the NSL of the extended list-scheduling algorithm is 1.64 and the NSL of the scheduled critical path algorithm is 1.21.

Fig. 7. Scheduling results: (a) list-scheduling; (b) SCP.

Fig. 8. Comparison with optimal results with respect to different CCR.

Obviously, the lower bound is an ideal execution time that may not always be achievable, and the optimal result may be greater than the lower bound. Therefore, the NSL value of any algorithm is never smaller than 1. An algorithm with a relatively small NSL value achieves better performance.
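The lower bound and the NSL defined above can be sketched in Python as a longest-path computation over the DAG. This is a minimal sketch, assuming each task is given its execution time on the fastest resource and each edge its transfer time at maximum bandwidth; the names `lower_bound` and `nsl` and the 4-task DAG are hypothetical and are not taken from Fig. 6(a).

```python
from functools import lru_cache


def lower_bound(min_exec, min_comm):
    """Length of the critical path of the DAG.

    min_exec: task -> execution time on the fastest resource.
    min_comm: (source, destination) -> transfer time at maximum bandwidth.
    """
    successors = {t: [] for t in min_exec}
    for (u, v) in min_comm:
        successors[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(task):
        # Longest remaining path that starts with this task.
        tail = max(
            (min_comm[(task, s)] + longest_from(s) for s in successors[task]),
            default=0,
        )
        return min_exec[task] + tail

    return max(longest_from(t) for t in min_exec)


def nsl(schedule_length, bound):
    """Normalized scheduling length: schedule length / lower bound."""
    return schedule_length / bound


# Hypothetical DAG with two paths: a-b-d (length 9) and a-c-d (length 10).
min_exec = {"a": 2, "b": 3, "c": 4, "d": 1}
min_comm = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 2, ("c", "d"): 1}

bound = lower_bound(min_exec, min_comm)
print(bound)           # 10
print(nsl(12, bound))  # 1.2
```

As a sanity check on the figures quoted above, a lower bound of 14 combined with scheduling lengths of 23 and 17 would give NSL values of about 1.64 and 1.21; these two lengths are inferred here from the reported ratios and are not stated in this section.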

The simulation program runs on a PC with a 2.4-GHz processor and 512 MB of memory. It takes several seconds to schedule an application with the ELS algorithm, and the SCP algorithm takes about twice as long. A practical grid application may run for several hours or even days; therefore, the cost of the scheduling algorithm is negligible compared to the time needed for the whole application.

Fig. 9. Network topology used in simulation study: (a) 14-node NSFnet; (b) mesh torus.

1) Impact of Communication-Computation Ratio, Average Number of Edges Per Node and DAG Size: First, we study the performance of the scheduling algorithms under different communication-computation ratios (CCR), numbers of edges per node, and DAG sizes. The simulations were carried out over the NSFnet shown in Fig. 9(a). Fig. 10(a) depicts the NSL value of the two algorithms when CCR is 1, 2, 3, 4, and 7. In this group of simulations, the average number of edges per node is 10 and the DAG size is 200. The SCP algorithm achieves better performance than the ELS algorithm, and the improvement made by the SCP algorithm increases as CCR increases: the reported improvements are about 2.4%, 5.9%, and 6.4% at successively larger CCR values. These simulations indicate that the SCP algorithm achieves better performance in the data-intensive scenario.
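The improvement percentages can be read as the relative reduction in NSL achieved by SCP over ELS. The text does not spell out the formula, so the definition in the following Python sketch is an assumption, and the function name `improvement_percent` is hypothetical.

```python
def improvement_percent(nsl_els, nsl_scp):
    """Relative NSL reduction of SCP with respect to ELS, in percent.

    Assumed definition: 100 * (NSL_ELS - NSL_SCP) / NSL_ELS.
    """
    return 100.0 * (nsl_els - nsl_scp) / nsl_els


# NSL values quoted earlier for the small example DAG of Fig. 6(a).
print(round(improvement_percent(1.64, 1.21), 1))  # 26.2
```

Because the lower bound of a given DAG is the same for both algorithms, the same percentage results whether NSL values or raw scheduling lengths are compared for a single DAG.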

Fig. 10(b) depicts the NSL value of the two algorithms with respect to different average numbers of edges per node of the DAG. In this group of simulations, the CCR is 3 and the DAG size is 200. Again the SCP algorithm achieves better performance, and the improvement increases as the number of edges per node increases: it is about 4.6% when the average number of edges per node is 8, about 6.4% when it is 10, and about 6.9% when it is 16. When the average number of edges per node is higher, there are more edges relative to the number of tasks. As we discussed previously, the actual cost of communication may differ greatly from the expected one, and more edges in the DAG make the difference even greater. Therefore, the SCP algorithm achieves much better performance.

Fig. 10(c) depicts the NSL value of the two algorithms with respect to different DAG sizes. In this group of simulations, the CCR is 3 and the average number of edges per node is 10. The SCP algorithm achieves better performance, but the improvement decreases as the DAG size increases. The explanation is similar to the previous discussion about the number of edges per node. An increase in DAG size can be regarded as a decrease in the number of edges per node: there are relatively fewer edges in the DAG, and fewer edges leave less room for improving the scheduling results.

Fig. 10. Average NSL with respect to: (a) CCR; (b) average number of edges per node; (c) DAG size.

Fig. 11. Average NSL with respect to CCR in: (a) mesh torus; (b) NSFnet.

2) Impact of Different Network Topology: In this group of simulations, the impact of different network topologies is studied. Fig. 11(a) and (b) depict the average NSL of the DAGs on the mesh torus and on the 14-node NSFnet. Generally, the NSL on the mesh torus is smaller than that on NSFnet, because the mesh torus has a larger out-degree, which causes less communication contention. However, the improvement made by SCP over ELS increases faster in NSFnet than in the mesh torus. The explanation is that the mesh torus has less communication contention, so the actual cost of communication differs less from the expected one and the improvement made by SCP is smaller.

VI. CONCLUSION

In this paper, we introduce a novel task scheduling algorithm based on the scheduled critical path for data-intensive applications in the optical grid. The grid application is abstracted as a DAG to show the dependency relationships between tasks. A model of the optical grid is presented that takes into account specific characteristics of wavelength-routed optical networks, such as circuit switching and the wavelength continuity constraint. Under this scheduling model, the classic list-scheduling algorithm is extended. Because the performance of list-scheduling algorithms is, as discussed, not good enough for data-intensive applications in the optical grid, we propose a scheduling algorithm based on the scheduled critical path to improve scheduling performance without increasing the time complexity of the algorithm. In simulations using a set of randomly generated DAGs, the SCP algorithm provides better results than the extended list-scheduling algorithm for data-intensive applications. It is observed that the improvement made by the SCP algorithm increases as the communication-computation ratio (CCR) increases. The improvement can be more significant when there is higher communication contention caused by a lack of network resources.

ACKNOWLEDGMENT

The authors would like to thank Prof. D. Simeonidou for the helpful discussion and the correction of the manuscript.

REFERENCES

[1] H. Liu, X. H. Lin, Y. Qi, X. D. Lu, and M. L. Li, “Aero-crafts aerodynamic simulation and optimization by using ‘CFD-Grid’ based on service domain,” Grid Cooperative Comput., vol. 3251, 2004.

[2] W. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke, “Secure, efficient data transport and replica management for high-performance data-intensive computing,” in Proc. IEEE Mass Storage Conf., San Diego, CA, Apr. 2001, pp. 13–28.

[3] F. Travostino, J. Mambretti, and G. Karmous-Edwards, Grid Networks: Enabling Grids With Advanced Communication Technology. New York: Wiley, 2006.

[4] M. Veeraraghavan, X. Zheng, and Z. Huang, “On the use of connection-oriented networks to support grid computing,” IEEE Commun. Mag., vol. 44, pp. 118–123, 2006.

[5] T. Kudoh and G. Karmous-Edwards, “EnLIGHTened and G-lambda: Reserving interdomain lambda and compute resources across US and Japan,” presented at the IEEE ECOC Workshop on Networks for IT: A new Opportunity for Optical Network Technologies, 2007.

[6] D. Simeonidou, C. Nejabati, G. Zervas, D. Klonidis, A. Tzanakaki, and M. J. O’Mahony, “Dynamic optical network architectures and technologies for existing and emerging grid services,” IEEE J. Lightw. Technol., vol. 23, no. 10, pp. 3347–3357, Oct. 2005.

[7] J. D. Ullman, “NP-complete scheduling problems,” J. Comput. Syst. Sci., vol. 10, pp. 384–393, 1975.

[8] I. Ahmad, Y.-K. Kwok, and M. Y. Wu, “Analysis, evaluation, and comparison of algorithms for scheduling task graphs on parallel processors,” in Proc. 2nd Int. Symp. Parallel Architectures, Algorithms, and Networks, Beijing, China, Jun. 1996, pp. 207–213.

[9] H. Topcuoglu, S. Hariri, and M. Y. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 3, pp. 260–274, Mar. 2002.

[10] G. C. Sih and E. A. Lee, “A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures,” IEEE Trans. Parallel Distrib. Syst., vol. 4, no. 2, pp. 175–187, Feb. 1993.

[11] O. Sinnen and L. Sousa, “List scheduling: Extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures,” Parallel Comput., vol. 30, no. 1, pp. 81–101, Jan. 2004.

[12] O. Sinnen and L. Sousa, “Communication contention in task scheduling,” IEEE Trans. Parallel Distrib. Syst., vol. 16, no. 6, pp. 503–515, Jun. 2005.

[13] M. Y. Wu and D. D. Gajski, “Hypertool: A programming aid for message-passing systems,” IEEE Trans. Parallel Distrib. Syst., vol. 1, no. 3, pp. 330–343, Jul. 1990.

[14] Y. Wang, Y. H. Jin, W. Guo, W. Q. Sun, W. S. Hu, and M. Y. Wu, “Joint scheduling for optical grid applications,” J. Opt. Netw., vol. 6, pp. 304–318, 2007.

[15] Z. Y. Wang et al., “Demonstration of a task-flow based aircraft collaborative design application in optical grid,” presented at the ECOC, Berlin, Germany, Sep. 2007.

[16] D. Banerjee and B. Mukherjee, “A practical approach for routing and wavelength assignment in large wavelength-routed optical networks,” IEEE J. Sel. Areas Commun., vol. 14, pp. 903–908, Jun. 1996.

[17] B. Jaumard, C. Meyer, B. Thiongane, and Y. Xiao, “ILP formulations and optimal solutions for the RWA problem,” presented at the GLOBECOM, Dallas, TX, Nov. 2004.

[18] D. Cavendish, A. Kolarov, and B. Sengupta, “Routing and wavelength assignment in WDM mesh networks,” presented at the GLOBECOM, Dallas, TX, Nov. 2004.

[19] Y. Kwok and I. Ahmad, “Static scheduling algorithm for allocating directed task graphs to multiprocessors,” ACM Comput. Surv., vol. 31, no. 4, pp. 406–471, 1999.

[20] A. Jukan and G. Karmous-Edwards, “Optical control plane for the grid community,” IEEE Surv. Tutorials, vol. 9, no. 3, pp. 30–43, 2007.

Zhenyu Sun received the B.S. degree in electronic engineering from Shanghai Jiao Tong University (SJTU), China, in 2005, where he is currently pursuing the M.S. degree in communication and information systems.

He is with the State Key Lab on Fiber-Optic Local Area Networks and Advanced Optical Communication Systems, SJTU. His research interests include optical grids and optimization algorithms.

Wei Guo (M’06) has been an Associate Professor with the State Key Lab on Fiber-Optic Local Area Networks and Advanced Optical Communication Systems, Shanghai Jiao Tong University (SJTU), China, since 2003. Before joining SJTU, she was a Senior Engineer and a Project Manager for Fiberhome Telecommunication Technologies Co., Ltd., from 2001 to 2003. She has published over 50 publications in technical journals and conferences. Her research interests include optical grids, network planning, and optimization algorithms.

Zhengyu Wang received the B.S. degree in electronic engineering from Shanghai Jiao Tong University (SJTU), China, in 2005, where he is currently pursuing the M.S. degree in communication and information systems.

He is with the State Key Lab on Fiber-Optic Local Area Networks and Advanced Optical Communication Systems, SJTU. His research interests include optical grids and middleware software design and development.

Yaohui Jin (M’05) is a Professor with the State Key Lab on Fiber-Optic Local Area Networks and Advanced Optical Communication Systems, Shanghai Jiao Tong University (SJTU), China. Prior to joining SJTU, he was a member of technical staff at Bell Labs Research China from 2000 to 2002. He served as the TPC member in many international conferences. He has published more than 50 papers in technical journals and conferences. His research interests include optical networking, optical grids, and switch scheduling.

Weiqiang Sun (M’05) is currently a Lecturer with the State Key Lab on Fiber-Optic Local Area Networks and Advanced Optical Communication Systems, Shanghai Jiao Tong University (SJTU), China. His research interests include automatically switched optical networks (ASON) and optical multicast and TV distribution in overlay networks.

Weisheng Hu (M’05) is a Professor and the Director of the State Key Lab on Fiber-Optic Local Area Networks and Advanced Optical Communication Systems, Shanghai Jiao Tong University (SJTU), China. His interests are in generalized automatic switched optical network and optical packet switching. He is the author or coauthor of over 100 journal and conference papers.

Chunming Qiao (S’89–M’92) is a full Professor at the State University of New York at Buffalo where he directs the Lab for Advanced Network Design, Analysis, and Research (LANDER). He has over 12 years of academic and industrial experience in optical networks. He has published more than 100 papers in leading technical journals and conference proceedings, and is recognized for his pioneering research on optical internet and, in particular, the optical burst switching (OBS) paradigm. His work on integrated cellular and ad hoc networking systems (iCAR) is also internationally acclaimed.