Robust Load Delegation in Service Grid Environments

14
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 1 Robust Load Delegation in Service Grid Environments Alexander F¨ olling, Christian Grimme, Joachim Lepping, and Alexander Papaspyrou Abstract—In this paper, we address the problem of finding well- performing workload exchange policies for decentralized Computational Grids using an Evolutionary Fuzzy System. To this end, we establish a non-invasive collaboration model on the Grid layer which requires minimal information about the participating High Performance and High Throughput Computing (HPC/HTC) centers and which leaves the local resource managers completely untouched. In this environment of fully autonomous sites, independent users are assumed to submit their jobs to the Grid middleware layer of their local site, which in turn decides on the delegation and execution either on the local system or on remote sites in a situation-dependent, adaptive way. We find for different scenar- ios that the exchange policies show good performance characteristics not only with respect to traditional metrics such as average weighted response time and utilization, but also in terms of robustness and stability in changing environments. Index Terms—Grid Computing, Evolutionary Fuzzy Systems, Online Grid Scheduling, Performance Evaluation 1 I NTRODUCTION T ODAY, the use of Grid Computing is not anymore limited to HPC/HTC-centric communities such as High Energy Physics, Astronomy, or Climate Research, which have a certain tradition of using such infrastruc- tures. Other sciences—e.g. Financial Services, Construc- tion Engineering, and even arts and humanities—also start to adopt Grid Computing as a tool for e-Science, and show an ever-increasing demand for computing power and storage space. While well-established approaches such as the EGEE environment [1] have relied on centralized middleware infrastructures for whole e-Science communities, other— mostly emerging—efforts, such as the D-Grid Initiative 1 , have chosen a Service Grid approach with smaller, more community-tailored Grids. In the latter case, however, a strong demand for enabling collaboration and coop- eration on the infrastructure layer between the different communities and Grids can be observed. A major issue in such collaborations is the possibility of inter-community resource usage: Although most com- munities run their own data centers, working together in an ad-hoc manner by allowing alien workload to be All authors are with the Robotics Research Institute, Section Information Technology at TU Dortmund University, Germany. Manuscript received April 15, 2009. 1. http://www.d-grid.de run on community hardware is still a tedious task and usually requires resorting to 1980s-style command line interfaces and undesirable micro-management. This is mainly due to technical issues: Many e-Science infras- tructures show a lack of standardization, and therefore, collaborations between the workload gateways (usually Grid schedulers or brokers) fail on a compatibility level. There are, however, also organizational issues: Each com- munity Grid strives for delivering the highest possible Quality of Service to its own users and, as such, is only interested in participating in joint efforts if they are beneficial for all participants likewise. This last aspect is an open research problem in the field of Grid scheduling: algorithms for the exchange of work- load between different Grid communities—with respect to common performance metrics—have to perform at least as good as in the non-cooperative case. Otherwise, the motivation for participating in a HPC/HTC feder- ation, vanishes quickly, since one of the participating user communities will suffer from the collaboration. Here, we can identify four important properties for such algorithms: Support for environments with very strict information policies: Although almost every Grid provides var- ious kinds of information services, data regard- ing the machines themselves such as their current or overall utilization, average response times, or throughput is often kept confidential due to com- petition reasons. Strict separation from local resource management sys- tems (LRMS): Machine owners usually have their own operational policies implemented on their sys- tems and obviously are not willing to cease control over the machines they are obliged to fund. Situation-dependent, adaptive decision-making: The cur- rent state of the system is crucial when deciding on whether to accept or decline foreign workload, e.g. allowing for additional remote jobs if the local system is already highly loaded seems to be inap- propriate. Robustness and stability in changing environments: Even with respect to future, still unknown (and usu- ally unpredictable) job submissions, it is crucial that aspects such as complete site failures or even rogue

Transcript of Robust Load Delegation in Service Grid Environments

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 1

Robust Load Delegationin Service Grid Environments

Alexander Folling, Christian Grimme, Joachim Lepping, and Alexander Papaspyrou

F

Abstract—In this paper, we address the problem of finding well-performing workload exchange policies for decentralized ComputationalGrids using an Evolutionary Fuzzy System. To this end, we establisha non-invasive collaboration model on the Grid layer which requiresminimal information about the participating High Performance and HighThroughput Computing (HPC/HTC) centers and which leaves the localresource managers completely untouched. In this environment of fullyautonomous sites, independent users are assumed to submit their jobsto the Grid middleware layer of their local site, which in turn decideson the delegation and execution either on the local system or on remotesites in a situation-dependent, adaptive way. We find for different scenar-ios that the exchange policies show good performance characteristicsnot only with respect to traditional metrics such as average weightedresponse time and utilization, but also in terms of robustness andstability in changing environments.

Index Terms—Grid Computing, Evolutionary Fuzzy Systems, OnlineGrid Scheduling, Performance Evaluation

1 INTRODUCTION

TODAY, the use of Grid Computing is not anymorelimited to HPC/HTC-centric communities such as

High Energy Physics, Astronomy, or Climate Research,which have a certain tradition of using such infrastruc-tures. Other sciences—e.g. Financial Services, Construc-tion Engineering, and even arts and humanities—alsostart to adopt Grid Computing as a tool for e-Science,and show an ever-increasing demand for computingpower and storage space.

While well-established approaches such as the EGEEenvironment [1] have relied on centralized middlewareinfrastructures for whole e-Science communities, other—mostly emerging—efforts, such as the D-Grid Initiative 1,have chosen a Service Grid approach with smaller, morecommunity-tailored Grids. In the latter case, however,a strong demand for enabling collaboration and coop-eration on the infrastructure layer between the differentcommunities and Grids can be observed.

A major issue in such collaborations is the possibilityof inter-community resource usage: Although most com-munities run their own data centers, working togetherin an ad-hoc manner by allowing alien workload to be

• All authors are with the Robotics Research Institute, Section InformationTechnology at TU Dortmund University, Germany.

Manuscript received April 15, 2009.1. http://www.d-grid.de

run on community hardware is still a tedious task andusually requires resorting to 1980s-style command lineinterfaces and undesirable micro-management. This ismainly due to technical issues: Many e-Science infras-tructures show a lack of standardization, and therefore,collaborations between the workload gateways (usuallyGrid schedulers or brokers) fail on a compatibility level.There are, however, also organizational issues: Each com-munity Grid strives for delivering the highest possibleQuality of Service to its own users and, as such, is onlyinterested in participating in joint efforts if they arebeneficial for all participants likewise.

This last aspect is an open research problem in the fieldof Grid scheduling: algorithms for the exchange of work-load between different Grid communities—with respectto common performance metrics—have to perform atleast as good as in the non-cooperative case. Otherwise,the motivation for participating in a HPC/HTC feder-ation, vanishes quickly, since one of the participatinguser communities will suffer from the collaboration.Here, we can identify four important properties for suchalgorithms:

• Support for environments with very strict informationpolicies: Although almost every Grid provides var-ious kinds of information services, data regard-ing the machines themselves such as their currentor overall utilization, average response times, orthroughput is often kept confidential due to com-petition reasons.

• Strict separation from local resource management sys-tems (LRMS): Machine owners usually have theirown operational policies implemented on their sys-tems and obviously are not willing to cease controlover the machines they are obliged to fund.

• Situation-dependent, adaptive decision-making: The cur-rent state of the system is crucial when decidingon whether to accept or decline foreign workload,e.g. allowing for additional remote jobs if the localsystem is already highly loaded seems to be inap-propriate.

• Robustness and stability in changing environments:Even with respect to future, still unknown (and usu-ally unpredictable) job submissions, it is crucial thataspects such as complete site failures or even rogue

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2

participants are handled gracefully with respect tothe own and overall performance.

In the work at hand, we address these properties usinga Fuzzy based approach for job exchange in Compu-tational Grids, where the controller acts depending onthe current system state. The states are modeled byFuzzy sets which are represented by simple membershipfunctions. Although the system states themselves donot contain any uncertainty the whole representationof the system is uncertain as only limited informa-tion (see above) is provided. Such Fuzzy System basedscheduling techniques have been successfully appliedto online scheduling problems before, see for exampleFranke et al. [2]. They outperform most static schedulingheuristic due to their ability to flexibly adapt decisionsto changing environments. As they have proven to be areliable concept to tackle challenging online schedulingproblems, we decide to also apply them in the Gridcontext.

In order to establish good rules for the Fuzzy System,we furthermore use evolutionary algorithms for findingparameterization of the Fuzzy membership functions.This approach is especially suitable because of the pos-sibility to find a simple and efficient encoding of thewhole controller. This combination of Fuzzy Systems andevolutionary algorithm is commonly denoted as Evolu-tionary Fuzzy Systems, see Cordon et al. [3]. We showthat our approach, while respecting the aforementionedrequirements for Grid scheduling algorithms, shows ad-equate performance characteristics in real setups. Notethat we do not address the scheduling of data-intensiveapplications, where location of data is a significant factorat the moment. Similarly, we assume no scheduling ofworkflows an consider only independent jobs. However,the here presented concept can easily be augmentedwith existing data schedulers or workflow schedulingapproaches.

The remainder of the paper is organized as follows: InSection 2, we establish the basis for understanding ourmodel, algorithm, and optimization. We then introduceour system model in Section 3 and our Fuzzy GridScheduling approach in Section 4. After discussing toolsfor performance measurement in Section 5, we depict theevolutionary learning of rule sets in Section 6. Next, weevaluate our approach with respect to adaptiveness in aGrid federation in Section 7 and robustness in unknownenvironments in Section 8. Along with a review of thecurrent state of the art in Section 9, we conclude ourwork in Section 10.

2 BACKGROUNDThis section briefly introduces to the basic of job schedul-ing on Massively Parallel Processing (MPP) systems,Evolutionary Fuzzy Systems, and evolutionary algo-rithms. These definitions and tools are applied through-out the paper to easily describe the used Grid architec-ture as well as the proposed approach for realizing jobmigration.

2.1 Job Scheduling for MPP Systems

The scheduling of MPP systems is an online problem asjobs are submitted over time and the precise processingtimes of those jobs is unknown in advance. Further-more, information about future jobs are not available.We assume independent rigid parallel batch jobs for ouranalysis, which are dominant on most parallel computersystems. Those jobs are neither moldable nor malleableand require concurrent and exclusive access to the re-quested resources. Formally, each job j is characterizedby its degree of parallelism mj and its processing timepj . Although many additional criteria are conceivable,see Feitelson et al. [4], we restrict ourselves to only thosetwo required job properties.

During the execution phase, job j requires the con-current and exclusive access to mj ≤ mk processingnodes with mk being the total number of nodes on theMPP system at site k. The number of required processingnodes mj is available at the release date rj of job j anddoes not change during the execution. As the networkdoes not favor any subset of the nodes and all nodes ofa parallel computer system are either identical or verysimilar, we assume that a job j can be processed on anysubset of mj nodes of the system.

Further, most current real installations of parallel com-puters do not use preemption but let all jobs run tocompletion. The completion time of job j within theschedule S is denoted by Cj(S).

2.2 Evolutionary Algorithms

Optimization algorithms that mimic the natural processof Darwinian evolution are widespread in computerscience and often applied for parameter optimizationwhen the fitness landscape of the optimization problemis unknown.

A specific type of these algorithms—Evolution Strate-gies [5]—operate on a population of µ individuals, whereeach individual represents a real-coded solution to thegiven optimization problem. These approaches applyvariation operators like mutation (a random change ingenome) and recombination (combining two or moreparent individuals’ genomes) to breed λ offspring in-dividuals from those µ parental individuals, followedby a global selection process in which the individualscompete against each other to form the new µ parents forthe next generation. The above described evolutionaryloop is executed until a given termination criterion, like afixed number of generations or a quality level within theobjective space, is satisfied. Two versions of EvolutionStrategies are distinguishable: In the (µ, λ)-strategy, thenext parent generation is selected just from the offspringindividuals while the (µ + λ)-strategy selects the bestindividuals of both the parent and offspring generations.All other individuals are removed from the system andthe next loop iteration starts.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 3

2.3 Evolutionary Fuzzy Systems

Since their conceptualization in the early 1960ies FuzzySystems have been widely and successfully applied tovarious areas like for example control systems or clas-sification. Especially in control systems, they are partic-ularly suited for the representation of problem specificknowledge, as imprecision or vague descriptions arecommon properties of expertise. Currently, many deci-sion making methods (e.g. in the fields of resource man-agement or robot behavior) solve problems in a heuris-tic fashion. They give advices for actions in certain—often fuzzy described—situations that have turned outto be profitable with respect to a given objective. Sucha collection of situation-dependent expertise is calleda knowledge base.

There are several advantages to represent a knowl-edge base by Fuzzy logic within a Fuzzy System: Theinterpolative nature of Fuzzy Systems has the ability toexpress partial and concurrent activations of behaviorsand gradual transitions between them. Further, the be-havior can be conveniently synthesized by a set of IF-THEN rules using linguistic terms to encode the expertknowledge. Finally, due to its approximate reasoningcapabilities, Fuzzy logic produces controllers that arerobust to uncertainty and imprecision. Especially, thelatter property is of great importance for the problemaddressed in this paper, as we aim to produce robustexchange mechanisms within changing environments.

However, one of the major drawbacks of classic FuzzySystems is their missing learning ability. They alwaysrequire a existing knowledge base that has to be derivedfrom experts knowledge which is often called trainingdata. In many cases, that data is not available and thedesign of Fuzzy Systems is not possible at all. Alsofor the problem at hand we cannot revert to any kindof training data. Therefore, we employ an evolutionarylearning process to automate the Fuzzy System design.

Evolutionary Fuzzy Systems are Fuzzy Systems de-rived and optimized by an evolutionary learning pro-cess. For these systems an evolutionary algorithm is em-ployed to learn or tune different components. They arealways applied, if neither expert knowledge nor trainingdata is available or cannot be transformed directly intocorresponding rules. Those algorithms do not requireparticular knowledge about the problem structure andcan be applied to various systems.

3 SYSTEM MODEL

The problem of job distribution between federatedcompute clusters has been continuously studied sincethe emergence of Grid computing in the beginningof the 1990ies. Early approaches favor a hierarchi-cal scheduling structure, where a central schedulerinstance—often called Meta-Scheduler, Grid Scheduler,or Broker—delegates submitted jobs to subordinatedpartner sites [6]. The most profound problem of this

scheduling structure is its bad fault-tolerance and lackof scalability.

Such a centralized model implies full knowledge of theGrid sites’ state and exclusive control over the local re-source management system (LRMS) to facilitate efficientjob scheduling. The local user community is often notallowed to bypass the central scheduler for submittingjobs.

The afore described approach is contrasted by a de-centralized structure, in which local sites can act asautonomous peers and share jobs in an equitable fashion.Thus, the process of job interchange is deferred to thecompetence of each local site scheduler and puts specialemphasis on the decision making process of acceptingremote and dispensing local jobs.

With respect to the basic parameters of modern e-Infrastructures regarding organizational autonomy andequity, we assume our Computational Grid as a loosecooperation between different HPC centers—further re-ferred to as sites—and consider Massively Parallel Pro-cessing (MPP) systems as their basic entities. For everyMPP entity we assume an own local user demand forcomputational resources which is reflected by the sites’originating workload. This includes the submission char-acteristics, but also the adaptation of the submitted jobs’resource demand to the local configuration. This scenariois based on the perception that, as a general rule, Gridenvironments are not build from scratch, but emergefrom collaborations between different organizational do-mains, each of which already operating one or moreMPP systems for internal purposes, in order to serve aprescribed, project-driven community of users.

Although Kee et al. [7] present an analysis of existingresource configurations and proposes a Grid platformgenerator that synthesizes realistic configurations of bothcomputing and communication resources, we followa much simpler model for our first analysis. Existingmodels can be used to test the here derived strategiesin a larger context, but for the first development werather assume interconnected parallel computers as Gridentities.

More formally, a Computational Grid consists hereof |K| independent sites. Each site k ∈ K is modeledby mk parallel processors which are identical such thata parallel job can be allocated on any subset of thesemachines. Note that we do not assume heterogeneousmachines as such a model would require, that some Jobdependent speed-up factor must be added. It is widelyagreed that a simple linear speed-up model does notfit reality as the speed-up is dependent on both theprocessor architecture and the structure of a program. Itwould be therefore necessary to introduce benchmarksthat cover most of the submitted applications, but as nodetails about the programs are included in the workloadtraces this is not applicable. As long as there is nocomprehensive speed-up model available it is betterto assume all machines with approximately the samespeed.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 4

Further, splitting jobs over multiple sites (multi-sitecomputation) is not allowed. Moreover, we assume thatall sites only differ in the number of available processors,but not in their speed: As we focus on the job exchangealgorithms, the differences in execution speeds can beneglected, see Schwiegelshohn et al. [8].

Fig. 1. Computational Grid scenario with independent,autonomous and equitable sites in a federated environ-ment.

The workload management within the infrastructure isconducted by a two-tier middleware, see Figure 1, com-prising a Local Resource Management System (LRMS)and a Grid Resource Management System (GRMS) oneach site. While the LRMS takes care of assigning work-load to resources for the local site only, the GRMSdecides on the delegation of jobs from and to the site.Users submit their workload to the local site in the samemanner as on classic LRMS systems; a small submissioncomponent intercepts those and forwards them to thelocal GRMS for further inspection.

That is, jobs that are submitted to the local site sched-uler may not be accepted for execution elsewhere be-cause of their resource demand being oversized for someor all of the other sites. Ignoring the inter-site collabo-ration for a moment, we describe the local schedulingproblem on MPP systems in the next paragraph.

3.1 LRMS LayerThe Local Resource Management System (LRMS) layerconsists of a waiting queue and a scheduler. The waitingqueue stores all locally submitted jobs while the sched-uler executes a specific scheduling strategy in order toassign jobs from the waiting queue onto the availablelocal resources. On MPP system layer, this approach al-lows the realization of priorities for jobs of different user

groups. Usually, the scheduling strategies are formulatedby the system provider to fulfill the users’ needs. Al-though many special-purpose algorithms exists that aretailored for certain MPP system owner priorities, weuse the basic and simple First-Come-First-Serve (FCFS)algorithm as an example on LRMS. This heuristic startsthe first job of the waiting queue whenever enough idleresources are available. Despite the very low utilizationthat is produced in the worst case this heuristic workswell in practice [9]. In general, our methodology is notrestricted to any special LRMS scheduling algorithm andcan be trained for arbitrary locally customized systems.Here, FCFS serves only as a simple and still widely usedexample that can easily be substituted by backfillingalternatives or even more advanced strategies.

In the remainder of this paper, we assume a LRMSwith one dedicated waiting queue for all incoming jobsand FCFS as local scheduling algorithm. An advantage,besides its fairness and easy implementation, is thesparse amount of required information. As we developGrid job exchange policies within information restrictivescenarios (e.g. no runtime estimations etc.), an LRMSalgorithm is required that works fine with a very basicjob description. Thus, we chose FCFS as it solely requiresa job’s degree of parallelism to make its scheduling de-cision. However note that our general methodology forlearning of rule based exchange policies, see Section 4, isnot restricted to any special LRMS algorithm. This con-cept can be applied with arbitrary LRMS configurationand FCFS serves only as a most common example.

3.2 GRMS Layer

The Grid Scheduling Resource Management Sys-tem (GRMS) extends every site by an additional layeron top of the LRMS, see Figure 1. The GRMS acceptslocally submitted jobs on behalf of the underlying LRMS.The actual exchange behavior is realized exclusively bythe GRMS and due to this strict layered architecturethe LRMS is kept completely unmodified. Both removalof jobs from LRMS queues as well as any kind ofintervention in the local scheduling process is prohibited.Furthermore, the GRMS is transparent to local users andthe LRMS. From the users point of view, all submittedjobs are executed on the local site, whereas each LRMSconsiders every job as a locally submitted independentof its origin. Decisions about a job’s delegation to an-other GRMS or local scheduling is made by a deployedexchange policy.

This exchange policy can be differentiated into twoindependent policies:Location PolicyThis policy becomes relevant if more than one exchangepartner is available in the Grid. Thus, there exists morethan one possibility to delegate a job to a remote Gridparticipant. For such scenarios, the location policy de-termines as a first step the sorted subset of possibledelegation targets À, see Figure 2.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 5

Location Policy

Transfer Policy Transfer Policy

Deliver job to LRMS

[accept]

delegation attempt

Decline of

remote job

[try to delegate job]

[execute locally]

[decline]

Local Job

Offer

Remote Job

Offer

[decline]

[accept]

[no partner

accepted]

[at least one

partner has job accepted]

[site left]

[no site left]

1

2

3

4

5

6

7

8

9

10

Fig. 2. Decision making process for local and remotejobs using an exchange policy consisting of location- andtransfer policy.

Transfer PolicyAfter the location policy has been applied the transferpolicy specifies whether a job should be delegated toa certain partner or not. For this purpose the policy isapplied separately on each partner in an redeterminedorder. Every time the transfer policy is consulted itdecides whether the job should be executed locally Áor delegated to the considered partner Â. In the firstcase, the job is sent to the remote LRMS Ã and in theother case the considered partner is requested for a job’sacceptance. A request can be replied in two differentways:

1) The job is accepted by the remote parter Ä. In thiscase, the job is delegated to this partner and noother further delegation attempts have to be made.

2) If the acceptance is declined the transfer policy isapplied for another partner in the Grid Å. This iter-ative procedure is continued until all partners havebeen requested. If none of the Grid participants iswilling to accept the job, the requesting site mustexecute it locally Æ, Ç, and Ã.

Further, the transfer policy has to decide about jobs thatare offered from remote sites and can choose betweenaccepting or declining a job offer. In the former case, thejob is immediately forwarded to the LRMS È, while it isrejected in the latter case É.

4 FUZZY SYSTEM BASED GRID SCHEDULINGAPPROACH

For the design of our GRMS decision policy we applythe method of Fuzzy inference proposed by Takagi,Sugeno and Kang, which is known as the Takagi-Sugeno-Kang (TSK) model in Fuzzy Systems literature [10].

The big advantage of the TSK model is its representa-tive power. It is capable of describing a highly nonlinearsystem using a small number of rules. Furthermore,since the output of the model has an explicit functionalexpression form, it is conventional to identify its param-eters using some learning algorithms as e.g. evolutionaryalgorithms. Exactly this combination will be used laterto optimize the Fuzzy model, see Section 6.

In this paper, the Fuzzy GRMS decision policy isfounded on a set of rules. Each specific rule describesa system state in which decisions about the acceptanceor refusal of jobs must be made. Thus, each system stateis described by a set of features. From the different partsof the overall system various state describing features areconceivable. They might be related to the current stateof the LRMS layer or to the currently job to decide, seeFigure 3. Please note that information about remote sites’systems states is assumed strictly classified. Following

GRM

S

Transfer Policy

LRM

S

Job

x1

x2

x3

Fuzzy Controller A

ccep

t / D

eclin

e

yD YD

Job-related features

Local features

Fig. 3. General Controller Architecture.

the Fuzzy rule concepts, a rule consists of a featuredescribing conditional part and a consequence part thatdecides on the acceptance or decline of an offered job.An exemplary set of rules might consist of the followingreasonable statement expressing some sort of experts’knowledge or experience:

IF queue is long and job highly parallel THEN decline job...

......

IF queue is empty and job not parallel THEN accept job

A complete rule base now constitutes the core of the rulesystem that can therefore be considered as a controller.The current system is checked whenever a new job hasbeen submitted to the local system or has been offeredfrom remote sites. In all those cases the current systemstate might change and the controller output has to bechanged if necessary. The controller concept is describedin the next paragraph.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 6

4.1 Fuzzy System for Decision Making

More formally, the general TSK model consists of Nr IF-THEN rules Ri such that

Ri := IF x1 is g(1)i and . . . and xNf

is g(Nf )i

THEN yi = bi0 + bi1x1 + . . .+ biNfxNf

(1)

where x1, x2, . . . , xNfare input variables and elements of

a vector ~x, and yi are local output variables. Further, g(h)i

is the h-th input Fuzzy set that describes the membershipfor a feature h. Thus, system state is described by a num-ber of Nf features. The actual degree of membership iscomputed as function value of an input Fuzzy set whichis characterized for example by a Gaussian MembershipFunction (GMF). The here used Fuzzy sets are explainedin the next section. Furthermore, bih are real valuedparameters that specify the local output variable yi as alinear combination of the input variables ~x. The overalloutput of the system yD(~x) is computed by Equation 2.

yD(~x) =

Nr∑i=1

φi(~x)yi

Nr∑i=1

φi(~x)(2)

where φi(~x) is the degree of membership of rule Ri for agiven input vector ~x, which is defined as

φi(~x) = g(1)i (x1) ∧ g(2)

i (x2) ∧ . . . ∧ g(Nf )i (xNf

) (3)

Each rule’s recommendation is weighted by its degreeof membership with respect to the input vector ~x. Thecorresponding output value of the TSK-System is thencomputed by the weighted average output recommen-dation over all rules. In the following, we explain howthis very general model is adapted to the problem ofdecision making addressed here. The specific coding ofrules and the output computation will be detailed in thefollowing paragraphs.

4.2 Encoding of Rules

The TSK-model allows do describe the degree of mem-bership by any sort of membership function. However,as the proposed Fuzzy-Systems is going to be optimizedwith an evolutionary algorithm the number of parame-ters for a rule system must be kept as small as possibleas every additional parameter increases the search spaceof the optimization problem and might deteriorate thesolution quality. Because of their smoothness and concisenotation, the Gaussian membership function has becomea very popular for specifying Fuzzy-Sets. It only requirestwo parameters (mean and variance) to specify its shapewhich makes it particularly suited to represent a rulebase with a minimum number of parameters.

This, for a single rule Ri every feature h of allNf features is now modeled by a (γ(h)

i , σ(h)i )-Gaussian

Membership Function (GMF)2 with no normalization asshown in Equation 4.

g(h)i (x) = exp

−(x− γ(h)i )2

σ(h)i

2

(4)

This function is completely described by defining theγ

(h)i and σ

(h)i values. The γ

(h)i -value adjusts the center

of the feature value, while σ(h)i models the region of

influence for this rule in the feature domain. In otherwords, for increasing σ

(h)i values the GMF becomes

wider, while the peak value remains constant at 1. Usingthis property of a GMF we are able to steer the influenceof a rule for a certain feature by σ

(h)i .

ã1(1) ó1

(1) ã1(2) ó1

(2) ã1(Nf) ó1

(Nf)…

ã2(1) ó2

(1) ã2(2) ó2

(2) ã2(Nf) ó2

(Nf)…

y1

y2

R1

R2

RNr

R1 R2 R3 RNr…

Rule Base

1/-1

1(1)

1(1)

1(2)

1(2)

1(Nf)

1(Nf)…

2(1)

2(1)

2(2)

2(2)

2(Nf)

2(Nf)…

y1

y2

R1

R2

RNr

R1 R2 R3 RNr…

Rule Base

1/-1

Fig. 4. Encoding pattern for single rules and constructionconcept for a whole rule base using concatenation.

Using this GMF as membership function a feature canbe coded as a pair of real values γ(h)

i and σ(h)i following

the approach of Juang et al. [11] and Jin et al. [12]. Usingthis feature description, a single rule’s conditional partis composed as shown in Figure 4. For the consequencepart, the general model in Equation 2, can be simplifiedas we have to deal with binary decisions only. Depen-dent on the current system state, the Fuzzy decisionmaker has to decide whether to accept an offered jobor not. Thus, we represent the acceptance of a job byan output value of 1 and the corresponding refusal of ajob by -1. With this binary decision concept, all weightsexcept bi0 in Equation 2 are set to 0 and the TSK modeloutput becomes yi = bi0. As we have to decide betweenthe acceptance/decline of a job offer, the output values

2. Different from the common notation we denote the mean of theGMF by γ to avoid conflicts with the parental population size ofEvolution Strategies which is in this paper denoted by µ.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 7

for a rule Ri can be chosen as

yi =

{1, if job is accepted−1, otherwise

(5)

Exactly this encoding concept is now used to buildan individual within an evolutionary algorithm. Thisscheme allows the encoding of a single rule by a string of2 ·Nf real-valued and one integer variable, see Figure 4.The whole rule base is encoded by concatenation ofsingle rules. A whole rule base consisting of Nr rules istherefore entirely described by a set of l = Nr(2 ·Nf + 1)parameters.This encoding scheme is perfectly suited asindividual representation within an evolutionary algo-rithm where individuals have the length l. As objec-tive function for the evolutionary algorithm we use theAWRT, see Section 5.1. Therefore, the optimization prob-lem is to find suitable parameter settings for the rules inorder to minimize the AWRT for alle participating sitesin the Grid.

4.3 Computation of the Controller Decision

To determine the actual controller output for a set ofinput states ~x the superposition of all degrees of mem-berships for a single rule Ri is computed first. For eachrule Ri a degree of membership g

(h)i (xh) of the h-th of

all Nf features is determined for all h. This value iscomputed as the function value of the h-th GMF for thegiven input feature value xh. According to the generalmodel, see Equation 3, the multiplicative superpositionof all these values as ”AND”-operation leads to anoverall degree of membership φi(~x) for rule Ri as shownin Equation 6.

φi(~x) =Nf∧h=1

g(h)i (xh) =

Nf∏h=1

exp

{− (xh − γ(h)

i )2

σ(h)2

i

}(6)

Further, the final controller output YD can be computedby considering the leading sign only, see Equation 7,

YD = sgn(yD(~x)) (7)

where a positive number again represents the acceptanceof the job and a negative values the decline. Note thatthe value zero corresponds to a decline as well.

The TSK-model allows including an arbitrary numberof features as controller input. Thus, it is possible toachieve a preferably accurate state description. However,this would increase the number of adjustable systemparameters drastically as each feature requires an addi-tional (γ, σ)-pair per rule. As the proposed Fuzzy systemis going to be optimized with an evolutionary algorithmthe number of system describing parameters must bekept as small as possible as every additional parameterincreases the search space of the optimization problemand might deteriorate the solution quality. Thus, werestrict ourselves to only two features for the systemstate description and detail them in the next paragraph.

4.4 Feature Selection for System State DescriptionFor the description of the current system state we relyon only Nf = 2 different features that will constitutethe conditional part of a rule. We denote jobs that havebeen inserted into the waiting queue ν at site k as j ∈νk. In order to cover comprehensive system informationwith only a single feature we consider the NormalizedWaiting Parallelism at site k (NWPk) as the first feature,see Equation 8.

NWPk =1mk

∑j∈νk

mj (8)

This feature indicates how many processors are expectedto be occupied by all submitted jobs (note that the num-ber of requires processors mj is known at release time)related to the maximum number of available processorsmk at site k. It reflects the efficiency of the currentlyrunning LRMS and measures the near future expectedload of the machine.

The second features focuses on the actual job thathas to be decided. The ratio of a job’s resource re-quirements mj and the maximum number of availableresources mk at the job’s submission site k is expressedby the Normalized Job Parallelism (NJP) in percent, seeEquation 9.

NJPj =mj

mk· 100 (9)

With those two selected features we approximate everypossible system state. Note that the NWP is not limitedin range; however, values greater than 10 occur veryrarely in practice.

4.5 Configuration of the Evolutionary Fuzzy SystemBefore we present the evaluation results the configura-tion of the Evolutionary Fuzzy System and the furtherevaluation circumstances are detailed. We generate ourEvolutionary Fuzzy Systems with a fixed number ofNr = 10 rules. Previous studies of Franke et al. [13]revealed that rule bases consisting of five to ten rulesyield good results. As we encode the whole rule basein one individual, we have to optimize a problem withNr · (Nf · 2 + 1) = 10 · (2 · 2 + 1) = 50 parameters.

For the tuning of the Fuzzy System we apply a (µ+λ)-Evolution Strategy. During the run of 150 generation acontinuous progress in fitness improvement is observ-able. As recommended by Schwefel [5], the ratio ofµ/λ = 1/7 should be used for Evolution Strategies. Wecreated a parent population of µ = 13 individuals whichresults in a children population of λ = 91 individuals.Hence, 91 individuals must be evaluated within eachgeneration. As objective function we use the AWRT, seeSection 5.1.

For the variation operators we used further the fol-lowing configurations: The mutation is performed withan individual mutation step-size for each feature. As thetwo features vary in they possible value range by a ratioof 1:10, see Figure 4.4, we used a mutation step-size of

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 8

0.01 for NWP and 0.1 for NJP respectively. This mutationis applied for the conditional part of the rule as they arereal values. For the binary consequence part we mutatevalues by flips from -1 to 1 or vice versa. Further, weapply discrete recombination in each reproduction step.

The population is uniformly initialized within theranges [0, 10] for the (γ, σ)-values of NWP and [0,100] for NJP respectively. As the fitness evaluation ofan individual is quite time consuming (from severalminutes up to half an hour) we evaluated the wholepopulation in parallel on a 200 node cluster with Pen-tium IV, 2.4Ghz machines. To train a basic rule set, ittakes on the cluster computer between five and ten hoursdepending on the simulated setups, see Table 2, as for Ksites (K2 −K) pairings have to be simulated. However,for the actual application of the Fuzzy-Systems onlybasic mathematical operations are requires which can beperformed within a negligible amount of time.

5 PERFORMANCE EVALUATION

In order to evaluate the performance of our approach,we define several well-known performance indicatorsfor job scheduling in the context of Grid computing,both from the users’ and the providers’ point of view.Additionally, we discuss the workload traces we use asinput data that are derived from real-world setups.

5.1 Average Weighted Response Time

This objective is computed for all jobs j ∈ τk that havebeen initially submitted to site k, see Equation 10. Inthe following we further denote by j ∈ πk all jobs thathave been actually executed on site k. It is widely agreedthat a short AWRT is the best way to describe that onaverage users do not wait long for their jobs to complete.Following Schwiegelshohn and Yahyapour [9], we usethe resource consumption (pj ·mj) of each job as weight.This ensures that neither splitting nor combination ofjobs can influence the objective function in a beneficialway.

AWRTk =

∑j∈τk

pj ·mj · (Cj(S)− rj)∑j∈τk

pj ·mj(10)

Note that this also respects the execution on remote sitesand, as such, the completion time Cj(S) refers to the sitethat executed job j.

5.2 Squashed Area and Utilization

The first two objectives are Squashed Area SAk andUtilization Uk, both specific to a certain site k. Theyare measured from the start of the schedule Sk, that isminj∈πk

{Cj(Sk) − pj} as the earliest job start time, upto its makespan Cmax,k = maxj∈πk

{Cj(Sk)}, that is thelatest job completion time and thus the schedule’s length.

SAk denotes the overall resource usage of all jobs thathave been executed on site k, see Equation 11.

SAk =∑j∈πk

pj ·mj (11)

Uk describes the ratio between overall resource usageand available resources after the completion of all jobsj ∈ πk, see Equation 12.

Uk =SAk

mk ·(Cmax,k − min

j∈πk

{Cj(Sk)− pj}) (12)

Uk describes the usage efficiency of the site’s availablemachines. Therefore, it is often serving as a schedulequality metric from the site provider’s point of view.

However, comparing single-site and multi-site utiliza-tion values is not recommended: since the calculationof Uk depends on Cmax,k, valid comparisons are onlyadmissible if Cmax,k is approximately equal betweenthe single-site and multi-site scenario. Otherwise, highutilizations may indicate good usage efficiency, althoughthe corresponding Cmax,k value is very small and showsthat only few jobs have been computed locally whilemany have been delegated to other sites for remoteexecution.

As such, we additionally introduce the Change ofSquashed Area ∆SAk, which provides a makespan- in-dependent view on the utilization’s alteration, see Equa-tion 13.

∆SAk =SAk∑

j∈τk

pj ·mj(13)

From the system provider’s point of view this objectivereflects the real change of the utilization when jobs areshared between site compared to the local execution.

5.3 Input DataThe Parallel Workloads Archive3 provides job submis-sion and execution traces recorded on real-world MPPsystem sites, each of which containing information onrelevant job characteristics. Four well known workloadshave been selected for evaluation: The KTH trace whichcontains records from a 100 processor IBM RS/6000 SPsystem at the Swedish Royal Institute of Technology inStockholm, the CTC trace from a 430 processor IBMRS/6000 SP system at the Cornell Theory Center inIthaca, NY, and two logs recorded at the San DiegoSupercomputer Center (SDSC) in La Jolla, CA. On theone hand, the SDSC05 ”DataStar” trace recorded on aIBM eServer pSeries 655/690 system with a total of 1664processors and the SDSC00 trace containing submissionsto a 128 processor IBM RS/6000 SP system. Relevantdetails of the examined cleaned traces are given inTable 1.

The original workloads record time periods of dif-ferent length. In order to be able to combine different

3. http://www.cs.huji.ac.il/labs/parallel/workload/.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 9

Identifier #Jobs mk AWRT U Cmax

KTH-5 11780 100 488387.49 64.84 13765377KTH-6 16699 100 99236.27 68.52 16420782CTC-5 35360 430 57897.77 63.74 13009718CTC-6 41839 430 59118.15 67.05 16346403SDSC05-5 28184 1664 56925.10 45.94 13078215SDSC05-6 46719 1664 77463.52 70.97 16419455

SDSC00-6 16316 128 413957.04 73.38 17002360

TABLE 1Workload characteristics of the used input data, including

AWRT in seconds, U in %, and Cmax in seconds forsingle site execution with FCFS.

workloads in a multi-site simulations and to have avalidation set available, we shortened and separated theworkloads to set of five and six month respectively4. Inthe remainder of this work, the five month sequencewill serve as training sequences for the EvolutionaryFuzzy Systems. The six month sequences are then usedfor application tests. In this context, the SDSC00-6 tracewill only be used to investigate the behavior of thetrained system when an previously unknown partnerparticipates in the system. Thus, we created no fivemonth training sequence of the SDSC00 workload. Itis important to mention that the here described dataonly serve as simulation input and do not contain anyadditional information concerning Fuzzy Systems (e.g.knowledge base, see Section 2.3)

Further, we do not shift the traces regarding theiroriginating timezones and restrict our study to work-loads which are all submitted within the same timezone.Therefore, the known diurnal rhythm of job submissionis similar for all sites in our scenario and time shiftscannot be availed to improve scheduling. In a global gridthe different timezones even benefit the job schedulingas idle machines at night can be used by jobs from peakloaded sites at noon, see Ernemann at al. [14]. As wecannot benefit from timezone shifts the presented resultsmight be even better in a global grid.

We simulated the workload on their original machineswith a LRMS that applied FCFS, see Section 3.1 for thelocal scheduling. The results for the above describedperformance metrics as well as other relevant job char-acteristics are listed in Table 1. We will refer to this non-cooperative case for the matter of comparison in the restof this paper.

6 LEARNING A BASIC RULE SET

Starting with no rule set at all, we need to bootstrapthe system: A first, basic set of rules has to be learned.Although it is generally necessary to create rule sets forboth location and transfer policy, see Section 3.2, we startwith a pair-wise training approach in order to reduceother partners’ influences as much as possible. To this

4. http://www.it.irf.uni-dortmund.de/∼lepping/traces/.

end, we limit job exchange to a single partner only—thus needing no location policy at all—and concentrateon the optimization of the transfer policy. Furthermore,we evolve only one site at a time while applying a statictransfer policy on the other site. This policy realizes anAccept When Fit (AWF) behavior, which accepts all jobsoffered to the GRMS layer for local execution if theydo not require more than the currently free resources.Otherwise, the job is offered to the other participants.

The motivation for using a static transfer policy forthe training partner and refraining from developing bothsites together lies in the application of evolutionary op-timization methods: A simultaneous training of two rulebases would lead to a mutual adaption of both trainingpartners. This however, would result in an environmentsubject to continuous change, making an evolutionary-guided adaptation very difficult: A rule base that leadsto good results during one generation might fail com-pletely in the next generation if the partner site changesits behavior completely, too. The aspired robustness inchanging environments will be achieved by additionalrefinements of the concept in the next sections.

6.1 Results for Training Sequences

The training results are listed in Table 2; gray-shadedlines indicate the evolved site while the other linesindicate the static site as described above. As expected,the optimization leads to significant improvements of theAWRT in all examined setups.

This results in larger AWRT for the partner site thatdoes not adapt its behavior. For instance in Setup I,the AWRT improves by 86.47% compared to FCFS, seeTable 1, while the AWRT for the CTC worsens for almost10%. Note that this corresponds to a strong shift ofwork as the Squashed Area (∆SA) is 48.56% lower forthe KTH and approximately 12% higher on the CTCsite. However, when the focus is changed, see Setup II,and CTC is optimized we achieve also improvements of5.44% for AWRT and slight load relief for the CTC site.Besides that, the AWRT is still significantly improved inSetup II although we do not focus on the KTH. Thisis due to the worse performance in the non-cooperativecase and indicates that Grid computing for this site isvery advantageous.

Furthermore, in Setup III and IV the small KTHinteracts with the very large SDSC05 compute centerand naturally the KTH benefits from more availableresources. It is remarkable that also the SDSC05 canimprove its AWRT for more than 3%, see Setup IV. Atthe same time, the Squashed Area is slightly increasedwhich indicated that an improvement in AWRT is notnecessarily caused by smaller utilization.

When the CTC interacts with a large compute center,see Setup V, the CTC also strongly benefits as its AWRTis decreased by more than 20%. Likewise, the SDSC05can benefits from the cooperation with a medium sizecompute installation like the CTC, see Setup VI.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 10

Setup Site AWRTk Uk ∆AWRTk ∆Uk ∆SAk ∆Cmax,k

I KTH-5 66100.77 sec. 35.33% 86.47% -45.51% -48.56% 5.60%CTC-5 63534.29 sec. 71.40% -9.74% 12.01% 12.15% -0.14%

II KTH-5 62884.33 sec. 73.00% 87.12% 12.58% 6.40% 5.49%CTC-5 54745.53 sec. 62.70% 5.44% -1.64% -1.60% -0.05%

III KTH-5 59409.60 sec. 45.15% 87.84% -30.36% -34.04% 5.28%SDSC05-5 58799.15 sec. 47.34% -3.29% 3.06% 3.04% 0.01%

IV KTH-5 70561.62 sec. 53.39% 85.55% -17.66% -21.75% 4.97%SDSC05-5 54773.39 sec. 46.85% 3.78% 1.98% 1.94% 0.02%

V CTC-5 45997.73 sec. 58.55% 20.55% -8.14% -8.15% 0.01%SDSC05-5 57013.44 sec. 47.27% -0.16% 2.90% 2.91% -0.02%

VI CTC-5 57069.59 sec. 63.30% 1.43% -0.69% -0.51% -0.20%SDSC05-5 49916.01 sec. 46.04% 12.31% 0.22% 0.18% 0.02%

TABLE 2Results for the pair-wise rule base training. The gray shaded rows indicated the optimized rule base.

(a) KTH vs. CTC (b) KTH vs. SDSC05 (c) CTC vs. KTH

(d) CTC vs. SDSC05 (e) SDSC05 vs. KTH (f) SDSC05 vs. CTC

Fig. 5. Set of characteristic curves after optimization for the different setups. The system state description is depictedon the x- and y-axis for NJP and NWP respectively. The grid at level zero on the z-axis marks the border betweenacceptance of jobs (>0) and refusal of jobs (≤0). Additionally, the activated areas during application on the trainingworkloads are depicted by black points.

In order to understand the optimized transfer behaviorin more detail, we show in Figure 5 the set of character-istic curves for the different Grid pairs. There, the systemstates are shown on the x- and y-axis for NJP and NWPrespectively. The border between acceptance of jobs (>0)and refusal of jobs (≤0) is marked by a grid at levelzero on the z-axis. In Figure 5(a) and Figure 5(b) onecan see that the small KTH site has a quite restrictiveexchange behavior as only small jobs are accepted andalmost all remote jobs are declined. As local and remotejobs are treated similarly, such a behavior tries to offerall jobs to remote sites first. The black dots indicate theareas in the decision space that are activated during thecontroller application. It becomes obvious that mainlysmall jobs are accepted. In Figure 5(b) also larger jobs

from the SDSC05 are offered as almost the whole rangefor NJP is activated. However, the exchange policy forthe KTH refuses them in any situation.

The situation is different for the CTC that does notonly accept its own large jobs but also remote jobs thatrequire many processors. Here, the small jobs are offeredto the remote site first, see Figure 5(c). Finally, the largecompute center, see Figure 5(e) and (f), accepts alwaysany kind of jobs as long as the waiting queue is nottoo full. If the NWP is larger that 4 the transfer policyswitches from acceptance to refusal. This is reasonable asis prevents the starvation of jobs in the queue and protectthe Grid participant for overuse of resources. Summa-rizing it becomes apparent from these figures that jobexchange among grid participants is only beneficial if

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 11

the partners show some sort of cooperation; that is theyhave to accept also alien jobs in suitable situations.

6.2 Robustness of Trained Rule SetsTo test the robustness of the pair-wise learned rule bases,we apply them to the 6 month workloads within thesame setups. To this end, only two site Grids are con-sidered and every partner applies its egoistically learnedrule base. Note that AWF is not used in these scenariosanymore.

In Figure 6, the changes in AWRT and SA are depictedwhen both partners apply their learned rule bases topreviously unknown job submissions.

-10.00

-5.00

0.00

5.00

10.00

15.00

20.00

25.00

30.00

35.00

40.00

Imp

rov

emen

tsin

%

AWRT-RB

SA

KTH-6 CTC-6

KTH-6 SDSC05-6

SDSC05-6CTC-6

Fig. 6. AWRT and U improvements for the optimized ruleson non-trained data sets.

Obviously, the Evolutionary Fuzzy Systems still de-crease the AWRT significantly in all cases. This indicatesa high robustness with respect to submission changes.

15,00

20,00

25,00

30,00

35,00

40,00

45,00

rove

men

ts in

%

AWRT for OPTAWRT for RBAWRT for AWF

-5,00

0,00

5,00

10,00

15,00

20,00

25,00

30,00

35,00

40,00

45,00

Impr

ovem

ents

in %

AWRT for OPTAWRT for RBAWRT for AWF

KTH-6 CTC-6 KTH-6 SDSC05-6 CTC-6 SDSC05-6

Fig. 7. AWRT and U improvements for the rule basedapproach (RB) and the standard exchange policy (AWF),respectively.

Further, we show in Figure 7 the AWRT improve-ments in comparison to the AWF transfer policy andadditionally to near optimal solutions. Although AWFis a quite naive exchange method it is sufficient for thetraining purpose. However, for the matter of comparison

it is more meaningful to compare our approach againstthe best achievable solutions. Unfortunately, there ex-ists no such algorithm that can generate optimal solu-tions under the given information restrictions. Thus, weapply an request based exchange method (OPT) thatactually relaxes all information restrictions and allowsalso backfilling of jobs in the queues. This method wasdetailed and extensively evaluated by Grimme et al. [15]and may serve—by neglecting the given informationrestrictions—as best achievable solution.

In the figure it becomes obvious that the rule basedapproach achieves results which are approximately halfas good as the optimal results (OPT). This shows thateven under the strong information restriction the pro-posed algorithm does not perform too bad as the optimalresults would be never achievable under these circum-stances. Although AWF performs good for KTH andleads to slight improvements for SDSC05, it completelyfails for the CTC workload trace. However, the rulebased transfer policy outperforms AWF in all cases andleads even to shorter AWRT values for the SDSC05together with the much smaller KTH.

7 COPING WITH MORE THAN ONE PARTNER

After setting up the basic rule sets for job exchangein a controlled environment with a single partner, wenow focus on the applicability of the rule bases ina Grid scenario with more participants. To this end,KTH, CTC, and SDSC05 are combined and the unknownsubmissions from the remaining six month of the tracesare used. This time, however, a location policy needs tobe applied in order to prioritize the options of deliveringjobs to another participant.

7.1 An AWRT-based Location PolicyIn order to create a prioritization of the available poten-tial delegation targets we follow a two-step approach.As first step, we generate the subset of sites that in totalprovide enough machines to execute the job. That is, wesort out all sites with mk < mj ∀ k ∈ K.

As second step the generated subset is sorted accord-ing to their former achievement with respect to a dele-gation source. Good achievements can be measured byshort AWRT of jobs on the corresponding sites. For thehere proposed location policy, a site calculates the AWRTmetric with respect to every exchange partner. To thisend, only jobs are considered that have been deliveredto the corresponding partner. The AWRT indicates howlong the delegation source had to wait for the completionof its delivered jobs in the past. This metric is based onthe assumption that a short AWRT for delivered jobs inthe past is expected to yield also short AWRT values forfuture delegated job.

7.2 Results for Multiple PartnersThe results in Figure 8 clearly indicate that the AWRTis still significantly improved for all sites while the

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 12

utilization decreases for the small partner. However,although the CTC and SDSC05 are slightly more utilizedit does again improve their objective values.

8 COPING WITH ALIEN PARTNERS

Until now, our learning approach was suited to generatea pool of rule bases for partners that are known inadvance. This, however, requires knowledge about thesubmitted workload in order to tune the transfer policies.With respect to the robustness requirement, we thereforeextend our approach to being able to perform well inan environment with previously unknown Grid partici-pants. This requires the automatic adjusting of transferbehavior to partners that were not part of a trainingscenario.

8.1 Selection of Rule BaseAs mentioned in Section 3.2, the rule based transferpolicy is applied to each partner site separately. If anew partner arises a transfer policy has to be selectedfrom the pool of all learned transfer policies. To identifythe best suitable transfer policy we assume a correlationbetween delegation targets’ maximum amount of avail-able resources and their transfer behavior. We conjecturethat the behavior within the grid mainly depends on asite’s resource number. Thus, we categorize the varioustrained rule bases by the machines sizes they belong to.Among the whole trained pool of transfer policies thebest fitting one, with respect to the number of maximumavailable resources, is selected to make the decision fora submitted job.

8.2 Results for Alien PartnersFinally, we investigate the performance of the rule baseselection concept and add the SDSC00 as a site withmk = 128 processors to the Grid. Following the rule baseselection concept, every site uses the KTH learned rulebase for the interaction with SDSC00 as it has the greatestsimilarity with respect to the machine size. The SDSC00site, in turn, uses AWF for exchange purpose.

In Figure 9 the results for interaction with CTCand SDSC05 are depicted and again we observestrong AWRT improvements. Similarly to the KTH, alsoSDSC00 shows a poor performance for exclusive singlesite execution. Therefore, there is a high potential toimprove the AWRT. However, it is important to see thatnot only this partner can improve its AWRT but alsoother participants are able to improve their AWRT for atleast 8%.

The same holds for a four-site grid with one unknownpartner, which results are presented in Figure 10. Again,AWRT improvements are achieved for all site for at least10% while the utilization decreases on the smaller sitesand increases on the larger site.

Summarizing, the learned Evolutionary Fuzzy Sys-tems realize a beneficial job exchange for several co-operative computing environments. The examined Grid

-25.00

-15.00

-5.00

5.00

15.00

25.00

35.00

45.00

Imp

rovem

ents

in%

AWRT

SA

KTH-6 CTC-6 SDSC05-6

Fig. 8. AWRT and U improvements for the optimized rulesin a three-site scenario on non-trained data sets.

-15.00

5.00

25.00

45.00

65.00

85.00

Imp

rovem

en

tsin

%

AWRT

SA

SDSC00-6

CTC-6 SDSC05-6

Fig. 9. AWRT and U improvements for optimized rulesin a three-site scenario with non-trained data sets andSDSC00 as unknown participant.

-20.00

0.00

20.00

40.00

60.00

80.00

Imp

rovem

en

tsin

%

AWRT

SA

KTH-6

SDSC00-6 CTC-6 SDSC05-6

Fig. 10. AWRT and U improvements for optimized rulesin a four-site scenario with non-trained data sets andSDSC00 as unknown participant.

sizes range from two to four sites and include unknownjob submissions as well as previously unknown Gridparticipant. In all cases, the AWRT can be significantlydecreased which results in improvements of about 10%-

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 13

20% for large sites and 40%-80% for larger sites. Ithas been shown, that the job exchange policies show astrong robustness with respect to both new sort of jobsubmissions and environmental changes.

9 RELATED WORK

Although decentralized scenarios are supposed to bewidely-used in future, todays development of Grid in-frastructure focuses on the application of centralizedGrid scheduling services [1] and yields only limitedinnovation regarding efficient algorithms for resourceallocation. Ernemann et al. [16] point out advantagesof hierarchical scheduling in general by consideringthe AWRT objective. In another work, they propose anadaptive algorithm which decides based on the overallsystem state whether to split up a job to execute iton different Grid sites. Further, Kurowski et al. [17]identify multiple objectives for efficient job schedulingin Grids and propose a strategy based on predictionmechanisms and resource reservation. Further Hongand Prasanna [18], model a heterogeneous grid as agraph, which is capable of representing an arbitrarynetwork topology. They focuses on the maximization ofthe steady state throughput of such systems and showthat the throughput maximization problem can be solvedthrough a linear programming formulation. Recently,Iosup et al. [19] proposed a delegated matchmakingmethod, which temporarily binds resources from remotesites to the local environment. As such, they strive torealize also Grid-interaction on the Meta-Level, but theyassume a hybrid structure and augment a hierarchicalstructure by P2P architectures.

Of course, due to the highly dynamical character ofthe problem, nature inspired techniques for adaptivescheduling have also been proposed during the lastyears: besides GA-driven Meta-schedulers [20] hybridheuristics have been applied [21].

For the decentralized scenario, only few results thatsupport the application of job delegation in Grids havebeen published so far. England and Weissman [22] givean estimation of benefits and costs but base their resultson synthetic workloads and review the average slow-down only. A more real-world centered investigationhas been made by Hamscher et al. [23]. They applydifferent job sharing strategies to two real workloadsand analyze the resulting behavior. However, they giveno quantification of the actual benefit compared to nonGrid-cooperative set-ups. Grimme et al. [24] analyze theprospects of collaborative job sharing in Grids. Theycompare their results to the non-cooperative scenarioof the same machines but do not give a quantitativeestimation of possible collaborative benefits. A follow-upto this initial work [25] focuses on the maximum achiev-able improvements for the AWRT objective consider-ing Grid participants with mutually conflicting perfor-mance goals. They consider the job exchange problem inGrids as multi-objective optimization where the resulting

Pareto front shows high potential for achievable AWRTimprovements within Grids, motivating the design ofmore advanced exchange policies.

Fuzzy Systems have only been partially applied toscheduling in Computational Grid. Huang et al. [26] de-termine a knowledge base for Grid computing by trans-forming Grid monitoring data into a performance dataset. They extract the association patterns of performancedata through fuzzy association rule in order to optimizethe Grid scheduling. This approach however requiresthe availability of various monitoring data and doesnot work within information restricted environments.Further, Fayad et al. [27] address a Grid schedulingproblem where processing times underlay uncertainty.They use Fuzzy sets to model the uncertain processingtimes and apply tabu search to maximize the numberof scheduled jobs. Their approach is evaluated withworkloads recorded at real-world installations. However,they consider sequential jobs only, i.e. non-parallel jobssee Section 2.1, and assume artificial due dates that haveto be met. Further, they allow direct access of the Gridscheduler to local resources.

The capabilities of Evolutionary Fuzzy Systems haveonly recently been applied to scheduling. So far—atleast to the authors’ knowledge—no endeavors havebeen made to apply them to Grid scheduling problems.However, some promising results have been obtained forparallel computer scheduling with Evolutionary FuzzySystems: Franke et al. [28] derived a methodology togenetically learn a Fuzzy System from real workloaddata to establish user group prioritizations. With such asystem, the owner of a parallel computer is able to flexi-bly realize prioritizations for certain user groups withoutworsening the overall system utilization. After a trainingphase, the controller realizes the desired priorities whilestill delivering significantly shorter response times forthe favored user groups.

10 CONCLUSION AND FUTURE WORKWe presented an Evolutionary Fuzzy System approachto finding non-invasive, situation-adaptive, and robustalgorithms for workload distribution in decentralizedComputational Grids. Such environments assume fullautonomy of the participating HPC/HTC centers andstrict confidentiality of dynamic system information anddemand Grid middlewares that do not interfere with therunning LRMS.

In our model, we introduced a decoupled GRMS layeron top of the available systems, which decides uponexecution on the local system or delegation to a remotesite for user-submitted jobs in an online, non-clairvoyantmanner. The decision mechanism is established by usinga Fuzzy controller system with flexible rule sets that areoptimized using evolutionary computation, using a pair-wise training approach and performance metric-basedrule base selection.

The presented system shows that—using real-worlddata—it is possible to establish job exchange policies

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 14

which lead to significantly improved performance forall user communities in terms of response time andutilization. We further find that our approach behavesrobustly with respect to fluctuations in the workloadpattern and shows situational adaptiveness even undercircumstances of unknown submission characteristics.Overall, we think that the derived controllers providea stable basis for workload distribution and interchangein Computational Grids, and may qualify as a promis-ing technology for future Service Grid-based e-Scienceinfrastructures.

For future work it is essential to consider a morefine grained machine an Grid environment model. Forexample, it is necessary to include also data intensiveapplications. The here proposed method can be extendedto data-aware scheduling where the replication of theaccessibility of data is considered as well. Further, thetraining of the system should be made online, duringscheduling in form of an self-learning system. Learningscheduling decision with machine learning techniqueswould lead to a zero-configuration scheduling systemwhich can be easily integrated into existing middelwaresolutions.

REFERENCES[1] F. Gagliardi, B. Jones, F. Grey, M.-E. Begin, and M. Heikkurinen,

“Building an infrastructure for scientific grid computing: statusand goals of the egee project,” Philosophical transactions. Series A,Mathematical, physical, and engineering sciences, vol. 363, no. 1833,pp. 1729–1742, 2005.

[2] C. Franke, F. Hoffmann, J. Lepping, and U. Schwiegelshohn, “De-velopment of Scheduling Strategies with Genetic Fuzzy Systems,”Applied Soft Computing, vol. 8, no. 1, pp. 706–721, January 2008.

[3] O. Cordon, F. Herrera, F. Hoffmann, and L. Magdalena, “Evo-lutionary Tuning and Learning of Fuzzy Knowledge Bases,” inGENETIC FUZZY SYSTEMS, ser. Advances in Fuzzy Systems -Applications and Theory. World Scientific, Singapore, July 2001,vol. 19.

[4] D. G. Feitelson and B. Nitzberg, “Job characteristics of a produc-tion parallel scientific workload on the NASA ames iPSC/860,” inProceedings of the 1st Job Scheduling Strategies for Parallel Processing,ser. Lecture Notes in Computer Science (LNCS), D. G. Feitelsonand L. Rudolph, Eds., vol. 949. Springer, 1995, pp. 337–360.

[5] H.-P. Schwefel, Evolution and Optimum Seeking. John Wiley &Sons, New York, 1995.

[6] D. C. Marinescu, L. Boloni, R. Hao, and K. koo Jun, “An alterna-tive model for scheduling on a computational grid,” in Proceedingsof ISCIS’98, the Thirteenth International Symposium on Computer andInformation Sciences, Antalya. IOP Press, 1998, pp. 473–480.

[7] Y.-S. Kee, H. Casanova, and A. A. Chien, “Realistic modeling andsvnthesis of resources for computational grids,” in Proceedings ofthe 2004 Conference on High Performance Networking and Computing,2004, p. 54.

[8] U. Schwiegelshohn, A. Tchernykh, and R. Yahyapour, “Onlinescheduling in grids,” in 22nd IEEE International Parallel and Dis-tributed Processing Symposium (IPDPS 2008). IEEE Press, April2008, cD-ROM.

[9] U. Schwiegelshohn and R. Yahyapour, “Fairness in parallel jobscheduling,” Journal of Scheduling, vol. 3, no. 5, pp. 297–320, 2000.

[10] T. Takagi and M. Sugeno, “Fuzzy Identification of Systems andIts Applications to Modeling and Control,” IEEE Transactions onSystems, Man, and Cybernetics, vol. SMC-15, no. 1, pp. 116–132,1985.

[11] C.-F. Juang, J.-Y. Lin, and C.-T. Lin, “Genetic Reinforcement Learn-ing through Symbiotic Evolution for Fuzzy Controller Design,”IEEE Transactions on System, Man and Cybernetics, vol. 30, no. 2,pp. 290–302, April 2000.

[12] Y. Jin, W. von Seelen, and B. Sendhoff, “On Generating FC3

Fuzzy Rule Systems from Data Using Evolution Strategies,” IEEETransactions on System, Man and Cybernetics, vol. 29, no. 6, pp. 829–845, December 1999.

[13] C. Franke, J. Lepping, and U. Schwiegelshohn, “Genetic FuzzySystems applied to Online Job Scheduling,” in Proceedings of the2007 IEEE International Conference on Fuzzy Systems. London: IEEEPress, June 2007, pp. 1573–1578.

[14] C. Ernemann, V. Hamscher, and R. Yahyapour, “Benefits of globalgrid computing for job scheduling,” in Proceedings of the FifthIEEE/ACM International Workshop on Grid Computing (GRID’04).IEEE Computer Society, 2004, pp. 374–379.

[15] C. Grimme, J. Lepping, and A. Papaspyrou, “Benefits of JobExchange Between Autonomous Sites in Decentralized Computa-tional Grids,” in Proccedings of the 8th IEEE International Symposiumon Cluster Computing and the Grid (CCGrid), T. Priol, L. Lefevre,and R. Buyya, Eds. IEEE Computer Society, May 2008, pp. 25–32.

[16] C. Ernemann, V. Hamscher, U. Schwiegelshohn, A. Streit, andR. Yahyapour, “On advantages of grid computing for paralleljob scheduling,” in Proceedings of the 2nd IEEE/ACM InternationalSymposium on Cluster Computing and the Grid (CCGRID02). IEEEPress, May 2002, pp. 39–46.

[17] K. Kurowski, J. Nabrzski, A. Oleksiak, and J. Weglarz, “Schedul-ing Jobs on the Grid - Multicriteria Approach,” ComputationalMethods in Science and Technology, vol. 12, no. 2, pp. 123–138, 2006.

[18] B. Hong and V. K. Prasanna, “Bandwidth-aware resource alloca-tion for heterogeneous computing systems to maximize through-put,” in International Conference on Parallel Processing (ICPP 2003).IEEE Computer Society, 2003, pp. 539–546.

[19] A. Iosup, T. Tannenbaum, M. Farrellee, D. Epema, and M. Livny,“Inter-operating grids through delegated matchmaking,” ScientificProgramming, vol. 16, no. 2-3, pp. 233–253, 2008.

[20] J. Carretero, F. Xhafa, and A. Abraham, “Genetic algorithm basedschedulers for grid computing systems,” International Journal ofInnovative Computing, Information and Control, vol. 3, no. 6, pp.1–19, 2007.

[21] W. Jakob, A. Quinte, K.-U. Stucky, and W. Suss, “OptimisedScheduling of Grid Resources Using Hybrid Evolutionary Algo-rithms,” in Proceedings of the 6th International Conference on ParallelProcessing and Applied Mathematics, ser. Lecture Notes in ComputerScience (LNCS), R. Wyrzykowski et al., Eds., no. 3911, 2005, pp.406–413.

[22] D. England and J. B. Weissman, “Cost and Benefits of LoadSharing in the Computational Grid,” in Proceedings of the 10thJob Scheduling Strategies for Parallel Processing, ser. Lecture Notesin Computer Science (LNCS), D. G. Feitelson, L. Rudolph, andU. Schwiegelshohn, Eds., vol. 3277. Springer, 2004, pp. 160–175.

[23] V. Hamscher, U. Schwiegelshohn, A. Streit, and R. Yahyapour,“Evaluation of job-scheduling strategies for grid computing,” inProceedings of the 7th International Conference on High PerformanceComputing (HiPC00), ser. Lecture Notes in Computer Science(LNCS), R. Buyya and M. Baker, Eds., vol. 1971. Bangalore,Indien: Springer, 2000, pp. 191–202.

[24] C. Grimme, J. Lepping, and A. Papaspyrou, “Prospects of Collabo-ration between Compute Providers by means of Job Interchange,”in Proceedings of Job Scheduling Strategies for Parallel Processing, ser.Lecture Notes in Computer Science (LNCS), E. Frachtenberg andU. Schwiegelshohn, Eds., vol. 4942. Springer, June 2007, pp. 132–151.

[25] ——, “Discovering Performance Bounds for Grid Scheduling byusing Evolutionary Multiobjective Optimization,” in Prococeedingsof the Genetic and Evolutionary Computation Conference (GECCO2008), M. Keijzer et al., Eds., ACM. Atlanta, Georgia, USA: ACMPress, July 2008, pp. 1491–1498.

[26] J. Huang, H. Jin, X. Xie, and Q. Zhang, “An approach to gridscheduling optimization based on fuzzy association rule mining,”in Proceedings of the First International Conference on e-Science andGrid Computing. Washington, DC, USA: IEEE Computer Society,2005, pp. 189–195.

[27] C. Fayad, J. M. Garibaldi, and D. Ouelhadj, “Fuzzy grid schedul-ing using tabu search,” in International Conference on Fuzzy Sys-tems, FUZZ-IEEE 2007. IEEE Computer Society, 2007, pp. 1–6.

[28] C. Franke, J. Lepping, and U. Schwiegelshohn, “On advantagesof scheduling using genetic fuzzy systems,” in Proceedings of the12th Workshop on Job Scheduling Strategies for Parallel Processing(JSSPP), ser. Lecture Notes in Computer Science (LNCS), vol. 4376.Springer, June 2006, pp. 68–93.