Preference–Based Matchmaking of Grid Resources with CP–Nets

Noname manuscript No.(will be inserted by the editor)

Preference–based Matchmaking of Grid Resources

With CP–Nets

Massimo Cafaro · Maria Mirto ·

Giovanni Aloisio

Received: date / Accepted: date

Abstract We deal with the problem of preference-based matchmaking of com-putational resources belonging to a grid. We introduce CP–Nets, a recent de-velopment in the field of Artificial Intelligence, as a means to deal with user’spreferences in the context of grid scheduling. We discuss CP–Nets from a the-oretical perspective and then analyze, qualitatively and quantitatively, theirimpact on the matchmaking process, with the help of a grid simulator wedeveloped for this purpose. Many different experiments have been setup andcarried out, and we report here our main findings and the lessons learnt.

Keywords Grids · Matchmaking · CP–Nets

1 INTRODUCTION

Grid computing [16] emerged as a new paradigm distinguished from traditionaldistributed computing because of its focus on large-scale resource sharing andinnovative high-performance applications. The grid infrastructure ties togethera number of Virtual Organizations (VOs) [17], that reflect dynamic collectionsof individuals, institutions and computational resources.

A Grid Information Service (GIS) [12] aims at providing an informationrich environment to support service/resource discovery and decision making

M. CafaroUniversity of Salento, Lecce, ItalyCMCC - Euro-Mediterranean Centre for Climate Change, Lecce, ItalyE-mail: [email protected]

M. MirtoCMCC - Euro-Mediterranean Centre for Climate Change, Lecce, ItalyE-mail: [email protected]

G. AloisioUniversity of Salento, Lecce, ItalyCMCC - Euro-Mediterranean Centre for Climate Change, Lecce, ItalyE-mail: [email protected]

2 Massimo Cafaro et al.

processes. The main goal of grid environments is indeed the provision of flexi-ble, secure and coordinated resource sharing among VOs to tackle large-scalescientific problems, which in turn require addressing, besides other challeng-ing issues like authentication/authorization, access to remote data etc., ser-vice/resource discovery and management for scheduling and/or co-schedulingof resources.

Information thus plays a key role allowing, if exploited, high performanceexecution in grid environments: the use of manual or default/static configura-tions hinders application performance, whereas the availability of informationregarding the execution environment fosters design and implementation of so-called grid-aware applications.

Obviously, applications can react to changes in their execution environmentonly if these changes are somehow advertised. Therefore, self-adjusting, adap-tive applications are natural consumers of information produced in grid envi-ronments where distributed computational resources and services are sourcesand/or potential sinks of information, and the data produced can be static,semi-dynamic or fully dynamic [33]. Resource brokering services and gridschedulers also need to access this information for matchmaking available gridresources against a user’s request [15] [47].

The problem of matchmaking available resources in a grid environmentagainst a user’s request entails finding one or more (a pooled set of) resourcesthat best match the user’s request. A matchmaking service is in charge offinding the best match given the current status of the grid environments:indeed, the same request may result in different matchings under differentresource load, etc.

The input to the matchmaking service is a job description expressed ina specified formalism (e.g., the Job Submission Description Language [1], theJob Description Language [34], the Condor’s classified advertisements [39] etc.)containing constraints to be satisfied by resources to execute a batch, param-eter sweep or workflow job.

Our contribution is two-fold. First, we propose to extend the matchmak-ing process to take into account the user’s preferences (besides the usual con-straints) and, in order to deal with preferences, we suggest the use of Condi-tional Preference Networks (CP–Nets) [7], a powerful concept borrowed fromthe field of Artificial Intelligence that can be used to describe, structure andreason about user’s preferences. Second, we thoroughly analyze, both qualita-tively and quantitatively, the impact of CP–Nets on the matchmaking process.Our analysis takes into account both the resource broker (or grid scheduler)and the users’ perspectives, in order to assess the validity of our approach.

It is worth remarking here that our focus is not on the scheduling process :we limit ourself in this analysis to matchmaking only, i.e., to the problemof finding a matching set of resources taking into account user’s constraintsand preferences. Therefore, in what follows, we will not deal with the problemof scheduling grid resources using algorithms such as FCFS, SJF, backfillingetc to achieve, for instance, minimization of the makespan metric and we donot discuss grid scheduling systems such as GlideinWMS [37] etc. Instead, we

Preference–based Matchmaking of Grid Resources With CP–Nets 3

will utilize the simplest possible strategy: since our matchmaking algorithmreturns a set of resources ranked according to their matching degrees, we willsimply schedule the corresponding job on the first available resource. If thisresource is not available, we will try to schedule the job on the second resourceif available and so on, round-robin.

The rest of the paper is organized as follows. Section 2 introduces CP–Nets.Our matchmaking approach based on CP–Nets is presented in Section 3. Thisapproach is implemented in the grid simulator we used for our tests, which isdescribed in Section 4. We discuss the impact of CP–Nets on the schedulingprocess in Section 5, analyzing the results of several computer simulations. Wediscuss related work in Section 6, and draw our conclusions in Section 7.

2 CP–Nets

Conditional Preference Networks [6] address the problem of representing andreasoning with preferences over a multivariate domain; their broad applicabil-ity to many fields such as, for instance, design, planning and decision making,is related to the ability to succinctly specify and represent preference order-ings graphically. This is an extremely important feature, owing to the fact thatcorrespondingly explicit representations of preference orderings of multivariatedomains are exponential in the number of variables and thus unfeasible. Webegin by formally defining preference relations.

DEFINITION 1. Preference relation.

Given a set of variables V = v1, ..., vn and the outcome space O =Dom(v1)× ...×Dom(vn), a preference relation or ranking is a total preorderover O; if o1 o2 then the outcome o1 is equally or more preferred than o2.

Each variable vi (also known as attribute or feature) may assume a valuebelonging to Dom(vi) = vi

1, ..., vini

. The size of the set of outcomes O is thusexponential.

CP–Nets capture ceteris paribus (all else being equal) conditional prefer-ence statements, whose semantics is based on the notion of preferential inde-pendence.

DEFINITION 2. Preferential independence.

Let x denote an assignment of values to a set X ⊆ V and xy the concate-nation of two assignments to X and Y with X ∩ Y = ∅. A set of variables Xis preferentially independent of its complement Y = V −X iff

x1y1 x2y1 iff x1y2 x2y2 ∀x1x2y1y2

When the preferential independence relation holds, x1 is preferred to x2

ceteris paribus : fixing the values of all of the other variables, the preference


relation (over assignments to the set X) holds independently of the valuestaken by the other variables.

We are now ready to define Conditional preferential independence.

DEFINITION 3. Conditional preferential independence.

Given a partition of V such that V = X∪Y ∪Z, X and Y are conditionallypreferentially independent given z iff

x1y1z x2y1z iff x1y2z x2y2z ∀x1x2y1y2

In practice, X and Y are preferentially independent iff Z is assigned z. Ifthe relation holds for all possible assignments z then X and Y are conditionallypreferentially independent given Z.

The preference elicitation process requires that users specify for each vari-able x ∈ V the parent variable Parent(x) that can affect their preferencesover the values of x. The CP–Net graph is then constructed so that for eachnode x, Parent(x) is the immediate predecessor. A more general approach isto allow for Parent(x) to represent a set of vertices instead of a single vertex.On the basis of the particular assignment to the vertex Parent(x) the user isable to determine a specific preference order over Dom(x), the domain of thevariable x, all other things being equal. Thus, a CP–Net associates to eachassignment to Parent(x) a Conditional Preference Table (CPT).

DEFINITION 4. CP–Net.

A CP–Net is a directed graph G = (V,E). The set of vertices V =v1, ..., vn represents the CP–Net variables and E = (vi, vj) : vi, vj ∈ V isthe set of edges between variables. For each v ∈ V , the function Parent(v)returns the vertex v ∈ V such that (v, v) ∈ E. The CPT specifies a strictpartial order ≻i

u over Dom(xi) representing the conditional preference of theinstantiations of xi for a given instantiation u of Parent(xi).

Given the ceteris paribus preference statement “I prefer wine to beer withmy meal”, its interpretation is: given two identical meals, one with wine andone with beer, I prefer the former. The statement “I prefer red wine to whitewine with my meal, ceteris paribus, given that meat is served” is interpreted as:given two identical meals in which meat is served, I prefer red wine to whitewine. This tells us nothing about two identical meals in which meat is notserved. Ceteris Paribus preference statements induce independence relations,for instance, if my preference for wine depends on (and only on) the maincourse, then wine choice is conditionally preferentially independent of all othervariables given the main course value.

We now give an example describing how a CP–Net can be used to representthe following preference statements:

– I strictly prefer aix to linux as operating system;


– I prefer power processors if the operating system is aix, and xeon processorsif the operating system is linux;

– I prefer the EESL math library when using power processors and the NAGlibrary when using xeon processors.

Let the variables a, l, p, x, e and n represent respectively a preference foraix, linux, power processor, xeon processor, EESL and NAG math library.

The first preference statement is unconditional; the other ones are condi-tional. Fig. 1a shows the CP–Net related to the previous example. Each noderepresents a domain variable, and the immediate parents Parent(v) of a vari-able v in the network are those variables that affect the user’s preference overthe values of v. Associated to each node, there is a Conditional PreferenceTable (CPT) which provides an ordering over the values of the node for eachpossible parent’s context. Fig. 1b shows the corresponding Conditional Pref-erence Graph, in which l ∧ p ∧ n represents the worst outcome and a ∧ p ∧ e

the best one.

As discussed in [7], any acyclic CP–Net defines a consistent partial or-der over the outcome space; given a CP–Net N and two possible outcomesx and y, a dominance query asks whether or not x y is a consequence ofthe preferences of N . When the Conditional Preference Graph is a DAG (Di-rected Acyclic Graph), it can be shown that, in order to answer the query,a simple polynomial time sweep algorithm only needs to search for a flip-ping sequence (path) from the less preferred outcome y through a series ofmore preferred outcomes, to the more preferred outcome x, where each valueflip in the sequence is sanctioned by the network N . The time complexity ofthe flipping-sequence search over binary–valued, DAG–structured CP–Nets isO(n2), where n is the number of variables in the CP–Net [6]. For instance, thedominance query a∧ x∧ n l ∧ p∧ e can be shown to be true in the exampleof Fig. 1b owing to the fact that there exists a path (sequence of improvingflips) from one assignment to another (flipping sequence). This is a proof thatthe latter assignment is preferred to the former.

Given a CP–Net N , generating an optimal outcome is even easier: thisrequires sweeping through the network from top (ancestor vertices) to bottom(descendent vertices) setting each variable to its most preferred value given theinstantiation of its parents. Even if the network does not, in general, determinea unique ranking, however it determines a unique best outcome (assuming noindifference). Therefore, outcome optimization queries can be answered usingthe outlined forward sweep algorithm, whose complexity is O(n), so that it islinear in the number n of variables [6].

Finally, CP–Nets also allow expressing relative importance relations. Theseexpress the fact that one variable’s value is more important than another’s;moreover, CP–Nets induce implicit importance relations between nodes andtheir descendants. As an example, one could say that Processor type is moreimportant to me than operating system (all else being equal). If it is moreimportant to me that the value of x be high than the value of y be high, then


x is more important than y. The notation to express this relative importancerelation is x ⊲ y.

A variable may be conditionally more important than another. For in-stance, one could say the operating system is more important than processortype (all else being equal), if the workstation is used primarily for graphicalapplications. Given z ∈ Dom(Z), if it is more important to me that the valueof x be high than the value of y be high, then x is conditionally more importantthan y. The corresponding notation is x ⊲z y.

a ≻ l

a : p ≻ x

l : x ≻ p

p :e ≻ n

x :n ≻ e

Operating System

CPU

Math Library

(a) CP–Net

∧ p∧ n

∧ p∧ e

l ∧ x ∧ e

l ∧ x ∧ n

∧ x ∧ n

∧ p∧ n

∧ x ∧ e

∧ p∧ e

(b) Corresponding Conditional Pref-erence Graph

Fig. 1: CP–Net and corresponding Conditional Preference Graph

3 CONDITIONAL PREFERENCE MATCHMAKING

In this Section we briefly review our approach to matchmaking available re-sources in a grid environment against a user’s request. Matchmaking availableresources in a grid environment against a user’s request entails finding one ormore (a pooled set of) resources that best match the user’s request. Our match-


making service is in charge of finding the best match given the current statusof the grid environments, the user’s constraints and preferences: indeed, thesame request may result in different matchings under different resource load,etc. The first input to the matchmaking service is a job description expressedin a specified formalism (e.g., the Job Submission Description Language [1],the Job Description Language [34], the Condor’s classified advertisements [39]etc.) containing constraints to be satisfied by resources to execute a batch,parameter sweep or workflow job (e.g. the machine’s memory must be at least16 GB, the processor must be AMD etc). The second input are the user’spreferences w.r.t. the resources to be used for the execution of his/her job. Inour simulator, both the constraint and the preferences are expressed using asimple XML dialect, as shown in Section 4.4.

Algorithm 3.1: Conditional Preference Matchmaking algorithmInput: J , a job to be scheduled; C, set of user’s constraints; P , set of user’s

preferencesOutput: Job J is scheduled on the resource that best matches the user’s request if

it exists/* Query a GIS using the constraints in C */

1 RMC ← Query-GIS(C) ;/* RMC is the set of resources matching the constraints */

2 if card(RMC) == 0 then

/* no resource matches the constraints in C */

3 return No match ;

4 else

5 build the CP–Net graph using the preferences in P ;/* run the linear time outcome optimization query on the CP-Net */

6 RMPC ← CP-Net(RMC, P) ;/* RMPC is the set of resources matching both the preferences and the

constraints */

7 if card(RMPC) == 0 then

/* no resource matches the preferences in P */

/* so scheduling may use any heuristic */

8 schedule job J on one of the available resources in RMC ;

9 else

10 F ← Filter(RMPC) ;/* F is a subset of resources in RMPC matching the preferences in P and

the constraints in C, totally ordered from best to worst w.r.t.

preferences */

11 schedule job J on the first available resource in F ;

Algorithm 3.1 describes the steps to achieve Conditional Preference Match-making. We start querying a Grid Information Service (GIS) using the con-straints in C to determine RMC, a set of resources matching the constraintson which the user’s job may run (step 1). If RMC is empty, no machine on thegrid actually satisfies the user’s constraint, so that the scheduler discards thejob and informs the user (steps 2–3). Otherwise, in steps 4–11 we deal withthe resources in RMC.


We begin by building the CP–Net graph related to the preferences P instep 5. Then, we run the linear time outcome optimization query on the CP–Net in step 6, determining RMPC ⊂ RMC, a subset of resources matchingboth the user’s preferences and costraints. If RMPC is empty, the CP–Netalgorithm did not find any resource in RMC satisfying the user’s preferences.Hence, we schedule the job on one of the machines belonging to RMC (steps7–8). This can be done using, for instance, an heuristic.

When RMPC is not empty, steps 9–11 return a set of resources suitablefor job execution. Since RMPC may contain many resources, we select F ⊂RMPC, a fraction of these resources in step 10. For instance, we select andreturn the initial 30% of the machines returned in RMPC, ordered from bestto worst w.r.t. preferences. The job is scheduled on the first machine availablein F . If a machine is not available, the scheduler will try scheduling the jobon the next one round robin until it succeeds or a timeout elapses.

We now analyze the computational complexity of algorithm 3.1. We denoteby T (R,C, P ) the time required to determine a feasible set of resources ona grid consisting of R resources taking into account the constraints C andthe preferences P , by H(RMC) the time to execute an heuristic on the setof resources RMC and by CP–Net(RMC,P ) the time to execute the CP–Net outcome optimization query on RMC and P . We have T (R,C, P ) =Max(H(RMC), CP–Net(RMC,P )).

Indeed, the time to query a GIS (we have an explicit query in step 1, andimplicit queries in steps 8 and 11) is usually negligible w.r.t. the time used forscheduling the job. The CP–Net outcome optimization query requires in theworst case time linear in the input size n (number of preferences), i.e. O(n).Therefore, T (R,C, P ) = Ω(|RMC|).

Since analyzing all of the resources in step 8 using an heuristic requires atleast time linear in the number of resources in RMC, we conclude that theoverall time complexity depends on the actual number of steps executed foreach resource. As an example, if no more than O(1) steps are executed on eachresource in RMC, then the overall complexity of the algorithm is linear in thenumber of resources in RMC.

4 A Simulator for Grid Matchmaking and Scheduling

Efficient and effective scheduling is very important in grid computing environ-ments as shown in [25], [26], [13]. The problem can be addressed consideringexperimental or simulation approaches. Validating the performance of gridscheduling strategies in a real production environment should be the idealscenario but cannot be feasibly carried out. The complexity of production sys-tems, dynamism of grid execution environments and the difficulty to reproduceexperiments, make scheduling in production systems a complex research envi-ronment. So, given the difficulties tied to the experimental approach, simula-tion is the most flexible and viable way of evaluating different Grid scheduling


algorithms as well as other design issues, although some simplifications andassumptions are made.

In order to develop and evaluate new grid scheduling algorithms it is fun-damental the use of simulators in order to address performance evaluationstudies, considering possible constraints and preferences given by the users.On the other hand, it is worth nothing here that the evaluation of the perfor-mances obtained with a simulator is just a first step, that must be followed bythe setup of a good testing environment with representative workload tracesto produce dependable results. Computer simulation is our approach for eval-uation of CP–Nets in the matchmaking of involved resources; therefore, a sim-ulator has been designed and implemented. Even though several simulatorshave already been developed e.g. Alea [20], Briks [44], ChicSim [36], Grench-Mark [19], GridNet [24], GridSim [8], MicroGrid [40], NSGrid, [46], G3S [32],OptorSim [9], SimGrid [10] and GSSIM [23], we decided to implement our ownsimulator for the following reasons:

– we did not need the ability to plug in several algorithms;– the majority of simulators do not provide C bindings (with the exception

of SimGrid);– in order to reduce software engineering costs and to maximize reuse we had

to exploit an already implemented code base;– the time required to install a simulator, read and understand its docu-

mentation and implement our algorithm according to the simulator’s ar-chitecture was much higher, with regard to the integration of our softwaremodules for our purposes.

Our simulations takes as parameters constraints and preferences, related toa set of information on resources (CPU, memory and storage usage), networklinks and applications, provided by the users (our scheduler’s clients) in anXML format. We begin describing first the workload data, then we presentthe architecture of the simulator and hence several implementation details.

4.1 Workload Data

The workload plays an important role in experimental performance evaluationof computer systems. Many studies have been conducted to design better andmore effective resource allocation schemes [13].

Using the simulation approach, a data set representative of the job inter–arrival times can be retrieved by using various statistical distributions such asUniform and Exponential distribution ([29], [31], [14]).

For describing job arrivals, we used statistical distributions such as Ex-ponential, Gaussian and Weibull. The first one, Exponential distribution, ismotivated by the need to simulate an incremental rate of task arrivals, thesecond one, Gaussian distribution, performs well even under the stress of mil-lions of tightly packed data points and finally the Weibull distribution’s failurerate is a power function of time, so that instantaneous failure rate at time t is


defined as the probability of failures between time t and t+ dt given that nofailure has occurred in the system until time t.

4.2 Simulator architecture

The simulator has a client-server architecture. The server starts initializinga number of child processes specified as a command line parameter. When aconnection request is issued by a client (service consumer), the server spawnsa new thread to serve the incoming request in a thread.

A submitted job contains constraints and preferences described in an XMLfile, described in Section 4.4. As shown in the Fig. 2, when the request issubmitted by a client, the server queues the request in the Arrivals Queue(managed by the process request thread). Hence the job is validated against aDocument Type Definition (DTD) document (validator thread).

If the validation is successful, then the job is queued in the Job Queue,handled with a FIFO (First In First Out) policy. The job status becomesJOB SUBMITTED and it is stored in an internal database with the cur-rent timestamp. Otherwise, if the XML file is not well formed or the clienthas submitted again a job that was submitted previously, then the valida-tor thread kills the job and the server deals with the following requests. Jobsqueued in the Job Queue are not scheduled immediately, owing to the fact thatthere is another queue, the Pending Queue, with higher priority. The latterqueue handles jobs that were submitted previously; the scheduler was not ableto schedule them before, taking into account the constraints and preferencescharacterizing them on the one hand, and the grid status on the other.

When a job is dequeued from the Job Queue, the scheduler queries theinternal database (which fullfills the role of a Grid Information Service) toretrieve RMC, the set of computational resources matching the constraintsand then processes through the CP–Net outcome optimization query the pref-erences as specified in the corresponding XML file, returning RMPC, the setof resources matching both the user’s preferences and constraints. If none ofthe resources satisfies the constraint query (|RMC| == 0), the client requestcannot be scheduled and hence the job is suppressed and its status is up-dated to JOB SUPPRESSED. If a set of resources matches the constraintsbut no resource satisfies the preferences, then one of the resources matchingthe constraints is chosen (|RMC| > 0 ∧ |RMPC| == 0). Finally, when a setof resources matches the constraints and the CP–Net algorithm returns a setRMPC including at least a resource, which is the best outcome with regardto the preferences (|RMC| > 0∧ |RMPC| > 0), the scheduler selects a subsetof resources in RMPC totally ordered from best to worst w.r.t. preferences,and tries to schedule the job on the first available resource (round robin).

Therefore, before executing the job, the scheduler queries again the databasebecause, owing to the the dynamic nature of the grid environment, it needs toverify if the selected resource still provides the required number of CPU cores


Start

XML

Data Generation

Workload data

Arrivals

Queue

(An)

Validation

Duplicated

job_id

Yes

Job not

scheduled

No

Yes

Job

Queue

(Jn)

No

Job

Evaluation

Use values of

previous queries

Constraint query

Preferences query

Query

Evaluation

Jn=An

Jn=Pn

Job is suppressed

con == 0

con > 0 Check grid

status

free storage >

requested

storage

free CPUs >

requested

CPUs

Yes

No

Job

Execution

Yes

No

Resources

availability

Yes

Job

Pending

(Pn)

No

End

Fig. 2: The simulator flow chart

etc; indeed, another scheduler’s thread may have submitted concurrently an-other job or rescheduled a job dequeued from the Pending Queue.

If this is the case, the job is queued in the Pending Queue (also handledwith FIFO policy). Since this queue has a higher priority with regard to theJob Queue, the scheduler will attempt to serve job requests belonging to thisqueue before the ones related to Job Queue. In turn, this provides a certaindegree of fault–tolerance. However, a job cannot be queued into the PendingQueue over and over again: a field of the database (maxPendingTimes (MPT))specifies a Time To Live (TTL), so that when this time limit is exceeded thejob is suppressed and its status updated to JOB TIMEOUT.

Fig. 3 depicts job states and transitions between states; available statesinclude:

– JOB PENDING: the job is waiting in the Arrivals or Pending Queue;– JOB SUBMITTED: the job is processed by the server;– JOB TIMEOUT: the time specified in the maxPendingTimes field has

elapsed without the scheduler being able to submit the job;– JOB SUPPRESSED: no resource was available for job scheduling;– JOB DONE: the job completed successfully;– JOB FAILED: job execution failed.


JOB_SUBMITTED

JOB_TIMEOUT

JOB_FAILED

JOB_PENDING

JOB_DONE

end

start

MPT < Max

JOB_SUPPRESSED

MPT >Max

Fig. 3: Job states

4.3 The information database

One of the aspects considered in the design of the simulator has been thedefinition of the database schema in order to provide a system for storing andaccessing the data with the usual CRUD operations. Therefore, we modeledthe information to be managed taking into account information about theavailable CPUs and cores, resources, nodes, applications, jobs and queues. Asnapshot of the relational schema is shown in Fig. 4). The “grid” databasecontains the following information:

– CPU entity:– serial number: CPU serial number (primary key of this entity);– node id: node identifier, represents a resource node containing the CPU;– hourly cost: hourly cost of the CPU;– frequency: clock frequency;– brand: CPU manufacturer;– cores: numbers of cores;– c int: integer performance value acquired through the SPEC bench-

mark;– c float: floating point performance value acquired through the SPEC

benchmark;– cache L1: cache size of level 1 of CPU;– cache L2: cache size of level 2 of CPU.

– Node entity:– id node: node identifier (primary key of this entity);– ram: RAM size;– os name: installed operating system;– os version: operating system version;– os load average: load average;– network interface: available network interface;– hostname resource: node’s hostname.

– Resource entity:– hostname: hostname identifying the resource (primary key of this en-

tity);


CPU

serial_number

hourly_costfrequencybrandcoresspec_intspec_floatcache_L1cache_L2node_id

node

id_node

ramos_nameos_versionos_load_averagenetwork_interfacehostname_resource

PK

FK1

application

name

typeexpected_workloadadditional_prefs

application_installed_resource

date_time

resource_hostnamename_application

resource

PK hostname

placestorage_typestorage_namestorage_brandstorage_spacestorage_free_spacebandwith_inbandwith_out

PK

FK1

queue

namehostname

typeprioritycpusfree_cpuspolicy

job

PK

FK1FK2FK3

id

typeSLAnameparent_idname_queuename_application

job status

PK

FK1

date_time

valueid_job

PK

PK

FK2FK1

PK1PK2

Fig. 4: Relational schema of the “grid” database

– place: resource’s location;– storage type: storage type of the resource;– storage name: storage name of the resource;– storage brand: storage brand of the resource;– storage space: size of space currently used;– storage free space: size of free space available;– bandwidth in: input network bandwidth;– bandwidth out: output network bandwidth.

– Application entity:– name application: application name (primary key of this entity);– type: application type (sequential, parallel);– expected workload: expected workload of the application;– additional prefs: possible preferences.

– Job entity:– id: job identifier (primary key of this entity);– type: job type (sequential, parallel);– SLA: Service Level Agreement associated to the job;– name: job name;


– parent id: identifier of parent job (required to support workflow appli-cations; NULL for a job without parent);

– name queue: name of resource queue in which the job has been queuedfor execution;

– name application: name of application to which the job refers.– Job Status entity:

– id job: job identifier (primary key of this entity);– date time: timestamp (date and hour) associated to the job status;– value: job status;

– Queue entity:– name: queue name (primary key of this entity, along with hostname resource);– hostname resource: hostname of the resource to which the queue be-

longs;– type: queue type (sequential, parallel)– priority: priority level (high, medium, low);– cpus: total number of CPUs handled;– free cpus: number of current CPUs available for job execution;– policy: management policy of the queue.

4.4 Job Description

The jobs executed during the simulation are represented by XML files con-taining both the constraints and preferences. These files are validated using asuitable schema. In particular, the Job tag, that represents the root element,must have three child tags: Parameters, Requirements and Preferences. TheParameters tag specifies the job executable, the command line arguments, thetype of jobs (serial or parallel) and the number of required CPUs. Require-ments contains constraints on the resources such as the type of CPUs, theamount of RAM, the operating system etc; Preferences contains the user’sdesiderata. A Preferences node may have a maxT node (maxterm) contain-ing a minT (minterm) node which must have a child node named depends.Indeed, preferences’ modeling is based on CP–Nets: a CP–Table associate toa CP–Net node may be expressed using standard forms of expressions such assum of products (minterms) or product of sums (maxterms) commonly usedin boolean algebra and Karnaugh maps. The following is the Document TypeDefinition for job description.

<!ELEMENT Job (Parameters,Requirements,Preferences)>

<!ELEMENT Parameters (Executable, Arguments, Type)>

<!ELEMENT Executable (#PCDATA)>

<!ELEMENT Arguments (#PCDATA)>

<!ELEMENT Type (#PCDATA)>

<!ELEMENT Requirements (CPU?, Node?, Resource?)>

<!ELEMENT CPU (brand?, cores?, frequency?, cache_L1?, cache_L2?, CINT?, CFP?, hourly_cost?)>

<!ELEMENT Node (ram?, os_name?, os_version?)>

<!ELEMENT Resource (hostname?, place?, bandwidth_in?, bandwidth_out?, storage_free_space?)>

<!ELEMENT Preferences (CPU?, Node?, Resource?)>

<!ELEMENT brand (maxT?)>

<!ELEMENT cores (maxT?)>

<!ELEMENT frequency (maxT?)>

<!ELEMENT cache_L1 (maxT?)>

<!ELEMENT cache_L2 (maxT?)>


<!ELEMENT CINT (maxT?)>

<!ELEMENT CFP (maxT?)>

<!ELEMENT hourly_cost (maxT?)>

<!ELEMENT ram (maxT?)>

<!ELEMENT os_name (maxT?)>

<!ELEMENT os_version (maxT?)>

<!ELEMENT hostname (maxT?)>

<!ELEMENT place (maxT?)>

<!ELEMENT bandwidth_in (maxT?)>

<!ELEMENT bandwidth_out (maxT?)>

<!ELEMENT storage_free_space (maxT?)>

<!ELEMENT maxT (minT+)>

<!ELEMENT minT (depends+)>

<!ELEMENT depends EMPTY>

<!ATTLIST Executable applicationName CDATA #REQUIRED expectedWorkload CDATA #REQUIRED>

<!ATTLIST Type nCpu CDATA #REQUIRED>

<!ATTLIST brand value CDATA #REQUIRED operator (equal) "equal">

<!ATTLIST cores value CDATA #REQUIRED operator (max|min) "min">

<!ATTLIST frequency value CDATA #REQUIRED operator (max|min) "min">

<!ATTLIST cache_L1 value CDATA #REQUIRED operator (max|min) "min">

<!ATTLIST cache_L2 value CDATA #REQUIRED operator (max|min) "min">

<!ATTLIST CINT value CDATA #REQUIRED operator (max|min) "min">

<!ATTLIST CFP value CDATA #REQUIRED operator (max|min) "min">

<!ATTLIST hourly_cost value CDATA #REQUIRED operator (max|min) "min">

<!ATTLIST ram value CDATA #REQUIRED operator (max|min) "min">

<!ATTLIST os_name value CDATA #REQUIRED operator (equal) "equal">

<!ATTLIST os_version value CDATA #REQUIRED operator (equal) "equal">

<!ATTLIST hostname value CDATA #REQUIRED operator (equal) "equal">

<!ATTLIST place value CDATA #REQUIRED operator (equal) "equal">

<!ATTLIST bandwidth_in value CDATA #REQUIRED operator (min) "min">

<!ATTLIST bandwidth_out value CDATA #REQUIRED operator (min) "min">

<!ATTLIST storage_free_space value CDATA #REQUIRED operator (min) "min">

<!ATTLIST maxT operation (and|or) "and">

<!ATTLIST minT operation (and|or) "and">

<!ATTLIST depends node (brand|cores|frequency|cache_L1|cache_L2|CINT|CFP|hourly_cost|ram|os_name|os_version|hostname|place|bandwidth_in|bandwidth_out|storage_free_space) "brand" denied (y|n) "n">

Here is an example of a job description file:<Job>

<Parameters>

<Executable applicationName="app_8" expectedWorkload="53.675">48</Executable>

<Arguments>arg4</Arguments>

<Type nCpu="8">PARALLEL</Type>

</Parameters>

<Requirements>

<CPU>

<frequency value="1.8" operator="min"/>

<hourly_cost value="2" operator="max"/>

</CPU>

<Node>

<ram value="4" operator="max"/>

<os_name value="Aix" operator="equal"/>

</Node>

</Requirements>

<Preferences>

<CPU>

<cores value="16" operator="min">

<maxT operation="or">

<minT operation="and">

<depends node="bandwidth_in" denied="n"/>

</minT>

</maxT>

</cores>

<hourly_cost value="1.5" operator="max"/>

</CPU>

<Resource>

<bandwidth_in value="12" operator="min">

<maxT operation="or">

<minT operation="and">

<depends node="hourly_cost" denied="y"/>

</minT>

</maxT>

</bandwidth_in>

</Resource>

</Preferences>

</Job>

This file describes the following requests of a user:

– Execution of a parallel job named ’48’, related to the ’app 8’ applicationwith at least 8 CPUs (constraint);

– RAM size must be at least 4 GB on each node (constraint);– The operating system must be Aix (constraint);– The CPU frequency must be greater than or equal to 1.8 Ghz (constraint);


– The hourly cost of the CPUs must be less than or equal to 2 dollars (con-traint);

– If possible, the hourly cost of the CPUs should be less than or equal to 1.5dollars (preference);

– If it is possible submit a job on a CPU with hourly cost under 1.5 dollars,preferably the resource should have a minimum input bandwidth of 12Mb/s (preference);

– If the previous requests can be satisfied, the job should be run on a CPUwith 16 cores (preference).

The following example shows how to express three preferences, each onedepending on the previous one; we omit details related to parameters andrequirements. In particular, the user prefers a CINT value (related to theCPU performance) that must be at least 42, and, if this preference holds, theuser prefers an AMD CPU; finally, if the CPU is an AMD one, the user prefersa level 2 cache size of at least 1 MBytes.

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE Job SYSTEM "gridsim.dtd">

<Job>

<Parameters> ...</Parameters>

<Requirements>...</Requirements>

<Preferences>

<CPU>

<brand value="AMD" operator="equal">

<maxT operation="and">

<minT operation="or">

<depends node="CINT" denied="n"/>

</minT>

</maxT>

</brand>

<cache_L2 value="1" operator="min">

<maxT operation="and">

<minT operation="or">

<depends node="brand" denied="n"/>

</minT>

</maxT>

</cache_L2>

<CINT value="42" operator="min"/>

</CPU>

</Preferences>

</Job>

5 IMPACT OF CP–Nets ON SCHEDULING

In this Section we present the experimental results we obtained. We begin bydescribing the experiments that have been carried out, which are characterizedby the following parameters:

– r, number of computational resources managed by the grid scheduler;– j, number of jobs submitted to the grid scheduler;– w, workload expressed as number of jobs already running on the grid;– e, boolean value indicating if the CP–Net algorithm is enabled or disabled;– a, number of applications to be simulated;– n, total number of nodes;– c, total number of CPUs.

We designed and carried out 38 different experiments which are character-ized by r ∈ 500, 1000, 2000, j ∈ 500, 1000, 2000, 100000,w ∈ 0, 500, 1000, 2000,


e ∈ TRUE,FALSE, a = 24. The first 36 experiments have been run sub-mitting up to 2,000 jobs on small, medium and large grids: for r = 500,n = 13, 567 and c = 365, 204; for r = 1, 000, n = 27, 766 and c = 729, 084;for r = 2, 000, n = 52, 642 and c = 1, 330, 148. The last two experiments havebeen run submitting 100,000 jobs on a large grid: r = 2, 000, n = 52, 737 andc = 1, 404, 214.

The hardware used in the first 36 experiments consists of three SMP (Sym-metric Multi-Processor) nodes configured with two Intel Itanium 2 single coreprocessors 1.4 Ghz with 1.5 MB level 3 cache and 4 GB of main memory. Oneof the nodes was dedicated to the execution of our grid simulator, another oneto the back-end PostgreSQL database and the last one was used to issue theuser’s requests to the grid simulator. In order to stress the simulator, all of therequests were issued concurrently. For the last two experiments, the hardwareused consists of three SMP nodes configured with two Intel Xeon E5520 dualcore processors 2.27 Ghz with 8 MB level 3 cache and 8 GB of main memory.

Tables 1–7 and Figures 5–14 summarize the results obtained. In these ta-bles, the total time for scheduling the jobs, the average time to schedule onejob and standard deviation are expressed in seconds. It’s worth clarifying herethat, since our focus is on matchmaking and not on scheduling, the impact ofCP–Nets on scheduling is measured by carrying out several couples of exper-iments, respectively enabling or disabling the CP–Net algorithm (parametere) in order to verify quantitatively the net effect of enabling it and to demon-strate that it is negligible; we schedule the jobs using the simplest round-robinstrategy applied to the set of resources returned by the CP–Nets algorithm,which are ranked according to their matching degrees; i.e., each job is sched-uled on the first available resource returned by the algorithm. Given that thisis the simplest possible scheduling strategy, our aim therefore is not to comparedifferent scheduling algorithms, but, rather, to determine experimentally thetime required to schedule all of the jobs, the average time and the standard de-viation to schedule a job and resource utilization when CP–Nets matchmakingis taken into account.

We begin discussing the results related to Table 1 and Figures 5–7. Thefigures are histograms in which we plot the distribution of data as frequencies.On the x and y axes we plot respectively the time in seconds to schedule a job(rounded to the nearest integer) and the number of jobs that required thattime to be scheduled.

For a grid consisting of 500 resources with no initial workload (Table 1and Figures 5– 7), enabling the CP–Net algorithm we observe as expected anincrease of the total time required to schedule the jobs. However, the rate ofincrease is not directly proportional to the number of submitted jobs. Whensubmitting 500 jobs, the rate of increase is 90.02%, on average it takes 5.05 sec-onds to schedule a job (versus 2.66 with CP–Nets disabled) and the standarddeviation is 8.47 seconds (versus 5.29). The rate of increase is only 28.59%when submitting 1000 jobs. Correspondingly, on average it takes 4.16 secondsto schedule a job (versus 3.24 with CP–Nets disabled) and the standard de-viation is 5.61 seconds (versus 5.76). Therefore, in this case the scheduling


process appears to be more uniform, with less dispersion around the meanvalue w.r.t. the same experiment in which CP–Nets are not enabled. Finally,when submitting 2000 jobs the rate of increase becomes 75.76%, the averagetime to schedule a jobs is 5.6 seconds (versus 3.19 with CP–Nets disabled)and the standard deviation is 6.57 seconds (versus 3.28). From these exper-imental results we conclude that for this grid increasing the submitted jobsleads to a reduction of the small overhead associated to the CP–Net algo-rithm up to (probably) a minimum value, and then the overhead increasesagain. The behavior is thus the one associated to a monotonically decreasingand then increasing function. As can be seen in Figures 5–7, the majority ofthe submitted jobs requires a few seconds to be scheduled, with or withoutthe CP–Net algorithm. The figures also show the presence of outliers, a fewjob requiring more time to be scheduled. We now discuss resource utilization.Since the application in our simulations are not installed on each resource, acomplete utilization of the full set of available resources is not possible. In theexperiments, resource utilization falls from 383 to 293 resources used whensubmitting 500 jobs, from 484 to 420 resources for 1000 jobs and finally from486 to 482 for 2000 jobs. The trend is therefore the one of a monotonicallyincreasing function; when enabling the CP–Net algorithm the overall differ-ence in resource utilization becomes negligible as the number of submittedjobs increases.

Regarding the rate of increase of the scheduling time associated to the CP–Net algorithm, the same pattern can be observed in the results related to Table2 and Figures 8–10. These results are related to the same grid consisting of 500resources, but in the corresponding experiments there is an initial workload of500 jobs already running on the grid before submitting new jobs. While therate of increase is higher with regard to the unloaded grid, we observe that onaverage the time required to schedule a job is practically almost always lower.Resource utilization is also better, with almost no difference when increasingthe number of submitted jobs, and very close to the maximum possible (giventhat, as already stated, full resource utilization is not possible).

Jobs 500 1000 2000Time (CP–Net disabled) 1329.29 1618.02 1592.54Time (CP–Net enabled) 2525.93 2080.67 2798.98Difference 1196.64 462.64 1206.44% Difference 90.02 28.59 75.76Average (CP–Net disabled) 2.66 3.24 3.19Std deviation (CP–Net disabled) 5.29 5.76 3.28Average (CP–Net enabled) 5.05 4.16 5.60Std deviation (CP–Net enabled) 8.47 5.61 6.57Resources (CP–Net disabled) 383 484 486Resources (CP–Net enabled) 293 420 482Difference -90 -64 -4%Difference -23.49 -13.22 -0.82

Table 1: Unloaded grid consisting of 500 resources



Table 2: Grid consisting of 500 resources, initial workload of 500 jobs

We now analyze the results obtained for a grid consisting of 1000 resources,with no initial workload. These results are provided in Table 3 and Figures8–10. As shown, when increasing the number of computational resources be-longing to the grid, the rate of increase is monotonically decreasing whenincreasing the number of submitted jobs from 500 to 2000. The average timeto schedule a job using the CP–Net algorithm is, respectively, 13.75, 17.07and 17.29 seconds (versus 8.96, 11.67 and 11.9 without CP–Net) for 500, 1000and 2000 submitted jobs. The overall resource utilization factor is very goodfor 500 and 2000 jobs (respectively a difference of 31 and 85 resources notutilized) and worse for 1000 submitted jobs (difference of 162 resources). Thenumber of resources utilized increases steadily with the number of submittedjobs, reaching 887 resources (out of 1000) for 2000 jobs.

When considering the same grid consisting of 1000 resources (Table 4 andFigures 8–10), this time with an initial workload of 1000 jobs already runningon the grid before submitting new jobs, we obtained the following results. Asin the previous case, the rate of increase is monotonically decreasing whenincreasing the number of submitted jobs from 500 to 2000. Moreover, theaverage time required to schedule a job is lower than the corresponding timein the previous case, and resource utilization is again steadily increasing withthe number of submitted jobs, reaching 972 resources (out of 1000) for 2000jobs.

Regarding the results obtained for a grid consisting of 2000 resources, withno initial workload and with an initial workload of 2000 jobs (Tables 5–6 andFigures 11–13), we note a dramatic decrease of the rate of increase of the totaltime required to schedule the jobs with respect to the small (500 resources)and the medium (1000 resources) sized grids in all of the experiments with500, 1000 and 2000 submitted jobs. The average time to schedule a job andthe standard deviation when using the CP–Net are only slightly larger thanthe corresponding times without the CP–Net algorithm, and the increase isnegligible. Resource utilization is also quite good. The utilization factor in-creases with the number of submitted jobs, and, for the grid with no initialworkload, in the worst case (2000 jobs submitted) there is only a -15.39%


20 40 60 80Seconds

50

100

150

200

250

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

10 20 30 40 50 60 70Seconds

100

200

300

400

Number of jobs

CP-Net enabled

CP-Net disabled

(b) grid workload: 500 jobs

Fig. 5: 500 jobs on a grid consisting of 500 resources

difference between the corresponding experiments with and without CP–Net.For the grid with an initial workload of 2000 jobs, the worst case happenswhen submitting 1000 jobs, with a similar -15.79% difference. Interestingly,the percentage difference is the lowest (-4.05%) when submitting 2000 jobs.






Jobs 500 1000 2000Time (CP–Net disabled) 34256.4 63764.1 132312Time (CP–Net enabled) 38972.8 76280.9 153356Difference 4716.47 12516.8 21044.2% Difference 13.76 19.62 15.9Average (CP–Net disabled) 68.51 63.76 66.15Std deviation (CP–Net disabled) 179.16 177.42 184.17Average (CP–Net enabled) 77.94 76.28 76.67Std deviation (CP–Net enabled) 186.84 192.03 193.25Resources (CP–Net disabled) 445 852 1559Resources (CP–Net enabled) 435 795 1319Difference -10 -57 -240%Difference -2.24 -6.69 -15.39



10 20 30 40 50 60 70Seconds

100

200

300

400

500

600

700

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

20 40 60 80 100Seconds

200

400

600

800

Number of jobs

CP-Net enabled

CP-Net disabled



Finally, Table 7 and Figure 14 refers to the last two experiment carried outon an unloaded grid consisting of 2000 resources. To assess the scalability of theCP–Net algorithm, we submitted 100000 jobs. As shown, the time required tosubmit all of the jobs using the CP–Net algorithm is only 18.41% more thanthe corresponding time without the algorithm. Average time and standard


20 40 60 80 100Seconds

200

400

600

800

1000

1200

1400

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

20 40 60 80 100Seconds

500

1000

1500

Number of jobs

CP-Net enabled

CP-Net disabled



deviation increase slightly, and resource utilization is in this case even slightlybetter, with more resources utilized when running the experiment with theCP–Net algorithm enabled.


50 100 150 200 250Seconds

100

200

300

400

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

50 100 150 200Seconds

100

200

300

400

500

Number of jobs

CP-Net enabled

CP-Net disabled



6 Related Work

In this paper we strictly deal with the matchmaking process in the contextof grid scheduling, not with scheduling algorithms in general; therefore, wediscuss in this Section relevant work in the field of matchmaking only.


50 100 150 200 250 300 350Seconds

200

400

600

800

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

50 100 150 200 250 300Seconds

200

400

600

800

Number of jobs

CP-Net enabled

CP-Net disabled



In the field of Artificial and Computational Intelligence, earliest resultsin matchmaking include [38], [22], [5], [41], [43] and [35]. Agent-Based Soft-ware Interoperability (ABSI) [38] takes advantage of the KQML (KnowledgeQuery and Manipulation Language) specification and uses KIF (Knowledge In-


50 100 150 200 250 300Seconds

500

1000

1500

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

50 100 150 200 250 300 350Seconds

500

1000

1500

Number of jobs

CP-Net enabled

CP-Net disabled



terchange Format) as content language. Matchmaking of advertisements andusers’ requests happens through the unification of equality predicates.

COIN [22], is a system in which matchmaking is based on a unification pro-cess quite similar to the one carried out by the Prolog programming language.InfoSleuth [5] uses KIF as the content language; the matchmaking process is


Jobs 500 1000 2000Time (CP–Net disabled) 32767.8 61576.5 125502Time (CP–Net enabled) 38217.1 75871.9 145591Difference 5449.3 14295.4 20089% Difference 16.63 23.21 16.007Average (CP–Net disabled) 65.53 61.57 62.75Std deviation (CP–Net disabled) 173.91 171.65 174.7Average (CP–Net enabled) 76.43 75.87 72.79Std deviation (CP–Net enabled) 185.39 191.97 184.4Resources (CP–Net disabled) 855 1564 1948Resources (CP–Net enabled) 794 1317 1869Difference -61 -247 -79%Difference -7.13 -15.79 -4.05


Jobs 100000Time (CP–Net disabled) 9949690Time (CP–Net enabled) 11782200Difference 1832480% Difference 18.41Average (CP–Net disabled) 99.49Std deviation (CP–Net disabled) 324.07Average (CP–Net enabled) 117.82Std deviation (CP–Net enabled) 370.76Resources (CP–Net disabled) 1950Resources (CP–Net enabled) 1954Difference 4%Difference 0.2

Table 7: Unloaded grid consisting of 2000 resources, 100000 submitted jobs

based on solving a constraint satisfaction problem, so that an advertisementand a request match if the user’s constraints are satisfied.

The Service Description Language has been proposed in [41] to describeavailable services. Here, matchmaking requires determining k -nearest servicesfor a request according to the distance between the service names (pairs ofverb and noun terms) and the request. Capability Description Language wasproposed in [48]. It supports reasoning through the notions of subsumptionand instantiation.

Language for Advertisement and Request for Knowledge Sharing (LARKS)appeared in [43] and is able to describe both service capabilities and servicerequests. It is based on the ITL (Information Terminological Language) con-cept language [42]. LARKS exploits the relations among concepts in order tocompute semantic similarities.

Traditionally, service and resource discovery have been carried out usingmethods based on name and keyword matchmaking. A semantic matchmakingframework based on DAML-S, a DAML (DARPA Agent Markup Language)-based language for service description, has been proposed in [35]. In thisontology-based matchmaking framework an advertisement matches a request


200 400 600 800 1000Seconds

100

200

300

400

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

200 400 600 800 1000Seconds

100

200

300

400

Number of jobs

CP-Net enabled

CP-Net disabled



when the service or resource provided by the advertisement can provide acertain degree of usefulness to the requester. When performing matchmaking,the system uses the outputs and inputs of the advertisement and the requestbased on the ontologies available, and, through the subsumption relationshipof one concept of the input/output of the advertisement and one concept of


200 400 600 800 1000Seconds

200

400

600

800

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

200 400 600 800 1000Seconds

200

400

600

800

Number of jobs

CP-Net enabled

CP-Net disabled



the input/output of the request, is able to determine four different levels ofmatching: exact, plug-in, subsume, and fail.

Another ontology-based matchmaking service is presented in [18]. It usesseparate ontologies to declaratively describe resources and job requests. In-stead of exact syntax matching, their ontology-based matchmaker performs se-


200 400 600 800 1000Seconds

500

1000

1500

Number of jobs

CP-Net enabled

CP-Net disabled

(a) grid unloaded

200 400 600 800 1000Seconds

500

1000

1500

Number of jobs

CP-Net enabled

CP-Net disabled



mantic matching using terms defined in ontologies. The loose coupling betweenresource and request descriptions remove the tight coordination requirementbetween resource providers and consumers. The authors designed and proto-typed their matchmaking service using TRIPLE to use ontologies encoded inW3Cs Resource Description Format (RDF) and rules (based on Horn logic


500 1000 1500 2000Seconds

20 000

40 000

60 000

80 000

Number of jobs

CP-Net enabled

CP-Net disabled

Fig. 14: Unloaded grid consisting of 2000 resources, 100000 submitted jobs

and F-logic) for resource matching. Resource descriptions, request descrip-tions, and usage policies are all independently modeled and syntactically andsemantically described using RDF schema. Inference rules are utilized for rea-soning about the characteristics of a request, available resources, and usagepolicies to find a resource that satisfies the request requirements.

In [3] the authors implemented a matchmaking service in an intelligentgrid environment, the BondGrid [4]. Their matchmaking framework is basedon a resource specification component, a request specification component, andmatchmaking algorithms. The request specification includes a matchmakingfunction and possibly two additional constraints, namely a cardinality thresh-old and a matching degree threshold. The cardinality threshold specifies howmany resources the requestor expects from the matchmaking service. Thematching degree threshold purpose is to specify the least matching degreeof one of the resources returned. The input of the matchmaking algorithm isthe request and the grid resource instances stored in a knowledge base; thealgorithm evaluates the request function in the context of each resource in-stance. The output is a number of grid resources, which are ranked accordingto their matching degrees. The matchmaking service returns the grid resources


that have the n largest matching degrees to the requester, where n is the car-dinality threshold specified by the request.

In [30], the authors discuss the problem of matchmaking for mathematicalservices, where the semantics play a critical role in determining the applica-bility or otherwise of a service and for which they use OpenMath descriptionsof pre- and post-conditions. A matchmaking architecture supporting the useof match plug-ins is described, along with five kinds of plug-in that have beendeveloped for this pourpose: (i) a basic structural match, (ii) a syntax andontology match, (iii) a value substitution match, (iv) an algebraic equivalencematch and (v) a decomposition match. Their matchmaker uses the individualmatch scores from the plug-ins to compute a ranking by applicability of theservices. The authors consider the effect of pre- and post-conditions of mathe-matical service descriptions on matching, and how and why to reduce queriesinto Disjunctive Normal Form (DNF) before matching. Finally, a case studydemonstrates in detail how the matching process works.

Trust-aware matchmaking is the subject of [2]. The authors presents a peer-to-peer trust brokering system, in which the network of trust brokers operateby providing peer reviews in the form of recommendations regarding potentialresource targets. One of the distinguishing features of this work is that it sepa-rately models the accuracy and honesty concepts, so that their model is able tosignificantly improve the performance. The trust brokering system is appliedto a resource manager in order to illustrate its utility in a public-resource Gridenvironment. The simulations performed to evaluate the trust-aware resourcematchmaking strategies indicate that high levels of robustness can be attainedby considering trust while matchmaking and allocating resources.

The Condor high-throughput resource management system for compute-intensive jobs [45] requires task submissions to be specified in description filescontaining basic information and task requirements. The latter are translatedto classified advertisements (ClassAds) that are sets of named expressions.ClassAds, which maps attribute names to expressions, are also used to ex-press characteristics of resources, and a matchmaking service matches taskand resource-related ClassAds to determine the proper resources where taskscan be executed. A Constraint attribute in a classad is evaluated against theclassad being matched with this classad, and when the values of attribute Con-straint of both classads are evaluated to true, these two classads are matched.Another attribute, Rank, measures the quality aof a match. The value of Ranktherefore provides an indication of how much the two classads match, so thatthe larger the value, the better is the matching. Condor requires a providerand a requester to know each other’s classad structure. The evaluation resultof the Rank attribute is, in general, not normalized and can not tell explicitlyhow well two classads match.

Matchmaking in Condor supports selecting only one resource. In [28], anextension, called set-extended classad syntax, was proposed in order to sup-port multiple resource selection. Matchmaking works evaluating a set-extendedclassad with a set of classads and returning a classad set with the highest rank.However, when the size of the classad set is large, evaluating all of the possible


combinations is infeasible; in this case, a simple greedy heuristic is used to findthe classad set providing the highest rank.

In [27] the authors present a new approach to symmetric matching thatachieves significant advances in expressiveness relative to ClassAds. It allowsmulti-way matches, expression and location of resource with negotiable capa-bility. The key to their approach is reinterpreting matching as a constraintproblem and exploit constraint-solving technologies to implement matchingoperations. A prototype matchmaking mechanism, named Redline, has beenimplemented and used to model and solve several challenging matching prob-lems.

GREEN [11] matches a job demand with a grid resource supply on thebasis of a characterization of resources by means of their performance, evalu-ated through benchmarks relevant to the application. The matchmaking ser-vice is based on a two-level benchmarking methodology; a requestorspecifiesboth syntactic and performance requirements, independently of the underlyingmiddleware. GREEN fosters Grid interoperability through the use of JSDL toexpress job submission requirements, and an internal translation to the jobsubmission languages used by the targets middleware. Middleware indepen-dence is pursued through an extension of JSDL based on the Glue2.0 schema.Moreover, some extensions to JSDL related to concurrency aspects were bor-rowed from JSDL SPMD Application Extension, in oder to support executionof parallel applications.

A resource selection system for exploiting graphics processing units (GPUs)as general-purpose computational resources in desktop Grid environments ispresented in [21]. The system allows Grid users to share remote GPUs, whichare traditionally dedicated to local users who directly see the display output.The key contribution of this paper is a novel system for non-dedicated en-vironments. The authors first show criteria for defining idle GPUs from theGrid users’ perspective. Based on these criteria, the system uses a screen-saver approach with some sensors that detect idle resources at a low overhead.The idea for this lower overhead is to avoid GPU intervention during resourcemonitoring. Detected idle GPUs are then selected according to a matchmakingservice, making the system adaptive to the rapid advance of GPU architecture.Though the system itself is not yet interoperable with current desktop Gridsystems, the idea can be applied to screensaver-based systems such as BOINC.The system has been evaluated using Windows PCs with three generations ofnVIDIA GPUs. The experimental results show that it achieves a low overheadof at most 267 ms, minimizing interference to local users while maximizing theperformance delivered to Grid users. Some case studies are also performed inan office environment to demonstrate the effectiveness of the system in termsof the amount of detected idle time.


7 CONCLUSIONS

In this paper we dealt with the problem of conditional preference matchmak-ing of computational resources belonging to a grid. We introduced CP–Nets,a recent development in the field of Artificial Intelligence, as a means to dealwith user’s preferences in the context of grid scheduling. We discussed CP–Nets from a theoretical perspective and then analyzed, qualitatively and quan-titatively, their impact on the matchmaking process, with the help of a gridsimulator we developed for this purpose. Many different experiments have beensetup and carried out, and we report here our main findings and the lessonslearnt.

1. Introducing CP–Nets in the matchmaking process is feasible. Theoverhead associated to CP–Nets is minimal when considering that the av-erage time to schedule a job in all of our experiments ranges from 3.89 (bestcase) to 117.82 seconds (worst case, less than two minutes). We also notehere that, besides requiring a few seconds, the average time to schedule ajob when using the CP–Nets is almost always close to the average time toschedule a job without CP–Nets, and is never more than two times thisvalue. Moreover, the outcome optimization query is polynomial (linear) inits input for the particular case related to our experiments, and the aver-age time to schedule a job is not directly proportional to the number ofsubmitted jobs.

2. Bigger grids are well suited to the use of CP–Nets in the match-

making process.Compared to smaller grids, bigger ones exhibit a reducedrate of increase of the scheduling time associated to the CP–Net algorithm.

3. Resource utilization does not decreases excessively using CP–

Nets. Overall, resource utilization is extremely good, ranging from nodifference at all (best case) to a maximum difference of 23.49%. For biggergrids and workloads, resource utilization is close to the maximum possible.

4. Grids with an initial workload provide better performances w.r.t.

unloaded ones. Interestingly, grids which are already busy executing aprevious workload react better to conditional preference matchmaking,leading in almost all of the experiments to a reduced average time to sched-ule a job.

Therefore, we conclude that CP–Nets can be a useful tool to ensure thatuser’s preferences are met in the matchmaking process, so that scheduling mayprovide results more appealing to the end users, with minimal overhead.

Acknowledgment

The authors would like to thank the anonymous reviewers for their useful, con-structive comments, that greatly helped improving the quality of this paper.


References

1. A. Anjomshoaa, F. Brisard, M. Drescher, D. Fellows, A. Ly, S. McGough, D. Pulsipher,and A. Savva. Job submission description language (jsdl), specification, version 1.0.Global Grid Forum Working Draft, 2005.

2. F. Azzedin, M. Maheswaran, and A. Mitra. Trust brokering and its use for resourcematchmaking in public-resource grids. Journal of Grid Computing, 4:247–263, 2006.

3. X. Bai, H. Yu, Y. Ji, and D. C. Marinescu. Resource matching and a matchmaking ser-vice for an intelligent grid. In International Conference on Computational Intelligence,pages 262–265. International Computational Intelligence Society, 2004.

4. X. Bai, H. Yu, G. Wang, Y. Ji, D. Marinescu, and L. Boloni. Intelligent grids. In GridComputing: Software Environments and Tools, pages 45–74. Springer, 2005.

5. R. J. Bayardo, Jr., W. Bohrer, R. Brice, A. Cichocki, J. Fowler, A. Helal, V. Kashyap,T. Ksiezyk, G. Martin, M. Nodine, M. Rashid, M. Rusinkiewicz, R. Shea, C. Unnikr-ishnan, A. Unruh, and D. Woelk. Infosleuth: agent-based semantic integration of in-formation in open and dynamic environments. SIGMOD Rec., 26(2):195–206, June1997.

6. C. Boutilier, R. I. Brafman, C. Domshlak, H. H. Hoos, and D. Poole. Cp-nets: a toolfor representing and reasoning with conditional ceteris paribus preference statements.J. Artif. Int. Res., 21:135–191, February 2004.

7. C. Boutilier, R. I. Brafman, H. H. Hoos, and D. Poole. Reasoning with conditionalceteris paribus preference statements. In K. B. Laskey and H. Prade, editors, UAI,pages 71–80. Morgan Kaufmann, 1999.

8. R. Buyya and M. Murshed. Gridsim: a toolkit for the modeling and simulation ofdistributed resource management and scheduling for grid computing. Concurrency andComputation: Practice and Experience, 14(13-15):1175–1220, 2002.

9. D. G. Cameron, A. P. Millar, C. Nicholson, R. Carvajal-Schiaffino, K. Stockinger, andF. Zini. Analysis of scheduling and replica optimisation strategies for data grids usingoptorsim. Journal of Grid Computing., 2(1):57–69, 2004.

10. H. Casanova, A. Legrand, and M. Quinson. Simgrid: A generic framework for large-scale distributed experiments. In Proceedings of the Tenth International Conferenceon Computer Modeling and Simulation, UKSIM ’08, pages 126–131. IEEE ComputerSociety, 2008.

11. A. Clematis, A. Corana, D. D’Agostino, A. Galizia, and A. Quarati. Job-resourcematchmaking on grid through two-level benchmarking. Future Gener. Comput. Syst.,26(8):1165–1179, Oct. 2010.

12. K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman. Grid information servicesfor distributed resource sharing. In High Performance Distributed Computing, 2001.Proceedings. 10th IEEE International Symposium on, pages 181 –194, 2001.

13. H. Dail, O. Sievert, F. Berman, H. Casanova, A. YarKhan, S. Vadhiyar, J. Dongarra,C. Liu, L. Yang, D. Angulo, and I. Foster. Scheduling in the grid application devel-opment software project. In J. Nabrzyski, J. M. Schopf, and J. Weglarz, editors, Gridresource management, pages 73–98. Kluwer Academic Publishers, Norwell, MA, USA,2004.

14. C. Dumitrescu and I. Foster. Usage policy-based cpu sharing in virtual organizations. InProceedings of the 5th IEEE/ACM International Workshop on Grid Computing, GRID’04, pages 53–60, Washington, DC, USA, 2004. IEEE Computer Society.

15. E. Elmroth and J. Tordsson. Grid resource brokering algorithms enabling advancereservations and resource selection based on performance predictions. Future GenerationComputer Systems, 24:585–593, 2008.

16. I. Foster and C. Kesselman. The Grid. Blueprint for a New Computing Infrastructure.:Blueprint for a New Computing Infrastructure (Elsevier Series in Grid Computing).Morgan Kaufmann, 2. a. edition, 2003.

17. I. Foster, C. Kesselman, and S. Tuecke. The Anatomy of the Grid: Enabling ScalableVirtual Organizations. International Journal of High Performance Computing Appli-cations, 15(3):200–222, 2001.

18. A. Harth, S. Decker, Y. He, H. Tangmunarunkit, and C. Kesselman. A semantic match-maker service on the grid. In Proceedings of the 13th international World Wide Web


conference on Alternate track papers & posters, WWW Alt. ’04, pages 326–327. ACM,2004.

19. A. Iosup and D. Epema. Grenchmark: A framework for analyzing, testing, and com-paring grids. In Cluster Computing and the Grid, IEEE International Symposium on,pages 313–320. IEEE Computer Society, 2006.

20. D. Klusacek and H. Rudova. Alea 2: job scheduling simulator. In Proceedings of the3rd International ICST Conference on Simulation Tools and Techniques, SIMUTools’10, pages 61:1–61:10, ICST, Brussels, Belgium, Belgium, 2010. ICST (Institute forComputer Sciences, Social-Informatics and Telecommunications Engineering).

21. Y. Kotani, F. Ino, and K. Hagihara. A resource selection system for cycle stealing ingpu grids. Journal of Grid Computing, 6:399–416, 2008.

22. D. Kuokka and L. Harada. Matchmaking for information agents. In Proceedings ofthe 14th international joint conference on Artificial intelligence - Volume 1, IJCAI’95,pages 672–678. Morgan Kaufmann Publishers Inc., 1995.

23. K. Kurowski, J. Nabrzyski, A. Oleksiak, and J. Weglarz. Grid scheduling simulationswith gssim. In Proceedings of the 13th International Conference on Parallel and Dis-tributed Systems - Volume 02, ICPADS ’07, pages 1–8. IEEE Computer Society, 2007.

24. H. Lamehamedi, Z. Shentu, B. Szymanski, and E. Deelman. Simulation of dynamic datareplication strategies in data grids. In Parallel and Distributed Processing Symposium,2003. Proceedings. International, page 10 pp., april 2003.

25. H. Li and R. Buyya. Model-driven simulation of grid scheduling strategies. In Proceed-ings of the Third IEEE International Conference on e-Science and Grid Computing,pages 287–294, Washington, DC, USA, 2007. IEEE Computer Society.

26. H. Li and R. Buyya. Model-based simulation and performance evaluation of gridscheduling strategies. Future Gener. Comput. Syst., 25:460–465, April 2009.

27. C. Liu and I. Foster. A constraint language approach to matchmaking. In Proceedingsof the 14th International Workshop on Research Issues on Data Engineering: WebServices for E-Commerce and E-Government Applications (RIDE’04), RIDE ’04, pages7–14. IEEE Computer Society, 2004.

28. C. Liu, L. Yang, I. Foster, and D. Angulo. Design and evaluation of a resource selec-tion framework for grid applications. In Proceedings of the 11th IEEE InternationalSymposium on High Performance Distributed Computing, HPDC ’02, pages 63–. IEEEComputer Society, 2002.

29. U. Lublin and D. G. Feitelson. The workload on parallel supercomputers: modelingthe characteristics of rigid jobs. J. Parallel Distrib. Comput., 63:1105–1122, November2003.

30. S. Ludwig, O. Rana, J. Padget, and W. Naylor. Matchmaking framework for mathe-matical web services. Journal of Grid Computing, 4:33–48, 2006.

31. E. Medernach. Workload analysis of a cluster in a grid environment. In D. Feitelson,E. Frachtenberg, L. Rudolph, and U. Schwiegelshohn, editors, Job Scheduling Strategiesfor Parallel Processing, volume 3834 of Lecture Notes in Computer Science, pages 36–61. Springer Berlin / Heidelberg, 2005.

32. S. Naqvi and M. Riguidel. Grid security services simulator (g3s) — a simulationtool for the design and analysis of grid security solutions. In Proceedings of the FirstInternational Conference on e-Science and Grid Computing, E-SCIENCE ’05, pages421–428. IEEE Computer Society, 2005.

33. L. N. Nassif, J. M. Nogueira, and F. v. V. de Andrade. Resource selection in grid:a taxonomy and a new system based on decision theory, case-based reasoning, andfine-grain policies. Concurr. Comput. : Pract. Exper., 21:337–355, March 2009.

34. F. Pacini. Job submission description language attributes, glite specification (submissionthrough wmproxy service). egee-jra1-tec-590869-jdlattributes-v0-8. EGEE, 2006.

35. M. Paolucci, N. Srinivasan, K. P. Sycara, and T. Nishimura. Towards a semantic chore-ography of web services: From wsdl to daml-s. In Proceedings of the InternationalConference on Web Services, ICWS ’03, June 23 - 26, 2003, Las Vegas, Nevada, USA,pages 22–26. CSREA Press, 2003.

36. K. Ranganathan and I. Foster. Computation scheduling and data replication algorithmsfor data grids. In J. Nabrzyski, J. M. Schopf, and J. Weglarz, editors, Grid resourcemanagement, pages 359–373. Kluwer Academic Publishers, 2004.


37. I. Sfiligoi, D. C. Bradley, B. Holzman, P. Mhashilkar, S. Padhi, and F. Wurthwein. Thepilot way to grid resources using glideinwms. In Proceedings of the 2009 WRI WorldCongress on Computer Science and Information Engineering - Volume 02, CSIE ’09,pages 428–432. IEEE Computer Society, 2009.

38. N. Singh. A common lisp api and facilitator for absi: version 2.0.3. Technical ReportLogic-93-4, Logic Group, Computer Science Department, Stanford University, 1993.

39. M. Solomon. The classad language reference manual v2.1. Computer Sciences Depart-ment, University of Wisconsin, Madison, USA, 2003.

40. H. J. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang, K. Taura, and A. Chien. Themicrogrid: A scientific tool for modeling computational grids. Sci. Program., 8(3):127–141, Aug. 2000.

41. V. S. Subrahmanian, P. Bonatti, U. J. Dix, T. Eiter, S. Kraus, and R. Ross. Heteroge-neous Agent Systems. MIT Press, 2000.

42. K. Sycara, J. Lu, and M. Klusch. Interoperability among heterogeneous software agentson the internet. Technical Report CMU-RI-TR-98-22, Robotics Institute, Pittsburgh,PA, October 1998.

43. K. Sycara, S. Widoff, M. Klusch, and J. Lu. Larks: Dynamic matchmaking amongheterogeneous software agents in cyberspace. Autonomous Agents and Multi-AgentSystems, 5(2):173–203, June 2002.

44. A. Takefusa, S. Matsuoka, H. Nakada, K. Aida, and U. Nagashima. Overview of a per-formance evaluation system for global computing scheduling algorithms. In High Perfor-mance Distributed Computing, 1999. Proceedings. The Eighth International Symposiumon, pages 97 –104, 1999.

45. D. Thain, T. Tannenbaum, and M. Livny. Condor and the grid. In F. Berman, G. Fox,and T. Hey, editors, Grid Computing: Making the Global Infrastructure a Reality. JohnWiley & Sons Inc., December 2002.

46. P. Thysebaert, B. Volckaert, F. de Turck, B. Dhoedt, and P. Demeester. Evaluationof grid scheduling strategies through nsgrid: a network-aware grid simulator. Neural,Parallel Sci. Comput., 12(3):353–378, Sept. 2004.

47. C.-M. Wang, H.-M. Chen, C.-C. Hsu, and J. Lee. Dynamic resource selection heuris-tics for a non-reserved bidding-based grid environment. Future Gener. Comput. Syst.,26:183–197, February 2010.

48. G. J. Wickler. Using expressive and flexible action representations to reason aboutcapabilities for intelligent agent cooperation. PhD thesis, University of Edinburgh,1999.

Preference–Based Matchmaking of Grid Resources with CP–Nets

Documents

Transcript of Preference–Based Matchmaking of Grid Resources with CP–Nets