On Quality of Service Support for Grid Computing


Chapter 1

ON QUALITY OF SERVICE SUPPORT FOR GRID COMPUTING

D. Colling¹, T. Ferrari∗,², Y. Hassoun¹, C. Huang⁴, C. Kotsokalis∗,³, A.S. McGough¹, E. Ronchieri², Y. Patel¹, P. Tsanakas³

¹Imperial College London, London, UK

²National Institute of Nuclear Physics, CNAF, Bologna, Italy

³National Technical University of Athens, Athens, Greece

⁴Brunel University, London, UK

∗Contact authors

Abstract

Computing Grids are hardware and software infrastructures that support secure sharing of, and concurrent access to, distributed services by a large number of competing users from different Virtual Organizations. Concurrency can easily lead to overload and resource shortages in large-scale Grid infrastructures, as today's Grids do not offer differentiated services. We propose a framework for supporting Quality of Service guarantees via both reservation and discovery of best-effort services, based on matchmaking between application requirements and the Quality of Service performance profiles of candidate services. We illustrate the middleware components needed to support both strict and loose guarantees, and the performance assessment techniques for the discovery of suitable services.

Keywords: Grid Computing, Quality of Service, Performance Measurements, Monitoring

1. Introduction

Quality of Service (QoS) is a broad term that is used, in this context, to denote the level of performance that a given client experiences when invoking an operation on a remote server. QoS support refers to the ability of a given service instance to offer a performance level that satisfies the requirements of a given client.

QoS support is of paramount importance given the inherently shared nature of Grid services and the limited capabilities of the hardware and software resources available to satisfy client requests. In this environment, aggressive best-effort use of services can result in denial-of-service problems, since unpoliced use of services undermines their effective provision. Recent efforts to integrate scientific instrumentation into the Grid, with the related requirements for high availability and responsiveness, make support for differentiated services all the more urgent.

Given the heterogeneous nature of the Grid resources that applications use, QoS requirements can be of many different kinds. Supporting them requires specific enforcement techniques in the service back end. Exclusive access typically requires resource locking, while other QoS parameters, such as responsiveness, depend on a number of static and dynamic factors: the physical location of client and server, the status of the communication infrastructure, the client and server implementations, the type of operation invoked, the number of concurrent clients served, and so on.

We propose a framework that supports two complementary types of guarantees: strict, involving digital contracts, and loose, implying best-effort usage based on statistical assumptions. We first address the problem of establishing agreements (digital contracts) in Section 2, and the problem of delivering QoS guarantees for atomic tasks that need to be executed instantly in Section 3. We then analyze, in Section 4, a list of performance metrics for loose QoS, defined on the basis of the requirements of interactive applications such as those of Instrument Grids; Section 4 also describes our measurement methodology, and Section 5 discusses how the measured metrics are composed. We continue with a framework for publishing and archiving the relevant information in Section 6. Finally, we present a preliminary prototype in Section 7 and related work in Section 8, and we conclude the paper and discuss future work in Section 9.

2. Strict Guarantees

The application of a specific QoS level to a Grid task requires the negotiation and establishment of a contract, called a Service Level Agreement (SLA), between the customer (or a proxy acting on its behalf) and the service provider. An SLA quantitatively defines the performance level for the service requested and the obligations of the parties involved in the contract. The SLA is typically specified through a template containing both quantitative and qualitative information. In particular, an SLA specification supplies administrative information (e.g., the identity of the entities involved in the contract, and the penalties applicable to the parties when the SLA guarantees are not honored), whereas the Service Level Specification (SLS) is a set of attributes and values describing the profile of the requested performance level [EGEE, 2005a]. Strict guarantees require the establishment of such a formal agreement via signalling and negotiation, a process that involves both service consumers and service providers.

The Grid Resource Allocation and Agreement Protocol Working Group (GRAAP-WG) of the Open Grid Forum (OGF) proposes an architecture for Grid SLA signalling comprising two functional levels: the agreement layer and the service provider layer [OGF, 2007]. The agreement layer implements the communication protocol used to exchange information about SLAs between the customer and the service provider, and it is responsible for ensuring that the SLA guarantees are enforced by a suitable service provider. In addition, the agreement layer exposes information about the types of service offered and the related SLA templates, processes agreement creation requests to check the compliance of the so-called agreement offers (i.e. the SLA requests), and forwards well-formed requests to the service provider layer. An agreement is successfully created if the corresponding guarantees can be enforced by one or more service providers, which are also responsible for supervising the status of their resource pools. The service provider layer, in turn, deals with the terms, i.e. the content of the agreement that binds the entities involved.

Reservation and allocation in Grids can be assisted by existing resource management capabilities. In this paper, we address the problem of supporting resource-independent agreement signalling in the Grid Workload and Resource Management services (WMS), and we describe a prototype implementation developed within the GRIDCC and EGEE EU projects. The WMS comprises a set of middleware components responsible for the distribution and management of tasks across Grid resources and services. In this context, the term resource refers to fabric-layer facilities, e.g. computing farms, data storage systems and networking infrastructures. Conversely, we use the term service to refer to a generic service instance, regardless of the functionality level it provides with respect to the Grid service architecture.

SLA Signalling

We propose an approach to SLA signalling based on the instrumentation of the WMS and on a new collective service, called the Agreement Service (AS). Resource broker instrumentation is needed to provide the capability of handling the allocation and reservation of Grid resources. The WMS supports the SLA signalling phase, the discovery of service instances that satisfy the user's requirements based on data available in the Grid Information System (IS), and the monitoring of SLAs.

We borrow the notions of Agreement Initiator, Agreement Offer and Service Provider from the proposed WS-Agreement specification of the OGF. In particular, as we focus on reservation and allocation, in this context the provider is called the Reservation and Allocation Service Provider (RASP). The proposed architecture is illustrated in Figure 1.1, which details the interactions between these components. Four types of resources are shown: the Computing Element (CE), the Storage Element (SE), the network (Network Service Access Point [EGEE, 2005b]) and the Instrument Element (IE) [GRIDCC, 2005b].

RASPs are fabric-layer, resource-specific services responsible for managing the availability of the local resource over time, for admission control based on the policies of the infrastructure involved, for checking whether each authorized incoming request can be satisfied and, finally, for enforcing the guarantees. We propose that each reservation-capable resource be associated with its own RASP instance. The user's resource requirements and preferences are part of the SLS, and consequently need to be specified as a list of Service Description and Guarantee Terms. We identify four term categories: general attributes (type of task requested and corresponding functionality), time attributes (start time, end time and duration of the reservation), functionality attributes (reservation-specific information) and guarantee attributes (definition of the QoS profile).
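The four SLS term categories can be pictured as a simple record. The sketch below is a minimal illustration assuming Python dataclasses; the class and field names (`GeneralTerms`, `TimeTerms`, `SLS`, `space_gb`, `min_throughput_mbps`) are hypothetical and are not taken from WS-Agreement or from the prototype.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class GeneralTerms:
    task_type: str        # type of task requested, e.g. "reservation"
    functionality: str    # corresponding functionality, e.g. "storage"


@dataclass
class TimeTerms:
    start: datetime
    end: datetime
    duration: timedelta

    def is_consistent(self) -> bool:
        # Start, end and duration must agree and describe a non-empty window.
        return self.duration > timedelta(0) and self.start + self.duration == self.end


@dataclass
class SLS:
    general: GeneralTerms
    time: TimeTerms
    # Functionality attributes: reservation-specific information.
    functionality: Dict[str, object] = field(default_factory=dict)
    # Guarantee attributes: the QoS profile requested.
    guarantees: Dict[str, object] = field(default_factory=dict)


# Hypothetical two-hour storage reservation.
sls = SLS(
    general=GeneralTerms(task_type="reservation", functionality="storage"),
    time=TimeTerms(start=datetime(2007, 1, 15, 9, 0),
                   end=datetime(2007, 1, 15, 11, 0),
                   duration=timedelta(hours=2)),
    functionality={"space_gb": 500},
    guarantees={"min_throughput_mbps": 100},
)
print(sls.time.is_consistent())  # True
```

Keeping the four categories as separate fields mirrors the way the AS later extracts them from an offer and forwards only the relevant ones to each RASP.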

The agreement initiator is the entity that triggers the SLA signalling process. In our scenario, users, applications or Grid middleware components can be initiators, provided that the identity of the resources to be involved in the SLA signalling is known. If available, this identity is defined in the SLA, and the allocation and reservation task can be submitted directly to the AS. Although direct negotiation with the AS is possible, users very frequently have no a priori knowledge of the Grid infrastructure. In this case, we propose that the allocation and reservation task be submitted to the WMS, in order to take advantage of its discovery capabilities. After submission, the WMS determines both the RASPs that can satisfy the task requirements and the ASs capable of supporting the corresponding RASP interfaces, using the resource details that the RASPs publish in the IS. In this scenario, the WMS takes the agreement initiator role and acts on behalf of the user or application.

Figure 1.1. Agreement Service and the other components of the proposed architecture (Agreement Initiators: end user, WMS, scheduler; Agreement Service; one RASP each for the Computing, Storage, Network and Instrument Elements)

When the AS receives an offer, it first checks the offer's compliance with the corresponding template, then extracts attributes from the SLA request and, finally, dispatches them to a number of RASPs by invoking the appropriate interface. A given agreement can be successfully created only if one or more RASPs (depending on the nature of the agreement) accept responsibility for providing the guarantees associated with it.
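The offer-handling sequence just described (compliance check, attribute extraction, dispatch to RASPs) can be sketched as follows. This is a minimal illustration under stated assumptions: templates are reduced to a list of required term names, each RASP is modelled as a callable that accepts or refuses the extracted attributes, and the agreement is assumed to require acceptance by every RASP it is dispatched to. `Template`, `handle_offer` and `se_rasp` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]
# A RASP is modelled as a callable returning True if it accepts the guarantees.
Rasp = Callable[[Attributes], bool]


@dataclass
class Template:
    service_type: str
    required_terms: List[str]   # simplified stand-in for creation constraints


def handle_offer(offer: Attributes, templates: Dict[str, Template],
                 rasps: List[Rasp]) -> str:
    # 1. Check the offer against the corresponding template.
    template = templates.get(str(offer.get("service_type")))
    if template is None or any(term not in offer for term in template.required_terms):
        return "rejected: non-compliant offer"
    # 2. Extract the attributes the RASPs need.
    attributes = {term: offer[term] for term in template.required_terms}
    # 3. Dispatch to the RASPs; in this sketch every RASP involved must accept.
    if all(rasp(attributes) for rasp in rasps):
        return "agreement created"
    return "rejected: guarantees cannot be provided"


def se_rasp(attributes: Attributes) -> bool:
    # Hypothetical Storage Element RASP with 1000 GB of reservable space.
    return attributes["space_gb"] <= 1000


templates = {"storage": Template("storage", ["start", "end", "space_gb"])}
offer = {"service_type": "storage", "start": 9, "end": 11, "space_gb": 500}
print(handle_offer(offer, templates, [se_rasp]))  # agreement created
```

Note that `all()` stops at the first refusal; a real AS would also have to release any reservations already accepted by earlier RASPs when a later one refuses.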

Agreement Service

The AS is the novel service component that provides SLA management capabilities to the WMS. We propose that the AS be responsible for interacting with the agreement initiator (as a server) and with the RASPs (as a client) that can satisfy the initiator's request. In particular, the AS accepts agreement offers from initiators and is responsible for compliance checking, as well as for the interaction with one or more RASPs in case of success. The AS exposes RASP capabilities through agreement templates [OGF, 2007]. A template is an SLA skeleton and includes a list of creation constraints applicable to the SLS. The templates advertised by an AS explicitly define the types of service that the AS can handle, and consequently depend on the service providers it interacts with.

The AS can perform negotiation in order to assist initiators in establishing SLSs whose characteristics depend on the current status of the RASP. For every well-formed offer, SLS attributes are translated into low-level, RASP-specific terms according to the RASP interface. Finally, the AS supplies information about the status of agreements under negotiation and the attributes of the agreements successfully established.

3. Loose Guarantees

The provisioning of loose guarantees consists in enabling clients to select service providers that, on a statistical basis, exhibit a best-effort QoS profile meeting the client's requirements. In this case, service discovery relies on monitoring critical performance parameters over time. The gathered results, combined with appropriate statistical estimation methods, can assist clients and schedulers during the selection phase.

Loose guarantees are delivered on a best-effort basis and, for this reason, do not require the prior establishment of an SLA or the negotiation of a contract. If consumers experience a service level that contradicts the QoS profile previously exhibited by the service providers, compensation, adaptation and/or other self-healing operations are needed to cope with requirements that are not met in practice.

This approach is motivated by the requirements of lightweight interactive applications, which may be more easily addressed in a best-effort fashion via discovery and prediction of future performance. The performance experienced when invoking an operation on a remote service instance depends on a large number of factors, and can therefore benefit greatly from the availability of loose guarantees. For example, asynchronous operations typically exchange only one-way messages from the client to the server, while synchronous operations additionally require a reply message from the server back to the client. In general, the client/server interaction profile depends on the network path connecting the client and the server, and on the processing overhead of the entire messaging process at both ends, including request/reply message composition, decomposition and processing.


We propose a novel architecture and methodology for the delivery of loose guarantees, which relies on three functional components:

Monitoring: distributed sensors that monitor the QoS performance profile of various Grid infrastructure components, at both the fabric and resource layers;

Publishing: data gathered from the sensors must be made available to distributed consumers;

Discovery: service discovery engines are required to allow clients to select the most appropriate service instance(s), by comparing the QoS profiles of the services of interest to the client requirements. The discovery and scheduling services used to select adequate service providers and agreement responders in the case of strict guarantees can easily be extended to support this.
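Discovery can be illustrated with a toy selection rule. The sketch below assumes that the published QoS profile of each candidate is simply a list of past response-time samples and that the client requirement is an upper bound on the 95th percentile; both the rule and the names (`select_service`, the `example.org` hosts) are hypothetical, not the discovery algorithm of the WMS.

```python
import math
from typing import Dict, List, Optional


def nearest_rank_quantile(samples: List[float], q: float) -> float:
    """Empirical q-quantile of the samples (nearest-rank method)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def select_service(profiles: Dict[str, List[float]], max_response_time: float,
                   q: float = 0.95) -> Optional[str]:
    """Return the candidate whose q-quantile of past response times meets the
    requirement with the largest margin, or None if no candidate qualifies."""
    quantile = {name: nearest_rank_quantile(samples, q)
                for name, samples in profiles.items() if samples}
    eligible = {name: v for name, v in quantile.items() if v <= max_response_time}
    if not eligible:
        return None
    return min(eligible, key=lambda name: eligible[name])


profiles = {   # past response times in seconds (illustrative)
    "ce-a.example.org": [0.20, 0.22, 0.25, 0.90],   # occasionally very slow
    "ce-b.example.org": [0.30, 0.31, 0.33, 0.35],
}
print(select_service(profiles, max_response_time=0.5))   # ce-b.example.org
```

Here `ce-a` has the lower median but is discarded because of its tail, which is the kind of statistical judgement loose guarantees rely on.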

The provisioning of loose guarantees thus requires complementing the Grid middleware described in Section 2 with additional functional components. Consequently, the sections that follow focus on the problem of information gathering and publishing.

We can identify a number of different stages in the completion of a typical Web services-based client-server interaction, as shown in Figure 1.2. REQ denotes the request message exchanged during a session, RES the corresponding response message, C the client side and S the server side of the interaction.

Figure 1.2. Decomposition of a synchronous operation

In the following, $t_it_j$ denotes the time interval between instants $t_i$ and $t_j$.

1. $t_0t_1$ (Message Generation Delay, mgd): the request message REQ is composed on the client side.

2. $t_1t_2$ (Message Transport Delay, mtd): REQ is carried by the communication infrastructure connecting C to S.

3. $t_2t_3$ (Message Processing Delay, mpd): REQ is decomposed by S.

4. $t_3t_4$ (Execution Time, et): the operation called is executed.

5. $t_4t_5$ (Response Message Generation Delay, rmgd): the response message RES is composed.

6. $t_5t_6$ (Response Message Transport Delay, rmtd): the one-way delay experienced by RES after being issued by S, while it is carried by the network to C.

7. $t_6t_7$ (Response Message Processing Delay, rmpd): RES is decomposed by C.

8. $t_7t_8$ (Response Client-side Processing Time, rcpt): C processes the response message and acts accordingly. Like $t_3t_4$, this component is not taken into account.
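The eight stages can be computed directly from the nine timestamps t0 to t8. The sketch below assumes, for illustration only, that all timestamps are taken on one common, synchronized clock, which is precisely what is hard to obtain in practice; the function name `decompose` and the sample values are hypothetical.

```python
from typing import Dict, Sequence

# Stage names in the order of the decomposition above.
STAGES = ["mgd", "mtd", "mpd", "et", "rmgd", "rmtd", "rmpd", "rcpt"]


def decompose(t: Sequence[float]) -> Dict[str, float]:
    """Map the nine timestamps t0..t8 to the eight stage durations."""
    if len(t) != len(STAGES) + 1:
        raise ValueError("expected nine timestamps t0..t8")
    if any(later < earlier for earlier, later in zip(t, t[1:])):
        raise ValueError("timestamps must be non-decreasing")
    return {name: t[i + 1] - t[i] for i, name in enumerate(STAGES)}


stages = decompose([0, 2, 12, 13, 40, 41, 51, 52, 55])   # milliseconds
print(stages["mtd"], stages["et"])                        # 10 27
print(sum(stages.values()))                               # 55, i.e. t8 - t0
```

Because consecutive stages share their boundary timestamps, the durations always add up to the full interval from t0 to t8.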

Composition and decomposition delays can be measured with hooks in modern service stacks, while networking delays can be measured with existing tools from the networking world. Different Grid infrastructure components contribute to the various stages identified above; for this reason, we propose estimating end-to-end performance by monitoring each part (network, clients and servers) individually and composing the measurements gathered. The relevant metrics and the composition methodology are described in what follows.

4. End-to-end Performance Estimation

In our framework, expected QoS service profiles must be computed in order to support the selection of Grid services that have a good chance of meeting the client's QoS requirements. The estimation of a QoS profile is based on the constant monitoring of three fundamental aspects. Firstly, network performance needs to be probed by specific network sensors distributed over the network infrastructure of interest. Secondly, we propose monitoring performance on the client side, as the client affects the time needed to construct a request message in the first place and, if applicable, to process reply messages from the invoked service. Finally, performance also needs to be analyzed at the server end, since the server contributes both to the time needed to process request messages from remote clients and to the time needed to construct reply messages. In this framework, the time needed to execute a specific operation is not considered, as it is highly specific to the type of operation and the input parameters supplied to it.


Performance Metrics

The main components contributing to end-to-end performance are network paths, clients and servers. In this framework, the actual metrics that need to be monitored for loose guarantee support necessarily depend on the application requirements considered. Starting from a set of requirements from applications for Computing and Instrument Grids, we have worked out a list of relevant metrics, and defined different measurement approaches applicable to different metric categories [GRIDCC, 2005a].

Network performance metrics. Several network performance metrics can be relevant to the estimation of some Grid service performance parameters.

Message one-way delay (RFC 2679) from the node hosting client C to the node hosting server S is of paramount importance, as it affects the completion and response time of a given operation. Measuring one-way delay requires accurate clock synchronization between client C and server S.

Round-trip delay, also known as Round-Trip Time (RTT) (RFC 2681), complements one-way delay and corresponds to the time span from the instant a given message is sent to the instant the corresponding reply is received back at the sender.

Throughput is particularly relevant in scenarios where applications need to exchange large amounts of data. Achievable throughput is the average volume of application data that can be transferred per unit of time from C to S. Estimates of achievable throughput [B. Lowekamp et al., 2004], combined with the size of the file to be exchanged, can help compute the time needed to transfer the file from C to S.
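As a worked example of the file-transfer estimate, the time is roughly the file size divided by the achievable throughput. The helper below is a hypothetical sketch that ignores connection set-up, TCP slow start and throughput fluctuations during the transfer.

```python
def transfer_time_s(file_size_bytes: int, achievable_throughput_bps: float) -> float:
    """Estimated transfer time in seconds: size in bits / achievable throughput."""
    if achievable_throughput_bps <= 0:
        raise ValueError("achievable throughput must be positive")
    return file_size_bytes * 8 / achievable_throughput_bps


# 1 GB (10**9 bytes) over an achievable throughput of 100 Mbit/s.
print(transfer_time_s(10**9, 100e6))   # 80.0
```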

Finally, IP Packet Delay Variation (IPDV, RFC 3393) quantifies the variation in one-way delay between two subsequent IP packets. IPDV is critical for interactive and video/audio streaming applications, which perform best when IPDV is low.
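The sketch below computes IPDV for consecutive packet pairs from a series of one-way delays. RFC 3393 allows other packet-selection functions; consecutive pairs are an assumption here, and `ipdv` is a hypothetical helper. Because each value is a difference of two one-way delays, a constant offset between the sender and receiver clocks cancels out (clock skew does not).

```python
from typing import List, Sequence


def ipdv(one_way_delays: Sequence[float]) -> List[float]:
    """IPDV for each pair of consecutive packets: D(i) - D(i-1)."""
    return [cur - prev for prev, cur in zip(one_way_delays, one_way_delays[1:])]


delays_ms = [20, 22, 21, 25]               # illustrative one-way delays
print(ipdv(delays_ms))                      # [2, -1, 4]
# A constant clock offset of 100 ms shifts every delay but not the IPDV.
print(ipdv([d + 100 for d in delays_ms]))   # [2, -1, 4]
```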

Service metrics. The performance of a stand-alone service can be quantified via a number of metrics, which contribute in complementary ways to estimating the performance level a client can expect when interacting with a given service. Several of these metrics depend strictly on a number of factors, such as the message size and its level of complexity, the type of operation invoked, etc. In the list of metrics that follows, S denotes a generic server and C a client.


Availability: the probability that S can be effectively invoked by C at a given time. High availability is an indication of service robustness.

Accessibility: the probability that S can receive a request and return a response message. Essentially, this shows how well a service scales with an increasing number of consumers C: S can be available (serving some clients) but not accessible (rejecting or failing to serve additional requests).

Completion Time: given $t$, the time at which the composition of a message starts in C (for example, its serialization if Web services technology is used), and $T$, the instant at which the execution of the operation $O$ triggered by the message finishes at the server end, we define the Completion Time $CT(C,S,O,t)$ as the time interval $T - t$. In other words, considering the operation decomposition introduced in Section 3, $CT(C,S,O,t) = t_0t_4$. Measurement of $CT$ requires clock synchronization at both the client and the server end.

Response Time: given $t$, the time at which the composition of a message starts in C (for example, its serialization if Web services technology is used), and $T$, the time at which C finishes processing the corresponding response message, the Response Time $RT(C,S,O,t)$ is the time interval $T - t$ or, equivalently, $RT(C,S,O,t) = t_0t_8$. RT is only applicable to synchronous operations, as it takes into account the network latencies in both directions (C to S and S to C), whereas the Completion Time depends only on the one-way delay and the processing time at S.
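Availability and accessibility, as defined above, are probabilities, so a natural estimate is a frequency over logged invocation attempts. The sketch below assumes a hypothetical client-side log in which each attempt records whether a socket-level connection to S was established and whether a well-formed response indicating success came back; the `Invocation` record and the function name are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Invocation:
    connected: bool   # socket-level connection to S was established
    completed: bool   # a well-formed response indicating success was received


def availability_accessibility(log: Iterable[Invocation]) -> Tuple[float, float]:
    """Frequency estimates of availability and accessibility over logged attempts."""
    records = list(log)
    if not records:
        raise ValueError("no invocation attempts logged")
    attempts = len(records)
    available = sum(r.connected for r in records)
    # A response can only come back over an established connection.
    accessible = sum(r.connected and r.completed for r in records)
    return available / attempts, accessible / attempts


# Ten illustrative attempts: 9 connections, of which 7 completed successfully.
log = [Invocation(True, True)] * 7 + [Invocation(True, False)] * 2 + [Invocation(False, False)]
print(availability_accessibility(log))   # (0.9, 0.7)
```

Because an attempt counts as accessible only if its connection was established, the accessibility estimate never exceeds the availability estimate, matching the remark that S can be available but not accessible.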

Measurement Methodology

Network monitoring frameworks generally rely on a set of distributed sensors for gathering raw data, which is accomplished by end-to-end and/or edge-to-edge probing sessions. End-to-end measurements require hosting the sensors in customer networks, whereas edge-to-edge measurements refer to one or more network transit domains and require installing sensors in network Points of Presence at the edge of the providers' domains, making it possible to monitor the various intermediate network segments that compose an end-to-end path. The metrics that can be gathered in the two scenarios are complementary. The problem of client-server performance monitoring is addressed by adopting the data gathered by nearby network sensors as reference values.


The performance metrics described earlier can be estimated via various methodologies at different levels of complexity.

Availability: When estimating availability, the quality of network connectivity experienced by the service has to be taken into account. Monitoring connectivity for every client-server pair is infeasible for scalability reasons; hence, we propose that availability be measured and logged directly by the service's clients. For every client-server communication successfully established at the socket layer, we propose that the event be logged and published according to the type of information service in use.

Accessibility: If a service is accessible, then operations can complete successfully. This event is monitored differently for synchronous and asynchronous operations. In the former case, a response message is returned, and successful completion can be logged and published if the client receives a well-formed response message directly or indirectly indicating success; in the latter case, the completion of an asynchronous operation for a given client C needs to be recorded and notified at the server end.

Completion Time: Measuring the interval $t_0t_4$, needed to estimate $CT$, is intrinsically difficult because of the requirement for clock synchronization between clients and services. Furthermore, operations experience different $CT$ values depending on the input and the type of operation. For this reason, we require a server to expose a “dummy” (empty) method for every time-critical operation, whose execution overhead is null. For empty operations, the execution time $t_3t_4$ can be neglected, so $CT$ can be accurately approximated by the interval $t_0t_3$, i.e. the time for constructing, transferring and decomposing the request message:

$$CT \approx t_0t_1 + t_1t_2 + t_2t_3 = t_0t_3$$

Response Time: The Response Time $RT$, corresponding to the interval $t_0t_8$, is approximated by means of empty operations as described for $CT$, neglecting the execution time $t_3t_4$ and the client-side processing time $t_7t_8$:

$$RT \approx t_0t_3 + t_4t_7$$
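Under the empty-operation approximation, $CT$ and $RT$ reduce to sums of stage delays, each measured by a different producer (clients, services or network probes, as listed in Section 6). The sketch below composes such estimates; the `StageEstimates` record and the function names are hypothetical, and using the measured one-way delay for both transport stages is the approximation described in the text.

```python
from dataclasses import dataclass


@dataclass
class StageEstimates:
    """Per-stage delay estimates (same unit throughout), by producer."""
    mgd: float    # client: request generation, t0t1
    mtd: float    # network sensor: one-way delay C -> S, stands in for t1t2
    mpd: float    # server: request processing, t2t3
    rmgd: float   # server: response generation, t4t5
    rmtd: float   # network sensor: one-way delay S -> C, stands in for t5t6
    rmpd: float   # client: response processing, t6t7


def completion_time(s: StageEstimates) -> float:
    # Empty operation: execution time t3t4 is neglected, so CT ~ t0t3.
    return s.mgd + s.mtd + s.mpd


def response_time(s: StageEstimates) -> float:
    # RT ~ t0t3 + t4t7; t3t4 (execution) and t7t8 (client processing) are neglected.
    return completion_time(s) + s.rmgd + s.rmtd + s.rmpd


est = StageEstimates(mgd=2, mtd=10, mpd=1, rmgd=1, rmtd=10, rmpd=1)   # milliseconds
print(completion_time(est), response_time(est))   # 13 25
```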

5. Metric Composition

While availability and accessibility can be measured relatively easily, estimating temporal metrics accurately is in general more difficult, for the reasons discussed earlier relating to the lack of clock synchronization. Unfortunately, a number of additional constraints come into play when estimating some of the metrics, such as $CT$ and $RT$. Firstly, computing the one-way delay for every (C, S) pair via active measurements is infeasible for scalability reasons, so the actual one-way delay has to be approximated by the value measured by nearby sensors. More importantly, both $CT$ and $RT$ vary with the input sets considered, and forecasting is an additional source of inaccuracy. We propose that $CT$ and $RT$ be estimated by composing various estimated time intervals. In particular, the time intervals introduced in Section 3 are measured individually and then combined. The accuracy achieved by this metric composition is currently being tested. We are considering a technique that models temporal overheads based on the Monte Carlo approach and the application of the Central Limit Theorem [Feller, 1966][Tijms, 2004]. The Central Limit Theorem states that the suitably normalized sum (or mean) of a large number of independent random variables with finite mean and variance tends to the normal distribution, whatever the distribution of the individual variables (for variables that are not identically distributed, under additional conditions such as Lindeberg's). Moreover, if the Probability Density Function (PDF) is known, then integration over the PDF could be performed to obtain tighter bounds.
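One way to realise the Monte Carlo composition is to resample each stage delay independently from its own measured samples and add the draws, which yields an empirical distribution of $CT$ from which percentiles can be read. The sketch below does this for the three stages of an empty operation and, for comparison, computes the normal approximation suggested by the Central Limit Theorem (means and variances of independent terms add up). Stage independence, the sample values and the function names are all assumptions for illustration; with only three terms the normal approximation can be poor, especially for skewed stages such as network delay.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance
from typing import List, Sequence


def monte_carlo_ct(stages: Sequence[Sequence[float]], runs: int = 10_000,
                   seed: int = 1) -> List[float]:
    """Sorted Monte Carlo samples of CT, drawing each stage independently
    from its own empirical samples and summing the draws."""
    rng = random.Random(seed)
    return sorted(sum(rng.choice(s) for s in stages) for _ in range(runs))


def nearest_rank(sorted_samples: Sequence[float], q: float) -> float:
    """Empirical q-quantile (nearest-rank) of already sorted samples."""
    return sorted_samples[max(0, math.ceil(q * len(sorted_samples)) - 1)]


def normal_ct(stages: Sequence[Sequence[float]], q: float) -> float:
    """CLT-style approximation: CT ~ Normal(sum of means, sum of variances)."""
    mu = sum(fmean(s) for s in stages)
    sigma = math.sqrt(sum(pvariance(s) for s in stages))
    return NormalDist(mu, sigma).inv_cdf(q)


# Illustrative per-stage samples in milliseconds: mgd, mtd, mpd.
mgd = [1.8, 2.0, 2.1, 2.4]
mtd = [9.5, 10.0, 10.2, 14.0]
mpd = [0.9, 1.0, 1.1, 1.3]

ct = monte_carlo_ct([mgd, mtd, mpd])
print(round(fmean(ct), 2))   # close to the sum of the stage means, 14.075
print(nearest_rank(ct, 0.95), round(normal_ct([mgd, mtd, mpd], 0.95), 2))
```

Comparing the two 95th-percentile figures on real measurements is one simple way to judge whether the normal approximation is adequate for a given service.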

6. Information publishing and archiving

Figure 1.3. QoS performance information publishing and consuming. In each domain (Domain i, Domain j), clients, services and network sensors publish to an Information Repository (central or distributed); service brokers 1 to n (WMS) consume from it.

As explained in Section 4, different types of metrics are probed and published by various sensors in the infrastructure. The clients measure availability, accessibility for synchronous operations, message generation delay and response message processing delay. The services measure message processing delay, response message generation delay, accessibility for asynchronous operations, and completion time variation. The network probes measure message transport delay (approximated by the message one-way delay), response message transport delay (also approximated by the message one-way delay), IPDV and achievable throughput. Monitoring data is published by these producers, namely the clients with QoS requirements, the servers and the network sensors, as shown in Figure 1.3. The information is gathered in a repository, and the information consumers are the Grid middleware components that need to discover services matching the QoS requirements specified by client applications.

Monitoring data can be stored in a central or a distributed repository, using a push or a pull model, or a combination of the two. The most common approaches are a central repository to which sensors push monitoring data, and a publish/subscribe mechanism in which each client subscribes for notifications from the services it is interested in, with each such service pushing data to the client whenever new data are available.
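The two models can be contrasted in a few lines. The sketch below is a hypothetical in-memory illustration, assuming records of the form (subject, metric, value): in the first model sensors push into a central repository that consumers query later, while in the second a service keeps its own subscriber list and pushes each new record to the subscribed clients. A real deployment would rely on the Grid information service in use rather than on these classes.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Record = Tuple[str, str, float]   # (subject, metric, value)


class CentralRepository:
    """Push model: sensors push records; consumers query (pull) later."""

    def __init__(self) -> None:
        self._data: DefaultDict[Tuple[str, str], List[float]] = defaultdict(list)

    def push(self, subject: str, metric: str, value: float) -> None:
        self._data[(subject, metric)].append(value)

    def query(self, subject: str, metric: str) -> List[float]:
        return list(self._data.get((subject, metric), []))


class ServiceNotifier:
    """Publish/subscribe: a service pushes each new record to its subscribers."""

    def __init__(self, subject: str) -> None:
        self.subject = subject
        self._subscribers: List[Callable[[Record], None]] = []

    def subscribe(self, callback: Callable[[Record], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, metric: str, value: float) -> None:
        for callback in self._subscribers:
            callback((self.subject, metric, value))


repo = CentralRepository()
repo.push("se.example.org", "mpd", 0.8)
repo.push("se.example.org", "mpd", 1.1)
print(repo.query("se.example.org", "mpd"))   # [0.8, 1.1]

received: List[Record] = []
service = ServiceNotifier("se.example.org")
service.subscribe(received.append)
service.publish("rmgd", 0.5)
print(received)                               # [('se.example.org', 'rmgd', 0.5)]
```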

7. Prototype

A prototype is being implemented to demonstrate the feasibility of the described framework. At present, it supports strict guarantees, while support for loose guarantees is under development. It contains two components: the WMS and an Agreement Service exposing storage and instrument reservation templates. This AS is a client of a Storage Element (SE) exposing a Storage Resource Manager (SRM) 2.1 [J. Gu et al., 2004] interface, as well as of an Instrument Element (IE) using a novel, custom interface for instrument reservations. The web-service part of the AS was implemented with the open source gSOAP v.2.7 toolkit [van Engelen, 2003]. The resulting AS (both the client and the server side) is entirely based on the WS-Agreement XML schema definitions. The operations exposed by the AS are similar to the Agreement Factory operations defined in [OGF, 2007]; however, the prototype WSDL exposed by the AS does not fully comply with the standard, as it is not currently based on the WS-Resource Framework [OASIS, 2004].


8. Related work

Related work on QoS for Web services and Grid Computing has explored a number of different approaches.

In [R. Al-Ali et al., 2002], the authors propose a framework that allows QoS criteria to be specified as part of a service interface, thus enabling service selection based on the QoS profiles of candidate services. The framework defines three QoS layers with metrics similar to ours: the application layer (availability, accessibility, security, etc.), the middleware layer (resource requirements) and the network layer (networking QoS). All monitoring is performed by an Application QoS Manager (AQoSM) on every service execution invocation, which includes the creation of sandboxes, the transfer of data and the execution startup. Thus, client-related delays are not included in the aggregate values that are archived and used to extract the QoS profile of a service. Interestingly, the execution time of the job is not measured either, similarly to our framework, which does not measure operation execution time.

In [Ran, 2003], the author proposes a service discovery model that also takes QoS (non-functional) requirements as input, similarly to the approach described here. More specifically, the proposed framework includes roles for service providers, service consumers and QoS certifiers, as well as UDDI extensions to hold the relevant information. The basic idea is that when a service is registered in the UDDI registry and claims a specific QoS level for itself, the QoS certifier verifies these claims for correctness. The paper goes on to define the new structures required and concludes with a list of QoS attributes similar to those defined in our work.

In both cases, extensions to WSDL and UDDI are proposed. This differs from our work, where such extensions are deliberately avoided for reasons of compliance with current standards.

In [Y. Liu et al., 2004], the authors propose a dynamic QoS registry and computation model for Web service selection. As in our work, the measurements are a mixture of information sampled by either the service consumer or the service producer, depending on the metric being measured. In certain cases, the end user herself may provide the QoS feedback. The authors do not include availability or accessibility metrics, but rather cost, execution time, reputation (reliability), transaction support, compensation rate and penalty rate.


9. Conclusions

We presented a QoS framework for Computing Grids, with emphasis on Web services-based middleware implementations, which aims at supporting Grid sessions with various types of specific QoS requirements. The proposed framework relies on two complementary approaches. The first is designed to provide QoS guarantees on a deterministic basis and relies on a novel Grid service, the Agreement Service, for SLA signalling. The second, by contrast, is a lightweight approach delivering probabilistic QoS guarantees for those use cases that cannot afford the additional delay of deterministic QoS enforcement techniques. Loose QoS guarantees are based on archived performance information and statistical models. For this purpose, we propose an architecture for publishing monitoring data, which is integrated with a QoS-aware Grid service for workload management, namely the Workload Management System. Several QoS performance metrics for network and Grid service profile characterization are defined, starting from a number of application use cases for computing and instrument Grids, and a metric composition approach is presented for estimating some of the temporal QoS metrics of interest. Finally, preliminary implementation work is presented.
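As an illustration of how archived performance information might support a loose guarantee, the Python fragment below composes the empirical distributions of per-stage delays by discrete convolution. It then estimates the probability that their sum stays within a deadline. This generic textbook technique was chosen for the sketch and is not necessarily the metric composition approach of this paper. It assumes independent stages and delays recorded at a common granularity (e.g. whole seconds). The function names, the acceptance threshold and the sample values are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def empirical_pmf(samples):
    """Empirical probability mass function of archived delay samples."""
    n = len(samples)
    return {value: Fraction(c, n) for value, c in Counter(samples).items()}


def convolve(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q (discrete convolution)."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out


def prob_within(stage_pmfs, deadline):
    """P(sum of stage delays <= deadline), assuming independent stages."""
    total = {0: Fraction(1)}  # distribution of an empty sum
    for pmf in stage_pmfs:
        total = convolve(total, pmf)
    return sum(pr for value, pr in total.items() if value <= deadline)


def meets_loose_guarantee(stage_pmfs, deadline, target):
    """Hypothetical acceptance rule: meet the deadline with probability >= target."""
    return prob_within(stage_pmfs, deadline) >= target


# Illustrative (made-up) archived samples, in whole seconds.
queue_delay = empirical_pmf([2, 2, 3, 5])
transfer_time = empirical_pmf([1, 1, 2, 2])
print(prob_within([queue_delay, transfer_time], deadline=5))  # 3/4
print(meets_loose_guarantee([queue_delay, transfer_time], 5, Fraction(7, 10)))  # True
```

Under the independence assumption, exact convolution is cheap for short chains of operations. Correlated stages would instead call for a joint model or for archiving end-to-end measurements directly.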

Acknowledgements

This work has been partially funded by the GRIDCC (Grid-enabled Remote Instrumentation with Distributed Control and Computation) EU project under contract number 511382 and is presented on behalf of the GRIDCC consortium. It was also supported by the EU project EGEE sponsored by the European Union under contract number INFSO 508833, by the INFN project INFN-GRID, and by the Greek Secretariat for Research and Technology.

The authors would like to thank Peter Hobson and Paul Kyberd (Brunel University, UK), Panagiotis Louridas (Greek Research and Technology Network, Greece), Gaetano Maron and Francesco Lelli (National Institute of Nuclear Physics, Italy), Luke Dickens, Marko Krznaric and Janusz Martyniak (Imperial College London, UK) for their work and feedback on the paper.

References

B. Lowekamp et al. (2004). A Hierarchy of Network Performance Characteristics for Grid Applications and Services.

EGEE (2005a). Institution of SLAs and appropriate policies. DeliverableDSA2.2.

EGEE (2005b). Specification of interfaces for Bandwidth Reservation. Deliverable DJRA4.1.

Feller, William (1966). An Introduction to Probability Theory and its Applications, Vol. II. John Wiley and Sons.

GRIDCC (2005a). Definition of the QoS parameters for a real-time and interactive environment. Deliverable 2.2.

GRIDCC (2005b). GRIDCC Architecture. Deliverable 1.2.

J. Gu et al. (2004). The Storage Resource Manager Interface Specification.

OASIS (2004). Web Services Resource Framework (WSRF).

OGF (2007). Web Services Agreement Specification (WS-Agreement).

R. Al-Ali et al. (2002). G-QoSm: Grid Service Discovery Using QoS Properties. Computing and Informatics Journal, 21(4).

Ran, Shuping (2003). A model for web services discovery with QoS. SIGecom Exch., 4(1):1–10.

Tijms, Henk (2004). Understanding Probability: Chance Rules in Everyday Life. Cambridge University Press.

van Engelen, Robert (2003). Pushing the SOAP Envelope with web services for scientific computing. In 1st International Conference on Web Services.

Y. Liu et al. (2004). QoS computation and policing in dynamic web service selection. In WWW Alt. ’04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, pages 66–73, New York, NY, USA. ACM Press.