On automating Web services discovery

26
VLDB Journal manuscript No. (will be inserted by the editor) On Automating Web Services Discovery Boualem Benatallah 1 , Mohand-Said Hacid 2 , Alain Leger 3 , Christophe Rey 4 , Farouk Toumani 4 1 School of Computer Science and Engineering, University of New South Wales, Sydney, Australia ([email protected]) 2 LIRIS, Universit´ e Lyon I, France ([email protected]) 3 France Telecom R&D ([email protected]) 4 LIMOS, Universit´ e Blaise Pascal, France ({rey, ftoumani}@isima.fr) Received: December 15, 2002 / Revised version: date Abstract One of the challenging problems that web service technology faces is the ability to effectively discover services based on their capabilities. We present an approach to tackle this problem in the context of description logics (DL). We formalize service discovery as a new instance of the problem of rewriting concepts using terminologies. We call this new instance the best covering problem. We provide a formalization of best covering problem in the framework of DL-based ontologies and propose a hypergraph-based algorithm to effectively compute best covers of a given request. We propose a novel matchmaking algorithm that takes as input a service request (or query) Q and an ontology T of services, and find a set of services called a “best cover” of Q whose descriptions contain as much as possible of common information with Q and as less as possible of extra information with respect to Q. We have implemented the proposed discovery technique and used the developed prototype in the context of Multilingual Knowledge Based European Electronic Marketplace (MKBEEM) project. Keywords: Web services discovery, Semantic matchmaking, Descrip- tion logics, Hypergraphs 1 Introduction Semantic web services are emerging as a promising technology for the effec- tive automation of services discovery, combination, and management [31,21, 20]. They aim at leveraging two major trends in web technologies, namely web services and semantic web: Web services built upon XML as vehicle for exchanging messages across applications. The basic technological infrastructure for web services is

Transcript of On automating Web services discovery

VLDB Journal manuscript No.(will be inserted by the editor)

On Automating Web Services Discovery

Boualem Benatallah1, Mohand-Said Hacid2, Alain Leger3,Christophe Rey4, Farouk Toumani4

1 School of Computer Science and Engineering, University of New South Wales,Sydney, Australia ([email protected])

2 LIRIS, Universite Lyon I, France ([email protected])3 France Telecom R&D ([email protected])4 LIMOS, Universite Blaise Pascal, France ({rey, ftoumani}@isima.fr)

Received: December 15, 2002 / Revised version: date

Abstract One of the challenging problems that web service technologyfaces is the ability to effectively discover services based on their capabilities.We present an approach to tackle this problem in the context of descriptionlogics (DL). We formalize service discovery as a new instance of the problemof rewriting concepts using terminologies. We call this new instance thebest covering problem. We provide a formalization of best covering problemin the framework of DL-based ontologies and propose a hypergraph-basedalgorithm to effectively compute best covers of a given request. We proposea novel matchmaking algorithm that takes as input a service request (orquery) Q and an ontology T of services, and find a set of services called a“best cover” of Q whose descriptions contain as much as possible of commoninformation with Q and as less as possible of extra information with respectto Q. We have implemented the proposed discovery technique and usedthe developed prototype in the context of Multilingual Knowledge BasedEuropean Electronic Marketplace (MKBEEM) project.

Keywords: Web services discovery, Semantic matchmaking, Descrip-tion logics, Hypergraphs

1 Introduction

Semantic web services are emerging as a promising technology for the effec-tive automation of services discovery, combination, and management [31,21,20]. They aim at leveraging two major trends in web technologies, namelyweb services and semantic web:

– Web services built upon XML as vehicle for exchanging messages acrossapplications. The basic technological infrastructure for web services is

2 Boualem Benatallah et al.

structured around three major standards: SOAP (Simple Object Ac-cess Protocol), WSDL (Web Services Description Language), and UDDI(Universal Description, Discovery and Integration) [12,18]. These stan-dards provide the building blocks for service description, discovery, andcommunication. While web services technologies have clearly influencedpositively the potential of the web infrastructure by providing program-matic access to information and services, they are hindered by lack ofrich and machine-processable abstractions to describe service proper-ties, capabilities, and behavior. As a result of these limitations, verylittle automation support can be provided to facilitate effective discov-ery, combination, and management of services. Automation support isconsidered as the cornerstone to provide effective and efficient access toservices in large, heterogeneous, and dynamic environments [13,12,20].Indeed, until recently the basic web services infrastructure was mainlyused to build simple web services such as those providing informationsearch capabilities to an open audience (e.g. stock quotes, search enginequeries, auction monitoring).

– Semantic web aims at improving the technology to organize, search, inte-grate, and evolve web-accessible resources (e.g., web documents, data) byusing rich and machine-understandable abstractions for the representa-tion of resources semantics. Ontologies are proposed as means to addresssemantic heterogeneity among web-accessible information sources andservices. They are used to provide meta-data for the effective manipula-tion of available information including discovering information sourcesand reasoning about their capabilities. Efforts in this area include the de-velopment of ontology languages such as RDF, DAML, and DAML+OIL[16]. In the context of web services, ontologies promise to take interop-erability a step further by providing rich description and modeling ofservices properties, capabilities, and behavior.

By leveraging efforts in both web services and semantic web, semanticweb services paradigm promises to take web technologies a step further byproviding foundations to enable automated discovery, access, combination,and management of web services. Efforts in this area focus on providingrich and machine understandable representation of services properties, ca-pabilities, and behavior as well as reasoning mechanisms to support automa-tion activities [31,14,21,20,15,11]. Examples of such efforts include DAML-S[15], WSMF (Web Services Modelling Framework) [21], and METEOR-S(http://lsdis.cs.uga.edu/proj/meteor/SWP.htm). Work in this area is stillin its infancy. Many of the objectives of the semantic web services paradigm,such as capability description of service, dynamic service discovery, and goal-driven composition of web services remain to be reached.

Our work focuses on the issue of dynamic discovery of web services basedon their capabilities. Dynamic service discovery is usually based on the ra-tionale that services are selected, at run-time, based on their properties andcapabilities. Our aim is to ground the discovery process on a matchmakingbetween a requester query and available web service descriptions. We for-

On Automating Web Services Discovery 3

malize the service discovery approach in the context of Description Logics(DLs) [17]. A key aspect of description logics is their formal semantics andreasoning support. DLs provide an effective reasoning paradigm for definingand understanding the structure and semantics of concept description on-tologies. This is essential to provide formal foundations for the envisionedsemantic web paradigm [27,29,28]. Indeed, DLs have heavily influenced thedevelopment of some semantic web ontology languages (e.g., DAML-OILor OWL [2]). Our work aims at enhancing the potential of web services byfocusing on formal foundations and flexible aspects of their discovery. Morespecifically, we make the following contributions:

– Flexible Matchmaking between Service Descriptions and Re-quests. We propose a matchmaking technique that goes beyond simplesubsumption comparaisons between a service request and service adver-tisements. As emphasized in [33], a service discovery mechanism shouldsupport flexible matchmaking since it is unrealistic to expect servicerequests and service advertisments to exactly match. To cope with thisrequirement, we propose to use a difference operation on service descrip-tions. Such an operation enables to extract from a subset of web servicedescriptions the part that is semantically common with a given servicerequest and the part that is semantically different from the request.Knowing the former and the latter allows to effectively select relevantweb services. We propose a novel matchmaking algorithm that takes asinput a service request (or query) Q and an ontology T of services, andfinds a set of services called a “best cover” of Q whose descriptions con-tain as much as possible of common information with Q and as less aspossible of extra information with respect to Q.

– Concept Rewriting for Effective Service MatchmakingWe formalize service matchmaking as a new instance of the problemof rewriting concepts using terminologies [26,6]. We call this new in-stance the best covering problem. We provide a formalization of bestcovering problem in the context of DL-based ontologies and propose ahypergraph-based algorithm to effectively compute best covers of a givenrequest.

– Characterization of Service Discovery Automation in DAML-SService OntologiesWe investigate the reasoning problem associated with service discoveryin DAML-S ontologies and its relationship with the expressiveness of thelanguage used to express service descriptions. To study the feasibility ofour approach, we have implemented the proposed discovery techniqueand used the developed prototype in the context of Multilingual Knowl-edge Based European Electronic Marketplace (MKBEEM) project1.

1 http://www.mkbeem.com

4 Boualem Benatallah et al.

Organization of the Paper

The remainder of this paper is organized as follows: Section 2 provides anoverview of the basic concepts of description logics. Section 3 describes theformalization of service discovery in the context of DL-based ontologies. Sec-tion 4 presents the hypergraph-based algorithm for computing best covers.An extension of our approach to accommodate DAML-S ontologies is pre-sented in Section 5. Section 6 describes an implementation of the proposedservice discovery technique and discusses some preliminary experimental re-sults. We review related work in Section 7 and give concluding remarks inSection 8.

2 Description Logics: An Overview

Our approach uses Description Logics (DLs) [4] as a formal framework.DLs are a family of logics that were developed for modeling complex hier-archical structures and to provide a specialized reasoning engine to performinferences on these structures. The main reasoning mechanisms (e.g., sub-sumption or satisfiability) are decidable for main description logics ([17]).Recently, DLs have heavily influenced the development of the semanticweb languages. For example, DAML+OIL, the ontology language used byDAML-S, is in fact an alternative syntax for a very expressive DescriptionLogic [28].

In this section, we first give the basic definitions, then we describe thenotion of difference between descriptions which is the core operation that isbeing used in our framework.

2.1 Basic Definitions

Description logics allow to represent domain of interest in terms of con-cepts or descriptions (unary predicates) that characterize subsets of theobjects (individuals) in the domain, and roles (binary predicates) over sucha domain. Concepts are denoted by expressions formed by means of specialconstructs. Examples of DL constructs considered in this paper are:

– the symbol � is a concept description which denotes the top conceptwhile the symbol ⊥ stands for the bottom concept,

– concept conjunction (�), e.g., the concept description parent � maledenotes the set of fathers (i.e., male parents),

– the universal role quantification (∀R.C), e.g., the description ∀child.maledenotes the set of individuals whose children are all male,

– the number restriction constructs (≥ nR) and (≤ nR), e.g., the descrip-tion (≥ 1 child) denotes the set of parents (i.e., individuals having atleast one child), while the description (≤ 1 Leader) denotes the set ofindividuals that cannot have more than one leader.

On Automating Web Services Discovery 5

The various description logics differ from one to another based on theset of constructs they allow. Table 1 shows the constructs of two DLs: FL0

and ALN . A concept obtained using the constructs of a description logicL is called an L-concept. The semantics of a concept description is defined

Construct name Syntax Semantics FL0 ALNconcept name P PI ⊆ ∆I X Xtop � ∆I X Xbottom ⊥ ∅ Xconjunction C � D CI ∩ DI X Xprimitive negation ¬P ∆I \ PI X

universal quantifica-tion

∀R.C {x ∈ ∆I |∀y : (x, y) ∈ RI → y ∈CI}

X X

at least number re-striction

(≥ nR), n ∈ N {x ∈ ∆I |#{y|(x, y) ∈ RI} ≥ n} X

at most number re-striction

(≤ nR), n ∈ N {x ∈ ∆I |#{y|(x, y) ∈ RI} ≤ n} X

Table 1 Syntax and semantics of some concept-forming constructs.

in terms of an interpretation I = (∆I , ·I), which consists of a nonemptyset ∆I , the domain of the interpretation, and an interpretation function ·I ,which associates to each concept name P ∈ C a subset P I of ∆I and toeach role name R ∈ R a binary relation RI ⊆ ∆I × ∆I . Additionally, theextension of .I to arbitrary concept descriptions is defined inductively asshown in the third column of Table 1. Based on this semantics, subsumption,equivalence and the notion of a least common subsumer2 (lcs) are definedas follows. Let C, C1, . . . , Cn and D be concept descriptions:

– C is subsumed by D (noted C D) iff CI ⊆ DI for all interpretationI.

– C is equivalent to D (noted C ≡ D) iff CI = DI for all interpretationI.

– D is a least common subsumer of C1, . . . , Cn (noted D = lcs(C1, . . . , Cn))iff:

(1) Ci D for all 1 ≤ i ≤ n, and(2) D is the least concept description with this property, i.e., if D′ is a

concept description satisfying Ci D′ for all 1 ≤ i ≤ n, then D D′

[5].

The intentional descriptions contained in a knowledge base built using adescription logic is called terminology. The kind of terminologies we considerin this paper are defined below.

Definition 1 (terminology) Let A be a concept name and C be a conceptdescription. Then A

.= C is a concept definition. A terminology T is a finiteset of concept definitions such that each concept name occurs at most oncein the left-hand side of a definition.

2 Informally, a least common subsumer of a set of concepts corresponds to themost specific description which subsumes all the given concepts [5].

6 Boualem Benatallah et al.

A concept name A is called a defined concept in the terminology T iff itoccurs in the left-hand side of a concept definition in T . Otherwise A iscalled an atomic concept.

An interpretation I satisfies the statement A.= C iff AI = CI . An in-

terpretation I is a model for a terminology T if I satisfies all the statementsin T .

A terminology3 built using the constructs of a language L is called anL-terminology. In the sequel, we assume that a terminology T is acyclic, i.e.,there do not exist cyclic dependencies between concept definitions. Acyclicterminologies can be unfolded by replacing defined names by their defini-tions until no more defined names occur on the right-hand sides. Therefore,the notion of least common subsumer (lcs) of a set of descriptions can bestraightforwardly extended to concepts containing defined names. In thiscase, we use the expression lcsT (C, D) to denote the least common sub-sumer of the concepts C and D w.r.t. a terminology T (i.e., the lcs isapplied to the unfolded descriptions of C and D).

2.2 The Difference Operation

In this section, we recall the main results obtained by Teege in [36] regardingthe difference operation between two concept descriptions.

Definition 2 (difference operation) Let C, D be two concept descrip-tions with C D. The difference C −D of C and D is defined by C −D :=max

{B|B � D ≡ C}

The difference of two descriptions C and D is defined as being a descriptioncontaining all information which is a part of the description C but not a partof the description D. This definition of difference operation requires that thesecond operand subsumes the first one. However, in case the operands Cand D are incomparable w.r.t the subsumption relation, then the differenceC − D can be given by constructing the least common subsumer of C andD, that is, C − D := C − lcs(C, D).

It is worth noting that, in some description logics, the set C − D maycontain descriptions which are not semantically equivalent as illustrated bythe following example.

Example 1 Consider the descriptions C.= (∀R.P1) � (∀R.¬P1) and D

.=(∀R.P2) � (∀R.(≤ 4S)). The set C − D includes, among others, the non-equivalent descriptions (∀R.¬P2) and (∀R.(≥ 5S)).

Teege [36] provides sufficient conditions to characterize the logics where thedifference operation is always semantically unique and can be syntacticallyrealized by constructing the set difference of subterms in a conjunction.Some basic notions and important results of this work are introduced below.

3 In the sequel, we use the terms terminology and ontology interchangeably.

On Automating Web Services Discovery 7

Definition 3 (reduced clause form and structure equivalence) Let Lbe a description logic.

– A clause in L is a description A with the following property: (A ≡B � A′) ⇒ (B ≡ �) ∨ (B ≡ A). Every conjunction A1 � . . . � An ofclauses can be represented by the clause set {A1, . . . , An}.

– A = {A1, . . . , An} is called a reduced clause set if either n = 1, or noclause subsumes the conjunction of the other clauses: ∀1 ≤ i ≤ n : Ai �A \ Ai. The set A is then called a reduced clause form (RCF) ofevery description B ≡ A1 � . . . � An.

– Let A = {A1, . . . , An} and B = {B1, . . . , Bm} be reduced clause sets ina description logic L. A and B are structure equivalent (denoted byA ∼= B) iff: n = m ∧ ∀1 ≤ i ≤ n ∃1 ≤ j, k ≤ n : Ai ≡ Bj ∧ Bi ≡ Ak

– If in a description logic for every description all its RCFs are structureequivalent, we say that RCFs are structurally unique in that logic.

The structural difference operation is defined as being the set difference be-tween clause sets where clauses are compared on the basis of the equivalencerelationship.

Let us now introduce the notion of structural subsumption as defined in[36].

Definition 4 The subsumption relation in a description logic L is saidstructural iff for any clause A ∈ L and any description B = B1�. . .�Bm ∈L which is given by its RCF, the following holds: A � B ⇔ ∃1 ≤ i ≤ m :A � Bi

[36] provides two interesting results: 1) in description logics with struc-turally unique RCFs, the difference operation can be straightforwardly de-termined using the structural difference operation, and 2) structural sub-sumption is a sufficient condition for a description logic to have structurallyunique RCFs. Consequently, structural subsumption is a sufficient conditionthat allows to identify logics where the difference operation is semanticallyunique and can be implemented using the structural difference operation.However, it is worth noting that the definition of structural subsumptiongiven in [36] is different from the one usually used in the literature. Un-fortunately, a consequence of this remark is that many description logicsfor which a structural subsumption algorithm exists (e.g., ALN [32]) donot have structurally unique RCFs. Nevertheless, the result given in [36]is still interesting in practice since there exists several description logicswhich satisfy this property. Examples of such logics include the languageFL0∪(≥ n R), that we have used in the context of the MKBEEM project,or the more powerful description logic L1 [36], which contains the followingconstructs:

– �,�,�,⊥, (≥ nR), existential role quantification (∃R.C) and existentialfeature quantification (∃f.C) for concepts, where C denotes a concept,R a role and f a feature (i.e., a functional role),

8 Boualem Benatallah et al.

– bottom (⊥), composition (◦), differentiation (|) for roles,– bottom (⊥) and composition (◦) for features.

In the remainder of this paper we use the term structural subsumption inthe sense of [36].

Size of a description Let L be a description logic with structural sub-sumption. We define the size |C| of an L-concept description C as beingthe number of clauses in its RCFs4. If necessary, a more precise measureof a size of a description can be defined by taking into account the size ofeach clause (e.g., by counting the number of occurrences of concept androle names in each clause). However, in this case one must use some kind ofcanonical form to deal with the problem of different descriptions of equiva-lent clauses. It should be noted that, in a description logic with structurallyunique RCFs it is often possible to define a canonical form which is itselfan RCF [36].

3 Formalization of the Best Covering Problem

In this section, we first formalize the best covering problem in the frameworkof description logics with structural subsumption. Then we describe how tocompute best covers using a hypergraph-based algorithm.

3.1 Problem Statement

Let us first introduce some basic definitions that are required to formallydefine the best covering problem. Let L be a description logic with structuralsubsumption, T be an L-terminology, and Q ≡ ⊥ be a coherent L-conceptdescription. The set of defined concepts occurring in T is denoted as ST ={Si, i ∈ [1, n]} with Si ≡ ⊥, ∀i ∈ [1, n]. In the sequel, we assume thatconcept descriptions Si, i ∈ [1, n] are represented by their RCFs.

Definition 5 (cover) A cover of Q using T is a conjunction E of somenames Si from T such that: Q − lcsT (Q, E) ≡ Q.

Hence, a cover of a concept Q using T is defined as being any conjunctionof concepts occurring in T which shares some common information with Q.It is worth noting that a cover E of Q is always consistent with Q (i.e.,Q � E ≡⊥) since L is a description logic with structurally unique RCFs5

and Q ≡ ⊥ and Si ≡ ⊥, ∀i ∈ [1, n].

4 We recall that, since L have structurally unique RCFs, all the RCFs of anL-description are equivalent and thus have the same number of clauses.

5 If the language L contains the incoherent concept ⊥, then ⊥ must be a clause,i.e., non trivial decompositions of ⊥ is not possible (that means we cannot haveincoherent conjunction of coherent clauses), otherwise it is easy to show that Ldoes not have structurally unique RCFs.

On Automating Web Services Discovery 9

To define the notion of best cover, we need to characterize the part of thedescription of a cover E that is not contained in the description of the queryQ, and the part of the query Q that is not contained in the description ofits cover E.

Definition 6 (rest and miss) Let Q be an L-concept description and E acover of Q using T . The rest of Q with respect to E, denoted as RestE(Q),is defined as follows: RestE(Q) ≡ Q − lcsT (Q, E).The missing information of Q with respect to E, denoted as MissE(Q), isdefined as follows: MissE(Q) ≡ E − lcsT (Q, E).

Now we can define the notion of best cover.

Definition 7 (best cover) A concept description E is called a best coverof Q using a terminology T iff:

– E is a cover of Q using T , and– there doesn’t exist a cover E′ of Q using T such that

(|RestE′(Q)|, |MissE′(Q)|) < (|RestE(Q)|, |MissE(Q)|), where < is thelexicographic order operator.

A best cover is defined as being a cover that has, first, the smallest restand, second, the smallest miss.

The best covering problem, denoted BCOV(T , Q), is then the problemof computing all the best covers of Q using T .

Theorem 1 (Complexity of BCOV(T , Q)) The best covering problem isNP-hard.

The proof of this theorem follows from the results regarding the minimalrewriting problem [6] (see [25] for a detailed proof).

3.2 Mapping Best Covers to Hypergraph Transversals

Let us first recall some necessary definitions regarding hypergraphs.

Definition 8 (hypergraph and transversals) [19]A hypergraph H is a pair (Σ, Γ ) of a finite set Σ = {V1, . . . , Vn} and a setΓ of subsets of Σ. The elements of Σ are called vertices, and the elementsof Γ are called edges. A set T ⊆ Σ is a transversal of H if for each ε ∈ Γ ,T ∩ ε = ∅. A transversal T is minimal if no proper subset T ′ of T is atransversal. The set of the minimal transversals of an hypergraph H is notedTr(H).

In this section, we describe how to express the best covering problem asthe problem of finding the minimal transversals with a minimal cost of agiven hypergraph.

10 Boualem Benatallah et al.

Definition 9 (hypergraph generation) Let L be a description logic withstructural subsumption, T be an L-terminology, and Q be an L-conceptdescription. Given an instance BCOV(T , Q) of the best covering problem,we build a hypergraph HT Q = (Σ, Γ ) as follows:

– each concept name Si in T is associated with a vertex VSi in the hyper-graph HT Q. Thus Σ = {VSi , i ∈ [1, n]}.

– each clause Ai ∈ Q, for i ∈ [1, k], is associated with an edge in HT Q,noted wAi , with wAi = {VSi | Si ∈ ST and Ai ∈≡ lcsT (Q, Si)} where∈≡ stands for the membership test modulo equivalence of clauses andlcsT (Q, Si) is given by its RCF.

Notation For the sake of clarity we introduce the following notation. Forany set of vertices X = {VSi}, subset of Σ, we use the expression EX ≡�VSi

∈XSi to denote the concept obtained from the conjunction of conceptnames corresponding to the vertices in X . Inversely, for any concept E ≡�j∈[1,m]Sij , we use the expression XE = {VSij

, j ∈ [1, m]} to denote the setof vertices corresponding to the concept names in E.

Lemmas 1 and 2 given below, say that computing a cover of Q usingT that minimizes the rest amounts to computing a transversal of HT Q byconsidering only the non empty edges. Proofs of these lemmas are presentedin [25].

Lemma 1 (characterization of the minimal rest) Let L be a descrip-tion logic with structural subsumption, T be an L-terminology, and Q be anL-concept description. Let HT Q = (Σ, Γ ) be the hypergraph built from theterminology T and the concept Q = A1 � . . .�Ak provided by its RCF. Theminimal rest (i.e., the rest whose size is minimal) of rewriting Q using Tis: Restmin ≡ Aj1 � . . . � Ajl

, ∀ji ∈ [1, k] | wAji= ∅.

From the previous lemma, we know that the minimal rest of rewriting aquery Q using T is always unique and is equivalent to Restmin.

Lemma 2 (characterization of covers that minimize the rest) LetHT Q = (Σ, Γ ′) be the hypergraph built by removing the empty edges fromHT Q. A rewriting E ≡ Si1�. . .�Sim , with 1 ≤ m ≤ n and Sij ∈ ST for 1 ≤j ≤ m, is a cover of Q using T that minimizes the rest iff XE = {VSij

, j ∈[1, m]} is a transversal of HT Q.

This lemma characterizes the covers that minimize the rest. Conse-quently, computing the best covers will consist of determining, from thosecovers, the ones that minimize the miss. To express miss minimization inthe hypergraphs framework, we introduce the following notion of cost.

Definition 10 (cost of a set of vertices)

Let BCOV(T , Q) be an instance of the best covering problem and HT Q =(Σ, Γ ′) its associated hypergraph. The cost of the set of vertices X is definedas follows: cost(X) = |MissEX (Q)|.

On Automating Web Services Discovery 11

ToTravel ≡ (≥ 1 departurePlace) � ( ∀ departurePlace.Location) � (≥ 1arrivalPlace) � (∀ arrivalPlace.Location) � (≥ 1 arrivalDate)� (∀ arrivalDate.Date) � (≥ 1 arrivalTime) � (∀ arrival-Time.Time)

FromTravel ≡ (≥ 1 departurePlace) � (∀ departurePlace.Location) � (≥ 1arrivalPlace) � (∀ arrivalPlace.Location) � (≥ 1 departure-Date) � (∀ departureDate.Date) � (≥ 1 departureTime) �(∀ departureTime.Time)

Hotel ≡ Accommodation � (≥ 1 destinationPlace) � (∀ destina-tionPlace.Location) � (≥ 1 checkIn) � (∀ checkIn.Date) �(≥ 1 checkOut) � (∀ checkOut.Date) � (≥ 1 nbAdults)� (∀ nbAdults.Integer) � (≥ 1 nbChildren) � (∀ nbChil-dren.Integer)

Table 2 Example of a terminology.

Therefore, the BCOV(T , Q) problem can be reduced to the computationof the transversals with minimal cost of the hypergraph HT Q. It is clearthat, it is only interesting to consider minimal transversals. In a nutshell,the BCOV(T , Q) problem can be reduced to the computation of the minimaltransversals with minimal cost of the hypergraph HT Q. Therefore, we canreuse and adapt known techniques for computing minimal transversals (e.g.,see [10,30,19]) for solving the best covering problem.

3.3 Illustrating Example

To illustrate the best covering problem, let us consider a terminology Tcontaining the following concepts (web services):

– ToTravel: allows to search for trips given a departure place, an arrivalplace, an arrival date and an arrival time,

– FromTravel: allows to search for trips given a departure place, an arrivalplace, a departure date and a departure time,

– Hotel: allows to search for hotels given a destination place, the check-indate, the check-out date, the number of adults and the number of children.

The terminology T , depicted in Table 2, is described using the descrip-tion logic FL0∪{≥ n R}6.

Let us consider now, the following query description:Q ≡ (≥ 1 departurePlace) � (∀ departurePlace.Location) � (≥ 1 ar-

rivalPlace) � (∀ arrivalPlace.Location) � (≥ 1 departureDate)� (∀ departureDate.Date) � Accommodation � (≥ 1 destina-tionPlace) � (∀ destinationPlace.Location) � (≥ 1 checkIn) �(∀ checkIn.Date) � (≥ 1 checkOut) � (∀ checkOut.Date) �carRental

6 We note FL0∪(≥ nR) the description logic FL0 augmented with the construct(≥ n R).

12 Boualem Benatallah et al.

VToTravel VFromTravel

VHotel

W(> 1 departurePlace)

W departurePlace.Location

W(> 1 arrivalPlace)

WAccommodation

W arrival Place.Location

W(> 1 departurePlace)

W departurePlace.Location

W(> 1 destinationPlace)

W destinationPlace.Location

W(> 1 CheckIn)

W CheckIn.Date

W(> 1 CheckOut)

W CheckOut.Date

WCarRental

Fig. 1 Example of a hypergraph.

We assume that the concept names (e.g., Location, Date, Accommoda-tion), that appear in the description of the query Q and/or in the con-cept descriptions of T , are all atomic concepts. Hence, the query Q andthe concepts of T are all provided by their RCFs7. Therefore, the asso-ciated hypergraph HT Q = (Σ, Γ ) consists of the set of vertices Σ ={VToTravel, VFromTravel, VHotel} and the set of edges:

Γ = {w(≥1departureP lace), w(∀departureP lace.Location), w(≥1arrivalP lace),w(∀arrivalP lace.Location), w(≥1departureDate), w(∀departureDate.Date),wAccommodation, w(≥1destinationPlace) , w(∀destinationPlace.Location),w(≥1checkIn), w(∀checkIn.Date), w(≥1checkOut), w(∀checkOut.Date),wcarRental}

The hypergraph HT Q = (Σ, Γ ) is depicted in Figure 1. We can see thatno concept covers the clause corresponding to the edge wcarRental (as wehave wcarRental = ∅). Since this is the only empty edge in Γ , the best coversof Q using T will have exactly the following rest: Restmin ≡ carRental.Now, considering the hypergraph HT Q, the only minimal transversal is:X = {VFromTravel, VHotel}. So, EX ≡ Hotel � FromTravel is the bestcover of Q using the terminology T . The size of the missing information ofEX is obtained from the transversal X as shown below:cost(X) = |MissFromTravel�Hotel(Q)| = |(≥ 1 departureTime) � (∀ depar-tureTime.Time) � (≥ 1 nbAdults) � (∀ nbAdults.Integer) � (≥ 1 nbChil-dren) � (∀ nbChildren.Integer)| = 6.

In this example, we do not consider this cost because the hypergraphHT Q has only one minimal transversal.

7 Otherwise, we have to recursively unfold the concept (resp. query) descriptionby replacing by its definition each concept name appearing in the concept (resp.query) description.

On Automating Web Services Discovery 13

4 Computing Best Covers

In this section we present an algorithm, called computeBCov, for comput-ing the best covers of a concept Q using a terminology T . In the previoussection, we have shown that this problem can be reduced to the search ofthe transversals with minimal cost of the hypergraph HT Q. The problemof computing minimal transversals of an hypergraph is central in variousfields of computer science [19]. The precise complexity of this problem isstill an open issue. In [22], it is shown that the generation of the transver-sal hypergraph can be done in incremental subexponential time kO(logk),where k is the combined size of the input and the output. To the best ofour knowledge, this is the best time bound for the problem of computingminimal transversals of a hypergraph.

A classical algorithm for computing the minimal transversals of an hy-pergraph is presented in [10,30,19]. The algorithm is incremental and worksin n steps where n is the number of edges of the hypergraph. Starting froman empty set of transversals, the basic idea is to explore each edge of the hy-pergraph, one edge in each step, and to generate a set of candidate transver-sals by computing all the possible unions of the candidates generated in theprevious step and each vertex in the considered edge. At each step, thenon-minimal candidate transversals are pruned.

So, a naive approach to compute the minimal transversals with a mini-mal cost consists in computing all the minimal transversals, using such analgorithm, and then choosing those transversals which have the minimalcost. The computeBCov algorithm presented below makes the following im-provements with respect to the naive approach:

1. it reduces the number of candidates in the intermediary steps of thealgorithm by generating only the minimal transversals, and

2. it uses a combinatorial optimization technique (Branch-and-Bound) inorder to prune, at the intermediary steps of the algorithm, those can-didate transversals which will not generate transversals with a minimalcost.

These two optimizations8 have been implemented as separate optionsof the algorithm, namely option Pers for the first optimization and optionBnB for the second one. The first optimization allows to generate only goodcandidates (only minimal transversals) at each iteration (line 5 of the algo-rithm). To do so, we use a necessary and sufficient condition (provided byTheorem 2 below) to describe a pair (Xi, sj) that will generate a non mini-mal transversal at iteration i, where Xi is a minimal transversal generatedat iteration i − 1 and sj is a vertex of the ith edge.

Theorem 2Let Tr(H) = {Xi, i = 1..m} be the set of minimal transversals for the

hypergraph H, and E = {sj, j = 1..n} be an edge of H. Assume H′ = H∪E.

8 Other less significant optimization options have also been implemented.

14 Boualem Benatallah et al.

Algorithm 1 computeBCov (skeleton)

Require: An instance BCOV(T , Q) of the best covering problem.Ensure: The set of the best covers of Q using T .1: Build the associated hypergraph HT Q = (Σ, Γ ′).2: Tr← ∅ – Initialization of the minimal transversal set.3:

CostEval←∑e∈Γ ′

minVSi

∈e(|MissSi(Q)|). – Initialization of CostEval

4: for all edge E ∈ Γ ′ do5: Tr ← the new generated set of the minimal transversals.6: Remove from Tr the transversals whose costs are greater than CostEval.7: Compute a more precise evaluation of CostEval.8: end for9: for all X ∈ Tr such that |MissEX (Q)| = CostEval do

10: return the concept EX as a best cover of Q using T .11: end for

Then, we have : Xi∪{sj} is a non-minimal transversal of H′ ⇔ there existsa minimal transversal Xk of H such that Xk∩E = {sj} and Xk \{sj} ⊂ Xi

Details and proofs of Theorem 2 are given in [35].The second improvement consists in a Branch-and-Bound like enumera-

tion of transversals. First, a simple heuristic is used to efficiently compute acost of a good transversal (i.e., a transversal expected to have a small cost).This can be carried out by adding, for each edge of the hypergraph, thecost of the vertex that has the minimal cost. The resulting cost is stored inthe variable CostEval (line 3 of the algorithm). Recall that for any set ofvertices X = {Si, . . . , Sn}:

cost(X) = |PmissSi�...�Sn(Q)| ≤∑

j∈[i,n]

|PmissSj(Q)| =∑

Sj∈X

cost(Sj)

The evaluation of CostEval is an upper bound of the cost of an existingtransversal. As we consider candidates in intermediate steps of the algo-rithm, we can eliminate from Tr(HT Q) any candidate transversal that hasa cost greater than CostEval, since that candidate could not possibly leadto a transversal that is better than what we already have (line 6). From eachcandidate transversal remaining in Tr(HT Q), we compute a new evaluationfor CostEval by considering only remaining edges (line 7).

At the end of the algorithm, each computed minimal transversal X ∈Tr is transformed into a concept EX which constitutes an element of thesolution to the BCOV(T , Q) problem (line 10).

On Automating Web Services Discovery 15

5 Semantic Reasoning for Web Service Discovery

In this section, we describe how the proposed reasoning mechanism can beused to automate the discovery of web services in the context of DAML-Sontologies9. More details on these aspects can be found in [8].

DAML-S [15] is an ontology for describing web services. It employs theDAML+OIL ontology language [16] to describe the properties and capabil-ities of web services in a computer-interpretable form, thereby facilitatingthe automation of web service discovery, invocation, composition and ex-ecution. DAML-S supplies a core set of markup language constructs fordescribing web services in terms of classes (concepts) and complex relation-ships between them.

A DAML-S ontology of services is structured in three main parts [15]:

– ServiceProfile describes the capabilities and parameters of the service.It is used for advertising and discovering services.

– ServiceModel gives a detailed description of a service’s operation. Serviceoperation is described in terms of a process model, which presents boththe control structure and data flow structure of the service required toexecute a service.

– ServiceGrounding specifies the details of how to access the service, viamessages (e.g., communication protocol, message formats, addressing,etc).

The service profile provides information about a service that can be usedby an agent to determine if the service meets its needs. It consists of threetypes of information: a (human readable) description of the service; thefunctional behavior of the service which is represented as a transformationfrom the service inputs to the service outputs; and several non functionalattributes which specify additional information about a service (e.g., thecost of the service).

In the DAML-S approach, a service profile is intended to be used byproviders to advertise their services as well as by service requesters to specifytheir needs.

5.1 Best Covering Profile Descriptions

We describe hereafter how the proposed algorithm can be adapted to sup-port dynamic discovery of DAML-S services. It is worth noting that we donot deal with the full expressiveness of the DAML-OIL language. We con-sider only DAML-S ontologies expressed using a subset of the DAML-OILlanguage for which a structural subsumption algorithm exists. In the sequel,such kind of ontologies are called restricted DAML-S ontologies.

As proposed in [33], a match between a query (expressed by means ofa service profile) and an advertised service is determined by comparing all

9 http://www.daml.org/services/

16 Boualem Benatallah et al.

the outputs of the query with the outputs of the advertisement and all theinputs of the advertisement with the inputs of the query. We adopt thesame idea for comparing requests with advertised services, but we proposeto use computeBCov instead of the matchmaking algorithm given in [33].Intuitively, we target a service discovery mechanism that works as follows:given a service request Q and a DAML-S ontology T , we want to computethe best combination of web services that satisfies as much as possible theoutputs of the request Q and that requires as little as possible of inputs thatare not provided in the description of Q. We call such a combination of webservices a best profile cover of Q using T . To achieve this task, we need toextend the best covering techniques, as presented in Section 3, to take intoaccount profile descriptions as presented below.

Let T = {Si, i ∈ [1, n]} be a restricted DAML-S ontology and E ≡Sl� . . .�Sp, with l, p ∈ [1, n], be a conjunction of some services occurring inT . We denote by I(E) (respectively, O(E)) the concept determined usingthe conjunction of all the inputs (respectively, the outputs) occurring in theprofile section of all the services Si, for all i ∈ [l, p]. In the same way, weuse I(Q) (respectively, O(Q)) to denote the concept determined using theconjunction of all the inputs (respectively, the outputs) occurring in theprofile section of a given query Q.

We extend the notions of cover, rest and miss to service profiles as fol-lows.

Definition 11 Profile cover (Pcover) A Profile cover, called Pcover, ofQ using T is a conjunction E of some services Si from T such that: O(Q)−lcsT (O(Q), O(E)) ≡ O(Q).

Hence, a Pcover of a query Q using T is defined as being any conjunctionof web services occurring in T which shares some outputs with Q.

Definition 12 Profile rest (Prest) and Profile miss (Pmiss) Let Qbe a service request and E a cover of Q using T . The Profile rest of Q withrespect to E, denoted as PrestE(Q), is defined as follows: PrestE(Q) .=O(Q) − lcsT (O(Q), O(E)).The Profile missing information of Q with respect to E, denoted as PmissE(Q),is defined as follows: PmissE(Q) .= I(E) − lcsT (I(Q), I(E)).

Finally, the notion of best profile cover can be extended to profiles by re-spectively replacing rest and miss by Prest and Pmiss in the definition 7 [9].Consequently, the algorithm computeBCov, presented in the previous sec-tion, can be adapted and used as a matchmaking algorithm for discoveringDAML-S services based on their capabilities. We devised a new algorithm,called computeBProfileCov, for this purpose. According to the definitions 11and 12, the algorithm selects the combinations of services that best match agiven query and effectively computes the outputs of the query that cannotbe satisfied by the available services (i.e., Prest) as well as the inputs thatare required by the selected services and are not provided in the query (i.e.,Pmiss).

On Automating Web Services Discovery 17

Service Inputs OutputsToTravel Itinerary, Arrival TripReservation

FromTravel Itinerary, Departure TripReservationHotel Destination, StayDuration HotelReservation

Table 3 Input and Output service parameters.

5.2 Illustrating Example

This example illustrates how the notion of best profile cover can be usedto match a service request with service advertisements. Let us consider anontology of web services which contains the following three services:

– ToTravel allows to reserve a trip given an itinerary (i.e., the departurepoint and the arrival point) and the arrival time and date.

– FromTravel allows to reserve a trip given an itinerary and the departuretime and date.

– Hotel allows to reserve a hotel given a destination place, a period oftime expressed in terms of the check-in date and the check-out date.

Table 3 shows the input and the output concepts of the three web ser-vices. We assume that, the service profiles refer to concepts that are definedin the restricted DAML-OIL tourism ontology given in Table 4. For clarityreasons, we use the usual DL syntax instead of the DAML-OIL syntax todescribe the ontology. In Table 4, the description of the concept Itinerarydenotes the class of individuals whose departure places (respectively arrivalplaces) are instances of the concept Location. Moreover, the individual whichbelongs to this class must have at least one departure place (the constraint(≥ 1 departurePlace)) and at least one arrival place (the constraint (≥ 1arrivalPlace)). The input of the service ToTravel is obtained using the con-junction of all its inputs as follows: I(ToTravel) ≡ Itinerary� Arrival.By replacing the concepts Itinerary and Arrival by their descriptions, weobtain the following equivalent description:

I(ToTravel) ≡ (≥ 1 departurePlace) � ( ∀ departurePlace.Location) �(≥ 1 arrivalPlace) � (∀ arrivalPlace.Location) � (≥ 1 ar-rivalDate) � (∀ arrivalDate.Date) � (≥ 1 arrivalTime) �(∀ arrivalTime.Time)

The inputs and the outputs of the other web services can be computedin the same way.

Now, let us consider a service request Q that searches for a vacationpackage that combines a trip with a hotel and a car rental, given a departureplace, an arrival place, a departure date, a (hotel) destination place and thecheck-in and check-out dates. The inputs and outputs of the query Q can beexpressed by the following descriptions which, again, refer to some conceptsof the tourism ontology given in Table 4:

18 Boualem Benatallah et al.

Itinerary ≡ (≥ 1 departurePlace) � ( ∀ departurePlace.Location) �(≥ 1 arrivalPlace) � (∀ arrivalPlace.Location)

Arrival ≡ (≥ 1 arrivalDate) � (∀ arrivalDate.Date) � (≥ 1 arrival-Time) � (∀ arrivalTime.Time)

Departure ≡ (≥ 1 departureDate) � (∀ departureDate.Date) � (≥ 1departureTime) � (∀ departureTime.Time)

Destination ≡ (≥ 1 destinationPlace) � (∀ destinationPlace.Location)StayDuration ≡ (≥ 1 checkIn) � (∀ checkIn.Date) � (≥ 1 checkOut) �

(∀ checkOut.Date)TripReservation ≡ . . .

HotelReservation ≡ . . .CarRental ≡ . . .

Table 4 Example of a tourism ontology.

I(Q) ≡ (≥ 1 departurePlace) � (∀ departurePlace.Location)� (≥ 1 arrivalPlace) � (∀ arrivalPlace.Location)� (≥ 1 departureDate) � (∀ departureDate.Date)� (≥ 1 destinationPlace) � (∀ destination-Place.Location) � (≥ 1 checkIn) � (∀ checkIn.Date)� (≥ 1 checkOut) � (∀ checkOut.Date)

O(Q) ≡ TripReservation � HotelReservation � CarRental

The matching between the service request Q and the three advertisedservices given above can be achieved by computing the best profile cover ofQ using these services. The result is the following:

Best Profile cover Prest PmissFromTravel, Hotel CarRental departureTime

In this example, there is only one best profile cover of Q corresponding tothe description E ≡ Hotel�FromTravel. The selected services generate theconcepts TripReservation and HotelReservation which are part of the out-put required by the query Q. From the service descriptions we can see thatno web service supplies the concept CarRental. Hence, the best profile coversof Q will have exactly the following profile rest: PrestE(Q) ≡ carRental.This rest corresponds to the output of the query that cannot be generatedby any advertised service. Moreover, the Pmiss column shows the informa-tion (the role departureTime) required as input of the selected services butnot provided in the query inputs. More precisely, the best profile covers ofQ will have exactly the following profile missing information: PmissE(Q)≡ (≥ 1 departureTime) � (∀ departureTime.Time). It is worth noting that,although the solution E′ ≡ Hotel � ToTravel generates the same outputs(i.e., the concepts TripReservation and HotelReservation), it will not beselected because its Pmiss is greater than the one of the first solution (itcontains the roles arrivalTime and arrivalDate).

On Automating Web Services Discovery 19

6 Evaluation and Experiments

In this section, we describe a testbed prototype implementation of the com-puteBCov algorithm. This prototype implementation has been motivated bythree goals: (i) to validate the feasibility of the approach, (ii) to test the cor-rectness of the computeBCov algorithm, and (iii) to study the performanceand scalability of computeBCov.

The first two goals were achieved in the context of an european pro-jet, namely the MKBEEM 10 project. To achieve the third goal, we haveintegrated into the prototype a tool, based on the IBM XML Genera-tor (http://www.alphaworks.ibm.com/tech/xmlgenerator), that enables togenerate random XML-based services ontologies and associated service re-quests.

6.1 Application Scenario

We used our prototype in the context of the MKBEEM project which aimsat providing electronic marketplaces with intelligent, knowledge-based mul-tilingual services. The main objective of MKBEEM is to create an intelligentknowledge based multilingual mediation service which features the followingbuilding blocks [3]:

– Natural language interfaces for both the end user and the system’s con-tent providers/service providers.

– Automatic multilingual cataloguing of products by service providers.– On-line e-commerce contractual negotiation mechanisms in the language

of the user, which guarantee safety and freedom.

In this project, ontologies are used to provide a consensual representationof the electronic commerce field in two typical domains (Tourism and Mailorder). In MKBEEM, ontologies are structured in three layers, as shown inFigure 2. The global ontology describes the common terms used in the wholeMKBEEM platform. This ontology represents the general knowledge in dif-ferent domains while each domain ontology contains specific concepts corre-sponding to vertical domains (e.g., tourism). The service ontology describesclasses of services, e.g., service capabilities, non-functionnal attributes, etc.The sources descriptions specify concrete services (i.e;, provider offers) interms of service ontology. The MKBEEM mediation system allows to fillthe gap between customer queries (possibly expressed in a natural lan-guage) and diverse concrete providers offers. In a typical scenario, usersexpress their requests in a natural language, the requests are then trans-lated into ontological formulas expressed using domain ontologies. Then,

10 MKBEEM stands for Multilingual Knowledge Based EuropeanElectronic Marketplace (IST-1999-10589, 1st Feb. 2000 - 1st Dec.2002).http://www.mkbeem.com

20 Boualem Benatallah et al.

MKBEEM Global Ontology

Tourism Mail order

Domain ontology

SNCF B&B Ellos...

Sources descriptions

services Ontology service level

Global anddomainOntologies

Source level

Fig. 2 Knowledge representation in the MKBEEM system.

the MKBEEM system rely on the proposed reasoning mechanism to refor-mulate users queries against the domain ontology in terms of web services.The aim here is to allow the users/applications to automatically discoverthe available web services that best meet their needs, to examine servicecapabilities and to possibly complete missing information.

In our implementation we used ontologies with approximately 300 con-cepts and 50 web services to validate the applicability of the proposed ap-proach. Indeed, this implementation has shown the effectiveness of the pro-posed matchmaking mechanism in two distinct end-user scenarios, namely:(i) business to consumer on-line sales, and (ii) web based travel/tourismservices.

6.2 Quantitative Experiments

To conduct experiments, we have implemented up to 6 versions of the com-puteBCov algorithm corresponding to different combinations of optimiza-tion options. The prototype is implemented using the Java programing lan-guage. All experiments have been performed using a PC with a Pentium III500 MHz and 384 Mo of RAM.

To quantitatively test computeBCov, we first have run computeBCov onworst cases and then on a set of ontologies and queries randomly gener-ated by our prototype. computeBCov worst cases were built according to atheoretical study of the complexity of all versions of computeBCov : two on-tologies (and their associated queries) have been built in order to maximizethe number of minimal transversals of the corresponding hypergraph as wellas the number of elementary operations of the algorithm (i.e., inclusion testsand intersection operations). In each case, there exists at least one versionof computeBCov that completes the execution in less than 20 seconds. It

On Automating Web Services Discovery 21

should be noted that although these cases are bad for computeBCov, theyare also totally unrealistic with respect to practical situations.

We generated larger but still realistic random ontologies. We focus hereon three case studies with varying size of the application domain ontology,of the web service ontology and of the query. Their characteristics are givenbelow:

Configurations Case 1 Case 2 Case 3Number of defined concepts inthe application domain ontology

365 1334 3405

Number of web services 366 660 570Number of (atomic) clauses inthe query

6 33 12

It is worth noting that the internal structures of these ontologies correspondto bad cases for the computeBCov algorithm. We have run the 6 versionsof the computeBCov algorithm based on these cases. The overall executiontime results are given in Figure 311. This Figure shows that for cases 1 and

4 146

4050

16 914 763

16 930 605

100 995

100 554

191

14 488 924

332 268

10 367 728

180

31 966

15 332

1 913

31 505

1

10

100

1000

10000

100000

1000000

10000000

100000000

Case 1 Case 2 Case 3

milliseconds

1 32 4 5 6 7 8> 43 200 000 (> 12 hours)

> 43 200 000 (> 12 hours)

1 : BnB: false, Pers: false, BnB1

2 : BnB: false, Pers: false, BnB2

3 : BnB: false, Pers: true, BnB1

4 : BnB: false, Pers: true, BnB2

5 : BnB: true, Pers: false, BnB1

6 : BnB: true, Pers: false, BnB2

7 : BnB: true, Pers: true, BnB1

8 : BnB: true, Pers: true, BnB2

(1s)

(10s)

(1m n40s)

(16m n40s)

(2h46m n40s)

(27h46m n40s)( 4h43m n)

( 1m n40s)

( 4h)

( 2h53m n)

( 5m n30s)

5 6 7 8 5 6 7 8

Fig. 3 Execution time.

3 (respectively, case 2), there is at least a version of the algorithm that runsin less than two seconds (respectively, in less than 30 seconds). Although

11 Note that versions 1 and 2 of the algorithm (respectively, 3 and 4) are similaras both run computeBCov without BnB, and what distinguishes 1 from 2 (respec-tively, 3 from 4) is the way the option BnB is implemented (BnB1 or BnB2).

22 Boualem Benatallah et al.

Figure 3 shows that there is a great difference in performance of the differentversions of the algorithm, in each case, there is at least one effecient versionof the algorithm even when the domain ontology is quite large. Details aboutthe implementation of computeBCov, the theoretical study of worst cases,the parameterized ontology generation process and experimental results canbe found in [35].

7 Related Work

In this section, we review related work in the area of semantic web servicesdiscovery, then we examine the relationship of our work with the problemof query (concept) rewriting.

7.1 Semantic Web Services Discovery

Current web services infrastructure have serious limitations with respect tomeeting the automation challenges. For example, UDDI provides limitedsearch facilities allowing only a keyword based search of businesses, servicesand the so-called tModels based on names and identifiers. To cope with thislimitation, emerging approaches rely on semantic web technology to supportservice discovery [24,33]. For example, [11] proposes to use process ontolo-gies to describe the behaviour of services and then to query such ontologiesusing a Process Query Language (PQL). [14] defines an ontology basedon DAML [1] to describe mobile devices and proposes a matching mecha-nism that locates devices based on their features (e.g., a type of a printer).The matching mechanism exploits rules that use the ontology, service pro-file information and the query to perform matching based on relationshipsbetween attributes and their values. A Prolog based reasoning engine isused to support such a matching. There are other approaches based on aDAML-OIL [29] description of services that propose to exploit the descrip-tion logic based reasoning mechanisms. [24] reports on an experience inbuilding matchmaking prototype based on description logic reasoner whichconsiders DAML+OIL based service descriptions. The proposed matchmak-ing algorithm is based on simple subsumption and consistency tests. [33]proposes a more sophisticated matchmaking algorithm between services andrequests described in DAML-S12. The algorithm considers various degreesof matching that are determined by the minimal distance between conceptsin the concept taxonomy. Based on a similar approach, the ATLAS match-maker [34] considers DAML-S ontologies and utilizes two separate sets offilters: 1) Matching functional attributes to determine the applicability ofadvertisements (i.e., do they deliver sufficient quality of service, etc). Thematching is achieved by performing conjunctive pair-wise comparison for thefunctional attributes; 2) Matching service functionality to determine if the

12 http://www.daml.org/services/

On Automating Web Services Discovery 23

advertised service matches the requested service. A DAML-based subsump-tion inference engine is used to compare input and output sets of requestsand advertisements.

Finally, it should be noted that the problem of capabilities based match-ing has also been addressed by several other research communities, e.g.,information retrieval, software reuse systems and multi-agent communities.More details about these approaches and their applicability in the con-text of the semantic web services area can be found in [11,33]. Our workmakes complementary contributions to the efforts mentioned above. Moreprecisely, our approach builds upon existing service description languagesand provides the following building blocks for flexible and effective servicediscovery:

– A global reasoning mechanism: we propose a matchmaking algorithmthat goes beyond a pair-wise comparison between a service request andservice offers by allowing to discover combinations of services that match(cover) a given request.

– A flexible matchmaking process that goes beyond subsumption tests– Effective computation of the missed information: the difference between

the query and its rewriting (i.e., rest and miss) is effectively computedand can be used, for example, to improve service repository interactions.

7.2 Query (concept) Rewriting

From the technical point of view, the best covering problem belongs to thegeneral framework for rewriting using terminologies provided in [6]. Thisframework is defined as follows:

– Given a terminology T (i.e., a set of concept descriptions), a conceptdescription Q that does not contain concept names defined in T and abinary relation ρ between concept descriptions, can Q be rewritten intoa description E, built using (some) of the names defined in T , such thatQρE ?

– Additionally, some optimality criterion is defined in order to select therelevant rewritings.

Already investigated instances of this problem are the minimal rewritingproblem [6] and rewriting queries using views [7,23,26].

Minimal rewriting is concerned with the problem of rewriting a conceptdescription Q into a shorter but equivalent description (hence, ρ is equiv-alence modulo T and the size of the rewriting is used as the optimalitycriterion). Here, the focus is on determining a rewriting that is shorter andmore readable than the original description.

The problem of rewriting queries using views has been intensively inves-tigated in the database area (see [26] for a survey). The purpose here is torewrite a query Q into a query expression that uses only a set of views V .Two main kind of rewritings have been studied:

24 Boualem Benatallah et al.

– Maximally-contained rewritings where ρ is the subsumption and the op-timality criterion is the inverse subsumption. This kind of rewriting playsan important role in many applications such as information integrationand data warehousing.

– Equivalent rewriting where ρ is the equivalence and the optimality cri-terion is minimization of the cost of the corresponding query plan. Thiskind of rewriting has been mainly used for query optimization purposes.

The best covering problem can be viewed as a new instance of the prob-lem of rewriting concepts using terminologies where:

– ρ correponds to the notion of cover (hence, it is neither equivalence norsubsumption), and

– the optimality criterion is the minimization of the rest and the miss.

8 Conclusion

In this paper we have presented a novel approach to automate the discoveryof web services. We formalized service discovery as a rewriting process, andthen investigated this problem in the context of restricted framework ofdescription logics with structural subsumption. These logics ensure thatthe difference operation is always semantically unique and can be computedusing a structural difference operation. In this context, we have shown thatthe best covering problem can be mapped to the problem of computing theminimal transversals with minimum cost of a “weighted” hypergraph.

The framework of languages with semantically unique difference appearsto be sufficient in the context of the MKBEEM project. But the languagesthat are proposed to achieve the semantic web vision appear to be more ex-pressive. Our future work will be devoted to the extension of the proposedframework to hold the definition of the best covering problem for descrip-tion logics where the difference operation is not semantically unique. In thiscase, the difference operation does not yield a unique result and thus theproposed definition of a best cover is no longer valid. We also plan to : (i)consider service discovery when there is a large number of heterogeneous on-tologies, (ii) extend the proposed technique to support service compositionautomation.

References

1. http://www.daml.org/services/.

2. http://www.w3.org/2001/sw/webont/.

3. MKBEEM web site : http://www.mkbeem.com.

4. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and editors P. Patel-Schneider. The Description Logic Handbook. Theory, Implementation andApplications. Cambridge University Press, 2003.

On Automating Web Services Discovery 25

5. F. Baader, R. Kusters, and R. Molitor. Computing Least Common Subsumerin Description Logics with Existential Restrictions. In T. Dean, editor, Proc.of the 16th Int. Joint Conf. on AI, pages 96–101. M.K, 1999.

6. Franz Baader, Ralf Kusters, and Ralf Molitor. Rewriting Concepts UsingTerminologies. In Proc. of the Int. Conf.KRColorado, USA, pages 297–308,Apr. 2000.

7. C. Beeri, A.Y. Levy, and M-C. Rousset. Rewriting Queries Using Views inDescription Logics. In L. Yuan, editor, Proc. of the ACM PODS , New York,USA, pages 99–108, Apr. 1997.

8. B. Benatallah, M-S. Hacid, C. Rey, and F. Toumani. Request Rewriting-BasedWeb Service Discovery. In Dieter Fensel, Katia Sycara, and John Mylopoulos,editors, International Semantic Web Conference (ISWC 2003), Sanibel Island,FL, USA, volume 2870 of LNCS, pages 242–257. Springer, Oct. 2003.

9. B. Benatallah, M-S. Hacid, C. Rey, and F. Toumani. Semantic Reasoning forWeb Services Discovery. In WWW Workshop on E-Services and the SemanticWeb, Budapest, Hungary, May 2003.

10. C. Berge. Hypergraphs, volume 45 of North Holland Mathematical Library.Elsevier Science Publishers B.V. (North-Holland), 1989.

11. A. Bernstein and M. Klein. Discovering Services: Towards High PrecisionService Retrieval. In CaiSE workshop on Web Services, e-Business, and theSemantic Web: Foundations, Models, Architecture, Engineering and Applica-tions. Toronto, Canada, May 2002.

12. F. Casati, M-C. Shan, and D. Georgakopoulos (editors). The VLDB Journal:Special Issue on E-Services. 10(1), Springer-Verlag Berlin Heidelberg, 2001.

13. Fabio Casati and Ming-Chien Shan. Dynamic and adaptive composition ofe-services. Information Systems, 26(3):143–163, May 2001.

14. D. Chakraborty, F. Perich, S. Avancha, and A. Joshi. DReggie: SemanticService Discovery for M-Commerce Applications. In Workshop on Reliableand Secure Applications in Mobile Environment, 20th Symposium on ReliableDistributed Systems, pages 28–31, Oct. 2001.

15. The DAML Services Coalition. DAML-S: Web Service Description for theSemantic Web. In The First International Semantic Web Conference (ISWC),pages 348–363, Jun. 2002.

16. Y. Ding, D. Fensel, and B. Omelayenko M.C.A. Klein. The semantic web: yetanother hip? DKE, 6(2-3):205–227, 2002.

17. F. Donini and A. Schaerf M. Lenzerini, D. Nardi. Reasoning in descriptionlogics. In Gerhard Brewka, editor, Foundation of Knowledge Representation,pages 191–236. CSLI-Publications, 1996.

18. G. Weikum (editor). Data Engineering Bulletin: Special Issue on Infrastruc-ture for Advanced E-Services. 24(1), IEEE Computer Society, 2001.

19. T. Eiter and G. Gottlob. Identifying the minimal transversals of a hypergraphand related problems. SIAM Journal on Computing, 24(6):1278–1304, 1995.

20. D. Fensel, C. Bussler, Y. Ding, and B. Omelayenko. The Web Service Mod-eling Framework WSMF. Electronic Commerce Research and Applications,1(2), 2002.

21. D. Fensel, C. Bussler, and A. Maedche. Semantic Web Enabled Web Services.In International Semantic Web Conference, Sardinia, Italy, pages 1–2, Jun.2002.

22. M.L. Freidman and L. Khachiyan. On the complexity of dualization of mono-tone disjunctive normal forms. Journal of Algorithms, 21:618–628, 1996.

26 Boualem Benatallah et al.

23. F. Goasdoue and M-C Rousset V. Lattes. The Use of CARIN Languageand Algorithms for Information Integration: The PICSEL System. IJICIS,9(4):383–401, 2000.

24. J. Gonzalez-Castillo, D. Trastour, and C. Bartolini. Description Logics forMatchmaking of Services. In KI-2001 Workshop on Applications of De-scription Logics Vienna, Austria, Sep. 2001. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-44/.

25. M-S. Hacid, A. Leger, C. Rey, and F. Toumani. Dynamic Dis-covery of E-services (A Description Logics Based Approach). Re-port, LIMOS, Clemont-Ferrand, France, 2002. see http://www.710.univ-lyon1.fr/∼dbkrr/mshacid/publications.html.

26. Alon Y. Halevy. Answering queries using views: A survey. VLDB Journal,10(4):270–294, 2001.

27. J. Hendler and D. L. McGuinness. The DARPA Agent Markup Language.IEEE Intelligent Systems, 15(6):67–73, Nov. 2000.

28. I. Horrocks, P.F.Patel-Schneider, and F . van Harmelen. Reviewing the Designof DAML+OIL: An Ontology Language for the Semantic Web. In Proc. ofthe 18th Nat. Conf. on Artificial Intelligence (AAAI 2002), 2002.

29. Ian Horrocks. DAML+OIL: A Reasonable Web Ontology Language. In Proc.of the EDBT’2002 Prague, Czech Republic, pages 2–13, Mar. 2002.

30. H. Mannila and K-J Raiha. The Design of Relational Databases. Addison-Wesley, Wokingham, England, 1994.

31. S. McIlraith, T.C. Son, and H. Zeng. Semantic Web Services. IEEE IntelligentSystems. Special Issue on the Semantic Web, 16(2):46–53, March/April 2001.

32. R. Molitor. Structural Subsumption for ALN . LTCS-Report LTCS-98-03,LuFG Theoretical Computer Science, RWTH Aachen, Germany, 1998.

33. M. Paolucci, T. Kawamura, T.R. Payne, and K.P. Sycara. Semantic Matchingof Web Services Capabilities. In Int. Semantic Web Conference, Sardinia,Italy, pages 333–347, Jun. 2002.

34. T.R. Payne, M. Paolucci, and K. Sycara. Advertising and Matching DAML-SService Descriptions (position paper). In International Semantic Web Work-ing Symposium, Stanford University, California, USA, Jul. 2001.

35. C. Rey, F. Toumani, M-S. Hacid, and A. Leger. An algorithm and a prototypefor the dynamic discovery of e-services. Technical Report RR05-03, LIMOS,Clemont-Ferrand, France, http://www.isima.fr/limos/publications.htm,2003.

36. G. Teege. Making the difference: A subtraction operation for description log-ics. In J. Doyle, E. Sandewall, and P. Torasso, editors, KR’94, San Francisco,CA, 1994. Morgan Kaufmann.