Balanced distributed web service lookup system

14
Journal of Network and Computer Applications 31 (2008) 149–162 Balanced distributed web service lookup system S. Sioutas a,b,c, , E. Sakkopoulos a,b,c , L. Drossos a,b,c , S. Sirmakessis a,b,c a Research Academic Computer Technology Institute, Internet and Multimedia Technologies Research Unit, N. Kazantzaki Str. 26500 Rion, Patras, Greece b Technological Educational Institution of Messolongi, Department of Applied Informatics in Administration and Economics, GR-30200, Messolongi, Hellas c University of Patras, Computer Engineering and Informatics Department, GR-26500, Rion, Patras, Greece Received 8 March 2006; accepted 8 March 2006 Abstract Web Services (WS) constitute an essential factor for the next generation of application integration. An important direction, apart from the optimization of the description mechanism, is the discovery of WS information and WS search engines lookup capabilities. In this paper, we propose a novel decentralized approach for WS discovery based on a new distributed peer-based approach. Our proposed solution builds upon the Domain Name Service decentralized approach majorly enhanced with a novel efficient lookup up system. In particular, we work with peers that store WS information, such as service descriptions, which are efficiently located using a scalable and robust data indexing structure for Peer-to-Peer networks, the Balanced Distributed Tree (BDT). BDT provides support for processing (a) Exact match Queries of the form ‘‘given a key, map the key onto a node’’ and (b) Range Queries of the form ‘‘map the nodes whose keys belong to a given range’’. BDT adapts efficiently update queries as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from our theoretical analysis point out that the communication cost of both lookup and update operations scaling in sub-logarithmic (almost double-logarithmic) time complexity on the number of nodes. Furthermore, our system is also robust on failures. r 2006 Elsevier Ltd. All rights reserved. Keywords: Web service discovery, Quality of service ARTICLE IN PRESS www.elsevier.com/locate/jnca 1084-8045/$ - see front matter r 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.jnca.2006.03.005 Corresponding author. Research Academic Computer Technology Institute, Internet and Multimedia Technologies Research Unit, N. Kazantzaki Str. 26500 Rion, Patras, Greece. E-mail addresses: [email protected] (S. Sioutas), [email protected], [email protected] (E. Sakkopoulos), [email protected] (L. Drossos).

Transcript of Balanced distributed web service lookup system

ARTICLE IN PRESS

Journal of Network and

Computer Applications 31 (2008) 149–162

1084-8045/$ -

doi:10.1016/j

�CorrespoTechnologies

E-mail ad

drosos@teip

www.elsevier.com/locate/jnca

Balanced distributed web service lookup system

S. Sioutasa,b,c,�, E. Sakkopoulosa,b,c,L. Drossosa,b,c, S. Sirmakessisa,b,c

aResearch Academic Computer Technology Institute, Internet and Multimedia Technologies Research Unit,

N. Kazantzaki Str. 26500 Rion, Patras, GreecebTechnological Educational Institution of Messolongi, Department of Applied Informatics in Administration and

Economics, GR-30200, Messolongi, HellascUniversity of Patras, Computer Engineering and Informatics Department, GR-26500, Rion, Patras, Greece

Received 8 March 2006; accepted 8 March 2006

Abstract

Web Services (WS) constitute an essential factor for the next generation of application integration.

An important direction, apart from the optimization of the description mechanism, is the discovery of

WS information and WS search engines lookup capabilities. In this paper, we propose a novel

decentralized approach for WS discovery based on a new distributed peer-based approach. Our

proposed solution builds upon the Domain Name Service decentralized approach majorly enhanced

with a novel efficient lookup up system. In particular, we work with peers that store WS information,

such as service descriptions, which are efficiently located using a scalable and robust data indexing

structure for Peer-to-Peer networks, the Balanced Distributed Tree (BDT). BDT provides support for

processing (a) Exact match Queries of the form ‘‘given a key, map the key onto a node’’ and (b) Range

Queries of the form ‘‘map the nodes whose keys belong to a given range’’. BDT adapts efficiently

update queries as nodes join and leave the system, and can answer queries even if the system is

continuously changing. Results from our theoretical analysis point out that the communication cost of

both lookup and update operations scaling in sub-logarithmic (almost double-logarithmic) time

complexity on the number of nodes. Furthermore, our system is also robust on failures.

r 2006 Elsevier Ltd. All rights reserved.

Keywords: Web service discovery, Quality of service

see front matter r 2006 Elsevier Ltd. All rights reserved.

.jnca.2006.03.005

nding author. Research Academic Computer Technology Institute, Internet and Multimedia

Research Unit, N. Kazantzaki Str. 26500 Rion, Patras, Greece.

dresses: [email protected] (S. Sioutas), [email protected], [email protected] (E. Sakkopoulos),

at.gr (L. Drossos).

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162150

1. Introduction

Web Services (WS) have evolved to become the framework for next-generation businessintegration. Service discovery and matchmaking is promoted with the participation ofmost worldwide software vendors as well as IT researchers. However, discovery of servicesthat match a conceptual and technical description in a truly distributed and dynamicenvironment has not been fully enabled yet.Although WS technology is categorized among the newest members of the web

engineering area, the widespread adoption of XML standards that support functionalitiesin terms of description (e.g. WSDL), communication (e.g. SOAP W3C, 2003), anddiscovery (e.g. UDDI) has stimulated to action the industry and academia in order toaddress the WS research issues.Web Services, which are designed to enable applications to communicate directly and

exchange data, regardless of language, platform and location, are currently emerging as adominant paradigm (Borenstein and Fox, 2004) for constructing and composingdistributed business applications and enabling enterprise-wide interoperability. In theyears to come, WS are expected to become more widely adopted and thus, there will be anincreasing demand for automated service discovery.At present, catalogues based on the Universal Description, Discovery and Integration

standard (UDDI) constitute the prevalent technological environment for WS discovery.UDDI adopts a centralized architecture consisting of multiple repositories thatsynchronize periodically. However, this approach has already been proved to be inefficient,and as the number of WS grows and become more dynamic, such a centralized approachquickly becomes impractical. Another such example earlier in the Internet ear has been theDomain Name Service approach (DNS). DNS a cornerstone for the development andexpansion of the Internet as it facilitates better communication of site addresses bytranslating them to their corresponding formal IP address. The DNS is a universal serviceparadigm that has sustained a simple, distributed and balanced approach, keeping inparallel the advantages of the centralized approach.In the initial name resolution service steps all the information that one needed to

know about these hosts was stored in a single file, hosts.txt. As the number of hosts grewvery large, problems arose with a single point resolution service. A simplified replicationservice copied the files across the entire network, however maintaining consistency was notthat simple. This system of a flat name space obviously did not scale well. PaulMockapetris of USC’s Information Science Institute, designed the architecture of the newsystem, the DNS, to solve these problems. This new architecture involves a hierarchicalname space. In this design, uniqueness of names would easily be ensured. Data can be keptup-to-date much easier with local management and the implementation of zones ofauthority.In this work, we follow, build and extend the latter well-known and powerful idea of the

hierarchical solution for DNS for the similar problem of WS registries’ discoverymechanisms enhanced by a novel lookup system based on the BDT. Additionally, since aquery may return more than one results that meet the functional requirements but providedifferent quality of service attributes, the selection procedures become even morecomplicated in order to satisfy the QoWS requirements (Yutu Liu et al.,). One moreparadigm may be found in Makris et al. (2005b) where QoS negotiation and adaptive WSselection is performed.

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162 151

Current approaches to WS discovery can generally be classified as centralized ordecentralized. Comprehensive reviews have been provided thoroughly and extensively inthe work of, while agent based only WS discovery techniques in (Moreau et al., 2002). Thecentralized approach includes UDDI registries (Makris et al., 2005b; OASIS UDDI),whereas the decentralized, distributed approaches are usually based on Peer-to-Peer (P2P)infrastructures. In P2P discovery of WS network nodes are considered as peers that shareinformation and are able to query other nodes.

Decentralized systems of WS discovery may have either structured or unstructuredarchitecture (Makris et al., 2005b; Schmidt and Parashar, 2004). Structured P2P architecturesuse hashing and in specific the most of them are based on Distributed Hash Tables(DHTs). These systems provide efficient processing of location operations so that, givena query with an object key, they locate (route the query to) the peer node that storesthe object.

DHT-based systems provide efficient processing of the routing/location opera-tions that, given a query for a document id, they locate (route the query to) the peernode that stores this document. Thus, they provide support for exact-match queries.DHT-based systems are referred as structured P2P systems because in generalthey rely on lookups of a distributed hash table, which creates a structure in thesystem emerging by the way that peers define their neighbors. Related P2P sys-tems like Gnutella (http://gnutella.wego.com), MojoNation (Andersen, 2001), etc,do not create such a structure, since neighbors of peers are defined in rather ad hocways.

There are several P2P DHTs architectures like Chord (Stoica et al., 2001), CAN(Ratnasamy et al., 2001), Pastry (Rowstron and Druschel, 2001), Tapestry (Chen et al.,1999), etc. From these, CAN and Chord are the most commonly used supportingmore elaborate queries. Pastry and Tapestry are based on hypercube routing: themessage is forwarded deterministically to a neighbor whose identifier is one digit closerto the target identifier. CAN partitions a d-dimensional coordinate space into zonesthat are owned by nodes which store keys mapped to their zone. Routing is doneby greedily forwarding messages to the neighbor closest to the target zone. Chordmaps nodes and resources to identities of m bits placed around a modulo 2m identi-fier circle and does greedy routing to the farthest possible node stored in the rout-ing table.

However, there are also other than DHTs structured decentralized systems, which builddistributed, scalable indexing structures to route search requests, such as P-Grid. P-Grid(Clarke, 1999) is a scalable access structure based on a virtual distributed search tree. Ituses randomized techniques to create and maintain the structure in order to providecomplete decentralization.

In this work we present a new efficient hierarchical grid structure for Web ServiceDiscovery Overlay—Data Networks, named BDT. The high-level approach followed inthe BDT system that resembles the architecture of the hierarchical DNS service (PaulMockapetris; DNS internet survey, 2005). Nevertheless, in depth details differentiate as theBDT uses a virtual Exponential Search Tree to guide key based searches. BDT providessupport for processing:

(a)

Exact match Queries of the form ‘‘given a key, map the key onto a node’’ and (b) Range Queries of the form ‘‘map the nodes whose keys belong to a given range’’.

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162152

Data location can be easily implemented on top of BDT by associating a key with eachdata item, and storing the key/data item pair at the node to which the key maps. Wesuppose that each node stores an ordered set of keys and the mapping algorithm runs insuch way that locally ordered key_sets are also disjoint to each other. BDT adaptsefficiently update queries as nodes join and leave the system, and can answer queries even ifthe system is continuously changing. Results from theoretical analysis show that thecommunication cost of the query and update operations scaling sub-logarithmically(almost double logarithmically) with the number of BDT nodes. Furthermore, our systemis also robust on failures.The rest of this paper is structured as follows. Section 2 remind us the fundamentals of

hierarchical protocols giving examples, Section 3 presents the BDT, our new efficient andscalable P2P lookup system. In this section, we also describe and resolve thecommunication cost of search and join/leave operations. Section 4 presents the resultsfrom theoretical analysis and Section 5 discusses experimental results. Finally, we outlineitems for future work and summarize our contributions in Section 6.

2. Preliminaries

This section reminds us the hierarchical and tree-based algorithms that are useful in P2Pcontexts.

2.1. Hierarchical protocols

Hierarchical protocols are nothing new, but provide an interesting approach to thebalance between scalability and performance. The most well-known service in use todaythat uses a hierarchical protocol is DNS. The purpose of DNS is to translate a humanfriendly domain name, such as www.ietf.org, to its corresponding IP address (in this case4.17.168.6). The DNS architecture consists of the following:

Root name servers � Other name servers � Clients

The other name servers can also be classified as authorative name servers for somedomains. The early Internet forced all hosts to maintain a copy of a file named hosts.txt,which contained all necessary translations. As the network grew, the size and frequentchanges of the file became unfeasible. The introduction of DNS remedied this problem andhas worked successfully since then.

2.2. An example of a DNS lookup

Assume a host is located in the domain sourceforge.net. The following scenario showswhat a DNS lookup could look like in practice.If a user on the aforementioned host, in the sourceforge.net domain, directs his web

browser to http://www.ietf.org the web browser issues a DNS lookup for the namewww.ietf.org.The request is sent to the local name server of the sourceforge.net domain.

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162 153

The name server at sourceforge.net is not able to answer the question directly, but itknows the addresses of the root name servers and contacts one of them.

There are 12 root name servers (9 in the US, 1 in the UK, 1 in Sweden and 1 in Japan).The root name server knows the address of a name server for the org domain. This addressis sent in response to the question from the local name server at sourceforge.net.

The name server at sourceforge.net asks the name server of the org domain, but it doesnot have the answer either, but the name server of the org domain knows the name andaddress of the authorative name server for the ietf.org domain.

The name server at sourceforge.net contacts the name server at ietf.org and once againasks for the address of www.ietf.org. This time an answer is found and the IP address4.17.168.6 is returned.

The web browser can continue its work by opening a connection to the correct host.Note that a question sent to a name server can be either recursive or iterative.

A recursive question causes the name server to continue asking other name servers until itreceives an answer, which could be that the name does not exist. An iterative query returnsan answer to the host asking the question immediately. If a definite answer cannot begiven, suggestions on which servers to ask instead are given.

2.3. Caching in DNS

Caching plays an important part in DNS. In the example above the local name serverwill cache the addresses obtained for the name server of the org domain and the ietf.orgdomain as well as the final answer, the address of www.ietf.org. This causes subsequenttranslations of www.ietf.org to be answered directly by the local name server, andtranslations of other hosts in the domain ietf.org can bypass the root name server and theorg server. The translation of an address such as www.gnu.org bypasses the root nameserver and asks the name server for the org domain directly.

2.4. Redundancy and fault tolerance in DNS

To make DNS fault tolerant, any name server can hold a set of entries as the answer to asingle question. A name server can answer a question such as ‘‘What is the address ofwww.gnu.org’’ with something like Table 1, which provides the names of name servers for

Table 1

Sample response from a DNS query

;; ANSWER SECTION:

gnu.org. 86385 IN NS nic.cent.net.

gnu.org. 86385 IN NS ns1.gnu.org.

gnu.org. 86385 IN NS ns2.gnu.org.

gnu.org. 86385 IN NS ns2.cent.net.

gnu.org. 86385 IN NS ns3.gnu.org.

;; ADDITIONAL SECTION:

nic.cent.net. 79574 IN A 140.186.1.4

ns1.gnu.org. 86373 IN A 199.232.76.162

ns2.gnu.org. 86385 IN A 195.68.21.199

ns2.cent.net. 79574 IN A 140.186.1.14

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162154

the gnu.org domain. The results were obtained using the dig utility available on most Unixsystems. In reality the response is much more compact.The example shows that the gnu.org domain appears to have five name servers (NS), of

which four of their addresses are known to us. The question of the address of www.gnu.orgcan be sent to anyone of the four servers. This means that we can receive an answer to ourquestion as long as at least one of the NS is reachable.

3. The BDT architecture

The BDT provides a balanced distributed tree-like structure where key-based searchingcan be performed in order to discover and select a requested Web Service key or URL. Interms of bandwidth usage searching scales very well since no broadcasting or otherbandwidth consuming activities takes place during searches. Since all searches are keybased there are two possibilities:

Let each host implement the same translation algorithm that translates a sequence ofkeywords to a binary key. � Let another service provide the binary key. This service accepts keyword-based queries

and can respond with the corresponding key.

The second approach is more precise. It is also possible to use a more centralizedimplementation for such a service. From now on we assume that the key is available. Thepaper describes an algorithm for the first case. We also suppose that the set of keys on eachhost retain a global order. Details are described below.

3.1. BDT network

The BDT is a balanced distribution tree T where the degree of the nodes at level i isdefined to be d(i) ¼ t(i) and t(i) indicates the number of nodes present at level i. This isrequired to hold for iX1, while dð0Þ ¼ 2 and tð0Þ ¼ 1. It is easy to see that we also havetðiÞ ¼ tði � 1Þdði � 1Þ, so putting together the various components, we can solve therecurrence and obtain for iX1: dðiÞ ¼ 22

i�1

; tðiÞ ¼ 22i�1

. One of the merits of this tree is thatits height is O(log log n), where n is the number of elements stored in it.

3.2. Peers in BDT

We distinguish between leaf_peers and node_peers: If peer i, henceforth denoted pi, is akey_host_peer (leaf) of the BDT network it maintains the following:

A number of ordered k-bit binary keys ki ¼ b1 . . . bk, where k is less than orequal to n1, for some bounded constant n1 which is the same for all pi. Thisordered set of keys denotes key space that the peer is responsible for. Let K the numberof k-bit binary keys and n the number of key_host_peers. While we can initiallydistribute the keys in that way such as each host peer (leaf) stores a load of Y(K/n) keysit is not at all obvious how to bound the load of the host peers, during updateoperations. In Kaporis et al. (2003), an idea of general scientific interest was presented:modeling the insertions/deletions as a combinatorial game of bins and balls, the size of

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162 155

each host peer is expected w.h.p. Y(ln n), for keys that are drawn from an unknowndistribution.

� The key_sets Sj ¼ fkij1pipYðK=nÞg; 1pjpn retain a global order. That means,8Sj ;Sq; 1pj � n; 1pqpn; jaq; if minfSjgominfSqg then maxfSjÞominfSqg. There-upon, we are sorting the key_sets above providing a leaf oriented data structure as youcan see in Fig. 1 .If pi, is a node_peer (root or internal node) of the BDT network is associated with thefollowing: � A local table of sample elements REFPI

, one for each of its subtrees. The REF table iscalled the reference table of the peer and the expression REFPI

½r�denotes the set ofaddresses at index r in the table. Each REF table is organized as the innovative linearspace indexing scheme presented in Anderson et al. (2000) which achieves an optimalOð

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log nÞ

pworst-case time bound for dynamic updating and searching

operations, where n the number of stored elements. We will use this solution as thebase searching routine on the local table of each network node.

REF0th-level[2]

REF1st-level[4] REF1st-level[4]

key-host_1 key-host_2

Root-node

key-host_n

Leaf-nodes

Internal-nodes

1 k

Cache

1 k

Cache

1 k

Cache

Cache Cache Cache Cache

1 k 1 k 1 k1 k1 k 1 k 1 k1 k

Fig. 1. The BDT system.

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162156

For each node pi we explicitly maintain parent, child, and sibling pointers. Pointers tosibling nodes will be alternatively referred to as level links. The required pointer

information can be easily incorporated in the construction of the BDT search tree.

3.3. Lookup complexity

Theorem 1. Suppose a BDT network. Then, Exact Match operations require

Oðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log nÞ

phops where n denotes the current number of peers.

Proof. Assume that a key_host_peer p performs a search for key k. We first check whetherk is to the left or right of p, say k is to the right of p. Then we walk towards the root,say we reached node u. We check whether k is a descendant of u or u’s right neighbor onthe same level by searching the REF table of u or u’s right neighbor, respectively.If not, then we proceed to u’s father. Otherwise we turn around and search for k in theordinary way. &

Suppose that we turn around at node w of height h. Let v be that son of w that is on thepath to the peer p. Then all descendants of v’s right neighbor lie between the peer p and thekey k. The subtree Tw is an BDT-tree for n0pn elements, and its height is h ¼ Y(log log n0).So, we have to visit the appropriate search path w, w1, w2,y..wr from internal node w to

leaf node wr. In each node of this path we have to search for the key k using the REFwi

indices, 1pipr and r ¼ O(log log n), consuming Offiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog dðwiÞ=log log dðwiÞ

p� �worst-case

time, where d(wi) the degree of node wi. This can be expressed by the following sum:Pr¼Oðlog log dÞi¼1

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog dðwiÞ=log log dðwiÞ

pLet L1, Lr the levels of W1 and Wr, respectively. So, dðw1Þ ¼ 22

L1and dðwrÞ ¼ 22

Lr

But, Lr ¼ O(log log n). Now, the previous sum can be expressed as follows:ffiffiffiffiffiffiffi2L1

L1

ffiffiffiffiffiffiffiffiffiffiffiffiffiffi2L1þ1

L1 þ 1

sþ � � � � � � � � � � � � þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n

log log n

s¼ O

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n

log log n

s !.

Remark 1. Exploiting the order between the key_sets on the leaves it is obvious that Range

Queries of the form [k‘, kr] require Oðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log n

pþ jAjÞ hops, where A the answer set.

In such a query, the hosts whose keys belong to the range [k‘, kr] can be found by firstsearching the BDT structure for k‘ and then perform an in-order traversal in the tree fromk‘ to kr.

To perform the search a connection to a peer p in the BDT is established andthe call bdtgrid_search(p, k) is performed. The function bdtgrid_search is shownin Fig. 2.

3.4. Fault tolerance

What will happen when few internal nodes have been shutdown? In data cachingapplications, it is useful to solve the more general problem of finding a nearby nodethat actually has a copy of the desired data. In our structure, thus, we equip each nodewith k redundant nodes each of them stores replicated copies of a data item, where k41 asmall positive constant. We also suppose that each node is k-robust, that means the

ARTICLE IN PRESS

Fig. 2. Pseudo-code for BDT searches.

S. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162 157

simultaneous shutdown of all these nodes is impossible, thus at least one is active on thenetwork.

3.5. Key_host_peers join and leave the system

In the case of key_host_peer overflow we have to nearby insert a new host_peer. In thesecond case of underflow we have to mark as deleted the key_host_peer by moving first thefew remaining keys to the left or right neighbors. Obviously after a significant number ofjoin/leave operations a global rebuilding process is required for cleaning the redundantnodes and rebalancing the BDT structure (Figs. 3–5).

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162158

Theorem 2. Suppose a BDT network. Then, join and leave operations require

Oðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log nÞ

pamortized number of hops where n denotes the current number of peers.

Proof. A join (insert) operation affects the path from the new leaf node to the rootof the BDT–GRID. In each path-node wi (1pipc log log n and c is a constant) we haveto update the REFwi

index. This process requires Oðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog dðwiÞ=log log dðwiÞ

pÞ time,

where d(wi) the degree of the node wi. This can be expressed by the following sum:Pr¼Oðlog log dÞi¼1

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog dðwiÞ=log log dðwiÞ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log n

p. &

The leave (delete) operation requires Oðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log nÞ

phops for detecting the node

and O(1) time to mark as deleted that node.

(Anderson et al ., 2000)

Fig. 3. Pseudo-code for INSERT host_peers.

Fig. 4. Pseudo-code for DELETE host_peers.

Fig. 5. Pseudo-code for Rebuilding operation.

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162 159

After Y(n) update operations we have to rebuild the Balanced Distributed backbone. Byspreading the Y(n) rebuilding cost to the next Y(n) updates, the theorem’s amortizedbound follows.

4. Evaluation and outline of contributions

For comparison purposes only, an elementary operation’s evaluation is presented inTable 2 between BDT, Chord and its newest variations, F-Chord(a) (Stoica et al., 2003)and LPRS-Chord (Zhang et al., 2003). Our contribution provides for exact-match queries,improved search costs from O(logN) in DHTs to Oð

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log nÞ

pin BDT and

adequate and simple solution to the range query problem. However, BDT is not basedupon DHT overlay and therefore it does not have full P2P functionality. It utilizes apowerful hierarchical lookup system similar to the concepts followed by the DNS service.WS descriptions are not expected to update the WS distributed registry in the same rate asa typical data distribution P2P network, due to their business intelligence based nature. Asa consequence, fully P2P functionality does not seem to be necessary in the case of WS.Recent DNS survey (DNS internet survey, 2005) shows that hierarchical solutions maymanage efficiently in practice millions of registered hosts as well as all correspondingupdates, insertions and deletions. In this sense, we propose a novel hierarchical basedapproach that out-performs the fully distributed P2P solutions, while it does follow a non-P2P approach.

Update Queries such as WS registration and de-registration requests are not performedas frequently as a user login and logout in a typical P2P data delivery network. WS aresoftware developed to support business structures and procedures which are expected tostay available in the WS discovery registries more than a P2P user session time span. BDTscales very well in the amortized case and it is better than Chord in the expected businessoriented weak–sparse updates. BDT does not scale well in worst-case, which is nottypically met in WS registry/catalogue implementation cases, though. Additionally, a faulttolerance schema is available to support with fidelity an elementary WS business solution.

Table 2

Performance comparisons with the best-known architecture

P2P network

architectures

Lookup messages Update messages Data overhead-routing

information

CHORD O(log n) O(log2 n) w.h.p. O(log n) nodes

H-F-Chord (a) O(log n/log log n) O(log n)

LPRS-Chord Slightly better than

O(log n)

Hierarchical

architectures

Lookup messages Update messages Data overhead-routing

information

NIPPERS O(log log n) expected O(log log n) amortised

expected O(n) w.c.

Exponentially

decreasing

BDT Oðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log nÞ

pworst-case

Oðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log nÞ

pAmortized Y(n) worst-

case

Exponentially increasing

ARTICLE IN PRESS

Load Balance Performance

0

50

100

150

200

250

0 5 10 15parameter - k

Ave

rag

e L

oad

Bal

ance Load (CHORD) LOAD(BDT-GRID)

Fig. 6. Load balance performance comparison with the best-known architecture.

Lookup Performance

02468

10121416

0 2 4 6 8 10 12 14 16parameter - k

Ave

rag

e P

ath

-L

eng

th

Path_Length (CHORD) Path_Length (BDT)

Fig. 7. Lookup performance comparison with the best-known architecture.

S. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162160

BDT and NIPPERS (Makris et al., 2005a) have an identical base algorithmic scheme.The only difference concerns the type of their time complexities. As you can see from thetable below, the better time complexities of NIPPERS (Makris et al., 2005b) are expected,meaning that the system depends on a class of distributions. In particular, the complexitiesin Makris et al. (2005a) were achieved for smooth family of distributions. That was a majordrawback, since in real life WS applications Update Queries such as WS registration andde-registration requests draw arbitrary distribution functions. For that reason, wedeveloped the BDT, which achieves worst-case complexities, independent on anydistribution. Besides, the distance between the complexities Oð

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilog n=log log nÞ

pand

O(log logN) is negligible (Figs. 6 and 7).

5. Simulation and experimental results

In this section we evaluate the BDT protocol by simulation. The simulator generatesinitially K keys drawn by an unknown distribution. After the initialization procedure thesimulator orders the keys and chooses as bucket representatives the 1st key, the lnnst key,the 2lnnst key y and so on. Obviously it creates N buckets or N Leaf_nodes whereN ¼ K=ln n. By modeling the insertions/deletions as the combinatorial game of bins andballs presented in Kaporis et al. (2003), the size of each bucket (host peer) is expectedw.h.p. Y(ln n). Finally the simulator uses the lookup algorithm in Fig. 2. We compare the

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162 161

performance of BDT simulator with the best-known CHORD simulator presented inDabek et al. (2001b).

More specifically we evaluate the Load balance and the search path length of these twoarchitectures. In order to understand in practice the load balancing and routingperformance of these two protocols, we simulated a network with N ¼ 2k nodes, storingK ¼ 100� 2k keys in all. We varied parameter k from 3 to 14 and conducted a separateexperiment of each value. Each node in an experiment picked a random set of keys toquery from the system, and we measured the path length required to resolve each query.For the experiments we considered synthetic data sets. Their generation was based onseveral distributions like Uniform, Regular, Weibull, Beta and Normal. For anyone ofthese distributions we evaluated the length path for lookup queries and the maximum loadof each leaf node, respectively. Then we computed the mean values of the operations abovefor all the experiment shots. The figures below depict the mean load and path length,respectively.

From the experimental evaluation derives that the mean value of bucket load isapproximately 15 ln n in BDT protocol instead of k log 2n in CHORD protocol. Obviously,for k415 the BDT protocol has better load balancing performance. Considering now thelookup performance, the path-length in BDT protocol is almost constant in comparison toCHORD where the path-length is increased logarithmically.

6. Conclusion

The combination of decentralized architectures and WS technology has an added valueresult: functional integration problems are exceeded due to the universal informationarchitecture provided by WS, and moreover, direct and fault tolerant access tocomputational resources is achieved thanks to the network infrastructure architecturethat decentralized architectures provide. The decentralization of WS discovery, which isexpected to be one of the most favorable outcomes of this combination, will increase faulttolerance and search efficiency.

In the case of a WS discovery structure over a distributed hierarchical network, a WSitem (or a set of WS items) that satisfies the range criterion is stored in a node (or nodesrespectively). In order to discover the WS that meets the search criteria, this nodes need tobe determined. In this paper we introduced and analyzed the protocol BDT, which tacklesthis challenging problem in a decentralized manner.

Current work includes the implementation and experimental evaluation of BDT forlarge-scale WS discovery when the insertion/deletion of WS items draw unknowndistributions. Furthermore includes the application of BDT in other domains such aslarge-scale distributed computing platforms. Future steps also include research onsemantic-based P2P-like solutions (Nejdl et al., 2003, 2004) in order to supportsemantically enriched WS and respective ontologies (DAML Services (DAML-S/OWL-S);Web Services Modeling Ontology URL).

References

Andersen D. 2001. Resilient overlay networks. Master’s Thesis, Department of EECS, MIT, May 2001.

Anderson A, et al. Tight(er) worst-case bounds on dynamic searching and priority queues. ACM STOC 2000.

ARTICLE IN PRESSS. Sioutas et al. / Journal of Network and Computer Applications 31 (2008) 149–162162

Borenstein J, Fox J. Semantic discovery of web services: a step towards fulfillment of the vision. Web Services J

[on line] http://www.sys-con.com/webservices, 2004.

Chen Y, Edler J, Goldberg A, Gottlieb A, Sobti S, Yianilos P. A prototype implementation of archival

intermemory. In: Proceedings of the 4th ACM conference on digital libraries, Berkeley, CA, August 1999.

p. 28–37.

Clarke I. A distributed decentralized information storage and retrieval system. Master’s thesis, University of

Edinburgh, 1999.

DAML Services (DAML-S/OWL-S). Semantic Web Services Architecture (SWSA) Committee. URL http://

www.daml.org/services.

DNS internet survey, 2005. URL: http://www.isc.org/index.pl?/ops/ds/.

Kaporis Makris Ch, Sioutas S, Tsakalidis A, Tsichlas K, Zaroliagis Ch. Improved bounds for Finger Search on a

RAM’’, Lecture Notes Computer Science, vol. 2832, p. 325–36, 11th Annual European symposium on

algorithms (ESA 2003)–Budapest, 15–20 September, 2003.

Makris Ch, Sakkopoulos E, Sioutas S, Triantafillou P, Tsakalidis A, Vassiliadis B. NIPPERS: network of

interpolated PeERS for web service discovery. In: Proceedings of the 2005 IEEE international conference on

information technology: coding & computing (IEEE ITCC 2005), Track Next Generation Web and Grid

Systems, Full Paper, Las Vegas, USA, 2005a. p. 193–8.

Makris Ch, Panagis Y, Sakkopoulos E, Tsakalidis A. Efficient and adaptive discovery techniques of web services

handling large data sets. J Syst Software 2005b, in press.

Moreau L, Avila-Rosas A, Miles VDS, Liu X. Agents for the grid: a comparison with web services (part ii: Service

discovery). In: Proceedings of workshop on challenges in open agent systems 2002. p. 52–6.

Nejdl W, Siberski W, Sintek M. Design issues and challenges for RDF- and schema-based peer-to-peer systems.

SIGMOD Record 2003.

Nejdl W, Wolpers M, Siberski W, Schmitz C, Schlosser M, Brunkhorst I, Loser A. Super-peer-based routing

strategies for RDF-based peer-to-peer networks. J Web Seman 2004:177–86.

OASIS UDDI Specifications TC—Committee Specifications.

Paul Mockapetris, short cv, URL http://en.wikipedia.org/wiki/Paul_Mockapetris.

Ratnasamy S, Francis P, Handley M, Karp R, Shenker S. A scalable Content – Addressable Network, ACM

SIGCOM ’01 2001.

Rowstron A, Druschel P. Pastry: scalable, distributed object location and routing for large-scale peer-to-peer

systems. Lecture Notes in Computer Science 2001;2218:329–50.

Schmidt C, Parashar M. A peer-to-peer approach to web service discovery. World Wide Web 2004;7(2):211–29.

Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H. Chord: a scalable peer-to-peer lookup service for

internet applications. ACM-SIGCOMM 2001.

Stoica, Morris R, Liben-Nowell D, Karger DR, Kaashoek MF, Dabek F, Balakrishnan H. Chord: a scalable

peer-to-peer lookup protocol for Internet applications. IEEE/ACM Trans Network (TON) 2003;11(1):17–32.

W3C, 2003. SOAP Version 1.2 W3C Recommendation Documents. URL http://www.w3.org/TR/SOAP.

Web Services Modeling Ontology URL http://www.wsmo.org/.

Yutu Liu, Anne HH, Ngu and Liangzhao Zeng. QoS computation and policing in dynamic web service selection.

In: Proceedings of the WWW2004, New York, USA. p. 66–73.

Zhang H, Goel A, Govindan R. Incrementally improving lookup latency in distributed hash table systems. 2003

ACM SIGMETRICS international conference on measurement and modelling of computer systems

2003;31(1):114–25.