(128433164) arboles

24
0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM Approximation and Heuristic Algorithms for Minimum Delay Application-Layer Multicast Trees Eli Brosh Department of EE - Systems Tel-Aviv University Ramat-Aviv, Israel 69978 Email: [email protected]. il Yuval Shavitt Department of EE - Systems Tel-Aviv University Ramat-Aviv, Israel 69978 Email: [email protected]. il AbstractIn this paper we investigate the problem of finding minimum delay application-layer multicast trees, such as the trees constructed in overlay networks. It is accepted that shortest path trees are not a good solution for the problem since such trees can have nodes with very large degree, termed high load nodes. The load on these nodes makes them a bottleneck in the distribution tree, due to computation load and access link bandwidth constrains. Many previous solutions limited the maximal degree of the nodes by introducing arbitrary constraints. In this work, we show how to directly map the node load to the delay penalty at the application host, and create a new model that captures the trade offs between the desire to select shortest path trees and the need to constrain the load on the hosts. In this model the problem is shown to be NP-hard. There- fore, we present a logarithmic approximation algorithm and an alternative heuristic solution. Our heuristic algorithm is shown by simulations to be scalable for large group sizes, and produces results that are very close to optimal. I. INTRO DUCTION Multicast is a key component in the design of group communication applications which require efficient data de- livery to multiple destinations. However, IP multicast which implements multicast functionality at the network layer is still not widely deployed in current IP networks. To alleviate this problem, several recent proposals [1] have advocated an alternative approach, termed application layer multicast or end-host multicast, which implements multicast functionality at the application layer using unicast network level services only, forming an overlay network between end hosts. The goal of application layer multicast [2] is to construct and maintain efficient distribution trees between the multi- cast session participants, minimizing the performance penalty involved with application-layer processing. Many proposals attempt to optimize the cost of the multicast delivery tree using application level performance measures such as delay or throughput. The systems which aim at reducing the overall delay [2], [3], [4], [5], [6], construct a minimum height (or minimum diameter) tree with constrained degrees. The degree constraints are used to control the network resource usage, i.e., available bandwidth or stress on the physical links. However, this solution stipulates the usage of a dual cost optimization objective which mixes network level and application level costs to characterize applications performance.

Transcript of (128433164) arboles

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

Approximation and HeuristicAlgorithms for

Minimum Delay Application-LayerMulticast Trees

Eli BroshDepartment of EE -Systems Tel-Aviv

UniversityRamat-Aviv,Israel 69978

Email:[email protected].

il

Yuval ShavittDepartment of EE -Systems Tel-Aviv

UniversityRamat-Aviv,Israel 69978

Email:[email protected].

il

Abstract— In this paper we investigate theproblem of finding minimum delayapplication-layer multicast trees, such asthe trees constructed in overlay networks. Itis accepted that shortest path trees are nota good solution for the problem since suchtrees can have nodes with very largedegree, termed high load nodes. The load onthese nodes makes them a bottleneck in thedistribution tree, due to computationload and access link bandwidth constrains.Many previous solutions limited the maximaldegree of the nodes by introducing arbitraryconstraints. In this work, we show how todirectly map the node load to the delaypenalty at the application host, and createa new model that captures the trade offsbetween the desire to select shortest pathtrees and the need to constrain the load onthe hosts.In this model the problem is shown to be

NP-hard. There- fore, we present a logarithmicapproximation algorithm and an alternativeheuristic solution. Our heuristic algorithmis shown by simulations to be scalable forlarge group sizes, and produces results thatare very close to optimal.

I. INTRO DUCTION

Multicast is a key component in thedesign of group communication applicationswhich require efficient data de- liveryto multiple destinations. However, IPmulticast which implements multicastfunctionality at the network layeris still not widely deployed in currentIP networks. To alleviate this problem,several recent proposals [1] haveadvocated an alternative approach, termed

application layer multicast or end-host multicast,which implements multicast functionalityat the application layer using unicastnetwork level services only, forming anoverlay network between end hosts.The goal of application layer multicast

[2] is to construct and maintainefficient distribution trees between themulti- cast session participants,minimizing the performance penaltyinvolved with application-layerprocessing. Many proposals attempt tooptimize the cost of the multicastdelivery tree using application levelperformance measures such as delay orthroughput. The systems which aim atreducing the overall delay [2], [3], [4],[5], [6], construct a minimum height (orminimum diameter) tree with constraineddegrees. The degree constraints are usedto control the network resource usage,i.e., available bandwidth or stress on thephysical links. However, this solutionstipulates the usage of a dual costoptimization objective which mixesnetwork level and application levelcosts to characterize applicationsperformance.

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

In this paper we advocate anapplication-centric approach whichquantifies system performance usingapplication level costs only. We claimthat the conventional overlay networkmodel and its corresponding delay measureare designed to characterize multicastsystems which assume network-level datadistribution capabilities. Unfortunately,message process- ing by end-hostsinvolves an additional delay penaltywhich is not captured by such models andis related to application- layerimplementations of packet duplication androuting. In particular, the shift ofmulticast functionality to the upperlevel influences the simultaneousdistribution capabilities of end- hosts,implying a communication model withsequential mes- sage distribution. Thisconstraint stems from the fundamentalchange in the characteristics of therouting infrastructure assumed by theoverlay network, attributed to thedifference between message distributionspeeds of routing nodes (i.e., end-hosts)in overlay networks and packetdistribution speeds of routers inconventional physical networks.For example, consider the simple network

of Fig. 1A, composed of three hosts H1 ,H2 , and H3 and two routers R1 and R2connected using a high speed backbone,where host H1 uses a low-bandwidth accesslink for network connectivity, e.g.,modem access, and H2 , H3 use high-bandwidth LAN access connectivity.Assume that the goal of the overlaysystem is to devise a multicast treethat provides minimal distribution delayfrom H1 to H2 and H3 . Clearly, amulticast system must be careful toavoid delegating large degree to the lowbandwidth host H1 in order to eliminateunnecessary bottleneck due to its low-speed data distribution capabilities.Fig. 1B depicts the correspondingoptimal multicast tree. Now, considerthe conventional routing algorithm usedby many application-layer multicast

architectures that optimize tree delay,namely the shortest path tree algorithm.In this case the shortest path multicasttree reduces to a star topology (Fig.1C), which ignores the performancepenalty at the star center. Hence,serialized message distribution which isirrelevant to IP multicast schemes mustbe accounted for in the evaluation ofoverlay multicast architectures.Surprisingly, however, many application-layer architectures which optimize treedelay have neglected these implicationson the overall performance of groupcommunication applications.Another factor which constrains parallel message distribu-

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

provide performance bounds for tree and gridgraphs.The presented algorithmic solutions can

be effectively used to implementcentralized overlay systems, such as p2pand server based systems. The heuristicalgorithm is particularly useful in thecontext of two-tier server basedarchitectures [5], [10], [3] whichconstruct a virtual tree among theservers to provide an efficient contentand data delivery services to end- users.Each end-user registers to a server inorder to receive multicast services, andthe server handles the dissemination ofthe aggregated traffic. Such semi-staticarchitectures employ reliable servers toprovide high-availability service,stipulating a simple implementationwith low computational overheaddue to minor topology changes. Furthermore,a centralized

Fig. 1. Comparison between application-layermulticast and network-layer multicast in a simpleheterogeneous overlay network

tions in overlay networks is theprocessing capacity of end-host machines.For instance, consider a server whichimplements router like functionality atthe application layer, and therefore maynot have enough CPU power to handlemessage pro- cessing at the full speed ofits network interfaces. Hence, theeffective message distribution rate ofan end-host is shaped by two factors,the bandwidth of the access linkconnecting the host or its local areanetwork to the physical network, and theprocessing power and the computationalload on the host machine. A recent study[7] that measured the actual end-hostheterogeneity of popular peer-2-peer(p2p) overlay systems showed that thebandwidth and latency parameters canvary by several orders of magnitudeacross different hosts in the system.In this paper, we present an overlay

network model which captures the realisticcosts involved with application-layermulticast. The model is a mathematical

generalization of a communication modeldeveloped by Cidon et al. [8] for high-speed networks, and similarly itincorporates two separate delaymeasures. The processing delay measure,which is a reciprocal of the effectivemessage distribution speed of an end-hostapplication, and the communication delaymeasure, which represents the delay oftraversing an overlay link. Thisframework enables us to characterize theperformance of multicast trees using asingle cost, the overall delay of messagedistribution.We use the proposed framework to develop

heuristic and approximation algorithms forthe basic problem of optimal multicasttree construction. Both the heuristicand the ap- proximation generate minimumdelay trees that intrinsically balanceshort latency with small degree, and thusavoid an external trial-and-error type ofbalancing between the two, i.e., we donot impose a maximum degree on ourtrees. Our heuristic algorithm constructssuch trees efficiently and thus can scaleto large multicast groups, which is aknown problem [2]. Note that the

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

suggested solution works both for fullyconnected topologies, and for structuredtopologies, used in some p2p overlaynetworks [9]. Therefore, we address theissue of multicasting in partiallyconnected networks, and

approach is capable of providingquick and efficient ses- sionmanagement services by sharing thecomputational load among several overlayservers [4].The main applicability of our

algorithms is in the context of delay-sensitive multicast applications, whichrequire tight bounds on the end-to-enddelays due to jitter and timingconstrains. Applications which belong tothis category include audio conferencing,real-time media streaming, content distri-bution services, and multi-playerdistributed games.The rest of this paper is organized as

follows. The next section formulates theoverlay communication model. In Sec- tionIII we discuss the problem of optimalmulticast tree construction and show thatthis problem is NP-Complete. In SectionIV we develop approximation and heuristicalgorithms for solving this problem.Section V deals with performance analysisof the heuristic algorithm for severaloverlay topolo- gies. An experimentalevaluation of our solutions is presentedat Section VI. Finally, Section VIIconcludes the paper.

II. OVERLAY COMMUNICATION NETWORK MODEL An overlay network is a fully connected

virtual network formed by hosts whichcommunicate with each other using aphysical network, such as the Internet.The overlay network utilizes the regularunicast services of the physical networkto provide communication among hosts, anddo not require any special support at thenetwork level. The delay experienced bya message that travels between hosts iscomposed of two elements: (a)Communication delay - which represents thedelay of traversing the unicast pathbetween the hosts. This component includesthe accumulated propagation and queuingdelays of the physical links on theunicast path, and the message receptionoverhead at the receiver host. (b)Processing delay - which represents the delayof processing a message at the sender

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

host. This element includes theoverhead of preparing a message fortransmission and the transmission delaythrough the physical access link.Although current implementations of

operating systems enable applications toperform concurrent message trans-missions, the concurrent transmissionswould be serialized when passingthrough a physical access link.Typically, this serialization isperformed at the hardware level by theaccess equipment. Thus, sequentialdistribution of messages should beaccounted for in order to avoidunrealistic application design

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

schemes which rely on simultaneousmessage dissemination capabilities.We define an overlay network model based

on a gener- alization of a communicationmodel developed by Cidon et al. [8]. Theoverlay network is modeled by a directedcomplete graph G = (V, E), where V is aset of vertices representing hosts, and Eis a set of edges representing unicastpaths. We use the terms ’host’ and’link’ to refer to the vertices andedges in the overlay graph. Each overlayedge (u, v) ∈ Eis associated with a communication delaycost, c(u, v), and each host v ∈ V isassociated with a bounded and finite

processing delay cost, p(v). Theoriginal model of Cidon et al. [8] assumeshomogenous processing and communication

costs, i.e., p(v) = P, ∀v ∈ V , and c(u, v)= C, ∀(u, v) ∈ E.The direct communication between hosts is

characterized asfollows. Assume that at time t, host uinitiates processing of a message targetedto host v. Then, host u is busy processingthis message during the time interval [t, t+ p(u)], and the message arrives at hostv at time t + p(u) + c(u, v). Therefore,the processing delay measure representsthe shortest time interval betweenconsecutive message transmissions.It is important to note that in our

model, the delay costs between pairs ofhosts do not necessarily satisfy thetrian- gle inequality. This is a knownphenomena in the Internet, stemming inpart from policy routing. For example,Jamin et al. [11, Figs. 2 and 3] show thatabout 30-50% of the triangles in theInternet do not obey the triangleinequality.

III. THE OPTIMAL MULT ICAST TREEPROBLEM

In this section we state our designobjective formally, and show that theoptimal multicast tree problem is NP-Complete. We use the term multicast

scheme to refer to the task ofdistributing a message from a source host

to a subset of hosts M in the overlaynetwork. Since one cannot relay on the

cooperation of non-participating hosts(i.e., hosts which do not belong to themulticast group M ), we assume that only

the hosts in M are allowed toparticipate in the distribution. Thus, amulticast scheme in the graph G = (V, E)

can be viewed as a broadcast scheme,i.e., the task of distributing a message

to the entire network, in the subgraphinduced by the

host set M⊆ V .

We formulate the optimal multicast treeproblem, also de-

noted as minimal delay multicast (MDM) problem,as follows. Definition 1: The optimal

multicast tree problem (MDM): Givena directed complete graph G = (V, E),

amulticast group M ⊆ V , a source hosts ∈ M , a non- negative real processing

delay p(v) for each vertex v ∈ V ,and a non-negative real communicationcost c(u, v) for each edge (u, v) ∈ E, finda multicast scheme which minimizes the

delay by which all the hosts in M receivea message from s. Our objective is todevise a multicast scheme which min-imizes the arrival time of the lastmessage. Therefore, we restrict this

study to non-lazy multicast schemes, inwhich a host that has already received a

message does not delay mes- sagedistribution by becoming idle (this term

was introduced in [12]). Such schemescorrespond to an ordered directed tree

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

T , rooted at s and spanning M . In thistree, the outgoing edges of a non-leafnode u are ordered according to themessage distribution order of host u inthe corresponding multicast scheme, suchthat the ith outgoing edge corresponds tothe ith transmission. The reception delay ofhost v, denoted by tT (v), is defined asthe time at which v receives the message.The reception delay of s is by definition0. The cost of a multicast tree T isdefined as the earliest time at whichall the hosts have been notified, i.e.,this cost equals to maxv∈M tT (v).

Given a multicast tree we can easilycalculate the optimal

ordering using a recursive computation,working bottom-up (the full descriptionof the recursion can be found in [13]).Therefore, in the rest of the paper weneglect the ordering and concentrate onfinding the optimal tree.We show that the MDM problem is NP-hard

using a simple reduction from thetelephone broadcast (TB) problem. In theTelephone model (see [14]) communication issynchronous, i.e., each node can eithersend or receive a single message percommunication round. The TB problem seeksan optimal broadcast scheme whichdistributes a message from a root node, r,to all the nodes in a graph in a minimumnumber of rounds. The TB problem is knownto be NP-Hard [15, ND49] for arbitrarygraphs.

Theorem 1: The MDM problem is NP-hard.Proof: We will show that given a

delay bound K , deciding if there is amulticast scheme with a distributiondelay of at most K is NP-complete. Theproof is by a reduction from T B. Givenan instance to T B, GT B = (V, ETB ) andr ∈ V , we construct an instance toMDM as follows: acomplete overlay network G = (V, E) withunit processing costs p(v) = 1 ∀v ∈ V , andcommunication delay defined as c(u, v) = 0∀(u, v) ∈ ET B and c(u, v) = n ∀(u, v) ∈/ ETB .We let s = r and M = V . In theresulting MDM instance, there is amulticast scheme with a distributiondelay at most K if and only if there isa telephone broadcast with at most K

rounds (for K < n). The claim follows bynoting that forK ≥ n the TB problem is trivially solvedsince it is equivalentto deciding if G isconnected.

IV. MULTICASTALGORITHMS

In the next section we present oursolutions for obtaining good multicasttrees. We begin with a short review ofprevious results. Broadcast and multicastare important communication primitiveswhich have many applications indistributed and parallel systems. Theproblem of designing efficient broadcastand multicast algorithms which assumesequential message distribution, have beenextensively studied in the context ofseveral communication models. One modelwhich was widely investigated is thetelephone model, described in theprevious section. Some telephone modelstudies have focused on the problem ofdesigning optimal broadcast schemes forspecific classes of graphs (see [16] fora comprehensive survey), while others havesuggested approximation algorithms foroptimal broadcasting in general graphs([17], [18], [12]).Cidon et al. [8] presented a communication

model for high- speed networks whichcaptures communication costs using two

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

parameters – transmission delay andcomputation delay. In this model, thenetwork is represented by a graph G =(V, E). Each node is associated with aprocessing delay cost P , and each edgeis associated with a communication delaycost C . The model assumes sequentialprocessing of messages, such that thetime needed for a node to handle thetransmission of i messages is iP . Inaddition, they proposed an optimal tree-based broadcast algorithm (see [8]) forcomplete graphs, and showed that suchtrees achieve a broadcast delay which islogarithmic in the size of V . Sincethe overlay model is a generalizationof this model, any non-lazy multicastscheme (such as our heuristicalgorithm) would provide a logarithmicdelay for the case of a homogenous costoverlay network. Raz and Shavitt [19]presented a general version of thismodel which supports IP-like routing, andconsidered efficient multicastingalgorithm (based on balanced trees) forline topologies.The heterogenous postal model [12] is

a related model which incorporates(non-uniform) communications and switchingdelays, i.e., it captures networks whichmay have different link delays anddifferent switching times betweenmessages. Optimal algorithm forbroadcasting in a complete and homogenouscost postal network can be found at[14]. An approximation algorithm foroptimal multicasting is given in [12].This algorithm has a log k approximationratio where k is the size of the multicastgroup. We further discuss this model andthe corresponding approximation algorithmin Section IV- A.We proceed now to describe our solutions

for constructing application layermulticast trees. Since MDM is NP-complete, we seek to deviseapproximations and heuristics. We beginwith developing a logarithmicapproximation algorithm based on amodified version of the postal

approximation algorithm. The resultingalgorithm solves an undirected variant ofthe multicast problem, thus its domain islimited to overlay networks with symmetriclinks. This restriction is in many casesunrealistic due to the widespreaddeployment of asym- metric access links,such as ADSL and cable-modem con-nections. Furthermore, the approximationalgorithm requires high (polynomial)running time due to its need to solvelinear programs. Therefore, we devise analternative cost-effective heuristicalgorithm that supports directed overlaynetworks, and evaluate its performance.We also discuss shared tree extensions

of these algorithms. In the shared treeapproach [20] a single tree is constructedfor the purpose of multi-source multicastOur analysis show that the presentedalgorithms can be easily modified tosupport shared trees without major impacton the performance. Of course, usingmultiple single source multicast treesalways achieve lower delay, but at theexpense of the management and resourceusage overhead.

A. ApproximationoutlineWe base our approximation on the

algorithm of Bar-Noy et al. [12] whichhandles multicasting in heterogeneouspostal models. The postal modelrepresents the communication net-

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

work using an undirected complete graph G = (V, E). Each node v is associated with a switching time parameter sv . A sender node v is considered busy (i.e., engaged only with the current transmission) in the first sv time units following theprevious transmission time. Each link (u, v) ∈E is associatedwith a parameter λuv which denotes the delay of delivering amessage from u to v.Although both models (postal and

overlay) incorporate similar measures forcharacterizing heterogenous networks, thedistribution timings in the postal modeldiffers from the over- lay model in thefollowing aspects. (1) In the postalmodel the communication latency λuv

incorporates the sending time of u, whilein the overlay model this sending timeis incorporated in the processing delayof v. Thus, the cost (i.e., delay) ofdelivering the ith message from u to v isthe sum of the costof (u, v) and i − 1 times the cost of u in the case of the postalmodel, and the sum of the cost of (u, v) andi times the costof u in the case of the overlay model. (2) The postal model assumes that su < λuv, ∀(u, v) ∈ E.Thus, we need to modify the postal

approximation algorithm before applyingit to the overlay model. We derive ourapprox- imation algorithm, called Approx-MDM, using the following steps: (1) wedefine the generalized heterogeneous postal (GHP)model, which eliminates the restrictionon the values of the communication andswitching (sending) measures. (2) weextend the postal approximation andderive the generalized postalapproximation algorithm which handlesmulticasting in GHP models (3) weconstruct a cost preservingtransformation from the overlay modelto the GHP model and apply thegeneralized postal approximation tocompute the multicast tree.This process increases the originalapproximation by an additive factor of

O(log |M |)(pmax − pmin ) where pmax and

pmin are the maximum and minimum processing delays of the hosts in M , respectively.Next, we introduce the required definitions.Definition 2: The GHP model is a

heterogeneous postal model which excludesthe restriction on the network costs,such that the edge length parameter in theGHP model is finiteand positive, i.e., λuv > 0, ∀(u, v) ∈ E.

Definition 3: Given a GHP model with a graphG = (V, E),

a switching time function s, and a communication latency function λ, defineγ = max(u,v) E

su as the switchingto communication ratio of the graph G.Before proceeding, we provide an outline

of the postal approximation algorithm.The interested reader is directed to[12] for the full details.

B. The postal approximation algorithmIn the context of the postal model the

multicast problem is defined as follows.Given a configuration of an undirectedgraph with associated communication andswitching cost func-tions (G = (V, E), s, λ), a set of terminals U ⊆ V , and a source node r ∈ U, find a minimum time (i.e., minimumdelay) scheme that distributes a message fromr to the terminalset U where all the nodes in V may participate in the

− 4

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

distribution. Observe that we preservethe notations of [12]which denote the multicastgroup by U .The basic idea of the algorithm is to

find a multicast tree T which minimizesthe quantity ∆T +LT , where ∆T denotes themaximum generalized degree (thegeneralized degree of a node is itsactual degree multiplied by thecorresponding switching time) of T , andLT denotes the maximum distance (in T )from r to any of the nodes in U . Thealgorithm computes a multicast tree,which approximates the cost of theoptimaltree T ∗, iteratively using l rounds. LetUi denote the terminalset in the ith round. The algorithmstarts with the initial setU0 = U , and terminates when U = r. Inthe ith round thealgorithm uses the core procedure tocompute the following, for any i ≤ l:

1) a core subset Ui ⊆ Ui 1 of size atmost 3 ·|U | where

r ∈ Ui2) a multicast scheme from Ui to

Ui−1 , such that theobtained multicast time is at mostlinear in the optimalmulticast time from rto Ui−1 .

The computation of core(Ui ) involves thefollowing steps:

1) Solve a linear program, variant ofa multicommodity flow. The resulting set of fractional paths is rounded[12, Theorem 4] producing a set of |Ui | integral paths,one for eachterminal.

2) Transform the set of paths into aset of spider graphs

- graphs in which at most one nodehas degree more than two. Select anarbitrary terminal from each spidertogether with nodes which arenot spanned by any spider to beincluded within the resulting core.This selection insures that thechosen spider terminal is able todistribute a message to all its nodesin the required linear time.

In [12] it is shown that the resultingtree has a O(log |U |)

multiplicativeapproximation factor.C. The generalized postal approximationalgorithmWe now describe the GHP rounding matrixwhich enables the support of networkswith γ ≥ 1, i.e., GHP models. We

preserve the notations of [12]: P1 , P2 , . . .denotes the length bounded fractionalflow paths, and V (Pi ) and E(Pi ) denotesthe set of nodes and edges in a path Pi ,respectively; f (Pi )denotes the amount of flow pushedon path Pi , and Pj

denotes the set of all paths that carryflow of the jth com-modity. To simplify the presentation ofthe results we defineγ∗ = max1, γ. In the generalized postalapproximation, thefollowing rounding matrix (termed GHProunding matrix) isused for rounding the fractional solutionof step (1) in the core computation.

G

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

where the second part of the equationfollows from the definition of γ. The sumof the negative entries at each columnis at most −4LT · γ∗. By invoking therounding theorem [12,Theorem 4] we get a set of integralpaths such that theircongestion (i.e., the generalized degreeof the graph spanned by a set of paths)is at most 6∆T ∗ + 4LT ∗ · γ∗ and the lengthof each path is at most 4LT ∗ · γ∗.

The generalized postal approximationalgorithm. The generalized postalapproximation algorithm is a postalapprox- imation algorithm which employsthe GHP rounding matrix instead of theoriginal one.

The correctness of the modifiedalgorithm follows from the fact that thealgorithm structure and its underlyingtheorems and lemmas are not relatedto the specific switching andcommunication cost values, except of theconstrained selection of the roundingcoefficient which we handleappropriately. Therefore, it remains toshow the approximation ratio.Due to the GHP rounding, step (2) of the

core computationgives a set of spiders with the followingproperties. The diameter of each spideris at most 4 · γ∗ · (∆T ∗ + LT ∗ ) and thegeneralized degree of each spider is atmost 6·γ∗ ·(∆T ∗ +LT ∗ ). Since the algorithminvokes O(log |U |) iterations of the coreprocedure and the cost of theoptimal tree T ∗ is at least0.5 · (∆T ∗ + LT ∗ ) (as shown in [12, Lemma1]), we have

that the multicast time from the root r toa set of terminals Uis at most O(log |U | · max1, γ) times theoptimal multicasttime.D. The MDM approximationalgorithmThe following is a polynomial algorithm

that approximates the minimum multicastdelay. The algorithm accepts as an in-put an overlay network configuration, (G,c, p), which consists of an undirectedgraph G = (M, EM ) with associatedprocess- ing and communication costfunctions, p and c, respectively,and a source host s˜ ∈ M . Given anoverlay network graph(V, E), the input graph, G, is thesubgraph induced by the multicast groupset M ⊆ V (see Section III).

Algorithm Approx-MDM(G, p,c, s˜)1. Construct a GHP configuration

instance IGH P = (G, s, λ), from thegraph G, switching time function sv =p(v), ∀v ∈ M and communication latencyfunction λu,v =c(u, v) + (p(u) + p(v))/2,∀(u, v) ∈ EM .

2. Invoke the generalized postalapproximation to compute a

multicast tree using IGH P , source hosts˜, and multicast groupU = M.3. Return the computed multicast tree

for each v sv · i: v∈V (Pi )

for all j −4LT · γ∗ ·

i: Pi ∈Pj

f (Pi ) ≤

6∆T

f (Pi ) = −4LT · γ∗

Let OP T be the minimum multicast delayfrom s˜ to M , and let n be the sizeof M . Let pmax and pmin be themaximum and minimum processing delays ofthe hosts in M , respectively.

Theorem 2: The multicast delay of the Approx-MDM al-The sum of positive entries in the ith column is: gorithm is at most (OP T + (pmax − pmin )) ·

O(log n)

v∈V (Pi )sv ≤

(v,w)∈E(pi )

λvw · γ∗ +

stj

≤ 4LT · γ∗. Proof: Given a multicast tree T which spans M and a host v ∈ M , let t

T (v) be

the reception delay of v assuming

GH

GH

p

t

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

GHP model timings. By substituting thecomputed costs of IGH P with thecorresponding overlay input costs we getthe following relationship between thereception delay costs.

Algorithm LRF(G, p, c, s)

GH P

T(v) =

p ( v ) − p ( s ) 2

+ tT (v) (1)

1. t[s] = 0, set s as the root of a treeT2. for each v ∈ M − s

In order to derive this equation, we usedthe claim that there is a delay gap of asingle processing round (per eachtraversed host) between the messagedelivery delays in the postal and overlaymodels (see Section IV-A).

Consider the following quantitiescomputed assuming GHP

model timings. Let OP TGH P be themulticast delay of an

3. do t[v] ← ∞ 4. for each (u, v) ∈ EM5. do wu,v = c(u, v) + p(u)6. for each (u, v) ∈/ EM7. do if v = u then wu,v = 0 else wu,v = ∞8. D, Π ← All-Pairs-Shortest-Path(G, W )

optimal treeT ∗

for the IGH P configuration. Let u ∈ M

9. while M − V [T ] = ∅10. for each host u ∈ M − V [T ] do

be a node with the maximum reception delay in T ∗ .Therefore,

11. m[u] ← arg minv:v

∈V [T ] t[v] +d

v,u

OP TGH P ≤ OP T + p(u) − p(s)2

≤ OP T +

p max −

p min 2 (2)

12. v ← arg maxu:u∈M −V [T ] t[m[u]] + dm[u],u 13. w ← v14. while w = m[v] do15. t[w] ← t[m[v]] + p(w) + dwhere the first inequality follows from

Eq. (1). The constructed I GH P instance satisfies γ < 2, since0.5·(p(v)+p(w))+c(v,w) < 2, ∀(v, w) ∈ EM , and therefore the

m[v],w16. add w to T as a child of πm[v],w17. w ← πm[v],w

, t[v] ← −

[ [ ]] [ [ ]] + (

[ ]) [ ]multicast delay of the resulting tree isat most OP TGH P · p(v) t m

vt m v

p m v t v

O(log n). Substituting OP TGH P accordingto equation (2)gives the requested upper bound.When the processing delays are all

equal, it improves our approximation forthe MDM problem to O(log n). We donot restrict the communication costs tobe homogeneous. The following theoremhandles this case.

Theorem 3: Consider an overlay model withhomogenous processing costs, i.e., p(v) =p, ∀v ∈ M . The multicast delay of Approx-MDM algorithm for this case is at mostOP T ·O(log n).

Theorem 3 can be obtained bysubstituting pmax = pmin =

p in Theorem 2.

Given a network with symmetriccommunication costs, a multicast tree Trooted at s can be easily adapted tosupport multicasting from multiplesources. To enable the multicastfrom a host v ∈ M, v = s, we reverse the direction of theedges on the path from s to v. This modification results in a multicast schemewith a delay which at most p(v)−p(s)+2·C ,where C denotes the cost of T . Therefore, the undirected version of T can be used as a shared tree, such that themulticast delay of any host v = s is at most 2 · (OP Ts + (pmax − pmin )) · O(log n), where OP Ts denotes the optimalmulticast delay from s.

i

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

E. Heuristic algorithmWe introduce a heuristic tree

construction algorithm, named largestready time first (LRF), which solves thedirected variant of the MDM problem. Theproposed algorithm computes the multicasttree incrementally using a greedyapproach; for each host not yet includedin the tree, the algorithm computes itsminimum reception delay, and the host withthe maximal delay quantity is selected.The tree is extended with the hosts on aminimum delay path between the selectedhost and a notified host. Fig. 2 showsthe steps of the algorithm.

19. return T

Fig. 2. Greedy tree construction forthe MDM problem

The input to this algorithm is the sameas the input to the Approx-MDM, exceptfor the source host which is denoted by s.The algorithm maintains a ready timeattribute t[v] for eachhost v ∈ M which records the minimal timeat which the hostis free to initiate processing of a newmessage. The ready timeis set to infinity to indicate a nonnotified host. The constructed tree isdenoted by T and the corresponding set ofnotified hosts by V [T ]. In eachiteration, the algorithm determines foreachhost u ∈ M − V [T ] its mate host m[u] ∈ V [T ]by selecting apath which minimizes the ready timeattribute of u, setting vto indicate the host with the maximalreception delay. Then, it updates theready time of the hosts on the path fromm[v] to v to reflect the processing timeinvolved with delivering a message to thenewly notified host v, and it adds thepath hosts to the constructed tree T .The variable w indicates the currentupdated host. The algorithm terminateswhen all the hosts are notified.To be able to calculate the connection

cost between a non notified host and anotified host, a preprocessing phase ofcomputing all pairs shortest path usingthe Floyd-Warshall algorithm [21] isimplemented. Given a pair of hosts v1andvk connected by a path < v1 , . . . , vk > oflength k − 1, thecost of this path is defined as

k−1

p(vi ) + c(vi , vi+1 ), wherevi , 1 ≤ i ≤ k denotes the ith host on thispath, i.e., this costrepresent the minimal distribution delay(along the specifiedpath) from v1 to vk . A shortest path fromhost u to host v is defined as any pathbetween these hosts with minimum cost.

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

Therefore, the input to the Floyd-Warshall computation is an

2

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

n × n weight matrix W = (wvi ,vj ) defined as:

wvi ,vj =

p(vi ) + c(vi , vj )if vi = vj ,

0otherwise .

where n denotes the size of M . Theoutput of the all pairs shortest pathcomputation is composed of two n × nmatrices;all pairs distance matrix D = (dvi ,vj ) andpredecessor matrix Π = (πvi ,vj ) (See[21]). Observe that the shortest pathfrom the source s to any host v is a lowerbound on the cost of the optimal tree.

This algorithm can be extended tosupport a shared treesolution using the followingmodification. At the initialization phasethe longest path in the graph G iscomputed using the weight matrix W , andthe hosts on this path are used as theinitial set of notified hosts in T . Theshared tree variant uses this initialselection instead of the original one andproceeds with normal tree construction asin the original algorithm.The complexity analysis of this

algorithm is straightforward. The allpairs shortest path computation requiresΘ(n3 ) time. Each iteration requires O(n)time to find a single mate host, and O(n)time to extend the tree. The total timeper iteration is therefore O(n2 ), andthe total running time of the LRFheuristic is Θ(n3 ). We conjecture thistime complexity cannot be improved sinceany algorithm should at least calculatethe all pair shortest path.We show using an example (see Fig. 3A)

a lower bound on the approximation ratioof the heuristic tree. Consider thefollowing complete undirected graph G =(V, E) with n + 1 hosts denoted by v0 , . . . , vn

, with processing costs defined asp(v) = 1, ∀v ∈ V , and communication costsc(vi , vj ) definedas

0 if i = 0, j = 1, . . . n ,c(vi , vj ) =

δ if 1 ≤ i ≤ n − 1, j

= i + 1 , n otherwise .

where δ → 0. For the simplicity ofpresentation Fig. 3A omits the edgeswith cost n. Assume that the sourcehost is v0 and that M = V . Therefore,the LRF scheme would have v0 distributethe message to the rest of the hostsusing n processing rounds, such that thetree cost is n (see Fig. 3B). On theother hand, consider an improved schemein which v0 distributes the message to khosts, and the last host in the graphreceives the message in k + mδ timeunits, where m isa positive integer. Let |pi | denote thelength (i.e., the numberof edges) of path pi . Such a scheme canbe obtained by atree composed of k paths p1 , . . . , pk whichshare only singlehost, v0 (i.e., only v0 has an out-degree more than two), andthe lengths of these paths form thefollowing non increasing sequence: |pi |− 1= |pi+1 |,1 ≤ i ≤ k − 1, whereas for a singleindex j in this set we may have |pj | = |pj+1|. Fig. 3C depicts such a tree when n =k · (k +1) . Assume that v0 distributesthe message to these paths (i.e., to itsk children in the tree)according to a decreasing path lengthorder. Therefore, the cost of the optimaltree is less than (1 + δ) ·k. Since theset of pathsspan all the hosts in V we have that k =O(

√n), and therefore

we get Ω(√

n) approximation ratio for themulticast delay. We

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

Fig. 3. Example that provides √

n approximationratio for the heuristic tree.

(A) The input graph (B) The heuristic tree. (C) An improved tree.

conjecture that this example representsthe worst case, namely

that our LRF heuristic algorithm is an √

n-approximation.

V.TOPOLOGIES

In this section we analyze theperformance of broadcasting for thespecial case of partially connected

overlay networks. Partial connectivity,which assumes arbitrary or structuredgraphs, is an important model which arisesin several contexts. Partial connectivity

is implement by many data distributionservices, such as content distribution

networks and multime- dia streamingsystems, which utilize a dedicated

network of leased lines and virtualconnections to provide connectivity among

application servers. These systemsoptimize resource usage, and therefore

enforce connectivity constrains to achieveefficient resource utilization.

Structured p2p systems [9] are anotherclass of applications which utilize

partial connectivity overlays. Despite thefact that many of these systems employ

distributed architectures, ourcentralized application-centric approachcan still be used to provide theoretical

performancebounds on the multicast delay in such systems.

Partial connectivity may also rise incases where due to anonymity requirements

not all the hosts are aware of eachother and thus connectivity is

sparse. That is, hosts use localpolicies to override universal

connectivity. For example, considersecurity policies in the Internet, which

limit the con- nectivity of hosts locatedbehind firewalls and NAT facilities.

Partial topologies are also relevant tothe case of active net- works [19], which

have similar properties to those ofoverlay networks. It is possible to view

the overlay network as an applicationlevel implementation of the active

network model, where the active networkuses programmable routers to add new

functionality and services to thenetwork. For example, Raz and Shavitt

[19] have used a framework thatconsiders the processing and

communication delays in active networks,to develop and analyze the time

complexity of several basic algorithms,including multicasting. Their framework

uses the processing delay measure tocapture the delay imposed by a

software router implementing copy and forward of packets. Therefore, in order to support networks with partial connec-

tivity an extended overlay model isassumed; in this model

the communication cost of an overlaylink (u, v) is set to

mn

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

infinity, i.e., c(u, v) = ∞, to indicate the absence of directcommunication from u to v.

Farley et al. [23]. They have shown that givena grid graphGm,n with a node v at position (i, j), then

For general graph topologies our analysis focuses on the

D + 2 if i = j = m+1n+1

performance of broadcast algorithms. In the next section, we provide performance bounds for several common undirected

b(v) = D + 1 if

i =

2 =2 or j =

2 2 , i = j

graph topologies.

A. Trees

We consider broadcasting in tree graphs.In these graphs each node has asingle path from the root, implyingthat any broadcast scheme ischaracterized only by the messagedistribution order of non-leaf hosts.

Lemma 4: Any (non-lazy) broadcastscheme provides a factor d approximationfor the minimal broadcast delay for atree graph T with a maximal degree of d.

Proof: Denote by s the source host. Weuse the path cost notation defined inSection IV-E, i.e., the cost of a pathrepresents the minimal distribution delayalong it. In any (non- lazy) broadcastscheme the delay by which the lastnotified host, denoted by v, receives amessage is composed of two quantities,the cost of the path from s to v, andthe sum of the additional processingdelays invoked by the hosts on this path(the additional delay of v is assumed tobe zero). By definition, the formerquantity is no more than OP T , where OP Tdenotes the optimal broadcast delay. Wedenote by <v1 , . . . , vk > the path of length k − 1 which connects between

D otherwise.

where b(v) denotes the optimal broadcasttime (i.e., delay) from v, and Ddenotes the maximal distance from v to acorner node in Gm,n . The distance betweena pair of nodes u and v in positions (iu , ju) and (iv , jv ), respectively, is defined asthe number of edges on the shortest pathbetween them.Next, we provide a new result on

broadcasting in grid graphs usingshortest path trees. Let OP T denotethe cost of an optimal solution.

Theorem 5: The broadcast delay of ashortest path tree for homogenous costgrid graph Gm,n = (V, E) is at most OP T +2

Proof: Let s denote the source host, andlet T denote a directed shortest path tree(SPT) rooted at s. The SPT structureimplies the following degree delegationin T . If s is a corner host then itsdegree is 2 and the rest of the hosts havemaximal out-degree of 2. If s is aside host or interior host, then themaximal out-degree of the interior hostswhich share a common coordinate with s is3 and the maximal out-degree of the restof the hosts is 2. The degree of s is 3when s is a side host, and 4 when it isan interior host. Let S3 denote theset of hosts in V \ s such that the out-degree of these hostss and v, such that v1 = s and vk = v. Due

to the bound onthe degree of the tree, each node may delay the processing by

in T is 3, i.e., S3

= v : deg(v) = 3, v = s where deg(v)

denotes the out-degree of v in T .at most d − 1 processing rounds, and therefore the sum of theadditional processing delays is at most (d − 1) ·

k−1 p(vi ),

Let T2

be a binary subtree of T rooted at r,such that r is

i=0where vi , 1 ≤ i ≤ k denotes the ith host on the path from s tov. It is easy to see that this quantity is at most (d − 1) · OP T ,and the lemma follows.

This result indicates that messagedistribution along a degree-bounded treeat an arbitrary order, such as thedelivery schemes used by overlaymulticast systems which ignore sequentialdistribution of messages (see for example

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

[22]), produces a delay which is up to amultiplicative constant factor higher thanthe optimal result.The LRF heuristic achieves an optimal

solution for a special class of treegraphs termed spiders, in which at mostone node has degree larger than two. Theproof can be found in [13].

B. Grids

This section investigates broadcastingin the context of homogeneous rectangulargrid graphs. Let Gm,n = (V, E)denote an m × n grid graph. Each hostin this graph isuniquely identified by a row and column indexes (i, j), where1 ≤ i ≤ m and 1 ≤ j ≤ n. The broadcastanalysis isconducted assuming a homogeneous cost modelwhere p(v) = 1, ∀v ∈ V , c(u, v) = 0, ∀(u, v) ∈ E. This particular selectionreduces the model to the well known telephone model, andenables the usage of known results in grid broadcasting.The problem of finding an optimal

broadcast scheme in 2- dimensional gridgraphs have been previously investigatedby

a child of v ∈ S3 or a side host whichis a child of s. Thegrid topology implies that a subtree ofheight d, rooted at aninternal node of T2 , has a single leafat depth d. Therefore, by using a bottom-up recursive computation (see Section III)we get that the optimal broadcast delayfrom the root of a T2 tree with height dis d. If s is a corner host then T hastwo T2 subtrees linked to it (that is,the root of each subtree is achild of s). Since only one of these treeshas a height of D − 1while the height of the other is at mostD − 2, the broadcastdelay from a corner host is D. This delayachieves the optimalvalue (devised by Farley et al.in [23]), andthe lemma follows for this case.

The other cases are analyzed using acompressed version of T . A T2 tree withheight d can be ’compressed’ to a path

with d edges which preserve the broadcastdelay of the tree. The compressed version

of T , denoted as Tc , is produced byreplacing all the T2 subtrees with their

corresponding paths. This compressiondoes not modify the broadcast delay of T. Let T3 denote a subtree in Tc rooted

at a child of s. By definition, themaximal out-degree of this tree is 3.

Next, we consider the case of T3 treeswhich include at least a single node with

an out-degree of 3. The grid topologyimplies that

a subtree of height d rooted at aninternal node of T3 , v ∈ S3 , may have atmost two leaves at depth d. Each host v∈ S3has three children in T , v1 ,v2 and v3 ,ordered according to the height of thesubtrees rooted at these hosts, such thath(Tv1 ) ≤

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

h(Tv2 ) ≤ h(Tv3 ) where Tvi ,i = 1, 2, 3denotes the subtreerooted at vi , and h(Tvi ) denotes theheight of Tvi . Given asubtree of height d rooted at v with asingle leaf at depth d, the grid topologyimplies that h(Tv3 ) > maxh(Tv2 ), h(Tv1 ).Ifthe subtree has two leaves at depth d,then h(Tv3 ) = h(Tv2 ) > h(Tv1 ). By usinga bottom-up recursive computation we getthat the broadcast delay from the root ofa T3 tree with height d is at most d + 1when there is a single leaf at depth d,and at most d + 2 when there are twoleaves at depth d.If s is a side host, the root of T is

linked with three T3 subtrees. If s is amiddle side host (i.e., a host withcoordinate (is , js ) such that is = m+1 orjs = n+1 ) there are two hosts

order running time. In our simulationenvironment which includes 1.5Ghz PCswith 512M RAM, the Approx- MDMalgorithm was able to effectivelysolve problems with up to 25 hosts.

• Shortest Path Tree. This tree isevaluated to assess

the performance penalty involved withSPT routing, acommon routing scheme employed by manyoverlay multicast systems. The SPT iscomputed using Dijkstra’s algorithm[21], where the edge weights aredefined using the formulation ofsection IV-E.

• Delay bound. Since the MDM problem is NP-Hard (seeSection III) the optimal solutioncould not be computed.2 2

at distance D from s. If these two hostsreside in the sameT3 tree, then the maximal height of theremaining T3 trees isD − 2 and we have that the broadcastdelay from a cornerhost is at most D + 2. If these twohosts reside in different subtrees, thenthe maximal height of the third subtree isD − 2and the broadcast delay is again at mostD + 2. In the case of a non middle sidehost, the single host at distance D islocated at one of the T3 trees and themaximal height of theremaining trees is D − 2, and thereforethe broadcast delay isat most D + 2. Therefore, the lemmafollows for this case.If s is a interior host then T has four

T3 subtrees linked to it. By checking allthe possible combinations of tree heightsand the location of the hosts atdistances D and D − 1, it canbe easily shown that the broadcast delayfrom an interior hostis at most OPT + 2.This results indicates that any SPT

based broadcast (e.g., flooding with senseof direction) provides near-optimalresult. In addition, in [13] it is shownthat the LRF heuristic builds an SPTwhich provides an optimal solution.

VI. A SIMULAT IONSTUDY

In this section we analyze the averageperformance of the proposed algorithms onrandom networks assuming various groupsizes and wide range of network costs.

The simulations assume two undirectednetwork topologies

- fully connected and partially connectedoverlay graphs. The topologies of thephysical networks and the partiallyconnected overlays are constructed using apower-law graph generator. This generatoris based on the Notre-Dame model [24]which constructs undirected graphs withpower-law node degree fre- quencydistribution using an input parameter setm0 , m, p, q. This parameter set defines theproperties of the resulting graph: m0 isthe initial node set, p is theprobability to add m new links, and q isthe probability to rewire m links. Acommon parameter set m0 = 3, m = 2, p =0.1, q = 0 was used to derive all thetopologies. This set results in graphswith an average degree of approximately4.38. In addition, in all the simulationswe have selected the multicast group toinclude all the hosts in the network.In our simulations we compare the

performance of the our heuristic with thefollowing schemes.

• Approx-MDM multicast algorithm. Theapproximation algorithm (see SectionIV-A) needs to solve multiple linear

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

programs, and therefore it requireshigh polynomial

Instead, we select the maximum cost ofthe shortest path(as defined in Section IV-E) from thesource to any other host in the graph.The selected value is a non-tightlower bound on the performance of anymulticast scheme. In the graphs shown,this delay bound is labelled as Max.Latency.

A. SimulationresultsFirst we describe the format of the

plotted graphs. In all the presentedresults we apply 40 independentsimulation experiments per each datapoint, plotting the mean value with errorbars representing a 95% confidenceinterval. In the case of fully connectedoverlay networks, we present thesimulation results using two plots, onethat covers small group sizes up to 25members and another which handles largergroup sizes up to 4000 members. Thus, theperformance of the heuristic andapproximation trees is compared in thecontext of small group sizes, while largegroup sizes are used to analyze thescaling properties of the heuristic andSPT trees.Next, we present the results for the

case of a fully connected overlay network.Figs. 4–6 plot the costs, i.e., themulticast delays, of the LRF, Approx-MDM,and shortest-path trees as a function ofthe multicast group size. In eachsimulation the network costs are randomlyselected using a discrete uniformdistribution on the intervals ([1, 10], [1,10]), ([1, 1], [1, 10]), ([1, 10], [1, 1]),respectively. The left range in eachpair is the communication cost range,and the right range is the processingrange. In the graphs shown, the LRFresults are labelled as Heuristic-MDM.According to Fig. 4, the cost of the heuristic tree is up to

30% smaller than the cost of theapproximation tree. Fig. 5 in- dicatesthat the trees achieve similar cost whenthe processing costs dominate the

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

communication costs. Fig. 6 shows that inthe alternative case of network withdominating communica- tion costs, theheuristic tree cost can be up to 3 timessmaller than the approximation cost. Thelatter case captures internet- likescenarios. The results for this importantcase indicate that the heuristic treeoutperforms both the approximation treeand the SPT. This performance gap betweenthe heuristic and the approximation,stems from the fact that theapproximation scheme constructs treeswith logarithmic height. The usage oflogarithmic height trees increases theprobability of selecting high costcommunication delays, and thereforereduces the average efficiency ofapproximation trees.

400

Multic

ast

Mult

icas

t

Mult

icas

t

Multic

ast

Mult

icas

t 1

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

120 410Heuristic−MDM

500Heuristic−MDM

100 Approx−MDMSPT 3

80 Max Latency

260 10

40 11020

SPTMax Latency

300

200

100

00 10 20 30GroupSize

010 0 1000 20003000 4000GroupSize

0 0 1000 20003000 4000GroupSize

Fig. 4. The multicast delay for a clique topology with random network costs from [1, 10] Fig. 7. The multicast delay for a power-law

topology with random network costs from [1, 10]

200

Heuristic−MDM Approx−MDM

410law graph, to simulate fully connected overlay structures over

150 SPTMaxLatency

100

50

00 10 20 30GroupSize

10

210

110

010 0 1000 20003000 4000GroupSize

the Internet. In each simulation themulticast group hostswere attached to a randomly selecteduniformly distributed set of edge nodesin the power-law topology. The commu-nication costs were derived according tothe minimal hop count, yielding anaverage overlay link cost of 4.8hops with a maximal value of 9 hops. Theprocessing costs were randomly selectedfrom the discrete intervals [1, 5], [1,10]

Fig. 5. The multicast delay for a clique topology with random processing costs from [1, 10] and unit communication costs

As expected SPT provides the worst caseperformance, providing a cost functionwhich is almost linearly proportional tothe tested group size. Observe that themulticast delay is plotted on alogarithmic scale, such that the linearperformance degradation is shown usinga logarithmic curve. The SPTperformance is consistent with the treeconstruction mecha- nism which makes noattempt to minimize the degree of theresulting tree. The quality of the SPT isdetermined according to the dominance ofthe communication costs, such that theapplicability of SPT is limited to smallmulticast groups in overlay networks withdominating communication costs (Fig.6).The previous experiments were repeated

using other cost intervals, [1, 5], [1,

100], preserving the methodology of net-work cost selection. The obtained resultswere consistent with the previousoutcomes. We also simulated nearhomogeneous costs and verified thelogarithmic convergence rate (see [8]) ofthe heuristic.We used a 4000 node physical network, based on a power-

5

Mult

icast

Mult

icas

t

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

and [1, 100]. Unsurprisingly, the obtainedresults were similar to the previousresults which use random cost selection,and therefore the corresponding graphsare omitted.Next, we consider the case of partially

connected overlaynetworks derived using the power-lawtopology generator. In this case, weweren’t able to apply the approximationscheme due to the implicit full-connectivity assumption of the algorithm.Therefore we compare the performance ofthe heuristic tree with SPT, using thesame network costs as in the fullyconnected case. The results indicate thatthe heuristic tree scales well, such thatits maximal cost is up to 80% higherthan the lower bound, which is not tight.Fig. 7 shows a typical large scale resultwith processing and communication costs

randomly selected from the discreteintervals ([1, 10], [1, 10]). The large-scaleresults for a clique topology aresimilar. For example, see Fig. 4 in whichthe maximal cost of the heuristic tree isup to 3 times higher than the non-tightlower bound.

The main conclusion drawn from thesimulations is that the

heuristic algorithm produces resultswhich are very close to the optimal foralmost any group size, showing alogarithmic- like growth rate.Furthermore, the average performance ofthe heuristic algorithm is similar orbetter than the performance of theapproximation algorithm, whereas the SPTprovides worst- case performance andproduces non-scalable results.

VII. CONCLUDINGREMARKS25

Heuristic−MDM Approx−MDM SPT

20 Max Latency

15

10

50 10 20 30GroupSize

10

410

310

210

110

010 0 1000 20003000 4000GroupSize

In this work we looked at buildingefficient application layer multicasttrees. We have presented two solutions tothe MDM problem, a logarithmical provenapproximation and a heuristic. It isinteresting to see that in practice theheuristic achieves much shorter delaysthan the approximation for the casesthat represents the Internet, i.e.,networks with communication delays largerthan processing delays; and both arebetter than the previously advocatedshortest path trees.

Fig. 6. The multicast delay for a clique topology with random communication costs from [1, 10] and unitprocessing costs

Our approximation has an additivefactor that depends on

the value of the processing delays. Weare now working on

0-7803-8356-7/04/$20.00 (C) IEEE INFOCOM

a better approximation which depends onlyon the size of the multicast group.

ACKNOWLEDGMENT We thank Israel Cidon, Avishai Wool, and

the anonymous reviewers for their helpfulcomments that improved our pre- sentationin many ways. The first author is deeplygrateful to Zvika Lotker for valuable andmotivating discussions, and to YoavKarniely for his help with thesimulations.

REFERENCES[1] A. El-Sayed, V. Roca, and L. Mathy, “A survey

of proposals for an alternative groupcommunication service,” IEEE Network magazine, Specialissue on ”Multicasting: An Enabling Technology”, Jan. /Feb.2003.

[2] Y.-H. Chu, S. G. Rao, and H. Zhang, “A casefor end system multicast,” in ACM SIGMETRICS 2000.Santa Clara, CA, USA: ACM, June 2000, pp. 1–12.

[3] D. Pendarakis, S. Shi, D. Verma, and M.Waldvogel, “ALMI: An application level multicastinproceedings infrastructure,” in Proceedings of the3rd USNIX Symposium on Internet Technologies and Systems(USITS ’01), San Francisco, CA, USA, Mar. 2001, pp.49–60.

[4] S. Shi and J. Turner, “Routing in overlaymulticast networks,” in IEEE INFOCOM’02, New-York,NY, USA, June 2002.

[5] S. Banerjee, C. Kommareddy, K. Kar, B.Bhattacharjee, and S. Khuller, “Construction ofan efficient overlay multicast infrastructurefor real- time applications,” in IEEE INFOCOM’03,San Francisco, CA, USA, Apr. 2003.

[6] S. Banerjee, B. Bhattacharjee, and C.Kommareddy, “Scalable applica- tion layermulticast,” in ACM SIGCOMM 2002, Pittsburgh, PA,USA, Aug. 2002.

[7] S. Saroiu, P. K. Gummadi, and S. D. Gribble,“A measurement study

of peer-to-peer file sharing systems,” in Multimedia Computing andNetworking 2002 (MMCN’02), San Jose, CA, USA, Jan. 2002.

[8] I. Cidon, I. Gopal, and S. Kutten, “New modelsand algorithms for future networks,” IEEETransactions on Information Theory, vol. 41, no. 3, pp.769 – 780, May 1995.

[9] M. Castro, M. B. Jones, A.-M. Kermarrec, A.Rowstron, M. Theimer, H. Wang, and A. Wolman,“An evaluation of scalable application-levelmulticast built using peer-to-peer overlays,” inIEEE INFOCOM’03, San Francisco, CA, USA, Apr.2003.

[10] S. Y. Shi, J. Turner, and M. Waldvogel, “Dimensioning server access

bandwidth and multicast routing in overlay networks,” in NOSSDAV2001, June 2001, pp. 83–92.

[11] P. Francis, S. Jamin, C. Jin, Y. Jin, D. Raz,Y. Shavitt, and L. Zhang, “IDMaps: a globalinternet host distance estimation service,”IEEE/ACM Transactions on Networking, vol. 9, no. 5,pp. 525–540, Oct. 2001.

[12] A. Bar-Noy, S. Guha, J. S. Naor, and B.Schieber, “Message multicasting in heterogeneousnetworks,” SIAM Journal on Computing, vol. 30, no. 2,pp. 347–358, 2001.

[13] E. Brosh, “Approximation and heuristicalgorithms for minimum delay application-layermulticast trees,” Master’s thesis, Departementof EE- Systems, Tel-Aviv University, Ramat-Aviv,Israel, 2003.

[14] A. Bar-Noy and S. Kipnis, “Designingbroadcasting algorithms in the postal model formessage passing systems,” in Proc. of SPAA, 1992,pp.13–22.

[15] M. Garey and D. Johnson, Computers andIntractability. San Francisco: Freeman, 1979.

[16] S. M. Hedetniemi, S. T. Hedetniemi, and A.L. Liestman, “A survey of gossiping andbroadcasting in communication networks,”Networks, vol. 18, no. 4, pp. 319–349, 1988.

[17] R. Ravi, “Rapid rumor ramification:approximating the minimum broad- casting time,”in 35th IEEE Symp. on Foundations of Computer Science,1994, pp. 202–213.

[18] G. Kortsarz and D. Peleg, “Approximationalgorithm for minimum time broadcast,” SIAM J.Discrete Math., vol. 8, pp. 401–427, 1995.

[19] D. Raz and Y. Shavitt, “New models andalgorithms for programmable networks,” ComputerNetworks, vol. 38, no. 3, pp. 311–326, 2002.

[20] L. Wei and D. Estrin, “The trade-offs of multicasttrees and algorithms,”

in ICCCN’94, San Francisco, USA, Sept. 1994.[21] T. Cormen, C. Leiserson, and R. Rivest, Introduction to Algorithms.

McGraw-Hill, New York, NY: MIT Press, 1990.[22] P.Francis. (1999) Yoid: extending the

multicast inetrnet architecture. [Online].Available: http://www.aciri.org/yoid/

[23] A. Farley and S. Hedetniemi, “Broadcasting in grid graphs,” in the

9th S-E conf. combinatorics, graph theory, and computing, UtilitasMathematica, Winnipeg, 1978, pp. 275–288.

[24] R. Albert and A. Barabasi, “Topology ofevolving networks: local events anduniversality,” Physical Review Letters, vol. 85, no.24, pp. 5234–5237, 11 Dec. 2000.