QoS Routing Via Multiple Paths Using Bandwidth Reservation

ORNL/TM-13547QOS Routing Via Multiple Paths Using BandwidthReservationNageswara S.V. RaoOak Ridge National LaboratoryComputer Science and Mathematics DivisionP.O. Box 2008, Bldg 6010Oak Ridge, TN 37831-6355Stephen G. BatsellOak Ridge National LaboratoryComputing, Information, and Networking DivisionP.O. Box 2008, Bldg 6012Oak Ridge, TN 37831-6367DATE PUBLISHED | January 1998REVISED | October 1998Research sponsored by theLaboratory Directed Research and Development Project ofOak Ridge National LaboratoryPrepared by theOAK RIDGE NATIONAL LABORATORYOak Ridge, Tennessee 37831managed byLOCKHEED MARTIN ENERGY RESEARCH CORP.for theU.S. DEPARTMENT OF ENERGYunder Contract No. DE-AC05-96OR22464.i

blank page

ii

ContentsAcknowledgements viAbstract vii1 Introduction 11.1 Relation to Prior Work : : : : : : : : : : : : : : : : : : : : : : : : : : 21.2 Contribution and Organization of the Paper : : : : : : : : : : : : : : 32 Problem Formulation 43 Message Transmission Problem 53.1 Shortest-Widest Paths : : : : : : : : : : : : : : : : : : : : : : : : : : 53.2 Properties of Multipaths : : : : : : : : : : : : : : : : : : : : : : : : : 63.3 NP-Completeness of MTP : : : : : : : : : : : : : : : : : : : : : : : : 103.4 Approximate Routing Algorithm : : : : : : : : : : : : : : : : : : : : 143.5 Relation to Maximum Flow Algorithm : : : : : : : : : : : : : : : : : 163.6 Simulation Results : : : : : : : : : : : : : : : : : : : : : : : : : : : : 183.7 Delay-Bandwidth Product : : : : : : : : : : : : : : : : : : : : : : : : 244 Sequence Transmission Problem 254.1 Intractability Results : : : : : : : : : : : : : : : : : : : : : : : : : : : 254.2 Approximation Algorithm : : : : : : : : : : : : : : : : : : : : : : : : 285 Conclusions 29

iii

List of Figures1 Widest-Shortest paths versus multipaths. : : : : : : : : : : : : : : : : 52 Single paths versus multipaths. : : : : : : : : : : : : : : : : : : : : : 63 Delays of single paths and multipaths. : : : : : : : : : : : : : : : : : 74 Illustration of conditions of Lemma 3.1. : : : : : : : : : : : : : : : : : 85 Region for segment of MP when C(MP1;MP2) is true. : : : : : : : : 96 Region for segment of MP when C(MP1;MP2) is false. : : : : : : : : 107 Intractability of MTP. : : : : : : : : : : : : : : : : : : : : : : : : : : 128 Example of ow-multipath. : : : : : : : : : : : : : : : : : : : : : : : : 149 Execution of MTA. : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1710 Illustrative example. : : : : : : : : : : : : : : : : : : : : : : : : : : : 1911 Migration of minimum end-to-end delay single paths. : : : : : : : : : 1912 Illustration of minimum end-to-end delay multipaths. : : : : : : : : : 2113 Topology of ESnet. : : : : : : : : : : : : : : : : : : : : : : : : : : : : 2214 Topology of NSFNET. : : : : : : : : : : : : : : : : : : : : : : : : : : 2415 Reduction of knapsack problem to STP. : : : : : : : : : : : : : : : : 26

iv

List of Tables1 End-to-end delays for Example 3. : : : : : : : : : : : : : : : : : : : : 202 End-to-end delays for connection between LBNL and ORNL. : : : : : 233 End-to-end delays for connection between Palo Alto and WashingtonDC. : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 25

v

AcknowledgementsAuthors are thankful to William G. Grimell for pointing out the suboptimality ofan earlier version of our algorithm MTA [34], and also for many discussions on thistopic. The support of this work by Laboratory Directed Research and DevelopmentProgram of Oak Ridge National Laboratory is acknowledged.

vi

AbstractWe address the problem of computing a multipath, consisting of possibly overlappingpaths, to transmit data from the source node s to the destination node d over acomputer network while ensuring deterministic bounds on end-to-end delay or deliveryrate. We consider two generic routing problems within the framework wherein thebandwidth can be reserved, and guaranteed (once reserved) on various links of thecommunication network. The �rst problem requires that a message of �nite length betransmitted from s to d within � units of time. The second problem requires that asequential message of r units be transmitted at a rate of � such that maximum timedi�erence between two units that are received out of order is no more than q. Weshow both problems to be NP-complete, and propose polynomial-time approximatesolutions. Our approximation algorithm to the �rst problem is an extension of theclassical Ford-Fulkerson's method. We present simulation results to illustrate theapplicability of this algorithm.Keywords and Phrases: Routing algorithms, quality of service, maximum owmethods, multiple paths, bandwidth reservation.

vii

1 IntroductionThe ability to provide user- or application-level guarantees that a transmission taskwill be performed under strict Quality of Service (QoS) requirements is vital to thedevelopment of next generation of network services. For example, a medical imageor a robot control packet may be required to be transmitted over a network withthe minimum end-to-end delay. Another example is the transmission of video overa computer network without undesirable delays and jitter. To ensure the perfor-mances typi�ed in these examples, it is necessary to employ methods that guaranteestrict end-to-end delay and/or rate. The expected rapid proliferation of services,such as multimedia, remote medicine, and remote robot laboratories, over the widearea networks, would require performances unprecedented in the currently availablebest-e�ort routing methods. In particular, the present Internet routing mechanisms(based on the best-e�ort paradigm) are unlikely to provide satisfactory end-to-endperformance for services required in these future applications [3, 31]. Thus, there isa de�nite need for architectures and algorithms that provide QoS guarantees beyondthose of the currently available ones.In QoS mechanisms for computer networks, in general, the parameters may beoptimized either at the network-level or at the user-level, and here we consider thelatter. In particular, we consider source-based routing algorithms [38] for two genericuser-level transmission tasks. In our framework, bandwidth can be reserved on thecommunications links, and, once reserved, is guaranteed for the required time pe-riod. While such requirements call for additional mechanisms, they in turn enableus to provide deterministic delay and/or rate guarantees. This type of bandwidthguarantees can be naturally supported in ATM networks [26] which are being in-creasingly employed (see Section 3.6). In other scenarios, for example the Internet,the bandwidth reservations may require additional mechanisms, such as RSVP [45],and speci�c queuing implementations at the routers [7]. Our framework is di�erentfrom the dynamic frameworks which utilize feedback mechanisms [24, 11, 4] to pro-vide only \soft" guarantees. Our potential applications include the transmission of:(a) �les of varied sizes, for example, ranging from small robot control packets to largeimage �les, and (b) data streams such as video-on-demand or robot vision data.We discuss algorithms that plan multipaths, consisting of possibly overlappingpaths, based on the available bandwidths on various links of the network. We formu-late the following two generic transmission tasks:(a) The �rst problem deals with transmitting a message of r units from the sources to the destination d over a network within � units of time. This problemabstracts services such as �le and image transfers, where a volume of dataneeds to be transmitted over the network.(b) The second problem deals with transmitting a sequence of r units, denoted byfu1; u2; : : : ; urg, from the source node s to the destination node d. Let ti,i = 1; 2; : : : ; r, be the time ui is received at d. The problem is to ensure thatti � tj � q for i < j, and, in addition, the message units must be received ata rate of �. Consider that s sends video or vision data to be played at d at1

a rate of �. Under this condition the destination can start playing the videowithin q time units after receiving the �rst message unit, while bu�ering nomore than q� message units at any time. This problem abstracts services suchas video-on-demand, video calls, and vision feedback to a teleoperator.We consider routing algorithms to compute multipaths from s to d for both theseproblems. It is assumed that the information about the available bandwidths of alllinks is centrally available or can be gathered when needed using a distributed algo-rithm. Once the routing algorithm is initiated, the bandwidths are \held available"until the paths are computed and the requests for bandwidths are received. Thus,it is critical that the routing algorithms be computationally e�cient in order not totie-up bandwidth during their execution. We show both problems to be computation-ally intractable (NP-complete [18]), and present approximation algorithms to ensurereasonable execution times.1.1 Relation to Prior WorkUtilization of multiple paths to provide improved performance compared to singlepaths has been explored extensively in the past for various network problems. Toname a few, some of the early works are due to [17, 21], and some of the recent worksare due to [4, 11]. In particular, the two problems described in the introductionbare resemblances to a number of network ow problems studied extensively in theeighties, and also to QoS problems of more recent origin. These two problems canbe formulated as (static) special cases of the well-known optimal routing problem[17, 6, 36, 21] using possibly non-di�erentiable cost functions, and can be solved bygeneral methods. A comprehensive treatment on the optimal routing problem andits solutions, and also on other similar problems can be found in [5]. Similarly, ourproblems can also be posed as special cases of the classical minimum cost ow prob-lem [1] studied in transportation and operations research. These general solutions,however, do not provide practical methods tailor-made to the present problems norindicate their computational complexity.On the other hand, our two problems are strictly more di�cult than the classicalmaximum ow problem without ow costs (see Section 3.5), for which polynomial-time algorithms are known [1]. In addition, our �rst problem is also similar to severalother problems studied in computer networks, but none of them provide practicalalgorithms. For example, the disjoint path methods [40, 39] are inadequate here sincethe minimum end-to-end delay could be achieved by a set of non-disjoint paths. Also,the solutions based on computing k shortest paths [41] do not yield the minimum end-to-end delay in our case; despite a super�cial resemblance of our algorithm to thisapproach, the solutions to these problems could be quite di�erent depending on thebandwidth values.A majority of recently proposed QoS routing algorithms that provide bounds onend-to-end delay and/or transmission rates are limited to single paths [44, 43] withthe possible exception of [27, 11, 4]. The �rst problem is extensively studied forsingle paths under the title of the quickest path problem [9, 35, 33]. Optimization2

of network-level parameters is discussed in [27], and their QoS parameters are notdeterministic. In spirit, our �rst problem is a special case of the one studied in [15],where the transmission task is speci�ed by several parameters. Our work di�ers from[15] in the following ways: (a) we employ multiple paths thereby achieving lowerend-to-end delays, (b) we prove the computational intractability of the problem, (c)we propose a polynomial-time approximation algorithm, and (d) our algorithm issource-based while their algorithm is distributed. Problems similar to our second oneare discussed in [11, 3, 24, 4], but their bandwidth guarantees are \soft" due to thedynamic nature of the formulation. In summary, despite the intuitive nature andpotential applicability, we are unaware of systematic algorithmic treatments of ourtwo problems.1.2 Contribution and Organization of the PaperThe deployment of the multiple paths for transmission problems seems intuitivelyobvious in that more the number of paths the larger will be the resultant bandwidth[38, 11]. Surprisingly, such increase in the bandwidth does not necessarily result ina lower end-to-end delay. Moreover, the addition of newer paths to existing set ofpaths, in fact, increases the end-to-end delay under certain conditions (see Example3.2). This curious phenomenon is due to the �niteness of r and non-zero values forthe delays. As a result, the traditional maximum ow methods based on augmentingexisting set of paths with additional paths [16, 28] are not applicable here. Weidentify conditions under which the additional paths that increase the bandwidthalso reduce the end-to-end delay (Section 3.2, Lemma 3.1). A careful applicationof this result to an adaptation of the classical maximum ow method (due to [14])yields a polynomial-time approximation algorithm for the �rst problem. We also showthat the classical maximum ow problem [16, 1] is a special case of the �rst problemwhen message size is su�ciently large or all link delays are zero. Although both ourproblems are abstract compared to real-life applications, a concrete analysis of theseproblems enables us to understand the underlying complexities of more complex tasks.In particular, the intractability of these problems motivates us to search only forapproximate solutions if real-time response is required (unless the P = NP questionis a�rmatively settled [18]). We also discuss simulation results that illustrate theapplicability of the proposed algorithms in the context of existing networks, namelyESnet and NSFNET. The total bene�ts of our methods, however, can be reaped inspecialized networks where bandwidth guarantees can be provided on all links.The organization of this paper is as follows: In Section 2, the two generic problemsare formulated precisely, and preliminaries are discussed. The message transmissionproblem is discussed in Section 3. The sequence transmission problem is discussed inSection 4.3

2 Problem FormulationWe consider a network represented by a graph G = (V;E) with n nodes and m edgesor links. Each edge e = (i; j) 2 E enables the transmission of messages at a bandwidthof B(e) � 0 units per second. There is a link delay D(e) � 0 for each message unitsuch that a message unit sent from node i at time t on link e will arrive at node j attime t +D(e). The link delay includes the preparation and propagation time of thelink. By pipelining the message units, a message of r units can be sent along the edgee in r=B(e) +D(e) time. An edge of bandwidth B and delay D can be visualized astwo parallel edges with bandwidths B1 and B2, B1+B2 = B, and each with delay D.Consider a path from i0 to ik given by (i0; i1); (i1; i2); : : : ; (ik�1; ik), where (ij ; ij+1) 2E, for j = 0; 1; : : : (k � 1). The path is simple if all i0; i1; : : : ; ik are distinct. Thedelay of this path P , denoted by D(P ), is given by k�1Pj=0D(ej), where ej = (ij ; ij+1).The bandwidth of this path, denoted by B(P ), is given by k�1minj=0 B(ej).A multipath from s to d, denoted by MP , is set of (possibly overlapping) simplepaths from s to d. The end-to-end delay, denoted by t(MP ), of a multipathMP froms to d is de�ned as the time required to send a massage of r units from s to d. WhenMP consists of a single path P , we often denoteMP = fPg by P to simplify notation.We now provide formal de�nitions of the two transmission problems outlined in theintroduction. We are given the computer network G = (V;E), the delays D(e) for alle 2 E, and the bandwidths B(e), for all e 2 E.Message Transmission Problem: (MTP)The task is to compute a multipath 1 MP to transmit a message of r unitsfrom source s to destination d over the network G = (V;E) such that the timeelapsed since the �rst unit was sent from s until the last unit is received at d isno more than � .Note that the message units can be received at the destination in any order, and itis only required that t(MP ) � � . The above problem is the decision version, and itsoptimization version requires the end-to-end delay be minimized over all multipathsbetween s and d.Sequence Transmission Problem: (STP)The task is to compute a multipath to transmit a message of r units, denotedby fu1 ; u2 ; : : : ; urg, from the source s to the destination d such that: (a) unitsarrive at d at a rate of �, and (b) ti � tj � q for i < j where ti is the time ui isreceived at d.An optimization version of STP requires that the bandwidth of the multipath betweens and d be maximized for any given value of q. Note that if a single path is employedfor transmission, then all message units will arrive at d in sequence, i.e. q = 0. Since1To avoid degenerate cases in the transmission problems, it is assumed that every path a solutionmultipath is utilized in transmitting at least one message unit.4

(B , D)(B , D)(B , D)

s d

(B , D) (B , D)1 1

222

path P

path P

1

2Figure 1: Widest-Shortest paths versus multipaths.multipaths are allowed, we can potentially bene�t from added bandwidth due toadditional paths, but, at the cost of units arriving out of order at d. By maintaininga bu�er of size q�, the sequence can be reconstructed and displayed at d at the rate� starting no later than q from the time �rst segment arrives at d. Note that theentire sequence will be reconstructed in time r=� + q at d. A variation of STP thatrequires a multipath MP such that: (a) t(MP ) � � , and (b) ti � tj � q, for i < j, isconsidered in Section 4 (STP will be shown to be a special case of this problem).3 Message Transmission ProblemIn this section we discuss some properties of multiple paths and then the computa-tional issues of MTP.3.1 Shortest-Widest PathsThe shortest-widest paths [44] have been employed as a mechanism for QoS routing,where a shortest-widest path is a path with the shortest delay among all paths withthe largest bandwidth from s to d. This method, however, does not exploit multipathsto decrease the end-to-end delay. Furthermore, even when single paths are employed,for certain ranges of message sizes and delays, smaller end-to-end delays may not beachieved by the shortest-widest paths [33].Example 3.1: Consider the network shown in Fig. 1 that has two disjoint pathsfrom s to d with 2 and 3 edges respectively. The delay of each edge is D units. Thebandwidth of each edge of path Pi is Bi, for i = 1; 2. Assume B2 > B1, and thus P2is the shortest-widest path. Now consider the transmission of a message of r = 1500units via P2 with D = 10 sec, B1 = 100 units/sec, and B2 = 200 units/sec. Theend-to-end delay due to path P2 is 1500=200+ 30 = 37:5 sec. On the other hand, ifpath P1 is used, the delay is given by 1500=100+ 20 = 35 sec. For this network, thecondition for P1 resulting in a smaller delay is r � DB1B2B2�B1 . Thus, for smaller values ofr � 2000, it is advantageous not to use the shortest-widest path.Now consider that the message of r = 1500 units is split into two messages of1166 and 334 units and sent on P1 and P2, respectively. Then, the submessages arereceived via P1 and P2 in 1166=100+ 20 = 31:66 sec and 334=200+ 30 = 31:67 sec,5

respectively. Thus, the entire message is transmitted in 31:67 sec which is smallercompared to using individual paths. 2In general, for a network consisting of two non-intersecting paths P1 and P2, suchthat B2 > B1 and D2 < D1, the end-to-end delay of the shortest-widest path is largerunder the conditions: (a) r < (D2�D1)B1B2B2�B1 , or (b) D2 > D1+ (B2�B1)rB1B2 . The end-to-enddelay can be minimized by a shortest delay path for very small values of r, and bya shortest-widest path for large values of r. For a given value of r, a single pathwith minimum end-to-end delay can be computed in O(m2+mn logn) using existingalgorithms [35, 9].3.2 Properties of MultipathsThe ow-methods are often proposed for various multiple path routing problems [8, 5].One of the widely used maximum ow method, the Ford-Fulkerson's method [16],computes the ow by repeated path augmentation. Such an augmentation methoddoes not always result in lower delays, and in fact, can increase the end-to-end delay(Example 3.2). By carefully controlling the augmentation process, we will derivean approximation algorithm for MTP in the next section; this algorithm requires adetailed examination of the properties of multipaths, which is presented here.Example 3.2: Consider a network consisting of paths P1 and P2 as shown in Fig. 2.Let B1 = 10 units/sec, B2 = 20 units/sec, D1 = 2 sec, and D2 = 12 sec. Consider amessage of size r = 100 units. Using path P1 alone for the transmission will result ina delay of 100=10 + 2 = 12 seconds. The delay of path P2 is 100=20 + 12 = 17 sec.On the other hand, let us say that 99 units are sent on P1 and 1 unit is sent on P2;the corresponding delays are given by 99=10 + 2 = 11:9 sec and 1=20 + 12 = 12:05sec respectively, resulting in an end-to-end delay of 12:05 seconds. Clearly, it isadvantageous not to use the multipath fP1; P2g for this message size.For r = 1000 the end-to-end delays of individual paths P1 and P2 are 102 sec and62 sec, respectively. If a single path is to be used, it can be seen that: for r > 200,P2 is the choice; for r < 200, P1 is the choice; and for r = 200 either can be chosen(see Fig. 3).Consider using a multipath fP1; P2g for r = 1000 such that 400 and 600 units aresent via P1 and P2 respectively, resulting in the individual delays of 42 sec for eachpath; hence, the resultant end-to-end delay is 42 sec which is smaller than that of P1or P2 (given by 102 and 62 sec, respectively).s d

path P

path P

1

2Figure 2: Single paths versus multipaths.6

message size

delay

100

30

400300200

20

10

Path P

Path P

1

2

}2

Multipath { P ,P1

Figure 3: Delays of single paths and multipaths.In general, the message size determines if multipath fP1; P2g achieves a lowerdelay. The allocation for the least end-to-end delay is given by: if r � 102 use P1,and use the multipath fP1; P2g otherwise. To see this consider that message is splitinto two parts of size x and r�x, and sent on P1 and P2, respectively. Then the delaydue to the multipath is max �r�x20 + 12; x10 + 2�, which is minimized at x = (r+200)=3.Thus the delay due to multipath is r=30+8:66. The delays of P1, P2 and fP1; P2g areshown in Fig. 3. The lower envelop of the various delay lines in Fig. 3 corresponds tothe optimal strategy for this case. 2MP can be visualized as a subgraph of G consisting of s and d such that everyedge of this subgraph is contained in a path of MP from s to d. Let P1; P2; : : : ; Pkdenote the paths that constitute MP . A cut C of the multipath MP is a set of itsedges whose removal disconnects s and d [16]. The bandwidth of a cut C of MPis the sum of bandwidths of edges of the cut. Among all cuts of MP , the one withminimum bandwidth is called the minimum cut. The bandwidth of MP , denoted byB(MP ), is de�ned as the bandwidth of its minimum cut. ForMP = fP1; P2; : : : ; Pkgwhen all Pi's are edge disjoint, we have B(MP ) = kPi=1B(Pi); in general, however, wecan only say B(MP ) � maxi B(Pi), with equality achieved when all paths contain thesame edge with minimum bandwidth.It is convenient to visualize a multipathMP as line segment in the interval [1; r].Here when message is of unit length the total delay of the path is D(MP ) = D,and as large messages are considered the delay increases with a slope of 1=B(MP ).Subsequently, we denote the segment of MP increasing from left-to-right or right-to-left depending on the context.We de�ne two paths P1 and P2 from s to d to be non-opposing if an edge (u; v)is on P1 then (v; u) is not on P2 and vice versa. This de�nition naturally extends tomultipaths. The following lemma identi�es conditions under which it is advantageousto employ multipath fP1; P2g.Lemma 3.1 Consider two simple non-opposing paths P1 and P2 from s to d. For7

D1

D2

r/B + D 11r/B +D2 2

r

r

r

1

2Figure 4: Illustration of conditions of Lemma 3.1.the multipath MP = fP1; P2g, we havet(MP ) � minft(P1); t(P2)gif and only if D(P1) + r=B(P1) � D(P2) and D(P2) + r=B(P2) � D(P1).Proof: First consider the if part. Let ri denote the number of units transmittedalong Pi, i = 1; 2. Then t(MP ) = maxfr1=B1 + D1; r2=B2 + D2g, for r = r1 + r2.Under the condition D(P1) + r=B(P1) � D(P2) and D(P2) + r=B(P2) � D(P1), thedelay is minimized when total delays of both paths are the same. To see this referto Fig. 4, where t(Pi) is linearly decreasing function of ri starting with ri = r. Byvisualizing (r1; r2) as a point on the interval [0; r] with r1 represented as segment[0; r1] and r2 represented as segment [r � r1; r], the minimum end-to-end delay isachieved under the condition D2 + r�r1B2 = D1 + r1B1 : This condition in turn yieldsr1 = B1B2B1+B2 (D2 �D1 + r=B2) which yields the total delay oft(MP ) = rB1 +B2 + B2D2 + B1D1B1 + B2 ; (3:1)which is no more than minft(P1); t(P2)g.For the only if part, consider the contra-positive that D(P1) + r=B(P1) < D(P2)(the other case is identical). This condition means that entire message can be sentvia P1 in time less than D(P2). Thus if a single unit is sent via P2, we have t(MP ) >t(P1) � minft(P1); t(P2)g. 2.The minimum end-to-end delay of fP1; P2g as per equation (3.1) is achieved bydividing the message into two parts of sizes r1 and r2, r1 + r2 = r, to be sent via P1and P2 respectively such thatr1 = B1rB1 + B2 + B1B2(D2 �D1)B1 + B2 and r2 = B2rB1 +B2 � B1B2(D2 �D1)B1 +B2 :An example when the condition of Lemma 3.1 is violated is shown in Fig. 6(a). Basedon Eq (3.1) the multipath fP1; P2g can be visualized as a single path with bandwidth8

B1+B2 and delay B2D2+B1D1B1+B2 , when the condition of Lemma 3.1 is satis�ed. Using thisargument, the proof of Lemma 3.1 can be generalized in an obvious way to multipathsMP1 and MP2. For ease of reference, we denote by C(MP1;MP2) the condition:D(MP1) + r=B(MP1) � D(MP2) and D(MP2) + r=B(MP2) � D(MP1):Lemma 3.2 Consider two non-opposing multipaths MP1 and MP2 from s to d, eachcomposed of a set of non-opposing paths. For the multipath MP = MP1 [MP2, wehave t(MP ) � minft(MP1); t(MP2)g if and only if C(MP1;MP2) is true.Lemma 3.2 identi�es the critical condition when an existing multipath can beaugmented with an additional path or multipath. The e�ect of forming multipathscan be schematically visualized as follows. Consider the multipath MP = MP1 [MP2, and Di = D(MPi), Bi = B(MPi), for i = 1; 2. Now it is direct that (a)D(MP ) � maxfD1; D2g, and (b) B(MP ) � maxfB1; B2g. The latter implies thatr=B(MP ) � minfr=B1; r=B2g. First consider that C(MP1;MP2) is true as shownin Fig. 5(a). Take the segment of MP2 that increases from right-to-left to identifythe shaded region that contains the segment of MP ; a typical example of segment ofMP is shown in dotted lines in Fig. 5(b). Now consider that C(MP1;MP2) is false,and assume that D2 > r=B1 +D1 as shown in Fig. 6. The shaded region shows therange in which the segment of MP lies with a typical example shown in dotted lines.The following lemma establishes a critical property of optimal multipath solutioncomposed of a set of mutually non-opposing paths.Lemma 3.3 Let fP1; P2; : : : ; Ppg denote the set of mutually non-opposing paths suchthat D1 < D2 < : : : < Dp, where Di = D(Pi) and Bi = B(Pi) for i = 1; 2; : : : ; p.For the minimum end-to-end delay T �, we have T � = riBi +Di, for all i = 1; 2; : : : ; p,where ri is the part of r routed via path Pi. Let �Bj = jPi=1Bi and �Dj = jPi=1BiDijPi=1Bi . TheD

1D

2

r/B + D 11r/B +D2 2

D2

r/B + D 11

r r

r/B +D2 2

D1

(a) (b)Figure 5: Region for segment of MP when C(MP1;MP2) is true.9

r

D1

D2

r/B + D 11

r/B +D2 2

r

D1

D2

r/B + D 11

r/B +D2 2

(a) (b)Figure 6: Region for segment of MP when C(MP1;MP2) is false.minimum end-to-end delay for any r is given byT � = 8>>>>>><>>>>>>: r�B1 + �D1 if 0 < r � �B1(D2 �D1)r�B2 + �D2 if B1(D2 � �D1) < r � B1(D3 � �D2)� � �r�Bj + �Dj if �Bj�1(Dj � �Dj�1) < r � �Bj(Dj+1 � �Dj)� � �Proof: Let ri; rj be such that T � = riBi + Di and rjBj + Dj = T � � �, for some� > 0. Without loss of geenrality assume that Bi; Bj > 1. By reassigning themessages such that r̂i = ri � � and r̂j = rj + �, we have Ti = riBi + Di + �Bi < T �and Tj = rjBj +Dj � �Bj < T �. The same procedure is repeated for all k's such thatrkBk + Dk = T � to obtain a new multipath with delay smaller than T �, which is acontradiction. The expression of T � follows a repeated application of Eq (3.1). 2Given fP1; P2; : : : ; Ppg as in Lemma 3.3, all intervals in the lemma can be com-puted in O(p) time as follows. First we compute �Bj , �Dj, j = 1; 2; : : : ; p, in O(p) timeby using �Bj+1 = �Bj +Bj+1, and �Dj+1 = Bi+1Di+1+ iPj=1BjDj�Bj+1 . Then, for any given r themultiple path with minimum end-to-end delay is obtained using the expression forT � in Lemma 3.3.3.3 NP-Completeness of MTPWe now consider the problem of determining a multiple path with end-to-end delayno more than � 2.Theorem 3.1 The decision version of the message transmission problem is NP-complete.2In [34], it was erroneously claimed by us that MTP is polynomial-time solvable by using asimpli�ed version of the algorithm MTA of the next section. A counter example was suggested byGrimmell [20], a generalization of which lead us to the intractability proof presented here.10

Proof: Given a solution to MTP in terms of individual paths of the multipath andthe corresponding message sizes, the bound � can be veri�ed by computing the end-to-end delay of each path. This computation can be performed in polynomial-timeand hence MTP is in NP.Consider the Subset Sum Problem (SSP) [18] where we are given a set A =fa1; a2; : : : ; apg and a map s : A 7! (0; K), for some K > 0, and a real numberh. (the original subset problem is speci�ed in terms of integer values of s(:), and issubsumed by the present problem). We are required to decide if there exist a subsetA0 � A such that Pa2A0 s(a) = h. Without loss of generality we assume that s(a) < 1=2for all a 2 A. We reduce an instance of this problem to p instance of MTA suchthat a solution to the former exists if and only if a solution to one of the instance ofMTA exists. The kth, k = 1; 2; : : : ; p, instance of MTA is speci�ed as follows: Thenetwork consists of p modules as shown in Figure 7, where module i, denoted by Ci,corresponds to unique ai 2 A. Each module consists of �ve edges with the bandwidthand delays assignments as shown in Figure 7(b), where each edge is represented bya pair of numbers corresponding to the bandwidth and delay respectively. We posethe question that a message of size r = 40p can be transmitted with the end-to-enddelay of � = 120� 8k "140n� 66k + 10 pXi=1 s(ai) + 8h# :Note that the required instances MTA are computed in time polynomial in p. Firstnote that any two paths in two di�erent modules of G are non-opposing. Moregenerally, two multipaths each composed of non-opposing paths but in two di�erentmodules are non-opposing.Let us consider some special properties of multiple paths in Figure 7. First considereach module as in Figure 7(b) whose paths consist of P1; P2; P3; P4 with delays 3+�i,4+�i, 6, 9+�i+�i. Under the condition �i; �i < 1, these delays are strictly increasing.By an exhaustive analysis, the following is the listing of multiple paths with minimumend-to-end delay for various values of r.range for r paths t's of paths t of multipath[0; (8� 8�i + 8�i)] P1 r=8 + 3 +�i r=8 + 3 +�i[(8� 8�i + 8�i); P1 r=8 + 3 +�i r=10 + (32 + 8�i + 2�i)=10(28� 8�i � 2�i)] P2 r=2 + 4 + �i[(28� 8�i � 2�i); P1 r=8 + 3 +�i r=12 + (44 + 8�i + 2�i)=12(40� 20�i + 10�i)] P2 r=2 + 4 + �iP3 r=2 + 6[(40� 20�i + 10�i);1] P10 r=10 + 6 +�i r=20 + (10 + �)=2P20 r=10 + 4 + �iThe range of values for r of interest to us is [(28 � 8�i � 2�i); (40 � 20�i + 10�i)]where the multipath is either MPi = fP1; P2; P3g or MPi0 = fP10; P20g as indicatedin the rows 3 and 4, respectively, in the above table. For module i, we choose chooseeither MPi or MPi0 depending on the precise values of r and � .11

ds

(a)

C

C

C

1

2

p

Ci

P

P

P1

2

3

P4

(b)

10,1

10,5

10,3+

10,18,1+ i∆

δiFigure 7: Intractability of MTP.12

For lower values of r one could choose MPi for each module which yields by Eq(3.1) a multipath with the end-to-end delayt(MPall i) = r12n + 4412 + 112n pXi=1[8�i + 2�i]:For very large values of r, one could choose MPi0 for each module which yields amultipath with the following end-to-end delayt(MPall i) = r20n + 5 + 12n pXi=1 �i:For intermediate values of r, a selection of choices between MPi and MPi0 for eachCi would be required. Let A1 corresponds to modules for which MPi's are chosenand A2 = A n A1 corresponds to modules for which MPi0 's are chosen. Then, theend-to-edn delays of multipaths of modules corresponding to A1 and A2 are given asfollows, respectively: for jA1j = k, we havet(MPA1) = r12k + 4412 + 112k Xai2A1[8�i + 2�i]t(MPA2) = r20(p� k) + 5 + 12(p� k) Xai2A2 �i:The resulting multipath has the following end-to-end delayt(MPA1 ;A2) = r20p� 8k + 100p� 66k20p� 8k + 120p� k 2410 pXi=1 �i + 8 Xai2A1(�i � �i)35 :We now choose �i = s(ai) and �i = 2�i. Then, we have t(MPall i) = t(MPall i0)for r = 40p which corresponds to the message size that is too large for MPall i andtoo small for MPall i0 . Hence a combination of MPi's and MPi0 is needed, and thecorresponding end-to-end delay is given byt(MPA1 ;A2) = 120p� 8k 24140p� 66k + 10 pXi=1 s(ai) + 8 Xai2A1 s(ai)35 : (3:2)Now a solution to SSP with jA0j = k yields the required solution for the MTPwhere MPi's are chosen for the modules corresponding to A1 = A0. Furthermore, anysolution to the multipath problem with r = 40p must correspond to a combination ofMPi'a nd MPi0's since it is too large forMPall i but too small forMPall i0. No matterwhich k is chosen the end-to-end delay is achieved by a path such that h = Pai2A1 s(ai).Thus, given a solution to the multipath problem, the set A1 yields the solution A0 asper the equation (3.2). 2 13

3,1

1,31,3

1,1

2,2

3,2

P1

P2

P3Figure 8: Example of ow-multipath.3.4 Approximate Routing AlgorithmWe now present an algorithm that computes a multipath to approximately solvethe optimization version of MTP. Familiarity with the basic ideas of Ford-Fulkersonmethod and its implementations is assumed in this section (a complete discussioncan be found in [12, 1]). We de�ne capacity for u; v 2 V as follows: c(u; v) is thebandwidth of the edge (u; v) if such edge exists, and is 0 otherwise. Let ow in Gbe a real-valued function f : V � V 7! < such that: (i) for all u; v 2 V , we havef(u; v) � c(u; v); (ii) for all u; v 2 V , we have f(u; v) = �f(v; u); and (iii) for allu 2 V � fs; dg, we have Pv2V f(u; v) = 0. The residual capacity of (u; v) is de�ned bycf(u; v) = c(u; v)� f(u; v). Given G = (V;E) and the ow f , the residual network ofG induced by f is Gf = (V;Ef), whereEf = f(u; v) 2 V � V : cf(u; v) > 0g:Given a ow f , we compute a ow-multipath as follows:algorithm FP1. MP �;2. while there is a path from s to d do3. Pf shortest delay path in Gf via edges with positive ow;4. fP mine is on Pf f(e);5. for each edge (u; v) on Pf do6. f(u; v) f(u; v) � fP ;7. MP MP [ fPfg;Note that the constituent paths of a ow-multipath are non-opposing. Hence,for any message size r, the minimum end-to-end delay is given by Lemma 3.3 byinterpreting the ow values as bandwidths. The time complexity of this algorithm isO(m2 +mn logn), since there are at most m paths in MP and a shortest path canbe computed by using Fibonacci heaps in O(m+ n logn) time [12].Example 3.3: Consider Fig. 8 where the pair of an edge corresponds to ow anddelay, respectively. The decomposition of the ow yields the ow-multipath MP =fP1; P2; P3g: 14

algorithm MTAInitialization1. P0 shortest delay path on G;2. i 1; MP0 fP0g;M fMP0g;3. cf(P0) min(u;v) is in P0 cf(u; v);4. for each edge (u; v) in P0 do5. f [u; v] = cf(P0);6. f [v; u] = �f [u; v];Repetitive ow augmentation7. while there is a path from s to d on Gf do8. Pf shortest delay path on Gf ;9. cf (Pf) minfcf (u; v) : (u; v) is in Pfg;10. for each edge (u; v) in Pf do11. f [u; v] = f [u; v] + cf(Pf );12. f [v; u] = �f [u; v];13. if MPi and Pf are non-opposing do14. MPi MPi�1 [ fPfg;15. else16. compute ow-multipath MPf with minimum end-to-end delay for r;17. MPi MPf ;18. M M[ fMPig;19. i i+ 1;Return the multipath20. return MP arg minMPi2Mft(MPi)g;Algorithm 1. Algorithm for solving MTP.path end-to-end delayP1 r=1 + 2P2 r=2 + 5P1 r=1 + 8By treating fP as the bandwidth, we obtain the minimum end-to-end delay of MPas r=4 + 5 for large r by equation (3.1). 2The routing algorithm MTA is described in Algorithm 1, which is similar in spiritto the classical Ford-Fulkerson method. The algorithm initializesMP with the short-est delay path P0 in lines 1-6. Then a new path Pf which is the shortest delay path onGf is repeatedly added to MP in each iteration in lines 7-12, isMPi and Pf are non-opposing. If they are opposing, then the corresponding ow-multipath is computed inlines 16-17. With each additional path the ow function is suitably adjusted in lines10-13. After all multipaths ofM are computed, the one with minimum end-to-end15

delay is computed based on Lemma 3.2 in Line 20.Example 3.4: The execution of the algorithmMTA for the network shown in Figure9(a) yields a multipath consisting of three paths indicated below.path end-to-end delayP1 r=10 + 2P2 r=10 + 3P1 r=5 + 5These paths are mutually non-opposing and hence are sequentially added to themultipath.We now consider the example in Figure 9(b). The �rst three paths listed beloware non-opposing and hence result in a multipath fP1; P2; P3g with the end-to-enddelay given by r=15 + 4. path end-to-end delayP1 r=5 + 3P2 r=5 + 4P1 r=5 + 5Then next path P4 with t(P4) = r=5+8 contains a ow of 5 in the direction oppositeto that P1. Then the resultant ow yields two paths P10 and P20 shown in Figure 9(c).2 We �rst note that the number of path augmentations in the algorithm MTA isupperbounded by those in the case when the algorithm is executed with a su�cientlylarge r { in such case this algorithm reduces to the maximum ow algorithm, as willbe shown in Lemma 3.4. The runtime of algorithm MTA is estimated by using theresult of Edmonds and Karp [14] who showed that the number of ow augmentationsis upperbounded by S P(u;v)2E 1=[a(u; v)+a(v; u)]+m, where S is the maximum weightof a path from s to d, and a : V � V 7! < such that a(u; v) + a(v; u) > 0. By usingthe delay of edges, we conclude that number of augmentations is upperbounded bynmdmaxdmin +m, where dmax = maxe2E D(e) and dmin = mine2E D(e). Thus the time complexityof this algorithm is given by O(nm3dmaxdmin + n2m2 logndmaxdmin ).Summarizing the above discussion, we have the following theorem.Theorem 3.2 The message transmission problem for a network G = (V;E) of nnodes and m edges can be approximately solved in O(nm3dmaxdmin + n2m2 logndmaxdmin ) time,where dmax = maxe2E D(e) and dmin = mine2E D(e).3.5 Relation to Maximum Flow AlgorithmWe now discuss the relationship between the algorithm MTA and the maximum owalgorithm by showing that in two special cases the former reduces to the latter.When all link delays are zero, the end-to-end delay is constrained by the bandwidth16

P

P

P

(a)

1

3 2

10,215,1

15,110,1

5,1

10,1

10,110,4

10,3

5,1

(b)

P

P

P1

2

3

P4

10,1

10,110,4

10,3

5,1

P1

P2

(c)Figure 9: Execution of MTA.17

only, i.e., � = r=B�, where B� is the bandwidth of minimum-cut of s and d inG. In this case Lemma 3.2 is satis�ed in each augmentation, and the algorithmMTA reduces to a variation of Edmonds-Karp algorithm [14] (which is an implemen-tation of Ford-Fulkerson's method). Another case is when r is large enough thatthe condition of Lemma 3.2 is satis�ed in each execution of line 13. For exampler > Pe2ED(e)=mine2E B(e) is a su�cient condition. Intuitively, if the length of the mes-sage is su�ciently long, then link delay becomes an insigni�cant contributor to theend-to-end delay, which will be controlled entirely by the bandwidth (given by theminimum cut). A su�cient condition under which the maximum ow algorithm solvesMTP is given in the following lemma.Lemma 3.4 Let Dmax and Dmin denote the delays of the longest and shortest pathsin G, respectively. A su�cient condition for a solution of the maximum ow problemto be a solution of the message transmission problem is that length of the messagebe at least Bmin(Dmax �Dmin) where Bmin is the bandwidth of the minimum cut thatseparates s and d.Proof: Consider the hypothetical path ~P with delay Dmin and bandwidth Bmin. LetPmax denote the path with longest delay in G. Let for a particular message sizer, condition C( ~P; Pmax) be false, i.e. Dmin + r=Bmin < Dmax. By increasing r tor +�r, condition C( ~P; Pmax) can be ful�lled if �r=Bmin � Dmax � (Dmin + r=Bmin)or equivalently r + �r � Bmin(Dmax � Dmin). Thus for a message of size at leastr0 = r+�r, conditionC( ~P; Pmax) is true. Now for any multipathMP with bandwidthB and delay D we have B � Bmin and D � Dmin, which implies Dmin + r0=Bmin �D + r0=B. Hence condition C(MP;Pmax) is true for message of size r0. Thus in theexecution of the algorithmMTA, condition C(MPi; Pf) is satis�ed in every iteration.23.6 Simulation ResultsWe present three simulation examples to illustrate the performance and relevance ofthe proposed multipath algorithm. The �rst example is entirely illustrative, and theother two examples are based on existing networks, namely ESnet and NSFNET 3.Although the latter two networks are not speci�cally designed to provide bandwidthguarantees, the ATM portions of these networks can be utilized to support the pro-posed multipath algorithm. In all three examples a comparison with the single pathalgorithm of [33] that guarantees minimum end-to-end delay is also provided. Themultipath algorithm is implemented in C on a SPARC workstation. The executiontimes of the algorithm is no more than a few seconds in all the examples.3The topologies used here for both the networks are virtual in that they do not necessarily re ectall speci�c details of the actual physical connections. This level of abstraction is chosen here mainlyas a means of illustrating the salient features of the proposed algorithm. More details about variousconnections can be incorporated into the proposed algorithms without changing the overall natureof our conclusions. 18

1000

1000

50

1000

1000151000 1000

35

1000

100045 20 10

1000

1000

10

10001000

1000

20

1000

3

1000

1

2

3

4

5

6

1000

30

15

1000Figure 10: Illustrative example.1

2

3

4

5

6

1

2

3

4

5

6

(b) r=10,000

(c) r=100,000

(a) r=10

(d) r=1,000,000

1

2

3

4

5

6

1

2

3

4

5

6

Figure 11: Migration of minimum end-to-end delay single paths.19

message size end-to-end delaysingle path multipath10 65.66 65.66 (one path)1,000 131.66 115.83 (two paths)10,000 600.00 415.83 (two paths)20,000 1020.00 749.16 (two paths)30,000 1030.00 1002.40 (three paths)40,000 1040.00 1012.11 (three paths)50,000 1050.00 1021.81 (four paths)60,000 1060.00 1031.47 (four paths)70,000 1070.00 1041.13 (four paths)80,000 1080.00 1050.77 (�ve paths)90,000 1090.00 1060.11 (�ve paths)100,000 1100.00 1069.46 (�ve paths)1,000,000 2000.00 1910.58 (�ve paths)Table 1: End-to-end delays for Example 3.Example 3.5: Consider the network shown in Figure 10. The main purpose of thisexample is to illustrate (perhaps in an arti�cially enhanced manner): (a) e�ects ofbandwidths and link delays on the end-to-end delay, and (b) bene�ts of multipathrouting. The link delay and the bandwidth for each link are represented by the samenumber indicated in Figure 10; thus, in an overall sense, paths with higher delays alsohave higher bandwidths. Consider that s = 5 and d = 2. The shortest delay path is5� 3� 4 � 2 with a delay of 65 and bandwidth of 15 (Figure 11(a)). The shortest-widest path is 5� 2 with a delay of 1000 and bandwidth of 1000 (Figure 11(c)). Asr is varied from 10 to 106, the corresponding single paths with minimum end-to-enddelay migrate from low bandwidth to higher bandwidth paths as shown in Figure 11.The multipaths that minimize end-to-end delay are shown in Figure 12. For smallvalues of r (=100) the multipath coincides with the above single path, but for highervalues more paths are added. For r = 100 the multipath consists of 5 � 3 � 4 � 2only, and the path 5 � 3 � 1 � 2 is added for r = 1; 000. Then the shortest-widestpath 5 � 2 is added when r = 30; 000. For r = 50; 000 and r = 80; 000, the paths5 � 4 � 2 and 5 � 1 � 2 are added, respectively. Note that although the number ofpaths of the multipath increases with the message size, for any �xed message size,only a certain subset of paths must be chosen. Furthermore, in general, multipathsare clearly advantageous over single paths as illustrated in Table 1. 2Example 3.6: Consider the topology of ESnet shown in Figure 13. The link delaysare computed by using an approximate length of the link in miles and suitably ad-justed velocity of light. The full bandwidth of T1 and OC3 connections are assumedto be available for this illustration. When a single path is employed, the message20

1

2

3

4

5

6

1

2

3

4

5

6

(a) r=10 (b) r=1,000

(c) r=30,000

(f) r=1,000,000(e) r=80,00

1

2

3

4

5

6

1

2

3

4

5

6

1

2

3

4

5

6

1

2

3

4

5

6

(d) r=50,000

Figure 12: Illustration of minimum end-to-end delay multipaths.21

Figure 13: Topology of ESnet.transmission between Oak Ridge National Laboratory (ORNL) and Argonne NationalLaboratory (ANL) is via direct T1 link at 1.54 mbits/see for r = 10; 000. For largermessage sizes, for example r = 105, the end-to-end delay is minimized by the OC3connection at 155 Mbits/sec via Chicago. Note that the minimum end-to-end delayis not achieved by the OC3 connection for r = 10; 000, in spite of its larger band-width. When multipaths are employed, for low values of r the multipath consists ofthe direct T1 link. For r = 105, the end-to-end delay is minimized by the multipathconsisting of the direct T1 link and OC3 connection via Chicago. For r = 106, theoptimal delay for a single path is 25.017ms and that of a multipath is 24.815ms.Now consider a more complicated scenario of transmitting from Lawrence Berke-ley National Laboratory (LBNL) to ORNL. For large message sizes the single pathsrealize a bandwidth of 155Mb/s corresponding to the OC3 connection. The multipathrealizes a bandwidth of 158Mb/s corresponding to the seven paths shown in Table 2.The initial path is via OC3 connection with the largest bandwidth which is sequen-tially augmented by a number of rather diverse paths for larger message sizes. Forexample, the third path is via OC3 connection between PNNL and ANL but usingonly the bandwidth of T1 connection because of the bottleneck connection betweenLBNL and PNNL. Although the advantages of the multipath are more pronounced atlarge message sizes, reduction of end-to-end delay by few ms observed at low messagesizes could be vital in applications such as remote robot control. 2Example 3.7: Consider the topology of NSFNET shown in Figure 14. The link de-lays are computed as in the previous example. The available bandwidths are assumedto be around the general values for a link but are suitably chosen to illustrate thee�ect of message size; more precisely, only a portion of the bandwidth is assumed asfollows: Each link to regional nodes has a bandwidth of 1 Mbits/sec, each link to asupercomputer has a bandwidth of 10 Mbits/sec, and links to Chicago, Washington22

message size end-to-end delay (ms)single path multipath1.0M 20.940 20.940 (one path)Initial path:LBNL - OAK POP - ORNL1.2M 22.232 22.230 (two paths)Newly added path:LBNL - SLAC - CIT - UCLA - GA - CHI POP -- ANL - ORNL1.5M 24.167 24.157 (two paths)2.0M 27.393 27.351 (three paths)Newly added path:LBNL - PNNL - CHI POP - ANL - ORNL2.3M 29.329 29.264 (six paths)Newly added paths:LBNL - PNNL - SPRINT POP - JLAB - ORNLLBNL - TWC - LLNL - GA - SPRINT POP -- JLAB - ORNLLBNL - TWC - LLNL - GA - SNLA - UTA -- FSU - ORNL2.5M 30.619 30.519 (seven paths)Newly added path:LBNL - TWC - LLNL - OAK PROP - SLAC - CIT -- UCLA - GA - SNLA - UTA - FSU - ORNL10.0M 79.006 77.502 (seven paths)Table 2: End-to-end delays for connection between LBNL and ORNL.23

message single end-to-endsize path delay (ms)1,000 P1: Palo Alto - VBNS - Washington DC 19.1110,000 P2: Palo Alto - NSP#1 - Washington DC 24.95100,000 P3: Palo Alto - NSP#2 - Chicago - VBNS - 29.45- Washington DC1,000,000 P3: Palo Alto - NSP#2 - Chicago - VBNS - 74.45- Washington DCmessage size multipath end-to-end delay (ms)1,000 P1 19.1110,000 P1 and P2 23.2110,000 P1, P2 and P3 27.81100,000 P1, P2 and P2 62.42Table 3: End-to-end delays for connection between Palo Alto and Washington DC.DC have a bandwidth of 20 Mbits/sec. The available bandwidths between Palo Altoand NSP#2, NSP#1 and VBNS are taken to be 20, 5 and 1 Mbits/sec. For the trans-mission of a message from Palo Alto to Washington DC, the single and multipathsthat achieve minimum end-to-end delay are given in Table 3. Note that the singlepaths with minimum end-to-end delay migrate via VBNS, NSP#1 and NSP#2 as themessage size is increased. Also, notice that the multipaths achieve lower end-to-enddelays for message sizes of the order of 1000 and higher. 23.7 Delay-Bandwidth ProductFor large message sizes, the bandwidth is the main contributing factor to the end-to-end delay, and hence MTP can be solved by classical ow augmentation methods.Such algorithm is not guaranteed to work if delay is also a signi�cant factor in theend-to-end delay, which happens when the two terms of the end-to-end delay areapproximately equal, namely r=B � D or equivalently r � DB. Thus, the Delay-Bandwidth Product (DBP) is a \�rst-level" indicator of the range of message sizefor which the both the link-delay and bandwidths are dominant contributors to theend-to-end delay. The message size must be order of magnitude larger than DBP forthe bandwidth to be the dominant factor, and must be order of magnitude smallerthan DBP for the delay to be the dominant factor.We now consider two illustrative cases to compute typical values for DBP, andrelate them to some �le sizes that arise in certain real-life applications. The �rst ex-ample is a gigabit network (such as MAGIC or ATDNET) that uses OC48 switch witha bandwidth of 2.4 gigabits/sec. Consider such connection across the United Stateswhich is about 3000 miles long. The delay of this connection is approximately 32 mil-liseconds [30]. Then the DBP for this case is about 106 bytes, which is typical of highresolution medical or satellite images. The second example is ESnet that currently24

Figure 14: Topology of NSFNET.25

supports and is expected to support connections over OC3 and OC12 ATM switches,respectively, at the rates of 155 and 622 Mbits/sec. For a 3000 mile connection usingthis network DBP is of the order of 8� 104 and 0:25� 106 bytes, respectively. Thesesizes are typical of medium to low resolution medical images, images used in facerecognition applications, and ccd vision images used in remote robotic applications.4 Sequence Transmission ProblemWe �rst show that STP is NP-complete, and then solve a special case where the pathsare disjoint using a polynomial-time algorithm. Then we present a polynomial-timealgorithm that yields an approximate solution to the optimization version of STP.4.1 Intractability ResultsTheorem 4.1 The sequence transmission problem is NP-complete.Proof: First, the problem is NP since it can be solved by checking the conditionfor each multipath i.e. subset of paths from s to d (each such checking can be donein polynomial time). We now show the theorem by reducing the knapsack problemto the above problem. We restate the knapsack problem as follows [18] for the easeof explanation. We are given n items indexed by i = 1; 2; : : : ; n such that vi and cidenote the value and cost of ith item, respectively. The problem is to decide if thereexist a subset of items indexed by i1; i2; : : : ; ik such that kPj=1 vij � A and kPj=1 cij � Cfor two given real numbers A and C. We generate an instance of STP so that itssolution exists if and only if the solution to the instance of the knapsack problemexists.Let S = nPi=1 vi:We generate a network of n+3 nodes denoted by fs; 1; 2; : : : ; n; n+1; dg arranged linearly as shown in Fig. 15. Each edge is represented by a parameterpair (v; c) where v and c are bandwidth and delay, respectively. The ith item, i =1; : : : ; n, of the knapsack problem is represented by the pair of nodes i and i + 1.There are four types of edges in the graph. There is one outer edge from s to d withthe parameter pair �n nPi=1 vi; 0� = (nS; 0): There are n lower edges, one each betweennode s and node j, j = 1; 2; : : : ; n with the parameter pair � nPi=1 vi + vj ; 0� = (s+vj; 0)There n upper edges, one each between node j+1, j = 1; 2; : : : ; n and node d with theparameter pair � nPi=1 vi + vj ; 0� = (s+ vj ; 0). There are 2n middle edges generated asfollows. There are two parallel edges between the nodes i and i+1, for i = 1; 2; : : : ; nwith parameter pairs �k nPi=1 vi; tj� and (0; 0), which are called the upper-middle andlower-middle edges, respectively.Given an instance of the knapsack problem, we generate an instance of STPwhich requires �nding a multipath MP = fP1; P2; : : : ; Plg, for some l, such that26

2

(S+v ,0)i

(kS,c )2

(S+v ,0)i

ds

(nS,0)

(kS,c )

(S+v ,0)

i

2 i i+11 3 n+1. . . . . .

.

.

.

(0,0)

Figure 15: Reduction of knapsack problem to STP.B(MP ) � (n + k) nPi=1 vi + A and maxi;j jD(Pi) � D(Pj)j � C. Due to presence ofthe outer edge from s to d with 0 delay, the latter condition can be reduced tomaxi D(Pi) � C, because the outer edge must be included in the solution to realizethe required bandwidth term n nPi=1 vi. Note that the instance of STP can be generatedin polynomial time.Now we show that a solution to the multipath problem exists if and only if thesolution to the knapsack problem exists. First, given the solution to the knapsackproblem, we generate the solution to the STP as follows. For each item j in thesolution of the knapsack problem we generate one path from s to d with bandwidthS+ vj by choosing the path consisting of the lower edge (s; j), the upper-middle edge(j; j+1) and the upper edge from j+1 to d. Note that this path has a bandwidth ofS+ vj and the delay of cj . By combining these paths with the outer path, we achievea total bandwidth of nS + kS + kPj=1 vij � (n + k)S + A. For every item not in thesolution of the knapsack problem, we choose the lower-middle edge with delay 0. Leta be the lowest index of the chosen items in the knapsack solution. Then the longestpath in the multipath consists of the lower edge (a; k) followed by the sequence ofupper-middle and lower-middle edges corresponding to the items included and notincluded, respectively, in the solution of the knapsack problem. Clearly, the delay ofthis path is upper bounded by kPj=1 cij . Thus the resultant multipath satis�es both therequired conditions.Consider that a solution to the multipath problem is given. The lower edgescontained in the multipath form a cutset that separates s and d, hence they musthave been chosen to yield a total bandwidth of at least (n+k) nPi=1 vi+A. Consider all27

the upper-middle edges (j; j + 1) in the multipath; such j is called a non-zero nodes.The items corresponding to to all non-zero nodes yield the solution to the knapsackproblem as follows. The delay of the longest path is upperbounded by the sum of ci'sof the edges, which is upperbounded by C, thereby satisfying the �rst condition ofthe knapsack problem. We now show that the second condition is also satis�ed. Firstadd lower edges of the form (s; j) to each non-zero node j, and such addition stillsatis�es the bandwidth condition of the multipath, since bandwidth only increases asa result. Now consider the lower edges (s; j), where j in not a non-zero node, i.e. onlylower-middle edge (j; j + 1) is present in the multipath. The paths containing thesequence of these two edges do not contribute to the bandwidth, since they have zerobandwidth. Thus the sum of bandwidths of all lower edges between s and non-zeronodes is at least (n + k) nPi=1 vi + A, which implies the sum of vj's of the non-zeronodes is at least as large as A. Thus the second condition of the knapsack problemis satis�ed. 2A number of problems dealing with the computation of single paths with multipleconstraints such as delay, jitter, bandwidth, etc., have been shown to be NP-completein [23, 44, 37]. These results do not imply NP-completeness of STP, since multipathsare allowed here. For example, the problem studied in [44] dealing with minimizingjitter while ensuring bandwidth over a single path cannot be reduced to STP. Sincethe existence of a single path implies multipath but not vice versa, the former is notsubsumed by the latter by the restriction.Consider a combination of STP and MTP that requires a multipath MP suchthat: (i) the total delay of MP is no more than � , and (ii) ti � tj � q for all i < j.As shown in the last section in Lemma 3.4, this problem reduces to MTP undersu�ciently large message size r, when the delay condition can be expressed as acondition on the bandwidth. Thus this problem is NP-complete by restriction (notethat the problem is in NP since a solution can be veri�ed in polynomial time).4.2 Approximation AlgorithmWe �rst consider a variation of STP stated as follows:Disjoint path STP:We are given the set of vertex-disjoint paths, fP1; P2; : : : ; Psg and two realnumbers B and T . Does there exist a subset of paths Pi1 ; Pi2 ; : : : ; Pik such that(i) kPj=1B(Pij ) � B, and (ii) maxj;l jD(Pij )�D(Pil)j � T ?.Since the paths do not intersect, the bandwidth of the multipath fPi1 ; Pi2g, i1 6= i2, isB(Pi1)+B(Pi2). Assume that D(Pi1) � D(Pi1) and D(Pi2)�D(Pi1) � T . Now everypath Pi3 such that D(Pi1) � D(Pi3) � D(Pi2), (i3 6= i1 and i3 6= i2), can be addedto fPi1 ; Pi2g to increase the bandwidth, while still guaranteeing the condition onthe di�erence between longest and shortest paths. This idea leads to the algorithmDisjoint-STP. Note that the disjointness of the paths is important in ensuring thecondition on the delay: if paths intersect, the length of the longest path can change,28

algorithm Disjoint-STP1. let P(1); P(2); : : : ; P(s) be sorted list of paths according to increasing delay D(:)2. for j = 1 to s do3. compute largest k such that D(P(j+k)) � T +D(P(j));3. BWi kPi=0D(P(j+i));4. BW � mini BWi;5. if BW � � B then return yes6. else return noAlgorithm. 2. Algorithm disjoint path STP.since the resultant multipath can be any subgraph (the problem of obtaining anupperbound on the length of longest path is NP-complete [18]).The algorithm Disjoint-STP solves the disjoint path STP with a time complexityof O(s2). When the paths are constrained to be disjoint, some of the bandwidththat could otherwise be obtained by combining the paths is not utilized. However,there could be other advantages such as higher fault tolerance, which might providejusti�cation for giving up part of the bandwidth. Algorithms speci�cally designed forsuch problems have been proposed in [42].We now present a heuristic algorithm, Approx-STP, to approximately solve theSTP by combining the above algorithm with the algorithm MTA of the last section.The outline of the algorithm is as follows. Starting with the shortest-widest delaypath, new paths are added to current MP such that (a) all edges of the added pathare removed from Gf , and (b) shortest-widest path in residual Gf is computed. Theaddition is continued until just before the step that results in exceeding q. Thenthe multipath is returned as a candidate. Then the initial path of the multipathis removed from current MP , and the addition of the paths is continued until thenext candidate is found. This process is continued until no more paths are availableto add to current MP . Then, from among all candidate multipaths, the one withlargest bandwidth is returned. Thus this algorithm provides an approximate solutionto the optimization version of the STP. The time complexity of this algorithm isO(n2m), since there are no more than m paths considered, and in each iteration thecomputation of shortest-widest path can be computed in O(n2) time [44].Consider a sorted list of bandwidths of edges given by B(1); B(2); : : : ; B(m). Letq be the size of minimum cut of G, and p be the number of paths in the multipathreturned by Approx-STP. The optimal bandwidth is upperbounded by q�1Pj=0B(m�j),and is lowerbounded by qPj=1B(j). When ith path is added to MP , let BPi be thebandwidth of this path and BMi be the bandwidth of the edge of this path with largestbandwidth. The bandwidth realized by Approx-STP is pPj=1BPi � pPj=1B(i). Thus theratio of the optimal bandwidth to that realized by Approx-STP is upperbounded by29

q�1Pi=0 B(m�i)pPi=1B(i) . Also the total \unutilized bandwidth" by Approx-STP is upperbounded bypPi=1(BMi �BPi ) � p�1Pi=0B(m�j)� pPi=1B(i), which is small when the variation in bandwidthsof edges is small.5 ConclusionsWe formulated two generic routing problems within the framework wherein the band-width can be reserved (and guaranteed once reserved) on various links of a com-munication network. The �rst problem requires that a message of �nite length betransmitted from s to d within � units of time. The second problem requires that asequential message of r units be transmitted at a rate of � such that maximum timedi�erence between two units that are received out of order is no more than q. Weshowed that the �rst problem cannot be adequately solved by existing methods basedon ow algorithms or shortest-widest paths. We showed both problems to be NP-complete, and proposed polynomial-time approximation algorithms. Although themain motivation of these problems is to provide strict QoS guarantees, the overallframework can also be used (in some cases) to obtain an \average" end-to-end delayby using \average" estimates for the bandwidth and delay.This paper constitutes only a �rst step towards formulating and solving QoSrouting problems in a rigorous computational framework from a user or applicationperspective (as opposed to optimizing network-level parameters). Our main con-tribution is to exploit multiple paths to provide deterministic bounds for the QoSparameters. Several future research directions can be pursued. It would be inter-esting to see if the current best complexity of O(n3= logn) for the maximum owproblem [10] can be achieved in approximately solving MTP. Also of interest is anapproximation algorithm for MTP whose complexity does not depend on edge delays;such an algorithm is called strongly polynomial in the literature on ow algorithms.The proposed algorithm is based on the classical Ford-Fulkerson's method; severalother ow methods with improved time complexity (e.g. pre ow methods [25]) havebeen extensively studied [2, 19]. It would be interesting to see if these methods canbe used to design approximation algorithms for MTP with a lower complexity. Morework is needed in designing polynomial-time approximation algorithms for STP andalso in investigating performance guarantees of the algorithm proposed in this paper.The applicability of the algorithm MTA to more di�cult tasks, such as multicastingand multiple source-destination transmission is of interest.Concrete computational formulations of other transmission tasks, such as multi-casting, and video conferencing, can be attempted to better understand the complex-ity of these tasks. Also, the proposed framework is based on guaranteeing bandwidth,while there could be several other frameworks that guarantee parameters such as de-lay upper bounds, and upper bounds on queue lengths, etc.; QoS routing algorithmsfor such frameworks will be very useful in a number of applications. In particular,30

the methods of [13, 29] enable us to estimate parameters such as worst-case delay,for networks based on more general frameworks. It would be interesting to obtain ageneralization of Lemma 3.2 for multipaths in these general networks, which couldbe very useful for designing QoS multipath routing algorithms. An integration ofQoS services and the best-e�ort services used in the Internet [22, 32] within a sin-gle framework to identify various algorithmic and complexity issues would also be offuture interest.References[1] R. K. Ahuja, T. L. Magnati, and J. B. Orlin. Network Flows. Prentice Hall, EnglewoodCli�s, NJ, 1993.[2] R. K. Ahuja and J. B. Orlin. A fast and simple algorithm for the maximum owproblem. Operations Research, 37(5):748{759, 1989.[3] C. Aurrecocechea, A. T. Campbell, and L. Hauw. A survey of QoS architectures.Multimedia Systems Journal, 1996.[4] S. Bahk and W. El-Zarki. Dynamic multi-path routing and how it compares withother dynamic routing algorithms for high speed wide area networks. In Proc. of ACMSIGCOM, 1992.[5] D. Bertsekas and R. Gallager. Data Networks. Prentice-Hall, 1992.[6] D. P. Bertsekas, E. M. Gafni, and R. G. Gallager. Second derivative algorithms for min-imum delay distributed routing in networks. IEEE Transactions on Communications,COM-32:911{919, 1984.[7] R. Braden, D. Clark, and S. Shenker. Integrated services in the internet architecture,1994. IETF RFC 1633.[8] D. G. Cantor and M. Gerla. Optimal routing in a packet-switched computer network.IEEE Transactions on Computers, 23(10):1062{1069, 1974.[9] Y. L. Chen and Y. H. Chin. The quickest path problem. Computers and OperationsResearch, 17(2):153{161, 1990.[10] J. Cherian, T. Hagerup, and K. Mehlhorn. An o(n3)-time maximum- ow algorithm.SIAM Journal on Computing, 25(6):1144{1170, 1996.[11] H. Chu and K. Nahstedt. Dynamic multi-path communication for video tra�c. InProc. of Hawian Int. Conf. on Systems Sci., 1997.[12] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. McGraw-Hill Book Co., New York, 1990.[13] R. L. Cruz. A calculus for network delay, Part II: Network analysis. IEEE Transactionson Information Theory, 37(1):132{141, 1991.31

[14] J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic e�ciency fornetwork ow problems. Journal of the Association of Computing Machinery, 19(2):248{264, 1972.[15] D. Ferrari and D. C. Verma. A scheme for real-time channel establishment in wide-areanetworks. IEEE Journal on Selected Areas in Communications, 8(3):368{379, 1990.[16] L. R. Ford and D. R. Fulkerson. Flows in Networks. Princeton University Press,Princeton, NJ, 1962.[17] R. G. Gallager. A minimum delay routing algorithm using distributed computation.IEEE Transactions on Communications, COM-25(1):73{85, 1977.[18] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theoryof NP-Completeness. W. H. Freeman and Co., San Francisco, 1979.[19] A. V. Goldberg and R. Tarjan. A new approach to the maximum- ow problem. Journalof the Association of Computing Machinery, 35(4):921{940, 1988.[20] W. C. Grimmell. private communication.[21] B. Hajek and R. G. Ogier. Optimal dynamic routing in communication networks withcontinous tra�c. Networks, 14:457{487, 1984.[22] C. Huitema. Routing in the Internet. Prentice Hall, 1989.[23] J. M. Ja�e. Algorithms for �nding paths with multiple constraints. Networks, 14:95{116, 1984.[24] H. Kanakia, P. P. Mishra, and A. R. Reibman. An adaptive congestion controlscheme for real time packet video transport. IEEE/ACM Transactions on Networking,3(6):671{682, 1995.[25] A. V. Karzanov. Determining the maximal ow in a network by the method of pre ows.Soviet Math. Dokl., 15:434{437, 1974.[26] O. Kyas. ATM Networks. International Thomson Computer Press, 1997. SecondEdition.[27] S. Murthy and J. J. Garcia-Luna-Aceves. Congestion-oriented shortest multipath rout-ing. In Proc. of IEEE INFOCOM'96, 1996.[28] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms andComplexity. Prentice-Hall, Inc., Englewood Cli�s, NJ, 1982.[29] A.. K. Parekh and R. G. Gallager. A generalized processor sharing approach to owcontrol in integrated services networks: The multiple node case. IEEE/ACM Trans-actions in Networking, 2(2):137{150, 1994.[30] C. Partridge. Gigabit Networking. Addison-Wesley Pub. Co., 1994.[31] V. Paxson. End-to-end behavior in the internet. In Proc. of ACM SIGCOM'96, 1996.[32] R. Perlman. Interconnections: Bridges and Routers. Addison-Wesley, 1992.32

[33] N. S. V. Rao and S. G. Batsell. Algorithm for minimum end-to-end delay paths. IEEECommunications Letters, 1(5):152{154, 1997.[34] N. S. V. Rao and S. G. Batsell. QoS routing via multiple paths using bandwidthreservation. In IEEE INFOCOM98: The Conference on Computer Communications.1998.[35] J. B. Rosen, S. Z. Sun, and G. L. Xue. Algorithms for the quickest path problem andthe enumeration of quickest paths. Computers and Operations Research, 18(6):579{584,1991.[36] A. Segall. The modeliing of adpative routing in data-communication networks. IEEETransations on Communications, COM-25:85{95, 1977.[37] R. Simha and B. Narahari. Single path routing with delay considerations. ComputerNetworks and ISDN Systems, 24:405{419, 1992.[38] M. E. Steenstrup, editor. Routing in Communications Networks. Prentice Hall, 1995.[39] J. W. Suurballe. Disjoint paths in a network. Networks, 4:125{145, 1974.[40] J. W. Suurballe and R. E. Tarjan. A quick method for �nding shortest pairs of disjointpaths. Networks, 14:325{336, 1984.[41] D. M. Topkis. A k shortest path algorithm for adaptive routing in communicationsnetwork. IEEE Transactions on Communications, COM-36(7):855{859, 1988.[42] D. Torrieri. Algorithms for �nding an optimal set of short disjoint paths in a commu-nications network. IEEE Transactions on Communications, 40(11):1698{1702, 1992.[43] R. Vogel, R. G. Herrtwich, W. Kalfa, H. Wittig, and L. C. Wolf. QoS-based routingof multimedia streams in computer networks. IEEE Journal on Selected Areas inCommunications, 14(7):1235{1244, 1996.[44] Z. Wang and J. Crowcroft. QOS routing for supporting resource reservation. IEEEJournal on Selected Areas in Communications, 14(7):1228{1234, 1996.[45] L. Zhang, S. E. Deering, D. Estrin, S. Shankar, and D. Zappala. RSVP: A new resourcereservation protocol communications network. Technical Report 95-607, ISI, Univeristyof Southern California, 1995.33

ORNL/TM-13547INTERNAL DISTRIBUTION1. J. Barhen 49. J. Rome2-21. S. G.Batsell 50. D. B. Reister22. T. Dunigan 51. D. M. Turpin23. H. R. Hicks 52. CSMD Reports O�ce24-25. M. A. Kuliasha 53. Central Research Library26. L. MacIntyre 54. Document Reference Section27. C. E. Oliver 55. Laboratory Records - RC28. L. E. Parker 56. ORNL Patent O�ce29-48. N. S. V. RaoEXTERNAL DISTRIBUTION57-58. O�ce of Scienti�c and Technical Information, P.O. Box 62, Oak Ridge, TN37831.59. Dr. Bob Aiken, ER-31, U.S. Department of Energy, 19901 Germantown Road,Germantown, MD 20874-129060. Prof. I. Akyildiz, School of Electrical Engineering, Georgia Institute of Tech-nology, Atlanta, GA 3033261. Prof. J. Dongarra, Department of Computer Science, 107 Ayres Hall, Universityof Tennessee, Knoxville, TN 37996-130162. Dr. S. T. Elbert, ER-31, U.S. Department of Energy, 19901 Germantown Road,Germantown, MD 20874-129063. Dr. D. Hitchcock, ER-51, U.S. Department of Energy, 19901 Germantown Road,Germantown, MD 20874-129064. Dr. F. Howes, ER-31, U.S. Department of Energy, 19901 Germantown Road,Germantown, MD 20874-129065. Prof. S. S. Iyengar, Department of Computer Science, Louisiana State Univer-sity, Baton Rouge, LA 7080366. Prof. K. Maly, Department of Computer Science, Old Dominion University,Norfolk, VA 2352967. Prof. N. Manickam, Department of Mathematics, Depauw University, Green-castle, IN 4613568. Dr. D. B. Nelson, ER-30, U.S. Department of Energy, 19901 Germantown Road,Germantown, MD 20874-1290 34

69. Prof. S. Radhakrishanan, School of Computer Science, 200 Felgar Street, Room114, University of Oklahoma, Norman, Oklahoma 7301970. Prof. R. C. Ward, Department of Computer Science, 107 Ayres Hall, Universityof Tennessee, Knoxville, TN 37996-1301

35

QoS Routing Via Multiple Paths Using Bandwidth Reservation

Documents

Transcript of QoS Routing Via Multiple Paths Using Bandwidth Reservation