Integrating routing and survivability in fault-tolerant computer network design


S. Pierre*, R. Beaubrun

Department of Electrical and Computer Engineering, École Polytechnique de Montréal, C.P. 6079, Succ. Centre-Ville, Montréal, Québec H3C 3A7, Canada

Received 8 April 1999; received in revised form 30 July 1999; accepted 30 July 1999

Abstract

In the context of computer networks, most conventional routing algorithms are designed to work under no-failure conditions. This paper investigates a more realistic situation in which the flow assignment takes into account the possibility of a link failure. The proposed method guarantees survivability and allows the network behavior after a link failure to be analyzed. Further, it is applicable to networks during their design phase as well as during their operational phase. Analysis of the results shows that the method can design networks that satisfy the survivability constraints and that it substantially improves on the delay obtained by other methods. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Computer network design; Flow assignment; Quality of service; Reliability; Routing strategy; Survivability

1. Introduction

Generally, routing algorithms are designed for networks that work under ideal conditions, without considering the possibility of link or node failure, which leads to simple and unrealistic models [2,6,7,10]. In practice, however, each component of a network, although highly reliable, has a nonzero probability of failure that depends on its reliability or survivability level [20]. Reliability and survivability are therefore two important parameters to consider when designing a computer network, for two reasons. First, a computer network must always remain operational while maintaining a high level of service. Second, a failure can significantly affect the network's performance or its level of service.

Traditional approaches usually separate survivability from routing, and focus either on finding an optimal (or near-optimal) flow assignment for the no-failure case, or on determining the network reliability without considering routing [1–7,12–21]. This paper proposes a method that solves a new version of the routing problem by taking into account the effect of possible link failures when selecting a route. It is organized as follows. Section 2 presents background and related work on network routing, reliability and survivability. Section 3 introduces a new survivable method that guarantees communication between each pair of nodes after any link failure. Section 4 analyzes numerical results, and Section 5 presents some concluding remarks.

2. Background and related work

The efficiency of a computer network largely depends on the routing procedures used during the design and operation phases. Such procedures determine the traffic flow patterns that minimize the average end-to-end delay for a given throughput. In the context of packet-switching networks, Kleinrock [11] developed a model that predicts the transmission, propagation and queuing delay over each link $l$. This delay can be expressed as follows:

$$ T_l = \frac{1}{\mu C_l - \lambda_l} + \tau_l \tag{1} $$

where $\tau_l$, $\lambda_l$ and $C_l$ represent, respectively, the propagation delay, the flow in packets/s and the link capacity in bps, and $1/\mu$ is the average packet length.
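As a rough illustration of Eq. (1), the sketch below evaluates the delay of a single link. The function name `link_delay_ms` and the sample values (a 6.312 Mbps link carrying 4200 packets/s of 1000-bit packets, with a 0.3804 ms propagation delay) are illustrative assumptions, not part of Kleinrock's model.

```python
def link_delay_ms(capacity_bps, flow_pps, packet_bits, propagation_ms):
    """Link delay of Eq. (1), returned in milliseconds.

    capacity_bps   -- link capacity C_l in bits/s
    flow_pps       -- link flow lambda_l in packets/s
    packet_bits    -- average packet length 1/mu in bits (assumed value)
    propagation_ms -- propagation delay tau_l in ms
    """
    service_rate_pps = capacity_bps / packet_bits  # mu * C_l in packets/s
    if flow_pps >= service_rate_pps:
        raise ValueError("flow must stay below mu * C_l for a finite delay")
    # 1 / (mu * C_l - lambda_l) is in seconds; convert it to milliseconds.
    return 1000.0 / (service_rate_pps - flow_pps) + propagation_ms


# Hypothetical link: 6.312 Mbps, 4200 packets/s, 1000-bit packets.
delay = link_delay_ms(6.312e6, 4200.0, 1000.0, 0.3804)
print(f"{delay:.4f} ms")
```

The guard on `flow_pps` reflects the fact that the expression is only meaningful while the flow stays strictly below the service rate $\mu C_l$.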

Reliability can be considered as a quantitative factor that indicates the ability of the network to remain operational in its environment [5]. This definition gives rise to several computing methods. Aggarwal et al. [1] proposed one for determining the probability that a network remains operational. Wang and Schwartz [20] identified the most probable failed links as a measure of reliability, whereas Girard and Sanso [9] considered the reliability and failure propagation constraints in the dimensioning process. In most cases, however, the random process governing component failures leads to very complex expressions for the reliability. This has motivated some researchers to use $K$-connectivity as a measure of reliability [17,18]. This measure relates to the classical problems of generating $K$-connected networks or devising perturbation rules that maintain $K$-connectivity [13,17].

Computer Communications 23 (2000) 317–327

0140-3664/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0140-3664(99)00171-1

www.elsevier.com/locate/comcom

* Corresponding author. Tel.: +1-514-340-4711, ext. 4685; fax: +1-514-340-3240. E-mail address: [email protected] (S. Pierre).

On the other hand, reliability can be defined in terms of the network availability. This definition is mostly used by network managers, who consider a reliable network to be a system with zero-failure service. In this context, the network availability can be expressed as follows [12]:

$$ A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} \tag{2} $$

where MTBF (mean time between failures) is the average network uptime and MTTR (mean time to repair) is the average network downtime.
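A minimal worked example of Eq. (2) follows. The function name `availability` and the figures used (999 hours between failures, 1 hour to repair) are hypothetical and serve only to show the arithmetic.

```python
def availability(mtbf_hours, mttr_hours):
    """Network availability A of Eq. (2), a value between 0 and 1."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    if mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical network: 999 h of uptime per 1 h of repair.
a = availability(999.0, 1.0)
print(f"A = {a:.3f}")
```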

Even though reliability is an important measure, it does not allow the impact of a network failure on other performance measures to be studied. Such a study is the concern of survivability. According to Newport [15], the survivability of a network is its ability to remain stable after any component failure. Indeed, in the event of a failure, the network can still be connected but operate with a degraded level of service [11]. This degradation is often expressed in terms of excessive delay over certain links, which can even interrupt the communication between certain pairs of nodes [21].

Other survivability studies consider a network as a system that operates in one of several states. State $E_0$ characterizes normal operation, while state $E_f$ corresponds to the failure of link $f$. From this point of view, Li and Sylvester [14] proposed a method for determining the $k$ most probable states of a network, which allowed them to determine lower and upper bounds for the average delay in the state $E_i$, with $1 \le i \le k$. Yokohira et al. [21], for their part, proposed a heuristic that determines the capacity $C(l, E_i)$ for each link $l$ in the state $E_i$ subject to a survivability constraint.

Moreover, after a link failure, the information that was to transit over the failed link must be rerouted over the available links. The methods presented above do not take that fact into account and focus only on the probability of link failure for determining the possible states and the network behavior in the state $E_i$. By proceeding this way, they ignore the random aspect of the failures. It therefore becomes important to consider a general survivability model that not only takes the random process of failure into account, but also reroutes the information over the available links.

Let $E_0$ be the state in which the network operates without failure. In this state, information is routed over its primary route. However, if a link $f$ fails, the network must switch from state $E_0$ to state $E_f$. Consequently, the information that should have been routed over $f$ must be rerouted over its backup route. As a result, the flow varies over certain links of the network. In this context, Gavish and Neuman [8] proposed a model that determines, for each pair of nodes, one primary route and a set of backup routes, reassigns the flow after a link failure, and determines the variation of flow over the non-failed links.

Let $\Pi$ be the set of source–destination pairs and $k$ an element of $\Pi$. In the context of packet-switching networks, $k$ represents a commodity. Moreover, if $r$ denotes the primary route of $k$, $q$ a backup route, $l$ a non-failed link and $f$ the failed one, the flow increase on each link $l$ due to the failure of $f$ can be expressed as follows [8]:

$$ \Delta f^{+}_{lf} = \frac{1}{\mu} \sum_{k \in \Pi} \sum_{r \in S_k} \sum_{q \in S_k,\, q \neq r} \gamma_k\, x_r\, u_q\, \delta_{rf}\, \delta_{ql}\, (1 - \delta_{qf})(1 - \delta_{rl}), \quad \forall\, l \neq f \in A \tag{3} $$

where $1/\mu$ represents the average packet length and $\gamma_k$ the traffic induced by the commodity $k$. The other variables are defined as follows:

$$ x_r = \begin{cases} 1 & \text{if } r \text{ is chosen to carry the traffic of its associated source–destination pair} \\ 0 & \text{otherwise} \end{cases} \tag{4} $$

$$ u_q = \begin{cases} 1 & \text{if } q \text{ is chosen as an alternate route for its source–destination pair} \\ 0 & \text{otherwise} \end{cases} \tag{5} $$

$$ \delta_{rl} = \begin{cases} 1 & \text{if the link } l \text{ belongs to the primary route } r \\ 0 & \text{otherwise} \end{cases} \tag{6} $$

From Eq. (3), we can derive a set of conditions that characterize the flow increase over each link $l$ belonging to the backup route $q$. These are:

$$ x_r = 1 \tag{7} $$

$$ \delta_{rf} = 1 \tag{8} $$

$$ \delta_{qf} = 0 \tag{9} $$

$$ u_q = 1 \tag{10} $$

$$ \delta_{ql} = 1 \tag{11} $$

$$ \delta_{rl} = 0 \tag{12} $$

On the other hand, since $\gamma_k$ is diverted from its primary route to its backup route after a link failure, a flow increase over the links belonging to $q$ induces a flow decrease over the links belonging to $r$. The flow decrease is expressed as follows [8]:

$$ \Delta f^{-}_{lf} = \frac{1}{\mu} \sum_{k \in \Pi} \sum_{r \in S_k} \gamma_k\, x_r\, \delta_{rf}\, \delta_{rl}, \quad \forall\, l \neq f \in A \tag{13} $$

Again, we can derive the conditions that characterize the flow decrease over each link $l$ belonging to $r$ after the failure of $f$. These are:

$$ x_r = 1 \tag{14} $$

$$ \delta_{rf} = 1 \tag{15} $$

$$ \delta_{rl} = 1 \tag{16} $$

Even though this method seems general, the selection of the backup routes is not explicitly stated, which leads to very complex expressions for computing the variation of flow over each link. Further, that method does not necessarily guarantee the communication between each pair of nodes, especially when the failed link is part of both the primary and backup routes used by that pair [8].

3. The proposed method

To tackle the problem of integrating routing and survivability more efficiently, we propose a new method based on link-disjoint routes. This way of selecting the routes guarantees the communication between each pair of nodes and gives a better means of computing the variation of flow over the non-failed links. In this section, we also show how to determine, during the design phase of a network, the capacity of each link that allows it to absorb the variation of flow induced by a failure.

3.1. Computing the variation of flow

For a reliable network (at least 2-connected), we can determine for each commodity $k$ one primary route $r_k$ and one backup route $r'_k$ that share no link. This way of selecting the routes guarantees the availability of $r'_k$ after the failure of any link belonging to $r_k$, so a single backup route suffices. In other words, if $S_k$ is the set of routes for the commodity $k$, then $S_k = \{r_k, r'_k\}$. The default route for $\gamma_k$ is $r_k$. However, when $r_k$ is unavailable, $\gamma_k$ is routed over $r'_k$. This leads us to conclude that the condition $x_r = 1$ is implicitly always true, which makes conditions (7) and (14) trivial.

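Moreover, let us define the following binary variables for each link $l$:

$$ \delta_{r_k l} = \begin{cases} 1 & \text{if } l \in r_k \\ 0 & \text{otherwise} \end{cases} \tag{17} $$

$$ \delta_{r'_k l} = \begin{cases} 1 & \text{if } l \in r'_k \\ 0 & \text{otherwise} \end{cases} \tag{18} $$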

Since $r_k$ and $r'_k$ are defined as link-disjoint routes, the failed link $f$ cannot simultaneously belong to both $r_k$ and $r'_k$. Consequently, conditions (8) and (9) can be merged into the following:

$$ \delta_{r_k f} = 1 \tag{19} $$

while conditions (11) and (12) can be combined into the following:

$$ \delta_{r'_k l} = 1 \tag{20} $$

Further, since $S_k = \{r_k, r'_k\}$, the traffic induced by $k$ transits exclusively over $r_k$ or $r'_k$, i.e. when $r_k$ is unavailable, $\gamma_k$ can only be forwarded over $r'_k$. Consequently, conditions (8) and (10) can also be merged to become equivalent to Eq. (19). The selection of link-disjoint routes thus considerably reduces the number of conditions needed to express the flow increase over $l$. In other words, conditions (7)–(12) can be replaced with Eqs. (19) and (20), i.e. the failure of $f$ increases the flow over $l$ whenever conditions (19) and (20) are satisfied.

Similarly, we can determine the conditions that characterize a flow decrease over the non-failed links. To do this, let us rewrite condition (16) according to our model. We obtain the following condition:

$$ \delta_{r_k l} = 1 \tag{21} $$

Thus, conditions (14)–(16) can be replaced with Eqs. (19) and (21) for characterizing the flow decrease over each $l$.

Moreover, let us define a variable $\delta_l$ such that

$$ \delta_l = \begin{cases} 1 & \text{if } l \in r'_k \\ -1 & \text{if } l \in r_k \\ 0 & \text{otherwise} \end{cases} \tag{22} $$

Then, the variation of flow over $l$ can be computed as follows:

$$ \Delta f_{lf} = \frac{1}{\mu} \sum_{k} \gamma_k\, \delta_{r_k f}\, \delta_l, \quad \forall\, l \neq f \tag{23} $$

where $\gamma_k$ represents the traffic induced by $k$, and $\delta_{r_k f}$ and $\delta_l$ are the variables defined by Eqs. (17) and (22), respectively. This compact expression replaces relations (3) and (13) adopted in Ref. [8] for computing the variation of flow on each link $l$ after the failure of $f$. It is a sum of the traffic terms $\gamma_k$ and can be either positive or negative since it depends on $\delta_l$; that is, it can represent an increase or a decrease in the flow over the non-failed links.
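The following sketch shows one way Eq. (23) could be evaluated for a given failed link. The data layout (links as node-pair tuples, each commodity as a `(gamma_k, primary_links, backup_links)` triple), the function name `flow_variation` and the four-node ring used as input are all assumptions made for illustration; they do not come from the paper.

```python
from collections import defaultdict


def flow_variation(commodities, failed_link, packet_bits):
    """Variation of flow (bits/s) on each non-failed link, following Eq. (23).

    commodities -- iterable of (gamma_k, primary_links, backup_links), with
                   gamma_k in packets/s and two routes that share no link
    failed_link -- the failed link f
    packet_bits -- average packet length 1/mu in bits
    """
    delta_f = defaultdict(float)
    for gamma_k, primary, backup in commodities:
        if failed_link not in primary:   # delta_{r_k f} = 0: k is unaffected
            continue
        for link in primary:             # delta_l = -1 on the primary route
            if link != failed_link:      # Eq. (23) only covers l != f
                delta_f[link] -= gamma_k * packet_bits
        for link in backup:              # delta_l = +1 on the backup route
            delta_f[link] += gamma_k * packet_bits
    return dict(delta_f)


# Hypothetical ring 1-2-3-4-1 with two commodities.
commodities = [
    (150.0, [(1, 2), (2, 3)], [(1, 4), (3, 4)]),   # pair (1, 3)
    (100.0, [(1, 4)], [(1, 2), (2, 3), (3, 4)]),   # pair (1, 4)
]
variation = flow_variation(commodities, failed_link=(1, 2), packet_bits=1000.0)
print(variation)
```

In this toy case only the first commodity uses the failed link, so its traffic leaves link (2, 3) and is added to links (1, 4) and (3, 4); the second commodity contributes nothing.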

3.2. Capacity assignment

Since a failure is a random process, the resulting variation of flow is unpredictable. This raises the problem of capacity assignment: during the design phase, we need to know what capacity to use for each link so that the network, once operational, remains survivable after any link failure. In this context, Yokohira et al. [21] proposed a method based on the network states. Nevertheless, in addition to using deterministic measures to determine the capacity values, this method does not take into account the variation of flow induced by the failed link, which makes it inappropriate in practice for integrating the routing and survivability aspects. In this section, the proposed capacity assignment is derived directly from the computation of the variation of flow, also taking into account the survivability conditions of the network. Those conditions require that any non-failed link $l$ absorb the whole variation of the traffic without major degradation of the quality of service.

From Eq. (23), we can determine the maximum variation of flow $\Delta f_{l_{\max}}$ over each non-failed link $l$. To do so, we consider the failure of each $f \neq l$ and retain the maximum value. This produces the following expression:

$$ \Delta f_{l_{\max}} = \max_{f} \Delta f_{lf} $$

This operation is repeated for each non-failed link. Replacing $\Delta f_{lf}$ with its value given by Eq. (23), we can determine the maximum variation of flow over $l$ as follows:

$$ \Delta f_{l_{\max}} = \frac{1}{\mu} \max_{f} \sum_{k} \gamma_k\, \delta_{r_k f}\, \delta_l, \quad \forall\, l \neq f \tag{24} $$

This expression allows us to determine for each $l$ the capacity $C_l$ that satisfies the survivability constraints. In other words, $C_l$ must be chosen such that:

$$ C_l \geq f_l + \Delta f_{l_{\max}} \tag{25} $$

where $f_l$ indicates the flow over $l$ under no-failure operation. Consequently, $C_l$ is not only able to absorb the whole traffic, but also allows the network to maintain good performance after any link failure. In particular, this helps prevent congestion caused by excessive variation of flow on certain links.
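Eqs. (24) and (25) can be sketched as two small functions. The names `max_flow_variation` and `assign_capacity`, the dictionary layout, and the idea of picking the smallest value from a fixed list of standard capacities are assumptions made for this example; the text above only requires that $C_l$ satisfy Eq. (25).

```python
def max_flow_variation(variations_by_failure, links):
    """Eq. (24): largest variation of flow on each link over all failures f != l.

    variations_by_failure -- dict mapping a failed link f to a dict
                             {link: variation of flow on that link}
    """
    worst = {}
    for link in links:
        candidates = [
            variation.get(link, 0.0)
            for failed, variation in variations_by_failure.items()
            if failed != link
        ]
        worst[link] = max(candidates, default=0.0)
    return worst


def assign_capacity(nominal_flow, worst_variation, standard_capacities):
    """Eq. (25): smallest listed capacity with C_l >= f_l + max variation."""
    capacities = {}
    for link, flow in nominal_flow.items():
        # Assumption: a negative worst case is clamped to zero so that the
        # link can still carry its no-failure flow f_l.
        required = flow + max(worst_variation.get(link, 0.0), 0.0)
        feasible = [c for c in sorted(standard_capacities) if c >= required]
        if not feasible:
            raise ValueError(f"no listed capacity can carry {required} on {link}")
        capacities[link] = feasible[0]
    return capacities


# Hypothetical two-link example, all values in Mbps.
nominal_flow = {"a": 3.6, "b": 2.85}
variations_by_failure = {"a": {"b": 1.2}, "b": {"a": 4.5}}
standard_capacities = [6.312, 12.624, 18.528, 25.248]

worst = max_flow_variation(variations_by_failure, nominal_flow)
capacities = assign_capacity(nominal_flow, worst, standard_capacities)
print(worst, capacities)
```

Here link "a" needs at least 3.6 + 4.5 = 8.1 Mbps and link "b" at least 2.85 + 1.2 = 4.05 Mbps, so the sketch selects 12.624 and 6.312 Mbps, respectively.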

4. Numerical results

In this section, we present the implementation of the proposed method as well as some computational experiments, and compare the results obtained from the proposed method with those obtained from Ref. [8].

4.1. Implementation of the proposed method

Basically, our goal is to design survivable networks that maintain their performance after any link failure. To reach this objective, the proposed method executes the following implementation steps.

Step 1: Determining the primary route $r_k$. This step determines each link delay by applying expression (1) and finds the fastest route $r_k$ between each node pair $k$. The set $R$ of those paths constitutes the set of primary paths and is stored in a path pointer.

Step 2: Determining the secondary route $r'_k$. This step determines, for each node pair $k$, the second fastest route $r'_k$ such that $r_k$ and $r'_k$ are link-disjoint paths. This step is important because, when $r_k$ is unavailable, packets must be rerouted over $r'_k$, so the computation of the flow variation on the non-failed links largely depends on the choice of $r'_k$. Like $R$, the set $R'$ (the set of secondary routes) is also stored in a path pointer.


Fig. 1. Network with unreliable links (20 nodes, 30 links).

Step 3: The capacity assignment. In order to compute the flow variation on each link $l$, we must implement expressions (22) and (23), which requires knowing $f$, $r_k$ and $r'_k$ for each $k$ of $\Pi$. We then examine each link $l$ of $r_k$ and compare $l$ with $f$. If any link $l$ of $r_k$ is equal to $f$, we conclude that $f$ is a part of $r_k$. Consequently, we subtract the traffic of the commodity $k$ from the flow of each link $l \neq f$ of $r_k$ and add the same traffic to the flow of each $l$ of $r'_k$. For a given $f$, this operation is repeated for each $k$ of $\Pi$. As a result, the total flow variation $\Delta f_{lf}$ on each link $l \neq f$ is determined.

Let us repeat that procedure for another failed link $f'$. We then obtain a new flow variation $\Delta f_{lf'}$ on each link $l \neq f'$. Let us now compare $\Delta f_{lf}$ with $\Delta f_{lf'}$ and retain the greater of the two values. We obtain the maximum flow variation on each $l$ after the failure of $f$ or $f'$. Consequently, after considering the failure of each of the $m$ links, we obtain a global maximum value that indicates the maximum flow variation on each link. This allows us to determine the maximum flow on each link after any failure and to choose a capacity capable of absorbing the flow variation.

Step 4: Routing with failure consideration. The above capacity assignment modifies certain link attributes, including each link delay, given by expression (1). As a result, we obtain a new pair of paths for each pair $k$ of $\Pi$. Routing with failure consideration is done according to the newly defined pairs of routes.
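Steps 1 and 2 can be sketched as two shortest-path searches. The paper does not say which algorithm is used to find the routes, so the choice below (Dijkstra's algorithm for the primary route, then a second run with the primary route's links removed) is an assumption, as are the names `fastest_route` and `disjoint_route_pair` and the toy delays.

```python
import heapq


def fastest_route(delays, source, target, banned=frozenset()):
    """Dijkstra over undirected links; returns the route as a list of links.

    delays -- dict mapping a link (u, v) with u < v to its delay
    banned -- links that must not be used
    Returns None when the target cannot be reached.
    """
    neighbours = {}
    for (u, v), delay in delays.items():
        if (u, v) in banned:
            continue
        neighbours.setdefault(u, []).append((v, delay))
        neighbours.setdefault(v, []).append((u, delay))
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, delay in neighbours.get(node, []):
            candidate = dist + delay
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(queue, (candidate, nxt))
    if target not in best:
        return None
    route, node = [], target
    while node != source:
        prev = previous[node]
        route.append((min(prev, node), max(prev, node)))
        node = prev
    return route[::-1]


def disjoint_route_pair(delays, source, target):
    """Step 1 then Step 2: primary route, then fastest route avoiding its links."""
    primary = fastest_route(delays, source, target)
    if primary is None:
        return None, None
    backup = fastest_route(delays, source, target, banned=frozenset(primary))
    return primary, backup


# Hypothetical ring 1-2-3-4-1 with link delays in ms.
delays = {(1, 2): 1.0, (2, 3): 1.0, (1, 4): 2.0, (3, 4): 2.0}
primary, backup = disjoint_route_pair(delays, 1, 3)
print(primary, backup)
```

One caveat of this two-pass approach: on some topologies the fastest primary route can block every disjoint alternative even though a link-disjoint pair exists, in which case `backup` is `None`. A dedicated disjoint-paths algorithm such as Suurballe's avoids that, at the cost of a primary route that is not always the single fastest one.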

4.2. Computational experiments

Let us apply the proposed method to the network shown in Fig. 1, with a uniform traffic requirement of 150 packets/s. For such a network, there are 190 commodities, one per unordered pair of the 20 nodes ($20 \times 19 / 2 = 190$). Based on the link attributes obtained from the routing algorithm presented in Ref. [16] and reported in Table 1, we determine for each commodity the two fastest link-disjoint routes. The set of those routes allows us to determine the maximum variation of flow over the non-failed links and the required capacity values. These new link attributes are given in Table 2, assuming the network is in the state $E_0$. We observe an increase in the capacity values, which makes the network satisfy the survivability conditions. As a result, the average packet delay decreases from 5.7113 to 1.8942 ms.

Table 1. Link attributes without consideration of survivability

| Link | Flow (Mbps) | Capacity (Mbps) | Signal | $T_{\mathrm{transmission}}$ (ms) | $T_{\mathrm{propagation}}$ (ms) | $T_{\mathrm{total}}$ (ms) |
|---|---|---|---|---|---|---|
| (1, 3) | 4.200 | 6.312 | DS-2 | 0.1584 | 0.3804 | 0.8539 |
| (1, 4) | 5.550 | 6.312 | DS-2 | 0.1584 | 0.3779 | 1.6902 |
| (1, 5) | 8.850 | 12.624 | 2 × DS-2 | 0.0792 | 0.4249 | 0.6899 |
| (2, 3) | 2.850 | 3.152 | DS-1C | 0.3173 | 0.3480 | 3.6593 |
| (2, 6) | 13.950 | 18.528 | 12 × DS-1 | 0.0540 | 0.5336 | 0.7520 |
| (2, 7) | 12.300 | 12.624 | 2 × DS-2 | 0.0792 | 0.5336 | 3.6200 |
| (3, 7) | 2.850 | 3.152 | DS-1C | 0.3173 | 0.3167 | 3.6279 |
| (4, 11) | 3.900 | 6.312 | DS-2 | 0.1584 | 0.6549 | 1.0695 |
| (4, 18) | 6.150 | 12.624 | 2 × DS-2 | 0.0792 | 1.3858 | 1.5403 |
| (5, 9) | 10.050 | 12.624 | 2 × DS-2 | 0.0792 | 0.3727 | 0.7612 |
| (5, 11) | 4.050 | 6.312 | DS-2 | 0.1584 | 0.5270 | 0.9691 |
| (6, 8) | 2.700 | 3.152 | DS-1C | 0.3173 | 0.3887 | 2.6011 |
| (6, 12) | 11.850 | 12.624 | 2 × DS-2 | 0.0792 | 0.5069 | 1.7989 |
| (7, 9) | 11.850 | 12.624 | 2 × DS-2 | 0.0792 | 0.2635 | 1.5555 |
| (8, 10) | 3.450 | 6.312 | DS-2 | 0.1584 | 0.3308 | 0.6802 |
| (8, 13) | 3.750 | 6.312 | DS-2 | 0.1584 | 0.5207 | 0.9110 |
| (9, 13) | 11.400 | 12.624 | 2 × DS-2 | 0.0792 | 0.3145 | 1.1315 |
| (10, 12) | 11.550 | 12.624 | 2 × DS-2 | 0.0792 | 0.1863 | 1.1174 |
| (10, 15) | 9.450 | 12.624 | 2 × DS-2 | 0.0792 | 0.4167 | 0.7317 |
| (11, 14) | 6.000 | 6.312 | DS-2 | 0.1584 | 0.2357 | 3.4408 |
| (12, 16) | 8.400 | 12.624 | 2 × DS-2 | 0.0792 | 0.6194 | 0.8561 |
| (13, 14) | 9.450 | 12.624 | 2 × DS-2 | 0.0792 | 0.3399 | 0.6550 |
| (14, 17) | 5.850 | 6.312 | DS-2 | 0.1584 | 0.3727 | 2.5372 |
| (14, 18) | 5.850 | 6.312 | DS-2 | 0.1584 | 0.6346 | 2.7991 |
| (15, 17) | 5.850 | 6.312 | DS-2 | 0.1584 | 0.3005 | 2.4650 |
| (15, 20) | 2.850 | 3.152 | DS-1C | 0.3173 | 0.7683 | 4.0796 |
| (16, 20) | 5.100 | 6.312 | DS-2 | 0.1584 | 0.3598 | 1.1849 |
| (17, 19) | 1.200 | 1.544 | DS-1 | 0.6477 | 0.5590 | 3.4660 |
| (18, 19) | 9.000 | 12.624 | 2 × DS-2 | 0.0792 | 0.3436 | 0.6195 |
| (19, 20) | 5.700 | 6.312 | DS-2 | 0.1584 | 0.6719 | 2.3058 |

Average packet delay: $T_{\mathrm{aver}} = 5.7113$ ms

Let us randomly choose $f$ and examine the network behavior in the state $E_f$. For $f = (1, 5)$ and $f = (9, 13)$, we present in Table 3 the flow and delay on the non-failed links. We observe that the flow on each link depends on $f$, and that the capacity values determined by the method always satisfy the survivability conditions. Moreover, an increase in the average packet delay is also observed, which corresponds to a decrease in the network performance. Also, the failure of (9, 13) affects the network behavior much more than the failure of (1, 5). In fact, when the link (9, 13) fails, the average packet delay increases from 1.8942 to 2.5405 ms, a 34.0% increase, while the failure of (1, 5) causes an increase of only 1.0% in the average delay. However, in both cases, the increase in the delay after failure remains acceptable in practice.

Let us study the behavior of the network shown in Fig. 1 after a failure. Each link is labeled according to the correspondence presented in Table 4. This network can be either in the state $E_0$ or in the state $E_f$. From one state to another, each link flow is subject to some variation depending on $f$. This variation can be expressed as a percentage as follows:

$$ \Delta f_{lf}(\%) = \frac{\Delta f_{lf}}{f_l} \times 100\ (\%), \quad \forall\, l \neq f \tag{26} $$

where $\Delta f_{lf}$ is given by Eq. (23) and $f_l$ is the flow over $l$ in the state $E_0$. This expression represents the relative variation of flow over $l$ due to $f$ and characterizes the behavior of each link $l$. Fig. 2 shows the relative variation of flow over the first three links.

Based on the above analysis, we cannot exactly predict the relative variation of flow over $l$. However, such a variation has a lower bound given by the negative value of the flow in the state $E_0$. For the network of Fig. 1, we have observed a lower bound of 94.7% on link 4 after the failure of link 1. Results also reveal that certain links close to the failure are subject to a significant variation of flow. For example, the increase in the flow is 350% on link 26 when link 21 fails, and it is 77.4% on link 30 when link 25 fails, while the increase in the flow on link 30 remains relatively insignificant for any failure from links 1 to 20. However, other results show that links close to the failure are not the only ones subject to a high variation of flow. For example, the failure of link 5 induces an increase in the relative flow of about 125% over link 26, while the failure of link 22 increases the flow on link 2 to about 105%. Those results confirm how difficult it is to predict the variation of flow on the non-failed links after a failure.

Table 2. Link attributes with the consideration of survivability (state $E_0$)

| Link | Flow (Mbps) | $\Delta f_{\max}$ (Mbps) | Capacity (Mbps) | $T_{\mathrm{transmission}}$ (ms) | $T_{\mathrm{propagation}}$ (ms) | $T_{\mathrm{total}}$ (ms) |
|---|---|---|---|---|---|---|
| (1, 3) | 3.600 | 4.500 | 12.624 | 0.0792 | 0.3804 | 0.4912 |
| (1, 4) | 3.450 | 5.400 | 12.624 | 0.0792 | 0.3779 | 0.4869 |
| (1, 5) | 2.850 | 1.200 | 6.312 | 0.1584 | 0.4249 | 0.7138 |
| (2, 3) | 2.850 | 1.200 | 6.312 | 0.1584 | 0.3480 | 0.6369 |
| (2, 6) | 5.850 | 5.400 | 12.624 | 0.0792 | 0.5336 | 0.6812 |
| (2, 7) | 7.500 | 4.800 | 12.624 | 0.0792 | 0.5336 | 0.7288 |
| (3, 7) | 4.500 | 4.200 | 12.624 | 0.0792 | 0.3167 | 0.4398 |
| (4, 11) | 2.850 | 4.800 | 12.624 | 0.0792 | 0.6549 | 0.7572 |
| (4, 18) | 3.000 | 4.500 | 12.624 | 0.0792 | 1.3858 | 1.4897 |
| (5, 9) | 6.000 | 4.200 | 12.624 | 0.0792 | 0.3727 | 0.5236 |
| (5, 11) | 1.800 | 1.500 | 6.312 | 0.1584 | 0.5270 | 0.7487 |
| (6, 8) | 4.200 | 4.500 | 12.624 | 0.0792 | 0.3887 | 0.5074 |
| (6, 12) | 2.850 | 3.000 | 6.312 | 0.1584 | 0.5069 | 0.7957 |
| (7, 9) | 11.850 | 5.700 | 18.528 | 0.0540 | 0.2635 | 0.4133 |
| (8, 10) | 11.550 | 6.300 | 18.528 | 0.0540 | 0.3308 | 0.4741 |
| (8, 13) | 13.350 | 4.200 | 18.528 | 0.0540 | 0.5207 | 0.7138 |
| (9, 13) | 16.650 | 5.700 | 25.248 | 0.0396 | 0.3145 | 0.4308 |
| (10, 12) | 8.250 | 4.500 | 18.528 | 0.0540 | 0.1863 | 0.2836 |
| (10, 15) | 4.800 | 4.800 | 12.624 | 0.0792 | 0.4167 | 0.5445 |
| (11, 14) | 7.800 | 4.200 | 12.624 | 0.0792 | 0.2357 | 0.4430 |
| (12, 16) | 6.000 | 4.800 | 12.624 | 0.0792 | 0.6194 | 0.7703 |
| (13, 14) | 17.100 | 5.400 | 25.248 | 0.0396 | 0.3399 | 0.4627 |
| (14, 17) | 11.400 | 4.200 | 18.528 | 0.0540 | 0.3727 | 0.5130 |
| (14, 18) | 2.850 | 2.400 | 6.312 | 0.1584 | 0.6346 | 0.9235 |
| (15, 17) | 5.250 | 5.100 | 12.624 | 0.0792 | 0.3005 | 0.4361 |
| (15, 20) | 1.200 | 4.800 | 6.312 | 0.1584 | 0.7683 | 0.9639 |
| (16, 20) | 2.400 | 3.300 | 6.312 | 0.1584 | 0.3598 | 0.6154 |
| (17, 19) | 8.250 | 2.400 | 12.624 | 0.0792 | 0.5590 | 0.7876 |
| (18, 19) | 4.500 | 2.700 | 12.624 | 0.0792 | 0.3436 | 0.4667 |
| (19, 20) | 4.650 | 2.700 | 12.624 | 0.0792 | 0.6719 | 0.7973 |

Average packet delay: $T_{\mathrm{aver}} = 1.8942$ ms

Obviously, this variation of flow induces a change in the overall network load. For an $m$-link network, this change can be expressed as follows:

$$ \Delta f_{\mathrm{aver}\,f} = \frac{1}{m} \sum_{l=1}^{m} \Delta f_{lf}, \quad l \neq f \tag{27} $$

where $\Delta f_{lf}$ is given by Eq. (23). Expression (27) shows how the overall network reacts to the failure of $f$. This is illustrated in Fig. 3, where such a variation is always positive, i.e. a failure induces an increase in the global network load. For the network of Fig. 1, the increase in the average load is greatest after the failure of link 23.
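Eqs. (26) and (27) amount to a few lines of arithmetic on the per-link variations. The sketch below assumes the variations are already available as a dictionary; the function names and the sample figures are hypothetical.

```python
def relative_variation_percent(variation, nominal_flow):
    """Eq. (26): variation of flow on each link as a percentage of its E0 flow."""
    return {
        link: 100.0 * delta / nominal_flow[link]
        for link, delta in variation.items()
        if nominal_flow[link] > 0  # the ratio is undefined for an idle link
    }


def average_load_change(variation, failed_link, link_count):
    """Eq. (27): sum of the variations over the links l != f, divided by m."""
    total = sum(delta for link, delta in variation.items() if link != failed_link)
    return total / link_count


# Hypothetical 4-link network (links "a", "b", "c" and the failed link "f"),
# with flows and variations in Mbps.
nominal_flow = {"a": 3.0, "b": 2.0, "c": 6.0}
variation = {"a": -1.5, "b": 3.0, "c": 1.5}

relative = relative_variation_percent(variation, nominal_flow)
average = average_load_change(variation, failed_link="f", link_count=4)
print(relative, average)
```

In this toy case the relative variations are -50%, 150% and 25%, and the average change in load is (-1.5 + 3.0 + 1.5) / 4 = 0.75 Mbps, which is positive even though one link loses traffic.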

Based on Eq. (1), an increase in the flow on a particular link induces an increase in the overall delay on that link. Consequently, we can expect the overall delay to follow the same trend as the variation of the network load. The evolution of the overall delay is illustrated in Fig. 4. Indeed, the failure of link 23, which leads to the highest variation of flow, causes the highest delay, and the failure of link 3, which has very little effect on the behavior of the network, induces the lowest delay. These observations confirm two interesting results: after a link failure, the variation of flow on each non-failed link is lower-bounded, while the overall load of the network increases.

4.3. Comparison of results

Let us apply the proposed method to GTE and USAnetworks shown, respectively, in Figs. 5 and 6 [8]. Also,let us identify the method presented in Ref. [8] by Method 1and the method presented in this paper by Method 2. Tocompare the results obtained from those two methods, let ususe the following set of data:

S. Pierre, R. Beaubrun / Computer Communications 23 (2000) 317–327 323

Table 3Flow and delay after the failure of links (1, 5) and (9, 13)

Link Capacity (Mbps) Failure of (1, 5) Failure of (9, 13)

Flow (Mbps) Ttotal (ms) Flow (Mbps) Ttotal (ms)

(1, 3) 12.624 3.900 0.4950 6.600 0.5464(1, 4) 12.624 3.150 0.4834 6.150 0.5323(1, 5) 6.312 – – 3.750 0.8152(2, 3) 6.312 2.850 0.6369 5.550 1.6603(2, 6) 12.624 5.850 0.6812 11.850 1.8256(2, 7) 12.624 7.500 0.7288 9.600 0.8643(3, 7) 12.624 5.400 0.4551 4.200 0.4354(4, 11) 12.624 3.750 0.7676 4.050 0.7715(4, 18) 12.624 3.000 1.4897 4.500 1.5089(5, 9) 12.624 5.700 0.5171 5.700 0.5171(5, 11) 6.312 2.100 0.7645 4.200 1.0005(6, 8) 12.624 4.200 0.5074 7.500 0.5839(6, 12) 6.312 2.850 0.7957 4.350 1.0166(7, 9) 18.528 12.750 0.4366 8.250 0.3608(8, 10) 18.528 11.550 0.4741 11.550 0.4741(8, 13) 18.528 13.350 0.7138 10.650 0.6476(9, 13) 25.248 16.650 0.4308 – –(10, 12) 18.528 8.250 0.2836 7.650 0.2783(10, 15) 12.624 4.800 0.5445 7.200 0.6010(11, 14) 12.624 8.400 0.4724 11.400 1.0527(12, 16) 12.624 6.000 0.7703 5.700 0.7638(13, 14) 25.248 17.700 0.4724 12.300 0.4172(14, 17) 18.528 11.400 0.5130 8.700 0.4744(14, 18) 6.312 2.850 0.9235 2.550 0.9005(15, 17) 12.624 5.250 0.4361 5.850 0.4481(15, 20) 6.312 1.200 0.9639 1.200 0.9639(16, 20) 6.312 2.400 0.6154 3.300 0.6918(17, 19) 12.624 8.250 0.7876 7.350 0.7486(18, 19) 12.624 4.500 0.4667 5.700 0.4880(19, 20) 12.624 4.650 0.7973 4.950 0.8022

Taver� 1:9094 ms Taver� 2:5405 ms

Table 4Link labeling

Link Label Link Label Link Label

(1, 3) 1 (5, 11) 11 (12, 16) 21(1, 4) 2 (6, 8) 12 (13, 14) 22(1, 5) 3 (6, 12) 13 (14, 17) 23(2, 3) 4 (7, 9) 14 (14, 18) 24(2, 6) 5 (8, 10) 15 (15, 17) 25(2, 7) 6 (8, 13) 16 (15, 20) 26(3, 7) 7 (9, 13) 17 (16, 20) 27(4, 11) 8 (10, 12) 18 (17, 19) 28(4, 18) 9 (10, 15) 19 (18, 19) 29(5, 9) 10 (11, 14) 20 (19, 20) 30

• a uniform traffic of 1 packet/s between each node pair;
• an average packet length $1/\mu$ of 400, 450 and 500 bits for the GTE network (Fig. 5) and 430 bits for the USA network (Fig. 6);
• capacity values (in kbps) as indicated over the respective links.
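The first two items of this data set can be turned into an offered load. The sketch below is only an illustration: it assumes that "each node pair" means each ordered pair of distinct nodes (the text does not say whether pairs are ordered), it takes the node counts from the captions of Figs. 5 and 6, and the function names are hypothetical.

```python
def uniform_traffic(num_nodes, rate_pps=1.0):
    """Traffic matrix with the same packet rate for every ordered node pair."""
    nodes = range(1, num_nodes + 1)
    return {(s, d): rate_pps for s in nodes for d in nodes if s != d}


def offered_load_bps(traffic, mean_packet_bits):
    """Total offered load in bit/s for a given mean packet length (bits)."""
    return sum(traffic.values()) * mean_packet_bits


gte = uniform_traffic(12)  # GTE network: 12 nodes (Fig. 5)
usa = uniform_traffic(26)  # USA network: 26 nodes (Fig. 6)

for bits in (400, 450, 500):
    print("GTE", bits, "bits/packet:", offered_load_bps(gte, bits), "bit/s")
print("USA", 430, "bits/packet:", offered_load_bps(usa, 430), "bit/s")
```

Under the ordered-pair assumption the packet rate is fixed by the number of nodes, so the offered load in bit/s grows linearly with the mean packet length.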

Fig. 2. Relative variation of flow on links 1, 2, 3.

Fig. 3. Average variation of the network load.

Table 5. Comparison of delay for GTE network (note: Method 1 corresponds to the method presented in Ref. [8], while Method 2 corresponds to the one presented in this paper)

| | Method 1 | | | Method 2 | | | Improvement (%) | | |
|---|---|---|---|---|---|---|---|---|---|
| $1/\mu$ (bits/packet) | 400 | 450 | 500 | 400 | 450 | 500 | 400 | 450 | 500 |
| $T_{\mathrm{aver}}$ without failure (ms) | 22.2 | 27.5 | 33.1 | 16.1 | 18.4 | 20.8 | 27.3 | 33.0 | 37.2 |
| $T_{\mathrm{aver}}$ with failure (ms) | 22.9 | 28.3 | 35.0 | 16.8 | 19.2 | 21.8 | 26.6 | 32.2 | 37.7 |

In this situation, since the capacity values are given, step 3 of Method 2 is skipped. For different packet lengths, we determine for each of the two methods the average delay with and without failure. Generally, this delay varies in the same way as $1/\mu$. For example, for the GTE network, Method 1 yields a delay of 22.2 ms without failure when $1/\mu = 400$ bits. This delay rises to 27.5 and 33.1 ms when the packet length becomes 450 and 500 bits, respectively. This can be explained by expression (1), where the delay on each link is proportional to $1/\mu$. As a result, the average packet delay increases with the packet length.
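To make the dependence on the packet length concrete, the sketch below evaluates a single-link delay for the three packet lengths. It is an illustration only: it assumes the classical M/M/1 link delay $T_i = (1/\mu)/(C_i - f_i)$ with $f_i = \lambda_i/\mu$, which may differ from the exact form of expression (1), and the link parameters (50 kbps, 50 packets/s) and the function name are hypothetical.

```python
def mm1_link_delay(capacity_bps, packet_rate_pps, mean_packet_bits):
    """Mean delay (s) on one link under the assumed M/M/1 model."""
    flow_bps = packet_rate_pps * mean_packet_bits  # f = lambda / mu
    if flow_bps >= capacity_bps:
        raise ValueError("link is saturated: flow must stay below capacity")
    return mean_packet_bits / (capacity_bps - flow_bps)


# Hypothetical link: 50 kbps capacity carrying a fixed 50 packets/s.
delays = {bits: mm1_link_delay(50_000, 50, bits) for bits in (400, 450, 500)}
for bits, delay in delays.items():
    print(f"1/mu = {bits} bits -> {1000 * delay:.2f} ms")
```

Because the packet rate is fixed, a longer packet also raises the flow $f_i$, so under this assumed model the delay grows somewhat faster than $1/\mu$ itself. This is consistent with Table 5, where the Method 1 delay without failure rises from 22.2 to 33.1 ms (a factor of about 1.49) for a packet length multiplied by 1.25.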

Moreover, the comparative results obtained from both methods and reported in Tables 5 and 6 reveal that the average delay obtained with Method 2 is lower than that obtained with Method 1. For the GTE network, the improvement in delay ranges from 27.3 to 37.2% without link failure, and from 26.6 to 37.7% with a link failure. For the USA network, the improvement is about 20% (22.1% without failure and 19.0% with a failure), which confirms the efficiency of Method 2.
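The improvement figures can be checked with a few lines of code. The sketch below assumes that the improvement is the relative reduction of the delay with respect to Method 1, that is, $100\,(T_1 - T_2)/T_1$; this definition is inferred from the tabulated values rather than stated in the text, and the function name is hypothetical.

```python
def improvement_percent(delay_method1_ms, delay_method2_ms):
    """Relative delay reduction of Method 2 with respect to Method 1, in %."""
    return 100.0 * (delay_method1_ms - delay_method2_ms) / delay_method1_ms


# Average delays for the USA network (Table 6), in ms.
usa = {"without failure": (11.3, 8.8), "with failure": (12.1, 9.8)}
for case, (t1, t2) in usa.items():
    print(f"{case}: {improvement_percent(t1, t2):.1f}%")
# prints "without failure: 22.1%" and "with failure: 19.0%"
```

Applied to the rounded GTE delays of Table 5, the same formula reproduces most entries exactly and differs by at most a few tenths of a point for the others (for example 27.5 instead of 27.3 for 400 bits without failure), presumably because the published percentages were computed from unrounded delays.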

5. Concluding remarks

Fig. 4. Variation of the average packet delay.

Fig. 5. GTE network (12 nodes, 25 links).

Table 6. Comparison of delay for USA network with $1/\mu = 430$ bits/packet

| | Method 1 | Method 2 | Improvement (%) |
|---|---|---|---|
| $T_{\mathrm{aver}}$ without failure (ms) | 11.3 | 8.8 | 22.1 |
| $T_{\mathrm{aver}}$ with failure (ms) | 12.1 | 9.8 | 19.0 |

This paper examines a new version of the routing problem by combining routing and survivability aspects. The method presented in Ref. [8] can only be applied during the operational phase and leads to a complex expression for computing the variation of flow on each link. This led us to propose a new method based on link-disjoint routes, which simplifies the expressions characterizing the variation of flow over the non-failed links. Such a method can be applied during the design phase. In this case, it allows us to compute capacity values that make the network survivable to any link failure, without major degradation of the quality of service. In this context, a complete study of the behavior of a particular network has been carried out. On the other hand, when applied during the operational phase, the proposed method does not make any capacity assignment, and result analysis reveals its efficiency. Its application to the GTE and USA networks generally gives better results than the method presented in Ref. [8]. In fact, the improvement in the average packet delay varies from 19.0 to 37.7%. However, the restriction to link-disjoint routes has two disadvantages. First, the construction of the secondary routes $r'$ requires several operations, which can make the search for $r'$ relatively slow. Second, for a given pair of nodes, the fact that each link belongs exclusively to either the primary or the secondary route can induce a relatively high delay over the secondary route. Future research could be oriented toward determining a set of rules for selecting the most reliable paths to be used for routing packets.
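As an illustration of the link-disjoint restriction and of its two drawbacks, the sketch below uses the 30 links listed in Table 4 and a simple remove-and-search heuristic: find a minimum-hop primary route, delete its links, and search again for the secondary route $r'$. The hop-count metric, the heuristic, and all function names are assumptions made for illustration; they are not the procedure defined in this paper.

```python
from collections import deque

# Links of the example network, taken from Table 4.
LINKS = [(1, 3), (1, 4), (1, 5), (2, 3), (2, 6), (2, 7), (3, 7), (4, 11),
         (4, 18), (5, 9), (5, 11), (6, 8), (6, 12), (7, 9), (8, 10), (8, 13),
         (9, 13), (10, 12), (10, 15), (11, 14), (12, 16), (13, 14), (14, 17),
         (14, 18), (15, 17), (15, 20), (16, 20), (17, 19), (18, 19), (19, 20)]


def shortest_path(links, src, dst):
    """Breadth-first search: returns a minimum-hop node list, or None."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def path_links(path):
    """Set of undirected links traversed by a node list."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def disjoint_pair(links, src, dst):
    """Primary route, then a secondary route sharing no link with it."""
    primary = shortest_path(links, src, dst)
    if primary is None:
        return None, None
    used = path_links(primary)
    remaining = [link for link in links if frozenset(link) not in used]
    return primary, shortest_path(remaining, src, dst)


primary, secondary = disjoint_pair(LINKS, 1, 20)
print("primary:", primary)
print("secondary:", secondary)
```

For the node pair (1, 20), this sketch returns a 4-hop primary route and a 6-hop secondary route, which illustrates how excluding the primary links can lengthen the secondary route. The second search is also an extra pass over the whole graph for every node pair, in line with the first drawback noted above. Note that this heuristic can fail to find a disjoint pair even when one exists.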

6. Acknowledgement

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada under grant 140264-98.

References

[1] K.K. Aggarwal, J.S. Gupta, K.B. Misra, A simple method for reliability evaluation of a communication system, IEEE Transactions on Communications COM-23 (5) (1975) 563–566.

[2] D. Bertsekas, R. Gallager, Data Networks, Prentice-Hall, Englewood Cliffs, NJ, 1987, pp. 297–407.

[3] M. Carey, A. Hendrickson, Bounds on expected performance of networks with links subject to failure, Networks 14 (1984) 439–456.

[4] H. Chen, J. Zhou, Reliability optimization in generalized stochastic-flow networks, IEEE Transactions on Reliability 40 (1) (1991) 92–97.

[5] C.J. Colbourn, Combinatorial aspects of network reliability, Annals of Operations Research 33 (1991) 3–15.

[6] L. Fratta, M. Gerla, L. Kleinrock, The flow deviation method: an approach to store-and-forward communication network design, Networks 3 (1974) 97–103.

[7] B. Gavish, I. Neuman, A system for routing and capacity assignment in computer communication networks, IEEE Transactions on Communications COM-37 (4) (1989) 360–366.

[8] B. Gavish, I. Neuman, Routing in a network with unreliable components, IEEE Transactions on Communications COM-40 (7) (1992) 1248–1258.

[9] A. Girard, B. Sanso, Multicommodity flow models, failure propagation, and reliable loss network design, IEEE/ACM Transactions on Networking 6 (1) (1998) 82–93.

[10] A. Kershenbaum, Telecommunications Network Design Algorithms, McGraw-Hill, New York, 1993.

[11] L. Kleinrock, Computer applications, Queueing Systems, II, Wiley, New York, 1976, pp. 270–421.

[12] P. Kubat, Alternatives for reliable and quality telecommunication network design, Annals of Operations Research 33 (1991) 95–105.

[13] S. Pierre, G. Legault, A genetic algorithm for designing distributed computer network topologies, IEEE Transactions on Systems, Man, and Cybernetics 28 (2) (1998) 249–258.

Fig. 6. USA network (26 nodes, 41 links).

[14] V.O.K. Li, J.A. Silvester, Performance analysis of networks with unreliable components, IEEE Transactions on Communications COM-32 (10) (1984) 1105–1110.

[15] K.T. Newport, P.K. Varshney, Design of survivable communications networks under performance constraints, IEEE Transactions on Reliability 40 (4) (1991) 433–440.

[16] S. Pierre, R. Beaubrun, A routing algorithm for distributed communication networks, 22nd Annual Conference on Computer Networks, Minneapolis, MN, 1997, pp. 99–105.

[17] S. Pierre, Inferring new design rules by machine learning: a case study of topological optimization, IEEE Transactions on Systems, Man, and Cybernetics 28A (5) (1998) 575–585.

[18] U. Schumacher, An algorithm for construction for a k-connected graph with minimum number of edges and quasiminimal diameter, Networks 14 (1984) 63–74.

[19] M. Schwartz, T.E. Stern, Routing techniques used in computer communication networks, IEEE Transactions on Communications COM-28 (4) (1980) 539–552.

[20] C. Wang, M. Schwartz, Identification of faulty links in dynamic-routed networks, IEEE Journal on Selected Areas in Communications 11 (9) (1993) 1449–1460.

[21] T. Yokohira, M. Sugano, T. Nishida, H. Miyahara, Fault tolerant packet-switched network design and its sensitivity, IEEE Transactions on Reliability 40 (4) (1991) 452–460.
