Analyzing the Hold Time Schemes to Limit the Routing Table Calculations in OSPF Protocol


M Goyal, M Soperi, H Hosseini
University of Wisconsin Milwaukee
Milwaukee, WI 53201 USA
{mukul,msm2,hosseini}@uwm.edu

KS Trivedi
Duke University
Durham, NC 27708 USA
[email protected]

A Shaikh
AT&T Labs - Research
Florham Park, NJ 07932 USA
[email protected]

G Choudhury
AT&T Labs - Research
Middletown, NJ 07748 USA
[email protected]

Abstract

OSPF is a popular interior gateway routing protocol. Commercial OSPF routers limit their processing load by using a hold time between successive routing table calculations as new link state advertisements (LSAs) arrive following a topology change. A large hold time value limits the frequency of routing table calculations but also causes large delays in convergence to the topology change. Hence, commercial routers now use an exponential backoff scheme, where the hold time is initially set to a small value that is expected to rapidly increase, and hence limit the frequency of routing table calculations, in the face of continuous LSA arrivals. In this paper, we analyze the ability of different hold time schemes to limit the frequency of routing table calculations under continuous LSA arrivals starting with a small value for the hold time. This analysis is performed using Markov Regenerative Process based stochastic models as well as simulations using an extensively modified ospfd simulator.

1. Introduction

Link state routing protocols such as OSPF [15] and IS-IS [17], using shortest path first (SPF) routing, are the most popular protocols providing routing functionality within an autonomous system (AS). An AS is often organized as a number of areas for scalability reasons. A router needs to know the complete topology of the areas it belongs to. To accomplish this, each router generates a link state advertisement (LSA) for each area it belongs to. A router’s LSA for an area describes the state of each one of the router’s working interfaces belonging to that area, including whether the router has established adjacency with a neighbor router on that interface.¹ These LSAs are flooded throughout the area. Whenever there is a change in the network topology, the routers affected by the topology change establish new adjacencies or break old ones and generate new LSAs that are flooded throughout the area. A router maintains a database of the latest LSAs originated by each router in the areas it belongs to. This link state database represents the current network topology for the router.

The OSPF specification [15] requires a router to recalculate its routing table from scratch whenever it receives a new router or network² LSA. A complete routing table calculation involves calculating the intra-area routes for all OSPF areas to which the router belongs (typically using Dijkstra’s shortest path algorithm [6] on the contents of router and network LSAs) and the inter-area/AS-external routes by examining the contents of all summary³ and AS-external⁴ (ASE) LSAs. The amount of time required in this process depends on the number of different types of LSAs in the router’s link state database. Typically, a backbone router may have up to a few hundred router/network LSAs and up to a few thousand summary/ASE LSAs in its link state database. Calculating intra-area routes using Dijkstra’s algorithm (with a time complexity O(n × log(n))) takes only a few tens of milliseconds on modern routers [7, 19]. This time can be further reduced by using incremental algorithms [4] rather than Dijkstra’s algorithm. However, the examination of the summary/ASE LSAs may potentially take several hundred milliseconds [7]. Thus, a complete routing table calculation may possibly keep the router’s CPU busy for a considerable amount of time.

¹ Such LSAs are called router LSAs.
² A network LSA describes the routers on a broadcast LAN.
³ A summary LSA describes the cost of reaching some out-of-area destinations via an area border router.
⁴ An AS-external LSA describes the cost of reaching some out-of-AS destinations via an AS boundary router.

2009 International Conference on Advanced Information Networking and Applications
1550-445X/09 $25.00 © 2009 IEEE
DOI 10.1109/AINA.2009.83

A topology change may cause a number of new router/network LSAs to be generated. For example, when a router goes down, all the routers hitherto adjacent to this router would break adjacency with it and regenerate their router LSAs. Often, the topology changes take place concurrently or in quick succession. For example, when an optical fiber gets accidentally cut, all IP-level links carried by the fiber will go down. Similarly, a power failure at a point-of-presence (PoP) would cause all routers residing in the PoP to go down. Such large scale topology changes may cause a large number of LSAs to be generated. These LSAs may travel over different paths to arrive at a router over an extended period of time. If a router were to perform a new routing table calculation for each new router/network LSA it receives, it may end up doing several such calculations in quick succession that may keep the router CPU busy for a long time and prevent it from doing other important tasks such as timely generation and processing of Hello messages. It is possible that such failures may snowball into a complete meltdown of routing functionality.

To avoid such problems, commercial routers typically do not perform a routing table calculation immediately on receiving a new LSA. Cisco routers, with older IOS releases, used a fixed value parameter spfHoldTime, henceforth called the hold time, to limit the frequency of routing table calculations to once per 10 seconds. Additionally, there was an spfDelay (5 seconds) in doing a routing table calculation after receiving the first new LSA. While fixed spfDelay and spfHoldTime parameters limit the number of routing table calculations and hence help avoid routing instability, they also slow down the router’s convergence to the new topology. With the default value of 10 seconds for the hold time, a router may take up to 10 seconds to perform a routing table calculation after receiving the final LSA resulting from the topology change. This delay adds to the other delays in the convergence process such as the delays in LSA generation and flooding following the topology change. Large convergence delays may have a crippling impact on real-time services (e.g. voice over IP) and hence significant attention has been devoted in the recent past to optimizing the process of failure detection, adjacency establishment, LSA generation and flooding in OSPF and other link state routing protocols [7, 11, 1, 2, 3, 16].

Topology changes such as individual link/node up/down are common events in IP backbones [13]. Hence, it is important to converge quickly to the new network topology following such events if they generate only a small number of LSAs. A small value for the hold time would allow quick convergence to such topology changes at the cost of a few routing table calculations. However, a small hold time value would result in too many routing table calculations if a large scale topology change takes place resulting in the generation of a large number of new LSAs that arrive at a router over an extended period of time. Such continuous LSA arrivals also occur in case a faulty network interface card causes link flaps. A small hold time value may seriously compromise routing stability under continuous LSA arrivals.

With an objective to achieve fast convergence for topology changes that generate only a few LSAs and to limit the frequency of routing table calculations for topology changes that cause continuous LSA arrivals, Cisco routers with post-12.2(14)S IOS releases use a simple exponential backoff scheme to adjust the hold time between successive routing table calculations [5]. In this scheme, referred to as the exponential backoff hold time (or the exponential hold time) scheme in the following discussion⁵, the hold time between successive routing table calculations is initially set to a small value and increases exponentially up to a certain maximum for continuous LSA arrivals. In Section 2, we describe this scheme in more detail and suggest a modification that may perform better than the original scheme when the initial hold time value is small in comparison to the inter-LSA arrival interval.

In an earlier work [10], we analyzed the performance of different hold time schemes when a topology change leads to the generation of a fixed number of LSAs. In this paper, we analyze the performance of different hold time schemes under continuous LSA arrivals. The performance is measured in terms of the ability of different hold time schemes to limit the frequency (number) of routing table calculations under continuous (large scale) LSA arrivals. This analysis is performed using Markov Regenerative Process (MRGP) [8, 12] based stochastic models for the hold time schemes as well as simulations of large scale topology changes using an extensively modified ospfd simulator [9].

The rest of the paper is organized as follows. Section 2 describes different hold time based schemes. Section 3 presents MRGP based stochastic models for the schemes under continuous LSA arrivals with exponentially distributed inter-arrival times. Section 3 then analyzes the behavior of the schemes using the models. Section 4 compares the performance of different hold time based schemes via simulations. Finally, Section 5 concludes the paper.

2. Hold Time Based Schemes for Scheduling Routing Table Calculations

[Figure 1. Different hold time schemes to schedule routing table calculations: (a) A simple “fixed” hold time scheme; (b) A simple “exponential backoff” based hold time scheme; (c) An “exponential backoff” based hold time scheme using initReturn time.]

A simple fixed hold time scheme to schedule routing table calculations is shown in Figure 1(a). In this scheme, the receipt of an LSA in the init state causes a transition to the spf state. A transition to the spf state is accompanied by an immediate⁶ routing table calculation and starting of a hold timer. We assume that the time required for a routing table calculation is less than the hold time. In the spf state, the router waits for the hold time to expire or a new LSA to arrive. The expiry of the hold time causes the router to return to the init state. Otherwise, the receipt of a new LSA moves the router from the spf state to the spfHold state. In the spfHold state, the router has one or more pending LSAs and is waiting for the hold time to expire so that it can return to the spf state and perform a routing table calculation, which will assimilate all the LSAs received while the hold timer was running.

⁵ This scheme is referred to as SPF Throttling in Cisco literature.
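The fixed hold time scheme of Figure 1(a), as described in the text, can be sketched as a transition table. This is a minimal illustration rather than router code: the event names `lsa` and `hold_over`, the action labels and the `replay` helper are hypothetical names introduced here.

```python
# Transition table for the fixed hold time scheme of Figure 1(a).
# Keys are (state, event); values are (next_state, actions).
# Event and action names are illustrative, not taken from any router software.
FIXED_SCHEME = {
    ("init", "lsa"): ("spf", ("calc", "start_hold")),
    ("spf", "lsa"): ("spfHold", ()),
    ("spf", "hold_over"): ("init", ()),
    ("spfHold", "lsa"): ("spfHold", ()),
    ("spfHold", "hold_over"): ("spf", ("calc", "start_hold")),
}


def replay(events, table=FIXED_SCHEME, state="init"):
    """Replay a sequence of events and count routing table calculations.

    Returns (final_state, number_of_calculations). An event that is not
    possible in the current state (e.g. "hold_over" in init) raises KeyError.
    """
    calcs = 0
    for event in events:
        state, actions = table[(state, event)]
        calcs += actions.count("calc")
    return state, calcs
```

For instance, three LSAs followed by two hold timer expiries, `replay(["lsa", "lsa", "lsa", "hold_over", "hold_over"])`, trigger two calculations: one on the first LSA and one that absorbs the two LSAs received during the hold time.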

A corresponding exponential backoff based hold time scheme is shown in Figure 1(b). In this scheme, henceforth referred to as the simple exponential backoff scheme, the transition from the spf state to the spfHold state causes the hold time to double in value up to a maximum. The simple exponential backoff scheme starts with a small value for the hold time. In case of frequent LSA arrivals, the hold time is expected to quickly reach its maximum value thereby limiting the frequency of routing table calculations. However, it is possible that no LSA is received during a hold time duration and the hold time is reset to its small initial value. This event becomes more probable as the initial hold time value gets smaller. To address this problem, we modify the simple exponential backoff scheme as shown in Figure 1(c). The modified scheme, called the exponential backoff scheme using initReturn time, uses a new state, spfHoldOver, and an additional parameter, the initReturn time. The expiry of the hold time in the spf state without receiving any new LSA causes a transition to the spfHoldOver state along with starting of the initReturn timer. The router returns to the init state and resets the hold time to its small initial value only when no new LSA has been received in the spfHoldOver state for the initReturn time duration. Receipt of a new LSA in the spfHoldOver state before the firing of the initReturn timer causes an immediate routing table calculation along with a transition to the spf state without changing the hold time value.
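The two backoff schemes can be compared on a given LSA trace with a small event-driven sketch. It follows the state descriptions above under a few assumptions of its own: routing table calculations take zero time, a timer that expires at the same instant as an LSA arrival fires first, and the function name `calc_times` and its parameters are hypothetical.

```python
def calc_times(lsa_times, h0, max_h, init_return=None):
    """Return the times at which routing table calculations happen.

    lsa_times   : LSA arrival times in seconds
    h0, max_h   : initial and maximum hold time in seconds
    init_return : initReturn time in seconds; None selects the simple scheme
    """
    calcs = []
    state, h, deadline = "init", h0, None  # deadline: expiry of running timer
    for t in sorted(lsa_times):
        # Fire every timer that expires at or before this arrival.
        while deadline is not None and deadline <= t:
            if state == "spfHold":             # pending LSAs: calculate now
                calcs.append(deadline)
                state, deadline = "spf", deadline + h
            elif state == "spf" and init_return is not None:
                state, deadline = "spfHoldOver", deadline + init_return
            else:                              # quiet period: reset to init
                state, h, deadline = "init", h0, None
        if state in ("init", "spfHoldOver"):   # immediate calculation
            calcs.append(t)
            state, deadline = "spf", t + h
        elif state == "spf":                   # first LSA during hold: back off
            state, h = "spfHold", min(2 * h, max_h)
        # In spfHold, further LSAs are absorbed by the pending calculation.
    if state == "spfHold":                     # flush the last pending calculation
        calcs.append(deadline)
    return calcs
```

With arrivals at `[0, 0.5, 1.5, 2.5, 10, 10.5, 12.5]` seconds, `h0=1` and `max_h=4`, the simple scheme resets the hold time during the lull before the LSA at 10 seconds and calculates six times, while `init_return=5` keeps the hold time at 4 seconds and calculates five times. Setting `max_h` equal to `h0` reproduces the fixed hold time scheme.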

3. Understanding The Behavior of Different Hold Time Schemes Under Continuous LSA Arrivals

In this section, we analyze the behavior of different hold time schemes under sustained arrival of LSAs over a relatively long time duration. Such scenarios may happen during large scale topology changes (e.g. network-wide reboot of routers over a few minutes) or due to pathological conditions such as link flaps. Our objective is to understand the ability of different hold time schemes to limit the frequency of routing table calculations under such scenarios. For this purpose, we develop Markov Regenerative Process (MRGP) [8, 12] based stochastic models for three hold time based schemes under continuous LSA arrivals with exponentially distributed inter-arrival times and examine the steady state time interval between successive routing table calculations. The Markov regenerative process is a powerful modeling technique that allows local state transitions between two Markov regeneration points. From our perspective, modeling a hold time scheme as a Markov regenerative process allows us to keep track of the deterministic hold and initReturn times as the exponentially distributed LSA arrivals cause state transitions.

⁶ A variant scheme would have a small delay (spfDelay) before the routing table calculation accompanying the init to spf transition.

A Markov regenerative process, $\{Z(t), t \ge 0\}$, has an embedded Markov renewal sequence, $\{(Y_n, S_n), n \ge 0\}$, where $S_n$ is the $n$th Markov regeneration point and $Y_n$ is a function of the state of the MRGP at time $S_n$, i.e., $Y_n = f(Z(S_n))$. Typically, $Y_n$ is the state of the MRGP itself at time $S_n$, i.e., $Y_n = Z(S_n)$. The embedded Markov renewal sequence has an associated kernel, $G(x) = [g_{i,j}(x)]$, where $g_{i,j}(x) = P\{Y_1 = j, S_1 \le x \mid Y_0 = i\}$. Let $\pi = [\pi_i]$, $i \in \Omega$, be a positive solution to the linear system $\pi = \pi G(\infty)$, $\sum_{i \in \Omega} \pi_i = 1$, where $\Omega$ is the state space of the embedded Markov renewal sequence. Then, it can be shown that, under certain conditions, the steady state probability $p_j = \lim_{t \to \infty} P\{Z(t) = j\}$ of being in state $j$ of the MRGP is given by:

$$p_j = \frac{\sum_{i \in \Omega} \pi_i \alpha_{i,j}}{\sum_{i \in \Omega} \pi_i \mu_i}, \tag{1}$$

where $\mu_i = E(S_1 \mid Y_0 = i)$ and $\alpha_{i,j} = E(\text{time spent in state } j \text{ during } [0, S_1) \mid Y_0 = i)$.

[Figure 2. Markov regenerative process models for different hold time schemes shown in Figure 1: (a) The “fixed” hold time scheme; (b) The simple “exponential backoff” based hold time scheme; (c) The “exponential backoff” based hold time scheme using initReturn time.]

Figures 2(a), 2(b) and 2(c) show the MRGP models for the three hold time schemes described in the previous section. In the following, we describe the model for the exponential backoff scheme using initReturn time. In this model, we have multiple sets of spf, spfHold and spfHoldOver states, one for each possible hold time value. A transition to any spf state causes an immediate routing table calculation and starting of the hold timer. Receipt of an LSA while the hold timer is running causes a transition to the corresponding spfHold state and finally to the next spf state when the hold timer fires. Since we limit the hold time to a maximum value of 16H, the receipt of an LSA in the spf16 state while the hold timer is running would cause a transition back to the same state after a sojourn in the spfHold16 state. On the other hand, if the hold timer fires in an spf state without receiving any LSA, there will be a transition to the corresponding spfHoldOver state along with starting of the initReturn timer. If an LSA is received while the initReturn timer is running, there is a transition to the corresponding spf state. Otherwise, there is a transition to the init state that causes the hold time to be reset to H and causes a move to the spf1 state when the next LSA arrives. The stochastic process $\{Z(t), t \ge 0\}$, defined as the system state at time $t$, is a Markov regenerative process with an embedded Markov renewal sequence $(Y_n, T_n)$, $Y_n \in \Omega = \{0, 1, 2, \cdots, 10\}$, where $0, 1, \cdots, 10$ are labels for different states as shown in Figure 2(c). The sequence $(Y_n, T_n)$ is defined as follows:

• $T_0 = 0$, $Y_0 = 0$.

• If at time $T_n^+$, $Y_n = Z(T_n^+) = 0$, then $T_{n+1}$ is the time when the next LSA arrives and $Y_{n+1} = Z(T_{n+1}^+) = 2$.

• If at time $T_n^+$, $Y_n = Z(T_n^+) \in \{2, 4, 6, 8, 10\}$, then $T_{n+1}$ is the time when the current hold time is over and $Y_{n+1} = Z(T_{n+1}^+)$.

• If at time $T_n^+$, $Y_n = Z(T_n^+) \in \{1, 3, 5, 7, 9\}$, then $T_{n+1}$ is the time when the initReturn time is over or an LSA arrives, whichever occurs first, and $Y_{n+1} = Z(T_{n+1}^+)$.

Note that $\{Z(t), t \ge 0\}$ is not a semi Markov process as LSAs could arrive while the hold timer or initReturn timer is running.

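The matrix $G(\infty) = [g_{i,j}]$, $g_{i,j} = P(Y_1 = j \mid Y_0 = i)$, is given by:

$$
\begin{aligned}
g_{0,2} &= 1, & g_{2,1} &= e^{-\lambda H}, \\
g_{4,3} &= e^{-2\lambda H}, & g_{6,5} &= e^{-4\lambda H}, \\
g_{8,7} &= e^{-8\lambda H}, & g_{10,9} &= e^{-16\lambda H}, \\
g_{2,4} &= 1 - e^{-\lambda H}, & g_{4,6} &= 1 - e^{-2\lambda H}, \\
g_{6,8} &= 1 - e^{-4\lambda H}, & g_{8,10} &= 1 - e^{-8\lambda H}, \\
g_{10,10} &= 1 - e^{-16\lambda H}
\end{aligned}
$$

and

$$
\begin{aligned}
g_{1,2} = g_{3,4} = g_{5,6} = g_{7,8} = g_{9,10} &= 1 - e^{-\lambda I}, \\
g_{1,0} = g_{3,0} = g_{5,0} = g_{7,0} = g_{9,0} &= e^{-\lambda I}.
\end{aligned}
$$

The rest of the entries in $G(\infty)$ are zero. Here, $\lambda$ is the rate of exponentially distributed LSA inter-arrival times, $H$ is the initial value of the hold time and $I$ is the value of the initReturn time. Also, the $\mu_i = E(S_1 \mid Y_0 = i)$ values are as follows:

$$
\begin{aligned}
\mu_0 &= \frac{1}{\lambda}, \\
\mu_2 &= H, \quad \mu_4 = 2H, \quad \mu_6 = 4H, \quad \mu_8 = 8H, \quad \mu_{10} = 16H, \\
\mu_1 &= \mu_3 = \mu_5 = \mu_7 = \mu_9 = I e^{-\lambda I} + \frac{1}{\lambda}\left(1 - e^{-\lambda I}\right).
\end{aligned}
$$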

We can calculate steady state probabilities of being in one of the spf states using Equation (1) that additionally requires $\alpha_{i,j}$ values, where $\alpha_{i,j} = E(\text{time spent in state } j \text{ during } [0, S_1) \mid Y_0 = i)$. Since the transitions to the spf states take place only at Markov regeneration instants ($S_n$), the values of $\alpha_{i,j}$, $j \in \{2, 4, 6, 8, 10\}$, $i \ne j$, are all zero. Thus, the steady state probability of being in an spf state $j$ is given by:

$$p_j = \frac{\pi_j \alpha_{j,j}}{\sum_{i \in \Omega} \pi_i \mu_i}, \quad j \in \{2, 4, 6, 8, 10\}$$

where $\pi = [\pi_i]$, $i \in \Omega$, is the solution of the linear system $\pi = \pi G(\infty)$, $\sum_{i \in \Omega} \pi_i = 1$. Thus, the steady state rate of entry in an spf state $j$ is given by:

$$\frac{p_j}{\alpha_{j,j}} = \frac{\pi_j}{\sum_{i \in \Omega} \pi_i \mu_i}$$

and hence the steady state rate of routing table calculations is given by

$$\sum_{j \in \{2, 4, 6, 8, 10\}} \frac{\pi_j}{\sum_{i \in \Omega} \pi_i \mu_i}.$$

The inverse of this rate gives the expected time between successive routing table calculations in the steady state.
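The calculation described above can be sketched numerically. The following pure-Python example builds $G(\infty)$ and the $\mu_i$ values as printed in the text for the scheme using initReturn time, solves $\pi = \pi G(\infty)$ with a small Gauss-Jordan routine, and returns the inverse of the calculation rate. The function names (`solve`, `build_model`, `inter_calc_time`) are hypothetical, and the hand-written solver is only a stand-in for whatever linear algebra package one would normally use.

```python
import math


def solve(a, b):
    """Solve the dense linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def build_model(lam, H, I):
    """Return (G, mu) for the 11-state embedded chain of Figure 2(c).

    State 0 is init, even states 2..10 are the spf states for hold times
    H..16H, odd states 1..9 are the matching spfHoldOver states.
    """
    n = 11
    G = [[0.0] * n for _ in range(n)]
    mu = [0.0] * n
    G[0][2], mu[0] = 1.0, 1.0 / lam            # init: wait for the next LSA
    r = math.exp(-lam * I)                     # no LSA during the initReturn time
    for k in range(5):                         # hold times H, 2H, 4H, 8H, 16H
        spf, over, h = 2 * k + 2, 2 * k + 1, H * 2 ** k
        q = math.exp(-lam * h)                 # no LSA during the hold time
        G[spf][over] = q
        G[spf][min(spf + 2, 10)] = 1.0 - q     # back off (spf16 loops to itself)
        mu[spf] = h
        G[over][spf], G[over][0] = 1.0 - r, r
        mu[over] = I * r + (1.0 - r) / lam     # expression as printed in the text
    return G, mu


def inter_calc_time(lam, H, I):
    """Expected steady state time between routing table calculations."""
    G, mu = build_model(lam, H, I)
    n = len(G)
    # pi (G - Id) = 0, with the last balance equation replaced by sum(pi) = 1.
    A = [[G[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    pi = solve(A, [0.0] * (n - 1) + [1.0])
    cycle = sum(p * m for p, m in zip(pi, mu))
    return cycle / sum(pi[j] for j in (2, 4, 6, 8, 10))
```

Calling `inter_calc_time(1.0, H, 10.0)` for a range of `H` values gives one point per value on an inter-calculation time curve for this scheme. One caveat on the inputs: the sketch keeps the spfHoldOver mean exactly as printed. The mean of min(X, I) for an exponential X with rate λ is (1 − e^{−λI})/λ, which differs from the printed expression by I·e^{−λI}, roughly 0.0005 seconds when λ is 1 per second and I is 10 seconds.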

Figure 3 shows the expected time interval between successive routing table calculations, henceforth called the inter-calculation time, in steady state under continuous LSA arrivals as the (initial) hold time value, H, increases. The curves in Figure 3 are drawn using the MRGP models developed above (i.e. the maximum hold time value is 16 times the initial value). The LSA inter-arrival times are exponentially distributed with a rate of one LSA per second. The initReturn time, where applicable, is set to 10 seconds. The (initial) hold time value, H, varies in the range 20ms to 10 seconds. Figures 3(b) and 3(c) are zoomed-in versions of Figure 3(a) focusing on H values less than 1 second and 0.4 second respectively.

For the fixed and simple exponential backoff schemes, if H is small compared to the inter-LSA arrival interval, the router is likely to return to the init state before the arrival of the next LSA, thus resulting in an immediate routing table calculation for each LSA arrival. Exponential backoff in hold time may not take place or would be undone by a return to the init state when no LSA arrives during the current hold time duration. This behavior is apparent in Figure 3(c). As Figure 3(c) shows, under the fixed hold time scheme, the expected inter-calculation time is almost the same as the expected interval between successive LSA arrivals (1 second) for small H values. The same is true for the simple exponential backoff scheme when H is less than 0.15 seconds. The simple exponential backoff scheme is susceptible to hold time resets even when the H value increases beyond 0.15 seconds, as evidenced by the slow increase in the expected inter-calculation time (Figure 3(c)) towards 16H, the maximum hold time value.
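For the fixed hold time scheme, the regeneration argument is short enough to work by hand. The following is a sketch under this section's assumptions (LSA inter-arrival times exponential with rate $\lambda$, negligible calculation time), with regeneration points taken at entries to the init and spf states. The symbols $\pi_{\text{init}}$, $\pi_{\text{spf}}$ and $T$ (the inter-calculation time) are introduced here for illustration, and the derivation is not taken from the paper.

```latex
\begin{aligned}
\pi_{\text{init}} &= \pi_{\text{spf}}\, e^{-\lambda H}, \qquad
\mu_{\text{init}} = \frac{1}{\lambda}, \qquad
\mu_{\text{spf}} = H, \\
E[T] &= \frac{\pi_{\text{init}}\,\mu_{\text{init}} + \pi_{\text{spf}}\,\mu_{\text{spf}}}{\pi_{\text{spf}}}
      = H + \frac{e^{-\lambda H}}{\lambda}.
\end{aligned}
```

With $\lambda = 1$ per second this gives about 1.0002 seconds at H = 0.02 seconds and about 10.00005 seconds at H = 10 seconds, so the expected inter-calculation time sits near the inter-LSA interval for small H and near the hold time for large H.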

[Figure 3. Expected time interval between successive routing table calculations for the hold time schemes under exponentially distributed LSA arrivals at rate one per second. Panels (a), (b) and (c) plot E(inter-calculation time) (seconds) against H (seconds) for H up to 10, 1 and 0.4 seconds respectively; curves: 16H, Expo. with InitReturn, Exponential, Fixed.]

On the other hand, under the exponential backoff scheme using initReturn time, the hold time reaches and maintains its maximum value (16H) irrespective of the H value and the expected inter-calculation time quickly converges to the maximum hold time value as well (Figure 3(c)). Under this scheme, for very small H values where the maximum hold time value is smaller than or not significantly larger than the expected inter-LSA interval (1 second), the expected inter-calculation time is larger than both the expected inter-LSA interval and the maximum hold time value. This behavior is due to the exponentially distributed nature of the inter-LSA interval and can be explained as follows. For such H values, even though the hold time reaches its maximum value, there is a significant probability that no LSA is received during the hold time and the router enters the spfHoldOver state. Due to the memoryless nature of LSA arrivals, the next LSA, which would cause the next routing table calculation, is expected only after 1 second, the expected inter-LSA interval. Thus, the expected inter-calculation time is larger than both the expected inter-LSA interval and the maximum hold time value. As the H value increases, the probability that no LSA is received during the maximum hold time duration goes down quickly and hence the expected inter-calculation interval quickly converges to the maximum hold time value (16H).
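This explanation can be made concrete with a back-of-the-envelope estimate. The sketch assumes that the hold time stays at its maximum value $M = 16H$ and that returns to the init state are rare enough to ignore (the probability $e^{-\lambda I}$ of a silent initReturn period is about $4.5 \times 10^{-5}$ for $\lambda = 1$ per second and $I = 10$ seconds). The symbols $M$ and $T$ are introduced here for illustration.

```latex
E[T] \;\approx\;
\underbrace{M}_{\text{hold time}}
\;+\;
\underbrace{e^{-\lambda M}}_{P(\text{no LSA during hold time})}
\cdot
\underbrace{\frac{1}{\lambda}}_{\text{expected wait in spfHoldOver}},
\qquad M = 16H.
```

For H = 0.02 seconds, M = 0.32 seconds and the estimate is about 0.32 + 0.73 ≈ 1.05 seconds, which exceeds both the 1 second inter-LSA interval and the 0.32 second maximum hold time. For H = 0.4 seconds, M = 6.4 seconds and the extra term is about 0.002 seconds, so the estimate is essentially 16H.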

As H increases to 0.4 seconds and beyond (Figure 3(b)), the probability that an LSA is received during a hold time duration increases and hence the exponential backoffs in the hold time value become likely. For these H values, the simple exponential backoff scheme is able to maintain the maximum hold time value (16H) and the expected inter-calculation time converges to 16H. Clearly, the exponential backoff scheme with initReturn time is more suitable than the simple exponential backoff scheme when the desired H values are quite small. Under the fixed hold time scheme, as the hold time value (H) increases and becomes more than the expected inter-LSA arrival interval, the expected inter-calculation interval becomes the same as the hold time value (Figure 3(a)).

Overall, the curves in Figure 3 suggest that the relative performance of the three hold time schemes under continuous LSA arrivals depends on the relative values of the (initial) hold time H and the inter-LSA interval. If H is small compared to the inter-LSA interval, the frequency of routing table calculations with the simple exponential backoff scheme may not be better than with the fixed hold time scheme since the exponential backoff may not kick in or there may be frequent resets of the hold time to its small initial value (H). The exponential backoff scheme with initReturn time is able to maintain the hold time at its maximum value even for small H values. As H becomes larger, both exponential backoff schemes are able to maintain the hold time at its maximum value and thus limit the frequency of the routing table calculations. Under the fixed hold time scheme, the time interval between successive routing table calculations is expected to be the larger of the inter-LSA interval and the hold time.


Simulation Time Line

Time        Event
0-100s      out-of-PoP routers come up and establish adjacency with each other
100-110s    in-PoP routers come up in a certain sequence
100-200s    in-PoP routers establish adjacency with each other and with out-of-PoP routers
200-210s    in-PoP routers go down in a certain sequence
200-300s    out-of-PoP routers break adjacency with dead in-PoP routers

Simulation Parameters

Initial hold time (H):             0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4s
Max hold time:                     16 × H
initReturn time:                   10s
Hello interval:                    10s
minLSInterval:                     5s
OSPF packet processing time:       1ms
Routing table calculation time:    10ms

Table 1. Simulation Details

Note that the initial and the maximum hold time values should satisfy the needs to limit the frequency of routing table calculations for large scale, continuous LSA arrivals and to achieve fast convergence for common topology changes that generate only a few LSAs. For many networks, determining such values may be difficult. In an earlier work [10], we suggested an alternate mechanism to schedule routing table calculations. This mechanism, called LSA Correlation, does not use individual LSAs as the triggers for routing table calculations. Rather, the individual LSAs are correlated to identify the topology change that caused their generation. A routing table calculation may be performed when the topology change itself has been identified. In [10], we demonstrated that the LSA correlation scheme works very well for both individual and large scale topology changes.

4. Simulations to compare the performance of different Hold time based schemes under large scale topology changes

In this section, we report the results of simulations that examine the ability of different hold time schemes to limit the number of routing table calculations when large scale topology changes take place. These simulations were performed using a modified version [9] of the ospfd simulator [14]. The network topology used in these simulations consists of 54 nodes and 164 bidirectional links. This topology is derived from a real ISP backbone topology, obtained from the Rocketfuel website [18], considering only the routers in a single PoP and their out-of-PoP neighbor routers. The PoP consists of 29 routers inter-connected with 104 bidirectional links, with 10 of these routers connected to 25 out-of-PoP neighbors via 35 links. The 25 out-of-PoP routers are connected in a ring using links not present in the original topology. All the routers belong to the same OSPF area. Table 1 shows the time line of events in each simulation. In each simulation, we observe the routing table calculations performed by the out-of-PoP routers in the time windows (100-200) seconds and (200-300) seconds, corresponding to the PoP Up and PoP Down events respectively. Table 1 also shows the values of the different parameters used in these simulations. Here, minLSInterval refers to the minimum time between consecutive generations of an LSA. To obtain good confidence in the results, each simulation was repeated 36 times with 36 different randomly generated time sequences for the in-PoP router up/down events in the time windows (100-110) and (200-210) seconds.
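The randomized event sequences described above can be sketched as follows. This is only an illustrative sketch and not the code of the modified ospfd simulator: the function name `event_times`, the seeding scheme, the use of a uniform distribution within each window, and the assumption that every one of the 29 in-PoP routers receives one up event and one down event per run are all assumptions made here for illustration.

```python
import random

NUM_IN_POP_ROUTERS = 29   # in-PoP routers in the simulated topology
NUM_RUNS = 36             # repetitions per simulation


def event_times(seed, window_start, window_end, num_routers=NUM_IN_POP_ROUTERS):
    """Draw one event time per router inside [window_start, window_end].

    Assumption: times are independent and uniformly distributed; the paper
    only states that the sequences are randomly generated.
    """
    rng = random.Random(seed)
    return sorted(rng.uniform(window_start, window_end) for _ in range(num_routers))


# One pair of sequences per run: router-up times in (100-110) seconds and
# router-down times in (200-210) seconds. Distinct seeds give distinct runs.
runs = [
    {
        "up": event_times(2 * run, 100.0, 110.0),
        "down": event_times(2 * run + 1, 200.0, 210.0),
    }
    for run in range(NUM_RUNS)
]
```

Seeding each run separately keeps the runs reproducible, so that different hold time schemes could be compared against the same event sequences.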

Figure 4 shows the mean number of routing table calculations performed by the out-of-PoP routers under different hold time schemes for the PoP Up and PoP Down events. These mean values were calculated across all the out-of-PoP routers as well as across all 36 simulation runs. The 95% confidence intervals after 36 simulation runs were observed to be always within ±7% of the mean. Note that about 115 and 30 LSAs were generated as the result of the PoP Up and PoP Down events respectively. The simulation results for the different hold time schemes (Figure 4) follow the expected pattern: the number of routing table calculations drops as the (initial) hold time value increases. The exponential backoff scheme with initReturn time is observed to require fewer routing table calculations than the simple exponential backoff scheme for small initial hold time values.
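As a hedged illustration of how a mean and a 95% confidence interval could be obtained from 36 runs, the sketch below uses the Student-t interval. The paper does not state which interval estimator was used, so the choice of the t distribution, the function name `mean_and_ci`, and the sample values in the usage lines are assumptions for illustration only.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 35 degrees of freedom (36 runs).
T_CRIT_95_DF35 = 2.030


def mean_and_ci(samples, t_crit=T_CRIT_95_DF35):
    """Return (mean, half_width) of a confidence interval for the mean.

    The default t_crit is only appropriate for 36 samples.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


# Made-up per-run counts, used only to exercise the function.
example_counts = [20] * 18 + [22] * 18
mean, half_width = mean_and_ci(example_counts)
relative_half_width = half_width / mean   # compare against the 7% bound
```

A result is within ±7% of the mean when `relative_half_width` does not exceed 0.07.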

5. Conclusion

Commercial OSPF routers typically use a hold time to limit the frequency of routing table calculations as multiple LSAs arrive following a topology change. In this paper, we examined the performance of different hold time schemes under continuous LSA arrivals via stochastic models and simulations. Our analysis suggests that the popular exponential backoff scheme to adjust the hold time is susceptible to undesirable resets in the hold time value when there is a brief lull in LSA arrivals, and that a proposed modification to the scheme performs much better than the original for small values of the initial hold time.
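The effect of a lull in LSA arrivals can be illustrated with a deliberately simplified sketch. It is not the stochastic model or the ospfd-based simulator used in this paper, and the rules it encodes are assumptions: a calculation is performed immediately when an LSA arrives and no hold timer is running; LSAs arriving during the hold time trigger one calculation at its expiry, after which the hold time doubles up to the maximum; a hold time that expires with no pending LSA resets to its initial value in the plain exponential scheme; and with `init_return` set, the reset happens only if no LSA arrives for that long after the timer expires. Processing and calculation times are ignored. All times are integer milliseconds.

```python
def count_calculations(arrivals, initial_hold, max_hold, init_return=None):
    """Count routing table calculations for LSA arrival times given in ms."""
    calcs = 0
    hold = initial_hold
    timer_end = None    # expiry time of the running hold timer, None if idle
    pending = False     # True if an LSA arrived while the timer was running
    idle_since = 0      # time at which the last hold timer expired quietly

    def expire_until(now):
        nonlocal calcs, hold, timer_end, pending, idle_since
        while timer_end is not None and timer_end <= now:
            if pending:
                # Deferred calculation at expiry, then back off.
                calcs += 1
                pending = False
                hold = min(2 * hold, max_hold)
                timer_end += hold
            else:
                # Quiet expiry: the plain scheme resets immediately.
                idle_since = timer_end
                timer_end = None
                if init_return is None:
                    hold = initial_hold

    for t in sorted(arrivals):
        expire_until(t)
        if timer_end is None:
            if init_return is not None and t - idle_since >= init_return:
                hold = initial_hold
            calcs += 1              # immediate calculation
            timer_end = t + hold
        else:
            pending = True
    expire_until(float("inf"))      # flush any deferred calculation
    return calcs


# Two bursts of LSAs (one every 50 ms for 1 s) separated by a lull.
arrivals = list(range(0, 1001, 50)) + list(range(5000, 6001, 50))
fixed = count_calculations(arrivals, 100, 100)
exponential = count_calculations(arrivals, 100, 1600)
with_init_return = count_calculations(arrivals, 100, 1600, init_return=10000)
```

For this arrival pattern the sketch counts 24, 10 and 7 calculations for the fixed, exponential and initReturn variants respectively: after the lull, the plain exponential scheme restarts from the 100 ms hold time, whereas the variant with the 10 s return time keeps the hold time at its maximum.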

References

[1] C. Alaettinoglu, V. Jacobson, and H. Yu. Towards millisecond IGP convergence. In NANOG 20, October 2000.

[2] A. Basu and J. Riecke. Stability issues in OSPF routing. Computer Commun. Rev., 31(4), October 2001.

[3] G. Choudhury. Prioritized treatment of specific OSPF version 2 packets and congestion avoidance. Request For Comments (Best Current Practice) RFC 4222, Internet Engineering Task Force, October 2005.


Figure 4. Simulation results: number of routing table calculations versus (initial) hold time (in ms) for the Fixed, Exponential, and Exp w InitReturn schemes. (a) PoP Up Simulations. (b) PoP Down Simulations.

[4] Cisco. OSPF incremental SPF. http://www.cisco.com/en/US/products/ps6350/products configuration guidechapter09186a00804556a5.html.

[5] Cisco. OSPF shortest path first throttling. http://www.cisco.com/en/US/products/sw/iosswrel/ps1838/products featureguide09186a0080134ad8.html.

[6] E. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.

[7] P. Francois, C. Filsfils, J. Evans, and O. Bonaventure. Achieving sub-second IGP convergence in large IP networks. Computer Commun. Rev., 35(3), July 2005.

[8] R. Fricks, M. Telek, A. Puliafito, and K. Trivedi. Markov renewal theory applied to performability evaluation. In K. Bagchi and G. Zobrist, editors, State-of-the Art in Performance Modeling and Simulation. Modeling and Simulation of Advanced Computer Systems: Applications and Systems, pages 193–236. Gordon and Breach Publishers, Newark, NJ, 1998.

[9] M. Goyal, S. Bhaaradwaj, S. Venkatesh, and M. Soperi. A distributed OSPFD simulator. http://www.cs.uwm.edu/~mukul/newospfd.html.

[10] M. Goyal, W. Xie, M. Soperi, H. Hosseini, and K. Vairavan. Scheduling routing table calculations to achieve fast convergence in OSPF protocol. In Proc. IEEE Broadnets, 2007.

[11] D. Katz and D. Ward. Bidirectional forwarding detection. INTERNET-DRAFT, draft-ietf-bfd-base-05.txt, June 2006. (work in progress).

[12] V. Kulkarni. Modeling and Analysis of Stochastic Systems. Chapman-Hall, 1995.

[13] A. Markopoulou, G. Iannaccone, S. Bhattacharaya, C. Chuah, and C. Diot. Characterization of failures in an IP backbone. In Proc. INFOCOM’2004, 2004.

[14] J. Moy. OSPFD routing software resources. http://www.ospf.org.

[15] J. Moy. OSPF version 2. Request For Comments (Standards Track) RFC 2328, Internet Engineering Task Force, April 1998.

[16] R. Ogier and P. Spagnolo. MANET extension of OSPF using CDS flooding. INTERNET-DRAFT, draft-ogier-manet-ospf-extension-08.txt, October 2006. (work in progress).

[17] D. Oran. OSI IS-IS intra-domain routing protocol. Request For Comments RFC 1142, Internet Engineering Task Force, February 1990.

[18] Rocketfuel. Backbone topologies annotated with inferred weights and link latencies. http://www.cs.washington.edu/research/networking/rocketfuel/maps/weights-dist.tar.gz.

[19] A. Shaikh and A. Greenberg. Experience in black-box OSPF measurement. In Proc. 1st ACM SIGCOMM Workshop on Internet Measurement, pages 113–125, 2001.
