LASH-TOR: A Generic Transition-Oriented Routing Algorithm


LASH-TOR: A Generic Transition-Oriented Routing Algorithm ∗

Tor Skeie, Olav Lysne
Simula Research Laboratory
P.O. Box 134, N-1325 Lysaker, Norway
E-mail: {tor.skeie,olav.lysne}@simula.no

J. Flich, P. López, A. Robles, J. Duato
Dept. of Computer Engineering
Universidad Politécnica de Valencia
Camino de Vera, 14, 46071–Valencia, Spain
E-mail: {jflich,plopez,arobles,jduato}@gap.upv.es

Abstract

Cluster networks are seen as the future access networks for multimedia streaming, E-commerce, network storage, etc. For these applications, performance and high availability are particularly crucial. Regular topologies are preferred when performance is the primary concern. However, due to spatial constraints or fault-related issues, the network structure may become irregular, which makes it more difficult to find deadlock-free minimal paths.

In recent years, several solutions have been proposed. One of them is LASH routing, which enables minimal routing by assigning paths to different virtual layers. In this paper, we propose an extension of LASH that reduces the number of required virtual layers by allowing transitions between virtual layers.

Evaluation results show that the new routing scheme (LASH-TOR) is able to obtain full minimal routing with a reduced number of virtual channels. For torus and mesh networks, with only two virtual channels, LASH throughput is increased by an average factor of 3.30 for large networks. For regular networks with some unconnected (faulty) links, equal performance improvements are achieved. Even for highly irregular networks of up to 128 switches, the new routing scheme needs only three virtual channels to guarantee minimal routing. Besides, LASH-TOR performs well compared to Dimension Order Routing for mesh and torus networks.

1. Introduction

Networks of workstations (NOWs) or clusters of PCs are being considered as a cost-effective alternative to small and medium scale parallel computing. The performance of NOWs is closely related to the advances in the interconnection network. NOWs are usually arranged as switch-based networks whose topology is defined by the customer. The layout of the network can be designed by using regular or irregular topologies. Regular topologies are often used when performance is the primary concern [10]. Preferred topologies are 2D/3D mesh/torus networks. For these topologies, several routing algorithms have been proposed. Among deterministic routing algorithms, dimension-ordered routing (DOR) is the most commonly used.

∗ This work was supported by the Spanish MCYT under Grant TIC2003-08154-C06-01 and by Junta de Comunidades de Castilla-La Mancha under Grant PBC-02-008.

As the number of nodes in NOWs increases, the probability of a fault becomes higher. Also, as processors, switches, and links often work close to their technology limits, the probability of a fault increases further. For some environments, like high-performance computation and web servers, it is critical to keep the system running even in the presence of faults.

In the presence of switch or link failures, a regular network becomes an irregular one. The irregularity of the topology makes routing quite complicated. For instance, dimension-ordered routing is not able to route packets in the presence of a single fault in mesh and torus networks. In order to tolerate faults, some fault-tolerant routing algorithms have been proposed. However, most of them rely on specific logic (i.e., adaptivity) not present in current interconnects like InfiniBand™ [6], or rely on defining fault regions, thus disabling some nodes that are not affected by the fault [1].

An alternative way to tolerate faults in regular networks is the use of generic routing algorithms. Generic routing algorithms are those that can be used for any topology. Up*/down* [13] is the most popular generic routing scheme used in the NOW environment. Routing is based on an assignment of direction labels ("up" or "down") to the links in the network by building a BFS spanning tree. Some improvements of this routing scheme were recently proposed [11, 8, 5]. Other generic routing schemes have been proposed for NOWs, such as adaptive-trail [9], minimal adaptive [14], and smart-routing [2]. In general, despite the performance improvements achieved by these routing strategies, they cannot be applied to every network technology. This is because they either require extra functionality in the switches that is not present in all technologies (i.e., adaptive routing for adaptive-trail and minimal adaptive) or have a very high computational cost (smart-routing).

Moreover, most of the routing strategies referred to above have a serious drawback: they do not guarantee that all packets will be routed through minimal (shortest) paths. Preventing some packets from following minimal paths is likely to increase packet latency, especially when packets are short. Moreover, routing through non-minimal paths could lead to an inefficient use of network resources, affecting overall network performance, an effect that could become increasingly significant as network size increases. Therefore, routing through minimal paths becomes a key issue, especially in networks with irregular topologies. Another serious drawback is the network imbalance caused by some of them. In particular, up*/down* routing forces a high percentage of traffic to cross the root node. This imbalance causes the system to congest rapidly at low traffic rates.

In recent years, several approaches have been proposed to solve these problems. In [4], minimal routing is provided for all packets. To avoid deadlocks, packets are ejected from the network, temporarily stored at some intermediate hosts, and later re-injected into the network. However, this mechanism requires some support at the NICs and at least one host attached to every switch in the network, which cannot always be guaranteed. Other approaches make use of virtual channels to route all packets through minimal paths while still guaranteeing deadlock freedom. In this sense, Layered Shortest Path (LASH) routing [15] provides deterministic shortest path routing in irregular networks. Deadlock freedom is achieved by dividing the physical network into a set of virtual networks using separate virtual channels. Minimal paths between every pair of hosts are spread over these layers such that each layer becomes deadlock free.

In [12], another approach that uses virtual networks is taken. Unlike LASH routing, a baseline routing algorithm (up*/down* routing) is used to decide when to change to a new virtual network, namely when a forbidden transition appears. In order to remove cyclic channel dependencies, virtual networks are crossed in increasing order, thus guaranteeing deadlock freedom. This strategy is based on detecting forbidden "down" → "up" transitions, thus identifying the switches in which transitions between virtual networks must be carried out. However, due to the large number of routing restrictions imposed by up*/down*, providing minimal paths between every pair of hosts may require a high number of virtual networks.

2. Motivation

Using a relatively large number of virtual channels to provide minimal routing may not be possible in some network technologies, such as InfiniBand™, in which virtual channels are mainly intended for other purposes (e.g., quality of service and traffic prioritization). Therefore, reducing as much as possible the number of virtual channels required to provide minimal routing must be considered one of the primary concerns. Furthermore, if minimal routing cannot be fully achieved with a bounded number of network resources, it should at least be possible to guarantee an acceptable performance level.

In this paper we take advantage of the experience acquired by the authors in the development of both LASH routing [15] and the strategy proposed in [12]. In particular, we propose an extension of the LASH routing methodology that adds the transition mechanism between virtual networks proposed in [12]. The resulting routing methodology, which we will refer to as LASH-Transition Oriented Routing (LASH-TOR), is able to provide minimal routing using fewer network resources than those required by LASH routing. Furthermore, unlike conventional LASH, LASH-TOR obtains a good performance level when the number of available resources is bounded.

We also apply the proposed routing algorithm to different network topologies. Although LASH-TOR is a generic routing algorithm that can be applied to any topology (regular or irregular), we will apply it to 2D mesh and 2D torus networks of different sizes. In fact, there are very few results in the literature comparing dedicated routing functions (like DOR) with generic routing algorithms. Regular networks with some unconnected links (modeling faults) and totally irregular networks will also be evaluated.

The rest of the paper is organized as follows. In Section 3 the LASH-TOR routing strategy is presented. Section 4 analyzes the number of layers required to provide minimal routing for different networks. In Section 5, the performance of the new routing scheme is evaluated and compared with the LASH approach. Finally, some conclusions are drawn in Section 6.

3. LASH-TOR Routing

LASH-TOR is a further development of the LASH routing methodology [15]. The new version extends LASH with mechanisms described by Sancho et al. [12]. We will refer to the latter routing method as Up*/Down*-based transition-oriented routing throughout this article.

In LASH, each physical channel can be multiplexed into up to $v$ virtual channels. We can see the network as composed of multiple ($v$) virtual networks, or layers, with the same topology. A given virtual channel belongs to only one layer. The idea behind LASH is that each virtual layer $l_i$ in the network has a set of <source, destination> pairs assigned to it, in such a way that a given <source, destination> pair is assigned to exactly one virtual layer. All the paths assigned to a layer are minimal and deadlock free. To achieve this, a given <source, destination> pair is assigned to the first existing layer in which it does not introduce cycles in the channel dependence graph, or to a new one if required. In order to accommodate minimal paths among all <source, destination> pairs, several layers are needed.
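The first-fit layer assignment described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`acyclic`, `path_dependencies`, `lash_assign`), the representation of a channel as a `(from_switch, to_switch)` tuple, and the four-switch ring example are all assumptions made for illustration.

```python
def acyclic(deps):
    """Kahn's algorithm: True if the (channel, channel) edges contain no cycle."""
    succ, indeg = {}, {}
    for a, b in deps:
        succ.setdefault(a, []).append(b)
        indeg[b] = indeg.get(b, 0) + 1
        indeg.setdefault(a, 0)
    ready = [c for c, n in indeg.items() if n == 0]
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for nxt in succ.get(c, ()):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return removed == len(indeg)


def path_dependencies(path):
    """Dependencies between consecutive channels of one switch-level path."""
    channels = list(zip(path, path[1:]))
    return set(zip(channels, channels[1:]))


def lash_assign(paths):
    """Place each path in the first layer where it closes no dependency cycle."""
    layers = []        # one set of channel dependencies per virtual layer
    assignment = []    # assignment[k] = layer index chosen for paths[k]
    for path in paths:
        deps = path_dependencies(path)
        for index, layer in enumerate(layers):
            if acyclic(layer | deps):
                layer |= deps
                assignment.append(index)
                break
        else:
            layers.append(deps)   # no existing layer fits: open a new one
            assignment.append(len(layers) - 1)
    return assignment


# Hypothetical example: four clockwise two-hop paths on a ring of 4 switches.
ring_paths = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
print(lash_assign(ring_paths))  # [0, 0, 0, 1]
```

In this example the fourth path would close the clockwise dependency cycle in the first layer, so it is the only one placed in a second layer.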

The Up*/Down*-based transition-oriented method, on the other hand, approaches the problem of minimal routing through the use of up*/down* trees [13], which allow minimal routing only subject to some constraints (i.e., a minimal path can only be deployed as long as it does not make a down→up turn). To overcome this problem, the Up*/Down*-based method allows the path between a given <source, destination> pair to be split over two or more layers. In particular, when a minimal path between a <source, destination> pair makes a down→up turn, a transition from a virtual layer to a higher numbered virtual layer is performed. Since these down→up transitions between the layers are carried out in strictly increasing order, it follows that the method is deadlock free.
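As a rough illustration of the down→up rule, the following Python sketch labels each hop of a path with a layer index and moves to the next layer whenever a "down" hop is followed by an "up" hop. The names `bfs_levels`, `is_up`, and `layers_for_path`, the tie-breaking by switch identifier, and the ring example are assumptions for illustration and are not taken from [12] or [13].

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every switch from the root of the BFS spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u, v, level):
    """Channel u->v is 'up' if it heads toward the root (ties broken by id)."""
    return (level[v], v) < (level[u], u)


def layers_for_path(path, level):
    """Layer index of each hop: advance one layer at every down->up turn."""
    layer, hops = 0, []
    prev_up = True
    for u, v in zip(path, path[1:]):
        up = is_up(u, v, level)
        if up and not prev_up:
            layer += 1          # forbidden turn: continue in a higher layer
        hops.append(layer)
        prev_up = up
    return hops


# Hypothetical ring of 4 switches rooted at switch 0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
level = bfs_levels(adj, 0)
print(layers_for_path([1, 2, 3], level))  # [0, 1]
print(layers_for_path([1, 0, 3], level))  # [0, 0]
```

The path 1→2→3 goes down and then up, so its second hop lands in the next layer, whereas 1→0→3 is a legal up-then-down path and stays in one layer.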

LASH-TOR relies on the use of virtual layers in the network. As in LASH, given a <source, destination> pair, a minimal path according to any routing strategy can be selected. However, as in transition-oriented routing, a path can be split over several virtual layers (in LASH, by contrast, a given path is located in a single layer). In particular, given a <source, destination> pair, LASH-TOR tries to find a minimal path that does not introduce cycles in the channel dependence graph. To avoid these cycles, some paths must traverse several virtual layers. The algorithm chooses the path that minimizes the number of required layers. After all <source, destination> pairs have been considered, the number of layers needed to accommodate all paths is obtained.

Some networks may not provide the required number of virtual layers (for instance, InfiniBand™ provides up to 15 virtual lanes, and some of them can be reserved for other purposes). In the case of a bounded number of layers, LASH-TOR will reserve one of them to apply a deadlock-free routing scheme that does not guarantee minimal paths, such as up*/down*. This layer will contain the paths between <source, destination> pairs that require more virtual layers than those available in the network for routing.

Once the paths have been spread over the virtual layers, LASH-TOR will try to balance virtual layer utilization by moving some subpaths from the most heavily used virtual layers to the ones that contain fewer subpaths, while ensuring that cyclic channel dependencies do not arise.

As we shall see in the following sections, this new routing strategy outperforms conventional LASH in terms of resource utilization efficiency and performance.

3.1. Detailed Description of LASH-TOR Routing

Definition 1 An interconnection network $I$ is represented by a strongly connected directed graph, $I = G(N, C)$. The vertices of $I$ are the set of routers $N$, whereas the edges are the set of communication channels $C$. Each channel is unidirectional and transmits data from a source node to a destination node. A physical channel $c_i \in C$ may be split into up to $V$ virtual channels $c_{i_1}, c_{i_2}, \ldots, c_{i_V}$. Let $C'$ be the set of virtual channels in the network.

Definition 2 A deterministic routing function $R : N \times C' \times N \to C'$ returns the virtual output channel to be taken from node $n_i$ for packets entering across virtual input channel $c_{i_j}$ and whose destination is $n_d$.

Notice that this definition takes into account the input channel on which a packet arrives. This is not possible in all technologies. For example, the routing tables in InfiniBand™ force the switches to make routing decisions based on the destination address only. However, by means of the technique proposed in [7], this kind of routing algorithm can be implemented in InfiniBand™.

Definition 3 For a network $I$ and routing function $R$ there exists a dependency from channel $c_i$ to $c_j$ iff $c_j = R(\mathrm{dst}(c_i), c_i, n)$ for some node $n$. That is, packets destined for $n$ may use $c_j$ immediately after $c_i$.

We will use variants of the following theorem due to Dally and Seitz [3] in order to prove that the routing functions we define in this paper are deadlock free.

Theorem 1 A deterministic routing function $R$ for an interconnection network $I$ is deadlock-free iff its channel dependency graph is acyclic.

Definition 4 A network layer $L_i$ of network $I$ is a subset of virtual channels in $I$ such that each link has exactly two channels in $L_i$, one in each direction.

Definition 5 A set $L$ of network layers $\{L_1, \ldots, L_n\}$ is a layering of $I$ iff

• $L_i$ is a layer of $I$ for all $i$,
• $L_i$ and $L_j$ are disjoint for all distinct $i$ and $j$,
• all channels in $I$ are elements of $L_i$ for exactly one $i$.

The above two definitions allow us to view any layer of a network as a bidirectional virtual network that is isomorphic to the original physical network.

Definition 6 Assume a layering $L$ of network $I$. A traffic assignment function $T$ of $L$ takes a <source, destination> pair of compute nodes as input and returns the subsets $L' \subseteq L$ and $S \subseteq N$, where $L'$ represents the set of layers that shall be used to forward packets from the source to the destination node and $S$ the set of switches where a transition from one virtual layer to the next one shall be performed. Notice that $|S| = |L'| - 1$.

$T(s, d)$ is the strictly ordered set of layers $L' = \{L_1, L_2, \ldots, L_{|L'|}\}$ that contains the path from source $s$ to destination $d$. Transitions between layers are performed in the switches given by the set $S = \{s_1, s_2, \ldots, s_{|S|}\}$, where $s_i$ represents the switch where the transition from layer $L_i$ to layer $L_{i+1}$ is performed, for all $i$.

Specifically, the traffic originates in layer $L_1$ and stays within this layer until it reaches switch $s_1$, where a transition to layer $L_2$ is made, and so on. Finally, at switch $s_{|S|}$, a transition to layer $L_{|L'|}$ takes place, and the traffic is then forwarded within this layer towards the destination node. In other words, the path between a source $s$ and destination $d$ is split into $|L'|$ sub-paths, each of them assigned to a virtual layer, as given by $L'$. However, $L'$ may also be a singleton (and $S = \emptyset$), meaning that the traffic is assigned to only one virtual layer ($L_1$).
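To make the role of the traffic assignment function concrete, the following Python sketch expands a physical path together with its ordered layers and transition switches into a sequence of (channel, layer) hops. The function name `virtual_route` and the list-based encoding of the ordered layers and switches are assumptions for illustration; the sketch also assumes that the switches in the transition list appear on the path in order.

```python
def virtual_route(path, layers, switches):
    """Return [((u, v), layer), ...] for a path split at the transition switches."""
    assert len(switches) == len(layers) - 1   # |S| = |L'| - 1
    route, idx = [], 0
    for u, v in zip(path, path[1:]):
        # The transition happens inside switch u, before the hop u->v is taken.
        if idx < len(switches) and u == switches[idx]:
            idx += 1
        route.append(((u, v), layers[idx]))
    return route


# Hypothetical path 0 -> 1 -> 2 -> 3 with one transition at switch 2.
print(virtual_route([0, 1, 2, 3], ["L1", "L2"], [2]))
# [((0, 1), 'L1'), ((1, 2), 'L1'), ((2, 3), 'L2')]
```

With a single layer and an empty switch list, every hop of the path is reported in that one layer, which corresponds to the singleton case.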

Next, we formally describe the steps followed by the LASH-TOR routing algorithm for assigning <source, destination> pairs to virtual layers. We assume that a network $I$ and a layering $L$ of that network are given.

Step 1: Let $T(s, d)$ = undefined for all <source, destination> pairs $s, d$. Let also the routing function $R$ be empty.

Step 2: Obtain the physical routes by finding all the shortest paths $sp$ (if several exist) between every <source, destination> pair within the network $I$.

Step 3: Take one <source, destination> pair that has not been processed yet and do:

Step 3.1: For all the shortest paths $sp_i$ available to this <source, destination> pair, find $\{L', S\}_{sp_i}$ (determine $T(s, d)$ with respect to $sp_i$), such that enriching $R$ with $sp_i$, which is split over the virtual layers $L'$ according to the transition switches $s_j \in S$, will not close a cycle of dependencies within the respective virtual layers $L_i \in L'$. $\{L', S\}_{sp_i}$ may be undefined due to a lack of virtual layers (i.e., $|L'| > |L|$ would be required).

Step 3.2: Among all the shortest paths $sp_i$ available to this <source, destination> pair, select the $sp_i$ having $\{L', S\}_{sp_i}$ defined that has the least need for virtual layers (i.e., $|L'|$ as small as possible). If one exists, let $T(s, d) = \{L', S\}_{sp_i}$; otherwise leave $T(s, d)$ unchanged.

Step 3.3: If $T(s, d)$ is defined, enrich $R$ accordingly.

Step 4: If there are more <source, destination> pairs that have not been processed, go to Step 3.

Lemma 1 If the above algorithm results in a traffic assignment function in which $T(s, d) \neq$ undefined for all <source, destination> pairs, then the network $I$ is free from deadlocks with respect to $R$ and $T$. Furthermore, all packets are routed along minimal paths.

Proof: Minimal routing follows immediately from Step 2. By observation of Step 3.1 we see that no cycle in the dependency graph associated with each virtual layer will ever be formed. Neither will any cyclic dependencies between the layers be formed, since the transitions are made in increasing order. Deadlock freedom thus follows from Theorem 1. □
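Steps 1 to 4 can be sketched in Python as follows. This is only one possible reading: the text does not prescribe how $\{L', S\}$ is found for a given shortest path, so the greedy rule used here (always start in the first layer, stay in the current layer while no cycle closes, otherwise move to the next layer) is an assumption, as are all function names and the ring example. Cross-layer dependencies are not checked because transitions only go to higher layers.

```python
from collections import deque


def acyclic(deps):
    """Kahn's algorithm: True if the (channel, channel) edges contain no cycle."""
    succ, indeg = {}, {}
    for a, b in deps:
        succ.setdefault(a, []).append(b)
        indeg[b] = indeg.get(b, 0) + 1
        indeg.setdefault(a, 0)
    ready = [c for c, n in indeg.items() if n == 0]
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for nxt in succ.get(c, ()):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return removed == len(indeg)


def all_shortest_paths(adj, s, d):
    """Step 2: every shortest switch-level path from s to d (BFS distances)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    def back(v):
        if v == s:
            yield [s]
            return
        for u in adj[v]:
            if dist.get(u) == dist[v] - 1:
                for p in back(u):
                    yield p + [v]

    return list(back(d)) if d in dist else []


def try_split(path, layer_deps, n_layers):
    """Step 3.1 (greedy assumption): layer per hop, or None if layers run out."""
    channels = list(zip(path, path[1:]))
    layer, hop_layers = 0, [0]
    new_deps = [set() for _ in range(n_layers)]
    for prev, cur in zip(channels, channels[1:]):
        dep = (prev, cur)
        if acyclic(layer_deps[layer] | new_deps[layer] | {dep}):
            new_deps[layer].add(dep)
        else:
            layer += 1                    # transition at switch cur[0]
            if layer == n_layers:
                return None               # undefined: |L'| > |L|
        hop_layers.append(layer)
    return hop_layers, new_deps


def lash_tor(adj, n_layers):
    """Steps 1-4: T[s, d] = (path, layer per hop), or None if undefined."""
    layer_deps = [set() for _ in range(n_layers)]
    T = {}
    for s in adj:
        for d in adj:
            if s == d:
                continue
            best = None
            for p in all_shortest_paths(adj, s, d):
                res = try_split(p, layer_deps, n_layers)
                # Step 3.2: keep the path that needs the fewest layers.
                if res and (best is None or res[0][-1] < best[1][0][-1]):
                    best = (p, res)
            if best is None:
                T[s, d] = None
                continue
            path, (hop_layers, new_deps) = best
            for i in range(n_layers):     # Step 3.3: commit the dependencies
                layer_deps[i] |= new_deps[i]
            T[s, d] = (path, hop_layers)
    return T


# Hypothetical ring of 4 switches; links are assumed bidirectional.
ring = {0: [1, 3], 1: [2, 0], 2: [3, 1], 3: [0, 2]}
T = lash_tor(ring, 2)
print(T[3, 1])  # ([3, 0, 1], [0, 0])
```

In this run, the pair (3, 1) has two shortest paths; the one through switch 2 would need a transition to the second layer, so the sketch selects the one through switch 0, which fits in a single layer.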

If the network contains fewer layers than needed to provide shortest path routing, the algorithm fails by leaving the traffic assignment function undefined for some <source, destination> pairs. However, the method can easily be adapted in such a way that these pairs are assigned to a reserved separate layer, where they are routed not according to their shortest paths but according to an up*/down* routing scheme. Let us refer to the separate layer as the $ud$ layer and, furthermore, assume that $L_n = ud$.

For performance reasons, we propose an additional step intended to reduce the number of undefined pairs by also allowing their paths to make transitions to the $ud$ layer. By letting the paths for undefined <source, destination> pairs make a transition to the $ud$ layer, some of them may now achieve the defined status (i.e., the shortest path can be deployed). However, note that this step can only take place after the up*/down* paths for the undefined <source, destination> pairs have been defined (in order to guarantee that in the worst case we will have available paths for them). Otherwise, a subsequent assignment of the up*/down* paths could form cyclic dependencies with some of the shortest paths already assigned to the $ud$ layer.

Also for performance reasons, we propose to extend LASH-TOR with an additional step that balances the virtual channels. As the algorithm is formulated above, $L_1$ will be the most filled virtual layer, $L_2$ the second most filled one, etc., and $L_n$ may contain only very few subpaths. This could result in poor utilization of the network resources. The typical situation is that most of the paths assigned to virtual layer $L_1$ are complete paths, i.e., for a significant number of <source, destination> pairs the traffic assignment function $T(s, d)$ returns a singleton layer set ($|L'| = 1$). These paths can be reassigned to other, less filled virtual layers.

Referring to this discussion, we refine the LASH-TOR algorithm below with complementary steps for handling a restricted number of available virtual layers, as well as a step for virtual layer balancing.

Step 5: For the <source, destination> pairs having $T(s, d)$ undefined, determine the up*/down* paths and update the channel dependency graph of virtual layer $ud$ accordingly.

Step 6: For the <source, destination> pairs having $T(s, d)$ undefined, verify whether $T(s, d)$ with respect to the same shortest path considered in Step 3.1 ($sp_i$), $\{L', S\}_{sp_i}$, can be defined by also allowing for a transition to virtual layer $ud$ (as $ud$ is the last layer, $L' = L$). If this is the case, let $T(s, d) = \{L, S\}_{sp_i}$; otherwise set $T(s, d) = \{\{ud\}, \emptyset\}$ and update $R$ accordingly.

Step 7: Select a random <source, destination> pair whose path is entirely assigned to $L_1$, and reassign it to $L_j$ ($1 < j \le |L|$) if it does not close a cycle in layer $L_j$. Repeat this process until the number of (sub)paths is evenly balanced among the virtual layers or no more <source, destination> pairs are candidates for a move. Update $R$ according to the reassignments.

The algorithm shown above has a complexity of $n^2$, where $n$ is the number of switches in the network. Our tests show that this does not become a problem until the networks are quite large (256 switches or more).
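The balancing idea of Step 7 can be sketched in Python under simplifying assumptions that are not in the text: only the least loaded layer is tried as the target of each move, "evenly balanced" is taken to mean that the first layer holds at most one path more than the least loaded layer, and dependencies left behind in the first layer are not removed (which is conservative but safe). The names `balance`, `path_deps`, and `load` are hypothetical.

```python
import random


def acyclic(deps):
    """Kahn's algorithm: True if the (channel, channel) edges contain no cycle."""
    succ, indeg = {}, {}
    for a, b in deps:
        succ.setdefault(a, []).append(b)
        indeg[b] = indeg.get(b, 0) + 1
        indeg.setdefault(a, 0)
    ready = [c for c, n in indeg.items() if n == 0]
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for nxt in succ.get(c, ()):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return removed == len(indeg)


def path_deps(path):
    """Dependencies between consecutive channels of one switch-level path."""
    channels = list(zip(path, path[1:]))
    return set(zip(channels, channels[1:]))


def balance(first_layer_paths, layer_deps, load, seed=0):
    """Move whole paths out of layer 0 while that keeps the target acyclic.

    load[i] is the number of (sub)paths currently in layer i; both load and
    layer_deps are updated in place. Returns {path: new layer index}.
    """
    if len(load) < 2:
        return {}
    candidates = list(first_layer_paths)
    random.Random(seed).shuffle(candidates)      # random selection order
    moved = {}
    for path in candidates:
        target = min(range(1, len(load)), key=load.__getitem__)
        if load[0] <= load[target] + 1:
            break                                # evenly balanced (assumption)
        deps = path_deps(path)
        if acyclic(layer_deps[target] | deps):
            layer_deps[target] |= deps
            load[0] -= 1
            load[target] += 1
            moved[tuple(path)] = target
    return moved


# Hypothetical example: four complete paths in the first of two layers.
paths = [[0, 1, 2], [1, 2, 3], [2, 1, 0], [3, 2, 1]]
load = [4, 0]
layer_deps = [set().union(*map(path_deps, paths)), set()]
moved = balance(paths, layer_deps, load)
print(load)  # [2, 2]
```

Which two paths are moved depends on the random order, but in this example the final load is two paths per layer regardless of the seed, since no combination of these paths closes a cycle.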

4. Number of Virtual Layers Required

An important issue in the evaluation of LASH and LASH-TOR routing is the number of virtual layers that are needed in order to provide minimal routing between every source/destination pair. The required number of layers depends on both network size and connectivity. In this section we analyze the relation between these two network parameters and the need for virtual layers for LASH-TOR compared to conventional LASH. We will also present results for the case of a bounded number of virtual layers, showing the number of <source, destination> pairs that could not be granted shortest paths, which is another aspect of the methods' resource utilization efficiency.

[Figure 1 plots: (a) LASH-TOR applied on networks of 64 switches; (b) LASH applied on networks of 64 switches; (c) LASH-TOR and LASH applied on networks of 256 switches. Horizontal axis: number of connecting links. Vertical axis: number of virtual lanes needed. Curves: average, minimum, and maximum.]

Figure 1. Virtual layers needed to guarantee shortest path routing. Network size is (a, b) 64 and (c) 256 switches. For each network size and link connectivity, 100 random topologies were generated.

Let us first turn our attention to the number of virtual layers required to guarantee minimal routing. Some answers are easily derived. If we have minimal connectivity, so that the network has the shape of a tree, one virtual layer suffices because no cycle of dependencies can be closed as long as all paths are minimal. Furthermore, networks with maximal connectivity will also need only one layer, as no packet will traverse more than one link, and thus no channel dependencies will exist.
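The two extreme cases can be checked on small examples with the following Python sketch. It builds the channel dependencies induced by one BFS shortest path per <source, destination> pair and tests them for cycles; the function names and the three example topologies (a five-switch tree, a fully connected network of four switches, and a ring of five switches as a contrasting intermediate case) are assumptions chosen for illustration.

```python
from collections import deque
from itertools import permutations


def acyclic(deps):
    """Kahn's algorithm: True if the (channel, channel) edges contain no cycle."""
    succ, indeg = {}, {}
    for a, b in deps:
        succ.setdefault(a, []).append(b)
        indeg[b] = indeg.get(b, 0) + 1
        indeg.setdefault(a, 0)
    ready = [c for c, n in indeg.items() if n == 0]
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for nxt in succ.get(c, ()):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return removed == len(indeg)


def shortest_path(adj, s, d):
    """One shortest path from s to d (BFS); the network is assumed connected."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path = [d]
    while path[-1] != s:
        path.append(parent[path[-1]])
    return path[::-1]


def dependency_graph(adj):
    """Channel dependencies when every pair uses one shortest path, one layer."""
    deps = set()
    for s, d in permutations(adj, 2):
        path = shortest_path(adj, s, d)
        channels = list(zip(path, path[1:]))
        deps |= set(zip(channels, channels[1:]))
    return deps


tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
full = {i: [j for j in range(4) if j != i] for i in range(4)}
ring5 = {i: [(i + 1) % 5, (i - 1) % 5] for i in range(5)}

print(acyclic(dependency_graph(tree)))   # True: one layer suffices
print(len(dependency_graph(full)))       # 0: single-hop paths, no dependencies
print(acyclic(dependency_graph(ring5)))  # False: one layer is not enough
```

In the five-switch ring every two-hop shortest path is unique, and together they close a dependency cycle in each direction, which is why networks between the two extremes may need more than one layer.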

However, most of the network topologies that are used in practice will be somewhere in between these two extremes. Thus, we have conducted a series of experiments. We have considered four network sizes (32 to 256 switches). For each network size, we have considered a range of connectivities from minimal towards a high level of connectivity, in steps of one or two added links at a time. For each network size and connectivity, we have generated and analyzed 100 random topologies. Figures 1.a, 1.b, and 1.c display the average, the maximum, and the minimum number of virtual layers needed for 64- and 256-switch networks. The results show that LASH-TOR becomes much more efficient than LASH as network size increases. For 32-switch networks (not shown), the difference is less noticeable: both LASH-TOR and LASH require at most 3 virtual layers, but LASH-TOR needs this number of virtual layers only in very few cases, resulting in an average number of virtual layers close to 2. However, when the network size increases to 128 switches (not shown), LASH-TOR still does not require more than 3 virtual layers in any of the tested cases, compared to the 6 layers required by LASH. Moreover, for large networks of 256 switches, LASH-TOR requires only 4 virtual layers versus the 9 needed by LASH.

Another surprising outcome is that the variance in the needed number of layers is very small. Even with 100 random topologies tested for each network configuration, the difference between the most demanding and the least demanding topology was never more than 1 layer (for LASH, we observed a difference of 2 in some cases).

Let us now focus on the situation where the available number of virtual layers is limited. As a result, a number of <source, destination> pairs may not be granted shortest paths (refer to Steps 3.1 and 3.2 of the algorithm proposed in Section 3). These "undefined" pairs may have to be routed non-minimally according to an up*/down* routing scheme (see Steps 5 and 6 of the algorithm proposed in Section 3). In order to evaluate this issue, we have also empirically measured the number of non-minimal <source, destination> pairs when 2, 3, and 4 virtual layers are available. For these experiments, we have considered both irregular and regular networks. Let us first turn our attention to irregular ones, where we have used network sizes of 32, 64, 128, and 256 switches ($S$), and where connectivities of $1.5S$, $2S$, and $2.5S$ links have been examined. Note that it is apparent from our previous plots (see Figure 1) that, for any of the considered network sizes, the maximum number of virtual layers was needed when the number of links in the topology was about twice the number of switches ($1.5S$ to $2.5S$). Since these connectivities are very likely choices for many different application areas, we used them throughout these experiments; they will also be used in the performance evaluation following this section.

The results are presented in Table 1. We can see from the results that LASH-TOR is constrained only when just two virtual layers are available, regardless of network size and connectivity. For 32-switch networks, less than 0.1% of the <source, destination> pairs will be routed non-minimally. Even when the network size is increased up to 128 switches, less than 6% of the <source, destination> pairs will follow non-minimal up*/down* paths. On the other hand, for LASH, a significant number of the <source, destination> pairs (22%-25%) will be defined as non-minimal for networks of 32 switches when 2 virtual layers are available. For 128-switch networks, this number increases to about 50%. Even when 4 virtual layers are available, LASH cannot guarantee shortest path routing for 64- and 128-switch networks. When the network size is increased further, to 256 switches, LASH-TOR is still able to route minimally with 3 virtual layers for the connectivities $2S$ and $2.5S$ (only in a few cases of lower connectivity does LASH-TOR need 4 virtual layers, as shown in Figure 1.c). This expresses the strength of the transition-oriented approach.

Irregular networks (#links given as a proportion of the number of switches S; VL = number of virtual layers available):

| S | #links | LASH-TOR 2VL | LASH-TOR 3VL | LASH 2VL | LASH 3VL | LASH 4VL |
|---|---|---|---|---|---|---|
| 32 | 1.5S | 0.08 | 0 | 25.10 | 1.45 | 0 |
| 32 | 2S | 0.01 | 0 | 25.08 | 0.97 | 0 |
| 32 | 2.5S | 0.002 | 0 | 22.80 | 0.08 | 0 |
| 64 | 1.5S | 1.30 | 0 | 37.66 | 9.56 | 0.93 |
| 64 | 2S | 0.77 | 0 | 38.90 | 10.83 | 0.64 |
| 64 | 2.5S | 0.27 | 0 | 37.16 | 7.56 | 0.14 |
| 128 | 1.5S | 5.30 | 0 | 49.75 | 24.33 | 8.74 |
| 128 | 2S | 4.13 | 0 | 50.40 | 24.84 | 9.21 |
| 128 | 2.5S | 2.39 | 0 | 49.05 | 22.13 | 7.08 |
| 256 | 1.5S | - | - | - | - | - |
| 256 | 2S | 8.58 | 0 | 59.32 | 40.05 | 23.63 |
| 256 | 2.5S | 6.21 | 0 | 58.66 | 37.01 | 20.46 |

Regular networks:

| Topology | LASH-TOR 2VL | LASH 2VL | LASH 3VL |
|---|---|---|---|
| Mesh 4x4 | 0 | 17.5 | 0 |
| Mesh 8x4 | 0 | 17.4 | 0 |
| Mesh 8x8 | 0 | 16.3 | 0 |
| Mesh 16x8 | 0 | 15.8 | 0 |
| Torus 4x4 | 0 | 8.3 | 0.4 |
| Torus 8x4 | 0 | 18.6 | 1.3 |
| Torus 8x8 | 0 | 20.8 | 16.0 |
| Torus 16x8 | 0 | 30.1 | 24.3 |

Table 1. Percentage of <source, destination> pairs that could not be granted shortest paths. Irregular (#links given as a proportion of the number of switches) and regular networks.

For regular topologies we have considered 2D mesh and torus networks of the sizes 4x4, 8x4, 8x8, and 16x8. The results are presented in Table 1 for an availability of 2 and 3 virtual layers. LASH-TOR enables shortest path routing for all <source, destination> pairs regardless of the topology and the network size as long as 2 virtual layers are available. LASH, on the other hand, ends up with a significant number of “undefined” pairs, which may have to be routed non-minimally according to the up*/down* routing scheme when only 2 virtual layers are offered; for torus networks this number increases as the size of the network grows. When 3 virtual layers are available, LASH is able to guarantee minimal routing for all the mesh networks, while for the torus 8x8 and 16x8 cases a significant number of the <source, destination> pairs would still have to be routed by the up*/down* algorithm.
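The metric reported in Table 1 can be illustrated with a small sketch. It is not the tool used for the measurements above; the function names, the adjacency-dictionary input format, and the toy ring topology are assumptions made only for this example. A pair counts as non-minimal when its assigned route has more hops than a breadth-first-search shortest path.

```python
from collections import deque

def hop_distances(adj, src):
    """Breadth-first search: minimal hop count from src to every reachable switch."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def percent_non_minimal(adj, routes):
    """Percentage of (source, destination) pairs routed over more hops than necessary.

    adj    -- dict: switch -> list of neighbouring switches (hypothetical format)
    routes -- dict: (src, dst) -> list of switches visited, both ends included
    """
    non_minimal = 0
    for (src, dst), path in routes.items():
        if len(path) - 1 > hop_distances(adj, src)[dst]:
            non_minimal += 1
    return 100.0 * non_minimal / len(routes)

# Toy 4-switch ring 0-1-2-3-0 in which one pair is forced the long way round.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
routes = {
    (0, 1): [0, 1],
    (0, 2): [0, 1, 2],
    (0, 3): [0, 1, 2, 3],  # 3 hops although 0 and 3 are directly linked
    (1, 3): [1, 2, 3],
}
print(percent_non_minimal(ring, routes))  # 25.0
```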

5. Performance Evaluation

In this section, we analyze by simulation the generic routing capabilities of LASH-TOR. The evaluation is conducted with respect to regular topologies (2D mesh and torus networks), semi-regular topologies (faults imposed on 2D mesh and torus networks), and irregular topologies. For the regular network cases we compare LASH-TOR with Dimension-Ordered Routing (DOR) and conventional LASH, while for the latter two cases we compare LASH-TOR with LASH. In particular, we are interested in analyzing the trade-off between network performance and the number of network resources available for routing purposes.

5.1. The Simulation Model

The simulator models a switch-based network with point-to-point links, allowing any topology defined by the user. End-nodes are attached to switches using the same links used between switches.

Packets are routed at each switch by accessing the routing table. This table contains the output port to be used at the switch for each possible destination. If there is sufficient buffer capacity in the output buffer, the packet is forwarded. Otherwise, the packet must wait at the input buffer. In the simulator, each switch will have a crossbar connecting the input ports to the output ports, allowing multiple packets to be transmitted simultaneously without interference. The crossbar will be able to transmit one byte per cycle per connection. Switches will support up to a maximum of 16 virtual lanes (VL). At each output port, a round-robin arbiter will select the next VL with available data to transmit onto the link.
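The output-port arbitration described above can be sketched as follows. This is a minimal illustration of round-robin selection among virtual lanes, not the simulator's code; the class name `RoundRobinVLArbiter` and its methods are hypothetical.

```python
from collections import deque

class RoundRobinVLArbiter:
    """Round-robin choice among virtual lanes (VLs) that have data queued."""

    def __init__(self, num_vls=16):
        self.queues = [deque() for _ in range(num_vls)]
        self.last = num_vls - 1  # start so that VL 0 is examined first

    def enqueue(self, vl, packet):
        self.queues[vl].append(packet)

    def select(self):
        """Return (vl, packet) for the next non-empty VL after the last one served."""
        n = len(self.queues)
        for offset in range(1, n + 1):
            vl = (self.last + offset) % n
            if self.queues[vl]:
                self.last = vl
                return vl, self.queues[vl].popleft()
        return None  # no VL has data to transmit

arbiter = RoundRobinVLArbiter(num_vls=4)
arbiter.enqueue(0, "a0")
arbiter.enqueue(0, "a1")
arbiter.enqueue(2, "c0")
served = [arbiter.select() for _ in range(4)]
print(served)  # [(0, 'a0'), (2, 'c0'), (0, 'a1'), None]
```

Note how VL 2 is served between the two packets of VL 0, so a busy lane cannot monopolize the link.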

We will use a non-multiplexed crossbar on each switch. This crossbar supplies separate ports for each virtual lane. We will use a simple crossbar arbiter based on FIFO request queues per output crossbar port. The crossbar bandwidth will be set according to the injection rate of the links. Buffers of 1 KB will be used both at the input and the output side of the crossbar.

The routing time at each switch will be set to 100 ns. This time includes the time to access the routing table, the crossbar arbiter time, and the time to set up the crossbar connections. Additionally, the virtual cut-through switching technique is used.

In the simulator, the link injection rate will be fixed to 2.5 Gbps. We will also model the fly time (time required for a bit to reach the opposite link side). This parameter depends on the link length and the propagation delay of the cable. We will model 20 m copper cables with a propagation delay of 5 ns/m. Therefore, the fly time will be set to 100 ns. The simulator models a credit-based flow control scheme. A packet will be transmitted over the link if the number of available credits is enough to store the entire packet (1 credit = 64 bytes). We will use 32-byte packets and a uniform distribution of packet destinations.
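The link parameters above imply a few simple quantities, worked out in the sketch below. The constant and function names are illustrative only. The serialization time is a derived value that the text does not state explicitly, and the rule of rounding a packet up to whole credits is an assumption.

```python
import math

LINK_RATE_GBPS = 2.5          # 1 Gbps equals 1 bit per ns
CABLE_LENGTH_M = 20
PROPAGATION_NS_PER_M = 5
CREDIT_BYTES = 64
PACKET_BYTES = 32

# Fly time: cable length times propagation delay per metre.
fly_time_ns = CABLE_LENGTH_M * PROPAGATION_NS_PER_M

# Time to put one packet on the wire at the link injection rate (derived).
serialization_ns = PACKET_BYTES * 8 / LINK_RATE_GBPS

def credits_needed(packet_bytes):
    """Whole credits required to store a packet (assumed to round up)."""
    return math.ceil(packet_bytes / CREDIT_BYTES)

def can_transmit(available_credits, packet_bytes=PACKET_BYTES):
    """Credit check: send only if the receiver can store the entire packet."""
    return available_credits >= credits_needed(packet_bytes)

print(fly_time_ns)                        # 100
print(serialization_ns)                   # 102.4
print(credits_needed(PACKET_BYTES))       # 1
print(can_transmit(0), can_transmit(1))   # False True
```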

5.2. Simulation Results of Regular Networks

In this section we analyze the performance of LASH-TOR in regular networks, comparing it with DOR and conventional LASH. More specifically, we assess 2D mesh and torus topologies of sizes 8x4, 8x8, and 16x8. Let us first turn our attention to the mesh experiments. Figure 2 shows the evaluation results of the different routing algorithms. We will refer to them as DOR xVL, LASH xVL, and TOR xVL, where x is the number of deployed virtual channels, which will be 2 or 3. Note that DOR is able to provide minimal routing in meshes by using just one virtual channel. However, results for DOR 1VL are not plotted because its behavior is significantly worse than that exhibited by DOR 2VL.
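For readers unfamiliar with DOR, the following sketch shows dimension-order routing on a 2D mesh in its common X-then-Y form. The function name, the coordinate representation, and the choice of resolving X before Y are assumptions for illustration; the text above does not specify these details.

```python
def dor_path_mesh(src, dst):
    """Dimension-order route on a 2D mesh: finish the X dimension, then Y.

    src, dst -- (x, y) switch coordinates. The path has |dx| + |dy| hops,
    which is the minimal hop count on a mesh.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

path = dor_path_mesh((0, 0), (2, 1))
print(path)  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every packet completes its X hops before taking any Y hop, no packet ever turns from Y back to X, which is why a single virtual channel is enough on a mesh.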

The results show a consistent picture where DOR gives the best performance, with LASH-TOR slightly behind it (for the 16x8 case they perform similarly). LASH, on the other hand, does not quite keep up with the other two algorithms, in particular when only 2 virtual channels are available and as the network size grows. The reason for this is that LASH is not able to offer minimal routing for all the <source, destination> pairs as LASH-TOR does. As can be seen from Table 1, about 15%-18% of the paths cannot be served by LASH and have to be provided by the up*/down* algorithm, possibly being non-minimal and causing traffic congestion around the root of the up*/down* tree (see Steps 5 and 6 of the algorithm proposed in Section 3).

This effect is far more noticeable as the size of the network grows (Figures 2.b and 2.c). Though the number of up*/down* paths here is rather independent of the network size (it varies between 15% and 18%), the length of these paths increases relative to the shortest paths. When 3 virtual channels are available, LASH is also able to serve all <source, destination> pairs with shortest paths, but it still does not perform as well as LASH-TOR and DOR for the 8x8 and 16x8 cases (Figures 2.b and 2.c, respectively). The reason for this is that LASH and LASH-TOR may select different shortest paths, which can have an impact on the physical distribution of the traffic and consequently on the overall performance. Both algorithms have the freedom to select the shortest path that has the least need for virtual layers (if several paths have the same need, one of them is picked randomly; refer to Step 3.2 in Section 3). Since LASH is strictly layered and LASH-TOR is transition-oriented, they may, from that point of view, select different shortest paths (because they operate on different channel dependency graphs). Another main finding here is that giving LASH-TOR 3 virtual channels does not boost its performance compared to the two virtual channel scenario.

Figure 3 shows results for torus networks. As can be seen, DOR and LASH-TOR perform similarly for all the tested network sizes. Furthermore, as for mesh networks, DOR and LASH-TOR outperform conventional LASH with increasing network size (Figures 3.b and 3.c). In fact, for the 8x8 torus case, LASH-TOR with 2 virtual channels has 1.8 times higher throughput than LASH with 3 virtual layers. The corresponding throughput improvement for the 16x8 network is 4.2. The reason for this is again that a significant number of the <source, destination> pairs cannot be served by LASH and have to be provided by the up*/down* algorithm (refer to Table 1).

Moreover, LASH cannot guarantee minimal routing even when 3 virtual layers are offered (for the 8x8 and 16x8 cases the number of up*/down* paths is 16.0% and 24.3%, respectively). Another interesting aspect is related to the smaller network sizes. LASH actually is ahead of DOR and LASH-TOR (Figure 3.a), where LASH has about 30% higher throughput than the other two algorithms for the torus 8x4. The explanation for LASH winning here is that it is a strictly layered concept, while LASH-TOR and DOR are transition-oriented. Note that the latter method (DOR) also has to rely on transitions from one virtual layer to another in order to guarantee deadlock-free routing for k-ary n-cubes (the transitions are related to the use of the wraparound channels). This transition-oriented nature may cause congestion scenarios where hot-spots in one virtual layer also affect other virtual layers. However, for larger networks we do not see LASH ahead of LASH-TOR and DOR, since the LASH performance is affected by the fact that it cannot guarantee minimal routing for a significant number of paths. Besides, offering LASH-TOR 3 virtual channels does not boost performance compared to the 2 virtual channel case.

[Figure 2 plots: Average Message Latency (ns) vs. Traffic (flits/ns/switch), panels (a), (b), and (c); curves DOR 2VL, LASH 2VL, TOR 2VL, LASH 3VL, TOR 3VL.]

Figure 2. Average packet latency vs. traffic. (a) 8x4 mesh, (b) 8x8 mesh, and (c) 16x8 mesh.

[Figure 3 plots: Average Message Latency (ns) vs. Traffic (flits/ns/switch), panels (a), (b), and (c); curves DOR 2VL, LASH 2VL, TOR 2VL, LASH 3VL, TOR 3VL.]

Figure 3. Average packet latency vs. traffic. (a) 8x4 torus, (b) 8x8 torus, and (c) 16x8 torus.

5.3. Simulation Results of Semi-Regular Networks

Now we focus on the previous networks with unconnected (faulty) links. Different percentages of unconnected links will be used. In particular, 1%, 3%, and 5% of the links will be eliminated randomly. In order to obtain results that are independent of the positions of the missing links, 10 random combinations of unconnected links will be evaluated for each topology and average results will be shown.
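The construction of such semi-regular topologies can be sketched as follows. This is an illustration under stated assumptions rather than the generator used for the experiments: the helper names are hypothetical, the number of removed links is rounded to the nearest integer, and no check is made that the network stays connected.

```python
import random

def grid_links(cols, rows, torus=False):
    """Undirected links of a cols x rows 2D mesh, or torus if torus=True."""
    links = set()
    for x in range(cols):
        for y in range(rows):
            if torus or x + 1 < cols:
                links.add(frozenset({(x, y), ((x + 1) % cols, y)}))
            if torus or y + 1 < rows:
                links.add(frozenset({(x, y), (x, (y + 1) % rows)}))
    return links

def remove_random_links(links, percent, rng):
    """Return a copy of links with about percent % of them removed at random."""
    count = round(len(links) * percent / 100)  # rounding rule is an assumption
    ordered = sorted(links, key=sorted)        # stable order for reproducibility
    return links - set(rng.sample(ordered, count))

rng = random.Random(1)
mesh = grid_links(8, 4)
# Ten random fault combinations for one topology and one fault percentage.
scenarios = [remove_random_links(mesh, 5, rng) for _ in range(10)]
print(len(mesh), len(scenarios[0]))  # 52 49
```

An 8x4 mesh has 7*4 + 8*3 = 52 links, so 5% corresponds to 3 removed links under the rounding assumed here.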

Table 2 shows minimum, maximum, and average factors of throughput increase for different topologies with different percentages of unconnected links when LASH and LASH-TOR use 2 and 3 virtual channels. Notice that DOR routing is not evaluated in this context since it cannot be applied to these semi-regular networks.
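The statistics in Table 2 can be read as per-scenario throughput ratios summarized by their minimum, maximum, and mean. The sketch below assumes that the average is the mean of the per-scenario ratios (the text does not say whether it is this or a ratio of mean throughputs). The function name and the throughput values are hypothetical and are not measurements from this paper.

```python
def throughput_factors(baseline, improved):
    """Min, max, and mean of improved/baseline ratios, one ratio per scenario."""
    factors = [new / old for old, new in zip(baseline, improved)]
    return min(factors), max(factors), sum(factors) / len(factors)

# Hypothetical throughput values for three fault scenarios (arbitrary units).
lash_2vl = [20, 25, 40]
tor_2vl = [30, 50, 40]
print(throughput_factors(lash_2vl, tor_2vl))  # (1.0, 2.0, 1.5)
```

A factor below 1 (as in some Min columns of Table 2) means the second configuration was slower than the first in that scenario.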

As can be seen from the first column of average results, LASH routing always benefits from using a third virtual channel, since LASH is hardly able to obtain full minimal routing when only 2 virtual channels are used. The average improvement depends on the topology, ranging from 1.12 to 2.28. In contrast, giving LASH-TOR an additional virtual channel does not boost its performance. That is, the network throughput is not increased significantly (regardless of the topology used). Note also that the percentage of unconnected links does not lead to significant differences in performance for LASH and LASH-TOR. It seems that the different analyzed percentages of unconnected links lead to similar irregular networks.

More importantly, from the table we can observe the significant improvements in network throughput when using 2 virtual channels with LASH-TOR instead of with LASH. As the size of the network increases, the improvements are much more noticeable. For mesh networks, throughput is increased on average by factors ranging from 1.18 for an 8x4 mesh up to 2.48 for a 16x8 mesh network. Moreover, the improvements are higher for torus networks. For the 16x8 torus case, the average factor of throughput increase is always higher than 2.95 for the different percentages of unconnected links. These results also corroborate the results obtained in the previous section.

Finally, the last column shows that with a third virtual channel LASH reaches the performance levels of LASH-TOR. However, for large torus networks LASH still suffers from routing through non-minimal paths and therefore obtains lower throughput values. For a 16x8 torus, LASH-TOR with 3 virtual channels more than doubles the throughput achieved by LASH with 3 virtual channels (average factors of 2.74 to 2.89 in Table 2).

                    LASH (3VL vs 2VL)   TOR (3VL vs 2VL)    TOR 2VL vs LASH 2VL TOR 3VL vs LASH 3VL
Topology    %Faults Min   Max   Avg     Min   Max   Avg     Min   Max   Avg     Min   Max   Avg
mesh 8x4    1       1.10  1.59  1.27    0.98  1.11  1.03    0.82  1.62  1.18    0.75  1.08  0.94
mesh 8x4    3       1.13  1.74  1.44    0.96  1.09  1.00    1.03  1.88  1.45    0.80  1.21  1.01
mesh 8x4    5       1.17  1.72  1.39    0.99  1.09  1.03    1.14  1.57  1.27    0.81  1.00  0.94
mesh 8x8    1       1.56  2.50  1.98    0.99  1.05  1.01    1.43  2.73  2.11    0.82  1.26  1.08
mesh 8x8    3       1.32  2.41  2.06    0.97  1.12  1.02    1.49  2.44  1.99    0.71  1.25  1.00
mesh 8x8    5       1.69  2.61  1.98    0.97  1.12  1.04    1.43  2.46  2.02    0.86  1.30  1.06
mesh 16x8   1       1.63  2.68  2.22    0.97  1.07  1.03    1.75  3.16  2.48    0.80  1.96  1.17
mesh 16x8   3       1.55  2.86  2.28    0.95  1.08  1.03    1.41  3.64  2.39    0.91  1.44  1.07
mesh 16x8   5       1.44  2.60  2.13    0.94  1.05  1.01    1.73  3.24  2.39    0.91  1.86  1.15
torus 8x4   1       1.48  2.44  1.75    1.00  1.05  1.02    1.33  2.22  1.67    0.87  1.14  0.98
torus 8x4   3       1.35  2.09  1.65    1.00  1.05  1.01    1.12  2.10  1.61    0.83  1.24  0.99
torus 8x4   5       1.22  1.71  1.53    0.77  1.06  0.97    1.29  2.01  1.56    0.83  1.25  0.99
torus 8x8   1       1.07  1.78  1.34    0.97  1.08  1.01    1.62  2.56  1.95    1.19  1.85  1.48
torus 8x8   3       1.14  1.67  1.36    0.98  1.06  1.01    1.70  2.58  2.03    1.27  2.07  1.53
torus 8x8   5       1.15  1.49  1.29    0.95  1.05  1.00    1.51  3.00  2.12    1.17  2.46  1.65
torus 16x8  1       1.03  1.26  1.17    0.96  1.15  1.05    2.93  3.97  3.30    2.38  3.40  2.88
torus 16x8  3       1.07  1.18  1.12    1.00  1.18  1.08    2.52  3.54  2.96    2.36  3.34  2.89
torus 16x8  5       0.93  1.23  1.12    0.96  1.17  1.03    2.60  4.10  3.07    2.15  3.56  2.74

Table 2. Min., max., and avg. factor of throughput increase when using LASH-TOR vs LASH.

[Figure 4 plots: Average Message Latency (ns) vs. Traffic (flits/ns/switch), panels (a), (b), and (c); curves LASH 2VL, LASH 3VL, LASH 4VL, TOR 2VL, TOR 3VL, TOR 4VL.]

Figure 4. Average packet latency vs. traffic. (a) 32, (b) 64, and (c) 256 switches. 2S links.

5.4. Simulation Results of Irregular Networks

As more and more faults occur in regular networks, they tend to become highly irregular; besides, there are also cluster networks that are structured irregularly from the outset. In this section, we evaluate LASH and LASH-TOR with respect to such irregular networks, where we have randomly generated irregular topologies of different sizes with different numbers of available resources. In particular, we will evaluate the routing algorithms for an availability of 2, 3, and 4 virtual channels. Average results taken from 10 random topologies for each network size will be shown.

Figure 4 shows the performance evaluation of LASH and LASH-TOR with different numbers of resources for selected topologies of different network sizes with 2S links. For 16-switch networks (not shown) the performance of LASH and LASH-TOR routing is quite similar. Further, the number of available virtual channels does not affect performance. The reason for this behavior is that LASH routing is able to obtain a minimal path for almost all source-destination pairs, even when only 2 virtual channels are available for routing purposes. Therefore, no improvement is achieved by using virtual channel transitions as LASH-TOR allows.

For 32-switch networks, LASH needs up to 3 VLs in order to guarantee minimal routing. As can be seen in Figure 4.a, LASH 2VL suffers a 22% throughput decrease due to this fact. Only with 3 (LASH 3VL) and 4 (LASH 4VL) virtual channels does LASH obtain the maximum performance in terms of throughput. On the other hand, LASH-TOR routing is not affected by the number of virtual channels (LASH-TOR 2VL already obtains the maximum performance).

Moreover, as network size increases, LASH routing becomes more affected by the lack of enough VLs to provide minimal paths (see Table 1). This leads to a decrease in performance. For 64-switch irregular networks (Figure 4.b), LASH 2VL performance is highly limited (performance is decreased by 36%). Even LASH 3VL lowers performance by 15%. Indeed, LASH-TOR 2VL routing obtains better performance than LASH 3VL.

Therefore, as network size increases, using virtual channel transitions increases network performance. This is corroborated by the results obtained for 256-switch networks (not shown). The performance advantage of LASH-TOR routing is quite significant. LASH routing is not able to reach the performance of LASH-TOR 2VL routing even with 4 VLs. Also, LASH-TOR routing with 3 VLs is able to obtain the maximum performance (the same performance is achieved by LASH-TOR 4VL).

From additional evaluations (not shown), the conclusions drawn from the previous figures are supported by the average results obtained. For uniform traffic, 2S connectivity, and 2 VLs devoted to routing, the average factor of throughput increase ranges from 1.00 for small networks to 2.08 for larger networks. As the number of VLs increases, the factors of throughput increase obtained with LASH-TOR routing decrease. However, for 256-switch networks, on average, LASH-TOR 4VL increases LASH 4VL performance by a factor of 1.88. Also, for more highly connected topologies (2.5S), even when 4 VLs are available for routing, LASH-TOR routing outperforms LASH routing. The average factor of throughput increase reached 1.26 for 128-switch networks.

6. Conclusions

In this paper we have addressed the problem of routing through shortest paths by using the smallest number of virtual channels. In particular, we have extended the LASH routing scheme by allowing transitions between virtual layers. The resulting routing algorithm (LASH-TOR) achieves full minimal routing with a reduced number of virtual channels. To the best of our knowledge, LASH-TOR is the generic routing scheme that requires the smallest number of resources to provide minimal routing in cluster networks, regardless of their topology. Empirically we show that 3 virtual channels are sufficient to provide full minimal routing for network sizes up to 128 switches (regular, semi-regular, or highly irregular topologies), whereas LASH requires up to 6 virtual channels. Furthermore, only one additional virtual channel is required by LASH-TOR in some cases for 256-switch networks (LASH requires up to 9).

Extensive evaluations have been conducted with respect to regular topologies (2D mesh and torus networks), semi-regular topologies (faults imposed on 2D mesh and torus networks), and irregular topologies. For the regular network cases we compare LASH-TOR with DOR, where we see that for mesh networks LASH-TOR is slightly behind DOR, while for the torus topologies DOR and LASH-TOR exhibit similar performance. We have in fact seen very few results comparing dedicated routing algorithms with generic routing algorithms in the literature. Furthermore, the performance evaluation shows that for an availability of 2 virtual channels LASH-TOR outperforms LASH regardless of network topology (regular or irregular), since LASH here cannot provide minimal paths between every <source, destination> pair. Even using 3 or 4 virtual channels, LASH is not able to match the performance of LASH-TOR with increasing network size.

References

[1] S.Chalasani and R.V. Boppana, ”Communication in multi-computers with nonconvex faults”, IEEE Trans. Computers,vol. 46, no. 5, pp. 616-622, May 1997.

[2] L. Cherkasova, V. Kotov, and T. Rokicki, “Fibre channel fa-brics: Evaluation and design,” in Proc. of 29th Int. Conf. onSystem Sciences, Feb. 1995.

[3] W. J. Dally and C. I. Seitz. Deadlock-free message routing inmultiprocessor interconnection networks. IEEE Trans. Com-puters, C-36(5):547–553, May 1987.

[4] J. Flich, P. Lopez, M. P. Malumbres, and J. Duato, “Boostingthe Performance of Myrinet Networks,” in IEEE Trans. Par-allel and Distributed Systems, vol. 13, num 7, pp 693-709,2002.

[5] J. Flich, P. Lopez, J.C. Sancho, A. Robles, and J. Duato, “Im-proving InfiniBand Routing through Multiple Virtual Net-works,” in Int. Symp. High Performance Computing, May2002.

[6] InfiniBandTM Trade Association, http://www. infini-bandta.com.

[7] P. Lopez, J. Flich, and J. Duato, Deadlock-free Routing inInfiniBandTM through Destination Renaming, in Proc. of2001 Int. Conf. Parallel Processing, Sept. 2001.

[8] O. Lysne and T. Skeie, Load balancing of irregular systemarea networks through multiple roots, Proc. of the Int. Conf.Communication in Computing, CIC 2001, CSREA Press, pa-ges 165 -171.

[9] W. Qiao and L. M. Ni, “Adaptive routing in irregular net-works using cut-through switches,” in Proc. of the 1996 Int.Conf. Parallel Processing, Aug. 1996.

[10] R. Riesen et al., CPLANT, Proc. of the Second Extreme Li-nux Workshop, June 1999.

[11] J.C. Sancho, A. Robles, and J. Duato, New methodology tocompute deadlock-free routing tables for irregular networks,in Proc. of CANPC’2000, Jan. 2000.

[12] J. C. Sancho, A. Robles, J. Flich, P. Lopez, and J. Duato. Ef-fective methodology for deadlock-free minimal routing in in-finiband networks. Proc. of the 2002 Int. Conf. Parallel Pro-cessing (ICPP ’02), IEEE Computer Society, 2002.

[13] M. D. Schroder et.al. Autonet: A high-speed, self-configuring local area network using point-to-point links.SRC Research Report 59, Digital Equipment Corporation,1990.

[14] F. Silla and J. Duato, Tuning the Number of Virtual Channelsin Networks of Workstations, in Proc. of the 10th Int. Conf.Parallel and Distributed Computing Systems (PDCS’97),Oct. 1997.

[15] T. Skeie, O. Lysne, and I. Theiss. Layered shortest ath(LASH) routing in irregular system area networks. In Proc.of Workshop of Communication Architecture for Clusters(CAC’02), April 2002.