
Scalable Fractional Lambda Switching: A Testbed

Mario Baldi, Michele Corrà, Giorgio Fontana, Guido Marchetto, Yoram Ofek, Danilo Severina, and Olga Zadedyurina

Abstract—This paper presents experiments on a testbed based on ultra-scalable switches realized using off-the-shelf optical and electronic components. The scalability of this switching architecture, a direct outcome of the deployment of pipeline forwarding, results—in addition to much lower cost—in the need for a smaller number of components and, consequently, lower power dissipation, which is key to a "greening" of the Internet. Although an all-optical architecture is demonstrated, we reached the conclusion that, given the current state of the art, a hybrid electro-optical architecture is the "best-of-breed" switch solution.

Index Terms—Multi-terabit packet switching; Optical networks; Pipeline forwarding; Sub-lambda switching; Testbed; Time-driven switching.

I. INTRODUCTION

The steady (exponential) growth of the Internet over the past few years is impressive, but the applications so far deployed are insignificant in terms of bandwidth and service requirements when compared to multimedia ones, especially if interactive in nature. In fact, bandwidth demand is expected to grow significantly, driven in particular by the growth in the popularity of multimedia applications (see some recent predictions by Cisco Systems [1]). A large part of those applications will be deployed by home users, who are accustomed to not paying very much. Furthermore, the long term success of such bandwidth intensive applications, together with the opportunity for service providers to benefit from new sources of revenue, is dependent on the cost of such a predictable service. Consequently, over-provisioning, which is the basis for providing predictable services on today's networks, is not likely to be a viable solution to accommodate these growing new types of traffic. This considered, how will the Internet be able to support this additional growth, and who will pay for the construction and energy consumption of such a network? The development of highly scalable (i.e., to multi-terabit/s) switches that can handle such media-based applications, while ensuring the right level of service quality [2], is urgently needed. Scalability implies not only (i) the capability of achieving high bit rates of aggregated switched traffic, but also (ii) lower cost and (iii) lower energy consumption per bit switched, the latter emerging as a major requirement for a future Internet that is expected to be "green" [3–5].

Manuscript received December 15, 2010; revised March 22, 2011; accepted March 22, 2011; published April 29, 2011 (Doc. ID 139693).

Mario Baldi and Guido Marchetto (e-mail: [email protected]) are with the Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy.

Michele Corrà is with TRETEC s.r.l., Trento, Italy.

Giorgio Fontana, Yoram Ofek, Danilo Severina, and Olga Zadedyurina are with the Department of Information and Communication Technology, University of Trento, Trento, Italy.

Digital Object Identifier 10.1364/JOCN.3.000447

Wavelength division multiplexing (WDM), originally proposed as a solution to tap into the large transmission potential of fiber optics to increase the capacity of existing and future fiber infrastructures, has been used as the basis for the realization of high performance switching systems and multi-wavelength optical networks that have been the subject of research for many years (e.g., [6,7]). Optical switching has the potential of being highly scalable in all three dimensions mentioned above: capability of building a high performance architecture, low cost, and low power consumption. However, besides the cost of optical components still being very high,¹ the bandwidth granularity in lambda (λ) switched networks is that of a whole optical channel, i.e., only the whole optical channel capacity or nothing can be allocated. Switching a whole optical channel is not efficient since each optical channel has a capacity ranging from 2.5 Gb/s to 100 Gb/s (in the near future), which accommodates a very large number of sessions/connections/flows. The obvious solution is the implementation of asynchronous IP packet switching, but when looking at all-optical networking, asynchronous IP packet switching is not yet practical. Moreover, future traffic will include a large portion of multimedia distribution that is inherently predictable and of long duration (i.e., minutes or more).

More recently, optical burst switching (OBS) [8] has been proposed for the realization of high performance asynchronous switching systems. A burst accommodates a possibly large number of packets. In OBS, control packets are transferred on a control channel to configure switching nodes before the arrival of the corresponding bursts, reducing the requirements for optical buffers. Although OBS is interesting and some solutions for quality of service have been defined for it (e.g., [9–12]), the asynchronous nature of burst switching makes the necessary switching fabrics hard to implement and control, even when the traffic load is moderate.

Traditional TDM (time division multiplexing) systems, such as SONET/SDH, represent a synchronous solution that uses frequency synchronization with known bounds on clock drifts. In order to overcome possible data loss, SONET/SDH uses rather complex overhead information to accommodate (1) the accumulation of delay uncertainties and (2) continuous clock drifts from the nominal value. Multiplexing in SONET/SDH is based on time slots (TS) organized in a reoccurring structure called a frame. Due to the lack of phase synchronization among nodes (resulting from the abovementioned continuous clock drifts), a TS incoming from an input interface might be stored for up to the duration of a whole frame before being sent out on its output link. In order to keep the delay introduced by each node small, the frame duration is defined as 125 µs and, consequently, the TS is defined to hold small-size data units, specifically 1 byte. The 1-byte TS, with a duration of about 1 ns at 10 Gb/s (OC-192/STM-64), and the other mechanisms required to cope with continuous clock drifts (such as the floating payload) significantly contribute to SONET/SDH's complexity and cost and make its optical implementation impractical.

¹ The cost of optical components, currently very high due to the immaturity of the technology, is expected to decrease, due to economy of scale and their inherent simplicity.

This paper presents a testbed demonstrating a method known as fractional λ switching (FλS), a.k.a. time-driven switching (TDS), based on the application of pipeline forwarding (see, for example, [13,14]). In FλS, all nodes share a common time reference (CTR) that coordinates switching operations in the entire portion of the network in which the solution is used. In essence, fixed size time slots are defined and used as minimum switching units. In this way, the optical channels can be partitioned into a number of sub-lambda (or "fractional" lambda) channels that can be switched independently, thus making the network more bandwidth efficient than whole optical channel switching. Scalability is maximized as none of (i) header processing, (ii) reliance on small-size switching units (such as in SONET/SDH), or (iii) large buffering is required, as "stopping of the serial bit stream" is minimized. Furthermore, FλS provides service guarantees for multimedia applications as a "bonus," i.e., without any added complexity, cost, or power consumption. In summary, FλS enables

1. high scalability of network switches to 10–100 Tb/s in a single chassis;

2. all-optical implementation with state of the art technology;

3. quality of service guarantees (deterministic delay and jitter and no IP packet loss) for (UDP-based) constant bit rate and variable bit rate streaming applications—as needed; while

4. preserving the support of elastic, TCP-based traffic, i.e., existing applications based on "best-effort" services are not affected in any way.

The primary objectives of this work are (i) to demonstrate the feasibility of our design principles; (ii) to confirm the low complexity, cost and energy consumption of FλS by implementing a testbed using off-the-shelf optical and electronic components; and (iii) to provide a testbed for measurements and further experimental studies. It is worth mentioning that although one-way media streaming is deployed in the presented experiments for simplicity, the very low end-to-end delay and jitter guaranteed by the system make it ideal for the support of interactive services. Even though an all-optical FλS switch prototype is presented to demonstrate the feasibility of the implementation fully in the optical domain, a hybrid electro-optical architecture is concluded to be the "best-of-breed" switch solution given the current state of the art.

The potential of using time to drive switching operations in optical networks is confirmed by several other works. For example, in CANON [15,16] the network is divided into clusters interconnected by a mesh optical network. Within each cluster, nodes are connected to form an optical ring accessed in a time division multiple access (TDMA) fashion, thus based on a common time reference. A proper reservation of time slots coordinated by a master node allows the avoidance of contentions and packet loss on each ring. Since the common time reference is not global, contentions may instead occur in the mesh network connecting the master nodes, where different optical switching solutions (e.g., optical interconnection, wavelength switching, OBS) may be used depending on the network complexity and expected traffic profile. CANON and FλS are complementary. First, by extending the common time reference beyond the single clusters, FλS could be deployed on the mesh optical network interconnecting master nodes and seamlessly integrated with the TDMA, thus extending the congestion free service end-to-end. Second, the hierarchical, on-the-fly resource reservation solution proposed in CANON could be deployed in FλS, for which periodic reservation has been so far proposed.

Time sliced OBS (TSOBS) [17] applies the concept of time-driven operation in the context of OBS to limit the buffering requirements of OBS switches. A time reference defines fixed size time slots, grouped in frames, which are reserved to incoming bursts. Since each burst can be allocated to any of the available time slots, it may be buffered for a whole frame before being forwarded. In order to keep buffering requirements compatible with current optical technology, the frame must be short (e.g., 100 ns–1 µs), particularly with high-capacity links. The main difference between TSOBS and FλS is the scope of the time reference, which is common to all network nodes in FλS and local to each switch in TSOBS. As a consequence, the time difference between the beginning of frames on the inputs of TSOBS switches (defined by the time reference on upstream nodes) and the ones on the outputs is not constant. This makes the deployment of immediate forwarding (explained in Subsection II.A), with its very limited buffering requirements, impossible, and buffering for at least one frame necessary.

The rest of the paper is organized as follows. Section II presents the basic concepts and ideas underlying the proposed scalable switch design. The architecture and the implementation of a prototypal switching system are presented in Section III; Subsection III.D is devoted to the switch controller card that is at the heart of the switching system. Section IV presents the testbed realized with prototypal switches and the results of some testing and measurement experiments, most significantly with six nodes and 100 km of single mode fiber (4 fiber segments of 25 km). Section V summarizes the work and draws some conclusions.

II. SCALABLE DESIGN

A. Pipeline Forwarding With UTC

Pipeline forwarding of packets is one of the key components of our scalable switch design. Thus, this section briefly introduces this method. An extensive and detailed description of pipeline forwarding is outside the scope of this paper and is available in the literature [13,14,18,19].

Fig. 1. Common time reference structure (CTR derived from UTC (coordinated universal time): time frames of duration T grouped into time cycles of 100 TFs, with 80 time cycles, i.e., 8k time frames, per supercycle, each supercycle aligned with the beginning of a UTC second).

The necessary condition for pipeline forwarding is to have a CTR; see, for example, the initial work presented in [20]. In the design presented in this paper, UTC (coordinated universal time) is used. This is globally available via tens of sources on Earth and in space (e.g., GPS (USA) and GLONASS (Russia), and in the near future Galileo (EU) and Beidou (China)), and can be easily distributed through the network itself (i.e., in-band) using a variety of mechanisms and standard protocols, e.g., [21].

All packet switches utilize a time period called a time frame (TF). The TF duration T, which does not have to be the same throughout the network, is derived as a fraction of the UTC second. As shown in Fig. 1, TFs are grouped into time cycles (TCs), and TCs are further grouped into supercycles; this timing structure, aligned in all nodes, constitutes the CTR. Each supercycle duration may be one UTC second, as shown in Fig. 1, where the 125 µs time frame duration T is obtained by dividing the UTC second by 8000; sequences of 100 time frames are grouped into one time cycle, and a sequence of 80 time cycles comprises one supercycle (i.e., one UTC second).
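For concreteness, the following C fragment maps an offset within the UTC second onto the CTR of Fig. 1, using the example parameters (T = 125 µs, m = 100 TFs per TC, n = 80 TCs per supercycle); it is a worked illustration rather than testbed code:

/* Map an offset within the UTC second onto (TC, TF) indices of Fig. 1. */
#include <stdio.h>

#define TF_US      125L   /* TF duration T in microseconds        */
#define TFS_PER_TC 100L   /* m: time frames per time cycle        */
#define TCS_PER_SC 80L    /* n: time cycles per supercycle (1 s)  */

int main(void)
{
    long us_into_second = 123456L;        /* microseconds past the UTC second */
    long tf = us_into_second / TF_US;     /* absolute TF index, 0..7999       */

    printf("TC %ld, TF %ld within the TC\n", tf / TFS_PER_TC, tf % TFS_PER_TC);
    /* sanity check: T * m * n must equal one UTC second */
    printf("T*m*n = %ld us\n", TF_US * TFS_PER_TC * TCS_PER_SC);
    return 0;
}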

During a resource reservation phase, TFs are reserved for each flow on the links of its route. Thus, TFs can be viewed as virtual containers for multiple IP packets that are switched and forwarded according to the UTC. The TC provides the basis for a periodic repetition of the reservation, while the supercycle offers a basis for reservations with a period longer than a single TC.

A signaling protocol should be chosen for performing resource reservation and TF scheduling, i.e., selecting the TF in which packets belonging to a given flow should be forwarded by each router. Existing standard protocols and formats should be used whenever possible. Many solutions have been proposed for distributed scheduling in pipeline forwarding networks [19] and the generalized MPLS (G-MPLS) control plane provides signaling protocols suitable for their implementation.

The basic pipeline forwarding operation as originally proposed in [18] is regulated by two simple rules: (i) all packets that must be sent in TF t by a node must be in its output ports' buffers at the end of TF t−1, and (ii) a packet p transmitted in TF t by a node Nn must be transmitted in TF t+d by node Nn+1, where d is a predefined integer called the forwarding delay, and TF t and TF t+d are also referred to as the forwarding TFs of packet p at node Nn and node Nn+1, respectively. It follows that packets are timely forwarded along their path and served at well defined instants of time at each node. Nodes therefore operate as if they were part of a pipeline, from which the technology's name is derived. In pipeline forwarding, a synchronous virtual pipe (SVP) is a predefined schedule for forwarding a pre-allocated amount of bytes during one or more TFs along a path of subsequent UTC-based switches. A hierarchical resource reservation model can be used to set up SVPs, which enables multiple component SVPs to be aggregated in larger, possibly pre-provisioned, SVPs in the core of the network.
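Rule (ii) can be illustrated with a short C fragment that propagates a forwarding TF hop by hop along a path; the per-hop forwarding delays are hypothetical values chosen for the example:

/* Sketch of pipeline forwarding rule (ii): a packet sent in TF t by node
 * N_n is sent in TF t+d by node N_{n+1}; the schedule repeats each TC. */
#include <stdio.h>

#define TFS_PER_TC 100   /* forwarding TFs repeat with the time cycle */

int main(void)
{
    int d[] = {1, 1, 2, 1};              /* hypothetical forwarding delay per hop */
    int hops = sizeof d / sizeof d[0];
    int tf = 42;                          /* forwarding TF at the ingress node */

    for (int n = 0; n < hops; n++) {
        tf = (tf + d[n]) % TFS_PER_TC;    /* wraps at the time cycle boundary */
        printf("node N%d forwards in TF %d\n", n + 1, tf);
    }
    return 0;
}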

As demonstrated in many previous publications (e.g., [18,19]), pipeline forwarding guarantees that real-time traffic transmitted according to a reservation experiences (i) bounded end-to-end delay, (ii) low delay jitter independent of the number of nodes traversed, with (iii) no congestion and resulting packet loss. Whenever a flow generated by a source exceeds its reservation, excess packets are dealt with at the ingress to the FλS portion of the network (see the concept of the FλS interface in Subsection II.D), where they might be buffered (delayed), statistically multiplexed on purposely provisioned "best-effort" SVPs, and possibly experience loss.

Pipeline forwarding is the basic operating principle underlying FλS, also known as time-driven switching (TDS). In FλS, all packets in the same TF are switched in the same way, i.e., through the same output port. It is based on the setting of a switching schedule of IP packets in TFs along a predefined route/path in the FλS network, which results in the partitioning of optical channels into a number of sub-lambda (or "fractional") channels. Header processing is not required, with consequent low complexity and, hence, high scalability. In FλS there are two main types of TF forwarding, wherein each TF contains multiple packets.

1. Immediate forwarding (IF): upon the arrival of each TF at an FλS switch, the content of the TF (e.g., IP packets) is scheduled to be "immediately" switched and forwarded to the next switch during the next TF. Hence, the buffer that is required is bounded to one TF and the end-to-end transmission delay is minimized. Consequently, the amount of buffer for one TF of 12.5 µs at 10 Gb/s is only about 15 KB (see the sketch after this list). Clearly, no buffering is required if IP packets scheduled in TF k arrive at the switch at the exact time when TF k begins.

2. Non-immediate forwarding (NIF): TFs may be delayed at the input of the switch for a predefined number of additional TFs, typically one or two (e.g., 2-frame forwarding up to k-frame forwarding). The main advantage of NIF is the reduction in blocking probability, which is the probability that there are available TFs but not in the right sequence or schedule.
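The buffer figure quoted in item 1 follows directly from the TF duration and the line rate, as the following C fragment shows (the NIF depth k is an illustrative value):

/* Buffer sizing from the text: IF needs at most one TF of data; with
 * k-frame non-immediate forwarding, up to k TFs. */
#include <stdio.h>

int main(void)
{
    double tf_s     = 12.5e-6;   /* TF duration in seconds            */
    double rate_bps = 10e9;      /* link rate in bits/s               */
    int    k        = 3;         /* NIF depth, illustrative value     */

    double bytes_per_tf = tf_s * rate_bps / 8.0;   /* 15625 bytes ~ 15 KB */
    printf("IF buffer:  %.0f bytes\n", bytes_per_tf);
    printf("NIF buffer: %.0f bytes (k = %d)\n", k * bytes_per_tf, k);
    return 0;
}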

B. Design Guidelines

The design approach is based on the following three implementation principles which, as shown below, lead to high scalability:

1. electronic and/or optical switching components, with

2. optical interconnection of the switching components, and with

3. global pipeline forwarding (PF) with time-driven switching and control.


Today single-chip, high-capacity electronic cross-point switches are available on the market; for example, a 144-by-144 switch matrix with 0–11 Gb/s per input/output port, i.e., up to 1.5 Tb/s aggregate switching capacity, dissipating 0.1–0.15 W per input–output pair. However, electrical interconnects suffer from high wire resistance, residual wire capacitance, and inter-wire crosstalk as the length and/or the density of the electrical interconnections increases. Hence, construction of a large switching matrix based on such cross-points may require optical interconnects, as they allow, at least in principle, any desired interconnection topology while minimizing various noise and interference sources (see, for example, [22,23]). Finally, as shown in [13,14], the use of universal time leads to an optimal switch design featuring the following:

1. Input ports with optimal memory access speedup of 1 (s = 1), where speedup is defined as the ratio between the link bandwidth and the memory access bandwidth.

2. Input ports with a single queue/buffer (note that typical input buffered switches have N queues—one queue per output in order to prevent head-of-the-line blocking).

3. Optimal output port design without buffers, i.e., the IP packets are forwarded from the switching fabric directly onto the out-going link.

4. Optimal switching fabric complexity by using multistage Banyan interconnection networks [24], with switching complexity of O(a · N · log_a N), where N is the number of inputs/outputs and a is the size of each switching element or block. This aspect is further detailed in the following subsection.

5. Switch control complexity equal to the fabric complexity, O(a · N · log_a N).

6. Guaranteed quality of service with sub-lambda switching.

To summarize the above discussion, our design is guided by the clear limitations of both "all-optical" and "all-electrical" switching approaches. Our current conclusion is that a hybrid architecture is currently the "best-of-breed" switching solution in order to achieve 10–100 Tb/s switching capacity in a single chassis. Specifically, our current design approach is based on time-driven (i.e., pipeline forwarding based) electronic switching with optical interconnects and without "stopping" the serial bit stream. This may change as optical switching components evolve. It is worth noticing that, although there are many technical challenges, no physical breakthrough is required in order to realize dense electrical/optical switches with optical interconnections.

C. Switch Scalability

In order to maximize the switching fabric scalability it is necessary to minimize its complexity (which also minimizes cost and power consumption). The lowest complexity fabrics are multistage Banyan interconnection networks, with switching complexity of O(a · N · log_a N) [24]. Banyan networks are known to suffer from space blocking, which means that a connection between an available input and an available output may not be possible because there is no available route through the switch interconnection network. However, as shown in [13], with UTC-based pipeline forwarding of data packets space blocking has just the effect of reducing the achievable link utilization. Intuitively, the multiple TFs in each time cycle provide an additional degree of freedom for scheduling. Namely, having K TFs in each time cycle is equivalent to having K different possible switch instances that can be used for moving data from inputs to outputs. In addition, non-immediate forwarding of TFs can be used: a TF (as a virtual container with multiple IP packets) may be delayed one or more TFs before being switched and forwarded, which increases the chances of matching viable interconnections through subsequent switches. In summary, the combination of a plurality of TFs in each time cycle together with non-immediate forwarding enables the efficient use of a Banyan-based switching fabric with optimal switching complexity. A detailed performance study is given in [13].

Fig. 2. 10 Tb/s switching module with M21151/M21156 implemented in the current testbed (two stages of 32-by-32, 320 Gb/s cross-point modules (Mindspeed M21151/M21156, 144-by-144 at 3.2 Gb/s) with optical interconnections; each 10 Gb/s input uses 4 wires).

Figure 2 shows a possible Banyan-based switching fabric configuration based on the Mindspeed M21151/M21156 cross-point switch [25] that is used in our current prototype, as described in the following sections. The Banyan design using the M21151/M21156 has an aggregate switching capacity of 10 Tb/s. In summary, FλS enables the construction of Banyan-based fabrics, which are the most scalable switch designs, by significantly limiting the impact of their space blocking nature. Figure 2 shows a low-power design of a Banyan-based fabric with a switching capacity of 10 Tb/s. The design is based on the optical interconnection of commercially available electrical switching devices. In comparison, all-optical switches with switching times below 1 µs have much larger physical size and are more expensive. However, all-optical switches can operate at line rates of 100s of Gb/s per optical channel.
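The element and cross-point counts behind the O(a · N · log_a N) figure can be checked with a few lines of C; the port count N and element size a below are illustrative values, not the testbed's exact configuration:

/* A Banyan network with N ports built from a-by-a elements has log_a(N)
 * stages of N/a elements each, giving O(a * N * log_a N) complexity. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double N = 1024;   /* switch ports (illustrative)                       */
    double a = 32;     /* element size, e.g., one 32-by-32 block of Fig. 2  */

    double stages     = log(N) / log(a);   /* log_a N                        */
    double elements   = (N / a) * stages;  /* number of a-by-a elements      */
    double crosspoint = a * N * stages;    /* O(a * N * log_a N) complexity  */

    printf("stages: %.0f, elements: %.0f, cross-point complexity: %.0f\n",
           stages, elements, crosspoint);
    return 0;
}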

D. Multimedia System Architecture

To fully benefit from FλS and UTC-based pipeline forwarding, the network nodes and end-systems need to be provided with a CTR, and the network applications need to be implemented in such a way that they use it to maximize the quality of the received service. However, the Internet is currently based on asynchronous IP packet switches and IP hosts. Thus, especially in an initial deployment phase, UTC-based pipeline forwarding must coexist and interoperate with current asynchronous packet switches, hosts, and applications (e.g., IP video-phones, video-streaming servers and clients, etc.). This is achieved with a network architecture such as the one depicted in Fig. 3, where an intermediate synchronous switching domain is used between the asynchronous edge and the FλS core. This is realized with the time-driven priority (TDP) [18] technology, a pipeline forwarding realization offering higher flexibility (although less scalability) than FλS. In fact, TDP implements UTC-based pipeline forwarding combined with conventional IP/MPLS routing and full hop-by-hop traffic multiplexing. At FλS switches, there is no packet header processing and routing is based on the TF during which packets are sent. Consequently, TDP routers at the backbone edge are instrumental in properly "time-shaping" asynchronous IP packets by transmitting them in pre-scheduled TFs. In a way, a TDP router connecting to the FλS core acts as an FλS interface. At resource reservation time, schedules must be defined seamlessly across the TDP backbone edge and the FλS backbone core. Edge TDP routers are connected to traditional asynchronous IP routers through an SVP (synchronous virtual pipe) interface that controls access to the pipeline forwarding network by policing and shaping the incoming traffic flows, i.e., asynchronous packets are stored in a buffer waiting for their previously evaluated forwarding TF.

Fig. 3. (Color online) Architecture for deploying pipeline forwarding (asynchronous end-systems and (asynchronous) access networks connect through an SVP interface to a high-capacity access (TDP) network of TDP routers feeding the FλS backbone of FλS switches; end-systems with a time-driven application and TDP scheduler connect directly).

Clearly, the best performance is achieved when the pipeline forwarding domain extends to the end-systems, i.e., when multimedia sources are also UTC aware. However, [26] demonstrated how multimedia applications significantly benefit from pipeline forwarding even when it is deployed in a limited portion of the network, such as in Fig. 3. Furthermore, note how several "last-mile" technologies, such as cable modems and passive optical networks, are also time based. Hence, they may seamlessly interoperate with our time-based solutions, thus creating end-to-end SVPs.

E. Future (Green) Internet

High switching scalability implies less hardware per unit of switching capacity, which consequently reduces component and electricity costs. Although the reduction of power consumption for what is now called the "green Internet" [3–5] was not one of our initial goals, it is, indeed, a beautiful outcome given the current global energy and environmental crisis.

Fig. 4. (Color online) Block diagram of FλS switch: 2-stage Banyan implemented with M21151/M21156 cross-point switches (the switch controller, fed 1 PPS and 10 MHz signals by a GPS card, drives four 320 Gb/s M21151 devices of the 10 Tb/s prototype switching module through an address bus, a data bus, and a strobe signal).

Specifically, as Internet traffic increases, its power utilization grows. This growth in energy utilization must be considered seriously in future Internet design; otherwise it may constrain the growth of the network itself. Various studies have shown that current IP packet switching is not energy efficient. However, the time-based IP switching approach demonstrated by our testbed implementation reduces the electricity requirement significantly.

In future studies we will quantify more precisely this electricity reduction in both

1. switching, where the proposed architecture brings a major reduction (which can be roughly estimated as a factor of 10–20) in the number of electronic components—ultimately transistors—due to the elimination of buffers and header processing, and

2. transmission, resulting from the ability to fully utilize each optical channel with sub/fractional lambda switching (FλS), which is not the case when all-lambda switching is deployed (i.e., lambda routing).

Thanks to its switching granularity, FλS enables the extension of the cost/energy efficient time-based IP packet switching all the way to the edges of the network, specifically, in the abovementioned "last mile."

III. SWITCH PROTOTYPE ARCHITECTURE AND IMPLEMENTATION

The guidelines presented in Section II are used to develop an FλS opto-electronic switch prototype, which is part of the wider pipeline forwarding network testbed described in Section IV. This section presents the architecture of the opto-electronic switching system, describing in detail its main components.

A functional diagram of the FλS switch prototype is shown in Fig. 4. It has three major parts: (i) a field programmable gate-array (FPGA) switch controller, (ii) a switching fabric implemented by interconnecting in a Banyan topology commercially available Mindspeed M21151/M21156 cross-point switches, and (iii) a GPS time receiver. The switch fabric configuration is determined by the switch controller, which is responsive to timing signals from the GPS receiver.


Fig. 5. Block diagram of the switch control card (a Spartan-3 FPGA on the XEM3001 module hosting the control logic (CTRL), memory table and additional SRAM, scheduling FSMs, GPS interface, and Mindspeed switch controllers CTRL0/CTRL1; a USB micro-controller and PLL connect to a PC; SPI and switch status control (SSC) lines connect to the Mindspeed switch boards; 1 PPS and 10 MHz inputs come from the GPS receiver through a conditioning block. The GPS interface can be configured to simulate the GPS clocks (1 PPS and 10 MHz) derived from a 60 MHz source).

A. General Description

The GPS receiver provides 1 PPS (pulse per second) and 10 MHz signals (derived from UTC) to the FPGA switch controller. The communication between the switch controller and the cross-point switches can be parallel or serial. There are three main signal types that connect the switch controller with the cross-point switches: address, data, and strobe signals, as shown in Fig. 4. The cross-point configuration information (i.e., the input to which each output should be connected, for all 144 outputs, and for each TF within the time cycle) is stored in the memory table of the FPGA switch controller. Data and address signals are used for writing the switch configuration for the next TF onto the cross-point switches. This writing process shall end before the falling edge of the next strobe signal, which corresponds to the beginning of the next TF and determines the latching of a new switching configuration onto the cross-points. The cross-point switch configuration is ready in less than 10 ns from the falling edge of the strobe signal.

B. GPS Receiver

The GPS receiver, an EPSILON Board OEM II [27], provides accurate and stable time and frequency signals for UTC (coordinated universal time) synchronization. It provides 1 PPS and 10 MHz sine waves and UTC time-of-day output. Furthermore, the 10 MHz frequency reference is cycle locked to the 1 PPS, which marks the standard UTC second. This implies that within one 1 PPS period there are exactly 10,000,000 cycles of the 10 MHz GPS card output.

C. Mindspeed Cross-Point Switch

Mindspeed cross-points M21151/M21156 [25] are the primary components of the presented switch implementation. These are low-power CMOS, high-speed 144-by-144 cross-points. Each input can receive a serial signal from 0 to 3.2 Gb/s; thus, the nominal aggregate capacity is 460 Gb/s.

The serial input signal goes through a sequence of internal inverters inside the M21151/M21156 and, therefore, does not suffer any power loss. The cross-point switch ports are equipped with input equalization (IE) and output integrated clock data recovery (CDR), which further preserve the serial signal quality. Each CDR is preceded by a programmable IE.

D. Switch Controller Card

The switch controller card is the "brain" of this scalable switching system. The controller is implemented using an Opal Kelly XEM3001 integration module [28]. The XEM3001 consists of an EEPROM, a USB 2.0 micro-controller, a phase locked loop, a 400,000-gate Xilinx Spartan-3 FPGA submodule [28], and a 1 MHz to 150 MHz multi-output clock generator. The block diagram of the implemented FPGA controller is shown in Fig. 5.

The very high speed integrated circuit hardware description language (VHDL) is used for the implementation of the FPGA controller. As shown in Fig. 5, the FPGA submodule blocks are the control logic (CTRL), memory table, GPS interface, and Mindspeed cross-point controller. The implementations of all functional blocks in VHDL are synthesized into a single bit file, which is uploaded from a PC to the FPGA through the USB 2.0 connection. In the prototype presented here, we have used one controller for controlling two M21151 cross-point switches. There are two scheduling finite state machines (FSMs), one for each of the two cross-points.

The Mindspeed switch control logic (CTRL0 and CTRL1 in Fig. 5) contains the FSM that controls and connects all the sub-blocks implemented on the FPGA module. Every clock cycle, it reads the input status register (target_status) connected to a USB wire and acts accordingly. Each scheduling FSM consists of three main counters: a TF counter, a time cycle (TC) counter, and a UTC second counter. Furthermore, each scheduling FSM receives two reference input signals from the GPS receiver: (1) 1 pulse per UTC second (1 PPS) and (2) 10 MHz. Every UTC second (1 PPS) is divided into a programmable number of TCs n, and every TC is divided into a programmable number of TFs m, each with a predefined duration. Given a TF duration T, it must hold that T · n · m = 1 UTC second, as shown in Fig. 1. The values for n, m, and T can be set by changing the corresponding parameters stored on the FPGA (CTRL0 and CTRL1) from a PC via the USB link.
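As an illustration of the three counters and the T · n · m = 1 s constraint, the following minimal C model mimics a scheduling FSM driven by the 10 MHz reference; it is a behavioral sketch, not the controller's VHDL:

/* Behavioral sketch of the scheduling FSM counters (TF, TC, UTC second)
 * clocked by the 10 MHz GPS reference; parameters follow Fig. 1. */
#include <stdio.h>

#define CLK_HZ 10000000L   /* 10 MHz reference from the GPS receiver */

int main(void)
{
    long T_us = 125, m = 100, n = 80;              /* T * m * n = 1 UTC second */
    long ticks_per_tf = CLK_HZ / 1000000L * T_us;  /* reference ticks per TF   */
    long tf = 0, tc = 0, sec = 0;

    for (long tick = 1; tick <= 2 * CLK_HZ; tick++)   /* simulate two seconds  */
        if (tick % ticks_per_tf == 0) {
            if (++tf == m) { tf = 0; tc++; }          /* TF counter wraps to TC   */
            if (tc == n)   { tc = 0; sec++; }         /* TC counter wraps (1 PPS) */
        }

    printf("after 2 s: sec=%ld tc=%ld tf=%ld\n", sec, tc, tf);
    return 0;
}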

During every TF, the FSM downloads the next configuration into the cross-point switch. The next configuration is activated by the switch strobe signal, which is sent via the SPI interface. The cross-point switch configuration data are stored in a memory table (see Fig. 5) with a table entry for each TF within the TC. When a new TC starts, the pointer into the memory table is reset to the first memory entry, which corresponds to the first TF in the TC. This is the cyclic operation of CTRL0 and CTRL1. The memory table can be configured from a graphical user interface (GUI) on a PC through the USB communication link.
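The cyclic memory-table operation can be sketched as follows; the entry layout (one input index per output per TF) matches the description above, while the names and types are our assumptions:

/* Sketch of the cyclic memory table: one configuration entry per TF
 * within the TC; the pointer wraps at the start of each new TC. */
#include <stdio.h>

#define M_TFS   100   /* TFs per time cycle                   */
#define N_PORTS 144   /* cross-point outputs (M21151/M21156)  */

typedef struct {
    unsigned char in_for_out[N_PORTS];  /* input feeding each output in this TF */
} tf_config;

static tf_config table[M_TFS];          /* written from the PC over USB */

/* Called once per TF: fetch the next configuration to download. */
const tf_config *next_config(long tf_counter)
{
    return &table[tf_counter % M_TFS];  /* pointer wraps with the TC */
}

int main(void)
{
    for (long tf = 98; tf < 103; tf++)
        printf("TF %ld -> table entry %ld\n", tf, tf % M_TFS);
    return 0;
}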


Fig. 6. (Color online) Initial testbed set-up without "stopping" the bits (a streaming media source feeds a pipeline forwarding router, which sends packets to the first TDS switch, built from two Mindspeed 21151 cross-points; a 25 km optical fiber connects it to the second TDS switch, which forwards the streams to streaming media receivers 01 and 02 over links of arbitrary distance; all nodes are UTC synchronized via GPS/GALILEO; O-E: optical-to-electrical (analog), E-O: electrical-to-optical (analog); measurement locations 1–4 are marked).

IV. TESTBED OVERVIEW AND EXPERIMENTS

A pipeline forwarding testbed scaling to multi-terabit/s switching capacity has been developed deploying the previously described FλS switch. This testbed (see Fig. 6) implements a prototypal video distribution network providing guaranteed quality of service with deterministic queuing delay and small jitter in network nodes. The multimedia system architecture presented in Subsection II.D is considered. In particular, two asynchronous video streams are generated by a video server (to the left), transported with deterministic quality of service through a network composed of one TDP router and two multi-terabit/s FλS switches, and delivered to two different video clients. IP packets carrying encoded media are transported unchanged, as a whole, end-to-end. Namely, no change can be seen by observing packets flowing on any link of the testbed, as only conventional IP packets encapsulated into Ethernet frames travel across the network testbed.

A. Pipeline Forwarding (TDP) Router

Traffic leaving the video server is "asynchronous," thus it has to be "time-shaped" before entering the pipeline forwarding network. In essence, packets have to be sent to the first FλS switch during the right TFs. This is performed by the pipeline forwarding router, which in our testbed represents both the TDP network edge and the SVP interface (see Subsection II.D). The implementation of the pipeline forwarding router and the operations it has to perform are briefly described below. A more detailed presentation can be found in [29], and for a description of how scheduling can be done see [30].

The developed pipeline forwarding router is based on the routing software of the FreeBSD 4.8 operating system running on a 2.4 GHz Pentium IV PC; the TDP scheduling algorithm is implemented in the FreeBSD kernel. Generically, in a router data plane, packets are moved from input ports to output ports going through three modules that perform input processing, forwarding, and output processing. The same structure was realized in our FreeBSD-based pipeline forwarding router.

The router has to shape asynchronous traffic entering the pipeline forwarding network. Consequently, its input module comprises mechanisms to classify incoming packets, identify the data flow they belong to, and select a TF in which they will be forwarded according to the current resource reservation set-up.

The forwarding module processes packets according to the specific network technology; specifically, in the presented prototype packet routing and forwarding is based on IP. No modification is required to the forwarding module.

The output module implements a per-TF, per-output queuing system, where packets to be forwarded during the same TF through the same interface are buffered in the same queue. The queue in which each packet is stored is determined by both the input module, which decides the forwarding TF, and the forwarding module, which selects the output interface. The output module is also responsible for the timely transmission of all the packets stored in the queues corresponding to the current TF.
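The per-TF, per-output queuing structure can be sketched in C as follows; the queue depth, interface count, and function names are our illustration, not the actual FreeBSD kernel code:

/* Sketch of the TDP router output module: one queue per (interface, TF)
 * pair; the input module picks the TF, the forwarding module the interface. */
#include <stdio.h>

#define N_IF   4      /* output interfaces (illustrative) */
#define M_TFS  100    /* TFs per time cycle               */
#define QDEPTH 64     /* packets per queue (illustrative) */

struct pkt { int len; };

static struct queue {
    struct pkt *slots[QDEPTH];
    int head, tail;
} q[N_IF][M_TFS];

/* Input + forwarding modules decided (ifc, tf); buffer until that TF. */
int enqueue(int ifc, int tf, struct pkt *p)
{
    struct queue *qq = &q[ifc][tf];
    int next = (qq->tail + 1) % QDEPTH;
    if (next == qq->head) return -1;   /* over reservation: shape or drop */
    qq->slots[qq->tail] = p;
    qq->tail = next;
    return 0;
}

/* Called at the start of TF `tf`: drain every queue scheduled for it. */
void transmit_tf(int tf)
{
    for (int ifc = 0; ifc < N_IF; ifc++) {
        struct queue *qq = &q[ifc][tf];
        while (qq->head != qq->tail) {
            /* ... transmit qq->slots[qq->head] on interface ifc ... */
            qq->head = (qq->head + 1) % QDEPTH;
        }
    }
}

int main(void)
{
    static struct pkt p = { 1500 };
    enqueue(1, 42, &p);
    transmit_tf(42);
    puts("drained TF 42");
    return 0;
}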

The UTC is provided to the router using a Symmetricom bc637PCI-U GPS receiver PCI card [31] that can generate interrupts at a programmable rate ranging between less than 1 Hz and 250 kHz (i.e., periods between 1 s, the 1 PPS, and 4 µs).

B. Initial Testbed Set-Up

The initial switch prototype experimental set-up is shown in Fig. 6. The main components are streaming media sources, a pipeline forwarding router for scheduling packet entrance into the first switch, a 25 km single mode optical fiber, a two-stage source side switch, and a single-stage receiver side switch. The deployed cross-point switches have 144 channels with a capacity of 3.2 Gb/s each.

Two asynchronous streaming media flows—a DVD movie with soundtrack and subtitles, and an animation movie with soundtrack—are generated by the streaming media source, as shown in Fig. 6, and transmitted to the pipeline forwarding router. The streaming media packets are then forwarded via an optical link by the pipeline forwarding router through Gigabit Ethernet (GE) transceivers to the first FλS switch during different predefined TFs. The first cross-point switch splits the packets into two streams forwarded to the second cross-point on separate channels. At the second cross-point both streams are multiplexed again into one channel that is transmitted via the 25 km single mode optical fiber link to the second switch, which routes each video stream to a different output. Then the separated video streams are forwarded to two receivers through optical links of an arbitrary length. Video flows are therefore multiplexed on the first and second links they traverse, but FλS ensures that the video packets reach their corresponding destination with deterministic delay (with no jitter), i.e., during predefined TFs. Switching of all three switching boards and network interfaces is synchronized with the 1 PPS signal received from three different GPS receivers.

Fig. 7. (Color online) Eye patterns at (a) location 1 and (b) location 4 in Fig. 6.

Fig. 8. (Color online) 100 km, 6-node metropolitan network (actually realized using only two FλS switches); nodes 1–6 are connected by four 25 km fiber segments, with measurement locations (i), (ii), and (iii) marked.

Data integrity has been validated at the output of each switch (i.e., locations 1 and 4 highlighted in Fig. 6) by matching the eye pattern with a standard Gigabit Ethernet 1000BASE-SX/LX test mask [32]. The results are shown in Fig. 7 for locations 1 and 4 as snapshots of the screen of an oscilloscope. The signal received at the measurement point is sampled by the oscilloscope, and its value is plotted on the screen, one yellow dot for each sample. The rectangles and hexagon drawn on the screen reproduce the transmit eye mask defined in Section 38.6.5 of [32]; the signal is conformant, and hence can be correctly received by a compliant receiver, as long as its plotted measures do not touch these geometrical shapes. It is noteworthy that the measurement at point 4 shown in Fig. 7(b) was performed after 25 km of single mode fiber. In addition, each data link, including switch boards and optical transceivers (GBIC), has been tested for bit error rates (BERs) from 0.1 to 3.0 Gb/s, in order to allow for high safety margins. Having observed no errors over several test runs involving 1 Tb of data sent across the switch, the BER is lower than 10^−12, and hence negligible.
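The quoted bound corresponds to the point estimate 1/n for n = 10^12 observed bits with zero errors; a standard confidence-bound version (the "rule of three," assuming independent bit errors, our addition rather than the paper's analysis) gives a slightly looser figure:

/* Zero-error BER bound: with no errors over n bits, BER < -ln(1-c)/n
 * at confidence c (about 3/n for c = 95%). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double n = 1e12;   /* ~1 Tb observed with zero errors */
    double c = 0.95;   /* confidence level                */
    printf("BER < %.1e at %.0f%% confidence\n", -log(1.0 - c) / n, c * 100);
    return 0;
}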

Fig. 9. (Color online) Eye patterns at locations (i), (ii), and (iii) in Fig. 8.

TABLE I
END-TO-END DELAY JITTER

Traffic flow    Jitter [ms]
Media flow 1    0.14
Media flow 2    0.16

C. Advanced Testbed Testing

To evaluate the robustness of the switch prototype and to know its limits, we have extended our testing to the 100 km metro-like network shown in Fig. 8. It consists of six nodes, each including either one or two cross-point switches. Five of the six nodes are connected optically through four segments of 25 km optical fiber.

The eye pattern test was repeated, with measurements taken at locations (i), (ii), and (iii), as shown in Fig. 8. Specifically, Fig. 9 shows the eye pattern measurements after 25 km (location (i)), after 50 km (location (ii)), and after 100 km (location (iii)). It is apparent from the eye patterns that there is a reduction of the safety margin as the optical fiber length increases. However, the signal is compliant with the 1000BASE-SX/LX standard even after having traveled 100 km.

The maximum jitter is also measured for each flow in this 100 km testbed. This is computed as the difference between the maximum and the minimum end-to-end delay observed during the whole experiment on packets belonging to the same flow. Delay measurements are performed by time-stamping data packets at the kernel level of the PCs which host the media server and the receivers, respectively. The features of the Symmetricom GPS card are exploited in order to obtain very precise time stamps. The results obtained are shown in Table I.

The delay jitter is low and does not exceed the pipeline forwarding theoretical upper bound of 2T (T = 100 µs in our testbed, so 2T = 0.2 ms, consistent with the measured 0.14 ms and 0.16 ms). Furthermore, it is worth noting that in our experiments, with flows compliant with their reservation, no packet loss was experienced. This further validates our implementation.


Fig. 10. (Color online) Testbed set-up for testing the interoperability of all-optical and electrical FλS switches (the TDS all-optical switch, with its FPGA and GPS receiver integrated, is inserted between the two electronic TDS switches via two 25 km fiber spans; a streaming media source and pipeline forwarding router feed the network, and streaming media receivers are connected over arbitrary distances; all nodes receive UTC/1 PPS from GPS/GALILEO; O-E: optical-to-electrical (analog), E-O: electrical-to-optical (analog)).

Fig. 11. (Color online) Implementation of an all-optical switch including a diagnostic and testing subsystem (two inputs and two 0 dBm outputs routed through SOAs interconnected by 50/50 and 90/10 passive couplers; monitor photodiodes PD1–PD4 and a LABview environment on a PC provide diagnostics; a power amplifier with individual gain control actuates the SOA switches; an FPGA with a UBLOX-T GPS receiver provides timing; FC and FC-APC connectors are used throughout; SOA output is +3 dBm, and the typical output of a 40 km transceiver is −4 to +1 dBm).


D. Advanced Testbed With an All-Optical Switch

Recently an all-optical FλS switch was added to the original testbed (Fig. 6), as shown in Figs. 10 and 11. The objective of the all-optical switch is to provide a proof-of-concept for such a hybrid implementation. This has been successfully achieved by using semiconductor optical amplifiers (SOAs) operating as ON–OFF optical switches. The all-optical switch provides the following primary advantages:

• elimination of the need for OE (optical-to-electronic) and EO (electronic-to-optical) conversions,

• serial optical transmission up to hundreds of Gb/s, and

• interoperability with similar electronic FλS switches that operate "without stopping the serial bit stream," as in the all-optical domain.

In the testbed (Fig. 10) the all-optical switch is connected by two 25 km fiber links to the two original FλS electronic switches.

The all-optical switch, shown in Fig. 11, is electronically driven and controlled, and can change its state in less than 1 ns. In other words, the all-optical switches require an electronic control system, but the optical data carrier is routed without converting it into electrical signals. The all-optical routing enables the switch to be transparent to optical wavelength and data rate. The current initial switch implementation operates at 1.25 Gb/s Ethernet (GE) at 1550 nm and can also operate at 10 GE (and beyond) at 1550 nm without any change to the current all-optical switch implementation.

The switch consists of ON/OFF SOAs interconnected in a network of (3 dB) passive optical couplers implementing a "broadcast-and-select" design. The control mechanism is the same as the one developed for the electronic FλS switch. However, the all-optical switch controller and GPS time receiver are integrated and implemented in a single FPGA in a novel design. This implementation saves a considerable amount of money. The all-optical FλS switch includes an extensive diagnostic and testing subsystem, as shown in Fig. 11, which is based on LABview.

The current SOA electronic driver operates with a power supply of 5 V and 0.2 A, which requires 1 W. However, it is possible to use 1.3 V, which requires only 0.26 W. Since each optical channel requires two SOAs, a 100 Gb/s channel will use 0.26 · 2/100 = 0.0052 W per 1 Gb/s, which is lower than in an electronic cross-connect.
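The power figure can be reproduced directly from the stated numbers:

/* Worked version of the SOA power figures above: two SOAs per optical
 * channel, 0.26 W per SOA at 1.3 V, amortized over a 100 Gb/s channel. */
#include <stdio.h>

int main(void)
{
    double w_per_soa   = 1.3 * 0.2;   /* 1.3 V * 0.2 A = 0.26 W  */
    double soas        = 2.0;         /* SOAs per optical channel */
    double channel_gbs = 100.0;       /* channel rate in Gb/s     */

    printf("%.4f W per Gb/s\n", w_per_soa * soas / channel_gbs);
    return 0;
}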

V. CONCLUSIONS

The implementation of UTC-based pipeline forwarding in a real optical testbed that is scalable to multi-terabit/s switching capacity has been a rewarding experience. The implementation success is a direct outcome of the simplicity of the pipeline forwarding method. The beauty is that the simplicity of this realization does not compromise three highly desired properties for the future Internet:

(1) high switch scalability (e.g., to 10 and 160 Tb/s in a single chassis);

(2) predictable quality of service (QoS) performance for streaming media and live (sport and entertainment) events, with low end-to-end delay for interactive multimedia applications; and

(3) low energy (electricity) consumption for a "green," environmentally friendly Internet.

This is significant, as it demonstrates that the deployed technology is suitable for providing (1) ultra-scalable switching capacity and (2) sub-lambda reconfigurable optical add–drop multiplexing (ROADM) in metropolitan networks while minimizing the provider cost and complexity per end-user.

ACKNOWLEDGMENT

This work was supported in part by funds from the European Commission (contract No. MC-EXC 002807).

REFERENCES

[1] Cisco Systems, Inc., "Hyperconnectivity and the approaching zettabyte era," June 2, 2010 [Online]. Available: http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.html.

[2] I. Vaishnavi, P. Cesar, D. Bulterman, and O. Friedrich, "From IPTV services to shared experiences: challenges in architecture design," in IEEE Int. Conf. on Multimedia and Expo (ICME 2010), Singapore, July 2010, pp. 1511–1516.

[3] M. Gupta and S. Singh, "Greening of the Internet," in ACM SIGCOMM 2003, Karlsruhe, Germany, Aug. 2003.

[4] K. Christensen, B. Nordman, and R. Brown, "Power management in networked devices," IEEE Comput. Mag., vol. 37, no. 8, pp. 91–93, Aug. 2004.

[5] M. Guenach, C. Nuzman, J. Maes, and M. Peeters, "On power optimization in DSL systems," in IEEE Int. Workshop on Green Communications (GreenComm) at ICC 2009, Dresden, Germany, June 2009.

[6] R. Ramaswami and K. N. Sivarajan, Optical Networks: A Practical Perspective, 2nd ed. Morgan Kaufmann Publishers, 2001, ch. 8–11.

[7] A. Pattavina and R. Zanzottera, "Non-blocking WDM switches based on arrayed waveguide grating and shared wavelength conversion," in IEEE Conf. on Computer Communications (INFOCOM 2006), Barcelona, Spain, Apr. 2006.

[8] C. Qiao and M. Yoo, "Optical Burst Switching (OBS)—a new paradigm for an optical internet," J. High Speed Netw., vol. 8, no. 1, pp. 69–84, Jan. 1999.

[9] H. C. Cankaya, S. Charcranoon, and T. S. El-Bawab, "A preemptive scheduling technique for OBS networks with service differentiation," in IEEE Int. Conf. on Global Communications (GLOBECOM 2003), San Francisco, CA, Dec. 2003.

[10] T. Tachibana and S. Kasahara, "Burst cluster transmission: service differentiation mechanism for immediate reservation in Optical Burst Switching networks," IEEE Commun. Mag., vol. 44, no. 5, pp. 46–55, May 2006.

[11] W.-S. Park, M. Shin, H.-W. Lee, and S. Chong, "A joint design of congestion control and burst contention resolution for optical burst switching networks," J. Lightwave Technol., vol. 27, no. 17, pp. 3820–3830, Sept. 2009.

[12] T. Orawiwattanakul, Y. Ji, and N. Sonehara, "Fair bandwidth allocation with distance fairness provisioning in optical burst switching networks," in IEEE Int. Conf. on Global Communications (GLOBECOM 2010), Dec. 2010, pp. 1–5.


[13] M. Baldi and Y. Ofek, "Fractional lambda switching—principles of operation and performance issues," SIMULATION: Trans. Soc. Model. Simul. Int., vol. 80, no. 10, pp. 527–544, Oct. 2004.

[14] V. T. Nguyen, R. Lo Cigno, and Y. Ofek, "Tunable laser-based design and analysis for fractional lambda switches," IEEE Trans. Commun., vol. 56, no. 6, pp. 957–967, June 2008.

[15] J. D. Angelopoulos, K. Kanonakis, G. Koukouvakis, H. C. Leligou, C. Matrakidis, T. Orphanoudakis, and A. Stavdas, "An optical network architecture with distributed switching inside node clusters features improved loss, efficiency and cost," J. Lightwave Technol., vol. 25, no. 5, pp. 1138–1146, May 2007.

[16] A. Stavdas, T. G. Orphanoudakis, A. Lord, H. C. Leligou, K. Kanonakis, C. Matrakidis, A. Drakos, and J. D. Angelopoulos, "Dynamic CANON: a scalable multi-domain core network," IEEE Commun. Mag., vol. 46, no. 6, pp. 138–144, June 2008.

[17] J. Ramamirtham and J. Turner, "Time sliced optical burst switching," in IEEE Int. Conf. on Computer and Communications (INFOCOM 2003), Apr. 2003, pp. 2030–2038.

[18] C.-S. Li, Y. Ofek, and M. Yung, "Time-driven priority flow control for real-time heterogeneous internetworking," in IEEE Int. Conf. on Computer Communications (INFOCOM 1996), San Francisco, CA, Mar. 1996.

[19] C.-S. Li, Y. Ofek, A. Segall, and K. Sohraby, "Pseudo-isochronous cell forwarding," Comput. Netw. ISDN Syst., vol. 30, no. 24, pp. 2359–2372, Dec. 1998.

[20] Y. Ofek, "Generating a fault tolerant global clock using high-speed control signals for the MetaNet architecture," IEEE Trans. Commun., vol. 42, no. 5, pp. 2179–2188, May 1994.

[21] IEEE Instrumentation and Measurement Society, "IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems," IEEE Std. 1588 (Revision), July 2008.

[22] W. Gibbs, "Computing at the speed of light," Sci. Am., vol. 291, no. 5, pp. 80–87, Nov. 2004.

[23] L. Pavesi and D. J. Lockwood, Eds., Silicon Photonics. Springer-Verlag, 2004.

[24] L. R. Goke and G. J. Lipovski, "Banyan networks for partitioning multiprocessor systems," in ACM Annu. Symp. on Computer Architecture (ISCA 1973), New York, NY, Dec. 1973.

[25] Mindspeed Technologies, Inc. [Online]. Available: http://www.mindspeed.com.

[26] E. Masala, A. Vesco, M. Baldi, and J. C. De Martin, "Optimized H.264 video encoding and packetization for video transmission over pipeline forwarding networks," IEEE Trans. Multimedia, vol. 11, no. 5, pp. 972–985, Aug. 2009.

[27] Tekelec Systemes [Online]. Available: http://www.tekelec-systemes.com.

[28] Opal Kelly, Inc. [Online]. Available: http://www.opalkelly.com.

[29] M. Baldi, G. Marchetto, G. Galante, F. Risso, R. Scopigno, and F. Stirano, "Time driven priority router implementation and first experiments," in IEEE Int. Conf. on Communications (ICC 2006), Istanbul, Turkey, June 2006.

[30] T.-H. Truong, M. Baldi, and Y. Ofek, "Efficient scheduling for heterogeneous fractional lambda switching (FλS) networks," in IEEE Int. Conf. on Global Communications (GLOBECOM 2007), Washington, DC, Nov. 2007.

[31] Symmetricom, Inc. [Online]. Available: http://www.symmttm.com.

[32] IEEE 802.3 Working Group, "Part 3: carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications," IEEE Std 802.3, 2000 Edition, 2000.