Learn on the fly: Data-driven link estimation and routing in sensor network backbones

17
Learn on the Fly: Data-driven Link Estimation and Routing in Sensor Network Backbones Hongwei Zhang, Anish Arora, Prasun Sinha Technical Report: WSU-CS-DNC-07-TR1 Wayne State University, USA Abstract In the context of IEEE 802.11b network testbeds, we examine the differences between unicast and broadcast link properties, and we show the inherent difficulties in precisely estimating unicast link properties via those of broadcast beacons even if we make the length and transmission rate of beacons be the same as those of data packets. To circumvent the difficulties in link estimation, we propose to estimate unicast link properties directly via data traffic itself without using periodic beacons. To this end, we de- sign a data-driven routing protocol Learn on the Fly (LOF). LOF estimates link quality based on data traffic, and it chooses routes by way of a locally measurable metric ELD, the expected MAC latency per unit-distance to the destination. Using a realistic sen- sor network traffic trace and an 802.11b testbed of 195 Stargates, we experimentally compare the performance of LOF with that of existing protocols, represented by the geography-unaware ETX and the geography-based PRD. We find that LOF reduces end- to-end MAC latency by a factor of 3, enhances energy efficiency by a factor up to 2.37, improves route stability by 2 orders of magnitude, and improves network throughput by a factor up to 7.78. The results demonstrate the feasibility as well as potential benefits of data-driven link estimation and routing. Keywords—sensor network, beacon-free geographic routing, data-driven link quality estimation, MAC latency, IEEE 802.11b, real time, energy, reli- ability 1 Introduction Wireless sensor networks are envisioned to be of large scale, comprising thousands to millions of nodes. To guarantee real- time and reliable end-to-end packet delivery in such networks, they usually require a high-bandwidth network backbone to pro- cess and relay data generated by the low-end sensor nodes such as motes [3]. This architecture has been demonstrated in the sen- sor network field experiment ExScal [7], where 203 Stargates and 985 XSM motes were deployed in an area of 1260 meters by 288 meters. Each Stargate is equipped with a 802.11b radio, and the 203 Stargates form the backbone network of ExScal to support reliable and real-time communication among the motes for target detection, classification, and tracking. Similar 802.11 based sensor networks (or network backbones) have also been explored in other projects such as MASE [1] and CodeBlue [2]. H. Zhang is with the Department of Computer Science, Wayne State University, Detroit, Michigan 48202, U.S.A. E-mail: [email protected]. A. Arora and P. Sinha are with the Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, U.S.A. E-mail: {anish, prasun}@cse.ohio- state.edu. In this paper, we study how to perform routing in such 802.11 based wireless sensor network backbones. As the quality of wireless links, for instance, packet delivery rate, varies both temporally and spatially in a complex manner [8, 25, 35], estimating link quality is an important aspect of rout- ing in wireless networks. To this end, peers exchange broadcast beacons periodically in existing routing protocols [13, 14, 15, 30, 33], and the measured quality of broadcast acts as the basis of link estimation. Nonetheless, beacon-based link estimation has several drawbacks: Firstly, link quality for broadcast beacons differs signifi- cantly from that for unicast data, because broadcast bea- cons and unicast data differ in packet size, transmission rate, and coordination method at the media-access-control (MAC) layer [12, 28]. Therefore, we have to estimate uni- cast link quality based on that of broadcast. It is, however, difficult to precisely estimate unicast link quality via that of broadcast, because temporal correlations of link quality assume complex patterns [32] and are hard to model. As a result, existing routing protocols do not consider temporal link properties in beacon-based estima- tion [13, 33]. Thus the link quality estimated using peri- odic beacon exchange may not accurately apply for unicast data, which can negatively impact the performance of rout- ing protocols. Even if we could precisely estimate unicast link quality based on that of broadcast, beacon-based link estimation may not reflect in-situ network condition either. For in- stance, a typical application of wireless sensor networks is to monitor an environment (be it an agricultural field or a classified area) for events of interest to the users. Usually, the events are rare. Yet when an event occurs, a large burst of data packets is often generated that needs to be routed re- liably and in real-time to a base station [34]. In this context, even if there were no discrepancy between the actual and the estimated link quality using periodic beacon exchange, the estimates still tend to reflect link quality in the absence, rather than in the presence, of bursty data traffic. This is because: Firstly, link quality changes significantly when traffic pattern changes (as we will show in Section 2.2.2); Secondly, link quality estimation takes time to converge, yet different bursts of data traffic are well separated in time, and each burst lasts only for a short period. Beacon-based link estimation is not only limited in reflect- ing the actual network condition, it is also inefficient in energy usage. In existing routing protocols that use link quality estima- tion, beacons are exchanged periodically. Therefore, energy is 1

Transcript of Learn on the fly: Data-driven link estimation and routing in sensor network backbones

Learn on the Fly: Data-driven Link Estimation andRouting in Sensor Network Backbones

Hongwei Zhang, Anish Arora, Prasun SinhaTechnical Report: WSU-CS-DNC-07-TR1

Wayne State University, USA

Abstract

In the context of IEEE 802.11b network testbeds, we examine thedifferences between unicast and broadcast link properties, andwe show the inherent difficulties in precisely estimating unicastlink properties via those of broadcast beacons even if we makethe length and transmission rate of beacons be the same as thoseof data packets. To circumvent the difficulties in link estimation,we propose to estimate unicast link properties directly viadatatraffic itself without using periodic beacons. To this end, we de-sign a data-driven routing protocolLearn on the Fly(LOF). LOFestimates link quality based on data traffic, and it chooses routesby way of a locally measurable metric ELD, theexpected MAClatency per unit-distance to the destination. Using a realistic sen-sor network traffic trace and an 802.11b testbed of 195 Stargates,we experimentally compare the performance of LOF with that ofexisting protocols, represented by the geography-unawareETXand the geography-based PRD. We find that LOF reduces end-to-end MAC latency by a factor of 3, enhances energy efficiencyby a factor up to 2.37, improves route stability by 2 orders ofmagnitude, and improves network throughput by a factor up to7.78. The results demonstrate the feasibility as well as potentialbenefits of data-driven link estimation and routing.Keywords—sensor network, beacon-free geographic routing, data-drivenlink quality estimation, MAC latency, IEEE 802.11b, real time, energy, reli-ability

1 Introduction

Wireless sensor networks are envisioned to be of large scale,comprising thousands to millions of nodes. To guarantee real-time and reliable end-to-end packet delivery in such networks,they usually require a high-bandwidth network backbone to pro-cess and relay data generated by the low-end sensor nodes suchas motes [3]. This architecture has been demonstrated in thesen-sor network field experiment ExScal [7], where 203 Stargatesand 985 XSM motes were deployed in an area of 1260 metersby 288 meters. Each Stargate is equipped with a 802.11b radio,and the 203 Stargates form the backbone network of ExScal tosupport reliable and real-time communication among the motesfor target detection, classification, and tracking. Similar 802.11based sensor networks (or network backbones) have also beenexplored in other projects such as MASE [1] and CodeBlue [2].

H. Zhang is with the Department of Computer Science, Wayne State University, Detroit,Michigan 48202, U.S.A. E-mail: [email protected].

A. Arora and P. Sinha are with the Department of Computer Science and Engineering, TheOhio State University, Columbus, OH 43210, U.S.A. E-mail:{anish, prasun}@cse.ohio-state.edu.

In this paper, we study how to perform routing in such 802.11based wireless sensor network backbones.

As the quality of wireless links, for instance, packet deliveryrate, varies both temporally and spatially in a complex manner[8, 25, 35], estimating link quality is an important aspect of rout-ing in wireless networks. To this end, peers exchange broadcastbeacons periodically in existing routing protocols [13, 14, 15,30, 33], and the measured quality of broadcast acts as the basisof link estimation. Nonetheless, beacon-based link estimationhas several drawbacks:• Firstly, link quality for broadcast beacons differs signifi-

cantly from that for unicast data, because broadcast bea-cons and unicast data differ in packet size, transmissionrate, and coordination method at the media-access-control(MAC) layer [12, 28]. Therefore, we have to estimate uni-cast link quality based on that of broadcast.• It is, however, difficult to precisely estimate unicast link

quality via that of broadcast, because temporal correlationsof link quality assume complex patterns [32] and are hardto model. As a result, existing routing protocols do notconsider temporal link properties in beacon-based estima-tion [13, 33]. Thus the link quality estimated using peri-odic beacon exchange may not accurately apply for unicastdata, which can negatively impact the performance of rout-ing protocols.• Even if we could precisely estimate unicast link quality

based on that of broadcast, beacon-based link estimationmay not reflect in-situ network condition either. For in-stance, a typical application of wireless sensor networks isto monitor an environment (be it an agricultural field or aclassified area) for events of interest to the users. Usually,the events are rare. Yet when an event occurs, a large burstof data packets is often generated that needs to be routed re-liably and in real-time to a base station [34]. In this context,even if there were no discrepancy between the actual andthe estimated link quality using periodic beacon exchange,the estimates still tend to reflect link quality in the absence,rather than in the presence, of bursty data traffic. This isbecause: Firstly, link quality changes significantly whentraffic pattern changes (as we will show in Section 2.2.2);Secondly, link quality estimation takes time to converge,yet different bursts of data traffic are well separated in time,and each burst lasts only for a short period.

Beacon-based link estimation is not only limited in reflect-ing the actual network condition, it is also inefficient in energyusage. In existing routing protocols that use link quality estima-tion, beacons are exchanged periodically. Therefore, energy is

1

consumed unnecessarily for the periodic beaconing when thereis no data traffic. This is especially true if the events of interestare infrequent enough that there is no data traffic in the networkmost of the time [34].

To deal with the shortcomings of beacon-based link quality es-timation and to avoid unnecessary beaconing, new mechanismsfor link estimation and routing are desired.

Contributions of the paper. Using outdoor and indoor testbedsof 802.11b networks, we study the impact of environment, packettype, packet size, and interference pattern on the quality of wire-less links. The results show that it is difficult (if even possible)to precisely estimate unicast link quality using broadcastbeaconseven if we make the length and transmission rate of beacons bethe same as those of data packets. Fortunately, we find that geog-raphy and the DATA-ACK handshake (available in the 802.11bMAC) make it possible to perform routing without using periodicbeacons, in terms of information diffusion and data-drivenlinkquality estimation respectively. To demonstrate the technique ofdata-driven link estimation and routing, we define a routingmet-ric ELD, theexpected MAC latency per unit-distance to the des-tination, which can be implemented in our 802.11 networks andworks well in both our indoor testbeds and the large scale fieldexperiment ExScal [7]. (Note: in principle, we could have usedmetrics such as ETX [13] or RNP [11] in data-driven routing, butthis is not feasible with our 802.11 radios.)

To implement data-driven routing, we modify the Linux kerneland the WLAN driverhostap[5] to exfiltrate the MAC latencyfor each packet transmission, which is not available in existingsystems. The exfiltration of MAC latency is reliable in the sensethat it deals with the loss of MAC feedback at places such asnetlinksockets and IP transmission control.

Building upon the capability of reliably fetching MAC latencyfor each packet transmission, we design a routing protocolLearnon the Fly (LOF) which implements the ELD metric withoutusing periodic beacons. In LOF, control packets are used onlyrarely, for instance, during the node boot-up. Upon bootingup, anode initializes its routing engine by taking a few (e.g., 6)sam-ples on the MAC latency to each of its neighbors; then the nodeadapts its routing decision solely based on the MAC feedbackfor data transmission, without using any control packet. Todealwith temporal variations in link quality and possible imperfec-tion in initializing its routing engine, the node probabilisticallyexplores alternative neighbors at controlled frequencies.

Using an event traffic trace from the field sensor network ofExScal [7], we experimentally evaluate the design and the per-formance of LOF in a testbed of 195 Stargates [3] with 802.11bradios. We also compare the performance of LOF with that ofexisting protocols, represented by the geography-unawareETX[13, 33] and the geography-based PRD [30]. We find that LOFreduces end-to-end MAC latency, reduces energy consumptionin packet delivery, and improves route stability. Besides burstyevent traffic, we evaluate LOF in the case of periodic traffic,andwe find that LOF outperforms existing protocols in that case too.We also find that LOF significantly improves network through-put. The results corroborate the feasibility as well as the benefitsof data-driven link estimation and routing.

Organization of the paper. In Section 2, we study the short-

comings of beacon-based link quality estimation, and we analyzethe feasibility of data-driven routing. Following that, wepresentthe routing metric ELD in Section 3, and we design the protocolLOF in Section 4. We experimentally evaluate LOF in Section 5,and we discuss the related work in Section 6. We make conclud-ing remarks in Section 7.

2 Why data-driven routing?

In this section, we first experimentally study the impact of packettype, packet length, and interference on link properties1. Thenwe discuss the shortcomings of beacon-based link property esti-mation, as well as the concept of data-driven link estimation androuting.

2.1 Experiment design

We set up two 802.11b network testbeds as follows.

Outdoor testbed. In an open field (see Figure 1), we deploy 29Stargates in a straight line, with a 45-meter separation betweenany two consecutive Stargates. The Stargates run Linux withker-nel 2.4.19. Each Stargate is equipped with a

Figure 1: Outdoor testbed

SMC 2.4GHz 802.11b wirelesscard and a 9dBi high-gain collinearomnidirectional antenna, whichis raised 1.5 meters above theground. To control the maximumcommunication range, the trans-mission power level of each Star-gate is set as 35. (Transmissionpower level is a tunable parameterfor 802.11b wireless cards, and its

range is 127, 126, . . . , 0, 255, 254, . . . , 129, 128, with 127 beingthe lowest and 128 being the highest.)

Sensornet testbed Kansei.In an open warehouse with flat alu-minum walls (see Figure 2(a)), we deploy 195 Stargates in a 15× 13 grid (as shown in Figure 2(b)) where the separation be-tween neighboring grid points is 0.91 meter (i.e., 3 feet). The

(a) Kansei

columns (0 - 14)

r o w

s ( 0

- 1

2 )

(b) grid topology

Figure 2: Sensornet testbed Kansei

deployment is a part of the sensornet testbed Kansei [9]. Forconvenience, we number the rows of the grid as 0 - 12 from thebottom up, and the columns as 0 - 14 from the left to the right.Each Stargate is equipped with the same SMC wireless card asin the outdoor testbed. To create realistic multi-hop wireless net-works similar to the outdoor testbed, each Stargate is equippeda 2.2dBi rubber duck omnidirectional antenna and a 20dB atten-uator. We raise the Stargates 1.01 meters above the ground by

1In this paper, the phraseslink quality andlink propertyare used interchangeably.

2

putting them on wood racks. The transmission power level ofeach Stargate is set as 60, to simulate the low-to-medium den-sity multi-hop networks where a node can reliably communicatewith around 15 neighbors.

The Stargates in the indoor testbed are equipped with wall-power and outband Ethernet connections, which facilitate long-duration complex experiments at low cost. We use the indoortestbed for most of the experiments in this paper; we use theoutdoor testbed mainly for justifying the generality of thephe-nomena observed in the indoor testbed.

Experiments. In the outdoor testbed, the Stargate at one endacts as the sender, and the other Stargates act as receivers.Giventhe constraints of time and experiment control, we leave complexexperiments to the indoor testbed and only perform relativelysimple experiments in the outdoor testbed: the sender first sends30,000 1200-byte broadcast packets, then it sends 30,000 1200-byte unicast packets to each of the receivers.

In the indoor testbed, we let the Stargate at column 0 of row6 be the sender, and the other Stargates in row 6 act as receivers.To study the impact of interference, we consider the followingscenarios (which are named according to the interference):• Interferer-free: there is no interfering transmission. The

sender first sends 30,000 broadcast packets each of 1200bytes, then it sends 30,000 1200-byte unicast packets toeach of the receivers, and lastly it broadcasts 30,000 30-byte packets.• Interferer-close: one “interfering” Stargate at column 0 of

row 5 keeps sending 1200-byte unicast packets to the Star-gate at column 0 of row 7, serving as the source of the in-terfering traffic. The sender first sends 30,000 1200-bytebroadcast packets, then it sends 30,000 1200-byte unicastpackets to each of the receivers.• Interferer-middle: the Stargate at column 7 of row 5 keeps

sending 1200-byte unicast packets to the Stargate at column7 of row 7. The sender performs the same as in the case ofinterferer-close.• Interferer-far: the Stargate at column 14 of row 5 keeps

sending 1200-byte unicast packets to the Stargate at column14 of row 7. The sender performs the same as in the case ofinterferer-close.• Interferer-exscal: In generating the interfering traffic, ev-

ery Stargate runs the routing protocol LOF (as detailed inlater sections of this paper), and the Stargate at the upper-right corner keeps sending packets to the Stargate at theleft-bottom corner, according to an event traffic trace fromthe field sensor network of ExScal [7] . The traffic tracecorresponds to the packets generated by a Stargate when avehicle passes across the corresponding section of ExScalnetwork. In the trace, 19 packets are generated, with thefirst 9 packets corresponding to the start of the event detec-tion and the last 10 packets corresponding to the end of theevent detection. Figure 3 shows, in sequence, the intervalsbetween packets 1 and 2, 2 and 3, and so on. The senderperforms the same as in the case of interferer-close.

In all of these experiments, except for the case of interferer-exscal, the packet generation frequency, for both the sender andthe interferer, is 1 packet every 20 milliseconds. In the case of

0 5 10 150

500

1000

1500

2000

2500

3000

3500

sequence

pack

et in

terv

al (

mill

isec

onds

)

Figure 3: The traffic trace of an ExScal event

interferer-exscal, the sender still generates 1 packet every 20 mil-liseconds, yet the interferer generates packets accordingto theevent traffic trace from ExScal, with the inter-event-run intervalbeing 10 seconds. (Note that the scenarios above are far frombeing complete, but they do give us a sense of how different in-terfering patterns affect link properties.)

In the experiments, broadcast packets are transmitted at thebasic rate of 1M bps, as specified by the 802.11b standard. Notfocusing on the impact of packet rate in our study, we set unicasttransmission rate to a fixed value (e.g., 5.5M bps). (We havetested different unicast transmission rates and observed similarphenomena.) For other 802.11b parameters, we use the defaultconfiguration that comes with the system software. For instance,unicast transmissions use RTS-CTS handshake, and each unicastpacket is retransmitted up to 7 times until success or failure in theend.

2.2 Experimental results

For each case, we measure various link properties, such as packetdelivery rate and the run length of packets successfully receivedwithout any loss in between, for each link defined by the sender- receiver. Due to space limitations, however, we only presentthe data on packet delivery rate here. The packet delivery rateis calculated once every 100 packets (we have also calculateddelivery rates in other granularities, such as once every 20, 50 or1000 packets, and similar phenomena were observed).

We first present the difference between broadcast and unicastwhen there is no interference, then we present the impact of inter-ference on network conditions as well as the difference betweenbroadcast and unicast.

2.2.1 Interferer free

Figure 4 shows the scatter plot of the delivery rates for broadcast

0 500 1000 15000

20

40

60

80

100

distance (meter)

pack

et d

eliv

ery

rate

(%

)

(a) broadcast

0 200 400 600 80095

96

97

98

99

100

distance (meter)

pack

et d

eliv

ery

rate

(%

)

(b) unicast

Figure 4: Outdoor testbed

3

and unicast packets at different distances in the outdoor testbed.From the figure, we observe the following:• Broadcast has longer communication range than unicast.

This is due to the fact that the transmission rate for broad-cast is lower, and that there is no RTS-CTS handshake forbroadcast. (Note: the failure in RTS-CTS handshake alsocauses a unicast to fail.)• For links where unicast has non-zero delivery rate, the mean

delivery rate of unicast is higher than that of broadcast. Thisis due to the fact that each unicast packet is retransmitted upto 7 times upon failure.• The variance in packet delivery rate is lower in unicast than

that in broadcast. This is due to the fact that unicast pack-ets are retransmitted upon failure, and the fact that thereis RTS-CTS handshake for unicast. (Note: the success inRTS-CTS handshake implies higher probability of a suc-cessful unicast, due to temporal correlations in link proper-ties [11].)

Similar results are observed in the indoor testbed, as showninFigures 5(a) and 5(b). Nevertheless, there are exceptions at dis-

0 5 10 150

20

40

60

80

100

distance (meter)

pack

et d

eliv

ery

rate

(%

)

(a) broadcast: 1200-byte packet

0 5 10 150

20

40

60

80

100

distance (meter)

pack

et d

eliv

ery

rate

(%

)

(b) unicast: 1200-byte packet

0 5 10 150

20

40

60

80

100

distance (meter)

pack

et d

eliv

ery

rate

(%

)

(c) broadcast: 30-byte packet

Figure 5: Indoor testbed

tances 3.64 meters and 5.46 meters, where the delivery rate ofunicast takes a wider range than that of broadcast. This is likelydue to temporal changes in the environment. Comparing Fig-ures 5(a) and 5(c), we see that packet length also has significantimpact on the mean and variance of packet delivery rate.

Implication. From Figures 4 and 5, we see that packet deliveryrate differs significantly between broadcast and unicast, and thedifference varies with environment, hardware, and packet length.

2.2.2 Interference scenarios

To demonstrate how network condition changes with interfer-ence scenarios, Figure 6 shows the broadcast packet deliveryrates in different interference scenarios. We see that broadcastpacket delivery rate varies significantly (e.g., up to 39.26%) asinterference patterns change. Thus, link properties estimated forone scenario may not apply to another.

0 2 4 6 8 10 12 140

20

40

60

80

100

distance (meter)

Mea

n br

oadc

ast r

elia

bilit

y (%

)

interferer−freeinterferer−closeinterferer−middleinterferer−farinterferer−exscal

Figure 6: Network condition, measured in broadcast reliability,in different interference scenarios

Having shown the impact of interference patterns on networkcondition, Figure 7 shows how the difference between broad-

0 2 4 6 8 10 12 14−60

−40

−20

0

20

40

60

80

100

distance (meter)

diffe

renc

e in

pac

ket d

eliv

ery

rate

(%

)

interferer−freeinterferer−closeinterferer−middleinterferer−farinterferer−exscal

Figure 7: The difference between broadcast and unicast in dif-ferent interference scenarios

cast and unicast in the mean packet delivery rate changes as theinterference and distance change. Given a distance and an in-terference scenario, the difference is calculated asU−B

B, where

U andB denote the mean delivery rate for unicast and broadcastrespectively. From the figure, we see that the difference is signif-icant (up to 94.06%), and that the difference varies with distance.Moreover, the difference changes significantly (up to 103.41%)as interference pattern changes.

Implication. For wireless sensor networks where data bursts arewell separated in time and possibly in space (e.g., in burstycon-vergecast), the link properties experienced by periodic beaconsmay well differ from those experienced by data traffic. More-over, the difference between broadcast and unicast changesasinterference pattern changes.

2.3 Data-driven routing

To ameliorate the differences between broadcast and unicast linkproperties, researchers have proposed to make the length andtransmission rate of broadcast beacons be the same as those ofdata packets, and then estimate link properties of unicast data viathose of broadcast beacons by taking into account factors such aslink asymmetry. ETX [13] has explored this approach. Neverthe-less, this approach may not be always feasible when the length ofdata packets is changing; or even if the approach is always feasi-ble, it still does not guarantee that link properties experienced byperiodic beacons reflect those in the presence of data traffic, es-pecially in event-driven sensor network applications. Moreover,the existing method for estimating metrics such as ETX does nottake into account the temporal correlations in link properties [11](partly due to the difficulty of modeling the temporal correlationsthemselves [32]), which further decreases its estimation fidelity.

4

For instance, Figure 8 shows the significant error2 in estimating

0 2 4 6 8 10 12 14−100

−50

0

50

100

distance (meter)

erro

r in

bea

con−

base

d es

timat

ion

(%)

interferer−freeinterferer−closeinterferer−middleinterferer−farinterferer−exscal

Figure 8: Error in estimating unicast delivery rate via thatofbroadcast

unicast delivery rate via that of broadcast under differentinterfer-ence scenarios when temporal correlations in link properties arenot considered (i.e., assuming independent bit error and packetloss). Therefore, it is not trivial, if even possible, to preciselyestimate link properties for unicast data via those of broadcastbeacons.

To circumvent the difficulty of estimating unicast link proper-ties via those of broadcast, we propose to directly estimateuni-cast link properties via data traffic itself. In this context, sincewe are not using beacons for link property estimation, we alsoexplore the idea of not using periodic beacons in routing at all(i.e., beacon-free routing) to save energy; otherwise, beaconingrequires nodes to wake up periodically even when there is nodata traffic.

To enable data-driven routing, we need to find alternative mech-anisms for accomplishing the tasks that are traditionally assumedby beacons: acting as the basis for link property estimation, anddiffusing information (e.g., the cumulative ETX metric). In sen-sor network backbones, data-driven routing is feasible becauseof the following facts:• MAC feedback. In MACs where every frame transmission

is acknowledged by the receiver (e.g., in the 802.11b MAC),the sender can determine if a transmission has succeededby checking whether it receives the acknowledgment. Also,the sender can determine how long each transmission takes(as to be explained in detail in Section 4.5), i.e., MAC la-tency. Therefore, the sender is able to get information onlink properties without using any beacons. (Note: it hasalso been shown that MAC latency is a good routing metricfor optimizing wireless network throughput [10].)• Mostly static network & geography. Nodes are static

most of the time, and their geographic locations are read-ily available via devices such as GPS. Therefore, we canuse geography-based routing in which a node only needsto know the location of the destination and the informa-tion regarding its local neighborhood (such as the qualityof the links to its neighbors). Thus, only the location ofthe destination (e.g., the base station in convergecast) needsto be diffused across the network. Unlike in beacon-baseddistance-vector routing, the diffusion happens infrequentlysince the destination is static most of the time. In general,control packets are needed only when the location of a nodechanges, which occurs infrequently.

2The error is defined as actual unicast link reliability minusthe estimated link reliability

In what follows, we first present the routing metric ELD whichis based on geography and MAC latency, then we present thedesign of LOF which implements ELD without using periodicbeacons.

Remarks:• Although parameters such as Receiver Signal Strength In-

dicator (RSSI), Link Quality Indicator (LQI), and Signal toNoise Ratio (SNR) also reflect link reliability, it is difficultto use them as a precise prediction tool [8]. Moreover, theaforementioned parameters can be fetched only at packetreceivers (instead of senders), and extra control packets areneeded to convey these information back to the senders ifwe want to use them as the basis of link property estima-tion. Therefore, we do not recommend using these parame-ters as the core basis of data-driven routing, especially whensenders need to precisely estimate in-situ link properties.• Our objective in this paper is to explore the idea of data-

driven link property estimation and routing, and it is notour objective to prove that geography-based routing is bet-ter than distance-vector routing. In principle, we could haveused distance-vector routing together with data-driven linkproperty estimation, but this would introduce periodic con-trol packets which we would like to avoid to save energy.We will show in Section 5.3, however, that geography-baseddata-driven routing has similar performance as that of distance-vector data-driven routing.• Conceptually, we could have also defined our routing metric

based on other parameters such as ETX [13] or RNP [11].Nevertheless, the firmware of our SMC WLAN cards doesnot expose information on the number of retries of a unicasttransmission, which makes it hard to estimate ETX or RNPdirectly via data traffic. As a part of our future work, weplan to design mechanisms to estimate ETX and RNP viadata traffic (e.g., in IEEE 802.15.4 based mote networks)and study the corresponding protocol performance.

3 ELD: the routing metric

In this section, we first justify mathematically why MAC la-tency reflects link reliability and energy consumption, then wederive the routing metric ELD, theexpected MAC latency perunit-distance to the destination, and finally we analyze the sam-ple size requirement in routing.

3.1 MAC latency as the basis for route selection

For convergecast in sensor networks (especially for event-drivenapplications), packets need to be routed reliably and in real-timeto the base station. As usual, packets should also be delivered inan energy-efficient manner. Therefore, a routing metric shouldreflect link reliability, packet delivery latency, and energy con-sumption at the same time. One such metric that we adopt inLOF is based on MAC latency (i.e., the time taken for the MACto transmit a data frame).

Intuitively, both the MAC latency and the energy consumptionof a frame transmission depend on the link reliability. ThereforeMAC latency certainly reflects link reliability and energy con-

5

sumption. But to characterize their relationships more precisely,we mathematically analyze them as follows, using 802.11b asanexample. (Readers unfamiliar with the details of 802.11b couldrefer to [6], or simply skip the mathematical formulation.)

Given a senderS and a receiverR where the link betweenthem has non-zero reliability, we letDS,R andPS,R denote theMAC latency and the energy consumption for transmitting a uni-cast frame fromS toR. Then, we are interested in calculating theexpected valuesE(DS,R) andE(PS,R). To simply analysis, weonly consider the case where there is no interfering traffic,andwe assume that the MAC continues to transmit a packet until itissuccessful. (Note that this simplification, though different fromreality, does no prevent us from getting a sense of the gross re-lationship between MAC latency and energy consumption.) Letp0 be the probability that a RTS-CTS handshake betweenS andR will fail (e.g., due to the loss of RTS or CTS) (p0 > 0), p1 bethe probability that a DATA-ACK handshake betweenS andR

will fail ( p1 > 0), andC = p0 + p1 − p0p1. Then, we have

E(DS,R) = (1 − p0)(1 − p1)(464 + td) + ( 1−p1

p1

)T (µs) (1)

where

td = time taken to transmit the DATA frame (in microseconds);

T = (502 + (134+2td)(1−p0)p1

C) 2C2

−C3

(1−C)2+

158720C8−138880C9

(1−C)2+

(−td−482)C2

C−

119040C8

1−C−

1004p2

0−502p3

0

(1−p0)2+

−158720p8

0+138880p9

0

(1−P0)2+

(td+482)p2

0

1−p0

+

119040p8

0

1−p0

+ 155P7

k0=2((2C)k0 − (2p0)k0 ).

Derivation sketch for formula (1).Assume a frame transmissionfrom S to R takesk1 rounds of DATA-ACK handshakes andk0

rounds of RTS-CTS handshakes. Clearly,k0 ≥ k1. Then, theMAC latencyDS,R(k0, k1) can be decomposed into the follow-ing three components:• The latencyI(k0, k1) due to the initial DIFS before any

RTS-CTS-DATA-ACK handshake. We haveI(k0, k1) =DIFS (µs).• The latencyCB(k0, k1) due to the contention avoidance

backoffs: there are(k0 − k1) RTS-CTS handshake failures,(k1 − 1) DATA-ACK handshake failures, and(k0 − k1) +(k1− 1) = k0− 1 contention backoffs. Therefore, we have

CB(k0, k1) = (k0 − k1)(CTSTimeout + DIFS)+(k1 − 1)(ACKTimeout + DIFS)+Pk0−1

i=1 BTi

whereCTST imeout = trts+tcts+2SIFS, ACKTimeout =td + tack +SIFS +DIFS, andBTi is the value of theith

contention backoff timer.• The latencyDT (k0, k1) due to the normal RTS-CTS-DATA-

ACK procedure. We have

DT (k0, k1) = (k0 − k1)trts+k1(trts + SIFS + tcts)+(SIFS + td + SIFS + tack) (µs)

wheretrts, tcts, andtack denote the time taken to transmita RTS, a CTS, and an ACK respectively.

Therefore, we have

DS,R(k0, k1) = I(k0, k1) + CB(k0, k1) + DT (k0, k1)= k0(DIFS + trts + CTSTimeout)+

k1(−CTSTimeout + ACKTimeout+tcts + 3SIFS + td + tack)−

ACKTimeout +Pk0−1

i=1 BTi

Now, let us calculate the probabilityP (k0, k1) that a transmis-sion fromS to R takesk0 rounds of RTS-CTS handshake andk1

rounds of DATA-ACK handshake. We haveP (k0, k1) =

P{(k0 − k1) RTS-CTS failure out ofk0 times}×P{(k1 − 1) DATA-ACK failures followed by a success}

=

(

(` k0

(k0−k1)

´

pk0−k1

0 (1 − p0)k1 ) × (p1k1−1(1 − p1)) if k0 ≥ k1

0 otherwise

=

(

`k0

k1

´

( 1−p1

p1

)pk0

0 ((1−p0)p1

p0

)k1 if k0 ≥ k1

0 otherwise

Therefore, we have

E(DS,R) = Ek0,k1(DS,R(k0, k1))

=P

k0=1

Pk0

k1=1 DS,R(k0, k1)P (k0, k1)

Applying 802.11b parameters, such astrts andSIFS, to theabove formula, we could arrive at formula (1). Due to the limi-tation of space, we skip the detail here.

2

For energy consumption, we have

E(PS,R) = 1−p1

p1(1−C)2(34C + (ld + 14)p1(1 − p0)(2 − C)) (bytes)

(2)where

ld = the length of the DATA frame (in number of bytes).

Derivation sketch for formula (2). Let k0, k1, andP (k0, k1)be the same as in the “derivation sketch for formula (1). LetBrts−cts be the length, in number of bytes, of a RTS-CTS pair,andBdata−ack be the length of a DATA-ACK pair. Then, if atransmission fromS to R takesk0 rounds of RTS-CTS hand-shakes andk1 rounds of DATA-ACK handshakes (k0 ≥ k1), theenergyPS,R(k0, k1) consumed, in number of bytes transmitted,is k0Brts−cts + k1Bdata−ack.

Therefore, we have

E(PS,R) = Ek0,k1(PS,R(k0, k1))

=P

k0=1

Pk0

k1=1 PS,R(k0, k1)P (k0, k1)

Applying 802.11b parameters, such asBrts−cts, to the aboveformula, we could arrive at formula (2). Due to the limitation ofspace, we skip the detail here.

2

To visualize Formulas (1) and (2), we let the probabilityp′1that a DATA-ACK handshake will succeed represent link relia-bility (i.e., p′1 = 1−p1). Lettingk = length(DATA+ACK)

length(RTS+CTS), and assum-

ing that bit errors are independent, we havep1 = 1− (1− p0)k.

Thus,p0 = 1− k

p

1 − p1 = 1− k

q

p′1 (3)

Based on equations (1), (2), (3), and assuming thattd is 1200,Figure 9 presents a visual characterization of the expectedMAC

6

00.20.40.60.8110

3

104

105

106

p1‘

MAC latency (microseconds)energy consumption (# of bytes)

Figure 9: MAC latency as an indicator of energy consumption

latency and the expected energy consumption. We see that MAClatency is strongly related to energy consumption in a positivemanner. Thus, routing metrics optimizing MAC latency wouldalso optimize energy efficiency.

Previous work has argued the advantages of using routing met-rics Expected Transmission Count (ETX) [33, 13] and ExpectedTransmission Time (ETT) [15, 14], which are similar to MAClatency, from perspectives such as reducing self-interference andincreasing throughput. Our analysis complements theirs bymath-ematically showing the relationships among MAC latency, en-ergy consumption, and link reliability.

3.2 ELD: a geography-based routing metric

Given that MAC latency is a good basis for route selection andthat geography enables low frequency information diffusion, wedefine a routing metric ELD,the expected MAC latency per unit-distance to the destination, which is based on both MAC latencyand geography. Specifically, given a senderS, a neighborR of S,and the destination D as shown in Figure 10,

S D

R

L e (S, R)

Figure 10:Le calculation

we first calculate theeffective ge-ographic progressfrom S to D

via R, denoted byLe(S, R), as(LS,D−LR,D), whereLS,D de-notes the distance between S andD, and LR,D denotes the dis-tance between R and D. Then,we calculate, for the senderS, the MAC latency per unit-

distance to the destination(LD) via R, denoted byLD(S, R),as3

{

DS,R

Le(S,R)if LS,D > LR,D

∞ otherwise(4)

whereDS,R is the MAC latency fromS to R. Therefore, theELD via R, denoted asELD(S, R), is E(LD(S, R)) which iscalculated as

{

E(DS,R)

Le(S,R)if LS,D > LR,D

∞ otherwise(5)

For every neighborR of S, S associates withR a rank

〈ELD(S, R), var(LD(S, R)), LR,D, ID(R)〉

wherevar(LD(S, R)) denotes the variance ofLD(S, R), andID(R) denotes the unique ID of nodeR. Then,S selects as

3Currently, we focus on the case where a node forwards packetsonly to a neighbor closerto the destination than itself.

its next-hop forwarder the neighbor that ranks the lowest amongall the neighbors. (Note: routing via metric ELD is a greedyapproach, where each node tries to optimize the local objective.Like many other greedy algorithms, this method is effectiveinpractice, as shown via experiments in Section 5.)

To understand what ELD implies in practice, we set up an ex-periment as follows: consider a line network formed by row 6 ofthe indoor testbed shown in Figure 2, the StargateS at column0 needs to send packets to the StargateD at the other end (i.e.,column 14). Using the data on unicast MAC latencies in the caseof interferer-free, we show in Figure 11 the mean unicast MAC

0 2 4 6 8 10 12 140

5

10

15

20

25

distance (meter)

mean MAC latency (ms)ELD (ms/meter)

Figure 11: Mean unicast MAC latency and the ELD

latencies and the corresponding ELD’s regarding neighborsatdifferent distances. From the figure, StargateD, the destinationwhich is 12.8 meters away fromS, offers the lowest ELD, andS sends packets directly toD. From this example, we see that,using metric ELD, a node tends to choose nodes beyond the re-liable communication range as forwarders, to reduce end-to-endMAC latency as well as energy consumption.

Remark. ELD is a locally measurable metric based only onthe geographic locations of nodes and information regarding thelinks associated with the senderS; ELD does not assume linkconditions beyond the local neighborhood ofS. In the analysisof geographic routing [30], however, a common assumption isgeographic uniformity— that the hops in any route have sim-ilar properties such as geographic length and link quality.Aswe will show by experiments in Section 5, this assumption isusually invalid. For the sake of verification and comparison,we derive another routing metric ELR, theexpected MAC la-tency along a route, based on this assumption. More specifically,ELR(S, R) =

{

E(DS,R) × ⌈LS,R+LR,D

LS,R⌉ if LS,D > LR,D

∞ otherwise(6)

where⌈LS,R+LR,D

LS,R⌉ denotes the number of hops to the destina-

tion, assuming equal geographic distance at every hop. We willshow in Section 5 that ELR is inferior to ELD.

3.3 Sample size analysis

To understand the convergence speed of ELD-based routing andto guide protocol design, we experimentally study the samplesize required to distinguish out the best neighbor in routing.

In our indoor testbed, let the Stargate at column 0 of row 6be the senderS and Stargate at the other end of row 6 be thedestinationD; then letS send 30,000 1200-byte unicast packetsto each of the other Stargates in the testbed, to get information

7

(e.g., MAC latency and reliability) on all the links associatedwith S. The objective is to see what sample size is required forS to distinguish out the best neighbor.

First, we need to derive thedistribution modelfor MAC la-tency. Figure 12 shows the histogram of the unicast MAC laten-

0 20 40 60 800

2000

4000

6000

8000

10000

12000

14000

16000

MAC latency (milliseconds)

# of

sam

ples

Figure 12: Histogram for unicast MAC latency

cies for the link to a node 3.65 meters (i.e., 12 feet) away fromS. (The MAC latencies for other links assume similar patterns.)Given the shape of the histogram and the fact that MAC latencyis a type of “service time”, we select three models for evalua-tion: exponential, gamma, and lognormal.4 Against the data onthe MAC latencies for all the links associated withS, we performKolmogorov-Smirnov test [21] on the three models, and we findthat lognormaldistribution fits the data the best.

Therefore, we adopt lognormal distribution for the analysis inthis paper. Given that MAC latency assumes lognormal distribu-tion, the LD associated with a neighbor also assumes lognormaldistribution, i.e.,log(LD) assumes normal distribution.

Because link quality varies temporally, the best neighbor forSmay change temporally. Therefore, we divide the 30,000 MAClatency samples of each link into chunks of time spanWc, de-noted as thewindow of comparison, and we compare all thelinks via their corresponding sample-chunks. Given each sam-ple chunk for the MAC latency of a link, we compute the samplemean and sample variance for the correspondinglog(LD), anduse them as the mean and variance of the lognormal distribu-tion. When considering thei-th sample chunks of all the links(i = 1, 2, . . .), we find the best link according to these samplechunks, and we compute the sample size required for comparingthis best link with each of the other links as follows:

Given two normal variatesX1, X2 whereX1 ∼ N(µ1, δ21)

andX2 ∼ N(µ2, δ22), the sample size required to com-

pareX1 andX2 at 100(1 − α)% confidence level is(Zα(δ1+δ2)

µ1−µ2

)2 (0 ≤ α ≤ 1), with Zα being theα-quantile of a unit normal variate [22].

In the end, we have a set of sample sizes for each specificWc.For a 95% confidence level comparison and route selection, Fig-ure 13(a) shows the 75-, 80-, 85-, 90-, and 95-percentiles ofthesample sizes for differentWc’s. We see that the percentiles donot change much asWc changes. Moreover, we observe that,even though the 90- and 95-percentiles tend to be large, the 75-and 80-percentiles are pretty small (e.g., being 2 and 6 respec-tively whenWc is 20 seconds), which implies that routing deci-sions can converge quickly in most cases. This observation alsomotivates us to use initial sampling in LOF, as detailed in Sec-

4The methodology of LOF is independent of the distribution model adopted. Therefore,LOF would still apply even if better models are found later.

50 100 150 200 250 30010

0

101

102

103

window of comparison (seconds)

sam

ple

size

75−percentile80−percentile85−percentile90−percentile95−percentile

(a) route selection

50 100 150 200 250 30040

60

80

100

120

window of comparison (seconds)

sam

ple

size

75−percentile80−percentile85−percentile90−percentile95−percentile

(b) absolute ELD

Figure 13: Sample size requirement

tion 4.2.

Remarks. In the analysis above, we did not consider the tempo-ral patterns of link properties (which are usually unknown). Hadthe temporal patterns been known and used in link estimation,the sample size requirement can be even lower.

By way of contrast, we also compute the sample size requiredto estimate the absolute ELD value associated with each neigh-bor. Figure 13(b) shows the percentiles for a 95% confidencelevel estimation with an accuracy of±5%. We see that, eventhough the 90- and 95-percentiles are less than those for routeselection, the 75- and 80-percentiles (e.g., being 42 and 51re-spectively whenWc is 20 seconds) are significantly greater thanthose for route selection. Therefore, when analyzing sample sizerequirement for routing, we should focus on relative comparisonamong neighbors rather than on estimating the absolute value,unlike what has been done in the literature [33].

4 LOF: a data-driven protocol

Having determined the routing metric ELD, we are ready to de-sign protocol LOF for implementing ELD without using periodicbeacons. Without loss of generality, we only consider a singledestination, i.e., the base station to which every other node needsto find a route.

Briefly speaking, LOF needs to accomplish two tasks: First,to enable a node to obtain the geographic location of the basestation, as well as the IDs and locations of its neighbors; Second,to enable a node to track the LD (i.e., MAC latency per unit-distance to the destination) regarding each of its neighbors. Thefirst task is relatively simple and only requires exchanginga fewcontrol packets among neighbors in rare cases (e.g., when a nodeboots up); LOF accomplishes the second task using three mech-anisms: initial sampling of MAC latency, adapting estimationvia MAC feedback for application traffic, and probabilisticallyexploring alternative forwarders.

In this section, we first elaborate on the individual componentsof LOF, then we discuss implementation issues of LOF such asreliably fetching MAC feedback.

4.1 Learning where we are

LOF enables a node to learn its neighborhood and the locationof the base station via the following rules:

I. [Issue request]Upon boot-up, a node broadcastsM copiesof hello-requestpackets if it is not the base station. Ahello-

8

requestpacket contains the ID and the geographic locationof the issuing node. To guarantee that a requesting node isheard by its neighbors, we setM as 7 in our experiments.

II. [Answer request] When receiving ahello-requestpacketfrom another node that is farther away from the base sta-tion, the base station or a node that has a path to the basestation acknowledges the requesting node by broadcastingM copies ofhello-replypackets. Ahello-replypacket con-tains the location of the base station as well as the ID andthe location of the issuing node.

III. [Handle announcement] When a nodeA hears for thefirst time ahello-replypacket from another nodeB closerto the base station,A records the ID and location ofB andregardsB as a forwarder-candidate.

IV. [Announce presence] When a node other than the basestation finds a forwarder-candidate (and thus a path) for thefirst time, or when the base station boots up, it broadcastsM copies ofhello-replypackets.

To reduce potential contention, every broadcast transmission men-tioned above is preceded by a randomized waiting period whoselength is dependent on node distribution density in the network.Note that the above rules can be optimized in various ways. Forinstance, rule II can be optimized such that a node acknowledgesat most onehello-requestfrom another node each time the re-questing node boots up. Even though we have implemented quitea few such optimizations, we skip the details here since theyarenot the focus of this paper.

4.2 Initial sampling

Having learned the location of the base station as well as thelo-cations and IDs of its neighbors, a node needs to estimate theLDs regarding its neighbors. To design the estimation mecha-nism, let us first check Figure 14, which shows the mean unicast

0 2 4 6 8 10 12 140

5

10

15

20

25

30

distance (meter)

mea

n M

AC

late

ncy

(mill

isec

onds

)

interferer−freeinterferer−closeinterferer−middleinterferer−farinterferer−exscal

Figure 14: MAC latency in the presence of interference

MAC latency in different interfering scenarios for the indoor ex-periments described in Section 2.1. We see that, even thoughMAC latencies change as interference pattern changes, the rel-ative ranking in the mean MAC latency among links does notchange much. Neither will the LDs accordingly.

In LOF, therefore, when a nodeS learns of the existence of aneighborR for the first time,S takes a few samples of the MAClatency for the link toR before forwarding any data packets toR. The sampling is achieved byS sending a few unicast packetsto R and then fetching the MAC feedback. The initial samplinggives a node a rough idea of the relative quality of the links to itsneighbors, to jump start the data-driven estimation.

According to the analysis in Section 3.3, another reason forinitial sampling is that, with relatively small sample size, a nodecould gain a decent sense of the relative goodness of its neigh-bors. We set the initial sample size as6 (i.e., the 80-percentile ofthe sample size whenWc is 20 seconds) in our experiments.

4.3 Data-driven adaptation

Via initial sampling, a node gets a rough estimation of the rel-ative goodness of its neighbors. To improve its route selectionfor an application traffic pattern, the node needs to adapt its es-timation of LD via the MAC feedback for unicast data transmis-sion. (According to the analysis in Section 3.3, route decisionsconverge quickly because of the small sample size requirement.)Since LD is lognormally distributed, LD is estimated by estimat-ing log(LD).

On-line estimation. To determine the estimation method, wefirst check the properties of the time series oflog(LD), consid-ering the same scenario as discussed in Section 3.3. Figure 15shows a time series of thelog(LD) regarding a node 3.65 me-

0 1 2 3x 10

4

−1

0

1

2

3

sample serieslo

g(LD

) (m

s/m

eter

)

Figure 15: A time series oflog(LD)

ters (i.e., 12 feet) away from the senderS (The log(LD) forthe other nodes assumes similar patterns.). We see that the timeseries fits well with theconstant-level model[20] where the gen-erating process is represented by a constant superimposed withrandom fluctuations. Therefore, a good estimation method isex-ponentially weighted moving average(EWMA) [20], assumingthe following form

V ←− αV + (1 − α)V ′ (7)

whereV is the parameter to be estimated,V ′ is the latest obser-vation ofV , andα is the weight (0 ≤ α ≤ 1).

In LOF, when a new MAC latency and thus a newlog(LD)value with respect to the current next-hop forwarderR is ob-served, theV value in the right hand side of formula (7) may bequite old if R has just been selected as the next-hop and somepackets have been transmitted to other neighbors immediatelybefore. To deal with this issue, we define theage factorβ(R)of the current next-hop forwarderR as the number of packetsthat have been transmitted sinceV of R was last updated. Then,formula (7) is adapted to be the following:

V ←− αβ(R)V + (1 − αβ(R))V ′ (8)

(Experiments confirm that LOF performs better with formula (8)than with formula (7).)

Each MAC feedback indicates whether a unicast transmissionhas succeeded and how long the MAC latencyl is. When a node

9

receives a MAC feedback, it first calculates the age factorβ(R)for the current next-hop forwarder, then it adapts the estimationof log(LD) as follows:• If the transmission has succeeded, the node calculates the

new log(LD) value usingl and applies it to formula (8)to get a new estimation regarding the current next-hop for-warder.• If the transmission has failed, the node should not usel di-

rectly because it does not represent the latency to success-fully transmit a packet. To address this issue, the node keepstrack of the unicast delivery rate, which is also estimated us-ing formula (8), for each associated link. Then, if the noderetransmits this unicast packet via the currently used link,the expected number of retries until success is1

p, assuming

that unicast failures are independent and that the unicast de-livery rate along the link isp. Including the latency for thislast failed transmission, the expected overall latencyl′ is(1 + 1

p)l. Therefore, the node calculates the newlog(LD)

value usingl′ and applies it to formula (8) to get a new es-timation.

Another important issue in EWMA estimation is choosing theweightα, since it affects the stability and agility of estimation.To address this question, we again perform experiment-basedanalysis. Using the data from Section 3.3, we try out differentα values and compute the corresponding estimation fidelity, thatis, the probability of LOF choosing the right next-hop forwarderfor S. Figure 16(a) shows the bestα value and the correspond-ing estimation fidelity for different windows of comparison. Ifthe window of comparison is 20 seconds, for instance, the best α

is 0.8, and the corresponding estimation fidelity is 89.3%. (Sincethe time span of the ExScal traffic trace is about 20 seconds, wesetα as 0.8 in our experiments.)

50 100 150 200 250 30080

85

90

95

100

window of comparison (seconds)

best α (%)estimation fidelity (%)

(a) bestα

0 0.2 0.4 0.6 0.8 175

80

85

90

α

estim

atio

n fid

elity

(%

)

(b) sensitivity analysis

Figure 16: The weightα in EWMA

For sensitivity analysis, Figure 16(b) shows how the estima-tion fidelity changes withα when the window of comparison is20 seconds. We see that the estimation fidelity is not very sen-sitive to changes inα over a wide range. For example, the es-timation fidelity remains above 85% whenα changes from 0.6to 0.98. Similar patterns are observed for the other windowsof comparison too. The insensitivity of estimation fidelityto α

guarantees the robustness of EWMA estimation in different en-vironments.

Route adaptation. As the estimation of LD changes, a nodeS

adapts its route selection by the ELD metric. Moreover, if theunicast reliability to a neighborR is below certain threshold (say60%),S will mark R as dead and will removeR from the set of

forwarder-candidates. IfS loses all its forwarder-candidates,S

will first broadcastM copies ofhello-withdrawalpackets andthen restarts the routing process. If a nodeS′ hears ahello-withdrawalpacket fromS, and ifS is a forwarder-candidate ofS′, S′ removesS from its set of forwarder-candidates and up-date its next-hop forwarder as need be. (As a side note, we findthat, on average, only 0.9863 neighbors of any node are markedas dead in both our testbed experiments and the field deploy-ment of LOF in project ExScal [7]. Again, the withdrawing andrejoining process can be optimized, but we skip the details here.)

4.4 Exploratory neighbor sampling

Given that the initial sampling is not perfect (e.g., covering 80%instead of 100% of all the possible cases) and that wireless linkquality varies temporally (e.g., from initial sampling to actualdata transmission), the data-driven adaptation alone may missusing good links, simply because they were relatively bad whentested earlier and they do not get chance to be tried out lateron. Therefore, we propose exploratory neighbor sampling inLOF. That is, whenever a nodeS has consecutively transmit-tedIns(R0) number of data packets using a neighborR0, S willswitch its next-hop forwarder fromR0 to another neighborR′

with probabilityPns(R′) (so that the link quality toR′ can be

sampled). On the other hand, the exploratory neighbor samplingis optimistic in nature, and it should be used only for good neigh-bors. In LOF, exploratory neighbor sampling only considerstheset of neighbors that are not marked as dead.

In what follows, we explain how to determine the samplingprobabilityPns(R

′) and the sampling intervalIns(R0). For con-venience, we consider a senderS, and let the neighbors ofS beR0, R1, . . . , RN with increasing ranks.

Sampling probability. At the moment of neighbor sampling,a better neighbor should be chosen with higher probability.InLOF, a neighbor is chosen with the probability of the neighboractually being the best next-hop forwarder. We derive this proba-bility in three steps: the probabilityPb(Ri, Rj) of a neighborRi

being actually better than another oneRj (given by formula (9)),the probabilityPh(Ri) of a neighborRi being actually betterthan all the neighbors that ranks lower than itself (given byfor-mula (10)), and the probabilityPns(Ri) of a neighborRi beingactually the best forwarder (given by formula (11)).

Given S and its two neighborsRi andRj , we approximatePb(Ri, Rj) with P{LD(S, Ri) > LD(S, Rj)}, which equalsP{log(LD(S, Ri)) > log(LD(S, Rj))}. As discussed in Sec-tion 3.3,log(LD(S, Ri)) as well aslog(LD(S, Rj)) has a nor-mal distribution. Assumelog(LD(S, Ri)) ∼ N(µi, δ

2i ), log(LD(S, Rj)) ∼

N(µj , δ2j ), and thatlog(LD(S, Ri)) is independent oflog(LD(S, Rj)),

then we havePb(Ri, Rj) = G(

µj−µiq

δ2

i+δ2

j

) (9)

whereG(x) = 1− Φ(x), with

Φ(x) =

(

12

erfc(−x/p

(2)) x ≤ 0

1 − 12

erfc(x/p

(2)) x > 0

erfc(x) ≈ ( 11+x/2

)exp(−x2 + P ( 11+x/2

))

P (x) = 0.17087277x9 − 0.82215223x8 + 1.48851587x7−1.13520398x6 + 0.27886807x5 − 0.18628806x4+0.09678418x3 + 0.37409196x2 + 1.00002368x−1.26551223.

10

Derivation sketch for formula (9 ). Sincelog(LD(S, Ri)) ∼N(µi, δ

2i ), log(LD(S, Rj)) ∼ N(µj , δ

2j ), andlog(LD(S, Ri))

is independent oflog(LD(S, Rj)), it is easy to show thatZ ′ =log(LD(S, Ri))− log(LD(S, Rj) is a normal variate with mean

(µi − µj) and variance(δ2i + δ2

j ). Therefore,Z =Z′−(µi−µj)√

δ2

i+δ2

j

is a standard normal variate. Thus,

Pb(Ri, Rj) = P{log(LD(S, Ri)) > log(LD(S, Rj))}= P [Z′ > 0]

= P [Z >−(µi−µj)

q

δ2

i+δ2

j

]

By Andrew’s method [31],P [Z >−(µi−µj)√

δ2

i+δ2

j

] = G(µj−µi√

δ2

i+δ2

j

),

whereG(x) = 1− Φ(x), with

Φ(x) =

(

12

erfc(−x/p

(2)) x ≤ 0

1 − 12

erfc(x/p

(2)) x > 0

erfc(x) ≈ ( 11+x/2

)exp(−x2 + P ( 11+x/2

))

P (x) = 0.17087277x9 − 0.82215223x8 + 1.48851587x7−1.13520398x6 + 0.27886807x5 − 0.18628806x4+0.09678418x3 + 0.37409196x2 + 1.00002368x−1.26551223

2

KnowingPb(Rj , Rk) for everyj andk, we computePh(Ri)(i = 1, . . . , N ) inductively as follows:

Ph(R1) = Pb(R1, R0);

Ph(Ri) ≈ Pb(Ri, R0) ×Qi−1

j=1(1 − (Pb(Rj , Ri)+

(Ph(Rj) − 1) × Pb(R0, Ri)))(i = 2, . . . , N)

(10)

Derivation sketch for formula (10 ).LetPb(〈Rk0, Rm0

〉, . . . , 〈Rkl, Rml

〉)denote the probability thatRk0

is better thanRm0, . . . , andRkl

isbetter thanRml

, and letPb(〈Ri, Rj〉|〈Rk0, Rm0

〉, . . . , 〈Rkl, Rml

〉)denote the probabilityRi being better thanRj given thatRk0

isbetter thanRm0

, . . . , andRklis better thanRml

. Then,Ph(Ri)(i = 1, . . . , N ) is computed inductively as follows:

Ph(R1) = Pb(R1, R0);Ph(Ri) = Pb(Ri, R0) × Pb(〈Ri, R1〉|〈Ri, R0〉) × . . .

Pb(〈Ri, Rj〉|〈Ri, R0〉, . . . , 〈Ri, Rj−1〉) × . . .Pb(〈Ri, Ri−1〉|〈Ri, R0〉, . . . , 〈Ri, Ri−2〉)

= Pb(Ri, R0) ×Qi−1

j=1 Pb(〈Ri, Rj〉|〈Ri, R0〉, . . . , 〈Ri, Rj−1〉)

= Pb(Ri, R0) ×Qi−1

j=1(1 − Pb(〈Rj , Ri〉|〈Ri, R0〉, . . . , 〈Ri, Rj−1〉))

= Pb(Ri, R0) ×Qi−1

j=1(1 − Pb(〈Rj , Ri〉, 〈Rj , R0〉, . . . , 〈Rj , Rj−1〉))

= Pb(Ri, R0) ×Qi−1

j=1(1 − (Pb(Rj , all of R0 to Rj−1)×

Pb(any ofR0 to Rj−1, Ri)+Pb(Rj , Ri) × Pb(Ri, all of R0 to Rj−1)))

= Pb(Ri, R0) ×Qi−1

j=1(1 − (Ph(Rj) × Pb(any ofR0 to Rj−1, Ri)+

Pb(Rj , Ri) × Pb(Ri, all of R0 to Rj−1)))

= Pb(Ri, R0) ×Qi−1

j=1(1 − (Pb(Rj , Ri)+

(Ph(Rj) − 1)Pb(any ofR0 to Rj−1, Ri)))

≈ Pb(Ri, R0) ×Qi−1

j=1(1 − (Pb(Rj , Ri)+

(Ph(Rj) − 1) × Pb(R0, Ri)))(i = 2, . . . , N)

2

Then, we compute the sampling probability as follows:

Pns(R0) = Pb(R0, R1) ×QN

j=2(1 − Ph(Rj));

Pns(Ri) = Ph(Ri) ×QN

j=i+1(1 − Ph(Rj))(i = 1, . . . , N − 1);

Pns(RN ) = Ph(RN )

(11)

Derivation sketch for formula (11 ).Inductively,

Pns(R0) = Pb(R0, R1) × Pb(〈R0, R2〉|〈R0, R1〉) × . . .Pb(〈R0, RN 〉|〈R0, R1〉, . . . , 〈R0, RN−1〉)

= Pb(R0, R1) ×QN

j=2 Pb(〈R0, Rj〉|〈R0, R1〉, . . . , 〈R0, Rj−1〉)

= Pb(R0, R1) ×QN

j=2(1 − Pb(〈Rj , R0〉|〈R0, R1〉, . . . , 〈R0, Rj−1〉))

= Pb(R0, R1) ×QN

j=2(1 − Ph(Rj));

Pns(Ri) = Ph(Ri) ×QN

j=i+1 Pb(〈Ri, Rj〉|〈Ri, R0〉, . . . , 〈Ri, Rj−1〉)

= Ph(Ri) ×QN

j=i+1(1 − Ph(Rj))(i = 1, . . . , N − 1);

Pns(RN ) = Ph(RN )

2

Because of the approximation in formula (10),∑N

i=0 Pns(Ri)may not equal to 1. To address this issue, we normalize thePns(Ri)’s (i = 0, . . . , N ) such that their sum is 1.

Sampling interval. The frequency of neighbor sampling shoulddepend on how good the current next-hop forwarderR0 is, i.e.,the sampling probabilityPns(R0). In LOF, we set the samplingintervalIns(R0) to be proportional toPns(R0), that is,

Ins(R0) = C × Pns(R0) (12)

whereC is a constant being equal to(N ×K), with N being thenumber of active neighbors thatS has, andK being a constantreflecting the degree of temporal variations in link quality. WesetK to be 20 in our experiments.

The above method of setting the sampling probability and thesampling interval ensures that better forwarders will be sam-pled with higher probabilities, and that the better the current for-warder is, the lower the frequency of exploratory neighbor sam-pling. The sampling probabilities and the sampling interval arere-calculated each time the next-hop forwarder is changed.

4.5 Implementation issues

In this subsection, we discuss implementation issues of LOF.

MAC feedback exfiltration. In LOF, both the status and theMAC latency are required for every unicast transmission. Yetthe default Linux WLAN driverhostap[5] only signals failedunicast transmissions, and it does not signal the unicast MAClatency. Therefore, we modify the Linux kernel and the hostapdriver such that the transmission status, whether success or fail-ure, is always signaled and the MAC latency is reported too.Since we implement LOF, using EmStar [4], as a user-spaceprocess, MAC feedback is sent to the LOF process vianetlinksockets and/proc file system [19].

Given that the LOF process executes in user-space and thatpacket transmission is supported via UDP sockets in EmStar,there is memory copying in the procedure between the LOF pro-cess sending a packet and the hostap driver transmitting thecor-responding 802.11b MAC frame(s). Thus, one issue is how tomap a data transmission at the user-space with the frame trans-mission at the driver and thus the MAC feedback. Fortunately,the data buffers in EmStar, Linux TCP/IP stack, hostap driver,and the SMC WLAN card are managed in the first-in-first-out(FIFO) manner. Therefore, as long as we make sure that eachdata transmission from the LOF process can be encapsulated ina single MAC frame, each MAC feedback can be mapped with

11

the corresponding data transmission if there is no loss of MACfeedback.

Nevertheless, we find that, under stressful conditions, MACfeedback may get lost in two ways:• A MAC feedback will be dropped innetlinksockets if the

socket buffer overflows.• If there is no valid ARP (Address Resolution Protocol) en-

try regarding the unicast destination, a data packet is droppedat the IP layer (without informing the application layer) be-fore even getting to the hostap driver, which means that noMAC feedback will be generated and thus “lost”.

To deal with possible loss of MAC feedback, LOF adopts thefollowing two mechanisms:• To avoid buffer overflow atnetlink sockets, LOF enforces

flow control within a node by enforcing an upper boundon the number of data transmissions whose MAC feedbackhas not come back. (This upper bound is set to 7 in ourexperiments.)• After each data transmission, LOF checks the kernel ARP

table to see if there is a valid entry for the destination of thisunicast packet. In this way, LOF is able to decide whethera MAC feedback will ever come back and act accordingly.

Via the stress tests in both testbeds and outdoor deployment, wefind that the above mechanisms guarantee the reliable deliveryof MAC feedback.

We implement LOF at user-space for the sake of safety andeasy maintenance. As a part of our future work, we are explor-ing implementing LOF in kernel space to see if the process ofreliably fetching MAC feedback can be simplified.

Reliable transport. MAC feedback helps not only in link qual-ity estimation but also in reliable data transport. For example,upon detecting a failed transmission via the MAC feedback, anode can retransmit the failed packet via a new next-hop for-warder. On the other hand, the transmission status carried ina MAC feedback only reflects the reliability at the MAC layer.To guarantee end-to-end reliability, we need to make sure thatpacket delivery is reliable at layers above MAC: First, we needto guarantee the liveness of the LOF routing process, which isenabled by the EmStar process monitoring facilityemrunin ourcurrent implementation; Second, the sender of a packet transmis-sion guarantees that the packet is received by the hostap driver,using the transmission status report from EmStar; Third, sender-side flow control guarantees that there is no queue overflow atthe receiver side.

Node mobility. Given that nodes in most sensor networks arestatic, LOF is not designed to support high degree of mobility.Nevertheless, LOF can deal with infrequent movement of nodesin the following simple manner:• If the base station moves, the new location of the base sta-

tion is diffused across the network;• If a node other than the base station moves, it first broadcast

M copies ofhello-withdrawalpackets, then it restarts itsrouting process.

(Note that a node can detect the movement of itself with the helpof a GPS device.)

Neighbor-table size control. Compared with Berkeley motes,Stargates have relatively large memory and disk size (e.g.,64MB

RAM and 32MB flash disk). Therefore, we adopt a very simplemethod of neighbor-table size control: keeping the best next-hopforwarders according to their ranks. In our experiments, wesetthe maximum neighbor table size as 20. A more detailed studyof the best neighborhood management scheme for Stargates isbeyond the scope of this paper.

5 Experimental evaluation

Via testbeds and field deployment, we experimentally evaluatethe design decisions and the performance of LOF. First, we presentthe experiment design; then we discuss the experimental results.

5.1 Experiment design

Network setup. In our indoor testbed as shown in Figure 2, welet the Stargate at the left-bottom corner of the grid be the basestation, to which the other Stargates need to find routes. Then,we let the StargateS at the upper-right corner of the grid be thetraffic source.S sends packets of length 1200 bytes according tothe ExScal event trace as discussed in Section 2.1 and Figure3.For each protocol we study,S simulates 50 event runs, with theinterval between consecutive runs being 20 seconds. Therefore,for each protocol studied, 950 (i.e.,50 × 19) packets are gener-ated atS.

We have also tested scenarios where multiple senders gener-ate ExScal traffic simultaneously, as well as scenarios where thedata traffic is periodic; LOF has also been used in the backbonenetwork of ExScal. We discuss them in Section 5.3, togetherwith other related experiments.

Protocols studied. We study the performance of LOF in com-parison with that of beacon-based routing, where the latestde-velopment is represented by ETX [13, 33] and PRD [30]:(For

convenience, we do not differentiate the name of a routing metric and the proto-

col implementing it.)

• ETX: expected transmission count. It is a type of geography-unaware distance-vector routing where a node adopts a routewith the minimum ETX value. Since the transmission rateis fixed in our experiments, ETX routing also represents an-other metric ETT [15], where a route with the minimumexpected transmission timeis used. ETT is similar toMAClatencyas used in LOF.• PRD: product of packet reception rate and distance traversed

to the destination. Unlike ETX, PRD is geography-based.In PRD, a node selects as its next-hop forwarder the neigh-bor with the maximum PRD value. The design of PRD isbased on the analysis that assumes geographic-uniformity.

By their original proposals, ETX and PRD use broadcast bea-cons in estimating the respective routing metrics. In this paper,we compare the performance of LOF with that of ETX and PRDas originally proposed in [13] and [30], without considering thepossibility of directly estimating metrics ETX and PRD via datatraffic. This is because the firmware of our SMC WLAN cardsdoes not expose information on the number of retries of a uni-cast transmission. (As a part of our future work, we plan todesign mechanisms to estimate ETX and PRD via data trafficand study the corresponding protocol performance.) In our ex-

12

periments, metrics ETX and PRD are estimated according to themethod originally proposed in [13] and [30]; for instance, broad-cast beacons have the same packet length and transmission rateas those of data packets. Since it has been shown that ETX andPRD perform better than protocols based on metrics such as RTT(round-trip-time) and hop-count [14, 30], we do not study thoseprotocols in this paper.

To verify some important design decisions of LOF, we alsostudy different versions of LOF as follows:5

• L-hop: assumes geographic-uniformity, and thus uses met-ric ELR, as specified by formula (6), instead of ELD;• L-ns: does not use the method of exploratory neighbor sam-

pling;• L-sd: considers, in exploratory neighbor sampling, the neigh-

bors that have been marked as dead;• L-se: performs exploratory neighbor sampling after every

packet transmission.For easy comparison, we have implemented all the protocols

mentioned above in EmStar [4], a software environment for de-veloping and deploying wireless sensor networks.

Evaluation criteria. Reliability is one critical concern in con-vergecast. Using the techniques of reliable transport discussedin Section 4.5, all the protocols guarantee 100% packet deliveryin our experiments. Therefore, we compare protocols in metricsother than reliability as follows:• End-to-end MAC latency: the sum of the MAC latency spent

at each hop of a route. This reflects not only the deliv-ery latency but also the throughput available via a protocol[13, 15].• Energy efficiency: energy spent in delivering a packet to the

base station.• Route stability: the number as well as the degree of route

changes, and the stability of end-to-end packet delivery.

5.2 Experimental results

MAC latency. Using boxplots6, Figure 17 shows the end-to-endMAC latency, in milliseconds, for each protocol. The averageend-to-end MAC latency in both ETX and PRD is around 3 timesthat in LOF, indicating the advantage of data-driven link qualityestimation. The MAC latency in LOF is also less than that of theother versions of LOF, showing the importance of using the rightrouting metric (including not assuming geographic uniformity)and neighbor sampling technique.

To explain the above observation, Figures 18, 19, 20, and 21show the route hop length, per-hop MAC latency, average per-

5Note: we have studied the performance of geography-unawaredistance-vector routingusing data-driven estimation trying to minimize the sum of MAC latency along routes, andwe found that the performance is similar to that of LOF, except that more control packets areused.

6Boxplot is a nice tool for describing the distribution of a data sample:• The lower and upper lines of the “box” are the 25th and 75th percentiles of the sam-

ple. The distance between the top and bottom of the box is the interquartile range.• The line in the middle of the box is the sample median.• The “whiskers”, lines extending above and below the box, show the extent of the rest

of the sample. If there is no outlier, the top of the upper whisker is the maximumof the sample, and the bottom of the lower whisker is the minimum. An outlier is avalue that is more than 1.5 times the interquartile range away from the top or bottomof the box. An outlier, if any, is represented as a plus sign.

• The notches in the box shows the 95% confidence interval for the sample median.

ETX PRD LOF L−hop L−ns L−sd L−se0

50

100

150

200

250

end−

to−

end

MA

C la

tenc

y (m

s)

Figure 17: End-to-end MAC latency

ETX PRD LOF L−hop L−ns L−sd L−se

2

4

6

8

10

12

hop

leng

th o

f rou

tes

Figure 18: Number of hops in a route

ETX PRD LOF L−hop L−ns L−sd L−se0

10

20

30

40

50

60

per−

hop

MA

C la

tenc

y (m

s)

Figure 19: Per-hop MAC latency

ETX PRD LOFL−hopL−ns L−sd L−se0

1

2

3

4

5

aver

age

per−

hop

dist

ance

(m

eter

)

Figure 20: Average per-hop geographic distance

ETX PRD LOFL−hopL−ns L−sd L−se0

0.2

0.4

0.6

0.8

CO

V o

f per

−ho

p ge

o−di

stan

ce

Figure 21: COV of per-hop geographic distance in a route

13

hop geographic distance, and the coefficient of variation (COV)of per-hop geographic distance. Even though the average routehop length and per-hop geographic distance in ETX are approx-imately the same as those in LOF, the average per-hop MAClatency in ETX is about 3 times that in LOF, which explains whythe end-to-end MAC latency in ETX is about 3 times that in LOF.In PRD, both the average route hop length and the average per-hop MAC latency is about twice that in LOF.

From Figure 21, we see that the COV of per-hop geographicdistance is as high as 0.4305 in PRD and 0.2754 in L-hop. There-fore, the assumption of geographic uniformity is invalid, whichpartly explains why PRD and L-hop do not perform as well asLOF. Moreover, the fact that the COV value in LOF is the largestand that LOF performs the best tend to suggest that the networkstate is heterogeneous at different locations of the network.

Energy efficiency. Given that beacons are periodically broad-casted in ETX and PRD, and that beacons are rarely used in LOF,it is easy to see that more beacons are broadcasted in ETX andPRD than in LOF. Therefore, we focus our attention only onthe number of unicast transmissions required for delivering datapackets to the base station, rather than on the broadcast overhead.To this end, Figure 22 shows the number of unicast transmissions

ETX PRD LOFL−hopL−ns L−sd L−se0

5

10

15

20

# of

uni

cast

tran

smis

sion

s

Figure 22: Number of unicast transmissions per packet received

averaged over the number packets received at the base station.The number of unicast transmissions per packet received in ETXand PRD is 1.49 and 2.37 times that in LOF respectively, show-ing again the advantage of data-driven instead of beacon-basedlink quality estimation. The number of unicast transmissions perpacket received in LOF is also less than that in the other ver-sions of LOF. For instance, the number of unicast transmissionsin L-hop is 2.89 times that in LOF.

Given that the SMC WLAN card in our testbed uses IntersilPrism2.5 chipset which does not expose the information on thenumber of retries of a unicast transmission, Figure 22 does notrepresent the actual number of bytes sent. Nevertheless, givenFigure 19 and the fact that MAC latency and energy consumptionare positively related (as discussed in Section 3.1), the aboveobservation on the relative energy efficiency among the protocolsstill holds.

To explain the above observation, Figure 23 shows the num-ber of failed unicast transmissions for the 950 packets generatedat the source. The number of failures in ETX and PRD is 1112and 786 respectively, yet there are only 5 transmission failuresin LOF. Also, there are 711 transmission failures in L-hop. To-gether with Figures 20 and 5(b), we see that there exist reliablelong links, yet only LOF tends to find them well: ETX also uses

ETX PRD LOFL−hopL−ns L−sd L−se0

200

400

600

800

1000

1200

# of

tra

nsm

issi

on fa

ilure

s

Figure 23: Number of failed unicast transmissions

long links, but they are not reliable; L-ns uses reliable links, butthey are relatively shorter.

Route stability. Figure 24 shows the average number of route

ETX PRD LOF L−hop L−ns L−sd L−se−0.5

0

0.5

1

1.5

2

2.5

3

log1

0(#

of r

oute

cha

nges

per

nod

e)

Figure 24: Average number of route changes per node

changes at each node. For readability in spite of the sharp dif-ference in the values across protocols, we present the commonlogarithm (i.e., base 10) of the values along the y-axis. We seethat the average number of route changes in ETX and PRD is 2orders of magnitude greater than that in LOF. As a result, packetstend to be delivered in order in LOF but not in ETX and PRD, asshown in Figure 25 where the reorder distance of a packetp0 is

ETX PRD LOF L−hop L−ns L−sd L−se0

50

100

150

200

250

300

350

pack

et r

eord

er d

ista

nce

Figure 25: Packet reorder distance

the number of packets that are generated later thanp0 but reachthe base station earlier thanp0.

To understand how route changes, we measure the degree ofroute changes and how the hop-length of routes change. Thedegree of route change is measured as follows:• 0: the route taken by a packet is the same as that taken by

the previous packet;• -1: the route taken by a packet is different from that taken

by the previous packet, but they are of equal hop length;• -2: the route taken by a packet is longer, in hop-length, than

that taken by the previous packet;• -3: the route taken by a packet is shorter, in hop-length, than

that taken by the previous packet.

14

Due to space limitations, we present, in Figure 26, only the timeseries of the hop-length and the degree of route change for pro-tocols ETX, PRD, LOF, and L-hop. LOF seldom changes route;

0 200 400 600 800 1000−4

−2

0

2

4

6

8

10

12

series

hop−lengthroute−changes

(a) ETX

0 200 400 600 800 1000−4

−2

0

2

4

6

8

10

12

series

hop−lengthroute−changes

(b) PRD

0 200 400 600 800 1000−4

−2

0

2

4

6

8

series

hop−lengthroute−changes

(c) LOF

0 200 400 600 800 1000−4

−2

0

2

4

6

8

series

hop−lengthroute−changes

(d) L-hop

Figure 26: Time series of route changes

yet route changes occur frequently in ETX and PRD, even whenthe routes are of equal hop-length. This shows that data-drivenlink quality estimation is also more stable than beacon-based linkestimation.

5.3 Other experiments

Multiple senders and periodic traffic. Besides the scenarioof 1 source event traffic which we discussed in detail in the lastsubsection, we have performed experiments where the Stargateat the upper-right corner and its two immediate grid-neighborssimultaneously generate packets according to the ExScal traffictrace. We have also experimented with periodic traffic where1or 3 Stargates (same as those in the case of event traffic) generate1,000 packets each, with each packet being 1200-byte long andthe inter-packet interval being 500 milliseconds. In theseexperi-ments, we have observed similar patterns in the relative protocolperformance as those in the case of 1 source event traffic. Forconciseness, we only present the end-to-end MAC latency forthese three cases, as shown in Figure 27.

Distance-vector data-driven routing and network through-put. We have so far focused on geographic routing in this paper,so that we do not need periodic control packets at all. In practice,however, it may not be feasible to have high-precision locationinformation. In this case, we can adopt the classical distance-vector routing (which is based on the distributed Bellman-Fordalgorithm) with data-driven link estimation. We have imple-mented the distance-vector data-driven routing protocolL-dv inEmStar, and we have experimentally measured its performancein Kansei. For the case where there is one node generating datapackets according to the ExScal traffic trace, Figure 28 showshow the end-to-end MAC latency in L-dv compares with that inother protocols. We see that L-dv has similar performance asLOF, and L-dv performs better than beacon-driven routing ETX

ETX PRD LOF0

50

100

150

200

end−

to−

end

MA

C la

tenc

y (m

s)

(a) event traffic, 3 sendersETX PRD LOF

0

50

100

150

end−

to−

end

MA

C la

tenc

y (m

s)

(b) periodic traffic, 1 sender

ETX PRD LOF0

50

100

150

200

end−

to−

end

MA

C la

tenc

y (m

s)

(c) periodic traffic, 3 senders

Figure 27: End-to-end MAC latency

ETX PRD L−dv LOF0

20

40

60

80

100

120

140

160

end−

to−

end

MA

C la

tenc

y (m

illis

econ

ds)

Figure 28: End-to-end MAC latency: event traffic, 1 sender.(Note: the experiments are done with a 15×6 subgrid of Kan-sei.)

and PRD. We observe similar patterns in terms of other eval-uation metrics (such as energy efficiency) and experimentationsetup (such as multiple senders and periodic traffic).

To understand the impact of different protocols on networkcapacity, we also measure the network throughput by lettingthecorner source node generating data packets as fast as possible.Each data packet contains a payload of 1200 bytes. Figure 29shows the throughput in different protocols. We see that LOF

ETX PRD L−dv LOF0

2

4

6

8

10

good

put (

pack

ts/s

econ

d)

Figure 29: Network throughput (Note: the experiments are donewith a 15×6 grid of Kansei.)

and L-dv yield similar network throughput, and they both sig-nificantly improves the network throughput of ETX and PRD(e.g., up to a factor of 7.78). Our current experimental facil-ity limits the highest achievable one-hop throughput to be 50

15

packets/second, thus the highest achievable multi-hop through-put is 7.14 packets/second [27]. We see that both LOF and L-dvachieve a throughput very close to the highest possible networkthroughput.

The data presented in Figures 28 and 29 are for experimentsexecuted on a 15×6 grid of Kansei. When executing these ex-periments, we were unable to access the complete grid of Kanseidue to some maintenance issues. But we believe the observationswill carry over to other network setups including the completegrid of Kansei.

Field deployment. Based on its well-tested performance, LOFhas been incorporated in the ExScal sensor network field exper-iment [7], where 203 Stargates were deployed as the backbonenetwork, with the inter-Stargate separation being around 45 me-ters. LOF successfully guaranteed reliable and real-time con-vergecast from any number of non-base Stargates to the base sta-tion in ExScal, showing not only the performance of the protocolbut also the stability of its implementation.

6 Related work

The literature on routing in ad hoc and wireless networks is quiterich. In this section, we only review those related most closelyto LOF.

Link properties in 802.11b mesh networks and dense wire-less sensor networks have been well studied in [8], [25], and[35]. They have observed that wireless links assume complexproperties, such as wide-range non-uniform packet delivery rateat different distances, loose correlation between distance andpacket delivery rate, link asymmetry, and temporal variations.Our study on link properties complements existing works by fo-cusing on the differences between broadcast and unicast linkproperties, as well as the impact of interference pattern onthedifferences.

Differences between broadcast and unicast and their impactonthe performance of AODV have been discussed in [28] and [12].Our work complements [28] and [12] by experimentally study-ing the differences as well as the impact of environment, dis-tance, and interference pattern on the differences, which were notthe focus of [28] and [12]. [12] mentioned the difficulty of get-ting MAC feedback and thus focused on the method of beacon-based link estimation. Our work complements [12] by devel-oping techniques for reliably fetching MAC feedback, whichbuild the foundation for data-driven link estimation and rout-ing. To improve the performance of AODV, [28] and [12] alsodiscussed reliability-based mechanisms (e.g., RSSI- and SNR-based ones) for blacklisting bad links. Since it has been shownthat reliability-based blacklisting does not perform as well asETX [17, 13, 33], we do not directly compare LOF to [28] and[12], instead we compare LOF to ETX.

Recently, great progress has been made regarding routing inwireless sensor networks as well as in mesh networks. Rout-ing metrics such as ETX [13, 33] and ETT/WCETT [15] havebeen proposed and shown to perform well in real-world wirelessnetworks [14]. The geography-based metric PRD [30] has alsobeen proposed for energy-efficient routing in wireless sensor net-works. Nevertheless, unicast link properties were still estimated

using broadcast beacons in these works. Our work differs fromexisting approaches by experimentally demonstrating the diffi-culty of precisely estimating unicast link properties via those ofbroadcast beacons, and proposing the data-driven protocolLOFwhere unicast link properties are estimated via the data trafficitself.

Similar to LOF, SPEED [18] also used MAC latency and geo-graphic information in route selection. In parallel with our work,[26] proposed NADV which also uses information from MAClayer. While focusing on real-time packet delivery and a gen-eral framework for geographic routing, [18] and [26] did notfocus on the protocol design issues in data-driven link estima-tion and routing. They did not study the differences betweenbroadcast and unicast link properties. They did not consider theimportance of appropriateexploratory neighbor samplingeither.SPEED switches next-hop forwarders after every packet trans-mission (as in L-se), and NADV does not perform exploratoryneighbor sampling (as in L-ns), both of which degenerate net-work performance as shown in Section 5. Complementary toSPEED and NADV, moreover, we have analyzed the small sam-ple size requirement in LOF, which shows the feasibility of data-driven link estimation. While [18] and [26] have evaluated theirmethods via simulation, we have studied the systems issues inreliably fetching MAC feedback and evaluated LOF via experi-ments in real networks with realistic traffic trace. Finally, [26]did not compare the performance of NADV with that of ETX andPRD. MAC feedback was also used in [24] and [16], but they didnot focus on the differences between broadcast and unicast linkproperties and the protocol design isssues specific to data-drivenlink estimation and routing either.

The problem of local minimum or geographic void has beendealt with in routing protocols such as GPSR [23]. In this pa-per, therefore, we have not considered this problem since itisindependent of our major concerns — data-driven link estima-tion and routing. As a part of our future work, we plan to in-corporate techniques of dealing with geographic void into LOF,by adapting the definition of “effective geographic progress” (inSection 3.2) and routing around void. The impact of localizationerrors on geographic routing has been studied in [29]. In LOF,we adopted a separate software component that fine tunes theGPS readings to reduce localization inaccuracy, as also used inthe field experiment ExScal [7].

7 Concluding remarks

Via experiments in testbeds of 802.11b networks, we have demon-strated the difficulties of precisely estimating unicast link prop-erties via broadcast beacons. To circumvent the difficulties, wehave proposed to estimate unicast link properties via data trafficitself, using MAC feedback for data transmissions. To this end,we have modified the Linux kernel andhostapWLAN driverto provide feedback on the MAC latency as well as the statusof every unicast transmission, and we have built system soft-ware for reliably fetching MAC feedbacks. Based on these sys-tem facilities, we have demonstrated the feasibility as well aspotential benefits of data-driven routing by designing protocolLOF. LOF mainly used three techniques for link quality estima-

16

tion and route selection: initial sampling, data-driven adaptation,and exploratory neighbor sampling. With its well tested perfor-mance and implementation, LOF has been successfully used tosupport convergecast in the backbone network of ExScal, where203 Stargates have been deployed in an area of 1260 meters by288 meters.

In this paper, we have focused on data-driven link estimationand routing in 802.11 networks. But we believe that the conceptof data-driven link estimation also applies to other sensornet-works such as those using IEEE 802.15.4 radios, since temporalcorrelation in link properties also leads to estimation inaccuracyin these networks [11]. Given the limitation of our 802.11 radios,we have not applied the technique of data-driven link estimationto metrics such as ETX [13] or RNP [11]. We plan to explorethese directions in our future work.

Besides saving energy by avoiding periodic beaconing, LOFfacilitates greater extent of energy conservation, because LOFdoes not require a node to be awake unless it is generating orforwarding data traffic. LOF also helps in enhancing networksecurity, since the network is less exposed. More detailed studyof the impact of data-driven routing on energy efficiency andsecurity is a part of our future work.

Acknowledgment

This material is based upon work supported by the National Sci-ence Foundation under Grant No. 0341703, and by DARPA un-der contract OSU-RF #F33615-01-C-1901. We thank VinayakNaik and Emre Ertin for their help in the discussion and experi-mentation. We appreciate the help from UCLA EmStar team fortheir help in answering questions regarding EmStar.

References[1] Broadband seismic network, mexico experiment (mase).

http://research.cens.ucla.edu/.[2] Codeblue: Wireless sensor networks for medical care.

http://www.eecs.harvard.edu/∼mdw/proj/codeblue/.[3] Crossbow technology inc. http://www.xbow.com/.[4] EmStar: Software for wireless sensor networks.

http://cvs.cens.ucla.edu/emstar/.[5] Linux WLAN driver hostap. http://hostap.epitest.fi/.[6] Wireless LAN medium access control (MAC) and physical layer (PHY)

specifications. InANSI/IEEE Std 802.11. 1999.[7] Exscal project. http://www.cse.ohio-state.edu/exscal, 2004.[8] D. Aguayo, J. Bicket, S. Biswas, G. Judd, and R. Morris. Link-level mea-

surements from an 802.11b mesh network. InACM SIGCOMM, pages121–132, 2004.

[9] A. Arora, E. Ertin, R. Ramnath, M. Nesterenko, and W. Leal. Kansei: Ahigh-fidelity sensing testbed.IEEE Internet Computing, March 2006.

[10] B. Awerbuch, D. Holmer, and H. Rubens. High throughput route selectionin multi-rate ad hoc wireless networks. Technical report, Johns HopkinsUniversity, 2003.

[11] A. Cerpa, J. Wong, M. Potkonjak, and D. Estrin. Temporalproperties oflow power wireless links: Modeling and implications on multi-hop routing.In ACM MobiHoc, pages 414–425, 2005.

[12] I. Chakeres and E. Belding-Royer. The utility of hello messages for deter-mining link connectivity. InWPMC, 2002.

[13] D. S. J. D. Couto, D. Aguayo, J. Bicket, and R. Morris. A high-throughputpath metric for multi-hop wireless routing. InACM MobiCom, pages 134–146, 2003.

[14] R. Draves, J. Padhye, and B. Zill. Comparison of routingmetrics for staticmulti-hop wireless networks. InACM SIGCOMM, pages 133–144, 2004.

[15] R. Draves, J. Padhye, and B. Zill. Routing in multi-radio, multi-hop wire-less mesh networks. InACM MobiCom, pages 114–128, 2004.

[16] R. Fonseca, O. Gnawali, K. Jamieson, and P. Levis. Four-bit wireless linkestimation. InACM HotNets, 2007.

[17] O. Gnawali, M. Yarvis, J. Heidemann, and R. Govindan. Interaction ofretransmission, blacklisting, and routing metrics for reliability in sensornetwork routing. InIEEE SECON, pages 34–43, 2004.

[18] T. He, J. Stankovic, C. Lu, and T. Abdelzaher. SPEED:A stateless protocolfor real-time communication in sensor networks. InIEEE ICDCS, 2003.

[19] T. Herbert.TheL inux TCP/IPStack: Networking for Embedded Systems.Charles River Media, 2004.

[20] F. Hillier and G. Lieberman. Introduction to Operations Research.McGraw-Hill, 2001.

[21] M. Hollander.Nonparametric statistical methods. Wiley, 1999.[22] R. Jain. The Art of Computer Systems Performance Analysis. John Wiley

& Sons, Inc., 1991.[23] B. Karp and H. T. Kung. GPSR: Greedy perimeter statelessrouting for

wireless networks. InACM MobiCom, pages 243–254, 2000.[24] K.-H. Kim and K. G. Shin. On accurate measurement of linkquality in

multi-hop wireless mesh networks. InACM MobiCom, 2006.[25] D. Kotz, C. Newport, and C. Elliott. The mistaken axiomsof wireless-

network research. Technical Report TR2003-467, DartmouthCollege,Computer Science, July 2003.

[26] S. Lee, B. Bhattacharjee, and S. Banerjee. Efficient geographic routing inmultihop wireless networks. InACM MobiHoc, pages 230–241, 2005.

[27] J. Li, C. Blake, D. D. Couto, H. Lee, and R. Morris. Capacity of ad hocwireless networks. InACM MobiCom, pages 61–69, 2001.

[28] H. Lundgren, E. Nordstrom, and C. Tschudin. Coping withcommunicationgray zones in ieee 802.11b based ad hoc networks. InACM WoWMoM,pages 49–55, 2002.

[29] K. Seada, A. Helmy, and R. Govindan. On the effect of localization errorson geographic face routing in sensor networks. InIEEE-ACM IPSN, 2004.

[30] K. Seada, M. Zuniga, A. Helmy, and B. Krishnamacari. Energy-efficientforwarding strategies for geographic routing in lossy wireless sensor net-works. InACM SenSys, 2004.

[31] R. Thisted.Elements of statistical computing. Chapman & Hall, Ltd., 1988.[32] A. Willig. A new class of packet- and bit-level models for wireless chan-

nels. InIEEE PIMRC, 2002.[33] A. Woo, T. Tong, and D. Culler. Taming the underlying challenges of

reliable multihop routing in sensor networks. InACM SENSYS, pages 14–27, 2003.

[34] H. Zhang, A. Arora, Y. R. Choi, and M. Gouda. Reliable bursty converge-cast in wireless sensor networks. InACM MobiHoc, 2005.

[35] J. Zhao and R. Govindan. Understanding packet deliveryperformance indense wireless sensor networks. InACM SenSys, pages 1–13, 2003.

17