Arterial travel time estimation based on vehicle re-identification using magnetic sensors:...

21
Arterial travel time estimation based on vehicle re-identification using wireless magnetic sensors Karric Kwong a,1 , Robert Kavaler a,1 , Ram Rajagopal b,2 , Pravin Varaiya b, * a Sensys Networks, Inc., 2560 Ninth Street, Berkeley, CA 94710, United States b Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720-1700, United States article info Article history: Received 24 July 2008 Received in revised form 1 April 2009 Accepted 2 April 2009 Keywords: Real-time travel time estimation Vehicle re-identification Arterial performance measures Queue length Discharge rate Magnetic signature abstract A practical system is described for the real-time estimation of travel time across an arterial segment with multiple intersections. The system relies on matching vehicle signatures from wireless sensors. The sensors provide a noisy magnetic signature of a vehicle and the precise time when it crosses the sensors. A match (re-identification) of signatures at two locations gives the corresponding travel time of the vehicle. The travel times for all matched vehicles yield the travel time distribution. Matching results can be processed to provide other important arterial performance measures including capacity, volume/capac- ity ratio, queue lengths, and number of vehicles in the link. The matching algorithm is based on a statistical model of the signatures. The statistical model itself is estimated from the data, and does not require measurement of ‘ground truth’. The procedure does not require measurements of signal settings; in fact, signal settings can be inferred from the matched vehicle results. The procedure is tested on a 1.5 km (0.9 mile)-long segment of San Pablo Avenue in Albany, CA, under different traffic conditions. The segment is divided into three links: one link spans four intersections, and two links each span one intersection. Ó 2009 Elsevier Ltd. All rights reserved. 1. Introduction and previous work Estimating arterial travel time is difficult. Since the movement of vehicles is interrupted by signals, estimates based on speeds measured by loop detectors or radar are inaccurate. Approaches for estimating travel times on arterial links include speed vs. volume to capacity ratio relationships or procedures based on the Highway Capacity Manual. The latter calculates average travel time as the sum of the running time, based on arterial design characteristics, and the intersection delay, based on a deterministic point delay model. These approaches are not suited for real-time applications with variable traffic conditions. Statistical models have been proposed for estimating travel times from surveillance data. For example, Zhang (1999) esti- mates link-speed as a function of volume to capacity ratio and volume and occupancy measured by loop detectors. Since the estimation itself requires collection of travel times, the model is site-specific and impractical to implement. By contrast, Skabardonis and Geroliminis (2005) develop a generally applicable kinematic wave model to construct a link travel time estimate from 30-s flow and occupancy data collected at an upstream loop detector, together with the exact 0968-090X/$ - see front matter Ó 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.trc.2009.04.003 * Corresponding author. Tel.: +1 510 642 5270; fax: +1 510 642 1785. E-mail addresses: [email protected] (K. Kwong), [email protected] (R. Kavaler), [email protected] (R. Rajagopal), [email protected] (P. Varaiya). 1 Tel.: +1 510 548 4620; fax: +1 510 548 8264. 2 Tel.: +1 510 642 5270; fax: +1 510 642 1785. Transportation Research Part C 17 (2009) 586–606 Contents lists available at ScienceDirect Transportation Research Part C journal homepage: www.elsevier.com/locate/trc

Transcript of Arterial travel time estimation based on vehicle re-identification using magnetic sensors:...

Transportation Research Part C 17 (2009) 586–606

Contents lists available at ScienceDirect

Transportation Research Part C

journal homepage: www.elsevier .com/locate / t rc

Arterial travel time estimation based on vehicle re-identificationusing wireless magnetic sensors

Karric Kwong a,1, Robert Kavaler a,1, Ram Rajagopal b,2, Pravin Varaiya b,*

a Sensys Networks, Inc., 2560 Ninth Street, Berkeley, CA 94710, United Statesb Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720-1700, United States

a r t i c l e i n f o

Article history:Received 24 July 2008Received in revised form 1 April 2009Accepted 2 April 2009

Keywords:Real-time travel time estimationVehicle re-identificationArterial performance measuresQueue lengthDischarge rateMagnetic signature

0968-090X/$ - see front matter � 2009 Elsevier Ltddoi:10.1016/j.trc.2009.04.003

* Corresponding author. Tel.: +1 510 642 5270; faE-mail addresses: [email protected]

[email protected] (P. Varaiya).1 Tel.: +1 510 548 4620; fax: +1 510 548 8264.2 Tel.: +1 510 642 5270; fax: +1 510 642 1785.

a b s t r a c t

A practical system is described for the real-time estimation of travel time across an arterialsegment with multiple intersections. The system relies on matching vehicle signaturesfrom wireless sensors. The sensors provide a noisy magnetic signature of a vehicle andthe precise time when it crosses the sensors. A match (re-identification) of signatures attwo locations gives the corresponding travel time of the vehicle. The travel times for allmatched vehicles yield the travel time distribution. Matching results can be processed toprovide other important arterial performance measures including capacity, volume/capac-ity ratio, queue lengths, and number of vehicles in the link. The matching algorithm isbased on a statistical model of the signatures. The statistical model itself is estimated fromthe data, and does not require measurement of ‘ground truth’. The procedure does notrequire measurements of signal settings; in fact, signal settings can be inferred from thematched vehicle results. The procedure is tested on a 1.5 km (0.9 mile)-long segment ofSan Pablo Avenue in Albany, CA, under different traffic conditions. The segment is dividedinto three links: one link spans four intersections, and two links each span one intersection.

� 2009 Elsevier Ltd. All rights reserved.

1. Introduction and previous work

Estimating arterial travel time is difficult. Since the movement of vehicles is interrupted by signals, estimates based onspeeds measured by loop detectors or radar are inaccurate.

Approaches for estimating travel times on arterial links include speed vs. volume to capacity ratio relationships orprocedures based on the Highway Capacity Manual. The latter calculates average travel time as the sum of the running time,based on arterial design characteristics, and the intersection delay, based on a deterministic point delay model. Theseapproaches are not suited for real-time applications with variable traffic conditions.

Statistical models have been proposed for estimating travel times from surveillance data. For example, Zhang (1999) esti-mates link-speed as a function of volume to capacity ratio and volume and occupancy measured by loop detectors. Since theestimation itself requires collection of travel times, the model is site-specific and impractical to implement.

By contrast, Skabardonis and Geroliminis (2005) develop a generally applicable kinematic wave model to construct a linktravel time estimate from 30-s flow and occupancy data collected at an upstream loop detector, together with the exact

. All rights reserved.

x: +1 510 642 1785.(K. Kwong), [email protected] (R. Kavaler), [email protected] (R. Rajagopal),

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 587

times of the red and green phases. Their procedure can be explained as follows: The upstream detector counts the number nof vehicles that arrive in a 30-s interval, say during ½s; sþ 30�. These n vehicles are assumed to cross the detector at uniformlyspaced times sþ 30i=n; i ¼ 1; . . . ;n. From the assumed or estimated free flow travel time Tf , these vehicles will arrive at theintersection at times Tf þ sþ 30i=n. Knowing the signal phase at these arrival times and the previously calculated queue atthe intersection, and using the kinematic wave model (with assumed or estimated congestion wave speed and jam density),one can calculate the delay faced by each of the n vehicles, and the queue remaining at the end of the 30 s. The procedure isthen repeated, to yield the average travel time across the link and the average delay.

Liu and Ma (2009) use a similar model. However, by measuring individual vehicle detector actuations they know the ex-act times that the vehicles crossed the upstream detector, instead of assuming that these are uniformly spaced times. Therest of their procedure is similar. The models of individual vehicle trajectories in both Skabardonis and Geroliminis(2005) and Liu and Ma (2009) are more elaborate than the uniform free flow speed assumed above, as they take into accountthe vehicle’s deceleration as it approaches a queue and its acceleration as it departs from the signal. However, this elabora-tion does not measurably affect the average travel time and delay estimates (Liu and Ma, 2009, Fig. 10).

The two approaches outlined above have limitations. They require precise signal phase times, and these must be synchro-nized with the detector times. Moreover, for a link with multiple intersections, each intersection must be instrumented. Suchinstrumentation is expensive. Second, both approaches require knowledge of parameters such as free flow travel time, whichmay not be constant across the entire range of traffic conditions, and lead to bias in the estimates. Third, average travel timeand delay are insufficient to calculate interesting arterial performance measures provided by the scheme proposed here, asseen in Section 3.

In principle, vehicle re-identification schemes overcome these limitations. These schemes work as follows: Sensors placedat the two ends of a link record the times when a vehicle crosses them and measure its signature. When a vehicle’s signatureis matched at the two sensors, its travel time is obtained. Signal phase information is not needed. If sufficiently many vehi-cles are re-identified, the travel time distribution can be estimated. Vehicles can be easily re-identified by matching uniquetags or license plates; but besides raising privacy concerns, these schemes are too expensive to deploy over an arterialnetwork.

Re-identification schemes for freeway travel times have been demonstrated. Sun et al. (1999) match waveforms frominductive loops produced by the passage of a vehicle. The waveforms are first normalized using independently measuredspeeds. Features from the normalized waveform pairs are extracted and compared in a multi-criterion optimization frame-work to obtain the best match. Coifman (1999) compares lengths of vehicle platoons at the two detector locations. Thelength estimate too requires independent speed measurements.

Ndoye et al. (2008) and Oh and Ritchie (2002) report experiments using inductive loop signatures. Again, vehicle speed isused to normalize the raw waveform and produce a speed-independent signature. The speed normalization procedure as-sumes that vehicle speed is constant. If a vehicle is accelerating or decelerating, this assumption is invalid: as Ndoyeet al. (2008) report, the rate of correct matching then drops drastically. Oh and Ritchie (2002) only report results for anon-peak period. Neither scheme would perform well in a link with significant acceleration and deceleration, caused by traf-fic signals on an arterial or by vehicles arriving at a queue behind a freeway bottleneck. Platoon lengths used in Coifman(1999) would not work well for the additional reason that signalized intersections would break platoons up.

Sun et al. (2004) use vehicle color (extracted from video images) in addition to the loop-based signature and speed in adata ‘fusion’ algorithm that achieves a high matching rate for vehicle platoons in a link without an intersection. The selectionof the parameters of the fusion algorithm requires an extensive and expensive collection of ‘ground truth’ measurements.The fusion algorithm weights loop signature, speed, vehicle color and platoon traversal time. In the best fusion scheme, colorreceives a weight of 95%. The scheme is impractical and would not work in a link with an intersection that breaks platoonsup.

This paper presents a system to estimate the travel time distribution of a single arterial link, spanning several signalizedintersections. The system is based on matching individual vehicle signatures obtained from wireless magnetic sensors placedat the two ends of the link. The signature consists of the sequence of peak values of the ‘raw’ magnetic signal. The peak valuesare independent of the vehicle speed, so speed measurements are not needed. The matching scheme is anonymous, whereasapproaches that rely on reading licence plates or RFID tags or tracking cell phones risk violating privacy.

Unlike Skabardonis and Geroliminis (2005) and Liu and Ma (2009) the scheme requires no signal phase measurements.Indeed, signal phases, queue lengths, delay distributions and other performance measures can all be evaluated from thematched vehicles. The scheme is tested on a 1.5 km (0.9 mile)-long segment of San Pablo Avenue in Albany, CA, spanningsix intersections. The segment is divided into three links: one link spans four intersections, and two links each span oneintersection. The peak hour per lane flow over the segment is 500–600 vph.

The paper is organized as follows: The test site and measurement system are described in Section 2. Test results are pre-sented in Section 3. The matching problem and the statistical signature model used to evaluate matching algorithms occupySection 4. The optimal unconstrained matching algorithm and the optimal constrained matching algorithm are described inSections 5 and 6, respectively. (The results of Section 3 are based on optimal constrained matching.) A real-time version ofthe optimal constrained algorithm is presented in Section 7. Estimation of the statistical signature model without measure-ment of ground truth is described in Section 8. Major conclusions and some suggestions for future work are collected in Sec-tion 9. Some of the more technical material is presented in Appendix.

588 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

2. Test site and measurement system

On the left in Fig. 1 is a map of the 1.5 km (0.9 mile)-long test segment on southbound San Pablo Avenue in Albany, CA,starting at A (Fairmount) and ending at D (Buchanan). The segment is divided into three links, A! B; B! C; C ! D. LinkA! B spans four signalized intersections (the three circles plus the intersection at Washington), links B! C and C ! D eachspan one signalized intersection. Sensors at A; B; C, and D are located immediately downstream (12 m) of the correspondingintersection. Each link thus has one upstream and one downstream sensor, as indicated in the middle of Fig. 1. For example,link C ! D has its upstream sensor at C (12 m downstream of the intersection at Solano) and its downstream sensor at D(12 m downstream of the intersection at Buchanan).

San Pablo Avenue has two lanes, each with its own sensors. Vehicles in each lane are matched separately. The results be-low are for lane 1, the ‘‘fast” lane next to the median. We now describe the data. Consider the link illustrated in the middle ofthe figure. During a measurement time interval, vehicles indexed i ¼ 1; . . . ;N cross the upstream sensor at times s1 < s2 < � � �.Vehicles indexed j ¼ 1; . . . ;M cross the downstream sensor at times t1 < t2 < � � �. The upstream sensor measures the ‘signa-ture’ Xi of each vehicle i that crosses it and the corresponding time si. The downstream sensor measures the signature Yj ofeach vehicle j that crosses it and the corresponding time tj. Thus the measurement data consists of two arraysfðsi;XiÞ; i ¼ 1; . . . ;Ng and fðtj;YjÞ; j ¼ 1; . . . ;Mg. Vehicle speed is not measured. As suggested by the figure on the right, up-stream vehicle 1 is the same as downstream vehicle 3, which will be denoted as X1 ! Y3; similarly X3 ! Y5. On the otherhand upstream vehicle 2 has turned (either into the other lane or at the intersection) before reaching the downstream sen-sor, denoted X2 ! s; similarly s! Y4 indicates that downstream vehicle 4 turned into the lane and did not cross the up-stream sensor.

Each sensor consists of an array of seven nodes, embedded in the pavement 30 cm (12 in.) apart, flush with the pavementsurface and perpendicular to the direction of motion. A node has electronic circuits that incorporate into a system a mag-neto-resistive sensor measuring the earth’s three-dimensional magnetic field, a radio transceiver, an antenna, a microproces-sor, memory and a battery. The assembly is enclosed in a 7:4 cm� 7:4 cm� 4:9 cm ð2:9 in:� 2:9 in:� 1:9 in:Þ hardenedplastic cube. For installation, a 10 cm (4 in.) diameter hole is drilled approximately 5.7 cm (2.25 in.) deep, the node is placedin the hole, which is then sealed with fast-drying epoxy. The installation takes 10 min. The magnetic field is distorted as avehicle goes over a node. The node records the measured field sampled at a rate of 128 Hz and extracts from the record afeature vector of the vehicle. The seven feature vectors constitute the vehicle’s signature. The nodes transmit the time whenthe vehicle crossed it together with its signature via radio to a roadside pole-mounted ‘access point’. The access point in turntransmits the data to a server using a cellular service. The server analyzes the data in real time. The nodes and access pointare products of Sensys Networks, Inc. and described in Haoui et al. (2008). Fig. 2 shows how the sensor is deployed at one testsite location.

time

s1 s2 s3 s4 si

t1 t2 t3 t4 tjt5

τ

τ

tM

sN

12m

upsensor

12m

downsensor

Fig. 1. Test site, sensor locations, and vehicle matches.

Fig. 2. Deployment of sensor nodes and access point at one test site location.

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 589

As we have seen, data arriving at the server from a link during a time interval consist of the upstream arrayfðsi; XiÞ; i ¼ 1; . . . ;Ng and the downstream array fðtj; YjÞ; j ¼ 1; . . . ;Mg. Results are reported for a 30-min peak period (1–1:30PM) and a 60-min off-peak period (11PM–12AM); during the 30-min peak period between 200 and 300 vehicles tra-versed each link of the test segment.

The matching of upstream and downstream arrays is done in two steps. In the first or signal processing step, each pair ofðXi; YjÞ of upstream and downstream signatures is compared to produce a measure of dissimilarity or distance dði; jÞP 0between them, i.e., the signal processing algorithm implements a function d, dði; jÞ ¼ dðXi; YjÞ. The signal processing stepthereby reduces the two signature arrays to the N �M matrix D ¼ fdði; jÞj1 6 i 6 N; 1 6 j 6 Mg. The signal processing stepis described in Appendix. In the second step the matching problem is formulated and solved.

The solution to the matching problem is a set of matches of the form i! j; i! s, or s! j, indicating respectively thatupstream vehicle i is declared to be downstream vehicle j; i cannot be matched to any downstream vehicle, or j cannotbe matched to any upstream vehicle. For a vehicle match i! j; si and tj are the times the vehicle was at the beginningand end of the link, so ðtj � siÞ is its link travel time. The other matches may indicate turns. The next section discussesthe solution of the matching problem for the test site.

3. Test results and analysis

We first consider the solution of the matching problem for the single link B! C, and then analyze the implications of thesolution. The solution is obtained using the optimal constrained matching algorithm of Section 6.

Fig. 3 shows one way to display the matches for link B! C during the peak hour period 1–1:30PM on May 23, 2008. Thex-axis is the time in 104 s that a matched vehicle crosses the upstream sensor at B (top) or the downstream sensor at C (bot-tom). (Midnight is 0 s, 4:68� 104=3600 ¼ 13 h or 1PM, and 4:86� 104=3600 is 1:30PM.) The y-axis is the matched vehicle’stravel time in seconds. For illustration, two matches are highlighted: one vehicle with start time s1, end time t1 and traveltime t1 � s1, and another vehicle with corresponding times s10; t10; t10 � s10. The travel time samples can be used to estimatetravel time distributions as in Figs. 6 and 7 below. Vehicles that are not matched are not shown in Fig. 3 (but see Fig. 5). Fig. 3also reveals information about delay, signal phase, queues, number of vehicles in the link, and turning movements.

3.1. Delay

The free flow travel time Tf is the shortest travel time, 20–30 s in these plots, as indicated in the lower plot. The differencebetween a vehicle’s travel time and the free flow travel time is the delay it encountered at the intersection from decelerationand acceleration, and waiting in queue or at the stop bar during the red phase. Thus it is straightforward to estimate the totaldelay, average delay, and delay statistics such as percent of vehicles experiencing delay larger than some amount.

3.2. Signal phase

The sensor at C is 12 m downstream of the intersection at C, so the end time of a vehicle is virtually the same as the timethat it crosses the intersection. Hence the signal phase at C must be green at every vehicle end time. The arrival of a cluster or

Fig. 3. Sample results for link B! C, May 23, 2008, 1–1:30PM: travel time vs. start time (top), travel time vs. end time (bottom). Estimated red and greensignal phases are denoted by black and white rectangles, respectively.

590 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

platoon of vehicles indicates that these vehicles had formed a queue at the stop bar. So the end time of the first vehicle in aplatoon must coincide with the start of the green phase (and end of the red phase) at C. From these facts we can preciselyinfer the start of the green phase at C. On the other hand, the first vehicle in the platoon must be the first to arrive in the redphase. Thus the start of the red phase at C must occur before the start time of the first vehicle in the platoon plus its free flowtravel time. This reasoning leads to the construction of the signal phases as illustrated in the bottom of the figure in whichthe red and green phases are denoted by black and white rectangles, respectively. The signal at C is actuated, so the phasedurations are not constant.

3.3. Discharge rate

Intersection capacity estimates are based on the maximum rate at which a queue discharges during a green phase. Thismaximum rate r is easily obtained from the times ftjg when vehicles cross the downstream sensor by finding the minimumtime needed to discharge (say) five vehicles:

r ¼ 5minjðtjþ5 � tjÞ

:

By taking different numbers (than 5) more detailed characteristics of queue discharge can also be obtained (Lin and Tho-mas, 2005). By analyzing discharges over several cycles one may obtain a statistical characterization of discharge rates.

3.4. Queues

A platoon arriving at the downstream sensor indicates that these vehicles were earlier queued at the stop bar. Vehicles inthe platoon arrive in order of their position in the queue. The first vehicle in the queue will experience the longest delay (andtravel time) and successive vehicles in the queue will experience shorter queueing delays. Thus from the top plot in the fig-ure we can infer that the vehicle with start time s1 is first in the queue and it waited in the queue for time t1 � s1 � Tf , thesecond vehicle in the queue waited for time t2 � s2 � Tf , and so on. The vehicle with start time s10 (or perhaps the vehiclewith start time s8) is the vehicle that joined the queue after it had cleared and hence it faced no delay. Since these eightor 10 vehicles arrived in a platoon, we may infer that the queue size reached a maximum value of 10 or 8 vehicles duringthis cycle.

3.5. Cycle failure

In most cycles the travel time of the first vehicle in a platoon is larger than the travel time of the preceding vehicle, andthe latter equals the free flow travel time, Tf . This implies that the queue at the stop bar is cleared in these cycles, indicating

upstream time

downstream time

s5 s7 s8 s9 s10s6

t20 t21 t22

t

t19

s11

t18

upsensor

downsensor

τ

τ

Fig. 4. Calculation of number of vehicles Nt in link at time t. Dark squares are vehicles that have been matched before t.

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 591

no cycle failure. But in Fig. 3 the vehicle with start time s1 left B with the preceding platoon, but did not leave C with thatplatoon. The vehicle with start time s1 is delayed by more than one red phase, indicating cycle failure. Thus one can estimatecycle failure (Zheng et al., 2006).

3.6. Vehicles in link

Fig. 4 shows how to estimate NðtÞ, the number of vehicles in the link between the upstream and downstream sensors attime t.

Let i; si denote a vehicle’s index and the time it crosses the upstream sensor, and let j; tj correspond to the downstreamsensor. Let J be the index of the downstream vehicle with the largest time tJ 6 t that is matched with some upstream vehiclewith index I, say. Let K be the index of the upstream vehicle with the largest time sK 6 t. In the figure, J ¼ 19; I ¼ 6; K ¼ 10.The upstream vehicle with index I left the link before time t. So if Imax is the largest index of an upstream vehicle that left thelink before t; Imax P I. On the other hand, K is the largest index of an upstream vehicle that is still in the link at time t. Henceif there are no turning movements,

NðtÞ ¼ K � Imax 6 K � I:

Equality will not hold above only if Imax > I, which happens only when upstream vehicle Imax is not matched. (In the figure,Imax ¼ I ¼ 6; K � I ¼ 4.) If the matching probability is p, on average Imax � I ¼ p�1 � 1, so if p > 0:5, the estimate K � I differsfrom NðtÞ by at most 1 on average.

If there are turning movements, the bound above changes to

NðtÞ 6 K � I � nout þ nin;

in which nout is the number of upstream vehicles with index between I and K (like vehicle with index i ¼ 8 in the figure) thatwent out of the link before crossing the downstream sensor and nin is the number of vehicles with index larger than J (likej ¼ 20) that came into the link without crossing the upstream sensor.

Remark. If sensors are placed at the beginning and end of an on- or off-ramp, the scheme above will give an accurate real-time estimate of the number NðtÞ of vehicles on the ramp at any time t. On a ramp, there are no turning movements, sonout ¼ nin ¼ 0. Note, too, that in terms of the notation above, tJ � sI is the ramp delay experienced by the most recentdeparting vehicle. The scheme can also be used to estimate the number of vehicles (and hence the average headway) withina freeway segment to obtain the true spatial density. Such estimates may be useful in setting a ‘variable speed control’ policy.

3.7. Turning

Fig. 5 conveys more information than Fig. 3. Matched vehicles are connected; unmatched vehicles are shown as isolatedcrosses. The signal phases at C were inferred as explained above. It appears that the unmatched vehicles that cross C during a

Fig. 5. Matched vehicles on link B! C are connected; unmatched vehicles are shown as isolated crosses. Vehicles more from bottom ðBÞ to top ðCÞ.

0 50 100 150 2000

0.05

0.1

0.15

0.2

0.25

A to B 99/211

seconds0 50 100 150 200

0

0.05

0.1

0.15

0.2

0.25

B to C 138/232

seconds

0 50 100 150 2000

0.05

0.1

0.15

0.2

0.25

C to D 115/225

seconds0 50 100 150 200

0

0.05

0.1

0.15

0.2

0.25

A to D 87/211

seconds

Mean = 92.4, Var = 860 Mean = 58.4, Var = 784

Mean = 36.8, Var = 109 Mean = 187.7, Var = 1455

Fig. 6. Travel time distributions for May 23, 2008, 1–1:30PM.

592 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

Table 1Mean and variance of travel times.

Link Mean Variance

A! B 92.4 860B! C 58.4 784C ! D 36.8 109Sum 187.6 1753A! D 187.7 1455

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 593

red phase are those that turn into San Pablo from Solano (see map in Fig. 1). Similarly, some unmatched vehicles at B appearto have turned from San Pablo into Solano without crossing the sensor at Solano.

3.8. Travel time distribution

From the matched vehicles one obtains the four travel time distributions of Fig. 6.The x-axis is travel time in seconds and the y-axis is probability. The travel time distributions of Fig. 6 also make clear that

the mean travel time is an insufficient indication of travel time reliability.Table 1 gives the means and variances of the four travel time distributions in Fig. 6. The sum of the means and variances

for links A! B, B! C and C ! D can be compared with the mean and variance for link A! D. As expected, the mean traveltime over A! D is the sum of its component link mean travel times. Note that the travel time distribution for A! D is ob-tained by directly matching signatures at A and D, which shows that only two sensors are needed to estimate the travel timefor the entire segment!

3.9. Matching rate

The legend 99/211 in the plot for link A! B in Fig. 6 means that 211 vehicles crossed the sensor at A of which 99 werematched at B. If a fraction s of vehicles entering A turned before crossing B, the matching rate is 99=ð1� sÞ211. For anestimated s ¼ 0:3 (there are four intersections between A and B) this gives a matching rate of 99=½0:7� 211� or 67%. Thematching rates for the other links are 138=½0:8� 232� ¼ 74% for B! C; 115=½0:8� 225� ¼ 64% for C ! D; and87=½0:6� 211� ¼ 69% for A! D (which has six intersections).

3.10. Peak vs. off-peak

Fig. 7 displays the travel time distributions for links A! B and C ! D for an off-peak period, 11–12PM. Comparison withFig. 3 shows that the mean travel time is lower: 77.12 s vs. 92.4 s for link A! B; 29.6 s vs. 36.8 s for link C ! D. More inter-estingly, about 25% of peak vehicles face the same travel time as off-peak vehicles.

0 50 100 150 2000

0.05

0.1

0.15

0.2

0.25

seconds0 50 100 150 200

seconds0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

A to B 31/48

Mean = 77.2, Var = 1335C to D 115/225

Mean = 29.6, Var = 1348

Fig. 7. Travel time distributions for May 23, 2008, 11–12PM.

Fig. 8. The distance matrix (left) and the optimum constrained match (right) for link C ! D, May 23, 2008, 1–1:30PM.

594 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

3.11. Optimum matching

The results described above are obtained from the optimum matching algorithm presented in Section 6. The algorithmtakes the distance matrix as data and produces the optimum match, as illustrated in Fig. 8. The plot on the left is a gray scalecoding of the distance matrix. There is one pixel for the distance between the signatures of each pair of vehicles; a darkercolor indicates shorter distance. Clearly visible in the plot is a dark diagonal line of short distances. The graph on the rightis the assignment from the optimal matching function. The data are for about 250 vehicles that traversed link C ! D on May23, 2008, 1–1:30PM. The graph on the right consists of short diagonal, horizontal and vertical segments. As will be describedin Section 6, the diagonal segments correspond to matches between upstream and downstream signatures, while a horizon-tal (or vertical) segment indicates a failure to match an upstream (or downstream) signature.

4. Matching problem

The signal-processing step assigns to each pair of signatures ðXi; YjÞ the distance dði; jÞ ¼ dðXi; YjÞP 0 between them. Thesmaller is dðXi; YjÞ the more likely it is that Xi; Yj are signatures of the same vehicle. At the end of the signal processing stepwe thus arrive at the data ðD; S; TÞ consisting of the N �M distance matrix D ¼ fdði; jÞ; 1 6 i 6 N; 1 6 j 6 Mg; the array of Nordered upstream times S ¼ ðs1; . . . ; sNÞ; and the array of M ordered downstream times T ¼ ðt1; . . . ; tMÞ.

A matching function assigns to each distance matrix D a matching

l : f1; . . . ;Ng ! f1; . . . ;M; sg;

with this interpretation: lðiÞ ¼ j means that the upstream vehicle i is declared to match (be the same as) downstream vehiclej; lðiÞ ¼ s means i is declared not to match any downstream vehicle. If vehicle i is matched with j, its start, end and traveltimes are si; tj, and tj � si are obtained from the arrays S and T; an unmatched vehicle does not yield a travel time, and mayindicate a turning movement.

The true matching (ground truth) is denoted by �l. The problem is to design a matching l, based on the observation matrixD, that is close to �l, without of course knowing �l.

An important, intuitive example is the minimum distance matching function, lminD, which declares an upstream vehicle ito be the same as the downstream vehicle j that is closest to it, provided also that the distance dði; jÞ is smaller than a thresh-old, say d�; otherwise it assigns s. Formally,

lminDðiÞ ¼j if dði; jÞ 6 dði; kÞ 8k; dði; jÞ 6 d�

s if dði; kÞ > d� 8k

�: ð1Þ

4.1. Statistical model of signature distance

In order to evaluate the performance of lminD and to develop better matching functions, we propose a statistical model ofthe signature distance.

We assume that the distance matrix is characterized by two probability density functions (pdf), f and g: f is the pdf of thedistance dðXv ; Yv Þ between the signatures at the upstream and downstream sensors of the same randomly selected vehicle v,

μf = 0.143, σf = 0.058 μg = 0.353, σg = 0.088

nf = 91, ng = 24,622

Link A to Bμf = 0.177, σf = 0.048μg = 0.370, σg = 0.086nf = 58, ng = 12,458

Link B to C

0 0.2 0.4 0.6 0.8 10

0.02

0.04

0.06

0.08

0.1

0.12

f

g

0 0.5 10

0.02

0.04

0.06

0.08

0.1

0.12

f

g

μf = 0.157, σf = 0.048, μg = 0.323, σg = 0.080, nf = 38, ng = 8,189

Link C to D

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

f

g

0

0.02

0.04

0.06

0.08

0.1

0.12

Fig. 9. The empirical pdfs f and g and their Gaussian approximations for links A! B, B! C and C ! D.

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 595

f ðdÞ ¼ pðdðXv ; YvÞ ¼ dÞ;

and g is the pdf of the distance dðXv ; YwÞ between two different randomly selected vehicles v – w:

gðdÞ ¼ pðdðXv ;YwÞ ¼ dÞ:

Then, conditional on the true matching �l, the coefficients of the random observation matrix D have the pdf

dði; jÞ �f if �lðiÞ ¼ j

g if �lðiÞ– j or �lðiÞ ¼ s

�:

We assume that conditional on �l the dði; jÞ are independent random variables. Let Di ¼ fdði; jÞ; 1 6 j 6 Mg be the array ofdistances between Xi and all the Yj. Then

pðDj�lÞ ¼Y

i

pðDij�lðiÞÞ; ð2Þ

pðDij�lðiÞÞ ¼f ðdði; jÞÞ

Qk–jgðdði; kÞÞ if �lðiÞ ¼ jQ

kgðdði; kÞÞ if �lðiÞ ¼ s

�¼

Lðdði; jÞÞcðDiÞ if �lðiÞ ¼ j

cðDiÞ if �lðiÞ ¼ s

�;

ð3Þ

in which

Lðdði; jÞÞ ¼ f ðdði; jÞÞgðdði; jÞÞ ; cðDiÞ ¼

YMk¼1

gðdði; kÞÞ: ð4Þ

Relations (2)–(4) constitute the signature distance statistical model.Fig. 9 displays the empirical pdfs and the Gaussian approximations of f and g for the three links. The annotation above the

left plot for link A! B means that lf and rf are the mean and standard deviation for f ; lg and rg are the mean and standarddeviation for g; nf ¼ 91 and ng ¼ 24;622 are the number of samples used to estimate the statistics for f and g, respectively.That is, there were 91 matched vehicle pairs and 24,622 unmatched pairs. (There always are many more unmatched pairs.)Section 8 describes how the distributions in Fig. 9 are estimated.

5. Optimal unconstrained matching

We evaluate the performance of lminD and the unconstrained MAP (maximum a posteriori) matching function, luMAP , de-fined later. These matching functions are unconstrained since no restriction is placed on the assignment:

l : f1; . . . ;Ng ! f1; . . . ;M; sg:

In particular, the matching function permits a vehicle to overtake other vehicles in front of it, even though, as in the caseof a single arterial link, this is unlikely. Matching functions considered in the literature are unconstrained. In the next section

596 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

we consider constrained matching functions, which do not permit overtaking. Such a constraint greatly improvesperformance.

Define the cumulative and complementary distribution functions

GðdÞ ¼Z d

0gðxÞdx; eGðdÞ ¼ 1� GðdÞ:

Similarly define FðdÞ and eFðdÞ.5.1. Minimum distance matching

Theorem 5.1 gives the performance of the minimum distance matching function, lminD. It is proved in the Appendix.

Theorem 5.1. The probability of a correct match, lminDðiÞ ¼ �lðiÞ, is

pðlminDðiÞ ¼ jj�lðiÞ ¼ jÞ ¼Z d�

0f ðxÞ½eGðxÞ�M�1dx: ð5Þ

The probability of an incorrect match, lminDðiÞ – �lðiÞ, is

pðlminDðiÞ– jj�lðiÞ ¼ jÞ ¼ ðM � 1ÞZ d�

0½eGðxÞ�M�2gðxÞeFðxÞdx: ð6Þ

The probability that a vehicle is unmatched, lminDðiÞ ¼ s, is

pðlminDðiÞ ¼ sj�lðiÞ ¼ jÞ ¼ eF ðd�Þ½eGðd�Þ�M�1: ð7Þ

The three probabilities (5)–(7) add up to 1.

Theorem 5.1 allows us to predict how the performance depends on the threshold d� and the number M of potential vehi-cle matches. From (5) and (6), the probabilities of both correct and incorrect match increase as d� increases. Thus, as inhypothesis testing generally, the proper choice of the threshold value must compromise between correct and incorrectre-identification.

Second, from (5) and (6), the probability of a correct match decreases and the probability of an incorrect match increasesas M increases. This is intuitive: the larger is the number M of potential matches, the worse is the performance of the min-imum distance matching function. So one way to improve the matching algorithm is to reduce M. A common way of reducingM is to place an upper bound T on the link travel time and limit the matching of an upstream vehicle i to those downstreamvehicles j for which tj � si 6 T , as in Ndoye et al. (2008).

Third, the probability of a vehicle being unmatched decreases as d� or M increases. This, too, is intuitive: a larger d� im-plies a less stringent condition on matching, while a larger M increases the chance of finding a potential match.

Formulas (5)–(7) help determine the range of values of d� and M for which the re-identification scheme gives satisfactoryperformance. Fig. 10 plots the probabilities of correct and incorrect matches using the Gaussian approximations for the dis-tributions f ; g in Fig. 9 in (5) and (6) for link A! B. For a per lane flow of 500 vph, M ¼ 50 corresponds to a time interval of6 min, which is the travel time window over a three-mile long link at an average speed of 30 mph. For d� ¼ 0:15, the min-imum distance matching function is predicted to give 45% correctly matched, 25% incorrectly matched, and 30% unmatchedvehicles.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8probability of a correct match

d*

M=20M=50M=100M=300

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7probability of an incorrect match

d*

M=20M=50M=100M=300

0 0.1 0.2 0.3 0.4 0 0.1 0.2 0.3 0.4

Fig. 10. Probabilities of correct and incorrect matches of lminD for different values of d� ; M.

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 597

5.2. Unconstrained MAP matching luMAP

We shall evaluate the performance of any matching function l by its (normalized) reward qðlÞ:

qðlÞ ¼ 1N

EXN

i¼1

1ðlðiÞ ¼ �lðiÞÞ; ð8Þ

in which 1ð�Þ is the indicator function: 1ðlðiÞ ¼ �lðiÞÞ ¼ 1 if lðiÞ ¼ �lðiÞ, and ¼ 0 otherwise. Thus qðlÞ is the correct matchingrate, the fraction of correctly matched vehicles on average.

To evaluate the expectation operator E in (8) we need the joint probability distribution on ðD; �lÞ. Since pðDj�lÞ is alreadyspecified by (2)–(4) we only need the (prior) distribution of �l. We assume that the number of upstream vehiclesN ¼ ð1þ bÞM of which M vehicles cross the downstream sensor and bM vehicles turn before reaching the downstream sen-sor. Thus b is the turning probability. Subject to this assumption, we impose a uniform distribution on �l:

pð�lðiÞ ¼ jÞ ¼ a; j ¼ 1; . . . ;M; pð�lðiÞ ¼ sÞ ¼ b; ð9Þ

with Maþ b ¼ 1.Let l� denote the optimal or reward-maximizing matching function. Theorem 5.2 is proved in Appendix.

Theorem 5.2. l� is given by

l�ðiÞ ¼j if Lðdði; jÞÞP Lðdði; kÞÞ 8k; Lðdði; jÞÞP b=as if Lðdði; kÞÞ < b=a 8k

�: ð10Þ

To implement (10) we need b or a, since the dði; jÞ and M are known from the data. In the present context, b is the turningprobability, which may be determined from field observations or experience. But another consideration may govern thechoice of b. From (10) one sees that the larger is b, the more stringent is the requirement of a match, and lower is the prob-ability of an incorrect match. So, depending on the application, one should choose a larger value for b, if the ‘cost’ of an incor-rect match is high.

Observe that l� maximizes the posterior probability

pð�ljDÞ ¼ pðDj�lÞðpð�lÞÞpðDÞ ; ð11Þ

with prior probability pð�lÞ given by (9). So l� ¼ luMAP is also the (unconstrained) maximum a posteriori (MAP) matchingfunction.

The minimum distance matching lminD and unconstrained MAP matching luMAP have a similar structure. One calculates astatistic for each pair ði; jÞ of downstream and upstream vehicles—dði; jÞ in the lminD case and the likelihood ratio Lðdði; jÞÞ inthe luMAP case—and matches i to the best j in terms of this statistic, provided that it meets a threshold. However, lminD doesnot take into account the uncertainty in the distance measurements, whereas luMAP does.

Intuitively, correct matching of a downstream vehicle i to one of the upstream vehicles 1; . . . ;M should be more difficultas M increases. The next result shows this is indeed the case. Corollary 5.1 can be compared with (5).

Define

fLðlÞ ¼ pðLðdði; jÞÞ ¼ lj�lðiÞ ¼ jÞ and GLðlÞ ¼ pðLðdði; kÞÞ 6 lj�lðiÞ– kÞ; ð12Þ

the pdf of Lðdði; jÞÞ and the cumulative distribution function (cdf) of Lðdði; kÞÞ, conditional on �lðiÞ ¼ j – k. That is, fLðlÞ is thepdf of LðdÞ when d has pdf f, and GLðlÞ is the cdf of LðdÞ when d has pdf g.

Corollary 5.1. The maximum reward is given by the explicit formula:

qðluMAPÞ ¼ MaZ 1

b=afLðlÞ½GLðlÞ�M�1dlþ b½GLðlÞ�M: ð13Þ

Moreover, qðluMAPÞ decreases as M increases, keeping Ma and b constant ðMaþ b ¼ 1Þ.

Proof. See Appendix. �

Remark on the Gaussian case: In the Gaussian case,

f ðdÞ ¼ 1ffiffiffiffiffiffiffiffiffiffiffiffi2pr2

f

q exp�ðd� lf Þ

2

2r2f

; gðdÞ ¼ 1ffiffiffiffiffiffiffiffiffiffiffiffi2pr2

g

q exp�ðd� lgÞ

2

2r2g

:

Here lf and rf denote the mean and standard deviation of the Gaussian pdf f while lg and rg denote analogous quantitiesfor g. To characterize l� it is more convenient to maximize the ‘log-likelihood’ lðdÞ instead of the ‘likelihood’ LðdÞ:

598 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

lðdÞ ¼ ln LðdÞ ¼ lnrg

rf�ðd� lf Þ

2

2r2f

þðd� lgÞ

2

2r2g

:

It is easy to check by differentiating this quadratic expression that lðdÞ is decreasing in d for 0 6 d 6 lg for the estimatedparameters of Fig. 9. Thus in this range maximizing lðdÞ is equivalent to minimizing d. Since ½0; lg � is likely to include therange Lðdði; jÞÞP b=a (in (10)), this means that l� coincides with lminD (with an appropriate threshold).

6. Optimal constrained matching

Minimum distance lminD, and unconstrained MAP luMAP , and the matchings in Ritchie et al. (2002) and Ndoye et al. (2008)are examples of unconstrained matching. Unconstrained matching may violate two constraints. First, a matching may allowduplicates: two different upstream vehicles i1 – i2 may be matched to the same downstream vehicle j. Second, a matchingmay permit overtaking: an upstream vehicle i2 which follows i1; i2 > i1, may be matched to downstream vehicles j1; j2 in thereverse order, j1 > j2. A constrained matching should not allow duplicates and it should not permit overtaking.

Suppose the data comprise a sequence of upstream vehicle signatures Xi; i ¼ 1; . . . ;N and a sequence of downstreamvehicle signatures Yj; j ¼ 1; . . . ;M. A constrained matching is a pair of matched sequences like ðUp;DownÞ:

Up ¼ X1 s X2 X3 X4 X5 X6

# # # # # # #Down ¼ s Y1 Y2 s Y3 Y4 Y5

ð14Þ

The interpretation of (14) is that the upstream vehicles with signatures X2; X4; X5 and X6 are, respectively, matched withthe downstream vehicles with signatures Y2; Y3; Y4 and Y5; the upstream vehicles with signatures X1 and X3 are matchedwith s, meaning that these vehicles turned before reaching the downstream sensor (or are unmatched for some other rea-son); and the downstream vehicle with signature Y1 is matched with s, meaning this vehicle turned into the link withoutcrossing the upstream sensor but it did cross the downstream sensor. Note that in (14) one can distinguish the two typesof turns.

In general, a constrained matching ðUp;DownÞ is a pair of equal length sequences, comprising the original signature se-quences, X1; . . . ;XN and Y1; . . . ;YM , together with arbitrarily many insertions of s within these sequences, with the restrictionthat an occurrence of s in Up can only be matched with a Yj in Down and a s in Down can only be matched with a Xi in Up.Observe that in a constrained matching the matches of the form Yj ! s are determined once all matches of the form Xi ! Yj

and Xi ! s are specified.We want to find lcMAP , the maximum a posteriori (MAP) matching function for the constrained case. lcMAP maximizes the

posterior probability

pð�ljDÞ ¼ pðDj�lÞpcð�lÞpðDÞ ; ð15Þ

in which pcð�lÞ denotes the prior probability that �l is the true constrained matching. In (15), pðDj�lÞ is given by the signaturedistance model ((2)–(4), so we only need to figure out the prior pcð�lÞ, which is just the unconstrained prior pð�lÞ given in (9),conditioned by the requirement that �l is a constrained matching. That is,

pcð�lÞ ¼pð�lÞ=

P�l2Mc

pð�lÞ �l 2 Mc

0 �l R Mc

�; ð16Þ

in which Mc denotes the set of constrained matchings. Thus lcMAP is given by

lcMAP ¼ arg max�l2Mc

pðDj�lÞpð�lÞpðDÞ ¼ arg max

�l2McpðDj�lÞpð�lÞ:

The last equality follows from the fact that pðDÞ ¼P

�l2McpðDj�lÞpð�lÞ does not depend on �l.

Recall that the unconstrained prior is the uniform distribution on �l with turning probability b:

pð�lÞ ¼Y

i

pð�lðiÞÞ; pð�lðiÞ ¼ jÞ ¼ a; j ¼ 1; . . . ;M; pð�lðiÞ ¼ sÞ ¼ b; ð17Þ

with Maþ b ¼ 1. Using (2)–(4) and (17) gives

pðDj�lÞpð�lÞ ¼Y

i

pðDij�lðiÞÞpð�lðiÞÞ; ð18Þ

pðDij�lðiÞÞpð�lðiÞÞ ¼Lðdði; jÞÞcðDiÞa �lðiÞ ¼ j

cðDiÞb �lðiÞ ¼ s

�: ð19Þ

(0,0)

(N,M)

Y1

Y2

Y3

Y4

Y5

X1 X2 X3 X4 X5 X6

Xi

Yj

(i-1, j-1)

(i, j)

τ

τ

(i-1, j)

(i, j-1)

Fig. 11. The edit graph for example (14). A diagonal edge corresponds to a signature match; a horizontal or vertical edge corresponds to a turn(match with s).

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 599

To find lcMAP we must maximize the likelihood (18) over the set Mc . It will be more convenient to minimize the negative‘log-likelihood’,

� ln pðDj�lÞpð�lÞ ¼ �X

i

ln pðDij�lðiÞÞ � ln pð�lðiÞÞ ¼X

i

Xj

kði; jÞ1ð�lðiÞ ¼ jÞ þX

i

kði; sÞ1ð�lðiÞ ¼ sÞ; ð20Þ

in which 1ð�Þ denotes the indicator function and

kði; jÞ ¼ � ln Lðdði; jÞÞ � ln cðDiÞ � ln a; ð21Þkði; sÞ ¼ � ln cðDiÞ � ln b: ð22Þ

Thus to find lcMAP we must minimize the linear form (20) over the ‘‘combinatorial” constraint �l 2 Mc . The difficulty is tofind a convenient representation of Mc.

We now describe a graph GðN; MÞ whose paths are in one–one correspondence with the set Mc of all constrained match-ings. GðN; MÞ comprises ðN þ 1Þ � ðM þ 1Þ nodes arranged in the form of a grid like the one shown in Fig. 11, which is thegraph for example (14) with N ¼ 6; M ¼ 5. GðN; MÞ is called the edit graph in the literature on sequence comparison algo-rithms in computer science and molecular biology (Myers, 1986).

GðN; MÞ is constructed as follows: Its nodes are labeled ði; jÞ; 0 6 i 6 N; 0 6 j 6 M. A node ði� 1; j� 1Þ has three directededges connected to nodes ði� 1; jÞ; ði; j� 1Þ and ði; jÞ (unless i > N or j > M). The ‘diagonal’ edge ði� 1; j� 1Þ ! ði; jÞ indi-cates the match Xi ! Yj; the ‘horizontal’ edge ði� 1; j� 1Þ ! ði; j� 1Þ indicates the match Xi ! s (vehicle i did not crossthe downstream sensor); the ‘vertical’ edge ði� 1; j� 1Þ ! ði� 1; jÞ indicates the match Yj ! s (vehicle j did not cross theupstream sensor).

It is an obvious but very important fact that each path in GðN; MÞ from node ð0; 0Þ to ðN; MÞ corresponds to a constrainedmatching (no duplicates, no overtaking) and vice versa. The constrained matching (14) corresponds to the path in Fig. 11indicated by the thick dashed lines.

Having identified constrained matchings with paths in the edit graph, we identify (20) with the sum of the weights of theedges along the path, assigning edge weights according to

wðði� 1; j� 1Þ ! ði; jÞÞ ¼ kði; jÞwðði� 1; j� 1Þ ! ði; j� 1ÞÞ ¼ kði; sÞ: ð23Þwðði� 1; j� 1Þ ! ði� 1; jÞÞ ¼ 0

Evidently, the value (20) for any constrained matching �l is equal to the weight of the corresponding path (defined as thesum of the edge weights) in the edit graph. Thus finding lcMAP is equivalent to finding the minimum weight path.

The minimum weight path can be found via the algorithm described in Table 2, which recursively computes Wði; jÞ, theweight of the shortest path from node (0,0) to node ði; jÞ; in particular

Table 2Shortest path algorithm for lcMAP .

Wð0;0Þ 0.For j 1 to M do Wð0; jÞ Wð0; j� 1Þ þwðð0; j� 1Þ ! ð0; jÞÞ.For i 1 to N do Wði;0Þ Wði� 1;0Þ þwðði� 1;0Þ ! ði;0ÞÞ.For i 1 to N do

For j 1 to M do {

Wði; jÞ min fWði� 1; jÞ þwðði� 1; jÞ ! ði; jÞÞ;Wði; j� 1Þ þwðði; j� 1Þ ! ði; jÞÞ;Wði� 1; j� 1Þ þwðði� 1; j� 1Þ ! ði; jÞÞg;Eði; jÞ arg minfWði� 1; jÞ þwðði� 1; jÞ ! ði; jÞÞ;Wði; j� 1Þ þwðði; j� 1Þ; ði; jÞÞ;Wði� 1; j� 1Þ þwðði� 1; j� 1Þ ! ði; jÞÞg;

}

600 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

WðN;MÞ ¼minl

WðlÞ:

The array Eði; jÞ can be used to backtrack and recover the shortest path or lminW . The algorithm requires a single passthrough all the MN nodes taken in topological order, so its complexity is OðMNÞ.

As evident in Fig. 11, from the shortest path one can read off the vehicles that are re-identified Xi ! Yj. Since we know thetimes si; tj when these vehicles crossed the two sensors, we know their travel time Ti ¼ tj � si. One can also read off whichvehicles were declared to have crossed the upstream sensor but not the downstream sensor (these are the horizontal edgesin the path) and which vehicles were declared to have crossed the upstream sensor but not the downstream sensor (theseare the vertical edges).

The results in Section 4 are based on this shortest path algorithm. Fig. 8 illustrates the result for link C ! D. Approxi-mately 250 vehicles crossed this link in 30 min. The horizontal edges in the shortest path are presumably vehicles thatcrossed the sensor at C but not at D; the vertical edges are vehicles that crossed D but not C.

Implementation of the algorithm requires the edge weights, which in turn require knowing the Gaussian distributionsf ; g of Fig. 9 and the turning probability b. We consider these in turn.

Section 8 explains how the distributions are obtained through an iterative procedure. The iteration starts from an initialmatching, which is obtained by a modified shortest path algorithm applied to edge weights obtained directly from the dis-tance matrix D ¼ fdði; jÞg. That algorithm called saturation method works as follows:

Step 1. The distance matrix is scanned and values dði; jÞ > :75 are set to .75. (The signal processing step assigns distancevalues between 0 and 1.) Call the resulting matrix DM.

Step 2. The shortest path is calculated with this iteration rule:

Dði; jÞ ¼minfDði� 1; j� 1Þ þ 2 � DMði; jÞ; Dði� 1; jÞ þ DMði; jÞ;Dði; j� 1Þ þ DMði; jÞg

in which Dði; jÞ is the cost of the shortest path from node (0,0) to node ði; jÞ.As in the unconstrained case, the choice of b is determined by experience or observation. More importantly, the choice of

b controls the probability that lcMAP makes incorrect matches: the larger is b, the lower the probability of an incorrect match;but the larger is b, the lower is also the probability of a correct match. It is possible to obtain a formula for the expected num-ber of correct matches with lcMAP like formula (13) for the performance of luMAP . But the formula for lcMAP involves a multi-dimensional integral that cannot be evaluated in closed form (unlike (13)) and it is not presented here.

Of course the performance of lcMAP is much better than that of luMAP because of the restriction to constrained matching.

7. Real-time matching

As formulated in Section 6 the edit graph grows with the observation time interval, and so with each new upstream ordownstream vehicle, one needs to calculate the distance of its signature from all previous signatures. The effort to computethese distances for each new vehicle grows linearly with the observation interval, which is unsuitable for real-timeimplementation.

For real-time implementation, we want to restrict the growth of the edit graph. One way of doing this is as follows: Foreach new upstream vehicle i (with signature Xi) that arrives at time si compute the distance dðXi; YkÞ:

dði; kÞ ¼dðXi;YkÞ if 0 6 tk � si 6 TM

1 else

�: ð24Þ

For each new downstream vehicle j (with signature Yj) that arrives at time tj compute the distance dðXm; YjÞ:

dðm; jÞ ¼dðXm; YjÞ if 0 6 tj � sm 6 TM

1 else

�: ð25Þ

Yj

Xiupstream vehicle i arrives at si

dow

nstr

eam

veh

icle

j a

rriv

es a

t t j 0 < tk - si < TM

d(i,k) = infinity

d(m,j) = infinity

0 < tj - sm < TM

Fig. 12. The distances to be calculated in real-time implementation.

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 601

If TM is an upper bound on the travel time, (24), (25) must hold. The distances dðXi; YjÞ to be calculated are thus boundedby the dashed lines in Fig. 12. So the computational burden for each new upstream or downstream vehicle is essentiallyconstant.

A simple choice for TM would be to assume a minimum speed and take TM to be the corresponding travel time, whichrequires knowledge of the link length, cycle times, etc. A better idea is this. Let l be the shortest path matching until thecurrent time. Pick an integer M. Let i1; . . . ; iM be the most recent M matched downstream vehicles, and set

TM ¼ 2�maxftlðimÞ � sim ; 1 6 m 6 Mg:

Thus TM is twice the maximum travel time experienced by these M vehicles. This choice will automatically adapt tochanges in travel time.

8. Estimating the statistical model

The statistical model (2)–(4) parameterizes the probability density of the observation matrix D as pðDj�l; lf ; rf ; lg ; rgÞ,with parameter �l for the true matching, and ðlf ; . . . ;rgÞ for the four parameters of the Gaussian distributions of f ; g.

Ideally the optimum matching and the parameters of f ; g should be jointly determined as the maximum likelihoodestimate

ð �̂l;clf ;crf ;clg ;crg Þ ¼ arg max�l;lf ;rf ;lg ;rg

pðDj�l;lf ;rf ;lg ;rgÞ: ð26Þ

While (26) is a well-defined optimization problem, it is computationally very expensive to solve because of the combi-natorial nature of the variable �l. Instead we perform a coordinate-wise optimization:

1. Begin with an initial estimate �l0 (which we do using the saturation method).2. At step i we have the estimate �li. Find

ðlif ;r

if ;l

ig ;r

igÞ ¼ arg max

lf ;rf ;lg ;rgpðDj�li;lf ;rf ;lgrgÞÞ:

602 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

This step is easy because the given match �li divides elements of the observation matrix D into distances of pairs ofmatched and unmatched vehicles, so ðli

f ; rif Þ are the empirical moments of the matched pairs and ðli

g ; rigÞ are corre-

sponding values for the unmatched pairs.3. Use the optimal matching algorithm to find

�liþ1 ¼ arg max�l

pðDj�l;lif ;r

if ;l

ig ;r

igÞ;

and return to 2. with iþ 1.

Since the likelihood pðDj�li; lif ; ri

f ; lig ; ri

gÞ increases with i, the iteration must converge to a local maximum of the likeli-hood. For the test results the iterations converged in three to four steps. (The iterations bear a family resemblance to thewell-known EM algorithm of Dempster et al. (1977).)

9. Conclusions and future directions

A system is described for the real-time estimation of the probability distribution of travel time across an arterial seg-ment with several intersections. It relies on matching vehicle signatures from wireless magnetic sensors. Results of a teston a 1.5 km (0.9 mile)-long segment with six intersections are discussed. The system requires no measurement of vehiclespeed or signal phase information. The optimal matching procedure is based on a statistical model of signature distance,whose parameters are themselves estimated from the data, so no ground truth measurements are needed. The model canbe used to predict the rates of correct, incorrect, and missed matches. Matched vehicle results yield signal phase, queuesat stop bars, number of vehicles in the link, and other arterial performance measures, in addition to travel time and delaydistributions.

The matching scheme is anonymous, unlike approaches that risk violating privacy by relying on reading licence plates orRFID tags or tracking cell phones. The system is immediately deployable, as indicated by the fact that following its installa-tion on November 10, 2008 on a 10 km (6.2 mile)-long segment of Telegraph Canyon Rd. in San Diego, CA, travel times shownin Fig. 13 were published on the web starting November 13, 2008. Observe that by displaying the median, 80th and 90thpercentile of the travel time travelers get much more information than from the average travel time alone.

Several extensions of the system and its potential applications suggest themselves. The system described here matchessignatures of vehicles crossing one sensor in a single input stream to those crossing another sensor in a single outputstream. Call this a SISO (single input, single output) system. A valuable extension would be a MIMO (multi-input,

Fig. 13. Screen capture of map of travel times on Telegraph Canyon Rd, in San Diego.

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 603

multi-output) system in which signatures from vehicles crossing sensors placed in each of I P 1 input streams arematched with those crossing sensors in each of J P 1 output streams. A MIMO system for an intersection with severalinput and output lanes would yield turning movements and corresponding travel times. (The two-lane test segment couldbe viewed as a 2-input, 2-output system and matching results would give the number of lane changes.) The SISO no over-taking constraint has a natural MIMO generalization: two vehicles arriving on the same input stream and leaving on thesame output stream cannot change order. It turns out that the matching algorithm of Section 6 has an extension to theMIMO case. Kwong et al. (2009) describe this extended algorithm, performance results, and several examples of MIMOsystems.

Previous results of Cheung et al. (2005) prompt the conjecture that a vehicle’s magnetic signature may be processed toclassify it as a truck, bus or passenger car. If the conjecture is correct, one can classify the signatures and separately estimatetravel times on arterial and freeway links for (say) buses and private autos, which would be useful for performance compar-isons and traveler information.

The matching algorithm can be used for different kinds of signatures. We discuss one example. Every GSM and UMTS cellphone measures and reports signal strengths from neighboring cells. The Network Measurement Report (NMR) of such signalstrengths is a ‘‘fingerprint” of the location where the phone makes the measurements. (The NMR is used in several commer-cial location-based services.) Suppose a vehicle such as a bus repeatedly traverses the same route and a cell phone in the busrecords the sequence of NMRs and the corresponding time stamp. By matching the sequences of NMR fingerprints one canrelate the trajectories of the bus in two traversals, one of which is considered a ‘standard’, and thereby calculate measures ofbus performance (on-time arrival, delay at different stops, etc.). This is a virtually costless method compared with expensiveAVL systems in use today. The non-overtaking constraint in this context simply means that the bus always moves forwardalong its route.

The SISO system described here or its MIMO extension provide measurements to improve traffic control. We describethree applications. First, by placing sensors at the entrance and exit of an on- or off-ramp, the system will provide accuratereal-time measurement of both the number of vehicles (queue size) in the ramp and the delay experienced by the most re-cently exiting vehicle. These measurements can replace imprecise queue size estimation in queue control policies (Smarag-adis and Papageorgiou (2003); Sun and Horowitz (2006)). Second, the number of vehicles in an arterial link at any timeprovides an accurate short-term prediction of the demand for service originating on this link at the downstream intersectionsignal within the current cycle. This prediction can be used for very precise green light extension or termination to increasegreen light utilization. If the matching procedure is used on other links incident on the intersection, one can control the sig-nal for optimal balancing. Third, if as suggested above, the magnetic signature can also be used to identify buses, the inter-section signal can be controlled to favor transit.

The nodes in the sensor described here are conventionally used for accurate point measurements of count, occupancy andspeed, similar to inductive loops and radar (Haoui et al., 2008). The system described here uses the same nodes in a novelconfiguration to provide for the first time the real-time measurement of link vehicle count and link travel time, which con-stitute the state of second-order models of network traffic flow.

Acknowledgements

We are grateful to Professors Nik Geroliminis, Henry Liu and Hani Mahmassani for discussion, and to two reviewers fortheir suggestions. Some results in this paper were presented at the 13th International Conference of Hong Kong Society forTransportation Studies, December 2008, and at 89th Annual Meeting of the Transportation Research Board, January 2009.This research was supported in part by California Department of Transportation and by AROMURI-UCSC-W911NF-05-1-0246-VA-09/05.

Appendix A. Signal processing algorithm

As mentioned in the text, a sensor comprises an array of seven nodes, each of which has a three-axis magnetometer thatmeasures the x; y; z directions of the earth’s magnetic field at a sampling rate of 128 Hz as a vehicle goes over the node.Fig. 14 shows the raw z-axis measured signal from one node. The x- and y-axis signals are similar.

The microprocessor in the node automatically extracts the sequence of peak values (local maxima and minima) from eachof the x; y; z signals. (The times when the peaks occur are not recorded.) In the example, there are six peak values (includingthe initial and terminal values of the signal), denoted by squares. The node transmits the array of these peak values to theaccess points (AP). There will be three such arrays, corresponding to the three axes. Together, the three arrays constitute aslice of the two-dimensional magnetic ‘footprint’ of the vehicle.

A slice measured by a node is determined by the distribution of the ferrous material in the vehicle within 1200 from thenode. For each vehicle, the AP receives one slice from each of the seven nodes. The seven slices constitute the vehicle’s sig-nature at the sensor. (It may be worth noting that two vehicles of the same make and model year typically have differentsignatures.)

0 50 100 150 200 250−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

milliseconds

Fig. 14. The raw z-axis magnetic signal generated by a vehicle; the squares denote the peaks extracted from the signal.

604 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

The signal processing algorithm takes two signatures, say X ¼ ðX1; . . . ;X7Þ and Y ¼ ðY1; . . . ;Y7Þ (Xi; Yj are the slices), andcomputes a distance (a measure of dissimilarity) between each pair of slices. The distance dðX; YÞ is defined as the minimumof the distances between all pairs of slices ðXi; YjÞ.

A.1. Proof of Theorem 5.1

Applying the definition (1) and using (2), the probability of a correct match, lminDðiÞ ¼ �lðiÞ, is

pðlminDðiÞ ¼ jj�lðiÞ ¼ jÞ ¼ pðdði; jÞ 6 dði; kÞ 8k; dði; jÞ 6 d�j�lðiÞ ¼ jÞ ¼Z d�

0pðdði; jÞ ¼ xj�lðiÞ ¼ jÞ

Yk–j

Probðdði; kÞ

P xj�lðiÞ ¼ jÞdx ¼Z d�

0f ðxÞ½eGðxÞ�M�1dx:

The probability of an incorrect match is

pðlminD–jj�lðiÞ ¼ jÞ ¼ pðdði; jÞ > mink–j

dði; kÞ; mink–j

dði; kÞ 6 d�j�lðiÞ ¼ jÞ ¼Z d�

0Probðdði; jÞ > xj�lðiÞ ¼ jÞpðmin

k–jdði; kÞ

¼ xj�lðiÞ ¼ jÞdx ¼ ðM � 1ÞZ d�

0½eGðxÞ�M�2gðxÞeFðxÞdx:

To arrive at the last equality, we use the facts that Probðdði; jÞ > xj�lðiÞ ¼ jÞ ¼ eFðxÞ and

Probðmink–j

dði; kÞ 6 xj�lðiÞ ¼ jÞ ¼ 1� Probðdði; kÞ 6 x; k – jj�lðiÞ ¼ jÞ ¼ 1� ½eGðxÞ�M�1;

which, upon differentiating with respect to x, gives the density

pðmink–j

dði; kÞ ¼ xj�lðiÞ ¼ jÞ ¼ ðM � 1Þ½eGðxÞ�M�2gðxÞ:

Lastly, the probability that lminDðiÞ ¼ s is

pðlminDðiÞ ¼ sj�lðiÞ ¼ jÞ ¼ pðdði; jÞP d�; dði; kÞP d�; k – jj�lðiÞ ¼ jÞ ¼ eFðd�Þ½eGðd�Þ�M�1:

This proves (5)–(7).

K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606 605

A.2. Proof of Theorem 5.2

For any matching function l, we can express qðlÞ as

NqðlÞ ¼XN

i¼1

XM

j¼1

pðlðiÞ ¼ jj�lðiÞ ¼ jÞpð�lðiÞ ¼ jÞ"

þ pðlðiÞ ¼ sj�lðiÞ ¼ sÞpð�lðiÞ ¼ sÞ#

ð27Þ

¼XN

i¼1

aXM

j¼1

ZpðlðiÞ ¼ jjDÞpðDj�lðiÞ ¼ jÞdD

"

þ bZ

pðlðiÞ ¼ sjDÞpðDj�lðiÞ ¼ sÞdD�: ð28Þ

From (2)–(4) and (9)

pðDj�lðiÞ ¼ jÞ ¼ pðDij�lðiÞ ¼ jÞpðD�iÞ ¼ Lðdði; jÞÞcðDiÞpðD�iÞ; pðDj�lðiÞ ¼ sÞ ¼ cðDiÞpðD�iÞ;

in which pðD�iÞ ¼Q

l–ipðDlÞ. Substitution into (28) yields

NqðlÞ ¼XN

i¼1

aXM

j¼1

ZpðlðiÞ ¼ jjDÞLðdði; jÞÞcðDiÞpðD�iÞdDþ b

ZpðlðiÞ ¼ sjDÞcðDiÞpðD�iÞdD

" #; ð29Þ

We see from (29) that l�ðiÞ is given by that j which maximizes the integrand. This selection leads to (10).

A.3. Proof of Corollary 5.1

From (27)

Nqðl�Þ ¼XN

i¼1

aXM

j¼1

pðl�ðiÞ ¼ jj�lðiÞ ¼ jÞ þ bpðlðiÞ ¼ sj�lðiÞ ¼ sÞ" #

: ð30Þ

From (10)

paðMÞ ¼ pðl�ðiÞ ¼ jj�lðiÞ ¼ jÞ¼ pðLðdði; jÞÞP maxfLðdði; kÞÞ; k – j; b=agj�lðiÞ ¼ jÞ ð31Þ

pbðMÞ ¼ pðlðiÞ ¼ sj�lðiÞ ¼ sÞ¼ pðb=a P maxfLðdði; kÞÞgj�lðiÞ ¼ sÞ: ð32Þ

In (31) and (32), the random variables dði; jÞ and dði; kÞ are independent; dði; jÞ is distributed according to f and the dði; kÞare all distributed according to g. The random variable on the right hand side of the inequalities in (31) and (32) increaseswith M, because it is the maximum of more random variables, whereas the random variable on the left hand side does notchange with M. Thus both probabilities paðMÞ and pbðMÞ decrease with M. So, from 30 qðl�Þ ¼ MapaðMÞ þ bpbðMÞ decreaseswith M.

We can use (31) and (32) to calculate qðl�Þ. From (31),

paðMÞ ¼ pðLðdði; jÞP maxk–j

Lðdði; kÞÞ; Lðdði; jÞP b=aj�lðiÞ ¼ jÞ ¼Z 1

b=afLðlÞ½GLðlÞ�M�1dl;

pbðMÞ ¼ pðmaxk

Lðdði; kÞ 6 b=aj�lðiÞ ¼ sÞ ¼ ½GLðlÞ�M :

This gives (13).

References

Cheung, S.Y., Coleri, S., Dundar, B., Ganesh, S., Tan, C-W., Varaiya, P., 2005. Traffic measurement and vehicle classification with a single magnetic sensor.Transportation Research Record (1917), 173–181.

Coifman, B., 1999. Vehicle Reidentification and Travel Time Measurement Using Loop Detector Speed Traps. PhD thesis, University of California, Berkeley, CA94720.

Dempster, A., Laird, N., Rubin, D., 1977. Likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39 (1), 1–38.Haoui, A., Kavaler, R., Varaiya, P., 2008. Wireless magnetic sensors for traffic surveillance. Transportation Research Part C 16 (3), 294–306.Kwong, K., Kavaler, R., Rajagopal, R., Varaiya, P., 2009. Real-time measurement of link vehicle count and travel time in a road network. IEEE Transactions on

Intelligent Transportation Systems, submitted for publication.Lin, F.B., Thomas, D.R., 2005. Headway compression during queue discharge at signalized intersections. Transportation Research Record 1920, 81–85.Liu, H.X., Ma, W., 2009. A virtual vehicle probe model for time-dependent arterial travel time estimation. Transportation Research Part C 17(1), 11–26.Myers, E.W., 1986. An OðNDÞ difference algorithm and its variations. Algorithmica 1, 251–266.

606 K. Kwong et al. / Transportation Research Part C 17 (2009) 586–606

Ndoye, M., Totten, V., Carter, B., Bullock, D.M., Krogmeier, J.V., (2008). Vehicle detector signature processing and vehicle reidentification for travel timeestimation. In: Proceedings of 88th Transportation Research Board Annual Meeting, Washington, DC, January.

Oh, C., Ritchie, S.G., 2002. Real-time inductive-signature-based level of service for signalized intersections. Transportation Research Record 1802, 97–104.Ritchie, S.G., Park, S., Oh, C., Sun, C., 2002. Field investigation of advanced vehicle reidentification techniques and detector technologies-Phase 1. Technical

Report PATH Research Report UCB-ITS-PRR-2002-15, Institute of Transportation Studies, University of California, Berkeley, California.Skabardonis, A., Geroliminis, N., 2005. Real-time estimation of travel times along signalized arterials. In: Mahmassani, H.S. (Ed.), Proceedings of the 16th

International Symposium on Transportation and Traffic Theory. Elsevier, pp. 387–406.Smaragadis, E., Papageorgiou, M., 2003. A series of new local ramp metering strategies. Transportation Research Record 1856, 74–86.Sun, C., Ritchie, S.G., Tsai, K., Jayakrishnan, R., 1999. Use of vehicle signature analysis and lexicographic optimization for vehicle reidentication on freeways.

Transportation Research, Part C 7, 167–185.Sun, C.C., Arr, G.S., Ramachandram, R.P., Ritchie, S.G., 2004. Vehicle reidentification using multidetector fusion. IEEE Transactions on Intelligent

Transportation Systems 5 (3), 155–164.Sun, X., Horowitz, R., 2006. Set of new traffic-responsive ramp-metering algorithms and microscopic simulation results. Transportation Research Record

1959, 9–18.Zhang, H.M., 1999. Link-journey-speed model for arterial traffic. Transportation Research Record 1676, 109–115.Zheng, J., Wang, Y., Nihan, N.L., 2006. Detecting cycle failures at signalized intersections using video image processing. Computer-Aided Civil and

Infrastructure Engineering 21, 425–435.