
Article

The International Journal of Robotics Research
2015, Vol. 34(1) 43–63
© The Author(s) 2014
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0278364914545674
ijr.sagepub.com

A multi-hypothesis solution to data association for the two-frame SLAM problem

Edmund Brekke¹ and Mandar Chitre²

Abstract

We propose a multi-hypothesis solution to the simplified problem of simultaneous localization and mapping (SLAM) that arises when only two measurement frames are available. The proposed solution is obtained through direct evaluation of the posterior density as given by finite set statistics. We show that hypothesis probabilities can be evaluated with reasonable accuracy by means of a closed-form expression. Consistency properties are discussed extensively. We overcome inconsistency problems of the extended Kalman filter by means of natural gradient optimization, and we demonstrate through implementations on simulated and real data that the proposed approach has better consistency properties than alternative approaches when applied to the two-frame SLAM problem.

Keywords

SLAM, data association, MHT, consistency, Laplace approximation, natural gradient, finite set statistics, scan matching

1. Introduction

A key task in the navigation of autonomous vehicles is to estimate the motion of the vehicle with respect to its surrounding environment. This naturally leads to the concept of simultaneous localization and mapping (SLAM), in which the vehicle builds a map of the environment while simultaneously estimating its own position in this map. SLAM is typically formulated according to a feature-based parametrization (FB-SLAM), where estimation is done by processing sets of point measurements extracted from images obtained by a radar, sonar or laser scanner.

In this paper we address the special case of FB-SLAM which arises when only two consecutive data frames, and the corresponding measurement sets, are available. We refer to this problem as feature-based scan matching (FBSM). This problem is of particular importance for the initialization of recursive SLAM methods. Furthermore, we believe that it is instructive to study this simplified problem in close detail before attempting to pursue optimal solutions to the full SLAM problem, which comprises several data frames.

FB-SLAM is in some ways related to the problem of multi-target tracking (MTT). The difference is, crudely speaking, that in FB-SLAM the sensor moves and the targets are stationary, while in MTT the targets move while the sensor is stationary. Both problems are naturally decomposed into two subproblems: data association and state estimation. The former problem concerns establishing which features in consecutive scans originate from the same physical objects. This includes distinguishing landmark-originating measurements from clutter measurements, also known as false alarms. In other words, data association includes answering the question: how many landmarks are there? The latter problem, when viewed from the perspective of SLAM, concerns estimating the motion of the vehicle and the map of landmarks, conditional on such association hypotheses.

Based on this philosophy, the multi-hypothesis tracker (MHT) was developed in the late 1970s by Donald Reid (1979) as a solution to the MTT problem. The idea of the MHT is simply to evaluate the posterior probabilities of all feasible association hypotheses. Reid's original MHT has been extended and refined in numerous papers, most notably by Kurien (1990) and Mori et al. (1986). Several researchers have expressed the belief that the MHT provides an optimal solution to MTT in some unspecified sense (Ristic, 1999; Van Keuk, 2002; Musicki and Evans, 2005). Despite its inherent intractability due to exponential hypothesis growth, there exist numerous suboptimal MHT implementations which have been made tractable through aggressive use of hypothesis pruning, hypothesis merging, limited time horizons and optimization techniques such as the auction method and Lagrangian relaxation (Deb et al., 1997; Blackman, 2004). Other well-established MTT methods such as the joint probabilistic data association filter (JPDAF) (Bar-Shalom et al., 2011) or the probabilistic multiple hypothesis tracker (PMHT) (Streit and Luginbuhl, 1994) have been inspired by the MHT paradigm to a large extent.

¹Department of Engineering Cybernetics, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
²Acoustic Research Laboratory, Tropical Marine Science Institute, National University of Singapore, Singapore

Corresponding author:
Edmund Brekke, Department of Engineering Cybernetics, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway.
Email: [email protected]

Downloaded from ijr.sagepub.com at PENNSYLVANIA STATE UNIV on May 11, 2016

In FB-SLAM, joint compatibility branch and bound (JCBB) (Neira and Tardós, 2001) has been described as a gold standard for data association (Frese, 2010). JCBB aims to choose the best out of all possible association hypotheses by using hypothesis cardinality and joint Mahalanobis distance as score functions. It is not a multi-hypothesis approach in the same sense as the MHT, since it does not calculate posterior hypothesis probabilities, and it is therefore unable to hedge on alternative hypotheses in the same manner as the MHT. Many recent advances in data association for FB-SLAM have been inspired by previous advances in MTT. In particular, this can be said about the multi-dimensional assignment approach of Wijesoma et al. (2006), the PMHT-based approach of Davey (2007) and FastSLAM (Montemerlo, 2003; Nieto et al., 2003). Both PMHT and FastSLAM, as well as other SLAM methods such as the work of Pfingsthorn and Birk (2013), do provide MHT-like hedging abilities, but do not include concrete formulae for posterior hypothesis probabilities. Blanco et al. (2012) recently proposed such formulae, but their formulae did not address issues related to false alarms and landmark existence uncertainty.

During the last decade, an alternative paradigm in MTT has emerged, based on Ronald Mahler's theory of finite set statistics (FISST) (Mahler, 2007). FISST allows one to formulate standard MTT problems in such a way that a well-defined posterior density of the multi-target state, which is a set-valued random variable, can be constructed. A primary rationale behind FISST is to deal with target existence uncertainty in a rigorous manner. FISST facilitates several useful approximations of this multi-object posterior, such as the probability hypothesis density (PHD), which is defined by the property that its integral over some region R equals the expected number of targets in R. Mahler's PHD filter is able to circumvent explicit data association, and thus avoids the full complexity of the MHT. Several SLAM methods have been proposed on the basis of the PHD concept (Kalyan et al., 2010; Mullane et al., 2011; Lee et al., 2012). In this paper we give particular attention to the approach of Lee et al. (2012), known as the single-cluster PHD filter (SCPHD), since this approach is arguably the neatest FISST-based SLAM method developed so far.

The PHD is a drastic approximation of the multi-object posterior. It is reasonably clear that the full multi-object posterior for standard MTT problems must be represented in terms of association hypotheses (Williams, 2011; Vo and Vo, 2013; Williams, 2012; Brekke et al., 2014). Based on this, we believe that construction of the full posterior for FB-SLAM may proceed along the lines of the MHT formulations of Reid (1979), Kurien (1990) and Mori et al. (1986). Establishing concrete expressions for the full posterior of FB-SLAM is important in order to benchmark the performance of the aforementioned SLAM methods. It is also important in order to further unify SLAM methods and MTT methods, and to unify the FISST paradigm with the MHT paradigm. As a first step on this road, we develop an MHT solution to the FBSM problem in this paper.

This entails four contributions. First, we derive a formal solution to the FBSM problem based on FISST. Second, we derive an analytical approximation for posterior hypothesis probabilities. Third, we use Amari's natural gradient (NG) (Amari and Douglas, 1998) to overcome inconsistency problems of the extended Kalman filter (EKF). Fourth, we devise an efficient strategy for exploring the hypothesis space. This strategy is based on the Bron–Kerbosch method for clique detection in graph theory (Bron and Kerbosch, 1973). We therefore refer to the overall FBSM method as clique-detection scan matching (CDSM).

This paper builds on preliminary work reported in Brekke and Chitre (2013). It complements this work by developing CDSM in the FISST framework, as opposed to the more old-fashioned MHT framework used in Brekke and Chitre (2013), and by presenting real-world results on laser data, as opposed to the sonar data discussed in Brekke and Chitre (2013). It also extends Brekke and Chitre (2013) by providing a full derivation of the hypothesis probability formula of CDSM, and by comparing CDSM with SCPHD in addition to other benchmark methods.

This paper is organized as follows. In Section 2 we present the underlying framework for our work, including terminology, assumptions and a brief summary of FISST. In Section 3 we develop a formal solution to the FBSM problem by means of FISST. In Section 4 we present our most important contribution: a closed-form expression for posterior hypothesis probabilities. In Section 5 we deal with hypothesis-conditional state estimation, while Section 6 deals with the search for good hypotheses. Results on simulated data are presented in Section 7, while results on the Victoria Park data (Guivant and Nebot, 2001) are presented in Section 8. Finally, a conclusion follows in Section 9. The paper includes three appendices. Appendix A provides details concerning the evaluation of derivatives used in Sections 4–5. Appendix B provides details regarding our FBSM adaptation of SCPHD. In Appendix C we show how the posterior density for FBSM can be expressed in terms of association hypotheses. The acronyms used throughout the paper can be found in Table 1.


2. Underlying framework

The purpose of this section is to state all aspects of the FBSM problem in precise terms, and thus to provide a foundation for the developments in later sections.

2.1. Basic notation and terminology

We address a two-dimensional scan-matching problem, and so the two-dimensional rotation matrix given by

$$R(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix} \tag{1}$$

plays a central role. Following the formalism for matrix derivatives established by Magnus and Neudecker (1988), differentiation of the rotation matrix yields

$$\mathrm{D}_\psi R(-\psi) = [-\sin\psi,\; -\cos\psi,\; \cos\psi,\; -\sin\psi]^\mathsf{T}$$

where $\mathrm{D}$ is the matrix derivative operator as defined in Magnus and Neudecker (1988). For future reference we reshape $\mathrm{D}_\psi R(-\psi)$ into the matrix

$$S(\psi) = \begin{bmatrix} -\sin\psi & \cos\psi \\ -\cos\psi & -\sin\psi \end{bmatrix} \tag{2}$$
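As a quick numerical check of (1) and (2), the following Python sketch (standard library only; the helper names `rot`, `S`, `d_rot_neg` and `vec` are illustrative and not from the paper) compares a central finite-difference derivative of $R(-\psi)$ with $S(\psi)$. The column-major vectorization of $S(\psi)$ should reproduce $[-\sin\psi, -\cos\psi, \cos\psi, -\sin\psi]^\mathsf{T}$.

```python
import math

def rot(psi):
    """Rotation matrix R(psi) of (1), as nested lists."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s], [s, c]]

def S(psi):
    """Matrix S(psi) of (2)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[-s, c], [-c, -s]]

def d_rot_neg(psi, h=1e-6):
    """Central finite difference of the map psi -> R(-psi)."""
    plus, minus = rot(-(psi + h)), rot(-(psi - h))
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def vec(A):
    """Column-major (stack-the-columns) vectorization of a 2 x 2 matrix."""
    return [A[i][j] for j in range(2) for i in range(2)]

psi = 0.7
err = max(abs(a - b) for a, b in zip(vec(d_rot_neg(psi)), vec(S(psi))))
print(f"max |finite difference - S(psi)| = {err:.1e}")
print("vec(S(psi)) =", [round(x, 4) for x in vec(S(psi))])
```

Column-major stacking is used because that is the convention of the vec operator in Magnus and Neudecker (1988).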

Landmark positions (also known as targets) are generally denoted by $x$. Vehicle position is denoted by $r_k = [x_k, y_k]^\mathsf{T}$, while vehicle pose is denoted by $p_k = [x_k, y_k, \psi_k]^\mathsf{T}$ (cf. Figure 1). By $\mathcal{N}(a; b, \Sigma)$ we mean the Gaussian distribution $\mathcal{N}(b, \Sigma)$ evaluated at $a$. The notation $f(\cdot)$ is used to signify both probability density functions (PDFs) and multi-object densities. Uniform distributions are written as

$$\mathrm{unif}_L(x) = \begin{cases} 1/v & \text{if } x \in L \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

Let $S$ denote the field of view (FOV) in sensor (polar) coordinates, and let $L_k$ denote the corresponding region in landmark (Cartesian) space at time $k$. Let the corresponding volumes be denoted by $V$ and $v$, respectively.

2.2. Assumptions

Here we formulate the FBSM problem according to a classical Bayesian framework along the lines of Wijesoma et al. (2006) and Reid (1979).

2.2.1. Kinematic prior. We assume that there exists a set $X = \{x^1, \dots, x^n\}$ of $n$ stationary landmarks, where $x^i \in \mathbb{R}^2$, and both $x^i$ and $n$ are unknown. We assume the landmarks to be a priori independent and identically distributed (i.i.d.) according to a uniform distribution over the region $L_1$:

$$f(x^i) = \mathrm{unif}_{L_1}(x^i) \tag{4}$$

Furthermore, the landmarks are assumed a priori independent of the vehicle pose $p_k$. For the vehicle pose, we use a standard motion model

$$p_2 = p_1 + \begin{bmatrix} R(\psi_1) & 0_{2\times 1} \\ 0_{1\times 2} & 1 \end{bmatrix} q + w, \qquad w \sim \mathcal{N}(0, Q) \tag{5}$$

where the displacement $q$ depends on the velocity of the vehicle and $w$ is plant noise. We assume that $p_1$ is perfectly known ($p_1 = 0_{3\times 1}$), and that $q \sim \mathcal{N}(\mu_q, P_q)$. The prior knowledge about $p_2$ (i.e. the information that we have about $p_2$ without invoking any sensor measurements) can then be represented by

$$f(p_2) = \mathcal{N}(p_2; \mu_q, P) \quad \text{where } P = Q + P_q \tag{6}$$
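With $p_1 = 0_{3\times 1}$ the block matrix in (5) reduces to the identity, so $p_2 = q + w$ and the two covariances simply add. A minimal Monte Carlo sketch of this step, assuming (for illustration only) diagonal $P_q$ and $Q$ with made-up standard deviations; `sample_p2` is a hypothetical helper name:

```python
import random

def sample_p2(mu_q, sd_q, sd_w, rng):
    """One draw of p2 from (5) with p1 = 0, where the block matrix is the identity.
    Assumes diagonal P_q and Q, given as per-component standard deviations."""
    q = [rng.gauss(m, s) for m, s in zip(mu_q, sd_q)]
    w = [rng.gauss(0.0, s) for s in sd_w]
    return [qi + wi for qi, wi in zip(q, w)]

rng = random.Random(1)
mu_q = [1.0, 0.2, 0.05]    # hypothetical mean displacement [x, y, psi]
sd_q = [0.3, 0.3, 0.02]    # hypothetical standard deviations of q
sd_w = [0.4, 0.4, 0.01]    # hypothetical standard deviations of w
draws = [sample_p2(mu_q, sd_q, sd_w, rng) for _ in range(100_000)]

n = len(draws)
mean = [sum(d[i] for d in draws) / n for i in range(3)]
var = [sum((d[i] - mean[i]) ** 2 for d in draws) / (n - 1) for i in range(3)]
predicted_var = [a**2 + b**2 for a, b in zip(sd_q, sd_w)]   # diagonal of P = Q + P_q
print("sample mean    ", [round(m, 3) for m in mean])
print("sample variance", [round(v, 4) for v in var])
print("diag(Q + P_q)  ", [round(v, 4) for v in predicted_var])
```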

Table 1. List of acronyms.

Acronym   Definition
EKF       Extended Kalman filter
FB-SLAM   Feature-based SLAM
FBSM      Feature-based scan matching
FISST     Finite set statistics
JCBB      Joint compatibility branch and bound
MAP       Maximum a posteriori
MHT       Multiple hypothesis tracker
MSS       Minimal sample set
MTT       Multitarget tracking
NG        Natural gradient
PHD       Probability hypothesis density
pIC       Probabilistic iterative correspondence
PDF       Probability density function
RANSAC    Random sample consensus
SCPHD     Single-cluster PHD
SLAM      Simultaneous localization and mapping

Fig. 1. Illustration of the scan-matching problem. Our aim is to estimate the vehicle pose $p_2 = [x_2, y_2, \psi_2]^\mathsf{T}$ from the measurement sets $Z_1$ and $Z_2$. In this illustrative example a plausible association hypothesis would consist of the correspondences $(z_1^1, z_2^1)$, $(z_1^4, z_2^2)$ and $(z_1^3, z_2^3)$.

2.2.2. Measurement model. At each time step a measurement set $Z_k = \{z_k^1, \dots, z_k^{m_k}\}$ is registered. The combined measurement set is denoted $Z_{1:2} = (Z_1, Z_2)$. Any measurement $z_k^j$ may originate from clutter or from one of the landmarks in $X$. In the former case its spatial PDF is uniform over the sensor's FOV:

$$f(z_k^j \mid \text{not from landmark}) = \mathrm{unif}_S(z_k^j) \tag{7}$$

In the latter case its spatial PDF is modeled as a Gaussian

$$f(z_k^j \mid x^i, p_k) = \mathcal{N}(z_k^j; h(x^i, p_k), R) \tag{8}$$

where

$$h(x^i, p_k) = \varphi(g(x^i, p_k)) \tag{9}$$

$$\varphi\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} \sqrt{x^2 + y^2} \\ \operatorname{atan2}(y, x) \end{bmatrix} \tag{10}$$

$$g(x^i, p_k) = R(-\psi_k)(x^i - r_k) \tag{11}$$

Note that $\varphi(\cdot)$ signifies a vector-valued function. The matrix $R$ is related to the sensor resolution (Brekke et al., 2010). The measurements in $Z_k$ are independent when conditioned on $X$ and $p_k$.
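A minimal sketch of the noise-free measurement function $h = \varphi \circ g$ from (9)–(11), assuming poses are given as `[x_k, y_k, psi_k]` with the heading in radians. The helper names are hypothetical, and the Gaussian noise with covariance $R$ in (8) is not sampled here.

```python
import math

def g(x, p):
    """Landmark x = [x, y] in the sensor frame of pose p = [x_k, y_k, psi_k], eq. (11)."""
    dx, dy = x[0] - p[0], x[1] - p[1]
    c, s = math.cos(p[2]), math.sin(p[2])
    # R(-psi) = [[cos psi, sin psi], [-sin psi, cos psi]]
    return [c * dx + s * dy, -s * dx + c * dy]

def phi(v):
    """Cartesian-to-polar map of (10): [sqrt(x^2 + y^2), atan2(y, x)]."""
    return [math.hypot(v[0], v[1]), math.atan2(v[1], v[0])]

def h(x, p):
    """Noise-free range and bearing of landmark x seen from pose p, eq. (9)."""
    return phi(g(x, p))

print(h([3.0, 4.0], [0.0, 0.0, 0.0]))          # range 5.0, bearing atan2(4, 3) ~ 0.927
print(h([0.0, 2.0], [0.0, 0.0, math.pi / 2]))  # dead ahead: range 2.0, bearing ~ 0
```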

2.2.3. Cardinality models. In addition to the randomness of the kinematic quantities discussed above, we must also specify the random nature of the numbers of landmarks, landmark-originating measurements and clutter measurements, also known as false alarms. We want to make minimal assumptions regarding these numbers. Therefore, we model both the number of landmarks $n$ and the number of false alarms $\varphi_k$ as uniformly distributed with upper limits $N$ and $M$, respectively:

$$\Pr(n) = \begin{cases} \frac{1}{N+1} & \text{if } n \in \{0, \dots, N\} \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

$$\Pr(\varphi_k) = \begin{cases} \frac{1}{M+1} & \text{if } \varphi_k \in \{0, \dots, M\} \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

Further justification for these assumptions is given in Section 7.5.

Furthermore, we assume that all landmarks are detected with unity probability. We employ this assumption because non-repeated landmarks do not provide useful information concerning the motion of the vehicle, and should therefore be regarded as clutter for practical purposes. This is, however, a problematic assumption, since it contradicts the kinematic landmark prior (4). Clearly, a landmark which lies outside of the second sensor region $L_2$ cannot be detected at $k = 2$. This becomes increasingly problematic as the displacement $p_2$ grows and the overlap between $L_1$ and $L_2$ decreases. Conditional on $p_1$ and $p_2$ only, it is reasonable to assume that any repeated landmark has a uniform distribution in $L_1 \cap L_2$. For $p_2$ sufficiently small, this will be similar to a uniform distribution over $L_1$ alone. In order to obtain the prior distribution of repeated landmarks we must marginalize this distribution over $p_2$. This will lead to a smeared distribution, but for $\mu_q$ and $P$ sufficiently small the smeared distribution will not differ substantially from the original distribution. Therefore, we consider (4) a reasonable approximation of our actual prior knowledge concerning landmark locations.

2.3. Association hypotheses

It can be shown that the FISST multi-object posterior for standard MTT is a mixture over association hypotheses similar to those used in Reid (1979) or Mori et al. (1986). In this paper we make a similar statement for FBSM: that the marginalized posterior $f(p_2 \mid Z_{1:2})$ in FBSM can be written as a mixture over association hypotheses.

In our preliminary work (Brekke and Chitre, 2013) we developed the solution to FBSM using classical MHT concepts. We defined the concept of association hypotheses through four progressive refinements of the outcome space, called the number event $\gamma$, the configuration event $\tau_k$, the data-to-data hypothesis $u$ and the landmark-to-data hypothesis $v$. Rigorous definitions of the latter two are given below. The background for this terminology can be found in the seminal references by Reid (1979) and Mori et al. (1986). We then obtained the prior hypothesis probabilities by counting the number of $\tau_k$ per $\gamma$ etc., and we obtained the posterior hypothesis probabilities by multiplying with a hypothesis-conditional likelihood.

In the FISST approach to FBSM, we restrict attention to data-to-data hypotheses $u$ and landmark-to-data hypotheses $v$. The latter provide assignments of measurements to landmarks, while the former associate measurements in one scan with measurements in another scan. The distinction between these two types of objects, which was first pointed out by Mori et al. (1986), is of crucial importance in order to develop a mathematically precise and parsimonious formulation of FBSM (and, more generally, FB-SLAM).

Definition 1 (Landmark-to-data hypothesis). A landmark-to-data hypothesis is a mapping $v: \{1, \dots, n\} \to \{1, \dots, m_1\} \times \{1, \dots, m_2\}$ such that $v_k(s) = v_k(t) \Rightarrow s = t$ for any $k \in \{1, 2\}$.

Definition 2 (Data-to-data hypothesis). Let $v$ be a landmark-to-data hypothesis. The data-to-data hypothesis $u$ corresponding to $v$ is then defined as the equivalence class consisting of all landmark-to-data hypotheses $\tilde{v}$ for which there exists a permutation $\sigma: \{1, \dots, n\} \to \{1, \dots, n\}$ such that $v_k(\sigma(i)) = \tilde{v}_k(i)$ for $k \in \{1, 2\}$. We signify the relation of permutation-equivalence by the symbol '$\sim$'. That is, if such a permutation $\sigma$ exists for $v$ and $\tilde{v}$, then $v \sim \tilde{v}$.

Remark 1 (Alternative representations). It follows from Definition 1 that a landmark-to-data hypothesis can also be viewed as a $2 \times n$ matrix, where each column represents a correspondence. It follows from Definition 2 that a data-to-data hypothesis can also be viewed as a set of correspondences. We use the notation $t \in u$ to signify that correspondence $t$ belongs in $u$, while the notation $v \in \mathrm{Per}(u)$ is used to signify that the landmark-to-data hypothesis $v$ belongs to $u$. Conversely, we let the notation $u[v]$ represent the data-to-data hypothesis corresponding to $v$.

Remark 2 (Measurement and correspondence indices). The measurement associated to landmark $i$ at time step $k$ under $v$ is denoted $z_k^{v_k(i)}$. By $z^{v(i)}$ we understand the vector $[(z_1^{v_1(i)})^\mathsf{T}, (z_2^{v_2(i)})^\mathsf{T}]^\mathsf{T} \in \mathbb{R}^4$, while $z^v$ denotes the vector $[(z_1^{v_1(1)})^\mathsf{T}, (z_1^{v_1(2)})^\mathsf{T}, \dots, (z_2^{v_2(n)})^\mathsf{T}]^\mathsf{T} \in \mathbb{R}^{4n}$, i.e. the stacked vector of all measurements that originate from landmarks according to $v$. The notation $x^v$ signifies a stacked vector containing all of the landmarks $[(x^1)^\mathsf{T}, \dots, (x^n)^\mathsf{T}]^\mathsf{T}$. Also let $\xi_v$ be a joint state vector which contains both the vehicle displacement $p_2$ and the landmarks involved in $v$:

$$\xi_v = [p_2^\mathsf{T}, (x^v)^\mathsf{T}]^\mathsf{T} = [p_2^\mathsf{T}, (x^1)^\mathsf{T}, \dots, (x^n)^\mathsf{T}]^\mathsf{T} \tag{14}$$

If $\sigma$ is a permutation, then $x^{\sigma v}$ is the permuted stacked landmark vector $[(x^{\sigma(1)})^\mathsf{T}, \dots, (x^{\sigma(n)})^\mathsf{T}]^\mathsf{T}$, and $\xi_{\sigma v}$ is the corresponding permuted joint state vector $[p_2^\mathsf{T}, (x^{\sigma v})^\mathsf{T}]^\mathsf{T}$. The notation $u(i)$ signifies correspondence number $i$ in $u$, while $u_k(i)$ signifies the measurement claimed by this correspondence at time $k$.

In order to illustrate the entities defined in this section, let us refer to the example displayed in Figure 1. Here, the correct number event is $\gamma_{\text{true}} = (n = 3, \varphi_1 = 3, \varphi_2 = 2)$, where $n$ is the number of landmarks and $\varphi_k$ is the number of false alarms at time step $k$. The correct association hypothesis $u_{\text{true}}$ corresponds to the following set of correspondences:

$$\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \begin{bmatrix} 4 \\ 2 \end{bmatrix} \right\} \tag{15}$$

There are six landmark-to-data hypotheses which agree with $u_{\text{true}}$. Two of these are displayed in Figure 2. Thus, in the rigorous sense of Definition 2, $u_{\text{true}}$ is the equivalence class

$$\left\{ \begin{bmatrix} 1 & 3 & 4 \\ 1 & 3 & 2 \end{bmatrix}, \dots, \begin{bmatrix} 4 & 3 & 1 \\ 2 & 3 & 1 \end{bmatrix} \right\} \tag{16}$$
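To make Definition 2 concrete, the sketch below (hypothetical helper names, standard library only) generates every landmark-to-data hypothesis in $\mathrm{Per}(u_{\text{true}})$ for the correspondences in (15) by permuting the columns of the $2 \times n$ matrix, and maps each one back to its data-to-data hypothesis by discarding the landmark labels.

```python
from itertools import permutations

# Data-to-data hypothesis u_true from (15): correspondences (index in Z1, index in Z2).
u_true = frozenset({(1, 1), (3, 3), (4, 2)})

def landmark_to_data(u):
    """All v in Per(u) as 2 x n matrices (row k lists v_k(1), ..., v_k(n));
    landmark i is assigned the i-th column of the chosen ordering."""
    for order in permutations(sorted(u)):
        yield (tuple(col[0] for col in order), tuple(col[1] for col in order))

def data_to_data(v):
    """u[v]: forget which landmark owns which column, keep the set of columns."""
    return frozenset(zip(v[0], v[1]))

vs = list(landmark_to_data(u_true))
print(len(vs))                       # n! = 3! = 6
for v in vs:
    print(v[0], "/", v[1])           # top row: Z1 indices, bottom row: Z2 indices
print(all(data_to_data(v) == u_true for v in vs))   # True
```

The first and last of the six printed matrices are the two written out in (16).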

2.4. A brief review of finite set statistics

FISST can be developed on the basis of belief measures. The belief measure of a random finite set $N$ is the probability

$$b_N(S) = \Pr(N \subseteq S) \tag{17}$$

Here $S$ is a subset of the base space, i.e. if $x \in N$ and $x \in \mathbb{R}^d$, then $S \subseteq \mathbb{R}^d$. FISST utilizes the so-called set integral. Let $f(X)$ be a set function, i.e. a function of the finite set $X$. Then the set integral of $f(X)$ over $S$ is defined as

$$\int_S f(X)\,\delta X = \sum_{n=0}^{\infty} \frac{1}{n!} \int_{S \times \cdots \times S} f(\{x^1, \dots, x^n\})\,dx^1 \cdots dx^n$$

Here $n$ is the cardinality of $X$, and $f(\{x^1, \dots, x^n\}) = f(X)$ under the constraint that $|X| = n$. A multiobject density $f_N(X)$ is a function which produces a non-negative real number from any realization $X$ of $N$, and which normalizes to one under the set integral:

$$\int_{\mathbb{R}^d} f_N(X)\,\delta X = 1$$

The multiobject density $f_N(X)$ is related to the belief measure according to

$$b_N(S) = \int_S f_N(X)\,\delta X$$

Fig. 2. Illustration of how several landmark-to-data hypotheses correspond to the same data-to-data hypothesis: (a) data-to-data hypothesis; (b) one landmark-to-data hypothesis; (c) another landmark-to-data hypothesis.


For further details on FISST the reader is referred to Mahler (2007), Goodman et al. (1997) and Vo et al. (2005).

3. Formal solution based on FISST

The multi-hypothesis solution to FBSM that was originally presented in Brekke and Chitre (2013) can also be derived using FISST. This is important, both in order to provide a rigorous foundation for data association in SLAM, and in order to explore how FISST relates to the classical MHT paradigm.

We formulate the FBSM problem in terms of the set $N$ of landmarks (with realization $X$), the set $G_k$ of false alarms (with realization $C_k$), and the set $S_k$ of all measurements at time $k$ (with realization $Z_k$). The prior multiobject density of the landmarks is

$$f_N(X) = \frac{n!}{N+1} \prod_{i=1}^{n} f(x^i)$$

where $n$ is the cardinality of $X$ and $f(x^i)$ is as given in (4). It is easy to verify that this expression, subject to the set integral, yields a uniform distribution of $n$. In a similar way, the multiobject density of the clutter set $G_k$ is

$$f_{G_k}(C_k) = \frac{\varphi_k!}{M+1} \frac{1}{V^{\varphi_k}}$$

where $\varphi_k$ is the cardinality of $C_k$. The measurement set $S_k$ is the union of the clutter set $G_k$ and a set consisting of landmark-originating measurements. Based on the assumptions of Section 2.2 (e.g. unity detection probability) and the results of Chapter 12 in Mahler (2007), we arrive at the following multiobject likelihood:

$$f_{S_k}(Z_k \mid X, p_k) = \sum_{v_k} \frac{\varphi_k!}{M+1} \frac{1}{V^{\varphi_k}} \prod_{i=1}^{n} f(z_k^{v_k(i)} \mid x^i, p_k)$$

where, since every landmark is detected, $\varphi_k = m_k - n$. The summation goes over all functions $v_k: \{1, \dots, n\} \to \{1, \dots, m_k\}$ such that $v_k(i) = v_k(j) \Rightarrow i = j$.
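The claim that $f_N(X)$ yields a uniform cardinality distribution can be verified directly with the set integral of Section 2.4, reading $f_N(X)$ as zero whenever $|X| > N$ and $f_{G_k}(C_k)$ as zero whenever $|C_k| > M$, in line with (12) and (13). Each term of the sum is the probability of the corresponding cardinality:

$$
\begin{aligned}
\int f_N(X)\,\delta X
&= \sum_{n=0}^{N} \frac{1}{n!}\,\frac{n!}{N+1} \prod_{i=1}^{n} \int_{L_1} \mathrm{unif}_{L_1}(x^i)\,dx^i
= \sum_{n=0}^{N} \frac{1}{N+1} = 1, \\
\int f_{G_k}(C_k)\,\delta C_k
&= \sum_{\varphi_k=0}^{M} \frac{1}{\varphi_k!}\,\frac{\varphi_k!}{M+1} \int_{S^{\varphi_k}} \frac{1}{V^{\varphi_k}}\,dz^1 \cdots dz^{\varphi_k}
= \sum_{\varphi_k=0}^{M} \frac{1}{M+1} = 1,
\end{aligned}
$$

so that $\Pr(|X| = n) = 1/(N+1)$ and $\Pr(|C_k| = \varphi_k) = 1/(M+1)$, matching (12) and (13). The factor $n!$ (respectively $\varphi_k!$) in the density is exactly what cancels the $1/n!$ of the set integral.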

Solving the Bayesian FBSM problem amounts to evaluating the joint posterior density $f(X, p_2 \mid Z_{1:2})$. Assuming a priori independence between $p_2$ and $X$, the joint prior becomes

$$f(X, p_2) = f(X) f(p_2). \tag{18}$$

Bayes' rule yields

$$f(X, p_2 \mid Z_{1:2}) = \frac{1}{c} f(Z_2 \mid X, p_2) f(Z_1 \mid X) f(X) f(p_2) \tag{19}$$

It can be shown (see Appendix C) that this density is a mixture over all feasible data-to-data hypotheses. In order to state this result more precisely, we shall first introduce the components involved in this mixture.

Let $G_n^{m_1,m_2}$ denote the set of all feasible data-to-data hypotheses given that $|X| = n$, $|Z_1| = m_1$ and $|Z_2| = m_2$. This set is the quotient set under permutation-equivalence of a larger set $T_n^{m_1,m_2}$ of landmark-to-data hypotheses of the form $v: \{1, \dots, n\} \to \{1, \dots, m_1\} \times \{1, \dots, m_2\}$. For any data-to-data hypothesis $u \in G_n^{m_1,m_2}$ there are exactly $n!$ permutation-equivalent landmark-to-data hypotheses $v \in T_n^{m_1,m_2}$.
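The counting statement above can be checked by brute force for small scans. In this sketch (hypothetical helper names; $m_1 = 4$ and $m_2 = 3$ chosen arbitrarily) a data-to-data hypothesis is stored as a frozenset of correspondences $(j_1, j_2)$ and a landmark-to-data hypothesis as a pair of tuples $(v_1, v_2)$. For each $n$ the enumerated $|G_n^{m_1,m_2}|$ agrees with $\binom{m_1}{n}\binom{m_2}{n}\,n!$, and $|T_n^{m_1,m_2}| = n!\,|G_n^{m_1,m_2}|$.

```python
from itertools import combinations, permutations
from math import comb, factorial

def enumerate_G(m1, m2, n):
    """Data-to-data hypotheses with n correspondences: sets of pairs (j1, j2)
    that use each measurement of Z1 and of Z2 at most once."""
    G = set()
    for js1 in combinations(range(1, m1 + 1), n):       # which Z1 measurements are used
        for js2 in permutations(range(1, m2 + 1), n):   # their Z2 partners, in order
            G.add(frozenset(zip(js1, js2)))
    return G

def enumerate_T(m1, m2, n):
    """Landmark-to-data hypotheses: injective v1 and v2 (Definition 1), stored as (v1, v2)."""
    return [(v1, v2)
            for v1 in permutations(range(1, m1 + 1), n)
            for v2 in permutations(range(1, m2 + 1), n)]

m1, m2 = 4, 3   # hypothetical scan sizes
for n in range(min(m1, m2) + 1):
    G, T = enumerate_G(m1, m2, n), enumerate_T(m1, m2, n)
    quotient = {frozenset(zip(v1, v2)) for v1, v2 in T}   # drop the landmark labels
    print(n, len(G), comb(m1, n) * comb(m2, n) * factorial(n),
          len(T), len(T) // factorial(n), quotient == G)
```

For these sizes the printed $|G_n^{m_1,m_2}|$ values are 1, 12, 36 and 24 for $n = 0, \dots, 3$; brute force is only meant as a check, since the counts grow factorially.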

For any landmark-to-data hypothesis $v$ we define the hypothesis-conditional posterior density

$$f(\xi_v \mid v, Z_{1:2}) = \frac{1}{a_{u[v]}} f(p_2) \prod_{t=1}^{n} f(z_1^{v_1(t)} \mid x^t, 0_{3\times 1})\, f(z_2^{v_2(t)} \mid x^t, p_2)\, f(x^t) \tag{20}$$

where the normalization constant $a_{u[v]}$ is given by

$$a_{u[v]} = \int f(p_2) \prod_{i=1}^{n} \int f(z_1^{v_1(i)} \mid x^i)\, f(z_2^{v_2(i)} \mid x^i, p_2)\, f(x^i)\, dx^i\, dp_2 \tag{21}$$

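For reasons soon to be explained we refer to $a_u$ as the hypothesis score. We can then express the joint posterior as

$$f(X, p_2 \mid Z_{1:2}) = \frac{1}{c} \sum_{u \in G_n^{m_1,m_2}} \frac{n!\,\varphi_1!\,\varphi_2!}{V^{\varphi_1+\varphi_2}}\, a_u \sum_{v \in \mathrm{Per}(u)} f(\xi_v \mid v, Z_{1:2}) \tag{22}$$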

Remark 3 (The hypothesis score is well-defined). The number $a_u$ can be expressed in terms of any $v \in \mathrm{Per}(u)$, that is, in terms of any landmark-to-data hypothesis $v$ that corresponds to the data-to-data hypothesis $u$. The expression in (21) attains the same value for all $v \in \mathrm{Per}(u)$, since the integration over $x^i$ removes any dependence on the landmark index $i$.

Remark 4 (Redundancy of landmark-to-data hypotheses). All of the densities $f(\cdot \mid v, Z_{1:2})$ conditional on landmark-to-data hypotheses $v$ belonging to the same data-to-data hypothesis $u$ are functionally equivalent under permutation of the landmarks. In mathematical terms, if both $v^{(1)}$ and $v^{(2)}$ are in $\mathrm{Per}(u)$, then there exists a permutation $\sigma: \{1, \dots, n\} \to \{1, \dots, n\}$ such that $f(\xi_{v^{(1)}} \mid v^{(1)}, Z_{1:2}) = f(\xi_{\sigma v^{(1)}} \mid v^{(2)}, Z_{1:2})$. Thus, any of the densities $f(\cdot \mid v, Z_{1:2})$ can be used to represent all the kinematic densities belonging to $u$. In Brekke and Chitre (2013) we consequently expressed the multi-hypothesis solution to FBSM in terms of densities of the form $f(\cdot \mid u, Z_{1:2})$. This can be understood as follows: the function $f(\cdot \mid u, Z_{1:2})$ takes as its argument a joint state vector $[p_2^\mathsf{T}, (x^1)^\mathsf{T}, \dots, (x^n)^\mathsf{T}]^\mathsf{T}$ where landmark $x^i$ is identified with correspondence number $i$ in $u$.

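We define the posterior hypothesis probability $\Pr(u \mid Z_{1:2})$ as the total probability mass contributed by the data-to-data hypothesis $u$. Application of the set integral yields

$$\begin{aligned}
1 &= \iint f(X, p_2 \mid Z_{1:2})\,\delta X\,dp_2 \\
&= \frac{1}{c} \sum_{n=0}^{N} \frac{1}{n!} \sum_{u \in G_n^{m_1,m_2}} \frac{n!\,\varphi_1!\,\varphi_2!}{V^{\varphi_1+\varphi_2}}\, a_u \sum_{v \in \mathrm{Per}(u)} \int f(\xi_v \mid v, Z_{1:2})\,d\xi_v \\
&= \frac{1}{c} \sum_{n=0}^{N} \frac{1}{n!} \sum_{u \in G_n^{m_1,m_2}} \frac{n!\,\varphi_1!\,\varphi_2!}{V^{\varphi_1+\varphi_2}}\, a_u\, n! \\
&= \frac{1}{c} \sum_{u} \frac{n!\,\varphi_1!\,\varphi_2!}{V^{\varphi_1+\varphi_2}}\, a_u
\end{aligned} \tag{23}$$

where the third equality uses that $\mathrm{Per}(u)$ contains $n!$ densities, each of which integrates to one.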

Therefore, the contribution of the data-to-data hypothesis $u$ towards the total posterior probability mass is

$$\Pr(u \mid Z_{1:2}) = \frac{1}{c} \frac{n!\,\varphi_1!\,\varphi_2!}{V^{\varphi_1+\varphi_2}}\, a_u. \tag{24}$$

This justifies referring to $\alpha_u$ as the hypothesis score. Also note that the overall normalization constant $c$ can be found by normalizing the hypothesis probabilities. Equations (22) and (24) are important contributions of this paper. These expressions show how FISST leads to a solution which is equivalent to the corresponding MHT-based solution, in the sense that it arrives at exactly the same equations as were proposed in Brekke and Chitre (2013).
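Because $\alpha_u$ and $V^{\phi_1+\phi_2}$ can span many orders of magnitude, normalizing (24) over a hypothesis collection is safest in the log domain. The Python sketch below (our own illustration, not the authors' code) computes $\Pr(u \mid Z_{1:2})$ and $\ln c$ with a log-sum-exp. The function names are hypothetical, and the example values assume that $\phi_k = m_k - n$ is the number of unassociated measurements in frame $k$ and that $\alpha_u = 1$ for the empty hypothesis, as in (30).

```python
import math

def log_unnormalized_probability(log_alpha, n, phi1, phi2, V):
    """ln of n! phi1! phi2! alpha_u / V**(phi1 + phi2), i.e. (24) multiplied by c."""
    return (math.lgamma(n + 1) + math.lgamma(phi1 + 1) + math.lgamma(phi2 + 1)
            - (phi1 + phi2) * math.log(V) + log_alpha)

def hypothesis_probabilities(hypotheses, V):
    """hypotheses: iterable of (ln alpha_u, n, phi1, phi2).

    Returns the normalized probabilities Pr(u | Z_1:2) and ln c.
    """
    logs = [log_unnormalized_probability(la, n, f1, f2, V)
            for la, n, f1, f2 in hypotheses]
    top = max(logs)
    # log-sum-exp: c is the sum of all unnormalized terms, cf. (23).
    log_c = top + math.log(sum(math.exp(l - top) for l in logs))
    return [math.exp(l - log_c) for l in logs], log_c

# Hypothetical example: m1 = m2 = 3 and V = 100 (arbitrary units); the empty
# hypothesis (n = 0) and one hypothesis with n = 2 and alpha_u = 50.
probs, log_c = hypothesis_probabilities(
    [(0.0, 0, 3, 3), (math.log(50.0), 2, 1, 1)], V=100.0)
```

Working with $\ln\alpha_u$ rather than $\alpha_u$ also fits naturally with the closed-form score (30), whose factors are most conveniently accumulated as logarithms.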

Our primary objective in this paper is to study the marginalized posterior of the pose displacement $p_2$. It is given by

$$
f(p_2 \mid Z_{1:2}) = \int f(X, p_2 \mid Z_{1:2})\, dX = \sum_{u} \Pr(u \mid Z_{1:2})\, f(p_2 \mid u, Z_{1:2}) \tag{25}
$$

where

$$
f(p_2 \mid u, Z_{1:2}) = \int f(\xi^v \mid v, Z_{1:2})\, dx^v \tag{26}
$$

for any $v \in \mathrm{Per}(u)$. Thus, the posterior density of $p_2$ is a conventional mixture of PDFs.
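As a concrete illustration of (25), the sketch below evaluates the mixture at a given $p_2$ when each component $f(p_2 \mid u, Z_{1:2})$ is replaced by a Gaussian. That Gaussian form is an assumption here, motivated by the approximations of Sections 4 and 5 (mean $p^u_{2|2}$, covariance $(J_u)^{-1}$); the helper names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    d = np.asarray(x, float) - np.asarray(mean, float)
    quad = d @ np.linalg.solve(cov, d)
    return float(np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d.size * np.linalg.det(cov)))

def pose_posterior(p2, probs, means, covs):
    """Mixture (25), with each f(p2 | u, Z_1:2) approximated by N(p2; mean_u, cov_u)."""
    return sum(w * gaussian_pdf(p2, m, C) for w, m, C in zip(probs, means, covs))

# Hypothetical two-component example: a broad component (here simply taken to be
# the pose prior) and a sharp component around one hypothesis-conditional estimate.
prior_mean, prior_cov = np.zeros(3), np.diag([25.0, 4.0, np.deg2rad(30.0) ** 2])
p_hat, cov_hat = np.array([1.0, 0.2, 0.05]), np.diag([0.01, 0.01, 1e-4])
value = pose_posterior(p_hat, probs=[0.1, 0.9],
                       means=[prior_mean, p_hat], covs=[prior_cov, cov_hat])
```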

4. Approximation of hypothesis probabilities

Direct evaluation of the posterior densities $f(X, p_2 \mid Z_{1:2})$ or $f(p_2 \mid Z_{1:2})$ through (22) or (25) is only of practical utility if we are able to evaluate the hypothesis probabilities $\Pr(u \mid Z_{1:2})$ by means of a closed-form expression. This is problematic, because the hypothesis score $\alpha_u$ is given by an integral which does not have a closed-form solution in general. Nevertheless, we argue in this section that it is possible to approximate $\alpha_u$ reasonably well by a closed-form expression. We first present this approximation in Section 4.1, and then derive and justify it in Section 4.2. Some details are left for Appendix A.

4.1. Statement of main results

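Let us convert the measurements to Cartesian coordinates along the lines of Bar-Shalom et al. (2011). We define the converted measurements as $y^i_k = f^{-1}(z^i_k)$, that is, according to

$$
y^i_k = \begin{bmatrix} r\cos\theta \\ r\sin\theta \end{bmatrix} \quad \text{with } r, \theta \text{ given by } z^i_k = \begin{bmatrix} r \\ \theta \end{bmatrix} \tag{27}
$$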

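The covariances corresponding to $y^i_k$ and $z^i_k$ are

$$
Y^i_k = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} \quad \text{and} \quad R = \begin{bmatrix} \sigma_r^2 & 0 \\ 0 & \sigma_\theta^2 \end{bmatrix} \tag{28}
$$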

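where (as a first-order approximation)

$$
\begin{aligned}
Y_{11} &= r^2 \sigma_\theta^2 \sin^2(\theta) + \sigma_r^2 \cos^2(\theta) \\
Y_{22} &= r^2 \sigma_\theta^2 \cos^2(\theta) + \sigma_r^2 \sin^2(\theta) \\
Y_{12} &= Y_{21} = (\sigma_r^2 - r^2 \sigma_\theta^2) \sin(\theta) \cos(\theta)
\end{aligned}
\tag{29}
$$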
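The conversion (27)–(29) is easy to check numerically: the first-order covariance (29) should coincide with $J R J^T$, where $J$ is the Jacobian of the map $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$. The sketch below assumes a diagonal $R$ as in (28) and uses the noise level of Section 7.1; the function name is ours.

```python
import numpy as np

def convert_measurement(z, R):
    """Polar z = [r, theta] -> Cartesian y and first-order covariance Y, cf. (27)-(29).

    R is assumed diagonal, R = diag(sigma_r^2, sigma_theta^2), as in (28).
    """
    r, th = z
    sr2, sth2 = R[0, 0], R[1, 1]
    c, s = np.cos(th), np.sin(th)
    y = np.array([r * c, r * s])                      # (27)
    Y11 = r**2 * sth2 * s**2 + sr2 * c**2             # (29)
    Y22 = r**2 * sth2 * c**2 + sr2 * s**2
    Y12 = (sr2 - r**2 * sth2) * s * c
    return y, np.array([[Y11, Y12], [Y12, Y22]])

# Example with the noise level used in Section 7.1.
R = np.diag([0.125, 5.7e-4])
y, Y = convert_measurement(np.array([30.0, 0.4]), R)

# Cross-check: (29) equals J R J^T with J the Jacobian of (r, theta) -> (x, y).
J = np.array([[np.cos(0.4), -30.0 * np.sin(0.4)],
              [np.sin(0.4),  30.0 * np.cos(0.4)]])
```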

We can then approximate the hypothesis score $\alpha_u$ as

$$
\alpha_u \approx
\begin{cases}
\dfrac{(2\pi)^{3/2}\, \mathcal{N}(p^u_{2|2}; \mu_q, P)}{(V |R|)^n \sqrt{|J_u(p^u_{2|2})|}} \displaystyle\prod_{i=1}^{n} \sqrt{|Y^{u_1(i)}_1|\,|Y^{u_2(i)}_2|}\; s_{u(i)}(p^u_{2|2}) & n > 0 \\[3ex]
1 & n = 0
\end{cases}
\tag{30}
$$

where

$$
s_{u(i)}(p_2) = \mathcal{N}\big(y^{u_2(i)}_2;\; A y^{u_1(i)}_1 + b,\; Y^{u_2(i)}_2 + A Y^{u_1(i)}_1 A^T\big) \tag{31}
$$

and where

$$
A = R(-\psi_2), \qquad b = -R(-\psi_2)\, r_2 \tag{32}
$$

Note that both $A$ and $b$ depend on $p_2$. The vector $p^u_{2|2}$ is a maximum a posteriori (MAP) estimate of $p_2$ conditional on $u$, which is found using EKF and NG techniques as explained in Section 5. Finally, $J_u(p^u_{2|2})$ is an information matrix which describes the shape of the posterior density $f(p_2 \mid u, Z_{1:2})$. It is found as the prior information $P^{-1}$ plus a sum of information matrices for the correspondences involved in the association hypothesis $u$:

$$
J_u(p_2) = P^{-1} + \sum_{i \in u} J_{u(i)}(p_2) \tag{33}
$$

By defining

$$
\nu^{u(i)} = y^{u_2(i)}_2 - A y^{u_1(i)}_1 - b, \qquad S^{u(i)} = Y^{u_2(i)}_2 + A Y^{u_1(i)}_1 A^T \tag{34}
$$

we can find the correspondence-conditional information matrices $J_{u(i)}$ as

$$
J_{u(i)}(p_2) = \big(D_{p_2}\nu^{u(i)}\big)^T (S^{u(i)})^{-1} D_{p_2}\nu^{u(i)} +
\begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & \tfrac{1}{2}\operatorname{tr}\big((S^{u(i)})^{-1} U^{u(i)} (S^{u(i)})^{-1} U^{u(i)}\big)
\end{bmatrix}
\tag{35}
$$

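where

$$
D_{p_2}\nu^{u(i)} = \left[ A,\; \begin{bmatrix} -\sin\psi_2 & \cos\psi_2 \\ -\cos\psi_2 & -\sin\psi_2 \end{bmatrix} (r_2 - y^{u_1(i)}_1) \right]
$$

and the matrix

$$
U^{u(i)} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} \tag{36}
$$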

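is given according to

$$
\begin{aligned}
u_{11} &= -2c_{11}\sin\psi_2 + 2c_{12}\cos\psi_2 \\
u_{21} &= -(c_{12} + c_{21})\sin\psi_2 + (c_{22} - c_{11})\cos\psi_2 \\
u_{12} &= -(c_{12} + c_{21})\sin\psi_2 + (c_{22} - c_{11})\cos\psi_2 \\
u_{22} &= -2c_{22}\sin\psi_2 - 2c_{21}\cos\psi_2
\end{aligned}
\tag{37}
$$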

where $c_{11}$, $c_{12}$, $c_{21}$ and $c_{22}$ are given by

$$
C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = A Y^{u_1(i)}_1 \tag{38}
$$

The form of the information matrix in (35) is based on a result from Porat and Friedlander (1986).
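The recipe (34)–(38) for a single correspondence can be sketched as follows. One useful sanity check, included below, is that the matrix $U^{u(i)}$ of (37) coincides with the derivative $\partial S^{u(i)} / \partial \psi_2$, and that the stated $D_{p_2}\nu^{u(i)}$ matches a finite-difference Jacobian of $\nu^{u(i)}$. We take $R(-\psi)$ to be the planar rotation by $-\psi$, and $p_2 = [r_2^T, \psi_2]^T$; the function names and numbers are hypothetical.

```python
import numpy as np

def rot_neg(psi):
    """R(-psi), the rotation used in A = R(-psi_2) of (32)."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, s], [-s, c]])

def drot_neg(psi):
    """Derivative of R(-psi) with respect to psi (the 2x2 matrix in D_p2 nu)."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[-s, c], [-c, -s]])

def correspondence_info(p2, y1, Y1, y2, Y2):
    """Innovation, covariance, Jacobian, U and information matrix, cf. (34)-(38)."""
    r2, psi = p2[:2], p2[2]
    A = rot_neg(psi)
    b = -A @ r2                                              # (32)
    nu = y2 - A @ y1 - b                                     # (34)
    S = Y2 + A @ Y1 @ A.T                                    # (34)
    Dnu = np.hstack([A, (drot_neg(psi) @ (r2 - y1))[:, None]])   # 2 x 3
    c11, c12, c21, c22 = (A @ Y1).ravel()                    # C = A Y1, (38)
    s, c = np.sin(psi), np.cos(psi)
    u11 = -2 * c11 * s + 2 * c12 * c                         # (37)
    u12 = -(c12 + c21) * s + (c22 - c11) * c
    u22 = -2 * c22 * s - 2 * c21 * c
    U = np.array([[u11, u12], [u12, u22]])                   # (36), with u21 = u12
    Sinv = np.linalg.inv(S)
    J = Dnu.T @ Sinv @ Dnu                                   # first term of (35)
    J[2, 2] += 0.5 * np.trace(Sinv @ U @ Sinv @ U)           # covariance term of (35)
    return nu, S, Dnu, U, J

# Hypothetical correspondence and pose, for illustration only.
y1, Y1 = np.array([10.0, 3.0]), np.array([[0.2, 0.05], [0.05, 0.4]])
y2, Y2 = np.array([7.0, 4.0]), np.array([[0.3, -0.02], [-0.02, 0.1]])
p2 = np.array([1.0, -0.5, 0.3])
nu, S, Dnu, U, J = correspondence_info(p2, y1, Y1, y2, Y2)

# Central finite differences: U should equal dS/dpsi_2, Dnu the Jacobian of nu.
h = 1e-6
num_U = (correspondence_info(p2 + [0, 0, h], y1, Y1, y2, Y2)[1]
         - correspondence_info(p2 - [0, 0, h], y1, Y1, y2, Y2)[1]) / (2 * h)
num_Dnu = np.column_stack([
    (correspondence_info(p2 + e, y1, Y1, y2, Y2)[0]
     - correspondence_info(p2 - e, y1, Y1, y2, Y2)[0]) / (2 * h)
    for e in h * np.eye(3)])
```

Only the heading component receives the covariance term in (35), because $S^{u(i)}$ depends on $\psi_2$ but not on the translation $r_2$.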

4.2. Derivation of main results

First of all, let us note that under the given assumptions we can rewrite (21) as

$$
\alpha_u = \int \mathcal{N}(p_2; \mu_q, P) \prod_{i=1}^{n} I_{u(i)}(p_2)\, dp_2 \tag{39}
$$

where the landmark integral

$$
I_{u(i)}(p_2) = \int \mathcal{N}\big(z^{u_1(i)}_1; h(x; 0), R\big)\, \mathcal{N}\big(z^{u_2(i)}_2; h(x; p_2), R\big)\, \mathrm{unif}_{L_1}(x)\, dx \tag{40}
$$

represents the contribution from correspondence number $i$.

Before delving into the mathematical details of the proposed approximation, let us briefly reflect on the strategy used.

Numerical investigations indicating that the densities $f(p_2 \mid u, Z_{1:2})$ are close to Gaussian led us to believe that a closed-form approximation of $\alpha_u$ was possible. Based on this, the key idea is as follows: if we are able to approximate each $I_{u(i)}(p_2)$ by a function proportional to a Gaussian in $p_2$, then (39) becomes an integral of a product of Gaussians in $p_2$, so that $\alpha_u$ is also available in closed form.

Approximation of $I_{u(i)}(p_2)$ must overcome two obstacles: the non-Gaussian prior $\mathrm{unif}_{L_1}(x)$ and the nonlinearities in $h(x; p_2)$. These nonlinearities are due to the fact that measurements are received in polar coordinates and, more importantly, to the change in vehicle orientation represented by $\psi_2$. By converting the measurements from polar to Cartesian coordinates, we make the integrand in (40) approximately proportional to a Gaussian in $x$. We then assume that the integrand in (39) is also approximately proportional to a Gaussian in $p_2$. We find the parameters of this Gaussian using EKF techniques and evaluation of the Fisher information matrix (FIM). This yields a closed-form approximation of $\alpha_u$.

More concretely, recalling the converted measurements defined in (27)–(29) and the notation $A$ and $b$ defined in (32), we first employ the approximation

$$
\begin{aligned}
I_{u(i)}(p_2) &\approx \frac{\mathcal{N}(0; 0, R)^2}{\mathcal{N}(0; 0, Y^{u_1(i)}_1)\, \mathcal{N}(0; 0, Y^{u_2(i)}_2)\, V} \int_{L_1} \mathcal{N}\big(y^{u_2(i)}_2; Ax + b, Y^{u_2(i)}_2\big)\, \mathcal{N}\big(x; y^{u_1(i)}_1, Y^{u_1(i)}_1\big)\, dx \\
&\approx \frac{\sqrt{|Y^{u_1(i)}_1|\,|Y^{u_2(i)}_2|}}{|R|\, V}\, s_{u(i)}(p_2)
\end{aligned}
\tag{41}
$$

where $s_{u(i)}(p_2)$ is as defined in (31). The idea underlying (41) is that the “converted Gaussians” $\mathcal{N}(x; y^{u_1(i)}_1, Y^{u_1(i)}_1)$ and $\mathcal{N}(y^{u_2(i)}_2; Ax + b, Y^{u_2(i)}_2)$ are approximately proportional to the original densities $\mathcal{N}(z^{u_1(i)}_1; h(x; 0), R)$ and $\mathcal{N}(z^{u_2(i)}_2; h(x; p_2), R)$ when interpreted as functions of $x$. The proportionality constants are found as $\sqrt{|Y^{u_1(i)}_1| / |R|}$ and $\sqrt{|Y^{u_2(i)}_2| / |R|}$ by comparing the peak values of the densities.

Having eliminated the integral over $x$, we hypothesize that for each correspondence in $u$ there exist a number $\beta_{u(i)}$, a vector $\mu_{u(i)}$ and a matrix $J_{u(i)}$ such that

$$
s_{u(i)}(p_2) \approx \beta_{u(i)}\, \mathcal{N}\big(p_2; \mu_{u(i)}, (J_{u(i)})^{-1}\big) \tag{42}
$$

The shape of this approximating Gaussian is given by the matrix $J_{u(i)}$; more precisely, $J_{u(i)}$ is the negative Hessian (curvature) of its logarithm. This curvature can be obtained in different ways. In a so-called Laplace approximation we would match the second-order derivatives of $s_{u(i)}(p_2)$ and of the approximating Gaussian for a suitable choice of $p_2$. In this paper we explore an alternative based on matching of FIMs, which can be expressed in terms of first-order derivatives only. The idea is that if $s_{u(i)}(p_2)$ is proportional to a PDF (in, say, $z^{u_2(i)}_2$) which depends on $p_2$ as a parameter, then the FIM of this PDF with respect to $p_2$ should equal the inverse covariance of the approximating Gaussian. This scheme, using the general result for FIMs of Gaussian processes established by Porat and Friedlander (1986), leads to the correspondence-conditional information matrix formula (35).

Assuming thus that every $I_{u(i)}(p_2)$ can be approximated by a function proportional to a Gaussian with information matrix $J_{u(i)}$ and some expectation $\mu_{u(i)}$, we can approximate the full integrand in (39) by a function proportional to a Gaussian. Its information matrix is

$$
J_u = P^{-1} + \sum_{i \in u} J_{u(i)} \tag{43}
$$

In other words, we hypothesize that there exist a vector $p^u_{2|2}$ and a scalar $\beta_u$ such that

$$
\alpha_u \approx \int \beta_u\, \mathcal{N}\big(p_2; p^u_{2|2}, (J_u)^{-1}\big)\, dp_2 = \beta_u \tag{44}
$$

The scalar $\beta_u$ should equal the ratio between the true integrand in (39) and its Gaussian replacement in (44) for all $p_2$ which contribute to these integrals. In reality this ratio will depend on $p_2$, so we must choose a particular $p_2$ at which to calculate it. As indicated by the notation $p^u_{2|2}$, we choose the hypothesis-conditional MAP estimate of $p_2$ for this purpose; how to find this vector is discussed in the next section. Thus, we approximate

$$
\alpha_u \approx \frac{\mathcal{N}(p^u_{2|2}; \mu_q, P) \prod_{i=1}^{n} I_{u(i)}(p^u_{2|2})}{\mathcal{N}\big(p^u_{2|2}; p^u_{2|2}, (J_u)^{-1}\big)} \tag{45}
$$

Combining (35), (41), (43) and (45) leads to the desired hypothesis score formula (30).
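The principle behind (44)–(45), approximating an integral by the integrand's value at its maximizer divided by the peak value of a Gaussian with matching information, can be tried out in one dimension. The sketch below is a toy illustration, not the paper's three-dimensional computation: the models, the grid maximization and the trapezoidal "numerical evaluation" are our own choices. For a linear-Gaussian model the approximation is exact; for the mildly nonlinear model $h(p) = \tanh(p)$, with $J$ computed from first-order (FIM-style) information only, it should still be close.

```python
import numpy as np

def npdf(x, mean, var):
    """Scalar normal density N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gaussian_ratio_score(g, info, grid):
    """Approximate the integral of g by g(p_hat) / N(p_hat; p_hat, 1/J), cf. (45)."""
    p_hat = grid[np.argmax(g(grid))]        # maximizer of the integrand on the grid
    return float(g(p_hat) / npdf(p_hat, p_hat, 1.0 / info(p_hat)))

def trapezoid(vals, grid):
    """Plain trapezoidal rule, used as the reference numerical integral."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2)

grid = np.linspace(-6.0, 6.0, 200001)
var = 0.01                                   # hypothetical measurement variance

def g_lin(p):                                # prior N(0, 1) times linear likelihood
    return npdf(p, 0.0, 1.0) * npdf(0.5, p, var)

def g_nl(p):                                 # prior N(0, 1) times tanh likelihood
    return npdf(p, 0.0, 1.0) * npdf(0.5, np.tanh(p), var)

approx_lin = gaussian_ratio_score(g_lin, lambda p: 1.0 + 1.0 / var, grid)
exact_lin = trapezoid(g_lin(grid), grid)
approx_nl = gaussian_ratio_score(
    g_nl, lambda p: 1.0 + (1.0 - np.tanh(p) ** 2) ** 2 / var, grid)
exact_nl = trapezoid(g_nl(grid), grid)
```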

The accuracy of this approximation will depend on factors such as the actual displacement $p_2$, the sensor resolution and the number of correspondences. Our approximation tends to underestimate the hypothesis probabilities when $n = 1$, but does not deviate by more than a few per cent for other cardinalities; cf. Figure 3.

5. State estimation

The subproblem of state estimation in FBSM amounts to evaluating hypothesis-conditional PDFs such as $f(\xi^v \mid v, Z_{1:2})$ and $f(p_2 \mid u, Z_{1:2})$. We remind the reader that although several landmark-to-data hypotheses $v$ exist for each data-to-data hypothesis $u$, state estimation need only be done once for each $u$, as explained in Remark 4.

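Under the given assumptions, the joint state $\xi^v$ is approximately Gaussian when conditioned on $Z_1$ only, with first and second moments

$$
\xi^v_{2|1} = \begin{bmatrix} \mu_q \\ y^{v_1(1)}_1 \\ \vdots \\ y^{v_1(n)}_1 \end{bmatrix}, \qquad
P^v_{1|1} = \begin{bmatrix} P & & & \\ & Y^{v_1(1)}_1 & & \\ & & \ddots & \\ & & & Y^{v_1(n)}_1 \end{bmatrix}
$$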

The PDF $f(\xi^v \mid v, Z_{1:2})$ can be approximated by a Gaussian in different ways. The most straightforward approach, used in this paper, is to use an EKF to obtain its first and second moments. Unfortunately, the EKF tends to exhibit inconsistent behavior: the actual estimation error of the EKF may be substantially larger than indicated by the calculated covariance (Huang et al., 2010). This effect can severely degrade the evaluation of the hypothesis probabilities $\Pr(u \mid Z_{1:2})$, since these are essentially exponential in the underlying Mahalanobis distances. Any analytical solution to this problem must either inflate the covariance to match the actual estimation error, or improve the estimation accuracy so that it actually attains the expected covariance.

In order to resolve this problem we employ a two-stage estimation procedure. In the first stage we use an EKF to obtain an estimate $\xi^v_{2|2}$, with associated covariance $P^v_{2|2}$, of the full state vector $\xi^v$. In the second stage we obtain an improved estimate $p^u_{2|2}$ of the pose vector by means of a technique known as the natural gradient (NG) (Amari and Douglas, 1998).

5.1. EKF-based pose estimation

In this stage, we linearize the hypothesis-conditioned estimation problem around the predicted state vector $\xi^v_{2|1}$. The linearization is carried out by means of the Jacobian

$$
H^v = \left.\begin{bmatrix}
H^{v(1)}_p & H^{v(1)}_x & & \\
\vdots & & \ddots & \\
H^{v(n)}_p & & & H^{v(n)}_x
\end{bmatrix}\right|_{\xi = \xi^v_{2|1}}
\tag{46}
$$

with sub-matrices

$$
\begin{aligned}
H^{v(i)}_p &= D_{p_2} h(y^{v_1(i)}_1, p_2) = D_g f\; D_{p_2} g \\
H^{v(i)}_x &= D_{y^{v_1(i)}_1} h(y^{v_1(i)}_1, p_2) = D_g f\; D_{y^{v_1(i)}_1} g
\end{aligned}
\tag{47}
$$

where

$$
D_g f = \begin{bmatrix} x/r & y/r \\ -y/r^2 & x/r^2 \end{bmatrix}, \qquad
D_{p_2} g = \big[-R(-\psi_2),\; S(\psi_2)(y^{v_1(i)}_1 - r_2)\big], \qquad
D_{y^{v_1(i)}_1} g = R(-\psi_2)
\tag{48}
$$

Here $r$, $x$ and $y$ are given by $r = \|y^{v_1(i)}_1 - r_2\|_2$ and $[x, y]^T = R(-\psi_2)(y^{v_1(i)}_1 - r_2)$. Furthermore, we define the combined measurement mapping

$$
\tilde{h}(\xi^v) = \big[h(x^1, p_2)^T, \ldots, h(x^n, p_2)^T\big]^T \tag{49}
$$

and the corresponding covariance

$$
R^v = I_n \otimes R \tag{50}
$$

Fig. 3. The closed-form approximation (30) does not appear to deviate by more than 10% from numerical evaluation of (39) for the true hypotheses in the scenarios considered in Section 7. For all practical purposes this deviation is negligible.

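An EKF with output $\xi^v_{2|2}$ and $P^v_{2|2}$ can then be constructed as follows:

$$
W^v = P^v_{1|1} (H^v)^T \big(R^v + H^v P^v_{1|1} (H^v)^T\big)^{-1} \tag{51}
$$

$$
\nu^v = z^v_2 - \tilde{h}(\xi^v_{1|1}) \tag{52}
$$

$$
\xi^v_{2|2} = \xi^v_{1|1} + W^v \nu^v \tag{53}
$$

$$
P^v_{2|2} = (I - W^v H^v) P^v_{1|1} \tag{54}
$$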
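The update (51)–(54) is a standard EKF measurement update and can be sketched in a few lines of Python. In this hedged sketch the caller supplies the stacked measurement function $\tilde{h}$ and its Jacobian $H^v$ evaluated at the prior mean; the names `ekf_update`, `h_tilde` and the test model are our own. For a linear $\tilde{h}$ the result must agree with the information-form Kalman update, which the usage example exploits.

```python
import numpy as np

def ekf_update(xi_pred, P_pred, z, h_tilde, H, R):
    """One EKF measurement update for a hypothesis v, following (50)-(54).

    z stacks two measurement components per landmark; H is the Jacobian of
    h_tilde evaluated at xi_pred; R is the 2x2 per-measurement noise covariance.
    """
    n = len(z) // 2
    Rv = np.kron(np.eye(n), R)                        # (50): I_n (x) R
    S = Rv + H @ P_pred @ H.T
    W = np.linalg.solve(S, H @ P_pred).T              # (51): P H^T S^-1 (S symmetric)
    nu = z - h_tilde(xi_pred)                         # (52)
    xi_post = xi_pred + W @ nu                        # (53)
    P_post = (np.eye(len(xi_pred)) - W @ H) @ P_pred  # (54)
    return xi_post, P_post

# Hypothetical linear test model: 3 pose states plus 2 landmarks (4 coordinates).
rng = np.random.default_rng(0)
d = 7
H = rng.normal(size=(4, d))
P0 = np.diag(rng.uniform(0.5, 2.0, d))
xi0 = rng.normal(size=d)
R = np.diag([0.125, 5.7e-4])
z = rng.normal(size=4)
xi1, P1 = ekf_update(xi0, P0, z, lambda xi: H @ xi, H, R)
pose_estimate = xi1[:3]                               # pose block of the joint state
```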

We can partition the EKF estimate into pose and landmark components according to $\xi^v_{2|2} = [(p^{u[v]}_{2|2})^T, (x^v_{2|2})^T]^T$, and the corresponding covariance can be partitioned as

$$
P^v_{2|2} = \begin{bmatrix} P^{u[v]}_p & P^v_{px} \\ (P^v_{px})^T & P^v_x \end{bmatrix}
$$

5.2. NG optimization

The NG is an optimization technique similar to the better-known Newton's method.¹ Instead of the Hessian of the cost function, the NG uses a Riemannian tensor which is “naturally defined from the characteristics of the parameter space” (Amari and Douglas, 1998).

If the integrand of (39) is proportional to a Gaussian in $p_2$, its expectation coincides with its mode, so the desired $p^u_{2|2}$ can be found by maximizing the integrand. We therefore use a cost function which, by (41), equals the negative logarithm of the integrand up to an additive constant:

$$
c(p_2) = -\ln \mathcal{N}(p_2; \mu_q, P) - \sum_{i=1}^{n} \ln s_{u(i)}(p_2) \tag{55}
$$

Using the information matrix $J_u(p^u_{2|2})$ as the Riemannian tensor, we obtain the iterative optimization scheme

$$
p^u_{2|2} \leftarrow p^u_{2|2} - \big(J_u(p^u_{2|2})\big)^{-1} \big(D_{p_2} c(p^u_{2|2})\big)^T \tag{56}
$$

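with $J_u(p_2)$ as given in (33), and where

$$
\begin{aligned}
D_{p_2} c(p_2) &= (p_2 - \mu_q)^T P^{-1} + \tfrac{1}{2} \sum_{i \in u} \Big[ d_1 + d_2 + \operatorname{vec}\big((S^{u(i)})^{-1}\big)^T D_{p_2} S^{u(i)} \Big] \\
d_1 &= \big((S^{u(i)})^{-1} \nu^{u(i)}\big)^T D_{p_2} \nu^{u(i)} \\
d_2 &= (\nu^{u(i)})^T \Big[ \big((\nu^{u(i)})^T \otimes I_2\big) D_{p_2}\big[(S^{u(i)})^{-1}\big] + (S^{u(i)})^{-1} D_{p_2} \nu^{u(i)} \Big] \\
D_{p_2}\big[(S^{u(i)})^{-1}\big] &= -\big((S^{u(i)})^{-1} \otimes (S^{u(i)})^{-1}\big) D_{p_2} S^{u(i)} \\
D_{p_2} S^{u(i)} &= \big[0_{4 \times 2},\; \operatorname{vec}(U^{u(i)})\big]
\end{aligned}
\tag{57}
$$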

All quantities not defined in (57) have been introduced in Section 4. The landmark estimates and the covariance are not optimized during this stage, and these quantities therefore remain as in (53) and (54). For detailed derivations of the derivatives in (57) we refer the reader to Appendix A.
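A compact way to experiment with the NG step (56) is sketched below. To keep it short, the gradient of the cost (55) is obtained by central finite differences instead of the analytic expressions (57), while $J_u$ follows (33) and (35), with $U$ written as $\partial S / \partial \psi_2$ (equivalent to (37)). The correspondence representation `(y1, Y1, y2, Y2)`, the function names and the noise-free example are all hypothetical.

```python
import numpy as np

def rot_neg(psi):
    """R(-psi), as in (32)."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, s], [-s, c]])

def drot_neg(psi):
    """d R(-psi) / d psi."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[-s, c], [-c, -s]])

def neg_log_gauss(x, mean, cov):
    """-ln N(x; mean, cov)."""
    d = x - mean
    return 0.5 * (d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(2 * np.pi * cov)))

def cost(p2, mu_q, P, pairs):
    """Cost (55); pairs holds (y1, Y1, y2, Y2) for each correspondence in u."""
    A = rot_neg(p2[2])
    b = -A @ p2[:2]
    total = neg_log_gauss(p2, mu_q, P)
    for y1, Y1, y2, Y2 in pairs:
        total += neg_log_gauss(y2, A @ y1 + b, Y2 + A @ Y1 @ A.T)   # -ln s_u(i), (31)
    return total

def info_matrix(p2, P, pairs):
    """J_u(p2) = P^-1 + sum_i J_u(i)(p2), cf. (33) and (35)."""
    r2, A, dA = p2[:2], rot_neg(p2[2]), drot_neg(p2[2])
    J = np.linalg.inv(P)
    for y1, Y1, _, Y2 in pairs:
        Sinv = np.linalg.inv(Y2 + A @ Y1 @ A.T)
        Dnu = np.hstack([A, (dA @ (r2 - y1))[:, None]])
        U = dA @ Y1 @ A.T + A @ Y1 @ dA.T                # dS/dpsi_2, equivalent to (37)
        Ji = Dnu.T @ Sinv @ Dnu
        Ji[2, 2] += 0.5 * np.trace(Sinv @ U @ Sinv @ U)
        J = J + Ji
    return J

def numerical_gradient(f, x, h=1e-6):
    """Central finite differences, a stand-in for the analytic gradient (57)."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def natural_gradient(p_init, mu_q, P, pairs, iterations=20):
    """Iterate (56): p <- p - J_u(p)^-1 grad c(p)."""
    p = np.array(p_init, dtype=float)
    for _ in range(iterations):
        grad = numerical_gradient(lambda q: cost(q, mu_q, P, pairs), p)
        p = p - np.linalg.solve(info_matrix(p, P, pairs), grad)
    return p

# Hypothetical noise-free example with five landmarks and a known displacement.
p_true = np.array([2.0, 0.5, 0.2])
landmarks = np.array([[10.0, 5.0], [15.0, -3.0], [8.0, -8.0], [20.0, 2.0], [12.0, 10.0]])
Ycov = 0.05 * np.eye(2)
pairs = [(x, Ycov, rot_neg(p_true[2]) @ (x - p_true[:2]), Ycov) for x in landmarks]
mu_q, P = np.zeros(3), np.diag([25.0, 4.0, np.deg2rad(30.0) ** 2])
p_hat = natural_gradient(mu_q, mu_q, P, pairs)
```

With isotropic covariances, as in this example, $S^{u(i)}$ does not depend on $\psi_2$, so the trace term vanishes and the step reduces to a Gauss–Newton step; anisotropic $Y$ matrices exercise the full expression.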

Figure 4 illustrates the inconsistency problem of the EKF by means of an example scenario where the performance of the EKF is particularly bad. In this case, the EKF asserts that the true MAP estimate is about $10^{40}$ times less probable than its own estimate $p^u_{2|2}$. On the other hand, the Gaussian $\mathcal{N}(p_2; p^u_{2|2}, (J_u(p^u_{2|2}))^{-1})$ is very similar to the true posterior $f(p_2 \mid u, Z_{1:2})$. Further illustration of how the NG step mends the inconsistency problem can be found in Figure 3 of Brekke and Chitre (2013).

6. Hypothesis search by clique detection

In practical MHT implementations, the key challenge is to search for promising hypotheses without brute-force enumeration. This search is typically carried out by assignment methods based on integer programming (Wijesoma et al., 2006) or the auction algorithm (Blackman and Popoli, 1999). These methods exploit a key feature of multiple hypothesis tracking, namely that the cost of a hypothesis can be written as a sum of terms contributed by the measurements involved. In multi-hypothesis SLAM this is no longer possible, since the correspondences become mutually dependent when the uncertainty of $p_2$ is taken into account. Moreover, assignment methods such as integer programming or the auction algorithm only aim to discover a single best hypothesis, or possibly the $N$ best hypotheses; they do not attempt to explore the hypothesis space beyond that.

In this paper we solve the search problem through a three-stage procedure consisting of validation gating, clique detection and hypothesis expansion.

6.1. Validation gating

First of all, validation gating similar to the individual compatibility test of Neira and Tardós (2001) is used in order to keep the number of correspondences at a manageable level. More precisely, $z^j_2$ may be associated with $z^i_1$ only if

$$
\big(z^j_2 - h(y^i_1, \mu_q)\big)^T (S^i)^{-1} \big(z^j_2 - h(y^i_1, \mu_q)\big) < \gamma^2 \tag{58}
$$


where $\gamma$ is the number of standard deviations tolerated and

$$
S^i = H^i_p P (H^i_p)^T + H^i_x Y^i_1 (H^i_x)^T
$$

The Jacobians $H^i_p$ and $H^i_x$ are found according to (47).

The output of the gating procedure can be represented by a matrix $G \in \{0, 1\}^{m_1 \times m_2}$ whose entry $(i, j)$ is one if $z^i_1$ may be associated with $z^j_2$, and zero otherwise.
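The gating step can be sketched as below: for each first-frame measurement we predict $h(y^i_1, \mu_q)$, form $S^i$ from the Jacobians (47)–(48), and test every second-frame measurement against (58). Two choices here are our own assumptions rather than statements from the text: we take $S(\psi_2)$ in (48) to be the derivative of $R(-\psi_2)$ with respect to $\psi_2$ (consistent with $D_{p_2}\nu^{u(i)}$ in Section 4.1), and we wrap the bearing difference to $(-\pi, \pi]$. All names and numbers are hypothetical.

```python
import numpy as np

def rot_neg(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, s], [-s, c]])

def drot_neg(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[-s, c], [-c, -s]])

def h_and_jacobians(y1, p2):
    """Predicted polar measurement h(y1, p2) and Jacobians H_p, H_x, cf. (47)-(48)."""
    r2, psi = p2[:2], p2[2]
    x, y = rot_neg(psi) @ (y1 - r2)                   # g(y1, p2)
    r = np.hypot(x, y)
    h = np.array([r, np.arctan2(y, x)])               # f(g): Cartesian -> polar
    Dgf = np.array([[x / r, y / r], [-y / r**2, x / r**2]])
    Dpg = np.hstack([-rot_neg(psi), (drot_neg(psi) @ (y1 - r2))[:, None]])
    return h, Dgf @ Dpg, Dgf @ rot_neg(psi)

def gating_matrix(y1_list, Y1_list, z2_list, mu_q, P, gamma):
    """G[i, j] = 1 if z2_j passes the test (58) against first-frame measurement i."""
    G = np.zeros((len(y1_list), len(z2_list)), dtype=int)
    for i, (y1, Y1) in enumerate(zip(y1_list, Y1_list)):
        h, Hp, Hx = h_and_jacobians(y1, mu_q)
        S = Hp @ P @ Hp.T + Hx @ Y1 @ Hx.T            # S^i as written in the text
        for j, z2 in enumerate(z2_list):
            d = z2 - h
            d[1] = (d[1] + np.pi) % (2 * np.pi) - np.pi   # bearing wrap (our addition)
            G[i, j] = int(d @ np.linalg.solve(S, d) < gamma**2)
    return G

# Hypothetical scenario: three landmarks re-observed noise-free, plus one outlier.
p_true = np.array([1.0, 0.3, 0.05])
mu_q, P = np.zeros(3), np.diag([25.0, 4.0, np.deg2rad(30.0) ** 2])
y1_list = [np.array([20.0, 5.0]), np.array([30.0, -10.0]), np.array([15.0, 12.0])]
Y1_list = [0.1 * np.eye(2)] * 3
z2_list = [h_and_jacobians(y, p_true)[0] for y in y1_list] + [np.array([55.0, -0.9])]
G = gating_matrix(y1_list, Y1_list, z2_list, mu_q, P, gamma=3.0)
```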

6.2. Clique detection

The search problem can be addressed by treating the individual correspondences as nodes in a graph, whose edges represent distances between the correspondences. For the FBSM problem discussed in this paper, a correspondence consists of two measurements, which impose two scalar constraints on the three-dimensional displacement $p_2$. This leaves one free parameter during displacement estimation, so the locus of all possible $p_2$ for a given correspondence can be visualized as a helix. Following this line of thought, the nodes of the correspondence graph correspond to helices, and the edges correspond to minimal sample sets (MSSs), i.e. data-to-data hypotheses with two correspondences. The edge weights are related to the minimal distances between helices. One would expect all of the helices involved in a good hypothesis to intersect a limited volume near the correct displacement vector $p^{\mathrm{true}}_2$.
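The helix picture follows directly from the noise-free measurement model $y_2 = R(-\psi_2)(y_1 - r_2)$ that underlies (31)–(32): solving for the translation gives $r_2 = y_1 - R(\psi_2)\, y_2$, so as $\psi_2$ varies, $r_2$ moves on a circle of radius $\|y_2\|$ centred at $y_1$, while $\psi_2$ increases along the third axis. The short sketch below (our own, with hypothetical numbers) generates points on this locus.

```python
import numpy as np

def rot(angle):
    """Counter-clockwise planar rotation R(angle)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def correspondence_locus(y1, y2, psis):
    """All p2 = [r2, psi2] with y2 = R(-psi2)(y1 - r2) exactly: r2 = y1 - R(psi2) y2."""
    return np.array([np.concatenate([y1 - rot(p) @ y2, [p]]) for p in psis])

# Hypothetical correspondence (Cartesian converted measurements in metres).
y1, y2 = np.array([12.0, 4.0]), np.array([9.0, 6.0])
locus = correspondence_locus(y1, y2, np.linspace(-np.pi, np.pi, 73))
```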

Working with helices is problematic for two reasons. First, the construction of helices is fundamentally non-probabilistic. Second, inter-helix distances would have to be evaluated using a Riemannian metric, but it is not immediately clear how such a metric should be defined, or how the distances should be minimized.

Instead, we establish the edges of the correspondence graph using the normalized re-projection error. For any MSS $u$ which can be generated from $G$, we find its normalized re-projection error as

$$
e(u) = (\nu^u)^T \big(H^u P^u_{2|2} (H^u)^T + R^u\big)^{-1} \nu^u \tag{59}
$$

Only those edges whose normalized re-projection error is below some threshold $\tau$ are included in the graph. Furthermore, if the number of edges exceeds some constant $B$, then we only retain the $B$ best edges.

In order to find the maximal cliques of this graph we use the Bron–Kerbosch method (Bron and Kerbosch, 1973) as implemented by Wildman (2011).
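Wildman's (2011) implementation is not reproduced here; instead, the sketch below is our own minimal Python version of the two steps described above. It builds the correspondence graph from precomputed MSS scores $e(u)$ (keeping edges below a threshold and then at most the $B$ best), and enumerates maximal cliques with Bron–Kerbosch using the common pivoting rule that picks the vertex with the most neighbours in the candidate set. How $\tau$ maps onto the scale of $e(u)$ is left to the caller as an assumption.

```python
def build_graph(edge_scores, threshold, B):
    """edge_scores: {(a, b): e(u)} for MSSs u = {a, b}; keep e < threshold, then best B."""
    kept = sorted((e, pair) for pair, e in edge_scores.items() if e < threshold)[:B]
    adj = {}
    for _, (a, b) in kept:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def bron_kerbosch(adj):
    """Return all maximal cliques (as frozensets) of the undirected graph adj."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(R)                 # R cannot be extended: maximal clique
            return
        pivot = max(P | X, key=lambda v: len(adj[v] & P))
        for v in list(P - adj[pivot]):        # skip neighbours of the pivot
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(frozenset(), set(adj), set())
    return cliques

# Hypothetical scores: a triangle {1, 2, 3}, an edge {3, 4}, and one rejected edge.
scores = {(1, 2): 1.0, (2, 3): 1.5, (1, 3): 0.5, (3, 4): 2.0, (4, 5): 10.0}
adj = build_graph(scores, threshold=4.0, B=200)
cliques = bron_kerbosch(adj)
```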

6.3. Hypothesis expansion

The collection of maximal cliques does not necessarily contain the true hypothesis $u^{\mathrm{true}}$. One can nevertheless rest assured that $u^{\mathrm{true}}$ will be a sub-clique of one of the maximal cliques, provided that the edge threshold $\tau$ and the maximal edge number $B$ are large enough. Therefore, for each maximal clique $u$, we also include in the hypothesis collection all sub-cliques which contain $n(u) - \delta$ or more nodes. We refer to $\delta$ as the expansion depth.
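Since every subset of a clique is itself a clique, the expansion step amounts to enumerating subsets of each maximal clique down to size $n(u) - \delta$. A minimal sketch (our own naming), with duplicates across overlapping cliques removed by storing hypotheses as frozensets:

```python
from itertools import combinations

def expand_hypotheses(maximal_cliques, depth):
    """All sub-cliques with at least n(u) - depth nodes, for every maximal clique u."""
    hypotheses = set()
    for clique in maximal_cliques:
        n = len(clique)
        for k in range(max(n - depth, 0), n + 1):
            for sub in combinations(sorted(clique), k):
                hypotheses.add(frozenset(sub))
    return hypotheses

cliques = [frozenset({1, 2, 3}), frozenset({3, 4})]
shallow = expand_hypotheses(cliques, depth=1)
```

With a depth at least as large as the biggest clique, the empty hypothesis is also generated, since `combinations(..., 0)` yields one empty tuple.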

7. Test design and simulation results

In this section we first (Sections 7.1–7.4) compare CDSM with four other methods for scan matching and data association in SLAM. We then discuss the sensitivity of SCPHD and CDSM to model mismatch in Section 7.5.

7.1. Simulation setup

We investigate the performance of CDSM through Monte Carlo simulations designed to mimic output from a multi-beam sonar with maximum range 60 m and total azimuth coverage of 120°. The vehicle's surge velocity is drawn uniformly from the interval [0 knots, 5 knots], while the time interval between the two scans is drawn uniformly from the interval [1 s, 3 s], so as to mimic a realistic application for autonomous underwater vehicles (AUVs).²

Fig. 4. Illustration of the EKF's inconsistency problem, and how the NG optimization step mends it. (a) Logarithm of pose distribution and EKF approximant. (b) Logarithm of pose distribution and NG approximant.

Landmarks are placed at random around the vehicle, and detected with various detection probabilities in [0, 1], independently between the scans. For any detected landmark, the measurement noise is given by $R = \mathrm{diag}([0.125, 5.7 \times 10^{-4}])$, corresponding to a sensor resolution of 60 × 31 cells (Brekke et al., 2010). Clutter measurements are drawn from a spatial distribution that is uniform over the FOV in polar coordinates. The number of false alarms is drawn from Poisson distributions corresponding to various false-alarm rates in $[10^{-10}, 10^{-1}]$.
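One scan of this kind of simulation might be generated as in the sketch below. It is not the authors' simulator: the sector is assumed to be centred on bearing zero, the expected clutter count per scan is passed in directly (the mapping from a per-cell false-alarm rate is not specified here), and landmarks are given in the sensor frame. The function name and example values are hypothetical.

```python
import numpy as np

def simulate_scan(landmarks, P_D, clutter_mean, R, rng, r_max=60.0, fov_deg=120.0):
    """Return (Z, n_det): noisy polar detections of landmarks first, then clutter."""
    half = np.deg2rad(fov_deg) / 2
    detections = []
    for x, y in landmarks:
        r, th = np.hypot(x, y), np.arctan2(y, x)
        # Only landmarks inside the sonar's range and azimuth sector can be detected.
        if r <= r_max and abs(th) <= half and rng.random() < P_D:
            detections.append(rng.multivariate_normal([r, th], R))
    n_fa = rng.poisson(clutter_mean)                             # Poisson false alarms
    clutter = np.column_stack([rng.uniform(0.0, r_max, n_fa),    # uniform in range
                               rng.uniform(-half, half, n_fa)])  # and in bearing
    Z = np.vstack([np.array(detections).reshape(-1, 2), clutter])
    return Z, len(detections)

rng = np.random.default_rng(1)
R = np.diag([0.125, 5.7e-4])
# The last two landmarks lie outside the range and outside the azimuth sector.
landmarks = np.array([[20.0, 0.0], [30.0, 10.0], [70.0, 0.0], [-5.0, 1.0]])
Z, n_det = simulate_scan(landmarks, P_D=1.0, clutter_mean=5.0, R=R, rng=rng)
```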

Having generated several hundred thousand such scenarios, we pick the first 200 Monte Carlo runs which satisfy various constraints given by the number $n$ of landmark measurements, by the average number $\bar{\phi} = (\phi_1 + \phi_2)/2$ of clutter measurements, and by the relative pose angle $\psi_2$ between the two scans. We investigate a total of 16 such scenarios (see Tables 2–5).

The pose prior is given by $\mu_q = 0_{3 \times 1}$ and $P = \mathrm{diag}([(5\ \mathrm{m})^2, (2\ \mathrm{m})^2, (30^\circ)^2])$. The normalized re-projection error $e(u)$ is thresholded at $\tau = 4$ standard deviations, and validation gating is done with $\gamma = 3$. At most $B = 200$ correspondences are included in the correspondence graph. We use $\delta = n$ as the expansion depth for CDSM.

Recall that CDSM makes minimal assumptions regarding the false-alarm rate or detection probability. Thus, the model assumed by CDSM differs from the simulation model with regard to these quantities. This is justified by the fact that it is hardly possible to estimate these quantities reliably from only two frames.

7.2. Benchmark methods

7.2.1. JCBB. We use a Matlab implementation of JCBB which was downloaded from http://www.robots.ox.ac.uk/~SSS06. It should be noted that the original implementation did not give satisfactory performance for large $\lvert\psi_2\rvert$, due to its use of the converted measurements $y^i_2$ when computing joint compatibility and Mahalanobis distances. The program was therefore rewritten so that the polar measurements $z^i_2$ are used instead of the converted measurements $y^i_2$ in these tasks. JCBB's estimate of $p_2$ is found using the same technique as CDSM uses.

Table 2. Successes for $5 \le n < 10$, $\bar{\phi} < 10$.

| $\lvert\psi_2\rvert$ | CDSM | JCBB | SCPHD | RANSAC | pIC |
|---|---|---|---|---|---|
| $< 2^\circ$ | 94.5% | 93.0% | 96.0% | 90.0% | 26.5% |
| $\in [2^\circ, 8^\circ)$ | 94.0% | 94.0% | 91.0% | 55.5% | 31.0% |
| $\in [8^\circ, 32^\circ)$ | 92.5% | 95.0% | 95.5% | 1.0% | 19.5% |
| $\ge 32^\circ$ | 94.5% | 94.5% | 93.0% | 0.0% | 7.0% |

Table 3. Successes for $n < 5$, $\bar{\phi} < 10$.

| $\lvert\psi_2\rvert$ | CDSM | JCBB | SCPHD | RANSAC | pIC |
|---|---|---|---|---|---|
| $< 2^\circ$ | 92.0% | 90.0% | 93.4% | 85.5% | 27.5% |
| $\in [2^\circ, 8^\circ)$ | 87.0% | 89.5% | 97.5% | 61.0% | 27.0% |
| $\in [8^\circ, 32^\circ)$ | 92.0% | 91.0% | 92.5% | 16.0% | 20.5% |
| $\ge 32^\circ$ | 88.0% | 89.0% | 90.5% | 8.0% | 12.0% |

Table 4. Successes for $n < 5$, $10 \le \bar{\phi} < 15$.

| $\lvert\psi_2\rvert$ | CDSM | JCBB | SCPHD | RANSAC | pIC |
|---|---|---|---|---|---|
| $< 2^\circ$ | 79.5% | 78.5% | 78.5% | 73.5% | 11.0% |
| $\in [2^\circ, 8^\circ)$ | 79.5% | 73.5% | 83.0% | 41.5% | 9.0% |
| $\in [8^\circ, 32^\circ)$ | 73.0% | 74.5% | 82.0% | 5.5% | 6.0% |
| $\ge 32^\circ$ | 67.5% | 72.5% | 78.0% | 1.0% | 1.0% |

Table 5. Successes for $n < 5$, $15 \le \bar{\phi} < 20$.

| $\lvert\psi_2\rvert$ | CDSM | JCBB | SCPHD | RANSAC | pIC |
|---|---|---|---|---|---|
| $< 2^\circ$ | 51.5% | 41.0% | 59.0% | 40.5% | 6.0% |
| $\in [2^\circ, 8^\circ)$ | 51.0% | 50.5% | 61.0% | 21.0% | 4.0% |
| $\in [8^\circ, 32^\circ)$ | 37.0% | 40.5% | 57.0% | 0.5% | 3.5% |
| $\ge 32^\circ$ | 24.0% | 25.0% | 36.0% | 0.0% | 0.0% |

7.2.2. RANSAC. A generic random sample consensus (RANSAC) method works by drawing random MSSs until $k$ iterations have been executed. During each iteration, all correspondences are tested for compatibility with the current MSS, and added to the current hypothesis if deemed compatible. A score function is then evaluated to test whether the new hypothesis is better than the best hypothesis found previously. The number $k$ is updated dynamically during each iteration, so as to reflect how many iterations are deemed necessary to find the true solution with a given probability $p$ ($p = 0.999$ is used in our simulations):

$$
k = \frac{\log(1 - p)}{\log\big(1 - (\mathrm{card}(u^*)/M)^s\big)} \tag{60}
$$

Here $M$ is the total number of feasible correspondences, $s = 2$ is the size of an MSS, and $\mathrm{card}(u^*)$ is the cardinality of the best hypothesis $u^*$ found so far.
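The adaptive iteration count can be computed as below, using the standard RANSAC stopping rule $k = \log(1-p) / \log(1 - w^s)$ with inlier fraction $w = \mathrm{card}(u^*)/M$. Rounding up to an integer and the handling of the degenerate cases $w = 0$ and $w = 1$ are our own choices in this sketch.

```python
import math

def ransac_iterations(p, n_best, M, s=2):
    """Number of MSS draws needed to hit an all-inlier MSS with probability p."""
    w = n_best / M                     # current inlier-fraction estimate card(u*) / M
    if w >= 1.0:
        return 1                       # every draw is all-inlier
    if w <= 0.0:
        return math.inf                # no inliers found yet: keep sampling
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** s))

k_half = ransac_iterations(0.999, n_best=5, M=10)    # w = 0.5
k_fifth = ransac_iterations(0.999, n_best=2, M=10)   # w = 0.2
```

Because $k$ shrinks rapidly as $\mathrm{card}(u^*)$ grows, recomputing it after every improvement lets the loop terminate early once a large consensus set has been found.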

RANSAC requires us to specify an estimator of $p_2$, an error function for deciding whether a correspondence fits the MSS, and a hypothesis score function. We have deliberately aimed to make our RANSAC implementation as straightforward as possible. Hence, we use the formulae suggested by Horn (1987) to estimate $p_2$. As the error function, we use the re-projection error, defined as

$$
\mathrm{error}(u) = \frac{1}{n} \sum_{i=1}^{n} \big\| R(-\psi)\, y^{u_1(i)}_1 - r - y^{u_2(i)}_2 \big\|_2 \tag{61}
$$
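For readers who want to reproduce a RANSAC baseline of this kind, the sketch below gives a closed-form least-squares estimate of $(\psi, r)$ under the model implied by (61), $y_2 \approx R(-\psi)\, y_1 - r$, together with the error (61). It uses the standard planar Procrustes solution (centroids plus one `arctan2`), which is in the spirit of Horn (1987) but is not claimed to be the exact set of formulae used in the paper; names and data are hypothetical.

```python
import numpy as np

def rot(angle):
    """Counter-clockwise planar rotation R(angle)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def fit_pose_2d(Y1, Y2):
    """Least-squares psi, r minimizing sum ||R(-psi) y1 - r - y2||^2 (rows are points)."""
    m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)
    a, b = Y1 - m1, Y2 - m2
    # Angle phi with b ~ R(phi) a: maximizes sum b^T R(phi) a.
    phi = np.arctan2(np.sum(a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0]),
                     np.sum(a[:, 0] * b[:, 0] + a[:, 1] * b[:, 1]))
    psi = -phi                         # since R(-psi) = R(phi)
    r = rot(phi) @ m1 - m2             # from R(-psi) m1 - r = m2
    return psi, r

def reprojection_error(psi, r, Y1, Y2):
    """Error (61): mean Euclidean norm of R(-psi) y1 - r - y2."""
    residuals = (rot(-psi) @ Y1.T).T - r - Y2
    return float(np.mean(np.linalg.norm(residuals, axis=1)))

# Hypothetical noise-free check with a known displacement.
rng = np.random.default_rng(3)
Y1 = rng.uniform(-20.0, 20.0, size=(6, 2))
psi_true, r_true = 0.3, np.array([1.5, -2.0])
Y2 = (rot(-psi_true) @ Y1.T).T - r_true
psi_hat, r_hat = fit_pose_2d(Y1, Y2)
```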

When testing a correspondence's fit with an MSS, this function is evaluated for each of the resulting three pairs of correspondences. The correspondence is added to the hypothesis if the average of all three re-projection errors is smaller than a threshold $T = 2$ m. As hypothesis score we use the cardinality of the hypothesis. Whenever a tie is encountered, the hypothesis set with the lowest re-projection error is chosen.

7.2.3. pIC. Standard scan-matching methods work in terms of a two-step iterative process which is repeated until convergence. In step 1, a measure of the plausibility of each tentative correspondence is calculated. In step 2, the displacement between the two scans is calculated according to a least-squares criterion which puts most emphasis on the most plausible correspondences. The most popular methods of this kind are the iterative closest point (ICP) method (Besl and McKay, 1992; Nieto et al., 2007) and the probabilistic iterative correspondence (pIC) method (Montesano et al., 2005; Hernandez et al., 2009). The latter is specifically tailored to measurements from range-bearing sensors. In our simulations we use pIC as described in Montesano et al. (2005) with at most eight iterations.

7.2.4. SCPHD. This method, originally proposed by Lee et al. (2012) and further developed in Lee et al. (2013), is currently the most recent SLAM method derived from the FISST formalism. Roughly speaking, SCPHD evaluates a joint intensity function over the vehicle state space and the landmark space. In the implementation of Lee et al. (2012), the vehicle posterior was evaluated using a particle filter, while the landmark intensity conditional on a given vehicle pose was represented by a Gaussian mixture.

The accuracy of SCPHD depends on the number of samples and on how the samples are chosen. In order to ensure that SCPHD is able to achieve the same resolution as CDSM without odometry or data-dependent sampling, several hundred thousand samples may be necessary. For this reason, we do not use particle filtering in our implementation of SCPHD. Instead, we evaluate the pose displacement posterior of SCPHD over a fixed grid.

In SCPHD we must specify some tuning constants, notably the detection probability $P_D$, the expected number of landmarks $\mu$ and the clutter rate $\lambda$. In accordance with the modeling underlying CDSM we assume $P_D = 1$, while we set both $\mu$ and $\lambda$ equal to five. Further details concerning our implementation of SCPHD can be found in Appendix B.

7.3. Performance measures

We analyze the performance of CDSM in terms of the three normalized error functions

$$
\begin{aligned}
e_u &= \sqrt{(p^{u^*}_{2|2} - p^{\mathrm{true}}_2)^T P^{-1}_{p,\mathrm{true}} (p^{u^*}_{2|2} - p^{\mathrm{true}}_2)} \\
e^{\mathrm{MAP}}_1 &= \sqrt{(p^{\mathrm{MAP}}_{2|2} - p^{\mathrm{true}}_2)^T P^{-1}_{p,\mathrm{true}} (p^{\mathrm{MAP}}_{2|2} - p^{\mathrm{true}}_2)} \\
e^{\mathrm{MAP}}_2 &= \sqrt{(p^{\mathrm{MAP}}_{2|2} - p^{\mathrm{true}}_2)^T P^{-1}_{p,\mathrm{ave}} (p^{\mathrm{MAP}}_{2|2} - p^{\mathrm{true}}_2)}
\end{aligned}
$$

where

$$
p^{\mathrm{MAP}}_{2|2} = \arg\max_{p_2} f(p_2 \mid Z_{1:2}), \qquad
P_{p,\mathrm{ave}} = \int (p_2 - p^{\mathrm{true}}_2)(p_2 - p^{\mathrm{true}}_2)^T f(p_2 \mid Z_{1:2})\, dp_2
$$

and where $u^*$ is the most probable non-empty data-to-data hypothesis, $p^{\mathrm{true}}_2$ is the true pose displacement, and $P_{p,\mathrm{true}}$ is the hypothesis-conditional pose covariance of the true data-to-data hypothesis. We are particularly concerned with two criteria for performance: success rates and consistency.

7.3.1. Success rates. For a match to be declared successful, we require that the top non-empty hypothesis has at least $\max(2, n^{\mathrm{true}}/2)$ correct correspondences, and that $e_u < 3$.


This performance measure must in some cases be slightly modified in order to apply to the benchmark methods. For both JCBB and RANSAC the same success criterion applies. For SCPHD we declare a success whenever $e^{\mathrm{MAP}}_1 < 3$. For pIC we require that its final estimate lie within three standard deviations (as given by $P_{p,\mathrm{true}}$) of the true pose displacement.

7.3.2. Consistency. In the stochastic filtering literature, an estimator is considered consistent if it has the appropriate degree of self-confidence. More precisely, Bar-Shalom et al. (2011) state that the estimation error should have a magnitude commensurate with the covariance yielded by the estimator. A related requirement, suggested by Brekke and Chitre (2013), is that the estimation procedure should not consider the true parameter value significantly less probable than the estimated value. In this paper we take a more conventional approach, and analyze consistency in terms of the normalized error measure $e^{\mathrm{MAP}}_2$. Without stating any more precise requirements, we desire that the distribution of $e^{\mathrm{MAP}}_2$ be close to that of a $\chi$-distributed random variable with three degrees of freedom, i.e. similar to the distribution of the Euclidean norm of a three-dimensional Gaussian random vector with zero mean and unit covariance. The rationale is that insofar as the posterior density $f(p_2 \mid Z_{1:2})$ is close to a Gaussian, $e^{\mathrm{MAP}}_2$ should indeed be distributed like the norm of such a unit-covariance Gaussian vector.
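The consistency check can be illustrated with synthetic errors. The sketch below (hypothetical covariance and sample size) computes normalized errors of the common form shared by $e_u$, $e^{\mathrm{MAP}}_1$ and $e^{\mathrm{MAP}}_2$ and compares their sample mean with the mean of a three-degree-of-freedom $\chi$ variable, $\sqrt{2}\,\Gamma(2)/\Gamma(3/2) = 2\sqrt{2/\pi} \approx 1.596$. An estimator whose reported covariance is four times too small yields normalized errors that are twice as large, which shows up clearly in such a comparison.

```python
import math
import numpy as np

def normalized_error(p_est, p_true, cov):
    """sqrt((p_est - p_true)^T cov^-1 (p_est - p_true))."""
    d = np.asarray(p_est, float) - np.asarray(p_true, float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# Mean of a chi-distributed variable with 3 degrees of freedom.
CHI3_MEAN = math.sqrt(2) * math.gamma(2.0) / math.gamma(1.5)

rng = np.random.default_rng(0)
P = np.diag([0.04, 0.01, 1e-4])                 # hypothetical reported pose covariance
errors = rng.multivariate_normal(np.zeros(3), P, size=20000)
consistent = np.array([normalized_error(e, np.zeros(3), P) for e in errors])
overconfident = np.array([normalized_error(e, np.zeros(3), P / 4) for e in errors])
```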

We analyze consistency for CDSM, JCBB and SCPHD only. For both CDSM and SCPHD we evaluate $e^{\mathrm{MAP}}_2$ through grid-based optimization of the pose posterior. For JCBB, which only provides a single hypothesis with no hedging on alternative hypotheses, the concept of MAP estimation is of questionable validity. In order to get some idea of JCBB's consistency properties we simply treat its final pose estimate as $p^{\mathrm{MAP}}_{2|2}$, and the corresponding covariance as $P_{p,\mathrm{ave}}$.

7.4. Simulation results

7.4.1. Success rates. Success rates for increasingly difficult scenarios are displayed in Tables 2–5. We make the following four comments on these results.

The success rates of CDSM, JCBB and SCPHD are very similar. They are generally quite high, but not perfect. Even in the simplest case (Table 2) there are several Monte Carlo runs where both CDSM and JCBB fail to choose the correct hypothesis. However, in the vast majority of these cases, a hypothesis which satisfies our success criteria can be found among the top 10 CDSM hypotheses. Such hedging is not provided by JCBB, since it only returns one hypothesis.

SCPHD is arguably the most robust method, since it achieves substantially higher success rates than the other methods for the most challenging scenarios (the last two rows in Table 5). However, neither SCPHD nor any of the other methods considered can be expected to estimate the true pose displacement accurately when there is four or more times as much clutter as there are landmarks. An interesting, but highly non-trivial, topic for future research may be to develop rigorous Cramér–Rao lower bounds based on model parameters such as sensor resolution and clutter rate, possibly along the lines of Rezaeian and Vo (2010).

The plain-vanilla RANSAC method achieves very good performance for low rotation uncertainty, but deteriorates quickly as $\lvert\psi_2\rvert$ increases. We believe that this is because it does not use a probabilistic EKF-based pose estimation technique. In Brekke and Chitre (2013) an alternative “probabilistic” RANSAC method was also proposed to overcome this problem, but that method never achieved the same best-case performance as the plain-vanilla RANSAC. We believe that RANSAC-based methods have great potential for solving data association problems in SLAM, but considerable refinement may be necessary to achieve the desired performance.

In most cases, pIC fails to provide adequate estimates. This may be because pIC depends on continuous features such as walls, or because there are too few landmarks in the simulated scenarios. However, even when pIC violates the success criterion, it is not necessarily entirely lost: often pIC yields near-acceptable estimates, but local optima prevent the method from converging to sufficiently accurate estimates.

7.4.2. Consistency. To investigate consistency, Figure 5 displays box-plots of $\epsilon^{\mathrm{MAP}}_2$-values for the scenarios corresponding to Table 3 (CDSM 1, JCBB 1 and SCPHD 1) and Table 4 (CDSM 2, JCBB 2 and SCPHD 2), as well as a box-plot for samples of a three-degree-of-freedom $\chi$-distributed random variable (labeled Gauss). For the easy scenario, all three methods (CDSM, JCBB and SCPHD) have nearly ideal consistency properties insofar as $\epsilon^{\mathrm{MAP}}_2$ is concerned.
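The reference box-plot labeled Gauss can be reproduced from the fact that, for a consistent estimator with Gaussian errors, the square root of the normalized estimation error squared of a three-dimensional pose error is $\chi$-distributed with three degrees of freedom. The minimal sketch below assumes that $\epsilon^{\mathrm{MAP}}_2$ is defined in this way (its precise definition appears earlier in the paper); the function names and the covariance are hypothetical. It compares the quartiles of normalized errors from simulated consistent errors with the quartiles of a $\chi(3)$ reference sample.

```python
import numpy as np

def normalized_error(err, P):
    """Square root of the normalized estimation error squared: sqrt(err^T P^-1 err)."""
    err = np.asarray(err, dtype=float)
    return float(np.sqrt(err @ np.linalg.solve(P, err)))

def chi3_reference(num, rng):
    """Samples of a 3-DOF chi variable: Euclidean norms of 3-vectors of iid N(0, 1)."""
    return np.linalg.norm(rng.standard_normal((num, 3)), axis=1)

rng = np.random.default_rng(0)
P = np.diag([0.04, 0.04, 0.01])  # hypothetical pose covariance for [x, y, psi]
# Errors drawn from N(0, P) mimic a perfectly consistent estimator.
errs = rng.multivariate_normal(np.zeros(3), P, size=20000)
eps2 = np.array([normalized_error(e, P) for e in errs])
ref = chi3_reference(20000, rng)

# Box-plot summaries (quartiles) of the two samples should nearly coincide.
q_est = np.percentile(eps2, [25, 50, 75])
q_ref = np.percentile(ref, [25, 50, 75])
print(q_est, q_ref)
```

An over-conservative method would shift the quartiles of `eps2` below those of the reference, and an inconsistent one would push them (and the outliers) above it.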

For the tough scenario, CDSM maintains low $\epsilon^{\mathrm{MAP}}_2$ values. In other words, the actual estimation error is low compared with the overall covariance, implying that CDSM becomes more conservative. There are two reasons for this. First, the increased amount of clutter causes it to put more emphasis on the empty hypothesis as well as on other low-cardinality hypotheses. Second, the increased number of available measurements also causes it to hedge on a larger number of semi-true hypotheses, typically with one correspondence wrong or missing, which leads to a larger spread of the main mode of $f(p_2 \mid Z_{1:2})$.

Fig. 5. Consistency of CDSM, JCBB and SCPHD.

For SCPHD, and especially for JCBB, the large number of outliers indicates inconsistent behavior in the tough scenario. For JCBB this is clearly because it chooses a single hypothesis without accounting for data association uncertainty. For SCPHD, we discuss one possible cause of inconsistency in the next subsection.

7.5. Sensitivity to assumed clutter rate in Poisson formulation

As mentioned, SCPHD assumes the cardinalities $n$ (number of landmarks) and $\varphi_k$ (number of false alarms) to be Poisson distributed with rates $\mu$ and $\lambda$. We could also have developed CDSM under these assumptions, instead of the uniform assumptions used in (12) and (13). This leads to the alternative hypothesis probability formula

$$\Pr(\theta \mid Z_{1:2}) \propto \mu^{n} \left(\frac{\lambda}{V}\right)^{\varphi_1+\varphi_2} a_\theta \tag{62}$$

The approximation for $a_\theta$ and the hypothesis-conditional state estimation remain as before. It is of interest to investigate how robust such a Poisson-CDSM is compared with the default CDSM.

In Figure 6 we see an example of how the normalized errors $\epsilon^{\mathrm{MAP}}_1$ and $\epsilon^{\mathrm{MAP}}_2$ may depend on the assumed false alarm rate $\lambda$ for SCPHD and the Poisson-CDSM. First of all, we note that $\epsilon^{\mathrm{MAP}}_1$ and $\epsilon^{\mathrm{MAP}}_2$ are virtually indistinguishable for SCPHD and the Poisson-CDSM. This can be taken as evidence that SCPHD indeed provides a very good approximation of the posterior density for FBSM (conditional on the Poisson assumptions, of course). With regard to the primary error measure $\epsilon^{\mathrm{MAP}}_1$, we can see that assuming too low a clutter rate appears to be less dangerous than assuming too high a clutter rate. At least for Poisson-CDSM, this can be understood in terms of the emphasis put on cardinality information. The assumed absence of clutter makes Poisson-CDSM shift probability towards high-cardinality hypotheses. In most cases, unless actual clutter rates are extremely high, the hypothesis with the highest cardinality tends to be correct or semi-correct, and choosing this hypothesis (which is always the aim of JCBB and RANSAC) is therefore likely to yield good accuracy. On the other hand, the consistency measure $\epsilon^{\mathrm{MAP}}_2$ attains its lowest values for slightly exaggerated values of $\lambda$. However, these very low values of $\epsilon^{\mathrm{MAP}}_2$ (down to 0.1 standard deviations) are unreasonably low, and indicate an over-conservative estimation method. For the default CDSM, both $\epsilon^{\mathrm{MAP}}_1$ and $\epsilon^{\mathrm{MAP}}_2$ attain reasonable values (2.8 and 0.8 standard deviations, respectively) in this particular example.

Figure 7 gives insight into how the hypothesis probabilities depend on $\mu$ and $\lambda$. For the same example, the ratio $\Pr(\theta_{\mathrm{true}} \mid Z_{1:2}) / \Pr(\theta_0 \mid Z_{1:2})$, where $\theta_{\mathrm{true}}$ is the true data-to-data hypothesis and $\theta_0$ is the empty data-to-data hypothesis, is plotted as a function of $\mu$ and $\lambda$. For default CDSM, this ratio is 386 in this example, indicating a strong, but not absolute, confidence in $\theta_{\mathrm{true}}$. For Poisson-CDSM, on the other hand, this ratio varies by a factor larger than $10^9$ for $\lambda \in [1, 10]$. Such a large variation is clearly not representative of our actual knowledge, and should cause concern regarding the Poisson assumption.
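Under (62), and provided $a_\theta$ does not depend on $\lambda$, the ratio between two hypotheses depends on $\lambda$ only through $(\lambda/V)^{\Delta\varphi}$, where $\Delta\varphi$ is the difference in the total number of false alarms $\varphi_1+\varphi_2$. Over $\lambda \in [1, 10]$ the ratio therefore varies by a factor $10^{|\Delta\varphi|}$, so a variation beyond $10^9$ corresponds to the empty hypothesis declaring at least ten more measurements as clutter than the true hypothesis does. The following minimal sketch evaluates (62) in the log domain and normalizes the weights. All function names and numbers are hypothetical; the counts are chosen, assuming every landmark is detected in both frames, so that both hypotheses explain the same $m_1 = 8$ and $m_2 = 9$ measurements.

```python
import math

def log_weight_62(n, phi1, phi2, log_a, mu, lam, V):
    """Unnormalized log-probability of a hypothesis under (62):
    log(mu**n * (lam / V)**(phi1 + phi2) * a_theta)."""
    return n * math.log(mu) + (phi1 + phi2) * math.log(lam / V) + log_a

def normalize_log_weights(log_w):
    """Convert unnormalized log-weights into probabilities (log-sum-exp for stability)."""
    top = max(log_w)
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

# Hypothetical hypotheses as (n, phi1, phi2, log a_theta); illustrative numbers only.
true_hyp = (6, 2, 3, -60.0)
empty_hyp = (0, 8, 9, 0.0)
mu, V = 10.0, 1000.0  # placeholder landmark rate and surveillance volume

def ratio_true_to_empty(lam):
    """Pr(theta_true | Z) / Pr(theta_0 | Z) under (62) for a given clutter rate."""
    lw_true = log_weight_62(*true_hyp, mu=mu, lam=lam, V=V)
    lw_empty = log_weight_62(*empty_hyp, mu=mu, lam=lam, V=V)
    return math.exp(lw_true - lw_empty)

# Here phi_true - phi_empty = 5 - 17 = -12, so the ratio scales as lam**(-12).
variation = ratio_true_to_empty(1.0) / ratio_true_to_empty(10.0)
probs = normalize_log_weights(
    [log_weight_62(*h, mu=mu, lam=2.0, V=V) for h in (true_hyp, empty_hyp)]
)
print(variation, probs)
```

In this toy setting the ratio changes by a factor $10^{12}$ between $\lambda = 1$ and $\lambda = 10$, even though nothing about the data has changed.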

There are three ways of dealing with this problem. The first way is simply to ignore it, with the justification that curves such as those displayed in Figure 6 indicate that the problem has limited practical relevance. The second way is to acknowledge that our knowledge of $\lambda$ (and of $\mu$ and $P_D$ as well) is uncertain. Following this line of thought, we must specify a (possibly non-informative) distribution for $\lambda$, and marginalize over this distribution. Research in this direction has been reported in Mahler et al. (2011), but no such developments have so far been proposed for SLAM-related problems. The third way, pursued in our development of CDSM, is to give up the Poisson cardinality model, and instead treat $n$ and $\varphi_k$ as distributed according to non-informative distributions.

7.6. Computational complexity

Only limited attempts have been made to reduce the computational cost of CDSM, and a systematic comparison with the complexities of JCBB and SCPHD has therefore not yet been conducted. For scenarios such as those studied in this section, we have found that CDSM requires between 2 and 100 times the runtime of JCBB, while our implementation of SCPHD typically requires about 1000 times the runtime of JCBB.

Fig. 6. Sensitivity to Poisson rate $\lambda$ for SCPHD and the Poisson version of CDSM.


For CDSM, as for any bona fide MHT approach, the theoretical complexity is exponential. Its runtime depends on how aggressively low-probability correspondences and hypotheses are pruned. For SCPHD, the runtime depends on the number of samples used. The high runtime of SCPHD results from our requirement that SCPHD should yield an accuracy commensurate with that of CDSM, while not utilizing any more prior information than CDSM does (cf. Section 7.2.4).
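To give a feel for why pruning matters, the sketch below counts candidate data-to-data hypotheses under the illustrative assumption that such a hypothesis is a one-to-one partial matching between the $m_1$ measurements of the first frame and the $m_2$ measurements of the second. Choosing $k$ correspondences gives $\binom{m_1}{k}\binom{m_2}{k}k!$ hypotheses, and the total grows faster than exponentially. The function name is hypothetical.

```python
from math import comb, factorial

def num_partial_matchings(m1: int, m2: int) -> int:
    """Number of one-to-one partial matchings between m1 and m2 measurements:
    sum over k of C(m1, k) * C(m2, k) * k!  (k = number of correspondences)."""
    return sum(comb(m1, k) * comb(m2, k) * factorial(k) for k in range(min(m1, m2) + 1))

for m in (2, 3, 5, 10, 20):
    print(m, num_partial_matchings(m, m))
```

For $m_1 = m_2 = 2$, 3 and 5 it prints 7, 34 and 1546; by 20 measurements per frame exhaustive enumeration is already out of reach.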

8. Real data results

The Victoria Park data set (Guivant and Nebot, 2001) consists of 7249 frames of data recorded by a laser scanner mounted on a small truck, which was driving in Victoria Park in Sydney, NSW, Australia. Point features corresponding to trees in Victoria Park can be extracted from the data, for example using the Matlab code available from www-personal.acfr.usyd.edu.au/nebot/victoria_park.htm. The Victoria Park data have been used as a test case in several SLAM publications, among them Davey (2007) and Montemerlo (2003).

Since this paper only addresses FBSM, and not FB-SLAM in general, we only analyze the data in pairs of consecutive frames. That is, for all pairs of frames $k-1$ and $k$ we investigate whether CDSM and our FBSM adaptations of JCBB and SCPHD are able to estimate the odometry between these frames reliably. The odometry consists of velocity measurements recorded at the rear left wheel and measurements of the average steering angle of the front wheels. By dead-reckoning using the Ackermann model we are able to calculate ground truths for the pose displacements between consecutive scans. For the methods to be investigated, we use the measurement noise matrix $R = \mathrm{diag}([0.3\ \mathrm{m},\ 0.0076\ \mathrm{rad}]^2)$.
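Dead-reckoning between two scans amounts to integrating the vehicle kinematics over the odometry samples. The Victoria Park vehicle model (sensor offsets, encoder position) is not reproduced here; this is a minimal sketch assuming a simplified kinematic bicycle (Ackermann) model driven by a rear-axle speed and an equivalent front steering angle, integrated with forward Euler steps. The function name, wheelbase and odometry values are hypothetical.

```python
import math

def dead_reckon(speeds, steers, dt, wheelbase):
    """Integrate a kinematic bicycle (Ackermann) model with forward Euler steps.

    speeds:    forward speeds [m/s], assumed to refer to the rear-axle centre
    steers:    steering angles [rad] of an equivalent single front wheel
    dt:        odometry sample interval [s]
    wheelbase: distance between the axles [m] (hypothetical parameter)
    Returns the pose displacement (dx, dy, dpsi) in the frame of the first pose.
    """
    x = y = psi = 0.0
    for v, alpha in zip(speeds, steers):
        x += v * math.cos(psi) * dt
        y += v * math.sin(psi) * dt
        psi += v * math.tan(alpha) / wheelbase * dt
    return x, y, psi

# Hypothetical odometry between two scans: 1 s at 2 m/s with a slight left turn.
dx, dy, dpsi = dead_reckon([2.0] * 40, [0.05] * 40, 0.025, wheelbase=2.5)
print(dx, dy, dpsi)
```

Dividing such displacements by the scan interval $\Delta t$ gives reference velocities of the kind compared against in Figures 8 to 11.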

In the upper plots of Figures 8 and 9 we can see how hypothesis-conditional velocity estimates from CDSM's lead hypotheses match the measured odometry for forward velocity and vehicle rotation, respectively. In the lower plots of these figures we display the normalized errors

$$\epsilon^{\mathrm{best}}_{x,(2)} = \frac{x^{\mathrm{best}}_{2|2} - x_2}{\Delta t \sqrt{P_{\mathrm{ave}}[1,1]}}, \qquad \epsilon^{\mathrm{best}}_{\psi,(2)} = \frac{\psi^{\mathrm{best}}_{2|2} - \psi_2}{\Delta t \sqrt{P_{\mathrm{ave}}[3,3]}}$$

for the same time steps. It is evident that CDSM in the majority of cases estimates the odometry very well. Occasionally it fails to provide good estimates, but in all such cases the normalized errors remain small, indicating that CDSM acknowledges its own failure and hedges heavily on the empty hypothesis. It should also be noted that during very sharp left turns (turn radius less than 5 m), estimates of the rotation rate exhibit a noticeable bias. This may be due to inadequacy of the Ackermann model under these circumstances.

Fig. 7. Ratio between the probabilities of the true hypothesis and the empty hypothesis as a function of $\mu$ and $\lambda$ for the Poisson version of CDSM.

Fig. 8. Forward velocity estimates of CDSM. The best hypothesis occasionally misses odometry, but is generally within $\pm 2\sigma$.

Fig. 9. Turn rate estimates of CDSM. A small bias can be observed during very strong left turns.

In Figure 10 we display box-plots of the forward-velocity errors

$$\epsilon^{\mathrm{best}}_{x,(1)} = \frac{x^{\mathrm{best}}_{2|2} - x_2}{\Delta t}, \qquad \epsilon^{\mathrm{MAP}}_{x,(1)} = \frac{x^{\mathrm{MAP}}_{2|2} - x_2}{\Delta t}$$

for the collection of all consecutive pairs of scans in the data set.³

All of the methods tend to estimate the correct velocity within ±10 cm/s. The typical accuracy of CDSM's best hypothesis and of JCBB is better than the typical accuracy of the MAP estimates. This is because the MAP estimates are obtained from optimization over a grid with limited resolution, which illustrates the performance loss inevitably suffered by sampling-based methods. Again, we note that JCBB has more outliers than CDSM or SCPHD.

In Figure 11 we investigate the distributional properties of the estimation errors more closely by plotting empirical survival functions for the normalized error measures $\epsilon^{\mathrm{best}}_{x,(2)}$ and

$$\epsilon^{\mathrm{MAP}}_{x,(2)} = \frac{x^{\mathrm{MAP}}_{2|2} - x_2}{\Delta t \sqrt{P_{\mathrm{ave}}[1,1]}}$$

The survival function is defined as one minus the cumulative distribution function (cdf). The empirical survival function is constructed by first making a histogram of the relevant samples, and then calculating one minus the cumulative sum of the histogram divided by the total sum of the histogram. For comparison, we also include in Figure 11 the survival function for the absolute value of a zero-mean unit-variance Gaussian random variable. Again, the inconsistency of JCBB is very noticeable, while SCPHD displays a small degree of inconsistency. The normalized error distribution of CDSM-MAP is very similar to the folded unit-variance Gaussian, indicating ideal consistency properties, while CDSM-Best appears to be slightly over-conservative.
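The histogram-based construction of the empirical survival function described above, together with the folded-Gaussian reference $P(|N(0,1)| > x) = \mathrm{erfc}(x/\sqrt{2})$, can be sketched as follows. The function names, bin edges and simulated errors are hypothetical; for a consistent estimator the two curves should nearly coincide.

```python
import math
import numpy as np

def empirical_survival(samples, bin_edges):
    """One minus the normalized cumulative histogram, evaluated at each right bin edge."""
    counts, _ = np.histogram(samples, bins=bin_edges)
    return 1.0 - np.cumsum(counts) / counts.sum()

def folded_gaussian_survival(x):
    """Survival function of |N(0, 1)|: P(|N(0,1)| > x) = erfc(x / sqrt(2))."""
    return np.array([math.erfc(v / math.sqrt(2.0)) for v in np.atleast_1d(x)])

rng = np.random.default_rng(1)
edges = np.linspace(0.0, 5.0, 51)
abs_err = np.abs(rng.standard_normal(50000))  # errors of a hypothetical consistent estimator
surv = empirical_survival(abs_err, edges)
reference = folded_gaussian_survival(edges[1:])
print(np.max(np.abs(surv - reference)))
```

A curve lying above the reference indicates too many large normalized errors (inconsistency), as for JCBB; a curve below it indicates over-conservative covariances.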

9. Conclusion and future research

In this paper we have developed a rigorous solution to what

we call FBSM, which is a special case of the more general

SLAM problem. Our proposed solution, whose complete

machinery is referred to as CDSM, provides a Gaussian

mixture approximation of the posterior density of the pose

displacement. Novel strategies for state estimation and

hypothesis search have been incorporated in CDSM in

order to make this approximation as reliable and efficient

as possible.
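A Gaussian mixture over hypotheses can be summarized by its overall mean and covariance, which makes the effect of hedging visible: probability placed on a broad alternative (such as the empty hypothesis) inflates the overall covariance. The following is a generic moment-matching sketch, not a step of CDSM as described here; the function name, weights, means and covariances are hypothetical.

```python
import numpy as np

def mixture_moments(weights, means, covs):
    """Overall mean and covariance of the Gaussian mixture sum_k w_k N(mean_k, cov_k)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize hypothesis probabilities
    mus = np.asarray(means, dtype=float)
    Ps = np.asarray(covs, dtype=float)
    mean = w @ mus
    diffs = mus - mean
    # Average within-hypothesis covariance plus spread between hypothesis means.
    cov = np.einsum("k,kij->ij", w, Ps) + np.einsum("k,ki,kj->ij", w, diffs, diffs)
    return mean, cov

# Hypothetical example: a sharp "true" hypothesis and a broad "empty" hypothesis.
weights = [0.9, 0.1]
means = [[1.0, 0.2, 0.05], [0.8, 0.0, 0.0]]
covs = [np.diag([0.01, 0.01, 0.001]), np.diag([0.5, 0.5, 0.05])]
mean, cov = mixture_moments(weights, means, covs)
print(mean, np.diag(cov))
```

Even a 10% weight on the broad component raises the overall $x$-variance from 0.01 to about 0.063 in this toy example.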

More precisely, we have shown that the posterior density

of the pose displacement according to FISST is a mixture

over data-to-data hypotheses, we have derived an exact

expression for the weights in this mixture, and we have

developed a closed-form approximation for these mixture

weights. Thus, this paper provides a bridge between the

MHT paradigm and the FISST paradigm. This is very important, because the relationship between these two paradigms has not yet been clarified in the available literature.

For the FBSM problem, CDSM appears to have advantages both over traditional data association methods such as JCBB, and over FISST-based techniques such as SCPHD. Analysis of consistency reveals that CDSM is significantly more robust than JCBB. This can be credited to CDSM's ability to hedge on alternative hypotheses. CDSM also has somewhat better consistency properties than SCPHD. This is probably because CDSM is able to utilize uninformative models for the number of targets and false alarms, as opposed to the Poisson models required by SCPHD. Furthermore, CDSM provides a more direct and parsimonious representation of the posterior density than SCPHD does.

CDSM is only a first step towards truly multi-hypothesis SLAM and target tracking with navigation uncertainty. In future research, we aim to develop extensions to $k$ time steps. This is in principle relatively straightforward, but will entail the following challenges. First, the assumption of unity detection probability must be relaxed. Second, the number of available hypotheses will increase, partly because of non-unity detection probability, so efficient hypothesis management techniques will be necessary. Third, more refined motion models may be required for the extension to $k$ time steps. Using a slightly different hypothesis representation than the one used in this paper, we have recently developed a formal solution to Bayesian multi-hypothesis SLAM with $k$ time steps in Brekke et al. (2014), and we are currently pursuing practical implementations of that solution.

Fig. 10. Boxplots displaying forward velocity error statistics.

Fig. 11. Consistency properties of velocity estimates investigated through empirical survival functions.

Funding

This research received no specific grant from any funding agency

in the public, commercial, or not-for-profit sectors.

Notes

1. The benefit of the NG over Newton is that the NG only

requires evaluation of first-order derivatives of the objective

function, while Newton requires evaluation of second-order

derivatives. Nevertheless, both methods are able to exploit

curvature information. Gauss–Newton, as opposed to

Newton, can also manage with only first-order derivatives,

but only for least-squares problems. Minimizing c(p2) is not

simply a least-squares problem, because c(p2) contains

Gaussians whose normalization constants also depend on p2

(cf. (31)).
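Note 1 can be illustrated with the Fisher information of a Gaussian model $N(\mu(p), S(p))$, which by the Slepian–Bangs formula requires only first derivatives of the mean and covariance: $F_{ab} = \partial_a\mu^{\mathsf T} S^{-1}\partial_b\mu + \tfrac{1}{2}\mathrm{tr}(S^{-1}\partial_a S\, S^{-1}\partial_b S)$. The sketch below is a generic natural-gradient step $p \leftarrow p - \eta F^{-1}\nabla c$ built on that formula; it does not reproduce the exact update (56), and the function names and numbers are hypothetical. For a linear model with fixed covariance, one full step lands on the weighted least-squares solution, which the usage check confirms.

```python
import numpy as np

def gaussian_fisher_information(dmu, dS, S):
    """Slepian-Bangs Fisher information of N(mu(p), S(p)) from first derivatives only.

    dmu: (d, k) Jacobian of the mean with respect to the k parameters
    dS:  (k, d, d) derivative of the covariance with respect to each parameter
    S:   (d, d) covariance at the current parameter value
    """
    S_inv = np.linalg.inv(S)
    F = dmu.T @ S_inv @ dmu
    k = dmu.shape[1]
    for a in range(k):
        for b in range(k):
            F[a, b] += 0.5 * np.trace(S_inv @ dS[a] @ S_inv @ dS[b])
    return F

def natural_gradient_step(p, grad, F, step=1.0):
    """One natural-gradient descent step: p - step * F^{-1} grad."""
    return p - step * np.linalg.solve(F, grad)

# Linear-Gaussian illustration z ~ N(J p, S) with hypothetical numbers.
J = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
S = np.diag([0.1, 0.2, 0.3, 0.4])
z = np.array([1.0, 2.0, 3.2, -0.9])
p0 = np.zeros(2)
grad = -J.T @ np.linalg.solve(S, z - J @ p0)  # gradient of 0.5 (z - Jp)^T S^-1 (z - Jp)
F = gaussian_fisher_information(J, np.zeros((2, 4, 4)), S)
p1 = natural_gradient_step(p0, grad, F)
p_ls = np.linalg.solve(J.T @ np.linalg.solve(S, J), J.T @ np.linalg.solve(S, z))
print(np.allclose(p1, p_ls))
```

When $S$ depends on the parameters, as in $c(p_2)$, the trace term supplies the curvature contributed by the parameter-dependent normalization constants, still using only first derivatives.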

2. Typical surge speed for a small AUV is about 3 knots (Eng

et al., 2010).

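3. For JCBB, there is no difference between $\epsilon^{\mathrm{best}}_{\cdot,(\cdot)}$ and $\epsilon^{\mathrm{MAP}}_{\cdot,(\cdot)}$. For SCPHD, $\epsilon^{\mathrm{best}}_{\cdot,(\cdot)}$ does not exist, so only $\epsilon^{\mathrm{MAP}}_{\cdot,(\cdot)}$ is relevant.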

References

Amari S and Douglas SC (1998) Why natural gradient? In: Proceedings of 1998 IEEE ICASSP. Seattle, WA, USA, pp. 1213–1216.

Bar-Shalom Y, Willett PK and Tian X (2011) Tracking and Data

Fusion: A Handbook of Algorithms. Storrs, CT: YBS

Publishing.

Besl PJ and McKay ND (1992) A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2): 239–256.

Blackman S (2004) Multiple hypothesis tracking for multiple target tracking. IEEE Aerospace and Electronic Systems Magazine 19: 5–18.

Blackman S and Popoli R (1999) Design and Analysis of Modern

Tracking Systems. Norwood, MA: Artech House.

Blanco JL, Gonzalez-Jimenez J and Fernandez-Madrigal J (2012) An alternative to the Mahalanobis distance for determining optimal correspondences in data association. IEEE Transactions on Robotics 28(4): 980–986.

Brekke E and Chitre M (2013) Bayesian multi-hypothesis scan

matching. In: Proceedings of OCEANS’13, Bergen, Norway.

Brekke E, Hallingstad O and Glattetre J (2010) The signal-to-

noise ratio of human divers. In: Proceedings of OCEANS’10,

Sydney, Australia.

Brekke E, Kalyan B and Chitre M (2014) A novel formulation of the Bayes recursion for single-cluster filtering. In: Proceedings of IEEE Aerospace Conference, Big Sky, MT, USA.

Bron C and Kerbosch J (1973) Finding all cliques of an undirected

graph. Communications of the ACM 16(9): 575–7.

Davey SJ (2007) Simultaneous localization and map building using the probabilistic multi-hypothesis tracker. IEEE Transactions on Robotics 23(2): 271–280.

Deb S, Yeddanapudi M, Pattipati K and Bar-Shalom Y (1997) A

generalized S-D assignment algorithm for multisensor-

multitarget state estimation. IEEE Transactions on Aerospace

and Electronic Systems 33(2): 523–538.

Eng YH, Hong GS and Chitre M (2010) Depth control of an autonomous underwater vehicle, STARFISH. In: Proceedings of OCEANS’10, Sydney, NSW, Australia.

Fackler PL (2005) Notes on Matrix Calculus. North Carolina State

University. Available at: http://www4.ncsu.edu/~pfackler/.

Frese U (2010) Interview: is SLAM solved? Künstliche Intelligenz 24(3): 255–257.

Goodman IR, Nguyen HT and Mahler R (1997) Mathematics of

Data Fusion. New York: Springer.

Guivant J and Nebot E (2001) Simultaneous localization and map building: Test case for outdoor applications. Technical report, Australian Centre for Field Robotics. Available at: http://www-personal.acfr.usyd.edu.au/nebot/publications/slam/IJRR_slam.htm.

Hernandez E, Ridao P, Ribas D and Batlle J (2009) MSISpIC: A

probabilistic scan matching algorithm using a mechanical

scanned imaging sonar. Journal of Physical Agents 3(1): 3–11.

Horn BKP (1987) Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A 4(4): 629–642.

Huang GP, Mourikis AI and Roumeliotis SI (2010) Observability-based rules for designing consistent EKF SLAM estimators. The International Journal of Robotics Research 29(5): 502–528.

Kalyan B, Lee KW and Wijesoma WS (2010) FISST-SLAM:

Finite set statistical approach to simultaneous localization and

mapping. The International Journal of Robotics Research

29(10): 1251–1262.

Kurien T (1990) Issues in the design of practical multitarget tracking algorithms. In: Multitarget-Multisensor Tracking: Applications and Advances, volume 1, chapter 3. Norwood, MA: Artech House, pp. 219–245.

Lee CS, Clark DE and Salvi J (2012) SLAM with single cluster

PHD filters. In: Proceedings of ICRA, St. Paul, MN, pp. 2096–

2101.

Lee CS, Clark DE and Salvi J (2013) SLAM with dynamic targets

via single-cluster PHD filtering. IEEE Journal of Selected

Topics in Signal Processing 7(3): 543–552.

Magnus JR and Neudecker H (1988) Matrix Differential Calculus

with Applications in Statistics and Econometrics. London:

John Wiley & Sons.

Mahler R (2007) Statistical Multisource-Multitarget Information

Fusion. Norwood, MA: Artech House.

Mahler R, Vo BT and Vo BN (2011) CPHD filtering with

unknown clutter rate and detection profile. IEEE Transactions

on Signal Processing 59(8): 3497–3513.

Montemerlo M (2003) FastSLAM: A Factored Solution to the

Simultaneous Localization and Mapping Problem With

Unknown Data Association. PhD Thesis, School of Computer

Science, Carnegie Mellon University, Pittsburgh, PA, USA.


Montesano L, Minguez J and Montano L (2005) Probabilistic scan

matching for motion estimation in unstructured environments.

In: Proceedings of IROS 2005, Edmonton, AB, Canada, pp.

3499–504.

Mori S, Chong CY, Tse E and Wishner R (1986) Tracking and

classifying multiple targets without a priori identification.

IEEE Transactions on Automatic Control 31(5): 401–409.

Mullane J, Vo BN, Adams MD and Vo BT (2011) A random-

finite-set approach to Bayesian SLAM. IEEE Transactions on

Robotics. DOI:10.1109/TRO.2010.2101370.

Musicki D and Evans R (2005) Target existence based MHT. In:

Proceedings of CDC-ECC’05, Seville, Spain, pp. 1228–1233.

Neira J and Tardós JD (2001) Data association in stochastic mapping using the joint compatibility test. IEEE Transactions on Robotics and Automation 17(6): 890–897.

Nieto J, Bailey T and Nebot E (2007) Recursive scan-matching

SLAM. Robotics and Autonomous Systems 55(1): 39–49.

Nieto J, Guivant J, Nebot E and Thrun S (2003) Real time data association for FastSLAM. In: Proceedings of ICRA 2003, Taipei, Taiwan, pp. 412–418.

Pfingsthorn M and Birk A (2013) Simultaneous localization and mapping with multimodal probability distributions. The International Journal of Robotics Research 32(2): 143–171.

Porat B and Friedlander B (1986) Computation of the exact information matrix of Gaussian time series with stationary random components. IEEE Transactions on Acoustics, Speech and Signal Processing 34(1): 118–130.

Reid D (1979) An algorithm for tracking multiple targets. IEEE

Transactions on Automatic Control 24(6): 843–854.

Rezaeian M and Vo BN (2010) Error bounds for joint detection and estimation of a single object with random finite set observation. IEEE Transactions on Signal Processing 58(3): 1493–1506.

Ristic B (1999) A comparison of MHT and 2D assignment algorithm for tracking with an airborne pulse Doppler radar. In: Proceedings of ISSPA’99, Brisbane, QLD, Australia, pp. 341–344.

Streit RL and Luginbuhl TE (1994) Maximum likelihood method

for probabilistic multi-hypothesis tracking. In: Signal and Data

Processing of Small Targets, Orlando, FL (Proceedings of

SPIE, vol. 2235). Bellingham, WA: SPIE.

Van Keuk G (2002) MHT extraction and track maintenance of a target formation. IEEE Transactions on Aerospace and Electronic Systems 38(1): 288–295.

Vo BN, Singh S and Doucet A (2005) Sequential Monte Carlo

methods for multitarget filtering with random finite sets. IEEE

Transactions on Aerospace and Electronic Systems 41(4):

1224–1245.

Vo BT and Vo BN (2013) Labeled random finite sets and multi-object conjugate priors. IEEE Transactions on Signal Processing 61(13): 3460–3475.

Wijesoma WS, Perera LDL and Adams MD (2006) Toward multidimensional assignment data association in robot localization and mapping. IEEE Transactions on Robotics 22(2): 350–365.

Wildman J (2011) Bron-Kerbosch maximal clique finding algorithm. Matlab Central File Exchange. Retrieved 27 October 2011.

Williams JL (2011) Graphical model approximations of random

finite set filters. Preprint arXiv:1105.3298v2.

Williams JL (2012) Hybrid Poisson and multi-Bernoulli filters. In:

Proceedings of FUSION 2012, Singapore, pp. 1103–1110.

Appendix A: The gradient of (55)

The NG update step (56) utilizes the gradient of c(p2) as

expressed in (57). Detailed derivations of the formulae in

(57) are provided in this appendix.

First of all, it is useful to generalize the formula for differentiation of quadratic forms. It is well known that $D_x[f(x)^{\mathsf T} A f(x)] = f(x)^{\mathsf T}(A + A^{\mathsf T}) D_x f(x)$. If $f$ is allowed to be a matrix, or $A$ is allowed to depend on $x$, a more general formula is needed. To handle this, the well-known product rule $(uv)' = u'v + uv'$ should be generalized to matrix-valued functions. The following result can be found in Fackler (2005).

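Lemma 1. (Generalized product rule). Let $f: \mathbb{R}^n \to \mathbb{R}^{m \times p}$ and $g: \mathbb{R}^n \to \mathbb{R}^{p \times q}$ be matrix-valued functions. If $\phi(x) = f(x) g(x)$, then its matrix derivative is

$$D_x \phi(x) = \left(g(x)^{\mathsf T} \otimes I_m\right) D_x f(x) + \left(I_q \otimes f(x)\right) D_x g(x). \tag{63}$$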
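Lemma 1 is easy to check numerically with column-major (column-stacking) vectorization. The sketch below compares a central finite-difference derivative of $\mathrm{vec}(f(x)g(x))$ with the right-hand side of (63); the helper names and the test functions $f$ and $g$ are arbitrary hypothetical choices.

```python
import numpy as np

def vec(M):
    """Column-stacking vec operator."""
    return np.asarray(M).reshape(-1, order="F")

def jacobian_fd(fun, x, h=1e-6):
    """Central finite-difference approximation of D_x vec(fun(x))."""
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        cols.append((vec(fun(x + e)) - vec(fun(x - e))) / (2.0 * h))
    return np.column_stack(cols)

# Hypothetical matrix-valued functions f: R^2 -> R^{3x2} and g: R^2 -> R^{2x4}.
def f(x):
    return np.array([[x[0], x[1] ** 2], [np.sin(x[0]), 1.0], [x[0] * x[1], x[1]]])

def g(x):
    return np.array([[x[1], 1.0, x[0], 0.0], [x[0] ** 2, x[1], 2.0, np.cos(x[1])]])

x0 = np.array([0.3, -0.7])
m, q = f(x0).shape[0], g(x0).shape[1]
lhs = jacobian_fd(lambda x: f(x) @ g(x), x0)              # D_x vec(f(x) g(x))
rhs = (np.kron(g(x0).T, np.eye(m)) @ jacobian_fd(f, x0)   # (g^T kron I_m) D_x f
       + np.kron(np.eye(q), f(x0)) @ jacobian_fd(g, x0))  # (I_q kron f) D_x g
print(np.allclose(lhs, rhs, atol=1e-6))
```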

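Lemma 2. (Derivative of quadratic form). If $F: \mathbb{R}^n \to \mathbb{R}^{t \times s}$, $A: \mathbb{R}^n \to \mathbb{R}^{t \times t}$ and $c(x) = F(x)^{\mathsf T} A(x) F(x)$, then

$$D_x c(x) = \left([A(x)F(x)]^{\mathsf T} \otimes I_s\right) K_{s,t} D_x F(x) + \left(I_s \otimes F(x)^{\mathsf T}\right)\left[\left(F(x)^{\mathsf T} \otimes I_t\right) D_x A(x) + \left(I_s \otimes A(x)\right) D_x F(x)\right]$$

where $K_{s,t} \in \mathbb{R}^{st \times st}$ is the commutation matrix (Magnus and Neudecker, 1988).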

Proof. This can easily be seen by applying the result of Lemma 1 twice. In the first step, let $f(x) \triangleq F(x)^{\mathsf T}$ and let $g(x) \triangleq A(x)F(x)$. In the second step, differentiate $A(x)F(x)$ by letting $f(x) \triangleq A(x)$ and $g(x) \triangleq F(x)$.

Lemma 3. We have $D_{p_2} S^{\theta(i)} = \left[0_{4\times 2},\ \mathrm{vec}(U^{\theta(i)})\right]$.

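Proof. It can be seen from (34) that the only variable in $p_2$ that $S^{\theta(i)}$ depends on is $\psi_2$. As a special case of Lemma 2 we obtain

$$\begin{aligned} D_{\psi_2} S^{\theta(i)} &= \left([Y^{\theta(i)}_1 R(\psi_2)]^{\mathsf T} \otimes I_2\right) K_{2,2} D_{\psi_2} R(\psi_2) + \left(I_2 \otimes R(\psi_2)^{\mathsf T}\right)\left(I_2 \otimes Y^{\theta(i)}_1\right) D_{\psi_2} R(\psi_2) \\ &= \left([Y^{\theta(i)}_1 R(\psi_2)]^{\mathsf T} \otimes I_2\right) K_{2,2} D_{\psi_2} R(\psi_2) + \left(I_2 \otimes [R(\psi_2)^{\mathsf T} Y^{\theta(i)}_1]\right) D_{\psi_2} R(\psi_2) \end{aligned} \tag{64}$$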

The second equality in (64) follows from the mixed-product property of the Kronecker product. The derivative $D_{\psi_2} R(\psi_2)$ of the rotation matrix is

$$D_{\psi_2} R(\psi_2) = \left[-\sin\psi_2,\ \cos\psi_2,\ -\cos\psi_2,\ -\sin\psi_2\right]^{\mathsf T}$$

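If we now define $C = R(\psi_2)^{\mathsf T} Y^{\theta(i)}_1$, as in (38), then

$$\begin{aligned} D_{\psi_2} S^{\theta(i)} &= \left((C \otimes I_2) K_{2,2} + I_2 \otimes C\right) D_{\psi_2} R(\psi_2) \\ &= \left( \begin{bmatrix} c_{11} & c_{12} & 0 & 0 \\ 0 & 0 & c_{11} & c_{12} \\ c_{21} & c_{22} & 0 & 0 \\ 0 & 0 & c_{21} & c_{22} \end{bmatrix} + \begin{bmatrix} c_{11} & c_{12} & 0 & 0 \\ c_{21} & c_{22} & 0 & 0 \\ 0 & 0 & c_{11} & c_{12} \\ 0 & 0 & c_{21} & c_{22} \end{bmatrix} \right) \begin{bmatrix} -\sin\psi_2 \\ \cos\psi_2 \\ -\cos\psi_2 \\ -\sin\psi_2 \end{bmatrix} \\ &= \begin{bmatrix} -2c_{11}\sin\psi_2 + 2c_{12}\cos\psi_2 \\ -(c_{12}+c_{21})\sin\psi_2 + (c_{22}-c_{11})\cos\psi_2 \\ -(c_{12}+c_{21})\sin\psi_2 + (c_{22}-c_{11})\cos\psi_2 \\ -2c_{22}\sin\psi_2 - 2c_{21}\cos\psi_2 \end{bmatrix} \end{aligned}$$

By definition, this is $\mathrm{vec}(U^{\theta(i)})$.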
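The closed form in Lemma 3 can be checked against a finite-difference derivative. This sketch assumes $S^{\theta(i)} = R(\psi_2)^{\mathsf T} Y^{\theta(i)}_1 R(\psi_2) + R$, which is the form that (64) differentiates (Lemma 2 with $F = R(\psi_2)$ and a constant $A = Y^{\theta(i)}_1$), and uses $C = R(\psi_2)^{\mathsf T} Y^{\theta(i)}_1$. The helper names and numerical values are hypothetical.

```python
import numpy as np

def rot(psi):
    """Rotation matrix R(psi) = [[cos, -sin], [sin, cos]], matching D_psi R(psi) above."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s], [s, c]])

def innovation_cov(psi, Y, R_noise):
    """Assumed form S = R(psi)^T Y R(psi) + R_noise."""
    return rot(psi).T @ Y @ rot(psi) + R_noise

def vec_U(psi, Y):
    """Closed-form vec(U) from Lemma 3, with C = R(psi)^T Y."""
    C = rot(psi).T @ Y
    s, c = np.sin(psi), np.cos(psi)
    off = -(C[0, 1] + C[1, 0]) * s + (C[1, 1] - C[0, 0]) * c
    return np.array([-2 * C[0, 0] * s + 2 * C[0, 1] * c,
                     off,
                     off,
                     -2 * C[1, 1] * s - 2 * C[1, 0] * c])

Y = np.array([[0.5, 0.1], [0.1, 0.3]])  # hypothetical landmark covariance
R_noise = np.diag([0.02, 0.01])         # hypothetical measurement noise covariance
psi0, h = 0.4, 1e-6
fd = (innovation_cov(psi0 + h, Y, R_noise) - innovation_cov(psi0 - h, Y, R_noise)) / (2 * h)
print(np.allclose(vec_U(psi0, Y), fd.reshape(-1, order="F"), atol=1e-7))
```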

Lemma 4. The derivative of the inverse covariance matrix is $D_{p_2}(S^{\theta(i)})^{-1} = -\left((S^{\theta(i)})^{-1} \otimes (S^{\theta(i)})^{-1}\right) D_{p_2} S^{\theta(i)}$.

Proof. This result is established in Ch. 9 of Magnus and Neudecker (1988).

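Lemma 5. We have $D_{p_2} \nu^{\theta(i)} = \left[R(-\psi_2),\ S(\psi_2)(r_2 - y^{\theta(i)}_1)\right]$.

Proof. Discarding the terms of $\nu^{\theta(i)}$ that do not depend on $p_2$, application of Lemma 1 yields

$$\begin{aligned} D_{p_2}\nu^{\theta(i)} &= D_{p_2}\left(R(-\psi_2) r_2 - R(-\psi_2) y^{\theta(i)}_1\right) \\ &= \left(r_2^{\mathsf T} \otimes I_2\right) D_{p_2} R(-\psi_2) + R(-\psi_2) D_{p_2} r_2 - \left((y^{\theta(i)}_1)^{\mathsf T} \otimes I_2\right) D_{p_2} R(-\psi_2) \\ &= \left[0_{2\times 2},\ \left((r_2 - y^{\theta(i)}_1)^{\mathsf T} \otimes I_2\right) D_{\psi_2} R(-\psi_2)\right] + \left[R(-\psi_2),\ 0_{2\times 1}\right] \\ &= \left[R(-\psi_2),\ S(\psi_2)(r_2 - y^{\theta(i)}_1)\right] \end{aligned}$$

where the last step uses $(a^{\mathsf T} \otimes I_2)\,\mathrm{vec}(M) = Ma$ with $\mathrm{vec}(S(\psi_2)) = D_{\psi_2} R(-\psi_2)$.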

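Lemma 6. If $c(p_2)$ is given by (55), then

$$D_{p_2} c(p_2) = (p_2 - m_q)^{\mathsf T} P^{-1} + \frac{1}{2} \sum_{i \in \theta} \left[ d_1 + d_2 + \mathrm{vec}\left((S^{\theta(i)})^{-1}\right)^{\mathsf T} D_{p_2} S^{\theta(i)} \right]$$

where

$$\begin{aligned} d_1 &= \left((S^{\theta(i)})^{-1} \nu^{\theta(i)}\right)^{\mathsf T} D_{p_2} \nu^{\theta(i)} \\ d_2 &= (\nu^{\theta(i)})^{\mathsf T} \left[ \left((\nu^{\theta(i)})^{\mathsf T} \otimes I_2\right) D_{p_2} (S^{\theta(i)})^{-1} + (S^{\theta(i)})^{-1} D_{p_2} \nu^{\theta(i)} \right]. \end{aligned}$$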

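Proof. By discarding all terms in $c(p_2)$ whose derivatives are zero, we get the reduced cost function

$$c(p_2) = \frac{1}{2}(p_2 - m_q)^{\mathsf T} P^{-1} (p_2 - m_q) + \sum_{i \in \theta} \left[ \frac{1}{2} \ln\left|S^{\theta(i)}\right| + \frac{1}{2} (\nu^{\theta(i)})^{\mathsf T} (S^{\theta(i)})^{-1} \nu^{\theta(i)} \right]$$

The derivative of the first term is

$$D_{p_2}\left[\frac{1}{2}(p_2 - m_q)^{\mathsf T} P^{-1} (p_2 - m_q)\right] = (p_2 - m_q)^{\mathsf T} P^{-1}$$

The derivatives of the logarithmic determinants are

$$D_{p_2} \frac{1}{2} \ln\left|S^{\theta(i)}\right| = \frac{D_{p_2}\left|S^{\theta(i)}\right|}{2\left|S^{\theta(i)}\right|} = \frac{1}{2} \mathrm{vec}\left((S^{\theta(i)})^{-1}\right)^{\mathsf T} D_{p_2} S^{\theta(i)}$$

where the second equality follows from (1) in Ch. 9 of Magnus and Neudecker (1988). Finally, from Lemma 2 we find that

$$D_{p_2}\left[(\nu^{\theta(i)})^{\mathsf T} (S^{\theta(i)})^{-1} \nu^{\theta(i)}\right] = d_1 + d_2$$

where $d_1$ and $d_2$ correspond to the first and second terms in (63), respectively.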
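The Kronecker-product expressions in Lemmas 4 to 6 can be assembled directly and compared with finite differences of the reduced cost above. This sketch assumes the innovation $\nu^{\theta(i)} = z - R(-\psi_2)(y^{\theta(i)}_1 - r_2)$ and covariance $S^{\theta(i)} = R(\psi_2)^{\mathsf T} Y^{\theta(i)}_1 R(\psi_2) + R$, which agree with the derivatives in Lemmas 3 and 5 but are our reading rather than a restatement of (34) and (55). All helper names and data are hypothetical.

```python
import numpy as np

def rot(a):
    """R(a) = [[cos a, -sin a], [sin a, cos a]]."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

def drot(a):
    """Elementwise derivative dR(a)/da."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[-s, -c], [c, -s]])

def vec(M):
    """Column-stacking vec operator."""
    return M.reshape(-1, order="F")

def pair_terms(p, z, y, Y, Rn):
    """Innovation, its covariance and their derivatives w.r.t. p = [x2, y2, psi2]."""
    r, psi = p[:2], p[2]
    nu = z - rot(-psi) @ (y - r)
    S = rot(psi).T @ Y @ rot(psi) + Rn
    # Lemma 5: D nu = [R(-psi), dR(-psi)/dpsi (r - y)], where dR(-psi)/dpsi = -drot(-psi).
    Dnu = np.column_stack([rot(-psi), -drot(-psi) @ (r - y)])
    # Lemma 3: D S = [0_{4x2}, vec(U)], with U = dS/dpsi computed directly.
    U = drot(psi).T @ Y @ rot(psi) + rot(psi).T @ Y @ drot(psi)
    DS = np.column_stack([np.zeros(4), np.zeros(4), vec(U)])
    return nu, S, Dnu, DS

def reduced_cost(p, m_q, P, pairs, Rn):
    """Reduced cost from the proof of Lemma 6."""
    d = p - m_q
    val = 0.5 * d @ np.linalg.solve(P, d)
    for z, y, Y in pairs:
        nu, S, _, _ = pair_terms(p, z, y, Y, Rn)
        val += 0.5 * np.log(np.linalg.det(S)) + 0.5 * nu @ np.linalg.solve(S, nu)
    return val

def gradient_lemma6(p, m_q, P, pairs, Rn):
    """Analytic gradient assembled as in Lemma 6, using Lemma 4 for D S^-1."""
    g = (p - m_q) @ np.linalg.inv(P)
    I2 = np.eye(2)
    for z, y, Y in pairs:
        nu, S, Dnu, DS = pair_terms(p, z, y, Y, Rn)
        Si = np.linalg.inv(S)
        DSi = -np.kron(Si, Si) @ DS  # Lemma 4
        d1 = (Si @ nu) @ Dnu
        d2 = nu @ (np.kron(nu[None, :], I2) @ DSi + Si @ Dnu)
        g = g + 0.5 * (d1 + d2 + vec(Si) @ DS)
    return g

# Hypothetical problem data with two correspondences (z, y, Y).
m_q = np.array([1.0, 0.2, 0.1])
P = np.diag([0.5, 0.5, 0.05])
Rn = np.diag([0.01, 0.02])
pairs = [
    (np.array([4.1, 1.7]), np.array([5.0, 2.0]), np.array([[0.05, 0.01], [0.01, 0.04]])),
    (np.array([6.8, -3.1]), np.array([8.0, -3.0]), np.array([[0.03, 0.0], [0.0, 0.06]])),
]
p0 = np.array([0.9, 0.1, 0.15])
h = 1e-6
fd = np.array([(reduced_cost(p0 + h * e, m_q, P, pairs, Rn)
                - reduced_cost(p0 - h * e, m_q, P, pairs, Rn)) / (2 * h) for e in np.eye(3)])
analytic = gradient_lemma6(p0, m_q, P, pairs, Rn)
print(np.allclose(analytic, fd, rtol=1e-5, atol=1e-5))
```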

Appendix B: Implementation of SCPHD

SCPHD is based on the concept of a joint pose-and-landmark intensity, which is factorized as follows:

$$D_k(p_k, x) = s_k(p_k)\, D_k(x \mid p_k)$$

Here $s_k(p_k)$ represents a parent process (of the pose), while $D_k(x \mid p_k)$ represents a daughter process (of the landmarks conditional on the pose). In this particular case, where $p_k$ is a vector, $s_k(\cdot)$ is a PDF, while $D_k(\cdot \mid p_k)$ is a PHD. Thus, the integral $\int D_k(x \mid p_k)\,dx$ yields the expected number of landmarks conditional on $p_k$.

Based on the assumptions stated in Section 2.2 we use the prior intensities

$$s_{1|0}(p_1) = \delta_0(p_1) \quad \text{and} \quad D_{1|0}(x \mid p_1) = \mu\, \mathrm{unif}_{L_1}(x)$$

After the first measurement update we get the posterior intensities

$$s_1(p_1) = \delta_0(p_1), \qquad D_1(x \mid p_1) \approx \sum_{i=1}^{m_1} \frac{\mu}{\mu + \lambda (v/V) \sqrt{|R| / |Y^i_1|}}\, \mathcal{N}(x; y^i_1, Y^i_1)$$

This follows from applying the same kind of reasoning that led to (41) in Section 4.2 to formulae (6)–(11) in Lee et al. (2012). After the second measurement update we get the posterior intensities


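$$s_2(p_2) \propto \mathcal{N}(p_2; m_q, P) \prod_{j=1}^{m_2} \left( \frac{\lambda}{V} + \sum_{i=1}^{m_1} w_{ij}(p_2) \right)$$

$$D_2(x \mid p_2) = \sum_{j=1}^{m_2} \frac{\sum_{i=1}^{m_1} w_{ij}(p_2)\, \mathcal{N}(x; m_{ij}, P_{ij})}{\lambda/V + \sum_{i=1}^{m_1} w_{ij}(p_2)}$$

where

$$\begin{aligned} w_{ij} &= \frac{\mu}{\mu + \lambda (v/V) \sqrt{|R|/|Y^i_1|}}\, \mathcal{N}\left(z^j_2; h(y^i_1, p_2), R\right) \\ m_{ij} &= y^i_1 + W_{ij}\left(z^j_2 - h(y^i_1, p_2)\right), \qquad P_{ij} = \left(I - W_{ij} H^i_x\right) Y^i_1 \\ W_{ij} &= Y^i_1 (H^i_x)^{\mathsf T} \left(H^i_x Y^i_1 (H^i_x)^{\mathsf T} + R\right)^{-1} \end{aligned}$$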

These equations are implemented for each $p_2$ in a discrete grid, with 50, 61 and 200 samples in the intervals $-2 \le x_2 \le 8$, $-7 \le y_2 \le 7$ and $-65^\circ \le \psi_2 \le 65^\circ$, respectively.
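A grid evaluation of the pose intensity $s_2(p_2)$ can be sketched in NumPy. This minimal sketch assumes Cartesian landmark measurements with $h(y, p_2) = R(-\psi_2)(y - r_2)$, passes the constants $\mu$, $\lambda$, $V$ and $v$ in as given inputs, and omits $D_2(x \mid p_2)$; it is not the authors' implementation, and all names and numbers are hypothetical. The usage example uses a slightly different grid (51, 71 and 131 samples) than the text so that the true pose lies exactly on a grid node.

```python
import numpy as np

def predict(y, x2, y2, psi2):
    """Assumed h(y, p2) = R(-psi2) (y - r2); works elementwise on arrays of poses."""
    dx, dy = y[0] - x2, y[1] - y2
    c, s = np.cos(psi2), np.sin(psi2)
    return c * dx + s * dy, -s * dx + c * dy

def scphd_pose_grid(xs, ys, psis, m_q, P, y1, Y1, z2, R, mu, lam, V, v):
    """Normalized grid values of s2(p2) ∝ N(p2; m_q, P) prod_j (lam/V + sum_i w_ij(p2))."""
    X, Yg, Psi = np.meshgrid(xs, ys, psis, indexing="ij")
    d = np.stack([X, Yg, Psi], axis=-1) - m_q
    log_s = -0.5 * np.einsum("...i,ij,...j->...", d, np.linalg.inv(P), d)  # prior, up to a constant
    Ri = np.linalg.inv(R)
    gauss_norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(R)))
    preds = [predict(yi, X, Yg, Psi) for yi in y1]
    coefs = [mu / (mu + lam * (v / V) * np.sqrt(np.linalg.det(R) / np.linalg.det(Yi))) for Yi in Y1]
    for zj in z2:
        total = np.full(X.shape, lam / V)  # clutter term lam / V
        for (h0, h1), a in zip(preds, coefs):
            e0, e1 = zj[0] - h0, zj[1] - h1
            quad = Ri[0, 0] * e0**2 + 2.0 * Ri[0, 1] * e0 * e1 + Ri[1, 1] * e1**2
            total += a * gauss_norm * np.exp(-0.5 * quad)  # w_ij(p2)
        log_s += np.log(total)
    s = np.exp(log_s - log_s.max())
    return s / s.sum()

xs = np.linspace(-2.0, 8.0, 51)
ys = np.linspace(-7.0, 7.0, 71)
psis = np.radians(np.linspace(-65.0, 65.0, 131))
p_true = (xs[25], ys[37], psis[75])  # a pose that lies on the grid
y1 = np.array([[5.0, 2.0], [8.0, -3.0], [10.0, 4.0], [6.0, -6.0]])
Y1 = np.array([0.05 * np.eye(2)] * 4)
z2 = np.column_stack(predict(y1.T, *p_true))  # noise-free frame-2 measurements
s2 = scphd_pose_grid(xs, ys, psis, np.array([3.0, 0.0, 0.0]), np.diag([4.0, 4.0, 0.25]),
                     y1, Y1, z2, np.diag([0.01, 0.01]), mu=10.0, lam=1.0, V=1000.0, v=1.0)
i_map = np.unravel_index(np.argmax(s2), s2.shape)
print(i_map)
```

The runtime grows with the product of the three grid sizes and with $m_1 m_2$, which is consistent with the high SCPHD runtimes reported in Section 7.6.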

Appendix C: Deriving the joint posterior as a mixture over data-to-data hypotheses

In this third and last appendix we show that the joint density $f(p_2, X \mid Z_{1:2})$ is a mixture over data-to-data hypotheses of the form (22). Recall that the prior density can be factorized as $f(X, p_2) = f(X) f(p_2)$ due to a priori independence between $p_2$ and $X$. Thus, we let the prior joint density be given by

$$f(X, p_2) = f(p_2) f(X) = f(p_2) \frac{n!}{N+1} \prod_{i=1}^{n} f(x_i)$$

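Inserting the measurement model $f_{\Sigma_k}(Z_k \mid X, p_k)$ into the two-step Bayes rule (19) yields the posterior joint density

$$f(X, p_2 \mid Z_{1:2}) = \frac{1}{c} \left( \sum_{\omega_1} \frac{\varphi_1!}{V^{\varphi_1}} \prod_{i=1}^{n} f(z^{\omega_1(i)}_1 \mid x_i, 0_{3\times 1}) \right) \left( \sum_{\omega_2} \frac{\varphi_2!}{V^{\varphi_2}} \prod_{i=1}^{n} f(z^{\omega_2(i)}_2 \mid x_i, p_2) \right) f(p_2)\, n! \prod_{i=1}^{n} f(x_i)$$

where the two summations go over all landmark-to-data hypotheses $\omega_1: \{1, \ldots, n\} \to \{1, \ldots, m_1\}$ and $\omega_2: \{1, \ldots, n\} \to \{1, \ldots, m_2\}$, respectively. This product of sums can be replaced by a single sum over all 2-dimensional landmark-to-data hypotheses $\omega: \{1, \ldots, n\} \to \{1, \ldots, m_1\} \times \{1, \ldots, m_2\}$: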

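$$\begin{aligned} f(X, p_2 \mid Z_{1:2}) &= \frac{1}{c} \sum_{\omega} \frac{n!\,\varphi_1!\,\varphi_2!}{V^{\varphi_1+\varphi_2}} f(p_2) \prod_{i=1}^{n} f(z^{\omega_1(i)}_1 \mid x_i, 0_{3\times 1})\, f(z^{\omega_2(i)}_2 \mid x_i, p_2)\, f(x_i) \\ &= \frac{1}{c} \sum_{\omega} \frac{n!\,\varphi_1!\,\varphi_2!}{V^{\varphi_1+\varphi_2}}\, a_{\theta[\omega]}\, f(X, p_2 \mid \omega, Z_{1:2}) \end{aligned}$$

Here $\theta[\omega]$ is the data-to-data hypothesis encompassing the landmark-to-data hypothesis $\omega$, and the last line was obtained by invoking the definitions in (20) and (21).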

Finally, it remains to partition the collection of landmark-to-data hypotheses into data-to-data hypotheses. By recognizing that, for any landmark-to-data hypothesis $\omega$, all its permutations correspond to the same data-to-data hypothesis $\theta$, we immediately obtain the desired formula (22).
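The final partitioning step can be made concrete by brute force for small sizes. Assuming each $\omega_k$ is injective (each measurement originates from at most one landmark), the sketch below enumerates all 2-dimensional landmark-to-data hypotheses and groups them by the unordered set of correspondences they induce; every data-to-data hypothesis with $n$ correspondences is then reached by exactly $n!$ landmark orderings. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def landmark_to_data(n, m1, m2):
    """All 2-D landmark-to-data hypotheses omega = (omega1, omega2), where omega_k
    injectively assigns the n landmarks to measurements of frame k (0-based)."""
    for w1 in permutations(range(m1), n):
        for w2 in permutations(range(m2), n):
            yield w1, w2

def data_to_data(omega):
    """theta[omega]: the unordered set of correspondences (omega1(i), omega2(i))."""
    w1, w2 = omega
    return frozenset(zip(w1, w2))

# Group landmark-to-data hypotheses by the data-to-data hypothesis they encompass.
groups = Counter(data_to_data(om) for om in landmark_to_data(2, 3, 3))
print(len(groups), set(groups.values()))  # every theta is reached n! = 2 times
```

For $n = 2$ and $m_1 = m_2 = 3$ this prints `18 {2}`: the 36 landmark-to-data hypotheses collapse onto 18 data-to-data hypotheses.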
