Mixture Kalman Filters


Rong Chen                                Jun S. Liu
Department of Statistics                 Department of Statistics
Texas A&M University                     Stanford University
College Station, TX 77843                Stanford, CA 94305

Abstract

In treating dynamic systems, sequential Monte Carlo methods use discrete samples to represent a complicated probability distribution and use rejection sampling, importance sampling, and weighted resampling to complete the on-line "filtering" task. In this article we propose a special sequential Monte Carlo method, the mixture Kalman filter, which uses a random mixture of normal distributions to represent a target distribution. It is designed for on-line estimation and prediction of conditional and partial conditional dynamic linear models, which are themselves a class of widely used nonlinear systems and also serve to approximate many other nonlinear systems. Compared with a few available filtering methods, including Monte Carlo ones, the efficiency gain provided by the mixture Kalman filter can be very substantial. Another contribution of this article is the formulation of many nonlinear systems in conditional or partial conditional linear form, to which the mixture Kalman filter can be applied. Examples in target tracking and digital communications are given to demonstrate the proposed procedures.

1 Introduction

Dynamic systems are widely used in almost all fields of application, such as computer vision, economic and financial data analysis, feedback control systems, mobile communication, and radar or sonar surveillance systems, just to name a few. Because most of these systems are nonlinear and non-Gaussian, a main challenge to researchers is to find efficient methods for on-line (real-time) estimation and prediction (filtering) of the ever-changing system characteristics, along with the continuous flow of information (observations) from the system.

For a Gaussian linear dynamic system, Kalman (1960) provided an ingenious algorithm (the famous Kalman filter) for on-line filtering. To date, however, there is not yet a universally effective algorithm for dealing with nonlinear and non-Gaussian systems. Depending on the features of individual problems, some generalizations of the Kalman filtering methodology to nonlinear systems can be effective. For example, a few well-known extensions are the extended Kalman filters (Gelb, 1974), the Gaussian sum filters (Anderson and Moore, 1979), and the iterated extended Kalman filters (Jazwinski, 1970).

Most of these methods are based on local linear approximations of the nonlinear system. More recently, researchers have begun to pay attention to a new class of filtering methods based on the sequential Monte Carlo approach.

Sequential Monte Carlo can be loosely defined as a set of methods that use Monte Carlo simulation to solve on-line estimation and prediction problems. More precisely, sequential Monte Carlo techniques achieve the filtering task by recursively generating Monte Carlo samples of the state variables or some other latent variables. These methods are often more adaptive to features of the target system because of the flexible nature of Monte Carlo simulations. Since the appearance of two such methods, the bootstrap filter (also called the particle filter) for nonlinear state-space models (Gordon, Salmond, and Smith 1993) and sequential imputation for Bayesian missing data problems (Kong, Liu, and Wong 1994), Monte Carlo filtering techniques have caught the attention of researchers in very different areas that require dynamic modeling. There have also been many recent modifications and improvements of the method (Berzuini, Best, Gilks, and Larizza 1997; Carpenter, Clifford, and Fearnhead 1997; Doucet 1998; Hürzeler and Künsch 1995; Liu and Chen 1995; Pitt and Shephard 1999). A sequential importance sampling (SIS) framework was proposed in Liu and Chen (1998) to unify and generalize these related techniques. In what follows, we refer to all these methods applied to state-space models as Monte Carlo filters.

In this article, we focus on the popular state-space model of the form

    (state equation)        x_{t+1} ~ f_t(· | x_t),
    (observation equation)  y_{t+1} ~ g_t(· | x_{t+1}),                    (1)

and a special case of it, the conditional dynamic linear model (CDLM). Here the x_t are unobserved state variables and the y_t are observed signals. Let \mathbf{y}_t = (y_1, ..., y_t) be the information available up to time t. Of interest in these systems are usually (a) estimation of the current state, say E(x_t | \mathbf{y}_t), using all available information; (b) prediction of a future state, say E(x_{t+1} | \mathbf{y}_t); and (c) revision of previous state estimates given new information, e.g., E(x_{t-l} | \mathbf{y}_t). The main challenge is that these tasks need to be done on-line in real time, which makes it critical for a filtering method to behave recursively (i.e., to modify the estimates or predictions quickly as each new observation comes in).

An important feature of the CDLM, whose precise definition can be found in Section 3, is that given the trajectory of an indicator variable (vector), the system is Gaussian and linear, for which the Kalman filter can be used. Thus, by using the marginalization technique for Monte Carlo computation (Rubinstein, 1981), we derive a Monte Carlo filter that focuses its full attention on the space of the indicator variable.

We call this filter the mixture Kalman filter (MKF). By doing so we can drastically reduce the Monte Carlo variances associated with a standard sequential importance sampler applied directly to the space of the state variable.

The MKF idea can also be applied to systems that are only partially conditionally linear and Gaussian, i.e., systems whose state variable consists of a component that is conditionally linear and a component that is completely nonlinear. By conditioning on an indicator and on the value of the nonlinear component of the state variable, the system (both the state equation and the observation equation) becomes linear and Gaussian. We call such a system a partial CDLM. In this case, the linear component of the state variable can be "marginalized" out before running a sequential importance sampler. The marginalization operation is again achieved by the Kalman filter. We call this method the extended MKF.

Given the importance of the CDLM in system modeling, it is perhaps not surprising that approaches similar to the MKF described in this article have been proposed earlier. Indeed, the earlier work of Ackerson and Fu (1970), Akashi and Kumamoto (1977), and Tugnait (1982), and the recent work of Liu and Chen (1995) and Doucet (1998), are all closely related. We provide a more detailed account of each of these approaches in Section 3.

The rest of the paper is organized as follows. In Section 2 we provide a brief overview of the celebrated Kalman filter and the central idea of a Monte Carlo filter, which serve as the backbone of our procedure and the basis of comparisons. In Section 3, we give a detailed description of CDLMs and the proposed mixture Kalman filter. Section 4 is devoted to partial conditional dynamic linear models and the extended mixture Kalman filter. In Section 5, we give several applications of the proposed MKF and extended MKF, including three examples in target tracking and two examples in telecommunications. A brief summary is given in Section 6.

2 Kalman Filter and Monte Carlo Filter

2.1 Kalman filter

When the transition functions f_t and g_t in the state-space model (1) are linear Gaussian (i.e., with linear mean functions and constant covariance matrices), we have that x_t | \mathbf{y}_t ~ N(\mu_t, \Sigma_t). Based on this fact, Kalman (1960) found a very fast algorithm for the on-line incorporation of the new observation y_{t+1}, i.e., for updating from (\mu_t, \Sigma_t) to (\mu_{t+1}, \Sigma_{t+1}). Specifically, consider the following linear and Gaussian state-space model:

    x_t = H_t x_{t-1} + W_t w_t,
    y_t = G_t x_t + V_t v_t,

with known coefficient matrices H_t, G_t, W_t, and V_t, and w_t ~ N(0, I) and v_t ~ N(0, I). Assume x_0 ~ N(\mu_0, \Sigma_0); then p(x_t | \mathbf{y}_t) is also Gaussian, and its mean vector and covariance matrix can be obtained recursively. Engineers often favor the following recursions:

    P_{t+1} = H_{t+1} \Sigma_t H'_{t+1} + W_{t+1} W'_{t+1},
    S_{t+1} = G_{t+1} P_{t+1} G'_{t+1} + V_{t+1} V'_{t+1},
    \mu_{t+1} = H_{t+1} \mu_t + P_{t+1} G'_{t+1} S^{-1}_{t+1} (y_{t+1} - G_{t+1} H_{t+1} \mu_t),        (2)
    \Sigma_{t+1} = P_{t+1} - P_{t+1} G'_{t+1} S^{-1}_{t+1} G_{t+1} P_{t+1}.

In addition, the predictive distribution is

    p(y_{t+1} | \mathbf{y}_t) ~ N(G_{t+1} H_{t+1} \mu_t, S_{t+1}).        (3)
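As a concrete illustration, the recursion (2)-(3) translates directly into a few lines of code. The following Python/NumPy sketch (our illustration, not part of the paper) performs one updating step from (\mu_t, \Sigma_t) to (\mu_{t+1}, \Sigma_{t+1}), assuming for simplicity that the system matrices do not change over time:

```python
import numpy as np

def kalman_step(mu, Sigma, y_next, H, G, W, V):
    """One step of the Kalman recursion (2)-(3).

    mu, Sigma  : filtered mean and covariance of x_t given y_1, ..., y_t
    y_next     : the new observation y_{t+1}
    H, G, W, V : system matrices (assumed time-invariant in this sketch)
    Returns the filtered (mu, Sigma) at time t+1 and the predictive
    mean and covariance of y_{t+1}, as in (3).
    """
    P = H @ Sigma @ H.T + W @ W.T          # state prediction covariance
    y_pred = G @ H @ mu                    # predictive mean of y_{t+1}
    S = G @ P @ G.T + V @ V.T              # predictive covariance of y_{t+1}
    K = P @ G.T @ np.linalg.inv(S)         # Kalman gain
    mu_new = H @ mu + K @ (y_next - y_pred)
    Sigma_new = P - K @ G @ P
    return mu_new, Sigma_new, y_pred, S
```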

2.2 Sequential importance sampler for the state-space model

Consider the state-space model (1). Suppose at time t we have the posterior distribution p(x_t | \mathbf{y}_t); then the predictive distribution for x_{t+1} is

    p(x_{t+1} | \mathbf{y}_t) = \int f_{t+1}(x_{t+1} | x_t) p(x_t | \mathbf{y}_t) dx_t.        (4)

Since analytical computation with this distribution is often difficult, a possible alternative is Monte Carlo approximation. More precisely, if we can draw x_t^{(m)} (either i.i.d. or dependently) from p(x_t | \mathbf{y}_t), then we can approximate p(x_{t+1} | \mathbf{y}_t) by

    \hat{p}(x_{t+1} | \mathbf{y}_t) = (1/M) \sum_{m=1}^M f_{t+1}(x_{t+1} | x_t^{(m)}).        (5)

In many cases, however, directly sampling from p(x_t | \mathbf{y}_t) is infeasible, but drawing from a trial distribution \pi(x_t | \mathbf{y}_t) is easy. If this is the case, we need to modify (5) by adjusting the weight of each sampled x_t^{(m)}. That is, if the x_t^{(m)} are drawn from \pi(x_t | \mathbf{y}_t), then we can modify the approximation (5) as

    \tilde{p}(x_{t+1} | \mathbf{y}_t) = (1/W_t) \sum_{m=1}^M w_t^{(m)} f_{t+1}(x_{t+1} | x_t^{(m)}),        (6)

where w_t^{(m)} = p(x_t^{(m)} | \mathbf{y}_t) / \pi(x_t^{(m)} | \mathbf{y}_t) and W_t = \sum_{m=1}^M w_t^{(m)}. The w_t^{(m)} are usually called "importance weights."

When a new observation y_{t+1} arrives, the posterior distribution of x_{t+1} is

    p(x_{t+1} | \mathbf{y}_{t+1}) \propto g_{t+1}(y_{t+1} | x_{t+1}) p(x_{t+1} | \mathbf{y}_t),

which, by using (6), can be approximated as

    \tilde{p}(x_{t+1} | \mathbf{y}_{t+1}) \propto (1/W_t) \sum_{m=1}^M w_t^{(m)} g_{t+1}(y_{t+1} | x_{t+1}) f_{t+1}(x_{t+1} | x_t^{(m)}).        (7)

From this approximation, one can conduct a sampling-importance-resampling step (Rubin, 1987) to obtain an approximate sample from the approximate posterior distribution (7), which forms the basis of the bootstrap filter (Gordon et al., 1993). Many improvements upon this basic algorithm have been proposed; see Liu and Chen (1998) for a recent summary.

In many situations, such as those considered in this article, the quantities

    u_{t+1}^{(m)} = \int g_{t+1}(y_{t+1} | x_{t+1}) f_{t+1}(x_{t+1} | x_t^{(m)}) dx_{t+1},
    p(x_{t+1} | y_{t+1}, x_t^{(m)}) = (1/u_{t+1}^{(m)}) g_{t+1}(y_{t+1} | x_{t+1}) f_{t+1}(x_{t+1} | x_t^{(m)}),

can be worked out analytically. Then we can rewrite (7) as

    \tilde{p}(x_{t+1} | \mathbf{y}_{t+1}) = (1/W_{t+1}) \sum_{m=1}^M w_{t+1}^{(m)} p(x_{t+1} | y_{t+1}, x_t^{(m)}),        (8)

where w_{t+1}^{(m)} = w_t^{(m)} u_{t+1}^{(m)} and W_{t+1} = \sum_{m=1}^M w_{t+1}^{(m)}. This can be used to construct a more efficient algorithm. First, we can draw an exact sample from (8) instead of an approximate one from (7). Second, for any measurable function h(·), we can estimate E[h(x_{t+1}) | \mathbf{y}_{t+1}] by

    \hat{h}_{t+1} = (1/W_{t+1}) \sum_{m=1}^M w_{t+1}^{(m)} E{h(x_{t+1}) | y_{t+1}, x_t^{(m)}}.

Because the weights w_{t+1}^{(m)} incorporate the new information in y_{t+1}, an earlier estimate of, say, h(x_{t-s}) (i.e., of E{h(x_{t-s}) | \mathbf{y}_{t-s}}) can also be updated to E{h(x_{t-s}) | \mathbf{y}_{t+1}} by using these new weights. In order to proceed to time t+2, we may use (8) directly or draw a sample of the x_{t+1}^{(m)} from (8), and go back to (4) with t replaced by t+1. The operations from (4) to (8) thus define the basic recursive procedure of a Monte Carlo filter. Detailed discussions of the implementation and efficiency of the sequential importance sampler can be found in Liu and Chen (1998).
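The recursion (4)-(8) can be summarized in a short skeleton. The Python sketch below is schematic and ours, not the paper's: sample_trial and incr_weight are hypothetical user-supplied functions standing in for the problem-specific trial distribution and incremental weight u_{t+1}^{(m)}, and the resampling rule based on the coefficient of variation of the weights anticipates step (M4) of Section 3.2.

```python
import numpy as np

def sis_step(particles, weights, y_next, sample_trial, incr_weight,
             resample_threshold=2.0, rng=np.random.default_rng()):
    """One sequential importance sampling step, following (7)-(8).

    sample_trial(x, y) draws x_{t+1} from a trial distribution given x_t^{(m)};
    incr_weight(x_old, x_new, y) returns the incremental weight u_{t+1}^{(m)}.
    Both are problem-specific and supplied by the user.
    """
    m = len(particles)
    new_particles = np.array([sample_trial(x, y_next) for x in particles])
    weights = weights * np.array([incr_weight(x0, x1, y_next)
                                  for x0, x1 in zip(particles, new_particles)])
    weights = weights / weights.sum()
    # resample when the (squared) coefficient of variation of the weights
    # exceeds a threshold
    cv2 = m * np.sum(weights ** 2) - 1.0
    if cv2 > resample_threshold:
        idx = rng.choice(m, size=m, p=weights)
        new_particles, weights = new_particles[idx], np.full(m, 1.0 / m)
    return new_particles, weights
```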

3 Model and the Method

3.1 Conditional Dynamic Linear Models (CDLM)

A conditional dynamic linear model (CDLM) can be described generally as

    x_t = H_\lambda x_{t-1} + W_\lambda w_t
    y_t = G_\lambda x_t + V_\lambda v_t        if \Lambda_t = \lambda,        (9)

where w_t ~ N(0, I), v_t ~ N(0, I), and all coefficient matrices are known given \lambda. The indicator \Lambda_t, which can be either continuous or discrete, is a latent process with a given probabilistic structure. The CDLM is a direct generalization of the dynamic linear model (DLM) (West and Harrison, 1989) and has been widely used in practice. With discrete indicator variables, the model can be used to deal with outliers, sudden jumps, system failures, environmental changes, and clutter. With carefully chosen continuous indicator variables, the CDLM can also accommodate dynamic linear models with non-Gaussian innovations (e.g., t-distributions, logistic distributions, etc.).

Example 1: A special CDLM is the linear system with non-Gaussian errors. Suppose

    x_t = H x_{t-1} + W w_t,
    y_t = G x_t + V v_t,

where w_t and v_t are mixture Gaussian; i.e., conditional on some unobserved variables \Lambda_{1t} and \Lambda_{2t}, w_t | \Lambda_{1t} ~ N(\mu_1(\Lambda_{1t}), \Sigma_1(\Lambda_{1t})) and v_t | \Lambda_{2t} ~ N(\mu_2(\Lambda_{2t}), \Sigma_2(\Lambda_{2t})), for some mean functions \mu_1, \mu_2 and variance functions \Sigma_1, \Sigma_2. This model is clearly a CDLM with \Lambda_{1t} and \Lambda_{2t} as its latent indicator processes. This class of error models includes, in addition to discrete mixtures of Gaussian distributions, the t-distributions, double exponential distributions, the exponential power family, logistic distributions, etc. Even if w_t and v_t are not mixture Gaussian, most of the time they can be satisfactorily approximated by mixture Gaussian distributions. However, one has to balance complexity and accuracy: greater efficiency can be achieved if the distribution is approximated accurately with relatively simple indicators. In Section 5 we show and analyze several CDLMs in practice.
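For concreteness, here is a small Python sketch (ours; the scalar system and the parameter values below are hypothetical) that simulates a CDLM of the Example 1 type with a discrete indicator on the state noise:

```python
import numpy as np

def simulate_cdlm(T, H, W, G, V, sigmas, probs, rng=None):
    """Simulate a scalar CDLM in the spirit of Example 1:
    w_t ~ N(0, sigmas[I_t]^2) with P(I_t = j) = probs[j], independently."""
    rng = np.random.default_rng(rng)
    x = np.zeros(T); y = np.zeros(T); I = np.zeros(T, dtype=int)
    for t in range(1, T):
        I[t] = rng.choice(len(probs), p=probs)          # latent indicator
        x[t] = H * x[t - 1] + W * sigmas[I[t]] * rng.standard_normal()
        y[t] = G * x[t] + V * rng.standard_normal()
    return x, y, I

# e.g., a system with occasional large state shocks (outliers)
x, y, I = simulate_cdlm(200, H=0.9, W=1.0, G=1.0, V=0.3,
                        sigmas=[0.5, 1.5], probs=[0.7, 0.3], rng=1)
```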

Engineers have dealt with special forms of the CDLM since the 1970s. In a pioneering work, Ackerson and Fu (1970) considered a linear system operating in switching environments, which they formulated as the model in Example 1 with \Lambda_t being a finite discrete Markovian indicator process. To deal with the computational difficulty, they proposed an approximate filtering procedure in which the posterior probability of the indicator variable \Lambda_t (given \mathbf{y}_t) is recursively updated under a conditional independence assumption, and then used in a Gaussian approximation of the posterior of x_t. Their approach can easily be generalized to update a segment (\Lambda_{t-k}, ..., \Lambda_t) of the indicator process recursively (Tugnait, 1982). In dealing with the same CDLM, Akashi and Kumamoto (1977) introduced essentially a sequential importance sampling method for the indicator process in which an "optimal" sampling distribution is used (Doucet 1998; Liu and Chen 1998). Thus, Akashi and Kumamoto's algorithm is the closest to the MKF proposed in this article. However, the key resampling and rejection steps are missing in their method, which makes it perform much less satisfactorily. By formulating the MKF in a general sequential Monte Carlo framework, we are able to incorporate various Monte Carlo techniques, such as resampling, rejection control, and the auxiliary variable approach, into the scheme and to greatly extend the applicability of the method. More recently, the methods used in Svetnik (1986), Liu and Chen (1995), and Doucet (1998) have all captured some attractive aspects of the CDLM and the MKF, but they are limited in scope.

Several Markov chain Monte Carlo algorithms for this type of model have been proposed (see Carlin, Stoffer, and Polson, 1992; Carter and Kohn, 1994). In particular, Carter and Kohn (1994) presented an efficient Gibbs sampler in which the indicator \Lambda_t is the only latent variable to be imputed and the state variable x_t is explicitly integrated out via a clever use of forward and backward Kalman filtering. A main problem with all the MCMC algorithms for dynamic systems, however, is that they cannot be used effectively for on-line estimation and prediction, whereas a fast and efficient on-line algorithm is essential in problems such as target tracking and digital signal processing.

The bootstrap filter (Avitzour 1995; Gordon et al. 1993; Kitagawa 1996) can be applied directly to the CDLM for on-line estimation and prediction. In such a procedure, multiple Monte Carlo samples of the state variable x_t are recursively imputed using the sampling-importance-resampling technique. A more efficient alternative, sequential imputation with rejuvenation, was proposed in Liu and Chen (1995) for a blind deconvolution problem. However, an even more sophisticated algorithm can be obtained by making further use of the conditional Gaussian structure, as in Akashi and Kumamoto (1977), Carter and Kohn (1994), Doucet (1998), and Liu and Chen (1998). That is, when conditioning on \Lambda_1, ..., \Lambda_t, the CDLM defined by (9) becomes a Gaussian DLM, and all the x_s, s = 1, ..., t, can be integrated out recursively by using a standard Kalman filter. West (1992) first suggested using mixture distributions for a general adaptive importance sampling strategy in dynamic systems. In a CDLM, the mixture Gaussian distribution becomes an obvious choice because of the efficient Kalman filter.

3.2 The Method of Mixture Kalman Filtering

Let \mathbf{y}_t = (y_1, ..., y_t) and \mathbf{\Lambda}_t = (\Lambda_1, ..., \Lambda_t). We observe that

    p(x_t | \mathbf{y}_t) = \int p(x_t | \mathbf{\Lambda}_t, \mathbf{y}_t) p(\mathbf{\Lambda}_t | \mathbf{y}_t) d\mathbf{\Lambda}_t,

where p(x_t | \mathbf{\Lambda}_t, \mathbf{y}_t) ~ N(\mu_t(\mathbf{\Lambda}_t), \Sigma_t(\mathbf{\Lambda}_t)), in which (\mu_t(\mathbf{\Lambda}_t), \Sigma_t(\mathbf{\Lambda}_t)) can be obtained by running a Kalman filter along the given trajectory \mathbf{\Lambda}_t. The main idea of the MKF is to use a weighted sample of the indicators,

    S_t = {(\mathbf{\Lambda}_t^{(1)}, w_t^{(1)}), ..., (\mathbf{\Lambda}_t^{(m)}, w_t^{(m)})},

to represent the distribution p(\mathbf{\Lambda}_t | \mathbf{y}_t), and then use the random mixture of Gaussian distributions

    \sum_{j=1}^m w_t^{(j)} N(\mu_t(\mathbf{\Lambda}_t^{(j)}), \Sigma_t(\mathbf{\Lambda}_t^{(j)}))

to approximate the target distribution p(x_t | \mathbf{y}_t). For any integrable function h(·), we can approximate the quantity of interest E{h(x_t) | \mathbf{y}_t} as

    \hat{E}(h(x_t) | \mathbf{y}_t) = \sum_{j=1}^m w_t^{(j)} \int h(x) \varphi(x; \mu_t(\mathbf{\Lambda}_t^{(j)}), \Sigma_t(\mathbf{\Lambda}_t^{(j)})) dx,

where \varphi is the Gaussian density function.

A straightforward sequential Monte Carlo method as described in Section 2.2 simply uses a weighted sample of the state variable, {(x_t^{(1)}, w_t^{(1)}), ..., (x_t^{(m)}, w_t^{(m)})}, to approximate p(x_t | \mathbf{y}_t), whereas the MKF samples in the indicator space instead, which is equivalent to marginalizing out the x_t. This operation has been shown to improve a Gibbs sampling algorithm (Liu, Wong, and Kong, 1994). Its advantage in a usual importance sampling scheme is shown in MacEachern et al. (1998). In the MKF setting there is no clear theory available at this point, but our experience so far shows that the efficiency gain of the MKF can be very significant. Intuitively, the usual sequential Monte Carlo method recursively approximates the posterior of x_t by a discrete sample, whereas the MKF approximates it by a mixture of Gaussian distributions. Note that the true posterior of x_t in a CDLM is indeed a mixture of Gaussians, although the number of its components increases exponentially with t.

Let KF_t^{(j)} = (\mu_t(\mathbf{\Lambda}_t^{(j)}), \Sigma_t(\mathbf{\Lambda}_t^{(j)})) denote the mean and covariance matrix of x_t obtained by the Kalman filter at time t for a given trajectory \mathbf{\Lambda}_t^{(j)}. Then the MKF consists of recursive applications of the following updating step.

MKF updating step: For j = 1, ..., m,

(M1): generate \Lambda_{t+1}^{(j)} from a trial distribution g(\Lambda_{t+1} | \mathbf{\Lambda}_t^{(j)}, KF_t^{(j)}, y_{t+1});

(M2): obtain KF_{t+1}^{(j)} by a one-step Kalman filter, as shown in (2), conditional on {KF_t^{(j)}, y_{t+1}, \Lambda_{t+1}^{(j)}};

(M3): update the new weight as w_{t+1}^{(j)} = w_t^{(j)} u_{t+1}^{(j)}, where

    u_{t+1}^{(j)} = p(\mathbf{\Lambda}_t^{(j)}, \Lambda_{t+1}^{(j)} | \mathbf{y}_{t+1}) / [p(\mathbf{\Lambda}_t^{(j)} | \mathbf{y}_t) g(\Lambda_{t+1}^{(j)} | \mathbf{\Lambda}_t^{(j)}, KF_t^{(j)}, y_{t+1})];

(M4): if the coefficient of variation of the w_{t+1}^{(j)} exceeds a threshold value, resample a new set of KF_{t+1} from {KF_{t+1}^{(1)}, ..., KF_{t+1}^{(m)}} with probabilities proportional to the weights w_{t+1}^{(j)}.

When \Lambda_t takes values in a finite discrete set I, the most efficient trial distribution for \Lambda_{t+1} is g(\Lambda_{t+1} | \mathbf{\Lambda}_t, KF_t, y_{t+1}) = p(\Lambda_{t+1} | \mathbf{\Lambda}_t, KF_t, y_{t+1}), which can be obtained by inspecting all possible values of \Lambda_{t+1}. The incremental weight u_{t+1}^{(j)} then simplifies to

    u_{t+1}^{(j)} \propto p(y_{t+1} | KF_t^{(j)}) = \sum_{i \in I} p(y_{t+1} | \Lambda_{t+1} = i, KF_t^{(j)}) p(\Lambda_{t+1} = i | \mathbf{\Lambda}_t^{(j)}).

Specifically, an MKF updating step in this case becomes: For j = 1, ..., m,

(M0): for each \Lambda_{t+1} = i, i \in I, run a Kalman filter step to obtain

    v_i^{(j)} \propto p(y_{t+1} | \Lambda_{t+1} = i, KF_t^{(j)}) p(\Lambda_{t+1} = i | \mathbf{\Lambda}_t^{(j)}),

where p(\Lambda_{t+1} = i | \mathbf{\Lambda}_t^{(j)}) is the prior transition probability of the indicator and p(y_{t+1} | \Lambda_{t+1} = i, KF_t^{(j)}) is a by-product of the Kalman filter, using (3);

(M1): sample a \Lambda_{t+1}^{(j)} from the set I, with probabilities proportional to the v_i^{(j)};

(M2): let KF_{t+1}^{(j)} be the Kalman filter output corresponding to \Lambda_{t+1} = \Lambda_{t+1}^{(j)};

(M3): the new weight is w_{t+1}^{(j)} = w_t^{(j)} \sum_{i \in I} v_i^{(j)}.
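To make the discrete-indicator step concrete, the following Python sketch (ours, not the paper's) implements (M0)-(M3) for a CDLM with scalar state and an indicator that is i.i.d. over time; log-weights are used to avoid numerical underflow, and the resampling step (M4) is omitted:

```python
import numpy as np
from scipy.stats import norm

def mkf_step(mu, Sig, logw, y, models, probs, rng):
    """One MKF step (M0)-(M3), scalar state, i.i.d. indicator.

    models[i] = (H, G, W, V) are the scalar coefficients under indicator
    value i, and P(Lambda_t = i) = probs[i].  The arrays (mu, Sig, logw),
    each of length m, carry KF_t^(j) and log w_t^(j)."""
    m, I = len(mu), len(probs)
    mun = np.empty((m, I)); Sign = np.empty((m, I)); v = np.empty((m, I))
    for i, (H, G, W, V) in enumerate(models):
        P = H * H * Sig + W * W                 # (M0): KF step per value i
        S = G * G * P + V * V
        K = P * G / S
        mun[:, i] = H * mu + K * (y - G * H * mu)
        Sign[:, i] = P - K * G * P
        v[:, i] = probs[i] * norm.pdf(y, G * H * mu, np.sqrt(S))
    vsum = v.sum(axis=1)
    # (M1): sample the new indicator with probability proportional to v_i
    pick = np.array([rng.choice(I, p=v[j] / vsum[j]) for j in range(m)])
    rows = np.arange(m)
    # (M2)-(M3): keep the matching KF output; weight gains sum_i v_i
    return mun[rows, pick], Sign[rows, pick], logw + np.log(vsum), pick
```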

Smith and Winter (1978) proposed a deterministic filtering method, the split-track filter, which has a flavor similar to the MKF proposed here. It is designed for CDLMs with discrete latent indicator variables. The split-track filter always keeps m trajectories of the latent indicators. At each time step, it evaluates the likelihoods of all possible propagations of the m trajectories kept at the previous step, and then finds and keeps the m new trajectories with the highest likelihood values. In contrast, our MKF selects these trajectories randomly, according to the weights (the predictive likelihood values), and uses the associated weights to measure how good each trajectory is. The important resampling step is naturally built into the MKF, which can overcome some weaknesses of the split-track filter, and more sophisticated sampling and estimation methods can also be incorporated. A comparison of the MKF and the split-track filter in a target tracking problem is presented in Section 5.

When \Lambda_t is a continuous random variable, a simpler but less efficient algorithm is: For j = 1, ..., m,

(M1): sample \Lambda_{t+1}^{(j)} from p(\Lambda_{t+1} | \mathbf{\Lambda}_t^{(j)}), the prior structure of the indicator variable;

(M2): run one step of the Kalman filter on {\Lambda_{t+1}^{(j)}, KF_t^{(j)}, y_{t+1}} to obtain KF_{t+1}^{(j)}, using (2);

(M3): the new weight is w_{t+1}^{(j)} = w_t^{(j)} p(y_{t+1} | \Lambda_{t+1}^{(j)}, KF_t^{(j)}), using (3).

The methods of Berzuini et al. (1997) and Pitt and Shephard (1999) can be applied to improve the efficiency of this algorithm.
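A sketch of this continuous-indicator variant for a scalar state (again ours, with hypothetical user-supplied functions sample_prior and models_of standing in for the indicator's prior and the indicator-to-coefficients mapping) is:

```python
import numpy as np
from scipy.stats import norm

def mkf_step_continuous(lam, mu, Sig, logw, y, sample_prior, models_of, rng):
    """One continuous-indicator MKF step: draw lambda_{t+1} from its prior
    (M1), run the Kalman step (2) for (M2), and multiply the weight by the
    predictive density (3) for (M3).  sample_prior(lam_j, rng) draws
    lambda_{t+1} given lambda_t = lam_j; models_of(lam) returns the scalar
    coefficients (H, G, W, V) for that indicator value."""
    lam = np.array([sample_prior(l, rng) for l in lam])        # (M1)
    for j in range(len(mu)):
        H, G, W, V = models_of(lam[j])
        P = H * H * Sig[j] + W * W                             # (M2)
        S = G * G * P + V * V
        K = P * G / S
        logw[j] += norm.logpdf(y, G * H * mu[j], np.sqrt(S))   # (M3)
        mu[j] = H * mu[j] + K * (y - G * H * mu[j])
        Sig[j] = P - K * G * P
    return lam, mu, Sig, logw
```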

4 The Extended Mixture Kalman Filters

4.1 Partial conditional dynamic linear models

Suppose the state variable has two components, x_t = (x_{t,1}, x_{t,2}). The following system is called a partial conditional dynamic linear model (PCDLM):

    (state equations)       x_{t,1} = H_t(x_{t-1,2}, \Lambda_t) x_{t-1,1} + W_t(x_{t-1,2}, \Lambda_t) w_t,
                            x_{t,2} = g_t(x_{t-1,2}, \Lambda_t, \varepsilon_t),
    (observation equation)  y_t = G_t(x_{t,2}, \Lambda_t) x_{t,1} + h_t(x_{t,2}, \Lambda_t) + V_t(x_{t,2}, \Lambda_t) v_t,

with w_t ~ N(0, I) and v_t ~ N(0, I). The matrices H_t, G_t, W_t, V_t are known given the values of {x_{t-1,2}, x_{t,2}, \Lambda_t}. The functions g_t and h_t are known, and \varepsilon_t has a known distribution.

There is in fact no absolute distinction between the PCDLM and the CDLM: if we regard the "nonlinear" component x_{t,2} of the state variable in a PCDLM as part of the indicator variable, the system becomes a CDLM. However, unlike in a CDLM, where we have no interest in the latent indicator, inference about the nonlinear component of the state variable in a PCDLM is often of great interest. Note that in our formulation the state propagation of the nonlinear component does not depend on the linear component.

Example 2. Fading channel: Many mobile communication channels can be modeled as Rayleigh flat-fading channels, which have the following form:

    (state equations)       x_t = F x_{t-1} + W w_t,
                            \alpha_t = G x_t,
                            s_t ~ p(· | s_{t-1}),
    (observation equation)  y_t = \alpha_t s_t + V v_t,

where the s_t are the input digital signals (symbols), the y_t are the received complex signals, and the \alpha_t are the unobserved (changing) fading coefficients. Both w_t and v_t are complex Gaussian with identity covariance matrices. This system is clearly a PCDLM: given the input signals s_t, the system is linear in x_t and y_t. In Section 5 we show how to use the extended MKF to extract digital signals transmitted over such channels.

Example 3. Blind deconvolution: Consider the following system in digital communication:

    y_t = \sum_{i=1}^q \theta_i s_{t-i} + \varepsilon_t,

where s_t is a discrete process taking values in a known set S. In a blind deconvolution problem, s_t is to be estimated from the observed signals {y_1, ..., y_t} without knowledge of the channel coefficients \theta_i. This system can be formulated as a PCDLM. Let \theta_t = (\theta_{t1}, ..., \theta_{tq}) and x_t = (s_t, ..., s_{t-q})'. We can define

    (state equations)       \theta_t = \theta_{t-1},
                            x_t = H x_{t-1} + W s_t,
    (observation equation)  y_t = \theta_t x_t + \varepsilon_t,

where H is a shift matrix whose lower off-diagonal elements are one and all other elements zero, and W = (1, 0, ..., 0)'. In this case, the unknown system coefficients are part of the state variable, and the system is linear conditional on the digital signals x_t. Liu and Chen (1995) studied this problem with a procedure that is essentially an extended MKF as described in the next subsection. This PCDLM formulation can easily be extended to deal with blind deconvolution under time-varying system coefficients.

4.2 The Extended MKF

The main idea of the extended MKF (EMKF) is to extract as many linear and Gaussian components from the system as possible, and then to integrate these components out (marginalize them) using the Kalman filter before running a Monte Carlo filter on the remaining components.

Thus, in the EMKF we generate discrete samples in the joint space of the latent indicator and the nonlinear state component. More intuitively, because p(x_{t,1}, x_{t,2} | \mathbf{y}_t) = p(x_{t,1} | x_{t,2}, \mathbf{y}_t) p(x_{t,2} | \mathbf{y}_t), the approximation of p(x_{t,1}, x_{t,2} | \mathbf{y}_t) in the EMKF is decomposed into a Monte Carlo approximation of the marginal distribution p(x_{t,2} | \mathbf{y}_t) and an exact Gaussian conditional distribution p(x_{t,1} | x_{t,2}, \mathbf{y}_t).

The EMKF algorithm is as follows. Suppose at time t we have a sample (\mathbf{\Lambda}_t^{(j)}, x_{t,2}^{(j)}, KF_t^{(j)}, w_t^{(j)}), j = 1, ..., m, where KF_t^{(j)} = (\mu_t^{(j)}, \Sigma_t^{(j)}) is the mean and covariance matrix of p(x_{t,1} | x_{t,2}^{(j)}, \mathbf{\Lambda}_t^{(j)}, \mathbf{y}_t) obtained by the Kalman filter. At time t+1, we update these filters as follows:

(E1): generate (\Lambda_{t+1}^{(j)}, x_{t+1,2}^{(j)}) from a trial distribution g(\Lambda_{t+1}, x_{t+1,2} | \mathbf{\Lambda}_t^{(j)}, x_{t,2}^{(j)}, KF_t^{(j)}, y_{t+1});

(E2): run a one-step Kalman filter conditional on (\Lambda_{t+1}^{(j)}, x_{t+1,2}^{(j)}, KF_t^{(j)}, y_{t+1}) to obtain KF_{t+1}^{(j)};

(E3): calculate the incremental weight

    u_{t+1}^{(j)} = p(\mathbf{\Lambda}_{t+1}^{(j)}, \mathbf{x}_{t+1,2}^{(j)} | \mathbf{y}_{t+1}) / [p(\mathbf{\Lambda}_t^{(j)}, \mathbf{x}_{t,2}^{(j)} | \mathbf{y}_t) g(\Lambda_{t+1}^{(j)}, x_{t+1,2}^{(j)} | \mathbf{\Lambda}_t^{(j)}, x_{t,2}^{(j)}, KF_t^{(j)}, y_{t+1})]

and update the new weight as w_{t+1}^{(j)} = w_t^{(j)} u_{t+1}^{(j)};

(E4): resample as in (M4) if necessary.

From the weighted sample obtained at each time t, we can estimate quantities of interest; e.g.,

    E{h(x_t) | \mathbf{y}_t} ≈ W_t^{-1} \sum_{j=1}^m w_t^{(j)} \int h(x_1, x_{t,2}^{(j)}) \varphi(x_1; \mu_t^{(j)}, \Sigma_t^{(j)}) dx_1,

where W_t = \sum_j w_t^{(j)}. In particular,

    E{h_1(x_{t,1}) | \mathbf{y}_t} ≈ W_t^{-1} \sum_{j=1}^m w_t^{(j)} \int h_1(x_1) \varphi(x_1; \mu_t^{(j)}, \Sigma_t^{(j)}) dx_1,
    E{h_2(x_{t,2}) | \mathbf{y}_t} ≈ W_t^{-1} \sum_{j=1}^m w_t^{(j)} h_2(x_{t,2}^{(j)}).
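When the quantity of interest is a posterior mean, these formulas reduce to weighted averages, since the inner Gaussian integral of the identity function is just the Kalman mean. A minimal sketch (ours) of this final estimation step:

```python
import numpy as np

def emkf_estimates(w, mu, x2, h2=lambda x: x):
    """Rao-Blackwellized EMKF estimates at time t (cf. Section 4.2):
    the linear component is averaged via the exact Kalman means, while
    the nonlinear component is averaged over the weighted sample.

    w  : importance weights, shape (m,)
    mu : Kalman means of x_{t,1} given each trajectory, shape (m, d)
    x2 : sampled nonlinear components, shape (m,)"""
    wn = np.asarray(w, dtype=float)
    wn = wn / wn.sum()
    est_x1 = wn @ np.asarray(mu)        # E{x_{t,1} | y_t} (h1 = identity)
    est_x2 = wn @ h2(np.asarray(x2))    # E{h2(x_{t,2}) | y_t}
    return est_x1, est_x2
```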

5 Some Numerical Examples

5.1 A linear system with mixture Gaussian errors

To illustrate how the MKF works, we consider the following simple state-space model with outliers:

    x_t = a x_{t-1} + \epsilon_t,   with \epsilon_t ~ N(0, \sigma_j^2) if J_t = j,
    y_t = x_t + e_t,                with e_t ~ N(0, \sigma_e^2),

and we assume that P(J_t = j | \mathbf{J}_{t-1}, \mathbf{x}_{t-1}) = P(J_t = j) = p_j, which can easily be extended to the Markovian case.
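Putting the pieces together, here is a hypothetical run of the MKF on this model, reusing the simulate_cdlm and mkf_step sketches given earlier; the flat prior (mu, Sig) = (0, 1) and the omission of the resampling step (M4) are our simplifications:

```python
import numpy as np

rng = np.random.default_rng(0)
a, sig, pj, sig_e, m, T = 0.9, [0.5, 1.5], [0.7, 0.3], 0.3, 50, 100
models = [(a, 1.0, s, sig_e) for s in sig]      # (H, G, W, V) for J_t = j

x, y, _ = simulate_cdlm(T, H=a, W=1.0, G=1.0, V=sig_e,
                        sigmas=sig, probs=pj, rng=rng)
mu, Sig, logw = np.zeros(m), np.ones(m), np.zeros(m)
est = np.zeros(T)
for t in range(1, T):
    mu, Sig, logw, _ = mkf_step(mu, Sig, logw, y[t], models, pj, rng)
    w = np.exp(logw - logw.max()); w /= w.sum()
    est[t] = w @ mu                             # estimate of E(x_t | y_t)
    # (resampling as in (M4) would be added here for long runs)
```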

Figure 1: (a) The mean squared errors of the estimates of x_t using the MKF with (dots) and without (circles) resampling in 100 repeated experiments. (b) The true x_t versus their on-line MKF estimates: dots, observations (y_t); line, true values (x_t); dotted lines, estimates (i.e., E(x_t | \mathbf{y}_t)).

We simulated a system with a = 0.9, \sigma_1 = 0.5, \sigma_2 = 1.5, p_1 = 0.7, and \sigma_e = 0.3, and applied the MKF for on-line estimation. We also tested the effect of a partial resampling scheme for the MKF (Liu et al., 1998). The numerical result was very satisfactory: the MKF estimated the state variable x_t very accurately despite several sudden jumps in the system. With Monte Carlo sample size m = 50, we already produced a result better than that in Kitagawa (1996), who routinely used m = 1,000 "particles." Figure 1 also shows that resampling can effectively reduce the mean squared errors of the estimates.

5.2 Target tracking

Designing sophisticated target tracking algorithms is an important task for both civilian and military surveillance systems, particularly when a radar, sonar, or optical sensor operates in the presence of clutter or when the innovations are non-Gaussian (Bar-Shalom and Fortmann, 1988). We show three examples of target tracking using the MKF: (a) targets in the presence of random interference (clutter); (b) targets with non-Gaussian innovations; and (c) maneuvering targets.

5.2.1 Random (Gaussian) accelerated target in clutter

Suppose the target follows a linear and Gaussian state-space model:

    x_t = H x_{t-1} + W w_t,
    z_t = G x_t + V v_t,

where x_t is the state variable (location and velocity) of the target and w_t, v_t are white Gaussian noise with identity covariance matrices. For a target moving on a straight line, we have x_t = (s_t, v_t)', where s_t is the true target location and v_t is its current velocity. In this case,

    H = (1, T; 0, 1),   W = \sigma_w^2 (T/2, 1)',   G = (1, 0),   V = \sigma_v^2,

where T is the time duration between two observations and the random acceleration is assumed to be constant within each period, with rate \sigma_w^2 w_t / T. For a target moving in two- (or three-) dimensional space, the state variable becomes x_t = (s_t, v_t) with s_t and v_t two- (or three-) dimensional vectors, and the corresponding matrices are expanded accordingly.

In a clutter environment, we observe m_t signals {y_{t1}, ..., y_{t m_t}} at time t, with

    m_t ~ Bernoulli(p_d) + Poisson(\lambda \Delta),

where p_d is the probability that the true signal z_t is detected, \lambda is the rate of a Poisson random field, and \Delta is the area of the surveillance region. In words, at time t we observe the true signal with probability p_d; we also observe false signals, such as deceiving objects and electromagnetic interference, distributed as a Poisson process over the detection region.

This model can easily be formulated as a CDLM. Let \Lambda_t be the identifier of the target at time t: \Lambda_t = 0 if the target is not observed, and \Lambda_t = i if the i-th object on the detection screen is the true signal generated by the target, i.e., y_{ti} = z_t. Given the indicators, the system is linear and Gaussian, and the remaining y signals bear no information.

Figure 2 shows the tracking errors (the difference between the estimated and true target locations) in 50 simulated runs of a one-dimensional target, with r^2 = 1.0, q^2 = 1.0, p_d = 0.9, and \lambda = 0.1. Five hundred Monte Carlo samples were used for both the MKF and an ordinary sequential importance sampler. We also ran the split-track filter, which, at each step, saves the 500 trajectories with the highest likelihood values. The MKF performs much better than the other two algorithms.
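A sketch of how the cluttered observation set might be generated (ours; one-dimensional, with a hypothetical surveillance interval and parameter values):

```python
import numpy as np

def observe_in_clutter(z_true, pd, lam, region, rng):
    """Generate the observation set {y_t1, ..., y_tmt} of Section 5.2.1:
    the true signal z_t is detected with probability pd, and false signals
    form a Poisson process with rate lam on the interval `region`."""
    lo, hi = region
    n_false = rng.poisson(lam * (hi - lo))        # number of false signals
    obs = list(rng.uniform(lo, hi, size=n_false))
    if rng.random() < pd:                         # true signal detected?
        obs.append(z_true)
    obs = np.array(obs)
    rng.shuffle(obs)                              # the identity is latent
    return obs

rng = np.random.default_rng(0)
ys = observe_in_clutter(z_true=3.2, pd=0.9, lam=0.1,
                        region=(0.0, 50.0), rng=rng)
```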

5.2.2 Random (non-Gaussian) accelerated target in a clean environment

This situation is usually modeled as

    x_t = H x_{t-1} + W w_t,
    y_t = G x_t + V v_t,

where w_t and v_t are non-Gaussian errors. If w_t and v_t have mixture Gaussian distributions, this model is clearly a CDLM. For example, if w_t ~ t_{k_1} and v_t ~ t_{k_2}, we can define \Lambda_t = (\Lambda_{t1}, \Lambda_{t2}) with independent \chi^2_{k_1} and \chi^2_{k_2} prior distributions, respectively. The above model can then be rewritten as

    x_t = H x_{t-1} + (\sqrt{k_1} / \sqrt{\lambda_1}) W e_t
    y_t = G x_t + (\sqrt{k_2} / \sqrt{\lambda_2}) V \varepsilon_t        if (\Lambda_{t1}, \Lambda_{t2}) = (\lambda_1, \lambda_2),

with e_t ~ N(0, I) and \varepsilon_t ~ N(0, I).

Simulations were carried out with the system

    H = (1, 1; 0, 1),   W = q (1/2, 1)',   G = (1, 0),   V = r,

and w_t ~ t_3 and v_t ~ t_3. The following table compares the MKF and a standard sequential importance sampler in terms of the number of times the target was lost (|\hat{x}_t - x_t| > 1200) and the CPU time, over one hundred simulated runs, with noise variances q^2 = 16.00 and r^2 = 1600:

    MC size (m)    Particle filter           MKF
                   cpu time    # miss        cpu time    # miss
    20             9.49843     72            19.4277     1
    50             20.1622     20            51.6061     1
    200            80.3340     7             181.751     1
    500            273.369     4             500.157     1
    1500           1063.36     3             2184.67     1

We observed that the MKF takes about twice as much CPU time as the standard sequential sampler for the same m. Figure 3 shows the tracking mean squared errors after the lost tracks are eliminated. Clearly, the MKF performed much better in this example for the same amount of CPU time.

We also tested the idea of using a finite mixture of Gaussians to approximate the t-distribution, i.e., approximating t_3 with \sum_{i=1}^k p_i N(0, \sigma_i^2). Similar results were obtained. The advantage of this mixture approach is that with discrete indicators a more efficient MKF can be used; however, the approximation also introduces some bias.
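The scale-mixture representation underlying this example is easy to check by simulation: if \Lambda ~ \chi^2_k and e ~ N(0, 1) are independent, then \sqrt{k/\Lambda} e has a t_k distribution. A minimal sketch (ours):

```python
import numpy as np

def t_noise_via_mixture(k, size, rng):
    """Draw t_k noise via the scale-mixture representation of Section 5.2.2:
    with Lambda ~ chi^2_k and e ~ N(0, 1) independent, sqrt(k / Lambda) * e
    has a t_k distribution."""
    lam = rng.chisquare(k, size=size)       # latent continuous indicator
    e = rng.standard_normal(size)
    return np.sqrt(k / lam) * e, lam        # the noise and its indicator

rng = np.random.default_rng(0)
w, lam_w = t_noise_via_mixture(3, 1000, rng)   # w_t ~ t_3, as in the example
```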

5.2.3 Maneuvered target in a clean environment

This situation is usually modeled as

    x_t = H x_{t-1} + F u_t + W w_t,
    y_t = G x_t + V v_t,

where u_t is the maneuvering acceleration. The prior structure of u_t is the key to this model. First, maneuvering can be classified into several categories, marked by an indicator variable. For example, in a three-level model, I_t = 0 indicates no maneuvering (u_t = 0), and I_t = 1 and 2 indicate slow and fast maneuvering, respectively (u_t ~ N(0, \sigma_i^2), with \sigma_1^2 < \sigma_2^2). One can also specify transition probabilities P(I_t = j | I_{t-1} = i) = p_{ij} for the maneuvering status. Second, there are different ways of modeling the serial correlation of the u_t. Here we assume a multi-level white noise model, as in Bar-Shalom and Fortmann (1988), in which the u_t are assumed independent. This is the easiest but not a very realistic model; other possible models are currently under investigation.

We applied the MKF to an example of Bar-Shalom and Fortmann (1988), in which a two-dimensional target's position is sampled every T = 10 s. The target moves in a plane with constant course and speed until k = 40, when it starts a slow 90° turn that is completed in 20 sampling periods. A second, fast, 90° turn starts at k = 61 and is completed in 5 sampling periods. The slow turn is the result of acceleration inputs u_x = u_y = 0.075 (40 < k <= 60), and the fast turn results from u_x = -u_y = -0.3 (61 < k <= 65). Figure 4 shows the trajectory of the target and its x-direction and y-direction velocities in one simulated run. In Figure 5 we present the root mean squared errors of the MKF estimates of the target position over 50 simulated runs with three noise levels: 0.000001, 1, and 36. Comparing our result with that of Bar-Shalom and Fortmann (1988, p. 143), who used the traditional detection-and-switching method, we see a clear advantage for the proposed MKF.

5.3 A simple illustration of the extended MKF

Consider the system

    x_{t,1} = 0.9 x_{t-1,1} + 0.2 x_{t-1,2}^2 + q_1 e_{t1},
    x_{t,2} = x_{t-1,2} + q_2 e_{t2},
    y_t = 0.5 x_{t,1} x_{t,2} + r v_t,

with e_{t1}, e_{t2}, v_t ~ N(0, 1). Note that even though x_{t,2} has a linear propagation equation, it is actually a nonlinear component, because it appears in the other state equation in a nonlinear form. Given x_{t-1,2} and x_{t,2}, the system is linear in both the first state equation and the observation equation.

Figure 6 compares the MSEs in estimating x_{t,1} and x_{t,2} using the EMKF with Monte Carlo sample size m = 300 and the standard sequential importance sampler with m = 900. Fifty simulated runs of the above PCDLM with q_1 = 10, q_2 = 0.1, and r = 1 were carried out. Resampling was done at every time step. The CPU time of the EMKF is almost the same as that of the standard sequential importance sampler for the same m. Other configurations showed similar results.
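A short simulation sketch of this system (ours). Note that, given the trajectory of x_{t,2}, the Kalman step in the EMKF sees the known offset 0.2 x_{t-1,2}^2 in the first state equation and the known coefficient 0.5 x_{t,2} in the observation equation:

```python
import numpy as np

def simulate_pcdlm(T, q1=10.0, q2=0.1, r=1.0, rng=None):
    """Simulate the PCDLM of Section 5.3.  x_{t,2} is the nonlinear
    component: conditional on its trajectory, the model is linear and
    Gaussian in x_{t,1}."""
    rng = np.random.default_rng(rng)
    x1 = np.zeros(T); x2 = np.zeros(T); y = np.zeros(T)
    for t in range(1, T):
        x1[t] = 0.9 * x1[t-1] + 0.2 * x2[t-1]**2 + q1 * rng.standard_normal()
        x2[t] = x2[t-1] + q2 * rng.standard_normal()
        y[t] = 0.5 * x1[t] * x2[t] + r * rng.standard_normal()
    return x1, x2, y

x1, x2, y = simulate_pcdlm(100, rng=0)
```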

5.4 Digital Signal Extraction in Fading Channels

Consider Example 2 in Section 4.1 with binary input signals s_t ∈ {1, -1}. The fading coefficient takes complex values, with independent real and imaginary parts following the same state equation. Simulations were done with the configuration

    F = (0, 1, 0, 0;  0, 0, 1, 0;  0, 0, 0, 1;  0, 0.9391, -2.8763, 2.9372),
    G = 10^{-4} (0.0376, 0.1127, 0.1127, 0.0376),
    W = (0, 0, 0, 1)',
    V = r.

That is, both the real and the imaginary parts of \alpha_t follow an ARMA(3,3) process

    \alpha_t - 2.9372 \alpha_{t-1} + 2.8763 \alpha_{t-2} - 0.9391 \alpha_{t-3}
        = 10^{-4} (0.0376 e_t + 0.1127 e_{t-1} + 0.1127 e_{t-2} + 0.0376 e_{t-3}),

where e_t ~ N(0, 0.01^2). In the communication literature this is a (lowpass) Butterworth filter of order 3 with cutoff frequency 0.01; it is normalized to have stationary variance 1.

We are interested in estimating the differential code d_t = s_t s_{t-1}. Figure 7 shows the bit error rates at different signal-to-noise ratios (SNR) for the EMKF, for the differential detector (DPSK) \hat{d}_t = sign(Re(y_t y^*_{t-1})), and for a lower bound. The lower bound is obtained using the true fading coefficients \alpha_t, with \hat{d}_t = sign(Re(\alpha^*_t y_t y^*_{t-1} \alpha_{t-1})).

The Monte Carlo sample size was m = 100 for the MKF, except when SNR >= 10, in which case m = 500. The simple DPSK detector works very well in the low-SNR cases, where no significant improvement can be expected. However, DPSK exhibits an apparent bit error rate floor in the high-SNR cases. The MKF managed to break that floor by exploiting the structure of the fading coefficients.
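The following Python sketch (ours) simulates this fading channel and the DPSK reference detector. The i.i.d. BPSK symbols, the empirical variance normalization of \alpha_t, the neglected start-up transient, and the mapping from SNR to the noise scale are our simplifying assumptions rather than the paper's exact setup:

```python
import numpy as np

def simulate_fading(T, snr_db, rng):
    """Simulate the flat-fading channel of Section 5.4 with BPSK symbols,
    using the ARMA(3,3) (Butterworth) state-space form given above."""
    F = np.array([[0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [0, 0.9391, -2.8763, 2.9372]])
    G = 1e-4 * np.array([0.0376, 0.1127, 0.1127, 0.0376])
    s = rng.choice([-1.0, 1.0], size=T)            # i.i.d. BPSK symbols
    x = np.zeros(4, dtype=complex)
    alpha = np.zeros(T, dtype=complex)
    sigma_e = 0.01
    for t in range(T):
        e = sigma_e * (rng.standard_normal() + 1j * rng.standard_normal())
        x = F @ x
        x[3] += e                                  # W = (0, 0, 0, 1)'
        alpha[t] = G @ x
    alpha /= np.sqrt(np.mean(np.abs(alpha) ** 2))  # empirical normalization
    sigma_v = 10 ** (-snr_db / 20)                 # assumed SNR definition
    v = sigma_v * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
    return s, alpha, alpha * s + v

rng = np.random.default_rng(0)
s, alpha, y = simulate_fading(10000, snr_db=20, rng=rng)
d_hat = np.sign(np.real(y[1:] * np.conj(y[:-1])))  # DPSK detector
ber = np.mean(d_hat != s[1:] * s[:-1])             # differential bit error rate
```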

6 Discussion

In this article, we propose the mixture Kalman filter for on-line estimation and prediction in conditional dynamic linear models. The methodology is further extended to deal with partial conditional dynamic linear models. The MKF is a sequential Monte Carlo technique in which a marginalization operation is employed to improve efficiency. All of our numerical examples show that the MKF approach gains significantly over earlier sequential Monte Carlo approaches, e.g., the bootstrap filter and sequential imputation.

The developments in this article also show that sequential importance sampling is a very general and powerful platform on which improved algorithms can be designed by making use of the special structure of a given problem. We hope that the new tools we add to the nonlinear filtering toolkit will be of interest to researchers, as well as practitioners, in this field.

REFERENCES

Ackerson, G.A. and Fu, K.S. (1970). On state estimation in switching environments. IEEE Trans. Autom. Control, AC-15, 10-17.

Akashi, H. and Kumamoto, H. (1977). Random sampling approach to state estimation in switching environments. Automatica, 13, 429-434.

Anderson, B.D.O. and Moore, J.B. (1979). Optimal Filtering, Prentice-Hall.

Avitzour, D. (1995). A stochastic simulation Bayesian approach to multitarget tracking. IEE Proceedings on Radar, Sonar and Navigation, 142, 41-44.

Bar-Shalom, Y. and Fortmann, T.E. (1988). Tracking and Data Association, Academic Press: Boston.

Berzuini, C., Best, N.G., Gilks, W.R., and Larizza, C. (1997). Dynamic conditional independence models and Markov chain Monte Carlo methods. J. Amer. Statist. Assoc., to appear.

Carlin, B.P., Polson, N.G., and Stoffer, D.S. (1992). A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American Statistical Association, 87, 493-500.

Carpenter, J., Clifford, P., and Fearnhead, P. (1997). An improved particle filter for non-linear problems. Technical Report, Oxford University.

Carter, C.K. and Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81, 541-553.

Doucet, A. (1998). On sequential simulation-based methods for Bayesian filtering. Technical Report TR.310, Department of Engineering, University of Cambridge.

Gelb, A. (1974). Applied Optimal Estimation, MIT Press.

Gordon, N.J., Salmond, D.J., and Smith, A.F.M. (1993). A novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings on Radar and Signal Processing, 140, 107-113.

Hürzeler, M. and Künsch, H.R. (1995). Monte Carlo approximations for general state space models. Research Report 73, ETH, Zürich.

Jazwinski, A. (1970). Stochastic Processes and Filtering Theory, Academic Press.

Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. J. Basic Engineering, 82, 35-45.

Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5, 1-25.

Kong, A., Liu, J.S., and Wong, W.H. (1994). Sequential imputations and Bayesian missing data problems. J. Amer. Statist. Assoc., 89, 278-288.

Liu, J.S. and Chen, R. (1995). Blind deconvolution via sequential imputations. Journal of the American Statistical Association, 90, 567-576.

Liu, J.S. and Chen, R. (1998). Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association, 93, 1032-1044.

Liu, J.S., Chen, R., and Wong, W.H. (1998). Rejection control for sequential importance sampling. Journal of the American Statistical Association, 93, 1022-1031.

Liu, J.S., Wong, W.H., and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika, 81, 27-40.

MacEachern, S.N., Clyde, M.A., and Liu, J.S. (1998). Sequential importance sampling for nonparametric Bayes models: the next generation. Canadian Journal of Statistics, in press.

Pitt, M.K. and Shephard, N. (1999). Filtering via simulation: auxiliary particle filters. J. Amer. Statist. Assoc., in press.

Rubin, D.B. (1987). A noniterative sampling/importance resampling alternative to the data augmentation algorithm for creating a few imputations when fractions of missing information are modest: the SIR algorithm. Journal of the American Statistical Association, 82, 543-546.

Rubinstein, R.Y. (1981). Simulation and the Monte Carlo Method, New York: Wiley.

Smith, M.C. and Winter, E.M. (1978). On the detection of target trajectories in a multi-target environment. Proc. 17th IEEE Conf. on Decision & Control, San Diego, CA, January 1979.

Svetnik, V.B. (1986). Applying the Monte Carlo method for optimum estimation in systems with random structure. Automation and Remote Control, 47, 818-827.

Tugnait, J.K. (1982). Detection and estimation for abruptly changing systems. Automatica, 18, 607-615.

West, M. (1992). Mixture models, Monte Carlo, Bayesian updating and dynamic models. Computer Science and Statistics, 24, 325-333.

West, M. and Harrison, J. (1989). Bayesian Forecasting and Dynamic Models, New York: Springer-Verlag.

Figure 2: The tracking errors of 50 runs of the MKF (top), a sequential importance sampler (middle), and the split-track filter (bottom) for a simulated moving-target system.

Figure 3: The MSEs of 50 runs of the MKF and a standard sequential importance sampler for a simulated moving-target system with different Monte Carlo sample sizes. (Panels: MSE of position and MSE of speed; curves labeled M20, M200, P50, P500.)

Figure 4: The position and velocity of a simulated two-dimensional maneuvering target. (Top) Position. (Bottom) x-direction and y-direction velocities.

Figure 5: The root MSEs of 50 runs of the MKF for a simulated maneuvering-target system. (Panels: x-position and x-velocity.)

Figure 6: The MSEs of 50 runs of the EMKF and the standard sequential importance sampler for a simulated PCDLM. (Panels: MSE of x1 and MSE of x2; curves labeled EM300, P900.)

Figure 7: The bit error rate for extracting differential binary signals from a fading channel using the MKF and DPSK. A lower bound that assumes exact knowledge of the fading coefficients is also shown.
