Generalized Multiple-Model Adaptive Estimation Using An Autocorrelation Approach
Yang Cheng
Dept. of Mech. & Aero. Eng.
University at Buffalo
State University of New York
Amherst, NY 14260-4400 U.S.A.
[email protected]

Keywords: Multiple-model adaptive estimation, filtering, extended Kalman filter, target tracking.

Introduction

Filtering algorithms, such as the extended Kalman filter (EKF) [1], the unscented filter (UF) [2], and particle filters (PFs) [3, 4], are commonly used both to estimate unmeasurable states and to filter noisy measurements. The EKF and UF assume that the process noise and measurement noise are represented by zero-mean Gaussian white-noise processes. Even if this assumption is true, both filters provide only approximate solutions when the state and/or measurement models are nonlinear, since the posterior density function is most often non-Gaussian. The EKF typically works well only in the region where the first-order Taylor-series linearization adequately approximates the non-Gaussian probability density function (pdf). The unscented filter works on the premise that, with a fixed number of parameters, it should be easier to approximate a Gaussian distribution than to approximate an arbitrary nonlinear function. In essence, this can provide higher-order moments for the computation of the posterior function without the need to calculate Jacobian matrices as required in the EKF. Still, the standard form of the EKF has remained the most popular method for nonlinear estimation to this day, and other designs are investigated only when the performance of this standard form is not sufficient.
Like other approximate approaches to optimal filtering, the ultimate objective of a PF is to construct the
posterior pdf of the state vector, or the pdf of the state
vector conditioned on all the available measurements.
However, the approximation of a PF is vastly different from that of conventional nonlinear filters. The central idea of the PF approximation is to represent a continuous distribution of interest by a finite (but large) number of weighted random samples of the state vector, or particles. Particle filters do not assume the posterior distribution of the state vector to be Gaussian or of any other known form. In principle, they can estimate probability distributions of arbitrary form and solve any nonlinear and/or non-Gaussian system.
Even if the process noise and/or measurement noise
are Gaussian, all standard forms of the EKF, UF and
PFs require knowledge of their characteristics, such as
the mean and covariance for a Gaussian process. The
mean and covariance of the measurement noise can be inferred through statistical inference and calibration procedures applied to the hardware sensing devices. The calibration procedures can also be used to determine the nature of the measurement noise distribution. The kurtosis characterizes the compactness of the distribution around the mean, relative to a Gaussian distribution. A common measure, the Pearson kurtosis, divides the fourth central moment by the square of the second central moment [5]. Kurtosis above the Gaussian value indicates a relatively peaked distribution, while kurtosis below it indicates a relatively flat distribution. However, the process noise is extremely difficult to characterize because it is usually used to represent modeling errors. Its covariance is usually determined by ad hoc or heuristic approaches, which leads to the classical filter-tuning problem. Fortunately, there are tools available to aid in this tuning process.
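As a small numerical illustration of the Pearson kurtosis described above (the function name `pearson_kurtosis` is ours, not from the paper):

```python
import numpy as np

def pearson_kurtosis(samples):
    """Pearson kurtosis: fourth central moment divided by the
    squared second central moment (equals 3 for a Gaussian)."""
    x = np.asarray(samples, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered**2)   # second central moment (variance)
    m4 = np.mean(centered**4)   # fourth central moment
    return m4 / m2**2

# A peaked (heavy-tailed) sample has kurtosis above the Gaussian
# value of 3; a flat sample falls below it.
rng = np.random.default_rng(0)
print(pearson_kurtosis(rng.standard_normal(200_000)))  # close to 3
print(pearson_kurtosis(rng.laplace(size=200_000)))     # close to 6 (peaked)
print(pearson_kurtosis(rng.uniform(size=200_000)))     # close to 1.8 (flat)
```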
Table 1: Continuous-Discrete Extended Kalman Filter

Model:
$$\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{u}(t), t) + G(t)\,\mathbf{w}(t), \quad \mathbf{w}(t) \sim N(\mathbf{0}, Q(t))$$
$$\tilde{\mathbf{y}}_k = \mathbf{h}(\mathbf{x}_k) + \mathbf{v}_k, \quad \mathbf{v}_k \sim N(\mathbf{0}, R_k)$$

Initialize:
$$\hat{\mathbf{x}}(t_0) = \hat{\mathbf{x}}_0, \quad P_0 = E\{\tilde{\mathbf{x}}(t_0)\,\tilde{\mathbf{x}}^T(t_0)\}$$

Gain:
$$K_k = P_k^- H_k^T(\hat{\mathbf{x}}_k^-)\left[H_k(\hat{\mathbf{x}}_k^-)\, P_k^- H_k^T(\hat{\mathbf{x}}_k^-) + R_k\right]^{-1}, \quad H_k(\hat{\mathbf{x}}_k^-) \equiv \left.\frac{\partial \mathbf{h}}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}_k^-}$$

Update:
$$\hat{\mathbf{x}}_k^+ = \hat{\mathbf{x}}_k^- + K_k\left[\tilde{\mathbf{y}}_k - \mathbf{h}(\hat{\mathbf{x}}_k^-)\right]$$
$$P_k^+ = \left[I - K_k H_k(\hat{\mathbf{x}}_k^-)\right] P_k^-$$

Propagation:
$$\dot{\hat{\mathbf{x}}}(t) = \mathbf{f}(\hat{\mathbf{x}}(t), \mathbf{u}(t), t)$$
$$\dot{P}(t) = F(\hat{\mathbf{x}}(t), t)\,P(t) + P(t)\,F^T(\hat{\mathbf{x}}(t), t) + G(t)\,Q(t)\,G^T(t), \quad F(\hat{\mathbf{x}}(t), t) \equiv \left.\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}(t)}$$

In Table 1, $\mathbf{w}(t)$ and $\mathbf{v}_k$ are zero-mean Gaussian white-noise processes with spectral density $Q(t)$ and covariance $R_k$, $\hat{\mathbf{x}}_k^-$ and $\hat{\mathbf{x}}_k^+$ are the propagated and updated state estimates, respectively, and $P_k^-$ and $P_k^+$ are the propagated and updated covariances, respectively. Oftentimes, if the sampling interval is below Nyquist's limit, a discrete-time propagation of the covariance is used:

$$P_{k+1}^- = \Phi_k P_k^+ \Phi_k^T + Q_k \tag{1}$$

The state transition matrix $\Phi_k$ and discrete-time process noise covariance $Q_k$ can be computed from the continuous-time model using

$$\mathcal{A} = \begin{bmatrix} -F(\hat{\mathbf{x}}(t), t) & G(t)\,Q(t)\,G^T(t) \\ 0 & F^T(\hat{\mathbf{x}}(t), t) \end{bmatrix} \Delta t \tag{2}$$

$$\mathcal{B} = e^{\mathcal{A}} = \begin{bmatrix} \mathcal{B}_{11} & \mathcal{B}_{12} \\ 0 & \mathcal{B}_{22} \end{bmatrix} \tag{3}$$

$$= \begin{bmatrix} \mathcal{B}_{11} & \Phi_k^{-1} Q_k \\ 0 & \Phi_k^T \end{bmatrix} \tag{4}$$

so that

$$\Phi_k = \mathcal{B}_{22}^T \tag{5}$$

$$Q_k = \Phi_k\, \mathcal{B}_{12} \tag{6}$$

Multiple-Model Adaptive Estimation
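The discretization in Eqs. (2)-(6) can be sketched as follows; this is an illustrative implementation, not the paper's code, and the helper names (`expm_taylor`, `discretize`) are ours. For a double integrator the result can be checked against the familiar closed-form $Q_k$:

```python
import numpy as np

def expm_taylor(A, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate
    for the small, well-scaled matrices used here)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    return result

def discretize(F, GQGt, dt):
    """Compute Phi_k and Q_k from Eqs. (2)-(6)."""
    n = F.shape[0]
    A = np.block([[-F, GQGt],
                  [np.zeros((n, n)), F.T]]) * dt   # Eq. (2)
    B = expm_taylor(A)                              # Eq. (3)
    B12 = B[:n, n:]
    B22 = B[n:, n:]
    Phi = B22.T                                     # Eq. (5)
    Qk = Phi @ B12                                  # Eq. (6)
    return Phi, Qk

# Double-integrator example: F = [[0,1],[0,0]], G = [0,1]^T, Q = q.
q, dt = 10.0, 0.1
F = np.array([[0.0, 1.0], [0.0, 0.0]])
GQGt = np.array([[0.0, 0.0], [0.0, q]])
Phi, Qk = discretize(F, GQGt, dt)
# Phi = [[1, dt], [0, 1]], and Qk matches the closed form
# q * [[dt^3/3, dt^2/2], [dt^2/2, dt]].
```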
In this section a review of MMAE is shown. More details can be found in Refs. [16, 17]. Multiple-model
adaptive estimation is a recursive estimator that uses
a bank of filters that depend on some unknown parameters. In our case these parameters are the process
noise covariance, denoted by the vector p, which is assumed to be constant (at least throughout the interval
of adaptation). Note that we do not necessarily need to make the stationarity assumption for the state and/or output processes, i.e., time-varying state and output matrices can be used. A set of distributed elements is generated from some known pdf of $\mathbf{p}$, denoted by $p(\mathbf{p})$, to give $\{\mathbf{p}^{(\ell)};\ \ell = 1, \ldots, M\}$. The goal of the estimation process is to determine the conditional pdf of the $\ell$th element $\mathbf{p}^{(\ell)}$ given the current-time measurement $\tilde{\mathbf{y}}_k$. Application of Bayes' rule yields

$$p(\mathbf{p}^{(\ell)} \,|\, \tilde{Y}_k) = \frac{p(\tilde{Y}_k \,|\, \mathbf{p}^{(\ell)})\, p(\mathbf{p}^{(\ell)})}{\displaystyle\sum_{j=1}^{M} p(\tilde{Y}_k \,|\, \mathbf{p}^{(j)})\, p(\mathbf{p}^{(j)})} \tag{7}$$
where $\tilde{Y}_k$ denotes the sequence of measurements up to time $t_k$. The conditional pdf can be written as

$$p(\mathbf{p}^{(\ell)} \,|\, \tilde{Y}_k) = \frac{p(\tilde{\mathbf{y}}_k, \mathbf{p}^{(\ell)} \,|\, \tilde{Y}_{k-1})}{p(\tilde{\mathbf{y}}_k \,|\, \tilde{Y}_{k-1})} = \frac{p(\tilde{\mathbf{y}}_k \,|\, \hat{\mathbf{x}}_k^{(\ell)-})\, p(\mathbf{p}^{(\ell)} \,|\, \tilde{Y}_{k-1})}{\displaystyle\sum_{j=1}^{M} \left[ p(\tilde{\mathbf{y}}_k \,|\, \hat{\mathbf{x}}_k^{(j)-})\, p(\mathbf{p}^{(j)} \,|\, \tilde{Y}_{k-1}) \right]} \tag{8}$$

since $p(\tilde{\mathbf{y}}_k \,|\, \tilde{Y}_{k-1}, \mathbf{p}^{(\ell)})$ is given by $p(\tilde{\mathbf{y}}_k \,|\, \hat{\mathbf{x}}_k^{(\ell)-})$ in the Kalman recursion. Note that the denominator of Eq. (8) is just a normalizing factor to ensure that $p(\mathbf{p}^{(\ell)} \,|\, \tilde{Y}_k)$ is a pdf. The recursion formula can now be cast into a set of defined weights $\varpi_k^{(\ell)}$, so that

$$\varpi_k^{(\ell)} = \frac{\varpi_{k-1}^{(\ell)}\, p(\tilde{\mathbf{y}}_k \,|\, \hat{\mathbf{x}}_k^{(\ell)-})}{\displaystyle\sum_{j=1}^{M} \varpi_{k-1}^{(j)}\, p(\tilde{\mathbf{y}}_k \,|\, \hat{\mathbf{x}}_k^{(j)-})} \tag{9}$$

where $\varpi_k^{(\ell)} \equiv p(\mathbf{p}^{(\ell)} \,|\, \tilde{\mathbf{y}}_k)$. The weights at time $t_0$ are initialized to $\varpi_0^{(\ell)} = 1/M$ for $\ell = 1, 2, \ldots, M$. The convergence properties of MMAE are shown in Ref. [18], which assumes ergodicity in the proof. The ergodicity assumptions can be relaxed to asymptotic stationarity, and other assumptions are even possible for non-stationary situations [19].

The conditional mean estimate is the weighted sum of the parallel filter estimates:

$$\hat{\mathbf{x}}_k^- = \sum_{j=1}^{M} \varpi_k^{(j)}\, \hat{\mathbf{x}}_k^{(j)-} \tag{10}$$

with associated error covariance

$$P_k^- = \sum_{j=1}^{M} \varpi_k^{(j)} \left[ \left( \hat{\mathbf{x}}_k^{(j)-} - \hat{\mathbf{x}}_k^- \right) \left( \hat{\mathbf{x}}_k^{(j)-} - \hat{\mathbf{x}}_k^- \right)^T + P_k^{(j)-} \right] \tag{11}$$

The specific estimate for $\mathbf{p}$ at time $t_k$, denoted by $\hat{\mathbf{p}}_k$, and error covariance, denoted by $\mathcal{P}_k$, are given by

$$\hat{\mathbf{p}}_k = \sum_{j=1}^{M} \varpi_k^{(j)}\, \mathbf{p}^{(j)} \tag{12a}$$

$$\mathcal{P}_k = \sum_{j=1}^{M} \varpi_k^{(j)} \left( \mathbf{p}^{(j)} - \hat{\mathbf{p}}_k \right) \left( \mathbf{p}^{(j)} - \hat{\mathbf{p}}_k \right)^T \tag{12b}$$

Adaptive Law

In this section the adaptive law, based on an autocorrelation approach, for the process noise covariance matrix is shown. First, the autocorrelation for time-varying systems is derived, followed by the associated likelihood functions for the defined measurement residuals.

4.1 Autocorrelation Matrix

In this section the autocorrelation matrix for time-varying systems is derived, which is an extension of the approach shown in Ref. [8]. Here we assume that the model is linear with

$$\mathbf{x}_{k+1} = \Phi_k \mathbf{x}_k + \Gamma_k \mathbf{u}_k + \Upsilon_k \mathbf{w}_k \tag{13a}$$

$$\tilde{\mathbf{y}}_k = H_k \mathbf{x}_k + \mathbf{v}_k \tag{13b}$$

The measurement residual is defined as

$$\mathbf{e}_k \equiv \tilde{\mathbf{y}}_k - H_k \hat{\mathbf{x}}_k^- = H_k \tilde{\mathbf{x}}_k^- + \mathbf{v}_k \tag{14}$$

where $\tilde{\mathbf{x}}_k^- \equiv \mathbf{x}_k - \hat{\mathbf{x}}_k^-$. The following autocorrelation function matrix can be computed:

$$C_{k,i} = \begin{cases} H_k P_k^- H_k^T + R_k & i = 0 \\[4pt] H_k\, E\{\tilde{\mathbf{x}}_k^- \tilde{\mathbf{x}}_{k-i}^{-T}\}\, H_{k-i}^T + H_k\, E\{\tilde{\mathbf{x}}_k^- \mathbf{v}_{k-i}^T\} & i > 0 \end{cases} \tag{15}$$

where $C_{k,i} \equiv E\{\mathbf{e}_k \mathbf{e}_{k-i}^T\}$ and $E\{\cdot\}$ denotes expectation. The propagation of $\tilde{\mathbf{x}}_k^-$ is given by

$$\tilde{\mathbf{x}}_k^- = \Phi_{k-1} (I - K_{k-1} H_{k-1})\, \tilde{\mathbf{x}}_{k-1}^- - \Phi_{k-1} K_{k-1} \mathbf{v}_{k-1} + \Upsilon_{k-1} \mathbf{w}_{k-1} \tag{16}$$

Carrying this recursion back $i$ steps leads to

$$\tilde{\mathbf{x}}_k^- = \left[ \prod_{j=1}^{i} \Phi_{k-j} (I - K_{k-j} H_{k-j}) \right] \tilde{\mathbf{x}}_{k-i}^- - \sum_{j=1}^{i} \left[ \prod_{\ell=1}^{j-1} \Phi_{k-\ell} (I - K_{k-\ell} H_{k-\ell}) \right] \Phi_{k-j} K_{k-j} \mathbf{v}_{k-j} + \sum_{j=1}^{i} \left[ \prod_{\ell=1}^{j-1} \Phi_{k-\ell} (I - K_{k-\ell} H_{k-\ell}) \right] \Upsilon_{k-j} \mathbf{w}_{k-j} \tag{17}$$

where an empty product ($j = 1$) is replaced by the identity matrix. Since the noise terms in Eq. (17) are uncorrelated with $\tilde{\mathbf{x}}_{k-i}^-$, Eq. (17) gives

$$E\{\tilde{\mathbf{x}}_k^- \tilde{\mathbf{x}}_{k-i}^{-T}\} = \left[ \prod_{j=1}^{i} \Phi_{k-j} (I - K_{k-j} H_{k-j}) \right] P_{k-i}^-, \qquad E\{\tilde{\mathbf{x}}_k^- \mathbf{v}_{k-i}^T\} = -\left[ \prod_{j=1}^{i-1} \Phi_{k-j} (I - K_{k-j} H_{k-j}) \right] \Phi_{k-i} K_{k-i} R_{k-i} \tag{18}$$

Substituting Eq. (18) into Eq. (15) yields the closed-form expression

$$C_{k,i} = \begin{cases} H_k P_k^- H_k^T + R_k & i = 0 \\[4pt] H_k \Phi_{k-1} \left( P_{k-1}^- H_{k-1}^T - K_{k-1} C_{k-1,0} \right) & i = 1 \\[4pt] H_k \left[ \prod_{j=1}^{i-1} \Phi_{k-j} (I - K_{k-j} H_{k-j}) \right] \Phi_{k-i} \left( P_{k-i}^- H_{k-i}^T - K_{k-i} C_{k-i,0} \right) & i > 1 \end{cases} \tag{19}$$

where

$$C_{k-i,0} \equiv H_{k-i} P_{k-i}^- H_{k-i}^T + R_{k-i} \tag{20}$$
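The closed-form autocorrelation of Eq. (19) can be checked numerically. The sketch below (our illustration, with made-up scalar parameters) runs a Kalman filter whose assumed process noise `q_filter` may differ from the true value, evaluates the theoretical lag-1 autocorrelation $C_{k,1}$, and compares it against a Monte Carlo average of residual products:

```python
import numpy as np

def residual_autocorr_demo(q_filter, q_true=0.5, phi=0.9, R=0.4,
                           P0=1.0, T=6, n_mc=400_000, seed=1):
    """Scalar illustration of Eq. (19): theoretical lag-1 residual
    autocorrelation C_{k,1} under a (possibly mismatched) filter gain,
    checked against a Monte Carlo average (all quantities scalar)."""
    rng = np.random.default_rng(seed)
    # Filter's own covariance recursion -> gain sequence K_k.
    Pf, K = P0, np.zeros(T)
    # True error covariance of x_tilde = x - xhat^- under that gain.
    Pt = np.zeros(T + 1)
    Pt[0] = P0
    for k in range(T):
        K[k] = Pf / (Pf + R)
        Pf = phi**2 * (1.0 - K[k]) * Pf + q_filter
        Pt[k + 1] = (phi**2 * (1.0 - K[k])**2 * Pt[k]
                     + phi**2 * K[k]**2 * R + q_true)
    k = T - 1
    # Eq. (19), i = 1 case: C_{k,1} = phi (P_{k-1} - K_{k-1} C_{k-1,0}).
    C0_prev = Pt[k - 1] + R
    C1_theory = phi * (Pt[k - 1] - K[k - 1] * C0_prev)
    # Monte Carlo: simulate truth and filter, collect residuals.
    x = rng.normal(0.0, np.sqrt(P0), n_mc)
    xhat = np.zeros(n_mc)
    e = np.zeros((T, n_mc))
    for j in range(T):
        v = rng.normal(0.0, np.sqrt(R), n_mc)
        e[j] = x + v - xhat                # residual e_j
        xhat = phi * (xhat + K[j] * e[j])  # update, then propagate
        x = phi * x + rng.normal(0.0, np.sqrt(q_true), n_mc)
    C1_mc = np.mean(e[k] * e[k - 1])
    return C1_theory, C1_mc
```

With a mismatched `q_filter` the lag-1 autocorrelation is clearly nonzero, while a perfectly tuned filter drives $C_{k,1}$ to zero; this is exactly the sensitivity the adaptive law exploits.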
4.2 Likelihood Function

In this section the likelihood function for the measurement residual is shown. First, the following stacked residual is defined:

$$\boldsymbol{\zeta}_i \equiv \begin{bmatrix} \mathbf{e}_k \\ \mathbf{e}_{k-1} \\ \vdots \\ \mathbf{e}_{k-i} \end{bmatrix} \tag{21}$$

The likelihood function associated with $\boldsymbol{\zeta}_i$ is given by

$$\mathcal{L}_i = \frac{1}{\left[\det\left(2\pi\, \mathcal{C}_i\right)\right]^{1/2}} \exp\left( -\frac{1}{2} \boldsymbol{\zeta}_i^T \mathcal{C}_i^{-1} \boldsymbol{\zeta}_i \right) \tag{22}$$

where $\mathcal{C}_i \equiv E\{\boldsymbol{\zeta}_i \boldsymbol{\zeta}_i^T\}$ is given by

$$\mathcal{C}_i = \begin{bmatrix} C_{k,0} & C_{k,1} & C_{k,2} & \cdots & C_{k,i} \\ C_{k,1}^T & C_{k-1,0} & C_{k-1,1} & \cdots & C_{k-1,i-1} \\ C_{k,2}^T & C_{k-1,1}^T & C_{k-2,0} & \cdots & C_{k-2,i-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ C_{k,i}^T & C_{k-1,i-1}^T & C_{k-2,i-2}^T & \cdots & C_{k-i,0} \end{bmatrix} \tag{23}$$

When $i = 0$ the likelihood function reduces to

$$\mathcal{L}_0 = \frac{1}{\left\{\det\left[2\pi \left(H_k P_k^- H_k^T + R_k\right)\right]\right\}^{1/2}} \exp\left[ -\frac{1}{2} \mathbf{e}_k^T \left(H_k P_k^- H_k^T + R_k\right)^{-1} \mathbf{e}_k \right] \tag{24}$$

This likelihood is widely used in MMAE algorithms [12, 16], but ignores correlations between different measurement times. However, it is simpler to evaluate than the general likelihood function in Eq. (22) since no storage of data or system matrices is required.
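Evaluating Eq. (22) amounts to evaluating a zero-mean multivariate Gaussian density at the stacked residual. A minimal sketch (our illustration; the covariance entries below are made-up numbers, not values from the paper):

```python
import numpy as np

def gaussian_likelihood(zeta, C):
    """Evaluate Eq. (22): zero-mean Gaussian density N(zeta; 0, C)."""
    zeta = np.atleast_1d(np.asarray(zeta, dtype=float))
    C = np.atleast_2d(np.asarray(C, dtype=float))
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * C))
    return float(np.exp(-0.5 * zeta @ np.linalg.solve(C, zeta)) / norm)

# Scalar measurements: stack residuals e_k, e_{k-1} (Eq. (21)) and
# assemble the 2x2 covariance of Eq. (23) from autocorrelation entries.
Ck0, Ckm1_0, Ck1 = 1.3, 1.2, 0.45   # C_{k,0}, C_{k-1,0}, C_{k,1}
C1 = np.array([[Ck0, Ck1],
               [Ck1, Ckm1_0]])
L1 = gaussian_likelihood([0.5, -0.2], C1)

# For i = 0 the general form collapses to Eq. (24):
e_k, S = 0.5, Ck0          # S = H_k P_k^- H_k^T + R_k
L0_direct = np.exp(-0.5 * e_k**2 / S) / np.sqrt(2.0 * np.pi * S)
# gaussian_likelihood([e_k], [[S]]) returns the same number.
```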
4.3 Update Law

In this section the new adaptive law based on the autocorrelation is shown. In the traditional MMAE approach only the current measurement information is used in the update law given by Eq. (9). Therefore, the update law is given by

$$\varpi_k^{(\ell)} = \frac{\varpi_{k-1}^{(\ell)}\, \mathcal{L}_0^{(\ell)}}{\displaystyle\sum_{j=1}^{M} \varpi_{k-1}^{(j)}\, \mathcal{L}_0^{(j)}} \tag{25}$$

since $p(\tilde{\mathbf{y}}_k \,|\, \hat{\mathbf{x}}_k^{(\ell)-}) = \mathcal{L}_0^{(\ell)}$, which is defined by

$$\mathcal{L}_0^{(\ell)} = \frac{1}{\left\{\det\left[2\pi \left(H_k P_k^{(\ell)-} H_k^T + R_k\right)\right]\right\}^{1/2}} \exp\left[ -\frac{1}{2} \mathbf{e}_k^{(\ell)T} \left(H_k P_k^{(\ell)-} H_k^T + R_k\right)^{-1} \mathbf{e}_k^{(\ell)} \right] \tag{26}$$

where $\mathbf{e}_k^{(\ell)} \equiv \tilde{\mathbf{y}}_k - H_k \hat{\mathbf{x}}_k^{(\ell)-}$.

The new adaptive law, which we call the generalized multiple-model adaptive estimation (GMMAE) algorithm, is based on carrying Eq. (8) $i$ steps back to give the new update law:

$$\varpi_k^{(\ell)} = \frac{\varpi_{k-1}^{(\ell)}\, \mathcal{L}_i^{(\ell)}}{\displaystyle\sum_{j=1}^{M} \varpi_{k-1}^{(j)}\, \mathcal{L}_i^{(j)}} \tag{27}$$

with

$$\mathcal{L}_i^{(\ell)} = \frac{1}{\left[\det\left(2\pi\, \mathcal{C}_i^{(\ell)}\right)\right]^{1/2}} \exp\left( -\frac{1}{2} \boldsymbol{\zeta}_i^{(\ell)T}\, \mathcal{C}_i^{(\ell)-1}\, \boldsymbol{\zeta}_i^{(\ell)} \right) \tag{28}$$

where $\boldsymbol{\zeta}_i^{(\ell)}$ is defined as

$$\boldsymbol{\zeta}_i^{(\ell)} \equiv \left[ \mathbf{e}_k^{(\ell)T} \;\; \mathbf{e}_{k-1}^{(\ell)T} \;\; \cdots \;\; \mathbf{e}_{k-i}^{(\ell)T} \right]^T$$

The matrix $\mathcal{C}_i^{(\ell)}$ is given by Eqs. (19) and (23) evaluated at the $\ell$th covariance and the optimal Kalman gain. Unfortunately, the optimal gain is a function of the actual covariance $Q_k$, which is not known. Specifically, if $K_k$ from Table 1 is substituted into Eq. (19), then for $i \geq 1$ the correlated terms $C_{k,i}$ will always be zero. One way to overcome this problem is to estimate the $C_{k,i}^{(\ell)}$ terms using the residuals, which is the approach taken in Ref. [8]. But this requires a stationary process and a sufficiently large set of measurements over time, which would not work properly for time-varying system matrices and/or a sequential updating scheme. A different approach is taken here, which is also expanded for nonlinear systems. This assumes that the measurement noise is small compared to the signal so that the Gaussian nature of the measurement residual is maintained. Estimates for $C_{k,i}^{(\ell)}$ are given by

$$\hat{C}_{k,i}^{(\ell)} = \begin{cases} H_k(\hat{\mathbf{x}}_k^{(\ell)-})\, P_k^{(\ell)-} H_k^T(\hat{\mathbf{x}}_k^{(\ell)-}) + R_k & i = 0 \\[4pt] H_k(\hat{\mathbf{x}}_k^{(\ell)-})\, \Phi_{k-1}(\hat{\mathbf{x}}_{k-1}^{(\ell)+}) \left[ P_{k-1}^{(\ell)-} H_{k-1}^T(\hat{\mathbf{x}}_{k-1}^{(\ell)-}) - \hat{K}_{k-1} \hat{C}_{k-1,0}^{(\ell)} \right] & i = 1 \\[4pt] H_k(\hat{\mathbf{x}}_k^{(\ell)-}) \left\{ \prod_{j=1}^{i-1} \Phi_{k-j}(\hat{\mathbf{x}}_{k-j}^{(\ell)+}) \left[ I - \hat{K}_{k-j} H_{k-j}(\hat{\mathbf{x}}_{k-j}^{(\ell)-}) \right] \right\} \Phi_{k-i}(\hat{\mathbf{x}}_{k-i}^{(\ell)+}) \left[ P_{k-i}^{(\ell)-} H_{k-i}^T(\hat{\mathbf{x}}_{k-i}^{(\ell)-}) - \hat{K}_{k-i} \hat{C}_{k-i,0}^{(\ell)} \right] & i > 1 \end{cases} \tag{29}$$

where $\hat{C}_{k-i,0}^{(\ell)}$ is computed using

$$\hat{C}_{k-i,0}^{(\ell)} \equiv H_{k-i}(\hat{\mathbf{x}}_{k-i}^{(\ell)-})\, P_{k-i}^{(\ell)-} H_{k-i}^T(\hat{\mathbf{x}}_{k-i}^{(\ell)-}) + R_{k-i} \tag{30}$$

The $\ell$th-filter covariance and gain are computed using

$$P_{k+1}^{(\ell)-} = \Phi_k(\hat{\mathbf{x}}_k^{(\ell)+})\, P_k^{(\ell)+}\, \Phi_k^T(\hat{\mathbf{x}}_k^{(\ell)+}) + Q^{(\ell)} \tag{31a}$$

$$P_k^{(\ell)+} = \left[ I - K_k^{(\ell)} H_k(\hat{\mathbf{x}}_k^{(\ell)-}) \right] P_k^{(\ell)-} \tag{31b}$$

$$K_k^{(\ell)} = P_k^{(\ell)-} H_k^T(\hat{\mathbf{x}}_k^{(\ell)-}) \left[ H_k(\hat{\mathbf{x}}_k^{(\ell)-})\, P_k^{(\ell)-} H_k^T(\hat{\mathbf{x}}_k^{(\ell)-}) + R_k \right]^{-1} \tag{31c}$$

The estimate of the optimal gain, $\hat{K}_k$, is computed using

$$\hat{K}_k = \hat{P}_k^- H_k^T(\hat{\mathbf{x}}_k^-) \left[ H_k(\hat{\mathbf{x}}_k^-)\, \hat{P}_k^- H_k^T(\hat{\mathbf{x}}_k^-) + R_k \right]^{-1} \tag{32}$$

with

$$\hat{P}_{k+1}^- = \Phi_k(\hat{\mathbf{x}}_k^+)\, \hat{P}_k^+\, \Phi_k^T(\hat{\mathbf{x}}_k^+) + \hat{Q}_k \tag{33a}$$

$$\hat{P}_k^+ = \left[ I - \hat{K}_k H_k(\hat{\mathbf{x}}_k^-) \right] \hat{P}_k^- \tag{33b}$$

where $\hat{Q}_k$ is computed using $\hat{\mathbf{p}}_k$.
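The weight update of Eq. (27) can be sketched as follows; this is an illustrative scalar example with made-up numbers (the name `update_weights` is ours), using the $i = 0$ likelihood of Eq. (26):

```python
import numpy as np

def update_weights(weights, likelihoods):
    """Eq. (27): scale each weight by its likelihood, then normalize
    so that the weights sum to one."""
    w = np.asarray(weights, dtype=float) * np.asarray(likelihoods, dtype=float)
    return w / w.sum()

# Two candidate process noise settings. Suppose the residual variances
# they predict are S[0] and S[1], and the observed residual is e.
e = 2.0
S = np.array([1.0, 5.0])
L = np.exp(-0.5 * e**2 / S) / np.sqrt(2.0 * np.pi * S)  # Eq. (26), scalar case
w = update_weights([0.5, 0.5], L)
# The element whose predicted residual statistics better explain the
# observed residual receives the larger weight.
```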
Using the current measurement, $\tilde{\mathbf{y}}_k$, along with the $\ell$th element, $\mathbf{p}^{(\ell)}$, $1 \leq \ell \leq M$, a bank of filters is executed. For each filter the state estimates, $\hat{\mathbf{x}}_k^{(\ell)-}$, and measurements are used to form the residual, $\boldsymbol{\zeta}_i^{(\ell)}$, going back $i$ steps. The filter error covariance, $P_k^{(\ell)-}$, and state matrices, $\Phi_k^{(\ell)}$ and $H_k^{(\ell)}$, evaluated at the current estimates are used to update the estimate of the autocorrelation, denoted by $\mathcal{C}_i^{(\ell)}$. Note that at each new measurement time, all elements of $\mathcal{C}_i^{(\ell)}$ need to be recalculated since a new estimate $\hat{\mathbf{p}}_k$ is provided, which is used to compute an estimate of the optimal gain. Unfortunately, this can significantly increase the computational costs. The diagonal elements do not need to be recomputed though, since they are not a function of the optimal gain. The residuals and autocorrelations are then used to evaluate the likelihood functions $\mathcal{L}_i^{(\ell)}$. These functions are used to update the weights, which gives the estimate $\hat{\mathbf{p}}_k$ using Eq. (12a).

There are many possibilities for the chosen distribution of the process noise covariance parameters. A simple approach is to assume a uniform distribution. We instead choose a Hammersley quasi-random sequence [14] due to its well-distributed pattern. A comparison between the uniform distribution and the Hammersley quasi-random sequence for 500 elements is shown in Figure 1. Clearly, the Hammersley quasi-random sequence provides a better spread of elements than the uniform distribution. In low dimensions, the multidimensional Hammersley sequence quickly fills up the space in a well-distributed pattern. However, for very high dimensions, the initial elements of the Hammersley sequence can be very poorly distributed. Only when the number of sequence elements is large enough relative to the spatial dimension is the sequence properly behaved. This isn't much of a concern for the process noise covariance adaptation problem, since the number of elements will be much larger than the dimension of the unknown process noise parameters. Remedies for this problem are given in Ref. [20] if needed.
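A two-dimensional Hammersley set is simple to generate: the first coordinate is $i/n$ and the second is the base-2 radical-inverse (van der Corput) value of $i$. A minimal sketch (our illustration, not the paper's code):

```python
def van_der_corput(i, base=2):
    """Radical-inverse (van der Corput) value of integer i in the
    given base: reflect the digits of i about the radix point."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        result += (i % base) * f
        i //= base
        f /= base
    return result

def hammersley_2d(n):
    """Two-dimensional Hammersley set of n points in the unit square:
    point i is (i/n, radical inverse of i)."""
    return [(i / n, van_der_corput(i)) for i in range(n)]

pts = hammersley_2d(500)   # 500 well-spread elements in the unit square
```

These unit-square points are then scaled to the assumed range of the process noise parameters.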
[Figure 1: Comparison of 500 elements from a uniform distribution and from a Hammersley quasi-random sequence, plotted over the parameters p1 and p2]

Numerical Simulations

The simulation uses the state transition matrix

$$\Phi = \begin{bmatrix} 1 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{34}$$

and the discrete-time process noise covariance

$$Q_k = \begin{bmatrix} q_x \begin{bmatrix} \Delta t^3/3 & \Delta t^2/2 \\ \Delta t^2/2 & \Delta t \end{bmatrix} & 0_{2\times 2} \\[6pt] 0_{2\times 2} & q_y \begin{bmatrix} \Delta t^3/3 & \Delta t^2/2 \\ \Delta t^2/2 & \Delta t \end{bmatrix} \end{bmatrix} \tag{36}$$

The true values of $q_x$ and $q_y$ are chosen as $q_x = q_y = 10$. In the MMAE and GMMAE, the elements of $q_x$ and $q_y$ for the individual Kalman filters are generated using a two-dimensional Hammersley sequence under the assumption that $q_x$ and $q_y$ are independently and uniformly distributed.
… of resampling and Markov-Chain Monte Carlo or regularization based on just a few consecutive data points does not yield a satisfactory result. It is found that good elements could be pruned in the early stage and promising elements could end up with small weights. Further investigation of the problem is under way.

[Figure 2: Estimates of qx for the GMMAE and MMAE approaches]

[Figure 3: Estimates of qy for the GMMAE and MMAE approaches]
Conclusions
…ing the elements in the Kalman gain computation. Further studies are under way to assess the differences between these approaches. Simulation results indicated that the new multiple-model adaptive estimation approach can provide better convergence properties than the standard approach.
Acknowledgment
This work was supported through an Independent Research and Development (IRAD) award by L-3 Communications Integrated Systems, Greenville, TX, under the supervision of Dr. Gerald L. Fudge. The authors greatly appreciate the support.
References