Laplace ℓ1 Robust Kalman Filter Based On Majorization Minimization
Abstract—In this paper, we attack the estimation problem in Kalman filtering when the measurements are contaminated by outliers. We employ the Laplace distribution to model the underlying non-Gaussian measurement process. The maximum a posteriori estimation problem is solved by the majorization minimization (MM) approach. This yields an MM based robust filter, where the intractable ℓ1 norm problem is converted into an ℓ2 norm format. Furthermore, we implement the MM based robust filter in the Kalman filtering framework and develop a Laplace ℓ1 robust Kalman filter. The proposed algorithm is tested by numerical simulations. Its robustness is borne out in comparisons with other robust filters, especially in scenarios with heavy outliers.

(This research was supported by the China Scholarship Council and the National Natural Science Foundation of China (No. 61473227 and 11472222).)

I. INTRODUCTION

Estimation is a ubiquitous task, from industrial applications to research areas like target tracking, system identification, navigation and many others. The conventional Kalman filter (cKF) [1] is a well-known optimal recursive estimator for linear dynamic systems with uncorrelated zero-mean Gaussian noise. However, in many applications involving, e.g., glint noise and air turbulence [2], the measurement noise may have heavier tails than the Gaussian distribution. Heavy-tailed distributions can also be utilized to model the measurement noise when outliers are present. Robust filtering methods are useful to deal with heavy tails and obtain accurate results in such cases.

One such approach is based on robust regression [3]. In this approach, the Kalman filter is treated as a linear regression problem. Instead of using the ℓ2 norm as the cost function in the Kalman filter, Huber's cost function [4][5] or other robust cost functions like the Hampel weight function [6][7] are employed to form robust versions of the Kalman filter. This approach can be thought of as an extension of the M-estimator [3]. Recently, the maximum correntropy criterion [8], which was introduced to deal with non-Gaussian signals, has also been utilized for robust Kalman filter design [9][10]. It should be noted that most existing robust regression based schemes are sensitive to design parameters. Their poor performance in heavy outlier scenarios is motivating researchers to find alternative solutions.

Meanwhile, heavy-tailed distributions can be directly employed to model the measurement noise, including the Gaussian mixture, Student's t distribution and Laplace distribution [11][12]. Stochastic approximation methods such as the Markov chain Monte Carlo (MCMC) method [13] and the particle filter (PF) [14] can be used to infer the state. However, MCMC and PF are computationally intensive. Based on the linearity of Student's t distributions, a robust filter has been proposed under the assumption that both the process and measurement noise follow Student's t distributions [11]. Nevertheless, the degree of freedom (DOF) of the Student's t distribution, which has a great influence on the accuracy of estimation, is hard to determine in practice. While the Laplace distribution can subtly avoid the problem of the DOF, it has seen limited application as a heavy-tailed distribution because of its ℓ1 norm.

The approach presented here is significantly different from the robust regression and stochastic approximation approaches. We adopt the Laplace distribution along with an optimization framework based on maximum a posteriori (MAP) estimation. We formulate the optimization problem as a hybrid ℓ2/ℓ1 norm problem and apply the majorization minimization (MM) approach [15] to make the hybrid problem more tractable. Specifically, Young's inequality [16] is used to construct a surrogate function that approximates the cost function of the hybrid ℓ2/ℓ1 norm problem. Interestingly, the surrogate function is in a quadratic format, so we have a closed-form expression for the state estimate in each MM iteration. We then implement an iterative procedure of the MM method in the Kalman filtering framework, resulting in a Laplace ℓ1 robust Kalman filter (LRKF). Through numerical experiments, we show that the proposed method can yield more accurate and steady results compared with the cKF and other robust filters, with a moderate increase in computation time.

II. PROBLEM FORMULATION

We consider the following discrete-time linear state-space model (SSM):

    x_t = F_t x_{t-1} + u_{t-1},
    y_t = H_t x_t + v_t,                                          (1)

where for all t ∈ {1, ..., T}, F_t ∈ R^{n×n} is a known state transition matrix and H_t ∈ R^{m×n} is a known measurement matrix. In the cKF, both u_{t-1} and v_t are zero-mean Gaussian noises with covariances Q_{t-1} and R_t, respectively. However, in general applications, the measurement noise may have a more heavy-tailed distribution due to the existence of outliers or a secondary noise disturbance. This heavy-tailed distribution may cause a loss of estimation accuracy or even divergence of the Kalman filter. To address this issue, we utilize the Laplace distribution, which naturally has heavy tails, to model the measurement noise.
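To make the simulation setting concrete, here is a minimal NumPy sketch of generating data from the SSM (1) with outlier-contaminated measurements. The contamination mechanism (a two-component mixture with inflated covariance) and all names (simulate_ssm, alpha, boost) are our own illustrative choices; the paper itself models v_t as Laplace distributed, as introduced next.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ssm(F, H, Q, R, x0, T, alpha=0.1, boost=50.0):
    """Simulate x_t = F x_{t-1} + u_{t-1}, y_t = H x_t + v_t.

    With probability alpha, v_t is drawn with a `boost`-times
    inflated covariance to mimic heavy-tailed outliers
    (illustrative stand-in for a heavy-tailed noise model).
    """
    n, m = F.shape[0], H.shape[0]
    xs, ys = np.zeros((T, n)), np.zeros((T, m))
    x = x0.copy()
    for t in range(T):
        x = F @ x + rng.multivariate_normal(np.zeros(n), Q)
        Rt = boost * R if rng.random() < alpha else R
        ys[t] = H @ x + rng.multivariate_normal(np.zeros(m), Rt)
        xs[t] = x
    return xs, ys
```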
Specifically, we assume that v_t follows a Laplace distribution with mean zero and covariance R_t, i.e.,

    p(v_t) = det(2R_t)^{-1/2} exp(-‖√2 R_t^{-1/2} v_t‖_1),        (2)

where R_t^{1/2} is the Cholesky factor of the positive semi-definite matrix R_t and ‖·‖_1 denotes the ℓ1 norm. Furthermore, u_{t-1} and v_t are assumed to be mutually independent and also uncorrelated with the initial state x_0, which is assumed to be Gaussian: x_0 ∼ N(x̂_{0|0}, P_0). The main target is to obtain the MAP estimate of x_t by exploiting the noisy measurements up to time t, i.e., {y_i}_{i=1}^t.

III. LAPLACE ℓ1 ROBUST KALMAN FILTER
A. MAP Formulation

According to Bayesian theory, the posterior density of x_t is given by

    p(x_t | y_{1:t}) ∝ p(x_t | y_{1:t-1}) p(y_t | x_t).           (3)

The MAP estimate of x_t is then obtained by solving the following optimization problem:

    x̂_t = arg min_{x_t} [- log p(x_t | y_{1:t-1}) - log p(y_t | x_t)].   (4)

By using the probability density functions (pdfs) defined by the SSM (1) and dropping terms that do not depend on x_t, we can write (4) as

    x̂_t = arg min_{x_t} f(x_t)
        = arg min_{x_t} (1/2) ‖x_t - x̂_{t|t-1}‖²_{P_{t|t-1}^{-1}} + √2 ‖η_t‖_1,   (5)

where x̂_{t|t-1} is the predicted state with predicted error covariance P_{t|t-1}, and η_t = R_t^{-1/2}(y_t - H_t x_t) is the normalized residual vector.

Gradient methods cannot be directly employed to solve (5), mainly because (5) involves an ℓ1 norm, which is non-differentiable at η_t = 0. To overcome this, we resort to a bounded optimization approach, namely the MM method.
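Before moving on, the hybrid ℓ2/ℓ1 objective f(x_t) in (5) is simple to evaluate directly; a minimal sketch (function and argument names are ours), using the Cholesky factor of R_t as R_t^{1/2} in line with the paper's convention:

```python
import numpy as np

def map_cost(x, x_pred, P_pred, y, H, R):
    """Hybrid l2/l1 MAP cost f(x) from (5)."""
    dx = x - x_pred
    quad = 0.5 * dx @ np.linalg.solve(P_pred, dx)    # (1/2)||x - x_pred||^2 in P^{-1} metric
    # Normalized residual eta = R^{-1/2}(y - Hx), with R^{1/2} the Cholesky factor.
    eta = np.linalg.solve(np.linalg.cholesky(R), y - H @ x)
    return quad + np.sqrt(2.0) * np.abs(eta).sum()   # + sqrt(2)*||eta||_1
```

The absolute-value term is exactly the piece that blocks gradient methods at η_t = 0, which is what the MM construction below works around.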
B. Majorization Minimization Based Robust Filter

The key idea of the MM method is to transform the intractable original problem into a simpler one that can be solved. Specifically, a tractable surrogate function is constructed iteratively as an approximation of the original cost function. Minimizing the surrogate function yields a solution sequence with non-increasing cost that converges to a stationary point of the original optimization problem.

To construct a surrogate function, we introduce Young's inequality to convert the ℓ1 norm problem into an ℓ2 norm format. For a general function g(·): R → [0, ∞) and an arbitrary constant a > 0, Young's inequality gives

    g(x) ≤ a/2 + g(x)²/(2a),                                      (6)

where equality holds only when g(x) = a. Based on this, we let g(x) = |x|. For arbitrary Π_i > 0 (i = 1, ..., m), we have

    ‖η_t‖_1 = Σ_{i=1}^m |(η_t)_i| ≤ Σ_{i=1}^m ( Π_i/2 + (η_t)_i²/(2Π_i) ).   (7)

Applying (7) to (5) and writing the result in matrix form, we obtain the following surrogate function for f(x_t):

    Q(x_t, Π) = (1/2) ‖x_t - x̂_{t|t-1}‖²_{P_{t|t-1}^{-1}} + (1/2) ‖y_t - H_t x_t‖²_{R̄_t^{-1}} + (√2/2) Σ_{i=1}^m Π_i,   (8)

where R̄_t = (√2/2) R_t^{1/2} Π R_t^{T/2} and Π is a diagonal matrix with diagonal elements Π_i (i = 1, ..., m).

Given the above surrogate function, the original problem can be solved by the MM method, which iteratively updates the parameters through

    W^{(k)} = arg min_Π Q(x̂_t^{(k)}, Π),                          (9)
    x̂_t^{(k+1)} = arg min_{x_t} Q(x_t, W^{(k)}).                   (10)

Using the condition under which (7) becomes an equality, the solution of (9) is

    W_i^{(k)} = |(R_t^{-1/2}(y_t - H_t x̂_t^{(k)}))_i| + ε,         (11)

where the subscript i denotes the i-th component and ε is a very small constant (e.g., 10^{-6}) that avoids W_i^{(k)} = 0. We note that W^{(k)} is independent of x_t, and hence (8) is actually a quadratic function with respect to x_t. We can easily find the solution as

    x̂_t^{(k+1)} = (P_{t|t-1}^{-1} + Γ_t H_t)^{-1} (P_{t|t-1}^{-1} x̂_{t|t-1} + Γ_t y_t),   (12)

where Γ_t = √2 H_t^T (R_t^{1/2} W^{(k)} R_t^{T/2})^{-1}. It should be noted that the following relations hold:

    f(x̂_t^{(i)}) ≤ Q(x̂_t^{(i)}, W^{(k)}), i = 1, 2, ...,          (13)
    f(x̂_t^{(k)}) = Q(x̂_t^{(k)}, W^{(k)}).                         (14)

In practice, we choose a starting weighting matrix W^{(0)} to obtain x̂_t^{(1)} through (12), and then use the estimate x̂_t^{(1)} to construct a new weighting matrix W^{(1)}. This process is repeated until some convergence criterion is met. Generally, the identity matrix I can be used as the starting weighting matrix. Details of the resulting procedure are given in Algorithm 1.

Algorithm 1: MM based ℓ1 robust filter
  Input: y_t, x̂_{t-1}, Q_{t-1} and R_t.
  Output: x̂_t.
  x̂_{t|t-1} = F_t x̂_{t-1};
  P_{t|t-1} = F_t P_{t-1} F_t^T + Q_{t-1};
  W^{(0)} = I, k = 0;
  repeat
      Γ_t = √2 H_t^T (R_t^{1/2} W^{(k)} R_t^{T/2})^{-1};
      x̂_t^{(k+1)} = (P_{t|t-1}^{-1} + Γ_t H_t)^{-1} (P_{t|t-1}^{-1} x̂_{t|t-1} + Γ_t y_t);
      W_i^{(k+1)} = |(R_t^{-1/2}(y_t - H_t x̂_t^{(k+1)}))_i| + ε;
      W^{(k+1)} = diag(W_1^{(k+1)}, ..., W_m^{(k+1)});
      k = k + 1;
  until convergence criterion is met
  x̂_t = x̂_t^{(k)};
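A NumPy transcription of Algorithm 1 may help; this is a sketch under our own choices of convergence test and iteration cap, not the authors' reference code:

```python
import numpy as np

def mm_l1_filter_step(y, x_prev, P_prev, F, H, Q, R,
                      eps=1e-6, tol=1e-8, max_iter=50):
    """One time step of the MM based l1 robust filter (Algorithm 1)."""
    m = H.shape[0]
    x_pred = F @ x_prev                      # prediction
    P_pred = F @ P_prev @ F.T + Q
    P_pred_inv = np.linalg.inv(P_pred)
    L = np.linalg.cholesky(R)                # R^{1/2}
    W = np.eye(m)                            # W^{(0)} = I
    x_hat = x_pred
    for _ in range(max_iter):
        # Gamma_t = sqrt(2) H^T (R^{1/2} W R^{T/2})^{-1}
        Gamma = np.sqrt(2.0) * H.T @ np.linalg.inv(L @ W @ L.T)
        x_new = np.linalg.solve(P_pred_inv + Gamma @ H,
                                P_pred_inv @ x_pred + Gamma @ y)
        # Weight update (11): Young equality condition plus eps guard.
        W = np.diag(np.abs(np.linalg.solve(L, y - H @ x_new)) + eps)
        if np.linalg.norm(x_new - x_hat) < tol:
            x_hat = x_new
            break
        x_hat = x_new
    return x_hat
```

Note that the loop returns only a point estimate and no posterior covariance, which is exactly the limitation that motivates the Kalman filter implementation in the next subsection.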
C. Implementation in the Kalman Filter Framework

For a given W^{(k)}, (10) can certainly be solved by the gradient based method, as shown in (12). Unfortunately, this approach can hardly provide any covariance information about x̂_t, which is needed in order to calculate the prediction error covariance. Hence the MM based ℓ1 robust filter, as given in Algorithm 1, is not a recursive implementation.

When carefully checking (10) for a given W^{(k)}, we find that it is similar to the cost function of the cKF. The only difference is that the measurement noise covariance needs to be modified. That is, for a given W^{(k)}, we can obtain x̂_t^{k+1} by solving

    x̂_t^{k+1} = arg min_{x_t} (1/2) ‖x_t - x̂_{t|t-1}‖²_{P_{t|t-1}^{-1}} + (1/2) ‖y_t - H_t x_t‖²_{(R̄_t^{(k)})^{-1}},

where R̄_t^{(k)} = (√2/2) R_t^{1/2} W^{(k)} R_t^{T/2}, i.e., a standard Kalman measurement update with R_t replaced by R̄_t^{(k)}.

[...] to (12) and, after some algebraic manipulations, we obtain the same expression of x̂_t^k as described in Algorithm 2. In this section, we present the convergence analysis of both proposed robust filters. Since the two algorithms are equivalent, we only analyze the convergence of the MM based ℓ1 robust filter.

First we show that the optimal solution of (8) is also the optimal solution of the original problem (5). We assume that x_t^* and x_t^+ are the optimal solutions of (8) and (5), respectively, with corresponding weight matrices W^(*) and W^(+). Since Q(x_t, W) can be considered as a function of both x_t and W, we obtain

    Q(x_t^*, W^(*)) ≤ Q(x_t^+, W^(+)),                             (16)
    f(x_t^+) ≤ f(x_t^*).                                           (17)

Then, using (14), the following relations hold:

    f(x_t^+) ≤ f(x_t^*) = Q(x_t^*, W^(*)),                         (18)
    Q(x_t^*, W^(*)) ≤ Q(x_t^+, W^(+)) = f(x_t^+).                  (19)
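Algorithm 2 itself is not part of this excerpt, but the recursive variant implied by subsection C is easy to sketch: each MM iteration becomes a standard Kalman measurement update in which R_t is replaced by the reweighted covariance R̄_t^{(k)} = (√2/2) R_t^{1/2} W^{(k)} R_t^{T/2}. The following is our illustrative reading, not the paper's Algorithm 2:

```python
import numpy as np

def lrkf_step(y, x_prev, P_prev, F, H, Q, R, eps=1e-6, n_iter=10):
    """One LRKF time step: MM iterations realized as standard Kalman
    measurement updates with the reweighted covariance
    Rbar^{(k)} = (sqrt(2)/2) R^{1/2} W^{(k)} R^{T/2}."""
    n = F.shape[0]
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q
    L = np.linalg.cholesky(R)                    # R^{1/2}
    W = np.eye(H.shape[0])                       # W^{(0)} = I
    x_hat, P_post = x_pred, P_pred
    for _ in range(n_iter):
        Rbar = (np.sqrt(2.0) / 2.0) * L @ W @ L.T
        S = H @ P_pred @ H.T + Rbar              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        x_hat = x_pred + K @ (y - H @ x_pred)    # same minimizer as (12)
        P_post = (np.eye(n) - K @ H) @ P_pred    # posterior covariance
        W = np.diag(np.abs(np.linalg.solve(L, y - H @ x_hat)) + eps)
    return x_hat, P_post
```

Because each iteration is an ordinary measurement update, the filter now also yields the posterior covariance P_t needed for the next prediction step, which is what makes this version recursive.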
[...] changes from case to case. Both GMKF and MCCKF are derived under the assumption that the measurement noise follows the Gaussian distribution, hence R_t is used directly in their algorithms. However, the proposed LRKF employs the [...]
To evaluate the performance of the different filters, the root mean squared error (RMSE) at each time instant t is employed as a metric, defined as

    RMSE(t) = √( (1/L) Σ_{i=1}^L (x_t^i - x̂_t^i)² ),  t = 1, ..., T,   (23)

where L is the number of independent Monte Carlo runs, and x_t^i and x̂_t^i, respectively, denote the true and estimated state component at time t in the i-th Monte Carlo run. In addition, the average RMSE (ARMSE) is also utilized as another metric, defined as

    ARMSE = (1/T) Σ_{t=1}^T RMSE(t).                               (24)
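Both metrics are one-liners in NumPy; in this sketch the trajectories are assumed to be stored as (L, T) arrays, one row per Monte Carlo run:

```python
import numpy as np

def rmse_per_step(X_true, X_est):
    """RMSE(t) over L Monte Carlo runs for one state component,
    as in (23). X_true, X_est: arrays of shape (L, T)."""
    return np.sqrt(np.mean((X_true - X_est) ** 2, axis=0))  # shape (T,)

def armse(X_true, X_est):
    """Average RMSE over the T time steps, as in (24)."""
    return rmse_per_step(X_true, X_est).mean()
```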
A. Example 1

We consider the following linear system [10]:

    [x_{t,1}]   [cos θ   -sin θ] [x_{t-1,1}]   [q_{t-1,1}]
    [x_{t,2}] = [sin θ    cos θ] [x_{t-1,2}] + [q_{t-1,2}],        (25)

    y_t = [1  1] [x_{t,1}, x_{t,2}]^T + v_t,                       (26)

where θ = π/18. In this simulation, we set Q_{t-1} = 0.01 I_2 and R_t = 0.01. The covariance of the contamination noise is set to 50 times that of the primary measurement noise, i.e., 50 R_t. The initial state is randomly selected from N(x_0, 0.01 I_2), where x_0 = [2, 3]^T.

[Fig. 1: Performance comparison in the no-outlier scenario; panels (a) RMSE of x1 and (b) RMSE of x2 versus time t; legend: cKF, LRKF, GMKF, MCCKF.]

[Fig. 2: Performance comparison in the scenario that α = 0.3; panels (a) RMSE of x1 and (b) RMSE of x2 versus time t.]
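In code, the Example 1 setup looks as follows; the mixture mechanism used to inject outliers with ratio α is our reading of the text (a fraction α of measurements drawn with the 50-fold covariance):

```python
import numpy as np

theta = np.pi / 18.0
F = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H = np.array([[1.0, 1.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.01]])

rng = np.random.default_rng(42)
x = rng.multivariate_normal([2.0, 3.0], 0.01 * np.eye(2))  # initial state

alpha = 0.3            # contamination ratio (Fig. 2 scenario)
T = 100
ys = np.zeros(T)
for t in range(T):
    x = F @ x + rng.multivariate_normal(np.zeros(2), Q)
    Rt = 50 * R if rng.random() < alpha else R             # outlier: 50x covariance
    ys[t] = (H @ x)[0] + rng.normal(0.0, np.sqrt(Rt[0, 0]))
```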
As can be seen in Fig. 1, when no outliers occur in the measurements, the cKF outperforms the robust filters. This is expected, since the cKF offers the optimal estimate in the minimum mean-square-error sense under the Gaussian noise assumption, while the robust filters do not minimize the ℓ2 norm during the measurement update stage. Interestingly, although not optimal, both LRKF and MHKF provide only slightly deteriorated results. In contrast, MCCKF has the largest RMSE among the four filters.

Fig. 2 illustrates the RMSEs of the two state components in the case where the contamination ratio α = 0.3. All robust filters outperform the cKF, which is not surprising, since directly minimizing the ℓ2 norm, as done by the cKF, is sensitive to outliers. Among the robust filters, LRKF is noticeably better than both MHKF and MCCKF, while MHKF and MCCKF perform similarly in this case.

Fig. 3 further indicates the superiority of the proposed algorithm through a comparison of the ARMSE as α increases. It is clear that both cKF and GMKF show an increasing trend, while LRKF and MCCKF share an almost constant pattern. That is, the contamination ratio α has less effect on LRKF and MCCKF than it does on cKF and GMKF. MCCKF performs well when the contamination ratio α is high, while GMKF performs relatively well when the contamination level is low. Except for some low levels of α (around 0 to 0.08), LRKF achieves the smallest RMSEs across the different α cases. Hence we can conclude that LRKF has the most robust performance at almost all levels of the contamination ratio.

B. Example 2

In the second simulation, the problem of tracking a target is considered, where the target undergoes one-dimensional linear uniformly accelerated motion (LUAM) with an unknown and [...]
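The excerpt is cut off here. For orientation only, a standard one-dimensional constant-acceleration (LUAM) state-space model takes the following form; the paper's exact parameterization is not available in this excerpt, so the sampling interval and position-only measurement are assumptions:

```python
import numpy as np

dt = 1.0  # sampling interval (assumed)
# State [position, velocity, acceleration]; constant-acceleration dynamics.
F = np.array([[1.0, dt, 0.5 * dt**2],
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])
H = np.array([[1.0, 0.0, 0.0]])  # position-only measurement (assumed)
```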
[...] counterparts.

[Fig. 3: Average RMSE of x1 versus contamination ratio α; legend: cKF, LRKF, GMKF, MCCKF.]

TABLE II: Average iteration number (AIN) in different α cases

    α     0      0.1    0.2    0.3    0.4    0.5
    AIN   6.62   6.59   6.52   6.51   6.62   6.47

V. CONCLUSIONS