0% found this document useful (0 votes)
9 views17 pages

A Sliding Window Filter For SLAM

This document presents a Sliding Window Filter (SWF) for Simultaneous Localization and Mapping (SLAM), which is an efficient algorithm designed to handle large environments with constant time complexity and linear memory requirements. The SWF approximates the full Batch Least Squares SLAM problem while maintaining statistical optimality and consistency, allowing for various applications in robotics and computer vision. Initial experiments demonstrate that the SWF approaches the performance of optimal estimators, even with small sliding windows.

Uploaded by

Trung Con
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views17 pages

A Sliding Window Filter For SLAM

This document presents a Sliding Window Filter (SWF) for Simultaneous Localization and Mapping (SLAM), which is an efficient algorithm designed to handle large environments with constant time complexity and linear memory requirements. The SWF approximates the full Batch Least Squares SLAM problem while maintaining statistical optimality and consistency, allowing for various applications in robotics and computer vision. Initial experiments demonstrate that the SWF approaches the performance of optimal estimators, even with small sliding windows.

Uploaded by

Trung Con
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

A Sliding Window Filter for SLAM

Gabe Sibley

University of Southern California


Center for Robotics and Embedded Systems
Technical Report No. CRES-06-004.
August 18, 2006

Abstract
This note describes a Sliding Window Filter that is an on-line constant-
time approximation to the feature-based 6-degree-of-freedom full Batch
Least Squares Simultaneous Localization and Mapping (SLAM) prob-
lem. We contend that for SLAM to be useful in large environments and
over extensive run-times, its computational time complexity must be con-
stant, and its memory requirements should be at most linear. Under
this constraint, the “best” algorithm will be the one that comes closest
to matching the all-time maximum-likelihood estimate of the full SLAM
problem, while also maintaining consistency. We start by formulating
SLAM as a Batch Least Squares state estimation problem, and then show
how to modify the Batch estimator into an approximate Sliding Window
Batch/Recursive framework that achieves constant time complexity and
linear space complexity. We argue that viewing SLAM from the Sliding
Window Least Squares perspective is very useful for understanding the
structure of the problem. This perspective is general, capable of subsum-
ing a number of common estimation techniques such as Bundle Adjust-
ment and Extended Kalman Filter SLAM. By tuning the sliding window,
the algorithm can scale from exhaustive Batch solutions to fast incremen-
tal solutions; if the window encompasses all time, the solution is alge-
braically equivalent to full SLAM; if only one time step is maintained, the
solution is algebraically equivalent to the Extended Kalman Filter SLAM
solution. The Sliding Window Filter enables other interesting properties,
like continuous sub-mapping, lazy data association, undelayed or delayed
landmark initialization, and incremental robust estimation. We test the
algorithm in simulations using stereo vision exterioceptive sensors and in-
ertial measurement proprioceptive sensors. Initial experiments show that
the SWF approaches the performance of the optimal batch estimator, even
for small windows on the order of 5-10 frames.

1 Introduction
For a mobile robot, there are many situations in which accurate high-resolution
local spatial awareness is a prerequisite for successfully performing a task. For
instance, good sensing is undoubtedly useful for tight obstacle avoidance, deli-

1
cate mobile manipulation, sensing complicated terrain, subtle motion detection
or moving object tracking from a moving platform.
This is a data-fusion problem in which a sensor (potentially undergoing
uncertain, dynamic motion) must combine noisy measurements into a single
underlying sensor-relative state estimate. In Robotics this problem is most
commonly tackled under the banner of Simultaneous Localization and Mapping
(SLAM); in Computer Vision the problem is usually referred to as Structure
from Motion (SFM).
This paper develops a Sliding Window Filter for SLAM that focuses compu-
tational resources on accurately estimating the immediate spatial surroundings
using a sliding time window of the most recent sensor measurements. Ideally,
we would like a constant time algorithm that closely approximates the all-time
maximum-likelihood estimate as well as the minimum variance Cramer Rao
Lower Bound - e.g. we would like an estimator that achieves some notion of sta-
tistical optimality (quickly converges), efficiency (quickly reduces uncertainty)
and consistency (avoids over-confidence). To this end we give a derivation of
the SLAM problem from the Gaussian non-linear least squares optimization per-
spective. We find that this results in a simple, yet general, take on the SLAM
problem; we think this is a useful contribution.
It is interesting to note at this point that even though the Sliding Win-
dow Filter technique focuses on the spatially immediate estimation process, it
also incrementally builds a map structure (occupying O(n) memory space) that
is useful for non-local problems, such as loop closure, place recognition, nav-
igation, topological planning, etc. So while it does not address loop closure
per-se, it does build a representation amenable to solving such problems. Ul-
timately, we believe that loop closure is best cast as a distinct problem from
local estimation. Decoupling loop closure from the core SLAM estimator allows
concentrating computational resources on improving the local result, which is
crucial for applications that require spatially high-resolution, dense structure
estimates. With high bandwidth sensors (like cameras) focusing on the local
problem is clearly important for computational reasons; this is especially true if
we wish to fuse all of the sensor data (or a significant portion thereof). However,
even with this local focus, once a loop closure is identified, global optimization
over the map structure left behind by the Sliding Window Filter should match
the global batch solution.
Sliding Window Filtering is generally applicable to a wide variety of platform
configurations and environments, capable of working with or without a motion
model and with or without proprioceptive sensors. This flexibility allows solv-
ing bearing only problems, such as Bundle Adjustment, as well as traditional
dynamic state space problems like EKF SLAM. Further, the approach is easily
applicable to problems with asynchronous sensor measurements, like for exam-
ple when using GPS at ∼1 hz in combination with inertial sensors at ∼100hz.
We apply the Sliding Window Filter to SLAM with stereo vision and inertial
measurements. Experiments show that the best approximate method comes
close to matching the performance of the optimal estimator while attaining
constant time complexity - empirically, it is often the case that the difference in
their performance is is indistinguishable.

2
2 Problem Description
It is useful to approach SLAM from the traditional Statistical Point Estimation
perspective because it helps reveal the underlying structure of the problem.
This is apparent for a number of reasons. First, because it highlights the funda-
mental minimization principle at work in least squares, which, we would argue,
is a principle that is harder to see from the recursive estimation perspective.
Second, starting with the underlying probability density functions that describe
our problem, it clearly shows the Gaussian probabilistic nature of SLAM - that
is, SLAM is simply tracking a normal distribution through a large state space;
a state space that changes dimension as we undertake the fundamental proba-
bilistic operations of removing parameters via marginalization, and adding pa-
rameters via conditioning. A third reason to derive SLAM via statistical point
estimation is because it exposes a rich body of theory about the convergence of
least squares estimators - theory that is missing or difficult to come by from the
recursive non-linear estimation perspective.
Without going into detail, note that from this point one can easily see the
connection to many important concepts like Newton’s method, Fisher Infor-
mation, the Cramer Rao Lower bound, information graphs, graphical models,
belief propagation and probabilistic belief networks to name just a few. All these
concepts have intuitive derivations starting from traditional statistical point es-
timation. If these concepts are branches of a tree, then optimizing a quadratic,
as in least-squares, is the fundamental trunk.
With this in mind, we carry forward in the usual way, by describing the
system state vector, process model, measurement model and how we include
prior information.

2.1 System Parameterization


The state vector is a temporal sequence of robot poses xpj , 1 ≤ j ≤ m, and 3D
landmark positions xmi , 1 ≤ i ≤ n.

xp 1
 
 .. 
 . 
" # 



xp  xp m 
x= = 
xm  x
 m1


 ..
 

 . 
xm n
Each pose xpj is represented by a six parameter column vector comprised of
a 3D point and an Euler angle xpj = [xpj ypj zpj rpj ppj qpj ]T . Landmarks are
represented by their 3D position, xmi = [xmi ymi zmi ]T . The state dimension
is thus |x| = (6m + 3n) and grows as the robot path increases and as new
landmarks are observed.

2.2 Process Model


The process model fj : R6 → R6 for a single step describes each pose in terms
of the previous pose

3
xpj = fj (xpj−1 , uj ) + wj (1)
where uj is an input command to the robot. The noise vector wj is additive
and follows a normal distribution wj ∼ N (0, Qj ), so that xpj ∼ N (fj (xpj−1 , uj ), Qi ).
A simple and useful kinematic process model for f is the compound ¯ operation,
∂f ¯
⊕, which is described in (Appendix B). The Jacobian of f , Fj = ∂x ¯ xj−1 , which
we will need in a moment, is also derived in (Appendix B). Concatenating the
individual process models together we can write the system dynamic state space
model as,  
f1 (xp0 , u1 )

f (xp ) =  .. 
.
 . 
fm (xpm−1 , um )
With the above definitions and assumptions the probability density function
describing the robot path is
½ ¾
1 T −1
P (xp ) = ηxp exp − (xp − f (xp )) Q (xp − f (xp )) (2)
2
p
where η = 1/ (2π)|xp | det(Q) is a normalizing constant and |xp | is the dimen-
sion of xp . In practice, one usually extends this basic model to also estimate
other quantities, such as linear and angular velocities; for clarity, we will stick
with this basic formulation.

2.3 Sensor Model


A measurement of the ith landmark taken from the j th pose is related to the
state vector by the sensor model hij : R|xmi |+|xpj | → R|zij |

zij = hij (xmi , xpj ) + vij (3)


which generates the expected value the sensor will return when landmark i
is observed from pose j. We assume vij ∼ N (0, Rij ) so that zij ∼ N (hij , Rij ),
where Rij is the observation error covariance matrix.
Concatenating all the observations, measurement functions and measure-
ment covariances together we write z, h, and R as

     
z11 h11 R11 0 ... 0
 .   .   .. .. 
 .   . .
 .   . 0 .
  
  
     
z= z , h(x) =  h1m , R =  R1m
 1m 
 
   
 .   . .. ..
  
 .   .
 .   . . .
  
  
znm hnm 0 ... Rnm
Thus the probability density describing the measurement likelihood is simply
½ ¾
1
P (z|x) = ηz exp − (z − h(x))T R−1 (z − h(x)) (4)
2
p
where ηz = 1/ (2π)|z| det(R) is again a normalizing term.

4
2.4 Point Estimation
In the beginning let us say we have some¡ prior information about the state
represented by a prior distribution x̂ ∼ N x, Π−1 which encodes information
¢

about a single starting pose, about some previously known map of n landmarks,
and about the relationships between the starting pose and the map. The inverse
covariance Π can be broken into four distinct parts
" # " #−1 
xp 1 Πp Πpm
x̂ ∼ N  , 
xm1:n ΠTpm Πm

where here Πp is the 6 × 6 initial pose information matrix, Πm is 3n × 3n


map prior information matrix, and Πpm is the 6 × 3n pose-map information
matrix. The probability density function representing this prior knowledge is
½ ¾
1
P (x̂) = ηx̂ exp − (x̂ − x)T Π(x̂ − x) , (5)
2
p
where ηx̂ = 1/ (2π)|x̂| det(Π−1 ) is the normalizing factor.
Using (2), (5) and (4) we can now write the posterior probability of the state

P (x|z, x̂) = P (z|x)P (xp )P (x̂) (6)


We wish to compute the maximum a posteriori estimate of x which maxi-
mizes this density. It helps if we lump the sensor model, process model, and
prior information terms together by defining the function g(x) and matrix C as
 
  z − h(x)  −1 
gz (x) R 0 0
x − f (x )
 
p p
C−1 =  0
 
g(x) =  gf (x)  =  # , Q−1 0  ;
  "  
 x p1

gπ (x) 
x̂ −
 0 0 Π
xm

then by taking the negative logarithm of (6) we get a proportional non-linear


least squares problem


gz (x)T R−1 gz (x) + gf (x)T Q−1 gf (x) + gπ (x)T Πgπ (x)
¢
ℓ(x) =
2

g(x)T C−1 g(x) .
¢
= (7)
2
If we let ST S = C−1 and r(x) = Sg(x) then (7) is clearly a non-linear least
squares problem of the form
1
ℓ(x) = ||r(x)||2 . (8)
2
Newton’s solution to such optimization problems is the iterative sequence

xi+1 = xi − (∇2 ℓ(xi ))−1 ∇ℓ(xi ). (9)

For small residual problems a useful approximation to (9) is the Gauss-Newton


method, which approximates the Hessian ∇2 ℓ(xi ) by r′ (xi )T r′ (xi ). Thus, since

5
Figure 1: Basic Figure 2: The sparse structure of least squares SLAM system
structure of the matrix is due to contributions from three components: the
sensor model measurement block HT R−1 H, the process block DT Q−1 D,
Jacobian, H. and the prior information block LT ΠL.

the gradient of (8) is ∇ℓ(xi ) = r′ (xi )T r(xi ), the Gauss-Newton method defines
the sequence of iterates[6]

xi+1 = xi − (r′ (xi )T r′ (xi ))−1 r′ (xi )T r(xi ) (10)

Noting that r′ (xi ) = SGi where Gi is the Jacobian of g(xi ), (10) becomes

δxi = (GTi C−1 Gi )−1 GTi C−1 g(xi ). (11)

such that xi+1 = xi +δxi . When iterated, this sequence is locally q-quadratically
convergent to the MAP estimate for near zero-residual problems[6]. The system
of linear equations
GTi C−1 Gi δxi = GTi C−1 g(xi ) (12)
is the essential least squares form of the SLAM problem. The difference be-
tween many SLAM algorithms can be boiled down to differences in how these
equations are solved. It is also interesting to note here that for many problems
the Gauss-Newton method is algebraically identical to the Iterated Extended
Kalman Filter. In fact, as will become clear later, the Sliding Window Filter
over single time step and all landmarks is exactly the IEKF SLAM solution.
As δxi → 0 the term (GTi C−1 Gi ) converges to the Hessian of the likelihood
function (this is the approximation the Gauss-Newton method makes over the
full second order Newton method[6]). It turns out that for maximum-likelihood
estimation with normal distributions (or MAP estimation when prior informa-
tion is included like in our formulation) the term (GTi C−1 Gi ) is also the Fisher
Information matrix, and it’s inverse approximates the system covariance.

3 Sparsity in the System Equations


Before describing the Sliding Window Filter it is useful to take a look at the
overall structure of the SLAM least squares equations, and to study how this
structure lends itself to various algebraic solutions.
Expanding the Jacobian G,
 ∂gz   
∂x H
G =  ∂g = − D ,
 f   
∂x 
∂gπ L
∂x

6
we see that the system matrix, GT C−1 G = HT R−1 H + DT Q−1 D + LT ΠL,
has a sparse structure due to the form of D, L and especially H. The structure
of H is shown in Fig. 1. The sparsity pattern of least squares SLAM system
matrix is due to contributions from the three components
2 3
Πp 0 ... Πpm
2 3 2 3 6 7
6 7
U W E 0
6 0 0 0 7
HT R−1 H=4 DT Q−1 D=4 and T
6 7
5, 5, L ΠL=6
6 7 6 7
..
7
.. .
WT
6 7
V 0 0 6
6 . 0 . .
.
7
7
4 5
Πpm T 0 ... Πm

where U = AT R−1 A, W = AT R−1 B and V = BT R−1 B. This structure


is also depicted graphically in Fig. 2. The task is to solve the system of normal
equations 10 which expand to
" #" # " #
Λp Λpm δxp gp
=
Λpm T Λm δxm gm

where gp and gm are the least squares RHS vector corresponding to the
robot path and map, respectively. We solve this system of equations using
elementary matrix operations – e.g. the Schur complement (Appendix A) - to
reduce the lower right map block Λm onto the upper left process block Λp

" #" # " #


Λp − Λpm (Λm )−1 Λpm T 0 δxp gp − Λpm (Λm )−1 gm
=
Λpm T Λm δxm gm

which is solved directly for δxp and then for δxm by back-substitution:

δxp = (Λp − Λpm (Λm )−1 Λpm T )−1 (gp − Λpm (Λm )−1 gm )
δxm = (Λm )−1 (gm − Λpm T δxp )

Alternately, we can also reduce the upper left process block Λp onto the
lower right map block Λm

" #" # " #


Λp Λpm δxp gp
=
0 Λm − Λpm T (Λp )−1 Λpm δxm gm − Λpm T (Λp )−1 gp

giving the solution

δxm = (Λm − Λpm T (Λp )−1 Λpm )−1 (gm − Λpm T (Λp )−1 gp )
δxp = (Λpm )−1 (gp − Λpm δxm )

Depending on the process noise and the prior, the system matrix GT C−1 G
can take on different sparsity patterns that affect the complexity of finding a
solution. The possible sparsity patterns are shown in figure 3. In the field, the
problem at hand will define the sparsity pattern, which will influence the choice
of which algorithm to use. For instance, an infinite process noise covariance

7
(a) With motion (b) With motion (c) No motion infor- (d) No motion infor-
model; with prior model; Without prior mation; with prior in- mation; No prior in-
information. This is information. This formation. formation. This is
the structure of the is the structure of the structure of Pho-
full SLAM problem in GraphSLAM[19]. togrammetric Bundle
this paper. Adjustment[20].

Figure 3: The pattern of the system matrix in the SLAM problem depends on
information contributed (or not) from the dynamical process model and any
prior information. With no process model information and no prior information
the problem is equivalent to Photogrammetric Bundle Adjustment (the struc-
ture of this problem is visible on the right). Including a process model makes the
upper left m × m “process block” tridiagonal. Including a prior can potentially
cause complete fill-in of the lower right n × n “map-block”.

would mean the motion model does not contribute information to the system
(Q−1 = 0 =⇒ E = 0), which would reduce the upper left block of the system
matrix to block diagonal, which is O(m + n3 ) to solve. Similarly, without prior
information (Π = 0) the system matrix lower right block is also block diagonal,
which is O(m3 + n) to solve. Without information from the motion model and
without prior information the problem is equivalent to the Bundle Adjustment
problem in Photogrammetry[3], which can be solved in either O(m3 + n) or
O(m + n3 ). It is interesting to note that in this form (no motion model, no
prior), the first optimal solution to the SLAM problem using cameras appears
to have been developed by Brown circa 1958[2]. Brown was also the originator
of what has come to be known as the Tsai camera model[3, 21].

4 The Sliding Window Filter


For SLAM to operate usefully over long periods of time, perhaps over the course
of a robots life time, its computational complexity must be O(1) - hence the
size of parameter vector cannot grow without bound.
The simplest way to bound computational complexity is to reduce the size
of the state vector by, say, removing the oldest pose parameters or distant land-
mark parameters. If we directly remove parameters from the system equation
however, we can lose information about how the parameters interact. The right
way to remove parameters from a multi-dimensional normal distribution is to
marginalize them out.

4.1 Marginalization
Analogously to the how the sparse SLAM equations are solved via reduction
and back-substitution, marginalizing uses the Schur Compliment to reduce the

8
parameters of interest onto a smaller system, which is then solved for the re-
maining parameters. This is equivalent to marginalizing out the parameters
we wish to remove[10]. Considering the effects of the Schur Compliment, it is
not surprising that marginalizing out parameters induces conditional dependen-
cies between all the remaining parameters that the removed parameters were
conditionally dependent on.
Thus, at a minimum, marginalizing a set of pose parameters will add cross-
information terms between all the landmarks that were visible from that pose.
This is depicted graphically in Fig. 4 for a system that starts without any prior
information. From this we see that it is important to consider how marginalizing
affects the system of equations. It is well known that the information matrix can
be interpreted as an adjacency matrix of an undirected graph where non-zero off-
diagonal entries encode conditional dependencies between parameters. The ith ,
j th element of the Λpm matrix encodes pose-to-map conditional dependencies
and is non-zero only if the ith map landmark was visible from the j th pose.
Studying this structure we see that marginalizing the oldest pose causes
fill-in in three places: 1) between any landmarks that were visible from the
marginalized pose, 2) between the parameters of the next-oldest-pose (the pose
one time step after the pose being marginalized), and 3) between the next-
oldest-pose and all landmarks seen by the marginalized pose. Notice that only
Π experiences additional fill-in.
This is important because it means that the upper left block of the full SLAM
problem is still block tri-diagonal, and the Λpm block is only changed across
the top 6 rows - exactly where it overlaps with Πpm . Hence, when solving we
can still take advantage of any sparsity patterns that may exist in Λpm , just
like as in Bundle Adjustment (see the Appendix in[9] for more on this exploit).
Note that the patterns in Λpm are problem specific, depending on how the
robot went about observing the environment. For instance, a robot sentry that
always sees the same landmarks will have a completely dense Λpm block, while
for a robot exploring and sensing new landmarks, it will have a sparse banded
pattern. This fact can hamper methods that rely on predictability in Λpm in
order to achieve computational efficiency[5].
When landmarks are observed from a pose it adds pose-to-landmarks condi-
tional dependence information to the system matrix. This information encodes
a soft spatial rigidity between the parameters. Marginalizing a pose preserves
this soft spatial rigidity by transferring its structure into a map of conditionally
dependent landmarks.
By marginalizing poses we have succeeded in removing the O(m3 ) cost of car-
rying the complete robot path in the state estimate. Interestingly, this process
is in some sense the opposite of sparsification in Sparse Extended Information
Filters[18] - one might call it map “densification” - it is also the opposite of
what is done in Delayed State Filters[7].
The next thing is to bound the growth of the map by a constant, which we
will do by marginalizing landmarks that are no longer visible from any pose
currently in the state vector. To marginalize landmark parameters the parame-
ters to be removed are first manipulated into the upper-left block of the system
equations via elementary matrix manipulations. The fact that the landmark
we are removing is no longer visible is crucial because except for the oldest
pose-to-map terms, all the cross-information terms in Λpm will be zero, which
means that the Schur complement onto the remaining system will once again

9
Figure 4: Evolution of the system matrix for a toy problem with 4 poses and
6 landmarks. On the left is the system matrix after measuring landmarks 1,
2, 3 from pose 1, landmarks 2, 3, and 4 from pose 2, landmarks 3, 4, and, 5
from pose 3 and 4, 5, and 6 at pose 4. This is depicted graphically in Fig.
5. Marginalizing out pose 1 induces conditional dependencies (fill-in) in three
places: 1) the top left 6 × 6 of the process-block, 2) the prior map-block Πm
between landmarks that were visible from pose 1, and 3) the prior pose-to-map
block Πpm between landmarks that were visible from pose 1. These places are
shaded in darker grey. At this point marginalizing landmark 1, which is not
visible from any of the remaining poses, will induce no extra fill-in in Π.

Figure 5: Graph interpretation of the example described in Fig. 4. On the left


is the initial system, the middle image shows the result of marginalizing the
first pose, the right image shows the result of marginalizing the first landmark -
marginalizing these “invisible” landmarks does not cause additional fill-in. The
dashed lines indicate parameters that are stored but no longer in the estimator.
Saving the map graph structure for later use, say to aid in closing loops, requires
only O(n) storage.

10
MSE vs Time MSE vs Time
−4
x 10
−3 41 landmarks x 10 25 features, 85 percent overlap
1.4 1
"Visual Odometry" 2 Frames
1/N Predicted MSE
2 Frame LS−SLAM 0.9 5 Frames
Full SLAM 20 Frames
1.2
10 Frames
0.8

1 0.7

0.6
0.8
MSE

MSE
0.5

0.6
0.4

0.4 0.3

0.2
0.2
0.1

0 0
1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20
Time Time

(a) Sliding Window Filter comes close to the (b) MSE vs Time with the cameras in for-
Full SLAM solution. Sliding Window of 2 is ward motion over 1m. Each frame overlaps
close to optimal 1/k full batch curve. Each ∼85 with ∼25 features per frame. The saw-
curve is a trial for different size time window, tooth pattern stems from the fact that new
averaged over 50 Monte-Carlo trials, with features will cause the MSE to jump up. The
0.1 pixel std.dev measurement noise. 1.0m 2 frame SWF suffers from marginalizing in-
std.dev process noise. Because VO does not formation out too quickly.
combine information over time, it does not
reduce uncertainty as time passes.

only affect the prior, Π.


With a sensor that has a limited range and under the reasonable condition
that the spatial density of landmarks in the world is roughly uniform, then the
number of pose-to-landmark information terms is constant. Marginalizing poses
at a fixed rate and landmarks when they become “invisible” results in a constant
time complexity SLAM estimation algorithm. Unlike other techniques this is
true regardless of how the robot explores the environment[8, 19].
The key points here are that 1) marginalizing can preserve information by
transforming the way in which the system probability distribution is repre-
sented; 2) marginalizing both pose and “invisible” landmark parameters only
ever affects the system prior, and not the general sparsity pattern of the system
equations - the prior terms x̂ and Π “catch” all the information we marginalize
out.
Note that just by choosing when to marginalize poses and landmarks the
Least Squares SLAM algorithm can scale from the full Batch solution, to the
Extended Kalman Filter solution, to the incremental Visual Odometry solution.
That all these algorithms are subsumed within one framework testifies to the
generality of the simple least squares approach.
In theory, Sliding Window Filters offer a method to achieve the optimal all-
time MAP estimate at reduced complexity, potentially even at constant com-
plexity. A key question that remains to be answered is how much information is
lost when we transform the system p.d.f. via marginalizing? If we can quantify
this information loss then we can possibly show a bounded equivalence between
the full SLAM solution and a constant time approximation. For now we show
simulation and empirical evidence that the information loss is negligible.
It is interesting to note what happens if we simply delete parameters from
the estimator instead of marginalizing them out. For a sliding window of size
k, the error converges like 1/k just as we would expect the batch estimator to

11
do. However, after k steps, the error stops converging as we delete information
from the back of the filter. With such deleting and a sliding window of k = 1 it
is interesting to note that we end up with a solution that is nearly identical to
previous forms of Visual Odometry[12, 14, 15] (though we should expect slightly
better results since the state estimate of the landmarks is being improved with
each observation, that is, there is some data fusion, even with “deleting”). The
graph in Fig. 4.1 shows the MSE performance for this type of Visual Odometry
compared to the batch solution, as well as the Sliding Window Filter solution.

5 Optimality and Efficiency


If (10) converges to the global minimum of ℓ(x) then the resultant x is the
minimal variance MAP estimate [6]. This is the “best” or “optimal” estimate
possible in the non-linear least squares sense with normally distributed mea-
surements. Unlike the Extended Kalman Filter, the Gauss-Newton method has
a known convergence theory, which, for near zero-residual problems like SLAM,
is locally q-quadratically convergent[6]. The signature of an optimal estimator
is a 1/k reduction in mean squared error of the state estimate vs. ground truth,
where here k is the number of time steps.
Another important factor to address is estimator efficiency, that is, how well
the estimator approximates a minimal variance estimate of the parameters. The
information inequality, covx (x) ≥ I(x)−1 , defines the minimal variance bound,
which is called Cramer-Rao lower bound[4]. Here the matrix I(x) is the Fisher
information matrix defined by the symmetric matrix whose ith , j th element is
the covariance between first partial derivatives of the log-likelihood function (7)
µ ¶
∂ℓ ∂ℓ
I(x),i,j = covx , (13)
∂xi ∂xj

For a multivariate normal distribution (13) reduces to Ii,j (x) = GT C−1 G,


which is equivalent to the Hessian matrix in the Gauss-Newton method. This is
why in least squares problems like SLAM the Hessian (and inverse covariance)
is the Information Matrix[17]. Iterative batch solutions to non-linear problems
generally give better parameter estimates than both non-iterative methods and
first order recursive methods. For example, because they do not iterate, meth-
ods like the Extended Kalman Filter have no guarantee of converging to the
local cost minimum of the objective function, while iterative techniques like the
Gauss-Newton method have a well established convergence theory[6]. For con-
vergence, one would have to use the iterative Extended Kalman Filter (which
is algebraically equivalent to the Gauss-Newton method[1]). Iteration is funda-
mental.
In SLAM, batch estimators are beneficial because new observations can often
improve old pose estimates, which can in turn lead to better estimate for the cur-
rent pose. For instance, new landmark measurements can help estimate not only
where the current pose, but also previous poses. Recursive methods marginalize
out old poses and hence cannot improve previous pose estimates. On the other
hand, batch methods maintain the old pose information necessary to re-evaluate
and re-linearize the objective function using all measurements, which improves
a significant portion of the robot path estimate. Estimating the robot path
and re-considering past measurements both lead to better (smaller residual, less

12
variance) parameter estimates. Batch methods also produce smooth trajecto-
ries (hence techniques that optimize over more than one time step are called
smoothers).
The covariance estimate in Recursive Least Squares like the EKF is based on
the assumption that second derivative terms in the true Hessian are negligible.
The closer the parameter estimate to the true value the better this covariance
approximation will be. Since iterative batch least squares solutions give better
parameter estimates, the approximated covariance matrix (and information ma-
trix) is also more accurate. In the non-linear least squares framework, estimating
over all the parameters in the system, considering the entire measurement his-
tory, and iterating the update are crucial for attaining an optimal, efficient and
consistent estimate.

6 Related Work
This work was inspired by the results from the Photogrammetry community,
dating back to the late 1950’s[2], and later derivatives like the Variable State
Dimension Filter[13], Visual Odometry[12], and of course SLAM[16].

A Schur Complement, Gaussian Marginals, and


the Matrix Inverse Lemma
In this section we look at some properties of block matrices and how they are
used in recursive filtering [11]. First we look at the problem of computing the
inverse of a matrix in terms its sub-matrices. This derivation leads to the Matrix
Inversion Lemma, and makes use of an operation called the Schur Complement,
which for normal distributions in Canonical form turns out to be equivalent to
marginalization.
The Matrix Inverse Lemma and Marginalization via the Schur Complement
are useful for understanding recursive linear estimation theory. For instance,
deriving the Kalman Filter from Bayes rule is greatly simplified by reference
to the Matrix Inversion Lemma, and the update step in state estimation is an
application of marginalization.
Let us say M is a large matrix
" #
A B
M=
C D

that we wish to invert, and we know that A and D are square and invertable.
The first thing is to notice the two following simple matrix multiplications that
allow us to triangularize M: first, the following left multiplication creates an
upper-right triangular system,
" #" # " #
I 0 A B A B
=
−CA−1 I C D 0 ∆A
and the following right multiplication creates a lower-left triangular system,

13
" #" # " #
A B I −A−1 B A 0
=
C D 0 I C ∆A

The term ∆A = D − CA−1 B is called the Schur Complement of A in M .


Similarly, we can complement D instead of A,
" #" # " #
I −BD−1 A B ∆D 0
=
0 I C D C D
and
" #" # " #
A B I 0 ∆D B
=
C D −D−1 C I 0 D

where ∆D = A − BD−1 C is the Schur Complement of D in M.


Combining the above gives two different ways to block diagonalize M:

" #" #" # " #


I 0 A B I −A−1 B A 0
=
−CA−1 I C D 0 I 0 ∆A

and

" #" #" # " #


I −BD−1 A B I 0 D 0
−1
= .
0 I C D −D C I 0 ∆D

Using these we can re-express the original matrix M in terms of a lower-left


block triangular component, a block diagonal component, and an upper-right
block triangular component. That is,

" # " #" #" #


A B I 0 A 0 I A−1 B
=
C D CA−1 I 0 ∆A 0 I
" #" #" #
I BD−1 D 0 I 0
= .
0 I 0 ∆D D−1 C I

which greatly simplifies computing the inverse since the middle term is block
diagonal. For instance,

" #−1 " #" #" #


A B I −A−1 B A−1 0 I 0
= (A.2)
C D 0 I 0 ∆−1
A −CA−1 I
" #
−1
A−1 + A−1 B∆−1
A CA −A−1 B∆−1
A
= −1
(A.3)
−∆−1
A CA ∆−1
A

and equivalently,

14
" #−1 " #" #" #
A B I 0 ∆−1
D 0 I −BD−1
= −1
(A.4)
C D −D C I 0 D−1 0 I
" #
−1
∆−1
D −∆−1
D BD
= −1
(A.5)
−D−1 C∆−1
D D−1 + D−1 C∆−1
D BD

Equating various terms of (A.3) and (A.5) yeild the differnt forms of the
Matrix Inverse Lemma, one of which is

(A − BD−1 C)−1 = A−1 + A−1 B(D − CA−1 B)−1 CA−1 .


This lemma is one of the primary tricks used for manipulating linear least
squares equations (for instance, it is used to get betwen the Gauss-Newton
method and the Kalman filter). The Schur Complement applied to an inverse
covariance matrix it is equivalent to Marginalizing a normal distribution. [10].

B Spatial Operations
This section discusses two tools for reasoning about spatial relationships, namely
the compound and inverse operators. Generally, objects are represented by a
pose vector which is an entity comprised of a 3D position and an orientation. A
pose always has 6 degrees of freedom, though it may have more elements (i.e. a
quaternion has 4 parameters to represent 3 degrees of rotational freedom, giving
a pose vector with 7 elements). This detail has consequences. For example, it is
worth remembering that all 3-parameter representations of orientation will have
a singularity associated with them (e.g. gimbal-lock for Euler angles), and all
4 or more parameter representations are redundant, leading to other problems,
such as rank deficient covariance matrices.
In general there is no one “best” representation of orientation, and it is often
up to the practitioner to decide what is the right one for the application. One
approach is to use quaternions to represent pose globally, and then use Euler
angles to represent small local changes. This gets the best of both worlds: a
non-singular global representation and a minimal local representation.
Note that using local parameterizations like this buys us some of the benefits
enjoyed by error-state Kalman filters, namely that the least squares optimization
takes place around a state vector whose optimal solution will be near zero. In
our case this has the effect of keeping the optimization machinery far away from
the singularities of SO3.
Therefore local pose, xij is represented by a six parameter column vector
comprised of a 3D point, pij = [xij yij zij ]T and a 3 element Euler Roll-Pitch-
Yaw angle θij = [rij pij qij ]T . The +x-axis is forward, +y-axis is right and
+z-axis is down; roll is about x-axis, pitch about y-axis and yaw is about z-
axis. Dropping the subscripts and abbreviating sin and cos, a rotation matrix
is parameterized by θ such that Rθ = RzRyRx. where
     
1 0 0 cp 0 sp cq −sq 0
Rx =  0 cr −sr , Ry =  0 1 0  , Rz =  sq cq 0  .
     

0 sr cr −sp 0 cp 0 0 1

15
Say, for example, that xwr denotes the robot pose in the world coordinate
frame and xrs denotes a sensor in the robot coordinate frame. The compound
operator, ⊕, can be used to find the sensor pose in the world coordinate frame,
e.g. xws = xwr ⊕ xrs (notice that the two r’s are adjacent).
More generally, given two spatial relationships, xij and xjk the compound
operator, ⊕, gives the pose of object k in coordinate frame i,

Rθij pjk + pij


 

xjk = xij ⊕ xjk =  ³ ´ , (A.6)


γ Rθij Rθjk

where Rθij and Rθjk are the 3x3 rotation matrices defined by the Euler angle
components of xij and xjk , respectively. The function γ maps from a rotation
matrix to the appropriate Euler angle
 
arctan(R32 /R33 )
γ(R) =  − arcsin(R31 ) 
 

arctan(R21 /R11

which is well defined for attitude angles between -90o and 90o . The compounding
operation is associative but not commutative.
The inverse operation, ⊖, inverts a spatial relation. For instance xij gives
object j’s pose in i’s coordinate frame, and ⊖xij gives object i’s pose in j’s
coordinate frame. The inverse operator is
" #
−Rθij pij
xji = ⊖xij =
γ(RTθij )

Jacobians of spatial operations are needed in many places - e.g. linear er-
ror propagation, Kalman filtering applications, least squares normal equations,
non-linear optimization, etc. These Jacobians are involved, but tractable; it is
recommended to find them with the aid of symbolic solvers, such as Maple or
Mathematica. As a special case, note that error-state prameterizations simplify
these Jacobians because the entries are all zeros or ones.

C Errata
February 9, 2010 — Replaced the term “downdate” with “marginalize” for
correctness.

References
[1] F.W. Bell, B.M. Cathey. The iterated Kalman filter update as a Gauss-Newton method.
IEEE Transactions on Automatic Control, 38(2):294–297, Feb 1993.
[2] D.C. Brown. A solution to the general problem of multiple station analytical stereotrian-
gulation. Technical report, RCP-MTP Data Reduction Technical Report No. 43, Patrick
Air Force Base, Flordia (also designated as AFMTC 58-8), 1958.
[3] D.C. Brown. The bundle adjustment - progress and prospects. In XIIIth Congress of the
International Society for Photogrammetry, 1976.
[4] M. H. DeGroot and M. J. Schervish. Probability and Statistics. Addison Wesley, 2001.
[5] Frank Dellaert. Square root SAM. In Proceedings of Robotics: Science and Systems,
2005.

16
[6] Jr Dennis J.E. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization
and Nonlinear Equations. Soc for Industrial & Applied Math, 1996.
[7] R. Eustice, H. Singh, and J. Leonard. Exactly sparse delayed-state filters. In Proceedings
of the 2005 IEEE International Conference on Robotics and Automation, 2005.
[8] Ryan Eustice, Hanumant Singh, John Leonard, Matthew Walter, and Robert Ballard.
Visually navigating the rms titanic with slam information filters. In Robotics: Science
and Systems, 2005.
[9] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge
University Press, 2000.
[10] M.I. Jordan. An Introduction to Graphical Models. unpublished, 2003.
[11] T. Kailath, A. H. Sayed, and B. Hassibi. Linear Estimation. Prentice Hall, 2000. Good
reference for estimation theory. Good appendix on matrix algebra.
[12] L. Matthies and S. Shafer. Error modelling in stereo navigation. IEEE Journal of
Robotics and Automation, 3(3):239–248, 1987.
[13] Philip F. McLauchlan. The variable state dimension filter applied to surface-based struc-
ture from motion. Technical report, University of Surrey, 1999.
[14] David Nister, Oleg Naroditsky, and James Bergen. Visual odometry. In CVPR, volume 1,
2004.
[15] Clark F. Olson, Larry H. Matthies, Marcel Schoppers, and Mark W. Maimone. Stereo ego-
motion improvements for robust rover navigation. In In Proceedings IEEE Conference
on Robotics and Automation (ICRA), volume 2, pages 1099– 1104, 2001.
[16] R. C. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in
robotics. In I. J. Cox and G. T. Wilfong, editors, Autonomous Robot Vehicles, pages
167–193. Springer-Verlag, 1990.
[17] H. W. Sorenson. Parameter Estimation: Principles and Problems. Marcel Drekker, Inc.,
1980.
[18] S. Thrun, D. Koller, Z. Ghahmarani, and H. Durrant-Whyte. Simultaneous map-
ping and localization with sparse extended information filters: Theory and ini-
tial results. In Workshop on Algorithmic Foundations of Robotics, 2002. URL
citeseer.nj.nec.com/article/thrun02simultaneous.html.
[19] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. MIT Press,
Cambridge, MA, 2005.
[20] Bill Triggs, Philip McLauchlan, Richard Hartley, and Andrew Fitzgibbon. Bundle ad-
justment – A modern synthesis. In W. Triggs, A. Zisserman, and R. Szeliski, editors,
Vision Algorithms: Theory and Practice, LNCS, pages 298–375. Springer Verlag, 2000.
URL citeseer.ist.psu.edu/triggs00bundle.html.
[21] R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3d machine vision
metrology using off-the-shelf tv cameras and lenses. IEEE Journal of Robotics and
Automation, 3(4):pp 323–344, 1987.

17

You might also like