A Sliding Window Filter For SLAM
A Sliding Window Filter For SLAM
Gabe Sibley
Abstract
This note describes a Sliding Window Filter that is an on-line constant-
time approximation to the feature-based 6-degree-of-freedom full Batch
Least Squares Simultaneous Localization and Mapping (SLAM) prob-
lem. We contend that for SLAM to be useful in large environments and
over extensive run-times, its computational time complexity must be con-
stant, and its memory requirements should be at most linear. Under
this constraint, the “best” algorithm will be the one that comes closest
to matching the all-time maximum-likelihood estimate of the full SLAM
problem, while also maintaining consistency. We start by formulating
SLAM as a Batch Least Squares state estimation problem, and then show
how to modify the Batch estimator into an approximate Sliding Window
Batch/Recursive framework that achieves constant time complexity and
linear space complexity. We argue that viewing SLAM from the Sliding
Window Least Squares perspective is very useful for understanding the
structure of the problem. This perspective is general, capable of subsum-
ing a number of common estimation techniques such as Bundle Adjust-
ment and Extended Kalman Filter SLAM. By tuning the sliding window,
the algorithm can scale from exhaustive Batch solutions to fast incremen-
tal solutions; if the window encompasses all time, the solution is alge-
braically equivalent to full SLAM; if only one time step is maintained, the
solution is algebraically equivalent to the Extended Kalman Filter SLAM
solution. The Sliding Window Filter enables other interesting properties,
like continuous sub-mapping, lazy data association, undelayed or delayed
landmark initialization, and incremental robust estimation. We test the
algorithm in simulations using stereo vision exterioceptive sensors and in-
ertial measurement proprioceptive sensors. Initial experiments show that
the SWF approaches the performance of the optimal batch estimator, even
for small windows on the order of 5-10 frames.
1 Introduction
For a mobile robot, there are many situations in which accurate high-resolution
local spatial awareness is a prerequisite for successfully performing a task. For
instance, good sensing is undoubtedly useful for tight obstacle avoidance, deli-
1
cate mobile manipulation, sensing complicated terrain, subtle motion detection
or moving object tracking from a moving platform.
This is a data-fusion problem in which a sensor (potentially undergoing
uncertain, dynamic motion) must combine noisy measurements into a single
underlying sensor-relative state estimate. In Robotics this problem is most
commonly tackled under the banner of Simultaneous Localization and Mapping
(SLAM); in Computer Vision the problem is usually referred to as Structure
from Motion (SFM).
This paper develops a Sliding Window Filter for SLAM that focuses compu-
tational resources on accurately estimating the immediate spatial surroundings
using a sliding time window of the most recent sensor measurements. Ideally,
we would like a constant time algorithm that closely approximates the all-time
maximum-likelihood estimate as well as the minimum variance Cramer Rao
Lower Bound - e.g. we would like an estimator that achieves some notion of sta-
tistical optimality (quickly converges), efficiency (quickly reduces uncertainty)
and consistency (avoids over-confidence). To this end we give a derivation of
the SLAM problem from the Gaussian non-linear least squares optimization per-
spective. We find that this results in a simple, yet general, take on the SLAM
problem; we think this is a useful contribution.
It is interesting to note at this point that even though the Sliding Win-
dow Filter technique focuses on the spatially immediate estimation process, it
also incrementally builds a map structure (occupying O(n) memory space) that
is useful for non-local problems, such as loop closure, place recognition, nav-
igation, topological planning, etc. So while it does not address loop closure
per-se, it does build a representation amenable to solving such problems. Ul-
timately, we believe that loop closure is best cast as a distinct problem from
local estimation. Decoupling loop closure from the core SLAM estimator allows
concentrating computational resources on improving the local result, which is
crucial for applications that require spatially high-resolution, dense structure
estimates. With high bandwidth sensors (like cameras) focusing on the local
problem is clearly important for computational reasons; this is especially true if
we wish to fuse all of the sensor data (or a significant portion thereof). However,
even with this local focus, once a loop closure is identified, global optimization
over the map structure left behind by the Sliding Window Filter should match
the global batch solution.
Sliding Window Filtering is generally applicable to a wide variety of platform
configurations and environments, capable of working with or without a motion
model and with or without proprioceptive sensors. This flexibility allows solv-
ing bearing only problems, such as Bundle Adjustment, as well as traditional
dynamic state space problems like EKF SLAM. Further, the approach is easily
applicable to problems with asynchronous sensor measurements, like for exam-
ple when using GPS at ∼1 hz in combination with inertial sensors at ∼100hz.
We apply the Sliding Window Filter to SLAM with stereo vision and inertial
measurements. Experiments show that the best approximate method comes
close to matching the performance of the optimal estimator while attaining
constant time complexity - empirically, it is often the case that the difference in
their performance is is indistinguishable.
2
2 Problem Description
It is useful to approach SLAM from the traditional Statistical Point Estimation
perspective because it helps reveal the underlying structure of the problem.
This is apparent for a number of reasons. First, because it highlights the funda-
mental minimization principle at work in least squares, which, we would argue,
is a principle that is harder to see from the recursive estimation perspective.
Second, starting with the underlying probability density functions that describe
our problem, it clearly shows the Gaussian probabilistic nature of SLAM - that
is, SLAM is simply tracking a normal distribution through a large state space;
a state space that changes dimension as we undertake the fundamental proba-
bilistic operations of removing parameters via marginalization, and adding pa-
rameters via conditioning. A third reason to derive SLAM via statistical point
estimation is because it exposes a rich body of theory about the convergence of
least squares estimators - theory that is missing or difficult to come by from the
recursive non-linear estimation perspective.
Without going into detail, note that from this point one can easily see the
connection to many important concepts like Newton’s method, Fisher Infor-
mation, the Cramer Rao Lower bound, information graphs, graphical models,
belief propagation and probabilistic belief networks to name just a few. All these
concepts have intuitive derivations starting from traditional statistical point es-
timation. If these concepts are branches of a tree, then optimizing a quadratic,
as in least-squares, is the fundamental trunk.
With this in mind, we carry forward in the usual way, by describing the
system state vector, process model, measurement model and how we include
prior information.
xp 1
..
.
" #
xp xp m
x= =
xm x
m1
..
.
xm n
Each pose xpj is represented by a six parameter column vector comprised of
a 3D point and an Euler angle xpj = [xpj ypj zpj rpj ppj qpj ]T . Landmarks are
represented by their 3D position, xmi = [xmi ymi zmi ]T . The state dimension
is thus |x| = (6m + 3n) and grows as the robot path increases and as new
landmarks are observed.
3
xpj = fj (xpj−1 , uj ) + wj (1)
where uj is an input command to the robot. The noise vector wj is additive
and follows a normal distribution wj ∼ N (0, Qj ), so that xpj ∼ N (fj (xpj−1 , uj ), Qi ).
A simple and useful kinematic process model for f is the compound ¯ operation,
∂f ¯
⊕, which is described in (Appendix B). The Jacobian of f , Fj = ∂x ¯ xj−1 , which
we will need in a moment, is also derived in (Appendix B). Concatenating the
individual process models together we can write the system dynamic state space
model as,
f1 (xp0 , u1 )
f (xp ) = ..
.
.
fm (xpm−1 , um )
With the above definitions and assumptions the probability density function
describing the robot path is
½ ¾
1 T −1
P (xp ) = ηxp exp − (xp − f (xp )) Q (xp − f (xp )) (2)
2
p
where η = 1/ (2π)|xp | det(Q) is a normalizing constant and |xp | is the dimen-
sion of xp . In practice, one usually extends this basic model to also estimate
other quantities, such as linear and angular velocities; for clarity, we will stick
with this basic formulation.
z11 h11 R11 0 ... 0
. . .. ..
. . .
. . 0 .
z= z , h(x) = h1m , R = R1m
1m
. . .. ..
. .
. . . .
znm hnm 0 ... Rnm
Thus the probability density describing the measurement likelihood is simply
½ ¾
1
P (z|x) = ηz exp − (z − h(x))T R−1 (z − h(x)) (4)
2
p
where ηz = 1/ (2π)|z| det(R) is again a normalizing term.
4
2.4 Point Estimation
In the beginning let us say we have some¡ prior information about the state
represented by a prior distribution x̂ ∼ N x, Π−1 which encodes information
¢
about a single starting pose, about some previously known map of n landmarks,
and about the relationships between the starting pose and the map. The inverse
covariance Π can be broken into four distinct parts
" # " #−1
xp 1 Πp Πpm
x̂ ∼ N ,
xm1:n ΠTpm Πm
1¡
gz (x)T R−1 gz (x) + gf (x)T Q−1 gf (x) + gπ (x)T Πgπ (x)
¢
ℓ(x) =
2
1¡
g(x)T C−1 g(x) .
¢
= (7)
2
If we let ST S = C−1 and r(x) = Sg(x) then (7) is clearly a non-linear least
squares problem of the form
1
ℓ(x) = ||r(x)||2 . (8)
2
Newton’s solution to such optimization problems is the iterative sequence
5
Figure 1: Basic Figure 2: The sparse structure of least squares SLAM system
structure of the matrix is due to contributions from three components: the
sensor model measurement block HT R−1 H, the process block DT Q−1 D,
Jacobian, H. and the prior information block LT ΠL.
the gradient of (8) is ∇ℓ(xi ) = r′ (xi )T r(xi ), the Gauss-Newton method defines
the sequence of iterates[6]
Noting that r′ (xi ) = SGi where Gi is the Jacobian of g(xi ), (10) becomes
such that xi+1 = xi +δxi . When iterated, this sequence is locally q-quadratically
convergent to the MAP estimate for near zero-residual problems[6]. The system
of linear equations
GTi C−1 Gi δxi = GTi C−1 g(xi ) (12)
is the essential least squares form of the SLAM problem. The difference be-
tween many SLAM algorithms can be boiled down to differences in how these
equations are solved. It is also interesting to note here that for many problems
the Gauss-Newton method is algebraically identical to the Iterated Extended
Kalman Filter. In fact, as will become clear later, the Sliding Window Filter
over single time step and all landmarks is exactly the IEKF SLAM solution.
As δxi → 0 the term (GTi C−1 Gi ) converges to the Hessian of the likelihood
function (this is the approximation the Gauss-Newton method makes over the
full second order Newton method[6]). It turns out that for maximum-likelihood
estimation with normal distributions (or MAP estimation when prior informa-
tion is included like in our formulation) the term (GTi C−1 Gi ) is also the Fisher
Information matrix, and it’s inverse approximates the system covariance.
6
we see that the system matrix, GT C−1 G = HT R−1 H + DT Q−1 D + LT ΠL,
has a sparse structure due to the form of D, L and especially H. The structure
of H is shown in Fig. 1. The sparsity pattern of least squares SLAM system
matrix is due to contributions from the three components
2 3
Πp 0 ... Πpm
2 3 2 3 6 7
6 7
U W E 0
6 0 0 0 7
HT R−1 H=4 DT Q−1 D=4 and T
6 7
5, 5, L ΠL=6
6 7 6 7
..
7
.. .
WT
6 7
V 0 0 6
6 . 0 . .
.
7
7
4 5
Πpm T 0 ... Πm
where gp and gm are the least squares RHS vector corresponding to the
robot path and map, respectively. We solve this system of equations using
elementary matrix operations – e.g. the Schur complement (Appendix A) - to
reduce the lower right map block Λm onto the upper left process block Λp
which is solved directly for δxp and then for δxm by back-substitution:
δxp = (Λp − Λpm (Λm )−1 Λpm T )−1 (gp − Λpm (Λm )−1 gm )
δxm = (Λm )−1 (gm − Λpm T δxp )
Alternately, we can also reduce the upper left process block Λp onto the
lower right map block Λm
δxm = (Λm − Λpm T (Λp )−1 Λpm )−1 (gm − Λpm T (Λp )−1 gp )
δxp = (Λpm )−1 (gp − Λpm δxm )
Depending on the process noise and the prior, the system matrix GT C−1 G
can take on different sparsity patterns that affect the complexity of finding a
solution. The possible sparsity patterns are shown in figure 3. In the field, the
problem at hand will define the sparsity pattern, which will influence the choice
of which algorithm to use. For instance, an infinite process noise covariance
7
(a) With motion (b) With motion (c) No motion infor- (d) No motion infor-
model; with prior model; Without prior mation; with prior in- mation; No prior in-
information. This is information. This formation. formation. This is
the structure of the is the structure of the structure of Pho-
full SLAM problem in GraphSLAM[19]. togrammetric Bundle
this paper. Adjustment[20].
Figure 3: The pattern of the system matrix in the SLAM problem depends on
information contributed (or not) from the dynamical process model and any
prior information. With no process model information and no prior information
the problem is equivalent to Photogrammetric Bundle Adjustment (the struc-
ture of this problem is visible on the right). Including a process model makes the
upper left m × m “process block” tridiagonal. Including a prior can potentially
cause complete fill-in of the lower right n × n “map-block”.
would mean the motion model does not contribute information to the system
(Q−1 = 0 =⇒ E = 0), which would reduce the upper left block of the system
matrix to block diagonal, which is O(m + n3 ) to solve. Similarly, without prior
information (Π = 0) the system matrix lower right block is also block diagonal,
which is O(m3 + n) to solve. Without information from the motion model and
without prior information the problem is equivalent to the Bundle Adjustment
problem in Photogrammetry[3], which can be solved in either O(m3 + n) or
O(m + n3 ). It is interesting to note that in this form (no motion model, no
prior), the first optimal solution to the SLAM problem using cameras appears
to have been developed by Brown circa 1958[2]. Brown was also the originator
of what has come to be known as the Tsai camera model[3, 21].
4.1 Marginalization
Analogously to the how the sparse SLAM equations are solved via reduction
and back-substitution, marginalizing uses the Schur Compliment to reduce the
8
parameters of interest onto a smaller system, which is then solved for the re-
maining parameters. This is equivalent to marginalizing out the parameters
we wish to remove[10]. Considering the effects of the Schur Compliment, it is
not surprising that marginalizing out parameters induces conditional dependen-
cies between all the remaining parameters that the removed parameters were
conditionally dependent on.
Thus, at a minimum, marginalizing a set of pose parameters will add cross-
information terms between all the landmarks that were visible from that pose.
This is depicted graphically in Fig. 4 for a system that starts without any prior
information. From this we see that it is important to consider how marginalizing
affects the system of equations. It is well known that the information matrix can
be interpreted as an adjacency matrix of an undirected graph where non-zero off-
diagonal entries encode conditional dependencies between parameters. The ith ,
j th element of the Λpm matrix encodes pose-to-map conditional dependencies
and is non-zero only if the ith map landmark was visible from the j th pose.
Studying this structure we see that marginalizing the oldest pose causes
fill-in in three places: 1) between any landmarks that were visible from the
marginalized pose, 2) between the parameters of the next-oldest-pose (the pose
one time step after the pose being marginalized), and 3) between the next-
oldest-pose and all landmarks seen by the marginalized pose. Notice that only
Π experiences additional fill-in.
This is important because it means that the upper left block of the full SLAM
problem is still block tri-diagonal, and the Λpm block is only changed across
the top 6 rows - exactly where it overlaps with Πpm . Hence, when solving we
can still take advantage of any sparsity patterns that may exist in Λpm , just
like as in Bundle Adjustment (see the Appendix in[9] for more on this exploit).
Note that the patterns in Λpm are problem specific, depending on how the
robot went about observing the environment. For instance, a robot sentry that
always sees the same landmarks will have a completely dense Λpm block, while
for a robot exploring and sensing new landmarks, it will have a sparse banded
pattern. This fact can hamper methods that rely on predictability in Λpm in
order to achieve computational efficiency[5].
When landmarks are observed from a pose it adds pose-to-landmarks condi-
tional dependence information to the system matrix. This information encodes
a soft spatial rigidity between the parameters. Marginalizing a pose preserves
this soft spatial rigidity by transferring its structure into a map of conditionally
dependent landmarks.
By marginalizing poses we have succeeded in removing the O(m3 ) cost of car-
rying the complete robot path in the state estimate. Interestingly, this process
is in some sense the opposite of sparsification in Sparse Extended Information
Filters[18] - one might call it map “densification” - it is also the opposite of
what is done in Delayed State Filters[7].
The next thing is to bound the growth of the map by a constant, which we
will do by marginalizing landmarks that are no longer visible from any pose
currently in the state vector. To marginalize landmark parameters the parame-
ters to be removed are first manipulated into the upper-left block of the system
equations via elementary matrix manipulations. The fact that the landmark
we are removing is no longer visible is crucial because except for the oldest
pose-to-map terms, all the cross-information terms in Λpm will be zero, which
means that the Schur complement onto the remaining system will once again
9
Figure 4: Evolution of the system matrix for a toy problem with 4 poses and
6 landmarks. On the left is the system matrix after measuring landmarks 1,
2, 3 from pose 1, landmarks 2, 3, and 4 from pose 2, landmarks 3, 4, and, 5
from pose 3 and 4, 5, and 6 at pose 4. This is depicted graphically in Fig.
5. Marginalizing out pose 1 induces conditional dependencies (fill-in) in three
places: 1) the top left 6 × 6 of the process-block, 2) the prior map-block Πm
between landmarks that were visible from pose 1, and 3) the prior pose-to-map
block Πpm between landmarks that were visible from pose 1. These places are
shaded in darker grey. At this point marginalizing landmark 1, which is not
visible from any of the remaining poses, will induce no extra fill-in in Π.
10
MSE vs Time MSE vs Time
−4
x 10
−3 41 landmarks x 10 25 features, 85 percent overlap
1.4 1
"Visual Odometry" 2 Frames
1/N Predicted MSE
2 Frame LS−SLAM 0.9 5 Frames
Full SLAM 20 Frames
1.2
10 Frames
0.8
1 0.7
0.6
0.8
MSE
MSE
0.5
0.6
0.4
0.4 0.3
0.2
0.2
0.1
0 0
1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 12 14 16 18 20
Time Time
(a) Sliding Window Filter comes close to the (b) MSE vs Time with the cameras in for-
Full SLAM solution. Sliding Window of 2 is ward motion over 1m. Each frame overlaps
close to optimal 1/k full batch curve. Each ∼85 with ∼25 features per frame. The saw-
curve is a trial for different size time window, tooth pattern stems from the fact that new
averaged over 50 Monte-Carlo trials, with features will cause the MSE to jump up. The
0.1 pixel std.dev measurement noise. 1.0m 2 frame SWF suffers from marginalizing in-
std.dev process noise. Because VO does not formation out too quickly.
combine information over time, it does not
reduce uncertainty as time passes.
11
do. However, after k steps, the error stops converging as we delete information
from the back of the filter. With such deleting and a sliding window of k = 1 it
is interesting to note that we end up with a solution that is nearly identical to
previous forms of Visual Odometry[12, 14, 15] (though we should expect slightly
better results since the state estimate of the landmarks is being improved with
each observation, that is, there is some data fusion, even with “deleting”). The
graph in Fig. 4.1 shows the MSE performance for this type of Visual Odometry
compared to the batch solution, as well as the Sliding Window Filter solution.
12
variance) parameter estimates. Batch methods also produce smooth trajecto-
ries (hence techniques that optimize over more than one time step are called
smoothers).
The covariance estimate in Recursive Least Squares like the EKF is based on
the assumption that second derivative terms in the true Hessian are negligible.
The closer the parameter estimate to the true value the better this covariance
approximation will be. Since iterative batch least squares solutions give better
parameter estimates, the approximated covariance matrix (and information ma-
trix) is also more accurate. In the non-linear least squares framework, estimating
over all the parameters in the system, considering the entire measurement his-
tory, and iterating the update are crucial for attaining an optimal, efficient and
consistent estimate.
6 Related Work
This work was inspired by the results from the Photogrammetry community,
dating back to the late 1950’s[2], and later derivatives like the Variable State
Dimension Filter[13], Visual Odometry[12], and of course SLAM[16].
that we wish to invert, and we know that A and D are square and invertable.
The first thing is to notice the two following simple matrix multiplications that
allow us to triangularize M: first, the following left multiplication creates an
upper-right triangular system,
" #" # " #
I 0 A B A B
=
−CA−1 I C D 0 ∆A
and the following right multiplication creates a lower-left triangular system,
13
" #" # " #
A B I −A−1 B A 0
=
C D 0 I C ∆A
and
which greatly simplifies computing the inverse since the middle term is block
diagonal. For instance,
and equivalently,
14
" #−1 " #" #" #
A B I 0 ∆−1
D 0 I −BD−1
= −1
(A.4)
C D −D C I 0 D−1 0 I
" #
−1
∆−1
D −∆−1
D BD
= −1
(A.5)
−D−1 C∆−1
D D−1 + D−1 C∆−1
D BD
Equating various terms of (A.3) and (A.5) yeild the differnt forms of the
Matrix Inverse Lemma, one of which is
B Spatial Operations
This section discusses two tools for reasoning about spatial relationships, namely
the compound and inverse operators. Generally, objects are represented by a
pose vector which is an entity comprised of a 3D position and an orientation. A
pose always has 6 degrees of freedom, though it may have more elements (i.e. a
quaternion has 4 parameters to represent 3 degrees of rotational freedom, giving
a pose vector with 7 elements). This detail has consequences. For example, it is
worth remembering that all 3-parameter representations of orientation will have
a singularity associated with them (e.g. gimbal-lock for Euler angles), and all
4 or more parameter representations are redundant, leading to other problems,
such as rank deficient covariance matrices.
In general there is no one “best” representation of orientation, and it is often
up to the practitioner to decide what is the right one for the application. One
approach is to use quaternions to represent pose globally, and then use Euler
angles to represent small local changes. This gets the best of both worlds: a
non-singular global representation and a minimal local representation.
Note that using local parameterizations like this buys us some of the benefits
enjoyed by error-state Kalman filters, namely that the least squares optimization
takes place around a state vector whose optimal solution will be near zero. In
our case this has the effect of keeping the optimization machinery far away from
the singularities of SO3.
Therefore local pose, xij is represented by a six parameter column vector
comprised of a 3D point, pij = [xij yij zij ]T and a 3 element Euler Roll-Pitch-
Yaw angle θij = [rij pij qij ]T . The +x-axis is forward, +y-axis is right and
+z-axis is down; roll is about x-axis, pitch about y-axis and yaw is about z-
axis. Dropping the subscripts and abbreviating sin and cos, a rotation matrix
is parameterized by θ such that Rθ = RzRyRx. where
1 0 0 cp 0 sp cq −sq 0
Rx = 0 cr −sr , Ry = 0 1 0 , Rz = sq cq 0 .
0 sr cr −sp 0 cp 0 0 1
15
Say, for example, that xwr denotes the robot pose in the world coordinate
frame and xrs denotes a sensor in the robot coordinate frame. The compound
operator, ⊕, can be used to find the sensor pose in the world coordinate frame,
e.g. xws = xwr ⊕ xrs (notice that the two r’s are adjacent).
More generally, given two spatial relationships, xij and xjk the compound
operator, ⊕, gives the pose of object k in coordinate frame i,
where Rθij and Rθjk are the 3x3 rotation matrices defined by the Euler angle
components of xij and xjk , respectively. The function γ maps from a rotation
matrix to the appropriate Euler angle
arctan(R32 /R33 )
γ(R) = − arcsin(R31 )
arctan(R21 /R11
which is well defined for attitude angles between -90o and 90o . The compounding
operation is associative but not commutative.
The inverse operation, ⊖, inverts a spatial relation. For instance xij gives
object j’s pose in i’s coordinate frame, and ⊖xij gives object i’s pose in j’s
coordinate frame. The inverse operator is
" #
−Rθij pij
xji = ⊖xij =
γ(RTθij )
Jacobians of spatial operations are needed in many places - e.g. linear er-
ror propagation, Kalman filtering applications, least squares normal equations,
non-linear optimization, etc. These Jacobians are involved, but tractable; it is
recommended to find them with the aid of symbolic solvers, such as Maple or
Mathematica. As a special case, note that error-state prameterizations simplify
these Jacobians because the entries are all zeros or ones.
C Errata
February 9, 2010 — Replaced the term “downdate” with “marginalize” for
correctness.
References
[1] F.W. Bell, B.M. Cathey. The iterated Kalman filter update as a Gauss-Newton method.
IEEE Transactions on Automatic Control, 38(2):294–297, Feb 1993.
[2] D.C. Brown. A solution to the general problem of multiple station analytical stereotrian-
gulation. Technical report, RCP-MTP Data Reduction Technical Report No. 43, Patrick
Air Force Base, Flordia (also designated as AFMTC 58-8), 1958.
[3] D.C. Brown. The bundle adjustment - progress and prospects. In XIIIth Congress of the
International Society for Photogrammetry, 1976.
[4] M. H. DeGroot and M. J. Schervish. Probability and Statistics. Addison Wesley, 2001.
[5] Frank Dellaert. Square root SAM. In Proceedings of Robotics: Science and Systems,
2005.
16
[6] Jr Dennis J.E. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization
and Nonlinear Equations. Soc for Industrial & Applied Math, 1996.
[7] R. Eustice, H. Singh, and J. Leonard. Exactly sparse delayed-state filters. In Proceedings
of the 2005 IEEE International Conference on Robotics and Automation, 2005.
[8] Ryan Eustice, Hanumant Singh, John Leonard, Matthew Walter, and Robert Ballard.
Visually navigating the rms titanic with slam information filters. In Robotics: Science
and Systems, 2005.
[9] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge
University Press, 2000.
[10] M.I. Jordan. An Introduction to Graphical Models. unpublished, 2003.
[11] T. Kailath, A. H. Sayed, and B. Hassibi. Linear Estimation. Prentice Hall, 2000. Good
reference for estimation theory. Good appendix on matrix algebra.
[12] L. Matthies and S. Shafer. Error modelling in stereo navigation. IEEE Journal of
Robotics and Automation, 3(3):239–248, 1987.
[13] Philip F. McLauchlan. The variable state dimension filter applied to surface-based struc-
ture from motion. Technical report, University of Surrey, 1999.
[14] David Nister, Oleg Naroditsky, and James Bergen. Visual odometry. In CVPR, volume 1,
2004.
[15] Clark F. Olson, Larry H. Matthies, Marcel Schoppers, and Mark W. Maimone. Stereo ego-
motion improvements for robust rover navigation. In In Proceedings IEEE Conference
on Robotics and Automation (ICRA), volume 2, pages 1099– 1104, 2001.
[16] R. C. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in
robotics. In I. J. Cox and G. T. Wilfong, editors, Autonomous Robot Vehicles, pages
167–193. Springer-Verlag, 1990.
[17] H. W. Sorenson. Parameter Estimation: Principles and Problems. Marcel Drekker, Inc.,
1980.
[18] S. Thrun, D. Koller, Z. Ghahmarani, and H. Durrant-Whyte. Simultaneous map-
ping and localization with sparse extended information filters: Theory and ini-
tial results. In Workshop on Algorithmic Foundations of Robotics, 2002. URL
citeseer.nj.nec.com/article/thrun02simultaneous.html.
[19] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. MIT Press,
Cambridge, MA, 2005.
[20] Bill Triggs, Philip McLauchlan, Richard Hartley, and Andrew Fitzgibbon. Bundle ad-
justment – A modern synthesis. In W. Triggs, A. Zisserman, and R. Szeliski, editors,
Vision Algorithms: Theory and Practice, LNCS, pages 298–375. Springer Verlag, 2000.
URL citeseer.ist.psu.edu/triggs00bundle.html.
[21] R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3d machine vision
metrology using off-the-shelf tv cameras and lenses. IEEE Journal of Robotics and
Automation, 3(4):pp 323–344, 1987.
17