
Least Squares Optimization: from Theory to Practice

Giorgio Grisetti 1, Tiziano Guadagnino 1, Irvin Aloise 1, Mirco Colosi 1,2,
Bartolomeo Della Corte 1, Dominik Schlegel 1

Abstract— Nowadays, Non-Linear Least-Squares embodies the foundation of many Robotics and Computer Vision systems. The research community has investigated this topic deeply in recent years, and this has resulted in the development of several open-source solvers that approach constantly increasing classes of problems. In this work, we propose a unified methodology to design and develop efficient Least-Squares Optimization algorithms, focusing on the structures and patterns of each specific domain. Furthermore, we present a novel open-source optimization system that transparently addresses problems with different structures and is designed to be easy to extend. The system is written in modern C++ and can run efficiently on embedded systems (footnote 3). We validated our approach by conducting comparative experiments on several problems using standard datasets. The results show that our system achieves state-of-the-art performance in all tested scenarios.

I. INTRODUCTION

Iterative Least-Squares (ILS) solvers are core building blocks of many robotic applications, systems and subsystems [1]. This technique has traditionally been used for calibration [2]-[4], registration [5]-[7] and global optimization [8]-[11]. In particular, modern Simultaneous Localization and Mapping (SLAM) systems typically employ multiple ILS solvers at different levels: in computing the incremental ego-motion of the sensor, in refining the localization of a robot upon loop closure and - most notably - to obtain a globally consistent map. Similarly, in several computer vision systems, ILS is used to compute/refine camera parameters, to estimate the structure of a scene, the position of the camera, or both. Many inference problems in robotics are effectively described by a factor graph [12], which is a graphical model expressing the joint likelihood of the known measurements with respect to a set of unknown conditional variables. Solving a factor graph requires finding the values of the variables that maximize the joint likelihood of the measurements. If the noise affecting the sensor data is Gaussian, the solution of a factor graph can be computed by an ILS solver implementing variants of the well-known Gauss-Newton (GN) algorithm.

The relevance of the topic has been addressed by several works such as GTSAM [9], g2o [8], SLAM++ [13], or the Ceres solver [10] by Google. These systems have grown over time to include comprehensive libraries of factors and variables that can tackle a large variety of problems, and in most cases these systems can be used as black boxes. Since they typically consist of an extended codebase, tailoring them to a particular application/architecture to achieve maximum performance is a non-trivial task. In contrast, extending these systems to approach new problems is typically easier than customizing them: in this case, the developer has to implement some additional functionalities/classes according to the API of the system. Also in this case, however, an optimal implementation might require reasonable knowledge of the solver internals.

We believe that, at the current time, a researcher working in robotics should possess the knowledge of how to design factor graph solvers for specific problems. Having this skill enables one both to effectively extend existing systems and to realize custom software that utilizes the hardware to its maximum. Accordingly, the primary goal of this paper is to provide the reader with a methodology on how to mathematically define such a solver for a problem. To this extent, in Sec. IV we start by revising nonlinear least squares, highlighting the connections between inference on conditional Gaussian distributions and ILS. In the same section, we introduce the ⊞ method of Hertzberg et al. [14] to deal with non-Euclidean domains. Furthermore, we discuss how to cope with outliers in the measurements through robust cost functions, and we outline the effects of sparsity in factor graphs. We conclude the section by presenting a general methodology on how to design factors and variables that describe a problem. In Sec. V we back up this methodology by providing examples that approach four prominent problems in Robotics: Iterative Closest Point (ICP), projective registration, Bundle Adjustment (BA), and Pose-Graph Optimization (PGO).

When it comes to the implementation of a solver, several choices have to be made in light of the problem structure, the compute architecture and the operating conditions (on-line or batch). In this work we characterize ILS problems, distinguishing between dense and sparse, batch and incremental, stationary and non-stationary, based on their structure and application domain. In Sec. II we provide a more detailed description of these characteristics, while in Sec. III we discuss how ILS has been used in the literature to approach various problems in Robotics, highlighting how addressing a problem according to its traits leads to effective solutions.

The second, orthogonal goal of this work is to propose a unifying system that deals with dense/sparse, static/dynamic and batch problems, with no apparent performance loss compared to ad-hoc solutions. We build on the ideas at the base of the g2o optimizer [8] to address some requirements arising from users and developers, namely: fast convergence, small runtime per iteration, rapid prototyping, a trade-off between implementation effort and performance, and, finally, code compactness. In Sec. VI we distill, from the general algorithm outlined in Sec. IV, a set of functionalities that results in a modular, decoupled and minimal design. This analysis ultimately leads to a modern, compact and efficient C++ library for ILS on Factor Graphs, released under the BSD3 license, which relies on a component model presented in Sec. VII and runs effectively on both x86-64 and ARM platforms. To ease prototyping, we offer an interactive environment to graphically configure the solver (Fig. 1a). The core library of our solver consists of no more than 6000 lines of C++ code, whereas the companion libraries implementing a large set of factors and variables for approaching problems - e.g. 2D/3D ICP, projective registration, BA, 2D/3D PGO, Pose-Landmark Graph Optimization (PLGO) and many others - are, at the time of this writing, below 4000 lines. Our system relies on our visual component framework and on image processing and visualization libraries that contain no optimization code and consist of approximately 20000 lines. To validate our claims, we conducted extensive comparative experiments on publicly available datasets (Fig. 1b), in dense and sparse scenarios. We compared our solver with sparse approaches such as GTSAM, g2o and Ceres, and with dense ones, such as the well-known PCL library [15]. The experiments presented in Sec. VIII confirm that our system has performance on par with other state-of-the-art frameworks. Summarizing, the contribution of this work is twofold:

– We present a methodology on how to design a solver for a generic class of problems, and we exemplify such a methodology by showing how it can be used to approach a relevant subset of problems in Robotics.
– We propose an open-source, component-based ILS system that aims to coherently address problems having different structure, while providing state-of-the-art performance.

1 Department of Computer, Control, and Management Engineering "Antonio Ruberti", Sapienza University of Rome, Rome, Italy. Email: {grisetti, guadagnino, ialoise, colosi, dellacorte, schlegel}@diag.uniroma1.it
2 Robot Navigation and Perception (CR/AER1), Robert Bosch Corporate Research, Stuttgart, Germany. Email: [email protected]
3 Our package is available at http://srrg.gitlab.io/srrg2-solver.html

II. TAXONOMY OF ILS PROBLEMS

Whereas the theory on ILS is well known, the effectiveness of an implementation greatly depends on the structure of the problem being addressed and on the operating conditions. We qualitatively distinguish between dense and sparse problems by discriminating on the connectivity of the factor graph. A dense problem is characterized by many measurements affected by relatively few variables. This occurs in typical registration problems, where the likelihoods of the measurements (e.g. the intensities of image pixels) depend on a single variable expressing the sensor position. In contrast, sparse problems are characterized by measurements that depend only on a small subset of variables. Examples of sparse problems include PGO or BA.

A further, orthogonal classification divides the problems into stationary and non-stationary. A problem is stationary when the measurements do not change during the iterations of the optimization. This occurs when the data association is known a priori with sufficient certainty. Conversely, non-stationary problems admit measurements that might change during the optimization, as a result of a modification of the variables being estimated. A typical case of a non-stationary problem is point registration [16], where the associations between the points in the model and in the reference are computed at each iteration, based on a heuristic that depends on the current estimate of their displacement.

Finally, the problem might be extended over time by adding new variables and measurements. Several Graph-Based SLAM systems exploit this intrinsic characteristic in on-line applications, reusing the computation done while solving the original problem to determine the solution for the augmented one. We refer to a solver with this capability as an incremental solver, in contrast to batch solvers that carry out all the computation from scratch once the factor graph is augmented.

In this taxonomy we left out other crucial aspects that affect the convergence basin of the solver, such as the linearity of the measurement function or the domain of variables and measurements. Exploiting the structure of these domains has been shown to provide even more efficient solutions [17], with the obvious shortcoming that such solvers are restricted to the specific problem they are designed to address.

Using a sparse stationary solver on a dense non-stationary problem results in carrying out useless computation that hinders the usability of the system. Using a dense dynamic solver to approach a sparse stationary problem presents similar issues. State-of-the-art open-source solvers like the ones mentioned in Sec. I focus on sparse stationary or incremental problems. Dense solvers are usually embedded within the application/library using them and tightly coupled to it. On the one hand, this allows reducing the time per iteration; on the other hand, it results in avoidable code replication when multiple systems are integrated. This might result in potential inconsistencies among program parts and consequent bugs.

Fig. 1: Left: visual configuration manager implemented in our framework ((a) graphical solver configurator). Each block represents a configurable sub-module. Right: datasets used in the evaluation ((b)), dense and sparse.

III. RELATED WORK

In this section, we revise the use of ILS in approaching several problems in robotics, to highlight the structure and the peculiarities that each problem presents to the solver, according to the taxonomy presented in Sec. II. Furthermore, we provide an overview of generic sparse solvers that are commonly used nowadays for factor graph optimization.

A. ILS in Robotics

In calibration, ILS has been used extensively since the first works appeared and up to the present day [18]-[20]. Common works in batch calibration involve relatively small state spaces covering only the parameters to be estimated. Since these parameters condition, directly or indirectly, all measurements, this class of problems typically requires a dense stationary solver. When temporal calibration is required, however, the changing time offset might result in considering different data chunks at each iteration, thus requiring a dense, non-stationary solver, such as the one presented in [4].

Among the first works on pairwise shape registration relying on ILS, we find the ICP proposed by Besl and McKay [16], while Chen and Medioni [21] proposed the first ILS method for the incremental reconstruction of a 3D model from multiple range images. These methods constitute the foundation of many registration algorithms appearing during the subsequent years. In particular, Lu and Milios [22] specialized ICP to operate on 2D laser scans. All these works employed dense non-stationary solvers to estimate the robot pose that better explains the point measurements. The non-stationary aspect arises because the heuristic used to estimate the data association is based on the current pose estimate. In the context of ICP, Censi [23] proposed an alternative metric to compute the distance between two points, and an approach to estimate the information matrix from the set of correspondences [24]. Subsequently, Segal et al. [25] proposed the use of covariance matrices that better reflect the structure of the surface in computing the error. Registration has been addressed by Biber et al. [26] for 2D scans and subsequently by Magnusson et al. [27] for 3D point clouds, using a pure Newton's method relying on a Gaussian approximation of the point clouds to be registered, called the Normal Distributions Transform (NDT). Serafin et al. [7] approached the problem of point cloud registration using a 6D error function that also encodes the normal difference in the error vector. All the approaches mentioned so far leverage a dense ILS solver, with the notable exception of NDT, which is a second-order approach that specializes Newton's algorithm.

In the context of Computer Vision, the PnP algorithm [28], [29] allows finding the camera transformation that minimizes the reprojection error between a set of 3D points in the scene and their 2D projections in the image. The first stage of PnP is usually conducted according to a consensus scheme that relies on an ad-hoc minimal solver requiring only 3 correspondences. The final stage, however, typically uses a dense and stationary ILS approach, since the correspondences do not change during the iterations. When the initial guess of the camera is known with sufficient accuracy, as in Visual Odometry (VO), only the latter stage is typically used. In contrast to these feature-based solvers, Engels et al. [30] approach VO by minimizing the reprojection error between two images through dense and non-stationary ILS. Using this method requires the system to possess a reasonably good estimate of the depth for a subset of the points in the scene. Such an initialization is usually obtained by estimating the transformation between two images using a combination of RANSAC and direct solvers, and then computing the depth through triangulation between the stereo pair. Della Corte et al. [31] developed a registration algorithm called MPR, built on this idea. As a result, MPR is able to operate on depth images capturing different cues and obtained with arbitrary projection functions. To operate on-line, all the registration works mentioned so far rely on ad-hoc dense and non-stationary ILS solvers that leverage the specific problem's structure to reduce the computation.

The scan-based ICP algorithm [22] was subsequently employed by the same authors [32] as a building block for a system that estimates a globally consistent map. The core idea is to determine the relative transforms between pairwise scans through ICP. These transformations are known as constraints, and a global map is obtained by finding the positions of all the scans that better explain the constraints. The process can be visualized as a graph whose nodes are the scan positions and whose edges are the constraints, hence this problem is called PGO. Constraints can exist only between spatially close nodes, due to the limited sensor range. Hence, PGO is inherently sparse. Additionally, in the on-line case the graph is incrementally augmented as new measurements become available, rendering it incremental. We are unaware of these two aspects being exploited in the design of the underlying solver in [32].
The work of Lu and Milios inspired Borrmann et al. [33] to produce an effective 3D extension.

For several years after the introduction of Graph-Based SLAM [32], the community put aside ILS approaches in favor of filtering methods relying on Gaussian [34]-[39] or Particle [40]-[43] representations of the posterior. Filtering approaches were preferred since they were regarded as more suitable to run on-line on a moving robot with the computational resources available at the time, and the sparsity of the problem had not yet been fully exploited.

In a Graph-Based SLAM problem, it is common to have a number of variables in the order of hundreds or thousands. Such a high number of variables results in a large optimization problem that represented a challenge for the computers of the time, rendering global optimization a bottleneck of Graph-Based SLAM systems. In the remainder of this document we will refer to the global optimization module in Graph-SLAM as the back-end, in contrast to the front-end, which is responsible for constructing the factor graph based on the sensor measurements. Gutmann and Konolige [44] addressed the problem of incrementally building a map by finding topological relations and loop closures. This work was one of the first on-line implementations of Graph-Based SLAM. The core idea to reduce the computation in the back-end was to restrict the optimization to the portions of the graph having the largest errors. This insight has inspired several subsequent works [13], [45].

B. Stand-Alone ILS Solvers

Whereas dense solvers are typically embedded in the specific application for performance reasons, sparse solvers are complex enough to motivate the design of generic libraries for ILS. The first work to explicitly consider the sparsity of SLAM in conjunction with a direct method to solve the linear system was √SAM, developed by Dellaert et al. [46]. Kaess et al. [45] exploited this aspect of the problem in iSAM, the second iteration of √SAM. Here, when a new edge is added to the graph, the system computes a new solution reusing part of the previous one and selectively updating the vertices. In the third iteration of the system, iSAM2, Kaess et al. [47] exploited the Bayes Tree to solve the optimization problem without explicitly constructing the linear system. This solution is in contrast with the general trend of decoupling the linearization of the problem from the solution of the linear system, and it highlights the connections between the elimination algorithms used in the solution of a linear system and inference on graphical models. This self-contained engine allows dealing very efficiently with dynamic graphs that grow over time and with Gaussian densities, two typical features of the SLAM problem. The final iteration of the system, called GTSAM [9], embeds all these concepts in a single framework.

Meanwhile, Hertzberg, with his thesis [48], introduced the ⊞ method to systematically deal with non-Euclidean spaces and sparse problems. This work has been at the root of the framework of Kümmerle et al. [8], called g2o. This system introduces a layered architecture that allows easily exchanging sub-modules of the system - e.g. the linear solver or the optimization algorithm. A further paper of Hertzberg et al. [14] extends the ⊞ method from ILS to filtering.

Agarwal et al. proposed, in their Ceres solver [10], a generalized framework to perform non-linear optimization. Ceres embeds state-of-the-art methodologies that take advantage of modern CPUs - e.g. SIMD instructions and multi-threading - resulting in a complete and fast framework. One of its most relevant features is the efficient use of Automatic Differentiation (AD) [49], which consists in the algorithmic computation of derivatives starting from the error function. Further information on the topic of AD can be found in [50], [51].

In several contexts, knowing the optimal value of a solution is not sufficient, and the covariance is also required. In SLAM, knowing the marginal covariances relative to a variable is fundamental to approach data association. To this extent, Kaess et al. [52] outlined the use of the elimination tree. Subsequently, Ila et al. [13] designed SLAM++, an optimization framework to estimate the mean and covariance of the state by performing incremental Cholesky updates. This work takes advantage of the incremental aspect of the problem to selectively update the approximated Hessian matrix using parallel computation.

ILS algorithms have several known drawbacks. Perhaps the most investigated aspect is the sensitivity of the solution to the initial guess, which is reflected by the convergence basin. A wrong initial guess might lead a non-linear solver to converge to an inconsistent local minimum. Convex optimization [53] is one of the possible strategies to overcome this problem; however, its use is highly domain dependent. Rosen et al. [17] explored this topic, proposing a system to perform optimization of generic SE(d) factor graphs. In their system, called SE-Sync, they use a Riemannian Truncated-Newton Trust-Region method to certifiably compute the global optimum in a two-step optimization (rotation and translation). Briales et al. [54] extended this approach to jointly optimize rotation and translation using the same concepts. Bai et al. [55] provided a formulation of the SLAM problem based on constrained optimization, where constraints are represented by loop-closure cycles. Still, those approaches are bound to SE(d) sparse optimization. In contrast, Ni et al. [56] and Grisetti et al. [57] exploited, respectively, nested dissection and hierarchical local sub-graphs, devising divide-and-conquer strategies to both increase the convergence basin and speed up the computation.

IV. LEAST SQUARES MINIMIZATION

This section describes the foundations of ILS minimization. We first present a formulation of the problem that highlights its probabilistic aspects (Section IV-A). In Section IV-B we review some basic rules for manipulating the Normal distribution and we apply these rules to the definition presented in Section IV-A, leading to the initial definition of linear Least-Squares (LS). In Section IV-C we discuss the effects of a non-linear observation model, assuming that both the state space and the measurement space are Euclidean.

Fig. 2: Affine transformation of a uni-variate Gaussian distribution. The blue curve represents the source PDF while in green
we show the output PDF. The red line represents the affine transformation.

Subsequently, we relax this assumption on the structure of state and measurement spaces, proposing a solution that uses smooth manifold encapsulation. In Section IV-D we introduce the effects of outliers in the optimization and we show commonly used methodologies to reject them. Finally, in Section IV-E, we address the case of large, sparse problems characterized by measurement functions that depend only on small subsets of the state. Classical problems such as SLAM or BA fall in this category and are characterized by a rather sparse structure.

A. Problem Formulation

Let W be a stationary system whose non-observable state variable is represented by x, and let z be a measurement, i.e. a perception of the environment. The state is distributed according to a prior distribution p(x), while the conditional distribution of the measurement given the state, p(z|x), is known. p(z|x) is commonly referred to as the observation model. Our goal is to find the most likely distribution of states, given the measurements - i.e. p(x|z). A straightforward application of the Bayes rule results in the following:

    p(x|z) = p(x) p(z|x) / p(z) ∝ p(x) p(z|x).   (1)

The proportionality is a consequence of the normalization factor p(z), which is constant. In the remainder of this work, we will consider two key assumptions:

– the prior about the states is uniform, i.e.

    p(x) = N(x; µx, Σx = inf) = N(x; νx, Ωx = 0),   (2)

– the observation model is Gaussian, i.e.

    p(z|x) = N(z; µz|x, Ωz|x⁻¹)  where µz|x = h(x).   (3)

Eq. (2) expresses the uniform prior about the states using the canonical parameterization of the Gaussian. In contrast to the moment parameterization, which uses the mean and the covariance matrix, the canonical parameterization characterizes the Gaussian by the information matrix - i.e. the inverse of the covariance matrix, Ωx = Σx⁻¹ - and the information vector νx = Ωx µx. The canonical parameterization is better suited to represent a non-informative prior, since Ωx = 0 does not lead to numerical instabilities while implementing the algorithm. In contrast, the moment parameterization can express in a stable manner situations of absolute certainty by setting Σx = 0. In the remainder, we will use both representations upon convenience, their relation being clear.

In Eq. (3) the mean µz|x of the predicted measurement distribution is controlled by a generic non-linear function of the state h(x), commonly referred to as the measurement function. In the next section we will derive the solution for Eq. (1), imposing that the measurement function is an affine transformation of the state - i.e. h(x) = Ax + b - as illustrated in Fig. 2. In Section IV-C we address the more general non-linear case.

B. Linear Measurement Function

In case of a linear measurement function, expressed as h(x) = A(x − µx) + ẑ, the prediction model has the following form:

    p(z|x) = N(z; µz|x = A(x − µx) + ẑ, Ωz|x⁻¹)
           = N(z; µz|∆x = A∆x + ẑ, Ωz|x⁻¹) = p(z|∆x),   (4)

with ẑ constant and µx being the mean of the prior. For convenience, we express the stochastic variable ∆x = x − µx as

    p(∆x) = N(∆x; 0, Σx = inf) = N(∆x; ν∆x, Ωx = 0).   (5)
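As a quick worked example of ours (not part of the original derivation, added for illustration): consider a scalar state with A = 2, Ωz|x = 1 and prior mean µx = 0, so that ẑ = h(µx). Anticipating Eqs. (11)-(12) below, H = A⊤Ωz|xA = 4 and b = A⊤Ωz|x(ẑ − z) = 2(ẑ − z), hence the optimal increment is ∆x = −H⁻¹b = (z − ẑ)/2 - exactly the measurement discrepancy mapped back through the inverse of the affine coefficient A, as intuition suggests.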
Switching between ∆x and x is achieved by summing or subtracting the mean. To retrieve a solution for Eq. (1), we first compute the joint probability p(∆x, z) using the chain rule and, subsequently, we condition this joint distribution with respect to the known measurement z. For further details on the manipulation of Gaussians, we refer the reader to [58].

a) Chain rule: Under the Gaussian assumptions made in the previous section, the parameters of the joint distribution over states and measurements p(∆x, z) have the following block structure:

    p(∆x, z) = N(∆x, z; µ∆x,z, Ω∆x,z⁻¹)   (6)

    µ∆x,z = [0; ẑ]    Ω∆x,z = [Ωxx, Ωxz; Ωxz⊤, Ωzz].   (7)

The values of the terms in Eq. (7) are obtained by applying the chain rule for multivariate Gaussians to Eq. (4) and Eq. (5), according to [58], and result in the following:

    Ωxx = A⊤Ωz|xA + Ωx
    Ωxz = −A⊤Ωz|x
    Ωzz = Ωz|x.

Since we assumed the prior to be non-informative, we can set Ωx = 0. As a result, the information vector ν∆x,z of the joint distribution is computed as:

    ν∆x,z = [ν∆x; νz] = Ω∆x,z µ∆x,z = [−A⊤Ωz|x ẑ; Ωz|x ẑ].   (8)

Fig. 3 visually shows the resulting distribution.

b) Conditioning: Integrating the known measurement z into the joint distribution p(∆x, z) of Eq. (6) results in a new distribution p(∆x|z). This can be done by conditioning in the Gaussian domain. Once again we refer the reader to [58] for the proofs, while we report here the effect that the conditioning has on the Gaussian parameters:

    p(∆x|z) ∼ N(∆x; ν∆x|z, Ω∆x|z)   (9)

where

    ν∆x|z = ν∆x − Ωxz z = ν∆x − (−A⊤Ωz|x) z = A⊤Ωz|x (z − ẑ) = −A⊤Ωz|x e   (10)
    Ω∆x|z = Ωxx = A⊤Ωz|x A = H,   (11)

with e = ẑ − z. The conditioned mean µ∆x|z is retrieved from the information matrix and the information vector as:

    µ∆x|z = Ω∆x|z⁻¹ ν∆x|z = −H⁻¹ A⊤Ωz|x e = −H⁻¹ b.   (12)

Remembering that ∆x = x − µx, the Gaussian distribution over the conditioned states has the same information matrix, while the mean is obtained by summing the increment's mean µ∆x|z as

    µx|z = µx + µ∆x|z.   (13)

An important result in this derivation is that the matrix H = Ω∆x|z is the information matrix of the estimate; therefore, we can estimate not only the optimal solution µx|z, but also its uncertainty Σx|z = Ω∆x|z⁻¹. Fig. 4 visually illustrates the conditioning of a bi-variate Gaussian distribution.

c) Integrating multiple measurements: Integrating multiple independent measurements z1:K requires stacking them in a single vector. As a result, the observation model becomes

    p(z1:K|∆x) = ∏k=1..K p(zk|∆x) ∼ N(z; µz|x, Ωz|x),   (14)

with z = [z1; …; zK], µz|∆x = [A1; …; AK] ∆x + [ẑ1; …; ẑK] and Ωz|x block-diagonal with blocks Ωz1|x, …, ΩzK|x. Hence, the matrix H and the vector b are composed by the sum of each measurement's contribution; setting ek = ẑk − zk, we compute them as follows:

    H = Σk=1..K Ak⊤ Ωzk|x Ak = Σk Hk    b = Σk=1..K Ak⊤ Ωzk|x ek = Σk bk.   (15)

C. Non-Linear Measurement Function

Equations (12), (13) and (15) allow us to find the exact mean of the conditional distribution, under the assumptions that i) the measurement noise is Gaussian, ii) the measurement function is an affine transform of the state and iii) both measurement and state spaces are Euclidean. In this section we first relax the assumption on the affinity of the measurement function, leading to the common derivation of the GN algorithm. Subsequently, we address the case of non-Euclidean state and measurement spaces.

If the measurement model mean µz|x is controlled by a non-linear but smooth function h(x), and the prior mean µx = x̆ is reasonably close to the optimum, we can approximate the behavior of µx|z through the first-order Taylor expansion of h(x) around the mean, namely:

    h(x̆ + ∆x) ≈ h(x̆) + (∂h(x)/∂x)|x=x̆ ∆x = ẑ + J ∆x.   (16)

The Taylor expansion reduces the conditional mean to an affine transform in ∆x. Whereas the conditional distribution will not in general be Gaussian, the parameters of a Gaussian approximation can still be obtained around the optimum through Eq. (11) and Eq. (13). Thus, we can use the same algorithm described in Sec. IV-B, but we have to compute the linearization at each step. Summarizing, at each iteration, the GN algorithm:

– processes each measurement zk by evaluating the error ek(x) = hk(x) − zk and the Jacobian Jk at the current solution x̆:

    ek = hk(x̆) − zk   (17)
    Jk = (∂hk(x)/∂x)|x=x̆.   (18)

– builds the coefficient matrix and coefficient vector of the linear system in Eq. (12), and computes the optimal perturbation ∆x by solving the linear system:

    ∆x = −H⁻¹ b,  with  H = Σk Jk⊤ Ωzk|x Jk,  b = Σk Jk⊤ Ωzk|x ek.   (19)

– applies the computed perturbation to the current state, as in Eq. (13), to obtain an improved estimate:

    x̆ ← x̆ + ∆x.   (20)
Fig. 3: Given a uni-variate Gaussian PDF p(x) and an affine transformation f(x) - indicated in red - we compute the joint distribution p(x, y = f(x)) through the chain rule.
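To make the iteration above concrete, the following is a minimal, self-contained sketch of a GN loop for a toy Euclidean problem - an illustration of ours, not code from the paper's solver - assuming the Eigen library and unit information matrices:

```cpp
// Minimal Gauss-Newton loop (Eqs. 17-20) for a toy Euclidean problem:
// fit the parameters x = (a, b) of the model h(x; t) = exp(a*t + b).
#include <Eigen/Dense>
#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

int main() {
  // measurements (t_k, z_k), generated here from ground truth a=-0.5, b=1
  std::vector<std::pair<double, double>> samples;
  for (double t = 0.0; t < 5.0; t += 0.5)
    samples.push_back({t, std::exp(-0.5 * t + 1.0)});

  Eigen::Vector2d x(0.0, 0.0);  // initial guess for (a, b)
  for (int iteration = 0; iteration < 10; ++iteration) {
    Eigen::Matrix2d H = Eigen::Matrix2d::Zero();
    Eigen::Vector2d b = Eigen::Vector2d::Zero();
    for (const auto& [t, z] : samples) {
      const double prediction = std::exp(x(0) * t + x(1));
      const double e = prediction - z;                    // error, Eq. (17)
      Eigen::RowVector2d J(t * prediction, prediction);   // Jacobian, Eq. (18)
      H += J.transpose() * J;                             // Omega_k = I
      b += J.transpose() * e;
    }
    const Eigen::Vector2d dx = H.ldlt().solve(-b);        // Eq. (19)
    x += dx;                                              // Eq. (20)
  }
  std::cout << "estimate: " << x.transpose() << std::endl; // close to (-0.5, 1)
  return 0;
}
```

Each iteration rebuilds H and b from scratch by accumulating the contribution of every measurement, exactly mirroring the three steps listed above.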

A smooth prediction function has lower-magnitude higher-order terms in its Taylor expansion. The smaller these terms are, the better its linear approximation will be; this leads to situations close to the ideal affine case. In general, the smoother the measurement function h(x) is and the closer the initial guess is to the optimum, the better the convergence properties of the problem.

a) Non-Euclidean spaces: The previous formulation of the GN algorithm uses vector addition and subtraction to compute the error ek in Eq. (17) and to apply the increments in Eq. (20). However, these two operations are only defined in Euclidean spaces. When this assumption is violated - as usually happens in Robotics and Computer Vision applications - the straightforward implementation does not generally provide satisfactory results. Rotation matrices or angles cannot be directly added or subtracted without performing subsequent non-trivial normalizations. Still, typical continuous states involving rotational or similarity transformations are known to lie on a smooth manifold [59].

A smooth manifold M is a space that, albeit not homeomorphic to Rⁿ, admits a locally Euclidean parameterization around each element M of the domain, commonly referred to as a chart - as illustrated in Fig. 5. A chart computed at a manifold point M is a function from Rⁿ to a new point M′ on the manifold:

    chartM(∆m) : Rⁿ → M.   (21)

Intuitively, M′ is obtained by "walking" along the perturbation ∆m on the chart, starting from the origin. A null motion (∆m = 0) on the chart leaves us at the point where the chart is constructed - i.e. chartM(0) = M.

Fig. 4: Conditioning of a bi-variate Gaussian. Top left: the source PDF; top right and bottom left indicate the conditioning over x and y respectively.

Similarly, given two points M and M′ on the manifold, we can determine the motion ∆m on the chart constructed on M that would bring us to M′. Let this operation be the inverse chart, denoted as chartM⁻¹(M′). The direct and inverse charts allow us to define operators on the manifold that are analogous to sum and subtraction. These operators, referred to as ⊞ and ⊟, are thence defined as:

    M′ = M ⊞ ∆m ≜ chartM(∆m)   (22)
    ∆m = M′ ⊟ M ≜ chartM⁻¹(M′).   (23)

This notation - first introduced by Smith et al. [60] and then generalized by Hertzberg et al. [14], [61] - allows us to straightforwardly adapt the Euclidean version of ILS to operate on manifold spaces. The dimension of the chart is chosen as the minimal one needed to represent a generic perturbation on the manifold. On the contrary, the manifold representation can be chosen arbitrarily.

Fig. 5: Illustration of a manifold space. Since the manifold is smooth, local perturbations - i.e. ∆x in the illustration - can be expressed with a suitable Euclidean vector.

A typical example of a smooth manifold is the SO(3) domain of 3D rotations. We represent an element of SO(3) on the manifold as a rotation matrix R. In contrast, for the perturbation, we pick a minimal representation consisting of the three Euler angles ∆r = (∆φ, ∆θ, ∆ψ)⊤. Accordingly, the operators become:

    RA ⊞ ∆r = fromVector(∆r) RA   (24)
    RA ⊟ RB = toVector(RB⁻¹ RA).   (25)

The function fromVector(·) computes a rotation matrix as the composition of the rotation matrices relative to each Euler angle. In formulæ:

    R = fromVector(∆r) = Rx(∆φ) Ry(∆θ) Rz(∆ψ).   (26)

The function toVector(·) does the opposite, computing the value of each Euler angle starting from the matrix R. It operates by equating each of its elements to the corresponding one in the matrix product Rx(∆φ) Ry(∆θ) Rz(∆ψ), and by solving the resulting set of trigonometric equations. As a result, this operation is quite articulated. Around the origin, the chart constructed in this manner is immune to singularities.

Once proper ⊞ and ⊟ operators are defined, we can reformulate our minimization problem in the manifold domain, as the sketch below also illustrates.
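As an illustration of Eqs. (24)-(26), here is a compact sketch of ours of the SO(3) ⊞ and ⊟ operators (assuming the Eigen library; the names fromVector/toVector follow the paper's notation):

```cpp
// Sketch of the SO(3) boxplus/boxminus operators of Eqs. (24)-(26),
// using an Euler-angle perturbation.
#include <Eigen/Dense>
#include <Eigen/Geometry>

using Matrix3 = Eigen::Matrix3d;
using Vector3 = Eigen::Vector3d;

// Eq. (26): compose the elementary rotations Rx(phi) Ry(theta) Rz(psi)
Matrix3 fromVector(const Vector3& dr) {
  return (Eigen::AngleAxisd(dr.x(), Vector3::UnitX()) *
          Eigen::AngleAxisd(dr.y(), Vector3::UnitY()) *
          Eigen::AngleAxisd(dr.z(), Vector3::UnitZ())).toRotationMatrix();
}

// Inverse of Eq. (26): solve the trigonometric system for the Euler angles
Vector3 toVector(const Matrix3& R) {
  return R.eulerAngles(0, 1, 2);
}

// Eq. (24): walk along the chart by dr, starting from R_A
Matrix3 boxplus(const Matrix3& R_A, const Vector3& dr) {
  return fromVector(dr) * R_A;
}

// Eq. (25): chart coordinates of R_A on the chart constructed at R_B
Vector3 boxminus(const Matrix3& R_A, const Matrix3& R_B) {
  return toVector(R_B.transpose() * R_A);  // R_B^-1 = R_B^T for rotations
}
```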
Fig. 6: Commonly used robust kernel functions. The kernel threshold is set to 1 in all cases.

To this extent, we can simply replace the + with a ⊞ in the computation of the Taylor expansion of Eq. (16). Since we will compute an increment on the chart, we need to compute the expansion on the chart ∆x at the local optimum, that is at the origin of the chart itself, ∆x = 0. In formulæ:

    hk(X̆ ⊞ ∆x) ≈ hk(X̆) + (∂hk(X̆ ⊞ ∆x)/∂∆x)|∆x=0 ∆x = hk(X̆) + J̆k ∆x.   (27)

The same holds when applying the increments in Eq. (20), leading to:

    X̆ ← X̆ ⊞ ∆x.   (28)

Here we denote with capital letters the manifold representation of the state X, and with ∆x the Euclidean perturbation. Since the optimization within one iteration is conducted on the chart, the origin of the chart X̆ on the manifold stays constant during that iteration. If the measurements lie on a manifold too, a local ⊟ operator is required to compute the error, namely:

    ek(X) = Ẑk ⊟ Zk = hk(X) ⊟ Zk.   (29)

To apply the previously defined optimization algorithm, we should linearize the error around the current estimate through its first-order Taylor expansion. Posing ĕk = ek(X̆), we have the following relation:

    ek(X̆ ⊞ ∆x) = hk(X̆ ⊞ ∆x) ⊟ Zk ≈ ĕk + (∂ek(X̆ ⊞ ∆x)/∂∆x)|∆x=0 ∆x = ĕk + J̃k ∆x.   (30)

The reader might notice that in Eq. (30) the error space may differ from the increment space, due to the ⊟ operator. As reported in [62], having a different parametrization might enhance the convergence properties of the optimization in specific scenarios. Still, to avoid any inconsistencies, the information matrix Ωk should be expressed on a chart around the current measurement Zk.

D. Handling Outliers: Robust Cost Functions

In Sec. IV-C we described a methodology to compute the parameters of the Gaussian distribution over the state x which minimizes the Omega-norm of the error between prediction and observation. More concisely, we compute the optimal state x⋆ such that:

    x⋆ = argmin_x Σk=1..K ‖ek(x)‖²Ωk.   (31)

The mean of our estimate µx|z = x⋆ is the local optimum of the GN algorithm, and the information matrix Ω⋆x|z = H⋆ is the coefficient matrix of the system at convergence. The procedure reported in the previous section assumes all measurements to be correct, albeit affected by noise. Still, in many real cases this is not so, mainly due to aspects that are hard to model - e.g. multi-path phenomena or incorrect data associations. These wrong measurements are commonly referred to as outliers; on the contrary, inliers represent the good measurements.

A common assumption made by several techniques to reject outliers is that the inliers tend to agree towards a common solution, while the outliers do not. This fact is at the root of consensus schemes such as RANSAC [29]. In the context of ILS, the quadratic nature of the error terms leads to over-accounting for measurements whose error is large, albeit those measurements are typically outliers. However, there are circumstances where all errors are quite large even if no outliers are present. A typical case occurs when we start the optimization from an initial guess far from the optimum.

A possible solution to this issue consists in carrying out the optimization under a cost function that grows sub-quadratically with the error. Indicating with uk(x) the L1 Omega-norm of the error term in Eq. (17), its derivative with respect to the state variable x can be computed as follows:

    uk(x) = √(ek(x)⊤ Ωk ek(x))   (32)
    ∂uk(x)/∂x = (1/uk(x)) ek(x)⊤ Ωk ∂ek(x)/∂x.   (33)

We can generalize Eq. (31) by introducing a scalar function ρ(u) that computes a new error term as a function of the L1-norm; Eq. (31) is the special case ρ(u) = ½u². Thence, our new problem consists in minimizing the following function:

    x⋆ = argmin_x Σk=1..K ρ(uk(x)).   (34)

Going more into detail and analyzing the gradients of Eq. (34), we have the following relation:

    ∂ρ(uk(x))/∂x = (∂ρ(u)/∂u)|u=uk(x) ∂uk(x)/∂x
                 = (∂ρ(u)/∂u)|u=uk(x) (1/uk(x)) ek(x)⊤ Ωk ∂ek(x)/∂x
                 = γk(x) ek(x)⊤ Ωk ∂ek(x)/∂x   (35)

where

    γk(x) = (∂ρ(u)/∂u)|u=uk(x) (1/uk(x)).   (36)

The robustifier function ρ(·) acts on the gradient, modulating the magnitude of the error term through the scalar function γk(x). Still, we can also compute the gradient of Eq. (31) as follows:

    ∂‖ek(x)‖²Ωk / ∂x = 2 ek(x)⊤ Ωk ∂ek(x)/∂x.   (37)

We notice that Eq. (37) and Eq. (35) differ by a scalar term γk(x) that depends on x. By absorbing this scalar term at each iteration in a new information matrix Ω̄k(x) = γk(x) Ωk, we can rely on the iterative algorithm illustrated in the previous sections to implement a robust estimator. In this sense, at each iteration we compute γk(x) based on the result of the previous iteration. Note that, upon convergence, Eq. (37) and Eq. (35) are the same; therefore, they lead to the same optimum. This formalization of the problem is called Iteratively Reweighted Least-Squares (IRLS).

The use of robust cost functions biases the information matrix of the system H. Accordingly, if we want to recover an estimate of the solution uncertainty when using robust cost functions, we need to "undo" the effect of the function ρ(·). This can easily be achieved by recomputing H after convergence, considering only the inliers and disabling the robustifier - i.e. setting ρ(u) = ½u². Fig. 6 illustrates some of the most common cost functions used in Robotics and Computer Vision. Further information on modern robust cost functions can be found in the work of MacTavish et al. [63].
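For instance, with the Huber kernel (one of the kernels plotted in Fig. 6), the reweighting coefficient of Eq. (36) admits a simple closed form - a small sketch of ours, for illustration:

```cpp
// IRLS reweighting coefficient (Eq. 36) for the Huber kernel with
// threshold c: rho(u) = u^2/2 for u <= c, and c*u - c^2/2 otherwise.
// Since drho/du = min(u, c), gamma = (drho/du) / u.
double huberGamma(double u, double c) {
  if (u <= c) return 1.0;  // quadratic region: plain least squares
  return c / u;            // linear region: large errors are down-weighted
}
```

In Alg. 1 below (lines 13-15), exactly such a coefficient rescales the information matrix of each factor at every iteration, yielding Ω̃k = γk Ωk.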
Fig. 7: Effects of AMD variable reordering on the fill-in of the Cholesky decomposition of matrix H: (a) H matrix before reordering; (b) Cholesky decomposition before reordering; (c) H matrix after reordering; (d) Cholesky decomposition after reordering. Black pixels indicate non-zero blocks. As illustrated in (b) and (d), variable reordering dramatically reduces the fill-in of the decomposed matrix.

E. Sparsity

Minimization algorithms like GN or Levenberg-Marquardt (LM) lead to the repeated construction and solution of the linear system H∆x = −b. In many cases, each measurement zk only involves a small subset of state variables, namely:

    hk(x) = hk(xk)  where  xk = {xk1, …, xkq} ⊆ x.   (38)

Therefore, the Jacobian for the error term k has the following structure:

    Jk = [0 ⋯ 0  Jk1  0 ⋯ 0  Jkh  0 ⋯ 0  Jkq  0 ⋯ 0].   (39)

According to this, the contribution Hk = Jk⊤ Ωk Jk of the k-th measurement to the system matrix H exhibits a block pattern that is non-zero only at the rows and columns of the involved variables:

    Hk = [ Jk1⊤ Ωk Jk1  ⋯  Jk1⊤ Ωk Jkh  ⋯  Jk1⊤ Ωk Jkq
           ⋮                ⋮                ⋮
           Jkh⊤ Ωk Jk1  ⋯  Jkh⊤ Ωk Jkh  ⋯  Jkh⊤ Ωk Jkq
           ⋮                ⋮                ⋮
           Jkq⊤ Ωk Jk1  ⋯  Jkq⊤ Ωk Jkh  ⋯  Jkq⊤ Ωk Jkq ].

Each measurement introduces a finite number of off-diagonal components that depends quadratically on the number of variables influencing the measurement. Therefore, in the many cases where the number of measurements is proportional to the number of variables, such as SLAM or BA, the system matrix H is sparse, symmetric and positive semi-definite by construction. Exploiting these intrinsic properties, we can efficiently solve the linear system in Eq. (12).
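The following sketch of ours illustrates how the block contributions of one factor can be accumulated into a block-sparse H; a std::map keyed by variable-index pairs stands in for whatever sparse-block container an actual solver would use (assumes Eigen):

```cpp
// Accumulate the block contributions H_k = J_k^T Omega J_k of one factor
// (Eq. 39) into a block-sparse system matrix.
#include <Eigen/Dense>
#include <map>
#include <utility>
#include <vector>

using Block = Eigen::MatrixXd;
// key = (row variable index, column variable index)
using BlockSparseMatrix = std::map<std::pair<int, int>, Block>;

// J_blocks holds, for one factor, the pairs (variable index, Jacobian block)
void addFactorContribution(const std::vector<std::pair<int, Block>>& J_blocks,
                           const Eigen::MatrixXd& Omega,
                           BlockSparseMatrix& H) {
  for (const auto& [i, J_i] : J_blocks)
    for (const auto& [j, J_j] : J_blocks) {
      if (j > i) continue;  // H is symmetric: store the lower triangle only
      Block contribution = J_i.transpose() * Omega * J_j;
      const auto key = std::make_pair(i, j);
      auto it = H.find(key);
      if (it == H.end()) H[key] = contribution;  // first visit: insert
      else it->second += contribution;           // otherwise: accumulate
    }
}
```

Only the blocks of variables that actually share a factor are ever created, which is precisely the sparsity pattern discussed above.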
In fact, the literature provides many solutions to this kind of problem, which can be arranged in two main groups: (i) iterative methods and (ii) direct methods. The former compute an iterative solution to the linear system by following the gradient of the quadratic form. These techniques often use a pre-conditioner - e.g. the Preconditioned Conjugate Gradient (PCG) - which scales the matrix so as to take steps along the steepest directions and achieve quicker convergence. Further information about these approaches can be found in [64]. Iterative methods might require quadratic time to compute the exact solution of a linear system, but they might be the only option when the dimension of the problem is very large, due to their limited memory requirements. Direct methods, instead, return the exact solution of the linear system, usually leveraging some matrix decomposition followed by back-substitution. Typical methods include the Cholesky factorization H = LL⊤ or the QR decomposition. A crucial parameter controlling the performance of a sparse direct linear solver is the fill-in, that is, the number of new non-zero elements introduced by the specific factorization. A key aspect in reducing the fill-in is the ordering of the variables. Since computing the optimal reordering is NP-hard, approximated techniques [65]-[67] are generally employed. Fig. 7 shows the effect of different variable orderings on the same system matrix. A larger fill-in results in more demanding computations. We refer to [68] and [69] for a more detailed analysis of this topic.

F. A Unifying Formalism: Factor Graphs

In this section, we introduce a formalism to represent a super-class of the minimization problems discussed so far: factor graphs. We recall that i) our state x = {x1:N} is composed of N variables, ii) the conditional probabilities p(zk|x) might depend only on a subset of the state variables xk = {xk1, xk2, …, xkq} ⊆ x and iii) we have no prior about the state, p(x) = U(x). Given this, we can expand Eq. (1) as follows:

    p(x|z) ∝ ∏k=1..K p(zk|xk) = ∏k=1..K p(zk|xk1, xk2, …, xkq).   (40)

Eq. (40) expresses the likelihood of the measurements as a product of factors. This concept is elegantly captured by the factor graph formalism, which provides a graphical representation for this kind of problem. A factor graph is a bipartite graph where each node represents either a variable xi ∈ x or a factor p(zk|xk). Fig. 8 illustrates an example of a factor graph. Edges connect a factor p(zk|xk) with each of its variables xk = {xk1, xk2, …, xkq}. In the remainder of this document, we will stick to the factor graph notation and we will refer to the measurement likelihoods p(zk|xk) as factors. Note that the main difference between Eq. (40) and Eq. (14) is that the former highlights the subset of state variables xk on which the observation depends, while the latter considers all state variables - also those that have a null contribution.

Fig. 8: Illustration of a generic factor graph. Orange round nodes depict state variables x0:N. Each blue square node, instead, represents a factor p(zk|xk1, …, xkq); edges denote the variables in the conditionals of a factor {xk1, …, xkq}.

The aim of this section is to use the factor graph formulation to formalize the ILS minimization exposed so far. In this sense, Alg. 1 reports a step-by-step expansion of the vanilla GN algorithm exploiting the factor graph formalism, supposing that both states and measurements belong to a smooth manifold. In the remainder of this document, we indicate with bold uppercase symbols elements lying on a manifold space - e.g. X ∈ SE(3); lowercase bold symbols specify their corresponding vector perturbations - e.g. ∆x ∈ Rⁿ. Going more into detail, at each iteration the algorithm re-initializes its workspace storing the current linear system (lines 6-7). Subsequently, it processes each measurement Zk (line 8), computing i) the prediction (line 9), ii) the error vector (line 10) and iii) the coefficients to apply the robustifier (lines 13-15). While processing a measurement, it also computes the blocks Jk,i of the Jacobians with respect to the variables Xi ∈ Xk involved in the factor. We denote with Hi,j the block (i, j) of the H matrix corresponding to the variables Xi and Xj; similarly, we indicate with bi the block of the coefficient vector for the variable Xi. This operation is carried out in lines 18-20. The contribution of each measurement to the linear system is added in a block fashion. Further efficiency can be achieved by exploiting the symmetry of the system matrix H, computing only its lower triangular part. Finally, once the linear system H∆x = −b has been built, it is solved using a general sparse linear solver (line 21), and the perturbation ∆x is applied to the current state in a block-wise fashion (line 23). The algorithm proceeds until convergence is reached, namely when the delta of the cost function F between two consecutive iterations is lower than a threshold ε (line 3).

Summarizing, instantiating Alg. 1 on a specific problem requires to:
– Define for each type of variable xi ∈ x i) an extended parametrization Xi, ii) a vector perturbation ∆xi and iii) a ⊞ operator that computes a new point on the manifold X′i = Xi ⊞ ∆xi. If the variable is Euclidean, the extended and the increment parametrizations match and, thus, ⊞ degenerates to vector addition.
– For each type of factor p(zk|xk), specify i) an extended parametrization Zi, ii) a Euclidean representation ∆zi for the error vector and iii) a ⊟ operator such that, given two points on the manifold Zi and Z′i, ∆zi = Z′i ⊟ Zi represents the motion on the chart that moves Zi onto Z′i. If the measurement is Euclidean, the extended and perturbation parametrizations match and, thus, ⊟ becomes a simple vector difference. Finally, it is necessary to define the measurement function hk(Xk) that, given a subset of state variables Xk, computes the expected measurement Ẑk.
– Choose a robustifier function ρk(u) for each type of factor. The non-robust case is captured by choosing ρk(u) = ½u².

Note that, depending on the choices and on the application, not all the steps indicated here are required. Furthermore, the value of some variables might be known beforehand - e.g. the initial position of the robot in SLAM is typically set at the origin. Hence, these variables do not need to be estimated in the optimization process, since they are constants in this context. In Alg. 1, fixed variables can be handled in the solution step - i.e. line 21 - by suppressing all block rows and columns in the linear system that arise from these special nodes. In the next section, we present how to formalize several common SLAM problems through the factor graph formalization introduced so far.

Algorithm 1 Gauss-Newton minimization algorithm for manifold measurement and state spaces
Require: initial guess X̆; measurements C = {⟨Zk, Ωk⟩}
Ensure: optimal solution X⋆
 1:  Fold ← inf
 2:  Fnew ← 0
 3:  while Fold − Fnew > ε do
 4:      Fold ← Fnew
 5:      Fnew ← 0
 6:      b ← 0
 7:      H ← 0
 8:      for all k ∈ {1 … K} do
 9:          Ẑk ← hk(X̆k)
10:          ek ← Ẑk ⊟ Zk
11:          χk ← ek⊤ Ωk ek
12:          Fnew ← Fnew + χk
13:          uk ← √χk
14:          γk ← (1/uk) (∂ρk(u)/∂u)|u=uk
15:          Ω̃k ← γk Ωk
16:          for all Xi ∈ {Xk1 … Xkq} do
17:              J̃k,i ← (∂(hk(X̆ ⊞ ∆x) ⊟ Zk)/∂∆xi)|∆x=0
18:              for all Xj ∈ {Xk1 … Xkq} with j ≤ i do
19:                  Hi,j ← Hi,j + J̃k,i⊤ Ω̃k J̃k,j
20:              bi ← bi + J̃k,i⊤ Ω̃k ek
21:      ∆x ← solve(H∆x = −b)
22:      for all Xi ∈ X do
23:          X̆i ← X̆i ⊞ ∆xi
24:  return X̆

Fig. 9: Registration of two point clouds through ICP. The red points represent entries of the fixed cloud, while the blue points belong to the moving one. Green lines emphasize the associations between points belonging to the two clouds.

V. EXAMPLES

In this section we present examples of how to apply the methodology illustrated in Sec. IV-F to typical problems in robotics, namely: Point-Cloud Registration, Projective Registration, BA and PGO.

A. ICP

ICP represents a family of algorithms used to compute a transform that maximizes the overlap between two point clouds. Let P^f be the cloud that stays fixed and P^m be the one that is moved to maximize the overlap. ICP algorithms achieve this goal by progressively refining an initial guess of the target transformation X, alternating two phases: data association and optimization. The aim of the data association is to find a point p^f_i ∈ P^f that is likely to be the same as the point p^m_j ∈ P^m being transformed according to X, see Fig. 9. Note that several heuristics to determine the data association have been proposed by the research community, depending on the context of the problem. The most common ones are either geometry-based - i.e. nearest neighbor, normal shooting, projective association - or rely on appearance-based evaluations. Discussing data association strategies is out of the scope of this work; still, we can generalize the outcome of data association by a selector function j(k) ∈ {1, …, |P^f|} that maps a point index k in the moving cloud to an index j in the fixed cloud. In this way, we indicate a pair of corresponding points as ⟨p^m_k, p^f_j(k)⟩; a brute-force version of such a selector is sketched below. In contrast to data association, the optimization step is naturally described as an ILS problem. The variable to be estimated is a transform X whose domain depends on the specific scenario - e.g. SE(2), SE(3), or even a Similarity if the two clouds are at different scales. In the remainder of this section, we will use X ∈ SE(3) to instantiate our factor-graph-based ILS problem.
handled in the solution step - i.e. line 21 - suppressing all ILS problem.
1) Variables: Since the transformation we should estimate
is a 3D Isometry X ∈ SE(3), our state lies on a smooth man-
ifold. Therefore we should define all the entities specified
in Sec. IV-F, namely:
– Extended Parameterization: we conveniently define a
transformation X = [R | t] ∈ SE(3) as a rotation
matrix R and a translation vector t. Using this notation,
the following relations hold:
 
X12 = X1 X2 , R1 R2 t1 + R1 t2 (41)
−1
 ⊤ ⊤

X , R −R t . (42)

– Perturbation Vector: a commonly used vector


parametrization is ∆x⊤ = [∆t⊤ ∆a⊤ ] ∈ R6 ,
where ∆t ∈ R3 represents a translation, while
Fig. 10: Projective Registration scenario: I represents the
∆a ∈ R3 is a minimal representation for the rotation.
image plane; blue points represent 3D entries of the moving
The latter might use Euler angles, unit-quaternion or
cloud, while red stars indicate the projection of each corre-
the logarithm of the rotation matrix.
sponding 3D point onto I. Finally, green lines emphasize
– X ⊞ ∆x Operator: this is straightforwardly imple-
the associations between moving cloud and fixed image
mented by first computing the transformation ∆X =
projections.
[∆R ∆t] ∈ SE(3) from the perturbation, and then
applying such a perturbation to the previous transform.
In formulæ:
The Jacobians can be computed analytically very straightfor-
X ⊞ ∆x = v2t(∆x)X (43) wardly from Eq. (45) as:

where v2t(∆x) computes a transform ∆X from a ∂ (X ⊞ ∆x)−1 p


Jicp (X, p) = . (46)
perturbation vector ∆x. Its implementation depends on ∂∆x ∆x=0
the parameters chosen for the rotation part ∆a. Note With this in place, we can now fully instantiate Alg. 1.
that, the perturbation might be applied to the left or to For completeness, in the appendix we report the functions
the right of the initial transformation. In this document v2t(·) and t2v(·) for SE(3) objects, together with the an-
we will consistently apply it to the left. Finally, we alytical derivation of the Jacobians. Since in this case the
define also the inverse function ∆x = t2v(∆X), that measurement is Euclidean, the Jacobians of error function
computes perturbation vector from the transformation and measurement function are the same.
matrix such that ∆x = t2v(v2t(∆x)).
2) Factors: In this problem, we have just one type of B. Projective Registration
factor, which depends on the relative position between a
Projective Registration consists in determining the pose X
pair of corresponding points, after applying the current
of a camera in a known 3D scene from a set of 2D projections
transformation X to the moving cloud. Given a set of
of these points on the image plane. In this case our fixed
associations {hpm f m f
s , pj(s) i, . . . , hpK , pj(K) i}, each fixed point
f
point cloud P f will be consisting of the image projections,
pj(k) constitutes a measurement zk - since its value does not while the moving one P m will be composed by the known
change during optimization. On the contrary, each moving location of the 3D points, see Fig. 10. We use the notation for
point pmk will be used to generate the prediction ẑ. Note that, data-association defined in Sec. V-A, in which the function
the measurement space is Euclidean in ths scenario - i.e. R3 . j(i) retrieves the index of a 2D measurement on the image
Therefore, we only need to define the following entities: that corresponds to the 3D point pm m
i ∈ P . Also in this
– Measurement Function: it computes the position of a scenario, the only variable to estimate is the transformation
point pm f
k that corresponds to the point pj(k) in fixed X ∈ SE(3), therefore we will simply re-use the entities
scene by applying the transformation X, namely: defined in Sec. V-A.1 and focus only on the factors.
1) Factors: Given a set of 2D-3D associations
hicp
k (X) , X
−1 m
pk = R⊤ (pm
j(k) − t) (44) {hpm f m f m 2
s , pj(s) i, . . . , hpK , pj(K) i}, each fixed point pk ∈ R
will represent a measurement zk , each moving point will
– Error Function: since both prediction and measurement
contribute to the prediction ẑk . Therefore we can define:
are Euclidean, the ⊟ operator boils down to simple
vector difference. The error, thus, is a 3-dimensional – Measurement Function: it is the projection on the image
vector computed as: plane of a scene point pmk , assuming the camera is at
X. Such a prediction is obtained by first mapping the
eicp f
k (X) = hk (X) − pj(k) . (45) point in the camera reference frame to get a new point
picp , and then projecting this point on the image plane,
in formluæ:
picp , X−1 pm (47)
cam icp
p , Kp (48)
 cam cam 
px /pz
pimg cam
, hom(p ) = . (49)
pcam
y /pcam
z
Note that, pcam is the point in homogeneous image
coordinates, while pimg is the 2D point in pixel co-
ordinates obtained normalizing the point through ho-
mogeneous division. Finally, the complete measurement
function is defined as:
hreg
k (X) , hom(KX pj(k) ) = hom(K[hicp (pm
−1 m
j(k) )]). Fig. 11: SfM scenario: blue dots represent 3D point in
(50)
the world, while red dots indicate their projection onto a
– Error Function: also in this case, both measurement and
specific image plane Ik . Colored lines emphasize the data
prediction are Euclidean vectors and, thus, we can use
association between 3D points and their corresponding image
the vector difference to compute the 2-dimensional error
projections.
as follows:
ereg reg
k (X) = hk (X) − zk (51)
Otherwise speaking, it is a 6N + 3M vector obtained
Note that, we can exploit the work done in Sec. V-A.2 to
by stacking the individual perturbations.
easily compute Jacobians using the chain-rule, namely:
– X ⊞ ∆x Operator: the operator will use the same
Jhom (pcam ) machinery introduced in Sec. V-A.1 for the poses and
z }| {
reg ∂hom(v) the standard Euclidean addition for the point positions.
J (X, p) = KJICP (X, p)
∂v v=pcam
2) Factors: Similar to Projective Registration, in BA a
measurement zk is a projection of a point on the 2D image
= Jhom (pcam )KJICP (X, p). (52)
plane. However, in this specific scenario, such a projection
C. Structure from Motion and Bundle Adjustment depends not only on the estimate of a camera pose but also
Structure from Motion (SfM) is the problem of determin- on the estimate of the point. Note that, this information was
ing the pose of N cameras and the position of M 3D points known in the case of Projective Registration, while now it
on a scene, from their image projections. The scenario is becomes part of the state. For consistency with Alg. 1, if the
shown in Fig. 11. This problem is highly non-convex, and k th measurement arises from observing the point xpm with the
tackling it with ILS requires to start from an initial guess not camera Xcn , we will denote these indices with two selector
too far from the optimum. Such a guess is usually obtained functions n = n(k) and m = m(k), that map the factor
by using Projective Geometry techniques to determine an index k respectively to the indices of the observing camera
initial layout of the camera poses. Subsequently, the points and the observed point. For the k th factor, the camera and
are triangulated to initialize all variables. A final step of the point variables will then be xpm(k) and Xcn(k) . Note that, also
algorithm consists in performing a non-linear refinement of in this case, a measurement zk lies in an Euclidean space -
such an initial guess - known as BA - which is traditionally i.e. R2 . Given this, to instantiate a factor we define:
approached as an ILS problem. Since typically each camera – Measurement Function: the prediction ẑk can be easily
observes only a subset of points, and a point projection obtained from Eq. (50), namely:
depends only on the relative pose between the observed point p
hba ba c
k (X) , hk (Xn(k) , xm(k) )
and the observing camers, BA is a good example of a sparse  
problem. = hom K(Xcn(k) −1 )xpm(k) . (53)
1) Variables: We want to estimate the pose of each
camera Xc1:N , and the position of each point xp1:M . The – Error Function: it is the Euclidean difference between
state will thus be a data structure X = hXc1:N , xp1:M i storing prediction and measurement:
all camera poses and all points. Given this, for the camera  
p
eba c ba
k (Xn(k) , xm(k) ) , hk Xcn(k) , xpm(k) − zk . (54)
poses Xc1:N , the definitions in Sec. V-A.1 will be used
again. As for the points, we do not need a specific extended In this context, the Jacobian Jba
k (X) will be consisting of two
parametrization, since they lie on ℜ3 . Therefore we should blocks, corresponding to the perturbation of the camera pose
define only: and to the perturbation of the point position, in formulæ:
– Perturbation Vector: the total perturbation vector is  
ba
defined as Jba
k = 0 · 0 Jk,n(k) 0 · 0 Jba k,m(k) 0 · 0
 (55)
∆x⊤ = ∆xc⊤ 1 ·· ∆xc⊤ N | ∆xp⊤ 1 ·· ∆xp⊤ M .
where – Error Function: in this case, since the measurements are
non-Euclidean too, we are required to specify a suitable
∂eba c
k (Xn(k) ⊞ ∆x r
, xpm(k) )
Jba parametrization for the error vector ek . In literature,
k,n(k) (X) = (56)
∂∆xc many error vectorization are available [62], each one
∆xc =0
p p with different properties. Still, in this document, we will
∂eba r
k (Xn(k) , xm(k) ⊞ ∆x )
Jba
k,m(k) (X) = . (57) make use of the same 6-dimensional parametrization
∂∆xp
∆xp =0 used for the increments - i.e. e⊤ = (exyz⊤ erpy⊤ ).
Again, the measurement domain is Euclidean, thus the Ja- Furthermore, we need to define a proper ⊟ operator
cobians of the error function and the measurement function that expresses on a chart the relative pose between two
match. For completeness, in the Appendix of this document SE(3) objects ∆z = Ẑ ⊟ Z. To achieve this goal, we
we report a more in-depth derivation of the Jacobians. i) express Ẑ in the reference system of Z obtaining the
Still, since all measurements are relative, given a particular relative transformation ∆Z and then ii) compute the
solution X⋆ all solutions X′ = TX⋆ obtained by applying chart coordinates of ∆Z around the origin using the
a transformation T ∈ SE(3) to all the variables in X⋆ have t2v(·) function. In formluæ:
 
the same residual χ2 and, thus, are equivalent. Furthermore, ∆z , Ẑ ⊟ Z = t2v(∆Z) = t2v Z−1 Ẑ . (59)
all solutions X′ = sX⋆ obtained by scaling all poses and
landmarks by a constant s are equivalent too. This reflects Note that, since ∆Zk expresses a relative motion be-
the fact that observing an object that is twice as big from tween prediction and measurement, its rotational com-
twice the distance results in the same projection. Thence, ponent will by away from singularities. With this in
the problem of BA is under-determined by 7 Degrees-of- place, the error vector is computed as the pose of the
Freedom (DoF) and, thus, the vanilla GN algorithm requires prediction Ẑk = hpgo
k (X) on a chart centered in Zk ;
to fix at least 7 DoF - typically a camera pose (6 DoF), namely:
and the distance between two points or two camera poses (1
DoF). epgo pgo r r
k (X) , hk (Xn(k) , Xm(k) ) ⊟ Zk
 
= (Xrn(k) )−1 Xrm(k) ⊟ Zk . (60)
D. Pose Graphs
A pose graph is a factor graph whose variables Xr1:N Similar to the BA case, in PGO the Jacobian Jpgok (X) will be
are poses and whose measurements Z1:K are relative mea- consisting of two blocks, corresponding to the perturbation
surements between pairs of poses. Optimizing a pose graph of observed and the observing poses. The measurements in
means determining the configuration of poses that is max- this case are non-Euclidean, and, thus, we need to compute
imally consistent with the measurements. PGO is very the Jacobians on the error function - as specified in Eq. (60):
common in the SLAM community, and several ad-hoc ap-  
pgo pgo
Jpgo
k = 0 · 0 Jk,n(k) 0 · 0 Jk,m(k) 0 · 0
proaches have been proposed. Similar to BA, PGO is highly
non-convex, and its solution with ILS requires a reasonably (61)
good initial guess. where
1) Variables: Also in PGO, each variable Xrk lies on
∂epgo r r r
k (Xn(k) ⊞ ∆x , Xm(k) )
the smooth manifold SE(3). Once again, we will make Jpgo
k,n(k) (X) =
∂∆xr
(62)
use of the formulation used in Sec. V-A.1 to characterize ∆xr =0
the state X = Xr1:N and the perturbation vector ∆xrT = ∂epgo r r r
k (Xn(k) , Xm(k) ⊞ ∆x )
(∆xrT rT
1 . . . ∆xN ). Jpgo
k,m(k) (X) = (63)
2) Factors: Using the same index notation in Sec. V-C.2, ∂∆xr
∆xr =0
let Zk be the k th relative pose measurement expressing the Analogous to the BA case, also in PGO all measurements
pose Xm in the reference frame of the pose Xn . We denote are relative, and, hence, all solutions that are related by a
the pair of poses as Xn = Xn(k) , and Xm = Xm(k) using single transformation are equivalent. The scale invariance,
the two selector functions n(k) and m(k). In this scenario, however, does not apply in this context. As a result, PGO is
a measurement Zk expresses a relative pose between two under-determined by 6 DoF and using GN requires to fix at
variables and, consequently, also Zk lies on the smooth least one of the poses.
manifold SE(3). Considering this, we define the following
entities: E. Considerations
– Measurement Function: this is straightforwardly ob- In general, one can carry on the estimation by using an
tained by expressing the observed pose Xrm(k) in the arbitrary number of heterogeneous factors and variables. As
reference frame of the observing pose Xrn(k) , namely: an instance, if we want to augment a BA problem with
  odometry, we can model the additional measurements with
hpgo
k (X) , hk
pgo
Xrn(k) , Xrm(k) PGO factors connecting subsequent poses. Similarly, if we
want to solve a Projective Registration problem where the
= (Xrn(k) )−1 Xrm(k) . (58) world is observed with two cameras, and we have guess
of the orientation from an inertial sensor, we can extend coefficients are null. In these scenarios, the time spent to
the approach presented in Sec. V-B by conducting the solve the linear system dominates over the time required to
optimization on a common origin of the rigid sensor system, build it.
instead of the camera position. We will have three types of A typical aspect that hinders the implementation of a ILS
factors, one for each camera, and one modeling the inertial algorithm by a person approaching this task for the first
measurements. time is the calculation of the Jacobians. The labor-intensive
As a final remark, common presentations of ICP, Pro- solution is to compute them analytically, potentially with the
jective Registration and BA conduct the optimization by aid of some symbolic-manipulation package. An alternative
estimating world-to-sensor frame, rather than the sensor- solution is to evaluate them numerically, by calculating the
to-world, as we have done in this document. This leads Jacobians column-by-column with repeated evaluation of the
to a more compact formulation. This avoids inverting the error function around the linearization point. Whereas this
transform to compute the prediction, and results in Jacobians practice might work in many situations, numerical issues can
are easier to compute in close form. We preferred to provide arise when the derivation interval is not properly chosen. A
the solution for sensor-to-world to be consistent with the third solution is to delegate the task of evaluating the analytic
PGO formulation. solution directly to the program, starting from the error
function. This approach is called AD and Ceres Solver [10]
VI. A GENERIC S PARSE /D ENSE M ODULAR L EAST
is the most representative system to embed this feature - later
S QUARES S OLVER also adopted by other optimization frameworks.
The methodology presented in Sec. IV-F outlines a straight In the remainder of this section, we first revisit and
path to the design of an ILS optimization algorithm. Robotic generalize Alg. 1 to support multiple solution strategies.
applications often require to run the system on-line, and, Subsequently, we outline some design requirements that will
thus, they need efficient implementations. When extreme finally lead to the presentation of the overall design of our
performances are needed, the ultimate strategy is to overfit approach - proposed in Sec. VII.
the solution to the specific problem. This can be done both
at an algorithmic level and at an implementation level. To A. Revisiting the Algorithm
improve the algorithm, one can leverage on the a-priori
known structure of the problem, by removing parts of the In the previous section, we presented the implementation
algorithm that are non needed or by exploiting domain- of a vanilla GN algorithm for generic factor graphs. This
specific knowledge. A typical example is when the structure simplistic scheme suffers under high non linearities, or when
of the linear system is known in advance - e.g. in BA - where the cost function is under-determined. Over time, alternatives
it is common to use specialized methods to solve the linear to GN have been proposed, to address these issues, such
system [70]. as LM or Trusted-Region Method (TRM) [71]. All these
Focusing on the implementation, we reported two main algorithms present some common aspects or patterns that
bottlenecks: the computation of the linear system H∆x = b can be exploited when designing an optimization system.
and its solution. Dense problems such as ICP, Sensor Cali- Therefore, in this section, we reformulate Alg. 1 to isolate
bration or Projective Registration, are typically characterized different independent sub-modules. Finally we present both
by a small state space and many factors of the same type. In the GN and the LM algorithms rewritten by using these sub-
ICP, for instance, the state contains just a single SE(3) object modules.
- i.e. the robot pose. Still, this variable might be connected In Alg. 2 we isolate the operations needed to compute the
to hundreds of thousands of factors, one for each point cor- scaling factor γk for the information matrix Ωk , knowing
respondence. Between iterations, the ICP mechanism results the current χ2k . Alg. 3 performs the calculation of the error
in these factors to change, depending on the current status of ek and the Jacobian Jk for a factor hZk , Ωk i at the current
the data association. As a consequence, these systems spend linearization point. Alg. 4 applies the robustifier to a factor,
most of their time in constructing the linear system, while and updates the linear system. Alg. 5 applies the perturbation
the time required solve it is negligible. Notably, applications ∆x to the current solution X̆ to obtain an updated estimate.
such as Position Tracking or VO require the system to run Finally, in Alg. 6 we present a revised version of Alg. 1 that
at the sensor frame-rate, and each new frame might take relies on the modules described so far. In Alg. 7, we provide
several ILS iterations to perform the registration. On the an implementation of the LM algorithm that makes use of the
contrary, sparse problems like , PGO or large scale BA are same core sub-algorithms used in Alg. 6. The LM algorithm
characterized by thousands of variables, and a number of solves a damped version of the system, namely (H + λ ·
factors which is typical in the same order of magnitude. In diag(H))∆x = b. The magnitude of the damping factor λ
this context, a factor is connected to very few variables. As an is adjusted depending on the current variation of the χ2 . If the
example, in case of PGO, a single measurement depends only χ2 increases upon an iteration, λ increases too. In contrast, if
two variables that express mutually observable robot poses, the solution improves, λ is decreased. Variants of these two
whereas the complete problem might contain a number of algorithms - e.g. damped GN, that solves (H+λI)∆x = b -
variables proportional to the length of the trajectory. This can be straightforwardly implemented by slight modification
results in a large-scale linear system, albeit most of its to the algorithm presented here.
Algorithm 2 robustify(χ2k ) – computes the robustification Algorithm 6 gaussN(X̆, C) – Gauss-Newton minimization
coefficient γk algorithm for manifold measurements and state spaces
Require: Current χ2k . Require: Initial guess X̆; Measurements C = {hZk , Ωk i}
Ensure: γk computed from the actual error, Ensure: Optimal solution X⋆

1: uk ← χk 1: Fold ← inf, Fnew ← 0
∂ρk (u)
2: γk = u1 ∂u 2: while Fold − Fnew > ǫ do
k u=uk
3: return γk 3: Fold ← Fnew , Fnew ← 0, b ← 0, H ← 0
4: for all k ∈ {1 ... K} do
5: hχk , Hk , bk i ←
Algorithm 3 linearize(X̆k , Z̆k ) – computes the error ek and updateHb(Hk , bk , Xk , Zk , Ωk )
the Jacobians Jk at the current linearization point X̆ 6: Fnew ← χk
Require: Initial guess X̆k ; Current measurement Z̆k ; 7: ∆x ← solve(H∆x = −b)
Ensure: Error: ek ; Jacobians J˜k ; 8: X̆ ← updateSolution(X̆, ∆x)
1: Ẑk ← hk (X̆k ) 9: return X̆
2: ek ← Ẑk ⊟ Zk
3: J̃k = {}
4: for all Xi ∈ {Xk1 ... Xkq } do Algorithm 7 levenbergM(X̆, C) – Levenberg-Marquardt
5: J̃k,i ← ∂hk (X⊞∆x)⊟Z k minimization algorithm for manifold measurements and state
∂∆xi ∆x=0 spaces
6: J̃k ← J̃k ∪ {J̃k,i }
Require: Initial guess X̆; Measurements C = {hZk , Ωk i};
7: return < ek , Jk >
Maximum number of internal iteration tmax
Ensure: Optimal solution X⋆
Algorithm 4 updateHb(H, b, X̆k , Zk , Ωk ) – updates linear 1: Fold ← inf, Fnew ← 0, Finternal ← 0
system with a factor current linearization point X̆, and 2: X̆backup ← X̆
returns the χ2k of the factor 3: λ ← initializeLambda(X̆, C)
4: while Fold − Fnew < ǫ do
Require: Initial guess X̆k ; Coefficients of the linear system 5: Fold ← Fnew , Fnew ← 0, b ← 0, H ← 0
H and b; Measurement hZk , Ωk i 6: for all k ∈ {1 . . . K} do
Ensure: Coefficients of the linear system after the update 7: hχ, H, bi ← updateHb(H, b, Xk , Zk , Ωk )
H and b; Value of the cost function for this factor χ2k 8: Finternal ← χ
1: < ek , Jk >= linearize(X̆k , Zk )
9: t←0
2: χ2k ← eTk Ωk e k 10: while t < tmax ∧ t > 0 do
3: γk = robustify(χ2k )
11: ∆x ← solve((H + λI)∆x = −b)
4: Ω̃k = γk Ωk
12: X̆ ← updateSolution(X̆, ∆x)
5: for all Xi ∈ {Xk1 ... Xkq } do
13: hχ, H, bi ← updateHb(H, b, Xk , Zk , Ωk )
6: for all Xj ∈ {Xk1 ... Xkq } and j <= i do
14: Fnew ← χ
7: Hi,j ← Hi,j + J⊤ k,i Ω̃k Jk,j

15: if Fnew − Finternal < 0 then
8: bi ← bi + Jk,i Ω̃i ek 16: λ ← λ/2
9: return < χ2k , H, b > 17: X̆backup ← X̆
18: t←t−1
19: else
Algorithm 5 updateSolution(X̆, ∆x) – applies a perturba- 20: λ←λ·2
tion to the current system solution 21: X̆ ← X̆backup
Require: Current solution X̆; Perturbation ∆x 22: t←t+1
Ensure: New solution X̆, moved according to ∆x 23: return X̆
1: for all Xi ∈ X do
2: X̆i ← X̆i ⊞ ∆xi
3: return X̆ work. Although most of these requirements indicate good
practices to be followed in potentially any new development,
we highlight here their role in the context of a solver design.
B. Design Requirements a) E ASY TO USE AND S YMMETRIC API: As users
While designing our system, we devised a set of require- we want to configure, instantiate and run a solver in the
ments stemming from our experience both as developers and same manner, regardless to the specific problem to which
as users. Subsequently, we turned these requirements in some is applied. Ideally, we do not want the user to care if
design choices that lead to our proposed optimization frame- the problem is dense or sparse. Furthermore, in several
practical scenarios, one wants to change aspects of the solver by substituting the approximated Hessian Hk = JTk Ωk Jk
while it runs - e.g. the minimization algorithm chosen, the ∂ 2 ek
with the analytic one Hk = ∂∆x 2 . This feature captures
k
robust kernel or the termination criterion. Finally we want to second order approaches such as NDT [72] in the language
save/retrieve the configuration of a solver and all of its sub- of our API.
modules to/from disk. Thence, the expected usage pattern d) M INIMIZE C ODEBASE: The likelihood of bugs in
should be: i) load the specific solver configuration from disk the implementation grows with the size of the code-base.
ed eventually tune it, ii) assign a problem to the solver or load For small teams characterized by a high turnover - like the
it from file, iii) compute a solution and eventually iv) provide ones found in academic environments - maintaining the code
statistics about the evolution of optimization. Note that, many becomes an issue. In this context we choose to favor the
current state-of-the-art ILS solver allow to easily perform the code reuse in spite of a small performance gain. The same
last 3 steps of this process, however, they do not provide the class used to implement an algorithm, a variable type or
ability of permanently write/read their configuration on/from a robustifier should be used in all circumstances - namely
disk - as our system does. sparse and dense problems - where it is needed.
b) I SOLATING PARAMETERS , WORKING VARIABLES
AND PROBLEM DESCRIPTION / SOLUTION : When a user is
VII. I MPLEMENTATION
presented to a new potentially large code-base, having a
clear distinction between what the variables represent and As support material for this tutorial, we offer an own
how they are used, substantially reduces the learning curve. implementation of a modular ILS optimization framework,
In particular we distinguish between parameters, working that has been designed around the methodology illustrated
variables and the input/output. Parameters are those objects in Section IV-F. Our system is written in modern C++17
controlling the behavior of the algorithm, such as number and provides static type checking, AD and a straightforward
of iterations or the thresholds in a robustifier. Parameters interface. The core of our framework fits in less than 6000
might include also processing sub-modules, such as the lines of code, while the companion libraries to support the
algorithm to use or the algebraic solver of the linear system. most common problems - e.g. 2D and 3D SLAM, ICP,
Summarizing, parameters characterize the behavior of the Projective and Dense Registration, Sensor Calibration - are
optimizer, independently from the input, and represent the contained in 6500 lines of code. Albeit originally designed
configuration that can be stored/retrieved from disk. as a tool for rapid prototyping, our system achieves a high
In contrast to parameters, working variables are altered degree of customization and competes with other state-of-
during the computation, and are not directly accessible to the-art systems in terms of performances.
the end user. Finally, we have the description of the problem Based on the requirements outlined in Sec. VI-B, we
- i.e. the factor graph, where the factors and the variables designed a component model, where the processing ob-
expose an interface agnostic to the approach that will be jects (named Configurable) can possess parameters, sup-
used to solve the problem. port dynamic loading and can be transparently serial-
c) T RADE - OFF D EVELOPMENT E FFORT / P ERFOR - ized. Our framework relies on a custom-built serialization
MANCE : Quickly developing a proof of concept is a valuable infrastructure that supports format independent serializa-
feature to have while designing a novel system. At the same tion of arbitrary data structures, named Basic Object
time, once a way to approach the problem has been found, Serialization System (BOSS). Furthermore, thanks
it becomes perfectly reasonable to invest more effort to to this foundation, we can provide both a graphical configu-
enhance its efficiency. rator - that allows to assemble and easily tune the modules
Upon instantiation, the system should provide an off-the of a solver - and a command-line utility to edit and run
shelf generic and fair configuration. Obviously, this might be configurations on the go.
tweaked later for enhancing the performances on the specific The goal of this section is to provide the reader with a
class of problems. A possible way to enhance performances quick overview of the proposed system, focusing on how
is by exploiting the special structure of a specific class the user interacts with it. Given the class-diagram illustrated
of problems, overriding the general APIs to perform ad- in Fig. 12, in the remainder we will first analyze the
hoc computations. This results in layered APIs, where the core modules of the solver and then provide two practical
functionalities of a level rely only on those of the level examples on how to use it.
below. As an example, in our architecture the user can either
specify the error function and let the system to compute the
A. Solver Core Classes
Jacobians using AD or provide the analytical expression of
the Jacobians if more performances are needed. Finally, the Our framework has been designed to satisfy the require-
user might intervene at a lower level, providing directly the ments stated in Sec. VI, embedding unified APIs to cover
contribution to the linear system Hk = JTk Ωk Jk given by both dense and sparse problems symmetrically. Furthermore,
the factor. In a certain class of problems also computing thanks to the BOSS serialization library, the user can generate
this product represents a performance penalty. An additional permanent configuration of the solver, to be later read and
benefit provided by this design is the direct support for reused on the go. The configuration of a solver generally
Newton’s method, which can be straightforwardly achieved embeds the following parameters:
Fig. 12: UML class diagram of our architecture. In green we show the type-independent classes, in pink the type-dependent
classes and in pale blue we outline potential specializations. The template class names end with an underscore and the
arguments are highlighted above the name, between angular brackets. Arrow lines denote inheritance, and diamonds
aggregation/ownership.

– Optimization Algorithm: the algorithm that performs the optimization. This class allows to select the type of
the minimization; currently, only GN and LM are sup- algorithm used within one iteration, which algorithm to use
ported, still, we plan to add also TRM approaches to solve the linear system, or which termination criterion to
– Linear Solver: the algebraic solver that computes the use. This mechanism is achieved by delegating the execution
solution of the linear system H∆x = −b; we embed a of these functions to specific interfaces. More in detail, the
naive AMD-based [65] linear solver together with other linear system is stored in a sparse-block-matrix structure,
approaches based on well-known highly-optimized lin- that effectively separates the solution of the linear system
ear algebra libraries - e.g. SuiteSparse 1 from the rest of the optimization machinery. Furthermore,
– Robustifier: the robust kernel function applied to a our solver supports incremental updates, and can provide
specific factor; we provide several commonly used an estimate of partial covariance blocks. Finally, our system
instances of robustifier, together with a modular mech- supports hierarchical approaches. In this sense the problem
anism to assign specific robustifier to different types of can be represented at different resolutions (levels), by using
factor - called robustifier policy different factors. When the solution at a coarse level is
– Termination Criterion: a simple modules that, based on computed, the optimization starts from the next denser level,
optimization statistics, checks whether convergence has enabling a new set of factors. In the new step, the initial
been reached. guess is computed from the solution of the coarser level.
Note that, in our architecture there is a clear separation The IterationAlgorithmBase class defines an interface
between solver and problem. In the next section we will for the outer optimization algorithm - i.e. GN or LM. To
describe how we formalized the latter. In the remaining of carry on its operations, it relies on the interface exposed
this section, instead, we will focus on the solver classes, by the Solver class. The latter is in charge to invoke the
which are in charge of computing the problem solution. IterationAlgorithmBase, which will run a single iteration
The class Solver implements a unified interface for our of its algorithm.
optimization framework. It presents itself to the user with an
unified data-structure to configure, control, run and monitor Class RobustifierBase defines an interface for using
arbitrary ρ(u) functions - as illustrated in Sec. IV-D. Robust
1 http://faculty.cse.tamu.edu/davis/suitesparse.html kernels can be directly assigned to factors or, alternatively,
the user might define a policy, that based on the sta- is provided with a structure on which to write the outcome
tus of the actual factor decides which robustifier to use. of the operation. A factor can be enabled or disabled. In
The definition of a policy is done by implementing the the latter case, it will be ignored during the computation.
RobustifierPolicyBase interface. Besides, upon update a factor might become invalid, if the
Finally, TerminationCriterionBase defines an interface result of the computation is meaningless. This occurs for
for a predicate that, exploiting the optimization statistics, instance in BA, when a a point is projected outside the image
detects whether the system has converged to a solution or plane.
a fatal error has occurred. The Factor_<VariableTupleType> class implements a
typed interface for the factor class. The user willing to
B. Factor Graph Classes
extend the class at this level is responsible of implementing
In this section we provide an overview of the top-level the entire FactorBase interface, relying on functions for
classes constituting a factor graph - i.e. the optimization typed access to the blocks of the system matrix H and
problem - in our framework. In specifying new variables of the coefficient vector b. In this case, the block size is
or factors, the user can interact with the system through a determined from the dimension of the perturbation vector
layered interface. More specifically, factors can be defined of the variables in the template argument list. We extended
using AD and, thus, contained in few lines of code for the factors at this level to implement approaches such as
rapid prototyping or the user can directly provide how to dense multi-cue registration [31]. Special structures in the
compute analytic Jacobians if more speed is required. Fur- Jacobians can be exploited to speed up the calculation of
thermore, to achieve extreme efficiency, the user can choose Hk whose computation has a non negligible cost.
to compute its own routines to update the quadratic form The ErrorFactor_<ErrorDim, VariableTypes...> class
directly, consistently in line with our design requirement of specializes a typed interface for the factor class, where the
more-work/more-performance. Note that, we observed in our user has to implement both the error function ek and the
experiments that in large sparse problems the time required Jacobian blocks Jk,i . The calculation of the H and the b
to linearize the system is marginal compared to the time blocks is done through loops unrolled at compile time since
required to solve it. Therefore, in most of these cases AD the types and the dimensions of the variables/errors are part
can be used without significant performance losses. of the type.
1) Variables: The VariableBase implements a base The ADErrorFactor_<Dim, VariableTypes...> class fur-
abstract interface the variables in a factor graph, whereas ther specializes the ErrorFactor_. Extending the class at this
Variable_<PerturbationDim,EstimateType> specializes level only required to specify only the error function. The
the base interface on a specific type. The definition of a new Jacobians are computed through AD, and the updates of H
variable extending the Variable_ template requires the user and the b are done according to the base class.
to specify i) the type EstimateType used to store the value Finally, the FactorCorrespondenceDriven_<FactorType>
of the variable Xi , ii) the dimension PerturbationDim implements a mechanism that allows the solver to iterate
of the perturbation ∆xi and iii) the ⊞ operator. This is over multiple factors of the same type and connecting the
coherent with the methodology provided in Sec. IV-F. In same set of variables, without the need of explicitly storing
addition to these fields, variable has an integer key, to be them in the graph. A FactorCorrespondenceDriven_ is
uniquely identified within a factor graph. Furthermore, a instantiated on a base type of factor, and it is specialized by
variable can be in either one of these three states: defining which actions should be carried on as a consequence
– Active: the variable will be estimated of the selection of the “next” factor in the pool by the
– Fixed: the variable stays constant through the optimiza- solver. The solver sees this type of factor as multiple ones,
tion albeit a FactorCorrespondenceDriven_ is stored just once
– Disabled: the variable is ignored and all factors that in memory. Each time a FactorCorrespondenceDriven_ is
depend on it are ignored as well. accessed by the solver a callback changing the internal
To provide roll-back operations - such as those required by parameters is called. In its basic implementation this class
LM - a variable also stores a stack of values. takes a container of corresponding indices, and two data
To support AD, we introduce the ADVariable_ template, containers: Fixed and Moving. Each time a new factor within
that is instantiated on a variable without AD. Instantiating a the FactorCorrespondenceDriven_ is requested, the factor
variable with AD requires to define the ⊞ operator by using is configured by: selecting the next pair of corresponding
the AD scalar type instead of the usual float or double. indices from the set, and by picking the elements in Fixed
This mechanism allows us to mix in a problem factors that and Moving at those indices. As an instance, to use our solver
require AD with factors that do not. within an ICP algorithm, the user has to configure the factor
2) Factors: The base level of the hierarchy is the by setting the Fixed and Moving point clouds. The corre-
FactorBase. It defines a common interface for this type of spondence vector can be changed anytime to reflect a new
graph objects. It is responsible of i) computing the error - data association. This results in different correspondences to
and, thus, the χ2 - ii) updating the quadratic form H and be considered at each iteration.
the right-hand side vector b and iii) invoking the robustifier 3) FactorGraph: To carry on an iteration, the solver has
function (if required). When an update is requested, a factor to iterate over the factors and, hence, it requires to randomly
D ATASET S ENSOR VARIABLES FACTORS
access the variables. Restricting the solver to access a graph ICL-NUIM [74] RGB-D 1 307200
through an interface of random access iterators enables us to ETH-Hauptgebaude [75] Laser-scanner 1 189202
decouple the way the graph is accessed from the way it is ETH-Apartment [75] Laser-scanner 1 370276
stored. This would allow us to support transparent off-core Stanford-Bunny [76] 3D digitalizer 1 35947
storage that can be useful on very large problems. TABLE I: Specification of the datasets used to perform dense
A FactorGraphInterface defines the way to access a benchmarks.
graph. In our case we use integer values as key for variables
and factors. The solver accesses a graph only through the
FactorGraphInterface and, thence, it can read/write the range scans. Therefore, in such cases, we constructed the
value of state variables, read the factors, but it is not allowed ICP problem as follows:
to modify the graph structure. – reading of the first raw scan and generate a point cloud
A heap-based concrete implementation of a factor graph – transformation of the point cloud according to a known
is provided by the FactorGraph class, that specializes the isometry TGT
interface. The FactorGraph supports transparent serializa- – generation of perfect association between the two clouds
tion/deserialization. Our framework makes use of the open- – registration starting from Tinit = I.
source math library Eigen [73], which provides fast and Since our focus is on the ILS optimization, we used the same
easy matrix operation. The serialization/deserialization of set of data-associations and the same initial guess for all
variable and factors that are constructed on Eigen types is approaches. This methodology renders the comparison fair
automatically handled by our BOSS library. and unbiased. As for the ICL-NUIM dataset, since obtained
In sparse optimization it is common to operate on a the raw point cloud unprojecting the range image of the first
local portion of the entire problem. Instrumenting the solver reading of the lr-0 scene. After this initial preprocessing,
with methods to specify the local portions would bloat the the benchmark flow is the same described before.
implementation. Alternatively we rely on the concept of In this context, we compared i) the accuracy of the solution
FactorGraphView that exposes an interface on a local portion obtained computing the translational and rotational error
of a FactorGraph - or of any other object implementing the of the estimate and ii) the time required to achieve that
FactorGraphInterface. solution. We compared the recommended PCL registration
VIII. E XPERIMENTS suite - that uses the Horn formulas - against our framework
with and without AD. Furthermore, we also provide results
In this section we propose several comparisons between obtained using PCL implementation of the LM optimization
our framework and other state-of-the-art optimization sys- algorithm.
tem. The aim of these experiments is to support the claims on As reported in Tab. II, the final registration error is almost
the performance of our framework and, thus, we focused on negligible in all cases. Instead, in Fig. 13 we document
the accuracy of the computed solution and the time required the speed of each solver. When using the full potential of
to achieve it. Experiments have been performed both on our framework - i.e. using analytic Jacobians - it is able
dense scenarios - such as ICP - and sparse ones - e.g. PGO to achieve results in general equal or better than the off-
and PLGO. the-shelf PCL registration algorithm. Using AD has a great
A. Dense Problems impact on the iteration time, however, our system is able to
be faster than the PCL implementation of LM also in this
Many well-known SLAM problems related to the front- case.
end can be solved exploiting the ILS formulation introduced
before. In such scenarios - e.g.point-clouds registration - B. Sparse Problems
the number of variables is small compared to the observa- Sparse problems are mostly represented by generic global
tions’ one. Furthermore, at each registration step, the data- optimization scenarios, in which the graph has a large
association is usually recomputed to take advantage of the number of variables while each factor connects a very small
new estimate. In this sense, one has to build the factor graph subset of those (typically two). In this kind of problems, the
associated to the problem from scratch at each step. In such graph remains unchanged during the iterations, therefore, the
contexts, the most time consuming part of the process is most time-consuming part of the optimization is the solution
represented by the construction of linear system in Eq. (19) of the linear system not its construction. PGO and PLGO
and not its solution. are two instances of this problem that are very common in
To perform dense experiments, we choose a well-known the SLAM context and, therefore, we selected these two to
instance of this kind of problems: ICP. We conducted mul- perform comparative benchmarks.
tiple tests, comparing our framework to the current state-of- 1) Pose-Graph Optimization: PGO represents the back-
the-art PCL library [15] on the standard registration datasets bone of SLAM systems and it has been well investigated by
summarized in Tab. I. In all the cases, we setup a con- the research community. For these experiments, we employed
trolled benchmarking environment, to have a precise ground- standard 3D PGO benchmark datasets - all publicly avail-
truth. In the ETH-Hauptgebaude, ETH-Apartment and able [62]. We added to the factors Additive White Gaussian
Stanford-Bunny, the raw data consists in a series of Noise (AWGN) and we initialized the graph using the
Time per Iteration Cumulative Time
2.5 14

12
2

10

1.5
8
seconds

seconds
6
1

0.5
2

0 0
PCL PCL-LM our our-AD PCL PCL-LM our our-AD

(a) Iteration time: ICL-NUIM - lr-0 (b) Cumulative time: ICL-NUIM - lr-0
Time per Iteration Cumulative Time
0.8 3

0.7
2.5
0.6

0.5
2
0.4
seconds

seconds
0.3 1.5

0.2
1
0.1

0
0.5
-0.1

-0.2 0
PCL PCL-LM our our-AD PCL PCL-LM our our-AD

(c) Iteration time: ETH-Hauptgebaude (d) Cumulative time: ETH-Hauptgebaude


Time per Iteration Cumulative Time
1.6 6

1.4
5
1.2

1
4
0.8
seconds

seconds

0.6 3

0.4
2
0.2

0
1
-0.2

-0.4 0
PCL PCL-LM our our-AD PCL PCL-LM our our-AD

(e) Iteration time: ETH-Apartment (f) Cumulative time: ETH-Apartment


Time per Iteration Cumulative Time
0.1 1.8

1.6
0.08
1.4
0.06
1.2

0.04 1
seconds

seconds

0.02 0.8

0.6
0
0.4
-0.02
0.2

-0.04 0
PCL PCL-LM our our-AD PCL PCL-LM our our-AD

(g) Iteration time: Stanford-Bunny (h) Cumulative time: Stanford-Bunny

Fig. 13: Timing analysis of the ILS optimization. On the left column are reported the mean and standard deviation of a full
ILS iteration - computed over 10 total iterations. On the right column, instead, the cumulative time to perform all 10 ILS
iterations.
Time per Iteration Cumulative Optimization Statistics
0.35 35

0.3 30

0.25 25

0.2 20
seconds

seconds
0.15 15

0.1 10

0.05 5

0 0
ceres gtsam g2o our ceres gtsam g2o our

(a) Iteration time: kitti-00. (b) Cumulative time: kitti-00.

Time per Iteration Cumulative Optimization Statistics


0.7 70

60
0.6

50
0.5

40
seconds

seconds

0.4
30

0.3
20

0.2
10

0.1 0
ceres gtsam g2o our ceres gtsam g2o our

(c) Iteration time: sphere-b. (d) Cumulative time: sphere-b.

Time per Iteration Cumulative Optimization Statistics


0.16 16

0.14 14

12
0.12

10
0.1
seconds

seconds

8
0.08
6

0.06
4

0.04 2

0.02 0
ceres gtsam g2o our ceres gtsam g2o our

(e) Iteration time: torus-b. (f) Cumulative time: torus-b.

Fig. 14: Timing analysis of different optimization frameworks. The left column reports the mean and standard deviation of
the time to perform a complete LM iteration. The right column, instead, illustrates the total time to reach convergence -
mean and standard deviation.

breadth-first initialization. We report in Tab. III the complete noise imposed on the factors, we performed experiments
specifications of the datasets employed together with the over 10 noise realizations and we report here the statistics
noise statistics used. Given the probabilistic nature of the of the results obtained - i.e. mean and standard deviation.
PCL PCL-LM O UR O UR -AD
epos [m] 6.525 × 10−06 1.011 × 10−04 1.390 × 10−06 6.743 × 10−07
ICL-NUIM-lr-0
erot [rad] 1.294 × 10−08 2.102 × 10−05 1.227 × 10−08 9.510 × 10−08
epos [m] 4.225 × 10−06 2.662 × 10−05 1.581 × 10−06 2.384 × 10−07
ETH-Haupt
erot [rad] 5.488 × 10−08 8.183 × 10−06 1.986 × 10−08 1.952 × 10−07
epos [m] 1.527 × 10−06 5.252 × 10−05 6.743 × 10−07 2.023 × 10−06
ETH-Apart
erot [rad] 7.134 × 10−08 1.125 × 10−04 1.548 × 10−08 1.564 × 10−07
epos [m] 1.000 × 10−12 1.352 × 10−05 1.284 × 10−06 9.076 × 10−06
bunny
erot [rad] 1.515 × 10−07 2.665 × 10−04 1.269 × 10−06 5.660 × 10−07

TABLE II: Comparison of the final registration error of the optimization result.

Time per Iteration Cumulative Optimization Statistics


0.12 12

0.11

0.1 10

0.09
8
0.08
seconds

seconds
0.07
6
0.06

0.05
4
0.04

0.03 2
0.02

0.01 0
ceres ceres-schur gtsam gtsam-seq g2o-schur g2o our ceres ceres-schur gtsam gtsam-seq g2o-schur g2o our

(a) Iteration time: victoria-park. (b) Cumulative time: victoria-park.

Time per Iteration Cumulative Optimization Statistics


10 1000

9 900

800
8
700
7
600
6
seconds

seconds

500
5
400
4
300
3
200

2 100

1 0
ceres ceres-schur gtsam gtsam-seq g2o-schur g2o our ceres ceres-schur gtsam gtsam-seq g2o-schur g2o our

(c) Iteration time: kitti-00-full. (d) Cumulative time: kitti-00-full.

Fig. 15: Timing analysis: the left column illustrates the time to perform a complete LM iteration; the right column reports
the total time to complete the optimization. All values are mean and standard deviation computed over 5 noise realizations.

D ATASET VARIABLES FACTORS N OISE Σt [m] N OISE ΣR [rad]


kitti-00 4541 5595 diag(0.05, 0.05, 0.05) diag(0.01, 0.01, 0.01)
sphere-b 2500 9799 diag(0.10, 0.10, 0.10) diag(0.05, 0.05, 0.05)
torus-b 1000 1999 diag(0.10, 0.10, 0.10) diag(0.05, 0.05, 0.05)

TABLE III: Specifications of PGO datasets.

To avoid any bias in the comparison, we used the native each one can detect when to stop the optimization. Finally,
LM implementation of each framework, since it was the no robust kernel has been employed in these experiments.
only algorithm common to all candidates. Furthermore, we
In Tab. IV we illustrate the Absolute Trajectory Error
imposed a maximum number of 100 LM iterations. Still, each
(ATE) (RMSE) computed on the optimized graph with
framework has its own termination criterion active, so that
respect to the ground truth. The values reported refer to mean
C ERES g2 o G TSAM O UR
ATEpos [m] 96.550 ± 36.680 94.370 ± 39.590 77.110 ± 41.870 95.290 ± 38.180
kitti-00
ATErot [rad] 1.107 ± 0.270 0.726 ± 0.220 0.579 ± 0.310 0.720 ± 0.230
ATEpos [m] 83.210 ± 7.928 9.775 ± 4.003 55.890 ± 12.180 26.060 ± 16.350
sphere-b
ATErot [rad] 2.135 ± 0.282 0.150 ± 0.160 0.861 ± 0.170 0.402 ± 0.274
ATEpos [m] 14.130 ± 1.727 2.232 ± 0.746 8.041 ± 1.811 3.691 ± 1.128
torus-b
ATErot [rad] 2.209 ± 0.3188 0.121 ± 0.0169 0.548 ± 0.082 0.156 ± 0.0305

TABLE IV: Comparison of the ATE (RMSE) of the optimization result - mean and standard deviation.

C ERES g2 o G TSAM O UR
kitti-00 81.70 99.50 69.40 49.0
the linear system in Eq. (19) is can be rearranged as follows:
sphere-b 101.0 70.90 15.50 27.40     
Hpp Hpl ∆xp −bp
torus-b 93.50 12.90 25.50 16.40 = . (64)
H⊤pl Hll ∆xl −bl
TABLE V: Comparison of the number of LM iterations to
reach convergence - mean values. A linear system with this structure can be solved more
efficiently through the Schur complement of the Hessian
matrix [80], namely:
and standard deviation over all noise trials. As expected, the (Hpp − Hpl H−1 ⊤ −1
ll Hpl )∆xp = −bp + Hpl Hll bl (65)
result obtained are in line with all other methods. Fig. 14,
Hll ∆xl = −bl + H⊤
pl ∆xp . (66)
instead, reports a detailed timing analysis. The time to per-
form a complete LM iteration is always among the smallest, Ceres-Solver and g 2 o can make use of the Schur com-
with a very narrow standard deviation. Furthermore, since the plement to solve this kind of special problem, therefore,
specific implementation of LM is slightly different in each we reported also the wall times of the optimization when
framework, we reported also the total time to perform the this technique is used. Obviously, using the Schur com-
full optimization, while the number of LM iteration elapsed plement leads to a major improvement in the efficiency
are shown in Tab. V. Also in this case, our system is able to of the linear solver, leading to very low iteration times.
achieve state-of-the-art performances that are better or equal For completeness, we reported the results of GTSAM with
to the other approaches. two different linear solvers: cholesky multifrontal
2) Pose-Landmark Graph Optimization: PLGO is another and cholesky sequential. Our framework does not
common global optimization task in SLAM. In this case, provide at the moment any implementation of a Schur-
the variables contain both robot (or camera) poses and complement-based linear solver, still, the performance
landmarks’ position in the world. Factors, instead, embody achieved are in line with all the non-Schur methods, con-
spatial constraints between either two poses or between firming our conjectures.
a pose and a landmark. As a result, this kind of factor
graphs are the perfect representative of the SLAM problem, IX. C ONCLUSIONS
since they contain the robot trajectory and the map of In this work, we propose a generic overview on ILS
the environment. To perform the benchmarks we used two optimization for factor graphs in the fields of robotics and
datasets: Victoria Park [77] and KITTI-00 [78]. We obtained computer vision. Our primary contribute is providing a uni-
the last one running ProSLAM [79] on the stereo data fied and complete methodology to design efficient solution
and saving the full output graph. We super-imposed to the to generic problems. This paper analyzes in a probabilistic
factors specific AWGN and we generated the initial guess flavor the mathematical fundamentals of ILS, addressing
through the breadth-first initialization technique. Tab. VI also many important collateral aspects of the problem such
summarizes the specification of the datasets used in these as dealing with non-Euclidean spaces and with outliers,
experiments. Also in this case, we sampled multiple noise exploiting the sparsity or the density. Then, we propose a
trials (5 samples) and reported mean and standard deviation set of common use-cases that exploit the theoretic reasoning
of the results obtained. The configuration of the framework previously done.
is the same as the one used in PGO experiments - i.e. 100 In the second half of the work, we investigate how to
LM iterations at most, with termination criterion active. design an efficient system that is able to work in all the
As reported in Tab. VII the ATE (RMSE) that we obtain possible scenarios depicted before. This analysis led us to
is compatible with the one of the other frameworks. The the development of a novel ILS solver, focused on effi-
higher error in the kitti-00-full dataset is mainly due ciency, flexibility and compactness. The system is developed
to the slow convergence of LM, that triggers too early the in modern C++ and almost entirely self-contained in less
termination criterion, as shown in Tab. VIII. In such case, the than 6000 lines of code. Our system can seamlessly deal
use of GN leads to better results, however, in order to not bias with sparse/dense, static/dynamic problems with a unified
the evaluation, we choose to not report results obtained with consistent interface. Furthermore, thanks to specific imple-
different ILS algorithms. As for the wall times to perform the mentation designs, it allows to easily prototype new factors
optimization, the results are illustrated in Fig. 15. In PLGO and variables or to intervene at low level when performances
scenarios, given the fact that there are two types of factors, are critical. Finally, we provide an extensive evaluation of
D ATASET VARIABLES FACTORS N OISE Σt [m] N OISE ΣR [rad] N OISE Σland [m]
victoria-park 7120 10608 diag(0.05, 0.05) 0.01 diag(0.05, 0.05)
kitti-00-full 123215 911819 diag(0.05, 0.05, 0.05) diag(0.01, 0.01, 0.01) diag(0.05, 0.05, 0.05)

TABLE VI: Specification of PLGO datasets.

C ERES g2 o G TSAM O UR
ATEpos [m] 37.480 ± 21.950 29.160 ± 37.070 2.268 ± 0.938 5.459 ± 3.355
victoria-park
ATErot [rad] 0.515 ± 0.207 0.401 ± 0.461 0.030 ± 0.007 0.056 ± 0.028
ATEpos [m] 134.9 ± 29.160 31.14 ± 27.730 30.97 ± 18.150 135.4 ± 27.000
kitti-00-full
ATErot [rad] 1.137 ± 0.268 0.173 ± 0.157 0.174 ± 0.104 0.850 ± 0.148

TABLE VII: Comparison of the ATE (RMSE) of the optimization result - mean and standard deviation.

C ERES C ERES - SCHUR g2 o g 2 o- SCHUR G TSAM G TSAM - SEQ O UR


victoria-pack 101.0 101.0 66.8 66.0 43.6 43.6 36.0
kitti-00-full 101.0 101.0 100.0 100.0 100.0 100.0 2.0

TABLE VIII: Comparison of the number of LM iterations to reach convergence - mean values.

the system’s performances, both in dense - e.g. ICP - and Expanding Eq. (71) and performing all the multiplications,
sparse - e.g. batch global optimization - scenarios. The the rotation matrix R(∆θ) is computed as follows:
evaluation shows that the performances achieved are in line  
with contemporary state-of-the-art frameworks, both in terms r11 r12 r13
of accuracy and speed. R(∆θ) = r21 r22 r23 
r31 r32 r33
 
A PPENDIX I c(∆γ) c(∆ψ) −c(∆γ) s(∆ψ) s(∆γ)
SE(3) M APPINGS = a b −c(∆γ) s(∆φ)
In this section, we will assume that a SE(3) variable X c d c(∆γ) c(∆φ)
is composed as follows: (72)
 
R t where c(·) and s(·) indicate the cosine and sine of an angle
X ∈ SE(3) = . (67)
01×3 1 respectively, while
A possible minimal representation for this object could be a = c(∆φ) s(∆ψ) + s(∆φ) c(∆ψ) s(∆γ)
using 3 Cartesian coordinates for the position and the 3 Euler
b = c(∆φ) c(∆ψ) − s(∆φ) s(∆γ) s(∆ψ)
angles for the orientation, namely
 ⊤  ⊤ c = s(∆φ) s(∆ψ) − c(∆φ) c(∆ψ) s(∆γ)
∆x = ∆x ∆y ∆z ∆φ ∆γ ∆ψ = ∆t ∆θ d = s(∆φ) c(∆ψ) + c(∆φ) s(∆γ) s(∆ψ).
(68)
To pass from one representation to the other, we should On the contrary, through Eq. (70) we compute the minimal
define suitable mapping functions. In this case, we use the parametrization ∆x starting from ∆X. Again, while the
following notation: translational component of ∆x can be retrieved easily from
the isometry. The rotational component ∆θ - i.e. the 3 Euler
∆X = v2t(∆x) (69)
angles - should be computed starting from the rotation matrix
∆x = t2v(∆X). (70) in Eq. (72), in formulæ:
More in detail, the function v2t computes the SE(3) isometry 
−r23

reported in Eq. (67) from the 6-dimensional vector ∆x ∆φ = atan2
r
in Eq. (68). While the translational component of X can  33 
−r12
be recovered straightforwardly from ∆x, the rotational part ∆ψ = atan2
requires to compose the rotation matrices around each axis, r11
!
leading to the following relation: r13
∆γ = atan2 1 .
r11 · c(∆ψ)
R(∆θ) = R(∆φ, ∆γ, ∆ψ) = Rx (∆φ) Ry (∆γ) Rz (∆ψ).
(71)
Other minimal parametrizations of SE(3) can be used, and
In Eq. (71), Rx , Ry , Rz represent elementary rotations
they typically differ on how they represent the rotational
around the x, y and z axis. Summarizing, we can ex-
component of the SE(3) object. Common alternatives to
pand Eq. (69) as:
  euler angles are unit quaternions and axis-angle. Clearly,
R(∆θ) ∆t changing the minimal parametrization will affect the jaco-
∆X = v2t(∆x) =
01×3 1 bians too, and thus the entire optimization process.
A PPENDIX II i.e. the contribution of the homogeneous division. Given that
ICP JACOBIAN the function hom(·) is defined as
 
In this section, we will provide the reader the full ⊤ x/z
mathematical derivation of the Jacobian matrices reported hom([x y z] ) =
y/z
in Sec. V-A.2. To this end, we recall that the measurement
function for the ICP problem is: the Jacobian Jhom (pcam ) is computed as follows:
" 1 −pcam #
hicp −1
p = R⊤ (p − t). 0 x
k (X) , X (73)
J hom cam
(p ) =
pcam
z (pcam )2
z cam
(78)
1 −py
0 (pcam )2 .
If we apply a small state perturbation using the ⊞ operator pcam z z

defined in Eq. (43), we obtain: A PPENDIX IV


icp −1 B UNDLE A DJUSTMENT JACOBIAN
h (X ⊞ ∆x) = (v2t(∆x) · X) p
−1 −1 In this section we address the computation of the Jacobians
=X · v2t(∆x) p
in the context of Bundle Adjustment. The scenario is the one
= R⊤ (v2t(∆x)−1 p − t). (74) described in Sec. V-C. We recall that, in this case, each factor
involves two state variables, namely a pose and a landmark.
To compute the Jacobian matrix Jicp we should de-
Therefore, Jba has the following pattern:
rive Eq. (74) with respect to ∆x. Note that, the translation  
vector t is constant with respect to the perturbation, so it Jba ba
0 · 0 Jba
k = 0 · 0 Jk,n(k) k,m(k) 0 · 0 .
will have no impact in the computation. Furthermore, R⊤
represents a constant multiplicative factor. Since the state More in detail, the two non-zero block embody the deriva-
perturbation is local, its magnitude is small enough to make tives computed with respect to the two active variables,
the following approximation hold: namely:
  
1 −∆ψ ∆γ ba ∂ ek (Xcam ⊞ ∆xcam , xland )
Jpose =
R(∆θ) ≈  ∆ψ 1 −∆φ . (75)  ∂∆xcam 
−∆γ ∆φ 1 ∂ ek (Xcam , xland ⊞ ∆xland )
Jba
land =
Finally, the Jacobian Jicp is computed as follows: ∂∆xland

∂ hicp (X ⊞ ∆x) where ek represents the error for the k-th factor, computed
icp
J (X) = according to Eq. (54).
∂∆x ∆x=0
∂ v2t(∆x) −1
p Unrolling the multiplications, we note that Jba pose is the
= R⊤ = same as the one computed in Eq. (78) - i.e. in the pro-
 ∂∆x ∆x=0  jective registration example. The derivatives relative to the
∂ (v2t(∆x)−1 p) ∂ (v2t(∆x)−1 p)
= −R⊤ ∂∆t ∂∆θ
landmark - i.e. Jbapose - can be straightforwardly computed
∆x=0 ∆x=0
  from Eq. (74), considering that the derivation is with respect

= −R I3×3 − [p]× (76) to the landmark perturbation this time. In formluæ:
where [p]× is the skew-symmetric matrix constructed out of Jba
land (X) = J
hom
(pcam )KR⊤ . (79)
p.
Summarizing, the complete Bundle Adjustment Jacobian is
A PPENDIX III computed as:
P ROJECTIVE R EGISTRATION JACOBIAN 
Jba
k = 0 · 0 Jpose
ba
0 · 0 Jba land 0 · 0
In this section, we will provide the complete derivation
Jba
pose = J
hom
(pcam ) K Jicp Jba
land = J
hom
(pcam )KR⊤ .
of the Jacobians in the context of projective registration.
From Eq. (50), we know that the prediction is computed (80)
APPENDIX IV
BUNDLE ADJUSTMENT JACOBIAN

In this section, we address the computation of the Jacobians in the context of Bundle Adjustment. The scenario is the one described in Sec. V-C. We recall that, in this case, each factor involves two state variables, namely a pose and a landmark. Therefore, J^ba has the following pattern:

    J_k^ba = [ 0 ⋯ 0   J^ba_{k,n(k)}   0 ⋯ 0   J^ba_{k,m(k)}   0 ⋯ 0 ].

More in detail, the two non-zero blocks embody the derivatives computed with respect to the two active variables, namely:

    J^ba_pose = ∂e_k(X^cam ⊞ ∆x^cam, x^land)/∂∆x^cam
    J^ba_land = ∂e_k(X^cam, x^land ⊞ ∆x^land)/∂∆x^land,

where e_k represents the error for the k-th factor, computed according to Eq. (54).

Unrolling the multiplications, we note that J^ba_pose is the same as the Jacobian computed in Eq. (77), i.e. in the projective registration example. The derivatives relative to the landmark, i.e. J^ba_land, can be straightforwardly computed from Eq. (74), considering that this time the derivation is with respect to the landmark perturbation. In formulæ:

    J^ba_land(X) = J^hom(p^cam) K R^⊤.    (79)

Summarizing, the complete Bundle Adjustment Jacobian is computed as:

    J_k^ba = [ 0 ⋯ 0   J^ba_pose   0 ⋯ 0   J^ba_land   0 ⋯ 0 ]
    J^ba_pose = J^hom(p^cam) K J^icp        J^ba_land = J^hom(p^cam) K R^⊤.    (80)
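To tie the pieces together, the following sketch composes the two non-zero blocks of Eq. (80) from the helpers sketched in the previous appendices (icpJacobian and homJacobian above). The struct and the signature are illustrative assumptions of ours, not the interface of the released system.

#include <Eigen/Geometry>

// Sketch of Eq. (80): the two non-zero blocks of the Bundle
// Adjustment Jacobian for a single factor. X is the camera pose,
// K the 3x3 camera matrix, x_land the landmark position.
struct BaJacobianBlocks {
  Eigen::Matrix<double, 2, 6> pose;  // J_ba_pose = J_hom K J_icp
  Eigen::Matrix<double, 2, 3> land;  // J_ba_land = J_hom K R^T
};

BaJacobianBlocks baJacobian(const Eigen::Isometry3d& X,
                            const Eigen::Matrix3d& K,
                            const Eigen::Vector3d& x_land) {
  const Eigen::Matrix3d R = X.linear();
  // p_cam = K X^{-1} p^m, as defined after Eq. (77).
  const Eigen::Vector3d p_cam = K * (X.inverse() * x_land);
  const Eigen::Matrix<double, 2, 3> J_hom = homJacobian(p_cam);
  BaJacobianBlocks J;
  J.pose = J_hom * K * icpJacobian(R, x_land);
  J.land = J_hom * K * R.transpose();
  return J;
}

Note how the pose block reuses the ICP Jacobian unchanged, while the landmark block is a constant 2×3 product: each factor touches only these two variables, which is exactly the sparsity pattern stated at the beginning of this appendix.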