Least Squares Optimization: From Theory To Practice
Abstract— Nowadays, Non-Linear Least-Squares embodies the foundation of many Robotics and Computer Vision systems. The research community deeply investigated this topic in the [...]
[...] cases these systems can be used as black boxes. Since they typically consist of an extended codebase, tailoring them to a particular application/architecture to achieve the maximum [...]
Fig. 1: Left: visual configuration manager implemented in our framework. Each block represents a configurable sub-module.
Right: datasets used in the evaluation - dense and sparse.
data chunks at each iteration, thus requiring a dense, non-stationary solver, such as the one presented in [4].

Among the first works on pairwise shape registration relying on ILS, we find the ICP proposed by Besl and McKay [16], while Chen and Medioni [21] proposed the first ILS method for the incremental reconstruction of a 3D model from multiple range images. These methods constitute the foundation of many registration algorithms appearing during the subsequent years. In particular, Lu and Milios [22] specialized ICP to operate on 2D laser scans. All these works employed dense non-stationary solvers to estimate the robot pose that better explains the point measurements. The non-stationary aspect arises since the heuristic used to estimate the data association is based on the current pose estimate. In the context of ICP, Censi [23] proposed an alternative metric to compute the distance between two points and an approach to estimate the information matrix from the set of correspondences [24]. Subsequently, Segal et al. [25] proposed the use of covariance matrices that better reflect the structure of the surface in computing the error. Registration has been addressed by Biber et al. [26] for 2D scans and subsequently by Magnusson et al. [27] for 3D point clouds, using a pure Newton's method relying on a Gaussian approximation of the point clouds to be registered, called Normal Distributions Transform (NDT). Serafin et al. [7] approached the problem of point cloud registration using a 6D error function, encoding also the normal difference in the error vector. All the approaches mentioned so far leverage a dense ILS solver, with the notable exception of NDT, which is a second-order approach that specializes Newton's algorithm.

In the context of Computer Vision, the p2p algorithm [28], [29] allows to find the camera transformation that minimizes the reprojection error between a set of 3D points in the scene and their 2D projections in the image. The first stage of p2p is usually conducted according to a consensus scheme that relies on an ad-hoc minimal solver requiring only 3 correspondences. The final stage, however, typically uses a dense and stationary ILS approach, since the correspondences do not change during the iterations. When the initial guess of the camera is known with sufficient accuracy, like in Visual Odometry (VO), only the latter stage is typically used. In contrast to these feature-based solvers, Engels et al. [30] approach VO by minimizing the reprojection error between two images through dense and non-stationary ILS. Using this method requires the system to possess a reasonably good estimate of the depth for a subset of the points in the scene. Such initialization is usually obtained by estimating the transformation between two images using a combination of RANSAC and direct solvers, and then computing the depth through triangulation between the stereo pair. Della Corte et al. [31] developed a registration algorithm called MPR, which was built on this idea. As a result, MPR is able to operate on depth images capturing different cues and obtained with arbitrary projection functions. To operate on-line, all the registration works mentioned so far rely on ad-hoc dense and non-stationary ILS solvers that leverage the specific problem's structure to reduce the computation.

The scan-based ICP algorithm [22] has been subsequently employed by the same authors [32] as a building block for a system that estimates a globally consistent map. The core idea is to determine the relative transforms between pairwise scans through ICP. These transformations are known as constraints, and a global map is obtained by finding the positions of all the scans that better explain the constraints. The process can be visualized as a graph, whose nodes are the scan positions and whose edges are the constraints, hence this problem is called PGO. Constraints can exist only between spatially close nodes, due to the limited sensor range. Hence, PGO is inherently sparse. Additionally, in the on-line case the graph is incrementally augmented as new measurements become available, rendering it incremental. We are unaware of these two aspects being exploited in the design of the underlying solver in [32]. The work of Lu and Milios inspired Borrmann et al. [33] to produce an effective 3D extension.

For several years after the introduction of Graph-Based SLAM [32], the community put aside ILS approaches in favor of filtering methods relying on Gaussian [34]–[39] or Particle [40]–[43] representations of the posterior. Filtering approaches were preferred since they were regarded as more suitable to run on-line on a moving robot with the computational resources available at the time, and the sparsity of the problem had not yet been fully exploited.

In a Graph-Based SLAM problem, it is common to have a number of variables in the order of hundreds or thousands. Such a high number of variables results in a large optimization problem that represented a challenge for the computers of the time, rendering global optimization a bottleneck of Graph-Based SLAM systems. In the remainder of this document we will refer to the global optimization module in Graph-SLAM as the back-end, in contrast to the front-end, which is responsible for constructing the factor graph based on the sensor measurements. Gutmann and Konolige [44] addressed the problem of incrementally building a map, by finding topological relations and loop closures. This work was one of the first on-line implementations of Graph-Based SLAM. The core idea to reduce the computation in the back-end was to restrict the optimization to the portions of the graph having the larger errors. This insight has inspired several subsequent works [13], [45].

B. Stand-Alone ILS solvers

Whereas dense solvers are typically embedded in the specific application for performance reasons, sparse solvers are complex enough to motivate the design of generic libraries for ILS. The first work to explicitly consider the sparsity of SLAM in conjunction with a direct method to solve the linear system was √SAM, developed by Dellaert et al. [46]. Kaess et al. [45] exploited this aspect of the problem in iSAM, the second iteration of √SAM. Here, when a new edge is added to the graph, the system computes a new solution reusing part of the previous one and selectively updating the vertices. In the third iteration of the system - iSAM2 - Kaess et al. [47] exploited the Bayes Tree to solve the optimization problem without explicitly constructing the linear system. This solution is in contrast with the general trend of decoupling the linearization of the problem and the solution of the linear system, and highlights the connections between the elimination algorithms used in the solution of a linear system and inference on graphical models. This self-contained engine allows to deal very efficiently with dynamic graphs that grow over time and with Gaussian densities, two typical features of the SLAM problem. The final iteration of the system, called GTSAM [9], embeds all these concepts in a single framework.

Meanwhile, Hertzberg with his thesis [48] introduced the ⊞ method to systematically deal with non-Euclidean spaces and sparse problems. This work has been at the root of the framework of Kümmerle et al. [8] - called g2o. This system introduces a layered architecture that allows to easily exchange sub-modules of the system - e.g. the linear solver or the optimization algorithm. A further paper of Hertzberg et al. [14] extends the ⊞ method ILS to filtering.

Agarwal et al. proposed in their Ceres Solver [10] a generalized framework to perform non-linear optimization. Ceres embeds state-of-the-art methodologies that take advantage of modern CPUs - e.g. SIMD instructions and multi-threading - resulting in a complete and fast framework. One of its most relevant features is the efficient use of Automatic Differentiation (AD) [49], which consists in the algorithmic computation of derivatives starting from the error function. Further information on the topic of AD can be found in [50], [51].

In several contexts, knowing the optimal value of a solution is not sufficient: the covariance is also required. In SLAM, knowing the marginal covariances relative to a variable is fundamental to approach data-association. To this extent, Kaess et al. [52] outlined the use of the elimination tree. Subsequently, Ila et al. [13] designed SLAM++, an optimization framework to estimate the mean and covariance of the state by performing incremental Cholesky updates. This work takes advantage of the incremental aspect of the problem to selectively update the approximated Hessian matrix by using parallel computation.

ILS algorithms have several known drawbacks. Perhaps the most investigated aspect is the sensitivity of the solution to the initial guess, which is reflected by the convergence basin. A wrong initial guess might lead a non-linear solver to converge to an inconsistent local minimum. Convex optimization [53] is one of the possible strategies to overcome this problem, however its use is highly domain dependent. Rosen et al. [17] explored this topic, proposing a system to perform optimization of generic SE(d) factor graphs. In their system, called SE-Sync, they use a Riemannian Truncated-Newton Trust-Region method to certifiably compute the global optimum in a two-step optimization (rotation and translation). Briales et al. [54] extended this approach to jointly optimize rotation and translation using the same concepts. Bai et al. [55] provided a formulation of the SLAM problem based on constrained optimization, where constraints are represented by loop-closure cycles. Still, those approaches are bounded to SE(d) sparse optimization. In contrast, Ni et al. [56] and Grisetti et al. [57] exploited respectively nested dissection and hierarchical local sub-graphs, devising divide-and-conquer strategies to both increase the convergence basin and speed up the computation.

IV. LEAST SQUARES MINIMIZATION

This section describes the foundations of ILS minimization. We first present a formulation of the problem that highlights its probabilistic aspects (Section IV-A). In Section IV-B we review some basic rules for manipulating the Normal distribution and we apply these rules to the definition presented in Section IV-A, leading to the initial definition of linear Least-Squares (LS). In Section IV-C we discuss the effects of a non-linear observation model, assuming that both the state space and the measurement space are Euclidean.
Fig. 2: Affine transformation of a uni-variate Gaussian distribution. The blue curve represents the source PDF while in green we show the output PDF. The red line represents the affine transformation.
Subsequently, we relax this assumption on the structure of state and measurement spaces, proposing a solution that uses smooth manifold encapsulation. In Section IV-D we introduce the effects of outliers in the optimization and we show commonly used methodologies to reject them. Finally, in Section IV-E, we address the case of large, sparse problems characterized by measurement functions depending only on small subsets of the state. Classical problems such as SLAM or BA fall in this category and are characterized by a rather sparse structure.

A. Problem Formulation

Let W be a stationary system whose non-observable state variable is represented by x, and let z be a measurement, i.e. a perception of the environment. The state is distributed according to a prior distribution p(x), while the conditional distribution of the measurement given the state p(z|x) is known. p(z|x) is commonly referred to as the observation model. Our goal is to find the most likely distribution of states, given the measurements - i.e. p(x|z). A straightforward application of the Bayes rule results in the following:

    p(x|z) = p(x) p(z|x) / p(z) ∝ p(x) p(z|x).    (1)

The proportionality is a consequence of the normalization factor p(z), which is constant. In the remainder of this work, we will consider two key assumptions:
– the prior about the states is uniform, i.e.

    p(x) = N(x; µx, Σx = ∞) = N(x; νx, Ωx = 0),    (2)

– the observation model is Gaussian, i.e.

    p(z|x) = N(z; µz|x, Ω⁻¹z|x)  where  µz|x = h(x).    (3)

Eq. (2) expresses the uniform prior about the states using the moment parameterization of the Gaussian - i.e. mean and covariance. Alternatively, the canonical parameterization characterizes the Gaussian by the information matrix - i.e. the inverse of the covariance matrix Ωx = Σx⁻¹ - and the information vector νx = Ωx µx. The canonical parameterization is better suited to represent a non-informative prior, since Ωx = 0 does not lead to numerical instabilities while implementing the algorithm. In contrast, the moment parameterization can express in a stable manner situations of absolute certainty, by setting Σx = 0. In the remainder, we will use both representations upon convenience, their relation being clear.

In Eq. (3) the mean µz|x of the predicted measurement distribution is controlled by a generic non-linear function of the state h(x), commonly referred to as the measurement function. In the next section we will derive the solution for Eq. (1), imposing that the measurement function is an affine transformation of the state - i.e. h(x) = Ax + b - as illustrated in Fig. 2. In Section IV-C we address the more general non-linear case.

B. Linear Measurement Function

In case of a linear measurement function, expressed as h(x) = A(x − µx) + ẑ, the prediction model has the following form:

    p(z|x) = N(z; µz|x = A(x − µx) + ẑ, Ω⁻¹z|x) = p(z|∆x) = N(z; µz|∆x = A∆x + ẑ, Ω⁻¹z|x),    (4)

with ẑ constant and µx being the mean of the prior. For convenience, we express the stochastic variable ∆x = x − µx as

    p(∆x) = N(∆x; 0, Σ∆x = ∞) = N(∆x; ν∆x, Ω∆x = 0).    (5)
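To make the linear-Gaussian case concrete, consider a toy uni-variate example (the numbers are ours, chosen for illustration, and are not taken from the paper): if x ∼ N(µx = 2, σx² = 0.5²) and the noiseless prediction is z = h(x) = 0.5 (x − 2) + 2.5, then pushing the prior through h yields z ∼ N(2.5, (0.5 · 0.5)²): the mean is mapped through h, while the standard deviation is scaled by |A| = 0.5. This is exactly the mechanism depicted qualitatively in Fig. 2.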
Switching between ∆x and x is achieved by summing or subtracting the mean. To retrieve a solution for Eq. (1) we first compute the joint probability p(∆x, z) using the chain rule and, subsequently, we condition this joint distribution with respect to the known measurement z. For further details on the Gaussian manipulation, we refer the reader to [58].

a) CHAIN RULE: Under the Gaussian assumptions made in the previous section, the parameters of the joint distribution over states and measurements p(∆x, z) have the following block structure:

    p(∆x, z) = N(∆x, z; µ∆x,z, Ω⁻¹∆x,z)    (6)

    µ∆x,z = [ 0 ; ẑ ]    Ω∆x,z = [ Ωxx  Ωxz ; Ω⊤xz  Ωzz ].    (7)

The values of the terms in Eq. (7) are obtained by applying the chain rule for multivariate Gaussians to Eq. (4) and Eq. (5), according to [58], and they result in the following:

    Ωxx = A⊤ Ωz|x A + Ωx
    Ωxz = −A⊤ Ωz|x
    Ωzz = Ωz|x.

Since we assumed the prior to be non-informative, we can set Ωx = 0. As a result, the information vector ν∆x,z of the joint distribution is computed as:

    ν∆x,z = [ ν∆x ; νz ] = Ω∆x,z µ∆x,z = [ −A⊤ Ωz|x ẑ ; Ωz|x ẑ ].    (8)

Fig. 3 shows visually the outcome distribution.

Fig. 3: Given a uni-variate Gaussian PDF p(x) and an affine transformation f(x) - indicated in red - we compute the joint distribution p(x, y = f(x)) through the chain rule.

b) CONDITIONING: Integrating the known measurement z in the joint distribution p(∆x, z) of Eq. (6) results in a new distribution p(∆x|z). This can be done by conditioning in the Gaussian domain. Once again we refer the reader to [58] for the proofs, while we report here the effects that the conditioning has on the Gaussian parameters:

    p(∆x|z) ∼ N(∆x; ν∆x|z, Ω∆x|z)    (9)

where

    ν∆x|z = ν∆x − Ωxz z = ν∆x − (−A⊤ Ωz|x) z = A⊤ Ωz|x (z − ẑ) = −A⊤ Ωz|x e    (10)
    Ω∆x|z = Ωxx = A⊤ Ωz|x A = H,    (11)

with e = ẑ − z. The conditioned mean µ∆x|z is retrieved from the information matrix and the information vector as:

    µ∆x|z = Ω⁻¹∆x|z ν∆x|z = −H⁻¹ A⊤ Ωz|x e = −H⁻¹ b,    (12)

where b = A⊤ Ωz|x e. Remembering that ∆x = x − µx, the Gaussian distribution over the conditioned states has the same information matrix, while the mean is obtained by summing the increment's mean µ∆x|z as

    µx|z = µx + µ∆x|z.    (13)

An important result in this derivation is that the matrix H = Ω∆x|z is the information matrix of the estimate; therefore, we can estimate not only the optimal solution µx|z, but also its uncertainty Σx|z = Ω⁻¹∆x|z. Fig. 4 illustrates visually the conditioning of a bi-variate Gaussian distribution.

c) INTEGRATING MULTIPLE MEASUREMENTS: Integrating multiple independent measurements z1:K requires stacking them in a single vector. As a result, the observation model becomes

    p(z1:K|∆x) = ∏k p(zk|∆x) ∼ N(z; µz|x, Ω⁻¹z|x),    (14)

where z = [z1; ...; zK] stacks the measurements, A = [A1; ...; AK] and ẑ = [ẑ1; ...; ẑK] stack the individual matrices and predictions, and Ωz|x = diag(Ωz1|x, ..., ΩzK|x) is block-diagonal. Hence, the matrix H and the vector b are composed by the sum of each measurement's contribution; setting ek = ẑk − zk, we compute them as follows:

    H = Σk A⊤k Ωzk|x Ak = Σk Hk    b = Σk A⊤k Ωzk|x ek = Σk bk.    (15)

C. Non-Linear Measurement Function

Equations (12), (13) and (15) allow us to find the exact mean of the conditional distribution, under the assumptions that i) the measurement noise is Gaussian, ii) the measurement function is an affine transform of the state and iii) both measurement and state spaces are Euclidean. In this section we first relax the assumption on the affinity of the measurement function, leading to the common derivation of the GN algorithm. Subsequently, we address the case of non-Euclidean state and measurement spaces.

If the measurement model mean µz|x is controlled by a non-linear but smooth function h(x), and the prior mean µx = x̆ is reasonably close to the optimum, we can approximate the behavior of µx|z through the first-order Taylor expansion of h(x) around the mean, namely:

    h(x̆ + ∆x) ≈ h(x̆) + ∂h(x)/∂x |x=x̆ ∆x = J ∆x + ẑ.    (16)

The Taylor expansion reduces the conditional mean to an affine transform in ∆x. Whereas the conditional distribution will not in general be Gaussian, the parameters of a Gaussian approximation can still be obtained around the optimum through Eq. (11) and Eq. (13). Thus, we can use the same algorithm described in Sec. IV-B, but we have to compute the linearization at each step. Summarizing, at each iteration, the GN algorithm:
– processes each measurement zk by evaluating the error ek(x) = hk(x) − zk and the Jacobian Jk at the current solution x̆:

    ek = hk(x̆) − zk    (17)
    Jk = ∂hk(x)/∂x |x=x̆.    (18)

– builds the coefficient matrix and coefficient vector for the linear system in Eq. (12), and computes the optimal perturbation ∆x by solving the linear system:

    ∆x = −H⁻¹ b,  with  H = Σk J⊤k Ωzk|x Jk,  b = Σk J⊤k Ωzk|x ek.    (19)

– applies the computed perturbation to the current state as in Eq. (13) to get an improved estimate:

    x̆ ← x̆ + ∆x.    (20)
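As an illustration of Eqs. (17)–(20), the following self-contained C++/Eigen toy program runs GN on a small 2D trilateration problem. The example is ours - it is not part of the paper's framework - and uses identity information matrices Ωk:

#include <Eigen/Dense>
#include <iostream>
#include <vector>

// Toy Gauss-Newton: estimate a 2D point from range measurements
// to known beacons (illustrative example, not from the paper).
int main() {
  const std::vector<Eigen::Vector2d> beacons = {
      {0.0, 0.0}, {10.0, 0.0}, {0.0, 10.0}};
  const Eigen::Vector2d true_x(3.0, 4.0);
  std::vector<double> z;
  for (const auto& beacon : beacons) z.push_back((true_x - beacon).norm());

  Eigen::Vector2d x(6.0, 1.0);  // initial guess x̆
  for (int it = 0; it < 10; ++it) {
    Eigen::Matrix2d H = Eigen::Matrix2d::Zero();
    Eigen::Vector2d b = Eigen::Vector2d::Zero();
    for (size_t k = 0; k < beacons.size(); ++k) {
      const Eigen::Vector2d d = x - beacons[k];
      const double hk = d.norm();                         // prediction h_k(x̆)
      const double ek = hk - z[k];                        // error, Eq. (17)
      const Eigen::RowVector2d Jk = d.transpose() / hk;   // Jacobian, Eq. (18)
      H += Jk.transpose() * Jk;                           // accumulate, Eq. (19)
      b += Jk.transpose() * ek;
    }
    const Eigen::Vector2d dx = H.ldlt().solve(-b);        // ∆x = −H⁻¹ b
    x += dx;                                              // Eq. (20)
  }
  std::cout << "estimate: " << x.transpose() << std::endl;
}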
A smooth prediction function has lower-magnitude higher-order terms in its Taylor expansion. The smaller these terms are, the better its linear approximation will be. This leads to situations close to the ideal affine case. In general, the smoother the measurement function h(x) is and the closer the initial guess is to the optimum, the better the convergence properties of the problem.

Fig. 4: Conditioning of a bi-variate Gaussian. Top left: the source PDF; top right and bottom left indicate the conditioning over x and y respectively.

a) NON-EUCLIDEAN SPACES: The previous formulation of the GN algorithm uses vector addition and subtraction to compute the error ek in Eq. (17) and to apply the increments in Eq. (20). However, these two operations are only defined in Euclidean spaces. When this assumption is violated - as it usually happens in Robotics and Computer Vision applications - the straightforward implementation does not generally provide satisfactory results. Rotation matrices or angles cannot be directly added or subtracted without performing subsequent non-trivial normalization. Still, typical continuous states involving rotational or similarity transformations are known to lie on a smooth manifold [59].

A smooth manifold M is a space that, albeit not homeomorphic to Rⁿ, admits a locally Euclidean parameterization around each element M of the domain, commonly referred to as a chart - as illustrated in Fig. 5. A chart computed in a manifold point M is a function from Rⁿ to a new point M′ on the manifold:

    chartM(∆m) : Rⁿ → M.    (21)

Intuitively, M′ is obtained by “walking” along the perturbation ∆m on the chart, starting from the origin. A null motion (∆m = 0) on the chart leaves us at the point where the chart is constructed - i.e. chartM(0) = M. Similarly, given two points M and M′ on the manifold, we can determine the motion ∆m on the chart constructed on M that would bring us to M′. Let this operation be the inverse chart, denoted as chart⁻¹M(M′). The direct and inverse charts allow us to define operators on the manifold that are analogous to the sum and subtraction. Those operators, referred to as ⊞ and ⊟, are thence defined as:

    M′ = M ⊞ ∆m ≜ chartM(∆m)    (22)
    ∆m = M′ ⊟ M ≜ chart⁻¹M(M′).    (23)

This notation - firstly introduced by Smith et al. [60] and then generalized by Hertzberg et al. [14], [61] - allows us to straightforwardly adapt the Euclidean version of ILS to operate on manifold spaces. The dimension of the chart is chosen to be the minimal needed to represent a generic perturbation on the manifold. On the contrary, the manifold representation can be chosen arbitrarily. As an example, for 3D rotations a chart around a rotation matrix RA can be constructed from a perturbation vector ∆r of Euler angles, through a function fromVector(·) that composes the corresponding elementary rotations:

    RA ⊞ ∆r = fromVector(∆r) RA    (24)
    RA ⊟ RB = toVector(R⁻¹B RA).    (25)

The function toVector(·) does the opposite, by computing the value of each Euler angle starting from the matrix R. It operates by equating each of its elements to the corresponding one in the matrix product Rx(∆φ) Ry(∆θ) Rz(∆ψ), and by solving the resulting set of trigonometric equations. As a result, this operation is quite articulated. Around the origin, the chart constructed in this manner is immune to singularities.
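The ⊞ and ⊟ operators of Eqs. (24)-(25) can be sketched in a few lines of C++/Eigen. The Euler-angle convention follows Eq. (71) in Appendix I; the code is our illustration, not the paper's actual implementation:

#include <Eigen/Dense>
#include <cmath>

// Euler-angle chart for SO(3): R = Rx(phi) * Ry(gamma) * Rz(psi).
Eigen::Matrix3d fromVector(const Eigen::Vector3d& r) {
  return (Eigen::AngleAxisd(r.x(), Eigen::Vector3d::UnitX()) *
          Eigen::AngleAxisd(r.y(), Eigen::Vector3d::UnitY()) *
          Eigen::AngleAxisd(r.z(), Eigen::Vector3d::UnitZ()))
      .toRotationMatrix();
}

// Inverse chart: recover the Euler angles from R (cf. Eq. (72));
// well defined near the origin of the chart.
Eigen::Vector3d toVector(const Eigen::Matrix3d& R) {
  const double phi = std::atan2(-R(1, 2), R(2, 2));
  const double psi = std::atan2(-R(0, 1), R(0, 0));
  const double gamma = std::atan2(R(0, 2), R(0, 0) / std::cos(psi));
  return Eigen::Vector3d(phi, gamma, psi);
}

// boxplus / boxminus, as in Eqs. (24)-(25).
Eigen::Matrix3d boxplus(const Eigen::Matrix3d& Ra, const Eigen::Vector3d& dr) {
  return fromVector(dr) * Ra;
}
Eigen::Vector3d boxminus(const Eigen::Matrix3d& Ra, const Eigen::Matrix3d& Rb) {
  return toVector(Rb.transpose() * Ra);  // R⁻¹B = R⊤B for rotations
}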
Once proper ⊞ and ⊟ operators are defined, we can reformulate our minimization problem in the manifold domain. To this extent, we can simply replace the + with a ⊞ in the computation of the Taylor expansion of Eq. (16). Since we will compute an increment on the chart, we need to compute the expansion on the chart ∆x at the local optimum, that is at the origin of the chart itself ∆x = 0, in formulæ:

    hk(X̆ ⊞ ∆x) ≈ hk(X̆) + J̆k ∆x,  with  J̆k = ∂hk(X̆ ⊞ ∆x)/∂∆x |∆x=0.    (27)

The same holds when applying the increments in Eq. (20), leading to:

    X̆ ← X̆ ⊞ ∆x.    (28)

Here we denoted with capital letters the manifold representation of the state X, and with ∆x the Euclidean perturbation. Since the optimization within one iteration is conducted on the chart, the origin of the chart X̆ on the manifold stays constant during this iteration. If the measurements lie on a manifold too, a local ⊟ operator is required to compute the error, namely:

    ek(X) = Ẑk ⊟ Zk = hk(X) ⊟ Zk.    (29)

To apply the previously defined optimization algorithm we should linearize the error around the current estimate through its first-order Taylor expansion. Posing ĕk = ek(X̆), we have the following relation:

    ek(X̆ ⊞ ∆x) = hk(X̆ ⊞ ∆x) ⊟ Zk ≈ ĕk + J̃k ∆x,  with  J̃k = ∂ek(X̆ ⊞ ∆x)/∂∆x |∆x=0.    (30)

The reader might notice that in Eq. (30) the error space may differ from the one of the increments, due to the ⊟ operator. As reported in [62], having a different parametrization might enhance the convergence properties of the optimization in specific scenarios. Still, to avoid any inconsistencies, the information matrix Ωk should be expressed on a chart around the current measurement Zk.

Fig. 6: Commonly used robust kernel functions. The kernel threshold is set to 1 in all cases.

D. Handling Outliers: Robust Cost Functions

In Sec. IV-C we described a methodology to compute the parameters of the Gaussian distribution over the state x which minimizes the Omega-norm of the error between prediction and observation. More concisely, we compute the optimal state x⋆ such that:

    x⋆ = argmin_x Σk ‖ek(x)‖²Ωk.    (31)

The mean of our estimate µx|z = x⋆ is the local optimum of the GN algorithm, and the information matrix Ω⋆x|z = H⋆ is the coefficient matrix of the system at convergence.

The procedure reported in the previous section assumes all measurements to be correct, albeit affected by noise. Still, in many real cases this does not hold, mainly due to aspects that are hard to model - e.g. multi-path phenomena or incorrect data associations. These wrong measurements are commonly referred to as outliers; on the contrary, inliers represent the good measurements.

A common assumption made by several techniques to reject outliers is that the inliers tend to agree towards a common solution, while outliers do not. This fact is at the root of consensus schemes such as RANSAC [29]. In the context of ILS, the quadratic nature of the error terms leads to over-accounting for measurements whose error is large, albeit those measurements are typically outliers. However, there are circumstances where all errors are quite large even if no outliers are present. A typical case occurs when we start the optimization from an initial guess which is far from the optimum.

A possible solution to this issue consists in carrying on the optimization under a cost function that grows sub-quadratically with the error. Indicating with uk(x) the L1 Omega-norm of the error term in Eq. (17), its derivative with respect to the state variable x can be computed as follows:

    uk(x) = √( ek(x)⊤ Ωk ek(x) )    (32)
    ∂uk(x)/∂x = 1/uk(x) · ek(x)⊤ Ωk ∂ek(x)/∂x.    (33)
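To give a concrete flavor of how such a sub-quadratic cost reshapes the problem, the sketch below implements the common Huber kernel in C++. The kernel choice and the struct name are ours - the paper's framework exposes robustifiers through its own RobustifierBase interface, described later:

#include <cmath>

// Huber robustifier: quadratic for u <= threshold, linear beyond it.
// weight(u) = rho'(u)/u is the factor that rescales a measurement's
// contribution to H and b, so large errors are down-weighted.
struct HuberKernel {
  double threshold;  // the kernel threshold of Fig. 6

  // u is the L1 Omega-norm of the error, Eq. (32).
  double weight(double u) const {
    if (u <= threshold) return 1.0;  // inlier regime: plain least squares
    return threshold / u;            // outlier regime: sub-quadratic growth
  }
};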
– Optimization Algorithm: the algorithm that performs the minimization; currently, only GN and LM are supported, still, we plan to add also TRM approaches
– Linear Solver: the algebraic solver that computes the solution of the linear system H∆x = −b; we embed a naive AMD-based [65] linear solver, together with other approaches based on well-known, highly-optimized linear algebra libraries - e.g. SuiteSparse¹
– Robustifier: the robust kernel function applied to a specific factor; we provide several commonly used instances of robustifier, together with a modular mechanism to assign specific robustifiers to different types of factor - called robustifier policy
– Termination Criterion: a simple module that, based on optimization statistics, checks whether convergence has been reached.

¹ http://faculty.cse.tamu.edu/davis/suitesparse.html

Note that, in our architecture, there is a clear separation between solver and problem. In the next section we will describe how we formalized the latter. In the remaining of this section, instead, we will focus on the solver classes, which are in charge of computing the problem solution.

The class Solver implements a unified interface for our optimization framework. It presents itself to the user with a unified data-structure to configure, control, run and monitor the optimization. This class allows to select the type of algorithm used within one iteration, which algorithm to use to solve the linear system, or which termination criterion to use. This mechanism is achieved by delegating the execution of these functions to specific interfaces. More in detail, the linear system is stored in a sparse-block-matrix structure that effectively separates the solution of the linear system from the rest of the optimization machinery. Furthermore, our solver supports incremental updates, and can provide an estimate of partial covariance blocks. Finally, our system supports hierarchical approaches. In this sense, the problem can be represented at different resolutions (levels), by using different factors. When the solution at a coarse level is computed, the optimization starts from the next denser level, enabling a new set of factors. In the new step, the initial guess is computed from the solution of the coarser level.

The IterationAlgorithmBase class defines an interface for the outer optimization algorithm - i.e. GN or LM. To carry on its operations, it relies on the interface exposed by the Solver class. The latter is in charge of invoking the IterationAlgorithmBase, which will run a single iteration of its algorithm.

Class RobustifierBase defines an interface for using arbitrary ρ(u) functions - as illustrated in Sec. IV-D. Robust kernels can be directly assigned to factors or, alternatively, the user might define a policy that, based on the status of the actual factor, decides which robustifier to use. The definition of a policy is done by implementing the RobustifierPolicyBase interface.

Finally, TerminationCriterionBase defines an interface for a predicate that, exploiting the optimization statistics, detects whether the system has converged to a solution or a fatal error has occurred.
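In code, wiring these modules together might look as follows. This is a sketch of ours: the class names appear in the text above, but the setter API is our assumption (the visual configurator of Fig. 1 offers the same functionality interactively):

// Hypothetical solver configuration; only the module roles come from
// the text - the method names are illustrative.
Solver solver;
solver.setAlgorithm(std::make_shared<IterationAlgorithmLM>());        // GN or LM
solver.setLinearSolver(std::make_shared<SparseBlockCholeskySolver>()); // e.g. SuiteSparse-based
solver.setRobustifierPolicy(std::make_shared<MyRobustifierPolicy>());
solver.setTerminationCriterion(std::make_shared<SimpleTerminationCriterion>());
solver.setGraph(graph);  // the factor graph, see Sec. B below
solver.compute();        // run the optimization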
B. Factor Graph Classes

In this section we provide an overview of the top-level classes constituting a factor graph - i.e. the optimization problem - in our framework. In specifying new variables or factors, the user can interact with the system through a layered interface. More specifically, factors can be defined using AD and, thus, contained in a few lines of code for rapid prototyping, or the user can directly provide how to compute analytic Jacobians if more speed is required. Furthermore, to achieve extreme efficiency, the user can choose to compute its own routines to update the quadratic form directly, consistently in line with our design requirement of more-work/more-performance. Note that we observed in our experiments that in large sparse problems the time required to linearize the system is marginal compared to the time required to solve it. Therefore, in most of these cases AD can be used without significant performance losses.

1) Variables: The VariableBase implements a base abstract interface for the variables in a factor graph, whereas Variable_<PerturbationDim,EstimateType> specializes the base interface on a specific type. The definition of a new variable extending the Variable_ template requires the user to specify i) the type EstimateType used to store the value of the variable Xi, ii) the dimension PerturbationDim of the perturbation ∆xi and iii) the ⊞ operator. This is coherent with the methodology provided in Sec. IV-F. In addition to these fields, a variable has an integer key, to be uniquely identified within a factor graph. Furthermore, a variable can be in either one of these three states:
– Active: the variable will be estimated
– Fixed: the variable stays constant through the optimization
– Disabled: the variable is ignored and all factors that depend on it are ignored as well.
To provide roll-back operations - such as those required by LM - a variable also stores a stack of values.

To support AD, we introduce the ADVariable_ template, that is instantiated on a variable without AD. Instantiating a variable with AD requires to define the ⊞ operator by using the AD scalar type instead of the usual float or double. This mechanism allows us to mix, in a problem, factors that require AD with factors that do not.
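For instance, a minimal 2D-point variable could be sketched as follows. The template parameters follow the description above, while the method names are our assumptions on the interface:

// Sketch of a custom variable: a 2D point with a Euclidean perturbation.
// Variable_<PerturbationDim, EstimateType> is the template described above;
// the name and signature of the overridden hook are hypothetical.
class VariablePoint2 : public Variable_<2, Eigen::Vector2f> {
public:
  // iii) the ⊞ operator: plain addition, since a 2D point is Euclidean.
  void applyPerturbation(const Eigen::Vector2f& dx) {
    setEstimate(estimate() + dx);
  }
};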
2) Factors: The base level of the hierarchy is the FactorBase. It defines a common interface for this type of graph objects. It is responsible for i) computing the error - and, thus, the χ² - ii) updating the quadratic form H and the right-hand side vector b and iii) invoking the robustifier function (if required). When an update is requested, a factor is provided with a structure on which to write the outcome of the operation. A factor can be enabled or disabled. In the latter case, it will be ignored during the computation. Besides, upon update a factor might become invalid, if the result of the computation is meaningless. This occurs for instance in BA, when a point is projected outside the image plane.

The Factor_<VariableTupleType> class implements a typed interface for the factor class. The user willing to extend the class at this level is responsible for implementing the entire FactorBase interface, relying on functions for typed access to the blocks of the system matrix H and of the coefficient vector b. In this case, the block size is determined from the dimension of the perturbation vector of the variables in the template argument list. We extended the factors at this level to implement approaches such as dense multi-cue registration [31]. Special structures in the Jacobians can be exploited to speed up the calculation of Hk, whose computation has a non-negligible cost.

The ErrorFactor_<ErrorDim, VariableTypes...> class specializes a typed interface for the factor class, where the user has to implement both the error function ek and the Jacobian blocks Jk,i. The calculation of the H and b blocks is done through loops unrolled at compile time, since the types and the dimensions of the variables/errors are part of the type.

The ADErrorFactor_<Dim, VariableTypes...> class further specializes the ErrorFactor_. Extending the class at this level requires to specify only the error function. The Jacobians are computed through AD, and the updates of H and b are done according to the base class.

Finally, the FactorCorrespondenceDriven_<FactorType> implements a mechanism that allows the solver to iterate over multiple factors of the same type connecting the same set of variables, without the need of explicitly storing them in the graph. A FactorCorrespondenceDriven_ is instantiated on a base type of factor, and it is specialized by defining which actions should be carried on as a consequence of the selection of the “next” factor in the pool by the solver. The solver sees this type of factor as multiple ones, albeit a FactorCorrespondenceDriven_ is stored just once in memory. Each time a FactorCorrespondenceDriven_ is accessed by the solver, a callback changing the internal parameters is called. In its basic implementation, this class takes a container of corresponding indices and two data containers: Fixed and Moving. Each time a new factor within the FactorCorrespondenceDriven_ is requested, the factor is configured by selecting the next pair of corresponding indices from the set, and by picking the elements in Fixed and Moving at those indices. As an instance, to use our solver within an ICP algorithm, the user has to configure the factor by setting the Fixed and Moving point clouds. The correspondence vector can be changed anytime to reflect a new data association. This results in different correspondences being considered at each iteration.
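As an example of the correspondence-driven mechanism in an ICP setting, a usage sketch is shown below. It follows the description above, but every identifier is illustrative, since the paper specifies the mechanism rather than the exact API:

// Illustrative ICP loop built on a correspondence-driven factor.
FactorCorrespondenceDriven_<ICPFactor> factor;
factor.setFixed(&fixed_cloud);    // reference point cloud
factor.setMoving(&moving_cloud);  // cloud to be registered
graph.addFactor(&factor);
for (int i = 0; i < max_iterations; ++i) {
  // recompute the data association using the current pose estimate
  factor.setCorrespondences(
      computeAssociations(pose->estimate(), fixed_cloud, moving_cloud));
  solver.compute();  // the solver sees one factor per correspondence
}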
3) FactorGraph: To carry on an iteration, the solver has to iterate over the factors and, hence, it requires to randomly access the variables. Restricting the solver to access a graph through an interface of random-access iterators enables us to decouple the way the graph is accessed from the way it is stored. This would allow us to support transparent off-core storage that can be useful on very large problems.

A FactorGraphInterface defines the way to access a graph. In our case, we use integer values as keys for variables and factors. The solver accesses a graph only through the FactorGraphInterface and, thence, it can read/write the value of state variables and read the factors, but it is not allowed to modify the graph structure.

A heap-based concrete implementation of a factor graph is provided by the FactorGraph class, that specializes the interface. The FactorGraph supports transparent serialization/deserialization. Our framework makes use of the open-source math library Eigen [73], which provides fast and easy matrix operations. The serialization/deserialization of variables and factors that are constructed on Eigen types is automatically handled by our BOSS library.

In sparse optimization it is common to operate on a local portion of the entire problem. Instrumenting the solver with methods to specify the local portions would bloat the implementation. Alternatively, we rely on the concept of FactorGraphView, that exposes an interface on a local portion of a FactorGraph - or of any other object implementing the FactorGraphInterface.

VIII. EXPERIMENTS

In this section we propose several comparisons between our framework and other state-of-the-art optimization systems. The aim of these experiments is to support the claims on the performance of our framework and, thus, we focused on the accuracy of the computed solution and the time required to achieve it. Experiments have been performed both on dense scenarios - such as ICP - and sparse ones - e.g. PGO and PLGO.

A. Dense Problems

Many well-known SLAM problems related to the front-end can be solved exploiting the ILS formulation introduced before. In such scenarios - e.g. point-cloud registration - the number of variables is small compared to the observations' one. Furthermore, at each registration step, the data-association is usually recomputed to take advantage of the new estimate. In this sense, one has to build the factor graph associated to the problem from scratch at each step. In such contexts, the most time-consuming part of the process is represented by the construction of the linear system in Eq. (19) and not by its solution.

To perform dense experiments, we chose a well-known instance of this kind of problems: ICP. We conducted multiple tests, comparing our framework to the current state-of-the-art PCL library [15] on the standard registration datasets summarized in Tab. I.

DATASET                SENSOR          VARIABLES   FACTORS
ICL-NUIM [74]          RGB-D           1           307200
ETH-Hauptgebaude [75]  Laser-scanner   1           189202
ETH-Apartment [75]     Laser-scanner   1           370276
Stanford-Bunny [76]    3D digitalizer  1           35947

TABLE I: Specification of the datasets used to perform dense benchmarks.

In all the cases, we set up a controlled benchmarking environment, to have a precise ground-truth. In the ETH-Hauptgebaude, ETH-Apartment and Stanford-Bunny, the raw data consists in a series of range scans. Therefore, in such cases, we constructed the ICP problem as follows:
– reading of the first raw scan to generate a point cloud
– transformation of the point cloud according to a known isometry TGT
– generation of perfect associations between the two clouds
– registration starting from Tinit = I.
Since our focus is on the ILS optimization, we used the same set of data-associations and the same initial guess for all approaches. This methodology renders the comparison fair and unbiased. As for the ICL-NUIM dataset, we obtained the raw point cloud by unprojecting the range image of the first reading of the lr-0 scene. After this initial preprocessing, the benchmark flow is the same described before.

In this context, we compared i) the accuracy of the solution, obtained computing the translational and rotational error of the estimate, and ii) the time required to achieve that solution. We compared the recommended PCL registration suite - that uses the Horn formulas - against our framework with and without AD. Furthermore, we also provide results obtained using the PCL implementation of the LM optimization algorithm.

As reported in Tab. II, the final registration error is almost negligible in all cases. Instead, in Fig. 13 we document the speed of each solver. When using the full potential of our framework - i.e. using analytic Jacobians - it is able to achieve results in general equal or better than the off-the-shelf PCL registration algorithm. Using AD has a great impact on the iteration time; however, our system is able to be faster than the PCL implementation of LM also in this case.

Fig. 13: Timing analysis of the ILS optimization. On the left column are reported the mean and standard deviation of a full ILS iteration - computed over 10 total iterations. On the right column, instead, the cumulative time to perform all 10 ILS iterations.

B. Sparse Problems

Sparse problems are mostly represented by generic global optimization scenarios, in which the graph has a large number of variables while each factor connects a very small subset of those (typically two). In this kind of problems, the graph remains unchanged during the iterations; therefore, the most time-consuming part of the optimization is the solution of the linear system, not its construction. PGO and PLGO are two instances of this problem that are very common in the SLAM context and, therefore, we selected these two to perform comparative benchmarks.

1) Pose-Graph Optimization: PGO represents the backbone of SLAM systems and it has been well investigated by the research community. For these experiments, we employed standard 3D PGO benchmark datasets - all publicly available [62]. We added to the factors Additive White Gaussian Noise (AWGN) and we initialized the graph using the
breadth-first initialization. We report in Tab. III the complete specifications of the datasets employed, together with the noise statistics used. Given the probabilistic nature of the noise imposed on the factors, we performed experiments over 10 noise realizations and we report here the statistics of the results obtained - i.e. mean and standard deviation.

                          PCL            PCL-LM         OUR            OUR-AD
ICL-NUIM-lr-0  epos [m]   6.525 × 10−06  1.011 × 10−04  1.390 × 10−06  6.743 × 10−07
               erot [rad] 1.294 × 10−08  2.102 × 10−05  1.227 × 10−08  9.510 × 10−08
ETH-Haupt      epos [m]   4.225 × 10−06  2.662 × 10−05  1.581 × 10−06  2.384 × 10−07
               erot [rad] 5.488 × 10−08  8.183 × 10−06  1.986 × 10−08  1.952 × 10−07
ETH-Apart      epos [m]   1.527 × 10−06  5.252 × 10−05  6.743 × 10−07  2.023 × 10−06
               erot [rad] 7.134 × 10−08  1.125 × 10−04  1.548 × 10−08  1.564 × 10−07
bunny          epos [m]   1.000 × 10−12  1.352 × 10−05  1.284 × 10−06  9.076 × 10−06
               erot [rad] 1.515 × 10−07  2.665 × 10−04  1.269 × 10−06  5.660 × 10−07

TABLE II: Comparison of the final registration error of the optimization result.

To avoid any bias in the comparison, we used the native LM implementation of each framework, since it was the only algorithm common to all candidates. Furthermore, we imposed a maximum number of 100 LM iterations. Still, each framework has its own termination criterion active, so that each one can detect when to stop the optimization. Finally, no robust kernel has been employed in these experiments.

In Tab. IV we illustrate the Absolute Trajectory Error (ATE) (RMSE) computed on the optimized graph with respect to the ground truth. The values reported refer to mean and standard deviation over all noise trials. As expected, the results obtained are in line with all other methods. Fig. 14, instead, reports a detailed timing analysis. The time to perform a complete LM iteration is always among the smallest, with a very narrow standard deviation. Furthermore, since the specific implementation of LM is slightly different in each framework, we reported also the total time to perform the full optimization, while the numbers of LM iterations elapsed are shown in Tab. V. Also in this case, our system is able to achieve state-of-the-art performances that are better or equal to the other approaches.

Fig. 14: Timing analysis of different optimization frameworks. The left column reports the mean and standard deviation of the time to perform a complete LM iteration. The right column, instead, illustrates the total time to reach convergence - mean and standard deviation.

                        CERES            g2o              GTSAM            OUR
kitti-00   ATEpos [m]   96.550 ± 36.680  94.370 ± 39.590  77.110 ± 41.870  95.290 ± 38.180
           ATErot [rad] 1.107 ± 0.270    0.726 ± 0.220    0.579 ± 0.310    0.720 ± 0.230
sphere-b   ATEpos [m]   83.210 ± 7.928   9.775 ± 4.003    55.890 ± 12.180  26.060 ± 16.350
           ATErot [rad] 2.135 ± 0.282    0.150 ± 0.160    0.861 ± 0.170    0.402 ± 0.274
torus-b    ATEpos [m]   14.130 ± 1.727   2.232 ± 0.746    8.041 ± 1.811    3.691 ± 1.128
           ATErot [rad] 2.209 ± 0.3188   0.121 ± 0.0169   0.548 ± 0.082    0.156 ± 0.0305

TABLE IV: Comparison of the ATE (RMSE) of the optimization result - mean and standard deviation.

           CERES  g2o    GTSAM  OUR
kitti-00   81.70  99.50  69.40  49.0
sphere-b   101.0  70.90  15.50  27.40
torus-b    93.50  12.90  25.50  16.40

TABLE V: Comparison of the number of LM iterations to reach convergence - mean values.

2) Pose-Landmark Graph Optimization: PLGO is another common global optimization task in SLAM. In this case, the variables contain both robot (or camera) poses and landmarks' positions in the world. Factors, instead, embody spatial constraints between either two poses or between a pose and a landmark. As a result, this kind of factor graph is the perfect representative of the SLAM problem, since it contains the robot trajectory and the map of the environment. To perform the benchmarks we used two datasets: Victoria Park [77] and KITTI-00 [78]. We obtained the last one running ProSLAM [79] on the stereo data and saving the full output graph. We super-imposed to the factors specific AWGN and we generated the initial guess through the breadth-first initialization technique. Tab. VI summarizes the specifications of the datasets used in these experiments. Also in this case, we sampled multiple noise trials (5 samples) and reported mean and standard deviation of the results obtained. The configuration of the framework is the same as the one used in the PGO experiments - i.e. 100 LM iterations at most, with termination criterion active.

DATASET        VARIABLES  FACTORS  NOISE Σt [m]             NOISE ΣR [rad]           NOISE Σland [m]
victoria-park  7120       10608    diag(0.05, 0.05)         0.01                     diag(0.05, 0.05)
kitti-00-full  123215     911819   diag(0.05, 0.05, 0.05)   diag(0.01, 0.01, 0.01)   diag(0.05, 0.05, 0.05)

                             CERES            g2o              GTSAM           OUR
victoria-park  ATEpos [m]    37.480 ± 21.950  29.160 ± 37.070  2.268 ± 0.938   5.459 ± 3.355
               ATErot [rad]  0.515 ± 0.207    0.401 ± 0.461    0.030 ± 0.007   0.056 ± 0.028
kitti-00-full  ATEpos [m]    134.9 ± 29.160   31.14 ± 27.730   30.97 ± 18.150  135.4 ± 27.000
               ATErot [rad]  1.137 ± 0.268    0.173 ± 0.157    0.174 ± 0.104   0.850 ± 0.148

TABLE VII: Comparison of the ATE (RMSE) of the optimization result - mean and standard deviation.

TABLE VIII: Comparison of the number of LM iterations to reach convergence - mean values.

Fig. 15: Timing analysis: the left column illustrates the time to perform a complete LM iteration; the right column reports the total time to complete the optimization. All values are mean and standard deviation computed over 5 noise realizations.

As reported in Tab. VII, the ATE (RMSE) that we obtain is compatible with the one of the other frameworks. The higher error in the kitti-00-full dataset is mainly due to the slow convergence of LM, that triggers the termination criterion too early, as shown in Tab. VIII. In such case, the use of GN leads to better results; however, in order to not bias the evaluation, we chose to not report results obtained with different ILS algorithms. As for the wall times to perform the optimization, the results are illustrated in Fig. 15. In PLGO scenarios, given the fact that there are two types of factors, the linear system in Eq. (19) can be rearranged as follows:

    [ Hpp  Hpl ; H⊤pl  Hll ] [ ∆xp ; ∆xl ] = [ −bp ; −bl ].    (64)

A linear system with this structure can be solved more efficiently through the Schur complement of the Hessian matrix [80], namely:

    (Hpp − Hpl H⁻¹ll H⊤pl) ∆xp = −bp + Hpl H⁻¹ll bl    (65)
    Hll ∆xl = −bl − H⊤pl ∆xp.    (66)

Ceres-Solver and g2o can make use of the Schur complement to solve this kind of special problem; therefore, we reported also the wall times of the optimization when this technique is used. Obviously, using the Schur complement leads to a major improvement in the efficiency of the linear solver, leading to very low iteration times. For completeness, we reported the results of GTSAM with two different linear solvers: cholesky multifrontal and cholesky sequential. Our framework does not provide at the moment any implementation of a Schur-complement-based linear solver; still, the performance achieved is in line with all the non-Schur methods, confirming our conjectures.
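The reduction of Eqs. (65)-(66) can be sketched in a few lines of Eigen-based C++. We use dense matrices for clarity - a real solver would exploit the block-diagonal structure of Hll - and the function is our illustration, not part of any of the compared frameworks:

#include <Eigen/Dense>

// Solve [Hpp Hpl; Hpl^T Hll][dxp; dxl] = [-bp; -bl] via the Schur
// complement, Eqs. (65)-(66). Dense version, for illustration only.
void schurSolve(const Eigen::MatrixXd& Hpp, const Eigen::MatrixXd& Hpl,
                const Eigen::MatrixXd& Hll, const Eigen::VectorXd& bp,
                const Eigen::VectorXd& bl, Eigen::VectorXd& dxp,
                Eigen::VectorXd& dxl) {
  const Eigen::MatrixXd Hll_inv = Hll.inverse();  // cheap when block-diagonal
  const Eigen::MatrixXd Hred = Hpp - Hpl * Hll_inv * Hpl.transpose();
  const Eigen::VectorXd bred = -bp + Hpl * Hll_inv * bl;
  dxp = Hred.ldlt().solve(bred);                  // Eq. (65): poses first
  dxl = Hll_inv * (-bl - Hpl.transpose() * dxp);  // Eq. (66): back-substitution
}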
IX. CONCLUSIONS

In this work, we propose a generic overview on ILS optimization for factor graphs in the fields of robotics and computer vision. Our primary contribution is providing a unified and complete methodology to design efficient solutions to generic problems. This paper analyzes in a probabilistic flavor the mathematical fundamentals of ILS, addressing also many important collateral aspects of the problem, such as dealing with non-Euclidean spaces and with outliers, or exploiting the sparsity or the density. Then, we propose a set of common use-cases that exploit the theoretic reasoning previously done.

In the second half of the work, we investigate how to design an efficient system that is able to work in all the possible scenarios depicted before. This analysis led us to the development of a novel ILS solver, focused on efficiency, flexibility and compactness. The system is developed in modern C++ and almost entirely self-contained in less than 6000 lines of code. Our system can seamlessly deal with sparse/dense, static/dynamic problems with a unified consistent interface. Furthermore, thanks to specific implementation designs, it allows to easily prototype new factors and variables or to intervene at low level when performances are critical. Finally, we provide an extensive evaluation of
the system's performances, both in dense - e.g. ICP - and sparse - e.g. batch global optimization - scenarios. The evaluation shows that the performances achieved are in line with contemporary state-of-the-art frameworks, both in terms of accuracy and speed.

APPENDIX I
SE(3) MAPPINGS

In this section, we will assume that an SE(3) variable X is composed as follows:

    X ∈ SE(3) = [ R  t ; 0₁ₓ₃  1 ].    (67)

A possible minimal representation for this object could be using 3 Cartesian coordinates for the position and the 3 Euler angles for the orientation, namely

    ∆x = [∆x ∆y ∆z ∆φ ∆γ ∆ψ]⊤ = [∆t⊤ ∆θ⊤]⊤.    (68)

To pass from one representation to the other, we should define suitable mapping functions. In this case, we use the following notation:

    ∆X = v2t(∆x)    (69)
    ∆x = t2v(∆X).    (70)

More in detail, the function v2t computes the SE(3) isometry reported in Eq. (67) from the 6-dimensional vector ∆x in Eq. (68). While the translational component of X can be recovered straightforwardly from ∆x, the rotational part requires to compose the rotation matrices around each axis, leading to the following relation:

    R(∆θ) = R(∆φ, ∆γ, ∆ψ) = Rx(∆φ) Ry(∆γ) Rz(∆ψ).    (71)

In Eq. (71), Rx, Ry, Rz represent elementary rotations around the x, y and z axes. Summarizing, we can expand Eq. (69) as:

    ∆X = v2t(∆x) = [ R(∆θ)  ∆t ; 0₁ₓ₃  1 ].

Expanding Eq. (71) and performing all the multiplications, the rotation matrix R(∆θ) is computed as follows:

    R(∆θ) = [ r11 r12 r13 ; r21 r22 r23 ; r31 r32 r33 ]
          = [ c(∆γ)c(∆ψ)   −c(∆γ)s(∆ψ)   s(∆γ) ;
              a             b            −c(∆γ)s(∆φ) ;
              c             d             c(∆γ)c(∆φ) ]    (72)

where c(·) and s(·) indicate the cosine and sine of an angle respectively, while

    a = c(∆φ) s(∆ψ) + s(∆φ) c(∆ψ) s(∆γ)
    b = c(∆φ) c(∆ψ) − s(∆φ) s(∆γ) s(∆ψ)
    c = s(∆φ) s(∆ψ) − c(∆φ) c(∆ψ) s(∆γ)
    d = s(∆φ) c(∆ψ) + c(∆φ) s(∆γ) s(∆ψ).

On the contrary, through Eq. (70) we compute the minimal parametrization ∆x starting from ∆X. Again, the translational component of ∆x can be retrieved easily from the isometry. The rotational component ∆θ - i.e. the 3 Euler angles - should instead be computed starting from the rotation matrix in Eq. (72), in formulæ:

    ∆φ = atan2(−r23, r33)
    ∆ψ = atan2(−r12, r11)
    ∆γ = atan2(r13, r11 / c(∆ψ)).

Other minimal parametrizations of SE(3) can be used, and they typically differ on how they represent the rotational component of the SE(3) object. Common alternatives to Euler angles are unit quaternions and axis-angle. Clearly, changing the minimal parametrization will affect the Jacobians too, and thus the entire optimization process.
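The two mappings translate almost verbatim into C++ with Eigen. The following is our transcription of Eqs. (67)-(72), not code from the paper's repository:

#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <cmath>

using Vector6d = Eigen::Matrix<double, 6, 1>;

// v2t, Eq. (69): [t, (phi, gamma, psi)] -> SE(3) isometry, with
// R = Rx(phi) * Ry(gamma) * Rz(psi) as in Eq. (71).
Eigen::Isometry3d v2t(const Vector6d& x) {
  Eigen::Isometry3d X = Eigen::Isometry3d::Identity();
  X.translation() = x.head<3>();
  X.linear() = (Eigen::AngleAxisd(x(3), Eigen::Vector3d::UnitX()) *
                Eigen::AngleAxisd(x(4), Eigen::Vector3d::UnitY()) *
                Eigen::AngleAxisd(x(5), Eigen::Vector3d::UnitZ()))
                   .toRotationMatrix();
  return X;
}

// t2v, Eq. (70): inverse mapping, using the atan2 formulas above.
Vector6d t2v(const Eigen::Isometry3d& X) {
  Vector6d x;
  x.head<3>() = X.translation();
  const Eigen::Matrix3d R = X.linear();
  x(3) = std::atan2(-R(1, 2), R(2, 2));                  // phi
  x(5) = std::atan2(-R(0, 1), R(0, 0));                  // psi
  x(4) = std::atan2(R(0, 2), R(0, 0) / std::cos(x(5)));  // gamma
  return x;
}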
APPENDIX II
ICP JACOBIAN

In this section, we provide the reader with the full mathematical derivation of the Jacobian matrices reported in Sec. V-A.2. To this end, we recall that the measurement function for the ICP problem is:

    h_k^icp(X) ≜ X⁻¹ p = R⊤(p − t).    (73)

If we apply a small state perturbation using the ⊞ operator [...]

[...] i.e. the contribution of the homogeneous division. Given that the function hom(·) is defined as

    hom([x y z]⊤) = [ x/z ; y/z ],

the Jacobian Jhom(p^cam) is computed as follows:

    Jhom(p^cam) = [ 1/p^cam_z   0   −p^cam_x/(p^cam_z)² ;
                    0   1/p^cam_z   −p^cam_y/(p^cam_z)² ].    (78)
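For completeness, Eq. (78) in C++/Eigen; again a sketch of ours rather than the paper's code:

#include <Eigen/Dense>

// hom(.): perspective division of a 3D camera point, as defined above.
Eigen::Vector2d hom(const Eigen::Vector3d& p) {
  return Eigen::Vector2d(p.x() / p.z(), p.y() / p.z());
}

// Analytic Jacobian of hom(.), Eq. (78).
Eigen::Matrix<double, 2, 3> homJacobian(const Eigen::Vector3d& p) {
  const double iz = 1.0 / p.z();
  Eigen::Matrix<double, 2, 3> J;
  J << iz, 0.0, -p.x() * iz * iz,
       0.0, iz, -p.y() * iz * iz;
  return J;
}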