Lec 19
These lecture summaries are designed to be a review of the lecture. Though I do my best to include all main topics from the lecture, the lectures contain more elaborate explanations than these notes.
Last time, we saw that one way we can find an optimal transformation between two coordinate systems in 3D is to decompose the optimal transformation into an optimal translation and an optimal rotation. We saw that we could solve for the optimal translation in terms of the rotation, and that we can mitigate the constraint issues of solving for an orthonormal rotation matrix by using quaternions to carry out rotation operations.
1.1.2 Quaternion Representations: Axis-Angle Representation and Orthonormal Rotation Matrices
From our previous discussion, we saw that another way we can represent quaternions is through the axis-angle notation (known as the Rodrigues formula). Combining the equations for the scalar and vector parts, we have the following axis-angle representation:

$$\mathring{q} \Longleftrightarrow (\hat{\omega}, \theta), \qquad q_0 = \cos\frac{\theta}{2}, \quad \mathbf{q} = \hat{\omega}\sin\frac{\theta}{2} \implies \mathring{q} = \left(\cos\frac{\theta}{2},\; \hat{\omega}\sin\frac{\theta}{2}\right)$$
We also saw that we can convert these quaternions to orthonormal rotation matrices. Recall that we can write our vector rotation operation as:

$$\mathring{q}\,\mathring{r}\,\mathring{q}^* = (\bar{Q}^T Q)\,\mathring{r}, \quad \text{where}$$

$$\bar{Q}^T Q = \begin{pmatrix} \mathring{q}\cdot\mathring{q} & 0 & 0 & 0 \\ 0 & q_0^2+q_x^2-q_y^2-q_z^2 & 2(q_x q_y - q_0 q_z) & 2(q_x q_z + q_0 q_y) \\ 0 & 2(q_y q_x + q_0 q_z) & q_0^2-q_x^2+q_y^2-q_z^2 & 2(q_y q_z - q_0 q_x) \\ 0 & 2(q_z q_x - q_0 q_y) & 2(q_z q_y + q_0 q_x) & q_0^2-q_x^2-q_y^2+q_z^2 \end{pmatrix}$$
The matrix $\bar{Q}^T Q$ has skew-symmetric components and symmetric components, which is useful for conversions. Going in the other direction, given an orthonormal rotation matrix we can recover the quaternion. If we want an axis-angle representation, we can look at the lower-right $3 \times 3$ submatrix $R = (r_{ij})$, specifically its diagonal and trace. For a unit quaternion, the diagonal entries give:

$$4q_0^2 = 1 + r_{11} + r_{22} + r_{33}, \quad 4q_x^2 = 1 + r_{11} - r_{22} - r_{33}, \quad 4q_y^2 = 1 - r_{11} + r_{22} - r_{33}, \quad 4q_z^2 = 1 - r_{11} - r_{22} + r_{33}$$

These equations can be solved by taking square roots, but due to the number of solutions (8 by Bezout's theorem, allowing for the flipped signs of quaternions), we should not use this set of equations alone to find the solution.
Instead, we can evaluate these four expressions, take the largest for numerical accuracy, arbitrarily select the positive square root (since $\mathring{q}$ and $-\mathring{q}$ represent the same rotation, there is a sign ambiguity), and solve for the remaining components. We will call this selected component $q_i$.
For the off-diagonals, which have symmetric and antisymmetric components, we derive the following equations:

$$r_{32} - r_{23} = 4q_0 q_x, \quad r_{13} - r_{31} = 4q_0 q_y, \quad r_{21} - r_{12} = 4q_0 q_z$$
$$r_{32} + r_{23} = 4q_y q_z, \quad r_{13} + r_{31} = 4q_x q_z, \quad r_{21} + r_{12} = 4q_x q_y$$
Adding/subtracting the off-diagonals gives us 6 relations, of which we only need 3 (since we already have 1 relation from the diagonals). For instance, if we have $q_i = q_y$, then we pick the off-diagonal relations involving $q_y$, and we solve the four equations given by:

$$4q_y^2 = 1 - r_{11} + r_{22} - r_{33}, \quad q_0 = \frac{r_{13} - r_{31}}{4q_y}, \quad q_x = \frac{r_{21} + r_{12}}{4q_y}, \quad q_z = \frac{r_{32} + r_{23}}{4q_y}$$
This system of four equations gives us a direct way of going from an orthonormal rotation matrix to a quaternion. Note that the matrix consists of 9 numbers that may be noisy, so we want to make sure we obtain the best fit.
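As a concrete illustration of this conversion, here is a minimal sketch in NumPy (the function name and structure are my own, not from the lecture): it evaluates the four diagonal expressions, keeps the largest for numerical accuracy, takes the positive root, and recovers the remaining components from the off-diagonal relations.

```python
import numpy as np

def quaternion_from_rotation_matrix(R):
    """Recover a unit quaternion (q0, qx, qy, qz) from a 3x3 rotation matrix.

    Evaluates 4*q0^2, 4*qx^2, 4*qy^2, 4*qz^2 from the diagonal, keeps the
    largest for numerical accuracy, takes the positive square root
    (q and -q represent the same rotation), then solves the off-diagonal
    relations for the other three components.
    """
    t = np.array([
        1.0 + R[0, 0] + R[1, 1] + R[2, 2],  # 4*q0^2
        1.0 + R[0, 0] - R[1, 1] - R[2, 2],  # 4*qx^2
        1.0 - R[0, 0] + R[1, 1] - R[2, 2],  # 4*qy^2
        1.0 - R[0, 0] - R[1, 1] + R[2, 2],  # 4*qz^2
    ])
    i = int(np.argmax(t))
    qi = 0.5 * np.sqrt(t[i])  # arbitrarily take the positive root
    d = 4.0 * qi
    if i == 0:    # q0 largest: use the antisymmetric off-diagonals
        q = [qi, (R[2, 1] - R[1, 2]) / d, (R[0, 2] - R[2, 0]) / d, (R[1, 0] - R[0, 1]) / d]
    elif i == 1:  # qx largest
        q = [(R[2, 1] - R[1, 2]) / d, qi, (R[1, 0] + R[0, 1]) / d, (R[0, 2] + R[2, 0]) / d]
    elif i == 2:  # qy largest
        q = [(R[0, 2] - R[2, 0]) / d, (R[1, 0] + R[0, 1]) / d, qi, (R[2, 1] + R[1, 2]) / d]
    else:         # qz largest
        q = [(R[1, 0] - R[0, 1]) / d, (R[0, 2] + R[2, 0]) / d, (R[2, 1] + R[1, 2]) / d, qi]
    return np.array(q)
```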
Taking scaling into account, we can write the relationship between two point clouds corresponding to two different coordinate systems as:

$$r'_r = sR(r'_l)$$

where rotation is again given by $R \in SO(3)$, and the scaling factor is given by $s \in \mathbb{R}^+$ (where $\mathbb{R}^+ \triangleq \{x \in \mathbb{R} : x > 0\}$). Recall that $r'_r$ and $r'_l$ are the centroid-subtracted variants of the point clouds in both frames of reference.
As we did for translation and rotation, we can solve for an optimal scaling parameter:
$$s^* = \arg\min_s \sum_{i=1}^n \left\|r'_{r,i} - sR(r'_{l,i})\right\|_2^2$$
$$= \arg\min_s \sum_{i=1}^n \|r'_{r,i}\|_2^2 - 2s\sum_{i=1}^n r'_{r,i} \cdot R(r'_{l,i}) + s^2\sum_{i=1}^n \|R(r'_{l,i})\|_2^2$$
$$= \arg\min_s \sum_{i=1}^n \|r'_{r,i}\|_2^2 - 2s\sum_{i=1}^n r'_{r,i} \cdot R(r'_{l,i}) + s^2\sum_{i=1}^n \|r'_{l,i}\|_2^2 \qquad \text{(rotation preserves vector lengths)}$$
Let us define the following terms:

1. $s_r \triangleq \sum_{i=1}^n \|r'_{r,i}\|_2^2$
2. $D \triangleq \sum_{i=1}^n r'_{r,i} \cdot R(r'_{l,i})$
3. $s_l \triangleq \sum_{i=1}^n \|r'_{l,i}\|_2^2$
Then we can write this objective for the optimal scaling factor $s^*$ as:

$$s^* = \arg\min_s \left\{J(s) \triangleq s_r - 2sD + s^2 s_l\right\}$$
Since this is an unconstrained optimization problem, we can solve this by taking the derivative w.r.t. $s$ and setting it equal to 0:

$$\frac{dJ(s)}{ds} = \frac{d}{ds}\left(s_r - 2sD + s^2 s_l\right) = -2D + 2s\,s_l = 0 \implies s = \frac{D}{s_l}$$
As we also saw with rotation, this does not give us an exact answer without finding the orthonormal matrix $R$, but now we are able to remove the scale factor and back-solve for it later using our optimal rotation.
Intuitively, this is the case because the version of OLS we used above "cheats" and tries to minimize error by shrinking the scale by more than it should be shrunk. This occurs because it brings the points closer together, thereby minimizing, on average, the error term. Let us look at an alternative formulation for our error term that accounts for this optimization phenomenon.
Then we can write our objective and optimization problem over scale as:
$$s^* = \arg\min_s \sum_{i=1}^n \left\|\frac{1}{\sqrt{s}}\,r'_{r,i} - \sqrt{s}\,R(r'_{l,i})\right\|_2^2$$
$$= \arg\min_s \frac{1}{s}\sum_{i=1}^n \|r'_{r,i}\|_2^2 - 2\sum_{i=1}^n r'_{r,i} \cdot R(r'_{l,i}) + s\sum_{i=1}^n \|R(r'_{l,i})\|_2^2$$
$$= \arg\min_s \frac{1}{s}\sum_{i=1}^n \|r'_{r,i}\|_2^2 - 2\sum_{i=1}^n r'_{r,i} \cdot R(r'_{l,i}) + s\sum_{i=1}^n \|r'_{l,i}\|_2^2 \qquad \text{(rotation preserves vector lengths)}$$
We then take the same definitions for these terms that we did above:
1. $s_r \triangleq \sum_{i=1}^n \|r'_{r,i}\|_2^2$
2. $D \triangleq \sum_{i=1}^n r'_{r,i} \cdot R(r'_{l,i})$
3. $s_l \triangleq \sum_{i=1}^n \|r'_{l,i}\|_2^2$
Then, as we did for the asymmetric OLS case, we can write this objective for the optimal scaling factor s∗ as:
$$s^* = \arg\min_s \left\{J(s) \triangleq \frac{1}{s}\,s_r - 2D + s\,s_l\right\}$$
Since this is an unconstrained optimization problem, we can solve this by taking the derivative w.r.t. $s$ and setting it equal to 0:

$$\frac{dJ(s)}{ds} = \frac{d}{ds}\left(\frac{1}{s}\,s_r - 2D + s\,s_l\right) = -\frac{1}{s^2}\,s_r + s_l = 0 \implies s^2 = \frac{s_r}{s_l}$$
Therefore, we can see that going in the reverse direction inverts the scale (you can verify this mathematically and intuitively by simply setting $r'_{r,i} \leftrightarrow r'_{l,i}\ \forall\, i \in \{1, ..., n\}$ and noting that you will get $s^2_{\text{inverse}} = \frac{s_l}{s_r} = \frac{1}{s^2}$). Since this method better preserves symmetry, it is preferred.
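As a quick sketch (the function name and array conventions are mine, not from the lecture), the symmetric scale estimate can be computed directly from the two point clouds, with no rotation or point-to-point correspondences needed:

```python
import numpy as np

def symmetric_scale(r_left, r_right):
    """Symmetric least-squares scale estimate between two point clouds.

    r_left, r_right: (n, 3) arrays of points in the left and right frames.
    Points are centroid-subtracted internally; the estimate
    s = sqrt(s_r / s_l) has no correspondence-dependent term and inverts
    cleanly when the roles of the two clouds are swapped.
    """
    rl = r_left - r_left.mean(axis=0)    # centroid-subtracted left cloud
    rr = r_right - r_right.mean(axis=0)  # centroid-subtracted right cloud
    s_l = np.sum(rl ** 2)                # summed squared lengths, left
    s_r = np.sum(rr ** 2)                # summed squared lengths, right
    return np.sqrt(s_r / s_l)
```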
Intuition: since $s$ no longer depends on correspondences (matches between points in the left and right point clouds), the scale simply becomes the ratio of the point cloud sizes in the two coordinate systems (note that $s_l$ and $s_r$ correspond to the summed squared vector lengths of the centroid-subtracted point clouds, which means they reflect the variance/spread/size of the point cloud in their respective coordinate systems).
We can thus deal with translation and scaling in a correspondence-free way, while also decoupling rotation. Let us now look at solving for rotation, which is covered in the next section.
If this were an unconstrained optimization problem, we could solve by taking the derivative of this objective w.r.t. our quaternion $\mathring{q}$ and setting it equal to zero. Note the following helpful identities from matrix and vector calculus:

1. $\frac{d}{da}(a \cdot b) = b$
2. $\frac{d}{da}(a^T M a) = 2Ma$ (for symmetric $M$)
However, since we are working with unit quaternions, we must take this constraint into account. We saw in lecture 18 that we did this using Lagrange multipliers; in this lecture we see that it is also possible to take this specific kind of vector-length constraint into account using Rayleigh quotients.
What are Rayleigh quotients? The intuitive idea behind them: how do I prevent my parameters from becoming too large (positive or negative) or too small (zero)? We can accomplish this by dividing our objective by our parameters, in this case the quantity appearing in our constraint. With the Rayleigh quotient taken into account, our objective becomes:

$$\frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T \mathring{q}}, \qquad \text{recall that } N \triangleq \sum_{i=1}^n \bar{R}_{l,i}^T R_{r,i}$$
How do we solve this? Since this is now an unconstrained optimization problem, we can solve it simply using the rules of calculus:

$$J(\mathring{q}) \triangleq \frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T \mathring{q}}$$

$$\frac{dJ(\mathring{q})}{d\mathring{q}} = \frac{\frac{d}{d\mathring{q}}(\mathring{q}^T N \mathring{q})\,\mathring{q}^T\mathring{q} - \mathring{q}^T N \mathring{q}\,\frac{d}{d\mathring{q}}(\mathring{q}^T\mathring{q})}{(\mathring{q}^T\mathring{q})^2} = 0$$

$$\implies \frac{2N\mathring{q}}{\mathring{q}^T\mathring{q}} - \frac{2\mathring{q}}{(\mathring{q}^T\mathring{q})^2}\,(\mathring{q}^T N \mathring{q}) = 0$$
From here, we can write this first-order condition as:

$$N\mathring{q} = \frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T \mathring{q}}\,\mathring{q}$$

Note that $\frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T \mathring{q}} \in \mathbb{R}$ (this is our objective). Therefore, we are searching for a vector of quaternion coefficients such that applying the matrix $N$ to this vector simply produces a scalar multiple of it, i.e. an eigenvector of the matrix $N$. Letting $\lambda \triangleq \frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T \mathring{q}}$, this simply becomes $N\mathring{q} = \lambda\mathring{q}$. Since this optimization problem is a maximization problem, we pick the eigenvector of $N$ that corresponds to the largest eigenvalue, which in turn maximizes the objective: the Rayleigh quotient $\frac{\mathring{q}^T N \mathring{q}}{\mathring{q}^T \mathring{q}}$ equals the eigenvalue.
Even though this quaternion-based optimization approach requires taking this Rayleigh Quotient into account, it is much easier
to do this optimization than to solve for orthonormal matrices, which either require a complex Lagrangian (if we solve with
Lagrange multipliers) or an SVD decomposition from Euclidean space to the SO(3) group (which also happens to be a manifold).
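In practice, this eigenvector extraction is one line of linear algebra. A minimal sketch, assuming the symmetric $4 \times 4$ matrix $N$ has already been assembled from the data (the function name is illustrative):

```python
import numpy as np

def optimal_quaternion(N):
    """Pick the quaternion maximizing the Rayleigh quotient q^T N q / q^T q
    for a symmetric 4x4 matrix N built from the point-cloud data.

    np.linalg.eigh returns eigenvalues in ascending order, so the last
    column of the eigenvector matrix corresponds to the largest eigenvalue;
    it is already unit length, as a rotation quaternion should be.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(N)
    return eigenvectors[:, -1]
```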
Let us start with two correspondences: if we have only two corresponding pairs of points in the 3D world, we can still rotate one object about the axis through the two points without violating the correspondences, so the transformation is not fully determined, i.e. we have an additional degree of freedom. Note that the distance between the correspondences is fixed.
Figure 1: Using two correspondences leads to only satisfying 5 of the 6 needed constraints to solve for translation and rotation
between two point clouds.
Because we have one more degree of freedom, this accounts for only 5 of the 6 needed constraints to solve for translation and
rotation, so we need to have at least 3 correspondences.
With 3 correspondences, we get 9 constraints, which leads to some redundancy. We can make use of more general transformations by incorporating scaling and generalizing the allowable transformations between the two coordinate systems to the general linear transformation; this corresponds to allowing non-orthonormal transformation matrices. This approach gives us 9 unknowns:

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} a_{14} \\ a_{24} \\ a_{34} \end{pmatrix}$$
But we also have to account for translation, which gives us another 3 unknowns, for 12 in total, and therefore requires at least 4 non-redundant correspondences in order to compute the full general linear transformation. Note that this formulation has no orthonormality constraints either!
On a practical note, this is often not needed, especially for finding the absolute orientation between two cameras, because
oftentimes the only transformations that need to be considered due to the design constraints of the system (e.g. an autonomous
car with two lidar systems, one on each side) are translation and rotation.
Recall that our matrix $N$ composed of the data has some special properties. The coefficients of its characteristic polynomial $\lambda^4 + c_3\lambda^3 + c_2\lambda^2 + c_1\lambda + c_0 = 0$ satisfy:

1. $c_3 = \mathrm{tr}(N) = 0$ (this is actually a great feature, since usually the first step in solving 4th-order polynomial equations is eliminating the third-order term).
2. $c_2 = 2\,\mathrm{tr}(M^T M)$, where $M$ is defined as the sum of dyadic products between the points in the point clouds:
$$M \triangleq \sum_{i=1}^n r'_{l,i}\,r'^{\,T}_{r,i} \in \mathbb{R}^{3\times 3}$$
3. $c_1 = 8\det(M)$
4. $c_0 = \det(N)$
What happens if $\det(M) = 0$, i.e. the matrix $M$ is singular? Then by the formulas above, the coefficient $c_1 = 0$, and the problem reduces to:

$$\lambda^4 + c_2\lambda^2 + c_0 = 0$$
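As a small worked step (not spelled out in the notes), this biquadratic can be solved in closed form by substituting $\mu = \lambda^2$:

$$\mu^2 + c_2\mu + c_0 = 0 \implies \mu = \frac{-c_2 \pm \sqrt{c_2^2 - 4c_0}}{2}, \qquad \lambda = \pm\sqrt{\mu}$$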
This case corresponds to a special geometric configuration of the point clouds: specifically, when the points are coplanar. To describe this plane in space, we need only find a normal vector $\hat{n}$ that is orthogonal to all points in the (centroid-subtracted) point cloud, i.e. the component of each point in the $\hat{n}$ direction is 0. Therefore, we can describe the plane by the equation:

$$\hat{n} \cdot r'_i = 0 \quad \forall\, i$$
Figure 2: A coplanar point cloud can be described entirely by a surface normal of the plane n̂.
Note: in the absence of measurement noise, if one point cloud is coplanar, then the other point cloud must be as well (assuming that the transformation between the point clouds is a linear transformation). This does not necessarily hold when measurement noise is introduced.
Recall that our matrix $M$, which we used above to compute the coefficients of the characteristic polynomial describing this system, is given by:

$$M \triangleq \sum_{i=1}^n r'_{r,i}\,r'^{\,T}_{l,i}$$
Therefore, when a point cloud is coplanar, the null space of $M$ is non-trivial (it contains at least $\mathrm{Span}(\{\hat{n}\})$), and therefore $M$ is singular. Recall that a matrix $M \in \mathbb{R}^{n\times d}$ is singular if $\exists\, x \in \mathbb{R}^d,\ x \neq 0$ such that $Mx = 0$, i.e. the matrix has a non-trivial null space.
Figure 3: Two coplanar point clouds. This particular configuration allows us to estimate rotation in two simpler steps.
In this case, we can actually decompose finding the right rotation into two simpler steps!
1. Rotate one plane so it lies on top of the other plane. We can read off the axis and angle from the unit normal vectors of these two planes describing the coplanarity of these point clouds, given respectively by $\hat{n}_1$ and $\hat{n}_2$ (see the sketch after this list):
• Axis: We can find the axis by noting that the axis vector will be parallel to the cross product of $\hat{n}_1$ and $\hat{n}_2$, simply scaled to a unit vector:
$$\hat{\omega} = \frac{\hat{n}_1 \times \hat{n}_2}{\|\hat{n}_1 \times \hat{n}_2\|_2}$$
• Angle: We can also solve for the angle using the two unit vectors $\hat{n}_1$ and $\hat{n}_2$:
$$\cos\theta = \hat{n}_1 \cdot \hat{n}_2, \qquad \sin\theta = \|\hat{n}_1 \times \hat{n}_2\|_2$$
We now have an axis-angle representation for the rotation between these two planes, and since the normals describe the planes of the respective point clouds, we therefore have a rotation between the two point clouds! We can convert this axis-angle representation into a quaternion with the formula we have seen before:
$$\mathring{q} = \left(\cos\frac{\theta}{2},\; \hat{\omega}\sin\frac{\theta}{2}\right)$$
2. Perform an in-plane rotation. Now that we have the quaternion representing the rotation between these two planes, we can orient the two planes on top of each other, and then just solve a 2D least-squares problem for our in-plane rotation.
With these steps, we have a rotation between the two point clouds!
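A minimal sketch of step 1 (the function name and the atan2-based angle computation are my choices, and it assumes the two normals are not parallel):

```python
import numpy as np

def plane_alignment_quaternion(n1, n2):
    """Quaternion rotating unit normal n1 onto unit normal n2 (step 1 above).

    Axis: unit vector along n1 x n2; angle: atan2(||n1 x n2||, n1 . n2),
    which is numerically safer than acos of the dot product alone.
    Assumes n1 and n2 are unit vectors and not (anti-)parallel.
    """
    cross = np.cross(n1, n2)
    sin_theta = np.linalg.norm(cross)
    cos_theta = np.dot(n1, n2)
    theta = np.arctan2(sin_theta, cos_theta)  # rotation angle between planes
    omega = cross / sin_theta                 # unit rotation axis
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * omega))
```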
1.5 Robustness
In many methods in this course, we have looked at the use of Least Squares methods to solve for estimates in the presence
of noise and many data points. Least squares produces an unbiased, minimum-variance estimate if (along with a few other
assumptions) the dataset/measurement noise is Gaussian (Gauss-Markov Theorem) [1]. But what if the measurement noise is
non-Gaussian? How do we deal with outliers in this case?
It turns out that least-squares methods are not robust to outliers. One alternative approach is to use absolute error instead. Unfortunately, however, using absolute error does not have a closed-form solution. What are our other options for dealing with outliers? One particularly useful alternative is RANSAC.
RANSAC, or Random Sample Consensus, is an algorithm for robust estimation with least squares in the presence of outliers in the measurements. The goal is to find a least-squares estimate whose threshold band captures the inliers of the dataset, with all points outside this band treated as outliers. The high-level steps of RANSAC are as follows:
1. Random Sample: Sample the minimum number of points needed to fix the transformation (e.g. 3 for absolute orientation;
some recommend taking more).
2. Fit random sample of points: Usually this involves running least squares on the sample selected. This fits a line (or
hyperplane, in higher dimensions), to the randomly-sampled points.
3. Check Fit: Evaluate the line fitted on the randomly-selected subsample on the rest of the data, and determine if the fit produces an estimate that is consistent with the "inliers" of your dataset. If the fit is good enough, accept it; if it is not, draw another random sample. Note that this step has different variations: rather than immediately terminating once you have a good fit, you can run this many times and then take the best fit.
Furthermore, for step 3, we threshold the distance from the fitted line/hyperplane to determine which points of the dataset are inliers and which are outliers (see figure below). This band is usually a fixed-width band centered around the fitted line/hyperplane, and its width is typically determined by knowing some intrinsic structure of the dataset.
Figure 4: To evaluate the goodness of fit of our sampled points, as well as to determine inliers and outliers from our dataset, we use a fixed-width band centered around the fitted line.
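Here is a minimal sketch of the procedure for a 2D line fit (all parameter names and defaults are illustrative, not from the lecture); it uses the keep-the-best-fit variation mentioned in step 3:

```python
import numpy as np

def ransac_line(points, n_iters=100, threshold=0.1, min_inliers=20, rng=None):
    """Outlier-robust 2D line fit y = a*x + b via RANSAC.

    Repeats: (1) randomly sample the minimum number of points (2 for a
    line), (2) fit them, (3) count points inside the threshold band,
    keeping the candidate with the most inliers; then refits to them.
    points: (n, 2) array of (x, y) measurements.
    """
    rng = rng or np.random.default_rng()
    best_inliers = None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):                    # skip degenerate vertical pairs
            continue
        a = (y2 - y1) / (x2 - x1)                 # slope of candidate line
        b = y1 - a * x1                           # intercept of candidate line
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = residuals < threshold           # points inside the band
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None or best_inliers.sum() < min_inliers:
        raise RuntimeError("no acceptable consensus found")
    # final least-squares fit on the consensus set of inliers
    x, y = points[best_inliers, 0], points[best_inliers, 1]
    return np.polyfit(x, y, 1)                    # returns (a, b)
```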
Another interpretation of RANSAC: counting the "maximally-occupied" cell in Hough transform parameter space! Another way to find the best-fitting line that is robust to outliers:
1. Repeatedly sample subsets from the dataset/set of measurements, and fit these subsets of points using least squares
estimates.
2. For each fit, map the points to a discretized Hough transform parameter space, and have an accumulator array that keeps
track of how often a set of parameters falls into a discretized cell. Each time a set of parameters falls into a discretized
cell, increment it by one.
3. After N sets of random samples/least-squares fits, pick the parameters corresponding to the cell that is "maximally-occupied", aka has been incremented the most number of times! Take this as your outlier-robust estimate.
Figure 5: Another way to perform RANSAC using Hough Transforms: map each fit from the subsamples of measurements to a
discretized Hough Transform (parameter) space, and look for the most common discretized cell in parameter space to use for an
outlier-robust least-squares estimate.
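A minimal sketch of this accumulator idea for a 2D line fit (the function name, parameter ranges, and bin counts are illustrative choices, not from the lecture):

```python
import numpy as np

def hough_consensus_fit(points, n_fits=500, bins=50,
                        a_range=(-5.0, 5.0), b_range=(-5.0, 5.0), rng=None):
    """Repeatedly fit random point pairs and vote for the resulting
    (slope, intercept) in a discretized parameter grid; return the center
    of the maximally-occupied cell as an outlier-robust line estimate.
    """
    rng = rng or np.random.default_rng()
    acc = np.zeros((bins, bins), dtype=int)       # accumulator array
    for _ in range(n_fits):
        idx = rng.choice(len(points), size=2, replace=False)
        a, b = np.polyfit(points[idx, 0], points[idx, 1], 1)
        ia = int(np.floor((a - a_range[0]) / (a_range[1] - a_range[0]) * bins))
        ib = int(np.floor((b - b_range[0]) / (b_range[1] - b_range[0]) * bins))
        if 0 <= ia < bins and 0 <= ib < bins:
            acc[ia, ib] += 1                      # vote for this parameter cell
    ia, ib = np.unravel_index(np.argmax(acc), acc.shape)
    a = a_range[0] + (ia + 0.5) * (a_range[1] - a_range[0]) / bins
    b = b_range[0] + (ib + 0.5) * (b_range[1] - b_range[0]) / bins
    return a, b
```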
1.6 Sampling the Space of Rotations
Why are we interested in this space? Many orientation problems we have studied so far do not have a closed-form solution and may require sampling. How do we sample from the space of rotations?
One way to sample from a sphere is with latitude and longitude, given by $(\theta_i, \phi_i)$ respectively. The problem with this approach, however, is that we sample points that are too close together at the poles. Alternatively, we can generate random latitudes $\theta_i$ and longitudes $\phi_i$, where:
• $-\frac{\pi}{2} \le \theta_i \le \frac{\pi}{2} \;\forall\, i$
• $-\pi \le \phi_i \le \pi \;\forall\, i$
But this approach suffers from the same problem: it samples too densely at the poles. Can we do better?
Idea: Map all points (both inside the sphere and outside the sphere/inside the cube) onto the sphere by connecting a line
from the origin to the sampled point, and finding the point where this line intersects the sphere.
Figure 6: Sampling from a sphere by sampling from a cube and projecting it back to the sphere.
Problem with this approach: it disproportionately samples in the directions of the cube's corners and edges. We could use sampling weights to mitigate this effect, but better yet, we can simply discard any samples that fall outside the sphere and project the survivors onto it. To avoid numerical issues, it is also best to discard points very close to the origin.
Generalization to 4D: As we mentioned above, our goal is to generalize this from 3D to 4D. Cubes and spheres simply
become 4-dimensional - enabling us to sample quaternions.
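A minimal sketch of this rejection-sampling idea in 4D (the function name, the eps cutoff, and the defaults are illustrative):

```python
import numpy as np

def sample_random_quaternions(n, rng=None, eps=1e-3):
    """Uniformly sample unit quaternions by rejection sampling in the
    4D cube [-1, 1]^4, as in the projection idea above.

    Points outside the unit 4-sphere are discarded to avoid the bias
    toward the cube's corners; points too close to the origin are also
    discarded to avoid numerical trouble; survivors are then projected
    (normalized) onto the unit sphere of quaternions.
    """
    rng = rng or np.random.default_rng()
    samples = []
    while len(samples) < n:
        q = rng.uniform(-1.0, 1.0, size=4)
        r = np.linalg.norm(q)
        if eps < r <= 1.0:          # keep only points inside the sphere
            samples.append(q / r)   # project onto the unit 4-sphere
    return np.array(samples)
```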
As we did for the cube, we can do the same for polyhedra: to sample from the sphere, we can sample from the polyhedron, and then project onto the point on the sphere that intersects the line from the origin to the sampled point on the polyhedron. From this, the edges of these polyhedra project onto great circles on the sphere.
Fun fact: soccer balls have 32 faces! More related to geometry: the soccer ball pattern comes from a semi-regular solid, the truncated icosahedron.
The following rotations give convenient quaternion samples:

1. $0$ (identity): $\mathring{q} = (1, \mathbf{0})$
2. $\pi$ about $\hat{x}$: $\mathring{q} = (\cos\frac{\pi}{2}, \sin\frac{\pi}{2}\,\hat{x}) = (0, \hat{x})$
3. $\pi$ about $\hat{y}$: $\mathring{q} = (\cos\frac{\pi}{2}, \sin\frac{\pi}{2}\,\hat{y}) = (0, \hat{y})$
4. $\pi$ about $\hat{z}$: $\mathring{q} = (\cos\frac{\pi}{2}, \sin\frac{\pi}{2}\,\hat{z}) = (0, \hat{z})$
5. $\frac{\pi}{2}$ about $\hat{x}$: $\mathring{q} = (\cos\frac{\pi}{4}, \sin\frac{\pi}{4}\,\hat{x}) = \frac{1}{\sqrt{2}}(1, \hat{x})$
6. $\frac{\pi}{2}$ about $\hat{y}$: $\mathring{q} = (\cos\frac{\pi}{4}, \sin\frac{\pi}{4}\,\hat{y}) = \frac{1}{\sqrt{2}}(1, \hat{y})$
7. $\frac{\pi}{2}$ about $\hat{z}$: $\mathring{q} = (\cos\frac{\pi}{4}, \sin\frac{\pi}{4}\,\hat{z}) = \frac{1}{\sqrt{2}}(1, \hat{z})$
8. $-\frac{\pi}{2}$ about $\hat{x}$: $\mathring{q} = (\cos(-\frac{\pi}{4}), \sin(-\frac{\pi}{4})\,\hat{x}) = \frac{1}{\sqrt{2}}(1, -\hat{x})$
9. $-\frac{\pi}{2}$ about $\hat{y}$: $\mathring{q} = (\cos(-\frac{\pi}{4}), \sin(-\frac{\pi}{4})\,\hat{y}) = \frac{1}{\sqrt{2}}(1, -\hat{y})$
10. $-\frac{\pi}{2}$ about $\hat{z}$: $\mathring{q} = (\cos(-\frac{\pi}{4}), \sin(-\frac{\pi}{4})\,\hat{z}) = \frac{1}{\sqrt{2}}(1, -\hat{z})$
These 10 rotations by themselves give us 10 ways to sample the rotation space. How can we construct more samples? We can
do so by taking quaternion products, specifically, products of these 10 quaternions above. Let us look at just a couple of
these products:
1. $(0, \hat{x})(0, \hat{y})$:
$$(0, \hat{x})(0, \hat{y}) = (-\hat{x}\cdot\hat{y},\; \hat{x}\times\hat{y}) = (0, \hat{z})$$
We see that this simply produces the third axis, as we would expect. This does not give us a new rotation to sample from. Next, let us look at one that does.
2. $\frac{1}{\sqrt{2}}(1, \hat{x})\,\frac{1}{\sqrt{2}}(1, \hat{y})$:
$$\frac{1}{\sqrt{2}}(1, \hat{x})\,\frac{1}{\sqrt{2}}(1, \hat{y}) = \frac{1}{2}\left(1 - \hat{x}\cdot\hat{y},\; \hat{y} + \hat{x} + \hat{x}\times\hat{y}\right) = \frac{1}{2}\left(1,\; \hat{x} + \hat{y} + \hat{x}\times\hat{y}\right)$$
This yields the following axis-angle representation:
• Axis: $\frac{1}{\sqrt{3}}(1\;\;1\;\;1)^T$
• Angle: $\cos\frac{\theta}{2} = \frac{1}{2} \implies \frac{\theta}{2} = \frac{\pi}{3} \implies \theta = \frac{2\pi}{3}$
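These products are easy to check numerically. A minimal sketch of the quaternion product used above (the function name is illustrative):

```python
import numpy as np

def quaternion_product(p, q):
    """Product of quaternions p = (p0, p_vec), q = (q0, q_vec):
    (p0*q0 - p_vec . q_vec,  p0*q_vec + q0*p_vec + p_vec x q_vec).
    """
    p0, pv = p[0], np.asarray(p[1:])
    q0, qv = q[0], np.asarray(q[1:])
    scalar = p0 * q0 - np.dot(pv, qv)
    vector = p0 * qv + q0 * pv + np.cross(pv, qv)
    return np.concatenate(([scalar], vector))

# reproduce product 2 above: pi/2 rotations about x-hat and y-hat
p = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
q = np.array([1.0, 0.0, 1.0, 0.0]) / np.sqrt(2)
print(quaternion_product(p, q))  # -> [0.5, 0.5, 0.5, 0.5]
```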
1.7 References
1. Gauss-Markov Theorem, Wikipedia: https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms