Computer Vision: Three-Dimensional Reconstruction Techniques - Andrea Fusiello
Computer vision has had a profound impact on the way we interact with technology. From facial recognition to self-driving cars, the applications of computer vision are vast and ever-expanding. Geometry plays a fundamental role in this discipline, providing the necessary mathematical framework to understand the underlying principles of how we perceive and interpret visual information in the world around us.
This text delves into the theories and computational
techniques
used for determining the geometric properties of solid
objects through images. It covers the fundamental concepts
and provides the necessary mathematical background for
more advanced studies. The book is
divided into clear and concise chapters that cover a wide
range of
topics, including image formation, camera models, feature
detection, and 3D reconstruction. Each chapter includes
detailed explanations of the theory, as well as practical
examples to help readers understand and apply the
concepts presented.
With a focus on teaching, the book aims to find a balance
between
complexity of the theory and its practical applicability in
terms of
implementation. Instead of providing an all-encompassing
overview of the current state of the field, it offers a
selection of specific methods
with enough detail for readers to implement them. To aid
the reader in implementation, most of the methods
discussed in the book are
accompanied by a MATLAB listing and the sources are
available on Github at
https://github.com/fusiello/Computer_Vision_Toolkit.
This approach results in leaving out several valuable
topics and algorithms, but this does not mean that they
are any less important than the ones that have been
included; it is simply a personal choice.
The book has been written with the intention of being
used as a
primary resource for students of university courses on
computer
vision, specifically final-year undergraduates or
postgraduates of
computer science or engineering degrees. It is also useful
for self-study
and for those who are using computer vision for practical
applications outside of academia.
Basic knowledge of linear algebra is necessary, while other mathematical concepts are introduced as needed in the appendices. The modular structure
allows instructors to adapt the
material to fit their course syllabus, but it is
recommended to cover at least the chapters on
fundamental geometric concepts, namely Chaps. 3, 4, 5, 6,
7, 8.
This edition has been updated to make it accessible to a global audience and to keep the material current and up to date with the latest developments in the
field. To accomplish this, the book has been translated to
English and has undergone extensive
revision from its previous version, which was published by
Franco
Angeli, Milano. The majority of the chapters have
undergone changes, including the addition of new material as well as the
reorganisation of existing content.
I hope that you will find this book to be a valuable
resource as you explore the exciting world of computer
vision.
Andrea
Fusiello
Udine,
Italy
December, 2022
Acknowledgements
This text is derived from the handouts I have prepared for
academic
courses or seminar presentations over the past 20 years.
The first
chapters were born, in embryonic form, in 1997 and then
evolved and expanded to the current version. I would like
to thank the students of the University of Udine and the
University of Verona who, during these years, have pointed
out errors, omissions and unclear parts. Homeopathic
traces of my PhD thesis can also be found here and there.
The text benefited from the timely corrections suggested
by
Federica Arrigoni, Guido Maria Cortelazzo, Fabio Crosilla,
Riccardo
Gherardi, Luca Magri, Francesco Malapelle, Samuele
Martelli, Eleonora Maset, Roberto Rinaldo, and Roberto
Toldo, whom I sincerely thank. The residual errors are
solely my responsibility.
Credits for figures are recognised in the respective
captions.
Acronyms
PPM Perspective Projection Matrix
COP Centre of Projection
SVD Singular Value Decomposition
OPA Orthogonal Procrustes analysis
GPA Generalised Procrustes analysis
BA Bundle Adjustment
AIM Adjustment of Independent Models
DLT Direct Linear Transform
ICP Iterative Closest Point
GSD Ground Sampling Distance
IRLS Iteratively Reweighted Least-Squares
LMS Least Median of Squares
RANSAC Random Sample Consensus
SFM Structure from Motion
SGM Semi Global Matching
SO Scanline Optimisation
WTA Winner Takes All
DSI Disparity Space Image
LS Least-Squares
LM Levenberg-Marquardt
SIFT Scale Invariant Feature Transform
TOF Time of Flight
SSD Sum of Squared Difference
SAD Sum of Absolute Difference
NCC Normalised Cross Correlation
Contents
1 Introduction
1.1 The Prodigy of Vision
1.2 Low-Level Computer Vision
1.3 Overview of the Book
1.4 Notation
References
2 Fundamentals of Imaging
2.1 Introduction
2.2 Perspective
2.3 Digital Images
2.4 Thin Lenses
2.4.1 Telecentric Optics
2.5 Radiometry
References
3 The Pinhole Camera Model
3.1 Introduction
3.2 Pinhole Camera
3.3 Simplified Pinhole Model
3.4 General Pinhole Model
3.4.1 Intrinsic Parameters
3.4.2 Extrinsic Parameters
3.5 Dissection of the Perspective Projection Matrix
3.5.1 Collinearity Equations
3.6 Radial Distortion
Problems
References
4 Camera Calibration
4.1 Introduction
4.2 The Direct Linear Transform Method
4.3 Factorisation of the Perspective Projection Matrix
4.4 Calibrating Radial Distortion
4.5 The Sturm-Maybank-Zhang Calibration Algorithm
Problems
References
5 Absolute and Exterior Orientation
5.1 Introduction
5.2 Absolute Orientation
5.2.1 Orthogonal Procrustes Analysis
5.3 Exterior Orientation
5.3.1 Fiore’s Algorithm
5.3.2 Procrustean Method
5.3.3 Direct Method
Problems
References
6 Two-View Geometry
6.1 Introduction
6.2 Epipolar Geometry
6.3 Fundamental Matrix
6.4 Computing the Fundamental Matrix
6.4.1 The Seven-Point Algorithm
6.4.2 Preconditioning
6.5 Planar Homography
6.5.1 Computing the Homography
6.6 Planar Parallax
Problems
References
7 Relative Orientation
7.1 Introduction
7.2 The Essential Matrix
7.2.1 Geometric Interpretation
7.2.2 Computing the Essential Matrix
7.3 Relative Orientation from the Essential Matrix
7.3.1 Closed Form Factorisation of the
Essential Matrix
7.4 Relative Orientation from the Calibrated
Homography
Problems
References
8 Reconstruction from Two Images
8.1 Introduction
8.2 Triangulation
8.3 Ambiguity of Reconstruction
8.4 Euclidean Reconstruction
8.5 Projective Reconstruction
8.6 Euclidean Upgrade from Known Intrinsic Parameters
8.7 Stratification
Problems
References
9 Non-linear Regression
9.1 Introduction
9.2 Algebraic Versus Geometric Distance
9.3 Non-linear Regression of the PPM
9.3.1 Residual
9.3.2 Parameterisation
9.3.3 Derivatives
9.3.4 General Remarks
9.4 Non-linear Regression of Exterior Orientation
9.5 Non-linear Regression of a Point in Space
9.5.1 Residual
9.5.2 Derivatives
9.5.3 Radial Distortion
9.6 Regression in the Joint Image Space
9.7 Non-linear Regression of the Homography
9.7.1 Residual
9.7.2 Parameterisation
9.7.3 Derivatives
9.8 Non-linear Regression of the Fundamental
Matrix
9.8.1 Residual
9.8.2 Parameterisation
9.8.3 Derivatives
9.9 Non-linear Regression of Relative Orientation
9.9.1 Parameterisation
9.9.2 Derivatives
9.10 Robust Regression
Problems
References
10 Stereopsis: Geometry
10.1 Introduction
10.2 Triangulation in the Normal Case
10.3 Epipolar Rectification
10.3.1 Calibrated Rectification
10.3.2 Uncalibrated Rectification
Problems
References
11 Feature Points
11.1 Introduction
11.2 Filtering Images
11.2.1 Smoothing
11.2.2 Derivation
11.3 LoG Filtering
11.4 Harris-Stephens Operator
11.4.1 Matching and Tracking
11.4.2 Kanade-Lucas-Tomasi Algorithm
11.4.3 Predictive Tracking
11.5 Scale Invariant Feature Transform
11.5.1 Scale-Space
11.5.2 SIFT Detector
11.5.3 SIFT Descriptor
11.5.4 Matching
References
12 Stereopsis: Matching
12.1 Introduction
12.2 Constraints and Ambiguities
12.3 Local Methods
12.3.1 Matching Cost
12.3.2 Census Transform
12.4 Adaptive Support
12.4.1 Multiresolution Stereo Matching
12.4.2 Adaptive Windows
12.5 Global Matching
12.6 Post-Processing
12.6.1 Reliability Indicators
12.6.2 Occlusion Detection
References
13 Range Sensors
13.1 Introduction
13.2 Structured Lighting
13.2.1 Active Stereopsis
13.2.2 Active Triangulation
13.2.3 Ray-Plane Triangulation
13.2.4 Scanning Methods
13.2.5 Coded-Light Methods
13.3 Time-of-Flight Sensors
13.4 Photometric Stereo
13.4.1 From Normals to Coordinates
13.5 Practical Considerations
References
14 Multi-View Euclidean Reconstruction
14.1 Introduction
14.1.1 Epipolar Graph
14.1.2 The Case of Three Images
14.1.3 Taxonomy
14.2 Point-Based Approaches
14.2.1 Adjustment of Independent Models
14.2.2 Incremental Reconstruction
14.2.3 Hierarchical Reconstruction
14.3 Frame-Based Approach
14.3.1 Synchronisation of Rotations
14.3.2 Synchronisation of Translations
14.3.3 Localisation from Bearings
14.4 Bundle Adjustment
14.4.1 Jacobian of Bundle Adjustment
14.4.2 Reduced System
References
15 3D Registration
15.1 Introduction
15.1.1 Generalised Procrustes Analysis
15.2 Correspondence-Less Methods
15.2.1 Registration of Two Point Clouds
15.2.2 Iterative Closest Point
15.2.3 Registration of Many Point Clouds
References
16 Multi-view Projective Reconstruction and
Autocalibration
16.1 Introduction
16.1.1 Sturm-Triggs Factorisation Method
16.2 Autocalibration
16.2.1 Absolute Quadric Constraint
16.2.2 Mendonça-Cipolla Method
16.3 Autocalibration via H∞
16.4 Tomasi-Kanade’s Factorisation
16.4.1 Affine Camera
16.4.2 The Factorisation Method for Affine
Camera
Problems
References
17 Multi-view Stereo Reconstruction
17.1 Introduction
17.2 Volumetric Stereo in Object-Space
17.2.1 Shape from Silhouette
17.2.2 Szeliski’s Algorithm
17.2.3 Voxel Colouring
17.2.4 Space Carving
17.3 Volumetric Stereo in Image-Space
17.4 Marching Cubes
References
18 Image-Based Rendering
18.1 Introduction
18.2 Parametric Transformations
18.2.1 Mosaics
18.2.2 Image Stabilisation
18.2.3 Perspective Rectification
18.3 Non-parametric Transformations
18.3.1 Transfer with Depth
18.3.2 Transfer with Disparity
18.3.3 Epipolar Transfer
18.3.4 Transfer with Parallax
18.3.5 Ortho-Projection
18.4 Geometric Image Transformation
Problems
References
A Notions of Linear Algebra
A.1 Introduction
A.2 Scalar Product
A.3 Matrix Norm
A.4 Inverse Matrix
A.5 Determinant
A.6 Orthogonal Matrices
A.7 Linear and Quadratic Forms
A.8 Rank
A.9 QR Decomposition
A.10 Eigenvalues and Eigenvectors
A.11 Singular Value Decomposition
A.12 Pseudoinverse
A.13 Cross Product
A.14 Kronecker’s Product
A.15 Rotations
A.16 Matrices Associated with Graphs
References
B Matrix Differential Calculation
B.1 Derivatives of Vector and Matrix Functions
B.2 Derivative of Rotations
B.2.1 Axis/Angle Representation
B.2.2 Euler Representation
References
C Regression
C.1 Introduction
C.2 Least-Squares
C.2.1 Linear Least-Squares
C.2.2 Non-linear Least-Squares
C.2.3 The Levenberg-Marquardt Method
C.3 Robust Regression
C.3.1 Outliers and Robustness
C.3.2 M-Estimators
C.3.3 Least Median of Squares
C.3.4 RANSAC
C.4 Propagation of Uncertainty
C.4.1 Covariance Propagation in Least-Squares
References
D Notions of Projective Geometry
D.1 Introduction
D.2 Perspective Projection
D.3 Homogeneous Coordinates
D.4 Equation of the Line
D.5 Transformations
Reference
E MATLAB Code
Index
Listings
3.1 Projective transformation
6.3 Preconditioning
8.1 Triangulation
derivatives
9.6 Parameterisation of H
9.9 Parameterisation of F
C.1 Gauss-Newton method
C.2 IRLS
C.4 MSAC
1. Introduction
1.4 Notation
The notation largely follows that of Faugeras (1993), with 3D points called M (from the French “monde”) and camera matrices called P (from “projection” and “perspective”), although P for points and M for matrices would have sounded more intuitive. Vectors are in bold, 2D ones in lower case and 3D ones in upper case. Matrices are capitalised. The ˜ above a vector is used to distinguish its Cartesian representation (with ˜) from the homogeneous one (without ˜). This choice is the opposite of Faugeras (1993) and is motivated by the fact that homogeneous coordinates are the default in this book, while Cartesian coordinates are rarely used.
References
J. Aloimonos and D. Shulman. Integration of Visual Modules. An Extension to the Marr
Paradigm. Academic Press, Waltham, MA, 1989.
O. Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. The MIT
Press, Cambridge, MA, 1993.
R. Jain, R. Kasturi, and B.G. Schunck. Machine Vision. Computer Science
Series. McGraw-Hill International Editions, 1995.
D. Marr. Vision. Freeman, San Francisco, CA, 1982.
2. Fundamentals of Imaging
2.1 Introduction
An imaging device works by collecting light reflected from
objects in the scene and creating a two-dimensional
image. If we want to use the image to gain information
about the scene, we need to understand the nature of this process, which we would like to be able to reverse.
2.2 Perspective
The simplest geometric model of image formation is the pinhole camera, represented in Fig. 2.1. With the centre of projection at the origin of the camera frame and the image plane at distance f behind it, similar triangles give

\[ \frac{x}{-f} = \frac{X}{Z}, \qquad \frac{y}{-f} = \frac{Y}{Z} \qquad (2.1) \]

and therefore

\[ x = -f\,\frac{X}{Z}, \qquad y = -f\,\frac{Y}{Z}. \qquad (2.2) \]
Note that the image is inverted with respect to the scene,
both left-right and top-bottom, as indicated by the minus
sign. These equations define the image formation process,
which is called perspective projection.
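As a toy illustration of these formulas (a sketch with made-up values, not part of the book's MATLAB toolkit), the projection of a single scene point expressed in the camera reference frame can be computed as follows:

% Pinhole projection of one scene point given in the camera frame;
% the image plane is at distance f behind the pinhole, hence the sign change.
f = 0.05;                 % focal distance (example value, in metres)
M = [0.2; 0.1; 1.5];      % scene point [X; Y; Z], with Z > 0 in front of the pinhole
x = -f * M(1) / M(3);     % image abscissa
y = -f * M(2) / M(3);     % image ordinate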
The parameter f determines the magnification factor of the image: if the image plane is close to the pinhole, the image is reduced in size, while it becomes larger and larger as f increases. Since the working area of the image plane is fixed, moving the image plane away also reduces the field of view, that is, the solid angle that contains the portion of the scene that appears in the image.
The division by Z is responsible for the foreshortening effect, whereby the apparent size of an object in the image decreases according to its distance from the observer, such as the sleepers of the tracks of Fig. 2.2, which, although being of the same length in reality, appear shortened to different extents in the image. In the words of Leonardo da Vinci: “Infra le cose d’egual grandezza quella che sarà più distante dall’ochio si dimostrerà di minore figura”.1
(2.3)
If a diaphragm with a small hole is placed in correspondence of the focus F, as in Fig. 2.6, only the rays leaving M in a direction parallel to the optical axis are selected, while the other rays are blocked. This results in a device that produces an image according to a pattern that is obviously different from the pinhole/perspective model. The image of the point M, in fact, does not depend on its depth: this is indeed a telecentric camera, which realises an orthographic projection.
2.5 Radiometry
Light is emitted by the sun or other sources and interacts
with objects in the scene; part is absorbed, and part is
scattered and propagates in new directions. Among these directions, some point towards the camera sensor and contribute to the image formation.
The brightness of a pixel in the image I is proportional to the “amount of light” that the surface patch centred at a point M scatters in the direction of the camera, where the patch is the portion of the surface that projects onto the pixel. This in turn depends on the surface reflectance and on the spatial distribution of the light sources.
(2.6)
(2.8)
Footnotes
1 “Of things of equal size, that which is most distant from the eye will appear
of lesser figure”
Email: [email protected]
Andrea Fusiello
3.1 Introduction
Referring back to the geometric aspect of image formation,
covered in
Chap. 2, we shall now introduce a geometric model for the
camera, that is, we will study how the position of a point in
the scene and the position of the corresponding point in the
image are related. We make the
simplifying assumption that lenses can be neglected
as far as the geometry of projection is concerned.
(3.1)

that is
(3.2)
Fig. 3.3 Nadiral view of the camera model. The two triangles that have
and for hypotenuse are similar to each other
This is the perspective projection. The transformation in
Cartesian coordinates is clearly non-linear because of the
division by Z. Using homogeneous coordinates1 instead
it becomes linear.
Let us therefore set
(3.3)
(3.4)
(3.6)
where ≃ means “equal up to a scaling factor”.
The matrix P represents the geometric model of the camera and is called the Perspective Projection Matrix (PPM) or camera matrix.
Listing 3.1 reports the implementation of a generic projective transformation, which includes perspective projection as a special case.

Listing 3.1 Projective transformation
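The listing itself is not reproduced in this copy; a minimal sketch of what a generic projective transformation routine might look like is given below. The function name and the convention of storing points as columns are assumptions of this sketch, not necessarily those of the book's toolkit.

function y = apply_projectivity(T, x)
% Apply a projective transformation T to points x (Cartesian coordinates,
% one point per column). Hypothetical helper, not the book's Listing 3.1.
    xh = [x; ones(1, size(x, 2))];      % lift to homogeneous coordinates
    yh = T * xh;                        % the map is linear in homogeneous coordinates
    y  = yh(1:end-1, :) ./ yh(end, :);  % back to Cartesian: divide by the last row
end

For a 3 x 4 camera matrix the input points are 3D and the output points are 2D, so perspective projection is obtained as a special case.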
(3.7)
(3.11)
(3.12)
Not all the intrinsic parameters need to be estimated from scratch: one can make educated guesses on some of them. Normally, it is safe to assume zero skew, as we implicitly did in the beginning. Also, very often the footprint of the pixel is square, that is, the aspect ratio is equal to one. Pushing this further, one could reduce the intrinsic parameters to the sole focal length by fixing the principal point in the centre of the image.
Some authors parameterise K with its entries by
introducing the focal length expressed in vertical pixels
and the skew parameter s:
(3.13)
Fig. 3.6 The pyramid of vision (extending to the right of C) and the field of view
(3.14)

And, conversely:

(3.15)
Focal length and field of view are related, but the latter is a more intelligible parameter than the focal length in pixels: an educated guess for the field of view is much easier to make. For this reason, in our MATLAB toolkit we choose to parameterise the focal length as in (3.15), obtaining five intrinsic parameters. Thanks to the modular approach of the toolkit, however, this aspect remains encapsulated in the function par2K (Listing 3.2), which computes the parameterisation and its derivative. We will henceforth use the focal length as the first parameter in the text, but it is understood that the implementation uses the field of view instead.
When the world reference frame does not coincide with the camera reference frame, we must introduce a coordinate change consisting of a direct isometry, that is, a rotation R followed by a translation t. Let us denote the homogeneous coordinates of a point in the camera reference frame and in the world reference frame, respectively; we can therefore write:
(3.16)

where

(3.17)

(3.18)

Substituting (3.16) into the equation above gives:

(3.22)
we get the following expression for P as a function of the
elements of the matrices K and G:
(3.23)
(3.26)
(3.27)
(3.29)
Substituting the COP into the above equation yields the relation between the COP and the translation component of the exterior orientation:

(3.30)
(3.32)
(3.33)
(3.34)
Furthermore, since:
(3.35)
(3.36)
We obtained what in photogrammetry are known as the collinearity equations (Kraus 2007). They express the fact that the COP C, the object point M and the image point m are aligned on the same optical ray. The rows of R can be further expanded into their components, which are combinations of trigonometric functions of the rotation angles.
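For reference, in one common convention (the symbols and the sign convention below are generic assumptions, not necessarily the exact form printed in the book), the collinearity equations read:

\[
u - u_0 = -f\,\frac{r_{11}(X - X_C) + r_{12}(Y - Y_C) + r_{13}(Z - Z_C)}{r_{31}(X - X_C) + r_{32}(Y - Y_C) + r_{33}(Z - Z_C)},
\qquad
v - v_0 = -f\,\frac{r_{21}(X - X_C) + r_{22}(Y - Y_C) + r_{23}(Z - Z_C)}{r_{31}(X - X_C) + r_{32}(Y - Y_C) + r_{33}(Z - Z_C)},
\]

where \((X_C, Y_C, Z_C)\) are the coordinates of the COP, \(r_{ij}\) are the entries of R and \((u_0, v_0)\) is the principal point.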
Fig. 3.9 Pincushion (outer dashed red line) and barrel (inner dashed blue line) radial distortion, obtained with distortion coefficients of opposite sign
The standard model in computer vision4 is a transformation defined on the normalised image coordinates that maps the ideal (undistorted) coordinates onto the observable (distorted) ones:
(3.37)
(3.38)
(3.39)
(3.40)
(3.41)
(3.43)
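As an illustration, a polynomial radial distortion of this kind can be applied to normalised coordinates as in the sketch below; the number of coefficients and their names are assumptions of this example, not the book's exact model.

% Apply polynomial radial distortion to ideal normalised coordinates.
k  = [0.1, -0.02];                       % example distortion coefficients [k1 k2]
x  = [0.3, -0.2; 0.1, 0.4];              % two example points, one per column
r2 = sum(x.^2, 1);                       % squared radial distance of each point
xd = x .* (1 + k(1)*r2 + k(2)*r2.^2);    % distorted normalised coordinates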
Problems
3.1 Prove that if M is a point at infinity, its projection is m ≃ KRd, where d is the vector of its first three homogeneous components (its direction).
References
Pierre Drap and Julien Lefèvre. An Exact Formula for Calculating Inverse Radial
Lens Distortions. Sensors, 16 (6): 807, 2016.
Karl Kraus. Photogrammetry - Geometry from Images and Laser Scans - 2nd edition.
Walter de Gruyter, Berlin, 2007.
[Crossref]
Footnotes
1 The homogeneous coordinates of a 2D point in the image are the triplets (ku, kv, k) with k ≠ 0, where (u, v) are the corresponding Cartesian coordinates. Thus, there is a one-to-many correspondence between Cartesian and homogeneous coordinates. Homogeneous coordinates can represent any point in the Euclidean plane and also points at infinity, which have the third component equal to zero and therefore do not have a corresponding Cartesian representation. See Appendix D.
2 Humans instead have a circular sensor; hence, we speak of the cone of vision.
People have an approximate angle of undistorted vision that extends as a
fictitious cone from their eyes forward.
4. Camera Calibration
4.1 Introduction
In the previous chapter we introduced a geometric model
for the camera and we shall now address the problem of
precisely and accurately
determining the parameters of that model, a process that
goes by the name of calibration of the camera.
The idea of the resection methods for calibration is that
by knowing the image projections of 3D points of known
coordinates (control
points), it should be possible to obtain the unknown
parameters by
solving the perspective projection equation. The specific
algorithm we will illustrate, called Direct Linear Transform
(DLT), directly estimates the PPM, from which the intrinsic
and extrinsic parameters can
subsequently be derived.
Resection is not the only way to perform calibration (e.g.
Caprile and Torre 1990 uses vanishing points). In Sect. 4.5
we will also see a non-resection calibration method.
Fig. 4.1 Calibration object with the world reference frame superimposed.
One of the control points is highlighted
(4.4)
where A is the coefficient matrix, which depends on the coordinates of the control points, while the vector of unknowns contains the 12 elements of P read by columns. In theory, then, six non-coplanar points are sufficient for the computation of P; if any four of them are coplanar, the rank of A drops and the DLT algorithm fails (Faugeras 1993).
In practice, it is advisable to use all the available points (the more, the better) to compensate for the inevitable measurement errors. System (4.4) is therefore solved with the least-squares method: according to Proposition A.13, the solution is the right singular vector of A associated with the smallest singular value.
Listing 4.1 implements this method.
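The book's Listing 4.1 is not reproduced in this copy. A self-contained sketch of the DLT resection is given below; the row ordering of the coefficient matrix (and hence the final reshape) is a choice of this sketch and may differ from the book's implementation.

function P = dlt_resection(M, m)
% M: 3 x n control points (Cartesian), m: 2 x n image points (pixels), n >= 6.
% Returns the 3 x 4 camera matrix P up to scale (sketch, no preconditioning).
    n = size(M, 2);
    A = zeros(2*n, 12);
    for i = 1:n
        X = [M(:, i); 1]';                  % homogeneous control point (row vector)
        u = m(1, i);  v = m(2, i);
        A(2*i-1, :) = [X, zeros(1, 4), -u*X];
        A(2*i,   :) = [zeros(1, 4), X, -v*X];
    end
    [~, ~, V] = svd(A);                     % least-squares solution: right singular
    p = V(:, end);                          % vector of the smallest singular value
    P = reshape(p, 4, 3)';                  % the rows of P were stacked in p
end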
(4.5)
Let us focus on the leading 3 × 3 submatrix of P: by comparison with (3.23), it must be the product KR of an upper triangular matrix K and an orthogonal matrix R (with positive determinant). Let

(4.6)

be the QR factorisation of its inverse, with Q orthogonal and U upper triangular. Since the inverse of KR is the product of an orthogonal matrix and an upper triangular matrix, it suffices to set

(4.7)

where the product by ±1 is used to make the determinant of R positive, as required for a rotation matrix. We are allowed to change the sign of R since this results in changing the sign of P, which can absorb an arbitrary scaling factor. For the same reason, we can rescale K by imposing that its last diagonal entry equals 1. The translation is eventually calculated with

(4.8)

where we take into account the possible change of sign of P.
Listing 4.3 shows the MATLAB implementation of the method.
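A minimal sketch of such a factorisation, based on the QR decomposition of the inverse of the leading 3 × 3 block, is shown below; the sign handling is simplified with respect to what the book's Listing 4.3 presumably does.

function [K, R, t] = krt_from_ppm(P)
% Factor a 3 x 4 camera matrix P ~ K*[R t]; sketch, not the book's listing.
    Q3 = P(:, 1:3);
    [O, U] = qr(inv(Q3));            % inv(K*R) = orthogonal * upper triangular
    R = O';
    K = inv(U);
    S = diag(sign(diag(K)));         % flip signs so that diag(K) > 0 ...
    K = K * S;  R = S * R;           % ... without changing the product K*R (S*S = I)
    if det(R) < 0                    % enforce a proper rotation; absorbs a sign in P
        R = -R;  P = -P;
    end
    t = K \ P(:, 4);                 % translation, after the possible sign change of P
    K = K / K(3, 3);                 % conventional normalisation of K
end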
Fig. 4.2 The map bringing any plane of the scene onto the image plane is a homography
We assume that there is a plane in the scene of which we
are able to calculate the homography that maps it to the
image. It is customary to prepare for this purpose a
planar calibration object, on which a grid or a chessboard is
drawn, as in Fig. 4.3. The matching between the points in
the image m and the corresponding control points Min the
plane should be easy to obtain. The homography His
computed from these
correspondences using the DLT algorithm. As a
matter of fact, the function dlt given in Listing 4.1
serves to both resection and
homography calculation (see Sect. 6.5.1).
Fig. 4.3 Some images of the calibration chessboard
We then have:

(4.12)

where the unknown scalar accounts for the fact that the homography H computed with the DLT is defined only up to scale.
Listing 4.6 SMZ calibration

Writing this out, we therefore get:

(4.13)
Since the columns of R are orthonormal, one can constrain the intrinsic parameters. In particular, the orthogonality of the first two columns yields

(4.14)

and, similarly, the condition on their norms yields

(4.15)

Thanks to the “vec trick”, the last two equations are rewritten as:

(4.16)

(4.17)
B is a symmetric matrix; therefore, it has only six distinct elements. This can be formally taken into account by introducing the vech operator and the duplication matrix (see Sect. A.14):

(4.18)

(4.19)
In summary, one image provides two equations in six unknowns. If one takes n images of the plane (with different orientations, as in Fig. 4.3), one can stack the resulting 2n equations into a linear system of equations

(4.20)

where A is a 2n × 6 matrix. If n ≥ 3 the solution exists and is determined up to a scaling factor. From it we derive B and then K (by Cholesky factorisation), from which we then go back to R and t.
The implementation is given in Listing 4.6. Some caveats are in order, which are related to the fact that homographies are known up to a scale, including a sign. Therefore, B is not necessarily positive definite as one would expect from its definition: either B or −B is positive definite. To test which of the two cases occurs, we exploit the fact that the trace of a positive definite matrix is positive, being the sum of positive eigenvalues. Finally, the sign of the extrinsic parameters (derived from H) is fixed by imposing that the cameras lie in the positive half-space of the plane.
References
B. Caprile and V. Torre. Using vanishing points for camera calibration.
International Journal of Computer Vision, 4: 127– 140, 1990.
[Crossref]
Email: [email protected]
Andrea Fusiello
5.1 Introduction
In computer vision and photogrammetry, several orientation problems are defined that require the computation of a 3D direct isometry (rotation + translation) from correspondences between points. According to the spaces to which the corresponding points belong, the following problems are distinguished:
Absolute orientation (3D-3D): The coordinates of some 3D
points with respect to two different 3D reference frames
are given; we are
required to determine the transformation between the
two reference frames.
Exterior orientation (3D-2D): The position of some 3D points
and
their projection in the image are given; we are required
to determine the transformation between the camera
reference frame and the
world reference frame. The intrinsic parameters are
assumed to be known.
Relative orientation (2D-2D): The projections of some 3D
points in two distinct images are given; we are
required to determine the
transformation between the two camera reference
frames. The intrinsic parameters are assumed to be
known.
Absolute orientation will be addressed in Sect. 5.2, where a method based on Orthogonal Procrustes Analysis is presented.
(5.2)
Let us collect side by side the vectors into a matrix, and build in the same way the matrix of the corresponding vectors. This allows us to rewrite (5.1) in matrix form:

(5.3)

and the objective function as well:

(5.4)

Note that if a is a vector of d elements and 1 is a vector of n elements all equal to 1, then the outer product a1⊤ replicates the vector n times, thus producing a d × n matrix.
If there were no scale and translation, this would be an orthogonal Procrustes problem, which we could solve by applying Proposition A.14; we will now see how to fall back to this case anyway. Observe that if A is a matrix, one can form the vector containing the average computed over the rows of A. Thus, averaging over the rows of both members of (5.3) produces:
(5.10)

The scale is finally computed as the solution of a one-dimensional least-squares problem. Given two matrices A and B, the problem is rewritten as a scalar equation whose least-squares solution is:

(5.11)

thanks to the properties of the trace. In our case we obtain:

(5.12)
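A compact sketch of the whole absolute orientation procedure (Procrustes rotation, least-squares scale and translation) is given below. Variable names and the ordering of the steps are choices of this sketch, not necessarily those of the book's listing.

function [R, t, s] = absolute_orientation(A, B)
% Align two 3 x n point sets so that B ~ s*R*A + t in the least-squares sense.
    ca = mean(A, 2);   cb = mean(B, 2);          % centroids
    A0 = A - ca;       B0 = B - cb;              % centred coordinates
    [U, ~, V] = svd(B0 * A0');                   % SVD of the cross-covariance matrix
    R = U * diag([1, 1, det(U*V')]) * V';        % closest rotation (det(R) = +1)
    s = trace(R' * B0 * A0') / trace(A0 * A0');  % one-dimensional least-squares scale
    t = cb - s * R * ca;                         % translation
end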
Fig. 5.1 Three control points and their projections determine the exterior
orientation of the
camera
(5.14)
(5.20)
where denotes the Hadamard (element by element)
product.
(5.21)
As already noted, if is known, this reduces to an
absolute
orientation problem, which is solved with OPA (see Sect.
5.2.1). If, on the other hand, R and are known, one is left
with the linear problem of
solving for . The algorithm (Listing 5.3) proceeds by
alternating between these two steps until
convergence.
Solving for in (5.21) follows a path similar to the
solution of
(5.19). Instead of the “vec-trick” we exploit Proposition A.20,
to write
(5.22)
and the fact that is the diagonal
block matrix which has the columns of Q as blocks,
implemented by the MATLAB
function diagc.
Fig. 5.3 Example application of the direct method. The spray bottle is the
object whose model is known. On the right is the gradient map on which the
camera orientation retrieval is based
(5.23)
(5.24)
Problems

5.1 Given two vectors, prove that the least-squares solution of the corresponding scalar equation is

(5.25)

References
Xiao-Shan Gao, Xiao-Rong Hou, Jianliang Tang, and Hang-Fei Cheng. Complete
solution
classification for the perspective-three-point problem. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 25: 930–943, August 2003.
[Crossref]
V. Garro, F. Crosilla, and A. Fusiello. Solving the PnP problem with anisotropic
orthogonal
procrustes analysis. In Second Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling,
Processing, Visualization and Transmission (3DIMPVT), 2012.
Johann August Grunert. Das Pothenotische Problem in erweiterter Gestalt nebst Bemerkungen über seine Anwendungen in der Geodäsie. Grunerts Archiv für Mathematik und Physik, pages 238–248, 1841.
Robert M. Haralick, Chung-Nan Lee, Karsten Ottenberg, and Michael Nölle.
Review and analysis of solutions of the three point perspective pose estimation
problem. International Journal of
Computer Vision, 13: 331– 356, December 1994.
[Crossref]
R. Hooke and T.A. Jeeves. Direct search solution of numerical and statistical
problems. Journal of the Association for Computing Machinery (ACM), pages 212–229,
1961.
L Kneip, D Scaramuzza, and R Siegwart. A novel parametrization of the
perspective-three-point problem for a direct computation of absolute camera
position and orientation. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 2011.
Karl Kraus. Photogrammetry - Geometry from Images and Laser Scans - 2nd edition.
Walter de Gruyter, Berlin, 2007.
[Crossref]
Proceedings of the 11th International Conference on Image Analysis and Processing (ICIAP), pages 372–377, Palermo, IT, 2001. IAPR, IEEE Computer Society Press. doi: 10.1109/ICIAP.2001.957038.
Bo Wang, Hao Hu, and Caixia Zhang. Geometric interpretation of the multi-
solution phenomenon in the P3P problem.J. Math. Imaging Vis., 62 (9): 1214– 1226,
nov 2020. ISSN 0924-9907.
https://doi.org/10.1007/s10851-020-00982-5.
Footnotes
1 In this section, in order to lighten the notation, the Cartesian coordinates of the points are denoted without the tilde.
6. Two-View Geometry
6.1 Introduction
We shall now study the relationship that links two images
framing the same scene from two different points of view.
In particular, we will be concerned with the relationship
that exists between the homologous points, which are
defined as the points in the two images that are
projections of the same point in object space (Fig. 6.1).
Fig. 6.1 Stereo pair. Two homologous (image) points are the projection of the
same object point
Epipolar geometry describes the relationship between
two images of the same scene, so it is fundamental to any
computer vision technique
based on more than one image.
(6.4)
Fig. 6.3 The epipolar lines corresponding to the points marked with a square
in the left image are drawn on the right
(6.14)

where the two matrices correspond to the last two right singular vectors in the Singular Value Decomposition (SVD) of the coefficient matrix. Only one coefficient is needed because of the scale ambiguity of F. The specific value that makes F singular is the solution of:

(6.15)

which is a third-degree polynomial equation and therefore can be solved analytically, yielding one or three real solutions.
6.4.2 Preconditioning

The eight-point algorithm (but the same can be said about the seven-point algorithm) has been criticised for being too sensitive to noise and thus not very useful in practical applications. However, Hartley (1995) showed that the instability of the method is mainly due to an ill-conditioning problem rather than to its linear nature. He observed, in fact, that homogeneous pixel coordinates most likely yield an ill-conditioned system of linear equations: typical image coordinates are in the order of hundreds or thousands of pixels, so the entries of the coefficient matrix span several orders of magnitude and, consequently, the matrix has a high condition number, which makes the solution of the system unstable.
By applying a simple preconditioning technique, which consists of transforming the points so that all three coordinates have the same order of magnitude, the condition number is reduced and the results are more stable. The points are translated so that their centroid coincides with the origin, and then they are scaled so that the average distance from the origin is √2. The latter choice comes from the observation that the best conditioning occurs when the mean point has coordinates (1, 1, 1), whose image-plane distance from the origin is √2.
Let T and T′ be the resulting transformations in the two images, and let the transformed points be obtained by applying them to the original ones. Using the transformed points in the eight-point algorithm, we obtain a fundamental matrix F̂ that can be related to the original one by F = T′⊤ F̂ T, as can be easily shown.
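A sketch of such a preconditioning transform is given below (this is not the book's Listing 6.3; the convention of storing points as columns is an assumption of the sketch).

function [T, mt] = precond(m)
% Normalise 2 x n image points: zero centroid, average distance sqrt(2).
    c  = mean(m, 2);                             % centroid of the points
    d  = mean(sqrt(sum((m - c).^2, 1)));         % average distance from the centroid
    sc = sqrt(2) / d;                            % isotropic scaling factor
    T  = [sc, 0, -sc*c(1); 0, sc, -sc*c(2); 0, 0, 1];
    mh = T * [m; ones(1, size(m, 2))];           % transformed homogeneous points
    mt = mh(1:2, :);                             % back to Cartesian coordinates
end

The fundamental matrix estimated from the transformed points is then mapped back to the original coordinates as described above.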
The MATLAB implementation of the computation of F with preconditioning is given in Listings 6.2 and 6.3.

Listing 6.2 Linear computation of F
Fig. 6.4 The plane induces a homography between the two images
(6.22)
(6.23)
If the plane approaches the plane at infinity, in the limit we get the matrix of the homography induced by the plane at infinity, H∞. This homography relates the projections of points lying on the plane at infinity, that is, it maps vanishing points of one image into vanishing points of the other image.
Formula (6.22) applies to any pair of cameras framing
a plane , with a totally general relative orientation.
There is, however, another special case when two images
are linked by a homography, which
depends on the motion of the camera and not on the
structure of the scene. This occurs when the baseline
(i.e. the relative translation) is zero. In fact, plugging a null translation into (6.20) one gets:
(6.24)
Thus, H∞ plays a dual role:
Like all homographies induced by a plane, it relates the
projections of points on the plane at infinity in the two
images.
In addition, it associates the projections of all the points
in the
scene (no matter what plane they belong to) between the
two images when the camera makes a purely rotational
motion.
6.5.1 Computing the Homography
Given n homologous points , we are required to
determine the homography matrix H such that:
(6.25)
Taking advantage of the cross product to eliminate the
multiplicative factor, the equation is rewritten:
(6.28)
One way to interpret it is that a point m is associated with its homologous point in two steps: first the homography is applied, and then a correction, called parallax, is added along the epipolar line.
This observation is not only true for the plane at infinity but generalises to any plane (Shashua and Navab 1996). In fact, after a few steps along the lines of the procedure used to derive (6.22), we obtain:

(6.29)

where a is the distance of the object point (of which the two image points are the projections) from the plane, and the other quantity involved is its depth with respect to the first camera. When the object point belongs to the plane, the parallax vanishes.
Fig. 6.6 Epipolar geometry and planar parallax. Parallax with respect to plane
is the
projection of the segment joining the point M and its projection onto along
the optical ray
Fig. 6.7 The first two images are the original ones. The third one is the
superimposition of the second one with the first one transformed according to
the homography of the building facade (the points used to calculate the
homography are highlighted)
Problems
6.1 If the PPMs are normalised as in (3.23), we can
rewrite the
equation of the epipolar line with the point depths (
and ) made explicit:
(6.30)
References
Sameer Agarwal, Hon-Leung Lee, Bernd Sturmfels, and Rekha R. Thomas.
On the existence of epipolar matrices. International Journal of Computer Vision,
121 (3): 403–415, Feb 2017.
[MathSciNet][Crossref]
7. Relative Orientation
7.1 Introduction
Consider the situation where some points of the 3D
space—called in this context tie points—are projected into
two different cameras, and for each point in one camera the
homologous point in the other camera is given. The
relative orientation problem consists in determining the
orientation (which includes position and angular attitude) of
one
camera with respect to the other, assuming known intrinsic
parameters.
It is customary to refer also to motion recovery, implicitly
assuming the equivalent scenario to the previous one in
which a static scene is framed by a moving camera with
known intrinsic parameters. In each case, orientation or
rigid motion is represented by a direct i sometry.
The approach that we will describe in Sect. 7.2 follows
the one
proposed by Longuet-Higgins (1981): it is based on the
essential matrix, which describes the epipolar geometry of
two perspective images with known intrinsic parameters
and from which the rigid motion of the
camera can be derived with a factorisation described in
Sect. 7.3.
Note that the translational component of the
displacement can only be calculated up to a scaling factor,
because it is impossible to determine ( without additional
information) whether the motion observed in the
image is caused by a nearby object with the camera moving
slowly or by
a distant object with the camera moving faster. This fact is
known as the depth-speed ambiguity.
The matrix
(7.3)
which contains the coefficients of the bilinear form is called the essential matrix. It depends on three parameters for rotation and on two parameters for translation. In fact, (7.2) is homogeneous with respect to the translation, that is, the modulus of the translation vector does not matter. This reflects the
depth-speed
ambiguity, that is, the fact that we cannot derive the
absolute scale of the scene without an additional parameter,
for example knowledge of the
distance between two tie points. Therefore, an essential
matrix has only five degrees of freedom, accounting for
rotation (three parameters) and translation up to a scale
factor (two parameters).
In terms of constraints, we can observe that the essential matrix is defined up to a scaling factor and is singular. This brings the degrees of freedom down to seven; in order to match the five degrees of freedom of the parameterisation, we need to exhibit two further constraints. The theorem in the next
section proves that these two constraints are the equality
of the two non-zero singular values of E (a polynomial in
the elements of E, which yields two
independent constraints).
The essential matrix and the fundamental matrix are
related, since both encode rigid motion between two
cameras. The former relates the normalised image coordinates
of the homologous points, while the latter relates the pixel
coordinates of the same points. It is easy to verify that:
(7.4)
7.2.1 Geometric Interpretation
The Longuet-Higgins equation (7.2):
(7.5)
holds for any pair of homologous points , in
normalised image
coordinates. We can visualise and (Fig. 7.2) as the
vectors of
that connect the COP (origin of the camera reference frame)
and the
corresponding point on the image (which has equation
, in front of the camera). Then the Longuet-Higgins
equation can be interpreted as the coplanarity of the three
vectors ( is rotated to bring it
into the reference frame of the left camera), since the left-hand member is nothing but the triple product of these vectors, which vanishes if and only if they are coplanar (see Appendix A).
(7.8)
Fig. 7.3 The four possible solutions of the factorisation of E. Between the left
and right columns there is a left-right inversion, while between the top and
bottom rows the camera B rotates around the baseline. Only in the top-left
case does the triangulated point lie in front of both cameras
The property that specifies whether a point is in front of or behind a given camera is called the chirality of the point.1
The MATLAB implementation is given in Listing 7.2. Note
that for the chirality test we solved (7.6) using the
function icomb that implements
Proposition A.18.
(7.15)

(7.16)
Regarding the ambiguity of the solutions, we have two
solutions of opposite sign for translation, which generate
two solutions for rotation, respectively. Combining them
yields four solutions. This factorisation is implemented in
Listing 7.3.
Listing 7.3 Relative orientation from E in closed form
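The book's listing is not reproduced in this copy; a sketch of the classical SVD-based factorisation, which enumerates the four candidate pairs to be pruned by the chirality test, is shown below.

% E: essential matrix (3 x 3, rank 2). Sketch, not the book's Listing 7.2/7.3.
[U, ~, V] = svd(E);
W  = [0 -1 0; 1 0 0; 0 0 1];
R1 = U * W  * V';  if det(R1) < 0, R1 = -R1; end   % first candidate rotation
R2 = U * W' * V';  if det(R2) < 0, R2 = -R2; end   % second candidate rotation
t  = U(:, 3);                                      % translation, up to sign and scale
candidates = {R1, t; R1, -t; R2, t; R2, -t};       % chirality selects the valid pair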
(7.19)

and so

(7.20)

(7.21)

(7.22)

Since these relations hold, we have

(7.23)

and finally

(7.24)
(7.25)
(7.26)
(7.31)
Problems

7.1 Derive the essential matrix with two generic PPMs and observe that: (a) the translation and the rotation that appear in the expression for E are the relative ones; (b) by applying the same isometry to the right of each PPM, the expression for E remains unchanged. This allows us, without loss of generality, to fix the world reference frame on the first camera.

7.5 Prove that:

(7.32)
References
O. Faugeras and S. Maybank. Motion from point matches: multiplicity of
solutions. International Journal of Computer Vision, 4 (3):225– 246, June 1990.
[Crossref]
O.D. Faugeras and F. Lustman. Motion and structure from motion in a piecewise
planar
environment. International Journal of Pattern Recognition and Artificial Intelligence,
2:485– 508, 1988.
[Crossref]
Berthold Horn. Recovering baseline and orientation from essential matrix.
Unpublished, 1990a.
T.S. Huang and O.D. Faugeras. Some properties of the E matrix in two-view
motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(
12):1310– 1312, December 1989.
[Crossref]
Hongdong Li and Richard Hartley. Five-point motion estimation made easy. In
Proceedings of the International Conference on Pattern Recognition, pages 630–633,
Washington, DC, USA, 2006.
IEEE Computer Society.
H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from
two projections. Nature, 293( 10):133– 135, September 1981.
[Crossref]
Ezio Malis and Manuel Vargas. Deeper understanding of the homography
decomposition for vision-based control. Research Report RR-6303, INRIA,
2007. https://hal.inria.fr/inria-
00174036.
David Nister. An efficient solution to the five-point relative pose problem. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, page
195, Los Alamitos, CA, USA, 2003. IEEE Computer Society.
Footnotes
1 A geometric configuration is said to be chiral if it is not superimposable on
its mirror image, such as right hand and left hand.
Email: [email protected]
Andrea Fusiello
8.1 Introduction
We will deal in this chapter with triangulation, which allows
us to obtain the 3D coordinates of the tie points, that is points
whose projections
have been matched in the images. The set of 3D points
obtained from
triangulation is also called a model. We will see that the model
differs from the real one by transformations that reflect our
degree of knowledge
about the sensor and/or the scene.
8.2 Triangulation
Given homologous points in the two images and given the
two PPMs, triangulation (also known as forward intersection)
aims to reconstruct the position in object space of the corresponding tie point.
Listing 8.1 Triangulation
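The book's Listing 8.1 is not reproduced in this copy; a minimal sketch of the linear-eigen triangulation of a single point from two views is given below (variable names are assumptions of this sketch).

function M = triangulate_lin(P1, P2, m1, m2)
% P1, P2: 3 x 4 camera matrices; m1, m2: 2 x 1 homologous points in pixels.
% Returns the Cartesian coordinates of the triangulated 3D point.
    A = [m1(1)*P1(3,:) - P1(1,:);
         m1(2)*P1(3,:) - P1(2,:);
         m2(1)*P2(3,:) - P2(1,:);
         m2(2)*P2(3,:) - P2(2,:)];
    [~, ~, V] = svd(A);              % least-squares solution of the homogeneous system
    M = V(1:3, end) / V(4, end);     % de-homogenise
end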
Iterative refinement of the linear-eigen method. With reference, for example, to the second equation of the linear system (8.2), the residual that is minimised is an algebraic one, while we would actually like to minimise the geometric residual, that is, the difference between the measured u coordinate and the projection of the estimated 3D point:
(8.8)
(8.11)

where the unknown is a vector of three elements and s is a scalar. From the expression in (8.9) we instead get:

(8.12)

and then, for each block:

(8.13)

While the translation (up to a scale) is immediately derived, for the rotation it is necessary to also recover the unknown vector. A solution can be found by observing that there always exists a rotation Q such that:

Then, after multiplying both members of (8.13) by Q:

(8.14)

(8.15)
8.7 Stratification
This section may be challenging because of the technical
detail, but it is supplementary and can be omitted without
affecting the understanding of the rest.
We have seen in the previous sections that depending on
the
information held on the cameras— that provide images from
which
measurements are made— one can have access to different
descriptions of the three-dimensional structure of physical
space, via the
reconstruction. This hierarchy is called stratification by
Faugeras (1995), and each of the levels is called
The table below summarises the strata that are relevant
stratum.
to our
discussion. The second row reports the transformation
that links the obtained reconstruction to the true one,
while the third shows the
invariants, that is quantities that remain unchanged
after this transformation. They are defined as
follows:
(8.16)
(8.17)
(8.18)
(8.19)
(8.20)
(8.21)
An accent over a symbol denotes a normalisation. For the PPM it means dividing by the norm of the third row of the matrix.
The last row gives the canonical pair of PPM that can be
written in
that case, using only the invariants. Transforming by T the
canonical pair yields another possible pairs of cameras that
are consistent with the
invariant above.
Rows 4 and 5 describe how the canonical pair is
obtained from what is known or measurable. “Match” stands for “a sufficient number of homologous points”.
Problems

8.1 Prove that, given a PPM P, any such transformation leaves it unchanged (modulo a scale factor).
References
S. Bougnoux. From projective to Euclidean space under any practical situation,
a criticism of self-calibration. In Proceedings of the International Conference on
Computer Vision, pages 790– 796,
Bombay, 1998.
Karl Kraus. Photogrammetry - Geometry from Images and Laser Scans - 2nd edition.
Walter de Gruyter, Berlin, 2007.
[Crossref]
9. Non-linear Regression
9.1 Introduction
In this chapter we will see how to refine the results
obtained from the linear algorithms studied so far in
order to compute better estimates (in terms of precision and
accuracy) in the face of input data affected by
errors. We will do this at the cost of having to use iterative non-linear minimisation algorithms, such as the Levenberg-
Marquardt
algorithm, which involve a greater computational burden
and have no guarantee of global convergence to the
optimal value. Our job is to
provide the cost function and its derivatives, which will be
the bulk of the work done in this chapter. Dealing with
derivatives forces us to be overly pedantic. However, the
chapter is not essential to the economy of the text if one
wishes to limit oneself to linear methods. At least the first
two sections (Sects. 9.3 and 9.4) are necessary, though, if
one wants to understand bundle adjustment in Chap. 14.
(9. 1)
(9.4)
(9.7)
thus obtaining:
(9.8)
where we defined as the composition of and :
(9.9)
(9. 10)
(9. 11)
(9.13)

because

(9.14)

where

(9.15)

and the Jacobian of the rotation matrix with respect to the Euler angles has been derived in Sect. B.2.
As for the intrinsic parameters, thanks to the chain rule,
we have:
(9.16)

where the last factor is the Jacobian of the parameterisation of K and is easily computed entry-wise.
Listing 9.1 reports the implementation of non-linear
regression of
the PPM, or non-linear calibration. Notice how the value of the initial estimate for P is used to transform the control points. In this way the Euler angles are close to zero and the singularities of the Euler parameterisation are avoided.

Listing 9.1 Non-linear calibration
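The book's Listing 9.1 is not reproduced in this copy. The fragment below sketches the same idea with a deliberately simplified parameterisation (the 12 entries of P instead of the intrinsic and extrinsic parameters used in the book) and relies on lsqnonlin from the Optimization Toolbox; M (3 x n control points), m (2 x n image points) and the initial estimate P0 are assumed to be available.

% Refine P by minimising the reprojection error (sketch; over-parameterised on purpose).
proj  = @(P, M) (P(1:2,:) * [M; ones(1, size(M,2))]) ./ (P(3,:) * [M; ones(1, size(M,2))]);
res   = @(p) reshape(proj(reshape(p, 3, 4), M) - m, [], 1);   % stacked residuals
p_hat = lsqnonlin(res, reshape(P0, 12, 1));                   % P0: initial estimate (DLT)
P_ref = reshape(p_hat, 3, 4);                                 % refined camera matrix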
(9.19)
(9.21)
or:
(9.22)

(9.23)
Recall that radial distortion (defined by (3.38)) is introduced before the transformation into pixels operated by K, that is:

(9.24)
Taking this into account, it is easy to verify that the
derivatives that we previously computed for the
reprojection residual are modified as
follows:
(9.25)

(9.26)
(9.27)
where and .
Finally, we have to compute the derivative of the
reprojection
residual with respect to the radial distortion coefficients,
gathered in the vector :
(9.30)

(9.32)
The regression on n points thus minimises the following cost function:

(9.33)
(9.34)
(9.39)
(9.42)
There are still three terms to be expanded in (9.42).
The first one is the derivative of the pseudoinverse that is
given by
(see Sect. B.1):

(9.43)
The second one is
(9.44)
(9.46)
(9.48)
These derivatives are with respect to the
homogeneous vector; to obtain the derivatives with
respect to the Cartesian components, we need to
eliminate the third component, via the projection matrix
introduced earlier, hence:
(9.49)

The Sampson residual is then:

(9.50)
with given by (9.47) and given by (9.49).
Consider the square norm of the residual, that is, the
square of the Sampson distance:
(9.51)
(9.53)
The procedure is similar to the one followed for the homography, with the difference that in this case the residual is a scalar and its Jacobian a row vector, which makes the expressions simpler.
(9.55)

The other two derivatives are:

(9.56)

(9.57)

(9.59)

(9.60)

(9.61)
The derivative of D is then straightforward. Listings 9.8, 9.9 and 9.10 implement the method.
Listing 9.8 Non-linear regression of F
The essential matrix can be factored into a rotation and a translation. Therefore, it can be parameterised with three Euler angles for the rotation R and two angles which identify the translation direction as a point on the unit sphere:

(9.62)

Only the parameterisation changes, while the rest is as in the case of F (Sect. 9.8). Thanks to the rule for the derivative of the product:
(9.63)
(9.64)

The derivative of the rotation was obtained in Sect. B.2, while for the right block we observe that, by the chain rule, the derivative is:

(9.65)
The derivative of the vector with respect to its
parameters is computed easily:
(9.66)
as well as the Jacobian of the constructor of the
antisymmetric matrix (already encountered in the
regression of H).
Listing 9.11 reports the MATLAB implementation of the
algorithm. Also note in this case the transformation that
brings the Euler angles near zero (followed by the inverse transformation of the result). The
parameterisation is implemented in Listing 9.12.
Listing 9.11 Non-linear regression of relative orientation

Listing 9.12 Parameterisation of E
Fig. 9.2 Scale Invariant Feature Transform SIFT point correspondences before
(right) and after (left) validation of epipolar geometry through RANSAC
RANSAC can be applied likewise to homographies, as
reported in Listing 9.14.
Listing 9.14 Robust estimate of H
(9.67)

Finally, beware that the errors that inevitably affect the measurements propagate to the regression results, which should
therefore always be accompanied by an indication of the
error, usually in the form of standard deviation.
Section C.4 deals with this topic.
Problems
9.1 Verify that the conic equation can be written in homogeneous coordinates as a quadratic form with a suitable symmetric matrix.
Newton method.
References
A. Bartoli and P. Sturm. Nonlinear estimation of the fundamental matrix with
minimal
parameters. IEEE Transactions on Pattern Analysis and Machine Intelligence,
26(3):426–432, March 2004.
[Crossref]
Fred L. Bookstein. Fitting conic sections to scattered data. Computer
Graphics and Image Processing, 9 ( 1):56– 71, 1979.
[Crossref]
Paul D. Sampson. Fitting conic sections to “very scattered” data: An iterative refinement of the Bookstein algorithm. Computer Graphics and Image Processing, 18:97–108, 1982.
P. H. S. Torr and A. W. Fitzgibbon. Invariant fitting of two view geometry. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 26(5):648–650, May 2004.
[Crossref]
Z. Zhang. Determining the epipolar geometry and its uncertainty: A review.
International Journal of Computer Vision, 27(2):161– 195, March/April 1998.
Footnotes
1 Set of zeros of a family of polynomials.
Email: [email protected]
Andrea Fusiello
10.1 Introduction
Traditionally, the term stereopsis (from sterèos, solid, and opsis, sight) denotes a human perceptual capacity; Poggio and Poggio (1984) give the following definition:
(10.2)

(10.3)

Equation (10.3) shows that it is possible to derive the third coordinate Z once the binocular disparity is known. It can also be seen that the baseline b behaves like a scaling factor: the disparity associated with a fixed scene point depends directly on b. If the parameter b is unknown, it is possible to reconstruct the three-dimensional model only up to a scaling factor.
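For the reader's convenience, in the usual normal-case notation (an assumption of this note: b baseline, f focal length in pixels, d binocular disparity), the relation expressed by (10.3) reads

\[ Z = \frac{b\,f}{d}. \]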
Let us see how to translate what we just saw into matrix
form. If we add a third component equal to the disparity
to the image
coordinates, we can modify (10.1) into:
(10.4)

(10.5)

then the matrix that accounts for (10.5) is the following:

(10.6)

(10.7)

And finally, taking the inverse:

(10.8)

Considering the third component, we arrive at the expression:

(10.9)

(10.12)

(10.13)

therefore:

(10.14)

The sought transformation is the homography defined by the matrix just introduced. Recalling the previous definitions, we have that

(10.15)
Fig. 10.4 An image is the intersection of a plane with the projective bundle
(10.16)

(10.20)

(10.25)

(10.26)

(10.27)

(10.28)
Problems

on the disparity?

10.2 Referring to triangulation in the normal case, plot the iso-disparity surfaces in space.

10.3 is correct.

10.5 The rectification can be easily extended to the trifocal case (i.e. three images). In such a case the focal
Footnotes
1 Actually, the horizontal coordinate of the principal point can be different in the two images.
11.1 Introduction
In the discussion so far, we have always assumed that it
was possible to perform the preliminary computation of a
number of point
correspondences. In this chapter we will address the
practical problem of how to obtain such correspondences.
We begin by noting that not all points in an image are
equally suitable for computing correspondences. The salient
points or feature points are points belonging to a region of
the image that differs from its neighbourhood and therefore
can be
detected repeatably and with positional accuracy.
The definition of salient point that we have given is
necessarily vague, because it depends implicitly on the
algorithm under consideration: a posteriori we could say
that the salient points are those that the
algorithm extracts.
In order to match such points in different images, we
need to
characterise them. Since the intensity of a single point is
poorly
discriminative, we typically abstract some property that is
a function of the pixel intensities of a surrounding region.
The vector that
summarises the local structure around the salient point is
called the
descriptor. Point matching thus reduces to descriptor
comparison. For this to be effective, it is important that
the descriptors remain invariant (to some extent) to the geometric and photometric transformations that the images may undergo.
(11.1)
(11.2)
(11.3)
The practical effect of such a filter is to smooth out the
noise, since the mean intuitively tends to level out small
variations. Formally, we see that averaging noisy values
divides the standard deviation of the
noise by m. The obvious disadvantage is that smoothing
reduces the
sharp details of the image, thereby introducing blurring. The
size of the kernel controls the amount of blurring: a larger
kernel produces more blurring, resulting in greater loss of
detail.
In the frequency domain, we know that the Fourier
transform of a 1D box signal is the sinc function: something
similar holds in 2D (Fig. 11.1). Since the frequencies of the
signal that fall within the main lobe are
weighted more than the frequencies that fall in the
secondary lobes, the average filter is a low-pass filter.
However, it is a very poor low-pass
filter as the frequency cut-off is not sharp, due to the
secondary lobes.
(11.4)
(11.5)
(11.6)
The gradient of f at a point is the vector whose components are the partial derivatives of f at that point:
(11.9)
Fig. 11.2 Edges are points of sharp contrast in an image where the intensity of
the pixels
changes abruptly. The edge direction points to the direction in which intensity
is constant, while the normal to the edge points in the direction of the maximal
change in intensity
Fig. 11.3 Top: Original image and gradient magnitude. Bottom: Directional derivatives in u and v
The main reason for our interest in edges is that they often (but not always) define the contours of solid objects.
To calculate the gradient we need the directional derivatives. Since the image is a function defined on a discrete domain, we will need to consider its numerical derivatives. Combining the truncated Taylor expansions of f(x+h) and f(x−h), we get:
( 11. 10)
Considering the image I as the discrete representation of f, setting the step to one and neglecting the constant factor, we have that:
( 11. 11)
( 11. 12)
We immediately see that the numerical derivation of an image is implemented as a linear filtering, namely a convolution with a derivative mask along one axis and with its transpose along the other.
The frequency interpretation of the convolution with a
derivative
mask is a high-pass filtering, which amplifies the noise.
The solution is to cut down the noise before deriving by
smoothing the image. Let D be a derivation mask and S a
smoothing mask. For the associativity of the convolution:
( 11. 13)
This means that one needs to filter the image only once with a kernel given by the convolution of D and S.
In the case of two 1D kernels D and S oriented in
orthogonal
directions, their convolution reduces to an outer product and
thus
results in a separable kernel. For example, the Prewitt operator is obtained by convolving the box filter with the derivative mask:
( 11. 14)
( 11. 15)
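As a concrete illustration (with one common sign convention, which may differ from that of the text), the separable Prewitt kernel and its application by convolution:

% Prewitt kernel as the outer product of a 1D box filter and a 1D derivative mask (sketch).
S  = [1; 1; 1];                      % 1D box (smoothing) filter
D  = [1 0 -1];                       % 1D derivative mask (sign convention may differ)
Pu = S * D;                          % 3x3 kernel for the horizontal derivative
Pv = Pu';                            % its transpose for the vertical derivative
I  = [zeros(64,32), ones(64,32)];    % synthetic image with a vertical step edge
Iu = conv2(I, Pu, 'same');           % horizontal derivative: strong response on the edge
Iv = conv2(I, Pv, 'same');           % vertical derivative: zero here
G  = sqrt(Iu.^2 + Iv.^2);            % gradient magnitude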
A 2D Gaussian can be used for smoothing: we can either
filter with a Gaussian kernel and then calculate the
derivatives or utilise the
commutativity property between differential operators and
convolution and simply convolve with the derivative of the
Gaussian kernel:
( 11. 16)
In practice, convolution is carried out with the two partial
derivatives of the Gaussian (Fig. 11.4).
Fig. 11.5 From left: signal (edge), first derivative, second derivative
To obtain the numerical approximation of the second
derivative, we proceed similarly to what we did for the first
derivative, obtaining:
( 11. 17)
(11.21)
Due to the commutativity between the differential
operators and the convolution, the Laplacian of the
smoothed image can be equivalently
computed as the convolution with the Laplacian of the
Gaussian kernel, hence the name Laplacian of Gaussian (LoG):
( 11.22)
In addition to identifying the edges as zero crossings
(Marr and
Hildreth 1980), observe that LoG filtering provides strong
positive
responses for dark blobs and strong negative responses for
bright blobs (Fig. 11.7). In fact, the LoG kernel is precisely
the template of the dark
blob on a light background (Fig. 11.6), and we already noted
that the
correlation can be read as an index of similarity with the
mask (a.k.a.
template matching). Moreover, we see that the response is
dependent on the ratio of the blob size to the size of the
Gaussian kernel and is
maximal for blobs of diameter close to . In summary,
extremal points of the LoG detect blobs at a scale given by
(Fig. 11.7).
Fig. 11.6 Two plots of the LoG filter kernel, with the classic “sombrero” shape
Fig. 11.7 Test image and response of the Gaussian Laplacian with and
. As can be seen, the peaks of the response correspond to dark blobs of a
certain size, which depends on the . Sunflowers photo by Todd Trapani on
Unsplash
LoG can be approximated by a difference of two
Gaussians (DoG) with different scales (Fig. 11.8). The
separability and cascadability of Gaussians apply to the
DoG, so an efficient implementation is achieved:
( 11.23)
Fig. 11.8 Approximation of LoG as difference of Gaussians (DoG)
From a frequency point of view, the LoG operator is a
bandpass filter, and in fact, it responds well to spots
characterised by a spatial frequency in its passband, while
attenuating the rest. The approximation with DoG further
confirms this interpretation, as it shows that the LoG is the
difference of two low-pass filters.
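A minimal sketch of the approximation and of its use as a blob detector, with an arbitrary synthetic blob and a scale ratio of 1.6 (a common choice, assumed here):

% LoG approximated by a Difference of Gaussians and used as a blob detector (sketch).
sigma = 4;  k = 1.6;                                 % base scale and scale ratio (assumed)
r = ceil(3*k*sigma);  [x, y] = meshgrid(-r:r);
g = @(s) exp(-(x.^2 + y.^2)/(2*s^2)) / (2*pi*s^2);   % normalised 2D Gaussian
DoG = g(k*sigma) - g(sigma);                         % approximates the LoG up to a constant factor
I = ones(64);  I(27:38, 27:38) = 0;                  % synthetic dark blob on a bright background
R = conv2(I, DoG, 'same');                           % response: positive peak at the blob centre
[~, idx] = max(R(:));  [v, u] = ind2sub(size(R), idx);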
(11.24)
By truncating the Taylor series we obtain:
( 11.25)
(11.29)
and
(11.30)
Fig. 11.10 From left: test image, Harris-Stephens operator response (note that
it takes negative values), Noble operator response. The circles indicate the
corners detected after selecting local maxima larger than a threshold (4.0 for
HS and 1.0 for Noble). With these thresholds, low
contrast corners on the right side of the image are not detected
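A minimal sketch of the Harris-Stephens response computed from the structure tensor; the derivative mask, the Gaussian window and the constant k = 0.04 are common choices assumed here, not necessarily those of the text:

% Harris-Stephens corner response (sketch).
I = zeros(64);  I(17:48, 17:48) = 1;                 % synthetic image: a bright square (4 corners)
d = [1 0 -1]/2;                                      % derivative mask
Iu = conv2(I, d, 'same');   Iv = conv2(I, d', 'same');
s = 1.5;  r = ceil(3*s);  [x, y] = meshgrid(-r:r);
w = exp(-(x.^2 + y.^2)/(2*s^2));  w = w/sum(w(:));   % Gaussian weighting window
A = conv2(Iu.^2,  w, 'same');                        % entries of the structure tensor
B = conv2(Iv.^2,  w, 'same');
C = conv2(Iu.*Iv, w, 'same');
k = 0.04;                                            % empirical constant (assumed)
R = (A.*B - C.^2) - k*(A + B).^2;                    % det(M) - k*trace(M)^2: large at corners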
( 11.32)
(11.36)
with
(11.37)
Fig. 11.11 Predictive tracking (a) and data association of single-track (b) and
multiple-tracks (c)
( 11.39)
( 11.43)
( 11.44)
References
Y. Bar-Shalom and T. E. Fortmann. Tracking and Data Association. Academic Press, Waltham, MA, 1988.
Lionel Gueguen and Martino Pesaresi. Multi scale Harris corner detector based on differential morphological decomposition. Pattern Recognition Letters, 32:1714–1719, 2011. https://doi.org/10.1016/j.patrec.2011.07.021.
C. Harris and M. Stephens. A combined corner and edge detector. Proceedings of the 4th Alvey Vision Conference, pages 189–192, August 1988.
T. Lindeberg. Scale invariant feature transform. Scholarpedia, 7(5):10491, 2012.
Tony Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30:79–116, 1998.
David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
Footnotes
1 Maximum and minimum curvature of a curve contained in the surface and
passing through the point.
12.1 Introduction
Having already discussed the geometric aspect of
stereopsis in Chap. 10, we will focus here on stereo matching,
an important technique used in computer vision for finding
correspondences between two images of a scene taken
from different perspectives. The goal is to establish a one-
to-one correspondence between pixels in the two images,
which is customarily represented as a disparity map
(Fig. 12.1).
Fig. 12.1 Pair of stereo images (rectified) and ideal disparity map (images taken from http://vision.middlebury.edu/stereo/)
(12.3)
It can be seen as the squared norm of the difference of the vectorised windows.
The smaller the value of (12.3), the more similar the
portions of the images considered. The MATLAB
implementation is given in Listing 12.2.
Listing 12.2 Stereo with SSD
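A minimal sketch of the same idea (winner-takes-all over an SSD cost), not the book's Listing 12.2; the border handling and the fixed search range are simplifications:

function D = stereo_ssd(Il, Ir, dmax, w)
% Winner-takes-all stereo with SSD aggregated over (2w+1)x(2w+1) windows (sketch).
% Il, Ir: rectified left/right images (double); dmax: maximum disparity.
[rows, cols] = size(Il);
box  = ones(2*w+1);
cost = inf(rows, cols, dmax+1);
for d = 0:dmax
    Irs = [zeros(rows, d), Ir(:, 1:cols-d)];   % Irs(:,u) = Ir(:,u-d)
    e = (Il - Irs).^2;
    cost(:,:,d+1) = conv2(e, box, 'same');     % SSD over the window
end
[~, idx] = min(cost, [], 3);
D = idx - 1;                                   % disparity map; the first dmax columns are unreliable
end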
A transformation based on local sorting of grey levels is applied to the images, and then the similarity of the windows on the transformed images is measured.
The census transform (Zabih and Woodfill 1994) is based on
the
comparison of intensities. Let and be the
intensity values of pixels p and , respectively.
If we denote the concatenation of bits by the symbol ,
the census transform for a pixel p in the image I is the bit
string:
(12.5)
where the window is centred in p and the square brackets denote the Iverson bracket.
The census transform summarises the local spatial
structure. In fact, it associates with a window a bit string
that encodes its intensity in
relation to the central pixel, as exemplified in Fig. 12.5.
(12.6)
Each term of the summation is the number of pixels inside the transformation window whose relative order (i.e. having higher or lower intensity) with respect to the considered pixel changes from one image to the other.
This method is invariant to any monotonic transformation
of
intensities, whether linear or not. In addition, this method
is tolerant to errors due to occlusions. In fact, the study of
Hirschmuller and
Scharstein (2007) identifies SCH as the best matching
cost for stereo matching.
The MATLAB implementation is reported in Listing 12.3.
Listing 12.3 Stereo with SCH
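A minimal sketch of the census transform and of the corresponding Hamming-distance cost (not the book's Listing 12.3); the window radius and the wrap-around border handling are simplifications:

function C = census(I, r)
% Census transform: bit string [I(q) < I(p)] over a (2r+1)x(2r+1) window (sketch).
[rows, cols] = size(I);  n = (2*r+1)^2;
C = false(rows, cols, n);  k = 0;
for dv = -r:r
    for du = -r:r
        k = k + 1;
        J = circshift(I, [-dv, -du]);   % J(p) = I(p + (dv,du)), wrap-around at the borders
        C(:,:,k) = J < I;               % Iverson bracket
    end
end
end
% Matching cost between pixel (v,u) in the left image and (v,u-d) in the right image:
% cost = sum(xor(Cl(v,u,:), Cr(v,u-d,:)));   % Hamming distance of the two bit strings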
Fig. 12.6 Disparity maps produced by SSD, NCC and SCH (left to right) with
window
Fig. 12.10 An eccentric window can cover a constant disparity zone even near
a disparity jump
Fig. 12.12 Idealised cost matrices for the random point stereogram. Section
of the DSI (left) and match space. Intensities represent cost, with white
corresponding to the minimum cost. Choosing a point in the matrix is
equivalent to setting a match
In both cases, it is a matter of computing a minimum cost
path
through a cost matrix, and dynamic programming has been
shown to be particularly well suited for this task. However,
since the optimisation is performed independently along
the horizontal scanlines, horizontal
artefacts in the form of streaking are present. Several authors add further penalties to the cost function for violating, e.g., the uniqueness or ordering constraints.
A compromise between global methods, which
undoubtedly produce better results (Fig. 12.13) but at high
computational cost, and those
operating on a single scanline is the Semi-Global Matching (SGM) of Hirschmüller (2005), which can be regarded as a variation of SO that considers many different scanlines instead of just one (Fig. 12.14). The method minimises a one-dimensional cost function over n (typically eight) sections of the DSI along the cardinal directions of
the image (the horizontal one corresponds to the
section of the SO methods
described above). This way of proceeding can be
seen as an approximation of the optimisation of
the global cost function.
For each pixel and each disparity, the costs are summed
over the eight paths, resulting in an aggregate cost
volume:
( 12. 11)
in which per-pixel minima (as in WTA) are chosen as
the computed disparity.
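As an illustration of the aggregation (a sketch under common assumptions, not the original implementation), the cost propagation along a single left-to-right path with the usual small and large penalties P1 and P2 reads:

function L = sgm_path_lr(C, P1, P2)
% SGM-style cost propagation along the left-to-right horizontal path (sketch).
% C: H x W x D matching cost volume; P1, P2: penalties for small/large disparity changes (assumed).
[H, W, D] = size(C);
L = zeros(H, W, D);
L(:,1,:) = C(:,1,:);                              % start from the raw cost at the first column
for u = 2:W
    prev = reshape(L(:,u-1,:), H, D);             % costs of the previous pixel on the path
    m = min(prev, [], 2);                         % best previous cost, per row
    for d = 1:D
        cand = prev(:,d);                                  % same disparity: no penalty
        if d > 1, cand = min(cand, prev(:,d-1) + P1); end  % disparity change of one
        if d < D, cand = min(cand, prev(:,d+1) + P1); end
        cand = min(cand, m + P2);                          % larger disparity jumps
        L(:,u,d) = C(:,u,d) + cand - m;                    % subtract m to keep the values bounded
    end
end
end

The other directions are handled by scanning the volume analogously, and the directional results are summed before the per-pixel minimum over the disparities is taken, as stated above.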
Hirschmüller (2005) described SGM in conjunction with a
matching cost based on mutual information; however,
any cost can be plugged in. This algorithm achieves an
excellent trade-off between result quality
and execution time, as well as being suitable for parallel
implementations in hardware. Therefore, it has been widely
adopted in application contexts such as robotics and driver
assistance systems,
where there are real-time constraints and limited
computational capabilities.
When the two images of a stereo pair have different
lighting
conditions, it is necessary to normalise them by matching
their
respective histograms, that is, to transform one histogram
so that it is as similar as possible to the other.
12.6 Post-Processing
Downstream of the calculation of the “raw” disparity,
there are several steps of optimisation and post-
processing that can be applied. First,
interpolating the cost function near the minimum (e.g. with
a parabola) can yield a sub-pixel resolution corresponding
to fractional values of
disparity. Further, various image processing techniques
such as median filtering, morphological operators and
bilateral filtering (Chap. 11) can be used to reduce
isolated pixels and make the map more regular,
particularly if it has been created using non-global methods.
In the
following paragraphs, we will explore two important post-
processing steps: the computation of reliability
indicators and occlusion detection.
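For instance, with a parabola fitted to the cost at the winning disparity d and at its two neighbours (values c_m, c_0 and c_p, assumed available), the refinement is:

% Sub-pixel refinement of the disparity by parabolic interpolation (sketch).
% c_m, c_0, c_p: cost at d-1, d, d+1; d: integer disparity of the minimum.
delta = (c_m - c_p) / (2*(c_m - 2*c_0 + c_p));   % vertex of the parabola, in [-0.5, 0.5]
d_sub = d + delta;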
12.6.1 Reliability Indicators
The depth information provided by stereopsis is not
everywhere equally reliable. In particular, there is no
information for occlusion zones and
uniform intensity (or non-textured) areas. This incomplete
information can be integrated with information from other
sensors, but then it needs to be accompanied by a reliability
or confidence estimate, which plays a primary role in the
integration process.
We now briefly account for the most popular confidence
indicators; more details are found in (Hu and Mordohai
2012). In the following, the matching cost, normalised in [0, 1] and related to the disparity d, is considered (in the case of NCC we take 1-NCC); its minimum, the corresponding disparity value, the second best cost and the best second local minimum are denoted as in Fig. 12.15. The confidence varies in [0, 1], where 0 means unreliable and 1 reliable.
Matching cost, Curvature, Peak Ratio, Maximum Margin, Winner Margin.
References
I. J. Cox, S. Hingorani, B. M. Maggs, and S. B. Rao. A maximum likelihood stereo algorithm. Computer Vision and Image Understanding, 63(3):542–567, May 1996.
Mark Gerrits and Philippe Bekaert. Local stereo matching with segmentation-based outlier rejection. In Proceedings of the 3rd Canadian Conference on Computer and Robot Vision, CRV '06, page 66, USA, 2006. IEEE Computer Society.
Heiko Hirschmüller, Peter R. Innocent, and Jonathan M. Garibaldi. Real-time correlation-based stereo vision with reduced border errors. International Journal of Computer Vision, 47(1-3):229–246, 2002.
Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'03, pages 556–561, USA, 2003. IEEE Computer Society.
R. Zabih and J. Woodfill. Non-parametric local transform for computing visual correspondence. In Proceedings of the European Conference on Computer Vision, volume 2, pages 151–158. Springer, Berlin, 1994.
Footnotes
1 The two approaches are equivalent in the proposed example, but in more
complex cases they are not, as pointed out by Di Stefano et al. (2002).
13.1 Introduction
The recovery of a 3D model, or 3D shape acquisition, can be
achieved
through a variety of approaches that do not necessarily
involve cameras and images. These techniques include
contact (probes), destructive
(slicing), transmissive (tomography) and reflective non-
optical (SONAR, RADAR) methods. Reflective optical
techniques, which rely on back-
scattered visible (including near infrared)
electromagnetic radiation, have been the focus of our
discussion so far, as they offer several
advantages over other methods. These include the absence of contact, speed and cost-effectiveness.
However, these methods also have some limitations, such
as only being able to acquire the visible portion of
surfaces and dependency on surface reflectance.
Within optical techniques, we distinguish between active
and passive ones. The attribute active refers to the fact that
they involve a control on the illumination of the scene, that
is, radiating it in a specific and
structured way (e.g. by projecting a pattern of light or a
laser beam), and exploit this knowledge in the 3D model
reconstruction. In contrast, the passive methods seen so far
rely only on analysing the images as they
are, without any assumptions about the illumination, except
that there is enough to let the camera see.
Active methods are reified as stand-alone sensors that
incorporate a light source and a light detector, which is not
necessarily a camera.
These are also called range sensors, for they return the range
of visible points in the scene, that is, their distance from
the sensor (Fig. 13.1).
This definition can be extended to include passive sensors
as well, such as stereo heads.
Fig. 13.1 Colour image and range image of the same subject, captured by a
Microsoft Kinect device. Courtesy of U. Castellani
Full-field sensors return a matrix of depths called a range image, which is acquired in a single temporal instant, similar to how a camera with a global shutter operates.
Scanning sensors (or scanners) sweep the scene with a
beam or a light plane, and capture a temporal sequence
of different depth
measurements. This sequence can then be used to generate
a range
image or a point cloud, depending on the desired output.
Similarly to rolling shutter cameras, artefacts will
appear if the sensor or the object
moves.
Fig. 13.2 Active stereo with random dots. Stereo pair with projected
“salt and pepper” artificial texture and resulting disparity map
Fig. 13.3 Example of active stereo system with laser sheet. The top row
shows the two images acquired by the cameras, in which the line formed
by the laser sheet is visible. Below are shown, in overlay, the points
detected after a sweep
Fig. 13.4 Image of intensity and range image obtained from a commercial laser
active
triangulation system. The missing parts in the range image (in white) are due
to the different position of the laser source and the camera. Courtesy of S.
Fantoni
(13.3)
(13.4)
Using Eqs. (13.2), (13.3) and (13.4), we get:
(13.6)
( 13.7)
measured modulo .
The choice of modulation frequency depends on the type
of
application. A choice of = 10 MHz, which corresponds to
an
unambiguous range of 15 m, is typically suitable for indoor
robotic
vision. For close-range object modelling, we will choose a
lower . This ambiguity can be eliminated by scanning
with decreasing
wavelengths, or equivalently with a chirp type signal.
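For example, assuming the usual relation for sinusoidal modulation, where the phase is measured modulo 2π, the range is unambiguous only up to c/(2f): with f = 10 MHz this gives 3×10^8/(2×10^7) = 15 m, consistent with the figure above, whereas f = 100 MHz would give only 1.5 m.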
What we described so far is a laser range finder, a sensor
that emits a single beam and therefore obtains distance
measurements of only one
point in the scene at a time. A laser scanner, instead, uses a
beam of laser light to scan the scene in order to derive the
distance to all visible points.
In a typical terrestrial laser scanner, an actuator rotates a
mirror
along the horizontal axis (thus scanning in the vertical
direction), while the instrument body, rotating on the
vertical axis, scans horizontally.
Each position then corresponds to a 3D point
detected in polar coordinates (two angles and the
distance).
In an airborne laser scanner, on the other hand, the
instrument
performs an angular scan (or “swath”) along the
perpendicular to the flight direction, while the other scan
direction is given by the motion of the aircraft itself, which
must therefore be able to measure its
orientation with extreme precision, thanks to inertial
sensors
(accelerometer and gyroscope) and a global navigation
satellite system (GNSS), such as the GPS. The same
principle is also applied in the
terrestrial domain (mobile mapping) to different carriers
such as road vehicles, robots or humans.
Simultaneous Localization and Mapping (SLAM) is a
powerful tool for navigation in GNSS-denied
environments, such as indoors. It utilises 3D registration
techniques (see Chap. 15) to compute the incremental
rigid motion between consecutive scans. For further
information, see
(Thrun and Leonard 2008).
Recently, full-field time-of-flight sensors, also known as
Flash LiDAR or TOF cameras, have been developed. These
sensors allow for
simultaneous time-of-flight detection of a 2D matrix of
points through an array of electronic circuits on the chip.
While the lateral resolution of these sensors is low (e.g.
128x128), it is compensated for by their high acquisition
rate (e.g. 50 fps).
Fig. 13.6 Top: 12 images of a figurine taken with different light source
positions. Bottom: the normals map obtained by photometric stereo (the hue
encodes the normal), on the right the reconstructed surface. Courtesy of L.
Magri
Listing 13.1 Photometric stereo
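A minimal sketch of the normal estimation step (not the book's Listing 13.1), for a Lambertian surface under known, distant light sources:

function [N, rho] = photometric_stereo(I, L)
% Photometric stereo for a Lambertian surface (sketch).
% I: H x W x k stack of images; L: k x 3 matrix of unit light directions (assumed known).
[H, W, k] = size(I);
M = reshape(I, H*W, k)';           % k x (H*W): one column of intensities per pixel
G = L \ M;                         % least-squares solution of L*g = i for every pixel
rho = sqrt(sum(G.^2, 1));          % albedo
N = G ./ max(rho, eps);            % unit normals, one per column
N = reshape(N', H, W, 3);          % normal map
rho = reshape(rho, H, W);
end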
(13.15)
approximates the tangent vector to the surface in the direction x and must therefore be orthogonal to the normal (Fig. 13.7).
(13.17)
which approximates the tangent vector to the surface in the y-direction, yielding
(13.18)
In this way, as the normals vary, we can derive conditions on the depths Z. It is then possible to determine Z by solving the corresponding overdetermined sparse system.
Footnotes
1 A plane or sheet can be obtained by passing a laser beam through a
cylindrical lens.
2 This is also the working principle of the first Microsoft Kinect sensor. The pattern is generated by an infrared laser through an appropriate diffraction grating that creates a speckle pattern.
14.1 Introduction
In this chapter we will deal with the problem of
reconstruction from many calibrated images, which is the
most relevant in practice and leads to a Euclidean
reconstruction.
Consider a set of 3D points, viewed by m cameras with
matrices
. Let be the (homogeneous) coordinates of the
projection of the j-th point into the i-th camera. The problem
of reconstruction can be posed as follows: given the set of
pixel coordinates , find the set of Perspective
Projection Matrix (PPM) and the 3D points
(called model in this context) such that:
( 14. 1)
As already observed in the case of two images, without
further
constraints one will obtain, in general, a reconstruction
defined up to an arbitrary projectivity, and for this reason it
is called projective
reconstruction. Indeed, if and are a
reconstruction, that is, they satisfy (14.1), then also
and satisfy (14.1) for every nonsingular 4 × 4 matrix.
Fig. 14.5 The points shared by I1 and I2 (stars and dots) are those that make up the initial model. Image I3 is added, thanks to the model points visible in I3 (dots); their position will be refined by triangulation from three views. Other tie points are added to the model: those visible from I1 and I3 (crosses) and those visible from I2 and I3 (rhombuses)
( 14.3)
and then instantiate three mutually consistent PPMs and reconstruct the model by triangulation (Listing 8.1).
Unfortunately, however, the composition does not work
for versors. In other words, while the modulus of the only
translation involved was ignored in the two-view case
(yielding a scaled reconstruction), in the
three (or more)-view case, the moduli of each relative
translation cannot be fixed arbitrarily, since the relative
translations are not independent of each other.
Adding the edge 1– 3 to the graph solves this problem.
Let us rewrite (14.3) so that the ratios between the moduli are made explicit:
(14.4)
where the two ratios of the moduli are the unknowns. From the known quantities, one builds a matrix and solves the above equation for these unknowns, as described in Proposition A.18.
Note, however, that a global scaling factor remains
undetermined, and it is fixed arbitrarily through the norm
of , as in the case of two images.
Fig. 14.8 Dendrogram relative to the cherub reconstruction. Pairs ( 10, 11),
(6,7) and (2,3)
initiate the reconstructions; other images are added by resection; in the root
node and its left child, two reconstructions are merged
The algorithm proceeds from the leaves to the root of the
tree:
1. in the nodes where two leaves are joined (image pairs),
we proceed with the reconstruction as described in
Sect. 8.4;
2. in the nodes where a leaf joins an internal node, the
reconstruction corresponding to the latter is increased
by adding a single image via resection and updating the
model via intersection ( as in the
previous sequential method);
(14.12)
(14.13)
(14.17)
which we rewrite as
(14.19)
(14.20)
where the relative translation is expressed in the world reference frame, which we fixed to be attached to the first PPM.
We now consider the matrix X obtained from the juxtaposition of the COPs. Then (14.20) is written as
(14.21)
(14.22)
where the indicator vector selects columns i and j from X. We see that this vector represents an
edge of the epipolar graph G, and in fact it is one of the
columns of the incidence matrix . Hence, the
equations that we can write, one for each
edge, are expressed in matrix form as
(14.26)
obtaining
(14.27)
The right-hand side cancels out as each
(column of U) is
multiplied on the left by the corresponding
, which is
the cross product of a vector by its own versor. The net
effect is to
replace U, which contains the unknown moduli, with ,
which depends only on the known bearings. In the noise-
free case, (14.27) has a unique
solution up to a scale if and only if possesses a
one-
dimensional null space.
While translation synchronisation has solution as soon as
the
epipolar graph is connected, the conditions on the graph
topology
under which (14.27) is solvable are more complex and call
into play the notion of rigidity of the graph (Arrigoni and Fusiello 2019). Generally, in order to be rigid, the graph must have more edges than those ensuring connectivity. For example, in the three-view case, edge 1–3 was needed to solve the problem.
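To make the idea concrete, here is a sketch (under the stated assumptions, not the book's formulation) that recovers the camera centres from the known bearings by fixing the first centre and taking the null space of the stacked cross-product constraints:

function C = centres_from_bearings(edges, bear, m)
% Recovery of the camera centres from pairwise bearings (sketch).
% edges: e x 2 list of pairs (i,j); bear: 3 x e unit vectors giving the direction of
% centre j as seen from centre i, expressed in the world frame (assumption); m: number of cameras.
e = size(edges, 1);
A = zeros(3*e, 3*m);
skew = @(v) [0 -v(3) v(2); v(3) 0 -v(1); -v(2) v(1) 0];
for k = 1:e
    i = edges(k,1);  j = edges(k,2);  S = skew(bear(:,k));
    A(3*k-2:3*k, 3*i-2:3*i) = -S;            % [b]_x (c_j - c_i) = 0
    A(3*k-2:3*k, 3*j-2:3*j) =  S;
end
A = A(:, 4:end);                             % fix c_1 = 0 to remove the translational ambiguity
[~, ~, V] = svd(A);
C = [zeros(3,1), reshape(V(:,end), 3, m-1)]; % centres, defined up to a global scale
end

In the rigid, noise-free case the reduced matrix has a one-dimensional null space, consistently with the statement above.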
(14.30)
(14.31)
The Jacobian of the least-squares problem corresponding
to (14.30) is
therefore composed of blocks of this type. Let us define:
( 14.32)
( 14.33)
Fig. 14.11 Structure of matrix H for the same example of Fig. 14.10. The fact
that the two NW
and SE blocks are block diagonals depends on the primary structure, while the
sparsity pattern
of the NE and SW blocks depends on the secondary structure,
that is, visibility.
To simplify the notation, we rewrite it as
( 14.37)
(14.40)
15. 3D Registration
15.1 Introduction
The purpose of registration or alignment of partial 3D models is to bring
them all into the same reference frame by means of a
suitable
transformation. This problem has strong similarities
with image mosaicking, which we will discuss in
Chap. 18.
This chapter takes its starting point from the Adjustment of Independent Models (AIM) introduced in Chap. 14 and completes it by describing (in Sect. 15.1.1) how to extend the alignment
of two sets of 3D points to the simultaneous alignment of
many sets, with given
correspondences. Such sets of 3D points are also called
point clouds in this context.
The registration problem arises also when considering
model
acquisition by a range sensor. In fact, range sensors such
as the ones we studied in Chap. 13 typically fail to capture the shape of an object with a single acquisition: many are needed, each spanning a part of the object's surface, possibly with some overlap (Fig.
15.1). These partial models of the object are each
expressed in its own reference frame (related to the
position of the sensor).
Fig. 15.1 Registration of partial 3D models. On the right, eight range images of
an object, each in its own reference frame. On the left, all images superimposed
after registration. Courtesy of F. Arrigoni
While in the AIM the correspondences are given by
construction, the partial point clouds produced by range
sensors do not come with
correspondences. In Sect. 15.2.2 we will present an
algorithm called Iterative Closest Point (ICP) that aligns
two sets of points without
knowing the correspondences, as long as the initial
misalignment is moderate. We will finally (Sect. 15.2.3)
mention how to extend ICP for registration of many point
clouds by exploiting synchronisation (Sect. 14.3).
For a general framing of the problem, see also
Bernardini and Rushmeier (2002).
15.1.1 Generalised Procrustes Analysis
The registration of more than two point clouds in a single
reference
frame can be simply obtained by concatenating registrations
of
overlapping pairs. This procedure, however, does not lead to
the optimal solution. For example, if we have three
overlapping point clouds and we
align sets one and two and then sets one and three, we are
not
necessarily minimising the same cost function between
sets two and three. We therefore need adjustment
procedures that operate globally, that is, take into account
all sets simultaneously.
If correspondences between the different point clouds
are available, we can merge them in a single operation,
applying to each the similitude that brings it into a common
reference frame, thus generalising the
absolute orientation via Orthogonal Procrustes analysis
(OPA) that we
saw earlier (Sect. 5.2.1). This technique is in fact
referred to as Generalised Procrustes analysis
(GPA).
Given the matrices representing the m sets
of
points the alignment is achieved by
minimising the following cost function:
( 15. 1)
(15.2)
where the mean, or centroid, is:
(15.3)
( 15.5)
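A minimal sketch of the alternation that minimises this kind of cost (restricted here to rigid motions for simplicity, whereas the text allows similitudes), assuming complete correspondences across the m sets:

function Y = gpa(X)
% Generalised Procrustes alignment by alternation (sketch).
% X: 1 x m cell array of 3 x n point sets, in full correspondence (assumption).
m = numel(X);  Y = X;
for iter = 1:50                               % fixed number of iterations (assumption)
    G = mean(cat(3, Y{:}), 3);                % consensus shape (centroid of the estimates)
    for i = 1:m
        [R, t] = opa(G, X{i});                % rigid motion that best maps X{i} onto G
        Y{i} = R*X{i} + t;
    end
end
end

function [R, t] = opa(A, B)
% Orthogonal Procrustes analysis: R, t minimising ||A - (R*B + t)||_F (sketch).
a = mean(A, 2);  b = mean(B, 2);
[U, ~, V] = svd((A - a)*(B - b)');
R = U*diag([1 1 det(U*V')])*V';               % closest rotation
t = a - R*b;
end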
Fig. 15.4 From left to right: 27 point clouds in the initial reference frames, point cloud after alignment (points are coloured according to the source image), surface reconstructed using the technique of Kazhdan et al. (2006). Images taken from Fantoni et al. (2012)
References
Federica Arrigoni, Beatrice Rossi, and Andrea Fusiello. Global registration of 3D point sets via LRS decomposition. In Proceedings of the 14th European Conference on Computer Vision, pages 489–504, 2016.
Fausto Bernardini and Holly E. Rushmeier. The 3D model acquisition pipeline. Computer Graphics Forum, 21(2):149–172, 2002.
P. Besl and N. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, February 1992.
Y. Chen and G. Medioni. Object modeling by registration of multiple range images. Image and Vision Computing, 10(3):145–155, 1992.
J. J. F. Commandeur. Matching Configurations. DSWO Press, Leiden, 1991.
16.1 Introduction
In this chapter we will study the problem of computing
a projective reconstruction from multiple uncalibrated
images. The point-based methods of Chap. 14 can be—
to some extent—adapted to the
uncalibrated case. For example, with the Adjustment of
Independent Models (AIM) one would obtain a set of
projective reconstructions
related to each other by an unknown projectivity (in other
words, each of these defines its own projective reference
frame). Less obvious is how to modify frame-based
techniques.
In the following we will describe a global method
proposed by Sturm and Triggs (1996) that is specific for
projective reconstruction and is
based on the factorisation method of Tomasi and Kanade
(1992). We will then address the problem of self-
calibration, that is, promoting the reconstruction from
projective to Euclidean, which is equivalent to
estimating the intrinsic parameters.
(16.2)
It tells us that W factorises into the product of a matrix P and a matrix M, and because of this it has rank four.
Let us rewrite (16.2) in compact form as
(16.3)
where
(16.4)
and the symbol denotes the Hadamard product, that is,
the element-by-element product. Note that the matrix Z
has m rows, while Q has 3m since its elements are
blocks, and so in order to match the
dimensions, one must triplicate each row of Z via the
Kronecker product by the vector .
In this equation only Q is known and everything else is
unknown.
However, if we provisionally assume that Z is known, the
matrix W
becomes known, and we can compute its Singular Value
Decomposition (SVD):
( 16.5)
In the ideal case where the data is unaffected by error,
the rank of W is four and therefore
. Then, only the
first four columns of U and V contribute to the product
between
matrices. Let therefore and be the matrices
formed by the first four columns of U and V , respectively.
We can then write the
compact SVD of W as
( 16.6)
Comparing (16.6) with (16.2), we can identify:
(16.8)
We are now left with the problem of estimating the
unknowns
that can be calculated once P and M are known. In fact, for
image i, (16.2) reads:
(16.9)
or:
(16.10)
which is analogous to (5.21) already solved in Sect. 5.3.2.
An iterative scheme that is often adopted in cases like
this is to
alternate the solution of two problems: in one step we estimate Z given P and M; in the next step, we estimate P and M given Z, and iterate.
To avoid convergence to the trivial solution Z = 0, the
matrix Z must be normalised in some way at each iteration,
for example, by fixing the norm of its rows or columns, or
other subsets of its entries. Nasihatkon et al. (2015) show
that there are other trivial solutions besides this one,
and that the choice of normalisation criterion is crucial to
exclude all or part of them.
In the implementation shown in Listing 16.1, we chose
to normalise the rows of Z, a simple and widely used
solution, but one that actually leaves open the possibility
of convergence to trivial solutions in which an entire
column vanishes (see Nasihatkon et al. 2015 for details).
Note also that preconditioning is applied to the
coordinates of the
input image points, as in the fundamental matrix
computation (Sect. 6.4): experimentally it is observed
that this step is crucial to obtain a good result.
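A minimal sketch of the alternation (in the spirit of the method, not the book's Listing 16.1), assuming the image coordinates have already been preconditioned as discussed above:

function [P, M, Z] = projective_factorisation(q)
% Iterative projective factorisation (sketch).
% q: 3 x n x m array of homogeneous, preconditioned image points (n points, m images).
[~, n, m] = size(q);
Z = ones(m, n);                                 % initial projective depths
for iter = 1:30                                 % fixed number of iterations (assumption)
    W = zeros(3*m, n);
    for i = 1:m
        W(3*i-2:3*i, :) = q(:,:,i) .* Z(i,:);   % scaled measurement matrix
    end
    [U, S, V] = svd(W, 'econ');
    P = U(:,1:4) * S(1:4,1:4);                  % stacked 3x4 camera matrices (rank-4 factor)
    M = V(:,1:4)';                              % 4 x n projective points
    for i = 1:m                                 % re-estimate the depths from P and M
        proj = P(3*i-2:3*i, :) * M;
        Z(i,:) = sum(proj .* q(:,:,i), 1) ./ sum(q(:,:,i).^2, 1);
    end
    Z = Z ./ sqrt(sum(Z.^2, 2));                % normalise the rows of Z
end
end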
16.2 Autocalibration
Although some useful information can be derived from a
projective
reconstruction (Robert and Faugeras 1995), what we
would ultimately like to obtain is at least a Euclidean
reconstruction of the model, which
differs from the true model by a similitude. The latter
consists of a direct isometry (due to the arbitrary choice of
world reference frame) plus a
uniform change of scale (due to the well-known
depth-speed ambiguity).
Fortunately, if some reasonable conditions are met (e.g.
at least three cameras with constant intrinsic parameters), it
is possible to switch to the Euclidean reconstruction, that
is, the one we would get in the
calibrated case. This process is called autocalibration, and can
be
achieved through two different ways: we can start from
the projective reconstruction and then upgrade it to
Euclidean (Sect. 16.2.1), or
recover the intrinsic parameters from the fundamental
matrices and fall back to the calibrated case (Sect. 16.2.2).
( 16. 14)
( 16. 15)
(16.16)
Note that (16.16) contains five equations, because the matrices on both sides are symmetric, and the homogeneity reduces the number of equations by one.
This is the basic equation for autocalibration, called
absolute quadric constraint (Triggs 1997), relating the unknowns
( ) and to the available data . The name
comes from its geometrical
interpretation, where the former is a special conic called the dual image of the absolute conic and the latter is a quadric, the dual absolute quadric.
( 16. 19)
( 16.20)
(16.21)
(16.23)
(16.24)
where
( 16.25)
( 16.26)
( 16.27)
and denotes the set of real matrices. Since we are dealing with matrix functions, we employ the definition of the Jacobian given by Magnus-Neudecker (Sect. B.1), which allows us to apply the chain rule (Theorem B.1) to the composite function, obtaining
(16.28)
The Jacobian is immediately derived from (B.26), while, for the other factor, using the chain rule and formulae (B.27) and (B.4), we arrive at
(16.29)
Finally, it is immediate to verify that
(16.30)
Since K is parameterised through the intrinsic
parameters, it will be necessary to multiply the Jacobian
(16.28) on the right by the derivative of K with respect to its
parameters.
Listing 16.2 Mendonça-Cipolla self-calibration
( 16.33)
( 16.35)
with and
In Cartesian coordinates, the affine projection is written: 1
( 16.36)
The point is the image of the origin of the world
reference frame, indeed .
Consider a set of n 3D points, viewed by m affine cameras
with
matrices . Let be the Cartesian coordinates of
the
projection of the j-th point in the i-th camera. The goal of the
reconstruction is to estimate the PPM of the cameras, that
is, , and the 3D points such that
( 16.37)
We consider the “centralised” coordinates obtained by
subtracting the centroid in each image:
( 16.38)
(this step implies that all n points are visible in all m images).
We choose the 3D reference frame in the centroid of
the points. Taking this into account, we substitute (16.37) into (16.38), thus eliminating the translation vector, and (16.38) becomes
( 16.40)
( 16.45)
16.5 Problems
16.1 Adapt (i) AIM, (ii) incremental reconstruction and (iii) bundle adjustment to the uncalibrated case.
Footnotes
1 In this paragraph, in order not to clutter the notation, the Cartesian
coordinates of the points are denoted without the ˜.
17.1 Introduction
We treat in this chapter the problem of multi-view stereo reconstruction,1 that is, recovering the surface of an object from many (more than two) images.
The key notion to understand multi-view stereo
reconstruction
algorithms is that of photo-consistency. A point in the scene
is defined as photo-consistent in a set of images if, for each
image in which it is visible, its radiance in the
direction of the pixel is equal to the intensity of the
corresponding pixel . The surface
reconstruction problem then reduces to that of
determining the photo- consistency of the points in space:
the photo-consistent ones belong to the surface, and the
others correspond to empty space.
If the surface is Lambertian, the radiance is the same in
all
directions, and therefore for a photo-consistent point all
values are equal . In fact the equality of the
colour (modulo radiometric and geometric perturbations) of
the projections of the point in the images in which it is
visible is commonly taken as a test of photo-consistency
(Fig. 17.1), usually including a small region around it.
Fig. 17.1 Illustration of the concept of photo-consistency. The patch A is
photo-consistent, and its two projections are similar, whereas patch B is not,
and in fact its two projections are very different
An aspect that in the binocular case can be ignored, as
it reduces to occlusion handling, but in multi-view stereo
becomes crucial is that of visibility. In fact, the problem of
determining photo-consistency is
closely related to that of visibility. In order to correctly
evaluate the
photo-consistency of a point, it is necessary to consider
only the images in which it is visible; however determining
in which images a point is
visible implies the knowledge of the shape of the surface,
as it is made clear in Fig. 17.2.
Fig. 17.2 The visibility problem. In order to recover the surface by photo-
consistency, the
visibility is needed. At the same time, to calculate the visibility the surface is
needed. Adapted from Hernández and Vogiatzis (2010)
Multi-view stereo algorithms can be grouped into four
classes.
Volumetric stereo approaches (on which we will focus in this
chapter) assume that there exists a known, finite volume
within which the objects of interest lie. This volume can
be viewed as a parallelepiped surrounding the scene and is
represented by a grid of “cubes” called
voxels, the 3D equivalent of pixels. The goal is to assign a
binary
occupancy label (or, possibly, a colour) to each element in
the working volume. The desired final representation is a
polygonal mesh
representing the surface of the object. To step from a
volumetric
representation to a polygonal mesh, marching cubes (Sect.
17.4) is the standard algorithm.
Within these approaches, we have object-space techniques
(see Dyer 2001 for an excellent review), which proceed by
scanning the working volume and assigning an occupancy
label to each voxel according to its photo-consistency
(Sect. 17.2), and image-space techniques, which start by
calculating correlation (or any other matching cost)
between
windows in the images (similarly to binocular stereo), and
only
eventually assign a photo-consistency value to the voxel
(Sect. 17.3).
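A minimal sketch of an object-space photo-consistency test for a single voxel, ignoring visibility for simplicity; the threshold and the use of the intensity standard deviation are assumptions of the example:

function pc = photo_consistent(M, P, I, tau)
% Photo-consistency test of a voxel centre (sketch, visibility ignored).
% M: 4x1 homogeneous voxel centre; P: cell array of 3x4 PPMs; I: cell array of greyscale images.
vals = nan(1, numel(P));
for i = 1:numel(P)
    p = P{i} * M;  u = p(1)/p(3);  v = p(2)/p(3);        % projection into image i
    if u >= 1 && v >= 1 && u <= size(I{i},2) && v <= size(I{i},1)
        vals(i) = interp2(double(I{i}), u, v);           % sampled intensity
    end
end
vals = vals(~isnan(vals));
pc = numel(vals) > 1 && std(vals) < tau;                 % consistent if the intensities agree
end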
The second class of techniques start directly from a
polygonal mesh that roughly represents the object and
evolve it in the direction of
optimising its photo-consistency (Faugeras and Keriven
1998; Delaunoy et al. 2008). This class can also be
considered as a post-processing for
other methods.
The third class consists of algorithms that compute a set
of depth
maps from binocular stereo and merge them into a 3D
model.2 These
methods immediately generalise binocular stereo and can
exploit fusion methods originally conceived for range images
(Goesele et al. 2006).
The last class starts from photo-consistent points that
definitely
belong to the surface and expand them by generating a
“cloud” of planar elements or patches oriented in space
(Furukawa and Ponce 2010).
The last three classes will not be covered in this chapter.
The reader is referred to Seitz et al. (2006) for a
comprehensive review.
( 17. 1)
(17.3)
( 17.4)
Fig. 17.7 On the left are the k correlation curves and their mean (dashed),
in the centre are highlighted with a triangle the local maxima of each curve,
and on the right is the aggregate correlation curve with the Parzen window
and its maximum (triangle), which corresponds to the true depth, different from
the maximum of the mean. Images courtesy of G. Vogiatzis
The depth map calculation step can be further improved
(Hernández and Vogiatzis 2010) by relaxing the approach
that considers the
maximum of the aggregate correlation function with a
Markov random field in which the best local maxima of the
correlation are all candidates for the depth of a point.
on a
volume and wants to determine from it a surface that is
photo-
consistent and regular (Fig. 17.8). The problem can be seen
as a
segmentation of the volume into object-background, the
result of which is precisely the surface of the object. The
solution can be obtained by
applying the minimum-cut algorithm (dual of the maximum
flow) on a
suitably constructed graph, where the nodes are the voxel
centres and the edge connecting two voxels and has
a weight equal to
. The rationale is that the cost of separating two adjacent voxels (i.e. considering one as object and the other as background) is proportional to the photo-consistency of the two.
References
Frank Dachille, Klaus Mueller, and Arie Kaufman. Volumetric backprojection. In Proceedings of the 2000 IEEE Symposium on Volume Visualization, pages 109–117. ACM Press, 2000.
Amael Delaunoy, Emmanuel Prados, Pau Gargallo, Jean-Philippe Pons, and Peter Sturm. Minimizing the multi-view stereo reprojection error for triangular surface meshes. British Machine Vision Conference, 2008.
C. R. Dyer. Volumetric scene reconstruction from multiple views. In L. S. Davis, editor, Foundations of Image Understanding, chapter 16. Kluwer, Boston, 2001.
O. Faugeras and R. Keriven. Variational principles, surface evolution, PDEs, level set methods, and the stereo problem. IEEE Transactions on Image Processing, 7(3):336–344, March 1998.
Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362–1376, August 2010.
D. T. Gering and W. M. Wells. Object modeling using tomography and photography. In Proceedings of the IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes, pages 11–18, 1999.
Michael Goesele, Brian Curless, and Steven M. Seitz. Multi-view stereo revisited. Pages 2402–2409, Washington, DC, USA, 2006. IEEE Computer Society. doi: 10.1109/CVPR.2006.199.
Ming Li, Marcus Magnor, and Hans-Peter Seidel. Hardware-accelerated visual hull reconstruction and rendering. In Graphics Interface 2003, pages 65–71, 2003.
W. E. Lorensen and H. E. Cline. Marching cubes: a high resolution 3-D surface construction algorithm. In M. C. Stone, editor, SIGGRAPH: International Conference on Computer Graphics and Interactive Techniques, pages 163–170, Anaheim, CA, July 1987.
W. N. Martin and J. K. Aggarwal. Volumetric descriptions of objects from multiple views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2):150–158, March 1983.
S. M. Seitz and C. R. Dyer. Photorealistic scene reconstruction by voxel coloring. International Journal of Computer Vision, 35(2):151–173, 1999.
Steven M. Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In IEEE Conference on Computer Vision and Pattern Recognition, pages 519–528, 2006.
Peter Sturm. A historical survey of geometric computer vision. In Computer Analysis of Images and Patterns, pages 1–8. Springer, Berlin, 2011.
R. Szeliski. Rapid octree construction from image sequences. CVGIP: Image Understanding, 58(1):23–32, 1993.
Footnotes
1 In this chapter the term reconstruction is used with a different meaning from
the one defined in Chap. 16.
18.1 Introduction
In this last chapter, techniques are presented that have in common the fact that they synthesise, or render, an image obtained from the geometric transformation of one or more input images; we also speak of image-based rendering. Synthetic images can be interpreted as obtained from a virtual camera that differs from the real one in some aspects, such as a greater field of view or a different orientation.
Image synthesis from other images (image-based
rendering), as
opposed to image synthesis from three-dimensional models
(model-based rendering), originates from the idea that a
scene can be represented by a collection of its images.
Those that are missing can be synthesised from
existing ones. It is also referred to as image interpolation.
projective, piecewise affine and non-parametric
transformations, as in the case of a disparity or parallax field.
The simplest case of synthesis is when an image is
transformed through a homography. We will describe in this
section some applications based on projective
transformations.
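A minimal sketch of the warping itself, by inverse mapping and bilinear resampling; the convention that H maps source pixel coordinates to destination ones is an assumption of the example:

function J = warp_homography(I, H)
% Synthesis of the image transformed by the homography H, by inverse mapping (sketch).
% I: greyscale image (double); H: 3x3 matrix mapping source pixel coordinates to destination ones.
[rows, cols] = size(I);
[u, v] = meshgrid(1:cols, 1:rows);                   % destination grid
p = H \ [u(:)'; v(:)'; ones(1, numel(u))];           % back-project every destination pixel
us = reshape(p(1,:)./p(3,:), rows, cols);
vs = reshape(p(2,:)./p(3,:), rows, cols);
J = interp2(I, us, vs, 'linear', 0);                 % bilinear resampling, 0 outside the source
end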
18.2.1 Mosaics
Image mosaicing is the alignment (or registration) of multiple
images
into larger aggregates through the application (in general) of
homographies. Mosaicing synthesises an image taken from a
virtual camera with a larger
field of view. Since there are two cases where images of a
scene are linked by homographies, there are correspondingly
two types of mosaics:
panoramic mosaics: images are taken from a camera rotating
on its COP, as in the case of Fig. 18.1;
Fig. 18.1 Panoramic mosaic from three images. Note the stretching both
horizontally and vertically as one moves away from the central reference image
to those seen for 3D model registration in Chap. 15: what changes is the nature of the data, from 3D point clouds to 2D images, and the type of transformation, homographies of the plane in place of isometries (or similitudes) of 3D space.
Therefore, in the wake of ICP, one can devise
correspondence-less
methods that use all pixels to compute the best homography
that aligns the two images (such as Szeliski 1996). This
approach would become
impractical if extended to many images, where feature
extraction and
matching is to be preferred, as in Brown and Lowe (2007).
Another
approach would be to compute homographies between pairs of
images with any method and then synchronise them, as we did
for rotations in Chap. 14 and Sect. 15.2.3.
On some pixels of the mosaic more than one image insists, which raises the question of how to assign the colour to the mosaic.
The most commonly used mixing operators are the average,
the median and the selection of a single image as the colour
source according to
appropriate criteria. The feathering operator performs a
weighted average of the colours, with the weight decreasing
according to the distance of the pixel from the image edge.
This remedies the vignetting effect which results in brighter
images in the centre than in the periphery. Brown and Lowe
(2007) show very good results with multi-band blending.
In the presence of misalignments due to residual alignment
errors,
parallax (deviations from flatness) or moving objects, the
techniques just
listed are nevertheless bound to produce visible artefacts.
Blending cannot remedy problems inherently present in the
data; however, it can mask them as much as possible. This is
the approach that informs methods that attempt to compose
the mosaic with tiles each cut from a single image, so that the
transitions between them (or “seams”) are as little visible as
possible.
The first criterion for reducing the impact of seams is to
minimise their total length, so a tessellation of the plane with
minimal perimeter is sought. Consider the centroidal Voronoi
tessellation induced by the image centres projected on the
mosaic reference plane. Recall that, given a set of centres,
the Voronoi tessellation assigns to each centre a convex cell, to
which belong the points in the plane closer to that centre than
to any other. If the centres are themselves the centroids of those regions, such a tessellation is called a centroidal Voronoi tessellation.
Two facts are relevant here: (i) as the number of
centroidal Voronoi cells in a bounded region increases, each
Voronoi cell approaches a hexagon (Du et al. 1999); (ii) the
“honeycomb conjecture” (Hales 2001) states that the
tessellation that divides the plane into regions of equal area
with the
smallest total perimeter is one with regular hexagons. For
these reasons,
the centroidal Voronoi tessellation of the image centres can be
considered a cost-effective approximation of the lowest-
perimeter tessellation.
Figure 18.5 shows the area covered by each image after its
transformation into the mosaic reference plane (left) and the
area assigned to that image after decomposition into Voronoi
cells (right).
Fig. 18.5 Representation of the frames of the images transferred into the mosaic
reference plane and Voronoi tessellation. Images taken from Santellani et al. (2018)
Boundaries between tiles are straight segments that do not
take into
account the content of the image; they should be refined by
evolving them into paths connecting two vertices through
pixels where colour differences between adjacent cells are
minimal, as suggested by Davis (1998). In this way, if the
original cell boundary crosses, for example, a moving object,
the refined path will avoid it, since there are likely to be large
differences in the affected pixels due to object-background
contrast.
Figure 18.6 shows an example of the seams before and after
optimisation. One can see how the cell boundary follows the
space between the trees and avoids crossing them, since the
canopies are more textured
than the ground and affected by parallax.
Fig. 18.6 Detail of the seams produced by Voronoi tessellation and optimised
seams. Images taken from Santellani et al. (2018)
18.2.2 Image Stabilisation
Given a sequence of images, the purpose of stabilisation is
Fig. 18.7 Top row: two images from an aerial sequence. Bottom row: the same
images stabilised
with respect to the ground plane. In white the frame of the reference image.
Images taken from Censi et al. (1999)
18.2.3 Perspective Rectification
There are cases where one would like to use an image as if it were a map, a blueprint or an elevation, that is, drawings that have a constant scale of representation, thus allowing one to measure real distances on them. Photographs result from a perspective projection, and the scale is not constant across the image, except in the special case where the framed object is planar and parallel to the focal plane, as noted in Problem 3.3. In general, however, to have
a constant scale, a scaled orthographic projection is
required, which must be synthesised from the actual images
and the depth of the points.
Let us first concentrate on the special case of approximately
planar
objects such as portions of the territory without great
orographic
discontinuity or architectural items such as paving or facades.
Let a reference plane be our proxy of such an object; then the image can be transformed, or rectified, so that the virtual focal plane is parallel to it. The
transformation between and its perspective image is a
homography, which is completely defined by four points whose
coordinates in are known. Once such a
homography is determined, the image can be projected
backwards onto . This is equivalent to synthesising an
image taken by a virtual camera whose optical axis is
perpendicular to . Such an image is also called a photoplane.
An example is shown in Fig. 18.8. The photoplane can be
obtained from a
single photo or from a mosaic (in this case it is also called
a controlled mosaic).
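As a minimal MATLAB sketch of the rectification step (not the toolkit's code; the coordinate normalisation needed for good conditioning is omitted), the homography can be estimated with the DLT from the four correspondences and then used to warp the photograph. Here x (2 x 4) holds the image points, X (2 x 4) their known coordinates on the plane, and I the input image; all names are illustrative.

A = [];
for i = 1:4
    p = [x(:, i); 1];                           % image point in homogeneous coordinates
    A = [A;  zeros(1,3)    -p'    X(2,i)*p';
             p'      zeros(1,3)  -X(1,i)*p'];   % two DLT equations per correspondence
end
[~, ~, V] = svd(A);
H = reshape(V(:, end), 3, 3)';                  % homography mapping image points onto the plane
tform = projective2d(H');                       % MATLAB stores the transposed matrix
photoplane = imwarp(I, tform);                  % backproject the image onto the plane

The last two lines use the Image Processing Toolbox; any equivalent warping routine can be used instead.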
Fig. 18.8 To the right a photograph of Porta Udine (Palmanova, IT). On the left the
photoplane of the facade. The four points used in the rectification are the vertices
of the yellow quadrilateral, which in the photoplane becomes a rectangle
The photoplane is correct only if applied to perfectly flat objects, or to objects whose depth variations generate negligible errors at the chosen scale of representation. When the object to be represented is not exactly a plane, we can evaluate the extent of the error that is made. With reference to Fig. 18.9, consider a generic point M of depth z, which projects onto the reference plane in M′, and let a be the elevation of M with respect to that plane, or relief. The difference ε between the image of M, which is x away from the principal point, and that of M′ represents the error in the image plane. From the projections x = fX/z and x′ = fX/(z + a), where f is the focal length and X the distance of M from the optical axis, one obtains:
ε = x − x′ = a x / (z + a)   (18.1)
Fig. 18.9 The relief displacement. The actual object point is M, with relief a. Ignoring a is tantamount to confusing M with M′, which lies on the reference plane
In the photogrammetric literature, ε is called relief displacement, since it manifests as a (radial) displacement of the image point caused by the relief; it is in fact a form of planar parallax.
We observe that ε is small when the height a is small compared to the distance from the plane, and when x is small.
If the maximum parallax falls within a tolerance threshold,1 then the photoplane can be considered an orthographic projection. Since parallax increases with the distance of the image point from the principal point, using only the central part of the image in the composition of a mosaic allows one to keep parallax under control.
(18.2)
Substituting these PPMs into the equation of the epipolar line with the explicit depths (6.30), we get:
(18.5)
Fig. 18.11 Some frames of a sequence (Palmanova, IT) synthesised using parallax. The values 0 and 1 correspond to the reference images. For values between 0 and 1, there is interpolation; for values outside the range, there is extrapolation
(18.12)
Fig. 18.13 Artefacts in image synthesis. From left to right: folding, holes and magnification
(18.13)
Problems
18.1 How to calculate the epipole given six homologous
points, four of which are coplanar?
18.2 In an autonomous driving application (e.g. of a car),
if one fixes the road plane as reference, all points for which
the parallax is significantly
different from zero are immediately detected as obstacles.
Expand this idea to build an obstacle avoidance algorithm.
References
M. Brown and D. G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1): 59–73, 2007.
A. Censi, A. Fusiello, and V. Roberto. Image stabilization by features tracking. In Proceedings of the 10th International Conference on Image Analysis and Processing, pages 665–667, Venice, 1999.
James Davis. Mosaics of scenes with moving objects. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 354–360, 1998.
Qiang Du, Vance Faber, and Max Gunzburger. Centroidal Voronoi tessellations: Applications and algorithms. SIAM Review, 41(4): 637–676, 1999. https://doi.org/10.1137/S0036144599352836.
A. Fusiello. Specifying virtual cameras in uncalibrated view synthesis. IEEE Transactions on Circuits and Systems for Video Technology, 17(5): 604–611, May 2007.
T. C. Hales. The honeycomb conjecture. Discrete Comput. Geom., 25(1): 1–22, January 2001. ISSN 0179-5376. https://doi.org/10.1007/s004540010071.
D. Liebowitz and A. Zisserman. Metric rectification for perspective images of planes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 482–488, 1998.
L. McMillan and G. Bishop. Head-tracked stereo display using image warping. In Stereoscopic Displays and Virtual Reality Systems II, number 2409 in SPIE Proceedings, pages 21–30, San Jose, CA, 1995.
E. Santellani, E. Maset, and A. Fusiello. Seamless image mosaicking via synchronization. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, IV-2: 247–254, 2018.
Steven M. Seitz and Charles R. Dyer. View morphing: Synthesizing 3D metamorphoses using image transforms. In Holly Rushmeier, editor, SIGGRAPH: International Conference on Computer Graphics and Interactive Techniques, pages 21–30, New Orleans, Louisiana, August 1996.
A. Shashua and N. Navab. Relative affine structure: Canonical model for 3D from 2D geometry and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(9): 873–883, September 1996.
R. Szeliski. Video mosaics for virtual environments. IEEE Computer Graphics and Applications, 16(2): 22–30, March 1996.
Richard Szeliski. Image alignment and stitching: a tutorial. Foundations and Trends in Computer Graphics and Vision, 2(1): 1–104, 2006.
Footnotes
1 In photogrammetric survey this threshold is given by the graphical error, equal
to the thickness of the mark on the paper, conventionally fixed at 0.2 mm.
Appendix A
(A.1)
(A.3)
(A.4)
Considering the vectors as matrices formed by a single column, we can write the scalar product using the matrix product and transposition (superscript ⊤) as a⊤b:
(A.5)
(A.6)
(A.7)
(A.8)
In the special case where the matrix is a vector, the two norms
coincide:
(A.9)
A.4 Inverse Matrix
Definition A.4 (Square Matrix) A matrix A is said to be
square if it has the same number of rows and columns.
A square matrix in which all elements below (or above) the
diagonal are null is said to be triangular.
A square matrix in which all elements other than the
diagonal are null is said to be diagonal:
(A.10)
Properties:
1. The inverse, if it exists, is unique
2.
3. .
(A.11)
(A.12)
2. with
3.
4.
5. A is invertible if and only if its determinant is non-zero
Definition A.7 (Minor of Order q) Let A be an m × n matrix. We call a minor of order q extracted from A the determinant of a q × q submatrix of A.
Definition A.8 (Adjoint Matrix) The adjoint matrix of a square matrix A is defined as
(A.13)
(A.15)
(A.17)
2. x⊤Ax is a quadratic form in x
3. x⊤Ay is a bilinear form in x and y.
2.
3. is antisymmetric
4. .
If rank(A) = min(m, n), it is said that A has full rank.
Remark A.4
Properties:
1. A square matrix has full rank if and only if it is invertible
2.
3. The rank does not change after swapping two rows (or
columns)
4. .
(A.21)
2. .
(A.22)
A.9 QR Decomposition
Theorem A.3 (QR Decomposition) Let A be an m × n matrix; then there exist an orthogonal matrix Q and an upper triangular matrix R such that
A = QR   (A.23)
(A.24)
or, equivalently
(A.26)
Such a vector is called an eigenvector of A.
Let be the eigenvalues of the matrix A .
Then:
(A.27)
(A.28)
(A.31)
.
The columns of U are the eigenvectors of AA⊤; the columns of V are the eigenvectors of A⊤A (which satisfies the assumptions of Theorem A.2). So the diagonal matrix Σ⊤Σ contains the eigenvalues of A⊤A, and the columns of V are its eigenvectors.
Remark A.13
Proposition A.12 (Compact SVD) Let A be an m × n matrix and let A = UΣV⊤ be its singular value decomposition. Let
(A.34)
be its nonzero singular values. Then:
(A.37)
Proof After making the variable change y = V⊤x (recall that V is orthogonal), we get:
(A.38)
(A.39)
Given two matrices A and B, the solution of the problem is , where is the SVD of .
Proof ; therefore, the problem becomes to maximise . Let , then
The Moore-Penrose pseudoinverse is a generalisation of the inverse from square matrices to non-square matrices. It is defined as a matrix that satisfies certain properties resembling those of the ordinary inverse. Let A be an m × n matrix. The Moore-Penrose inverse of A, denoted by A⁺, satisfies the following properties:
A A⁺ A = A,   A⁺ A A⁺ = A⁺,   (A A⁺)⊤ = A A⁺,   (A⁺ A)⊤ = A⁺ A.
It can be shown that for each A, the pseudoinverse exists and is unique.
Some properties derived from the defining equations:
A⁺ = A⁻¹ for nonsingular A
(A.43)
(A.44)
then also solves the least-squares problem
(A.45)
Otherwise, there are many solutions, and one is arbitrarily
chosen as the representative.
The pseudoinverse can be used to find a least-squares solution to a linear system of equations Ax = b:
(A.46)
When A has full rank, the unique solution is exactly (A.45); otherwise, it is the one with minimum Euclidean norm.
In general, the pseudoinverse of A can be calculated using the SVD A = UΣV⊤ as
A⁺ = V Σ⁺ U⊤   (A.47)
where Σ⁺ is the diagonal matrix with the reciprocals of the nonzero singular values on the diagonal, and zeroes elsewhere.
Ultimately, the least-squares solution of inhomogeneous systems also reduces to the SVD.
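As an illustration, the following MATLAB fragment computes A⁺ from the SVD and uses it to solve a small least-squares problem; the data are random, and the built-in pinv and backslash give the same result.

A = randn(6, 3);  b = randn(6, 1);          % example overdetermined system
[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps(max(s));           % singular values below tol are treated as zero
r = sum(s > tol);                           % numerical rank
Splus = zeros(size(A'));
Splus(1:r, 1:r) = diag(1 ./ s(1:r));        % reciprocals of the nonzero singular values
Aplus = V * Splus * U';                     % pseudoinverse from the SVD
x = Aplus * b;                              % least-squares solution of A*x = b
% for comparison: pinv(A)*b and A\b give the same solution here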
A.13 Cross Product
This is a product between two vectors that returns a vector, and is defined only in ℝ³.
Definition A.19 (Cross Product) The cross product a × b of two vectors a and b is defined as the vector:
a × b = (a₂b₃ − a₃b₂,  a₃b₁ − a₁b₃,  a₁b₂ − a₂b₁)⊤   (A.48)
The skew-symmetric matrix
[a]× = [ 0  −a₃  a₂ ;  a₃  0  −a₁ ;  −a₂  a₁  0 ]   (A.49)
acts as the cross product for a, that is, [a]× b = a × b.
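This can be checked directly in MATLAB; the anonymous function skew_ below is illustrative (the toolkit may provide an equivalent under a different name).

skew_ = @(a) [   0    -a(3)   a(2);
               a(3)     0    -a(1);
              -a(2)   a(1)     0  ];         % the matrix of (A.49)
a = [1; 2; 3];  b = [4; 5; 6];
skew_(a) * b                                 % equals cross(a, b) = [-3; 6; -3]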
(A.50)
(A.53)
(A.54)
and likewise for ; this proves the thesis.
It is shown (Kanatani 1993) that the formula (A.54) also solves the least-squares problem:
(A.55)
where .
A.14 Kronecker's Product
(A.56)
3.
4.
5. .
Proposition A.19 The eigenvalues of A ⊗ B are the Kronecker product of the eigenvalues of A by the eigenvalues of B.
With , we obtain:
(A.63)
It is easy to see that is nothing else but
the diagonal block matrix which has the columns of A as
blocks. This construction is
implemented in the CV Toolkit by the MATLAB function diagc.
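A possible implementation of this construction is sketched below; it is illustrative only and may differ from the diagc function actually shipped with the toolkit.

function D = diagc_sketch(A)
% Block-diagonal matrix with the columns of A as blocks (illustrative version).
[m, n] = size(A);
D = zeros(m*n, n);
for j = 1:n
    D((j-1)*m+1 : j*m, j) = A(:, j);   % the j-th column becomes the j-th diagonal block
end
end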
A.15 Rotations
The rotations in ℝ³ are represented by 3 × 3 orthogonal matrices with positive determinant and form the special orthogonal group, denoted by SO(3).
According to a well-known Euler theorem, every rotation in ℝ³ can be written as a rotation by some angle θ around an axis passing through the origin, identified by a versor n.
The corresponding rotation matrix, which performs a rotation of θ around the axis identified by the versor n, can be obtained from θ and n using Rodrigues' formula:
R = I + sin θ [n]× + (1 − cos θ) [n]×²   (A.64)
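For instance, the formula can be evaluated directly in MATLAB (variable names and the example axis and angle are illustrative).

theta = pi/3;  n = [0; 0; 1];                 % example: rotation about the z axis
N = [   0    -n(3)   n(2);
      n(3)     0    -n(1);
     -n(2)   n(1)     0  ];                   % skew-symmetric matrix of the versor n
R = eye(3) + sin(theta)*N + (1 - cos(theta))*N^2;
% R is orthogonal with det(R) = 1; here it equals the elementary rotation
% [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1]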
(A.71)
(A.72)
C. Goodall. Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society, Series B (Methodological), 53(2): 285–339, 1991.
K. Kanatani. Geometric Computation for Machine Vision. Oxford University Press, Oxford, 1993.
Gilbert Strang. Linear Algebra and Its Applications. Harcourt Brace & Co., San Diego, CA, 1988.
Appendix B
(B.1)
or
(B.2)
By this definition, coincides with the usual Jacobian matrix of .
matrix :
(B.3)
(B.7)
(B.11)
Therefore, by the identification theorem
(B.12)
Example B.5 Derivative of where X is .
(B.13)
We introduce the commutation matrix K, a specific permutation matrix that transforms vec(A) into vec(A⊤). In particular, let A be an m × n matrix:
So
(B.16)
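A brute-force construction of K for a small example, useful to verify the defining property (the sizes are illustrative).

m = 3;  n = 2;
K = zeros(n*m, m*n);
for i = 1:m
    for j = 1:n
        K((i-1)*n + j, (j-1)*m + i) = 1;      % a_ij moves from position (j-1)m+i of vec(A)
    end                                        % to position (i-1)n+j of vec(A')
end
A = reshape(1:m*n, m, n);
isequal(K * A(:), reshape(A', [], 1))          % returns true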
Example B.6 Derivative of the inverse. First let us work out the differential of the inverse: differentiating both sides of X X⁻¹ = I gives
(dX) X⁻¹ + X d(X⁻¹) = 0   (B.17)
and therefore
d(X⁻¹) = −X⁻¹ (dX) X⁻¹   (B.18)
Then:
(B.19)
hence, by the identification theorem
(B.20)
(B.23)
Thanks to the linearity of , we obtain:
(B.24)
So
(B.25)
(B.26)
The derivative is . For example, for , we have:
(B.28)
(B.29)
(B.30)
(B.31)
(B.32)
defined
as follows:
(B.33)
where denotes the versor of the i-th axis (e.g. )
For (angle ) the derivative is
. In the end:
(B.35)
G. H. Golub and V. Pereyra. The differentiation of pseudo-inverses and nonlinear least-squares problems whose variables separate. SIAM Journal on Numerical Analysis, 10(2): 413–432, 1973. https://doi.org/10.1137/0710036.
J. R. Magnus and H. Neudecker. Matrix Differential Calculus with
Applications in Statistics and Econometrics. John Wiley & Sons,
New York, revised edition, 1999.
Appendix C
Regression
C. 1 Introduction
Fitting of a model to noisy data, or regression, is an important
statistical
tool, frequently employed in computer vision for a wide variety
of purposes.
The aim of regression analysis is to fit the parameters of a
function
(model) to the observations (data), which are obtained by
measuring a
dependent variable in correspondence to m data values of n
independent variables , with . The
model function has the form , where contains the p
parameters .
The adequacy of the model in fitting to the data is
measured through its residual, defined as the difference
between the measured value of the
dependent variable and the value predicted by the model
(at the same values of the independent variables):
(C. 1)
The goal is to compute the parameter estimates
that best explain the observations, in the
sense that they minimise the residuals defined above.
(C.9)
from which
(C.10)
In the following, to lighten the notation, we neglect the
weighting with .
(C.12)
In the case where the Jacobian matrix of does not have full
rank, one can
generalise the procedure by computing the
increment with the pseudoinverse:
(C.15)
Finally, remember that we dropped the variance weighting, so the normal equation (C.13) should read:
(C.16)
Since the latter is in turn a quadratic approximation of the function at a point, the Gauss-Newton step can be interpreted as the one leading to the minimum of the paraboloid that locally approximates the cost function.
This approximation holds better the closer we are to the minimum. On the other hand, when the quadratic approximation does not hold, Gauss-Newton does not reduce the error quickly enough, or not at all. In these cases it would be better to use the gradient descent technique, which always guarantees (under mild conditions) convergence to a stationary point, however slowly.
Consider again (C.4); if we wanted to find its minimum
with gradient descent, we would have to calculate its
gradient:
(C.18)
where is the length of the step. Gradient descent has a somewhat complementary behaviour to Gauss-Newton; in fact it has a very
slow
convergence rate right near the minimum when contour
lines are very stretched in one direction, just where
Gauss-Newton works best.
Levenberg’s idea was to blend the two methods with a
strategy to choose which of the two to weigh more
depending on local conditions.
Combining (C.18) and (C.13) gives:
(C.19)
For small values of the mixing factor one gets Gauss-Newton, while for large values one obtains a step along the direction of the negative gradient, that is, gradient descent. The strategy to modulate the mixing factor is as follows: when the error decreases, then Gauss-Newton is working and we make it the prevailing strategy by decreasing the factor (e.g. divide by 10). When, on the contrary, the error is increasing, we drift towards gradient descent by increasing it (e.g. multiply by 10).
Marquardt observed that even when the strategy inclines
towards
gradient descent, we can still exploit the local curvature
information by
scaling the components of the diagonal matrix to make longer
steps where the gradient is small. So the formula for
calculating the Levenberg-
Marquardt (LM) step becomes:
(C.20)
Let it be clear that the step is not actually computed by inverting a matrix but by solving a linear system of equations.
The LM method can be seen as an instance of the more general class of damped Gauss-Newton methods, where a diagonal matrix, called damping term, is added to the matrix of the normal equations. The damping term also solves the problem of rank-deficient Jacobian matrices.
The lsq_nonlin function provided in the CV Toolkit, not listed here for space reasons, is an instructional implementation of the Levenberg-Marquardt method.
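What follows is a bare-bones MATLAB sketch of such a damped iteration, not the toolkit's lsq_nonlin: the function handle res_fun (assumed to return the residual vector and its Jacobian), the initial mixing factor and the fixed number of iterations are all illustrative choices.

function theta = lm_sketch(res_fun, theta, n_iter)
% Damped Gauss-Newton (Levenberg-Marquardt) iteration, illustrative only.
lambda = 1e-3;                                          % initial mixing factor (assumed)
[r, J] = res_fun(theta);
for it = 1:n_iter
    N = J' * J;
    delta = -(N + lambda * diag(diag(N))) \ (J' * r);   % damped step, cf. (C.20)
    [r_new, J_new] = res_fun(theta + delta);
    if norm(r_new) < norm(r)                            % error decreased: accept the step
        theta = theta + delta;  r = r_new;  J = J_new;
        lambda = lambda / 10;                           % trust Gauss-Newton more
    else
        lambda = lambda * 10;                           % drift towards gradient descent
    end
end
end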
The least-squares estimate is only optimal in the case where the errors that affect the data are Gaussian distributed. In the case where the distribution is contaminated by outliers, that is, observations that do not follow the general trend of the data, these can corrupt the least-squares estimate in an arbitrarily large way (see Fig. C.1). The outliers arise in most cases from gross measurement errors or impulsive noise that plagues the data.
Fig. C.2 Loss functions “Bisquare”, “Cauchy” and “Huber” (top row) and related weight functions (bottom row). The square function is also drawn for comparison
(C.24)
The solution of the latter equation is equivalent to the solution of the weighted least-squares problem:
(C.25)
The weights, however, depend on the residuals, the residuals depend on the estimate, and the latter depends on the weights. Thus, an iterative solution is needed, called Iteratively Reweighted Least-Squares (IRLS) and implemented in Listing C.2. If the loss function is not convex, convergence to local minima may occur. The initial estimate can be obtained, for example, with ordinary least-squares; however, Stewart (1999) notes that even in the presence of low percentages of outliers (15%), IRLS may not converge to the correct solution from this estimate. Thus, a robust initial estimate is needed, obtained, for example, by one of the methods we will see below. The same author also suggests freezing the estimate after a number of iterations to allow IRLS to converge better.
Listing C.2 IRLS
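What follows is only an illustrative MATLAB sketch of the IRLS iteration for a linear model with Huber weights; it is not the toolkit's Listing C.2 and may differ from it in the weight function, the scale estimate and the stopping rule.

function theta = irls_sketch(A, y, theta, n_iter)
% Iteratively Reweighted Least-Squares for the linear model y = A*theta (illustrative).
k = 1.345;                                        % Huber tuning constant (assumed)
for it = 1:n_iter
    r = y - A * theta;                            % residuals of the current estimate
    s = 1.4826 * median(abs(r - median(r)));      % robust scale estimate (MAD)
    u = abs(r) / max(s, eps);
    w = ones(size(r));
    w(u > k) = k ./ u(u > k);                     % Huber weight function
    W = diag(w);
    theta = (A' * W * A) \ (A' * W * y);          % weighted least-squares step, cf. (C.25)
end
end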
(C.28)
(C.29)
and also
(C.33)
(C.34)
If the relation that links and is not linear, it is necessary to approximate it by its Taylor series expansion around the point , obtaining
(C.35)
where
(C.36)
is the Jacobian matrix of f evaluated in .
Truncating the expansion to the linear term and computing the expectation yields the linear estimate of the mean:
(C.37)
(C.38)
(C.40)
(C.41)
assuming that the Jacobian matrix has full rank. If not, the pseudoinverse replaces the inverse (Hartley and Zisserman 2003, p. 144).
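As a small illustration of this first-order propagation, the following MATLAB fragment propagates an input covariance through an example nonlinear map (the map, the mean and the covariance are arbitrary choices).

f = @(x) [x(1)*cos(x(2)); x(1)*sin(x(2))];        % example map: polar to Cartesian
x0 = [2; pi/4];                                   % mean of the input variable
Sigma_x = diag([0.01, 0.001]);                    % covariance of the input (assumed)
J = [cos(x0(2)), -x0(1)*sin(x0(2));
     sin(x0(2)),  x0(1)*cos(x0(2))];              % Jacobian of f evaluated at x0
y0 = f(x0);                                       % linear estimate of the mean
Sigma_y = J * Sigma_x * J';                       % linear estimate of the covariance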
D. Barath, J. Noskova, M. Ivashechkin, and J. Matas. MAGSAC++, a fast, reliable and accurate robust estimator. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pages 1301–1309.
Appendix D
therefore of
homogeneous coordinates, we follow Leon Battista Alberti
and study first the perspective projection of a plane. For a
formal treatment of analytic projective geometry, see, e.g.
Semple and Kneebone (1952).
Fig. D.2 The depth line containing M projects onto a line containing P and
passing through the principal point O
points;
D.3 Homogeneous Coordinates
Points at infinity are ideal (or improper) points that are added
(D.2)
.
Proposition D.1 The line through two distinct points and is represented by the triplet
Proof The line passes through both points, in fact:
and
(D.4)
has parametric equation:
.
Proof
D.5 Transformations
Definition D.4 (Projectivity) A projectivity or projective
transformation is a linear application in
homogeneous coordinates:
(D.6)
where H is a non-singular matrix.
(D.7)
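In practice, applying a projectivity to a set of points amounts to a matrix product in homogeneous coordinates followed by normalisation, as in this small MATLAB sketch (the matrix H and the points are arbitrary examples; points are packed by columns).

H = [1  0.2  5;  0  1.1  -3;  1e-3  0  1];   % an example non-singular 3x3 matrix
x = [10 20; 30 40];                          % two 2D points in Cartesian coordinates, by columns
xh = [x; ones(1, size(x, 2))];               % homogeneous coordinates
yh = H * xh;                                 % transformed points (homogeneous)
y  = yh(1:2, :) ./ yh(3, :);                 % back to Cartesian coordinates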
Appendix E
MATLAB Code
The MATLAB code listed in the text, along with other support functions not listed, can be downloaded from https://github.com/fusiello/Computer_Vision_Toolkit. Keep in mind that this is code written with educational intent and therefore:
whenever more efficient code could have been written at
the expense of readability, we have preferred the latter;
to avoid cluttering up the listings, there are no checks on
whether the parameters passed to the functions match
what is assumed within the functions themselves;
for the same reason, there is no documentation at the top
of the function and comments are used sparingly;
for reasons of compactness and simplicity, special cases and
pathological situations that might occur are not dealt with;
the input parameters to the functions are limited to the
indispensable arguments; other control parameters are
wired into the function itself;
the use of a variable number of input and output parameters, and of default assignments, is less than would be expected in a well-built library.
Conventions related to the interface of functions:
the points are in Cartesian coordinates;
both 2D and 3D points are packed into a matrix by columns;
items (points, PPM, etc.) referred to multiple images are
represented with cell arrays indexed by the image;
in cases where the function calculates a transformation, it is
such that the second argument is transformed to the first.
The list of the main implemented functions follows
(auxiliary ones are omitted). All were tested with MATLAB
R2021a and OCTAVE 4.2.1 under macOS 12.3.1.
Index
Adjoint matrix 80
Adjustment of independent models 207
Affinity 331
Algebraic distance 100
Alignment 225
Autocalibration 236, 242
Bearing 216
Bilinear interpolation 280
Block matching 168, 173
Breakpoint 318, 319
Bundle adjustment 217– 224
C
Calibration 35–43
Camera
affine 244
pinhole 5, 10, 15– 28
telecentric 10
Camera matrix, see Perspective Projection Matrix (PPM) 334
Camera obscura 5, 21
Camera reference frame 16
Census transform 171
Centre of projection 15, 26
Chirality 78
Circle of confusion 10
Coded-light 191
Collinearity equations 28
Collineation 64
Commutation matrix 308
Compatible matrix 92
Conjugate points 9
Control points 35, 49, 50
Convolution 140
Corresponding points, see Homologous Points 334
Cross-correlation 140
Depth of field 10
Depth-speed ambiguity 73, 74, 236
Diffuse scattering 13
Direct Linear Transform (DLT) 35, 37, 50, 64, 67, 103
Disparity 125, 127, 137, 165– 182, 274, 275
Disparity map 166
F
Features
I
Image plane 5, 6, 9, 15
J
Jacobian matrix 30, 103, 108, 153, 154, 219–222, 305–
K
Kernel 140
derivative 144
Gaussian 141
smoothing 141
Sobel 146
Kronecker product 36, 61, 67, 215, 244, 299
Kruppa constraints 240
L
Lambert’s law 13
M
Mapping
backward 280
forward 279
Marching cubes 250, 262
Match space 177
M-estimators 319
Model 201
Mosaic
panoramic 266
planar 266
Mosaicing 266– 270
Motion 246
N
Nodal points 9, 29
Non-linear
calibration 105
exterior orientation 106
fundamental matrix 115
homography 111
relative orientation 118
triangulation 107, 108
Normal configuration 59, 125, 126
Normal equation 296, 315
Normalised cross-correlation, NCC 169, 259
Normalised image coordinates 20, 50, 74– 76
Parameters
extrinsic (see Extrinsic parameters) 334
intrinsic (see Intrinsic parameters) 334
Perspective pose 49
Perspective Projection Matrix (PPM) 18– 28, 333
photo-consistency 249, 251, 258, 259, 261
Photogrammetry 23, 28, 32, 49, 91, 207
Photographic scale 32
Photoplane 272
Pinhole camera 5
Pinhole camera, see camera
pinhole 334
Planar mosaics 266
Pose, see Orientation 334
Preconditioning 37, 63
Principal axis 15, 16
Principal point 15
Projection
orthographic 10
perspective 6, 17, 18, 26, 327
Projective bundle 15
Projectivity 64, 92, 233, 331
Thin lens 8, 29
Tie points 73, 87, 203
Triangulation 75, 87– 89, 92, 125– 129, 205, 208, 333
active 186– 188
normal case 126
Triple product 75, 298
Footnotes
1 in the sense that for .
7 Only exception in this text to the convention that vectors are arranged in
columns.