Exercises
1 Projective Geometry
1. Why is Euclidean geometry not sufficient to model image formation?
2. How many vanishing points does the perspective projection of the edges of a cube define?
3. Assume that a point Pi is linearly interpolated between two points P1 and P2 in 3D. Is the
perspective projection of Pi the same linear interpolation between the projections of P1 and P2?
Same question with an orthographic projection.
4. What are the homogeneous coordinates of the line of P^2 going through the points with
homogeneous coordinates (1, 0, 0) and (0, 1, 0) respectively?
2 Perspective Projection
Consider a perspective projection with focal length f :
1. In such a projection, why do objects further away appear smaller in the image?
2. Given an object (perspectively) projected in an image, how should the focal length of
the projection be modified so that the size of the object in the image is divided by 2?
3. Assume that two discs S1 and S2 of radii R and 2R are perpendicular to the optical axis
with their centers on the optical axis at distances D1 and D2 ≥ D1 from the projection center
respectively.
3 3D Modeling
1. Considering a point on a shape silhouette in an image, can we tell whether the corresponding
point in 3D is a convex, concave or saddle point on the observed shape?
2. An algorithm estimates the visual hull associated with n silhouettes using a voxel grid of size
d^3. What is the theoretical maximum number of inside-silhouette tests required? Is there a
theoretical minimum number of such tests?
MOSIG M2 GVR 1
Computer Vision Exercises
4. In a multi-view stereo reconstruction, we seek points that are photoconsistent. What does it
mean to be photoconsistent, and what assumptions are made in such a reconstruction?
4 Image Mosaics
Assume that a camera acquires images while rotating about its optical center and consider the case
where the camera takes two images. Between the two images, it carries out a rotation about the Y
axis:
\[
R = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}
\]
As for the intrinsic parameters of the camera, we suppose that they correspond to a simplified
calibration matrix whose only unknown is the focal length α:
\[
K = \begin{pmatrix} \alpha & 0 & 0 \\ 0 & \alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
It is known that there exists a projective transformation (homography) that links the two images. The
goal is to estimate the transformation from a single point correspondence.
1. Write down the homography H in terms of the two unknowns, the rotation angle β and the focal
length α.
2. Assume that two point correspondences are required to estimate the transformation H. Assume
further that we have access to 10 point correspondences, among which 80% are good
correspondences. What is the probability of getting a good estimate after 2 RANSAC iterations?
5 Plane Projection
The perspective projection of the point P with homogeneous coordinates (x, y, z, 1) onto the image
point with coordinates (u, v) can be modeled with:
with R a rotation matrix of R^3 and T the 3 × 1 position vector of the world origin in the camera
coordinate frame.
1. We consider points in the plane with equation z = 0. What kind of transformation does the
above projection (1) become for such points?
2. Assume that the image plane is parallel to the plane with equation z = 0.
(a) What kind of transformation is the above projection (1) for points in the plane z = 0?
(b) What is the minimum number of point correspondences required to estimate the
transformation, and how can we compute it given this number of correspondences?
(c) Consider two lines in the plane z = 0 that are parallel to the x axis. Where do the
projections of those two lines intersect in the image plane?
6 Correction
2. The edges of a cube in 3D are along 3 different directions and each direction in 3D defines one vanishing
point in a perspective projection, thus the cube defines 3 vanishing points.
5. Consider two parallel lines of P^2. They intersect at a point at infinity, which therefore has
homogeneous coordinates (a, b, 0). Given that the third row of an affine transformation of P^2 is
(0 0 1), this point is transformed by any affine transformation into a point (a', b', 0), which is
itself on the line at infinity. Since linear transformations of P^2 preserve incidence, this point is
at the intersection of the transformed lines, which are therefore parallel.
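The argument about affine transformations and parallel lines can be checked numerically. The sketch below is only an illustration: the matrix A and the two lines are arbitrary values chosen here, and the helper names apply and cross are not from the exercise sheet. It uses the fact that, in P^2, the cross product gives both the line through two points and the intersection point of two lines.

```python
def apply(M, p):
    """Multiply a 3x3 matrix M by a homogeneous 3-vector p."""
    return tuple(sum(M[r][c] * p[c] for c in range(3)) for r in range(3))


def cross(p, q):
    """Cross product: line through two points, or point common to two lines."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


# Arbitrary affine transformation (assumed values): third row is (0, 0, 1).
A = [[2, 1, 3],
     [0, 1, -1],
     [0, 0, 1]]

# Two parallel lines, y = x and y = x + 1, each built from two of its points.
line1 = cross((0, 0, 1), (1, 1, 1))
line2 = cross((0, 1, 1), (1, 2, 1))
meet = cross(line1, line2)      # intersection: a point at infinity (third coordinate 0)

# Transform the points, rebuild the lines, and intersect again.
line1_t = cross(apply(A, (0, 0, 1)), apply(A, (1, 1, 1)))
line2_t = cross(apply(A, (0, 1, 1)), apply(A, (1, 2, 1)))
meet_t = cross(line1_t, line2_t)  # still at infinity, so the images are parallel

print(meet, meet_t)
```

Both intersection points have a zero third coordinate, and the image of the first one under A is proportional to the second, as the incidence argument predicts.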
2. Using l = Lf/Z from the previous question, we see that the focal length must be divided by 2.
3. (a) Consider the 3D point (X, Y, D1) on the boundary circle of S1, i.e. X^2 + Y^2 = R^2. This point
projects onto the point (Xf/D1, Yf/D1) in the image and we can check that (Xf/D1)^2 +
(Yf/D1)^2 = (Rf/D1)^2. Hence the disc projection in the image is delimited by a circle of radius
Rf/D1 and is therefore a disc. The same reasoning applies to S2, and we thus observe two nested
discs centered on the optical axis with radii Rf/D1 and 2Rf/D2 respectively.
(b) D2 = 2D1 .
(c) Assume D2 − D1 = l. The radius of the projection of disc S2 is then 2Rf/D2 = 2Rf/(l + D1), and
it equals the radius of the projection of S1 when Rf/D1 = 2Rf/(l + D1), hence when D1 = l.
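The radii derived above can be verified with a few lines of code. This is a minimal sketch under assumed numerical values (f, R and D1 are arbitrary, and the function names are illustrative); it projects sample points of each boundary circle with (X, Y, Z) -> (fX/Z, fY/Z) and measures their distance to the image center.

```python
import math


def project(X, Y, Z, f):
    """Perspective projection with focal length f, projection center at the origin."""
    return (f * X / Z, f * Y / Z)


def projected_radii(R, D, f, samples=8):
    """Project points of the boundary circle of a disc at depth D; return their image radii."""
    radii = []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        u, v = project(R * math.cos(t), R * math.sin(t), D, f)
        radii.append(math.hypot(u, v))
    return radii


f, R, D1 = 2.0, 1.0, 4.0                     # assumed values, for illustration only
r1 = projected_radii(R, D1, f)               # S1: every sample lies at R f / D1
r2 = projected_radii(2 * R, 2 * D1, f)       # S2 placed at D2 = 2 D1
half = projected_radii(R, D1, f / 2)         # same disc, focal length divided by 2

print(r1[0], r2[0], half[0])
```

With D2 = 2 D1 the two projected circles coincide, and halving the focal length halves the projected radius, in line with answers 2 and 3.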
6.3 3D Modeling
1. In the image the curvature along the occluding contour (the projection of the 3D curve that delimits the
visible region on the observed shape) is either convex or concave in which cases the shape is locally
convex or hyperbolic (saddle point) respectively. Concavities are not observed by silhouettes due to
occlusion.
2. For a single voxel, n inside tests can be required, e.g., for a voxel inside the visual hull, and 1 test
can be sufficient, e.g., for a voxel that projects outside the first silhouette tested. Over the whole grid
this gives at most n·d^3 tests and at least d^3 tests.
3. To observe a scene in 3D we need to generate different viewpoints of that scene. On a mobile phone, we
can generate 2D views that depend on the position and orientation of the phone, hence creating a
3D feeling when navigating around the scene with the device. On a stereo screen, 2 views are generated
from a fixed viewpoint. These views are processed by the human brain to generate 3D information,
in practice depth information (fixed viewpoint). With a head-mounted display, 2 views are generated that
depend on the device position and orientation. These solutions differ by the number of views that are
generated at a given time, one or two, and by whether these views depend on the position and orientation
of the device over time. Only the head-mounted display provides a full 3D experience.
4. A 3D point is photoconsistent in several image projections when the colors at the corresponding image
locations are consistent, i.e., similar. This is true for a point on the surface of the observed shape when
the surface is Lambertian (diffuse) and not specular, since specularities appear differently depending on
the viewpoint (brighter when the viewpoint is in front of the specular region).
5. A 4D model is composed of geometric information, typically a triangular mesh in 3D, and appearance
information, typically in the form of a 2D texture image that is associated with the geometry,
usually through texture coordinates for the mesh vertices. Additional motion information can be
provided, for instance as vertex displacements over time, in which case the texture information can
also evolve over time, as an image sequence for instance.
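The inside-test counts discussed in item 2 above can be illustrated with a toy voxel-carving loop. This is a sketch, not a reference implementation: each silhouette is modeled as a hypothetical function from voxel to boolean, standing in for "project the voxel into view i and look it up in the silhouette mask", and the loop stops testing a voxel as soon as one view rejects it.

```python
from itertools import product


def carve(d, silhouettes):
    """Return (inside_voxels, number_of_tests) for a d x d x d voxel grid.

    Each entry of `silhouettes` is a function voxel -> bool (an assumption of
    this sketch) that answers whether the voxel projects inside that silhouette.
    """
    tests = 0
    inside = []
    for voxel in product(range(d), repeat=3):
        keep = True
        for sil in silhouettes:
            tests += 1
            if not sil(voxel):
                keep = False
                break            # one failed test is enough to discard the voxel
        if keep:
            inside.append(voxel)
    return inside, tests


d, n = 4, 3                               # assumed grid size and number of views
all_in = [lambda v: True] * n             # every voxel lies inside every silhouette
all_out = [lambda v: False] * n           # every voxel is rejected by the first view

hull_max, max_tests = carve(d, all_in)    # worst case: n tests per voxel
hull_min, min_tests = carve(d, all_out)   # best case: 1 test per voxel

print(max_tests, min_tests)
```

For this per-voxel scheme the two extreme cases give n·d^3 and d^3 tests respectively.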
2. The probability of drawing 2 good correspondences is 0.8^2 = 0.64. The probability of not drawing
2 good correspondences in any of 2 iterations is (1 − 0.64)^2 = 0.1296, and thus the probability of
having a good estimate after 2 RANSAC iterations is 1 − 0.1296 = 0.8704.
3. In principle yes. Given a point correspondence we get 3 equations with 3 unknowns: β, α and the scale
factor in the correspondence equation.
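The RANSAC computation of item 2 generalizes to any inlier ratio, sample size and number of iterations. The sketch below assumes, as the correction does, that each drawn correspondence is good independently with probability equal to the inlier ratio. The function names are illustrative, and the second function uses the standard iteration-count formula, which is not part of the exercise.

```python
import math


def ransac_success_probability(inlier_ratio, sample_size, iterations):
    """Probability that at least one iteration draws an all-inlier sample."""
    p_good_sample = inlier_ratio ** sample_size
    return 1 - (1 - p_good_sample) ** iterations


def iterations_needed(confidence, inlier_ratio, sample_size):
    """Smallest number of iterations reaching the requested success probability."""
    p_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_good_sample))


p = ransac_success_probability(0.8, 2, 2)   # the setting of the exercise
k = iterations_needed(0.99, 0.8, 2)         # iterations for 99% confidence

print(round(p, 4), k)
```

With 80% inliers and samples of size 2, two iterations succeed with probability 0.8704.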
2. (a) The rotation R is around the z axis only, and the two column vectors R1 and R2 both have a
third coordinate equal to zero (the rotation applies to x and y only). The transformation is then an
affine transformation of the plane z = 0.
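(b) Let
\[
A = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ 0 & 0 & 1 \end{pmatrix}
\]
be the associated transformation. A point correspondence brings 2 equations, hence 3
correspondences are required to estimate A. Given 3 correspondences
(x_i', y_i', 1)^t = A · (x_i, y_i, 1)^t, i ∈ [1..3], we can write:
\[
A \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} x_1' & x_2' & x_3' \\ y_1' & y_2' & y_3' \\ 1 & 1 & 1 \end{pmatrix}
\]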
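The estimation in (b) can be sketched numerically. Stacking the three correspondences column-wise into 3 × 3 matrices X (source points) and X' (image points) gives A X = X', so A = X' X^{-1} whenever the three source points are not collinear. The code below is one possible way to carry this out with exact rational arithmetic; the function names and the test points are assumptions made for the illustration.

```python
from fractions import Fraction


def det3(M):
    """Determinant of a 3x3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def inverse3(M):
    """Inverse of a 3x3 matrix via the adjugate (transposed cofactor matrix)."""
    d = Fraction(det3(M))
    if d == 0:
        raise ValueError("the three source points are collinear")

    def cof(r, c):
        # Signed cofactor of entry (r, c), using cyclic indices.
        return (M[(r + 1) % 3][(c + 1) % 3] * M[(r + 2) % 3][(c + 2) % 3]
                - M[(r + 1) % 3][(c + 2) % 3] * M[(r + 2) % 3][(c + 1) % 3])

    return [[cof(c, r) / d for c in range(3)] for r in range(3)]


def matmul(P, Q):
    """Product of two 3x3 matrices."""
    return [[sum(P[r][k] * Q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def estimate_affine(src, dst):
    """Estimate A such that (x', y', 1)^t = A (x, y, 1)^t from 3 correspondences."""
    X = [[p[0] for p in src], [p[1] for p in src], [1, 1, 1]]
    Xp = [[q[0] for q in dst], [q[1] for q in dst], [1, 1, 1]]
    return matmul(Xp, inverse3(X))


# Assumed example: points mapped by A = [[2, 1, 3], [0, 1, -1], [0, 0, 1]].
src = [(0, 0), (1, 0), (0, 1)]
dst = [(3, -1), (5, -1), (4, 0)]
A_est = estimate_affine(src, dst)

print(A_est)
```

The third row of the estimate comes out as (0, 0, 1) automatically, because the last rows of X and X' are both (1, 1, 1).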