Tute solutions
(a) By deriving a succession of Euclidean transformations, find the camera's extrinsic camera calibration matrix [R|t].
First find the rotation from the world frame to the aligned frame A (either in steps, or in one go here as it is obvious). Second, handle the translation.
Using inhomogeneous $3 \times 1$ scene vectors,
$$
\mathbf{X}_A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \mathbf{X}_W
\qquad\qquad
\mathbf{X}_C = \mathbf{X}_A + \begin{pmatrix} 0 \\ -h \\ 4h \end{pmatrix}
$$
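A minimal numerical sketch of the same chain (h is kept symbolic above; here it is given an assumed value purely for illustration):

R = [0 1 0; 0 0 1; 1 0 0];      % rotation from world frame to aligned frame A
h = 1;                          % assumed height, for illustration only
t = [0; -h; 4*h];               % translation from frame A to the camera frame C
XW = [1; 2; 3];                 % an example world point
XC = R*XW + t;                  % its camera coordinates
Rt = [R t];                     % the extrinsic calibration matrix [R|t]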
(b) The "ideal" image plane is placed at $z_C = 1$. Derive the image coordinates of the vanishing point of the family of lines parallel to $(X_W, Y_W, Z_W) = (2 + 4t,\; 3 + 2t,\; 4 + 3t)$.
Points on the lines are
$$
\mathbf{X}_W \stackrel{P}{=} \begin{pmatrix} 2+4t \\ 3+2t \\ 4+3t \\ 1 \end{pmatrix}.
$$
Let $t \to \infty$, and the point at infinity is
$$
\mathbf{X}_W^{\infty} \stackrel{P}{=} \begin{pmatrix} 4 \\ 2 \\ 3 \\ 0 \end{pmatrix}.
$$
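For reference, projecting this point at infinity with the extrinsics from part (a) onto the ideal plane gives the ideal-image vanishing point used in part (d) below:
$$
\lambda\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
= [\mathbf{R}|\mathbf{t}]\,\mathbf{X}_W^{\infty}
= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 4 \\ 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}
\;\Rightarrow\; (x, y) = (1/2,\; 3/4),
$$
the translation playing no part because the point is at infinity.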
(c) The intrinsic calibration matrix maps ideal image positions onto actual positions in
pixels. Explain ... (Lecture notes.)
(d) The actual camera has $f = 800$ pixels, $\gamma = 0.9$, $s = 0$, and $(u_0, v_0) = (350, 250)$ pixels. Derive the coordinates in the actual image plane of the vanishing point of part (b).
Using an explicit per-point scale $\lambda$ to convert $\stackrel{P}{=}$ to $=$,
$$
\lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
\begin{pmatrix} 800 & 0 & 350 \\ 0 & 720 & 250 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1/2 \\ 3/4 \\ 1 \end{pmatrix}
$$
$$
\Rightarrow\quad \lambda x = 800(1/2) + 350(1) = 750, \qquad \lambda y = 720(3/4) + 250(1) = 790, \qquad \lambda = 1,
$$
so the vanishing point lies at $(x, y) = (750, 790)$ pixels in the actual image.
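A quick numerical check of this arithmetic in MATLAB (variable names are illustrative):

f = 800; gamma = 0.9; u0 = 350; v0 = 250;
K = [f 0 u0; 0 gamma*f v0; 0 0 1];      % intrinsic matrix, with s = 0
x_ideal  = [1/2; 3/4; 1];               % ideal-plane vanishing point from part (b)
x_actual = K * x_ideal                  % -> [750; 790; 1], i.e. (x, y) = (750, 790) pixels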
Hence
$$
(p_{31}X_i + p_{32}Y_i + p_{33}Z_i + p_{34})\,x_i = p_{11}X_i + p_{12}Y_i + p_{13}Z_i + p_{14}
$$
$$
(p_{31}X_i + p_{32}Y_i + p_{33}Z_i + p_{34})\,y_i = p_{21}X_i + p_{22}Y_i + p_{23}Z_i + p_{24}
$$
where the vector $\mathbf{p} = (p_{11}, p_{12}, \ldots, p_{33}, p_{34})^\top$ contains the unknown elements. For the 6 known points $i = 1 \ldots 6$, build up $\mathbf{A}\mathbf{p} = \mathbf{0}$ and find $\mathbf{p}$ as the null-space of $\mathbf{A}$:
$$
\mathbf{p} = \ker(\mathbf{A}).
$$
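A sketch of building this system in MATLAB and extracting the null-space with the SVD (the data here are synthetic and the variable names illustrative):

Ptrue = [2 1 0 8; 1 0 2 2; 1 0 0 4];          % an example ground-truth 3x4 camera matrix
XW = [rand(3,6); ones(1,6)];                  % six homogeneous world points
x  = Ptrue*XW;  x = x(1:2,:)./x([3 3],:);     % their measured (inhomogeneous) image points

A = zeros(12,12);
for i = 1:6
    X = XW(:,i)';                             % [Xi Yi Zi 1]
    A(2*i-1,:) = [X, zeros(1,4), -x(1,i)*X];  % row from the x-equation
    A(2*i,  :) = [zeros(1,4), X, -x(2,i)*X];  % row from the y-equation
end
[~,~,V] = svd(A);
p = V(:,end);                                 % null-space of A
P = reshape(p,4,3)';                          % recovered camera matrix, up to scale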
$$
\mathrm{QR} \leftarrow \mathbf{P}_{\mathrm{LEFT}}^{-1}
$$
(that is, apply a QR decomposition to the inverse of the left $3 \times 3$ submatrix of $\mathbf{P}$).
Step 6. Recover the translation. The scale of $\mathbf{P}$ is now consistent with the scale of $\mathbf{K}$. Recall $\mathbf{P} = \mathbf{K}[\mathbf{R}|\mathbf{t}]$, hence
$$
\mathbf{t} \leftarrow \mathbf{K}^{-1}\,[P_{14}\;\; P_{24}\;\; P_{34}]^\top
$$
or you could use the pre-scaled quantities, $\mathbf{t} \leftarrow \mathrm{R}\,[p_4\;\; p_8\;\; p_{12}]^\top$ (with $\mathrm{R}$ here the upper-triangular factor from the QR decomposition rather than the rotation).
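A minimal MATLAB sketch of this decomposition (in the spirit of calibration.m, but not a copy of it; the sign fixes discussed in part (b) below are omitted):

P = [2 1 0 8; 1 0 2 2; 1 0 0 4];   % example camera matrix, here already at scale 1
Pleft = P(:,1:3);
[Q,U] = qr(inv(Pleft));            % inv(Pleft) = R'*inv(K): Q orthogonal, U upper-triangular
R = Q';                            % rotation (up to the sign ambiguities of part (b))
K = inv(U);                        % intrinsics, still carrying the unknown scale of P
s = K(3,3);                        % that overall scale
K = K/s;  P = P/s;                 % make the scales of K and P consistent
t = K \ P(:,4);                    % t = inv(K)*[P14 P24 P34]'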
(b) Code in calibration.m also fixes some sign ambiguities which were mentioned but not explained in the lecture (they would not be asked for). They arise because you can change the sign of a row or column of a rotation matrix and it remains orthogonal. To understand them, write the product $\mathbf{K}\mathbf{R}$ in terms of symbols:
$$
\mathbf{K}\mathbf{R} = \begin{pmatrix}
(fR_{11} + sR_{21} + u_0R_{31}) & (fR_{12} + sR_{22} + u_0R_{32}) & (fR_{13} + sR_{23} + u_0R_{33}) \\
(\gamma f R_{21} + v_0 R_{31}) & (\gamma f R_{22} + v_0 R_{32}) & (\gamma f R_{23} + v_0 R_{33}) \\
K_{33}R_{31} & K_{33}R_{32} & K_{33}R_{33}
\end{pmatrix}
$$
and argue that $f$, $\gamma f$ and $K_{33}$ must be positive. So
• if ($f < 0$), change the signs of $f$, $R_{11}$, $R_{12}$, $R_{13}$.
• if ($\gamma f < 0$), change the signs of $\gamma f$, $s$, $R_{21}$, $R_{22}$, $R_{23}$.
• if ($K_{33} < 0$), change the signs of $K_{33}$, $u_0$, $v_0$, $R_{31}$, $R_{32}$, $R_{33}$.
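Continuing the decomposition sketch above, the corrections might look like this in MATLAB (again illustrative, not the actual contents of calibration.m):

if K(1,1) < 0                                  % f must be positive
    K(1,1) = -K(1,1);  R(1,:) = -R(1,:);
end
if K(2,2) < 0                                  % gamma*f must be positive
    K(2,2) = -K(2,2);  K(1,2) = -K(1,2);  R(2,:) = -R(2,:);
end
if K(3,3) < 0                                  % K33 must be positive
    K(:,3) = -K(:,3);  R(3,:) = -R(3,:);
end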
(c) Projection is
$$
(\lambda_1\mathbf{x}_1 \cdots \lambda_6\mathbf{x}_6) =
\begin{pmatrix} 1 & 0 & 2 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix}
[\mathbf{I}|\mathbf{0}]
\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 1 & 0 & 0 & 4 \\ 0 & 0 & 0 & 1 \end{pmatrix}
(\mathbf{X}_1 \cdots \mathbf{X}_6)
$$
$$
= \begin{pmatrix} 2 & 1 & 0 & 8 \\ 1 & 0 & 2 & 2 \\ 1 & 0 & 0 & 4 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 8 & 9 & 9 & 8 & 10 & 11 \\ 2 & 2 & 4 & 4 & 5 & 5 \\ 4 & 4 & 4 & 4 & 5 & 5 \end{pmatrix}
$$
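This product can be checked in a few lines of MATLAB (using the same K, [R|t] and world points as above):

K  = [1 0 2; 0 2 1; 0 0 1];
Rt = [0 1 0 0; 0 0 1 -1; 1 0 0 4];
XW = [0 0 0 0 1 1; 0 1 1 0 0 1; 0 0 1 1 1 1; 1 1 1 1 1 1];
x  = K * Rt * XW            % -> [8 9 9 8 10 11; 2 2 4 4 5 5; 4 4 4 4 5 5]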
translation =
0.0000
-1.0000
4.0000
Hurrah, it works.
For the point at infinity $\mathbf{Q}$ on the ray back-projected through $\mathbf{x}$, and for the optical centre $\mathbf{C}$ of the first camera,
$$
\mathbf{Q} = \begin{pmatrix} \mathbf{K}^{-1}\mathbf{x} \\ 0 \end{pmatrix}
\;\Rightarrow\;
\mathbf{q}' = \mathbf{K}'[\mathbf{R}|\mathbf{t}]\begin{pmatrix} \mathbf{K}^{-1}\mathbf{x} \\ 0 \end{pmatrix} = \mathbf{K}'\mathbf{R}\mathbf{K}^{-1}\mathbf{x}
$$
$$
\mathbf{C} = \begin{pmatrix} \mathbf{0} \\ 1 \end{pmatrix}
\;\Rightarrow\;
\mathbf{e}' = \mathbf{K}'[\mathbf{R}|\mathbf{t}]\begin{pmatrix} \mathbf{0} \\ 1 \end{pmatrix} = \mathbf{K}'\mathbf{t}
$$
$$
\Rightarrow\;
\mathbf{l}' = \mathbf{e}' \times \mathbf{q}'
= \mathbf{K}'\mathbf{t} \times \mathbf{K}'\mathbf{R}\mathbf{K}^{-1}\mathbf{x}
\stackrel{P}{=} [\mathbf{K}']^{-\top}(\mathbf{t} \times \mathbf{R}\mathbf{K}^{-1}\mathbf{x})
= [\mathbf{K}']^{-\top}[\mathbf{t}]_\times \mathbf{R}\mathbf{K}^{-1}\mathbf{x}
= \mathbf{F}\mathbf{x}
$$
(c) i) ... derive a compact expression for the Fundamental Matrix F ... done already
ii) Show that $\mathbf{x}'^\top\mathbf{F}\mathbf{x} = 0$.
Point $\mathbf{x}'$ is on the epipolar line, hence $\mathbf{x}'^\top\mathbf{l}' = 0$. Hence $\mathbf{x}'^\top\mathbf{F}\mathbf{x} = 0$.
(d) A single camera with $\mathbf{K} = \mathbf{I}$ captures an image, and then translates along its optic axis before capturing a second image, so that $\mathbf{t} = [0, 0, t_z]^\top$.
i) Use $\mathbf{x}'^\top\mathbf{F}\mathbf{x} = 0$ to derive an explicit relationship relating $x'$, $y'$, $x$ and $y$. First find $\mathbf{F}$.
$$
\mathbf{F} = [\mathbf{K}']^{-\top}[\mathbf{t}]_\times\mathbf{R}\mathbf{K}^{-1} = [\mathbf{t}]_\times =
\begin{pmatrix} 0 & -t_z & 0 \\ t_z & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
$$
Hence
$$
\mathbf{x}'^\top\mathbf{F}\mathbf{x} = t_z(-x'y + y'x) = 0, \quad\text{hence}\quad x'/y' = x/y.
$$
ii) Briefly relate your result to the expected epipolar geometry for this camera motion.
[Figure: image points (x, y) and (x', y') lying on the same radial line through the origin.]
The camera has moved along the optic axis, so we expect the origin to be the Focus of Expansion of a flow field. This is exactly what $x'/y' = x/y$ expresses: corresponding points lie on radial lines through the origin.
(a) The figure shows a pair of cameras, each with focal length unity, whose principal axes meet at a point. The y-axes of both cameras are parallel and point out of the page.
i) C has $\mathbf{K} = \mathbf{I}$ and $\mathbf{P} = [\mathbf{I}|\mathbf{0}]$. Find $\mathbf{P}'$, find $\mathbf{F}$.
[Figure: cameras C and C', each with focal length 1, with principal axes meeting at a point a distance d from each camera; C' is rotated by 60° relative to C.]
Camera C' has $\mathbf{P}' = \mathbf{K}'[\mathbf{R}|\mathbf{t}]$ with $\mathbf{K}' = \mathbf{I}$, so this is all about finding the extrinsic matrix for the second camera, such that
$$
\mathbf{X}' = \begin{pmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix}\mathbf{X}.
$$
Build in stages:
$$
\mathbf{X}_1 = \begin{pmatrix} \mathbf{I} & \begin{matrix} 0 \\ 0 \\ -d \end{matrix} \\ \mathbf{0}^\top & 1 \end{pmatrix}\mathbf{X};
\qquad
\mathbf{X}_2 = \begin{pmatrix} \cos 60^\circ & 0 & -\sin 60^\circ & 0 \\ 0 & 1 & 0 & 0 \\ \sin 60^\circ & 0 & \cos 60^\circ & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\mathbf{X}_1;
\qquad
\mathbf{X}' = \begin{pmatrix} \mathbf{I} & \begin{matrix} 0 \\ 0 \\ +d \end{matrix} \\ \mathbf{0}^\top & 1 \end{pmatrix}\mathbf{X}_2
$$
Combine:
$$
\mathbf{X}' = \begin{pmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix}\mathbf{X}
= \begin{pmatrix}
1/2 & 0 & -\sqrt{3}/2 & +(\sqrt{3}/2)d \\
0 & 1 & 0 & 0 \\
\sqrt{3}/2 & 0 & 1/2 & +(1/2)d \\
0 & 0 & 0 & 1
\end{pmatrix}\mathbf{X}
$$
$$
\mathbf{F} \stackrel{P}{=} [\mathbf{K}']^{-\top}[\mathbf{t}]_\times\mathbf{R}\mathbf{K}^{-1}
= \mathbf{I}^{-\top}
\begin{pmatrix} 0 & -(1/2)d & 0 \\ +(1/2)d & 0 & -(\sqrt{3}/2)d \\ 0 & +(\sqrt{3}/2)d & 0 \end{pmatrix}
\begin{pmatrix} 1/2 & 0 & -\sqrt{3}/2 \\ 0 & 1 & 0 \\ \sqrt{3}/2 & 0 & 1/2 \end{pmatrix}
\mathbf{I}^{-1}
$$
$$
= \begin{pmatrix} 0 & -(1/2)d & 0 \\ -(1/2)d & 0 & -(\sqrt{3}/2)d \\ 0 & +(\sqrt{3}/2)d & 0 \end{pmatrix}
$$
ii) Use the relationship $\mathbf{l}' = \mathbf{F}\mathbf{x}$ to compute the epipolar line in the right image corresponding to the homogeneous point $\mathbf{x} = [1, 1, 1]^\top$ in the left image.
$$
\mathbf{l}' = \mathbf{F}\mathbf{x}
= \begin{pmatrix} 0 & -(1/2)d & 0 \\ -(1/2)d & 0 & -(\sqrt{3}/2)d \\ 0 & +(\sqrt{3}/2)d & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\stackrel{P}{=} \begin{pmatrix} +1 \\ +(1+\sqrt{3}) \\ -\sqrt{3} \end{pmatrix}
$$
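A short MATLAB check of F and of this epipolar line (d is given an assumed value of 1; only the direction of l' matters):

d  = 1;                                            % assumed value for illustration
R  = [1/2 0 -sqrt(3)/2; 0 1 0; sqrt(3)/2 0 1/2];
t  = [sqrt(3)/2*d; 0; d/2];
tx = [0 -t(3) t(2); t(3) 0 -t(1); -t(2) t(1) 0];   % [t]x
F  = tx * R;                                       % K = K' = I
l  = F * [1; 1; 1];
l  = l / l(1)                                      % -> [1; 1+sqrt(3); -sqrt(3)]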
(b) i) Derive the 4-vectors which represent the optical centres of the two cameras.
The world frame is coincident with the frame of the first camera, so the optical centre of C is at the world origin, $(\mathbf{0}^\top, 1)^\top$. The optical centre of C' is at $(\mathbf{C}'^\top, 1)^\top$ such that
$$
\begin{pmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix}\begin{pmatrix} \mathbf{C}' \\ 1 \end{pmatrix} = \begin{pmatrix} \mathbf{0} \\ 1 \end{pmatrix}
\;\Rightarrow\;
\begin{pmatrix} \mathbf{C}' \\ 1 \end{pmatrix} = \begin{pmatrix} -\mathbf{R}^\top\mathbf{t} \\ 1 \end{pmatrix}
$$
ii) Use the above results to derive expressions for the epipoles of the two cameras.
Epipole $\mathbf{e}$ is the projection of the optic centre of C' into camera C:
$$
\mathbf{e} \stackrel{P}{=} \mathbf{K}[\mathbf{I}|\mathbf{0}]\begin{pmatrix} -\mathbf{R}^\top\mathbf{t} \\ 1 \end{pmatrix} \stackrel{P}{=} \mathbf{K}\mathbf{R}^\top\mathbf{t}.
$$
Similarly, epipole $\mathbf{e}'$ is the projection of the optic centre of C into camera C':
$$
\mathbf{e}' \stackrel{P}{=} \mathbf{K}'[\mathbf{R}|\mathbf{t}]\begin{pmatrix} \mathbf{0} \\ 1 \end{pmatrix} = \mathbf{K}'\mathbf{t}.
$$
iii) Prove that the epipoles $\mathbf{e}$ and $\mathbf{e}'$ are the right and left null-spaces respectively of the fundamental matrix $\mathbf{F}$.
We have to show that $\mathbf{F}\mathbf{e} = \mathbf{0}$ and $\mathbf{F}^\top\mathbf{e}' = \mathbf{0}$, respectively.
$$
\mathbf{F}\mathbf{e} \stackrel{P}{=} [\mathbf{K}']^{-\top}[\mathbf{t}]_\times\mathbf{R}\mathbf{K}^{-1}\,\mathbf{K}\mathbf{R}^\top\mathbf{t} = [\mathbf{K}']^{-\top}[\mathbf{t}]_\times\mathbf{t} = \mathbf{0}
$$
$$
\mathbf{F}^\top\mathbf{e}' \stackrel{P}{=} \mathbf{K}^{-\top}\mathbf{R}^\top[\mathbf{t}]_\times^\top[\mathbf{K}']^{-1}\,\mathbf{K}'\mathbf{t} = \mathbf{K}^{-\top}\mathbf{R}^\top[\mathbf{t}]_\times^\top\mathbf{t} = -\mathbf{K}^{-\top}\mathbf{R}^\top[\mathbf{t}]_\times\mathbf{t} = \mathbf{0}
$$
(c) A camera rotates about its optical centre and changes its intrinsics, so that the camera matrices before and after are $\mathbf{P} = \mathbf{K}[\mathbf{I}|\mathbf{0}]$ and $\mathbf{P}' = \mathbf{K}'[\mathbf{R}|\mathbf{0}]$.
i) Derive an expression for the homography $\mathbf{H}$ which relates the images of points before and after the motion as $\mathbf{x}' = \mathbf{H}\mathbf{x}$.
From part (a),
$$
\mathbf{x} \stackrel{P}{=} \mathbf{K}\mathbf{X}^W_{3\times 1} \quad\text{and}\quad \mathbf{x}' \stackrel{P}{=} \mathbf{K}'\mathbf{R}\mathbf{X}^W_{3\times 1}.
$$
Eliminate $\mathbf{X}^W_{3\times 1}$:
$$
\mathbf{x}' \stackrel{P}{=} \mathbf{K}'\mathbf{R}\mathbf{K}^{-1}\mathbf{x} = \mathbf{H}\mathbf{x}
$$
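A small MATLAB sketch of this relation with assumed (illustrative) intrinsics and rotation:

K  = [800 0 320; 0 800 240; 0 0 1];                  % intrinsics before
Kp = [700 0 320; 0 700 240; 0 0 1];                  % intrinsics after
th = 5*pi/180;                                       % small pan about the y-axis
R  = [cos(th) 0 sin(th); 0 1 0; -sin(th) 0 cos(th)];
H  = Kp * R / K;                                     % H = K'*R*inv(K)
x  = [400; 300; 1];                                  % a point in the first image
xp = H * x;  xp = xp / xp(3)                         % its position in the second image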
[Figure: the original image with a quadrilateral mesh overlaid, corners labelled 1 to 4.]
$$
\mathbf{x}'^\top\mathbf{F}\mathbf{x} = 0 \;\Rightarrow\; \mathbf{x}^\top\mathbf{H}^\top\mathbf{F}\mathbf{x} = 0
$$
Now consider the scalar quantity $\mathbf{x}^\top\mathbf{A}\mathbf{x}$, where $\mathbf{A}$ is antisymmetric. The transpose of a scalar is the same scalar, so
$$
\mathbf{x}^\top\mathbf{A}\mathbf{x} = [\mathbf{x}^\top\mathbf{A}\mathbf{x}]^\top = \mathbf{x}^\top\mathbf{A}^\top\mathbf{x}.
$$
But $\mathbf{A}^\top = -\mathbf{A}$, so
$$
\mathbf{x}^\top\mathbf{A}\mathbf{x} = -\mathbf{x}^\top\mathbf{A}\mathbf{x} \;\Rightarrow\; \mathbf{x}^\top\mathbf{A}\mathbf{x} = 0.
$$
As $\mathbf{x}^\top\mathbf{H}^\top\mathbf{F}\mathbf{x} = 0$ for every $\mathbf{x}$, the symmetric part of $\mathbf{H}^\top\mathbf{F}$ must vanish, so $\mathbf{H}^\top\mathbf{F}$ is antisymmetric.
(a) The ordering constraint is often used in stereo correspondence algorithms to disam-
biguate point matches on corresponding epipolar lines. Sketch a configuration of sur-
faces and cameras for which the ordering constraint is valid, and a configuration for
which it is not valid.
As per notes. Invalid for (i) discontinuous surfaces and (ii) transparent surfaces.
(b) Prove that ZNCC is invariant under $I' = \alpha I + \beta$.
Consider the original intensities, and let the source patch be $A$. The patch mean is subtracted from each pixel, $\tilde A_{ij} = A_{ij} - \mu_A$, and the values used in
$$
\mathrm{ZNCC} = \frac{\sum_i\sum_j \tilde A_{ij}\tilde B_{ij}}{\sqrt{\sum_i\sum_j \tilde A_{ij}^2}\;\sqrt{\sum_i\sum_j \tilde B_{ij}^2}}.
$$
Now consider altering the intensities of each pixel as in $I' = \alpha I + \beta$. The patch mean is changed to
$$
\mu'_A = \alpha\mu_A + \beta,
$$
so the mean-subtracted values become $\tilde A'_{ij} = A'_{ij} - \mu'_A = \alpha\tilde A_{ij}$. Now
$$
\mathrm{ZNCC}' = \frac{\sum_i\sum_j \tilde A'_{ij}\tilde B_{ij}}{\sqrt{\sum_i\sum_j \tilde A'^2_{ij}}\;\sqrt{\sum_i\sum_j \tilde B_{ij}^2}}
= \frac{\alpha\sum_i\sum_j \tilde A_{ij}\tilde B_{ij}}{\sqrt{\alpha^2\sum_i\sum_j \tilde A_{ij}^2}\;\sqrt{\sum_i\sum_j \tilde B_{ij}^2}}
= \mathrm{ZNCC}
$$
as the $\alpha$'s cancel top and bottom.
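A quick numerical confirmation in MATLAB (patches and gain/offset values are arbitrary):

zncc = @(A,B) sum(sum((A-mean(A(:))).*(B-mean(B(:))))) / ...
       ( sqrt(sum(sum((A-mean(A(:))).^2))) * sqrt(sum(sum((B-mean(B(:))).^2))) );
A = rand(7);  B = rand(7);      % two example patches
alpha = 2.5;  beta = 10;        % example gain and offset (alpha > 0)
zncc(A, B)
zncc(alpha*A + beta, B)         % the same value, to numerical precision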
(c) Explain why using normalized cross-correlation is likely to be more important for stereo
matching than for matching between images in a monocular sequence.
In a monocular sequence, images are taken a short time apart (typically 30 ms) by the same camera. Any change in gain and offset over that time is likely to be negligible.
Stereo involves two different cameras, which could quite easily have different gains and offsets.
8. Lecture 4: Triangulation
[Figure: a pair of parallel cameras, each with focal length f, separated along the x-axis.]
(a) When the cameras are separated by $t_x = 4$ units, find the coordinates of the 3D point for the correspondences: (i) $[-1, 0] \leftrightarrow [1, 0]$; (ii) $[0, 0] \leftrightarrow [0, 0]$.
Easy to see that
$$
\mathbf{X}_1 = \begin{pmatrix} -2 \\ 0 \\ 2f \\ 1 \end{pmatrix}
\qquad\qquad
\mathbf{X}_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix},
$$
the second being a point at infinity along the optic axis (zero disparity).
We want to find either $\alpha$ or $\beta$. To find $\alpha$, it saves work to bring the rotation to the LHS:
$$
\alpha\,\mathbf{R}\mathbf{K}^{-1}\mathbf{x} = -\mathbf{t} + \beta\,\mathbf{K}'^{-1}\mathbf{x}'
\qquad\text{or}\qquad
\alpha\,\mathbf{p} = -\mathbf{t} + \beta\,\mathbf{q}
$$
where $\mathbf{p} = \mathbf{R}\mathbf{K}^{-1}\mathbf{x}$ and $\mathbf{q} = \mathbf{K}'^{-1}\mathbf{x}'$. Then dot this equation first with $\mathbf{p}$ then $\mathbf{q}$ to get two simultaneous equations in $\alpha$ and $\beta$, and solve for $\alpha$ and $\beta$.
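Written out (this step is left implicit above), the two dot products give
$$
\alpha\,(\mathbf{p}\cdot\mathbf{p}) - \beta\,(\mathbf{q}\cdot\mathbf{p}) = -\mathbf{t}\cdot\mathbf{p},
\qquad
\alpha\,(\mathbf{p}\cdot\mathbf{q}) - \beta\,(\mathbf{q}\cdot\mathbf{q}) = -\mathbf{t}\cdot\mathbf{q},
$$
a pair of linear simultaneous equations in $\alpha$ and $\beta$.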
ii) Verify this expression using the parallel camera configuration earlier in this question:
$$
\mathbf{R} = \mathbf{I}, \qquad \mathbf{t} = [4, 0, 0]^\top, \qquad \mathbf{K}^{-1} = \mathbf{K}'^{-1} = \mathrm{Diag}(1/f,\; 1/f,\; 1).
$$
Since $Z = f t_x / d$,
$$
\delta Z = -\frac{f t_x}{d^2}\,\delta d = -\frac{Z^2}{(f t_x)^2}\,f t_x\,\delta d = -\frac{Z^2}{f t_x}\,\delta d.
$$
What are the consequences of this relation when computing a stereo reconstruction?
The error in disparity $\delta d$ is likely to be constant across the image: it depends principally on uncertainties in the location of image features, which are independent of depth. As $f$ and $t_x$ are constant, the uncertainty in depth is proportional to the depth squared.
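A quick numerical illustration of the quadratic growth in MATLAB (all values assumed for the example):

f  = 800;  tx = 0.1;        % assumed focal length (pixels) and baseline (metres)
dd = 0.5;                   % assumed disparity error (pixels)
Z  = [1 2 5 10];            % depths (metres)
dZ = Z.^2 / (f*tx) * dd     % -> approximately 0.0063  0.0250  0.1563  0.6250 metres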