Lec 4
These lecture summaries are designed to be a review of the lecture. Though I do my best to include all main topics from the
lecture, the lectures themselves contain more elaborate explanations than these notes.
• Image formation
– Where in the image? Recall perspective projection:
x/f = X/Z, y/f = Y/Z (1)
Differentiating this expression gives:
u/f = U/Z − (X/Z)(W/Z), v/f = V/Z − (Y/Z)(W/Z) (2)
From these, we can find the Focus of Expansion (FOE), or, more intuitively: “The point in the image toward
which you are moving.”
How long until we reach this point? This is given by Time to Contact (TTC):
Time to Contact = Z/W = 1/C (3)
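As a quick numeric illustration of (1)–(3), here is a minimal sketch; the focal length, camera velocity, and depth below are hypothetical values chosen only for the example:

```python
import numpy as np

# Hypothetical camera/motion values (illustrative only)
f = 1.0                    # focal length
U, V, W = 0.2, 0.1, 1.0    # translational velocity components (world units/s)
Z = 10.0                   # current depth of the surface point

# Focus of Expansion: the image point with zero image motion.
# Setting u = v = 0 in u/f = U/Z - (x/f)(W/Z) gives x_FOE = f*U/W, y_FOE = f*V/W.
x_foe, y_foe = f * U / W, f * V / W

# Time to Contact: T = Z / W (seconds until Z reaches 0 at the current speed)
ttc = Z / W
print(f"FOE = ({x_foe:.3f}, {y_foe:.3f}), TTC = {ttc:.1f} s")
```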
– How bright in the image? For this, let us consider an image solid, where the brightness function is parameterized by
x, y, and t: E(x, y, t).
(dx/dt)(∂E/∂x) + (dy/dt)(∂E/∂y) + ∂E/∂t = 0 (4)
Recall our variables: u ≜ dx/dt, v ≜ dy/dt. Then the BCCE rewritten in the standard notation we’ve been using:
u Ex + v Ey + Et = 0 (5)
Recall our method of using least-squares regression to solve for the optimal values of u, v that minimize the total squared
BCCE residual (the LHS of the BCCE) over the entire image (note that integrals become discrete sums in the presence of
discretized pixels, and derivatives become differences):
u*, v* = arg min_{u,v} ∬ (u Ex + v Ey + Et)² dx dy (6)
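A minimal discrete sketch of (6), assuming NumPy and two grayscale frames `E0`, `E1` given as float arrays; the finite-difference gradient estimates below are one simple option, not the lecture's specific discretization:

```python
import numpy as np

def constant_flow(E0, E1):
    """Estimate a single (u, v) for the whole image pair by minimizing
    the discrete sum of (u*Ex + v*Ey + Et)^2 (the BCCE residual)."""
    # Simple finite-difference estimates of the brightness gradients.
    Ex = 0.5 * (np.gradient(E0, axis=1) + np.gradient(E1, axis=1))
    Ey = 0.5 * (np.gradient(E0, axis=0) + np.gradient(E1, axis=0))
    Et = E1 - E0
    # Normal equations: [[sum Ex^2, sum ExEy], [sum ExEy, sum Ey^2]] [u, v]^T
    #                   = -[sum ExEt, sum EyEt]^T
    A = np.array([[np.sum(Ex * Ex), np.sum(Ex * Ey)],
                  [np.sum(Ex * Ey), np.sum(Ey * Ey)]])
    b = -np.array([np.sum(Ex * Et), np.sum(Ey * Et)])
    return np.linalg.solve(A, b)
```

Solving the 2×2 normal equations is exactly the discrete first-order-condition system for (6); `np.linalg.solve` fails when the determinant vanishes, which is the degenerate (too-uniform brightness) case discussed below.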
Now, to introduce a new variation on this problem, let us suppose we have the following spatial parameterization of
brightness (you’ll see that this brightness function creates linear isophotes):
E(x, y) = f(ax + by)
If f is differentiable over the domain, then the spatial derivatives Ex and Ey can be computed as follows, using the chain rule:
• Ex = f′(ax + by) · a
• Ey = f′(ax + by) · b
Where f′ is the derivative of this scalar-valued function (i.e., if we define the input to be z = ax + by, then f′ is equivalent to
df(z)/dz).
Isophote Example: If E(x, y) = ax + by + c, for a, b, c ∈ R+, then the isophotes of this brightness function will be linear.
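With such a brightness pattern, Ey/Ex = b/a everywhere, so the least-squares system of (6) degenerates: only the flow component along (a, b) is recoverable (the aperture problem). A small numeric check; the choice f = sin and the grid size are arbitrary choices for the demonstration:

```python
import numpy as np

# Brightness with linear isophotes: E(x, y) = f(a*x + b*y), here with f = sin.
a, b = 0.3, 0.7
y, x = np.mgrid[0:64, 0:64].astype(float)
E = np.sin(a * x + b * y)

# Chain rule: Ex = f'(ax+by)*a, Ey = f'(ax+by)*b, so Ey/Ex = b/a everywhere.
Ex = np.cos(a * x + b * y) * a
Ey = np.cos(a * x + b * y) * b

# The least-squares normal matrix is rank 1: its determinant vanishes,
# so (u, v) cannot be fully recovered from this image.
A = np.array([[np.sum(Ex * Ex), np.sum(Ex * Ey)],
              [np.sum(Ex * Ey), np.sum(Ey * Ey)]])
print(np.linalg.det(A))  # ~0 up to floating-point error
```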
• 1/TTC = W/Z = (1/Z)(dZ/dt) = d(ln Z)/dt, therefore we can simply take the slope of the line corresponding to the
logarithm of Z over time to compute the (reciprocal of the) TTC.
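A sketch of this log-slope recipe, assuming depth samples Z(t) are available; the constant-TTC trajectory below is synthetic, and with the sign convention of an approaching object the slope of ln Z is −1/T:

```python
import numpy as np

# Hypothetical depth measurements under a constant-TTC approach:
# Z(t) = Z0 * exp(-t/T) with T = 5 s.
T_true, Z0 = 5.0, 100.0
t = np.linspace(0.0, 2.0, 50)
Z = Z0 * np.exp(-t / T_true)

# ln Z is linear in t with slope -1/T; fit a line and invert the slope.
slope, _ = np.polyfit(t, np.log(Z), 1)
print("estimated TTC:", -1.0 / slope)   # ~5.0
```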
Now, let’s suppose that objects are moving both in the world and in the image. Let’s denote s as our image coordinate (e.g.
the imaged size of an object) and S as our world coordinate (the corresponding true size). Then:
s/f = S/Z (10)
Then we can write:
sZ − Sf = 0 (11)
Differentiating (S and f are constant):
Z (ds/dt) + s (dZ/dt) = 0 =⇒ (1/s)(ds/dt) = −(1/Z)(dZ/dt) (12)
The above relationship between derivative ratios can be interpreted as: “The fractional rate of change of the image’s size
equals (minus) the fractional rate of change of the distance.”
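Combined with (3), this means TTC can be read off from image measurements alone: up to the sign convention for W, T = Z/W = s/(ds/dt), with no need to know Z or S. A minimal sketch; the size measurements and frame rate below are hypothetical:

```python
import numpy as np

# Hypothetical measured widths (pixels) of an approaching object at 30 fps.
s = np.array([40.0, 41.0, 42.1, 43.2, 44.3])
dt = 1.0 / 30.0

ds_dt = np.gradient(s, dt)   # numerical derivative of image size
ttc = s / ds_dt              # T = s / (ds/dt), in seconds
print(ttc)
```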
1.2 Increasing Generality of TTC Problems
Let us now consider adding some additional generality to the Time to Contact (TTC) problem. We’ve already visited some of
these cases before:
• Even More General case: (Motivation) What if the optical axis isn’t perpendicular to the wall, i.e. the planar surface
is tilted, e.g. Z = aX + bY + c for some a, b, c ∈ R? In this case, we can solve the problem numerically rather
than through a closed-form expression.
Another motivating question for developing TTC methods: What if the surface is non-planar? This is a common scenario for
real-world TTC systems. In this case, we have two options:
• Parameterize the geometric models of these equations with polynomials, rather than planes.
• Leave the planar solution, and look for other ways to account for errors between the modeled and true surfaces.
In practice, the second option actually works better. The first option allows for higher modeling precision, but is less robust
to local optima, and can increase the sensitivity of the parameters we find through least-squares optimization.
If you want to draw an analogy to machine learning/statistics, we can think of modeling surfaces with more parameters (e.g.
polynomials rather than planes) as building a model that overfits and generalizes poorly beyond the data it learns on, creating
a problem with too many unknowns and not enough equations.
Additionally, multiscale is computationally efficient: using the infinite geometric series, we can see that downsampling/down-
scaling by a factor of 2 each time and storing all of these smaller image representations requires only 33% more stored data than
the full-size image itself:
∑_{n=0}^{∞} ((1/2)²)^n = 1/(1 − 1/4) = 4/3 = 1 + 1/3 (13)
More generally, for any downsampling factor r ∈ N, we only add 1/(r² − 1) × 100% additional data:
∑_{n=0}^{∞} (1/r²)^n = 1/(1 − 1/r²) = r²/(r² − 1) = ((r² − 1) + 1)/(r² − 1) = 1 + 1/(r² − 1) (14)
(Note that we have r² rather than r in the denominator because we are downsampling across both the x and y dimensions.)
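A quick check of (13)–(14) in code; `pyramid_overhead` is a hypothetical helper, and the series is truncated at a finite number of levels, which is all a real pyramid stores anyway:

```python
# Storage overhead of an image pyramid with downsampling factor r per axis:
# each level has 1/r**2 as many pixels as the previous one.
def pyramid_overhead(r, levels=30):
    total = sum((1.0 / r**2) ** n for n in range(levels))
    return total - 1.0          # extra data beyond the full-size image

print(pyramid_overhead(2))      # ~1/3   -> about 33% extra
print(pyramid_overhead(3))      # ~1/8   -> about 12.5% extra
```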
Nyquist’s Sampling Theorem states that we must sample at at least twice the frequency of the highest-frequency component of
our image to avoid aliasing and the spatial artifacts it introduces.
1.3.2 Applications of TTC
A few more applications of TTC:
• Airplane Wing Safety - Using TTC to make sure wings don’t crash into any objects at airports.
• NASA TTC Control - TTC systems were used to ensure that NASA’s payload doesn’t crash into the surface of Earth/other
planets/moons. In this application, feedback control was achieved by setting a nominal “desired TTC” and using an
amplifier, imaging, and TTC estimation to maintain this desired TTC.
• Autonomous Vehicles - e.g. a vehicle is coming out of a parking lot and approaching a bus - how do we control when/if to
brake?
Let’s discuss the NASA TTC Control example a little further. Using our equation for TTC:
Z/W = T (15)
We can rewrite this as a first-order Ordinary Differential Equation (ODE), noting that W = −dZ/dt for a descending craft:
dZ/dt = −(1/T) Z (16)
Since the derivative of Z is proportional to Z, the solution to this ODE will be an exponential function in time:
Z(t) = Z0 e^(−t/T) (17)
This method requires that deceleration is not uniform, which is not the most energy-efficient approach for solving this problem.
As you can imagine, energy conservation is very important in space missions, so let’s next consider a constant-deceleration
approach. Note that under constant deceleration, we have d²Z/dt² ≜ a, a nonzero constant. Then we can express the first
derivative of Z w.r.t. t as:
dZ/dt = at + v0 (18)
Where v0 is an initial velocity determined by the boundary/initial conditions. Here we have the following boundary condition
(the craft comes to rest, dZ/dt = 0, exactly at contact time t0):
dZ/dt = a(t − t0) (19)
This boundary condition gives rise to the following solution:
Z = (1/2)at² − at0t + c = (1/2)a(t − t0)² (20)
Therefore, the TTC for this example becomes:
T = Z/(dZ/dt) = [(1/2)a(t − t0)²]/[a(t − t0)] = (1/2)(t − t0) (21)
whose magnitude is half the time remaining until contact.
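A numeric sanity check of (21) under an assumed constant-deceleration profile; the values of a and t0 are arbitrary, and with this sign convention T = (t − t0)/2 comes out negative while approaching, with magnitude equal to half the remaining time:

```python
import numpy as np

# Hypothetical constant-deceleration approach: Z = 0.5*a*(t - t0)^2,
# so dZ/dt = a*(t - t0) and contact (Z = 0, dZ/dt = 0) occurs at t = t0.
a, t0 = 2.0, 10.0
t = np.linspace(0.0, 9.0, 10)
Z = 0.5 * a * (t - t0) ** 2
dZdt = a * (t - t0)

T = Z / dZdt      # equation (21): T = (t - t0)/2
print(T)          # |T| is half the time remaining until t0
```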
How can we impose additional constraints? To do this, let us first understand how motion relates across pixels, and
information that they share. Pixels don’t necessarily move exactly together, but they move together in similar patterns, partic-
ularly if pixels are close to one another. We’ll revisit this point in later lectures.
What Else Can We Do? One solution is to divide the image into equal-sized patches and apply the Fixed Flow Paradigm to
each patch, as we’ve done with entire images before. When selecting patch size, one trade-off to be mindful of is that the smaller
the patch, the more uniform the brightness pattern across it, and patches may be too uniform to detect motion (note: this
is equivalent to the matrix determinants we’ve been looking at evaluating to zero/near-zero).
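A sketch of this patchwise scheme, assuming precomputed gradient arrays `Ex`, `Ey`, `Et`; the helper name `patchwise_flow`, the patch size, and the determinant threshold are all illustrative choices:

```python
import numpy as np

def patchwise_flow(Ex, Ey, Et, patch=16, eps=1e-6):
    """Fixed-flow estimate (u, v) per patch; patches whose normal-matrix
    determinant is near zero (too uniform) are left as NaN."""
    H, W = Ex.shape
    flow = np.full((H // patch, W // patch, 2), np.nan)
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            ex = Ex[i:i+patch, j:j+patch].ravel()
            ey = Ey[i:i+patch, j:j+patch].ravel()
            et = Et[i:i+patch, j:j+patch].ravel()
            A = np.array([[ex @ ex, ex @ ey],
                          [ex @ ey, ey @ ey]])
            if np.linalg.det(A) > eps:   # skip near-uniform patches
                b = -np.array([ex @ et, ey @ et])
                flow[i // patch, j // patch] = np.linalg.solve(A, b)
    return flow
```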
1.5 Vanishing Points (Perspective Projection)
Before we dive into what vanishing points are, let’s discuss why they’re useful: they enable applications such as pedestrian
protection and camera calibration, which we detail below.
Now, let’s discuss what they are. Suppose we have several parallel lines in the world, and we image them by projecting them onto
the 2D image plane. Then vanishing points are the points in the image (or, more commonly, outside of the image) where these
lines converge. To discuss these mathematically, let’s first define lines in the 3D world parametrically:
X = X0 + αs, Y = Y0 + βs, Z = Z0 + γs
To build intuition, let’s consider what happens when we travel far along the lines (i.e. as s gets very large) in our parametric
definition of lines:
• lim_{s→∞} x/f = lim_{s→∞} (X0 + αs)/(Z0 + γs) = αs/(γs) = α/γ (x-coordinate)
• lim_{s→∞} y/f = lim_{s→∞} (Y0 + βs)/(Z0 + γs) = βs/(γs) = β/γ (y-coordinate)
The 2D point (fα/γ, fβ/γ) is the vanishing point in the image plane. As we move along the line in the world, we approach this point
in the image, but we will never reach it. More generally, we claim that parallel lines in the world have the same vanishing
point in the image, since the limit depends only on the direction (α, β, γ) and not on the starting point (X0, Y0, Z0).
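A tiny numeric demonstration of this limit; all line parameters below are hypothetical, and any parallel line (i.e. any other choice of (X0, Y0, Z0)) converges to the same image point:

```python
# A 3D line X = X0 + alpha*s, Y = Y0 + beta*s, Z = Z0 + gamma*s, imaged by
# x = f*X/Z, y = f*Y/Z. As s grows, (x, y) -> (f*alpha/gamma, f*beta/gamma).
f = 1.0
X0, Y0, Z0 = 2.0, -1.0, 5.0          # hypothetical point on the line
alpha, beta, gamma = 0.5, 0.2, 1.0   # direction (shared by all parallel lines)

for s in [1.0, 10.0, 100.0, 1000.0]:
    x = f * (X0 + alpha * s) / (Z0 + gamma * s)
    y = f * (Y0 + beta * s) / (Z0 + gamma * s)
    print(s, (x, y))                 # -> (0.5, 0.2), regardless of (X0, Y0, Z0)
```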
Two such applications of vanishing points:
• Protecting Pedestrians on a Road Side: To protect pedestrians, the camera must transform its coordinate system.
This transformation can be found using vanishing points.
• Camera Calibration: One way to calibrate a camera is to solve for the Center of Projection (COP) in the image
space, using perspective projection. Calibration is typically achieved through calibration objects.
1.6.1 Spheres
• If the sphere is imaged directly overhead/straight-on, the projection from the world sphere to the image plane is a circle. If
it is not overhead/straight-on, the projection is an ellipse.
1.6.2 Cube
• Cubes can be used for detecting edges, which in turn can be used to find vanishing points (since edges are lines in the
world).
• Cubes have three sets of four parallel lines/edges each, and each of these sets is orthogonal to the other two. This
implies that we will have three vanishing points - one for each set of parallel lines.
• For each of these sets of lines, we can pick a line that goes through the Center of Projection (COP), denoted p ∈ R3 (in
world coordinates). We can then project the COP onto the image plane (so that now p ∈ R2).
• Let us denote the vanishing points of the cube in the image plane as a, b, c ∈ R2 . Then, because of orthogonality between
the different sets of lines, we have the following relations between our three vanishing points and p:
– (p − a) · (p − b) = 0
– (p − b) · (p − c) = 0
– (p − c) · (p − a) = 0
In other words, the difference vectors between p and the vanishing points are all at right angles to each other.
To find p, we have three equations and three unknowns, with terms that are quadratic in p. Using Bézout’s
Theorem (the maximum number of solutions is the product of the polynomial orders of the equations in the system of
equations), we have 2³ = 8 possible solutions for our system of equations. More generally:
number of solutions = ∏_{e=1}^{E} o_e (26)
Where E is the number of equations and o_e is the polynomial order of the e-th equation in the system.
This is too many solutions to sift through, but we can subtract the equations from one another pairwise; each
difference is linear in p, though the three differences together are linearly dependent. Even better, we can leave one
equation in its quadratic form and two in their linear (difference) form, and this maintains the independence of the
system of equations:
– (a − p) · (c − b) = 0
– (b − p) · (a − c) = 0
– (p − c) · (p − a) = 0
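The two linear relations alone already pin down the image position of p; geometrically, p is the orthocenter of the triangle formed by a, b, c, while the retained quadratic supplies the remaining constraint in the full three-unknown problem. A minimal sketch, assuming NumPy and hypothetical vanishing points (`principal_point` is an illustrative helper name):

```python
import numpy as np

def principal_point(a, b, c):
    """Solve the two linear relations (a-p)*(c-b) = 0 and (b-p)*(a-c) = 0
    (dot products) for the projected center of projection p in 2D."""
    a, b, c = map(np.asarray, (a, b, c))
    M = np.array([c - b, a - c], dtype=float)   # rows: the two edge directions
    rhs = np.array([np.dot(a, c - b), np.dot(b, a - c)])
    return np.linalg.solve(M, rhs)              # singular if a, b, c are collinear

# Hypothetical vanishing points of a cube's three edge directions:
p = principal_point([0.0, 0.0], [4.0, 0.0], [1.0, 3.0])
print(p)   # (1, 1): the orthocenter of the triangle a, b, c
```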
Now suppose the image motion consists of a constant flow (u0, v0) plus a rotation ω about the optical axis; then the flow at
image point (x, y) is:
• u = u0 − ωy
• v = v0 + ωx
Note that we can also write the radial vector of x and y, as well as the angle in this 2D plane, to show how this connects to
rotation:
r = ‖(x, y)‖ = √(x² + y²) (27)
θ = arctan2(y, x) (28)
ω = dθ/dt (29)
With this rotation variable, we leverage the same least-squares approach as before over the entirety of the image, but now we
also optimize over the variable ω. Substituting u = u0 − ωy, v = v0 + ωx into the BCCE and defining H ≜ xEy − yEx:
u0*, v0*, ω* = arg min_{u0,v0,ω} { J(u0, v0, ω) ≜ ∬ (u0 Ex + v0 Ey + ωH + Et)² dx dy } (30)
Like the other least-squares optimization problems we’ve encountered before, this problem can be solved by solving a system of
first-order conditions (FOCs):
• ∂J(u0, v0, ω)/∂u0 = 0
• ∂J(u0, v0, ω)/∂v0 = 0
• ∂J(u0, v0, ω)/∂ω = 0
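Since J is quadratic, the three FOCs form a 3×3 linear (normal-equation) system. A sketch, assuming NumPy, gradient arrays `Ex`, `Ey`, `Et`, and coordinate grids `x`, `y` measured from the image center (`flow_with_rotation` is an illustrative name):

```python
import numpy as np

def flow_with_rotation(Ex, Ey, Et, x, y):
    """Least-squares (u0, v0, omega) for u = u0 - omega*y, v = v0 + omega*x.
    H = x*Ey - y*Ex is the coefficient of omega in the BCCE residual."""
    H = x * Ey - y * Ex
    cols = [Ex.ravel(), Ey.ravel(), H.ravel()]
    # Normal equations of J = integral of (u0*Ex + v0*Ey + omega*H + Et)^2
    A = np.array([[np.dot(p, q) for q in cols] for p in cols])
    b = -np.array([np.dot(p, Et.ravel()) for p in cols])
    return np.linalg.solve(A, b)   # (u0, v0, omega)
```

The grids `x`, `y` can be built with `np.meshgrid`; as in the translational case, the solve fails when the 3×3 determinant is near zero.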
1.7.2 Generalization: Time to Contact (for U ≠ 0, V ≠ 0, ω ≠ 0)
Let’s now revisit TTC, but with the following parameterization: U ≠ 0, V ≠ 0, ω ≠ 0, and the image is of a tilted plane. Recall
the differentiated projection equations:
• u/f = U/Z − (X/Z)(W/Z)
• v/f = V/Z − (Y/Z)(W/Z)
Some additional terms that are helpful when discussing these topics:
• Motion Field: Projection of 3D motion onto the 2D image plane.
• Optical Flow:
– What we can sense
– Describes motion in the image
We can transform this into image coordinates:
u = (1/Z)(fU − xW), v = (1/Z)(fV − yW) (33)
Let’s take U = V = 0, so u = −x(W/Z), v = −y(W/Z). Z (a world coordinate) is not constant across the image, so we rewrite
it by substituting the image coordinates into our expression for the tilted plane:
Z = Z0 + pX + qY (34)
  = Z0 + p(x/f)Z + q(y/f)Z (35)
Now, we can isolate Z and solve for its closed form:
Z(1 − p(x/f) − q(y/f)) = Z0 (36)
Z = Z0 / (1 − p(x/f) − q(y/f)) (37)
From this we can conclude that 1/Z is linear in x and y (the image coordinates, not the world coordinates). This is helpful for
methods that operate on finding solutions to linear systems. If we now apply this to the BCCE given by uEx + vEy + Et = 0,
we can first express each of the velocities in terms of this derived expression for Z:
• u = −(W/Z0)(1 − p(x/f) − q(y/f)) x
• v = −(W/Z0)(1 − p(x/f) − q(y/f)) y
Substituting these into the BCCE and defining the radial gradient G ≜ xEx + yEy, we get:
0 = (W/Z0)(1 − p(x/f) − q(y/f)) G − Et (42)
For symbolic simplicity, take the following definitions:
• R ≜ W/Z0
• P ≜ −p W/(f Z0)
• Q ≜ −q W/(f Z0)
Using these definitions, the BCCE with this parameterization becomes:
0 = (R + Px + Qy)G − Et (43)
Now, we can again take our standard approach of solving these kinds of problems by applying least-squares to estimate the free
variables P, Q, R over the entire continuous or discrete image space. Like other cases, this fails when the determinant of the
system involving these equations is zero.
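Since the residual in (43) is linear in (R, P, Q), the least-squares estimate again reduces to a 3×3 normal-equation system. A sketch under the same assumptions as before; `tilted_plane_ttc` is an illustrative name, and `Ex`, `Ey`, `Et`, `x`, `y` are NumPy arrays:

```python
import numpy as np

def tilted_plane_ttc(Ex, Ey, Et, x, y):
    """Least-squares (R, P, Q) for the BCCE form (R + P*x + Q*y)*G = Et,
    where G = x*Ex + y*Ey; 1/R = Z0/W is the time to contact."""
    G = x * Ex + y * Ey
    cols = [G.ravel(), (x * G).ravel(), (y * G).ravel()]
    A = np.array([[np.dot(p, q) for q in cols] for p in cols])
    b = np.array([np.dot(p, Et.ravel()) for p in cols])
    # As before, this fails when det(A) ~ 0 (e.g., too little image texture).
    return np.linalg.solve(A, b)   # (R, P, Q); TTC ~ 1/R
```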
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms