
6.801/6.866: Machine Vision, Lecture 4

Professor Berthold Horn, Ryan Sander, Tadayuki Yoshitake


MIT Department of Electrical Engineering and Computer Science
Fall 2020

These lecture summaries are designed to be a review of the lecture. Though I do my best to include all of the main topics from the
lecture, the lectures themselves contain more elaborate explanations than these notes.

1 Lecture 4: Fixed Optical Flow, Optical Mouse, Constant Brightness Assumption, Closed Form Solution
1.1 Review
Let’s frame the topics today by briefly reviewing some concepts we’ve covered in the previous lectures. Feel free to skip this
section if you already feel comfortable with the material.

• Image formation
– Where in the image? Recall perspective projection:

  x/f = X/Z,   y/f = Y/Z    (1)

  Differentiating this expression gives:

  u/f = U/Z − XW/Z²,   v/f = V/Z − YW/Z²    (2)

From these, we can find the Focus of Expansion (FOE), or, more intuitively: “The point in the image toward
which you are moving.”

How long until we reach this point? This is given by Time to Contact (TTC):

  Time to Contact = Z/W = 1/C    (3)
– How bright in the image? For this, let us consider an image solid, where the brightness function is parameterized by
x, y, and t: E(x, y, t).

1.1.1 Constant Brightness Assumption Review with Generalized Isophotes


Recall the constant brightness assumption, which says that the total derivative of brightness with respect to time is zero:
dE(x, y, t)/dt = 0. By the chain rule we obtain the BCCE:

  (dx/dt)(∂E/∂x) + (dy/dt)(∂E/∂y) + ∂E/∂t = 0    (4)

Recall our variables: u ≜ dx/dt, v ≜ dy/dt. Then the BCCE, rewritten in the standard notation we’ve been using, is:

  uEx + vEy + Et = 0    (5)

Recall our method of using least-squares regression to solve for optimal values of u, v that minimize the total squared LHS of
the BCCE over the entire image (note that integrals become discrete sums in the presence of discretized pixels, and derivatives
become differences):

  u*, v* = arg min_{u,v} ∫∫ (uEx + vEy + Et)² dx dy    (6)

The first-order conditions (FOCs) of this optimization problem give:


  u ∫∫ Ex² + v ∫∫ Ex Ey = − ∫∫ Ex Et    (7)

  u ∫∫ Ey Ex + v ∫∫ Ey² = − ∫∫ Ey Et    (8)

Written in matrix-vector form, our equations become:

  [ ∫∫ Ex²     ∫∫ Ex Ey ] [u]       [ ∫∫ Ex Et ]
  [ ∫∫ Ey Ex   ∫∫ Ey²   ] [v]  = −  [ ∫∫ Ey Et ]
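To make the discrete version of this solution concrete, here is a minimal sketch (an assumption of these notes, not part of the lecture) in Python/numpy, using two hypothetical grayscale frames frame0 and frame1 and simple finite differences in place of the lecture's exact derivative estimates:

import numpy as np

def fixed_flow(frame0, frame1):
    # Estimate a single (u, v) for the whole image from the BCCE least-squares system.
    # frame0, frame1: grayscale float arrays at times t and t + 1.
    Ex = np.gradient(frame0, axis=1)          # crude stand-in for the x-derivative of brightness
    Ey = np.gradient(frame0, axis=0)          # crude stand-in for the y-derivative of brightness
    Et = frame1 - frame0                      # temporal difference

    # The double integrals become sums over pixels:
    A = np.array([[np.sum(Ex * Ex), np.sum(Ex * Ey)],
                  [np.sum(Ey * Ex), np.sum(Ey * Ey)]])
    b = -np.array([np.sum(Ex * Et), np.sum(Ey * Et)])

    # Unreliable when det(A) is zero/near-zero (e.g. linear isophotes, discussed next).
    return np.linalg.solve(A, b)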

(New) Now, to introduce a new variation on this problem, let us suppose we have the following spatial parameterization of
brightness for some scalar function f (you’ll see that this brightness function creates linear isophotes):

  E(x, y) = f(ax + by)    (9)

If f is differentiable over the domain, then the spatial derivatives Ex and Ey can be computed as follows, using the chain rule:

• Ex = f′(ax + by) · a

• Ey = f′(ax + by) · b

where f′ is the derivative of this scalar-valued function (i.e., if we define its input to be z = ax + by, then f′ is equivalent to df(z)/dz).

Isophote Example: If E(x, y) = ax + by + c, for a, b, c ∈ R+, then the isophotes of this brightness function will be linear.
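A quick numerical check of this point (a sketch with made-up constants, not from the lecture): any brightness pattern of the form E = f(ax + by) makes the 2x2 matrix above singular, so only the flow component along (a, b) is recoverable.

import numpy as np

a, b = 2.0, 1.0
y, x = np.mgrid[0:64, 0:64].astype(float)
E = np.tanh(0.1 * (a * x + b * y))            # any smooth f of (ax + by) gives linear isophotes

Ex = np.gradient(E, axis=1)
Ey = np.gradient(E, axis=0)
A = np.array([[np.sum(Ex * Ex), np.sum(Ex * Ey)],
              [np.sum(Ey * Ex), np.sum(Ey * Ey)]])
print(np.linalg.det(A))                        # ~0: u and v cannot both be determined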

1.1.2 Time to Contact (TTC) Review


Recall the following symbols/quantities from TTC:
• C = W/Z

• TTC = Z/W, and since W/Z = (1/Z)(dZ/dt) = d(log_e Z)/dt (up to sign), we can simply take the slope of the line corresponding to the
  logarithm of Z plotted against time to compute the TTC (see the sketch below).
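A small sketch of that last bullet (an assumed setup with noisy, made-up range measurements Z at known times t; not from the lecture): fit a line to log Z versus t and read the TTC off the slope.

import numpy as np

t = np.linspace(0.0, 2.0, 50)
Z = 10.0 * np.exp(-0.5 * t) * (1.0 + 0.01 * np.random.randn(t.size))   # range shrinking toward the camera

slope, _ = np.polyfit(t, np.log(Z), 1)     # slope of log Z vs. t
ttc = 1.0 / abs(slope)                     # magnitude of the slope is C = W/Z, so TTC = 1/|slope|
print(ttc)                                 # ~2.0 (in the time units of t)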

Now, let’s suppose that objects are moving both in the world and in the image. Let’s denote s as our image coordinate and S
as our world coordinate. Then:
  s/f = S/Z    (10)

Then we can write:

  sZ − Sf = 0    (11)

Differentiating (with the world size S and focal length f constant):

  (ds/dt) Z + s (dZ/dt) = 0   ⟹   (ds/dt)/s = −(dZ/dt)/Z    (12)

The above relationship between derivative ratios can be interpreted as: “the fractional rate of change of the image’s size is the same
(up to sign) as the fractional rate of change of distance.”

1.2 Increasing Generality of TTC Problems
Let us now consider adding some additional generality to the Time to Contact (TTC) problem. We’ve already visited some of
these cases before:

• Simple case: Solving for C

• More General case: Solving for A, B, C

• Even More General case: (Motivation) What if the optical axis isn’t perpendicular to the wall? What if the camera
plane is tilted, e.g. Z = aX + bY + C for some a, b, C ∈ R? In this case, we can solve the problem numerically rather
than through a closed-form expression.
Another motivating question for developing TTC methods: What if the surface is non-planar? This is a common scenario for
real-world TTC systems. In this case, we have two options:
• Parameterize the geometric models of these equations with polynomials, rather than planes.

• Leave the planar solution, and look for other ways to account for errors between the modeled and true surfaces.

In practice, the second option here actually works better. The first option allows for higher modelling precision, but is less robust
to local optima, and can increase the sensitivity of the parameters we find through least-squares optimization.

If you want to draw an analog to machine learning/statistics, we can think of modeling surfaces with more parameters (e.g.
polynomials rather than planes) as creating a model that will overfit or not generalize well to the data it learns on, and create
a problem with too many unknowns and not enough equations.

1.3 Multiscale and TTC


If you recall from last lecture, we saw that TTC and FOE estimation “fell apart” as we got really close. This is due to mea-
surement sensitivity and the fact that the pixels occupy increasingly more space as we get closer and closer. This is where
multiscale can help us - it enables us to use coarser resolutions for these estimation problems. The implicit averaging done
through downsampling allows us to “smooth out” any measurement noise that may be present, and will consequently reduce
the magnitude of pixel brightness gradients.

Additionally, multiscale is computationally-efficient: Using the infinite geometric series, we can see that downsampling/down-
scaling by a factor of 2 each time and storing all of these smaller image representations requires only 33% more stored data than
the full size image itself:

  ∑_{n=0}^{∞} ((1/2)²)^n = 1/(1 − 1/4) = 4/3 = 1 + 1/3    (13)

More generally, for any downsampling factor r ∈ N, we only add 1/(r² − 1) × 100% additional data:

  ∑_{n=0}^{∞} (1/r²)^n = 1/(1 − 1/r²) = r²/(r² − 1) = ((r² − 1) + 1)/(r² − 1) = 1 + 1/(r² − 1)    (14)

(Note that we have r² rather than r because we are downsampling across both the x and y dimensions.)
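As a tiny worked check of equation (14) (a sketch, not from the lecture):

def pyramid_overhead(r):
    # Extra storage fraction for keeping every level of an r-times-downsampled pyramid.
    return 1.0 / (r * r - 1)

print(pyramid_overhead(2))   # 0.333... -> about 33% extra, matching equation (13)
print(pyramid_overhead(3))   # 0.125   -> 12.5% extra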

1.3.1 Aliasing and Nyquist’s Sampling Theorem


Though multiscale is great, we also have to be mindful of aliasing. Recall from 6.003 (or another Signals and Systems course)
that aliasing causes overlap and distortion between signals in the frequency domain, and it is required that we sample at a spatial
frequency that is high enough to not produce aliasing artifacts.

Nyquist’s Sampling Theorem states that we must sample at (at least) twice the frequency of the highest-frequency component of our
image to avoid aliasing and the spatial artifacts it produces.

1.3.2 Applications of TTC
A few more applications of TTC:
• Airplane Wing Safety - Using TTC to make sure wings don’t crash into any objects at airports.
• NASA TTC Control - TTC systems were used to ensure that NASA’s payload doesn’t crash into the surface of earth/other
planets/moons. In this application, feedback control was achieved by setting a nominal “desired TTC” and using an
amplifier, imaging, and TTC estimation to maintain this desired TTC.
• Autonomous Vehicles - e.g. a vehicle is coming out of a parking lot and approaching a bus - how do we control when/if to
brake?
Let’s discuss the NASA TTC Control example a little further. Using our equation for TTC:
  Z/W = T    (15)

We can rewrite this as a first-order Ordinary Differential Equation (ODE), using W = −dZ/dt:

  Z/(−dZ/dt) = T   ⟹   dZ/dt = −(1/T) Z    (16)

Since the derivative of Z is proportional to Z, the solution to this ODE will be an exponential function in time:
  Z(t) = Z0 e^(−t/T)    (17)

Where Z0 depends on the initial conditions of the system.

This method requires that deceleration is not uniform, which is not the most energy efficient approach for solving this problem.
As you can imagine, energy conservation is very important in space missions, so let’s next consider a constant deceleration
approach. Note that under constant deceleration, we have d²Z/dt² ≜ a, a nonzero constant. Then we can express the first derivative of Z w.r.t. t as:

  dZ/dt = at + v0    (18)

where v0 is an initial velocity determined by the boundary/initial conditions. Here we have the following boundary condition:

  dZ/dt = a(t − t0)    (19)

This boundary condition gives rise to the following solution:

  Z = (1/2) a t² − a t0 t + c = (1/2) a (t − t0)²    (20)

Therefore, the TTC for this example becomes:

  T = Z / (dZ/dt) = [(1/2) a (t − t0)²] / [a (t − t0)] = (1/2)(t − t0)    (21)
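A quick numerical sanity check of equation (21) (a sketch with made-up values of a and t0, not from the lecture):

import numpy as np

a, t0 = 2.0, 5.0                       # hypothetical constant second derivative and contact time
t = np.linspace(0.0, 4.0, 5)
Z = 0.5 * a * (t - t0) ** 2            # distance under constant deceleration, equation (20) with c chosen so Z(t0) = 0
dZdt = a * (t - t0)                    # equation (19); negative while approaching
print(Z / np.abs(dZdt))                # equals (t0 - t) / 2: the TTC now shrinks linearly with time
print((t0 - t) / 2.0)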

1.4 Optical Flow


Motivating question: What if the motion in an image is not constant, i.e. the image doesn’t all move together? We have the Brightness
Change Constraint Equation (BCCE), but this only introduces one constraint per pixel to solve for two unknowns per pixel, and thus
creates an under-constrained/ill-posed problem.

How can we impose additional constraints? To do this, let us first understand how motion relates across pixels, and
information that they share. Pixels don’t necessarily move exactly together, but they move together in similar patterns, partic-
ularly if pixels are close to one another. We’ll revisit this point in later lectures.

What Else Can We Do? One solution is to divide the images into equal-sized patches and apply the Fixed Flow Paradigm,
as we’ve done with entire images before. When selecting patch size, one trade-off to be mindful of is that the smaller the patch,
the more uniform the brightness patterns will be across the patch, and patches may be too uniform to detect motion (note: this
is equivalent to the matrix determinants we’ve been looking at evaluating to zero/near-zero).
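A minimal sketch of this patch-based fixed-flow idea (assuming numpy, the same finite-difference derivatives as the earlier sketch, and a hypothetical patch size): solve the 2x2 system per patch and skip patches whose matrix is near-singular.

import numpy as np

def patch_flow(frame0, frame1, patch=16, eps=1e-6):
    # Fixed flow per patch; NaN where the patch brightness is too uniform to constrain motion.
    Ex = np.gradient(frame0, axis=1)
    Ey = np.gradient(frame0, axis=0)
    Et = frame1 - frame0
    H, W = frame0.shape
    flow = np.full((H // patch, W // patch, 2), np.nan)
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            ex, ey, et = (E[i:i + patch, j:j + patch] for E in (Ex, Ey, Et))
            A = np.array([[np.sum(ex * ex), np.sum(ex * ey)],
                          [np.sum(ey * ex), np.sum(ey * ey)]])
            if abs(np.linalg.det(A)) < eps:          # the near-zero determinant case noted above
                continue
            b = -np.array([np.sum(ex * et), np.sum(ey * et)])
            flow[i // patch, j // patch] = np.linalg.solve(A, b)
    return flow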

1.5 Vanishing Points (Perspective Projection)
Before we dive into what vanishing points are, let’s discuss why they’re useful. Vanishing points can be useful for:

• Camera calibration - has applications to robotics, autonomous vehicles, photogrammetry, etc.

• Finding relative orientation between two coordinate frames/systems

Now, let’s discuss what vanishing points are. Suppose we have several parallel lines in the world, and we image them by projecting them onto
the 2D image plane. Then vanishing points are the points in the image (or, more commonly, outside of the image) toward which these
lines converge. To discuss these mathematically, let’s first discuss how to define lines in a 3D world:

• Vector Form: R = R0 + sn̂


Here, we can express this using our standard vector form of perspective projection:
  (1/f) r = (1/(R · ẑ)) R    (22)

          = (1/((R0 + s n̂) · ẑ)) (R0 + s n̂)    (23)

• Parametrically: (x0 + αs, y0 + βs, z0 + γs)


Here, we can expand this to our standard Cartesian form of perspective projection to apply our transformation:
  x/f = (x0 + αs)/(z0 + γs)    (24)

  y/f = (y0 + βs)/(z0 + γs)    (25)

To build intuition, let’s consider what happens when we travel far along the lines (i.e. as s gets very large) in our parametric
definition of lines:
• lim_{s→∞} x/f = lim_{s→∞} (x0 + αs)/(z0 + γs) = αs/(γs) = α/γ   (x-coordinate)

• lim_{s→∞} y/f = lim_{s→∞} (y0 + βs)/(z0 + γs) = βs/(γs) = β/γ   (y-coordinate)

The 2D point (α/γ, β/γ) (in x/f, y/f coordinates, i.e. (x, y) = (f α/γ, f β/γ)) is the vanishing point in the image plane. As we move along the line in the world, we approach this point
in the image, but we will never reach it. More generally, we claim that parallel lines in the world have the same vanishing
point in the image.
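As a small numerical illustration of this limit (with made-up line parameters, not from the lecture), projecting points further and further along the line shows (x/f, y/f) approaching (α/γ, β/γ):

x0, y0, z0 = 1.0, -2.0, 5.0           # hypothetical point on the world line
alpha, beta, gamma = 0.3, 0.1, 1.0    # hypothetical direction of the line (shared by all lines parallel to it)

for s in [1.0, 10.0, 100.0, 1e4]:
    x_over_f = (x0 + alpha * s) / (z0 + gamma * s)
    y_over_f = (y0 + beta * s) / (z0 + gamma * s)
    print(s, x_over_f, y_over_f)      # approaches (alpha/gamma, beta/gamma) = (0.3, 0.1)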

1.5.1 Applications of Vanishing Points


Let’s now discuss some of these applications in greater detail.

• Protecting Pedestrians on a Road Side: To protect pedestrians, measurements made in the camera’s coordinate system must be
  transformed into the road’s coordinate system. This transformation can be found using vanishing points.

• Camera Calibration: One way to calibrate a camera is to solve for the Center of Projection (COP) in the image
space, using perspective projection. Calibration is typically achieved through calibration objects.

1.6 Calibration Objects


Let’s discuss two calibration objects: spheres and cubes:

1.6.1 Spheres

• If the sphere is viewed straight-on (centered on the optical axis), its projection onto the image plane is a circle. If it is
  off-axis, the projection is an ellipse.

• Relatively easy to manufacture

1.6.2 Cube

• Harder to manufacture, but generally a better calibration object than a sphere.

• Cubes can be used for detecting edges, which in turn can be used to find vanishing points (since edges are lines in the
world).

• Cubes have three sets of four parallel lines/edges each, and each of these sets of lines are orthogonal to the others. This
implies that we will have three vanishing points - one for each set of parallel lines.

• For each of these sets of lines, we can pick the line of that direction that goes through the Center of Projection (COP), denoted p ∈ R³
  in world coordinates. We can then project the COP onto the image plane (and therefore now p ∈ R²).
• Let us denote the vanishing points of the cube in the image plane as a, b, c ∈ R2 . Then, because of orthogonality between
the different sets of lines, we have the following relations between our three vanishing points and p:

– (p − a) · (p − b) = 0
– (p − b) · (p − c) = 0
– (p − c) · (p − a) = 0

In other words, the difference vectors between p and the vanishing points are all at right angles to each other.

To find p, we have three equations and three unknowns. We have terms that are quadratic in p. Using Bézout’s
Theorem (the maximum number of solutions is the product of the polynomial orders of the equations in the system), we have
2³ = 8 possible solutions for our system of equations. More generally:

  number of solutions = ∏_{e=1}^{E} o_e    (26)

where E is the number of equations and o_e is the polynomial order of the e-th equation in the system.

This is too many solutions to work with, but we can subtract these equations from one another and create a system
of 3 linearly dependent equations. Or, even better, we can leave one equation in its quadratic form and 2 in their linear
form, which maintains the independence of this system of equations (a small numerical sketch follows the list below):
– (a − p) · (c − b) = 0
– (b − p) · (a − c) = 0
– (p − c) · (p − a) = 0
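A minimal numerical sketch of this (with made-up vanishing points, and treating p as a 2D image point as in the bullets above; an assumption of these notes, not code from the lecture): the two linear equations already determine p, and the remaining quadratic equation serves as a consistency check.

import numpy as np

# Hypothetical vanishing points of a cube's three edge directions, in image coordinates.
a = np.array([400.0, 50.0])
b = np.array([-300.0, 100.0])
c = np.array([100.0, -500.0])

# (a - p).(c - b) = 0 and (b - p).(a - c) = 0 are linear in p:
M = np.stack([c - b, a - c])
rhs = np.array([np.dot(a, c - b), np.dot(b, a - c)])
p = np.linalg.solve(M, rhs)

print(p)                          # estimated projection of the COP (the principal point)
print(np.dot(p - c, p - a))       # quadratic equation as a check; ~0 only for consistent vanishing points

Geometrically, the p found this way is the orthocenter of the triangle formed by the three vanishing points.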

1.7 Additional Topics


We’ll also briefly recap some of the topics discussed in our synchronous section today. These topics are not meant to introduce
new concepts, but are designed to generalize the concepts we’ve discussed such that they can be adapted for a broader range of
applications.

1.7.1 Generalization: Fixed Flow


The motivating example for this generalization is a rotating optical mouse. We’ll see that instead of just solving for our two
velocity parameters u and v, we’ll also need to solve for our rotational velocity, ω.

Suppose we are given the following parameterizations of velocity:

• u = u0 − ωy

• v = v0 + ωx

Note that we can also write the radial vector of x and y, as well as the angle in this 2D plane to show how this connects to
rotation:
  r = (x, y),   ‖r‖ = √(x² + y²)    (27)

  θ = arctan2(y, x)    (28)

  ω = dθ/dt    (29)
With this rotation variable, we leverage the same least-squares approach as before over the entirety of the image, but now we
also optimize over the variable for ω:
  u0*, v0*, ω* = arg min_{u0, v0, ω} { J(u0, v0, ω) ≜ ∫∫ (u0 Ex + v0 Ey + ω(xEy − yEx) + Et)² dx dy }    (30)

Like the other least-squares optimization problems we’ve encountered before, this problem can be solved by solving a system of
first-order conditions (FOCs):
• dJ(u0, v0, ω)/du0 = 0

• dJ(u0, v0, ω)/dv0 = 0

• dJ(u0, v0, ω)/dω = 0

Together these give a 3 × 3 linear system in (u0, v0, ω), as sketched below.
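A minimal sketch of solving these FOCs (assuming numpy and the same finite-difference derivatives as before; substituting u = u0 − ωy, v = v0 + ωx into the BCCE makes the coefficient of ω equal to xEy − yEx):

import numpy as np

def fixed_flow_with_rotation(frame0, frame1):
    # Least-squares estimate of (u0, v0, omega) for a translating and rotating brightness pattern.
    Ex = np.gradient(frame0, axis=1)
    Ey = np.gradient(frame0, axis=0)
    Et = frame1 - frame0

    ys, xs = np.mgrid[0:frame0.shape[0], 0:frame0.shape[1]].astype(float)
    xs -= xs.mean()                      # measure x, y from the image center (assumed center of rotation)
    ys -= ys.mean()
    Grot = xs * Ey - ys * Ex             # coefficient multiplying omega in the BCCE

    terms = [Ex, Ey, Grot]
    A = np.array([[np.sum(p * q) for q in terms] for p in terms])
    b = -np.array([np.sum(p * Et) for p in terms])
    return np.linalg.solve(A, b)         # (u0, v0, omega)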

1.7.2 Generalization: Time to Contact (for U = 0, V = 0, W ≠ 0)

Let’s now revisit TTC, but with a more general setup: the image is of a tilted plane, and we first write the equations allowing U ≠ 0 and V ≠ 0 before specializing to U = V = 0.

For this, we can write Z as a function of the world coordinates X and Y :


  Z = Z0 + (∂Z/∂X) X + (∂Z/∂Y) Y    (31)

    = Z0 + pX + qY    (32)
Recall the following derivations for the image coordinate velocities u and v, which help us relate image motion in 2D to world
motion in 3D:
• u/f = U/Z − (X/Z)(W/Z)

• v/f = V/Z − (Y/Z)(W/Z)

Some additional terms that are helpful when discussing these topics:
• Motion Field: Projection of 3D motion onto the 2D image plane.
• Optical Flow:
– What we can sense
– Describes motion in the image
We can transform this into image coordinates:
  u = (1/Z)(fU − xW),   v = (1/Z)(fV − yW)    (33)
Let’s take U = V = 0, so that u = −x(W/Z) and v = −y(W/Z). Z (the world-coordinate depth) is not constant, so we can rewrite it by
substituting the image coordinates into our expression for Z:

  Z = Z0 + pX + qY    (34)

    = Z0 + p(x/f)Z + q(y/f)Z    (35)
Now, we can isolate Z and solve for its closed form:
  Z (1 − p(x/f) − q(y/f)) = Z0    (36)

  Z = Z0 / (1 − p(x/f) − q(y/f))    (37)

From this we can conclude that 1/Z is linear in x and y (the image coordinates, not the world coordinates). This is helpful for
methods that operate on finding solutions to linear systems. If we now apply this to the BCCE given by uEx + vEy + Et = 0,
we can first express each of the velocities in terms of this derived expression for Z:

• u = (1/Z0)(1 − p(x/f) − q(y/f))(−xW)

• v = (1/Z0)(1 − p(x/f) − q(y/f))(−yW)

Applying these definitions to the BCCE:

0 = uEx + vEy + Et (38)


  = (1/Z0)(1 − p(x/f) − q(y/f))(−xW) Ex + (1/Z0)(1 − p(x/f) − q(y/f))(−yW) Ey + Et    (39)

Combining like terms, we can rewrite this constraint as:

  0 = −(W/Z0)(1 − p(x/f) − q(y/f))(xEx + yEy) + Et    (40)

Equivalently, multiplying everything by −1:

  0 = (W/Z0)(1 − p(x/f) − q(y/f))(xEx + yEy) − Et    (41)

We can also express this with the “radial gradient” given by G ≜ xEx + yEy:

  0 = (W/Z0)(1 − p(x/f) − q(y/f)) G − Et    (42)
For symbolic simplicity, take the following definitions:
• R ≜ W/Z0

• P ≜ −p W/(f Z0)

• Q ≜ −q W/(f Z0)

Using these definitions, the BCCE with this parameterization becomes:

  0 = (R + Px + Qy) G − Et    (43)

Now, we can again take our standard approach of solving these kinds of problems by applying least-squares to estimate the free
variables P, Q, R over the entire continuous or discrete image space. Like other cases, this fails when the determinant of the
system involving these equations is zero.
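A minimal sketch of that least-squares step (assuming numpy, derivative estimates Ex, Ey, Et, and image coordinates x, y measured from the principal point; an assumption of these notes, not code from the lecture): build the radial gradient G = xEx + yEy and solve the 3x3 normal equations for (R, P, Q). Then 1/R = Z0/W is the time to contact at the image origin.

import numpy as np

def tilted_plane_ttc(Ex, Ey, Et, x, y):
    # Least-squares estimate of (R, P, Q) in (R + P x + Q y) G - Et = 0, with G = x Ex + y Ey.
    G = x * Ex + y * Ey
    terms = [G, x * G, y * G]                         # coefficients of R, P and Q
    A = np.array([[np.sum(p * q) for q in terms] for p in terms])
    b = np.array([np.sum(p * Et) for p in terms])
    R, P, Q = np.linalg.solve(A, b)                   # fails when the determinant is zero, as noted above
    return R, P, Q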

MIT OpenCourseWare
https://ocw.mit.edu

6.801 / 6.866 Machine Vision


Fall 2020

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
