Lec 12
These lecture summaries are designed to be a review of the lecture. Though I do my best to include all of the main topics from the lecture, the lectures themselves contain more elaborated explanations than these notes.
1 Lecture 12: Blob Analysis, Binary Image Processing, Use of Green’s Theorem, Derivative and Integral as Convolutions
In this lecture, we will continue our discussion of intellectual property and how it is relevant for all scientists and engineers. We will then elaborate on some of the specific machine vision techniques used in this patent, and introduce some possible extensions that could be applicable to this patent as well.
• Patents:
– Can be thought of as a “contract with society” - you get a limited monopoly on your idea, and in turn, you publish the technical details of your approach.
– Can help to reduce litigation and legal fees.
– Can be used by large companies as “ammunition” for “patent wars”.
Some “rules” of patents:
• Trademarks:
– Cannot use common words - this is actually one reason why many companies have slightly misspelled combinations
of common words.
– Can use pictures, character distortions, and color as part of the trademark.
– No issues if in different fields.
• Trade Secret
• Determining edge position
[Figure: a “soft” unit step function u(x) plotted against x; brightness rises smoothly from 0 to 1 across the edge at x = 0.]
The gradient of this brightness across the edge, given by ∇u(x) (or du/dx in one dimension), is then given by the following. Notice that the location of the maximum matches the inflection point in the graph above:
[Figure: gradient of the “soft” unit step function, ∇u(x), plotted against x; the maximum occurs at the edge location x = 0.]
As we mentioned above, we can find the location of this edge by looking at where the second derivative of brightness crosses zero, a.k.a. where ∇(∇u(x)) = ∇²u(x) = 0. Notice that the location of this zero is the same as the inflection point of u(x) and the maximum of ∇u(x):
[Figure: second derivative ∇²u(x) plotted against x; the zero crossing occurs at the edge location x = 0.]
For those curious, here is the math behind this specific function, assuming a sigmoid for u(x):
1. u(x) = 1/(1 + exp(−x))
2. ∇u(x) = du/dx = exp(−x)/(1 + exp(−x))²
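As a quick numerical sanity check (a sketch of mine in NumPy, not from the lecture), we can sample this sigmoid model and confirm that the first difference peaks, and the second difference crosses zero, at the edge location x = 0:

```python
import numpy as np

# Sample the sigmoid edge model u(x) = 1 / (1 + exp(-x)).
x = np.linspace(-6.0, 6.0, 1201)     # includes x = 0 exactly
u = 1.0 / (1.0 + np.exp(-x))

# Finite-difference stand-ins for du/dx and d^2u/dx^2.
du = np.gradient(u, x)
d2u = np.gradient(du, x)

# The gradient peaks at the edge location ...
i = np.argmax(du)
print("argmax of du/dx at x =", x[i])        # 0.0
# ... and the second derivative changes sign (crosses zero) there.
assert d2u[i - 1] > 0.0 > d2u[i + 1]
```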
1. Ex = (1/ε) [−1 1]
2. Ex = (1/(2ε)) [−1 0 1]
3. Ex = (1/(2ε)) [−1 1 ; −1 1]
For molecule 2, the best point for estimating derivatives lies directly at the center pixel, while for molecules 1 and 3, the best point for estimating derivatives lies halfway between the two pixels.
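To make these offsets concrete, here is a small sketch (my own, with ε = 1) applying molecules 1 and 2 to samples of f(x) = x², whose true derivative is 2x:

```python
import numpy as np

f = np.array([0.0, 1.0, 4.0, 9.0, 16.0])  # f(x) = x^2 at x = 0..4, eps = 1

# Molecule 1: [-1 1] estimates f' halfway between the two pixels.
forward = f[1:] - f[:-1]
print(forward)   # [1. 3. 5. 7.] = 2x at x = 0.5, 1.5, 2.5, 3.5

# Molecule 2: (1/2)[-1 0 1] estimates f' at the center pixel.
central = (f[2:] - f[:-2]) / 2.0
print(central)   # [2. 4. 6.] = 2x at x = 1, 2, 3
```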
How can we analyze the quality of these estimators? We have a few tools:
1. Taylor Series: From previous lectures, we saw that we could use averaging to reduce the error terms from ones involving 2nd-order derivatives to ones involving 3rd-order derivatives. This is useful for analytically determining the error.
2. Test functions: We will touch more on these later, but these are helpful for testing your derivative estimates using
analytical expressions, such as polynomial functions.
3. Fourier domain: This type of analysis is helpful for understanding how these “stencils”/molecules affect higher (spatial)
frequency image content.
Note that derivative estimators can become quite complicated for high-precision estimates of the derivative, even for low-order derivatives. We can use large estimators spanning many pixels, but we should be mindful of the following tradeoffs:
• Features can affect each other - e.g., with a large edge-detection estimator, two nearby edges can interfere with each other’s estimates.
We can also look at derivative estimators for higher-order derivatives. For 2nd-order derivatives, we just apply another derivative operator, which is equivalent to convolution with another derivative estimator “molecule”:

∂²(·)/∂x² = ∂/∂x ( ∂(·)/∂x ) ⇐⇒ (1/ε) [−1 1] ⊗ (1/ε) [−1 1] = (1/ε²) [1 −2 1]

For deriving the sign here and understanding why we have symmetry, remember that convolution “flips” one of the two filters/operators!
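As a one-line check (with ε = 1), NumPy’s convolve reproduces this, since it flips one operator internally:

```python
import numpy as np

d = np.array([-1.0, 1.0])     # first-difference molecule, eps = 1
print(np.convolve(d, d))      # [ 1. -2.  1.] -- the 2nd-derivative molecule
```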
Sanity Check: Let us apply this to some functions we already know the 2nd derivative of.
• f(x) = x²:
f(x) = x²
f′(x) = 2x
f″(x) = 2
• f(x) = x:
f(x) = x
f′(x) = 1
f″(x) = 0
• f(x) = 1:
f(x) = 1
f′(x) = 0
f″(x) = 0
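Here is a small numerical sketch (mine, with ε = 1) running these sanity checks through the [1 −2 1] molecule:

```python
import numpy as np

x = np.arange(6, dtype=float)
stencil = np.array([1.0, -2.0, 1.0])    # second-derivative molecule, eps = 1

for f, name in [(x**2, "x^2"), (x, "x"), (np.ones_like(x), "1")]:
    # 'valid' keeps only positions where the stencil fully overlaps f.
    print(name, "->", np.convolve(f, stencil, mode="valid"))
# x^2 -> [2. 2. 2. 2.]   x -> [0. 0. 0. 0.]   1 -> [0. 0. 0. 0.]
```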
In Practice: As demonstrated in the example “test functions” above, a good general way to test an Nth-order derivative estimator is to use polynomial test functions with arbitrary coefficients, from order 0 up to order N. For instance, to test a 4th-order derivative estimator, try:
1. f(x) = a
2. f(x) = ax + b
3. f(x) = ax² + bx + c
4. f(x) = ax³ + bx² + cx + d
5. f(x) = ax⁴ + bx³ + cx² + dx + e
The estimator should return zero for the first four, and 24a (= 4! · a) for the quartic, as sketched below.
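Below is a sketch of that procedure (my own; the five-point stencil [1 −4 6 −4 1] is the standard fourth-derivative molecule for ε = 1, not one taken from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(9, dtype=float)
stencil = np.array([1.0, -4.0, 6.0, -4.0, 1.0])   # d^4/dx^4, eps = 1

for degree in range(5):
    coeffs = rng.uniform(-1.0, 1.0, degree + 1)   # arbitrary coefficients
    f = np.polyval(coeffs, x)                     # coeffs[0] = leading term
    est = np.convolve(f, stencil, mode="valid")
    expected = 24.0 * coeffs[0] if degree == 4 else 0.0
    print(degree, est[0], "expected", expected)   # matches to roundoff
```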
Note: For derivative estimator operators, the weights of the “stencils”/computational molecules should add up to zero. Now
that we have looked at some of these operators and modes of analysis in one dimension, let us now look at 2 dimensions.
• Shift-Invariant:

d/dx f(x + δ) = f′(x + δ), for any δ ∈ ℝ

Derivative of shifted function = derivative equivalently shifted by the same amount
• Linear:

d/dx (a f₁(x) + b f₂(x)) = a f₁′(x) + b f₂′(x), for any a, b ∈ ℝ

Derivative of scaled sum of two functions = scaled sum of derivatives of both functions
We will exploit this linear, shift-invariant property frequently in machine vision. Because of this joint property, we can treat
derivative operators as convolutions in 2D:
∂²(·)/∂x∂y = ∂/∂x ( ∂(·)/∂y ) ⇐⇒ (1/ε) [−1 1] ⊗ (1/ε) [+1 ; −1] = (1/ε²) [−1 +1 ; +1 −1]

(rows are separated by “;” and listed top-to-bottom with y increasing upward)
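As a sketch (my own, with ε = 1 and assuming SciPy is available), we can build this molecule separably and check it on f(x, y) = xy, whose mixed partial is 1 everywhere. Note the signs come out flipped relative to the drawing above because array rows index y downward:

```python
import numpy as np
from scipy.signal import convolve2d

# f(x, y) = x * y on a small grid; d^2 f / (dx dy) = 1 everywhere.
rows, cols = np.mgrid[0:6, 0:6]
f = (rows * cols).astype(float)

kx = np.array([[-1.0, 1.0]])    # d/dx molecule (along columns)
ky = np.array([[-1.0], [1.0]])  # d/dy molecule (rows index y downward)
kxy = convolve2d(kx, ky)        # 2x2 mixed-partial molecule
print(kxy)                      # [[ 1. -1.]  [-1.  1.]]

print(convolve2d(f, kxy, mode="valid"))   # all ones, as expected
```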
• If we project this derivative onto a “diagonal view”, we find that it is simply the second derivative along x′, where x′ is the x-axis rotated 45 degrees counterclockwise in the 2D plane: x′ = x cos 45° + y sin 45° = (√2/2) x + (√2/2) y. In other words, in this 45-degree-rotated coordinate system, Ex′x′ = Exy.
• Intuition for convolution: If convolution is a new concept for you, check out reference [2] here. Visually, convolution
is equivalent to “flipping and sliding” one operator across all possible (complete and partial) overlapping configurations of
the filters with one another.
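A two-line sketch of the distinction (mine): np.convolve flips one sequence before sliding, while np.correlate slides without flipping:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
k = np.array([-1.0, 1.0])

print(np.convolve(a, k))            # [-1. -1. -1.  3.]  (k flipped, then slid)
print(np.correlate(a, k, "full"))   # [ 1.  1.  1. -3.]  (no flip)
```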
1.2.4 Laplacian Estimators in 2D
The Laplacian ∇² = Δ = ∂²/∂x² + ∂²/∂y² is another important operator in machine vision, and, as we discussed last lecture, is the lowest-order rotationally-symmetric derivative operator. Therefore, our finite difference/computational molecule estimates should reflect this property if they are to be accurate. Two candidate estimators of this operator are:
1. “Direct Edge”: (1/ε²) [0 1 0 ; 1 −4 1 ; 0 1 0]
2. “Indirect Edge”: (1/(2ε²)) [1 0 1 ; 0 −4 0 ; 1 0 1]
Note that the second operator has a factor of 1/(2ε²) in front of it because the distance between diagonally-adjacent pixels is √2 rather than 1; therefore, we effectively have 1/ε′², where ε′ = √2 ε.
How do we know which of these approximations is better? We can go back to our analysis tools:
• Taylor Series
• Test functions
• Fourier analysis
Intuitively, we know that neither of these estimators will be optimal on its own, because neither is rotationally symmetric. Let us combine them intelligently to improve rotational symmetry, adding four parts of the first molecule to one part of the second (equivalently, weights of 2/3 and 1/3 on the two estimators) and normalizing:
(1/(6ε²)) ( 4 [0 1 0 ; 1 −4 1 ; 0 1 0] + [1 0 1 ; 0 −4 0 ; 1 0 1] ) = (1/(6ε²)) [1 4 1 ; 4 −20 4 ; 1 4 1]
Using a Taylor series, we can show that the estimator derived from this linear combination has an error term that is one derivative order higher than using either of the individual estimators alone, at the cost of more computation. Note that the sum of all the entries here is zero, as we expect for derivative estimators.
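As a numerical sketch (mine, with ε = 1), we can build the combined stencil, confirm its weights sum to zero, and apply it to f(x, y) = x² + y², whose true Laplacian is 4 everywhere:

```python
import numpy as np
from scipy.signal import convolve2d

direct   = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
indirect = np.array([[1., 0., 1.], [0., -4., 0.], [1., 0., 1.]])

# Four parts direct to one part indirect, normalized by 1/6 (eps = 1).
lap = (4.0 * direct + indirect) / 6.0
print(lap * 6.0)    # [[1 4 1] [4 -20 4] [1 4 1]]
print(lap.sum())    # 0.0, as required of a derivative estimator

y, x = np.mgrid[-3:4, -3:4].astype(float)
f = x**2 + y**2
print(convolve2d(f, lap, mode="valid"))   # all fours
```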
For a hexagonal grid, the analogous molecule is scaled by 1/(2ε²) and has entries of all 1s on the outer ring of six neighbors, and an entry of −6 in the center. An example application of a hexagonal grid: imaging black holes! It leads to a factor of π/4 greater efficiency.
It turns out the authors discourage thresholding; instead, in their work they remove all but the maximum estimated gradient along the gradient direction (note that this direction is quantized at the octant level). Note that the quantized gradient direction is perpendicular to the edge. In this case, for a candidate gradient point G0 and the adjacent values G− and G+ along the gradient direction, we must have:

G0 > G−, G0 ≥ G+

This forces the sub-pixel peak offset to satisfy −1/2 ≤ s ≤ 1/2. Note that we use asymmetric inequality signs to break ties arbitrarily. Next, we plot the quantized profile after parabolic interpolation - i.e., sub-pixel interpolation, sketched below.
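The parabolic sub-pixel step can be sketched as follows (my own code; the three-point parabola fit is the standard one, and the patent’s exact bookkeeping may differ). Fitting a parabola through (−1, G−), (0, G0), (+1, G+) puts the peak at offset s:

```python
def subpixel_peak(g_minus: float, g0: float, g_plus: float) -> float:
    """Offset s of the parabola peak through (-1, g_minus), (0, g0), (1, g_plus).

    Requires g0 > g_minus and g0 >= g_plus, which forces -1/2 <= s <= 1/2.
    """
    return 0.5 * (g_minus - g_plus) / (g_minus - 2.0 * g0 + g_plus)

# Gradient samples across an edge; the true peak lies right of the center pixel.
print(subpixel_peak(0.2, 1.0, 0.8))   # 0.3, i.e. 0.3 pixels to the right
```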
1.2.6 Plane Position
Note that we have not yet done any thresholding. How can we improve this, given that we quantized the edge gradient direction?
Could we try not quantizing the edge direction? If we have the true gradient direction, we can find the intersection of this line
with the edge (at 90 degrees to the edge gradient) to find a better solution.
To find this point above (please take a look at the handwritten lecture notes for this lecture), we project from the quantized
gradient direction to the actual gradient direction. This is the “plane position” component.
In addition to cubic interpolation, we can also consider piecewise-linear interpolation with “triangle” functions. For some different values of b:
• b = 0 → s′ = s
• b = 1 → s′ = 2 sign(s) s²
• b = 2 → s′ = 4 sign(s) s³
R = (d/2) (δ/f) (radius of the pillbox Point Spread Function (PSF))

This pillbox image is given mathematically by:

(1/(πR²)) (1 − u(r − R))

where u(·) is the unit step function, f is the focal length of the lens, d is the diameter of the lens (assumed to be conic), and δ is the distance along the optical axis between the actual image plane and the “in focus” plane.
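A quick sketch (mine) of a discretized pillbox, normalized so that it sums to one, as the 1/(πR²) factor intends:

```python
import numpy as np

def pillbox(size: int, radius: float) -> np.ndarray:
    """Discretized pillbox PSF: constant inside radius R, zero outside."""
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    r = np.hypot(x - c, y - c)
    psf = (r <= radius).astype(float)   # 1 - u(r - R)
    return psf / psf.sum()              # discrete stand-in for 1 / (pi R^2)

psf = pillbox(11, 4.0)
print(psf.sum())   # 1.0 -- defocus blurs but preserves total brightness
```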
1.3 Multiscale
Note: We will discuss this in greater detail next lecture.
Multiscale is quite important in edge detection, because we can have edges at different scales.

We can slide a circle across a binary image; the area of overlap between the circle and the bright side of the 1-0 edge controls how bright things appear. We can use this technique to see how accurately the algorithm estimates the edge position - this allows for error calculation, since we have ground-truth results that we can compute using the area of the circle. Our area of interest is given by the area enclosed by the chord whose endpoints intersect the binary edge:
A = R²θ − X √(R² − X²)

θ = arctan( √(R² − X²) / X )
Another way to analyze this is to compute the analytical derivatives of this brightness function:
1. ∂E/∂x = 2 √(R² − x²)
2. ∂²E/∂x² = −2x / √(R² − x²)
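A short numerical sketch (mine) checking the first expression: the central difference of the segment area A(x) matches 2√(R² − x²) up to sign (the sign depends on which side of the edge we call bright):

```python
import numpy as np

R = 1.0
x = np.linspace(-0.9, 0.9, 181)
theta = np.arctan2(np.sqrt(R**2 - x**2), x)   # the angle defined above
A = R**2 * theta - x * np.sqrt(R**2 - x**2)   # chord-segment area

dA = np.gradient(A, x)                        # numerical dA/dx
analytic = -2.0 * np.sqrt(R**2 - x**2)        # area shrinks as x grows
print(np.max(np.abs(dA - analytic)))          # small discretization error
```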
What can we do with this? We can use this as input into our algorithm to compute the error and compensate for the degree of defocusing of the lens. In practice, there are other factors besides defocus that lead to fuzzy edge profiles, but this defocus compensation helps.
Linear 1D Interpolation:

f̃(x) = ( f(a)(b − x) + f(b)(x − a) ) / (b − a)
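For completeness, a tiny sketch of this rule (np.interp implements the same thing for arrays):

```python
def lerp(x: float, a: float, b: float, fa: float, fb: float) -> float:
    """Linear interpolation of f between samples f(a) = fa and f(b) = fb."""
    return (fa * (b - x) + fb * (x - a)) / (b - a)

print(lerp(2.5, 2.0, 3.0, 4.0, 9.0))   # 6.5, halfway between f(2) and f(3)
```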
We can also leverage more sophisticated interpolation methods, such as cubic spline.
1.3.3 CORDIC
As we discussed in the previous lecture, CORDIC is an algorithm used to estimate vector direction by iteratively rotating a vector toward the correct angle. For this patent, we are interested in using CORDIC to perform a change of coordinates from Cartesian to polar:

(Ex, Ey) → (E0, Eθ)
Idea: Rotate the coordinate system iteratively using test angles. Note that we could simply compute these quantities with square roots and arc tangents, but those operations can be prohibitively computationally expensive:

E0 = √(Ex² + Ey²)

Eθ = arctan(Ey / Ex)
Rather than computing these directly, it is faster to iteratively solve for the desired rotation θ by taking a sequence of rotations {θi}. In matrix-vector form, the iterative update is:

Ex⁽ⁱ⁺¹⁾ = cos θi · Ex⁽ⁱ⁾ + sin θi · Ey⁽ⁱ⁾
Ey⁽ⁱ⁺¹⁾ = −sin θi · Ex⁽ⁱ⁾ + cos θi · Ey⁽ⁱ⁾
How do we select {θi}? We select progressively smaller angles. We accept a candidate angle (and invoke the iterative update above) whenever it reduces |Ey| and increases |Ex|.

The aggregate rotation θ is simply the sum of all the accepted angles: θ = Σi θi
One potential practical issue with this approach is that it involves a significant number of multiplications. How can we avoid this? We can pick the angles carefully - i.e., if our angles are given successively by π/2, π/4, π/8, . . . (more precisely, angles with tan θi = 2⁻ⁱ), then:

sin θi / cos θi = 2⁻ⁱ → rotation matrix becomes: cos θi [1 2⁻ⁱ ; −2⁻ⁱ 1]
Note that this reduces computation to 2 additions per iteration (multiplication by 2⁻ⁱ is just a bit shift). The angle we turn through becomes successively smaller:
cos θi = 1 / √(1 + 2⁻²ⁱ) → R = ∏i (1/cos θi) = ∏i √(1 + 2⁻²ⁱ) ≈ 1.16 (precomputed)
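Here is a compact sketch of the vectoring iteration (my own; it always rotates by ±θi toward the x-axis rather than skipping rejected angles, a common equivalent formulation, and it adds exact pre-rotations by π and π/2 so the remaining angle is within reach of the 2⁻ⁱ microrotations):

```python
import math

def cordic_polar(ex: float, ey: float, n_iters: int = 16):
    """Sketch of CORDIC vectoring: (Ex, Ey) -> (E0, Etheta).

    Pre-rotations by pi and pi/2 are exact coordinate swaps; afterwards at
    most 45 degrees remain, which the microrotations tan(theta_i) = 2**-i,
    i >= 1, can always cancel using only shifts and adds.
    """
    theta = 0.0
    if ex < 0.0:                                  # half-turn pre-rotation
        ex, ey, theta = -ex, -ey, math.pi
    if abs(ey) > ex:                              # quarter-turn pre-rotation
        if ey > 0.0:
            ex, ey, theta = ey, -ex, theta + math.pi / 2.0
        else:
            ex, ey, theta = -ey, ex, theta - math.pi / 2.0
    gain = 1.0
    for i in range(1, n_iters + 1):
        t = 2.0 ** -i                             # tan(theta_i): a bit shift
        gain *= math.sqrt(1.0 + t * t)            # deferred 1/cos(theta_i)
        if ey > 0.0:                              # rotate clockwise
            ex, ey = ex + t * ey, ey - t * ex
            theta += math.atan(t)
        else:                                     # rotate counterclockwise
            ex, ey = ex - t * ey, ey + t * ex
            theta -= math.atan(t)
    return ex / gain, theta                       # gain is the ~1.16 above

print(cordic_polar(3.0, 4.0))   # ~(5.0, 0.9273): magnitude 5, angle 53.13 deg
```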
1.4 References
1. Finite Differences, https://en.wikipedia.org/wiki/Finite_difference
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms