Lec 16
These lecture summaries are designed to be a review of the lecture. Though I do my best to include all main topics from the
lecture, the lectures will have more elaborate explanations than these notes.
[Plot: Cosine Function, cos(x); true function vs. interpolated function; f(x) vs. x]
Figure 1: Sampling only once per period provides us with a constant interpolated function, from which we cannot recover the
original. Therefore, we must sample at a higher frequency.
Note that this holds at points not on the peaks as well:
Figure 2: Sampling only once per period provides us with a constant interpolated function, from which we cannot recover the
original. Therefore, we must sample at a higher frequency.
[Plot: Cosine Function, cos(x); true function vs. interpolated function; f(x) vs. x]
Figure 3: Sampling at twice the rate of the highest-varying component almost gets us there! This is known as the Nyquist
Rate. It turns out we need to sample at frequencies that are strictly greater than this frequency to guarantee no aliasing - we
will see why in the example below.
Is this good enough? As it turns out, the inequality for Nyquist’s Sampling Theorem is there for a reason: we need to sample
at greater than twice the frequency of the original signal in order to uniquely recover it:
[Plot: Cosine Function, cos(x); true function vs. interpolated function; f(x) vs. x]
Figure 4: It turns out we need to sample at frequencies that are strictly greater than this frequency to guarantee no aliasing -
we will see why in the example below.
Therefore, any rate above 2 times the highest-varying frequency component of the signal will be sufficient to completely avoid
aliasing. As a review, let us next discuss aliasing.
1.1.2 Aliasing
Aliasing occurs when higher frequencies become indistinguishable from lower frequencies as a result of sampling at too low a
frequency; these aliased components add interference and artifacts to the signal.
Now let us consider what happens when we add multiples of 2π to this:
$$
\begin{aligned}
s_k &= \cos\left(2\pi \frac{f_0}{f_s} k - 2\pi k\right) \\
    &= \cos\left(2\pi \left(\frac{f_0}{f_s} - 1\right) k\right) \\
    &= \cos\left(2\pi \frac{f_0 - f_s}{f_s} k\right) \\
    &= \cos\left(2\pi \frac{f_s - f_0}{f_s} k\right), \quad \text{since } \cos(x) = \cos(-x)\ \forall\, x \in \mathbb{R}
\end{aligned}
$$
Another way to put this - you cannot distinguish multiples of base frequencies from the base frequencies themselves if you sample
at too low a frequency, i.e. below the Nyquist Rate.
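A quick numerical illustration of this aliasing relationship (the particular choices f0 = 3 and fs = 8 are illustrative, not from the lecture):

```python
import numpy as np

f_s = 8.0                      # sampling frequency (illustrative)
f_0 = 3.0                      # original frequency (illustrative)
k = np.arange(16)              # sample indices

alias = f_s - f_0              # 5 Hz produces the same samples as 3 Hz at this rate
print(np.allclose(np.cos(2 * np.pi * f_0 / f_s * k),
                  np.cos(2 * np.pi * alias / f_s * k)))   # True: the samples are identical
```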
It turns out this computationally-simpler solution is through integral images. An integral image is essentially the sum of
values from the first value to the i-th value, i.e. if $g_i$ defines the i-th value in 1D, then:
$$G_i \triangleq \sum_{k=1}^{i} g_k \quad \forall\, i \in \{1, \cdots, K\}$$
Why is this useful? Well, rather than compute averages (normalized sums) by adding up all the pixels and then dividing, we
simply need to perform a single subtraction between the integral image values (followed by a division by the number of elements
we are averaging). For instance, if we wanted to calculate the average of values between i and j, then:
$$\bar{g}_{[i,j]} = \frac{1}{j-i}\sum_{k=i}^{j} g_k = \frac{1}{j-i}\left(G_j - G_i\right)$$
This greatly reduces the amortized amount of computation, because these sums only need to be computed once, when we
calculate the initial values for the integral image.
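To make this concrete, here is a minimal NumPy sketch of the 1D integral image and the constant-time range average described above; the 0-indexing and the use of np.cumsum are implementation choices, not from the lecture:

```python
import numpy as np

def integral_image_1d(g):
    """G[i] = sum of g[0..i] (0-indexed analog of G_i above)."""
    return np.cumsum(g)

def range_average(G, i, j):
    """Average over a range using only two integral-image lookups,
    mirroring (G_j - G_i) / (j - i) from the notes (0-indexed)."""
    return (G[j] - G[i]) / (j - i)

g = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
G = integral_image_1d(g)
print(range_average(G, 1, 5))   # average of g[2..5] = (4+1+5+9)/4 = 4.75
```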
Let us now see how block averaging looks in 2D - in the diagram below, we can obtain a block average for a group of pixels
in the 2D range (i, j) in x and (k, l) in y using the following formula:
$$\bar{g}_{([i,j],[k,l])} = \frac{1}{(j-i)(l-k)}\sum_{x=i}^{j}\sum_{y=k}^{l} g_{x,y}$$
But can we implement this more efficiently? We can use integral images again:
$$G_{i,j} = \sum_{k=1}^{i}\sum_{l=1}^{j} g_{k,l}$$
Figure 5: Block averaging using integral images in 2D. As pointed out above, block averaging also extends beyond pixels! This
can be computed for other measures such as gradients (e.g. Histogram of Gradients).
Using the integral image values, the block average in the 2D range (i, j) in x and (k, l) in y becomes:
$$\bar{g}_{([i,j],[k,l])} = \frac{1}{(j-i)(l-k)}\left[(G_{j,l} + G_{i,k}) - (G_{i,l} + G_{j,k})\right]$$
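A hedged sketch of the 2D version, assuming 0-indexed arrays and a zero-padded first row and column so that the four corner lookups stay simple (the padding is an implementation convenience, not from the lecture):

```python
import numpy as np

def integral_image_2d(g):
    """G[i, j] = sum of g[:i, :j]; padded with a leading row/column of zeros."""
    G = np.zeros((g.shape[0] + 1, g.shape[1] + 1))
    G[1:, 1:] = np.cumsum(np.cumsum(g, axis=0), axis=1)
    return G

def block_average(G, i, j, k, l):
    """Average of g[i:j, k:l] from four integral-image lookups,
    mirroring (G_jl + G_ik) - (G_il + G_jk), normalized by the block size."""
    total = (G[j, l] + G[i, k]) - (G[i, l] + G[j, k])
    return total / ((j - i) * (l - k))

g = np.arange(16, dtype=float).reshape(4, 4)
G = integral_image_2d(g)
print(block_average(G, 1, 3, 1, 3))   # average of g[1:3, 1:3]
print(g[1:3, 1:3].mean())             # same value, computed naively
```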
• This analysis can be extended to higher dimensions as well! Though the integral image will take longer to compute, and
the equations for computing these block averages become less intuitive, this approach generalizes to arbitrary dimensions.
• As we saw in the one-dimensional case, here we can also observe that after computing the integral image (a one-time
operation that can be amortized), the computational cost for averaging each of these 2D blocks becomes independent
of the size of the block being averaged. This stands in stark contrast to the naive implementation, where the
computational cost scales quadratically with the size of the block being averaged (or linearly in each dimension, if we take
rectangular block averages).
• Why is this relevant? Recall that block averaging implements approximate lowpass filtering, which can be used as a
frequency suppression mechanism to avoid aliasing when filtering.
• In other domains outside of image processing, the integral image is known as the “Summed-area Table” [2].
Since we intend to use this for approximate lowpass filtering, let us now change topics toward Fourier analysis of this averaging
mechanism to see how efficacious it is.
Visually:
[Plot: Block averaging filter h(x) for δ = 2; h(x) vs. x]
Let’s see what this Fourier Transform looks like. Recall that the Fourier Transform (up to a constant scale factor, which varies
by domain) is given by:
$$F(j\omega) = \int_{-\infty}^{\infty} f(x)\, e^{-j\omega x}\, dx$$
Where jω corresponds to complex frequency. Substituting our expression into this transform:
$$
\begin{aligned}
H(j\omega) &= \int_{-\infty}^{\infty} h(x)\, e^{-j\omega x}\, dx \\
&= \int_{-\delta/2}^{\delta/2} \frac{1}{\delta}\, e^{-j\omega x}\, dx \\
&= \frac{1}{\delta} \cdot \frac{1}{-j\omega} \left[ e^{-j\omega x} \right]_{x=-\delta/2}^{x=\delta/2} \\
&= \frac{e^{-\frac{j\omega\delta}{2}} - e^{\frac{j\omega\delta}{2}}}{-j\omega\delta} \\
&= \frac{\sin\left(\frac{\delta\omega}{2}\right)}{\frac{\delta\omega}{2}} \quad \text{(Sinc function)}
\end{aligned}
$$
Where in the last equality statement we use the identity given by:
$$\sin(x) = \frac{e^{jx} - e^{-jx}}{2j}$$
[Plot: Sinc function $H(j\omega) = \frac{\sin(\delta\omega/2)}{\delta\omega/2}$; H(jω) vs. jω]
Figure 7: Example H(jω) for δ = 2. This is the Fourier Transform of our block averaging “filter”.
Although sinc functions in the frequency domain help to attenuate higher frequencies, they do not make the best lowpass filters.
This is the case because:
• Higher frequencies are not completely attenuated.
• The first zero is not reached quickly enough. The first zero is given by:
$$\frac{\omega_0 \delta}{2} = \pi \implies \omega_0 = \frac{2\pi}{\delta}$$
Intuitively, the best lowpass filters perfectly preserve all frequencies up to the cutoff frequencies, and perfectly attenuate
everything outside of the passband. Visually:
[Plot: Sinc filter vs. ideal lowpass filter; H(jω) vs. jω]
Figure 8: Frequency response comparison between our block averaging filter and an ideal lowpass filter. We also note that the
“boxcar” function and the sinc function are Fourier Transform pairs!
Where else might we see this? It turns out cameras perform block average filtering because pixels have finite width over which
to detect incident photons. But is this a sufficient approximate lowpass filtering technique? Unfortunately, oftentimes it is not.
We will see below that we can improve with repeated block averaging.
$$f(x) \;\xrightarrow{\ b(x)\ }\; y_1(x), \quad \text{i.e. } y_1(x) = f(x) \otimes b(x)$$
What happens if we add another filter? Then, we simply add another element to our convolution:
y2 (x) = (f (x) ⊗ b(x)) ⊗ b(x) = y1 (x) ⊗ b(x)
Adding this second filter is equivalent to convolving our signal with the convolution of two “boxcar” filters, which is a triangular
filter:
[Plot: Triangular filter for δ = 2; h(x) vs. x]
Figure 9: Example of a triangular filter resulting from the convolution of two “boxcar” filters.
Additionally, note that since convolution is associative, for the "two-stage" approximate lowpass filtering approach above, we
do not need to convolve our input f(x) with two "boxcar" filters - rather, we can convolve it directly with our triangular filter
$b_2(x) = b(x) \otimes b(x)$:
y2 (x) = (f (x) ⊗ b(x)) ⊗ b(x)
= f (x) ⊗ (b(x) ⊗ b(x))
= f (x) ⊗ b2 (x)
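A small numerical check of this associativity, assuming a discrete width-3 "boxcar" with unit area (the random test signal and the filter width are illustrative only):

```python
import numpy as np

b = np.ones(3) / 3.0                      # discrete "boxcar" (block average) filter
b2 = np.convolve(b, b)                    # triangular filter b2 = b * b
print(b2)                                 # approx. [0.111, 0.222, 0.333, 0.222, 0.111]

f = np.random.randn(50)                   # arbitrary input signal
y_two_stage = np.convolve(np.convolve(f, b), b)   # (f * b) * b
y_direct    = np.convolve(f, b2)                  # f * (b * b)
print(np.allclose(y_two_stage, y_direct))         # True, by associativity
```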
Let us now take a brief aside to list out how discontinuities affect Fourier Transforms in the frequency domain:
• Delta Function: $\delta(x) \overset{\mathcal{F}}{\longleftrightarrow} 1$
Intuition: Convolving a function with a delta function does not affect the transform, since this convolution simply
produces the function.
• Unit Step Function: $u(x) \overset{\mathcal{F}}{\longleftrightarrow} \frac{1}{j\omega}$
Intuition: Convolving a function with a step function produces a degree of averaging, reducing the high frequency
components and therefore weighting them less heavily in the transform domain.
• Ramp Function: $r(x) \overset{\mathcal{F}}{\longleftrightarrow} -\frac{1}{\omega^2}$
Intuition: Convolving a function with a ramp function produces a degree of averaging, reducing the high frequency
components and therefore weighting them less heavily in the transform domain.
• Derivative: $\frac{d}{dx} f(x) \overset{\mathcal{F}}{\longleftrightarrow} j\omega F(j\omega)$
Intuition: Since taking derivatives will increase the sharpness of our functions, and perhaps even create discontinuities, a
derivative in the spatial domain corresponds to multiplying by $j\omega$ in the frequency domain.
As we can see from above, the more “averaging” effects we have, the more the high-frequency components of the signal will be
filtered out. Conversely, when we take derivatives and create discontinuities in our spatial domain signal, this increases high
frequency components of the signal because it introduces more variation.
To understand how we can use repeated block averaging in the Fourier domain, please recall the following special properties of
Fourier Transforms:
1. Convolution in the spatial domain corresponds to multiplication in the frequency domain, i.e. for all
f (x), g(x), h(x) with corresponding Fourier Transforms F (jω), G(jω), H(jω), we have:
$$h(x) = f(x) \otimes g(x) \;\overset{\mathcal{F}}{\longleftrightarrow}\; H(j\omega) = F(j\omega)\, G(j\omega)$$
2. Multiplication in the spatial domain corresponds to convolution in the frequency domain, i.e. for all
f (x), g(x), h(x) with corresponding Fourier Transforms F (jω), G(jω), H(jω), we have:
$$h(x) = f(x)\, g(x) \;\overset{\mathcal{F}}{\longleftrightarrow}\; H(j\omega) = F(j\omega) \otimes G(j\omega)$$
For block averaging, we can use the first of these properties to understand what is happening in the frequency domain:
$$y_2(x) = f(x) \otimes (b(x) \otimes b(x)) \;\overset{\mathcal{F}}{\longleftrightarrow}\; Y(j\omega) = F(j\omega)\, B(j\omega)^2$$
[Plot: $H^2(j\omega)$ for δ = 2; $H^2(j\omega)$ vs. jω]
Figure 10: Example H 2 (jω) for δ = 2. This is the Fourier Transform of our block averaging “filter” convolved with itself in the
spatial domain.
This is not perfect, but it is an improvement. In fact, the frequencies with this filter drop off with magnitude $\left(\frac{1}{\omega}\right)^2$. What
happens if we continue to repeat this process with more block averaging filters? It turns out that for N "boxcar" filters that we
use, the magnitude will drop off as $\left(\frac{1}{\omega}\right)^N$. Note too, that we do not want to go "too far" in this direction, because this repeated
block averaging process will also begin to attenuate frequencies in the passband of the signal.
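A rough numerical check of this falloff behavior, assuming a discrete width-5 "boxcar" and probing a single illustrative frequency ω = π/2:

```python
import numpy as np

b = np.ones(5) / 5.0                       # discrete "boxcar" filter
omega = np.pi / 2                           # a "high" frequency to probe (illustrative)

for N in (1, 2, 3):
    h = b
    for _ in range(N - 1):
        h = np.convolve(h, b)              # cascade N boxcars in the spatial domain
    # Magnitude of the frequency response at omega
    H = np.abs(np.sum(h * np.exp(-1j * omega * np.arange(len(h)))))
    print(N, H)                            # shrinks geometrically: |H_1|, |H_1|^2, |H_1|^3
```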
1.4.1 Warping Effects and Numerical Fourier Transforms: FFT and DFT
Two main types of numerical transforms we briefly discuss are the Discrete Fourier Transform (DFT) and the Fast Fourier Transform
(FFT). The FFT is an algorithm for computing the DFT that relies on a "divide and conquer" approach to reduce the computational
runtime from $f(N) \in O(N^2)$ to $f(N) \in O(N \log N)$ [3].
Mathematically, the DFT is a transform that maps a sequence of N complex numbers $\{x_n\}_{n=1}^{N}$ into another
sequence of N complex numbers $\{X_k\}_{k=1}^{N}$ [4]. The transform for the k-th value of this output sequence is given in closed form
as:
$$X_k = \sum_{n=1}^{N} x_n\, e^{-j\frac{2\pi}{N}kn}$$
And the inverse transform for the nth value of this input sequence is given as:
$$x_n = \sum_{k=1}^{N} X_k\, e^{j\frac{2\pi}{N}kn}$$
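As a sanity-check sketch of the forward transform (written in NumPy's 0-indexed convention; note that NumPy, like most libraries, places the 1/N normalization on the inverse transform):

```python
import numpy as np

def dft(x):
    """Direct O(N^2) DFT: X_k = sum_n x_n * exp(-j*2*pi*k*n/N), indices 0..N-1."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

x = np.random.randn(8)
print(np.allclose(dft(x), np.fft.fft(x)))   # True: matches the FFT's output
```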
One aspect of these transforms to be especially mindful of is that they introduce a wrapping effect, since transform values are
spread out over 2π intervals. This means that the waveforms produced by these transforms, in both the spatial (if we take the
inverse transform) and frequency domains may be repeated - this repeating can introduce undesirable discontinuities, such as
those seen in the graph below:
[Plot: Repeated function x²; f(x) vs. x]
Figure 11: Example of a repeated waveform that we encounter when looking at DFTs and FFTs.
Fun fact: It used to be thought that natural images had a power spectrum (power in the frequency domain) that falls off as $\frac{1}{\omega}$.
It turns out that this was actually caused by warping effects introduced by discrete transforms.
This begs the question - how can we mitigate these warping effects? Some methods include:
• Apodizing: This corresponds to multiplying your signal by a waveform, e.g. a Hamming window, which takes a form
akin to a Gaussian or an inverted cosine.
• Mirroring: Another method to mitigate these warping effects is through waveform mirroring - this ensures continuity at
points where discontinuities occurred:
[Plot: Mirrored repeated function x²; f(x) vs. x]
Figure 12: Example of a mirrored waveform that we can use to counter and mitigate the discontinuity effects of warping from
transforms such as the DFT and FFT.
With this approach, the power spectrum of these signals falls off as $\frac{1}{\omega^2}$, rather than $\frac{1}{\omega}$ (see the sketch after this list).
• Infinitely Wide Signal: Finally, a less practical, but conceptually helpful method is simply to take an "infinitely wide
signal".
Let us now switch gears to talk more about the unit impulse and convolution.
An impulse can be conceptualized as the limit in which the variance of this Gaussian distribution σ 2 goes to 0, which corresponds
to a Fourier Transform of 1 for all frequencies (which is the Fourier Transform of a delta function).
Another way to consider impulses is that they are the limit of “boxcar” functions as their width goes to zero.
Let us next generalize from a single impulse function to combinations of these functions.
Correlating (*note that this is not convolution - if we were to use convolution, this kernel would be flipped) this combination-of-impulses
"filter" with an arbitrary function f(x), we compute a first-order approximation of the derivative:
$$
\begin{aligned}
f'(x) &\approx \int_{-\infty}^{\infty} f(x)\, h(x)\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\epsilon}\left[\delta\left(x + \frac{\epsilon}{2}\right) - \delta\left(x - \frac{\epsilon}{2}\right)\right] f(x)\, dx
\end{aligned}
$$
Therefore, combinations of impulses can be used to represent the same behavior as the “computational molecules” we identified
before. It turns out that there is a close connection between linear, shift-invariant operators and derivative operators.
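A discrete sketch of this impulse-pair idea, assuming the positive impulse sits at the positive offset so that correlation yields a centered difference; the sin(x) test signal and np.correlate are illustrative choices:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 201)
f = np.sin(x)
eps = x[1] - x[0]

# Impulse pair as a discrete kernel: -1/(2*eps) at the left offset, +1/(2*eps) at the right offset.
# Correlation (no kernel flip) then gives the centered difference (f[i+1] - f[i-1]) / (2*eps).
h = np.array([-1.0, 0.0, 1.0]) / (2.0 * eps)
df = np.correlate(f, h, mode="same")

# Interior points closely match the true derivative f'(x) = cos(x)
print(np.max(np.abs(df[1:-1] - np.cos(x[1:-1]))))   # on the order of 1e-4 for this sampling
```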
One way to achieve this analog filtering is through Birefringent Lenses. Here, we essentially take two “shifted” images
by convolving the image with a symmetric combination of offset delta functions, given mathematically by:
$$h(x) = \frac{1}{2}\left[\delta\left(x + \frac{\epsilon}{2}\right) + \delta\left(x - \frac{\epsilon}{2}\right)\right] \quad \text{for some } \epsilon > 0$$
Let us look at the Fourier Transform of this filter, noting the following Fourier Transform pair:
$$\delta(x - x_0) \;\overset{\mathcal{F}}{\longleftrightarrow}\; e^{-j\omega x_0}$$
With this we can then express the Fourier Transform of this filter as:
$$
\begin{aligned}
F(j\omega) &= \int_{-\infty}^{\infty} h(x)\, e^{-j\omega x}\, dx \\
&= \frac{1}{2}\left(e^{-\frac{j\omega\epsilon}{2}} + e^{\frac{j\omega\epsilon}{2}}\right) \\
&= \cos\left(\frac{\omega\epsilon}{2}\right)
\end{aligned}
$$
With this framework, the first zero to appear here occurs at $\omega_0 = \frac{\pi}{\epsilon}$. A few notes about these filters, and how they relate to
high-frequency noise suppression:
• When these birefringent lenses are cascaded with a block averaging filter, this results in a combined filtering scheme in
which the zeros of the frequency responses of these filters cancel out most of the high-frequency noise.
• In the 2D case, we will have 2 birefringent filters, one for the x-direction and one for the y-direction. Physically, these are
rotated 90 degrees off from one another, just as they are for a 2D cartesian coordinate system.
• High-performance lowpass filtering requires a large support (see definition of this below if needed) - the computational
costs grow linearly with the size of the support in 1D, and quadratically with the size of the support in 2D. The support
of a function is defined as the set where f (·) is nonzero [5]:
$$\text{supp}(f) = \{x : f(x) \neq 0,\ x \in \mathbb{R}\}$$
• Therefore, one way to reduce the computational costs of a filtering system is to reduce the size/cardinality of the support
|supp(f)| - in some sense to encourage sparsity. Fortunately, this does not necessarily mean looking over a narrower range,
but instead just considering fewer points overall.
Therefore, we can represent integral and derivative operators as Fourier Transform pairs too, denoted S for integration and D
for derivative:
• $S \;\overset{\mathcal{F}}{\longleftrightarrow}\; \frac{1}{j\omega}$
• $D \;\overset{\mathcal{F}}{\longleftrightarrow}\; j\omega$
Note that we can verify this by showing that convolving these filter operators corresponds to multiplying these transforms in
frequency space, which results in no effect when cascaded together:
$$(f(x) \otimes D) \otimes S = f(x) \otimes (D \otimes S) \;\overset{\mathcal{F}}{\longleftrightarrow}\; F(j\omega)\, j\omega\, \frac{1}{j\omega} = F(j\omega) \;\overset{\mathcal{F}}{\longleftrightarrow}\; f(x)$$
$$f(x) \;\xrightarrow{\ S\ }\; \int f(\xi)\, d\xi \;\xrightarrow{\ D\ }\; f(x)$$
$$f(x) \;\xrightarrow{\ D\ }\; \frac{d}{dx} f(x) \;\xrightarrow{\ S\ }\; f(x)$$
Can we extend this to higher-order derivatives? It turns out we can. One example is the convolution of two derivative operators,
which becomes:
$$h(x) = \delta\left(x + \frac{\epsilon}{2}\right) - 2\delta(x) + \delta\left(x - \frac{\epsilon}{2}\right) = D \otimes D \;\overset{\mathcal{F}}{\longleftrightarrow}\; H(j\omega) = D(j\omega)^2 = (j\omega)^2 = -\omega^2 \quad (\text{Recall that } j^2 = -1)$$
In general, this holds. Note that the number of integral operators S must be equal to the number of derivative operators D, e.g.
for K order:
$$\left(\bigotimes_{i=1}^{K} S\right) \otimes \left(\bigotimes_{i=1}^{K} D\right) \otimes f(x)$$
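As a quick numerical check of the D ⊗ D identity above, using unit-spaced first-difference "molecules" (the quadratic test signal is illustrative):

```python
import numpy as np

D = np.array([1.0, -1.0])           # first-difference "molecule" (unit spacing)
DD = np.convolve(D, D)              # cascading two derivative operators
print(DD)                           # [ 1. -2.  1.]  -> the second-derivative molecule

f = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])   # f[n] = n^2, so f'' = 2 everywhere
print(np.convolve(f, DD, mode="valid"))           # [2. 2. 2. 2.]
```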
• Recall that one key element of computational efficiency we pursue is to use integral images for block averaging, which is
much more efficient than computing naive sums, especially if (1) This block averaging procedure is repeated many times
(the amortized cost of computing the integral image is lessened) and (2) This process is used in higher dimensions.
• Linear interpolation can be conceptualized as connecting points together using straight lines between points. This
corresponds to piecewise-linear segments, or convolution with a triangle filter, which is simply the convolution of two
"boxcar" filters (see the sketch after this list).
Unfortunately, one "not-so-great" property of convolving with triangular filters for interpolation is that the noise in the
interpolated result varies depending on how far away we are from the sample points.
• Nearest Neighbor techniques can also be viewed through a convolutional lens - since this method produces piecewise-
constant interpolation, this is equivalent to convolving our sampled points with a “boxcar” filter!
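A hedged sketch of these last two points: zero-stuffing the samples onto a dense grid and convolving with a "boxcar" gives piecewise-constant output, while convolving with a triangle (the convolution of two "boxcars") gives linear interpolation. The upsampling factor and kernels are illustrative choices:

```python
import numpy as np

samples = np.array([0.0, 1.0, 0.0, 2.0])   # sampled values
M = 4                                       # upsampling factor

# Zero-stuff the samples onto a dense grid
dense = np.zeros(len(samples) * M)
dense[::M] = samples

# Piecewise-constant (nearest-neighbor-style): convolve with a "boxcar" of width M
boxcar = np.ones(M)
piecewise_constant = np.convolve(dense, boxcar, mode="full")[: len(dense)]

# Piecewise-linear: convolve with a triangle (boxcar * boxcar, normalized; peak 1, width 2M - 1)
triangle = np.convolve(boxcar, boxcar) / M
piecewise_linear = np.convolve(dense, triangle, mode="full")[M - 1 : M - 1 + len(dense)]

print(piecewise_constant)   # each sample held constant for M points
print(piecewise_linear)     # straight-line ramps between the sample values
```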
The inverse transform of this can be thought of as a sinc function in polar coordinates:
$$f(\rho, \theta) = \frac{B^2}{2\pi} \cdot \frac{J_1(\rho B)}{\rho B}$$
A few notes about this inverse transform function:
• This is the point spread function of a microscope.
• In the case of defocusing, we can use the “symmetry” property of the Fourier Transform to deduce that if we have a circular
point spread function resulting from defocusing of the lens, then we will have a Bessel function in the frequency/Fourier
domain.
• Though a point spread function is a "pillbox" in the ideal case, in practice this is not perfect due to artifacts such as lens
aberrations.
1.6 References
1. Gibbs Phenomenon, https://en.wikipedia.org/wiki/Gibbs_phenomenon
2. Summed-area Table, https://en.wikipedia.org/wiki/Summed-area_table
3. Fast Fourier Transform, https://en.wikipedia.org/wiki/Fast_Fourier_transform
4. Discrete Fourier Transform, https://en.wikipedia.org/wiki/Discrete_Fourier_transform
5. Support, https://en.wikipedia.org/wiki/Support_(mathematics)
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms