Instructions
Assignment Description
Description
Problem Set 2 is aimed at introducing basic building blocks of image processing. Key areas that
we wish to see you implement are: loading and manipulating images, producing some valued
output of images, and comprehension of the structural and semantic aspects of what makes an
image. For this and future assignments, we will give you a general description of the problem.
It is up to the student to think about and implement a solution to the problem using what you
have learned from the lectures and readings. You will also be expected to write a report on your
approach and lessons learned.
Learning Objectives
• Use Hough tools to search and find lines and circles in an image.
• Use the results from the Hough algorithms to identify basic shapes.
• Use template matching to identify shapes.
• Understand the Fourier Transform and its applications to images.
• Address the presence of distortion / noise in an image.
Problem Overview
Rules
You may use image processing functions to find color channels and load images. Don’t forget
that those have a variety of parameters and you may need to experiment with them. There are
certain functions that may not be allowed and are specified in the assignment’s autograder Ed
post.
Refer to this problem set’s autograder post for a list of banned function calls.
Please do not use absolute paths in your submission code. All paths should be relative to
the submission directory. Any submissions with absolute paths are in danger of receiving a
penalty!
Instructions
Obtaining the Starter Files:
Obtain the starter code from canvas under files.
Programming Instructions
Your main programming task is to complete the API described in the file ps2.py. The driver
program experiment.py helps to illustrate the intended use and will output the files needed for
the writeup. Additionally, there is a file ps2_test.py that you can use to test your implementation.
You can refer to the FAQ for a non-exhaustive list of banned functions.
Write-up Instructions
Create ps2_report.pdf - a PDF file that shows all your output for the problem set, including
images labeled appropriately (by filename, e.g. ps2-1-a-1.png) so it is clear which section they
are for and the small number of written responses necessary to answer some of the questions
(as indicated). For a guide as to how to showcase your results, please refer to the latex template
for PS2.
How to Submit
Two assignments have been created on Gradescope: one for the report - PS2_report, and the
other for the code - PS2_code where you need to submit ps2.py and experiment.py.
Your goal is to determine the state of each traffic light and its position in a scene. Position is
measured from the center of the traffic light. Given that this image presents symmetry, the
position of the traffic light matches the center of the yellow circle.
Complete ps2.py such that traffic_light_detection returns the traffic light center coordinates
(x, y), i.e. (col, row), and the color of the light that is activated ('red', 'yellow', or 'green').
Read the function description for more details.
Testing:
The traffic light scenes we test on will be randomly generated, similar to the following pictures
and the examples in the GitHub repo.
Functional assumptions:
For the sake of simplicity, we are using a basic color scheme, but assume that the scene may
have different color objects and backgrounds [relevant for parts 2 and 3]. The shape of the
traffic light will not change, nor will the size of the individual lights relative to the traffic light.
The radius of the lights can be reasonably expected to be between 10 and 30 pixels. There will
only be one traffic light per scene, but its size and location will be generated at random (that
is, a traffic light could appear in the sky or in the road; no assumptions should be made as to
its logical position). While the traffic light will not be occluded, the objects in the background
may be.
Code:
Complete traffic_light_detection(img_in, radii_range)
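The starter code is built around Hough voting, and the voting idea itself is easy to sketch in plain numpy. The toy accumulator below is illustrative only; it is not the starter implementation, and its function name and parameters are made up for this sketch:

```python
import numpy as np

def hough_circles_sketch(edges, radii):
    """Toy Hough-circle accumulator (illustrative only, not cv2.HoughCircles).

    edges: 2-D binary edge map; radii: iterable of candidate radii.
    Returns (row, col, radius) of the accumulator cell with the most votes.
    """
    h, w = edges.shape
    acc = np.zeros((len(radii), h, w))
    thetas = np.linspace(0, 2 * np.pi, 64, endpoint=False)
    ys, xs = np.nonzero(edges)
    for ri, r in enumerate(radii):
        # Each edge pixel votes for every candidate center a distance r away.
        cy = np.rint(ys[:, None] + r * np.sin(thetas)).astype(int).ravel()
        cx = np.rint(xs[:, None] + r * np.cos(thetas)).astype(int).ravel()
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc[ri], (cy[ok], cx[ok]), 1)
    ri, row, col = np.unravel_index(np.argmax(acc), acc.shape)
    return row, col, radii[ri]
```

In the real assignment you would run this over an edge map of the scene for each radius in radii_range, then classify the light color from the pixel values at the detected centers.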
Report:
Place the coordinates using cv2.putText before saving the output images.
Input: scene_tl_1.png. Output: ps2-1-a-1.jpg [5 points]
This section won’t be autograded. It is graded manually in the report.
Similar to the traffic light, you are tasked with detecting the sign in a scene and finding the
(x, y), i.e. (col, row), coordinates that represent the polygon's centroid.
Functional assumptions:
Like above, assume that the scene may have different color objects and backgrounds. The size
and location of the traffic sign will be generated at random. While the traffic signs will not be
occluded, objects in the background may be.
Code:
Complete the following functions. Read their documentation in ps2.py for more details.
• construction_sign_detection(img_in)
Report:
Place the coordinates using cv2.putText before saving the output images.
Input: scene_constr_1.png. Output: ps2-1-b-1.jpg [5 points]
This section won’t be autograded. It is graded manually in the report.
We will attempt to find Waldo by matching the Waldo template with the image using the
following techniques:
• Sum of squared differences: tm_ssd
• Normalized sum of squared differences: tm_nssd
• Correlation: tm_ccor
• Normalized correlation: tm_nccor
We use the sliding window technique for matching the template. As we slide our template pixel
by pixel, we calculate the similarity between the image window and the template and store this
value in the top-left pixel of the result. The location with maximum similarity is then deemed a
match for the template.
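The sliding-window computation above can be vectorized rather than looped. The sketch below (single-channel images, numpy ≥ 1.20 for sliding_window_view, function name made up) illustrates the idea behind the SSD metric; for the correlation metrics you would take the maximum instead of the minimum:

```python
import numpy as np

def ssd_match_sketch(image, template):
    """Sliding-window SSD matching, vectorized (illustrative sketch).

    result[r, c] = sum of squared differences between the template and the
    image window whose top-left corner is (r, c); the best SSD match is the
    location of the minimum.
    """
    th, tw = template.shape
    # All windows at once: shape (H-th+1, W-tw+1, th, tw), no Python loops.
    windows = np.lib.stride_tricks.sliding_window_view(image, (th, tw))
    diff = windows - template
    result = np.einsum('ijkl,ijkl->ij', diff, diff)
    return np.unravel_index(np.argmin(result), result.shape)
```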
Code:
Complete the template matching function. Each method is called for a different metric to
determine the degree to which the template matches the original image. You'll be testing on the
traffic signs used in Part 1. Suggestion: for loops in Python are notoriously slow; can you find a
vectorized solution to make it faster?
Note: cv2.matchTemplate() isn't allowed.
Report:
Pick the best of the 4 methods to display in the report.
Input: scene_tl_1.png. Output: ps2-2-a-1.jpg [5]
Input: scene_constr_1.png. Output: ps2-2-b-1.jpg [5]
Input: waldo1.png. Output: ps2-2-c-1.jpg [5]
Text:
2d. What are the disadvantages of using Hough based methods in finding Waldo? Can template
matching be generalised to all images? Explain why/why not. Which method consistently
performed the best, and why? [15]
3. Fourier Transform
In this section we will use the Fourier Transform to compress an image. The Fourier transform
is an integral signal processing tool used in a variety of domains and converts a signal into
individual spectral components (sine and cosine waves). Another way of thinking about this is
that it converts a signal from the time domain to the frequency domain. While signals like audio
are a 1-dimensional signal, we will apply the Fourier transform to images as a 2-dimensional
signal. For more information on the Fourier Transform, lectures 2C-L1 and 2C-L2 provide a
good overview.
One way to calculate the Fourier Transform is a dot product between a coefficient matrix and
the signal. Given a signal of length n, we define the coefficient matrix (n × n) as
$$
M_n(\omega) =
\begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \omega & \omega^2 & \omega^3 & \cdots & \omega^{n-1} \\
1 & \omega^2 & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\
1 & \omega^3 & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega^j & \omega^{2j} & \omega^{3j} & \cdots & \omega^{(n-1)j} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)}
\end{bmatrix}
$$

where $j$ indexes each row and $\omega = e^{-i 2\pi / n}$. The vector resulting from $M_n(\omega) \cdot f(x)$ is now your
Fourier-transformed signal! To compute the inverse Fourier transform, set $\omega = e^{i 2\pi / n}$ and
$f(x) = \frac{1}{n} M_n(\omega) \cdot F(x)$.
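Under this definition, dft and idft reduce to building the coefficient matrix and taking a matrix-vector product. A minimal numpy sketch (the code structure here is an assumption; only the math comes from the handout, and np.fft appears only in verification, not in the implementation):

```python
import numpy as np

def dft(x):
    """1-D DFT as the product M_n(omega) . x with omega = e^{-i 2 pi / n}."""
    n = len(x)
    k = np.arange(n)
    M = np.exp(-2j * np.pi * np.outer(k, k) / n)  # M[j, k] = omega^(j*k)
    return M @ x

def idft(X):
    """Inverse DFT: same matrix with omega = e^{+i 2 pi / n}, scaled by 1/n."""
    n = len(X)
    k = np.arange(n)
    M = np.exp(2j * np.pi * np.outer(k, k) / n)
    return (M @ X) / n
```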
Code:
Complete the following functions following the process above. Numpy matrix operations can
be used to simplify the calculation but np.fft functions are not allowed in this section.
• dft(x)
• idft(x)
Code:
You may use the functions from the last sections but np.fft functions are not allowed.
• dft2(x)
• idft2(x)
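One way to build the 2-D versions from the 1-D machinery is to transform every column, then every row; with the matrix formulation this is just a left and right multiplication by the (symmetric) coefficient matrix. A sketch under that assumption (the helper name is made up):

```python
import numpy as np

def dft_matrix(n, sign=-1):
    """The n x n coefficient matrix with omega = e^{sign * i 2 pi / n}."""
    k = np.arange(n)
    return np.exp(sign * 2j * np.pi * np.outer(k, k) / n)

def dft2(x):
    """2-D DFT: 1-D transform along columns, then along rows."""
    m, n = x.shape
    return dft_matrix(m) @ x @ dft_matrix(n)  # both matrices are symmetric

def idft2(X):
    """2-D inverse DFT, scaled by 1/(m*n)."""
    m, n = X.shape
    return dft_matrix(m, sign=1) @ X @ dft_matrix(n, sign=1) / (m * n)
```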
Input: an image of shape n × n (img_bgr) and threshold percentage (t);
for channel = b, g , r do
Select channel of img_bgr as image. Convert the image to frequency domain;
Sort all the values of frequency from greatest to least into a 1-dimensional array of
length n²;
Find the threshold value at the index calculated by floor(t · n²);
Mask the frequency image to keep all values greater than the threshold value;
Convert the masked frequency image back into pixel values with the inverse
Fourier transform;
Set the channel of the new image to the masked filtered version;
Set the channel of the frequency domain image to the masked version;
end
return the filtered image and the masked frequency domain image (each should have 3
channels)
Algorithm 1: Image Compression with Fourier Transform
To visualize the masked frequency image, be sure to shift all the low frequencies to the center
of the image with np.fft.fftshift. Additionally, take 20 * log(abs(x)). This is done to properly
visualize the frequency domain image.
Code:
• compress_img_fft(x)
Note: Students in previous semesters have noted issues with very small amounts of error for
compression. If you run into this, try using np.fft functions only for this question.
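In line with that note, here is a minimal per-channel sketch of Algorithm 1 using np.fft. The function name, the strict > comparison, and the handling of the threshold index are assumptions; check your own edge cases (e.g. t = 1 puts the index out of range):

```python
import numpy as np

def compress_channel_sketch(chan, t):
    """One channel of Algorithm 1: keep only the largest-magnitude frequencies.

    chan: n x n float array; t: threshold percentage in (0, 1).
    Returns (filtered pixel values, masked frequency image).
    """
    n = chan.shape[0]
    freq = np.fft.fft2(chan)
    # Sort magnitudes greatest to least; the threshold sits at index floor(t*n^2).
    mags = np.sort(np.abs(freq).ravel())[::-1]
    thresh = mags[int(np.floor(t * n * n))]
    # Keep all values with magnitude greater than the threshold value.
    masked = np.where(np.abs(freq) > thresh, freq, 0)
    filtered = np.real(np.fft.ifft2(masked))
    return filtered, masked
```

compress_img_fft would run this once per color channel and stack the results.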
Input: an image (img) and circle of radius r ;
for channel = r, g , b do
Convert the image to frequency domain;
Shift the spectral frequencies so that all low frequencies are in the center of the
image (use np.fft.fftshift);
Mask the frequency image with a circle of radius r, keeping all the low frequencies
and removing all the high frequencies;
Undo the phase shift with np.fft.ifftshift;
Convert the masked frequency image back into pixel values with the inverse
Fourier transform;
Set the channel of the new image to the low pass filtered version;
Set the channel of the frequency domain image to the masked version;
end
return the filtered image and the masked frequency domain image (each should have 3
channels)
Algorithm 2: Low Pass Filter with Fourier Transform
Code:
• low_pass_filter(img_bgr, r)
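A per-channel sketch of Algorithm 2 (np.fft used for brevity; the function name and the exact center convention for the circular mask are assumptions):

```python
import numpy as np

def low_pass_channel_sketch(chan, r):
    """One channel of Algorithm 2: circular low-pass mask in frequency space."""
    h, w = chan.shape
    # To the frequency domain, with low frequencies shifted to the center.
    freq = np.fft.fftshift(np.fft.fft2(chan))
    # Keep everything inside a circle of radius r around the center.
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= r ** 2
    masked = freq * mask
    # Undo the shift, then back to pixel values.
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(masked)))
    return filtered, masked
```

As a sanity check, a constant image should pass through unchanged (only the DC term matters), while a pixel-level checkerboard, which lives at the highest frequency, should be wiped out entirely.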