
GEORGIA TECH'S CS 6476 COMPUTER VISION

Problem Set 2: Template Matching and FFT

January 17, 2025

ASSIGNMENT DESCRIPTION
Description
Problem Set 2 is aimed at introducing basic building blocks of image processing. Key areas that
we wish to see you implement are: loading and manipulating images, producing valued
output from images, and comprehending the structural and semantic aspects of what makes an
image. For this and future assignments, we will give you a general description of the problem.
It is up to you to think about and implement a solution to the problem using what you
have learned from the lectures and readings. You will also be expected to write a report on your
approach and lessons learned.

Learning Objectives
• Use Hough tools to search and find lines and circles in an image.
• Use the results from the Hough algorithms to identify basic shapes.
• Use template matching to identify shapes.
• Understand the Fourier Transform and its applications to images.
• Address the presence of distortion / noise in an image.

Problem Overview
Rules

You may use image processing functions to find color channels and load images. Don't forget
that those functions have a variety of parameters, and you may need to experiment with them.
Certain functions may not be allowed; refer to this problem set's autograder Ed post for the
list of banned function calls.

Please do not use absolute paths in your submission code. All paths should be relative to
the submission directory. Any submissions with absolute paths are in danger of receiving a
penalty!

INSTRUCTIONS
Obtaining the Starter Files:
Obtain the starter code from canvas under files.

Programming Instructions
Your main programming task is to complete the API described in the file ps2.py. The driver
program experiment.py helps to illustrate the intended use and will output the files needed for
the write-up. Additionally, there is a file ps2_test.py that you can use to test your implementation.
You can refer to the FAQ for a non-exhaustive list of banned functions.

Write-up Instructions
Create ps2_report.pdf - a PDF file that shows all your output for the problem set, including
images labeled appropriately (by filename, e.g. ps2-1-a-1.png) so it is clear which section they
are for, along with the small number of written responses necessary to answer some of the
questions (as indicated). For a guide on how to showcase your results, please refer to the LaTeX
template for PS2.

How to Submit
Two assignments have been created on Gradescope: one for the report - PS2_report, and the
other for the code - PS2_code where you need to submit ps2.py and experiment.py.

1. HOUGH TRANSFORMS [10 POINTS]


1.a. Traffic Light
First off, you are given a generic traffic light to detect in a scene. For the sake of the problem,
assume that traffic lights appear as shown below, with red, yellow, and green lights that are
vertically stacked. You may also assume that there is no occlusion of the traffic light.

It is your goal to find a way to determine the state of each traffic light and its position in a scene.
Position is measured from the center of the traffic light. Given that this image presents symmetry,
the position of the traffic light matches the center of the yellow circle.

Complete ps2.py such that traffic_light_detection returns the traffic light center
coordinates (x, y), i.e. (col, row), and the color of the light that is activated ('red', 'yellow', or
'green'). Read the function description for more details.

Testing:
The traffic light scenes that we test on will be randomly generated, like the following pictures
and the examples in the GitHub repo.

Functional assumptions:
For the sake of simplicity, we are using a basic color scheme, but assume that the scene may
have different color objects and backgrounds [relevant for parts 2 and 3]. The shape of the
traffic light will not change, nor will the size of the individual lights relative to the traffic light.
The radius of the lights can reasonably be expected to fall between 10 and 30 pixels. There will
only be one traffic light per scene, but its size and location will be generated at random (that
is, a traffic light could appear in the sky or in the road; no assumptions should be made as to
its logical position). While the traffic light will not be occluded, the objects in the background
may be.
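As one concrete illustration of Hough voting, the circle-detection idea from lecture can be sketched in plain NumPy over a binary edge mask. The function name, parameters, and return format below are illustrative only, not the ps2.py API:

```python
import numpy as np

def hough_circle_vote(edges, radii):
    """Brute-force Hough circle voting on a binary edge mask.

    Returns (x, y, r) for the best-supported circle among the candidate radii.
    Illustrative sketch only; not the assignment's required interface.
    """
    h, w = edges.shape
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0.0, 2.0 * np.pi, 120, endpoint=False)
    best, best_votes = None, -1
    for r in radii:
        acc = np.zeros((h, w), dtype=int)
        for y, x in zip(ys, xs):
            # Each edge pixel votes for every center a distance r away from it.
            cy = np.round(y - r * np.sin(thetas)).astype(int)
            cx = np.round(x - r * np.cos(thetas)).astype(int)
            ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
            np.add.at(acc, (cy[ok], cx[ok]), 1)
        peak = np.unravel_index(np.argmax(acc), acc.shape)
        if acc[peak] > best_votes:
            best_votes, best = acc[peak], (peak[1], peak[0], r)
    return best
```

At the correct radius, every edge pixel's vote circle passes through the true center, so the accumulator peaks sharply there; wrong radii smear votes over an annulus.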

Code:
Complete traffic_light_detection(img_in, radii_range)

Report:
Place the coordinates using cv2.putText before saving the output images.
Input: scene_tl_1.png. Output: ps2-1-a-1.jpg [5 points]
This section won’t be autograded. It is graded manually in the report.

1.b. Construction sign, one per scene [5 points]


Now that you have detected a basic traffic light, see if you can detect road signs. Below is the
construction sign that you would see in the United States (apologies to those outside the United
States).
Implement a way to recognize the signs:

Similar to the traffic light, you are tasked with detecting the sign in a scene and finding the
(x, y), i.e. (col, row), coordinates that represent the polygon's centroid.
Functional assumptions:
Like above, assume that the scene may have different color objects and backgrounds. The size
and location of the traffic sign will be generated at random. While the traffic signs will not be
occluded, objects in the background may be.

Code:
Complete the following functions. Read their documentation in ps2.py for more details.
• construction_sign_detection(img_in)

Report:
Place the coordinates using cv2.putText before saving the output images.
Input: scene_constr_1.png. Output: ps2-1-b-1.jpg [5 points]
This section won’t be autograded. It is graded manually in the report.

2. TEMPLATE MATCHING [30 POINTS]


Template matching is a common image processing technique that is used to find small parts
of an image that match with the template image. In this section, we will learn how to perform
template matching using different metrics to establish matching.
We will try to retrieve the traffic light and the traffic sign from Part 1 by using their templates. In
addition, we have the image of Waldo and another image where Waldo is hidden.

We will attempt to find Waldo by matching the Waldo template with the image using the fol-
lowing techniques:
• Sum of squared differences: tm_ssd
• Normalized sum of squared differences: tm_nssd
• Correlation: tm_ccor
• Normalized correlation: tm_nccor
We use the sliding window technique for matching the template. As we slide our template pixel
by pixel, we calculate the similarity between the image window and the template and store this
result in the top left pixel of the result. The location with maximum similarity is then taken as
the match for the template.
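As a concrete illustration of the sliding-window idea, here is a minimal sum-of-squared-differences matcher in NumPy. The function name is illustrative; note that for SSD the best match is the *minimum* of the result, since smaller differences mean greater similarity:

```python
import numpy as np

def ssd_match(image, template):
    """Sliding-window SSD template matching (illustrative sketch).

    Returns the (row, col) of the top-left corner of the best match.
    """
    ih, iw = image.shape
    th, tw = template.shape
    out = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = image[y:y + th, x:x + tw]
            # Store the dissimilarity score at the window's top-left pixel.
            out[y, x] = np.sum((window - template) ** 2)
    # SSD is a dissimilarity: the best match minimizes it.
    return np.unravel_index(np.argmin(out), out.shape)
```

The double loop mirrors the description above but is slow in Python; the vectorization suggestion in the Code section below applies here as well.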

Code:
Complete the template matching function. Each method is called for a different metric to
determine the degree to which the template matches the original image. You'll be testing on the
traffic signs used in Part 1. Suggestion: for loops in Python are notoriously slow; can we
find a vectorized solution to make it faster?
Note: cv2.matchTemplate() isn't allowed.

Report:
Pick the best of the 4 methods to display in the report.
Input: scene_tl_1.png. Output: ps2-2-a-1.jpg [5]
Input: scene_constr_1.png. Output: ps2-2-b-1.jpg [5]
Input: waldo1.png. Output: ps2-2-c-1.jpg [5]

Text:
2d. What are the disadvantages of using Hough based methods in finding Waldo? Can template
matching be generalised to all images? Explain why/why not. Which method consistently
performed the best, and why? [15]

3. FOURIER TRANSFORM
In this section we will use the Fourier Transform to compress an image. The Fourier transform
is an integral signal processing tool used in a variety of domains and converts a signal into
individual spectral components (sine and cosine waves). Another way of thinking about this is
that it converts a signal from the time domain to the frequency domain. While a signal like audio
is 1-dimensional, we will apply the Fourier transform to images as a 2-dimensional
signal. For more information on the Fourier Transform, lectures 2C-L1 and 2C-L2 provide a
good overview.

1-Dimensional Fourier Transform


The Fourier transform can be computed in two different algorithmic ways: Discrete and Fast.
In big O notation, the Discrete Fourier Transform (DFT) can be computed in O(n²) time while
the Fast Fourier Transform can be computed in O(n log n) time. In this assignment, we will
implement the Discrete Fourier Transform.

The Discrete Fourier Transform can be defined as

F(k) = Σ_{x=0}^{N−1} f(x) · e^(−i 2πkx / N)

One way to calculate the Fourier Transform is a dot product between a coefficient matrix and
the signal. Given a signal of length n, we define the coefficient matrix (n × n) as

 
          ⎡ 1   1         1           1           ...   1               ⎤
          ⎢ 1   ω         ω²          ω³          ...   ω^(n−1)         ⎥
          ⎢ 1   ω²        ω⁴          ω⁶          ...   ω^(2(n−1))      ⎥
          ⎢ 1   ω³        ω⁶          ω⁹          ...   ω^(3(n−1))      ⎥
M_n(ω) =  ⎢ ⋮   ⋮         ⋮           ⋮                 ⋮               ⎥
          ⎢ 1   ω^j       ω^(2j)      ω^(3j)      ...   ω^((n−1)j)      ⎥
          ⎢ ⋮   ⋮         ⋮           ⋮                 ⋮               ⎥
          ⎣ 1   ω^(n−1)   ω^(2(n−1))  ω^(3(n−1))  ...   ω^((n−1)(n−1))  ⎦

where j represents each row and ω = e^(−i 2π / N). The vector resulting from M_n(ω) · f(x) is now your
Fourier transformed signal! To compute the inverse of the Fourier transform, ω is e^(i 2π / N) and
f(x) = (1/N) · M_n(ω) · F(k).
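The coefficient-matrix approach might be sketched as follows in NumPy. This is an illustrative sketch, not a reference solution; note that the computation itself uses no np.fft calls, in keeping with this section's restriction:

```python
import numpy as np

def dft(x):
    """1-D DFT via the coefficient matrix M_n(omega), omega = e^{-i 2 pi / n}."""
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    k = np.arange(n)
    # Outer product of indices gives the exponent grid j*k for M_n(omega).
    M = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return M @ x

def idft(X):
    """Inverse 1-D DFT: same matrix with omega = e^{+i 2 pi / n}, scaled by 1/n."""
    X = np.asarray(X, dtype=complex)
    n = X.shape[0]
    k = np.arange(n)
    M = np.exp(2j * np.pi * np.outer(k, k) / n)
    return (M @ X) / n
```

A quick sanity check is to confirm that idft(dft(x)) recovers x to within floating-point error.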

Code:
Complete the following functions following the process above. NumPy matrix operations can
be used to simplify the calculation, but np.fft functions are not allowed in this section.

• dft(x)
• idft(x)

Report: No writeup for this section

2-Dimensional Fourier Transform


Now that we have computed the Fourier Transform for a 1-dimensional signal, we can do the
same thing for a 2-dimensional image. Remember that the 2-dimensional Fourier transform
simply applies the one-dimensional transform on each row and column.
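Under that row-and-column view, the 2-D transform reduces to two matrix products, one per axis. The sketch below assumes a single-channel image; the helper name `_dft_matrix` is illustrative, not part of ps2.py:

```python
import numpy as np

def _dft_matrix(n, sign=-1):
    """DFT coefficient matrix; sign=-1 for the forward transform, +1 for the inverse."""
    k = np.arange(n)
    return np.exp(sign * 2j * np.pi * np.outer(k, k) / n)

def dft2(img):
    """2-D DFT: transform columns (left multiply) and rows (right multiply)."""
    m, n = img.shape
    return _dft_matrix(m) @ img.astype(complex) @ _dft_matrix(n)

def idft2(F):
    """Inverse 2-D DFT, with the 1/(m*n) normalization applied once at the end."""
    m, n = F.shape
    return (_dft_matrix(m, sign=1) @ F @ _dft_matrix(n, sign=1)) / (m * n)
```

Because the coefficient matrix is symmetric, right-multiplying by it transforms each row, so no explicit transposes are needed.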

Code:
You may use the functions from the last sections but np.fft functions are not allowed.

• dft2(x)
• idft2(x)

Report: No writeup for this section

4. USING THE FOURIER TRANSFORM FOR COMPRESSION [15 POINTS]


Compression is a useful tool in all types of signal processing, but it is especially useful for images.
Let's say you take a picture of your dog that is 10 MB, but you want to reduce the file size while
still maintaining the quality of the image. One method of doing this is to convert the image into
the frequency domain and keep only the most dominant frequencies. For this section, we will
implement image compression; pseudocode for the algorithm is shown below.

Input: an image of shape n × n (img_bgr) and threshold percentage (t);
for channel = b, g , r do
Select channel of img_bgr as image. Convert the image to frequency domain;
Sort all the values of frequency from greatest to least into a 1-dimensional array of
length n²;
Find the threshold value at the index calculated by floor(t · n²);
Mask the frequency image to keep all values greater than the threshold value;
Convert the masked frequency image back into pixel values with the inverse
Fourier transform;
Set the channel of the new image to the masked filtered version;
Set the channel of the frequency domain image to the masked version;
end
return the filtered image and the masked frequency domain image (each should have 3
channels)
Algorithm 1: Image Compression with Fourier Transform
To visualize the masked frequency image, be sure to shift all the low frequencies to the center
of the image with np.fft.fftshift. Additionally, take 20 * log(abs(x)). This is done to properly
visualize the frequency domain image.
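One channel of the pseudocode above might look like the following sketch. It uses np.fft for illustration (the note at the end of this section suggests np.fft is acceptable here); the function name is hypothetical, and the t = 1.0 edge case is not handled:

```python
import numpy as np

def compress_channel(chan, t):
    """Keep only the frequency values above the t-fraction threshold (sketch)."""
    F = np.fft.fft2(chan)
    # Sort all magnitudes from greatest to least into a 1-D array.
    mags = np.sort(np.abs(F).ravel())[::-1]
    # Threshold value sits at index floor(t * n^2).
    thresh = mags[int(np.floor(t * mags.size))]
    # Keep all values greater than the threshold; zero the rest.
    mask = np.abs(F) > thresh
    F_masked = F * mask
    # Back to pixel values; imaginary residue is floating-point noise.
    filtered = np.real(np.fft.ifft2(F_masked))
    return filtered, F_masked
```

The full algorithm would run this once per b, g, r channel and stack the results. For display, apply np.fft.fftshift and 20 * np.log(np.abs(F_masked) + eps) (a small eps avoids log of zero) as described above.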

Code:
• compress_img_fft(x)

This function is used in: compression_runner()


Report:
Use threshold percentages of 0.1, 0.05, and 0.001. Display both the resulting image and the
frequency domain image in the report for each.
Outputs: ps2-4-a-1.jpg, ps2-4-a-1-freq.jpg, ps2-4-a-2.jpg, ps2-4-a-2-freq.jpg, ps2-4-a-3.jpg,
ps2-4-a-3-freq.jpg [15 points, 2.5 each]

Note: Students in previous semesters have noted issues with very small amounts of error for
compression. If you run into this, try using np.fft functions only for this question.

5. FILTERING WITH THE FOURIER TRANSFORM [35 POINTS]


Now that we have seen how the Fourier Transform can be used for compression, we will
use the Fourier Transform as a low-pass filter. A low-pass filter keeps all the low
frequencies within the image and zeroes out all the high frequency components. We will follow the
process shown in lecture video 2C-L1-14 by first converting the image to the frequency domain,
masking the spectral image with a circle of radius r, and converting the image back to pixel color
values. The pseudocode for the algorithm is given below:

Input: an image (img) and circle of radius r ;
for channel = r, g , b do
Convert the image to frequency domain;
Shift the spectral frequencies so that all low frequencies are in the center of the
image (use np.fft.fftshift);
Mask the frequency image with a circle of radius r, keeping all the low frequencies
and removing all the high frequencies;
Undo the phase shift with np.fft.ifftshift;
Convert the masked frequency image back into pixel values with the inverse
Fourier transform;
Set the channel of the new image to the low pass filtered version;
Set the channel of the frequency domain image to the masked version;
end
return the filtered image and the masked frequency domain image (each should have 3
channels)
Algorithm 2: Low Pass Filter with Fourier Transform
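A single channel of this pseudocode can be sketched as follows; the function name is illustrative, and the full algorithm would repeat this per channel:

```python
import numpy as np

def low_pass_channel(chan, r):
    """Low-pass filter one channel by masking with a circle of radius r (sketch)."""
    # To the frequency domain, with low frequencies shifted to the center.
    F = np.fft.fftshift(np.fft.fft2(chan))
    h, w = chan.shape
    yy, xx = np.ogrid[:h, :w]
    # Circular mask centered on the (shifted) zero frequency.
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= r ** 2
    F_masked = F * mask
    # Undo the shift, then invert the transform back to pixel values.
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(F_masked)))
    return filtered, F_masked
```

Two useful sanity checks: a radius large enough to cover the whole spectrum should return the input unchanged, and r = 0 keeps only the DC term, yielding a constant image equal to the channel's mean.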

Code:
• low_pass_filter(img_bgr, r)

This function is used in: low_pass_filter_runner()


Report: Use radii of 100, 50, and 10. Display both the resulting image and the frequency
domain image in the report for each. Apply 20 * log(abs(x)) to properly visualize the frequency
domain image.
Outputs: ps2-5-a-1.jpg, ps2-5-a-1-freq.jpg, ps2-5-a-2.jpg, ps2-5-a-2-freq.jpg, ps2-5-a-3.jpg,
ps2-5-a-3-freq.jpg [15 points, 2.5 each]
5-b What are the differences between compression and filtering? How does this change the
resulting image? [10 points]
5-c Given an image corrupted with salt-and-pepper noise, what filtering method can
effectively reduce/remove this noise? Also explain your choice of filtering method. Show some
examples. [10 points]
