(Fall 2024) Images and Convolutions
Outline
● Representing Images
● Problems with MLP
● Convolution Mechanics
● More Convolutions!
What is Computer Vision?
A field of computer science focused on processing, analyzing, and understanding visual data
Brief History of CV
● 1959
○ David Hubel and Torsten Wiesel started experimenting on the visual cortex of cats
○ Discovered that our visual cortex processes images by analyzing simple structures such as edges first
Evolution of CV
● Object Detection prior to 2012:
Canny Edge Detector — a very famous feature extractor developed by Berkeley Prof. John Canny!
Takeaway: you can’t really extract useful information by looking at individual pixels in isolation
Extracting representations from images
Recall that deep learning is the process of extracting hierarchical representations from an input. What does this look like for an image?
1. Learn to detect edges, textures, and colors from raw pixels in the first layer
2. Use edges to detect simple shapes and patterns in intermediate layers
3. Combine shapes and patterns to detect abstract higher-level features, such as facial shapes, in higher layers
Other desiderata
● Equivariance to translation: the same set of pixels, when translated, should have their representations translated too
● Invariance to translation: semantic meaning does not change due to a translation
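A minimal PyTorch sketch of these two properties (the image size, filter, and shift amount below are arbitrary choices for illustration):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.zeros(1, 1, 8, 8)
image[0, 0, 2:5, 2:5] = 1.0                              # a small square of pixels
shifted = torch.roll(image, shifts=(0, 2), dims=(2, 3))  # same square, moved 2 px right

kernel = torch.randn(1, 1, 3, 3)                         # an arbitrary 3x3 filter

out = F.conv2d(image, kernel, padding=1)
out_shifted = F.conv2d(shifted, kernel, padding=1)

# Equivariance: convolving the shifted image equals shifting the convolved image
# (exact here because the square stays away from the image borders).
print(torch.allclose(out_shifted, torch.roll(out, shifts=(0, 2), dims=(2, 3))))  # True

# Invariance: a global max pool discards position, so both images give the same value.
print(torch.allclose(out.amax(), out_shifted.amax()))    # True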
Solution: CNNs
1 0 1
0 1 0
1 0 1
Weight Filter (Terminology: also referred to as a "kernel")
Filters + Convolutions
[Figure: a 5x5 binary input image convolved with a 3x3 weight filter (kernel), producing a 3x3 feature map of sliding dot products.]
Filters (2D)
How to perform convolutions:
1. Slide filter along width and height by a certain amount (stride).
2. Compute dot products between entries of filter and input at any position.
1 0 1
0 1 0
1 0 1
Weight Filter
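A minimal NumPy sketch of these two steps (the function name and the no-padding, stride-1 defaults are our own simplifications; deep learning libraries actually compute this cross-correlation and call it "convolution"):

import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1                      # output height (no padding)
    ow = (iw - kw) // stride + 1                      # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)        # dot product of patch and filter
    return out

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
image = np.random.randint(0, 2, size=(5, 5))          # a random 5x5 binary "image"
print(conv2d(image, kernel))                          # 3x3 feature map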
Example
What does this filter do? Any ideas?

Input (6x6):
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0

Filter (3x3):
1 0 -1
1 0 -1
1 0 -1

Sliding the filter along the top three rows of the input (stride 1) fills in the first row of the output one entry at a time:
● Input columns 1-3: (10 + 10 + 10) - (10 + 10 + 10) = 0
● Input columns 2-4: (10 + 10 + 10) - (0 + 0 + 0) = 30
● Input columns 3-5: (10 + 10 + 10) - (0 + 0 + 0) = 30
● Input columns 4-6: (0 + 0 + 0) - (0 + 0 + 0) = 0

First row of the output: 0 30 30 0
What does this filter do?

Repeating this for every window position gives the full 4x4 output:
0 30 30 0
0 30 30 0
0 30 30 0
0 30 30 0

Vertical Edge Detection
The column of 30s in the output lines up with the vertical edge in the input, where the block of 10s meets the block of 0s: this filter is a vertical edge detector.
Convolutions Concept Check
1) What does a horizontal edge detector look like?
2) What is the output of the same input with a horizontal edge detector?
3) What does this ^ tell us about the output of some convolution-based “<insert shape here> detector”?
Convolutions Concept Check
1) What does a horizontal edge detector look like?
Similar to the vertical edge detector, but rotated 90 degrees:
1 1 1
0 0 0
-1 -1 -1
2) What is the output of the same input with a horizontal edge detector?
The output is all zeros
3) What does this ^ tell us about the output of some convolution-based “<insert shape here> detector”?
The output of the convolution at any location is high when the feature the kernel was designed to detect (or something similar to it) is present, and low when it isn’t. In this case, there were no horizontal edges, so our horizontal edge kernel output zero everywhere.
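A quick NumPy/SciPy check of both answers on the 6x6 example from earlier (correlate2d performs the same sliding dot product used above):

import numpy as np
from scipy.signal import correlate2d           # cross-correlation = DL "convolution"

image = np.array([[10, 10, 10, 0, 0, 0]] * 6)  # the 6x6 example image

vertical = np.array([[1, 0, -1]] * 3)          # vertical edge detector
horizontal = vertical.T                        # same filter, rotated 90 degrees

print(correlate2d(image, vertical, mode="valid"))
# [[ 0 30 30  0]
#  [ 0 30 30  0]
#  [ 0 30 30  0]
#  [ 0 30 30  0]]

print(correlate2d(image, horizontal, mode="valid"))
# all zeros: the image contains no horizontal edges for this kernel to respond to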
Purpose of Convolutions
● Different filters can be used to extract different features of an image (e.g. edges) or to apply effects such as blurring
Some Classical Ideas: Edge Detection
● Edges and shapes are important!
● John Canny (Berkeley prof) developed a widely used edge detector
● Works by computing discrete image gradients
Some Classical Ideas: HOG
● Histogram of Oriented Gradients
● Uses multiple gradient orientations
● The resulting histograms are fed to a classifier (e.g. an SVM)
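As a rough sketch, both classical ideas are available in scikit-image (the parameter values below are arbitrary illustrative choices):

from skimage import data
from skimage.feature import canny, hog

image = data.camera()                       # a built-in 512x512 grayscale test image

# Canny edge detector: returns a boolean edge map
edges = canny(image, sigma=2.0)
print(edges.shape, edges.dtype)             # (512, 512) bool

# HOG: histograms of gradient orientations over local cells;
# the resulting feature vector is typically fed to a linear classifier such as an SVM
features = hog(image,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))
print(features.shape)                       # one long 1D feature vector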
Where does the “Deep Learning” part come in?
Just like the weights of dense fully-connected layers, these filters can simply be learned!
Filters (3D)
● Steps:
○ Compute the dot product for each channel (same as 2D)
○ Sum over each channel
● Note: The depth of the filter is always the same as the depth of the input image
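A small NumPy sketch of a single 3D filter applied at one position (the array names and sizes are made up; a real layer slides this over every position and usually adds a bias):

import numpy as np

patch = np.random.rand(3, 5, 5)     # (channels, height, width): e.g. an RGB input
filt = np.random.rand(3, 3, 3)      # filter depth 3, matching the input depth

window = patch[:, 1:4, 1:4]                       # 3x3 window at one position, all channels
per_channel = np.sum(window * filt, axis=(1, 2))  # one dot product per channel
output_value = per_channel.sum()                  # sum over channels -> single number
print(per_channel, output_value)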
Derivative
[Figure: the input values, the calculations for the feature map, and the derivative of the output.]
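As a rough PyTorch sketch of these calculations, autograd can compute the derivative of one feature-map entry with respect to the input values and the filter weights (the tensor sizes are arbitrary):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5, requires_grad=True)   # input values
w = torch.randn(1, 1, 3, 3, requires_grad=True)   # filter weights

feature_map = F.conv2d(x, w)                      # 3x3 feature map
feature_map[0, 0, 0, 0].backward()                # derivative of one output entry

print(x.grad)   # nonzero only on the 3x3 input patch that produced this entry,
                # where it equals the filter weights
print(w.grad)   # equals that 3x3 input patch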
Defining a Convolutional Layer in PyTorch
https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
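For example (the channel counts, kernel size, stride, and padding below are arbitrary illustrative choices):

import torch
import torch.nn as nn

# 16 learnable 3x3 filters, each with depth 3 (matching an RGB input),
# plus one learnable bias per filter
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)

x = torch.randn(8, 3, 32, 32)   # a batch of 8 RGB images, each 32x32
y = conv(x)
print(y.shape)                  # torch.Size([8, 16, 32, 32])
print(conv.weight.shape)        # torch.Size([16, 3, 3, 3]) -- the learned filters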
Other Operations
Pooling Layers
● Reduces output size
● Applied to each channel independently
● Neighboring features may be similar
○ Doesn’t remove too much information
● Max pooling takes the max
● Average pooling takes the average
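For example, with a 2x2 window and stride 2 (a common but arbitrary choice):

import torch
import torch.nn as nn

x = torch.tensor([[[[ 1.,  2.,  5.,  6.],
                    [ 3.,  4.,  7.,  8.],
                    [ 9., 10., 13., 14.],
                    [11., 12., 15., 16.]]]])   # shape (1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

print(max_pool(x))   # [[ 4.,  8.], [12., 16.]] -- max of each 2x2 block
print(avg_pool(x))   # [[ 2.5,  6.5], [10.5, 14.5]] -- average of each 2x2 block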
Pooling Layers Concept Check
1) In max pooling, what are the partial derivatives of the top right output with respect to the 2x2 sub-grid of inputs in the top right corner?
Answer: max pooling outputs the largest value in its window, so the partial derivative is 1 with respect to the entry that attains the maximum and 0 with respect to every other entry. Here the maximum is the bottom-right entry of the sub-grid, so the partial derivatives are:
0 0
0 1
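A quick autograd check of this (the input values are made up; only their ordering matters):

import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]], requires_grad=True)   # 2x2 sub-grid, max at bottom right

out = nn.MaxPool2d(kernel_size=2)(x)   # single output value: 4
out.backward()

print(x.grad)
# tensor([[[[0., 0.],
#           [0., 1.]]]])  -- the gradient flows only to the max entry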
http://tinyurl.com/fa24-dl4cv
Contributors
● Jake Austin
● Aryan Jain
● Val Rotan
● Past ML@B Edu members