
(Fall 2024) Images and Convolutions

By: ML@B Edu Team


Outline
● What is Computer Vision?
● Representing Images
● Problems with MLPs
● Convolution Mechanics
● More Convolutions!
What is Computer Vision?
A field of computer science focused on processing, analyzing, and understanding visual data
Brief History of CV
● 1959
○ David Hubel and Torsten Wiesel started experimenting on the visual cortex of cats
○ Discovered that our visual cortex processes images by analyzing simple structures such as edges first
Evolution of CV
● Object Detection prior to 2012:

● Then came deep learning…


○ No more feature extraction by hand!
○ Use a large neural network to learn the important features
○ Deep Learning paved the way for a massive acceleration in the progress of computer vision
The Deep Learning Approach
● Convolutional Neural Networks (CNNs)
○ Deep Learning algorithm used for analyzing images
○ Invented by Yann LeCun (LeNet-5) in the 1990s
● Why the sudden explosion post-2012?
○ AlexNet (16.4% Error Rate!!)
■ Nvidia GPUs
■ More powerful computers
■ More access to data
Images as Data
How do we represent
images digitally?
Images as matrices?
Grayscale
● Pixel values range in gray levels from 0 (black) to 255 (white)
● Each pixel can take one of 256 possible values, so it takes up 8 bits
Color Images/Channels

Terminology: Each "layer" is commonly referred to as a channel. An RGB image has 3 channels.
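As a quick illustration (not from the slides), here is how an image becomes an array of numbers in Python; "cat.png" is a placeholder path, and this sketch assumes Pillow and NumPy are installed:

import numpy as np
from PIL import Image  # Pillow

# "cat.png" is a hypothetical file, not an asset from these slides.
img = Image.open("cat.png").convert("RGB")

x = np.asarray(img)                  # shape (H, W, 3): one 2D matrix per channel
gray = np.asarray(img.convert("L"))  # shape (H, W): a single grayscale channel

print(x.shape, x.dtype)  # e.g. (200, 200, 3) uint8 -- values in [0, 255]
print(gray.shape)        # e.g. (200, 200)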
Moving Past MLPs
Why are standard dense NNs (MLPs) not ideal for image classification?
Classification with NN
● Consider a 200 x 200 x 3 image
○ This is an RGB image with height and width 200 pixels
○ We can represent this by a 200 x 200 x 3 = 120,000 element vector
● How many parameters do we need for an MLP with one fully connected hidden layer of 10 units?

Fully Connected Layers: y = Wx + b

200 * 200 * 3 * 10 + 10 = 1,200,010

WAY TOO MANY!!
Each pixel is an individual input "feature" of the network. Why does this not make sense?
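We can sanity-check this count directly in PyTorch (a minimal sketch; the layer sizes come from the slide):

import torch.nn as nn

# Fully connected layer from a flattened 200 x 200 x 3 image to 10 hidden units:
# W has 10 x 120,000 entries, b has 10.
layer = nn.Linear(200 * 200 * 3, 10)

print(sum(p.numel() for p in layer.parameters()))  # 1200010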
How does image classification work intuitively? How can you convince someone that an image contains object x?
Features of an Image
Local Regions of an Image
● The basic idea is to operate on local regions of an image rather than on individual pixels or on far-away pixels
● The dense neural network model template doesn't lend itself immediately to patch recognition like this!
Same ideas prevalent in classical CV too!

Canny Edge Detector: a very famous feature extractor developed by Berkeley Prof. John Canny!

Takeaway: you can't really extract any useful information by looking at individual pixels in isolation
Extracting representations from images
Recall that deep learning is the process of extracting
hierarchical representations from an input. What
does this look like for an image?
1. Learn to detect edges, textures and colors from
raw pixels in the first layer
2. Use edges to detect simple shapes and patterns
in intermediate layers
3. Combine shapes and patterns to detect
abstract higher-level features, such as facial
shapes, in higher layers
Other desiderata
● Equivariance to translation: the same set of pixels, when translated, should have
their representations translated too
● Invariance to translation: semantic meaning does not change due to a translation
Solution: CNNs

Downsample the image and only extract what is relevant


Convolutions
High level concept
*don’t worry about implementation just yet :)
Initial Ideas
● Instead of processing an entire image at once, process patches of it instead!
○ Allows the network to pay attention to local regions of an image
● Process each patch with a layer parameterized by the same weights and bias (also called weight sharing):
○ Ensures that identical patches produce the same "representations"
○ Preserves translational equivariance and invariance
Filters + Convolutions

1 0 1
0 1 0
1 0 1
Weight Filter

Terminology!: Also
referred to as a “kernel”
Filters + Convolutions

[Figure: a 5 x 5 binary input convolved with the 3 x 3 weight filter from the previous slide, producing a 3 x 3 output]

Terminology!: Also referred to as a "kernel"
Filters (2D)
How to perform convolutions:
1. Slide the filter along the width and height by a certain amount (the stride).
2. Compute the dot product between the entries of the filter and the input at each position.

Note: There is one bias term per filter, applied after the convolution; it is just not shown in our examples.
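As a sketch of these mechanics in NumPy (the function name and signature are our own; note that deep learning "convolution" layers actually compute this unflipped cross-correlation):

import numpy as np

def conv2d(x, w, stride=1, bias=0.0):
    """Slide the F x F filter w over the 2D input x and take a dot
    product at each position; one bias per filter, added at the end."""
    (H, W), F = x.shape, w.shape[0]
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * w) + bias
    return out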
Filters (2D) Practice
Convolutions
Terminology!: This resulting matrix is called an activation map

1 0 1
0 1 0
1 0 1
Weight Filter
Example
What does this do? Any ideas?

Input (6 x 6):
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0

Filter (3 x 3):
1 0 -1
1 0 -1
1 0 -1

Sliding the filter one position at a time (stride 1, no padding) fills in a 4 x 4 output, one entry per position. The first row comes out to:
0 30 30 0
What does this filter do?

10 10 10 0 0 0
10 10 10 0 0 0                     0 30 30 0
10 10 10 0 0 0       1 0 -1       0 30 30 0
10 10 10 0 0 0   *   1 0 -1   =   0 30 30 0
10 10 10 0 0 0       1 0 -1       0 30 30 0
10 10 10 0 0 0

Vertical Edge Detection: the output is high exactly along the boundary where the bright half of the image meets the dark half.
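You can reproduce this exact example with PyTorch, whose conv2d also computes the unflipped cross-correlation shown on the slide; a minimal sketch:

import torch
import torch.nn.functional as F

x = torch.tensor([[10., 10., 10., 0., 0., 0.]] * 6)  # bright left, dark right
k = torch.tensor([[1., 0., -1.]] * 3)                # vertical edge detector

# F.conv2d expects (batch, channels, height, width) for input and weight.
out = F.conv2d(x.view(1, 1, 6, 6), k.view(1, 1, 3, 3))
print(out.view(4, 4))  # every row is [0., 30., 30., 0.]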
Convolutions Concept Check
1) What does a horizontal edge detector look like?

2) What is the output of the same input with a horizontal edge detector?

3) What does this ^ tell us about the output of some convolution-based "<insert shape here> detector"?
Convolutions Concept Check
1) What does a horizontal edge detector look like?
Similar, but rotated 90 degrees:
 1  1  1
 0  0  0
-1 -1 -1
2) What is the output of the same input with a horizontal edge detector?
The output is all zeros
3) What does this ^ tell us about the output of some convolution-based "<insert shape here> detector"?
The output of convolving the kernel at any location is high when the feature the kernel was designed to detect (or something similar) is present, and low when it isn't. In this case, there were no horizontal lines, so our horizontal line kernel output zero everywhere
Purpose of Convolutions
● Different filters can be used to extract various features of an image (e.g. edges) or to apply effects such as blurring
Some Classical Ideas: Edge Detection
● Edges and shapes are important!
● John Canny (Berkeley prof) developed a very good edge detector
● Based on discrete image gradients
Some Classical Ideas: HOG
● Histogram of Oriented Gradients
● Uses multiple gradient orientations
● Feeds the resulting histograms to a classifier (e.g. an SVM)
Where does the "Deep Learning" part come in?
Just like the weights of dense fully-connected layers, these filters can simply be learned!
Filters (3D)
● Steps:
○ Compute the dot product for each channel (same as 2D)
○ Sum the results over the channels
● Note: The depth of the filter is always the same as the depth of the input image

⚠ W1 and W2 are distinct 4 x 4 x 3 filters
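Extending the 2D sketch from earlier, a hedged illustration of the 3D case (per-channel dot products summed into one 2D activation map; the names and sizes are our own):

import numpy as np

def conv_multichannel(x, w, bias=0.0):
    """x: (H, W, C) input, w: (F, F, C) filter whose depth matches C.
    Dot products are computed per channel and summed over channels."""
    (H, W, C), F = x.shape, w.shape[0]
    out = np.zeros((H - F + 1, W - F + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+F, j:j+F, :] * w) + bias
    return out

# Two distinct 4 x 4 x 3 filters (like W1 and W2) give two activation maps.
x = np.random.rand(6, 6, 3)
w1, w2 = np.random.rand(4, 4, 3), np.random.rand(4, 4, 3)
print(conv_multichannel(x, w1).shape)  # (3, 3), one map per filter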
Convolutional Layers
Terminology!: The activation map is also referred to as a "feature map"
Padding
Convolving an image with a filter results in a block with a smaller height and width —
what if we want the height and width as before?
Same vs Valid Padding
● Same padding: padding with 0s (or
possibly some other constant value)
to preserve the spatial dimensions of
the output
● Valid padding: no padding
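Recent PyTorch versions accept both modes by name for stride-1 convolutions; a quick shape check (the sizes here are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

same = nn.Conv2d(3, 8, kernel_size=3, padding="same")    # zero-pads the border
valid = nn.Conv2d(3, 8, kernel_size=3, padding="valid")  # no padding at all

print(same(x).shape)   # torch.Size([1, 8, 32, 32]) -- spatial size preserved
print(valid(x).shape)  # torch.Size([1, 8, 30, 30]) -- shrinks by F - 1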
Stride
The number of pixels to slide the filter by (both horizontally and vertically):
1. A stride of 1 shifts the filter by 1 pixel at a time
2. A stride of 2 shifts the filter by 2 pixels at a time
Output Dimensions
● floor[(W − F + 2P) / S] + 1
○ W: Input Dimension
○ F: Kernel / Filter Size
○ P: Padding size
○ S: Stride
○ floor is the floor function (round down), which is the convention PyTorch uses
■ floor[3.5] = 3, floor[4] = 4
● For the figure on the right:
○ Assume no padding and a stride of 1
○ W' = floor[(6 - 3 + 2 * 0) / 1] + 1 = 4
○ H' = floor[(6 - 3 + 2 * 0) / 1] + 1 = 4
Backprop with Convolutions
● Like the regular backprop algorithm (see previous lectures)
● Derivative of the error w.r.t. a particular weight = sum of the derivatives at each output position that used that weight

[Figure: input values, derivatives of the output feature map, and the calculations for the derivative of each weight]
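In index notation, for a stride-1, no-padding convolution with output y_{i,j} = Σ_{a,b} w_{a,b} · x_{i+a, j+b}, the chain rule gives (our sketch, matching the bullet above):

∂L/∂w_{a,b} = Σ_{i,j} (∂L/∂y_{i,j}) · x_{i+a, j+b}

i.e. each weight's gradient is the sum, over all output positions, of the upstream derivative times the input value that weight multiplied there.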
Defining a Convolutional Layer in PyTorch

https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
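A minimal usage sketch based on that documentation page (the channel counts and sizes are arbitrary examples):

import torch
import torch.nn as nn

# 16 learnable 3 x 3 filters over a 3-channel input; padding of 1
# preserves the spatial dimensions at stride 1.
conv = nn.Conv2d(in_channels=3, out_channels=16,
                 kernel_size=3, stride=1, padding=1)

x = torch.randn(8, 3, 200, 200)  # a batch of 8 RGB images
print(conv(x).shape)             # torch.Size([8, 16, 200, 200])
print(conv.weight.shape)         # torch.Size([16, 3, 3, 3])
print(conv.bias.shape)           # torch.Size([16]) -- one bias per filter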
Other Operations
Pooling Layers
● Reduces output size
● Applied to each channel independently
● Neighboring features may be similar
○ Doesn’t remove too much information
● Max pooling takes the max
● Average pooling takes the average
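A tiny numeric sketch of both kinds of pooling on a single 2 x 2 window (the values are chosen arbitrarily):

import torch
import torch.nn as nn

x = torch.tensor([[1., 2.],
                  [3., 7.]]).view(1, 1, 2, 2)  # (batch, channel, H, W)

print(nn.MaxPool2d(kernel_size=2)(x))  # tensor([[[[7.]]]])
print(nn.AvgPool2d(kernel_size=2)(x))  # tensor([[[[3.2500]]]])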
Pooling Layers Concept Check
1) In max pooling, what are the partial derivatives of the top right output with respect to the 2x2 sub-grid of inputs in the top right corner?

0 0
0 1

The only information that flows into the next layer is from that 7, so only it will receive any gradient flow during backprop
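Autograd confirms this; a sketch where the window's max is the 7 from the slide and the other three values are made up:

import torch
import torch.nn as nn

x = torch.tensor([[3., 5.],
                  [1., 7.]], requires_grad=True)  # 7 is the max

out = nn.MaxPool2d(kernel_size=2)(x.view(1, 1, 2, 2))
out.backward()  # out has a single element, so no grad argument is needed

print(x.grad)
# tensor([[0., 0.],
#         [0., 1.]]) -- only the max entry receives gradient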
Transposed Convolution
● Note: This is just for culture, just look at the high level
● With most convolutions, the output ends up smaller than the input if we don't pad
● What if we want to increase the output size?
○ Say you want to do a task like super-resolution where the output of the
model is larger than the input size
● Boils down to dilating the input feature map and running a
convolution on it
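For a high-level feel, a shape-only sketch with PyTorch's ConvTranspose2d (channel counts are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)

# A stride-2 transposed convolution roughly doubles height and width,
# which is why it shows up in tasks like super-resolution.
up = nn.ConvTranspose2d(in_channels=8, out_channels=4,
                        kernel_size=2, stride=2)
print(up(x).shape)  # torch.Size([1, 4, 32, 32])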
Recap
Intuition
● In an image, local regions provide more information than individual pixels
● A single convolution filter can extract features such as edges on its own, and you can apply multiple filters to create multiple feature maps
○ Fewer parameters than a linear layer, allowing us to extract features efficiently
○ Hyperparameters include stride, padding, and kernel size
● The convolution operation amounts to a "sliding window", and is spatially invariant (the same weights are applied at every location)
Tools for your toolkit:
● Conv2D layer
● Pooling layers
● Transposed Convolution
Lecture Attendance

http://tinyurl.com/fa24-dl4cv
Contributors
● Jake Austin
● Aryan Jain
● Val Rotan
● Past ML@B Edu members
