Lecture 7-8

The document discusses convolutional and pooling layers in deep learning, highlighting their roles in image classification tasks such as distinguishing between dogs and cats. It explains key concepts like 2-D convolution, translation invariance, locality, padding, stride, and the architecture of convolutional networks, including the LeNet model. Additionally, it covers the importance of pooling layers for achieving invariance to translation and reducing computational complexity.


Deep Learning

Convolutional and Pooling Layers

Dr. Ahsen Tahir

The slides have been adapted in part from Ian Goodfellow's book slides and Alex Smola's Dive into Deep Learning book slides
Convolutional Networks
Classifying Dogs and Cats in Images

• Use a good camera


• A 12-megapixel RGB image has 36M elements
• A single-hidden-layer MLP with 100 hidden units
would have 3.6 billion parameters
• That exceeds the population of dogs
and cats on Earth
(900M dogs + 600M cats)
Flashback - Network with one hidden layer

36M features → 100 hidden neurons

3.6B parameters = 14GB (at 4 bytes each)

h = σ(Wx + b)
Convolution
2-D Convolution (Cross Correlation)

0 × 0 + 1 × 1 + 3 × 2 + 4 × 3 = 19,
1 × 0 + 2 × 1 + 4 × 2 + 5 × 3 = 25,
3 × 0 + 4 × 1 + 6 × 2 + 7 × 3 = 37,
4 × 0 + 5 × 1 + 7 × 2 + 8 × 3 = 43.

(animation: vdumoulin @ GitHub)
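The sliding-window computation above can be sketched in a few lines of plain Python (a minimal illustration, not a library implementation; the name `corr2d` is chosen here):

```python
def corr2d(X, K):
    """2-D cross-correlation of input X with kernel K (lists of lists)."""
    h, w = len(K), len(K[0])
    out_h, out_w = len(X) - h + 1, len(X[0]) - w + 1
    return [[sum(X[i + a][j + b] * K[a][b]
                 for a in range(h) for b in range(w))
             for j in range(out_w)]
            for i in range(out_h)]

X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
K = [[0, 1], [2, 3]]
print(corr2d(X, K))  # [[19, 25], [37, 43]]
```

The four printed values match the four dot products worked out on the slide.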
Two Principles
• Translation
Invariance
• Locality
Idea #1 - Translation Invariance

h_{i,j} = ∑_{a,b} v_{i,j,a,b} x_{i+a,j+b}

• A shift in x also leads to a shift in h

• v should not depend on (i,j). Fix via v_{i,j,a,b} = v_{a,b}

h_{i,j} = ∑_{a,b} v_{a,b} x_{i+a,j+b}

That's a 2-D convolution (strictly, a cross-correlation)
Idea #2 - Locality

h_{i,j} = ∑_{a,b} v_{a,b} x_{i+a,j+b}

• We shouldn't look very far from x(i,j) in order to assess
what's going on at h(i,j)
• Outside the range |a|, |b| > Δ the parameters vanish: v_{a,b} = 0

h_{i,j} = ∑_{a=−Δ}^{Δ} ∑_{b=−Δ}^{Δ} v_{a,b} x_{i+a,j+b}
2-D Convolution Layer

• X : n_h × n_w input matrix
• W : k_h × k_w kernel matrix
• b : scalar bias
• Y : (n_h − k_h + 1) × (n_w − k_w + 1) output matrix

Y = X ⋆ W + b
• W and b are learnable parameters
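Adding the scalar bias to the cross-correlation gives the full layer. A minimal pure-Python sketch (the name `conv2d_layer` is chosen here) that also confirms the (n_h − k_h + 1) × (n_w − k_w + 1) output shape:

```python
def conv2d_layer(X, W, b):
    """2-D convolution layer: cross-correlate X with W, then add scalar bias b."""
    kh, kw = len(W), len(W[0])
    nh, nw = len(X), len(X[0])
    # Output shape: (nh - kh + 1) x (nw - kw + 1)
    return [[b + sum(X[i + a][j + c] * W[a][c]
                     for a in range(kh) for c in range(kw))
             for j in range(nw - kw + 1)]
            for i in range(nh - kh + 1)]

X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
W = [[0, 1], [2, 3]]
Y = conv2d_layer(X, W, b=1)
print(Y)                   # [[20, 26], [38, 44]]
print(len(Y), len(Y[0]))   # 2 2  ->  (3 - 2 + 1) x (3 - 2 + 1)
```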
Examples
Edge Detection

Sharpen

(wikipedia)

Gaussian Blur
Examples

(Rob Fergus)
Gabor filters

@medium
Cross Correlation vs Convolution

• 2-D Cross-Correlation

y_{i,j} = ∑_{a=1}^{h} ∑_{b=1}^{w} w_{a,b} x_{i+a,j+b}

• 2-D Convolution

y_{i,j} = ∑_{a=1}^{h} ∑_{b=1}^{w} w_{−a,−b} x_{i+a,j+b}

• No difference in practice, due to symmetry: a learned kernel
simply comes out flipped
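The relationship can be checked directly: true convolution is just cross-correlation with the kernel flipped along both axes, so the two coincide exactly when the kernel is symmetric under a 180° flip (a small illustrative sketch; names chosen here):

```python
def corr2d(X, K):
    """2-D cross-correlation."""
    h, w = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b] for a in range(h) for b in range(w))
             for j in range(len(X[0]) - w + 1)]
            for i in range(len(X) - h + 1)]

def conv2d(X, K):
    # True convolution = cross-correlation with the kernel flipped in both axes
    flipped = [row[::-1] for row in K[::-1]]
    return corr2d(X, flipped)

X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
sym = [[0, 1], [1, 0]]    # unchanged by a 180-degree flip
asym = [[0, 1], [2, 3]]
print(conv2d(X, sym) == corr2d(X, sym))    # True
print(conv2d(X, asym) == corr2d(X, asym))  # False
```

For learning this distinction is irrelevant: if cross-correlation can learn kernel K, convolution can learn its flip.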


1-D and 3-D Cross Correlations

• 1-D

y_i = ∑_{a=1}^{h} w_a x_{i+a}

• Text
• Voice
• Time series

• 3-D

y_{i,j,k} = ∑_{a=1}^{h} ∑_{b=1}^{w} ∑_{c=1}^{d} w_{a,b,c} x_{i+a,j+b,k+c}

• Video
• Medical images
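The 1-D case is the same sliding dot product over a sequence (a small sketch; the kernel and names here are illustrative):

```python
def corr1d(x, w):
    """1-D cross-correlation: y_i = sum_a w_a * x_{i+a}."""
    k = len(w)
    return [sum(w[a] * x[i + a] for a in range(k))
            for i in range(len(x) - k + 1)]

# A moving-difference kernel [-1, 1] highlights changes in a sequence,
# useful for, e.g., time series
print(corr1d([1, 1, 2, 4, 4], [-1, 1]))  # [0, 1, 2, 0]
```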
Padding and Stride

courses.d2l.ai/berkeley-stat-157
Padding

• Given a 32 x 32 input image


• Apply convolutional layer with 5 x 5 kernel
• 28 x 28 output with 1 layer
• 4 x 4 output with 7 layers
• Shape decreases faster with larger kernels
• Shape reduces from n_h × n_w to
(n_h − k_h + 1) × (n_w − k_w + 1)
Padding

Padding adds rows/columns around input

0×0+0×1+0×2+0×3=0
Padding

• With padding p = 1 (one ring of zeros around the image),
each output dimension becomes

(n + 2p − k + 1)

• A common choice is 2p = k − 1, which keeps the output
the same size as the input
Stride

• Even with padding, the shape shrinks only linearly with the
number of layers

• Given a 224 x 224 input with a 5 x 5 kernel, it takes 55
layers to reduce the shape to 4 x 4
• Requires a large amount of computation
Stride

• Stride is the number of rows/columns the window moves per step


Strides of 3 and 2 for height and width

0×0+0×1+1×2+2×3=8
0×0+6×1+0×2+0×3=6
Stride

• Given stride s_h for the height and stride s_w for the width,
each output dimension is

⌊(n + 2p − k + s)/s⌋

• With 2p = k − 1 this is roughly n/s, so the output shape is

(n_h/s_h) × (n_w/s_w)
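The shape arithmetic is easy to get wrong, so here is the formula as a one-line helper (the name `conv_out_size` is chosen here) checked against the earlier examples:

```python
def conv_out_size(n, k, p=0, s=1):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

print(conv_out_size(32, 5))             # 28: the 32x32 input with a 5x5 kernel
print(conv_out_size(224, 5, p=2))       # 224: 2p = k - 1 preserves the shape
print(conv_out_size(224, 5, p=2, s=2))  # 112: stride 2 halves it, n/s
```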
Multiple Input and
Output Channels

Multiple Input Channels

• Color image may have three RGB channels


• Converting to grayscale loses information
Multiple Input Channels

• Color image may have three RGB channels


• Converting to grayscale loses information
Multiple Input Channels

• Have a kernel for each channel, and then sum results


over channels

(1 × 1 + 2 × 2 + 4 × 3 + 5 × 4)
+(0 × 0 + 1 × 1 + 3 × 2 + 4 × 3)
= 56
Multiple Input Channels

• X : c_i × n_h × n_w input
• W : c_i × k_h × k_w kernel
• Y : m_h × m_w output

Y = ∑_{i=1}^{c_i} X_{i,:,:} ⋆ W_{i,:,:}
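The per-channel cross-correlations followed by a sum can be sketched directly, reproducing the slide's value of 56 in the top-left position (pure-Python illustration; names chosen here):

```python
def corr2d(X, K):
    """2-D cross-correlation on a single channel."""
    h, w = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b] for a in range(h) for b in range(w))
             for j in range(len(X[0]) - w + 1)]
            for i in range(len(X) - h + 1)]

def corr2d_multi_in(Xs, Ks):
    """Cross-correlate each input channel with its kernel, then sum over channels."""
    outs = [corr2d(X, K) for X, K in zip(Xs, Ks)]
    return [[sum(o[i][j] for o in outs) for j in range(len(outs[0][0]))]
            for i in range(len(outs[0]))]

Xs = [[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
      [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
Ks = [[[0, 1], [2, 3]],
      [[1, 2], [3, 4]]]
print(corr2d_multi_in(Xs, Ks))  # [[56, 72], [104, 120]]
```

The 56 is exactly (1×1 + 2×2 + 4×3 + 5×4) + (0×0 + 1×1 + 3×2 + 4×3) from the slide.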
Multiple Output Channels

• No matter how many input channels, so far we always
get a single output channel
• We can have multiple 3-D kernels; each one generates an
output channel
• Input X : c_i × n_h × n_w
• Kernel W : c_o × c_i × k_h × k_w
• Output Y : c_o × m_h × m_w

Y_{i,:,:} = X ⋆ W_{i,:,:,:}   for i = 1, …, c_o

TensorFlow → channels last (NHWC, default)

PyTorch → channels first (NCHW, default)
Multiple Input/Output Channels

• Each output channel may recognize a particular pattern

• Kernels over the input channels recognize and combine
patterns in the inputs
1 x 1 Convolutional Layer

k_h = k_w = 1 is a popular choice. It doesn't recognize spatial
patterns, but fuses information across channels.
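A 1 × 1 convolution is just a per-pixel linear map over the channel dimension, i.e. a matrix multiply applied at every spatial position (a small sketch; the weight matrix and names here are illustrative):

```python
def corr2d_1x1(Xs, W):
    """1x1 convolution: at every pixel, mix the c_i input channels into
    c_o output channels with a weight matrix W of shape (c_o, c_i)."""
    ci, h, w = len(Xs), len(Xs[0]), len(Xs[0][0])
    return [[[sum(W[o][c] * Xs[c][i][j] for c in range(ci))
              for j in range(w)]
             for i in range(h)]
            for o in range(len(W))]

Xs = [[[1, 2], [3, 4]],      # channel 0
      [[10, 20], [30, 40]]]  # channel 1
W = [[1, 1],    # output channel 0: sum of the inputs
     [1, -1]]   # output channel 1: difference of the inputs
print(corr2d_1x1(Xs, W))
# [[[11, 22], [33, 44]], [[-9, -18], [-27, -36]]]
```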
2-D Convolution Layer Summary

• Input X : c_i × n_h × n_w
• Kernel W : c_o × c_i × k_h × k_w
• Bias B : c_o × c_i
• Output Y : c_o × m_h × m_w

Y = X ⋆ W + B

• Complexity (number of floating point operations, FLOPs):
O(c_i c_o k_h k_w m_h m_w)
• With c_i = c_o = 100, k_h = k_w = 5, m_h = m_w = 64:
about 1 GFLOP per example
• 10 layers, 1M examples: 10 PFLOPs
(CPU: 0.15 TFLOPS = 18h, GPU: 12 TFLOPS = 14min)
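The slide's 1 GFLOP figure can be reproduced by multiplying the six factors directly (counting one multiply-accumulate per kernel element and output position; the helper name is chosen here):

```python
def conv_macs(ci, co, kh, kw, mh, mw):
    """Multiply-accumulate count for one conv layer forward pass."""
    return ci * co * kh * kw * mh * mw

macs = conv_macs(ci=100, co=100, kh=5, kw=5, mh=64, mw=64)
print(macs)  # 1024000000 -> about 1 GFLOP, as on the slide
```

Scaling by 10 layers and 1M examples gives roughly 10^16 operations, matching the 10 PFLOPs estimate.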
Pooling Layer

Pooling
• Convolution is sensitive to position
• Example: detect vertical edges — a 1-pixel shift of the
input X yields a 0 output in Y at the old edge location
• We need some degree of invariance to translation


• Lighting, object positions, scales, appearance vary
among images
2-D Max Pooling

• Returns the maximal value in the


sliding window

max(0,1,3,4) = 4
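The sliding-window maximum can be sketched like the convolution earlier, with `max` (or the mean, for average pooling) replacing the dot product (pure-Python illustration; names chosen here):

```python
def pool2d(X, k, mode="max"):
    """k x k pooling with stride 1 over a 2-D list-of-lists input."""
    out_h, out_w = len(X) - k + 1, len(X[0]) - k + 1
    agg = max if mode == "max" else (lambda v: sum(v) / len(v))
    return [[agg([X[i + a][j + b] for a in range(k) for b in range(k)])
             for j in range(out_w)]
            for i in range(out_h)]

X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(pool2d(X, 2))         # [[4, 5], [7, 8]]   (max(0,1,3,4) = 4, ...)
print(pool2d(X, 2, "avg"))  # [[2.0, 3.0], [5.0, 6.0]]
```

Note there are no learnable parameters, unlike the convolution kernel.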
2-D Max Pooling

• Returns the maximal value in the sliding window

(figure: vertical edge detection → conv output → 2 x 2 max pooling,
tolerant to a 1-pixel shift)
Padding, Stride, and Multiple Channels

• Pooling layers have similar padding


and stride as convolutional layers
• No learnable parameters
• Apply pooling for each input channel to
obtain the corresponding output
channel

#output channels = #input channels


Average Pooling

• Max pooling: the strongest pattern signal in a window


• Average pooling: replace max with mean in max pooling
• The average signal strength in a window
(figure: max pooling vs. average pooling)
LeNet Architecture
Handwritten Digit
Recognition

MNIST
• Centered and scaled
• 60,000 training images
• 10,000 test images
• 28 x 28 images
• 10 classes

Y. LeCun, L. Bottou, Y. Bengio, P. Haffner (1998).
Gradient-based learning applied to document recognition.
The fully-connected layers at the end are expensive if we
have many outputs

(gluon-cv.mxnet.io)
LeNet in MXNet

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='tanh'))
    net.add(gluon.nn.AvgPool2D(pool_size=2))
    net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='tanh'))
    net.add(gluon.nn.AvgPool2D(pool_size=2))
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(500, activation='tanh'))
    net.add(gluon.nn.Dense(10))

loss = gluon.loss.SoftmaxCrossEntropyLoss()

(size and shape inference is automatic)
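To see what the Flatten layer receives, we can trace the spatial size through the network with the output-shape formula (a sketch assuming 28 × 28 MNIST inputs, no padding, and pooling stride equal to the pool size, which match the Gluon defaults above):

```python
def conv_out(n, k, p=0, s=1):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 28                     # MNIST image side
n = conv_out(n, k=5)       # Conv2D(20, 5):  28 -> 24
n = conv_out(n, k=2, s=2)  # AvgPool2D(2):   24 -> 12
n = conv_out(n, k=5)       # Conv2D(50, 5):  12 -> 8
n = conv_out(n, k=2, s=2)  # AvgPool2D(2):    8 -> 4
print(n, 50 * n * n)       # 4 800: Flatten feeds 800 features to Dense(500)
```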


Summary

• Convolutional layer
• Reduced model capacity compared to a dense layer
• Efficient at detecting spatial patterns
• High computational complexity
• Control output shape via padding, strides and
channels
• Max/average pooling layer
• Provides some degree of invariance to translation

