
CS490 – Advanced Topics in Computing

(Deep Learning)

Lecture 16: Convolutional Neural Networks (CNNs)

Dr. Muhammad Shahzad


[email protected]

Department Of Computing (DOC),


School of Electrical Engineering & Computer Science (SEECS),
National University of Sciences & Technology (NUST)

12/04/2021
Fully Connected Layer

CS490 – Advanced Topics in Computing (Deep Learning) Lecture 16: Convolutional Neural Networks (CNNs) 2
Motivation: Deep Learning on Images

How many entries does the weight matrix 𝑤¹ have, assuming that the first hidden layer has 1000 units?

▪ A 64 x 64 x 3 image gives a 12288-dimensional input vector, so 𝑤¹ has 1000 x 12288 ≈ 12 million entries
▪ A 1000 x 1000 x 3 image gives a 3-million-dimensional input, so the shape of 𝒘¹ is 1000 x 3M, i.e., adding 1000 biases, we need to train more than 3 billion parameters!
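The two counts can be checked with a few lines of plain Python (arithmetic only, no framework needed):

```python
# Parameters of a fully connected first layer with 1000 hidden units:
# weights = units x input_dim, plus one bias per unit.
units = 1000

small_input = 64 * 64 * 3        # 12288-dimensional input vector
large_input = 1000 * 1000 * 3    # 3-million-dimensional input vector

print(units * small_input + units)   # 12289000   (~12 million)
print(units * large_input + units)   # 3000001000 (~3 billion)
```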
Convolutional Neural Networks

▪ Similar to regular Neural Networks, except that they make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture
▪ These assumptions make the forward function more efficient to implement and vastly reduce the number of parameters in the network, e.g., by using a local receptive field and a parameter-sharing scheme

A ConvNet is made up of Layers. Every Layer has a simple API: it transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.
Layers used to build ConvNets

▪ A ConvNet architecture is, in the simplest case, a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)
▪ Three main types of layers are stacked to build ConvNet architectures:
► Convolutional Layer
► Pooling Layer
► Fully-Connected Layer (exactly as seen in regular Neural Networks)
▪ Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function
▪ Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don't)
▪ Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn't)
How does Convolution work?

Edge Detection Via Convolution Operation

Sliding the 3x3 vertical-edge filter

1 0 -1
1 0 -1
1 0 -1

over a 6x6 input with stride 1, each output entry is the sum of the elementwise products over the current 3x3 window, e.g.

3x1 + 1x1 + 2x1 + 0x0 + 5x0 + 7x0 + 1x(-1) + 8x(-1) + 2x(-1) = -5
0x1 + 5x1 + 7x1 + 1x0 + 8x0 + 2x0 + 2x(-1) + 9x(-1) + 5x(-1) = -4
1x1 + 8x1 + 2x1 + 2x0 + 9x0 + 5x0 + 7x(-1) + 3x(-1) + 1x(-1) = 0
...
1x1 + 6x1 + 2x1 + 7x0 + 2x0 + 3x0 + 8x(-1) + 8x(-1) + 9x(-1) = -16

The complete 4x4 output is:

 -5  -4   0   8
-10  -2   2   3
  0  -2  -4  -7
 -3  -2  -3 -16
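This worked example can be reproduced in NumPy. The 6x6 input below is reconstructed from the slide's figure (it reproduces every output entry shown); `conv2d_valid` is a straightforward sketch of valid cross-correlation, which is what deep-learning libraries call "convolution":

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid cross-correlation: slide the kernel over the image, no padding."""
    h, w = image.shape
    f = kernel.shape[0]
    out = np.zeros((h - f + 1, w - f + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# 6x6 input reconstructed from the slide's figure
image = np.array([[3, 0, 1, 2, 7, 4],
                  [1, 5, 8, 9, 3, 1],
                  [2, 7, 2, 5, 1, 3],
                  [0, 1, 3, 1, 7, 8],
                  [4, 2, 1, 6, 2, 8],
                  [2, 4, 5, 2, 3, 9]])

vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

print(conv2d_valid(image, vertical_edge))
# [[ -5  -4   0   8]
#  [-10  -2   2   3]
#  [  0  -2  -4  -7]
#  [ -3  -2  -3 -16]]
```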
How does Convolution work?

▪ Convolution of an image with a filter (also called kernel, window, mask, or template) with different coefficient values results in a new, filtered output image, e.g.,
► An image convolved with a filter with positive and equal coefficients results in a smoothed output image
▪ Similarly, we can also compute image derivatives to find edges in the input image

Any idea what could be the filter coefficients?
Edge Detection Via Convolution Operation

The natural derivative operator can be defined as the difference between the intensity of neighbouring pixels:

∂f/∂x = f(x + 1) − f(x)

Labelling a 3x3 neighbourhood as

z1 z2 z3
z4 z5 z6
z7 z8 z9

the corresponding filter coefficients are z5 = -1, z6 = -1, z8 = 1, z9 = 1.
Edge Detection Via Convolution Operation

[Figure: the vertical-edge filter responds to vertical edges; its transpose responds to horizontal edges]

On a window whose left column is 10 and whose remaining values are 0, the vertical-edge filter gives a strong response at the boundary:

10x1 + 10x1 + 10x1 + 0x0 + 0x0 + 0x0 + 0x(-1) + 0x(-1) + 0x(-1) = 30
Learning To Detect Edges

Hand-crafted vertical-edge filters differ in how they weight the centre row: Prewitt uses 1 (all rows equal, as above), Sobel uses 2, and Scharr uses 10 with 3 on the outer rows:

1 0 -1        3  0  -3
2 0 -2       10  0 -10
1 0 -1        3  0  -3
 Sobel          Scharr

With the rise of deep learning, it is possible to automatically learn these filter coefficients more robustly via backpropagation for a specific task, e.g., edge detection.
Spatial Dimensions: A Closer Look

7x7 input (spatially); assume a 3x3 filter applied with stride 1. Sliding the filter one position at a time, it fits in 5 locations across and 5 down.

Output dimension? 5x5 output.
Spatial Dimensions: A Closer Look

7x7 input (spatially); assume a 3x3 filter applied with stride 2.

Output dimension? 3x3 output.
Spatial Dimensions: A Closer Look

7x7 input (spatially); assume a 3x3 filter applied with stride 3: it doesn't fit! We cannot apply a 3x3 filter to a 7x7 input with stride 3.
Spatial Dimensions: A Closer Look

Output size?

(N - F) / stride + 1

E.g., with N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33

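The formula can be wrapped in a small helper (a sketch; the name `conv_output_size` is ours):

```python
def conv_output_size(n, f, stride, pad=0):
    """Spatial output size (N + 2P - F) / S + 1, or None if the filter doesn't fit."""
    span = n + 2 * pad - f
    if span % stride != 0:
        return None          # filter positions don't tile the input evenly
    return span // stride + 1

print(conv_output_size(7, 3, 1))   # 5
print(conv_output_size(7, 3, 2))   # 3
print(conv_output_size(7, 3, 3))   # None (doesn't fit)
```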
Common Practice: Zero Padding At Borders

(N+2P-F)/stride + 1
Valid vs Same Convolutions

(N + 2P - F)/stride + 1

▪ Valid convolution: the spatial dimensions of the image shrink after convolution
▪ Same convolution: the spatial dimensions of the image stay the same after convolution
► Achieved via zero-padding:

(N + 2P - F)/S + 1 = N

For S = 1,
N + 2P - F + 1 = N
=> P = (F - 1)/2
Convolution Layer

Convolution Over Volumes

A 6x6x3 input convolved with a 3x3x3 filter produces a 4x4 output.

Note we now have 27 learnable coefficients.
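A minimal NumPy sketch of convolution over a volume (random values stand in for the slide's figures; the name `conv_volume` is ours):

```python
import numpy as np

def conv_volume(volume, kernel):
    """Convolve an HxWxC volume with an fxfxC filter -> one 2D activation map."""
    h, w, c = volume.shape
    f = kernel.shape[0]
    assert kernel.shape == (f, f, c), "filter depth must match input depth"
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sum over all f*f*C products: here 3*3*3 = 27 coefficients
            out[i, j] = np.sum(volume[i:i + f, j:j + f, :] * kernel)
    return out

rgb = np.random.rand(6, 6, 3)        # 6x6x3 input
filt = np.random.rand(3, 3, 3)       # 3x3x3 filter: 27 learnable coefficients
print(conv_volume(rgb, filt).shape)  # (4, 4)
```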
Convolutional Layer: Neuron View

Receptive Field

[Figures: each output neuron is connected only to a local region of the input; that region is the neuron's receptive field]
Single Convolutional Layer

With 6 filters of size 5x5x3, 𝑤¹ has 75 x 6 entries:

𝑧¹ = 𝑤¹𝑎⁰ + 𝑏¹
𝑎¹ = 𝑔(𝑧¹)
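The same equations in NumPy, assuming g is ReLU and using random filter values (a sketch, not the slide's exact figures):

```python
import numpy as np

def conv_layer(a_prev, filters, biases):
    """Forward pass of one conv layer: K filters of shape fxfxC, then ReLU."""
    h, w, c = a_prev.shape
    k, f = filters.shape[0], filters.shape[1]
    out = np.zeros((h - f + 1, w - f + 1, k))
    for n in range(k):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # z = w.a + b over the local window, then g = ReLU
                z = np.sum(a_prev[i:i + f, j:j + f, :] * filters[n]) + biases[n]
                out[i, j, n] = max(z, 0.0)
    return out

a0 = np.random.rand(32, 32, 3)
w1 = np.random.randn(6, 5, 5, 3)     # 6 filters of 5x5x3 -> 75 x 6 weight entries
b1 = np.zeros(6)
print(conv_layer(a0, w1, b1).shape)  # (28, 28, 6)
```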
ConvNets

Flatten the last volume, e.g., a 24 x 24 x 10 volume into a 5760-d vector of neurons, and feed it to a Fully Connected (FC) layer followed by a softmax unit for prediction.
Example

Input volume: 32x32x3
10 5x5x3 filters with stride 1, pad 2

Output volume size?
(32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10

Number of parameters in this layer?
Each filter has 5*5*3 + 1 = 76 params (+1 for the bias)
=> 76*10 = 760
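Both answers can be computed with a small helper (our naming):

```python
def conv_layer_stats(n, c_in, f, k, stride, pad):
    """Output spatial size and parameter count for a conv layer."""
    out = (n + 2 * pad - f) // stride + 1
    params = (f * f * c_in + 1) * k      # +1 bias per filter
    return out, params

size, params = conv_layer_stats(n=32, c_in=3, f=5, k=10, stride=1, pad=2)
print(size, params)   # 32 760
```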
ConvNet Dimensions

Common settings:
K = (powers of 2, e.g. 32, 64, 128, 512)
- F = 3, S = 1, P = 1
- F = 5, S = 1, P = 2
- F = 5, S = 2, P = ? (whatever fits)
- F = 1, S = 1, P = 0

1x1 convolution

[Figure: convolving a 6x6 input with a single 1x1 filter whose coefficient is 2 simply scales every value by 2]

1x1 convolution layer
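Over a volume, a 1x1 convolution computes at every pixel a weighted sum across the input channels, so a bank of K filters of shape 1x1xC maps an HxWxC volume to HxWxK. A sketch (the name `conv_1x1` and the einsum formulation are ours):

```python
import numpy as np

def conv_1x1(volume, filters):
    """1x1 convolution: a per-pixel linear map across channels (HxWxC -> HxWxK)."""
    # filters has shape (C, K); einsum applies it at every spatial position
    return np.einsum('hwc,ck->hwk', volume, filters)

x = np.random.rand(6, 6, 32)
w = np.random.rand(32, 8)            # 8 filters, each of shape 1x1x32
print(conv_1x1(x, w).shape)          # (6, 6, 8)

# Special case from the slide: one 1x1 filter with coefficient 2 doubles the input
img = np.arange(36, dtype=float).reshape(6, 6, 1)
print(np.array_equal(conv_1x1(img, np.array([[2.0]])), img * 2))  # True
```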
Pooling Layer

▪ Makes the representations smaller and more manageable
▪ Operates over each activation map independently
MAX Pooling

Average-Pooling

Average-pooling replaces each window with its mean; in the slide's example, pooling a 4x4 input with F = 2, S = 2 gives the 2x2 output:

3.75  1.25
4     2
MAX-Pooling

What would be the result of applying Max-Pool on the slide's 5x5 input using F = 3 & S = 1?

Each 3x3 window is replaced by its maximum, giving the 3x3 output:

9 9 5
9 9 5
8 6 9
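Pooling is easy to sketch in NumPy. The 5x5 input below is reconstructed from the slide's figure (max-pooling it with F = 3, S = 1 reproduces the 3x3 answer above); `pool2d` is our name:

```python
import numpy as np

def pool2d(x, f, stride, mode="max"):
    """Pool a 2D map with an fxf window; 'max' or 'mean' (average) pooling."""
    h, w = x.shape
    out_h = (h - f) // stride + 1
    out_w = (w - f) // stride + 1
    op = np.max if mode == "max" else np.mean
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = op(x[i * stride:i * stride + f, j * stride:j * stride + f])
    return out

# 5x5 input reconstructed from the slide's figure
x = np.array([[1., 3., 2., 1., 3.],
              [2., 9., 1., 1., 5.],
              [1., 3., 2., 3., 2.],
              [8., 3., 5., 1., 0.],
              [5., 6., 1., 2., 9.]])

print(pool2d(x, f=3, stride=1, mode="max"))
# [[9. 9. 5.]
#  [9. 9. 5.]
#  [8. 6. 9.]]
```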
Pooling Dimensions

Common settings:
F = 2, S = 2
F = 3, S = 2

Example: ConvNets

Summary of Typical ConvNet Design

▪ ConvNets stack CONV, POOL, and FC layers
▪ Trend towards smaller filters and deeper architectures
▪ Trend towards getting rid of POOL/FC layers (just CONV)
▪ Historically, architectures looked like

[(CONV-RELU)*N - POOL?]*M - (FC-RELU)*K, SOFTMAX

where N is usually up to ~5, M is large, and 0 <= K <= 2

▪ However, recent advances such as ResNet/GoogLeNet have challenged this paradigm
CNNs vs FC Neural Networks

Two major advantages of CNNs over FC neural networks:

▪ Parameter sharing
► A feature detector (such as a vertical edge detector) that is useful in one part of the image is probably useful in another part of the image (translational invariance)

A 32 x 32 x 3 input (= 3072 values) convolved with 6 filters of 5 x 5 x 3 results in a 28 x 28 x 6 volume (= 4704 values). For a regular neural network with dense connections, this means you have 3072 x 4704 ≈ 14 million weights.

How many parameters do we need for the Conv layer?
(75 + 1) x 6 = 456 only
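The comparison as arithmetic:

```python
# Dense vs convolutional connection counts for a 32x32x3 -> 28x28x6 layer
in_size = 32 * 32 * 3           # 3072
out_size = 28 * 28 * 6          # 4704

dense_weights = in_size * out_size    # 14450688 ~ 14 million
conv_params = (5 * 5 * 3 + 1) * 6     # (75 weights + 1 bias) x 6 filters = 456

print(dense_weights, conv_params)     # 14450688 456
```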
▪ Sparsity of connections (i.e., local receptive field)
► In each layer, each output value depends only on a small number of inputs
Acknowledgements

Various contents in this presentation have been taken from different books, lecture notes (particularly CS231n Stanford, MIT 6.S191, deeplearning.ai & neuralnetworksanddeeplearning.com), and the web. These belong solely to their owners and are used here only to clarify various educational concepts. No copyright infringement is intended.
