Convolutional Neural Networks: Convolutions, Pooling and CNNs. Neural Architectures for Computer Vision
Alexey Artemov¹,²
¹Skoltech ²National Research University Higher School of Economics
Lecture overview
• Digital images and processing by neural networks
• Image processing operations: convolutions and pooling
• Convolutional neural networks from scratch
• Modern computer vision architectures: AlexNet, VGG, Inception and ResNets
Images as inputs to neural networks
Digital representation of an image
• A grayscale image is a matrix of pixels (picture elements).
• The dimensions of this matrix are called the image resolution (e.g. 300 x 300).
• Each pixel stores its brightness (or intensity), ranging from 0 to 255: intensity 0 corresponds to black, 255 to white.
Image as a neural network input
• Normalize input pixels: $x_{norm} = \frac{x}{255} - 0.5$
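A minimal NumPy sketch of this normalization (the function name and the 8-bit input assumption are illustrative, not from the lecture):

```python
import numpy as np

def normalize_image(img_uint8: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel intensities [0, 255] to the range [-0.5, 0.5]."""
    return img_uint8.astype(np.float32) / 255.0 - 0.5
```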
Why not MLP?
• Let's say we want to train a "cat detector" with a fully connected network.
• On a given training image, the red weights $w_{ij}$ (those connected to the region where the cat appears) will change a little to better detect a cat. Since the weights are tied to fixed input locations, a cat appearing elsewhere in the image would have to be learned separately.
Convolutions will help!
• A convolution is a dot product of a kernel (or filter) and a patch of an image (local receptive field) of the same size.
[Figure: a 4x4 input, a 2x2 kernel, and the resulting output feature map; each output value is the dot product of the kernel with one image patch (local receptive field).]
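A minimal NumPy sketch of this operation (a "valid" convolution without kernel flipping, i.e. cross-correlation, as is conventional in deep learning; the names are illustrative):

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image; each output value is the dot
    product of the kernel with one image patch (local receptive field)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out
```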
Convolutions have been used for a while
Classic kernels, applied to the original image:

Edge detection:
-1 -1 -1
-1  8 -1
-1 -1 -1

Sharpening (doesn't change an image for solid fills; adds a little intensity on the edges):
 0 -1  0
-1  5 -1
 0 -1  0

Blurring:
(1/9) ×
1 1 1
1 1 1
1 1 1
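These filters can be tried directly with SciPy's 2-D convolution (a sketch; the grayscale input image here is a random placeholder):

```python
import numpy as np
from scipy.signal import convolve2d

edge    = np.array([[-1, -1, -1], [-1,  8, -1], [-1, -1, -1]])
sharpen = np.array([[ 0, -1,  0], [-1,  5, -1], [ 0, -1,  0]])
blur    = np.ones((3, 3)) / 9.0

image = np.random.rand(300, 300)  # placeholder for a real grayscale image
edges = convolve2d(image, edge, mode='same', boundary='symm')
```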
Convolution is similar to correlation
[Figure: an input image containing a pattern, a kernel matching that pattern, and the resulting output; the output response is highest where the image patch correlates with the kernel.]
Convolution is translation equivariant
[Figure: the same kernel applied to an input whose pattern has been shifted; the output feature map shifts by the same amount.]
• If the input pattern is translated, the convolution output is translated identically: the kernel responds to the pattern wherever it appears.
Backpropagation for CNN
Gradients are first calculated as if the kernel weights were not shared. Let $a$, $b$, $c$, $d$ denote the independent copies of the shared kernel weight $w_4$ (the kernel weights are $w_1, w_2, w_3, w_4$), one copy per position where the kernel is applied. The per-copy updates would be:

$$a = a - \gamma \frac{\partial L}{\partial a}, \qquad b = b - \gamma \frac{\partial L}{\partial b}, \qquad c = c - \gamma \frac{\partial L}{\partial c}, \qquad d = d - \gamma \frac{\partial L}{\partial d}$$

Because the weight is in fact shared, the gradients of all its copies are summed in the actual update:

$$w_4 = w_4 - \gamma \left( \frac{\partial L}{\partial a} + \frac{\partial L}{\partial b} + \frac{\partial L}{\partial c} + \frac{\partial L}{\partial d} \right)$$
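A toy NumPy check of this idea (a sketch with an illustrative loss, not from the lecture): the gradient of a shared kernel weight accumulates contributions from every position where the kernel was applied.

```python
import numpy as np

x = np.arange(9, dtype=float).reshape(3, 3)   # 3x3 input
w = np.array([[1., 2.], [3., 4.]])            # shared 2x2 kernel

# Forward pass: 2x2 "valid" convolution -> 2x2 output; toy loss L = sum(out)
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = np.sum(x[i:i+2, j:j+2] * w)
L = out.sum()

# Backward pass: dL/dout = 1 everywhere, so each kernel weight sums the
# gradients from all the patches it touched (as if copies, then summed).
grad_w = np.zeros_like(w)
for i in range(2):
    for j in range(2):
        grad_w += 1.0 * x[i:i+2, j:j+2]
print(grad_w)  # gradient for the shared kernel = sum over positions
```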
Convolutional vs fully connected layer
• In a convolutional layer, the same kernel is used for every output neuron; this way we share the parameters of the network and train a better model.
• For a 300x300 input, a 300x300 output and a 5x5 kernel, a convolutional layer has 26 parameters (25 weights + 1 bias), while a fully connected layer has 8.1×10⁹ parameters (each output is a perceptron over all 90,000 inputs).
• A convolutional layer can be viewed as a special case of a fully connected layer in which all the weights outside the local receptive field of each neuron equal 0 and kernel parameters are shared between neurons.
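A quick sanity check of these counts (a sketch; the fully connected count, as in the slide's estimate, omits biases):

```python
conv_params = 5 * 5 + 1                # 25 shared weights + 1 bias = 26
fc_params = (300 * 300) * (300 * 300)  # each of 90,000 outputs sees all 90,000 inputs
print(conv_params, fc_params)          # 26, 8100000000 (= 8.1e9)
```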
Intermediate summary
• We've introduced a convolutional layer, which works better than a fully connected layer for images: it has fewer parameters and acts the same way on every patch of the input.
• This layer will be used as a building block for larger neural networks!
Building convolutional neural networks for vision
A color image input
• Let's say we have a color image as an input, which is a $W \times H \times C_{in}$ tensor (multidimensional array), where
– $W$ is the image width,
– $H$ is the image height,
– $C_{in}$ is the number of input channels (e.g. 3 RGB channels).
[Figure: convolving the $W \times H \times C_{in}$ input with a kernel of size $W_k \times H_k \times C_{in}$ produces a single 2-D feature map.]
One kernel is not enough!
• We want to train $C_{out}$ kernels of size $W_k \times H_k \times C_{in}$.
• With a stride of 1 and enough zero padding we can have $W \times H \times C_{out}$ output neurons.
[Figure: each of the $C_{out}$ kernels produces its own $W \times H$ feature map; stacking the maps gives a $W \times H \times C_{out}$ output volume.]
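In PyTorch terms (a sketch with the sizes above; PyTorch orders tensors as N×C×H×W):

```python
import torch
import torch.nn as nn

# C_in = 3, C_out = 64, 3x3 kernels, stride 1, padding that preserves W and H
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 300, 300)   # one 300x300 RGB image
y = conv(x)
print(y.shape)                    # torch.Size([1, 64, 300, 300]) -> C_out feature maps
```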
One convolutional layer is not enough!
• Let's say neurons of the 1st convolutional layer look at patches of the image of size 3x3.
• What if an object of interest is bigger than that?
• We need a 2nd convolutional layer on top of the 1st!
[Figure: a neuron of the 2nd 3x3 convolutional layer has a 5x5 receptive field on the input.]
Receptive field after N convolutional layers
[Figure: stacking 1x3 convolutional layers on 1-dimensional inputs; the receptive field is 1x3 after the 1st layer, 1x5 after the 2nd, 1x7 after the 3rd, and 1x9 after the 4th.]
• If we stack $N$ convolutional layers with the same kernel size 3x3, the receptive field at the $N$-th layer will be $(2N+1) \times (2N+1)$.
• It looks like we need to stack a lot of convolutional layers! To be able to identify objects as big as a 300x300 input image we would need 150 convolutional layers!
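A quick check of this growth (a sketch; stride-1 convolutions only, where each layer adds kernel − 1 pixels to the field):

```python
def receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field after stacking stride-1 convolutions:
    each layer adds (kernel - 1) pixels in each dimension."""
    return 1 + num_layers * (kernel - 1)

print(receptive_field(4))    # 9   -> the 1x9 field after 4 layers above
print(receptive_field(150))  # 301 -> enough to cover a 300x300 image
```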
We need to grow receptive field faster!
• We can increase the stride (the step of the sliding window) in our convolutional layer to reduce the output dimensions!
[Figure: a 2x2 convolution with stride 2 applied to a 4x4 input produces a 2x2 output; each output neuron looks at a non-overlapping 2x2 patch.]
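In general, for input width $W$, kernel width $W_k$, padding $P$ and stride $S$, the output width is $\lfloor (W - W_k + 2P)/S \rfloor + 1$ (a standard formula, not stated on the slide); for the figure above: $(4 - 2 + 2 \cdot 0)/2 + 1 = 2$.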
How do we maintain translation invariance?
[Figure: two inputs containing the same pattern at different locations are convolved with the same kernel. The output feature maps differ (the response moves together with the pattern), but the maximum over each output is 2 in both cases: the max didn't change.]
Pooling layer will help!
• This layer works like a convolutional layer but doesn't have a kernel; instead, it calculates the maximum or average of input patch values.
[Figure: 2x2 max pooling with stride 2 downsamples a 200x200x64 volume to 100x100x64. On a single depth slice:
1 1 1 4
2 6 5 8   →   6 8
3 2 1 0       3 5
1 1 3 5]
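A minimal NumPy sketch of 2x2 max pooling with stride 2 (assumes even input dimensions; it reproduces the single-depth-slice example above):

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on a 2-D array (H and W must be even)."""
    h, w = x.shape
    # Group the array into non-overlapping 2x2 blocks, then take each block's max.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 1, 4],
              [2, 6, 5, 8],
              [3, 2, 1, 0],
              [1, 1, 3, 5]])
print(max_pool_2x2(x))  # [[6 8]
                        #  [3 5]]
```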
Backpropagation for max pooling layer
• Strictly speaking, maximum is not a differentiable function!
[Figure: for the patch (6, 8; 3, 5) the maximum is 8. Changing a non-maximal input (6 → 7) leaves the maximum at 8, while changing the maximal input (8 → 9) changes the maximum to 9.]
• So the gradient flows only through the input that achieved the maximum; all other inputs get zero gradient.
Putting it all together into a simple CNN
• LeNet-5 architecture (1998) for handwritten digit recognition on the MNIST dataset:
[Figure: 32x32 input → conv1 (5x5, 6 feature maps) → 28x28x6 → pool1 (2x2) → 14x14x6 → conv2 (5x5, 16 feature maps) → 10x10x16 → pool2 (2x2) → 5x5x16 → fc1 (120) → fc2 (84) → fc3 + softmax (10 outputs).]
http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
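A PyTorch-style sketch of this architecture (an approximation: the original LeNet-5 used sigmoid-like nonlinearities and average-pooling subsampling, replaced here by ReLU and max pooling for brevity):

```python
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1 -> 28x28x6
    nn.ReLU(),
    nn.MaxPool2d(2),                  # -> 14x14x6
    nn.Conv2d(6, 16, kernel_size=5),  # -> 10x10x16
    nn.ReLU(),
    nn.MaxPool2d(2),                  # -> 5x5x16
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),       # fc1
    nn.ReLU(),
    nn.Linear(120, 84),               # fc2
    nn.ReLU(),
    nn.Linear(84, 10),                # fc3; softmax is applied in the loss
)
```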
Learning deep representations
• Neurons of deep convolutional layers learn complex representations that can be used as features for classification with an MLP.
[Figure: stacked convolutional layers perform automatic feature extraction, producing good features for an MLP classifier.]
Neural architectures for computer vision
ImageNet classification dataset
• 1000 classes, 1.2 million labeled photos
• Human top 5 error: ~5%
Jon Shlens, https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html

Inception block
[Figure: a 1x1 convolution followed by ReLU maps $C_{in}$ input channels to $C_{out}$ output channels at every spatial position.]
Basic Inception block
• All operations inside a block use stride 1 and enough padding to output the same spatial dimensions ($W \times H$) of the feature map.
• 4 different feature maps are concatenated along the depth dimension at the end.
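A simplified PyTorch sketch of such a block (branch channel counts are illustrative, not GoogLeNet's exact ones, and the 1x1 reductions before the larger convolutions are omitted):

```python
import torch
import torch.nn as nn

class BasicInceptionBlock(nn.Module):
    """Four parallel branches with stride 1 and 'same' padding,
    concatenated along the channel (depth) dimension."""
    def __init__(self, c_in: int):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 16, kernel_size=1)              # 1x1 branch
        self.b2 = nn.Conv2d(c_in, 16, kernel_size=3, padding=1)   # 3x3 branch
        self.b3 = nn.Conv2d(c_in, 16, kernel_size=5, padding=2)   # 5x5 branch
        self.b4 = nn.Sequential(                                  # pooling branch
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(c_in, 16, kernel_size=1),
        )

    def forward(self, x):
        # All branches keep W x H, so they can be concatenated on depth.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```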
Filter decomposition
• It's known that a Gaussian blur filter can be decomposed into two 1-dimensional filters:
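For example, a standard 3x3 Gaussian-like blur kernel factors as an outer product of two 1-D filters (a common example; the lecture's exact kernel is not shown on the slide):

$$\frac{1}{16}\begin{pmatrix}1 & 2 & 1\\ 2 & 4 & 2\\ 1 & 2 & 1\end{pmatrix} = \frac{1}{4}\begin{pmatrix}1\\ 2\\ 1\end{pmatrix} \cdot \frac{1}{4}\begin{pmatrix}1 & 2 & 1\end{pmatrix}$$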
ResNet (2015)
• Introduces residual connections
• ImageNet top 5 error: 4.5% (single model), 3.5% (ensemble)
• 152 layers; a few 7x7 convolutional layers, the rest are 3x3; batch normalization, max and average pooling.
• 60 million parameters
• Trains on 8 GPUs for 2-3 weeks.
Residual connections
• We create output channels by adding a small delta $F(x)$ to the original input channels $x$, so the block outputs $x + F(x)$.
• This way we can stack thousands of layers, and gradients do not vanish thanks to the residual connections.
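A minimal residual block sketch in PyTorch (layer sizes are illustrative; a real ResNet block also handles dimension changes with a projection shortcut):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes x + F(x), where F is a small stack of convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.f(x))  # identity shortcut + residual delta
```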
Summary
• By stacking more convolutional and pooling layers you can reduce the error, as in AlexNet or VGG.
• But you cannot do that forever: you need to utilize new kinds of layers, like the Inception block or residual connections.
• You've probably noticed that one needs a lot of time to train a neural network!