
Convolutional Neural Networks (CNN)

Ms. Anisha Mahato
Assistant Professor (CSE Specialization)
Computer Vision Problems
• Image classification (e.g., cat or not (0/1) on a 64 x 64 image)
• Object detection
• Neural style transfer
Deep Learning on large images
A 64 x 64 x 3 cat image is small, but a 1000 x 1000 x 3 image has 3 million input values. Feeding these into a fully connected layer of 1000 units would require 3 million x 1000 = 3 billion trainable weights.

Problems:
1. Too many parameters to train
2. Positional information is lost
3. Chance of overfitting
Edge Detection
Images contain vertical edges and horizontal edges; separate filters detect each.
Vertical edge detection
A 6 x 6 image convolved (*) with a 3 x 3 filter/kernel produces a 4 x 4 feature map. The filter slides over the image one position at a time; at each position, the element-wise products are summed into a single output value.

Input (6 x 6):
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9

Filter / Kernel (3 x 3):
1 0 -1
1 0 -1
1 0 -1

Top-left position: 3x1 + 0x0 + 1x(-1) + 1x1 + 5x0 + 8x(-1) + 2x1 + 7x0 + 2x(-1) = -5.

Sliding the filter across all positions gives the full feature map (4 x 4):
 -5  -4   0   8
-10  -2   2   3
  0  -2  -4  -7
 -3  -2  -3 -16
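The sliding-window arithmetic above can be checked with a short NumPy sketch. `conv2d` is an illustrative helper (a plain cross-correlation loop, which is what deep-learning libraries call "convolution"); the input, filter, and feature map are the ones from the slides:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel, sum the products."""
    n, f = image.shape[0], kernel.shape[0]
    k = n - f + 1                          # output size: n - f + 1
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

image = np.array([[3, 0, 1, 2, 7, 4],
                  [1, 5, 8, 9, 3, 1],
                  [2, 7, 2, 5, 1, 3],
                  [0, 1, 3, 1, 7, 8],
                  [4, 2, 1, 6, 2, 8],
                  [2, 4, 5, 2, 3, 9]])
vertical = np.array([[1, 0, -1]] * 3)      # vertical edge detector

feature_map = conv2d(image, vertical)      # 4 x 4 feature map
```

The top-left entry comes out as -5, matching the worked example.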
Why does this detect vertical edges? Consider a 6 x 6 image that is bright (10) on the left half and dark (0) on the right half, i.e., every row is 10 10 10 0 0 0. Convolving with the vertical filter gives

0 30 30 0
0 30 30 0
0 30 30 0
0 30 30 0

The band of large values in the middle of the output marks the vertical edge. If the image is instead dark on the left and bright on the right (every row is 0 0 0 10 10 10), the output becomes

0 -30 -30 0
0 -30 -30 0
0 -30 -30 0
0 -30 -30 0

so the sign of the response also encodes the light-to-dark direction of the edge.
Horizontal edge detection

Vertical filter:   Horizontal filter:
1 0 -1             1  1  1
1 0 -1             0  0  0
1 0 -1            -1 -1 -1

Applying the horizontal filter to an image whose top-left and bottom-right quadrants are bright:

10 10 10  0  0  0
10 10 10  0  0  0
10 10 10  0  0  0
 0  0  0 10 10 10
 0  0  0 10 10 10
 0  0  0 10 10 10

gives

  0   0   0   0
 30  10 -10 -30
 30  10 -10 -30
  0   0   0   0
Learning to detect edges

Hand-designed edge filters:

Sobel filter:   Scharr filter:
1 0 -1           3  0  -3
2 0 -2          10  0 -10
1 0 -1           3  0  -3

Rather than hand-picking the nine numbers, a CNN treats them as trainable weights and learns them by backpropagation:

w1 w2 w3
w4 w5 w6
w7 w8 w9
Why convolutions

• Parameter sharing: a feature detector (such as a vertical edge detector) that is useful in one part of the image is probably useful in another part of the image.

• Sparsity of connections: in each layer, each output value depends only on a small number of inputs.

• Translation invariance: shared weights across different spatial locations enable the network to recognize the same pattern in various positions, reducing sensitivity to feature location.
Fewer parameters
With a 3 x 3 filter on a 6 x 6 image, each output value is connected to only 9 inputs rather than to all 36 (not fully connected), and the same 9 weights are reused at every position (shared weights). Together, sparse connectivity and weight sharing give far fewer parameters than a fully connected layer.


Padding

Convolving a 6 x 6 image (n x n) with a 3 x 3 filter (f x f) gives a 4 x 4 output (k x k):

k = n - f + 1 = 6 - 3 + 1 = 4

Two problems with this:
• Reduction in spatial dimension
• Loss of information at the image boundary region (border pixels appear in fewer filter positions)
Padding the 6 x 6 image with a one-pixel border of zeros (padding p = 1) makes it 8 x 8, and the output returns to 6 x 6:

0 0 0 0 0 0 0 0
0 3 0 1 2 7 4 0
0 1 5 8 9 3 1 0
0 2 7 2 5 1 3 0
0 0 1 3 1 7 8 0
0 4 2 1 6 2 8 0
0 2 4 5 2 3 9 0
0 0 0 0 0 0 0 0

k = n + 2p - f + 1 = 6 + 2x1 - 3 + 1 = 6
Valid and Same Convolution

Valid convolution: no padding → shrinks the image.

Same convolution: pad the image so that the output size is the same as the input size:
n + 2p - f + 1 = n
→ 2p = f - 1
→ p = (f - 1)/2
Therefore, if f = 3, then p = (3 - 1)/2 = 1; if f = 5, then p = (5 - 1)/2 = 2.

Filter size is usually odd:
• Central pixel as reference
• Symmetry around the center
• Avoids asymmetric padding
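The same-padding rule is easy to encode; `same_padding` and `output_size` are hypothetical helper names, not a library API:

```python
def same_padding(f):
    """Padding that makes an odd f x f filter preserve spatial size (stride 1)."""
    assert f % 2 == 1, "same padding is symmetric only for odd filter sizes"
    return (f - 1) // 2

def output_size(n, f, p):
    """k = n + 2p - f + 1 (stride 1)."""
    return n + 2 * p - f + 1
```

For f = 3 this gives p = 1, and a 6 x 6 input stays 6 x 6, matching the slide.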
Strided Convolution

With stride s = 2, the filter jumps two positions at a time, both horizontally and vertically. Convolving a 7 x 7 image with a 3 x 3 filter at stride 2 gives a 3 x 3 output.

Input (7 x 7):
2 3 7 4 6 2 9
6 6 9 8 7 4 3
3 4 8 3 8 9 7
7 8 3 6 6 3 4
4 2 1 8 3 4 6
3 2 4 1 9 8 3
0 1 3 9 2 1 4

Filter (3 x 3):
 3 4 4
 1 0 2
-1 0 3

Output (3 x 3):
91 100  88
69  91 117
44  72  74

Output size: k = (n + 2p - f)/s + 1 = (7 + 0 - 3)/2 + 1 = 2 + 1 = 3, with padding p = 0 and stride s = 2.
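A stride-aware version of the convolution loop recomputes every entry of the 7 x 7 example from the input and filter; `conv2d_strided` is an illustrative helper, not a library routine:

```python
import numpy as np

def conv2d_strided(image, kernel, stride):
    """Valid cross-correlation with a stride: k = (n - f) // stride + 1."""
    n, f = image.shape[0], kernel.shape[0]
    k = (n - f) // stride + 1
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r+f, c:c+f] * kernel)
    return out

image = np.array([[2, 3, 7, 4, 6, 2, 9],
                  [6, 6, 9, 8, 7, 4, 3],
                  [3, 4, 8, 3, 8, 9, 7],
                  [7, 8, 3, 6, 6, 3, 4],
                  [4, 2, 1, 8, 3, 4, 6],
                  [3, 2, 4, 1, 9, 8, 3],
                  [0, 1, 3, 9, 2, 1, 4]])
kernel = np.array([[ 3, 4, 4],
                   [ 1, 0, 2],
                   [-1, 0, 3]])

out = conv2d_strided(image, kernel, stride=2)   # 3 x 3 output
```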
General Formulae
• w is the width of the input image
• h is the height of the input image
• p_w is the padding applied to the width of the input
• p_h is the padding applied to the height of the input
• f_w is the width of the convolutional filter
• f_h is the height of the convolutional filter
• s_w is the stride used in the horizontal direction
• s_h is the stride used in the vertical direction

• Output image width = ⌊(w + 2p_w - f_w) / s_w⌋ + 1
• Output image height = ⌊(h + 2p_h - f_h) / s_h⌋ + 1
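The formulae above reduce to one helper (the hypothetical name `conv_output_dim` is mine; integer division gives the floor used when the stride does not divide evenly):

```python
def conv_output_dim(size, filt, pad, stride):
    """Output spatial dimension: floor((size + 2*pad - filt) / stride) + 1."""
    return (size + 2 * pad - filt) // stride + 1
```

It reproduces every example in the deck: a 6 x 6 input with a 3 x 3 filter gives 4 unpadded, 6 with p = 1, and a 7 x 7 input at stride 2 gives 3.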
Volume Convolution

A 6 x 6 x 3 image (height x width x channels) convolved with a 3 x 3 x 3 kernel gives a 4 x 4 output. The kernel's depth must match the number of channels in the input; all 27 products are summed into a single output value, so each kernel produces one 2-D feature map.

Multiple Kernels

Convolving the same 6 x 6 x 3 image with two different 3 x 3 x 3 kernels gives two 4 x 4 maps, which are stacked into a 4 x 4 x 2 output.

In general:
(n x n x nc) * (f x f x nc) → (n - f + 1) x (n - f + 1) x nf
where nc is the number of channels and nf is the number of kernels.
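A sketch of the shape rule, using random data purely for illustration (`conv_volume` is a hypothetical helper: each kernel sums over all channels to produce one map, and the maps are stacked):

```python
import numpy as np

def conv_volume(image, kernels):
    """image: (n, n, n_c); kernels: list of (f, f, n_c) arrays.
    Output: (n - f + 1, n - f + 1, number of kernels)."""
    n, _, n_c = image.shape
    f = kernels[0].shape[0]
    k = n - f + 1
    out = np.zeros((k, k, len(kernels)))
    for m, kern in enumerate(kernels):
        assert kern.shape[2] == n_c        # channel depths must match
        for i in range(k):
            for j in range(k):
                out[i, j, m] = np.sum(image[i:i+f, j:j+f, :] * kern)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))
kernels = [rng.standard_normal((3, 3, 3)) for _ in range(2)]
maps = conv_volume(image, kernels)         # shape (4, 4, 2)
```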
One Neural Network Layer

z(1) = W(1) a(0) + b(1)
a(1) = g(z(1))

One Convolution Layer

A convolution layer follows the same pattern. The input volume a(0) (6 x 6 x 3) is convolved with each 3 x 3 x 3 kernel (the kernels play the role of W(1)); a bias is added to each resulting 4 x 4 map (W1(1) a(0) + b1(1) and W2(1) a(0) + b2(1) form z(1)); and a non-linearity such as ReLU is applied element-wise (the activation function g). With two kernels, the layer's output a(1) is 4 x 4 x 2.
Number of parameters in one layer

If you have 10 kernels that are 3 x 3 x 3 in one layer of a neural network, how many parameters does that layer have?

For each kernel there are 3 x 3 x 3 = 27 parameters (weights) + 1 parameter (bias) = 28 parameters.

For 10 kernels, there will be 28 x 10 = 280 trainable parameters (independent of input image size).
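The counting rule fits in one line (`conv_layer_params` is a hypothetical name):

```python
def conv_layer_params(f, n_c_prev, n_filters):
    """Weights per kernel = f * f * n_c_prev, plus one bias per kernel."""
    return (f * f * n_c_prev + 1) * n_filters
```

For ten 3 x 3 x 3 kernels this returns 280, independent of the input image size.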
Summary of notation
If layer l is a convolution layer:

f[l] = kernel size
p[l] = padding
s[l] = stride
n_c[l] = number of filters

Input: n_H[l-1] x n_W[l-1] x n_c[l-1]
Output: n_H[l] x n_W[l] x n_c[l]

n_H[l] = ⌊(n_H[l-1] + 2p[l] - f[l]) / s[l]⌋ + 1
n_W[l] = ⌊(n_W[l-1] + 2p[l] - f[l]) / s[l]⌋ + 1

Each kernel is: f[l] x f[l] x n_c[l-1]
Activations: a[l] → n_H[l] x n_W[l] x n_c[l]
Weights: f[l] x f[l] x n_c[l-1] x n_c[l]
Bias: n_c[l]
Pooling layer: Max pooling

Max pooling takes the maximum value in each window. With filter size 2 x 2 and stride 2 on a 4 x 4 input:

1 3 2 1
2 9 1 1      9 2
1 3 2 3  →   6 3
5 6 1 2

Two hyperparameters (filter size f = 2 and stride s = 2), but no parameters to learn.
Why Pooling?

• To reduce the dimensions of the feature maps, thus reducing the number of parameters to learn and the amount of computation performed in the network.

• The pooling layer summarizes the features present in a region of the feature map generated by a convolution layer.

• By summarizing information in local neighborhoods, pooling helps the network focus on the presence or absence of features rather than their precise positions (translation invariance).
Max Pooling – Another Example

With filter size 3 x 3 and stride 1 on a 5 x 5 input:

1 3 2 1 3
2 9 1 1 5      9 9 5
1 3 2 3 2  →   9 9 5
8 3 5 1 0      8 6 9
5 6 1 2 9

For a multi-channel input (5 x 5 x nc), pooling is applied to each channel independently, giving a 3 x 3 x nc output.
Average Pooling

Average pooling takes the mean of each window instead of the maximum. The same 5 x 5 input with filter size 3 x 3 and stride 1 gives:

2.67 2.78 2.22
3.78 3.11 2.22
3.78 2.89 2.78

Again applied per channel: 5 x 5 x nc → 3 x 3 x nc.
Summary of pooling
Hyperparameters:
f : filter size
s : stride
Max or average pooling

n_H x n_W x n_c → (⌊(n_H - f)/s⌋ + 1) x (⌊(n_W - f)/s⌋ + 1) x n_c

f = 2, s = 2: 4 x 4 x 10 → 2 x 2 x 10
f = 3, s = 1: 5 x 5 x 10 → 3 x 3 x 10
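Both pooling modes and the shape rule can be checked against the worked examples with one small sketch (`pool2d` is a hypothetical helper operating on a single channel):

```python
import numpy as np

def pool2d(x, f, s, mode="max"):
    """Pooling over f x f windows with stride s; no parameters to learn."""
    k = (x.shape[0] - f) // s + 1          # output size: (n - f)/s + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            out[i, j] = reduce_fn(x[i*s:i*s+f, j*s:j*s+f])
    return out

a = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])
b = np.array([[1, 3, 2, 1, 3],
              [2, 9, 1, 1, 5],
              [1, 3, 2, 3, 2],
              [8, 3, 5, 1, 0],
              [5, 6, 1, 2, 9]])

max_a = pool2d(a, f=2, s=2)                # 2 x 2 max-pooled output
max_b = pool2d(b, f=3, s=1)                # 3 x 3 max-pooled output
avg_b = pool2d(b, f=3, s=1, mode="avg")    # 3 x 3 average-pooled output
```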
Flattening and Fully Connected Layers
Types of layer in a convolutional network:

- Convolution
- Pooling
- Fully connected

Complete convolutional network

28 x 28 x 3 input → conv → 28 x 28 x 32 → pool → 14 x 14 x 32 → conv → 14 x 14 x 64 → pool → 7 x 7 x 64 → flatten → fully connected → fully connected → output
Complete convolutional network parameters

Layer                        | Activation shape | Activation size | # parameters
Input                        | 28 x 28 x 3      | 2,352           | 0
Conv1 (f=3, p=1, s=1, F=32)  | 28 x 28 x 32     | 25,088          | 896
Pool1 (f=2, s=2)             | 14 x 14 x 32     | 6,272           | 0
Conv2 (f=3, p=1, s=1, F=64)  | 14 x 14 x 64     | 12,544          | 18,496
Pool2 (f=2, s=2)             | 7 x 7 x 64       | 3,136           | 0
FC1                          | 128 x 1          | 128             | 401,536
FC2                          | 10 x 1           | 10              | 1,290
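The parameter column can be re-derived from the layer shapes; `conv_params` and `dense_params` are hypothetical bookkeeping helpers:

```python
def conv_params(f, c_in, c_out):
    """Each output filter: f*f*c_in weights + 1 bias."""
    return (f * f * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Fully connected layer: weight matrix + one bias per output unit."""
    return n_in * n_out + n_out

conv1 = conv_params(3, 3, 32)           # Conv1
conv2 = conv_params(3, 32, 64)          # Conv2
fc1   = dense_params(7 * 7 * 64, 128)   # FC1 (flattened 7 x 7 x 64 = 3,136 inputs)
fc2   = dense_params(128, 10)           # FC2
```

Pooling layers contribute no parameters, and the fully connected layers dominate the total.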
CNN in Keras

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()

# Conv1: 32 filters of 3 x 3, same padding, stride 1 -> 28 x 28 x 32
model.add(Conv2D(32, kernel_size=(3, 3), padding='same', strides=(1, 1), activation='relu', input_shape=(28, 28, 3)))
# Pool1: 2 x 2 windows, stride 2 -> 14 x 14 x 32
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# Conv2: 64 filters of 3 x 3 -> 14 x 14 x 64
model.add(Conv2D(64, kernel_size=(3, 3), padding='same', strides=(1, 1), activation='relu'))
# Pool2 -> 7 x 7 x 64
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# Flatten 7 x 7 x 64 = 3,136 values, then fully connected layers
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Putting it together: Training a CNN

Training set: (x(1), y(1)), ..., (x(m), y(m))

Cost: J = (1/m) Σᵢ L(ŷ(i), y(i))

Use backpropagation with gradient descent to optimize the parameters (w and b) to reduce J.
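The slide leaves the per-example loss L generic; for the softmax output used in the Keras model above, categorical cross-entropy is the usual choice. A minimal sketch of the cost J, assuming one-hot labels:

```python
import numpy as np

def cost(y_hat, y):
    """Categorical cross-entropy cost J = (1/m) * sum_i L(y_hat(i), y(i)).
    y_hat, y: (m, n_classes) arrays; rows of y are one-hot."""
    m = y.shape[0]
    return -np.sum(y * np.log(y_hat)) / m

# two examples, two classes: correct class predicted with prob 0.9 and 0.8
y     = np.array([[1.0, 0.0], [0.0, 1.0]])
y_hat = np.array([[0.9, 0.1], [0.2, 0.8]])
J = cost(y_hat, y)
```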
Popular CNN Architectures

• LeNet-5

• AlexNet

• VGG

• ResNet

• Inception Net
LeNet-5

• Valid convolution (no padding)
• Average pooling
• Tanh / sigmoid non-linearity after pooling layer
• 60k trainable parameters

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
AlexNet

• 60M parameters
• ReLU non-linearity
• Multiple GPUs
• Local response normalization

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
VGG-16

• Convolution: filter 3 x 3, stride 1, same convolution
• Max pooling: filter 2 x 2, stride 2
• 138M parameters
• ReLU non-linearity

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Transfer Learning

A pre-trained model on a source task is utilized as a starting point for training on a target task, leveraging learned features and representations to improve performance on the new task with limited data.

Suppose you want to build a CNN to classify the following dog breeds: German Shepherd (G), Labrador (L), Others (O). Start with a pre-trained open-source model trained on image data and download its code and weights.

• Little training data: freeze all pre-trained layers and replace the final softmax (e.g., a 1000-way softmax) with a new G/L/O output layer; train only that layer.
• Some more training data: freeze only the earlier layers, and train the later layers together with the new output layer.
• Lots of training data: re-train the complete network, using the pre-trained weights as the initial weights.
Data Augmentation

Data augmentation is crucial for improving the generalization and robustness of CNNs by artificially increasing dataset diversity through transformations, reducing overfitting, and aiding effective learning from limited labeled data.
Object Localization

Classification answers "what is in the image"; classification with localization also outputs a bounding box around the object. Image coordinates run from (0,0) at the top-left to (1,1) at the bottom-right, and the CNN outputs the box midpoint (bx, by), width bw, and height bh (e.g., 0.5, 0.4, 0.3, 0.25), alongside a softmax over the classes:

1. Pedestrian
2. Car
3. Motorcycle
4. Background

Defining the Target Label y

The network must output bx, by, bw, bh and the class label (1-4). The target vector is

y = [pc, bx, by, bw, bh, c1, c2, c3]ᵀ

where pc indicates whether an object is present in the image (0/1), (bx, by, bw, bh) is the bounding box (real numbers), and exactly one of the class indicators c1, c2, c3 is 1.

Example with a car: y = [1, 0.5, 0.4, 0.3, 0.25, 0, 1, 0]ᵀ.
With no object: y = [0, ?, ?, ?, ?, ?, ?, ?]ᵀ (the remaining components are "don't care").

Loss:
If y1 = 1: L(ŷ, y) = (ŷ1 - y1)² + (ŷ2 - y2)² + ... + (ŷ8 - y8)²
If y1 = 0: L(ŷ, y) = (ŷ1 - y1)²
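The two-case loss can be sketched directly, assuming (as on the slide) squared error over all eight components when an object is present; the example values are illustrative:

```python
import numpy as np

def localization_loss(y_hat, y):
    """Squared-error loss: all 8 components when an object is present
    (y[0] == 1), only the p_c term when absent (y[0] == 0)."""
    if y[0] == 1:
        return float(np.sum((y_hat - y) ** 2))
    return float((y_hat[0] - y[0]) ** 2)

# y = [pc, bx, by, bw, bh, c1, c2, c3]; ground truth: a car at (0.5, 0.4)
y     = np.array([1.0, 0.5, 0.4, 0.3, 0.25, 0.0, 1.0, 0.0])
y_hat = np.array([0.9, 0.5, 0.4, 0.3, 0.25, 0.1, 0.8, 0.1])

loss_present = localization_loss(y_hat, y)
loss_absent  = localization_loss(y_hat, np.zeros(8))   # only p_c matters
```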