Convolution Neural Networks (CNN) : Ms. Anisha Mahato Assistant Professor (CSE Specialization)
Cat? (0/1) from a 64x64 image
Object detection
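Deep Learning on large images
Problems
1. Too many parameters to train
2. Positional information is lost
3. Chance of overfitting
A 64 x 64 x 3 image already has 64 x 64 x 3 = 12,288 input values. A 1000 x 1000 x 3 image has 1000 x 1000 x 3 = 3 million input values; if the first fully connected layer has 1000 hidden units, that layer alone needs 3 million x 1000 = 3 billion trainable weights.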
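The weight count above can be reproduced with a few lines of Python. This is a sketch that counts only the weights of the first fully connected layer (biases ignored) and assumes 1000 hidden units for both image sizes; `dense_weight_count` is a hypothetical helper, not a library function.

```python
def dense_weight_count(height: int, width: int, channels: int, hidden_units: int) -> int:
    """Weights of a fully connected layer fed by a flattened image (biases not counted)."""
    n_inputs = height * width * channels  # every pixel value becomes one input feature
    return n_inputs * hidden_units


# Assumption: 1000 hidden units in the first layer, as in the 1000 x 1000 x 3 example.
small = dense_weight_count(64, 64, 3, 1000)
large = dense_weight_count(1000, 1000, 3, 1000)
print(f"64x64x3 image:     {small:,} weights")
print(f"1000x1000x3 image: {large:,} weights")
```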
Edge Detection
vertical edges
horizontal edges
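Vertical edge detection
Image (6x6):
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9
Filter / Kernel (3x3):
1 0 -1
1 0 -1
1 0 -1
Convolution (6x6 * 3x3 = 4x4): place the filter on the top-left 3x3 patch of the image, multiply the overlapping entries element-wise and add them up. This gives the first entry of the output:
3x1 + 0x0 + 1x-1 + 1x1 + 5x0 + 8x-1 + 2x1 + 7x0 + 2x-1 = -5
Then slide the filter one column to the right and repeat; at the end of a row, move one row down and start again from the left.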
Vertical edge detection: Feature Map
6x6 image * 3x3 filter = 4x4 feature map:
-5 -4 0 8
-10 -2 2 3
0 -2 -4 -7
-3 -2 -3 -16
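A minimal NumPy sketch of the sliding-window computation described above, assuming stride 1 and no padding. Like most deep learning libraries, it does not flip the kernel (strictly speaking this is cross-correlation, which is what CNNs call convolution). `conv2d_valid` is a hypothetical name.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D convolution as used in CNNs: no padding, stride 1, kernel not flipped."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise product of the current window and the kernel, then sum
            out[i, j] = np.sum(image[i:i + f_h, j:j + f_w] * kernel)
    return out


image = np.array([
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
])
vertical = np.array([[1, 0, -1]] * 3)  # 3x3 vertical edge filter

print(conv2d_valid(image, vertical))
```

Running it should reproduce the 4x4 feature map, with -5 in the top-left corner and -16 in the bottom-right.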
Vertical edge detection
Image (6x6), bright left half and dark right half:
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
* filter
1 0 -1
1 0 -1
1 0 -1
= output (4x4)
0 30 30 0
0 30 30 0
0 30 30 0
0 30 30 0
The column of 30s in the middle of the output marks the vertical edge in the middle of the image.

Image (6x6), dark left half and bright right half, with the same filter:
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
= output (4x4)
0 -30 -30 0
0 -30 -30 0
0 -30 -30 0
0 -30 -30 0
The negative values indicate a dark-to-bright transition instead of a bright-to-dark one.
Horizontal edge detection
Vertical filter:
1 0 -1
1 0 -1
1 0 -1
Horizontal filter:
1 1 1
0 0 0
-1 -1 -1
Image (6x6):
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
0 0 0 10 10 10
0 0 0 10 10 10
0 0 0 10 10 10
* horizontal filter = output (4x4)
0 0 0 0
30 10 -10 -30
30 10 -10 -30
0 0 0 0
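The same kind of sliding-window helper (re-declared here so the snippet runs on its own, and assuming square images and kernels) applied to the step images. The horizontal filter is simply the transpose of the vertical one; `conv2d_valid` is again a hypothetical name.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding); square inputs assumed."""
    f = kernel.shape[0]
    k = image.shape[0] - f + 1
    return np.array([[np.sum(image[i:i + f, j:j + f] * kernel) for j in range(k)]
                     for i in range(k)])


vertical = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
horizontal = vertical.T  # rows: [1, 1, 1], [0, 0, 0], [-1, -1, -1]

bright_left = np.array([[10, 10, 10, 0, 0, 0]] * 6)  # bright-to-dark vertical edge
dark_left = bright_left[:, ::-1]                     # dark-to-bright vertical edge
quadrants = np.block([[np.full((3, 3), 10), np.zeros((3, 3), dtype=int)],
                      [np.zeros((3, 3), dtype=int), np.full((3, 3), 10)]])

print(conv2d_valid(bright_left, vertical))  # every row: 0 30 30 0
print(conv2d_valid(dark_left, vertical))    # every row: 0 -30 -30 0
print(conv2d_valid(quadrants, horizontal))  # rows: 0s, 30 10 -10 -30 (twice), 0s
```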
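Learning to detect edges
Hand-designed vertical edge filters:
1 0 -1      1 0 -1      3 0 -3
1 0 -1      2 0 -2      10 0 -10
1 0 -1      1 0 -1      3 0 -3
            Sobel filter Scharr filter
Instead of choosing the values by hand, treat the nine filter entries as parameters w1 ... w9 and learn them:
Image (6x6):
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9
* filter
w1 w2 w3
w4 w5 w6
w7 w8 w9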
Why convolutions
6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Filter 1:
1 -1 -1
-1 1 -1
-1 -1 1
Flatten the image into 36 inputs, numbered 1 to 36 row by row.
• Fewer parameters: the output neuron for the top-left filter position (value 3) is connected only to the 9 inputs it covers (1, 2, 3, 7, 8, 9, 13, 14, 15), not fully connected to all 36.
• Shared weights, even fewer parameters: the next output neuron (value -1) is connected to inputs 2, 3, 4, 8, 9, 10, 14, 15, 16 and reuses the same 9 filter weights.
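One way to see both claims is to write the convolution as an ordinary fully connected layer whose 16x36 weight matrix is mostly zeros. The NumPy sketch below (variable names are illustrative) builds that matrix from Filter 1: each row has only 9 non-zero entries (sparse connections), and every row reuses the same 9 values (shared weights).

```python
import numpy as np

image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])
filter1 = np.array([[1, -1, -1], [-1, 1, -1], [-1, -1, 1]])

n, f = 6, 3
k = n - f + 1  # 4x4 = 16 output neurons
# Row r of W is one output neuron; its 36 columns are the flattened pixels.
W = np.zeros((k * k, n * n), dtype=int)
for i in range(k):
    for j in range(k):
        window = np.zeros((n, n), dtype=int)
        window[i:i + f, j:j + f] = filter1  # the same 9 weights, shifted to this position
        W[i * k + j] = window.ravel()

outputs = W @ image.ravel()
print(outputs[:2])                  # first two neurons: 3 and -1
print(np.count_nonzero(W, axis=1))  # each neuron touches only 9 of the 36 inputs
print(W.size, "entries in a dense 16x36 layer vs", filter1.size, "shared weights")
```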
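Padding
6x6 image (n x n) * 3x3 filter (f x f) = 4x4 output (k x k)
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9
$k = n - f + 1 = 6 - 3 + 1 = 4$
Two problems:
• Reduction in spatial dimension: each convolution shrinks the output (6x6 to 4x4 here), so stacking many layers keeps shrinking it.
• Loss of information at the image boundary region: a corner pixel is covered by only one filter position, while a central pixel is covered by up to nine, so border pixels contribute much less to the output.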
Padding
With padding p = 1 (a border of zeros around the image), the 6x6 image becomes 8x8:
0 0 0 0 0 0 0 0
0 3 0 1 2 7 4 0
0 1 5 8 9 3 1 0
0 2 7 2 5 1 3 0
0 0 1 3 1 7 8 0
0 4 2 1 6 2 8 0
0 2 4 5 2 3 9 0
0 0 0 0 0 0 0 0
8x8 padded image * 3x3 filter (f x f) = 6x6 output (k x k)
$k = n + 2p - f + 1 = 6 + 2 \times 1 - 3 + 1 = 6$
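A short sketch of zero padding with NumPy's `np.pad` (constant mode, zeros), followed by a stride-1 convolution of the padded image; the variable names are illustrative.

```python
import numpy as np

image = np.array([
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
])
kernel = np.array([[1, 0, -1]] * 3)
p = 1

# Add a border of p zeros on every side: 6x6 -> 8x8
padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)
f = kernel.shape[0]
k = padded.shape[0] - f + 1  # equals n + 2p - f + 1
output = np.array([[np.sum(padded[i:i + f, j:j + f] * kernel) for j in range(k)]
                   for i in range(k)])
print(padded.shape, output.shape)  # (8, 8) (6, 6)
```

Interior outputs are unchanged by padding (for example `output[1, 1]` equals the unpadded top-left value), while the new border outputs now also use the edge pixels.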
Valid and Same Convolution
Valid convolution: no padding (p = 0), so an n x n image and an f x f filter give an (n – f + 1) x (n – f + 1) output.
Same convolution: Pad the image so that the output size is the same as the input size.
$n + 2p - f + 1 = n \;\Rightarrow\; 2p = f - 1 \;\Rightarrow\; p = \frac{f - 1}{2}$
Therefore, if f = 3, then p = (3 – 1)/2 = 1; if f = 5, then p = (5 – 1)/2 = 2.
Filter size is usually odd:
• Central pixel as reference
• Symmetry around center
• Avoid asymmetric padding (for even f, (f – 1)/2 is not an integer, so the two sides would need different amounts of padding)
Strided Convolution
Image (7x7):
2 3 7 4 6 2 9
6 6 9 8 7 4 3
3 4 8 3 8 9 7
7 8 3 6 6 3 4
4 2 1 8 3 4 6
3 2 4 1 9 8 3
0 1 3 9 2 1 4
Filter (3x3):
3 4 4
1 0 2
-1 0 3
With stride s = 2, the filter jumps two positions at a time, both along a row and down to the next row. The top-left window gives:
2x3 + 3x4 + 7x4 + 6x1 + 6x0 + 9x2 + 3x-1 + 4x0 + 8x3 = 91
and the top-right window gives:
6x3 + 2x4 + 9x4 + 7x1 + 4x0 + 3x2 + 8x-1 + 9x0 + 7x3 = 88
Output (3x3):
91 100 88
69 91 117
44 72 74
For an n x n image with an f x f filter, padding p and stride s, the output is k x k with
$k = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 = \frac{7 + 0 - 3}{2} + 1 = 2 + 1 = 3$
Padding = p = 0
Stride = s = 2
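A NumPy sketch of strided convolution, assuming square inputs and zero padding; `conv2d_strided` is a hypothetical helper. The top-left corner of each window moves `stride` pixels at a time, and the output size uses floor division, matching the formula for k.

```python
import numpy as np


def conv2d_strided(image, kernel, stride=1, padding=0):
    """Convolution (kernel not flipped) with zero padding and stride; square inputs assumed."""
    image = np.pad(image, padding)  # constant zero padding
    n, f = image.shape[0], kernel.shape[0]
    k = (n - f) // stride + 1       # floor((n + 2p - f) / s) + 1; n already includes 2p
    out = np.zeros((k, k), dtype=image.dtype)
    for i in range(k):
        for j in range(k):
            r, c = i * stride, j * stride  # window's top-left corner jumps by `stride`
            out[i, j] = np.sum(image[r:r + f, c:c + f] * kernel)
    return out


image = np.array([
    [2, 3, 7, 4, 6, 2, 9],
    [6, 6, 9, 8, 7, 4, 3],
    [3, 4, 8, 3, 8, 9, 7],
    [7, 8, 3, 6, 6, 3, 4],
    [4, 2, 1, 8, 3, 4, 6],
    [3, 2, 4, 1, 9, 8, 3],
    [0, 1, 3, 9, 2, 1, 4],
])
kernel = np.array([[3, 4, 4], [1, 0, 2], [-1, 0, 3]])
print(conv2d_strided(image, kernel, stride=2))
```

If (n + 2p - f) is not divisible by s, the floor division simply drops the last partial window instead of letting the filter hang off the image.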
General Formulae
• $w$ is the width of the input image
• $h$ is the height of the input image
• $p_w$ is the padding applied to the width of the input
• $p_h$ is the padding applied to the height of the input
• $f_w$ is the width of the convolutional filter
• $f_h$ is the height of the convolutional filter
• $s_w$ is the stride used in the horizontal direction
• $s_h$ is the stride used in the vertical direction
• Output image width = $\left\lfloor \frac{w + 2p_w - f_w}{s_w} \right\rfloor + 1$
• Output image height = $\left\lfloor \frac{h + 2p_h - f_h}{s_h} \right\rfloor + 1$
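The general formulae translate directly into a small helper (hypothetical names, floor division for the stride). `same_padding` applies p = (f - 1)/2 from the same-convolution derivation, which assumes stride 1 and an odd filter size.

```python
def conv_output_size(w, h, p_w, p_h, f_w, f_h, s_w=1, s_h=1):
    """Return (width, height) of a convolution output, flooring the division by the stride."""
    out_w = (w + 2 * p_w - f_w) // s_w + 1
    out_h = (h + 2 * p_h - f_h) // s_h + 1
    return out_w, out_h


def same_padding(f):
    """Padding p = (f - 1) / 2 that keeps the size unchanged (stride 1, odd f assumed)."""
    if f % 2 == 0:
        raise ValueError("same padding with p = (f - 1)/2 needs an odd filter size")
    return (f - 1) // 2


print(conv_output_size(6, 6, 0, 0, 3, 3))        # valid convolution: (4, 4)
print(conv_output_size(6, 6, 1, 1, 3, 3))        # padding p = 1: (6, 6)
print(conv_output_size(7, 7, 0, 0, 3, 3, 2, 2))  # stride s = 2: (3, 3)
print(same_padding(3), same_padding(5))          # 1 2
```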
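Volume Convolution
6x6x3 image * 3x3x3 filter = 4x4 output
The filter has as many channels as the image (3 here). At each position, the 27 filter values are multiplied with the 27 overlapping image values and summed into a single number, so the output is a 2D 4x4 map.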
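Multiple Kernels
6x6x3 * 3x3x3 (kernel 1) = 4x4
6x6x3 * 3x3x3 (kernel 2) = 4x4
Stacking the two 4x4 outputs gives a 4x4x2 volume.
In general: n x n x nc * f x f x nc = (n – f + 1) x (n – f + 1) x nf, where nc = no. of channels and nf = no. of kernels.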
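A NumPy sketch of convolution over a volume with several kernels, assuming stride 1, no padding and kernels stored as an array of shape (nf, f, f, nc). The input and kernel values are random placeholders, not taken from the slides; `conv_volume` is a hypothetical helper.

```python
import numpy as np


def conv_volume(volume, kernels):
    """Convolve an n x n x nc volume with nf kernels of shape f x f x nc (stride 1, no padding).

    Each kernel spans all nc channels, so it produces one 2D map; the nf maps are stacked.
    """
    n, _, n_c = volume.shape
    n_f, f, _, k_c = kernels.shape
    if k_c != n_c:
        raise ValueError("kernel depth must equal the number of input channels")
    k = n - f + 1
    out = np.zeros((k, k, n_f))
    for m in range(n_f):  # one output channel per kernel
        for i in range(k):
            for j in range(k):
                # multiply all f*f*nc overlapping values and sum them into one number
                out[i, j, m] = np.sum(volume[i:i + f, j:j + f, :] * kernels[m])
    return out


rng = np.random.default_rng(0)
volume = rng.integers(0, 10, size=(6, 6, 3))      # placeholder 6x6x3 input
kernels = rng.integers(-1, 2, size=(2, 3, 3, 3))  # two placeholder 3x3x3 kernels
result = conv_volume(volume, kernels)
print(result.shape)  # (4, 4, 2)
```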
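One Neural Network Layer
$Z^{[1]} = W^{[1]} a^{[0]} + b^{[1]}$
$a^{[1]} = g(Z^{[1]})$, where g is the activation function
One Convolution Layer
The same two steps apply, with the kernels playing the role of $W^{[1]}$: the input $a^{[0]}$ (e.g. 6x6x3) is convolved with a 3x3x3 kernel, the kernel's bias is added to every entry of the resulting 4x4 map, and ReLU is applied: $\text{ReLU}(W_1^{[1]} * a^{[0]} + b_1^{[1]})$.
For each kernel there are 3 x 3 x 3 = 27 parameters (weights) + 1 parameter (bias) = 28 parameters.
For 10 kernels, there will be 28 x 10 = 280 trainable parameters (independent of the input image size).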
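The parameter count as a small function (hypothetical name). Neither the input height nor the width appears in it, which is why the count does not depend on the image size.

```python
def conv_layer_params(f, n_c_prev, n_filters):
    """Trainable parameters of a conv layer: (f*f*n_c_prev weights + 1 bias) per filter."""
    per_kernel = f * f * n_c_prev + 1
    return per_kernel * n_filters


print(conv_layer_params(f=3, n_c_prev=3, n_filters=1))   # 28
print(conv_layer_params(f=3, n_c_prev=3, n_filters=10))  # 280
```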
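Summary of notation
If layer l is a convolution layer:
$f^{[l]}$ = kernel size
$p^{[l]}$ = padding
$s^{[l]}$ = stride
$n_c^{[l]}$ = number of filters
Each kernel is: $f^{[l]} \times f^{[l]} \times n_c^{[l-1]}$
Input: $n_H^{[l-1]} \times n_W^{[l-1]} \times n_c^{[l-1]}$
Output: $n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$
$n_H^{[l]} = \left\lfloor \frac{n_H^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} \right\rfloor + 1$
$n_W^{[l]} = \left\lfloor \frac{n_W^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} \right\rfloor + 1$
Activations: $a^{[l]} \to n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$
Weights: $f^{[l]} \times f^{[l]} \times n_c^{[l-1]} \times n_c^{[l]}$
Bias: $n_c^{[l]}$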
Pooling layer: Max pooling
Input (4x4):
1 3 2 1
2 9 1 1
1 3 2 3
5 6 1 2
Filter size = 2 x 2, Stride = 2, output (2x2):
9 2
6 3
Two hyperparameters (filter size and stride); no parameters to learn.
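A minimal NumPy sketch of max pooling on a single 2D channel; `max_pool2d` is a hypothetical helper. Its only inputs besides the data are the two hyperparameters f and s, so there is nothing to learn.

```python
import numpy as np


def max_pool2d(x, f=2, s=2):
    """Max pooling on one 2D channel: max of each f x f window, moving s positions at a time."""
    out_h = (x.shape[0] - f) // s + 1
    out_w = (x.shape[1] - f) // s + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()
    return out


x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])
print(max_pool2d(x, f=2, s=2))  # 2x2 result: 9 2 / 6 3
```

With several channels, the same function would be applied to each channel separately, which is why pooling keeps the number of channels unchanged.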
Why Pooling?
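Max Pooling – Another Example
Input (5 x 5 x nc, one channel shown):
1 3 2 1 3
2 9 1 1 5
1 3 2 3 2
8 3 5 1 0
5 6 1 2 9
Filter size = 3 x 3, Stride = 1, output (3 x 3 x nc):
9 9 5
9 9 5
8 6 9
Pooling is applied to each channel separately, so the number of channels nc is unchanged.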
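Average Pooling
Input (5 x 5 x nc, one channel shown):
1 3 2 1 3
2 9 1 1 5
1 3 2 3 2
8 3 5 1 0
5 6 1 2 9
Filter size = 3 x 3, Stride = 1, output 3 x 3 x nc; first row:
2.67 2.78 2.22
For example, the first entry is (1 + 3 + 2 + 2 + 9 + 1 + 1 + 3 + 2)/9 = 24/9 ≈ 2.67.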
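Average pooling only swaps `max` for `mean`. This sketch assumes the full 5x5 input shown for the max pooling example and computes all nine averages, not just the first row; `avg_pool2d` is a hypothetical helper.

```python
import numpy as np


def avg_pool2d(x, f=3, s=1):
    """Average pooling on one 2D channel: mean of each f x f window, stepping s at a time."""
    out_h = (x.shape[0] - f) // s + 1
    out_w = (x.shape[1] - f) // s + 1
    return np.array([[x[i * s:i * s + f, j * s:j * s + f].mean() for j in range(out_w)]
                     for i in range(out_h)])


x = np.array([[1, 3, 2, 1, 3],
              [2, 9, 1, 1, 5],
              [1, 3, 2, 3, 2],
              [8, 3, 5, 1, 0],
              [5, 6, 1, 2, 9]])
print(np.round(avg_pool2d(x, f=3, s=1), 2))
```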
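Summary of pooling
Hyperparameters:
f : filter size
s : stride
Max or average pooling
$n_H \times n_W \times n_c \;\to\; \left(\left\lfloor \frac{n_H - f}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{n_W - f}{s} \right\rfloor + 1\right) \times n_c$
f = 2, s = 2: 4 x 4 x 10 → 2 x 2 x 10
f = 3, s = 1: 5 x 5 x 10 → 3 x 3 x 10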
Flattening and Fully Connected Layers
Types of layer in a convolutional network:
- Convolution
- Pooling
- Fully connected
Complete convolutional network
28 x 28 x 3 → 28 x 28 x 32 → 14 x 14 x 32 → 14 x 14 x 64 → 7 x 7 x 64 → Fully Connected → Fully Connected
Complete convolutional network parameters
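from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), padding='same', strides=(1, 1), activation='relu', input_shape=(28, 28, 3)))  # -> 28x28x32, 3*3*3*32 + 32 = 896 params
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))  # -> 14x14x32, no params
model.add(Flatten())                                       # -> vector of 14*14*32 = 6272
model.add(Dense(128, activation='relu'))                   # 6272*128 + 128 = 802,944 params
model.add(Dense(10, activation='softmax'))                 # 128*10 + 10 = 1,290 params
model.summary()                                            # total: 805,130 trainable params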
Output: $\hat{y}$
Cost: $J = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$
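The slides do not specify the per-example loss L. This sketch assumes categorical cross-entropy, a common choice for a softmax output, and averages it over a batch of m examples; the labels and predictions are made-up numbers and the function names are hypothetical.

```python
import numpy as np


def cross_entropy(y_hat, y):
    """Assumed per-example loss L(y_hat, y) for one-hot labels y and softmax outputs y_hat."""
    return -np.sum(y * np.log(y_hat), axis=1)


def cost(y_hat, y):
    """J = (1/m) * sum over the m examples of L(y_hat^(i), y^(i))."""
    m = y.shape[0]
    return np.sum(cross_entropy(y_hat, y)) / m


y = np.array([[0, 1, 0], [1, 0, 0]])                  # m = 2 one-hot labels (made up)
y_hat = np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]])  # softmax outputs (made up)
print(cost(y_hat, y))
```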
• LeNet-5
• AlexNet
• VGG
• ResNet
• Inception Net
LeNet-5
• 138M parameters
• ReLU non-linearity
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Transfer Learning
Suppose you want to build a CNN to classify dog breeds.
Start with a pre-trained open-source model that was trained on image data, and download its code and weights.
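Transfer Learning (Little training data)
Replace the pre-trained network's 1000-class softmax with a new softmax for your classes, freeze all the pre-trained layers, and train only the new softmax layer.
Transfer Learning (Some more training data)
Freeze the earlier layers and train the later layers together with the new softmax layer.
Transfer Learning (Lot of training data)
Re-train the complete network with pre-trained weights as the initial weights.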
Data Augmentation
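Target vector:
y = [pc, bx, by, bw, bh, c1, c2, c3]
• pc: Is there a car in the image? (0/1)
• bx, by, bw, bh: bounding box (real numbers)
• c1, c2, c3: class; one of them is 1, others are 0
Examples:
y = [1, 0.5, 0.4, 0.3, 0.25, 0, 1, 0] (pc = 1)
y = [0, ?, ?, ?, ?, ?, ?, ?] (pc = 0; "?" = don't care)
Loss:
If $y_1 = 1$ (i.e. $p_c = 1$): $L(\hat{y}, y) = (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \cdots + (\hat{y}_8 - y_8)^2$
If $y_1 = 0$: $L(\hat{y}, y) = (\hat{y}_1 - y_1)^2$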
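A sketch of the two-case loss in NumPy, assuming y is ordered as [pc, bx, by, bw, bh, c1, c2, c3] and that the "?" entries are stored as NaN; the prediction vector is a made-up example and `localization_loss` is a hypothetical name.

```python
import numpy as np


def localization_loss(y_hat, y):
    """Squared-error loss for y = [pc, bx, by, bw, bh, c1, c2, c3].

    If pc = 1 (object present), all 8 components count; if pc = 0, only pc matters,
    so the "?" (don't care) entries are never read.
    """
    y_hat, y = np.asarray(y_hat, dtype=float), np.asarray(y, dtype=float)
    if y[0] == 1:
        return float(np.sum((y_hat - y) ** 2))
    return float((y_hat[0] - y[0]) ** 2)


y_present = [1, 0.5, 0.4, 0.3, 0.25, 0, 1, 0]
y_absent = [0] + [np.nan] * 7                          # "?" entries stored as NaN
prediction = [0.9, 0.5, 0.4, 0.3, 0.25, 0.1, 0.8, 0.1]  # made-up network output
print(localization_loss(prediction, y_present))  # approximately 0.07
print(localization_loss(prediction, y_absent))   # approximately 0.81
```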