Week 7

The document provides an overview of Convolutional Neural Networks (CNNs), covering key concepts such as padding (valid and same), strides, and the introduction of bias in kernels. It discusses pooling techniques like max and average pooling, typical CNN architecture, and popular CNN models like VGG, ResNet, Inception, and MobileNet. Additionally, it highlights the importance of data augmentation in enhancing training datasets for better model generalization.


Convolutional Neural Networks


Padding
• Padding preserves the spatial dimensions of the input feature map after a convolution operation by adding pixels around its boundary
• The most commonly used padding is zero padding
• Zero padding – simply add the value 0 around the borders of the input feature map
• Two types of padding
▪ Valid Padding
▪ Same Padding
Valid Padding
• No padding is added to the input feature map
• The convolution operation is performed only on the valid (existing) pixels of the input feature map, producing an output feature map with smaller dimensions

Example: Input 5 X 5 * filter 3 X 3 -> Output (5-3+1) X (5-3+1) = 3 X 3
Same Padding
• Padding is added to the input feature map.
• How is the padding calculated?
• The output size with padding is n + 2p - f + 1 (n: input size, p: padding per side, f: filter size)
• To keep the output the same size as the input, set n + 2p - f + 1 = n, which gives p = (f - 1) / 2
• The convolution operation is performed on the padded input feature map so that the output feature map has the same dimensions as the input (note: default stride = 1)
Same Padding
Padding p = (3-1)/2 = 1
Actual input 5 X 5 -> after padding 7 X 7
Filter 3 X 3 -> Output (7-3+1) X (7-3+1) = 5 X 5
Strides
• Stride is the step size with which the filter slides over the input feature

Example: Input 5 X 5 * filter 3 X 3, padding = 'valid', stride = 2 -> Output ((5-3)/2 + 1) X ((5-3)/2 + 1) = 2 X 2


Understanding output shape with padding and stride
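The padding and stride rules above can be collected into one small helper; a minimal sketch, with the function name and interface as assumptions:

```python
def conv_output_size(n, f, p=0, s=1):
    """Output size along one dimension of a convolution.

    n: input size, f: filter size, p: padding on each side, s: stride.
    Floor division drops window positions where the filter no longer fits.
    """
    return (n + 2 * p - f) // s + 1

# 'valid' padding, stride 1: 5 X 5 input, 3 X 3 filter -> 3 X 3
print(conv_output_size(5, 3))          # 3
# 'same' padding, p = (f - 1) // 2 = 1, stride 1 -> 5 X 5
print(conv_output_size(5, 3, p=1))     # 5
# 'valid' padding, stride 2 -> 2 X 2
print(conv_output_size(5, 3, s=2))     # 2
```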
Introducing bias in kernel & parameter
Considering: padding = 'valid', stride = 1

Each filter n is a 3 X 3 weight matrix wn[i, j] together with a bias b[n]. Stacking nf such filters (Filter 1 with bias b1, Filter 2 with bias b2, ..., Filter n with bias bn):

Input – 5 x 5
Filter – 3 X 3 X nf
Output – 3 X 3 X nf

Here, filter size = 3 x 3,
Number of parameters for each filter = 9 weights + 1 bias
Total parameters of the layer = (9 + 1) * nf, where nf is the number of filters
Convolution on colored images (RGB)
• The number of channels nc in the input layer will be 3 (Red, Green and Blue)
• Each filter has one weight slice per input channel and is applied across all channels of the input; every filter (Filter 1, Filter 2, ...) produces one channel of the output
Convolution on colored images (RGB)

Considering: padding = 'valid', stride = 1, number of filters = 1

Input – 5 x 5 x 3
Filter – 3 X 3 (X 3 channels)
Output – 3 X 3
Introducing bias in kernel & parameter
Considering: padding = 'valid', stride = 1, number of filters = 1

The single filter W1 has shape 3 x 3 x 3: one 3 X 3 weight slice w1[i, j, c] for each input channel c (c = 0, 1, 2), plus one shared bias b1.

Input – 5 x 5 x 3
Filter (W1) – 3 X 3 X 3
Output – 3 X 3

Here, we have 1 filter with filter size = 3 x 3 (height, width) spanning 3 input channels.
Number of parameters for this filter = 9 * 3 weights + 1 bias = 28
Parameter Calculation
• Input: h - height of the input, w - width of the input, nc - number of channels in the input
• Filter: f - height and width of the filter, nf - number of filters

Weights – (filter height, filter width, number of input channels, number of filters)
Bias – number of filters

Number of parameters of the layer = (f * f * nc + 1) * nf
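The formula can be sanity-checked with a tiny helper (a sketch; the function name is an assumption):

```python
def conv_layer_params(f, nc, nf):
    """Trainable parameters of a conv layer: (f * f * nc + 1) * nf."""
    return (f * f * nc + 1) * nf

# 3 X 3 filter over a 3-channel (RGB) input, 1 filter: 27 weights + 1 bias
print(conv_layer_params(3, 3, 1))   # 28
# Grayscale input (1 channel), 32 filters: (9 + 1) * 32
print(conv_layer_params(3, 1, 32))  # 320
```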


RGB image – Convolution Calculation

Convolution with filter 1:
Input size = 5 X 5 (x 3 channels), padding = 1, stride = 2, filter = 3 X 3

Slide 1:
channel 0: (0*0 + 0*-1 + 0*1) + (0*-1 + 2*1 + 0*-1) + (0*0 + 1*0 + 1*0)
+ channel 1: (0*1 + 0*0 + 0*-1) + (0*0 + 1*0 + 1*1) + (0*-1 + 1*-1 + 2*1)
+ channel 2: (0*1 + 0*1 + 0*-1) + (0*0 + 0*-1 + 0*0) + (0*1 + 0*-1 + 2*1)
+ bias: 1

In general: x[:,:,0] * w0[:,:,0] + x[:,:,1] * w0[:,:,1] + x[:,:,2] * w0[:,:,2] + b0


Slides 2 through 9 proceed the same way: with padding 1 the 5 X 5 input becomes 7 X 7, and with stride 2 the 3 X 3 filter stops at (7-3)/2 + 1 = 3 positions per dimension, i.e. 9 slides in total, producing a 3 X 3 output feature map.
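The slide-by-slide sum above can be written out in plain Python; a minimal sketch (no padding logic, names are assumptions), not an optimized implementation:

```python
def conv2d_multichannel(x, w, b, stride=1):
    """Convolve a multi-channel input with one filter.

    x: input as nested lists [H][W][C] (already padded if desired)
    w: filter as nested lists [f][f][C]; b: scalar bias
    Returns the 2-D output feature map for this single filter.
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    f = len(w)
    out = []
    for i in range(0, H - f + 1, stride):
        row = []
        for j in range(0, W - f + 1, stride):
            # Sum products over the f x f window and all channels, add bias
            s = b
            for di in range(f):
                for dj in range(f):
                    for c in range(C):
                        s += x[i + di][j + dj][c] * w[di][dj][c]
            row.append(s)
        out.append(row)
    return out

# Toy example: 3 x 3 x 2 input, 2 x 2 x 2 filter, bias 1
x = [[[1, 0], [0, 1], [1, 1]],
     [[0, 0], [1, 1], [0, 1]],
     [[1, 1], [0, 0], [1, 0]]]
w = [[[1, 0], [0, 1]],
     [[0, 1], [1, 0]]]
print(conv2d_multichannel(x, w, 1))   # [[4, 3], [3, 4]]
```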
Pooling
• Progressively reduces the spatial size of the representation to reduce network complexity and computational cost
• Helps mitigate overfitting during the feature-extraction process
• Widely used pooling layers:
• Max Pooling
• Average Pooling
Max Pooling
• Max pooling applies a MAX filter to each window of the feature map
• Max pooling selects the brightest pixels from the image; apply this filter when only the lighter (most prominent) pixels of the image are of interest

Input 5 X 5:
1   2   5   5   3
1   6   1   5   0
10  6   5   3   1
7   5   11  4   4
1  -2   9   3   0

Pooling filter 3 X 3, stride = 2 -> Output ((5-3)/2 + 1) X ((5-3)/2 + 1) = 2 X 2:
10  5
11  11
Average Pooling
• This pooling layer works by taking the average of each pool
• It differs from max pooling in that it retains more information about the "less important" elements of a block, or pool

Input 4 X 4:
6   1   5   0
6   5   3   1
5   11  4   4
-2  9   3   0

Pooling filter 2 X 2, stride = 2 -> Output ((4-2)/2 + 1) X ((4-2)/2 + 1) = 2 X 2:
4.5   2.25
5.75  2.75
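Both worked examples can be reproduced by one small routine; a minimal sketch (the function name and 'mode' flag are assumptions):

```python
def pool2d(x, f, stride, mode="max"):
    """Max or average pooling over a 2-D feature map (nested lists)."""
    H, W = len(x), len(x[0])
    out = []
    for i in range(0, H - f + 1, stride):
        row = []
        for j in range(0, W - f + 1, stride):
            window = [x[i + di][j + dj] for di in range(f) for dj in range(f)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

# Max-pooling example from the slides: 5 X 5 input, 3 X 3 filter, stride 2
mx = [[1, 2, 5, 5, 3],
      [1, 6, 1, 5, 0],
      [10, 6, 5, 3, 1],
      [7, 5, 11, 4, 4],
      [1, -2, 9, 3, 0]]
print(pool2d(mx, 3, 2, "max"))   # [[10, 5], [11, 11]]

# Average-pooling example: 4 X 4 input, 2 X 2 filter, stride 2
av = [[6, 1, 5, 0],
      [6, 5, 3, 1],
      [5, 11, 4, 4],
      [-2, 9, 3, 0]]
print(pool2d(av, 2, 2, "avg"))   # [[4.5, 2.25], [5.75, 2.75]]
```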
Typical Architecture of CNN
• Convolutional layers: perform convolution operations by applying filters (sets of learnable parameters) to the input
• Pooling layers: perform pooling operations, reducing the spatial dimensions of the convolutional layers' output
• Fully connected layers: connect every neuron in one layer to every neuron in the next layer and compute the final classification or regression output
Typical Architecture of CNN
Input -> ((convolution -> relu) x n -> pooling) x m -> (fully-connected layer -> relu) x k -> (output layer)

The (convolution/pooling) blocks perform feature extraction; the fully-connected layers form the classifier. Common instantiations:

• (convolution/pooling) + (1 fully-connected layer) + (output layer)
• (convolution/pooling/convolution/pooling) + (1 fully-connected layer) + (output layer)
• (convolution/convolution/pooling) + (1 fully-connected layer) + (output layer)
• (convolution/convolution/pooling/convolution/convolution/pooling) + (1 fully-connected layer) + (output layer)
Typical Architecture of CNN
• Number of Filters/Kernels: Varies depending on the specific architecture but may start with 32
or 64 filters.
• Filter/Kernel Size: Default values often include 3x3 or 5x5.
• Stride: Default stride is usually set to 1.
• Padding: Often set to 'valid' (no padding) or 'same' (zero padding).
• Activation Function: The default activation function is often the rectified linear unit (ReLU).
• Pooling: Max pooling layers with a default pool size of 2x2 are commonly used.
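To see how these defaults interact, the sketch below traces shapes and parameter counts through an illustrative two-block stack (3 x 3 'same' convolutions, 2 x 2 max pooling; the filter counts 32 and 64 are assumed defaults, not values from the slides):

```python
def trace_cnn(h, w, c, filter_counts=(32, 64)):
    """Trace output shape and parameter count through (conv -> pool) blocks.

    Each conv uses a 3 x 3 filter, stride 1, 'same' padding (p = 1), so it
    preserves h and w; each 2 x 2 max pool with stride 2 halves h and w.
    """
    total_params = 0
    for nf in filter_counts:
        total_params += (3 * 3 * c + 1) * nf  # (f*f*nc + 1) * nf
        c = nf                                # output channels = filters
        h, w = h // 2, w // 2                 # pooling halves each side
    return (h, w, c), total_params

shape, params = trace_cnn(32, 32, 3)
print(shape, params)   # (8, 8, 64) 19392  (= 896 + 18496 parameters)
```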
Popular CNN Architectures
• VGG (Visual Geometry Group):
• VGG16 and VGG19 are popular models known for their simplicity and effectiveness. They have 16
and 19 weight layers, respectively.
• Pretrained models are available in various deep learning frameworks.
• ResNet (Residual Network):
• ResNet models, including ResNet-50, ResNet-101, and ResNet-152, introduced residual
connections that allow for training very deep networks.
• Pretrained ResNet models are widely used for various computer vision tasks.
• Inception (GoogLeNet):
• Inception models, including InceptionV3 and InceptionV4, use a network architecture with multiple
branches of different filter sizes to capture a wide range of features.
• Pretrained Inception models are useful for tasks that require multi-scale feature extraction.
• MobileNet:
• MobileNet models are designed for efficient inference on mobile devices. They offer a balance
between model size and accuracy.
• Pretrained MobileNet models are commonly used in mobile and embedded applications.
VGG 16 – Architecture
Data Augmentation
• Techniques for increasing the size of the training dataset by applying random transformations to the input data, such as rotation, flipping, and scaling
• Helps increase the amount of data available to a machine learning model by adding slightly modified copies of already existing data or newly created synthetic data derived from it
• Data augmentation enhances the diversity of the training dataset, which can improve the model's ability to generalize to new data
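A minimal sketch of the idea, using only flips and a 90-degree rotation on a nested-list image (real pipelines also apply scaling, shifts, etc.; the function and its transform set are assumptions):

```python
import random

def augment(image, seed=None):
    """Return a randomly transformed copy of a 2-D image (nested lists)."""
    rng = random.Random(seed)
    out = [row[:] for row in image]
    if rng.random() < 0.5:      # random horizontal flip
        out = [row[::-1] for row in out]
    if rng.random() < 0.5:      # random vertical flip
        out = out[::-1]
    if rng.random() < 0.5:      # random 90-degree rotation
        out = [list(r) for r in zip(*out[::-1])]
    return out

img = [[1, 2], [3, 4]]
# Each call yields one of 8 orientations; pixel values are preserved
print(augment(img, seed=0))
```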
