Week 7
Input 5 X 5 * filter 3 X 3 -> Output (5-3+1) X (5-3+1) = 3 X 3
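The output size above can be checked with a minimal sketch of a stride-1, unpadded convolution. The function name `conv2d_valid` and the all-ones input and filter are illustrative assumptions, not values from the slide. As is usual in CNNs, the filter is slid over the input without being flipped.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    n, f = len(image), len(kernel)
    out = n - f + 1  # output side length: n - f + 1
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            # Multiply the filter with the window at (i, j) element-wise, then sum.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(f)
                for b in range(f)
            )
            row.append(total)
        result.append(row)
    return result


image = [[1] * 5 for _ in range(5)]   # illustrative 5 X 5 input of ones
kernel = [[1] * 3 for _ in range(3)]  # illustrative 3 X 3 filter of ones
output = conv2d_valid(image, kernel)
print(len(output), "X", len(output[0]))  # 3 X 3
```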
Same Padding
• Padding is added to the input feature map.
• How is the padding calculated?
• For input size n, filter size f and padding p, the output size is n + 2p − f + 1
• The number of pixels to be added for padding can be calculated based on the
size of the kernel and the desired output feature map size.
$n + 2p - f + 1 = n \;\Rightarrow\; p = \frac{f - 1}{2}$
• The convolution operation is performed on the padded input
feature map, so the output feature map keeps the same dimensions
as the input feature map (Note: default stride = 1)
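The padding rule can be sketched in a few lines. The helper names `same_padding` and `output_size` are hypothetical, and the sketch assumes stride 1 and an odd filter size so that (f − 1)/2 is a whole number.

```python
def same_padding(f):
    """Padding p that keeps the output size equal to the input size (stride 1)."""
    if f % 2 == 0:
        raise ValueError("(f - 1) / 2 is a whole number only for odd f")
    return (f - 1) // 2


def output_size(n, f, p=0):
    """Output side length at stride 1: n + 2p - f + 1."""
    return n + 2 * p - f + 1


for f in (3, 5, 7):
    p = same_padding(f)
    # With p = (f - 1) / 2 the output size matches the input size n = 5.
    print("f =", f, "p =", p, "output =", output_size(5, f, p))
```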
Actual Input 5 X 5, filter 3 X 3
Padding -> (3-1)/2 = 1
After Padding 7 X 7
Output (7-3+1) X (7-3+1) -> 5 X 5
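As a rough illustration of this example, the sketch below pads a 5 X 5 input to 7 X 7. It assumes the padding pixels are zeros, which the slide does not state. The name `zero_pad` and the input values are made up for the example.

```python
def zero_pad(image, p):
    """Surround a 2-D grid with p rows and columns of zeros (assumed pad value)."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


image = [[r * 5 + c + 1 for c in range(5)] for r in range(5)]  # illustrative 5 X 5 input
padded = zero_pad(image, 1)
print(len(padded), "X", len(padded[0]))  # 7 X 7

# A 3 X 3 filter at stride 1 on the padded input gives 7 - 3 + 1 = 5 per side.
out_side = len(padded) - 3 + 1
```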
Strides
• Stride is a filter parameter that sets how many positions the filter
moves at each step as it slides over the input feature map
Stride 2: Input 5 X 5 * filter 3 X 3 -> Output 2 X 2
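A minimal sketch of a strided convolution follows. It assumes no padding and uses the output size (n − f) // s + 1, which gives 2 for n = 5, f = 3, s = 2. The name `conv2d_strided` and the all-ones data are illustrative.

```python
def conv2d_strided(image, kernel, stride=1):
    """Unpadded convolution where the filter moves `stride` positions per step."""
    n, f = len(image), len(kernel)
    out = (n - f) // stride + 1  # number of filter positions per side
    return [
        [
            sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(f)
                for b in range(f)
            )
            for j in range(out)
        ]
        for i in range(out)
    ]


image = [[1] * 5 for _ in range(5)]   # illustrative 5 X 5 input
kernel = [[1] * 3 for _ in range(3)]  # illustrative 3 X 3 filter
strided = conv2d_strided(image, kernel, stride=2)
print(len(strided), "X", len(strided[0]))  # 2 X 2
```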
Input – 5 X 5
Filter – 3 X 3 X nf (Filter 1: weights w1[0:0] to w1[2:2] and bias b1)
Output – 3 X 3 X nf
Here, filter size = 3 X 3,
Number of total parameters for each filter = 9 weights + 1 bias
Total parameters of the layer = (9+1) * nf, where nf is the number of filters (output channels)
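The parameter count can be written as a small helper. The function name and the `in_channels` argument are additions for illustration. With `in_channels=1` the helper reproduces the (9+1) per-filter count above, and with more input channels each filter simply has f * f weights per channel.

```python
def conv_layer_params(f, n_filters, in_channels=1):
    """Trainable parameters of a convolutional layer with f X f filters."""
    weights_per_filter = f * f * in_channels
    return (weights_per_filter + 1) * n_filters  # +1 for each filter's bias


print(conv_layer_params(3, 1))     # one 3 X 3 filter: 9 weights + 1 bias = 10
print(conv_layer_params(3, 8))     # eight filters: (9 + 1) * 8 = 80
print(conv_layer_params(3, 8, 3))  # RGB input: (27 + 1) * 8 = 224
```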
Convolution on colored images (RGB)
• The number of channels in the input layer, n_c, is 3 (Red, Green and Blue)
• The filter is applied to each channel of the input, and the per-channel
results are summed to give one output feature map per filter
Input – 5 X 5
Filter 1 ... Filter n, each 3 X 3, with biases b1 ... bn
Output of the convolution with each filter – 3 X 3
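Convolution over an RGB input can be sketched as below, assuming each filter holds one 3 X 3 slice per channel and that the per-channel products are summed together with the bias. The function name, the two filters and their values are illustrative, not taken from the slides.

```python
def conv2d_multichannel(image, kernel, bias=0.0):
    """image: [channels][n][n], kernel: [channels][f][f]. Returns one 2-D map."""
    n_c, n, f = len(image), len(image[0]), len(kernel[0])
    out = n - f + 1
    return [
        [
            bias + sum(
                image[c][i + a][j + b] * kernel[c][a][b]
                for c in range(n_c)  # sum over the R, G and B channels
                for a in range(f)
                for b in range(f)
            )
            for j in range(out)
        ]
        for i in range(out)
    ]


rgb = [[[1] * 5 for _ in range(5)] for _ in range(3)]         # 3 channels of 5 X 5
filter_1 = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]  # illustrative weights
filter_2 = [[[0.5] * 3 for _ in range(3)] for _ in range(3)]
biases = [0.0, 1.0]

# One 3 X 3 output map per filter.
feature_maps = [
    conv2d_multichannel(rgb, k, b) for k, b in zip([filter_1, filter_2], biases)
]
print(len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0]))  # 2 3 3
```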
Pooling
• Progressively reduce the spatial size of the representation to
reduce the network complexity and computational cost.
• Helps reduce overfitting during the feature extraction process
• Widely used pooling layers:
• Max Pooling
• Average Pooling
Max Pooling
• Max pooling applies a MAX filter: it keeps the largest value in each window
• Max pooling selects the brighter pixels from the image.
It is useful when only the lighter pixels of the image
are of interest
Input 5 X 5:
1 2 5 5 3
1 6 1 5 0
10 6 5 3 1
7 5 11 4 4
1 -2 9 3 0
filter 3 X 3, Stride = 2, max pooling
Output ((5−3)/2 + 1) X ((5−3)/2 + 1) -> 2 X 2:
10 5
11 11
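The max pooling example can be reproduced with a short sketch. The input values are those of the 5 X 5 grid in the slide's figure. The function name `max_pool` is an assumption for illustration.

```python
def max_pool(image, f, stride):
    """Keep the largest value in each f X f window, moving `stride` per step."""
    n = len(image)
    out = (n - f) // stride + 1
    return [
        [
            max(
                image[i * stride + a][j * stride + b]
                for a in range(f)
                for b in range(f)
            )
            for j in range(out)
        ]
        for i in range(out)
    ]


image = [
    [1, 2, 5, 5, 3],
    [1, 6, 1, 5, 0],
    [10, 6, 5, 3, 1],
    [7, 5, 11, 4, 4],
    [1, -2, 9, 3, 0],
]
pooled = max_pool(image, f=3, stride=2)
print(pooled)  # [[10, 5], [11, 11]]
```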
Average Pooling
• This pooling layer outputs the average of the values in each pool.
• It differs from Max Pooling in the sense that it retains much information about the
“less important” elements of a block, or pool.
Input 4 X 4:
6 1 5 0
6 5 3 1
5 11 4 4
-2 9 3 0
filter = 2 X 2, Stride = 2, average pooling
Output ((4−2)/2 + 1) X ((4−2)/2 + 1) -> 2 X 2:
4.5 2.25
5.75 2.75
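The same check for average pooling, using the 4 X 4 grid from the slide's figure. The function name `average_pool` is hypothetical.

```python
def average_pool(image, f, stride):
    """Replace each f X f window with the mean of its values."""
    n = len(image)
    out = (n - f) // stride + 1
    return [
        [
            sum(
                image[i * stride + a][j * stride + b]
                for a in range(f)
                for b in range(f)
            ) / (f * f)
            for j in range(out)
        ]
        for i in range(out)
    ]


image = [
    [6, 1, 5, 0],
    [6, 5, 3, 1],
    [5, 11, 4, 4],
    [-2, 9, 3, 0],
]
averaged = average_pool(image, f=2, stride=2)
print(averaged)  # [[4.5, 2.25], [5.75, 2.75]]
```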
Typical Architecture of CNN
• Convolutional layers: Perform convolution operations by applying filters (sets of learnable
parameters) to the input.
• Pooling layers: Perform pooling operations that reduce the spatial dimensions of the output of
the convolutional layers.
• Fully connected layers: Connect every neuron in one layer to every neuron in the next layer and
compute the final classification or regression output.
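To make the fully connected step concrete, here is a minimal sketch that flattens a feature map and passes it through one dense layer with ReLU. All names, weights and biases are made-up values for illustration. A real network would learn them during training.

```python
def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one flat vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(inputs, weights, biases):
    """Each output neuron is a weighted sum of every input plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]


feature_maps = [[[1, 2], [3, 4]]]  # one illustrative 2 X 2 pooled map
weights = [[1.0, 0.0, 0.0, -1.0], [0.5, 0.5, 0.5, 0.5]]  # 2 neurons, 4 inputs each
biases = [0.0, 1.0]

activations = relu(fully_connected(flatten(feature_maps), weights, biases))
print(activations)  # [0.0, 6.0]
```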
Input -> ((convolution -> relu) x n -> pooling) x m -> (fully-connected layer -> relu) x k -> output layer
VGG 16
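As a rough check of how this pattern plays out in VGG 16, the sketch below traces feature-map shapes through the convolution and pooling blocks. It assumes the standard VGG-16 configuration, which is not spelled out in the slides: a 224 X 224 X 3 input, five blocks of 3 X 3 same-padded convolutions (2, 2, 3, 3, 3 layers with 64, 128, 256, 512, 512 filters), each followed by 2 X 2 max pooling with stride 2. The function name `trace_shapes` is hypothetical.

```python
def trace_shapes(size, channels, blocks, f=3, p=1, pool=2):
    """Return the (height, width, channels) shape after each conv/pool block."""
    shapes = [(size, size, channels)]
    for n_convs, n_filters in blocks:
        for _ in range(n_convs):
            size = size + 2 * p - f + 1   # same padding keeps the spatial size
            channels = n_filters          # one output channel per filter
        size = (size - pool) // pool + 1  # pooling with stride equal to pool size
        shapes.append((size, size, channels))
    return shapes


# (number of conv layers, number of filters) per block, assumed VGG-16 layout.
vgg16_blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
shapes = trace_shapes(224, 3, vgg16_blocks)
for shape in shapes:
    print(shape)

# Size of the vector handed to the fully connected layers after flattening.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```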
Data Augmentation
• Techniques for increasing the size of the training dataset by applying random transformations
to the input data, such as rotation, flipping and scaling.
• Increases the amount of data available to a machine learning model by adding slightly
modified copies of existing data or synthetic data newly created from existing data.
• Enhances the diversity of the training dataset, which can improve the
model's ability to generalize to new data.
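A minimal sketch of random augmentation on a small grid follows, covering only two of the transformations mentioned above: horizontal flipping and rotation by multiples of 90 degrees. The function names and the 50% flip probability are illustrative assumptions. Image libraries provide richer versions of these operations.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image, rng):
    """Return a randomly flipped and rotated copy of the image."""
    out = [list(row) for row in image]
    if rng.random() < 0.5:            # flip half of the time (assumed probability)
        out = horizontal_flip(out)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate_90(out)
    return out


image = [[1, 2], [3, 4]]
rng = random.Random(0)  # seeded so the run is repeatable
augmented = [augment(image, rng) for _ in range(4)]  # four modified copies
```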