Unit 3
• The resulting feature map has smaller dimensions than the original input image.
• This could be detrimental to the model's predictions, as our features will continue to pass through
many more convolutional layers. How do we ensure that we do not miss out on any vital
information?
• When we incorporate padding into the process (together with an appropriately chosen stride), we can ensure that our input and output volumes remain the same size, thus maintaining the spatial arrangement of visual features.
1. Preserve Spatial Dimensions: Without padding, each convolution operation
reduces the spatial dimensions (height and width) of the input image. This can lead
to a significant reduction in size after multiple layers, which might not be desirable
for certain applications where maintaining the spatial dimensions is crucial.
2. Edge Information: Pixels at the edges and corners of an image often contain
important information. Without padding, the filters would only pass over these
pixels fewer times compared to the central pixels. This means the edge information
would be less influential in the output feature map, potentially leading to loss of
valuable details.
3. Control Output Size: Padding helps control the output size of the convolutional
layers. By appropriately padding the input, we can ensure the output feature map
has the desired dimensions, which can be crucial for designing certain types of
network architectures.
• Let's say 'p' is the padding.
• Initially (without padding):
• (N x N) * (F x F) = (N - F + 1) x (N - F + 1) ---(1)
• After applying padding:
• If we apply an F x F filter to the padded (N + 2p) x (N + 2p) input matrix, the output matrix has dimension (N + 2p - F + 1) x (N + 2p - F + 1). Since padding is meant to give us back the original input dimension (N x N):
• (N + 2p - F + 1) x (N + 2p - F + 1) must equal N x N
• N + 2p - F + 1 = N ---(2)
• 2p = F - 1, so p = (F - 1)/2 ---(3)
• Equation (3) clearly shows that the padding depends on the dimension of the filter.
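As a quick check of equation (3), here is a minimal Python sketch (the helper name conv_output_size is just for this example) that computes the output width of a stride-1 convolution without padding and with p = (F - 1)/2:

    def conv_output_size(n, f, p=0):
        # Output width/height of a stride-1 convolution: n + 2p - f + 1
        return n + 2 * p - f + 1

    n, f = 32, 5
    print(conv_output_size(n, f))        # no padding: 32 - 5 + 1 = 28
    p = (f - 1) // 2                     # padding from equation (3): p = 2
    print(conv_output_size(n, f, p))     # with padding: 32, same as the input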
• There are three different kinds of padding:
• Valid padding: Also known as no padding. In this
specific case, the last convolution is dropped if the
dimensions do not align.
• Same padding: This padding ensures that the output
layer has the exact same size as the input layer.
• Full padding: This kind of padding increases the size of
the output by adding zeros to the borders of the input
matrix.
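To make the three cases concrete, this short sketch (assuming stride 1 and an odd filter size) prints the output size under valid (p = 0), same (p = (F - 1)/2) and full (p = F - 1) padding:

    n, f = 6, 3
    for name, p in [("valid", 0), ("same", (f - 1) // 2), ("full", f - 1)]:
        out = n + 2 * p - f + 1          # stride-1 output size
        print(name, "padding, p =", p, "-> output", out, "x", out)
    # valid padding, p = 0 -> output 4 x 4
    # same padding, p = 1 -> output 6 x 6
    # full padding, p = 2 -> output 8 x 8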
Stride
• Stride is the distance the filter moves at a time. A filter with a stride of 1 moves over the input image 1 pixel at a time.
• Stride governs how many cells the filter is moved in the input to
calculate the next cell in the result.
With both stride and padding
• With stride S and padding p, the output size in each spatial dimension becomes floor((N + 2p - F)/S) + 1.
• It is important to note that the weights in the filter remain fixed as it
moves across the image. The weight values are adjusted during the
training process due to backpropagation and gradient descent.
• Besides the weights in the filter, there are three other important parameters that need to be set before training begins:
• Number of Filters: This parameter is responsible for defining the
depth of the output. If we have three distinct filters, we have three
different feature maps, creating a depth of three.
• Stride: This is the distance, or number of pixels, that the filter moves
over the input matrix.
• Zero-padding: This parameter is usually used when the filter does not fit the input image. It sets all elements outside the input matrix to zero, producing a larger or equally sized output (a short shape sketch follows this list).
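A minimal sketch of how these three parameters determine the output volume (assuming square inputs and filters; the helper name conv_output_shape is just for this example):

    import math

    def conv_output_shape(n, f, num_filters, p=0, s=1):
        # Spatial size: floor((N + 2p - F) / S) + 1; depth equals the number of filters.
        out = math.floor((n + 2 * p - f) / s) + 1
        return (out, out, num_filters)

    # e.g. a 32x32 input, 3x3 filters, 16 filters, stride 2, zero-padding 1
    print(conv_output_shape(32, 3, 16, p=1, s=2))   # (16, 16, 16)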
• After each convolution operation, we have the
application of a Rectified Linear Unit (ReLU) function,
which transforms the feature map and introduces
nonlinearity.
• As mentioned earlier, the initial convolutional layer can
be followed by additional convolutional layers.
• The subsequent convolutional layers can see the pixels
within the receptive fields of the prior layers, which
helps to extract and interpret additional patterns.
• The output volume of the Conv. layer is fed to an elementwise activation function, commonly a Rectified Linear Unit (ReLU).
• The ReLU layer determines whether an input node will 'fire' given the input data.
• This 'firing' is implemented by thresholding the convolution layer's output at 0: negative values are set to zero and positive values pass through unchanged.
• The dimensions of the volume are left unchanged.
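A minimal NumPy sketch of the ReLU step: each element of the feature map is thresholded at 0, and the dimensions of the volume stay the same.

    import numpy as np

    def relu(feature_map):
        # Elementwise thresholding at 0: negative responses become 0, positive ones 'fire' unchanged.
        return np.maximum(feature_map, 0)

    fm = np.array([[-1.5, 2.0],
                   [0.3, -0.7]])
    print(relu(fm))         # negative entries are zeroed, positives are unchanged
    print(relu(fm).shape)   # (2, 2) -- same dimensions as the input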
Pooling Layer
• The pooling layer is responsible for reducing the
dimensionality of the input. It also slides a filter across
the entire input — without any weights — to populate
the output array. We have two main types of pooling:
• Max Pooling: As the filter slides through the input, it
selects the pixel with the highest value for the output
array.
• Average Pooling: The value selected for the output
is obtained by computing the average within the
receptive field.
• Interesting properties of the pooling layer:
• The stride usually depends on the filter size (if the filter is 2x2, then the stride can be 2)
• It has hyper-parameters:
• size (f)
• stride (s)
• type (max or avg)
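A small NumPy sketch of both pooling types with a 2x2 filter and stride 2 (no weights involved; the function name pool2d is just for illustration):

    import numpy as np

    def pool2d(x, f=2, s=2, mode="max"):
        # Slide an f x f window with stride s over x and take the max or the average.
        h = (x.shape[0] - f) // s + 1
        w = (x.shape[1] - f) // s + 1
        out = np.zeros((h, w))
        for i in range(h):
            for j in range(w):
                window = x[i * s:i * s + f, j * s:j * s + f]
                out[i, j] = window.max() if mode == "max" else window.mean()
        return out

    x = np.array([[1., 3., 2., 4.],
                  [5., 6., 7., 8.],
                  [3., 2., 1., 0.],
                  [1., 2., 3., 4.]])
    print(pool2d(x, mode="max"))   # [[6., 8.], [3., 4.]]
    print(pool2d(x, mode="avg"))   # [[3.75, 5.25], [2., 2.]]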
Why Max Pooling?
• The number of feature maps output by the max pooling layer is the same as the number of feature maps it receives from the convolutional layer.
• Max pooling is not required after every convolutional layer operation.
• It has no learnable parameters, so it is essentially a fixed transformation operation.
Fully-connected Layer
• This is the layer responsible for performing the classification task based on the features extracted during the
previous layers. While both convolutional and pooling
layers tend to use ReLU functions, fully-connected
layers use the Softmax activation function for
classification, producing a probability from 0 to 1.
• After features are extracted by the pooling layers, the array is flattened into 1D and given as input to the fully connected layer.
• The layer is called fully connected because the nodes of one layer are connected to all the nodes of the previous layer as well as the next layer.
• The number of outputs is the same as the number of categories for classification.
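A minimal NumPy sketch of the fully-connected stage under assumed shapes (a 4x4x8 pooled volume and 10 categories; the names W, b and softmax are just for this example): the pooled features are flattened to 1D, every output node is connected to every input, and softmax turns the scores into probabilities between 0 and 1.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())            # subtract the max for numerical stability
        return e / e.sum()

    rng = np.random.default_rng(0)
    pooled = rng.standard_normal((4, 4, 8))                  # stand-in for the pooled feature maps
    x = pooled.flatten()                                     # flatten to a 1D vector of length 128
    num_classes = 10                                         # one output per category
    W = 0.01 * rng.standard_normal((num_classes, x.size))    # fully connected: every output sees every input
    b = np.zeros(num_classes)
    probs = softmax(W @ x + b)
    print(probs.shape, round(probs.sum(), 6))                # (10,) 1.0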
Regularization
• AlexNet was the first convolutional network which used GPU to boost
performance.
• AlexNet architecture consists of 5 convolutional layers, 3 max-
pooling layers, 2 normalization layers, 2 fully connected layers, and 1
SoftMax layer.
• Each convolutional layer consists of convolutional filters and a
nonlinear activation function ReLU.
• The pooling layers are used to perform max pooling.
• Input size is fixed due to the presence of fully connected layers.
• The input size is quoted in most places as 224x224x3, but due to the padding involved it effectively works out to 227x227x3.
• AlexNet overall has 60 million parameters.
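If torchvision is available, its reference AlexNet implementation (which follows the original closely, though not exactly) can be used to confirm the parameter count of roughly 60 million:

    from torchvision.models import alexnet

    model = alexnet()                                        # untrained network, 1000 output classes
    n_params = sum(p.numel() for p in model.parameters())
    print(round(n_params / 1e6, 1), "million parameters")    # about 61 million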
• Used normalization layers, which are not common anymore.
• Batch size of 128
• SGD with momentum as the learning algorithm
• Heavy Data Augmentation with things like flipping,
jittering, cropping, color normalization, etc.
• Ensemble of models to get the best results.
• The overfitting problem: AlexNet had 60 million parameters, which made overfitting a major issue.
• Two methods to reduce overfitting:
• Data Augmentation
• Dropout
Data Augmentation