Unit 2 Convolutional Neural Network
Introduction to CNN
Convolutional Layer:
● The cornerstone of CNNs is the convolutional layer. It performs
convolution operations on input data using learnable filters (also
called kernels). These filters slide over the input to extract local
patterns and features.
● Convolution involves element-wise multiplication of the filter
with a local region of the input and then summing the results.
This process captures spatial hierarchies.
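A minimal NumPy sketch of this operation, assuming a single-channel input and one
3x3 filter (real convolutional layers also sum over input channels and add a learned
bias), might look like this:

import numpy as np

def conv2d_valid(x, w):
    # Slide the filter over the input; at each position, multiply
    # element-wise with the local region and sum the results.
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * w)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 single-channel input
w = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])                  # simple vertical-edge filter
print(conv2d_valid(x, w).shape)                # (3, 3) feature map

(Strictly speaking, deep-learning libraries implement cross-correlation, i.e. the filter
is not flipped; the term "convolution" is used loosely.)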
Pooling Layer:
● After convolutional layers, pooling layers are employed for
downsampling and dimensionality reduction. Pooling
aggregates information from neighboring regions, reducing the
spatial dimensions.
● Common pooling techniques include max pooling (selecting the
maximum value in a region) and average pooling (taking the
average value).
Activation Functions:
● Activation functions like ReLU (Rectified Linear Unit) introduce
non-linearity into the network, allowing it to capture complex
relationships in the data.
● ReLU outputs the input directly if it's positive, otherwise outputs
zero.
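In code this is a single element-wise operation; a small NumPy illustration:

import numpy as np

def relu(x):
    # Keep positive values, clamp negative values to zero
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # -> [0., 0., 0., 1.5, 3.]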
Weight Sharing and Local Connectivity:
● CNNs leverage weight sharing, where the same set of filters is
applied across different spatial locations in the input. This
allows the network to learn local features irrespective of their
position.
● This local connectivity reduces the number of parameters and
enhances the model's ability to capture patterns across the
entire input.
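A rough, illustrative comparison (the sizes below are assumed for the example, not
taken from these notes): applying sixteen 3x3 filters to a 32x32x3 image needs only a
few hundred parameters, whereas a dense layer producing the same 32x32x16 output
without weight sharing would need tens of millions.

# Convolutional layer: 16 filters, each 3x3x3, plus one bias per filter
conv_params = 16 * (3 * 3 * 3) + 16             # 448 parameters

# Hypothetical dense layer mapping all 32*32*3 inputs to all 32*32*16 outputs
dense_params = (32 * 32 * 3) * (32 * 32 * 16)   # 50,331,648 weights (biases excluded)

print(conv_params, dense_params)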
Hierarchy of Features:
● CNNs learn features hierarchically. Lower layers capture simple
features like edges, corners, and textures, while deeper layers
learn more complex features like object parts and shapes.
● This hierarchy enables the network to understand the context
and composition of objects.
Fully Connected Layers:
● Typically, one or more fully connected layers follow the
convolutional and pooling layers to make predictions based on
the extracted features.
● These layers integrate the learned features and produce the final
output.
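A brief PyTorch sketch of such a classifier head (the shapes and layer sizes here are
illustrative assumptions):

import torch
import torch.nn as nn

features = torch.randn(1, 64, 8, 8)           # e.g. 64 feature maps of size 8x8 from the conv/pool stack
classifier = nn.Sequential(
    nn.Flatten(),                             # 64*8*8 = 4096-dimensional feature vector
    nn.Linear(64 * 8 * 8, 256), nn.ReLU(),    # fully connected layer integrates the features
    nn.Linear(256, 10),                       # final output, e.g. scores for 10 classes
)
print(classifier(features).shape)             # torch.Size([1, 10])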
Training and Backpropagation:
● CNNs are trained using backpropagation and optimization
algorithms. Gradients are computed to update the network's
parameters (weights and biases) iteratively.
● The loss function measures the discrepancy between predicted
and actual values, and the network learns to minimize this loss.
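A minimal PyTorch-style sketch of one such update step (the tiny model, dummy data,
and hyperparameters below are illustrative assumptions):

import torch
import torch.nn as nn

model = nn.Sequential(                         # small example CNN
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
criterion = nn.CrossEntropyLoss()              # loss: discrepancy between predicted and actual labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(4, 3, 32, 32)             # dummy batch of four 32x32 RGB images
labels = torch.randint(0, 10, (4,))            # dummy class labels

optimizer.zero_grad()                          # clear gradients from the previous step
loss = criterion(model(images), labels)        # forward pass and loss computation
loss.backward()                                # backpropagation: compute gradients
optimizer.step()                               # update weights and biases

In practice this step is repeated over many mini-batches and epochs.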
Advantages of CNNs:
● Parameter efficiency through weight sharing and local connectivity, robustness
to the position of patterns in the input, and hierarchical feature learning (each
discussed above).
Applications of CNNs:
● Image classification, object detection, and image segmentation (see the Use
Cases under Pooling below).
Convolution Operation
1. Local Receptive Field: During each step of the sliding process, the filter
covers a small region of the input, known as the local receptive field. This
receptive field captures local patterns and features. By convolving the filter
over the entire input, the network can detect various features across the data.
2. Filter and Feature Map: The filter is a small matrix of learnable weights. It
represents a pattern or feature that the network aims to detect. As the filter
slides over the input, its weights are element-wise multiplied with the
corresponding input values within the receptive field. The results are summed
to produce a single value in the feature map.
3. Strides and Padding: The stride defines the step size by which the filter
moves as it slides over the input. A larger stride results in smaller output
dimensions. Padding can be added to the input to ensure that the filter covers
the edges and corners adequately. Common padding methods include "same"
padding (output size equals input size) and "valid" padding (no padding,
output size is reduced). A worked output-size example is given after this list.
4. Advantages of Convolution:
● Parameter Sharing: The same filter weights are shared across different
spatial locations, reducing the number of parameters and making
learning more efficient.
● Local Patterns: Convolution focuses on local patterns, allowing CNNs
to capture features regardless of their position in the input.
● Feature Hierarchy: By stacking multiple convolutional layers, CNNs can
learn hierarchical features, from simple edges to complex objects.
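As a worked example of the stride and padding settings above (the formula is a
standard one, stated here for reference): for input width W, filter size F, padding P
and stride S, the output width is floor((W - F + 2P) / S) + 1. A 32-wide input with a
3x3 filter, stride 1 and no padding ("valid") gives (32 - 3)/1 + 1 = 30; with "same"
padding of 1 it stays at 32; and raising the stride to 2 with padding 1 gives
floor((32 - 3 + 2)/2) + 1 = 16.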
Equivariant Representation
● Convolution produces a representation that is equivariant to translation: if the
input is shifted, the resulting feature map is shifted by the same amount, so the
same filter detects a pattern wherever it appears in the input.
● Pooling applied on top of this adds a degree of invariance, so small shifts of the
input change the pooled output very little.
Pooling
● Pooling layers are commonly inserted between successive convolutional
layers to progressively reduce the spatial size (width and height) of the
data representation. This progressive reduction helps control overfitting.
The pooling layer operates independently on every depth slice of the input.
● The pooling layer uses the max() operation to resize the input data
spatially (width, height). This operation is referred to as max pooling.
With a 2 × 2 filter size, the max() operation is taking the largest of four
numbers in the filter area. This operation does not affect the depth
dimension.
● Pooling layers use filters to perform the downsampling process on the
input volume. These layers perform downsampling operations along
the spatial dimension of the input data. This means that if the input
image were 32 pixels wide by 32 pixels tall, the output image would be
smaller in width and height (e.g., 16 pixels wide by 16 pixels tall).
● The most common setup for a pooling layer is to apply 2 × 2 filters with
a stride of 2. This will downsample each depth slice in the input volume
by a factor of two on the spatial dimensions (width and height). This
downsampling operation will result in 75 percent of the activations
being discarded.
● Pooling layers have no learnable parameters, because they compute a fixed
function of the input volume; they do, however, introduce additional
hyperparameters such as the filter size and stride. It is not common to use
zero-padding for pooling layers.
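A small NumPy sketch of 2 × 2 max pooling with stride 2 on a single depth slice (an
even input size is assumed):

import numpy as np

def max_pool_2x2(x):
    # Take the largest value in each non-overlapping 2x2 block (stride 2)
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 0, 8]], dtype=float)
print(max_pool_2x2(x))   # [[6. 4.]
                         #  [7. 9.]] -- 16 activations reduced to 4 (75 percent discarded)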
Types of Pooling:
1. Max Pooling:
a. Max pooling extracts the maximum value within each pooling
window. It effectively retains the most activated feature in the
region.
b. Max pooling is robust to noise and minor variations in the data.
c. It emphasizes dominant features and helps the network learn
patterns regardless of their precise location.
2. Average Pooling:
a. Average pooling calculates the average value within each
pooling window.
b. It's less sensitive to outliers and emphasizes overall trends in
the data.
3. Global Average Pooling (GAP):
a. GAP takes the average of all values in the feature map, reducing
it to a single value per feature channel.
b. It serves as a form of regularization by encouraging the network
to focus on the most important features.
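Continuing the same toy example, average pooling and global average pooling differ
only in the region they average over (again a NumPy sketch for illustration):

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 0, 8]], dtype=float)

avg_pool = x.reshape(2, 2, 2, 2).mean(axis=(1, 3))   # mean of each 2x2 block:
                                                     # [[3.75 2.25]
                                                     #  [4.   4.5 ]]
gap = x.mean()                                       # global average pooling: 3.625 (one value per channel)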
Pooling Process:
1. Pooling Window:
a. A pooling window (also known as the pooling kernel) is a
fixed-size window that slides over the input feature map.
b. It defines the local region from which information will be
summarized.
2. Pooling Operation:
a. For each pooling window position, the pooling operation (max or
average) is applied to the values within the window.
b. This operation produces a downsampled output value for that
region.
3. Stride:
a. The stride determines the step size at which the pooling window
moves over the input.
b. Larger strides result in more aggressive downsampling.
4. Padding:
a. Padding can be added around the input to control the output
dimensions after pooling.
b. It ensures that spatial dimensions are preserved or adjusted as
needed.
Use Cases:
1. Image Classification: Pooling is widely used in image classification tasks to
extract key features from images.
2. Object Detection: Pooling helps extract features that are robust to changes
in object position and size.
3. Image Segmentation: In segmentation tasks, pooling can be used to reduce
spatial dimensions while preserving key information for segmentation masks.
CNN Architecture
1. Input Layer: The input layer of a CNN receives the raw data, which is usually
an image or a set of images. Images are represented as multi-dimensional
arrays of pixel values, where each pixel's intensity or color information is
encoded.
2. Fully Connected Layer: After several convolutional and pooling layers, fully
connected layers are employed for high-level reasoning over the extracted
features. These layers resemble traditional neural network layers: each neuron
is connected to all neurons in the previous and subsequent layers. Fully
connected layers can learn complex relationships but come with a large
number of parameters.
3. Flatten Layer: Before entering the fully connected layers, the feature maps
are flattened into a one-dimensional vector. This transformation is necessary
because fully connected layers expect a fixed-size vector input, whereas feature
maps are multi-dimensional (height × width × channels) and spatially organized.
4. Design Hyperparameters:
● Depth and Width: Increasing depth (more layers) and width (more
neurons per layer) enhances the network's capacity to capture complex
features but requires more computational resources.
● Padding: Padding can be applied to maintain spatial dimensions after
convolutions (same padding) or reduce dimensions (valid padding).
● Strides: Strides control how the convolutional kernels move over the
input, affecting output size and feature extraction.
5. Data Augmentation and Regularization: To prevent overfitting and enhance
generalization, data augmentation involves creating variations of the training
data. Regularization techniques like dropout (randomly disabling neurons) and
L2 regularization (penalizing large weights) are used.
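A brief PyTorch sketch of how these techniques commonly appear in code (the
particular transforms, dropout rate, and weight-decay value are illustrative choices,
not taken from these notes):

import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random variations of the training images
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])

# Dropout: randomly disable neurons during training
layer = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Dropout(p=0.5))

# L2 regularization: penalize large weights via weight decay in the optimizer
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01, weight_decay=5e-4)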
AlexNet
1. Input Layer:
● AlexNet's input layer receives images as input, with each image having
a fixed size of 227x227 pixels.
● The images consist of three color channels (RGB), representing red,
green, and blue color intensities.
2. Convolutional Layers:
● AlexNet stacks five convolutional layers that apply learnable filters to the
input, starting with large 11x11 filters in the first layer and moving to
smaller 5x5 and 3x3 filters in the deeper layers.
● Each convolutional layer produces a set of feature maps that is passed on
to the next stage.
3. Activation Functions:
● ReLU (Rectified Linear Unit) activation functions are employed after
each convolutional layer.
● ReLU introduces non-linearity by setting negative values to zero and
passing positive values unchanged.
● This non-linearity enables the network to learn complex relationships in
the data.
4. Pooling Layers:
● AlexNet utilizes max pooling layers after the first and second
convolutional layers.
● A 3x3 window is moved with a stride of 2, resulting in downsampling
and enhancing translation invariance.
● Max pooling selects the maximum value within each local region,
reducing the spatial dimensions while preserving key features.
5. Softmax Activation:
● The output layer follows the final fully connected layer and utilizes the
softmax activation function.
● Softmax transforms the raw class scores into a probability distribution,
assigning a probability to each class.
● The class with the highest probability is predicted as the class label.
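The softmax function is defined as softmax(z)_i = exp(z_i) / Σ_j exp(z_j). A small
NumPy check (the example scores are arbitrary):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])     # raw class scores
print(softmax(scores))                 # approx. [0.659 0.242 0.099]; sums to 1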
6. Training Details:
a. Data Augmentation
b. Regularization
c. Optimization
d. Learning Rate
7. Optimization and Learning Rate:
● AlexNet uses the stochastic gradient descent (SGD) optimization algorithm
with momentum to update model parameters.
● Momentum accelerates convergence by adding a fraction of the previous
parameter update to the current update.
● The initial learning rate (e.g., 0.01) is gradually reduced during training to
achieve better convergence.
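A hedged PyTorch sketch of this optimization setup, using torchvision's AlexNet-style
model (the exact momentum, weight decay, and schedule below are typical values, not
necessarily those of the original training run):

import torch
from torchvision import models

model = models.alexnet(num_classes=1000)

# SGD with momentum: a fraction of the previous update is added to the current one
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)

# Gradually reduce the learning rate (here: divide by 10 every 30 epochs)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one training pass over the data would go here ...
    scheduler.step()   # decay the learning rate for better convergence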