Unit III
Local Connectivity: Images exhibit a hierarchical structure with local patterns and features. Traditional fully connected neural networks do not consider the spatial relationships between pixels, resulting in a large number of parameters and inefficient learning of local patterns. CNNs, with their convolutional layers, take advantage of local connectivity, allowing the network to learn features using shared weights.
Parameter Sharing: CNNs use parameter sharing through the convolution operation. A single set of weights (a filter) is applied to different parts of the input image. This reduces the number of parameters in the network, making it more efficient and easier to train. Parameter sharing also helps in learning translation-invariant features.
Reduction of Spatial Dimensions: Pooling layers in CNNs reduce the spatial dimensions of the feature maps, making the computation more tractable and decreasing the risk of overfitting. Pooling layers also help in maintaining invariance to small translations.
Transfer Learning: CNNs are well-suited for transfer learning, where a model pre-trained on a large dataset (e.g., ImageNet) can be fine-tuned for a specific task with a smaller dataset. This is particularly valuable when labeled data for the specific task is limited.
Convolutional Neural Network Layers (CNN Layers)
Let us now consider the 2-D case, i.e., an image.
For the rest of the discussion we use a formulation of the convolution in which the kernel is centered on the pixel of interest, so we look at both the preceding and the succeeding neighbors.
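One standard way to write such a centered convolution, for an image $I$ and a kernel $K$ of size $(2m+1) \times (2n+1)$, is

    S(i, j) = \sum_{a=-m}^{m} \sum_{b=-n}^{n} I(i - a,\; j - b)\, K(a, b),

where $S$ is the output feature map. Most deep-learning libraries actually compute the closely related cross-correlation, which uses $I(i + a, j + b)$ instead; the distinction is immaterial when the kernel weights are learned.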
The resulting image should highlight the detected edges.
Instead of using hand-crafted kernels such as an edge detector, can we learn meaningful kernels/filters in addition to learning the weights of the classifier?
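As a rough illustration of the difference (assuming PyTorch; the kernel values and tensor shapes below are only illustrative), the same convolution operation can apply either a fixed hand-crafted kernel or a set of learnable filters whose values are adjusted during training:

    import torch
    import torch.nn.functional as F

    # A hand-crafted 3x3 edge-detection (Laplacian-style) kernel.
    edge_kernel = torch.tensor([[-1., -1., -1.],
                                [-1.,  8., -1.],
                                [-1., -1., -1.]]).view(1, 1, 3, 3)

    image = torch.rand(1, 1, 28, 28)                  # dummy grayscale image
    edges = F.conv2d(image, edge_kernel, padding=1)   # fixed, hand-crafted filter

    # The same operation with learnable weights: the kernel values become
    # parameters that gradient descent adjusts instead of being set by hand.
    learned_conv = torch.nn.Conv2d(in_channels=1, out_channels=8,
                                   kernel_size=3, padding=1)
    features = learned_conv(image)                    # 8 learned filters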
Dropout Layer
Dropout is a regularization technique in which a random set of neurons is ignored during training. This helps prevent overfitting by promoting redundancy in the network.
Batch Normalization Layer
Batch normalization normalizes the input of a layer by adjusting and scaling the activations. This can
speed up training and improve the overall stability of the neural network.
These layers are typically stacked together to form the architecture of a CNN.
The convolutional and pooling layers are responsible for extracting features from the input data, and the fully
connected layers are responsible for making predictions or classifications based on these features.
The arrangement and number of these layers can vary depending on the specific architecture of the CNN.
Popular CNN architectures include AlexNet, VGGNet, GoogLeNet, and ResNet.
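As a minimal sketch of such a stack (assuming PyTorch; the layer sizes, the 32x32 RGB input, and the 10-class output are illustrative rather than taken from any of these architectures):

    import torch.nn as nn

    # Feature extractor: convolution + batch norm + ReLU + pooling, repeated.
    # Classifier: dropout + fully connected layer on the flattened features.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.BatchNorm2d(16),
        nn.ReLU(),
        nn.MaxPool2d(2),                  # 32x32 -> 16x16
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.MaxPool2d(2),                  # 16x16 -> 8x8
        nn.Flatten(),
        nn.Dropout(p=0.5),
        nn.Linear(32 * 8 * 8, 10),        # 10-class classifier (illustrative)
    )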
Stride
The stride defines how much the filter moves across the input data in each step. A larger stride reduces the size
of the output feature map, while a smaller stride increases its size. Stride is a hyperparameter that influences the spatial dimensions of the feature maps.
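The usual output-size relation is output = floor((input - kernel + 2*padding) / stride) + 1; a quick check of the effect of stride (assuming PyTorch, with illustrative shapes):

    import torch
    import torch.nn as nn

    x = torch.rand(1, 3, 32, 32)                          # dummy 32x32 RGB input

    conv_s1 = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1)
    conv_s2 = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)

    print(conv_s1(x).shape)   # [1, 8, 32, 32] -> (32 - 3 + 2)/1 + 1 = 32
    print(conv_s2(x).shape)   # [1, 8, 16, 16] -> floor((32 - 3 + 2)/2) + 1 = 16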
Filters, combined with activation functions (such as ReLU) and pooling layers, enable CNNs to learn hierarchical representations of features in the input data.
As the network goes deeper, it learns more abstract and complex features, making it capable of understanding and
recognizing intricate patterns in the data.
The weights of the filters are adjusted during training to minimize the error in the network's predictions.
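A minimal sketch of that weight update (assuming PyTorch; the model, the dummy data, and the learning rate are placeholders):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.Flatten(), nn.Linear(8 * 28 * 28, 10))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    images = torch.rand(16, 1, 28, 28)           # dummy batch of images
    labels = torch.randint(0, 10, (16,))         # dummy class labels

    optimizer.zero_grad()
    loss = criterion(model(images), labels)      # prediction error
    loss.backward()                              # gradients w.r.t. every filter weight
    optimizer.step()                             # adjust weights to reduce the error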
Parameter Sharing in CNN
Parameter sharing is a key concept in Convolutional Neural Networks (CNNs) that contributes to the efficiency and
effectiveness of these networks, particularly in image recognition tasks. The idea behind parameter sharing is to use the
same set of parameters (weights and biases) for multiple units in a layer.
In the context of CNNs, parameter sharing is primarily applied to the filters (kernels) used in convolutional layers.
It rests on two closely related ideas: shared weights and local receptive fields.
Shared Weights
In a convolutional layer, a filter is used to perform convolutional operations on the input data. Instead of having
separate weights for each position in the input, the same set of filter weights is shared across the entire input. This
means that the filter slides over the input, and the same weights are used at every position.
Local Receptive Fields
Each unit in the feature map (output of the convolutional layer) is responsible for a small local region in the input
known as the receptive field. The weights of the filter are applied to this local region, and by sliding the filter, the
same weights are applied to different local regions across the entire input.
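The effect of sharing one small filter across the whole image is easy to see by counting parameters; a sketch assuming PyTorch and an illustrative 28x28 input:

    import torch.nn as nn

    # One 3x3 filter shared across the whole image vs. a dense layer mapping
    # a 28x28 input to a 28x28 output.
    conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
    dense = nn.Linear(28 * 28, 28 * 28)

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(conv))    # 10 parameters (3*3 shared weights + 1 bias)
    print(count(dense))   # 615,440 parameters (784*784 weights + 784 biases)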
Regularization in CNN
Regularization is an important concept in machine learning, including Convolutional Neural Networks (CNNs).
Regularization techniques are employed to prevent overfitting, which occurs when a model learns to perform well on the
training data but fails to generalize to new, unseen data. Overfitting is a common concern, especially when dealing with
complex models and limited datasets.
Regularization techniques used in CNNs
Dropout: Dropout is a widely used regularization technique. During training, random units (neurons) are "dropped out" or
set to zero with a certain probability. This helps prevent co-adaptation of neurons, making the network more robust and
reducing overfitting.
Weight Regularization (L1 and L2 regularization): L1 and L2 regularization involve adding a penalty term to the loss
function based on the magnitude of the weights. L1 regularization adds the sum of the absolute values of the weights, while
L2 regularization adds the sum of the squared values. This discourages the model from learning overly complex patterns
and helps prevent overfitting.
Batch Normalization: While primarily used to normalize inputs, batch normalization also has a regularization effect. It
introduces a small amount of noise during training, which can act as a form of regularization and help prevent overfitting.
Data Augmentation: Data augmentation involves applying random transformations to the training data, such as rotations,
flips, and shifts. This artificially increases the size of the training dataset, helping the model generalize better to unseen
data.
Early Stopping: Monitoring the model's performance on a validation set during training and stopping the training process
when the performance starts to degrade is a form of regularization. This helps prevent the model from fitting the training
data too closely and overfitting.
Drop Connect: Drop Connect is an extension of Dropout, where instead of dropping out individual neurons, entire
connections between layers are dropped with a certain probability. This can be applied to the weights in convolutional
layers.
Ensemble Methods: Training multiple models and combining their predictions can also act as a form of regularization.
Each model may learn different aspects of the data, and combining them helps improve generalization.
The choice of regularization techniques depends on the specific characteristics of the dataset and the complexity of the
model. Often, a combination of these techniques is used to achieve the best results in terms of preventing overfitting while
still allowing the model to learn useful patterns from the data.
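Several of these techniques tend to appear together in practice; the following is a rough outline (assuming PyTorch and torchvision; the transforms, model, and thresholds are placeholders, not a prescribed recipe) combining data augmentation, dropout, batch normalization, L2 weight decay via the optimizer, and an early-stopping check:

    import torch
    import torch.nn as nn
    from torchvision import transforms

    # Data augmentation: random transforms applied on the fly to training images.
    train_transform = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.RandomCrop(32, padding=4),
        transforms.ToTensor(),
    ])

    # Dropout and batch normalization live inside the model itself.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Dropout(p=0.5),
        nn.Linear(16 * 16 * 16, 10),
    )

    # L2 regularization is commonly applied through the optimizer's weight_decay.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

    # Early stopping: track the best validation loss and stop when it stalls.
    best_val_loss, patience, bad_epochs = float("inf"), 5, 0
    for epoch in range(100):
        # ... one training pass and one validation pass would go here ...
        val_loss = 0.0   # placeholder for the measured validation loss
        if val_loss < best_val_loss:
            best_val_loss, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break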
AlexNet
AlexNet is a convolutional neural network (CNN) architecture designed for image classification tasks.
It was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, and it won the ImageNet
Large Scale Visual Recognition Challenge (ILSVRC) in 2012, marking a breakthrough in the field of
deep learning.
The key characteristics of AlexNet are as follows:
Architecture: AlexNet consists of eight layers of learnable parameters, including five convolutional
layers, followed by three fully connected layers. The convolutional layers are designed to capture
hierarchical features in the input images.
Rectified Linear Units (ReLU): AlexNet uses the rectified linear unit activation function (ReLU)
throughout most of its layers. ReLU introduces non-linearity to the network, helping it learn complex
patterns in the data.
Local Response Normalization (LRN): Local Response Normalization is applied after the first and
second convolutional layers. It normalizes the responses across different feature maps, enhancing the
network's generalization ability.
Overlapping Max-Pooling: Max-pooling is used for down-sampling in the spatial dimensions. Unlike
traditional pooling methods, AlexNet uses overlapping pooling, meaning that the pooling regions have
some overlap. This helps in capturing more spatial hierarchies.
Dropout: To prevent overfitting, AlexNet incorporates dropout regularization in the fully connected
layers. Dropout randomly drops a certain percentage of neurons during training, forcing the network to
learn more robust features.
Large-Scale Training: AlexNet was one of the first neural networks to be trained on a large-scale
dataset, specifically the ImageNet dataset, which contains millions of labeled images. The massive
scale of the dataset and the computational power required for training contributed to the success of
AlexNet.
GPU Acceleration: AlexNet's successful implementation was made possible, in part, by the use of
Graphics Processing Units (GPUs) for parallelizing the training process. This significantly reduced
training time compared to using traditional Central Processing Units (CPUs).
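For reference, an AlexNet implementation with ImageNet weights can be loaded directly from torchvision (assuming a recent torchvision release; its version is a slight variant of the original, e.g. it omits local response normalization):

    import torch
    from torchvision import models

    # Load AlexNet with ImageNet-pretrained weights (downloaded on first use).
    alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
    alexnet.eval()

    # The model expects 224x224 RGB images normalized with ImageNet statistics.
    dummy = torch.rand(1, 3, 224, 224)
    with torch.no_grad():
        logits = alexnet(dummy)          # 1000 ImageNet class scores
    print(alexnet)                       # shows the 5 conv + 3 fully connected layers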
Transfer Learning Techniques
In the context of Convolutional Neural Networks (CNNs), transfer learning has been particularly
successful due to the ability of CNNs to learn hierarchical features.
Here are some common transfer learning techniques in CNNs:
Feature Extraction: In feature extraction, a pre-trained CNN is used as a fixed feature extractor. The
idea is to remove the last few layers of the pre-trained model, which are typically responsible for task-
specific classification, and retain the earlier layers that have learned generic features. These features
can then be used as input for a new classifier trained on the target task.
Fine-Tuning: Fine-tuning involves taking a pre-trained CNN and further training it on the target task.
Instead of keeping the entire architecture fixed, as in feature extraction, fine-tuning allows the weights
of some layers to be updated during training on the new task. This is especially useful when the target
task is related to the original task, but there are some task-specific nuances.
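A sketch of both approaches using a pretrained torchvision model (ResNet-18, the 5-class head, and the choice of which stage to unfreeze are only examples):

    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Feature extraction: freeze the pretrained backbone...
    for param in model.parameters():
        param.requires_grad = False

    # ...and replace the task-specific head with a new classifier for, say, 5 classes.
    model.fc = nn.Linear(model.fc.in_features, 5)   # new layer is trainable by default

    # Fine-tuning instead: leave some (or all) backbone layers trainable, typically
    # with a smaller learning rate, e.g. unfreeze the last residual stage:
    for param in model.layer4.parameters():
        param.requires_grad = True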
Pre-trained Models: Many pre-trained CNN models are available and have been trained on large-scale
datasets like ImageNet. Models like VGG, ResNet, Inception, and MobileNet are examples.
Researchers and practitioners often use these models as starting points for their own tasks, leveraging
the learned features and hierarchical representations.
Domain Adaptation: Domain adaptation focuses on transferring knowledge from a source domain to a
target domain, where the distributions of data might be different. This is crucial when the labeled data
in the target domain is limited. Techniques like adversarial training can be employed to align the feature
distributions between the source and target domains.
Progressive Neural Networks: In progressive neural networks, the model is trained progressively on
multiple tasks. Each new task involves the addition of new layers or units to the existing model. This
allows the model to learn task-specific features while retaining knowledge from previous tasks.
Knowledge Distillation: Knowledge distillation involves training a smaller model (student) to mimic
the behavior of a larger, well-established model (teacher). The idea is to transfer the knowledge
captured by the teacher model to the smaller model, making it more efficient while retaining much of
the performance.
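A sketch of a typical distillation loss (assuming PyTorch; the temperature and mixing weight are common choices rather than fixed values):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Blend the usual cross-entropy with a term that matches the teacher's
        softened output distribution (Hinton-style knowledge distillation)."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        return alpha * hard + (1 - alpha) * soft

    # Illustrative usage with random logits for a batch of 8 examples, 10 classes.
    student = torch.randn(8, 10)
    teacher = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    loss = distillation_loss(student, teacher, labels)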
Transfer learning can significantly reduce the amount of labeled data required for training a CNN on a
new task, making it a powerful technique in scenarios where obtaining large labeled datasets is
challenging.
The choice of transfer learning technique depends on factors such as the similarity between the source
and target tasks and the amount of labeled data available for the target task.
DenseNet
DenseNet is a neural network architecture introduced by Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger in the paper "Densely Connected Convolutional Networks" (2017).
DenseNet addresses some challenges in traditional deep neural networks, such as the vanishing gradient
problem and the need for a large number of parameters.
The key features of DenseNet are as follows:
Dense Blocks: The architecture introduces dense blocks where each layer is connected to every other
layer in a feed-forward fashion. This dense connectivity allows for feature reuse, facilitates gradient
flow, and helps in learning more compact representations.
Bottleneck Layers: Within dense blocks, bottleneck layers are employed to reduce the number of input
feature maps, making the network more computationally efficient.
Transition Layers: Transition layers are used to control the growth of the network and reduce the
spatial dimensions of the feature maps.
Global Average Pooling (GAP): DenseNet typically uses global average pooling instead of fully
connected layers at the end of the network. This reduces the number of parameters and helps improve
model generalization.
DenseNet architectures have demonstrated strong performance in image classification and other computer
vision tasks.
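The dense connectivity described above amounts to concatenating each layer's new feature maps with everything that came before it; a minimal sketch (assuming PyTorch; the growth rate and channel counts are illustrative):

    import torch
    import torch.nn as nn

    class DenseLayer(nn.Module):
        """BN -> ReLU -> 3x3 conv, then concatenate the new features with the input."""
        def __init__(self, in_channels, growth_rate):
            super().__init__()
            self.bn = nn.BatchNorm2d(in_channels)
            self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1)

        def forward(self, x):
            new_features = self.conv(torch.relu(self.bn(x)))
            return torch.cat([x, new_features], dim=1)   # dense connection

    # A tiny dense block: the channel count grows by the growth rate at every layer.
    block = nn.Sequential(DenseLayer(16, 12), DenseLayer(28, 12), DenseLayer(40, 12))
    out = block(torch.rand(1, 16, 32, 32))
    print(out.shape)    # torch.Size([1, 52, 32, 32])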
PixelNet
PixelNet refers to a neural network architecture designed for semantic segmentation tasks, where the goal is
to assign a class label to each pixel in an input image.
The name "PixelNet" is less widely used than those of other well-known architectures; it most commonly refers to "PixelNet: Representation of the pixels, by the pixels, and for the pixels" (Bansal et al., 2017), which proposed a general approach to pixel-level prediction tasks such as segmentation.
Key features of PixelNet:
Instance Embedding: PixelNet focuses on instance-level segmentation, embedding each pixel with instance-
specific information to distinguish between different object instances.
Pixel Embedding Network: The architecture introduces a Pixel Embedding Network (PEN) to embed pixels
with instance information, allowing the network to understand the relationships between pixels belonging to
the same object instance.
Objectness Score: PixelNet incorporates an objectness score, which helps in distinguishing between object
and non-object pixels, aiding in the segmentation process.