
BAPATLA ENGINEERING COLLEGE :: BAPATLA

(Autonomous)

Deep Learning (20ECJ44)


By
Dr. Naga Raju Challa
Assistant Professor,
Department of ECE,
Bapatla Engineering College,
(Autonomous)
Bapatla.
UNIT- III
Convolutional Neural Networks
Motivation
 The motivation behind designing CNNs is rooted in the need to effectively process and extract features
from structured grid data, with a particular emphasis on images, leading to improvements in various
computer vision tasks.
 Here are some of the key motivations for the development of CNNs:
1. Local Connectivity
2. Parameter Sharing
3. Translation Invariance
4. Feature Hierarchy
5. Reduction of Spatial Dimensions
6. Effective Parameterization
7. Success in Image Recognition
8. Transfer Learning
Motivation towards CNN

 Local Connectivity: Images exhibit a hierarchical structure with local patterns and features. Traditional fully connected neural networks do not consider the spatial relationships between pixels, resulting in a large number of parameters and inefficient learning of local patterns. CNNs, with their convolutional layers, take advantage of this local connectivity, allowing the network to learn features using shared weights.

 Parameter Sharing: CNNs use parameter sharing through the convolution operation. A single set of weights (a filter) is applied to different parts of the input image. This reduces the number of parameters in the network, making it more efficient and easier to train. Parameter sharing also helps in learning translation-invariant features.

 Translation Invariance: CNNs are designed to be translation-invariant, meaning they can recognize patterns regardless of their position in the input. This is achieved through the use of convolutional and pooling layers, enabling the network to capture and recognize features regardless of their spatial location.
Motivation towards CNN

 Feature Hierarchy: CNNs automatically learn hierarchical representations of features. The early layers capture basic features like edges and textures, while deeper layers learn more complex and abstract features. This hierarchical feature extraction is crucial for tasks such as image recognition and object detection.

 Reduction of Spatial Dimensions: Pooling layers in CNNs help in reducing the spatial dimensions of the feature maps, making the computation more tractable and decreasing the risk of overfitting. Pooling layers also help in maintaining invariance to small translations.

 Effective Parameterization: CNNs are designed to be effective in parameterization, meaning they can learn complex representations with a relatively small number of parameters compared to fully connected networks. This is especially important for tasks involving large input data like high-resolution images.
Motivation towards CNN

 Success in Image Recognition: CNNs have demonstrated remarkable success in image recognition tasks, such as the ImageNet Large Scale Visual Recognition Challenge. Their ability to automatically learn and extract hierarchical features from images has contributed to their widespread adoption in computer vision applications.

 Transfer Learning: CNNs are well-suited for transfer learning, where a model pre-trained on a large dataset (e.g., ImageNet) can be fine-tuned for a specific task with a smaller dataset. This is particularly valuable when labeled data for a specific task is limited.
Convolutional Neural Network Layers (CNN Layers)

[Figure slides: entry-by-entry computation of a 1-D convolution]

 This convolution is for 1-D vectors.
 The mathematical expression for the discrete convolution is defined as

$$s(t) = (x * w)(t) = \sum_{a} x(a)\, w(t - a)$$
Convolutional Neural Network Layers (CNN Layers)

 Let us consider the 2-D case, i.e., an image. The 2-D discrete convolution is defined as

$$S(i, j) = (I * K)(i, j) = \sum_{m}\sum_{n} I(m, n)\, K(i - m,\, j - n)$$

[Figure slides: worked examples of 2-D convolution on an image]

 For the rest of the discussion we use the following centered (cross-correlation) form of the convolution, for a k x k kernel:

$$S(i, j) = \sum_{m = -\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} \; \sum_{n = -\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} I(i + m,\, j + n)\, K(m, n)$$

 In other words, the kernel is centered on the pixel of interest, so we look at both the preceding and succeeding neighbors.
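A minimal NumPy sketch of this centered formula (a hypothetical helper; it assumes a square kernel with odd side length and zero padding at the borders):

```python
import numpy as np

def conv2d_centered(image, kernel):
    """Centered 2-D convolution (cross-correlation form) with zero padding."""
    r = kernel.shape[0] // 2                    # kernel "radius"
    padded = np.pad(image, r, mode="constant")  # zero-pad the borders
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # S(i, j) = sum_{m,n} I(i+m, j+n) * K(m, n)
            window = padded[i:i + 2 * r + 1, j:j + 2 * r + 1]
            out[i, j] = np.sum(window * kernel)
    return out
```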
Convolutional Neural Network Layers (CNN Layers)

 Convolving with an averaging kernel blurs the image, because each output pixel is the average of its neighbors.
 Convolving with a sharpening kernel sharpens the image.
 Convolving with an appropriate kernel (e.g., a Laplacian-style kernel) detects the edges in the image.
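Assuming the conv2d_centered helper sketched above, these three effects can be reproduced with standard textbook kernels (illustrative choices, not necessarily the exact kernels from the original slides):

```python
import numpy as np

blur = np.ones((3, 3)) / 9.0             # averaging kernel -> blurs the image

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])       # boosts the center pixel -> sharpens

edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]])          # Laplacian-style kernel -> detects edges

# blurred = conv2d_centered(img, blur)   # img: a 2-D grayscale array
```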
Convolutional Neural Network Layers (CNN Layers)

 The size of the image is 𝑛 × 𝑛 × 3, because an RGB image has three channels.
 The resulting output is known as a feature map.
 If we use multiple filters, multiple feature maps can be extracted.
Convolutional Neural Network Layers (CNN Layers)

 What is the connection between convolution and neural networks?
 We will try to understand this by considering image classification with a machine learning algorithm.
 Instead of using hand-crafted kernels such as edge detectors, can we learn meaningful kernels/filters in addition to learning the weights of the classifier?
Convolutional Neural Network Layers (CNN Layers)

[Figure slides: a general feedforward neural network compared with a convolutional neural network, illustrating sparse connectivity, weight sharing, a fully connected CNN, and max pooling]
Convolutional Neural Network Layers (CNN Layers)

Fig 3.1: Basic Architecture of CNN


CNN Layers
 Convolutional Neural Networks (CNNs) are a class of deep neural networks that are particularly
effective in computer vision tasks, such as image recognition and classification. CNNs are composed of
layers with specific functions. The common layers found in a typical convolutional neural
network are as follows.
 Input Layer:
 The input layer represents the raw input data, such as an image. Each neuron in this layer corresponds
to a pixel in the input image.
 Convolutional Layer:
 This layer applies convolutional operations to the input data. Convolutional operations involve sliding a
filter (also called a kernel) over the input to perform local feature extraction. The result is a feature map
that highlights important patterns in the input.
CNN Layers
 Activation Layer (ReLU - Rectified Linear Unit)
 After the convolutional operation, an activation function is applied element-wise to introduce non-
linearity. The Rectified Linear Unit (ReLU) is a common activation function used in CNNs.
 Pooling (Subsampling or Down-sampling) Layer
 Pooling layers are used to reduce the spatial dimensions of the feature maps. Max pooling, for example,
takes the maximum value from a group of neighboring pixels, effectively downsampling the data.
 Fully Connected (Dense) Layer
 Fully connected layers connect every neuron in one layer to every neuron in the next layer. In CNNs,
these layers are often used at the end of the network for classification tasks.
 Flatten Layer
 Before connecting to fully connected layers, the feature maps are often flattened into a one-dimensional
vector. This is necessary because fully connected layers require one-dimensional input.
CNN Layers

 Dropout Layer
 Dropout is a regularization technique in which a random set of neurons is ignored during training. This
helps prevent overfitting by promoting redundancy in the network.
 Batch Normalization Layer
 Batch normalization normalizes the input of a layer by adjusting and scaling the activations. This can
speed up training and improve the overall stability of the neural network.
 These layers are typically stacked together to form the architecture of a CNN.
The convolutional and pooling layers are responsible for extracting features from the input data, and the fully
connected layers are responsible for making predictions or classifications based on these features.
The arrangement and number of these layers can vary depending on the specific architecture of the CNN.
Popular CNN architectures include AlexNet, VGGNet, GoogLeNet, and ResNet.
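To see how these layers fit together in code, here is a minimal sketch in PyTorch (an assumed framework choice; the sizes, such as 32 filters and 10 classes, are illustrative):

```python
import torch.nn as nn

# A small CNN for 3-channel (RGB) 32x32 inputs and 10 output classes.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # convolutional layer
    nn.BatchNorm2d(32),                          # batch normalization layer
    nn.ReLU(),                                   # activation layer (ReLU)
    nn.MaxPool2d(2),                             # pooling layer: 32x32 -> 16x16
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),                                # flatten layer: 64 * 8 * 8 = 4096
    nn.Dropout(0.5),                             # dropout layer
    nn.Linear(64 * 8 * 8, 10),                   # fully connected (dense) layer
)
```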

Source: NPTEL IIT KGP


Filters in CNN
 Filters, also known as kernels, play a crucial role in the convolutional layers of a CNN.
 Filters are small, learnable matrices that slide over the input data to perform the convolution operation.
 The purpose of filters is to detect patterns and features in the input data, such as edges, textures, or more
complex structures.
 Each filter learns to recognize different features through the training process.
 Working of Filters in CNN
 Convolution Operation
 The filter is slid over the input data, and at each position, it computes the dot product between its
weights and the values in the input volume. This process results in a feature map that highlights the
presence of certain features in the input.
 Stacking Filters
 CNNs typically use multiple filters in each convolutional layer. Each filter specializes in detecting
different patterns. For example, one filter might be designed to detect edges in a certain orientation,
while another might focus on texture patterns.
Working of Filters in CNN

 Depth of Filters
 The number of filters in a convolutional layer determines the depth of the layer. Each filter produces a separate
feature map, and the depth of the layer is equal to the number of filters used. The depth increases the network's
ability to capture and learn diverse features.
 Learnable Parameters
 The weights of the filters are learnable parameters, meaning they are adjusted during the training process
through backpropagation and gradient descent. The network learns to optimize these weights to improve its
ability to recognize important features for the given task.
 Size of Filters
 The size of the filters (usually a small square, like 3x3 or 5x5) determines the size of the local region they
analyze. Smaller filter sizes are common because they allow the network to capture more local features and
reduce the number of parameters, making the model computationally more efficient.
 Stride
 The stride defines how much the filter moves across the input data in each step. A larger stride reduces the size
of the output feature map, while a smaller stride increases its size. Stride is a hyperparameter that influences the
spatial dimensions of the feature maps.
Working of Filters in CNN
 Filters, combined with activation functions (such as ReLU) and pooling layers, enable CNNs to learn
hierarchical representations of features in the input data.
 As the network goes deeper, it learns more abstract and complex features, making it capable of understanding and
recognizing intricate patterns in the data.
 The weights of the filters are adjusted during training to minimize the error in the network's predictions.
Parameter Sharing in CNN

 Parameter sharing is a key concept in Convolutional Neural Networks (CNNs) that contributes to the efficiency and
effectiveness of these networks, particularly in image recognition tasks. The idea behind parameter sharing is to use the
same set of parameters (weights and biases) for multiple units in a layer.
 In the context of CNNs, parameter sharing is primarily applied to the filters (kernels) used in convolutional layers.
It works as follows.
 Shared Weights
 In a convolutional layer, a filter is used to perform convolutional operations on the input data. Instead of having
separate weights for each position in the input, the same set of filter weights is shared across the entire input. This
means that the filter slides over the input, and the same weights are used at every position.
 Local Receptive Fields
 Each unit in the feature map (output of the convolutional layer) is responsible for a small local region in the input
known as the receptive field. The weights of the filter are applied to this local region, and by sliding the filter, the
same weights are applied to different local regions across the entire input.
Parameter Sharing in CNN

 Pooling Layers and Subsampling


 In addition to parameter sharing in convolutional layers, pooling layers are often used to reduce the spatial dimensions
of the feature maps. Pooling involves downsampling the feature maps by taking the maximum or average value within
local regions. This further reduces the computational load and helps create a spatial hierarchy of features.
 Benefits
 Reduced Memory Usage: Since the same set of parameters is used across the entire input, parameter sharing reduces the
number of parameters compared to fully connected layers. This leads to a significant reduction in memory requirements and
helps manage computational complexity.
 Translation Invariance: Parameter sharing allows the network to be invariant to translation. In other words, the network
can recognize the same pattern regardless of where it appears in the input. This is particularly useful for tasks like image
recognition where the location of a feature in an image shouldn't affect its recognition.
 Feature Detection: Shared weights enable the filters to learn generic feature detectors that can recognize patterns in
different spatial locations. This is crucial for capturing hierarchical representations of features in the input data.
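The "Reduced Memory Usage" benefit is easy to quantify (a PyTorch sketch; the 28x28 single-channel input is an arbitrary MNIST-like example):

```python
import torch.nn as nn

conv = nn.Conv2d(1, 8, kernel_size=3)   # 8 shared 3x3 filters
fc = nn.Linear(28 * 28, 8 * 26 * 26)    # fully connected layer with the same output size

conv_params = sum(p.numel() for p in conv.parameters())
fc_params = sum(p.numel() for p in fc.parameters())
print(conv_params)  # 80      (8 * 3*3*1 weights + 8 biases)
print(fc_params)    # 4245280 (784 * 5408 weights + 5408 biases)
```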
Regularization in CNN

 Regularization is an important concept in machine learning, including Convolutional Neural Networks (CNNs).
 Regularization techniques are employed to prevent overfitting, which occurs when a model learns to perform well on the
training data but fails to generalize to new, unseen data. Overfitting is a common concern, especially when dealing with
complex models and limited datasets.
 Regularization techniques used in CNNs
 Dropout: Dropout is a widely used regularization technique. During training, random units (neurons) are "dropped out" or
set to zero with a certain probability. This helps prevent co-adaptation of neurons, making the network more robust and
reducing overfitting.
 Weight Regularization (L1 and L2 regularization): L1 and L2 regularization involve adding a penalty term to the loss
function based on the magnitude of the weights. L1 regularization adds the sum of the absolute values of the weights, while
L2 regularization adds the sum of the squared values. This discourages the model from learning overly complex patterns
and helps prevent overfitting.
 Batch Normalization: While primarily used to normalize inputs, batch normalization also has a regularization effect. It
introduces a small amount of noise during training, which can act as a form of regularization and help prevent overfitting.
Regularization in CNN

 Data Augmentation: Data augmentation involves applying random transformations to the training data, such as rotations,
flips, and shifts. This artificially increases the size of the training dataset, helping the model generalize better to unseen
data.
 Early Stopping: Monitoring the model's performance on a validation set during training and stopping the training process
when the performance starts to degrade is a form of regularization. This helps prevent the model from fitting the training
data too closely and overfitting.
 Drop Connect: Drop Connect is an extension of Dropout, where instead of dropping out individual neurons, entire
connections between layers are dropped with a certain probability. This can be applied to the weights in convolutional
layers.
 Ensemble Methods: Training multiple models and combining their predictions can also act as a form of regularization.
Each model may learn different aspects of the data, and combining them helps improve generalization.
 The choice of regularization techniques depends on the specific characteristics of the dataset and the complexity of the
model. Often, a combination of these techniques is used to achieve the best results in terms of preventing overfitting while
still allowing the model to learn useful patterns from the data.
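As a small illustration, dropout and L2 weight regularization might be wired up like this in PyTorch (an assumed framework; the dropout probability and penalty strength are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(4096, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # dropout: zeroes 50% of activations at random during training
    nn.Linear(512, 10),
)

# L2 regularization is usually applied as weight decay in the optimizer.
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```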
AlexNet

[Figure slides: AlexNet architecture]
 AlexNet is a convolutional neural network (CNN) architecture designed for image classification tasks.
It was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, and it won the ImageNet
Large Scale Visual Recognition Challenge (ILSVRC) in 2012, marking a breakthrough in the field of
deep learning.
 The key characteristics of AlexNet are as follows:
 Architecture: AlexNet consists of eight layers of learnable parameters, including five convolutional
layers, followed by three fully connected layers. The convolutional layers are designed to capture
hierarchical features in the input images.
 Rectified Linear Units (ReLU): AlexNet uses the rectified linear unit activation function (ReLU)
throughout most of its layers. ReLU introduces non-linearity to the network, helping it learn complex
patterns in the data.
 Local Response Normalization (LRN): Local Response Normalization is applied after the first and
second convolutional layers. It normalizes the responses across different feature maps, enhancing the
network's generalization ability.
AlexNet
 Overlapping Max-Pooling: Max-pooling is used for down-sampling in the spatial dimensions. Unlike
traditional pooling methods, AlexNet uses overlapping pooling, meaning that the pooling regions have
some overlap. This helps in capturing more spatial hierarchies.
 Dropout: To prevent overfitting, AlexNet incorporates dropout regularization in the fully connected
layers. Dropout randomly drops a certain percentage of neurons during training, forcing the network to
learn more robust features.
 Large-Scale Training: AlexNet was one of the first neural networks to be trained on a large-scale
dataset, specifically the ImageNet dataset, which contains millions of labeled images. The massive
scale of the dataset and the computational power required for training contributed to the success of
AlexNet.
 GPU Acceleration: AlexNet's successful implementation was made possible, in part, by the use of
Graphics Processing Units (GPUs) for parallelizing the training process. This significantly reduced
training time compared to using traditional Central Processing Units (CPUs).
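For experimentation, a pre-trained AlexNet can be loaded from torchvision (an assumed tooling choice; the weights API requires a recent torchvision version):

```python
import torch
from torchvision import models

# AlexNet with ImageNet weights; expects normalized 3x224x224 RGB inputs.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
alexnet.eval()

x = torch.randn(1, 3, 224, 224)  # dummy image batch
logits = alexnet(x)              # shape (1, 1000): ImageNet class scores
```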
RESNET

[Figure: block diagram of ResNet]
RESNET
 ResNet, short for Residual Networks, is a type of deep neural network architecture that was designed to
address the challenges of training very deep networks.
 It was introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their paper "Deep
Residual Learning for Image Recognition" in 2015.
 The main innovation of ResNet is the use of residual blocks, which allow the network to learn residual
functions instead of directly learning the desired underlying mapping.
 This is achieved by introducing shortcut connections that skip one or more layers.
 These shortcut connections enable the gradient to be easily propagated back through the network during
training, addressing the vanishing gradient problem commonly encountered in deep networks.
 The key features of ResNet are as follows:
 Residual Blocks: The basic building block of a ResNet is the residual block. Each block consists of two
main paths: the identity path, which passes the input directly to the next layer, and the residual path, which
learns the residual mapping. The output of the block is the sum of these two paths.
RESNET
 Shortcut Connections: The shortcut connections, or skip connections, enable the gradient to be
directly propagated through the network without passing through multiple layers. This helps in
mitigating the vanishing gradient problem, making it easier to train very deep networks.
 Deep Architectures: ResNet architectures are capable of being very deep, with models ranging from dozens
to hundreds of layers. This depth is facilitated by the residual connections, allowing for effective training and
optimization of extremely deep networks.
 Batch Normalization: ResNet typically incorporates batch normalization, which helps in stabilizing and
accelerating the training process by normalizing the inputs of each layer.
 Global Average Pooling: Instead of using fully connected layers at the end of the network, ResNet often
uses global average pooling to reduce spatial dimensions. This helps in reducing the number of parameters
and preventing overfitting.
 ResNet architectures have been widely used in various computer vision tasks, such as image classification,
object detection, and segmentation.
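The residual block described above can be sketched in a few lines of PyTorch (a simplified block without the bottleneck design; it assumes equal input and output channels so the identity shortcut needs no projection):

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Output = F(x) + x: the residual path plus the identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(out + x)  # sum of the residual path and the identity path
```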
Transfer Learning Techniques

 In the context of Convolutional Neural Networks (CNNs), transfer learning has been particularly
successful due to the ability of CNNs to learn hierarchical features.
 Here are some common transfer learning techniques in CNN:
 Feature Extraction: In feature extraction, a pre-trained CNN is used as a fixed feature extractor. The
idea is to remove the last few layers of the pre-trained model, which are typically responsible for task-
specific classification, and retain the earlier layers that have learned generic features. These features
can then be used as input for a new classifier trained on the target task.
 Fine-Tuning: Fine-tuning involves taking a pre-trained CNN and further training it on the target task.
Instead of keeping the entire architecture fixed, as in feature extraction, fine-tuning allows the weights
of some layers to be updated during training on the new task. This is especially useful when the target
task is related to the original task, but there are some task-specific nuances.
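Both strategies can be sketched in PyTorch (ResNet-18 and 10 target classes are arbitrary assumptions):

```python
import torch.nn as nn
from torchvision import models

# Feature extraction: freeze the pre-trained backbone, train only a new head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                 # keep the learned generic features fixed
model.fc = nn.Linear(model.fc.in_features, 10)  # new task-specific classifier

# Fine-tuning instead: leave some (or all) layers trainable,
# typically with a smaller learning rate than training from scratch.
```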
Transfer Learning Techniques

 Pre-trained Models: Many pre-trained CNN models are available and have been trained on large-scale
datasets like ImageNet. Models like VGG, ResNet, Inception, and MobileNet are examples.
Researchers and practitioners often use these models as starting points for their own tasks, leveraging
the learned features and hierarchical representations.
 Domain Adaptation: Domain adaptation focuses on transferring knowledge from a source domain to a
target domain, where the distributions of data might be different. This is crucial when the labeled data
in the target domain is limited. Techniques like adversarial training can be employed to align the feature
distributions between the source and target domains.
 Progressive Neural Networks: In progressive neural networks, the model is trained progressively on
multiple tasks. Each new task involves the addition of new layers or units to the existing model. This
allows the model to learn task-specific features while retaining knowledge from previous tasks.
Transfer Learning Techniques

 Knowledge Distillation: Knowledge distillation involves training a smaller model (student) to mimic
the behavior of a larger, well-established model (teacher). The idea is to transfer the knowledge
captured by the teacher model to the smaller model, making it more efficient while retaining much of
the performance (see the loss sketch after this list).
 Transfer learning can significantly reduce the amount of labeled data required for training a CNN on a
new task, making it a powerful technique in scenarios where obtaining large labeled datasets is
challenging.
 The choice of transfer learning technique depends on factors such as the similarity between the source
and target tasks and the amount of labeled data available for the target task.
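The knowledge-distillation objective mentioned above is commonly written as a mix of a soft teacher-matching term and the usual hard-label loss (a sketch; the temperature T = 2.0 and mixing weight alpha = 0.5 are illustrative):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft KL term against the teacher plus the usual cross-entropy term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```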
DENSENET
 DenseNet is a neural network architecture introduced by Gao Huang, Zhuang Liu, Laurens van der
Maaten, and Kilian Q. Weinberger in the paper "Densely Connected Convolutional Networks" in 2017.
 DenseNet addresses some challenges in traditional deep neural networks, such as the vanishing gradient
problem and the need for a large number of parameters.
 The key features of DenseNet are as follows:
 Dense Blocks: The architecture introduces dense blocks where each layer is connected to every other
layer in a feed-forward fashion. This dense connectivity allows for feature reuse, facilitates gradient
flow, and helps in learning more compact representations.
 Bottleneck Layers: Within dense blocks, bottleneck layers are employed to reduce the number of input
feature maps, making the network more computationally efficient.
DENSENET

 Transition Layers: Transition layers are used to control the growth of the network and reduce the
spatial dimensions of the feature maps.
 Global Average Pooling (GAP): DenseNet typically uses global average pooling instead of fully
connected layers at the end of the network. This reduces the number of parameters and helps improve
model generalization.
 DenseNet architectures have demonstrated strong performance in image classification and other computer
vision tasks.
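The dense connectivity can be sketched as a single layer of a dense block (a simplified skeleton; growth rate 32 follows the paper's typical setting):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseLayer(nn.Module):
    """One layer of a dense block: its output is the input concatenated
    with the newly computed feature maps (feature reuse)."""

    def __init__(self, in_channels, growth_rate=32):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate, 3, padding=1, bias=False)

    def forward(self, x):
        new_features = self.conv(F.relu(self.bn(x)))
        return torch.cat([x, new_features], dim=1)  # dense connectivity
```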
PixelNet

 PixelNet refers to a neural network architecture designed for dense, pixel-level prediction tasks such as
semantic segmentation, where the goal is to assign a class label to each pixel in an input image.
 "PixelNet" is not as widely recognized as some other architectures; the name usually refers to "PixelNet:
Representation of the pixels, by the pixels, and for the pixels" by Aayush Bansal et al., which proposed a
general architecture for pixel-level prediction.
PixelNet
 Key features of PixelNet:
 Hypercolumn Representation: Each pixel is represented by a "hypercolumn" of features drawn from
multiple convolutional layers, combining coarse semantic information from deep layers with fine spatial
detail from shallow ones.
 Sparse Pixel Sampling: During training, only a sparse set of pixels is sampled from each image. This
reduces the correlation between training samples and makes stochastic gradient training more efficient.
 Multiple Pixel-Level Tasks: The same architecture is applied to several pixel-level prediction tasks,
including semantic segmentation, edge detection, and surface normal estimation.
