
UNIT-2 INTRODUCTION

CNN
MODULE-01
UNIT-2 INTRODUCTION

• What is computer vision?


• Why Convolutions (CNN)?
• Introduction to CNN,
• Train a simple convolutional neural net,
• Explore the design space for convolutional nets,
• Pooling layer motivation in CNN,
• Design a convolutional layered application,
• Understanding and visualizing a CNN,
• Transfer learning and fine-tuning CNN,
• Image classification,
• Text classification,
• Image classification and hyper-parameter tuning,
• Emerging NN architectures.
UNIT-2 What is computer vision?

• Computer vision is a field of artificial intelligence (AI) that enables computers and
systems to derive meaningful information from digital images, videos and other
visual inputs — and take actions or make recommendations based on that
information. If AI enables computers to think, computer vision enables them to
see, observe and understand.
• Computer vision works much the same as human vision, except humans have a
head start. Human sight has the advantage of lifetimes of context to train how to
tell objects apart, how far away they are, whether they are moving and whether
there is something wrong in an image.
• Computer vision trains machines to perform these functions, but it has to do it in
much less time with cameras, data and algorithms rather than retinas, optic
nerves and a visual cortex. Because a system trained to inspect products or watch
a production asset can analyze thousands of products or processes a minute,
noticing imperceptible defects or issues, it can quickly surpass human capabilities.
UNIT-2 What is computer vision?

• Definition: Computer Vision is a field of artificial intelligence that focuses on enabling machines to
interpret and understand visual information from the world, typically through digital images and
videos.
• Image Processing: It involves various techniques for image manipulation, enhancement, and feature
extraction to prepare visual data for analysis.
• Object Detection: Computer Vision can detect and locate objects within images or videos, making it
valuable for tasks like facial recognition, object tracking, and security surveillance.
• Image Classification: It classifies images into predefined categories or labels, allowing for tasks like
identifying animals, recognizing handwritten digits, or diagnosing medical conditions from X-rays.
• Semantic Segmentation: This technique assigns labels to individual pixels in an image, enabling
precise understanding of object boundaries and fine-grained analysis.
• Depth Estimation: Computer Vision can estimate the depth or 3D structure of a scene from 2D
images, which is crucial for applications like autonomous driving and augmented reality.
• Feature Extraction: It involves identifying key patterns, edges, corners, or other distinctive elements
in images, which are then used for various tasks such as object recognition.
UNIT-2 What is computer vision?

• Convolutional Neural Networks (CNNs): CNNs are a fundamental architecture in Computer Vision,
designed to automatically learn and extract features from images through layers of convolutional
and pooling operations.
• Face Recognition: Computer Vision can identify and verify individuals by analyzing facial features,
making it useful for security, access control, and even personalized user experiences.
• Object Tracking: It can track the movement of objects over time in videos, crucial for applications
like autonomous drones, surveillance, and sports analysis.
• Pose Estimation: This technique determines the positions and orientations of human or object
parts within an image or video, commonly used in fields like gesture recognition and robotics.
• OCR (Optical Character Recognition): OCR technology in Computer Vision can recognize and convert
printed or handwritten text into machine-readable text, enabling text analysis in documents or
images.
• Medical Imaging: Computer Vision plays a significant role in medical diagnosis by analyzing medical
images like X-rays, MRIs, and CT scans for disease detection and localization.
• Agriculture: It's used for crop monitoring, disease detection in plants, and automated harvesting by
analyzing images of fields and crops.
UNIT-2 What is computer vision?

• Autonomous Vehicles: Computer Vision is a key component in self-driving cars,


helping them perceive the environment, identify road signs, and avoid obstacles.
• Retail and E-commerce: It's employed for product recommendation, inventory
management, and cashierless checkout systems using computer vision in stores.
• Quality Control: In manufacturing, Computer Vision can inspect products for
defects, ensuring high-quality production.
• Augmented Reality (AR): AR applications overlay digital information on the real
world, often relying on Computer Vision to align and interact with the physical
environment.
• Human-Computer Interaction: Computer Vision enables gesture recognition, gaze
tracking, and other forms of natural interaction between humans and machines.
• Privacy and Ethical Considerations: The use of Computer Vision raises privacy
concerns, and ethical considerations are crucial in its development and deployment,
especially in surveillance and facial recognition applications.
UNIT-2 Introduction to CNN

1.CNN stands for Convolutional Neural Network.


2. It is a specialized type of ANN that has proven to be highly effective
in various computer vision tasks, such as image classification,
object detection, and image segmentation.
3. CNNs are designed to automatically and adaptively learn patterns
and features from input images, making them well-suited for tasks
that involve analyzing visual data.
UNIT-2 Introduction to CNN

• Advantages of Convolutional Neural Networks (CNNs):


• Good at detecting patterns and features in images, videos, and audio
signals.
• Robust to translation, rotation, and scaling (i.e., they provide a degree of invariance to these transformations).
• End-to-end training, no need for manual feature extraction.
• Can handle large amounts of data and achieve high accuracy.
• Disadvantages of Convolutional Neural Networks (CNNs):
• Computationally expensive to train and require a lot of memory.
• Can be prone to overfitting if not enough data or proper regularization
is used.
• Requires large amounts of labeled data.
• Interpretability is limited, it’s hard to understand what the network has
learned.
UNIT-2 Introduction to CNN

• Basic Architecture
• There are two main parts to a CNN architecture
• A convolution tool that separates and identifies the various features of the
image for analysis in a process called as Feature Extraction.
• The network of feature extraction consists of many pairs of convolutional or
pooling layers.
• A fully connected layer that utilizes the output from the convolution process
and predicts the class of the image based on the features extracted in
previous stages.
• This CNN model of feature extraction aims to reduce the number of features
present in a dataset.
• It creates new features which summarise the existing features contained in the original set of features.
• There are many CNN layers as shown in the CNN architecture diagram.
UNIT-2 Introduction to CNN

• Convolution Layers
• There are three types of layers that make up the CNN which are the convolutional layers,
pooling layers, and fully-connected (FC) layers.
• When these layers are stacked, a CNN architecture will be formed.
• In addition to these three layers, there are two more important parameters which are the
dropout layer and the activation function which are defined below.
• 1. Convolutional Layer
• This layer is the first layer that is used to extract the various features from the input images.
• In this layer, the mathematical operation of convolution is performed between the input image and
a filter of a particular size MxM.
• By sliding the filter over the input image, the dot product is taken between the filter and the parts
of the input image with respect to the size of the filter (MxM).
• The output is termed as the Feature map which gives us information about the image such as the
corners and edges.
• Later, this feature map is fed to other layers to learn several other features of the input image.
• The convolution layer passes the result to the next layer after applying the convolution operation to the input.
• Convolutional layers benefit the network greatly because they keep the spatial relationship between the pixels intact (a minimal code sketch follows below).
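The following is a minimal Keras sketch (not part of the original slides) showing a single convolutional layer producing a feature map; the filter count, kernel size, and input size are arbitrary example values:

```python
import tensorflow as tf

# A single convolutional layer: 32 filters of size 3x3 sliding over an RGB image.
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu")

# A batch containing one 64x64 RGB image filled with random values (example input only).
x = tf.random.normal((1, 64, 64, 3))
feature_map = conv(x)

# With no padding ("valid") and stride 1, the spatial size shrinks from 64 to 62.
print(feature_map.shape)  # (1, 62, 62, 32)
```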
UNIT-2 Introduction to CNN

2. Pooling Layer
• In most cases, a Convolutional Layer is followed by a Pooling Layer.
• The primary aim of this layer is to decrease the size of the convolved feature map to reduce
the computational costs.
• This is done by reducing the number of connections between layers; the pooling operation acts independently on each feature map.
• Depending upon the method used, there are several types of pooling operations. Pooling basically summarises the features generated by a convolution layer.
• In Max Pooling, the largest element is taken from feature map.
• Average Pooling calculates the average of the elements in a predefined sized
Image section.
• The total sum of the elements in the predefined section is computed in Sum
Pooling.
• The Pooling Layer usually serves as a bridge between the Convolutional Layer and
the FC Layer.

• Pooling generalises the features extracted by the convolution layer and helps the network recognise these features independently of their exact position.
UNIT-2 Introduction to CNN

2. Types of Pooling Layer


Max Pooling
• Max pooling is a pooling operation that selects the maximum element
from the region of the feature map covered by the filter.
• Thus, the output after max-pooling layer would be a feature map
containing the most prominent features of the previous feature map.

Average Pooling
• Average pooling computes the average of the elements
present in the region of feature map covered by the filter.
• Thus, while max pooling gives the most prominent feature in
a particular patch of the feature map, average pooling gives
the average of features present in a patch.
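As an illustrative sketch (assuming TensorFlow/Keras, with made-up feature-map values), max and average pooling over the same 4x4 input give:

```python
import tensorflow as tf

# A single 4x4 feature map with one channel (example values only).
fmap = tf.constant([[1., 3., 2., 1.],
                    [4., 6., 5., 2.],
                    [7., 8., 9., 4.],
                    [3., 2., 1., 0.]])
fmap = tf.reshape(fmap, (1, 4, 4, 1))  # (batch, height, width, channels)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=2)(fmap)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2)(fmap)

# Max pooling keeps the largest value in each 2x2 patch: [[6, 5], [8, 9]]
print(tf.squeeze(max_pool).numpy())
# Average pooling keeps the mean of each 2x2 patch: [[3.5, 2.5], [5.0, 3.5]]
print(tf.squeeze(avg_pool).numpy())
```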
UNIT-2 Introduction to CNN

Advantages of Pooling Layer:


• Dimensionality reduction: The main advantage of pooling layers is that they
help in reducing the spatial dimensions of the feature maps. This reduces the
computational cost and also helps in avoiding overfitting by reducing the
number of parameters in the model.
• Translation invariance: Pooling layers are also useful in achieving translation
invariance in the feature maps. This means that the position of an object in
the image does not affect the classification result, as the same features are
detected regardless of the position of the object.
• Feature selection: Pooling layers can also help in selecting the most important
features from the input, as max pooling selects the most salient features and
average pooling preserves more information.
UNIT-2 Introduction to CNN

Disadvantages of Pooling Layer:


• Information loss: One of the main disadvantages of pooling layers is that they
discard some information from the input feature maps, which can be
important for the final classification or regression task.
• Over-smoothing: Pooling layers can also cause over-smoothing of the feature
maps, which can result in the loss of some fine-grained details that are
important for the final classification or regression task.
• Hyperparameter tuning: Pooling layers also introduce hyperparameters such
as the size of the pooling regions and the stride, which need to be tuned in
order to achieve optimal performance. This can be time-consuming and
requires some expertise in model building.
UNIT-2 Introduction to CNN

Feature Map Size Calculation Formula


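The formula illustrated on this slide is the standard feature-map size relation:

Output size = (Input size − Filter size + 2 × Padding) / Stride + 1

For example, a 32×32 input with a 3×3 filter, padding 1 and stride 1 gives (32 − 3 + 2×1)/1 + 1 = 32, so the spatial size is preserved.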
UNIT-2 Introduction to CNN

Padding
• Padding is a technique used to preserve the spatial dimensions of the
input image after convolution operations on a feature map.
• Padding involves adding extra pixels around the border of the input
feature map before convolution.
• This can be done in two ways:
• Valid Padding: In the valid padding, no padding is added to the input feature
map, and the output feature map is smaller than the input feature map. This
is useful when we want to reduce the spatial dimensions of the feature maps.
• Same Padding: In the same padding, padding is added to the input feature
map such that the size of the output feature map is the same as the input
feature map. This is useful when we want to preserve the spatial dimensions
of the feature maps.
UNIT-2 Introduction to CNN

Padding: Same Padding

Same padding adds additional rows and columns
of pixels around the edges of the input data so
that the size of the output feature map is the
same as the size of the input data. This is
achieved by adding rows and columns of pixels
with a value of zero around the edges of the
input data before the convolution operation.

Valid Padding
Valid padding is used when it is desired to reduce the size
of the output feature map in order to reduce the number
of parameters in the model and improve its computational
efficiency.
UNIT-2 Introduction to CNN

Padding
• The most common padding value is zero-padding, which involves
adding zeros to the borders of the input feature map.
• Padding can help in reducing the loss of information at the borders of
the input feature map and can improve the performance of the
model.
• However, it also increases the computational cost of the convolution
operation.
• Overall, padding is an important technique in CNNs that helps in
preserving the spatial dimensions of the feature maps and can
improve the performance of the model.
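A short Keras sketch (example shapes only, not from the original slides) showing how "valid" and "same" padding affect the output size:

```python
import tensorflow as tf

x = tf.random.normal((1, 28, 28, 1))  # one 28x28 single-channel image (example)

valid = tf.keras.layers.Conv2D(8, 3, padding="valid")(x)
same = tf.keras.layers.Conv2D(8, 3, padding="same")(x)

print(valid.shape)  # (1, 26, 26, 8) -> output shrinks: (28 - 3)/1 + 1 = 26
print(same.shape)   # (1, 28, 28, 8) -> zero-padding keeps the spatial size at 28
```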
UNIT-2 Introduction to CNN

• Reasons of using Striding in CNNs


• Downsampling: Striding with a larger value helps downsample the feature
map, reducing its spatial dimensions. This can be beneficial in reducing
computational complexity and memory usage, especially in deep networks.
• Increasing Receptive Field: Larger strides can increase the receptive field of
neurons in deeper layers, allowing them to capture information from a larger
portion of the input image.
• Dimension Reduction: Striding can be used to reduce the spatial dimensions
of the feature map gradually as you move deeper into the network, which can
be helpful in certain tasks like object detection and image classification.
UNIT-2 Introduction to CNN

• Example of padding = 1 and striding = 2


UNIT-2 Introduction to CNN

• Striding
• "Striding" refers to the step size, or the distance by which the convolutional filter (kernel) moves across the input image during the convolution operation.
• Striding is an important parameter that determines how much the output feature map's spatial dimensions are reduced compared to the input.
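A minimal Keras sketch (example shapes only) showing the effect of the stride value on the output feature map:

```python
import tensorflow as tf

x = tf.random.normal((1, 28, 28, 1))  # example single-channel input

stride1 = tf.keras.layers.Conv2D(8, 3, strides=1, padding="same")(x)
stride2 = tf.keras.layers.Conv2D(8, 3, strides=2, padding="same")(x)

print(stride1.shape)  # (1, 28, 28, 8) -> stride 1 keeps the spatial size
print(stride2.shape)  # (1, 14, 14, 8) -> stride 2 halves it, downsampling the feature map
```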
UNIT-2 Introduction to CNN

• Parameter Calculation
• In Convolutional Neural Networks (CNNs), parameter calculation refers to
determining the number of learnable parameters (weights and biases) that
need to be optimized during training.
• These parameters help the model learn the mapping between input data
and output predictions.
• step by step calculation for parameters.
UNIT-2 Introduction to CNN

• 1. Convolutional Layer
• The convolutional layer consists of filters (kernels) that convolve over the input data
to extract features.
• Parameters in a Convolutional Layer:
• Filter size (K x K): A filter (also called a kernel) is a small matrix used to convolve
over the input. The dimensions of this filter are K×K, where K is typically a small
value like 3, 5, or 7.
• Number of filters (F): The number of filters used to capture different features in the
input. For example, if you use 32 filters, the output depth will be 32.
• Input depth (C): The depth of the input (for example, 3 channels for RGB images).
UNIT-2 Introduction to CNN

• Parameter Calculation for a Convolutional Layer:


• Number of parameters per filter = K×K×C+1 (the extra 1 is for the bias term)
• K×K×C is the number of weights for each filter, where C is the input depth.
• +1 is for the bias term for each filter.
• Total parameters = Number of filters F × (Number of parameters per filter)
• So the total parameters for a convolutional layer are calculated as:

Total Parameters=F×(K×K×C+1)
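The formula can be checked with a few lines of Python (the filter size, channel count, and filter count below are chosen arbitrarily for illustration):

```python
def conv_params(K, C, F):
    """Parameters of a convolutional layer: F filters of size KxKxC, each with one bias."""
    return F * (K * K * C + 1)

# Example: 32 filters of size 3x3 over an RGB input (C = 3).
print(conv_params(K=3, C=3, F=32))  # 32 * (3*3*3 + 1) = 896
```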
UNIT-2 Introduction to CNN

• 2. Fully Connected Layer (Dense Layer)


• A fully connected layer connects every neuron to every neuron in the previous layer.
The number of parameters in a fully connected layer depends on the number of
inputs (neurons from the previous layer) and the number of neurons in the current
layer.
• Parameters in a Fully Connected Layer:
• Input size (N): The number of neurons from the previous layer (for example, after flattening the
output of the convolutional layers).
• Number of neurons (M): The number of neurons in the current layer.
• Bias: Each neuron in the fully connected layer has a bias.
• Parameter Calculation for a Fully Connected Layer:
• Number of parameters = N×M+M
• N×M represents the weights between each input neuron and each output neuron.
• +M accounts for the bias terms for each output neuron.
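Similarly, a small illustrative calculation for a fully connected layer (example sizes only):

```python
def dense_params(N, M):
    """Parameters of a fully connected layer: N x M weights plus M biases."""
    return N * M + M

# Example: 1024 flattened inputs feeding a layer of 256 neurons.
print(dense_params(N=1024, M=256))  # 1024*256 + 256 = 262400
```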
UNIT-2 Introduction to CNN

• Implementation of Padding and Striding


https://colab.research.google.com/drive/116QNuxPQtxhb-1zqyWqDIboYh2GzTvDD?authuser=1#scrollTo=Esb12gcaa9V_

• Simple CNN visualization

https://colab.research.google.com/drive/1LJUx72Qkp756U1mT9BQcW19n9TF-VToH?usp=sharing
UNIT-2 Introduction to CNN

3. Fully Connected Layer


• The Fully Connected (FC) layer consists of the weights and biases along with
the neurons and is used to connect the neurons between two different layers.
• These layers are usually placed before the output layer and form the last few
layers of a CNN Architecture.
• Here, the output from the previous layers is flattened and fed to the FC layer.
• The flattened vector then passes through a few more FC layers, where the usual mathematical operations take place.
• In this stage, the classification process begins to take place.
• Two fully connected layers are usually stacked because they perform better than a single fully connected layer.
• These layers in a CNN reduce the need for human supervision.
UNIT-2 Introduction to CNN

4. Dropout
• When all the features are connected to the FC layer, it can cause overfitting in
the training dataset.
• Overfitting occurs when a model fits the training data so closely that its performance degrades when it is used on new data.
• To overcome this problem, a dropout layer is utilised wherein a few neurons
are dropped from the neural network during training process resulting in
reduced size of the model.
• On passing a dropout of 0.3, 30% of the nodes are dropped out randomly
from the neural network.
• Dropout results in improving the performance of a machine learning model as
it prevents overfitting by making the network simpler.
• It drops neurons from the neural networks during training.
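A minimal Keras sketch (layer sizes are arbitrary example values) showing a dropout rate of 0.3 placed before the output layer:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),  # randomly drops 30% of the activations during training
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```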
UNIT-2 Introduction to CNN

5. Activation Functions
• Finally, one of the most important parameters of the CNN model is the
activation function.
• They are used to learn and approximate any kind of continuous and complex
relationship between variables of the network.
• It decides which information of the model should fire in the forward direction
and which ones should not at the end of the network.
• It adds non-linearity to the network.
• There are several commonly used activation functions such as the ReLU,
Softmax, tanH and the Sigmoid functions.
• Each of these functions has a specific usage. For a binary classification CNN model, sigmoid and softmax functions are preferred, and for multi-class classification, softmax is generally used.
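As a small illustrative sketch (assuming Keras; the unit counts are arbitrary), typical output-layer and hidden-layer activation choices look like:

```python
import tensorflow as tf

# Binary classification head: a single unit with a sigmoid activation.
binary_head = tf.keras.layers.Dense(1, activation="sigmoid")

# Multi-class classification head: one unit per class with a softmax activation.
multiclass_head = tf.keras.layers.Dense(10, activation="softmax")

# Hidden layers commonly use ReLU to add non-linearity.
hidden = tf.keras.layers.Dense(64, activation="relu")
```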
UNIT-2 Expected Questions on Padding and Striding

1. What is padding in convolutional neural networks, and why is it used?


2. Explain the concept of "valid" and "same" padding in CNNs. How do they affect the spatial dimensions of the output
feature maps?
3. How does zero-padding impact the size of the output feature map during convolution?
4. In the context of CNNs, what is the purpose of using padding when applying convolutional filters to an input image?
5. What is striding in CNNs, and how does it affect the spatial dimensions of the output feature maps?
6. How can you calculate the size of the output feature map when applying a convolutional filter with a specific size and
stride to an input image with a given size?
7. In what scenarios might you use larger strides during convolution, and what are the advantages and disadvantages of
doing so?
8. How does the choice of padding and striding influence the receptive field of neurons in a CNN?
9. Explain the trade-off between using padding and striding in CNNs in terms of preserving spatial information versus
reducing computational complexity.
10. Can you describe a situation where you would use both padding and striding in a CNN layer, and what would be the
expected impact on the output feature map?
11. What is the relationship between the size of the input image, the size of the convolutional filter, the amount of
padding, and the stride value in determining the size of the output feature map?
12. How do padding and striding affect the performance of a CNN in tasks like image classification or object detection?
UNIT-2 Transfer Learning

• Transfer Learning
• Transfer learning involves taking a pre-trained neural network model, typically trained on a large and diverse
dataset, and using it as a starting point for a new, related task.
• The idea is that the knowledge learned from one task can be transferred to another task, potentially saving
a lot of training time and data.
• Working of Transfer learning
• Pre-trained Model: Start with a pre-trained CNN model, such as VGG, ResNet, Inception, or MobileNet, that has
been trained on a large dataset like ImageNet for image classification.
• Remove Last Layers: Remove the final classification layers (output layers) of the pre-trained model, which are
specific to the original task.
• Add New Layers: Add new layers to the model. These layers should be tailored to your specific task. The number
and structure of these layers depend on the complexity of your task.
• Fine-tuning: Optionally, you can choose to fine-tune some of the layers of the pre-trained model on your task. Fine-
tuning allows the model to adapt to the new data while retaining some of the knowledge from the pre-trained
model.
• Training: Train the modified model on your dataset, which is typically smaller and more task-specific than the
original dataset.
• Transfer learning is especially beneficial when you have limited data because the pre-trained model has
already learned useful features from a large dataset, which can be applied to your smaller dataset.
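The workflow above can be sketched in Keras roughly as follows; MobileNetV2, the 224×224 input size, and the 5-class head are placeholder choices for illustration, not prescriptions from the slides:

```python
import tensorflow as tf

# 1. Start from a model pre-trained on ImageNet, without its original classification head.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         weights="imagenet")
base.trainable = False  # freeze the pre-trained feature extractor

# 2-3. Add new layers tailored to the target task (here: a hypothetical 5-class problem).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# 4-5. Optionally unfreeze some base layers, then train on the smaller target dataset:
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```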
UNIT-2 Advantages of Transfer Learning

• Advantages of Transfer Learning:

• Improved Learning Speed: Transfer learning can significantly reduce the time and
computational resources required for training a model. Pre-trained models have already
learned useful features, so they converge faster on new tasks.
• Better Generalization: Pre-trained models often capture a rich set of features from a
large and diverse dataset. This knowledge is beneficial for generalizing to new, related
tasks, especially when you have limited data for the target task.
• State-of-the-Art Performance: Pre-trained models, particularly in fields like computer
vision and natural language processing, are often state-of-the-art in terms of accuracy
and performance on benchmark tasks. Transfer learning allows you to leverage this
expertise without starting from scratch.
• Lower Data Requirements: Transfer learning can work well with smaller datasets. This is
crucial in situations where collecting a large dataset for a specific task is time-consuming
or expensive.
• Domain Adaptation: Transfer learning can help adapt models from one domain to
another. For example, you can train a model on medical images and then adapt it for a
different hospital's data.
UNIT-2 Disadvantages of Transfer Learning

• Disadvantages of Transfer Learning:

• Domain Mismatch: If the source domain (the original pre-trained model's data) is
substantially different from the target domain (your specific task), transfer learning may not
work well. The features learned in the source domain might not be relevant.
• Overfitting: If you're not careful, pre-trained models can suffer from overfitting when
applied to a new task. Fine-tuning and regularization techniques are needed to prevent this.
• Limited Task Specificity: Pre-trained models are general-purpose. They may not capture
task-specific nuances, leading to suboptimal performance in some cases.
• Dependency on Source Model Quality: The quality of the source model matters. If the pre-
trained model is not well-trained or is biased, it can negatively impact your target task.
• Model Complexity: Pre-trained models are often deep neural networks with a large number
of parameters. Fine-tuning them requires significant computational resources and may not
be feasible on less powerful hardware.
• Lack of Interpretability: Deep pre-trained models can be challenging to interpret, making it
difficult to understand why they make specific predictions.
UNIT-2 Transfer Learning

• Examples of Transfer Learning in Deep Learning include:


• Using a pre-trained image classification network for a new image classification task
with a similar dataset.
• Fine-tuning a pre-trained language model for text classification on a new text
dataset.
• Applying a pre-trained object detection or segmentation network to a new dataset.
UNIT-2 Traditional Approach Vs. Transfer Learning

Figure 1.1 Traditional Setup Vs Transfer Learning.


The traditional approach trains or builds a separate model for each task as
shown in the left figure where we have 3 isolated models for 3 tasks. The
transfer learning approach leverages prior knowledge from source tasks to
improve upon the performance of target task as shown in the right figure
UNIT-2 Traditional Approach Vs. Transfer Learning
•Traditional Approach:
• Training Process: In the traditional approach, models are trained from scratch on the task-specific data,
without leveraging knowledge from other tasks or domains.
• Data Requirement: Requires a large amount of labeled data for the specific task to perform well.
• Generalization: Limited generalization to other tasks or domains as the model is specifically optimized for
one task and dataset.
• Training Time: Generally requires more time for training, as the model learns all patterns from the ground
up.
• Knowledge Transfer: No reuse of knowledge from different tasks or domains.
•Transfer Learning:
• Training Process: Transfer learning uses pre-trained models or knowledge from a source task/domain and
adapts it to a target task/domain.
• Data Requirement: Often requires less labeled data for the target task, as the model leverages existing
knowledge from the source domain.
• Generalization: Enhances generalization by transferring knowledge from related tasks or domains,
improving performance in new or unseen tasks.
• Training Time: Typically reduces training time since the model starts with learned parameters and adapts
them to the new task.
• Knowledge Transfer: Allows the transfer of knowledge from one domain/task to another, making it more
efficient in scenarios where data is limited.
UNIT-2 Transfer Learning Scenario

Figure 1.3 Transfer Learning Scenarios:


Inductive, Unsupervised and Transductive transfers are three main
scenarios under which transfer learning is applied. Inductive transfer
draws a lot of parallels from multi-task learning while domain adaptation
and transductive transfer are often discussed together.
UNIT-2 Transfer Learning Scenario

•Inductive Transfer Overview: Involves transferring knowledge from a


source task to a target task where the domains are similar or related, but
the tasks are different.
• Presence of Labels: Both source and target domains have labeled
data, with the target domain relying on labels to induce knowledge
from the source domain.
• Larger Source Dataset: The source domain typically has a larger
labeled dataset compared to the target domain, which has limited
labeled samples.
• Improving Target Domain Performance: Inductive transfer helps
improve performance in the target domain by leveraging the
knowledge and objective function from the source domain.
• Real-World Application: This form of transfer learning is widely
used in practical scenarios.
UNIT-2 Transfer Learning Scenario

•Unsupervised Transfer Overview: Similar to inductive transfer, but with


the key difference being the absence of labeled data in both the source
and target domains.
• No Labels: Both the source and target domains lack labeled data,
requiring different strategies for knowledge transfer.
• Focus Areas: Primarily addresses transfer learning in unsupervised
scenarios such as:
• Dimensionality reduction
• Clustering
• Density estimation
• Application: Used in cases where labeled data is unavailable, and
learning from unstructured data is needed.
UNIT-2 Transfer Learning Scenario

•Transductive Transfer Overview: Involves the same source and target


tasks but different corresponding domains.
• Absence of Labeled Data in Target Domain: The target domain lacks
labeled samples, which differentiates it from inductive transfer.
• Domain Mismatch: The scenario closely matches feature-space and
marginal-probability mismatch, as discussed in section 1.1.
• Similarity to Domain Adaptation: This scenario is similar to the related
concept of domain adaptation, where knowledge is transferred from one
domain to another with a shared task.
UNIT-2 Types of Transfer Learning in Deep Learning
• Fine-tuning: It involves the use of pre-trained models as the underlying framework
and then training on a new task with a lower learning rate.

• Feature extraction: Here, the developer uses pre-trained models to extract features
from new data. Then they use the best features to train a new classifier.

• Domain adaptation: It works by adapting a pre-trained model to a new domain by


fine-tuning it on the target domain data.

• Multi-task learning: The focus is to train a single model on multiple tasks to improve
performance on all tasks.

• Zero-shot learning: It involves the use of pre-trained models to make predictions on


new classes without any training data for those classes.
UNIT-2 Why is Transfer Learning Gaining Popularity?
• Transfer Learning could be a revolutionary addition to the Machine Learning domain.
It helps in overcoming some of the drawbacks and bottlenecks of Machine Learning:
• Data scarcity: Transfer Learning technology doesn’t require reliance on larger data sets.
This technology allows models to be fine-tuned using a limited amount of data. It is
especially useful in applications where labelled datasets are scarce or expensive to
acquire, such as medical imaging, autonomous driving, and Natural Language Processing
(NLP).
• Computational cost: Transfer Learning builds on pre-trained networks, reducing the dependency on creating a model from scratch; it is therefore computationally less expensive.
• Long training time: Training a model from scratch may take days or weeks. In Transfer Learning, the computational time comes down dramatically because pre-trained models are reused, making it a time-saving process.
• Domain adaptation: Transfer Learning enables models to be adapted to new domains by
fine-tuning pre-trained models on task-specific data.
UNIT-2 Transfer Learning and Fine-Tuning

• Fine-Tuning:
• Fine-tuning is the process of training a pre-trained model on a new task.
• It can be thought of as a specific application of transfer learning.
• Fine-tuning can be done in two ways:
• Feature Extraction: In this approach, you freeze all layers of the pre-trained model except the final classification layers.
Then, you train the model on your task-specific dataset. This is often used when you have a small dataset.
• Fine-Tuning All Layers: In this approach, you unfreeze some or all of the layers in the pre-trained model and retrain them
on your task. This approach is suitable when you have a larger dataset, and you want to adapt the model more to your
specific task.
• Benefits of Transfer Learning and Fine-Tuning:
• Faster Training: Transfer learning and fine-tuning typically require less training time than training a model from scratch.
• Better Generalization: Pre-trained models have already learned useful features, leading to improved generalization on
your task.
• Lower Data Requirements: You can achieve good results with smaller datasets, which is often a limitation in deep
learning.
• State-of-the-Art Performance: Many pre-trained models are state-of-the-art on a wide range of tasks, allowing you to
benefit from the latest advancements in deep learning.
UNIT-2 Methods of Fine-tuning in CNN

• Select a Pre-trained Model: Choose a pre-trained CNN model that has been trained on a large dataset,
typically on a similar type of data or a related task. Common choices include models like VGG, ResNet,
Inception, and MobileNet, which are available in popular deep learning frameworks like TensorFlow and
PyTorch.
UNIT-2 Methods of Fine-tuning in CNN

• Remove the Top Layers: The top layers of the pre-trained model are often specific to the original task on
which the model was trained (e.g., ImageNet classification). These top layers include the final classification
layers. Remove these layers, keeping the convolutional and feature extraction layers intact.

• Add New Layers: Add new layers on top of the remaining layers. These new layers should be tailored to
your specific task. This typically includes a few fully connected layers followed by an output layer with the
number of units matching the number of classes or the desired output. You may also include activation
functions like ReLU, dropout layers for regularization, and batch normalization layers.
UNIT-2 Methods of Fine-tuning in CNN

• Freeze or Unfreeze Layers: Decide which layers to freeze and which to train. Freezing a layer means that its
weights and parameters are not updated during training. Generally, you might want to freeze the initial
layers (lower layers) to preserve the pre-trained feature extraction capabilities and only train the newly
added layers. However, you can experiment with fine-tuning some of the pre-trained layers as well,
depending on your dataset size and similarity to the pre-training data.
UNIT-2 Methods of Fine-tuning in CNN

• Data Augmentation: Apply data augmentation techniques to the training dataset. Data augmentation helps
increase the diversity of training examples by applying random transformations (e.g., rotation, flipping,
scaling) to the input images. This can improve model generalization.
UNIT-2 Methods of Fine-tuning in CNN

• Choose a Learning Rate: Experiment with different learning rates for training the model. A smaller learning
rate is often used for fine-tuning, as it helps stabilize the training process, especially when only a few layers
are being updated.

• Loss Function and Metrics: Select an appropriate loss function for your specific task (e.g., categorical cross-
entropy for classification) and the evaluation metrics you want to use (e.g., accuracy, F1-score).

• Training: Train the modified model on your target dataset using the chosen learning rate, loss function, and
metrics. Monitor the training progress, and consider using early stopping to prevent overfitting.
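A hedged sketch of the freeze/unfreeze and low-learning-rate steps described above (MobileNetV2, the 20-layer cutoff, the 5-class head, and the dataset names are illustrative assumptions, not values from the slides):

```python
import tensorflow as tf

# Assemble a pre-trained base and a new task-specific head, as in the earlier sketch.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights="imagenet")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# Unfreeze only the last few layers of the base; keep the earlier layers frozen.
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False

# A small learning rate stabilises training when pre-trained weights are being updated.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_data, validation_data=val_data, epochs=5,
#           callbacks=[tf.keras.callbacks.EarlyStopping(patience=2)])
```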
UNIT-2 Methods of Fine-tuning in CNN

• Hyperparameter Tuning: Experiment with different hyperparameters such as


batch size, dropout rates, and the number of neurons in the added layers to find
the best configuration for your task.

• Evaluate and Fine-Tune: After training, evaluate the fine-tuned model on a


validation dataset and fine-tune further if necessary. You can adjust
hyperparameters or the architecture of the added layers to improve
performance.

• Testing: Finally, evaluate the fine-tuned model on a separate test dataset to


assess its performance and generalization to unseen data.
UNIT-2 PRE-TRAINED MODEL ARCHITECTURES
1. Image Classification
• ResNet (Residual Networks): Pre-trained on large datasets like
ImageNet, ResNet is commonly used for tasks like object detection,
facial recognition, and medical image analysis. ResNet helps in learning
deep features through residual blocks, which are useful in avoiding
vanishing gradients.
• VGG (Visual Geometry Group): A pre-trained model also trained on
ImageNet, often used for feature extraction and fine-tuning in various
image-related tasks like object detection and segmentation.
• Inception (GoogLeNet): Pre-trained on ImageNet, Inception is used for
image classification and is efficient for transfer learning in complex vision
tasks.
UNIT-2 PRE-TRAINED MODEL ARCHITECTURES
2. Natural Language Processing (NLP)
• BERT (Bidirectional Encoder Representations from Transformers): Pre-trained on large
text corpora, BERT is widely used for tasks like sentiment analysis, named entity
recognition (NER), and question answering. BERT can be fine-tuned to adapt to specific
NLP tasks.
• GPT (Generative Pre-trained Transformer): GPT models (including GPT-3) are pre-
trained on vast amounts of text and can be used for tasks like text generation, language
translation, and summarization.
• T5 (Text-to-Text Transfer Transformer): Pre-trained to convert all NLP tasks into a text-
to-text format, T5 is a versatile model that can be adapted for translation,
summarization, and more.
3. Speech Recognition
• DeepSpeech: A pre-trained model by Mozilla for speech-to-text tasks, trained on large
datasets of spoken language. It is fine-tuned for specific languages or dialects.
• Wav2Vec: A pre-trained model by Facebook AI, used for automatic speech recognition
(ASR) tasks. It is designed to recognize speech and convert it into text, and can be fine-
tuned for specialized speech tasks.
UNIT-2 PRE-TRAINED MODEL ARCHITECTURES
4. Generative Models
• StyleGAN: A pre-trained generative model, commonly used for creating
realistic synthetic images (e.g., human faces). It has been trained on
large datasets and can be fine-tuned to generate specific types of images
(like art, landscapes, etc.).
• CycleGAN: Trained on unpaired datasets to translate images from one domain to another. For example, converting photos into paintings or transforming images between domains (e.g., from winter to summer scenes).
5. Object Detection
• YOLO (You Only Look Once): A real-time object detection model, pre-
trained on datasets like COCO, which is widely used for detecting
multiple objects in images and videos.
• Faster R-CNN: A region-based convolutional neural network, pre-trained
on ImageNet or COCO, used for tasks like object detection and
localization in images.
UNIT-2 Image Classification

• Step 1: Data Preparation

• Collect and Prepare Your Dataset: Gather a labeled dataset of images for training and
testing. Ensure that the dataset is balanced and representative of the classes you want to
classify.

• Data Preprocessing: Preprocess the images by resizing them to a consistent size (e.g.,
224x224 pixels), normalizing pixel values (usually in the range [0, 1]), and augmenting the
data if needed (applying random transformations like rotation, flipping, and cropping to
increase dataset diversity).

• Split the Dataset: Divide the dataset into training, validation, and test sets. Typically, you
allocate a larger portion to training (e.g., 70-80%) and the rest to validation and testing.
UNIT-2 Image Classification
• Step 2: Build the CNN Model
• Choose a Pre-trained Model (Optional): Consider using a pre-trained CNN model like VGG, ResNet,
Inception, or MobileNet as a starting point. These models are trained on large datasets (e.g.,
ImageNet) and have learned useful features. You can fine-tune these models for your specific task.

• Custom CNN Architecture (Alternative): If you prefer to build your own CNN architecture, design a
stack of convolutional layers, pooling layers, and fully connected layers. Ensure that the architecture
suits the complexity of your classification problem.

• Compile the Model: Define the loss function (typically categorical cross-entropy for classification), the
optimizer (e.g., Adam, SGD), and the evaluation metric (e.g., accuracy).
• Step 3: Training the CNN Model

• Training: Feed the training data into the CNN model and start training. During training, the
model adjusts its weights to minimize the loss function. This process may take several
epochs (iterations over the entire dataset). Monitor training performance on the validation
set to prevent overfitting.
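Steps 2 and 3 might look roughly like the following Keras sketch; CIFAR-10 is used here only as a stand-in dataset, and the architecture and epoch count are illustrative choices:

```python
import tensorflow as tf

# Step 1 stand-in: load and normalise a labeled image dataset.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

# Step 2: a small custom CNN (convolution + pooling + fully connected layers).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 3: train while monitoring a validation split to watch for overfitting.
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```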
UNIT-2 Image Classification

• Step 4: Evaluate and Fine-Tune


• Validation: After training, evaluate the model's performance on the validation set. This helps you tune hyperparameters, such as
learning rate, batch size, and model architecture, for better results.

• Fine-Tuning (Optional): Depending on validation results, you may decide to fine-tune the model by adjusting layers,
adding regularization (e.g., dropout), or training for more epochs.
• Step 5: Testing and Deployment
• Testing: Once you are satisfied with the model's performance, evaluate it on the separate test dataset to assess its
generalization to unseen data.

• Deployment: If the model performs well, deploy it in your application for real-time image classification. This could be in
the form of a web app, mobile app, or integration into an existing system.

• Monitoring and Maintenance: Continuously monitor the model's performance in the production environment and
retrain it periodically with new data if necessary.
UNIT-2 Image Classification

• Image classification involves assigning labels or classes to input images.


• It is a supervised learning task where a model is trained on labeled image data to predict the class of unseen
images.
• CNNs are commonly used for image classification as they can learn hierarchical features like edges, textures,
and shapes, enabling accurate object recognition in images.
• CNNs excel in this task because they can automatically extract meaningful spatial features from images.
• Here are different layers involved in the process:
UNIT-2 Image Classification

• Input Layer
• The input layer of a CNN takes in the raw image data as input. The images are typically represented as matrices of pixel
values. The dimensions of the input layer correspond to the size of the input images (e.g., height, width, and color channels).
• Convolutional Layers
• Convolutional layers are responsible for feature extraction. They consist of filters (also known as kernels) that are convolved
with the input images to capture relevant patterns and features. These layers learn to detect edges, textures, shapes, and
other important visual elements.
• Pooling Layers
• Pooling layers reduce the spatial dimensions of the feature maps produced by the convolutional layers. They perform
downsampling operations (e.g., max pooling) to retain the most salient information while discarding unnecessary details. This
helps in achieving translation invariance and reducing computational complexity.
• Fully Connected Layers
• The output of the last pooling layer is flattened and connected to one or more fully connected layers. These layers function as
traditional neural network layers and classify the extracted features. The fully connected layers learn complex relationships
between features and output class probabilities or predictions.
• Output Layer
• The output layer represents the final layer of the CNN. It consists of neurons equal to the number of distinct classes in the
classification task. The output layer provides each class’s classification probabilities or predictions, indicating the likelihood of
the input image belonging to a particular class.
UNIT-2 Image Classification

• Image classification involves the extraction of features from the image to observe some patterns in the dataset.
• Using an ANN for the purpose of image classification would end up being very costly in terms of computation since the trainable parameters become
extremely large.
• For example, if we have a 50 x 50 image of a cat and we train a traditional ANN with a 100-neuron hidden layer to classify it as a dog or a cat, the trainable parameters become:
• (50 × 50 input pixels) × 100 hidden neurons + 100 biases + 100 × 2 output weights + 2 biases = 250,302
• Filters are used in a CNN to extract features from a raw image, and filters come in many different types according to their purpose.

• Filters help us exploit the spatial locality of a particular image by enforcing a local connectivity pattern between neurons.
UNIT-2 Image Classification

• Convolution means a pointwise multiplication of two functions to produce a third function; here, one function is the image pixel matrix and the other is the filter.
• The filter slides over the image, and at each position the dot product of the two matrices is taken.
• The resulting matrix is called an “Activation Map” or “Feature Map”.
UNIT-2 Image Classification

• https://colab.research.google.com/drive/14hUmYnsOV-laGY6XjMSXqmoB9E-g5McV?usp=sharing#scrollTo=nRkIkniZ9mOI
UNIT-2 Text Classification

• Text classification is the process of categorizing text data into predefined groups
or labels.
• Text Classification is the task of assigning a sentence or document an appropriate category. The categories depend on the chosen dataset and can range from broad topics to fine-grained labels such as sentiment or intent.
• It is a type of supervised learning, where a machine learning model learns from a
labeled dataset and then predicts the category of unseen data.
• Text Classification problems include emotion classification, news classification,
citation intent classification, among others.
• Text classification can also be applied to various tasks such as spam detection,
sentiment analysis, topic categorization, and language identification.
• Benchmark datasets for evaluating text classification capabilities include GLUE,
AGNews, among others.
• The primary goal of text classification is to automatically assign a category or
class to a given text based on its content.
UNIT-2 Steps in Text Classification:

1. Data Collection: You need a labeled dataset where each text sample is associated with
a category label. The quality and quantity of data significantly impact the performance
of a text classification model.
2. Text Preprocessing: Text data is often noisy and unstructured, so preprocessing is
crucial. The preprocessing steps may include:
• Tokenization: Splitting the text into smaller pieces (tokens), typically words or
sentences.
• Lowercasing: Converting all the text to lowercase to maintain uniformity.
• Stopword Removal: Removing common words such as "the", "and", "is", etc., that
do not contribute much to the meaning of the text.
• Stemming or Lemmatization: Reducing words to their root form, e.g., "running" →
"run".
• Punctuation Removal: Removing unnecessary punctuation marks.
• Vectorization: Converting text into numerical features using techniques like Bag of
Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), or word
embeddings (Word2Vec, GloVe, etc.).
UNIT-2 Steps in Text Classification:

3. Feature Extraction: Once the text is preprocessed, the next step is to convert the text into
numerical representations (vectors). This allows the machine learning model to process the text
effectively. Some common methods include:
• Bag of Words (BoW): This method represents each document as a vector of word
frequencies.
• TF-IDF: This technique takes into account the frequency of terms within a document and
the rarity of those terms across a corpus, assigning higher weight to rare words.
• Word Embeddings: These are dense vector representations of words in a continuous vector
space, capturing semantic meanings and relationships between words.
4. Model Training: With the prepared feature set, you can train a classification model. Some
popular models for text classification include:
• Naive Bayes: A probabilistic classifier based on Bayes' theorem, often used for tasks like
spam detection.
• Support Vector Machines (SVM): A powerful classifier that tries to find the best boundary
between different classes.
• Deep Learning Models (e.g., CNNs, RNNs, Transformers): These models, especially with
architectures like LSTMs, GRUs, or BERT, have proven highly effective in text
classification tasks.
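A minimal illustrative pipeline combining TF-IDF features with a Naive Bayes classifier (the four training sentences and their labels are invented for the example; a real task would use a labeled corpus such as AG News):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up spam/ham dataset.
texts = ["win a free prize now", "meeting at 10 am tomorrow",
         "cheap loans click here", "project report attached"]
labels = ["spam", "ham", "spam", "ham"]

# The pipeline vectorizes the text and then fits the classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free prize waiting for you"]))  # expected: ['spam']
```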
UNIT-2 Steps in Text Classification:

5. Model Evaluation: After training the model, it's essential to evaluate


its performance. Common evaluation metrics include:
• Accuracy: The proportion of correctly classified instances out of
the total instances.
• Precision: The proportion of true positives out of all predicted
positives.
• Recall: The proportion of true positives out of all actual positives.
• F1-Score: The harmonic mean of precision and recall, providing a
balance between the two.
6. Prediction: Once the model is trained and evaluated, you can use it to
classify unseen text. The model will output a predicted category label
for the new text.
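The evaluation metrics from Step 5 can be computed with scikit-learn; the labels below are a made-up toy example:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = ["spam", "ham", "spam", "ham", "spam"]   # actual labels (toy example)
y_pred = ["spam", "ham", "ham",  "ham", "spam"]   # model predictions

print(accuracy_score(y_true, y_pred))                     # 4/5 = 0.8
print(precision_score(y_true, y_pred, pos_label="spam"))  # 2/2 = 1.0
print(recall_score(y_true, y_pred, pos_label="spam"))     # 2/3 ~= 0.67
print(f1_score(y_true, y_pred, pos_label="spam"))         # harmonic mean = 0.8
```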
UNIT-2 Terms used in Text Classification:

1. Text Corpus
A corpus (plural: corpora) is a large collection of text data used for training and
evaluating text classification models. The corpus can consist of documents,
sentences, or words, depending on the application.
Example: A corpus could be a collection of movie reviews, where each review is a
document that will be labeled for sentiment (positive or negative).

2. Label/Category
A label or category is the class or category assigned to a given text. In supervised
learning, each piece of text in the training dataset is associated with a label.
Example:
•In a sentiment analysis task, the labels could be "positive" or "negative."
•In spam detection, labels could be "spam" or "ham" (non-spam).
UNIT-2 Terms used in Text Classification:

3. Tokenization
Tokenization is the process of splitting a text into smaller units, typically words or
phrases. Tokens are the building blocks used for analysis.
Example:
•Text: "I love programming!"
•Tokens: ["I", "love", "programming"]
4. Stop Words
Stop words are common words like "the", "and", "is", "to", etc., that do not carry
significant meaning and are usually removed during preprocessing. The rationale is
that these words do not help in distinguishing between different text categories.
Example:
•Sentence: "The movie was amazing."
•Stop words might be removed, leaving: "movie amazing."
UNIT-2 Terms used in Text Classification:

5. Stemming
Stemming is the process of reducing words to their root form. The idea is to treat
different forms of the same word as equivalent.
Example:
•"running" → "run"
•"better" → "good"
6. Lemmatization
Lemmatization is similar to stemming, but it involves reducing a word to its
dictionary form (lemma). Unlike stemming, lemmatization considers the context of
a word and reduces it to a valid word (lemma) that exists in the dictionary.
Example:
•"running" → "run"
•"better" → "good"
UNIT-2 Terms used in Text Classification:

7. Feature Extraction
Feature extraction refers to the process of transforming raw text into numerical
representations that can be fed into a machine learning model. These numerical features
capture the important aspects of the text.
Common methods include:
•Bag of Words (BoW): A simple representation where each word in a document is
counted, and the frequency of each word is used as a feature.
•TF-IDF (Term Frequency-Inverse Document Frequency): Weighs words by their
frequency in a document and how rare they are in the entire corpus. It helps in reducing
the importance of common words and emphasizes rare but significant words.
8. Vocabulary
A vocabulary refers to the set of unique words that exist in the corpus after
preprocessing (tokenization, stopword removal, etc.). It forms the basis for vectorizing
text.
Example: In the sentence "I love programming", the vocabulary could be ["I", "love",
"programming"].
UNIT-2 Terms used in Text Classification:

9. Bag of Words (BoW)


The Bag of Words (BoW) model is a simple text representation technique where the text is
represented as a vector, and each word's frequency in the document is used as a feature. The
order of words is ignored, making it a "bag" of words.
Example:
•Document 1: "I love programming."
•Document 2: "I enjoy coding."
•BoW vectors could look like this:
•Vocabulary: ["I", "love", "programming", "enjoy", "coding"]
•Document 1: [1, 1, 1, 0, 0] (indicating the presence of "I", "love", and "programming")
•Document 2: [1, 0, 0, 1, 1]
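The BoW example above can be reproduced with scikit-learn's CountVectorizer (note that the vectorizer sorts the vocabulary alphabetically and by default drops one-letter tokens such as "I", so the token pattern is relaxed here purely to mirror the slide's vocabulary):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love programming.", "I enjoy coding."]

# Relaxed token pattern so single-character words like "I" are kept.
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
bow = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # ['coding' 'enjoy' 'i' 'love' 'programming']
print(bow.toarray())
# [[0 0 1 1 1]   -> "I love programming."
#  [1 1 1 0 0]]  -> "I enjoy coding."
```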
UNIT-2 Text Classification
UNIT-2 Text Classification (Example)
UNIT-2 References

https://livebook.manning.com/book/transfer-learning-in-action/chapter-1/v-1/18

Text Classification:
https://colab.research.google.com/drive/1CzSuEAb6bl7ke3iZiEhQgjM1NNGs9C4r?usp=sharing
