
PANIMALAR ENGINEERING COLLEGE

23CS2902 – DEEP LEARNING – UNIT 3

UNIT III CONVOLUTIONAL NEURAL NETWORKS

CNN Architectures – Convolution – Pooling Layers – Transfer Learning – Image Classification using Transfer Learning

CNN Architectures:

In deep learning, there are several types of models such as Artificial Neural Networks (ANN), Autoencoders, Recurrent Neural Networks (RNN) and Reinforcement Learning. One model in particular has contributed greatly to the fields of computer vision and image analysis: the Convolutional Neural Network (CNN), or ConvNet.

A CNN is very useful because it minimises human effort by detecting features automatically. For example, given images of apples and mangoes, it would learn the distinct features of each class on its own.

CNNs are a class of Deep Neural Networks that can recognize and classify particular features from images and are widely used for analysing visual data. Their applications range from image and video recognition, image classification and medical image analysis to computer vision and natural language processing.

CNNs achieve high accuracy, which makes them especially useful for image recognition. Image recognition has a wide range of uses across industries such as medical image analysis, mobile phones, security and recommendation systems.

The term 'Convolution' in CNN denotes the mathematical operation of convolution, a special kind of linear operation in which two functions are combined to produce a third function that expresses how the shape of one is modified by the other. In simple terms, two images that can be represented as matrices are combined to give an output that is used to extract features from the image.


Basic Architecture

There are two main parts to a CNN architecture:

 A convolution tool that separates and identifies the various features of the image for analysis, in a process called Feature Extraction.

 The feature-extraction network consists of many pairs of convolutional and pooling layers.

 A fully connected layer that utilizes the output from the convolution process and predicts the class of the image based on the features extracted in the previous stages.

 This feature-extraction part of the CNN aims to reduce the number of features present in a dataset. It creates new features that summarise the existing features contained in the original set. There are many CNN layers, as shown in a typical CNN architecture diagram.

1. Convolution Layers

There are three types of layers that make up a CNN: convolutional layers, pooling layers, and fully connected (FC) layers. When these layers are stacked, a CNN architecture is formed. In addition to these three layers, there are two more important components, the dropout layer and the activation function, which are described below.

The convolutional layer is the first layer used to extract features from the input image. It slides a filter (kernel) over the input image and computes the dot product between the filter and the corresponding patch of the image. The output is termed the feature map, which gives us information about the image such as corners and edges. Later, this feature map is fed to other layers to learn further features of the input image.

The convolution layer passes the result to the next layer after applying the convolution operation to the input. Convolutional layers benefit the network because they keep the spatial relationship between the pixels intact.
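As a minimal illustrative sketch (the filter count, kernel size and input shape below are assumptions chosen for demonstration, not values from this unit), a single convolutional layer can be applied in Keras as follows:

import tensorflow as tf

# One convolutional layer applied to a batch containing a single RGB image.
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                              padding="same", activation="relu")

images = tf.random.normal((1, 64, 64, 3))   # one 64x64 RGB image (illustrative)
feature_maps = conv(images)
print(feature_maps.shape)                    # (1, 64, 64, 32): one feature map per filter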

2. Pooling Layer

In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of this layer is to decrease the size of the convolved feature map and thereby reduce computational cost. This is achieved by reducing the number of connections between layers; the pooling operation acts on each feature map independently. Depending upon the method used, there are several types of pooling operations; in essence, pooling summarises the features generated by a convolution layer.

In Max Pooling, the largest element is taken from each region of the feature map. Average Pooling calculates the average of the elements in a predefined-size image section, and Sum Pooling computes the total sum of the elements in that section. The Pooling Layer usually serves as a bridge between the Convolutional Layer and the FC Layer.

Pooling generalises the features extracted by the convolution layer and helps the network recognise features independently of their exact position. It also reduces the amount of computation in the network.

3. Fully Connected Layer

The Fully Connected (FC) layer consists of weights and biases along with the neurons and is used to connect the neurons between two different layers. These layers are usually placed before the output layer and form the last few layers of a CNN architecture.

Here, the feature maps from the previous layers are flattened and fed to the FC layer. The flattened vector then passes through a few more FC layers, where the mathematical operations usually take place; it is at this stage that the classification process begins. Two fully connected layers are used because a pair of FC layers generally performs better than a single one. These layers in a CNN also reduce the need for human supervision.
4. Dropout
Usually, when all the features are connected to the FC layer, the model can overfit the training dataset. Overfitting occurs when a model works so well on the training data that its performance suffers when it is used on new data.

To overcome this problem, a dropout layer is used: a fraction of the neurons is randomly dropped from the neural network during training, which reduces the effective size of the model. With a dropout rate of 0.3, 30% of the nodes are dropped out of the network at random.

Dropout improves the performance of a machine learning model because it prevents overfitting by making the network simpler; the neurons are dropped only during training.
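A minimal sketch of how a dropout rate of 0.3 can be placed between fully connected layers in Keras is shown below; the layer sizes are illustrative assumptions, not values from the text:

import tensorflow as tf

# A small fully connected head with 30% dropout between its layers.
fc_head = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),   # randomly drops 30% of the units during training only
    tf.keras.layers.Dense(10, activation="softmax"),
])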

5. Activation Functions

Finally, one of the most important parameters of the CNN model is the activation function. Activation functions are used to learn and approximate any kind of continuous and complex relationship between variables of the network. In simple words, they decide which information should be passed forward through the network and which should not at the end of the network.

Activation functions add non-linearity to the network. Several activation functions are commonly used, such as ReLU, Softmax, tanh and Sigmoid, and each has a specific usage. For a binary classification CNN model, sigmoid or softmax functions are preferred, and for multi-class classification, softmax is generally used. In simple terms, activation functions in a CNN model determine whether a neuron should be activated or not; they decide whether the input to the neuron is important for the prediction, using mathematical operations.
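As a small hedged illustration of these choices (the unit counts are assumptions, not values from the text), the hidden and output layers of a Keras model might be defined as:

import tensorflow as tf

hidden = tf.keras.layers.Dense(64, activation="relu")                # ReLU for hidden layers
binary_output = tf.keras.layers.Dense(1, activation="sigmoid")       # binary classification
multiclass_output = tf.keras.layers.Dense(10, activation="softmax")  # e.g. 10-class classification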

Top 7 Applications of Convolutional Neural Networks:

1. Decoding Facial Recognition


Facial recognition is broken down by a convolutional neural network into the
following major components -
 Identifying every face in the picture
 Focusing on each face despite external factors, such as light, angle, pose, etc


 Identifying unique features


 Comparing all the collected data with already existing data in the database to
match a face with a name.

2. Analyzing Documents

Convolutional neural networks can also be used for document analysis. This is useful not just for handwriting analysis but also for recognisers in general. For a machine to scan an individual's handwriting and compare it against a wide database, it must execute almost a million commands a minute. With CNNs and newer models and algorithms, the error rate has reportedly been brought down to as low as 0.4% at the character level, though complete testing of this is yet to be widely seen.
3. Collecting Historic and Environmental Elements

CNNs are also used for more complex purposes such as natural history
collections. These collections act as key players in documenting major parts of history
such as biodiversity, evolution, habitat loss, biological invasion, and climate change.
4. Understanding Climate

CNNs can be used to play a major role in the fight against climate change,
especially in understanding the reasons why we see such drastic changes and how we
could experiment in curbing the effect. It is said that the data in such natural history
collections can also provide greater social and scientific insights, but this would
require skilled human resources such as researchers who can physically visit these
types of repositories. There is a need for more manpower to carry out deeper
experiments in this field.
5. Understanding Gray Areas

The introduction of the gray area into CNNs is poised to provide a much more realistic picture of the real world. Currently, CNNs largely function like a machine, seeing a true or false value for every question. However, as humans, we understand that the real world plays out in a thousand shades of gray. Allowing the machine to understand and process fuzzier logic will help it understand the gray area we humans live and work in. This will help CNNs get a more holistic view of what humans see.

6. Advertising

CNNs have already brought a world of difference to advertising with the introduction of programmatic buying and data-driven personalized advertising.
7. Other Interesting Fields


CNNs are poised to be the future with their introduction into driverless cars,
robots that can mimic human behaviour, aides to human genome mapping projects,
predicting earthquakes and natural disasters, and maybe even self-diagnoses of
medical problems. So, you wouldn't even have to drive down to a clinic or schedule an
appointment with a doctor to ensure your sneezing attack or high fever is just the
simple flu and not the symptoms of some rare disease. One problem that researchers
are working on with CNNs is brain cancer detection. The earlier detection of brain
cancer can prove to be a big step in saving more lives affected by this illness.

Convolutional Neural Network Architecture


A CNN typically has three layers: a convolutional layer, a pooling layer, and a
fully connected layer.

Convolution Layer

The convolution layer is the core building block of the CNN. It carries the main
portion of the network’s computational load.
This layer performs a dot product between two matrices, where one matrix is the set of learnable parameters, otherwise known as a kernel, and the other matrix is the restricted portion of the receptive field. The kernel is spatially smaller than the image but extends through its full depth. This means that, if the image is composed of three (RGB) channels, the kernel height and width will be spatially small, but the depth extends across all three channels.

Illustration of Convolution Operation

During the forward pass, the kernel slides across the height and width of the image, producing a representation of each receptive region. This produces a two-dimensional representation of the image known as an activation map, which gives the response of the kernel at each spatial position of the image. The step size with which the kernel slides is called the stride.

If we have an input of size W x W x D and Dout kernels with a spatial size of F, stride S and amount of zero padding P, then the spatial size of the output volume is given by:

Wout = (W - F + 2P)/S + 1

This will yield an output volume of size Wout x Wout x Dout.
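As a quick sanity check of this formula, a small helper function (our own illustrative code, with example values W = 32, F = 5, S = 1, P = 2) is shown below:

def conv_output_size(W, F, S, P):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (W - F + 2 * P) // S + 1

# Illustrative values: a 32x32 input, 5x5 kernels, stride 1, padding 2.
print(conv_output_size(W=32, F=5, S=1, P=2))   # 32 -> output volume 32 x 32 x Dout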


Motivation behind Convolution

Convolution leverages three important ideas that motivated computer vision researchers:
sparse interaction, parameter sharing, and equivariant representation. Let’s describe each
one of them in detail.
Trivial neural network layers use matrix multiplication by a matrix of parameters describing the interaction between the input and output units, which means that every output unit interacts with every input unit. Convolutional neural networks, however, have sparse interaction. This is achieved by making the kernel smaller than the input: an image may have thousands or millions of pixels, but while processing it with a kernel we can detect meaningful information that spans only tens or hundreds of pixels. This means that we need to store fewer parameters, which not only reduces the memory requirement of the model but also improves its statistical efficiency.

If computing one feature at a spatial point (x1, y1) is useful, then it should also be useful at some other spatial point, say (x2, y2). In other words, for a single two-dimensional slice, i.e., for creating one activation map, neurons are constrained to use the same set of weights. In a traditional neural network, each element of the weight matrix is used once and then never revisited, whereas a convolutional network has shared parameters: to produce the output, the weights applied to one input are the same as the weights applied elsewhere.


Due to parameter sharing, the layers of a convolutional neural network have a property of equivariance to translation: if we shift the input in some way, the output is shifted in the same way.

Pooling Layer

The pooling layer replaces the output of the network at certain locations by deriving a
summary statistic of the nearby outputs. This helps in reducing the spatial size of the
representation, which decreases the required amount of computation and weights. The
pooling operation is processed on every slice of the representation individually.
There are several pooling functions such as the average of the rectangular neighbourhood,
L2 norm of the rectangular neighbourhood, and a weighted average based on the distance
from the central pixel. However, the most popular process is max pooling, which reports
the maximum output from the neighbourhood.

For a pooling window of spatial size F applied with stride S to an input of size W x W x D, the output spatial size is Wout = (W - F)/S + 1. This will yield an output volume of size Wout x Wout x D. For example, 2 x 2 max pooling with stride 2 applied to a 28 x 28 x D input produces a 14 x 14 x D output.

In all cases, pooling provides some translation invariance which means that an object
would be recognizable regardless of where it appears on the frame.


Fully Connected Layer

Neurons in this layer have full connectivity with all neurons in the preceding and succeeding layers, as in a regular feed-forward neural network. Their output can therefore be computed as usual by a matrix multiplication followed by a bias addition.
The FC layer helps to map the representation between the input and the output.

Non-Linearity Layers

Since convolution is a linear operation and images are far from linear, non-linearity layers
are often placed directly after the convolutional layer to introduce non-linearity to the
activation map.
There are several types of non-linear operations, the popular ones being:

1. Sigmoid

The sigmoid non-linearity has the mathematical form σ(κ) = 1/(1 + e^(-κ)). It takes a real-valued number and "squashes" it into the range between 0 and 1.
However, a very undesirable property of sigmoid is that when the activation is at either tail, the gradient becomes almost zero. If the local gradient becomes very small, then in backpropagation it will effectively "kill" the gradient. Also, because sigmoid outputs are always positive (never zero-centred), the data coming into the next layer's neurons is always positive, so the gradients on those weights become either all positive or all negative, resulting in a zig-zag dynamic of gradient updates for the weights.

2. Tanh - Tanh squashes a real-valued number to the range [-1, 1]. Like sigmoid, its activation saturates, but unlike the sigmoid neuron its output is zero-centred.

3. ReLU - The Rectified Linear Unit (ReLU) has become very popular in the last few years. It computes the function ƒ(κ) = max(0, κ). In other words, the activation is simply thresholded at zero.


In comparison to sigmoid and tanh, ReLU is more reliable and accelerates convergence by up to six times.
Unfortunately, a drawback is that ReLU can be fragile during training. A large gradient flowing through a ReLU neuron can update its weights in such a way that the neuron never activates again and is never updated further. However, this can be managed by setting a proper learning rate.

Designing a Convolutional Neural Network


Now that we understand the various components, we can build a convolutional neural network. We will be using Fashion-MNIST, a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from one of 10 classes. The dataset is publicly available online.

Our convolutional neural network has the following architecture:

[INPUT]

→[CONV 1] → [BATCH NORM] → [ReLU] → [POOL 1]

→ [CONV 2] → [BATCH NORM] → [ReLU] → [POOL 2]

→ [FC LAYER] → [RESULT]

For both conv layers, we will use kernels of spatial size 5 x 5 with stride 1 and padding of 2. For both pooling layers, we will use the max pooling operation with kernel size 2, stride 2, and zero padding.
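The original handout presents the implementation on the following pages; a minimal Keras sketch of the same architecture is given below. The filter counts (16 and 32) are illustrative assumptions, while the kernel size, stride, padding and pooling settings follow the description above.

import tensorflow as tf

# Sketch of [CONV -> BATCH NORM -> ReLU -> POOL] x 2 -> FC for 28x28x1 Fashion-MNIST images.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, kernel_size=5, strides=1, padding="same"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),    # 28x28 -> 14x14
    tf.keras.layers.Conv2D(32, kernel_size=5, strides=1, padding="same"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),    # 14x14 -> 7x7
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),      # one unit per Fashion-MNIST class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])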


Introduction to Pooling Layers in CNN


The typical structure of a CNN consists of three basic layers
1. Convolutional layer: These layers generate a feature map by sliding a filter over
the input image and recognizing patterns in images.
2. Pooling layers: These layers downsample the feature map to introduce translation invariance, which reduces the overfitting of the CNN model.
3. Fully Connected Dense layer: This layer contains the same number of units as the number of classes, together with an output activation function such as "softmax" or "sigmoid".

What are Pooling layers?

Pooling layers are one of the building blocks of Convolutional Neural Networks. Where convolutional layers extract features from images, pooling layers consolidate the features learned by the CNN. Their purpose is to gradually shrink the spatial dimension of the representation to minimize the number of parameters and computations in the network.

Why are Pooling layers needed?

The feature map produced by the filters of convolutional layers is location-dependent. For example, if an object in an image has shifted a bit, it might not be recognizable by the convolutional layer, because the feature map records the precise positions of features in the input. What pooling layers provide is "translational invariance", which makes the CNN invariant to translations, i.e., even if the input of the CNN is translated, the CNN will still be able to recognize the features in the input.
In all cases, pooling helps to make the representation approximately invariant to small translations of the input: if we translate the input by a small amount, the values of most of the pooled outputs do not change.

How do pooling layers achieve that? A pooling layer is added after the convolutional layer(s), as seen in the structure of a CNN above. It downsamples the output of the convolutional layers by sliding a filter of some size with some stride over each feature map and calculating the maximum or the average of the values inside the window.

There are two types of poolings that are used:


1. Max pooling: This works by selecting the maximum value from every pool. Max
Pooling retains the most prominent features of the feature map, and the returned
image is sharper than the original image.
2. Average pooling: This pooling layer works by getting the average of the pool.
Average pooling retains the average values of features of the feature map. It
smoothes the image while keeping the essence of the feature in an image.

Let's explore the working of Pooling Layers using TensorFlow. Create a NumPy array and reshape it to the (batch, height, width, channels) shape expected by Keras pooling layers.

import numpy as np
import tensorflow as tf

matrix = np.array([[3., 2., 0., 0.],
                   [0., 7., 1., 3.],
                   [5., 2., 3., 0.],
                   [0., 9., 2., 3.]]).reshape(1, 4, 4, 1)


Max Pooling

Create a MaxPool2D layer with pool size = 2 and strides = 2. Apply the MaxPool2D
layer to the matrix, and you will get the MaxPooled output in the tensor form. By
applying it to the matrix, the Max pooling layer will go through the matrix by computing
the max of each 2×2 pool with a jump of 2. Print the shape of the tensor. Use tf.squeeze
to remove dimensions of size 1 from the shape of a tensor.
max_pooling = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
max_pooled_matrix = max_pooling(matrix)
print(max_pooled_matrix.shape)        # (1, 2, 2, 1)
print(tf.squeeze(max_pooled_matrix))  # [[7. 3.] [9. 3.]]

Average Pooling
Create an AveragePooling2D layer with the same pool_size of 2 and strides of 2, and apply it to the matrix. The average pooling layer will go through the matrix, computing the average of each 2×2 pool with a jump of 2. Print the shape of the output and use tf.squeeze to convert it into a readable form by removing all dimensions of size 1.

average_pooling = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)
average_pooled_matrix = average_pooling(matrix)
print(average_pooled_matrix.shape)        # (1, 2, 2, 1)
print(tf.squeeze(average_pooled_matrix))  # [[3. 1.] [4. 2.]]


Max Pooling and Average Pooling being performed

Global Pooling Layers

Global Pooling Layers often replace the classifier’s fully connected or Flatten layer. The
model instead ends with a convolutional layer that produces as many feature maps as
there are target classes and performs global average pooling on each of the feature maps
to combine each feature map into a single value.
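A minimal sketch of such a classifier head is shown below; the number of classes and the 1 x 1 kernel size are illustrative assumptions:

num_classes = 10   # illustrative assumption

# A classifier head that replaces Flatten + Dense with global average pooling:
# the last convolution produces one feature map per target class, and global
# average pooling collapses each map into a single score.
head = tf.keras.Sequential([
    tf.keras.layers.Conv2D(num_classes, kernel_size=1, padding="same"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Softmax(),
])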
Create the same NumPy array but reshape it to (1, 2, 2, 4), i.e., four feature maps of spatial size 2×2. The global pooling layers will reduce each feature map to a single value.
matrix=np.array([[[3.,2.,0.,0.],
[0.,7.,1.,3.]],
[[5.,2.,3.,0.],
[0.,9.,2.,3.]]]).reshape(1,2,2,4)

Global Average Pooling


Considering a tensor of shape h*w*n, the output of the Global Average Pooling layer is a
single value across h*w that summarizes the presence of the feature. Instead of
downsizing the patches of the input feature map, the Global Average Pooling layer
downsizes the whole h*w into 1 value by taking the average.


global_average_pooling = tf.keras.layers.GlobalAveragePooling2D()
global_average_pooled_matrix = global_average_pooling(matrix)
print(global_average_pooled_matrix)  # shape (1, 4), values [[2.  5.  1.5 1.5]]

The output of the Global Average Pooled layer

Global Max Pooling


With the tensor of shape h*w*n, the output of the Global Max Pooling layer is a single
value across h*w that summarizes the presence of a feature. Instead of downsizing the
patches of the input feature map, the Global Max Pooling layer downsizes the
whole h*w into 1 value by taking the maximum.
global_max_pooling = tf.keras.layers.GlobalMaxPool2D()
global_max_pooled_matrix = global_max_pooling(matrix)
print(global_max_pooled_matrix)  # shape (1, 4), values [[5. 9. 3. 3.]]

The output of the Global Max Pooled layer

Conclusion

In general, pooling layers are useful when you want to detect an object in an image regardless of its position in the image. The consequences of adding pooling layers are a reduction of overfitting, increased efficiency, and faster training times for a CNN model. While max pooling draws out the most prominent features of an image, average pooling smooths the image while retaining the essence of its features. Global pooling layers often replace the Flatten or Dense output layers.
Transfer Learning for Deep Learning

The reuse of a previously trained model on a new problem is known as transfer learning. It is particularly popular in deep learning right now because it allows deep neural networks to be trained with a comparatively small amount of data.

In transfer learning, the knowledge of an already trained machine learning model is transferred to a different but closely related problem. For example, if you trained a simple classifier to predict whether an image contains a backpack, you could use the knowledge the model gained during training to help recognise other objects, such as sunglasses.


With transfer learning, we basically try to use what has been learned in one task to better understand the concepts in another. Weights learned by a network trained on an earlier "task A" are transferred to a new network that is trained on a new "task B".
Because of the massive amount of compute power required to train such networks from scratch, transfer learning is typically applied in computer vision and in natural language processing tasks like sentiment analysis.

How Transfer Learning Works?

In computer vision, neural networks typically learn to detect edges in the first layers, shapes in the middle layers, and task-specific features in the later layers. In transfer learning, the early and middle layers are reused and only the later layers are retrained, so the model leverages the labelled data of the task it was originally trained on.


Let's return to the example of a model that was designed to identify a backpack in an image and will now be used to detect sunglasses. Because the model has already learned to recognise objects in its earlier layers, we simply retrain the later layers so that it learns what distinguishes sunglasses from other objects. This is particularly valuable in data science, as most real-world problems typically do not have millions of labelled data points with which to train such complex models.

Why Should You Use Transfer Learning?


Transfer learning offers a number of advantages, the most important of which are reduced training time, improved neural network performance (in most circumstances), and not needing a large amount of training data.
To train a neural model from scratch, a lot of data is typically needed, but access to that
data isn’t always possible – this is when transfer learning comes in handy.


Because the model has already been pre-trained, a good machine learning model can be generated with fairly little training data using transfer learning. This is especially useful in natural language processing, where creating huge labelled datasets requires a lot of expert knowledge. Additionally, training time is reduced, because building a deep neural network from scratch for a complex task can take days or even weeks.

When to Use Transfer Learning?


When we don’t have enough annotated data to train our model with. When there is a pre-
trained model that has been trained on similar data and tasks. If you used TensorFlow to
train the original model, you might simply restore it and retrain some layers for your job.

Transfer learning, on the other hand, only works if the features learnt in the first task are general, meaning they can also be applied to the new task. Furthermore, the model's input must be the same size as that with which it was originally trained. If your inputs are a different size, add a preprocessing step to resize them to the required size.

1. TRAINING A MODEL TO REUSE IT


Consider the situation in which you wish to tackle task A but lack the data needed to train a deep neural network. One way around this is to find a related task B with plenty of data. Train a deep neural network on task B and then use that model as a starting point to solve task A. Whether you need to employ the entire model or just a few layers depends on the problem you are trying to solve.
If the input in both tasks is the same, you can reapply the model and make predictions on your new input. Alternatively, changing and retraining the task-specific layers and the output layer is an approach worth investigating.


2. USING A PRE-TRAINED MODEL


The second option is to employ a model that has already been trained. There are a number of these models available, so do some research beforehand. The number of layers to reuse and retrain is determined by the task.
Keras provides a number of pre-trained models that can be used for transfer learning, prediction and fine-tuning; these models, along with quick tutorials on how to use them, are available in the Keras documentation. Many research institutions also make trained models available.
The most popular application of this form of transfer learning is deep learning.

3. EXTRACTION OF FEATURES
Another option is to utilise deep learning to identify the optimum representation of your problem, which means identifying the most important features. This approach is known as representation learning, and it can often produce significantly better results than hand-designed representations.
In machine learning, feature creation is mostly done by hand by researchers and domain specialists. Fortunately, deep learning can extract features automatically. Of course, this does not diminish the importance of feature engineering and domain knowledge; you must still decide which features you feed into your network.

Neural networks, on the other hand, have the ability to learn which features are critical and which are not. Even for complicated tasks that would otherwise require a lot of human effort, a representation learning algorithm can find a good combination of features in a short amount of time.
The learned representation can then be applied to a variety of other problems. Simply use the initial layers to find the appropriate feature representation, but avoid using the network's final output, because it is too task-specific. Instead, feed data into the network and take the output of one of the intermediate layers; this output can then be used as a representation of the raw data.
This method is commonly used in computer vision because it can shrink your dataset, reducing computation time and making it better suited to classical algorithms.
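A minimal Keras sketch of this idea is shown below; the choice of VGG16 and of the intermediate layer name "block4_pool" are illustrative assumptions, not requirements:

import tensorflow as tf

# Load a pre-trained network (VGG16, purely as an example) without its
# task-specific classification head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False)

# Build a feature extractor that outputs the activations of an intermediate
# layer ("block4_pool" is an illustrative choice) instead of the final output.
feature_extractor = tf.keras.Model(inputs=base.input,
                                   outputs=base.get_layer("block4_pool").output)

# These features could now be fed to a classical algorithm or a small model.
images = tf.random.normal((4, 224, 224, 3))   # a dummy batch of images
features = feature_extractor(images)
print(features.shape)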
Models That Have Been Pre-Trained
There are a number of popular pre-trained machine learning models available. One of them is the Inception-v3 model, which was developed for the ImageNet Large Visual Recognition Challenge. Participants in this challenge had to classify pictures into 1,000 categories such as "zebra," "Dalmatian," and "dishwasher."
Topic II
Transfer learning is about leveraging feature representations from a pre-trained model, so
you don’t have to train a new model from scratch.
The pre-trained models are usually trained on massive datasets that are a standard
benchmark in the computer vision frontier. The weights obtained from the models can be
reused in other computer vision tasks.
These models can be used directly in making predictions on new tasks or integrated into
the process of training a new model. Including the pre-trained models in a new model
leads to lower training time and lower generalization error.
Transfer learning is particularly useful when you have a small training dataset. In this case, you can, for example, use the weights from the pre-trained model to initialize the weights of the new model. As you will see later, transfer learning can also be applied to natural language processing problems.

 Models trained on ImageNet can be used in real-world image classification problems, because the dataset covers 1,000 diverse classes. Let's say you are an insect researcher; you can use these models and fine-tune them to classify insects.
 Classifying text requires knowledge of word representations in some vector space. You
can train vector representations yourself. The challenge here is that you might not have
enough data to train the embeddings. Furthermore, training will take a long time. In this
case, you can use a pre-trained word embedding like GloVe to hasten your development
process.

What is the difference between transfer learning and fine-tuning?


Fine-tuning is an optional step in transfer learning. Fine-tuning will usually improve the
performance of the model. However, since you have to retrain the entire model, you’ll
likely overfit.


Overfitting is avoidable: just retrain the model, or part of it, using a low learning rate. This is important because it prevents large updates to the weights, which would otherwise result in poor performance. Using a callback to stop the training process when the model has stopped improving is also helpful.
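A minimal sketch of this fine-tuning recipe is shown below. It assumes that a frozen base_model, a combined model and the datasets train_ds and val_ds already exist; the learning rate, loss and patience values are illustrative assumptions.

import tensorflow as tf

# Assumes `base_model`, `model`, `train_ds` and `val_ds` already exist.
# Unfreeze the pre-trained base for fine-tuning ...
base_model.trainable = True

# ... and recompile with a very low learning rate so weight updates stay small.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Stop training once the validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              patience=3,
                                              restore_best_weights=True)

model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=[early_stop])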

Why use transfer learning?


Assume you have 100 images of cats and 100 images of dogs and want to build a model to classify them. How would you train a model using this small dataset? You can train your model from scratch, but it will most likely overfit horribly. Enter transfer learning. Generally speaking, there are a few big reasons why you want to use transfer learning:
 Training models with high accuracy requires a lot of data. For example, the ImageNet dataset contains over 1 million images. In the real world, you are unlikely to have such a large dataset.
 Assuming that you had that kind of dataset, you might still not have the resources
required to train a model on such a large dataset. Hence transfer learning makes a lot of
sense if you don’t have the compute resources needed to train models on huge datasets.
 Even if you had the compute resources at your disposal, you still have to wait for days or
weeks to train such a model. Therefore using a pre-trained model will save you precious
time.

When does transfer learning not work?


Transfer learning will not work when the high-level features learned by the bottom layers are not sufficient to differentiate the classes in your problem. For example, a pre-trained model may be very good at identifying a door but not whether a door is closed or open. In this case, you can use the low-level features of the pre-trained network instead of the high-level features, which means you will have to retrain more layers of the model or use features from earlier layers.
When datasets are not similar, features transfer poorly. Published work on the transferability of features investigates the similarity of datasets in more detail; even so, it shows that initializing a network with pre-trained weights results in better performance than using random weights.
You might also find yourself in a situation where you consider removing some layers from the pre-trained model. Transfer learning is unlikely to work well in such an event, because removing layers reduces the number of trainable parameters, which can result in overfitting. Furthermore, determining the correct number of layers to remove without overfitting is a cumbersome and time-consuming process.

How to implement transfer learning?


Let’s now take a moment and look at how you can implement transfer learning.
Transfer learning in 6 steps

Obtain the pre-trained model


The first step is to get the pre-trained model that you would like to use for your problem.
The various sources of pre-trained models are covered in a separate section.

Create a base model


Usually, the first step is to instantiate the base model using one of the architectures such
as ResNet or Xception.
You can also optionally download the pre-trained weights. If you don’t download the
weights, you will have to use the architecture to train your model from scratch. Recall
that the base model will usually have more units in the final output layer than you require.
When creating the base model, you, therefore, have to remove the final output layer.
Later on, you will add a final output layer that is compatible with your problem.
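A minimal sketch of this step using the Keras applications API is shown below; Xception and the input shape are illustrative choices, not requirements:

import tensorflow as tf

# Instantiate a base model with pre-trained ImageNet weights and without its
# final output (classification) layer. Any architecture from
# tf.keras.applications would do; Xception and the input size are illustrative.
base_model = tf.keras.applications.Xception(weights="imagenet",
                                            include_top=False,
                                            input_shape=(160, 160, 3))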


Freeze layers so they don’t change during training


Freezing the layers from the pre-trained model is vital. This is because you don’t want
the weights in those layers to be re-initialized. If they are, then you will lose all the
learning that has already taken place. This will be no different from training the model
from scratch.

base_model.trainable = False


Add new trainable layers


The next step is to add new trainable layers that will turn old features into predictions on
the new dataset. This is important because the pre-trained model is loaded without the
final output layer.
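Continuing the sketch above, new trainable layers can be stacked on top of the frozen base model as follows; the single sigmoid output unit assumes a binary (e.g. cats-vs-dogs) problem and the dropout rate is an illustrative choice:

import tensorflow as tf

# Assumes `base_model` from the previous step, frozen with base_model.trainable = False.
inputs = tf.keras.Input(shape=(160, 160, 3))
x = base_model(inputs, training=False)            # keep BatchNorm layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)   # collapse feature maps to a vector
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # new trainable output layer

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])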

Image Classification using Transfer Learning:


Step 1:

import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf

Step 2:

_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

BATCH_SIZE = 32
IMG_SIZE = (160, 160)

train_dataset = tf.keras.utils.image_dataset_from_directory(train_dir,
                                                            shuffle=True,
                                                            batch_size=BATCH_SIZE,
                                                            image_size=IMG_SIZE)

Output:

Downloading data from https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
68606236/68606236 [==============================] - 5s 0us/step
Found 2000 files belonging to 2 classes.

Step 3:

validation_dataset = tf.keras.utils.image_dataset_from_directory(validation_dir,
                                                                 shuffle=True,
                                                                 batch_size=BATCH_SIZE,
                                                                 image_size=IMG_SIZE)

Output:

Found 1000 files belonging to 2 classes.

Step 4:

class_names = train_dataset.class_names

plt.figure(figsize=(10, 10))
for images, labels in train_dataset.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")

Output:

A 3×3 grid of sample training images from the dataset, each titled with its class name ('cats' or 'dogs').
