
CPCS432 Lecture 5 Deep Learning and Artificial Neural Networks Techniques in Computer Vision

The document discusses Convolutional Neural Networks (CNNs) and their application in deep learning for computer vision tasks. It highlights the architecture of CNNs, including feature extraction and classification components, and explains the importance of convolution, pooling, and activation functions in processing image data. The document emphasizes the efficiency of CNNs in detecting features and their ability to improve accuracy in complex image classification tasks.


CPCS432

Lecture 5
Deep Learning and Artificial Neural Networks Techniques in Computer Vision
Applying Deep Learning Algorithms for Computer Vision Tasks

Dr. Arwa Basbrain


Part 4: Deep Learning CNN
Convolutional Neural Network

Introduction to convolutional neural network (CNN)

In the previous part, we discussed feature extraction and feature selection.
We now understand that the better the features are, the more accurate the results will be.

In recent years, features have become more precise, and better accuracy has been achieved as a result. This is
due to a newer kind of feature extractor, the Convolutional Neural Network (CNN). CNNs have shown
remarkable accuracy in complex tasks, such as object detection in challenging domains and image classification,
and are now ubiquitous in applications ranging from smartphone photo enhancement to
satellite image analysis.

The most important feature for classifying images is shape.

To extract shape features, we need to extract edges.

Lakshmanan, V., Görner, M., & Gillard, R. (2021). Practical Machine Learning for Computer Vision. O'Reilly Media, Inc.
CPCS432 Lecture 5 28/09/2023 64
Convolutional Neural Network

A convolutional neural network (CNN) is a specialized artificial neural network that automatically
extracts and selects relevant features from input data. This characteristic makes CNNs well suited
to handling grid-structured data, such as images.


Convolutional Neural Network

Why are convolutional neural networks (CNNs) efficient?

• Edge detection-based method
• Translational invariance
• Reduced number of parameters
• Ability to capture complex dependencies

To understand these points, we first need to grasp the CNN architecture.


CNN Architecture

[Figure: CNN pipeline. Feature extraction & selection (convolution and pooling layers), followed by classification (ANN architecture).]
A CNN model can be thought of as a combination of two components: feature extraction and
classification.
• The convolution and pooling layers perform feature extraction and selection.
• The ANN layers act as a classifier on top of these features. They learn how to use the feature maps to
correctly classify images by assigning a probability to each candidate class of the input image.


CNN Architecture


Key Components:
• Convolutional Layers
• Pooling Layers
• Fully Connected Layers

Additional Components:
• Batch Normalization
• Dropout

All CNN models follow a similar architecture


CNN Components → Convolutional Layers

What Is Convolution?
[Figure: an image of size (n, n) is convolved with a kernel of size (k, k) to give an output of size ((n-k+1), (n-k+1)). Example: a 4x4 image and a 3x3 kernel give a 2x2 output, since 4 - 3 + 1 = 2.]
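The output-size rule (n-k+1) can be checked with a small sketch. This is a minimal pure-Python illustration, not the lecture's own code: the function name `conv2d_valid` and the sample image and kernel values are assumptions made for the example. As in most deep-learning libraries, the kernel is not flipped, so strictly speaking this computes cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide a k x k kernel over an n x n image without padding.

    Returns an (n-k+1) x (n-k+1) output. The kernel is not flipped,
    so this is cross-correlation, the form used in CNN layers.
    """
    n, k = len(image), len(kernel)
    out_size = n - k + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Multiply the kernel with the patch whose top-left corner is (i, j)
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image and 3x3 kernel (values chosen only for illustration)
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # a 2x2 output, because 4 - 3 + 1 = 2
```

The diagonal kernel responds most strongly where the image has a matching diagonal pattern, which is the "pattern finder" view of convolution used later in the lecture.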


CNN Components → Convolutional Layers
What Is Padding?
Padding adds extra values at the beginning and end of the original image before filtering.

Padding ("valid" mode): the image is smaller after filtering
Input length = N
Kernel length = K
Output length = N - K + 1
Example: a 5x5 image and a 3x3 kernel give a (5-3+1) x (5-3+1) = 3x3 output.

Padding ("same" mode): the image stays the same size after filtering
Input length = N
Kernel length = K
Output length = N

Padding ("full" mode): the image is bigger after filtering
Input length = N
Kernel length = K
Output length = N + K - 1
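The three output lengths can be reproduced with a short 1-D sketch. It assumes zero padding and an unflipped kernel; the function name `conv1d` and the sample signal are illustrative choices, not part of the slides.

```python
def conv1d(signal, kernel, mode="valid"):
    """1-D convolution (no kernel flip) with zero padding chosen by mode."""
    k = len(kernel)
    if mode == "valid":      # no padding: output length N - K + 1
        pad_left = pad_right = 0
    elif mode == "same":     # K - 1 zeros in total: output length N
        pad_left = (k - 1) // 2
        pad_right = (k - 1) - pad_left
    elif mode == "full":     # K - 1 zeros on each side: output length N + K - 1
        pad_left = pad_right = k - 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    padded = [0] * pad_left + list(signal) + [0] * pad_right
    out_len = len(padded) - k + 1
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(out_len)
    ]


signal = [1, 2, 3, 4, 5]   # N = 5
kernel = [1, 1, 1]         # K = 3
valid_out = conv1d(signal, kernel, "valid")
same_out = conv1d(signal, kernel, "same")
full_out = conv1d(signal, kernel, "full")
for name, out in [("valid", valid_out), ("same", same_out), ("full", full_out)]:
    print(name, len(out), out)
```

With N = 5 and K = 3 the lengths come out as 3, 5 and 7, matching N - K + 1, N and N + K - 1.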


CNN Components → Convolutional Layers
Why do Convolution?
• Convolution is a pattern finder
• We want the same filter to look at all locations in the image
• This is the idea behind "translational invariance"
[Figure: a filter slides over the image; the location where the pattern is found is shown in white.]


CNN Components → Convolutional Layers

Convolution on Colour Images

[Figure: a 2-D filter applied to a greyscale image, and a 3-D filter applied to a colour image of Height x Width x Colour.]
Note: the filter must have the same size as the image in the depth dimension (an image with a given number of channels needs a filter of the same depth).


CNN Components → Convolutional Layers
[Figure: a 7x7x3 colour image (3 colour channels) convolved with a 3x3x3 filter.]

Joseph Nelson. (Feb 5, 2020). When to Use Grayscale as a Preprocessing Step. Roboflow Blog: https://blog.roboflow.com/when-to-use-grayscale-as-a-preprocessing-step/
CNN Components → Convolutional Layers
[Figure: a 2-D filter over a greyscale image (2 indices) compared with a 3-D filter over a colour image (3 indices).]
CNN Components → Convolutional Layers
Convolution on Colour Images
Multiple features
• A.shape = H x W x 3 (the input image)
• If we use "same mode", then B₁.shape = H x W and B₂.shape = H x W
• If we stack B₁ and B₂, we get B.shape = H x W x 2
• We can add any number of features!
• We should use more than one filter per image, because each filter is looking for something different
[Figure: input A (H x W x 3) is convolved with filters F₁ (K x K x 3) and F₂ (K x K x 3), giving feature maps B₁ (H x W) and B₂ (H x W).]


CNN Components → Convolutional Layers
Convolution on Colour Images

• We call these "feature maps" (i.e. each 2-D image is a map that tells us where the feature is found)
[Figure: a 3-D input image (H x W x 3) is convolved with Filter 1 and Filter 2 to produce the output image, a stack of feature maps.]

The size of the final dimension is the:
• Number of channels (for the input image)
• Number of feature maps (for the output image)
CNN Components → Convolutional Layers
Convolution on Colour Images

Let's vectorize this operation; we don't need to do each "colour convolution" separately:
B = A * w
shape(A) = H × W × C₁
shape(w) = C₁ × K × K × C₂
shape(B) = H × W × C₂
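The shape bookkeeping above can be sketched as follows. This is an illustrative, loop-based version written for readability rather than a truly vectorized one; the function name `conv_same`, the use of zero padding, an odd K, and the all-ones test data are assumptions for the example. The weight layout w[c1][di][dj][c2] follows the C₁ × K × K × C₂ shape given on the slide.

```python
def conv_same(A, w):
    """"Same mode" convolution of a multi-channel image with several filters.

    A: H x W x C1 nested lists (input image)
    w: C1 x K x K x C2 nested lists (C2 filters, each K x K x C1), K odd
    Returns B: H x W x C2 (one feature map per filter).
    """
    H, W, C1 = len(A), len(A[0]), len(A[0][0])
    K, C2 = len(w[0]), len(w[0][0][0])
    pad = K // 2  # zero padding that keeps the output H x W
    B = [[[0.0] * C2 for _ in range(W)] for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for c2 in range(C2):
                total = 0.0
                # Sum over every input channel and every kernel position
                for c1 in range(C1):
                    for di in range(K):
                        for dj in range(K):
                            y, x = i + di - pad, j + dj - pad
                            if 0 <= y < H and 0 <= x < W:
                                total += A[y][x][c1] * w[c1][di][dj][c2]
                B[i][j][c2] = total
    return B


# Hypothetical sizes: a 4 x 5 RGB image and two 3 x 3 filters
H, W, C1, K, C2 = 4, 5, 3, 3, 2
A = [[[1.0] * C1 for _ in range(W)] for _ in range(H)]
w = [[[[1.0] * C2 for _ in range(K)] for _ in range(K)] for _ in range(C1)]
B = conv_same(A, w)
print(len(B), len(B[0]), len(B[0][0]))  # H, W, C2
```

In a library such as NumPy or a deep-learning framework, the same computation would be expressed as a single array operation; the nested loops here only make the index ranges explicit.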
CNN Components → Convolutional Layers

Shape of Bias Term

• In a Dense layer, if Wᵀx is a vector of size M, b is also a vector of size M
• In a Conv layer, b does not have the same shape as W * x (a 3-D image)
• Technically, this is not allowed by the rules of matrix arithmetic
• But the rules of broadcasting (in NumPy code) allow it
• If W * x has the shape H x W x C₂, then b is a vector of size C₂ (one scalar per feature map)
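To make "one scalar per feature map" concrete, the sketch below adds a length-C₂ bias to an H x W x C₂ output by hand. The helper name `add_bias` and the sample values are assumptions for illustration; in NumPy, adding an array of shape (C₂,) to one of shape (H, W, C₂) performs the same operation through broadcasting.

```python
def add_bias(feature_maps, bias):
    """Add bias[c] to every spatial position of feature map c.

    feature_maps: H x W x C2 nested lists; bias: list of length C2.
    This spells out what broadcasting does implicitly.
    """
    return [
        [[value + bias[c] for c, value in enumerate(pixel)] for pixel in row]
        for row in feature_maps
    ]


# Hypothetical 2 x 2 output with C2 = 2 feature maps, all zeros
conv_output = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
bias = [0.5, -1.0]  # one scalar per feature map
biased = add_bias(conv_output, bias)
print(biased)
```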
CNN Components → Convolutional Layers

How are convolution filters found?

• Since convolution is just one part of a neural network layer, it is easy to see how the filters are found
• Initially, we looked at convolution as an image modifier (blur, edge detection)
• Now, we see it as a pattern finder / shared-parameter matrix multiplication / feature transformer
• In other words, W is found the same way as before: automatically, by training the network!


CNN Components → Convolutional Layers

How much do we save?
• Input image: 32 x 32 x 3 = 3072 values
• Filter: 3 x 5 x 5 x 64 = 4800 parameters (ignoring the bias term)
• Output image: 28 x 28 x 64 = 50176 values (since 32 - 5 + 1 = 28)
• Weight matrix of an equivalent dense layer: 3072 x 50176 = 154,140,672 (~154 million) parameters
• Compared to convolution, that is 154,140,672 / 4800 ≈ 32,000 times more parameters
• It would also perform suboptimally, because we want to use the same pattern finder in
multiple places
• Without shared weights, we would need to learn to find the pattern separately in every possible
location where it might appear
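The arithmetic on this slide can be reproduced directly. The variable names below are illustrative; the numbers are the ones given above.

```python
# Sizes taken from the slide
height, width, channels = 32, 32, 3
kernel_size, num_filters = 5, 64

# Convolutional layer: one K x K x C filter per output feature map (bias ignored)
conv_params = channels * kernel_size * kernel_size * num_filters

# "Valid" convolution shrinks each side to N - K + 1
out_side = height - kernel_size + 1
input_values = height * width * channels
output_values = out_side * out_side * num_filters

# A dense layer would connect every input value to every output value
dense_params = input_values * output_values
ratio = dense_params / conv_params

print(f"conv parameters:  {conv_params:,}")
print(f"dense parameters: {dense_params:,}")
print(f"dense / conv:     {ratio:,.0f}x")
```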


CNN Components → Convolutional Layers → Activation Function

Activation Function
• The activation function serves as a decision function and helps the network learn complex patterns.
• Activation functions are necessary to introduce non-linearity. Without them, the data would pass
through the nodes and layers of the network going only through linear functions, so the whole network would collapse into a single linear function.
• Selecting an appropriate activation function can accelerate the learning process.
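A tiny numerical sketch shows why the non-linearity matters. It uses scalar "layers" with made-up weights (w1, b1, w2, b2 are assumptions for illustration) and ReLU as the example activation: two stacked linear layers are exactly equivalent to one linear layer, while inserting ReLU between them breaks that equivalence.

```python
def relu(z):
    return max(0.0, z)


# Hypothetical weights and biases for two scalar layers
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 4.0


def two_linear_layers(x):
    # Layer 2 applied to layer 1, with no activation in between
    return w2 * (w1 * x + b1) + b2


def one_linear_layer(x):
    # The same mapping written as a single linear function
    return (w2 * w1) * x + (w2 * b1 + b2)


def two_layers_with_relu(x):
    # ReLU between the layers makes the overall function piecewise linear
    return w2 * relu(w1 * x + b1) + b2


inputs = [-2.0, -0.5, 1.0, 2.0]
linear_outputs = [two_linear_layers(x) for x in inputs]
collapsed_outputs = [one_linear_layer(x) for x in inputs]
relu_outputs = [two_layers_with_relu(x) for x in inputs]
print(linear_outputs)
print(collapsed_outputs)
print(relu_outputs)
```

The first two lists are identical, while the third differs for inputs where the hidden value is negative.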


CNN Components → Pooling Layer

Pooling/Subsampling/Downsampling

The pooling layer performs "downsampling", much like feature selection.
Pooling Layer
• Pooling, or downsampling, is a local operation.
• It summarizes similar information in the neighbourhood of the receptive
field and outputs the dominant response within this local region.
• Pooling helps to extract a combination of features
that is invariant to translational shifts and small distortions.


CNN Components → Pooling Layer

Pooling
At a high level, pooling is downsampling,
e.g. outputting a smaller image from a bigger image.
If the input is 100x100, a pool size of 2 would yield 50x50, a.k.a. "downsample by 2".
[Figure: a 100x100 image pooled with a 2x2 window gives a 50x50 image.]


CNN Components → Pooling Layer
Different Pool Sizes
• It's possible to have a non-square window, e.g. 2x3 or 3x2, but this is
unconventional
• It's also possible for boxes to overlap; this is controlled by the "stride", the step between neighbouring boxes
• Previously, we looked at a pool size of 2 with a stride of 2 (common)
• If you had a stride of 1, the boxes would overlap (not common)
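The effect of pool size and stride can be seen in a minimal max-pooling sketch. The function name `max_pool2d` and the sample 4x4 image are assumptions for illustration; only square windows are handled to keep the example short.

```python
def max_pool2d(image, pool_size=2, stride=2):
    """Max pooling over square windows, moving `stride` pixels per step."""
    h, w = len(image), len(image[0])
    output = []
    for i in range(0, h - pool_size + 1, stride):
        row = []
        for j in range(0, w - pool_size + 1, stride):
            # Collect the window whose top-left corner is (i, j)
            window = [
                image[i + di][j + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            ]
            row.append(max(window))  # keep only the dominant response
        output.append(row)
    return output


# Hypothetical 4x4 feature map
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
non_overlapping = max_pool2d(image, pool_size=2, stride=2)  # 2x2 output
overlapping = max_pool2d(image, pool_size=2, stride=1)      # 3x3 output
print(non_overlapping)
print(overlapping)
```

With a stride equal to the pool size the windows tile the image; with a stride of 1 neighbouring windows share pixels, so the output shrinks by only one pixel per side.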


CNN Components → Pooling Layer

Why use pooling?

● Practical: if we shrink the image, we have less data to process!
● Translational invariance: I don't care where in the image the feature occurred, I just
care that it did
● Convolution is a "pattern finder" (the highest number marks the best-matching location), so max pooling keeps the strongest match in each window


CNN Components → Pooling Layer → Stride
Alternative to Pooling: Stride
Researchers have found that we can sometimes avoid pooling and use strided convolution
instead (we first saw stride in the context of pooling).
We get the same reduction in output image size.
[Figure: convolution with stride = 2 produces a smaller output image.]
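CNN Components → Pooling Layer → Stride
Stride / Padding Equation

Output shape = $\left( \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1,\ \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1,\ N_f \right)$

where n is the input height/width, p is the padding, f is the filter size, s is the stride, and N_f is the number of filters.
Example: n = 37, p = 0, f = 3, s = 1 gives (37 + 2(0) - 3)/1 + 1 = 35, so the output is 35 x 35 x N_f.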
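The stride/padding relation is easy to wrap in a helper. The sketch below assumes the standard output-size formula floor((n + 2p - f) / s) + 1 for a square input; the function name `conv_output_shape` and the filter count of 10 in the first call are hypothetical choices for illustration.

```python
def conv_output_shape(n, f, p=0, s=1, num_filters=1):
    """Output shape of a convolution on an n x n input.

    n: input height/width, f: filter size, p: padding on each side,
    s: stride, num_filters: number of filters (output feature maps).
    """
    side = (n + 2 * p - f) // s + 1
    return (side, side, num_filters)


# 37x37 input, 3x3 filters, no padding, stride 1 (10 filters is an assumed value)
shape_a = conv_output_shape(37, 3, p=0, s=1, num_filters=10)
# 32x32 input with 5x5 filters, as in the parameter-count example
shape_b = conv_output_shape(32, 5, num_filters=64)
# Stride 2 roughly halves the spatial size
shape_c = conv_output_shape(7, 3, p=1, s=2)
print(shape_a, shape_b, shape_c)
```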


CNN Components → Dropout

Dropout introduces regularization within the network by randomly skipping some
units or connections during training, which ultimately improves generalization.
• In NNs, multiple connections that learn a non-linear relation are sometimes co-adapted, which causes
overfitting.
• Randomly dropping some connections or units produces several thinned network architectures;
finally, one representative network with small weights is selected.
• This selected architecture is then considered an approximation of all of the proposed networks.
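As a rough sketch of the mechanism, the code below applies dropout to a list of activations. It uses the "inverted dropout" variant (surviving units are scaled up during training so nothing needs rescaling at test time), which is a common convention and an assumption here rather than something stated on the slide; the function name and the seed are also illustrative.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Randomly zero units with probability drop_prob (inverted dropout).

    Kept units are divided by the keep probability so the expected
    activation is unchanged; at test time the input is returned as is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
hidden = [1.0] * 10
train_output = dropout(hidden, drop_prob=0.5, training=True, rng=rng)
test_output = dropout(hidden, drop_prob=0.5, training=False)
print(train_output)  # a mix of 0.0 (dropped) and 2.0 (kept and rescaled)
print(test_output)   # unchanged at test time
```

Each training step draws a different random mask, which is what produces the many "thinned" networks described above.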


CNN Components → Normalization Layer
• The normalization layer normalizes the activations of
each layer so that they are well-conditioned, which stabilizes training and can help reduce overfitting.
• Batch Normalization: Apart from stabilizing the learning process, it also enables higher
learning rates and acts as a regularizer to some extent, reducing the need for Dropout in
some cases.
• Layer Normalization: Similar to batch normalization, but applied across all the features
of a single layer for a single sample. It is especially useful in recurrent neural networks.
• Group Normalization: Divides the channels into groups and computes the mean and variance
within each group for normalization. It is effective for small batch sizes.
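The difference between batch and layer normalization is mainly which axis the mean and variance are computed over. The sketch below illustrates this on a small batch of feature vectors; it omits the learnable scale and shift parameters that real layers usually include, and the function names and sample data are assumptions for the example.

```python
from statistics import fmean, pvariance


def normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def batch_norm(batch):
    # Normalize each feature (column) across the samples in the batch
    columns = [normalize(column) for column in zip(*batch)]
    return [list(row) for row in zip(*columns)]


def layer_norm(batch):
    # Normalize each sample (row) across its own features
    return [normalize(sample) for sample in batch]


# Hypothetical batch: 2 samples, 3 features on very different scales
batch = [
    [1.0, 10.0, 100.0],
    [3.0, 30.0, 300.0],
]
bn = batch_norm(batch)
ln = layer_norm(batch)
print(bn)
print(ln)
```

Because layer normalization uses only a single sample's statistics, it does not depend on the batch size, which is consistent with the note about small batches and recurrent networks.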
CNN Components → Fully Connected Layer

A fully connected layer is simply a feed-forward neural network layer.

• Fully connected layers form the last few layers of the network.
• The input to the fully connected layers is the output of the preceding layer (activation maps of high-level
features), flattened into a vector; the output is an n-dimensional vector (for the final layer, n is the number of classes).
• The flattened output is fed to a feed-forward neural network, and backpropagation is applied at
every training iteration.
• Over a series of epochs, the model learns to distinguish between dominant features and certain low-level
features in images, and classifies the images using the softmax function.
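The final softmax step turns the n-dimensional output vector into class probabilities. A minimal sketch follows; the example scores are made up, and subtracting the maximum before exponentiating is a standard numerical-stability trick rather than something from the slides.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```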


Convolutional Neural Network
Why are convolutional neural networks (CNNs) efficient?
Edge detection-based method
• One of the key operations in CNNs is convolution, which is
similar to the edge detection methods of classical computer vision.
• The filters in CNNs act like edge detectors: in the early layers they automatically learn to
detect edges, gradients, and other low-level features.
• This means CNNs inherently perform edge detection as part of their
feature extraction process, making them highly efficient at capturing
shapes and contours in images.


Convolutional Neural Network
Why are convolutional neural networks (CNNs) efficient?
Reduce number of parameters
• Weight sharing and local connectivity: One of the key advantages of CNNs is
their ability to reduce the number of parameters through weight sharing and local
connectivity. In a fully connected neural network, each neuron is connected to
every neuron in the previous layer, which leads to a massive number of
parameters, especially for large images. In CNNs, the same filter is applied to
different parts of the input, drastically reducing the number of weights needed.

• Pooling Layers: Additionally, CNNs use pooling operations (e.g., max pooling)
to progressively reduce the spatial dimensions of the input. Pooling layers have no
weights of their own, but the smaller feature maps reduce the amount of computation and the
number of parameters needed in the later fully connected layers, while retaining the most prominent features.


Convolutional Neural Network
Why are convolutional neural networks (CNNs) efficient? Translational invariance

Translational invariance refers to the ability of CNNs to recognize an object or feature
regardless of its position in the input image. That means the network can detect the same
object even if it has been shifted to a different part of the image.

In traditional machine learning models, when an object shifts position in an image, the
model might not be able to recognize it because the exact pixel locations have changed.
This requires extensive data augmentation (training on multiple versions of the same image
with the object in different locations) for good performance.

CNNs, on the other hand, inherently provide translational invariance due to two key aspects:

• Convolutional Layers: CNNs apply filters (kernels) across the entire input image using
convolution operations. These filters are small matrices that slide over the image and are
shared across the whole input space. The same filter is applied to different parts of the
image, meaning that it can recognize specific patterns, like edges or textures, regardless of
where they appear. This allows the network to detect the same feature (like an eye, wheel, or
object) even if it moves to a different part of the image.

• Pooling Layers: Pooling operations, like max pooling, further contribute to translational
invariance by reducing the spatial dimensions of the input. Max pooling takes the maximum
value from a specific region, making the exact position of a feature less important while still
retaining its presence. This down-sampling ensures that minor translations (small
movements) in the input do not change the output too much, hence making the model more
robust to shifts in objects' positions.
Convolutional Neural Network
Why are convolutional neural networks (CNNs) efficient?
Capture complex dependencies
• Learning Hierarchical Relationships: CNNs excel at capturing complex
dependencies by using multiple layers of convolution, pooling, and non-linearity
(e.g., ReLU). The successive layers of a CNN are able to combine lower-level
features (like edges or textures) to recognize higher-level patterns (like objects or
faces). These complex dependencies between features are automatically learned
during training without the need for manual feature engineering.

• Spatial Dependencies: CNNs can also capture spatial hierarchies or dependencies
in data by utilizing the spatial information in images, meaning they model
how pixels relate to their neighbours.


Typical CNN Architecture

[Figure: typical CNN architecture. Feature extraction & selection (convolution and pooling layers), followed by classification (ANN architecture).]

Why convolution followed by pooling

• After each "conv-pool", the image shrinks, but filter sizes generally stay the same
• Common filter sizes are 3x3, 5x5, 7x7
• Assume "same mode" convolution and pool size = 2
• If the filter size stays the same, but the image shrinks, then the portion of the image that the
filter covers increases!

• The input image shrinks
• Since filters stay the same size, they find increasingly large patterns (relative to the image)
• This is why CNNs learn hierarchical features
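The growth in relative filter coverage can be tabulated with a few lines of code. The sketch assumes "same mode" convolution and a pool size of 2, as stated above; the 32-pixel starting size, the 3x3 filter, and the function name are illustrative assumptions.

```python
def filter_coverage(image_size, filter_size=3, pool_size=2, num_blocks=4):
    """Fraction of the image width a fixed-size filter spans after each conv-pool block.

    Assumes "same mode" convolution (size unchanged) followed by pooling
    that divides the size by pool_size.
    """
    rows = []
    size = image_size
    for block in range(1, num_blocks + 1):
        rows.append((block, size, filter_size / size))
        size //= pool_size  # pooling shrinks the image for the next block
    return rows


coverage = filter_coverage(32)
for block, size, fraction in coverage:
    print(f"block {block}: image {size}x{size}, 3x3 filter spans {fraction:.0%} of the width")
```

The same 3x3 filter spans under a tenth of the width at the first block and three quarters of it by the fourth, which is the sense in which deeper layers see larger patterns.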


Typical CNN Architecture

Losing Information
Do we lose information if we shrink the image? Yes.
• We lose spatial information: we don't care where the feature was found
• We haven't yet considered the number of feature maps
• Generally, these increase at each layer
• So, we gain information in terms of what features were found




Summary of CNN Concepts
• A CNN consists of alternating convolutional and pooling layers with an MLP at the end.
Not every convolutional layer is necessarily followed by a pooling layer.
• Convolution is the feature extraction process in the convolutional layer.
• A kernel (window) of dimension k×k slides over the input image, covering one k×k patch at a time.
• Filter weights of the same dimension as the kernel are multiplied element-wise with the pixels in the patch,
and the products are summed over all pixels and all image channels. An optional bias is
added to the result to generate the feature maps.
• The pooling layer implements downsampling algorithms (max pooling or average
pooling) to downsample the features.
• The process is repeated for each pair of convolutional-pooling layers, where the output of
one pooling layer is fed as input to the next convolutional layer.
• The last convolutional/pooling layer feeds its feature maps to the input layer of the MLP.
• The MLP part of the network learns as a conventional MLP network.


Examples of Popular CNNs
LeNet-5

The LeNet-5 CNN architecture, introduced in 1998 by LeCun et al. in their paper “Gradient-Based Learning Applied
to Document Recognition,” was mainly used for recognizing handwritten and machine-printed characters (optical
character recognition [OCR]) in documents.
It is a CNN consisting of seven layers.
• There are three convolutional layers (C1, C3, and C5).
• There are two subsampling layers (S2 and S4).
• There is one fully connected layer (F6) and one output layer.
• The convolutional layers use 5×5 convolution kernels with stride 1.
• The subsampling layers are 2×2 average pooling layers.
• The entire network uses the TanH activation function except for the
output layer, which uses softmax.
Examples of Popular CNNs
AlexNet

The input is a 224×224×3 colour image.
The network has 60 million parameters and 650,000 neurons, and it took five to six days to train on two GPUs.

The features of AlexNet are as follows:

It is a deep CNN containing eight layers.
Convolution layer 1: Kernel 11×11, filters 96, strides 4×4, activation ReLU
Pooling layer 1: MaxPooling with kernel size 3×3, strides 2×2
Convolution layer 2: Kernel 5×5, filters 256, strides 1×1, activation ReLU
Pooling layer 2: MaxPooling with kernel size 3×3, strides 2×2
Convolution layer 3: Kernel 3×3, filters 384, strides 1×1, activation ReLU
Convolution layer 4: Kernel 3×3, filters 384, strides 1×1, activation ReLU
Convolution layer 5: Kernel 3×3, filters 256, strides 1×1, activation ReLU
Pooling layer 5: MaxPooling with kernel size 3×3, strides 2×2
The last three layers are a fully connected MLP.
All convolution layers use ReLU activation functions.
The output layer uses softmax activation.
There are 1,000 classes in the output layer.
Examples of Popular CNNs
VGG-16

It is a CNN that consists of 16 layers.
It has 13 convolutional layers and 3 fully connected dense layers.
This network has 138 million parameters.

Input size 224×224×3
Convolution layer 1: Kernel 3×3, filters 64, activation ReLU
Convolution layer 2: Kernel 3×3, filters 64, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Convolution layer 3: Kernel 3×3, filters 128, activation ReLU
Convolution layer 4: Kernel 3×3, filters 128, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Convolution layer 5: Kernel 3×3, filters 256, activation ReLU
Convolution layer 6: Kernel 3×3, filters 256, activation ReLU
Convolution layer 7: Kernel 3×3, filters 256, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Convolution layer 8: Kernel 3×3, filters 512, activation ReLU
Convolution layer 9: Kernel 3×3, filters 512, activation ReLU
Convolution layer 10: Kernel 3×3, filters 512, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Convolution layer 11: Kernel 3×3, filters 512, activation ReLU
Convolution layer 12: Kernel 3×3, filters 512, activation ReLU
Convolution layer 13: Kernel 3×3, filters 512, activation ReLU
Pooling layer: MaxPooling, kernel size 2×2 and strides 2×2
Fully connected layer 14 (MLP input layer): Flatten dense layer with input size 25088
Fully connected hidden layer 15: Dense layer with input size 4096
Fully connected output layer: Dense layer for 1,000 classes
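The figures quoted for VGG-16 (a flattened size of 25088 and roughly 138 million parameters) can be checked from the layer list. The sketch below assumes every convolution and dense layer has a bias term, that the two hidden dense layers have 4096 units each, and that the five 2×2 poolings each halve the spatial size; variable names are illustrative.

```python
# Filters per convolution layer, in order (13 layers, all with 3x3 kernels)
conv_filters = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
dense_units = [4096, 4096, 1000]  # two hidden dense layers, then 1,000 classes

# Convolution parameters: 3*3*in_channels weights plus one bias per filter
in_channels = 3  # RGB input
conv_params = 0
for out_channels in conv_filters:
    conv_params += 3 * 3 * in_channels * out_channels + out_channels
    in_channels = out_channels

# Five 2x2 poolings with stride 2 shrink 224 -> 7 on each side
spatial = 224 // 2 ** 5
flattened = spatial * spatial * conv_filters[-1]

# Dense parameters: in_features * out_features weights plus one bias per unit
in_features = flattened
dense_params = 0
for out_features in dense_units:
    dense_params += in_features * out_features + out_features
    in_features = out_features

total_params = conv_params + dense_params
print(f"flattened size: {flattened}")
print(f"conv: {conv_params:,}  dense: {dense_params:,}  total: {total_params:,}")
```

Most of the parameters sit in the dense layers, especially the first one, even though 13 of the 16 layers are convolutional.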
Examples of Popular CNNs

GoogLeNet

The Inception architecture introduces several key ideas that allow it to achieve
high performance, both in terms of accuracy and computational cost.
Inception: multiple convolutions in parallel branches.
Instead of trying to choose between different filter
sizes (1x1, 3x3, 5x5, etc.), just try them all!
Examples of Popular CNNs
ResNet
The key innovation of ResNet is the introduction of "residual blocks" that allow
for training substantially deeper networks than what was previously possible.
Residual block: a CNN with branches (one branch is the
identity function, so the other learns the residual).
Variations: ResNet50, ResNet101, ResNet152, ResNet_v2, ResNeXt.
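The idea of a residual block, output = x + F(x), can be shown with a toy example. To stay short, the sketch uses a small dense layer with ReLU as the learned branch F instead of the convolutions a real ResNet block uses; all names and weights are assumptions made for illustration.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, bias):
    # weights has one row per output unit
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(x, weights, bias):
    """Return x + F(x): an identity branch plus a learned residual branch."""
    residual = relu(dense(x, weights, bias))  # F(x), the learned branch
    return [xi + ri for xi, ri in zip(x, residual)]


x = [1.0, -2.0]

# With all-zero weights the learned branch contributes nothing,
# so the block reduces to the identity function.
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]
identity_output = residual_block(x, zero_weights, zero_bias)

# With non-zero weights the block adds a correction on top of its input.
weights = [[0.5, 0.0], [0.0, 0.5]]
adjusted_output = residual_block(x, weights, zero_bias)
print(identity_output, adjusted_output)
```

Because the identity path is always available, stacking many such blocks does not force the signal through every learned transformation, which is one intuition for why much deeper networks become trainable.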
Transfer Learning Intuition

• Features learned on one task may be useful for another task
• Transfer learning took off in the field of computer vision
• ImageNet: a large-scale image dataset (millions of images, 1k categories)


Transfer Learning Intuition
Transfer Learning in a Picture
• Freeze the "body"; train only the head
• Chop off the old "head", add a new head!
[Figure: a pre-trained network whose body is frozen (its weights do not change) and whose head is replaced and trained.]

Transfer Learning Intuition


Advantages of Transfer Learning
• Speeds Up Training: Pre-trained models on large datasets significantly
reduce the time to train a new model for a different but related task.
• Requires Less Data: You can achieve high accuracy with a smaller dataset,
as the model has already learned many features from the pre-trained dataset.
• Improves Performance: Leveraging pre-trained models can lead to better
accuracy and performance, especially on tasks with limited training data.

Transfer Learning Intuition

• Because the ImageNet dataset is so diverse, weights trained on it can be applied
to a large number of vision tasks
• Cats vs Dogs
• Cars vs Trucks
• Even microscope images or other kinds of images the network has never seen before