CSE4261 Lecture-11

Fundamentals of Convolutional Neural Networks (CNN)

Prof. Dr. Shamim Akhter


Professor, Dept. of CSE
Ahsanullah University of Science and Technology
Kernels and Filters
• A main component of a CNN is the filter
– a square matrix of dimension nK × nK, where nK is an integer and usually a small number, like 3 or 5.
– Filters are also called kernels.
– In classical image processing, kernels are used for sharpening, blurring, embossing, and so on.
Example: Four different filters
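The four filters in the figure are shown as images; as a minimal sketch, such kernels are just small matrices. The coefficients below are standard image-processing examples (sharpen, box blur, emboss, identity) and not necessarily the exact ones in the figure:

```python
import numpy as np

# Four classic 3x3 kernels (nK = 3); coefficients are textbook examples.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

blur = np.full((3, 3), 1 / 9)        # box blur: average of the 3x3 neighborhood

emboss = np.array([[-2, -1, 0],
                   [-1,  1, 1],
                   [ 0,  1, 2]])

identity = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]])     # leaves the image unchanged

for name, k in [("sharpen", sharpen), ("blur", blur),
                ("emboss", emboss), ("identity", identity)]:
    print(name, k.shape)             # every kernel is nK x nK = 3 x 3
```

Note that sharpen and identity both sum to 1, so they preserve overall brightness; the blur kernel sums to 1 for the same reason.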
Convolution
In the previous process, we always moved our 3 × 3 region one column to the right and one row down. The number of rows and columns we shift, 1 in this example, is called the stride and is often indicated with s. A stride of s = 2 simply means that we shift our 3 × 3 region two columns to the right and two rows down at each step.
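The sliding-window operation described above can be sketched in a few lines of numpy (strictly speaking this is a cross-correlation, as is customary in CNNs; `conv2d` is an illustrative helper, not a library function):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2D convolution (no padding) with a square kernel and stride s."""
    nk = kernel.shape[0]
    h = (image.shape[0] - nk) // stride + 1
    w = (image.shape[1] - nk) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Slide the nK x nK region by `stride` columns/rows at each step.
            region = image[i*stride:i*stride+nk, j*stride:j*stride+nk]
            out[i, j] = np.sum(region * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0                # averaging kernel

print(conv2d(image, kernel, stride=1).shape)  # (4, 4)
print(conv2d(image, kernel, stride=2).shape)  # (2, 2)
```

With s = 1 the 6 × 6 input shrinks to 4 × 4; with s = 2 the window skips every other position and the output shrinks to 2 × 2.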
Pooling: max pooling
• Pooling is the second operation in CNNs.

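Max pooling slides a small window over the input and keeps only the maximum of each window. A minimal sketch (`max_pool` is an illustrative helper, assuming the common 2 × 2 window with stride 2):

```python
import numpy as np

def max_pool(image, size=2, stride=2):
    """Max pooling: keep the maximum of each size x size window."""
    h = (image.shape[0] - size) // stride + 1
    w = (image.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = image[i*stride:i*stride+size,
                              j*stride:j*stride+size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 1, 8]], dtype=float)

print(max_pool(x))   # maxima of the four 2x2 windows: [[6, 4], [7, 9]]
```

Each 2 × 2 window of the 4 × 4 input collapses to its maximum, halving both dimensions.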
Padding
• Sometimes, when dealing with images, getting a result from a convolution operation with dimensions different from those of the original image is not optimal. This is when padding is necessary.
• Padding adds rows of pixels on the top and bottom, and columns of pixels on the left and right, of the image so that the resulting matrices are the same size as the original.
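The effect of padding on the output size follows the standard formula o = (n + 2p − k)/s + 1 for an n × n input, k × k kernel, padding p, and stride s. A quick check (`conv_output_size` is an illustrative helper):

```python
def conv_output_size(n, k, p=0, s=1):
    """Output width/height of a convolution on an n x n input
    with kernel size k, padding p, and stride s."""
    return (n + 2 * p - k) // s + 1

# No padding: a 3x3 kernel shrinks a 28x28 image to 26x26.
print(conv_output_size(28, 3))         # 26

# "Same" padding: p = (k - 1) // 2 keeps the size at 28x28 (odd k, stride 1).
print(conv_output_size(28, 3, p=1))    # 28
```

For an odd kernel size k and stride 1, padding with p = (k − 1)/2 rows/columns is exactly what keeps the output the same size as the input.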
Building Blocks of a CNN
• Convolution and pooling operations are used to
build the layers in CNNs.

• In CNNs typically you can find the following layers:


– Convolutional layers
– Pooling layers
– Fully connected layers: layers where each neuron is connected to all neurons of the previous and subsequent layers.
Convolutional Layers

However, what are the weights in this layer?


– The weights, or the parameters that the network learns during the training phase, are the elements of the kernels themselves.
– We have nc kernels, each of nK × nK dimensions. That means we have nK² × nc weights in a convolutional layer,
– plus, for each filter, a bias term that you will need to add.
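A quick sanity check of this count (assuming a single-channel input, as the nK²·nc formula implicitly does; `conv_params` is an illustrative helper):

```python
def conv_params(n_kernels, k, in_channels=1):
    """Learnable parameters of a conv layer: each kernel has
    k*k*in_channels weights plus one bias."""
    return n_kernels * (k * k * in_channels + 1)

# 32 kernels of size 5x5 on a single-channel (grayscale) input:
print(conv_params(32, 5))   # 32 * (25 + 1) = 832
```

For a multi-channel input, each kernel also spans the input channels, so the weight count becomes nK² × c_in × nc plus the nc biases.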
Convolution- Pooling layer

LeNet-5 with Softmax activation function


Dense/FC Layer

• The weights are the ones you know from traditional feed-forward networks.
• So their number depends on the number of neurons in the layer itself and the number of neurons in the preceding layer.
CNN Implementation

Why is the 1 here?

What is Dropout?
What is Flatten?
Dropout Regularization Techniques
• Dropout is a technique where randomly selected neurons are ignored during training: they are "dropped out" at random. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to the neuron on the backward pass.
• Generally, use a small dropout value of 20%-50% of
neurons with 20% providing a good starting point. A
low probability has minimal effect and a high value
results in under-learning by the network.
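The mechanics can be sketched in numpy with "inverted" dropout, the variant most frameworks use (the `dropout` helper and the 20% rate below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.2, training=True):
    """Inverted dropout: zero out roughly `rate` of the neurons at random
    during training, rescaling the rest so the expected activation is
    unchanged. Does nothing at inference time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

a = np.ones(10_000)
dropped = dropout(a, rate=0.2)
print((dropped == 0).mean())   # roughly 0.2 of the neurons are zeroed
print(dropped.mean())          # close to 1.0 thanks to the rescaling
```

The rescaling by 1/(1 − rate) is why no correction is needed at inference: the layer's expected output is the same whether dropout is active or not.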
Flatten Layer
• The intuition behind the flattening layer is to convert the data into a 1-dimensional array for feeding to the next layer: we flatten the output of the convolutional layers into a single long feature vector.
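In numpy terms, flattening is just a reshape. Using the feature-map dimensions from the example network in this lecture (32 maps of 12 × 12):

```python
import numpy as np

# Output of a conv/pool stage: 32 feature maps of 12 x 12 each.
feature_maps = np.zeros((32, 12, 12))

# Flatten into a single long feature vector for the dense layer.
flat = feature_maps.reshape(-1)
print(flat.shape)   # (4608,) since 32 * 12 * 12 = 4608
```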
FC and Dense Layer
• A fully-connected layer is a layer that has a connection/edge across every pair of nodes from two node sets. For example, if you build a layer with N1 input neurons and N2 output neurons, the number of connections/edges will be N1 × N2, which is also the shape of the weight matrix.
• As the number of connections can be very large (think of connecting thousands of neurons to one another), the layer is going to be highly dense, which is why these layers are also called Dense layers.
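The parameter count of a dense layer is therefore N1 × N2 weights plus N2 biases. Checking this against the dense layers of the example network (`dense_params` is an illustrative helper):

```python
def dense_params(n_in, n_out):
    """Weights (n_in x n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

# The two dense layers from the example network in this lecture:
print(dense_params(4608, 128))   # 4608*128 + 128 = 589,952
print(dense_params(128, 10))     # 128*10 + 10 = 1,290
```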
[Figure: example network. Conv layer: 32 kernels of 5 × 5, stride s = 1; 2 × 2 max pooling → output 32 × 12 × 12 = 4608 flattened features; dense layer: 4608 × 128 + 128 parameters; output layer: 128 × 10 + 10 parameters.]
Effect of Convolution
Effect of Max Pooling
Going Deeper with Convolutions

• Problems of “classical” CNNs


– It isn’t easy to get the right kernel size. Each image
is different. Typically, larger kernels are good for
more globally distributed information, and smaller
ones for locally distributed information.
– Deep CNNs are prone to overfitting.
– Training and inference of networks with many
parameters is computationally intensive.
Inception Module: Naïve Version
• Overcomes the difficulties of classical CNNs
– networks are made wider instead of deeper
• they perform convolutions with multiple kernel sizes in parallel, to detect features at different scales simultaneously, instead of adding convolutional layer after layer sequentially.
• convolutions with 1 × 1, 3 × 3, and 5 × 5 kernels, and even max pooling, run at the same time in parallel.

The 1 × 1 kernel looks at very localized features, while the 5 × 5 kernel captures more global features.
Number of Parameters @ Naïve Inception
• Let’s use 32 kernels for all layers.
– 1 × 1 convolutions: 64 parameters [32+32]
– 3 × 3 convolutions: 320 parameters [9x32+32]
– 5 × 5 convolutions: 832 parameters [25x32+32]
– max-pooling does not have learnable parameters
Models                                   # of Parameters
Sequential Processing (Classical)        64 + 9,248 + 25,632 = 34,944
Parallel Processing (Naïve Inception)    64 + 320 + 832 = 1,216 (≈30 times fewer)

[Figure annotations: in the sequential stack the 3 × 3 and 5 × 5 layers each see the previous layer's 32 channels, so their counts are 32×9×32+32 = 9,248 and 32×25×32+32 = 25,632, while the 1 × 1 layer costs 32+32 = 64; in the parallel module each branch sees the single-channel input directly.]
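These counts can be verified in a few lines (`conv_params` is an illustrative helper; the first layer is assumed to see a single-channel input, which is what makes the 1 × 1 count equal 64):

```python
def conv_params(n_kernels, k, in_channels):
    """Parameters of a conv layer: k*k*in_channels weights per kernel + bias."""
    return n_kernels * (k * k * in_channels + 1)

n = 32   # kernels per layer

# Sequential (classical) stack: the 3x3 and 5x5 layers each see the
# previous layer's 32 channels; the first layer sees 1 channel.
sequential = conv_params(n, 1, 1) + conv_params(n, 3, n) + conv_params(n, 5, n)
print(sequential)   # 64 + 9248 + 25632 = 34944

# Naive inception: all three branches read the same single-channel input.
parallel = conv_params(n, 1, 1) + conv_params(n, 3, 1) + conv_params(n, 5, 1)
print(parallel)     # 64 + 320 + 832 = 1216
```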
Inception Module: Dimension Reduction
• In the naïve inception module we get a smaller number of learnable parameters than in classical CNNs, but we can do even better.
• We can use 1 × 1 convolutions at the right places (mainly before the higher-dimension convolutions) to reduce dimensions.
• Suppose that the previous layer is the output of a previous operation and that its output has the dimensions 256 × 28 × 28.
[Figure: inception module with dimension reduction; each of the four branches uses 8 kernels.]
Models                     # of Parameters
Naïve Inception            256×1×8+8 = 2,056; 256×9×8+8 = 18,440; 256×25×8+8 = 51,208; Total = 71,704
With Dimension Reduction   2,056; (2,056 + 8×9×8+8 = 2,056+584 = 2,640); (2,056 + 8×25×8+8 = 2,056+1,608 = 3,664); 2,056; Total = 10,416

[An inception network is simply built by stacking many of these modules one after the other.]
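The dimension-reduction arithmetic can be checked the same way (assuming, as in the slide, a 256-channel input and 8 kernels per branch; `conv_params` is an illustrative helper):

```python
def conv_params(n_kernels, k, in_channels):
    """Parameters of a conv layer: k*k*in_channels weights per kernel + bias."""
    return n_kernels * (k * k * in_channels + 1)

c_in, r = 256, 8   # 256 input channels, 8 kernels per branch

# Naive inception directly on the 256-channel input:
naive = (conv_params(r, 1, c_in)      # 1x1 branch: 2,056
         + conv_params(r, 3, c_in)    # 3x3 branch: 18,440
         + conv_params(r, 5, c_in))   # 5x5 branch: 51,208
print(naive)                          # 71704

# With 1x1 reductions: the 3x3 and 5x5 branches first squeeze 256 -> 8
# channels, and a 1x1 convolution also follows the max-pooling branch.
reduce = conv_params(r, 1, c_in)      # 2,056 per 1x1 reduction
reduced = (conv_params(r, 1, c_in)            # plain 1x1 branch
           + reduce + conv_params(r, 3, r)    # 2,056 + 584
           + reduce + conv_params(r, 5, r)    # 2,056 + 1,608
           + reduce)                          # after max pooling
print(reduced)                        # 10416
```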
GoogLeNet: Multiple Cost Functions
• Stacks several inception modules one after the other
– the middle layers tend to "die".
• GoogLeNet therefore introduced two intermediate, auxiliary loss functions, and computed the total loss as a weighted sum of the main loss and the auxiliary losses.
• Of course, the auxiliary losses are used only during training, not during inference.
Pre-Trained Networks
• Pre-trained deep learning models are available to use.
• If weights is None, the weights are randomly initialized. That means you get the VGG16 architecture and you can train it yourself. But be aware that it has roughly 138 million parameters, so you will need a really big training dataset.
• If you use the value imagenet, the weights are the ones obtained by training the network on the imagenet dataset.
Transfer Learning
• Transfer learning is a technique where a model trained to
solve a specific problem is re-purposed for a new
challenge related to the first problem.
• A model trained on the imagenet dataset can be re-purposed to classify dogs' images, but should not be used for speech recognition.

• In image recognition with CNN typically,


– the first layers will learn to detect generic features, and
– the last layers will be able to detect more specific ones.
– In a classification problem, the last layer will have N softmax
neurons (for classifying N classes), and therefore must learn to
be very specific to your problem.
How does Transfer Learning work?
• A network with nL layers
– train a base network (or get a pre-trained model) on a big dataset (called the base dataset). The base dataset should be related to the problem.
– the new or target dataset will typically be much smaller than the base dataset.
– train a new network, called the target network, on the target dataset.
• The target network will typically have the same first nK (with nK < nL) layers as the base network.
• The learnable parameters of the first layers (say 1 to nK, with nK < nL) are inherited from the base pre-trained network and are not changed during the training of the target network.
• Only the last, new layers (from layer nK to nL) are trained.
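A toy numpy sketch of the freezing idea (not the lecture's actual code): layer 1 plays the role of the inherited base layers and is never updated, while the new layer 2 is trained on the small target dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

W1 = rng.normal(size=(4, 8))          # "pre-trained" base weights -> frozen
W2 = rng.normal(size=(8, 3)) * 0.1    # new target layer -> trainable

x = rng.normal(size=(16, 4))          # a small "target dataset"
y = rng.normal(size=(16, 3))

def loss(W2):
    h = np.maximum(x @ W1, 0)         # frozen feature extractor (ReLU)
    return np.mean((h @ W2 - y) ** 2)

W1_before = W1.copy()
loss_before = loss(W2)
lr = 0.005
for _ in range(200):
    h = np.maximum(x @ W1, 0)
    grad_W2 = 2 * h.T @ (h @ W2 - y) / len(x)
    W2 -= lr * grad_W2                # only the new layer is updated

print(np.array_equal(W1, W1_before))  # True: inherited layers never change
print(loss(W2) < loss_before)         # True: the new layer still learns
```

In a real framework this corresponds to marking the base layers as non-trainable (e.g. freezing them) before fitting on the target dataset.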
Schematic representation of transfer learning
A Dog and Cat Problem
Dogs vs. Cats | Kaggle, 800MB
Training Image (3000, 150, 150, 3), Testing Image (1000, 150, 150, 3)

Naïve CNN Approach

With two epochs, 69% validation accuracy and 70% training accuracy.
A Dog and Cat Problem
Transfer Learning Approach 1

The result is an astounding 88% in two epochs, an incredible improvement over before!
A Dog and Cat Problem
Transfer Learning Approach 2

90% accuracy in a few seconds. One epoch takes only six seconds, compared to the 4.5 minutes of Approach 1.

100 epochs??
