
Deep Learning

Lecture 2: Convolutional Neural Networks & Overfitting

• Specialist Diploma in Applied Generative AI
• Academic Year 2024/25



Topics
1. Introduction to Convolutional Neural Networks (CNN)
2. The CNN operations
3. Training a CNN from scratch on a small dataset
4. Handling Overfitting

1. Introduction to CNN

1. Introduction to CNN

https://www.youtube.com/watch?v=Gu0MkmynWkw

1. Introduction to CNN
 The MNIST classification problem using CNN
[Diagram: a convnet processing an MNIST digit, showing the convolution operation and the max-pooling operation]

2. The CNN operations



2.1 The convolution operation

 Dense layers vs. convolution layers
1. Dense layers learn global patterns
2. Convolution layers learn local patterns

 Images can be broken into local patterns, e.g. edges, textures, etc.

2.1 The convolution operation

 Patterns learnt by convnets are
1. Translation-invariant
2. Hierarchical
[Diagram: hierarchical features, with 1st-layer patterns combining into 2nd-layer patterns]

2.1 The convolution operation

 A simple example of 2D convolution (refer to the Excel spreadsheet for details)
Input image: (5, 5, 1), one filter: (3, 3)  Output: (3, 3, 1)

[Diagram: the 3x3 filter of weights sliding over the 5x5 grid of pixel values]
Filter weights:
0 1 2
2 2 0
0 1 2

Source: https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1
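To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of the same valid (no padding, stride 1) 2D convolution. The 5x5 input values are placeholders of my own; only the 3x3 filter weights come from the slide.

```python
# Minimal sketch of a valid 2D convolution (stride 1, no padding)
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1   # 5 - 3 + 1 = 3
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # One output cell = patch * filter, summed
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25).reshape(5, 5)       # placeholder 5x5 input
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])            # the 3x3 filter from the slide
print(conv2d_valid(image, kernel).shape)  # (3, 3)
```

Each output cell is the sum of an element-wise product between one 3x3 patch and the filter, which is why the output shrinks to (3, 3).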

2.1 The convolution operation

 To calculate the output shape (no padding, stride 1):
Output depth = number of filters = 2
Output height or width = input width - filter width + 1 = 6 - 4 + 1 = 3

Source: https://towardsdatascience.com/demystifying-convolutional-neural-networks-384785791596

2.1 The convolution operation

 Padding
Input image: (5, 5, 1), one filter: (3, 3), padding = 1  Output: (5, 5, 1)

Source: https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

2.1 The convolution operation

 Strides
Input image: (5, 5, 1), one filter: (3, 3), strides = 2  Output: (2, 2, 1)

Source: https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

2.1 The convolution operation

 Take the dot product of the image patch with the filter, sum across the input channels, and add a bias
Example: (-2) + (-1) + 0 + 0 = -3

Input width = 5, padding = 1, strides = 2

 To calculate the output shape:
Output depth = number of filters = 2
Output height or width = (input width + 2*padding - filter width) / strides + 1
= (5 + 2*1 - 3) / 2 + 1 = 3
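The formula is easy to sanity-check in code. This helper is an illustrative sketch (the function name is my own, not from the slides); floor division matches how frameworks round when the stride does not divide exactly.

```python
# Hypothetical helper implementing the slide's output-size formula
def conv_output_size(input_width, filter_width, padding=0, strides=1):
    return (input_width + 2 * padding - filter_width) // strides + 1

print(conv_output_size(6, 4))                        # 3: the no-padding slide
print(conv_output_size(5, 3, padding=1))             # 5: the padding slide
print(conv_output_size(5, 3, strides=2))             # 2: the strides slide
print(conv_output_size(5, 3, padding=1, strides=2))  # 3: this slide
```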

2.1 The convolution operation

 Input image is 28 x 28 pixels, black-and-white
 Filter size is 3 x 3, and this layer uses a total of 32 filters
 The output shape is (26, 26, 32), where:
26 = 28 - 3 + 1
32 = number of filters
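A sketch of how this layer might look in Keras (assuming TensorFlow's bundled Keras); model.summary() confirms the (26, 26, 32) output shape.

```python
# Sketch of the slide's single convolution layer in Keras
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),               # 28x28 black-and-white image
    layers.Conv2D(32, (3, 3), activation="relu"), # 32 filters of size 3x3
])
model.summary()  # output shape: (None, 26, 26, 32), since 26 = 28 - 3 + 1
```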

2.2 The max-pooling operation

 Downsample the tensor by taking the max value in each window (e.g. 2x2)

Source: https://computersciencewiki.org/index.php/Max-pooling_/_Pooling
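A minimal NumPy sketch of 2x2 max-pooling with stride 2, on a made-up 4x4 feature map:

```python
# 2x2 max-pooling with stride 2, via reshaping into 2x2 windows
import numpy as np

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 2, 9, 8],
              [1, 0, 3, 4]])

# Split the 4x4 map into non-overlapping 2x2 windows, keep each window's max
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 5]
               #  [7 9]]
```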

2.2 The max-pooling operation

 MNIST model with max-pooling layers (see the sketch below)
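The slide's code screenshot did not survive extraction; below is a sketch along the lines of the standard Chollet-style MNIST convnet, alternating Conv2D and MaxPooling2D layers. The comments trace the feature-map shapes.

```python
# Sketch of an MNIST convnet with max-pooling layers (after Chollet's example)
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),  # -> (26, 26, 32)
    layers.MaxPooling2D((2, 2)),                   # -> (13, 13, 32)
    layers.Conv2D(64, (3, 3), activation="relu"),  # -> (11, 11, 64)
    layers.MaxPooling2D((2, 2)),                   # -> (5, 5, 64)
    layers.Conv2D(64, (3, 3), activation="relu"),  # -> (3, 3, 64)
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),        # 10 digit classes
])
```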

2.2 The max-pooling operation

 MNIST model without max-pooling layers

Demo 2 – MNIST using CNN



3. Training a CNN from scratch on a small dataset

 Datasets
1. Kaggle 2013 Dogs vs. Cats competition
 Original: 25,000 images (12,500 dogs and 12,500 cats)
 A small subset:
• Training: 2,000 images (1,000 dogs and 1,000 cats)
• Validation: 1,000 images (500 dogs and 500 cats)
• Testing: 1,000 images (500 dogs and 500 cats)

2. Data preprocessing  ImageDataGenerator in Keras (see the sketch below)
 Read the picture files
 Decode the JPEG content to RGB grids of pixels
 Convert these into floating-point tensors
 Rescale the pixel values from (0-255) to the [0, 1] interval
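A sketch of this preprocessing pipeline using Keras' ImageDataGenerator; the directory path is a placeholder for wherever the small dataset is stored.

```python
# Sketch of the preprocessing pipeline (directory path is a placeholder)
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from [0, 255] to the [0, 1] interval
train_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Reads the picture files, decodes JPEG content to RGB pixel grids,
# and yields batches of floating-point tensors
train_generator = train_datagen.flow_from_directory(
    "cats_and_dogs_small/train",  # placeholder path
    target_size=(150, 150),       # resize all images to 150x150
    batch_size=20,
    class_mode="binary",          # binary labels: dog vs. cat
)
```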

Let's try your hand at your first CNN!
(Practical 2 – Part 1)

We will naively train a small CNN on our training samples to classify images of "dogs" and "cats".

[Diagram: feature-map shapes through the convnet]
Input (150, 150, 3 RGB)  Conv 3x3x32  (148, 148, 32)  MaxPool  (74, 74, 32)
 Conv 3x3x64  (72, 72, 64)  MaxPool  (36, 36, 64)
 Conv 3x3x128  (34, 34, 128)

[Diagram, continued]
(34, 34, 128)  MaxPool  (17, 17, 128)  Conv 3x3x128  (15, 15, 128)  MaxPool  (7, 7, 128)

[Diagram, continued]
(7, 7, 128)  Flatten: 7*7*128 = 6272 values, e.g. [0.1, 0.234, 0.521, 0.2, …, 0.454, 0.442, 0.984]
 Dense (512 nodes)  Output (sigmoid)
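A Keras sketch of the convnet traced in the diagram above (this layer stack follows the small-dataset example in Chollet's Deep Learning with Python; the comments show how each shape in the diagram arises):

```python
# Sketch of the dogs-vs-cats convnet from the diagram
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(150, 150, 3)),               # RGB input
    layers.Conv2D(32, (3, 3), activation="relu"),   # -> (148, 148, 32)
    layers.MaxPooling2D((2, 2)),                    # -> (74, 74, 32)
    layers.Conv2D(64, (3, 3), activation="relu"),   # -> (72, 72, 64)
    layers.MaxPooling2D((2, 2)),                    # -> (36, 36, 64)
    layers.Conv2D(128, (3, 3), activation="relu"),  # -> (34, 34, 128)
    layers.MaxPooling2D((2, 2)),                    # -> (17, 17, 128)
    layers.Conv2D(128, (3, 3), activation="relu"),  # -> (15, 15, 128)
    layers.MaxPooling2D((2, 2)),                    # -> (7, 7, 128)
    layers.Flatten(),                               # -> 7*7*128 = 6272
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),          # dog vs. cat
])
```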

[Diagram: recap of the full architecture, from the (150, 150, 3) input through Flatten (7*7*128 = 6272) and the 512-node dense layer to the sigmoid output]

4. Handling Overfitting

4.1 Fundamental Issues

 Optimization vs. generalization
• Optimization: adjusting the model to get the best possible performance on the training data
• Generalization: how well the trained model performs on data it has never seen before (test data)

 Underfitting
• The lower the error on the training data, the lower the error on the testing data  there is still progress to be made

 Overfitting
• Generalization stops improving:
 Training error keeps decreasing
 Validation/testing error starts to increase
• The model is beginning to learn patterns overly specific to the training data

4.1 Fundamental Issues

 Balancing optimization and generalization: a tradeoff of model complexity against training and testing accuracy
[Diagram: training vs. testing accuracy as model complexity increases, with regions labelled "Optimizing" and "Regularizing"]

4.2 To prevent overfitting

1. Reducing network size (tweak hyperparameters)
2. Adding weight regularization
3. Adding dropout
4. Get more training data

4.2 To prevent overfitting

1. Reducing network size
• Capacity: the number of learnable parameters
 The number of layers
 The number of units per layer

• High capacity
 Good at fitting the training data

• Limited capacity
 Good at generalizing to unseen data (prediction)

• Balance: too much capacity vs. not enough capacity
 Start with a small network size
 Increase the size and monitor the error on a validation dataset

4.2 To prevent overfitting

2. Adding weight regularization
• To avoid overfitting  a simpler model
 Force the weights to take only small values
 Add a cost (associated with the weights) to the loss function

• Two types of cost function (weight regularization)
 L1 regularization (Lasso)
• Cost is proportional to the absolute value of the weights
 L2 regularization (Ridge, a.k.a. weight decay)
• Cost is proportional to the square of the value of the weights

• Implemented in Keras (see the sketch below)
 In the layer function, add an argument to configure the weight regularizer
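A sketch of configuring a weight regularizer on a Keras layer; the 0.001 factor is illustrative, not from the slides.

```python
# Sketch of L2 weight regularization in Keras (0.001 is an illustrative factor)
from tensorflow.keras import layers, regularizers

dense = layers.Dense(
    512,
    # adds 0.001 * weight**2 per weight coefficient to the total loss
    kernel_regularizer=regularizers.l2(0.001),
    activation="relu",
)
# L1 (Lasso) would be regularizers.l1(0.001);
# both together: regularizers.l1_l2(l1=0.001, l2=0.001)
```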

4.2 To prevent overfitting

3. Adding dropout
• Applied to a layer during training time
 Randomly drops out (sets to zero) a fraction of the layer's outputs
 The dropout rate is normally between 0.2 and 0.5

• At test time, no outputs are dropped; instead, the outputs are scaled down by a factor equal to the dropout rate

• The technique helps to reduce overfitting
 Randomly removes a different subset of neurons on each pass
 Introduces noise and breaks up patterns that are not significant

• Implemented in Keras by adding a Dropout layer (see the sketch below)
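A sketch of where a Dropout layer might sit in the dogs-vs-cats classifier head; the 0.5 rate is within the usual 0.2-0.5 range.

```python
# Sketch of adding dropout to the classifier head (rate 0.5 is illustrative)
from tensorflow.keras import layers

head = [
    layers.Flatten(),
    layers.Dropout(0.5),  # during training, randomly zeroes 50% of the outputs
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
]
```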

4.2 To prevent overfitting

4. Get more training data
 The best solution!
 A model trained on more data will naturally generalize better
 Why?

4. Get more training data – Data Augmentation

• Data augmentation is a strategy to significantly increase the diversity of data available for training models, without actually collecting new data.

• Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks (see the sketch below).
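A sketch of configuring augmentation with Keras' ImageDataGenerator; these particular ranges follow common practice (e.g. Chollet's example) and are tunable.

```python
# Sketch of data augmentation via ImageDataGenerator (ranges are tunable)
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmented_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # random horizontal shifts (fraction of width)
    height_shift_range=0.2,  # random vertical shifts (fraction of height)
    shear_range=0.2,         # random shearing transformations
    zoom_range=0.2,          # random zooming
    horizontal_flip=True,    # randomly flip images horizontally
)
```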

 Example of data augmentation
[Figure: an original image (150 x 150 pixels) alongside several variants after data augmentation]

Let's try your hand at your first CNN!
(Practical 2 – Part 2)

We will now add measures to prevent overfitting when training the CNN to classify images of "dogs" and "cats".

Example of data augmentation during training

 For every epoch, the 2,000 images presented to the model for learning are simulated new images
 These new images are not saved anywhere
[Diagram: across epochs 1-5, the convnet (150x150x3 input through to the sigmoid output) receives a freshly augmented set of 2,000 images each epoch, then validates]

Wrapping Up
 CNNs are the best type of machine-learning model for computer-vision tasks

 It's possible to train a model from scratch on a very small dataset  expect overfitting

 Data augmentation is a powerful way to fight overfitting when working with image data

Further Reading
 Demystifying Convolutional Neural Networks
• https://towardsdatascience.com/demystifying-convolutional-neural-networks-384785791596

 Intuitively Understanding Convolutions for Deep Learning
• https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

 Data Augmentation Increases Accuracy of your Model – But how?
• https://medium.com/secure-and-private-ai-writing-challenge/data-augmentation-increases-accuracy-of-your-model-but-how-aa1913468722

Q&A

References
Books:
 François Chollet, Deep Learning with Python (2018)
