
Deep Learning

Lecture 2: Convolutional Neural Networks & Overfitting

• Specialist Diploma in Applied Generative AI
• Academic Year 2024/25



Topics
1. Introduction to Convolutional Neural Networks (CNN)
2. The CNN operations
3. Training a CNN from scratch on a small dataset
4. Handling Overfitting

1. Introduction to CNN

1. Introduction to CNN

https://www.youtube.com/watch?v=Gu0MkmynWkw

1. Introduction to CNN
 The MNIST classification problem using CNN
[Diagram: a convnet classifying MNIST digits via the convolution operation and the max-pooling operation]

2. The CNN operations



2.1 The convolution operation


 Dense Layers vs. Convolution Layers
1. Dense layers learn global patterns
2. Convolution layers learn local patterns

Images can be broken into local patterns, e.g. edges, textures, etc.

2.1 The convolution operation


 Patterns learnt by convnets are
1. Translation-invariant
2. Hierarchical (later layers learn larger patterns built from the smaller ones found by earlier layers)
[Diagram: 1st layer patterns (edges) feeding into 2nd layer patterns (larger motifs)]

2.1 The convolution operation


 A Simple Example of 2D convolution (refer to the Excel spreadsheet for details)
Input Image: (5, 5, 1)  Output: (3, 3, 1)
One Filter: (3, 3)

Filter weights:
0 1 2
2 2 0
0 1 2

Source: https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1
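To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of this 2D convolution. The 3x3 filter is the one shown above; the 5x5 input pixel values are illustrative.

```python
import numpy as np

# Illustrative 5x5 single-channel input (pixel values).
image = np.array([
    [3, 3, 2, 1, 0],
    [0, 0, 1, 3, 1],
    [3, 1, 2, 2, 3],
    [2, 0, 0, 2, 2],
    [2, 0, 0, 0, 1],
])

# The 3x3 filter weights from the slide.
kernel = np.array([
    [0, 1, 2],
    [2, 2, 0],
    [0, 1, 2],
])

# Output size with no padding and stride 1: 5 - 3 + 1 = 3.
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
output = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        # Element-wise multiply the 3x3 patch by the filter and sum.
        patch = image[i:i + 3, j:j + 3]
        output[i, j] = np.sum(patch * kernel)

print(output.shape)  # (3, 3)
```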


2.1 The convolution operation

TO CALCULATE WIDTH OF OUTPUT

Output Depth = # of Filters = 2
Output Height or Width = Input width - Filter width + 1
= 6 - 4 + 1 = 3

Source: https://towardsdatascience.com/demystifying-convolutional-neural-networks-384785791596

2.1 The convolution operation


 Padding
Input Image: (5, 5, 1)
One Filter: (3, 3)
Padding = 1
 Output: (5, 5, 1)

Source: https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

2.1 The convolution operation


 Strides
Input Image: (5, 5, 1)
One Filter: (3, 3)
Strides = 2
 Output: (2, 2, 1)

Source: https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1
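A minimal Keras sketch (assuming TensorFlow 2.x) of both options: padding="same" reproduces the padded example above, and strides=2 reproduces the strided one.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(5, 5, 1))

# Zero-padding so the output keeps the input's height and width.
same = layers.Conv2D(1, (3, 3), padding="same")(inputs)
print(same.shape)     # (None, 5, 5, 1)

# Stride 2 moves the filter two pixels at a time, shrinking the output.
strided = layers.Conv2D(1, (3, 3), strides=2, padding="valid")(inputs)
print(strided.shape)  # (None, 2, 2, 1)
```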

2.1 The convolution operation


Take the dot product of the image patch with each channel of the filter, sum across the channels, and add the bias:
(-2) + (-1) + (0) + 0 = -3

Input width = 5
Padding = 1
Strides = 2

TO CALCULATE WIDTH OF OUTPUT

Output Depth = # of Filters = 2
Output Height or Width = (Input width + (2 * Padding) - Filter width) / Strides + 1
= (5 + 2*1 - 3) / 2 + 1 = 3
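The general formula as a small helper function (a sketch; the function name is ours), checked against the two worked examples on these slides.

```python
def conv_output_size(input_width, filter_width, padding=0, stride=1):
    """Output height/width of a convolution layer."""
    return (input_width + 2 * padding - filter_width) // stride + 1

# No padding, stride 1: 6 - 4 + 1 = 3
assert conv_output_size(6, 4) == 3
# Padding 1, stride 2: (5 + 2*1 - 3) / 2 + 1 = 3
assert conv_output_size(5, 3, padding=1, stride=2) == 3
```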

2.1 The convolution operation

Input Image is 28 x 28 pixels, black-and-white
Filter Size is 3 x 3
This layer uses a total of 32 filters

The output shape is (26, 26, 32), where:
26 = 28 - 3 + 1
32 = # of Filters
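A minimal Keras sketch (assuming TensorFlow 2.x) that reproduces this shape arithmetic:

```python
from tensorflow import keras
from tensorflow.keras import layers

# 32 filters of size 3x3 applied to a 28x28 black-and-white image.
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu")(inputs)
print(x.shape)  # (None, 26, 26, 32): 26 = 28 - 3 + 1, depth = # of filters
```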

2.2 The max-pooling operation


 Down-sample the tensors by taking the max value in a window (e.g. 2x2)

Source: https://computersciencewiki.org/index.php/Max-pooling_/_Pooling
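A minimal NumPy sketch of 2x2 max-pooling with stride 2; the input values are illustrative.

```python
import numpy as np

x = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 2],
    [7, 2, 9, 1],
    [3, 1, 4, 8],
])

# Split the 4x4 array into non-overlapping 2x2 windows,
# then keep only the maximum of each window.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 5]
#  [7 9]]
```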

2.2 The max-pooling operation


 MNIST Model with max-pooling layers
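A hedged sketch of an MNIST convnet with max-pooling layers, following the model in Chollet's Deep Learning with Python (the course reference); the slide's exact model may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),   # halves the height and width
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # 10 digit classes
])
model.summary()
```

Removing the MaxPooling2D layers (the next slide) keeps the full spatial resolution at every layer, which greatly inflates the parameter count of the final Dense layers.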

2.2 The max-pooling operation


 MNIST Model without max-pooling layers

Demo 2 – MNIST using CNN



3. Training a CNN from scratch on a small dataset

 Datasets
1. Kaggle 2013 competition
 Original: 25,000 images (12,500 dogs and 12,500 cats)
 A Small Set:
• Training: 2,000 images (1,000 dogs and 1,000 cats)
• Validation: 1,000 images (500 dogs and 500 cats)
• Testing: 1,000 images (500 dogs and 500 cats)

2. Data Preprocessing  ImageDataGenerator in Keras (see the sketch below)
 Read the picture files
 Decode the JPEG content to RGB grids of pixels
 Convert these into floating-point tensors
 Rescale the pixel values (0-255) to the [0, 1] interval
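A hedged sketch of this preprocessing pipeline with ImageDataGenerator; the directory path is a placeholder, not the practical's actual path.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from the 0-255 range to the [0, 1] interval.
train_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Reads the picture files from one subfolder per class, decodes the
# JPEG content to RGB, resizes, and yields batches of float tensors.
train_generator = train_datagen.flow_from_directory(
    "data/train",            # placeholder path
    target_size=(150, 150),  # resize all images to 150 x 150
    batch_size=20,
    class_mode="binary",     # two classes: dogs and cats
)
```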

Let’s try your hand at your first CNN!
(Practical 2 – Part 1)
We will naively train a small CNN on our training samples to classify images of "dogs" and "cats".

[Architecture diagram, spread over several slides: the small convnet used for the 150 x 150 RGB dog/cat images]

Input: 150 x 150 x 3 (RGB)
Conv2D, 32 filters of 3 x 3 -> 148 x 148 x 32
MaxPooling 2 x 2 -> 74 x 74 x 32
Conv2D, 64 filters of 3 x 3 -> 72 x 72 x 64
MaxPooling 2 x 2 -> 36 x 36 x 64
Conv2D, 128 filters of 3 x 3 -> 34 x 34 x 128
MaxPooling 2 x 2 -> 17 x 17 x 128
Conv2D, 128 filters of 3 x 3 -> 15 x 15 x 128
MaxPooling 2 x 2 -> 7 x 7 x 128
Flatten -> 7 * 7 * 128 = 6272 values, e.g. [0.1, 0.234, 0.521, 0.2, ..., 0.454, 0.442, 0.984]
Dense: 512 nodes
Output: 1 node (Sigmoid)
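A hedged reconstruction of the architecture sketched above, following the dogs-vs-cats convnet in Chollet's Deep Learning with Python; the practical's exact code may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),   # 148x148x32 -> 74x74x32
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 72x72x64 -> 36x36x64
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 34x34x128 -> 17x17x128
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 15x15x128 -> 7x7x128
    layers.Flatten(),              # 7 * 7 * 128 = 6272 values
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # dog vs. cat
])
model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["acc"])
```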

4. Handling Overfitting

4.1 Fundamental Issues


 Optimization vs. Generalization
• Optimization: adjusting the model to get the best possible performance on the training data
• Generalization: how well the trained model performs on data it has never seen before (test data)

 Underfitting
• The lower the error on training data, the lower the error on testing data  there is still progress to be made

 Overfitting
• Generalization stops improving:
 Training error keeps decreasing
 Validation / Testing error starts to increase
• The model is beginning to learn patterns overly specific to the training data

4.1 Fundamental Issues


Balancing Optimization and Generalization: the tradeoff of model complexity against training and testing accuracy

[Figure: training and testing accuracy vs. model complexity; keep optimizing while testing accuracy still improves, start regularizing once it declines]

4.2 To prevent overfitting


1. Reducing network size (tweak hyperparameters)

2. Adding weight regularization

3. Adding dropout

4. Get more training data

4.2 To prevent overfitting


1. Reducing network size
• Capacity: the number of learnable parameters
 The number of layers
 The number of units per layer

• High Capacity
 Good at fitting the training data

• Limited Capacity
 Good at generalizing to unseen data (prediction)

• Balance: too much capacity vs. not enough capacity (see the sketch below)
 Start with a small network size
 Increase the size and monitor the error on a validation dataset
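A minimal sketch of the "start small, then grow" advice; the architecture and unit counts here are illustrative, not from the practical.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(units):
    # `units` is the capacity knob: more units = more learnable parameters.
    return keras.Sequential([
        layers.Flatten(input_shape=(150, 150, 3)),
        layers.Dense(units, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])

small = build_classifier(16)   # limited capacity: generalizes more easily
large = build_classifier(512)  # high capacity: fits the training data more easily
```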

4.2 To prevent overfitting


2. Adding weight regularization
• To avoid overfitting  a simpler model
 Forcing the weights to take only small values
 Adding a cost (associated with the weights) to the loss function

• Two types of cost function (weight regularization)
 L1 regularization (Lasso)
• Cost is proportional to the absolute value of the weights
 L2 regularization (Ridge, also called Weight Decay)
• Cost is proportional to the square of the value of the weights

• Implemented in Keras (see the sketch below)
 In the layer function, add an argument to configure the weight regularizer
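A minimal Keras sketch of configuring a weight regularizer on a layer; the 0.001 factor is illustrative.

```python
from tensorflow.keras import layers, regularizers

dense = layers.Dense(
    512,
    activation="relu",
    # L2: adds 0.001 * sum(weight ** 2) to the loss for this layer's weights.
    kernel_regularizer=regularizers.l2(0.001),
)
# L1 (Lasso-style) would be: kernel_regularizer=regularizers.l1(0.001)
```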

4.2 To prevent overfitting


3. Adding dropout
• Applied to a layer during training time
 randomly dropping out (setting to zero) a fraction of the outputs
 dropout rate normally between 0.2 and 0.5

• At test time, no outputs are dropped; instead, the outputs are scaled down by a factor equal to the dropout rate

• The technique helps to reduce overfitting
 Randomly removing a different subset of neurons each time
 Introduces noise and breaks up non-significant patterns

• Implemented in Keras by adding a Dropout layer (see the sketch below)
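A minimal Keras sketch of adding a Dropout layer; the 0.5 rate is one common choice in the 0.2-0.5 range mentioned above.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Flatten(input_shape=(150, 150, 3)),
    layers.Dense(512, activation="relu"),
    # Randomly zeroes 50% of this layer's outputs during training only.
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
```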

4.2 To prevent overfitting

4. Get more training data
 The best solution!
 A model trained on more data will naturally generalize better
 Why?

4. Get more training data - Data Augmentation

• Data augmentation is a strategy to significantly increase the diversity of data available for training models, without actually collecting new data.

• Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks (see the sketch below).
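A hedged sketch of data augmentation with Keras ImageDataGenerator; the parameter values are typical choices, not necessarily the practical's.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,       # random rotations of up to 40 degrees
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    shear_range=0.2,         # random shearing transformations
    zoom_range=0.2,          # random zooming
    horizontal_flip=True,    # random horizontal flips
    fill_mode="nearest",     # how to fill in newly created pixels
)
```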

 Example of Data Augmentation
[Figure: an original image (150 x 150 pixels) alongside several augmented variants]

Let’s try your hand at your first CNN!
(Practical 2 – Part 2)
We will now add measures to prevent overfitting when training the CNN to classify images of "dogs" and "cats".

Example of Data Augmentation during training

For every epoch, the images presented to the model for learning will be simulated new images; these new images are not saved anywhere. The model therefore sees 2,000 freshly augmented images in each of Epochs 1, 2, 3, 4, 5, ..., and validates after each epoch.

[Diagram: the 150 x 150 x 3 convnet from earlier being fed 2,000 augmented images per epoch]

Wrapping Up
 CNNs are the best type of machine-learning models for computer-vision tasks

 It’s possible to train a model from scratch on a very small dataset  overfitting becomes the main challenge

 Data augmentation is a powerful way to fight overfitting when working with image data

Further Reading
 Demystifying Convolutional Neural Networks
• https://towardsdatascience.com/demystifying-convolutional-neural-networks-384785791596

 Intuitively Understanding Convolutions for Deep Learning
• https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1

 Data Augmentation Increases Accuracy of your Model – But how?
• https://medium.com/secure-and-private-ai-writing-challenge/data-augmentation-increases-accuracy-of-your-model-but-how-aa1913468722

Q&A

References
Books:

François Chollet, Deep Learning with Python (2018)
