Lec 2
Summary
Neural networks are parallel processors that use highly nonlinear prediction functions.
Neurons are nodes, and they are activated based on the weighted sum of their inputs.
Learning is an iterative process that optimizes a loss function using data samples (or batches) for a certain number of iterations (or epochs).
Backpropagation (or error propagation) is an efficient way to compute activation function gradients.
NNs require some type of regularization to avoid overfitting and vanishing gradient problems (e.g., dropout, batch normalization).
Recap: ANN Design
To design an ANN for a specific task, there are many design decisions:
Depth of the network (i.e., the # of hidden layers)
Width of the hidden layers (i.e., the # of units per hidden layer)
Type of activation function (nonlinearity)
Form of objective function
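These choices map directly onto code; a minimal sketch, assuming TensorFlow/Keras (the layer sizes, activation, and loss below are illustrative, not the lecture's):

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Input(shape=(20,)),              # 20 input features (illustrative)
    Dense(64, activation="relu"),    # width: 64 units in the 1st hidden layer
    Dense(32, activation="relu"),    # depth: a 2nd hidden layer
    Dense(3, activation="softmax"),  # output layer for 3 classes
])
# Form of the objective (loss) function and choice of optimizer:
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```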
Recap: Activation Functions and Objective Function
[Figure: a feed-forward network with inputs x, weights, weighted sums z, hidden-layer activations a (hidden layer of size M), and a final sum-and-activation step at the output neuron producing the outputs y.]
Deep Learning (DL)
[Figure: ResNet152 architecture.]
Image Source: https://www.researchgate.net/publication/343615852_Automobile_Classification_Using_Transfer_Learning_on_ResNet_Neural_Network_Architecture/figures?lo=1
Deep Learning vs. Classical Machine Learning
Deep learning has become the winning player in numerous pattern recognition competitions.
Its main advantage is that it does so without feature engineering (or hand-crafted features).
Deep Learning pipeline: Input Data → Deep Learning Algorithm → Results or Output
Feature Selection for Classical ML
Features are distinctive aspects or characteristics of objects (e.g., tumor shape or volume).
A feature's quality is related to its ability to discriminate subjects from different classes (normal and abnormal).
Subjects from the same class should have common feature values, while examples from different classes should have different feature values.
Good (discriminative) features should have small intra-class variations and large inter-class variations.
[Figure: distributions of Class A and Class B along feature x values.]
In classical ML, feature selection is very challenging, and features must be carefully chosen.
DL vs. Classical ML (cont’d)
[Figures [1], [2]: classical ML pipelines feed hand-crafted features to classifiers such as SVM, RF, or decision trees, compared with end-to-end deep learning.]
Image sources:
[1] https://semiengineering.com/deep-learning-spreads/
[2] https://quantdare.com/what-is-the-difference-between-deep-learning-and-machine-learning/deep_learning/
The Need for CNN
[Figure: a small patch of an image fed to a fully connected network; for this part of the image only, there are 36 inputs (x1, x2, ...).]
The Need for CNN (cont’d)
512 * 512 = 262,144 pixel values
A large number of points for just one sample (i.e., one image) leads to a very dense network with a large number of parameters.
The Need for CNN (cont’d)
This is an impractically large input for an ANN ≡ a huge # of weights.
More abstract and meaningful features of the input should be used for a practical implementation.
What if we have 3D images (e.g., lung images)?
A huge number of points for just one sample (i.e., one patient's 3D data) leads to a very dense network with a huge number of parameters.
The Need for CNN (cont’d)
Two-fold advantages:
It mimics the human visual cortex.
It is suitable for processing 2D and 3D images.
[Diagram: Input Data (e.g., lung images) → reduction or meaningful feature extraction → small-sized feature(s).]
Convolutional Neural Network (CNN)
Convolutional neural networks (CNNs or ConvNets) are DL networks that possess the ability to learn directly from data and mimic the human visual system, eliminating the need for manual (or hand-crafted) feature extraction.
CNNs have been around since the 90's; the early version was called LeNet (after LeCun): a network to recognize handwritten digits.
Scalability issues at the time: they required a lot of data and computing resources.
Convolutional Neural Network (CNN)
CNNs are NNs with some convolutional layers (and some other layers).
A convolutional layer is convolved with the input using several filters (kernels).
Filters are responsible for detecting hidden patterns in the inputs (e.g., images).
The same pattern appears in different places: they can be compressed!
Each filter (3 x 3) detects a small pattern (dots, corners, edges, etc.). The filter weights are the network parameters to be learned.

6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1 (3 x 3):
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2 (3 x 3):
-1  1 -1
-1  1 -1
-1  1 -1

... Filter n

Modified from: https://www.cs.bgu.ac.il/~dip182/wiki.files/Deep-Learning-2017-Lecture5CNN.pdf
Convolution Operation
Slide Filter 1 over the 6 x 6 image and take the dot product at each position.

Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1

If stride = 1, the resulting 4 x 4 activation map is:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1

If stride = 2, the filter moves two pixels at a time and the map is 2 x 2:
 3 -3
-3  0

Modified from: https://www.cs.bgu.ac.il/~dip182/wiki.files/Deep-Learning-2017-Lecture5CNN.pdf
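The maps above can be reproduced with a short sketch, assuming NumPy (the helper name convolve2d is mine; no padding, dot products only, as on the slide):

```python
import numpy as np

image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])
filter1 = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
])

def convolve2d(img, kernel, stride=1):
    """Slide the kernel over the image (no padding) and take the dot product at each position."""
    f = kernel.shape[0]
    out_size = (img.shape[0] - f) // stride + 1
    out = np.zeros((out_size, out_size), dtype=int)
    for i in range(out_size):
        for j in range(out_size):
            patch = img[i * stride:i * stride + f, j * stride:j * stride + f]
            out[i, j] = np.sum(patch * kernel)
    return out

print(convolve2d(image, filter1, stride=1))  # the 4 x 4 map shown on the slide
print(convolve2d(image, filter1, stride=2))  # the 2 x 2 map
```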
Convolution Operation (cont’d)
For a color image, the input has three channels (e.g., 6 x 6 x 3), so each filter also has three channels (Filter 1 and Filter 2 become 3 x 3 x 3); the dot product is taken over all channels at each position.
Modified from: https://www.cs.bgu.ac.il/~dip182/wiki.files/Deep-Learning-2017-Lecture5CNN.pdf
A Generic CNN Architecture
CNNs are NNs with some convolutional layers (and some other layers).
[Diagram: Input Data → (Convolution → Activation → Pooling) blocks and some other layers → Flatten layer → Output layer (Softmax) → distribution over classes C1, C2, C3, where Ci is class #i.]
A Generic CNN Architecture (cont’d)
[Diagram: a 32 x 32 x 3 input image convolved with a 5 x 5 x 3 kernel produces feature maps (input * filters); the convolution block can be repeated multiple times before the flatten layer.]
Convolution Layer (cont’d)
What will be the size of the activation map?
Let the input image be N x N, the filter F x F, the stride S, and the padding P.
Without padding, the map size is (N − F)/S + 1. Example: (32 − 5)/1 + 1 = 28.
With padding, the map size is (N + 2P − F)/S + 1.
Typical kernel sizes in modern CNNs are small (e.g., 3 x 3 or 5 x 5).
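As a quick check of the formula, a tiny helper in plain Python (the function name is mine):

```python
def activation_map_size(n, f, stride=1, padding=0):
    """Output side length for an n x n input and f x f filter, given stride and padding."""
    return (n + 2 * padding - f) // stride + 1

print(activation_map_size(32, 5))             # 28, as in the example above
print(activation_map_size(32, 5, padding=2))  # 32: this padding preserves the input size
```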
Convolution Layer (cont’d)
Convolve the same image with another filter (e.g., a green one).
Suppose we have 6 kernels; we will get 6 separate activation maps.
Stacking the results of the six convolution operations gives a "new image" of size 28 x 28 x 6.
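A minimal sketch of this stacking, assuming TensorFlow/Keras (the layer below is illustrative):

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D

inputs = Input(shape=(32, 32, 3))                 # a 32 x 32 x 3 image
maps = Conv2D(filters=6, kernel_size=5)(inputs)   # six 5 x 5 x 3 kernels, stride 1, no padding
model = Model(inputs, maps)
print(model.output_shape)                         # (None, 28, 28, 6)
```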
Activation Function
The convolution layers are interspersed with activation functions to learn complex information.
A Generic CNN Architecture (cont’d)
[Diagram: Input Data → Convolution → Pooling (the block can be repeated multiple times) → Flatten → Output layer; this slide highlights the pooling layer.]
CNN: Pooling Layer
It compresses the input into a lower-dimensional representation, thus making it more manageable.
Like the convolution operation, a window (or patch) with a parameterized size and stride is slid over each activation map independently, and a single value is then selected (computed), i.e., the maximum (or the average).
Thus, it is as if we subsample the pixels to make the image smaller: the bird stays a bird, but there are fewer parameters to characterize the image.
This pooling step does NOT introduce any parameters, as it only performs fixed computations on the activation map values.
[Diagram: an activation map of size W1 x H1 x D1 is pooled down to W2 x H2 x D2, with D2 = D1 (the depth is unchanged).]
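A small sketch of 2 x 2 max pooling, assuming NumPy (window size 2 and stride 2, a common choice; the helper name is mine):

```python
import numpy as np

def max_pool_2x2(activation_map):
    """2 x 2 max pooling with stride 2; halves width and height and adds no parameters."""
    h, w = activation_map.shape
    trimmed = activation_map[:h - h % 2, :w - w % 2]      # drop an odd last row/column if any
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

amap = np.array([
    [ 3, -1, -3, -1],
    [-3,  1,  0, -3],
    [-3, -3,  0,  1],
    [ 3, -2, -2, -1],
])
print(max_pool_2x2(amap))   # [[3, 0], [3, 1]]
```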
A Generic CNN Architecture (cont’d)
[Diagram: Input Data → Convolution → Pooling (repeated multiple times) → Flatten → Fully Connected (FC) layer → Output layer; this slide highlights the fully connected layer.]
CNN: Fully Connected Layer
Neurons in the FC layer have full connections to all activations in the previous layer, as in regular NNs.
Their activations can hence be computed with a matrix multiplication followed by a bias offset.
[Figure: two 3 x 3 activation maps are flattened into a single 18-element input vector that feeds fully connected outputs y1, y2, y3.]
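A minimal sketch of that computation, assuming NumPy (the weights and activation-map values are made-up numbers, not the slide's):

```python
import numpy as np

rng = np.random.default_rng(0)
maps = rng.integers(-5, 9, size=(2, 3, 3))   # two 3 x 3 activation maps (hypothetical values)
x = maps.reshape(-1)                         # flatten: 18-element input vector

W = rng.normal(size=(3, 18))                 # weights of 3 FC neurons (y1, y2, y3)
b = rng.normal(size=3)                       # bias offset
y = W @ x + b                                # matrix multiplication followed by bias offset
print(y.shape)                               # (3,)
```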
A Generic CNN Architecture (cont’d)
[Diagram: Input Data → Convolution → Pooling (repeated multiple times) → Flatten → Output layer; this slide highlights the output layer.]
CNN: Output Layer
Neurons in the FC layer output real values (y_k).
A softmax activation function converts these values into a distribution over the classes C1, C2, C3, ..., where Ci is class #i.
[Diagram: Convolution → Activation → Spatial Pooling → Convolution → Activation → ... → Flatten → Softmax → class outputs. Example numbers from the slide: each filter has 9 parameters (i.e., 3 x 3); one convolution stage outputs 25 x 26 x 26; after max pooling, a later stage outputs 50 x 5 x 5, which is flattened into 1250 values.]
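For reference, a softmax sketch in NumPy that turns the real-valued outputs y_k into class probabilities (the example values are made up):

```python
import numpy as np

def softmax(y):
    """Convert real-valued outputs y_k into probabilities that sum to 1."""
    e = np.exp(y - np.max(y))   # subtract the max for numerical stability
    return e / e.sum()

y = np.array([2.0, 1.0, -1.0])  # hypothetical FC outputs for classes C1, C2, C3
print(softmax(y))               # approximately [0.71, 0.26, 0.04]
```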
CNN Design Choices and Issues
Data preprocessing
Number of layers and transfer learning
Loss Function
Optimization
Regularization and data scarcity
CNN Design: Data Preprocessing
Classical (traditional) ML pipeline: Input Data → Preprocessing & Feature Extractor → Traditional Machine Learning → Results or Output
Deep Learning pipeline: Input Data → Preprocessing → Deep Learning Algorithm → Results or Output
Data Preprocessing: Medical Application
Data acquisition: imaging techniques (MRI, CT, PET, etc.); file formats such as DICOM, NIFTI, BMP, JPEG, etc.
Medical data: lung images, kidney images, cardiac images.
Medical image analysis: an ML/DL analysis framework, software packages, your own developed software, or software available from machine vendors.
Results (identification, measurement, and/or judgment): detection of abnormalities (e.g., a tumor), assessment of functionality (e.g., kidney function), follow-up on treatment (e.g., cancer recurrence), and classification/diagnosis (benign or malignant).
Preprocessing: Examples
[Figures: a denoised image [1] and a bias-corrected image [2].]
CNN Design: Loss Function
During learning, an optimization strategy is used to minimize the error (loss or cost) function of the network.
Regression problems: Mean Squared Error (MSE), Mean Squared Logarithmic Error (MSLE), Mean Absolute Error (MAE)
Binary classification problems: Binary Cross Entropy (BCE), Hinge, Squared Hinge
Multiclass classification problems: Categorical Cross Entropy (CCE), Sparse CCE (SCCE), Kullback-Leibler (KL) Divergence
For more details and examples, please see: https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
For more details about loss functions in Keras, please see: https://keras.io/api/losses/
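For instance, in Keras the loss is chosen when compiling the model; a hedged sketch (the model and optimizer are placeholders):

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Input(shape=(1250,)), Dense(10, activation="softmax")])

# Multiclass problem with one-hot labels -> Categorical Cross Entropy (CCE):
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# With integer class labels instead, the sparse variant (SCCE) would be used:
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```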
CNN Design: Regularization
Regularization is any technique that is employed to prevent overfitting and/or
help the optimization.
https://analyticsindiamag.com/everything-you-should-know-about-dropouts-and-batchnormalization-in-cnn/
Regularization: Batch Normalization
Batch normalization (BN) is a general technique that can be used to normalize the inputs to a layer:
z_hat = (z − m_z) / s_z
where m_z is the mean of the neuron's outputs and s_z is their standard deviation.
BN is used before the activation.
Using BN makes the network more stable during learning; it may allow the use of larger learning rates than normal, which in turn speeds up the learning process.
It is done along mini-batches instead of the full data set.
It may NOT be combined with dropout, as it performs a similar task.
https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/
https://www.baeldung.com/cs/batch-normalization-cnn
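A minimal Keras sketch of the "BN before the activation" ordering (the layer sizes are illustrative):

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation

model = Sequential([
    Input(shape=(32, 32, 3)),
    Conv2D(32, 3),            # convolution, with no activation yet
    BatchNormalization(),     # normalize over the mini-batch
    Activation("relu"),       # activation applied after BN
])
```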
Regularization: Data Augmentation
Efficient DL requires a large number of samples for training.
A small number of samples yields overfitting.
Unbalanced classes cause large errors (on the minority class) even with high overall accuracy.
Assignment 2: Design your deep convolutional neural network (CNN) to classify the CIFAR10 images into one of the 10 classes.
Investigate the use of different architectures (different layers, kernel sizes, pooling, learning rates, optimizers, loss functions).
Assignment 3: Repeat Assignments #1 and #2 using the MNIST dataset.
Note that you will need to convert the training labels into categorical form using one-hot encoding with the to_categorical() function.
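A starting-point sketch for Assignment 2, assuming TensorFlow/Keras (the architecture and hyperparameters are illustrative, not the required solution):

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical

# Load CIFAR10 and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Convert integer labels to one-hot vectors, as noted above
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Input(shape=(32, 32, 3)),
    Conv2D(32, 3, activation="relu"),
    MaxPooling2D(2),
    Conv2D(64, 3, activation="relu"),
    MaxPooling2D(2),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),                      # regularization
    Dense(10, activation="softmax"),   # 10 CIFAR10 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_data=(x_test, y_test))
```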
Thank You & Questions