Lec 2

The document provides an overview of artificial neural networks (ANNs) and their evolution into deep learning, highlighting the importance of design decisions such as network depth, width, and activation functions. It discusses the advantages of convolutional neural networks (CNNs) in image processing, emphasizing their ability to learn features directly from data without manual extraction. The text also contrasts deep learning with classical machine learning, noting the challenges of feature selection in traditional methods.

Recap: ANN Summary

• Neural networks are parallel processing systems that use highly nonlinear prediction functions.
• Neurons are nodes, and they are activated based on the weighted sum of their inputs.
• Learning is a recursive process for optimizing a loss function using data samples (or batches) for a certain number of iterations (or epochs).
• Backpropagation (or error propagation) is an efficient way to compute activation function gradients.
• NNs require some type of regularization to avoid overfitting and vanishing gradient problems (e.g., dropout, batch normalization).
Recap: ANN Design

• To design an ANN for a specific task, there are many design decisions:
• Depth of the network (i.e., the # of hidden layers)
• Width of the hidden layers (i.e., the # of units per hidden layer)
• Type of activation function (nonlinearity)
• Form of objective function

No hidden layer: linear boundary. One hidden layer: concave boundary. More hidden layers: more convex boundary.
Recap: Activation Functions & Objective Function

[Figure: a single-hidden-layer network with inputs x_j, hidden neurons a_i (hidden layer of size M), and outputs z_k. Each neuron computes a weighted sum of its inputs followed by an activation; the sum-and-activation formulas are shown for the hidden layer and the output layer, for the sigmoid activation and for a generic activation function σ.]
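As an illustration of the sum-and-activation computation sketched in the figure, the following is a minimal NumPy sketch of a forward pass through one hidden layer. The layer sizes, variable names, and the sigmoid choice for both layers are illustrative assumptions, not taken from the slide.

import numpy as np

def sigmoid(v):
    # Sigmoid activation: squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

# Illustrative sizes: 4 inputs, M = 3 hidden neurons, 2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=(4,))      # input vector x_j
W1 = rng.normal(size=(3, 4))   # hidden-layer weights
b1 = np.zeros(3)               # hidden-layer biases
W2 = rng.normal(size=(2, 3))   # output-layer weights
b2 = np.zeros(2)

a = sigmoid(W1 @ x + b1)       # hidden activations a_i (weighted sum, then activation)
z = sigmoid(W2 @ a + b2)       # outputs z_k
print(a, z)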
From Shallow to Deep: Deep Learning
• Deep learning comes from NNs that contain several hidden layers between input and output; such networks are called deep neural networks (DNNs).

[Figure: nested circles showing Deep Learning (DL) as a subset of Machine Learning (ML), which is a subset of Artificial Intelligence (AI); a DNN maps inputs X to outputs Y through several hidden layers.]

• Why did deep networks not evolve after the 90s?
• Limitation on hardware
• Shortage of data sources
• Lack of theories for hyperparameters
• Vanishing gradient problem

Image Source: https://www.buys365.ga/products.aspx?cname=learning+to+learn+neural+networks&cid=28
From Shallow to Deep: Deep Learning
• With advanced mathematical techniques, powerful processing units, and new tricks to handle large data inputs, NNs have come back into public view.
• In recent years, the number of hidden layers has increased greatly, and networks are becoming deeper and deeper.
• Although a single-hidden-layer MLP can “solve any problem” theoretically, the total number of neurons needed can be exponentially larger, even for solving some simple problems.
• Using more layers can make the network more efficient and effective. This is one reason why DL is now very popular.

ResNet152
Image Source: https://www.researchgate.net/publication/343615852_Automobile_Classification_Using_Transfer_Learning_on_ResNet_Neural_Network_Architecture/figures?lo=1
Deep Learning vs. Classical Machine Learning
• Deep learning has become the winning player in numerous pattern recognition competitions.
• Its main advantage is that it does so without feature engineering (or hand-crafted features).

Classical (Traditional) ML pipeline: Input Data → Preprocessing & Feature Extractor → Traditional Machine Learning → Results or Output

Deep Learning pipeline: Input Data → Deep Learning Algorithm → Results or Output
Feature Selection for Classical ML
• Features are distinctive aspects or characteristics of objects (e.g., tumor shape or volume).
• A feature’s quality is related to its ability to discriminate subjects from different classes (normal and abnormal).
• Subjects from the same class should have common feature values, while examples from different classes should have different feature values.
• Good (discriminative) features should have small intra-class variations and large inter-class variations.

[Figure: distributions of feature x values for Class A and Class B.]
In classical ML, feature selection is very challenging, and features must be carefully chosen.
DL vs. Classical ML (cont’d)

[Figure: comparison of classical ML pipelines (with classifiers such as SVM, RF, and decision trees applied to extracted features) and deep learning pipelines.]

Image sources:
1 https://semiengineering.com/deep-learning-spreads/
2 https://quantdare.com/what-is-the-difference-between-deep-learning-and-machine-learning/deep_learning/

Old ANN models since the 80s, with new tricks & additions:
• More hidden units
• New nonlinear functions (ReLUs)
• Better (online) optimization
• Faster computers (CPUs and GPUs)
• Exponential explosion of available data
The Need for Convolutional Neural Network (CNN)
• Samples (or features) are the individual pixel values.

[Figure: Cardiac Cine MR image; for the highlighted part of the image alone, there are 36 inputs x_1 … x_36.]
The Need for CNN (cont’d)
• 512 × 512 = 262,144 pixel values.
• What if we take the whole image as an input?

[Figure: Cardiac Cine MR image feeding inputs x_1 … x_262144 into a network.]

A large number of points for just one sample (i.e., image) → a very dense network with a large number of parameters.
The Need for CNN (cont’d)
• This is an impractically large input for an ANN ≡ a huge # of weights.
• More abstract and meaningful features of the input should be used for a practical implementation.
• What if we have 3D images? For lung images of size 512 × 512 × 150, there are ~40 M pixel values (x_1 … x_39321600).

A huge number of points for just one sample (i.e., a patient’s 3D data) → a very dense network with a huge number of parameters.
The Need for CNN (cont’d)
• Two-fold advantages:
• It mimics the human visual cortex.
• It is suitable for processing 2D and 3D images.

[Figure: lung images (512 × 512 × 150, ~40 M pixel values) are reduced to small-sized features x_1 … x_j through input data reduction or meaningful feature extraction.]
Convolutional Neural Network (CNN)
• Convolutional neural networks (CNN or ConvNet) are DL networks that possess the ability to learn directly from data and mimic the human visual system, eliminating the need for manual (or hand-crafted) feature extraction.
• CNNs have been around since the 90s, and the early version was called LeNet (after LeCun): a network to recognize handwritten digits.

Scalability issues:
• lots of data
• computing resources

• The beginning of the modern DL era was marked by the introduction of ImageNet by Prof. Fei-Fei Li.
CNNs (cont’d)
• AlexNet was the boom of CNNs: after its authors won the ImageNet challenge in 2012, CNNs became the gold standard for image classification.

More models, new architectures, expanded capabilities, new applications, etc.
Convolutional Neural Network (CNN)
• CNNs are NNs with some convolutional layers (and some other layers).
• A convolutional layer is convolved with the input using several filters (kernels).
• Filters are responsible for detecting hidden patterns in the inputs (e.g., images).
• The same pattern appears in different places: they can be compressed!

6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1 (3 x 3):
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2 (3 x 3):
-1  1 -1
-1  1 -1
-1  1 -1

Each filter (3 x 3) detects a small pattern (dots, corners, edges, etc.), up to Filter n.
Filter weights are the network parameters to be learned.
Modified from: https://www.cs.bgu.ac.il/~dip182/wiki.files/Deep-Learning-2017-Lecture5CNN.pdf
Convolution Operation
Filter 1 (3 x 3):
 1 -1 -1
-1  1 -1
-1 -1  1

The filter is slid over the 6 x 6 image, and a dot product is taken at each position.

If stride = 1, the resulting 4 x 4 map is:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1

If stride = 2, the resulting 2 x 2 map keeps every other position:
 3 -3
-3  0

Modified from: https://www.cs.bgu.ac.il/~dip182/wiki.files/Deep-Learning-2017-Lecture5CNN.pdf
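To reproduce the worked example above, here is a minimal NumPy sketch of a valid 2D convolution (strictly a cross-correlation, as CNN layers actually compute) with a configurable stride; the function name and the use of NumPy are illustrative assumptions.

import numpy as np

def conv2d(image, kernel, stride=1):
    # Valid 2D cross-correlation over a single-channel image
    H, W = image.shape
    k = kernel.shape[0]
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * kernel)   # dot product of patch and filter
    return out

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

print(conv2d(image, filter1, stride=1))  # 4x4 map; top-left value is 3
print(conv2d(image, filter1, stride=2))  # 2x2 map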
Convolution Operation (cont’d)
For a color image, the input has three channels, so each filter also has a depth of 3: Filter 1 and Filter 2 become 3 x 3 x 3, and the 6 x 6 image from the previous slide is shown stacked across the three channels.

[Figure: the two 3 x 3 filter patterns and the 6 x 6 image, each repeated once per channel.]
Modified from: https://www.cs.bgu.ac.il/~dip182/wiki.files/Deep-Learning-2017-Lecture5CNN.pdf
A Generic CNN Architecture
• CNNs are NNs with some convolutional layers and some other layers.

Input Data → [Convolution → Activation → Spatial Pooling] → [Convolution → Activation → Spatial Pooling] → Flatten layer → Fully Connected (FC) layer → Output (Softmax) → C1, C2, C3 (Ci is class # i)

The convolution/activation/pooling stages produce new images (maps) and perform Feature Extraction; the FC layer performs Classification; the Softmax output gives a Probabilistic Distribution over the classes.
A Generic CNN Architecture (cont’d)
Input Data → [Convolutional Layer (input * filters) + Activation → Pooling Layer] (can be repeated multiple times) → Flatten → Fully Connected (FC) Layer → Output

Early stages capture low-level (local) features; later stages capture high-level (global) features.

• The output of each operation is known as a feature (activation) map.
• Feature map sizes get smaller as we proceed through the operations ≡ data reduction.
CNN: Convolution Layer
• In the convolution layer, a given input image is convolved with a filter (or a kernel).
• The filter depth and image depth are the same.
• The output is the sum of the dot products of the filter weights and the image values.

Example: a 32 x 32 x 3 image convolved with a 5 x 5 x 3 kernel.
Convolution Layer (cont’d)
What will be the size of the activation map?

Input image size is N x N, filter size is F x F, and the stride (or step) is S.

Without padding, the activation map size is:
(N − F)/S + 1

With zero-padding of P pixels on each side, the activation map size is:
(N + 2P − F)/S + 1

If S = 1 and there is no padding, the map size is N − F + 1. For example, a 32 x 32 image with a 5 x 5 filter gives 32 − 5 + 1 = 28, i.e., a 28 x 28 activation map.

Typical kernel sizes in modern CNNs are small (e.g., 3 x 3 or 5 x 5).
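As a quick check of the formula above, a tiny Python helper (illustrative only, not part of the slide) can compute the activation-map size:

def conv_output_size(n, f, stride=1, padding=0):
    # (N + 2P - F) / S + 1, using integer division for whole-pixel outputs
    return (n + 2 * padding - f) // stride + 1

print(conv_output_size(32, 5))            # 28, as in the 32x32 image / 5x5 filter example
print(conv_output_size(6, 3, stride=2))   # 2, as in the 6x6 image / 3x3 filter example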
Convolution Layer (cont’d)
• Convolve the same image with another filter, e.g., a second (green) one.
• Suppose we have 6 kernels; we will get 6 separate activation maps.
• Stacking the results of the six convolution operations gives a “new image” of size 28 x 28 x 6.
Activation Function
• The convolution layers are interspersed with activation functions to learn complex information.

Please see this link for Keras activation functions:
https://keras.io/api/layers/activations/
A Generic CNN Architecture
[Architecture diagram repeated: Input Data → Convolution (can be repeated multiple times) → Pooling Layer → Flatten → Output; this slide highlights the Pooling Layer.]
CNN: Pooling Layer
• It compresses the input into a lower-dimensional representation, thus making it more manageable.
• Like the convolution operation, a window (or patch) with a parametrized size and stride is slid over each activation map independently, and then a single value is selected (computed), i.e., the maximum (or average).
• Thus, it is as if we subsample the pixels to make the image smaller:
• The bird stays a bird.
• Fewer parameters are needed to characterize the image.

Image source: https://computersciencewiki.org/index.php/Max-pooling_/_Pooling


Pooling Layer (cont’d)
• What will be the size of the output of the pooling layer?

Input volume size is W1 x H1 x D1, the pooling window size is F, and the stride is S.

Without padding:
W2 = (W1 − F)/S + 1
H2 = (H1 − F)/S + 1
D2 = D1

With padding P:
W2 = (W1 + 2P − F)/S + 1
H2 = (H1 + 2P − F)/S + 1
D2 = D1

Image source: https://www.quora.com/What-is-max-pooling-in-convolutional-neural-networks

• This pooling step does NOT introduce any parameters, as it ONLY performs fixed computations on the activation map values.
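To make the “no parameters” point concrete, here is a minimal NumPy sketch of 2 x 2 max pooling with stride 2 (the function name and NumPy usage are illustrative assumptions); the operation only takes maxima over fixed windows, so there is nothing to learn.

import numpy as np

def max_pool2d(feature_map, f=2, stride=2):
    H, W = feature_map.shape
    out_h = (H - f) // stride + 1     # same size formula as above, with D2 = D1 per map
    out_w = (W - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = window.max()  # fixed computation: no learnable weights
    return out

fmap = np.array([[ 3, -1, -3, -1],
                 [-3,  1,  0, -3],
                 [-3, -3,  0,  1],
                 [ 3, -2, -2, -1]])
print(max_pool2d(fmap))  # 2x2 map: [[3, 0], [3, 1]]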
A Generic CNN Architecture
[Architecture diagram repeated: Input Data → Convolution (can be repeated multiple times) → Pooling → Flatten → Fully Connected (FC) Layer → Output; this slide highlights the FC layer.]
CNN: Fully Connected Layer
• Neurons in the FC layer have full connections to all activations in the previous layer, as in regular NNs.
• Their activations can hence be computed with a matrix multiplication followed by a bias offset.

[Figure: two activation maps (with values such as 4, -2, -3, 6, … and 3, 0, 2, -5, …) are flattened into a single input vector that feeds the FC layer, which produces the outputs y1, y2, y3.]
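A minimal NumPy sketch of the flatten + “matrix multiplication followed by a bias offset” computation described above; the map values, layer sizes, and variable names are illustrative assumptions.

import numpy as np

# Two illustrative 2x2 activation maps
map1 = np.array([[ 4, -2],
                 [-3,  6]])
map2 = np.array([[ 3,  0],
                 [ 2, -5]])

x = np.concatenate([map1.ravel(), map2.ravel()])  # flatten into one input vector (length 8)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))   # FC weights: 3 output neurons, fully connected to all 8 inputs
b = np.zeros(3)               # bias offset

y = W @ x + b                 # FC layer scores y1, y2, y3
print(y)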
A Generic CNN Architecture
[Architecture diagram repeated: Input Data → Convolution (can be repeated multiple times) → Pooling → Flatten → Output Layer; this slide highlights the Output Layer.]
CNN: Output Layer
• Neurons in the FC layer output real values (y_k).
• For multiclass classification, we need to determine the winning label (class) in a probabilistic manner.
• The Softmax converts the scores (y_k) into a normalized probability distribution.

[Figure: Convolution → Activation → Spatial Pooling (repeated) → Flatten → the FC layer computes the class scores y1, y2, y3 → Softmax activation function → classes C1, C2, C3, where Ci is class # i.]

The probability of each label is
P(C_i) = e^{y_i} / Σ_j e^{y_j}
so the probabilities sum to 1: Σ_i P(C_i) = 1.
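A minimal NumPy sketch of the softmax formula above (with the usual max-subtraction for numerical stability, which does not change the result); the example scores are illustrative.

import numpy as np

def softmax(scores):
    # Subtracting the max keeps exp() from overflowing; the ratio is unchanged
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

y = np.array([2.0, 1.0, 0.1])   # illustrative class scores y_k
p = softmax(y)
print(p, p.sum())               # probabilities P(C_i); they sum to 1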
CNN in Keras
• There are 25 filters, each 3 x 3.
• Input_shape = (28, 28, 1): 28 x 28 pixels, where 1 means greyscale (3 would mean RGB).
• The pooling window size is 2 x 2 and the stride is 2.

[Figure: the 3 x 3 filters slide over the input and produce activation maps, which are then max-pooled.]
Modified from: https://www.cs.bgu.ac.il/~dip182/wiki.files/Deep-Learning-2017-Lecture5CNN.pdf
CNN in Keras (cont’d)
Input data: 1 x 28 x 28
• Convolution (how many parameters for each filter? 9) → output: 25 x 26 x 26
• Max Pooling → 25 x 13 x 13
• Convolution (how many parameters for each filter? 225 = 25 x 9) → 50 x 11 x 11
• Max Pooling → 50 x 5 x 5
• Flattened → 1250 → fully connected feedforward network
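The pipeline above can be written as a short Keras model. The following is a minimal sketch under stated assumptions: the channels-last input shape (28, 28, 1), the ReLU activations, and the 10-class softmax head are illustrative choices not fixed by the slide.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                 # 28 x 28 greyscale image
    layers.Conv2D(25, (3, 3), activation='relu'),   # 9 weights per filter -> 26 x 26 x 25
    layers.MaxPooling2D((2, 2)),                    # -> 13 x 13 x 25
    layers.Conv2D(50, (3, 3), activation='relu'),   # 225 = 25 x 9 weights per filter -> 11 x 11 x 50
    layers.MaxPooling2D((2, 2)),                    # -> 5 x 5 x 50
    layers.Flatten(),                               # -> 1250
    layers.Dense(10, activation='softmax'),         # assumed 10-class output head
])
model.summary()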
CNN Design Choices and Issues
• Data preprocessing
• Number of layers and transfer learning
• Loss function
• Optimization
• Regularization and data scarcity
CNN Design: Data Preprocessing
Classical (Traditional) ML pipeline: Input Data → Preprocessing & Feature Extractor → Traditional Machine Learning → Results or Output

Deep Learning pipeline: Input Data → Preprocessing → Deep Learning Algorithm → Results or Output
Data Preprocessing: Medical Application
Data Acquisition → Medical Data → Medical Image Analysis → Results

• Data Acquisition: imaging techniques (MRI, CT, PET, etc.) producing data in formats such as DICOM, NIFTI, BMP, JPEG, etc.
• Medical Data: lung images, kidney images, cardiac images, etc.
• Medical Image Analysis: an ML / DL analysis framework, using software packages, your own developed software, or software available from machine vendors.
• Results: identification, measurement, and/or judgment, e.g., detection of abnormalities (e.g., tumor), assessment of functionality (e.g., kidney function), follow-up on treatment (e.g., cancer recurrence), and classification/diagnosis (benign or malignant).
Preprocessing: Examples
[Figure: (1) a high-pass filtered and equalized image, (2) a denoised image and a bias-corrected image.]

[1] Digital Image Processing, C. Gonzalez, 3rd edition, chapter 4, Fig. 4.59, pp. 290
[2] https://www.nitrc.org/project/list_screenshots.php?group_id=806&screenshot_id=738
CNN Design: Loss Function
• During learning, an optimization strategy is used to minimize the error (loss or cost) function of the network.

Regression problems:
• Mean Squared Error (MSE)
• Mean Squared Logarithmic Error (MSLE)
• Mean Absolute Error (MAE)

Binary classification problems:
• Binary Cross Entropy (BCE)
• Hinge
• Squared Hinge

Multiclass classification problems:
• Categorical Cross Entropy (CCE)
• Sparse CCE (SCCE)
• Kullback-Leibler (KL) Divergence

For more details and examples, please see: https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
For more details about loss functions in Keras, please see: https://keras.io/api/losses/
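As an illustration of how these loss choices appear in practice, here is a minimal Keras sketch; the optimizer, the metric, and the use of categorical cross entropy for a multiclass model are assumptions for illustration.

# Assumes `model` is a multiclass classifier ending in a softmax layer,
# e.g., the CNN sketched in the "CNN in Keras" section above.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',            # CCE: multiclass, one-hot encoded labels
    # loss='sparse_categorical_crossentropy',   # SCCE: integer labels instead of one-hot
    # loss='binary_crossentropy',               # BCE: binary classification
    # loss='mse',                               # MSE: regression
    metrics=['accuracy'],
)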
CNN Design: Regularization
• Regularization is any technique that is employed to prevent overfitting and/or help the optimization.
• Examples: data augmentation, early stopping, dropout, batch normalization.
• Dropout is a regularization technique that is used to prevent overfitting and thus enhance model generalizability.
• It works by randomly switching off a percentage of the neurons of the DL network.

https://analyticsindiamag.com/everything-you-should-know-about-dropouts-and-batchnormalization-in-cnn/
Regularization: Batch Normalization
• Batch normalization (BN) is a general technique that can be used to normalize the inputs to a layer:

ẑ = (z − m_z) / s_z

where m_z is the mean of the neuron’s output and s_z is its standard deviation.

• BN is used before the activation.
• Using BN makes the network more stable during learning; this may allow the use of larger learning rates than normal, which in turn speeds up the learning process.
• It is done along mini-batches instead of the full data set.
• It may NOT be combined with dropout, as it performs a similar task.

https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/
https://www.baeldung.com/cs/batch-normalization-cnn
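A minimal Keras sketch of where these layers typically sit in a convolutional block, following the slide’s “BN before the activation” ordering; the filter count, dropout rate, and overall layer choices are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

block = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3)),        # no activation here ...
    layers.BatchNormalization(),      # ... BN normalizes the pre-activations per mini-batch
    layers.Activation('relu'),        # activation applied after BN
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),             # randomly switch off 25% of the activations
])
block.summary()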
Regularization: Data Augmentation
• Efficient DL requires a large number of samples for training.
• A small number of samples yields overfitting.
• Unbalanced classes cause large errors even with high accuracy.
• With scarce samples, data augmentation is used: an approach to increase the number of samples by adding slightly modified copies of already existing ones.

Image Source: https://medium.com/secure-and-private-ai-writing-challenge/data-augmentation-increases-accuracy-of-your-model-but-how-aa1913468722
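A minimal Keras sketch of adding slightly modified copies through random transformations, assuming a recent TensorFlow/Keras version where the Random* preprocessing layers are available; the specific transformations and their parameters are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

augment = keras.Sequential([
    layers.RandomFlip('horizontal'),      # mirrored copies
    layers.RandomRotation(0.05),          # small random rotations
    layers.RandomTranslation(0.1, 0.1),   # small random shifts
])

# Applied to a batch of images during training, e.g.:
# augmented_batch = augment(image_batch, training=True)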


Assignments
• Assignment 1: Design your own deep NN to classify the CIFAR10 images (you can download them from keras.datasets) into one of the 10 classes.
• Investigate the use of different architectures (different layers, learning rates, optimizers, loss functions).
• Note: you will need to flatten each image and use it as your input vector.

• Assignment 2: Design your own deep convolutional neural network (CNN) to classify the CIFAR10 images into one of the 10 classes.
• Investigate the use of different architectures (different layers, kernel sizes, pooling, learning rates, optimizers, loss functions).

• Assignment 3: Repeat Assignments #1 and #2 using the MNIST dataset.

Note that you will need to convert the training labels into categorical (one-hot encoded) form using the to_categorical() function.
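As a starting point for the assignments, here is a minimal sketch of loading CIFAR10 and one-hot encoding the labels; the [0, 1] normalization is an assumption, not a requirement of the assignment.

from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

x_train = x_train.astype('float32') / 255.0   # scale pixel values to [0, 1]
x_test = x_test.astype('float32') / 255.0

y_train = to_categorical(y_train, 10)          # one-hot encode the 10 class labels
y_test = to_categorical(y_test, 10)

print(x_train.shape, y_train.shape)            # (50000, 32, 32, 3) (50000, 10)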
Thank You & Questions
