
Lecture 3: Convolutional Neural Networks (CNN)

CS460: Deep Learning


Deep Learning (DL)

"Think of them as deep neural networks"

 Multilayer Perceptron (MLP) is considered a relatively shallow model, with one or two hidden layers.
 DL models have many more layers than MLPs.
 The bottleneck for stacking more layers has been the "vanishing gradient problem".
The vanishing gradient problem

 A difficulty found in MLPs that use the sigmoid activation function and are trained with gradient-descent methods and backpropagation.
 In such methods, each of the neural network's weights receives an update proportional to the partial derivative of the error function with respect to that weight in each iteration of training.
 The problem is that in some cases the gradient becomes very small due to the "chain rule", effectively preventing the weights in the "front" (early) layers from changing their values.
 In the worst case, this may completely stop the neural network from further learning.
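To see the chain-rule effect numerically: the derivative of the sigmoid never exceeds 0.25, so the product of derivatives accumulated across many layers shrinks toward zero. A minimal NumPy sketch (the 20-layer depth and the evaluation point are illustrative assumptions, not from the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # never exceeds 0.25

# Backprop multiplies one sigmoid derivative per layer (chain rule).
# Even ignoring the weights, the product shrinks geometrically.
grad = 1.0
for _ in range(20):                 # 20 hidden layers (illustrative)
    grad *= sigmoid_prime(0.0)      # 0.25 at the point of steepest slope
print(grad)                         # ~9.1e-13: front layers barely update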
DL and Data Science

 Scalability of neural networks: results get better with more data and larger models, which in turn require more computation to train.
DL and Feature Engineering

 Automated Feature Learning: the ability to perform automatic feature extraction from raw data.
 Hierarchical Feature Learning: the ability to provide different levels of abstraction of the data.
CNN for vision
NN and Vision

 MNIST example
NN and Vision

 For computer vision, why can't we just flatten the image and feed it through a traditional NN such as an MLP?
 Images are high-dimensional vectors. It would take a huge number of parameters to characterize the network.
 (# of parameters = 784*15 + 15 = 11,775 for the MNIST example with 15 hidden units)
 CNNs were proposed to reduce the number of parameters and to adapt the network architecture specifically to vision tasks.
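To make the parameter argument concrete, here is a small sketch comparing the flattened-MLP count above with a single shared convolution filter (the 3x3 single-filter layer is an illustrative assumption):

# Fully connected: every hidden unit sees all 784 pixels.
n_inputs, n_hidden = 28 * 28, 15
mlp_params = n_inputs * n_hidden + n_hidden     # weights + biases
print(mlp_params)                               # 11775

# Convolutional: one 3x3 filter is reused across the whole image.
filter_h, filter_w = 3, 3
conv_params = filter_h * filter_w + 1           # weights + 1 bias
print(conv_params)                              # 10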
Traditional Recognition Approach
CNN Layers
Convolutional Neural Networks (CNN)

 A CNN is a class of deep, feed-forward artificial neural networks that is applied to analyzing visual imagery.
Convolution Layer

 The # of output feature maps is usually larger than the # of input feature maps.
Convolution Layer: Related Terms
 Filter: a mask/window holding the learned weights that are convolved with the image. Its size specifies the patch, or receptive field, of the image.
 Feature Map: the output of one filter applied to the previous layer.
 Stride: the distance (number of rows and columns) that the filter is moved across the input from its previous location.
 Padding: inventing mock inputs for the receptive field for the filter to read, in case the filter attempts to read off the edge of the input feature map.
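The terms above can be tied together in a minimal single-channel NumPy sketch (the function name and values are illustrative, not from the lecture):

import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    """Slide a square filter over a 2-D image with the given stride and zero padding."""
    if pad:
        image = np.pad(image, pad)              # zero padding around the border
    H, W = image.shape
    F = kernel.shape[0]
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    fmap = np.zeros((out_h, out_w))             # the resulting feature map
    for i in range(out_h):                      # move the receptive field
        for j in range(out_w):
            patch = image[i*stride:i*stride+F, j*stride:j*stride+F]
            fmap[i, j] = np.sum(patch * kernel)
    return fmap

img = np.random.rand(7, 7)
k = np.ones((3, 3))
print(conv2d(img, k, stride=2).shape)           # (3, 3)
print(conv2d(img, k, stride=1, pad=1).shape)    # (7, 7): size preserved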
Spatial Dimensions

 7x7 input (spatially), 3x3 filter, stride 1 => 5x5 output
 7x7 input (spatially), 3x3 filter applied with stride 2 => 3x3 output!
 7x7 input (spatially), 3x3 filter applied with stride 3? Doesn't fit! A 3x3 filter cannot be applied to a 7x7 input with stride 3.
Spatial Dimensions

 Output size: (N - F) / stride + 1
 e.g. N = 7, F = 3:
   stride 1 => (7 - 3)/1 + 1 = 5
   stride 2 => (7 - 3)/2 + 1 = 3
   stride 3 => (7 - 3)/3 + 1 = 2.33
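The rule can be packaged as a small helper; adding the padding P used on the next slide gives the general form (N - F + 2P)/stride + 1. A quick check against the examples above (the helper itself is an illustrative sketch):

def conv_output_size(N, F, stride=1, pad=0):
    """(N - F + 2*pad)/stride + 1; a non-integer result means the filter doesn't fit."""
    return (N - F + 2 * pad) / stride + 1

print(conv_output_size(7, 3, stride=1))   # 5.0
print(conv_output_size(7, 3, stride=2))   # 3.0
print(conv_output_size(7, 3, stride=3))   # 2.333... -> stride 3 doesn't fit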
Padding

 Input 7x7, 3x3 filter applied with stride 1, padded with a 1-pixel border => what is the output?
 7x7 output!
 In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding of (F-1)/2 (this preserves the size spatially).
   F = 3 => zero pad with 1
   F = 5 => zero pad with 2
   F = 7 => zero pad with 3
Weight Sharing

 The concept by which the CNN achieves translation invariance.
 Based on the assumption that if one feature is useful to compute at some spatial position (x, y), then it should also be useful to compute at a different position (x2, y2).
 It constrains the neurons in each depth slice to use the same weights and bias across the whole image.
 However, it is possible to relax the parameter-sharing scheme and instead simply call the layer a Locally-Connected Layer.
Weight Sharing

 In practice, the weight update is performed concurrently through parallelization algorithms and special hardware called the Graphics Processing Unit (GPU).
 GPUs have hundreds of simpler cores and thousands of hardware threads that are applied to image regions at the same time.
Number of parameters

 Input volume: 32x32x3; 10 filters of size 5x5 with stride 1, pad 2.
 Number of parameters in this layer?
   Each filter has 5*5*3 + 1 = 76 params (+1 for the bias) => 76*10 = 760
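The same count can be verified with a framework layer; a minimal PyTorch sketch (PyTorch is an assumed dependency, not part of the lecture):

import torch.nn as nn

# 10 filters of size 5x5 over a 3-channel input, stride 1, pad 2
conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5,
                 stride=1, padding=2)

n_params = sum(p.numel() for p in conv.parameters())
print(n_params)   # 10 * (5*5*3 + 1) = 760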
Hierarchy of Convolution Layers
Activation Layer

 After each conv layer, it is conventional to apply a nonlinear function.
 In the past, nonlinear functions like tanh and sigmoid were used, but researchers found that ReLU layers work far better because the network trains a lot faster (due to their computational efficiency) without a significant difference in accuracy. ReLU also helps to alleviate the vanishing gradient problem.
 With a purely linear mapping, a high level of abstraction/generalization would not be possible. Hence, to map a class of images into a manifold of feature vectors, we need a nonlinear activation; without it, it would be very difficult to generalize, since pictures in a class can have too much intra-class variation.
Activation Layer

 ReLU (Rectified Linear Unit): f(x) = max(0, x)
Pooling Layer

 It down-samples the previous layer's feature map.
 Pooling layers follow a sequence of one or more convolutional layers.
 It may be considered a technique to compress or generalize feature representations, and it generally reduces the model's overfitting of the training data.
 They too have a receptive field, often much smaller than that of the convolutional layer. Also, the stride (the number of inputs the receptive field is moved for each activation) is often equal to the size of the receptive field, to avoid any overlap.
 Pooling layers are often very simple, taking the average or the maximum of the input values in order to produce the new feature map.
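A minimal PyTorch sketch of max pooling with a 2x2 receptive field and a matching stride, so the windows do not overlap (tensor shapes are illustrative):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)   # stride == window size: no overlap

fmap = torch.randn(1, 10, 28, 28)              # (batch, channels, H, W)
print(pool(fmap).shape)                        # torch.Size([1, 10, 14, 14])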
Pooling Layer
Dropout Layer

 Probabilistically dropping out (ignoring) nodes in the network is a simple and effective regularization method.
 It offers a very computationally cheap and remarkably effective way to reduce overfitting and improve generalization in deep neural networks of all kinds.
 Dropout has the effect of making the training process noisy, forcing nodes within a layer to probabilistically take on more or less responsibility for the inputs.
 It encourages the network to actually learn a sparse representation.
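A minimal PyTorch sketch of a dropout layer (the drop probability of 0.5 is an illustrative choice); note that the layer is active only in training mode:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each activation is zeroed with probability 0.5

x = torch.ones(1, 8)
drop.train()
print(drop(x))             # roughly half the entries zeroed, survivors scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))             # at inference dropout is a no-op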
Dropout Layer
Fully Connected Layer
 is the normal flat feed-forward neural network layer.
 is preceded by a flatten procedure.
 Contains neurons that connect to the entire input
volume, as in ordinary Neural Networks
 Spatial information is lost at this phase
 These layers may have a non-linear activation
function or a softmax activation in order to output
probabilities of class predictions.
 Fully connected layers are used at the end of the
network after feature extraction and consolidation has
been performed by the convolutional and pooling
layers.
 They are used to create final non-linear combinations
of features and for making predictions by the network.
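A minimal PyTorch sketch of the flatten-then-fully-connected stage described above (the feature-map size, hidden width, and 10-class output are illustrative assumptions):

import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Flatten(),                 # spatial structure is discarded here
    nn.Linear(10 * 14 * 14, 64),  # fully connected layer
    nn.ReLU(),
    nn.Linear(64, 10),            # one output per class (logits for the softmax)
)

fmap = torch.randn(1, 10, 14, 14)   # output of the conv/pool stack (illustrative)
print(head(fmap).shape)             # torch.Size([1, 10])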
Soft-max Layer
 A softmax function is a type of squashing function that limits the output of the function to the range 0 to 1.
 This allows the output to be interpreted directly as a probability. Softmax functions can be seen as multi-class sigmoids, meaning they are used to determine the probability of multiple classes at once.
 Since the outputs of a softmax function can be interpreted as probabilities, a softmax layer is typically the final layer of the network.
 It is important to note that a softmax layer must have the same number of nodes as the output layer.
 It allows for the calculation of the error.
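A minimal NumPy sketch of the softmax squashing (the max subtraction is a standard numerical-stability trick; the example scores are illustrative):

import numpy as np

def softmax(z):
    z = z - np.max(z)                  # numerical stability; doesn't change the result
    e = np.exp(z)
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])     # raw class scores (logits)
p = softmax(scores)
print(p)                               # ~[0.659 0.242 0.099]
print(p.sum())                         # 1.0 -> interpretable as class probabilities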
Transfer Learning

 A technique that reuses a finished (trained) Deep Learning model in another, more specific task.
 A pretrained CNN is used to process data from a different dataset than the one it was trained on.
 The learned parameters are used as they are.
 Sometimes, some further training is applied to fine-tune the CNN.
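A minimal transfer-learning sketch using torchvision (assumptions: torchvision >= 0.13 is available, ResNet-18 is the reused backbone, and the new task has 10 classes; none of this is specified in the lecture):

import torch.nn as nn
from torchvision import models

# Reuse a CNN pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Use the learned parameters as they are: freeze them.
for p in model.parameters():
    p.requires_grad = False

# Replace the final classifier for the new, more specific task (10 classes, illustrative).
model.fc = nn.Linear(model.fc.in_features, 10)

# Fine-tuning would then train only model.fc (or selectively unfreeze deeper layers).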
Data Augmentation

 Artificially making the dataset larger.
 Done by applying a collection of simple image transformations to the already included images, yielding new ones, such as: grayscaling, horizontal flips, vertical flips, random crops, color jitter, translations, and rotations.
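A minimal torchvision sketch covering the transformations listed above (all parameter values are illustrative assumptions):

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomGrayscale(p=0.1),                           # grayscaling
    transforms.RandomHorizontalFlip(),                           # horizontal flips
    transforms.RandomVerticalFlip(),                             # vertical flips
    transforms.RandomCrop(28, padding=4),                        # random crops
    transforms.ColorJitter(brightness=0.2, contrast=0.2),        # color jitter
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),   # rotations + translations
    transforms.ToTensor(),
])
# Applied on the fly during training, so each epoch sees slightly different versions of every image.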
Challenges to CNNs

 A black box: operates in the paradigm of non-explainable AI, with the exception of visualizing output structures at intermediate levels.
 The application of CNNs in unsupervised settings is still lagging behind.
 Limitations to context reasoning.
 Not invariant to some non-affine transformations.
Famous CNNs Listing

 LeNet. The first successful application of Convolutional Networks, developed by Yann LeCun in the 1990s.
 AlexNet. The first work that popularized Convolutional Networks in Computer Vision. AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the second runner-up (top-5 error of 16% compared to the runner-up's 26%). The network had a very similar architecture to LeNet, but was deeper, bigger, and featured Convolutional Layers stacked on top of each other.
 ZF Net. The ILSVRC 2013 winner. It was an improvement on AlexNet by tweaking the architecture hyperparameters, in particular by expanding the size of the middle convolutional layers and making the stride and filter size on the first layer smaller.
 GoogLeNet. The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. from Google. Its main contribution was the development of an Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, this paper uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large number of parameters that do not seem to matter much.
 VGGNet. The runner-up in ILSVRC 2014. Its main contribution was in showing that the depth of the network is a critical component for good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from the beginning to the end.
 ResNet. Residual Network was the winner of ILSVRC 2015. It features special skip connections and heavy use of batch normalization. The architecture is also missing fully connected layers at the end of the network.
CNN for Semantic Segmentation
What is semantic segmentation?

 A technique to provide fine-grained, pixel-wise labelling of the image at hand.
 Used in scene understanding.
 Traditionally approached via:
   Image segmentation
   Region-level classification
 Recent approaches try to directly adopt deep architectures designed for category prediction to pixel-level labelling.
CNN for semantic segmentation

A general semantic segmentation architecture can be broadly thought of as an encoder network followed by a decoder network:
 The encoder is usually a pre-trained classification network like VGG/ResNet, followed by a decoder network.
 The task of the decoder is to semantically project the discriminative features (lower resolution) learnt by the encoder onto the pixel space (higher resolution) to get a dense classification.
R-CNN
DeConvNet
DeConvNet - Unpooling

 Pooling in a convolution network is designed to filter noisy activations in a lower layer by abstracting the activations in a receptive field with a single representative value.
 Unpooling layers in the deconvolution network perform the reverse operation of pooling and reconstruct the original size of the activations.
 To implement the unpooling operation, the algorithm records the locations of the maximum activations selected during the pooling operation in variables, which are employed to place each activation back in its original pooled location.
 This unpooling strategy is particularly useful for reconstructing the structure of the input object.
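PyTorch exposes exactly this mechanism: MaxPool2d can return the indices of the maxima, and MaxUnpool2d places activations back at those recorded locations. A minimal sketch (shapes are illustrative):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)   # remember where the maxima were
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 1, 4, 4)
y, idx = pool(x)              # y: 1x1x2x2, idx: locations of the maxima
x_rec = unpool(y, idx)        # 1x1x4x4, sparse: maxima restored, everything else zero
print(x_rec.shape)            # torch.Size([1, 1, 4, 4])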
DeConvNet - Deconvolution

 The output of an unpooling layer is an enlarged, yet sparse, activation map.
 The deconvolution layers densify the sparse activations obtained by unpooling through convolution-like operations with multiple learned filters.
 However, contrary to convolutional layers, which connect multiple input activations within a filter window to a single activation, deconvolutional layers associate a single input activation with multiple outputs.
 The output of the deconvolutional layer is an enlarged and dense activation map.
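In current frameworks this operation is commonly implemented as a transposed convolution; a minimal PyTorch sketch that fans each input activation out to a 2x2 output window (channel counts are illustrative):

import torch
import torch.nn as nn

# One input activation contributes to a 2x2 output window (stride-2 upsampling).
deconv = nn.ConvTranspose2d(in_channels=64, out_channels=32,
                            kernel_size=2, stride=2)

sparse = torch.randn(1, 64, 16, 16)    # e.g. an unpooled, enlarged activation map
dense = deconv(sparse)
print(dense.shape)                     # torch.Size([1, 32, 32, 32])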
Fully Convolutional Network-Based Semantic Segmentation (FCN)

 Learns a mapping from pixels to pixels, without extracting region proposals.
 The FCN pipeline is an extension of the classical CNN.
 Contrary to the classical CNN, FCNs do not have fully-connected layers; they only have convolutional and pooling layers, which gives them the ability to make predictions on arbitrary-sized inputs.
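Because every layer is convolutional, the same network accepts inputs of any spatial size; a minimal PyTorch sketch of a tiny fully convolutional stack (channel counts and the 5-class output are illustrative assumptions):

import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 5, 1),    # 1x1 conv acts as a per-pixel classifier (5 classes)
)

print(fcn(torch.randn(1, 3, 64, 64)).shape)    # torch.Size([1, 5, 32, 32])
print(fcn(torch.randn(1, 3, 96, 128)).shape)   # torch.Size([1, 5, 48, 64]) -- arbitrary sizes work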
FCN
 One issue with this specific FCN is that, by propagating through several alternating convolutional and pooling layers, the resolution of the output feature maps is down-sampled. Therefore, the direct predictions of the FCN are typically in low resolution, resulting in relatively fuzzy object boundaries.
 A variety of more advanced FCN-based approaches have been proposed to address this issue, including SegNet, DeepLab-CRF, and Dilated Convolutions.
