Intro DL 02
1 Convolution Operation
2 CNNs: Overview
3 Architecture
Convolutional Layers
Pooling Layers
Activation Layers
Case Study: VGG Network
Visualizing what CNNs Learn
Section 1
Convolution Operation
Convolution operation
Discrete version:
(f ⊛ g)[n] = ∑_{m=−∞}^{+∞} f[m] g[n − m]
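As a quick sanity check, the discrete convolution above can be computed directly with NumPy; a minimal sketch (the array contents are arbitrary example values):

import numpy as np

f = np.array([1, 2, 3])       # example signal
g = np.array([0, 1, 0.5])     # example kernel

# 'full' mode computes (f ⊛ g)[n] = sum_m f[m] g[n − m] for every n
# at which the two signals overlap.
out = np.convolve(f, g, mode='full')
print(out)                    # [0.  1.  2.5 4.  1.5]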
Example
Properties
Commutativity: f ⊛ g = g ⊛ f
Associativity: (f ⊛ g) ⊛ h = f ⊛ (g ⊛ h)
Distributivity: f ⊛ (g + h) = f ⊛ g + f ⊛ h
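These properties are easy to verify numerically; a small check with arbitrarily chosen example arrays:

import numpy as np

f = np.array([1., 2., 3.])
g = np.array([0., 1., 0.5])
h = np.array([2., -1., 0.])

# Commutativity, associativity and distributivity of discrete convolution
assert np.allclose(np.convolve(f, g), np.convolve(g, f))
assert np.allclose(np.convolve(np.convolve(f, g), h),
                   np.convolve(f, np.convolve(g, h)))
assert np.allclose(np.convolve(f, g + h),
                   np.convolve(f, g) + np.convolve(f, h))
print("all properties hold")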
Section 2
CNNs: overview
CNNs are made up of neurons with learnable weights and biases, just like the networks we have already seen. However, CNNs make the explicit assumption that their inputs are images, which allows encoding this structure directly into the architecture.
Section 3
Architecture
CNN Architecture
Remark: in the following we’ll use the word depth to indicate the number of channels of an activation volume. This has nothing to do with the depth of the whole network, which usually refers to the total number of layers in the network.
CNN Architecture
A ”real-world” CNN is made up of many layers stacked one on top of the other.
Each convolutional layer is characterized by the following hyperparameters:
Number of filters N
Kernel size K, the spatial size of the filters being convolved
Filter stride S, the factor by which to downscale
The presence and amount of spatial padding P on the input volume may be considered an additional hyperparameter. In practice, padding is usually performed to avoid the headaches caused by convolutions ”eating the borders”.
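To make the hyperparameters concrete, here is how they map onto a standard convolutional layer in PyTorch; the specific values N=32, K=3, S=1, P=1 are just an illustration:

import torch
import torch.nn as nn

N, K, S, P = 32, 3, 1, 1          # number of filters, kernel size, stride, padding
conv = nn.Conv2d(in_channels=3, out_channels=N, kernel_size=K, stride=S, padding=P)

x = torch.randn(1, 3, 224, 224)   # a batch with one 224x224 RGB image
y = conv(x)
print(y.shape)                    # torch.Size([1, 32, 224, 224]) with this padding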
Visualizing Convolution 2D
Convolution
So far, our networks compute a nonlinear function of the inputs. If we work with images, such a network has to learn the spatial structure of the data by itself, which takes a long time.
Convolution
Example of hand-crafted 3x3 kernels (the classic Sobel edge-detection filters):
[ −1 −2 −1 ]   [ −1 0 1 ]
[  0  0  0 ]   [ −2 0 2 ]
[  1  2  1 ]   [ −1 0 1 ]
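As an illustration, such a hand-crafted kernel can be convolved with an image using SciPy; a minimal sketch (the image here is just random noise so the example runs standalone):

import numpy as np
from scipy.signal import convolve2d

sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

image = np.random.rand(64, 64)             # placeholder grayscale image
edges = convolve2d(image, sobel_y, mode='same')
print(edges.shape)                         # (64, 64)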
Looking closer, neurons in a CNN perform the very same operation as the neurons we already know from DNNs:
∑_i wi xi + b
Parameter Compatibility
If I is the length of the input volume along one dimension, F the length of the filter, Pstart and Pend the amounts of zero padding, and S the stride, then the output size O of the feature map along that dimension is given by:
O = (I − F + Pstart + Pend)/S + 1
source: https://stanford.edu/~shervine/teaching/cs-230
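The formula is easy to wrap in a small helper; a sketch (integer division assumes the hyperparameters are compatible with the input size):

def output_size(I, F, P_start, P_end, S):
    """Output length along one dimension of a convolution."""
    return (I - F + P_start + P_end) // S + 1

# Example: 7-pixel input, 3-pixel filter, no padding, stride 2 -> 3 outputs
print(output_size(7, 3, 0, 0, 2))   # 3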
Convolutional Layers: Parameter Sharing
Assumption: if a feature is useful to compute at some spatial location (x, y), then it should also be useful to compute it at other locations (xi , yi ). Thus, we constrain the neurons in each depth slice to use the same weights and bias.
If all neurons in a single depth slice are using the same weight vector, then the
forward pass of the convolutional layer can in each depth slice be computed as a
convolution of the neuron’s weights with the input volume (hence the name).
This is why it is common to refer to each set of weights as a filter (or a kernel),
that is convolved with the input.
Convolutional Layers: Parameter Sharing
Example of weights learned by [?]. Each of the 96 filters shown here is of size [11x11x3],
and each one is shared by the 55*55 neurons in one depth slice. Notice that the
parameter sharing assumption is relatively reasonable: If detecting a horizontal edge is
important at some location in the image, it should intuitively be useful at some other
location as well due to the translationally-invariant structure of images.
Convolution
Image from http://www.matthewzeiler.com/pubs/arxive2013/eccv2014.pdf
Convolutional Layers: Number of Learnable Parameters
tot_learnable = N ∗ K ∗ K ∗ C1 + N
Explanation: there are N filters convolving over the input volume. The connectivity is local in width and height, but extends through the full depth of the input volume, so there are K ∗ K ∗ C1 parameters for each filter. Furthermore, each filter has an additive learnable bias.
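The count can be cross-checked against an actual layer, e.g. in PyTorch; the values N=96, K=11, C1=3 mirror the AlexNet example above:

import torch.nn as nn

N, K, C1 = 96, 11, 3
conv = nn.Conv2d(in_channels=C1, out_channels=N, kernel_size=K)

n_params = sum(p.numel() for p in conv.parameters())
print(n_params)                    # 34944
print(N * K * K * C1 + N)          # 34944, matching the formula above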
Pooling Layers: overview
Pooling layers spatially subsample the input volume.
Each depth slice of the input is processed independently.
Two hyperparameters: the pool size K and the pool stride S.
Pooling Layers: types
Pooling Layers: why
Most common configuration: pool size K = 2x2, stride S = 2. In this setting 75% of the input volume activations are discarded (each 2x2 window keeps only one activation out of four).
Pooling Layers: why not
There’s a lot of research on getting rid of pooling layers while maintaining their benefits (e.g. [?, ?]). We’ll see whether future architectures will still feature pooling layers.
Activation Layers
Activation Layers
ReLU wins
ReLU was found to greatly accelerate the convergence of SGD compared to sigmoid/tanh activations [?]. Furthermore, ReLU can be implemented by a simple threshold, whereas other activations require more expensive operations.
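The ”simple threshold” remark can be seen directly in code; a minimal NumPy version of ReLU:

import numpy as np

def relu(x):
    # ReLU is just an element-wise threshold at zero
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0.  0.  0.  1.5]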
Why use non-linear activations at all?
The composition of linear functions is itself a linear function. Without nonlinearities, neural networks would reduce to one-layer logistic regression.
Computing Output Volume Size
Convolutional layer: given an input volume of size H1 x W1 x C1 , the output of a convolutional layer with N filters of spatial size K, stride S and padding P is a volume of shape H2 x W2 x C2 , where:
H2 = (H1 − K + 2P)/S + 1
W2 = (W1 − K + 2P)/S + 1
C2 = N
Computing Output Volume Size
Pooling layer: given an input volume of size H1 x W1 x C1 , the output of a pooling layer with pool size K and pool stride S is a volume of shape H2 x W2 x C2 , where:
H2 = (H1 − K)/S + 1
W2 = (W1 − K)/S + 1
C2 = C1
Activation layers operate element-wise, so they leave the volume shape unchanged:
H2 = H1
W2 = W1
C2 = C1
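Both rules are easy to encode as small helpers; a sketch of the shape bookkeeping (square windows assumed, as in the formulas above):

def conv_output_shape(H1, W1, C1, N, K, S, P):
    H2 = (H1 - K + 2 * P) // S + 1
    W2 = (W1 - K + 2 * P) // S + 1
    return H2, W2, N                 # depth becomes the number of filters

def pool_output_shape(H1, W1, C1, K, S):
    H2 = (H1 - K) // S + 1
    W2 = (W1 - K) // S + 1
    return H2, W2, C1                # depth is unchanged

# Example: 224x224x3 input, 64 filters of size 3, stride 1, padding 1,
# followed by 2x2 max pooling with stride 2
print(conv_output_shape(224, 224, 3, 64, 3, 1, 1))   # (224, 224, 64)
print(pool_output_shape(224, 224, 64, 2, 2))         # (112, 112, 64)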
ADVANCED CNN ARCHITECTURES
More complex CNN architectures have recently been demonstrated to perform
better than the traditional conv -> relu -> pool stack architecture.
These architectures usually feature different graph topologies and much more intricate connectivity structures.
Convolutional neural network
VGG
VGG [?] indicates a deep convolutional network for image recognition developed and trained in 2014 by the Oxford Visual Geometry Group.
Performance of the network is (was) great. In 2014 the VGG team secured the first and the second places in the localization and classification challenges on ImageNet;
Pre-trained weights were released in Caffe [?] and converted by the deep learning community to a variety of other frameworks;
The authors’ architectural choices led to a very neat network model, subsequently taken as a guideline for a number of later works.
VGG16 Architecture
Input: fixed-size 224x224 RGB images. For training, images are pre-processed by subtracting the mean RGB value computed on the training set.
Spatial pooling is carried out by five max pooling layers, each performed over a 2x2 pixel window with stride 2.
The fully connected layers feature 4096 neurons each, followed by ReLU. The very last fully connected layer is composed of 1000 neurons (as many as the ImageNet classes) and is followed by a softmax activation.
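As a rough sketch of how these choices translate into code, here is the beginning of a VGG16-style stack in PyTorch. This is an illustrative reconstruction, not the authors’ original code; the 3x3 convolutions with stride 1 and padding 1 are assumed from the standard VGG16 configuration:

import torch.nn as nn

vgg16_head = nn.Sequential(
    # Block 1: two 3x3 convs with 64 filters, then 2x2 max pooling
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    # Block 2: two 3x3 convs with 128 filters, then 2x2 max pooling
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    # ... three more blocks with 256, 512 and 512 filters ...
)

classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),            # one output per ImageNet class
    nn.Softmax(dim=1),
)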
VGG16 Computational Footprint
Each image takes approximately 93 MB of memory for the forward pass (about 24 million float32 activations). As a rule of thumb, the backward pass consumes roughly double the resources.
Most of the memory usage is due to the first layers of the network.
The Myth of Interpretability
Convolutional neural networks have often been criticized for their lack of interpretability [?]. The main objection is that we are dealing with big, complex black boxes that give correct results even though we have no clue of what is happening inside.
The Myth of Interpretability
On the other side, linear models and decision trees are often presented as ”champions” of interpretability. The debate on whether a logistic regression is more or less interpretable than a deep network is complex and out of the scope of this lecture.
Visualizing Activations
Visualizing activations of the network during the forward pass is straightforward
and can be useful to detect dead filters (i.e. activations that are zero whatever
the input).
Activations on the 1st conv layer (left), and the 5th conv layer (right) of a trained
AlexNet looking at a picture of a cat. Every box shows an activation map corresponding
to some filter. Notice that the activations are sparse and mostly local.
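In modern frameworks this can be done with forward hooks; a minimal sketch in PyTorch (the torchvision AlexNet is used here just as a stand-in for the trained network from the figure):

import torch
import torchvision

model = torchvision.models.alexnet()          # untrained weights are enough for a demo
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register a hook on the first convolutional layer
model.features[0].register_forward_hook(save_activation("conv1"))

x = torch.randn(1, 3, 224, 224)               # placeholder input image
model(x)
print(activations["conv1"].shape)             # torch.Size([1, 64, 55, 55])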
Inspecting Weights
Visualizing the learned weights is another common strategy to get insight into what the network looks for in the images. The most interpretable weights are the ones learned by the first convolutional layer, which operates directly on the image pixels.
Partially Occluding the Images
t-SNE Embedding
CNNs can be interpreted as gradually transforming the images into a representation in which the classes are separable by a linear classifier. We can get a rough idea of the topology of this space by embedding the images into two dimensions, so that pairwise distances in the low-dimensional representation approximately match those in the high-dimensional one. Shown here: a t-SNE embedding of a set of images.
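A typical recipe is to take the activations of a late fully connected layer as image features and feed them to an off-the-shelf t-SNE implementation; a sketch with scikit-learn (random features stand in for real CNN codes):

import numpy as np
from sklearn.manifold import TSNE

# Placeholder for CNN features, e.g. 500 images x 4096-d fc activations
features = np.random.rand(500, 4096)

embedding = TSNE(n_components=2, perplexity=30).fit_transform(features)
print(embedding.shape)            # (500, 2)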