5-Convolutional Neural Network


Introduction to Deep Learning

Convolutional Neural Network (CNN)

Problem

• Input: color image of size 64 * 64
• Output: whether the image contains a human face or not
• Method: ?

Neural Network Model

• Each hidden layer is called a fully connected layer (or dense layer)
• Each node in a hidden layer is connected to all nodes in the previous layer

Fully Connected Neural Network (FCN)

Problem of Fully Connected Neural Network

• A color image of size 64 * 64 has 64 * 64 * 3 = 12,288 values (one per pixel per color channel)
• The input layer will therefore have 12,288 values

Problem of Fully Connected Neural Network

• A color image of size 64 * 64 has 64 * 64 * 3 = 12,288 input values
• Hidden layer 1 has 1000 nodes
→ # of weights is 12,288 * 1000 = 12,288,000 and # of biases is 1000
→ # of parameters between the input layer and hidden layer 1 is 12,289,000
→ What happens if we have 10 hidden layers and the image size is 512 x 512? There is an extremely large number of parameters to learn!
• The spatial organization of the input is preserved only until the image is flattened into a vector
• Is this efficient for large images?
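The parameter counts above can be checked with a few lines of Python. This is a minimal sketch: the helper name `dense_param_count` is hypothetical, and the larger network assumes 1000 nodes in each of the 10 hidden layers, which the slide does not specify.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of values in each layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + one bias per output node
    return total


# 64 x 64 color image, one hidden layer with 1000 nodes
small = dense_param_count([64 * 64 * 3, 1000])
print(small)  # 12289000

# Assumed setup: 512 x 512 color image, 10 hidden layers of 1000 nodes each
large = dense_param_count([512 * 512 * 3] + [1000] * 10)
print(large)  # 795442000
```

Almost all of the parameters sit in the first layer, because every input value is connected to every hidden node.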

Convolution in a neural network

• Each neuron depends only on a few local input neurons

This is similar to the local connectivity of visual features in images

Convolution in a neural network

• x is a 3x3 chunk (yellow area) of the image (green area)
• Each output neuron is parametrized by the 3x3 weight matrix w (small red numbers in the yellow area)
• The output image contains the convolved features (in pink)

The process is performed by sliding the 3x3 window across the image
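The sliding-window process can be sketched in plain Python as follows. The function name `conv2d` and the 5x5 image and 3x3 kernel values are illustrative assumptions, not taken from the figure. As is usual in CNNs, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a 2D image (no padding) and return the feature map."""
    f = len(kernel)
    out_h = (len(image) - f) // stride + 1
    out_w = (len(image[0]) - f) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of the window x and the weights w, then sum
            total = 0
            for u in range(f):
                for v in range(f):
                    total += image[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(total)
        output.append(row)
    return output


# Illustrative 5x5 binary image and 3x3 weight matrix
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

features = conv2d(image, kernel)
print(features)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

The same 9 weights are reused at every window position, which is why a convolutional layer needs far fewer parameters than a fully connected one.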

Convolution in color image

• The output Y of the convolution operation on a color image is a matrix: the results of the three color channels are summed
• A bias (here, bias = 1) is added to the result
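A minimal sketch of the color-image case, assuming a channel-first layout `image[channel][row][column]` and a kernel with one 2D slice per channel. The name `conv_color` and the all-ones test data are hypothetical.

```python
def conv_color(image, kernel, bias=0):
    """Convolve a multi-channel image with a multi-channel kernel (stride 1, no padding).

    The per-channel results are summed and the bias is added,
    so the output is a single 2D matrix.
    """
    depth = len(image)
    f = len(kernel[0])
    out_h = len(image[0]) - f + 1
    out_w = len(image[0][0]) - f + 1
    # Start every output value at the bias
    output = [[bias] * out_w for _ in range(out_h)]
    for c in range(depth):
        for i in range(out_h):
            for j in range(out_w):
                for u in range(f):
                    for v in range(f):
                        output[i][j] += image[c][i + u][j + v] * kernel[c][u][v]
    return output


# Hypothetical data: 3 channels of a 4x4 image and a 3x3x3 kernel, all ones
image = [[[1] * 4 for _ in range(4)] for _ in range(3)]
kernel = [[[1] * 3 for _ in range(3)] for _ in range(3)]

Y = conv_color(image, kernel, bias=1)
print(Y)  # [[28, 28], [28, 28]]  (3 * 3 * 3 products plus the bias)
```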

Padding

Zero values are added around the border of the image

Stride

Figure panels: (a) padding = 1, stride = 1; (b) padding = 1, stride = 2
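How padding and stride change the output size can be sketched as below. The helpers `pad_zero` and `conv_output_size` are hypothetical names, and the formula floor((N + 2P - F) / S) + 1 is the standard one for an input of size N, kernel size F, padding P and stride S.

```python
def pad_zero(image, padding):
    """Return a copy of a 2D image with `padding` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded += [[0] * width for _ in range(padding)]
    return padded


def conv_output_size(n, f, padding, stride):
    """Output width (or height) of a convolution over an input of width n."""
    return (n + 2 * padding - f) // stride + 1


print(pad_zero([[1, 2], [3, 4]], 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]

# 5x5 input with a 3x3 kernel
print(conv_output_size(5, 3, padding=1, stride=1))  # 5 (size is preserved)
print(conv_output_size(5, 3, padding=1, stride=2))  # 3 (size is roughly halved)
```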

First Convolutional Layer

The output of the first convolutional layer will be the input of the next convolutional layer

General Convolutional Layer

# of parameters of each kernel is F*F*D + 1 (for bias)
# of parameters of the layer is K * (F*F*D + 1)
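The formula K * (F*F*D + 1) can be evaluated directly. In this sketch F is the kernel size, D the depth of the input and K the number of kernels; the function name and the example values (3x3 kernels on a color image, 64 kernels) are illustrative assumptions.

```python
def conv_layer_params(f, d, k):
    """Number of parameters of a convolutional layer: K kernels of F*F*D weights plus one bias each."""
    per_kernel = f * f * d + 1
    return k * per_kernel


# Assumed example: 64 kernels of size 3x3 applied to a color image (depth 3)
params = conv_layer_params(f=3, d=3, k=64)
print(params)  # 1792
```

The count does not depend on the width or height of the input, unlike the 12,289,000 parameters of the fully connected layer on the 64 * 64 color image.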

General Convolutional Layer

A non-linear activation function is applied to the output of the convolutional layer before it becomes the input of the next convolutional layer

Element-wise activation functions

• Blue line: activation function
• Green line: derivative

The ReLU activation function is often used after each convolutional layer since it is an efficient activation function that requires no heavy computation
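A minimal sketch of ReLU and its derivative, applied element-wise to a feature map. The function names are hypothetical, and the derivative at exactly 0 is set to 0 here by convention.

```python
def relu(x):
    """ReLU: keep positive values, replace negative values with 0."""
    return x if x > 0 else 0


def relu_derivative(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return 1 if x > 0 else 0


def apply_elementwise(fn, feature_map):
    """Apply an activation function to every value of a 2D feature map."""
    return [[fn(value) for value in row] for row in feature_map]


feature_map = [[-2, 0, 3], [1, -1, 4]]
print(apply_elementwise(relu, feature_map))             # [[0, 0, 3], [1, 0, 4]]
print(apply_elementwise(relu_derivative, feature_map))  # [[0, 0, 1], [1, 0, 1]]
```

Both functions need only a comparison, with no exponentials, which is the sense in which ReLU is cheap to compute.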

Pooling Layer

• A pooling layer is placed between two convolutional layers to reduce the size of the output data while still preserving the important features of the image

Pooling Layer

• In practice, a pooling layer with size = (2,2), stride = 2 and padding = 0 is often used, so that the output width and height are halved while the depth is unchanged
• Note: in some models, a convolutional layer with stride > 1 is used instead of a pooling layer to reduce the data size
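Max pooling with size = (2,2), stride = 2 and padding = 0 can be sketched as follows for a single channel. The function name and the 4x4 feature map are illustrative assumptions; for a tensor with depth D the same operation would be applied to each channel separately.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling over one channel: keep the largest value in each window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * stride + u][j * stride + v]
                for u in range(size)
                for v in range(size)
            ]
            row.append(max(window))
        output.append(row)
    return output


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 8], [3, 4]]  (4x4 input reduced to 2x2)
```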

Fully Connected Layer

• The output tensor of the last layer, with size (H*W*D), is flattened to a vector of size (H*W*D, 1)
• The fully connected layers are then applied to this vector to combine the different image features learned by the convolutional layers and produce the output of the model

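Softmax Function
• Softmax function formula:

$$a_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

In which z_i is the weighted sum computed by node i of the last dense layer and K is the number of nodes (classes) in that layer.

--> Each value (a_i) in the output of the softmax function is interpreted as the probability of membership for each class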

Softmax Function

• Softmax activation is used to normalize the outputs of the last dense layer, converting them from weighted sum values into probabilities that sum to 1

• Specifically, softmax activation outputs one value for each node in the output layer. The output values are interpreted as probabilities of membership for each class
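A minimal sketch of the softmax function in plain Python. The input values are made up for illustration, and subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math


def softmax(z):
    """Convert a list of weighted-sum values into probabilities that sum to 1."""
    m = max(z)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - m) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of a last dense layer with three nodes (three classes)
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))  # 1.0, up to floating-point rounding
```

The largest input receives the largest probability, so the predicted class is the index of the maximum output.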

Classic CNN Architecture

Input

Conv blocks:
• Convolution + activation (ReLU)
• Convolution + activation (ReLU)
• …
• Max pooling 2x2

Output
• Fully connected layers
• Softmax / Sigmoid activation function

Classic CNN Architecture

Output:
• Fully connected layers

If the last dense layer has only one node:
• Sigmoid activation function is used

If the last dense layer has more than one node:
• Softmax activation function is used
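The two choices are closely related: a sigmoid on one node gives the same probability as a softmax over two nodes whose second input is fixed at 0. The sketch below checks this numerically; the function names and the value z = 0.8 are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Sigmoid: maps one weighted-sum value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    """Softmax: maps several weighted-sum values to probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(value - m) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


z = 0.8  # hypothetical output of a single-node last layer
p_sigmoid = sigmoid(z)
p_softmax = softmax([z, 0.0])[0]

print(p_sigmoid, p_softmax)  # the two probabilities agree
```

This is why one sigmoid node is enough for binary classification (for example, face or no face), while softmax is used when there are several classes.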

Feature extraction with CNN

Visualization of image features learned automatically by convolutional layers

Popular CNN Architectures

• VGG (Visual Geometry Group)

• ResNet (Residual Network)

VGG Architecture

• VGG is a deep CNN architecture built from the classical blocks of a CNN: convolutional layers (conv), pooling layers (pool) and fully connected layers (fc)

• Common VGG network architectures include VGG16 and VGG19

• VGG was proposed by Karen Simonyan and Andrew Zisserman in "Very deep convolutional networks for large-scale image recognition." (2014)

VGG Architecture
• VGG16 architecture:

• Conv: size 3x3, padding = 1, stride = 1, # of kernels = 64, which is also the depth of the output layer
• Pool/2: max pooling layer with size = 2x2, stride = 2
• fc 4096: fully connected layer with 4096 nodes
• From left to right: the size of the output features decreases, but the depth increases
• After passing through all conv layers and pooling layers, the data are flattened and fed into the fc layers
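The "size decreases, depth increases" pattern can be traced with a short script. The block configuration below (2, 2, 3, 3, 3 conv layers with 64, 128, 256, 512, 512 kernels) and the 224 x 224 x 3 input are the standard VGG16 setup from the original paper; they are assumptions here because the figure itself is not reproduced in the text.

```python
# (number of conv layers, number of kernels) for each block, standard VGG16 setup
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(size=224, depth=3):
    """Return the (height, width, depth) of the data after the input and after each block."""
    shapes = [(size, size, depth)]
    for n_convs, kernels in VGG16_BLOCKS:
        for _ in range(n_convs):
            # Conv 3x3, padding = 1, stride = 1: width and height are unchanged
            size = (size + 2 * 1 - 3) // 1 + 1
            depth = kernels
        # Pool/2: max pooling 2x2, stride = 2 halves width and height
        size = (size - 2) // 2 + 1
        shapes.append((size, size, depth))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)
# (224, 224, 3), (112, 112, 64), (56, 56, 128), (28, 28, 256), (14, 14, 512), (7, 7, 512)

h, w, d = shapes[-1]
print(h * w * d)  # 25088 values after flattening, fed into the first fc 4096 layer
```

The 13 conv layers plus the 3 fully connected layers give the 16 weight layers that VGG16 is named after.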

Problem of classical CNN Architecture

• A 56-layer CNN has a higher error rate than a 20-layer CNN on both the training and the testing dataset
• The problem may be caused by vanishing or exploding gradients (the gradient becomes 0 or too large) during the backpropagation process
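A rough numerical illustration of why gradients can vanish or explode with depth. This is a deliberately simplified model, not the analysis behind the 56-layer result: it assumes backpropagation multiplies the gradient by the same scalar factor at every layer, and uses 0.25 (the maximum derivative of the sigmoid) and 1.5 as example factors.

```python
def gradient_scale(per_layer_factor, n_layers):
    """Scale of the gradient after passing back through n_layers,
    assuming each layer multiplies it by the same factor (a simplification)."""
    return per_layer_factor ** n_layers


# Factor below 1: the gradient shrinks towards 0 as the network gets deeper
print(gradient_scale(0.25, 20))
print(gradient_scale(0.25, 56))

# Factor above 1: the gradient grows very large instead
print(gradient_scale(1.5, 20))
print(gradient_scale(1.5, 56))
```

In a real network the factor differs per layer and depends on the weights and activations, but the repeated multiplication is the same.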

ResNet Architecture

• ResNet introduces the concept of a residual block, which uses a skip (shortcut) connection

Residual learning: a building block

ResNet Architecture

• A ResNet architecture is created by stacking a set of residual blocks together

A residual block
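The core idea of a residual block, output = F(x) + x, can be sketched as below. The layers F are represented by an arbitrary Python function, and the activation that ResNet applies after the addition is left out to keep the example short; all names are hypothetical.

```python
def residual_block(x, layers):
    """Apply the layers F to x, then add the input back through the skip connection."""
    fx = layers(x)
    return [f_value + x_value for f_value, x_value in zip(fx, x)]


def zero_layers(x):
    """Stand-in for layers that have learned to output 0 everywhere."""
    return [0.0 for _ in x]


def scale_layers(x):
    """Stand-in for layers that compute some small correction to x."""
    return [0.1 * value for value in x]


x = [1.0, -2.0, 3.0]

# If F(x) = 0, the block simply passes its input through unchanged
print(residual_block(x, zero_layers))  # [1.0, -2.0, 3.0]

# Otherwise the block outputs the input plus the learned residual
print(residual_block(x, scale_layers))


def stack(x, blocks):
    """Stack residual blocks: the output of one block is the input of the next."""
    for layers in blocks:
        x = residual_block(x, layers)
    return x


print(stack(x, [zero_layers] * 50))  # still [1.0, -2.0, 3.0] after 50 blocks
```

Because a block can fall back to the identity, adding more blocks does not have to make the network worse, and the skip connection gives the gradient a direct path backwards.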

ResNet Architecture

• ResNet addresses the problem of vanishing or exploding gradients
• ResNet is able to support hundreds or thousands of convolutional layers
• Proposed by He, Kaiming, et al. "Deep residual learning for image recognition." CVPR. 2016.

Network Architecture of ResNet

ResNet-34 architecture

VGG19 vs. ResNet50

                          VGG19                    ResNet50
Accuracy (top-5 error)    7.1%                     5.25%
Parameters                138M                     25M
Computational complexity  15.3B FLOPs              3.8B FLOPs
Convolution               Contains several fully   Fully convolutional
                          connected layers         until the last layer

Tools and Framework

• TensorFlow is the most popular, but difficult to use
• Keras is easier to use thanks to its high-level APIs. Keras can run on top of TensorFlow, Theano, or CNTK

Solving AI Problem with Deep Learning

1. Problem definition
2. Dataset preparation
3. Model construction
4. Loss function definition
5. Apply backpropagation and gradient descent to find the parameters (weights and biases) that optimize the loss function (note that other optimizers can also be used)
6. Predict the output for new data using the learned parameters (weights and biases)
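The six steps above can be sketched end to end on a toy problem. Everything here is an illustrative assumption: the data follow y = 2x + 1, the model is a single linear unit rather than a CNN, and the gradients are written out by hand because the model is small enough not to need backpropagation.

```python
# 1. Problem definition: predict y from x
# 2. Dataset preparation (toy data following y = 2x + 1)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


# 3. Model construction: one weight and one bias
def predict(w, b, x):
    return w * x + b


# 4. Loss function definition: mean squared error
def mse_loss(w, b):
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# 5. Gradient descent to find the parameters that minimize the loss
def train(learning_rate=0.05, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
print(w, b, mse_loss(w, b))  # w close to 2, b close to 1, loss close to 0

# 6. Predict the output for new data with the learned parameters
print(predict(w, b, 4.0))  # close to 9
```

With a framework such as Keras, steps 3 to 5 are handled by building the model, choosing a loss and an optimizer, and calling the training routine, but the underlying loop is the same.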

Deep Learning Datasets
• You can visit this website to get information about datasets: https://paperswithcode.com/datasets

• Kaggle is also a useful source of datasets for deep learning models

Exercises 3

• TODO1: Download and study the properties of the “Dogs and Cats” dataset for the binary classification problem

• TODO2: Download and study the properties of the “MNIST” dataset for the multi-class classification problem

• TODO3: Write a short report to express your understanding of each dataset
