AE556 2024 Topic4 CNN
Han-Lim Choi
The Problem with Fully-Connected Networks
Convolutional Neural Networks
• To create architectures that can handle large images, restrict the weights in two ways:
1. Require that activations between layers interact only in a "local" manner
2. Require that all spatial locations share the same weights
• Together, these two restrictions give the convolution operation (a minimal sketch follows below)
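As a concrete illustration (not from the slides; a hypothetical comparison in PyTorch), the sketch below contrasts a fully connected layer with a convolutional layer that maps a 3×64×64 input to a 16×64×64 output. The convolution's local, weight-shared structure is exactly the two restrictions above, and it needs vastly fewer parameters.

```python
import torch.nn as nn

# Fully connected layer mapping a flattened 3x64x64 image to 16x64x64 activations
fc = nn.Linear(3 * 64 * 64, 16 * 64 * 64)

# Convolutional layer with the same input/output shape: each output depends only
# on a local 3x3 neighborhood, and the same weights are shared at every position
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"fully connected: {count(fc):,} parameters")   # ~805 million
print(f"convolution:     {count(conv):,} parameters") # 3*3*3*16 + 16 = 448
```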
Convolutions
Additional Notes on Convolutions
• Pooling reduces the amount of data, suppressing noise and keeping only the salient information, which makes learning and decision making easier
• Building a large receptive field with convolution operations alone takes many layers and a large amount of computation, which is inefficient; by downsampling with pooling, more meaningful feature maps can be obtained from the extracted feature maps (a minimal pooling sketch follows below)
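As a concrete (hypothetical) example, 2×2 max pooling keeps only the strongest response in each window and halves each spatial dimension:

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 3., 2., 1.],
                    [4., 6., 5., 0.],
                    [7., 2., 9., 8.],
                    [1., 0., 3., 4.]]]])  # shape (N=1, C=1, H=4, W=4)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x))
# tensor([[[[6., 5.],
#           [7., 9.]]]])  -> each 2x2 window reduced to its maximum
```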
Additional Notes on Convolutions
• The stride is the interval at which the filter moves as it slides over the input; increasing the stride skips positions and shrinks the output feature map (a worked output-size example follows below)
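The slides do not state the output-size formula explicitly, but the standard relation for input width $W$, kernel size $K$, padding $P$, and stride $S$ is

$$O = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1 .$$

For example, a $32 \times 32$ input with a $3 \times 3$ kernel, no padding, and stride $S = 2$ gives $O = \lfloor (32 - 3)/2 \rfloor + 1 = 15$, versus $O = 30$ with stride 1.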
Additional Notes on Convolutions
• To keep the output the same size as the input and to preserve the edge pixels, a technique called "padding" is used: additional values are placed around the border of the image
• It is usual to "zero pad" the input image so that the resulting feature map has the same size as the input (see the sketch below)
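A minimal sketch (hypothetical, in PyTorch) showing that zero padding of 1 keeps a 3×3 convolution from shrinking its input, while stride 2 halves it:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one RGB image, 32x32

same = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)     # zero padding of 1
valid = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=0)    # no padding
strided = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)  # stride 2

print(same(x).shape)     # torch.Size([1, 16, 32, 32]) -> size preserved
print(valid(x).shape)    # torch.Size([1, 16, 30, 30]) -> edge pixels lost
print(strided(x).shape)  # torch.Size([1, 16, 16, 16]) -> stride halves H and W
```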
Convolutions in Image Processing
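The figures for this slide did not survive extraction; as a stand-in (my own choice of example), the sketch below applies a classical hand-designed kernel, a Sobel-style vertical-edge filter, to show convolution as an image-processing operation:

```python
import torch
import torch.nn.functional as F

# Grayscale image with a vertical edge: left half dark, right half bright
img = torch.zeros(1, 1, 6, 6)
img[..., 3:] = 1.0

# Sobel-style kernel that responds to vertical intensity changes
sobel_x = torch.tensor([[[[-1., 0., 1.],
                          [-2., 0., 2.],
                          [-1., 0., 1.]]]])

edges = F.conv2d(img, sobel_x, padding=1)
print(edges[0, 0])  # large responses in the columns where intensity changes
```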
Number of Parameters
• The total number of parameters in the network is the sum of the number of parameters in each convolution layer and each FC layer (pooling, stride, and padding are hyperparameters and contribute no parameters)
Number of parameters in one convolution layer: $(K^2 \cdot C + 1) \cdot N$
$K$: filter size, $C$: number of input channels, $N$: number of filters
• The depth of each kernel in a convolution layer is always equal to the number
of channels in the input image
• Each kernel has $K^2 \times C$ weights (plus one bias), and there are $N$ of them
• The number of parameters in an FC layer is (number of inputs + 1) × (number of outputs), the +1 again accounting for the bias (a small check of both formulas follows below)
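A small (hypothetical) check of these counts in PyTorch, for a 3×3 convolution with 3 input channels and 16 filters, and an FC layer with 256 inputs and 10 outputs:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
fc = nn.Linear(in_features=256, out_features=10)

count = lambda m: sum(p.numel() for p in m.parameters())

print(count(conv))  # (K^2 * C + 1) * N = (3*3*3 + 1) * 16 = 448
print(count(fc))    # (inputs + 1) * outputs = (256 + 1) * 10 = 2570
```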
Improving model performance
Learning with Convolutions
$$z_{i+1} = \sigma_i\left(W_i * z_i + b_i\right)$$
where $*$ denotes the convolution operation
Convolutions as Matrix Multiplication
• For a 1-D convolution with filter $(w_1, w_2, w_3)$, we can write $W_i * z_i = \widehat{W}_i z_i$ for the banded matrix

$$\widehat{W}_i = \begin{bmatrix}
w_1 & w_2 & w_3 & 0 & 0 & 0 \\
0 & w_1 & w_2 & w_3 & 0 & 0 \\
0 & 0 & w_1 & w_2 & w_3 & 0 \\
0 & 0 & 0 & w_1 & w_2 & w_3
\end{bmatrix}$$

• Backpropagation also requires multiplying by the transpose, so how do we multiply by $\widehat{W}_i^T$?
Convolutions as Matrix Multiplication
$$\widehat{W}_i^T = \begin{bmatrix}
w_1 & 0 & 0 & 0 \\
w_2 & w_1 & 0 & 0 \\
w_3 & w_2 & w_1 & 0 \\
0 & w_3 & w_2 & w_1 \\
0 & 0 & w_3 & w_2 \\
0 & 0 & 0 & w_3
\end{bmatrix}$$

• So multiplying the backpropagated signal $g_{i+1}$ by the transpose, $\widehat{W}_i^T g_{i+1}$, is itself a (zero-padded) convolution of $g_{i+1}$ with the flipped filter $(w_3, w_2, w_1)$ (a numerical check follows below)
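A small numerical check of both identities (a hypothetical sketch in NumPy, not from the slides):

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])  # filter (w1, w2, w3)
z = np.random.randn(6)         # 1-D input signal

# Banded matrix W_hat whose rows are shifted copies of the filter
W_hat = np.zeros((4, 6))
for r in range(4):
    W_hat[r, r:r + 3] = w

# 1) Convolution written as a matrix multiplication
conv = np.array([w @ z[j:j + 3] for j in range(4)])
assert np.allclose(W_hat @ z, conv)

# 2) Multiplying by the transpose = zero-padded convolution with the flipped filter
g = np.random.randn(4)   # e.g., a backpropagated signal
g_pad = np.pad(g, 2)     # two zeros on each side
flipped = w[::-1]        # (w3, w2, w1)
back = np.array([flipped @ g_pad[j:j + 3] for j in range(6)])
assert np.allclose(W_hat.T @ g, back)
print("both identities hold")
```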
Loss function
• The loss function is used during training to measure how well the model fits the training data; losses are usually divided into two categories, regression and classification
• "MSE" (mean squared error) is commonly employed in regression problems where the network outputs and target values are continuous (it is also used to measure the difference between images or masks in segmentation, and it reflects how far the predictions are from the targets; see the check below)
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
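A quick (hypothetical) check that this formula matches PyTorch's built-in loss:

```python
import torch
import torch.nn as nn

y = torch.tensor([2.0, 0.0, 1.0])       # targets
y_hat = torch.tensor([2.5, 0.0, 2.0])   # network outputs

manual = ((y - y_hat) ** 2).mean()      # (1/n) * sum (y_i - y_hat_i)^2
builtin = nn.MSELoss()(y_hat, y)
print(manual.item(), builtin.item())    # both 0.4166...
```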
LeNet, Digit Classification
• The network that started it all (and then stopped for ~14 years)
AlexNet
• AlexNet is an 8-layer CNN model that trains two identical branches in parallel, one on each of two GPUs
• It replaced the tanh and sigmoid activations used at the time with the "ReLU" function, which converged about six times faster in their experiments; following this, most modern models use ReLU
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Communications of the ACM 60.6 (2012): 84-90.
VGGNet
• VGGNet was developed by the VGG research group at the University of Oxford and took second place in the 2014 ImageNet image recognition competition
• The VGGNet model confirmed that deeper networks perform better, and it employed stacks of many tiny 3×3 kernels in place of large kernels (see the parameter comparison below)
• Variants such as VGG-16 and VGG-19 are named for their depth (number of weight layers)
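To illustrate the 3×3-stack design (a hypothetical check, not from the slides): two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, but with fewer parameters and an extra nonlinearity in between.

```python
import torch.nn as nn

C = 64  # channel width, chosen arbitrarily for the comparison

single_5x5 = nn.Conv2d(C, C, kernel_size=5, padding=2)
stacked_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(single_5x5))   # 5*5*64*64 + 64       = 102464
print(count(stacked_3x3))  # 2 * (3*3*64*64 + 64) =  73856
```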
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
GoogLeNet (Inception module)
• The Inception module was proposed to address the computational inefficiency of VGG, combining different kernel sizes with 1×1 bottleneck convolutions
• It extracts features effectively by employing parallel convolutional layers at several scales
• Global average pooling (GAP) was used in place of the last FC layer, significantly reducing the size of the model while improving accuracy and computational efficiency (a sketch of GAP vs. an FC head follows below)
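A minimal (hypothetical) comparison of the two classifier heads, for a 1024-channel 7×7 feature map and 1000 classes:

```python
import torch.nn as nn

num_classes, channels, spatial = 1000, 1024, 7

# Flatten + FC head: every one of the 1024*7*7 activations gets its own weights
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(channels * spatial * spatial, num_classes),
)

# GAP head: average each channel down to a single value, then classify
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(channels, num_classes),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(fc_head))   # (1024*7*7 + 1) * 1000 = 50,177,000
print(count(gap_head))  # (1024 + 1) * 1000     =  1,025,000
```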
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition (2015).
ResNet
• In $H(x) = F(x) + x$, we train the layers so that $F(x) = H(x) - x$, the additional amount that must be learned, is driven toward 0
• $H(x) - x$ is called the "residual"
• Residual blocks utilize shortcuts that add the input values to the output values
• By simply adding the input x, these "skip connections" let each layer learn only the small change beyond the information it already carries (a minimal residual block sketch follows below)
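A minimal sketch of a residual block in PyTorch (my own simplified version; the paper's blocks also include batch normalization and projection shortcuts):

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """y = F(x) + x, so the convolutions only need to learn the residual F(x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)         # F(x)
        return self.relu(out + x)     # skip connection adds the input back


x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)     # torch.Size([1, 64, 56, 56])
```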
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition (2016).
ResNet (bottleneck structure)
(figure: standard residual block vs. bottleneck block)
Classification model performance