
Unit 5 CNN

The document provides a comprehensive overview of Convolutional Neural Networks (CNNs), covering their building blocks, architectures, and key concepts such as convolution, pooling, and activation functions. It discusses various CNN models like LeNet-5, AlexNet, and VGG-16, and highlights practical applications in image classification, object detection, and more. Additionally, it addresses important techniques like padding, striding, and batch normalization that enhance CNN performance.

Uploaded by

Aatif Shaikh
Copyright
© © All Rights Reserved

Unit 3

Convolutional Neural Network (CNN)
Contents
● Building blocks of CNNs,
● Architectures, convolution / pooling layers, Padding, Strided convolutions,
● Convolutions over volumes, SoftMax regression,
● Deep Learning frameworks, Training and testing on different distributions,
● Bias and Variance with mismatched data distributions,
● Transfer learning, multi-task learning, end-to-end deep learning,
● Introduction to CNN models: LeNet – 5, AlexNet, VGG – 16, Residual Networks
Course Outcome

Implement the concept of Convolutional Neural Networks and its models


Image Classification
Object localization
Object Detection
Face detection and Recognition
Image segmentation
Image superresolution
Edge-Change in Intensity
Deep Learning -CNN Demo
https://deeplizard.com/resource/pavq7noze2
Why Do We Need Padding?
Problems with Convolution

1. The feature map shrinks with each convolution layer.

2. Corner and edge pixels are covered by fewer kernel positions, so they contribute less to the output.


Padding Demo

https://colab.research.google.com/drive/1txIfzUJ_ehc_waLV67r-yEhTLIccbXIC#scrollTo=mL_dQ3K0M-ZL
Stride Demo
https://colab.research.google.com/drive/1txIfzUJ_ehc_waLV67r-yEhTLIccbXIC#scrollTo=mL_dQ3K0M-ZL
Max Pooling Demo
https://deeplizard.com/resource/pavq7noze3

Keras Demo
https://colab.research.google.com/drive/1F4F6Q9O-hPvCDeOWcqMUa5BuBOvuOBWc?usp=sharing
Disadvantage
Introduction to CNN models
➔ LeNet – 5
➔ AlexNet
➔ VGG – 16
➔ Residual Networks
LeNet – 5
AlexNet
VGGNET
Convolutional Neural Network
● A convolutional neural network, or CNN, is a network architecture for deep learning.

● It learns directly from images. A CNN is made up of several layers that process and transform an input to produce an output.

● You can train a CNN to do image analysis tasks, including scene classification, object detection and segmentation, and image processing.

● CNNs are built on three key concepts:

– local receptive fields,

– shared weights and biases, and

– activation and pooling.


● In a typical neural network, each neuron in the input layer is connected to every neuron in the hidden layer.
However, in a CNN, only a small region of input layer neurons connects to each neuron in the
hidden layer. These regions are referred to as local receptive fields. The local receptive field is translated
across an image to create a feature map from the input layer to the hidden layer neurons.
Convolutional Neural Network

Shared weights and biases
In a typical neural network, every connection has its own weight and every neuron its own bias. In a CNN, however, the weights and bias values are the same for all hidden neurons in a given layer.

This means that all hidden neurons are detecting the same feature, such as an edge or a blob, in different regions of the
image. This makes the network tolerant to translation of objects in an image. For example, a network trained to recognize
cats will be able to do so wherever the cat is in the image.
Convolutional Neural Network
● Our third and final concept is activation and pooling. The activation step applies a transformation to the output of each neuron by using activation functions. Rectified linear unit, or ReLU, is an example of a commonly used activation function. It takes the output of a neuron and keeps it unchanged if it is positive, or maps it to zero if it is negative.
● Pooling reduces the dimensionality of the feature map by condensing the output of small regions of neurons into a single output. This helps simplify the following layers and reduces the number of parameters that the model needs to learn.
Convolutional Neural Network

A Convolutional Neural Network, also known as CNN or ConvNet, is a class of neural networks that
specializes in processing data that has a grid-like topology, such as an image. A digital image is a binary
representation of visual data. It contains a series of pixels arranged in a grid-like fashion that contains
pixel values to denote how bright and what color each pixel should be.

Ref : https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
Convolutional Neural Network Architecture
A CNN typically has three types of layers: a convolutional layer, a pooling layer, and a fully connected layer.

Ref : https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
Convolution Layer

The convolution layer is the core building block of the CNN. It carries the main portion of the network’s computational load.

This layer performs a dot product between two matrices, where one matrix is the set of learnable parameters, otherwise known as a kernel, and the other matrix is the restricted portion of the input covered by the receptive field. The kernel is spatially smaller than the image but extends through its full depth. This means that, if the image is composed of three (RGB) channels, the kernel height and width will be spatially small, but the depth extends across all three channels.

Ref : https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
Convolution Layer

During the forward pass, the kernel slides across the height and width of the image, producing the image representation of that receptive region. This produces a two-dimensional representation of the image known as an activation map that gives the response of the kernel at each spatial position of the image. The step size with which the kernel slides is called the stride.

Ref : https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
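The forward pass described above can be sketched in NumPy. This is a minimal illustration, not an optimized implementation: the function name `conv2d_single`, the 6 x 6 x 3 input, and the all-ones kernel are assumptions chosen for the example, and no padding is applied.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide one F x F x D kernel over an H x W x D image (no padding).

    Returns the 2-D activation map: one dot product per kernel position.
    """
    H, W, D = image.shape
    F = kernel.shape[0]
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    activation_map = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Restricted portion of the input under the kernel (the receptive field)
            patch = image[r * stride:r * stride + F, c * stride:c * stride + F, :]
            # Element-wise multiply and sum = dot product of patch and kernel
            activation_map[r, c] = np.sum(patch * kernel)
    return activation_map

image = np.arange(6 * 6 * 3, dtype=float).reshape(6, 6, 3)  # hypothetical input
kernel = np.ones((3, 3, 3))                                  # hypothetical kernel

print(conv2d_single(image, kernel, stride=1).shape)  # (4, 4)
print(conv2d_single(image, kernel, stride=2).shape)  # (2, 2)
```

A larger stride visits fewer positions, so the activation map gets smaller.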
Padding
When applying convolutions, the output does not have the same dimensions as the input, and we lose data at the
borders. To avoid this, we append a border of zeros and recompute the convolution so that it covers all the input values.

Ref : https://www.analyticsvidhya.com/blog/2020/10/what-is-the-convolutional-neural-network-architecture/
CNN
Padding - Padding has the following benefits:

1. It allows us to use a CONV layer without necessarily shrinking the height and width of the volumes. This
is important for building deeper networks, since otherwise the height/width would shrink as we go to
deeper layers. If we have an activation map of size W x W x D, a kernel of spatial size F, and stride S,
then without padding the spatial size of the output volume is

Wout = (W - F) / S + 1

which is smaller than W whenever F > 1. Padding with P zeros on each side changes this to Wout = (W - F + 2P) / S + 1.

CNN
Some padding terminologies (output sizes are for stride 1):

● “valid” padding: Output size = input size - kernel size + 1

● “same” padding: Output size = input size

● “full” padding: Output size = input size + kernel size - 1

● Calculating the Output Dimension
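The three padding modes can be checked quickly in one dimension with NumPy's `np.convolve`, which accepts the same mode names. The signal and kernel values below are arbitrary assumptions. Note that `np.convolve` flips the kernel (true convolution), which does not affect the output sizes.

```python
import numpy as np

signal = np.arange(6, dtype=float)      # input size 6 (hypothetical values)
kernel = np.array([1.0, 0.0, -1.0])     # kernel size 3 (hypothetical values)

# Output length for each padding mode at stride 1
for mode in ("valid", "same", "full"):
    out = np.convolve(signal, kernel, mode=mode)
    print(mode, len(out))
# valid 4   (6 - 3 + 1)
# same 6    (input size)
# full 8    (6 + 3 - 1)
```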


Striding
Sometimes we do not want to capture all the data or information available, so we skip some neighboring cells by moving the kernel more than one position at a time.

Ref : https://www.analyticsvidhya.com/blog/2020/10/what-is-the-convolutional-neural-network-architecture/
● Convolution Operation with Multiple Filters
● Multiple filters can be used in a convolution layer to detect multiple features. The output of the layer will then have the
same number of channels as the number of filters in the layer.
● Each output value requires one multiplication per filter weight, so for a 4 x 4 x 2 output computed with 3 x 3 x 3 filters the total number of multiplications is (4 x 4 x 2) x (3 x 3 x 3) = 864
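The count above can be reproduced with a small sketch. The 6 x 6 x 3 input size is an assumption (it is the input that yields a 4 x 4 output with a 3 x 3 kernel, stride 1, and no padding), and the random values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))        # assumed input: 6 x 6 x 3
filters = rng.standard_normal((2, 3, 3, 3))   # 2 filters, each 3 x 3 x 3

out_size = 6 - 3 + 1                          # valid convolution, stride 1
output = np.zeros((out_size, out_size, filters.shape[0]))
multiplications = 0

for k, f in enumerate(filters):               # one output channel per filter
    for r in range(out_size):
        for c in range(out_size):
            patch = image[r:r + 3, c:c + 3, :]
            output[r, c, k] = np.sum(patch * f)
            multiplications += patch.size     # 3 * 3 * 3 = 27 per output value

print(output.shape, multiplications)          # (4, 4, 2) 864
```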
CNN
● 1 x 1 Convolution
● This is convolution with 1 x 1 filter. The effect is to flatten or “merge” channels together, which can save computations later
in the network:
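A 1 x 1 convolution applies the same linear map across channels at every pixel, so it can be written as a matrix product. The sketch below assumes a hypothetical 28 x 28 x 192 feature map reduced to 32 channels; the sizes and the name `conv1x1` are illustrative only.

```python
import numpy as np

def conv1x1(feature_map, weights):
    """feature_map: H x W x C_in, weights: C_in x C_out.

    Each pixel's channel vector is multiplied by the same weight matrix.
    """
    return feature_map @ weights

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 192))   # hypothetical input volume
w = rng.standard_normal((192, 32))       # 32 filters of size 1 x 1 x 192

y = conv1x1(x, w)
print(y.shape)                            # (28, 28, 32)
```

The spatial size is unchanged while the channel count drops from 192 to 32, which is where the later savings in computation come from.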

CNN
Convolution parameters

● Filter dimensions: 2D for images.


● Filter size: generally 3x3 or 5x5.
● Number of filters: determine the number of feature maps created by the convolution operation.
● Stride: step for sliding the convolution window. Generally equal to 1.
● Padding: blank rows/columns with all-zero values added on sides of the input feature map.

Convolution Layer

If we have an input of size W x W x D and Dout kernels with a spatial size of F, stride S, and amount of padding P, then the size of the output volume can be determined by the following formula:

Wout = (W - F + 2P) / S + 1

This will yield an output volume of size Wout x Wout x Dout.

Ref : https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
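The output-size relation Wout = (W - F + 2P) / S + 1 can be wrapped in a small helper. The function name and the example layer sizes are assumptions for illustration. Floor division is used here because frameworks typically round down when the stride does not divide evenly.

```python
def conv_output_size(w, f, s=1, p=0):
    """Spatial output size for input size w, kernel size f, stride s, padding p."""
    return (w - f + 2 * p) // s + 1

print(conv_output_size(32, 5))              # 28: 5x5 kernel, no padding
print(conv_output_size(224, 3, s=1, p=1))   # 224: 3x3 kernel with "same" padding
print(conv_output_size(227, 11, s=4))       # 55: large kernel with stride 4
```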
CNN
Convolution Layer - Convolutions occur in the convolution layers, which are the building blocks of a CNN. This
layer generally has

● Input vectors (Image)
● Filters (Feature Detector)
● Output vectors (Feature map)
● Input Image x Feature Detector = Feature Map

CNN
● This layer identifies and extracts the best features/patterns from the input image and preserves the generic information
in a matrix. The matrix representation of the input image is multiplied element-wise with the filters and summed up to
produce a feature map, which is the same as a dot product between the corresponding vectors.
● Convolution involves the following important features:
– Local connectivity
– Each neuron is connected only to a subset of the input image (unlike a regular neural network, where all neurons
are fully connected). In a CNN, a filter of a certain dimension is chosen, which slides over these subsets of the input
data. Multiple filters are present in a CNN, where each filter moves over the entire image and learns to detect a
different pattern in the input image.
– Parameter sharing
– All neurons in a particular feature map share the same weights,
hence the name parameter sharing.
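The effect of local connectivity and parameter sharing on the number of learnable parameters can be seen with simple arithmetic. The sizes below (a 32 x 32 single-channel input, a 28 x 28 hidden feature map, and a 5 x 5 receptive field) are assumptions chosen for illustration.

```python
input_pixels = 32 * 32      # assumed input: 32 x 32, one channel
hidden_neurons = 28 * 28    # assumed hidden feature map: 28 x 28
field = 5 * 5               # assumed local receptive field: 5 x 5

# Fully connected: every hidden neuron sees every input pixel, plus one bias each
fully_connected = hidden_neurons * (input_pixels + 1)

# Local connectivity only: each neuron has its own 5 x 5 weights and bias
local_only = hidden_neurons * (field + 1)

# Local connectivity + parameter sharing: one 5 x 5 kernel and one bias in total
shared = field + 1

print(fully_connected, local_only, shared)   # 803600 20384 26
```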
CNN
● Batch Normalization
● Batch normalization is generally done between the convolution and activation (ReLU) layers. It
normalizes the inputs at each layer, reduces internal covariate shift (change in the distribution of
network activations), and also acts as a regularizer for a convolutional network.
● Batch normalization allows higher learning rates, which can reduce training time and give better
performance. It lets each layer learn more independently of the other layers.
● Padding and Stride
● Padding and stride influence how the convolution operation is performed. They can be
used to alter the dimensions (height and width) of the input/output vectors, either increasing or
decreasing them.
ReLU Layer (Rectified Linear Unit)
ReLU is computed after convolution. It is the most commonly deployed activation function and allows the neural network to
account for non-linear relationships. In a given matrix (x), ReLU sets all negative values to zero and leaves all other values
unchanged. It is mathematically represented as:

f(x) = max(0, x)
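As a minimal sketch, ReLU is a one-line element-wise operation in NumPy. The input matrix below is a made-up example.

```python
import numpy as np

def relu(x):
    """Set negative entries to zero and leave all other entries unchanged."""
    return np.maximum(0, x)

x = np.array([[-2.0, 0.5],
              [3.0, -0.1]])   # hypothetical convolution output
print(relu(x))
# [[0.  0.5]
#  [3.  0. ]]
```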
Pooling Layer

The pooling layer replaces the output of the network at certain locations by deriving a
summary statistic of the nearby outputs. This helps in reducing the spatial size of the
representation, which decreases the required amount of computation and weights. The pooling
operation is processed on every slice of the representation individually.

Pooling functions
● Average of the rectangular neighborhood,
● Max of the rectangular neighborhood,
● and a weighted average based on the distance from the central pixel.

Ref : https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
Pooling Layer

If we have an activation map of size W x W x D, a pooling kernel of spatial size F, and stride S,
then the size of output volume can be determined by the following formula:

Wout = (W - F) / S + 1

This will yield an output volume of size Wout x Wout x D.

Ref : https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
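A pooling layer can be sketched as follows, assuming a 2 x 2 window with stride 2 and a small 4 x 4 x 1 activation map; the function name `pool2d` and the input values are illustrative. Each depth slice is pooled independently, so the depth D is unchanged.

```python
import numpy as np

def pool2d(activation, f=2, s=2, reduce=np.max):
    """Pool an H x W x D activation map with an f x f window and stride s."""
    H, W, D = activation.shape
    out_h = (H - f) // s + 1
    out_w = (W - f) // s + 1
    out = np.zeros((out_h, out_w, D))
    for r in range(out_h):
        for c in range(out_w):
            window = activation[r * s:r * s + f, c * s:c * s + f, :]
            # Summary statistic over the spatial window, per depth slice
            out[r, c, :] = reduce(window, axis=(0, 1))
    return out

a = np.arange(16, dtype=float).reshape(4, 4, 1)   # hypothetical activation map

print(pool2d(a)[:, :, 0])                  # max pooling
# [[ 5.  7.]
#  [13. 15.]]
print(pool2d(a, reduce=np.mean)[:, :, 0])  # average pooling
# [[ 2.5  4.5]
#  [10.5 12.5]]
```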
CNN
Why is pooling important?
● It progressively reduces the spatial size of the representation to reduce the number of
parameters and the amount of computation in the network, and it also helps control overfitting.
Without pooling, the output would have the same resolution as the input.
● There can be any number of convolution, ReLU, and pooling layers. The initial convolution
layers learn generic information and the last layers learn more specific/complex
features. After the final convolution, ReLU, and pooling layers, the output feature
map (matrix) is converted into a vector (one-dimensional array). This is called the
flatten layer.
Fully Connected Layer

Neurons in this layer have full connectivity with all neurons in the preceding and succeeding
layers, as in a regular fully connected neural network (FCNN). This is why its output can be computed
as usual by a matrix multiplication followed by a bias offset.

Ref : https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
Soft-Max Layer

Soft-max is an activation layer normally applied to the last layer of the network, which acts as a classifier. Classification of a given input
into distinct classes takes place at this layer. The soft-max function is used to map the non-normalized output of a network to a
probability distribution.

● The output from the last fully connected layer is directed to the soft-max layer, which converts it into probabilities.
● Here soft-max assigns decimal probabilities to each class in a multi-class problem, and these probabilities sum to 1.0.
● This allows the output to be interpreted directly as a probability.
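A minimal soft-max sketch is shown below. The three class scores are made-up values, and subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(logits):
    """Map non-normalized scores to a probability distribution."""
    shifted = logits - np.max(logits)   # stability trick, result is unchanged
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])      # hypothetical output of the last FC layer
probs = softmax(scores)

print(probs)          # one probability per class, largest for the largest score
print(probs.sum())    # sums to 1 (up to floating-point rounding)
```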

CNN Architectures
You’ve learned the following:

● Convolution Layer
● Pooling Layer
● Normalization Layer
● Fully Connected Layer
● Activation Function

CNN architectures:

1. LeNet-5
2. AlexNet
3. VGG-16
4. Inception-v1


CNN Architectures

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - LeNet-5
Excluding pooling, LeNet-5 consists of 5 layers:

● 2 convolution layers with kernel size 5×5, followed by
● 3 fully connected layers.

Each convolution layer is followed by a 2×2 average-pooling, and every layer has tanh activation function except
the last (which has softmax).

LeNet-5 has about 60,000 parameters. The network is trained on greyscale 32×32 digit images and tries to recognize them as
one of the ten digits (0 to 9).

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
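The layer description above can be traced with a short script that tracks the feature-map size and counts parameters. The filter counts (6 and 16) and the fully connected widths (120, 84, 10) are not stated on the slide; they are assumed here from the commonly cited LeNet-5 configuration, with stride-2 pooling and no padding.

```python
def conv_out(w, f, s=1, p=0):
    """Spatial output size: (w - f + 2p) / s + 1."""
    return (w - f + 2 * p) // s + 1

size, channels = 32, 1          # greyscale 32 x 32 input
params = 0

for n_filters in (6, 16):       # assumed filter counts for the two conv layers
    params += (5 * 5 * channels + 1) * n_filters   # weights + one bias per filter
    size = conv_out(size, 5)                       # 5x5 convolution
    size = conv_out(size, 2, s=2)                  # 2x2 average pooling, stride 2
    channels = n_filters

features = size * size * channels                  # flattened vector length
for units in (120, 84, 10):     # assumed widths of the three FC layers
    params += (features + 1) * units
    features = units

print(size, params)             # 5 61706
```

Under these assumptions the total comes out at roughly 60,000 parameters, in line with the figure quoted above.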
CNN Architectures - AlexNet
AlexNet introduces the ReLU activation function and LRN into the mix. ReLU became so popular that almost all CNN
architectures developed after AlexNet used ReLU in their hidden layers, abandoning the tanh activation function
used in LeNet-5.

The network consists of 8 layers:

● 5 convolution layers with non-increasing kernel sizes, followed by

● 3 fully connected layers.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - AlexNet
● The last layer uses the softmax activation function, and all others use ReLU. LRN is applied on the first and second convolution layers after applying ReLU. The first, second, and fifth convolution layers are followed by a 3×3 max pooling.

● With the advancement of modern hardware, AlexNet could be trained with a whopping 60 million parameters and became the winner of the ImageNet competition in 2012. ImageNet has become a benchmark dataset in developing CNN architectures, and a subset of it (ILSVRC) consists of various images with 1000 classes. Default AlexNet accepts colored images with dimensions 224×224.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - VGG16
● Researchers investigated the effect of CNN depth on its accuracy in the large-scale image recognition setting. By pushing the
depth to 11–19 layers, VGG families are born: VGG-11, VGG-13, VGG-16, and VGG-19. A version of VGG-11 with LRN was also
investigated but LRN doesn’t improve the performance. Hence, all other VGGs are implemented without LRN.

VGG-16, a deep CNN architecture with, well, 16 layers:

● 13 convolution layers with kernel size 3×3, followed by

● 3 fully connected layers.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - VGG16
● VGG-16 is one of the biggest networks, with 138 million parameters. Just like AlexNet, the last layer is equipped with a softmax activation function and all others are equipped with ReLU.

● The 2nd, 4th, 7th, 10th, and 13th convolution layers are followed by a 2×2 max-pooling. Default VGG-16 accepts colored images with dimensions 224×224 and outputs one of the 1000 classes.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - Inception-v1
Going deeper has a caveat: exploding/vanishing gradients:
1. The exploding gradient is a problem when large error gradients accumulate and result in unstable weight updates
during training.
2. The vanishing gradient is a problem when the partial derivative of the loss function approaches a value close to
zero and the network cannot train.

Inception-v1 tackles this issue by adding two auxiliary classifiers connected to intermediate layers, in the hope of increasing the gradient signal that gets propagated back. During training, their loss gets added to the total loss of the network with a 0.3 discount weight. At inference time, these auxiliary networks are discarded.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
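The training loss described above can be written as a one-line combination. This is only a sketch of the weighting: the function name and the example loss values are hypothetical.

```python
def inception_training_loss(main_loss, aux_losses, aux_weight=0.3):
    """Total training loss: main loss plus discounted auxiliary classifier losses."""
    return main_loss + aux_weight * sum(aux_losses)

# Hypothetical per-batch losses for the main and the two auxiliary classifiers
total = inception_training_loss(1.2, [1.5, 1.4])
print(total)   # 1.2 + 0.3 * (1.5 + 1.4)
```

At inference time only the main classifier output is used, so the auxiliary terms play no role.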
CNN Architectures - Inception-v1

● 3 convolution layers

● 18 layers that consist of 9 inception

● 1 fully connected layer.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - Inception-v1
Inception-v1 introduces the inception module, four series of one or two convolution and max-pool layers stacked in parallel
and concatenated at the end. The inception module aims to approximate an optimal local sparse structure in a CNN by
allowing the use of multiple types of kernel sizes, instead of being restricted to single kernel size.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - Inception-v1
Inception-v1 has fewer parameters than AlexNet and VGG-16, a mere 7 million, even though it consists of 22 layers:

● 3 convolution layers with 7×7, 1×1, and 3×3 kernel sizes, followed by
● 18 layers that consist of 9 inception modules where each has 2 layers of convolution/max-pooling, followed by
● 1 fully connected layer.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - ResNet50
● Once deeper networks are able to start converging, a degradation problem is exposed: as the network depth increases, accuracy gets saturated and then degrades rapidly.

● Unexpectedly, such degradation is not caused by overfitting (usually indicated by lower training error and higher testing error), since adding more layers to a suitably deep network leads to higher training error.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - ResNet50
The degradation problem is addressed by introducing bottleneck residual blocks. There are 2 kinds of residual blocks:

1. Identity block: consists of 3 convolution layers with 1×1, 3×3, and 1×1 kernel sizes, all of which are equipped with BN. The ReLU activation function is applied to the first two layers, while the input of the identity block is added to the last layer before applying ReLU.

2. Convolution block: same as identity block, but the input of the convolution block is first passed through a convolution layer with 1×1 kernel size and BN before being added to the last convolution layer of the main series.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - ResNet50

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
CNN Architectures - ResNet50
Notice that both residual blocks have 3 layers. In total, ResNet-50 has 26 million parameters and 50 layers:

● 1 convolution layer with BN then ReLU is applied, followed by
● 9 layers that consist of 1 convolution block and 2 identity blocks, followed by
● 12 layers that consist of 1 convolution block and 3 identity blocks, followed by
● 18 layers that consist of 1 convolution block and 5 identity blocks, followed by
● 9 layers that consist of 1 convolution block and 2 identity blocks, followed by
● 1 fully connected layer with softmax.

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
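The layer count can be verified with simple arithmetic, using only the block counts listed above and the fact that each residual block has 3 layers. The variable names are illustrative.

```python
layers_per_block = 3

# (convolution blocks, identity blocks) in each of the four stages
stages = [(1, 2), (1, 3), (1, 5), (1, 2)]

stage_layers = [(conv + identity) * layers_per_block for conv, identity in stages]

# First convolution layer + residual stages + final fully connected layer
total_layers = 1 + sum(stage_layers) + 1

print(stage_layers, total_layers)   # [9, 12, 18, 9] 50
```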
Summary of all Architectures

Ref: https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd
Batch Normalization

What is “Normalization”?
● Normalization is a data pre-processing tool used to bring the numerical data to a common scale
without distorting its shape.
● Generally, when we input the data to a machine or deep learning algorithm we tend to change the
values to a balanced scale. The reason we normalize is partly to ensure that our model can generalize
appropriately.

Ref: https://www.analyticsvidhya.com/blog/2021/03/introduction-to-batch-normalization/
Batch Normalization

What is “Batch Normalization”?


● It is a process to make neural networks faster and more stable by adding extra layers to a deep neural network. The new layer performs the standardizing and normalizing operations on the input of a layer coming from a previous layer.

● But what is the reason behind the term “Batch” in batch normalization? A typical neural network is trained using a collected set of input data called a batch. Similarly, the normalizing process in batch normalization takes place in batches, not on a single input.

Ref: https://www.analyticsvidhya.com/blog/2021/03/introduction-to-batch-normalization/
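A minimal sketch of the training-time batch normalization step is given below. The learnable scale `gamma` and shift `beta`, the small constant `eps`, and the random batch are assumptions taken from the standard formulation rather than from the slides; running statistics for inference are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift.

    x: (batch_size, features). Statistics are computed per feature over the batch.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # standardize: zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # hypothetical layer inputs

out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))   # close to 0 for every feature
print(out.std(axis=0))    # close to 1 for every feature
```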
Batch Normalization
● Let’s understand this through an example, we have a deep neural network as shown in the following image.

Ref: https://www.analyticsvidhya.com/blog/2021/03/introduction-to-batch-normalization/
Batch Normalization
● Initially, our inputs X1, X2, X3, X4 are in normalized form as they are coming from the pre-processing stage. When the input passes through the first layer, it is transformed: a sigmoid function is applied over the dot product of the input X and the weight matrix W.

Ref: https://www.analyticsvidhya.com/blog/2021/03/introduction-to-batch-normalization/
Batch Normalization
● Similarly, this transformation will take place for the second layer and go till the last layer L as shown in the following image.

Ref: https://www.analyticsvidhya.com/blog/2021/03/introduction-to-batch-normalization/
Batch Normalization

● Although our input X was normalized, over time the output will no longer be on the
same scale. As the data goes through multiple layers of the neural network and the
activation functions of all L layers are applied, this leads to an internal covariate shift in the data.

Ref: https://www.analyticsvidhya.com/blog/2021/03/introduction-to-batch-normalization/
Local Response Normalization
● Local Response Normalization (LRN) was first introduced in AlexNet architecture where the
activation function used was ReLU as opposed to the more common tanh and sigmoid at that
time.
● Apart from the reason mentioned above, the reason for using LRN was to encourage lateral
inhibition.
● It is a concept in Neurobiology that refers to the capacity of a neuron to reduce the activity of
its neighbors.
● In DNNs, the purpose of this lateral inhibition is to carry out local contrast enhancement so
that locally maximum pixel values are used as excitation for the next layers.

Ref: https://towardsdatascience.com/difference-between-local-response-normalization-and-batch-normalization-272308c034ac#:~:text=Local%20Response%20Normalization%20(LRN)%20was,was%20to%20encourage%20lateral%20inhibition
Local Response Normalization
LRN is a non-trainable layer that square-normalizes the pixel values in a feature map within a local
neighborhood. There are two types of LRN based on the neighborhood defined and can be seen in the
figure below.

Ref: http://surl.li/fduoi
Local Response Normalization
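Inter-Channel LRN: This is what the AlexNet paper originally used. The neighborhood is defined across
the channels. For each (x,y) position, the normalization is carried out in the depth dimension and is given
by the following formula:

b^i_{x,y} = a^i_{x,y} / \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} (a^j_{x,y})^2 \right)^{\beta}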

Ref: http://surl.li/fduoi
Local Response Normalization

Inter-Channel LRN: Here i indicates the output of filter i, a(x,y) and b(x,y) are the pixel values at
position (x,y) before and after normalization respectively, and N is the total number of channels. The
constants (k,α,β,n) are hyper-parameters. k is used to avoid any singularities (division by zero), α is
used as a normalization constant, while β is a contrasting constant. The constant n is used to define
the neighborhood length, i.e. how many consecutive pixel values need to be considered while
carrying out the normalization. The case of (k,α,β,n)=(0,1,1,N) is the standard normalization.

Ref: http://surl.li/fduoi
Local Response Normalization
Let’s have a look at an example of Inter-channel LRN. Consider the figure

Ref: http://surl.li/fduoi
Local Response Normalization
Different colors denote different channels and hence N=4. Let’s take the hyper-parameters to be
(k,α,β,n)=(0,1,1,2). The value of n=2 means that while calculating the normalized value at
position (i,x,y), we consider the values at the same position for the previous and next filter, i.e.
(i-1,x,y) and (i+1,x,y). For (i,x,y)=(0,0,0) we have value(i,x,y)=1, value(i-1,x,y) doesn’t exist and
value(i+1,x,y)=1. Hence normalized_value(i,x,y) = 1/(1² + 1²) = 0.5, which can be seen in the lower
part of the figure above. The rest of the normalized values are calculated in a similar way.

Ref: http://surl.li/fduoi
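The worked example can be reproduced with a short inter-channel LRN sketch. The figure's full values are not available here, so the input below is an assumed 4-channel map filled with ones; only the first two channel values at position (0,0), both equal to 1, come from the text. The channels-first layout and the function name are also assumptions.

```python
import numpy as np

def lrn_inter_channel(a, k=0.0, alpha=1.0, beta=1.0, n=2):
    """Inter-channel LRN on a feature map of shape (N, H, W), channels first.

    Each value is divided by (k + alpha * sum of squares over the
    neighboring channels i - n/2 .. i + n/2 at the same position) ** beta.
    With k = 0, an all-zero neighborhood would divide by zero.
    """
    N = a.shape[0]
    half = n // 2
    b = np.empty_like(a, dtype=float)
    for i in range(N):
        lo = max(0, i - half)          # clip the neighborhood at the first channel
        hi = min(N - 1, i + half)      # ... and at the last channel
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.ones((4, 2, 2))                 # hypothetical input: N = 4 channels
b = lrn_inter_channel(a, k=0.0, alpha=1.0, beta=1.0, n=2)

print(b[0, 0, 0])   # 0.5: channel 0 has only itself and channel 1 as neighbors
```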
Local Response Normalization

Intra-Channel LRN: In Intra-channel LRN, the neighborhood is extended within the same
channel only as can be seen in the figure above. The formula is given by

Ref: http://surl.li/fduoi
Local Response Normalization

where (W,H) are the width and height of the feature map (for example in the figure above (W,H) =
(8,8)). The only difference between Inter and Intra Channel LRN is the neighborhood for
normalization. In Intra-channel LRN, a 2D neighborhood is defined (as opposed to the 1D
neighborhood in Inter-Channel) around the pixel under-consideration. As an example, the figure
below shows the Intra-Channel normalization on a 5x5 feature map with n=2 (i.e. 2D
neighborhood of size (n+1)x(n+1) centered at (x,y)).

Ref: http://surl.li/fduoi
Local Response Normalization

Ref: http://surl.li/fduoi
Comparison of BN & LRN
LRN has multiple directions to perform normalization across (Inter or Intra Channel), on the other hand,
BN has only one way of being carried out (for each pixel position across all the activations). The table
below compares the two normalization techniques.
Training a Convolutional Network

CNN Architectures
https://towardsdatascience.com/5-most-well-known-cnn-architectures-visualized-af76f1f0065e#7ebd

LRN: https://towardsdatascience.com/difference-between-local-response-normalization-and-batch-normalization-272308c034ac#:~:text=Local%20Response%20Normalization%20(LRN)%20was,was%20to%20encourage%20lateral%20inhibition.
