Deep Learning - AD3501 - Notes - Unit 2 - Convolutional Neural Networks
Input Layers: It’s the layer in which we give input to our model. The number of neurons in this layer
is equal to the total number of features in our data (number of pixels in the case of an image).
Hidden Layer: The input from the input layer is then fed into the hidden layer. There can be many
hidden layers depending upon our model and data size. Each hidden layer can have a different number
of neurons, which is generally greater than the number of features. The output from each layer is
computed by matrix multiplication of the output of the previous layer with the learnable weights of that
layer, followed by the addition of learnable biases and an activation function, which makes the network
nonlinear.
Output Layer: The output from the hidden layer is then fed into a logistic function like sigmoid or
softmax, which converts the output for each class into its probability score.
Feeding the data through the model and obtaining the output of each layer as above is called feedforward.
We then calculate the error using an error function; common choices include cross-entropy and squared
error. The error function measures how well the network is performing. After that, we backpropagate
through the model by calculating the derivatives. This step, called backpropagation, is what
is used to minimize the loss.
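To make the feedforward and backpropagation steps concrete, here is a minimal PyTorch sketch of one training step for such a network; the layer sizes, batch, and labels are illustrative, not taken from the text above.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 784 input features (e.g., a 28x28 image), 10 classes.
model = nn.Sequential(
    nn.Linear(784, 128),   # hidden layer: learnable weights and biases
    nn.ReLU(),             # activation function makes the network nonlinear
    nn.Linear(128, 10),    # output layer producing one score per class
)
loss_fn = nn.CrossEntropyLoss()   # error function (applies softmax internally)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 784)           # dummy batch of 32 samples
y = torch.randint(0, 10, (32,))    # dummy integer class labels

logits = model(x)          # feedforward: outputs of each layer in sequence
loss = loss_fn(logits, y)  # measure how well the network is performing
loss.backward()            # backpropagation: compute derivatives of the loss
optimizer.step()           # update weights to minimize the loss
optimizer.zero_grad()
```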
CNNs were first developed and deployed around the 1980s. At the time, a CNN could only recognize
handwritten digits, and it was primarily used to read ZIP codes and PIN codes. The most common aspect
of any AI model is that it requires a massive amount of data to train. This was one of the biggest problems
that CNNs faced at the time, and because of it they were only used in the postal industry. Yann LeCun was
the first to introduce convolutional neural networks.
Convolutional Neural Networks, commonly referred to as CNNs, are a specialized kind of neural network
architecture that is designed to process data with a grid-like topology. This makes them particularly well-
suited for dealing with spatial and temporal data, like images and videos that maintain a high degree of
correlation between adjacent elements.
CNNs are similar to other neural networks, but they have an added layer of complexity due to the fact that
they use a series of convolutional layers. Convolutional layers perform a mathematical operation called
convolution, a sort of specialized matrix multiplication, on the input data. The convolution operation helps
to preserve the spatial relationship between pixels by learning image features using small squares of input
data. The picture below represents a typical CNN architecture.
Convolutional layers
Convolutional layers operate by sliding a set of ‘filters’ or ‘kernels’ across the input data. Each filter is
designed to detect a specific feature or pattern, such as edges, corners, or more complex shapes in the case
of deeper layers. As these filters move across the image, they generate a map that signifies the areas where
those features were found. The output of the convolutional layer is a feature map, which is a
representation of the input image with the filters applied. Convolutional layers can be stacked to create
more complex models, which can learn more intricate features from images. Simply speaking,
convolutional layers are responsible for extracting features from the input images. These features might
include edges, corners, textures, or more complex patterns.
Pooling layers
Pooling layers follow the convolutional layers and are used to reduce the spatial dimension of the input,
making it easier to process and requiring less memory. In the context of images, “spatial dimensions” refer
to the width and height of the image. An image is made up of pixels, and you can think of it like a grid,
with rows and columns of tiny squares (pixels). By reducing the spatial dimensions, pooling layers help
reduce the number of parameters or weights in the network. This helps to combat over-fitting and speeds
up training. Max pooling reduces computational complexity, owing to the reduction in size of the
feature map, and makes the model invariant to small translations. Without max pooling, the network
would not gain the ability to recognize features irrespective of small shifts or
rotations. This would make the model less robust to variations in object positioning within the image,
possibly affecting accuracy.
There are two main types of pooling: max pooling and average pooling. Max pooling takes the maximum
value from each region of the feature map. For example, if the pooling window size is 2×2, it will pick the
pixel with the highest value in that 2×2 region. Max pooling effectively captures the most prominent feature or
characteristic within the pooling window. Average pooling calculates the average of all values within the
pooling window. It provides a smooth, average feature representation.
Stacking sets of convolution layers followed by max-pooling layers creates a hierarchy of features. The
first layer detects simple patterns, and subsequent layers build on those to detect more complex patterns.
CNNs are often used for image recognition and classification tasks. For example, CNNs can be used to
identify objects in an image or to classify an image as being a cat or a dog. CNNs can also be used for more
complex tasks, such as generating descriptions of an image or identifying the points of interest in an image.
Beyond image data, CNNs can also handle time-series data, such as audio data or even text data, although
other types of networks like Recurrent Neural Networks (RNNs) or transformers are often preferred for
these scenarios. CNNs are a powerful tool for deep learning, and they have been used to achieve state-of-
the-art results in many different applications.
The Convolutional layer applies filters to the input image to extract features, the Pooling layer down
samples the image to reduce computation, and the fully connected layer makes the final prediction. The
network learns the optimal filters through back propagation and gradient descent, as detailed in Fig. 3.
Fig. 3 Functions of CNN Layers
LeNet: LeNet is the first CNN architecture. It was developed in 1998 by Yann LeCun and his collaborators
(Leon Bottou, Yoshua Bengio, and Patrick Haffner) for the handwritten digit recognition problem. LeNet was
one of the first successful CNNs and is often considered the "Hello World" of deep learning. It is one of the
earliest and most widely used CNN architectures and has been successfully applied to tasks such as
handwritten digit recognition. The LeNet-5 architecture consists of alternating convolutional and pooling
(subsampling) layers followed by fully connected layers: three convolution layers, two pooling layers, and
two fully connected layers. LeNet was the beginning of CNNs in deep learning for computer vision problems.
Deep networks of this era were hard to train because of the vanishing gradient problem; LeNet's pooling
(subsampling) layers, inserted between convolutional layers, reduce the spatial size of the feature maps,
which helps prevent overfitting and allows CNNs to train more effectively. The diagram below represents the
LeNet-5 architecture.
The LeNet CNN is a simple yet powerful model that has been used for various tasks such as handwritten
digit recognition, traffic sign recognition, and face detection. Although LeNet was developed more than 20
years ago, its architecture is still relevant today and continues to be used.
AlexNet: AlexNet is the deep learning architecture that popularized CNNs. It was developed by Alex
Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. AlexNet has a very similar architecture to LeNet,
but it is deeper and bigger, with convolutional layers stacked on top of each other. AlexNet was the
first large-scale CNN and was used to win the ImageNet Large Scale Visual Recognition Challenge
(ILSVRC) in 2012. The AlexNet architecture was designed for large-scale image datasets, and it
achieved state-of-the-art results at the time of its publication. AlexNet is composed of 5 convolutional layers
combined with max-pooling layers, 3 fully connected layers, and 2 dropout layers. The activation
function used in all hidden layers is ReLU; the output layer uses Softmax. The total
number of parameters in this architecture is around 60 million.
ZFNet: ZFNet is a CNN architecture that uses a combination of convolutional and fully connected layers.
It was developed by Matthew Zeiler and Rob Fergus and was the ILSVRC 2013 winner. The network has
relatively fewer parameters than AlexNet but still outperforms it on the ILSVRC 2012 classification task.
It improved on AlexNet by tweaking the architecture hyperparameters, in particular by expanding the size
of the middle convolutional layers and making the stride and filter size of the first layer smaller. It is based
on the Zeiler and Fergus model, which was trained on the ImageNet dataset. The ZFNet architecture
consists of a total of seven layers: convolutional layers, max-pooling layers (downscaling), a concatenation
layer, a convolutional layer with linear activation function and stride one, and dropout for regularization
applied before the fully connected output. This model is computationally more efficient than AlexNet,
introducing an approximate inference stage through deconvolutional layers in the middle of the network.
GoogLeNet: GoogLeNet is the CNN architecture that Google used to win the ILSVRC 2014 classification
task. It was developed by Christian Szegedy and colleagues at Google. It showed a notably reduced error
rate in comparison with the previous winners AlexNet (ILSVRC 2012) and ZFNet (ILSVRC 2013), and its
error is also significantly lower than that of VGG (the 2014 runner-up). It achieves a deeper architecture
by employing a number of distinct techniques, including 1x1 convolution and
global average pooling. The GoogLeNet architecture is nevertheless computationally expensive. To reduce
the number of parameters that must be learned, it uses heavy unpooling layers on top of CNNs to remove
spatial redundancy during training, and it also features shortcut connections between the first two
convolutional layers before adding new filters in later CNN layers. Real-world applications/examples of the
GoogLeNet architecture include the Street View House Numbers (SVHN) digit recognition task, which is
often used as a proxy for roadside object detection. Below is a simplified block diagram representing the
GoogLeNet CNN architecture:
VGGNet: VGGNet is the CNN architecture developed by Karen Simonyan and Andrew Zisserman at the
University of Oxford. VGG16 is a 16-layer CNN with roughly 138 million parameters, trained on the
ImageNet dataset of about 1.3 million images (1000 classes). It takes large input images of 224 x 224
pixels, for which its fully connected layers produce 4096-dimensional feature vectors. Networks this large
are expensive to train and require a lot of data, which is the main reason why CNN architectures like
GoogLeNet work better than VGGNet for most image classification tasks where input images have a size
between 100 x 100 and 350 x 350 pixels. Real-world applications/examples of the VGGNet architecture
include the ILSVRC 2014 classification task, which was won by GoogLeNet with VGGNet as runner-up.
The VGG model serves as a strong baseline for many applications in computer vision due to its
applicability to numerous tasks, including object detection. Its deep feature representations are used across
multiple neural network architectures like YOLO, SSD, etc. The diagram below represents the standard
VGG16 network architecture diagram:
ResNet: ResNet is the CNN architecture that was developed by Kaiming He et al. to win the ILSVRC 2015
classification task, which it did with a top-five error of only 3.57%. The deepest variant has 152 layers and
roughly 60 million parameters, which is considered very deep even for CNNs, and training it on the
ImageNet dataset is extremely compute-intensive. CNNs are mostly used for image classification
tasks with 1000 classes, but ResNet showed that the same residual design can also be used successfully for
natural language processing problems like sentence completion or machine comprehension, where it was
used by the Microsoft Research Asia team in 2016 and 2017. Real-life applications/examples of the ResNet
architecture include Microsoft's machine comprehension system, which has used CNNs to generate
the answers for more than 100k questions in over 20 categories. ResNet is computationally efficient
relative to its depth and can be scaled up or down to match the computational power of GPUs.
MobileNets: MobileNets are CNNs that can fit on a mobile device to classify images or detect objects
with low latency. MobileNets were developed by Andrew G. Howard et al. at Google. They are usually very
small CNN architectures, which makes them easy to run in real time on embedded devices like smartphones
and drones. The architecture is also flexible: it can be scaled in depth and width, and for on-device use it
still works better than heavyweight architectures like VGGNet. Real-life examples of the MobileNet
architecture include the models built into Android phones to run Google's Mobile Vision API, which can
automatically identify labels of popular objects in images.
GoogLeNet_DeepDream: DeepDream is a CNN-based image-generation technique developed by
Alexander Mordvintsev, Christopher Olah, et al. at Google. It uses the Inception (GoogLeNet) network to
generate images based on the features a CNN has learned. The technique is often used with
ImageNet-trained models to generate psychedelic images or create abstract artwork.
To summarize the different types of CNN architectures described above in an easy-to-remember form, you
can use the following table:
Table 1. Different Types of CNN Architectures
Architecture | Year | Key Features | Use Case
LeNet-5 | 1998 | First successful CNN; alternating convolution and pooling layers followed by fully connected layers | Handwritten digit recognition
AlexNet | 2012 | Deeper and bigger than LeNet; ReLU activations; dropout; ~60 million parameters | ILSVRC 2012 winner; large-scale image classification
ZFNet | 2013 | AlexNet with tuned hyperparameters (smaller first-layer filters and stride) | ILSVRC 2013 winner
GoogLeNet | 2014 | Inception modules; 1x1 convolutions; global average pooling | ILSVRC 2014 winner
VGGNet | 2014 | Deep stacks of small convolutions; ~138 million parameters | ILSVRC 2014 runner-up; widely used feature extractor (YOLO, SSD)
ResNet | 2015 | Residual (shortcut) connections; up to 152 layers | ILSVRC 2015 winner
MobileNet | 2017 | Very small, low-latency architecture for embedded devices | On-device vision (e.g., Google's Mobile Vision API)
Now imagine taking a small patch of this image and running a small neural network, called a filter or
kernel, on it, with say K outputs represented vertically. Now slide that neural network across the
whole image; as a result, we get another image with a different width, height, and depth. Instead of
just the R, G, and B channels, we now have more channels but a smaller width and height. This operation is
called convolution. If the patch size were the same as that of the image, it would be a regular neural network.
Because of this small patch, we have fewer weights.
Now let’s talk about a bit of mathematics that is involved in the whole convolution process.
Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the
same depth as that of input volume (3 if the input layer is image input).
For example, suppose we have to run convolution on an image with dimensions 34x34x3. The possible size of
filters is a x a x 3, where 'a' can be 3, 5, or 7, but small compared to the image dimensions.
During the forward pass, we slide each filter across the whole input volume step by step; the step size is
called the stride (which can have a value of 2, 3, or even 4 for high-dimensional images), and at each
position we compute the dot product between the kernel weights and the corresponding patch of the input volume.
As we slide our filters, we get a 2-D output for each filter; stacking these together as a result, we
get an output volume having a depth equal to the number of filters. The network will learn all the filters.
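As a sketch of this forward pass, the following NumPy snippet convolves one channel with one filter at a chosen stride; the matrix sizes and random values are illustrative.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and return the feature map (no padding)."""
    h, w = image.shape
    f = kernel.shape[0]                  # assume a square f x f kernel
    out = (h - f) // stride + 1          # positions that fit along one side
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            fmap[i, j] = np.sum(patch * kernel)  # dot product of weights and patch
    return fmap

image = np.random.rand(34, 34)                # one channel of a 34x34 input
kernel = np.random.rand(3, 3)                 # one 3x3 learnable filter
print(conv2d(image, kernel).shape)            # (32, 32) with stride 1
print(conv2d(image, kernel, stride=2).shape)  # (16, 16) with stride 2
```

Running several filters this way and stacking the resulting 2-D maps gives the output volume described above.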
1.3.1 Layers used to build ConvNets
A complete convolutional neural network architecture is also known as a ConvNet. A ConvNet is a
sequence of layers, and every layer transforms one volume to another through a differentiable function.
Let's take an example by running a ConvNet on an image of dimension 32 x 32 x 3.
Input Layers: It's the layer in which we give input to our model. In a CNN, the input will generally
be an image or a sequence of images. This layer holds the raw input of the image with width 32,
height 32, and depth 3.
Convolutional Layers: This is the layer used to extract features from the input dataset.
It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are
small matrices, usually of shape 2×2, 3×3, or 5×5. Each kernel slides over the input image data and
computes the dot product between the kernel weights and the corresponding input image patch. The output
of this layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer; we'll get an
output volume of dimension 32 x 32 x 12.
Activation Layer: By adding an activation function to the output of the preceding layer, activation
layers add nonlinearity to the network. An element-wise activation function is applied to the
output of the convolution layer. Some common activation functions are ReLU, Tanh, and Leaky
ReLU. The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
Pooling Layer: This layer is periodically inserted in the ConvNet. Its main function is to reduce
the size of the volume, which makes computation fast, reduces memory, and also prevents over-fitting.
Two common types of pooling layers are max pooling and average pooling. If we use a
max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16 x 16 x 12.
Flattening: The resulting feature maps are flattened into a one-dimensional vector after the
convolution and pooling layers so they can be passed into a fully connected layer for
classification or regression.
Fully Connected Layers: It takes the input from the previous layer and computes the final
classification or regression task.
Output Layer: The output from the fully connected layers is then fed into a logistic function for
classification tasks, such as sigmoid or softmax, which converts the output for each class into its
probability score.
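The walkthrough above maps directly onto standard layers. Here is a minimal PyTorch sketch for the 32 x 32 x 3 example, assuming 3x3 kernels with padding 1 (so the 32 x 32 size is preserved, matching the 32 x 32 x 12 figure) and an illustrative 10 output classes:

```python
import torch
import torch.nn as nn

convnet = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=3, padding=1),  # 12 filters -> 32 x 32 x 12
    nn.ReLU(),                                   # activation layer, shape unchanged
    nn.MaxPool2d(kernel_size=2, stride=2),       # 2 x 2 max pool -> 16 x 16 x 12
    nn.Flatten(),                                # -> vector of 16 * 16 * 12 = 3072
    nn.Linear(16 * 16 * 12, 10),                 # fully connected layer
    nn.Softmax(dim=1),                           # probability score per class
)

x = torch.randn(1, 3, 32, 32)   # one dummy RGB image (channels-first layout)
print(convnet(x).shape)         # torch.Size([1, 10])
```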
2. Convolution Operation:
A convolutional neural network, or ConvNet, is just a neural network that uses convolution. To understand
the principle, we are going to work with a 2-dimensional convolution first.
Convolution is a mathematical operation that allows the merging of two sets of information. In mathematics,
convolution between two functions produces a third function expressing how the shape of one function is
modified by the other. In the case of a CNN, convolution is applied to the input data to filter the information
and produce a feature map.
This filter is also called a kernel, or feature detector, and its dimensions can be, for example, 3x3. A kernel is
a small 2D matrix whose contents are based upon the operations to be performed. The kernel is applied to the
input image by element-wise multiplication and summation; the output obtained is of lower dimensions and
therefore easier to work with.
Above is an example of kernels for applying Gaussian blur (to smooth the image before processing),
sharpening (to enhance the depth of edges), and edge detection. To perform convolution, the kernel passes
over the input image, performing element-wise multiplication and summation. The result for each receptive
field (the area where convolution takes place) is written down in the feature map.
The shape of a kernel is heavily dependent on the input shape of the image and the architecture of the entire
network; mostly, kernels are of size MxM, i.e., square matrices. The movement of a kernel is always
from left to right and top to bottom.
Stride defines the step by which the kernel moves: for example, a stride of 1 makes the kernel slide by one
row/column at a time, while a stride of 2 moves the kernel by 2 rows/columns. We continue sliding the filter
until the feature map is complete.
For input images with 3 or more channels, such as RGB, a filter is applied instead of a single kernel. Filters
are one dimension higher than kernels and can be seen as multiple kernels stacked on each other, where every
kernel is for a particular channel. Therefore, for an RGB image of (32x32) we have a filter of shape, say, (5x5x3).
Here the input matrix has shape 4x4x1 and the kernel is of size 3x3. Since the input is larger than the
kernel, we can implement a sliding window protocol and apply the kernel over the entire input. The first
entry in the convolved result is calculated as:
45*0 + 12*(-1) + 5*0 + 22*(-1) + 10*5 + 35*(-1) + 88*0 + 26*(-1) + 51*0 = -45
We continue sliding the filter until the feature map is complete.
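That first entry can be checked directly. The 3x3 patch is given above, and the coefficients in the sum correspond to the sharpen kernel [[0, -1, 0], [-1, 5, -1], [0, -1, 0]] (inferred from the arithmetic, not stated explicitly):

```python
import numpy as np

patch = np.array([[45, 12,  5],
                  [22, 10, 35],
                  [88, 26, 51]])    # top-left 3x3 receptive field
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]])   # sharpen kernel (inferred from the sum)

print(np.sum(patch * kernel))       # -45, the first feature-map entry
```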
2.1 Sliding window protocol:
1. The kernel gets into position at the top-left corner of the input matrix.
2. Then it starts moving left to right, calculating the dot product and saving it to a new matrix until it
has reached the last column.
3. Next, the kernel resets its position at the first column, but slides down by one row, thus
following the fashion left-to-right and top-to-bottom.
4. Steps 2 and 3 are repeated till the entire input has been processed.
For a 3D input matrix the movement of the kernel will be from front to back, left to right and top to bottom.
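For multi-channel inputs, each position of the filter produces a single number: the element-wise products are summed across all channels. A minimal NumPy sketch, with illustrative shapes:

```python
import numpy as np

def conv2d_multichannel(image, filt):
    """Apply one (f, f, C) filter to an (H, W, C) image, summing over channels."""
    h, w, c = image.shape
    f = filt.shape[0]
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise product over all channels, reduced to one value
            out[i, j] = np.sum(image[i:i+f, j:j+f, :] * filt)
    return out

rgb = np.random.rand(32, 32, 3)   # RGB input
filt = np.random.rand(5, 5, 3)    # one 5x5x3 filter
print(conv2d_multichannel(rgb, filt).shape)   # (28, 28)
```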
In practice, we don't explicitly define the filters that our convolutional layer will use; we instead
parameterize the filters and let the network learn the best filters to use during training. We do, however,
define how many filters we'll use at each layer, a hyperparameter called the depth of the output
volume.
Another hyperparameter is the stride, which defines how much we slide the filter over the data. For example,
if the stride is 1, we move the window by 1 pixel at a time over the image. When we use larger stride values
of 2 or 3, we jump 2 or 3 pixels at a time, which significantly reduces the output size.
The last hyperparameter is the size of the zero-padding, since it is sometimes convenient to pad the input
volume with zeros around the border.
So now we can compute the spatial size of the output volume as a function of the input volume size (W),
the receptive field size of the Conv Layer neurons (F), the stride with which they are applied (S), and the
amount of zero padding used (P) on the border. The formula for calculating how many neurons "fit" is
given by

(W - F + 2P)/S + 1

In our previous example, for the 5x5 input (W=5) and the 2x2 filter (F=2) with stride 1 (S=1) and pad 0
(P=0), we would get a 4x4x(number of filters) output for each network node.
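The formula is easy to wrap in a helper; this sketch checks it against the worked examples in this unit:

```python
def conv_output_size(W, F, S=1, P=0):
    """Number of neurons that 'fit' along one spatial dimension."""
    return (W - F + 2 * P) // S + 1   # floor division rounds down

print(conv_output_size(5, 2, S=1, P=0))   # 4 (5x5 input, 2x2 filter, stride 1)
print(conv_output_size(7, 3, S=2, P=0))   # 3 (7x7 input, 3x3 filter, stride 2)
```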
Traditional neural network layers use matrix multiplication by a matrix of parameters describing the
interaction between each input and output unit. This means that every output unit interacts with every input
unit. Convolutional neural networks, however, have sparse interactions. This is achieved by making the
kernel smaller than the input: an image can have millions of pixels, but while processing it with a kernel we
can detect meaningful information spanning only tens or hundreds of pixels. This means that we need to
store fewer parameters, which not only reduces the memory requirement of the model but also
improves the statistical efficiency of the model.
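The saving is easy to quantify. A quick sketch comparing a fully connected mapping with a single shared kernel, for an illustrative 320 x 280 image:

```python
# Fully connected: every output pixel interacts with every input pixel.
inputs = 320 * 280             # 89,600 input units
outputs = 320 * 280            # a same-sized output
fc_params = inputs * outputs   # about 8 billion weights

# Convolution: each output depends only on a small kernel neighborhood,
# and the same kernel weights are reused at every position.
conv_params = 3 * 3            # a single 3x3 kernel

print(f"{fc_params:,} weights vs {conv_params}")  # 8,028,160,000 vs 9
```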
The same filter weights (1, 0, -1) are used across that layer, an example of parameter sharing.
In some cases, we may not wish to share parameters across the entire image. If the image is cropped to be
centered on a face, we may want different features from different parts of the face: the part of the network
processing the top of the face looks for eyebrows, while the part processing the bottom of the face looks
for the chin. Certain image operations, such as scaling and rotation, are not equivariant to convolution;
other mechanisms are needed for such transformations.
2.5 Pooling
The pooling operation involves sliding a two-dimensional filter over each channel of the feature map and
summarizing the features lying within the region covered by the filter.
For a feature map having dimensions nh x nw x nc, the dimensions of the output obtained after a pooling
layer are

⌊(nh - f)/s + 1⌋ x ⌊(nw - f)/s + 1⌋ x nc

where,
nh - height of the feature map
nw - width of the feature map
nc - number of channels in the feature map
f - size of the filter
s - stride length
A common CNN model architecture is to have a number of convolution and pooling layers stacked one after the
other.
Pooling layers are used to reduce the dimensions of the feature maps. Thus, it reduces the number of parameters
to learn and the amount of computation performed in the network.
The pooling layer summarizes the features present in a region of the feature map generated by a convolution
layer. So, further operations are performed on summarized features instead of precisely positioned features
generated by the convolution layer. This makes the model more robust to variations in the position of the
features in the input image.
Max Pooling
Max pooling is a pooling operation that selects the maximum element from the region of the feature map
covered by the filter. Thus, the output after max-pooling layer would be a feature map containing the most
prominent features of the previous feature map.
Average Pooling
Average pooling computes the average of the elements present in the region of feature map covered by the
filter. Thus, while max pooling gives the most prominent feature in a particular patch of the feature map,
average pooling gives the average of features present in a patch.
Global Pooling
Global pooling reduces each channel in the feature map to a single value. Thus, an nh x nw x nc feature map is
reduced to 1 x 1 x nc feature map. This is equivalent to using a filter of dimensions nh x nw i.e. the dimensions
of the feature map. Further, it can be either global max pooling or global average pooling.
Global Average Pooling
Considering a tensor of shape h*w*n, the output of the Global Average Pooling layer is a single value across
h*w that summarizes the presence of the feature. Instead of downsizing the patches of the input feature map,
the Global Average Pooling layer downsizes the whole h*w into 1 value by taking the average.
Global Max Pooling
With the tensor of shape h*w*n, the output of the Global Max Pooling layer is a single value across h*w that
summarizes the presence of a feature. Instead of downsizing the patches of the input feature map, the Global
Max Pooling layer downsizes the whole h*w into 1 value by taking the maximum.
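A minimal NumPy sketch of 2x2 max, average, and global pooling on a single channel; the feature-map values are illustrative:

```python
import numpy as np

fmap = np.array([[1., 3., 2., 0.],
                 [5., 6., 1., 2.],
                 [7., 2., 4., 8.],
                 [0., 1., 3., 9.]])   # a 4x4 feature map (one channel)

def pool(fmap, f=2, s=2, op=np.max):
    out = (fmap.shape[0] - f) // s + 1   # floor((n - f)/s) + 1 per side
    res = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            res[i, j] = op(fmap[i*s:i*s+f, j*s:j*s+f])
    return res

print(pool(fmap, op=np.max))    # max pooling:     [[6. 2.] [7. 9.]]
print(pool(fmap, op=np.mean))   # average pooling: [[3.75 1.25] [2.5 6.]]
print(fmap.max(), fmap.mean())  # global max and global average pooling
```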
In convolutional neural networks (CNNs), the pooling layer is a common type of layer that is typically added
after convolutional layers. The pooling layer is used to reduce the spatial dimensions (i.e., the width and height)
of the feature maps, while preserving the depth (i.e., the number of channels).
The pooling layer works by dividing the input feature map into a set of non-overlapping regions, called
pooling regions. Each pooling region is then transformed into a single output value, which represents the
presence of a particular feature in that region. The most common types of pooling operations are max
pooling and average pooling.
In max pooling, the output value for each pooling region is simply the maximum value of the input
values within that region. This has the effect of preserving the most salient features in each pooling
region, while discarding less relevant information. Max pooling is often used in CNNs for object
recognition tasks, as it helps to identify the most distinctive features of an object, such as its edges and
corners.
In average pooling, the output value for each pooling region is the average of the input values within
that region. This has the effect of preserving more information than max pooling, but may also dilute the
most salient features. Average pooling is often used in CNNs for tasks such as image segmentation and
object detection, where a more fine-grained representation of the input is required.
Pooling layers are typically used in conjunction with convolutional layers in a CNN, with each pooling layer
reducing the spatial dimensions of the feature maps, while the convolutional layers extract increasingly
complex features from the input. The resulting feature maps are then passed to a fully connected layer, which
performs the final classification or regression task.
2.5.2 Advantages of Pooling Layer
Dimensionality reduction: The main advantage of pooling layers is that they help in reducing the spatial
dimensions of the feature maps. This reduces the computational cost and also helps in avoiding over-fitting by
reducing the number of parameters in the model.
Translation invariance: Pooling layers are also useful in achieving translation invariance in the feature maps.
This means that the position of an object in the image does not affect the classification result, as the same
features are detected regardless of the position of the object.
Feature selection: Pooling layers can also help in selecting the most important features from the input, as max
pooling selects the most salient features and average pooling preserves more information.
3. Convolution Variants
The goal of a CNN is to transform the input image into concise, abstract representations of the original input.
The individual convolutional layers try to find more complex patterns in the previous layer's observations;
the logic is that 10 curved lines would form two ellipses, which would make an eye.
To do this, each layer uses a kernel, usually a 2x2 or 3x3 matrix, that slides through the previous layer's
output to generate a new output. The word convolve, from convolution, means to roll or slide.
The variants of convolution operations are as follows:
Strided convolution: we take the element-wise product as usual in this upper-left (3 x 3) region, then
multiply and sum the elements. That gives us (91). But then, instead of stepping the blue box over by one
step, we step it over by two steps, so the upper-left corner jumps over one position. We do the usual
element-wise product and summation again, and that gives us (100). Next, we make the blue box jump over
by two steps once more and obtain the value (83). Then, when we go to the next row, we again take two
steps instead of one. Moving the filter by (2) steps, we obtain (69).
In this example we convolve a (7 x 7) matrix with a (3 x 3) matrix and get a (3 x 3) output. The
input and output dimensions turn out to be governed by the following formula:

⌊(n - f + 2p)/s⌋ + 1

If we have an (n x n) image convolved with an (f x f) filter, using padding (p) and stride (s) (here s=2),
then the numerator is (n - f + 2p). Because we're stepping (s) steps at a time instead of just one step,
we divide by (s) and add (1). In our example, we have ((7 - 3 + 0)/2 + 1 = 4/2 + 1 = 3), which is why we
end up with this (3 x 3) output. Notice that in this formula we round the value of the fraction, which in
general might not be an integer, down to the nearest integer.
Moreover, each convolution operation is effectively learning an additional feature map, a learned
representation of the training data, and tiled layers, like convolutional layers, still have a relatively
small number of learned parameters. In essence, it is the pooling operation over these multiple "tiled" maps
that allows the network to learn invariances to scaling and rotation.
In the figure above, units with the same color belong to the same map; within each map, units with the same
fill texture have tied weights. We call this local untying of weights "tiling." Tiled CNNs are parametrized by
a tile size k: we constrain only units that are k steps away from each other to be tied. By varying k, we obtain
a spectrum of models which trade off between being able to learn complex invariances and having few
learnable parameters. At one end of the spectrum we have traditional CNNs (k = 1), and at the other, we have
fully untied simple units.

Next, we will allow our model to use multiple "maps," so as to learn highly overcomplete representations. A
map is a set of pooling units and simple units that collectively cover the entire image (see Figure 8 - Right).
When varying the tiling size, we change the degree of weight tying within each map; for example, if k = 1,
the simple units within each map will have the same weights. In our model, simple units in different maps are
never tied. By having units in different maps learn different features, our model can learn a rich and diverse
set of features. Tiled CNNs with multiple maps enjoy the twin benefits of (i) being able to represent complex
invariances, by pooling over (partially) untied weights, and (ii) having a relatively small number of learnable
parameters.
When downsampling and upsampling techniques are applied to transposed convolutional layers, their effects
are reversed. The reason is that a network can use convolutional layers to compress the image, and then
transposed convolutional layers, with the exact same downsampling and upsampling settings, to reconstruct
the image.
When padding is 'added' to a transposed convolutional layer, it acts as if padding had been removed from the
input, and the resulting output becomes smaller.
Without padding, the output is 7x7, but with padding on both sides, it is 5x5. When strides are used in a
transposed convolution, they affect the input rather than the output.
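These sizes can be verified with PyTorch's ConvTranspose2d; a minimal sketch, assuming a 5x5 single-channel input and a 3x3 kernel:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)   # one 5x5 single-channel input

# Transposed convolution output size: (in - 1) * stride - 2 * padding + kernel
up = nn.ConvTranspose2d(1, 1, kernel_size=3, padding=0)
print(up(x).shape)            # torch.Size([1, 1, 7, 7]): no padding

up_padded = nn.ConvTranspose2d(1, 1, kernel_size=3, padding=1)
print(up_padded(x).shape)     # torch.Size([1, 1, 5, 5]): padding shrinks the output
```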
The dilation rate effectively increases the receptive field of the filter without increasing the number of
parameters, because the filter is still the same size, but with gaps between the values. This can be useful in
situations where a larger receptive field is needed, but increasing the size of the filter would lead to an increase
in the number of parameters and computational complexity.
Dilated convolutions have been used successfully in various applications, such as semantic segmentation, where
a larger context is needed to classify each pixel, and audio processing, where the network needs to learn
patterns with longer time dependencies.
An additional parameter l (the dilation factor) tells how much the input is expanded: (l - 1) positions are
skipped between kernel elements. Figure 9 depicts the difference between normal and dilated convolution;
in essence, normal convolution is just a 1-dilated convolution. The l-dilated convolution is defined as

(F *l k)(p) = Σ F(s) k(t), summed over all s, t with s + l·t = p

where,
F(s) = input
k(t) = applied filter
*l = l-dilated convolution
(F *l k)(p) = output
Advantages of Dilated Convolution:
Using this method rather than normal convolution is better as:
1. Larger receptive field (i.e. no loss of coverage)
2. Computationally efficient (as it provides a larger coverage on the same computation cost)
3. Lower memory consumption (as it skips the pooling step)
4. No loss of resolution of the output image (as we dilate instead of performing pooling)
5. The structure of this convolution helps in maintaining the order of the data.
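The receptive-field gain shows up directly in the output sizes; a minimal PyTorch sketch, assuming a 7x7 input and a 3x3 kernel:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)   # a 7x7 single-channel input

# With dilation 2, the 3x3 kernel covers a 5x5 region using the same 9 weights.
normal = nn.Conv2d(1, 1, kernel_size=3, dilation=1)
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)

print(normal(x).shape)    # torch.Size([1, 1, 5, 5]): effective field 3x3
print(dilated(x).shape)   # torch.Size([1, 1, 3, 3]): effective field 5x5
```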
4. CNN Learning
A neural network without an activation function is essentially just a linear regression model. The activation
function performs the non-linear transformation of the input, making the network capable of learning and
performing more complex tasks.
Here we will look into the ReLU activation function, specifically its non-linear behaviour. ReLU is an
acronym for Rectified Linear Unit; it is the most commonly used activation function. The function returns
0 if it receives any negative input, but for any positive value x it returns that value back. Mathematically it
can be expressed as f(x) = max(0, x). Basically, it sets anything less than or equal to 0 (negative numbers)
to 0 and keeps all values > 0 unchanged. The graphical representation of the ReLU function is:
Now we will look into the derivative of ReLU using the graph above. Let us see the derivative at different
values of x, for both positive and negative values.
As we know, the derivative of a function is defined as the slope of the function at a given point, so you can
see that the function is differentiable almost everywhere. If x is greater than 0 the derivative is 1, and if x is
less than 0 the derivative is 0. But at x = 0, the derivative does not exist. There are two ways to deal with
this. First, you can simply assign an arbitrary value to the derivative of y = f(x) at x = 0. Alternatively,
instead of the actual y = f(x), you can use a smooth approximation to ReLU that is differentiable for all
values of x.
So is ReLU linear or non-linear? A function is linear if its slope is constant over its complete domain. The
ReLU function is non-differentiable at 0, and its slope is 0 for negative values and 1 for positive values, so
the slope is not constant; that is why the ReLU function is non-linear. Intuitively, ReLU is an
activation function, and the purpose of an activation function is to introduce non-linearity into the neural network.
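A minimal NumPy sketch of ReLU and its derivative, adopting the common convention of assigning the derivative at x = 0 the value 0:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)        # f(x) = max(0, x)

def relu_grad(x):
    return (x > 0).astype(float)   # slope 1 for x > 0, else 0 (incl. x == 0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))   # [0. 0. 0. 1. 1.]
```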
Cost Function vs Loss Function
Loss Function | Cost Function
Measures the error between predicted and actual values in a machine learning model | Quantifies the overall cost or error of the model on the entire training set
Used to optimize the model during training | Used to guide the optimization process by minimizing the cost or error
Can be specific to individual samples | Aggregates the loss values over the entire training set
Examples include mean squared error (MSE), mean absolute error (MAE), and binary cross-entropy | Often the average or sum of individual loss values in the training set
Used to evaluate model performance | Used to determine the direction and magnitude of parameter updates during optimization
Different loss functions can be used for different tasks or problem domains | Typically derived from the loss function, but can include additional regularization terms or other considerations
Loss Functions in Deep Learning
Regression:
MSE (Mean Squared Error)
MAE (Mean Absolute Error)
Huber loss
Classification:
Binary cross-entropy
Categorical cross-entropy
A. Regression Loss
1. Mean Squared Error (MSE)
MSE is the average of the squared differences between the predicted and actual values:
MSE = (1/n) Σ (yi - ŷi)².
Advantage
o Easy to interpret
o Always differentiable because of the square
o Only one local minimum
Disadvantage
o The error unit is squared, which makes it harder to interpret
o Not robust to outliers
2. Mean Absolute Error (MAE)
MAE is the average of the absolute differences between the predicted and actual values:
MAE = (1/n) Σ |yi - ŷi|.
Advantage
o Intuitive and easy
o Error unit is the same as that of the output column
o Robust to outliers
Disadvantage
o Not differentiable at zero, so we cannot use gradient descent directly; a subgradient
calculation is needed instead.
Note - In regression, use a linear activation function at the last neuron.
3. Huber Loss
In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers
in data. It behaves like MSE for small errors and like MAE for large ones:
Lδ(e) = e²/2 for |e| ≤ δ, and δ(|e| - δ/2) otherwise.
Advantage
o Robust to outliers
o It lies between MAE and MSE
Disadvantage
o Its main disadvantage is the associated complexity: in order to maximize model
accuracy, the hyperparameter δ also needs to be optimized, which increases the
training requirements.
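A minimal NumPy sketch of the three regression losses above; the data and the choice δ = 1.0 are illustrative:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    e = np.abs(y - y_hat)
    quadratic = 0.5 * e ** 2             # MSE-like for small errors
    linear = delta * (e - 0.5 * delta)   # MAE-like for large errors
    return np.mean(np.where(e <= delta, quadratic, linear))

y = np.array([3.0, 5.0, 2.5])
y_hat = np.array([2.5, 5.0, 4.0])
print(mse(y, y_hat), mae(y, y_hat), huber(y, y_hat))
```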
B. Classification Loss
1. Binary Cross-Entropy / Log Loss
It is used in binary classification problems with two classes, for example whether a person has COVID
or not, or whether an article becomes popular or not.
Binary cross-entropy compares each of the predicted probabilities to the actual class output, which
can be either 0 or 1. It then calculates a score that penalizes the probabilities based on their distance
from the expected value:

BCE = -(1/N) Σ [yi log(ŷi) + (1 - yi) log(1 - ŷi)]

where,
yi - actual values
ŷi - neural network prediction
Advantage
o The cost function is differentiable
Disadvantage
o Multiple local minima
o Not intuitive
2. Categorical Cross-Entropy
It is used in multi-class classification and compares a one-hot encoded target with the predicted class
probabilities:

CCE = -Σk yk log(ŷk)

where,
k is the class index,
y - actual value
ŷ - neural network prediction
Note – In multi-class classification at the last neuron use the softmax activation function.
If the problem statement has 3 classes, the softmax activation is
f(z1) = e^z1 / (e^z1 + e^z2 + e^z3).
If the target column is one-hot encoded into classes like 0 0 1, 0 1 0, 1 0 0, then use categorical
cross-entropy. And if the target column has numerical encoding of classes like 1, 2, 3, 4, ..., n, then use
sparse categorical cross-entropy. Sparse categorical cross-entropy is faster than categorical cross-entropy.
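A minimal NumPy sketch of softmax followed by the two cross-entropy variants, showing that one-hot and integer labels give the same value; the logits are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # raw scores (logits) for 3 classes
p = softmax(z)                  # predicted probabilities

one_hot = np.array([1.0, 0.0, 0.0])   # categorical cross-entropy target
print(-np.sum(one_hot * np.log(p)))   # categorical cross-entropy

label = 0                       # sparse variant: just the integer class index
print(-np.log(p[label]))        # same value, no one-hot encoding needed
```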
2. Gradient Descent: Gradient descent is an optimization algorithm that uses the gradients of the loss
function to iteratively update the model's parameters in a way that reduces the loss. The basic idea is to
take steps in the opposite direction of the gradient to reach a local minimum of the loss function. This
process is repeated until the algorithm converges to a set of parameter values that, hopefully, result in a
well-trained model.
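A minimal NumPy sketch of gradient descent minimizing MSE for a one-parameter model y = w * x; the data and learning rate are illustrative:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                    # ground truth generated with w = 2

w, lr = 0.0, 0.05              # initial parameter and learning rate
for step in range(100):
    y_hat = w * x                         # forward pass
    grad = np.mean(2 * (y_hat - y) * x)   # derivative of MSE w.r.t. w
    w -= lr * grad                        # step opposite the gradient

print(round(w, 4))             # converges toward 2.0
```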