
Foundations of Deep Learning

Module 3

Syllabus
Convolutional Neural Networks – Architecture, Convolution operation, Motivation, Pooling.
Variants of convolution functions, Structured outputs, Data types, Efficient convolution
algorithms, Applications of Convolutional Networks, Pre-trained convolutional architectures:
AlexNet, ZFNet, VGGNet-19, ResNet-50.

Overview (watch these videos):
https://www.youtube.com/watch?v=QzY57FaENXg
https://www.youtube.com/watch?v=K_BHmztRTpA&sttick=0

Reference: https://www.ibm.com/topics/convolutional-neural-networks

• A Convolutional Neural Network (CNN) is a type of Deep Learning neural network
architecture commonly used in Computer Vision, the field of AI that deals with
understanding and interpreting image/visual data.

• CNNs are a special type of ANN that accepts images as inputs.

Why do we have to use CNN?

• Grayscale images have pixel values ranging from 0 to 255, i.e. 8-bit pixel
values.

• If the size of the image is N×M, then the size of the input vector will be N*M. For RGB
images, it would be N*M*3, where 3 represents the number of channels.

• Consider an RGB image of size 30x30. Feeding it to a fully connected network would
require 2700 input neurons [30*30*3 = 2700]. An RGB image of size 256x256 would require
196,608 neurons (well over 100,000).

• The number of weights and parameters for a 224x224x3 input is very high.

• A single neuron in the output layer would have 224x224x3 = 150,528 weights coming into
it. This would require more computation, memory, and data.

• In a CNN, each layer performs convolution. The CNN takes as input an image volume
(width × height × channels) for an RGB image.

• Basically, an image is taken as the input and we apply a kernel/filter to the image to
get the output.

• CNNs enable parameter sharing between the output neurons, which means that a
feature detector (for example, a horizontal edge detector) that is useful in one part of the
image is probably useful in another part of the image.

What are the issues with CNN?

● CNNs can be challenging to train and require large amounts of data. Additionally, they
can be computationally expensive, especially for large and complex models.
● Vulnerable to adversarial attacks
● Limited ability to generalize

They have three main types of layers, which are:

● Convolutional layer
● Pooling layer
● Fully-connected layer

Convolutions

• Every output neuron is connected to a small neighborhood in the input through a
weight matrix, also referred to as a kernel.

• We can define multiple kernels for every convolution layer, each giving rise to an output.

• Each filter is moved around the input image.

• The outputs corresponding to each filter are stacked, giving rise to an output volume.

Padding
• Padded convolution is used when preserving the dimensions of the input matrix is
important to us; it helps us keep more of the information at the border of an
image.

• We have seen that convolution reduces the size of the feature map.

• To retain the dimensions of the output feature map as those of the input map, we pad
(append) the rows and columns with zeros.

• Padding P = (F − 1)/2, where F is the size of the kernel matrix.
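As a quick worked check (not part of the original notes): a 3×3 kernel (F = 3) needs
P = (3 − 1)/2 = 1, i.e. one ring of zeros around the image, while a 5×5 kernel (F = 5)
needs P = 2 to keep the output the same size as the input.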

Stride

• Stride refers to the number of pixels the kernel filter moves at each step, i.e. pixels per step.

• A stride of 2 means the kernel moves 2 pixels between convolution operations, skipping every
other position.

• With a stride of 1, the kernel filter slides over the input matrix one pixel at a time.

• With a stride of 2, the kernel slides over the input two pixels at a time.

• The area of the output feature map is reduced roughly 4 times when the stride is increased
from 1 to 2.

• The dimension of the output feature map is (N − F)/S + 1, for an N×N input, an F×F kernel,
and stride S (with no padding).
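A minimal sketch of this formula in Python (an illustrative helper, not from the original
notes; it also includes the zero-padding term used later in these notes):

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output width of a convolution on an n x n input
    with an f x f kernel, given stride and zero padding."""
    return (n - f + 2 * padding) // stride + 1

print(conv_output_size(7, 3, stride=1))  # 5
print(conv_output_size(7, 3, stride=2))  # 3 (about half the width, ~1/4 the area)
```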


Pooling

• The pooling operation involves sliding a two-dimensional filter over each channel
of the feature map and summarising the features lying within the region covered by the
filter.

• A common CNN model architecture is to have a number of convolution and pooling
layers stacked one after the other.

• Pooling layers are used to reduce the dimensions of the feature maps. Thus, they
reduce the number of parameters to learn and the amount of computation
performed in the network.

• The pooling layer summarizes the features present in a region of the feature map
generated by a convolution layer.

• Pooling provides translational invariance by subsampling: it reduces the size of the
feature maps. The two commonly used pooling techniques are max pooling and
average pooling.

• A 2x2 pooling operation divides a 4x4 matrix into four 2x2 regions and picks the value
which is the greatest among the four (for max pooling) or the average of the four
(for average pooling).

• This reduces the size of the feature maps, which therefore reduces the number of
parameters without losing important information.

• One thing to note here is that the pooling operation reduces the Nx and Ny values
of the input feature map but does not reduce the value of Nc (the number of channels).

• Also, the hyperparameters involved in the pooling operation are the filter dimension,
stride, and type of pooling (max or avg).
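A minimal NumPy sketch (illustrative, not from the original notes) of 2x2 max and average
pooling with stride 2 on a 4x4 matrix:

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [4, 8, 3, 5]])

# Split the 4x4 matrix into four non-overlapping 2x2 regions.
regions = x.reshape(2, 2, 2, 2).swapaxes(1, 2)

max_pooled = regions.max(axis=(2, 3))   # [[6, 4], [8, 9]]
avg_pooled = regions.mean(axis=(2, 3))  # [[3.75, 2.25], [5.25, 4.25]]
```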

Max Pooling

• Max pooling is a pooling operation that selects the maximum element from the region
of the feature map covered by the filter. Thus, the output after the max-pooling layer
would be a feature map containing the most prominent features of the previous
feature map.

Average Pooling

• Average pooling computes the average of the elements present in the region of the
feature map covered by the filter. Thus, while max pooling gives the most prominent
feature in a particular patch of the feature map, average pooling gives the average of the
features present in a patch.

Global Pooling

• Global pooling reduces each channel in the feature map to a single value. Thus, an
nh x nw x nc feature map is reduced to 1 x 1 x nc feature map. This is equivalent to
using a filter of dimensions nh x nw i.e. the dimensions of the feature map.

• Further, it can be either global max pooling or global average pooling.
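A one-line NumPy sketch (the layout here is assumed to be height × width × channels;
illustrative, not from the notes):

```python
import numpy as np

feature_map = np.random.rand(7, 7, 64)   # nh x nw x nc
gap = feature_map.mean(axis=(0, 1))      # global average pooling -> shape (64,)
gmp = feature_map.max(axis=(0, 1))       # global max pooling variant
```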

• In convolutional neural networks (CNNs), the pooling layer is a common type of layer
that is typically added after convolutional layers. The pooling layer is used to reduce
the spatial dimensions (i.e., the width and height) of the feature maps, while
preserving the depth (i.e., the number of channels).

• The pooling layer works by dividing the input feature map into a set of
non-overlapping regions, called pooling regions. Each pooling region is then
transformed into a single output value, which represents the presence of a particular
feature in that region. The most common types of pooling operations are max pooling
and average pooling.

• In max pooling, the output value for each pooling region is simply the maximum
value of the input values within that region. This has the effect of preserving the most
salient features in each pooling region, while discarding less relevant information.
Max pooling is often used in CNNs for object recognition tasks, as it helps to identify
the most distinctive features of an object, such as its edges and corners.

• In average pooling, the output value for each pooling region is the average of the
input values within that region. This has the effect of preserving more information
than max pooling, but may also dilute the most salient features. Average pooling is
often used in CNNs for tasks such as image segmentation and object detection,
where a more fine-grained representation of the input is required.

Advantages of Pooling Layer:

• Dimensionality reduction: The main advantage of pooling layers is that they help in
reducing the spatial dimensions of the feature maps. This reduces the computational
cost and also helps in avoiding overfitting by reducing the number of parameters in
the model.

• Translation invariance: Pooling layers are also useful in achieving translation
invariance in the feature maps. This means that the position of an object in the image
does not affect the classification result, as the same features are detected regardless
of the position of the object.

• Feature selection: Pooling layers can also help in selecting the most important
features from the input, as max pooling selects the most salient features and
average pooling preserves more information.

Output Feature Map

• The size of the output feature map or volume depends on:

• Size of the input feature map

• Kernel size (Kw, Kh)

• Zero padding

• Stride (Sw, Sh)
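Combining these (a standard formula, stated here for completeness with the symbols from
the list above): for input width W, kernel width Kw, zero padding P, and stride Sw,

Wout = (W − Kw + 2P)/Sw + 1

and analogously for the height. For example, W = 32, Kw = 5, P = 0, Sw = 1 gives Wout = 28.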

Motivation

● Sparse interactions,
● Parameter sharing
● Equivariant representations.

Sparse Interactions

➔ Convolutional networks have sparse interactions (also referred to as sparse connectivity
or sparse weights). This is accomplished by making the kernel smaller than the input.
➔ For example, when processing an image, the input image might have thousands or
millions of pixels, but we can detect small, meaningful features such as edges with
kernels that occupy only tens or hundreds of pixels. This means that we need to store
fewer parameters, which both reduces the memory requirements of the model and
improves its statistical efficiency.
➔ It also means that computing the output requires fewer operations. These improvements
in efficiency are usually quite large. If there are m inputs and n outputs, then matrix
multiplication requires m×n parameters and the algorithms used in practice have O(m ×
n) runtime (per example). If we limit the number of connections each output may have to
k, then the sparsely connected approach requires only k × n parameters and O(k × n)
runtime.
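To make these savings concrete (illustrative numbers, not from the text): with m = n = 10^6
units, dense matrix multiplication needs m × n = 10^12 parameters, while limiting each output
to k = 9 connections (a 3×3 kernel) needs only k × n = 9 × 10^6 parameters, a reduction of
over five orders of magnitude.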
Parameter Sharing
➔ Parameter sharing refers to using the same parameter for more than one function in a
model. In a traditional neural net, each element of the weight matrix is used exactly once
when computing the output of a layer: it is multiplied by one element of the input and
then never revisited.
➔ As a synonym for parameter sharing, one can say that a network has tied weights,
because the value of the weight applied to one input is tied to the value of a weight
applied elsewhere.
➔ In a convolutional neural net, each member of the kernel is used at every position of the
input (except perhaps some of the boundary pixels, depending on the design decisions
regarding the boundary).
➔ The parameter sharing used by the convolution operation means that rather than
learning a separate set of parameters for every location, we learn only one set.
➔ This does not affect the runtime of forward propagation (it is still O(k × n)), but it does
further reduce the storage requirements of the model to k parameters.
➔ Recall that k is usually several orders of magnitude smaller than m. Since m and n are
usually roughly the same size, k is practically insignificant compared to m × n.
➔ Convolution is thus dramatically more efficient than dense matrix multiplication in terms
of memory requirements and statistical efficiency.
➔ A feed-forward network connects every pixel to each node in the following layer,
ignoring any spatial information present in the image. A convolutional architecture
instead looks at local regions of the image: for example, a 2x2 filter with a stride of 2
scanned across a 4x4 image outputs 4 nodes, each containing localized information
about the image.
Equivariant Representations
➔ Due to parameter sharing, the layers of a convolutional neural network have the property
of equivariance to translation.
➔ It says that if we change the input in a way, the output will also change in the same way.
➔ Specifically, a function f (x) is equivariant to a function g if f(g(x)) = g(f (x)).
➔ In the case of convolution, if we let g be any function that translates the input, i.e., shifts
it, then the convolution function is equivariant to g.
➔ For example, let I be a function giving image brightness at integer coordinates.
➔ Let g be a function mapping one image function to another image function, such that I’=
g(I) is the image function with I’ (x, y) = I(x − 1, y).
➔ This shifts every pixel of I one unit to the right.
➔ If we apply this transformation to I, then apply convolution, the result will be the same as
if we applied convolution to I , then applied the transformation g to the output.
➔ In a traditional 2D CNN designed for grayscale images, the input is a 2D grid of pixel
values. The convolutional layers employ 2D filters (kernels) that slide across the image
in both the vertical and horizontal directions.
➔ These filters capture local patterns in the image, making the network equivariant to
translation in the spatial domain.
➔ For instance, if an object moves to a different position in the image, the network will still
detect it as long as the local pattern (e.g., edges, textures) remains the same.
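A small numerical check of equivariance (a sketch using SciPy with circular boundary
handling, so the shift and the convolution commute exactly; the names are illustrative):

```python
import numpy as np
from scipy.ndimage import correlate

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))

# g: shift the image one pixel to the right (circularly, to avoid border effects)
shift = lambda img: np.roll(img, 1, axis=1)
# f: cross-correlation with circular ('wrap') boundary handling
conv = lambda img: correlate(img, kernel, mode='wrap')

# Equivariance: f(g(x)) == g(f(x))
assert np.allclose(conv(shift(image)), shift(conv(image)))
```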
IMP: Q. What happens if the stride of the convolution layer increases? What can be
the maximum stride? Justify

• Stride is a component of convolutional neural networks, networks tuned for
processing image and video data. Stride is a parameter of the neural
network's filter that modifies the amount of movement over the image or video. For
example, if a neural network's stride is set to 1, the filter will move one pixel, or unit,
at a time. The size of the filter affects the encoded output volume, so stride is usually
set to a whole integer rather than a fraction or decimal.

• Naturally, as the stride, or movement, is increased, the resulting output will be
smaller.

• The choice of stride is also important, but it affects the tensor shape after the
convolution, hence the whole network. The general rule is to use stride=1 in usual
convolutions and preserve the spatial size with padding, and use stride=2 when you
want to downsample the image.

• When the stride of a convolutional layer increases, it means that the filter or kernel
moves a larger distance with each step during the convolution operation. This results
in a reduction in the spatial dimensions of the output feature map. The maximum
stride you can use depends on the dimensions of the input data and the size of the
filter.

• The formula to calculate the output size of a convolutional layer is:
Output size = (N − F + 2P)/S + 1, where N is the input size, F the filter size,
P the padding, and S the stride.

• Increasing the stride value reduces the output size. The maximum stride you can
use is limited by the filter size and the input size. If the stride is too large, the filter
might not effectively cover the input data, causing information loss and potentially
making the network unable to learn meaningful features.

• Typically, a common choice for stride is 1, which means the filter moves one pixel
at a time. Larger strides like 2 or 3 are often used in specific situations to
downsample the feature map and reduce computational complexity in deeper layers
of convolutional neural networks. However, the choice of stride should be made
carefully based on the specific task and network architecture to balance information
preservation and computational efficiency.
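A quick way to sanity-check a stride choice (an illustrative Python sketch; n, f, and p
stand for the input size, filter size, and padding):

```python
def stride_is_valid(n, f, s, p=0):
    """A stride fits cleanly if the filter positions tile the (padded)
    input exactly, i.e. (n - f + 2p) is divisible by s."""
    return (n - f + 2 * p) % s == 0

# 13x13 input, 3x3 filter, no padding:
for s in (2, 3, 5):
    print(s, stride_is_valid(13, 3, s))  # 2 -> True, 3 -> False, 5 -> True
```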
Can we apply multiple filters to the same image? In practice, instead of applying one
kernel, we can apply multiple kernels with different values to the same image, one after
another, so that we get multiple outputs.

All of these outputs can be stacked on top of each other to form a volume.

If we apply three filters to the input, we will get an output of depth equal to 3.

The depth of the output from the convolution operation is equal to the number of filters
being applied to the input, as the sketch below illustrates.
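A minimal NumPy sketch (illustrative; a naive 'valid' cross-correlation rather than an
optimized implementation) showing that output depth equals the number of filters:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' cross-correlation of a 2D image with a 2D kernel."""
    H, W = image.shape
    f = kernel.shape[0]
    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

image = np.random.rand(6, 6)
filters = [np.random.rand(3, 3) for _ in range(3)]

# Stack the per-filter outputs along a new channel axis.
volume = np.stack([conv2d_valid(image, k) for k in filters], axis=-1)
print(volume.shape)  # (4, 4, 3): depth equals the number of filters
```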

Variants of convolution functions

Full Convolution

● Zero padding, unit stride
● Zero padding, stride s: convolution with a stride greater than 1 pixel is equivalent to
convolution with unit stride followed by downsampling.
● Some padding, unit stride


Special cases of 0 padding:

● Valid: no 0 padding is used. The output shrinks at every layer, which limits the
number of layers.
● Same: enough zero padding is added to keep the size of the output equal to the size
of the input, allowing an unlimited number of layers. Pixels near the border influence
fewer output pixels than pixels near the center.
● Full: enough zeros are added for every pixel to be visited k (kernel width) times
in each direction, resulting in an output of width m + k − 1. It is difficult to learn a
single kernel that performs well at all positions in the convolutional feature map.

Usually the optimal amount of 0 padding lies somewhere between 'Valid' and 'Same'. The
output widths for the three cases are worked out below.
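As a quick worked summary (standard results, added for clarity): for an input of width m
and a kernel of width k, valid convolution outputs width m − k + 1, same convolution
outputs width m, and full convolution outputs width m + k − 1. For m = 32 and k = 5 these
are 28, 32, and 36 respectively.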

Unshared Convolution
Useful when we know that each feature should be a function of a small part of space,
but there is no reason to think that the same feature should occur across all of space,
e.g. looking for a mouth only in the bottom half of an image.

It can also be useful to make versions of convolution or locally connected layers in which
the connectivity is further restricted, e.g. constraining each output channel i to be a
function of only a subset of the input channels.
Tiled Convolution
❖ Learn a set of kernels that we rotate through as we move through space. Immediately
neighboring locations will have different filters, but the memory requirement for storing
the parameters will increase only by a factor of the size of this set of kernels.
Structured outputs
Strategies for the size-reduction issue:

● Avoid pooling altogether
● Emit a lower-resolution grid of labels
● Use a pooling operator with unit stride

One strategy for pixel-wise labeling of images is to produce an initial guess of the image
labels and then refine it:

1. Produce an initial guess of the image labels.
2. Refine this initial guess using the interactions between neighboring pixels.

Repeating this refinement step several times corresponds to using the same convolution
at each stage, sharing weights between the last layers of the deep net; a rough sketch of
this loop is given below.
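A minimal Python sketch of the refinement loop (illustrative only; conv_in and conv_lab
stand for convolutions with fixed, shared kernels, and refine_steps is an assumed
hyperparameter):

```python
import numpy as np

def pixelwise_labels(image, conv_in, conv_lab, refine_steps=3):
    """Sketch of iterative label refinement with shared (tied) weights.

    conv_in:  a convolution applied to the input image
    conv_lab: a convolution applied to the current label map
    The SAME two convolutions are reused at every refinement step.
    """
    labels = conv_in(image)                      # 1. initial guess
    for _ in range(refine_steps):                # 2. refine several times
        labels = conv_in(image) + conv_lab(labels)
    return labels

# Toy usage with stand-in "convolutions" (identity-like maps):
img = np.random.rand(5, 5)
out = pixelwise_labels(img, conv_in=lambda x: x, conv_lab=lambda y: 0.5 * y)
```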
Data types
The data used with a convolutional network usually consists of several channels, each
channel being the observation of a different quantity at some point in space or time:
1-D data (e.g. audio waveforms), 2-D data (e.g. images), or 3-D data (e.g. volumetric
scans or video), each with single or multiple channels.
Efficient convolution algorithms

Modern convolutional network applications often involve networks containing more than one
million units. Powerful implementations exploiting parallel computation resources are essential.
However, in many cases it is also possible to speed up convolution by selecting an appropriate
convolution algorithm. Convolution is equivalent to converting both the input and the kernel to
the frequency domain using a Fourier transform, performing point-wise multiplication of the two
signals, and converting back to the time domain using an inverse Fourier transform. For some
problem sizes, this can be faster than the naive implementation of discrete convolution.
When a d-dimensional kernel can be expressed as the outer product of d vectors, one vector
per dimension, the kernel is called separable.
● When the kernel is separable, naive convolution is inefficient.
● It is equivalent to composing d one-dimensional convolutions with each of these vectors.
● The composed approach is significantly faster than performing one d-dimensional convolution
with their outer product.
● The kernel also takes fewer parameters to represent as vectors.
● If the kernel is w elements wide in each dimension, then naive multidimensional convolution
requires O(w^d) runtime and parameter storage space, while separable convolution requires
O(w × d) runtime and parameter storage space.
● Of course, not every convolution can be represented in this way
A spatially separable convolution simply divides a kernel into two smaller kernels. The most
common case is to divide a 3x3 kernel into a 3x1 and a 1x3 kernel.

Now, instead of doing one convolution with 9 multiplications per position, we do two
convolutions with 3 multiplications each (6 in total) to achieve the same effect. With fewer
multiplications, computational complexity goes down, and the network is able to run faster.
A numerical check of this equivalence follows below.
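A small numerical check (a sketch using SciPy; the kernel is deliberately constructed as an
outer product so that it is separable):

```python
import numpy as np
from scipy.signal import convolve2d

col = np.array([[1.0], [2.0], [1.0]])   # 3x1
row = np.array([[1.0, 0.0, -1.0]])      # 1x3
kernel = col @ row                       # separable 3x3 kernel (a Sobel filter)

image = np.random.rand(10, 10)

full_2d = convolve2d(image, kernel, mode='valid')
two_1d = convolve2d(convolve2d(image, col, mode='valid'), row, mode='valid')

assert np.allclose(full_2d, two_1d)      # same result, fewer multiplications
```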
Case Studies of Convolutional Architectures:

LeNet-5
AlexNet
ZFNet
VGGNet-19
ResNet-50

https://iq.opengenus.org/evolution-of-cnn-architectures/#google_vignette
Applications of CNN
Previous Year Questions:
1. Illustrate the strengths and weaknesses of convolutional neural networks.
2. What happens if the stride of the convolutional layer increases? What can be the
maximum stride? Justify your answer
3. Consider an activation volume of size 13×13×64 and a filter of size 3×3×64. Discuss
whether it is possible to perform convolutions with strides 2, 3 and 5. Justify your answer
in each case. (6 marks)
4. Suppose that a CNN was trained to classify images into different categories. It
performed well on a validation set that was taken from the same source as the training
set but not on a testing set. What could be the problem with the training of such a CNN?
How will you ascertain the problem? How can those problems be solved?
5. Explain the following convolution functions: a) tensors b) kernel flipping c) downsampling
d) strides e) zero padding. (10 marks)
6. What is the motivation behind convolution neural networks? (4 marks)
7. Design a Convolutional Neural Network (CNN) for gender classification using face
images of size 256 x 256. Determine suitable filter sizes, activation functions, and the
width of each layer within the network
8. Consider an input image with dimensions of 28 x 28 pixels. You apply a convolutional
operation with a kernel (filter) size of 3x3, a padding of 0, and a stride of 2. Calculate the
dimensions of the output feature map. Also, calculate the padding value if we need the
output to have the same size as the input with a stride of 1.
9. What are the key differences between AlexNet, ZFNet, VGGNet-19, and ResNet-50 in
terms of their architectures and performance?
10. Why do we use pooling in convolutional neural networks? Illustrate with an example how
pooling works.
11. Define the concept of the receptive field. Mention two strategies for expanding the
receptive field without increasing the filter size.
12. Explain the architecture of a Convolutional Neural Network (CNN) and its fundamental
components. (8)
13. Discuss different formats of data that can be used with CNN.
14. Provide examples of diverse applications where Convolutional Neural Networks excel
and explain their effectiveness in those domains.
15. Consider an input image with dimensions of 64 x 64 pixels. You apply a convolutional
operation with a kernel (filter) size of 5x5, a padding of 2, and a stride of 1. Calculate the
dimensions of the output feature map. Additionally, determine the padding value if we
need the output to have the same size as the input with a stride of 2.
