21CS743 DL Module4 Notes
Module 4: The Convolution Operation, Pooling, Convolution and Pooling as an Infinitely Strong Prior,
Variants of the Basic Convolution Function, Structured Outputs, Data Types, Efficient Convolution
Algorithms, Random or Unsupervised Features- LeNet, AlexNet.
Text Book: Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning”, MIT Press, 2016.
Convolutional networks (LeCun, 1989), also known as convolutional neural networks or CNNs, are a
specialized kind of neural network for processing data that has a known, grid-like topology. Examples include
time-series data, which can be thought of as a 1D grid taking samples at regular time intervals, and image
data, which can be thought of as a 2D grid of pixels. Convolutional networks have been tremendously
successful in practical applications. The name “convolutional neural network” indicates that the network
employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation.
Convolutional networks are simply neural networks that use convolution in place of general matrix
multiplication in at least one of their layers.
• Suppose we are tracking the position of a spaceship with a noisy sensor. To obtain a less noisy estimate, we can average several measurements, giving more weight to recent ones. We can do this with a weighting function w(a), where a is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function s providing a smoothed estimate of the position of the spaceship:
s(t) = ∫ x(a) w(t − a) da
• This operation is called convolution. The convolution operation is typically denoted with an asterisk:
s(t) = (x * w)(t)
• In the above equation, x represents the input, * denotes the convolution operation, and w is the filter (kernel) that is applied.
• In the above example, w needs to be a valid probability density function, or the output is not a weighted average. Also, w needs to be 0 for all negative arguments, or it will look into the future, which is presumably beyond our capabilities. These limitations, however, are particular to the example discussed above.
• In general, convolution is defined for any functions for which the above integral is defined, and
may be used for other purposes besides taking weighted averages.
• In convolutional network terminology, the first argument (x) to the convolution is often referred to
as the input and the second argument (the function w) as the kernel. The output is sometimes
referred to as the feature map.
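The weighted-average view of convolution can be sketched with NumPy (a minimal illustration; the signal and kernel values below are made up for the example):

```python
import numpy as np

# Noisy 1D input signal x (e.g., sensor readings over time).
x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])

# Kernel w: a valid probability density (non-negative, sums to 1),
# so the output is a weighted average of nearby inputs.
w = np.array([0.25, 0.5, 0.25])

# 'valid' mode keeps only positions where the kernel fully overlaps x.
s = np.convolve(x, w, mode="valid")
print(s)  # the feature map: a smoothed version of x -> [2.25 3. 4. 4.75]
```

Note that because w here is symmetric, flipping the kernel (true convolution) and not flipping it (cross-correlation) give the same result.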
• Similarly, for 2D input, re-estimation of each pixel is done by taking a weighted average of all its
neighbours.
➢ Convolutional Neural Network Architecture:
• A CNN is typically built from three types of layers: convolutional layers, pooling layers, and fully connected layers.
• Convolution Layer:
• The convolution layer is the core building block of the CNN. It carries the main portion of the
network’s computational load. The CNN architecture is as shown in Figure 1.
• This layer performs a dot product between two matrices, where one matrix is the set of learnable
parameters otherwise known as a kernel, and the other matrix is the restricted portion of the
receptive field.
• The kernel is spatially smaller than the image but extends through the full depth of the input.
• During the forward pass, the kernel slides across the height and width of the image, producing a representation of each receptive region.
• This produces a two-dimensional representation of the image known as an activation map, which gives the response of the kernel at each spatial position of the image. The step size with which the kernel slides is called the stride.
• If we have an input of size W x W x D and Dout kernels, each with a spatial size of F, applied with stride S and amount of zero-padding P, then the spatial size of the output volume can be determined by the following formula:
Wout = (W − F + 2P)/S + 1
The resulting output volume has size Wout x Wout x Dout.
• Number of Filters (Dout): This represents the number of filters (kernels) used in the convolution layer, determining the depth of the output volume.
• Each filter produces one output channel, so the output will have Dout channels. Figure 2 depicts the process of how the activation map is obtained.
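The output-size formula can be checked with a small helper function (a sketch; the layer dimensions below are arbitrary example values):

```python
def conv_output_size(W, F, S, P):
    """Spatial size of a conv layer's output:
    input width W, kernel size F, stride S, zero-padding P."""
    assert (W - F + 2 * P) % S == 0, "kernel does not tile the input evenly"
    return (W - F + 2 * P) // S + 1

# Example: a 32x32xD input, kernels of size 5x5, stride 1, padding 2.
W_out = conv_output_size(W=32, F=5, S=1, P=2)
print(W_out)  # 32: this padding preserves the spatial size

# With Dout = 10 such kernels, the output volume is 32 x 32 x 10.
```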
• Before we go on to the next layer let us try and understand Cross-Correlation and its Role in
CNNs.
• In convolutional networks, the operation commonly referred to as "convolution" is, in fact, cross-
correlation.
• Cross-correlation computes the similarity between the input signal and the kernel as the kernel
slides over the input. Mathematically, this can be written as:
(f ⋆ g)[n] = Σ_m f[m] g[n + m]
• Here, f represents the input, g the kernel, and n the spatial or temporal shift. Cross-correlation
captures localized patterns in the data, which is essential for feature extraction in CNNs.
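The relationship between cross-correlation and true convolution (which flips the kernel) can be made explicit with a short NumPy sketch; the input and kernel values are invented for illustration:

```python
import numpy as np

def cross_correlate_1d(f, g):
    """Valid cross-correlation: out[n] = sum_m f[n + m] g[m],
    i.e. the kernel g slides over the input f without flipping
    (the same operation as the definition in the text, up to
    which factor carries the shift)."""
    n_out = len(f) - len(g) + 1
    return np.array([np.dot(f[n:n + len(g)], g) for n in range(n_out)])

f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input
g = np.array([1.0, 0.0, -1.0])            # kernel

# Cross-correlation with g equals true convolution with g flipped.
print(cross_correlate_1d(f, g))               # [-2. -2. -2.]
print(np.convolve(f, g[::-1], mode="valid"))  # same result
```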
• Toeplitz Matrix in Convolution: The convolution operation can be expressed in matrix form
using a Toeplitz matrix, where each diagonal contains the same elements.
• For a 1D convolution, the input vector can be expanded into a Toeplitz matrix, enabling the
convolution to be represented as:
Output = Toeplitz(Input) × Kernel
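This matrix form can be sketched in NumPy by unrolling the input into its sliding windows, so the whole (valid) convolution becomes a single matrix-vector product (illustrative values only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input
w = np.array([1.0, 0.0, -1.0])            # kernel

# Expand the input into a matrix whose rows are the sliding
# windows of x; multiplying by the kernel then performs the
# whole (valid) operation as one matrix-vector product.
n_out = len(x) - len(w) + 1
X = np.array([x[n:n + len(w)] for n in range(n_out)])

print(X @ w)                              # [-2. -2. -2.]
print(np.correlate(x, w, mode="valid"))   # same result
```

The same idea generalizes to 2D convolution, where the expanded matrix becomes doubly blocked.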
➢ Pooling Layer:
• A typical layer of a convolutional network consists of three stages. In the first stage, the layer
performs several convolutions in parallel to produce a set of linear activations. In the second stage,
each linear activation is run through a nonlinear activation function, such as the rectified linear
activation function. This stage is sometimes called the detector stage.
• In the third stage, we use a pooling function to modify the output of the layer further. A pooling
function replaces the output of the net at a certain location with a summary statistic of the nearby
outputs. For example, the max pooling operation reports the maximum output within a rectangular
neighbourhood.
• Other popular pooling functions include the average of a rectangular neighbourhood, the L2 norm
of a rectangular neighbourhood, or a weighted average based on the distance from the central pixel.
• In all cases, pooling helps make the representation approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change.
• Translation Invariance: Invariance to local translation can be a very useful property if we care more
about whether some feature is present than exactly where it is.
• For example, when determining whether an image contains a face, we need not know the location of the eyes with pixel-perfect accuracy; we just need to know that there is an eye on the left side of the face and an eye on the right side of the face.
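This invariance can be illustrated with a tiny max-pooling sketch in NumPy (made-up detector values; pooling width 3, stride 1): shifting the input by one position changes every raw value, yet many of the pooled values stay the same.

```python
import numpy as np

def max_pool_1d(x, size=3, stride=1):
    """Max pooling over sliding windows of width `size`."""
    n_out = (len(x) - size) // stride + 1
    return np.array([x[i * stride:i * stride + size].max()
                     for i in range(n_out)])

x = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0])
x_shifted = np.roll(x, 1)   # translate the input by one position

print(max_pool_1d(x))         # [1. 1. 1. 0. 2. 2. 2.]
print(max_pool_1d(x_shifted)) # [0. 1. 1. 1. 0. 2. 2.]
```

Every element of the raw input moved, but several pooled outputs are identical, reflecting the approximate translation invariance described above.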
• In other contexts, it is more important to preserve the location of a feature. For example, if we want
to find a corner defined by two edges meeting at a specific orientation, we need to preserve the
location of the edges well enough to test whether they meet.