Convolutional Neural Networks - 1
Introduction
Convolutional networks, also known as convolutional neural networks or CNNs, are a specialized kind of neural network for
processing data that has a known, grid-like topology.
Examples include time-series data, which can be thought of as a 1D grid taking samples at regular time intervals, and image
data, which can be thought of as a 2D grid of pixels.
The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution.
Convolution is a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution
in place of general matrix multiplication in at least one of their layers.
Topics for Discussion:
1. What convolution is
2. The motivation behind using convolution in a neural network
3. The pooling operation
4. Neuroscientific principles
The Convolution Operation
• In its most general form, convolution is an operation on two functions of a real-valued argument.
• To motivate the definition of convolution, we start with an example of two functions we might use.
• Suppose we are tracking the location of a spaceship with a laser sensor. Our laser sensor provides a single output x(t), the position of
the spaceship at time t. Both x and t are real-valued, i.e., we can get a different reading from the laser sensor at any instant in time.
• Now suppose that our laser sensor is somewhat noisy. To obtain a less noisy estimate of the spaceship’s position, we would like to
average together several measurements.
• Of course, more recent measurements are more relevant, so we will want this to be a weighted average that gives more weight to recent
measurements. We can do this with a weighting function w(a), where a is the age of a measurement. If we apply such a weighted
average operation at every moment, we obtain a new function s providing a smoothed estimate of the position of the spaceship.
• This operation is called convolution. The convolution operation is typically denoted with an asterisk:
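The equation referenced here is the standard continuous definition (as given in Goodfellow et al., ch. 9):

$$s(t) = (x * w)(t) = \int x(a)\, w(t - a)\, da$$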
Convolutional Network Terminology
In convolutional network terminology, the first argument to the convolution is often referred to as the input.
The second argument is called the kernel.
The output is sometimes referred to as the feature map.
In our example, the idea of a laser sensor that can provide measurements at every instant in time is not realistic. Usually, when
we work with data on a computer, time will be discretized, and our sensor will provide data at regular intervals. In our example, it
might be more realistic to assume that our laser provides a measurement once per second. The time index t can then take on only
integer values. If we now assume that x and w are defined only on integer t, we can define the discrete convolution:
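The corresponding discrete form (again, the standard textbook definition) is:

$$s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a)$$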
In machine learning applications, the input is usually a multidimensional array of data and the kernel is usually a
multidimensional array of parameters that are adapted by the learning algorithm. We refer to these multidimensional arrays as tensors.
Because each element of the input and kernel must be explicitly stored separately, we usually assume that these functions are
zero everywhere but the finite set of points for which we store the values.
Flipping
Flipping in Convolutional Neural Networks (CNNs) refers to the process of reversing the order of
elements in a convolutional kernel (filter). It is an integral part of how convolution is mathematically
defined in many frameworks.
The commutative property of convolution arises because the kernel is flipped relative to the input, in
the sense that as the summation index m increases, the index into the input increases but the index into the kernel decreases (see the definitions below).
The only reason to flip the kernel is to obtain the commutative property. While the commutative property is useful for
writing proofs, it is not usually an important property of a neural network implementation. Instead, many neural network
libraries implement a related function called the cross-correlation, which is the same as convolution but without flipping
the kernel.
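For a 2-D image I and kernel K, the standard textbook definitions of convolution (with kernel flipping) and cross-correlation (without flipping) are:

$$S(i,j) = (I * K)(i,j) = \sum_m \sum_n I(i-m,\, j-n)\, K(m,n)$$

$$S(i,j) = (I \star K)(i,j) = \sum_m \sum_n I(i+m,\, j+n)\, K(m,n)$$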
Convolution (without kernel flipping) applied to a 2-D tensor
• Discrete convolution can be viewed as multiplication by a matrix. However, the matrix has several entries
constrained to be equal to other entries.
• For example, for univariate discrete convolution, each row of the matrix is constrained to be equal to the row
above shifted by one element. This is known as a Toeplitz matrix.
• In two dimensions, a doubly block circulant matrix corresponds to convolution.
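As an illustrative sketch (a hypothetical NumPy example, not taken from the slides), a “valid” 1-D cross-correlation can be written as multiplication by a matrix whose rows each contain the kernel, shifted one element relative to the row above:

```python
import numpy as np

def convolution_matrix(w, n):
    """Build the (n - k + 1) x n matrix whose rows each contain the kernel w,
    shifted one element to the right of the row above (a Toeplitz-style band)."""
    k = len(w)
    M = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        M[i, i:i + k] = w
    return M

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # input signal
w = np.array([1.0, -1.0, 0.5])            # kernel (not flipped)

M = convolution_matrix(w, len(x))
print(M @ x)                               # matrix-multiplication view
print(np.correlate(x, w, mode="valid"))    # same values via cross-correlation
```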
9.2 Motivation
Convolution leverages three important ideas that can help improve a machine learning system:
• Sparse interactions
• Parameter sharing
• Equivariant representations
Moreover, convolution provides a means for working with inputs of variable size.
Traditional neural network layers use matrix multiplication by a matrix of parameters with a separate
parameter describing the interaction between each input unit and each output unit. This means every
output unit interacts with every input unit.
9.2 Motivation (Sparse Interactions)
• Convolutional networks, however, typically have sparse interactions. This is accomplished by making the kernel
smaller than the input.
• For example, when processing an image, the input image might have thousands or millions of pixels, but we can
detect small, meaningful features such as edges with kernels that occupy only tens or hundreds of pixels. This means
that we need to store fewer parameters, which both reduces the memory requirements of the model and improves its
statistical efficiency. It also means that computing the output requires fewer operations.
Example: If there are m inputs and n outputs, then matrix multiplication requires m×n parameters and the
algorithms used in practice have O(m × n) runtime.
• If we limit the number of connections each output may have to k, then the sparsely connected approach requires
only k × n parameters and O(k × n) runtime.
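A tiny back-of-the-envelope sketch (illustrative numbers, not from the slides) of this parameter count:

```python
m = n = 10_000   # hypothetical input and output sizes
k = 9            # hypothetical number of connections per output

dense_params  = m * n   # fully connected: every output sees every input
sparse_params = k * n   # sparsely connected: each output sees only k inputs

print(dense_params)     # 100,000,000 parameters, O(m * n) runtime
print(sparse_params)    # 90,000 parameters, O(k * n) runtime
```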
Graphical demonstrations of sparse connectivity
Parameter sharing
• Parameter sharing refers to using the same parameter for more than one function in a model.
• In a traditional neural net, each element of the weight matrix is used exactly once when computing the output of a layer. It
is multiplied by one element of the input and then never revisited.
• “As a synonym for parameter sharing, one can say that a network has tied weights, because the value of the weight applied
to one input is tied to the value of a weight applied elsewhere”. In a convolutional neural net, each member of the kernel is
used at every position of the input.
• The parameter sharing used by the convolution operation means that rather than learning a separate set of parameters for
every location, we learn only one set.
Sparse connectivity and parameter sharing
Parameter sharing
• In the case of convolution, the particular form of parameter sharing causes the layer to have a property called
equivariance to translation. To say a function is equivariant means that if the input changes, the output changes in the same
way.
• A function f (x) is equivariant to a function g if f(g(x)) = g(f(x)).
• In the case of convolution, if we let g be any function that translates the input, i.e., shifts it, then the convolution function
is equivariant to g.
• Example: Let I be a function giving image brightness at integer coordinates. Let g be a function mapping one image
function to another image function, such that I’ = g(I) is the image function with I’ (x, y) = I(x − 1, y). This shifts every pixel
of I one unit to the right.
• When processing time series data, this means that convolution produces a sort of timeline that shows when different
features appear in the input. If we move an event later in time in the input, the exact same representation of it will appear
in the output, just later in time.
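A small sketch (an assumed NumPy example, not part of the slides) that checks this equivariance numerically for a 1-D signal:

```python
import numpy as np

def shift_right(x, amount=1):
    """Translate a 1-D signal to the right, padding with zeros on the left."""
    return np.concatenate([np.zeros(amount), x[:-amount]])

x = np.array([0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0])  # toy input signal
w = np.array([1.0, -1.0])                          # toy difference kernel

# convolve then shift ...
a = shift_right(np.convolve(x, w, mode="same"))
# ... versus shift then convolve
b = np.convolve(shift_right(x), w, mode="same")

print(np.allclose(a, b))  # True: convolution is equivariant to translation
```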
Parameter sharing
• If we move the object in the input, its representation will move the same amount in the output. This is useful
when we know that some function of a small number of neighboring pixels is useful when applied to multiple input
locations. For example, when processing images, it is useful to detect edges in the first layer of a convolutional
network. The same edges appear more or less everywhere in the image, so it is practical to share parameters across
the entire image.
• In some cases, we may not wish to share parameters across the entire image. For example, if we are processing
images that are cropped to be centered on an individual’s face, we probably want to extract different features at
different locations—the part of the network processing the top of the face needs to look for eyebrows, while the part
of the network processing the bottom of the face needs to look for a chin.
9.3 Pooling
Definition of pooling: Pooling is a technique used in convolutional neural networks (CNNs) to reduce the spatial
dimensions of feature maps (width and height) while retaining important information. This process helps reduce
computational complexity and overfitting.
A typical layer of a convolutional network consists of three stages:
In the first stage, the layer performs several convolutions in parallel to produce a set of linear activations.
In the second stage, each linear activation (like the weighted sum of inputs in a layer) is run through a nonlinear activation
function, such as the rectified linear activation function. This stage is sometimes called the detector stage.
In the third stage, we use a pooling function to modify the output of the layer further.
A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs.
For example, the max pooling operation reports the maximum output within a rectangular neighborhood. Other popular
pooling functions include the average of a rectangular neighborhood, the L2 norm of a rectangular neighborhood, or a
weighted average based on the distance from the central pixel.
In all cases, pooling helps to make the representation become approximately invariant to small translations of the input.
Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do
not change.
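A minimal sketch of 2-D max pooling (an assumed NumPy example with a 2x2 window and stride 2, not taken from the slides):

```python
import numpy as np

def max_pool_2d(x, size=2, stride=2):
    """Max pooling over (size x size) windows of a 2-D array."""
    h_out = (x.shape[0] - size) // stride + 1
    w_out = (x.shape[1] - size) // stride + 1
    out = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

feature_map = np.array([[1., 2., 0., 1.],
                        [3., 4., 1., 0.],
                        [0., 1., 5., 6.],
                        [1., 0., 7., 8.]])
print(max_pool_2d(feature_map))  # [[4. 1.] [1. 8.]]
```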
9.3 Pooling
Invariance to local translation can be a very useful property if we
care more about whether some feature is present than exactly
where it is.
9.3 Pooling
How the pooling works in this diagram:
• Input values are shown in the lower row (e.g., 0.1, 1.0, 0.2, etc.).
• The max-pooling operation is applied to overlapping windows of size 3:
  - First window: [0.1, 1.0, 0.2] → the maximum is 1.0.
  - Second window: [1.0, 0.2, 0.1] → the maximum is 1.0.
  - Third window: [0.2, 0.1] (smaller, as it's at the edge) → the maximum is 0.2.
  - Fourth window: [0.1] (only one value, as it's at the edge) → the maximum is 0.1.
• These maxima form the output values (upper row).
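A short sketch reproducing these numbers (assuming the input row shown in the figure is [0.1, 1.0, 0.2, 0.1], with windows truncated at the right edge):

```python
def max_pool_1d(values, width=3):
    """Max over a window of the given width starting at each position,
    truncating the window at the right edge of the input."""
    return [max(values[i:i + width]) for i in range(len(values))]

inputs = [0.1, 1.0, 0.2, 0.1]    # lower row of the diagram
print(max_pool_1d(inputs))       # [1.0, 1.0, 0.2, 0.1]  (upper row)
```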
9.3 Pooling
For many tasks, pooling is essential for handling inputs of varying size.
For example, if we want to classify images of variable size, the input to the classification layer must have a
fixed size. This is usually accomplished by varying the size of an offset between pooling regions so that
the classification layer always receives the same number of summary statistics regardless of the input
size.
For example, the final pooling layer of the network may be defined to output four sets of summary statistics,
one for each quadrant of an image, regardless of the image size
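A minimal sketch of that quadrant idea (an assumed NumPy example; the pooling regions scale with the image, so the output is always 2x2 regardless of input size):

```python
import numpy as np

def quadrant_max_pool(image):
    """Return one max per quadrant, regardless of the image's size."""
    h, w = image.shape
    return np.array([[image[:h // 2, :w // 2].max(), image[:h // 2, w // 2:].max()],
                     [image[h // 2:, :w // 2].max(), image[h // 2:, w // 2:].max()]])

for size in (6, 10, 17):                  # variable-size inputs
    img = np.random.rand(size, size)
    print(quadrant_max_pool(img).shape)   # always (2, 2)
```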
Components of CNN
9.4 Convolution and Pooling as an Infinitely Strong Prior
A prior is a probability distribution over the parameters of a model that encodes our beliefs about what models are reasonable,
before we have seen any data. Priors can be considered weak or strong depending on how concentrated the probability
density in the prior is.
A weak prior is a prior distribution with high entropy, such as a Gaussian distribution with high variance. Such a prior
allows the data to move the parameters more or less freely.
A strong prior has very low entropy, such as a Gaussian distribution with low variance. Such a prior plays a more active
role in determining where the parameters end up.
An infinitely strong prior places zero probability on some parameters and says that these parameter values are
completely forbidden, regardless of how much support the data gives to those values.
We can imagine a convolutional net as being similar to a fully connected net, but with an infinitely strong prior over its
weights. This infinitely strong prior says that the weights for one hidden unit must be identical to the weights of its
neighbor, but shifted in space. The prior also says that the weights must be zero, except for in the small, spatially
contiguous receptive field assigned to that hidden unit. Overall, we can think of the use of convolution as introducing an
infinitely strong prior probability distribution over the parameters of a layer. This prior says that the function the layer
should learn contains only local interactions and is equivariant to translation. Likewise, the use of pooling is an infinitely
strong prior that each unit should be invariant to small translations.
9.4 Convolution and Pooling as an Infinitely Strong Prior
One key insight is that convolution and pooling can cause underfitting. Like any prior, convolution and
pooling are only useful when the assumptions made by the prior are reasonably accurate.
If a task relies on preserving precise spatial information, then using pooling on all features can increase the
training error.
Some convolutional network architectures are designed to use pooling on some channels but not on other
channels, in order to get both highly invariant features and features that will not underfit when the
translation invariance prior is incorrect.
When a task involves incorporating information from very distant locations in the input, then the prior
imposed by convolution may be inappropriate.
Another key insight from this view is that we should only compare convolutional models to other
convolutional models in benchmarks of statistical learning performance. Models that do not use convolution
would be able to learn even if we permuted all of the pixels in the image. For many image datasets, there are
separate benchmarks for models that are permutation invariant and must discover the concept of topology via
learning, and models that have the knowledge of spatial relationships hard-coded into them by their designer.
9.5 Variants of the Basic Convolution Function
• Convolution in Neural Networks: In neural networks, the term "convolution" doesn't refer exactly to the
standard convolution operation as defined in mathematics. Instead, it is slightly adapted to make it more
practical for feature extraction in neural networks. These changes help the operation work better for tasks like
image processing and other data-driven applications.
• Input Structure: The input to a convolutional neural network (CNN) is not just a simple grid of numbers. For
example, in a color image, each pixel contains three values representing red, green, and blue intensities. As we
go deeper into the network, the input to each layer becomes the output of the previous layer, which contains
multiple feature maps. Each feature map represents different types of information extracted by the previous
layer.
• Data Representation: When working with images, the input and output of a convolutional layer are often
represented as 3D tensors. These tensors include one axis for channels (e.g., RGB) and two axes for the spatial
dimensions (height and width).
In practice, software processes data in batches, adding a fourth dimension to represent the batch size. However,
for simplicity, this extra batch dimension is often omitted in explanations.
• Multi-Channel Convolutions: In CNNs, convolutional operations usually involve multiple input and output
channels. These multi-channel operations are not always commutative, meaning the order of the operations can
affect the result. This lack of commutativity is especially true when the number of input and output channels
differs between operations.
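For reference, the standard multi-channel form (the textbook equation, with input V, 4-D kernel tensor K, output Z, and 1-based indexing) is:

$$Z_{i,j,k} = \sum_{l,m,n} V_{l,\, j+m-1,\, k+n-1}\; K_{i,l,m,n}$$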
9.5 Variants of the Basic Convolution Function
Parallel Feature Extraction: In neural networks, convolution is typically applied in parallel using multiple
filters (kernels). A single kernel can extract one specific feature, such as edges, but in practice, we want to
detect many types of features at the same time. Therefore, each layer of a CNN applies several convolutions in
parallel to capture diverse patterns and information from the input data.
Components in the convolution process include the input tensor, the set of kernels (filters), and the resulting stack of output feature maps, as sketched below.
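A minimal sketch of parallel feature extraction (an assumed NumPy example: three hypothetical 3x3 kernels applied to one grayscale image, yielding one feature map per kernel):

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation (convolution without kernel flipping)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)                                 # toy grayscale input
kernels = [
    np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], float),   # vertical edges
    np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], float),   # horizontal edges
    np.full((3, 3), 1 / 9.0),                                 # local average
]

feature_maps = np.stack([correlate2d_valid(image, k) for k in kernels])
print(feature_maps.shape)   # (3, 6, 6): one feature map per kernel
```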
9.5 Variants of the Basic Convolution Function
Stride and Downsampled Convolution:
When we use a stride, the kernel skips over some input positions during the convolution, reducing the computational cost.
The stride determines how far the kernel moves after each step.
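A minimal sketch (an assumed NumPy example, not from the slides) showing that a stride-s convolution is equivalent to computing the full stride-1 output and then keeping every s-th value:

```python
import numpy as np

def correlate1d(x, w, stride=1):
    """'Valid' 1-D cross-correlation with a configurable stride."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w)
                     for i in range(0, len(x) - k + 1, stride)])

x = np.arange(10, dtype=float)       # toy input signal
w = np.array([1.0, 0.0, -1.0])       # toy kernel

strided = correlate1d(x, w, stride=2)
dense   = correlate1d(x, w, stride=1)[::2]   # stride-1 output, then downsample
print(np.array_equal(strided, dense))        # True: stride = convolve, then subsample
```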
Thank You!