Convolutional Neural Networks
2
Neural Networks / Multi-layer Perceptron
●
Regular Neural Networks
Fig: https://cs231n.github.io/convolutional-networks/
3
Neural Networks / Multi-layer Perceptron
●
Neural Networks: Layers and Functionality
●
In a regular Neural Network there are three types of layers:
●
Input Layer: the layer through which we give input to our model. The number of neurons in this layer equals the number of features in our data (the number
of pixels in the case of an image).
●
Hidden Layer: the input from the input layer is then fed into the hidden layer. There can be many hidden layers, depending on the model and the data size. Each
hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by matrix
multiplication of the previous layer's output with that layer's learnable weights, addition of learnable biases, and then an activation function,
which makes the network nonlinear.
●
Output Layer: the output from the hidden layer is fed into a logistic function such as sigmoid or softmax, which converts the score of each class into a
probability for that class.
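As a sketch of this pipeline, here is a minimal forward pass in NumPy; the layer sizes (4 inputs, 5 hidden units, 3 classes) are hypothetical, and random weights stand in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 5 hidden neurons, 3 output classes.
x = rng.normal(size=(4,))          # input layer: one sample with 4 features
W1 = rng.normal(size=(4, 5))       # learnable weights of the hidden layer
b1 = np.zeros(5)                   # learnable biases of the hidden layer
W2 = rng.normal(size=(5, 3))       # weights of the output layer
b2 = np.zeros(3)

# Hidden layer: matrix multiplication + bias, then a nonlinearity (ReLU here).
h = np.maximum(0, x @ W1 + b1)

# Output layer: softmax converts the class scores into class probabilities.
logits = h @ W2 + b2
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs)   # three nonnegative probabilities summing to 1
```
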
Fig: https://cs231n.github.io/convolutional-networks/
4
Convolutional Neural Networks
●
Neural networks that use convolution in place of general matrix
multiplication in at least one of their layers.
●
Commonly used in Computer Vision.
Fig: https://www.geeksforgeeks.org/introduction-convolution-neural-network/
5
Convolutional Neural Networks
●
The Convolutional layer applies
filters to the input image to
extract features.
●
The Pooling layer downsamples
the image to reduce
computation.
●
The fully connected layer makes
the final prediction.
●
The network learns the optimal
filters through backpropagation
and gradient descent.
Fig: https://www.geeksforgeeks.org/introduction-convolution-neural-network/
6
Convolution
●
Convolution is a mathematical operation to combine two
functions, say f(t) and g(t), denoted by *.
●
f(t) * g(t)
●
We can also combine two functions by simply adding or multiplying them; convolution is a different way of combining them.
●
The convolution operation is commutative and associative.
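Both properties can be checked numerically; a small sketch using NumPy's `np.convolve` (the array values are arbitrary):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, 0.25])
h = np.array([2.0, 1.0])

# Discrete convolution: (f * g)[n] = sum_k f[k] * g[n - k]
fg = np.convolve(f, g)
gf = np.convolve(g, f)
assert np.allclose(fg, gf)                      # commutative: f * g == g * f

lhs = np.convolve(np.convolve(f, g), h)
rhs = np.convolve(f, np.convolve(g, h))
assert np.allclose(lhs, rhs)                    # associative: (f*g)*h == f*(g*h)
```
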
7
●
As an example, let f(t) be how often I yell in a room and g(t) be how the sound of a single yell decays over time.
●
At minute 0, I am yelling, so f(0) = 1. The sound remaining in the room = f(0)·g(0), where g() is the sound impulse response.
●
At minute 1, I yell again.
●
So the total sound at minute 1 = f(0)·g(1) + f(1)·g(0).
●
At minute 2, I yell again.
●
Total sound at minute 2 = f(0)·g(2) + f(1)·g(1) + f(2)·g(0).
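These running sums are exactly the discrete convolution of f and g. A small check in NumPy, with a hypothetical decay g = (1, 0.5, 0.25):

```python
import numpy as np

f = np.array([1.0, 1.0, 1.0])      # one yell at each of minutes 0, 1, 2
g = np.array([1.0, 0.5, 0.25])     # hypothetical decay: sound left after 0, 1, 2 minutes

# Total sound at minute 2, written out as on the slide:
minute2 = f[0]*g[2] + f[1]*g[1] + f[2]*g[0]

# The same sums for every minute are exactly the convolution f * g:
total = np.convolve(f, g)
print(total[2], minute2)   # both 1.75
```
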
8
Convolution in Image Processing
●
Convolution is used to process an input image and transform it into a
form that is more useful for downstream processing.
●
A kernel/filter is used to transform the input image. The kernels
are smaller in size when compared to the size of the input.
●
For example, the kernels can be 3x3 or 5x5 matrices, etc.
●
How a 6x6 matrix is convolved with a 3x3 kernel:
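A minimal NumPy sketch of this sliding-window operation (as in most CNN libraries, the kernel is not flipped, so strictly this is cross-correlation); the 6x6 input values and the averaging kernel are arbitrary:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image ('valid' positions only, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise multiply the patch with the kernel and sum.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # a 6x6 input
kernel = np.ones((3, 3)) / 9.0                     # a 3x3 averaging kernel
out = conv2d_valid(image, kernel)
print(out.shape)   # (4, 4): a 6x6 input convolved with a 3x3 kernel shrinks to 4x4
```
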
9
10
Convolutional Networks
●
In convolutional network terminology, the first argument to the
convolution is often referred to as the input.
●
The second argument as the kernel.
●
The output is sometimes referred to as the feature map.
●
In machine learning applications, the input is usually a multidimensional
array of data, and the kernel is usually a multidimensional array of
parameters that are adapted by the learning algorithm.
11
Motivation
●
Sparse interactions/ sparse
connectivity/ sparse weights
●
In a regular neural network, every output unit interacts with every input unit through the weights in the interconnections.
●
Using convolution, small features can be detected with kernels that are much smaller than the input, so only a small group of input values is mapped to each output value.
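A rough parameter count illustrates the contrast; the 32x32 input size is a hypothetical example:

```python
# Hypothetical sizes for illustration: a 32x32 grayscale image (1024 inputs)
# mapped to a 1024-unit output layer.
n = 32 * 32

dense_weights = n * n      # fully connected: every output unit sees every input
conv_weights = 3 * 3       # one shared 3x3 kernel: each output sees only 9 inputs

print(dense_weights, conv_weights)   # 1048576 vs 9
```
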
12
Sparse Connectivity
Receptive field of S3 – with convolution (Sparse)
14
Example CNN Architecture
15
Why Pooling?
●
We need the CNN to have a property called spatial invariance.
●
It should identify the feature irrespective of where it appears in the image,
and whether it is tilted, squished, elongated, etc.
●
Pooling reduces the number of parameters and so helps prevent overfitting.
●
Irrelevant features are removed.
●
Too much pooling can cause underfitting.
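A minimal sketch of 2x2 max pooling in NumPy (the input values are arbitrary):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the strongest response in each patch."""
    h, w = x.shape
    # Group the array into 2x2 blocks and take the max within each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 3., 2.],
              [2., 6., 1., 1.]])
pooled = max_pool_2x2(x)
print(pooled)   # [[4., 5.], [6., 3.]]
```

A feature shifted by one pixel within a patch still produces the same pooled value, which is where the spatial invariance comes from.
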
16
Types of Convolution
●
Traditional Convolution – the kernels are shared
●
Multiple kernel convolution – multiple filters are used to extract different
features.
●
Unshared convolution – separate kernels are used in each stride
– DeepFace uses this. This results in locally connected layers.
●
Locally connected layers are useful when we know that each feature
should be a function of a small part of space, but there is no reason to
think that the same feature should occur across all of space.
●
Tiled Convolution – A compromise between convolution and
unshared convolution.
●
Cycle between a set of filters for each stride.
17
●
https://www.youtube.com/watch?v=FYlqTp2IoCY
Types of Convolution
18
Separable Convolution
●
Spatial separable convolution and depthwise separable
convolution.
19
Spatial separable convolution
●
Deals with the spatial dimensions of an image and kernel (height and width).
●
A spatial separable convolution simply divides a kernel into two smaller
kernels. The most common case is to divide a 3x3 kernel into a 3x1 and a
1x3 kernel, like so:
●
Instead of doing one convolution with 9 multiplications, we do two convolutions
with 3 multiplications each (6 in total) to achieve the same effect.
●
With fewer multiplications, computational complexity goes down, and the
network is able to run faster.
https://medium.com/towards-data-science/a-basic-introduction-to-separable-convolutions-b99ec3102728
20
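This claim can be checked numerically. A sketch in NumPy, using the Sobel kernel (a standard separable example) and a small helper for valid-mode sliding-window convolution; the 6x6 random input is arbitrary:

```python
import numpy as np

col = np.array([[1.], [2.], [1.]])      # 3x1 kernel
row = np.array([[-1., 0., 1.]])         # 1x3 kernel
k = col @ row                           # the 3x3 Sobel kernel = their outer product

def conv_valid(img, ker):
    """Valid-mode sliding-window convolution (no kernel flip), stride 1."""
    ih, iw = img.shape
    kh, kw = ker.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * ker)
    return out

rng = np.random.default_rng(1)
img = rng.normal(size=(6, 6))

direct = conv_valid(img, k)                       # one 3x3 pass: 9 multiplies per pixel
two_pass = conv_valid(conv_valid(img, col), row)  # 3x1 then 1x3: 3 + 3 multiplies
assert np.allclose(direct, two_pass)              # same result, fewer multiplications
```
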
Spatial separable convolution
21
Spatial separable convolution
https://medium.com/towards-data-science/a-basic-introduction-to-separable-convolutions-b99ec3102728
22
Depthwise Separable Convolution
23
Depthwise Separable Convolution
●
If we want to increase the number of channels in the output
image, say we want 8x8x256...
●
We need 256 kernels.
●
Create 256 kernels to produce 256 8x8x1 images, then stack them
together to create an 8x8x256 output image.
24
●
12x12x3 → (5x5x3x256) → 12x12x256 (where 5x5x3x256
represents the height, width, number of input channels, and
number of output channels of the kernel).
25
Depthwise – Step 1
●
Give the input image a convolution without changing the depth.
Use 3 kernels of shape 5x5x1.
26
Pointwise Separation – Step 2
●
The pointwise convolution is so named because it uses a 1x1
kernel, or a kernel that iterates through every single point.
●
This kernel has a depth of however many channels the input
image has; in our case, 3.
●
Therefore, we iterate a 1x1x3 kernel through our 8x8x3 image to
get an 8x8x1 image.
27
Pointwise Separation – Step 2
28
●
We can create 256 1x1x3 kernels, each outputting an 8x8x1 image,
to get a final image of shape 8x8x256.
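The savings can be made concrete by counting multiplications for the example above (a back-of-the-envelope count; it ignores additions and assumes valid convolution with no padding):

```python
# Multiplication counts for the example above (12x12x3 input -> 8x8x256 output).
out_hw = 8 * 8   # each kernel position in the 8x8 output

standard = 256 * (5 * 5 * 3) * out_hw     # 256 full 5x5x3 kernels
depthwise = 3 * (5 * 5 * 1) * out_hw      # step 1: one 5x5x1 kernel per input channel
pointwise = 256 * (1 * 1 * 3) * out_hw    # step 2: 256 1x1x3 kernels
separable = depthwise + pointwise

print(standard, separable)   # 1228800 vs 53952, roughly 23x fewer multiplications
```
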
29
Separable Convolution
●
Convolution with fewer parameters.
30
31
●
https://www.youtube.com/watch?v=vCJ4magCPts
32