Theory of CNN (Convolutional Neural Network)
Names: Suleman, Shivani, Akanksha, Munendra
INTRODUCTION TO CNN:
CONVOLUTIONAL LAYER:
The CONVOLUTIONAL LAYER is responsible for feature extraction. First let us get clear on the ideas of 'filter' and 'convolution'; then we shall move on to how they are used in the layer.
Filters: Filters, or 'kernels', are themselves small images that depict a particular feature. For example, consider the picture of a simple curve. We take this as a sample feature and ask whether it is present in a given image.
Convolution: Convolution is a special operation applied between two matrices, here the image matrix and the filter matrix. The filter is slid across the image; at each position, every cell of the filter is multiplied by the corresponding cell of the image region it covers, and the products are summed to give one cell of the output (feature) matrix. A large output value indicates that this region of the image resembles the feature encoded by the filter.
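To make the operation concrete, here is a minimal NumPy sketch (the helper name convolve2d, the toy 5x5 image, and the 3x3 diagonal filter are illustrative choices, not taken from the text): it slides the filter across the image and, at each position, sums the element-wise products.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2D convolution as used in CNNs: slide the kernel over the
    image and sum the element-wise products at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy 5x5 image containing a diagonal edge, and a 3x3 filter that
# responds to that diagonal feature.
image = np.array([[0, 0, 0, 1, 1],
                  [0, 0, 1, 1, 1],
                  [0, 1, 1, 1, 0],
                  [1, 1, 1, 0, 0],
                  [1, 1, 0, 0, 0]], dtype=float)
kernel = np.array([[0, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]], dtype=float)

print(convolve2d(image, kernel))  # high values where the diagonal feature is present
```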
RELU Activation: ReLU, or Rectified Linear Unit, is applied to every cell of the output matrix. The intuition is that if, after convolution, a particular position holds '0' or a negative value, the feature is not present there, so we denote it by '0'; in all other cases we keep the value.
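A one-line NumPy sketch of this activation (the function name relu is an assumption for illustration): zero and negative responses are clamped to 0, positive responses pass through unchanged.

```python
import numpy as np

def relu(feature_map):
    # max(0, x) applied cell by cell: negatives become 0, positives are kept
    return np.maximum(0, feature_map)

print(relu(np.array([[-2.0, 3.0], [0.0, 5.0]])))  # [[0. 3.] [0. 5.]]
```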
Together, the operations and functions applied to the input image form the first part of the Convolutional Block.
POOLING LAYER:
The Pooling layer consists of extracting a single value from a set of values, usually the maximum or the average of those values. This reduces the size of the output matrix. It is common to periodically insert a Pooling layer in between successive Conv layers in a ConvNet architecture. Its function is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network, and hence also helps control overfitting. The Pooling Layer operates independently on every depth slice of the input and resizes it spatially, most commonly using the MAX operation. The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2, which downsamples every depth slice of the input by 2 along both width and height, discarding 75% of the activations. Every MAX operation in this case takes a max over 4 numbers (a little 2x2 region in some depth slice). The depth dimension remains unchanged. A rough sketch of this 2x2, stride-2 case follows.
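The sketch below is a minimal NumPy illustration (the helper name max_pool_2x2 is an assumption; it handles a single depth slice only): each output cell is the maximum of a non-overlapping 2x2 region, so three out of every four activations are discarded.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single depth slice.
    Each output cell is the max of a non-overlapping 2x2 region."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2              # drop odd edge rows/cols for simplicity
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))  # 2x2 output: [[ 5.  7.] [13. 15.]]
```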
Together, the CONVOLUTIONAL LAYER and the POOLING LAYER form the CONVOLUTIONAL BLOCK of the CNN architecture. Generally, a simple CNN architecture consists of a minimum of three of these Convolutional Blocks, which perform feature extraction at various levels.
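To show how such blocks are typically stacked, here is a minimal sketch using the TensorFlow/Keras API; the layer widths (32/64/128), the 64x64x3 input shape, and the 10-class softmax head are illustrative assumptions, not values from the text.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),          # assumed input size
    # Block 1: convolution + ReLU, then 2x2 max pooling
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Block 2
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Block 3
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Classifier head (not covered above, shown only to complete the model)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.summary()  # each block halves the spatial size while extracting deeper features
```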