03 - CNN
Convolutional Neural Network
(1) Convolutional Neural Network
Image Representation using CNN
• Convolutional Neural Networks (CNNs) are a special class of Deep
Neural Networks (DNNs) built with the ability to extract distinctive
features from image data. They are among the most widely used
approaches for image processing, and are generally better suited to it
than sequence models such as RNNs, LSTMs or their variants.
• In mathematics, the convolution of two functions f and g measures
the overlap between f and g when one function is “flipped” and
shifted by x. It is a continuous multiply-accumulate (MAC) operation:
$(f * g)(x) = \int f(z)\, g(x - z)\, dz$
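For sampled (discrete) signals the integral becomes a sum, $(f * g)[n] = \sum_m f[m]\, g[n - m]$. The minimal sketch below, assuming both signals are finite Python lists, makes the flip-and-shift and the multiply-accumulate steps explicit; the function name `convolve1d` is hypothetical.

```python
def convolve1d(f, g):
    """Full discrete convolution: (f * g)[n] = sum over m of f[m] * g[n - m]."""
    n_out = len(f) + len(g) - 1
    out = [0] * n_out
    for n in range(n_out):
        total = 0
        for m in range(len(f)):
            k = n - m  # index into g: g is flipped and shifted by n
            if 0 <= k < len(g):
                total += f[m] * g[k]  # multiply-accumulate (MAC)
        out[n] = total
    return out


print(convolve1d([1, 2, 3], [1, 0, -1]))  # → [1, 2, 2, -2, -3]
```

For the same inputs this matches what `numpy.convolve` returns in its default "full" mode. Note that CNN layers usually skip the flip and compute a cross-correlation instead, as described further on.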
• They work very well on computer vision tasks such as image
classification, object detection and image recognition.
• CNNs have a layered architecture consisting of convolutional,
pooling and fully connected layers, which are designed to
automatically learn a hierarchy of features from the input image data.
Architecture of CNN
1. Input Image :
• This layer holds the raw pixel values of an image, represented as a 3D
matrix of dimensions W x H x D, where W is the width, H the height and
D the depth, i.e., the number of color channels in the image. E.g., a
32x32-pixel RGB image has dimensions [32x32x3]. Flattening this matrix
into a single input vector would result in an array of 32x32x3 = 3072
nodes and associated weights.
• Further, since images are made up of pixels, they are converted into a
numerical form that is passed to the CNN.
2. Convolutional Layer :
• The entire numerical representation (i.e., all 3072 values) is not
passed directly into a fully connected network. Instead, the size of the
numerical representation sent to the later layers of the CNN is reduced
via the convolution operation.
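To make the 3072 figure concrete, the sketch below compares the parameter count of a fully connected layer on the flattened image with that of a convolutional layer. The hidden-layer size (100 units), the filter count (16 filters of 3x3) and the helper names are hypothetical values chosen only for illustration.

```python
def dense_param_count(n_inputs, n_units):
    """Weights + biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_param_count(k, in_channels, n_filters):
    """Weights + biases of n_filters kernels of size k x k x in_channels."""
    return n_filters * (k * k * in_channels) + n_filters


H, W, D = 32, 32, 3
flattened = H * W * D                     # number of input values
print(flattened)                          # → 3072
print(dense_param_count(flattened, 100))  # → 307300 (100 hidden units, hypothetical)
print(conv_param_count(3, D, 16))         # → 448 (16 filters of 3x3, hypothetical)
```

The convolutional count does not depend on the image's width and height at all, because the same kernel weights are reused at every position.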
• This process is vital so that mainly the features that are important
for classifying an image are sent on through the network. Besides
helping accuracy, it also reduces the compute resources needed to train
the network, since far fewer weights are involved than in a fully
connected layer applied to the raw pixels.
• The result of the convolution operation is referred to as a feature
map, convolved feature, or activation map. Applying a feature
detector is what leads to a feature map. The feature detector is also
known by other names such as kernel or filter.
• The kernel is usually a 3 by 3 matrix. At each position, the kernel is
multiplied element-wise with the patch of the input image beneath it and
the products are summed, giving one value of the feature map. Sliding
the kernel over the whole input image produces the complete feature
map. The kernel slides in steps whose size is known as the stride.
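The sliding element-wise multiply-and-sum can be sketched in plain Python. This is an illustrative sketch, not a library API: `corr2d` is a hypothetical name, the input and kernel are lists of lists, and a 2 by 2 kernel is used instead of the usual 3 by 3 so the arithmetic is easy to check by hand.

```python
def corr2d(X, K, stride=1):
    """Slide kernel K over input X and return the feature map.

    At each window the kernel is multiplied element-wise with the input
    patch under it and the products are summed (no kernel flip).
    """
    n_h, n_w = len(X), len(X[0])
    k_h, k_w = len(K), len(K[0])
    out_h = (n_h - k_h) // stride + 1
    out_w = (n_w - k_w) // stride + 1
    Y = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the current window
            Y[i][j] = sum(
                X[r + a][c + b] * K[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
    return Y


X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
K = [[0, 1], [2, 3]]
print(corr2d(X, K))            # → [[19, 25], [37, 43]]
print(corr2d(X, K, stride=2))  # → [[19]]
```

For example, the top-left value is 0·0 + 1·1 + 3·2 + 4·3 = 19. With a stride of 2 the kernel only fits once on this 3 by 3 input.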
• The convolutional layer cross-correlates the input and the kernel and
adds a scalar bias to produce an output. The two parameters of a
convolutional layer are therefore the kernel and the scalar bias.
https://d2l.ai/chapter_convolutional-neural-networks/conv-layer.html
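A minimal sketch of such a layer, assuming a single input channel and a stride of 1: `Conv2DSketch` is a hypothetical class, and the kernel [[1, -1]] is an illustrative vertical-edge detector, not a learned one.

```python
class Conv2DSketch:
    """Hypothetical single-channel conv layer: cross-correlation plus a scalar bias."""

    def __init__(self, kernel, bias=0.0):
        self.kernel = kernel  # learnable parameter 1: the k_h x k_w weight matrix
        self.bias = bias      # learnable parameter 2: one scalar added everywhere

    def forward(self, X):
        k_h, k_w = len(self.kernel), len(self.kernel[0])
        # Without padding the output shrinks to (n_h - k_h + 1) x (n_w - k_w + 1).
        out_h, out_w = len(X) - k_h + 1, len(X[0]) - k_w + 1
        return [
            [
                sum(X[i + a][j + b] * self.kernel[a][b]
                    for a in range(k_h) for b in range(k_w)) + self.bias
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]


X = [[1, 1, 0, 0],
     [1, 1, 0, 0]]
layer = Conv2DSketch(kernel=[[1, -1]], bias=0.5)
print(layer.forward(X))  # → [[0.5, 1.5, 0.5], [0.5, 1.5, 0.5]]
```

The 1.5 in the middle column marks the edge between the 1s and the 0s; everywhere else only the bias remains.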
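Note that the output is smaller than the input: for an $n_h \times n_w$ input
and a $k_h \times k_w$ kernel, the output has size
$(n_h - k_h + 1) \times (n_w - k_w + 1)$. This is the case since we need
enough space to “shift” the convolution kernel across the image. Later we
will see how to keep the size unchanged by padding the image with zeros
around its boundary so that there is enough space to shift the kernel.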
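The effect of zero padding and of the stride on the output size can be computed per axis with the standard formula $\lfloor (n - k + 2p)/s \rfloor + 1$, where p is the number of zero rows/columns added on each side and s is the stride. A small sketch; the helper names are hypothetical.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one axis: floor((n - k + 2p) / s) + 1."""
    return (n - k + 2 * padding) // stride + 1


def zero_pad(X, p):
    """Surround a 2D list with p rows and p columns of zeros on every side."""
    width = len(X[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in X:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


print(conv_output_size(32, 3))                        # → 30 (no padding)
print(conv_output_size(32, 3, padding=1))             # → 32 (size unchanged)
print(conv_output_size(32, 3, padding=1, stride=2))   # → 16
print(zero_pad([[1, 2], [3, 4]], 1))
# → [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```

With a 3 by 3 kernel and stride 1, one ring of zeros (p = 1) is exactly enough to keep the output the same size as the input.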
3. Activation Layer :
• The output volume of the convolutional layer is fed to an element-wise
activation function, commonly a Rectified Linear Unit (ReLU), which
computes max(0, x). The ReLU layer thus determines whether a node will
'fire': positive inputs pass through unchanged and negative inputs are
set to zero.
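Applied element-wise to a feature map, ReLU can be sketched as follows (function names are hypothetical; the feature map is a list of lists):

```python
def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def relu_map(feature_map):
    """Apply ReLU element-wise to a 2D feature map."""
    return [[relu(v) for v in row] for row in feature_map]


print(relu_map([[-2.0, 0.5], [3.0, -0.1]]))  # → [[0.0, 0.5], [3.0, 0.0]]
```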
4. Pooling Layer :
• In this operation, the size of the feature map is reduced further. A
common technique is max-pooling. The pooling filter is usually a 2 by 2
matrix. In max-pooling, the 2 by 2 filter slides over the feature map and
picks the largest value in each window. This operation results in a
pooled feature map, which is a downsampled summary of the convolved
features of the image.
5. Fully-Connected Layer :
• The convolved features, i.e., the pooled feature map, are now flattened
into a single column vector so that they can be passed through the fully
connected layer. The goal of the fully connected layer is to make class
predictions.
• As in conventional neural networks, every node in the fully connected
layer is connected to every node in the volume of features being fed
forward. The class probabilities are computed and output as a 3D array
with dimensions [1x1xK], where K is the number of classes.
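The last two stages, max-pooling and a fully connected layer on the flattened result, can be sketched end to end in plain Python. This is a sketch under stated assumptions: the 4 by 4 feature map and the weights for K = 3 classes are hypothetical (a trained network would learn the weights), the function names are hypothetical, and softmax is assumed as the usual way to turn the K class scores into probabilities.

```python
import math


def max_pool2d(fmap, size=2, stride=2):
    """Max-pooling: keep the largest value in each size x size window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [max(fmap[i * stride + a][j * stride + b]
             for a in range(size) for b in range(size))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def flatten(fmap):
    """Turn a 2D map into a single column (here, a flat Python list)."""
    return [v for row in fmap for v in row]


def dense(x, weights, biases):
    """Fully connected layer: each of the K outputs sees every input value."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Turn K scores into K probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 0]]
pooled = max_pool2d(fmap)  # → [[4, 5], [3, 7]]
x = flatten(pooled)        # → [4, 5, 3, 7]

# Hypothetical weights for K = 3 classes.
weights = [[0.1, 0.1, 0.1, 0.1],
           [0.2, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.3]]
biases = [0.0, 0.0, 0.0]
probs = softmax(dense(x, weights, biases))
print(probs)  # three probabilities summing to 1; class 2 is the most likely
```

The K values in `probs` correspond to the [1x1xK] output described above, stored here as a flat list.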
Real World CNN Architecture Examples
• There are numerous architectures that have been developed and
released publicly for many different tasks, such as object detection,
object recognition, image segmentation, etc.
• Some popular architectures that have achieved high accuracy are
listed below; they can be accessed using Keras :