A convolutional neural network

Convolutional neural networks (CNNs) utilize convolution operations to filter input data and create feature maps, enabling the recognition of important features in images. Key techniques such as padding and striding help manage the size of the output, while CNN architecture consists of convolutional, pooling, and fully-connected layers to enhance image processing efficiency. The pooling layer compresses feature maps, improving stability and allowing for effective classification through a feed-forward neural network.

Uploaded by

jrn.begum

Convolution operation

A convolutional neural network, or ConvNet, is a neural network that uses
convolution. To understand the principle, we are going to work with a
2-dimensional convolution first.

Why do we use convolution in neural networks?


Convolution is a mathematical operation that merges two sets of
information. In a CNN, convolution is applied to the input data to
filter the information and produce a feature map.

This filter is also called a kernel, or feature detector, and its dimensions can
be, for example, 3x3. To perform convolution, the kernel slides over the input
image. At each position it multiplies its values element by element with the
pixels underneath and sums the products into a single number. The result for
each receptive field (the area where convolution takes place) is written down
in the feature map.

We continue sliding the filter until the feature map is complete.
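
The sliding step can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a stride of 1 and no padding; the function name `convolve2d` and the sample values are made up for the example. Like most CNN libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Receptive field: the k_h x k_w patch whose top-left corner is (i, j).
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 5x5 input and 3x3 kernel (values chosen for the example).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

Each number in the 3x3 result is the sum of the element-wise products for one receptive field.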

Padding and striding


Before we go further, it’s also useful to talk about padding and striding. These
techniques are often used in CNNs:

 Padding. Padding expands the input matrix by adding fake pixels (usually
zeros) to the borders of the matrix. This is done because convolution reduces
the size of the matrix. For example, a 5x5 matrix turns into a 3x3 matrix when
a 3x3 filter goes over it with a stride of 1.
 Striding. It often happens that when working with a convolutional layer, you
need to get an output that is smaller than the input. One way to achieve this is
to use a pooling layer. Another way to achieve this is to use striding. The idea
behind stride is to move the kernel more than one pixel at a time as it slides
over the input: for example, 2 or 3 pixels per step. This reduces spatial
resolution and makes the network more computationally efficient.
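
The effect of both techniques on the output size can be checked with a small sketch. The helper names `output_size` and `pad2d` are hypothetical, and the sketch assumes square inputs and kernels and zero-valued padding. The size rule used is the standard one: floor((n + 2p - k) / s) + 1.

```python
def output_size(n, k, padding=0, stride=1):
    """Side length of the feature map for an n x n input and a k x k kernel."""
    return (n + 2 * padding - k) // stride + 1


def pad2d(image, padding, value=0):
    """Return a copy of `image` with `padding` rows/columns of `value` on every side."""
    width = len(image[0]) + 2 * padding
    top = [[value] * width for _ in range(padding)]
    middle = [[value] * padding + list(row) + [value] * padding for row in image]
    bottom = [[value] * width for _ in range(padding)]
    return top + middle + bottom


print(output_size(5, 3))             # 3: a 5x5 input shrinks to 3x3
print(output_size(5, 3, padding=1))  # 5: one ring of padding keeps the size
print(output_size(5, 3, stride=2))   # 2: a stride of 2 shrinks it further
print(pad2d([[1, 2], [3, 4]], 1))
```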

How do Convolutional Neural Networks actually perform this operation?


 In reality, convolutional neural networks develop multiple feature detectors
and use them to produce several feature maps. Together, these feature maps
form a convolutional layer.
 Through training, the network determines what features it finds important in
order for it to be able to scan images and categorize them more
accurately.
 Based on that, it develops its feature detectors. In many cases, the
features considered by the network will be unnoticeable to the human eye,
which is exactly why convolutional neural networks are so useful. With
enough training, they can pick up on patterns in images that people would
miss.


Padding and striding can help process images more accurately.

For real-life tasks, convolution is usually performed in 3D. The majority of
images have 3 dimensions: height, width and depth, where depth corresponds
to color channels (RGB). So the convolutional filter needs to be 3-dimensional
as well: it spans every channel of the input, and the products from all
channels are summed into a single value for each position.

There are multiple filters in a convolutional layer and each of them generates
a feature map. Therefore, the output of a layer will be a set of feature maps,
stacked on top of each other.

For example, padding and passing a 30x30x3 matrix through 10 filters will
result in a set of 10 30x30x1 matrices (the padding keeps the height and width
at 30x30). After we stack these maps on top of each other, we will get a
30x30x10 matrix.

This is the output of our convolutional layer.
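
The shape arithmetic can be verified with a rough sketch. It assumes 3x3x3 filters, one ring of zero padding and a stride of 1, none of which the text specifies; `conv_layer` and the all-ones data are placeholders for the example.

```python
def conv_layer(image, filters, padding=1):
    """image: H x W x C nested lists; filters: list of k x k x C kernels.

    Returns an H_out x W_out x len(filters) output (stride 1, zero padding).
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0])
    # Zero-pad height and width; the channel axis is left untouched.
    padded = [[[0] * c for _ in range(w + 2 * padding)] for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = list(image[i][j])
    out_h = h + 2 * padding - k + 1
    out_w = w + 2 * padding - k + 1
    output = [[[0] * len(filters) for _ in range(out_w)] for _ in range(out_h)]
    for f, kernel in enumerate(filters):
        for i in range(out_h):
            for j in range(out_w):
                total = 0
                # Sum over the whole k x k x C block: every channel contributes.
                for di in range(k):
                    for dj in range(k):
                        for ch in range(c):
                            total += padded[i + di][j + dj][ch] * kernel[di][dj][ch]
                output[i][j][f] = total  # one feature map per filter
    return output


image = [[[1, 1, 1] for _ in range(30)] for _ in range(30)]  # 30x30x3 input
filters = [[[[1, 1, 1] for _ in range(3)] for _ in range(3)] for _ in range(10)]
out = conv_layer(image, filters)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)  # (30, 30, 10)
```

Each filter collapses the three color channels into one value per position, so the depth of the output equals the number of filters, not the number of input channels.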

The process can be repeated: CNNs usually have more than one
convolutional layer.

3 layers of CNN

The goal of a CNN is to reduce images to a form that is easier to process
without losing the features that are valuable for accurate prediction.

A ConvNet architecture has three kinds of layers: convolutional layer, pooling
layer, and fully-connected layer.

 A convolutional layer is responsible for recognizing features in pixels.
 A pooling layer is responsible for making these features more abstract.
 A fully-connected layer is responsible for using the acquired features for
prediction.

Convolutional layer

We’ve already described how convolution layers work above. They are at the
center of CNNs, enabling them to autonomously recognize features in the
images.

But going through the convolution process generates a large amount of data,
which makes it hard to train the neural network. To compress the data, we
need to go through pooling.

Pooling layer

A pooling layer receives the result from a convolutional layer and compresses
it. The filter of a pooling layer is always smaller than a feature map. Usually, it
takes a 2x2 square (patch) and compresses it into one value.

A 2x2 filter moved in steps of 2, so that the patches do not overlap, would
reduce the number of pixels in each feature map to one quarter the size. If
you had a feature map sized 10×10, the output map would be 5×5.

Multiple different functions can be used for pooling. These are the most
frequent:

 Maximum pooling. It calculates the maximum value for each patch of the
feature map.
 Average pooling. It calculates the average value for each patch of the
feature map.
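
Both pooling functions can be sketched with one small helper. This is an illustration only, assuming non-overlapping patches (stride equal to the patch size); the name `pool2d` and the sample feature map are invented for the example.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Compress each non-overlapping size x size patch into a single value."""
    if mode == "max":
        summarize = max
    elif mode == "average":
        summarize = lambda patch: sum(patch) / len(patch)
    else:
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            patch = [feature_map[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(summarize(patch))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 5],
]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[6, 4], [7, 9]]
print(avg_pooled)  # [[3.75, 2.25], [3.5, 4.25]]
```

A 4x4 map becomes 2x2 here, which is the same quartering that takes a 10×10 map down to 5×5.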

After using the pooling layer, you get pooled feature maps that are a
summarized version of the features detected in the input. The pooling layer
improves the stability of a CNN: without it, even the slightest fluctuations in
pixels could cause the model to misclassify. With pooling, small changes in
the location of a feature detected by the convolutional layer will still result
in a pooled feature map with the feature in the same location.
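
A toy case makes this stability concrete. The sketch below assumes 2x2 max pooling with non-overlapping patches; `max_pool` and the three maps are hypothetical. Note the limit of the effect: a shift that stays inside one patch disappears, while a shift that crosses a patch boundary does not.

```python
def max_pool(feature_map, size=2):
    """Max pooling over non-overlapping size x size patches."""
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# The same strong activation (9) detected at three nearby positions.
original = [[9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
shifted_by_one = [[0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
shifted_by_two = [[0, 0, 9, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

print(max_pool(original))        # [[9, 0], [0, 0]]
print(max_pool(shifted_by_one))  # [[9, 0], [0, 0]]  same pooled map
print(max_pool(shifted_by_two))  # [[0, 9], [0, 0]]  the shift crossed a patch boundary
```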

Now we need to flatten the pooled feature maps (turn them into a single column
vector) and pass the result down to a regular neural network for classification.
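
Flattening is just a matter of reading every value out in a fixed order. A minimal sketch, assuming the pooled maps are stored as a list of 2D lists (the name `flatten` and the numbers are illustrative):

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat vector, map by map, row by row."""
    return [value for fmap in feature_maps for row in fmap for value in row]


pooled_maps = [
    [[6, 4], [7, 9]],  # pooled map from the first filter
    [[1, 0], [2, 3]],  # pooled map from the second filter
]
vector = flatten(pooled_maps)
print(vector)  # [6, 4, 7, 9, 1, 0, 2, 3]
```

The order itself is arbitrary; what matters is that the same order is used for every image.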

Fully-connected layer

The flattened output is fed to a feed-forward neural network and
backpropagation is applied at every iteration of training. This layer provides
the model with the ability to finally understand images: there is a flow of
information between each input pixel and each output class.
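
The forward pass of such a layer can be sketched as follows. Several things here are assumptions rather than part of the text: a single fully-connected layer, a softmax to turn class scores into probabilities, and hand-picked weights that would normally be learned through backpropagation (training is not shown).

```python
import math


def dense(inputs, weights, biases):
    """Fully-connected layer: every output is a weighted sum of every input."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


features = [6, 4, 7, 9]  # flattened feature vector (illustrative values)
weights = [              # one row of weights per output class (made up)
    [0.1, 0.0, -0.1, 0.2],
    [-0.2, 0.1, 0.1, 0.0],
]
biases = [0.0, 0.5]

scores = dense(features, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(predicted_class)  # 0
```

Because every weight row touches every input value, each output class receives information from the whole flattened vector.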

What are other uses of Convolution Matrices?


There's another use for convolution matrices, which is actually part of the
reason why they are called “filters”. The word here is used in the same sense
we use it when talking about Instagram filters.

You can actually use a convolution matrix to adjust an image, for example to
sharpen it, blur it, or bring out its edges.

There is little technical analysis to be made of these filters, and it is not
important for this tutorial. These are just intuitively formulated matrices.
The point is to see how applying them to an image can alter its features in the
same manner that they are used to detect these features.
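
Two commonly used 3x3 matrices show the idea. The sharpen and edge-detection kernels below are standard image-processing kernels, not ones taken from this document, and `apply_filter` with its tiny grayscale "images" is a hypothetical stand-in for a real image library.

```python
def apply_filter(image, kernel):
    """Apply a 3x3 (or any k x k) convolution matrix with stride 1 and no padding."""
    k = len(kernel)
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(k) for dj in range(k))
            for j in range(len(image[0]) - k + 1)
        ]
        for i in range(len(image) - k + 1)
    ]


SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]            # weights sum to 1
EDGE_DETECT = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]    # weights sum to 0

flat = [[10] * 5 for _ in range(5)]            # uniform gray patch
step = [[0, 0, 10, 10, 10] for _ in range(5)]  # dark-to-bright vertical edge

print(apply_filter(flat, SHARPEN)[0])      # [10, 10, 10]: nothing to sharpen
print(apply_filter(flat, EDGE_DETECT)[0])  # [0, 0, 0]: no edges in a flat area
print(apply_filter(step, EDGE_DETECT)[0])  # [-30, 30, 0]: strong response at the edge
```

The edge kernel stays silent on flat regions and responds only where brightness changes. A learned feature detector inside a CNN exploits the same behavior.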
