Convolutional Neural Networks: ZV0GDF798E
Convolutional Neural Networks: ZV0GDF798E
Let's understand the notion of a convolutional neural network (CNN) and how it is
different from the kind of neural network we learned earlier.
Let’s understand the first layer once. Let's say that our input is a grayscale image that's
256 x 256. In grayscale images, we have only one channel so the image size will be
256 x 256 x 1.
[email protected]
ZV0GDF798E
We can break this image up into 8 x 8 image patches. The patch is nothing but a small
portion of an image.
Now let's consider a linear function on just the 8 x 8 image patch. Sometimes this is
called a filter.
In the first layer, we have a collection of filters, each one is applied to each of the image
patches,
[email protected] and together they give the output of the first layer, after applying a
ZV0GDF798E
non-linearity at the end. This is already a major innovation because it means we can
work with much larger neural networks in practice. Just the first few layers are
convolutional and the others are general and fully connected.
Here when we compute how well a neural network classifies some image, say through
the quadratic cost function, we instead randomly delete some fraction of the network,
Additional Content :
We have understood neural networks and how they can be used to build robust
models using numerical data. Let's assume we have an image with height = 6, width =
6, and the number of channels = 3 (colored image).
So there are 6 x 6 x 3 = 108 numbers required to fully describe the image. Consider we
have the first hidden layer of a neural network with 10 units. The total number of
parameters (weights) are 108 x 10 = 1080. So we need 1080 weights for only one
layer and generally, the size of the image will be equal to 224 x 224. In these kinds of
cases, we get a lot of parameters to train, which makes it very computationally
expensive and the model doesn't perform much better. To deal with this kind of
[email protected]
ZV0GDF798Eproblem, we have special neural networks called Convolutional Neural Networks
(CNNs).
Note: The blue matrix on the right above, represents just the first application of the
filter to the first 3 X 3 portion of the 6 X 6 image. It does not represent the final
[email protected]
ZV0GDF798Eoutput of the convolution. The blue matrix on the right will actually result in just one
final number, which is the sum of the element-wise products (products of the big
number and the small number inside each square).
So after the convolution, we finally get a 4 X 4 image. The first element of the 4 X 4
matrix will be calculated as we take the first 3 X 3 matrix from the 6 X 6 image and
Filters:
Filters are responsible for locating objects in an image by detecting changes in the
intensity values of the images. Generally, we have an edge detector that is capable of
detecting edges in an image in mathematical form.
For example:
[email protected]
ZV0GDF798E
In images, we have a lot of complex features that need to be detected other than
edges. For that purpose, we randomly initialize filter values, and the model itself will
learn the best filter values for feature detection during the backpropagation phase.
Pooling:
Pooling is another technique used to reduce the spatial size of the representation, in
order to reduce the number of parameters and the computational cost of the network.
Padding:
The convolutional layers reduce the size of the output. So in cases where we want to
increase the size of the output and save the information presented in the corners, we
[email protected]
ZV0GDF798Ecan use padding layers, where padding helps by adding extra rows and columns on the
outer dimension of the images. So the size of the input data will remain similar to the
output data. We mostly add zeros in the extra rows and columns (Zero padding).
After getting a fully connected layer, we pass this layer as an input layer to the neural
network in order to get the results.
We now
[email protected] have an understanding of the building blocks of a CNN. CNN's do nothing
ZV0GDF798E
other than arranging these building blocks in the right order. Usually, that order is a
convolution layer followed by a pooling layer (multiple times), and finally a fully
connected layer.
Let’s look at one of the historically famous CNN architectures in Deep Learning.
[email protected]
ZV0GDF798E
There are several other architectures that can be explored to get a better
understanding of which building blocks are to be used in the implementation of CNNs.