The CNN Architecture
1. The feature extraction stage, with (a) a convolution layer + activation function and (b) a pooling layer
2. The prediction stage, with (a) a flatten layer and (b) a fully connected layer
[email protected]
R8L0PN473F Convolution Cat
Flatten
layer Fully
Pooling
+ connected
layer
Activation layer Dog
function
Input Output
Feature extraction
■ In the convolutional layer, a filter is applied to an input image to build a feature map.
■ Filters can be custom-made. In convolutional neural networks, however, the filter values are learned through training to determine the important features that categorize the image more accurately. We simply pass the number of filters and their sizes to the CNN, and the model learns the filter values to develop feature maps that capture the presence of the detected features.
[Diagram: a 3 x 3 filter sliding over a 4 x 4 image produces a 2 x 2 output; this process is repeated until the filter has covered the whole image.]
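A minimal NumPy sketch of this sliding-window operation (the function and variable names are illustrative; like most deep-learning libraries, this computes the cross-correlation form of convolution, without flipping the kernel):

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and
    take the sum of element-wise products at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 3 x 3 filter applied to a 4 x 4 image yields a 2 x 2 feature map.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))
fmap = convolve2d(image, kernel)
print(fmap.shape)  # (2, 2)
```

In a trained CNN the kernel values would come from training, not be fixed like the all-ones kernel used here for illustration.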
This file is meant for personal use by [email protected] only.
Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
The activation function
Feature extraction
■ The next step is the activation function. In CNNs, the ReLU activation function is used to incorporate a non-linear component into the network.
■ ReLU is used in CNNs because the convolution operation is essentially a linear operation (a sum of element-wise products) between the image and the filter; ReLU adds the non-linearity required to detect complex, non-linear boundaries in the image.
Feature extraction
■ One advantage of using ReLU (Rectified Linear Unit) is that it converts any negative pixel values to zero. This is useful for visualizing the feature maps, since negative pixel values have no meaning, and it makes subsequent computations more efficient, since zeros are easy to work with.
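A one-line sketch of this behaviour in NumPy (names are illustrative):

```python
import numpy as np

# ReLU clips negative values to zero, element-wise;
# positive values pass through unchanged.
def relu(x):
    return np.maximum(x, 0)

feature_map = np.array([[ 2.0, -1.5],
                        [-3.0,  4.0]])
print(relu(feature_map))
# [[2. 0.]
#  [0. 4.]]
```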
Feature extraction
■ The next step in the feature extraction stage is pooling. The pooling layer helps remove unwanted features from the image. In doing so, it reduces the size of the image and hence decreases the computational cost of the model.
■ CNNs can use 'max pooling' to create a pooled feature map from only the maximum value of each patch, disposing of the unnecessary pixel information.
■ An important question here is: do we lose information in this process? The answer is yes. The final feature map has fewer cells, and thus less information, than the original input image. But the purpose of this step is to discard irrelevant features so that the network can do its job more efficiently.
[Diagram: max pooling]
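A small sketch of 2 x 2 max pooling with stride 2 in NumPy (function name is illustrative):

```python
import numpy as np

def max_pool(fmap, size=2):
    """2 x 2 max pooling with stride 2: keep only the maximum
    value of each non-overlapping patch."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = fmap[i:i+size, j:j+size].max()
    return out

fmap = np.array([[1., 3., 2., 1.],
                 [4., 6., 5., 0.],
                 [7., 2., 9., 8.],
                 [1., 0., 3., 4.]])
print(max_pool(fmap))
# [[6. 5.]
#  [7. 9.]]
```

Each 2 x 2 patch collapses to a single value, halving both spatial dimensions while keeping the strongest activation from each region.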
Feature extraction
■ This process is responsible for the "spatial invariance" property of CNNs. Pooling also reduces the size of the images, and hence the number of parameters, which minimises the likelihood of "overfitting", to which neural networks are often susceptible.
Prediction
■ After the Feature Extraction stage, we move to the Prediction stage of the CNN.
The first step of the Prediction stage is the Flatten layer.
■ In this layer, the CNN literally flattens the pooled feature map into a column vector, and the output is fed into a standard artificial neural network configuration: the fully connected layers of the CNN.
[Diagram: flattening the pooled feature map into inputs for the next layer]
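In NumPy, flattening is a single reshape (values here continue the pooled 2 x 2 example and are illustrative):

```python
import numpy as np

# Flattening: the pooled feature map is unrolled into a 1-D vector
# that can be fed to a standard fully connected network.
pooled = np.array([[6., 5.],
                   [7., 9.]])
flat = pooled.flatten()
print(flat)        # [6. 5. 7. 9.]
print(flat.shape)  # (4,)
```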
Prediction
■ So far, we have learned about the convolution operation, the activation function,
pooling and flattening. The final stage of the process is the fully connected layer,
where the information from the previous layers is sent to an artificial neural network
architecture with Dense (fully connected) layers.
■ The aim of this step is to take the input and combine the features into a wider variety of attributes that make the convolutional network more capable of categorising images, which is the sole purpose of building a convolutional neural network.
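A minimal sketch of one fully connected layer turning the flattened features into class probabilities, assuming two classes (e.g. cat vs dog); the weights here are random stand-ins for learned parameters:

```python
import numpy as np

def dense_softmax(x, W, b):
    """One fully connected (dense) layer followed by softmax,
    mapping the flattened features to class probabilities."""
    logits = W @ x + b
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

rng = np.random.default_rng(0)
flat = np.array([6., 5., 7., 9.])   # flattened feature vector
W = rng.standard_normal((2, 4))     # hypothetical learned weights
b = np.zeros(2)                     # hypothetical learned biases
probs = dense_softmax(flat, W, b)
print(probs.shape)  # (2,) -- two class probabilities summing to 1
```

During training, backpropagation would adjust `W` and `b` (together with the filter values) to push these probabilities toward the correct labels.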
[Diagram: example CNN pipeline — Input 28 x 28 x 3 → Convolution (stride = 1, padding = 1, 10 filters) → 28 x 28 x 10 → Pooling → 14 x 14 x 10 → Convolution (stride = 1, padding = 1, 20 filters) → 14 x 14 x 20 → Pooling → 7 x 7 x 10 → Flattening → Fully connected layers]
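The spatial dimensions in a pipeline like this can be traced with the standard output-size formulas. The sketch below assumes 3 x 3 filters (the diagram gives stride and padding but not the filter size), under which padding = 1 preserves the spatial size and each 2 x 2 pooling step halves it:

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool=2):
    """Spatial output size of 2 x 2 pooling with stride 2."""
    return size // pool

s = 28
s = conv_out(s)   # 28 -> 28 (padding = 1 preserves the size)
s = pool_out(s)   # 28 -> 14 (pooling halves it)
s = conv_out(s)   # 14 -> 14
s = pool_out(s)   # 14 -> 7
print(s)  # 7
```

The number of filters sets the channel dimension at each stage; the spatial size depends only on kernel, stride, padding, and pooling.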
Summary
To summarize, let's go over what we've learnt about convolutional neural networks in a nutshell:
■ We start by applying filters to input images in the form of a convolutional layer.
■ We break up the linearity of the convolution operation using the ReLU activation function.
■ We reduce the image dimensions using pooling, to isolate the important features and improve computational efficiency.
■ We flatten the feature map and feed it into a fully connected neural network to generate the final predictions.
Throughout this process, the trainable parameters of the CNN (the weights and filter values) are continuously updated through backpropagation so that the CNN can achieve its best possible performance in image prediction tasks.
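The whole pipeline above can be sketched end to end in a few lines of NumPy. All names are illustrative, and the random kernel and weights stand in for parameters that training would learn:

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation form)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(fmap, size=2):
    """2 x 2 max pooling with stride 2."""
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            out[i // size, j // size] = fmap[i:i+size, j:j+size].max()
    return out

rng = np.random.default_rng(42)
image = rng.standard_normal((6, 6))     # toy single-channel input
kernel = rng.standard_normal((3, 3))    # a "learned" filter (random here)

fmap = relu(convolve2d(image, kernel))  # convolution + activation: 4 x 4
pooled = max_pool(fmap)                 # pooling: 2 x 2
flat = pooled.flatten()                 # flattening: vector of length 4
W = rng.standard_normal((2, 4))         # fully connected layer weights
logits = W @ flat
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()                 # final class probabilities
print(flat.shape)  # (4,)
```

A real CNN stacks many filters and layers and learns every parameter by backpropagation, but the data flow is exactly this sequence.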