CNN (Source: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that is most commonly applied to analyse visual imagery. The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organisation of the visual cortex. In humans, individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
Compared to multilayer perceptrons (fully connected networks in which each neuron in one layer is connected to all neurons in the next layer), CNNs take advantage of the hierarchical pattern in data: they assemble patterns of increasing complexity using smaller and simpler patterns embossed in their filters. Therefore, on a scale of connectivity and complexity, CNNs are on the lower extreme. CNNs also use relatively little pre-processing compared to other image classification algorithms. This means that the network learns to optimise the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are hand-engineered.
A CNN consists of an input layer, hidden layers and an output layer. The hidden layers of a CNN consist of convolution layers, pooling layers and fully connected layers.
Input layer: the input layer of a CNN is a tensor of the image with a shape: (number of
inputs) x (input height) x (input width) x (input channels).
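For illustration, such an input tensor could be built as a NumPy array; the batch size and image dimensions below are hypothetical values chosen only for the example.

```python
import numpy as np

# Hypothetical batch of 32 RGB images, each 224 x 224 pixels:
# (number of inputs) x (input height) x (input width) x (input channels)
images = np.zeros((32, 224, 224, 3), dtype=np.float32)
print(images.shape)  # (32, 224, 224, 3)
```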
Hidden layers:
1. Convolution layer: The element involved in carrying out the convolution operation in the convolutional layer is called the kernel/filter. The filter slides over the input matrix with a certain stride value, and at each step a dot product is performed between the filter and the patch of the input it covers. The filter moves to the right until it has parsed the complete width; it then hops down to the beginning (left) of the image with the same stride value and repeats the process until the entire image is traversed. This process generates a feature map, which is the input to the next layer. In a CNN, multiple convolution layers are present, and at each convolution layer the feature map complexity increases.
2. Pooling layer: Pooling layers reduce the dimensions of the data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Pooling is also useful for extracting dominant features that are rotationally and positionally invariant, thus helping the model train effectively. (A NumPy sketch of both the convolution and pooling operations follows this list.)
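The sketch below illustrates the two operations described above on a single-channel input, assuming "valid" padding; the example image, kernel and stride are made up for illustration, and in practice frameworks such as TensorFlow or PyTorch provide optimised implementations.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image and take a dot product at each step."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            feature_map[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return feature_map

def max_pool2d(feature_map, size=2):
    """Replace each size x size block of the feature map with its maximum value."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]
    blocks = trimmed.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Hypothetical 6 x 6 single-channel image and 3 x 3 filter.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
fmap = conv2d(image, kernel, stride=1)   # 4 x 4 feature map
pooled = max_pool2d(fmap, size=2)        # 2 x 2 after pooling
print(fmap.shape, pooled.shape)
```

Note that the 6 x 6 input with a 3 x 3 filter and stride 1 yields a 4 x 4 feature map, which the 2 x 2 max pooling then reduces to 2 x 2.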
The convolutional layer and the pooling layer together form the i-th layer of a Convolutional Neural Network. Depending on the complexity of the images, the number of such layers may be increased to capture low-level details even further, but at the cost of more computational power.
After going through these layers, the input image is converted into a suitable form for the multilayer perceptron (fully connected layers). The image is flattened into a column vector and fed to a feed-forward neural network, and backpropagation is applied at every iteration of training. Over a series of epochs, the model learns to distinguish between dominating and certain low-level features in images and to classify them using the softmax classification technique.
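As a purely illustrative sketch of the pipeline described above (convolution, pooling, flattening, fully connected layers and a softmax output), a small CNN could be assembled in Keras roughly as follows; the input shape, layer sizes and the choice of 10 output classes are assumptions made for the example, not values from the source.

```python
import tensorflow as tf

# Minimal CNN sketch: conv -> pool -> conv -> pool -> flatten -> dense -> softmax.
# Input shape and layer sizes are hypothetical.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),   # first feature maps
    tf.keras.layers.MaxPooling2D(pool_size=2),                      # reduce spatial dimensions
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),   # more complex features
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),                                      # flatten into a column vector
    tf.keras.layers.Dense(64, activation="relu"),                   # fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"),                # softmax classification
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```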
LSTM (Sources: https://www.researchgate.net/figure/Basic-structure-of-a-long-short-term-memory-LSTM-unit_fig1_330890239 and https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks (such as CNNs), an LSTM has feedback connections. Because of these feedback connections, an LSTM can process not only single data points (such as images) but also sequences of data points, such as a sentence or a time series.
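For instance, the hypothetical Keras snippet below shows an LSTM layer consuming a whole batch of sequences at once; the batch size, sequence length, feature count and number of hidden units are arbitrary choices for the sketch.

```python
import numpy as np
import tensorflow as tf

# Hypothetical batch of 8 sequences, each with 20 time steps of 5 features.
sequences = np.random.rand(8, 20, 5).astype("float32")

# A single LSTM layer with 16 hidden units processes each whole sequence
# and returns the final hidden state for every sequence in the batch.
lstm = tf.keras.layers.LSTM(16)
output = lstm(sequences)
print(output.shape)  # (8, 16)
```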
A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate.
The cell remembers values over arbitrary time intervals and the three gates regulate the flow of
information into and out of the cell.
In an LSTM unit, $C_t$ is the cell state; it runs through the entire chain and carries information along it. Information is removed from or added to the cell state by the forget gate and the input gate.
1. The forget gate: The first step in an LSTM is to decide what information to remove from the cell state. This decision is made by a sigmoid layer called the "forget gate layer." It looks at $h_{t-1}$ and $x_t$ and outputs a number between 0 and 1 for each number in the cell state $C_{t-1}$; a 1 represents "completely keep this" while a 0 represents "completely get rid of this." The output of the forget gate is calculated using the following equation:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
2. Input gate: The next step is to decide what new information to store in the cell state. First, a sigmoid layer decides which values to update. Then a tanh layer creates a vector of new candidate values to be added to the cell state. These two are then combined to create an update to the state. The sigmoid part is represented mathematically by:
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
(A NumPy sketch of these gate computations follows below.)
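Below is a minimal NumPy sketch of the forget-gate and input-gate computations, following the equations above and the cited colah.github.io post; the dimensions and randomly initialised weights are assumptions made for illustration only (in a trained LSTM these weights are learned).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 input features, 4 hidden units.
n_in, n_hidden = 3, 4
rng = np.random.default_rng(0)

# Previous hidden state h_{t-1}, current input x_t, previous cell state C_{t-1}.
h_prev = rng.standard_normal(n_hidden)
x_t = rng.standard_normal(n_in)
C_prev = rng.standard_normal(n_hidden)
concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]

# Illustrative weight matrices and biases (randomly initialised here; learned in practice).
W_f = rng.standard_normal((n_hidden, n_hidden + n_in)); b_f = np.zeros(n_hidden)
W_i = rng.standard_normal((n_hidden, n_hidden + n_in)); b_i = np.zeros(n_hidden)
W_C = rng.standard_normal((n_hidden, n_hidden + n_in)); b_C = np.zeros(n_hidden)

f_t = sigmoid(W_f @ concat + b_f)             # forget gate: what to discard from C_{t-1}
i_t = sigmoid(W_i @ concat + b_i)             # input gate: which values to update
C_tilde = np.tanh(W_C @ concat + b_C)         # candidate values from the tanh layer
C_t = f_t * C_prev + i_t * C_tilde            # combined update of the cell state
print(C_t.shape)  # (4,)
```

Here C_tilde stands for the vector of new candidate values produced by the tanh layer mentioned above, and the last line shows how the forget and input gates combine to update the cell state.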