
CNN (source:
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which is most
commonly applied to analyse visual imagery. The architecture of a ConvNet is analogous to
the connectivity pattern of neurons in the human brain and was inspired by the organisation
of the Visual Cortex. In humans, individual neurons respond to stimuli only in a restricted
region of the visual field known as the Receptive Field. The receptive fields of different
neurons partially overlap such that together they cover the entire visual field.

Compared to multilayer perceptrons (fully connected networks in which each neuron in one
layer is connected to every neuron in the next layer), CNNs take advantage of the hierarchical
pattern in data: they assemble patterns of increasing complexity from the smaller and simpler
patterns encoded in their filters. On a scale of connectivity and complexity, CNNs therefore
sit at the lower extreme. CNNs also require relatively little pre-processing compared to other
image classification algorithms, because the network learns to optimise its filters (or kernels)
during training, whereas in traditional algorithms these filters are hand-engineered.

A CNN consists of an input layer, hidden layers and an output layer. The hidden layers of
a CNN consist of convolution layers, pooling layers and fully connected layers.
Input layer: the input layer of a CNN is a tensor of the image with shape (number of
inputs) x (input height) x (input width) x (input channels).
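As a concrete illustration (the batch size and image size below are arbitrary choices, not from the source), a batch of 32 RGB images of 224 x 224 pixels stored in this layout is a tensor of shape 32 x 224 x 224 x 3:

import numpy as np

# (number of inputs) x (height) x (width) x (channels): 32 RGB images of 224 x 224
batch = np.zeros((32, 224, 224, 3), dtype=np.float32)
print(batch.shape)   # (32, 224, 224, 3)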
Hidden layer:
1. Convolution layer: The element that carries out the convolution operation in the
convolutional layer is called the Kernel/Filter. The filter slides over the input matrix with a
certain Stride Value, and at each step a dot product is computed between the filter and the
patch of the input it currently covers. The filter moves to the right until it has parsed the
complete width, then hops down to the beginning (left) of the image with the same Stride
Value and repeats the process until the entire image is traversed (a sketch of this sliding
operation follows this list). This process generates a feature map, which becomes the input
to the next layer. A CNN contains multiple convolution layers, and at each successive
convolution layer the feature maps capture increasingly complex features.
2. Pooling layer: Pooling layers reduce the dimensions of the data by combining the outputs
of clusters of neurons at one layer into a single neuron in the next layer. Pooling is also
useful for extracting dominant features that are rotationally and positionally invariant, which
helps the model train effectively.
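To make the sliding-window mechanics concrete, here is a minimal NumPy sketch (my illustration, not from the source article) of a single-channel convolution with a stride, followed by max pooling; the 6x6 image, 3x3 filter and stride values are arbitrary.

import numpy as np

def conv2d(image, kernel, stride=1):
    # Slide `kernel` over `image` with the given stride, taking a dot product
    # at each position; the result is the feature map.
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise product then sum = dot product
    return out

def max_pool2d(fmap, size=2, stride=2):
    # Reduce the feature map by keeping the maximum of each size x size window.
    h, w = fmap.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out

image = np.random.rand(6, 6)                 # toy single-channel 6x6 "image"
kernel = np.random.rand(3, 3)                # a 3x3 filter (learned in a real CNN)
fmap = conv2d(image, kernel, stride=1)       # 4x4 feature map
pooled = max_pool2d(fmap, size=2, stride=2)  # 2x2 after pooling
print(fmap.shape, pooled.shape)              # (4, 4) (2, 2)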

The Convolutional Layer and the Pooling Layer together form the i-th layer of a
Convolutional Neural Network. Depending on the complexity of the images, the number of
such layers may be increased to capture low-level details even further, but at the cost of
more computational power.
After going through these layers, the input image is converted into a form suitable for a
Multi-Layer Perceptron (fully connected layer). The image is flattened into a column vector
and fed to a feed-forward neural network, and backpropagation is applied at every iteration
of training. Over a series of epochs, the model learns to distinguish between dominating and
low-level features in images and to classify them using the Softmax Classification technique.
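The full pipeline described above (convolution, pooling, flattening, a fully connected layer, and a softmax over classes) can be sketched end to end. The following is a minimal PyTorch example of my own, not taken from the source; the 28x28 grayscale input, channel counts and 10 output classes are assumed purely for illustration.

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    # Toy CNN: two conv+pool stages, then flatten and a fully connected classifier.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 1x28x28 -> 8x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 8x28x28 -> 8x14x14
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # 8x14x14 -> 16x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 16x14x14 -> 16x7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # flatten to a column vector per image
            nn.Linear(16 * 7 * 7, num_classes),          # fully connected layer
        )

    def forward(self, x):
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)              # class probabilities via softmax

model = SmallCNN()
batch = torch.randn(4, 1, 28, 28)        # 4 fake grayscale images
probs = model(batch)
print(probs.shape)                       # torch.Size([4, 10]); each row sums to 1

During actual training one would typically feed the raw logits to a cross-entropy loss rather than applying softmax inside the model; the softmax is kept here only to mirror the classification step described above.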
LSTM (sources:
https://www.researchgate.net/figure/Basic-structure-of-a-long-short-term-memory-LSTM-unit_fig1_330890239
and https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture
used in the field of deep learning. Unlike standard feedforward neural networks (such as CNNs),
an LSTM has feedback connections. Because of these feedback connections, an LSTM can process
not only single data points (such as an image) but also sequences of data points, such as a
sentence or a time series.
A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate.
The cell remembers values over arbitrary time intervals and the three gates regulate the flow of
information into and out of the cell.
In the standard diagram of an LSTM unit, 𝐶𝑡 is the cell state; it runs through the entire
chain carrying information along, and information is removed from it or added to it by the
forget gate and the input gate.
1. The Forget gate: The first step in LSTM is to decide what information to remove from the
cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks
at ℎ𝑡−1 and 𝑥𝑡 and outputs a number between 0 and 1 for each number in the cell state
𝐶𝑡−1 . A “1” represents “completely keep this” while a 0 represents “completely get rid of
this.” The output of the forget gate is calculated using the following equation:
𝑓𝑡 = 𝜎(𝑊𝑓 · [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑓)
2. Input gate: The next step is to decide what new information we’re going to store in
the cell state. First, a sigmoid layer decides which values to update. Then a tanh layer
creates a vector of new candidate values, C̃𝑡, that could be added to the cell state.
These two are then combined to create an update to the state. Mathematically it is
represented by:
𝑖𝑡 = 𝜎(𝑊𝑖 · [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑖)

C̃𝑡 = 𝑡𝑎𝑛ℎ(𝑊𝑐 · [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑐)

The updated cell state is calculated by: 𝐶𝑡 = 𝑓𝑡 * 𝐶𝑡−1 + 𝑖𝑡 * C̃𝑡
3. Output gate: In the last step, the output is calculated. This will be a filtered version of
the cell state. First, a sigmoid layer decides which parts of the cell state are going to be
output. The cell state is then put through 𝑡𝑎𝑛ℎ (to push the values to be between −1 and 1)
and multiplied by the output of the sigmoid gate, so that only the decided parts are output.
Mathematically it is represented as:
𝑜𝑡 = σ(𝑊𝑜 · [ℎ𝑡−1, 𝑥𝑡] + 𝑏𝑜)
ℎ𝑡 = 𝑜𝑡 * 𝑡𝑎𝑛ℎ(𝐶𝑡)
In the above equations, the lowercase variables represent vectors. The matrices 𝑊𝑞 contain
the weights, where the subscript 𝑞 can be the input gate 𝑖, the output gate 𝑜, the forget gate 𝑓
or the memory cell 𝑐, depending on the activation being calculated. A vector notation is used,
which means that 𝑐𝑡 ∈ ℝ^ℎ is not just one unit of one LSTM cell but contains the units of ℎ
LSTM cells.
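Putting the three gates together, here is a minimal NumPy sketch (my illustration, not from the sources above) of a single LSTM time step implementing the gate equations in this section; the hidden size, input size and random weights are placeholders.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One LSTM time step following the gate equations above.
    # W maps the concatenated [h_{t-1}, x_t] to each gate; b holds the biases.
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde          # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate
    h_t = o_t * np.tanh(c_t)                    # new hidden state
    return h_t, c_t

hidden, inputs = 4, 3                           # placeholder sizes
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((hidden, hidden + inputs)) for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
h_t, c_t = lstm_step(rng.standard_normal(inputs), np.zeros(hidden), np.zeros(hidden), W, b)
print(h_t.shape, c_t.shape)                     # (4,) (4,)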
General classification models can only discriminate among a finite set of predefined classes.
This approach needs at least two categories, “normal” and “abnormal”. However, as previously
described in Section 1, there are too many variables to define anomalies, and human annotation
of a large collection of abnormal data labeled at the class level is very labor-intensive. The
approach also suffers from a lack of prior information about unknown abnormal classes, which
are practically impossible to predict. When abnormal data are not easily available, the neural
network cannot perform very well because of the data imbalance. All these problems arise
because we attempt to frame the task as a classification problem and to select a better
classifier. As a result, we propose to recast this problem as a regression-based abnormality
decision via manifold learning with an auto-encoder.
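One common way to realise this kind of idea is to train an auto-encoder on normal data only and use its reconstruction error as a continuous abnormality score. The PyTorch sketch below is my own illustration of that general pattern, not the authors' implementation; the layer sizes, training loop and input dimension are arbitrary placeholders.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    # Small auto-encoder; inputs that reconstruct poorly get a high abnormality score.
    def __init__(self, in_dim=64, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

normal_data = torch.randn(256, 64)              # stand-in for "normal" training samples
for _ in range(50):                             # train only on normal data
    opt.zero_grad()
    loss = loss_fn(model(normal_data), normal_data)
    loss.backward()
    opt.step()

def abnormality_score(x):
    # Regression-style score: per-sample reconstruction error.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

print(abnormality_score(torch.randn(5, 64)))    # higher values suggest anomalies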
