Module 1

Neural networks are parallel computing devices, which is basically an attempt to make a computer model

of the brain. The main objective is to develop a system to perform various computational tasks faster than
the traditional systems. These tasks include pattern recognition and classification, approximation,
optimization, and data clustering.

What is Artificial Neural Network?

Artificial Neural Network (ANN) is an efficient computing system whose central theme is borrowed from
the analogy of biological neural networks. ANNs are also named as “artificial neural systems,” or
“parallel distributed processing systems,” or “connectionist systems.” ANN acquires a large collection
of units that are interconnected in some pattern to allow communication between the units. These units,
also referred to as nodes or neurons, are simple processors which operate in parallel.
Every neuron is connected with other neurons through a connection link. Each connection link is
associated with a weight that has information about the input signal. This is the most useful information
for neurons to solve a particular problem, because the weight usually excites or inhibits the signal that is
being communicated. Each neuron has an internal state, which is called an activation signal. Output
signals, which are produced after combining the input signals and activation rule, may be sent to other
units.

Biological Neuron

A nerve cell (neuron) is a special biological cell that processes information. According to one estimate,
the human brain contains a huge number of neurons, approximately 10¹¹, with numerous interconnections,
approximately 10¹⁵.

Schematic Diagram

Working of a Biological Neuron


As shown in the above diagram, a typical neuron consists of the following four parts with the help of
which we can explain its working −
• Dendrites − They are tree-like branches responsible for receiving information from the
other neurons the cell is connected to. In a sense, they act as the ears of the neuron.
• Soma − It is the cell body of the neuron and is responsible for processing the information
received from the dendrites.
• Axon − It is like a cable through which the neuron sends information.
• Synapses − These are the connections between the axon and the dendrites of other neurons.

ANN versus BNN


The following terminology maps between the two:

Biological Neural Network → Artificial Neural Network

Soma → Node

Dendrites → Input

Synapse → Weights or interconnections

Axon → Output

The following comparison between BNN and ANN is based on some common criteria:

• Processing − BNN: massively parallel, slow, but superior to ANN; ANN: massively parallel, fast, but inferior to BNN.

• Size − BNN: about 10¹¹ neurons and 10¹⁵ interconnections; ANN: 10² to 10⁴ nodes, depending mainly on the type of application and network design.

• Learning − BNN: can tolerate ambiguity; ANN: very precise, structured and formatted data is required to tolerate ambiguity.

• Fault tolerance − BNN: performance degrades with even partial damage; ANN: capable of robust performance, hence has the potential to be fault tolerant.

• Storage capacity − BNN: stores information in the synapses; ANN: stores information in continuous memory locations.

Model of Artificial Neural Network

The following diagram represents the general model of ANN followed by its processing.
For the above general model of an artificial neural network, the net input can be calculated as follows −

y_in = x1·w1 + x2·w2 + x3·w3 + … + xm·wm

i.e., net input y_in = Σ (i = 1 to m) xi·wi

The output can be calculated by applying the activation function over the net input −

Y = F(y_in)
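
For illustration, here is a minimal sketch (in Python with NumPy) of this net-input-then-activation computation; the example values of x and w, and the use of a binary step as F, are assumptions chosen purely for demonstration.

import numpy as np

# A minimal sketch of the general ANN neuron model described above.
# x holds the m input signals, w the corresponding connection weights
# (both example values chosen arbitrarily for illustration).
x = np.array([0.5, -1.0, 2.0])        # inputs x1..xm
w = np.array([0.4, 0.7, -0.2])        # weights w1..wm

# Net input: y_in = sum over i of x_i * w_i
y_in = np.dot(x, w)

# Activation function F applied over the net input; a simple binary
# step is used here as a stand-in for whatever F the network employs.
def step(z):
    return 1.0 if z > 0 else 0.0

y = step(y_in)
print(y_in, y)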

Processing of ANN depends upon the following three building blocks −


• Network Topology
• Adjustments of Weights or Learning
• Activation Functions

Network Topology

A network topology is the arrangement of a network along with its nodes and connecting lines. According
to the topology, ANN can be classified as the following kinds −
Feedforward Network
It is a non-recurrent network having processing units/nodes arranged in layers, where all the nodes in a
layer are connected to the nodes of the previous layer. These connections carry different weights. There
is no feedback loop, which means the signal can flow in only one direction, from input to output. It may
be divided into the following two types −
• Single layer feedforward network − A feedforward ANN having only one weighted layer.
In other words, the input layer is fully connected to the output layer.
• Multilayer feedforward network − A feedforward ANN having more than one weighted
layer. As this network has one or more layers between the input and the output layer,
these intermediate layers are called hidden layers.

Feedback Network
As the name suggests, a feedback network has feedback paths, which means the signal can flow in both
directions using loops. This makes it a non-linear dynamic system, which changes continuously until it
reaches a state of equilibrium. It may be divided into the following types −
• Recurrent networks − They are feedback networks with closed loops. Following are the
two types of recurrent networks.
• Fully recurrent network − It is the simplest neural network architecture because all nodes
are connected to all other nodes and each node works as both input and output.

• Jordan network − It is a closed loop network in which the output will go to the input
again as feedback as shown in the following diagram.
Adjustments of Weights or Learning

Learning, in artificial neural network, is the method of modifying the weights of connections between
the neurons of a specified network. Learning in ANN can be classified into three categories namely
supervised learning, unsupervised learning, and reinforcement learning.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher. This learning
process is dependent.
During the training of an ANN under supervised learning, the input vector is presented to the network,
which produces an output vector. This output vector is compared with the desired output vector, and an
error signal is generated if they differ. On the basis of this error signal, the weights are adjusted until the
actual output matches the desired output.
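
As an illustration of this idea, the sketch below applies one error-driven weight update using the simple delta (LMS) rule; the rule, learning rate and example values are illustrative assumptions rather than the specific algorithm described here.

import numpy as np

# Illustrative sketch of one supervised-learning weight adjustment,
# using the simple delta (LMS) rule as an example of "adjust the
# weights based on the error signal". Names and values are assumptions.
x = np.array([1.0, 0.5, -0.3])   # input vector
w = np.array([0.2, -0.1, 0.4])   # current weights
d = 1.0                          # desired (target) output
lr = 0.1                         # learning rate

y = np.dot(x, w)                 # actual output of a linear unit
error = d - y                    # error signal: desired minus actual
w = w + lr * error * x           # weights adjusted in proportion to the error
print(error, w)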

Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a teacher. This learning
process is independent.
During the training of ANN under unsupervised learning, the input vectors of similar type are combined
to form clusters. When a new input pattern is applied, then the neural network gives an output response
indicating the class to which the input pattern belongs.
There is no feedback from the environment as to what should be the desired output and if it is correct or
incorrect. Hence, in this type of learning, the network itself must discover the patterns and features from
the input data, and the relation for the input data over the output.
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the network based on some
critic information. This learning process is similar to supervised learning; however, much less information
is available.
During the training of network under reinforcement learning, the network receives some feedback from
the environment. This makes it somewhat similar to supervised learning. However, the feedback obtained
here is evaluative not instructive, which means there is no teacher as in supervised learning. After
receiving the feedback, the network performs adjustments of the weights to get better critic information
in future.

Basic Structure of Artificial Neural Networks

Artificial Neural Networks (ANNs) are computational models inspired by the human brain. In other
words, they model the working logic of the human brain mathematically. The main goal is to provide a
result (or output) that is in line with our purpose after a series of processing steps. Just as the human brain
has billions of neurons, an ANN also has hundreds or thousands of artificial neurons.

ANNs are used for regression or classification problems, and they come in two basic architectures:

1. Single-Layer Artificial Neural Networks

2. Multi-Layer Artificial Neural Networks

Single-Layer Artificial Neural Networks (Single-Layer Perceptrons)


Single-layer artificial neural networks are also called perceptrons. The perceptron is the basic building
block of ANNs. It is a binary classification algorithm invented by Frank Rosenblatt in 1957; that is, an
algorithm that tries to decide which output class an input belongs to.

How does the Perceptron work?

The perceptron consists of five components:

1. Inputs: These are the independent variables (x) that we have.

2. Weights: Weight parameters (w) control the strength of the connection between inputs and
neurons. They can also be said to represent the effect of an independent variable on the result.

3. Bias value (b): A constant value that helps control the output value. It also ensures that the
process can continue even when all inputs are zero.

4. Activation Functions: The activation function (f) defines the output of the neuron according
to certain conditions.

5. Output: The dependent variable (y) is the result we want to find. In perceptrons, the result is
divided into two classes, classes 1 and 0.

If we formulate the process, we can show it like this: y=f(x×w+b)

The Perceptron

The process proceeds as follows:

• The weighted sum is calculated by first multiplying the weights and inputs and then adding the
products. The bias value is included in this sum (bias = b = x0×w0).

• The activation function is applied to the weighted sum (z) and the result is found. Perceptrons
use the step function as their activation function.
According to the step function, if the weighted sum z is:

→ z > 0, then the result is 1,

→ z ≤ 0, then the result is 0.

The application area of perceptrons is limited, because perceptrons can only handle simple binary
classification problems that are linearly separable.
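
A minimal sketch of the perceptron forward pass y = f(x·w + b) with a step activation follows; the input, weight and bias values are arbitrary assumptions used only to demonstrate the computation.

import numpy as np

# Minimal perceptron sketch following y = f(x·w + b) with a step
# activation, as described above. The weights, bias and inputs are
# illustrative assumptions, not learned values.
def perceptron(x, w, b):
    z = np.dot(x, w) + b          # weighted sum plus bias
    return 1 if z > 0 else 0      # step function: class 1 or class 0

x = np.array([2.0, 3.0])          # input features
w = np.array([0.5, -0.25])        # connection weights
b = -0.1                          # bias value

print(perceptron(x, w, b))        # -> 1, since z = 1.0 - 0.75 - 0.1 = 0.15 > 0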

Multi-Layer Neural Networks (Multi-Layer perceptrons(MLPs))

As the name suggests, multi-layer neural networks, or multi-layer perceptrons (MLPs), consist of more
than one layer. Unlike single-layer perceptrons, they can be used for non-linearly separable problems.
They achieve this with the activation functions they use in their layers. The activation functions make the
outputs of neurons nonlinear, and in this way the network can solve more complex problems. (Without
activation functions, an ANN actually becomes a linear regression model.)

The basic layers of MLPs are:

• Input layer → It includes 1 neuron per input x.

• Hidden layers (one or more) → The number of neurons they contain depends on the problem.

• Output layer → The number of neurons it contains depends on the problem.


In the first step, for every neuron in the hidden layers, the same process as in the perceptron is applied:

1. The weighted sum (z) is calculated.

2. It is transmitted to the related hidden neuron, and the activation function present in that neuron is
applied.

In the next step, the outputs of the hidden layers are transmitted to the output layer. As said before, the
number of output neurons depends on the problem:

Regression: consists of 1 neuron ,


Binary Classification: consists of 1 neuron,
Multi-label Classification: consists of 1 neuron per label,
Multi-class Classification: consists of 1 neuron per class in the output layer.

The activation functions in the neurons of the output layer also depend on the task:

Regression: None or ReLU/Softplus(if positive outputs) or Logistic/tanh( if bounded outputs),


Binary Classification: Logistic(sigmoid) function,
Multi-label Classification: Logistic(sigmoid) function,
Multi-class Classification: Softmax function.

The main goal is to enable the ANN to learn the most accurate weight values (so achieving the most
accurate result) with a suitable number of hidden layers and neurons. We can do this by applying certain
processes to our artificial neural network and optimizing it.
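
The sketch below shows a forward pass through a small MLP with one hidden layer, assuming a sigmoid hidden activation and a single sigmoid output neuron as for binary classification; the layer sizes and random weights are illustrative assumptions.

import numpy as np

# Sketch of a forward pass through a small MLP with one hidden layer.
# A logistic (sigmoid) hidden activation and a sigmoid output neuron
# (as in binary classification) are assumed; sizes are illustrative.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = rng.normal(size=3)            # 3 input features
W1 = rng.normal(size=(3, 4))       # input -> hidden (4 hidden neurons)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))       # hidden -> output (1 neuron, binary task)
b2 = np.zeros(1)

h = sigmoid(x @ W1 + b1)           # hidden layer: weighted sum, then activation
y = sigmoid(h @ W2 + b2)           # output activation depends on the task
print(y)                           # a probability-like value in (0, 1)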
Activation Functions

An activation function may be defined as the extra force or effort applied over the input to obtain an exact
output. In an ANN, we can apply activation functions over the input to get the desired output.

An activation function is simply the function used to compute the output of a node. It is also known as a
transfer function.

Why we use Activation functions with Neural Networks?

Activation functions determine the output of a neural network, for example yes or no. They map the
resulting values into a range such as 0 to 1 or -1 to 1 (depending on the function).

The Activation Functions can be basically divided into 2 types-

1. Linear Activation Function

2. Non-linear Activation Functions

Linear or Identity Activation Function

The function is a line, i.e. linear. Therefore, the output of the function is not confined to any range.

Equation : f(x) = x
Range : (-infinity to infinity)
It doesn’t help with the complexity or various parameters of usual data that is fed to the neural
networks.
Non-linear Activation Function

The nonlinear activation functions are the most widely used activation functions. Nonlinearity gives the
function a curved graph rather than a straight line.

It makes it easier for the model to generalize or adapt to a variety of data and to differentiate between
outputs.

The main terminologies needed to understand for nonlinear functions are:

Derivative or Differential: Change in the y-axis w.r.t. change in the x-axis. It is also known as the slope.

Monotonic function: A function which is either entirely non-increasing or non-decreasing.

The Nonlinear Activation Functions are mainly divided on the basis of their range or curves-

1. Sigmoid or Logistic Activation Function

The sigmoid function curve looks like an S-shape.

The main reason we use the sigmoid function is that its output lies between 0 and 1. Therefore, it is
especially used for models where we have to predict a probability as the output. Since the probability of
anything exists only in the range of 0 to 1, sigmoid is the right choice.

The function is differentiable, which means we can find the slope of the sigmoid curve at any point.

The function is monotonic, but the function's derivative is not.

The logistic sigmoid function can cause a neural network to get stuck during training.

The softmax function is a more generalized logistic activation function which is used for multiclass
classification.
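
A minimal sketch of the logistic sigmoid and its multiclass generalization, softmax, is given below; the input values are arbitrary illustrations.

import numpy as np

# Sketch of the logistic sigmoid and its multiclass generalization,
# softmax, mentioned above. Input values are arbitrary illustrations.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # output lies in (0, 1)

def softmax(z):
    e = np.exp(z - np.max(z))                 # shift for numerical stability
    return e / e.sum()                        # outputs sum to 1 across classes

print(sigmoid(np.array([-2.0, 0.0, 2.0])))    # ~[0.12, 0.5, 0.88]
print(softmax(np.array([1.0, 2.0, 3.0])))     # ~[0.09, 0.24, 0.67]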

2. Tanh or hyperbolic tangent Activation Function

tanh is also like logistic sigmoid but better. The range of the tanh function is from (-1 to 1). tanh is also
sigmoidal (s - shaped).
The advantage is that the negative inputs will be mapped strongly negative and the zero inputs will be
mapped near zero in the tanh graph.

The function is differentiable.

The function is monotonic while its derivative is not monotonic.

The tanh function is mainly used for classification between two classes.

Both tanh and logistic sigmoid activation functions are used in feed-forward nets.

3. ReLU (Rectified Linear Unit) Activation Function

The ReLU is currently the most used activation function in the world, since it is used in almost all
convolutional neural networks and deep learning models.
The ReLU is half rectified (from the bottom): f(z) is zero when z is less than zero, and f(z) is equal to z
when z is greater than or equal to zero.

Range: [ 0 to infinity)

The function and its derivative both are monotonic.

But the issue is that all negative values become zero immediately, which decreases the ability of the
model to fit or train from the data properly. Any negative input given to the ReLU activation function
turns into zero immediately, which in turn prevents the negative values from being mapped
appropriately.
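
The short sketch below evaluates tanh and ReLU on the same inputs, illustrating the ranges discussed above and how ReLU zeroes out every negative input; the input values are arbitrary.

import numpy as np

# Sketch of tanh and ReLU on the same inputs, illustrating the ranges
# discussed above: tanh maps into (-1, 1), while ReLU turns every
# negative input into exactly 0.
z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

tanh_out = np.tanh(z)              # strongly negative inputs -> close to -1
relu_out = np.maximum(0.0, z)      # negative inputs become exactly 0

print(tanh_out)
print(relu_out)                    # [0. 0. 0. 1. 3.]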

Curse of Dimensionality

Regarding the curse of dimensionality — also known as the Hughes Phenomenon — there are
two things to consider. On the one hand, ML excels at analyzing data with many dimensions.
Humans are not good at finding patterns that may be spread out across so many dimensions,
especially if those dimensions are interrelated in counter-intuitive ways. On the other hand, as
we add more dimensions we also increase the processing power we need to analyze the data,
and we also increase the amount of training data required to make meaningful data models.

High dimensional data is when a dataset has a number of features (p) that is bigger than the number
of observations (N), usually written as p >> N. High dimensional data is what leads to the curse of
dimensionality.

The Hughes Phenomenon shows that as the number of features increases, the classifier’s
performance increases as well until we reach the optimal number of features. Adding more
features based on the same size as the training set will then degrade the classifier’s
performance.
An increase in the number of dimensions of a dataset means there are more entries in the vector
of features that represents each observation in the corresponding Euclidean space. We measure
the distance in a vector space using Euclidean distance.

The Euclidean distance between two 2-D vectors x = (x1, x2) and y = (y1, y2) is given by:

d(x, y) = √((x1 − y1)² + (x2 − y2)²)

Hence, each new dimension adds a non-negative term to the sum, so the distance increases with
the number of dimensions for distinct vectors. In other words, as the number of features grows
for a given number of observations, the feature space becomes increasingly sparse; that is, less
dense or emptier. On the flip side, the lower data density requires more observations to keep
the average distance between data points the same.
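
A rough numerical sketch of this sparsity effect is given below: for a fixed number of random points, the average pairwise Euclidean distance grows as the number of dimensions increases. The point count and dimensions are arbitrary assumptions.

import numpy as np

# Rough sketch: as the number of dimensions grows for a fixed number of
# observations, the average pairwise Euclidean distance between random
# points increases, i.e. the feature space becomes sparser.
rng = np.random.default_rng(42)
n_points = 100

for d in (2, 10, 100, 500):
    X = rng.uniform(size=(n_points, d))        # N observations, d features
    diffs = X[:, None, :] - X[None, :, :]      # pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1)) # pairwise Euclidean distances
    mean_dist = dists[np.triu_indices(n_points, k=1)].mean()
    print(d, round(float(mean_dist), 2))       # mean distance grows with d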

When the distance between observations grows, supervised machine learning becomes more
difficult because predictions for new samples are less likely to be based on learning from
similar training features. The number of possible unique rows grows exponentially as the
number of features increases, which makes it much harder to generalize efficiently. The variance of a
model increases as it gets more opportunity to overfit to noise in more dimensions, resulting in poor
generalization performance.

Dimensionality reduction techniques help compress the data without losing much of the signal,
and combat the curse of dimensionality.

Overfitting and Underfitting

In supervised learning, overfitting happens when our model captures the noise along with the
underlying pattern in the data. It happens when we train our model extensively on a noisy dataset. These
models have low bias and high variance. Such models are usually very complex, like decision trees,
which are prone to overfitting.
In supervised learning, underfitting happens when a model is unable to capture the underlying
pattern of the data. These models usually have high bias and low variance. It happens when we have too
little data to build an accurate model, or when we try to build a linear model with nonlinear data. Such
models are usually too simple to capture the complex patterns in the data, like linear and logistic
regression.

As a result, we will have to achieve a balance between overfitting and underfitting.
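
As a small illustration of this balance, the sketch below fits polynomials of increasing degree to noisy synthetic data: the low-degree model underfits, while the high-degree model drives training error down but held-out error up. The data, degrees and noise level are arbitrary assumptions.

import numpy as np

# Illustrative sketch: fitting polynomials of increasing degree to noisy
# synthetic data. A degree-1 model underfits, while a very high degree
# starts to fit the noise (overfits), visible as a growing gap between
# training error and held-out error.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, size=40))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)    # pattern + noise

x_train, y_train = x[::2], y[::2]                      # half for training
x_test,  y_test  = x[1::2], y[1::2]                    # half held out

for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err  = np.mean((np.polyval(coeffs, x_test)  - y_test) ** 2)
    print(degree, round(train_err, 3), round(test_err, 3))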


Understanding the Bias-Variance Tradeoff

Whenever we discuss model prediction, it's important to understand prediction errors
(bias and variance). There is a tradeoff between a model's ability to minimize bias and
variance. Gaining a proper understanding of these errors would help us not only to
build accurate models but also to avoid the mistakes of overfitting and underfitting.

What is bias?
Bias is the difference between the average prediction of our model and the correct
value which we are trying to predict. A model with high bias pays very little attention to
the training data and oversimplifies the model. It always leads to high error on training
and test data.

What is variance?
Variance is the variability of model prediction for a given data point, or a value
which tells us the spread of our data. A model with high variance pays a lot of attention to
the training data and does not generalize to data it hasn't seen before. As a
result, such models perform very well on training data but have high error rates on
test data.

If our model is too simple and has very few parameters then it may have high bias and
low variance. On the other hand, if our model has a large number of parameters then it's
going to have high variance and low bias. So we need to find the right/good balance
without overfitting and underfitting the data. This tradeoff in complexity is why there
is a tradeoff between bias and variance. An algorithm can't be more complex and less
complex at the same time.

Vanishing and Exploding Gradient Problem

When training a deep neural network with gradient-based learning and backpropagation, we find
the partial derivatives by traversing the network from the final layer back to the initial layer. Using
the chain rule, layers that are deeper in the network go through continuous matrix
multiplications in order to compute their derivatives.

In a network of n hidden layers, n derivatives will be multiplied together. If the derivatives are
large then the gradient will increase exponentially as we propagate down the model until they
eventually explode, and this is what we call the problem of exploding gradient. Alternatively,
if the derivatives are small then the gradient will decrease exponentially as we propagate through
the model until it eventually vanishes, and this is the vanishing gradient problem.
In the case of exploding gradients, the accumulation of large derivatives results in the model
being very unstable and incapable of effective learning. The large changes in the model's weights
create a very unstable network; at extreme values the weights become so large that they
cause overflow, resulting in NaN weight values that can no longer be updated. On the other
hand, the accumulation of small gradients results in a model that is incapable of learning
meaningful insights, since the weights and biases of the initial layers, which tend to learn the
core features from the input data (X), will not be updated effectively. In the worst-case scenario
the gradient will be 0, which in turn will stop the network from training further.
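
A toy sketch of this multiplication effect is given below: repeatedly multiplying a per-layer derivative factor above 1 explodes, while a factor below 1 vanishes. The layer count and factors are arbitrary assumptions.

# Sketch of why repeated multiplication of layer derivatives explodes or
# vanishes: the chain rule multiplies one factor per layer, so factors
# consistently above 1 blow up and factors below 1 shrink toward 0.
n_layers = 50

for factor in (1.5, 0.5):
    grad = 1.0
    for _ in range(n_layers):
        grad *= factor              # one derivative per hidden layer
    print(factor, grad)             # 1.5 -> ~6.4e8 (explodes), 0.5 -> ~8.9e-16 (vanishes)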

How to know?

Exploding Gradients

There are a few subtle signs you may use to determine whether your model is suffering
from the problem of exploding gradients:

• The model is not learning much on the training data therefore resulting in a poor
loss.

• The model will have large changes in loss on each update due to the model's
instability.

• The model's loss will be NaN during training.

When faced with these problems, to confirm whether the problem is due to exploding
gradients, there are some much more transparent signs, for instance:

• Model weights grow exponentially and become very large when training the
model.

• The model weights become NaN in the training phase.

• The derivatives keep growing during training.

Vanishing Gradient

There are also ways to detect whether your deep network is suffering from the vanishing
gradient problem

• The model will improve very slowly during the training phase and it is also
possible that training stops very early, meaning that any further training does
not improve the model.
• The weights closer to the output layer of the model would witness more of a
change whereas the layers that occur closer to the input layer would not change
much (if at all).

• Model weights shrink exponentially and become very small when training the
model.

• The model weights become 0 in the training phase.

Solutions

There are many approaches to addressing exploding and vanishing gradients; this section lists 3
approaches that you can use.

1. Reducing the number of layers

This solution can be used in both scenarios (exploding and vanishing gradients).
However, by reducing the number of layers in our network, we give up some of our model's
complexity, since having more layers makes the network more capable of representing
complex mappings.

2. Gradient Clipping (Exploding Gradients)

Checking for and limiting the size of the gradients whilst our model trains is another solution.

3. Weight Initialization

A more careful choice of the random weight initialization for your network tends to be a
partial solution, since it does not solve the problem completely.
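
As a rough sketch of two of these remedies, the code below clips a gradient to a maximum norm and draws weights from a variance-scaled (Xavier/Glorot-style) uniform initialization; the threshold, layer sizes and seed are arbitrary assumptions.

import numpy as np

# Sketches of two remedies above: clipping a gradient to a maximum norm,
# and a variance-scaled (Xavier/Glorot-style) weight initialization.
def clip_by_norm(grad, max_norm=1.0):
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)   # rescale so its norm equals max_norm
    return grad

def xavier_init(fan_in, fan_out, seed=0):
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

g = np.array([3.0, 4.0])                  # norm 5 -> clipped down to norm 1
print(clip_by_norm(g))                    # [0.6 0.8]
W = xavier_init(128, 64)
print(W.shape, round(float(W.std()), 3))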

Working of deep learning models

In general deep learning modelling, we formulate a problem using the neurons and layers of the
network and pair it with a loss function. The training of the model treats the weights as parameters.
When backpropagation is included in the model, the process of backpropagation starts when the
errors defined by the loss function reach a defined point.

Every iteration of training tries to get closer to that point, and at this point the error value
is minimized by updating the weights. The model includes a set of weights associated with
the loss function. The main goal of modelling is to find the minimum loss value at every
iteration and over the whole operation.

Convergence in deep learning

In simple words, we can say that convergence of a neural network is a point in training a model
after which changes in the learning rate become lower and the errors produced by the model in
training come to a minimum. We can also say that a deep learning model is in convergence
when the loss given by the model reaches its minimum. The convergence can be of two types,
either global or local. One thing that is noticeable here is that convergence should happen with
a decreasing trend. However, in a variety of modelling procedures, it is very rare to see a model
converge very strictly, but it is common to see the model converge in a convex manner.

A typical convergence curve shows that the training of the model becomes converged after about
the 20th iteration, with the errors beyond that point lower, decreasing, and within a smaller range.
From the above, we can say that convergence matters while training because it helps us decide
whether to proceed with the model or not. The rest of this section focuses on what happens when
the neural network fails to converge. Let's take a look at what failing to converge means.

Causes of convergence failure

In simple words, we can think of failure in convergence as a condition where we cannot find the
convergence point in the learning curve of a neural network. It means there is no point in the
curve that can be identified as the start of a lower, steadily decreasing error. We can understand
convergence failure by looking at such a learning curve.

In such a curve, the errors decrease as the iteration count increases, but we cannot tell from which
point the error starts varying within a smaller range, nor what the global or local minima of the
errors are. In such a situation, we can say that the neural network has failed to converge. Let's see
why this happens.

Why does a neural net fail to converge?

Most neural networks fail to converge because of an error in the modelling. Suppose the data
needs substantial transformation within the network but the nodes we have provided are far too
few in number; in such a situation, how can we expect the network to work properly? So, in the
majority of cases when the network fails to converge, the cause is inaccurate modelling. Some of
the reasons behind this are as follows:
• Not providing enough nodes may be a reason behind this issue, because models with too
few nodes would need to change their architecture drastically to model the data better,
and so fail to converge.
• The amount of training data is low, or the data we feed the model is corrupted or not
collected with integrity.
• The activation function we are using with the network often leads to good results, but if
the complexity of the problem is higher, the model can fail to converge.
• Inappropriate application of weights in the network can also cause a failure in convergence.
The weights we apply to the network should be well calculated according to the
activation function.
• The learning rate parameter we have given the network should be moderate, which means
it should be neither too large nor too small.

Remedies for convergence failure

In the above section, we discussed the reasons that can cause failure in the convergence of
neural networks. There are various things we can do to help avoid this failure. Let's
take a look at some points that can help us prevent failure in the convergence of
neural networks.

• Implementing momentum: Sometimes convergence depends on the data; if the data makes
the model produce an oscillating, comb-like error curve, implementing momentum in the
neural network can help in avoiding convergence failure and also helps in boosting the
accuracy and speed of the model (see the sketch after this list).
• Reinitialization of the weights of the network can help in avoiding the failure of
convergence.
• If the training is stuck in a local minimum and the session has exceeded the maximum
number of iterations, the session has failed and we will get a higher error. In such a
situation, starting another session can be helpful.
• A change in the activation function can be helpful. For example, if we are using a ReLU
activation and the neurons become biased, a neuron may never be activated. In such a
situation, changing to another activation function can be helpful.
• When performing classification using neural networks, we can shuffle the training data
to avoid failure in convergence.
• The learning rate and the number of epochs should be kept in proportion while modelling a
network. Too few epochs means convergence is approached in only a few small steps,
while too many epochs means a long wait before convergence appears. An excessively
high learning rate or number of epochs should be avoided if the network is to converge
reliably.
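
As referenced in the first point above, here is a minimal sketch of gradient descent with momentum; the quadratic toy loss, learning rate and momentum coefficient are illustrative assumptions, not a prescribed configuration.

import numpy as np

# Minimal sketch of gradient descent with momentum: a velocity term
# smooths out oscillating ("comb-like") gradients so the weights keep
# moving in a consistent direction. Values are illustrative.
lr, beta = 0.1, 0.9
w = np.array([1.0, -2.0])                 # current weights
velocity = np.zeros_like(w)

def grad(w):                              # hypothetical gradient of a simple quadratic loss
    return 2.0 * w

for step in range(5):
    velocity = beta * velocity - lr * grad(w)   # decaying running average of gradients
    w = w + velocity                            # move along the smoothed direction
    print(step, w)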

Neural Network business applications:


• Banking: Credit card attrition, credit and loan application evaluation, fraud and risk
evaluation, and loan delinquencies
• Business Analytics: Customer behavior modeling, customer segmentation, fraud
propensity, market research, market mix, market structure, and models for attrition,
default, purchase, and renewals
• Defense: Counterterrorism, facial recognition, feature extraction, noise suppression,
object discrimination, sensors, sonar, radar and image signal processing, signal/image
identification, target tracking, and weapon steering
• Education: Adaptive learning software, dynamic forecasting, education system
analysis and forecasting, student performance modeling, and personality profiling
• Financial: Corporate bond ratings, corporate financial analysis, credit line use
analysis, currency price prediction, loan advising, mortgage screening, real estate
appraisal, and portfolio trading
• Medical: Cancer cell analysis, ECG and EEG analysis, emergency room test
advisement, expense reduction and quality improvement for hospital systems,
transplant process optimization, and prosthesis design
• Securities: Automatic bond rating, market analysis, and stock trading advisory systems
• Transportation: Routing systems, truck brake diagnosis systems, and vehicle
scheduling

Perceptron (P):

Applications:

• Classification.
• Encode Database (Multilayer Perceptron).
• Monitor Access Data (Multilayer Perceptron).

Feed Forward (FF)

Applications:

• Data Compression.
• Pattern Recognition.
• Computer Vision.
• Sonar Target Recognition.
• Speech Recognition.
• Handwritten Characters Recognition.

Deep Feed-forward (DFF)

Applications:

• Data Compression.
• Pattern Recognition.
• Computer Vision.
• ECG Noise Filtering.
• Financial Prediction.

Recurrent Neural Network (RNN)

Applications:

• Machine Translation.
• Robot Control.
• Time Series Prediction.
• Speech Recognition.
• Speech Synthesis.
• Time Series Anomaly Detection.
• Rhythm Learning.
• Music Composition.

Long / Short Term Memory (LSTM)

Applications:

• Speech Recognition.
• Writing Recognition.

Gated Recurrent Unit (GRU)

Applications:

• Polyphonic Music Modeling.


• Speech Signal Modeling.
• Natural Language Processing.
