Neural Networks


Artificial Neural Networks (ANNs)

A neural network is a type of machine learning model inspired by the natural neural network of the human nervous system.
The inventor of the first neurocomputer, Dr. Robert Hecht-Nielsen, defines a neural network as:

"...a computing system made up of a number of simple, highly


interconnected processing elements, which process information
by their dynamic state response to external inputs.”
Human Nervous System

The human brain is composed of more than a billion nerve cells called neurons. Each is connected to thousands of other cells by axons. Stimuli from the external environment, or inputs from sensory organs, are accepted by dendrites. These inputs create electric impulses, which quickly travel through the neural network. A neuron can then either pass the message on to other neurons or stop it from going forward.
Basically, a neuron receives signals from other neurons (dendrites), processes them like a CPU (soma, or cell body), and passes the output through a cable-like structure (axon) to a synapse (the point of connection to other neurons) and on to another neuron's dendrites.
Basic Structure of ANN
A neural network is composed of layers of interconnected nodes. The input layer
receives the raw input data, and the output layer produces the final prediction or
decision based on the computations performed by the hidden layers. The hidden
layers perform intermediate computations on the data and can vary in number
and size depending on the complexity of the task and the desired network
architecture.
Each link is associated with a weight. ANNs are capable of learning, which takes place by altering the weight values. This basic ANN is also known as a Feed-Forward Neural Network (FNN) because inputs are processed only in the forward direction.
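
As a rough illustration of this layered structure, the following NumPy sketch passes an input vector through one hidden layer and an output layer; the layer sizes, random weights, and sigmoid activation are illustrative assumptions rather than anything prescribed above.

import numpy as np

def sigmoid(z):
    # squashes values into (0, 1); used here as the activation
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# assumed sizes: 3 inputs, 4 hidden neurons, 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden -> output

x = np.array([0.5, -1.0, 2.0])                  # raw input data

h = sigmoid(W1 @ x + b1)     # hidden layer: intermediate computation
y = sigmoid(W2 @ h + b2)     # output layer: final prediction
print(y)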
Biological Neuron vs Artificial Neuron
Parts of an artificial neuron
• The neuron receives an input (x), which is usually a number or a vector of numbers.
• Each input is multiplied by a weight (w), which describes the strength of the connection. The weights are learnable parameters (basically adjustable numbers that can be tuned to find an optimal output).
• The bias (b) is a parameter associated with each node in a neural network. It is used to adjust the output of the node, even when the input is zero. The bias allows the model to fit the data better and make more accurate predictions.
• The linear function is the sum of the weighted inputs plus the bias.
• The activation function (f) is applied to this sum to introduce non-linearity into the model, which lets the neural network learn more complex patterns in the data. A minimal sketch of a single neuron follows this list.
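
Putting these parts together, here is a minimal sketch of a single artificial neuron; the particular input, weight, and bias values, and the choice of sigmoid for f, are illustrative assumptions.

import numpy as np

def sigmoid(z):
    # activation function f: introduces non-linearity
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5, -0.2])    # input vector
w = np.array([0.4, -0.6, 0.9])    # weights: learnable parameters
b = 0.1                           # bias: shifts the output even for zero input

linear = np.dot(w, x) + b         # the linear function: weighted sum plus bias
output = sigmoid(linear)          # f(linear): the neuron's output
print(output)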
Structure of Neuron Cell
• A neural network is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem.
• As illustrated in the following figure, a neuron i consists of a set of n connecting links that are characterized by weights w_ij.
• Each input signal x_j applied to link j is multiplied by its corresponding weight w_ij and transmitted to neuron i. These link products are accumulated by the adder, as expressed by the formula:

u_i = Σ_{j=1}^{n} w_ij · x_j

• An activation function g(·) then provides the output y_i of the unit as:

y_i = g(u_i)
Activation Function Types
• There are several types of activation functions, such as:

Activation Function    Formula
Sign Function          g(x) = 0 if x < 0;  1 if x > 0
Sigmoid Function       g(x) = 1 / (1 + e^(−βx))
Pulse Function         g(x) = 1 if a < x < b;  0 otherwise
Gaussian Function      g(x) = e^(−(x − c)² / (2σ²))

[Figure: plots of some of the activation functions]
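For concreteness, the four functions from the table can be sketched in NumPy as follows; the parameter names (beta, a, b, c, sigma) mirror the formulas above, and their default values are arbitrary assumptions.

import numpy as np

def sign_fn(x):
    # step function: 0 for x < 0, 1 otherwise
    return np.where(x < 0, 0.0, 1.0)

def sigmoid(x, beta=1.0):
    # smooth S-curve: 1 / (1 + e^(-beta * x))
    return 1.0 / (1.0 + np.exp(-beta * x))

def pulse(x, a=-1.0, b=1.0):
    # 1 inside the interval (a, b), 0 otherwise
    return np.where((x > a) & (x < b), 1.0, 0.0)

def gaussian(x, c=0.0, sigma=1.0):
    # bell curve centered at c with width sigma
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

x = np.linspace(-3.0, 3.0, 7)
print(sign_fn(x), sigmoid(x), pulse(x), gaussian(x), sep="\n")
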
The Learning Methods For Neural Networks
Two different types of learning methods have been constructed to give neural networks the ability to adjust themselves intelligently.

1. Supervised Learning
In this type of learning, data are presented together with teacher information, in order to associate the data with the teacher signal.
Supervised learning algorithms adjust the weights using input-output data so that the input-output characteristics of the network match the desired characteristics. One of the most popular of these algorithms is the back-propagation learning algorithm.

2. Unsupervised Learning
The neural network is presented with data without any teacher information. This type of learning is often used for data clustering and data analysis.
Neural networks that use unsupervised learning exploit the redundancy in the data in order to build up clusters or feature maps based on a similarity distance. The K-Means clustering algorithm is one example of an unsupervised learning algorithm; a minimal sketch follows.
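
A minimal sketch of K-Means in NumPy; the toy 2-D data, k = 3 clusters, and fixed iteration budget are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(size=(100, 2))       # toy 2-D data (assumption)
k = 3                                    # number of clusters (assumption)

# initialize centroids by picking k distinct random points
centroids = points[rng.choice(len(points), size=k, replace=False)]

for _ in range(10):                      # fixed iteration budget
    # assignment step: label each point with its nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # update step: move each centroid to the mean of its assigned points
    for j in range(k):
        if np.any(labels == j):
            centroids[j] = points[labels == j].mean(axis=0)

print(centroids)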
Back-propagation (Bp) Algorithm
It is the training or learning algorithm for the network, and it learns by example: if you submit to the algorithm examples of what you want the network to do, it changes the network's weights so that, once training finishes, the network produces the desired output for a particular input.
In order to train a neural network to perform some task, the algorithm must adjust the weights of each unit in such a way that the error between the desired output and the actual output is reduced.

This process requires that the neural network compute the derivative of the error with respect to the weights. In other words, the Bp algorithm must calculate how the error changes as each weight is increased or decreased. It looks for the minimum of the error function in weight space using the method of gradient descent learning.
The Bp network undergoes supervised training with a finite
number of pattern pairs. Each one of these pairs consists of an
input pattern and a desired or target output pattern.
The following figure explains the general structure of the Bp.
[Figure: The structure of Back-propagation, where X is the input vector, Y is the output vector, and H is the hidden layers.]

The learning process of the Bp algorithm involves two steps:

1. Feed forward
Enter each one of the pattern pairs (input, output) and compute the actual output.

2. Feed backward
Adjust all the weights depending on the difference between the desired output and the actual output.

This process is repeated as many times as needed until the error between the desired and the actual outputs reaches its minimum value.
The gradient descent technique
This is based on the minimization of the error E, defined in terms of the weights and the activation function of the network.
• The activation function of the network is required to be differentiable, because the weight updates depend on the gradient of the error E.
• If ∆w_ij is the weight update of the link connecting the i-th and j-th neurons of two neighboring layers, then ∆w_ij is defined as:

∆w_ij = −σ (∂E / ∂w_ij)

where σ is the learning rate parameter and ∂E/∂w_ij is the error gradient with respect to the weight w_ij. A small end-to-end sketch of these updates follows.
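
As a hedged end-to-end sketch, the two Bp steps and the update rule ∆w_ij = −σ (∂E/∂w_ij) can be applied to a tiny one-layer network; the network size, the sigmoid activation, the OR-like toy pattern pairs, and the learning rate value are all illustrative assumptions (σ is written as lr here).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(1, 2))              # weights w_ij (1 output, 2 inputs)
b = np.zeros(1)                          # bias
lr = 0.5                                 # learning rate (sigma in the text)

# toy pattern pairs: input patterns X and target outputs T
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [1]], dtype=float)

for epoch in range(1000):
    for x, t in zip(X, T):
        o = sigmoid(W @ x + b)           # feed forward: actual output
        # feed backward: gradient of E = 0.5 * (t - o)^2 w.r.t. the net input
        delta = (o - t) * o * (1 - o)
        W -= lr * np.outer(delta, x)     # delta_w = -lr * dE/dw
        b -= lr * delta

print(sigmoid(X @ W.T + b).round(2))     # trained outputs for all patterns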
The Back-propagation (BP)
The Bp algorithm is based on the gradient descent technique for solving an optimization problem, which involves the minimization of the network's cumulative error E, defined as:

E = Σ_{k=1}^{n} E(k)

where n is the number of training patterns presented to the network for training purposes. E(k) represents the vectorial difference between the target output and the actual output vectors of the network, defined as:

E(k) = ½ Σ_{i=1}^{q} (t_i(k) − o_i(k))²

where t_i(k) is the i-th component of the target output vector and o_i(k) is the i-th component of the actual output vector.

So, the minimization of the network error becomes:

min E = min ½ Σ_{k=1}^{n} Σ_{i=1}^{q} (t_i(k) − o_i(k))²

where the index i represents the i-th neuron of the output layer, which is composed of a total number of q neurons.
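
A small sketch of computing this cumulative error for a batch of target and actual output vectors; the array shapes and values are toy assumptions (n = 3 patterns, q = 2 output neurons).

import numpy as np

targets = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # t_i(k)
outputs = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.9]])   # o_i(k)

# E(k) = 0.5 * sum_i (t_i(k) - o_i(k))^2, one value per training pattern
E_k = 0.5 * np.sum((targets - outputs) ** 2, axis=1)

E = E_k.sum()            # cumulative error: sum over the n patterns
print(E_k, E)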
Common Challenges When Training Neural Networks
Overfitting: It occurs when the neural network becomes too complex and fits the training data too closely, resulting in poor generalization to new data. This can be addressed by using techniques such as early stopping, regularization, and dropout; a minimal early-stopping sketch follows.
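A minimal sketch of the early-stopping idea, where train_epoch and evaluate are hypothetical stand-ins for a real training step and validation pass (not part of any particular library):

import random

def train_epoch():
    pass                                 # placeholder: update the weights here

def evaluate():
    return random.random()               # placeholder: return validation loss

patience, best, wait = 5, float("inf"), 0
for epoch in range(100):
    train_epoch()
    val_loss = evaluate()
    if val_loss < best:                  # validation improved: reset counter
        best, wait = val_loss, 0
    else:
        wait += 1                        # no improvement this epoch
        if wait >= patience:             # stop before the network overfits
            print(f"stopping at epoch {epoch}; best val loss {best:.3f}")
            break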

Underfitting: It occurs when the neural network is too simple to capture the complexity of the data, resulting in poor performance on both the training and test data. This can be addressed by increasing the complexity of the network, adding more layers, or adjusting the hyperparameters.
Vanishing gradients: These occur when the gradients become too small during backpropagation, making it difficult to update the weights of the network. This can be addressed by using activation functions such as ReLU.

Exploding gradients: These occur when the gradients become too large during backpropagation, causing the weights of the network to update too much and leading to unstable training. This can be addressed by using techniques such as gradient clipping.

Hyperparameter tuning: Neural networks have many hyperparameters that need to be tuned to achieve optimal performance. Finding the optimal values for these hyperparameters can be time-consuming and require extensive experimentation.

Gradient Clipping
Gradient clipping is a technique that tackles exploding gradients. The idea of gradient clipping is very simple: if the gradient gets too large, we rescale it to keep it small. More precisely, if ‖g‖ ≥ c, then

g ← (c / ‖g‖) · g

where c is a hyperparameter, g is the gradient, and ‖g‖ is the norm of g. Since g/‖g‖ is a unit vector, after rescaling the new g will have norm c. Note that if ‖g‖ < c, then we don't need to do anything.
Example
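
A minimal sketch of the clipping rule above; the threshold c = 1.0 and the example gradient are illustrative assumptions.

import numpy as np

def clip_gradient(g, c):
    # if ||g|| >= c, rescale g to have norm exactly c; otherwise leave it alone
    norm = np.linalg.norm(g)
    if norm >= c:
        return (c / norm) * g
    return g

g = np.array([3.0, 4.0])                 # example gradient with norm 5
clipped = clip_gradient(g, c=1.0)
print(clipped, np.linalg.norm(clipped))  # new norm is c = 1.0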
