Back Propagation Algorithm
Dr. N. Herald Anantha Rufus
Associate Professor/ECE
What is a Neural Network?
The term neural network was
traditionally used to refer to a
network or circuit of biological
neurons. The modern usage of the
term often refers to artificial
neural networks, which are
composed of artificial neurons or
nodes.
In the field of artificial intelligence,
artificial neural networks have been
applied successfully to tasks such as
speech recognition and image analysis,
and to the construction of
software agents and
autonomous robots.
Neural networks resemble the human brain in the
following two ways:
A neural network acquires
knowledge through learning.
A neural network's knowledge is
stored within inter-neuron
connection strengths known as
synaptic weights.
How Does a Multi-Layer Neural Network
Work?
The inputs to the network correspond to the attributes measured
for each training tuple
Inputs are fed simultaneously into the units making up the input
layer
They are then weighted and fed simultaneously to a hidden layer
The number of hidden layers is arbitrary, although usually only
one is used
The weighted outputs of the last hidden layer are input to units
making up the output layer, which emits the network's prediction
The network is feed-forward in that none of the weights cycles
back to an input unit or to an output unit of a previous layer
From a statistical point of view, networks perform nonlinear
regression: Given enough hidden units and enough training
samples, they can closely approximate any function
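To make this flow concrete, here is a minimal sketch of one feed-forward pass in Python/NumPy; the layer sizes, initialization scale, and sigmoid activation are illustrative assumptions, not prescribed by these slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 input attributes, 4 hidden units, 2 output units.
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(3, 4))    # input-to-hidden weights
W_o = rng.normal(scale=0.1, size=(4, 2))    # hidden-to-output weights

x = np.array([0.2, 0.7, 0.1])               # attributes of one training tuple
hidden = sigmoid(x @ W_h)                   # weighted inputs fed to the hidden layer
output = sigmoid(hidden @ W_o)              # the network's prediction
print(output)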
Back propagation algorithm
Backpropagation: A neural network learning algorithm
Started by psychologists and neurobiologists to develop
and test computational analogues of neurons
A neural network: A set of connected input/output units
where each connection has a weight associated with it
During the learning phase, the network learns by
adjusting the weights so as to be able to predict the
correct class label of the input tuples
Also referred to as connectionist learning due to the
connections between units
Contd..
Iteratively process a set of training tuples & compare the
network's prediction with the actual known target value
For each training tuple, the weights are modified to minimize
the mean squared error between the network's prediction and
the actual target value
Modifications are made in the “backwards” direction: from the
output layer, through each hidden layer down to the first hidden
layer, hence “backpropagation”
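As a small concrete check of the quantity being minimized, the mean squared error for one tuple can be computed as below (the prediction and target values are made up):

import numpy as np

prediction = np.array([0.8, 0.1])           # the network's prediction
target = np.array([1.0, 0.0])               # the actual known target value
mse = np.mean((target - prediction) ** 2)
print(mse)                                  # 0.025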
Steps
Initialize weights (to small random #s) and biases in the
network
Propagate the inputs forward (by applying an activation function)
Backpropagate the error (by updating weights and biases)
Terminating condition (when error is very small, etc.)
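These four steps map directly onto a training loop. Below is a minimal runnable sketch for a single sigmoid unit; the training tuple, learning rate, and error threshold are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(scale=0.01, size=2)          # 1. small random weights
b = 0.0                                     #    ...and bias
x, t = np.array([1.0, 0.5]), 1.0            # one training tuple and its target
lr = 0.5

for epoch in range(10000):
    o = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # 2. propagate the inputs forward
    err = t - o
    delta = err * o * (1 - o)               # 3. backpropagate the error...
    w += lr * delta * x                     #    ...by updating the weights
    b += lr * delta                         #    ...and the bias
    if abs(err) < 0.05:                     # 4. terminating condition (error is small)
        break
print(o, epoch)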
Contd..
Efficiency of backpropagation: each epoch (one iteration
through the training set) takes O(|D| * w) time, with |D| tuples
and w weights, but the number of epochs can be exponential in n,
the number of inputs, in the worst case
Rule extraction from networks: network pruning
Simplify the network structure by removing weighted links
that have the least effect on the trained network
Then perform link, unit, or activation value clustering
The set of input and activation values is studied to derive
rules describing the relationship between the input and
hidden unit layers
Sensitivity analysis: assess the impact that a given input
variable has on a network output. The knowledge gained
from this analysis can be represented in rules
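A simple way to carry out such a sensitivity analysis is to perturb one input at a time and observe the change in the output. In this sketch, the trained network is stood in for by a fixed function net with made-up weights:

import numpy as np

def net(x):
    # Stand-in for a trained network; these weights are made up for illustration.
    w = np.array([2.0, -0.5, 0.1])
    return 1.0 / (1.0 + np.exp(-(w @ x)))

x = np.array([0.3, 0.6, 0.9])
base = net(x)
for i in range(len(x)):
    nudged = x.copy()
    nudged[i] += 0.01                       # perturb one input variable
    print(i, (net(nudged) - base) / 0.01)   # approximate sensitivity of the output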
Two phases: propagation and weight update.
Phase 1: Propagation
Each propagation involves the following steps:
Forward propagation of a training pattern's input
through the neural network, in order to generate
the network's output activations.
Backward propagation of those output activations
through the neural network, using the training
pattern's target, in order to generate the deltas
of all output and hidden neurons.
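Assuming a sigmoid activation and squared-error loss (the slides do not fix either), phase 1 can be sketched as follows, with made-up weights and inputs:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation of one training pattern's input (values made up).
x = np.array([0.5, 0.9])
W_h = np.array([[0.1, 0.4], [0.8, 0.6]])    # input-to-hidden weights
W_o = np.array([[0.3], [0.9]])              # hidden-to-output weights
h = sigmoid(x @ W_h)                        # hidden activations
o = sigmoid(h @ W_o)                        # output activations

# Backward propagation of deltas, given the training pattern's target t.
t = np.array([1.0])
delta_o = (t - o) * o * (1 - o)             # deltas of the output neurons
delta_h = (delta_o @ W_o.T) * h * (1 - h)   # deltas of the hidden neurons
print(delta_o, delta_h)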
Contd..
Phase 2: Weight update
For each weight (synapse):
Multiply its output delta and input activation to
get the gradient of the weight.
Move the weight in the opposite direction of the
gradient by subtracting a fraction of the gradient
from the weight.
This fraction influences the speed and quality of
learning; it is called the learning rate. The sign
of a weight's gradient indicates where the error
is increasing, which is why the weight must be
updated in the opposite direction.
Repeat phases 1 and 2 until the performance
of the network is good enough.
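Continuing the sigmoid and squared-error conventions of the phase 1 sketch, the update for one weight matrix could look like this (the learning rate value and all activations are illustrative):

import numpy as np

learning_rate = 0.5                         # illustrative choice
h = np.array([0.6, 0.7])                    # input activations into the layer (made up)
delta_o = np.array([0.05])                  # output deltas from phase 1 (made up)
W_o = np.array([[0.3], [0.9]])              # hidden-to-output weights

# With delta defined as (target - output) * o * (1 - o), as in the phase 1
# sketch, the gradient of the squared error is minus (input activation x delta).
gradient = -np.outer(h, delta_o)
W_o = W_o - learning_rate * gradient        # step opposite to the gradient
print(W_o)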
Actual algorithm for a 3-layer network (only
one hidden layer):
Initialize the weights in the network (often randomly)
Do
  For each example e in the training set
    O = neural-net-output(network, e)   ; forward pass
    T = teacher output for e
    Calculate error (T - O) at the output units
    Compute delta_wh for all weights from hidden layer to output layer   ; backward pass
    Compute delta_wi for all weights from input layer to hidden layer    ; backward pass, continued
    Update the weights in the network
Until all examples classified correctly or stopping criterion satisfied
Return the network
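A minimal runnable rendering of this pseudocode in Python/NumPy might look as follows; the sigmoid activation, squared-error deltas, learning rate, epoch limit, and the XOR training set are all illustrative assumptions, not part of the original algorithm statement:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # training examples e
T = np.array([[0], [1], [1], [0]], dtype=float)               # teacher outputs

rng = np.random.default_rng(0)
W_h = rng.normal(size=(2, 4))               # initialize input-to-hidden weights randomly
b_h = np.zeros(4)
W_o = rng.normal(size=(4, 1))               # initialize hidden-to-output weights randomly
b_o = np.zeros(1)
lr = 0.5

for epoch in range(20000):                  # Do ... Until stopping criterion satisfied
    for x, t in zip(X, T):
        h = sigmoid(x @ W_h + b_h)                    # forward pass
        o = sigmoid(h @ W_o + b_o)                    # O = neural-net-output(network, e)
        delta_o = (t - o) * o * (1 - o)               # from error (T - O) at the outputs
        delta_h = (delta_o @ W_o.T) * h * (1 - h)
        W_o += lr * np.outer(h, delta_o)              # delta_wh: hidden-to-output updates
        b_o += lr * delta_o
        W_h += lr * np.outer(x, delta_h)              # delta_wi: input-to-hidden updates
        b_h += lr * delta_h
    out = sigmoid(sigmoid(X @ W_h + b_h) @ W_o + b_o)
    if np.all(np.abs(T - out) < 0.1):                 # all examples classified correctly
        break
print(out.round(2))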
Weakness
Long training time
Requires a number of parameters that are typically best
determined empirically, e.g., the network topology or
"structure"
Poor interpretability: difficult to interpret the symbolic
meaning behind the learned weights and the "hidden
units" in the network
Strength
High tolerance to noisy data
Ability to classify patterns on which it has not been trained
Well-suited for continuous-valued inputs and outputs
Successful on a wide array of real-world data
Algorithms are inherently parallel
Techniques have recently been developed for the
extraction of rules from trained neural networks
Thank You!!!