Back Propagation Algorithm
• The hidden neurons act as feature detectors; as such, they play a critical
role in the operation of a multilayer perceptron.
• They do so by performing a nonlinear transformation of the input data
into a new space, called the feature space.
• In this new space, the classes of interest in a pattern-classification task
become more easily separable from each other than they are in the
original input-data space (a small sketch of this idea follows below).
• It is the formation of this feature space through supervised learning that
distinguishes the multilayer perceptron from Rosenblatt’s perceptron.
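• To make the feature-space idea concrete, here is a minimal NumPy sketch (not from the original material) using the classic XOR problem: the hidden-layer weights and biases are hand-picked illustrative values rather than learned ones, and a threshold activation stands in for a generic nonlinearity. In the resulting two-dimensional feature space the two XOR classes become linearly separable.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return (z >= 0).astype(float)

# XOR inputs and target classes (not linearly separable in the input space).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 0], dtype=float)

# Hand-picked hidden-layer parameters (illustrative, not learned).
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Nonlinear transformation of the input data into the feature space.
H = step(X @ W_hidden.T + b_hidden)

# In the feature space a single line, h1 - h2 >= 0.5, separates the classes.
y = step(H @ np.array([1.0, -1.0]) - 0.5)

print("feature-space coordinates:\n", H)
print("linear separation in feature space matches XOR targets:", np.array_equal(y, d))
```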
• The credit-assignment problem is the problem of assigning credit or blame for overall outcomes to
each of the internal decisions made by the hidden computational units of the distributed learning
system, recognizing that those decisions are responsible for the overall outcomes.
• In a multilayer perceptron using error-correction learning, the credit-assignment problem arises
because the operation of each hidden neuron and of each output neuron in the network is important
to the network’s correct overall action on a learning task of interest.
• Because an output neuron is visible to the outside world, it is possible to supply a desired response to
guide the behavior of such a neuron.
• Thus, as far as output neurons are concerned, it is a straightforward matter to adjust the synaptic
weights of each output neuron in accordance with the error-correction algorithm.
• But how do we assign credit or blame for the action of the hidden neurons when the error-
correction learning algorithm is used to adjust the respective synaptic weights of these neurons?
• As the answer to this fundamental question, the back-propagation algorithm was introduced to train
the multilayer perceptron; it solves the credit-assignment problem in an elegant manner.
The Back-propagation Algorithm
• Supervised training of multilayer perceptrons
• Consider neuron j being fed by a set of function signals produced by the layer of neurons to its left.
• The induced local field vj(n) produced at the input of the activation function associated with neuron j is
$$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$$

• Differentiating each quantity in the chain from the error energy $E(n)$ down to the weight $w_{ji}(n)$:

$$E(n) = \frac{1}{2}\sum_{j} e_j^2(n) \qquad\Rightarrow\qquad \frac{\partial E(n)}{\partial e_j(n)} = e_j(n)$$

$$e_j(n) = d_j(n) - y_j(n) \qquad\Rightarrow\qquad \frac{\partial e_j(n)}{\partial y_j(n)} = -1$$

$$y_j(n) = \varphi_j\big(v_j(n)\big) \qquad\Rightarrow\qquad \frac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'\big(v_j(n)\big)$$

$$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n) \qquad\Rightarrow\qquad \frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n)$$

• The weight correction is applied in the direction that reduces the value of $E(n)$:

$$\Delta w_{ji}(n) = -\eta\,\frac{\partial E(n)}{\partial w_{ji}(n)}$$

• The local gradient:

$$\delta_j(n) = -\frac{\partial E(n)}{\partial e_j(n)}\,\frac{\partial e_j(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)} = e_j(n)\,\varphi_j'\big(v_j(n)\big)$$
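• Combining the partial derivatives above gives the standard delta rule $\Delta w_{ji}(n) = \eta\,\delta_j(n)\,y_i(n)$. The NumPy sketch below illustrates one such update for a single output neuron; the logistic activation, the learning rate, and the example values are illustrative assumptions, not taken from the original material.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_prime(v):
    s = sigmoid(v)
    return s * (1.0 - s)

eta = 0.1                                # learning rate (assumed value)
y_i = np.array([1.0, 0.2, -0.4])         # signals feeding neuron j (index 0 is the bias input, +1)
w_ji = np.array([0.05, -0.3, 0.8])       # synaptic weights of neuron j

v_j = w_ji @ y_i                         # induced local field v_j(n)
y_j = sigmoid(v_j)                       # output y_j(n) = phi_j(v_j(n))
d_j = 1.0                                # desired response supplied to the output neuron
e_j = d_j - y_j                          # error signal e_j(n)

delta_j = e_j * sigmoid_prime(v_j)       # local gradient delta_j(n) = e_j(n) * phi'_j(v_j(n))
w_ji += eta * delta_j * y_i              # delta rule: w_ji(n+1) = w_ji(n) + eta * delta_j(n) * y_i(n)
```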
• A key factor involved in the calculation of the weight adjustment
$\Delta w_{ji}(n)$ is the error signal $e_j(n)$ at the output of neuron j.
• In this context, we may identify two distinct cases, depending on
where in the network neuron j is located.
• Case 1: neuron j is an output node. This case is simple because each
output node of the network is supplied with a desired response of its
own, making it a straightforward matter to calculate the associated
error signal.
• Case 2: neuron j is a hidden node. Even though hidden neurons are not
directly accessible, they share responsibility for any error made at the
output of the network.
Neuron j Is a Hidden Node
• When neuron j is located in a hidden layer of the network, there is no
specified desired response for that neuron.
• The error signal for a hidden neuron has to be determined recursively,
working backwards in terms of the error signals of all the neurons to
which that hidden neuron is directly connected.
• We may redefine the local gradient $\delta_j(n)$ for hidden neuron j as

$$\delta_j(n) = -\frac{\partial E(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)} = -\frac{\partial E(n)}{\partial y_j(n)}\,\varphi_j'\big(v_j(n)\big), \qquad y_j(n) = \varphi_j\big(v_j(n)\big)$$

• To evaluate the partial derivative $\partial E(n)/\partial y_j(n)$, take the error energy over the output neurons k to which hidden neuron j is directly connected:

$$E(n) = \frac{1}{2}\sum_{k} e_k^2(n) \qquad\Rightarrow\qquad \frac{\partial E(n)}{\partial y_j(n)} = \sum_{k} \frac{\partial E(n)}{\partial e_k(n)}\,\frac{\partial e_k(n)}{\partial y_j(n)} = \sum_{k} e_k(n)\,\frac{\partial e_k(n)}{\partial y_j(n)}$$

• where each output neuron k has its own local gradient

$$\delta_k(n) = e_k(n)\,\varphi_k'\big(v_k(n)\big)$$
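• Carrying the derivation one step further, using $\partial e_k(n)/\partial y_j(n) = -\varphi_k'\big(v_k(n)\big)\,w_{kj}(n)$, yields the standard back-propagation formula for a hidden neuron, $\delta_j(n) = \varphi_j'\big(v_j(n)\big)\sum_k \delta_k(n)\,w_{kj}(n)$. The NumPy sketch below illustrates this backward pass for a single hidden layer; the layer sizes, tanh activation, and variable names are illustrative assumptions, not part of the original material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3 inputs, 4 hidden neurons, 2 output neurons.
W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)    # hidden-layer weights w_ji
W2 = rng.normal(size=(2, 4)); b2 = np.zeros(2)    # output-layer weights w_kj
eta = 0.1                                         # learning rate (assumed value)

x = np.array([0.5, -1.0, 2.0])                    # input pattern
d = np.array([1.0, 0.0])                          # desired responses of the output neurons

# Forward pass (tanh used as the activation phi).
v1 = W1 @ x + b1;  y1 = np.tanh(v1)               # hidden layer: v_j(n), y_j(n)
v2 = W2 @ y1 + b2; y2 = np.tanh(v2)               # output layer: v_k(n), y_k(n)

# Backward pass.
e = d - y2                                        # error signals e_k(n) at the output nodes
delta_out = e * (1.0 - np.tanh(v2) ** 2)          # delta_k(n) = e_k(n) * phi'_k(v_k(n))
delta_hidden = (1.0 - np.tanh(v1) ** 2) * (W2.T @ delta_out)
#                ^ phi'_j(v_j(n))          ^ sum over k of delta_k(n) * w_kj(n)

# Weight adjustments: Delta w = eta * (local gradient) * (signal feeding the layer).
W2 += eta * np.outer(delta_out, y1)
W1 += eta * np.outer(delta_hidden, x)
```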