
Back Propagation Algorithm

• The hidden neurons act as feature detectors; as such, they play a critical
role in the operation of a multilayer perceptron.
• They do so by performing a nonlinear transformation on the input data
into a new space called the feature space.
• In this new space, the classes of interest in a pattern-classification task become more easily separable from each other than they would be in the original input data space.
• It is the formation of this feature space through supervised learning that
distinguishes the multilayer perceptron from Rosenblatt’s perceptron.
• The credit-assignment problem is the problem of assigning credit or blame for overall outcomes to
each of the internal decisions made by the hidden computational units of the distributed learning
system, recognizing that those decisions are responsible for the overall outcomes.
• In a multilayer perceptron using error-correction learning, the credit-assignment problem arises because the operation of each hidden neuron and of each output neuron in the network is important to the network's correct overall action on a learning task of interest.
• Since an output neuron is visible to the outside world, it is possible to supply a desired response to guide its behavior.
• Thus, as far as output neurons are concerned, it is a straightforward matter to adjust the synaptic
weights of each output neuron in accordance with the error-correction algorithm.
• But how do we assign credit or blame for the action of the hidden neurons when the error-
correction learning algorithm is used to adjust the respective synaptic weights of these neurons?
• In answer to this fundamental question, the back-propagation algorithm was introduced to train the multilayer perceptron; it solves the credit-assignment problem in an elegant manner.
The Back-propagation Algorithm
• The back-propagation algorithm is used for the supervised training of multilayer perceptrons.
• Consider neuron j being fed by a set of function signals produced by the layer of neurons to its left.
• The induced local field $v_j(n)$ produced at the input of the activation function associated with neuron j is

$$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n)$$
• where m is the total number of inputs (excluding the bias) applied to neuron j.
• The synaptic weight wj0 (corresponding to the fixed input y0=1) equals the
bias bj applied to neuron j.
• Hence, the function signal yj(n) appearing at the output of neuron j at iteration
n is
$$y_j(n) = \varphi_j(v_j(n))$$
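As an illustration, here is a minimal NumPy sketch of this forward computation for one layer; the sigmoid activation, the layer sizes, and the variable names are assumptions chosen for the example, not taken from the text.

```python
import numpy as np

def sigmoid(v):
    # Logistic activation phi(v); note phi'(v) = phi(v) * (1 - phi(v))
    return 1.0 / (1.0 + np.exp(-v))

# y holds the function signals from the previous layer, with y[0] = +1 as the
# fixed bias input, so W[j, 0] plays the role of the bias b_j of neuron j.
y = np.array([1.0, 0.5, -0.2, 0.8])   # (m + 1) inputs including the bias input
W = np.random.randn(3, 4) * 0.1       # one row of weights w_ji per neuron j

v = W @ y                             # induced local fields v_j(n)
out = sigmoid(v)                      # function signals y_j(n) = phi_j(v_j(n))
```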
• The back-propagation algorithm adapts the synaptic weight $w_{ji}(n)$ by an amount $\Delta w_{ji}(n)$ that is proportional to the partial derivative $\partial E(n)/\partial w_{ji}(n)$.
• According to the chain rule of calculus, we may express

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial e_j(n)}\,\frac{\partial e_j(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)}\,\frac{\partial v_j(n)}{\partial w_{ji}(n)}$$

The four factors follow from the defining relations:

$$E(n) = \frac{1}{2}\sum_j e_j^2(n) \;\Rightarrow\; \frac{\partial E(n)}{\partial e_j(n)} = e_j(n)$$

$$e_j(n) = d_j(n) - y_j(n) \;\Rightarrow\; \frac{\partial e_j(n)}{\partial y_j(n)} = -1$$

$$y_j(n) = \varphi_j(v_j(n)) \;\Rightarrow\; \frac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'(v_j(n))$$

$$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n) \;\Rightarrow\; \frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n)$$

Multiplying the four factors gives

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = -\,e_j(n)\,\varphi_j'(v_j(n))\, y_i(n)$$

The correction $\Delta w_{ji}(n)$ is applied in the direction of weight change that reduces the value of $E(n)$ (gradient descent):

$$\Delta w_{ji}(n) = -\eta\,\frac{\partial E(n)}{\partial w_{ji}(n)} = \eta\, e_j(n)\,\varphi_j'(v_j(n))\, y_i(n)$$

Equivalently,

$$\Delta w_{ji}(n) = \eta\,\delta_j(n)\, y_i(n), \qquad w_{ji}(n+1) = w_{ji}(n) + \Delta w_{ji}(n)$$

where the local gradient is

$$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)} = -\frac{\partial E(n)}{\partial e_j(n)}\,\frac{\partial e_j(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)} = e_j(n)\,\varphi_j'(v_j(n))$$
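The update rule above translates directly into code. The sketch below is a minimal illustration, assuming a sigmoid activation (so that $\varphi_j'(v_j) = y_j(1 - y_j)$), an assumed learning rate, and made-up data; it applies the delta rule to the weights feeding a layer of output neurons.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

eta = 0.1                               # learning rate eta, assumed value
y_in = np.array([1.0, 0.5, -0.2, 0.8])  # input signals y_i(n), with y_in[0] = +1 as bias
W = np.random.randn(3, 4) * 0.1         # weights w_ji feeding the output neurons
d = np.array([1.0, 0.0, 1.0])           # desired responses d_j(n)

v = W @ y_in                            # induced local fields v_j(n)
y_out = sigmoid(v)                      # outputs y_j(n)

e = d - y_out                           # error signals e_j(n) = d_j(n) - y_j(n)
delta = e * y_out * (1.0 - y_out)       # local gradients delta_j(n) = e_j(n) * phi_j'(v_j(n))
W += eta * np.outer(delta, y_in)        # w_ji(n+1) = w_ji(n) + eta * delta_j(n) * y_i(n)
```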
• A key factor involved in the calculation of the weight adjustment $\Delta w_{ji}(n)$ is the error signal $e_j(n)$ at the output of neuron j.
• In this context, we may identify two distinct cases, depending on
where in the network neuron j is located.
• Case 1: neuron j is an output node. This case is simple because each
output node of the network is supplied with a desired response of its
own, making it a straightforward matter to calculate the associated
error signal.
• Case 2: neuron j is a hidden node. Even though hidden neurons are not directly accessible, they share responsibility for any error made at the output of the network.
Neuron j Is a Hidden Node
• When neuron j is located in a hidden layer of the network, there is no
specified desired response for that neuron.
• The error signal for a hidden neuron must therefore be determined recursively, working backwards in terms of the error signals of all the neurons to which that hidden neuron is directly connected.
• We may redefine the local gradient $\delta_j(n)$ for hidden neuron j as

$$\delta_j(n) = -\frac{\partial E(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)} = -\frac{\partial E(n)}{\partial y_j(n)}\,\varphi_j'(v_j(n))$$

using, as before, $y_j(n) = \varphi_j(v_j(n))$ and hence $\partial y_j(n)/\partial v_j(n) = \varphi_j'(v_j(n))$. (For an output neuron this reduces to $\delta_j(n) = e_j(n)\,\varphi_j'(v_j(n))$.)

To evaluate $\partial E(n)/\partial y_j(n)$, note that the error energy is summed over the output neurons, indexed by k:

$$E(n) = \frac{1}{2}\sum_k e_k^2(n) \;\Rightarrow\; \frac{\partial E(n)}{\partial y_j(n)} = \sum_k e_k(n)\,\frac{\partial e_k(n)}{\partial y_j(n)}$$

For each output neuron k, $e_k(n) = d_k(n) - \varphi_k(v_k(n))$ and $v_k(n) = \sum_j w_{kj}(n)\, y_j(n)$, so

$$\frac{\partial e_k(n)}{\partial y_j(n)} = -\varphi_k'(v_k(n))\, w_{kj}(n)$$

Writing the local gradient of output neuron k as $\delta_k(n) = e_k(n)\,\varphi_k'(v_k(n))$, we obtain

$$\frac{\partial E(n)}{\partial y_j(n)} = -\sum_k \delta_k(n)\, w_{kj}(n)$$

and therefore the back-propagation formula for the local gradient of hidden neuron j:

$$\delta_j(n) = \varphi_j'(v_j(n))\,\sum_k \delta_k(n)\, w_{kj}(n)$$
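Putting the two cases together, the following sketch (an illustration under assumed sizes, learning rate, data, and sigmoid activations, not code from the text) runs one back-propagation step for a tiny two-layer network: the output-layer local gradients use the error signals directly, while the hidden-layer local gradients are back-propagated through the weights $w_{kj}$.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

eta = 0.1                              # learning rate, assumed

x = np.array([1.0, 0.3, -0.7])         # input signals, with x[0] = +1 as the bias input
d = np.array([1.0, 0.0])               # desired responses at the output layer

W1 = np.random.randn(4, 3) * 0.1       # hidden-layer weights w_ji
W2 = np.random.randn(2, 4) * 0.1       # output-layer weights w_kj

# Forward pass
y_hidden = sigmoid(W1 @ x)             # hidden function signals y_j(n)
y_out = sigmoid(W2 @ y_hidden)         # output function signals y_k(n)

# Backward pass
e = d - y_out                          # output error signals e_k(n)
delta_out = e * y_out * (1.0 - y_out)  # delta_k(n) = e_k(n) * phi_k'(v_k(n))
# delta_j(n) = phi_j'(v_j(n)) * sum_k delta_k(n) * w_kj(n)
delta_hidden = (W2.T @ delta_out) * y_hidden * (1.0 - y_hidden)

# Weight updates: w(n+1) = w(n) + eta * (local gradient) * (signal feeding the weight)
W2 += eta * np.outer(delta_out, y_hidden)
W1 += eta * np.outer(delta_hidden, x)
```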
