Mod 2.1, 2.2
Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY
[Figure: accuracy vs. amount of data, comparing deep learning with conventional machine learning.]
[Figure: a single-layer network with input nodes x1–x5, weights w1–w5, and a summation output node computing y; a second variant adds a bias neuron (+1) connected to the output with weight b.]
[Figure: output layer of a network producing y from inputs x3, x4, x5.]
The backpropagation algorithm contains two main phases, referred to as the forward and
backward phases, respectively.
1. Forward phase: In this phase, the inputs for a training instance are fed into the neural
network. This results in a forward cascade of computations across the layers, using the
current set of weights. The final predicted output is compared to that of the training
instance, and the derivative of the loss function with respect to the output is computed. The
derivative of this loss must then be computed with respect to the weights in all layers during
the backward phase.
2. Backward phase: The main goal of the backward phase is to learn the gradient of the loss
function with respect to the different weights by using the chain rule of differential
calculus. These gradients are used to update the weights. Since these gradients are learned
in the backward direction, starting from the output node, this learning process is referred to
as the backward phase (a code sketch of both phases follows this list).
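As a minimal sketch of the two phases, assume a tiny two-layer network with a sigmoid hidden layer, a linear output unit, and squared loss; the layer sizes, data, and learning rate below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes: 3 inputs, 4 hidden units, 1 output (assumed).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))    # hidden-layer weights
W2 = rng.normal(size=(1, 4))    # output-layer weights
x = np.array([0.5, -1.0, 2.0])  # one training instance
y = np.array([1.0])             # its target output

# Forward phase: cascade of computations using the current weights.
h = sigmoid(W1 @ x)                   # hidden activations
y_hat = W2 @ h                        # predicted output (linear unit)
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward phase: chain rule, starting from the output node.
dL_dyhat = y_hat - y                       # derivative of loss w.r.t. output
dL_dW2 = np.outer(dL_dyhat, h)             # gradient for output-layer weights
dL_dh = W2.T @ dL_dyhat                    # gradient propagated to hidden layer
dL_dW1 = np.outer(dL_dh * h * (1 - h), x)  # sigmoid'(z) = h * (1 - h)

# Gradient-descent update of all weights.
lr = 0.1
W1 -= lr * dL_dW1
W2 -= lr * dL_dW2
```

The same two phases repeat for every training instance (or mini-batch) until the loss converges.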
Training a Neural Network
with Backpropagation
In the single-layer neural network, the training process is relatively straightforward
because the error (or loss function) can be computed as a direct function of the
weights, which allows easy gradient computation.
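For concreteness (a standard textbook case, not verbatim from these slides): with prediction ŷ = w̄ · x̄ and squared loss, the gradient follows in a single step of calculus:

```latex
L = \tfrac{1}{2}(y - \hat{y})^2, \qquad \hat{y} = \bar{w} \cdot \bar{x}
\quad\Longrightarrow\quad
\frac{\partial L}{\partial \bar{w}} = -(y - \hat{y})\,\bar{x}
```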
In the case of multi-layer networks, the problem is that the loss is a complicated
composition function of the weights in earlier layers. The gradient of a composition
function is computed using the backpropagation algorithm.
The backpropagation algorithm leverages the chain rule of differential calculus, which
computes the error gradients in terms of summations of local-gradient products over
the various paths from a node to the output.
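A small worked case of this path summation (K, g, and h below are hypothetical functions used only for illustration): if the output is o = K(p, q) with p = g(w) and q = h(w), there are two paths from w to o, and the chain rule contributes one product of local gradients per path:

```latex
\frac{\partial o}{\partial w}
= \underbrace{\frac{\partial K}{\partial p}\, g'(w)}_{\text{path via } p}
\;+\;
\underbrace{\frac{\partial K}{\partial q}\, h'(w)}_{\text{path via } q}
```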
Backpropagation algorithms are thus a family of methods for efficiently training artificial
neural networks with gradient descent, exploiting the chain rule to compute the required
gradients.
Illustration of the Chain Rule in Computational Graphs
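As a numeric sketch of this idea, consider the small computational graph p = w², q = 3w, o = p·q (these functions are assumptions chosen purely for illustration). The following Python fragment evaluates the local gradient on each edge and sums the products over both paths from w to o:

```python
# Computational graph with two paths from w to the output o:
#   p = w**2,  q = 3*w,  o = p * q
w = 2.0
p, q = w ** 2, 3.0 * w
o = p * q

# Local gradients on each edge of the graph.
do_dp, do_dq = q, p           # do/dp = q,  do/dq = p
dp_dw, dq_dw = 2.0 * w, 3.0   # dp/dw,      dq/dw

# Chain rule: sum of local-gradient products over the two paths.
do_dw = do_dp * dp_dw + do_dq * dq_dw
print(do_dw)  # 36.0, matching d(3w**3)/dw = 9w**2 at w = 2
```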
Example: understanding exactly how backpropagation updates the weights.
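To make the update concrete, here is a hedged end-to-end sketch: a few training steps on a network with one input, one hidden unit, and one output, printing the loss and weights after each update (all numeric values and the learning rate are illustrative assumptions):

```python
import math

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# Tiny network: y_hat = w2 * sigmoid(w1 * x).
x, y = 1.0, 0.0           # one training instance and its target
w1, w2, lr = 0.6, 0.4, 0.5

for step in range(3):
    # Forward phase.
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    loss = 0.5 * (y_hat - y) ** 2

    # Backward phase: chain rule from the output back to each weight.
    delta = y_hat - y                       # dL/dy_hat
    grad_w2 = delta * h                     # dL/dw2
    grad_w1 = delta * w2 * h * (1 - h) * x  # dL/dw1 via the hidden unit

    # Gradient-descent weight update.
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2
    print(f"step {step}: loss={loss:.4f}, w1={w1:.4f}, w2={w2:.4f}")
```

Because the target is 0 here, each update shrinks the prediction, and the printed loss decreases step by step.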