BackPropagation
January 8, 2012
Ryan
The neuron

The sigmoid equation is what is typically used as a transfer function between neurons. It is similar to the step function, but is continuous and differentiable:

\sigma(x) = \frac{1}{1 + e^{-x}}    (1)

[Figure: the sigmoid function \sigma(x) plotted for x from -5 to 5]
The derivative of the sigmoid transfer function

\frac{d}{dx}\sigma(x) = \frac{d}{dx}\,\frac{1}{1 + e^{-x}}
= \frac{e^{-x}}{(1 + e^{-x})^2}
= \frac{(1 + e^{-x}) - 1}{(1 + e^{-x})^2}
= \frac{1 + e^{-x}}{(1 + e^{-x})^2} - \left(\frac{1}{1 + e^{-x}}\right)^2
= \sigma(x) - \sigma(x)^2

so that

\sigma' = \sigma(1 - \sigma)
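As a quick numerical check, a minimal Python sketch (the function names are my own) comparing \sigma(x)(1 - \sigma(x)) against a central-difference approximation of the derivative:

import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # derivative via the identity sigma' = sigma * (1 - sigma)
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
print(np.max(np.abs(numeric - sigmoid_prime(x))))       # tiny (~1e-10), so the identity holds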
Single input neuron

[Figure: a single neuron with input \xi, weight \omega, transfer function \sigma, and output O]

In the figure above you can see a diagram representing a single neuron with only a single input. The equation defining the figure is

O = \sigma(\xi\omega)

and with a bias term \theta added it becomes

O = \sigma(\xi\omega + \theta)
Multiple input neuron

[Figure: a neuron with inputs \xi_1, \xi_2, \xi_3, weights \omega_1, \omega_2, \omega_3, a bias \theta, a summation, and transfer function \sigma producing output O]

O = \sigma(\omega_1 \xi_1 + \omega_2 \xi_2 + \omega_3 \xi_3 + \theta)
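As a small sketch, the same forward pass in Python (the variable names mirror the notation and are my own choice, not from the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(xi, omega, theta):
    # O = sigma(omega_1*xi_1 + omega_2*xi_2 + omega_3*xi_3 + theta)
    return sigmoid(np.dot(omega, xi) + theta)

xi = np.array([0.5, -1.0, 2.0])      # inputs xi_1..xi_3 (made-up values)
omega = np.array([0.1, 0.4, -0.3])   # weights omega_1..omega_3
theta = 0.2                          # bias
print(neuron_output(xi, omega, theta))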
A neural network

[Figure: a single layer of neurons]

[Figure: a full network with layers labelled I, J, and K]
The back propagation algorithm

Notation
- x_j^\ell: input to node j of layer \ell
- W_{ij}^\ell: weight from layer \ell - 1 node i to layer \ell node j
- \sigma(x) = \frac{1}{1 + e^{-x}}: sigmoid transfer function
- \theta_j^\ell: bias of node j of layer \ell
- O_j^\ell: output of node j in layer \ell
- t_j: target value of node j of the output layer
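To make the later steps concrete, here is one possible way to lay this notation out in Python (layer sizes and names are my own assumptions, for a fully connected network with one hidden layer):

import numpy as np

rng = np.random.default_rng(0)

# Layer sizes for an input layer I, a hidden layer J, and an output layer K
n_i, n_j, n_k = 3, 4, 2

# W[i, j] is the weight from node i of the previous layer to node j of this layer
W_ij = rng.normal(size=(n_i, n_j))   # input  -> hidden weights
W_jk = rng.normal(size=(n_j, n_k))   # hidden -> output weights

# One bias per node of a layer
theta_j = np.zeros(n_j)
theta_k = np.zeros(n_k)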
The error calculation

Output layer node

\frac{\partial E}{\partial W_{jk}}
= \frac{\partial}{\partial W_{jk}} \frac{1}{2} \sum_{k \in K} (O_k - t_k)^2
= (O_k - t_k) \frac{\partial}{\partial W_{jk}} O_k
= (O_k - t_k) \frac{\partial}{\partial W_{jk}} \sigma(x_k)
= (O_k - t_k) \sigma(x_k)(1 - \sigma(x_k)) \frac{\partial}{\partial W_{jk}} x_k
= (O_k - t_k) O_k (1 - O_k) O_j

(The sum over k drops out because W_{jk} only affects the output O_k of node k.)

For notation purposes I will define \delta_k to be the expression (O_k - t_k) O_k (1 - O_k), so we can rewrite the equation above as

\frac{\partial E}{\partial W_{jk}} = O_j \delta_k
where
\delta_k = O_k (1 - O_k)(O_k - t_k)
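As a sketch of this step in Python (the forward-pass outputs and targets below are made-up placeholder values):

import numpy as np

O_j = np.array([0.2, 0.7, 0.5, 0.9])   # hidden-layer outputs (assumed given)
O_k = np.array([0.6, 0.3])             # output-layer outputs (assumed given)
t_k = np.array([1.0, 0.0])             # targets

# delta_k = O_k (1 - O_k)(O_k - t_k)
delta_k = O_k * (1.0 - O_k) * (O_k - t_k)

# dE/dW_jk = O_j * delta_k, one entry per (j, k) pair
dE_dW_jk = np.outer(O_j, delta_k)
print(dE_dW_jk.shape)   # (4, 2), matching the hidden -> output weight matrix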
Hidden layer node

\frac{\partial E}{\partial W_{ij}}
= \frac{\partial}{\partial W_{ij}} \frac{1}{2} \sum_{k \in K} (O_k - t_k)^2
= \sum_{k \in K} (O_k - t_k) \frac{\partial}{\partial W_{ij}} O_k
= \sum_{k \in K} (O_k - t_k) \frac{\partial}{\partial W_{ij}} \sigma(x_k)
= \sum_{k \in K} (O_k - t_k) \sigma(x_k)(1 - \sigma(x_k)) \frac{\partial x_k}{\partial W_{ij}}
= \sum_{k \in K} (O_k - t_k) O_k (1 - O_k) \frac{\partial x_k}{\partial O_j} \cdot \frac{\partial O_j}{\partial W_{ij}}
= \sum_{k \in K} (O_k - t_k) O_k (1 - O_k) W_{jk} \frac{\partial O_j}{\partial W_{ij}}
= \frac{\partial O_j}{\partial W_{ij}} \sum_{k \in K} (O_k - t_k) O_k (1 - O_k) W_{jk}
= O_j (1 - O_j) \frac{\partial x_j}{\partial W_{ij}} \sum_{k \in K} (O_k - t_k) O_k (1 - O_k) W_{jk}
= O_j (1 - O_j) O_i \sum_{k \in K} (O_k - t_k) O_k (1 - O_k) W_{jk}

Collecting the two results:

\frac{\partial E}{\partial W_{jk}} = O_j \delta_k
where
\delta_k = O_k (1 - O_k)(O_k - t_k)

\frac{\partial E}{\partial W_{ij}} = O_i \delta_j
where
\delta_j = O_j (1 - O_j) \sum_{k \in K} \delta_k W_{jk}
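Continuing the Python sketch, the hidden-layer deltas and gradients (the arrays below are assumed to come from the forward pass and the output-layer step; the numbers are placeholders):

import numpy as np

O_i = np.array([1.0, 0.5, -0.5])        # outputs of the previous layer
O_j = np.array([0.2, 0.7, 0.5, 0.9])    # hidden-layer outputs
W_jk = np.array([[ 0.1, -0.2],
                 [ 0.4,  0.3],
                 [-0.5,  0.2],
                 [ 0.3, -0.1]])          # hidden -> output weights
delta_k = np.array([0.05, -0.02])        # output-layer deltas from the previous step

# delta_j = O_j (1 - O_j) * sum_k delta_k W_jk
delta_j = O_j * (1.0 - O_j) * (W_jk @ delta_k)

# dE/dW_ij = O_i * delta_j
dE_dW_ij = np.outer(O_i, delta_j)
print(dE_dW_ij.shape)   # (3, 4), matching the input -> hidden weight matrix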
What about the bias?

If we incorporate the bias term \theta into the equation, you will find that

\frac{\partial O}{\partial \theta} = O(1 - O) \frac{\partial \theta}{\partial \theta}

and because \partial\theta/\partial\theta = 1 we can view the bias term as a weight attached to a node whose output is always one.

This holds for any layer \ell we are concerned with; substituting into the previous equations gives

\frac{\partial E}{\partial \theta^\ell} = \delta^\ell

(because a constant output of 1 replaces the output O from the "previous layer").
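In code the bias gradients are therefore just the deltas themselves; a tiny sketch reusing the delta arrays from the previous snippets (placeholder values):

import numpy as np

delta_j = np.array([0.01, 0.03, -0.02, 0.04])   # hidden-layer deltas
delta_k = np.array([0.05, -0.02])               # output-layer deltas

# dE/dtheta = delta, since the "input" multiplying the bias is a constant 1
dE_dtheta_j = delta_j
dE_dtheta_k = delta_k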
The back propagation algorithm

1. Run the network forward with your input data to get the network output.
2. For each output node compute

   \delta_k = O_k (1 - O_k)(O_k - t_k)

3. For each hidden node calculate

   \delta_j = O_j (1 - O_j) \sum_{k \in K} \delta_k W_{jk}
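To tie the steps together, a minimal end-to-end sketch of one training step in Python (single hidden layer, sigmoid units, my own variable names; the gradient-descent update with a learning rate eta is the step the gradients above imply, not something spelled out on these slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_i, n_j, n_k = 3, 4, 2
W_ij = rng.normal(scale=0.5, size=(n_i, n_j)); theta_j = np.zeros(n_j)
W_jk = rng.normal(scale=0.5, size=(n_j, n_k)); theta_k = np.zeros(n_k)

xi = np.array([0.5, -1.0, 2.0])   # network input (made-up)
t  = np.array([1.0, 0.0])         # target output (made-up)
eta = 0.1                         # learning rate (an assumption, not on the slides)

# 1. Forward pass
O_j = sigmoid(xi @ W_ij + theta_j)
O_k = sigmoid(O_j @ W_jk + theta_k)

# 2. Output-layer deltas: delta_k = O_k (1 - O_k)(O_k - t_k)
delta_k = O_k * (1 - O_k) * (O_k - t)

# 3. Hidden-layer deltas: delta_j = O_j (1 - O_j) sum_k delta_k W_jk
delta_j = O_j * (1 - O_j) * (W_jk @ delta_k)

# Gradient-descent update using dE/dW = O_prev * delta and dE/dtheta = delta
W_jk -= eta * np.outer(O_j, delta_k); theta_k -= eta * delta_k
W_ij -= eta * np.outer(xi, delta_j);  theta_j -= eta * delta_j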