Delta Rule
Kevin Swingler
[email protected]
Mathematical Preliminaries: Vector Notation
Review of the McCulloch-Pitts/Perceptron Model
[Figure: a single unit j with inputs I1, I2, I3, …, In and weights wj1, …, wjn; the weighted sum (∑) gives the activation Aj, which produces the output Yj]
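In equation form, the unit computes a weighted sum of its inputs and passes it through an activation function. The threshold form of f written here is an assumption (the figure does not spell out the activation), chosen to match the ±1 outputs used on the next slide:

A_j = \sum_{i=1}^{n} w_{ji} I_i, \qquad Y_j = f(A_j) = \begin{cases} +1 & \text{if } A_j \ge 0 \\ -1 & \text{otherwise} \end{cases}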
Review of Geometrical Interpretation
[Figure: the decision boundary wx = 0 in the (x1, x2) plane, with outputs Y = 1 on one side of the line and Y = -1 on the other]
Review of Supervised Learning
[Figure: a Generator produces inputs x, a Supervisor supplies the target ytarget, and the Learning Machine produces the output y]
Learning by Error Minimization
E(\mathbf{w}) = \frac{1}{2} \sum_{p} \sum_{j} \left( y_{j}^{\text{desired}} - y_{j} \right)^{2}

where the sums run over the training patterns p and the output units j.
Intuition:
• Square makes error positive and penalises large errors more
• ½ just makes the math easier
• Need to change the weights to minimize the error – How?
• Use principle of Gradient Descent
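As a concrete illustration (not taken from the slides), here is a short Python sketch of this cost function for a single linear output y = w · x, so the sum over j collapses to one term; the variable names and toy data are invented for the example:

```python
import numpy as np

def sum_squares_error(w, X, Y_desired):
    """E(w) = 1/2 * sum over patterns p of (y_desired - y)^2,
    assuming a single linear output y = X @ w."""
    Y = X @ w                                 # one output per training pattern
    return 0.5 * np.sum((Y_desired - Y) ** 2)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # toy input patterns
Y_desired = np.array([1.0, -1.0, 1.0])                # toy targets
print(sum_squares_error(np.zeros(2), X, Y_desired))   # error with all-zero weights: 1.5
```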
Principle of Gradient Descent
So, calculate the derivative (gradient) of the Cost Function with respect to the weights, and then change each weight by a small increment in the negative (opposite) direction to the gradient:

\frac{\partial E}{\partial w} = \frac{\partial E}{\partial y} \cdot \frac{\partial y}{\partial w} = -\left( y_{\text{desired}} - y \right) x = -\delta\, x

where \delta = y_{\text{desired}} - y; the last step uses the fact that for a linear output y = \mathbf{w} \cdot \mathbf{x}, \partial y / \partial w = x.
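To see that the algebra is right, here is a quick numerical check (purely illustrative; none of these values come from the slides) comparing the analytic gradient -δx with a finite-difference estimate for a linear unit:

```python
import numpy as np

def error(w, x, y_desired):
    """Single-pattern squared error for a linear unit y = w . x."""
    return 0.5 * (y_desired - w @ x) ** 2

x = np.array([0.5, -1.0, 2.0])
y_desired = 1.0
w = np.array([0.1, 0.2, -0.3])

delta = y_desired - w @ x
analytic = -delta * x                        # dE/dw = -(y_desired - y) x

eps = 1e-6
numeric = np.array([(error(w + eps * np.eye(3)[i], x, y_desired) -
                     error(w - eps * np.eye(3)[i], x, y_desired)) / (2 * eps)
                    for i in range(3)])
print(np.allclose(analytic, numeric))        # True: the two gradients agree
```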
Graphical Representation of Gradient Descent
Widrow-Hoff Learning Rule
(Delta Rule)
\Delta w = w - w_{\text{old}} = -\eta \frac{\partial E}{\partial w} = +\eta\, \delta\, x \qquad \text{or} \qquad w = w_{\text{old}} + \eta\, \delta\, x
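A minimal sketch of a single Delta Rule update in Python, assuming a linear unit; the learning rate and data values here are invented for the illustration:

```python
import numpy as np

eta = 0.1                          # learning rate (illustrative value)
w_old = np.array([0.0, 0.0])
x = np.array([1.0, 1.0])           # one training pattern
y_desired = 1.0

y = w_old @ x                      # current output of the linear unit
delta = y_desired - y              # delta = y_desired - y
w = w_old + eta * delta * x        # w = w_old + eta * delta * x
print(w)                           # [0.1 0.1]
```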
Convergence of PLR/DR
The weight changes ∆wij need to be applied repeatedly for each weight wij in
the network and for each training pattern in the training set.
One pass through all the weights for the whole training set is called an epoch
of training.
After many epochs, the network outputs match the targets for all the training
patterns, all the ∆wij are zero and the training process ceases. We then say
that the training process has converged to a solution.
It has been shown that if a set of weights exists with which the Perceptron solves the problem correctly, then the Perceptron Learning Rule / Delta Rule (PLR/DR) will find such weights in a finite number of iterations.
Furthermore, if the problem is linearly separable, then the PLR/DR will find, in a finite number of iterations, a set of weights that solves the problem correctly.
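The epoch loop and the "all Δw are zero" stopping test might look like the following sketch, here applied to a small linearly separable toy problem (logical OR with ±1 targets); the data, learning rate, and thresholded output are assumptions made for the illustration, not details given on the slide:

```python
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)  # toy OR inputs
Y_desired = np.array([-1.0, 1.0, 1.0, 1.0])                      # +/-1 targets
X = np.hstack([X, np.ones((4, 1))])                              # bias input of 1

w = np.zeros(3)
eta = 0.1

for epoch in range(1000):
    max_change = 0.0
    for x, y_d in zip(X, Y_desired):        # one epoch = one pass over all patterns
        y = np.sign(w @ x)                  # thresholded output of the unit
        delta = y_d - y
        dw = eta * delta * x                # PLR/DR weight change for this pattern
        w += dw
        max_change = max(max_change, np.abs(dw).max())
    if max_change == 0.0:                   # every delta_w was zero: converged
        print(f"Converged after {epoch + 1} epochs, w = {w}")
        break
```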