Delta Rule
[Figure: a perceptron. Inputs x_1, x_2, x_3, ..., x_n (plus a bias input x_0) are weighted by w_1, ..., w_n and summed; the thresholded sum gives the output y, and the decision boundary is the hyperplane w · x = 0.]

$$a = \sum_{i=0}^{n} w_i x_i = \mathbf{w} \cdot \mathbf{x}$$

$$y = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x} > 0 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x} \le 0 \end{cases}$$
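As a concrete illustration (a sketch, not from the slides), the thresholded perceptron output in Python, with the bias folded in as w_0 against a fixed input x_0 = 1; the function and variable names are illustrative:

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold unit: y = +1 if w.x > 0, else y = -1.
    x is assumed to include the fixed bias input x0 = 1."""
    a = np.dot(w, x)            # a = sum_{i=0}^{n} w_i * x_i = w . x
    return 1 if a > 0 else -1

# Example: two real inputs plus the bias input x0 = 1
w = np.array([-0.5, 1.0, 1.0])  # w0 is the bias weight
x = np.array([1.0, 0.0, 1.0])   # x0 = 1, x1 = 0, x2 = 1
print(perceptron_output(w, x))  # 1, since w.x = 0.5 > 0
```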
Review of Supervised Learning
[Figure: the supervised learning setup. A Generator produces input patterns x; a Supervisor supplies the target output y_target; the Learning Machine produces the actual output y, which is compared against y_target.]
Learning by Error Minimization
The Perceptron Learning Rule is an algorithm for adjusting the network
weights w to minimize the difference between the actual and the
desired outputs.
We can define a Cost Function to quantify this difference:
$$E(\mathbf{w}) = \frac{1}{2} \sum_{p} \sum_{j} \left( y_{\text{tar},j}^{\,p} - y_j^{\,p} \right)^2$$

where p runs over the training patterns and j over the output units.
Intuition: squaring makes the error positive and penalises large errors more, and the factor of ½ just makes the maths easier.
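A minimal sketch of this cost function in Python, assuming outputs and targets are stored as arrays indexed by pattern p and output unit j (the names are mine, not from the slides):

```python
import numpy as np

def cost(y_target, y):
    """E(w) = 1/2 * sum over patterns p and output units j
    of (y_target[p, j] - y[p, j])**2."""
    return 0.5 * np.sum((y_target - y) ** 2)

y_target = np.array([[1.0], [-1.0]])  # 2 patterns, 1 output unit
y = np.array([[0.5], [-1.0]])         # actual network outputs
print(cost(y_target, y))              # 0.5 * 0.5**2 = 0.125
```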
We need to change the weights to minimize the error. How? Use the principle of Gradient Descent.
Error Gradient
So, calculate the derivative (gradient) of the Cost Function with respect to the weights, and then change each weight by a small increment in the negative (opposite) direction to the gradient.
To do this we need a differentiable activation function, such as the linear function f(a) = a.
For a single pattern and output unit j, with a linear activation,

$$E(w_{ji}) = \frac{1}{2} \left( y_{\text{tar},j} - y_j \right)^2, \qquad y_j = f(a_j) = \sum_i w_{ji} x_i$$

so by the chain rule

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial y_j} \, \frac{\partial y_j}{\partial w_{ji}} = -\left( y_{\text{tar},j} - y_j \right) x_i$$
To reduce E by gradient descent, move/increment the weights in the negative (opposite) direction to the gradient: $-(-x_i) = +x_i$.
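As an illustrative sketch (the names and the single-output restriction are my assumptions), the gradient for one training pattern:

```python
import numpy as np

def error_gradient(w, x, y_target):
    """dE/dw_i = -(y_target - y) * x_i for a linear unit y = f(a) = a = w.x."""
    y = np.dot(w, x)            # linear activation: f(a) = a
    return -(y_target - y) * x  # one component per weight w_i
```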
The gradient-descent weight update is therefore

$$\Delta w_{ji} = -\eta \, \frac{\partial E}{\partial w_{ji}} = \eta \left( y_{\text{tar},j} - y_j \right) x_i$$

or

$$w_{ji} = w_{ji}^{\text{old}} + \eta \left( y_{\text{tar},j} - y_j \right) x_i$$

where η is a small positive learning rate. This update is the Delta Rule.
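One update step of this rule might look like the following sketch (eta, the learning rate, and all names are illustrative choices):

```python
import numpy as np

def delta_rule_step(w, x, y_target, eta=0.1):
    """w_new = w_old + eta * (y_target - y) * x, with a linear unit y = w.x."""
    y = np.dot(w, x)
    return w + eta * (y_target - y) * x
```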
Convergence of PLR/DR
The weight changes Δwji need to be applied repeatedly, for each weight wji in the network and for each training pattern in the training set.
One pass through all the weights for the whole training set is called an epoch of training.
After many epochs, the network outputs match the targets for all the training patterns, all the Δwji are zero, and the training process ceases. We then say that the training process has converged to a solution.
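A sketch of this epoch-based training loop for a single thresholded output unit, i.e. the Perceptron Learning Rule variant (the learning rate, epoch limit, and names are my assumptions; the linear Delta Rule version would use y = w.x directly):

```python
import numpy as np

def train_perceptron(X, targets, eta=0.1, max_epochs=1000):
    """Apply the weight update repeatedly; one pass over all patterns is one epoch.
    Training has converged when a whole epoch produces no weight change."""
    w = np.zeros(X.shape[1])
    for epoch in range(max_epochs):
        changed = False
        for x, t in zip(X, targets):
            y = 1 if np.dot(w, x) > 0 else -1  # thresholded output
            if y != t:                          # misclassified: apply the update
                w += eta * (t - y) * x
                changed = True
        if not changed:                         # all delta_w are zero: converged
            return w, epoch
    return w, max_epochs
```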
It has been shown that if a set of weights exists with which a Perceptron solves the problem correctly, then the Perceptron Learning Rule / Delta Rule (PLR/DR) will find them in a finite number of iterations.
Furthermore, if the problem is linearly separable, then the PLR/DR will find a set of weights in a finite number of iterations that solves the problem correctly.
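For example, the AND function with a ±1 encoding is linearly separable, so the train_perceptron sketch above converges after a few epochs (this demo reuses that sketch and its numpy import):

```python
# x0 = 1 is the fixed bias input; targets follow the +1/-1 convention above.
X = np.array([[1, -1, -1],
              [1, -1,  1],
              [1,  1, -1],
              [1,  1,  1]], dtype=float)
targets = np.array([-1, -1, -1, 1], dtype=float)

w, epochs = train_perceptron(X, targets)
print(w, epochs)                                   # e.g. [-0.2  0.2  0.2] after 2 epochs
print([1 if np.dot(w, x) > 0 else -1 for x in X])  # [-1, -1, -1, 1]
```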