
Lecture 3: Delta Rule

Kevin Swingler
[email protected]

Mathematical Preliminaries: Vector Notation

Vectors appear in lowercase bold font


e.g. input vector: x = [x0 x1 x2 … xn]

Dot product of two vectors:

w · x = w0 x0 + w1 x1 + … + wn xn = ∑_{i=0}^{n} wi xi

E.g.: x = [1,2,3], y = [4,5,6], so x · y = (1×4) + (2×5) + (3×6) = 4 + 10 + 18 = 32
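A minimal sketch of this computation in Python (plain lists, no libraries; the function name dot is illustrative only):

```python
# Dot product: sum of the element-wise products of two equal-length vectors.
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1, 2, 3]
y = [4, 5, 6]
print(dot(x, y))  # (1*4) + (2*5) + (3*6) = 32
```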

Review of the McCulloch-Pitts/Perceptron Model

[Diagram: perceptron unit j with inputs I1, I2, I3, …, In, weights Wj1 … Wjn, summing/activation node Aj, and output Yj]

Neuron sums its weighted inputs:

w0 x0 + w1 x1 + … + wn xn = ∑_{i=0}^{n} wi xi = w · x

Neuron applies threshold activation function:

y = f(w · x), where, e.g., f(w · x) = +1 if w · x > 0 and f(w · x) = −1 if w · x ≤ 0
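A rough sketch of one such unit in Python, assuming the ±1 threshold shown above (the names perceptron_output, w and x are illustrative, not from the lecture):

```python
# Weighted sum followed by a hard threshold, as in the McCulloch-Pitts/Perceptron unit.
def perceptron_output(w, x):
    s = sum(wi * xi for wi, xi in zip(w, x))  # s = w . x
    return 1 if s > 0 else -1                 # threshold activation

# Example: w includes a bias weight w0 paired with a fixed input x0 = 1.
w = [-0.5, 1.0, 1.0]
x = [1, 0.2, 0.6]
print(perceptron_output(w, x))  # w . x = 0.3 > 0, so the output is +1
```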

Review of Geometrical Interpretation
[Figure: input space with axes x1 and x2; the line w · x = 0 separates the region where Y = +1 from the region where Y = −1]

The neuron defines two regions in input space, in which it outputs −1 and +1 respectively. The regions are separated by the hyperplane w · x = 0 (i.e. the decision boundary).
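For a neuron with two real inputs and a bias, the boundary is a straight line; a small illustrative check (the weights here are arbitrary, chosen only to show the idea):

```python
# With w = [w0, w1, w2] and x = [1, x1, x2], the boundary w . x = 0 is the
# line w0 + w1*x1 + w2*x2 = 0, i.e. x2 = -(w0 + w1*x1) / w2.
w = [-1.0, 1.0, 1.0]  # arbitrary example weights; w0 = -1 is the bias weight

def side(x1, x2):
    s = w[0] + w[1] * x1 + w[2] * x2
    return 1 if s > 0 else -1

print(side(1.0, 1.0))   # +1: the point lies above the line x2 = 1 - x1
print(side(0.0, 0.0))   # -1: the point lies below the line
```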

Review of Supervised Learning
[Diagram: a Generator produces inputs x; a Supervisor provides the target output ytarget; the Learning Machine produces the output y]

Training: Learn from training pairs (x, ytarget)


Testing: Given x, output a value y close to the supervisor’s output ytarget

Learning by Error Minimization

The Perceptron Learning Rule is an algorithm for adjusting the network weights w to minimize the difference between the actual and the desired outputs.

We can define a Cost Function to quantify this difference:

E(w) = ½ ∑p ∑j (ydesired − y)², summed over the training patterns p and the output units j

Intuition:
• Square makes error positive and penalises large errors more
• ½ just makes the math easier
• Need to change the weights to minimize the error – How?
• Use principle of Gradient Descent
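As a sketch of how the Cost Function above could be evaluated over a training set (the names are illustrative; each target and output is a list over the output units j):

```python
# E(w) = 1/2 * sum over patterns p and output units j of (ydesired - y)^2
def cost(targets, outputs):
    total = 0.0
    for y_desired, y in zip(targets, outputs):   # loop over training patterns p
        for t_j, y_j in zip(y_desired, y):       # loop over output units j
            total += (t_j - y_j) ** 2
    return 0.5 * total

targets = [[1.0], [-1.0]]
outputs = [[0.5], [-1.0]]
print(cost(targets, outputs))  # 0.5 * (0.5**2 + 0.0**2) = 0.125
```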

Principle of Gradient Descent

Gradient descent is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient of the function at the current point.

So, calculate the derivative (gradient) of the Cost Function with respect to the weights, and then change each weight by a small increment in the negative (opposite) direction to the gradient:

∂E/∂w = (∂E/∂y) · (∂y/∂w) = −(ydesired − y) x = −δ x

To reduce E by gradient descent, move/increment the weights in the negative direction to the gradient: −(−δ x) = +δ x
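A sketch of this gradient for a single pattern, assuming a linear output y = w · x so that ∂y/∂w = x (the derivation above requires a differentiable output; the names are illustrative):

```python
# For y = w . x:  dE/dw = -(ydesired - y) * x = -delta * x,
# so the descent direction is +delta * x.
def gradient(w, x, y_desired):
    y = sum(wi * xi for wi, xi in zip(w, x))   # linear output y = w . x
    delta = y_desired - y
    return [-delta * xi for xi in x]           # dE/dw for this pattern

w = [0.0, 0.0]
x = [1.0, 2.0]
print(gradient(w, x, 1.0))  # delta = 1, so dE/dw = [-1.0, -2.0]
```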

Graphical Representation of Gradient Descent

Widrow-Hoff Learning Rule
(Delta Rule)

∆w = w − wold = −η ∂E/∂w = +η δ x,  or  w = wold + η δ x

where δ = ytarget − y and η is a constant that controls the learning rate (the amount of increment/update ∆w at each training step).
Note: the Delta Rule (DR) is similar to the Perceptron Learning Rule (PLR), with some differences:
1. The error δ in the DR is not restricted to the values 0, 1, or −1 (as in the PLR), but may take any value
2. The DR can be derived for any differentiable output/activation function f, whereas the PLR only works for a threshold output function
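A minimal sketch of one Widrow-Hoff update, w = wold + η δ x, applied to the rule above (the learning rate and all names are illustrative; a linear output is assumed when computing y):

```python
# One Delta Rule step: delta = ytarget - y, then w = w_old + eta * delta * x.
def delta_rule_step(w, x, y_target, y, eta=0.1):
    delta = y_target - y
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]

w = [0.2, -0.1]
x = [1.0, 0.5]
y = sum(wi * xi for wi, xi in zip(w, x))       # current output (linear unit assumed)
w = delta_rule_step(w, x, y_target=1.0, y=y)   # move the weights by +eta*delta*x
print(w)                                       # approximately [0.285, -0.0575]
```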

Convergence of PLR/DR

The weight changes ∆wij need to be applied repeatedly for each weight wij in
the network and for each training pattern in the training set.

One pass through all the weights for the whole training set is called an epoch
of training.

After many epochs, the network outputs match the targets for all the training
patterns, all the ∆wij are zero and the training process ceases. We then say
that the training process has converged to a solution.

It has been shown that if a set of weights exists that solves the problem correctly, then the Perceptron Learning Rule/Delta Rule (PLR/DR) will find it in a finite number of iterations.

Furthermore, if the problem is linearly separable, then the PLR/DR will find, in a finite number of iterations, a set of weights that solves the problem correctly.
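A sketch of this repeated application over epochs, here for a ±1 threshold unit on the logical OR problem, which is linearly separable (the data encoding, learning rate, and epoch limit are all illustrative choices):

```python
# Train a threshold unit with the PLR/DR update until an epoch makes no change.
def train(patterns, eta=0.1, max_epochs=100):
    w = [0.0, 0.0, 0.0]                        # weights; w[0] is the bias weight
    for epoch in range(max_epochs):
        changed = False
        for x, y_target in patterns:           # one epoch = one pass over the set
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            delta = y_target - y
            if delta != 0:
                w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:                        # all the delta-w are zero: converged
            return w, epoch
    return w, max_epochs

# Logical OR with inputs/outputs in {-1, +1}; x[0] = 1 is the fixed bias input.
data = [([1, -1, -1], -1), ([1, -1, 1], 1), ([1, 1, -1], 1), ([1, 1, 1], 1)]
print(train(data))  # converges to a separating weight vector within a few epochs
```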
