
Lecture 3: Delta Rule

Kevin Swingler
[email protected]

Mathematical Preliminaries: Vector Notation


Vectors appear in lowercase bold font, e.g. input vector: x = [x0 x1 x2 ... xn]

Dot product of two vectors: w · x = w0x0 + w1x1 + ... + wnxn = Σ_{i=0}^{n} wi xi

E.g.: x = [1,2,3], y = [4,5,6]
x · y = (1*4) + (2*5) + (3*6) = 4 + 10 + 18 = 32
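As a quick check of the arithmetic above, here is a minimal Python sketch (using NumPy purely for illustration; it is not part of the lecture):

```python
import numpy as np

# The same worked example as above: x . y = 1*4 + 2*5 + 3*6
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(np.dot(x, y))  # 32
```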

Review of the McCulloch-Pitts/Perceptron Model


[Figure: neuron j with inputs I1, I2, I3, ..., In, weights Wj1, ..., Wjn, activation Aj, and output Yj]

Neuron sums its weighted inputs: w0x0 + w1x1 + ... + wnxn = Σ_{i=0}^{n} wi xi = w · x

Neuron applies threshold activation function: y = f(w · x), where, e.g.:
f(w · x) = +1 if w · x > 0
f(w · x) = -1 if w · x ≤ 0
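To make the computation concrete, here is a minimal Python sketch of such a unit (the names `step` and `perceptron_output` are illustrative, not from the lecture):

```python
import numpy as np

def step(a):
    """Threshold activation: +1 if a > 0, otherwise -1."""
    return 1.0 if a > 0 else -1.0

def perceptron_output(w, x):
    """Weighted sum w . x followed by the threshold activation."""
    return step(np.dot(w, x))

# Example: two inputs plus a bias input x0 = 1
x = np.array([1.0, 0.5, -0.5])   # [x0, x1, x2]
w = np.array([0.1, 0.8, 0.3])    # [w0, w1, w2]
print(perceptron_output(w, x))   # +1, since w . x = 0.35 > 0
```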

Review of Geometrical Interpretation


[Figure: input space with axes x1 and x2; the regions y = +1 and y = -1 are separated by the line w · x = 0]

The neuron defines two regions in input space where it outputs +1 and -1. The regions are separated by the hyperplane w · x = 0 (i.e. the decision boundary).

Review of Supervised Learning


[Figure: a Generator produces inputs x, a Supervisor provides target outputs ytarget, and a Learning Machine produces outputs y]
Training: learn from training pairs (x, ytarget)
Testing: given x, output a value y close to the supervisor's output ytarget

Learning by Error Minimization


The Perceptron Learning Rule is an algorithm for adjusting the network weights w to minimize the difference between the actual and the desired outputs. We can define a Cost Function to quantify this difference:

E(w) = ½ Σ_p Σ_j (ydesired − y)²

where the sums run over the training patterns p and the output units j.

Intuition:
The square makes the error positive and penalises large errors more; the ½ just makes the math easier.
We need to change the weights to minimize the error. How? Use the principle of Gradient Descent.
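A small sketch of this cost function in Python (vectorised over patterns with NumPy; variable names are illustrative):

```python
import numpy as np

def cost(y_desired, y):
    """Sum-of-squares error: E = 1/2 * sum of (y_desired - y)^2 over all patterns/outputs."""
    return 0.5 * np.sum((y_desired - y) ** 2)

y_desired = np.array([1.0, -1.0,  1.0])
y         = np.array([0.5, -1.0, -1.0])
print(cost(y_desired, y))  # 0.5 * (0.25 + 0 + 4) = 2.125
```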

Principle of Gradient Descent


Gradient descent is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient of the function at the current point. So, calculate the derivative (gradient) of the Cost Function with respect to the weights, and then change each weight by a small increment in the negative (opposite) direction to the gradient.

∂E/∂w = (∂E/∂y)(∂y/∂w) = −(ydesired − y) x = −δ x

To reduce E by gradient descent, move/increment the weights in the negative direction to the gradient: −(−δ x) = +δ x
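As a toy illustration of the principle itself (not the delta rule), a short Python sketch minimises f(w) = (w − 3)² by repeatedly stepping against the gradient; the function, starting point, and step size are arbitrary choices:

```python
def f_grad(w):
    """Gradient of f(w) = (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0      # starting point
eta = 0.1    # step size (learning rate)
for _ in range(50):
    w -= eta * f_grad(w)   # move in the negative gradient direction
print(w)     # close to 3.0, the minimum of f
```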

Graphical Representation of Gradient Descent


[Figure: the error E plotted against a weight, with gradient descent stepping downhill toward the minimum]

Widrow-Hoff Learning Rule (Delta Rule)


Δw = w − wold = −η ∂E/∂w = +η δ x

or

w = wold + η δ x

where δ = ytarget − y and η is a constant that controls the learning rate (the amount of increment/update Δw at each training step).

Note: the Delta Rule (DR) is similar to the Perceptron Learning Rule (PLR), with some differences:
1. The error (δ) in DR is not restricted to the values 0, +1, or -1 (as in PLR), but may take any value
2. DR can be derived for any differentiable output/activation function f, whereas PLR only works for a threshold output function
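A minimal sketch of a single delta-rule update for a linear unit y = w · x (the learning rate value and the function name are illustrative assumptions, not from the lecture):

```python
import numpy as np

def delta_rule_update(w_old, x, y_target, eta=0.1):
    """w = w_old + eta * delta * x, with delta = y_target - y and y = w_old . x."""
    y = np.dot(w_old, x)
    delta = y_target - y
    return w_old + eta * delta * x

w = np.array([0.0, 0.0, 0.0])
x = np.array([1.0, 0.5, -1.0])   # x[0] = 1 acts as the bias input
print(delta_rule_update(w, x, y_target=1.0))  # [0.1, 0.05, -0.1]
```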

Convergence of PLR/DR
The weight changes Δwij need to be applied repeatedly, for each weight wij in the network and for each training pattern in the training set. One pass through all the weights for the whole training set is called an epoch of training. After many epochs, the network outputs match the targets for all the training patterns, all the Δwij are zero, and the training process ceases. We then say that the training process has converged to a solution.

It has been shown that if a set of weights that solves the problem correctly exists for a Perceptron, then the Perceptron Learning Rule/Delta Rule (PLR/DR) will find it in a finite number of iterations. Equivalently, if the problem is linearly separable, then the PLR/DR will find a set of weights that solves the problem correctly in a finite number of iterations.
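A sketch of this repeated-update procedure in Python for a single threshold unit, stopping once an epoch produces no weight changes; the dataset (the linearly separable AND function), learning rate, and variable names are illustrative assumptions:

```python
import numpy as np

# Training set for logical AND, with a bias input x[0] = 1 and targets in {-1, +1}
X = np.array([[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]], dtype=float)
T = np.array([-1, -1, -1, 1], dtype=float)

w = np.zeros(3)
eta = 0.1

for epoch in range(100):                       # one pass over all patterns = one epoch
    changed = False
    for x, t in zip(X, T):
        y = 1.0 if np.dot(w, x) > 0 else -1.0  # threshold output
        delta = t - y
        if delta != 0:
            w += eta * delta * x               # apply the weight change
            changed = True
    if not changed:                            # all weight changes were zero: converged
        break

print(epoch, w)  # converges in a few epochs for this linearly separable problem
```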

