Handout Delta Rule

Lesson 4

Perceptron Learning - An example


A Two-Input NAND

x1  x2  x1 NAND x2
0   0   1
0   1   1
1   0   1
1   1   0

Let w1 = w2 = θ = 0.25 to begin.

The decision boundary is the line along which the activation equals the threshold:

w1x1 + w2x2 = θ

x2 = -(w1 / w2)x1 + (θ / w2)

Substituting, we obtain

x2 = -(0.25 / 0.25)x1 + (0.25 / 0.25)

x2 = -x1 + 1

i.e., the line passes through the points

x1  x2
0   1
1   0
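
As a quick check, here is a minimal Python sketch (variable names are illustrative) that evaluates this initial TLU on the four NAND patterns; it shows that (0, 0) and (1, 1) start out misclassified, which is exactly where the first epoch below makes its weight changes.

# Evaluate the initial TLU (w1 = w2 = theta = 0.25) on the NAND truth table.
# Output rule: y = 1 if w1*x1 + w2*x2 >= theta, else y = 0.
w1 = w2 = theta = 0.25
nand = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for (x1, x2), t in nand:
    a = w1 * x1 + w2 * x2           # activation (weighted sum)
    y = 1 if a >= theta else 0      # step-function output
    print(x1, x2, y, t, "OK" if y == t else "misclassified")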

The First Epoch (perceptron rule, learning rate α = 0.5):

w1    w2    θ     x1  x2  a    y  t  α(t-y)         Δw1   Δw2   Δθ
.25   .25   .25   0   0   0    0  1  .5(1-0) = .5   0     0     -.5
.25   .25   -.25  0   1   .25  1  1  .5(1-1) = 0    0     0     0
.25   .25   -.25  1   0   .25  1  1  0              0     0     0
.25   .25   -.25  1   1   .5   1  0  .5(0-1) = -.5  -.5   -.5   .5
After the First Epoch

w1 = w2 = -0.25
θ = +0.25

x2 = -(w1 / w2)x1 + (θ / w2)

x2 = -x1 - 1

i.e., the line passes through the points

x1  x2
0   -1
1   -2

Second Epoch:

w1    w2    θ     x1  x2  a     y  t  α(t-y)        Δw1  Δw2  Δθ
-.25  -.25  .25   0   0   0     0  1  .5(1-0) = .5  0    0    -.5
-.25  -.25  -.25  0   1   -.25  1  1  0             0    0    0
-.25  -.25  -.25  1   0   -.25  1  1  0             0    0    0
-.25  -.25  -.25  1   1   -.5   0  0  0             0    0    0

After the Second Epoch

w1 = w2 = -0.25
θ = -0.25

x2 = -(w1 / w2)x1 + (θ / w2)

x2 = -x1 + 1

i.e., the line passes through the points

x1  x2
0   1
1   0
Third Epoch:

w1    w2    θ     x1  x2  a     y  t  α(t-y)  Δw1  Δw2  Δθ
-.25  -.25  -.25  0   0   0     1  1  0       0    0    0
-.25  -.25  -.25  0   1   -.25  1  1  0       0    0    0
-.25  -.25  -.25  1   0   -.25  1  1  0       0    0    0
-.25  -.25  -.25  1   1   -.5   0  0  0       0    0    0

Since there have been no changes, Halt!
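
The whole example can be reproduced with a short Python sketch of the perceptron rule (a minimal sketch; variable names are illustrative). It prints the same weight trajectory as the tables above and halts after the third epoch, when a pass over the patterns produces no changes.

# Perceptron learning of two-input NAND, reproducing the epoch tables above.
# Update rule: dw_i = alpha*(t - y)*x_i and dtheta = -alpha*(t - y).
alpha = 0.5
w1 = w2 = theta = 0.25
patterns = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

epoch, changed = 0, True
while changed:
    changed = False
    epoch += 1
    for (x1, x2), t in patterns:
        a = w1 * x1 + w2 * x2
        y = 1 if a >= theta else 0
        if y != t:                       # nonzero update only on errors
            w1 += alpha * (t - y) * x1
            w2 += alpha * (t - y) * x2
            theta -= alpha * (t - y)
            changed = True
    print("after epoch", epoch, ":", w1, w2, theta)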

The Delta Rule


We desire:

1. Capability to train all the weights in multilayer nets with no a priori knowledge of the training set.

2. It should be based on defining a measure of the difference between the actual network output and the target vector.

3. This difference is then treated as an error to be minimized by adjusting the weights.

Finding the Minimum of a Function: Gradient Descent (informed hillclimbing?)

Suppose that a quantity y depends on a single variable x,

i.e., y = y(x).

We wish to find the value x0 which minimizes y,

i.e., y(x0) <= y(x) for all x.

• Let x* be the current best estimate for x0.
• To obtain a better estimate for x0, change x* so as to follow the function downhill.
• We need to know the slope of the function at x*:

Slope of a Function

The slope at any point x is just the slope of a straight line, the tangent, which just grazes the curve at that point.

1. Draw the function on graph paper.
2. Draw the tangent at the point P.
3. Measure the sides Δx and Δy of the resulting slope triangle, or merely calculate the derivative y'(x) at x = P.

If Δx is small enough, the change along the curve and the change along the tangent are approximately equal.

Dividing Δy by Δx, and then multiplying by Δx, leaves Δy unchanged:

Δy = (Δy / Δx) Δx

Furthermore, for small Δx, the ratio Δy/Δx approaches the slope of the tangent.

Hence, we may write Δy ≈ slope × Δx. That is,

Δy = (dy/dx) Δx,    (*)

where dy/dx is the derivative of y with respect to x.
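
The "measure the sides" recipe is just a finite-difference estimate of the derivative. A minimal Python check, with an arbitrarily chosen function and point:

# Finite-difference estimate of the slope of y = x^2 at x = 3.
# For small dx, dy/dx is approximately (y(x + dx) - y(x)) / dx.
def y(x):
    return x * x

x, dx = 3.0, 1e-6
slope_estimate = (y(x + dx) - y(x)) / dx
print(slope_estimate)    # close to the true derivative 2*x = 6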

Suppose we can evaluate the slope or derivative of y, and put

Δx = -α (dy/dx), where α > 0 and is small enough that (*) remains a good approximation.

Then, substituting this in (*), we get

Δy ≈ -α (dy/dx)²    (**)

The quantity (dy/dx)² is positive.
Hence, the quantity -α (dy/dx)² must be negative:

Δy < 0.

i.e., we have "traveled down" the curve towards the minimal point.

If we keep repeating steps such as (**), then we should approach the value x0 associated with the function minimum.

This is Gradient Descent.

Its effectiveness hinges on the ability to calculate, or make estimates of, dy/dx.
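
A minimal Python sketch of this descent loop; the function y = (x - 2)², the learning rate, and the starting point are all arbitrary choices for illustration:

# 1-D gradient descent on y = (x - 2)^2, whose minimum is at x0 = 2.
# Each step applies dx = -alpha * dy/dx, travelling down the curve.
alpha = 0.1
x = 5.0                      # current best estimate x*
for step in range(50):
    slope = 2.0 * (x - 2.0)  # dy/dx evaluated at the current x
    x += -alpha * slope
print(x)                     # approximately 2.0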

Functions of More Than One Variable

Suppose y = y(x1, x2, ..., xn).

One may speak of the slope of the function, or its rate of change, with respect to each of these
variables independently.

The slope or derivative of a function y with respect to the variable xi is the partial derivative ∂y/∂xi.

The equivalent of (*) is then

Δxi = -α (∂y/∂xi).

There is an equation like this for each variable, and all of them must be used to ensure that Δy < 0
and there is gradient descent.
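
The same loop extends directly to several variables: every coordinate is updated from its own partial derivative on each step. A sketch, again with an arbitrary example function y = x1² + 2·x2²:

# Gradient descent on y = x1^2 + 2*x2^2, whose minimum is at the origin.
# Both coordinates are updated on every step, ensuring dy < 0.
alpha = 0.1
x1, x2 = 3.0, -4.0
for step in range(100):
    g1 = 2.0 * x1            # partial derivative dy/dx1
    g2 = 4.0 * x2            # partial derivative dy/dx2
    x1 += -alpha * g1
    x2 += -alpha * g2
print(x1, x2)                # both approach 0.0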

Gradient Descent on an Error.

• Consider a network consisting of a single TLU.

• Assume supervised learning,
i.e., for every input pattern p in the training set there is a corresponding target t^p.

• The augmented weight vector, w, completely characterizes the behavior of the network.

• Any function E that expresses the discrepancy between desired and actual network output may be considered as a function of the weights,
i.e.,

E = E(w1, w2, ..., wn+1).

The optimal weight vector is found by minimizing this function E by gradient descent:

Δwi = -α (∂E/∂wi).

We need to find a suitable error E.

Suppose we assign equal importance to the error for each pattern, so that if e^p is the error for training pattern p, then the total error E is just the average or mean over all N patterns:

E = (1/N) Σp e^p.

One attempt is to define e^p as simply the difference e^p = t^p - y^p, where y^p is the TLU output in response to p.

However, the error is then smaller for t^p = 0, y^p = 1 (error -1) than for t^p = 1, y^p = 0 (error +1), even though the two responses are equally wrong.

We next try

e^p = (t^p - y^p)².

• A subtle problem remains:

Gradient descent assumes that the function to be minimized depends on its variables in a smooth, continuous fashion. The activation a^p is simply the weighted sum of inputs, which is smooth and continuous. But the output y^p depends on a^p via the discontinuous step function.

• One remedy is to define the error on the activation rather than the output:

e^p = (t^p - a^p)².

We must be careful how we define the targets.

We have used {0, 1} heretofore.

When using the augmented weight vector, the output changes as the activation changes sign,
i.e.,

a >= 0 ⇒ y = 1.

• As long as the activation takes on the correct sign, the correct output is guaranteed, and we are free to choose two arbitrary numbers, one positive and one negative, as the activation targets.

{1, -1} are customary.

• One last modification:

A factor of 1/2 is added to the error expression, which simplifies the resulting slope or derivative:

e^p = 1/2 (t^p - a^p)²,

and thus we arrive at

The Delta Rule.

• The error E depends on all the patterns, and so do all its derivatives. Hence, the whole training set needs to be presented in order to evaluate the gradients ∂E/∂wi.

• This is batch training. It results in true gradient descent, but is computationally intensive.

• Instead, we adapt the weights based on the presentation of each pattern individually,

i.e., we present the net with a pattern p,
evaluate ∂e^p/∂wi,
and use this as an estimate of the true gradient ∂E/∂wi.

• Recall that

e^p = 1/2 (t^p - a^p)²

and

a^p = w1 x1^p + w2 x2^p + ... + wn+1 xn+1^p.

Since ∂a^p/∂wi = xi^p, the chain rule gives

∂e^p/∂wi = -(t^p - a^p) xi^p, where xi^p is the ith component of pattern p.

1. The gradient must depend in some way on (t^p - a^p): the larger this is, the larger we expect the gradient to be. If this difference is zero, then the gradient is also zero, since we have then found the minimum value of e^p.

2. The gradient must depend on the input xi^p, for if this is zero, then the ith input makes no contribution to the activation for the pth pattern and cannot affect the error: no matter how wi changes, it makes no difference to e^p. Conversely, if xi^p is large, then the error is correspondingly sensitive to the value of wi.

Using

∂e^p/∂wi = -(t^p - a^p) xi^p

as an estimate of the true gradient in

Δwi = -α (∂E/∂wi),

we obtain

Δwi = α (t^p - a^p) xi^p.
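
This per-pattern update is easy to express directly. A minimal Python sketch under the augmented-vector convention (each input carries a trailing -1 so that the last weight plays the role of the threshold θ; the function name is illustrative):

# One delta-rule update for a single pattern, using augmented vectors:
# x carries a trailing -1, so w[-1] acts as the threshold theta.
def delta_rule_update(w, x, t, alpha):
    a = sum(wi * xi for wi, xi in zip(w, x))    # activation a = w . x
    return [wi + alpha * (t - a) * xi           # dw_i = alpha*(t - a)*x_i
            for wi, xi in zip(w, x)]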

• Pattern Training Regime: weight changes are made after each vector presentation.

• We are using estimates of the true gradient, so progress in the minimization of E is noisy;
i.e., weight changes are sometimes made which increase E.

• This is the Widrow-Hoff rule, now referred to as the Delta Rule (or δ-rule).

• Widrow and Hoff first proposed this training regime (1960). They trained ADALINEs (ADAptive LINear ElementS), which are TLUs except that the input and output signals are bipolar (i.e., {-1, 1}).

• If the learning rate α is sufficiently small, then the delta rule converges,
i.e., the weight vector approaches the vector w0 for which the error is a minimum, and E itself approaches a constant value.

• Note: a solution will not exist if the problem is not linearly separable.

• Then w0 is the best the TLU can do, and some patterns will be incorrectly classified.

• (Note the difference with the Perceptron rule!)

• Also note: the delta rule will always make changes to the weights, no matter how small, because the target activation values ±1 will never be attained exactly.

The Delta Rule Algorithm


Begin
  Repeat
    For each training vector pair (V, t)
      Evaluate the activation a when V is input to the TLU
      Adjust each of the weights: wi ← wi + α(t - a)Vi
    End For
  Until the rate of change of the error is sufficiently small
End
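
A runnable Python sketch of this algorithm, under the same augmented-vector convention as above (a trailing -1 input standing in for the threshold); the stopping tolerance and epoch cap are arbitrary choices:

# Delta-rule (Widrow-Hoff) training of a single TLU.
# Each pattern carries a trailing -1, so w[-1] acts as the threshold theta.
def train_delta_rule(patterns, targets, w, alpha=0.25, tol=1e-4, max_epochs=1000):
    prev_error = float("inf")
    for epoch in range(max_epochs):
        error = 0.0
        for x, t in zip(patterns, targets):
            a = sum(wi * xi for wi, xi in zip(w, x))   # activation
            error += 0.5 * (t - a) ** 2                # e^p = 1/2 (t - a)^2
            w = [wi + alpha * (t - a) * xi             # per-pattern update
                 for wi, xi in zip(w, x)]
        if abs(prev_error - error) < tol:              # error has flattened out
            return w
        prev_error = error
    return w

For the example below, the call would be train_delta_rule([[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]], [-1, -1, -1, 1], w=[0.0, 0.4, 0.3]).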

The Delta Rule - An Example

Train a two-input TLU on the AND function, with initial weights (0, 0.4), threshold θ = 0.3, and learning rate α = 0.25. The bipolar activation targets are t = -1 for the first three patterns and t = +1 for the pattern (1, 1).

First Epoch

w1     w2    θ     x1  x2  a      t      α(t-a)    Δw1        Δw2        Δθ
 0.00  0.40  0.30  0   0   -0.30  -1.00  -0.17      0.00       0.00       0.17 (1)
 0.00  0.40  0.48  0   1   -0.08  -1.00  -0.23      0.00      -0.23 (2)   0.23 (3)
 0.00  0.17  0.71  1   0   -0.71  -1.00  -0.07     -0.07 (4)   0.00       0.07 (5)
-0.07  0.17  0.78  1   1   -0.68   1.00   0.42      0.42 (6)   0.42 (7)  -0.42 (8)

After the first epoch, w1 = 0.35, w2 = 0.59, θ = 0.36.

We employ Δwi = +α(t^p - a^p) xi^p.
Note the plus sign before α: we always travel in the direction opposite to the gradient.

(1) Δθ = 0.25(-1.00 - (-0.30))(-1)    (-1 is the input attached to θ)

       = -0.25(-0.70) = +0.17.

(2) Δw2 = 0.25(-1.00 - (-0.08)) × 1

        = 0.25(-0.92) = -0.23.    ((3) Δθ has the opposite sign, +0.23.)

(4) Δw1 = 0.25(-1.00 - (-0.71)) × 1

        = 0.25(-0.29) = -0.07.    ((5) Δθ has the opposite sign, +0.07.)

(8) Δθ = 0.25(1.00 - (-0.68))(-1)

       = -0.25(1.68) = -0.42.    ((6), (7): Δw1 and Δw2 have the opposite sign, +0.42.)

Since, after the first epoch, we have

w1 = 0.35,
w2 = 0.59,
θ = 0.36,

the decision boundary is

x2 = -(0.35 / 0.59) x1 + (0.36 / 0.59) = -0.59 x1 + 0.61

i.e., the slope is -0.59.

Second Epoch (to be completed in the same way):

w1    w2    θ     x1  x2  a  t  α(t-a)  Δw1  Δw2  Δθ
0.35  0.59  0.36  0   0
                  0   1
                  1   0
                  1   1
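
The hand calculations above, and the rows of this second-epoch table, can be reproduced with a short Python sketch (variable names are illustrative; rounding to two decimals mirrors the tables, up to floating-point rounding in the last digit):

# Reproduce the worked AND example: w = (0, 0.4), theta = 0.3, alpha = 0.25.
# Bipolar activation targets: t = -1 for every pattern except t(1,1) = +1.
alpha = 0.25
w1, w2, theta = 0.0, 0.4, 0.3
data = [((0, 0), -1.0), ((0, 1), -1.0), ((1, 0), -1.0), ((1, 1), 1.0)]

for epoch in (1, 2):
    print("epoch", epoch)
    for (x1, x2), t in data:
        a = w1 * x1 + w2 * x2 - theta     # activation, threshold folded in
        step = alpha * (t - a)            # the common factor alpha*(t - a)
        print(round(w1, 2), round(w2, 2), round(theta, 2),
              x1, x2, round(a, 2), t, round(step, 2))
        w1 += step * x1                   # dw1 = alpha*(t - a)*x1
        w2 += step * x2                   # dw2 = alpha*(t - a)*x2
        theta -= step                     # dtheta = -alpha*(t - a)
    print("after epoch", epoch, ":",
          round(w1, 2), round(w2, 2), round(theta, 2))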
