Lecture 2

Adaline is an adaptive linear neuron proposed by Widrow and Hoff in 1960. It uses the Widrow-Hoff learning rule, also known as the least-mean-squares (LMS) rule or delta rule, to minimize output error through gradient descent. The key difference between Adaline and the perceptron is the learning method: Adaline adjusts its weights on every pattern to reduce the error according to the delta rule, while the perceptron adjusts its weights only when a pattern is misclassified. Adaline networks are commonly used to implement adaptive filters by adjusting weights to reduce the difference between the network outputs and the desired targets.


Adaline

 • ADALINE: ADAptive LINEar neuron
 • Proposed by Widrow & Hoff, 1960
 • Typically uses bipolar (+1, −1) activations for its inputs and targets
 • Not restricted to such values

Architecture
 • Bipolar input
 • Bipolar target
 • Net input: y_in = wᵀx + b (see the sketch below)
 • If the net is being used for pattern classification with bipolar class labels, a threshold function (with threshold = 0) is applied to the net input to obtain the activations
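
A minimal sketch of this computation in Python/NumPy; the function names, example inputs, weights, and bias are illustrative and not taken from the lecture, and mapping a net input of exactly 0 to +1 is one common convention.

```python
import numpy as np

def net_input(x, w, b):
    """Net input y_in = w.x + b for a single Adaline unit."""
    return np.dot(w, x) + b

def bipolar_threshold(y_in):
    """Threshold at 0 to obtain a bipolar (+1 / -1) activation."""
    return 1 if y_in >= 0 else -1

# Illustrative example: two bipolar inputs with assumed weights and bias
x = np.array([1, -1])
w = np.array([0.5, 0.3])
b = 0.1
print(bipolar_threshold(net_input(x, w, b)))  # net input 0.3 -> activation 1
```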

Difference between Perceptron and Adaline

 • The difference between the (single layer) perceptron and ADALINE networks is the learning method.
 • The Adaline learning rule (also known as the least-mean-squares rule, the delta rule, and the Widrow-Hoff rule) is a training rule that minimizes the output error using (approximate) gradient descent.
 • Adaptive Linear Element, or Adaline, is a single-layer linear neural network based on the McCulloch-Pitts neuron.

Difference between Perceptron and Adaline

 • The learning rule in Adaline neural networks is quite straightforward. Input vectors are presented to the Adaline network, and the output values are compared with the desired values.
 • If the difference between the output and desired values is greater than the tolerance, the weights of the neurons are adjusted until the error becomes acceptable (a minimal training-loop sketch follows below). Adaline networks have found use in modern times for implementing adaptive filters.
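
A minimal training-loop sketch of this idea using the delta rule; the learning rate, tolerance, maximum number of epochs, and the example data are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def train_adaline(X, T, alpha=0.1, tolerance=0.05, max_epochs=100):
    """Delta-rule training loop that stops once the mean squared error
    between the net inputs and the targets drops below the tolerance.
    alpha, tolerance and max_epochs are illustrative choices."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        sq_errors = []
        for x, t in zip(X, T):
            y_in = np.dot(w, x) + b        # net input
            err = t - y_in                 # difference from the desired value
            w += alpha * err * x           # delta-rule weight update
            b += alpha * err               # bias update
            sq_errors.append(err ** 2)
        if np.mean(sq_errors) < tolerance: # error is now acceptable
            break
    return w, b

# Illustrative data: targets generated by a linear rule the unit can match
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
T = X @ np.array([0.5, -0.3]) + 0.2
w, b = train_adaline(X, T)   # w approaches [0.5, -0.3], b approaches 0.2
```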

Difference between Perceptron and Adaline

 • In Adaline, the weights are adjusted by
   w_i(t+1) = w_i(t) + α (t − y_in) x_i,  where y_in = wᵀx + b
   This corresponds to gradient descent on the quadratic error surface
   E = Σ_p (t(p) − y_in(p))²
 • In perceptron learning, the weights are adjusted only when a pattern is misclassified. The correction to the weights after applying the training pattern p is
   w_i(t+1) = w_i(t) + α (t − a) x_i,  where a is the thresholded activation
   This corresponds to gradient descent on the perceptron error function, which sums −t(p) wᵀx(p) over the misclassified patterns only (a comparison sketch of the two updates follows below).
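
A minimal side-by-side sketch of the two single-pattern updates, assuming bipolar targets; the function names and learning rate are illustrative.

```python
import numpy as np

def adaline_update(w, b, x, t, alpha=0.1):
    """Delta rule: adjust on every pattern, using the raw net input y_in."""
    y_in = np.dot(w, x) + b
    err = t - y_in
    return w + alpha * err * x, b + alpha * err

def perceptron_update(w, b, x, t, alpha=0.1):
    """Perceptron rule: adjust only when the thresholded output is wrong."""
    a = 1 if np.dot(w, x) + b >= 0 else -1   # bipolar activation
    if a != t:                               # update only on misclassification
        w = w + alpha * (t - a) * x
        b = b + alpha * (t - a)
    return w, b
```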

Training
 • The learning rule minimizes the mean squared error between the activations and the target values
 • Delta Rule, LMS, or Widrow-Hoff Rule

Training: Proof for Delta Rule
 • Error over all P samples (mean squared error):

   E = (1/P) Σ_{p=1..P} (t(p) − y_in(p))²

 • E is a function of W = {w_1, ..., w_n}
 • Learning takes a gradient descent approach to reduce E by modifying W
 • The gradient of E:

   ∇E = (∂E/∂w_1, ..., ∂E/∂w_n),   Δw_i ∝ −∂E/∂w_i

   ∂E/∂w_i = (2/P) Σ_{p=1..P} (t(p) − y_in(p)) · ∂(t(p) − y_in(p))/∂w_i
           = −(2/P) Σ_{p=1..P} (t(p) − y_in(p)) x_i(p)

   Δw_i ∝ −∂E/∂w_i = (2/P) Σ_{p=1..P} (t(p) − y_in(p)) x_i(p)

   (a numerical check of this gradient follows below)
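
To make the derivation concrete, the following sketch compares the analytical gradient −(2/P) Σ_p (t(p) − y_in(p)) x_i(p) with a finite-difference estimate of ∂E/∂w_i; the data, weights, bias, and step size are illustrative assumptions.

```python
import numpy as np

def mse(w, b, X, T):
    """E = (1/P) * sum_p (t(p) - y_in(p))**2."""
    return np.mean((T - (X @ w + b)) ** 2)

def analytical_grad(w, b, X, T):
    """dE/dw_i = -(2/P) * sum_p (t(p) - y_in(p)) * x_i(p)."""
    return -2.0 * (T - (X @ w + b)) @ X / len(T)

# Illustrative bipolar data, weights, bias and step size
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
T = np.array([1.0, -1.0, -1.0, -1.0])
w, b, eps = np.array([0.2, -0.1]), 0.05, 1e-6

# Central finite-difference estimate of dE/dw_0 matches the analytical value
num = (mse(w + [eps, 0.0], b, X, T) - mse(w - [eps, 0.0], b, X, T)) / (2 * eps)
print(np.isclose(num, analytical_grad(w, b, X, T)[0]))  # -> True
```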
[Figure: error surface (left) and error contour (right) plots of the sum squared error as a function of weight W and bias B.]
Training…
 • Application of the Delta Rule
   • Method 1 (sequential mode): change w_i after each training pattern by α (t(p) − y_in(p)) x_i
   • Method 2 (batch mode): change w_i at the end of each epoch. Within an epoch, accumulate α (t(p) − y_in(p)) x_i for every pattern (x(p), t(p))
   • Method 2 is slower but may provide slightly better results (because Method 1 may be sensitive to the sample ordering); a sketch of both modes follows below
 • Notes:
   • E monotonically decreases until the system reaches a state with (local) minimum E (a small change of any w_i will cause E to increase)
   • At a local-minimum state, ∂E/∂w_i = 0 for all i, but E is not guaranteed to be zero
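
A minimal sketch contrasting the two modes for one epoch; the function names and learning rate are illustrative, and float weights are assumed.

```python
import numpy as np

def epoch_sequential(w, b, X, T, alpha=0.1):
    """Method 1: update the weights after every training pattern."""
    for x, t in zip(X, T):
        err = t - (np.dot(w, x) + b)
        w = w + alpha * err * x
        b = b + alpha * err
    return w, b

def epoch_batch(w, b, X, T, alpha=0.1):
    """Method 2: accumulate the updates over the epoch and apply them once."""
    dw = np.zeros_like(w, dtype=float)
    db = 0.0
    for x, t in zip(X, T):
        err = t - (np.dot(w, x) + b)   # weights stay fixed during the epoch
        dw += alpha * err * x
        db += alpha * err
    return w + dw, b + db
```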

Training Algorithm

Parameter Initialization
 • Learning Rate
   • Hecht-Nielsen
   • Practical value for a single neuron

Application

