
Neural Networks

Lecture (3)

Dr. Mona Nagy ElBedwehy


ADALINE
➢ The ADALINE stands for ADAptive LInear NEuron.
➢ Widrow and his graduate student Hoff introduced the ADALINE
network and its learning rule, which they called the LMS (Least
Mean Squares) algorithm, in 1960.
➢ The ADALINE uses bipolar activations for its input signals
and its target output.
➢ The weights on the connections from the input units to the
ADALINE are adjustable.
➢ The ADALINE also has a bias.

ADALINE
➢ An ADALINE is a special case of a single-layer net in which there is only one output unit.

➢ An ADALINE improves on the Perceptron model by including
two very important optimization concepts: the activation
function and the cost function.

➢ The activation function is the function that produces the value
passed on to the threshold function.
➢ The activation function works on the weighted sum of the features.

➢ For a single-layer ADALINE, we use the identity function,
which simply gives back what was sent to it.

ADALINE
➢ The cost function is the function that every machine learning
algorithm works to minimize; driving it down is what makes the
results accurate.
➢ ADALINE really deviates from the Perceptron here, because the
Perceptron has no cost function.
➢ This is why ADALINE is considered an improvement over the
Perceptron: you can see how its predictions improve simply by
watching its cost function.

ADALINE
➢ The error is the difference between the output of the net and the
target output.
➢ We would like to find the values of the network weights and
biases such that the sum of the squares of the errors is
minimized.
➢ An ADALINE will be trying to reduce the cost function. But
how exactly? That's where gradient descent comes in.
➢ The delta rule is derived from the gradient descent method.
➢ So, ADALINE can be trained using the delta rule, also known as
the least mean squares (LMS) or Widrow-Hoff rule.
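Concretely, the delta rule is obtained by differentiating the squared error for a single training pattern; a short sketch of the derivation, with the factor of 2 absorbed into the learning rate η:

```latex
E = (t - y_{in})^2, \qquad y_{in} = b + \sum_i x_i w_i
\frac{\partial E}{\partial w_i} = -2\,(t - y_{in})\,x_i
\Delta w_i = \eta\,(t - y_{in})\,x_i, \qquad \Delta b = \eta\,(t - y_{in})
```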
ADALINE
➢ The rule can be used for single-layer nets with several output
units.
➢ The learning rule minimizes the mean squared error between
the activation and target value.
➢ This allows the net to continue learning on all training patterns,
even after the correct output value for some patterns has been
generated.
➢ After training, if the net is used for pattern classification where
the desired output is either a +𝟏 or −𝟏, a threshold function is
applied to the net input to obtain the activation.

ADALINE
➢ If the net input to the ADALINE is greater than or equal to 𝟎,
then its activation is set to 𝟏; otherwise it is set to −𝟏.

ADALINE Architecture
An ADALINE is a single unit that receives input from several units.

Perceptron vs. ADALINE

ADALINE Training Algorithm
Algorithm 1
Step 1. Initialize the weights and bias.
        (Small random values are usually used, but not zero.)
        Set the learning rate η (0 < η ≤ 1).
Step 2. While the stopping condition is false, do Steps 3–7.
Step 3. For each bipolar training pair (s, t), do Steps 4–6.
Step 4. Set the activations of the input units, i = 1, …, n:
        x_i = s_i
Step 5. Compute the net input to the output unit:
        y_in = b + Σ_i x_i w_i

ADALINE Training Algorithm
Step 6. Update the weights and bias, i = 1, …, n:
        w_i(new) = w_i(old) + η (t − y_in) x_i
        b(new) = b(old) + η (t − y_in)
Step 7. Compute the total error, then test the stopping condition:
        E = Σ (t − y_in)²
        If the largest weight change that occurred in Step 3 is smaller
        than a specified tolerance, then stop; otherwise continue.
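To make Algorithm 1 concrete, here is a minimal Python sketch (our own code, not from the lecture). It assumes bipolar inputs and targets, uniform initial weights as in the example later in this lecture, and a fixed number of epochs as a simplified stopping condition in place of the weight-change tolerance.

```python
# ADALINE training sketch (delta rule / LMS), following Algorithm 1.
def adaline_train(samples, targets, eta=0.1, w0=0.1, b0=0.1, epochs=5):
    n = len(samples[0])
    w, b = [w0] * n, b0                        # Step 1: initialize weights and bias
    history = []                               # total squared error per epoch
    for _ in range(epochs):                    # Step 2: simplified stopping condition
        total = 0.0
        for x, t in zip(samples, targets):     # Steps 3-4: present each pair
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 5: net input
            delta = eta * (t - y_in)           # common factor of the delta rule
            w = [wi + delta * xi for wi, xi in zip(w, x)]    # Step 6: update weights
            b += delta                         # ... and bias
            total += (t - y_in) ** 2
        history.append(total)                  # Step 7: total error this epoch
    return w, b, history
```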

ADALINE Testing Algorithm
Algorithm 2
Step 1. Initialize the weights and bias with the values obtained from the ADALINE training algorithm.
Step 2. For each bipolar input vector x, do Steps 3–5.
Step 3. Set the activations of the input units to x.
Step 4. Compute the net input to the output unit:
        y_in = b + Σ_i x_i w_i
Step 5. Apply the activation function:
        y = f(y_in) = 1 if y_in ≥ 0, −1 if y_in < 0
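A matching Python sketch of Algorithm 2 (again our own code, under the same assumptions as the training sketch): the trained weights are reused and the bipolar threshold above is applied to the net input.

```python
# ADALINE testing sketch, following Algorithm 2.
def adaline_predict(x, w, b):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4: net input
    return 1 if y_in >= 0 else -1                    # Step 5: bipolar threshold
```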

ADALINE Example
Example (1): Implement the OR function using an ADALINE network
with bipolar inputs and targets.

ADALINE Example
Answer
➢ Consider the truth table for the bipolar OR function. (The net is a
single output unit Y with inputs x1, x2, weights w1, w2, and a bias.)

      x1    x2    Target t
       1     1     1
       1    −1     1
      −1     1     1
      −1    −1    −1

ADALINE Example
Answer
➢ The initial weights and bias are set to 0.1.
➢ The learning rate is set to 0.1.
➢ For the first input (1, 1) with target t = 1, we calculate the net input:
      y_in = b + Σ_i x_i w_i = b + x1 w1 + x2 w2
           = 0.1 + 1 × 0.1 + 1 × 0.1 = 0.3
➢ So, the net input is 0.3.

ADALINE Example
➢ Now update the weights and bias using:
      w_i(new) = w_i(old) + η (t − y_in) x_i
      b(new) = b(old) + η (t − y_in)

      w1(new) = w1(old) + Δw1 = 0.1 + 0.1 × (1 − 0.3) × 1 = 0.17
      w2(new) = w2(old) + Δw2 = 0.1 + 0.1 × (1 − 0.3) × 1 = 0.17
      b(new) = b(old) + Δb = 0.1 + 0.1 × (1 − 0.3) = 0.17

➢ Compute the error:
      E1 = (t − y_in)² = (1 − 0.3)² = 0.49

ADALINE Example
For the second input (1, −1), we calculate the net input:
      y_in = b + x1 w1 + x2 w2 = 0.17 + 1 × 0.17 + (−1) × 0.17 = 0.17
Now update the weights and bias:
      w1(new) = w1(old) + Δw1 = 0.17 + 0.1 × (1 − 0.17) × 1 = 0.253
      w2(new) = w2(old) + Δw2 = 0.17 + 0.1 × (1 − 0.17) × (−1) = 0.087
      b(new) = b(old) + Δb = 0.17 + 0.1 × (1 − 0.17) = 0.253


ADALINE Example
Compute the error:
      E2 = (t − y_in)² = (1 − 0.17)² = 0.69
For the third input (−1, 1), we calculate the net input:
      y_in = b + x1 w1 + x2 w2 = 0.253 + (−1) × 0.253 + 1 × 0.087 = 0.087
Now update the weights and bias:
      w1(new) = w1(old) + Δw1 = 0.253 + 0.1 × (1 − 0.087) × (−1) = 0.1617
      w2(new) = w2(old) + Δw2 = 0.087 + 0.1 × (1 − 0.087) × 1 = 0.1783
      b(new) = b(old) + Δb = 0.253 + 0.1 × (1 − 0.087) = 0.3443

ADALINE Example
Compute the error:
      E3 = (t − y_in)² = (1 − 0.087)² = 0.83
For the fourth input (−1, −1), we calculate the net input:
      y_in = b + x1 w1 + x2 w2 = 0.3443 + (−1) × 0.1617 + (−1) × 0.1783 = 0.0043
Now update the weights and bias:
      w1(new) = w1(old) + Δw1 = 0.1617 + 0.1 × (−1 − 0.0043) × (−1) = 0.2621
      w2(new) = w2(old) + Δw2 = 0.1783 + 0.1 × (−1 − 0.0043) × (−1) = 0.2787
      b(new) = b(old) + Δb = 0.3443 + 0.1 × (−1 − 0.0043) = 0.2439

ADALINE Example
Compute the error:
      E4 = (t − y_in)² = (−1 − 0.0043)² = 1.01

EPOCH 1

  x1    x2    t     y_in      Δw1       Δw2       Δb        w1      w2      bias    (t − y_in)²
   1     1    1     0.3       0.07      0.07      0.07      0.17    0.17    0.17    0.49
   1    −1    1     0.17      0.083    −0.083     0.083     0.253   0.087   0.253   0.69
  −1     1    1     0.087    −0.0913    0.0913    0.0913    0.1617  0.1783  0.3443  0.83
  −1    −1   −1     0.0043    0.1004    0.1004   −0.1004    0.2621  0.2787  0.2439  1.01

Compute the total error, then test the stopping condition:
      E_T = Σ (t − y_in)² = 0.49 + 0.69 + 0.83 + 1.01 = 3.02
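As a sanity check, here is a usage sketch of the adaline_train function from the training-algorithm section, run on this bipolar OR data; with the same initialization it should reproduce the per-epoch totals in these tables, up to rounding.

```python
# Run the ADALINE training sketch on the bipolar OR data.
or_inputs  = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
or_targets = [1, 1, 1, -1]
w, b, errs = adaline_train(or_inputs, or_targets, eta=0.1, epochs=5)
print([round(e, 3) for e in errs])  # ≈ [3.02, 1.938, 1.551, 1.417, 1.377]
```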

ADALINE Example
EPOCH 2
For the first input (1, 1), we calculate the net input:
      y_in = b + x1 w1 + x2 w2 = 0.2439 + 1 × 0.2621 + 1 × 0.2787 = 0.7847
Now update the weights and bias:
      w1(new) = w1(old) + Δw1 = 0.2621 + 0.1 × (1 − 0.7847) × 1 = 0.2836
      w2(new) = w2(old) + Δw2 = 0.2787 + 0.1 × (1 − 0.7847) × 1 = 0.3002
      b(new) = b(old) + Δb = 0.2439 + 0.1 × (1 − 0.7847) = 0.2654

ADALINE Example
Compute the error:
      E1 = (t − y_in)² = (1 − 0.7847)² = 0.046

EPOCH 2

  x1    x2    t     y_in      Δw1       Δw2       Δb        w1      w2      bias    (t − y_in)²
   1     1    1     0.7847    0.0215    0.0215    0.0215    0.2836  0.3002  0.2654  0.046
   1    −1    1     0.2488    0.0751   −0.0751    0.0751    0.3588  0.2251  0.3405  0.564
  −1     1    1     0.2069   −0.0793    0.0793    0.0793    0.2795  0.3044  0.4198  0.629
  −1    −1   −1    −0.1641    0.0836    0.0836   −0.0836    0.3631  0.3880  0.336   0.699

Compute the total error, then test the stopping condition:
      E_T = Σ (t − y_in)² = 0.046 + 0.564 + 0.629 + 0.699 = 1.938

ADALINE Example
EPOCH 3

  x1    x2    t     y_in      Δw1       Δw2       Δb        w1      w2      bias    (t − y_in)²
   1     1    1     1.0873   −0.0087   −0.0087   −0.0087    0.3543  0.3793  0.3275  0.0076
   1    −1    1     0.3025    0.0697   −0.0697    0.0697    0.4241  0.3096  0.3973  0.487
  −1     1    1     0.2827   −0.0717    0.0717    0.0717    0.3523  0.3813  0.469   0.515
  −1    −1   −1    −0.2647    0.0735    0.0735   −0.0735    0.4259  0.4548  0.3954  0.541

Compute the total error, then test the stopping condition:
      E_T = Σ (t − y_in)² = 0.0076 + 0.487 + 0.515 + 0.541 = 1.551

ADALINE Example
EPOCH 4

  x1    x2    t     y_in      Δw1       Δw2       Δb        w1      w2      bias    (t − y_in)²
   1     1    1     1.2761   −0.0276   −0.0276   −0.0276    0.3983  0.4272  0.3678  0.076
   1    −1    1     0.3389    0.0661   −0.0661    0.0661    0.4644  0.3611  0.4339  0.437
  −1     1    1     0.3307   −0.0669    0.0669    0.0669    0.3974  0.428   0.5009  0.448
  −1    −1   −1    −0.3246    0.0675    0.0675   −0.0675    0.465   0.4956  0.4333  0.456

Compute the total error, then test the stopping condition:
      E_T = Σ (t − y_in)² = 0.076 + 0.437 + 0.448 + 0.456 = 1.417

ADALINE Example
EPOCH 5

  x1    x2    t     y_in      Δw1       Δw2       Δb        w1      w2      bias    (t − y_in)²
   1     1    1     1.3939   −0.0394   −0.0394   −0.0394    0.4256  0.4562  0.393   0.155
   1    −1    1     0.3634    0.0637   −0.0637    0.0637    0.4893  0.3925  0.457   0.405
  −1     1    1     0.3609   −0.0639    0.0639    0.0639    0.4253  0.4564  0.5215  0.408
  −1    −1   −1    −0.3603    0.064     0.064    −0.064     0.4893  0.5204  0.4575  0.409

Compute the total error, then test the stopping condition:
      E_T = Σ (t − y_in)² = 0.155 + 0.405 + 0.408 + 0.409 = 1.377

ADALINE Example
Summary Results

  Epoch      Total Squared Error E_T
  Epoch 1    3.02
  Epoch 2    1.938
  Epoch 3    1.5506
  Epoch 4    1.417
  Epoch 5    1.377

Quiz (1)
Use the ADALINE network to train the AND NOT function with bipolar
inputs and targets. Perform two epochs of training.
[Note: the initial weights and bias are taken to be 0.2. The learning
rate is taken to be 0.2.]

Quiz (2)
Check the validity of Hebb's rule for the following cases:
A. A 3-input AND gate with binary inputs and bipolar output.
B. A 3-input AND gate with bipolar inputs and bipolar output.

MADALINE
➢ A MADALINE consists of many ADALINEs arranged in a
multilayer net.
➢ We can think of a MADALINE as having a hidden layer of ADALINEs.
➢ Each ADALINE unit has a bias.
➢ In the simplest net considered here, there are two hidden ADALINEs.
➢ There is a single output ADALINE Y.
➢ Each ADALINE simply applies a threshold function to the unit's
net input.
➢ Y is a non-linear function of the input vector (x1, x2).

MADALINE
➢ The use of hidden units gives the net additional power, but
makes training more complicated.
➢ The weights between the input layer and the hidden layer are
adjusted, while the weights between the hidden layer and the
output layer are fixed.
➢ The output unit may use the majority-vote rule, so the answer it
gives is either true or false.
➢ Each ADALINE and MADALINE layer neuron has a bias of 1
connected to it.

MADALINE Architecture


MADALINE Algorithm
➢ The MRI algorithm is the original form of MADALINE training
[Widrow and Hoff, 1960]; only the weights for the hidden
ADALINEs are adjusted, while the weights for the output unit are
fixed.
➢ The MRII algorithm [Widrow, Winter, and Baxter, 1987]
provides a method for adjusting all weights in the net.

MRI Algorithm
➢ The weights and the bias that feed into the output unit are
determined so that the response of the output unit is +1 if the
signal it receives from either or both hidden ADALINEs is +1,
and is −1 if both hidden ADALINEs send a signal of −1.
➢ In other words, the output unit performs the logic function OR
on the signals it receives from the two hidden ADALINEs.
➢ The weights and the bias into the output unit are all set to 0.5.

MRI Algorithm
Algorithm 3
Step 1. Set the weights v1, v2 and the bias that feed into the output
        unit as described above.
        Small random values are used for the ADALINE weights.
        Set the learning rate η (0 < η ≤ 1).
Step 2. While the stopping condition is false, do Steps 3–9.
Step 3. For each bipolar training pair (s, t), do Steps 4–8.
Step 4. Set the activations of the input units, i = 1, …, n:
        x_i = s_i
Step 5. Compute the net input to each hidden ADALINE:
        z_in1 = b1 + x1 w11 + x2 w21
        z_in2 = b2 + x1 w12 + x2 w22


MRI Algorithm
Step 6. Determine the output of each hidden ADALINE:
        z1 = f(z_in1)
        z2 = f(z_in2)
Step 7. Determine the output of the net:
        y_in = b3 + z1 v1 + z2 v2
        y = f(y_in)

MRI Algorithm
Step 8. Determine the error and update the weights:
        If t = y, no weight updates are performed.
        If t = 1, then update the weights on Z_j, the unit whose
        net input is closest to 0:
            w_ij(new) = w_ij(old) + η (1 − z_inj) x_i
            b_j(new) = b_j(old) + η (1 − z_inj)
        If t = −1, then update the weights on all units Z_k that
        have positive net input:
            w_ik(new) = w_ik(old) + η (−1 − z_ink) x_i
            b_k(new) = b_k(old) + η (−1 − z_ink)

MRI Algorithm
Step 9. Test the stopping condition.
        If the weight changes have stopped (or reached an acceptable
        level), or if the maximum number of weight-update iterations
        has been reached, then stop; otherwise continue.
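The following is a minimal sketch of one MRI weight update in Python (names like mri_update are ours, not from the lecture), under the assumptions used here: two hidden ADALINEs Z1 and Z2, bipolar signals, and an output unit fixed to compute OR with v1 = v2 = b3 = 0.5.

```python
# One MRI update over a single training pair (Steps 4-8 of Algorithm 3).
# W[i][j] is the weight from input x_i to hidden unit Z_j; b[j] is Z_j's bias.
def bipolar(v):
    return 1 if v >= 0 else -1

def mri_update(x, t, W, b, eta=0.5):
    z_in = [b[j] + x[0] * W[0][j] + x[1] * W[1][j] for j in range(2)]  # Step 5
    z = [bipolar(v) for v in z_in]                 # Step 6
    y = bipolar(0.5 + 0.5 * z[0] + 0.5 * z[1])     # Step 7: fixed OR output unit
    if t != y:                                     # Step 8: update only on error
        if t == 1:
            # Update only the unit whose net input is closest to 0.
            units = [min(range(2), key=lambda k: abs(z_in[k]))]
        else:
            # t == -1: update every unit whose net input is positive.
            units = [j for j in range(2) if z_in[j] > 0]
        for j in units:
            delta = eta * (t - z_in[j])            # t is +1 or -1 here
            W[0][j] += delta * x[0]
            W[1][j] += delta * x[1]
            b[j] += delta
    return W, b
```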

MRI Algorithm

If t = 1 and an error has occurred, it means that all Z units had the
value −1 and at least one Z unit needs to have the value +1. So we
take Z_j to be the Z unit whose net input is closest to 0 and adjust
its weights.

If t = −1 and an error has occurred, it means that at least one Z
unit had the value +1 and all Z units need to have the value −1. So
we adjust the weights on all of the Z units with positive net input.

MRI Example
Example (2)
Implement the XOR function by using MRI to train a MADALINE network
with bipolar inputs and targets.

MRI Example
Answer
➢ Consider the truth table for the bipolar XOR function:

      x1    x2    Target t
       1     1    −1
       1    −1     1
      −1     1     1
      −1    −1    −1

➢ Initialize the weights v1, v2 and the bias b3 to 0.5.
➢ The weights into Z1 and Z2 are
      w11 = 0.05,  w21 = 0.2,  b1 = 0.3
      w12 = 0.1,   w22 = 0.2,  b2 = 0.15
➢ Set the learning rate η = 0.5.

MRI Example
For the first input (1, 1), t = −1:
      z_in1 = b1 + x1 w11 + x2 w21 = 0.3 + 0.05 × 1 + 0.2 × 1 = 0.55
      z_in2 = b2 + x1 w12 + x2 w22 = 0.15 + 0.1 × 1 + 0.2 × 1 = 0.45
Determine the output of each hidden ADALINE:
      z1 = f(z_in1) = f(0.55) = 1
      z2 = f(z_in2) = f(0.45) = 1
Determine the output of the net:
      y_in = b3 + z1 v1 + z2 v2 = 0.5 + 0.5 × 1 + 0.5 × 1 = 1.5
      y = f(y_in) = f(1.5) = 1

MRI Example
Since t − y = −1 − 1 = −2 ≠ 0, an error has occurred.
Because t = −1, we update the weights on all units (Z1 and Z2) that
have positive net input.
Here both net inputs are positive, so we update the weights and
biases of both hidden units.

MRI Example
Update the weights on Z1 as follows:
      w11(new) = w11(old) + η (−1 − z_in1) x1
               = 0.05 + 0.5 × (−1 − 0.55) × 1 = −0.725
      w21(new) = w21(old) + η (−1 − z_in1) x2
               = 0.2 + 0.5 × (−1 − 0.55) × 1 = −0.575
      b1(new) = b1(old) + η (−1 − z_in1)
              = 0.3 + 0.5 × (−1 − 0.55) = −0.475

MRI Example
Update the weights on Z2 as follows:
      w12(new) = w12(old) + η (−1 − z_in2) x1
               = 0.1 + 0.5 × (−1 − 0.45) × 1 = −0.625
      w22(new) = w22(old) + η (−1 − z_in2) x2
               = 0.2 + 0.5 × (−1 − 0.45) × 1 = −0.525
      b2(new) = b2(old) + η (−1 − z_in2)
              = 0.15 + 0.5 × (−1 − 0.45) = −0.575

MRI Example
For the second input (1, −1), t = 1:
      z_in1 = b1 + x1 w11 + x2 w21 = −0.475 + (−0.725) × 1 + (−0.575) × (−1) = −0.625
      z_in2 = b2 + x1 w12 + x2 w22 = −0.575 + (−0.625) × 1 + (−0.525) × (−1) = −0.675
Determine the output of each hidden ADALINE:
      z1 = f(z_in1) = f(−0.625) = −1,   z2 = f(z_in2) = f(−0.675) = −1
Determine the output of the net:
      y_in = b3 + z1 v1 + z2 v2 = 0.5 + 0.5 × (−1) + 0.5 × (−1) = −0.5
      y = f(y_in) = f(−0.5) = −1

MRI Example
Since t − y = 1 + 1 = 2 ≠ 0, an error has occurred.
Because t = 1, we update the weights on the unit whose net input is
closest to 0. Here z_in1 = −0.625 is closer to 0 than z_in2 = −0.675,
so we update the weights on Z1:
      w11(new) = w11(old) + η (1 − z_in1) x1
               = −0.725 + 0.5 × (1 + 0.625) × 1 = 0.0875
      w21(new) = w21(old) + η (1 − z_in1) x2
               = −0.575 + 0.5 × (1 + 0.625) × (−1) = −1.3875
      b1(new) = b1(old) + η (1 − z_in1) = −0.475 + 0.5 × (1 + 0.625) = 0.3375

MRI Example
For the third input (−1, 1), t = 1:
      z_in1 = b1 + x1 w11 + x2 w21 = 0.3375 + 0.0875 × (−1) + (−1.3875) × 1 = −1.1375
      z_in2 = b2 + x1 w12 + x2 w22 = −0.575 + (−0.625) × (−1) + (−0.525) × 1 = −0.475
Determine the output of each hidden ADALINE:
      z1 = f(z_in1) = f(−1.1375) = −1,   z2 = f(z_in2) = f(−0.475) = −1
Determine the output of the net:
      y_in = b3 + z1 v1 + z2 v2 = 0.5 + 0.5 × (−1) + 0.5 × (−1) = −0.5
      y = f(y_in) = f(−0.5) = −1

MRI Example
Since t − y = 1 + 1 = 2 ≠ 0, an error has occurred.
Because t = 1, we update the weights on the unit whose net input is
closest to 0. Here z_in2 = −0.475 is closer to 0 than z_in1 = −1.1375,
so we update the weights on Z2:
      w12(new) = w12(old) + η (1 − z_in2) x1
               = −0.625 + 0.5 × (1 + 0.475) × (−1) = −1.3625
      w22(new) = w22(old) + η (1 − z_in2) x2
               = −0.525 + 0.5 × (1 + 0.475) × 1 = 0.2125
      b2(new) = b2(old) + η (1 − z_in2) = −0.575 + 0.5 × (1 + 0.475)
              = 0.1625

MRI Example
For the fourth input (−1, −1), t = −1:
      z_in1 = b1 + x1 w11 + x2 w21
            = 0.3375 + 0.0875 × (−1) + (−1.3875) × (−1) = 1.6375
      z_in2 = b2 + x1 w12 + x2 w22
            = 0.1625 + (−1.3625) × (−1) + 0.2125 × (−1) = 1.3125
Determine the output of each hidden ADALINE:
      z1 = f(z_in1) = f(1.6375) = 1,   z2 = f(z_in2) = f(1.3125) = 1
Determine the output of the net:
      y_in = b3 + z1 v1 + z2 v2 = 0.5 + 0.5 × 1 + 0.5 × 1 = 1.5
      y = f(y_in) = f(1.5) = 1
MRI Example
Since t − y = −1 − 1 = −2 ≠ 0, an error has occurred.
Because t = −1, we update the weights on all units (Z1 and Z2) that
have positive net input.
Here both net inputs are positive, so we update the weights and
biases of both hidden units.

MRI Example
Update the weights on Z1 as follows:
      w11(new) = w11(old) + η (−1 − z_in1) x1
               = 0.0875 + 0.5 × (−1 − 1.6375) × (−1) = 1.40625
      w21(new) = w21(old) + η (−1 − z_in1) x2
               = −1.3875 + 0.5 × (−1 − 1.6375) × (−1) = −0.06875
      b1(new) = b1(old) + η (−1 − z_in1) = 0.3375 + 0.5 × (−1 − 1.6375)
              = −0.98125

MRI Example
Update the weights on Z2 as follows:
      w12(new) = w12(old) + η (−1 − z_in2) x1
               = −1.3625 + 0.5 × (−1 − 1.3125) × (−1) = −0.20625
      w22(new) = w22(old) + η (−1 − z_in2) x2
               = 0.2125 + 0.5 × (−1 − 1.3125) × (−1) = 1.36875
      b2(new) = b2(old) + η (−1 − z_in2) = 0.1625 + 0.5 × (−1 − 1.3125)
              = −0.99375

MRI Example
Summary of the first epoch (net inputs, output, and the weights after each training pair):

  Inputs      t    z_in1     z_in2     y      w11       w21       b1        w12       w22       b2
  ( 1,  1)   −1    0.55      0.45      1     −0.725    −0.575    −0.475    −0.625    −0.525    −0.575
  ( 1, −1)    1   −0.625    −0.675    −1      0.0875   −1.3875    0.3375   −0.625    −0.525    −0.575
  (−1,  1)    1   −1.1375   −0.475    −1      0.0875   −1.3875    0.3375   −1.3625    0.2125    0.1625
  (−1, −1)   −1    1.6375    1.3125    1      1.40625  −0.06875  −0.98125  −0.20625   1.36875  −0.99375
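For completeness, a usage sketch driving one epoch of the mri_update function from the MRI algorithm section over these XOR pairs; with the initial weights of this example it should reproduce the values in the table above.

```python
# One epoch of MRI on bipolar XOR with the initial weights of Example (2).
W = [[0.05, 0.1],   # weights from x1 into (Z1, Z2)
     [0.2,  0.2]]   # weights from x2 into (Z1, Z2)
b = [0.3, 0.15]     # biases of (Z1, Z2)
xor_pairs = [((1, 1), -1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
for x, t in xor_pairs:
    W, b = mri_update(x, t, W, b, eta=0.5)
print(W, b)  # end-of-epoch weights, matching the table above
```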

What do ADALINE and the Perceptron have in common?
➢ Both are classifiers for binary classification, and both have a
linear decision boundary.
➢ Both can learn iteratively, sample by sample (the Perceptron
naturally, and ADALINE via stochastic gradient descent).

➢ Both use a threshold function.

The differences between the Perceptron and ADALINE
➢ The Perceptron uses the predicted class labels to learn its model
coefficients.
➢ ADALINE uses continuous predicted values (from the net input)
to learn the model coefficients, which is more "powerful" since
it tells us by "how much" we were right or wrong.
➢ So, in the Perceptron we simply use the predicted class labels
to update the weights, while in ADALINE we use a continuous
response.
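To make the contrast concrete, here is a side-by-side sketch of a single-sample update in each model (helper names are ours); the only difference is whether the error term uses the thresholded label y or the continuous net input y_in.

```python
# Single-sample update rules, side by side.
def perceptron_step(w, b, x, t, eta):
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))
    y = 1 if y_in >= 0 else -1                 # thresholded class label
    err = t - y                                # Perceptron: label-based error
    return [wi + eta * err * xi for wi, xi in zip(w, x)], b + eta * err

def adaline_step(w, b, x, t, eta):
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))
    err = t - y_in                             # ADALINE: continuous error
    return [wi + eta * err * xi for wi, xi in zip(w, x)], b + eta * err
```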
