Lec7 Introduction To Neural Network

The document discusses learning rules in neural networks, focusing on the Perceptron architecture and its learning rule introduced by Frank Rosenblatt. It outlines the steps involved in neural network learning, including weight initialization, output calculation, and weight updates based on target outputs. Additionally, it covers the Delta learning rule and backpropagation as advanced methods for training neural networks, emphasizing the importance of learning rates and the limitations of the Perceptron learning rule.

BIO3603: Medical Pattern Recognition

Lecture 7
Dr. Lamees Nasser
E-mail: [email protected]
Third Year - Biomedical Engineering Department
Academic Year 2024-2025

12/1/2024
Learning Rules in Neural Networks
Perceptron Architecture

• The output of the network is given by

$$y = f\left(\sum_{i=1}^{m} x_i w_i + b\right)$$

An artificial neuron: the basic unit of a neural network.
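As a sketch, the forward computation above can be written in Python, assuming the hard-limit activation used later in this lecture (the function and variable names here are illustrative, not from the slides):

```python
import numpy as np

def hardlim(n):
    """Hard limit transfer function: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def neuron_output(x, w, b):
    """y = f(sum_{i=1}^{m} x_i * w_i + b) for a single artificial neuron."""
    return hardlim(np.dot(w, x) + b)

# Illustrative inputs, weights, and bias
x = np.array([1.0, 2.0])
w = np.array([1.0, -0.8])
b = 0.0
print(neuron_output(x, w, b))  # 1.0*1 + (-0.8)*2 + 0 = -0.6, hardlim(-0.6) = 0
```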


Learning Rules in Neural Networks

• A procedure for modifying the weights and biases of a network. (This procedure may also be referred to as a training algorithm.)

• The purpose of the learning rule is to train the network to perform some task.

• There are many types of neural network learning rules. Commonly used learning rules are:

• Perceptron learning rule
• Delta learning rule
Neural Network Learning Steps

1. Initialize all weights and biases to small random values, typically in [-1, 1].
2. Present a training sample and pass it through the network.
3. Calculate the network output:
• Inputs applied
• Multiplied by weights
• Summed
• Activation function applied
4. Compare the network output with the target output.
5. Update the weights and biases of the neural network.
6. Return to step 2 and continue iterating until the model is considered good.
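The steps above can be sketched as a training loop in Python. This is a minimal sketch with a hard-limit activation and a perceptron-style update; the AND data, learning rate, and epoch count are illustrative choices, not from the slides:

```python
import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

def train(X, T, eta=0.1, n_epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize weights and bias to small random values in [-1, 1]
    w = rng.uniform(-1, 1, size=X.shape[1])
    b = rng.uniform(-1, 1)
    for _ in range(n_epochs):               # step 6: keep iterating
        for x, t in zip(X, T):              # step 2: present a training sample
            a = hardlim(np.dot(w, x) + b)   # step 3: calculate the output
            e = t - a                       # step 4: compare with the target
            w = w + eta * e * x             # step 5: update weights and bias
            b = b + eta * e
    return w, b

# Toy linearly separable problem (logical AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 0, 0, 1])
w, b = train(X, T)
print([hardlim(np.dot(w, x) + b) for x in X])  # [0, 0, 0, 1]
```

Since AND is linearly separable, the perceptron convergence theorem guarantees the loop above eventually classifies every sample correctly, after which the weights stop changing.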
Perceptron Learning Rule

• In the late 1950s, Frank Rosenblatt introduced a learning rule for training perceptron networks to solve pattern recognition problems.

• It uses the hard limit transfer function as the activation of the output neuron. Therefore, the perceptron output is limited to either 1 or 0.

• This learning rule is an example of supervised training, in which the learning rule is provided with a set of examples of proper network behavior:

$$\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$$

where $\mathbf{p}_q$ is an input to the network and $\mathbf{t}_q$ is the corresponding target output.

• As each input is applied to the network, the network output is compared to the target. The learning rule then adjusts the weights and biases of the network in order to move the network output closer to the target.
Perceptron Learning Rule (cont'd)

The input/target pairs for our test problem are

$$\left\{\mathbf{p}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, t_1 = 1\right\}, \quad \left\{\mathbf{p}_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}, t_2 = 0\right\}, \quad \left\{\mathbf{p}_3 = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, t_3 = 0\right\}$$

Perceptron Learning Rule (cont'd)

• If the bias $b = 0$, then the decision boundary must pass through the origin.

• Find a decision boundary that separates the vectors $\mathbf{p}_2$ and $\mathbf{p}_3$ from the vector $\mathbf{p}_1$.

• There are indeed an infinite number of such boundaries.
Perceptron Learning Rule (cont'd)

• The figure shows the weight vectors that correspond to the allowable decision boundaries. (Recall that the weight vector is orthogonal to the decision boundary.)

• Find a weight vector that points in one of these directions.
Constructing Learning Rules

1- Training begins by assigning random initial values to the weights:

$${}_1\mathbf{w}^T = \begin{bmatrix} 1.0 & -0.8 \end{bmatrix}$$

2- Presenting the input vector $\mathbf{p}_1$ to the network:

$$a = \mathrm{hardlim}\left({}_1\mathbf{w}^T \mathbf{p}_1\right) = \mathrm{hardlim}\left(\begin{bmatrix} 1.0 & -0.8 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right) = \mathrm{hardlim}(-0.6) = 0$$

The network output is 0, while the target output is 1.

Incorrect classification.
Constructing Learning Rules (cont'd)

Update the weights by adding $\mathbf{p}_1$ to $\mathbf{w}$:

If $t = 1$ and $a = 0$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} + \mathbf{p}$

$${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} + \mathbf{p}_1 = \begin{bmatrix} 1.0 \\ -0.8 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2.0 \\ 1.2 \end{bmatrix}$$

Presenting the input vector $\mathbf{p}_2$ to the network:

$$a = \mathrm{hardlim}\left({}_1\mathbf{w}^T \mathbf{p}_2\right) = \mathrm{hardlim}\left(\begin{bmatrix} 2.0 & 1.2 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \end{bmatrix}\right) = \mathrm{hardlim}(0.4) = 1$$

The network output is 1, while the target output is 0.

Incorrect classification.
Constructing Learning Rules (cont'd)
Update the weights by subtracting $\mathbf{p}_2$ from $\mathbf{w}$:

If $t = 0$ and $a = 1$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} - \mathbf{p}$

$${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} - \mathbf{p}_2 = \begin{bmatrix} 2.0 \\ 1.2 \end{bmatrix} - \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3.0 \\ -0.8 \end{bmatrix}$$

Presenting the input vector $\mathbf{p}_3$ to the network:

$$a = \mathrm{hardlim}\left({}_1\mathbf{w}^T \mathbf{p}_3\right) = \mathrm{hardlim}\left(\begin{bmatrix} 3.0 & -0.8 \end{bmatrix}\begin{bmatrix} 0 \\ -1 \end{bmatrix}\right) = \mathrm{hardlim}(0.8) = 1$$

The network output is 1, while the target output is 0.

Incorrect classification.
Constructing Learning Rules (cont'd)

Update the weights by subtracting $\mathbf{p}_3$ from $\mathbf{w}$:

$${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} - \mathbf{p}_3 = \begin{bmatrix} 3.0 \\ -0.8 \end{bmatrix} - \begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 3.0 \\ 0.2 \end{bmatrix}$$

All patterns are now correctly classified.

If $t = a$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old}$.
Unified Learning Rule
• Here are the three rules, which cover all possible combinations of output and target values:

If $t = 1$ and $a = 0$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} + \mathbf{p}$.
If $t = 0$ and $a = 1$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} - \mathbf{p}$.
If $t = a$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old}$.

Define $e = t - a$:

If $e = 1$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} + \mathbf{p}$.
If $e = -1$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} - \mathbf{p}$.
If $e = 0$, then ${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old}$.

$${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} + e\mathbf{p} = {}_1\mathbf{w}^{old} + (t - a)\mathbf{p}$$

$$b^{new} = b^{old} + e$$

(A bias is a weight with an input of 1.)
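The hand calculation from the preceding slides can be reproduced with the unified rule $\mathbf{w}^{new} = \mathbf{w}^{old} + e\mathbf{p}$. A sketch in Python (the bias is omitted, as in the worked example):

```python
import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

# Input/target pairs and initial weights from the worked example
patterns = [(np.array([1.0, 2.0]), 1),
            (np.array([-1.0, 2.0]), 0),
            (np.array([0.0, -1.0]), 0)]
w = np.array([1.0, -0.8])

for p, t in patterns:
    a = hardlim(np.dot(w, p))  # network output
    e = t - a                  # e = 1, -1, or 0 selects +p, -p, or no change
    w = w + e * p

print(w)  # [3.  0.2] -- the final weights from the hand calculation
```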
Perceptron Learning Rule Steps

1. Initialize the weights to small random numbers (between −1 and +1).
2. For each training sample $\mathbf{x}^{(i)}$, compute the output value.
   a. If the output is incorrect, update the weights:
   $$w_i^{new} = w_i^{old} + e \cdot x_i$$
3. Once the modification to the weights has taken place, the next sample of training data is used in the same way.
4. Iterate until all the weights are correct and all errors are zero.

• Epoch – a single presentation of the entire data set to the neural network. Typically, many epochs are required to train the neural network.

• Iteration – the process of providing the network with a single input and updating the network's weights.
Limitations of Perceptron Learning Rule

• If the training data set is not linearly separable, then the perceptron algorithm will not converge (it will never classify all the samples 100% correctly).

• So, we need to add a condition to stop the training, such as:

• Put a limit on the number of iterations, so that the algorithm will terminate even if the sample set is not linearly separable.

• Include an error bound; the algorithm can stop as soon as the portion of misclassified samples is less than this bound. This idea is developed in the Delta Learning Rule.
Delta Learning Rule

• In 1960, Bernard Widrow and his student Marcian Hoff introduced the delta learning rule (also known as the least mean square (LMS) algorithm or the Widrow-Hoff algorithm) for training neural networks.

• The delta rule can be derived for any differentiable output/activation function.

• The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors to find the weights that best fit the training examples (i.e., minimize the error function).

• The delta rule is considered to be a special case of the backpropagation algorithm.
Delta Learning Rule (cont'd)

• The delta rule is derived by attempting to minimize the error in the output of the neural network through gradient descent. There are many ways to define this error; one common measure is the squared difference between the target output and the obtained value:

$$E(\mathbf{w}) = \frac{1}{2}\sum_{d \in D}\left(t_d - o_d\right)^2$$

• $D$ is the set of training samples.

• $t_d$ is the target output for training example $d$.

• $o_d$ is the network output for training example $d$.
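This error measure is straightforward to compute directly; a minimal sketch (the target and output values below are made-up numbers for illustration):

```python
import numpy as np

def squared_error(T, O):
    """E(w) = 1/2 * sum over d in D of (t_d - o_d)^2."""
    T, O = np.asarray(T), np.asarray(O)
    return 0.5 * np.sum((T - O) ** 2)

# Three training samples: E = 0.5 * (0.2^2 + 0.1^2 + 0.4^2) = 0.105
print(squared_error([1.0, 0.0, 1.0], [0.8, 0.1, 0.6]))
```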


Delta Learning Rule (cont'd)

• This learning rule can also be written:

$$W_i^{new} = W_i^{old} - \eta \cdot \frac{\partial E}{\partial W_i}$$

[Figure: gradient descent on the error curve $E(w)$. Starting from an initial weight, a negative gradient (slope) gives $W^{new} = W^{old} - \eta \cdot (-\text{ve})$, increasing the weight; a positive gradient gives $W^{new} = W^{old} - \eta \cdot (+\text{ve})$, decreasing it. Both cases move the weight toward the global minimum.]

Delta Learning Rule (cont'd)

• $\eta$ is a positive constant called the learning rate, which determines the step size in the gradient descent search.

• The negative sign is present because we want to move the weight vector in the direction that decreases $E$.

• $\frac{\partial E}{\partial W_i}$ (the partial derivative of $E$ with respect to $W_i$) is the change in the prediction error $E$ given a change in the weight $W_i$.
Delta Learning Rule - Derivation

With the linear activation function $o(\vec{x}) = \vec{w} \cdot \vec{x}$ and

$$E(\mathbf{w}) = \frac{1}{2}\sum_{d \in D}\left(t_d - o_d\right)^2$$

the gradient is

$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\frac{1}{2}\sum_{d \in D}\left(t_d - o_d\right)^2 = \frac{1}{2}\sum_{d \in D}\frac{\partial}{\partial w_i}\left(t_d - o_d\right)^2$$

$$= \frac{1}{2}\sum_{d \in D}2\left(t_d - o_d\right)\frac{\partial}{\partial w_i}\left(t_d - \vec{w} \cdot \vec{x}_d\right) = \sum_{d \in D}\left(t_d - o_d\right)\left(-x_{id}\right)$$

Substituting into $W_i^{new} = W_i^{old} - \eta \cdot \frac{\partial E}{\partial W_i}$ gives

$$W_i^{new} = W_i^{old} + \eta \sum_{d \in D}\left(t_d - o_d\right)x_{id}$$
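The resulting batch update can be sketched in Python with a linear unit. The training data below is an illustrative regression problem (targets generated exactly by $w^* = [2, -1]$), not from the lecture:

```python
import numpy as np

def delta_rule_epoch(w, X, T, eta):
    """One batch update: w_i <- w_i + eta * sum_d (t_d - o_d) * x_id."""
    O = X @ w                      # linear activation: o(x) = w . x
    return w + eta * X.T @ (T - O)

# Illustrative data; each target is t = 2*x1 - 1*x2
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
T = np.array([2.0, -1.0, 1.0, 3.0])

w = np.zeros(2)
for _ in range(200):
    w = delta_rule_epoch(w, X, T, eta=0.05)
print(np.round(w, 3))  # [ 2. -1.]
```

With a learning rate small enough for this data, repeated epochs drive the squared error toward its minimum and the weights toward $[2, -1]$.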


The Learning Rate
a) When the learning rate is optimal, the model converges to the minimum.
b) When the learning rate is too small, training takes more time but still converges to the minimum.
c) When the learning rate is higher than the optimal value, the model overshoots but converges.
d) When the learning rate is very large, the model overshoots and diverges, moving away from the minimum, and performance decreases as learning proceeds.

The most commonly used rates are: 0.001, 0.003, 0.01, 0.03, 0.1, 0.3.
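Regimes (b)-(d) can be observed on a simple one-dimensional error surface $E(w) = w^2$, whose gradient is $2w$. The specific rates below are illustrative (for this surface, any $\eta > 1$ diverges):

```python
def final_distance(eta, w0=1.0, steps=50):
    """Run gradient descent on E(w) = w^2 and return |w| after `steps` updates."""
    w = w0
    for _ in range(steps):
        w = w - eta * 2 * w  # W_new = W_old - eta * dE/dW
    return abs(w)

too_small = final_distance(0.01)  # shrinks slowly, still well away from 0
good      = final_distance(0.3)   # converges rapidly toward 0
too_large = final_distance(1.1)   # |1 - 2*eta| > 1: overshoots and diverges
print(too_small, good, too_large)
```

Each update multiplies $w$ by $(1 - 2\eta)$, so the three rates give shrinking, fast-shrinking, and growing $|w|$ respectively.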
Backpropagation learning algorithm

• Backpropagation is a supervised learning algorithm for training multilayer perceptrons.

• It is a generalization of the delta learning rule.

• In multilayer networks with nonlinear activation functions, the relationship between the network weights and the error is more complex.

• In order to calculate the derivatives, we need to use the chain rule of calculus.
