Lecture 7: Introduction to Neural Networks
Dr. Lamees Nasser
E-mail: [email protected]
Third Year - Biomedical Engineering Department
Academic Year 2024-2025
Learning Rules in Neural Networks
Perceptron Architecture
$$y = f\left(\sum_{i=1}^{n} x_i w_i + b\right)$$
• The purpose of the learning rule is to train the network to perform some task.
• There are many types of neural network learning rules; this lecture covers two commonly used ones, the perceptron learning rule and the delta learning rule. The general training procedure is as follows (a short code sketch follows the list):
1. Initialize all weights and biases to small random values, typically ∈ [-1, 1].
2. Present a training sample and pass it through the network
3. Calculate the network output
• Inputs applied
• Multiplied by weights
• Summed
• Activation function applied
4. Compare network output with target output
5. Update the weights and biases of the neural network
6. Return to step 2 and continue iterating until the model performs well enough.
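As a concrete illustration of steps 1-6, here is a minimal sketch in Python. The two-input data set, the random seed, and the epoch limit are illustrative assumptions, and step 5 uses the perceptron update rule $e = t - a$ developed later in this lecture.

```python
import numpy as np

def hardlim(n):
    """Hard-limit activation: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

rng = np.random.default_rng(0)

# Step 1: initialize weights and bias to small random values in [-1, 1]
w = rng.uniform(-1, 1, size=2)
b = rng.uniform(-1, 1)

# Toy training set (assumed for illustration): inputs and binary targets
P = np.array([[1.0, 2.0], [-1.0, 2.0], [0.0, -1.0]])
T = np.array([1, 0, 0])

for epoch in range(100):
    errors = 0
    for p, t in zip(P, T):      # Step 2: present a training sample
        a = hardlim(w @ p + b)  # Step 3: weighted sum + activation
        e = t - a               # Step 4: compare output with target
        w = w + e * p           # Step 5: update weights ...
        b = b + e               # ...     and bias
        errors += abs(e)
    if errors == 0:             # Step 6: iterate until the model is good
        break

print("weights:", w, "bias:", b)
```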
Perceptron Learning Rule
The perceptron is trained on a set of examples of proper network behavior:

$$\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$$

where $\mathbf{p}_q$ is an input to the network and $\mathbf{t}_q$ is the corresponding target output.
Starting from an initial weight vector

$${}_1\mathbf{w}^T = \begin{bmatrix} 1.0 & -0.8 \end{bmatrix}$$

present the first input $\mathbf{p}_1$:

$$a = \mathrm{hardlim}\left({}_1\mathbf{w}^T \mathbf{p}_1\right) = \mathrm{hardlim}\left(\begin{bmatrix} 1.0 & -0.8 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right) = \mathrm{hardlim}(-0.6) = 0$$
The network output is 0, while the target output is 1: an incorrect classification. Since $t_1 = 1$ and $a = 0$, we update by adding $\mathbf{p}_1$ to the weight vector:

$${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} + \mathbf{p}_1 = \begin{bmatrix} 1.0 \\ -0.8 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2.0 \\ 1.2 \end{bmatrix}$$
Constructing Learning Rules (cont'd)
Present the second input $\mathbf{p}_2$:

$$a = \mathrm{hardlim}\left({}_1\mathbf{w}^T \mathbf{p}_2\right) = \mathrm{hardlim}\left(\begin{bmatrix} 2.0 & 1.2 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \end{bmatrix}\right) = \mathrm{hardlim}(0.4) = 1$$
The network output is 1, while the target output is 0: an incorrect classification.
Constructing Learning Rules (cont'd)
Since $t_2 = 0$ and $a = 1$, update the weights by subtracting $\mathbf{p}_2$ from ${}_1\mathbf{w}$:

$${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} - \mathbf{p}_2 = \begin{bmatrix} 2.0 \\ 1.2 \end{bmatrix} - \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3.0 \\ -0.8 \end{bmatrix}$$
Present the third input $\mathbf{p}_3$:

$$a = \mathrm{hardlim}\left({}_1\mathbf{w}^T \mathbf{p}_3\right) = \mathrm{hardlim}\left(\begin{bmatrix} 3.0 & -0.8 \end{bmatrix} \begin{bmatrix} 0 \\ -1 \end{bmatrix}\right) = \mathrm{hardlim}(0.8) = 1$$
Define the error $e = t - a$. The three update cases (add $\mathbf{p}$ when $e = 1$, subtract $\mathbf{p}$ when $e = -1$, leave the weights unchanged when $e = 0$) can then be written as one rule:

$${}_1\mathbf{w}^{new} = {}_1\mathbf{w}^{old} + e\,\mathbf{p}, \qquad b^{new} = b^{old} + e$$
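The unified rule reproduces the three hand-worked iterations above. The sketch below assumes, in addition to the targets stated in the example ($t_1 = 1$, $t_2 = 0$), a target of $t_3 = 0$ for the third pattern, which the slides do not state.

```python
import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

# Patterns and targets from the worked example; t3 = 0 is an assumption,
# since the slides do not state the target for p3.
P = [np.array([1.0, 2.0]), np.array([-1.0, 2.0]), np.array([0.0, -1.0])]
T = [1, 0, 0]

w = np.array([1.0, -0.8])            # initial weight vector 1w

for p, t in zip(P, T):
    a = hardlim(w @ p)               # network output
    e = t - a                        # error e = t - a
    w = w + e * p                    # unified rule: w_new = w_old + e*p
    print(f"a = {a}, e = {e:+d}, w = {w}")

# Expected trace:
#   a = 0, e = +1, w = [2.0, 1.2]    (add p1)
#   a = 1, e = -1, w = [3.0, -0.8]   (subtract p2)
#   a = 1, e = -1, w = [3.0, 0.2]    (subtract p3, under the assumed t3 = 0)
```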
• If the training data set is not linearly separable, the perceptron algorithm will not converge (it can never classify all samples correctly).
• If we include an error bound, the algorithm can stop as soon as the portion of misclassified samples falls below this bound (see the sketch after this list). This idea is developed further in the delta learning rule.
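A minimal sketch of this stopping criterion, assuming an illustrative data set, initial weights, and a bound of 0.05 (none of these values are from the slides):

```python
import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

def misclassification_rate(w, b, P, T):
    """Portion of samples the current weights classify incorrectly."""
    return np.mean([hardlim(w @ p + b) != t for p, t in zip(P, T)])

# Illustrative data, initial weights, and error bound (assumed values)
P = np.array([[1.0, 2.0], [-1.0, 2.0], [0.0, -1.0]])
T = np.array([1, 0, 0])
w, b = np.array([1.0, -0.8]), 0.0
error_bound = 0.05

for epoch in range(100):
    for p, t in zip(P, T):
        e = t - hardlim(w @ p + b)
        w, b = w + e * p, b + e
    # Stop as soon as the portion of misclassified samples is below the bound
    if misclassification_rate(w, b, P, T) < error_bound:
        break
```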
Delta Learning Rule
• The key idea behind the delta rule is to use gradient descent to search the
hypothesis space of possible weight vectors to find the weights that best
fit the training examples (minimize the error function).
• The delta rule is derived by attempting to minimize the error in the output of the neural network through gradient descent. There are many ways to define this error; one common measure is the squared difference between the target output and the obtained value:
$$E(\mathbf{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$
The weights are then adjusted in the direction of the negative gradient:

$$w_i^{new} = w_i^{old} - \eta\, \frac{\partial E}{\partial w_i}$$

where $\eta$ is the learning rate.
Gradient Descent
[Figure: two plots of the error surface $E(w)$ versus the weight $w$, each marking an initial weight from which gradient descent moves downhill toward the minimum.]
• The negative sign is present because we want to move the weight vector
in the direction that decreases E.
• $\dfrac{\partial E}{\partial w_i}$ (the partial derivative of $E$ with respect to $w_i$) measures the change in the prediction error $E$ given a change in the weight $w_i$.
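A tiny numerical sketch of this update: the code below applies $w^{new} = w^{old} - \eta\,\partial E/\partial w$ to a one-dimensional quadratic error surface. The error function $E(w) = (w - 3)^2$, the starting point, and the learning rate are illustrative assumptions.

```python
# Gradient descent on E(w) = (w - 3)^2, whose gradient is dE/dw = 2*(w - 3).
# The error function, starting point, and learning rate are illustrative.

def dE_dw(w):
    return 2.0 * (w - 3.0)

w = -4.0      # initial weight
eta = 0.1     # learning rate

for step in range(50):
    w = w - eta * dE_dw(w)     # w_new = w_old - eta * dE/dw

print(w)      # converges toward the minimum at w = 3
```

Because the step is proportional to the negative gradient, the updates are large far from the minimum and shrink as $w$ approaches it.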
Delta Learning Rule - Derivation
With a linear activation function, the output is $o(\vec{x}) = \vec{w} \cdot \vec{x}$, and the error is

$$E(\mathbf{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

Differentiating with respect to $w_i$:

$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\, \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 = \frac{1}{2} \sum_{d \in D} \frac{\partial}{\partial w_i} (t_d - o_d)^2$$

$$= \frac{1}{2} \sum_{d \in D} 2\,(t_d - o_d)\, \frac{\partial}{\partial w_i}(t_d - o_d) = \sum_{d \in D} (t_d - o_d)\, \frac{\partial}{\partial w_i}\left(t_d - \vec{w} \cdot \vec{x}_d\right)$$

$$\frac{\partial E}{\partial w_i} = \sum_{d \in D} (t_d - o_d)\,(-x_{id})$$

Substituting into the update rule $w_i^{new} = w_i^{old} - \eta\, \frac{\partial E}{\partial w_i}$ gives

$$w_i^{new} = w_i^{old} + \eta \sum_{d \in D} (t_d - o_d)\, x_{id}$$
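The final expression can be implemented directly as batch gradient descent for a linear unit. A minimal sketch, assuming a synthetic data set and illustrative values for the learning rate and epoch count:

```python
import numpy as np

# Batch delta rule for a linear unit o(x) = w . x, following the derivation:
# dE/dw = -sum_d (t_d - o_d) * x_d, so w += eta * sum_d (t_d - o_d) * x_d.
# The data set, learning rate, and epoch count are illustrative assumptions.

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 2))   # inputs x_d
true_w = np.array([2.0, -1.0])         # weights used to generate targets
T = X @ true_w                         # targets t_d

w = rng.uniform(-1, 1, size=2)         # small random initial weights
eta = 0.05                             # learning rate

for epoch in range(200):
    O = X @ w                          # linear outputs o_d = w . x_d
    grad = -(T - O) @ X                # dE/dw = -sum_d (t_d - o_d) x_d
    w = w - eta * grad                 # w_new = w_old - eta * dE/dw

print(w)   # approaches the generating weights [2.0, -1.0]
```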