
501582-3 Neural Networks

Perceptron

Dr. Huda Hakami


Department of Computer Science, Taif University
Introduction
• Rosenblatt’s Perceptron:
• Rosenblatt (1958) proposed the perceptron as the first model for learning with a teacher
(i.e., supervised learning).
• The first algorithmically described neural network
• A bio-inspired algorithm that tries to mimic a single neuron
• Occupies a special place in the historical development of neural networks
• Considers only one training instance at a time (online learning)
• Error-driven learning: we learn only if we make a mistake when classifying with the current
weight vector. Otherwise, we make no adjustment to the weight vector.
Motivating Example
• Each day you get lunch at the cafeteria.
• Your diet consists of fish, chips, and Pepsi.
• You get several portions of each
• The cashier only tells you the total price of the meal
• After several days, you should be able to know the price of each portion.
• Each meal price gives a linear constraint on the prices (w) of the portions (x):

  price = x_fish · w_fish + x_chips · w_chips + x_pepsi · w_pepsi

• The prices of the portions are like the weights of a linear neuron.

• We will start with guesses for the weights and then adjust the guesses to give a better fit to the
prices given by the cashier.
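A minimal sketch of this idea, assuming a delta-rule-style adjustment (the slide does not name the exact rule, and the prices, meals, and learning rate below are purely illustrative): guess the portion prices, compare the predicted total with the cashier's total, and nudge each guessed price in proportion to the portions ordered.

```python
# Illustrative only: the "true" prices are unknown to the learner.
true_w = [1.50, 0.50, 1.00]                              # fish, chips, pepsi
meals = [[1, 2, 0], [0, 1, 2], [2, 0, 1], [1, 1, 1]]     # portions per meal
totals = [sum(x * w for x, w in zip(m, true_w)) for m in meals]  # cashier's totals

w = [0.5, 0.5, 0.5]    # initial guesses for the portion prices
eta = 0.05             # learning rate (assumed value)

for _ in range(1000):                      # repeat over many "days"
    for x, t in zip(meals, totals):
        y = sum(xi * wi for xi, wi in zip(x, w))              # predicted meal price
        error = t - y
        w = [wi + eta * error * xi for xi, wi in zip(x, w)]   # adjust each guess
print([round(wi, 2) for wi in w])          # settles near the true portion prices
```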
The Artificial Perceptron

a = b + Σ xᵢ wᵢ   (sum over i = 1, …, n: the weighted inputs plus the bias b)

The perceptron is an algorithm for supervised learning of binary linear
classifiers: functions that can decide whether an input (represented by
a vector of numbers) belongs to one class or another.
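A minimal sketch of this computation (the function and the numbers are illustrative, not from the slides):

```python
def perceptron_predict(x, w, b):
    """Return +1 or -1 for input vector x given weights w and bias b."""
    a = b + sum(xi * wi for xi, wi in zip(x, w))   # a = b + sum_i x_i * w_i
    return 1 if a >= 0 else -1

print(perceptron_predict([2.0, -1.0], [0.5, 0.3], b=-0.2))   # -> +1
```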
Perceptron Training
How does the perceptron learn its classification tasks?

• This is done by making small adjustments in the weights to reduce the difference between the
actual and desired outputs of the perceptron.

• The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to
obtain the output consistent with the training examples.

• The perceptron learns classification tasks through multiple iterations. Each iteration includes a
weight-adjustment step.
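For instance, the random initialization described above might be written as follows (a sketch; the slides do not prescribe a particular library or number of inputs):

```python
import random

n_inputs = 2
w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]   # initial weights in [-0.5, 0.5]
print(w)
```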
Perceptron Learning Algorithm
Perceptron Learning Algorithm (Cont.)
• Detecting error
Desired (given) label y | Predicted (actual) label ŷ | Update the weights w? | Action
+1 | sign(a) = +1 | No error -> no update |
-1 | sign(a) = -1 | No error -> no update |
+1 | sign(a) = -1 | Misclassification     | Positive error: we need to increase ŷ
-1 | sign(a) = +1 | Misclassification     | Negative error: we need to decrease ŷ

error = y − ŷ
Perceptron Learning Algorithm (Cont.)
• Update rule - Intuitive Explanation
• The update rule applies if we have a misclassification (i.e., if y·a < 0):
• Incorrectly classify a positive instance as negative:
  • We should increase the activation (wᵀx)
  • ADD the current instance to the weight vector
• Incorrectly classify a negative instance as positive:
  • We should decrease the activation (wᵀx)
  • DEDUCT the current instance from the weight vector
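A minimal sketch of this rule, assuming labels y ∈ {+1, −1} (the function name is illustrative):

```python
def perceptron_update(w, x, y):
    """Apply the perceptron update for one instance (x, y), with y in {+1, -1}."""
    a = sum(xi * wi for xi, wi in zip(x, w))       # activation w^T x
    if y * a < 0:                                  # misclassified
        # y = +1: add x to w (raises the activation); y = -1: subtract x (lowers it)
        w = [wi + y * xi for wi, xi in zip(w, x)]
    return w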
Perceptron Learning Algorithm (Cont.)
• Update rule - Math Explanation
• If the misclassified instance is a positive one, then after we update using w' = w + x,
the new activation a' is greater than the old activation a.
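The algebra behind this claim, reconstructed here since the slide's own derivation is not shown in this extract:

```latex
a' = \mathbf{w}'^{\top}\mathbf{x}
   = (\mathbf{w} + \mathbf{x})^{\top}\mathbf{x}
   = \mathbf{w}^{\top}\mathbf{x} + \mathbf{x}^{\top}\mathbf{x}
   = a + \lVert \mathbf{x} \rVert^{2} \;\ge\; a
```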
Perceptron Learning Algorithm (Cont.)
• Update rule - Math Explanation
• Show that the analysis in the previous slide holds when y = -1 (i.e., we misclassified a negative
instance).
• Order of training instances:
• Randomly shuffling the training instances within each iteration improves performance
• Presenting all the positives first and then all the negatives is a bad idea
Perceptron Algorithm (Compact)
• Variables and parameters:
• Given training data (x, y) where:
  input vector: x = [+1, x₁, x₂, …, xₙ]
  desired response y: +1 or −1
• Weight vector: w = [b, w₁, w₂, …, wₙ]
• Learning rate hyperparameter 0 ≤ η ≤ 1
Perceptron Algorithm (Compact)
• Step 1: Initialization
• Set the initial weight vector to zero or to random numbers in a range [-1,+1] or [-0.5,+0.5],
then perform the following computations for time-step n = 1, 2, ....
• Step 2: Activation
• Compute the activation a and the actual output ŷ for an instance as follows:
  ŷ = sgn(wᵀx)
• Step 3: Adaptation of weight vector
• Apply the error-correction learning rule:
  w_new = w_old + η (y − ŷ) x
• Continuation: Increment time step by one and go back to step 2.
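A minimal, self-contained sketch of this compact algorithm (the function names, toy dataset, and hyperparameter values are illustrative, not from the slides):

```python
def sgn(a):
    """Sign activation: +1 if a >= 0, else -1."""
    return 1 if a >= 0 else -1

def train_perceptron(X, Y, eta=0.1, epochs=100):
    """X: list of input vectors (without bias); Y: desired responses in {+1, -1}."""
    X = [[1.0] + list(x) for x in X]        # prepend +1 so that w[0] plays the role of the bias b
    w = [0.0] * len(X[0])                   # Step 1: initialise the weight vector
    for _ in range(epochs):
        for x, y in zip(X, Y):
            a = sum(xi * wi for xi, wi in zip(x, w))          # Step 2: activation
            y_hat = sgn(a)
            # Step 3: error-correction rule  w_new = w_old + eta * (y - y_hat) * x
            w = [wi + eta * (y - y_hat) * xi for wi, xi in zip(w, x)]
    return w

# Toy linearly separable data: two points per class
X = [[2, 1], [1, 3], [-1, -2], [-2, -1]]
Y = [+1, +1, -1, -1]
w = train_perceptron(X, Y)
print(w)   # a weight vector (bias first) that separates the two classes
```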
Linear Separability: math review
• Linear functions are those whose graph is a straight line.
• A linear function has the following form: y = f(x) = ax + b, where a and b are
constants, often real numbers.
• A linear function has one independent variable and one dependent variable
• For a function f(x₁, x₂, …, xₙ) of any finite number of independent variables, the general
formula is f(x₁, x₂, …, xₙ) = a₁x₁ + a₂x₂ + … + aₙxₙ + b
Perceptron: Linear Separability Concept
• A single perceptron can only be used to implement linearly separable functions
• For the perceptron to function properly, the two classes c1 and c2 must be linearly separable.
• That is, the patterns to be classified must be sufficiently separated from each other to ensure that
the decision surface consists of a hyperplane
Perceptron: Linear Separability Concept
• The training process involves the adjustment of the weight vector w in such a way that the two
classes c1 and c2 are linearly separable.
• That is, there exists a weight vector w such that we may state:
  wᵀx > 0 for every input vector x belonging to class c1
  wᵀx ≤ 0 for every input vector x belonging to class c2
• Thus, the decision in the perceptron is made depending on the sign of wᵀx
• Therefore, wᵀx = 0 is the decision boundary (it defines the hyperplane)
Perceptron: Linear Separability Concept
• The decision in the perceptron is made depending on the sign of wᵀx
• Therefore, wᵀx = 0 is the decision boundary (it defines the hyperplane)
• Example:
• In 2D space we have w₁x₁ + w₂x₂ = 0, a straight line through the origin (ignoring the bias)
• In N-dimensional space this is an (N−1)-dimensional hyperplane
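A tiny sketch of this decision rule in 2D (the weight values are illustrative):

```python
w = [1.0, -2.0]       # weights defining the line w1*x1 + w2*x2 = 0 (bias ignored, as above)

def side(x1, x2):
    """Report which side of the hyperplane w^T x = 0 a point lies on."""
    a = w[0] * x1 + w[1] * x2
    return "positive side" if a > 0 else ("negative side" if a < 0 else "on the boundary")

print(side(3.0, 1.0))   # 3 - 2 = 1  -> positive side
print(side(2.0, 1.0))   # 2 - 2 = 0  -> on the boundary
print(side(1.0, 1.0))   # 1 - 2 = -1 -> negative side
```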
Geometric representation of Hyperplane
• The hyperplane defined by the weight vector is perpendicular to the weight vector

[Figure: the hyperplane wᵀx = 0 is drawn perpendicular to the weight vector w. A positive instance x
is misclassified as negative because wᵀx < 0 (why?). The new weight vector w' is the addition w + x
according to the perceptron update rule; x will be classified as positive by w' (why?).]
Perceptron: Linear Separability Concept
• A perceptron can learn the logical operators AND and OR, but cannot learn Exclusive-OR (XOR). Why?
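A small illustrative sketch (not from the slides) of what this means in practice: training the sign perceptron on XOR never reaches an error-free epoch, because no hyperplane separates the two classes.

```python
def sgn(a):
    return 1 if a >= 0 else -1

# XOR with +1/-1 labels; the leading +1 in each input acts as the bias term
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
Y = [-1, +1, +1, -1]

w, eta = [0.0, 0.0, 0.0], 0.1
for epoch in range(100):
    mistakes = 0
    for x, y in zip(X, Y):
        y_hat = sgn(sum(xi * wi for xi, wi in zip(x, w)))
        if y_hat != y:
            mistakes += 1
            w = [wi + eta * (y - y_hat) * xi for wi, xi in zip(w, x)]
print(mistakes)   # still at least 1 mistake per epoch: the perceptron cannot represent XOR
```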
Linear Separability: Remarks
• When a dataset is linearly separable, there can exist more than one hyperplane that separates
the dataset into positive/negative groups (the separating hyperplane is not unique)
• However, (by definition) if a dataset is not linearly separable, then there exists NO hyperplane that
separates the dataset into positive/negative groups.
• When a dataset is linearly separable, it can be proved that the perceptron will always find a
separating hyperplane!
• The final weight vector returned by the perceptron is more influenced by the last training
instances it sees.
• Averaging over all the weight vectors seen during training addresses this
(the averaged perceptron algorithm); see the sketch below.
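A minimal sketch of the averaged perceptron (illustrative; the slides only name the idea). The returned average is then used in place of the final w at prediction time.

```python
def averaged_perceptron(X, Y, eta=0.1, epochs=10):
    """X: list of inputs with a leading +1 bias term; Y: labels in {+1, -1}."""
    n = len(X[0])
    w = [0.0] * n
    w_sum = [0.0] * n          # running sum of the weight vector after every instance
    count = 0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            a = sum(xi * wi for xi, wi in zip(x, w))
            y_hat = 1 if a >= 0 else -1
            if y_hat != y:
                w = [wi + eta * (y - y_hat) * xi for wi, xi in zip(w, x)]
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            count += 1
    return [s / count for s in w_sum]   # averaged weight vector
```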
Example of Perceptron Learning
• Logical AND operator:
• Suppose: the initial weights are w1=0.3 and w2 =-0.1, threshold: 𝜃=0.2 and learning rate 0.1
• Activation function: step function (1/true or zero/false)
• After the initialization, the perceptron is activated by the sequence of four input patterns
representing an epoch.

[Figure: inputs X1 (weight w1 = 0.3) and X2 (weight w2 = -0.1) feed a summation unit followed by a
step activation producing the actual/resulting output ŷ, which is compared with the desired/target output.]


Example of Perceptron Learning
Epoch | Iteration | Inputs X1 X2 | Desired output Yd | Initial weights w1 w2 | Actual output ŷ | Error e | Final weights w1 w2
  1   |     1     |    0   0     |        0          |     0.3   -0.1        |                 |         |
  1   |     2     |    0   1     |        0          |                       |                 |         |
  1   |     3     |    1   0     |        0          |                       |                 |         |
  1   |     4     |    1   1     |        1          |                       |                 |         |
  2   |     5     |    0   0     |        0          |                       |                 |         |
  2   |     6     |    0   1     |        0          |                       |                 |         |
  2   |     7     |    1   0     |        0          |                       |                 |         |
  2   |     8     |    1   1     |        1          |                       |                 |         |
  3   |     9     |    0   0     |        0          |                       |                 |         |
  3   |    10     |    0   1     |        0          |                       |                 |         |
  3   |    11     |    1   0     |        0          |                       |                 |         |
  3   |    12     |    1   1     |        1          |                       |                 |         |
  4   |    13     |    0   0     |        0          |                       |                 |         |
  4   |    14     |    0   1     |        0          |                       |                 |         |
  4   |    15     |    1   0     |        0          |                       |                 |         |
  4   |    16     |    1   1     |        1          |                       |                 |         |

Notes: An iteration is every single repetition of the process; an epoch is the presentation of the
entire training set to the ANN during the training process. Threshold: θ = 0.2; learning rate: 0.1.
Example of Perceptron Learning
Epoch | Iteration | Inputs X1 X2 | Desired output Yd | Initial weights w1 w2 | Actual output ŷ | Error e | Final weights w1 w2
  1   |     1     |    0   0     |        0          |     0.3   -0.1        |       0         |    0    |     0.3   -0.1
  1   |     2     |    0   1     |        0          |     0.3   -0.1        |       0         |    0    |     0.3   -0.1
  1   |     3     |    1   0     |        0          |     0.3   -0.1        |       1         |   -1    |     0.2   -0.1
  1   |     4     |    1   1     |        1          |     0.2   -0.1        |       0         |    1    |     0.3    0.0
Update rule: w_new = w_old + η (Yd − ŷ) x

Iteration 1:
• 0 * 0.3 + 0 * (-0.1) - 0.2 = -0.2 -> step(-0.2) = 0 (negative)
• error = 0 - 0 = 0 (no update for w1 and w2)
Iteration 2:
• 0 * 0.3 + 1 * (-0.1) - 0.2 = -0.3 -> step(-0.3) = 0 (negative)
• error = 0 - 0 = 0 (no update)
Iteration 3:
• 1 * 0.3 + 0 * (-0.1) - 0.2 = 0.1 -> step(0.1) = 1 (positive)
• error = 0 - 1 = -1 (apply update rule)
  w1 = 0.3 + (0.1 * -1 * 1) = 0.2
  w2 = -0.1 + (0.1 * -1 * 0) = -0.1
Example of Perceptron Learning
Epoch | Iteration | Inputs X1 X2 | Desired output Yd | Initial weights w1 w2 | Actual output ŷ | Error e | Final weights w1 w2
  1   |     1     |    0   0     |        0          |     0.3   -0.1        |       0         |    0    |     0.3   -0.1
  1   |     2     |    0   1     |        0          |     0.3   -0.1        |       0         |    0    |     0.3   -0.1
  1   |     3     |    1   0     |        0          |     0.3   -0.1        |       1         |   -1    |     0.2   -0.1
  1   |     4     |    1   1     |        1          |     0.2   -0.1        |       0         |    1    |     0.3    0.0
  2   |     5     |    0   0     |        0          |     0.3    0.0        |       0         |    0    |     0.3    0.0
  2   |     6     |    0   1     |        0          |     0.3    0.0        |       0         |    0    |     0.3    0.0
  2   |     7     |    1   0     |        0          |     0.3    0.0        |       1         |   -1    |     0.2    0.0
  2   |     8     |    1   1     |        1          |     0.2    0.0        |       1         |    0    |     0.2    0.0
  3   |     9     |    0   0     |        0          |     0.2    0.0        |       0         |    0    |     0.2    0.0
  3   |    10     |    0   1     |        0          |     0.2    0.0        |       0         |    0    |     0.2    0.0
  3   |    11     |    1   0     |        0          |     0.2    0.0        |       1         |   -1    |     0.1    0.0
  3   |    12     |    1   1     |        1          |     0.1    0.0        |       0         |    1    |     0.2    0.1
  4   |    13     |    0   0     |        0          |     0.2    0.1        |       0         |    0    |     0.2    0.1
  4   |    14     |    0   1     |        0          |     0.2    0.1        |       0         |    0    |     0.2    0.1
  4   |    15     |    1   0     |        0          |     0.2    0.1        |       1         |   -1    |     0.1    0.1
  4   |    16     |    1   1     |        1          |     0.1    0.1        |       1         |    0    |     0.1    0.1
  5   |    17     |    0   0     |        0          |     0.1    0.1        |       0         |    0    |     0.1    0.1
  5   |    18     |    0   1     |        0          |     0.1    0.1        |       0         |    0    |     0.1    0.1
  5   |    19     |    1   0     |        0          |     0.1    0.1        |       0         |    0    |     0.1    0.1
  5   |    20     |    1   1     |        1          |     0.1    0.1        |       1         |    0    |     0.1    0.1
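A minimal sketch that reproduces this table (illustrative code; it follows the slide's conventions: step activation against threshold θ = 0.2, learning rate 0.1, update wᵢ ← wᵢ + η·e·xᵢ):

```python
def step(a):
    """Step activation: 1 if a >= 0, else 0."""
    return 1 if a >= 0 else 0

# Logical AND training set: inputs (X1, X2) and desired outputs Yd
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0.3, -0.1]          # initial weights from the example
theta, eta = 0.2, 0.1    # threshold and learning rate from the example

for epoch in range(1, 6):
    for (x1, x2), yd in data:
        # round() guards against float round-off when the activation sits exactly at the threshold
        y = step(round(x1 * w[0] + x2 * w[1] - theta, 9))
        e = yd - y                     # error
        w[0] += eta * e * x1           # weight update: w_i <- w_i + eta * e * x_i
        w[1] += eta * e * x2
    print(epoch, [round(wi, 1) for wi in w])
# Ends with w1 = 0.1, w2 = 0.1, matching the last row of the table above
```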
Example of Perceptron Learning: OR
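The OR slide's content is not reproduced in this extract. A hedged sketch of the same procedure applied to the logical OR operator, keeping θ = 0.2 and learning rate 0.1 and assuming the same starting weights (the slide's actual initial values are not shown here):

```python
def step(a):
    return 1 if a >= 0 else 0

# Logical OR training set: inputs (X1, X2) and desired outputs Yd
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, theta, eta = [0.3, -0.1], 0.2, 0.1   # initial weights assumed, not from the slide

for epoch in range(10):
    errors = 0
    for (x1, x2), yd in data:
        y = step(round(x1 * w[0] + x2 * w[1] - theta, 9))
        e = yd - y
        errors += abs(e)
        w[0] += eta * e * x1
        w[1] += eta * e * x2
    if errors == 0:          # stop once a whole epoch passes with no mistakes
        break
print([round(wi, 1) for wi in w])   # e.g. [0.3, 0.2]: weights that realise OR with threshold 0.2
```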
