UNIT1 Perceptron MLP


Perceptron

 Proposed by Frank Rosenblatt; later refined and carefully analyzed by Minsky and Papert.
 A more general computational model than the McCulloch-Pitts neuron.
 It overcomes some of the limitations of the M-P neuron by introducing the concept of numerical weights (a measure of importance) for inputs, and a mechanism for learning those weights.
 Inputs are no longer limited to Boolean values as in the M-P neuron; the perceptron supports real-valued inputs as well, which makes it more useful and general.
Now, this is very similar to an M-P neuron, but we take a weighted sum of the inputs and set the output to 1 only when the sum is more than an arbitrary threshold (theta).

 Instead of hand-coding the thresholding parameter theta, we add it as one of the inputs, with a fixed value of 1 and weight -theta, which makes it learnable.

EXAMPLE: Consider the task of predicting whether you would watch a random game of football on TV or not
using the behavioral data available. And let's assume your decision is solely dependent on 3 binary inputs
(binary for simplicity).
Here,
 w_0 is called the bias because it represents the prior
(prejudice).
A football freak may have a very low threshold and may watch any football game irrespective of the league, club or importance of the game [theta = 0]. On the other hand, a selective viewer may only watch a game that is a Premier League game, features Man United, and is not a friendly [theta = 2].
 So, weights and the bias will depend on the data.
Based on the data, if needed the model may have to give a lot of
importance (high weight) to the isManUnitedPlaying input and
penalize the weights of other inputs.
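A minimal sketch of this example in Python; the input names and weights below are hypothetical and hand-picked, whereas in practice the weights and the bias (w0 = -theta) would be learned from the data:

```python
# Minimal sketch of the football example. Input names and weights are assumed
# for illustration; the bias corresponds to w0 = -theta.
def perceptron(inputs, weights, bias):
    # weighted sum of inputs plus bias, followed by hard thresholding at 0
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0

weights = [1, 1, -1]   # [isPremierLeagueOn, isManUnitedPlaying, isFriendlyGame]
bias = -2              # selective viewer: theta = 2

print(perceptron([1, 1, 0], weights, bias))  # Premier League, Man United, not a friendly -> 1 (watch)
print(perceptron([0, 1, 0], weights, bias))  # Man United but not Premier League -> 0 (skip)
```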
Perceptron vs McCulloch-Pitts Neuron
What kind of functions can be implemented using a perceptron? How different is it from McCulloch-Pitts
neurons?

 A perceptron also separates the input space into two halves,


positive and negative.
 All the inputs that produce an output 1 lie on one side
(positive half space) and all the inputs that produce an output
0 lie on the other side (negative half space).
 In other words, a single perceptron can only be used to
implement linearly separable functions, just like the M-P
neuron.
 The weights, including the threshold can be learned and the
inputs can be real values.

Perceptron for Binary Classification


With this discrete output, controlled by the activation function, the perceptron can be used as a binary classification model, defining a linear decision boundary. It finds a separating hyperplane that minimizes the distance between misclassified points and the decision boundary.
EXAMPLE: OR Function
A possible solution can be obtained by solving the linear system of inequalities given by the four input combinations of the OR function. It is clear that the solution separates the input space into the negative and positive half spaces.
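Since the worked solution is not reproduced here, below is one possible set of weights (assumed for illustration) that satisfies the OR conditions, verified in Python:

```python
# One possible solution for the OR function (weights assumed for illustration):
# bias w0 = -1 and w1 = w2 = 2, so the weighted sum is >= 0 exactly when x1 OR x2 = 1.
def perceptron_or(x1, x2):
    return 1 if (-1 + 2 * x1 + 2 * x2) >= 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron_or(x1, x2))  # matches the OR truth table
```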

Perceptron Limitation: The perceptron's harsh thresholding logic


results in sudden binary decisions (e.g., 0.49 -> "No", 0.51 ->
"Yes"), which may not suit real-world applications requiring
smoother decision transitions.

Sigmoid Function Advantage: Sigmoid neurons introduce a


smoother, continuous, and differentiable "S"-shaped output that
maps inputs to probabilities between 0 and 1, providing gradual
decision changes instead of sharp transitions.
REAL TIME EXAMPLE:
Setting Up The Problem

We are going to use a perceptron to estimate whether I will watch a movie, based on historical data with the above-mentioned inputs. The data has positive and negative examples, the positive examples being the movies I watched (label 1). Based on the data, we are going to learn the weights using the perceptron learning algorithm. For visual simplicity, we will assume two-dimensional inputs.
Perceptron Learning Algorithm
Our goal is to find the w vector that can perfectly classify the positive inputs and the negative inputs in our data.

 We initialize w with some random vector.
 Iterate over all the examples in the data, P ∪ N (both positive and negative examples).
 If an input x belongs to P, we want w.x >= 0.
 If x belongs to N, we want w.x < 0.
Case 1: When x belongs to P and its dot product w.x < 0 (update: w = w + x)
Case 2: When x belongs to N and its dot product w.x >= 0 (update: w = w - x)
Only for these cases do we update our randomly initialized w. Otherwise, we don't touch w at all, because Cases 1 and 2 violate the very rule of a perceptron.
Why Would The Specified Update Rule Work?
We have already established that when x belongs to P, we want w.x >= 0 (the basic perceptron rule). What we also mean by that is that when x belongs to P, the angle between w and x should be _____ than 90 degrees. Fill in the blank.
Answer: The angle between w and x should be less than 90 degrees, because cos(angle) = w.x / (|w||x|), so the dot product and the cosine of the angle always have the same sign.

So any w vector will do, as long as it makes an angle of less than 90 degrees with the positive example vectors (x ∈ P) and an angle of more than 90 degrees with the negative example vectors (x ∈ N).

So we now strongly believe that the angle between w and x should be less than 90 degrees when x belongs to the P class, and more than 90 degrees when x belongs to the N class. Here's why the update works:
When we add x to w (which we do when x belongs to P and w.x < 0), the new vector is w_new = w + x, and w_new.x = w.x + x.x. Since x.x > 0, the dot product increases, so cos(α) increases and α moves towards being less than 90 degrees. Subtracting x has the opposite effect.
In short,
Case 1: positive input, x belongs to P
w = w + x
cos(α) increases
i.e., α moves below 90 degrees
Case 2: negative input, x belongs to N
w = w - x
cos(α) decreases
i.e., α moves above 90 degrees
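Putting the rule together, here is a sketch of the perceptron learning algorithm in Python; the two-dimensional data at the bottom is hypothetical, and a constant 1 is appended to each input so the bias is learned as w0:

```python
import random

def train_perceptron(P, N, max_epochs=1000):
    """Perceptron learning algorithm: P holds positive examples, N negative ones."""
    dim = len(P[0])
    w = [random.uniform(-1, 1) for _ in range(dim)]   # random initialisation
    for _ in range(max_epochs):
        converged = True
        for x in P + N:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            if x in P and dot < 0:                    # Case 1: add x to w
                w = [wi + xi for wi, xi in zip(w, x)]
                converged = False
            elif x in N and dot >= 0:                 # Case 2: subtract x from w
                w = [wi - xi for wi, xi in zip(w, x)]
                converged = False
        if converged:                                  # every example satisfies the rule
            return w
    return w

# Hypothetical 2-D data, with a constant 1 appended so w0 acts as the bias.
P = [[1.0, 2.0, 1.0], [2.0, 3.0, 1.0]]      # positive examples (watched)
N = [[-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]]  # negative examples (skipped)
print(train_perceptron(P, N))
```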
XOR Function — Can't Do!
 XOR is a non-linearly separable Boolean function, i.e., you cannot draw a single line to separate the positive inputs from the negative ones.
 Writing out the perceptron conditions for the four XOR inputs (bias w0, weights w1 and w2):
w0 < 0 (XOR(0,0) = 0)
w0 + w2 >= 0 (XOR(0,1) = 1)
w0 + w1 >= 0 (XOR(1,0) = 1)
w0 + w1 + w2 < 0 (XOR(1,1) = 0)
 The fourth condition contradicts the second and the third (together with the first): adding the second and third and using w0 < 0 gives w0 + w1 + w2 >= -w0 > 0.
 i.e., there is no perceptron solution for non-linearly separable data.
 So, a single perceptron cannot learn to separate data that are non-linear in nature.
MULTILAYER PERCEPTRON
 The Multilayer Perceptron was developed to tackle this limitation.
 It is a neural network where the mapping between inputs and output is non-linear.
 A Multilayer Perceptron has input and output layers, and one or more hidden layers with many neurons stacked together.
 While the perceptron's neuron uses a hard thresholding activation, neurons in a Multilayer Perceptron can use arbitrary activation functions, such as sigmoid or ReLU.
EXAMPLE:
XOR:
 XOR(A,B) = (A+B)*(AB)’
 Complex relations can be broken into simpler functions and
combined.
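A sketch of this decomposition with single perceptrons stacked into two layers; the gate weights below are assumed, chosen by hand:

```python
# XOR(A, B) = (A OR B) AND NOT(A AND B), built from hand-weighted perceptrons.
def step(z):
    return 1 if z >= 0 else 0

def or_gate(a, b):
    return step(-1 + 2 * a + 2 * b)    # fires unless both inputs are 0

def nand_gate(a, b):
    return step(3 - 2 * a - 2 * b)     # NOT(A AND B)

def and_gate(a, b):
    return step(-3 + 2 * a + 2 * b)    # fires only when both inputs are 1

def xor_gate(a, b):
    # hidden layer: OR and NAND; output layer: AND of the two hidden outputs
    return and_gate(or_gate(a, b), nand_gate(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_gate(a, b))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```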
ACTIVATION FUNCTIONS
 An extremely important feature of an Artificial Neural Network.
 Decides whether a neuron should be activated or not.
 Limits the output signal to a finite value.
 The activation function applies a non-linear transformation to the input, making the network capable of learning more complex relations between input and output.
 It makes the network capable of learning more complex patterns.
 Without an activation function, the neural network is just a linear regression model, as it only computes a sum of products of inputs and weights.
E.g., in the figure below, image 2 requires a complex (curved) relation, unlike the simple linear relation in image 1.

Fig. Illustrating the need for an activation function for a complex problem.
An activation function must also be efficient and reduce computation time, because a neural network is sometimes trained on millions of data points.
Types of AF:
The Activation Functions can be basically divided into 3 types-
1. Binary step Activation Function
2. Linear Activation Function
3. Non-linear Activation Functions
1. Binary Step Function
 A binary step function is a threshold-based activation function.
 If the input value is above a certain threshold, the neuron is activated and sends exactly the same signal to the next layer; if it is below the threshold, the neuron is deactivated.
 We choose a threshold value that decides whether the neuron should be activated or deactivated.
 It is very simple and useful for binary classification problems.
E.g., f(x) = 1 if x > 0, else 0 (for x <= 0)
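As a quick sketch of this definition (threshold fixed at 0 here):

```python
def binary_step(x, threshold=0.0):
    # outputs the same signal (1) whenever the input exceeds the threshold
    return 1 if x > threshold else 0

print(binary_step(0.3), binary_step(-0.7))   # -> 1 0
```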

2. Linear or Identity Activation Function

 The function is a straight line (linear).
 The output of the function is not confined to any range.

Fig: Linear Activation Function

Equation: f(x) = x
Range: (-infinity, +infinity)
 It doesn't help with the complexity of the typical data that is fed to neural networks.
3. Non-linear Activation Function
 The most used activation functions.
 Non-linearity makes the graph of the function a curve, as shown below.

Fig: Non-linear Activation Function

Derivative or differential: the change along the y-axis with respect to the change along the x-axis; it is also known as the slope.
Monotonic function: a function which is either entirely non-increasing or entirely non-decreasing.
 The non-linear activation functions are mainly divided on the basis of their range or curves.
Advantages of non-linear functions over the linear function:
 Differentiation is possible for all the commonly used non-linear functions.
 Stacking of layers is possible, which helps us in creating deep neural networks.
 It makes it easier for the model to generalize.

Sigmoid (Logistic AF) (σ):

 The main reason we use the sigmoid function is that its output lies between 0 and 1: σ(x) = 1 / (1 + e^(-x)).
 It is especially used for models where we have to predict a probability as the output.
 Since the probability of anything exists only in the range 0 to 1, sigmoid is the right choice.

Fig: Sigmoid Function (S-shaped Curve)

 The function is differentiable and monotonic, but its derivative is not monotonic.
 The logistic sigmoid function can cause a neural network to get stuck during training.
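A short numeric sketch of the sigmoid and its derivative; the saturation for large |x| is what leads to the vanishing gradient problem listed in the disadvantages below:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0

for x in (-10, -2, 0, 2, 10):
    print(x, round(sigmoid(x), 4), round(sigmoid_derivative(x), 6))
# at x = +/-10 the derivative is ~0.000045, so the gradient effectively vanishes
```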
Advantages
1. Easy to understand and apply
2. Easy to train on small dataset
3. Smooth gradient, preventing “jumps” in output values.
4. Output values bound between 0 and 1, normalizing the
output of each neuron.
Disadvantages:
 Vanishing gradient—for very high or very low values of X,
there is almost no change to the prediction, causing a
vanishing gradient problem.
 This can result in the network refusing to learn further, or
being too slow to reach an accurate prediction.
 Outputs not zero centered.
 Computationally expensive
TanH (Hyperbolic Tangent AF):
 TanH is like the logistic sigmoid, but often works better.
 The range of the TanH function is from -1 to +1.
 TanH is often preferred over the sigmoid neuron because it is zero-centred.
 The advantage is that negative inputs are mapped strongly negative and zero inputs are mapped near zero on the tanh graph.

tanh(x) = 2 * sigmoid(2x) - 1

Fig. Sigmoid Vs Tanh
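A quick check of the identity above and of tanh's zero-centred output:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # both columns agree, confirming tanh(x) = 2*sigmoid(2x) - 1
    print(x, round(math.tanh(x), 6), round(2.0 * sigmoid(2.0 * x) - 1.0, 6))
```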

The function is differentiable and monotonic, but its derivative is not monotonic.
Advantages
 Zero centered—making it easier to model inputs that have
strongly negative, neutral, and strongly positive values.
Disadvantages
 Like the sigmoid function, it also suffers from the vanishing gradient problem.
 Hard to train on small datasets.
ReLU (Rectified Linear Unit):
 ReLU is the most used activation function.
 Used in almost all convolutional neural networks, in the hidden layers only.
 ReLU is half-rectified (from the bottom).
f(z) = 0, if z < 0
     = z, otherwise
R(z) = max(0, z)
 The range is [0, inf).

Advantages
 Avoids vanishing gradient problem.
 Computationally efficient—allows the network to converge
very quickly
 Non-linear—although it looks like a linear function, ReLU
has a derivative function and allows for backpropagation
Disadvantages
 Can only be used within hidden layers.
 Hard to train on small datasets; needs a lot of data to learn non-linear behaviour.
 The dying ReLU problem: for negative inputs the gradient of the function is zero, so the corresponding neurons stop contributing to backpropagation and cannot learn.
 The function and its derivative are both monotonic.
 All negative values are converted to zero immediately, so the function can neither map nor fit negative-valued data properly, which creates a problem.
Leaky ReLU Activation Function
 The Leaky ReLU activation function was needed to solve the 'dying ReLU' problem.
 With Leaky ReLU we do not set all negative inputs to zero but to a value near zero, which solves the major issue of the ReLU activation function.
R(z) = max(0.1*z, z)

Advantages
 Prevents dying ReLU problem—this variation of ReLU has a
small positive slope in the negative area, so it does enable
backpropagation, even for negative input values
 Otherwise like ReLU
Disadvantages
 Results not consistent—leaky ReLU does not provide
consistent predictions for negative input values.
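A small sketch comparing ReLU and Leaky ReLU on negative inputs; ReLU's gradient there is zero (the dying ReLU problem), while Leaky ReLU keeps a small slope:

```python
def relu(z):
    return max(0.0, z)

def leaky_relu(z, slope=0.1):          # slope 0.1 matches R(z) = max(0.1*z, z) above
    return max(slope * z, z)

def relu_grad(z):
    return 0.0 if z < 0 else 1.0       # zero gradient for negative inputs

def leaky_relu_grad(z, slope=0.1):
    return slope if z < 0 else 1.0     # small but non-zero gradient for negative inputs

for z in (-2.0, -0.5, 0.0, 1.5):
    print(z, relu(z), relu_grad(z), leaky_relu(z), leaky_relu_grad(z))
```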
Softmax:

 Sigmoid is not able to handle more than two cases (class labels).
 Softmax can handle multiple classes.
 The softmax function squeezes the output for each class to a value between 0 and 1, with the outputs summing to 1.
 It is ideally used in the final output layer of a classifier, where we are actually trying to obtain class probabilities.
 Softmax produces multiple outputs for an input array.
 For this reason, we can build neural network models that classify more than two classes, instead of only binary solutions.

σ(z)_i = e^(z_i) / Σ_{j=1..K} e^(z_j)
where:
σ = softmax
z = input vector
e^(z_i) = standard exponential of the i-th element of the input vector
K = number of classes in the multi-class classifier
e^(z_j) = standard exponential of the j-th element, summed over in the denominator
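A minimal sketch of the formula above, with hypothetical scores for three classes:

```python
import math

def softmax(z):
    m = max(z)                                   # subtract the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                         # hypothetical logits for 3 classes
probs = softmax(scores)
print([round(p, 3) for p in probs], round(sum(probs), 3))   # probabilities sum to 1
```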
Advantages
 Able to handle multiple classes, whereas the other activation functions handle only one class: it normalizes the output for each class to between 0 and 1 by dividing by the sum of the exponentials, giving the probability of the input value belonging to a specific class.
 Useful for output neurons: Softmax is typically used only in the output layer, for neural networks that need to classify inputs into multiple categories.
