Unit 1.1

What is an activation function and why use them?

An activation function decides whether a neuron should be activated or not. The neuron computes the weighted sum of its inputs, adds a bias to it, and passes the result through the activation function. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

Explanation: A neural network has neurons that work in correspondence with weights, biases, and their respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation. Activation functions make back-propagation possible, since their gradients are propagated along with the error to update the weights and biases.
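As a minimal sketch of this computation (our own illustrative Python/NumPy code; the names x, w, b, and sigmoid are not from the original notes), a single neuron forms its weighted sum plus bias and passes it through a sigmoid activation:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs, weights, and bias (arbitrary values)
x = np.array([0.5, -1.2, 3.0])   # input signals
w = np.array([0.4, 0.7, -0.2])   # connection weights
b = 0.1                          # bias term

z = np.dot(w, x) + b             # weighted sum plus bias
y = sigmoid(z)                   # neuron's activated output
print(z, y)
```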

Why do we need a non-linear activation function?

A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
Mathematical proof
Suppose we have a neural net with one hidden layer and no activation function, so every layer computes a purely linear map.

Hidden layer, i.e. layer 1:
z1 = W1x + b1

Layer 2, i.e. output layer:
z2 = W2z1 + b2

Substituting the first expression into the second:
z2 = W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2)

This is again of the form Wx + b: the composition of two linear functions is a linear function itself. A neuron therefore cannot learn anything beyond a linear mapping with just a linear function attached to it. A non-linear activation function lets it adjust its output non-linearly with respect to the error, so the network can learn complex patterns. Hence we need a non-linear activation function.
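A quick numerical check of this argument (our own NumPy sketch with made-up weight values):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# Two "layers" with no activation function: purely linear maps
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

two_layer = W2 @ (W1 @ x + b1) + b2

# The same computation collapsed into a single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True: the stack is still linear
```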
Common activation functions:

• Sigmoid function: σ(x) = 1 / (1 + e^(-x)), which squashes its input into the range (0, 1)
• Tanh function: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), which squashes its input into the range (-1, 1)
• ReLU function: ReLU(x) = max(0, x)
• Softmax function: softmax(x)i = e^(xi) / Σj e^(xj), which turns a vector of scores into a probability distribution

Choosing the Right Activation Function: as a rule of thumb, ReLU is the common default for hidden layers, sigmoid suits binary classification outputs, and softmax suits multi-class classification outputs.
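The following Python sketch (using NumPy; the helper names are our own) implements the four functions listed above:

```python
import numpy as np

def sigmoid(x):
    # Maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Maps any real value into (-1, 1); zero-centred, unlike sigmoid
    return np.tanh(x)

def relu(x):
    # Passes positive values through, zeroes out negatives
    return np.maximum(0.0, x)

def softmax(x):
    # Converts a vector of scores into a probability distribution.
    # Subtracting the max first is a standard numerical-stability trick.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([-1.0, 0.0, 2.0])
print(sigmoid(scores), tanh(scores), relu(scores), softmax(scores), sep="\n")
```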
Models of Artificial Neural Network

1. McCulloch-Pitts Model of Neuron

• The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs: excitatory and inhibitory. Excitatory inputs have weights of positive magnitude, and inhibitory inputs have weights of negative magnitude.
• The inputs of the McCulloch-Pitts neuron can be either 0 or 1, and it has a threshold function as its activation function.
• The output signal y_out is 1 if the input sum y_sum is greater than or equal to a given threshold value, and 0 otherwise.
• Simple McCulloch-Pitts neurons can be used to design logical operations. For that purpose, the connection weights need to be decided correctly, along with the threshold value of the activation function.
Example: John carries an umbrella if it is sunny or if it is raining. There are four given situations, and we need to decide when John will carry the umbrella. The situations are as follows:

• First scenario: it is not raining, nor is it sunny
• Second scenario: it is not raining, but it is sunny
• Third scenario: it is raining, and it is not sunny
• Fourth scenario: it is raining as well as sunny

To analyse the situations using the McCulloch-Pitts neural model, we can consider the input signals as follows:

• x1: Is it raining?
• x2: Is it sunny?

Each input can take the value 0 or 1. We can set both weights w1 and w2 to 1 and the threshold value to 1, so the neuron fires whenever x1 + x2 ≥ 1. The truth table for this case is:

Situation   x1   x2   y_sum   y_out
1           0    0    0       0
2           0    1    1       1
3           1    0    1       1
4           1    1    2       1
From the truth table, we can conclude that John needs to carry an umbrella in the situations where y_out is 1. Hence, he will carry an umbrella in scenarios 2, 3 and 4.
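A small Python sketch of this neuron (our own illustrative code) reproduces the truth table:

```python
def mcculloch_pitts(inputs, weights, threshold):
    # Fire (output 1) when the weighted input sum reaches the threshold
    y_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_sum >= threshold else 0

# OR-like umbrella neuron: w1 = w2 = 1, threshold = 1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts((x1, x2), (1, 1), threshold=1))
```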
Perceptron Model

• The Perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP neuron.
• A perceptron is an algorithm for supervised learning of binary classifiers. The algorithm enables a neuron to learn by processing the elements of the training set one at a time.
• A perceptron has one or more inputs, a processing step, and a single output.
• The original perceptron was designed to take a number of binary inputs and produce one binary output (0 or 1).

Perceptron Example

Imagine a perceptron (in your brain). The perceptron tries to decide if you should go to a concert: Is the artist good? Is the weather good? What weights should these facts have?

Criteria             Input          Weight
Artist is Good       x1 = 0 or 1    w1 = 0.7
Weather is Good      x2 = 0 or 1    w2 = 0.6
Friend will Come     x3 = 0 or 1    w3 = 0.5
Food is Served       x4 = 0 or 1    w4 = 0.3
Alcohol is Served    x5 = 0 or 1    w5 = 0.4

The Perceptron Algorithm

Frank Rosenblatt suggested this algorithm:
1. Set a threshold value
2. Multiply all inputs by their weights
3. Sum all the results
4. Activate the output

Applied to the concert example, with the artist good, the friend coming, and alcohol served (x1 = x3 = x5 = 1, x2 = x4 = 0):

1. Set a threshold value: Threshold = 1.5
2. Multiply all inputs by their weights:
   x1 * w1 = 1 * 0.7 = 0.7
   x2 * w2 = 0 * 0.6 = 0
   x3 * w3 = 1 * 0.5 = 0.5
   x4 * w4 = 0 * 0.3 = 0
   x5 * w5 = 1 * 0.4 = 0.4
3. Sum all the results: 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (the weighted sum)
4. Activate the output: return true if the sum > 1.5 ("Yes, I will go to the concert")
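In code, a direct Python transcription of the worked example above:

```python
inputs  = [1, 0, 1, 0, 1]            # artist good, friend coming, alcohol served
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))
go_to_concert = weighted_sum > threshold
print(weighted_sum, go_to_concert)   # 1.6 True
```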
How does the Perceptron work?

• The perceptron is regarded as a single-layer neural network comprising four key components: input values (input nodes), weights and bias, net sum, and an activation function.
• The perceptron model starts by multiplying every input value by its weight. It then adds these products together to generate the weighted sum. The weighted sum is applied to the activation function "f" to get the anticipated output. This activation function is also called the step function.
Basic Components of Perceptron

1. Input Layer: The input layer consists of one or more input neurons, which receive input signals from the external world or from other layers of the neural network.

2. Weights: Each input neuron is associated with a weight, which represents the strength of the connection between the input neuron and the output neuron.

3. Bias: A bias term is added to the weighted sum to give the perceptron additional flexibility in modeling complex patterns in the input data.

4. Activation Function: The activation function determines the output of the perceptron based on the weighted sum of the inputs and the bias term. Common activation functions used in perceptrons include the step function, the sigmoid function, and the ReLU function.

5. Output: The output of the perceptron is a single binary value, either 0 or 1, which indicates the class or category to which the input data belongs.

6. Training Algorithm: The perceptron is typically trained using a supervised learning algorithm such as the perceptron learning algorithm or backpropagation. During training, the weights and bias of the perceptron are adjusted to minimize the error between the predicted output and the true output for a given set of training examples (see the sketch after this list).
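A minimal sketch of the perceptron learning rule mentioned in item 6 (our own NumPy implementation, here learning the logical AND function; the learning rate and epoch count are illustrative choices):

```python
import numpy as np

def step(z):
    # Step activation: 1 if the net input is non-negative, else 0
    return 1 if z >= 0 else 0

# Training data for logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, ti in zip(X, t):
        y = step(np.dot(w, xi) + b)
        # Perceptron learning rule: adjust by the prediction error
        w += lr * (ti - y) * xi
        b += lr * (ti - y)

print(w, b)
print([step(np.dot(w, xi) + b) for xi in X])  # [0, 0, 0, 1]
```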
Types of Perceptron:

1. Single layer: A single-layer perceptron can learn only linearly separable patterns.

2. Multilayer: A multilayer perceptron has two or more layers, giving it greater processing power; it can learn patterns that are not linearly separable.

The multi-layer perceptron is trained with the backpropagation algorithm, which executes in two stages as follows:
• Forward Stage: Activations flow from the input layer, through any hidden layers, and terminate at the output layer.
• Backward Stage: The error between the actual output and the desired output is propagated backward from the output layer to the input layer, and the weight and bias values are modified as required.
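The two stages can be made concrete with a small sketch (our own NumPy code: a single hidden layer, sigmoid activations, and squared-error loss are our illustrative assumptions, as are the layer sizes and learning rate):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=3)          # one input example
t = np.array([1.0])             # desired output

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
lr = 0.5

# Forward stage: activations flow input -> hidden -> output
h = sigmoid(W1 @ x + b1)
y = sigmoid(W2 @ h + b2)

# Backward stage: the output error is propagated back through the layers
delta2 = (y - t) * y * (1 - y)          # output-layer error term
delta1 = (W2.T @ delta2) * h * (1 - h)  # hidden-layer error term

W2 -= lr * np.outer(delta2, h)
b2 -= lr * delta2
W1 -= lr * np.outer(delta1, x)
b1 -= lr * delta1
```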
Linear Separability

• Linear separability is an important concept in machine learning, particularly in supervised learning. It refers to the ability of a set of data points to be separated into distinct categories using a linear decision boundary.
• In other words, if there exists a straight line that can cleanly divide the data into two classes, the data is said to be linearly separable.
• Linearly separable data points can be separated using a line, a linear function, or a flat hyperplane.
• In practice, there are several methods to determine whether data is linearly separable.
• Formally, points in n-dimensional space are linearly separable if there exist weights w1, ..., wn and a bias b such that

w1x1 + w2x2 + ... + wnxn + b > 0 for every point in one class, and
w1x1 + w2x2 + ... + wnxn + b < 0 for every point in the other class.

Methods for checking linear separability:

1. Visual Inspection: Plot the data points in 2D or 3D space and look for a distinct straight line or plane that divides the groups. If such a boundary can be seen, the data may be linearly separable.
2. Perceptron Learning Algorithm: This binary linear classifier iteratively learns a hyperplane that divides the input into two classes. If the method converges and finds a separating hyperplane, the data is linearly separable; if not, it is not.
3. Support Vector Machines: SVMs are a popular classification technique that can handle linearly separable data. They find the separating hyperplane that maximizes the margin between the two classes. If the margin is greater than zero, the data can be linearly separated (see the sketch after this list).
4. Kernel Methods: This family of techniques transforms the data into a higher-dimensional space where it may become linearly separable. If the transformed data is linearly separable, the original data can be separated by a (generally non-linear) boundary in its original space.
5. Quadratic Programming: Quadratic programming can be used to find the separating hyperplane that minimizes the classification error. If a solution is found, the data can be separated linearly.
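A quick practical check along the lines of method 3, assuming scikit-learn is available (the dataset here is a made-up illustrative one):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Two small 2D clusters that a straight line can divide
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LinearSVC(C=1e6)  # large C: demand a (near) hard margin
clf.fit(X, y)

# If a linear classifier fits the training set perfectly,
# the data is linearly separable
print(clf.score(X, y) == 1.0)
```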
Linearly Separable 2D Data

We say a two-dimensional dataset is linearly separable if we can separate the positive from the negative objects with a straight line.

For example, consider predicting whether a house sells based on its area and price. We have a number of data points for this, each labelled with its class, Sold or Not Sold; the dataset is linearly separable if a straight line in the area-price plane divides the Sold points from the Not Sold points.
Adaptive Linear Neuron (Adaline)

Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows:

• It uses a bipolar activation function.
• The Adaline neuron can be trained using the delta rule, also known as the Least Mean Square (LMS) rule or Widrow-Hoff rule.
• The net input is compared with the target value to compute the error signal.
• The weights are adjusted on the basis of this adaptive training algorithm.

The basic structure of Adaline is similar to the perceptron, with an extra feedback loop through which the actual output is compared with the desired/target output. After comparison, the weights and bias are updated on the basis of the training algorithm.
Adaptive Linear Neuron Learning Algorithm

Step 0: Initialize the weights and the bias to small random values (not zero), and set the learning rate α.
Step 1: Perform steps 2-7 while the stopping condition is false.
Step 2: Perform steps 3-5 for each bipolar training pair s : t.
Step 3: Activate each input unit: xi = si, for i = 1 to n.
Step 4: Obtain the net input with the following relation:

y_in = b + Σ xiwi (summing over i = 1 to n)

Here 'b' is the bias and 'n' is the total number of input neurons.
Step 5: Adjust the weights and bias to reduce the error (t - y_in):

wi(new) = wi(old) + α(t - y_in)xi
b(new) = b(old) + α(t - y_in)

Step 6: Calculate the error using E = (t - y_in)².
Step 7: Test for the stopping condition: if the error generated is less than or equal to a specified tolerance, stop; otherwise continue training.
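A minimal sketch of this loop (our own NumPy code, training on the bipolar OR problem; the learning rate, tolerance, and epoch cap are illustrative choices):

```python
import numpy as np

# Bipolar OR training pairs s : t (inputs and targets in {-1, +1})
S = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
T = np.array([-1, 1, 1, 1])

rng = np.random.default_rng(0)
w = rng.uniform(-0.1, 0.1, size=2)   # Step 0: small random weights
b = rng.uniform(-0.1, 0.1)           # ... and bias
# LMS converges to the minimum mean-square-error solution, not zero
# error; for this dataset that floor is 0.25 per pattern, so the
# tolerance must sit above it.
alpha, tol = 0.1, 0.3

for epoch in range(100):             # Step 1: loop until stopping condition
    max_error = 0.0
    for x, t in zip(S, T):           # Steps 2-3: each bipolar training pair
        y_in = b + np.dot(w, x)      # Step 4: net input
        w += alpha * (t - y_in) * x  # Step 5: delta (LMS) rule
        b += alpha * (t - y_in)
        max_error = max(max_error, (t - y_in) ** 2)  # Step 6: error
    if max_error <= tol:             # Step 7: stopping condition
        break

print(w, b)
print(np.where(b + S @ w >= 0, 1, -1))  # bipolar outputs after training
```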
Multiple Adaptive Linear Neuron (Madaline)

Madaline, which stands for Multiple Adaptive Linear Neuron, is a network consisting of many Adalines in parallel, with a single output unit. Some important points about Madaline are as follows:

• It is structured like a multilayer perceptron, with the Adalines acting as hidden units between the input layer and the Madaline (output) layer.
• The weights and the bias between the input and Adaline layers are adjustable, as in the Adaline architecture.
• The weights and the bias between the Adaline and Madaline layers are fixed; they are always 1.
• Training can be done with the help of the delta rule.

The network consists of 'n' units in the input layer, 'm' units in the Adaline layer, and 1 unit in the Madaline layer. Each neuron in the Adaline and Madaline layers has a bias of 1. The Adaline layer sits between the input layer and the Madaline layer and is considered the hidden layer.
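A forward-pass sketch of this architecture (our own NumPy code; the layer sizes and input pattern are illustrative, and the output unit uses the fixed weights and bias of 1 described above, which makes it behave like a vote over the Adaline outputs):

```python
import numpy as np

def bipolar_step(z):
    # Bipolar activation: +1 if non-negative, else -1
    return np.where(z >= 0, 1, -1)

rng = np.random.default_rng(2)
n, m = 3, 4                         # n input units, m Adaline units

W = rng.uniform(-0.5, 0.5, (m, n))  # adjustable input-to-Adaline weights
b = rng.uniform(-0.5, 0.5, m)       # adjustable Adaline biases

x = np.array([1, -1, 1])            # one bipolar input pattern

hidden = bipolar_step(W @ x + b)    # Adaline (hidden) layer outputs

# Madaline output unit: fixed weights of 1 and a bias of 1, so it
# fires +1 when the Adalines' summed votes plus the bias are non-negative
y = bipolar_step(np.sum(hidden) + 1)
print(hidden, y)
```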
