Neural Networks - V Unit

Uploaded by

nishasakthivel620

NEURAL NETWORKS

The Perceptron is one of the simplest artificial neural network architectures, introduced by Frank Rosenblatt in 1957.
It is primarily used for binary classification.
At that time, predictions were commonly made with traditional methods such as statistical machine learning and conventional programming.
Despite its simplicity, the Perceptron proved effective on classification problems whose classes are linearly separable, laying the groundwork for later advances in AI and machine learning.
Perceptrons are often used as the building blocks for more complex neural networks, such as multi-layer perceptrons (MLPs) or deep neural networks (DNNs).
By combining multiple perceptrons in layers and connecting them in a network structure, these models can learn and represent complex patterns and relationships in data, enabling tasks such as image recognition, natural language processing, and decision making.
Frank Rosenblatt

Frank Rosenblatt (1928 – 1971) was an American psychologist notable in the field of Artificial Intelligence.
In 1957 he started something really big: he "invented" a Perceptron program on an IBM 704 computer at Cornell Aeronautical Laboratory.
Scientists had discovered that brain cells (neurons) receive input from our senses as electrical signals.
The neurons, in turn, use electrical signals to store information and to make decisions based on previous input.
• Rosenblatt had the idea that Perceptrons could simulate brain principles, with the ability to learn and make decisions.
Perceptron

• The original Perceptron was designed to take a number of binary inputs and produce one binary output (0 or 1).

• The idea was to use different weights to represent the importance of each input, and to require that the weighted sum of the inputs exceed a threshold value before making a decision like yes or no (true or false) (0 or 1).
Perceptron Example

Imagine a perceptron (in your brain).

The perceptron tries to decide if you
should go to a concert.
Is the artist good?
Is the weather good?
What weights should these facts have?
The Perceptron Algorithm

Frank Rosenblatt suggested this algorithm:

1. Set a threshold value
2. Multiply each input by its weight
3. Sum all the results
4. Activate the output

Applied to the concert example:

1. Set a threshold value:
• Threshold = 1.5
2. Multiply each input by its weight:
• x1 * w1 = 1 * 0.7 = 0.7
• x2 * w2 = 0 * 0.6 = 0
• x3 * w3 = 1 * 0.5 = 0.5
• x4 * w4 = 0 * 0.3 = 0
• x5 * w5 = 1 * 0.4 = 0.4
3. Sum all the results:
• 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (the weighted sum)
4. Activate the output:
• Return true if the sum > 1.5 ("Yes, I will go to the concert"). Here 1.6 > 1.5, so the output is true.
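The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Rosenblatt's original program; the function name `perceptron_output` is an assumption made for this example, while the inputs, weights, and threshold are the ones from the concert example.

```python
def perceptron_output(inputs, weights, threshold):
    """Return (weighted_sum, fired) for a simple threshold perceptron."""
    # Steps 2 and 3: multiply each input by its weight, then sum the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 4: the perceptron fires only if the sum exceeds the threshold.
    return weighted_sum, weighted_sum > threshold

inputs = [1, 0, 1, 0, 1]
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5  # Step 1: set a threshold value.

total, fired = perceptron_output(inputs, weights, threshold)
print(f"weighted sum = {total:.1f}, go to concert: {fired}")
```

Changing any input from 1 to 0 drops the weighted sum below 1.5, so the decision flips to "no".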
Types of Perceptron

Single-Layer Perceptron

Single-layer perceptrons are basic and can only learn linearly separable patterns.

Multi-Layer Perceptron

MLPs are more complex and can handle non-linearly separable data because they contain one or more hidden layers with non-linear activations.
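XOR is the classic pattern that is not linearly separable, so no single-layer perceptron can compute it, but a network with one hidden layer can. The sketch below uses hand-picked weights (an assumption for illustration; they are not learned) and a hypothetical helper `xor_mlp` to show how two hidden step units make the problem solvable.

```python
def step(z):
    """Heaviside step: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    # Hidden unit 1 fires when at least one input is 1 (logical OR).
    h_or = step(x1 + x2 - 0.5)
    # Hidden unit 2 fires only when both inputs are 1 (logical AND).
    h_and = step(x1 + x2 - 1.5)
    # Output fires when OR is true but AND is not, which is exactly XOR.
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))
```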
Basic Components of Perceptron
• Input Features: The perceptron takes multiple input features, each representing a
characteristic of the input data.

• Weights: Each input feature is assigned a weight that determines its influence on the
output. These weights are adjusted during training to find the optimal values.

• Summation Function: The perceptron calculates the weighted sum of its inputs, combining
them with their respective weights.

• Activation Function: The weighted sum is passed through the Heaviside step function,
comparing it to a threshold to produce a binary output (0 or 1).

• Output: The final output is determined by the activation function, often used for binary
classification tasks.

• Bias: The bias term helps the perceptron make adjustments independent of the input,
improving its flexibility in learning.

• Learning Algorithm: The perceptron adjusts its weights and bias using a learning
algorithm, such as the Perceptron Learning Rule, to minimize prediction errors.
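As a rough sketch of how these components fit together, the following code trains a perceptron with the Perceptron Learning Rule (each weight is moved by learning rate × error × input). The AND-gate data, the learning rate of 0.1, and the function names are assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Step activation applied to the weighted sum plus bias."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

def train_perceptron(samples, labels, learning_rate=0.1, max_epochs=100):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # Perceptron learning rule: nudge weights toward the target.
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # every sample is classified correctly
            break
    return weights, bias

# Toy data (assumed for illustration): the logical AND of two binary inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

Because AND is linearly separable, the loop stops after a handful of epochs; on XOR-style data it would run until `max_epochs` without ever reaching zero mistakes.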
Components
1. Perceptron Inputs (nodes)
2. Node Values (1, 0, 1, 0, 1)
3. Node Weights (0.7, 0.6, 0.5, 0.3, 0.4)
4. Summation
5. Activation Function (sum > threshold)
6. Threshold Value
1. Perceptron Inputs
A perceptron receives one or more inputs.
Perceptron inputs are called nodes.
• The nodes have both a value and a weight.

2. Node Values (Input Values)
Input nodes have a binary value of 1 or 0.
This can be interpreted as true or false / yes or no.
• The values are: 1, 0, 1, 0, 1

3. Node Weights
Weights are values assigned to each input.
Weights show the strength of each node.
A higher value means that the input has a stronger influence on the output.
• The weights are: 0.7, 0.6, 0.5, 0.3, 0.4

4. Summation
The perceptron calculates the weighted sum of its inputs.
It multiplies each input by its corresponding weight and sums up the results.
• The sum is: 0.7*1 + 0.6*0 + 0.5*1 + 0.3*0 + 0.4*1 = 1.6

5. The Activation Function
After the summation, the perceptron applies the activation function.
The purpose is to introduce non-linearity into the output. It determines whether the perceptron should fire or not based on the aggregated input.
• The activation function is simple: (sum > threshold), here (1.6 > 1.5), which is true.

6. The Threshold
The threshold is the value the weighted sum must exceed for the perceptron to fire (output 1); otherwise it remains inactive (output 0).
• In the example, the threshold value is: 1.5
Example: Perceptron in Action

Let’s take a simple example of classifying whether a given fruit is an apple or not based on two inputs: its weight (in grams) and its color (on a scale of 0 to 1, where 1 means red). The perceptron receives these inputs, multiplies them by their weights, adds a bias, and applies the activation function to decide whether the fruit is an apple or not.
• Input 1 (Weight): 150 grams

• Input 2 (Color): 0.9 (since the fruit is mostly red)

• Weights: [0.5, 1.0]

• Bias: 1.5

The perceptron’s weighted sum would be:

(150 * 0.5) + (0.9 * 1.0) + 1.5 = 75 + 0.9 + 1.5 = 77.4
• Let’s assume the activation function uses a threshold of 75. Since 77.4 > 75, the perceptron classifies the fruit as an apple (output = 1).
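A bias and a threshold play the same role: testing "weighted sum + bias > threshold" is equivalent to testing "weighted sum + (bias − threshold) > 0". The sketch below checks this on the fruit example; the function name `fires` and the second test fruit are assumptions made for illustration.

```python
def fires(inputs, weights, bias, threshold=0.0):
    """Return True when the weighted sum plus bias exceeds the threshold."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return z > threshold

fruit = [150, 0.9]        # weight in grams, redness on a 0-to-1 scale
weights = [0.5, 1.0]
bias, threshold = 1.5, 75

# Form used in the example: compare (weighted sum + bias) with the threshold.
as_given = fires(fruit, weights, bias, threshold)
# Equivalent form: fold the threshold into the bias and compare with 0.
folded = fires(fruit, weights, bias - threshold)
print(as_given, folded)

# A lighter, less red fruit (hypothetical values) is rejected by both forms.
light_fruit = [100, 0.1]
print(fires(light_fruit, weights, bias, threshold),
      fires(light_fruit, weights, bias - threshold))
```

This is why many formulations drop the explicit threshold and learn only a bias term.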
Activation Function

• A mathematical function applied to a neuron’s weighted sum to produce its output
• Adds non-linearity to the network
• Helps the network learn complex patterns
• Without it, the entire network behaves like a linear model, no matter how many layers it has
Types of Activation Functions

1. Step Function
2. Sigmoid
3. Tanh (Hyperbolic Tangent)
4. ReLU (Rectified Linear Unit)
5. Leaky ReLU
6. Softmax (used in output layers for classification)
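For reference, the six functions listed above can be written with the standard `math` module as follows. This is a minimal sketch using their usual textbook definitions; the leaky-ReLU slope of 0.01 is a common default assumed here, not a value given in these notes.

```python
import math

def step(z):
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    # `slope` is the small gradient kept for negative inputs.
    return z if z > 0 else slope * z

def softmax(scores):
    # Subtract the maximum first so exp() cannot overflow.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

for name, fn in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh),
                 ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(name, [round(fn(z), 3) for z in (-2.0, 0.0, 2.0)])
print("softmax", [round(p, 3) for p in softmax([1.0, 2.0, 3.0])])
```

Unlike the other five, softmax takes a whole vector of scores and returns probabilities that sum to 1, which is why it is used in output layers.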
Gradient Descent Optimization
• Gradient Descent is a fundamental optimization algorithm used in machine learning to minimize a loss function by iteratively moving in the direction of steepest descent, which is the negative of the gradient.

• Gradient descent is the backbone of the learning process for various algorithms, including linear regression, logistic regression, support vector machines, and neural networks. It minimizes the cost function of a model by iteratively adjusting the model parameters to reduce the difference between predicted and actual values, improving the model’s performance.
• Gradient descent is especially important in deep learning, where it is the standard way to train neural networks. It iteratively adjusts model parameters to minimize a cost or loss function, guiding the model toward better performance.

• The main goal is to adjust the parameters of a model (weights, biases, etc.) so that the error is minimized.

• Imagine you're at the top of a hill (high loss), and you want to reach the bottom (minimum loss). You take steps in the steepest downward direction until you can't go any lower.
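The hill analogy corresponds to the update rule w ← w − learning_rate × gradient. The sketch below applies it to a one-parameter toy loss, L(w) = (w − 3)², which is an assumption chosen because its minimum (w = 3) is known in advance; the function names are illustrative.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy loss (assumed for illustration): L(w) = (w - 3)^2, minimized at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def loss_gradient(w):
    return 2.0 * (w - 3.0)

w_final = gradient_descent(loss_gradient, start=10.0)
print(w_final, loss(w_final))
```

With this loss, a learning rate above 1.0 makes each step overshoot by more than it corrects, so the iterates diverge instead of settling at the minimum.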
Training Machine Learning Models

• Neural networks are trained using Gradient Descent (or its variants) in
combination with backpropagation. Backpropagation computes the
gradients of the loss function with respect to each parameter (weights
and biases) in the network by applying the chain rule. The process
involves:
• Forward Propagation: Computes the output for a given input by passing
data through the layers.
• Backward Propagation: Uses the chain rule to calculate gradients of the
loss with respect to each parameter (weights and biases) across all layers.
• Gradients are then used by Gradient Descent to update the parameters
layer-by-layer, moving toward minimizing the loss function.
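The smallest possible case of this loop is a single sigmoid neuron. The sketch below runs forward propagation, backward propagation, and a gradient descent update on one training example; the squared-error loss, the learning rate of 0.5, and the example values x = 1, y = 1 are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, y, learning_rate=0.5):
    # Forward propagation: input -> weighted sum -> activation -> loss.
    z = w * x + b
    a = sigmoid(z)
    loss = 0.5 * (a - y) ** 2

    # Backward propagation: chain rule, one factor per forward step.
    dloss_da = a - y
    da_dz = a * (1.0 - a)
    dloss_dw = dloss_da * da_dz * x   # dz/dw = x
    dloss_db = dloss_da * da_dz       # dz/db = 1

    # Gradient descent update of both parameters.
    return w - learning_rate * dloss_dw, b - learning_rate * dloss_db, loss

w, b = 0.0, 0.0
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, x=1.0, y=1.0)
    losses.append(loss)
print(losses[0], losses[-1])
```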
Minimizing the Cost Function

• The algorithm minimizes a cost function, which quantifies the error or loss of the model’s predictions compared to the true labels. For example:
• Linear Regression - gradient descent minimizes the Mean Squared Error (MSE)
• Logistic Regression - gradient descent minimizes the Log Loss (Cross-Entropy Loss) to optimize the decision boundary for binary classification
• Support Vector Machines (SVMs) - gradient descent optimizes the hinge loss, which, together with regularization, favors a maximum-margin hyperplane
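The three cost functions named above can be written directly from their standard definitions. This is a sketch with made-up example values; note the label conventions, which are assumptions stated in the comments (0/1 labels for log loss, −1/+1 labels for hinge loss).

```python
import math

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-12):
    # Labels are 0 or 1; predictions are probabilities, clipped to avoid log(0).
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def hinge_loss(y_true, scores):
    # Labels are -1 or +1; scores are raw (unthresholded) model outputs.
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y_true, scores)) / len(y_true)

mse_value = mean_squared_error([1.0, 2.0], [1.5, 2.5])
log_value = log_loss([1, 0], [0.9, 0.2])
hinge_value = hinge_loss([1, -1], [2.0, 0.5])
print(mse_value, log_value, hinge_value)
```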
| Type | Description |
| --- | --- |
| Batch Gradient Descent | Uses entire dataset to compute gradient. Accurate but slow on large data. |
| Stochastic Gradient Descent (SGD) | Uses one data point at a time. Fast but noisy. |
| Mini-batch Gradient Descent | Uses a small subset of data. Balances speed and accuracy. |
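In code, the three variants differ only in how many samples feed each parameter update. The sketch below uses a hypothetical helper, `iterate_batches`, that yields the sample indices for each update; the dataset size of 10 and batch size of 4 are arbitrary illustration values.

```python
import random

def iterate_batches(n_samples, batch_size, seed=0):
    """Yield lists of sample indices; each list drives one parameter update."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

n = 10
batch_gd = list(iterate_batches(n, batch_size=n))    # whole dataset per update
sgd = list(iterate_batches(n, batch_size=1))         # one sample per update
mini_batch = list(iterate_batches(n, batch_size=4))  # small subset per update
print(len(batch_gd), len(sgd), len(mini_batch))      # updates per epoch
```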
To Calculate Global Minimum
Stochastic Gradient Descent
• Stochastic Gradient Descent (SGD) is an optimization algorithm used in machine learning, and it is particularly useful when dealing with large datasets.
• It is a variant of gradient descent in which the model parameters are updated using only one randomly selected training example at a time instead of the whole dataset.

• Faster updates: Especially useful for large datasets.

• Better generalization: The randomness can help escape local minima
and saddle points.
• More frequent updates: Can lead to faster convergence early on.
Need for Stochastic Gradient Descent

For large datasets, computing the gradient using all data points
can be slow and memory-intensive.
This is where SGD comes into play.
Instead of using the full dataset to compute the gradient at each
step, SGD uses only one random data point (or a small batch of
data points) at each iteration.
This makes the computation much faster.
Path followed by batch gradient descent vs. path followed by SGD:

| Feature | SGD | Batch GD |
| --- | --- | --- |
| Path | Noisy, zig-zag, irregular | Smooth and curved |
| Cause | Each update uses one sample, leading to variance | Uses entire dataset per update, making updates stable |
| Efficiency | Fast per update, may take longer overall | Slower per update, but smoother convergence |
| Convergence Behavior | Fluctuates around the minimum | Direct, steady approach to minimum |
| Exploration | Can escape local minima better due to randomness | May get stuck in local minima if the loss is not convex |
Working of Stochastic Gradient Descent

In Stochastic Gradient Descent, the gradient is calculated for each training example (or a small subset of training examples) rather than the entire dataset.
Step 1: Data Generation
Step 2: Define the SGD Function
Step 3: Train the Model Using SGD
Step 4: Visualize the Cost Function
Step 5: Plot the Data and Regression Line
Step 6: Print the Final Model Parameters
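The notes list these steps without code, so the following is only one possible sketch of them for simple linear regression. The synthetic data (points around y = 4 + 3x), the learning rate, and the epoch count are all assumptions, and the two plotting steps are replaced by recording the cost per epoch so the example needs only the standard library.

```python
import random

# Step 1: generate noisy data around the line y = 4 + 3x (assumed toy problem).
rng = random.Random(42)
xs = [rng.uniform(0.0, 2.0) for _ in range(200)]
ys = [4.0 + 3.0 * x + rng.gauss(0.0, 0.1) for x in xs]

# Step 2: define SGD for the model y_hat = w * x + b with squared error.
def sgd(xs, ys, learning_rate=0.05, epochs=50):
    w, b = 0.0, 0.0
    cost_history = []
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            w -= learning_rate * error * xs[i]  # gradient of 0.5*error^2 w.r.t. w
            b -= learning_rate * error          # gradient of 0.5*error^2 w.r.t. b
        # Step 4 (without plotting): record the mean squared error per epoch.
        cost = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        cost_history.append(cost)
    return w, b, cost_history

# Step 3: train the model.
w, b, cost_history = sgd(xs, ys)

# Step 6: print the final parameters (Step 5, plotting, is omitted here).
print(f"w = {w:.2f}, b = {b:.2f}, final cost = {cost_history[-1]:.4f}")
```

The recorded `cost_history` is what Step 4 would plot, and the fitted `w` and `b` define the regression line for Step 5.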
Applications of Stochastic Gradient Descent

• Deep Learning
• Natural Language Processing (NLP)
• Computer Vision
• Reinforcement Learning
Advantages
• Works well with large-scale data and online learning
• Less memory required (no need to load all data at once)
• Adds a level of randomness that can help escape poor local optima

Disadvantages
• Noisy updates → causes the loss function to fluctuate
• May take longer to converge or need learning rate decay
• Can get stuck or oscillate near the minimum
Error Backpropagation
• Error Backpropagation (or just Backpropagation) is the key algorithm used to train
neural networks. It's how the model learns by updating its weights based on the error
(loss) of its predictions.

• The error function in back-propagation is used to calculate the error between the
predicted output and the actual output of the neural network. The error is then used
to update the weights of each neuron in each layer of the network during the back-
propagation process.

• Compute the error at the output layer.

• Propagate that error backward through the network.
• Update the weights using gradient descent.
Step-by-Step
• Let’s say we have a simple neural network with:
• Input layer
• One hidden layer
• Output layer
And a loss function L, such as mean squared error.

• It makes a prediction (forward pass).

• It compares the prediction to the truth (loss).
• It sends the error backward to adjust weights so the next prediction is
better.
3. Backward Pass (Backpropagation):
• Using the chain rule of calculus, compute how much each weight
contributed to the error:
• Start from the output layer.
• Move backward, layer by layer.
• At each layer, calculate:
• Gradient of the loss w.r.t. activation.
• Gradient of activation w.r.t. weighted input.
• Gradient of weighted input w.r.t. weights.
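The three gradients listed for each layer multiply together under the chain rule. The sketch below works this out for the smallest network with a hidden layer (one input, one hidden sigmoid unit, one sigmoid output) and checks the result against finite differences. The network size, starting parameters, and the squared-error loss L = 0.5 (a2 − y)² are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)    # hidden layer activation
    a2 = sigmoid(w2 * a1 + b2)   # output layer activation
    return a1, a2

def loss(params, x, y):
    _, a2 = forward(params, x)
    return 0.5 * (a2 - y) ** 2

def backward(params, x, y):
    w1, b1, w2, b2 = params
    a1, a2 = forward(params, x)
    # Output layer: loss -> activation -> weighted input -> weights.
    dL_da2 = a2 - y
    dL_dz2 = dL_da2 * a2 * (1.0 - a2)
    dL_dw2 = dL_dz2 * a1
    dL_db2 = dL_dz2
    # Hidden layer: reuse dL_dz2 and keep applying the chain rule.
    dL_da1 = dL_dz2 * w2
    dL_dz1 = dL_da1 * a1 * (1.0 - a1)
    dL_dw1 = dL_dz1 * x
    dL_db1 = dL_dz1
    return [dL_dw1, dL_db1, dL_dw2, dL_db2]

def numerical_gradient(params, x, y, h=1e-6):
    # Central finite differences, used only to verify the analytic gradients.
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += h
        down = list(params)
        down[i] -= h
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2.0 * h))
    return grads

params = [0.5, -0.3, 0.8, 0.1]   # w1, b1, w2, b2 (arbitrary starting values)
analytic = backward(params, x=1.5, y=1.0)
numeric = numerical_gradient(params, x=1.5, y=1.0)
print(analytic)
print(numeric)
```

The two printed lists should agree to several decimal places; a mismatch in such a check is a common sign of a mistake in a hand-derived backward pass.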
Neural networks - TYPES
Neural networks can be broadly categorized into two types:
• shallow neural networks (SNNs) and
• deep neural networks (DNNs).
Shallow Neural Networks (SNNs):

• Shallow neural networks are characterized by their relatively simple architecture. An SNN typically consists of three types of layers:
• Input Layer: Receives the raw data.
• Hidden Layer: A single hidden layer where the computation and feature extraction occur.
• Output Layer: Produces the final output or prediction.
• Due to the limited number of hidden layers, SNNs have a more straightforward structure. Models with no hidden layer at all, such as single-layer perceptrons and logistic regression, are usually counted as shallow as well.
Deep Neural Networks (DNNs):

• Deep neural networks, as the name suggests, have a more complex architecture with multiple hidden layers between the input and output layers. These additional layers allow DNNs to learn more abstract and intricate features from the data. The depth of a DNN refers to the number of hidden layers it contains, which can range from just a few to hundreds or even thousands.
• Common types of DNNs include:
• Convolutional Neural Networks (CNNs): Primarily used for image recognition and computer vision tasks.
• Recurrent Neural Networks (RNNs): Designed for sequential data such as time series or natural language.
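Structurally, the only difference between a shallow and a deep fully connected network is the number of hidden layers in the stack. The sketch below builds both from a list of layer sizes and runs a forward pass. The layer sizes, random weights, and the use of ReLU on every layer (including the output, as a simplification) are assumptions for illustration; CNNs and RNNs use different layer types that are not shown here.

```python
import random

def make_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer transition, with random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, inputs):
    activations = inputs
    for weights, biases in layers:
        # Each unit: ReLU of the weighted sum of the previous layer plus bias.
        activations = [max(0.0, sum(w * a for w, a in zip(row, activations)) + b)
                       for row, b in zip(weights, biases)]
    return activations

def count_parameters(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

shallow = make_network([4, 8, 1])        # one hidden layer
deep = make_network([4, 8, 8, 8, 1])     # three hidden layers
x = [0.2, 0.4, 0.6, 0.8]
print(len(shallow) - 1, count_parameters(shallow), forward(shallow, x))
print(len(deep) - 1, count_parameters(deep), forward(deep, x))
```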

