
Neural Networks - V Unit


NEURAL NETWORKS

The Perceptron is one of the simplest artificial neural network architectures, introduced by Frank Rosenblatt in 1957.
It is primarily used for binary classification.
At that time, traditional methods like Statistical Machine
Learning and Conventional Programming were commonly
used for predictions.
Despite being one of the simplest forms of artificial neural
networks, the Perceptron model proved to be highly effective
in solving specific classification problems, laying the
groundwork for advancements in AI and machine learning.
Perceptrons are often used as the building blocks for more complex neural networks, such
as multi-layer perceptrons (MLPs) or deep neural networks (DNNs).

By combining multiple perceptrons in layers and connecting them in a network structure,


these models can learn and represent complex patterns and relationships in data, enabling
tasks such as image recognition, natural language processing, and decision making.
Frank Rosenblatt

Frank Rosenblatt (1928 – 1971) was an American psychologist notable in the field of Artificial Intelligence.
In 1957 he developed a Perceptron program on an IBM 704 computer at the Cornell Aeronautical Laboratory.
Scientists had discovered that brain cells (neurons) receive input from our senses as electrical signals.
Neurons, in turn, use electrical signals to store information and to make decisions based on previous input.
• Rosenblatt proposed that Perceptrons could simulate these brain principles, with the ability to learn and make decisions.
Perceptron

• The original Perceptron was designed to take a number of binary inputs and produce one binary output (0 or 1).

• The idea was to use different weights to represent the importance of each input, and to require the weighted sum of the inputs to exceed a threshold value before making a decision such as yes or no (true or false, 1 or 0).
Perceptron Example

Imagine a perceptron (in your brain).
The perceptron tries to decide whether you should go to a concert.
Is the artist good?
Is the weather good?
What weights should these facts have?
The Perceptron Algorithm

Frank Rosenblatt suggested this algorithm:

1. Set a threshold value
2. Multiply each input by its weight
3. Sum all the results
4. Activate the output


1. Set a threshold value:
• Threshold = 1.5
2. Multiply each input by its weight:
• x1 * w1 = 1 * 0.7 = 0.7
• x2 * w2 = 0 * 0.6 = 0
• x3 * w3 = 1 * 0.5 = 0.5
• x4 * w4 = 0 * 0.3 = 0
• x5 * w5 = 1 * 0.4 = 0.4
3. Sum all the results:
• 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (The Weighted Sum)
4. Activate the Output:
• Return true if the sum > 1.5 ("Yes I will go to the Concert")
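The four steps above can be sketched in a few lines of Python. This is a minimal illustration that hard-codes the concert example's inputs, weights, and threshold; the function name `perceptron_output` is hypothetical.

```python
def perceptron_output(inputs, weights, threshold):
    """Return (weighted_sum, fired) for a Rosenblatt-style threshold unit."""
    # Step 2: multiply each input by its weight
    products = [x * w for x, w in zip(inputs, weights)]
    # Step 3: sum all the results
    weighted_sum = sum(products)
    # Step 4: activate the output (fire only if the sum exceeds the threshold)
    return weighted_sum, weighted_sum > threshold


# Step 1: set a threshold value
threshold = 1.5
inputs = [1, 0, 1, 0, 1]
weights = [0.7, 0.6, 0.5, 0.3, 0.4]

weighted_sum, go_to_concert = perceptron_output(inputs, weights, threshold)
print(round(weighted_sum, 2), go_to_concert)  # 1.6 True
```

Rounding is used when printing because floating-point addition of 0.7, 0.5 and 0.4 may not give exactly 1.6.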
Types of Perceptron

Single-Layer Perceptron

Single-layer perceptrons are basic and can only learn linearly separable patterns.

Multi-Layer Perceptron

MLPs are more complex and can handle non-linearly separable data because they have one or more hidden layers.
Basic Components of Perceptron
• Input Features: The perceptron takes multiple input features, each representing a
characteristic of the input data.

• Weights: Each input feature is assigned a weight that determines its influence on the
output. These weights are adjusted during training to find the optimal values.

• Summation Function: The perceptron calculates the weighted sum of its inputs, combining
them with their respective weights.

• Activation Function: The weighted sum is passed through the Heaviside step function,
comparing it to a threshold to produce a binary output (0 or 1).

• Output: The final output is determined by the activation function, often used for binary
classification tasks.

• Bias: The bias term helps the perceptron make adjustments independent of the input,
improving its flexibility in learning.

• Learning Algorithm: The perceptron adjusts its weights and bias using a learning
algorithm, such as the Perceptron Learning Rule, to minimize prediction errors.
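The components listed above, including the bias and the Perceptron Learning Rule, fit together as in the sketch below. It assumes the textbook form of the rule, w ← w + η(y − ŷ)x and b ← b + η(y − ŷ), a step activation that fires when the weighted sum plus bias is greater than 0, and a learning rate η = 1 so that the arithmetic on binary inputs stays exact. The AND-gate data and the function names are illustrative choices, not taken from the slides.

```python
import numpy as np


def step(z):
    """Heaviside step activation: 1 if z > 0, else 0."""
    return np.where(z > 0, 1, 0)


def train_perceptron(X, y, learning_rate=1.0, epochs=20):
    """Train a single perceptron with the Perceptron Learning Rule."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            # Summation: weighted sum of the inputs plus the bias
            prediction = step(np.dot(xi, weights) + bias)
            error = target - prediction
            if error != 0:
                # Nudge the weights and bias toward the correct output
                weights += learning_rate * error * xi
                bias += learning_rate * error
                errors += 1
        if errors == 0:  # every example classified correctly: stop early
            break
    return weights, bias


def predict(X, weights, bias):
    return step(X @ weights + bias)


# Logical AND is linearly separable, so the rule converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

weights, bias = train_perceptron(X, y)
print(weights, bias, predict(X, weights, bias))
```

For data that is not linearly separable (for example XOR), this loop would never reach zero errors and simply stops after `epochs` passes, which is the limitation of single-layer perceptrons noted earlier.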
Components
1. Perceptron Inputs (nodes)
2. Node Values (1, 0, 1, 0, 1)
3. Node Weights (0.7, 0.6, 0.5, 0.3, 0.4)
4. Summation
5. Threshold Value
6. Activation Function
7. Activation (sum > threshold)
1. Perceptron Inputs
A perceptron receives one or more inputs.
Perceptron inputs are called nodes.
• The nodes have both a value and a weight.

2. Node Values (Input Values)
Input nodes have a binary value of 1 or 0.
This can be interpreted as true or false / yes or no.
• The values are: 1, 0, 1, 0, 1
3. Node Weights
Weights are values assigned to each input.
Weights show the strength of each node.
A higher value means that the input has a stronger influence on the output.
• The weights are: 0.7, 0.6, 0.5, 0.3, 0.4

4. Summation
The perceptron calculates the weighted sum of its inputs.
It multiplies each input by its corresponding weight and sums up
the results.
• The sum is: 0.7*1 + 0.6*0 + 0.5*1 + 0.3*0 + 0.4*1 = 1.6
5. The Activation Function
After the summation, the perceptron applies the activation function.
The purpose is to introduce non-linearity into the output. It determines whether the perceptron should fire or not based on the aggregated input.
• The activation function is simple: (sum > threshold) == (1.6 > 1.5)

6. The Threshold
The threshold is the value needed for the perceptron to fire (output 1); otherwise it remains inactive (outputs 0).
• In the example, the threshold value is: 1.5
Example: Perceptron in Action

Let’s take a simple example of classifying whether a given fruit is an apple or not based on two inputs: its weight (in grams) and its color (on a scale of 0 to 1, where 1 means red). The perceptron receives these inputs, multiplies them by their weights, adds a bias, and applies the activation function to decide whether the fruit is an apple or not.
• Input 1 (Weight): 150 grams

• Input 2 (Color): 0.9 (since the fruit is mostly red)

• Weights: [0.5, 1.0]

• Bias: 1.5

The perceptron’s weighted sum would be:

$$(150 \times 0.5) + (0.9 \times 1.0) + 1.5 = 77.4$$

• Let’s assume the activation function uses a threshold of 75. Since 77.4 > 75, the perceptron classifies the fruit as an apple (output = 1).
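Evaluating the apple example in code makes the arithmetic explicit. This is a minimal sketch: the default weights, bias, and threshold of 75 mirror the assumptions in the example above, and the function name `classify_fruit` is hypothetical.

```python
def classify_fruit(weight_g, color, weights=(0.5, 1.0), bias=1.5, threshold=75.0):
    """Return (weighted_sum, is_apple) for the two-input apple perceptron."""
    # Weighted sum of the two inputs plus the bias term
    weighted_sum = weight_g * weights[0] + color * weights[1] + bias
    # Step activation: output 1 only if the sum exceeds the threshold
    return weighted_sum, int(weighted_sum > threshold)


score, is_apple = classify_fruit(150, 0.9)
print(round(score, 1), is_apple)  # 77.4 1
```

A lighter fruit, e.g. `classify_fruit(100, 0.9)`, gives 50 + 0.9 + 1.5 = 52.4, which stays below the threshold and outputs 0.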
Activation Function

• A mathematical function applied to a neuron’s weighted sum to produce its output
• Adds non-linearity to the network
• Helps the network learn complex patterns
• Without it: the entire network behaves like a linear model
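The last point can be checked numerically. In this minimal sketch (random weights, biases omitted for brevity), two stacked layers without an activation function produce the same output as a single layer whose weight matrix is the product W2·W1, so the extra layer adds no expressive power.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # one input vector with 3 features

# Two "layers" with no activation function: plain matrix multiplications
W1 = rng.normal(size=(4, 3))      # layer 1: 3 inputs -> 4 units
W2 = rng.normal(size=(2, 4))      # layer 2: 4 units -> 2 outputs
two_layer_output = W2 @ (W1 @ x)

# The same mapping collapsed into a single linear layer
W_combined = W2 @ W1              # shape (2, 3)
one_layer_output = W_combined @ x

print(np.allclose(two_layer_output, one_layer_output))  # True
```

With biases added, the composition is still a single affine map (W2·W1·x + W2·b1 + b2), so the conclusion does not change.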
Types of Activation Functions

1. Step Function
2. Sigmoid
3. Tanh (Hyperbolic Tangent)
4. ReLU (Rectified Linear Unit)
5. Leaky ReLU
6. Softmax (used in output layers for classification)
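The six functions listed above can be written in NumPy as follows. This is a sketch using common textbook definitions; the step threshold of 0 and the Leaky ReLU slope `alpha=0.01` are assumed defaults, not values from the slides.

```python
import numpy as np


def step(z, threshold=0.0):
    return np.where(z > threshold, 1.0, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def tanh(z):
    return np.tanh(z)


def relu(z):
    return np.maximum(0.0, z)


def leaky_relu(z, alpha=0.01):
    # A small slope alpha for negative inputs instead of a hard zero
    return np.where(z > 0, z, alpha * z)


def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1
    shifted = np.exp(z - np.max(z))
    return shifted / shifted.sum()


z = np.array([-2.0, 0.0, 3.0])
for fn in (step, sigmoid, tanh, relu, leaky_relu, softmax):
    print(fn.__name__, np.round(fn(z), 3))
```

Softmax differs from the others: it acts on a whole vector of scores and turns them into class probabilities, which is why it is used in output layers.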
Gradient Descent Optimization
• Gradient Descent is a fundamental optimization algorithm used in
machine learning to minimize a loss function by iteratively moving in
the direction of the steepest descent as defined by the negative of the
gradient.

• Gradient descent is the backbone of the learning process for many algorithms, including linear regression, logistic regression, support vector machines, and neural networks. It minimizes the cost function of a model by iteratively adjusting the model parameters to reduce the difference between predicted and actual values, improving the model’s performance.
• Gradient descent is a crucial optimization algorithm in deep learning for training machine learning models, especially neural networks. It iteratively adjusts model parameters to minimize a cost or loss function, effectively guiding the model toward optimal performance.

• The main goal is to adjust the parameters of a model (weights, biases, etc.) so that the error is minimized.

• Imagine you're at the top of a hill (high loss), and you want to reach the
bottom (minimum loss). You take steps in the steepest downward
direction until you can't go any lower.
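The hill analogy can be made concrete with a one-parameter example. This sketch minimizes the hypothetical loss L(w) = (w − 3)², whose gradient is 2(w − 3). The starting point, learning rate, and number of steps are arbitrary choices.

```python
def loss(w):
    return (w - 3.0) ** 2


def gradient(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)


w = 10.0               # start "high on the hill"
learning_rate = 0.1
for _ in range(100):
    # Move against the gradient, i.e. in the steepest downward direction
    w -= learning_rate * gradient(w)

print(round(w, 4), loss(w))
```

Each update here multiplies the distance to the minimum by (1 − 2 × 0.1) = 0.8. A learning rate above 1.0 would make that factor larger than 1 in magnitude, so the steps would overshoot and diverge.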
Training Machine Learning Models

• Neural networks are trained using Gradient Descent (or its variants) in
combination with backpropagation. Backpropagation computes the
gradients of the loss function with respect to each parameter (weights
and biases) in the network by applying the chain rule. The process
involves:
• Forward Propagation: Computes the output for a given input by passing
data through the layers.
• Backward Propagation: Uses the chain rule to calculate gradients of the
loss with respect to each parameter (weights and biases) across all layers.
• Gradients are then used by Gradient Descent to update the parameters
layer-by-layer, moving toward minimizing the loss function.
Minimizing the Cost Function

• The algorithm minimizes a cost function, which quantifies the error or loss of the model’s predictions compared to the true labels. For example:
• Linear Regression: gradient descent minimizes the Mean Squared Error (MSE)
• Logistic Regression: gradient descent minimizes the Log Loss (Cross-Entropy Loss) to optimize the decision boundary for binary classification
• Support Vector Machines (SVMs): gradient descent optimizes the hinge loss, which (combined with a regularization term) yields a maximum-margin hyperplane
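The three cost functions named above can be sketched in NumPy as follows. The label conventions ({0, 1} for log loss, {−1, +1} for hinge loss) and the clipping constant `eps` are standard assumptions. The SVM regularization term is left out to keep the example short.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean Squared Error, used for linear regression."""
    return np.mean((y_true - y_pred) ** 2)


def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy; y_true in {0, 1}, p_pred = predicted P(y = 1)."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


def hinge_loss(y_true, scores):
    """Hinge loss for SVMs; y_true in {-1, +1}, scores = w.x + b."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))


print(mse(np.array([1.0, 2.0]), np.array([1.0, 4.0])))            # 2.0
print(log_loss(np.array([1.0]), np.array([0.5])))                 # ln 2, about 0.693
print(hinge_loss(np.array([1.0, -1.0]), np.array([0.5, -2.0])))   # 0.25
```

In the hinge example, the second point is classified correctly with a margin larger than 1, so it contributes zero loss, while the first lies inside the margin and is penalized.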
Types of Gradient Descent
• Batch Gradient Descent: Uses the entire dataset to compute the gradient. Accurate but slow on large data.
• Stochastic Gradient Descent (SGD): Uses one data point at a time. Fast but noisy.
• Mini-batch Gradient Descent: Uses a small subset of the data. Balances speed and accuracy.
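The three variants differ only in how many examples feed each parameter update, so one function with a `batch_size` parameter can cover all of them. This is a minimal NumPy sketch that fits a linear model by minimizing MSE on made-up data (y = 4x + 2 plus noise). The data, learning rate, epoch count, and function name are assumptions chosen for illustration.

```python
import numpy as np


def gradient_descent(X, y, batch_size, learning_rate=0.1, epochs=500, seed=0):
    """Fit y ~ X @ w + b by minimizing MSE.

    batch_size == len(X)      -> batch gradient descent
    batch_size == 1           -> stochastic gradient descent
    1 < batch_size < len(X)   -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)               # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            error = Xb @ w + b - yb              # prediction minus target
            # Gradients of the mean squared error over this batch
            grad_w = 2.0 * Xb.T @ error / len(idx)
            grad_b = 2.0 * error.mean()
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic data: y = 4x + 2 plus a little noise (illustrative only)
rng = np.random.default_rng(42)
X = rng.uniform(0, 2, size=(100, 1))
y = 4 * X[:, 0] + 2 + rng.normal(scale=0.1, size=100)

for name, bs in [("batch", len(X)), ("mini-batch", 16), ("SGD", 1)]:
    w, b = gradient_descent(X, y, batch_size=bs)
    print(f"{name:10s} w={w[0]:.3f} b={b:.3f}")
```

With the same number of epochs, batch gradient descent makes 1 update per epoch, mini-batch makes 7, and SGD makes 100, which is the speed/noise trade-off described in the list above.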
To Calculate Global Minimum
Stochastic Gradient Descent
• Stochastic Gradient Descent (SGD) is an optimization algorithm in machine learning that is particularly useful when dealing with large datasets.
• Stochastic Gradient Descent (SGD) is a variant of gradient descent where the model parameters are updated using only one randomly selected training example at a time instead of the whole dataset.

• Faster updates: Especially useful for large datasets.
• Better generalization: The randomness can help escape local minima and saddle points.
• More frequent updates: Can lead to faster convergence early on.
Need for Stochastic Gradient Descent

For large datasets, computing the gradient using all data points
can be slow and memory-intensive.
This is where SGD comes into play.
Instead of using the full dataset to compute the gradient at each
step, SGD uses only one random data point (or a small batch of
data points) at each iteration.
This makes the computation much faster.
Path followed by batch gradient descent vs. path followed by SGD (comparing the first image, SGD, with the second image, likely batch GD):
• Path: SGD is noisy, zig-zag, and irregular; batch GD is smooth and curved.
• Cause: each SGD update uses one sample, leading to variance; batch GD uses the entire dataset per update, making updates stable.
• Efficiency: SGD is fast per update but may take longer to converge overall; batch GD is slower per update but smoother.
• Convergence behavior: SGD fluctuates around the minimum; batch GD takes a direct, steady approach to the minimum.
• Exploration: SGD can escape local minima better due to its randomness; batch GD may get stuck in local minima if the loss is not convex.
Working of Stochastic Gradient Descent

In Stochastic Gradient Descent, the gradient is calculated for each training example (or a small subset of training examples) rather than the entire dataset.
Step 1: Data Generation
Step 2: Define the SGD Function
Step 3: Train the Model Using SGD
Step 4: Visualize the Cost Function
Step 5: Plot the Data and Regression Line
Step 6: Print the Final Model Parameters
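Steps 1 to 6 can be sketched for simple linear regression as follows. The synthetic data (y = 3x + 5 plus Gaussian noise), the learning rate, the epoch count, and the function name `sgd` are assumptions. Steps 4 and 5 would normally use a plotting library such as matplotlib. This sketch prints values instead so it stays self-contained.

```python
import numpy as np

# Step 1: Data generation (synthetic: y = 3x + 5 + noise, illustrative values)
rng = np.random.default_rng(0)
X = 2 * rng.random(100)
y = 3 * X + 5 + rng.normal(scale=0.2, size=100)


# Step 2: Define the SGD function for simple linear regression y ~ w*x + b
def sgd(X, y, learning_rate=0.05, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    cost_history = []
    for _ in range(epochs):
        for i in rng.permutation(len(X)):        # one random example at a time
            error = (w * X[i] + b) - y[i]
            # Gradient of the squared error for this single example
            w -= learning_rate * 2 * error * X[i]
            b -= learning_rate * 2 * error
        # Record the full-dataset MSE once per epoch (used in Step 4)
        cost_history.append(np.mean((w * X + b - y) ** 2))
    return w, b, cost_history


# Step 3: Train the model using SGD
w, b, cost_history = sgd(X, y)

# Step 4: Inspect the cost curve (a matplotlib plot would normally go here)
print("cost after first / last epoch:", cost_history[0], cost_history[-1])

# Step 5: Plotting the data and the regression line is omitted in this sketch

# Step 6: Print the final model parameters
print(f"w = {w:.3f}, b = {b:.3f}")
```

Because every update uses a single example, the final parameters keep fluctuating slightly around the least-squares solution. Decaying the learning rate over epochs is a common way to reduce this.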
Applications of Stochastic Gradient Descent

• Deep Learning
• Natural Language Processing (NLP)
• Computer Vision
• Reinforcement Learning
• Advantages
• Works well with large-scale data and online learning
• Less memory required (no need to load all data at once)
• Adds a level of randomness that can help escape poor local optima

• Disadvantages
• Noisy updates → causes the loss function to fluctuate
• May take longer to converge or need learning rate decay
• Can get stuck or oscillate near the minimum
Error Backpropagation
• Error Backpropagation (or just Backpropagation) is the key algorithm used to train
neural networks. It's how the model learns by updating its weights based on the error
(loss) of its predictions.

• The error function in backpropagation measures the error between the predicted output and the actual output of the neural network. This error is then used to update the weights of each neuron in each layer of the network during the backpropagation process.

• Compute the error at the output layer.
• Propagate that error backward through the network.
• Update the weights using gradient descent.
Step-by-Step
• Let’s say we have a simple neural network with:
• Input layer
• One hidden layer
• Output layer
And a loss function L, such as mean squared error.

• It makes a prediction (forward pass).
• It compares the prediction to the truth (loss).
• It sends the error backward to adjust weights so the next prediction is better.
3. Backward Pass (Backpropagation):
• Using the chain rule of calculus, compute how much each weight
contributed to the error:
• Start from the output layer.
• Move backward, layer by layer.
• At each layer, calculate:
• Gradient of the loss w.r.t. activation.
• Gradient of activation w.r.t. weighted input.
• Gradient of weighted input w.r.t. weights.
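The three gradients listed above can be sketched for a one-hidden-layer network in NumPy. The layer sizes, the sigmoid hidden activation, the linear output, the random data, and all helper names are assumptions for illustration. A central finite-difference check confirms the hand-derived gradients, and one gradient-descent update follows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: 2 inputs -> 3 sigmoid hidden units -> 1 linear output
X = rng.normal(size=(4, 2))          # 4 training examples
y = rng.normal(size=(4, 1))          # regression targets
params = {
    "W1": rng.normal(size=(2, 3)), "b1": np.zeros(3),
    "W2": rng.normal(size=(3, 1)), "b2": np.zeros(1),
}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(p, X):
    z1 = X @ p["W1"] + p["b1"]        # weighted input of the hidden layer
    a1 = sigmoid(z1)                  # hidden activation
    y_hat = a1 @ p["W2"] + p["b2"]    # linear output layer
    return z1, a1, y_hat


def loss(p, X, y):
    return np.mean((forward(p, X)[2] - y) ** 2)   # mean squared error L


def backward(p, X, y):
    z1, a1, y_hat = forward(p, X)
    n = X.shape[0]
    # Output layer: dL/dy_hat for the mean squared error
    d_out = 2.0 * (y_hat - y) / n
    grads = {"W2": a1.T @ d_out, "b2": d_out.sum(axis=0)}
    # Hidden layer via the chain rule:
    # dL/da1, times da1/dz1 = sigmoid'(z1), times dz1/dW1 = X
    d_a1 = d_out @ p["W2"].T
    d_z1 = d_a1 * a1 * (1.0 - a1)
    grads["W1"] = X.T @ d_z1
    grads["b1"] = d_z1.sum(axis=0)
    return grads


def numerical_grad(p, X, y, name, eps=1e-6):
    """Central finite differences, used only to check backward()."""
    g = np.zeros_like(p[name])
    for idx in np.ndindex(p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        plus = loss(p, X, y)
        p[name][idx] = old - eps
        minus = loss(p, X, y)
        p[name][idx] = old
        g[idx] = (plus - minus) / (2 * eps)
    return g


grads = backward(params, X, y)
for name in params:
    print(name, np.allclose(grads[name], numerical_grad(params, X, y, name), atol=1e-6))

# One gradient-descent update using the backpropagated gradients
learning_rate = 0.1
before = loss(params, X, y)
for name in params:
    params[name] -= learning_rate * grads[name]
print("loss before/after one step:", before, loss(params, X, y))
```

The finite-difference check is slow (two forward passes per parameter) and is only meant for verifying a backward pass on small networks, not for training.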
Neural networks - TYPES
Neural networks can be broadly categorized into two types:
• shallow neural networks (SNNs) and
• deep neural networks (DNNs).
Shallow Neural Networks (SNNs):

• Shallow neural networks are characterized by their relatively simple architecture. An SNN typically consists of three types of layers:
• Input Layer: Receives the raw data.
• Hidden Layer: Contains a single hidden layer where the computation
and feature extraction occur.
• Output Layer: Produces the final output or prediction.
• Due to the limited number of hidden layers, SNNs have a more
straightforward structure. Classic examples of shallow neural networks
include single-layer perceptrons and logistic regression models.
Deep Neural Networks (DNNs):

• Deep neural networks, as the name suggests, have a more complex architecture with multiple hidden layers between the input and output layers. These additional layers allow DNNs to learn more abstract and intricate features from the data. The depth of a DNN refers to the number of hidden layers it contains, which can range from just a few to hundreds or even thousands.
• Common types of DNNs include:
• Convolutional Neural Networks (CNNs): Primarily used for image
recognition and computer vision tasks.
• Recurrent Neural Networks (RNNs): Designed for sequential data such
as time series or natural language.
