Neural Networks - V Unit
Single-Layer Perceptron
Multi-Layer Perceptron
• Weights: Each input feature is assigned a weight that determines its influence on the
output. These weights are adjusted during training to find the optimal values.
• Summation Function: The perceptron calculates the weighted sum of its inputs, combining
them with their respective weights.
• Activation Function: The weighted sum is passed through the Heaviside step function,
comparing it to a threshold to produce a binary output (0 or 1).
• Output: The final output is determined by the activation function, often used for binary
classification tasks.
• Bias: The bias term helps the perceptron make adjustments independent of the input,
improving its flexibility in learning.
• Learning Algorithm: The perceptron adjusts its weights and bias using a learning
algorithm, such as the Perceptron Learning Rule, to minimize prediction errors.
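The components above can be combined into a minimal working perceptron; the AND-gate training data and the hyperparameters below are illustrative assumptions, not from the slides:

```python
# Minimal perceptron trained with the Perceptron Learning Rule.
# The AND-gate dataset and hyperparameters are illustrative assumptions.

def predict(weights, bias, x):
    # Weighted sum plus bias, passed through the Heaviside step function.
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train(data, lr=0.1, epochs=20):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            # Perceptron Learning Rule: w <- w + lr * (target - prediction) * x
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_gate)
print([predict(w, b, x) for x, _ in and_gate])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the Perceptron Learning Rule is guaranteed to converge here.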
Components
1. Perceptron inputs (nodes)
2. Node values (1, 0, 1, 0, 1)
3. Node weights (0.7, 0.6, 0.5, 0.3, 0.4)
4. Summation
5. Activation function (sum > threshold)
6. Threshold value
1. Perceptron Inputs
A perceptron receives one or more inputs.
Perceptron inputs are called nodes.
• The nodes have both a value and a weight.
4. Summation
The perceptron calculates the weighted sum of its inputs.
It multiplies each input by its corresponding weight and sums up
the results.
• The sum is: 0.7*1 + 0.6*0 + 0.5*1 + 0.3*0 + 0.4*1 = 1.6
5. The Activation Function
After the summation, the perceptron applies the activation
function.
The purpose is to introduce non-linearity into the output. It
determines whether the perceptron should fire or not based
on the aggregated input.
• The activation function is simple: (sum > threshold) == (1.6 > 1.5)
6. The Threshold
The threshold is the value the weighted sum must exceed for the
perceptron to fire (output 1); otherwise it stays inactive (output 0).
• In the example, the threshold value is: 1.5
Example: Perceptron in Action
• Bias/threshold: 1.5 (a threshold of 1.5 is equivalent to a bias of -1.5 with a fire-if-positive rule)
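The worked example from these slides (node values, weights, and threshold 1.5) can be reproduced in a few lines:

```python
# Forward pass of the example perceptron from the slides.
values = [1, 0, 1, 0, 1]
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

# Summation: weighted sum of inputs.
total = sum(v * w for v, w in zip(values, weights))
print(round(total, 2))  # 1.6

# Activation: Heaviside step against the threshold.
output = 1 if total > threshold else 0
print(output)  # 1 -- the perceptron fires
```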
Common activation functions:
1. Step Function
2. Sigmoid
3. Tanh (Hyperbolic Tangent)
4. ReLU (Rectified Linear Unit)
5. Leaky ReLU
6. Softmax (used in output layers for classification)
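Minimal sketches of the listed activation functions (standard formulas; the 0.01 negative slope for Leaky ReLU is a common default, assumed here):

```python
import math

def step(x):          # Heaviside step: fires when x > 0
    return 1 if x > 0 else 0

def sigmoid(x):       # squashes to (0, 1)
    return 1 / (1 + math.exp(-x))

def tanh(x):          # squashes to (-1, 1)
    return math.tanh(x)

def relu(x):          # zero for negatives, identity for positives
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):  # small slope for negatives (alpha assumed)
    return x if x > 0 else alpha * x

def softmax(xs):      # turns a vector of scores into probabilities
    m = max(xs)       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

print(step(1.6 - 1.5))           # 1 -- the perceptron example fires
print(softmax([1.0, 2.0, 3.0]))  # three probabilities summing to 1
```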
Gradient Descent Optimization
• Gradient Descent is a fundamental optimization algorithm used in
machine learning to minimize a loss function by iteratively moving in
the direction of the steepest descent as defined by the negative of the
gradient.
• Imagine you're at the top of a hill (high loss), and you want to reach the
bottom (minimum loss). You take steps in the steepest downward
direction until you can't go any lower.
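The hill analogy can be made concrete on a one-dimensional loss; f(w) = (w - 3)^2 is an illustrative choice with its minimum at w = 3:

```python
# Gradient descent on f(w) = (w - 3)**2, whose gradient is 2*(w - 3).
def gradient(w):
    return 2 * (w - 3)

w = 0.0              # start "at the top of the hill"
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)   # step opposite the gradient

print(round(w, 4))  # 3.0 -- converges to the minimum
```

Each step shrinks the distance to the minimum by a constant factor (here 0.8), which is why the iterates settle at w = 3.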
Training Machine Learning Models
• Neural networks are trained using Gradient Descent (or its variants) in
combination with backpropagation. Backpropagation computes the
gradients of the loss function with respect to each parameter (weights
and biases) in the network by applying the chain rule. The process
involves:
• Forward Propagation: Computes the output for a given input by passing
data through the layers.
• Backward Propagation: Uses the chain rule to calculate gradients of the
loss with respect to each parameter (weights and biases) across all layers.
• Gradients are then used by Gradient Descent to update the parameters
layer-by-layer, moving toward minimizing the loss function.
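The forward/backward steps can be sketched on a tiny 2-2-1 network; the fixed weights and single training example are illustrative assumptions, and the chain-rule gradients are checked against a finite difference:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Fixed parameters and one training example (assumed for illustration).
W1 = [[0.5, -0.2], [0.3, 0.8]]   # hidden-layer weights
b1 = [0.1, -0.1]
w2 = [0.7, -0.4]                 # output-layer weights
b2 = 0.2
x, t = [1.0, 2.0], 1.0

def forward(W1, b1, w2, b2):
    # Forward propagation: input -> sigmoid hidden layer -> linear output.
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = w2[0] * h[0] + w2[1] * h[1] + b2
    return h, y

def loss(W1, b1, w2, b2):
    _, y = forward(W1, b1, w2, b2)
    return (y - t) ** 2

# Backward propagation: apply the chain rule layer by layer.
h, y = forward(W1, b1, w2, b2)
dy = 2 * (y - t)                          # dL/dy
dw2 = [dy * h[j] for j in range(2)]       # dL/dw2_j = dL/dy * h_j
dW1 = [[dy * w2[j] * h[j] * (1 - h[j]) * x[i]   # chain rule through sigmoid
        for i in range(2)] for j in range(2)]

# Sanity check: analytic gradient vs. a finite-difference estimate.
eps = 1e-6
w2_plus = [w2[0] + eps, w2[1]]
numeric = (loss(W1, b1, w2_plus, b2) - loss(W1, b1, w2, b2)) / eps
print(abs(dw2[0] - numeric) < 1e-4)  # True
```

Gradient Descent would then subtract a learning rate times these gradients from the corresponding parameters.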
Minimizing the Cost Function
Stochastic Gradient Descent (SGD)
Uses one data point at a time: fast but noisy.
For large datasets, computing the gradient using all data points
can be slow and memory-intensive.
This is where SGD comes into play.
Instead of using the full dataset to compute the gradient at each
step, SGD uses only one random data point (or a small batch of
data points) at each iteration.
This makes the computation much faster.
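A minimal single-sample SGD loop on linear regression (the synthetic data, with true slope 2 and intercept 1, is an illustrative assumption):

```python
import random

random.seed(42)

# SGD on linear regression pred = a*x + b; data follows y = 2x + 1.
data = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]

a, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    x, y = random.choice(data)        # one random sample per update
    pred = a * x + b
    grad_a = 2 * (pred - y) * x       # gradient of (pred - y)**2 w.r.t. a
    grad_b = 2 * (pred - y)           # gradient w.r.t. b
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 2), round(b, 2))  # a, b approach the true values 2 and 1
```

Each update touches only one sample, so the cost per step is constant regardless of dataset size.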
Path followed by batch gradient descent vs. path followed
by SGD:
Feature              | First Image (SGD)                                | Second Image (Likely Batch GD)
---------------------|--------------------------------------------------|--------------------------------
Cause                | Each update uses one sample, leading to variance | Uses entire dataset per update, making updates stable
Efficiency           | Fast per update, may take longer overall         | Slower per update, but smoother convergence
Convergence Behavior | Fluctuates around the minimum                    | Direct, steady approach to minimum
Exploration          | Can escape local minima better due to randomness | May get stuck in local minima if not convex
Working of Stochastic Gradient Descent
Applications:
• Deep Learning
• Natural Language Processing (NLP)
• Computer Vision
• Reinforcement Learning
• Advantages
• Works well with large-scale data and online learning
• Less memory required (no need to load all data at once)
• Adds a level of randomness that can help escape poor local optima
• Disadvantages
• Noisy updates → causes the loss function to fluctuate
• May take longer to converge or need learning rate decay
• Can get stuck or oscillate near the minimum
Error Backpropagation
• Error Backpropagation (or just Backpropagation) is the key algorithm used to train
neural networks. It's how the model learns by updating its weights based on the error
(loss) of its predictions.
• The error function in backpropagation calculates the error between the
predicted output and the actual output of the neural network. This error is
then used to update the weights of each neuron in each layer of the network
during the backpropagation process.
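A single-neuron sketch of this error-driven update (sigmoid activation and squared error; the weights and data are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# One sigmoid neuron with two inputs (weights and data are assumptions).
w = [0.4, -0.3]
b = 0.0
x, target = [1.0, 0.5], 1.0
lr = 0.5

def predict():
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

before = (predict() - target) ** 2      # squared error before the update

# Backpropagation for this neuron: chain rule through loss and sigmoid.
y = predict()
delta = 2 * (y - target) * y * (1 - y)  # dL/dz
for i in range(2):
    w[i] -= lr * delta * x[i]           # dL/dw_i = delta * x_i
b -= lr * delta                         # dL/db = delta

after = (predict() - target) ** 2
print(after < before)  # True -- one update reduced the error
```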