Fundamentals of Neural Network

The document discusses the fundamentals of neural networks, focusing on key concepts such as forward propagation, loss functions, activation functions, backpropagation, and gradient descent. It explains how forward propagation processes input data through the network, the significance of loss functions in model training, and the role of activation functions in introducing non-linearity. Additionally, it covers the gradient descent optimization algorithm and its variants, which are essential for adjusting model parameters to minimize errors during training.

Uploaded by

henop47759

Training, Optimization and Regularization of Deep Neural Network
Forward propagation

• Forward propagation in deep learning refers to the process of passing input
data through the neural network to get the output or prediction.
• It involves a series of computations, where the input data is transformed as
it passes through the layers of the network.
• At each layer, the network computes a weighted sum of its inputs, adds a
bias, and applies an activation function; the result is the input to the next layer.
• The network learns the optimal weights and biases during the training phase
to make accurate predictions.
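The forward pass described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive implementation: the two-layer network, its weights and biases, and the helper names `dense_layer` and `forward` are all assumptions made up for the example.

```python
import math


def sigmoid(z):
    """Logistic activation, squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus a bias, then the activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def forward(inputs, layers):
    """Pass the input through each (weights, biases, activation) layer in turn."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense_layer(activations, weights, biases, activation)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1], sigmoid),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
prediction = forward([1.0, 2.0], layers)[0]
```

In a trained network the weights and biases would come from the training phase rather than being written by hand.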

Simple Analogy
• Preparing a recipe (making predictions)
• Ingredients (input)
• Importance or preference given to each ingredient (weights and biases)
• Mixing the ingredients (weighted sums)
• Tasting the recipe (activation function)
• Final dish, as desired or not (output: 0/1)
Loss Function
• In deep learning, a loss function is a measure of how well a model's
predictions match the actual target values.
• The goal during the training of a model is to minimize this loss function.
• It quantifies the difference between predicted values and true values,
providing a way to assess how well the model is performing.
• Different types of problems (classification, regression, etc.) and algorithms
use different loss functions.
[Figure: types of loss function, including mean squared and absolute error]
Squared Error Loss (Mean Squared Error - MSE):
• Use Case: Typically used for regression problems, where the goal is to
predict a continuous variable.
• Calculation: It calculates the average of the squared differences between
the predicted and actual values.
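• Formula: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $y_i$ is the actual
value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.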
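As a small worked example of the mean squared error, the sketch below averages the squared differences between actual and predicted values. The function name and the sample numbers are assumptions chosen for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Squared errors are 1, 0 and 4, so the mean is 5/3.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```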
Cross Entropy Loss
(Binary Cross Entropy and Categorical Cross Entropy)
Binary Cross Entropy Loss:
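• Use Case: Classification problems with two classes (binary classification).
• Calculation: It averages the negative log of the probability the model
assigns to the true class of each sample.
• Formula: $\mathrm{BCE} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$,
where $y_i \in \{0, 1\}$ is the true label and $\hat{y}_i$ is the predicted probability of class 1.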
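A minimal sketch of binary cross entropy follows, assuming labels are 0 or 1 and predictions are probabilities of class 1. The clipping constant `eps` is an assumption added here to avoid taking the log of zero; it is not part of the loss definition.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0 (assumption)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


confident = binary_cross_entropy([1, 0], [0.9, 0.1])
unsure = binary_cross_entropy([1, 0], [0.6, 0.4])
```

Confident correct predictions give a lower loss than hesitant ones, which is the behaviour the training process exploits.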
Categorical Cross Entropy Loss:
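• Use Case: Classification problems with more than two classes (multi-class
classification).
• Calculation: For each sample, it takes the negative log of the predicted
probability of the true class, then averages over the samples.
• Formula: $\mathrm{CCE} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} y_{i,c}\log(\hat{y}_{i,c})$, where $y_{i,c}$ is 1 if
sample $i$ belongs to class $c$ and 0 otherwise, and $\hat{y}_{i,c}$ is the predicted probability of class $c$.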
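Categorical cross entropy can be sketched the same way, assuming one-hot encoded targets and one probability vector per sample. The function name and the `eps` floor are assumptions for the example.

```python
import math


def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean over samples of -sum_c y_c * log(p_c), with one-hot targets."""
    total = 0.0
    for target, probs in zip(y_true, y_prob):
        # max(p, eps) keeps log() finite when a probability is exactly 0 (assumption)
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(y_true)


# One sample whose true class is the second of three classes.
cce = categorical_cross_entropy([[0, 1, 0]], [[0.2, 0.7, 0.1]])
```

Only the probability assigned to the true class contributes, so here the loss equals -log(0.7).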
What is an activation function?
• An activation function decides whether, and how strongly, a neuron is
activated. The neuron first computes the weighted sum of its inputs and adds
a bias; the activation function is then applied to that result. Its purpose
is to introduce non-linearity into the output of the neuron.
• Informally, an activation function outputs a small value for small inputs
and a larger value once its input exceeds a threshold: it "fires" if the
input is large enough; otherwise the output stays close to zero.
• In this sense, an activation function acts as a gate that checks whether an
incoming value exceeds a threshold.
• The activation function is a fundamental component of neural networks that
introduces non-linearity, enabling them to learn complex relationships,
adapt to various data patterns, and make sophisticated decisions.
Why is an activation function needed?
Introducing Non-linearity:
• Without activation functions, the entire neural network would behave like a linear
model.
• The stacking of multiple linear operations would result in a linear combination,
limiting the network's ability to learn and represent complex, non-linear patterns in
the data.
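The claim that stacked linear operations remain linear can be checked numerically. The sketch below uses two hypothetical one-dimensional layers with made-up weights and shows that their composition equals a single linear layer.

```python
def make_linear(w, b):
    """Return a one-dimensional linear layer x -> w * x + b."""
    def layer(x):
        return w * x + b
    return layer


layer1 = make_linear(2.0, 1.0)
layer2 = make_linear(-3.0, 0.5)


def stacked(x):
    """Two linear layers applied one after the other, with no activation."""
    return layer2(layer1(x))


# w2 * (w1 * x + b1) + b2 = (w2 * w1) * x + (w2 * b1 + b2)
collapsed = make_linear(-3.0 * 2.0, -3.0 * 1.0 + 0.5)
```

Inserting a non-linear activation between the two layers is what prevents this collapse.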

Capturing Complex Relationships:

• Many real-world problems involve intricate and non-linear relationships.
• Activation functions allow the neural network to model and capture these complex
patterns, making it more powerful in representing diverse data.

Enabling Neural Network to Learn:

• The non-linear transformations introduced by activation functions enable the
network to learn and adapt to intricate patterns in the input data during the training
process. This is crucial for the network to generalize well to unseen data.
Thresholding and Output Scaling:
• Activation functions often introduce thresholding effects, where the neuron
activates or not based on certain conditions.
• This helps in decision-making and provides a level of abstraction. Additionally,
activation functions like sigmoid and softmax scale the output to represent
probabilities in classification tasks.

Avoiding Vanishing or Exploding Gradients:

• Activation functions play a role in mitigating issues like vanishing or exploding
gradients during backpropagation, especially in deep neural networks.
• Well-designed activation functions help in the stable training of deep networks.
Introducing Sparsity:
• Some activation functions, like ReLU (Rectified Linear Unit) and its variants,
introduce sparsity in the network by setting negative values to zero.
• This can be beneficial in certain scenarios.

Facilitating Backpropagation:
• Activation functions provide derivatives or gradients that are essential for the
backpropagation algorithm, which is used to update the weights of the network
during training.
• This enables the network to learn and improve its performance over time.
Types of activation function
In a perceptron or a neural network, activation functions play a crucial role by
introducing non-linearity to the model.
Here are some common types of activation functions used in perceptrons:

1. Linear activation function

2. Logistic activation function
3. Tanh activation function
4. Softmax activation function
5. ReLU activation function
6. Leaky ReLU activation function
1. Linear Function:
▪ Description: The linear activation function, also known as the identity
activation function, passes its input through unchanged.
▪ Mathematical Form: $f(x) = x$
▪ Advantages:
▪ Simplicity
▪ Ease of Interpretation:
• Direct proportionality between input and output.
• Straightforward interpretation.
▪ Compatibility with Linear Models (well-suited for tasks with linear
relationships)
Disadvantages:
▪ Limited Expressiveness:
• Inability to model complex, non-linear relationships.
• Stacking linear layers results in a linear model.
▪ Constant Gradient:
• The derivative is a constant that does not depend on the input, so the
activation contributes no input-specific signal during backpropagation.
• May lead to slow or limited learning.
▪ Not Suitable for Classification Problems:
• Challenging for binary classification tasks.
• Output not squashed into a specific range.
▪ Not Used in Hidden Layers of Deep Networks:
• Rarely used in deep networks' hidden layers.
• Non-linear activations preferred.
2. Logistic Activation Function:
▪ Also commonly referred to as the sigmoid activation function.
▪ Description: The sigmoid (logistic) function squashes input values to the range
(0, 1). It is commonly used in the output layer of binary classification models.
▪ Mathematical Form: $\sigma(x) = \frac{1}{1 + e^{-x}}$
▪ Its plot is an S-shaped curve.
▪ Nature: Non-linear. For x between roughly -2 and 2 the curve is steep, so
small changes in x produce large changes in y.
▪ Value Range: 0 to 1
▪ Uses: Usually used in the output layer of a binary classifier, where the
result is either 0 or 1. Because the sigmoid output lies between 0 and 1, the
prediction can be taken as 1 if the value is greater than 0.5 and 0 otherwise.
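A short sketch of the sigmoid and the 0.5 decision rule described above. The branch on the sign of `x` is an implementation choice assumed here to avoid overflow for large negative inputs; `predict_class` is a hypothetical helper name.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x is negative, so e is in (0, 1)
    return e / (1.0 + e)


def predict_class(x, threshold=0.5):
    """Predict 1 when the sigmoid output exceeds the threshold, else 0."""
    return 1 if sigmoid(x) > threshold else 0
```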
3. Tanh (Hyperbolic Tangent) Function:
▪ Description: Similar to the sigmoid function, the tanh function maps input
values to the range (-1, 1). It is often used in hidden layers of neural networks.
▪ Mathematical Form: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
▪ The tanh (hyperbolic tangent) function often works better than the sigmoid
in hidden layers. It is a scaled and shifted version of the sigmoid,
$\tanh(x) = 2\sigma(2x) - 1$ where $\sigma$ is the sigmoid function, so each can be
derived from the other.
▪ Value Range: -1 to +1
▪ Nature: Non-linear
▪ Uses: Usually used in hidden layers of a neural network. Because its values
lie between -1 and 1, the mean of a hidden layer's output is 0 or very close
to it, which centers the data and makes learning easier for the next layer.
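The relationship between tanh and the sigmoid can be verified numerically. This sketch assumes the identity tanh(x) = 2·sigmoid(2x) − 1 and compares it against `math.tanh` from the Python standard library.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_sigmoid(x):
    """tanh expressed as a scaled and shifted sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


# Differences between the two forms; each should be near zero.
differences = [
    abs(tanh_from_sigmoid(x) - math.tanh(x)) for x in [-2.0, -0.5, 0.0, 0.5, 2.0]
]
```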
4. Softmax Function:
▪ Description: Often used in the output layer of a neural network for multi-class
classification problems. It transforms the raw output scores (logits) into a
probability distribution over multiple classes. The softmax function is
particularly useful when an input can belong to exactly one of several
mutually exclusive classes.
▪ Mathematical Form: $\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, for $K$ classes with logits $z_1, \dots, z_K$
▪ The softmax function generalizes the sigmoid function to more than two
classes, which makes it handy for multi-class classification problems.
▪ Nature: Non-linear
▪ Uses: Usually used when handling multiple classes; it is commonly found in
the output layer of image classification models. The softmax function
squeezes the output for each class between 0 and 1 and divides by the sum of
the outputs, so that the values add up to 1.
▪ If the output is for binary classification, the sigmoid function is a very
natural choice for the output layer.
▪ If the output is for multi-class classification, softmax is very useful for
predicting the probability of each class.
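A minimal softmax sketch follows. Subtracting the largest logit before exponentiating is a common numerical-stability trick assumed here; it does not change the result. The sample logits are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    largest = max(logits)  # shift for numerical stability (assumption)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
```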
5. Rectified Linear Unit (ReLU):
▪ Description: ReLU is a popular activation function that outputs the input for
positive values and zero for negative values. It introduces non-linearity and is
computationally efficient.
▪ Mathematical Form: $f(x) = \max(0, x)$
▪ ReLU stands for rectified linear unit. It is one of the most widely used
activation functions, chiefly in the hidden layers of neural networks.
▪ Value Range: [0, inf)
▪ Nature: Non-linear, which means errors can be backpropagated through it and
multiple layers of ReLU neurons can be stacked usefully.
▪ Uses: ReLU is less computationally expensive than tanh and sigmoid because
it involves simpler mathematical operations. At any time only some of the
neurons are activated, which makes the network sparse and efficient to
compute.
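The sparsity effect of ReLU can be seen on a handful of made-up pre-activation values. The derivative at exactly 0 is taken as 0 here, which is a convention assumed for the sketch.

```python
def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)


def relu_derivative(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(z) for z in pre_activations]
# Fraction of neurons that output exactly zero.
sparsity = outputs.count(0.0) / len(outputs)
```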
6. Leaky ReLU (Leaky Rectified Linear Unit) Function:
▪ Description: Leaky ReLU is an activation function that introduces
non-linearity between the layers of a neural network. It was created to
address the dying ReLU problem, in which neurons using the standard ReLU
output zero for every input, receive zero gradient, and stop learning during
training.
▪ Mathematical Form: $f(x) = x$ if $x > 0$, otherwise $f(x) = \alpha x$, where $\alpha$ is a
small positive constant (for example 0.01).
▪ With this function, negative inputs are mapped to values close to 0 but not
exactly 0, which mitigates the dying ReLU issue that arises with the standard
ReLU function during neural network training.
▪ Leaky ReLU is a popular way to address the limitations of the standard ReLU
function in deep neural networks: by introducing a small slope for negative
inputs, it lets gradient and information keep flowing through the network,
both during training and afterwards.
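A short sketch of Leaky ReLU and its gradient. The slope `alpha=0.01` is an assumed, commonly used default rather than a value given in the slides.

```python
def leaky_relu(x, alpha=0.01):
    """Return x for positive inputs and a small fraction of x otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_derivative(x, alpha=0.01):
    """Gradient is 1 for positive inputs and alpha otherwise, so it is never zero."""
    return 1.0 if x > 0 else alpha


negative_output = leaky_relu(-5.0)
negative_gradient = leaky_relu_derivative(-5.0)
```

Because the gradient for negative inputs is `alpha` instead of 0, a neuron that receives only negative inputs can still be updated.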
Backpropagation
• Backpropagation is a supervised learning algorithm used to train artificial
neural networks.
• Backpropagation adjusts the network's weights and biases to minimize the
error between predicted and actual outputs.
• It uses the gradient of the error with respect to the weights, computed
layer by layer from the output back to the input with the chain rule, to
iteratively adjust the weights for better predictions.
• During the forward pass, input data is fed through the neural network to
produce predictions.
• The loss function quantifies the error between predicted and actual outputs.
• Minimizing the loss improves the model's accuracy.
• Forward and backward passes are repeated until the model converges or a
preset stopping criterion is met.
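To make the forward and backward passes concrete, here is a minimal sketch for a single sigmoid neuron with a squared-error loss. The neuron, the loss choice, the training example, the learning rate, and the function names are all assumptions for illustration; a real network repeats the same chain-rule step across many layers.

```python
import math


def neuron_output(w, b, x):
    """Forward pass: weighted input plus bias, then sigmoid."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def loss(w, b, x, y):
    """Squared error between the prediction and the target."""
    return (neuron_output(w, b, x) - y) ** 2


def gradients(w, b, x, y):
    """Backward pass via the chain rule: dL/dw = dL/da * da/dz * dz/dw."""
    a = neuron_output(w, b, x)
    dloss_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)  # derivative of the sigmoid
    return dloss_da * da_dz * x, dloss_da * da_dz


def train(w, b, x, y, learning_rate=0.5, steps=200):
    """Repeat forward and backward passes, moving against the gradient."""
    for _ in range(steps):
        grad_w, grad_b = gradients(w, b, x, y)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


initial_loss = loss(0.5, 0.0, 1.0, 1.0)
trained_w, trained_b = train(0.5, 0.0, 1.0, 1.0)
final_loss = loss(trained_w, trained_b, 1.0, 1.0)
```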
Gradient Descent
• Gradient Descent is an optimization algorithm used in machine learning and
deep learning for training models/Neural networks and finding the optimal
parameters that minimize a cost function.
• Step-by-step explanation of how Gradient Descent works:
▪ Step 1, Initialization: It begins by initializing the model parameters randomly
or with some predetermined values. These parameters could be the weights
and biases of a neural network, for example.
▪ Step 2, Compute Gradient: At each iteration, the algorithm computes the
gradient of the cost function with respect to each parameter. The gradient
represents the direction of the steepest ascent of the function.
▪ Step 3, Update Parameters: Once the gradient is computed, the algorithm
updates the parameters by moving them in the opposite direction of the
gradient. This is done to minimize the cost function. The update rule typically
involves multiplying the gradient by a learning rate parameter and subtracting
the result from the current parameter values.
▪ Step 4, Iterate: Steps 2 and 3 are repeated iteratively until a stopping criterion
is met. This could be a predefined number of iterations or until the change in
the cost function falls below a certain threshold.
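The four steps above can be sketched for a one-parameter problem. The toy cost f(x) = (x - 3)^2, the learning rate, and the tolerance are assumptions chosen so the behaviour is easy to follow; they are not values from the slides.

```python
def gradient_descent(cost, grad, start, learning_rate=0.1, max_iters=1000, tol=1e-12):
    """Minimise `cost` by repeatedly stepping against its gradient."""
    x = start                                  # Step 1: initialisation
    previous_cost = cost(x)
    for _ in range(max_iters):                 # Step 4: iterate
        x -= learning_rate * grad(x)           # Steps 2 and 3: gradient, then update
        current_cost = cost(x)
        if abs(previous_cost - current_cost) < tol:
            break                              # change in cost fell below the threshold
        previous_cost = current_cost
    return x


def toy_cost(x):
    """Hypothetical cost with its minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_gradient(x):
    """Derivative of the toy cost."""
    return 2.0 * (x - 3.0)


minimum = gradient_descent(toy_cost, toy_gradient, start=10.0)
```

A learning rate that is too large would make the updates overshoot and diverge, while a very small one would need many more iterations.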
• Example: Finding the Lowest Point in a Valley

Imagine you are blindfolded and placed somewhere in a valley. Your goal is
to find the lowest point in the valley without being able to see the
terrain. Here's how you might proceed:
▪ Initial Position: You start at a random location in the valley.
▪ Objective: Your objective is to descend to the lowest point in the valley.
▪ Sense of Touch: You can feel the slope of the ground beneath your feet,
giving you an indication of the direction of descent.
▪ Movement: You take a step in the direction of the steepest downward slope,
relying on your sense of touch to guide you downhill.
▪ Repetition: You repeat this process, continuously adjusting your direction
based on the slope of the terrain.
▪ Convergence: Eventually, you reach the lowest point in the valley,
indicating convergence to the optimal solution.
• In this analogy:
▪ The blindfolded person represents the optimization algorithm.
▪ The sense of touch represents the gradient, providing information about
the direction of descent.
▪ Moving downhill corresponds to updating the parameters in the direction
that minimizes the objective function.
▪ This example illustrates how Gradient Descent works by iteratively
adjusting parameters to minimize a cost function, much like finding the
lowest point in a valley by descending along the steepest slope.
There are three different variants of Gradient Descent:

1. Batch Gradient Descent
2. Stochastic Gradient Descent
3. Mini-batch Gradient Descent
Batch Gradient Descent
• Batch Gradient Descent (BGD) is a variant of the Gradient Descent
optimization algorithm used to minimize a cost function in machine learning
and deep learning models.
• It's called "batch" because it processes the entire training dataset in each
iteration to update the model parameters.

➢ Optimization Algorithm in Machine/Deep Learning
➢ Variant of Gradient Descent
➢ Minimize a cost function in ML/DL models
➢ Processes entire training dataset in each iteration
• Working Principle
▪ Initialization: Start with initial guess for model parameters
▪ Compute Gradient: Calculate gradient of cost function with respect to
parameters using entire training dataset
▪ Update Parameters: Adjust parameters using gradients and learning rate
▪ Repeat until convergence criteria met

• Key Aspects
▪ Uses entire dataset in each iteration so it is computationally expensive for
large datasets
▪ Accurate estimate of gradient
▪ Converges to the global minimum for convex cost functions, given a suitable
learning rate; for non-convex cost functions it may settle in a local minimum
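As an illustration of the working principle above, the sketch below fits a line y = w*x + b with batch gradient descent on a mean squared error cost. The tiny dataset, learning rate, and epoch count are assumptions picked for the example.

```python
def batch_gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b, using the whole dataset for every update."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors on the entire training set.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = batch_gradient_descent(xs, ys)
```

Every update touches all five examples; on a dataset with millions of rows the same loop would be correspondingly expensive per step.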
Stochastic Gradient Descent

• SGD is a variant of the gradient descent optimization algorithm widely used
in machine learning and deep learning.
• Unlike batch gradient descent, which computes the gradient using the entire
dataset, SGD updates the model parameters using a single training example at
a time.
Working Principle:
• Initialization: Start with initial guess for model parameters.
• For each epoch:
  • Shuffle the training dataset.
  • Iterate through each training example:
    • Compute the gradient of the cost function with respect to the current
    training example.
    • Update the model parameters using the computed gradient and a
    predefined learning rate.
• Repeat the process for a fixed number of epochs or until convergence criteria
are met.
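The same line-fitting problem can be sketched with stochastic gradient descent, updating after every single example. The data, learning rate, epoch count, and fixed random seed are assumptions for the example; the seed only makes the shuffling reproducible.

```python
import random


def stochastic_gradient_descent(xs, ys, learning_rate=0.01, epochs=500, seed=0):
    """Fit y = w * x + b, updating the parameters one example at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)                      # new example order each epoch
        for x, y in data:
            error = (w * x + b) - y            # error on this single example
            w -= learning_rate * 2.0 * error * x
            b -= learning_rate * 2.0 * error
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = stochastic_gradient_descent(xs, ys)
```

With noisy data the parameters would keep fluctuating around the minimum instead of settling exactly, which is the usual trade-off of single-example updates.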
Mini-batch Gradient Descent
• MBGD strikes a balance between the efficiency of stochastic gradient descent
(SGD) and the stability of batch gradient descent by updating the model
parameters using a small subset of the training data at each iteration.
• Instead of using the entire training dataset (as in batch gradient descent) or
just one example (as in SGD), MBGD divides the dataset into small batches
and updates the parameters based on the average gradient computed over each
batch.
Working Principle:
▪ Initialization: Start with an initial guess for the model parameters.
▪ Divide the training dataset into mini-batches of equal size (e.g., 32, 64, or 128
examples per batch).
▪ For each epoch:
  • Shuffle the training dataset to introduce randomness and reduce the
  chance of the model getting stuck in local minima.
  • Iterate through each mini-batch:
    ▪ Compute the gradient of the cost function with respect to the
    mini-batch.
    ▪ Update the model parameters using the computed gradient and a
    predefined learning rate.
▪ Repeat the process for a fixed number of epochs or until convergence criteria
are met.
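A mini-batch version of the line-fitting sketch averages the gradient over a few examples per update. The batch size of 2, the six-point dataset, the learning rate, and the seed are all assumptions chosen to keep the example small.

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size=2, learning_rate=0.02,
                               epochs=1000, seed=0):
    """Fit y = w * x + b, averaging the gradient over each mini-batch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            errors = [((w * x + b) - y, x) for x, y in batch]
            # Average gradient of the squared error over this mini-batch.
            grad_w = 2.0 * sum(e * x for e, x in errors) / len(batch)
            grad_b = 2.0 * sum(e for e, _ in errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
w, b = minibatch_gradient_descent(xs, ys)
```

Setting `batch_size` to 1 recovers stochastic gradient descent, and setting it to the dataset size recovers batch gradient descent.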
Advantages:
• Offers a good compromise between the efficiency of SGD and the
stability of batch gradient descent.
• Well-suited for training on large datasets that do not fit into memory.
Limitation:
• Requires tuning of hyperparameters such as the learning rate and batch
size.
Advanced Optimizers
Optimizers :
• Optimizers adjust model parameters iteratively during training to
minimize a loss function, enabling neural networks to learn from data.
• Choosing an appropriate optimizer for a deep learning model is important
as it can greatly impact its performance. Optimization algorithms have
different strengths and weaknesses and are better suited for certain
problems and architectures.
• Some advanced optimizers used in neural networks:
1. SGD with Momentum
2. Nesterov Accelerated Gradient (NAG)
3. AdaGrad (Adaptive Gradient)
4. RMSprop (Root Mean Square Propagation)
5. Adam (Adaptive Moment Estimation)
SGD with Momentum
• Momentum optimization is a popular variant of the gradient descent
optimization algorithm commonly used to train neural networks. It
addresses some of the limitations of basic gradient descent, particularly
slow convergence in the presence of flat or small gradients and
oscillations in the optimization process.

• In momentum optimization, instead of updating the weights based solely
on the current gradient, it also considers the accumulation of past
gradients to determine the direction of the update.
• This is achieved by introducing a new parameter called the momentum
parameter, denoted by β, which is typically set to a value between 0 and 1.
• The momentum term $V_t$ is an exponentially weighted moving average of
past gradients. (An exponentially weighted moving average smooths
time-series data by assigning exponentially decreasing weights to older
observations; it is widely used for trend analysis, noise reduction, and
forecasting.) It accelerates the updates in directions where the gradients
point consistently over time and dampens oscillations in directions where
the gradients change direction frequently.
• By incorporating momentum, the optimizer gains inertia, enabling it to
continue moving in the same direction for a longer time and traverse
through regions of flat or small gradients more efficiently. This leads to
faster convergence and reduced oscillations during training.
• In short, momentum optimization accelerates gradient descent by
introducing a momentum term that accumulates past gradients, helping
the optimizer navigate through complex optimization landscapes more
effectively. It is widely used in practice due to its ability to speed up
training and improve convergence for deep learning models.
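A minimal sketch of a momentum update, under one stated assumption about its form: the velocity is the exponentially weighted average $V_t = \beta V_{t-1} + (1 - \beta) g_t$ of the gradients $g_t$, followed by $w \leftarrow w - \eta V_t$. Other texts use $V_t = \beta V_{t-1} + \eta g_t$ instead. The toy gradient, learning rate, β = 0.9, and step count are likewise illustrative.

```python
def momentum_descent(grad, start, learning_rate=0.1, beta=0.9, steps=500):
    """Gradient descent whose step follows a moving average of past gradients."""
    x, velocity = start, 0.0
    for _ in range(steps):
        # Exponentially weighted moving average of the gradients seen so far.
        velocity = beta * velocity + (1.0 - beta) * grad(x)
        x -= learning_rate * velocity
    return x


def toy_gradient(x):
    """Gradient of the hypothetical cost (x - 3)^2."""
    return 2.0 * (x - 3.0)


minimum = momentum_descent(toy_gradient, start=10.0)
```

Setting `beta` to 0 removes the memory of past gradients and reduces the update to plain gradient descent.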
Nesterov Accelerated Gradient (NAG)
• Nesterov Accelerated Gradient (NAG) is an optimization algorithm that
builds upon the momentum optimization method. It aims to improve upon
the original momentum approach by addressing the issue of momentum
overshooting, which can occur when the current gradient update is
combined with the accumulated momentum term.

• In Nesterov Accelerated Gradient, instead of evaluating the gradient at the
current position of the parameters, it evaluates the gradient at an adjusted
position that takes into account the momentum term. This adjustment is
made to anticipate the future position of the parameters based on the
momentum.
• The key difference between Nesterov Accelerated Gradient and traditional
momentum optimization is that the gradient is evaluated at the
"lookahead" position, the current parameters shifted by the accumulated
momentum term, which anticipates where the parameters will be after the
momentum step. This allows NAG to correct the momentum overshooting
problem by using a more accurate gradient estimate.
• By incorporating Nesterov momentum, the optimizer can adjust the
momentum term more effectively and reduce oscillations, leading to faster
convergence and improved optimization performance compared to
traditional momentum optimization.
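The lookahead idea can be sketched as follows. This assumes one particular formulation: the velocity is an exponentially weighted average of gradients, and the lookahead point is the current parameter minus the learning rate times β times the previous velocity. Other equivalent formulations exist, and the toy gradient and hyperparameters are illustrative only.

```python
def nesterov_descent(grad, start, learning_rate=0.1, beta=0.9, steps=500):
    """Momentum descent that evaluates the gradient at a lookahead position."""
    x, velocity = start, 0.0
    for _ in range(steps):
        # Where the accumulated momentum alone would carry the parameter.
        lookahead = x - learning_rate * beta * velocity
        # Gradient is taken at the lookahead point, not at x itself.
        velocity = beta * velocity + (1.0 - beta) * grad(lookahead)
        x -= learning_rate * velocity
    return x


def toy_gradient(x):
    """Gradient of the hypothetical cost (x - 3)^2."""
    return 2.0 * (x - 3.0)


minimum = nesterov_descent(toy_gradient, start=10.0)
```

The only change from a plain momentum loop is the point at which `grad` is evaluated.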
