Optimization

The document discusses various optimization algorithms used in deep learning, including gradient descent, batch gradient descent, stochastic gradient descent, mini-batch gradient descent, RMSprop, and Adam. It defines key terms such as optimizer, learning rate, and cost function, and discusses the challenges of gradient descent.


What is an Optimizer?

➢ Optimization algorithms are responsible for reducing the loss and producing the most accurate results possible.
➢ The weights are initialized using some initialization strategy and are updated after each epoch according to an update equation; a generic form is sketched below.
➢ The best results are achieved using optimization strategies or algorithms called optimizers.
➢ Optimizers are algorithms or methods used to change the attributes of your neural network, such as the weights and the learning rate, in order to reduce the losses.
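
The update equation mentioned above is, in its simplest form, a step against the gradient of the loss. The sketch below is a minimal illustration; the function name and values are illustrative, not from any specific framework.

import numpy as np

def update_weights(w, grad, learning_rate=0.01):
    # Generic gradient-based update: move the weights a small step
    # against the gradient of the loss with respect to the weights.
    return w - learning_rate * grad

# Example: one update of a single weight whose gradient is 2.0
w = np.array([0.5])
w = update_weights(w, np.array([2.0]), learning_rate=0.1)
print(w)  # [0.3]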

Types of Optimizers in Deep Learning:

➢ Gradient Descent
➢ Batch Gradient Descent
➢ Stochastic Gradient Descent (SGD)
➢ Mini-Batch Stochastic Gradient Descent (MB-SGD)
➢ RMSProp
➢ Adam

What is Gradient Descent?

➢ Gradient Descent is an optimization algorithm that is used for minimizing the cost function (the error).
➢ It updates the parameters of a machine learning model step by step in the direction that reduces the cost function.
➢ Gradient descent is commonly used to train machine learning models and neural networks.

What is Gradient?

➢ A gradient measures how much the output of a function changes if you change the inputs a
little bit.
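
As a minimal illustration of this definition (not part of the original notes), the snippet below estimates the gradient of f(x) = x**2 numerically by changing the input a little bit and observing how much the output changes.

def f(x):
    return x ** 2

def numerical_gradient(func, x, eps=1e-6):
    # Central difference: change in output for a tiny change in input.
    return (func(x + eps) - func(x - eps)) / (2 * eps)

print(numerical_gradient(f, 3.0))  # approximately 6.0, since d/dx x^2 = 2x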

Steps to implement Gradient Descent:

➢ Randomly initialize the parameter values.
➢ Update the values by taking a step against the gradient.
➢ Repeat until the slope (gradient) is approximately 0; a minimal sketch follows below.
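
The sketch below follows these three steps for the one-dimensional function f(x) = x**2, whose gradient is 2x. The names and the stopping tolerance are illustrative assumptions; in practice the loop stops when the slope is close to zero rather than exactly zero.

def gradient_descent(grad_fn, x0, learning_rate=0.1, tol=1e-8, max_iters=1000):
    x = x0                            # step 1: start from some initial value
    for _ in range(max_iters):
        grad = grad_fn(x)             # slope at the current point
        if abs(grad) < tol:           # step 3: stop once the slope is ~0
            break
        x = x - learning_rate * grad  # step 2: update the value
    return x

# Minimize f(x) = x^2; the minimum is at x = 0.
x_min = gradient_descent(lambda x: 2 * x, x0=5.0)
print(x_min)  # very close to 0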

What is Batch Gradient Descent?

➢ Batch gradient descent (BGD) computes the error for every point in the training set and updates the model only after evaluating all training examples.
➢ One full pass over the training set is known as a training epoch.
➢ Batch gradient descent is also called vanilla gradient descent.
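
A minimal sketch of batch gradient descent for simple linear regression: the gradient is averaged over the whole training set before each single update. The toy data, learning rate, and epoch count are illustrative assumptions.

import numpy as np

# Toy data: y = 3x + 1 with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3 * X + 1 + 0.1 * rng.normal(size=100)

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(200):                  # each pass over all the data is one epoch
    error = (w * X + b) - y
    grad_w = 2 * np.mean(error * X)       # gradient over the *whole* training set
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w                      # a single update per epoch
    b -= lr * grad_b

print(w, b)  # approximately 3 and 1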

What is Stochastic Gradient Descent (SGD)?

➢ Stochastic Gradient Descent is an extension of Gradient Descent that overcomes some of the disadvantages of the batch algorithm.
➢ Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration.
➢ In other words, it updates the model parameters after every single example, so one epoch consists of as many updates as there are training examples.
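
A minimal sketch of SGD on the same toy regression problem as above: the parameters are updated after every individual example. The data, learning rate, and epoch count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3 * X + 1 + 0.1 * rng.normal(size=100)

w, b, lr = 0.0, 0.0, 0.05
for epoch in range(20):
    for i in rng.permutation(len(X)):     # visit the examples in random order
        error = (w * X[i] + b) - y[i]
        w -= lr * 2 * error * X[i]        # update after *each* example
        b -= lr * 2 * error

print(w, b)  # approximately 3 and 1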

What is Mini-Batch Gradient Descent?

➢ Mini-batch gradient descent is a combination of batch gradient descent and stochastic gradient descent.
➢ It splits the training dataset into small batches and performs an update after each of those batches.
➢ This approach strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent.
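
A minimal sketch of mini-batch gradient descent on the same toy regression problem, with an assumed batch size of 16 (the batch size, data, and learning rate are illustrative): the shuffled data is split into small batches and an update is made after each batch.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3 * X + 1 + 0.1 * rng.normal(size=100)

w, b, lr, batch_size = 0.0, 0.0, 0.1, 16
for epoch in range(100):
    idx = rng.permutation(len(X))                 # shuffle before splitting into batches
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        error = (w * X[batch] + b) - y[batch]
        w -= lr * 2 * np.mean(error * X[batch])   # one update per mini-batch
        b -= lr * 2 * np.mean(error)

print(w, b)  # approximately 3 and 1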

What is RMSprop Optimizer?

➢ The RMSprop optimizer is similar to the gradient descent algorithm with momentum.
➢ It restricts oscillations in the vertical direction by dividing each parameter's step by a running average of its recent squared gradients.
➢ Therefore, we can increase the learning rate, and the algorithm can take larger steps in the horizontal direction and converge faster.
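
A minimal sketch of the RMSprop update for a single parameter vector, assuming the commonly used decay rate of 0.9 and a small epsilon for numerical stability; the variable names and the example gradient are illustrative.

import numpy as np

def rmsprop_update(w, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    # Keep an exponentially decaying average of squared gradients ...
    cache = decay * cache + (1 - decay) * grad ** 2
    # ... and divide the step by its square root, which damps directions
    # that keep oscillating and allows a larger overall learning rate.
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
grad = np.array([0.5, -0.1])      # pretend this came from backpropagation
w, cache = rmsprop_update(w, grad, cache)
print(w)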

What is Adam Optimizer?

➢ Adam is a popular replacement for plain stochastic gradient descent when training deep learning models.
➢ Adam combines the best properties of the AdaGrad and RMSProp algorithms, keeping running averages of both the gradients and the squared gradients, to provide an optimizer that can handle sparse gradients on noisy problems.
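
A minimal sketch of the Adam update for a single parameter vector, using the commonly cited default hyperparameters (beta1 = 0.9, beta2 = 0.999); the variable names and example gradients are illustrative.

import numpy as np

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running average of gradients (momentum-like)
    v = beta2 * v + (1 - beta2) * grad ** 2   # running average of squared gradients (RMSprop-like)
    m_hat = m / (1 - beta1 ** t)              # bias correction for the first few steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 4):                         # a few illustrative steps
    grad = np.array([0.5, -0.1])              # pretend this came from backpropagation
    w, m, v = adam_update(w, grad, m, v, t)
print(w)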

Challenges with Gradient Descent:

1. Local Minima and Saddle Points: the gradient can become (close to) zero at a local minimum or a saddle point, so the algorithm may stop before reaching the global minimum.

2. Vanishing and Exploding Gradients:

➢ Vanishing gradients: a vanishing gradient occurs when the gradient becomes much smaller than expected as it is propagated backwards through the network, so the earlier layers learn very slowly.
➢ Exploding gradients: this happens when the gradient grows too large, producing huge updates and an unstable model.
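
As a rough illustration (an assumption added here, not a derivation from the notes), backpropagation multiplies gradients layer by layer, so repeatedly multiplying by factors below or above 1 makes the gradient shrink towards zero or blow up:

grad = 1.0
for layer in range(30):
    grad *= 0.5          # per-layer factors < 1 -> vanishing gradient
print(grad)              # about 9e-10: the early layers barely learn

grad = 1.0
for layer in range(30):
    grad *= 1.5          # per-layer factors > 1 -> exploding gradient
print(grad)              # about 1.9e5: huge, unstable updates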

What is Learning Rate?

➢ The learning rate is the size of the steps taken to reach the minimum.
➢ It is defined as the step size taken towards the minimum or lowest point of the cost function.
➢ It is typically a small value that is evaluated and adjusted based on the behavior of the cost function.
➢ If the learning rate is high, the steps are larger but there is a risk of overshooting the minimum; if it is very low, convergence takes many more steps (see the sketch below).
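
A minimal sketch on f(x) = x**2 showing why a learning rate that is too high overshoots the minimum while a smaller one converges; the two rates are chosen purely for illustration.

def run(learning_rate, steps=20, x=5.0):
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # gradient of x^2 is 2x
    return x

print(run(learning_rate=0.1))   # moves steadily towards the minimum at 0
print(run(learning_rate=1.1))   # overshoots further on every step and diverges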

What is a Cost Function?

➢ The cost function measures the difference (error) between the actual values and the predicted values at the current parameter position, and it is expressed as a single real number.
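
A minimal example of one common cost function, mean squared error, which reduces the gap between actual and predicted values to a single real number; the numbers are made up for illustration.

import numpy as np

actual    = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5,  0.0, 2.0, 8.0])

mse = np.mean((actual - predicted) ** 2)   # a single real number summarizing the error
print(mse)  # 0.375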
