
Deep Learning
Master Class
Day-9
Announcement
Attendance link will be available around 7:30 PM
01

Optimizers
Concept
Disadvantages of gradient descent

• A small learning rate leads to slow convergence

• A large learning rate hinders convergence

• Reducing the learning rate according to a pre-defined schedule cannot adapt to a dataset's characteristics

• The same learning rate is applied to all parameter updates, even when features occur with different frequencies (a plain gradient descent sketch follows below)

Variations of GD
1. Momentum
2. Nesterov Accelerated Gradient
3. Adagrad
4. Adadelta/RMSProp
5. Adam
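To make the learning-rate trade-off concrete, here is a minimal sketch (not from the slides; the quadratic, values, and names are illustrative) of plain gradient descent with a single fixed learning rate:

```python
import numpy as np

# A minimal sketch (illustrative values): plain gradient descent on an
# ill-conditioned quadratic f(w) = 0.5 * w^T A w. One fixed learning rate
# is either too slow for the flat direction or unstable for the steep one.
A = np.diag([1.0, 100.0])      # curvature differs strongly by direction
grad = lambda w: A @ w         # gradient of the quadratic

def gd(lr, steps=100):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - lr * grad(w)   # same lr for every parameter, every step
    return w

print(gd(lr=0.001))   # small lr: the flat direction barely moves
print(gd(lr=0.025))   # large lr: the steep direction blows up
```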
Momentum
• Momentum helps accelerate gradient descent (GD) when we have surfaces that curve more steeply in one direction than in another

• For updating the weights it takes the gradient of the current step as well as the gradients of the previous time steps

• This helps us move faster towards convergence

• Convergence happens faster when we apply the momentum optimizer to such curved surfaces

Momentum gradient descent takes the gradients of previous time steps into consideration (see the update sketch below)
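A minimal sketch of one momentum update, assuming the common formulation v_t = gamma*v_(t-1) + lr*grad(w), w = w - v_t; the slide gives no formulas, so the names and defaults are illustrative:

```python
# A minimal sketch (assumed standard formulation): one momentum update step.
def momentum_step(w, v, grad_fn, lr=0.01, gamma=0.9):
    v = gamma * v + lr * grad_fn(w)   # current gradient plus decayed history
    w = w - v                         # step along the accumulated velocity
    return w, v
```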
Nesterov accelerated gradient (NAG)
• Nesterov accelerated gradient is like a ball rolling down a hill that knows exactly when to slow down before the gradient of the hill increases again

• We calculate the gradient not with respect to the current step but with respect to the approximate future step

• We evaluate the gradient at this looked-ahead position and then update the weights based on it

• Works slightly better than standard Momentum (a look-ahead sketch follows)
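A minimal sketch of the usual look-ahead formulation of NAG; the names and default values are illustrative, not taken from the slides:

```python
# A minimal sketch (assumed look-ahead formulation): the gradient is
# evaluated at w - gamma * v rather than at the current position w.
def nag_step(w, v, grad_fn, lr=0.01, gamma=0.9):
    lookahead = w - gamma * v               # approximate future position
    v = gamma * v + lr * grad_fn(lookahead)
    w = w - v
    return w, v
```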


Adagrad — Adaptive Gradient Algorithm
• In Momentum and NAG we need to tune the learning rate, which is an expensive process

• Adagrad is an adaptive learning rate method

• In Adagrad we adapt the learning rate to the parameters

• We perform larger updates for infrequent parameters and smaller updates for frequent parameters

• For SGD, Momentum, and NAG we update all parameters θ at once, using the same learning rate η. In Adagrad we use a different learning rate for every parameter θ at every time step t

• Adagrad eliminates the need to manually tune the learning rate (see the sketch below)
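A minimal sketch of the standard Adagrad update, with illustrative names and an assumed epsilon for numerical stability:

```python
import numpy as np

# A minimal sketch (standard Adagrad update): accumulate squared gradients
# per parameter and shrink each parameter's effective learning rate.
def adagrad_step(w, G, grad_fn, lr=0.01, eps=1e-8):
    g = grad_fn(w)
    G = G + g ** 2                        # per-parameter accumulated g^2
    w = w - lr * g / (np.sqrt(G) + eps)   # larger G -> smaller effective lr
    return w, G
```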
Adadelta

• Adadelta is an extension of Adagrad that tries to reduce Adagrad's aggressive, monotonically decreasing learning rate

• It does this by restricting the window of accumulated past gradients to some fixed size w. The running average at time t then depends on the previous average and the current gradient

• In Adadelta we do not need to set a default learning rate, as the step is the ratio of the running average of past parameter updates to the running average of the current gradients (sketched below)
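A minimal sketch of the standard Adadelta update; rho and eps are typical values assumed here, not taken from the slides:

```python
import numpy as np

# A minimal sketch (standard Adadelta update): the step size is the ratio of
# two exponentially decaying running averages, so no explicit learning rate.
def adadelta_step(w, Eg2, Edx2, grad_fn, rho=0.95, eps=1e-6):
    g = grad_fn(w)
    Eg2 = rho * Eg2 + (1 - rho) * g ** 2                  # running avg of g^2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g    # update, no lr
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2               # running avg of dx^2
    return w + dx, Eg2, Edx2
```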
RMSProp

• Root Mean Square Propagation

• RMSProp tries to resolve Adagrad's radically diminishing learning rates by using a moving average of the squared gradients

• It uses the magnitude of recent gradients to normalize the current gradient

• In RMSProp the learning rate is adjusted automatically, and a different effective learning rate is chosen for each parameter

• It divides the learning rate by an exponentially decaying average of squared gradients (a minimal update sketch follows)
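A minimal sketch of the standard RMSProp update, with assumed default values for the decay rate and epsilon:

```python
import numpy as np

# A minimal sketch (standard RMSProp update): divide the learning rate by the
# root of a decaying average of squared gradients, per parameter.
def rmsprop_step(w, Eg2, grad_fn, lr=0.001, rho=0.9, eps=1e-8):
    g = grad_fn(w)
    Eg2 = rho * Eg2 + (1 - rho) * g ** 2
    w = w - lr * g / (np.sqrt(Eg2) + eps)
    return w, Eg2
```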
Adam — Adaptive Moment Estimation
• Another method that calculates an individual adaptive learning rate for each parameter from estimates of the first and second moments of the gradients

• It also avoids the radically diminishing learning rates of Adagrad

• It can be viewed as a combination of Adagrad, which works well on sparse gradients, and RMSProp, which works well in online and non-stationary settings

• Adam uses exponential moving averages of the gradients to scale the learning rate, instead of the cumulative sum used in Adagrad. It keeps an exponentially decaying average of past gradients

• Adam is computationally efficient and has very little memory requirement

• Adam is one of the most popular gradient descent optimization algorithms (update sketch below)
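A minimal sketch of the standard Adam update with bias correction; the hyperparameter defaults shown are the commonly used ones, assumed here:

```python
import numpy as np

# A minimal sketch (standard Adam update with bias correction); t starts at 1.
def adam_step(w, m, v, t, grad_fn, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    g = grad_fn(w)
    m = b1 * m + (1 - b1) * g            # decaying avg of gradients
    v = b2 * v + (1 - b2) * g ** 2       # decaying avg of squared gradients
    m_hat = m / (1 - b1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)            # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```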
Deep Learning Terminology - 1
LSTM
• LSTM stands for Long Short-Term Memory
• An LSTM has feedback connections, which make it a "general purpose computer"
• It can process not only single data points but also entire sequences of data
• LSTMs are a special kind of RNN capable of learning long-term dependencies

• Step 1: The network decides what to forget and what to remember
• Step 2: It selectively updates the cell state values
• Step 3: The network decides what part of the current state makes it to the output (a single-cell sketch follows)
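A minimal single-cell sketch that maps onto the three steps above; the weight containers W, U, b and their shapes are assumed for illustration:

```python
import numpy as np

# A minimal sketch of one LSTM cell step; W, U, b are assumed dicts of weight
# matrices/vectors for the forget ('f'), input ('i'), output ('o') gates and
# the candidate values ('g').
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h, c, W, U, b):
    f = sigmoid(W['f'] @ x + U['f'] @ h + b['f'])   # Step 1: what to forget
    i = sigmoid(W['i'] @ x + U['i'] @ h + b['i'])   # Step 1: what to remember
    g = np.tanh(W['g'] @ x + U['g'] @ h + b['g'])   # candidate cell values
    c = f * c + i * g                               # Step 2: update cell state
    o = sigmoid(W['o'] @ x + U['o'] @ h + b['o'])   # Step 3: what to expose
    h = o * np.tanh(c)                              # new hidden state / output
    return h, c
```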
Deep Learning Terminology - 2
Autoencoder
• An autoencoder is an artificial neural network. It can learn a representation for a set of data without any supervision

• The network learns automatically by copying its input to its output

• Autoencoders can learn efficient ways of representing the data

• It contains three parts (a Keras sketch follows below):
• Encoder - compresses the input into a latent space. It encodes the input images as a compressed representation in a reduced dimension; the compressed images are a distorted version of the originals
• Code - the compressed latent-space representation produced by the encoder
• Decoder - decodes the encoded image back to its original dimension. The decoded image is a lossy reconstruction of the original image
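A minimal Keras sketch of the encoder, code, and decoder described above; the layer sizes (784, 128, 32) and training setup are assumed for illustration:

```python
# A minimal sketch (illustrative sizes, Keras assumed available): a dense
# autoencoder that compresses 784-dim inputs to a 32-dim code and back.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(128, activation='relu')(inputs)      # encoder
code = layers.Dense(32, activation='relu')(h)         # compressed latent code
h = layers.Dense(128, activation='relu')(code)        # decoder
outputs = layers.Dense(784, activation='sigmoid')(h)  # reconstruction

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(x_train, x_train, ...)  # the input is also the target
```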
Deep Learning Terminology - 3
Bagging and Boosting
• Bagging and Boosting are ensemble techniques that train multiple models with the same learning algorithm and then combine their predictions

• With Bagging, we take a dataset and split it into training data and test data. We then randomly sample the training data into bags and train a separate model on each bag

• With Boosting, the emphasis is on the data points that earlier models got wrong, so that later models improve the overall accuracy (a scikit-learn sketch follows)
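A minimal scikit-learn sketch contrasting the two; the dataset, base learner, and parameters are assumed for illustration:

```python
# A minimal sketch (scikit-learn assumed, illustrative parameters): bagging
# and boosting over the same base learner, decision trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: each tree sees a random bootstrap sample of the training data
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0).fit(X_tr, y_tr)

# Boosting: later trees put more weight on points earlier trees got wrong
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

print("bagging :", bagging.score(X_te, y_te))
print("boosting:", boosting.score(X_te, y_te))
```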
A.I Career Assistance Course Bundle
A.I Career Assistance Bundle
• Access to 6 Technical Courses with 150+ Projects (Value ₹80,000)
• Access to 3 Soft Skill & Mindset Courses (Value ₹5,000)
• 250+ Source Codes (Value ₹20,000)
• Technical Materials (PPT & Mindmap) (Value ₹10,000)
• Bonus Tasks & Assignments (Value ₹10,000)
• Forum Discussion & Support (Value ₹5,000)
• 10+ Industry Level Projects (Value ₹25,000)
• 6 Internship E-Certificates (A.I, ML, DA, DL, CV, Python) (Value unlimited)
• Lifetime Community for Job Openings & Other Discussions (Value ₹5,000)
• Personal Assistance for Resume Validation (Value ₹10,000)
• Validity – 1 Year

9 Courses @ ₹2999
Total Value ₹1,65,000
@₹2999 only
Short bytes – D9
Ways to have more time

• Decide what's important and do it first every day
• Don't let other people schedule your life
• Pay close attention to what makes you happy
• Stop watching TV
• Schedule your breaks and enjoy them
• Look through your calendar and cancel things you aren't excited about
• If you keep putting something off, just let it go
• Before you go to bed, decide on tomorrow's most important action


Q&A
Session
Thanks!
Follow Me on LinkedIn : Link in Description
+91 7305845758
Pantechsolutions.net

Ask Your Friends to Participate in this Event
