
Deep Learning
Master Class
Day-9
Announcement
Attendance link will be available around 7:30 PM
01

Optimizers
Concept
Disadvantages of gradient descent

• A small learning rate leads to slow convergence

• A large learning rate hinders convergence

• Reducing the learning rate according to a pre-defined schedule cannot adapt to a dataset's characteristics

• The same learning rate is applied to all parameter updates, even when features occur with different frequencies (a plain gradient descent sketch follows below)

Variations of GD
1. Momentum
2. Nesterov Accelerated Gradient
3. Adagrad
4. Adadelta/RMSProp
5. Adam
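To make the learning-rate trade-off concrete, here is a minimal sketch (not from the slides; the quadratic, values, and names are illustrative) of plain gradient descent with a single fixed learning rate:

```python
import numpy as np

# A minimal sketch (illustrative values): plain gradient descent on an
# ill-conditioned quadratic f(w) = 0.5 * w^T A w. One fixed learning rate
# is either too slow for the flat direction or unstable for the steep one.
A = np.diag([1.0, 100.0])      # curvature differs strongly by direction
grad = lambda w: A @ w         # gradient of the quadratic

def gd(lr, steps=100):
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - lr * grad(w)   # same lr for every parameter, every step
    return w

print(gd(lr=0.001))   # small lr: the flat direction barely moves
print(gd(lr=0.025))   # large lr: the steep direction blows up
```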
Momentum
• Momentum helps accelerate gradient descent (GD) when we have surfaces that curve more steeply in one direction than in another

• For updating the weights it takes the gradient of the current step as well as the gradients of the previous time steps

• This helps us move faster towards convergence

• Convergence happens faster when we apply the momentum optimizer to such curved surfaces

Momentum gradient descent takes the gradients of previous time steps into consideration (see the update sketch below)
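A minimal sketch of one momentum update, assuming the common formulation v_t = gamma*v_(t-1) + lr*grad(w), w = w - v_t; the slide gives no formulas, so the names and defaults are illustrative:

```python
# A minimal sketch (assumed standard formulation): one momentum update step.
def momentum_step(w, v, grad_fn, lr=0.01, gamma=0.9):
    v = gamma * v + lr * grad_fn(w)   # current gradient plus decayed history
    w = w - v                         # step along the accumulated velocity
    return w, v
```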
Nesterov accelerated gradient (NAG)
• Nesterov accelerated gradient is like a ball rolling down a hill that knows exactly when to slow down before the gradient of the hill increases again

• We calculate the gradient not with respect to the current step but with respect to the approximate future step

• We evaluate the gradient at this looked-ahead position and then update the weights based on it

• Works slightly better than standard Momentum (a look-ahead sketch follows)
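A minimal sketch of the usual look-ahead formulation of NAG; the names and default values are illustrative, not taken from the slides:

```python
# A minimal sketch (assumed look-ahead formulation): the gradient is
# evaluated at w - gamma * v rather than at the current position w.
def nag_step(w, v, grad_fn, lr=0.01, gamma=0.9):
    lookahead = w - gamma * v               # approximate future position
    v = gamma * v + lr * grad_fn(lookahead)
    w = w - v
    return w, v
```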


Adagrad — Adaptive Gradient Algorithm
• In Momentum and NAG we need to tune the learning rate, which is an expensive process

• Adagrad is an adaptive learning rate method

• In Adagrad we adapt the learning rate to the parameters

• We perform larger updates for infrequent parameters and smaller updates for frequent parameters

• For SGD, Momentum, and NAG we update all parameters θ at once, using the same learning rate η. In Adagrad we use a different learning rate for every parameter θ at every time step t

• Adagrad eliminates the need to manually tune the learning rate (see the sketch below)
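A minimal sketch of the standard Adagrad update, with illustrative names and an assumed epsilon for numerical stability:

```python
import numpy as np

# A minimal sketch (standard Adagrad update): accumulate squared gradients
# per parameter and shrink each parameter's effective learning rate.
def adagrad_step(w, G, grad_fn, lr=0.01, eps=1e-8):
    g = grad_fn(w)
    G = G + g ** 2                        # per-parameter accumulated g^2
    w = w - lr * g / (np.sqrt(G) + eps)   # larger G -> smaller effective lr
    return w, G
```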
Adadelta

• Adadelta is an extension of Adagrad that tries to reduce Adagrad's aggressive, monotonically decreasing learning rate

• It does this by restricting the window of accumulated past gradients to some fixed size w. The running average at time t then depends on the previous average and the current gradient

• In Adadelta we do not need to set a default learning rate, as the step is the ratio of the running average of past parameter updates to the running average of the current gradients (sketched below)
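A minimal sketch of the standard Adadelta update; rho and eps are typical values assumed here, not taken from the slides:

```python
import numpy as np

# A minimal sketch (standard Adadelta update): the step size is the ratio of
# two exponentially decaying running averages, so no explicit learning rate.
def adadelta_step(w, Eg2, Edx2, grad_fn, rho=0.95, eps=1e-6):
    g = grad_fn(w)
    Eg2 = rho * Eg2 + (1 - rho) * g ** 2                  # running avg of g^2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g    # update, no lr
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2               # running avg of dx^2
    return w + dx, Eg2, Edx2
```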
RMSProp

• Root Mean Square Propagation

• RMSProp tries to resolve Adagrad's radically diminishing learning rates by using a moving average of the squared gradients

• It uses the magnitude of recent gradients to normalize the current gradient

• In RMSProp the learning rate is adjusted automatically, and a different effective learning rate is chosen for each parameter

• It divides the learning rate by an exponentially decaying average of squared gradients (a minimal update sketch follows)
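A minimal sketch of the standard RMSProp update, with assumed default values for the decay rate and epsilon:

```python
import numpy as np

# A minimal sketch (standard RMSProp update): divide the learning rate by the
# root of a decaying average of squared gradients, per parameter.
def rmsprop_step(w, Eg2, grad_fn, lr=0.001, rho=0.9, eps=1e-8):
    g = grad_fn(w)
    Eg2 = rho * Eg2 + (1 - rho) * g ** 2
    w = w - lr * g / (np.sqrt(Eg2) + eps)
    return w, Eg2
```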
Adam — Adaptive Moment Estimation
• Another method that calculates an individual adaptive learning rate for each parameter from estimates of the first and second moments of the gradients

• It also avoids the radically diminishing learning rates of Adagrad

• It can be viewed as a combination of Adagrad, which works well on sparse gradients, and RMSProp, which works well in online and non-stationary settings

• Adam uses exponential moving averages of the gradients to scale the learning rate, instead of the cumulative sum used in Adagrad. It keeps an exponentially decaying average of past gradients

• Adam is computationally efficient and has very little memory requirement

• Adam is one of the most popular gradient descent optimization algorithms (update sketch below)
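A minimal sketch of the standard Adam update with bias correction; the hyperparameter defaults shown are the commonly used ones, assumed here:

```python
import numpy as np

# A minimal sketch (standard Adam update with bias correction); t starts at 1.
def adam_step(w, m, v, t, grad_fn, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    g = grad_fn(w)
    m = b1 * m + (1 - b1) * g            # decaying avg of gradients
    v = b2 * v + (1 - b2) * g ** 2       # decaying avg of squared gradients
    m_hat = m / (1 - b1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)            # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```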
Deep Learning Terminology - 1
LSTM
• LSTM stands for Long Short-Term Memory
• An LSTM has feedback connections, which make it a "general purpose computer"
• It can process not only single data points but also entire sequences of data
• LSTMs are a special kind of RNN capable of learning long-term dependencies

• Step 1: The network decides what to forget and what to remember
• Step 2: It selectively updates the cell state values
• Step 3: The network decides what part of the current state makes it to the output (a single-cell sketch follows)
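A minimal single-cell sketch that maps onto the three steps above; the weight containers W, U, b and their shapes are assumed for illustration:

```python
import numpy as np

# A minimal sketch of one LSTM cell step; W, U, b are assumed dicts of weight
# matrices/vectors for the forget ('f'), input ('i'), output ('o') gates and
# the candidate values ('g').
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h, c, W, U, b):
    f = sigmoid(W['f'] @ x + U['f'] @ h + b['f'])   # Step 1: what to forget
    i = sigmoid(W['i'] @ x + U['i'] @ h + b['i'])   # Step 1: what to remember
    g = np.tanh(W['g'] @ x + U['g'] @ h + b['g'])   # candidate cell values
    c = f * c + i * g                               # Step 2: update cell state
    o = sigmoid(W['o'] @ x + U['o'] @ h + b['o'])   # Step 3: what to expose
    h = o * np.tanh(c)                              # new hidden state / output
    return h, c
```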
Deep Learning Terminology - 2
Autoencoder
• An autoencoder is an artificial neural network. It can learn a representation for a set of data without any supervision

• The network learns automatically by copying its input to its output

• Autoencoders can learn efficient ways of representing the data

• It contains three parts (a Keras sketch follows below):
• Encoder - compresses the input into a latent space. It encodes the input images as a compressed representation in a reduced dimension; the compressed images are a distorted version of the originals
• Code - the compressed latent-space representation produced by the encoder
• Decoder - decodes the encoded image back to its original dimension. The decoded image is a lossy reconstruction of the original image
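A minimal Keras sketch of the encoder, code, and decoder described above; the layer sizes (784, 128, 32) and training setup are assumed for illustration:

```python
# A minimal sketch (illustrative sizes, Keras assumed available): a dense
# autoencoder that compresses 784-dim inputs to a 32-dim code and back.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(128, activation='relu')(inputs)      # encoder
code = layers.Dense(32, activation='relu')(h)         # compressed latent code
h = layers.Dense(128, activation='relu')(code)        # decoder
outputs = layers.Dense(784, activation='sigmoid')(h)  # reconstruction

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(x_train, x_train, ...)  # the input is also the target
```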
Deep Learning Terminology - 3
Bagging and Boosting
• Bagging and Boosting are ensemble techniques that train multiple models with the same learning algorithm and then combine their predictions

• With Bagging, we take a dataset and split it into training data and test data. We then randomly sample the training data into bags and train a separate model on each bag

• With Boosting, the emphasis is on the data points that earlier models got wrong, so that later models improve the overall accuracy (a scikit-learn sketch follows)
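A minimal scikit-learn sketch contrasting the two; the dataset, base learner, and parameters are assumed for illustration:

```python
# A minimal sketch (scikit-learn assumed, illustrative parameters): bagging
# and boosting over the same base learner, decision trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: each tree sees a random bootstrap sample of the training data
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0).fit(X_tr, y_tr)

# Boosting: later trees put more weight on points earlier trees got wrong
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

print("bagging :", bagging.score(X_te, y_te))
print("boosting:", boosting.score(X_te, y_te))
```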
A.I Career Assistance Course Bundle
A.I Career Assistance Bundle
• Access to 6 Technical Courses with 150+ Projects (Value ₹80,000)
• Access to 3 Soft Skill & Mindset Courses (Value ₹5,000)
• 250+ Source Codes (Value ₹20,000)
• Technical Materials (PPT & Mindmap) (Value ₹10,000)
• Bonus Tasks & Assignments (Value ₹10,000)
• Forum Discussion & Support (Value ₹5,000)
• 10+ Industry Level Projects (Value ₹25,000)
• 6 Internship E-Certificates (A.I, ML, DA, DL, CV, Python) (Value unlimited)
• Lifetime Community for Job Openings & Other Discussions (Value ₹5,000)
• Personal Assistance for Resume Validation (Value ₹10,000)
• Validity – 1 Year

9 Courses @ ₹2999
Total Value ₹1,65,000
@₹2999 only
Short bytes – D9
Ways to have more time

• Decide what's important and do it first every day
• Don't let other people schedule your life
• Pay close attention to what makes you happy
• Stop watching TV
• Schedule your breaks and enjoy them
• Look through your calendar and cancel things you aren't excited about
• If you keep putting something off, just let it go
• Before you go to bed, decide on tomorrow's most important action


Q&A
Session
Thanks!
Follow Me on LinkedIn : Link in Description
+91 7305845758
Pantechsolutions.net

Ask Your Friends to Participate in this Event
