
DEEP LEARNING WEEK 4

1. Which step does Nesterov accelerated gradient descent perform before finding the update
size?
a) Increase the momentum
b) Estimate the next position of the parameters
c) Adjust the learning rate
d) Decrease the step size
Answer: b) Estimate the next position of the parameters
Solution: Nesterov accelerated gradient descent first estimates the next (look-ahead) position
of the parameters using the accumulated momentum, then computes the gradient at that
look-ahead position and uses it, together with the momentum history, to determine the new
position.
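A minimal sketch of one NAG step in Python (the function and parameter names here are
illustrative, not taken from any particular library):

```python
def nag_update(w, u_prev, grad_fn, lr=0.1, gamma=0.9):
    """One Nesterov accelerated gradient step (illustrative sketch).

    grad_fn returns the gradient of the loss at a given parameter vector;
    gamma is the momentum coefficient and u_prev the previous update.
    """
    w_lookahead = w - gamma * u_prev      # first estimate the next position
    grad = grad_fn(w_lookahead)           # gradient at the look-ahead point
    u = gamma * u_prev + lr * grad        # fold it into the momentum history
    return w - u, u
```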
2. Which parameter of vanilla gradient descent controls the step size in the direction of the
gradient?
a) Learning rate
b) Momentum
c) Gamma
d) None of the above
Answer: a) Learning rate
Solution: The learning rate determines the step size in vanilla gradient descent. Momentum
is not used in plain (vanilla) gradient descent.
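In code, the vanilla update is simply w ← w − η∇w, so η alone scales the step; a one-line
sketch (names are illustrative):

```python
def vanilla_gd_update(w, grad, eta=0.01):
    # The learning rate eta scales the step taken along the negative gradient.
    return w - eta * grad
```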
3. What does the distance between two contour lines on a contour map represent?
a) The change in the output of the function
b) The direction of the function
c) The rate of change of the function
d) None of the above
Answer: c) The rate of change of the function
Solution: The distance between two contour lines indicates the rate of change (steepness) of
the function: closely spaced contours mean the function changes rapidly, widely spaced
contours mean it changes slowly.
4. Which of the following represents the contour plot of the function f(x, y) = x² − y?
a)–d) [Four candidate contour plots over the range −4 ≤ x, y ≤ 4; the figures are not
reproduced here.]

Answer: b)
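One way to verify such a question is to draw the contours yourself; a minimal matplotlib
sketch for f(x, y) = x² − y as stated above (the number of levels is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt

# Contour plot of f(x, y) = x**2 - y over the same range as the options.
x = np.linspace(-4, 4, 200)
y = np.linspace(-4, 4, 200)
X, Y = np.meshgrid(x, y)
Z = X**2 - Y

cs = plt.contour(X, Y, Z, levels=15)
plt.clabel(cs, inline=True, fontsize=8)  # label each contour with its level value
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```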

5. What is the main advantage of using Adagrad over other optimization algorithms?
a) It converges faster than other optimization algorithms.
b) It is less sensitive to the choice of hyperparameters (learning rate).
c) It is more memory-efficient than other optimization algorithms.
d) It is less likely to get stuck in local optima than other optimization algorithms.
Answer: b)
Solution: Adagrad automatically adapts the learning rate for each weight based on the
gradient history, which makes it less sensitive to the choice of hyperparameters than other
optimization algorithms. This can be especially useful when dealing with high-dimensional
datasets or complex models where the manual tuning of hyperparameters can be
time-consuming and error-prone.
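A minimal sketch of the per-parameter adaptation Adagrad performs (the function and
variable names are illustrative):

```python
import numpy as np

def adagrad_update(w, grad, v, lr=0.01, eps=1e-8):
    """One Adagrad step (illustrative sketch).

    v accumulates the squared gradients per parameter, so weights that have
    seen large gradients automatically get a smaller effective learning rate.
    """
    v = v + grad**2
    w = w - lr * grad / (np.sqrt(v) + eps)
    return w, v
```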
6. We are training a neural network using the vanilla gradient descent algorithm. We observe
that the change in weights is small in successive iterations. What are the possible causes of
this phenomenon? (MSQ)

a) η is large
b) ∇w is small
c) ∇w is large
d) η is small

Answer: (b), (d)
Solution: A small change in the weights means the quantity η∇w is small, which happens
when ∇w is small or when η is small.
7. You are given labeled data, which we call X, where rows are data points and columns are
features. One column has most of its values as 0. Which algorithm should we use here for
faster convergence to the optimal value of the loss function?

a) NAG
b) Adam
c) Stochastic gradient descent
d) Momentum-based gradient descent

Answer: b)
Solution: One of the feature columns is sparse, so the gradient for the corresponding weight
is mostly zero; Adam, which adapts the learning rate per parameter, works best here.
The moving averages used in Adam are initialized to zero, which can result in biased
estimates of the first and second moments of the gradient. To address this, Adam applies
bias-correction terms to the moving averages, which corrects for the initial bias and leads to
more accurate estimates of the moments.
8. What is the update rule for the ADAM optimizer?

a) wt = wt−1 − lr ∗ (mt / (√vt + ϵ))
b) wt = wt−1 − lr ∗ mt
c) wt = wt−1 − lr ∗ (mt / (vt + ϵ))
d) wt = wt−1 − lr ∗ (vt / (mt + ϵ))
Answer: a)
Solution: The update rule for the ADAM optimizer is w = w - lr * (m / (sqrt(v) + eps)),
where w is the weight, lr is the learning rate, m is the first-moment estimate, v is the
second-moment estimate, and eps is a small constant to prevent division by zero.
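The rule from option a), including the bias correction mentioned in question 7, can be
sketched as follows (the hyperparameter defaults are the commonly used ones, not prescribed
by the question):

```python
import numpy as np

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (illustrative sketch); t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment estimate
    m_hat = m / (1 - beta1**t)                # bias correction for the zero init
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```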
9. What is the advantage of using mini-batch gradient descent over batch gradient descent?
a) Mini-batch gradient descent is more computationally efficient than batch gradient descent.
b) Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch
gradient descent.
c) Mini-batch gradient descent gives us a better solution.
d) Mini-batch gradient descent can converge faster than batch gradient descent.
Answer: a) and d).
Solution: The advantage of using mini-batch gradient descent over batch gradient descent is
that it is more computationally efficient, allows for parallel processing of the training
examples, and can converge faster than batch gradient descent.
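A sketch of the mini-batch loop, assuming a grad_fn(w, X_batch, y_batch) that returns the
gradient of the loss on a batch (all names here are illustrative):

```python
import numpy as np

def minibatch_gd(w, X, y, grad_fn, lr=0.01, batch_size=32, epochs=10):
    """Mini-batch gradient descent (illustrative sketch)."""
    n = X.shape[0]
    for _ in range(epochs):
        idx = np.random.permutation(n)                   # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w = w - lr * grad_fn(w, X[batch], y[batch])  # one update per mini-batch
    return w
```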
10. Which of the following is a variant of gradient descent that uses an estimate of the next
gradient to update the current position of the parameters?
a) Momentum optimization
b) Stochastic gradient descent
c) Nesterov accelerated gradient descent
d) Adagrad
Answer: c) Nesterov accelerated gradient descent
Solution: Nesterov accelerated gradient descent first estimates the next (look-ahead) position
of the parameters using the accumulated momentum, then computes the gradient at that
look-ahead position and uses it, together with the momentum history, to determine the new
position.
