Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2 = 20
______________________________________________________________________________
QUESTION 1:
The gradient along a particular dimension keeps pointing in the same direction. What can you say
about the momentum parameter γ? Choose the correct option.
Correct Answer: a
Detailed Solution:
When we push a ball down a hill, the ball accumulates momentum as it rolls downhill,
becoming faster and faster on the way (until it reaches its terminal velocity if there is air
resistance, i.e., γ < 1). The analogy is the same for parameter updates, hence option a.
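A minimal sketch of this behaviour (plain Python; the constant gradient and the values γ = 0.9, η = 0.1 are assumed for illustration): when the gradient keeps pointing in the same direction, the accumulated velocity grows toward the finite terminal value η·g/(1 − γ).

    gamma, eta = 0.9, 0.1        # momentum coefficient and learning rate (assumed values)
    grad = 1.0                   # a gradient that keeps pointing in the same direction
    velocity = 0.0

    for step in range(30):
        velocity = gamma * velocity + eta * grad   # v_t = gamma * v_{t-1} + eta * grad
        # the velocity accumulates and approaches eta * grad / (1 - gamma) = 1.0

    print(velocity)              # ~0.96 after 30 steps, close to the terminal value 1.0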
______________________________________________________________________________
QUESTION 2:
Comment on the learning rate of Adagrad. Choose the correct option.
a. Learning rate is adaptive
b. Learning rate increases for each time step
c. Learning rate remains the same for each update
d. None of the above
Correct Answer: a
Detailed Solution:
Adagrad is an adaptive learning-rate method: each parameter gets its own learning rate, which
adapts as training proceeds.
______________________________________________________________________________
QUESTION 3:
Adagrad has its own limitations. Can you choose that limitation from the following options?
a. Accumulation of the positive squared gradients in the denominator
b. Overshooting minima
c. Learning rate increases, thus hindering convergence and causing the loss function
to fluctuate around the minimum or even to diverge
d. Getting trapped in local minima
Correct Answer: a
Detailed Solution:
Accumulation of the squared gradients in the denominator is a problem: since every added term is
positive, the accumulated sum keeps growing during training. This in turn causes the learning rate
to shrink and eventually become infinitesimally small.
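A minimal Adagrad sketch (plain Python; the quadratic loss 0.5·θ² and the hyperparameter values are assumed for illustration) that makes the limitation in option a visible: the squared gradients pile up in the denominator, so the effective learning rate can only shrink.

    eta, eps = 0.5, 1e-8             # base learning rate and stability term (assumed values)
    theta, grad_sq_sum = 5.0, 0.0

    for step in range(100):
        grad = theta                 # gradient of the illustrative loss 0.5 * theta**2
        grad_sq_sum += grad ** 2     # every added term is positive, so the sum only grows
        theta -= eta / (grad_sq_sum ** 0.5 + eps) * grad
        # effective learning rate eta / sqrt(grad_sq_sum) shrinks monotonically over training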
______________________________________________________________________________
QUESTION 4:
What is the full form of RMSProp?
a. Retain Momentum Propagation
b. Round Mean Square Propagation
c. Root Mean Square Propagation
d. None of the above
Correct Answer: c
Detailed Solution:
RMSProp stands for Root Mean Square Propagation.
______________________________________________________________________________
QUESTION 5:
RMSProp resolves the limitation of which optimizer?
a. Adagrad
b. Momentum
c. Both a and b
d. Neither a nor b
Correct Answer: a
Detailed Solution:
RMSProp tries to resolve Adagrad’s radically diminishing learning rates by using a moving
average of the squared gradients. It uses the magnitudes of recent gradients to normalize the
current gradient.
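A minimal RMSProp sketch (plain Python; the loss and hyperparameter values are assumed for illustration), contrasting with the Adagrad limitation above: the squared gradients feed a decaying moving average rather than an ever-growing sum, so the effective learning rate does not vanish.

    eta, rho, eps = 0.01, 0.9, 1e-8    # learning rate, decay rate, stability term (assumed values)
    theta, avg_sq_grad = 5.0, 0.0

    for step in range(100):
        grad = theta                   # gradient of the illustrative loss 0.5 * theta**2
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2   # exponential moving average
        theta -= eta / (avg_sq_grad ** 0.5 + eps) * grad
        # old squared gradients decay away, so the denominator tracks recent magnitudes only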
____________________________________________________________________________
QUESTION 6:
Which of the following statements is true?
Correct Answer: c
Detailed Solution:
We can view Nesterov Accelerated Gradient (NAG) as a correction factor for the Momentum
optimizer. If the added velocity leads to a high loss, the momentum method can be very slow,
because the optimization path exhibits large oscillations. In NAG, if the added velocity (which is
used to compute the intermediate look-ahead parameter) leads to a bad loss, the gradient directs
the update back towards the last position. This helps NAG avoid oscillations.
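A minimal NAG sketch (plain Python; the quadratic loss and hyperparameter values are assumed for illustration): the gradient is evaluated at the look-ahead point θ − γ·v, so if the accumulated velocity is about to overshoot, the look-ahead gradient pulls the update back.

    eta, gamma = 0.1, 0.9            # learning rate and momentum coefficient (assumed values)
    theta, velocity = 5.0, 0.0

    def grad(x):                     # gradient of the illustrative loss 0.5 * x**2
        return x

    for step in range(50):
        lookahead = theta - gamma * velocity                  # intermediate (look-ahead) parameter
        velocity = gamma * velocity + eta * grad(lookahead)   # correct using the look-ahead gradient
        theta -= velocity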
_____________________________________________________________________________
QUESTION 7:
The following is the equation of update vector for momentum optimizer:
v_t = γ v_{t−1} + η ∇_θ J(θ)
What is the range of γ?
a. Between 0 and 1
b. >0
c. >=0
d. >=1
Correct Answer: a
Detailed Solution:
A fraction of the update vector from the past time step is added to the current update vector;
γ is that fraction, and it lies between 0 and 1.
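As a quick sanity check (assuming, purely for illustration, that the gradient stays constant at g), unrolling the update gives a geometric series:
v_t = η g (1 + γ + γ² + … + γ^{t−1}) → η g / (1 − γ) as t → ∞,
which stays finite only because γ lies between 0 and 1; for example, γ = 0.9 amplifies the plain gradient step η g by at most a factor of 10.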
______________________________________________________________________________
QUESTION 8:
Why is it required at all to choose different learning rates for different weights?
Correct Answer: d
Detailed Solution:
With an adaptive learning rate, the learning rate is reduced for parameters with large gradients
and increased for parameters with small gradients. This helps reach the optimum point much
faster, which is the benefit of choosing different learning rates for different weights.
______________________________________________________________________________
QUESTION 9:
What is the major drawback of setting a large learning rate for updating weight parameter for
Gradient Descent?
a. Slower convergence
b. Stuck in local minima
c. Overshoots optimum point
d. None of the above
Correct Answer: c
Detailed Solution:
Too large a learning rate causes the updates to overshoot the optimum point, making the loss
function fluctuate around it or even diverge.
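A minimal numeric illustration (plain Python; the quadratic loss 0.5·θ², whose gradient is θ, and the learning-rate values are assumed): a small rate converges steadily, while a too-large rate overshoots farther on every step and diverges.

    def run_gd(lr, theta=5.0, steps=20):
        for _ in range(steps):
            theta -= lr * theta      # gradient of the illustrative loss 0.5 * theta**2 is theta
        return theta

    print(run_gd(lr=0.1))            # ~0.61: steadily approaches the optimum at 0
    print(run_gd(lr=2.5))            # ~1.7e4: each step overshoots farther and the iterate blows up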
____________________________________________________________________________
QUESTION 10:
For a gradient of smaller magnitude, what should be the suggested learning rate for updating the
weights?
a. Small
b. Large
c. Cannot comment
d. Same learning rate for small and large gradient magnitudes
Correct Answer: b
Detailed Solution:
For a smaller gradient magnitude, the learning rate should be large so that the optimum is reached quickly.
______________________________________________________________________
************END*******