
Assignment 1

Reinforcement Learning
Prof. B. Ravindran
1. Which of the following is not a useful way to approach a standard multi-armed bandit problem
with n arms? Assume bandits are stationary.
(a) “How can I ensure the best action is the one which is mostly selected as time tends to
infinity?”
(b) “How can I ensure the total regret as time tends to infinity is minimal?”
(c) “How can I ensure an arm which has an expected reward within a certain threshold of
the optimal arm is chosen with a probability above a certain threshold?”
(d) “How can I ensure that when given any 2 arms, I can select the arm with a higher expected
return with a probability above a certain threshold?”
Sol. (d)
Options (a), (b) and (c) refer to asymptotic correctness, regret optimality and PAC optimality respectively. Option (d), i.e. being able to choose the better of any 2 arms, is not a useful way to look at the standard multi-armed bandit problem, since it is not necessary to be able to compare every pair of arms with a high degree of success. For example, if two arms both had expected returns that were very similar to each other but much lower than that of the optimal arm, there would be no benefit in picking those arms again just to obtain reward estimates accurate enough to tell which of the two is better.
2. What is the decay rate of the weightage given to past rewards in the computation of the Q
function in the stationary and non-stationary updates in the multi-armed bandit problem?

(a) hyperbolic, linear


(b) linear, hyperbolic
(c) hyperbolic, exponential
(d) exponential, linear

Sol. (c)
In the stationary case, the weightage given to each of the rewards obtained so far is 1/n, where n is the number of rewards obtained for that arm/action so far; therefore the decay is hyperbolic. In the non-stationary update, the weightage given to the most recent reward is α, and for the pre-update Q value it is 1 − α. Expanding the pre-update Q value, we see that the weightage of the reward from the last-but-one time step is α(1 − α). Continuing the expansion in the same fashion, a reward obtained t time steps ago has a weightage of α(1 − α)^t, i.e. exponential decay.
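
For concreteness, a minimal Python sketch of the two weighting schemes; the function names and the choice α = 0.1 are illustrative.

def sample_average_weight(n):
    # Sample-average (stationary) update: each of the n rewards seen so far
    # contributes equally, with weight 1/n (hyperbolic decay).
    return 1.0 / n

def constant_alpha_weight(t, alpha=0.1):
    # Constant step-size (non-stationary) update: the reward obtained t steps
    # ago carries weight alpha * (1 - alpha)**t (exponential decay).
    return alpha * (1.0 - alpha) ** t

for t in [0, 1, 5, 20]:
    print(t, sample_average_weight(t + 1), constant_alpha_weight(t))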
3. In the update rule Qt+1 (a) ← Qt (a) + α(Rt − Qt (a)), select the value of α that we would prefer when estimating Q values in a non-stationary bandit problem.
(a) α = 1/(na + 1)
(b) α = 0.1
(c) α = na + 1
(d) α = 1/(na + 1)^2

Sol. (b)
By using a constant value of α, we decrease the importance of past samples exponentially as
time progresses, so that our Q value estimates are able to shift as the true action values change.

Option (a) weights each sample equally (computing a simple average), and after a large number of time steps the importance of new samples in the average becomes very low, preventing meaningful updates of the Q values when the distribution of action values changes.

Option (c) increases the importance of new samples as training progresses and creates an unbounded sum. Action-value estimates will not converge, but grow without bound.

Option (d) weights newer samples even lower, so the importance of new samples for the action-value estimates falls more quickly than in case (a).
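
As a rough illustration (an assumed toy setup), the sketch below tracks a single arm whose true value jumps partway through, applying the update Q ← Q + α(R − Q) with the step sizes from options (a) and (b); the constant α keeps tracking the new value while 1/(na + 1) lags behind.

import random

def run(step_size, steps=2000, drift_at=1000):
    q, n = 0.0, 0
    for t in range(steps):
        true_value = 1.0 if t < drift_at else 5.0   # non-stationary target
        r = true_value + random.gauss(0.0, 0.1)     # noisy reward
        q += step_size(n) * (r - q)                 # Q <- Q + alpha (R - Q)
        n += 1
    return q

print("alpha = 1/(n+1):", run(lambda n: 1.0 / (n + 1)))   # ends near 3, lagging behind
print("alpha = 0.1    :", run(lambda n: 0.1))             # ends near 5, tracking the change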

4. Assertion: Taking exploratory actions is important for RL agents


Reason: If the rewards obtained for actions are stochastic, an action which gave a high reward
once, might give lower reward next time.
(a) Assertion and Reason are both true and Reason is a correct explanation of Assertion
(b) Assertion and Reason are both true and Reason is not a correct explanation of Assertion
(c) Assertion is true and Reason is false
(d) Both Assertion and Reason are false
Sol. (b)
An RL agent needs to take exploratory actions because it needs to estimate the advantage for
each action correctly. So, the Assertion is true. Reason is also true because if the rewards are
stochastic, the agent might get high reward once and low reward next time for the same action
in the same state. However, even if the rewards obtained for actions were deterministic, the
agent still needs to take exploratory actions to figure out more advantageous actions. Hence,
Reason does not explain the Assertion correctly.

5. We are trying different algorithms to find the optimal arm for a multi-armed bandit. We plot the expected payoff vs. time graph for each algorithm, where the expected payoff follows some function of time (starting from 0). Which of the following functions will have the least regret? (We know that the optimal expected payoff is 1.) (Hint: Plot the functions)
(a) tanh(t)
(b) 1 − 2^(−t)
(c) t/20 if t < 20, and 1 after that
(d) Same regret for all the above functions.
Sol. (a)
If we plot the functions, the regret is the area between y = 1 and y = f(t), i.e. the integral of 1 − f(t) over t from 0 to ∞. This evaluates to ln 2 ≈ 0.69 for tanh(t), 1/ln 2 ≈ 1.44 for 1 − 2^(−t), and 10 for the piecewise function, so tanh(t) gives the least regret.
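
The claim can be checked numerically with a small sketch (the discretization and the cutoff at t = 50 are assumptions; the tails beyond it contribute negligibly).

import numpy as np

t = np.linspace(0.0, 50.0, 200001)
dt = t[1] - t[0]
curves = {
    "tanh(t)":     np.tanh(t),
    "1 - 2^(-t)":  1.0 - 2.0 ** (-t),
    "piecewise":   np.minimum(t / 20.0, 1.0),
}
for name, f in curves.items():
    print(name, np.sum(1.0 - f) * dt)   # regret = area between y = 1 and f(t)
# Roughly ln 2 ~ 0.69, 1/ln 2 ~ 1.44, and 10, so tanh(t) has the least regret.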
6. Consider the following statements for the ϵ-greedy approach in a non-stationary environment:

i Always keeping a small constant ϵ is a good approach.
ii Large values of ϵ will lead to unnecessary exploration in the long run.
iii Decaying ϵ value is a good approach, as after reaching optimality we would like to reduce
exploration.
Which of the above statements is/are correct?
(a) ii, iii
(b) only iii
(c) only ii
(d) i, ii
Sol. (d)
In a non-stationary environment the optimal action keeps changing, so constant exploration is required to keep tracking it. Decaying ϵ reduces exploration over time, hence iii is incorrect.
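
For reference, a minimal ϵ-greedy selection step with a constant ϵ (a sketch; the names and ϵ = 0.1 are illustrative). Keeping ϵ constant means the same amount of exploration happens at every time step, which is what lets the agent keep tracking a drifting optimal arm.

import random

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon, explore uniformly at random;
    # otherwise exploit the current greedy action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])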
7. Following are two ways of defining the probability of selecting an action/arm. Select the correct option regarding the better choice among them:

i Pr(at = a) = Qt (a) / Σa Qt (a)

ii Pr(at = a) = e^(Qt (a)) / Σ(b = 1 to n) e^(Qt (b))

(a) Both are good as both formulas can bound probability in range 0 to 1.
(b) (i) is better because it is differentiable and requires less complex computation.
(c) None of the above

Sol. (c)

Options (a) and (b) are incorrect because formula (i) cannot handle negative values: if some Q estimates are negative, the resulting “probabilities” may be negative or lie outside the range 0 to 1.
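
A small illustration of this point (the Q values below are assumed): with a negative estimate, the linear normalization in (i) produces an invalid “probability”, while the softmax in (ii) remains a valid distribution.

import math

q = [2.0, -1.0, 0.5]                       # action-value estimates, one negative

linear = [qa / sum(q) for qa in q]         # formula (i): linear normalization
exps = [math.exp(qa) for qa in q]
softmax = [e / sum(exps) for e in exps]    # formula (ii): softmax

print("linear :", linear)    # contains a negative entry, not a valid distribution
print("softmax:", softmax)   # non-negative and sums to 1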
8. Which of the following best refers to the PAC-optimality solution to bandit problems?
ϵ – the difference between the reward of the chosen arm and the true optimal reward
δ – the probability that the chosen arm is not optimal
N – the number of steps to reach PAC-optimality
(a) Given δ and ϵ, minimize the number of steps to reach PAC-optimality (i.e. N)
(b) Given δ and N, minimize ϵ.
(c) Given ϵ and N, maximize the probability of choosing the optimal arm (i.e. minimize δ)
(d) none of the above is true about PAC-optimality
Sol. (a)
Refer to the definition of PAC-optimality in the Bandit Optimalities video.
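For reference, a common formalization (stated here as an assumption, not quoted from the lecture): an algorithm is (ϵ, δ)-PAC if, after N pulls, the arm it returns has expected reward within ϵ of the optimal arm with probability at least 1 − δ, and the goal is to minimize the sample complexity N for the given ϵ and δ, which is what option (a) describes.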
9. Suppose we have a 10-armed bandit problem where the rewards for each of the 10 arms is
deterministic and in the range (0, 10). Which among the following methods will allow us to
accumulate maximum reward in the long term?

(a) ϵ-greedy with ϵ = 0.1.
(b) ϵ-greedy with ϵ = 0.01.
(c) greedy with initial reward estimates set to 0.
(d) greedy with initial reward estimates set to 10.
Sol. (d)
Since the rewards are deterministic, we only need to select each arm once to identify an optimal
arm. The greedy method with initial reward estimates higher than all possible rewards ensures that each
arm is selected at least once, since on selecting any arm, the resultant reward estimate will
necessarily be lower than the initial estimates of the other arms. Once each arm has been
selected, the greedy method will settle on the arm with the maximum reward.
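
The effect can be reproduced with a minimal sketch (the 10 deterministic reward values below are hypothetical, chosen only for illustration): greedy selection with optimistic initial estimates of 10 tries every arm once and then settles on the best arm, whereas initializing the estimates to 0 locks onto the first arm tried.

rewards = [3.1, 7.4, 0.5, 9.2, 4.8, 6.6, 1.9, 8.3, 2.7, 5.5]   # hypothetical deterministic rewards

def greedy(initial, steps=100):
    q = [initial] * len(rewards)
    n = [0] * len(rewards)
    last = None
    for _ in range(steps):
        a = max(range(len(q)), key=lambda i: q[i])   # greedy (ties broken by lowest index)
        n[a] += 1
        q[a] += (rewards[a] - q[a]) / n[a]           # sample-average update
        last = a
    return last

print("init 10:", greedy(10.0))   # settles on arm 3, the true best (reward 9.2)
print("init 0 :", greedy(0.0))    # sticks with arm 0 (its reward 3.1 beats the 0 estimates)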

10. Which of the following is/are correct and valid reasons to consider sampling actions from a
softmax distribution instead of using an ϵ-greedy approach?
i Softmax exploration makes the probability of picking an action proportional to the action-
value estimates. By doing so, it avoids wasting time exploring obviously ’bad’ actions.
ii We do not need to worry about decaying exploration slowly like we do in the ϵ-greedy
case. Softmax exploration gives us asymptotic correctness even for a sharp decrease in
temperature.
iii It helps us differentiate between actions with action-value estimates (Q values) that are
very close to the action with maximum Q value.
Which of the above statements is/are correct?

(a) i, ii, iii


(b) only iii
(c) only i
(d) i, ii
(e) i, iii
Sol. (e)
ii is incorrect. If we decrease the temperature too quickly, we could fail to do enough exploration to form correct action-value estimates, just as we would by decaying ϵ too quickly in ϵ-greedy exploration.
i and iii are correct. Softmax encourages exploration of actions whose action-value estimates are close to that of the action with the maximum Q value. Further concentrated exploration of these actions improves our action-value estimates for them, which allows us to differentiate between them better.
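
To make points (i) and (iii) concrete, a minimal softmax action-selection sketch (the temperature parameter tau and the example Q values are assumptions): a clearly bad action receives a tiny probability, while the two near-optimal actions keep being sampled, which is what refines their estimates.

import math

def softmax_probs(q, tau=1.0):
    m = max(q)                                       # subtract max for numerical stability
    exps = [math.exp((qa - m) / tau) for qa in q]
    s = sum(exps)
    return [e / s for e in exps]

q = [1.00, 0.95, -3.0]               # two near-optimal actions and one clearly bad one
print(softmax_probs(q, tau=1.0))     # the bad action gets ~1% probability
print(softmax_probs(q, tau=0.1))     # lower temperature -> closer to greedy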
