Reinforcement Learning - Unit 6 - Week 4
Week 4: Assignment 4
1) State True/False 1 point
The state transition graph for any MDP is a directed acyclic graph.
True
False
2) Consider the following statements: 1 point
(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function (v∗), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q∗), without accessing the MDP parameters.
Which of these statements are true?
Only (ii)
Only (iii)
Only (i), (ii)
Only (i), (iii)
Only (ii), (iii)
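To make the distinction in question 2 concrete, here is a minimal sketch (with a made-up q∗ table and made-up state/action names) of why the optimal q-function alone suffices: the greedy policy π∗(s) = argmax_a q∗(s, a) needs no transition probabilities or rewards, whereas acting greedily with respect to v∗ alone would require the MDP parameters to do the one-step lookahead.

```python
# Sketch: greedy action selection from an optimal q-function.
# The q_star values, state names, and action names below are made up
# purely for illustration; they are not from any specific MDP.
q_star = {("s0", "left"): 1.0, ("s0", "right"): 2.5,
          ("s1", "left"): 0.3, ("s1", "right"): -1.0}

def greedy(s, actions=("left", "right")):
    # No p(s'|s, a) or r needed: just pick the action with the largest q-value.
    return max(actions, key=lambda a: q_star[(s, a)])

print(greedy("s0"), greedy("s1"))  # right left
```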
3) Which of the following statements are true for a finite MDP? (Select all that apply.) 1 point
The Bellman equation of a value function of a finite MDP defines a contraction in Banach space (using the max norm).
If 0 ≤ γ < 1, then the eigenvalues of γPπ are less than 1.
We call a normed vector space 'complete' if Cauchy sequences exist in that vector space.
The sequence defined by vₙ = rπ + γPπ vₙ₋₁ is a Cauchy sequence in Banach space (using the max norm).
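As an illustration of the contraction property behind question 3 (not part of the quiz): for 0 ≤ γ < 1, the map T(v) = rπ + γPπ v shrinks the max-norm distance between any two value vectors by at least a factor of γ, which is why the iterates vₙ form a Cauchy sequence. The 2-state reward vector and stochastic matrix below are made up for the demonstration.

```python
# Sketch: T(v) = r + gamma * P v is a gamma-contraction in the max norm.
# r and P are arbitrary illustrative values; rows of P sum to 1.
gamma = 0.9
r = [1.0, 2.0]
P = [[0.5, 0.5], [0.2, 0.8]]

def T(v):
    return [r[s] + gamma * sum(P[s][t] * v[t] for t in (0, 1)) for s in (0, 1)]

def max_norm(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

v, w = [0.0, 0.0], [100.0, -50.0]  # two arbitrary starting points
for _ in range(5):
    d_before = max_norm(v, w)
    v, w = T(v), T(w)
    # Contraction: the distance shrinks by at least a factor gamma per step.
    assert max_norm(v, w) <= gamma * d_before + 1e-9
```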
4) Consider the following statements: 1 point
(i) vπ(s) = Eπ[ ∑_{i=t}^{∞} γ^{i−t} R_{i+1} | S_t = s ]
(ii) qπ(s, a) = ∑_{s′} p(s′|s, a) vπ(s′)
(iii) vπ(s) = ∑_{a} π(a|s) qπ(s, a)
Which of these statements are true?
Only (i)
Only (i), (ii)
Only (ii), (iii)
Only (i), (iii)
(i), (ii), (iii)
6) What is true about the γ (discount factor) in reinforcement learning? 1 point
discount factor can be any real number
7) Consider the following statements for a finite MDP (I is an identity matrix with dimensions 1 point
|S| × |S|, where S is the set of all states, and Pπ is a stochastic matrix):
(i) An MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If 0 ≤ γ < 1, then the rank of the matrix I − γPπ is equal to |S|.
Which of these statements are true?
Only (ii), (iii)
Only (ii), (iv)
Only (i), (iii)
Only (i), (ii), (iii)
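Statement (iii) in question 7 can be checked numerically on a small example: since every eigenvalue of γPπ has modulus at most γ < 1, the matrix I − γPπ has no zero eigenvalue, hence a nonzero determinant and full rank |S|. This sketch uses a made-up 2-state stochastic matrix.

```python
# Sketch: I - gamma * P is invertible (rank |S|) for 0 <= gamma < 1.
# P is an arbitrary illustrative 2x2 stochastic matrix (rows sum to 1).
gamma = 0.9
P = [[0.3, 0.7], [0.6, 0.4]]
M = [[1 - gamma * P[0][0], -gamma * P[0][1]],
     [-gamma * P[1][0], 1 - gamma * P[1][1]]]
# Nonzero determinant of the 2x2 matrix means rank 2 = |S|.
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(det != 0)  # True
```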
8) Consider an MDP with 3 states A, B, C. At each state we can go to either of the other two states, 1 point
i.e., if we are in state A then we can perform 2 actions: going to state B or going to state C. The rewards for each transition are r(A, B) = −2 (reward if we go from A to B),
r(B, A) = 3, r(B, C) = 10, r(C, B) = −7, r(A, C) = −2, r(C, A) = 4, and the discount factor is 0.9. Find the fixed point of the
value function for the policy π(A) = B (if we are in state A we choose the action to
go to B), π(B) = C, π(C) = A. vπ([A B C]) = ?
(round to 1 decimal place)
[35.2, 48.6, 10.7]
[37.8, 44.2, 38.0]
[37.8, 38.0, 44.2]
[40.6, 20.2, 75.3]
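One way to find the fixed point asked for in question 8 is to iterate the Bellman operator for the given deterministic policy until it converges. Under π the states cycle A → B → C → A, so vπ satisfies v(s) = r(s, π(s)) + γ v(π(s)); this is a minimal sketch of that iteration.

```python
# Sketch: policy evaluation by iterating v <- r_pi + gamma * P_pi v for the
# deterministic cyclic policy pi(A)=B, pi(B)=C, pi(C)=A from question 8.
gamma = 0.9
succ = {"A": "B", "B": "C", "C": "A"}        # successor state under pi
reward = {"A": -2.0, "B": 10.0, "C": 4.0}    # r(A,B), r(B,C), r(C,A)

v = {s: 0.0 for s in succ}
for _ in range(1000):                        # contraction => convergence
    v = {s: reward[s] + gamma * v[succ[s]] for s in succ}

print([round(v[s], 1) for s in "ABC"])       # [37.8, 44.2, 38.0]
```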
9) Which of the following is not a valid norm function? (x is a D-dimensional vector) 1 point
max_{d∈{1,…,D}} |x_d|
√( ∑_{d=1}^{D} x_d² )
min_{d∈{1,…,D}} |x_d|
∑_{d=1}^{D} |x_d|
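A quick way to see which candidate in question 9 fails the norm axioms: a norm must satisfy ‖x‖ = 0 only when x = 0, but min_d |x_d| is zero for any nonzero vector with a single zero entry. This sketch evaluates all four candidates on such a vector.

```python
# Sketch: min_d |x_d| violates positive definiteness; the other three do not.
x = [1.0, 0.0]                               # nonzero vector with a zero entry

max_norm = max(abs(v) for v in x)            # L-infinity norm: valid
l2_norm = sum(v * v for v in x) ** 0.5       # L2 norm: valid
l1_norm = sum(abs(v) for v in x)             # L1 norm: valid
min_abs = min(abs(v) for v in x)             # zero on a nonzero vector

print(max_norm, l2_norm, l1_norm, min_abs)   # 1.0 1.0 1.0 0.0
```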
10) For an operator L, which of the following properties must be satisfied by x for it to be a fixed point for L? (Multi-Correct) 1 point
Lx = x
L²x = x
∀λ > 0, Lx = λx
None of the above
You may submit any number of times before the due date. The final submission will be considered for
grading.
Submit Answers