Practice Assignment 4: Reinforcement Learning
Prof. B. Ravindran
1. Select the correct Bellman optimality equation:
(a) $v^*(s) = \max_a \sum_{s'} p(s' \mid s, a)\,\big[\,\mathbb{E}[r \mid s, a, s'] + \gamma v^*(s')\,\big]$
Sol. (a)
Refer to the video on the Bellman optimality equation.
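As an informal illustration of this backup (not part of the original solution), below is a minimal value-iteration sketch in Python. The toy two-state, two-action MDP, its transition and reward arrays, and all variable names are invented for illustration only.

```python
import numpy as np

# Toy 2-state, 2-action MDP (made up for illustration).
# P[s, a, s'] = p(s' | s, a), R[s, a, s'] = E[r | s, a, s'].
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 1.0], [0.0, 0.0]]])
gamma = 0.9

v = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup:
    # v(s) <- max_a sum_s' p(s'|s,a) [ E[r|s,a,s'] + gamma * v(s') ]
    q = np.einsum('ijk,ijk->ij', P, R + gamma * v)  # q(s, a)
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new

print(v)  # approximate optimal state values for the toy MDP
```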
2. State True/False
In MDPs, there is a unique resultant state for any given state-action pair.
(a) True
(b) False
Sol. (b)
The statement holds only for deterministic MDPs. In a general MDP, a given state-action pair can lead to multiple resultant states, each with an associated transition probability.
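To make this concrete, here is a small hedged sketch (state and action names, and the probabilities, are invented): the MDP specifies a distribution over next states for each state-action pair, with the deterministic case as a special instance.

```python
# Hypothetical transition model p(s' | s, a) stored as a nested dict.
p = {
    ('s0', 'right'): {'s1': 0.7, 's0': 0.3},   # stochastic: two possible successors
    ('s0', 'left'):  {'s0': 1.0},              # deterministic special case
}

# Each conditional distribution over next states must sum to 1.
assert abs(sum(p[('s0', 'right')].values()) - 1.0) < 1e-9
```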
3. State True/False
The state transition graph for any MDP is a directed acyclic graph.
(a) True
(b) False
Sol. (b)
The statement is false. A state can transition back to itself (a self-loop), and longer cycles between states are also possible.
4. Consider the following statements:
(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function (v*), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q*), without accessing the MDP parameters.
Which of these statements are true?
(a) Only (ii)
(b) Only (iii)
(c) Only (i), (ii)
(d) Only (i), (iii)
(e) Only (ii), (iii)
Sol. (b)
(i) is false: an MDP can have more than one optimal policy, for example when two actions have equal optimal value in some state. (ii) is false: extracting a greedy policy from v* requires a one-step lookahead through the transition probabilities and rewards, i.e., the MDP parameters. (iii) is true: an optimal policy can be recovered from the optimal q-value function by taking an action maximizing q*(s, a) in each state.
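A small sketch of why (iii) works while (ii) does not (the q-value table below is made up for illustration): a greedy policy can be read off q* with a plain argmax, whereas extracting it from v* alone would need the model.

```python
import numpy as np

# Hypothetical optimal q-values for 3 states x 2 actions.
q_star = np.array([[1.0, 2.0],
                   [0.5, 0.4],
                   [3.0, 3.5]])

# (iii) Policy from q*: no model needed, just an argmax over actions per state.
policy_from_q = np.argmax(q_star, axis=1)
print(policy_from_q)  # e.g. array([1, 0, 1])

# (ii) A policy cannot be extracted from v* alone; the one-step lookahead
#   pi(s) = argmax_a sum_s' p(s'|s,a) [ E[r|s,a,s'] + gamma * v*(s') ]
# requires p and r, i.e., the MDP parameters.
```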
5. Which of the following is a benefit of using RL algorithms for solving MDPs?
(a) They do not require the state of the agent for solving an MDP.
(b) They do not require the action taken by the agent for solving an MDP.
(c) They do not require the state transition probability matrix for solving an MDP.
(d) They do not require the reward signal for solving an MDP.
Sol. (c)
RL algorithms need to know the state the agent is in, the action it takes, and the reward signal received from the environment in order to solve the MDP. However, they do not need the state transition probability matrix.
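To illustrate option (c), here is a minimal tabular Q-learning sketch (the function name, table sizes, and hyperparameters are assumptions made for this example): the update uses only sampled transitions (s, a, r, s') and never touches the transition probability matrix.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update from a single sampled transition.

    Uses only the observed state s, action a, reward r, and next state s_next;
    the MDP's transition probability matrix is never needed.
    """
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Hypothetical usage with a 4-state, 2-action value table:
Q = np.zeros((4, 2))
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)
```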