Question Bank - Reinforcement Learning
UNIT II
1. Explain the key components of a Markov Decision Process (MDP) and their
significance.
2. How does the Markov property influence decision-making in an MDP? Provide an
example.
3. Define a policy in an MDP. Differentiate between deterministic and stochastic policies.
4. Explain how the Bellman equation helps in computing the state-value function V(s). (The worked equations after this unit's questions give the standard form.)
5. How does the action-value function Q(s,a) differ from the state-value function V(s)?
6. Compare finite-horizon and infinite-horizon reward models with real-world examples.
7. Explain how the discount factor γ affects long-term decision-making in reinforcement learning.
8. What are the key differences between the total reward model and the average reward
model?
9. What is the difference between episodic and continuing tasks? Provide examples of
each.
10. Why is the concept of a discount factor especially important in continuing tasks?
11. Define Bellman’s optimality operator and explain its role in solving MDPs.
12. How does the Bellman optimality equation help in finding the optimal value function? (A value-iteration sketch follows this unit's questions.)
13. Discuss how Bellman’s optimality equation contributes to the efficiency of
reinforcement learning algorithms.
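For reference when working through questions 4, 5, 11, and 12 above, the Bellman expectation and optimality equations can be stated as follows (notation assumed here: π is the policy, P the transition probabilities, R the reward, γ the discount factor):

\begin{align*}
  V^{\pi}(s)   &= \sum_{a}\pi(a\mid s)\sum_{s'}P(s'\mid s,a)\,\bigl[R(s,a,s')+\gamma V^{\pi}(s')\bigr],\\
  Q^{\pi}(s,a) &= \sum_{s'}P(s'\mid s,a)\,\Bigl[R(s,a,s')+\gamma\sum_{a'}\pi(a'\mid s')\,Q^{\pi}(s',a')\Bigr],\\
  V^{*}(s)     &= \max_{a}\sum_{s'}P(s'\mid s,a)\,\bigl[R(s,a,s')+\gamma V^{*}(s')\bigr].
\end{align*}

The last line is the Bellman optimality equation; for γ < 1, V* is the unique fixed point of the Bellman optimality operator.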
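A minimal Python sketch of value iteration, i.e., repeated application of the Bellman optimality backup, on a hypothetical two-state, two-action MDP. The transition table P, the rewards, and the discount factor are made-up numbers, purely for illustration:

# Value iteration on a made-up 2-state, 2-action MDP.
# P[s][a] is a list of (probability, next_state, reward) triples -- illustrative numbers only.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9           # discount factor
V = {0: 0.0, 1: 0.0}  # initial value estimates

for _ in range(1000):                     # sweep until (approximately) converged
    new_V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])  # Bellman optimality backup
            for a in P[s]
        )
        for s in P
    }
    if max(abs(new_V[s] - V[s]) for s in P) < 1e-8:
        V = new_V
        break
    V = new_V

print(V)  # approximate optimal state values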
UNIT III
1. Explain the basic idea of Monte Carlo methods in reinforcement learning.
2. What are the key assumptions of Monte Carlo prediction?
3. Differentiate between first-visit and every-visit Monte Carlo methods with an example.
4. Describe the Monte Carlo control method for finding optimal policies. Include the
algorithm steps.
5. What is Temporal Difference learning? How is it different from Monte Carlo methods?
6. Explain TD(0) with an example of value prediction. (See the sketch after this unit's questions.)
7. What are the advantages and challenges of Monte Carlo methods?
8. What are model-based reinforcement learning algorithms? Briefly explain with an example.
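As a study aid for questions 3, 5, and 6 above, here is a minimal Python sketch contrasting first-visit Monte Carlo prediction with TD(0) prediction. The episodes, states, and step size are made up purely for illustration; each episode is assumed to be a list of (state, reward) pairs, where the reward follows the visit to the state.

from collections import defaultdict

gamma = 1.0  # undiscounted, episodic setting for illustration

# Hypothetical episodes: lists of (state, reward-received-after-leaving-the-state) pairs.
episodes = [
    [("A", 0.0), ("B", 1.0), ("C", 2.0)],
    [("A", 0.0), ("C", 1.0)],
]

# --- First-visit Monte Carlo prediction: average full returns from first visits ---
returns = defaultdict(list)
for episode in episodes:
    first_visit = {}
    for t, (s, _) in enumerate(episode):       # index of the first visit to each state
        first_visit.setdefault(s, t)
    G = 0.0
    for t in reversed(range(len(episode))):    # compute returns backwards
        s, r = episode[t]
        G = r + gamma * G
        if first_visit[s] == t:                # credit the return to first visits only
            returns[s].append(G)
V_mc = {s: sum(gs) / len(gs) for s, gs in returns.items()}

# --- TD(0) prediction: bootstrap from the current estimate of the next state ---
V_td, alpha = defaultdict(float), 0.1
for episode in episodes:
    for t, (s, r) in enumerate(episode):
        v_next = V_td[episode[t + 1][0]] if t + 1 < len(episode) else 0.0
        V_td[s] += alpha * (r + gamma * v_next - V_td[s])   # TD(0) update

print(V_mc, dict(V_td))

Monte Carlo waits for the complete return G, so it only learns at episode end; TD(0) updates after every step using its own current estimate of the next state's value.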
UNIT IV
1. What is bootstrapping in reinforcement learning?
2. What is temporal differencing, and why do we need it?
3. Describe the Sarsa algorithm. Explain its components and how exploration is handled.
4. Define Q-learning and write its update rule. (The update rules are compared after this unit's questions.)
5. Compare Sarsa and Q-learning. Provide an example showing their behavior difference.
6. Explain the Q-learning algorithm in detail and how it helps in finding the optimal policy. (See the sketch after this unit's questions.)
7. What is Expected Sarsa? How does it differ from regular Sarsa?
8. Derive the Expected Sarsa update formula and discuss its advantages over Sarsa and
Q-learning.
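For questions 3, 4, 7, and 8 above, the standard one-step update rules can be compared side by side (α is the step size, γ the discount factor, π the policy used by Expected Sarsa):

\begin{align*}
  \text{Sarsa:} \quad
    & Q(S_t,A_t) \leftarrow Q(S_t,A_t) + \alpha\bigl[R_{t+1} + \gamma\,Q(S_{t+1},A_{t+1}) - Q(S_t,A_t)\bigr] \\
  \text{Q-learning:} \quad
    & Q(S_t,A_t) \leftarrow Q(S_t,A_t) + \alpha\bigl[R_{t+1} + \gamma\,\max_{a} Q(S_{t+1},a) - Q(S_t,A_t)\bigr] \\
  \text{Expected Sarsa:} \quad
    & Q(S_t,A_t) \leftarrow Q(S_t,A_t) + \alpha\Bigl[R_{t+1} + \gamma\,\textstyle\sum_{a}\pi(a\mid S_{t+1})\,Q(S_{t+1},a) - Q(S_t,A_t)\Bigr]
\end{align*}

Sarsa bootstraps from the action actually taken next, Q-learning from the greedy action, and Expected Sarsa from the expectation over the policy's action probabilities.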
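A minimal Python sketch of tabular Q-learning with an ε-greedy behaviour policy, relevant to questions 4-6 above. The four-state chain environment, its step() helper, and all constants are hypothetical, chosen purely for illustration:

import random
from collections import defaultdict

# Made-up deterministic chain: states 0..3, actions 0 (left) and 1 (right);
# reaching state 3 gives reward 1 and ends the episode.
def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(3, state + 1)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward, next_state == 3

alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = defaultdict(float)                     # Q[(state, action)], zero-initialised

for _ in range(500):                       # episodes
    s, done = 0, False
    while not done:
        q0, q1 = Q[(s, 0)], Q[(s, 1)]
        if random.random() < epsilon or q0 == q1:
            a = random.choice([0, 1])      # explore (or break ties randomly)
        else:
            a = 0 if q0 > q1 else 1        # exploit the current estimates
        s2, r, done = step(s, a)
        # Q-learning backup: bootstrap from the *greedy* action in the next state
        target = r + (0.0 if done else gamma * max(Q[(s2, 0)], Q[(s2, 1)]))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

greedy = {}
for s in range(3):                         # non-terminal states
    greedy[s] = 0 if Q[(s, 0)] > Q[(s, 1)] else 1
print(greedy)  # the learned greedy policy should prefer action 1 ("right") everywhere

Because the backup uses the max over next-state actions rather than the action the behaviour policy actually takes, Q-learning is off-policy; replacing the max with Q(s2, a2) for the next sampled action a2 would turn this step into Sarsa.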
UNIT V
1. What is an n-step return? Explain its role in reinforcement learning.
2. Write the expression for the n-step return and explain its components. (The standard form is given after this unit's questions.)
3. What is the TD(λ) algorithm? Mention its significance.
4. Describe the TD(λ) algorithm in detail. Compare it with TD(0) and Monte Carlo
methods, highlighting how λ helps in generalization.
5. Why is generalization important in reinforcement learning?
6. Discuss the need for generalization in real-world RL problems. Describe techniques
used to achieve generalization and their implications.
7. What is linear function approximation in RL?
8. Explain the geometric view of linear function approximation using feature vectors
and projections.
9. What is Linear TD(λ)? Mention its key features.
10. Derive the update equation for Linear TD(λ) and explain the role of each component. (See the sketch after this unit's questions.)
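For questions 1, 2, 9, and 10 above, the n-step return and the linear TD(λ) update with accumulating eligibility traces are commonly written as follows (w is the weight vector, φ(s) the feature vector of state s, z_t the eligibility trace, α the step size):

\begin{align*}
  G_{t:t+n} &= R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{\,n-1} R_{t+n} + \gamma^{\,n}\,\hat{V}(S_{t+n}),\\[4pt]
  \hat{V}(s) &= \mathbf{w}^{\top}\boldsymbol{\phi}(s),\\
  \delta_t &= R_{t+1} + \gamma\,\mathbf{w}_t^{\top}\boldsymbol{\phi}(S_{t+1}) - \mathbf{w}_t^{\top}\boldsymbol{\phi}(S_t),\\
  \mathbf{z}_t &= \gamma\lambda\,\mathbf{z}_{t-1} + \boldsymbol{\phi}(S_t),\\
  \mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,\delta_t\,\mathbf{z}_t.
\end{align*}

Setting λ = 0 recovers one-step TD(0), while λ = 1 approaches the Monte Carlo return.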
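A minimal Python/NumPy sketch of the same linear TD(λ) update, run on the classic five-state random walk. The one-hot features, step size, and trace decay are illustrative assumptions:

import numpy as np

n_states, alpha, gamma, lam = 5, 0.1, 1.0, 0.8
w = np.zeros(n_states)                      # weights of the linear value function

def phi(s):
    """One-hot feature vector for state s (an illustrative choice of features)."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

rng = np.random.default_rng(0)
for _ in range(200):                        # episodes
    s = n_states // 2                       # start in the middle state
    z = np.zeros(n_states)                  # eligibility trace, reset each episode
    while True:
        s2 = s + rng.choice([-1, 1])
        done = s2 < 0 or s2 >= n_states
        r = 1.0 if s2 >= n_states else 0.0  # reward only for exiting on the right
        v, v2 = w @ phi(s), (0.0 if done else w @ phi(s2))
        delta = r + gamma * v2 - v          # TD error
        z = gamma * lam * z + phi(s)        # accumulating trace
        w += alpha * delta * z              # TD(lambda) weight update
        if done:
            break
        s = s2

print(np.round(w, 2))  # should roughly approach the true values 1/6 ... 5/6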
UNIT VI
1. What is tile coding in reinforcement learning? (A small sketch appears after this unit's questions.)
2. Why is tile coding used for function approximation?
3. What does "control with function approximation" mean in reinforcement learning?
4. What is policy search in reinforcement learning?
5. Explain the concept of parameterized policies and how policy search methods optimize
them.
6. What is experience replay? Mention its basic working principle. (See the sketch after this unit's questions.)
7. Explain the advantages of using experience replay in deep reinforcement learning.
8. Describe the architecture and functioning of experience replay in Q-learning. Discuss
how it improves sample efficiency and stabilizes learning.
9. What is fitted Q iteration? How does it differ from standard Q-learning?
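A minimal Python sketch of one-dimensional tile coding, for questions 1 and 2 above. The number of tilings, tiles per tiling, and input range are arbitrary choices for illustration:

import numpy as np

n_tilings, tiles_per_tiling, low, high = 4, 8, 0.0, 1.0
tile_width = (high - low) / tiles_per_tiling
# Each tiling is shifted by a different fraction of the tile width.
offsets = [i * tile_width / n_tilings for i in range(n_tilings)]

def tile_features(x):
    """Return a binary feature vector with exactly one active tile per tiling."""
    features = np.zeros(n_tilings * tiles_per_tiling)
    for i, off in enumerate(offsets):
        idx = int((x - low + off) / tile_width)
        idx = min(idx, tiles_per_tiling - 1)          # clip to the last tile
        features[i * tiles_per_tiling + idx] = 1.0
    return features

# Nearby inputs share most of their active tiles, which is what gives generalization.
print(tile_features(0.50).nonzero()[0])
print(tile_features(0.52).nonzero()[0])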
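And a minimal Python sketch of an experience replay buffer as used with Q-learning-style updates (questions 6-8 above). The buffer capacity, batch size, and transition format are illustrative assumptions; the learning step itself is omitted:

import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation of consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Hypothetical usage: store transitions as the agent acts, then train on random minibatches.
buffer = ReplayBuffer(capacity=1000)
for t in range(200):
    buffer.add(state=t, action=0, reward=0.0, next_state=t + 1, done=False)  # dummy data
if len(buffer) >= 32:
    batch = buffer.sample(32)
    # each minibatch would feed a Q-learning / DQN update here
    print(len(batch), batch[0])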