Week 10
Q1. Ram has the opportunity to make one of two bets (say A and B), invest equally in both bets, or make no bet; each bet is based on the outcome of a cricket match. The payoffs to Ram on winning/losing each bet are as described in the table below:
If Ram employs minimax regret to decide in this situation, what action does he take?
Makes bet A
Makes bet B
Invests equally in A and B
Makes no bet
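Since the question's payoff table is not reproduced here, the rule itself can still be illustrated: for each outcome, an action's regret is the gap between the best payoff available in that outcome and the action's own payoff, and minimax regret picks the action whose worst-case regret is smallest. A minimal sketch with purely hypothetical payoffs (not the question's actual numbers):

```python
# Minimax regret with a HYPOTHETICAL payoff table; the question's real
# table is not shown here, so these numbers are illustrative only.
payoffs = {                      # action -> (payoff if win, payoff if lose)
    "Bet A":   (100, -60),
    "Bet B":   (80, -20),
    "A and B": (90, -40),        # equal split of A and B
    "No bet":  (0, 0),
}
outcomes = [0, 1]                # index into the (win, lose) pair

# Best achievable payoff in each outcome.
best = {o: max(p[o] for p in payoffs.values()) for o in outcomes}

# Regret of an action = worst gap to the best payoff over all outcomes.
regret = {a: max(best[o] - p[o] for o in outcomes) for a, p in payoffs.items()}

# Minimax regret: choose the action with the smallest maximum regret.
choice = min(regret, key=regret.get)
```

With these made-up numbers Bet B would win (max regret 20), but the answer for the actual question depends entirely on the table in the quiz.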
O(|S|^2)
O(|S||A|)
O(|S|^2|A|)
O(|S||A|^2)
Accepted Answers:
O(|S|^2|A|)
For Questions 5–7:
The MDP has three states, S = {Standing, Moving, Fallen}, and two actions: moving the robot legs slowly (a) and moving the robot legs aggressively (b), denoted by the colours black and green respectively. The task is to perform policy iteration on this MDP with discount factor 1.
Q5. We start with a policy π(s) = a for all s in S and V^π(s) = 0 for all s. What is the value of the Fallen state after one iteration of the Bellman update during policy evaluation?
Accepted Answers: Risk-prone
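The robot MDP's transition diagram is not reproduced here, so the actual numeric answer cannot be recovered, but the computation Q5 asks for can be sketched. One Bellman update during policy evaluation, under the fixed initial policy π(s) = a and V = 0 everywhere, with purely hypothetical transitions and rewards standing in for the missing figure:

```python
# One Bellman update during policy evaluation for a FIXED policy pi.
# The robot MDP's real transitions/rewards come from a figure not shown
# here; the model below is an illustrative assumption only.
states = ["Standing", "Moving", "Fallen"]
pi = {s: "a" for s in states}          # initial policy: slow-leg action everywhere
V = {s: 0.0 for s in states}           # initial value function, V = 0
gamma = 1.0                            # discount factor from the question

# Hypothetical model: P[(s, a)] = list of (prob, next_state, reward).
P = {
    ("Fallen",   "a"): [(0.4, "Standing", 1.0), (0.6, "Fallen", -1.0)],
    ("Standing", "a"): [(1.0, "Moving",   1.0)],
    ("Moving",   "a"): [(0.8, "Moving",   1.0), (0.2, "Fallen", -1.0)],
}

def bellman_update(s):
    # V(s) <- sum over s' of P(s'|s, pi(s)) * [R(s, pi(s), s') + gamma * V(s')]
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, pi[s])])

v_fallen = bellman_update("Fallen")    # = 0.4*1.0 + 0.6*(-1.0) = -0.2
```

Because V is initially zero, the first update reduces to the expected one-step reward under π; the same one-line computation with the figure's actual numbers gives the accepted answer.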
Q9. Which of the following statements are true regarding Markov Decision Processes
(MDPs)?
Accepted Answers:
We assume that the reward and cost models are independent of the
previous state transition history, given the current state.
MDPs assume full observability of the environment
Goal states may have transitions to other states in the MDP
Q10. Which of the following are true regarding value and policy iteration?
Value iteration is guaranteed to converge in a finite number of steps for any value of
epsilon and any MDP, if the MDP has a fixed point.
The convergence of policy iteration is dependent on the initial policy.
Value iteration is generally expected to converge in fewer iterations than policy iteration.
In each iteration of policy iteration, value iteration is run as a subroutine, using a fixed
policy
Accepted Answers:
Value iteration is guaranteed to converge in a finite number of steps for any
value of epsilon and any MDP, if the MDP has a fixed point.
In each iteration of policy iteration, value iteration is run as a subroutine,
using a fixed policy
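The accepted answer above describes the structure of policy iteration: each iteration runs iterative policy evaluation (Bellman updates with the action fixed by the current policy, i.e. value-iteration-style sweeps without the max) and then improves the policy greedily. A minimal sketch for a tabular MDP, assuming the same hypothetical `P[(s, a)] = [(prob, next_state, reward), ...]` interface as the earlier examples:

```python
def policy_iteration(states, actions, P, gamma, tol=1e-8):
    """Policy iteration: alternate evaluation of a fixed policy and greedy improvement."""
    pi = {s: actions[0] for s in states}   # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: Bellman updates under the FIXED policy pi,
        # swept until the value function stops changing (within tol).
        while True:
            delta = 0.0
            for s in states:
                v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, pi[s])])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated V.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: sum(
                p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)]))
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:                          # no action changed: pi is optimal
            return pi, V
```

Note that only the evaluation step resembles value iteration; the improvement step is a separate one-shot greedy pass, which is why the first and third options in Q10 are not both accepted.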