[Week 10] NPTEL An Introduction to Artificial Intelligence Assignment Answers 2022

The document contains a series of questions and answers on decision-making strategies in the context of betting and on Markov Decision Processes (MDPs). It covers concepts such as minimax regret, the Hurwicz criterion, the value of perfect information, policy iteration, and the utility curves of agents.

Q1. Ram has the opportunity to make one of two bets (say A and B), to invest equally in both bets, or to make no bet. Each bet is based on the outcome of a cricket match. The payoffs to Ram on winning or losing each bet are described in the table below:

If Ram employs minimax regret to decide in this situation, what action does he take?

Makes bet A
Makes bet B
Invest equally in A and B
Makes no bet

Accepted Answers: Makes bet B
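
For reference, minimax regret works from a regret table rather than from the payoff table directly; since the payoff table is not reproduced above, only the general definition is sketched here:

```latex
% Regret of action a under outcome o: shortfall versus the best action for that outcome
\operatorname{regret}(a, o) = \max_{a'} \operatorname{payoff}(a', o) - \operatorname{payoff}(a, o)
% Minimax-regret choice: the action whose worst-case regret is smallest
a^{*} = \arg\min_{a} \max_{o} \operatorname{regret}(a, o)
```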


Q2. If Ram employs the Hurwicz criterion to decide, for which of the following values
of the coefficient of realism does Ram choose to not make a bet?
0.2
0.5
0.7
0.4
Accepted Answers: 0.2 and 0.4
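
The Hurwicz criterion blends optimism and pessimism through the coefficient of realism α; a sketch of the rule (the payoff values are again omitted, as above):

```latex
H(a) = \alpha \cdot \max_{o} \operatorname{payoff}(a, o) + (1 - \alpha) \cdot \min_{o} \operatorname{payoff}(a, o)
```

Ram picks the action with the largest H(a). "Make no bet" (payoff 0 in every outcome) wins whenever α is small enough that every bet's H(a) drops below 0, which, per the accepted answers, happens here for α = 0.2 and α = 0.4.
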
Q3. Assume that an insider tells Ram that he can reveal beforehand whether Ram will win or lose a bet. Also assume that all bets have an equal likelihood of success and failure. What is the maximum amount of money Ram should be willing to pay the insider for this information?

Accepted Answers: (Type: Numeric) 45
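
This is the standard expected value of perfect information (EVPI) calculation; with the payoff table omitted, only the general form can be shown, using the equal win/lose probabilities of 0.5 stated in the question:

```latex
\mathrm{EVPI} = \mathbb{E}_{o}\Bigl[\max_{a} \operatorname{payoff}(a, o)\Bigr] - \max_{a} \mathbb{E}_{o}\bigl[\operatorname{payoff}(a, o)\bigr],
\qquad P(o = \text{win}) = P(o = \text{lose}) = 0.5
```

The accepted answer of 45 is this difference for the payoff table in the original question.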


Q4. For an MDP with a discrete, finite state space S and a discrete, finite action space A, what is the memory size of the transition function in the most general case?

O(|S|^2)
O(|S||A|)
O(|S|^2|A|)
O(|S||A|^2)

Accepted Answers:
O(|S|^2|A|)
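
The count follows because, in the most general case, the transition function stores one probability for every (state, action, next-state) triple. A minimal sketch (the sizes below are illustrative, not taken from the question):

```python
import numpy as np

# Tabular transition function T[s, a, s'] = P(s' | s, a).
# One probability is stored for every (state, action, next-state) triple,
# so the memory footprint is |S| * |A| * |S| = O(|S|^2 |A|).
num_states, num_actions = 3, 2            # illustrative sizes only
T = np.full((num_states, num_actions, num_states), 1.0 / num_states)

assert T.size == num_states ** 2 * num_actions
# Each (s, a) slice must be a probability distribution over next states.
assert np.allclose(T.sum(axis=-1), 1.0)
```
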
For Questions 5 to 7:

Consider the MDP given below for a robot trying to walk.

The MDP has three states, S = {Standing, Moving, Fallen}, and two actions: moving the robot's legs slowly (a) and moving the robot's legs aggressively (b), denoted by the colours black and green respectively in the figure. The task is to perform policy iteration for this MDP with discount factor 1.

Q5. We start with a policy π(s) = a for all s in S and Vπ(s) = 0 for all s. What is the value of the Fallen state after one iteration of the Bellman update during policy evaluation?

Return the answer as a decimal rounded to 1 decimal place.

Accepted Answers:(Type: Numeric) -0.2


Q6. Suppose we perform the policy improvement step just after one iteration of the Bellman update as in Q5. What is the updated policy? Write the actions in the order Standing, Moving, Fallen.

For example, if the policy is π(Standing) = b, π(Moving) = b, π(Fallen) = a, write the answer as bba.

Accepted Answers:(Type: String) aba


Q7. After one iteration of policy evaluation as in Q5, what is the value of
Q(state,action) where state = Moving and action = b?

Return the answer as a decimal rounded to 2 decimal places.

Accepted Answers:(Type: Range) 2.16,2.48
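
Since the transition diagram for the walking robot is not reproduced here, the sketch below uses placeholder transition probabilities and rewards (invented for illustration, not taken from the figure) purely to show the mechanics behind Q5 to Q7: one synchronous Bellman backup under the fixed initial policy, a Q-value computed from the resulting values, and a greedy improvement step.

```python
# Minimal policy evaluation / improvement sketch. The transition probabilities
# and rewards below are invented placeholders, NOT the numbers from the
# assignment's figure.
states = ["Standing", "Moving", "Fallen"]
actions = ["a", "b"]          # slow (a) vs aggressive (b) leg movement
gamma = 1.0                   # discount factor given in the question

# T[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
T = {
    "Standing": {"a": [("Moving", 1.0)],
                 "b": [("Moving", 0.6), ("Fallen", 0.4)]},
    "Moving":   {"a": [("Moving", 1.0)],
                 "b": [("Moving", 0.8), ("Fallen", 0.2)]},
    "Fallen":   {"a": [("Standing", 0.4), ("Fallen", 0.6)],
                 "b": [("Standing", 0.2), ("Fallen", 0.8)]},
}
R = {
    "Standing": {"a": 1.0, "b": 2.0},
    "Moving":   {"a": 1.0, "b": 3.0},
    "Fallen":   {"a": -1.0, "b": -2.0},
}

def q_value(V, s, a):
    """Q(s, a) = R(s, a) + gamma * sum over s' of P(s' | s, a) * V(s')."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])

# Q5: one synchronous Bellman backup under the initial policy pi(s) = a,
# starting from V(s) = 0 for every state.
policy = {s: "a" for s in states}
V = {s: 0.0 for s in states}
V = {s: q_value(V, s, policy[s]) for s in states}
print("V after one backup:", V)

# Q7: Q(Moving, b) computed from those one-step values.
print("Q(Moving, b) =", q_value(V, "Moving", "b"))

# Q6: greedy policy improvement with respect to the current values.
policy = {s: max(actions, key=lambda a: q_value(V, s, a)) for s in states}
print("Improved policy:", policy)
```

With the actual figure's numbers, the same three steps give the accepted answers above: V(Fallen) = -0.2, the improved policy aba, and Q(Moving, b) between 2.16 and 2.48.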


Q8. If the utility curve of an agent varies as m^2 for money m, then the agent is:
Risk-prone
Risk-averse
Risk-neutral
Can be any of these

Accepted Answers: Risk-prone
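
A convex utility curve such as U(m) = m² makes the expected utility of a gamble exceed the utility of its expected monetary value, which is the defining property of a risk-prone agent. An illustrative 50/50 gamble between 0 and 100 (numbers chosen only for this example):

```latex
\mathbb{E}[U] = 0.5 \cdot 0^{2} + 0.5 \cdot 100^{2} = 5000 \;>\; U\bigl(\mathbb{E}[m]\bigr) = U(50) = 2500
```
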
Q9. Which of the following statements are true regarding Markov Decision Processes
(MDPs)?

Discount factor is not useful for finite horizon MDPs.
We assume that the reward and cost models are independent of the previous state transition history, given the current state.
MDPs assume full observability of the environment
Goal states may have transitions to other states in the MDP

Accepted Answers:
We assume that the reward and cost models are independent of the previous state transition history, given the current state.
MDPs assume full observability of the environment
Goal states may have transitions to other states in the MDP
Q10. Which of the following are true regarding value and policy iteration?

Value iteration is guaranteed to converge in a finite number of steps for any value of epsilon and any MDP, if the MDP has a fixed point.
The convergence of policy iteration is dependent on the initial policy.
Value iteration is generally expected to converge in a lesser number of iterations as compared to policy iteration.
In each iteration of policy iteration, value iteration is run as a subroutine, using a fixed policy.

Accepted Answers:
Value iteration is guaranteed to converge in a finite number of steps for any value of epsilon and any MDP, if the MDP has a fixed point.
In each iteration of policy iteration, value iteration is run as a subroutine, using a fixed policy.
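
For context on the accepted answers, each outer iteration of policy iteration first evaluates the current fixed policy (an evaluation sweep with the policy held fixed, which is what "value iteration as a subroutine" refers to here) and then improves it greedily:

```latex
% Policy evaluation for the fixed policy \pi
V^{\pi}(s) = \sum_{s'} T\bigl(s, \pi(s), s'\bigr)\,\bigl[R\bigl(s, \pi(s), s'\bigr) + \gamma\, V^{\pi}(s')\bigr]
% Policy improvement: act greedily with respect to V^{\pi}
\pi'(s) = \arg\max_{a} \sum_{s'} T(s, a, s')\,\bigl[R(s, a, s') + \gamma\, V^{\pi}(s')\bigr]
```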