HW 03
Alice is taking a class taught by Bob called Artificial Intelligence. Bob has three ways he can teach the class: Hard, Medium or Easy. Alice has three ways she can take the class: Hard Working, Working and Hardly Working. For each of them, there are pros and cons. For instance, it's easy for Bob to teach an Easy class or a Hard class, but hard to balance the two. Obviously Alice doesn't like to work hard, but she realizes that she might have to in order to learn something. Bob is happy to teach a hard class to students who are willing to work hard, but if the students don't work hard, they punish Bob by giving him bad teaching evals! [1] Taken together, these considerations give rise to the following table of rewards. Entries are written as (A, B), where A is Alice's reward and B is Bob's reward. Rows are Bob's teaching style; columns are Alice's effort:

          Hard Working   Working   Hardly Working
Hard      (9, 9)         (5, 7)    (3, 1)
Medium    (6, 5)         (8, 6)    (5, 2)
Easy      (2, 1)         (4, 2)    (4, 4)
1. If Bob assumes that Alice will optimize her own reward (i.e., Bob assumes Alice is an optimal agent), how should he teach the class, supposing that Bob plays first? If Alice assumes Bob is an optimal agent, how hard should she work?
2. Draw a game tree for this problem supposing that Bob goes first. Propagate values up through the tree using (the non-zero-sum variant of) minimax search.
3. Alice is clearly a good student (see question one), but once in a while we get students who aren't quite as diligent :(. It makes sense for Bob to model his class as a distribution over types of students. Suppose Bob believes that 40% of his class will work hard, 45% will work, and 15% will hardly work. Draw the expectimax tree for this setting, concentrating only on Bob's reward, and compute expected node values. What is Bob's expected reward for this setting and which type of class should he teach?
[1] Yes, this implies that any bad teaching reviews must be due to shortcomings of students, not of professors!
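As a rough illustration of the two backup rules mentioned in the questions above (the non-zero-sum variant of minimax, and expectimax), here is a minimal Python sketch. It deliberately uses a made-up 2x2 payoff table, not the homework table; the names `TOY`, `backup_general_sum`, and `expectimax`, the move labels, and the probabilities are all hypothetical and chosen only for illustration. It assumes the leader moves first and the follower then maximizes her own reward.

```python
from typing import Dict, Tuple

# rewards[leader_move][follower_move] = (follower_reward, leader_reward)
Rewards = Dict[str, Dict[str, Tuple[float, float]]]

# Hypothetical toy table (NOT the homework table), for illustration only.
TOY: Rewards = {
    "strict": {"study": (4, 6), "slack": (5, 1)},
    "lenient": {"study": (3, 4), "slack": (2, 3)},
}


def backup_general_sum(rewards: Rewards):
    """Non-zero-sum backup: each player maximizes their OWN reward.

    At each follower node, pick the reply with the highest follower reward
    and pass the whole (follower, leader) pair up. At the root, the leader
    picks the move whose backed-up pair has the highest leader reward.
    """
    best = None
    for lead_move, options in rewards.items():
        # Follower's best response: maximize component 0 of the pair.
        reply = max(options, key=lambda m: options[m][0])
        value = options[reply]
        # Leader compares component 1 of the backed-up pairs.
        if best is None or value[1] > best[2][1]:
            best = (lead_move, reply, value)
    return best


def expectimax(rewards: Rewards, reply_probs: Dict[str, float]):
    """Expectimax on the leader's reward only.

    The follower is modelled as a chance node: each reply occurs with the
    given probability, so a leader move is worth its expected leader reward.
    """
    expected = {
        lead_move: sum(reply_probs[r] * options[r][1] for r in options)
        for lead_move, options in rewards.items()
    }
    best_move = max(expected, key=expected.get)
    return best_move, expected


if __name__ == "__main__":
    print(backup_general_sum(TOY))
    print(expectimax(TOY, {"study": 0.75, "slack": 0.25}))
```

In the toy table the two rules disagree about the leader's best move: against a self-interested follower the leader prefers "lenient", while against the assumed 75/25 mix of replies "strict" has the higher expected reward. The same functions could be applied to any table in the (follower, leader) layout used here.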
I don't have a cute story to wrap around the following questions, so just answer them :).
1. I flip a fair coin but don't let you see how it came up. I tell you that if you guess right, I'll give you $10. What is your expected reward (write out the computation!)?
2. Coins are boring. Now I roll a fair six-sided die but don't let you see how it came up. I tell you that if you guess right, I'll give you $10. What is your expected reward?
3. Let's say that now I tell you that the die isn't fair, but that the probabilities are as follows: p(1) = 0.3, p(2) = 0.1, p(3) = 0.1, p(4) = 0.2, p(5) = 0.2, p(6) = 0.1. Again, I'll give you $10 if you guess right. For each of the six possible guesses you could make, compute your expected reward. Which would you guess?
4. Now, I make you the following offer. Keep the same die as before. But now, I tell you that if you guess right, I'll give you $10 times the number you guess. I.e., if you guess 2 and you're right, I'll give you $20. Now what is your best option to guess? Is this the same as or different from the previous problem? Explain why or why not.
5. What are the components of a Markov Decision Process (formally)!? Make up an example of an MDP and give (in English) what each of these is (e.g., if you don't want to think of your own, you can use a baby crawling around, eating cookies and sticking his finger in an electrical socket...).
6.