
CS 188 Fall 2018
Introduction to Artificial Intelligence
Final Exam


• You have 180 minutes. The time will be projected at the front of the room. You may not leave during the last
10 minutes of the exam.
• Do NOT open exams until told to. Write your SIDs in the top right corner of every page.
• If you need to go to the bathroom, bring us your exam, phone, and SID. We will record the time.
• In the interest of fairness, we want everyone to have access to the same information. To that end, we will not
be answering questions about the content. If a clarification is needed, it will be projected at the front of the
room. Make sure to periodically check the clarifications.
• The exam is closed book, closed laptop, and closed notes except your three-page double-sided cheat sheet. You
are allowed a non-programmable calculator for this exam. Turn off and put away all other electronics.
• The last two sheets in your exam are scratch paper. Please detach them from your exam. Mark your answers
ON THE EXAM IN THE DESIGNATED ANSWER AREAS. We will not grade anything on scratch paper.
• For multiple choice questions:
–  means mark ALL options that apply
– # means mark ONE choice


– When selecting an answer, please fill in the bubble or square COMPLETELY.

First name

Last name

SID

Student to the right (SID and Name)

Student to the left (SID and Name)

Q1. Agent Testing Today! /1


Q2. Short Questions /14
Q3. Treasure Hunting MDPs /12
Q4. Approximate Q-learning /8
Q5. Value of Asymmetric Information /16
Q6. Bayes Net Modeling /12
Q7. Help the Farmer! /14
Q8. Bayes Nets and RL /15
Q9. Decision Trees /8
Total /100


Q1. [1 pt] Agent Testing Today!

It’s testing time! Not only for you, but for our CS188 robots as well! Circle your favorite robot below.

Any answer was acceptable.

Q2. [14 pts] Short Questions
(a) [2 pts] Which of the following properties must a set of preferences satisfy if they are rational:

 (A ≻ B) OR (B ≻ A) OR (B ∼ A)
 (B ≻ A) AND (C ≻ A) ⇒ (C ∼ B)
 (A ∼ B) AND (B ∼ C) ⇒ [p, A; 1 − p, B] ∼ [q, B; 1 − q, C]
 A ≻ B ≻ C ≻ D ⇒ [p, A; 1 − p, C] ≻ [p, B; 1 − p, D]
 (B ≻ A) AND (C ≻ B) ⇒ (C ≻ A)
The first one is the orderability axiom that all rational preferences must satisfy. The second one is false:
consider the utility function U(A) = 1, U(B) = 2, U(C) = 3, which satisfies the premises but has
U(B) ≠ U(C). The third one is true: since A, B, and C are equally preferable, any lottery over those
outcomes is also equally preferable; it follows directly from the substitutability axiom. The fourth and
fifth statements follow from the transitivity axiom.

(b) [2 pts] Which of the following are true?

 Given a set of preferences there exists a unique utility function.
 U(x) = x^4 is a risk prone utility
 U(x) = 2x is a risk prone utility
 For any specific utility function, any lottery can be replaced by an appropriate deterministic utility
value
 For the lotteries A = [0.8, $4000; 0.2, $0], B = [1.0, $3000; 0.0, $0] we have A ≻ B
Given a set of preferences there exists an infinite number of utility functions, you may consider a fixed
utility function and then apply a monotonic increasing function to it. This will result in a new utility
function for that set of preferences. The second statement is risk prone since U (L) > U (EMV(L)). The
third statement is risk neutral U (L) = U (EMV(L)). The fourth statement is true since given a lottery
and a utility function, we consider the utility of it as the expected utility value for that lottery. The last
statement is false: we can consider the utility function log(x), then we have 0.8 log(4000) < log(3000).
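As a quick numeric check of the last claim, here is a short sketch (following the solution's convention that the $0 outcome contributes zero utility to the expectation) comparing the two lotteries under a hypothetical log-utility agent:

```python
import math

# Expected log-utilities of A = [0.8, $4000; 0.2, $0] and B = [1.0, $3000],
# treating the $0 outcome as contributing zero utility, as the solution does.
eu_A = 0.8 * math.log(4000)
eu_B = 1.0 * math.log(3000)
print(round(eu_A, 3), round(eu_B, 3), eu_A < eu_B)   # 6.635 8.006 True, so B is preferred
```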

(c) [2 pts] Which of the following paths is a feasible trajectory for the gradient ascent algorithm?


 A  B  C  D  E  F

A is a gradient ascent path since the gradients are orthogonal to the contours and point towards the
maximum. B is also a gradient ascent path with a high learning rate. C is not because the path is going towards
the minimum instead of the maximum. D is not a gradient ascent path since the gradient is not orthogonal to
the contour lines. E is not a gradient ascent path since it starts going towards the minimum. F is not since it
goes towards the minimum and the gradients are not orthogonal to the contour lines.

(d) We are given the following 5 neural network (NN) architectures. The operation ∗ represents matrix
multiplication, [wi1 ... wik] and [bi1 ... bik] represent the weights and the biases of the NN, and the orientation
(vertical or horizontal) is just for consistency in the operations. The term [ReLU σ] in B means applying
a ReLU activation to the first element of the vector and a sigmoid (σ) activation to the second element. These
operations are depicted in the following figures:

Which of the following neural networks can represent each function?

(i) [2 pts] fI (x) :  A  B  C  D  E


A and B cannot represent this plot since the ReLU activation results in a flat semi-line. C can represent it
by having w11 = 1, w12 = −1, w21 = 1, w22 = 1 and the rest of the parameters being 0. D cannot because
the sigmoid activations are non-linear. E is a linear function and therefore can represent the identity.
(ii) [2 pts] fII (x) :  A  B  C  D  E
This is a piecewise linear function with 6 pieces. As a result, you can represent it by linearly combining 5
ReLU functions. The only possible graph that can represent this function is C.


(iii) [2 pts] fIII (x):  A  B  C  D  E


This is a piecewise linear function with 2 pieces. This can be obtained by linearly combining two ReLU
functions. The only possible solution is then C. Note that A cannot represent this function since the last
ReLU of the network would result in a flat semi-line.
(iv) [2 pts] fIV (x) :  A  B  C  D  E
This function corresponds to a scaled sigmoid function. A, C, E cannot represent any non-linear or non-
piecewise linear functions. B can represent it by setting w11 = b11 = 0. D does not work because the
composition of sigmoid with sigmoid is not a sigmoid.
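The reasoning in (ii) and (iii) can be illustrated with a tiny sketch (the weights below are made up for illustration and are not the exam's; the two functions are only loosely in the spirit of architectures C and A): a linear combination of ReLU hinges is piecewise linear with one kink per hinge, while a final ReLU clamps any negative segment to a flat semi-line.

```python
def relu(z):
    return max(z, 0.0)

W1, B1 = [1.0, 1.0], [0.0, -1.0]   # two hinges, giving kinks at x = 0 and x = 1
W2 = [-1.0, 2.0]                   # output weights set how the slope changes at each kink

def piecewise_linear(x):           # linear combination of ReLUs (type-C-style output)
    return sum(w2 * relu(w1 * x + b1) for w1, b1, w2 in zip(W1, B1, W2))

def relu_on_top(x):                # a final ReLU on the output (type-A-style output)
    return relu(piecewise_linear(x))

xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print([round(piecewise_linear(x), 2) for x in xs])  # piecewise linear, dips below zero
print([round(relu_on_top(x), 2) for x in xs])       # the negative segment is clamped to 0
```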

Q3. [12 pts] Treasure Hunting MDPs
In each Gridworld below, Pacman's starting position is denoted as P. G denotes the goal. At the goal, Pacman can
only "Exit", which will cause Pacman to exit the Gridworld and gain reward +100. The Gridworld also has K gold
nuggets g1, ..., gK, which will have the properties listed below in each part. Pacman automatically picks up a gold
nugget if he enters a square containing one. Finally, define P0 as Pacman’s initial state, where he is in position P
and has no gold nuggets.

[Figures: Gridworlds (a), (b), and (c)]


(a) Pacman now ventures into (a). Each gold nugget Pacman holds when he exits will add +100 to his reward,
and he will receive +100 when he exits from the goal G.
(i) [2 pts] When conducting value iteration, what is the first iteration at which V (P0 ) is nonzero?

Answer: 5
The shortest number of iterations is 5 because it takes 4 timesteps for Pacman to go from P to G, and 1
more timestep for Pacman to exit from the goal. It takes far more iterations for Pacman to gain reward
by picking up g1 and then exiting from G.
(ii) [2 pts] Assume Pacman will act optimally. What nonzero discount factor γ ensures that the policy of
picking up g1 before going to goal G and the policy of going straight to G yield the same reward?

Answer: (1/2)^(1/6)
To solve for γ, we have the equation 200γ^11 = 100γ^5, where the left-hand side is the reward after picking up
g1, and the right-hand side is the reward after going straight to and exiting from the goal G.
Solving this equation yields γ^6 = 1/2, so γ = (1/2)^(1/6) ≈ 0.891.

(b) Pacman is now at (b), which contains a Gold Store (S). He will receive +5 per nugget. When at the Store,
Pacman can either “Sell” to sell all his gold for +5 per nugget or “Exit” to exit the Gridworld. Exiting from
the Store yields 0 reward. Exiting from goal G will give +100 + 5k, where Pacman has k nuggets.
Note that Pacman can also only carry one gold nugget at a time.
(i) [2 pts] When conducting value iteration, what is the first iteration at which V (P0 ) is nonzero?

Answer: 7
It takes 7 steps for Pacman to pick up g3 and carry it to the Store to sell it. All other methods of obtaining
reward, such as exiting from goal G, selling g1 , selling g2 , or selling some combination of nuggets, takes
more time than to directly carry g3 to the store and sell it. The distance between P and S is 6, as Pacman
picks g3 up on the way, and 1 more timestep is needed to do the action “Sell.”
(ii) [2 pts] Now Pacman is in a world with a Store that is not necessarily the Gridworld (b). Assume Pacman
is acting optimally, and he begins at the Store. It takes Pacman time T1 , T2 , T3 to go from the Store, pick
up the nuggets g1 , g2 , g3 respectively, return to the store, and sell each nugget. It takes time TG to go
from the Store and exit from the goal G. Assume T1 < T2 < T3 < TG .
What must be true such that the better policy for Pacman would be to gather and sell all nuggets and
exit from the store rather than to gather all nuggets and exit from goal G?
5(γ^T1 + γ^(T1+T2) + γ^(T1+T2+T3)) > 115γ^(T1+T2+T3+TG)
# 5(γ^T1 + γ^(T1+T2) + γ^(T1+T2+T3)) > 100γ^TG
# 5(γ^T1 + γ^T2 + γ^T3) > 100γ^TG
# 15γ^(T1+T2+T3) > 115γ^TG
# 5(γ^T1 + γ^(T1+T2) + γ^(T1+T2+T3)) > 115γ^TG
None of the above


The first option, which was intended but incorrect, as well as "None of the above" were both awarded points due
to unclear problem instructions.

The clarification that Pacman cannot use the Store if he holds multiple gold nuggets was added during the exam.
The intended solution was to select the first option. However, it is always better to pick up the gold nuggets
in a single round trip, rather than returning to the store each time after a nugget is picked up, so the expression
115γ^(T1+T2+T3+TG) does not denote the reward if Pacman acts optimally and exits from the goal. Therefore, if we
specify some time TG′ that is the optimal time to pick up all the gold nuggets in a single round trip and exit via
the goal, then the reward when acting optimally and exiting from the goal is 115γ^(TG′). However, we never
specified any such value TG′, and the correct answer was not one of the listed five inequalities.

Because the true answer was not included in the options, "None of the above" is the correct answer.

(c) Finally, Pacman finds himself in Gridworld (c). There is no store. However, Pacman finds that there is now a
living reward! He gets the living reward for every action he takes except the Exit action. Pacman receives +0
exiting from the Door, and +100 exiting from the Goal. Once in the Door, Pacman can only Exit.
(i) [2 pts] Suppose γ = 0.5. For what living reward will Pacman receive the same reward whether he exits
via the Door or exits via the goal?

Answer: −50/7
The rewards are the same if (1/2)^4 ∗ 100 + RL + (1/2)RL + (1/4)RL + (1/8)RL = RL.
Solving this yields (7/8)RL = −(1/2)^4 ∗ 100, which reduces to RL = −(8 ∗ 100)/(7 ∗ 16) = −50/7.
(ii) [2 pts] Suppose γ = 0.5. What is the living reward such that Pacman receives the same reward if he
traverses the Gridworld forever or if he goes straight to and exits from the goal? Hint: ∑_{t=0}^{∞} rγ^t = r/(1 − γ)

Answer: 50
He would want to go around forever if RL/(1 − 0.5) = (1/2)^4 ∗ 100 + (15/8)RL, which can be reduced to
2RL = 25/4 + (15/8)RL, or (1/8)RL = 25/4, so RL = 50. The 15/8 in the first equation comes from
RL + (1/2)RL + (1/4)RL + (1/8)RL that Pacman accumulates on the way to the goal G.
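Both living-reward answers can be verified with exact arithmetic; a minimal sketch, assuming the step counts used in the solutions (one living reward before the Door, four discounted living rewards on the way to the Goal):

```python
from fractions import Fraction

gamma = Fraction(1, 2)

# Part (c)(i): exiting via the Door (one living reward, then +0) should equal
# walking four steps to the Goal and exiting for +100, at R_L = -50/7.
RL = Fraction(-50, 7)
door = RL
goal = sum(gamma**t * RL for t in range(4)) + gamma**4 * 100
print(door == goal)   # True

# Part (c)(ii): wandering forever (geometric series r / (1 - gamma)) should equal
# going straight to the Goal, at R_L = 50.
RL = Fraction(50)
forever = RL / (1 - gamma)
to_goal = sum(gamma**t * RL for t in range(4)) + gamma**4 * 100
print(forever == to_goal)   # True
```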

Q4. [8 pts] Approximate Q-learning
(a) [2 pts] Pacman is trying to collect all food pellets, and each treasure chest contains 10 pellets but must be
unlocked with a key. Pacman will automatically pick up a pellet or key when he enters a square containing
them, and he will automatically unlock a chest when he enters a square with a chest and has at least one key.
A key can only unlock one chest; after being used, it vanishes.
To finish, Pacman must exit through either goal G1 or G2 .
The keys are shown on the map as a K, treasure chests as T, and food pellets as circles. Pacman’s starting
position is shown as P. The goals are G1 , G2 .
When calculating features for a Q-function Q(s, a), the state the features are calculated with is the state s′
Pacman is in after taking action a from state s. The possible features the Q-learning can use are:

• Nkeys: Number of keys Pacman holds.
• Dm(K): Manhattan distance to closest key.
• Nchests: Number of chests Pacman has unlocked.
• Dm(T): Manhattan distance to closest chest.
• Nfood: Number of food pellets Pacman has eaten.
• Dm(F): Manhattan distance to closest food pellet.
• Dm(G1): Manhattan distance to G1.
• Dm(G2): Manhattan distance to G2.

Note that the approximate Q-learning here can be any function over the features, not necessarily a weighted
linear sum.
Suppose we finished training an agent using approximate Q-learning and we then run the learned policy to
observe the following.

Assume we observe all the above paths. What is the minimal set of features that could have been used to learn
this policy?

 Nkeys     Dm(T)     Dm(G1)
 Dm(K)     Nfood     Dm(G2)
 Nchests     Dm(F)     No features

Due to the phrasing of the problem, there is some ambiguity regarding the precise minimal feature set. Full
credit was given to answers that chose Dm (G1 ), at least one of {Nkeys , Dm (K)}, at least one of {Nf ood ,
Dm (F )}, and did not pick any of {Nchests , Dm (T ), Dm (G2 )}.
In episodes 1 and 3 above, Pacman walks in the direction of G1 ; of the features provided only Dm (G1 ) could
be used to learn this behavior.
Based on episode 3 above, Pacman is willing to move away from G1 to gather a food pellet. Since the food
pellet is just one square away, either Nf ood or Dm (F ) could be used to learn this behavior. At the same time,
after collecting the first key Pacman chooses to collect the second key instead of picking up the pellet right
away; this can be explained by the use of either Nkeys or Dm (K).
Moving towards/away from the treasure chest is not required to explain Pacman’s behavior, so the choices
Nchests and Dm (T ) are incorrect.
The feature Dm (G2 ) is not required, either. In episode 2, Pacman begins by collecting the key and two food
pellets. At that point, moving either up or right would bring Pacman closer to G1 , so Pacman could have


broken the tie in favor of G2 , at which point Pacman happens to stumble across the second goal and the
episode ends.

(b) Suppose Pacman is now in an empty grid of size M × M . For a Q-value Q(s, a), the features are the x- and
y-position of the state Pacman is in after taking action a.
Select “Possible with weighted sum” if the policy can be expressed by using a weighted linear sum to represent
the Q-function. Select “Possible with large neural net” if the policy can be expressed by using a large neural
net to represent the Q-function. Select “Not Possible” if expressing the policy with given features is impossible
no matter the function.
(i) [2 pts] Pacman’s optimal policy is always to go upwards.
 Possible with large neural net  Possible with weighted linear sum  Not Possible
(ii) [2 pts] We draw a vertical line and divide the Gridworld into two halves. On the left half, Pacman’s
optimal policy is to go upwards, and on the right half, Pacman’s optimal policy is to go downwards.
 Possible with large neural net  Possible with weighted linear sum  Not Possible
(iii) [2 pts] We draw a vertical line and divide the Gridworld into two equal halves. On the left half, Pacman’s
optimal policy is to go upwards, and on the right half, Pacman’s optimal policy is to go right.
 Possible with large neural net  Possible with weighted linear sum  Not Possible
The key in this part is understanding that a large enough neural net can approximate any function, so it is
always possible to use a large neural net to represent the Q-functions and express a certain policy.
The first is possible with a weighted linear sum, for example, 0 ∗ x + 1 ∗ y, so Q-values are larger for states with
larger y values.
The second is not possible with a weighted linear sum. To make going up optimal, Q-values with larger y-values
must be similarly larger. In the right half, this must be the opposite. However, the x features for two positions
(x, y+1) and (x, y-1), e.g., one position is higher than the other, are the same, meaning that the x feature can
only create offsets between the weighted linear sums for those states and not actually reverse the magnitude to
cause (x, y-1) to have a larger Q-value than (x, y+1). That is, for some weights wx , wy , the difference in Q-values
between (x, y+1), (x, y-1) is wx ∗x+wy (y+1)−(wx ∗x+wy (y−1)) = wx ∗x+wy ∗y+wy −wx ∗x−wy ∗y+wy = 2wy ,
and is independent of both x-position and wx .
The third is not possible with a weighted linear sum. For two positions (x, y+1) and (x + 1, y), we can
calculate the difference in Q-values for the given positions given any weights wx , wy . The difference is then
wx ∗ x + wy (y + 1) − (wx (x + 1) + wy ∗ y) = wx ∗ x + wy ∗ y + wy − wx ∗ x − wx − wy ∗ y = wy − wx . We see then
that this is actually completely independent of the positions and the difference will therefore always be fixed;
Pacman cannot prefer to go up in some situations and right in others.
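A tiny sketch of this argument (the weights are arbitrary and walls are ignored; the features are assumed to be the raw next-state coordinates, as in the question): whatever weights are chosen, the greedy action induced by a weighted linear sum is the same in every cell, so a policy that changes direction across the grid cannot be represented.

```python
M = 6
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def greedy_action(x, y, wx, wy):
    # Q(s, a) = wx * x' + wy * y', where (x', y') is the state reached by action a.
    q = {a: wx * (x + dx) + wy * (y + dy) for a, (dx, dy) in ACTIONS.items()}
    return max(q, key=q.get)

for wx, wy in [(0.0, 1.0), (3.0, -2.0)]:
    greedy = {greedy_action(x, y, wx, wy) for x in range(M) for y in range(M)}
    print((wx, wy), greedy)   # exactly one greedy action appears for the whole grid
```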

Q5. [16 pts] Value of Asymmetric Information
Alice and Bob are playing an adversarial game as shown in the game tree below. Alice (the MAX player) and Bob
(the MIN player) are both rational and they both know that their opponent is also a rational player. The game tree
has one chance node E whose outcome can be either E = −100 or E = +120 with equal 0.5 probability.

[Game tree: Alice (MAX) is the root. Alice's right action leads to a leaf worth −20. Alice's left action leads to a
Bob (MIN) node, whose right action leads to a leaf worth 60 and whose left action leads to the chance node E,
which yields −100 or +120 with probability 0.5 each.]

Each player’s utility is equal to the amount of money he or she has. The value x of each leaf node in the game tree
means that Bob will pay Alice x dollars after the game, so that Alice and Bob’s utilities will be x and −x respectively.

(a) [2 pts] Suppose neither Alice nor Bob knows the outcome of E before playing. What is Alice’s expected utility?

Answer: 10 E’s expectation is 10. Using minimax, both Alice and Bob should go left.

(b) Carol, a good friend of Alice’s, has access to E and can secretly tell Alice the outcome of E before the game
starts (giving Alice the true outcome of E without lying). However, Bob is not aware of any communication
between Alice and Carol, so he still assumes that Alice has no access to E.

(i) [1 pt] Suppose Carol secretly tells Alice that E = −100. What is Alice’s expected utility in this case?

Answer: -20 Here, Bob will still go left as before since he isn’t aware of Alice’s access
to E. Given this, Alice should now choose to go right when E = −100.
(ii) [1 pt] Suppose Carol secretly tells Alice that E = +120. What is Alice’s expected utility in this case?

Answer: 120 Here, Bob will still go left as before since he isn’t aware of Alice’s access
to E. Given this, Alice should now choose to go left when E = +120.
(iii) [1 pt] What is Alice’s expected utility if Carol secretly tells Alice the outcome of E before playing?

Answer: 50 E is equally likely to be −100 or +120. Averaging the two cases above,
−20 ∗ 0.5 + 120 ∗ 0.5 = 50.
We define the value of private information V_A^pri(X) of a random variable X to a player A as the difference in
player A's expected utility after the outcome of X becomes private information to player A, such that A has
access to the outcome of X, while other players have no access to X and are not aware of A's access to X.

(iv) [2 pts] In general, the value of private information V_A^pri(X) of a variable X to a player A
 always satisfies V_A^pri(X) > 0 in all cases.
 always satisfies V_A^pri(X) ≥ 0 in all cases.
 always satisfies V_A^pri(X) = 0 in all cases.
 can possibly satisfy V_A^pri(X) < 0 in certain cases.
Since player A can always choose to ignore this information and act in the same way as if he/she doesn't
know this information, player A is guaranteed to obtain at least the same utility as before, so V_A^pri(X) ≥ 0.

(v) [1 pt] What is V_Alice^pri(E), the value of private information of E to Alice in the specific game tree above?

Answer: 40 Subtracting the answer of (a) from the answer of (b, iii), 50 − 10 = 40.

(c) David also has access to E, and can make a public announcement of E (announcing the true outcome of E
without lying), so that both Alice and Bob will know the outcome of E and are both aware that their opponent
also knows the outcome of E. Also, Alice cannot obtain any information from Carol now.

(i) [1 pt] Suppose David publicly announces that E = −100. What is Alice’s expected utility in this case?

Answer: -20 Using minimax with E = −100, Bob will go left and Alice will go right.

(ii) [1 pt] Suppose David publicly announces that E = +120. What is Alice’s expected utility in this case?

Answer: 60 Using minimax with E = +120, Bob will go right and Alice will go left.

(iii) [1 pt] What is Alice’s expected utility if David makes a public announcement of E before the game starts?

Answer: 20 E is equally likely to be −100 or +120. Averaging the two cases above,
−20 ∗ 0.5 + 60 ∗ 0.5 = 20.
We define the value of public information V_A^pub(X) of a random variable X to a player A as the difference in
player A's expected utility after the outcome of X becomes public information, such that everyone has access to
the outcome of X and is aware that all other players also have access to X.

(iv) [2 pts] In general, the value of public information V_A^pub(X) of a variable X to a player A
 always satisfies V_A^pub(X) > 0 in all cases.
 always satisfies V_A^pub(X) ≥ 0 in all cases.
 always satisfies V_A^pub(X) = 0 in all cases.
 can possibly satisfy V_A^pub(X) < 0 in certain cases.
Player A's utility may decrease if the outcome of X becomes public information, since other players can
now exploit this information to better play against player A, especially in an adversarial setting.
(v) [1 pt] What is V_Alice^pub(E), the value of public information of E to Alice in the specific game tree above?

Answer: 10 Subtracting the answer of (a) from the answer of (c, iii), 20 − 10 = 10.

(vi) [2 pts] Let a = V_Alice^pub(E) be the value of public information of E to Alice. Suppose David will publicly
announce the outcome of E if anyone (either Alice or Bob) pays him b dollars (b > 0), and will make no
announcement otherwise. Which of the following statements are True?
 The value of public information of E to Bob is V_Bob^pub(E) = −a.
 If b < a, then Alice should pay David b dollars.
 If b > a, then Bob should pay David b dollars.
 If b < −a, then Bob should pay David b dollars.
 If b > −a, then Alice should pay David b dollars.
 There exists some value b > 0 such that both Alice and Bob should pay David b dollars.
 There exists some value b > 0 such that neither Alice nor Bob should pay David b dollars.
Since Alice and Bob's utilities always sum up to zero, if Alice's utility increases by a after the outcome of
E becomes public information, then Bob's utility will certainly decrease by a, so V_Bob^pub(E) = −a.
Alice should pay when b < V_Alice^pub(E) = a, and Bob should pay when b < V_Bob^pub(E) = −a, which cannot
happen simultaneously since b > 0. When b is large enough (b > |a|), neither Alice nor Bob should
pay for the announcement.
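As a cross-check of parts (a) through (c), here is a minimal sketch that hard-codes the tree described above (nothing beyond the stated leaf values and the 50/50 chance node is assumed) and recomputes the three expected utilities and the two values of information:

```python
E_OUTCOMES = [-100, 120]      # chance node E, each outcome with probability 0.5
RIGHT_LEAF = -20              # Alice's right action
BOB_RIGHT_LEAF = 60           # Bob's right action

expected_e = sum(E_OUTCOMES) / len(E_OUTCOMES)   # 10

def bob_value(e_belief):
    """Bob (MIN) compares the chance node, as he believes it, against 60."""
    return min(e_belief, BOB_RIGHT_LEAF)

# (a) Nobody knows E: both players reason with its expectation.
v_none = max(bob_value(expected_e), RIGHT_LEAF)

# (b) Private information: Alice knows the outcome e; Bob still uses the expectation.
def alice_private(e):
    bob_goes_left = expected_e <= BOB_RIGHT_LEAF   # unaware Bob still heads for E
    left_payoff = e if bob_goes_left else BOB_RIGHT_LEAF
    return max(left_payoff, RIGHT_LEAF)

v_private = sum(alice_private(e) for e in E_OUTCOMES) / len(E_OUTCOMES)

# (c) Public information: both know e and play minimax on the revealed tree.
v_public = sum(max(bob_value(e), RIGHT_LEAF) for e in E_OUTCOMES) / len(E_OUTCOMES)

print(v_none, v_private, v_public)             # 10 50.0 20.0
print(v_private - v_none, v_public - v_none)   # 40.0 (private value), 10.0 (public value)
```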

Q6. [12 pts] Bayes Net Modeling
(a) Modeling Joint Distributions For each of the Bayes Net (BN) models of the true data distribution, indicate
if the new Bayes Net model is guaranteed to be able to represent the true joint distribution. If it is not able
to, draw the minimal number of edges such that the resulting Bayes Net can capture the joint distribution,
or indicate if it is not possible.
(i) [2 pts]
[Table columns: BN Model of True Data Distribution | New Bayes Net Model | Can new BN represent joint distribution of True Data Distribution? | If no, draw arrows needed. The two diagrams, over nodes A, B, C, D, are not reproduced here.]
Yes   # No   # Not Possible
The two Bayes Net models make the same independence assumptions.
(ii) [2 pts]
[Table as in (i); the two diagrams are over nodes A, B, C, D, E.]
# Yes   No   # Not Possible (arrows needed: A → B and D → E)
The new Bayes Net model encodes the independence assumptions A ⊥ B | C and D ⊥ E, which are not
present in the BN model of the true data distribution. The edge A → B removes the assumption A ⊥ B | C,
and the edge D → E removes the assumption D ⊥ E. Using a different direction for either/both edges
is also correct.
(iii) [2 pts]
[Table as in (i); the two diagrams are over nodes A, B, C, D, E.]
# Yes   No   # Not Possible (arrows needed: A → B, A → E, and C → D)
The new Bayes Net model encodes the independence assumptions A ⊥ B | C, A ⊥ E, C ⊥ D | B, E, and
A ⊥ D | B, E, which are not present in the BN model of the true data distribution. The edge A → B
removes A ⊥ B | C. The edge A → E removes A ⊥ E. The edge C → D removes C ⊥ D | B, E and
A ⊥ D | B, E. Using a different edge direction A ← E is also correct.


(b) Bayes Nets and Classification Recall from class that we can use Bayes Nets for classification by using the
conditional distribution of P (Y |X1 , X2 , . . . , Xn ), where Y is the class and each of the Xi are the observed
features.

Assume all we know about the true data distribution is that it can be represented with the “True Distribution
Model” Bayes Net structure. Indicate if the new Bayes Net models are guaranteed to be able to represent the
true conditional distribution, P (Y |X1 , X2 , . . . , Xn ). Mark “Yes” if it can be represented and “No” otherwise.

For all subparts of this problem, the answer to the rightmost question (with edges X1 → Y , X2 → Y , . . . ) is
“Yes”. These models contain a factor P (Y |X1 , X2 , . . . , Xn ) that can represent any conditional distribution.
(i) [2 pts]
[Diagrams omitted: the True Distribution Model over Y, X1, X2, X3, and two new candidate models; the rightmost candidate has edges X1 → Y, X2 → Y, X3 → Y.]
Left model: Yes # No        Right model: Yes # No

For the left Yes/No question: based on the true distribution model, we can conclude that P (Y |X1 , X2 , X3 ) =
P (Y |X1 , X2 ). In other words, X3 is not required to model the conditional distribution. If we set
P (X3 = x3 ) = constant for all values x3 in its domain, the new Bayes Net model will also have
P (Y |X1 , X2 , X3 ) = P (Y |X1 , X2 ). With X3 out of the picture, the remaining difference between the
two models is the direction of a single edge between Y and X1 , which does not affect the ability to model
the true conditional distribution.

(ii) [2 pts]
[Diagrams omitted: the True Distribution Model over Y, A, B, C, X1, X2, X3, and two new candidate models over Y, X1, X2, X3; the rightmost candidate has edges X1 → Y, X2 → Y, X3 → Y.]
Left model: Yes # No        Right model: Yes # No

For the left Yes/No question: starting with the true distribution model, we run variable elimination to
eliminate A, B, and C. The result would be exactly the model shown in the middle column, so we can
conclude that the answer is “Yes”: the new model can represent the true conditional distribution.

(iii) [2 pts]
[Diagrams omitted: the True Distribution Model over Y, A, B, C, X1, ..., X5, and two new candidate models over Y, X1, ..., X5; the left candidate is a Naive Bayes model and the rightmost has edges X1 → Y, ..., X5 → Y.]
Left model: # Yes No        Right model: Yes # No

For the left Yes/No question: the true data distribution can have the property that Y = X1 XOR X2 ,
constructed as follows. Let Y , X1 , and X2 be binary random variables. Let A take on values from the set
{(0, 0), (0, 1), (1, 0), (1, 1)}. Let P (A = (0, 0)|Y = 0) = P (A = (1, 1)|Y = 0) = 0.5 and P (A = (0, 1)|Y =
1) = P (A = (1, 0)|Y = 1) = 0.5. Let X1 equal the first element of A’s value with probability 1, and let
X2 equal the second element of A’s value with probability 1.
However, the Naive Bayes model has X1 ⊥⊥ X2 |Y , which can’t represent the XOR function. As a result,
the Naive Bayes model can’t represent the true distribution model.
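The XOR construction can be verified by brute-force enumeration; a small sketch (the prior on Y is assumed uniform, which the solution leaves unstated, and only Y, X1, X2 are tracked, with A marginalized out):

```python
p_y = {0: 0.5, 1: 0.5}                 # assumed uniform prior over Y
p_a_given_y = {
    0: {(0, 0): 0.5, (1, 1): 0.5},     # P(A | Y = 0)
    1: {(0, 1): 0.5, (1, 0): 0.5},     # P(A | Y = 1)
}

joint = {}                             # P(Y, X1, X2), reading X1, X2 off A's value
for y, a_dist in p_a_given_y.items():
    for (x1, x2), p in a_dist.items():
        joint[(y, x1, x2)] = joint.get((y, x1, x2), 0.0) + p_y[y] * p

# Y equals X1 XOR X2 on every outcome with nonzero probability.
print(all(y == (x1 ^ x2) for (y, x1, x2), p in joint.items() if p > 0))   # True

# Naive Bayes would need P(x1, x2 | y) = P(x1 | y) P(x2 | y); it fails, e.g. at
# y = 0, x1 = 0, x2 = 1: the left side is 0 but the product is 0.25.
p_x1x2 = joint.get((0, 0, 1), 0.0) / p_y[0]
p_x1 = sum(joint.get((0, 0, x2), 0.0) for x2 in (0, 1)) / p_y[0]
p_x2 = sum(joint.get((0, x1, 1), 0.0) for x1 in (0, 1)) / p_y[0]
print(p_x1x2, p_x1 * p_x2)   # 0.0 0.25
```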


Q7. [14 pts] Help the Farmer!


Chris is a farmer. He has a hen in his barn, and it will lay at most one egg per day. Chris collects data and discovers
conditions that influence his hen to lay eggs on a certain day, which he describes below.

O    P(O)          W    P(W)
+o   0.1           +w   0.7
−o   0.9           −w   0.3

H    W    P(H|W)
+h   +w   0.9
+h   −w   0.5
−h   +w   0.1
−h   −w   0.5

S    W    O    P(S|W,O)
+s   +w   +o   0.6
+s   +w   −o   0.1
+s   −w   +o   0.8
+s   −w   −o   0.1
−s   +w   +o   0.4
−s   +w   −o   0.9
−s   −w   +o   0.2
−s   −w   −o   0.9

E    H    S    P(E|H,S)
+e   +h   +s   0.4
+e   +h   −s   0.8
+e   −h   +s   0.2
+e   −h   −s   0.6
−e   +h   +s   0.6
−e   +h   −s   0.2
−e   −h   +s   0.8
−e   −h   −s   0.4

[Bayes net: W → H, W → S, O → S, H → E, S → E]

For a single hen, variables O, W, S, H, and E denote the event of an outbreak (O), sunny weather (W ), sickness (S),
happiness (H), and egg being laid (E). If an event does occur, we denote it with a +, otherwise −, e.g., +o denotes
an outbreak having occurred and −o denotes no outbreak occurred.

(a) Suppose Chris wants to estimate the probability that the hen lays an egg given it’s good weather and the hen
is not sick, e.g., P (+e| + w, −s). Suppose we receive the samples:
(−o, +w, −s, −h, +e), (−o, +w, −s, +h, −e), (+o, +w, −s, −h, −e)
(i) [2 pts] Similar to the likelihood weighting method, Chris weighs each of his samples after fixing evidence.
However, he weighs each sample only with P(−s | +w, O), i.e., he omits weighting by P(+w). Chris' method
results in the correct answer for the query P(+e | +w, −s).
True # False
Because P(+w) is a constant factor in every sample's weight, omitting it does not change the answer after
normalizing, so the query is still correct.
(ii) [2 pts] Using likelihood weighting with the samples listed above, what is the probability the hen lays an
egg given it’s good weather and the hen is not sick, or P (+e| + w, −s)? Round your answer to the second
decimal place or express it as a fraction simplified to the lowest terms.

0.41
The weights are 0.7 ∗ 0.9 for the first two samples and 0.7 ∗ 0.4 for the third. This gives us the estimate
(0.7 ∗ 0.9) / (0.7 ∗ 0.9 ∗ 2 + 0.7 ∗ 0.4), or a probability of 0.41.
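A short sketch reproducing this estimate from the three samples and the CPTs above (only the evidence weights P(+w) and P(−s | +w, O) are needed):

```python
p_w = 0.7                                      # P(+w)
p_not_s_given_w = {"+o": 0.4, "-o": 0.9}       # P(-s | +w, +o) and P(-s | +w, -o)

samples = [("-o", "+e"), ("-o", "-e"), ("+o", "-e")]   # (O, E) for the three samples

weights = [p_w * p_not_s_given_w[o] for o, _ in samples]
estimate = sum(w for (o, e), w in zip(samples, weights) if e == "+e") / sum(weights)
print(round(estimate, 2))   # 0.41
```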

(b) Chris uses Gibbs sampling to sample tuples of (O, W, S, H, E).


(i) [2 pts] As a step in our Gibbs sampling, suppose we currently have the assignment of (−o, −w, +s, +h, +e).
Then suppose we resample the “sickness” variable, i.e., S. What is the probability that the next assignment
is the same, i.e., (−o, −w, +s, +h, +e)? Round your answer to the second decimal point, or express it as
a fraction simplified to the lowest terms.

0.05
This is asking for the probability P(+s | −o, −w, +e, +h). Mathematically, this is
P(+s, −o, −w, +e, +h) / [P(+s, −o, −w, +e, +h) + P(−s, −o, −w, +e, +h)]
= (0.9 ∗ 0.3 ∗ 0.5 ∗ 0.1 ∗ 0.4) / (0.9 ∗ 0.3 ∗ 0.5 ∗ 0.1 ∗ 0.4 + 0.9 ∗ 0.3 ∗ 0.5 ∗ 0.9 ∗ 0.8)
= 0.0054 / (0.0054 + 0.0972) ≈ 0.053.

(ii) [2 pts] What will be the most observed tuple of (O, W, S, H, E) if we keep running Gibbs sampling for a
long time? Select one value from each column below to denote the assignment.
 +o  +w  +s  +h  +e
 −o  −w  −s  −h  −e

The most observed sample should be good weather, no outbreak, no sickness, happiness, and laying an
egg. This yields probability of 0.7 ∗ 0.9 ∗ 0.9 ∗ 0.9 ∗ 0.8 = .40824.

(c) [3 pts] Suppose we adopt a sampling procedure where at each evidence node with probability 0.5 we fix to
the evidence, otherwise we sample the outcome and reject if it doesn’t match. Upon seeing an evidence node,
write an expression for the value we will multiply into the weight of the sample to make this procedure
consistent. The weight is initialized at the start as weight = 1.

Your answer may use the variables s and p, where s is 1 if the coin flip told us to sample the evidence node,
and p is the probability of the evidence given its parents.

weight *= (1 − s)p + s
When the coin flip tells us to fix the evidence, we treat it like likelihood weighting and multiply by p. When
the coin flip tells us to sample, we treat it like rejection sampling, so we don’t need to change the weights.
A nice way of writing this is (1 − s)p + s. Technically, we can also multiply by any constant (e.g. .5 or p)
since if every sample weight is scaled this cancels when we normalize weights to estimate probabilities. We also
accepted answers written as piece-wise functions.
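A small sketch (hypothetical helper name, not from any course codebase) of this mixed procedure, making the weight factor (1 − s)p + s explicit:

```python
import random

def visit_evidence_node(evidence_value, p_evidence_given_parents, sample_fn):
    """Return (accepted, weight_factor) for one evidence node."""
    s = 1 if random.random() < 0.5 else 0      # s = 1 means the coin said "sample it"
    if s == 0:
        return True, p_evidence_given_parents  # fix the evidence: weight *= p
    drawn = sample_fn()                        # sample it: reject on mismatch,
    return drawn == evidence_value, 1.0        # weight factor is (1 - s)p + s = 1

# Example for evidence W = +w with P(+w) = 0.7:
accepted, factor = visit_evidence_node(
    "+w", 0.7, lambda: "+w" if random.random() < 0.7 else "-w")
print(accepted, factor)
```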

Now, suppose there are 1000 hens, each independently modeled by the Bayes Net model below. Denote the random
variables for sickness, happiness, and laying an egg as Si , Hi , Ei for hen i. The conditional probability tables are the
same as above for each hen.
[Bayes net for hen i: W → Hi, W → Si, O → Si, Hi → Ei, Si → Ei]

(d) [3 pts] One day, Chris observed that all the hens lay eggs and the weather is bad. What’s the probability of
an outbreak happening? Round your answer to the second decimal point. Hint: P (O = +o, W = −w, Ei =
+ei ) = 0.0114 and P (O = −o, W = −w, Ei = +ei ) = 0.1782 for all i.

0
P(+ei | +o, −w) = 0.0114 / P(+o, −w) = 0.0114 / (0.1 ∗ 0.3) = 0.38
P(+ei | −o, −w) = 0.1782 / P(−o, −w) = 0.1782 / (0.9 ∗ 0.3) = 0.66
P(O | +e1, ..., +e1000, −w) ∝ P(O, +e1, ..., +e1000, −w) = [∏_{i=1}^{1000} P(+ei | O, −w)] P(O) P(−w)
By normalizing this, P(+o | +e1, ..., +e1000, −w) = (0.38)^1000 ∗ 0.1 / ((0.38)^1000 ∗ 0.1 + (0.66)^1000 ∗ 0.9) = 1 / (1 + (66/38)^1000 ∗ 9) ≈ 0.
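A quick numeric check of this answer (ordinary floating point is enough here, since (0.66/0.38)^1000 is still representable):

```python
p_e_given_o = 0.0114 / (0.1 * 0.3)    # P(+e_i | +o, -w) = 0.38
p_e_given_no = 0.1782 / (0.9 * 0.3)   # P(+e_i | -o, -w) = 0.66
print(round(p_e_given_o, 2), round(p_e_given_no, 2))

# Posterior of +o, following the normalized form in the solution.
posterior = 1.0 / (1.0 + (p_e_given_no / p_e_given_o) ** 1000 * 9)
print(posterior)   # on the order of 1e-241, i.e. effectively 0
```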


Q8. [15 pts] Bayes Nets and RL


In this question, you will see that variable elimination can solve reinforcement learning problems. Consider the
following Bayes net, where St ∈ S, At ∈ A, and Rt ∈ {0, 1}:

[Dynamic Bayes net: for each timestep t, the nodes At and St are the parents of St+1, and St is the parent of Rt.]

(a) [2 pts] From the list below, select all the (conditional) independencies that are guaranteed to be true in the
Bayes net above:

 St+1 ⊥ St−1 | St, At
 Rt+1 ⊥ Rt−1 | St, At
 Rt+1 ⊥ Rt
 At+1 ⊥ Rt | St
 At+1 ⊥ Rt | St, At
 None of the above

Let +rt:T denote the event Rt = Rt+1 = · · · = RT = 1, and assume that P(at) = 1/|A|. Define the following
functions:

βt(st, at) = P(+rt:T | st, at),        βt(st) = P(+rt:T | st) = (1/|A|) ∑_{at} βt(st, at)

Perform variable elimination to compute P(At | St, +rt:T).

(b) [2 pts] Which of the following recursions does βt(st, at) satisfy?

# βt(st, at) = P(+rt | st) ∑_{st+1} βt+1(st+1)
# βt(st, at) = P(+rt | st) ∑_{st+1} βt+1(st+1, at)
# βt(st, at) = P(+rt+1 | st) ∑_{st+1} βt+1(st+1, at)
βt(st, at) = P(+rt | st) ∑_{st+1} P(st+1 | st, at) βt+1(st+1)
# βt(st, at) = (1/|A|) ∑_{at+1} P(+rt+1 | st+1) ∑_{st+1} P(st+1 | st, at) βt+1(st+1)
# None of the above

(c) [2 pts] Write P(at | st, +r1:T) in terms of βt(st, at), βt(st), and relevant probabilities from the Bayes net.

# P(at | st, +r1:T) = βt(st, at) / (βt(st) P(+rt | st))
# P(at | st, +r1:T) = βt(st, at) / (P(+rt | st) |A|)
P(at | st, +r1:T) = βt(st, at) / ∑_{a′t} βt(st, a′t)
# P(at | st, +r1:T) = P(+rt | st) / (βt(st) βt(st, at) |A|)
# P(at | st, +r1:T) = βt(st, at) / βt(st)
# None of the above

So far, we have only discussed variable elimination in a certain Bayes net. Now, we will associate the Bayes net with
an MDP with two parameters p, q ∈ (0, 1). The states are S = {x, y, z, w}, and the actions are A = {a, b}. The
transitions are described in this diagram:
[Transition diagram: from x, action a leads to y and action b leads to w. From y, action a stays in y with
probability p and moves to z with probability 1 − p, while action b moves to z. In z and w, both actions a and b
self-loop.]

All transitions are deterministic except when taking action a starting in state y – this transition is determined by
P (St+1 = y | St = y, At = a) = p. The rewards are stochastic and depend only on state, taking on values in {0, 1}
with probabilities

P (+rt |St = x) = 1, P (+rt |St = y) = 1, P (+rt |St = z) = 0, P (+rt |St = w) = q

Throughout the following questions, assume that p, q ∈ (0, 1).


(d) [3 pts] Consider running the uniform policy πuniform(At = a) = 1/2 for T + 2 timesteps starting in state x. What
is P(A1 = a | S1 = x, +r1:T+2)?

# p^(T+1) / (p^(T+1) + (2q)^(T+1))
# p^T / (p^T + (2q)^T)
# p^(T+1) / (p^(T+1) + q^(T+1))
# (2q)^(T+1) / (p^(T+1) + (2q)^(T+1))
# (2q)^T / (p^T + (2q)^T)
# q^(T+1) / (p^(T+1) + q^(T+1))
p^T / (p^T + 2^T q^(T+1))
# p^T / (p^T + q^T)
# 2^T q^(T+1) / (p^T + 2^T q^(T+1))
# q^T / (p^T + q^T)
# None of the above

Since P(A1 = a, +r1:T+2 | S1 = x) = (1/2)(1/2)^T p^T and P(A1 = b, +r1:T+2 | S1 = x) = (1/2) q^(T+1), we get
P(A1 = a | S1 = x, +r1:T+2) = p^T / (p^T + 2^T q^(T+1)).
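As a sanity check tying (b) through (d) together, here is a small sketch (p, q, and T are arbitrary illustrative values; the transitions are the ones described in the diagram placeholder above) that runs the β recursion backward and compares the resulting posterior with the closed form:

```python
from fractions import Fraction

p, q, T = Fraction(3, 5), Fraction(1, 3), 4     # illustrative values only

S, A = ["x", "y", "z", "w"], ["a", "b"]

def trans(s, a):                                 # P(s' | s, a) as a dict
    if s == "x":
        return {"y": Fraction(1)} if a == "a" else {"w": Fraction(1)}
    if s == "y" and a == "a":
        return {"y": p, "z": 1 - p}
    if s == "y" and a == "b":
        return {"z": Fraction(1)}
    return {s: Fraction(1)}                      # z and w are absorbing

r = {"x": Fraction(1), "y": Fraction(1), "z": Fraction(0), "w": q}   # P(+r | s)

horizon = T + 2
beta_s = {s: Fraction(1) for s in S}             # beta_{horizon+1}(s) = 1
for t in range(horizon, 0, -1):
    beta_sa = {(s, a): r[s] * sum(pr * beta_s[s2] for s2, pr in trans(s, a).items())
               for s in S for a in A}            # recursion from part (b)
    beta_s = {s: sum(beta_sa[(s, a)] for a in A) / len(A) for s in S}
    if t == 1:
        # part (c): P(A1 = a | S1 = x, +r_{1:T+2}) by normalizing over actions
        post_a = beta_sa[("x", "a")] / (beta_sa[("x", "a")] + beta_sa[("x", "b")])

closed_form = p**T / (p**T + 2**T * q**(T + 1))  # part (d)
print(post_a == closed_form)                     # True
```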

(e) [2 pts] Suppose p > 2q^((T+1)/T). When running πuniform, what is arg max_z P(A1 = z | S1 = x, +r1:T+2)?

 arg max_z P(A1 = z | S1 = x, +r1:T+2) = a
 arg max_z P(A1 = z | S1 = x, +r1:T+2) = b
 Cannot be determined
If p > 2q^((T+1)/T), then P(A1 = a | S1 = x, +r1:T+2) > 1/2, so a is the answer.

(f) [2 pts] Suppose q > 2^(−T/(T+1)). When running πuniform, what is arg max_z P(A1 = z | S1 = x, +r1:T+2)?

 arg max_z P(A1 = z | S1 = x, +r1:T+2) = a
 arg max_z P(A1 = z | S1 = x, +r1:T+2) = b
 Cannot be determined
If q > 2^(−T/(T+1)), then P(A1 = a | S1 = x, +r1:T+2) < p^T / (p^T + 1) < 1/2 because p < 1. So, the answer is b.

(g) [2 pts] Consider running the optimal policy π∗ for T + 1 timesteps starting in state x. When is π∗ always
guaranteed to choose b as its first action?

T > 1/((1 − p)q)
# T > 1/(pq)
# T < 1/((1 − p)(1 − q))
# T > 1/((1 − q)p)
# T < 1/((1 − p)q)
# T < 1/(pq)
# T > 1/((1 − p)(1 − q))
# T < 1/((1 − q)p)
# None of the above

In state y, the optimal policy will always choose action a, and its value is at most 1/(1 − p). Meanwhile, in
state w, the value is always qT. So, if qT > 1/(1 − p), the optimal policy must go to state w, meaning that it
must take action b at x.

Q9. [8 pts] Decision Trees
You are given a dataset for training a decision tree. The goal is to predict the label (+ or -) given the features A, B,
and C.

A B C label
0 0 0 +
0 0 1 +
0 1 0 +
0 1 1 -
1 0 0 -
1 0 1 -
1 1 0 +
1 1 1 -

First, consider building a decision tree by greedily splitting according to information gain.

(a) [2 pts] Which features could be at the root of the resulting tree? Select all possible answers.

 A
 B
 C
A and C yield maximal information gain at the root.
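The root choice in (a) can be reproduced directly; here is a short sketch computing the information gain of each feature on the table above:

```python
import math
from collections import Counter

rows = [  # (A, B, C, label)
    (0, 0, 0, "+"), (0, 0, 1, "+"), (0, 1, 0, "+"), (0, 1, 1, "-"),
    (1, 0, 0, "-"), (1, 0, 1, "-"), (1, 1, 0, "+"), (1, 1, 1, "-"),
]

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * math.log2(c / len(labels)) for c in counts.values())

def info_gain(feature_index):
    parent = entropy([lab for *_, lab in rows])
    weighted_children = 0.0
    for value in (0, 1):
        subset = [lab for *feats, lab in rows if feats[feature_index] == value]
        weighted_children += len(subset) / len(rows) * entropy(subset)
    return parent - weighted_children

for name, idx in [("A", 0), ("B", 1), ("C", 2)]:
    print(name, round(info_gain(idx), 3))   # A and C: ~0.189 (maximal); B: 0.0
```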

(b) [2 pts] How many edges are there in the longest path of the resulting tree? Select all possible answers.

 1
 2
 3
 4
 None of the above
Regardless of the choice of the feature at the root, the resulting tree needs to consider all 3 features in a path,
so there are 3 edges in that path.

Now, consider building a decision tree with the smallest possible height.

(c) [2 pts] Which features could be at the root of the resulting tree? Select all possible answers.

 A
 B
 C
The optimal decision tree first splits on B. For the B=0 branch, the next split is on A; for the B=1 branch,
the next split is on C.

(d) [2 pts] How many edges are there in the longest path of the resulting tree? Select all possible answers.

 1
 2
 3
 4
 None of the above
As can be seen from the answer to part (c), the optimal tree has two edges per path from the root to any leaf.
