Final Practice
Autumn 2012
1 Other Finals
The following pages are excerpts from finals given in similar classes. The content is close to what we have covered this quarter, so it should be useful practice. Note that the topics and terminology differ slightly, so feel free to skip questions on material we did not cover.
Certain topics are less emphasized in the past exams, but will be more emphasized in
the final for the class. These include:
• Weighted CSPs and Markov Nets (the practice exams place more of an emphasis on
Bayes Nets).
• Loss-based learning (the practice exams place an emphasis on Naive Bayes instead).
In contrast, the practice exams cover state space models fairly deeply. State space models
will be less emphasized in the final for the class.
The first portion of the practice exam comes with solutions; the rest are provided as
example problems, but without solutions. In terms of other miscellaneous notes:
• Perceptron refers to a classifier using the perceptron loss (see slide 34 in the lecture on
loss minimization).
• The forward (and backward) algorithm for HMMs is just an instance of variable elimination, as you did in the first part of your Pacman projects, before implementing particle filtering. Relatedly, Viterbi is an algorithm to decode the MAP estimate in an HMM. (A short code sketch of the forward recursion follows these notes.)
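Since the forward algorithm shows up in several HMM questions later in this packet, here is a minimal Python sketch of one elapse-time-then-observe step, written as exactly the sum-out (variable elimination) operation mentioned above. The dictionary layout is our own illustrative convention, not something the exams assume.

```python
def forward_step(prior, transition, emission, observation):
    """One forward-algorithm step for an HMM: elapse time, then observe.

    prior[x]          -- P(X_t = x | e_1..t) from the previous step
    transition[x][y]  -- P(X_{t+1} = y | X_t = x)
    emission[y][e]    -- P(E_{t+1} = e | X_{t+1} = y)
    Returns P(X_{t+1} = y | e_1..t+1) as a normalized dictionary.
    """
    states = list(transition)
    # Elapse time: sum out X_t (this is the variable-elimination step).
    predicted = {y: sum(prior[x] * transition[x][y] for x in states) for y in states}
    # Observe: weight by the emission probability of the new evidence, then normalize.
    weighted = {y: predicted[y] * emission[y][observation] for y in states}
    total = sum(weighted.values())
    return {y: p / total for y, p in weighted.items()}
```

Viterbi has the same shape: replace the sum in the elapse-time step with a max and keep backpointers to recover the MAP sequence.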
(i) Standard A∗
(ii) A∗ , but we apply the goal test before enqueuing nodes rather than after dequeuing
(iii) A∗ , but prioritize n by g(n) only (ignoring h(n))
(iv) A∗ , but prioritize n by h(n) only (ignoring g(n))
(v) A∗, but prioritize n by g(n) + h(n′)
(vi) A∗, but prioritize n by g(n′) + h(n)
(a) (3 points) Which of the above variants are complete, assuming all heuristics are admissible?
(b) (3 points) Which of the above variants are optimal, again assuming all heuristics are admissible?
Upper Bounds: A∗ exploits lower bounds h on the true completion cost h∗ . Suppose now that we also have
an upper bound k(n) on the best completion cost (i.e. ∀n, k(n) ≥ h∗ (n)). We will now consider A∗ variants
which still use g + h as the queue priority, but save some work by using k as well. Consider the point at which
you are inserting a node n into the queue (fringe).
(c) (3 points) Assume you are required to preserve optimality. In response to n’s insertion, can you ever
delete any nodes m currently on the queue? If yes, state a general condition under which nodes m can be
discarded, if not, state why not. Your answer should involve various path quantities (g, h, k) for both the newly
inserted node n and other nodes m on the queue.
In a satisficing search, you are only required to find some solution of cost less than some threshold t (if one
exists). You need not be optimal.
(d) (3 points) In the satisficing case, in response to n’s insertion, can you ever delete any nodes m currently
on the queue? If yes, state a general condition, if not, state why not. Your answer should involve various path
quantities (g, h, k) for both the newly inserted node n and other nodes m on the queue.
ε-Admissible Heuristics: Suppose that we have a heuristic function which is not admissible, but ε-admissible, meaning that for some known ε > 0,
h(n) ≤ h∗(n) + ε for all nodes n,
where h∗(n) is the optimal completion cost. In other words, h is never more than ε from being optimal.
(e) (1 point) Is using A∗ with an ε-admissible heuristic complete? Briefly justify.
(f ) (2 points) Assuming we utilize an admissible heuristic in standard A∗ search, how much worse than the
optimal solution can we get? I.e., if c∗ is the optimal cost for a search problem, what is the worst cost solution
an admissible heuristic would yield? Justify your answer.
(g) (2 points) Suggest a modification to the A∗ algorithm which will be guaranteed to yield an optimal solution using an ε-admissible heuristic with fixed, known ε. Justify your answer.
Consider the general less-than chain CSP below. Each of the N variables Xi has the domain {1 . . . M }. The
constraints between adjacent variables Xi and Xi+1 require that Xi < Xi+1 .
(b) (1 point) What will the domain of X1 be after enforcing the consistency of only the arc X1 → X2 ?
(c) (2 points) What will the domain of X1 be after enforcing the consistency of only the arcs X2 → X3 then
X1 → X2 ?
(d) (2 points) What will the domain of X1 be after fully enforcing arc consistency?
(f ) (4 points) Imagine you wish to construct a similar family of CSPs which forces one of the two following
types of solutions: either all values must be ascending or all values must be descending, from left to right. For
example, if M = N = 3, there would be exactly two solutions: {1, 2, 3} and {3, 2, 1}. Explain how to formulate
this variant. Your answer should include a constraint graph and precise statements of variables and constraints.
If you play this game multiple times, it becomes more difficult to figure out what actions to take. The game is
formally a POMDP, but we can turn it into an MDP in which the states encode our past outcomes from lever
B. In particular, a state will be of the form (m, n), where m is the number of times we pulled B and got 10
and n is the number of times we pulled B and got 0. We begin in state (0, 0). If we then pull lever B and get
the outcome 10 we will go to state (1, 0), while getting the 0 outcome puts us in state (0, 1). Your actions are
{A, B}, and the rewards are as described above.
(c) (3 points) If you will play exactly two rounds, draw the computation tree which represents the possible
outcomes of the MDP. Clearly indicate which nodes are of which type (min, max, expectation, etc).
Note that if you pull lever B, the resulting payoff should change your beliefs about what kind of lever B is,
and therefore what future payoffs from B might be. For example, if you get the 10 reward, your belief that B
is lucky should increase.
(d) (2 points) If you are in state (0, 1) and select action B, list the states you might land in and the probability
you will land in them.
(e) (2 points) If you are in state (1, 0) and select action B, list the states you might land in and the probability
you will land in them.
(f ) (3 points) On the computation tree in (c), clearly mark the probabilities on each branch of any chance
nodes.
(g) (3 points) Again in this two-round setting, what is the MEU from the start state, and which first action(s)
(A or B or both) give it?
(h) (2 points) If the number of plays N is large enough, the optimal first action will eventually be to pull
lever B. Explain why this makes sense using concepts from reinforcement learning.
Assume there are two types of conditions: (S)inus congestion and (F)lu. Sinus congestion is caused by
(A)llergy or the flu.
There are three observed symptoms for these conditions: (H)eadache, (R)unny nose, and fe(V)er. Runny nose
and headaches are directly caused by sinus congestion (only), while fever comes from having the flu (only). For
example, allergies only cause runny noses indirectly. Assume each variable is boolean.
[Four candidate Bayes nets, (i)–(iv), are shown here, each over the condition variables A, S, F and the symptom variables R, H, V; their arc structures differ and are not reproduced in this text excerpt.]
(a) (2 points) Consider the four Bayes Nets shown. Circle the one which models the domain (as described
above) best.
(b) (3 points) For each network, if it models the domain exactly as above, write correct. If it has too many
conditional independence properties, write extra independence and state one that it has but should not have.
If it has too few conditional independence properties, write missing independence and state one that it should
have but does not have.
(i)
(ii)
(iii)
(iv)
(c) (3 points) Assume we wanted to remove the Sinus congestion (S) node. Draw the minimal Bayes Net
over the remaining variables which can encode the original model’s marginal distribution over the remaining
variables.
(d) (2 points) In the original network you chose, which query is more efficient to compute using variable
elimination: P (F |r, v, h, a, s) or P (F )? Briefly justify.
Suppose we draw the following samples from the network:
a, s, r, ¬h, ¬f, ¬v
a, s, ¬r, h, f, ¬v
a, ¬s, ¬r, ¬h, ¬f, ¬v
a, ¬s, ¬r, h, f, ¬v
a, s, ¬r, h, ¬f, ¬v
(e) (1 point) Give the sample estimate of P (f ) or state why it cannot be computed.
(f ) (1 point) Give the sample estimate of P (f |h) or state why it cannot be computed.
(g) (1 point) Give the sample estimate of P (f |v) or state why it cannot be computed.
(h) (2 points) For rejection sampling in general (not necessarily on these samples), which query will require
more samples to compute to a certain degree of accuracy, P (f |h) or P (f |h, a)? Justify your answer in general
terms.
You have been put in charge of a Jabberwock for your friend Lewis. The Jabberwock is kept in a large tugley wood which is conveniently divided into an N × N grid. It wanders freely around the N² possible cells. At each time step t = 1, 2, 3, . . . , the Jabberwock is in some cell Xt ∈ {1, . . . , N}², and it moves to cell Xt+1 randomly as follows: with probability 1 − ε, it chooses one of the (up to 4) valid neighboring cells uniformly at random; with probability ε, it uses its magical powers to teleport to a cell chosen uniformly at random among the N² possibilities (it might teleport to the same cell). Suppose ε = 1/2, N = 10, and that the Jabberwock always starts in X1 = (1, 1).
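As a sanity check for the transition model just described, here is a small sketch (ours, not part of the exam) that enumerates P(X_{t+1} | X_t = cell); cells are indexed (row, column) starting from (1, 1), and eps stands in for ε.

```python
def jabberwock_transition(cell, N=10, eps=0.5):
    """Return P(X_{t+1} = next | X_t = cell) as a dictionary over grid cells."""
    r, c = cell
    neighbors = [(r + dr, c + dc)
                 for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                 if 1 <= r + dr <= N and 1 <= c + dc <= N]
    dist = {}
    for nxt in [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]:
        p = eps / (N * N)                        # teleportation: uniform over all N^2 cells
        if nxt in neighbors:
            p += (1 - eps) / len(neighbors)      # ordinary move: uniform over valid neighbors
        dist[nxt] = p
    return dist
```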
(a) (2 points) Compute the probability that the Jabberwock will be in X2 = (2, 1) at time step 2. What
about P (X2 = (4, 4))?
At each time step t, you don’t see Xt but see Et , which is the row that the Jabberwock is in; that is, if
Xt = (r, c), then Et = r. You still know that X1 = (1, 1).
(b) (4 points) Suppose we see that E1 = 1, E2 = 2, E3 = 10. Fill in the following table with the distribution
over Xt after each time step, taking into consideration the evidence. Your answer should be concise. Hint: you
should not need to do any heavy calculations.
[Answer table with one column per time step t = 1, 2, 3 not reproduced here.]
You are a bit unsatisfied that you can’t pinpoint the Jabberwock exactly. But then you remembered Lewis
told you that the Jabberwock teleports only because it is frumious on that time step, and it becomes frumious
independently of anything else. Let us introduce a variable Ft ∈ {0, 1} to denote whether it will teleport at
time t. We want to add these frumious variables to the HMM.
Consider the two candidates:
[Two candidate models (A) and (B) are shown as diagrams: each extends the chain X1 → X2 → X3 → · · · with emission nodes E1, E2, E3, . . . and the new frumious nodes F1, F2, F3, . . .; the two models differ in how the Ft nodes are wired into the chain.]
Conditional independence assertions (listed once for model (A) and once for model (B)):
• X1 ⊥ X3 | X2
• X1 ⊥ E2 | X2
• X1 ⊥ F2 | X2
• X1 ⊥ E4 | X2
• X1 ⊥ F4 | X2
• E3 ⊥ F3 | X3
• E1 ⊥ F2 | X2
• E1 ⊥ F2 | E2
(c) (3 points) For each model, circle the conditional independence assumptions above which are true in that
model.
(d) (2 points) Which Bayes net is more appropriate for the problem domain here, (A) or (B)? Justify your
answer.
For the following questions, your answers should be fully general for models of the structure shown above, not
specific to the teleporting Jabberwock. For full credit, you should also simplify as much as possible (including
pulling constants outside of sums, etc.).
(e) (2 points) For (A), express P (Xt+1 , e1:t+1 , f1:t+1 ) in terms of P (Xt , e1:t , f1:t ) and the CPTs used to define
the network. Assume the E and F nodes are all observed.
(f ) (2 points) For (B), express P (Xt+1 , e1:t+1 , f1:t+1 ) in terms of P (Xt , e1:t , f1:t ) and the CPTs used to define
the network. Assume the E and F nodes are all observed.
(h) (2 points) For (B), express P (Xt+1 , e1:t+1 ) in terms of P (Xt , e1:t ) and the CPTs used to define the
network.
Consider the following Naive-Bayes model for diagnosing whether your cat is (H)ungry. Signs of hunger include
that the cat is (T)hin, (M)eowing, or (W)eak.
(a) (3 points) If your cat is thin and meowing, but not weak, what is the probability that he is hungry?
(b) (2 points) Which of the following smoothing options might have been applied to produce the CPTs above
from training data? Circle the best answer:
(c) (2 points) Assume that no smoothing has been applied (so these are the maximum likelihood estimates).
Compute the linear interpolation smoothed estimate of Plin (w|h) using α = 0.5.
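In case the definition is not fresh: linear interpolation smoothing is typically defined as a blend of the conditional and unconditional maximum likelihood estimates,
Plin(w|h) = α PML(w|h) + (1 − α) PML(w),
where PML denotes the unsmoothed estimate; double-check your notes for the exact convention used in class.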
You can decide whether or not to give your cat a mega-feast (F) to counteract his (possible) hunger. Your
resulting utilities are below:
H F U (H, F )
h f 0
h ¬f -100
¬h f 0
¬h ¬f 10
If you do not know W , but wish to determine whether your cat is weak, you can apply the weak-o-meter test,
which reveals the value of W .
(g) (3 points) In terms of high-level quantities (MEUs, EUs, conditional probabilities, or similar) and variables,
give an expression for the maximum utility you should be willing to pay to apply the weak-o-meter, assuming the cat is again thin and meowing.
(h) (2 points) What is the maximum utility you should be willing to pay, as a specific real number?
(g) [8 pts] Bayes’ Nets For each of the conditional independence assertions given below, circle whether they are
guaranteed to be true, guaranteed to be false, or cannot be determined for the given Bayes’ net.
[A Bayes' net over the variables A, B, C, D, F, G, H is shown here; its arc structure is needed for the questions below but is not reproduced in this text excerpt.]
B ⊥⊥ C Guaranteed true Guaranteed false Cannot be determined
B ⊥⊥ C | G Guaranteed true Guaranteed false Cannot be determined
B ⊥⊥ C | H Guaranteed true Guaranteed false Cannot be determined
A ⊥⊥ D | G Guaranteed true Guaranteed false Cannot be determined
A ⊥⊥ D | H Guaranteed true Guaranteed false Cannot be determined
B ⊥⊥ C | A, F Guaranteed true Guaranteed false Cannot be determined
F ⊥⊥ B | D, A Guaranteed true Guaranteed false Cannot be determined
F ⊥⊥ B | D, C Guaranteed true Guaranteed false Cannot be determined
Q4. [12 pts] Worst-Case Markov Decision Processes
Most techniques for Markov Decision Processes focus on calculating V ∗ (s), the maximum expected utility of state
s (the expected discounted sum of rewards accumulated when starting from state s and acting optimally). This
maximum expected utility V ∗ (s) satisfies the following recursive expression, known as the Bellman Optimality
Equation:
V∗(s) = max_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V∗(s′) ]
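For reference (and not as part of the exam question), the backup above translates into code almost verbatim; in the sketch below, T and R are assumed to be nested dictionaries keyed by state, action, and next state, and every action is assumed to be available in every state.

```python
def value_iteration(states, actions, T, R, gamma=0.9, sweeps=100):
    """Standard expected-utility value iteration using the Bellman backup above.

    T[s][a][s2] -- transition probability P(s2 | s, a)
    R[s][a][s2] -- reward for the transition from s to s2 under action a
    """
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {s: max(sum(T[s][a][s2] * (R[s][a][s2] + gamma * V[s2]) for s2 in T[s][a])
                    for a in actions)
             for s in states}
    return V
```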
In this question, instead of measuring the quality of a policy by its expected utility, we will consider the worst-case
utility as our measure of quality. Concretely, Lπ (s) is the minimum utility it is possible to attain over all (potentially
infinite) state-action sequences that can result from executing the policy π starting from state s. L∗ (s) = maxπ Lπ (s)
is the optimal worst-case utility. In words, L∗ (s) is the greatest lower bound on the utility of state s: the discounted
sum of rewards that an agent acting optimally is guaranteed to achieve when starting in state s.
Let C(s, a) be the set of all states that the agent has a non-zero probability of transitioning to from state s using action a. Formally, C(s, a) = {s′ | T(s, a, s′) > 0}. This notation may be useful to you.
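Under the same nested-dictionary convention as the value-iteration sketch above, C(s, a) can be read directly off the transition table; the helper below is only meant to illustrate the notation.

```python
def C(s, a, T):
    """The set of states reachable from s with non-zero probability under action a."""
    return {s2 for s2, p in T[s][a].items() if p > 0}
```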
(a) [3 pts] Express L∗ (s) in a recursive form similar to the Bellman Optimality Equation.
(b) [2 pts] Recall that the Bellman update for value iteration is:
Vi+1(s) ← max_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ Vi(s′) ]
(c) [3 pts] From this point on, you can assume that R(s, a, s′) = R(s) (rewards are a function of the current state) and that R(s) ≥ 0 for all s. With these assumptions, the Bellman Optimality Equation for Q-functions is
Q∗(s, a) = R(s) + Σ_{s′} T(s, a, s′) [ γ max_{a′} Q∗(s′, a′) ]
Let M (s, a) be the greatest lower bound on the utility of state s when taking action a (M is to L as Q is to
V ). (In words, if an agent plays optimally after taking action a from state s, this is the utility the agent is
guaranteed to achieve.) Formally define M ∗ (s, a), in a recursive form similar to how Q∗ is defined.
(d) [2 pts] Recall that the Q-learning update for maximizing expected utility is:
Q(s, a) ← (1 − α) Q(s, a) + α [ R(s) + γ max_{a′} Q(s′, a′) ],
where α is the learning rate and (s, a, s′, R(s)) is the sample that was just experienced (“we were in state s, we took action a, we ended up in state s′, and we received a reward R(s)”). Circle the update equation below that
results in M (s, a) = M ∗ (s, a) when run sufficiently long under a policy that visits all state-action pairs infinitely
often. If more than one of the update equations below achieves this, select the one that would converge more
quickly. Note that in this problem, we do not know T or C when starting to learn.
(iv) M(s, a) ← (1 − α) M(s, a) + α min{ M(s, a), R(s) + γ max_{a′} M(s′, a′) }
(e) [1 pt] Suppose our agent selected actions to maximize L∗ (s), and γ = 1. What non-MDP-related technique
from this class would that resemble? (a one word answer will suffice)
(f ) [1 pt] Suppose our agent selected actions to maximize L3 (s) (our estimate of L∗ (s) after 3 iterations of our
“value-iteration”-like backup in section b) and γ = 1. What non-MDP-related technique from this class would
that resemble? (a brief answer will suffice)
Q5. [19 pts] Tree-Augmented Naive Bayes
In section, we twice have tried to help Pacbaby distinguish his father, Pacman, from ghosts. Now Pacbaby has been transported back in time to the 1970s! Pacbaby has noticed that in the 1970s, nearly everyone who wears sunglasses also has a moustache, whether the person in question is Pacman, a ghost, or even a young Ms. Pacman. So Pacbaby decides that it's time for an upgrade from his Naive Bayes brain: he's getting a tree-augmented Naive Bayes brain so that the features he observes don't have to be independent.
[The Tanb model is shown as a diagram over the class variable Y and the feature variables X1, . . . , X6; its structure is not reproduced in this text excerpt.]
(a) [1 pt] Suppose we observe no variables as evidence in the Tanb above. What is the classification rule for the
Tanb? Write the formula in terms of the CPTs (Conditional Probability Tables) and prior probabilities in the
Tanb.
(b) [2 pts] Assume we observe all the variables X1 = x1 , X2 = x2 , . . . , X6 = x6 in the Tanb above. What is the
classification rule for the Tanb? Write the formula in terms of the CPTs and prior probabilities in the Tanb.
(c) [3 pts] Specify an elimination order that is efficient for the query P(Y | X5 = x5 ) in the Tanb above (including
Y in your ordering). How many variables are in the biggest factor (there may be more than one; if so, list only
one of the largest) induced by variable elimination with your ordering? Which variables are they?
(d) [3 pts] Specify an elimination order that is efficient for the query P (X3 | X5 = x5 ) in the Tanb above (including
X3 in your ordering). How many variables are in the biggest factor (there may be more than one; if so, list
only one of the largest) induced by variable elimination with your ordering? Which variables are they?
(e) [2 pts] Does it make sense to run Gibbs sampling to do inference in a Tanb? In two or fewer sentences, justify
your answer.
(f ) [2 pts] Suppose we are given a dataset of observations of Y and all the variables X1 , . . . , X6 in the Tanb above.
Let C denote the total count of observations, C(Y = y) denotes the number of observations of the event Y = y,
C(Y = y, Xi = xi ) denotes the count of the times the event Y = y, Xi = xi occurred, and so on. Using the C
notation, write the maximum likelihood estimates for all CPTs involving the variable X4 .
(g) [2 pts] In the notation of the question above, write the Laplace smoothed estimates for all CPTs involving the
variable X4 (for amount of smoothing k).
[Two models over the nodes Y, M, S are shown: (Nb), a Naive Bayes network, and (Tanb), a tree-augmented Naive Bayes network with an additional edge between the feature nodes M and S.]
(h) [2 pts] Consider the two graphs on the nodes Y (Pacbaby sees Pacman or not), M (Pacbaby sees a moustache),
and S (Pacbaby sees sunglasses) above. Pacbaby observes Y = 1 and Y = −1 (Pacman or not Pacman) 50%
of the time. Given Y = 1 (Pacman), Pacbaby observes M = +m (moustache) 50% of the time and S = +s
(sunglasses on) 50% of the time. When Pacbaby observes Y = −1, the frequencies of observations are identical
(i.e. 50% M = ±m and 50% S = ±s). In addition, Pacbaby notices that when Y = +1, anyone with a
moustache also wears sunglasses, and anyone without a moustache does not wear sunglasses. If Y = −1, the
presence or absence of a moustache has no influence on sunglasses. Based on this information, fill in the CPTs
below (you can assume that Pacbaby has the true probabilities of the world).
(i) [2 pts] Pacbaby sees a character with a moustache and wearing a pair of sunglasses. What prediction does the
Naive Bayes model Nb make? What probability does the Nb model assign its prediction? What prediction
does the Tanb model make? What probability does the Tanb-brained Pacbaby assign this prediction? Which
(if any) of the predictions assigns the correct posterior probabilities?
Q6. [10 pts] Finding Working Kernels
[Three scatter plots, (A), (B), and (C), are shown here, each plotting a two-dimensional dataset over axes x1 and x2; the point configurations are not reproduced in this text excerpt.]
The above pictures represent three distinct two-dimensional datasets with positive examples labeled as o's and negative examples labeled as x's. Consider the following three kernel functions (where x = (x1, x2)):
(a) [6 pts] For each dataset (A, B, C) circle all kernels that make the dataset separable (assume σ = .01 for the
RBF kernel):
Dataset (A): (i) (ii) (iii)
Dataset (B): (i) (ii) (iii)
Dataset (C): (i) (ii) (iii)
For parts (b) and (c), assume you train the perceptron using RBF (Gaussian) kernels: K(x, z) = exp(−‖x − z‖² / (2σ²)).
You run the perceptron algorithm on dataset (C) until you either encounter no more errors on the training data or you have encountered an error 1 million times and performed the associated update each time, whichever comes first.
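For concreteness (not part of the exam), the RBF kernel above is a one-liner; with σ = 0.01 as in part (a), K(x, z) is essentially zero unless x and z nearly coincide.

```python
import math

def rbf_kernel(x, z, sigma=0.01):
    """Gaussian (RBF) kernel: K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```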
[Figure 1: four candidate plots, (a)–(d), of error rate (vertical axis, increasing upward) versus σ (horizontal axis, log scale from 10^-3 to 10^0); the curve shapes are not reproduced in this text excerpt.]
(b) [2 pts] Which of the plots (a), (b), (c), or (d) in Fig. 1 is most likely to reflect the training set error rate of the
learned classifier as a function of σ?
(c) [2 pts] Which of the plots (a), (b), (c), or (d) in Fig. 1 is most likely to reflect the hold-out error rate as a
function of σ? Recall that “hold-out error-rate” is the error rate obtained by evaluating the classifier that was
learned on training data on held-out (unused) data.
Q7. [10 pts] Learning a Ranking for Twoogle Hiring
You were just hired by Twoogle. Twoogle is expanding rapidly, and you decide to use your machine learning skills to
assist them in their attempts to hire the best. To do so, you have the following available to you for each candidate
i in the pool of candidates I: (i) Their GPA, (ii) Whether they took CS164 with Hilfinger and achieved an A, (iii)
Whether they took CS188 and achieved an A, (iv) Whether they have a job offer from GBook, (v) Whether they
have a job offer from FacedIn, (vi) The number of misspelled words on their resume. You decide to represent each
candidate i ∈ I by a corresponding 6-dimensional feature vector f (x(i) ). You believe that if you just knew the right
weight vector w ∈ R6 you could reliably predict the quality of a candidate i by computing w · f (x(i) ). To determine
w your boss lets you sample pairs of candidates from the pool. For a pair of candidates (k, l) you can have them face
off in a “twoogle-fight.” The result is score(k ≻ l), which tells you that candidate k is at least score(k ≻ l) better
than candidate l. Note that the score will be negative when l is a better candidate than k. Assume you collected
scores for a set of pairs of candidates P.
(a) [8 pts] Describe how you could use a perceptron-like algorithm to learn the weight vector w. Make sure to
describe (i) Pseudo-code for the entire algorithm, (ii) In detail how the weight updates would be done.
(b) [2 pts] You notice that your perceptron-like algorithm is unable to reach zero errors on your training data. You
ask your boss if you could get access to more information about the candidates, but you are not getting it. Is
there anything else you could do to potentially improve performance on your training data?
Consider the problem of finding the most likely explanation in a general Bayes’ net. The input is a network G in
which some variables Xe1 . . . Xek are observed, and the output is an assignment to all the variables X1 . . . Xn ,
consistent with the observations, which has maximum probability. You will formulate this problem as a state
space search problem. Assume that the network is constructed such that for any variable Xi , its parents
Parents(Xi ) are variables Xj for j < i.
States: each partial assignment to a prefix of the variables, of the form {X1 = x1 , X2 = x2 , . . . Xk = xk }
Initial state: the empty assignment {}
Successor function: ??
Goal test: the assignment is complete (i.e. assigns all variables)
Step cost: ??
(a) (3 pts) Give an expression for the size of the state space if each variable Xi has Di elements in its domain.
(b) (3 pts) What is the successor function for this search problem?
(c) (4 pts) What is the cost function for this search problem? Hint: Recall that log ab = log a + log b and
that search minimizes total cost.
(d) (4 pts) Give two reasons why BFS would be a poor choice for solving this problem.
(e) (6 pts) Give a non-trivial admissible heuristic for this problem. Your heuristic should be efficient to
compute. Justify the admissibility of your heuristic briefly.
(f) (4 pts) Briefly describe how we might use local search to solve this problem.
(a) (4 pts) Label each node in the following search tree with its value pair. 1-nodes represent player 1’s move,
while 2-nodes represent player 2’s move.
(b) (4 pts) Describe formally how to compute the value pair V (s) = (V1 (s), V2 (s)) of a node at state s under
the control of player 1 (the analog of a max node).
(c) (4 pts) Is it possible to prune the search in a manner similar to α-β pruning? Either describe a pruning
algorithm, or describe why such pruning is not possible.
(d) (4 pts) Would the knowledge that the game is nearly zero-sum, such as knowing that |V1 (s) + V2 (s)| ≤ k
for all terminal states s allow you to improve your pruning algorithm or enable pruning (depending on
your answer to (c))? Describe why or why not. Do not write more than a few sentences at most!
(a) (4 pts) Fill in the optimal values for grid (A) (hint: this should require very little calculation).
(b) (3 pts) Specify the optimal policy for grid (B) by placing an arrow in each empty square.
(d) (2 pts) Can we represent the actual optimal values V ∗ for grid (A) using these two features? Why or
why not?
(e) (4 pts) For each of the feature sets listed below, state which (if any) of the grid MDPs above can be
’solved’, in the sense that we can express some (possibly non-optimal) values which produce optimal
one-step look-ahead policies.
i. f1 (x, y) = x and f2 (x, y) = y.
ii. For each (i, j), a feature fi,j (x, y) = 1 if (x, y) = (i, j), 0 otherwise.
(a) (3 pts) Draw a Bayes’ net for a one-dimensional 4x1 Minesweeper grid, showing all eight variables
(B1 . . . B4 and N1 . . . N4 ). Show the minimal set of arcs needed to correctly model the domain above.
(b) (3 pts) Fully specify the CPTs for B1 and N1 , assuming that there is no noise in the readings (i.e. that
the number of adjacent bombs (or bomb) is reported exactly, deterministically). Your answers may use
the bomb rate b if needed.
(c) (3 pts) What are the posterior probabilities of bombs in each of the four squares, given no information?
(d) (4 pts) If we observe N2 = 1, what are the posterior probabilities of bombs in each square?
(e) (4 pts) On the following two-dimensional grid, assume we know the value of NA , NB , NC , and ND , and
we are about to observe NE . Shade in the squares whose posterior bomb probabilities can change as a
result of this new observation.
(f) (3 pts) On a 2x1 grid, imagine that you must take an action by declaring which squares have bombs and
which do not, so there are four possible actions on the 2x1 grid (again, note that there is no fixed number
of bombs, unlike in your project or in classic Minesweeper). The utility of correctly declaring a bomb is
+1, the utility of correctly declaring a clear square is +1, the utility of overlooking a bomb is −10 and
the utility of declaring a bomb where there is none is −1. If the initial probability of a bomb is 0.5, what
is the MEU action, and what is its EU?
(g) (6 pts) On the same 2x1 grid, what is the value of information about N1 ?
(h) (4 pts) How would you modify the network in (a) if you knew that there were exactly two bombs? Draw
a new network below and briefly describe/justify any new nodes you introduce.
P(X1):  well 1,  allergy 0,  cold 0

Transitions:
P(Xt | Xt−1 = well):     well 0.7,  allergy 0.2,  cold 0.1
P(Xt | Xt−1 = allergy):  well 0.6,  allergy 0.3,  cold 0.1
P(Xt | Xt−1 = cold):     well 0.2,  allergy 0.2,  cold 0.6

Emissions:
P(Et | Xt = well):     quiet 1.0,  sneeze 0.0
P(Et | Xt = allergy):  quiet 0.0,  sneeze 1.0
P(Et | Xt = cold):     quiet 0.0,  sneeze 1.0
Note that colds are “stickier” in that you tend to have them for multiple days, while allergies come and go on
a quicker time scale. However, allergies are more frequent. Assume that on the first day, you are well.
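For readers who want to check their work numerically, the tables above transcribe directly into the dictionary format used by the forward-algorithm sketch near the start of this packet (the transcription below is ours, not part of the exam). Note that P(X1) is already the day-1 prior, so the first observation only needs the observe step, not the elapse-time step.

```python
prior = {"well": 1.0, "allergy": 0.0, "cold": 0.0}           # P(X1)
transition = {                                                # P(X_t | X_{t-1})
    "well":    {"well": 0.7, "allergy": 0.2, "cold": 0.1},
    "allergy": {"well": 0.6, "allergy": 0.3, "cold": 0.1},
    "cold":    {"well": 0.2, "allergy": 0.2, "cold": 0.6},
}
emission = {                                                  # P(E_t | X_t)
    "well":    {"quiet": 1.0, "sneeze": 0.0},
    "allergy": {"quiet": 0.0, "sneeze": 1.0},
    "cold":    {"quiet": 0.0, "sneeze": 1.0},
}
```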
(a) (2 pts) Imagine you observe the sequence quiet, sneeze, sneeze. What is the probability that you were
well all three days and observed these effects?
(b) (4 pts) What is the posterior distribution over your state on day 2 (X2 ) if E1 = quiet, E2 = sneeze?
(c) (4 pts) What is the posterior distribution over your state on day 3 (X3 ) if E1 = quiet, E2 = sneeze,
E3 = sneeze?
(d) (4 pts) What is the Viterbi (most likely) sequence for the observation sequence quiet, sneeze, sneeze,
sneeze, quiet, quiet, sneeze, quiet, quiet? Hint: you should not have to do extensive calculations.
Imagine you are monitoring your state using the particle filtering algorithm, and on a given day you have 5
particles on well, 2 on cold, and 3 on allergy before making an observation on that day.
(e) (4 pts) If you observe sneeze, what weight will each of your particles have?
(f) (4 pts) After resampling, what is the expected number of particles you will have on cold ?
Consider an agent in a maze-like grid, as shown to the right. Initially, the agent
might be in any location x (including the exit), but it does not know where it is.
The agent can move in any direction (N, S, E, W ). Moving into a wall is legal,
but does not change the agent’s position. For now, assume that all actions are
deterministic. The agent is trying to reach a designated exit location e where
it can be rescued. However, while the agent knows the layout of the maze, it
has no sensors and cannot tell where it is, or even what walls are nearby.
The agent must devise a plan which, on completion, guarantees that the agent will be in the exit location, regardless of the (unknown) starting location. For example, here, the agent might execute [W,N,N,E,E,N,N,E,E,E],
after which it will be at e regardless of start position. You may find it useful to refer to pre(x, a), the either
empty or singleton set of squares which lead to x on a successful action a, and/or post(x, a), the square resulting
from x on a successful action a.
(a) (4 points) Formally state this problem as a single agent state-space search problem. You should formulate
your problem so that your state space is finite (e.g. do not use an encoding where each partial plan is a state).
States:
Start state:
Successor function:
Goal test:
Imagine the agent’s movement actions may fail, causing it to stay in place with probability f . In this case, the
agent can never be sure where it is, no matter what actions it takes. However, after any sequence of actions
a = a1 . . . ak , we can calculate a belief state P (x|a) over the locations x in the grid.
(c) (3 points) Give the expression from an incremental algorithm for calculating P (x|a1 . . . ak ) in terms of
P (x|a1 . . . ak−1 ). Be precise (e.g., refer to x, f , and so on).
P (x|a1 . . . ak ) =
Imagine the agent has a new action a = Z which signals for pick-up at the exit. The agent can only use this
action once, at which point the game ends. The utility for using Z if the agent is actually at the exit location
is +100, but -1000 elsewhere.
(d) (3 points) If the agent has already executed movement actions a1 . . . ak , give an expression for the utility
of then executing Z in terms of the quantity computed in (c).
U (Ak+1 = Z|a1 . . . ak ) =
Imagine that the agent receives a reward of -1 for each movement action taken, and wishes to find a plan which
maximizes its expected utility. Assume there is no discounting. Note that despite the underlying uncertainty,
this problem can be viewed as a deterministic state space search over the space of plans. Unlike your answer
in (a), this formulation does not guarantee a finite search space.
(e) (3 points) Complete the statement of this version of the problem as a single agent state-space search
problem. Remember that state space search minimizes cost and costs should be non-negative!
States: partial plans, which are strings of the form {N, S, E, W }∗ possibly followed by Z
Successor function: append N, S, E, W, or Z if current plan does not end in Z, no successors otherwise
Goal test:
Step cost:
You are asked to determine the layout of a new, small college. The campus will have four structures: an administration building (A), a bus stop (B), a classroom (C), and a dormitory (D). Each building must be placed somewhere on the grid below. The following constraints must be satisfied:
(i) The bus stop (B) must be adjacent to the road.
(ii) The administration building (A) and the classroom (C) must both be adjacent to the bus stop (B).
(iii) The classroom (C) must be adjacent to the dormitory (D).
(iv) The administration building (A) must not be adjacent to the dormitory (D).
(v) The administration building (A) must not be on a hill.
(vi) The dormitory (D) must be on a hill or near the road.
(vii) All buildings must be in different grid squares.
Here, “adjacent” means that the buildings must share a grid edge, not just a corner.
(a) (3 points) Express the above non-unary constraints as implicit binary constraints over the variables A, B, C, and D. Precise but evocative notation such as different(X,Y) is acceptable.
(b) (3 points) Cross out eliminated values to show the domains of all variables after unary constraints and
arc consistency have been applied (but no variables have been assigned).
A [ 1 2 3 4 5 6 ]
B [ 1 2 3 4 5 6 ]
C [ 1 2 3 4 5 6 ]
D [ 1 2 3 4 5 6 ]
(c) (3 points) Cross out eliminated values to show the domains of the variables after B = 3 has been assigned
and arc consistency has been rerun.
A [ 1 2 3 4 5 6 ]
B [ 3 ]
C [ 1 2 3 4 5 6 ]
D [ 1 2 3 4 5 6 ]
(d) (2 points) Give a solution for this CSP or state that none exist.
Consider the zero-sum minimax game tree shown below. The triangles pointing up, such as the root, correspond
to the MAX player, while the triangles pointing down correspond to the MIN player. Leaves represent utilities
for the MAX player.
[The game tree is shown as a figure; its leaf utilities include 10, 1, 0, 2, 5, 6, −3, and −5, and A labels one of the internal MAX nodes referenced in parts (c) and (d). The full tree structure is not reproduced in this text excerpt.]
(b) (3 pt) Draw an X through all of the nodes that would be pruned (i.e. not explored at all) by α-β pruning.
Assume a left-to-right order of evaluation of children.
(c) (1 pt) What values of α and β will be passed into the recursive call to maxValue for node A from the
call to minValue on the parent of A?
(d) (1 pt) What will the final values of α and β be inside the recursive call to maxValue for node A just
before it returns?
Suppose you are playing a deterministic game against an opponent. You have been covertly surveilling your
opponent and have learned that he is a reflex agent.
(e) (1 pt) Suppose you also have determined the policy π = π0 that he is using. What search procedure can
you use to play optimally? State briefly but precisely how you would apply that procedure here.
(f ) (2 pt) Your opponent figured out that you know π0 . As a countermeasure, he has switched to randomly
picking the policy π from three alternatives, π1 , π2 , and π3 , each with equal probability, at the beginning
of each turn. You have been able to determine what the policies are, but naturally do not know which one
will be chosen each turn. What search procedure can you use to play optimally? State briefly but precisely
how you would apply that procedure here.
(g) (3 pt) Suppose your opponent switches to randomly picking the policy π at the beginning of the game,
rather than at the beginning of each turn (but still chooses randomly among π1 , π2 , and π3 , each with
equal probability). The opponent does not switch thereafter. What search procedure can you use to play
optimally, and over which state space is this procedure searching? State briefly but precisely how you
would apply that procedure here.
States: S = {s0 , . . . , sn }
Initial state: s0
Actions: A
Successor function: Succ(s, a) = s′
Cost function: Cost(s, a) = c > 0
Goal test: sn is the only goal state
(a) (3 pt) Reformulate this problem as an MDP, described in terms of the original search problem elements.
States:
Actions:
Transition function:
Reward function:
Discount:
(b) (3 pt) Imagine that an agent has run value iteration for k rounds, computing Vk for all states, before
deciding to switch tactics and do graph search in the original search problem. The agent wants to take
advantage of the work it has already done, and so it uses Vk to construct an A∗ heuristic, h, setting
h(s) = −Vk (s). Which of the following are true? Briefly justify your answers.
(i) ∀s ∈ S, h(s) ≥ 0
(ii) h is admissible
(c) (4 pt) You are an agent who has just been dropped in some random state si ∈ S. You need to find a
plan that maximizes the appropriately discounted sum of rewards from now until the end of the episode.
Formulate this as a search problem.
States:
Initial state:
Actions:
Successor function:
Cost function:
Goal test:
(d) (3 pt) You’re a fairly well prepared agent, who’s done some Q-learning before being dropped in the
MDP. You want to use this information, so you define a heuristic, h(s) = − (maxa Q(s, a)). Which of the
following are true? Briefly justify your answers.
(i) ∀s ∈ S, h(s) ≥ 0
(ii) h is admissible
Now, imagine trying to solve a generic, not necessarily deterministic, MDP with rewards R and transitions T .
Your goal is to find an optimal policy π, which gives the best action from any state. You decide to use a CSP
to solve this generic MDP, using a formulation in which each state’s action under the policy is represented by
a variable.
(e) (4 pt) Complete the definition of the CSP below. You may add additional variables beyond those given.
Your domains are not limited to binary (or even discrete) values, your constraints are not limited to unary
or binary ones, and implicit definitions are allowed. However, make sure that any variables or constraints
you add are stated precisely.
Constraints:
(f ) (2 pt) Why would solving this CSP with depth-first search be substantially less efficient than using value
iteration or policy iteration?
(a) (2 pt) You are given the following Bayes Net, and you need to compute P (A| + e, +f ). You decide to use
variable elimination. What factors will you start out with, taking the evidence into account?
[The Bayes' net is shown as a figure over the variables A, B, C, D, E, F; its arc structure is not reproduced in this text excerpt.]
Starting factors:
(b) (2 pt) You start out by eliminating C. What factors will you need to join and what factor will be created
after performing the join, but before summing out C?
Factors to join:
Resulting factor:
(c) (2 pt) For any variable, X, let |X| denote the size of X’s domain (the number of values it can take). How
large is the table for the factor from part (b), again before summing out C ?
Now, instead of a specific Bayes Net, let’s consider variable elimination on arbitrary Bayes Nets. At any stage
in the variable elimination process, there will be a set of factors F = {Fi | i = 1, . . . , n}. Each factor, Fi , can be
written as P (Li |Ri ), where Li and Ri are both sets of variables (for simplicity, assume that there is no evidence).
For any variable X, we define I(X) to be the set of indices of factors that include X: I(X) = {i | X ∈ (Li ∪ Ri )}.
When eliminating a specific variable, Y, we start by joining the appropriate factors from F, and creating a
single joined factor, called join(Y, F). Note that join(Y, F) is created before performing the elimination step,
so it still includes Y.
(d) (2 pt) For a given factor, F , we use size(F ) to denote the size of the table needed to represent F . Give
a precise general expression for size(join(Y, F)), using the notation above.
size(join(Y, F)) =
The variable ordering used can make a large difference in the amount of time and memory needed to run
variable elimination. Because of this, it would be useful to figure out the optimal ordering before you actually
do any eliminations, making inference as efficient as possible. Consider a generic Bayes Net with variables
X ∈ V , that encodes a set of initial factors F0 .
(e) (5 pt) For a query variable Q, consider computing the marginal P (Q). Formulate a search problem that
determines the optimal variable elimination ordering, where cost is measured by the sum of the sizes of
the tables created during elimination. Note that for this problem we are only concerned with the sum of
the sizes of the tables created by the join step, not the smaller tables that are subsequently created by the
eliminate step. A correct answer will explicitly define the initial state in terms of the initial Bayes Net,
and will reference the quantity size(F ) defined above. You may also find the following notation helpful:
for a variable X, and factor F , eliminate(X, F ) denotes the new factor created by summing over values of
X to eliminate X from F .
States: Pairs (W, F), where W is the set of variables remaining to be eliminated and F is the set
of all remaining factors.
Initial state:
Actions:
Successor function:
Cost function:
Goal test:
(f ) (2 pt) Let parents(X) and children(X) denote the sets containing all of X’s parents/children in the Bayes
Net. Which of the following are admissible heuristics for the search problem you defined in part (e)?
Circle all that apply.
(i) |W|
(ii) |F|
(iii) Σ_{X∈W} |X|
(iv) Π_{X∈W} |X|
(v) Σ_{X∈W} |X| · Π_{Y∈parents(X)} |Y|
(vi) Σ_{X∈W} |X| · Π_{Y∈children(X)} |Y|
(g) (2 pt) Consider an alternative search problem, where the goal is to find the ordering that minimizes the
maximum table size instead of the sum over all tables created. What should the new state space be?
Consider the training data below. X1 and X2 are binary-valued features and Y is the label you’d like to classify.
Y X1 X2
1 0 0
0 1 0
1 1 1
1 0 0
0 0 1
0 1 0
(a) (2 pt) Assuming a Naive Bayes model, fill in the quantities learned from the training data in the tables
below (no smoothing).
(b) (2 pt) Fill in the learned quantities below as in (a), but with add-k smoothing, with k = 1.
Y X1 X2
1 0 0
0 1 0
1 1 1
1 0 0
0 0 1
0 1 0
(e) (4 pt) Circle the feature sets that would enable a perceptron to classify the training data perfectly.
i. {X1 }
ii. {X2 }
iii. {X1 , X2 }
iv. {1, X1 , X2 }
v. {1, abs(X1 − X2 )}
vi. {1, X1 , X2 , X1 + X2 }
vii. {1, X1 , X2 , max(X1 , X2 )}
viii. {X1 , X2 , X1 = X2 }
ix. {1, X1 , (X1 X2 )}
(f ) (2 pt) Circle true or false for each statement about a perceptron classifier in general. Assume weight
vectors are initialized to 0s.
(i) (true or false) Can produce non-integer-valued weight vectors from integer-valued features
(ii) (true or false) Estimates a probability distribution over the training data
(iii) (true or false) Assumes features are conditionally independent given the class
(g) (2 pt) Circle true or false for each statement about a MIRA classifier in general. Assume weight vectors
are initialized to 0s.
(ii) (true or false) Typically improves on the perceptron by allowing non-linearity in the decision boundary
(iii) (true or false) Often improves on the perceptron by finding a decision boundary that generalizes
better to test data
(iv) (true or false) Typically tunes the C parameter on the training set
Circle all of the following statements that are true, if any. Ignore ties in all cases.
i. Breadth-first search is a special case of depth-first search. (There is a way to get depth-first search
to generate the same search order as breadth-first search).
ii. Depth-first search is a special case of uniform-cost search. (There is a way to get uniform-cost search
to generate the same search order as depth-first search).
iii. Uniform-cost search is a special case of A* search. (There is a way to get A* search to generate the
same search order as uniform-cost search).
iv. A* search can perform breadth-first search under some class of admissible heuristics and cost functions.
v. A* search can perform depth-first search under some class of admissible heuristics and cost functions.
For each of the heuristics, circle the single choice that best describes what they’re doing.
A. Minimum Remaining Values (MRV)
ii. Focuses on the easy parts of the graph in order to postpone failure.
ii. Focuses on the easy parts of the graph in order to postpone failure.
Say we have player MAX and player MIN playing a game with a finite number of possible moves. MAX
calculates the minimax value of the root to be M . You may assume that each player has at least 2
possible actions at every turn. Also, you may assume for all parts that a different sequence of moves will
always lead to a different score (no two sequences yield the same score). Circle all of the following
statements that are true, if any.
i. Assume MIN is playing suboptimally, and MAX does not know this. The outcome of the game can
be better than M (i.e. higher for MAX).
ii. Assume MAX knows player MIN is playing randomly. There exists a policy for MAX such that MAX
can guarantee a better outcome than M .
iii. Assume MAX knows MIN is playing suboptimally at all times and knows the policy πMIN that MIN
is using (MAX knows exactly how MIN will play). There exists a policy for MAX such that MAX
can guarantee a better outcome than M .
iv. Assume MAX knows MIN is playing suboptimally at all times but does not know the policy πMIN
that MIN is using (MAX knows MIN will choose a suboptimal action at each turn, but does not know
which suboptimal action). There exists a policy for MAX such that MAX can guarantee a better
outcome than M .
i. If one is using value iteration and the policy (the greedy policy with respect to the values) has converged, the values must have converged as well.
ii. If one is using value iteration and the values have converged, the policy must have converged as well.
iii. Expectimax will generally run in the same amount of time as value iteration on a given MDP.
We are given an MDP (S, A, T, γ, R), and a policy π (not necessarily the optimal policy). For each of the
following Bellman-like update equations, circle the single correct choice to match the equations with
the quantity being computed (V π , Qπ , V ∗ , Q∗ , π ∗ , or none of these).
i. g1(s) = max_{a∈A} Σ_{s′∈S} T(s, a, s′) [ R(s, a, s′) + γ max_{a′∈A} Q∗(s′, a′) ]
iii. g3(s, a) = Σ_{s′∈S} T(s, a, s′) [ R(s, a, s′) + γ g3(s′, π(s′)) ]
iv. g4(s) = Σ_{s′∈S} T(s, π(s), s′) [ R(s, π(s), s′) + γ g4(s′) ]
v. g5(s) = max_{a∈A} Σ_{s′∈S} T(s, a, s′) [ R(s, a, s′) + γ max_{s′′∈S} g5(s′′) ]
Circle all of the following equalities that are always true, if any.
i. P(A, B) = P(A)P(B)
v. P(A, B) = Σ_{c∈C} P(A|B, C = c) P(B|C = c) P(C = c)
For the Bayes’ Net shown below, you start with the factors P(I), P(J), P(L), P(K|I, L, J), and P(H|L, J).
[The Bayes' net is shown as a figure over the nodes I, L, J, K, and H, consistent with the factors listed above.]
What are the factors after joining on and eliminating J?
ii. It is possible to use particle filtering when the state space is discrete.
iii. As the number of particles goes to infinity, particle filtering will represent the same probability dis-
tribution that you’d get by using exact inference.
iv. Particle filtering can represent a flat distribution (i.e. uniform) with fewer particles than it would need
for a more concentrated distribution (i.e. Gaussian).
Suppose you have a classification problem with classes Y = X, O and features F1 , F2 . You decide to use
the perceptron algorithm to classify the data. Suppose you run the algorithm for each of the data sets
shown below, stopping either after convergence or 1000 iterations (you may assume that if the algorithm
will converge, it will within 1000 iterations). Circle all of the examples, if any, where the decision boundary
could possibly be created by the perceptron algorithm.
[Three candidate plots are shown, each with points labeled o and x over axes F1 and F2 and a drawn decision boundary; the configurations are not reproduced in this text excerpt.]
You are defusing a bomb constructed by the evil Dr. Xor. You know the shutdown sequence is three bits
B1 , B2 , B3 , Bi ∈ {0, 1} and you know that an odd number of the Bi are 1. Otherwise, all sequences are equally
likely. You must pick a sequence, at which point the bomb either deactivates, or not. You get a utility of 100
if you guess the correct sequence, and 0 otherwise. (Hint: you should not need to do much calculation.)
(a) (1 pt) Draw a minimal (fewest arcs) Bayes’ Net that can represent the joint distribution over these
variables. You only need to include B1 , B2 , and B3 (not the utility).
At the last second, you discover that the bomb was actually set by Xor’s uncreative henchman, Repeato.
Repeato always uses all 1’s or all 0’s in his code (and is not restricted to an odd number of 1’s).
(f ) (1 pt) Draw a minimal (fewest arcs) Bayes’ Net that can represent the joint distribution over these
variables. You only need to include B1 , B2 , and B3 (not the utility).
In micro-blackjack, you repeatedly draw a card (with replacement) that is equally likely to be a 2, 3, or 4. You
can either Draw or Stop if the total score of the cards you have drawn is less than 6. Otherwise, you must
Stop. When you Stop, your utility is equal to your total score (up to 5), or zero if you get a total of 6 or higher.
When you Draw, you receive no utility. There is no discount (γ = 1).
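One way to make the rules concrete (this formulation is ours, not given on the exam) is to enumerate the transition triples; totals 0 through 5 are ordinary states, and 6 stands for "busted" (a total of 6 or higher).

```python
CARDS = [2, 3, 4]  # each card is drawn with probability 1/3

def successors(score, action):
    """Return (probability, next_state, reward) triples for micro-blackjack.

    'Stop' ends the game with reward equal to the total if it is 5 or less, else 0.
    'Draw' is only legal while the total is below 6 and gives no immediate reward.
    """
    if action == "Stop":
        return [(1.0, "done", score if score < 6 else 0)]
    assert score < 6, "must Stop once the total reaches 6 or higher"
    return [(1.0 / 3.0, min(score + card, 6), 0) for card in CARDS]
```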
(d) (2 pt) What is the smallest number of rounds (k) of value iteration for which this MDP will have its
exact values (if value iteration will never converge exactly, state so).
(e) (3 pt) Imagine that you run Q-learning instead of calculating values offline. You play many games, and
frequently choose all actions from states that you visit. However, due to bizarre luck, each card is a 2.
What will the final q-values approach in the limit if they are initialized to zero and you use a learning
rate of 1/2?
Consider tree search (i.e. no closed set) on an arbitrary search problem with max branching factor b. Each
search node n has a backward (cumulative) cost of g(n), an admissible heuristic of h(n), and a depth of d(n).
Let c be a minimum-cost goal node, and let s be a shallowest goal node.
For each of the following, you will give an expression that characterizes the set of nodes that are expanded
before the search terminates. For instance, if we asked for the set of nodes with positive heuristic value, you
could say h(n) ≥ 0. Don’t worry about ties (so you won’t need to worry about > versus ≥). If there are no
nodes for which the expression is true, you must write “none.”
(a) (1 pt) Give an expression (i.e. an inequality in terms of the above quantities) for which nodes n will be
expanded in a breadth-first search.
(b) (1 pt) Give an expression for which nodes n will be expanded in a uniform cost search.
(c) (1 pt) Give an expression for which nodes n will be expanded in an A∗ search with heuristic h(n).
(d) (2 pt) Let h1 and h2 be two admissible heuristics such that ∀n, h1 (n) ≥ h2 (n). Give an expression for
the nodes which will be expanded in an A∗ search using h1 but not when using h2 .
(e) (2 pt) Give an expression for the nodes which will be expanded in an A∗ search using h2 but not when
using h1 .
[A constraint graph over the variables A, B, and C is shown here; it is not reproduced in this text excerpt.]
(a) (2 pt) A = B, B = C, C = A
(i) The CSP has no solutions, and enforcing arc consistency will expose it.
(ii) The CSP has no solutions, but enforcing arc consistency will not expose it.
(iii) The CSP has exactly one solution, and arc consistency will narrow domains to this solution.
(iv) The CSP has exactly one solution, but arc consistency will not narrow domains to this solution.
(v) The CSP has multiple solutions, and arc consistency will rule out all but one.
(vi) The CSP has multiple solutions, and arc consistency will leave domains so that all are possible.
(b) (2 pt) A 6= B, B 6= C, C 6= A
(i) The CSP has no solutions, and enforcing arc consistency will expose it.
(ii) The CSP has no solutions, but enforcing arc consistency will not expose it.
(iii) The CSP has exactly one solution, and arc consistency will narrow domains to this solution.
(iv) The CSP has exactly one solution, but arc consistency will not narrow domains to this solution.
(v) The CSP has multiple solutions, and arc consistency will rule out all but one.
(vi) The CSP has multiple solutions, and arc consistency will leave domains so that all are possible.
(i) The CSP has no solutions, and enforcing arc consistency will expose it.
(ii) The CSP has no solutions, but enforcing arc consistency will not expose it.
(iii) The CSP has exactly one solution, and arc consistency will narrow domains to this solution.
(iv) The CSP has exactly one solution, but arc consistency will not narrow domains to this solution.
(v) The CSP has multiple solutions, and arc consistency will rule out all but one.
(vi) The CSP has multiple solutions, and arc consistency will leave domains so that all are possible.
Consider the following generic search problem formulation with finitely many states:
(a) (2 pts) Suppose an optimal solution has cost n. If the goal is reachable, what is the upper bound on n?
(b) (2 pts) Suppose we must solve this search problem using BFS, but with limited memory. Specifically,
assume we can only store k states during search. Give a bound on n for which the search will fit in the available
memory.
(c) (2 pts) Would any other search procedure allow problems with substantially deeper solutions to be
solved? Either argue why not, or give a method along with an improved bound on n.
(d) (5 pts) If we knew the exact value of n, we could formulate a CSP whose complete assignment specifies
an optimal solution path (X0 , X1 . . . , Xn ) for this search problem. State binary and unary constraints which
guarantee that a satisfying assignment is a valid solution.
Variables: X0 , X1 , . . . , Xn
Domains: Dom(Xi ) = {ss , sg } ∪ {s1 , . . . , sd } ∀ i ∈ {0, 1, . . . , n}
Constraints:
(e) (3 pts) How can the successor function be used to efficiently enforce the consistency of an arc Xi → Xi−1 ?
(Note: Enforcing the consistency of this arc prunes values from the domain of Xi , not Xi−1 .)
(f ) (2 pts) After reducing the domains of any variables with unary constraints, suppose we then make all
arcs Xi → Xi−1 consistent, processed in order from i = 1 to n. Next, we try to assign variables in reverse
order, from Xn to X0 , using backtracking DFS. Why is this a particularly good variable ordering?
(a) (1 pt) Write out an expression for P (s1 , . . . sn , o1 , . . . on ) in terms of the local transition, emission, and
initial probabilities.
(b) (5 pts) Let (o1 , . . . on ) be a known sequence of observations of length n. Pose the problem of finding
the maximum likelihood sequence as a state space search problem in which the states are prefixes of hidden
sequences. Provide the initial state, successor function, goal test, and step cost. Hint: Recall that log ab =
log a + log b
Initial state:
Successor function:
Goal test:
Step cost:
(d) (4 pts) Provide a non-trivial admissible heuristic for this search problem. Explain why your heuristic is
admissible. Overly loose bounds will not receive full credit.
(e) (2 pts) Describe a qualitative scenario in which A∗ search would be more efficient than the Viterbi
algorithm from class. Make sure your answer relates specifically to the heuristic function you gave in part (d),
i.e. do not simply state that A∗ will be better if the heuristic is very good.
Given labeled training data points (x, y), where the x-axis is depicted horizontally and the y-axis is vertical,
we can compute various features f1 and f2 as inputs to the perceptron above. For each feature set (A-H) below,
list the training sets (if any) which can be separated (classified perfectly). (12 points total)
[The training sets are shown as a grid of scatter plots, each over axes ranging from −5 to 5; the point configurations are not reproduced in this text excerpt.]
(a) (3 pts) Draw a (small) game tree in which the root node has a larger value if expectimax search is used
than if minimax is used, or argue why it is not possible.
(b) (3 pts) Draw a (small) game tree in which the root node has a larger value if minimax search is used
than if expectimax is used, or argue why it is not possible.
(c) (2 pts) Under what assumptions about player 2 should player 1 use minimax search rather than expectimax search to select a move?
(d) (2 pts) Under what assumptions about player 2 should player 1 use expectimax search rather than
minimax search?
(e) (4 pts) Imagine that player 1 wishes to act optimally (rationally), and player 1 knows that player 2 also
intends to act optimally. However, player 1 also knows that player 2 (mistakenly) believes that player 1 is
moving uniformly at random rather than optimally. Explain how player 1 should use this knowledge to select
a move. Your answer should be a precise algorithm involving a game tree search, and should include a sketch
of an appropriate game tree with player 1’s move at the root. Be clear what type of nodes are at each ply and
whose turn each ply represents.
Consider the above MDP, representing a robot on a balance beam. Each grid square is a state and the available
actions are right and left. The agent starts in state s2 , and all states have reward 0 aside from the ends of the
grid s1 and s8 and the ground state, which have the rewards shown. Moving left or right results in a move left
or right (respectively) with probability p. With probability 1 − p, the robot falls off the beam (transitions to ground) and receives a reward of -1. Falling off, or reaching either endpoint, results in the end of the episode (i.e., they are terminal states). Terminal states do have instantaneous rewards, but have zero future rewards.
(a) (3 pts) For what values of p is the optimal action from s2 to move right if the discount γ is 1?
(b) (3 pts) For what values of γ is the optimal action from s2 to move right if p = 1?
(c) (5 pts) Given initial value estimates of zero, show the results of one, then two rounds of value iteration.
You need only write down the non-zero entries.
sground s1 s2 s3 s4 s5 s6 s7 s8
Initial values 0 0 0 0 0 0 0 0 0
One update
Two updates
(d) (4 pts) Given initial q-value estimates of zero, show the result of Q-learning with learning rate α = 0.5
after two episodes: [s2, s3, ground] and [s2, s3, s4, s5, ground], where the agent always moves right. You need
only write down the non-zero entries. For the purposes of Q-learning updates, terminal states should be treated
as having a single action die which leads to future rewards of zero. Hint: q-values of terminal states which
have been visited should not be zero.
(e) (3 pts) We can develop learning updates that involve two actions instead of one. Which of the following are true of the utility U^π(s) of a state s under policy π, given that U^π(s) = R(s) + Σ_{s′} T(s, π(s), s′) γ U^π(s′) ?
(i) U^π(s) = R(s) + Σ_{s′} T(s, π(s), s′) Σ_{s′′} T(s′, π(s), s′′) γ² U^π(s′′)
(ii) U^π(s) = R(s) + Σ_{s′} T(s, π(s), s′) [ γ R(s′) + Σ_{s′′} T(s′, π(s), s′′) γ² U^π(s′′) ]
(iii) U^π(s) = R(s) + Σ_{s′} T(s, π(s), s′) [ γ U^π(s′) + Σ_{s′′} T(s′, π(s), s′′) γ² U^π(s′′) ]
(iv) U^π(s) = Σ_{s′} T(s, π(s), s′) [ γ R(s′) + U^π(s′) + Σ_{s′′} T(s′, π(s), s′′) γ² U^π(s′′) ]
(v) U^π(s) = Σ_{s′} T(s, π(s), s′) [ R(s) + ½ γ U^π(s′) + ½ ( γ R(s′) + Σ_{s′′} T(s′, π(s), s′′) γ² U^π(s′′) ) ]
(f) (2 pts) Write a two-step-look-ahead value iteration update that involves U(s) and U(s′′), where s′′ is the state two time steps later. Why would this update not be used in practice?
(g) (2 pts) Write a two-step-look-ahead TD-learning update that involves U(s) and U(s′′) for the observed state-action-state-action-state sequence s, a, s′, a′, s′′.
(b) Write a Bellman equation expressing Q^π(s, a) in terms of U^π(s′) (and other MDP quantities).
(c) Write an equation which expresses that X and Y are conditionally independent given Z.
(d) Give an example of a search algorithm and a search problem where the algorithm is not complete (you
may simply describe a qualitative property of the search problem, such as “a single goal state” rather
than stating a concrete problem, or you may draw a small search space).
(e) Why might arc consistency require that you process each arc multiple times?
(f) What does the size of a hypothesis class have to do with generalization and overfitting?
(g) In reinforcement learning, why can it be useful to sometimes act in a way which is believed to be suboptimal?
End of Exam