Midterm Exam - Summer 21
Student Name:
Student ID:
Total points:
1. (21 pts) True or False. Please justify your answers in no more than a few
sentences.
a) (3 pts) If X and Y are independent, then E[2XY] = 2E[X]E[Y]. True or False?
Explain. (A simulation check of this identity appears after part (f).)
b) (3 pts) The error of a hypothesis measured over its training set provides a
pessimistically biased estimate of the true error of the hypothesis. True or
False? Explain.
f) (3 pts) The linear regression estimator has the smallest variance among all
unbiased estimators. True or False? Explain.
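The simulation check referenced in part (a): a minimal sketch in which the normal marginals are an arbitrary choice, since only independence matters for the identity being tested.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw independent samples of X and Y (the choice of distribution is arbitrary).
x = rng.normal(loc=1.0, scale=1.0, size=1_000_000)
y = rng.normal(loc=2.0, scale=1.0, size=1_000_000)

# Compare E[2XY] against 2 E[X] E[Y]; for independent X and Y these
# agree up to sampling noise.
print(np.mean(2 * x * y))           # approx 4.0 here
print(2 * np.mean(x) * np.mean(y))  # approx 4.0 here
```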
2. (15 pts) A box contains three cards. One card is red on both sides, one card is
green on both sides, and one card is red on one side and green on the other. We
randomly select one card from the box and observe the color of the selected card's
upper side. If this side is green, what is the probability that the other side of
the card is also green?
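A Monte Carlo sketch for checking the answer numerically; the card encoding below is an illustrative assumption.

```python
import random

# Cards encoded as (side1, side2): red/red, green/green, red/green.
cards = [("R", "R"), ("G", "G"), ("R", "G")]

green_up = 0
both_green = 0
for _ in range(200_000):
    card = random.choice(cards)
    up, down = random.sample(card, 2)  # pick a random orientation
    if up == "G":
        green_up += 1
        if down == "G":
            both_green += 1

# Estimate of P(other side green | upper side green); should approach 2/3.
print(both_green / green_up)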
3. (25 pts) Given a set of i.i.d. samples X1, ..., Xn ~ Uniform(0, θ), find the
maximum likelihood estimator of θ.
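A quick numerical illustration of the likelihood this question asks about; the sample values below are made up.

```python
import numpy as np

x = np.array([0.9, 2.3, 1.7, 3.1, 0.4])  # made-up i.i.d. sample
n = len(x)

def likelihood(theta):
    # The Uniform(0, theta) density is 1/theta on [0, theta], so the joint
    # likelihood is theta**(-n) when theta >= max(x), and 0 otherwise.
    return theta ** (-n) if theta >= x.max() else 0.0

for theta in [2.0, 3.0, 3.1, 3.5, 5.0]:
    print(theta, likelihood(theta))
# The likelihood is 0 below max(x) and strictly decreasing above it,
# so it peaks exactly at theta = max(x).
```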
4. Check each statement that must be true if w* = [w0*, w1*]^T is indeed the least
squares solution, where x̄ and ȳ are the sample means based on the same dataset.
(Hint: take the derivative of J(w) with respect to w0 and w1.)
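As a numerical aside on the hint: a minimal sketch assuming the standard simple-regression objective J(w) = Σᵢ (yᵢ − w0 − w1 xᵢ)², whose full definition is not reproduced above. It checks the closed form obtained from the derivative conditions against numpy's fit on made-up data.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)                # made-up inputs
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)  # made-up noisy outputs

x_bar, y_bar = x.mean(), y.mean()

# Setting dJ/dw1 = 0 and dJ/dw0 = 0 gives the usual closed form:
w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
w0 = y_bar - w1 * x_bar

# Cross-check against numpy's least squares polynomial fit.
w1_np, w0_np = np.polyfit(x, y, deg=1)
print(w0, w1)
print(w0_np, w1_np)
```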
5. (25 pts) We consider the following models of logistic regression for binary
classification with a sigmoid function g(z) = 1 / (1 + e^(−z)):
a) (5 pts) Does it matter how the third example is labeled in Model 1? That is,
would the learned value of w = (w1, w2) be different if we changed the label of
the third example to -1? Does it matter in Model 2? Briefly explain your answer.
(Hint: think of the decision boundary on the 2D plane.)
b) (20 pts) Now, suppose we train the logistic regression model (Model 2) based
on the n training examples x(1), ..., x(n) and labels y(1), ..., y(n) by
maximizing the penalized log-likelihood of the labels:

ℓ(w) = Σ_{i=1}^{n} log g(y(i) w^T x(i)) − λ ||w||^2
For large λ (strong regularization), the learned w is pushed toward 0, so the
log-likelihood terms behave approximately as linear functions of w.
Express the penalized log-likelihood using this approximation (with Model 1), and
derive the expression for the MLE ŵ in terms of λ and the training data
{x(i), y(i)}. Based on this, explain how ŵ behaves as λ increases. (We assume
each x(i) = (x1(i), x2(i))^T and y(i) is either 1 or -1.)
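A numerical sketch of this behavior, assuming the penalized objective shown above, Σ log g(y(i) w^T x(i)) − λ||w||^2, and made-up data; Model 2's exact feature map is not shown here, so plain two-dimensional inputs stand in for it.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 2))                # made-up training inputs
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # made-up +/-1 labels

lam = 50.0  # a deliberately large regularization strength

def neg_penalized_ll(w):
    # -[sum_i log g(y_i w^T x_i) - lam * ||w||^2], with g the sigmoid;
    # log g(z) = -log(1 + exp(-z)), written via logaddexp for stability.
    z = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -z)) + lam * np.dot(w, w)

w_hat = minimize(neg_penalized_ll, x0=np.zeros(2)).x

# Closed form from the linear approximation log g(z) ~ -log 2 + z/2,
# which gives w ~ (1 / (4 lam)) * sum_i y_i x_i:
w_approx = np.sum(y[:, None] * X, axis=0) / (4 * lam)
print(w_hat, w_approx)  # close for large lam; both shrink toward 0 as lam grows
```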
6. (30 pts) Assume you have the following training set with three binary features
x1, x2, and x3 and a binary response/output y. Suppose you have to predict y using
a naïve Bayes classifier.
x1 x2 x3 y
1 0 0 0
0 1 1 1
0 0 1 0
1 0 0 1
0 0 1 0
0 1 0 1
1 1 0 1
a) (15 pts) Compute the MLE for θ_{y=j} for j = 0, 1, as well as
θ_{x_l = x̄_l | y=j} for j = 0, 1, x̄_l = 0, 1, and l = 1, 2, 3.
b) (10 pts) After learning via MLE is complete, what would you estimate for
P(y=0 | x1=0, x2=1, x3=0) and P(y=0 | x1=0, x2=0)?
c) (5 pts) What would be the solution to the previous part without the naïve
Bayes assumption?
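A sketch of parts (a) and (b) computed directly from the table above; the dictionary layout is just one way to organize the estimates, and only the first query of part (b) is evaluated here.

```python
import numpy as np

# Training table from the question: columns x1, x2, x3, y.
data = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
])
X, y = data[:, :3], data[:, 3]

# MLE for the class priors: theta_{y=j} = count(y=j) / n.
prior = {j: np.mean(y == j) for j in (0, 1)}

# MLE for the conditionals: theta_{x_l=1 | y=j} = count(x_l=1, y=j) / count(y=j).
cond = {j: X[y == j].mean(axis=0) for j in (0, 1)}
print(prior, cond)

def posterior_y0(x):
    # Unnormalized naive Bayes scores, multiplying per-feature likelihoods.
    scores = {}
    for j in (0, 1):
        p = prior[j]
        for l, v in enumerate(x):
            p *= cond[j][l] if v == 1 else 1 - cond[j][l]
        scores[j] = p
    return scores[0] / (scores[0] + scores[1])

print(posterior_y0([0, 1, 0]))  # P(y=0 | x1=0, x2=1, x3=0)
```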