FALL 2023
MIDTERM EXAM
Name:
Email:
• The exam is closed book and closed notes, except for two handwritten pages of notes.
• Please show all of your work. Answers without appropriate justification will receive very little
credit. If you need extra space, use the back of the previous page.
Problem 1:
(a) You solve a logistic regression problem with two features, and you use no offset. You find
    ŵ = (3, 2).
• Draw the set of points that corresponds to the decision boundary, i.e., the set of points for which this
logistic regression classifier assigns a 50% chance of being 1 and a 50% chance of being 0. Justify/explain
your answer.
(b) In the problem above, also indicate the region where the classifier assigns a higher probability of being
a “1” and the region where it assigns a higher probability of being a “0.”
(c) Now suppose we use feature augmentation, and add the features X3 = X1^2 and X4 = X2^2. Suppose that
we solve the logistic regression problem, and now we use an offset. Thus we compute five values:
w0 for the offset, and w1, w2, w3, and w4 for the four features. If w0 = −4, w1 = w2 = 0, and
w3 = w4 = 1, draw the set of points that corresponds to the decision boundary in the (X1, X2)-space.
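
A quick numerical sketch, in Python, of the models in parts (a) and (c): it assumes the standard
parameterization P(Y = 1 | X = x) = σ(w·x) (plus the offset for part (c)) and flags grid points whose
predicted probability is near 1/2, which is one way to sanity-check a hand-drawn answer; the grid
range and tolerance are arbitrary choices of the sketch.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Part (a): no offset, w = (3, 2).
w = np.array([3.0, 2.0])

# Part (c): offset w0 = -4, w1 = w2 = 0, w3 = w4 = 1, with X3 = X1^2 and X4 = X2^2.
w0, w3, w4 = -4.0, 1.0, 1.0

xs = np.linspace(-3.0, 3.0, 601)
X1, X2 = np.meshgrid(xs, xs)

p_a = sigmoid(w[0] * X1 + w[1] * X2)           # model from part (a)
p_c = sigmoid(w0 + w3 * X1**2 + w4 * X2**2)    # model from part (c)

# Grid points whose predicted probability is (numerically) close to 1/2.
print("part (a):", int(np.sum(np.abs(p_a - 0.5) < 0.01)), "grid points near p = 1/2")
print("part (c):", int(np.sum(np.abs(p_c - 0.5) < 0.01)), "grid points near p = 1/2")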
Problem 2 (25 pnts):
True/False and Multiple Choice: circle your answer, and provide a brief justification.
2. If X1 and Y are uncorrelated, then we can discard X1 and doing so will never hurt the training or
testing error. (True. False.)
3. Logarithmic transformations of the features (assuming the values of the features are strictly positive
so that log is defined) do not change the training loss for decision trees, but they can improve the
testing error. (True. False.) (A related experiment is sketched after this list.)
4. Suppose we have a regression problem with 2 features. If we are using linear regression and we add
the feature X3 = X1 − 3X2 , then it’s possible that the training error might be strictly reduced. (True.
False.)
5. Suppose we have a regression problem with 2 features. If we are using depth 2 regression trees and
we add the feature X3 = X1 − 3X2 , then it’s possible that the training error might be strictly reduced.
(True. False.)
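
A small experiment one could run while thinking about statement 3, assuming scikit-learn is available;
the synthetic data below are purely illustrative. It fits the same depth-limited decision tree on the raw
features and on their logarithms and compares the training accuracy of the two fits.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 10.0, size=(200, 2))    # strictly positive features, so log is defined
y = (X[:, 0] * X[:, 1] > 10).astype(int)     # an arbitrary nonlinear labeling rule

# Same tree, fit once on the raw features and once on the log-transformed features.
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_log = DecisionTreeClassifier(max_depth=3, random_state=0).fit(np.log(X), y)

print("training accuracy, raw features:", tree_raw.score(X, y))
print("training accuracy, log features:", tree_log.score(np.log(X), y))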
Problem 3 (20 pnts):
Consider the following binary classification problem. For this problem, we want to use the exponential loss:
exp(−ŷ y), where ŷ is given by h(x), for a function h of our choice.
Table 1: Data
x(1)   x(2)    y
0.2    0.6     1
0.3    0.65    1
0.7    0.4    -1
0.3    0.4    -1
0.6    0.55   -1
0.8    0.7     1
(a) Find the best decision stump for this problem. Assume that you can only set leaf values to be in the
range [−5, 5].
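
One way to explore part (a) is a brute-force search. The sketch below is only one possible approach: it
assumes an axis-aligned stump of the form h(x) = ℓ1 if x(j) < t, else ℓ2, uses the exponential loss, and
clips the closed-form leaf values to the allowed range [−5, 5]; the candidate thresholds (taken at the data
values) are a choice of the sketch, not part of the problem statement.

import numpy as np

# Data from Table 1.
X = np.array([[0.2, 0.60],
              [0.3, 0.65],
              [0.7, 0.40],
              [0.3, 0.40],
              [0.6, 0.55],
              [0.8, 0.70]])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0, 1.0])

def best_leaf(y_side):
    # For exponential loss, the unconstrained minimizer over a leaf is (1/2) ln(n_pos / n_neg);
    # clip it to [-5, 5] to respect the constraint in the problem.
    n_pos, n_neg = np.sum(y_side == 1), np.sum(y_side == -1)
    if n_neg == 0:
        return 5.0
    if n_pos == 0:
        return -5.0
    return float(np.clip(0.5 * np.log(n_pos / n_neg), -5.0, 5.0))

best = None
for j in range(2):                                # which feature to split on
    for t in np.unique(X[:, j]):                  # candidate thresholds at the data values
        left, right = X[:, j] < t, X[:, j] >= t
        if left.sum() == 0 or right.sum() == 0:
            continue
        l1, l2 = best_leaf(y[left]), best_leaf(y[right])
        pred = np.where(left, l1, l2)
        loss = float(np.sum(np.exp(-y * pred)))
        if best is None or loss < best[0]:
            best = (loss, j + 1, t, l1, l2)
print("loss = %.3f, split on x(%d) at threshold %.2f, leaves (%.2f, %.2f)" % best)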
(b) Suppose we fit a stump, and you split on x(1) ≥ 0.4, and assign leaf values ℓ1 and ℓ2 , so that if
x(1) < 0.4 it is assigned ℓ1 , and if x(1) ≥ 0.4 it is assigned ℓ2 . Write down the value of the loss
function. This should not be a numerical value, but a function of ℓ1 and ℓ2 .
(c) For the stump above (same splitting rule as given above), suppose we set the values ℓ1 = (ln 4)/2 and
ℓ2 = −(ln 4)/2. Call this stump h1. Suppose we wish to use the AdaBoost framework to boost the
stump above with a linear function of the form h2(x) = β1 x(1) + β2 x(2). This is done by solving a
minimization problem whose objective is a sum of six terms, one for each of the data points. Write down
the first term. This should be an expression involving β1 and β2, but it should not contain other variables.
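
A numerical sketch of the objective described in part (c), assuming the usual additive form
exp(−y (h1(x) + h2(x))) for the boosted exponential loss; the data are copied from Table 1 and h1 is the
stump defined above.

import numpy as np

# Data from Table 1.
X = np.array([[0.2, 0.60],
              [0.3, 0.65],
              [0.7, 0.40],
              [0.3, 0.40],
              [0.6, 0.55],
              [0.8, 0.70]])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0, 1.0])

def h1(x):
    # The stump from part (c): value (ln 4)/2 if x(1) < 0.4, and -(ln 4)/2 if x(1) >= 0.4.
    return np.where(x[:, 0] < 0.4, np.log(4) / 2, -np.log(4) / 2)

def boosted_exp_loss(beta1, beta2):
    # Sum of six terms exp(-y_i * (h1(x_i) + beta1 * x_i(1) + beta2 * x_i(2))).
    h2 = beta1 * X[:, 0] + beta2 * X[:, 1]
    return float(np.sum(np.exp(-y * (h1(X) + h2))))

# For example, the objective with h2 = 0 (only the stump h1 acting):
print(boosted_exp_loss(0.0, 0.0))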
(d) If we boost more decision trees, it could be possible to (circle all that apply):
Problem 4:
(a) Suppose we have a data set with two features. Imagine that we have a solution to the linear logistic
regression problem with an offset. We draw the region where the model says P (Y = 1 | X = x) =
1/2, and we find that all the points are on one side, i.e., all the 1’s and all the 0’s are on the same side
of the region. What is the highest the AUC of this logistic regression model could be? Give a score,
and justify your answer.
(b) Consider a classifier (not necessarily the one described above). Suppose that it is 99% accurate.
Provide two examples: one that shows that the AUC of this very accurate classifier could be very close
to 1, and one that shows that the AUC could be very close to 1/2.
(c) For a dataset, a model predicts probabilities {0.25, 0.3, 0.4, 0.5, 0.8, 0.9} and the true corresponding
labels are y = {0, 0, 0, 1, 0, 1}. Draw the ROC curve and compute the AUC for these predictions.
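
For cross-checking a hand-drawn ROC curve and a hand-computed AUC on part (c), a minimal sketch
assuming scikit-learn is available:

from sklearn.metrics import roc_curve, roc_auc_score

scores = [0.25, 0.3, 0.4, 0.5, 0.8, 0.9]   # predicted probabilities from part (c)
labels = [0, 0, 0, 1, 0, 1]                # true labels from part (c)

fpr, tpr, thresholds = roc_curve(labels, scores)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(labels, scores))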