THE UNIVERSITY OF TEXAS AT AUSTIN

MIS 382N - BUSINESS DATA SCIENCE

FALL 2023

MIDTERM EXAM

TUESDAY, NOVEMBER 14, 2023

Name:

Email:

• You have 75 minutes for this exam.

• The exam is closed book and closed notes, except for two handwritten pages of notes.

• No electronic device may be used.

• Write your answers in the spaces provided.

• Please show all of your work. Answers without appropriate justification will receive very little
credit. If you need extra space, use the back of the previous page.

Problem 1 (25 pnts):

Problem 2 (25 pnts):

Problem 3 (25 pnts):

Problem 4 (25 pnts):

Total (100 pnts):


Problem 1 (25 pnts):

(a) You solve a logistic regression with two features, and you use no offset. You find ŵ = (3, 2)ᵀ.

• Draw the set of points that corresponds to the decision boundary, i.e., the set of points for which this
logistic regression classifier assigns a 50% chance of being 1 and a 50% chance of being 0. Justify/explain
your answer.

(b) In the problem above, also indicate the region that the classifier assigns a higher probability of being
a “1” and the region with a higher probability of being a “0.”
(c) Now suppose we use feature augmentation, and add two features, X3 = X1² and X4 = X2². Suppose that
we solve the logistic regression problem, and now we use an offset. Thus we compute five values:
w0 for the offset, and w1, w2, w3, and w4 for the four features. If w0 = −4, w1 = w2 = 0, and
w3 = w4 = 1, draw the set of points that corresponds to the decision boundary in the (X1, X2)-space.
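
For reference, a minimal Python sketch for parts (a) and (c), assuming the standard logistic form P(Y = 1 | X = x) = σ(w0 + wᵀx); the test points below are illustrative only, chosen to make the argument of the sigmoid equal to zero.

# A minimal sketch, assuming p(y=1 | x) = sigmoid(w0 + w.x), that checks
# where the model assigns probability exactly 1/2.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_one(x, w, w0=0.0):
    """Probability of the label 1 for a feature vector x."""
    return sigmoid(w0 + np.dot(w, x))

# Part (a): w = (3, 2), no offset. A point with 3*x1 + 2*x2 = 0 gets p = 1/2.
w_a = np.array([3.0, 2.0])
print(p_one(np.array([2.0, -3.0]), w_a))   # 3*2 + 2*(-3) = 0  ->  0.5

# Part (c): offset w0 = -4 and weights (0, 0, 1, 1) on (x1, x2, x1^2, x2^2).
def p_one_aug(x1, x2):
    return sigmoid(-4.0 + x1**2 + x2**2)

print(p_one_aug(2.0, 0.0))                 # -4 + 2^2 + 0^2 = 0  ->  0.5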
Problem 2 (25 pnts):
True/False and Multiple Choice: circle your answer, and provide a brief justification.

1. With an appropriate increase in the regularization coefficient in linear regression, it is possible to
decrease the training loss, i.e., to obtain a better fit on the training data. (Never. Always. Only with
Ridge Regression. Only with Lasso.)

2. If X1 and Y are uncorrelated, then we can discard X1 and we will never hurt training or testing error.
(True. False.)

3. Logarithmic transformations of the features (assuming the values of the features are strictly positive
so that log is defined) do not change the training loss for decision trees, but they can improve the
testing error. (True. False.)

4. Suppose we have a regression problem with 2 features. If we are using linear regression and we add
the feature X3 = X1 − 3X2 , then it’s possible that the training error might be strictly reduced. (True.
False.)

5. Suppose we have a regression problem with 2 features. If we are using depth 2 regression trees and
we add the feature X3 = X1 − 3X2 , then it’s possible that the training error might be strictly reduced.
(True. False.)
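
A small numerical sketch, assuming NumPy, for experimenting with items 4 and 5: it fits ordinary least squares on the original two features and again after appending X3 = X1 − 3X2, then compares the training errors. The random data below is purely illustrative.

# A sketch, assuming NumPy: compare least-squares training error with and
# without the extra feature X3 = X1 - 3*X2.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                           # two original features
y = rng.normal(size=50)                                # arbitrary regression targets
X_aug = np.column_stack([X, X[:, 0] - 3 * X[:, 1]])    # augmented with X3

def train_sse(A, targets):
    """Sum of squared training residuals of the least-squares fit."""
    w, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return float(np.sum((A @ w - targets) ** 2))

print(train_sse(X, y))
print(train_sse(X_aug, y))

An analogous experiment for item 5 could fit sklearn.tree.DecisionTreeRegressor(max_depth=2) on X and on X_aug and compare the resulting training errors.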
Problem 3 (25 pnts):

Consider the following binary classification problem. For this problem, we want to use the exponential loss
exp(−y ŷ), where ŷ = h(x) for a function h(x) of our choice.

Table 1: Data
x(1) x(2) y
0.2 0.6 1
0.3 0.65 1
0.7 0.4 -1
0.3 0.4 -1
0.6 0.55 -1
0.8 0.7 1

(a) Find the best decision stump for this problem. Assume that you can only set leaf values to be in the
range [−5, 5].

(b) Suppose we fit a stump that splits on x(1) ≥ 0.4 and assigns leaf values ℓ1 and ℓ2, so that if
x(1) < 0.4 the point is assigned ℓ1, and if x(1) ≥ 0.4 it is assigned ℓ2. Write down the value of the loss
function. This should not be a numerical value, but a function of ℓ1 and ℓ2.
(c) For the stump above (with the same splitting rule), suppose we set leaf values ℓ1 = (ln 4)/2 and
ℓ2 = −(ln 4)/2. Call this stump h1. Suppose we wish to use the AdaBoost framework to boost this
stump with a linear function of the form h2(x) = β1 x(1) + β2 x(2). This is done by solving a
minimization problem that is a sum of six terms, one for each of the data points. Write down the first
term. This should be an expression involving β1 and β2, but should not have other variables.
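
For parts (b) and (c), a short sketch, assuming NumPy, that evaluates the total exponential loss of this stump on the six data points of Table 1 for any given leaf values ℓ1 and ℓ2.

# A minimal sketch, assuming NumPy: total exponential loss sum_i exp(-y_i * yhat_i)
# for the stump that predicts l1 when x(1) < 0.4 and l2 when x(1) >= 0.4.
import numpy as np

X = np.array([[0.2, 0.60],
              [0.3, 0.65],
              [0.7, 0.40],
              [0.3, 0.40],
              [0.6, 0.55],
              [0.8, 0.70]])
y = np.array([1, 1, -1, -1, -1, 1])

def stump_loss(l1, l2):
    """Exponential loss of the x(1) >= 0.4 stump with leaf values l1, l2."""
    y_hat = np.where(X[:, 0] >= 0.4, l2, l1)
    return float(np.sum(np.exp(-y * y_hat)))

# Example: the leaf values used in part (c).
print(stump_loss(np.log(4) / 2, -np.log(4) / 2))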

(d) If we boost more decision trees, it could be possible to (circle all that apply):

(i) Strictly increase training error
(ii) Strictly decrease training error
(iii) Strictly increase testing error
(iv) Strictly decrease testing error
Problem 4 (25 pnts):

(a) Suppose we have a data set with two features. Imagine that we have a solution to the linear logistic
regression problem with an offset. We draw the region where the model says P (Y = 1 | X = x) =
1/2, and we find that all the points are on one side, i.e., all the 1’s and all the 0’s are on the same side
of the region. What is the highest the AUC of this logistic regression model could be? Give a score,
and justify your answer.

(b) Consider a classifier (not necessarily the one described above). Suppose that it is 99% accurate.
Provide two examples: one showing that the AUC of this very accurate classifier could be very close to 1,
and one showing that the AUC could be very close to 1/2.
(c) For a dataset, a model predicts probabilities {0.25, 0.3, 0.4, 0.5, 0.8, 0.9} and the true corresponding
labels are y = {0, 0, 0, 1, 0, 1}. Draw the ROC curve and compute the AUC for these predictions.
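
A short verification sketch for part (c), assuming scikit-learn is available: roc_curve returns the (FPR, TPR) points that trace the ROC curve and roc_auc_score computes the area under it for the probabilities and labels given above.

# A verification sketch, assuming scikit-learn, for the ROC curve and AUC.
from sklearn.metrics import roc_auc_score, roc_curve

scores = [0.25, 0.3, 0.4, 0.5, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1]

fpr, tpr, _ = roc_curve(labels, scores)
print(list(zip(fpr, tpr)))        # (FPR, TPR) points tracing the ROC curve
print(roc_auc_score(labels, scores))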
