IE 7374 ST: Machine Learning in Engineering
Summer, 2021

Midterm Exam

Student Name:
Student ID:
Total points:


1. (21 pts) True or False. Please justify your answers in no more than a few
sentences.
a) (3 pts) If X and Y are independent, then E[2XY] = 2E[X]E[Y]. True or False?
Explain.

b) (3 pts) The error of a hypothesis measured over its training set provides a
pessimistically biased estimate of the true error of the hypothesis. True or
False? Explain.

c) (3 pts) No classifier can do better than a naive Bayes classifier if the
distribution of the data is known. True or False? Explain.

d) (3 pts) We saw that the Bayesian approach to Gaussian linear regression
corresponds to ridge regression. Does a large variance of the prior distribution in
the Bayesian approach correspond to a large amount of regularization? True or
False? Explain.

e) (3 pts) For a continuous random variable x and its probability density
function p(x), it holds that 0 ≤ p(x) ≤ 1 for all x. True or False? Explain.

f) (3 pts) The linear regression estimator has the smallest variance among all
unbiased estimators. True or False? Explain.

g) (3 pts) Maximizing the likelihood of the logistic regression model yields
multiple local optima. True or False? Explain.
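Not part of the original exam: a quick Monte Carlo sketch in Python (assuming NumPy is available) that compares E[2XY] with 2E[X]E[Y] when X and Y are drawn independently, as in part (a).

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=1_000_000)
Y = rng.exponential(scale=0.5, size=1_000_000)   # drawn independently of X

# For independent X and Y, the two quantities below should agree up to sampling noise.
print(np.mean(2 * X * Y))            # empirical E[2XY]
print(2 * np.mean(X) * np.mean(Y))   # 2 E[X] E[Y]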

2. (15 pts) A box contains three cards. One card is red on both sides, one card is
green on both sides, and one card is red on one side and green on the other. We
randomly select one card from this box and observe the color of the selected
card's upper side. If this side is green, what is the probability that the other side
of the card is also green?
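Not part of the original exam: a minimal Monte Carlo sketch in Python (standard library only) to sanity-check the conditional probability asked for in Question 2.

import random

cards = [("R", "R"), ("G", "G"), ("R", "G")]  # the three cards in the box
green_up = 0      # draws where the upper side is green
both_green = 0    # draws where the hidden side is also green

random.seed(0)
for _ in range(100_000):
    card = random.choice(cards)
    up, down = random.sample(card, 2)  # a random side faces up
    if up == "G":
        green_up += 1
        if down == "G":
            both_green += 1

# Empirical estimate of P(other side green | upper side green)
print(both_green / green_up)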

3. (25 pts) Given a set of i.i.d. samples X1, ..., Xn ~ Uniform(0, θ), find the maximum
likelihood estimator of θ.

(a) Write down the likelihood function. (10 pts)

(b) Find the maximum likelihood estimator. (15 pts)
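Not part of the original exam: a small numerical sketch (synthetic data, assuming NumPy) that evaluates the Uniform(0, θ) likelihood on a grid of θ values, which can be used to check the estimator derived in parts (a) and (b).

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 3.0, size=20)      # i.i.d. sample with true theta = 3.0

thetas = np.linspace(0.1, 6.0, 1000)
# Likelihood of Uniform(0, theta): theta^(-n) when theta >= max(x_i), else 0
likelihood = np.where(thetas >= x.max(), thetas ** (-len(x)), 0.0)

print("sample maximum:", x.max())
print("grid maximizer:", thetas[np.argmax(likelihood)])   # peaks at the sample maximum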

4. (10 pts) We are interested here in a particular 1-dimensional linear regression
problem. The dataset corresponding to this problem has n examples
(x1, y1), ..., (xn, yn), where xi and yi are real numbers for all i. Let w* = [w_0^*, w_1^*]^T be the
least squares solution we are after. In other words, w* minimizes

J(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2

You can assume for our purposes here that the solution is unique.


Check each statement that must be true if w* = [w_0^*, w_1^*]^T is indeed the least squares
solution, where \bar{x} and \bar{y} are the sample means based on the same dataset.
(Hint: take the derivative of J(w) with respect to w_0 and w_1.)
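Not part of the original exam: a hedged numerical sketch (synthetic data, assuming NumPy) illustrating the first-order conditions the hint points at; at the least squares solution the residuals sum to zero, so the fitted line passes through (\bar{x}, \bar{y}).

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=0.3, size=50)

A = np.column_stack([np.ones_like(x), x])        # design matrix with columns [1, x_i]
w0, w1 = np.linalg.lstsq(A, y, rcond=None)[0]    # minimizes J(w)

residuals = y - w0 - w1 * x
print("sum of residuals:", residuals.sum())                        # approximately 0
print("y_bar vs w0 + w1*x_bar:", y.mean(), w0 + w1 * x.mean())     # equal up to rounding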

5. (25 pts) We consider the following models of logistic regression for binary
classification with the sigmoid function g(z) = \frac{1}{1 + e^{-z}}:

• Model 1: P(Y=1 | X, w1, w2) = g(w1 X1 + w2 X2)

• Model 2: P(Y=1 | X, w0, w1, w2) = g(w0 + w1 X1 + w2 X2)

We have three training examples:


a) (5 pts) Does it matter how the third example is labeled in Model 1? I.e., would
the learned value of w = (w1, w2) be different if we changed the label of the
third example to -1? Does it matter in Model 2? Briefly explain your answer.
(Hint: think of the decision boundary on the 2D plane.)

b) (20 pts) Now, suppose we train the logistic regression model (Model 2) based
on the n training examples x^{(1)}, ..., x^{(n)} and labels y^{(1)}, ..., y^{(n)} by
maximizing the penalized log-likelihood of the labels:

For large λ (strong regularization), the log-likelihood terms will behave as linear
functions of w.

Express the penalized log-likelihood using this approximation (with Model 1), and
derive the expression for the MLE \hat{w} in terms of λ and the training data
{x^{(i)}, y^{(i)}}. Based on this, explain how w behaves as λ increases. (We assume
each x^{(i)} = (x_1^{(i)}, x_2^{(i)})^T and each y^{(i)} is either 1 or -1.)
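Not part of the original exam: a hedged Python sketch of penalized logistic regression (assuming SciPy is available and a standard λ‖w‖² penalty, since the exam's exact expression is not reproduced above). Labels are +1/-1 as in part (b); numerically, the fitted weights shrink toward zero as λ grows.

import numpy as np
from scipy.optimize import minimize

def g(z):
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid from the problem statement

def neg_penalized_loglik(w, X, y, lam):
    # -log P(y | x, w) = log(1 + exp(-y * w.x)) for y in {+1, -1}, written stably
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins)) + lam * np.dot(w, w)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))                                   # features (x1, x2); Model 1 has no intercept
y = np.where(rng.random(30) < g(X[:, 0] + X[:, 1]), 1, -1)     # synthetic +1/-1 labels

for lam in [0.0, 1.0, 10.0, 100.0]:
    res = minimize(neg_penalized_loglik, x0=np.zeros(2), args=(X, y, lam))
    print(f"lambda = {lam:6.1f}  w = {res.x}")                 # weights shrink toward 0 as lambda grows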


6. (30 pts) Assume you have the following training set with three binary features
x1, x2, and x3 and a binary response/output y. Suppose you have to predict y using a
naïve Bayes classifier.

X1 X2 X3 y
1 0 0 0
0 1 1 1
0 0 1 0
1 0 0 1
0 0 1 0
0 1 0 1
1 1 0 1

a) (15 pts) Compute the MLE for \theta_{y=j} for j = 0, 1, as well as \theta_{x_l = \bar{x}_l \mid y=j}
for j = 0, 1, \bar{x}_l = 0, 1, and l = 1, 2, 3.

b) (10 pts) After learning via MLE is complete, what would you estimate for
P(y=0 | x1=0, x2=1, x3=0) and P(y=0 | x1=0, x2=0)?

c) (5 pts) What would be the solution to the previous part without the naïve
Bayes assumption?
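Not part of the original exam: a minimal sketch (assuming NumPy) that computes the empirical frequencies behind the naïve Bayes MLE in part (a), i.e., the class priors P(y = j) and the conditionals P(x_l = v | y = j), using the 7-row table above.

import numpy as np

# Columns: x1, x2, x3, y (copied from the table in Question 6)
data = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
])
X, y = data[:, :3], data[:, 3]

for j in (0, 1):
    print(f"P(y={j}) = {np.mean(y == j):.3f}")
    for l in range(3):
        for v in (0, 1):
            p = np.mean(X[y == j, l] == v)
            print(f"  P(x{l+1}={v} | y={j}) = {p:.3f}")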
