
CS 6140: Machine Learning — Fall 2021 — Paul Hand

Midterm 1 Study Guide and Practice Problems


Due: Never.

Names: [Put Your Name(s) Here]

Sample solutions

This document contains practice problems for Midterm 1. The midterm will only have 5
problems. The midterm will cover material up through and including the bias-variance
tradeoff, but not including ridge regression. Skills that may be helpful for successful
performance on the midterm include:

1. Setting up and solving a linear regression problem with features that are nonlinear
functions of the model’s input.
2. Writing down the optimization problem for least squares linear regression using
matrix-vector notation.
3. Familiarity with matrix multiplication, in particular when multiplying by diagonal
matrices.
4. Evaluating the true positive rate, false positive rate, precision, and recall of a
predictor for binary classification.
5. Setting up a logistic regression problem and writing down the appropriate function
that is being minimized.
6. Computing the mean, expected value, and variance of uniform random variables.
7. Explaining causes of and remedies for overfitting and underfitting of ML models.

Question 1.
Consider the following training data.

x1 x2 y
0 0 0
0 1 1.5
1 0 2
1 1 2.5
Suppose the data comes from a model y = θ0 + θ1 x1 + θ2 x2 + noise for unknown constants
θ0, θ1, θ2. Use least squares linear regression to find an estimate of θ0, θ1, θ2.
Response:

Let

X = [1 0 0; 1 0 1; 1 1 0; 1 1 1],  y = (0, 1.5, 2, 2.5)^t,

where the columns of X correspond to the intercept term, x1, and x2, and θ = (θ0, θ1, θ2)^t.
We solve the least squares problem

min_θ ‖Xθ − y‖^2.

The solution is given by

θ̂ = (X^t X)^{−1} X^t y.

We compute

X^t X = [4 2 2; 2 2 1; 2 1 2],  X^t y = (6, 4.5, 4)^t,

so

θ̂0 = 0.25,  θ̂1 = 1.5,  θ̂2 = 1.
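As a quick numerical check, here is a minimal numpy sketch of this computation (the
variable names are ours, not part of the problem):

import numpy as np

# Design matrix: an intercept column followed by the features x1 and x2.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
y = np.array([0.0, 1.5, 2.0, 2.5])

# Least squares solution of min_theta ||X theta - y||^2.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # expect approximately [0.25, 1.5, 1.0]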

Question 2.
Consider the following training data:
x y
1 3
2 1
3 0.5
Suppose the data comes from a model y = c x^β + noise, for unknown constants c and β.
Use least squares linear regression to find an estimate of c and β.
Response:

Write

log y = log c + β log x.

The data becomes (log x, log y):

(0, log 3), (log 2, 0), (log 3, −log 2).

Let

X = [1 0; 1 log 2; 1 log 3],  ỹ = (log 3, 0, −log 2)^t,  θ = (log c, β)^t.

Solve

min_θ ‖Xθ − ỹ‖^2.

The solution is given by θ̂ = (X^t X)^{−1} X^t ỹ. Using numpy, we compute

θ̂ ≈ (1.11, −1.63),

so

c = e^{θ̂1} ≈ 3.02,  β = θ̂2 ≈ −1.63.
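The numpy computation referenced above might look like the following sketch (our own
variable names):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 1.0, 0.5])

# Fit log y = log c + beta * log x by least squares.
X = np.column_stack([np.ones_like(x), np.log(x)])
theta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

c, beta = np.exp(theta[0]), theta[1]
print(c, beta)  # expect approximately 3.02 and -1.63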
Question 3.

(a) Let θ* ∈ R^d, and let f(θ) = (1/2)‖θ − θ*‖^2. Show that the Hessian of f is the identity
matrix.

Response:

Write

f(θ) = (1/2) Σ_i (θ_i − θ*_i)^2.

Then

∂f/∂θ_i = θ_i − θ*_i,

so

∂^2 f/∂θ_i ∂θ_j = 1 if i = j, and 0 otherwise.

Thus H = I_d, the d × d identity matrix.

(b) Let X ∈ R^{n×d} and y ∈ R^n. For θ ∈ R^d, let g(θ) = (1/2)‖Xθ − y‖^2. Show that the
Hessian of g is X^t X.

Response:

By class,

∇g(θ) = X^t(Xθ − y) = X^t X θ − X^t y.

Let M = X^t X ∈ R^{d×d}. The k-th component of the gradient is

(∇g(θ))_k = (Mθ)_k − (X^t y)_k = Σ_j M_{kj} θ_j − (X^t y)_k,

so

∂^2 g/∂θ_j ∂θ_k = M_{kj},

the (k, j) entry of M. Thus H = M = X^t X.
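One can also sanity-check part (b) numerically; here is a minimal sketch (our own random
setup) comparing a finite-difference Hessian of g against X^t X:

import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def g(theta):
    r = X @ theta - y
    return 0.5 * r @ r

# Central finite-difference approximation of the Hessian at an arbitrary point.
theta0 = rng.standard_normal(d)
eps = 1e-4
I = np.eye(d)
H = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        H[i, j] = (g(theta0 + eps*I[i] + eps*I[j]) - g(theta0 + eps*I[i] - eps*I[j])
                   - g(theta0 - eps*I[i] + eps*I[j]) + g(theta0 - eps*I[i] - eps*I[j])) / (4 * eps**2)

print(np.allclose(H, X.T @ X, atol=1e-5))  # expect True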
Question 4.
Consider a binary classification problem whose features are in R^2. Suppose the predictor
learned by logistic regression is σ(θ0 + θ1 x1 + θ2 x2), where θ0 = −4, θ1 = 1, θ2 = 0. Find
and plot the curve along which P(class 1) = 1/2 and the curve along which P(class 1) = 0.95.
Response:

P(class 1) = 1/2:

σ(θ0 + θ1 x1 + θ2 x2) = 1/2 ⟺ θ0 + θ1 x1 + θ2 x2 = 0 ⟺ −4 + x1 = 0,

so this curve is the vertical line x1 = 4.

P(class 1) = 0.95:

σ(θ0 + θ1 x1 + θ2 x2) = 0.95.

Recall σ(z) = 1/(1 + e^{−z}), so σ(z) = 0.95 gives

e^{−z} = 1/0.95 − 1 ≈ 0.0526,

i.e. z ≈ 2.94. Then −4 + x1 = 2.94, so this curve is the vertical line x1 ≈ 6.94.

[Plot: two vertical lines in the (x1, x2) plane, at x1 = 4 and x1 ≈ 6.94.]
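A minimal numpy sketch of these level-set computations (our own variable names):

import numpy as np

theta0, theta1, theta2 = -4.0, 1.0, 0.0

# sigma(z) = p  <=>  z = log(p / (1 - p)); with theta2 = 0 the level set
# theta0 + theta1 * x1 = z is the vertical line x1 = (z - theta0) / theta1.
for p in (0.5, 0.95):
    z = np.log(p / (1 - p))
    print(p, (z - theta0) / theta1)  # expect 4.0 and approximately 6.94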
Question 5.
Consider a 3-class classification problem. You have trained a predictor whose input is
x ∈ R^2 and whose output is softmax(x1 + x2 − 1, 2x1 + 3, x2). Find and sketch the three
regions in R^2 that get classified as class 1, 2, and 3.
Response:

The predicted class corresponds to the largest component of the softmax, which is the same
as the largest input to the softmax. Write

z1 = x1 + x2 − 1,  z2 = 2x1 + 3,  z3 = x2.

(a) Where is x classified as class 1? We need z1 ≥ z2 and z1 ≥ z3:

x1 + x2 − 1 ≥ 2x1 + 3 ⟺ x2 ≥ x1 + 4,
x1 + x2 − 1 ≥ x2 ⟺ x1 ≥ 1.

(b) Where is x classified as class 2? We need z2 ≥ z1 and z2 ≥ z3:

2x1 + 3 ≥ x1 + x2 − 1 ⟺ x2 ≤ x1 + 4,
2x1 + 3 ≥ x2 ⟺ x2 ≤ 2x1 + 3.

(c) Where is x classified as class 3? We need z3 ≥ z1 and z3 ≥ z2:

x2 ≥ x1 + x2 − 1 ⟺ x1 ≤ 1,
x2 ≥ 2x1 + 3.

[Sketch: the lines x2 = x1 + 4 and x2 = 2x1 + 3 intersect at (1, 5). Class 1 is the region
with x1 ≥ 1 above the line x2 = x1 + 4; class 3 is the region with x1 ≤ 1 above the line
x2 = 2x1 + 3; class 2 is the region below both lines.]
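A short spot-check of these regions (a numpy sketch with points of our choosing):

import numpy as np

def predicted_class(x1, x2):
    # Softmax is monotone, so the argmax of the logits gives the predicted class.
    z = np.array([x1 + x2 - 1, 2 * x1 + 3, x2])
    return int(np.argmax(z)) + 1  # classes numbered 1..3

print(predicted_class(5, 10))  # 1: x1 >= 1 and x2 >= x1 + 4
print(predicted_class(0, 0))   # 2: below both lines
print(predicted_class(0, 10))  # 3: x1 <= 1 and x2 >= 2*x1 + 3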
Question 6.
Suppose x ~ Uniform([−1, 1]) and y = x + ε, where ε ~ Uniform([−δ, δ]) for some δ > 0.
Consider a predictor given by f_θ(x) = θ1 + θ2 x, where θ ∈ R^2. Evaluate the risk of f_θ
with respect to the square loss. Your answer should be a deterministic expression only
depending on θ1, θ2, and δ.
Response:

R(θ) = E[(f_θ(x) − y)^2] = E[(θ1 + θ2 x − (x + ε))^2] = E[(θ1 + (θ2 − 1)x − ε)^2].

Note that

E[x] = 0,  E[x^2] = ∫_{−1}^{1} x^2 · (1/2) dx = 1/3,  E[ε] = 0,  E[ε^2] = δ^2/3.

Expanding the square,

R(θ) = θ1^2 + (θ2 − 1)^2 E[x^2] + E[ε^2] + 2θ1(θ2 − 1)E[x] − 2θ1 E[ε] − 2(θ2 − 1)E[xε].

Since θ1 and θ2 are deterministic, and x and ε are independent, E[xε] = E[x]E[ε] = 0, so
all three cross terms vanish. Thus

R(θ) = θ1^2 + (θ2 − 1)^2/3 + δ^2/3.
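A Monte Carlo sketch to check this closed form (parameter values are our own; δ is written
as delta):

import numpy as np

rng = np.random.default_rng(0)
theta1, theta2, delta = 0.5, 2.0, 0.3
n = 1_000_000

x = rng.uniform(-1.0, 1.0, n)
eps = rng.uniform(-delta, delta, n)
y = x + eps

empirical = np.mean((theta1 + theta2 * x - y) ** 2)
closed_form = theta1**2 + (theta2 - 1)**2 / 3 + delta**2 / 3
print(empirical, closed_form)  # should agree to roughly two decimal places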

Question 7.
You are training a logistic regression model and you notice that it does not perform well
on test data.
• Could the poor performance be due to underfitting? Explain.
• Could the poor performance be due to overfitting? Explain.

Response:

Underfitting: Yes. Logistic regression separates the two classes using a line, and this
might be too simple to explain the variations in the data. Consider, for example, the case
of two classes that are separable only by a curved boundary: a linear decision boundary
must misclassify some points.

[Sketch: two classes of points separated by a curved boundary, which logistic regression's
linear decision boundary cannot capture.]

Overfitting: Yes. If there are too many features, the data could appear to be linearly
separable as a mathematical artifact. This could result in overfitting of the training
data.
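To illustrate the overfitting point, here is a small scikit-learn sketch (our own synthetic
data): with far more features than samples, even pure-noise labels become linearly
separable, so training accuracy is perfect while test accuracy stays near chance.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 40, 500  # many more features than samples

X_train = rng.standard_normal((n, d))
y_train = rng.integers(0, 2, n)  # labels are pure noise
X_test = rng.standard_normal((n, d))
y_test = rng.integers(0, 2, n)

# Weak regularization (large C) to mimic plain logistic regression.
clf = LogisticRegression(C=1e6, max_iter=10_000).fit(X_train, y_train)
print(clf.score(X_train, y_train))  # typically 1.0: the noise is memorized
print(clf.score(X_test, y_test))    # near 0.5: no better than chance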
