ML Question CMU
• For multiple-choice questions, fill in the bubbles for ALL CORRECT CHOICES (in some cases, there may be more than one). We have introduced a negative penalty for false positives on the multiple-choice questions so that the expected value of random guessing is 0. Don't worry: for this section, your final score will be the maximum of your raw score and 0, so you cannot incur a negative score.
First name
Last name
SID
Q1. [10 pts] True or False
(a) [1 pt] The hyperparameters in the regularized logistic regression model are η (learning rate) and λ (regularization
term).
True False
(b) [1 pt] The objective function used in L2 regularized logistic regression is convex.
True False
(d) [1 pt] As the number of data points approaches ∞, the error rate of a 1-NN classifier approaches 0.
True False
(e) [1 pt] Cross validation will guarantee that our model does not overfit.
True False
(f) [1 pt] As the number of dimensions increases, the percentage of the unit ball's volume that lies in its outer shell of any fixed thickness grows.
True False
(g) [1 pt] In logistic regression, the Hessian of the (non-regularized) log likelihood is positive definite.
True False
(h) [1 pt] Given a binary classification scenario with Gaussian class conditionals and equal prior probabilities, the
optimal decision boundary will be linear.
True False
(i) [1 pt] In the primal version of SVM, we are minimizing the Lagrangian with respect to w and in the dual
version, we are minimizing the Lagrangian with respect to α.
True False
(j) [1 pt] For the dual version of soft margin SVM, the αi ’s for support vectors satisfy αi > C.
True False
Q2. [24 pts] Multiple Choice
(a) [3 pts] Consider the binary classification problem where y ∈ {0, 1} is the label and we have prior probability
P (y = 0) = π0 . If we model P (x|y = 1) to be the following distributions, which one(s) will cause the posterior
P (y = 1|x) to have a logistic function form?
Gaussian Uniform
(b) [3 pts] Given the following data samples (square and triangle belong to two different classes), which one(s) of
the following algorithms can produce zero training error?
(c) [3 pts] The following diagrams show the iso-probability contours for two different 2D Gaussian distributions. On
the left side, the data ∼ N (0, I) where I is the identity matrix. The right side has the same set of contour levels
as left side. What is the mean and covariance matrix for the right side’s multivariate Gaussian distribution?
[Figure: two iso-probability contour plots; both panels have x and y axes ranging from −5 to 5. Left: contours of N(0, I). Right: contours of the unknown Gaussian at the same contour levels.]
" # " #
1 0 4 0
µ = [0, 0]T , Σ= µ = [0, 1]T , Σ=
0 1 0 0.25
" # " #
T
1 0 T
2 0
µ = [0, 1] , Σ= µ = [0, 1] , Σ=
0 1 0 0.5
(d) [3 pts] Given the following data samples (square and triangle denote two different classes), which one(s) of the following kernels can we use in SVM to separate the two classes?
(e) [3 pts] Consider the following plots of the contours of the unregularized error function along with the constraint
region. What regularization term is used in this case?
L2 L∞
a ∈ ℝ          a ≥ 0
−√20 ≤ a ≤ √20          −√20 < a < √20
(g) [3 pts] The soft margin SVM formulation is as follows:
    min_{w, b, ξ}   (1/2) w^T w + C Σ_{i=1}^N ξ_i
    subject to   y_i (w^T x_i + b) ≥ 1 − ξ_i   ∀i
                 ξ_i ≥ 0   ∀i
What is the behavior of the width of the margin (2/‖w‖) as C → 0?
(h) [3 pts] In Homework 4, you fit a logistic regression model on spam and ham data for a Kaggle Competition.
Assume you had a very good score on the public test set, but when the GSIs ran your model on a private test
set, your score dropped a lot. This is likely because you overfitted by submitting multiple times and changing
the following between submissions:
(i) [0 pts] BONUS QUESTION (Answer this only if you have time and are confident in your other answers, because it is not worth extra points.)
We have constructed the multiple choice problems such that every false positive will incur some negative
penalty. For one of these multiple choice problems, given that there are p points, r correct answers, and k
choices, what is the formula for the penalty such that the expected value of random guessing is equal to 0?
(You may assume k > r)
p/(k − r)
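As a sanity check (assuming the p points are split evenly, so each of the r correct bubbles is worth p/r): the expected score of filling in one bubble uniformly at random is (r/k)·(p/r) − ((k − r)/k)·(p/(k − r)) = p/k − p/k = 0.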
Q3. [8 pts] Decision Theory
Consider the following generative model for a 2-class classification problem, in which the class conditionals are
Bernoulli distributions:
p(ω1 ) = π
p(ω2 ) = 1 − π
    x | ω1 = { 1 with probability 0.5;  0 with probability 0.5 }
    x | ω2 = { 1 with probability 0.5;  0 with probability 0.5 }
(a) [8 pts] Give a condition in terms of λ12 , λ21 , and π that determines when class 1 should always be chosen as
the minimum-risk class.
By Bayes' rule,
    P(ω2 | x) = P(x | ω2) P(ω2) / P(x) = (1/2)(1 − π) / P(x)
and likewise P(ω1 | x) = (1/2)π / P(x). Assuming zero loss for correct decisions (λ11 = λ22 = 0), the conditional risks are
    R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x) = λ12 (1 − π) / (2 P(x))
    R(α2 | x) = λ21 P(ω1 | x) = λ21 π / (2 P(x))
Class 1 is always the minimum-risk choice when R(α1 | x) ≤ R(α2 | x) for every x, i.e. when
    λ12 (1 − π) ≤ λ21 π
Q4. [14 pts] Kernels
(a) [6 pts] Let k1 and k2 be (valid) kernels; that is, k1 (x, y) = Φ1 (x)T Φ1 (y) and k2 (x, y) = Φ2 (x)T Φ2 (y).
Show that k = k1 + k2 is a valid kernel by explicitly constructing a corresponding feature mapping Φ(z).
k(x, y) = k1(x, y) + k2(x, y) = Φ1(x)^T Φ1(y) + Φ2(x)^T Φ2(y) = [Φ1(x); Φ2(x)]^T [Φ1(y); Φ2(y)]
If we let Φ(z) = [Φ1(z); Φ2(z)], the concatenation of the two feature vectors, then k(x, y) = Φ(x)^T Φ(y). Therefore, k = k1 + k2 is a valid kernel.
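A minimal numpy sketch of this construction, using two illustrative feature maps (Φ1(x) = x and Φ2(x) = the elementwise squares of x; these particular maps are only examples):

import numpy as np

def phi1(x):                     # illustrative feature map for k1 (linear kernel)
    return x

def phi2(x):                     # illustrative feature map for k2 (elementwise squares)
    return x ** 2

def phi(z):                      # concatenated feature map Phi(z) = [Phi1(z); Phi2(z)]
    return np.concatenate([phi1(z), phi2(z)])

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

k_sum = phi1(x) @ phi1(y) + phi2(x) @ phi2(y)   # k1(x, y) + k2(x, y)
print(np.isclose(k_sum, phi(x) @ phi(y)))       # True: the sum is the kernel of Phi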
(b) [8 pts] Consider the polynomial kernel k(x, y) = (x^T y + c)^d, where x, y ∈ ℝ^n and c ≥ 0. When we take d = 2, this kernel is called the quadratic kernel. Find the feature mapping Φ(z) that corresponds to the quadratic kernel.
First we expand the dot product inside, and square the entire sum. We will get a sum of the squares of the
components and a sum of the cross products.
    (x^T y + c)^2 = (c + Σ_{i=1}^n x_i y_i)^2
                  = c^2 + Σ_{i=1}^n x_i^2 y_i^2 + Σ_{i=2}^n Σ_{j=1}^{i−1} 2 x_i y_i x_j y_j + Σ_{i=1}^n 2c x_i y_i
Pulling this sum apart into a dot product of x-dependent and y-dependent factors, we have
    Φ(x) = (c, x_1^2, ..., x_n^2, √2 x_1 x_2, ..., √2 x_1 x_n, √2 x_2 x_3, ..., √2 x_{n−1} x_n, √(2c) x_1, ..., √(2c) x_n)
In this feature mapping we have c, the squared components of x, √2 times each of the cross terms x_i x_j (i < j), and √(2c) times each of the components.
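A small numpy check of this feature map against the quadratic kernel evaluated directly (the particular x, y, and c below are arbitrary):

import numpy as np

def quad_phi(x, c):
    # Explicit feature map derived above for the quadratic kernel (x^T y + c)^2
    n = len(x)
    cross = [np.sqrt(2) * x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate([[c], x ** 2, cross, np.sqrt(2 * c) * x])

x = np.array([1.0, -2.0, 0.5])
y = np.array([3.0, 1.0, 2.0])
c = 1.5

print(np.isclose((x @ y + c) ** 2, quad_phi(x, c) @ quad_phi(y, c)))   # True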
Q5. [8 pts] L2-Regularized Linear Regression with Newton's Method
Recall that the objective function for L2-regularized linear regression is
    J(w) = ‖Xw − y‖^2 + λ‖w‖^2
where X is the design matrix (the rows of X are the data points) and y is the vector of targets. Show that a single Newton step reaches the minimizer w*, regardless of the initialization.
The gradient and Hessian of J are
    ∇_w J(w) = 2(X^T X + λI)w − 2X^T y,    H(J(w)) = 2(X^T X + λI)
and the Newton update is
    w_1 = w_0 − [H(J(w_0))]^{−1} ∇_w J(w_0)
We initialize w_0 to some value; note that this won't matter. Plugging this in, we have
    w_1 = w_0 − (X^T X + λI)^{−1} [(X^T X + λI) w_0 − X^T y] = (X^T X + λI)^{−1} X^T y = w*
Thus, w_1 = w*, the closed-form minimizer of J, after a single step.
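A minimal numpy sketch checking this one-step behavior (the data, targets, and λ below are arbitrary illustrations):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))              # design matrix: 50 data points, 3 features
y = rng.normal(size=50)                   # targets
lam = 0.1                                 # regularization strength

A = X.T @ X + lam * np.eye(3)
w_star = np.linalg.solve(A, X.T @ y)      # closed-form solution (X^T X + lam I)^{-1} X^T y

w0 = rng.normal(size=3)                   # arbitrary initialization
grad = 2 * (A @ w0 - X.T @ y)             # gradient of ||Xw - y||^2 + lam ||w||^2 at w0
hess = 2 * A                              # Hessian (constant in w)
w1 = w0 - np.linalg.solve(hess, grad)     # one Newton step

print(np.allclose(w1, w_star))            # True: Newton's method converges in one step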
Q6. [8 pts] Maximum Likelihood Estimation
(a) [8 pts] Let x_1, x_2, ..., x_n be independent samples from the following distribution (with parameter θ > 1):
    p(x; θ) = θ x^{−θ−1},   x ≥ 1
Find the maximum likelihood estimate of θ.
The likelihood is
    L(θ | x_1, x_2, ..., x_n) = ∏_{i=1}^n θ x_i^{−θ−1} = θ^n ∏_{i=1}^n x_i^{−θ−1}
so the log likelihood is
    ln L(θ | x_1, x_2, ..., x_n) = n ln θ − (θ + 1) Σ_{i=1}^n ln x_i
Setting the derivative to zero,
    ∂ ln L / ∂θ = n/θ − Σ_{i=1}^n ln x_i = 0   ⟹   θ_mle = n / Σ_{i=1}^n ln x_i
Since the parameter is constrained to θ > 1, an estimate at or below 1 is not admissible, so when n / Σ_{i=1}^n ln x_i ≤ 1 our best estimate is θ_mle = 1. Therefore, the final answer is θ_mle = max(1, n / Σ_{i=1}^n ln x_i).
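A quick numerical sanity check of this estimator, sampling from a classical Pareto distribution with minimum value 1, which has exactly the density θ x^{−θ−1} (the true θ below is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
theta_true = 3.0
n = 100_000
x = rng.pareto(theta_true, size=n) + 1.0     # classical Pareto with x_m = 1: density theta * x^(-theta-1), x >= 1
theta_mle = max(1.0, n / np.log(x).sum())    # the estimator derived above, clipped at 1
print(theta_mle)                             # close to 3.0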
Q7. [13 pts] Affine Transformations of Random Variables
Let X be a d-dimensional random vector with mean µ and covariance matrix Σ. Let Y = AX + b, where A is an n × d matrix and b is an n-dimensional vector.
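For this setup, Y has mean Aµ + b and covariance AΣA^T. A minimal numpy sketch checking these identities empirically (the particular µ, Σ, A, and b below are arbitrary illustrations):

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                           # mean of X (d = 2)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])           # covariance of X
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])  # n x d with n = 3
b = np.array([0.5, -1.0, 2.0])

X = rng.multivariate_normal(mu, Sigma, size=500_000)  # samples of X (one per row)
Y = X @ A.T + b                                       # Y = A X + b, applied row-wise

print(np.abs(Y.mean(axis=0) - (A @ mu + b)).max())    # close to 0: E[Y] = A mu + b
print(np.abs(np.cov(Y.T) - A @ Sigma @ A.T).max())    # close to 0: Cov(Y) = A Sigma A^T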
Q8. [15 pts] Generative Models
Consider a generative classification model for K classes defined by class prior probabilities π_k = P(C_k), k = 1, ..., K, and class-conditional densities P(x|C_k).
The labels yi are “one-of-K” vectors; that is, K-dimensional vectors of all 0’s except for a single 1 at the element
corresponding to the class. For example, if K = 4 and the true label of xi is class 2, then
    y_i = [0, 1, 0, 0]^T
(a) [5 pts] Write the log likelihood of the data set. You may use yij to denote the j th element of yi .
The probability of one data point is
    P(x, y) = P(x | y) P(y) = ∏_{k=1}^K (P(x | C_k) π_k)^{y_k}
We denote the parameters of this model by θ. Since the samples are independent, the likelihood is a product over the data points:
    L(θ) = ∏_{n=1}^N ∏_{k=1}^K (P(x_n | C_k) π_k)^{y_{n,k}}
Thus,
    ℓ(θ) = Σ_{n=1}^N Σ_{k=1}^K y_{n,k} [log P(x_n | C_k) + log π_k]
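A compact numpy rendering of this expression, given hypothetical per-class log densities log P(x_n | C_k) stored in a matrix and one-of-K label vectors (all values below are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 3
log_px = rng.normal(size=(N, K))             # hypothetical values of log P(x_n | C_k)
pi = np.array([0.2, 0.3, 0.5])               # class priors
Y = np.eye(K)[rng.integers(0, K, size=N)]    # one-of-K label vectors, one row per data point

loglik = np.sum(Y * (log_px + np.log(pi)))   # sum_n sum_k y_{n,k} [log P(x_n|C_k) + log pi_k]
print(loglik)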
(b) [10 pts] What are the maximum likelihood estimates of the prior probabilities?
(Hint: Remember to use Lagrange multipliers!)
We want to maximize the log likelihood subject to the constraint Σ_{k=1}^K π_k = 1, so we introduce a Lagrange multiplier. The parameters we care about here are the π_k's. The Lagrangian is
    ℒ(π, λ) = Σ_{n=1}^N Σ_{k=1}^K y_{n,k} [log P(x_n | C_k) + log π_k] + λ (Σ_{k=1}^K π_k − 1)
Taking the derivative with respect to π_k and setting it to zero gives
    ∂ℒ/∂π_k = Σ_{n=1}^N y_{n,k} / π_k + λ = N_k / π_k + λ = 0   ⟹   π_k = −N_k / λ
where N_k is the number of data points whose label is class k. Taking the derivative with respect to λ, we have
    ∂ℒ/∂λ = Σ_{k=1}^K π_k − 1 = 0   ⟹   Σ_{k=1}^K π_k = 1
We can plug in all of our values of the πk ’s into the constraint, giving us the value of λ:
    Σ_{k=1}^K π_k = Σ_{k=1}^K (−N_k / λ) = −N/λ = 1   ⟹   λ = −N
After having solved for λ, we can just plug this back into our other equations to solve for our πk ’s. Thus, we
have that the maximum likelihood estimates of the prior probabilities are
    π_k = N_k / N
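This estimator is just the empirical class frequency. A tiny numpy sketch on some made-up labels (the label vector below is purely illustrative):

import numpy as np

labels = np.array([0, 2, 1, 1, 0, 2, 2, 2])   # made-up class labels, K = 3 classes
K = 3
N_k = np.bincount(labels, minlength=K)        # N_k: count of points in each class
pi_mle = N_k / len(labels)                    # pi_k = N_k / N
print(pi_mle)                                 # [0.25 0.25 0.5]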