Final: CS 189 Spring 2013 Introduction To Machine Learning
• For multiple-choice questions, fill in the bubbles for ALL CORRECT CHOICES (in some cases, there may be
more than one). For a question with p points and k choices, every false positive will incur a penalty of p/(k − 1)
points.
• For short answer questions, unnecessarily long explanations and extraneous data will be penalized.
Please try to be terse and precise and do the side calculations on the scratch papers provided.
• Please draw a bounding box around your answer in the Short Answers section. A missed answer without
a bounding box will not be regraded.
First name
Last name
SID
Q1. [23 pts] True/False
(a) [1 pt] Solving a non-linear separation problem with a hard margin kernelized SVM (Gaussian RBF kernel)
might lead to overfitting.
True
False
(b) [1 pt] In SVMs, the sum of the Lagrange multipliers corresponding to the positive examples is equal to the sum
of the Lagrange multipliers corresponding to the negative examples.
True
False
(c) [1 pt] SVMs directly give us the posterior probabilities P (y = 1|x) and P (y = −1|x).
True False
(e) [1 pt] In the discriminative approach to solving classification problems, we model the conditional probability
of the labels given the observations.
True
False
(f ) [1 pt] In a two class classification problem, a point on the Bayes optimal decision boundary x∗ always satisfies
P (y = 1|x∗ ) = P (y = 0|x∗ ).
True False
(g) [1 pt] Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian.
True
False
(h) [1 pt] For any two random variables X ∼ N (µ1 , σ12 ) and Y ∼ N (µ2 , σ22 ), X + Y ∼ N (µ1 + µ2 , σ12 + σ22 ).
True False
(i) [1 pt] Stanford and Berkeley students are trying to solve the same logistic regression problem for a dataset.
The Stanford group claims that their initialization point will lead to a much better optimum than Berkeley’s
initialization point. Stanford is correct.
True False
(j) [1 pt] In logistic regression, we model the odds ratio (p / (1 − p)) as a linear function.
True False
(k) [1 pt] Random forests can be used to classify infinite dimensional data.
True
False
(l) [1 pt] In boosting we start with a Gaussian weight distribution over the training samples.
True False
(m) [1 pt] In Adaboost, the error of each hypothesis is calculated by the ratio of misclassified examples to the total
number of examples.
True False
(n) [1 pt] When k = 1 and N → ∞, the kNN classification error rate is bounded above by twice the Bayes error rate.
True
False
(o) [1 pt] A single layer neural network with a sigmoid activation for binary classification with the cross entropy
loss is exactly equivalent to logistic regression.
True
False
(p) [1 pt] The loss function for LeNet5 (the convolutional neural network by LeCun et al.) is convex.
True False
(q) [1 pt] Convolution is a linear operation, i.e., (αf1 + βf2) ∗ g = αf1 ∗ g + βf2 ∗ g (a numerical check appears after this True/False section).
True
False
(r) [1 pt] The k-means algorithm does coordinate descent on a non-convex objective function.
True
False
(s) [1 pt] A 1-NN classifier has higher variance than a 3-NN classifier.
True
False
(t) [1 pt] The single link agglomerative clustering algorithm groups two clusters on the basis of the maximum
distance between points in the two clusters.
True False
(u) [1 pt] The eigenvector of the covariance matrix corresponding to the largest eigenvalue is the direction of minimum variance in the data.
True False
(w) [1 pt] The non-zero eigenvalues of AAᵀ and AᵀA are the same.
True
False
Q2. [36 pts] Multiple Choice Questions
(a) [4 pts] In linear regression, we model P(y|x) ∼ N(wᵀx + w0, σ²). The irreducible error in this model is ______.
σ²        E[(y − E[y|x])|x]
(b) [4 pts] Let S1 and S2 be the set of support vectors and w1 and w2 be the learnt weight vectors for a linearly
separable problem using hard and soft margin linear SVMs respectively. Which of the following are correct?
(c) [4 pts] Ordinary least-squares regression is equivalent to assuming that each data point is generated according
to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however,
the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e.,
x ≥ 0). Which of the following families of probability models correctly describes this situation in the univariate
case?
P(y|x) = 1/(σ√(2πx)) · exp(−(y − (w0 + w1 x))² / (2xσ²))

P(y|x) = 1/(σ√(2πx)) · exp(−(y − (w0 + (w1 + σ)x))² / (2σ²))
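The following short sketch samples data according to the situation described in (c), where the Gaussian noise variance grows linearly with the non-negative input (illustrative only; the parameter values are assumptions, not from the exam):

import numpy as np

# y | x ~ N(w0 + w1*x, sigma^2 * x): the noise variance is a positive
# linear function of the input x.
rng = np.random.default_rng(0)
w0, w1, sigma = 1.0, 2.0, 0.5
x = rng.uniform(0.0, 10.0, size=1000)   # inputs assumed non-negative
y = w0 + w1 * x + np.sqrt(sigma**2 * x) * rng.standard_normal(x.shape)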
(f ) [4 pts] Let A be a symmetric matrix and S be the matrix containing its eigenvectors as column vectors, and D
a diagonal matrix containing the corresponding eigenvalues on the diagonal. Which of the following are true:
AS = SD        SA = DS
AS = DS        AS = DSᵀ
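A quick numerical check of the relationship in (f) (an illustrative sketch; the matrix below is an arbitrary example, not from the exam):

import numpy as np

# For a symmetric A, with its eigenvectors as the columns of S and the
# corresponding eigenvalues on the diagonal of D, the identity A S = S D holds.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigvals, S = np.linalg.eigh(A)   # columns of S are eigenvectors of A
D = np.diag(eigvals)
print(np.allclose(A @ S, S @ D))  # prints True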
(g) [4 pts] Consider the following dataset: A = (0, 2), B = (0, 1) and C = (1, 0). The k-means algorithm is
initialized with centers at A and B. Upon convergence, the two centers will be at
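A minimal sketch of Lloyd's algorithm on these three points, starting from the given centers (written out only to make the iteration concrete; not part of the exam):

import numpy as np

# Points A, B, C and the initial centers at A and B.
points = np.array([[0.0, 2.0], [0.0, 1.0], [1.0, 0.0]])
centers = np.array([[0.0, 2.0], [0.0, 1.0]])

for _ in range(10):
    # Assign each point to its nearest center.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute each center as the mean of the points assigned to it.
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(centers)  # converges to (0, 2) and (0.5, 0.5)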
(h) [3 pts] Which of the following loss functions are convex?
(i) [3 pts] Consider T1, a decision stump (tree of depth 2), and T2, a decision tree grown to a maximum depth of 4. Which of the following is/are correct?
(j) [4 pts] Consider the problem of building decision trees with k-ary splits (splitting one node into k nodes), where you decide k for each node by calculating the entropy impurity for different values of k and optimizing simultaneously over the splitting threshold(s) and k. Which of the following is/are true?
The algorithm will always choose k = 2        There will be k − 1 thresholds for a k-ary split
Q3. [26 pts] Short Answers
(a) [5 pts] Given that (x1, x2) are jointly normally distributed with mean μ = (μ1, μ2)ᵀ and covariance
Σ = [ σ1²   σ12 ]
    [ σ21   σ2² ]
(σ21 = σ12), give an expression for the mean of the conditional distribution p(x1 | x2 = a).
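For reference, the standard Gaussian conditioning result (stated here because the printed answer is not reproduced above) gives

E[x1 | x2 = a] = μ1 + (σ12 / σ2²)(a − μ2)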
σ′(x) = e^(−x) / (1 + e^(−x))²
      = [1 / (1 + e^(−x))] · [e^(−x) / (1 + e^(−x))]
      = σ(x) · [1 − 1/(1 + e^(−x))]
      = σ(x)(1 − σ(x))
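A quick finite-difference check of this identity (an illustrative sketch, not part of the original solution):

import numpy as np

# Compare a central-difference estimate of sigma'(x) with sigma(x)*(1 - sigma(x)).
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
x = np.linspace(-4.0, 4.0, 9)
h = 1e-6
numeric = (sigma(x + h) - sigma(x - h)) / (2 * h)
print(np.allclose(numeric, sigma(x) * (1 - sigma(x)), atol=1e-6))  # prints True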
Biased estimator: θ̂ (the sample estimate) is a biased estimator of θ (the population distribution parameter) if E[θ̂] ≠ θ.
Here θ̂ = x(n), and E[x(n)] = (n/(n+1)) θ ≠ θ. The steps for finding E[x(n)] are given in the solutions of Homework 2, problem 5(c).
θ̂_unbiased = ((n+1)/n) x(n)
E[θ̂_unbiased] = E[((n+1)/n) x(n)] = ((n+1)/n) E[x(n)] = ((n+1)/n) · (n/(n+1)) θ = θ
(d) [5 pts] Consider the problem of fitting the following function to a dataset of 100 points {(xi, yi)}, i = 1 … 100:
y = α cos(x) + β sin(x) + γ
This problem can be solved using the least squares method with a solution of the form:
(α, β, γ)ᵀ = (XᵀX)⁻¹ XᵀY
where
X = [ cos(x1)     sin(x1)     1 ]        Y = [ y1   ]
    [ cos(x2)     sin(x2)     1 ]            [ y2   ]
    [   ⋮            ⋮        ⋮ ]            [  ⋮   ]
    [ cos(x100)   sin(x100)   1 ]            [ y100 ]
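A minimal NumPy sketch of this fit (illustrative only; the true coefficients and noise level below are made-up):

import numpy as np

# Generate synthetic data from y = alpha*cos(x) + beta*sin(x) + gamma + noise,
# build the design matrix with rows [cos(x_i), sin(x_i), 1], and solve the
# least-squares problem.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, 100)
alpha, beta, gamma = 1.5, -0.7, 0.3
y = alpha * np.cos(x) + beta * np.sin(x) + gamma + 0.05 * rng.standard_normal(100)

X = np.column_stack([np.cos(x), np.sin(x), np.ones_like(x)])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves min ||X w - y||^2
print(coeffs)                                    # ≈ [1.5, -0.7, 0.3]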
(e) [5 pts] Consider the problem of binary classification using the Naive Bayes classifier. You are given two dimen-
sional features (X1 , X2 ) and the categorical class conditional distributions in the tables below. The entries in
the tables correspond to P (X1 = x1 |Ci ) and P (X2 = x2 |Ci ) respectively. The two classes are equally likely.
P(X1 = x1 | Ci):                    P(X2 = x2 | Ci):

  x1      C1     C2                   x2      C1     C2
  −1      0.2    0.3                  −1      0.4    0.1
   0      0.4    0.6                   0      0.5    0.3
   1      0.4    0.1                   1      0.1    0.6
Given a data point (−1, 1), calculate the following posterior probabilities:
P(C1 | X1 = −1, X2 = 1) =
Using Bayes' rule and the conditional independence assumption of Naive Bayes:
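The computation, reconstructed from the tables above (the original printed working is not reproduced here), is:

P(C1) · P(X1 = −1 | C1) · P(X2 = 1 | C1) = 0.5 × 0.2 × 0.1 = 0.01
P(C2) · P(X1 = −1 | C2) · P(X2 = 1 | C2) = 0.5 × 0.3 × 0.6 = 0.09

Normalizing gives P(C1 | X1 = −1, X2 = 1) = 0.01 / (0.01 + 0.09) = 0.1.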