Homework 2 v1.0
Rules:
1. Homework submission is done via the CMU Autolab system. Please package your writeup and code into a zip or tar file, e.g., let submit.zip contain writeup.pdf and ps2_code/*.m. Submit the package to https://autolab.cs.cmu.edu/courses/10701-f15.
2. As with conference submission sites, repeated submission is allowed, so please feel free to refine your answers. We will only grade the latest version.
3. Autolab may allow submission after the deadline; note, however, that this is only because of the late-day policy. Please see the course website for the policy on late submissions.
4. We recommend that you typeset your homework using appropriate software such as LaTeX. If you are writing by hand, please make sure your homework is cleanly written up and legible. The TAs will not invest undue effort to decipher bad handwriting.
5. You are allowed to collaborate on the homework, but you should write up your own solution and code.
Please indicate your collaborators in your submission.
1 Bayes Optimal Classification (20 Points) (Yan)
In classification, the loss function we usually want to minimize is the 0/1 loss:

ℓ(f(x), y) = 1{f(x) ≠ y},

where f(x), y ∈ {0, 1} (i.e., binary classification). In this problem we will consider the effect of using an asymmetric loss function:

ℓα,β(f(x), y) = α 1{f(x) = 1, y = 0} + β 1{f(x) = 0, y = 1}.

Under this loss function, the two types of errors receive different weights, determined by α, β > 0.
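As a quick numerical companion to the asymmetric loss, the following Python sketch (a hypothetical helper, not part of the assignment's MATLAB starter code) computes the empirical risk, assuming the convention that α weights predicting 1 when y = 0 and β weights predicting 0 when y = 1:

```python
import numpy as np

def asymmetric_risk(y_true, y_pred, alpha, beta):
    """Empirical risk under an asymmetric loss: alpha penalizes
    predicting 1 when the true label is 0, and beta penalizes
    predicting 0 when the true label is 1 (an assumed convention)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    false_pos = (y_pred == 1) & (y_true == 0)  # weighted by alpha
    false_neg = (y_pred == 0) & (y_true == 1)  # weighted by beta
    return np.mean(alpha * false_pos + beta * false_neg)
```

Setting α = β = 1 recovers the usual 0/1 misclassification rate.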
1. Determine the Bayes optimal classifier, i.e., the classifier that achieves minimum risk assuming P(x, y) is known, for the loss ℓα,β where α, β > 0.
2. Suppose that the class y = 0 is extremely uncommon (i.e., P(y = 0) is small). This means that the classifier f(x) = 1 for all x will have good risk. We may try to put the two classes on even footing by considering the balanced risk:

R = 1/2 P(f(x) = 1 | y = 0) + 1/2 P(f(x) = 0 | y = 1).

Show how this risk is equivalent to choosing a certain α, β and minimizing the risk where the loss function is ℓα,β.
3. Consider the following classification problem. I first choose the label Y ∼ Ber(1/2), which is 1 with probability 1/2. If Y = 1, then X ∼ Ber(p); otherwise, X ∼ Ber(q). Assume that p > q. What is the Bayes optimal classifier, and what is its risk?
4. Now consider the regular 0/1 loss ℓ, and assume that P(y = 0) = P(y = 1) = 1/2. Also, assume that the class-conditional densities are Gaussian with mean µ0 and covariance Σ0 under class 0, and mean µ1 and covariance Σ1 under class 1. Further, assume that µ0 = µ1.
For the following case, draw contours of the level sets of the class-conditional densities and label them with p(x | y = 0) and p(x | y = 1). Also, draw the decision boundaries obtained using the Bayes optimal classifier in each case and indicate the regions where the classifier will predict class 0 and where it will predict class 1.
Σ0 = [1 0; 0 4],   Σ1 = [4 0; 0 1].   (4)
1. Show that whitening the training data nicely decouples the features, making wi* determined by the i-th feature and the output regardless of other features. To show this, write Jλ(w) in the form

Jλ(w) = g(y) + Σ_{i=1}^{d} f(X·i, y, wi, λ),   (6)
where C is the number of classes, W is a C × (d + 1) weight matrix, and d is the dimension of the input vector x. In other words, W is a matrix whose rows are the weight vectors for each class.
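The standard multinomial logistic regression model referred to here uses the softmax parameterization P(y = c | x, W) = exp(⟨wc, x⟩) / Σ_{c′} exp(⟨wc′, x⟩). A minimal Python sketch for intuition (the leading-1 bias augmentation is an assumed convention, chosen to match the C × (d + 1) shape of W):

```python
import numpy as np

def softmax_probs(W, x):
    """Class probabilities under the softmax model. W is C x (d+1);
    x is a length-d input, augmented with a leading 1 for the bias
    term (an assumed convention matching the (d+1) columns of W)."""
    x_aug = np.concatenate(([1.0], x))   # prepend bias feature
    scores = W @ x_aug                   # one score per class
    scores -= scores.max()               # stabilize the exponentials
    e = np.exp(scores)
    return e / e.sum()                   # normalize to probabilities
```

With W = 0 every class receives probability 1/C, which is a convenient sanity check.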
1. Show that in the special case where C = 2, multinomial logistic regression reduces to ordinary (binary) logistic regression.
2. In the training process of the multinomial logistic regression model, we are given a set of training data {(xi, yi)}_{i=1}^{n}, and we want to learn a set of weight vectors that maximize the conditional likelihood of the output labels {yi}_{i=1}^{n}, given the input data {xi}_{i=1}^{n} and W. That is, we want to solve the following optimization problem (assuming the data points are i.i.d.):

W* = argmax_W  Π_{i=1}^{n} P(yi | xi, W).   (8)
In order to solve this optimization problem, most numerical solvers require that we provide a function that computes the objective value at given weights, along with the gradient of that objective (i.e., its first derivative). Some solvers also require the Hessian of the objective (i.e., its second derivative). So, in order to implement the algorithm, we need to derive these functions.
(a) Derive the conditional log-likelihood function for the multinomial logistic regression model. You may denote this function as ℓ(W).
(b) Derive the gradient of ℓ(W) with respect to the weight vector of class c (wc). That is, derive ∇_{wc} ℓ(W). You may denote this function as gc(W). Note: the gradient of a function f(x) with respect to a vector x is also a vector, whose i-th entry is defined as ∂f(x)/∂x_i, where x_i is the i-th element of x.
(c) Derive the block submatrix of the Hessian with respect to the weight vectors of class c (wc) and class c′ (wc′). You may denote this function as Hc,c′(W). Note: the Hessian of a function f(x) with respect to a vector x is a matrix whose (i, j)-th entry is defined as ∂²f(x)/(∂x_i ∂x_j). In this case, we are asking for a block submatrix of the Hessian of the conditional log-likelihood function, taken with respect to only two classes c and c′. The (i, j)-th entry of the submatrix is defined as ∂²ℓ(W)/(∂w_{c,i} ∂w_{c′,j}).
4 Perceptron Mistake Bounds (20 Points) (Xun)
Suppose {(xi, yi) : xi ∈ Rn, yi ∈ {+1, −1}, i = 1, . . . , m} can be linearly separated by a margin γ > 0, i.e. there exists a unit vector w* (‖w*‖₂ = 1) such that

yi ⟨w*, xi⟩ ≥ γ,  ∀i,

where ⟨a, b⟩ = a⊤b is the dot product between two vectors. Further assume ‖xi‖₂ ≤ M, ∀i. Recall that the Perceptron algorithm starts from w(0) = 0 and updates w(t) = w(t−1) + y(t) x(t), where (x(t), y(t)) is the t-th misclassified example. We will prove that the Perceptron learns an exact classifier in finitely many steps.
3. Using the results above, show that the number of updates t is upper bounded by M²/γ².
4. True or False: when zero error is achieved, the classifier always has margin γ. Explain briefly.
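The update rule described above can be sketched directly. This is a hedged Python illustration, not the course's starter code; cycling over the data until a mistake-free pass is one possible (but not the only) schedule for visiting examples:

```python
import numpy as np

def perceptron(X, y, max_updates=1000):
    """Perceptron as described above: start from w = 0 and, whenever
    (x_i, y_i) is misclassified (y_i <w, x_i> <= 0), update
    w <- w + y_i x_i. Returns w and the number of updates made."""
    w = np.zeros(X.shape[1])
    updates = 0
    while updates < max_updates:
        made_mistake = False
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:      # misclassified (or on boundary)
                w = w + y_i * x_i          # Perceptron update
                updates += 1
                made_mistake = True
        if not made_mistake:               # a full clean pass: done
            return w, updates
    return w, updates
```

On linearly separable data with margin γ and ‖xi‖₂ ≤ M, the mistake bound in part 3 guarantees the loop terminates after at most M²/γ² updates.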
2. range of labels
3. range of pixel values
4. maximum and minimum ℓ2-norm of the images
2. Complete the function [f, g] = oracle_lr(w, X, y), where w is the weight vector, X is the set of images, and y is the set of labels. This function returns the objective value f and the gradient g.
3. Complete the function err = grad_check(oracle, t). This will help you check whether the oracle implementation is correct. First recall the definition of the derivative:

g(t) = lim_{h→0} [f(t + h) − f(t)] / h.   (10)

The idea is to check analytic gradients against numerical gradients. For a small h, say h ≈ 10⁻⁶, the numerical estimate

ĝ_j(t) = [f(t + h e_j) − f(t − h e_j)] / (2h) ≈ g_j(t),   (11)

for all j ∈ {1, . . . , d}, where e_j is the unit vector along the j-th coordinate. If the oracle is implemented correctly, then we should expect a small (e.g., ≈ 10⁻⁶) average error

err = (1/d) Σ_{j=1}^{d} |ĝ_j(t) − g_j(t)|.   (12)
Try running oracle_lr_test.m to see if your oracle can pass the test.
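For intuition, the central-difference check of Eqs. (10)–(12) can be sketched in Python (a hedged illustration, not the MATLAB starter code; here an oracle is any function returning a (value, gradient) pair):

```python
import numpy as np

def grad_check(oracle, t, h=1e-6):
    """Compare the oracle's analytic gradient at t against central
    differences, Eq. (11), and return the average absolute error,
    Eq. (12)."""
    t = np.asarray(t, dtype=float)
    _, g = oracle(t)                      # analytic gradient
    g_hat = np.empty_like(g)
    for j in range(t.size):
        e_j = np.zeros_like(t)
        e_j[j] = h                        # perturb only coordinate j
        f_plus, _ = oracle(t + e_j)
        f_minus, _ = oracle(t - e_j)
        g_hat[j] = (f_plus - f_minus) / (2 * h)
    return np.mean(np.abs(g_hat - g))

# sanity check on f(t) = ||t||^2 / 2, whose gradient is t itself
oracle = lambda t: (0.5 * t @ t, t)
err = grad_check(oracle, np.array([1.0, -2.0, 3.0]))
```

For a correct oracle the error is tiny (central differences are even exact for quadratics, up to floating-point noise); a buggy gradient shows up as an error many orders of magnitude larger.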
4. Complete the function w = optimize_lr(w0, X, y), where w0 is the initial value. You will implement a simple gradient descent/ascent algorithm to find the best parameter w.
5. Complete the function acc = binary_accuracy(w, X, y). This function returns the fraction of correct predictions made by classifier w on data X.
6. Run lr.m, and report the number of iterations, the final objective function value, the final ‖w‖₂², the training accuracy, and the test accuracy. You should be able to get ≥ 95% test accuracy.
7. Modify oracle_lr.m to return the ℓ2-regularized objective and gradient, with tuning parameter λ. Specifically, add/subtract (λ/2)‖w‖₂² to the objective, depending on whether the objective is the log-likelihood or the negative log-likelihood. Report the number of iterations, the final objective function value, the final ‖w‖₂², the training accuracy, and the test accuracy. Briefly summarize your observations.
(Note: you can check your implementation again with oracle_lr_test.m.)
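The regularization bookkeeping in this step is easy to get sign-wrong: the penalty must enter both the objective and the gradient with the same sign. A hedged Python sketch of the adjustment (the helper name and signature are illustrative, not part of the starter code):

```python
import numpy as np

def add_l2(f, g, w, lam, maximize=True):
    """Fold the l2 penalty (lam/2)||w||_2^2 into an oracle's output.
    If the objective is a log-likelihood being maximized, the penalty
    is subtracted; for a negative log-likelihood it is added."""
    sign = -1.0 if maximize else 1.0
    f_reg = f + sign * 0.5 * lam * (w @ w)  # penalized objective
    g_reg = g + sign * lam * w              # matching gradient term
    return f_reg, g_reg
```

The same pattern applies per class in the multinomial case, where the per-class penalties sum to (λ/2)‖W‖F².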
1. Complete the function [f, g] = oracle_mlr(W0, X, y). In particular, implement ℓ2-regularization for each class, again with λ being the tuning parameter, i.e., include (λ/2)‖W‖F² in your objective.
(Note: you can check your implementation with oracle_mlr_test.m.)