CS229 Problem Set #3

CS 229, Public Course

Problem Set #3: Learning Theory and Unsupervised Learning

1. Uniform convergence and Model Selection

In this problem, we will prove a bound on the error of a simple model selection procedure.
Let there be a binary classification problem with labels y ∈ {0, 1}, and let H_1 ⊆ H_2 ⊆
. . . ⊆ H_k be k different finite hypothesis classes (|H_i| < ∞). Given a dataset S of m iid
training examples, we will divide it into a training set S_train consisting of the first (1 − β)m
examples, and a hold-out cross validation set S_cv consisting of the remaining βm examples.
Here, β ∈ (0, 1).
Let ĥ_i = arg min_{h ∈ H_i} ε̂_{S_train}(h) be the hypothesis in H_i with the lowest training error
(on S_train). Thus, ĥ_i would be the hypothesis returned by training (with empirical risk
minimization) using hypothesis class H_i and dataset S_train. Also let h⋆_i = arg min_{h ∈ H_i} ε(h)
be the hypothesis in H_i with the lowest generalization error.
Suppose that our algorithm first finds all the ĥ_i’s using empirical risk minimization, then
uses the hold-out cross validation set to select the hypothesis from the set {ĥ_1, . . . , ĥ_k} with
minimum error on the hold-out set. That is, the algorithm will output
$$\hat{h} = \arg\min_{h \in \{\hat{h}_1, \ldots, \hat{h}_k\}} \hat{\varepsilon}_{S_{\mathrm{cv}}}(h).$$
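As a rough illustration of the selection procedure described above, here is a minimal Python sketch. The names `error`, `threshold`, and `select_by_holdout`, and the toy data, are hypothetical and not part of the course materials; hypothesis classes are assumed to be small finite lists of callables so that empirical risk minimization can be done by exhaustive search.

```python
def error(h, data):
    """Fraction of examples (x, y) in data that h misclassifies."""
    return sum(h(x) != y for x, y in data) / len(data)

def threshold(a):
    """Hypothetical toy hypothesis h(x) = 1{a < x}."""
    return lambda x: int(a < x)

def select_by_holdout(classes, S, beta):
    """ERM within each class on S_train, then pick the ERM output with
    the lowest error on the hold-out set S_cv."""
    n_train = int(round((1 - beta) * len(S)))
    S_train, S_cv = S[:n_train], S[n_train:]
    # Empirical risk minimizer for each (finite) hypothesis class.
    erm = [min(H, key=lambda h: error(h, S_train)) for H in classes]
    # Final hypothesis: lowest hold-out error among the ERM outputs.
    return min(erm, key=lambda h: error(h, S_cv))

# Toy data labeled by 1{x > 4}, listed in an arbitrary order.
S = [(x, int(x > 4)) for x in [1, 8, 3, 6, 0, 9, 2, 7, 4, 5]]
H1 = [threshold(2)]            # nested classes: H1 is a subset of H2
H2 = H1 + [threshold(4.5)]
h_hat = select_by_holdout([H1, H2], S, beta=0.3)
```

In this toy run the richer class contains a threshold that fits both splits, so the hold-out step should prefer it; with real data the two error estimates are noisy, which is what the bound in this problem quantifies.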

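For this question you will prove the following bound. Let any δ > 0 be fixed. Then with
probability at least 1 − δ, we have that
$$\varepsilon(\hat{h}) \le \min_{i=1,\ldots,k} \left( \varepsilon(h_i^\star) + \sqrt{\frac{2}{(1-\beta)m} \log \frac{4|H_i|}{\delta}} \right) + \sqrt{\frac{2}{2\beta m} \log \frac{4k}{\delta}}.$$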

(a) Prove that with probability at least 1 − δ/2, for all ĥ_i,

$$\left| \varepsilon(\hat{h}_i) - \hat{\varepsilon}_{S_{\mathrm{cv}}}(\hat{h}_i) \right| \le \sqrt{\frac{1}{2\beta m} \log \frac{4k}{\delta}}.$$

(b) Use part (a) to show that with probability 1 − δ/2,

$$\varepsilon(\hat{h}) \le \min_{i=1,\ldots,k} \varepsilon(\hat{h}_i) + \sqrt{\frac{2}{\beta m} \log \frac{4k}{\delta}}.$$

(c) Let j = arg min_i ε(ĥ_i). We know from class that for H_j, with probability 1 − δ/2,
$$\left| \varepsilon(\hat{h}_j) - \hat{\varepsilon}_{S_{\mathrm{train}}}(h_j^\star) \right| \le \sqrt{\frac{2}{(1-\beta)m} \log \frac{4|H_j|}{\delta}}, \quad \forall h_j \in H_j.$$

Use this to prove the final bound given at the beginning of this problem.

2. VC Dimension
Let the input domain of a learning problem be X = R. Give the VC dimension for each
of the following classes of hypotheses. In each case, if you claim that the VC dimension is
d, then you need to show that the hypothesis class can shatter d points, and explain why
there are no d + 1 points it can shatter.

• h(x) = 1{a < x}, with parameter a ∈ R.

• h(x) = 1{a < x < b}, with parameters a, b ∈ R.
• h(x) = 1{a sin x > 0}, with parameter a ∈ R.
• h(x) = 1{sin(x + a) > 0}, with parameter a ∈ R.
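
When exploring a candidate answer, it can help to check numerically which labelings a class realizes on a chosen set of points. The sketch below is only a sanity check under stated assumptions: the helper names `realizable_labelings` and `shatters` are hypothetical, and the parameters are sampled from a coarse finite grid, so a positive result does demonstrate shattering of those points, while a negative result is only evidence and never a proof.

```python
def realizable_labelings(points, hypotheses):
    """Set of label tuples that the given hypotheses produce on points."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def shatters(points, hypotheses):
    """True if every one of the 2^d labelings of the d points is realized."""
    return len(realizable_labelings(points, hypotheses)) == 2 ** len(points)

# Coarse parameter grid from -5.0 to 5.0 in steps of 0.5 (an assumption).
grid = [i / 2 for i in range(-10, 11)]

# h(x) = 1{a < x}; the default argument binds each a at creation time.
thresholds = [lambda x, a=a: int(a < x) for a in grid]

# h(x) = 1{a < x < b}
intervals = [lambda x, a=a, b=b: int(a < x < b) for a in grid for b in grid]

one_point = shatters([0.25], thresholds)
two_points = shatters([0.25, 1.25], thresholds)
two_points_interval = shatters([0.25, 1.25], intervals)
three_points_interval = shatters([0.25, 1.25, 2.25], intervals)
```

A written answer still needs the argument that no set of d + 1 points can be shattered; a grid search over one particular point set cannot supply that.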

3. ℓ1 regularization for least squares

In the previous problem set, we looked at the least squares problem where the objective
function is augmented with an additional regularization term λ‖θ‖_2^2. In this problem we’ll
consider a similar regularized objective but this time with a penalty on the ℓ1 norm of
the parameters λ‖θ‖_1, where ‖θ‖_1 is defined as Σ_i |θ_i|. That is, we want to minimize the
objective
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 + \lambda \sum_{i=1}^{n} |\theta_i|.$$

There has been a great deal of recent interest in ℓ1 regularization, which, as we will see,
has the benefit of outputting sparse solutions (i.e., many components of the resulting θ are
equal to zero).
The ℓ1 regularized least squares problem is more difficult than the unregularized or ℓ2
regularized case, because the ℓ1 term is not differentiable. However, there have been many
efficient algorithms developed for this problem that work very well in practice. One very
straightforward approach, which we have already seen in class, is the coordinate descent
method. In this problem you’ll derive and implement a coordinate descent algorithm for
ℓ1 regularized least squares, and apply it to test data.

(a) Here we’ll derive the coordinate descent update for a given θ_i. Given the X and
$\vec{y}$ matrices, as defined in the class notes, as well as a parameter vector θ, how can we
adjust θ_i so as to minimize the optimization objective? To answer this question, we’ll
rewrite the optimization objective above as
$$J(\theta) = \frac{1}{2} \|X\theta - \vec{y}\|_2^2 + \lambda \|\theta\|_1 = \frac{1}{2} \|X\bar{\theta} + X_i \theta_i - \vec{y}\|_2^2 + \lambda \|\bar{\theta}\|_1 + \lambda |\theta_i|$$
where X_i ∈ R^m denotes the ith column of X, and θ̄ is equal to θ except with θ̄_i = 0;
all we have done in rewriting the above expression is to make the θ_i term explicit in
the objective. However, this still contains the |θ_i| term, which is non-differentiable
and therefore difficult to optimize. To get around this we make the observation that
the sign of θ_i must either be non-negative or non-positive. But if we knew the sign of
θ_i, then |θ_i| becomes just a linear term. That is, we can rewrite the objective as
$$J(\theta) = \frac{1}{2} \|X\bar{\theta} + X_i \theta_i - \vec{y}\|_2^2 + \lambda \|\bar{\theta}\|_1 + \lambda s_i \theta_i$$
where s_i denotes the sign of θ_i, s_i ∈ {−1, 1}. In order to update θ_i, we can just
compute the optimal θ_i for both possible values of s_i (making sure that we restrict
the optimal θi to obey the sign restriction we used to solve for it), then look to see
which achieves the best objective value.
For each of the possible values of si , compute the resulting optimal value of θi . [Hint:
to do this, you can fix si in the above equation, then differentiate with respect to θi
to find the best value. Finally, clip θi so that it lies in the allowable range — i.e., for
si = 1, you need to clip θi such that θi ≥ 0.]
(b) Implement the above coordinate descent algorithm using the updates you found in
the previous part. We have provided a skeleton theta = l1ls(X,y,lambda) function
in the q3/ directory. To implement the coordinate descent algorithm, you should
repeatedly iterate over all the θi ’s, adjusting each as you found above. You can
terminate the process when θ changes by less than 10^−5 after all n of the updates.
(c) Test your implementation on the data provided in the q3/ directory. The [X, y,
theta_true] = load_data; function will load all the data. The data was generated
by y = X*theta_true + 0.05*randn(20,1), but theta_true is sparse, so that very
few of the columns of X actually contain relevant features. Run your l1ls.m
implementation on this data set, ranging λ from 0.001 to 10. Comment briefly on how this
algorithm might be used for feature selection.
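
The coordinate descent loop described in parts (a) and (b) might be sketched as follows. This is an illustrative pure-Python version, not the provided MATLAB skeleton: the signature `l1ls(X, y, lam, tol, max_sweeps)`, the helper `_coord_obj`, the zero initialization, and the reading of the stopping rule as "largest single-coordinate change in a sweep is below the tolerance" are all assumptions. For each coordinate it computes the clipped optimum under both sign choices and keeps whichever gives the lower objective, as the problem statement suggests.

```python
def _coord_obj(t, z, rho, lam):
    """J as a function of theta_i = t, dropping terms that do not depend on t."""
    return 0.5 * z * t * t - rho * t + lam * abs(t)

def l1ls(X, y, lam, tol=1e-5, max_sweeps=10000):
    """Minimize 0.5*||X theta - y||^2 + lam*||theta||_1 by coordinate descent.
    X is a list of m rows with n floats each; y is a list of m floats."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    r = list(y)  # residual y - X theta, valid because theta starts at zero
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in range(n):
            col = [X[j][i] for j in range(m)]
            z = sum(c * c for c in col)  # X_i^T X_i
            if z == 0.0:
                continue  # an all-zero column cannot affect the fit
            # rho = X_i^T (y - X theta_bar), with theta_bar_i = 0
            rho = sum(c * (rj + c * theta[i]) for c, rj in zip(col, r))
            pos = max(0.0, (rho - lam) / z)  # optimum for s_i = +1, clipped
            neg = min(0.0, (rho + lam) / z)  # optimum for s_i = -1, clipped
            if _coord_obj(pos, z, rho, lam) <= _coord_obj(neg, z, rho, lam):
                new = pos
            else:
                new = neg
            delta = new - theta[i]
            if delta != 0.0:
                r = [rj - c * delta for rj, c in zip(r, col)]
                theta[i] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return theta

# Tiny example with orthonormal columns, where each coordinate decouples.
theta = l1ls([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

Keeping the residual up to date makes each coordinate update cost O(m) rather than O(mn). Coordinates whose correlation with the residual stays below λ in magnitude are set exactly to zero, which is the behavior part (c) asks you to comment on.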

4. K-Means Clustering

In this problem you’ll implement the K-means clustering algorithm on a synthetic data
set. There is code and data for this problem in the q4/ directory. Run load 'X.dat';
to load the data file for clustering. Implement the [clusters, centers] = k_means(X,
k) function in this directory. As input, this function takes the m × n data matrix X and
the number of clusters k. It should output an m-element vector, clusters, which indicates
which of the clusters each data point belongs to, and a k × n matrix, centers, which
contains the centroids of each cluster. Run the algorithm on the data provided, with k = 3
and k = 4. Plot the cluster assignments and centroids for each iteration of the algorithm
using the draw_clusters(X, clusters, centroids) function. For each k, be sure to run
the algorithm several times using different initial centroids.
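
For orientation, the two alternating steps of K-means can be sketched in a few lines of pure Python. This is a hedged illustration rather than the requested MATLAB function: the `max_iter` and `seed` parameters, the choice of initial centroids as k randomly sampled data points, the rule of leaving an empty cluster's centroid unchanged, and the 0-based cluster indices are all assumptions, and plotting is omitted.

```python
import random

def k_means(X, k, max_iter=100, seed=0):
    """Return (clusters, centers) for the data X, a list of m points
    with n coordinates each."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(X, k)]
    clusters = [None] * len(X)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_clusters = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            for x in X
        ]
        if new_clusters == clusters:
            break  # assignments are stable, so the centroids are too
        clusters = new_clusters
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(X, clusters) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return clusters, centers

# Two well-separated pairs of points.
clusters, centers = k_means([[0, 0], [0, 1], [10, 10], [10, 11]], 2)
```

Changing `seed` changes the initial centroids, which is one simple way to carry out the repeated runs the problem asks for.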

5. The Generalized EM algorithm

When attempting to run the EM algorithm, it may sometimes be difficult to perform the M
step exactly. Recall that we often need to implement numerical optimization to perform
the maximization, which can be costly. Therefore, instead of finding the global maximum
of our lower bound on the log-likelihood, an alternative is to just increase this lower bound
a little bit, by taking one step of gradient ascent, for example. This is commonly known
as the Generalized EM (GEM) algorithm.
Put slightly more formally, recall that the M-step of the standard EM algorithm performs
the maximization
$$\theta := \arg\max_{\theta} \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}.$$

The GEM algorithm, in contrast, performs the following update in the M-step:
$$\theta := \theta + \alpha \nabla_\theta \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$$

where α is a learning rate which we assume is chosen small enough such that we do not
decrease the objective function when taking this gradient step.
(a) Prove that the GEM algorithm described above converges. To do this, you should
show that the likelihood is monotonically improving, as it does for the EM
algorithm, i.e., show that ℓ(θ^(t+1)) ≥ ℓ(θ^(t)).
(b) Instead of using the EM algorithm at all, suppose we just want to apply gradient ascent
to maximize the log-likelihood directly. In other words, we are trying to maximize
the (non-convex) function
$$\ell(\theta) = \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)$$

so we could simply use the update

$$\theta := \theta + \alpha \nabla_\theta \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta).$$

Show that this procedure in fact gives the same update as the GEM algorithm
described above.
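
Before writing the proof for part (b), the claim can be checked numerically on a toy model. The sketch below is only a sanity check under assumptions that are not in the problem set: a one-dimensional mixture of two unit-variance Gaussians with equal weights, where θ is the mean of the first component and the second mean is fixed at 3. With Q_i set by the E-step at the current θ and then held fixed, it compares a finite-difference gradient of the lower bound with that of the log-likelihood.

```python
import math

def joint(x, z, theta):
    """p(x, z; theta) for the toy mixture: z in {0, 1}, equal weights,
    component 0 is N(theta, 1) and component 1 is N(3, 1)."""
    mean = theta if z == 0 else 3.0
    return 0.5 * math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def log_lik(xs, theta):
    """l(theta) = sum_i log sum_z p(x_i, z; theta)."""
    return sum(math.log(joint(x, 0, theta) + joint(x, 1, theta)) for x in xs)

def posterior(xs, theta):
    """E-step: Q_i(z) = p(z | x_i; theta)."""
    out = []
    for x in xs:
        p0, p1 = joint(x, 0, theta), joint(x, 1, theta)
        out.append((p0 / (p0 + p1), p1 / (p0 + p1)))
    return out

def lower_bound(xs, theta, Q):
    """sum_i sum_z Q_i(z) log( p(x_i, z; theta) / Q_i(z) ) with Q held fixed."""
    return sum(Q[i][z] * math.log(joint(x, z, theta) / Q[i][z])
               for i, x in enumerate(xs) for z in (0, 1))

def num_grad(f, theta, h=1e-6):
    """Central finite-difference approximation of df/dtheta."""
    return (f(theta + h) - f(theta - h)) / (2 * h)

xs = [-1.0, 0.2, 2.5, 3.1]
theta = 0.5
Q = posterior(xs, theta)  # computed at the current theta, then frozen
g_gem = num_grad(lambda t: lower_bound(xs, t, Q), theta)
g_ll = num_grad(lambda t: log_lik(xs, t), theta)
```

If the two gradients agree up to finite-difference error, that is consistent with the statement to be proved, though it is of course no substitute for the derivation.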
