hw3 Red
Homework #3
RELEASE DATE: 10/07/2024
RED CORRECTION: 10/12/2024 16:30
DUE DATE: 10/21/2024, BEFORE 13:00 on GRADESCOPE
QUESTIONS ARE WELCOME ON DISCORD (INFORMALLY) OR VIA EMAIL (FORMALLY).
You will use Gradescope to upload your scanned/printed solutions. For problems marked with (*), please
follow the guidelines on the course website and upload your source code to Gradescope as well. Any
programming language/platform is allowed.
Any form of cheating, lying, or plagiarism will not be tolerated. Students can get zero scores and/or fail
the class and/or be kicked out of school and/or receive other punishments for those kinds of misconduct.
Discussions on course materials and homework solutions are encouraged. But you should write the final
solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but
not copied from.
Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework
solutions and/or source codes to your classmates at any time. In order to maximize the level of fairness
in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will
be punished according to the honesty policy.
You should write your solutions in English with the common math notations introduced in class or in the
problems. We do not accept solutions written in any other languages.
This homework set comes with 200 points and 20 bonus points. In general, every homework set would come with a full credit of 200 points, with some possible bonus points.
1. (10 points, auto-graded) Which of the following hypothesis sets, each parameterized by only one parameter, has the largest dvc?
[a] {cs (x) : s ∈ {−1, +1}} where cs (x) = s
[b] {rθ (x) : θ ∈ R} where rθ (x) = sign(x1 − θ)
[c] {qi (x) : i ∈ {1, 2, . . . , d}} where qi (x) = sign(xi )
[d] {uα (x) : α ∈ R} where uα (x) = sign(sin(αx1 ))
[e] {vβ (x) : β ∈ R} where vβ (x) = sign(βx1 )
2. (10 points, auto-graded) Consider a hypothesis set that contains hypotheses of the form h(x) = wx
for x ∈ R. Combine the hypothesis set with the squared error function to minimize
$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \big(h(x_n) - y_n\big)^2 .$$
3. (10 points, auto-graded) In Lecture 9, we introduced the hat matrix H = XX† for linear regression.
The matrix projects the label vector y to the “predicted” vector ŷ = Hy and helps us analyze the
error of linear regression. Assume that XT X is invertible, which makes H = X(XT X)−1 XT . Now,
consider the following operations on X. Which operation can possibly change H?
[a] multiplying the $n$-th row of X by $\frac{1}{n}$ for each $n$ (which is equivalent to scaling the $n$-th example by $\frac{1}{n}$)
[b] multiplying the $i$-th column of X by $i^2$ for each $i$ (which is equivalent to scaling the $i$-th feature by $i^2$)
[c] multiplying the whole matrix X by 2 (which is equivalent to scaling all input vectors by 2)
[d] adding three randomly-chosen columns i, j, k to column 1 of X (i.e., $x_{n,1} \leftarrow x_{n,1} + x_{n,i} + x_{n,j} + x_{n,k}$ for every $n$)
[e] none of the other choices (i.e. all other choices are guaranteed to keep H unchanged.)
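For those who want to probe this numerically before proving anything, the sketch below (Python with numpy; not part of the required solution) builds a small random X, computes the hat matrix through the pseudo-inverse, and compares it before and after one of the listed operations. The sizes, the random seed, and the choice to test operation [b] are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 3                                  # small, arbitrary sizes for illustration
X = rng.standard_normal((N, d))

def hat(X):
    """Hat matrix H = X (X^T X)^{-1} X^T, computed via the pseudo-inverse."""
    return X @ np.linalg.pinv(X)

H0 = hat(X)

# Example probe: scale the i-th column of X by i^2 (operation [b]) and compare.
Xb = X * (np.arange(1, d + 1) ** 2)
print(np.allclose(hat(Xb), H0))
```

The other operations can be probed the same way; of course, a numerical check only suggests an answer and does not replace the reasoning.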
4. (10 points, auto-graded) Let y1, y2, . . . , yN be N values generated i.i.d. from a uniform distribution on [θ, 1] with some unknown θ. For any θ̂ ≤ min(y1, y2, . . . , yN), what is its likelihood?
[a] $\left(\frac{1}{\hat{\theta}}\right)^{N}$

[b] $\prod_{n=1}^{N} \frac{y_n}{1-\hat{\theta}}$

[c] $\left(\frac{1}{1-\hat{\theta}}\right)^{N}$

[d] $\frac{\max(y_1, \ldots, y_N)}{\hat{\theta}}$

[e] $\frac{\min(y_1, \ldots, y_N)}{1-\hat{\theta}}$
(Hint: Those who are interested in more math [who isn’t? :-)] are encouraged to try to derive the
maximum-likelihood estimator.)
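As a companion to the hint, here is a minimal Python sketch (not part of the required solution) that simulates data from a uniform distribution on [θ, 1] and locates the maximizer of the log-likelihood over a grid of candidate θ̂ values. The true θ, the sample size, the grid resolution, and the use of scipy.stats.uniform are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import uniform

rng = np.random.default_rng(1)
theta_true, N = 0.3, 50                          # arbitrary illustrative values
y = rng.uniform(theta_true, 1.0, size=N)

def log_likelihood(theta_hat):
    # Sum of log-densities of the observations under Uniform[theta_hat, 1].
    return np.sum(uniform(loc=theta_hat, scale=1.0 - theta_hat).logpdf(y))

grid = np.linspace(0.0, 0.999, 2000)
best = max(grid, key=log_likelihood)
print("min(y) =", y.min(), " numerical maximizer of the likelihood =", best)
```

Comparing the printed maximizer with min(y) may help you guess the closed-form maximum-likelihood estimator before deriving it.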
5. (20 points, human-graded) Prove or disprove that for any two non-empty hypothesis sets H1 and H2
for binary classification that operate on the same input space, dvc (H1 ∪ H2 ) ≤ dvc (H1 ) + dvc (H2 ).
Note that the ∪ operation represents set-union. That is, {h1 , h2 , h3 } ∪ {h2 , h4 } = {h1 , h2 , h3 , h4 }.
6. (20 points, human-graded) Consider a binary classification problem, where Y = {−1, +1}. Assume
a noisy scenario where the data is generated i.i.d. from some P (x, y). In class, we discussed
that when the 0/1 error function (i.e. classification error) is considered, calculating the “ideal
mini-target” on each x reveals the hidden target function of
$$f_{0/1}(x) = \mathop{\mathrm{argmax}}_{y \in \{-1,+1\}} P(y \mid x) = \mathrm{sign}\left(P(y = +1 \mid x) - \frac{1}{2}\right).$$
Instead of the 0/1 error, if we consider the super-market error function, where a false negative
(classifying a positive example as a negative one) is 10 times more important than a false positive,
the hidden target should be changed to
$$f_{\mathrm{mkt}}(x) = \mathrm{sign}\big(P(y = +1 \mid x) - \alpha\big).$$
7. (20 points, human-graded) In class, we had two definitions of Eout (h) for binary classification. The
first definition compares the hypothesis h against the target function f .
$$E_{\mathrm{out}}^{(1)}(h) = \mathbb{E}_{x \sim P(x)}\, [\![\, h(x) \ne f(x) \,]\!].$$
The second definition extends from the first definition, and compares the hypothesis h against the
noisy distribution P (y | x).
$$E_{\mathrm{out}}^{(2)}(h) = \mathbb{E}_{x \sim P(x),\, y \sim P(y \mid x)}\, [\![\, h(x) \ne y \,]\!].$$
Note that when considering the 0/1 error, we know that the target function f (x) hides itself within
P (y | x) by
$$f(x) = \mathop{\mathrm{argmax}}_{y \in \{-1,+1\}} P(y \mid x) = \mathrm{sign}\left(P(y = +1 \mid x) - \frac{1}{2}\right).$$
With all the definitions above, prove that for any hypothesis h,
$$E_{\mathrm{out}}^{(2)}(h) \le E_{\mathrm{out}}^{(1)}(h) + E_{\mathrm{out}}^{(2)}(f).$$
(Hint: Technically, $E_{\mathrm{out}}^{(2)}(f)$ is a constant that represents the irreducible error (i.e. noise) of the learning problem.)
8. (20 points, human-graded) Consider running linear regression on $\{(x_n, y_n)\}_{n=1}^{N}$, where $x_n$ includes
the constant dimension x0 = 1 as usual. For simplicity, you can assume that XT X is invertible.
Assume that the unique (why :-)) solution wlin is obtained after running linear regression on the
data above. Then, if every x0 is changed to 1126 instead of 1, run linear regression again to get the
unique solution wlucky . Prove that wlin = Dwlucky , where D is some diagonal matrix, by deriving
the correct D.
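Once you have a candidate D on paper, a quick numerical check like the following Python sketch (not part of the required solution) can catch algebra slips. The data sizes are arbitrary, and the identity matrix below is only a placeholder to be replaced by your derived D.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 100, 4                                # arbitrary sizes for illustration
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])
y = rng.standard_normal(N)

w_lin = np.linalg.pinv(X) @ y                # solution with x0 = 1

X_lucky = X.copy()
X_lucky[:, 0] = 1126.0                       # change every x0 to 1126
w_lucky = np.linalg.pinv(X_lucky) @ y

D = np.eye(d + 1)                            # placeholder: substitute your derived D
print(np.allclose(w_lin, D @ w_lucky))       # should print True for the correct D
```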
9. (20 points, human-graded) In logistic regression, we consider the logistic hypotheses
$$h(x) = \frac{1}{1 + \exp(-w^T x)}$$
to approximate the target function f (x) = P (+1 | x). We use the property that the hypotheses
are sigmoid (s-shaped) to simplify the likelihood function and then take maximum likelihood to
derive the error function Ein . Now, consider another family of sigmoid hypotheses,
$$\tilde{h}(x) = \frac{1}{2}\left(\frac{w^T x}{\sqrt{1 + (w^T x)^2}} + 1\right).$$
Follow the same derivation steps to obtain the corresponding Ẽin when using h̃. What is ∇Ẽin (w)?
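After working out Ẽin and ∇Ẽin on paper, a central-difference check is a common way to verify the gradient numerically. The sketch below (Python/numpy, not part of the required solution) only provides the finite-difference machinery; Ein_tilde and my_gradient in the usage comment are hypothetical names standing in for your own implementation of the derived error and gradient.

```python
import numpy as np

def numerical_gradient(E, w, eps=1e-6):
    """Central-difference estimate of the gradient of E at w."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (E(w + e) - E(w - e)) / (2 * eps)
    return g

# Example usage, once you have written Ein_tilde(w) and my_gradient(w) yourself:
#   analytic = my_gradient(w)
#   numeric  = numerical_gradient(Ein_tilde, w)
#   print(np.allclose(analytic, numeric, atol=1e-5))
```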
10. (20 points, code needed, human-graded) Next, we use a real-world data set to study linear regression. Please download the cpusmall_scale data set at
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/cpusmall_scale
We use the column-scaled version instead of the original version so you’d likely encounter fewer
numerical issues.
The data set contains 8192 examples. In each experiment, you are asked to
For N = 32, run the experiment above 1126 times, and plot a scatter plot of the (Ein(wlin), Eout(wlin)) pairs obtained from the 1126 experiments. Describe your findings.
Then, provide the first page of the snapshot of your code as a proof that you have written the code.
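For reference, a minimal Python sketch for this kind of experiment is given below (any language/platform is allowed, and this is not the required solution). It loads the libsvm-format file with sklearn, adds the constant dimension, and fits wlin by the pseudo-inverse; the protocol of sampling N training examples uniformly at random, using the squared error, and estimating Eout on the remaining examples is an assumption for illustration, so follow the experiment description given in the problem.

```python
import numpy as np
from sklearn.datasets import load_svmlight_file

# Load the column-scaled cpusmall data set (libsvm format).
X_raw, y = load_svmlight_file("cpusmall_scale")
X_raw = X_raw.toarray()
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])    # prepend x0 = 1

def one_experiment(N, rng):
    """Assumed protocol: sample N training examples at random, fit w_lin by
    the pseudo-inverse, and estimate Eout on the remaining examples."""
    idx = rng.permutation(len(y))
    tr, te = idx[:N], idx[N:]
    w = np.linalg.pinv(X[tr]) @ y[tr]
    Ein = np.mean((X[tr] @ w - y[tr]) ** 2)
    Eout = np.mean((X[te] @ w - y[te]) ** 2)
    return Ein, Eout

rng = np.random.default_rng(1126)
results = np.array([one_experiment(32, rng) for _ in range(1126)])
# results[:, 0] versus results[:, 1] can then be drawn as the scatter plot.
```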
11. (20 points, code needed, human-graded) For each of N = 25, 50, 75, 100, . . . , 2000, calculate Ēin (N )
and Ēout (N ) by averaging Ein and Eout over 16 experiments. Then, plot the learning curves that
show Ēin (N ) and Ēout (N ) as functions of N on the same figure. Describe your findings.
Then, provide the first page of the snapshot of your code as a proof that you have written the code.
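Continuing the sketch above (again only as an illustration under the same assumed protocol), the learning curves can be averaged and plotted as follows, assuming one_experiment together with the loaded X and y are still in scope and matplotlib is available.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1126)
Ns = np.arange(25, 2001, 25)
Ein_bar, Eout_bar = [], []
for N in Ns:
    runs = np.array([one_experiment(N, rng) for _ in range(16)])
    Ein_bar.append(runs[:, 0].mean())
    Eout_bar.append(runs[:, 1].mean())

plt.plot(Ns, Ein_bar, label="average Ein(N)")
plt.plot(Ns, Eout_bar, label="average Eout(N)")
plt.xlabel("N")
plt.ylabel("average squared error")
plt.legend()
plt.show()
```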
12. (20 points, code needed, human-graded) Repeat Problem 11, but using the first 2 features for each
example instead of all 12 features. That is, run linear regression with x = [x0 , x1 , x2 ] instead.
Describe your findings. In particular, compare your results here to those of Problem 11.
Then, provide the first page of the snapshot of your code as a proof that you have written the code.
13. (Bonus 20 points, human graded) Please note that this part is related to the “optional” lecture
6 of the course. If you want to get the bonus, you need to do something “extra” to at least
understand the definitions below. We hope that this reminds everyone that you do not always
need to solve the bonus problem! In Lecture 6, we proved that $B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i}$. Now, prove that $B(N, k) \ge \sum_{i=0}^{k-1} \binom{N}{i}$. Thus, $B(N, k) = \sum_{i=0}^{k-1} \binom{N}{i}$.