
Machine Learning Foundations (NTU, Fall 2024) instructor: Hsuan-Tien Lin

Homework #0
RELEASE DATE: 09/02/2024
DUE DATE: 10/07/2024, 13:00, on Gradescope
QUESTIONS ARE WELCOME ON DISCORD (INFORMALLY) OR NTU COOL (FORMALLY).

Please use Gradescope to upload your choices. For homework 0, you do not need to upload your
scanned/printed solutions.
Any form of cheating, lying, or plagiarism will not be tolerated. Students can get zero scores and/or fail
the class and/or be kicked out of school and/or receive other punishments for such misconduct.
Discussions on course materials and homework solutions are encouraged. But you should write the final
solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but
not copied from.
Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework
solutions and/or source codes to your classmates at any time. In order to maximize the level of fairness
in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will
be punished according to the honesty policy.

This homework set is worth 40 points, which is much smaller than that of a usual homework
set. For each problem, there is one correct choice. If you choose the correct answer, you
get 2 points; if you choose an incorrect answer, you get 0 points.

Combinatorics and Probability


1. Let C(N, K) = 1 for K = 0 or K = N , and C(N, K) = C(N − 1, K) + C(N − 1, K − 1) for N ≥ 1.
What is the closed-form equation of C(N, K) for N ≥ 1 and 0 ≤ K ≤ N ?
[a] C(N, K) = N! / (K! (N − K)!)
[b] C(N, K) = Σ_{k=0}^{K} N! / (k! (N − k)!)
[c] C(N, K) = K! (N − K)! / N!
[d] C(N, K) = Σ_{k=0}^{K} k! (N − k)! / N!
[e] none of the other choices
2. What is the probability of getting exactly 3 tails when flipping 10 fair coins? Choose the closest
number.

[a] 0.0
[b] 0.1
[c] 0.2
[d] 0.3
[e] 0.4
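A direct computation is one way to sanity-check a binomial probability like this (a sketch added for self-checking; the helper name is ours, not part of the assignment):

```python
from math import comb

# P(exactly k tails in n fair flips) = C(n, k) / 2^n
def prob_exact_tails(n, k):
    return comb(n, k) / 2 ** n

p = prob_exact_tails(10, 3)  # C(10, 3) / 1024 = 120 / 1024
```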

3. If your friend flipped a fair coin three times, and then tells you that one of the tosses resulted in
head, what is the probability that all three tosses resulted in heads?

[a] 1/8
[b] 3/8
[c] 7/8
[d] 1/7
[e] 1/3
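Because there are only eight equally likely outcomes, the conditional probability can be checked by enumeration (a sketch; it assumes the usual reading "at least one toss was a head"):

```python
from itertools import product

# All 8 equally likely outcomes of three fair flips.
outcomes = list(product("HT", repeat=3))

# Condition on "at least one toss resulted in head".
given = [o for o in outcomes if "H" in o]
p_all_heads = sum(o == ("H", "H", "H") for o in given) / len(given)
```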

1 of 5
Machine Learning Foundations (NTU, Fall 2024) instructor: Hsuan-Tien Lin

4. A program selects a random integer x like this: a random bit is first generated uniformly. If the bit
is 0, x is drawn uniformly from {0, 1, . . . , 3}; otherwise, x is drawn uniformly from {0, −1, . . . , −7}.
If we get an x from the program with |x| = 1, what is the probability that x is negative?

[a] 1/3
[b] 1/4
[c] 1/2
[d] 1/12
[e] 2/3
5. For N random variables x1 , x2 , . . . , xN , let their mean be x̄ = (1/N) Σ_{n=1}^{N} xn and
variance be σx² = (1/(N − 1)) Σ_{n=1}^{N} (xn − x̄)². Which of the following is provably the
same as σx²?

[a] (1/N) Σ_{n=1}^{N} (xn² − x̄²)
[b] (1/(N − 1)) Σ_{n=1}^{N} (xn² − x̄²)
[c] (1/(N − 1)) Σ_{n=1}^{N} (x̄² − xn²)
[d] (N/(N − 1)) x̄²
[e] none of the other choices
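Expanding the square shows Σ (xn − x̄)² = Σ xn² − N x̄² = Σ (xn² − x̄²), which a quick numerical check confirms (a sketch on random data of our own choosing):

```python
import random

random.seed(0)
xs = [random.random() for _ in range(50)]
N = len(xs)
x_bar = sum(xs) / N

# Variance exactly as defined in the problem (1/(N-1) normalization).
var = sum((x - x_bar) ** 2 for x in xs) / (N - 1)

# Candidate with the same 1/(N-1) factor but summing (x_n^2 - x_bar^2).
candidate = sum(x * x - x_bar * x_bar for x in xs) / (N - 1)
```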

6. For two events A and B with P (A) = 0.2 and P (B) = 0.5, what is the tightest
possible range of P (A ∪ B)?

[a] [0.3, 0.4]


[b] [0, 0.4]
[c] [0.5, 0.7]
[d] [0.3, 1]
[e] [0.2, 0.7]

Linear Algebra
7. Consider a line w0 + w1 x1 + w2 x2 = 0 on the (x1 , x2 ) plane with a non-zero w1 . Which of the
following points is on the line?

[a] (w0 /w1 , 0)
[b] (−w0 /w1 , 0)
[c] (w2 , w1 )
[d] ((1/2) w0 w2 , (1/2) w0 w1 )
[e] none of the other choices
 
8. What is the diagonal of the inverse of

   [ 0 2 4 ]
   [ 2 4 2 ]
   [ 3 3 1 ]  ?
[a] [3/4, 1/4, 1/8]
[b] [1/4, 1/8, 3/4]
[c] [1/4, 3/4, 1/8]
[d] [1/8, 3/4, 1/4]
[e] none of the other choices
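For a 3 × 3 matrix, the diagonal of the inverse can be computed by hand from cofactors, which a short sketch makes checkable (helper names are ours):

```python
from fractions import Fraction

M = [[0, 2, 4],
     [2, 4, 2],
     [3, 3, 1]]

def det3(m):
    # Cofactor expansion of a 3x3 determinant along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def minor(m, i, j):
    # 2x2 determinant left after deleting row i and column j.
    rows = [r for k, r in enumerate(m) if k != i]
    sub = [[v for l, v in enumerate(r) if l != j] for r in rows]
    return sub[0][0] * sub[1][1] - sub[0][1] * sub[1][0]

# Diagonal entry (i, i) of the inverse is cofactor C_ii over det(M);
# the sign (-1)^(i+i) is always +1 on the diagonal.
diag = [Fraction(minor(M, i, i), det3(M)) for i in range(3)]
```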
 
9. What is the largest eigenvalue of

   [ 2023    1    1 ]
   [    2 2024    2 ]
   [   −1   −1 2021 ]  ?


[a] 2020
[b] 2021
[c] 2022
[d] 2023
[e] 2024
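Since the candidate values are all near the diagonal entries, one can evaluate the characteristic polynomial det(M − tI) at each integer candidate and keep the roots (a sketch; exact integer arithmetic, no libraries assumed):

```python
M = [[2023, 1, 1],
     [2, 2024, 2],
     [-1, -1, 2021]]

def char_poly(t):
    # det(M - t*I) via cofactor expansion along the first row.
    a = [[M[i][j] - (t if i == j else 0) for j in range(3)] for i in range(3)]
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

# Scan a small window of integers around the diagonal entries.
roots = [t for t in range(2019, 2026) if char_poly(t) == 0]
```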

10. For a real matrix M, let M = UΣVT be its singular value decomposition, with U and V being
unitary matrices. Define M† = VΣ† UT , where Σ† [j][i] = 1/Σ[i][j] when Σ[i][j] is nonzero, and 0
otherwise. Which of the following is always the same as MM† M?

[a] MMT M
[b] MVT
[c] UT M
[d] UT MVT
[e] M

11. Which of the following matrices is not guaranteed to be positive semi-definite?


[a] ZT Z for any real matrix Z
[b] a real symmetric matrix S whose eigenvalues are all non-negative
[c] an all-zero square matrix
[d] a real symmetric matrix whose entries are all positive
[e] none of the other choices

12. Consider a fixed x ∈ Rd and some varying u ∈ Rd with ∥u∥ = 1. Which of the following is the
smallest value of uT x?

[a] 0
[b] −∞
[c] −∥x∥
[d] −∥u∥
[e] none of the other choices

13. Consider two parallel hyperplanes in Rd :


H1 : wT x = +3,
H2 : wT x = −2.

What is the distance between H1 and H2 ?

[a] 5
[b] 5/∥w∥
[c] 5/∥w∥2
[d] 5 · ∥w∥
[e] none of the other choices


Calculus
14. Let f (x, y) = xy, x(u, v) = cos(u + v), y(u, v) = sin(u − v). What is ∂f /∂v?
[a] − sin(u + v) sin(u − v) − cos(u + v) cos(u − v)
[b] + sin(u + v) sin(u − v) − cos(u + v) cos(u − v)
[c] − sin(u + v) sin(u − v) + cos(u + v) cos(u − v)
[d] + sin(u + v) sin(u − v) + cos(u + v) cos(u − v)
[e] none of the other choices
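A hand-derived chain-rule expression can be checked against a central finite difference at an arbitrary point (a sketch; the point (0.7, 0.3) and step size are our choices):

```python
import math

def f(u, v):
    return math.cos(u + v) * math.sin(u - v)

# Chain rule by hand: with x = cos(u+v) and y = sin(u-v),
#   df/dv = y * dx/dv + x * dy/dv
#         = -sin(u+v) * sin(u-v) - cos(u+v) * cos(u-v)
def dfdv(u, v):
    return -math.sin(u + v) * math.sin(u - v) - math.cos(u + v) * math.cos(u - v)

u, v, h = 0.7, 0.3, 1e-6
numeric = (f(u, v + h) - f(u, v - h)) / (2 * h)
```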
15. Let E(u, v) = (u e^v − 2v e^(−u))². Calculate the gradient ∇E(u, v) = [∂E/∂u, ∂E/∂v] at
[u, v] = [1, 1]. Choose the closest vector.

[a] [−13.70, −7.86]


[b] [−13.70, +7.86]
[c] [+13.70, −7.86]
[d] [+13.70, +7.86]
[e] [1, 1]
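Central differences give a quick numerical estimate of this gradient without differentiating by hand (a sketch; the step size is our choice):

```python
import math

def E(u, v):
    return (u * math.exp(v) - 2 * v * math.exp(-u)) ** 2

# Central-difference estimate of the gradient at (1, 1).
h = 1e-6
u, v = 1.0, 1.0
gu = (E(u + h, v) - E(u - h, v)) / (2 * h)
gv = (E(u, v + h) - E(u, v - h)) / (2 * h)
```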

16. For some given A > 0, B > 0, what is the optimal α that solves

        min_α  A e^α + B e^(−2α) ?

[a] (1/3) ln(2B/A)
[b] (1/3) ln(A/(2B))
[c] ln(2B/A)
[d] ln(A/(2B))
[e] none of the other choices
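For a candidate minimizer of a smooth one-variable problem like this, the first derivative should vanish there; that is easy to test for arbitrary constants (a sketch; the values of A and B are our own):

```python
import math

# Arbitrary positive constants (our choice, not from the problem).
A, B = 1.5, 4.0

# Candidate minimizer: alpha = (1/3) * ln(2B / A).
alpha = math.log(2 * B / A) / 3

# Derivative of A*e^alpha + B*e^(-2*alpha); should be zero at the minimizer.
deriv = A * math.exp(alpha) - 2 * B * math.exp(-2 * alpha)
```

The second derivative A e^α + 4B e^(−2α) is positive everywhere, so a stationary point is indeed the minimum.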

17. Let w be a vector in Rd and E(w) = (1/2) wT Aw + bT w for some symmetric matrix A and vector b.
What is the gradient ∇E(w)?

[a] wT Aw + wT b
[b] wT Aw − wT b
[c] Aw + b
[d] Aw − b
[e] none of the other choices


18. Let w be a vector in Rd and E(w) = (1/2) wT Aw + bT w for some symmetric and strictly positive
definite matrix A and vector b. What is the optimal w that minimizes E(w)?
[a] +A−1 b
[b] −A−1 b
[c] −A−1 1 + b, where 1 is a vector of all 1’s
[d] +A−1 1 − b
[e] none of the other choices
19. Solve

        min_{w1 ,w2 ,w3}  (1/2)(w1² + 2w2² + 3w3²)   subject to   w1 + w2 + w3 = 11.

What is the optimal w1 ? (Hint: refresh your memory on “Lagrange multipliers”)

[a] 0
[b] 1
[c] 2
[d] 3
[e] 6
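The Lagrange-multiplier conditions for this problem reduce to w1 = λ, 2w2 = λ, 3w3 = λ, after which the constraint pins down λ; exact rational arithmetic verifies the solution (a sketch with variable names of our own):

```python
from fractions import Fraction

# Stationarity of the Lagrangian: w1 = lam, 2*w2 = lam, 3*w3 = lam,
# so w = (lam, lam/2, lam/3); the constraint w1 + w2 + w3 = 11 fixes lam.
lam = Fraction(11) / (1 + Fraction(1, 2) + Fraction(1, 3))
w = (lam, lam / 2, lam / 3)
```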

20. Solve

        min_{w1 ,w2 ,w3}  (1/2)(w1² + 2w2² + 3w3²)
        subject to        w1 + w2 + w3 ≥ 11,
                          w2 + 2w3 ≥ −11.

What is the optimal (w1 , w2 , w3 )? (Hint: you can also consider using “Lagrange multipliers” to
solve this.)

[a] (3, 6, 2)
[b] (3, 2, 6)
[c] (6, 2, 3)
[d] (3, 6, 2)
[e] (6, 3, 2)
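Guessing which inequality is active and re-solving the resulting equality-constrained problem is the usual route; a candidate can then be checked for feasibility and cross-checked against a coarse brute-force search (a sketch; the grid bounds are our own choice):

```python
from fractions import Fraction

def obj(a, b, c):
    return Fraction(1, 2) * (a * a + 2 * b * b + 3 * c * c)

# Candidate with only the first constraint active: the same stationarity
# conditions as the equality-constrained problem give w = (6, 3, 2).
w = (6, 3, 2)
feasible = (sum(w) >= 11) and (w[1] + 2 * w[2] >= -11)

# Coarse brute-force over a small integer grid as a cross-check.
best = min(obj(a, b, c)
           for a in range(0, 12)
           for b in range(0, 12)
           for c in range(0, 12)
           if a + b + c >= 11 and b + 2 * c >= -11)
```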

