HW 0
Homework #0
RELEASE DATE: 09/02/2024
DUE DATE: 10/07/2024, 13:00, on Gradescope
QUESTIONS ARE WELCOME ON DISCORD (INFORMALLY) OR NTU COOL (FORMALLY).
Please use Gradescope to upload your choices. For homework 0, you do not need to upload your
scanned/printed solutions.
Any form of cheating, lying, or plagiarism will not be tolerated. Students can get zero scores and/or fail
the class and/or be kicked out of school and/or receive other punishments for such misconduct.
Discussions on course materials and homework solutions are encouraged. But you should write the final
solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but
not copied from.
Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework
solutions and/or source codes to your classmates at any time. In order to maximize the level of fairness
in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will
be punished according to the honesty policy.
This homework set is worth 40 points, which is much less than a usual homework
set. Each problem has exactly one correct choice. A correct answer earns
2 points; an incorrect answer earns 0 points.
[a] 0.0
[b] 0.1
[c] 0.2
[d] 0.3
[e] 0.4
3. If your friend flipped a fair coin three times, and then told you that one of the tosses resulted in
heads, what is the probability that all three tosses resulted in heads?
[a] 1/8
[b] 3/8
[c] 7/8
[d] 1/7
[e] 1/3
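Not part of the original set, but the conditioning in Problem 3 can be sanity-checked by enumerating the sample space, under one possible reading of the statement ("one of the tosses resulted in heads" taken as "at least one head"):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2^3 equally likely outcomes of three fair coin flips.
outcomes = list(product("HT", repeat=3))

# Condition on the event "at least one toss is heads".
at_least_one_head = [o for o in outcomes if "H" in o]
all_heads = [o for o in at_least_one_head if o == ("H", "H", "H")]

# Conditional probability = |favorable| / |conditioning event|.
p = Fraction(len(all_heads), len(at_least_one_head))
print(p)
```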
Machine Learning Foundations (NTU, Fall 2024) instructor: Hsuan-Tien Lin
4. A program selects a random integer x like this: a random bit is first generated uniformly. If the bit
is 0, x is drawn uniformly from {0, 1, . . . , 3}; otherwise, x is drawn uniformly from {0, −1, . . . , −7}.
If we get an x from the program with |x| = 1, what is the probability that x is negative?
[a] 1/3
[b] 1/4
[c] 1/2
[d] 1/12
[e] 2/3
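The two-stage generation in Problem 4 is a textbook Bayes setup; as a sketch (not part of the assignment), the posterior can be computed exactly by weighting each integer by its generation probability:

```python
from fractions import Fraction

# Weight each integer by its generation probability: a fair bit first,
# then a uniform draw from the chosen range.
weights = {}
for x in range(0, 4):            # bit = 0: uniform over {0, 1, 2, 3}
    weights[x] = weights.get(x, Fraction(0)) + Fraction(1, 2) * Fraction(1, 4)
for x in range(0, -8, -1):       # bit = 1: uniform over {0, -1, ..., -7}
    weights[x] = weights.get(x, Fraction(0)) + Fraction(1, 2) * Fraction(1, 8)

# Bayes: P(x negative | |x| = 1) = P(x = -1) / (P(x = 1) + P(x = -1))
p_neg = weights[-1] / (weights[1] + weights[-1])
print(p_neg)
```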
5. For N random variables x1, x2, . . . , xN, let their mean be x̄ = (1/N) Σ_{n=1}^{N} xn and variance be
σx² = (1/(N−1)) Σ_{n=1}^{N} (xn − x̄)². Which of the following is provably the same as σx²?
[a] (1/N) Σ_{n=1}^{N} (xn² − x̄²)
[b] (1/(N−1)) Σ_{n=1}^{N} (xn² − x̄²)
[c] (1/(N−1)) Σ_{n=1}^{N} (x̄² − xn²)
[d] (N/(N−1)) x̄²
[e] none of the other choices
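The identity behind Problem 5 comes from expanding the square: Σ(xn − x̄)² = Σxn² − N x̄². A numeric sanity check (not a proof) on random data, added here as a sketch:

```python
import math
import random

random.seed(1)
xs = [random.uniform(-5, 5) for _ in range(10)]
N = len(xs)
xbar = sum(xs) / N

# Definition: sigma^2 = (1/(N-1)) * sum (x_n - xbar)^2
sigma2 = sum((x - xbar) ** 2 for x in xs) / (N - 1)

# Expanding the square gives sum (x_n - xbar)^2 = sum x_n^2 - N*xbar^2,
# which equals sum (x_n^2 - xbar^2) since xbar^2 is summed N times.
candidate = sum(x ** 2 - xbar ** 2 for x in xs) / (N - 1)
print(sigma2, candidate)
```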
6. For two events A and B, if their probability P (A) = 0.2 and P (B) = 0.5, what is the tightest
possible range of P (A ∪ B)?
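(The answer choices for Problem 6 appear to be missing from this copy.) The bounds follow from inclusion-exclusion, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), with the overlap term ranging between its extremes; a small illustrative computation:

```python
# P(A ∪ B) = P(A) + P(B) - P(A ∩ B); the overlap P(A ∩ B) ranges over
# [max(0, P(A)+P(B)-1), min(P(A), P(B))], which bounds the union.
pa, pb = 0.2, 0.5
lo = max(pa, pb)          # maximal overlap (A inside B): smallest union
hi = min(1.0, pa + pb)    # no overlap (A, B disjoint): largest union
print(lo, hi)
```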
Linear Algebra
7. Consider a line w0 + w1 x1 + w2 x2 = 0 on the (x1, x2) plane with a non-zero w1. Which of the
following points is on the line?
[a] (w0/w1, 0)
[b] (−w0/w1, 0)
[c] (w2, w1)
[d] ((1/2) w0 w2, (1/2) w0 w1)
[e] none of the other choices
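Checking a candidate point only requires substituting it into the line equation; a quick exact check with hypothetical coefficients (w0 = 3, w1 = 2, w2 = 5 are made up for illustration):

```python
from fractions import Fraction

# Hypothetical concrete coefficients, just to test candidate points exactly.
w0, w1, w2 = Fraction(3), Fraction(2), Fraction(5)

def on_line(x1, x2):
    # A point lies on the line iff it satisfies w0 + w1*x1 + w2*x2 = 0.
    return w0 + w1 * x1 + w2 * x2 == 0

# Substituting (-w0/w1, 0) gives w0 - w0 + 0 = 0 for any nonzero w1.
print(on_line(-w0 / w1, Fraction(0)))
print(on_line(w0 / w1, Fraction(0)))
```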
8. What is the diagonal of the inverse of the matrix
    [ 0 2 4 ]
    [ 2 4 2 ]
    [ 3 3 1 ] ?
[a] [3/4, 1/4, 1/8]
[b] [1/4, 1/8, 3/4]
[c] [1/4, 3/4, 1/8]
[d] [1/8, 3/4, 1/4]
[e] none of the other choices
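As an editorial sketch (not part of the assignment), the inverse can be computed in exact rational arithmetic with a small Gauss-Jordan routine, avoiding any floating-point doubt about the diagonal:

```python
from fractions import Fraction

def inverse(a):
    """Gauss-Jordan inversion over exact rationals."""
    n = len(a)
    # Augment with the identity matrix.
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Find a usable pivot row and swap it into place.
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        # Normalize the pivot row, then eliminate the column elsewhere.
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]

A = [[0, 2, 4], [2, 4, 2], [3, 3, 1]]
inv = inverse(A)
diag = [inv[i][i] for i in range(3)]
print(diag)
```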
9. What is the largest eigenvalue of the matrix
    [ 2023    1    1 ]
    [    2 2024    2 ]
    [   −1   −1 2021 ] ?
[a] 2020
[b] 2021
[c] 2022
[d] 2023
[e] 2024
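Since all the listed candidates are integers, each can be tested exactly: t is an eigenvalue iff det(M − tI) = 0. A quick integer-arithmetic check (not part of the assignment):

```python
# An integer t is an eigenvalue of M iff det(M - t*I) = 0.
# Evaluating the determinant exactly at the candidate values settles it.
M = [[2023, 1, 1], [2, 2024, 2], [-1, -1, 2021]]

def det3(a):
    # Cofactor expansion along the first row of a 3x3 matrix.
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def char_poly_at(t):
    shifted = [[M[i][j] - (t if i == j else 0) for j in range(3)] for i in range(3)]
    return det3(shifted)

roots = [t for t in range(2020, 2025) if char_poly_at(t) == 0]
print(roots)
```

Since the trace 2023 + 2024 + 2021 = 6068 must equal the sum of all three eigenvalues, the multiplicities follow from the detected roots.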
10. For a real matrix M, let M = UΣVᵀ be its singular value decomposition, with U and V being
unitary matrices. Define M† = VΣ†Uᵀ, where Σ†[j][i] = 1/Σ[i][j] when Σ[i][j] is nonzero, and 0
otherwise. Which of the following is always the same as MM†M?
[a] MMᵀM
[b] MVᵀ
[c] UᵀM
[d] UᵀMVᵀ
[e] M
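A minimal pure-Python sketch of the idea: for a diagonal M the SVD is trivial (U = V = I), the pseudo-inverse simply inverts the nonzero singular values, and the defining Moore-Penrose identity holds even though M is singular:

```python
from fractions import Fraction

def matmul(a, b):
    # Plain triple-loop matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Rank-deficient diagonal M, so U = V = I in its SVD.
M = [[Fraction(3), Fraction(0)], [Fraction(0), Fraction(0)]]
# Pseudo-inverse: invert the nonzero singular values, leave zeros as zeros.
M_pinv = [[Fraction(1, 3), Fraction(0)], [Fraction(0), Fraction(0)]]

print(matmul(matmul(M, M_pinv), M))
```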
12. Consider a fixed x ∈ Rᵈ and some varying u ∈ Rᵈ with ∥u∥ = 1. Which of the following is the
smallest value of uᵀx?
[a] 0
[b] −∞
[c] −∥x∥
[d] −∥u∥
[e] none of the other choices
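By Cauchy-Schwarz, uᵀx ≥ −∥u∥∥x∥ = −∥x∥, with equality at u = −x/∥x∥. A small numeric check with a made-up x (random unit vectors never do better than that choice):

```python
import math
import random

random.seed(0)
x = [2.0, -1.0, 2.0]                      # a fixed made-up vector; ||x|| = 3
norm_x = math.sqrt(sum(v * v for v in x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Equality case of Cauchy-Schwarz: u = -x/||x|| attains u.x = -||x||.
u_star = [-v / norm_x for v in x]
best = dot(u_star, x)

# Random unit vectors never beat it.
for _ in range(1000):
    g = [random.gauss(0, 1) for _ in range(3)]
    n = math.sqrt(sum(v * v for v in g))
    u = [v / n for v in g]
    assert dot(u, x) >= best - 1e-9
print(best, -norm_x)
```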
[a] 5
[b] 5/∥w∥
[c] 5/∥w∥2
[d] 5 · ∥w∥
[e] none of the other choices
Calculus
14. Let f (x, y) = xy, x(u, v) = cos(u + v), y(u, v) = sin(u − v). What is ∂f/∂v?
[a] − sin(u + v) sin(u − v) − cos(u + v) cos(u − v)
[b] + sin(u + v) sin(u − v) − cos(u + v) cos(u − v)
[c] − sin(u + v) sin(u − v) + cos(u + v) cos(u − v)
[d] + sin(u + v) sin(u − v) + cos(u + v) cos(u − v)
[e] none of the other choices
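The chain-rule result for Problem 14 can be verified against a central finite difference; this sketch (with arbitrary test values u = 0.7, v = 0.3) compares the two:

```python
import math

def f(u, v):
    return math.cos(u + v) * math.sin(u - v)   # f = x*y with the given x, y

def dfdv_analytic(u, v):
    # Chain rule: df/dv = y * dx/dv + x * dy/dv
    #           = sin(u-v) * (-sin(u+v)) + cos(u+v) * (-cos(u-v))
    return (-math.sin(u + v) * math.sin(u - v)
            - math.cos(u + v) * math.cos(u - v))

u, v, h = 0.7, 0.3, 1e-6
numeric = (f(u, v + h) - f(u, v - h)) / (2 * h)
print(numeric, dfdv_analytic(u, v))
```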
15. Let E(u, v) = (u e^v − 2v e^(−u))². Calculate the gradient ∇E(u, v) = (∂E/∂u, ∂E/∂v) at
(u, v) = (1, 1). Choose the closest vector.
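(The answer choices for Problem 15 appear to be missing from this copy.) The gradient follows from the chain rule on E = g² with g = u e^v − 2v e^(−u); a sketch that cross-checks the analytic form against finite differences:

```python
import math

def E(u, v):
    return (u * math.exp(v) - 2 * v * math.exp(-u)) ** 2

def grad(u, v):
    # Chain rule on E = g^2 with g = u*e^v - 2v*e^(-u):
    # dE/du = 2g * dg/du, dE/dv = 2g * dg/dv.
    g = u * math.exp(v) - 2 * v * math.exp(-u)
    dg_du = math.exp(v) + 2 * v * math.exp(-u)
    dg_dv = u * math.exp(v) - 2 * math.exp(-u)
    return (2 * g * dg_du, 2 * g * dg_dv)

h = 1e-6
num_u = (E(1 + h, 1) - E(1 - h, 1)) / (2 * h)
num_v = (E(1, 1 + h) - E(1, 1 - h)) / (2 * h)
print(grad(1, 1), (num_u, num_v))
```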
16. For some given A > 0, B > 0, what is the optimal α that solves
        min_α  A e^α + B e^(−2α) ?
[a] (1/3) ln(2B/A)
[b] (1/3) ln(A/(2B))
[c] ln(2B/A)
[d] ln(A/(2B))
[e] none of the other choices
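Setting the derivative A e^α − 2B e^(−2α) to zero gives e^(3α) = 2B/A. A numeric check with hypothetical constants (A = 1.5, B = 4.0 are made up for illustration):

```python
import math

# Hypothetical positive constants, just for a numeric check.
A, B = 1.5, 4.0

def g(alpha):
    return A * math.exp(alpha) + B * math.exp(-2 * alpha)

def g_prime(alpha):
    return A * math.exp(alpha) - 2 * B * math.exp(-2 * alpha)

# Setting g'(alpha) = 0 gives e^(3*alpha) = 2B/A, i.e. alpha = (1/3) ln(2B/A).
alpha_star = math.log(2 * B / A) / 3
print(g_prime(alpha_star))

# g is convex, so the stationary point is the minimizer.
assert g(alpha_star) <= min(g(alpha_star - 0.5), g(alpha_star + 0.5))
```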
17. Let w be a vector in Rᵈ and E(w) = (1/2) wᵀAw + bᵀw for some symmetric matrix A and vector b.
What is the gradient ∇E(w)?
[a] wᵀAw + wᵀb
[b] wᵀAw − wᵀb
[c] Aw + b
[d] Aw − b
[e] none of the other choices
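The quadratic-form gradient can be sanity-checked by finite differences on a small made-up symmetric instance (A, b, and w below are arbitrary illustrative values):

```python
# Finite-difference check of the gradient of E(w) = (1/2) w'Aw + b'w
# for a small made-up symmetric A.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]

def E(w):
    quad = sum(w[i] * A[i][j] * w[j] for i in range(2) for j in range(2))
    return 0.5 * quad + sum(b[i] * w[i] for i in range(2))

def analytic_grad(w):
    # For symmetric A, the gradient is Aw + b.
    return [sum(A[i][j] * w[j] for j in range(2)) + b[i] for i in range(2)]

w, h = [0.4, -0.2], 1e-6
numeric = []
for i in range(2):
    wp = list(w); wp[i] += h
    wm = list(w); wm[i] -= h
    numeric.append((E(wp) - E(wm)) / (2 * h))
print(analytic_grad(w), numeric)
```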
18. Let w be a vector in Rᵈ and E(w) = (1/2) wᵀAw + bᵀw for some symmetric and strictly positive
definite matrix A and vector b. What is the optimal w that minimizes E(w)?
[a] +A⁻¹b
[b] −A⁻¹b
[c] −A⁻¹1 + b, where 1 is a vector of all 1's
[d] +A⁻¹1 − b
[e] none of the other choices
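For strictly positive definite A, the minimizer is the unique stationary point, where the gradient Aw + b vanishes. An exact check on a small made-up instance (the 2×2 A and b are illustrative):

```python
from fractions import Fraction

# Small made-up symmetric positive definite instance; the stationary point
# solves Aw + b = 0.
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
b = [Fraction(1), Fraction(-1)]

# Invert the 2x2 directly: A^{-1} = (1/det) [[a22, -a12], [-a21, a11]].
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
w_star = [-(Ainv[i][0] * b[0] + Ainv[i][1] * b[1]) for i in range(2)]

# The gradient Aw + b vanishes exactly at w* = -A^{-1} b.
grad = [A[i][0] * w_star[0] + A[i][1] * w_star[1] + b[i] for i in range(2)]
print(w_star, grad)
```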
19. Solve
        min_{w1,w2,w3}  (1/2)(w1² + 2w2² + 3w3²)  subject to  w1 + w2 + w3 = 11.
[a] 0
[b] 1
[c] 2
[d] 3
[e] 6
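Following the Lagrange-multiplier route: stationarity of (1/2)(w1² + 2w2² + 3w3²) − λ(w1 + w2 + w3 − 11) gives w1 = λ, w2 = λ/2, w3 = λ/3, and the constraint then pins down λ. An exact computation of that stationary point:

```python
from fractions import Fraction

# Stationarity gives w1 = lam, w2 = lam/2, w3 = lam/3; the equality
# constraint w1 + w2 + w3 = 11 then fixes lam.
lam = Fraction(11) / (1 + Fraction(1, 2) + Fraction(1, 3))
w = (lam, lam / 2, lam / 3)
print(lam, w, sum(w))
```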
20. Solve
        min_{w1,w2,w3}  (1/2)(w1² + 2w2² + 3w3²)
subject to w1 + w2 + w3 ≥ 11,
w2 + 2w3 ≥ −11.
What is the optimal (w1 , w2 , w3 )? (Hint: you can also consider using “Lagrange multipliers” to
solve this.)
[a] (3, 6, 2)
[b] (3, 2, 6)
[c] (6, 2, 3)
[d] (3, 6, 2)
[e] (6, 3, 2)
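The listed candidate points can be compared directly: check feasibility against both constraints and evaluate the objective at each. A small exact-arithmetic sketch (note it only compares the given candidates, which is not by itself a proof of global optimality):

```python
from fractions import Fraction

def objective(w1, w2, w3):
    return Fraction(1, 2) * (w1 ** 2 + 2 * w2 ** 2 + 3 * w3 ** 2)

def feasible(w1, w2, w3):
    return w1 + w2 + w3 >= 11 and w2 + 2 * w3 >= -11

# Compare the listed candidate points directly.
candidates = [(3, 6, 2), (3, 2, 6), (6, 2, 3), (6, 3, 2)]
for c in candidates:
    print(c, feasible(*c), objective(*c))
```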