HW 1
Quanliang Liu
9085925288
Instructions: This is a background self-test on the type of math we will encounter in class. If you find many
questions intimidating, we suggest you drop 760 and take it again in the future when you are more prepared.
You can use this latex file as a template to develop your homework. Submit your homework on time as a
single pdf file to Gradescope. There is no need to submit the latex source or any code.
1. Compute y^T X z.

y^T X z = 0
2. Is X invertible? If so, give the inverse; if not, explain why not.

Yes, X is invertible, and the inverse is:

X^{-1} = [ 5   2 ]
         [ -7  -3 ]
2 Calculus [3 pts]
1. If y = e^{−x} + x^6 arctan(z) / z − ln(x / (x + 1)), what is the partial derivative of y with respect to x?

∂y/∂x = −e^{−x} + 6x^5 arctan(z) / z − 1 / (x(x + 1))
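As an optional sanity check (not part of the required answer), a central finite difference in Python can confirm this derivative numerically; the test point (x, z) = (1.5, 2.0) below is arbitrary.

import numpy as np

def y(x, z):
    # y = e^{-x} + x^6 * arctan(z) / z - ln(x / (x + 1))
    return np.exp(-x) + x**6 * np.arctan(z) / z - np.log(x / (x + 1))

def dy_dx(x, z):
    # The partial derivative derived above.
    return -np.exp(-x) + 6 * x**5 * np.arctan(z) / z - 1 / (x * (x + 1))

x0, z0, h = 1.5, 2.0, 1e-6
numeric = (y(x0 + h, z0) - y(x0 - h, z0)) / (2 * h)
print(numeric, dy_dx(x0, z0))  # the two values agree to roughly 1e-9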
A B P( A, B)
0 0 0.3
0 1 0.1
1 0 0.1
1 1 0.5
Matching:
(a) Gamma — (j)
(b) Multinomial — (i)
(c) Laplace — (h)
(d) Poisson — (l)
(e) Dirichlet — (k)

Candidate density/mass functions:
(f) f(x; Σ, µ) = (1 / √((2π)^k det(Σ))) exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ))
(g) f(x; n, α) = (n choose x) α^x (1 − α)^{n−x} for x ∈ {0, . . . , n}; 0 otherwise
(h) f(x; b, µ) = (1 / (2b)) exp(−|x − µ| / b)
(i) f(x; n, α) = (n! / ∏_{i=1}^k x_i!) ∏_{i=1}^k α_i^{x_i} for x_i ∈ {0, . . . , n} and ∑_{i=1}^k x_i = n; 0 otherwise
(j) f(x; α, β) = (β^α / Γ(α)) x^{α−1} e^{−βx} for x ∈ (0, +∞); 0 otherwise
(k) f(x; α) = (Γ(∑_{i=1}^k α_i) / ∏_{i=1}^k Γ(α_i)) ∏_{i=1}^k x_i^{α_i − 1} for x_i ∈ (0, 1) and ∑_{i=1}^k x_i = 1; 0 otherwise
(l) f(x; λ) = λ^x e^{−λ} / x! for all x ∈ Z_+; 0 otherwise
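As an optional check on a few of these formulas (not required), scipy.stats implements the same densities; the parameter values below are arbitrary.

import numpy as np
from math import exp, factorial, gamma
from scipy import stats

# Laplace, item (h): f(x; b, mu) = 1/(2b) * exp(-|x - mu| / b)
b, mu, x = 2.0, 1.0, 0.7
assert np.isclose(1 / (2 * b) * exp(-abs(x - mu) / b),
                  stats.laplace(loc=mu, scale=b).pdf(x))

# Poisson, item (l): f(x; lam) = lam^x * exp(-lam) / x!
lam, k = 3.0, 4
assert np.isclose(lam**k * exp(-lam) / factorial(k),
                  stats.poisson(mu=lam).pmf(k))

# Gamma, item (j): f(x; alpha, beta) = beta^alpha / Gamma(alpha) * x^(alpha - 1) * exp(-beta * x)
alpha, beta, x = 2.5, 1.5, 0.9
assert np.isclose(beta**alpha / gamma(alpha) * x**(alpha - 1) * exp(-beta * x),
                  stats.gamma(a=alpha, scale=1 / beta).pdf(x))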
E[XY] = ∑_x ∑_y x y P(X = x, Y = y)

Since X and Y are independent, P(X = x, Y = y) = P(X = x) P(Y = y), so:

E[XY] = ∑_x ∑_y x y P(X = x) P(Y = y) = (∑_x x P(X = x)) (∑_y y P(Y = y))
Hence, we have:
E[ XY ] = E[ X ]E[Y ]
2. (3 pts) If X and Y are independent random variables, show that Var( X + Y ) = Var( X ) + Var(Y ).
Hint: Var( X + Y ) = Var( X ) + 2Cov( X, Y ) + Var(Y )
Since X and Y are independent, the previous part gives E[XY] = E[X]E[Y], so:

Cov(X, Y) = E[XY] − E[X]E[Y] = 0

Substituting into the hint, Var(X + Y) = Var(X) + 2 · 0 + Var(Y) = Var(X) + Var(Y).
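A brief Monte Carlo illustration of this identity (purely illustrative; the distributions and seed below are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.exponential(scale=2.0, size=n)       # Var(X) = 4
y = rng.normal(loc=1.0, scale=3.0, size=n)   # Var(Y) = 9, drawn independently of X

print(np.var(x + y))           # close to 13
print(np.var(x) + np.var(y))   # also close to 13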
3. (6 pts) If we roll two dice that behave independently of each other, will the result of the first die tell us
something about the result of the second die?
No. Since the two dice behave independently, the result of the first die tells us nothing about the result of the second die.
If, however, the first die’s result is a 1, and someone tells you about a third event — that the sum of
the two results is even — then given this information is the result of the second die independent of the
first die?
Since the first die's result is 1 (odd), the sum is even only if the second die is also odd, i.e., shows 1, 3, or 5. Knowing the first die's result together with the parity of the sum therefore restricts the possible values of the second die, so the result of the second die is no longer independent of the first.
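An illustrative simulation of this conditioning argument (not required for the answer): restricting to rolls where the first die shows 1 and the sum is even leaves only odd values for the second die.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
d1 = rng.integers(1, 7, size=n)   # first die
d2 = rng.integers(1, 7, size=n)   # second die, independent of the first

# Condition on: first die is 1 AND the sum is even.
mask = (d1 == 1) & ((d1 + d2) % 2 == 0)
values, counts = np.unique(d2[mask], return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))  # only 1, 3, 5 appear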
Given that Xi ∼ N (0, 1), the mean and variance of each Xi are:
E [ Xi ] = 0 and Var( Xi ) = 1
Step 1. The expectation of the sample mean X̄ = (1/n) ∑_{i=1}^n X_i is:

E[X̄] = (1/n) ∑_{i=1}^n E[X_i] = (1/n) · 0 = 0
Step 2. Since the Xi ’s are independent and identically distributed (iid), the variance of X̄ is:
Var(X̄) = (1/n²) ∑_{i=1}^n Var(X_i) = (1/n²) · n = 1/n
Step 3. For the scaled sample mean √n X̄, the expectation is:
E[√n X̄] = √n · E[X̄] = √n · 0 = 0
The variance of √n X̄ is:
Var(√n X̄) = n · Var(X̄) = n · (1/n) = 1
As a result, the distribution of the standardized sample mean approaches the normal distribution as
n → ∞. Specifically, we have:
√n X̄ →^d N(0, 1)   (convergence in distribution)
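A small simulation sketch of this conclusion (illustrative only; n and the number of replicates are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 100_000
samples = rng.standard_normal((reps, n))          # each row holds X_1, ..., X_n ~ N(0, 1)
scaled_means = np.sqrt(n) * samples.mean(axis=1)  # sqrt(n) * X_bar for every replicate

print(scaled_means.mean(), scaled_means.var())    # approximately 0 and 1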
6 Linear algebra
6.1 Norms [5 pts]
Draw the regions corresponding to vectors x ∈ R2 with the following norms:
2. ||x||_2 ≤ 1 (Recall that ||x||_2 = √(∑_i x_i²))
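The submitted drawing for this item is not reproduced in the text; a minimal matplotlib sketch of the region (the closed unit disk of radius 1 centered at the origin) would look like:

import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 2 * np.pi, 400)
plt.fill(np.cos(theta), np.sin(theta), alpha=0.3)  # interior of ||x||_2 <= 1
plt.plot(np.cos(theta), np.sin(theta))             # boundary circle ||x||_2 = 1
plt.gca().set_aspect("equal")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()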
5 0 0
For M = 0 7 0, Calculate the following norms.
0 0 3
4. || M ||2 (L2 norm)
7 (the largest singular value of M)
5. ∥ M ∥ F (Frobenius norm)
||M||_F = √(5² + 7² + 3²) = √(25 + 49 + 9) = √83
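A short numpy confirmation of both values (optional):

import numpy as np

M = np.diag([5.0, 7.0, 3.0])
print(np.linalg.norm(M, 2))        # spectral norm: 7.0
print(np.linalg.norm(M, "fro"))    # Frobenius norm: sqrt(83) ≈ 9.1104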
The distance from a point x_0 to the hyperplane w^T x + b = 0 is

d = |w^T x_0 + b| / ||w||_2

The distance is measured from the origin, so x_0 = 0. Substituting x_0 = 0 into the formula:

d = |w^T 0 + b| / ||w||_2 = |b| / ||w||_2
Thus, the smallest Euclidean distance from the origin to the hyperplane is:
d = |b| / ||w||_2
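A quick numerical illustration with arbitrary w and b (not part of the proof): the closest point on the hyperplane to the origin is its projection −b·w/||w||², and its norm matches the closed form |b| / ||w||_2.

import numpy as np

w = np.array([3.0, -4.0])   # arbitrary example values
b = 10.0

closest = -b * w / np.dot(w, w)        # projection of the origin onto the hyperplane
print(np.dot(w, closest) + b)           # 0.0: the point lies on w^T x + b = 0
print(np.linalg.norm(closest))          # 2.0, the distance from the origin
print(abs(b) / np.linalg.norm(w))       # 2.0, matching |b| / ||w||_2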
2. The Euclidean distance between two parallel hyperplanes w^T x + b_1 = 0 and w^T x + b_2 = 0 is |b_1 − b_2| / ||w||_2. (Hint: you can use the result from the last question to help you prove this one.)
Since the hyperplanes are parallel, they share the same normal vector w. The distance between them is the perpendicular distance from a point on one hyperplane to the other hyperplane. Choose a point x_1 on the first hyperplane w^T x + b_1 = 0; it satisfies the hyperplane equation, so:

w^T x_1 = −b_1
The distance from the point x_1 (which lies on the first hyperplane) to the second hyperplane w^T x + b_2 = 0 is given by the formula:

d = |w^T x_1 + b_2| / ||w||_2

Substituting w^T x_1 = −b_1:

d = |(−b_1) + b_2| / ||w||_2 = |b_2 − b_1| / ||w||_2 = |b_1 − b_2| / ||w||_2
2. Make a scatter plot by drawing 100 items from the mixture distribution
0.3 · N((5, 0)^T, [[1, 0.25], [0.25, 1]]) + 0.7 · N((−5, 0)^T, [[1, −0.25], [−0.25, 1]]).
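A minimal sketch of how such a scatter plot could be produced (assuming numpy and matplotlib are available; the random seed is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100

# Component means and covariances from the problem statement.
mu1, cov1 = np.array([5.0, 0.0]), np.array([[1.0, 0.25], [0.25, 1.0]])
mu2, cov2 = np.array([-5.0, 0.0]), np.array([[1.0, -0.25], [-0.25, 1.0]])

# For each draw, pick a component with probability 0.3 / 0.7, then sample from it.
from_first = rng.random(n) < 0.3
samples = np.where(
    from_first[:, None],
    rng.multivariate_normal(mu1, cov1, size=n),
    rng.multivariate_normal(mu2, cov2, size=n),
)

plt.scatter(samples[:, 0], samples[:, 1], s=15)
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()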