Assignment
2 MATRIX DECOMPOSITION
2.1 Eigenvalues, Eigenvectors and Eigenspaces
2.2 Eigendecomposition
2.3 Singular Value Decomposition (SVD)
2.4 Matrix Approximation
2.5 Solved Problems
2.6 Supplementary Problems
3 VECTOR CALCULUS
3.1 Derivatives, Partial Derivatives and Gradients
3.2 The Chain Rule
3.3 Hessian
3.4 Taylor Series and Taylor Polynomials
3.5 Solved Problems
3.6 Supplementary Problems
5 CONTINUOUS OPTIMIZATION
5.1 Gradient Descent Algorithm
5.2 Convex Sets and Convex Functions
5.3 Constrained Optimization and Lagrange Multipliers
5.4 Linear Programming and Quadratic Programming
5.5 Solved Problems
5.6 Supplementary Problems
List of Figures
3.1 The Chain Rule
5.1 Examples of convex and non-convex sets
5.2 Region {(x, y) | |x|^{1/3} + |y|^{1/3} ≤ 1}
5.3 Region {(x, y) | |x|^{3/2} + |y|^{3/2} ≤ 1}
Chapter 1
ANALYTIC GEOMETRY
1.1 Norms
• The mapping ∥.∥ : V → R, ∥x∥ := √⟨x, x⟩, is the norm induced by the inner product ⟨., .⟩.
• Given an inner product on a vector space V, the distance between two vectors u, v is defined by d(u, v) := ∥u − v∥ = √⟨u − v, u − v⟩.
• (Cauchy–Schwarz inequality). Let (V, ⟨., .⟩) be an inner product space and ∥.∥ be the norm induced by the inner product. Then, for all u, v ∈ V, we have the inequality |⟨u, v⟩| ≤ ∥u∥∥v∥.
• The angle θ between two nonzero vectors u and v is defined by
cos θ = ⟨u, v⟩/(∥u∥∥v∥). (1.2)
• A set of nonzero vectors {v1, · · · , vn} is orthogonal if
⟨vi, vj⟩ = 0 for i ≠ j, with ⟨vi, vi⟩ = ∥vi∥² > 0, (1.3)
and orthonormal if, in addition,
⟨vi, vj⟩ = 0 for i ≠ j, with ∥vi∥² = 1. (1.4)
1.4 Solved Problems
1°. Suppose ∥.∥ is the norm induced by an inner product ⟨., .⟩. Show that if ∥u∥ = 3, ∥v∥ = 4, and ∥u + v∥ = 5, then u and v are orthogonal.
Solution. From the definitions and properties of an inner product and the norm induced from this inner product, we have
25 = ∥u + v∥² = ⟨u + v, u + v⟩ = ∥u∥² + 2⟨u, v⟩ + ∥v∥² = 9 + 2⟨u, v⟩ + 16.
Hence, ⟨u, v⟩ = 0, that is, u and v are orthogonal.
2°. Let A = [1 −1; −1 2] and define ⟨u, v⟩ := uᵀAv for u, v ∈ R².
(a) Show that ⟨., .⟩ is an inner product on R².
Solution.
(a) We show that ⟨., .⟩ is a symmetric and positive definite bilinear mapping. Indeed, let
A = [1 −1; −1 2].
• Symmetry: ⟨u, v⟩ = uT Av = uT AT v = (Au)T v = vT (Au) = ⟨v, u⟩.
• Bilinear: ⟨au+bu′ , v⟩ = (au+bu′ )T Av = (au)T Av+(bu′ )T Av = a⟨u, v⟩+
b⟨u′ , v⟩.
Similarly, ⟨u, av + bv′ ⟩ = a⟨u, v⟩ + b⟨u, v′ ⟩, for all scalars a, b.
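The positive definiteness of A (needed to complete part (a), and not shown in the text above) can also be checked numerically. A minimal sketch in Python, assuming numpy is available:

import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 2.0]])

# Symmetry: A equals its transpose.
assert np.allclose(A, A.T)

# Positive definiteness: all eigenvalues of the symmetric matrix A are positive.
assert np.all(np.linalg.eigvalsh(A) > 0)

# The inner product <u, v> = u^T A v and the induced norm of a sample vector.
u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
print(u @ A @ v)           # <u, v>
print(np.sqrt(u @ A @ u))  # ||u|| induced by the inner product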
Solution.
Let A = [1 −1; −1 3]. Then
vᵀAw = [v1 v2] [1 −1; −1 3] [w1 w2]ᵀ = v1w1 − v1w2 − v2w1 + 3v2w2.
4°. Let a > 0 and b > 0. If v = (x1, y1) and w = (x2, y2), define an inner product on R² by
⟨v, w⟩ := x1x2/a² + y1y2/b².
Find all unit vectors in R².
Solution.
Let v = (x, y) be a unit vector. Then
1 = ∥v∥² = ⟨v, v⟩ = x²/a² + y²/b².
Therefore, the set of all unit vectors is {(x, y) : x²/a² + y²/b² = 1}. This is an ellipse.
5°. Are there vectors u, v such that ⟨u, v⟩ = −7, ∥u∥ = 3, and ∥v∥ = 2?
Solution.
Using the Cauchy–Schwarz inequality, we obtain |⟨u, v⟩| ≤ ∥u∥∥v∥ = 3 · 2 = 6 < 7 = |⟨u, v⟩|, a contradiction. Hence, no such vectors exist.
6°. Let C be the vector space of all continuous functions defined on [0, 1].
(a) Show that ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx is an inner product on C.
(b) Find the length of the function f(x) = 2x − 3.
(c) Find the distance between f(x) = 1 + x and g(x) = 2.
Solution.
Similarly,
⟨f, ag + bh⟩ = a⟨f, g⟩ + b⟨f, h⟩.
(b) ∥f∥² = ⟨f, f⟩ = ∫₀¹ [f(x)]² dx = ∫₀¹ (2x − 3)² dx = 13/3.
Hence, ∥f∥ = √(13/3).
(c) Let h(x) = f(x) − g(x) = x − 1. Then,
∥h∥² = ⟨h, h⟩ = ∫₀¹ [h(x)]² dx = ∫₀¹ (x − 1)² dx = 1/3.
Thus, ∥f − g∥ = √3/3.
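The computations in (b) and (c) can be reproduced symbolically. A sketch assuming sympy; the helper ip below is ours, not part of the original solution:

import sympy as sp

x = sp.symbols('x')

def ip(f, g):
    # Inner product <f, g> = integral of f(x) g(x) over [0, 1].
    return sp.integrate(f * g, (x, 0, 1))

# (b) length of f(x) = 2x - 3:
print(sp.sqrt(ip(2*x - 3, 2*x - 3)))          # sqrt(39)/3, i.e. sqrt(13/3)
# (c) distance between f(x) = 1 + x and g(x) = 2:
print(sp.sqrt(ip((1 + x) - 2, (1 + x) - 2)))  # sqrt(3)/3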
7°. An inner product on the vector space of all matrices of size 2 × 2 is defined by ⟨A, B⟩ := trace(AᵀB). Writing X = [a b; c d] for a matrix orthogonal to the given matrix A, the condition reads
0 = ⟨A, X⟩ = trace(AᵀX) = b − c ⟺ b = c.
8°. Prove that if Q is an orthogonal matrix, then Q preserves the angles between vectors in a vector space with the dot product.
Solution.
We need to show that if Q is an orthogonal matrix, then the angle between Qu and Qv equals the angle between u and v for any vectors u, v. Indeed, for any vector x, we have
∥Qx∥2 = (Qx)T Qx = xT QT Qx = xT Ix = ∥x∥2 .
Therefore,
∥Qx∥ = ∥x∥ (1.7)
On the other hand, for any two vectors x and y,
Qx · Qy = (Qx)T Qy = xT QT Qy = xT y. (1.8)
From (1.7), (1.8) and the formula (1.2), we obtain the conclusion of the problem.
9°. Show that if a matrix is positive definite, then its eigenvalues are real and positive.
Solution.
Suppose A is a positive definite matrix, λ is an eigenvalue of A, and v is a corresponding eigenvector. Then, Av = λv and therefore,
0 < vᵀAv = vᵀ(λv) = λvᵀv = λ∥v∥².
Since ∥v∥² > 0, it follows that λ is real and positive.
10°. Let P2 be the vector space of all polynomials of degree less than or equal to 2. Define an inner product on P2 by
⟨p, q⟩ := p(0)q(0) + p(1)q(1) + p(2)q(2),
and let U = span{x + 1, x²}.
Solution.
First, we use the Gram–Schmidt process to construct an orthonormal basis of U. Let p1(x) = x + 1. Applying (1.5), we have
p2(x) = x² − (⟨x + 1, x²⟩/⟨x + 1, x + 1⟩)(x + 1) = x² − (14/14)(x + 1) = x² − x − 1.
Normalizing pi(x) by setting bi(x) = pi(x)/∥pi(x)∥ (i = 1, 2), we obtain
{b1(x) = (x + 1)/√14, b2(x) = (x² − x − 1)/√3}
as an ONB of U.
Applying (1.6) then gives the orthogonal projection of f(x) onto U.
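A sketch assuming sympy and the evaluation inner product ⟨p, q⟩ = p(0)q(0) + p(1)q(1) + p(2)q(2) reconstructed above; it reproduces the Gram–Schmidt step and both norms:

import sympy as sp

x = sp.symbols('x')

def ip(p, q):
    # Assumed inner product on P2: <p, q> = p(0)q(0) + p(1)q(1) + p(2)q(2).
    return sum((p * q).subs(x, t) for t in (0, 1, 2))

p1 = x + 1
p2 = sp.expand(x**2 - ip(p1, x**2) / ip(p1, p1) * p1)
print(p2)                      # x**2 - x - 1
print(ip(p1, p1), ip(p2, p2))  # 14 and 3, so b1 = p1/sqrt(14), b2 = p2/sqrt(3)
print(ip(p1, p2))              # 0: p1 and p2 are orthogonal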
1.5 Supplementary Problems
2°. Let (V, ⟨., .⟩) be an inner product space and ∥.∥ be the norm induced by the inner product.
(a) Show that ∥u + v∥2 + ∥u − v∥2 = 2∥u∥2 + 2∥v∥2 for any vectors u, v.
(b) Show that ⟨u + v, u − v⟩ = ∥u∥2 − ∥v∥2 for any vectors u, v.
(c) Compute ⟨u, v⟩ if ∥u + v∥ = 8 and ∥u − v∥ = 6.
3°. Show that two unit vectors u, v are orthogonal if ⟨3u − v, u + 3v⟩ = 0.
⟨u, v⟩ = uᵀ [1 0 0; 0 3 0; 0 0 1] v.
7°. In each case, find a symmetric matrix A such that ⟨v, w⟩ = vT Aw.
8°. Let a > 0 and b > 0. If v = (x1, y1) and w = (x2, y2), define an inner product on R² by
⟨v, w⟩ := x1x2/a² + y1y2/b².
Find all vectors orthogonal to (−1, 1).
9°. Let C be the vector space of all continuous functions defined on [0, 1].
(a) Show that ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx is an inner product on C.
(b) Find the distance between f (x) = 1 + x and g(x) = x2 .
11°. An inner product on the vector space of all matrices of size 2 × 2 is defined by ⟨A, B⟩ = tr(AᵀB).
(a) Find the length of the matrix [1 2; 3 −1].
(b) Find two matrices that are orthogonal.
12°. Prove that if Q is an orthogonal matrix, then ∥Qv∥ = ∥v∥ for every vector v.
14°. Show that if ∥.∥ is the norm induced by an inner product ⟨., .⟩, then for all vectors x, y, the triangle inequality holds:
∥x + y∥ ≤ ∥x∥ + ∥y∥.
15°. Let R2 be the inner product space with the inner product ⟨(x, y), (x′ , y ′ )⟩ = xx′ +2yy ′
and ∥.∥ be the norm derived from the inner product ⟨., .⟩. Find the value of α that
minimizes the value of ∥x − αy∥, where x = (1, 2) and y = (0, −1).
16°. Suppose two unit vectors u and v are not orthogonal. Show that w = u − v(vT u)
is orthogonal to v.
18°. Let P2 be the vector space of all polynomials of degree less than or equal to 2. Define
an inner product on P2 ,
Chapter 2
MATRIX DECOMPOSITION
2.1 Eigenvalues, Eigenvectors and Eigenspaces
• If A ∈ Rⁿˣⁿ has eigenvalues λ1, · · · , λn (counted with multiplicity), then
det(A) = λ1 · λ2 · · · λn (2.3)
and
trace(A) = a11 + a22 + · · · + ann = λ1 + λ2 + · · · + λn. (2.4)
2.2 Eigendecomposition
• If A ∈ Rⁿˣⁿ is symmetric, then there exists an orthogonal matrix Q, whose columns are eigenvectors of A, such that
QᵀAQ = D = diag{λ1, · · · , λn}. (2.6)
2.5 Solved Problems
1°. Show that if a matrix is positive semi-definite, then its eigenvalues are real and non-negative.
Solution.
Suppose A is a positive semi-definite matrix and λ is an eigenvalue of A. Then,
there exists an eigenvector v such that Av = λv. Therefore,
0 ≤ vᵀAv = vᵀ(λv) = λvᵀv = λ∥v∥².
Hence, λ is real and non-negative.
2°. Show that if A ∈ Rⁿˣⁿ, then AᵀA is symmetric and has real, non-negative eigenvalues.
Solution.
Since (AᵀA)ᵀ = Aᵀ(Aᵀ)ᵀ = AᵀA, the matrix AᵀA is symmetric. For any v ∈ Rⁿ, we have
vᵀAᵀAv = (Av)ᵀAv = ∥Av∥² ≥ 0.
Hence, AᵀA is positive semi-definite, and it follows from Problem 1° that AᵀA has real and non-negative eigenvalues.
" #
1 0 −1
3°. Find σ1 , the largest singular value of the matrix A = .
0 1 1
Solution. √
Since σ1 = λ1 , where λ1 is the largest eigenvalue of AAT , we first compute AAT
and yield " #
2 −1
AA =T
.
−1 2
The characteristic polynomial
√ of AA
√ is (2 − λ) − 1 and its eigenvalues are λ1 =
T 2
3, and λ2 = 1. Thus, σ1 = λ1 = 3.
4°. Use an SVD to construct the best rank-k approximation Â(k) of a matrix A of rank
r > k. What is the relative error of the approximation with respect to the spectral
norm?
Solution.
Since ∥A∥2 = σ1 and ∥A − Â(k)∥2 = σk+1, the relative error can be computed by
∥A − Â(k)∥2/∥A∥2 = σk+1/σ1.
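A numerical illustration of this identity (a sketch assuming numpy; the random test matrix is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # a generic full-rank 6 x 4 matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]     # best rank-k approximation from the SVD

rel_err = np.linalg.norm(A - A_k, 2) / np.linalg.norm(A, 2)
print(np.isclose(rel_err, s[k] / s[0]))  # True: relative error is sigma_{k+1}/sigma_1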
" #
1 0 1
5°. Find an SVD of the matrix .
−1 1 0
Solution.
AT A = −1 1 0 .
1 0 1
" # "√ #
σ 0 0 3 0 0
• Step 2. Construct Σ. We have Σ = 1 = .
0 σ2 0 0 1 0
• Step 3. Construct U. use the formula
Ai vi
ui = , i = 1, 2,
σi
we have
√
2/ √6
" #
1 0 1
−1/√ 6
−1 1 0 " 1 # " 1 #
1/ 6 √ √
u1 = √ = √
−1
2 and u2 = √1
2
3 2 2
Thus, an SVD of A is
−1
" 1 # "√ # √2 √ √1
√ √1 3 0 0 6 6 6
A= 2 2 0 √1 √1 .
−1
√ √1 0 1 0 −1 −1
2 2
2 2 √ √ √1
3 3 3
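The hand computation can be verified with numpy (a sketch; np.linalg.svd may choose different signs for the singular vectors, so we check the product UΣVᵀ and the singular values rather than the factors themselves):

import numpy as np

A = np.array([[1.0, 0.0, 1.0], [-1.0, 1.0, 0.0]])

U = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)
S = np.array([[np.sqrt(3), 0, 0], [0, 1, 0]])
Vt = np.vstack([np.array([2, -1, 1]) / np.sqrt(6),
                np.array([0, 1, 1]) / np.sqrt(2),
                np.array([-1, -1, 1]) / np.sqrt(3)])

print(np.allclose(U @ S @ Vt, A))          # True: the factorization reproduces A
print(np.linalg.svd(A, compute_uv=False))  # [sqrt(3), 1], matching Sigma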
2.6 Supplementary Problems
2°. Show that if A ∈ Rⁿˣⁿ, then AAᵀ is symmetric and has real, non-negative eigenvalues.
6°. (a) Use an SVD to construct the best rank-k approximation Â(k) of a matrix A
of rank r > k. What is the relative error of the approximation with respect to
the spectral norm?
(b) Suppose a matrix A has positive singular values 4, 3, 2, 1. What is the spectral
norm of this matrix? What is the Frobenius norm of this matrix?
" #
1 0 1
9°. Given an SVD of a matrix A = ,
−1 1 0
−1
" 1 # "√ # √2 √ √1
√ √1 3 0 0 6 6 6
A= 2 2 0 √1 √1 .
−1
√ √1 0 1 0 −1
−1
2 2
2 2 √ √ √1
3 3 3
(a) Let Â(1) be the best rank-1 approximation of A. Find the (1, 2)-entry of Â(1).
(b) Let Â(2) be the best rank-2 approximation of A. Find the (1, 3)-entry of Â(2).
Chapter 3
VECTOR CALCULUS
3.1 Derivatives, Partial Derivatives and Gradients
• The derivative of the function f(x), denoted by f′(x) or (d/dx)f(x), is the function defined by
f′(x) = lim_{h→0} [f(x + h) − f(x)]/h. (3.1)
The domain of f′(x) is the set of all x such that the limit in (3.1) exists and is finite.
• The partial derivative of f(x1, x2, · · · , xn) with respect to xi, i = 1, · · · , n, is the function defined by
∂f/∂xi = lim_{h→0} [f(x1, · · · , xi + h, · · · , xn) − f(x1, · · · , xi, · · · , xn)]/h. (3.2)
When computing the partial derivative of f with respect to xi, we treat the other variables as constants.
• The gradient of a multivariate function x = (x1, x2, · · · , xn) ↦ f(x) ∈ R is the row vector
∇ₓf := [∂f/∂x1 ∂f/∂x2 · · · ∂f/∂xn]. (3.3)
3.3 Hessian
• While the gradient is the collection of all first-order partial derivatives of a function, the Hessian is the collection of all its second-order partial derivatives.
3.4 Taylor Series and Taylor Polynomials
• The Taylor series of a function f at x0 is f(x) = Σ_{n=0}^∞ cn(x − x0)^n, where the coefficients cn are given by cn = f^(n)(x0)/n!.
• When x0 = 0, the Taylor series is called the Maclaurin series, which is of the following form:
f(x) = f(0) + (f′(0)/1!)x + (f″(0)/2!)x² + · · · + (f^(n)(0)/n!)x^n + · · · (3.8)
• For a function of two variables, the Taylor series about (x0, y0) is
f(x, y) = f(x0, y0) + (1/1!)[∂f/∂x(x0, y0)(x − x0) + ∂f/∂y(x0, y0)(y − y0)]
+ (1/2!)[∂²f/∂x²(x0, y0)(x − x0)² + 2(∂²f/∂x∂y)(x0, y0)(x − x0)(y − y0) + ∂²f/∂y²(x0, y0)(y − y0)²]
+ · · · + (1/n!) Σ_{k=0}^n C(n, k)(∂^n f/∂x^k ∂y^(n−k))(x0, y0)(x − x0)^k (y − y0)^(n−k) + · · · . (3.10)
3.5 Solved Problems
1°. Use the chain rule to find ∂f/∂s and ∂f/∂t at (t, s) = (0, 1) for each of the following functions.
(a) f(x, y) = 2x/(1 + y²), x(t, s) = s(1 − t), y(t, s) = t − s.
(b) f(u, v) = u² + v², u(t, s) = s cos t, v(t, s) = t² − s.
Solution.
It is easily seen that, when (t, s) = (0, 1), we have (x, y) = (1, −1) in (a), and (u, v) = (1, −1) with f(u, v) = 2 in (b).
(a) We have
∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s) = [2/(1 + y²)](1 − t) + [−4xy/(1 + y²)²](−1) = 1 − 1 = 0
and
∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t) = [2/(1 + y²)](−s) + [−4xy/(1 + y²)²](1) = −1 + 1 = 0.
(b) We have
∂f/∂s = (∂f/∂u)(∂u/∂s) + (∂f/∂v)(∂v/∂s) = 2u cos t + 2v(−1) = 2 + 2 = 4
and
∂f/∂t = (∂f/∂u)(∂u/∂t) + (∂f/∂v)(∂v/∂t) = 2u(−s sin t) + 2v(2t) = 0 + 0 = 0.
2°. Find the Taylor series of the function f(x) = ln(x + 2) centered at a = 1 and use the Taylor polynomial of degree 2 to approximate the value of f(1.2).
Solution.
We have
f′(x) = 1/(x + 2), f″(x) = −(x + 2)⁻², and f^(n)(x) = (−1)^(n−1)(n − 1)!(x + 2)^(−n).
Hence,
ln(x + 2) = ln 3 + (1/3)(x − 1) − (1/(3² · 2))(x − 1)² + (2/(3³ · 3!))(x − 1)³ + · · · + ((−1)^(n−1)/(n · 3^n))(x − 1)^n + · · · .
The Taylor polynomial of degree 2 of f(x) at a = 1 is
T2(x) = ln 3 + (1/3)(x − 1) − (1/18)(x − 1)².
Then, T2(1.2) ≈ 1.1631 ≈ 1.1632 = ln(3.2) = f(1.2).
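A sketch assuming sympy, confirming both the degree-2 Taylor polynomial and the approximation:

import sympy as sp

x = sp.symbols('x')
f = sp.log(x + 2)

T2 = f.series(x, 1, 3).removeO()  # Taylor polynomial of degree 2 about a = 1
print(T2)                         # log(3) + (x - 1)/3 - (x - 1)**2/18
print(float(T2.subs(x, 1.2)))     # about 1.1631
print(float(sp.log(3.2)))         # about 1.1632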
3°. Given the function f(x, y) = x³ + xy² − 4xy + 2, find the Taylor series of the function around the point (1, −1).
Solution.
We have ∂f/∂x = fx = 3x² + y² − 4y, fy = 2xy − 4x, and
∂²f/∂x² = fxx = 6x, ∂²f/∂x∂y = fxy = fyx = 2y − 4, and fyy = 2x.
Hence, fxxx = 6, fxxy = 0 = fxyx = fyxx, fxyy = fyxy = fyyx = 2, fyyy = 0, and all fourth partial derivatives are zeros.
Therefore,
f(1, −1) = 8, fx(1, −1) = 8, fy(1, −1) = −6, fxx(1, −1) = 6,
fxy(1, −1) = fyx(1, −1) = −6, fyy(1, −1) = 2.
The Taylor series of f(x, y) about (1, −1) is
f(x, y) = 8 + (1/1!)[8(x − 1) − 6(y + 1)]
+ (1/2!)[6(x − 1)² + 2(−6)(x − 1)(y + 1) + 2(y + 1)²]
+ (1/3!)[6(x − 1)³ + 3(2)(x − 1)(y + 1)²].
4°. Find the linear approximation of the function f (x, y, z) = xyz − y 2 + x2 + 2y at the
point (2, 1, −1).
Solution.
We have fx = ∂f/∂x = yz + 2x, fy = xz − 2y + 2, fz = xy, and therefore,
f(2, 1, −1) = 3, fx(2, 1, −1) = 3, fy(2, 1, −1) = −2, fz(2, 1, −1) = 2.
The linear approximation of f at (2, 1, −1) is
L(x, y, z) = 3 + 3(x − 2) − 2(y − 1) + 2(z + 1).
5°. Find the gradient of the function f : Rⁿ → R, f(x) = ∥x − c∥, with respect to x.
Solution.
Since f(x) = ∥x − c∥ = √((x1 − c1)² + (x2 − c2)² + · · · + (xn − cn)²), we have
∂f/∂x1 = (x1 − c1)/∥x − c∥, ∂f/∂x2 = (x2 − c2)/∥x − c∥, · · · , ∂f/∂xn = (xn − cn)/∥x − c∥.
Hence,
∇ₓf = (1/∥x − c∥)[x1 − c1 x2 − c2 · · · xn − cn] = (x − c)ᵀ/∥x − c∥.
6°. Find the Jacobian determinant of (x(t, s), y(t, s)) = (s cos t, t sin s).
Solution.
The Jacobian determinant is
∂(x, y)/∂(t, s) = det [∂x/∂t ∂x/∂s; ∂y/∂t ∂y/∂s] = det [−s sin t cos t; sin s t cos s] = −st sin t cos s − cos t sin s.
7°. Solution.
The gradient of f with respect to (x, y, z) is the 3 × 2 matrix
[1 −3; 1 4; −2 1].
8°. Find the Hessian matrix of the function f(x, y, z) = x³e^(y−z²) at the point (1, 0, −1).
Solution.
• We have fx = 3x²e^(y−z²), fy = x³e^(y−z²), fz = −2zx³e^(y−z²).
• fx = 3x²e^(y−z²) ⇒ fxx = 6xe^(y−z²), fxy = 3x²e^(y−z²), fxz = −6zx²e^(y−z²).
• fy = x³e^(y−z²) ⇒ fyy = x³e^(y−z²), fyz = −2zx³e^(y−z²).
• fz = −2zx³e^(y−z²) ⇒ fzz = −2x³e^(y−z²) + 4z²x³e^(y−z²).
• The Hessian of f at the point (1, 0, −1) is
∇²f(1, 0, −1) = [fxx fxy fxz; fyx fyy fyz; fzx fzy fzz](1, 0, −1) = (1/e)[6 3 6; 3 1 2; 6 2 2].
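A sketch assuming sympy; sp.hessian assembles the matrix of second-order partial derivatives directly:

import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**3 * sp.exp(y - z**2)

H = sp.hessian(f, (x, y, z))
print(H.subs({x: 1, y: 0, z: -1}))
# Matrix([[6*exp(-1), 3*exp(-1), 6*exp(-1)],
#         [3*exp(-1), exp(-1), 2*exp(-1)],
#         [6*exp(-1), 2*exp(-1), 2*exp(-1)]])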
3.6 Supplementary Problems
1°. Find the derivative of each of the following functions.
(a) g(x) = 1/(1 + e^(−ax−b)).
(b) h(t) = 1/(1 + t²).
(c) L(θ) = p^θ(1 − p)^(1−θ).
2°. Find the Taylor series of the function f(x) = x⁴ + 2x³ − 3x² + 8x − 2 centered at a = 1.
3°. Suppose f is a continuous function such that f ′′′ exists and f (0) = 2, f ′ (0) =
−1, f ′′ (0) = 6 and f ′′′ (0) = 12. Find the Taylor polynomial of degree 3 of f around
a = 0.
(a) Find the Taylor series of the function around the point (1, −1).
(b) Use the Taylor polynomial of degree 2 of f around (1, −1) to approximate the
value of f (1.2, −0.9).
5°. Use the linear approximation of the function f (x, y, z) = x2 y − yz 3 + 2x at the point
(0, 1, −1) to approximate the value of f (0.1, 0.8, −1.2).
6°. Given the functions f(u, v) = u(u − v), u(x, y) = x + y, v(x, y) = y. Find the partial derivatives ∂f/∂x, ∂f/∂y.
10°. Find the Hessian matrix of the function f (x, y, z) = x3 + y 2 − 3yez at the point
(1, −1, 0).
11°. Find the gradient of f (x, y, z) = (x3 − 2xy, y + z 2 ) at the point (1, 0, 2).
Chapter 4
PROBABILITY AND DISTRIBUTIONS
• The conditional densities of two continuous random variables X and Y with joint pdf f(x, y) are
fX|y(x) = f(x, y)/fY(y) (4.8)
and
fY|x(y) = f(x, y)/fX(x). (4.9)
where σij = σXiXj = Cov[Xi, Xj] for i ≠ j and σi² is the variance of the variable Xi, i = 1, · · · , n. The variance matrix is symmetric and positive semi-definite.
• If X is a random vector with mean vector E(X) and variance matrix V[X], and Y is a random vector such that Y = AX, where A is a real matrix, then
E[Y] = AE[X] (4.12)
and
V[Y] = AV[X]Aᵀ. (4.13)
• The correlation of two random variables X and Y always satisfies
−1 ≤ corr[X, Y] ≤ 1. (4.15)
• If the joint distribution p(x, y) is Gaussian, then the marginal distributions are also Gaussian (their pdfs are bell-shaped curves), and so are the conditional distributions; in particular,
µx|y = µx + (σ12/σ2²)(y − µy), σ²x|y = σ1² − σ12²/σ2². (4.20)
• The mixture of two Gaussians p1(x) = N(µ1, σ1²) and p2(x) = N(µ2, σ2²) with a weight 0 < α < 1 is defined by
p(x) = αp1(x) + (1 − α)p2(x).
Then, the mean and variance of the corresponding random variable are given by
E(X) = αµ1 + (1 − α)µ2 and V(X) = α(σ1² + µ1²) + (1 − α)(σ2² + µ2²) − [E(X)]².
4.5 Solved Problems
1°. Given the joint distribution of two discrete random variables X and Y,
f(x, y) = c/(x + y) for x = 1, 2, 3; y = 1, 2, and f(x, y) = 0 elsewhere.
(a) Find the constant c. (b) Find P(Y = 2|X = 1). (c) Find E(X).
Solution.
(a) Since
1 = Σ_{x=1}^3 Σ_{y=1}^2 f(x, y) = c(1/2 + 1/3 + 1/3 + 1/4 + 1/4 + 1/5) = (28/15)c,
we obtain c = 15/28.
(b) We have
P(Y = 2|X = 1) = P(Y = 2, X = 1)/P(X = 1) = f(x = 1, y = 2)/Σ_{y=1}^2 f(x = 1, y)
= (5/28)/((15/28)(1/2) + (15/28)(1/3)) = 2/5.
(c) P(X = 1) = Σ_{y=1}^2 f(x = 1, y) = f(1, 1) + f(1, 2) = (15/28)(1/2) + (15/28)(1/3) = 25/56.
Similarly, P(X = 2) = 35/112 and P(X = 3) = 27/112.
Therefore,
E(X) = Σ_{x=1}^3 xP(X = x) = 25/56 + 2 × 35/112 + 3 × 27/112 = 201/112.
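The constant c and both answers can be reproduced with exact rational arithmetic (a sketch in Python using the standard fractions module):

from fractions import Fraction

# pmf f(x, y) = c/(x + y) on x in {1, 2, 3}, y in {1, 2}.
total = sum(Fraction(1, x + y) for x in (1, 2, 3) for y in (1, 2))
c = 1 / total
print(c)  # 15/28

f = {(x, y): c / (x + y) for x in (1, 2, 3) for y in (1, 2)}
print(f[(1, 2)] / (f[(1, 1)] + f[(1, 2)]))                  # P(Y=2 | X=1) = 2/5
print(sum(x * (f[(x, 1)] + f[(x, 2)]) for x in (1, 2, 3)))  # E(X) = 201/112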
2°. Given the joint distribution of two continuous random variables X and Y,
f(x, y) = c(x + y²) for 0 ≤ x ≤ 2, 0 ≤ y ≤ 1, and f(x, y) = 0 elsewhere.
(a) Find the constant c. (b) Find P(X + Y > 2) and E(Y |X = 1).
Solution.
(a) We have
1 = ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dy dx = ∫₀² ∫₀¹ c(x + y²) dy dx = ∫₀² c(x + 1/3) dx = 8c/3,
so c = 3/8.
(b) We have
P(X + Y > 2) = ∫₀¹ ∫_{2−y}^2 f(x, y) dx dy
= ∫₀¹ ∫_{2−y}^2 (3/8)(x + y²) dx dy
= ∫₀¹ (3/8)[x²/2 + y²x]_{x=2−y}^{x=2} dy
= ∫₀¹ (3/8)(2y − y²/2 + y³) dy = 13/32.
The conditional pdf of Y given X = 1 is
fY|x=1(y) = f(1, y)/fX(1) = (3/8)(1 + y²)/(1/2) = (3/4)(1 + y²), 0 ≤ y ≤ 1.
Then,
E(Y |X = 1) = ∫₀¹ yfY|x=1(y) dy = ∫₀¹ y(3/4)(1 + y²) dy = 9/16.
3°. Suppose X and Y are two continuous random variables such that V (X) = 4, V (Y ) =
3, and Cov[X, Y ] = −8. Let Z = 2X − 3Y, find V (Z).
Solution.
V(Z) = V(2X − 3Y) = 2²V(X) + 2 × 2 × (−3) × Cov[X, Y] + (−3)²V(Y) = 139.
" # " #!
−2 3 −2
4°. Given the bivariate Gaussian p(x, y) = N µ = ,Σ = .
1 −2 2
Solution.
Using (4.20), we have
µx|y=0 = µx + (σ12/σ2²)(y − µy) = −2 + (−2/2)(0 − 1) = −1
and
µy|x=−1 = µy + (σ21/σ1²)(x − µx) = 1 + (−2/3)(−1 − (−2)) = 1/3.
5°. Given two Gaussians p1(x) = N(µ1 = 2, σ1² = 3) and p2(x) = N(µ2 = 1, σ2² = 2), consider p(x) = αp1(x) + (1 − α)p2(x), where 0 ≤ α ≤ 1. Then p(x) is the pdf of a continuous random variable X.
(a) Find E(X) and V(X) when α = 0.3. (b) Find P(X > 1) when α = 0.3.
Solution.
(a) We have
E(X) = αµ1 + (1 − α)µ2 = 0.3 × 2 + 0.7 × 1 = 1.3
and
V(X) = α(σ1² + µ1²) + (1 − α)(σ2² + µ2²) − [E(X)]² = 0.3 × 7 + 0.7 × 3 − 1.3² = 2.51.
(b) We have
P(X > 1) = 1 − P(X ≤ 1)
= 1 − ∫_{−∞}¹ p(x) dx
= 1 − ∫_{−∞}¹ [αp1(x) + (1 − α)p2(x)] dx
= 1 − α ∫_{−∞}¹ p1(x) dx − (1 − α) ∫_{−∞}¹ p2(x) dx
= 1 − αP(Z ≤ (1 − 2)/√3) − (1 − α)P(Z ≤ (1 − 1)/√2)
= 1 − 0.3 × 0.282 − 0.7 × 0.5 = 0.5654.
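A sketch assuming scipy; note that scipy.stats.norm is parameterized by the standard deviation, so the variances 3 and 2 enter through their square roots:

from scipy.stats import norm

alpha = 0.3
# P(X > 1) for the mixture 0.3 N(2, 3) + 0.7 N(1, 2).
p = 1 - (alpha * norm.cdf(1, loc=2, scale=3**0.5)
         + (1 - alpha) * norm.cdf(1, loc=1, scale=2**0.5))
print(p)  # about 0.565

# Mixture moments with alpha = 0.3:
mean = alpha * 2 + (1 - alpha) * 1                       # 1.3
var = alpha * (3 + 4) + (1 - alpha) * (2 + 1) - mean**2  # 2.51
print(mean, var)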
6°. Consider a random vector X with mean vector µ = [3 4 2]ᵀ and the variance matrix
V[X] = [3 1 0; 1 2 1; 0 1 3].
Let Y = [2 3 1; 1 −2 2] X. Find E(Y) and V[Y].
Solution.
Let A = [2 3 1; 1 −2 2].
(a) E(Y) = AE(X) = [2 3 1; 1 −2 2][3 4 2]ᵀ = [20 −1]ᵀ.
(b) V[Y] = AV[X]Aᵀ = [2 3 1; 1 −2 2][3 1 0; 1 2 1; 0 1 3][2 3 1; 1 −2 2]ᵀ = [51 3; 3 11].
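A sketch assuming numpy, reproducing E(Y) and V[Y] via (4.12) and (4.13):

import numpy as np

mu = np.array([3, 4, 2])
V = np.array([[3, 1, 0], [1, 2, 1], [0, 1, 3]])
A = np.array([[2, 3, 1], [1, -2, 2]])

print(A @ mu)       # E(Y) = [20, -1]
print(A @ V @ A.T)  # V[Y] = [[51, 3], [3, 11]]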
4.6 Supplementary Problems
4. Given the joint distribution of two discrete random variables X and Y,
f(x, y) = cx/y for x = 1, 2; y = 1, 2, 3, and f(x, y) = 0 elsewhere.
5. Suppose X and Y are two continuous random variables such that V (X) = 3, V (Y ) =
5, and Cov[X, Y ] = −4. Let Z = 2X − Y, find V (Z).
" # " #!
2 2 3
6. Given the bivariate Gaussian p(x, y) = N µ = ,Σ = .
3 3 5
7. Given two Gaussians p1 (x) = N (µ1 = 3, σ12 = 4) and p2 (x) = N (µ2 = 3, σ22 = 6).
Consider p(x) = αp1 (x) + (1 − α)p2 (x), where 0 ≤ α ≤ 1.
(a) Prove that p(x) is the probability density function of some continuous random
variable X.
(b) Find E(X) and V (X), when α = 0.3.
(c) Given α = 0.3, find the probability that X < 3.
8. Given the data points in R3 : (1, 1, 0); (−1, 0, 1); (2, 2, 3); (−2, 1, 0).
9. Consider a random vector X with mean vector µ = [2 −1 1]ᵀ and the variance matrix
[2 0 1; 0 2 1; 1 1 3].
Let Y = [−2 1 3; 1 −2 0] X.
Chapter 5
CONTINUOUS OPTIMIZATION
5.1 Gradient Descent Algorithm
• To find a local minimum value of a function f(x), an algorithm called gradient descent can be used as follows (a sketch in code follows this list):
– Choose/guess x0 near the optimal point x∗.
– Repeatedly compute xn using
xn = xn−1 − γ[∇f(xn−1)]ᵀ, (5.1)
for n = 1, 2, · · · , until the stopping criterion is met.
• For a good learning rate γ > 0, the sequence f(xn) is decreasing and tends to the local minimum value f(x∗).
• A small learning rate may cause slow convergence, while the algorithm may fail to converge if the learning rate is too large. The learning rate can be changed at each iterative step of the algorithm.
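A minimal sketch of (5.1) in Python with a fixed learning rate (it assumes numpy; the test function and gradient are those of Problem 3° later in this chapter):

import numpy as np

def gradient_descent(grad, x0, gamma=0.1, n_steps=100):
    # Iterate x_n = x_{n-1} - gamma * grad(x_{n-1}), as in (5.1).
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - gamma * grad(x)
    return x

# f(x, y) = x^2 + 3y^2 - 2xy has gradient (2x - 2y, 6y - 2x) and minimizer (0, 0).
grad_f = lambda v: np.array([2*v[0] - 2*v[1], 6*v[1] - 2*v[0]])
print(gradient_descent(grad_f, [0, 1], gamma=0.1, n_steps=2))    # [0.24, 0.2]
print(gradient_descent(grad_f, [0, 1], gamma=0.1, n_steps=200))  # near (0, 0)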
• To solve a constrained problem by the method of Lagrange multipliers, form the Lagrangian L(x, λ) and solve the system
∂L/∂x = 0, ∂L/∂λ = 0. (5.3)
• By convex optimization, we mean that f(.) and the gi(.) are convex functions and the sets {x | hj(x) = 0} are convex.
• A quadratic program has the form
min_{x∈Rⁿ} (1/2)xᵀQx + cᵀx subject to Ax ≤ b, (5.7)
5.5 Solved Problems
1°. Show that the function f(x, y, z) = x² + y² + z² − xy is convex.
Solution.
The Hessian of f is
H = [2 −1 0; −1 2 0; 0 0 2].
We have det[2] = 2 > 0, det[2 −1; −1 2] = 3 > 0, and det(H) = 6 > 0. Since all leading principal minors of H are positive, H is positive definite. Therefore, f is convex.
2°. Find the minimum value of the function f(x, y) = x² − xy + y² − 4y.
Solution.
Using the Hessian of f, one can show that f is convex; therefore, f has one local minimum, and it is also the global minimum of f. Moreover, f attains its minimum at (x0, y0), at which the partial derivatives vanish. We have
∂f/∂x = 0, ∂f/∂y = 0 ⟺ 2x − y = 0, −x + 2y − 4 = 0 ⟺ x = 4/3, y = 8/3.
The minimum value of f is f(4/3, 8/3) = −16/3.
3°. Use the gradient descent algorithm to find a local minimum value of the function f(x, y) = x² + 3y² − 2xy using (x0, y0) = (0, 1) and learning rate γ = 0.1. Find the point (x2, y2).
Solution.
Computing the partial derivatives of f, we obtain fx(x, y) = 2x − 2y, fy(x, y) = 6y − 2x. Using (5.1), we have
(x1, y1) = (x0, y0) − γ(fx(x0, y0), fy(x0, y0)) = (0, 1) − 0.1(−2, 6) = (0.2, 0.4),
and then,
(x2, y2) = (0.2, 0.4) − 0.1(fx(0.2, 0.4), fy(0.2, 0.4)) = (0.2, 0.4) − 0.1(−0.4, 2) = (0.24, 0.2).
4°. Prove that the set C = {(x, y) | |x|^{1/3} + |y|^{1/3} ≤ 1} is not convex in R².
Solution.
Consider two points (1, 0) and (0, 1) in C. Then for θ = 0.5, we have θ(1, 0) + (1 − θ)(0, 1) = (0.5, 0.5) ∉ C since 0.5^{1/3} + 0.5^{1/3} > 1. Hence, C is not convex (see Figure 5.2).
5°. Is the set {(x, y) | |x|^{3/2} + |y|^{3/2} ≤ 1} convex in R²?
Solution.
It can be seen that the set is convex (Figure 5.3).
Fig. 5.2: Region {(x, y) | |x|^{1/3} + |y|^{1/3} ≤ 1}
Fig. 5.3: Region {(x, y) | |x|^{3/2} + |y|^{3/2} ≤ 1}
6°. Show that the intersection of two convex sets is also convex. Is the union of two
convex sets convex?
Solution.
Suppose A and B are two convex sets and x and y are two points in A ∩ B. For any 0 ≤ θ ≤ 1, we have
θx + (1 − θ)y ∈ A since A is convex,
and
θx + (1 − θ)y ∈ B since B is convex.
Therefore,
θx + (1 − θ)y ∈ A ∩ B.
It follows that A ∩ B is convex.
The union of two convex sets may not be convex. For example, the sets {x | x > 1} and {x | x < 0} are convex, but {x | x > 1 or x < 0} is not convex.
7°. Find the Lagrangian and the dual Lagrangian of the optimization problem
min_{(x,y)∈R²} −6x + y² subject to x² − 2y ≤ 2.
Solution.
Let g(x, y) = x² − 2y − 2 ≤ 0; then the Lagrangian is
L(x, y, λ) = −6x + y² + λ(x² − 2y − 2).
Solving
Lx = −6 + 2λx = 0, Ly = 2y − 2λ = 0
yields x0 = 3/λ, y0 = λ.
Hence, the dual Lagrangian is
D(λ) = min_{(x,y)∈R²} L(x, y, λ) = L(x0, y0, λ) = −λ² − 2λ − 9/λ.
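The stationary point and the dual can be cross-checked symbolically (a sketch assuming sympy and the problem data as reconstructed above):

import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

# Lagrangian of: min -6x + y^2 subject to x^2 - 2y - 2 <= 0.
L = -6*x + y**2 + lam*(x**2 - 2*y - 2)
sol = sp.solve([sp.diff(L, x), sp.diff(L, y)], [x, y], dict=True)[0]
print(sol)                       # {x: 3/lambda, y: lambda}
print(sp.simplify(L.subs(sol)))  # -lambda**2 - 2*lambda - 9/lambda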
8°. Find the dual Lagrangian of the linear program
min_{(x,y)∈R²} 2x − y subject to x − y ≤ 3, 2x + y ≤ 11.
Solution.
This is a linear program of the form
min cᵀx subject to Ax ≤ b,
where c = [2 −1]ᵀ, A = [1 −1; 2 1], and b = [3 11]ᵀ. The dual Lagrangian is
D(λ) = −λᵀb = −3λ1 − 11λ2.
5.6 Supplementary Problems
1°. Use the gradient descent algorithm to find the local minimum value of the function
f (x, y) = x2 + y 2 − 4y using (x0 , y0 ) = (1, 1) and learning rate γ = 0.2. Find the
point (x3 , y3 ).
5°. Find the minimum value of the function f (x, y, z) = x2 + y 2 − xy + z 2 subject to the
constraint x + y = 2z − 4.
6°. Use gradient descent to find the local minimum value of the function f (x, y) =
2x2 + y 2 − xy + 2x using (x0 , y0 ) = (1, 1) and learning rate γ = 0.1. Find the point
(x3 , y3 ).
7°. Find the point closest to the origin and on the line of intersection of the planes
2x + y + 3z = 9 and 3x + 2y + z = 6.
8°. Prove that the set {(x, y) | √|x| + √|y| ≤ 1} is not convex in R².
9°. Is the set {(x, y) | |x|^{3/2} + |y|^{3/2} ≤ 1} convex in R²?
10°. Use the method of Lagrange multipliers to find critical points of the function f (x, y) =
x − 2y subject to the constraint xy = 1.
11°. Find the Lagrangian and the dual Lagrangian of the optimization problem
min_{(x,y)∈R²} x² − 4y subject to 6x − 2y² ≥ 3.
Chapter 6
MULTIPLE CHOICE QUESTIONS
6.1 Questions
(a) 3
(b) 6
(c) 9
(d) 36
2. Compute the norm of u = (2, 1, −1) using the inner product defined by ⟨x, y⟩ := xᵀAy, where A = [1 0 0; 0 2 1; 0 1 3].
(a) 2
(b) √5
(c) √7
(d) 7
(a) 1/3
(b) 2/3
(c) √3/3
(d) √3
(a) (1/2, 0, −1/2)
(b) (1/2, 0, 1/2)
(c) (2, 0, 2)
(d) (2, 0, −2)
7. Find the (2, 1)-entry of projection matrix on the space spanned by b = (1, 0, −1).
(a) 0
(b) −1/2
(c) 1/2
(d) 1
2 3 −1
(a) 1, 2, 3
(b) −2, 2, 4
(c) 1, 2
(d) 1, 3
9. Find the determinant of the matrix [1 2 0 0; 3 4 0 0; 5 6 9 10; 7 8 11 12].
(a) −4
(b) −2
(c) 2
(d) 4
(a) −10
(b) −6
(c) −2
(d) 4
11. Find the dimension of the eigenspace corresponding to the eigenvalue 1 of the matrix [−1 −1 1; −2 0 1; −2 −1 2].
(a) 0
(b) 1
(c) 2
(d) 3
(a) 0
(b) 1
(c) √2
(d) 2
(a) 0
(b) 1/2
(c) 1
(d) √2
" #
1 2 0
14. Let σ1 > σ2 be two largest singular values of the matrix . Find σ12 − σ22 .
0 1 2
(a) 4
(b) 5
(c) 7
(d) 10
" #
1 0
15. Select a matrix P such that P −1 P is a diagonal matrix.
2 3
40 MULTIPLE CHOICE QUESTIONS
" #
0 1
(a) P =
1 −1
" #
0 1
(b) P =
1 1
" #
1 1
(c) P =
0 1
" #
1 1
(d) P =
0 −1
16. Let H be the Hessian of f (x, y, z) = x2 + 3yz 2 − 4x at the point (1, 2, 0). Find tr(H).
(a) 2
(b) 6
(c) 8
(d) 14
17. Given f(u, v) = u² + 2uv and u(s) = 1/s, v(s) = √s. Find the derivative of f with respect to s at s = 4.
(a) −5/32
(b) −5/16
(c) 5/32
(d) 3/16
" #
1 2
18. Find the gradient of f (x) = xT x at x = [1 0]T .
2 1
(a) [1 2]
(b) [2 1]
(c) [2 4]
(d) [4 2]
19. Find the coefficient involving the term (x − 1)²y in the Taylor series of f(x, y) = x²/(1 + y).
(a) −6
(b) −3
(c) −2
(d) −1
20. Let T1 (x, y) be the Taylor polynomial of degree 1 of the function f (x, y) = x2 + y 2 +
2xy at the point (2, −1). Find T1 (3, 0).
(a) 3
(b) 4
(c) 5
(d) 6
21. Given f(x, y) = x² − xy and x(s, t) = s/t, y(s, t) = s − t. Find ∂f/∂s at s = 1, t = 2.
(a) 1/2
(b) 1
(c) 3/2
(d) 2
(a) 1/6
(b) 1/4
(c) 1/3
(d) 1/2
(a) 0.12
(b) 0.2
(c) 10.32
(d) 0.6
24. Two continuous random variables X and Y have the joint pdf
f(x, y) = k(x² + 2y) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.
Find the value of k.
(a) 1/3
(b) 1/2
(c) 3/4
(d) 4/3
25. Given the joint pdf
f(x, y) = (3/4)(x² + 2y) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.
Find E(X).
(a) 1/4
(b) 9/16
(c) 3/4
(d) 1
26. Given the joint pdf
f(x, y) = (3/4)(x² + 2y) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.
Find P(X < 1/2 | Y = 1/2).
(a) 13/32
(b) 13/27
(c) 13/24
(d) 3/4
" #
1 −1
27. Given bivariate Gaussian p(x, y) = N (µ, Σ) where µ = [2 3]T , Σ =
−1 4
Find corr[X, Y ].
(a) −0.5
(b) −0.25
(c) 0.25
(d) 0.5
" #
1 −1
28. Given bivariate Gaussian p(x, y) = N (µ, Σ) where µ = [2 3]T , Σ =
−1 4
Consider Z = 3X − Y . Find V (Z).
(a) 1
(b) 7
(c) 13
(d) 19
" #
1 −1
29. Given bivariate Gaussian p(x, y) = N (µ, Σ) where µ = [2 3]T , Σ =
−1 4
Find E(X | Y = 1).
(a) 0
(b) 1.5
(c) 2
(d) 2.5
(a) A only
(b) B only
(c) Both A and B
(d) Neither A nor B
(a) A only
(b) B only
(c) Both A and B
(d) Neither A nor B
(a) f only
(b) g only
(c) Both f and g
(d) Neither f nor g
(a) f only
(b) g only
(c) Both f and g
(d) Neither f nor g
(a) −12
(b) −10
(c) −9
(d) −8
(a) 0
(b) 2
(c) 3
(d) 4
36. min_{(x,y,z)∈R³} −2x + 3y − z subject to −x + y ≤ −3, 27 + z ≤ 4.
37. Find the dual Lagrangian of the optimization problem
min_{(x,y)∈R²} x² − 4y subject to y² − 2x ≤ 3.
(a) λ² − 4/λ
(b) −λ² − 4/λ
(c) λ² + 4/λ
(d) −λ² + 4/λ
38. Let ∥.∥ be the norm induced by an inner product ⟨., .⟩. Compute ⟨u, v⟩ if ∥u + v∥ = 8 and ∥u − v∥ = 6.
(a) 2
(b) 7
(c) 16
(d) 24
39. Suppose {u, v, w} is an orthonormal set in a vector space with an inner product ⟨., .⟩.
Compute ⟨u − v + w, v − 3w⟩.
(a) −4
(b) −3
(c) 2
(d) 4
40. Consider the mapping f(x) = Ax + x, where A is an n × n real matrix. Find the gradient of f with respect to x.
(a) A
(b) AT
(c) A + I
(d) AT + I
(a) −0.8
(b) −0.4
(c) 1.2
(d) 3.6
6.2 Answer key
1b 2c 3c 4c 5b 6a 7a 8b 9d 10a 11c 12b 13d 14a 15a 16d 17a 18c 19d 20c 21a 22c 23a 24c 25b 26a 27a 28d 29d 30c 31d 32b 33c 34b 35a 36a 37b 38b 39a 40d 41b