
CS229 Problem Set #0 1

CS 229, Autumn 2016


Problem Set #0 Solutions: Linear Algebra and
Multivariable Calculus

Notes: (1) These questions require thought, but do not require long answers. Please be as
concise as possible. (2) If you have a question about this homework, we encourage you to post
your question on our Piazza forum, at https://piazza.com/stanford/autumn2016/cs229. (3)
If you missed the first lecture or are unfamiliar with the collaboration or honor code policy, please
read the policy on Handout #1 (available from the course website) before starting work. (4)
This specific homework is not graded, but we encourage you to solve each of the problems to
brush up on your linear algebra. Some of them may even be useful for subsequent problem sets.
It also serves as your introduction to using Gradescope for submissions.
If you are scanning your document by cellphone, please check the Piazza forum for recommended
cellphone scanning apps and best practices.

1. [0 points] Gradients and Hessians


Recall that a matrix A ∈ Rⁿˣⁿ is symmetric if Aᵀ = A, that is, Aᵢⱼ = Aⱼᵢ for all i, j. Also
recall the gradient ∇f(x) of a function f : Rⁿ → R, which is the n-vector of partial derivatives
\[
\nabla f(x) = \begin{bmatrix} \frac{\partial}{\partial x_1} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{bmatrix},
\qquad \text{where} \quad
x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.
\]

The Hessian ∇²f(x) of a function f : Rⁿ → R is the n × n symmetric matrix of second partial
derivatives,
\[
\nabla^2 f(x) = \begin{bmatrix}
\frac{\partial^2}{\partial x_1^2} f(x) & \frac{\partial^2}{\partial x_1 \partial x_2} f(x) & \cdots & \frac{\partial^2}{\partial x_1 \partial x_n} f(x) \\
\frac{\partial^2}{\partial x_2 \partial x_1} f(x) & \frac{\partial^2}{\partial x_2^2} f(x) & \cdots & \frac{\partial^2}{\partial x_2 \partial x_n} f(x) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2}{\partial x_n \partial x_1} f(x) & \frac{\partial^2}{\partial x_n \partial x_2} f(x) & \cdots & \frac{\partial^2}{\partial x_n^2} f(x)
\end{bmatrix}.
\]

(a) Let f(x) = ½ xᵀAx + bᵀx, where A is a symmetric matrix and b ∈ Rⁿ is a vector. What
is ∇f(x)?
Answer: In short, we know that ∇(½ xᵀAx) = Ax for a symmetric matrix A, while
∇(bᵀx) = b, so ∇f(x) = Ax + b when A is symmetric. In more detail, we have
\[
\frac{1}{2} x^T A x = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j,
\]

so for each k = 1, …, n, we have
\[
\begin{aligned}
\frac{\partial}{\partial x_k} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j
&\stackrel{(i)}{=} \frac{\partial}{\partial x_k} \frac{1}{2} \sum_{i=1, i \neq k}^{n} A_{ik} x_i x_k
 + \frac{\partial}{\partial x_k} \frac{1}{2} \sum_{j=1, j \neq k}^{n} A_{kj} x_k x_j
 + \frac{\partial}{\partial x_k} \frac{1}{2} A_{kk} x_k^2 \\
&\stackrel{(ii)}{=} \frac{1}{2} \sum_{i=1, i \neq k}^{n} A_{ik} x_i
 + \frac{1}{2} \sum_{j=1, j \neq k}^{n} A_{kj} x_j + A_{kk} x_k \\
&= \sum_{i=1}^{n} A_{ki} x_i,
\end{aligned}
\]
where step (i) follows because ∂/∂xₖ Aᵢⱼxᵢxⱼ = 0 if i ≠ k and j ≠ k, step (ii) by the definition
of a partial derivative, and the final equality because Aᵢⱼ = Aⱼᵢ for all pairs i, j. Thus
∇(½ xᵀAx) = Ax. To see that ∇(bᵀx) = b, note that
\[
\frac{\partial}{\partial x_k} b^T x = \frac{\partial}{\partial x_k} \sum_{i=1}^{n} b_i x_i = \frac{\partial}{\partial x_k} b_k x_k = b_k.
\]
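As a quick numerical sanity check of the closed form ∇f(x) = Ax + b, one can compare it against a central finite-difference gradient. The sketch below assumes NumPy is available; the particular matrix and vectors are arbitrary random test values, not part of the problem set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Random symmetric A and random b, x (illustrative test values only)
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
b = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x

# Central finite-difference approximation of each partial derivative
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

grad_closed = A @ x + b  # the closed form derived above
assert np.allclose(grad_fd, grad_closed, atol=1e-5)
```

Since f is quadratic, the central difference agrees with Ax + b up to floating-point rounding.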

(b) Let f(x) = g(h(x)), where g : R → R is differentiable and h : Rⁿ → R is differentiable.
What is ∇f(x)?
Answer: In short, if g′ is the derivative of g, then the chain rule gives
\[
\nabla f(x) = g'(h(x)) \nabla h(x).
\]

Expanding this by components, we have for each i = 1, …, n that
\[
\frac{\partial}{\partial x_i} f(x) = \frac{\partial}{\partial x_i} g(h(x)) = g'(h(x)) \frac{\partial}{\partial x_i} h(x)
\]
by the chain rule. Stacking each of these in a column vector, we obtain
\[
\nabla f(x) = \begin{bmatrix}
g'(h(x)) \frac{\partial}{\partial x_1} h(x) \\ \vdots \\ g'(h(x)) \frac{\partial}{\partial x_n} h(x)
\end{bmatrix} = g'(h(x)) \nabla h(x).
\]
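The chain-rule formula can also be checked numerically. In the sketch below (assuming NumPy), g(t) = sin(t) and h(x) = ‖x‖² are arbitrary smooth choices made purely for illustration.

```python
import numpy as np

# Arbitrary smooth test functions: g(t) = sin(t), h(x) = ||x||^2
g, g_prime = np.sin, np.cos
h = lambda x: x @ x          # h : R^n -> R
grad_h = lambda x: 2 * x     # the known gradient of ||x||^2

f = lambda x: g(h(x))
x = np.array([0.3, -1.2, 0.5])

# Compare a finite-difference gradient against g'(h(x)) * grad h(x)
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(x.size)])
grad_chain = g_prime(h(x)) * grad_h(x)
assert np.allclose(grad_fd, grad_chain, atol=1e-5)
```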

(c) Let f(x) = ½ xᵀAx + bᵀx, where A is symmetric and b ∈ Rⁿ is a vector. What is ∇²f(x)?
Answer: We have ∇²f(x) = A. To see this more formally, note that ∇²(bᵀx) = 0,
because the second derivatives of the terms bᵢxᵢ are all zero. Write A = [a⁽¹⁾ ⋯ a⁽ⁿ⁾], where
a⁽ⁱ⁾ ∈ Rⁿ is the ith column of A (because A is symmetric, the rows equal the columns, so
we also have A = [a⁽¹⁾ a⁽²⁾ ⋯ a⁽ⁿ⁾]ᵀ). Then we use part (1a) to obtain
\[
\frac{\partial}{\partial x_k} \Big( \frac{1}{2} x^T A x \Big) = (a^{(k)})^T x = \sum_{i=1}^{n} A_{ik} x_i,
\]
and thus
\[
\frac{\partial^2}{\partial x_k \, \partial x_i} \Big( \frac{1}{2} x^T A x \Big) = \frac{\partial}{\partial x_i} (a^{(k)})^T x = A_{ik}.
\]
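The identity ∇²f(x) = A can be verified with central second differences; since f is quadratic, the approximation is exact up to rounding. A sketch with arbitrary random test values, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                 # symmetric test matrix
b = rng.standard_normal(n)
x = rng.standard_normal(n)
f = lambda x: 0.5 * x @ A @ x + b @ x

# Central second differences: H[i,j] approximates d^2 f / dx_i dx_j
eps = 1e-4
I = np.eye(n)
H = np.empty((n, n))
for i in range(n):
    for j in range(n):
        H[i, j] = (f(x + eps * (I[i] + I[j])) - f(x + eps * (I[i] - I[j]))
                   - f(x - eps * (I[i] - I[j])) + f(x - eps * (I[i] + I[j]))) / (4 * eps**2)

assert np.allclose(H, A, atol=1e-4)
```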
(d) Let f(x) = g(aᵀx), where g : R → R is continuously differentiable and a ∈ Rⁿ is a vector.
What are ∇f(x) and ∇²f(x)? (Hint: your expression for ∇²f(x) may have as few as 11
symbols, including ′ and parentheses.)

Answer: We use the chain rule (part (1b)) to see that ∇f(x) = g′(aᵀx)a, because
∇(aᵀx) = a. Taking second derivatives, we have
\[
\frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} f(x) = \frac{\partial}{\partial x_i} \, g'(a^T x) a_j = g''(a^T x) a_i a_j.
\]
Expanding this in matrix form, we have
\[
\nabla^2 f(x) = g''(a^T x) \begin{bmatrix}
a_1^2 & a_1 a_2 & \cdots & a_1 a_n \\
a_2 a_1 & a_2^2 & \cdots & a_2 a_n \\
\vdots & \vdots & \ddots & \vdots \\
a_n a_1 & a_n a_2 & \cdots & a_n^2
\end{bmatrix} = g''(a^T x) \, a a^T.
\]
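As a numerical check of ∇²f(x) = g″(aᵀx)aaᵀ, the sketch below takes g(t) = eᵗ (so g″ = exp as well); a, x, and this choice of g are arbitrary illustrative values, assuming NumPy is available.

```python
import numpy as np

# g(t) = e^t, so g''(t) = e^t; a and x are arbitrary test values
a = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.3])
f = lambda x: np.exp(a @ x)

# Central second differences of f
eps = 1e-4
n = a.size
I = np.eye(n)
H = np.empty((n, n))
for i in range(n):
    for j in range(n):
        H[i, j] = (f(x + eps * (I[i] + I[j])) - f(x + eps * (I[i] - I[j]))
                   - f(x - eps * (I[i] - I[j])) + f(x - eps * (I[i] + I[j]))) / (4 * eps**2)

H_closed = np.exp(a @ x) * np.outer(a, a)   # g''(a^T x) a a^T
assert np.allclose(H, H_closed, atol=1e-4)
```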

2. [0 points] Positive definite matrices


A matrix A ∈ Rⁿˣⁿ is positive semidefinite (PSD), denoted A ⪰ 0, if A = Aᵀ and xᵀAx ≥ 0
for all x ∈ Rⁿ. A matrix A is positive definite, denoted A ≻ 0, if A = Aᵀ and xᵀAx > 0 for
all x ≠ 0, that is, all non-zero vectors x. The simplest example of a positive definite matrix is
the identity I (the diagonal matrix with 1s on the diagonal and 0s elsewhere), which satisfies
\[
x^T I x = \|x\|_2^2 = \sum_{i=1}^{n} x_i^2.
\]
(a) Let z ∈ Rⁿ be an n-vector. Show that A = zzᵀ is positive semidefinite.
Answer: Take any x ∈ Rⁿ. Then xᵀAx = xᵀzzᵀx = (xᵀz)² ≥ 0, since (xᵀz) is a real number.
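The argument xᵀ(zzᵀ)x = (xᵀz)² ≥ 0 can be observed numerically; the sketch below uses random test vectors and assumes NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal(5)
A = np.outer(z, z)               # A = z z^T

# x^T A x = (x^T z)^2 >= 0 for many random x, up to rounding
for _ in range(100):
    x = rng.standard_normal(5)
    q = x @ A @ x
    assert q >= -1e-9
    assert np.isclose(q, (x @ z) ** 2)

# Equivalently, every eigenvalue of A is nonnegative
assert np.all(np.linalg.eigvalsh(A) >= -1e-9)
```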
(b) Let z ∈ Rⁿ be a non-zero n-vector. Let A = zzᵀ. What is the null space of A? What is
the rank of A?
Answer: The null space of A is the set of vectors orthogonal to z: if zᵀx = 0, then
x ∈ Null(A), because Ax = zzᵀx = 0; conversely, if zᵀx ≠ 0, then Ax = z(zᵀx) ≠ 0 because
z is non-zero. Thus the null space of A has dimension n − 1 and the rank of A is 1. (In the
edge case n = 1, the null space is the trivial subspace {0}.)
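These rank and null-space facts are easy to confirm for a concrete vector; the values of z and x below are arbitrary, chosen so that zᵀx = 0, and the sketch assumes NumPy.

```python
import numpy as np

z = np.array([1.0, 2.0, -1.0, 0.5])  # arbitrary non-zero test vector
A = np.outer(z, z)

# rank(z z^T) = 1
assert np.linalg.matrix_rank(A) == 1

# Any x orthogonal to z lies in the null space: here z^T x = 1*2 + 2*(-1) = 0
x = np.array([2.0, -1.0, 0.0, 0.0])
assert np.isclose(z @ x, 0.0)
assert np.allclose(A @ x, 0.0)

# Null space dimension is n - rank = n - 1
n = z.size
assert n - np.linalg.matrix_rank(A) == n - 1
```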
(c) Let A ∈ Rⁿˣⁿ be positive semidefinite and B ∈ Rᵐˣⁿ be arbitrary, where m, n ∈ N. Is
BABᵀ PSD? If so, prove it. If not, give a counterexample with explicit A, B.
Answer: Yes, BABᵀ is positive semidefinite. For any x ∈ Rᵐ, we may define v = Bᵀx ∈
Rⁿ. Then
\[
x^T B A B^T x = (B^T x)^T A (B^T x) = v^T A v \ge 0,
\]
where the inequality follows because vᵀAv ≥ 0 for any vector v, as A is PSD.
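The conclusion can be spot-checked numerically: generate a PSD A (CᵀC is a convenient construction), an arbitrary B, and inspect the eigenvalues of BABᵀ. All values below are random test data, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 5
C = rng.standard_normal((n, n))
A = C.T @ C                       # C^T C is always PSD
B = rng.standard_normal((m, n))   # arbitrary m x n matrix

M = B @ A @ B.T
assert np.allclose(M, M.T)                      # symmetric
assert np.all(np.linalg.eigvalsh(M) >= -1e-9)   # eigenvalues >= 0, so PSD
```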
3. [0 points] Eigenvectors, eigenvalues, and the spectral theorem
The eigenvalues of a matrix A ∈ Rⁿˣⁿ are the roots of the characteristic polynomial
p_A(λ) = det(λI − A), which may (in general) be complex. They are also defined as the
values λ ∈ C for which there exists a non-zero vector x ∈ Cⁿ such that Ax = λx. We call such
a pair (x, λ) an eigenvector, eigenvalue pair. In this question, we use the notation diag(λ₁, …, λₙ)
to denote the diagonal matrix with diagonal entries λ₁, …, λₙ, that is,
\[
\mathrm{diag}(\lambda_1, \ldots, \lambda_n) = \begin{bmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
0 & 0 & \lambda_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix}.
\]

(a) Suppose that the matrix A ∈ Rⁿˣⁿ is diagonalizable, that is, A = TΛT⁻¹ for an invertible
matrix T ∈ Rⁿˣⁿ, where Λ = diag(λ₁, …, λₙ) is diagonal. Use the notation t⁽ⁱ⁾ for the
columns of T, so that T = [t⁽¹⁾ ⋯ t⁽ⁿ⁾], where t⁽ⁱ⁾ ∈ Rⁿ. Show that At⁽ⁱ⁾ = λᵢt⁽ⁱ⁾, so
that the eigenvector/eigenvalue pairs of A are (t⁽ⁱ⁾, λᵢ).
Answer: The matrix T is invertible, so if we let t⁽ⁱ⁾ be the ith column of T, we have
\[
I_{n \times n} = T^{-1} T = T^{-1} \begin{bmatrix} t^{(1)} & t^{(2)} & \cdots & t^{(n)} \end{bmatrix}
= \begin{bmatrix} T^{-1} t^{(1)} & T^{-1} t^{(2)} & \cdots & T^{-1} t^{(n)} \end{bmatrix},
\]
so that
\[
T^{-1} t^{(i)} = \begin{bmatrix} \underbrace{0 \cdots 0}_{i-1 \text{ times}} & 1 & \underbrace{0 \cdots 0}_{n-i \text{ times}} \end{bmatrix}^T \in \{0, 1\}^n,
\]
the ith standard basis vector, which we denote by e⁽ⁱ⁾ (that is, the vector of all zeros except
for a 1 in its ith position). Thus
\[
\Lambda T^{-1} t^{(i)} = \Lambda e^{(i)} = \lambda_i e^{(i)}, \qquad \text{and} \qquad
T \Lambda T^{-1} t^{(i)} = \lambda_i T e^{(i)} = \lambda_i t^{(i)}.
\]
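The conclusion At⁽ⁱ⁾ = λᵢt⁽ⁱ⁾ can be verified by constructing a diagonalizable matrix explicitly. In the sketch below (assuming NumPy), T is a random matrix, invertible with probability 1, and the eigenvalues are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
# Build A = T Lambda T^{-1} from a random (almost surely invertible) T
T = rng.standard_normal((n, n))
lam = np.array([1.0, -2.0, 0.5, 3.0])     # arbitrary eigenvalues
A = T @ np.diag(lam) @ np.linalg.inv(T)

# Each column t^(i) of T is an eigenvector of A with eigenvalue lam[i]
for i in range(n):
    assert np.allclose(A @ T[:, i], lam[i] * T[:, i], atol=1e-6)
```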

A matrix U ∈ Rⁿˣⁿ is orthogonal if UᵀU = I. The spectral theorem, perhaps one of the most
important theorems in linear algebra, states that if A ∈ Rⁿˣⁿ is symmetric, that is, A = Aᵀ,
then A is diagonalizable by a real orthogonal matrix. That is, there are a diagonal matrix
Λ ∈ Rⁿˣⁿ and an orthogonal matrix U ∈ Rⁿˣⁿ such that UᵀAU = Λ, or, equivalently,
\[
A = U \Lambda U^T.
\]

Let λᵢ = λᵢ(A) denote the ith eigenvalue of A.


(b) Let A be symmetric. Show that if U = [u⁽¹⁾ ⋯ u⁽ⁿ⁾] is orthogonal, where u⁽ⁱ⁾ ∈
Rⁿ and A = UΛUᵀ, then u⁽ⁱ⁾ is an eigenvector of A and Au⁽ⁱ⁾ = λᵢu⁽ⁱ⁾, where Λ =
diag(λ₁, …, λₙ).
Answer: Once we see that U⁻¹ = Uᵀ because UᵀU = I, this is simply a repeated
application of part (3a).
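The spectral theorem is exactly what NumPy's `eigh` computes for a symmetric matrix: an orthogonal U and the diagonal of Λ with A = UΛUᵀ. A sketch with a random symmetric test matrix, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
S = rng.standard_normal((n, n))
A = (S + S.T) / 2                     # symmetric test matrix

# eigh returns eigenvalues (diagonal of Lambda) and orthogonal U
lam, U = np.linalg.eigh(A)
assert np.allclose(U.T @ U, np.eye(n), atol=1e-10)        # U is orthogonal
assert np.allclose(A, U @ np.diag(lam) @ U.T, atol=1e-10) # A = U Lambda U^T
for i in range(n):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i], atol=1e-10)
```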
(c) Show that if A is PSD, then λᵢ(A) ≥ 0 for each i.
Answer: Since A = Aᵀ, we have A = UΛUᵀ for an orthogonal matrix U ∈ Rⁿˣⁿ by the
spectral theorem. Take the ith eigenvector u⁽ⁱ⁾ (the ith column of U). Because U is
orthogonal, Uᵀu⁽ⁱ⁾ = e⁽ⁱ⁾, the ith standard basis vector. Using this and the fact that A is
PSD, we have
\[
0 \le (u^{(i)})^T A u^{(i)} = (U^T u^{(i)})^T \Lambda \, U^T u^{(i)} = (e^{(i)})^T \Lambda e^{(i)} = \lambda_i(A).
\]
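The implication "A PSD ⇒ all eigenvalues nonnegative" is easy to observe numerically; the sketch below builds a PSD matrix as CᵀC from random test data, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
C = rng.standard_normal((n, n))
A = C.T @ C                      # C^T C is PSD: x^T C^T C x = ||Cx||^2 >= 0

lam = np.linalg.eigvalsh(A)      # eigenvalues of the symmetric matrix A
assert np.all(lam >= -1e-10)     # all nonnegative, up to rounding
```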
