CS 229, Autumn 2016 Problem Set #0 Solutions: Linear Algebra and Multivariable Calculus
Notes: (1) These questions require thought, but do not require long answers. Please be as
concise as possible. (2) If you have a question about this homework, we encourage you to post
your question on our Piazza forum, at https://piazza.com/stanford/autumn2016/cs229. (3)
If you missed the first lecture or are unfamiliar with the collaboration or honor code policy, please
read the policy on Handout #1 (available from the course website) before starting work. (4)
This specific homework is not graded, but we encourage you to solve each of the problems to
brush up on your linear algebra. Some of them may even be useful for subsequent problem sets.
It also serves as your introduction to using Gradescope for submissions.
If you are scanning your document by cellphone, please check the Piazza forum for recommended
cellphone scanning apps and best practices.
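where step (i) follows because $\frac{\partial}{\partial x_k} A_{ij} x_i x_j = 0$ if $i \neq k$ and $j \neq k$, step (ii) by the definition
of a partial derivative, and the final equality because $A_{ij} = A_{ji}$ for all pairs $i, j$. Thus
$\nabla(\frac{1}{2} x^T A x) = Ax$. To see that $\nabla b^T x = b$, note that
\[
\frac{\partial}{\partial x_k} b^T x = \frac{\partial}{\partial x_k} \sum_{i=1}^n b_i x_i = \frac{\partial}{\partial x_k} b_k x_k = b_k.
\]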
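The two identities above can be sanity-checked numerically. The sketch below is an illustration only, not part of the original solution: the matrix `A`, the vectors `b` and `x`, and all function names are made up for the example. It compares the closed-form gradient $Ax + b$ of $\frac{1}{2} x^T A x + b^T x$ (for a symmetric $A$) against central finite differences.

```python
def quad_plus_linear(A, b, x):
    """Evaluate 0.5 * x^T A x + b^T x for a list-of-lists A and lists b, x."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(b[i] * x[i] for i in range(n))


def analytic_gradient(A, b, x):
    """Closed-form gradient Ax + b; this formula assumes A is symmetric."""
    n = len(x)
    return [sum(A[k][j] * x[j] for j in range(n)) + b[k] for k in range(n)]


def numeric_gradient(f, x, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Hypothetical example data: a symmetric 3x3 matrix and arbitrary vectors.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, -1.0], [0.0, -1.0, 4.0]]
b = [1.0, -2.0, 0.5]
x = [0.3, -1.2, 2.0]

g_exact = analytic_gradient(A, b, x)
g_approx = numeric_gradient(lambda z: quad_plus_linear(A, b, z), x)
max_error = max(abs(p - q) for p, q in zip(g_exact, g_approx))
print("largest gradient mismatch:", max_error)
```

If `A` is replaced by a non-symmetric matrix, the mismatch becomes large, which reflects the role of $A_{ij} = A_{ji}$ in the final equality.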
\[
\nabla f(x) = g'(h(x)) \nabla h(x).
\]
and thus
\[
\frac{\partial^2}{\partial x_i \partial x_k} \left( \frac{1}{2} x^T A x \right) = \frac{\partial}{\partial x_i} a^{(k)T} x = A_{ik}.
\]
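As a rough numerical check that the Hessian of $\frac{1}{2} x^T A x$ is $A$ itself, the sketch below differentiates the gradient $Ax$ by central differences. It takes the gradient formula as given and assumes a symmetric `A`; the data and function names are hypothetical and chosen only for illustration.

```python
def gradient(A, x):
    """Gradient of 0.5 * x^T A x for symmetric A, namely Ax."""
    n = len(x)
    return [sum(A[k][j] * x[j] for j in range(n)) for k in range(n)]


def numeric_hessian(A, x, h=1e-4):
    """Column i holds the central difference of the gradient along x_i."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        g_plus, g_minus = gradient(A, x_plus), gradient(A, x_minus)
        for k in range(n):
            H[k][i] = (g_plus[k] - g_minus[k]) / (2 * h)
    return H


# Hypothetical example data.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, -1.0], [0.0, -1.0, 4.0]]
x = [0.3, -1.2, 2.0]

H = numeric_hessian(A, x)
hess_error = max(abs(H[k][i] - A[k][i]) for k in range(3) for i in range(3))
print("largest Hessian mismatch:", hess_error)
```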
(d) Let $f(x) = g(a^T x)$, where $g : \mathbb{R} \to \mathbb{R}$ is continuously differentiable and $a \in \mathbb{R}^n$ is a vector.
What are $\nabla f(x)$ and $\nabla^2 f(x)$? (Hint: your expression for $\nabla^2 f(x)$ may have as few as 11
symbols, including $'$ and parentheses.)
Answer: We use the chain rule (part (1b)) to see that $\nabla f(x) = g'(a^T x) a$, because
$\nabla(a^T x) = a$. Taking second derivatives, we have
\[
\frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} f(x) = \frac{\partial}{\partial x_i} g'(a^T x) a_j = g''(a^T x) a_i a_j.
\]
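Expanding this in matrix form, we have
\[
\nabla^2 f(x) = g''(a^T x)
\begin{bmatrix}
a_1^2 & a_1 a_2 & \cdots & a_1 a_n \\
a_2 a_1 & a_2^2 & \cdots & a_2 a_n \\
\vdots & \vdots & \ddots & \vdots \\
a_n a_1 & a_n a_2 & \cdots & a_n^2
\end{bmatrix}
= g''(a^T x) a a^T.
\]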
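The formulas $\nabla f(x) = g'(a^T x) a$ and $\nabla^2 f(x) = g''(a^T x) a a^T$ can be checked on a concrete case. The sketch below picks $g = \sin$ purely as an assumption for illustration (so $g' = \cos$ and $g'' = -\sin$); the vectors `a` and `x` and the helper names are likewise hypothetical. Both derivatives are compared against finite differences of $f$ itself.

```python
import math


def f(a, x):
    """f(x) = g(a^T x) with the illustrative choice g = sin."""
    return math.sin(sum(ai * xi for ai, xi in zip(a, x)))


def analytic_gradient(a, x):
    """g'(a^T x) * a, with g' = cos."""
    t = sum(ai * xi for ai, xi in zip(a, x))
    return [math.cos(t) * ai for ai in a]


def analytic_hessian(a, x):
    """g''(a^T x) * a a^T, with g'' = -sin."""
    t = sum(ai * xi for ai, xi in zip(a, x))
    return [[-math.sin(t) * ai * aj for aj in a] for ai in a]


def shifted(a, x, i, si, j, sj, h):
    """Evaluate f after moving x_i by si*h and x_j by sj*h."""
    y = list(x)
    y[i] += si * h
    y[j] += sj * h
    return f(a, y)


def numeric_gradient(a, x, h=1e-6):
    """Central difference in each coordinate (the j-shift is zero)."""
    return [(shifted(a, x, k, 1, k, 0, h) - shifted(a, x, k, -1, k, 0, h)) / (2 * h)
            for k in range(len(x))]


def numeric_hessian(a, x, h=1e-4):
    """Four-point central difference for each mixed second partial."""
    n = len(x)
    return [[(shifted(a, x, i, 1, j, 1, h) - shifted(a, x, i, 1, j, -1, h)
              - shifted(a, x, i, -1, j, 1, h) + shifted(a, x, i, -1, j, -1, h))
             / (4 * h * h)
             for j in range(n)] for i in range(n)]


# Hypothetical example data.
a = [1.0, -2.0, 0.5]
x = [0.2, 0.1, -0.4]

grad_error = max(abs(p - q) for p, q in
                 zip(analytic_gradient(a, x), numeric_gradient(a, x)))
H_exact, H_approx = analytic_hessian(a, x), numeric_hessian(a, x)
hess_error = max(abs(H_exact[i][j] - H_approx[i][j])
                 for i in range(3) for j in range(3))
print("gradient mismatch:", grad_error, "Hessian mismatch:", hess_error)
```

Because the Hessian is a scalar multiple of the outer product $a a^T$, it has rank at most one, whatever the choice of $g$.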
(a) Suppose that the matrix $A \in \mathbb{R}^{n \times n}$ is diagonalizable, that is, $A = T \Lambda T^{-1}$ for an invertible
matrix $T \in \mathbb{R}^{n \times n}$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is diagonal. Use the notation $t^{(i)}$ for the
columns of $T$, so that $T = [t^{(1)} \cdots t^{(n)}]$, where $t^{(i)} \in \mathbb{R}^n$. Show that $A t^{(i)} = \lambda_i t^{(i)}$, so
that the eigenvector/eigenvalue pairs of $A$ are $(t^{(i)}, \lambda_i)$.
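Before the proof, a small numerical illustration of the claim may help. The sketch below builds $A = T \Lambda T^{-1}$ from a hand-picked invertible $T$ and diagonal $\Lambda$ (both hypothetical, chosen so that $T^{-1}$ is easy to write down) and checks that each column of $T$ satisfies $A t^{(i)} = \lambda_i t^{(i)}$.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(X, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]


# Hypothetical example: T is invertible, and T_inv is its exact inverse.
T = [[1.0, 1.0], [0.0, 1.0]]
T_inv = [[1.0, -1.0], [0.0, 1.0]]
lambdas = [2.0, 3.0]
Lam = [[lambdas[0], 0.0], [0.0, lambdas[1]]]

A = matmul(matmul(T, Lam), T_inv)

# For each column t_i of T, measure how far A t_i is from lambda_i t_i.
residuals = []
for i, lam in enumerate(lambdas):
    t_i = [row[i] for row in T]
    At = matvec(A, t_i)
    residuals.append(max(abs(At[r] - lam * t_i[r]) for r in range(len(t_i))))
print("A =", A, "residuals =", residuals)
```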
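Answer: The matrix $T$ is invertible, so if we let $t^{(i)}$ be the $i$th column of $T$, we have
\[
I_{n \times n} = T^{-1} T = T^{-1} \begin{bmatrix} t^{(1)} & t^{(2)} & \cdots & t^{(n)} \end{bmatrix} = \begin{bmatrix} T^{-1} t^{(1)} & T^{-1} t^{(2)} & \cdots & T^{-1} t^{(n)} \end{bmatrix},
\]
so that
\[
T^{-1} t^{(i)} = \begin{bmatrix} \underbrace{0 \cdots 0}_{i-1 \text{ times}} & 1 & \underbrace{0 \cdots 0}_{n-i \text{ times}} \end{bmatrix}^T \in \{0, 1\}^n,
\]
the $i$th standard basis vector, which we denote by $e^{(i)}$ (that is, the vector of all zeros except
for a 1 in its $i$th position). Thus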
A matrix $U \in \mathbb{R}^{n \times n}$ is orthogonal if $U^T U = I$. The spectral theorem, perhaps one of the most
important theorems in linear algebra, states that if $A \in \mathbb{R}^{n \times n}$ is symmetric, that is, $A = A^T$,
then $A$ is diagonalizable by a real orthogonal matrix. That is, there are a diagonal matrix
$\Lambda \in \mathbb{R}^{n \times n}$ and an orthogonal matrix $U \in \mathbb{R}^{n \times n}$ such that $U^T A U = \Lambda$, or, equivalently,
\[
A = U \Lambda U^T.
\]
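For a $2 \times 2$ symmetric matrix the statement can be verified directly. The sketch below is one possible illustration, not a method taken from the problem set: it uses a single Jacobi rotation (an assumption of this example) to build an orthogonal $U$, then checks that $U^T U = I$, that $U^T A U$ is diagonal, and that $U \Lambda U^T$ recovers $A$. The matrix `A` and all function names are hypothetical.

```python
import math


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def diagonalize_symmetric_2x2(A):
    """Return (U, Lam) with U a rotation and Lam = U^T A U (nearly) diagonal."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    # The rotation angle is chosen so the off-diagonal entry of U^T A U vanishes.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]
    Lam = matmul(matmul(transpose(U), A), U)
    return U, Lam


# Hypothetical symmetric example matrix.
A = [[4.0, 1.0], [1.0, 2.0]]
U, Lam = diagonalize_symmetric_2x2(A)

UtU = matmul(transpose(U), U)
A_rebuilt = matmul(matmul(U, Lam), transpose(U))

orth_error = max(abs(UtU[i][j] - (1.0 if i == j else 0.0))
                 for i in range(2) for j in range(2))
offdiag = max(abs(Lam[0][1]), abs(Lam[1][0]))
rebuild_error = max(abs(A_rebuilt[i][j] - A[i][j])
                    for i in range(2) for j in range(2))
print("eigenvalues:", Lam[0][0], Lam[1][1])
print("errors:", orth_error, offdiag, rebuild_error)
```

For larger matrices a library eigensolver would be the practical choice; the closed-form rotation is used here only because it keeps the example self-contained.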