Quick Reference of Linear Algebra
Hao Zhang
National Key Laboratory for Novel Software Technology, Nanjing University, China
[email protected]
• In general, 𝑨𝑩 ≠ 𝑩𝑨.
• In general, 𝑨𝑩 = 𝑨𝑪 ̸⇒ 𝑩 = 𝑪.
• In general, 𝑨𝑩 = 𝟎 ̸⇒ 𝑨 = 𝟎 ∨ 𝑩 = 𝟎.

1.3 Common Matrix Operations

Table 1 and 2 summarize the properties of common matrix operations.

Lemma 5. Any square matrix 𝑨 ∈ ℝ𝑛×𝑛 can be written as the sum of a symmetric matrix and an anti-symmetric matrix: 𝑨 = ½(𝑨 + 𝑨⊤) + ½(𝑨 − 𝑨⊤).

Lemma 6. diag 𝑨𝑨⊤ = (𝑨 ⊙ 𝑨)𝟏.

Lemma 7. The following are properties of the vec operation.
• (vec 𝑨)⊤(vec 𝑩) = tr 𝑨⊤𝑩.
• (vec 𝑨𝑨⊤)⊤(vec 𝑩𝑩⊤) = tr (𝑨⊤𝑩)⊤(𝑨⊤𝑩) = ‖𝑨⊤𝑩‖²_F = ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑛} (𝒂ᵢ⊤𝒃ⱼ)², where 𝑨 ∶= [𝒂₁ 𝒂₂ ⋯ 𝒂ₙ] and 𝑩 ∶= [𝒃₁ 𝒃₂ ⋯ 𝒃ₙ].

Algorithm 4 (Computing the Rank). rank 𝑨 is determined by the SVD of 𝑨. The rank is the number of nonzero singular values. In practice, extremely small nonzero singular values are treated as zero. In general, rank estimation is not a simple problem.

Algorithm 5 (Computing the Determinant). Get the REF 𝑼 of 𝑨. If there are 𝑝 row interchanges, det 𝑨 = (−1)^𝑝 ⋅ (product of pivots in 𝑼). Although 𝑼 is not unique and the pivots are not unique, the product of the pivots is unique. 𝑇(𝑛) ∼ ⅓𝑛³.

Definition 6 (Vector Space). A set of “vectors” together with rules for vector addition and for multiplication by real numbers. The addition and multiplication must produce vectors that are in the space. The space ℝ𝑛 consists of all column vectors 𝒙 with 𝑛 components. The zero-dimensional space 𝟘 consists only of the zero vector 𝟎.

Definition 7 (Subspace). A subspace of a vector space is a set of vectors (including 𝟎) where all linear combinations stay in that subspace.

Definition 8 (Span). A set of vectors spans a space if their linear combinations fill the space. The span of a set of vectors is the smallest subspace containing those vectors.

Definition 9 (Linear Independence). The columns of 𝑨 are linearly independent iff the nullspace 𝒩(𝑨) = 𝟘, or rank 𝑨 = 𝑛.

Definition 10 (Basis). A basis for a vector space is a set of linearly independent vectors which span the space. Every vector in the space is a unique combination of the basis vectors. The columns of 𝑨 ∈ ℝ𝑛×𝑛 are a basis for ℝ𝑛 iff 𝑨 is invertible.

Definition 11 (Dimension). The dimension of a space is the number of vectors in every basis. dim 𝟘 = 0 since 𝟎 itself forms a linearly dependent set.

Definition 12 (Rank). rank 𝑨 ∶= dim 𝒞(𝑨), the dimension of the column space, which also equals the number of pivots of 𝑨.

There are four fundamental subspaces for a matrix 𝑨, as illustrated in Table 3.
Table 1: Properties of common matrix operations (I).
Algorithm 6 (The 𝑨⁻¹ Algorithm). When 𝑨 is square and invertible, Gaussian elimination on [𝑨 𝑰] produces [𝑹 𝑬]. Since 𝑹 = 𝑰, 𝑬𝑨 = 𝑹 becomes 𝑬𝑨 = 𝑰, and the elimination result is [𝑰 𝑨⁻¹]. 𝑇(𝑛) ∼ 𝑛³. In practice, 𝑨⁻¹ is seldom computed, unless the entries of 𝑨⁻¹ are explicitly needed.
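For example, to solve 𝑨𝒙 = 𝒃 numerically one factors and solves rather than inverting; a minimal NumPy sketch (illustrative only, with a made-up 𝑨 and 𝒃):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # assumed invertible for this illustration
b = rng.standard_normal(4)

x_solve = np.linalg.solve(A, b)      # LU-based solve, no explicit inverse
x_inv = np.linalg.inv(A) @ b         # forms the inverse explicitly (usually avoided)

print(np.allclose(x_solve, x_inv))   # True: same answer, different cost and stability
```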
2.3 Ill-conditioned Matrices

Definition 14 (Ill-conditioned Matrix). An invertible matrix that can become singular if some of its entries are changed ever so slightly. In this case, row reduction may produce fewer than 𝑛 pivots as a result of roundoff error. Also, roundoff error can sometimes make a singular matrix appear to be invertible.

Definition 15 (Condition Number cond 𝑨). cond 𝑨 ∶= 𝜎₁/𝜎ᵣ for a matrix 𝑨 ∈ ℝ𝑛×𝑛. The larger the condition number, the closer the matrix is to being singular. cond 𝑰 = 1, and cond(singular matrix) = ∞.

2.4 Least-squares and Projections

It is often the case that 𝑨𝒙 = 𝒃 is overdetermined: 𝑚 > 𝑛. The 𝑛 columns span a small part of ℝ𝑚. Typically 𝒃 is outside 𝒞(𝑨) and there is no solution. One approach is least-squares.

Theorem 11 (Least-squares Approximation). The projection of 𝒃 ∈ ℝ𝑚 onto 𝒞(𝑨) is
𝒃̂ = 𝑨(𝑨⊤𝑨)⁻¹𝑨⊤𝒃 , (7)
where the projection matrix is 𝑷 = 𝑨(𝑨⊤𝑨)⁻¹𝑨⊤, and
arg min_𝒙 (1/𝑚)‖𝑨𝒙 − 𝒃‖² = (𝑨⊤𝑨)⁻¹𝑨⊤𝒃 . (8)
In particular, the projection of 𝒃 onto the line 𝒂 ∈ ℝ𝑚 is
𝒃̂ = (𝒂𝒂⊤/𝒂⊤𝒂) 𝒃 . (9)

Proof. Let 𝒙⋆ ∶= arg min_𝒙 (1/𝑚)‖𝑨𝒙 − 𝒃‖² and let the approximation error be 𝒆 ∶= 𝒃 − 𝑨𝒙⋆. 𝒆 ⟂ 𝒞(𝑨) ⇒ 𝒆 ∈ 𝒩(𝑨⊤) ⇒ 𝑨⊤𝒆 = 𝑨⊤(𝒃 − 𝑨𝒙⋆) = 𝟎 ⇒ 𝑨⊤𝑨𝒙⋆ = 𝑨⊤𝒃. Another proof is by setting the derivative of the objective to zero: (2/𝑚)𝑨⊤𝑨𝒙 − (2/𝑚)𝑨⊤𝒃 = 𝟎.

Lemma 12. 𝑨𝒙 = 𝒃 has a unique least-squares solution for each 𝒃 when the columns of 𝑨 are linearly independent.

Algorithm 7 (Least-squares Approximation). Since cond 𝑨⊤𝑨 = (cond 𝑨)², least-squares is solved by QR factorization:
𝑨⊤𝑨𝒙 = 𝑨⊤𝒃 ⇒ 𝑹⊤𝑹𝒙 = 𝑹⊤𝑸⊤𝒃 ⇒ 𝑹𝒙 = 𝑸⊤𝒃 . (10)
𝑇(𝑚, 𝑛) ∼ 𝑚𝑛².

Another common case is that 𝑨𝒙 = 𝒃 is underdetermined: 𝑚 < 𝑛 or 𝑨 has dependent columns. Typically there are infinitely many solutions. One approach is using regularization.

Theorem 13 (Least-squares Approximation with Regularization).
arg min_𝒙 (1/𝑚)‖𝑨𝒙 − 𝒃‖² + 𝜆‖𝒙‖² = ((1/𝑚)𝑨⊤𝑨 + 𝜆𝑰)⁻¹ (1/𝑚)𝑨⊤𝒃 . (11)

Proof. By setting the derivative to zero: (2/𝑚)𝑨⊤𝑨𝒙 − (2/𝑚)𝑨⊤𝒃 + 2𝜆𝒙 = 𝟎.

Lemma 14 (Weighted Least-squares Approximation). Suppose 𝑪 ∈ ℝ𝑚×𝑚 is a diagonal matrix specifying the weight for the equations. Then
arg min_𝒙 (1/𝑚)(𝑨𝒙 − 𝒃)⊤𝑪(𝑨𝒙 − 𝒃) = (𝑨⊤𝑪𝑨)⁻¹𝑨⊤𝑪𝒃 . (12)

Proof. By setting the derivative to zero: (2/𝑚)𝑨⊤𝑪𝑨𝒙 − (2/𝑚)𝑨⊤𝑪𝒃 = 𝟎.

2.5 Orthogonality

Lemma 15 (Plane in Point-normal Form). The equation of a hyperplane with a point 𝒙₀ in the plane and a normal vector 𝒘 orthogonal to the plane is 𝒘⊤(𝒙 − 𝒙₀) = 0.

Lemma 16. The distance from a point 𝒙 to a plane with a point 𝒙₀ on the plane and a normal vector 𝒘 orthogonal to the plane is |𝒘⊤(𝒙 − 𝒙₀)|/‖𝒘‖.

Definition 16 (Orthogonal Vectors). Two vectors 𝒖 and 𝒗 are orthogonal if 𝒖⊤𝒗 = 0.

Definition 17 (Orthonormal Vectors 𝒒). The columns of 𝑸 are orthonormal if 𝑸⊤𝑸 = [𝒒ᵢ⊤𝒒ⱼ]_{𝑛×𝑛} = 𝑰. If 𝑸 is square, it is called an orthogonal matrix.

Lemma 17. The following are orthogonal matrices.
• Every permutation matrix 𝑷.
• The reflection matrix 𝑰 − 2𝒆𝒆⊤, where 𝒆 is any unit vector.

Lemma 18. Orthogonal matrices 𝑸 preserve certain norms.
• ‖𝑸𝒙‖₂ = ‖𝒙‖₂.
• ‖𝑸₁𝑨𝑸₂⊤‖₂ = ‖𝑨‖₂.
• ‖𝑸₁𝑨𝑸₂⊤‖_F = ‖𝑨‖_F.

Theorem 19. The projection of 𝒃 ∈ ℝ𝑚 onto 𝒞(𝑸) is
𝒃̂ = 𝑸𝑸⊤𝒃 = ∑_{𝑗=1}^{𝑛} 𝒒ⱼ𝒒ⱼ⊤𝒃 . (13)
If 𝑸 is square, 𝒃 = ∑_{𝑗=1}^{𝑛} 𝒒ⱼ𝒒ⱼ⊤𝒃.

Definition 18 (QR Factorization). 𝑨 ∈ ℝ𝑛×𝑛 can be written as 𝑨 = 𝑸𝑹, where the columns of 𝑸 ∈ ℝ𝑛×𝑛 are orthonormal and 𝑹 ∈ ℝ𝑛×𝑛 is an upper triangular matrix.

Algorithm 8 (Gram-Schmidt Process). The idea is to subtract from every new vector its projections in the directions already set, and to divide the resulting vectors by their lengths, such that
𝑨 = [𝒂₁ 𝒂₂ ⋯ 𝒂ₙ] = [𝒒₁ 𝒒₂ ⋯ 𝒒ₙ] [𝒒₁⊤𝒂₁ 𝒒₁⊤𝒂₂ ⋯ 𝒒₁⊤𝒂ₙ; 𝒒₂⊤𝒂₂ ⋯ 𝒒₂⊤𝒂ₙ; ⋱ ⋮; 𝒒ₙ⊤𝒂ₙ] = 𝑸𝑸⊤𝑨 = 𝑸𝑹 . (14)
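Putting Secs. 2.4 and 2.5 together, a minimal NumPy sketch of the least-squares solutions in Eqns. 8, 10, and 11 (the data and the regularization weight `lam` are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 50, 3, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Eqn. 8: normal equations (squares the condition number).
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Eqn. 10: QR route, R x = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Eqn. 11: regularized (ridge) solution.
x_reg = np.linalg.solve(A.T @ A / m + lam * np.eye(n), A.T @ b / m)

print(np.allclose(x_ne, x_qr))   # True (up to roundoff)
```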
The algorithm is illustrated in Alg. 1. 𝑇(𝑛) = ∑_{𝑗=1}^{𝑛} ∑_{𝑖=1}^{𝑗} 2𝑛 ∼ 𝑛³. In practice, the roundoff error can build up.

Algorithm 1 QR Factorization.
Input: 𝑨 ∈ ℝ𝑛×𝑛
Output: 𝑸 ∈ ℝ𝑛×𝑛, 𝑹 ∈ ℝ𝑛×𝑛
1: 𝑸 ← 𝑹 ← 𝟎
2: for 𝑗 ← 0 to 𝑛 − 1 do
3:   𝒒ⱼ ← 𝒂ⱼ
4:   for 𝑖 ← 0 to 𝑗 − 1 do
5:     𝑟ᵢⱼ ← 𝒒ᵢ⊤𝒂ⱼ
6:     𝒒ⱼ ← 𝒒ⱼ − 𝑟ᵢⱼ𝒒ᵢ
7:   𝑟ⱼⱼ ← ‖𝒒ⱼ‖
8:   𝒒ⱼ ← 𝒒ⱼ/‖𝒒ⱼ‖
9: return 𝑸, 𝑹

Algorithm 9 (Householder Reflections). In practice, Householder reflections are often used instead of the Gram-Schmidt process, even though the factorization requires about twice as much arithmetic.

3 Application: Solving Linear Systems

Understanding the linear system 𝑨𝒙 = 𝒃:
• Row picture: 𝑚 hyperplanes meet at a single point (if possible).
• Column picture: 𝑛 vectors are combined to produce 𝒃.

Definition 22 (Block Elimination). We perform (row 2) − 𝑪𝑨⁻¹(row 1) to get a zero block in the first column:
[𝑰 𝟎; −𝑪𝑨⁻¹ 𝑰] [𝑨 𝑩; 𝑪 𝑫] = [𝑨 𝑩; 𝟎 𝑫 − 𝑪𝑨⁻¹𝑩] . (15)
The final block 𝑫 − 𝑪𝑨⁻¹𝑩 is called the Schur complement.

Definition 23 (Row Exchange Matrix 𝑷ᵢⱼ). The identity matrix with row 𝑖 and row 𝑗 exchanged. 𝑷ᵢⱼ𝑨 means that we exchange row 𝑖 and row 𝑗 of 𝑨.

Definition 24 (Permutation Matrix 𝑷). A permutation matrix has the rows of the identity 𝑰 in any order. Such a matrix has a single 1 in every row and every column. The simplest permutation matrix is 𝑰; the next simplest are the row exchange matrices 𝑷ᵢⱼ. There are 𝑛! permutation matrices of order 𝑛, half of which have determinant 1 and the other half determinant −1. If 𝑷 is a permutation matrix, then 𝑷⁻¹ = 𝑷⊤, which is also a permutation matrix.

Definition 25 (Augmented Matrix [𝑨 𝒃]). Elimination does the same row operations to 𝑨 and to 𝒃. We can include 𝒃 as an extra column and let elimination act on whole rows of this matrix.

Definition 26 (Row Equivalent). Two matrices are called row equivalent if there is a sequence of elementary row operations that transforms one matrix into the other.

Lemma 20. If the augmented matrices of two linear systems are row equivalent, then the two linear systems have the same solution set.
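As a cross-check of Alg. 1, a minimal NumPy sketch of classical Gram-Schmidt, compared against np.linalg.qr (LAPACK's Householder-based QR, cf. Algorithm 9); the test matrix is made up:

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR of a square matrix with independent columns."""
    n = A.shape[1]
    Q = np.zeros_like(A, dtype=float)
    R = np.zeros((n, n))
    for j in range(n):
        q = A[:, j].copy()
        for i in range(j):                 # subtract projections on earlier q_i
            R[i, j] = Q[:, i] @ A[:, j]
            q -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(q)
        Q[:, j] = q / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))
Qh, Rh = np.linalg.qr(A)                   # Householder-based; column signs may differ
print(np.allclose(np.abs(Qh), np.abs(Q)))
```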
Table 5: The four possibilities for the steady state problem 𝑨𝒙 = 𝒃, where 𝑨 ∈ ℝ𝑚×𝑛 and 𝑟 ∶= rank 𝑨. Gaussian elimination on [𝑨 𝒃] gives 𝑹𝒙 = 𝒅, where 𝑹 ∶= 𝑬𝑨 and 𝒅 ∶= 𝑬𝒃.
• 𝑟 = 𝑚 = 𝑛 (square and invertible): RREF 𝑹 = [𝑰]; particular solution 𝒙ₚ = 𝑨⁻¹𝒃; nullspace matrix [𝟎]; exactly 1 solution; left inverse 𝑨⁻¹; right inverse 𝑨⁻¹.
• 𝑟 = 𝑚 < 𝑛 (short and wide): 𝑹 = [𝑰 𝑭]; 𝒙ₚ = [𝒅; 𝟎]; nullspace matrix [−𝑭; 𝑰]; ∞ solutions; no left inverse; right inverse 𝑨⊤(𝑨𝑨⊤)⁻¹.
• 𝑟 = 𝑛 < 𝑚 (tall and thin): 𝑹 = [𝑰; 𝟎]; 𝒙ₚ = [𝒅] or none; nullspace matrix [𝟎]; 0 or 1 solution; left inverse (𝑨⊤𝑨)⁻¹𝑨⊤; no right inverse.
• 𝑟 < 𝑚, 𝑟 < 𝑛 (not full rank): 𝑹 = [𝑰 𝑭; 𝟎 𝟎]; 𝒙ₚ = [𝒅; 𝟎] or none; nullspace matrix [−𝑭; 𝑰]; 0 or ∞ solutions; no left or right inverse.
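A quick NumPy illustration of the one-sided inverses in the last two columns of Table 5 (made-up matrices, assumed full rank):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 3))            # tall and thin, r = n < m
W = rng.standard_normal((3, 5))            # short and wide, r = m < n

left_inv = np.linalg.inv(T.T @ T) @ T.T    # (A^T A)^{-1} A^T
right_inv = W.T @ np.linalg.inv(W @ W.T)   # A^T (A A^T)^{-1}

print(np.allclose(left_inv @ T, np.eye(3)))   # left inverse:  B A = I
print(np.allclose(W @ right_inv, np.eye(3)))  # right inverse: A C = I
```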
Algorithm 12 (The LU Factorization). The algorithm is illustrated in Alg. 4. 𝑇(𝑚, 𝑛) ∼ ⅓𝑚³ + ⅓𝑚²𝑛. For a band matrix 𝑩 with 𝑤 nonzero diagonals below and above its main diagonal, 𝑇(𝑚, 𝑛, 𝑤) ∼ 𝑚𝑤².

Algorithm 4 LU factorization on 𝑨.
Input: 𝑨 ∈ ℝ𝑚×𝑛
Output: 𝑳, 𝑼
1: 𝑳 ← 𝟎 ∈ ℝ𝑚×𝑚
2: 𝑘 ← 0
3: for 𝑗 ← 0 to 𝑛 − 1 do
4:   if the 𝑗-th column is not a pivot column then
5:     continue
6:   Row exchange to make 𝑎ₖⱼ the largest available pivot.
7:   𝑙ₖₖ ← 1
8:   for 𝑖 ← 𝑘 + 1 to 𝑚 − 1 do
9:     𝑙ᵢₖ ← 𝑎ᵢⱼ/𝑎ₖⱼ  ⊳ Multiplier
10:    ⊳ Eliminate row 𝑖 beyond row 𝑘
11:    (row 𝑖 of 𝑨) ← (row 𝑖 of 𝑨) − 𝑙ᵢₖ (row 𝑘 of 𝑨)
12:   𝑘 ← 𝑘 + 1
13: return 𝑳, 𝑨

Lemma 23. Assuming no row exchanges, when a row of 𝑨 starts with zeros, so does that row of 𝑳. When a column of 𝑨 starts with zeros, so does that column of 𝑼.

Algorithm 13 (Solving {𝑨𝒙ₖ = 𝒃ₖ}_{𝑘=1}^{𝐾}). The algorithm is illustrated in Alg. 5. 𝑇(𝑚, 𝑛, 𝐾) ∼ ⅓𝑚³ + ⅓𝑚²𝑛 + 𝑛²𝐾. For a band matrix 𝑩 with 𝑤 nonzero diagonals below and above its main diagonal, 𝑇(𝑚, 𝑛, 𝑤, 𝐾) ∼ 𝑚𝑤² + 2𝑛𝑤𝐾.

Algorithm 5 Solving {𝑨𝒙ₖ = 𝒃ₖ}_{𝑘=1}^{𝐾}.
Input: 𝑨, {𝒃ₖ}_{𝑘=1}^{𝐾}
Output: {𝒙ₖ}_{𝑘=1}^{𝐾}
1: LU factorization 𝑨 = 𝑳𝑼.
2: for 𝑘 ← 1 to 𝐾 do
3:   Solve 𝑳𝒚ₖ = 𝒃ₖ by forward substitution.
4:   Solve 𝑼𝒙ₖ = 𝒚ₖ by backward substitution.
5: return {𝒙ₖ}_{𝑘=1}^{𝐾}

Figure 1: Three different cases (fixed-fixed, fixed-free, and free-free) for a spring system.

• 𝒆 ∈ ℝ𝑚: the stretching distance of each spring.
By Hooke's law, 𝑦ᵢ = 𝑐ᵢ𝑒ᵢ; in matrix form, 𝒚 = 𝑪𝒆.
There are three different cases for these springs, as illustrated in Fig. 1.

Fixed-fixed Case. In this case, 𝑚 = 𝑛 + 1 and the top and bottom springs are fixed. Originally there is no stretching. Then gravity acts to move the masses down by 𝒖. Each spring is stretched by the difference in the displacements of its ends, 𝑒ᵢ = 𝑢ᵢ − 𝑢ᵢ₋₁. Besides, 𝑒₁ = 𝑢₁ since the top is fixed, and 𝑒ₘ = −𝑢ₙ since the bottom is fixed. In matrix form,
𝒆 = 𝑨𝒖 ∶= [1; −1 1; −1 ⋱; ⋱ 1; −1] 𝒖 . (16)
Finally comes the balance equation: the internal forces from the springs balance the external forces on the masses, 𝑓ᵢ = 𝑦ᵢ − 𝑦ᵢ₊₁. In matrix form,
𝒇 = 𝑨⊤𝒚 ∶= [1 −1; 1 −1; ⋱ ⋱; 1 −1] 𝒚 . (17)
Combining the three matrices gives
𝑨⊤𝑪𝑨𝒖 = 𝒇 . (18)
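A minimal NumPy sketch of Eqns. 16-18 for the fixed-fixed chain (spring constants and forces are made up for illustration):

```python
import numpy as np

n = 4                                   # number of masses
m = n + 1                               # number of springs (fixed-fixed)
A = np.zeros((m, n))
for i in range(n):
    A[i, i] = 1.0                       # e_i gets +u_i
    A[i + 1, i] = -1.0                  # e_{i+1} gets -u_i   (Eqn. 16 pattern)

C = np.diag([1.0, 2.0, 2.0, 2.0, 1.0])  # made-up spring constants
f = np.ones(n)                          # gravity-like load on each mass

K = A.T @ C @ A                         # stiffness matrix of Eqn. 18
u = np.linalg.solve(K, f)               # displacements
e = A @ u                               # stretches, Eqn. 16
y = C @ e                               # spring forces (Hooke's law)
print(np.allclose(A.T @ y, f))          # force balance, Eqns. 17-18
```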
When 𝑪 = 𝑰, the stiffness matrix is 𝑲 ∶= 𝑨⊤𝑨.
• 𝑲⁻¹ is a full matrix with all positive entries. 𝑲⁻¹ is also PD.

Fixed-free Case. In this case, 𝑚 = 𝑛: the top is fixed and the bottom is free. When 𝑪 = 𝑰,
𝑨 = [1; −1 1; ⋱ ⋱; −1 1] ,  𝑲 = [2 −1; −1 2 ⋱; ⋱ ⋱ −1; −1 1] . (20)

Free-free Case. In this case, 𝑚 = 𝑛 − 1 and both ends are free. When 𝑪 = 𝑰,
𝑨 = [−1 1; ⋱ ⋱; −1 1] ,  𝑲 = [1 −1; −1 2 ⋱; ⋱ ⋱ −1; −1 1] . (21)
There is a nonzero solution to 𝑨𝒖 = 𝟎: the masses can move, 𝒖 = 𝟏, with no stretching of the springs, 𝒆 = 𝟎. 𝑲 is only PSD, and 𝑲𝒖 = 𝒇 is solvable only for special 𝒇, i.e., 𝟏⊤𝒇 = 0, or the whole line of springs (with both ends free) will take off like a rocket.

3.6 Graphs and Networks

Definition 31 (Adjacency Matrix). Given a directed graph 𝐺 = (𝑉, 𝐸) where |𝑉| = 𝑛, the adjacency matrix is 𝑨 = [𝕀((𝑖, 𝑗) ∈ 𝐸)]_{𝑛×𝑛}, i.e., 𝑎ᵢⱼ = 1 if there is an edge from vertex 𝑖 to vertex 𝑗. The (𝑖, 𝑗) entry of 𝑨^𝑘 counts the number of 𝑘-step paths from vertex 𝑖 to vertex 𝑗. If 𝐺 is undirected, 𝑨 is symmetric.

Definition 32 (Incidence Matrix). Given a directed graph 𝐺 = (𝑉, 𝐸) where |𝑉| = 𝑛 and |𝐸| = 𝑚, the incidence matrix is 𝑨 ∈ ℝ𝑚×𝑛, where 𝑎ᵢⱼ = −1 if edge 𝑖 starts from vertex 𝑗, 𝑎ᵢⱼ = 1 if edge 𝑖 ends at vertex 𝑗, and 𝑎ᵢⱼ = 0 otherwise.

Figure 2: A circuit with a current source into vertex 1.

For the circuit in Fig. 2, the incidence matrix is
𝑨 ∶= [−1 1 0 0; −1 0 1 0; 0 −1 1 0; −1 0 0 1; 0 −1 0 1; 0 0 −1 1] . (22)

We define
• 𝒖 ∈ ℝ𝑛: potentials (the voltages) at the 𝑛 nodes.
• 𝒚 ∈ ℝ𝑚: currents flowing along the 𝑚 edges.
• 𝒇 ∈ ℝ𝑛: the current sources into the 𝑛 nodes.
• 𝑪 ∶= diag(𝑐₁, 𝑐₂, …, 𝑐ₘ) ∈ ℝ𝑚×𝑚: the conductance of each edge.

𝑨𝒖 gives the potential differences across the 𝑚 edges. Ohm's law says that the current 𝑦ᵢ through a resistor is proportional to the potential difference: 𝒚 = 𝑪𝑨𝒖. Kirchhoff's current law says that the net current into every node is zero, which is expressed as
𝑨⊤𝒚 = 𝑨⊤𝑪𝑨𝒖 = 𝒇 . (23)
Kirchhoff's voltage law says that the sum of potential differences around a loop must be zero.

Lemma 25. The following are properties of an incidence matrix 𝑨.
• dim 𝒞(𝑨) = dim 𝒞(𝑨⊤) = 𝑛 − 1.
• dim 𝒩(𝑨) = 1.
• dim 𝒩(𝑨⊤) = 𝑚 − 𝑛 + 1.

Proof. Since we can raise or lower all the potentials by the same constant, 𝟏 ∈ 𝒩(𝑨). Rows of 𝑨 are dependent if the corresponding edges contain a loop. At the end of elimination we have a full set of 𝑟 independent rows. Those 𝑟 edges form a spanning tree of the graph, which has 𝑛 − 1 edges if the graph is connected.

3.7 Two-point Boundary-value Problems

Solving
−d²𝑢(𝑥)/d𝑥² = 𝑓(𝑥), 𝑥 ∈ [0, 1] , (24)
with boundary conditions 𝑢(0) = 0 and 𝑢(1) = 0. This equation describes a steady state system, e.g., the temperature distribution of a rod with a heat source 𝑓(𝑥) and both ends fixed at 0 °C.

Since a computer cannot solve a differential equation exactly, we have to approximate the differential equation with a difference equation. For that reason we can only accept a finite amount of information, at 𝑛 equally spaced points:
𝑢₁ ∶= 𝑢(ℎ), 𝑢₂ ∶= 𝑢(2ℎ), …, 𝑢ₙ ∶= 𝑢(𝑛ℎ) , (25)
𝑓₁ ∶= 𝑓(ℎ), 𝑓₂ ∶= 𝑓(2ℎ), …, 𝑓ₙ ∶= 𝑓(𝑛ℎ) , (26)
where ℎ ∶= 1/(𝑛 + 1). The boundary conditions become 𝑢₀ ∶= 0 and 𝑢ₙ₊₁ ∶= 0.
We approximate the second-order derivative by
−d²𝑢(𝑥)/d𝑥² ≈ −(𝑢(𝑥 + ℎ) − 2𝑢(𝑥) + 𝑢(𝑥 − ℎ))/ℎ² = (−𝑢ⱼ₊₁ + 2𝑢ⱼ − 𝑢ⱼ₋₁)/ℎ² . (27)
Therefore, the differential equation −d²𝑢(𝑥)/d𝑥² = 𝑓(𝑥) becomes
𝑲𝒖 = [2 −1; −1 2 ⋱; ⋱ ⋱ −1; −1 2] [𝑢₁; 𝑢₂; ⋮; 𝑢ₙ] = ℎ² [𝑓₁; 𝑓₂; ⋮; 𝑓ₙ] = ℎ²𝒇 . (28)

Lemma 26. The FLOPs for solving 𝑲𝒖 = ℎ²𝒇 is 𝑇(𝑛) ∼ 3𝑛.

4 Theory: Eigenvalues and Eigenvectors

4.1 Eigenvalues and Eigenvectors

Definition 33 (Eigenvalue 𝜆 and Eigenvector 𝒙). 𝜆 and 𝒙 ≠ 𝟎 are an eigenvalue and eigenvector of 𝑨 if 𝑨𝒙 = 𝜆𝒙.

Algorithm 14 (Solving Eigenvalues and Eigenvectors). 𝑨𝒙 = 𝜆𝒙 ⇒ (𝑨 − 𝜆𝑰)𝒙 = 𝟎. Since 𝒙 ≠ 𝟎, 𝒩(𝑨 − 𝜆𝑰) ≠ 𝟘, which means det(𝑨 − 𝜆𝑰) = 0. The algorithm is illustrated in Alg. 6. In practice, the best way to compute eigenvalues is to compute similar matrices 𝑨₁, 𝑨₂, … that approach a triangular matrix.

Algorithm 6 Solve the eigenvalues and eigenvectors of 𝑨.
Input: 𝑨 ∈ ℝ𝑛×𝑛
Output: 𝜆ᵢ, 𝒙ᵢ
1: Solve det(𝑨 − 𝜆𝑰) = 0, which is a polynomial in 𝜆 of degree 𝑛, for the eigenvalues 𝜆.
2: For each eigenvalue 𝜆, solve (𝑨 − 𝜆𝑰)𝒙 = 𝟎 for the eigenvector 𝒙.
3: return 𝜆ᵢ, 𝒙ᵢ

Definition 34 (Geometric Multiplicity (GM)). The number of independent eigenvectors for 𝜆, which is dim 𝒩(𝑨 − 𝜆𝑰).

Definition 35 (Algebraic Multiplicity (AM)). The number of repetitions of 𝜆 among the eigenvalues. Look at the 𝑛 roots of det(𝑨 − 𝜆𝑰) = 0.

Lemma 27. The following are properties of eigenvalues and eigenvectors.
• For each eigenvalue, GM ≤ AM. A matrix is diagonalizable iff every eigenvalue has GM = AM.
• Each eigenvalue has ≥ 1 eigenvector.
• All eigenvalues are different ⇒ all eigenvectors are independent, which means the matrix can be diagonalized.
• There is no connection between invertibility and diagonalizability. Invertibility is concerned with the eigenvalues (𝜆 = 0 or 𝜆 ≠ 0). Diagonalizability is concerned with the eigenvectors (too few or enough for 𝑺).
• Suppose both 𝑨 and 𝑩 can be diagonalized; they share the same eigenvector matrix 𝑺 iff 𝑨𝑩 = 𝑩𝑨.

4.2 Diagonalizable

Theorem 28 (Diagonalizable). If 𝑨 ∈ ℝ𝑛×𝑛 has 𝑛 independent eigenvectors, 𝑨 is diagonalizable:
𝑨 = 𝑺𝚲𝑺⁻¹ , (29)
where 𝑺 ∶= [𝒙₁ 𝒙₂ ⋯ 𝒙ₙ] and 𝚲 ∶= diag(𝜆₁, 𝜆₂, …, 𝜆ₙ). In other words, 𝑨 is similar to 𝚲.

Proof. 𝑨𝑺 = [𝜆₁𝒙₁ 𝜆₂𝒙₂ ⋯ 𝜆ₙ𝒙ₙ] = 𝑺𝚲.

Definition 36 (Normal Matrix). A square matrix 𝑨 is normal when 𝑨⊤𝑨 = 𝑨𝑨⊤. That includes symmetric, antisymmetric, and orthogonal matrices. In this case, 𝜎ᵢ = |𝜆ᵢ|.

Lemma 29. The eigenvectors of 𝑨 are orthonormal when 𝑨 is normal.

Theorem 30 (Spectral Theorem). Every symmetric matrix 𝑨 has the factorization
𝑨 = 𝑸𝚲𝑸⊤ = ∑_{𝑖=1}^{𝑛} 𝜆ᵢ𝒙ᵢ𝒙ᵢ⊤ . (30)

Properties of eigenvalues and eigenvectors of special matrices are illustrated in Table 10.

5 Application: Solving Dynamic Problems

5.1 Solving Difference and Differential Equations

The algorithms for solving first-order difference and differential equations are illustrated in Alg. 7 and Alg. 8, respectively.

Algorithm 7 Solving 𝒖ₖ₊₁ = 𝑨𝒖ₖ.
Input: 𝑨 ∈ ℝ𝑛×𝑛, 𝒖₀
Output: 𝒖ₖ
1: Diagonalize 𝑨 = 𝑺𝚲𝑺⁻¹.
2: Solve 𝑺𝒄 = 𝒖₀ to write 𝒖₀ as a linear combination of eigenvectors.
3: The solution is 𝒖ₖ = 𝑨^𝑘𝒖₀ = 𝑺𝚲^𝑘𝑺⁻¹𝒖₀ = 𝑺𝚲^𝑘𝒄 = ∑_{𝑖=1}^{𝑛} 𝑐ᵢ𝜆ᵢ^𝑘𝒙ᵢ.
4: return 𝒖ₖ
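A minimal NumPy sketch of Alg. 7 (relying on a library eigensolver, in the spirit of Alg. 14's remark), using a made-up 2×2 matrix:

```python
import numpy as np

A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
u0 = np.array([1.0, 0.0])
k = 20

lam, S = np.linalg.eig(A)          # A = S diag(lam) S^{-1}
c = np.linalg.solve(S, u0)         # write u0 in the eigenvector basis
u_k = S @ (lam**k * c)             # u_k = S Lambda^k c

print(np.allclose(u_k, np.linalg.matrix_power(A, k) @ u0))   # True
```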
Algorithm 8 Solving d𝒖(𝑡)/d𝑡 = 𝑨𝒖(𝑡), where 𝑨 is a constant coefficient matrix.
Input: 𝑨 ∈ ℝ𝑛×𝑛, 𝒖(0)
Output: 𝒖(𝑡)
1: Diagonalize 𝑨 = 𝑺𝚲𝑺⁻¹.
2: Solve 𝑺𝒄 = 𝒖(0) to write 𝒖(0) as a linear combination of eigenvectors.
3: The solution is 𝒖(𝑡) = exp(𝑨𝑡)𝒖(0) = 𝑺 exp(𝚲𝑡)𝑺⁻¹𝒖(0) = 𝑺 exp(𝚲𝑡)𝒄 = ∑_{𝑖=1}^{𝑛} 𝑐ᵢ exp(𝜆ᵢ𝑡)𝒙ᵢ.
4: If two 𝜆's are equal, with only one eigenvector, another solution 𝑡 exp(𝜆𝑡)𝒙 is needed.
5: return 𝒖(𝑡)

Example 1 (Fibonacci Numbers). Find the 𝑘-th Fibonacci number, where the sequence is defined as 𝐹₀ = 0, 𝐹₁ = 1, and 𝐹ₖ₊₂ = 𝐹ₖ₊₁ + 𝐹ₖ.

Solution. Let 𝒖ₖ ∶= [𝐹ₖ₊₁; 𝐹ₖ], then 𝒖₀ = [1; 0] and 𝒖ₖ₊₁ = 𝑨𝒖ₖ ∶= [1 1; 1 0] 𝒖ₖ. 𝜆₁ = (1 + √5)/2, 𝜆₂ = (1 − √5)/2. 𝒙₁ = [𝜆₁; 1], 𝒙₂ = [𝜆₂; 1]. 𝒄 = 1/(𝜆₁ − 𝜆₂) [1; −1]. 𝒖ₖ = 1/(𝜆₁ − 𝜆₂) (𝜆₁^𝑘𝒙₁ − 𝜆₂^𝑘𝒙₂). 𝐹ₖ = (1/√5)(𝜆₁^𝑘 − 𝜆₂^𝑘) = the nearest integer to (1/√5)((1 + √5)/2)^𝑘.

Example 2 (Simple Harmonic Vibration). Solve d²𝑥(𝑡)/d𝑡² + 𝑥(𝑡) = 0, where 𝑥(0) = 1 and d𝑥(0)/d𝑡 = 0. This is 𝑚𝑎 = −𝑘𝑥 with 𝑚 = 1, 𝑘 = 1.

Solution. Let 𝒖(𝑡) ∶= [𝑥; d𝑥/d𝑡], then 𝒖(0) = [1; 0] and d𝒖/d𝑡 = 𝑨𝒖(𝑡) = [0 1; −1 0] 𝒖(𝑡). 𝜆₁ = i, 𝜆₂ = −i. 𝒙₁ = [1; i], 𝒙₂ = [1; −i]. 𝒄 = ½ [1; 1]. 𝒖(𝑡) = ½(exp(i𝑡)𝒙₁ + exp(−i𝑡)𝒙₂). 𝑥(𝑡) = ½(exp(i𝑡) + exp(−i𝑡)) = cos 𝑡.

Definition 37 (Markov Matrices). An 𝑛×𝑛 matrix is a Markov matrix if all entries are nonnegative and each column of the matrix adds up to 1.

Lemma 31. A Markov matrix 𝑨 has the following properties.
• 𝜆₁ = 1 is an eigenvalue of 𝑨.
• Its eigenvector 𝒙₁ is nonnegative, and it is a steady state since 𝑨𝒙₁ = 𝒙₁.
• The other eigenvalues satisfy |𝜆ᵢ| ≤ 1.
• If 𝑨 or any power of 𝑨 has all positive entries, the other |𝜆ᵢ| < 1. The solution 𝑨^𝑘𝒖₀ approaches a multiple of 𝒙₁, which is the steady state 𝒖∞.

Lemma 32. The difference equation 𝒖ₖ₊₁ = 𝑨𝒖ₖ is
• stable if ∀𝑖. |𝜆ᵢ| < 1;
• neutrally stable if ∃𝑖. |𝜆ᵢ| = 1, and all the other |𝜆ᵢ| < 1;
• unstable if ∃𝑖. |𝜆ᵢ| > 1.
In the stable case, the powers 𝑨^𝑘 approach zero and so does 𝒖ₖ = 𝑨^𝑘𝒖₀.

Lemma 33. The differential equation d𝒖(𝑡)/d𝑡 = 𝑨𝒖(𝑡) is
• stable and exp 𝑨𝑡 → 𝟎 if ∀𝑖. Re 𝜆ᵢ < 0;
• neutrally stable if ∃𝑖. Re 𝜆ᵢ = 0, and all the other Re 𝜆ᵢ < 0;
• unstable and exp 𝑨𝑡 is unbounded if ∃𝑖. Re 𝜆ᵢ > 0.

5.2 Singular Value Decomposition

Theorem 34 (SVD Factorization). For a matrix 𝑨 ∈ ℝ𝑚×𝑛 with 𝑟 ∶= rank 𝑨, choose 𝑼 ∈ ℝ𝑚×𝑚 to contain orthonormal eigenvectors of 𝑨𝑨⊤, and 𝑽 ∈ ℝ𝑛×𝑛 to contain orthonormal eigenvectors of 𝑨⊤𝑨. The shared eigenvalues are 𝜎₁², 𝜎₂², …, 𝜎ᵣ². Then
𝑨 = 𝑼𝚺𝑽⊤ = ∑_{𝑖=1}^{𝑟} 𝜎ᵢ𝒖ᵢ𝒗ᵢ⊤ . (31)
𝑼 and 𝑽 satisfy the following.
• The first 𝑟 columns of 𝑼 contain an orthonormal basis for 𝒞(𝑨).
• The last 𝑚 − 𝑟 columns of 𝑼 contain an orthonormal basis for 𝒩(𝑨⊤).
• The first 𝑟 columns of 𝑽 contain an orthonormal basis for 𝒞(𝑨⊤).
• The last 𝑛 − 𝑟 columns of 𝑽 contain an orthonormal basis for 𝒩(𝑨).

Proof. Start from 𝑨⊤𝑨𝒗ᵢ = 𝜎ᵢ²𝒗ᵢ. Multiplying both sides by 𝑨 gives 𝑨𝑨⊤(𝑨𝒗ᵢ) = 𝜎ᵢ²(𝑨𝒗ᵢ), which shows that 𝑨𝒗ᵢ is an eigenvector of 𝑨𝑨⊤ with the shared eigenvalue 𝜎ᵢ². Since ‖𝑨𝒗ᵢ‖ = √(𝒗ᵢ⊤𝑨⊤𝑨𝒗ᵢ) = √(𝜎ᵢ²𝒗ᵢ⊤𝒗ᵢ) = 𝜎ᵢ, we denote 𝒖ᵢ ∶= 𝑨𝒗ᵢ/‖𝑨𝒗ᵢ‖ = 𝑨𝒗ᵢ/𝜎ᵢ, namely 𝑨𝒗ᵢ = 𝜎ᵢ𝒖ᵢ. This shows column by column that 𝑨𝑽 = 𝑼𝚺. Since 𝑽 is orthogonal, 𝑨 = 𝑼𝚺𝑽⊤.

Lemma 35. The largest singular value dominates all eigenvalues and all entries of 𝑨. That is, 𝜎₁ ≥ maxᵢ |𝜆ᵢ| and 𝜎₁ ≥ maxᵢ,ⱼ |𝑎ᵢⱼ|.

Lemma 36. For a square matrix 𝑨, the spectral factorization and the SVD factorization give the same result when 𝑨 is PSD.

Proof. We need orthonormal eigenvectors (𝑨 should be symmetric) and nonnegative eigenvalues (𝑨 should be PSD).
5.3 Leontief's Input-output Model

Leontief divided the US economy into 𝑛 sectors that produce goods or services (e.g., coal, automotive, and communication), and other sectors that only consume goods or services (e.g., consumers and the government).
• Production vector 𝒙 ∈ ℝ𝑛: the output of each producer for one year.
• Final demand vector 𝒃 ∈ ℝ𝑛: the demand for each producer by the consumers for a year.
• Intermediate demand vector 𝒖 ∶= 𝑨𝒙 ∈ ℝ𝑛: the demand for each producer by the producers for a year. 𝑨 ∈ ℝ𝑛×𝑛 is the consumption matrix.

Theorem 37 (Leontief Input-output Model). When there is a production level 𝒙 such that the amounts produced exactly balance the total demand for that production,
𝒙 = 𝑨𝒙 + 𝒃 . (32)
If 𝑨 and 𝒃 have nonnegative entries and the largest eigenvalue of 𝑨 is less than 1, then the solution exists and has nonnegative entries:
𝒙 = (𝑰 − 𝑨)⁻¹𝒃 = (𝑰 + ∑_{𝑘=1}^{∞} 𝑨^𝑘) 𝒃 . (33)

Table 6: Comparison of PD and PSD matrices.

6 Theory: Positive Definite Matrices

Lemma 38. For symmetric matrices, the pivots and the eigenvalues have the same signs.

Lemma 39. 𝒙⊤𝑨𝒙 = 1 is an ellipsoid in 𝑛 dimensions. The axes of the ellipsoid point toward the eigenvectors of 𝑨.

6.2 Unconstrained Optimization

The goal is to solve
arg min_𝒖 𝑓(𝒖) . (34)

Definition 38 (Stationary Point). A point where ∂𝑓/∂𝒖 = 𝟎. Such a point can be a local minimum, a local maximum, or a saddle point.

Lemma 40. 𝑓(𝒖) has a local minimum when ∂²𝑓/∂𝒖² is PD. Similarly, 𝑓(𝒖) has a local maximum when ∂²𝑓/∂𝒖² is ND. If some eigenvalues are positive and some are negative, 𝑓(𝒖) has a saddle point. If ∂²𝑓/∂𝒖² has an eigenvalue 0, the test is inconclusive.

In some cases, we can directly get the stationary point by solving ∂𝑓/∂𝒖 = 𝟎. In other cases, we iteratively approach the stationary point.

Algorithm 16 (Gradient Descent). 𝒖 ← 𝒖 − 𝜂 ∂𝑓/∂𝒖.

Algorithm 17 (Newton's Method). 𝒖 ← 𝒖 − (∂²𝑓/∂𝒖²)⁻¹ ∂𝑓/∂𝒖.

Lemma 41 (Taylor Series).
𝑓(𝒖) ≈ 𝑓(𝒖₀) + (𝒖 − 𝒖₀)⊤ ∂𝑓/∂𝒖|_{𝒖₀} + ½ (𝒖 − 𝒖₀)⊤ ∂²𝑓/∂𝒖²|_{𝒖₀} (𝒖 − 𝒖₀) . (35)
For the constrained problem of Eqn. 36, i.e., minimizing 𝑓(𝒖) subject to 𝑔ᵢ(𝒖) ≤ 0, 𝑖 = 1, 2, …, 𝑚, and ℎⱼ(𝒖) = 0, 𝑗 = 1, 2, …, 𝑛, the Lagrange function is defined as
ℒ(𝒖, 𝜶, 𝜷) ∶= 𝑓(𝒖) + ∑_{𝑖=1}^{𝑚} 𝛼ᵢ𝑔ᵢ(𝒖) + ∑_{𝑗=1}^{𝑛} 𝛽ⱼℎⱼ(𝒖) , (37)
where 𝛼ᵢ ≥ 0.

Lemma 42. The optimization problem of Eqn. 36 is equivalent to
min_𝒖 max_{𝜶,𝜷} ℒ(𝒖, 𝜶, 𝜷) (38)
s. t. 𝛼ᵢ ≥ 0, 𝑖 = 1, 2, …, 𝑚 .

Proof.
min_𝒖 max_{𝜶,𝜷} ℒ(𝒖, 𝜶, 𝜷)
= min_𝒖 (𝑓(𝒖) + max_{𝜶,𝜷} (∑_{𝑖=1}^{𝑚} 𝛼ᵢ𝑔ᵢ(𝒖) + ∑_{𝑗=1}^{𝑛} 𝛽ⱼℎⱼ(𝒖)))
= min_𝒖 (𝑓(𝒖) + {0 if 𝒖 is feasible; ∞ otherwise})
= min_𝒖 𝑓(𝒖), with 𝒖 feasible . (39)
When 𝑔ᵢ is infeasible, 𝑔ᵢ(𝒖) > 0, we can let 𝛼ᵢ = ∞, such that 𝛼ᵢ𝑔ᵢ(𝒖) = ∞; when ℎⱼ is infeasible, ℎⱼ(𝒖) ≠ 0, we can let 𝛽ⱼ = sign(ℎⱼ(𝒖)) ⋅ ∞, such that 𝛽ⱼℎⱼ(𝒖) = ∞. When 𝒖 is feasible, since 𝛼ᵢ ≥ 0 and 𝑔ᵢ(𝒖) ≤ 0, we have 𝛼ᵢ𝑔ᵢ(𝒖) ≤ 0. Therefore, the maximum of 𝛼ᵢ𝑔ᵢ(𝒖) is 0.

Corollary 43 (KKT Conditions). The optimization problem of Eqn. 38 should satisfy the following at the optimum.
• Primal feasibility: 𝑔ᵢ(𝒖) ≤ 0, ℎᵢ(𝒖) = 0;
• Dual feasibility: 𝛼ᵢ ≥ 0;
• Complementary slackness: 𝛼ᵢ𝑔ᵢ(𝒖) = 0.

Definition 40 (Dual Problem). The dual problem of Eqn. 36 is
max_{𝜶,𝜷} min_𝒖 ℒ(𝒖, 𝜶, 𝜷) (40)
s. t. 𝛼ᵢ ≥ 0, 𝑖 = 1, 2, …, 𝑚 .

Definition 41 (Convex Function). A function 𝑓 is convex if
∀𝛼 ∈ [0, 1]. 𝑓(𝛼𝒙 + (1 − 𝛼)𝒚) ≤ 𝛼𝑓(𝒙) + (1 − 𝛼)𝑓(𝒚) , (42)
which means that if we pick two points on the graph of a convex function and draw a straight line segment between them, the portion of the function between these two points will lie below this straight line.

Lemma 45. A function 𝑓 is convex if every point on the tangent line lies below the corresponding point on 𝑓, i.e., 𝑓(𝒚) ≥ 𝑓(𝒙) + (𝒚 − 𝒙)⊤ ∂𝑓(𝒙)/∂𝒙, or ∂²𝑓/∂𝒙² is PSD.

Definition 42 (Affine Function). A function 𝑓 of the form 𝑓(𝒙) = 𝒄⊤𝒙 + 𝑑.

Lemma 46 (Slater Condition). When the primal problem is convex, i.e., 𝑓 and 𝑔ᵢ are convex and ℎⱼ is affine, and there exists at least one point in the feasible region at which the inequalities hold strictly, the dual problem is equivalent to the primal problem.

Proof. The proof is beyond the scope of this note. Please refer to [2] if you are interested.

7 Application: Solving Optimization Problems

7.1 Removable Non-differentiability

Lemma 47. The optimization problem
arg min_𝒖 |𝑓(𝒖)| (43)
is equivalent to
arg min_{𝒖,𝑥} 𝑥 (44)
s. t. 𝑓(𝒖) − 𝑥 ≤ 0 ,
   −𝑓(𝒖) − 𝑥 ≤ 0 .

The (hard-margin) SVM seeks the separating hyperplane with the largest margin:
arg max_{𝒘,𝑏} min_𝑖 (2/‖𝒘‖) |𝒘⊤𝒙ᵢ + 𝑏| (45)
s. t. 𝑦ᵢ(𝒘⊤𝒙ᵢ + 𝑏) > 0, 𝑖 = 1, 2, …, 𝑚 .
Since scaling of (𝒘, 𝑏) does not change the solution, for simplicity we add the constraint that
min_𝑖 |𝒘⊤𝒙ᵢ + 𝑏| = 1 . (46)

Theorem 48 (Standard Form of SVM). The optimization problem of the SVM is equivalent to
arg min_{𝒘,𝑏} ½ 𝒘⊤𝒘 (47)
s. t. 𝑦ᵢ(𝒘⊤𝒙ᵢ + 𝑏) ≥ 1, 𝑖 = 1, 2, …, 𝑚 .
Proof. By contradiction. Suppose the equality in the constraint does not hold at the optimum (𝒘⋆, 𝑏⋆), i.e., min_𝑖 𝑦ᵢ(𝒘⋆⊤𝒙ᵢ + 𝑏⋆) > 1. Then there exists (𝑟𝒘⋆, 𝑟𝑏⋆) with 0 < 𝑟 < 1 such that min_𝑖 𝑦ᵢ((𝑟𝒘⋆)⊤𝒙ᵢ + 𝑟𝑏⋆) = 1 and ½‖𝑟𝒘⋆‖² < ½‖𝒘⋆‖². That implies (𝒘⋆, 𝑏⋆) is not an optimum, which contradicts the assumption. Therefore, Eqn. 47 is equivalent to
arg min_{𝒘,𝑏} ½ 𝒘⊤𝒘 (48)
s. t. min_𝑖 𝑦ᵢ(𝒘⊤𝒙ᵢ + 𝑏) = 1 .
The objective function is equivalent to the Lagrangian form in Eqn. 52, whose stationarity conditions include
∂ℒ/∂𝑏 = 0 ⇒ ∑_{𝑖=1}^{𝑚} 𝛼ᵢ𝑦ᵢ = 0 . (54)
Substituting them into Eqn. 52 gives Eqn. 55.
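As a concrete check of the standard form in Eqn. 47, a hedged sketch using scipy.optimize.minimize with SLSQP; the two-class data set here is made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up linearly separable 2D data with labels y in {-1, +1}.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

def objective(p):                     # p = (w1, w2, b); Eqn. 47: (1/2) w^T w
    w = p[:2]
    return 0.5 * w @ w

constraints = [{"type": "ineq",       # y_i (w^T x_i + b) - 1 >= 0
                "fun": lambda p, i=i: y[i] * (p[:2] @ X[i] + p[2]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]),
               method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
print(res.success, w, b)
print(np.min(y * (X @ w + b)))        # the active margin constraints sit at 1
```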
Table 7: Analogy of real-valued functions with linear transformations.
• Definition: 𝑓 ∶ ℝ → ℝ | 𝑇 ∶ ℝ𝑛 → ℝ𝑚.
• Domain, codomain: ℝ, ℝ | ℝ𝑛, ℝ𝑚.
• Image of 𝑥: 𝑓(𝑥) | 𝑇(𝒙) ∶= 𝑨𝒙.
• Range: {𝑦 ∣ ∃𝑥. 𝑦 = 𝑓(𝑥)} | 𝒞(𝑨).
• Zero set: {𝑥 ∣ 𝑓(𝑥) = 0} | 𝒩(𝑨).
• Inverse: 𝑓⁻¹(𝑦) | 𝑇⁻¹(𝒚) = 𝑨⁻¹𝒚.
• Composition: 𝑔∘𝑓 = 𝑔(𝑓(𝑥)) | 𝑇_𝐵 ∘ 𝑇_𝐴 = 𝑩𝑨𝒙.

Table 8: Terminologies of the linear transformation 𝑇(𝒙) = 𝑨𝒙.

Lemma 52 (General Matrix for a Linear Transformation). Let 𝑇 be a linear transformation from an input space with basis 𝑽 ∈ ℝ𝑛×𝑛 to an output space with basis 𝑼 ∈ ℝ𝑚×𝑚. There exists a unique matrix 𝑨 ∈ ℝ𝑚×𝑛 that gives the coordinate 𝑇(𝒄) = 𝑨𝒄 in the output space when the coordinate in the input space is 𝒄. The 𝑗-th column of 𝑨 is found by solving 𝑇(𝒗ⱼ) = 𝑼𝒂ⱼ.

8.2 Identity Transformations = Change of Basis

Definition 47 (Coordinate). The coordinate of a vector 𝒙 ∈ ℝ𝑛 relative to the basis matrix 𝑾 ∈ ℝ𝑛×𝑛 is the coefficient vector 𝒄 such that 𝒙 = 𝑾𝒄, or equivalently 𝒄 = 𝑾⁻¹𝒙.

Example 3 (Wavelet Transform). Wavelets are little waves. They have different lengths and they are localized at different places. The basis matrix is
𝑾 ∶= [1 1 1 0; 1 1 −1 0; 1 −1 0 1; 1 −1 0 −1] . (56)
These bases are orthogonal. The wavelet transform finds the coefficients 𝒄 when the input signal 𝒙 is expressed in the wavelet basis, 𝒙 = 𝑾𝒄.

Example 4 (Discrete Fourier Transform). The Fourier transform decomposes the signal into waves at equally spaced frequencies. The basis matrix is
𝑭 ∶= [1 1 1 1; 1 i i² i³; 1 i² (i²)² (i²)³; 1 i³ (i³)² (i³)³] . (57)
These bases are orthogonal. The discrete Fourier transform finds the coefficients 𝒄 when the input signal 𝒙 is expressed in the Fourier basis, 𝒙 = 𝑭𝒄.

Lemma 53 (Change of Basis). Suppose we want to change the basis from 𝑽 ∶= [𝒗₁ 𝒗₂ ⋯ 𝒗ₙ] to 𝑼 ∶= [𝒖₁ 𝒖₂ ⋯ 𝒖ₙ]. The coordinate of a vector 𝒙 is 𝒄 in 𝑽, and is 𝒃 in 𝑼. Then 𝒃 = 𝑼⁻¹𝑽𝒄, where 𝑨 ∶= 𝑼⁻¹𝑽 is called the change of basis matrix.

Proof. 𝒙 = 𝑽𝒄 = 𝑼𝒃 ⇒ 𝒃 = 𝑼⁻¹𝑽𝒄.

Algorithm 18 (Solving the Change of Basis Matrix). Perform elementary row operations on [𝑼 𝑽] to get [𝑰 𝑨]. 𝑇(𝑛) ∼ 𝑛³.

Example 5 (Diagonalization). 𝑇(𝒙) ∶= 𝑨𝒙 = 𝑺𝚲𝑺⁻¹𝒙 defines a linear transformation which changes the basis from 𝑰 to 𝑺, then transforms 𝒙 in the space of 𝑺, and last changes the basis from 𝑺 back to 𝑰.

Example 6 (SVD Factorization). 𝑇(𝒙) ∶= 𝑨𝒙 = 𝑼𝚺𝑽⊤𝒙 defines a linear transformation which changes the basis from 𝑰 to 𝑽, then transforms 𝒙 from space 𝑽 to space 𝑼, and last changes the basis from 𝑼 back to 𝑰.

9 Applications: Linear Transformations

9.1 Computer Graphics

Definition 48 (Homogeneous Coordinates). Each point [𝑥; 𝑦] ∈ ℝ² can be identified with the point [𝑥; 𝑦; 1] ∈ ℝ³. Homogeneous coordinates can be transformed via multiplication by 3 × 3 matrices, as illustrated in Table 9. By analogy, each point [𝑥; 𝑦; 𝑧] ∈ ℝ³ can be identified with the point [𝑥; 𝑦; 𝑧; 1] ∈ ℝ⁴.

Table 9: Transformation using homogeneous coordinates.
• Scaling: [𝑐ₓ 0 0; 0 𝑐ᵧ 0; 0 0 1] [𝑥; 𝑦; 1] = [𝑐ₓ𝑥; 𝑐ᵧ𝑦; 1].
• Translation: [1 0 𝑥₀; 0 1 𝑦₀; 0 0 1] [𝑥; 𝑦; 1] = [𝑥 + 𝑥₀; 𝑦 + 𝑦₀; 1].
• Reflection: [0 1 0; 1 0 0; 0 0 1] [𝑥; 𝑦; 1] = [𝑦; 𝑥; 1].
• Clockwise rotation: [cos 𝛼 −sin 𝛼 0; sin 𝛼 cos 𝛼 0; 0 0 1] [𝑥; 𝑦; 1].

Theorem 54 (Perspective Projections). A 3D object is represented on the 2D computer screen by projecting the object onto a viewing plane at 𝑧 = 0. Suppose the eye of a viewer is at the point [0; 0; 𝑑]. A perspective projection maps each point [𝑥; 𝑦; 𝑧] onto an image point [𝑥ₚ; 𝑦ₚ; 0] such that the two points and the eye position (the center of projection) are on a line:
𝑥ₚ = 𝑥/(1 − 𝑧/𝑑) ,  𝑦ₚ = 𝑦/(1 − 𝑧/𝑑) . (58)
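A minimal NumPy sketch of Table 9 and Eqn. 58 (the sample point and parameters are made up):

```python
import numpy as np

p = np.array([2.0, 1.0, 1.0])                 # homogeneous 2D point (x, y, 1)

T = np.array([[1, 0, 3],                      # translation by (3, 4)
              [0, 1, 4],
              [0, 0, 1]], dtype=float)
a = np.deg2rad(90)
R = np.array([[np.cos(a), -np.sin(a), 0],     # rotation matrix from Table 9
              [np.sin(a),  np.cos(a), 0],
              [0,          0,         1]])
print(R @ T @ p)                              # rotate after translating

# Perspective projection of a 3D point onto the plane z = 0 (Eqn. 58).
x, y, z, d = 1.0, 2.0, 4.0, 10.0              # viewer's eye at (0, 0, d)
xp, yp = x / (1 - z / d), y / (1 - z / d)
print(xp, yp)
```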
Table 10: Properties of eigenvalues and eigenvectors of special matrices.

9.2 Principal Component Analysis

Definition 49 (Principal Component Analysis, PCA). Given a set of instances {𝒙ᵢ}_{𝑖=1}^{𝑚} with empirical mean 𝝁 ∈ ℝ𝑑 and empirical covariance 𝚺 ∈ ℝ𝑑×𝑑, PCA wants to find a set of orthonormal bases 𝑾 ∶= [𝒘₁ 𝒘₂ ⋯ 𝒘_{𝑑′}] such that the sum of the variances of the projected data along each component is maximized:
arg max_𝑾 tr cov 𝑾⊤(𝒙 − 𝝁) (59)
s. t. 𝑾⊤𝑾 = 𝑰 .

Theorem 55. The optimum 𝑾 of Eqn. 61 is the top 𝑑′ eigenvectors of 𝚺.

Proof. Since 𝔼[𝑾⊤(𝒙 − 𝝁)] = 𝑾⊤𝔼[𝒙 − 𝝁] = 𝟎,
tr cov 𝑾⊤(𝒙 − 𝝁) = tr 𝔼[(𝑾⊤(𝒙 − 𝝁) − 𝟎)(𝑾⊤(𝒙 − 𝝁) − 𝟎)⊤] = tr 𝑾⊤𝔼[(𝒙 − 𝝁)(𝒙 − 𝝁)⊤]𝑾 = tr 𝑾⊤𝚺𝑾 . (60)
The optimization problem is equivalent to
arg min_𝑾 − tr 𝑾⊤𝚺𝑾 (61)
s. t. 𝑾⊤𝑾 = 𝑰 .
The Lagrange function is
ℒ(𝑾, 𝑩) ∶= − tr 𝑾⊤𝚺𝑾 + (vec 𝑩)⊤(vec(𝑾⊤𝑾 − 𝑰)) = − tr 𝑾⊤𝚺𝑾 + tr 𝑩⊤(𝑾⊤𝑾 − 𝑰) . (62)
We can get the optimum by
∂ℒ/∂𝑾 = 𝟎 ⇒ 𝚺𝑾 = 𝑾𝑩 . (63)

Corollary 56. 𝒙̂ ∶= 𝑸⊤(𝒙 − 𝝁) has 𝔼[𝒙̂] = 𝟎 and cov 𝒙̂ = 𝚲, where 𝚺 = 𝑸𝚲𝑸⊤.

Definition 50 (PCA Whitening). 𝒙̂ ∶= 𝚲^{−1/2}𝑸⊤(𝒙 − 𝝁) has 𝔼[𝒙̂] = 𝟎 and cov 𝒙̂ = 𝑰.

Definition 51 (ZCA Whitening). 𝒙̂ ∶= 𝑸𝚲^{−1/2}𝑸⊤(𝒙 − 𝝁) has 𝔼[𝒙̂] = 𝟎 and cov 𝒙̂ = 𝑰.
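A minimal NumPy sketch of Theorem 55 and the whitening transforms (made-up data; `d_prime` is the number of kept components):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) @ np.diag([3.0, 1.0, 0.3])   # made-up data, rows = instances
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)

eigval, Q = np.linalg.eigh(Sigma)             # ascending eigenvalues
order = np.argsort(eigval)[::-1]
eigval, Q = eigval[order], Q[:, order]

d_prime = 2
W = Q[:, :d_prime]                            # top d' eigenvectors (Theorem 55)
Z = (X - mu) @ W                              # projected data

X_pca_white = (X - mu) @ Q / np.sqrt(eigval)  # PCA whitening (Definition 50)
print(np.allclose(np.cov(X_pca_white, rowvar=False), np.eye(3), atol=1e-8))
```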
10 Appendix

Lemma 57 (Sum of Series).
∑_{𝑖=1}^{𝑛} 𝑖 = 𝑛(𝑛 + 1)/2 ∼ ½𝑛² , (64)
∑_{𝑖=1}^{𝑛} 𝑖² = 𝑛(𝑛 + 1)(2𝑛 + 1)/6 ∼ ⅓𝑛³ . (65)

References
[1] S. Axler. Linear Algebra Done Right. Springer, 1997.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[3] S. Boyd and L. Vandenberghe. Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge University Press, 2018.
[4] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1990.
[5] D. C. Lay, S. R. Lay, and J. J. McDonald. Linear Algebra and Its Applications (Fifth Edition). Pearson, 2014.
[6] K. B. Petersen, M. S. Pedersen, et al. The Matrix Cookbook. Technical University of Denmark, 2008.
[7] G. Strang. Linear Algebra and Its Applications (Fourth Edition). Academic Press, 2006.
[8] G. Strang. Computational Science and Engineering. Wellesley-Cambridge Press, 2007.
[9] G. Strang. Introduction to Linear Algebra (Fourth Edition). Wellesley-Cambridge Press, 2009.