
Notes on Advanced Linear Algebra

Chi-Kwong Li

1 Complex vectors and complex matrices

In applications and theoretical development, it is important to study complex vectors and
matrices. Let N, Z, Q, R, C be the sets of natural numbers, integers, rational numbers, real
numbers, and complex numbers, respectively.

1.1 Complex numbers: Basic operations


ˆ A complex number has the standard form z = a + ib with a, b ∈ R, and we have the
complex plane representation. The complex conjugate of z is z̄ = a − ib.

ˆ For z1 , z2 ∈ C, one can perform addition z1 + z2 , subtraction z1 − z2 , multiplication


z1 z2 , and division z1 /z2 provided z2 ̸= 0.

ˆ The size, modulus, or norm of z = a + ib is |z| = (a^2 + b^2)^{1/2}; the argument of z is
θ ∈ [0, 2π) (or in R) with cos θ = a/|z| and sin θ = b/|z|. Note that z z̄ = z̄z = |z|^2.

ˆ The polar form of z is z = |z|(cos θ + i sin θ) = |z|e^{iθ}. If z1 = |z1|e^{iθ1} and z2 = |z2|e^{iθ2},
then z1 z2 = |z1||z2|e^{i(θ1+θ2)}, where we may replace θ1 + θ2 by θ1 + θ2 − 2π in case
θ1 + θ2 ≥ 2π. If z2 ≠ 0, then z1/z2 = (|z1|/|z2|)e^{i(θ1−θ2)}, where we may replace θ1 − θ2
by θ1 − θ2 + 2π in case θ1 < θ2.
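For a quick numerical check of these rules (a sketch using Python's built-in complex numbers and the cmath module; the sample values are arbitrary, and note that cmath.phase returns an argument in (−π, π] rather than [0, 2π)):

```python
import cmath

z1, z2 = 1 + 2j, -3 + 1j

# modulus and argument
r1, t1 = abs(z1), cmath.phase(z1)
r2, t2 = abs(z2), cmath.phase(z2)

# multiply in rectangular form and in polar form; both give the same number
prod_rect = z1 * z2
prod_polar = r1 * r2 * cmath.exp(1j * (t1 + t2))
print(abs(prod_rect - prod_polar) < 1e-12)               # True

# conjugate and modulus: z * conj(z) = |z|^2
print(abs(z1 * z1.conjugate() - abs(z1) ** 2) < 1e-12)   # True
```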

1.2 Real or Complex Vectors and Matrices
Let F = R or C, and Fn be the set of column vectors with n co-ordinates.
   
ˆ If x = (x1, . . . , xn)^t, y = (y1, . . . , yn)^t ∈ Fn and γ ∈ F, then addition and scalar
multiplication are defined by
x + y = (x1 + y1, . . . , xn + yn)^t and γx = (γx1, . . . , γxn)^t,
respectively.

ˆ The set Fn forms a vector space under addition and scalar multiplication.
The addition is closed, associative, and commutative; there is a zero vector 0 ∈ Fn such that
x + 0 = x; for any x ∈ Fn there is an additive inverse −x such that x + (−x) = 0; the scalar
multiplication always yields an element of Fn and satisfies γ1 (γ2 x) = (γ1 γ2 )x and 1x = x for
any γ1 , γ2 ∈ F and x ∈ Fn .

Let Mn (F), Mm,n (F) be the set of n × n and m × n matrices over F, respectively. We
write Mn , Mm,n if F = C.
 
ˆ If A = [A1 0; 0 A2] ∈ Mm+n (F) with A1 ∈ Mm (F) and A2 ∈ Mn (F), we write A = A1 ⊕ A2 .

ˆ xt , At denote the transpose of a vector x and a matrix A.

ˆ For a complex matrix A, A denotes the matrix obtained from A by replacing each
entry by its complex conjugate. Furthermore, A∗ = (A)t .

ˆ If A = (aij ) is m × n, and B = (bjk ) is n × p, then C = AB = (cik ) is m × p such that


cik = ai1 b1k + · · · + ain bnk for 1 ≤ i ≤ m, 1 ≤ k ≤ p.

ˆ If A = (Aij ) is such that Aij is mi × nj for 1 ≤ i ≤ r, 1 ≤ j ≤ s, and B = (Bjk ) is


such that Bjk is nj × pk for 1 ≤ j ≤ s and 1 ≤ k ≤ q, then C = AB = (Cik ) such that
Cik = Ai1 B1k + · · · + Ais Bsk for 1 ≤ i ≤ r, 1 ≤ k ≤ q.

ˆ If A ∈ Mm,n has columns u1 , . . . , un and B ∈ Mn,p has rows v1^t , . . . , vn^t , then
AB = Σ_{j=1}^n uj vj^t .
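A small numerical check of this column-times-row expansion (a sketch in Python/NumPy; the sizes and the random data are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 4, 3, 5
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))

# AB as the sum of (column j of A) times (row j of B)
outer_sum = sum(np.outer(A[:, j], B[j, :]) for j in range(n))
print(np.allclose(A @ B, outer_sum))   # True
```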

1.3 Basic concepts and operations for complex vectors & matrices

We can extend the concepts on real vectors and real matrices to complex vectors and complex
matrices.

ˆ Linear equations, solution sets, elementary row operations.


 
Example. Consider Ax = b with A = [1 3i; 2−i h] and b = (1, i)^t . Determine the values of h
for which the system is solvable.

ˆ Column space, row space, null space, and rank of a complex matrix.
Determine h in the above example so that A has rank one or rank 2. Also, determine
bases for the column space, row space, and null space of A for each choice of h.

ˆ Determinant, eigenvalues, eigenvectors, diagonal form.


Compute the determinant of A above. Find the eigenvalues, eigenvectors of A if h = 1.

ˆ To solve for eigenvalues and eigenvectors,


1) Solve the characteristic equation det(λI − A) = 0 to find the eigenvalues.
Note that det(λI − A) = (λ − λ1 ) · · · (λ − λn ) by the Fundamental Theorem of Algebra.
2) For each root λi of det(λI − A) = 0, find a basis for solution set of (λi I − A)x = 0.
3) There are n linearly independent eigenvectors x1 , . . . , xn corresponding to λ1 , . . . , λn
if and only if AS = SD, where S has columns x1 , . . . , xn , and D = diag (λ1 , . . . , λn ),
so that S −1 AS = D. We say that A is diagonalizable.
Note that if A has n distinct eigenvalues, then A is diagonalizable because each eigen-
value has at least one eigenvector, and these eigenvectors are linearly independent.
For example, A = [0 1; 0 0] is not diagonalizable.
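These steps can be mirrored numerically (a sketch using numpy.linalg.eig on the earlier example with h = 1; the rank test for diagonalizability is only a numerical stand-in for checking that S is invertible):

```python
import numpy as np

A = np.array([[1, 3j], [2 - 1j, 1]])     # the example matrix with h = 1

# 1)-2) eigenvalues and a matrix S whose columns are eigenvectors
eigvals, S = np.linalg.eig(A)
D = np.diag(eigvals)

# 3) A is diagonalizable iff S can be chosen invertible; then S^{-1} A S = D
print(np.linalg.matrix_rank(S) == A.shape[0])        # S invertible?
print(np.allclose(np.linalg.inv(S) @ A @ S, D))      # S^{-1} A S == D

# A non-diagonalizable example: J = [[0, 1], [0, 0]] has only one
# independent eigenvector, so its "eigenvector matrix" is (numerically) singular
J = np.array([[0., 1.], [0., 0.]])
_, SJ = np.linalg.eig(J)
print(np.linalg.matrix_rank(SJ))                     # 1, not 2
```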

ˆ Vector spaces, basis, change of bases.


The space Cn has dimension n; a linearly independent set (or a spanning set) {v1 , . . . , vn }
of n vectors forms a basis. This happens if and only if the matrix S with columns v1 , . . . , vn
is invertible, equivalently, det(S) ≠ 0.

ˆ Linear transformations, range space, kernel.


A matrix A ∈ Mm,n defines a linear transformation T : Cn → Cm such that T (x) = Ax
for any x ∈ Cn . The column space of A is the range space of T , and the null space of A is
the kernel of T .

1.4 Inner product, orthonormal sets, Gram-Schmidt process

Recall that the inner product of u, v ∈ Cn is ⟨u, v⟩ = v ∗ u and satisfies the following:
(1) For any u, u1 , u2 , v ∈ Cn and a, b ∈ C, ⟨au1 + bu2 , v⟩ = a⟨u1 , v⟩ + b⟨u2 , v⟩
(2) For any u, v ∈ Cn , ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩.
(3) For any u ∈ Cn , ⟨u, u⟩ ≥ 0, the equality holds if and only if u = 0.
The Euclidean norm (a.k.a. ℓ2 -norm) of v ∈ Cn is defined by ∥v∥ = (v ∗ v)1/2 and satisfies
the following.
(a) For any v ∈ Cn , ∥v∥ ≥ 0. ( positive definiteness)
The equality holds if and only if v = 0.
(b) For any a ∈ C and v ∈ C, ∥av∥ = |a|∥v∥. (absolute homogeneity)
(c) For any u, v ∈ Cn , ∥u + v∥ ≤ ∥u∥ + ∥v∥. (triangle inequality)
The equality holds if and only if one vector is a nonnegative multiple of the other.
Condition (c) follows from
(d) |⟨u, v⟩| ≤ ∥u∥∥v∥. (Cauchy-Schwarz inequality)
The equality holds if and only if one vector is a multiple of the other.
A set of vectors {u1 , . . . , um } ⊆ Fn is orthonormal if ⟨ui , uj ⟩ = δij , the Kronecker delta
such that δjj = 1 and δij = 0 if i ̸= j. Equivalently, U ∗ U = Im , where U ∈ Mn,m (F) has
columns u1 , . . . , um .
Note: An orthonormal set {u1 , . . . , um } ⊆ Fn is always linearly independent so that m ≤ n.
A vector v is a linear combination of u1 , . . . , um if and only if v = a1 u1 + · · · + am um with
aj = ⟨v, uj ⟩ for j = 1, . . . , m.
Gram-Schmidt Process Let v1 , . . . , vm ∈ Fn be linearly independent with m < n.
Set u1 = v1 /∥v1 ∥.
For k > 1, let fk = vk − (a1 u1 + · · · + ak−1 uk−1 ) with aj = u∗j vk and uk = fk /∥fk ∥.
Then {u1 , . . . , uk } is an orthonormal basis for span {v1 , . . . , vk } for k = 1, . . . , m.
If m < n, one may further extend {u1 , . . . , um } to an orthonormal basis {u1 , . . . , un }.
To see this, one can apply the Gram-Schmidt process to the basic columns of the rank n
matrix [u1 · · · um e1 · · · en ].
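A direct transcription of the process (a sketch in NumPy; the function name is ours, and in floating-point work one would normally use a QR factorization, which is numerically more stable):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors in C^n.

    Returns u_1, ..., u_m with <u_i, u_j> = delta_ij and
    span{u_1, ..., u_k} = span{v_1, ..., v_k} for every k.
    """
    basis = []
    for v in vectors:
        f = v.astype(complex)
        for u in basis:
            f = f - np.vdot(u, v) * u      # subtract a_j u_j with a_j = u_j^* v
        basis.append(f / np.linalg.norm(f))
    return basis

# the vectors of Exercise 8 below
v1 = np.array([1, 1j, 1])
v2 = np.array([1, 1j, 2])
u1, u2 = gram_schmidt([v1, v2])
U = np.column_stack([u1, u2])
print(np.allclose(U.conj().T @ U, np.eye(2)))   # U^* U = I_2
```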
A set {u1 , . . . , un } is an orthonormal basis for Fn if and only if the matrix U with columns
u1 , . . . , un satisfies U ∗ U = In . When F = C, the matrix U is called a unitary matrix; when
F = R, the matrix U is called an orthogonal matrix.
We will denote by Un (F) the set of matrices U ∈ Mn (F) such that U ∗ U = In .

Exercises
 
1. Let A = [ 1     2i     3     4
             2i    6     1+i   1−i
            1+2i  6+2i   4+i   5−i ] .
(a) Reduce the matrix to row echelon form, and find the rank of A.
(b) Find bases for the row space, column space, and null space of A.
(c) Solve the equations Ax = (2, 2 − i, 3 − i)^t and Ax = (1, 0, 0)^t .
 
2. Let A = [i 2; −2 i].
(a) Find the eigenvalues λ1 , λ2 of A, and the corresponding unit eigenvectors u1 , u2 .
(b) Let U = [u1 u2 ]. Show that U ∗ U = I2 and AU = U D with D = diag (λ1 , λ2 ).
(c) Show that A^k = U D^k U ∗ = λ1^k u1 u1∗ + λ2^k u2 u2∗ for all (positive or negative) integers k.

3. Suppose A = SDS −1 ∈ Mn such that D = diag (λ1 , . . . , λn ), and where S has columns
x1 , . . . , xn and S −1 has rows y1^t , . . . , yn^t .
(a) Show that yi^t xj = δij , that is, 1 if i = j and 0 if i ≠ j. [Hint: Consider S −1 S.]
(b) Show that A^k = SD^k S −1 = Σ_{j=1}^n λj^k xj yj^t for every positive integer k.
(c) If A is invertible, show that A^k = SD^k S −1 = Σ_{j=1}^n λj^k xj yj^t for every negative integer k.
(d) For any polynomial f (z) = am z^m + · · · + a0 , let f (A) = am A^m + · · · + a1 A + a0 In .
Show that f (A) = Σ_{j=1}^n f (λj ) xj yj^t .

 
4. Suppose A = [1 i; 0 2] and B = [i 0 0; 2 2i 0; 1 1 3i].
(a) Show that for any C ∈ M2,3 , there is X ∈ M2,3 such that AX + C = XB.
[Hint: Let X = [xij ] and set up a linear system of 6 equations to solve for [xij ] for a
given C.]
(b) Suppose T = [A C; 0 B] for some matrix C ∈ M2,3 . Show that there is X ∈ M2,3
such that T S = S(A ⊕ B) if S = [I2 X; 0 I3 ]. Find S −1 and conclude that S −1 T S = A ⊕ B.
(c) Show that conclusion (a) may fail if A and B share a common eigenvalue.

5. Let u, v1 , v2 ∈ Cn , a, b ∈ C. Show that ⟨u, av1 + bv2 ⟩ = ā⟨u, v1 ⟩ + b̄⟨u, v2 ⟩.

6. Let S = {v1 , . . . , vk } ⊆ Cn be an orthonormal set. Show that S is linearly independent.

7. Let u, v ∈ Cn . Prove the Cauchy-Schwarz inequality |⟨u, v⟩| ≤ ∥u∥∥v∥, and the triangle
inequality ∥u + v∥ ≤ ∥u∥ + ∥v∥, and determine the conditions for equality.
Hint: Let u, v ∈ Cn be nonzero. Consider e^{iθ} such that ⟨u, e^{iθ} v⟩ = |⟨u, v⟩|; then
⟨e^{iθ} v, u⟩, being the complex conjugate of ⟨u, e^{iθ} v⟩, also equals |⟨u, v⟩|. Then for any t ∈ R,

0 ≤ ∥u + te^{iθ} v∥^2 = at^2 + 2bt + c

with a = ∥v∥^2 , c = ∥u∥^2 , b = |⟨u, v⟩|. Then argue that b^2 ≤ ac to prove the inequality,
and argue that equality holds if and only if u + te^{iθ} v = 0 for some t ∈ R.

8. Let v1 = (1, i, 1)t , v2 = (1, i, 2)t .


(a) Apply the Gram-Schmidt process to the vectors v1 , v2 to get an orthonormal pair
u1 , u2 .
(b) Let A = [u1 u2 ]. Solve the system A∗ x = (0, 0)t .
(c) Determine u3 such that {u1 , u2 , u3 } is an orthonormal basis for C3 .

9. Let u = (1, 2i, 1 − i)t . Find a unitary U with u/∥u∥ as the first column.

10. Suppose A ∈ Mn,m with m ≤ n with rank m. Show that A = P U such that P ∈ Mn,m
has orthonormal columns, and U is upper triangular.
 
11. Let A = [1 1−i 2+i; 1 1+i −2+i; i i 2]. Write A = U R for an upper triangular matrix R.
[Apply Gram-Schmidt to the columns of A to get a unitary matrix U .]

2 Unitary equivalence and unitary similarity
Two matrices A, B ∈ Mm,n are unitarily equivalent if there are unitary U ∈ Mm and V ∈ Mn
such that A = U BV . Two matrices X, Y ∈ Mn are unitarily similar if there is a unitary
W ∈ Mn such that X = W ∗ Y W . It is easy to show that these are equivalence relations,
that is, reflexive, symmetric and transitive.
In this chapter, we consider different canonical forms of matrices under unitary equiva-
lence and unitary similarity.

2.1 Singular value decomposition


Lemma 2.1.1 Let A be a nonzero m × n matrix, and u ∈ Cm , v ∈ Cn be unit vectors such
that |u∗ Av| attains the maximum value. Suppose U ∈ Mm and V ∈ Mn are unitary matrices
with u and v as the first columns, respectively. Then U ∗ AV = [u∗ Av 0; 0 A1 ].

Proof. Note that the existence of the maximum of |u∗ Av| follows from a basic analysis result.
Suppose U ∗ AV = (aij ). If the first column x = U ∗ Av = (a11 , . . . , am1 )^t has nonzero
entries other than a11 , then ũ = U x/∥U x∥ = U x/∥x∥ ∈ Cm is a unit vector such that

ũ∗ Av = x∗ U ∗ Av/∥x∥ = x∗ x/∥x∥ = ∥x∥ > (|a11 |^2)^{1/2} = |a11 | = |u∗ Av|,

which contradicts the choice of u and v. Similarly, if the first row y ∗ = u∗ AV = (a11 , . . . , a1n )
has nonzero entries other than a11 , then ṽ = V y/∥V y∥ = V y/∥y∥ is a unit vector satisfying

u∗ Aṽ = u∗ AV y/∥y∥ = y ∗ y/∥y∥ = ∥y∥ > |a11 | = |u∗ Av|,

which is a contradiction. The result follows. □

Theorem 2.1.2 Let A be an m × n matrix of rank r. Then there are unitary matrices
U ∈ Mm , V ∈ Mn such that
U ∗ AV = D = Σ_{j=1}^r sj Ejj .
As a result, if U and V have columns u1 , . . . , um ∈ Cm and v1 , . . . , vn ∈ Cn , then
A = Σ_{j=1}^r sj uj vj∗ .

Proof. We prove the result by induction on max{m, n}. By the previous lemma, there are
unitary matrices U ∈ Mm , V ∈ Mn such that U ∗ AV = [u∗ Av 0; 0 A1 ]. We may replace U by
e^{iθ} U for a suitable θ ∈ [0, 2π) and assume that u∗ Av = |u∗ Av| = s1 . By the induction
assumption, there are unitary matrices U1 ∈ Mm−1 , V1 ∈ Mn−1 such that U1∗ A1 V1 =
diag (s2 , s3 , . . .). Then ([1] ⊕ U1∗ )U ∗ AV ([1] ⊕ V1 ) has the asserted form, where r is the rank of A. □

Remark 2.1.3 The values s1 ≥ · · · ≥ sr > 0 are the nonzero singular values of A; their
squares s1^2 , . . . , sr^2 are the nonzero eigenvalues of AA∗ and A∗ A. The vectors v1 , . . . , vr are the
right singular vectors of A, and u1 , . . . , ur are the left singular vectors of A. So, they are
uniquely determined. We will denote the singular values of A by s1 (A) ≥ s2 (A) ≥ · · ·
Here is another way to do the singular value decomposition. Let {v1 , . . . , vr } ⊆ Cn
be an orthonormal set of eigenvectors corresponding to the nonzero eigenvalues s1^2 , . . . , sr^2
of A∗ A. Let uj = Avj /sj . Then {u1 , . . . , ur } ⊆ Cm is an orthonormal family such that
A = Σ_{j=1}^r sj uj vj∗ .

Similarly, let {u1 , . . . , ur } ⊆ Cm be an orthonormal set of eigenvectors corresponding to
the nonzero eigenvalues s1^2 , . . . , sr^2 of AA∗ . Let vj = A∗ uj /sj . Then {v1 , . . . , vr } ⊆ Cn is an
orthonormal family such that A = Σ_{j=1}^r sj uj vj∗ .
If A ∈ Mm,n (R), then one can find real orthogonal matrices U ∈ Mm and V ∈ Mn with
columns u1 , . . . , um and v1 , . . . , vn such that A = U (Σ_{j=1}^r sj Ejj )V ∗ = Σ_{j=1}^r sj uj vj∗ .

We may extend the definition of the inner product ⟨x, y⟩ and the inner product norm ∥x∥ for
vectors x, y ∈ Fn to matrices by
⟨A, B⟩ = Σ_{i,j} aij b̄ij = tr (AB ∗ ) and ∥A∥F = ⟨A, A⟩^{1/2}
if A = (aij ), B = (bij ) ∈ Mm,n . Here ∥A∥F is called the Frobenius norm or ℓ2 -norm of A.

Theorem 2.1.4 Suppose A ∈ Mm,n (F) has rank r and singular value decomposition A =
Σ_{j=1}^r sj uj vj∗ , where s1 ≥ · · · ≥ sr > 0 and {u1 , . . . , ur } ⊆ Fm , {v1 , . . . , vr } ⊆ Fn are
orthonormal sets. For any positive integer k ≤ r, Ak = Σ_{j=1}^k sj uj vj∗ satisfies

∥A − Ak ∥F ≤ ∥A − X∥F for all X ∈ Mm,n with rank at most k.

If k ≥ r, then no approximation is needed.


Proof. Let B have rank at most k and be such that ∥A − B∥F is minimum among all matrices
of rank at most k. Then there are unitary P ∈ Mm and Q ∈ Mn such that P BQ = Σ_{j=1}^k bj Ejj with
bj = sj (B) for j = 1, . . . , k. Since ∥P XQ∥F = ∥X∥F , if P AQ = (aij ), then

∥A − B∥F^2 = ∥P (A − B)Q∥F^2 = Σ_{i≠j} |aij |^2 + Σ_{j=1}^k |ajj − bj |^2 + Σ_{j>k} |ajj |^2 .

Let C = P (A − B)Q = (cij ). If there is a row index 1 ≤ i ≤ k with cij ≠ 0 for some j ≠ i, we may change the
(i, j) entry of P BQ to aij to get a matrix B̂ of rank at most k so that ∥A − B̂∥F is smaller.
Similarly, if there is a column index 1 ≤ j ≤ k with cij ≠ 0 for some i ≠ j, we may change the (i, j) entry
of P BQ to aij to get a matrix B̂ of rank at most k so that ∥A − B̂∥F is smaller. Hence, at the
minimum, P (A − B)Q = 0k ⊕ A22 . So, P AQ = (Σ_{j=1}^k bj Ejj ) ⊕ A22 , and b1 , . . . , bk are
singular values of A. Thus,

∥P AQ − P BQ∥F^2 = tr (AA∗ ) − Σ_{j=1}^k bj^2 = Σ_{j=1}^r sj (A)^2 − Σ_{j=1}^k bj^2 ,

which is minimum if (b1 , . . . , bk ) = (s1 (A), . . . , sk (A)). □


Note that the Ak is uniquely determined if and only if sk (A) > sk+1 (A).
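The truncated sum Ak is easily formed from any SVD routine, and the optimality in Theorem 2.1.4 can be sanity-checked against random competitors of rank k (a sketch; the random comparison is of course only an illustration, not a proof):

```python
import numpy as np

def best_rank_k(A, k):
    """Truncated SVD A_k = sum_{j<=k} s_j u_j v_j^*, the Frobenius-norm
    closest matrix of rank at most k (Theorem 2.1.4)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
k = 2
Ak = best_rank_k(A, k)
err = np.linalg.norm(A - Ak)            # Frobenius norm by default

# compare against a few random rank-k competitors X = P Q
for _ in range(5):
    X = rng.standard_normal((6, k)) @ rng.standard_normal((k, 4))
    assert np.linalg.norm(A - X) >= err
print(round(err, 4))
```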

2.2 Schur Triangularization lemma and its consequences


Theorem 2.2.1 Let A ∈ Mn and det(λI − A) = ∏_{j=1}^n (λ − λj ). Then there is a unitary U
such that U ∗ AU is in upper (or lower) triangular form with diagonal entries λ1 , . . . , λn .

Proof. By induction on n. If n = 1, the result holds. Assume the result holds for
matrices of sizes smaller than n, and let A ∈ Mn . Let Au1 = λ1 u1 for a unit vector u1 , and let U1
be unitary with first column equal to u1 . Then U1∗ AU1 = [λ1 ∗; 0 A2 ]. By the induction
assumption, there is a unitary V1 ∈ Mn−1 such that V1∗ A2 V1 = T is in triangular form. If U =
U1 ([1] ⊕ V1 ), then U ∗ AU = [λ1 ∗; 0 V1∗ A2 V1 ] = [λ1 ∗; 0 T ] is in upper triangular form. □
Note that λ1 , . . . , λn can be arranged in any order we like. Some of the λj could be the
same. If µ1 , . . . , µr are distinct and det(λI − A) = ∏_{j=1}^r (λ − µj )^{mj} , we say that A has distinct
eigenvalues µ1 , . . . , µr with multiplicities m1 , . . . , mr , respectively.


Theorem 2.2.2 (Cayley-Hamilton) Let A ∈ Mn and f (λ) = det(λI − A) = Σ_{j=0}^n aj λ^j .
Then
f (A) = Σ_{j=0}^n aj A^j = 0n .

Proof. We need to show that Σ_{j=0}^n aj A^j = (A − λ1 I) · · · (A − λn I) = 0n . It suffices to
show that
0n = Z = [U ∗ (A − λ1 I)U ] · · · [U ∗ (A − λn I)U ],

where U ∗ AU = (aij ) is in upper triangular form with diagonal entries λ1 , . . . , λn . Then
Bj = U ∗ (A − λj I)U is in upper triangular form with (j, j) entry equal to zero.
We will prove by induction on n that if B1 , . . . , Bn ∈ Mn are matrices in upper triangular
form, and the (j, j) entry of Bj equals zero for j = 1, . . . , n, then B1 · · · Bn = 0n .
For n = 1, the result is trivial. For n = 2, the product B1 B2 has the form
[0 ∗; 0 ∗] [∗ ∗; 0 0],
which is clearly equal to 02 .
Suppose the result holds for matrices in Mn−1 . Let Bj = [∗ ∗; 0 Tj ] for j = 1, . . . , n. Then
by block multiplication of B2 · · · Bn , and the induction assumption T2 · · · Tn = 0n−1 applied to
the lower right blocks, we have
B1 · · · Bn = [0 ∗; 0 T1 ] [∗ ∗; 0 T2 · · · Tn ] = [0 ∗; 0 T1 ] [∗ ∗; 0 0n−1 ] = 0n .
Now, let Bj = U ∗ (A − λj I)U = U ∗ AU − λj I. We get the desired result. □
Remark. People have the misconception that the theorem holds because one can put λ = A in
the equation det(λI − A) = 0 so that det(λI − A) = det(A − A) = det(0n ) = 0. In the theorem, we
actually put λ^k = A^k in f (λ) = a0 + · · · + a_{n−1} λ^{n−1} + λ^n and conclude that f (A) = 0n , the zero matrix.
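The theorem is easy to verify numerically for a random matrix (a sketch; numpy.poly returns the coefficients of det(λI − A), and the Horner loop below substitutes A for λ):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

coeffs = np.poly(A)          # [1, a_{n-1}, ..., a_0] of det(lambda I - A)

# evaluate f(A) = A^n + a_{n-1} A^{n-1} + ... + a_0 I by Horner's rule
fA = np.zeros_like(A)
for c in coeffs:
    fA = fA @ A + c * np.eye(4)
print(np.allclose(fA, np.zeros((4, 4)), atol=1e-8))   # True
```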

2.3 Normal matrices


Definition 2.3.1 (1) A matrix A ∈ Mn is normal if AA∗ = A∗ A. (2) A matrix A ∈ Mn
is Hermitian if A = A∗ . (3) A matrix A ∈ Mn is positive semidefinite if x∗ Ax ≥ 0 for all
x ∈ Cn . (4) A matrix A ∈ Mn is positive definite if x∗ Ax > 0 for all nonzero
x ∈ Cn . (5) A matrix A ∈ Mn is unitary if A∗ A = In .

Theorem 2.3.2 A matrix A ∈ Mn is normal if and only if A = U DU ∗ for a unitary U ∈ Mn
and a diagonal matrix D, i.e., A is unitarily diagonalizable.

Proof. Suppose U ∗ AU = D, i.e., A = U DU ∗ , for some unitary U ∈ Mn and diagonal D. Then
AA∗ = U DU ∗ U D∗ U ∗ = U DD∗ U ∗ = U D∗ DU ∗ = U D∗ U ∗ U DU ∗ = A∗ A.
Conversely, suppose U ∗ AU = (aij ) = Ã is in upper triangular form. If AA∗ = A∗ A, then
ÃÃ∗ = Ã∗ Ã so that the (1, 1) entries of the matrices on both sides are the same. Thus,

|a11 |2 + · · · + |a1n |2 = |a11 |2

implying that à = [a11 ] ⊕ A1 , where A1 ∈ Mn−1 is in upper triangular form. Now,

[|a11 |2 ] ⊕ A1 A∗1 = ÃÃ∗ = Ã∗ Ã = [|a11 |2 ] ⊕ A∗1 A1 .

Consider the (1, 1) entries of A1 A∗1 and A∗1 A1 , we see that all the off-diagonal entries in the
second row of A1 are zero. Repeating this process, we see that à = diag (a11 , . . . , ann ). □

Proposition 2.3.3 A matrix A ∈ Mn is unitary if and only if it is unitarily similar to a


diagonal matrix with all eigenvalues having modulus 1.

Proof. If U ∗ AU = D = diag (λ1 , . . . , λn ) with |λ1 | = · · · = |λn | = 1, then A is unitary


because
AA∗ = U DU ∗ U D∗ U ∗ = U (DD∗ )U ∗ = U U ∗ = In .

Conversely, if AA∗ = A∗ A = In , then U ∗ AU = D = diag (λ1 , . . . , λn ) for some unitary


U ∈ Mn . Thus, I = U ∗ IU = U ∗ AU U ∗ A∗ U = DD∗ . Thus, |λ1 | = · · · = |λn | = 1. □

Theorem 2.3.4 Let A ∈ Mn . The following are equivalent.

(a) A is Hermitian.

(b) A is unitarily similar to a real diagonal matrix.

(c) x∗ Ax ∈ R for all x ∈ Cn .

Proof. Suppose (a) holds. Then AA∗ = A2 = A∗ A so that U ∗ AU = D = diag (λ1 , . . . , λn )


for some unitary U ∈ Mn . Now, D = U ∗ AU = U ∗ A∗ U = (U ∗ AU )∗ = D∗ . So, λ1 , . . . , λn ∈ R.
Thus (b) holds.
Suppose (b) holds and A = U ∗ DU such that U is unitary and D = diag (d1 , . . . , dn ) is real. Then
for any x ∈ Cn , we can set U x = (y1 , . . . , yn )^t so that x∗ Ax = x∗ U ∗ DU x = Σ_{j=1}^n dj |yj |^2 ∈ R.
Suppose (c) holds. Let A = H + iG with H = (A + A∗ )/2 and G = (A − A∗ )/(2i).
Then H = H ∗ and G = G∗ . Then for any x ∈ Cn , x∗ Hx = µ1 ∈ R and x∗ Gx = µ2 ∈ R so
that x∗ Ax = µ1 + iµ2 . If G is nonzero, then V ∗ GV = diag (λ1 , . . . , λn ) for some unitary V
with some λ1 ≠ 0. Suppose x is the first column of V . Then x∗ Ax = x∗ Hx + ix∗ Gx = µ1 + iλ1 ,
which is not real, a contradiction. So, we have G = 0 and A = H is Hermitian. □

Proposition 2.3.5 Let A ∈ Mn . The following are equivalent.

(a) A is positive semidefinite.

(b) A is unitarily similar to a real diagonal matrix with nonnegative diagonal entries.

(c) A = B ∗ B for some B ∈ Mn . (We can choose B so that B = B ∗ .)

Proof. Suppose (a) holds. Then x∗ Ax ≥ 0 for all x ∈ Cn . Thus, there is a unitary
U ∈ Mn such that U ∗ AU = diag (λ1 , . . . , λn ) with λ1 , . . . , λn ∈ R. If there is λj < 0, we
can let x be the jth column of U so that x∗ Ax = λj < 0, which is a contradiction. So, all
λ1 , . . . , λn ≥ 0.
Suppose (b) holds. Then U ∗ AU = D such that D has nonnegative entries. We have
A = B ∗ B with B = U D1/2 U ∗ = B ∗ . Hence condition (c) holds.
Suppose (c) holds. Then for any x ∈ Cn , x∗ Ax = (Bx)∗ (Bx) ≥ 0. Thus, (a) holds. □

A quick proof of SVD and an efficient algorithm to find SVD.


Let A ∈ Mm,n . Then A∗ A is psd so that V ∗ A∗ AV = diag (λ1 , . . . , λn ) for some unitary V . Since
λj = vj∗ A∗ Avj , we see that λj = sj^2 for some sj ≥ 0, and we may assume that s1^2 ≥ · · · ≥ sn^2 .
Let s1^2 , . . . , sr^2 be the nonzero eigenvalues of A∗ A, and let uj = Avj /∥Avj ∥ ∈ Cm for
j = 1, . . . , r. Then {u1 , . . . , ur } is an orthonormal set and A = Σ_{j=1}^r sj uj vj∗ . If we only need
Ak = Σ_{j=1}^k sj uj vj∗ , one can use the power method to get s1 , v1 and then u1 from A∗ A. Then get
s2 , v2 and then u2 from A2∗ A2 with A2 = A − s1 u1 v1∗ , and so forth.
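A minimal sketch of this power-method-plus-deflation idea (the iteration count, seed, and function name are arbitrary choices; for large sparse problems one would use a dedicated sparse SVD routine):

```python
import numpy as np

def top_singular_triplet(A, iters=500):
    """Power method on A^* A: returns (s1, u1, v1) with A v1 ~ s1 u1."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A.conj().T @ (A @ v)        # one power-method step on A^* A
        v = w / np.linalg.norm(w)
    s = np.linalg.norm(A @ v)
    return s, (A @ v) / s, v

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
s1, u1, v1 = top_singular_triplet(A)
A2 = A - s1 * np.outer(u1, v1.conj())   # deflate and repeat for s2, u2, v2
s2, u2, v2 = top_singular_triplet(A2)
print(np.allclose([s1, s2], np.linalg.svd(A, compute_uv=False)[:2]))
```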


For any A ∈ Mn we can write A = H + iG with H = (A + A∗ )/2 and G = (A − A∗ )/(2i).
This is known as the Hermitian or Cartesian decomposition.

Theorem 2.3.6 Let A ∈ Mn . Then A = P U = V Q for some positive semidefinite matrices


P, Q ∈ Mn and unitary U, V ∈ Mn .

ˆ If A is invertible, then the matrices P, Q, U, V are uniquely determined as (P, U ) =
((AA∗ )^{1/2} , P −1 A) and (Q, V ) = ((A∗ A)^{1/2} , AQ−1 ).

ˆ The matrix A is normal if and only if P U = U P or V Q = QV .
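Both polar factors can be read off from a singular value decomposition (a sketch for square A; writing A = UΣV∗ gives P = UΣU∗ and the unitary factor W = UV∗, so A = PW — the names P and W are ours):

```python
import numpy as np

def polar(A):
    """Return (P, W) with A = P @ W, P positive semidefinite, W unitary."""
    U, s, Vh = np.linalg.svd(A)
    P = (U * s) @ U.conj().T            # P = U diag(s) U^*  (= (A A^*)^{1/2})
    W = U @ Vh                          # W = U V^*
    return P, W

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
P, W = polar(A)
print(np.allclose(A, P @ W),
      np.allclose(W @ W.conj().T, np.eye(4)),
      np.all(np.linalg.eigvalsh(P) >= -1e-12))
```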

Corollary 2.3.7 In fact, if A ∈ Mn,m with n ≥ m and has rank m, then A = V R where
V ∈ Mn,m has orthonormal columns and R ∈ Mm can be chosen to be upper triangular,
lower triangular, or positive definite.

2.4 Commuting families and Specht’s theorem


Definition 2.4.1 A family F ⊆ Mn is a commuting family if every pair of matrices X, Y ∈
F commute, i.e., XY = Y X.

Lemma 2.4.2 Let F ⊆ Mn be a commuting family. Then there is a unit vector v ∈ Cn such
that v is an eigenvector for every A ∈ F.

Proof. Let V ⊆ Cn with minimum positive dimension be such that A(V ) ⊆ V . We will
show that dim V = 1 and the result will follow. First, A(Cn ) ⊆ Cn . So, one can always try

to find V with a minimum positive dimension. We claim that every nonzero vector in V
is an eigenvector of A for every A ∈ F. Then for any non-zero v ∈ V , V0 = span {v} will
satisfy A(V0 ) ⊆ V0 with dim V0 = 1.
Suppose there is A ∈ F such that not every nonzero vector in v is an eigenvector of A.
Now, if V has an orthonormal basis {u1 , . . . , uk } and U is unitary with u1 , . . . , uk as the
 
∗ B11 B12
first k columns. Then U BU = with B ∈ Mk for every B ∈ F. Then there is
0 B22
v = a1 u1 + · · · + ak uk ∈ V such that Av = λv.
Let V0 = {u ∈ V : Au = λu} ⊂ V . Then V0 is a subspace of V with smaller dimension.
Next, we show that Bu ∈ V0 for any u ∈ V0 . If B ∈ F and u ∈ V , then Bu ∈ V as
B(V ) ⊆ V , and A(Bu) = BAu = Bλu = λBu, i.e., ũ = Bu ∈ V0 . So, V0 satisfies
B(V0 ) ⊆ V0 and dim V0 < dim V , which is impossible. The desired result follows. □

Theorem 2.4.3 Let F ⊆ Mn be a commuting family. Then there is a unitary matrix


U ∈ Mn such that U ∗ AU is in upper triangular form.

Proof. We can consider the a basis for the span of F, and assume that F = {A1 , . . . , Am }
is finite. Assume A1 is nonscalar, and has an eigenvalue λ1 . Then Aj (V) ⊂ V if V is the null
space of A1 − λ1 I. By induction, there is a common unit eigenvector x for all Aj ∈ F. Then
 
∗ ∗
construct U with x as the first column so that U ∗ Aj U = , where {B1 , . . . , Bm } is a
0 Bj
commuting families. Apply induction to finish the proof. □

Corollary 2.4.4 Suppose F ⊆ Mn is a commuting family of normal matrices. Then there


is a unitary matrix U ∈ Mn such that U ∗ AU is in diagonal form.

There is no easy canonical form under unitary similarity.1 How can one determine whether two
matrices are unitarily similar?

Definition 2.4.5 Let {X, Y } ⊆ Mn . A word W (X, Y ) in X and Y of length m is a product


of m matrices chosen from {X, Y } (with repetition).

Theorem 2.4.6 Let A, B ∈ Mn .


(a) If A and B are unitarily similar, then tr (W (A, A∗ )) = tr (W (B, B ∗ )) for all words
W (X, Y ).
(b) If tr (W (A, A∗ )) = tr (W (B, B ∗ )) for all words W (X, Y ) of length at most 2n^2 , then A and B
are unitarily similar.

1
Helene Shapiro, A survey of canonical forms and invariants for unitary similarity, Linear Algebra Appl.
147 (1991), 101-167.

2.5 Other canonical forms
Unitary congruence

ˆ A matrix A ∈ Mn is unitarily congruent to B ∈ Mn if there is a unitary matrix U such


that A = U t BU .

ˆ There is no easy canonical form under unitary congruence for general matrices.
Pk
ˆ Every complex symmetric matrix A ∈ Mn is unitarily congruent to j=1 sj Ejj , where
s1 ≥ · · · ≥ sk > 0 are the nonzero singular values of A.

ˆ Every skew-symmetric A ∈ Mn is unitarily congruent to 0n−2k and


 
0 sj
, j = 1, . . . , k,
−sj 0

where s1 ≥ · · · ≥ sk > 0 are nonzero singular values of A.

ˆ The singular values of a skew-symmetric matrix A ∈ Mn occur in pairs.

ˆ Two symmetric (skew-symmetric) matrices are unitarily congruent if and only if they
have the same singular values.

Proof. Suppose A ∈ Mn is symmetric. Let x ∈ Cn be a unit vector so that xt Ax is


real and maximum, and let U ∈ Mn be unitary with x as the first column. Show that
U t AU = [s1 ] ⊕ A1 . Then use induction.
Suppose A ∈ Mn is skew-symmetric. Let x, y ∈ Cn be orthonormal pairs such that xt Ay
is real and maximum, and U ∈ Mn be unitary with x, y as the first two columns. Show that
 
t 0 s1
U AU = ⊕ A1 . Then use induction. □
−s1 0

2.6 Real matrices


Theorem 2.6.1 Let A ∈ Mn be a real matrix, and

det(xI − A) = (x − c1 ) · · · (x − cr )(x2 − 2a1 x + a21 + b21 ) · · · (x2 − 2ak x + a2k + b2k ).

Then there is a real orthogonal matrix P such that P t AP = (Crs )0≤r,s≤k is in upper tri-
angular block form, where C00 ∈ Mr (R) is an upper triangular matrix with diagonal entries
c1 , . . . , cr , Cjj ∈ M2 (R) has eigenvalues aj ± ibj for j = 1, . . . , k, and Crs is zero if r > s.

Furthermore, if A is normal, i.e., At A = AAt , then

P t AP = B0 ⊕ B1 ⊕ · · · ⊕ Bk

with B0 = diag (c1 , . . . , cr ), and Bj = [aj bj ; −bj aj ] ∈ M2 (R) for j = 1, . . . , k.
(a) If A = At , then B1 , . . . , Bk are vacuous.
(b) If A = −At , then B0 = 0r .
(c) If A is orthogonal, then c1 , . . . , cr ∈ {1, −1} and aj^2 + bj^2 = 1 for j = 1, . . . , k.

Proof. If A has a real eigenvalue c1 and Au1 = c1 u1 , where u1 is a unit eigenvector.


 
t c1 ⋆
Let P be real orthogonal with u1 as the first column. Then P1 AP1 = . If A has
0 A1
another real eigenvalue c2 , then A1 has c1 as an eigenvalue and there is an orthogonal matrix
 
t c2 ⋆
P2 ∈ Mn−1 such that P2 A1 P2 = . Then
0 A2
 
c1 ⋆ ⋆
([1] ⊕ P2t )P1t AP1 ([1] ⊕ P2 ) =  0 c2 ⋆  .
0 0 A2

Repeating this argument, we can get


 
C00 ⋆
Prt APr = .
0 C1

Now, C1 has complex eigenvalue a1 ± ib1 . If C1 (x + iy) = (a1 + ib1 )(x + iy) for a pair
of nonzero real vectors x, y ∈ Rn . Then C1 x = a1 x − b1 y and C1 y = a1 y + b1 x, and
C1 (x − iy) = (a1 − ib1 )(x − iy), i.e., C1 [x y] = [x y]B1 . Now, x + iy and x − iy are eigenvectors
of B1 corresponding to the eigenvalues a1 ± ib1 . So, {x + iy, x − iy} is linear independent
and so is {x, y}. Apply Gram-Schmidt process to {x, y} to get a real orthonormal family
{q1 , q2 }. Then [x y] = [q1 q2 ]T1 for an upper triangular matrix T1 ∈ M2 (R). Let Q1 ∈ M2k
be real orthogonal with q1 , q2 as the first two columns. Then
 
C11 ⋆
Qt1 B1 Q1 =
0 C2

so that C11 = T1 B1 T1−1 has eigenvalues a1 ± ib1 . One can apply an inductive arguments to
C2 and get the desired form.
In case A is normal, then so is Qt AQ. One can then deduce that Qt AQ has the form
B0 ⊕ · · · ⊕ Bk . Assertions (a) – (c) can be verified directly. □

3 Similarity and equivalence
We consider other canonical forms in this chapter.

3.1 Jordan Canonical form


Theorem 3.1.1 Suppose A ∈ Mn has distinct eigenvalues λ1 , . . . , λk . Then A is similar to
A11 ⊕ · · · ⊕ Akk such that Ajj has (only one distinct) eigenvalue λj for j = 1, . . . , k.

Lemma 3.1.2 Suppose A ∈ Mm , B ∈ Mn have no common eigenvalues. Then for any


C ∈ Mm,n there is a unique solution X ∈ Mm,n such that AX − XB = C.

Proof. Let U be unitary such that B̃ = U ∗ BU is in upper triangular form. If C̃ = CU


and Y = XU , then we consider C̃ = CU = A(XU ) − (XU )U ∗ BU = AY − Y B̃ and solve
for Y . Let C̃ = [c1 · · · cn ], Y = [y1 · · · yn ] and B̃ = [bij ], where b11 , . . . , bnn are the eigenvalues
of B. Then
c1 = Ay1 − b11 y1 has a unique solution y1 as A − b11 I is invertible,
c2 = Ay2 − b22 y2 − b12 y1 has a unique solution y2 as A − b22 I is invertible,
...
cn = Ayn − bnn yn − Σ_{j=1}^{n−1} bjn yj has a unique solution yn as A − bnn I is invertible. □

 
Proposition 3.1.3 Suppose A = [A11 A12 ; 0 A22 ] ∈ Mn such that A11 ∈ Mk , A22 ∈ Mn−k have
no common eigenvalue. Then A is similar to A11 ⊕ A22 .

Proof. By the previous lemma, there is X such that A11 X + A12 = XA22 . Let
S = [Ik X; 0 In−k ], so that AS = S(A11 ⊕ A22 ). The result follows. □

Definition 3.1.4 Let Jk (λ) ∈ Mk be the matrix with all diagonal entries equal to λ, all
superdiagonal entries equal to 1, and all other entries equal to 0. Then Jk (λ) is called a (an
upper triangular) Jordan block of λ of size k.

Theorem 3.1.5 Every A ∈ Mn is similar to a direct sum of Jordan blocks.

Proof. We may assume that A = A11 ⊕· · ·⊕Akk . If we can find invertible matrices S1 , . . . , Sk
such that Si−1 Aii Si is in Jordan form, then S −1 AS is in Jordan form for S = S1 ⊕ · · · ⊕ Sk .
Focus on T = Aii − λi Ink . If S −1 T S is in Jordan form, then so is Aii .
One may see http://cklixx.people.wm.edu/teaching/math408/Jordan.pdf for a proof of
this. The note will appear on arXiv soon. □
To determine the Jordan form of a matrix A with det(xI − A) = (x − λ1 )^{n1} · · · (x − λk )^{nk} ,
one only needs to study the rank of (A − λj I)^m for m = 1, . . . , nj .
Let ℓi be the dimension of ker((A − λI)^i ), with ℓ0 = 0. Then there are ℓ1 Jordan blocks of λ,
and there are ℓi − ℓi−1 Jordan blocks of λ of size at least i.
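This rank recipe is easy to automate for small exact examples (a sketch; the helper name and tolerance are ours, and the computation is numerically meaningful only when the eigenvalue and the ranks are unambiguous — here it is tested on the matrix T of Example 3.1.6 below):

```python
import numpy as np

def jordan_block_sizes(A, lam, tol=1e-9):
    """Sizes of the Jordan blocks of A for the eigenvalue lam,
    read off from the nullities of (A - lam I)^i."""
    n = A.shape[0]
    M = A - lam * np.eye(n)
    nullity = [0]
    P = np.eye(n)
    for _ in range(n):
        P = P @ M
        nullity.append(n - np.linalg.matrix_rank(P, tol=tol))
    sizes = []
    for i in range(1, n + 1):
        at_least_i = nullity[i] - nullity[i - 1]          # blocks of size >= i
        at_least_i1 = (nullity[i + 1] - nullity[i]) if i < n else 0
        sizes += [i] * (at_least_i - at_least_i1)         # blocks of size exactly i
    return sorted(sizes, reverse=True)

T = np.array([[0, 0, 1, 2],
              [0, 0, 3, 4],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
print(jordan_block_sizes(T, 0.0))     # [2, 2]: two Jordan blocks J_2(0)
```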
 
Example 3.1.6 Let T = [0 0 1 2; 0 0 3 4; 0 0 0 0; 0 0 0 0]. Then T e1 = 0, T e2 = 0, T e3 = e1 + 3e2 ,
T e4 = 2e1 + 4e2 . So, T (V ) = span {e1 , e2 }. Now, T e1 = T e2 = 0 so that e1 , e2 form a Jordan basis
for T (V ). Solving for u1 , u2 such that T (u1 ) = e1 , T (u2 ) = e2 , we let u1 = −2e3 + 3e4 /2 and
u2 = e3 − e4 /2. Thus, T S = S(J2 (0) ⊕ J2 (0)) with
S = [1 0 0 0; 0 0 1 0; 0 −2 0 1; 0 3/2 0 −1/2].
 
Example 3.1.7 Let T = [0 1 2; 0 0 3; 0 0 0]. Then T e1 = 0, T e2 = e1 , T e3 = 2e1 + 3e2 . So,
T (V ) = span {e1 , e2 }, and e2 , T e2 = e1 form a Jordan basis for T (V ). Solving for u1 such that
T (u1 ) = e2 , we have u1 = (−2e2 + e3 )/3. Thus, T S = SJ3 (0) with
S = [1 0 0; 0 1 −2/3; 0 0 1/3].

Example 3.1.8 Suppose A ∈ M9 has distinct eigenvalues λ1 , λ2 , λ3 such that A − λ1 I has


rank 8, A − λ2 I has rank 7, (A − λ2 I)2 and (A − λ2 I)3 have rank 5, A − λ3 I has rank 6,
(A − λ3 I)2 and (A − λ3 I)3 have rank 5. Then the Jordan form of A is

J1 (λ1 ) ⊕ J2 (λ2 ) ⊕ J2 (λ2 ) ⊕ J1 (λ3 ) ⊕ J1 (λ3 ) ⊕ J2 (λ3 ).

3.2 Implications of the Jordan form
Theorem 3.2.1 Two matrices are similar if and only if they have the same Jordan form.

Proof. If A and B have Jordan form J, then S −1 AS = J = T −1 BT for some invertible


S, T so that R−1 AR = B with R = ST −1 .
If S −1 AS = B, then rank (A − µI)ℓ = rank (B − µI)ℓ for all eigenvalues of A or B, and
for all positive integers ℓ. So, A and B have the same Jordan form. □

Remark 3.2.2 If A = S(J1 ⊕ · · · ⊕ Jk )S −1 , then Am = S(J1m ⊕ · · · ⊕ Jkm )S −1 .


Theorem 3.2.3 Let Jk (λ) = λIk + Nk , where Nk = Σ_{j=1}^{k−1} Ej,j+1 . Then

Jk (λ)^m = Σ_{j=0}^m (m choose j) λ^{m−j} Nk^j ,

where Nk^0 = Ik , Nk^j = 0 for j ≥ k, and Nk^j has ones on the jth superdiagonal (entries with
indices (ℓ, ℓ + j)) and zeros elsewhere.

For every polynomial function f (z) = am z m + · · · + a0 , let

f (A) = am Am + · · · + a0 In for A ∈ Mn .

Definition 3.2.4 Let A ∈ Mn . Then there is a unique monic polynomial

mA (z) = z^m + a1 z^{m−1} + · · · + am

of minimum degree such that mA (A) = 0. It is called the minimal polynomial of A.

Theorem 3.2.5 A polynomial g(z) satisfies g(A) = 0 if and only if it is a multiple of the
minimal polynomial of A.

Proof. If g(z) = mA (z)q(z), then g(A) = mA (A)q(A) = 0. To prove the converse, by


the Euclidean algorithm, g(z) = mA (z)q(z) + r(z) with deg r(z) < deg mA (z) for any polynomial g(z).
If 0 = g(A) = mA (A)q(A) + r(A) = r(A), then r(A) = 0. If r(z) is not zero, then there is a
nonzero µ ∈ C such that µr(z) is a monic polynomial with µr(A) = 0 and degree less than that of
mA (z), contradicting the minimality of the degree of mA (z). So, r(z) = 0, i.e., g(z) is a multiple of mA (z). □
We can actually determine the minimal polynomial of A ∈ Mn using its Jordan form.

Theorem 3.2.6 Suppose A has distinct eigenvalues λ1 , . . . , λk such that rj is the maximum
size of a Jordan block of λj for j = 1, . . . , k. Then mA (z) = (z − λ1 )^{r1} · · · (z − λk )^{rk} .

Proof. Following the proof of the Cayley Hamilton Theorem, we see that mA (A) = 0n .
By the last Theorem, if g(A) = 0n , then g(z) = mA (z)q(z). So, taking q(z) = 1 will yield
the monic polynomial of minimum degree satisfying mA (A) = 0. □

Remark 3.2.7 For any polynomial g(z), the Jordan form of g(A) can be determined in
terms of the Jordan form of A. In particular, for every Jordan block Jk (λ), we can write
g(z) = (z − λ)^k q(z) + r(z) with r(z) = a0 + · · · + ak−1 z^{k−1} so that g(Jk (λ)) = r(Jk (λ)).

Note that g(Jr (λ)) is the r × r upper triangular Toeplitz matrix

g(Jr (λ)) = Σ_{j=0}^{r−1} (g^{(j)}(λ)/j!) Nr^j ,

i.e., its (i, ℓ) entry equals g^{(ℓ−i)}(λ)/(ℓ − i)! for ℓ ≥ i and 0 for ℓ < i.
One can extend this to functions g(x) which are differentiable up to order r in a domain
containing λ in the interior.
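For instance, with g(z) = e^z the derivative formula can be compared against a long truncation of the Taylor series of exp(Jr(λ)) (a sketch; the block size, the value of λ, and the truncation length are arbitrary choices):

```python
import numpy as np
from math import factorial, exp

r, lam = 4, 0.7
J = lam * np.eye(r) + np.diag(np.ones(r - 1), k=1)      # J_r(lambda)
N = J - lam * np.eye(r)                                  # nilpotent part, N^r = 0

# g(J) via the derivative formula: sum_j g^{(j)}(lam)/j! * N^j, with g = exp
gJ_formula = sum(exp(lam) / factorial(j) * np.linalg.matrix_power(N, j)
                 for j in range(r))

# exp(J) via its Taylor series, truncated far enough to be accurate
gJ_series = sum(np.linalg.matrix_power(J, k) / factorial(k) for k in range(30))

print(np.allclose(gJ_formula, gJ_series))   # True
```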

3.3 Further canonical forms

Equivalence

ˆ Two matrices A, B ∈ Mm,n are equivalent if there are invertible matrices R ∈ Mm , S ∈


Mn such that A = RBS.
Pk
ˆ Every matrix A ∈ Mm,n is equivalent to j=1 Ejj , where k is the rank of A.

ˆ Two matrices are equivalent if they have the same rank.

Proof. Elementary row operations and elementary column operations. □

∗-congruence

ˆ A matrix A ∈ Mn is ∗-congruent to B ∈ Mn if there is an invertible matrix S such


that A = S ∗ BS.

ˆ There is no easy canonical form under ∗-congruence for general matrix.2

ˆ Every Hermitian matrix A ∈ Mn is ∗-congruent to Ip ⊕ −Iq ⊕ 0n−p−q . The triple


ν(A) = (p, q, n − p − q) is known as the inertia of A.
2
Roger A. Horn and Vladimir V. Sergeichuk, Canonical forms for complex matrix congruence and ∗-
congruence, Linear Algebra Appl. (2006), 1010-1032.

ˆ Two Hermitian matrices are ∗-congruent if and only if they have the same inertia.

Proof. Use the unitary congruence/similarity results. □

Congruence or t-congruence

ˆ A matrix A ∈ Mn is t-congruent to B ∈ Mn if there is an invertible matrix S such that


A = S t BS.

ˆ There is no easy canonical form under t-congruence for general matrices; see footnote
2.

ˆ Every complex symmetric matrix A ∈ Mn is t-congruent to Ik ⊕ 0n−k , where k =


rank (A).
 
0 1
ˆ Every skew-symmetric A ∈ Mn is t-congruent to 0n−2k and k copies of .
−1 0

ˆ The rank of a skew-symmetric matrix A ∈ Mn is even.

ˆ Two symmetric (skew-symmetric) matrices are t-congruent if and only if they have the
same rank.

Proof. Use the unitary congruence results. □

3.4 Remarks on real matrices


Remark 3.4.1 Let A ∈ Mn (R). Then A = S + K where S = (A + At )/2 is symmetric and
K = (A − At )/2 is skew-symmetric, i.e., K t = −K.

ˆ Note that xt Kx = 0 for all x ∈ Rn .

ˆ Clearly, xt Ax ∈ R for all real vectors x ∈ Rn , and the condition does not imply that A
is symmetric as in the complex Hermitian case.

ˆ The matrix A satisfies xt Ax ≥ 0 for all if and only if (A + At )/2 has only nonnegative
eigenvalues. The condition does not automatically imply that A is symmetric as in the
complex Hermitian case.

ˆ Every skew-symmetric matrix K ∈ Mn (R) is orthogonally similar to 02k and


 
0 sj
, j = 1, . . . , k,
−sj 0

where s1 ≥ · · · ≥ sk > 0 are nonzero singular values of A.

ˆ If A ∈ Mn (R) has only real eigenvalues, then one can find a real invertible matrix such
that S −1 AS is in Jordan form.

ˆ If A ∈ Mn (R), then there is a real invertible matrix such that S −1 AS is a direct sum
of real Jordan blocks, and 2k × 2k generalized Jordan blocks of the form (Cij )1≤i,j≤k
 
µ1 µ2
with C11 = · · · = Ckk = , C12 = · · · = Ck−1,k = I2 , and all other blocks
−µ2 µ1
equal to 02 .

ˆ The proof can be done by the following two steps.


First of all, find the Jordan form of A. Then group Jk (λ) and Jk (λ̄) together for any
complex eigenvalues, and find a complex S such that S −1 AS is a direct sum of the
above form.
Second if S = S1 + iS2 for some real matrix S1 , S2 , show that there is Ŝ = S1 + rS2 for
some real number r such that Ŝ is invertible so that Ŝ −1 AŜ has the desired form.

4 Eigenvalues and singular values inequalities
We study inequalities relating the eigenvalues, diagonal elements, singular values of matrices
in this chapter.
For a Hermitian matrix A, let λ(A) = (λ1 (A), . . . , λn (A)) be the vector of eigenvalues of A
with entries arranged in descending order. Also, we will denote by s(A) = (s1 (A), . . . , sn (A))
the singular values of a matrix A ∈ Mm,n . For two Hermitian matrices, we write A ≥ B if
A − B is positive semidefinite.

4.1 Diagonal entries and eigenvalues of a Hermitian matrix


Theorem Let A = (aij ) ∈ Mn be Hermitian with eigenvalues λ1 ≥ · · · ≥ λn . Then for any
1 ≤ k < n, a11 + · · · + akk ≤ λ1 + · · · + λk . The equality holds if and only if A = A11 ⊕ A22
so that A11 has eigenvalues λ1 , . . . , λk .

Remark The above result will give us what we needed, and we can put the majorization
result as a related result for real vectors.

Lemma 4.1.1 (Rayleigh principle) Let A ∈ Mn be Hermitian. Then for any unit vector
x ∈ Cn ,
λ1 (A) ≥ x∗ Ax ≥ λn (A).
The equalities hold at unit eigenvectors corresponding to the largest and smallest eigenvalues
of A, respectively.

Proof. Done in homework problem. □


If we take x = ej , we see that every diagonal entry of a Hermitian matrix A lies between
λ1 (A) and λn (A).
We can say more in the following. To do that we need the notion of majorization and
doubly stochastic matrices.
A matrix D = (dij ) ∈ Mn is doubly stochastic if dij ≥ 0 and all the row sums and column
sums of D equal 1.
Let x, y ∈ Rn . We say that x is weakly majorized by y, denoted by x ≺w y, if the sum of
the k largest entries of x is not larger than that of y for k = 1, . . . , n; if in addition the sums
of the entries of x and y are equal, we say that x is majorized by y, denoted by x ≺ y. We say
that x is obtained from y by a pinching if x is obtained from y by changing (yi , yj ) to
(yi − δ, yj + δ) for two entries yi > yj of y and some δ ∈ (0, yi − yj ).
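A direct transcription of these definitions (a sketch; the function names and the numerical tolerance are ours):

```python
import numpy as np

def weakly_majorized(x, y):
    """x <_w y : partial sums of the sorted entries of x never exceed those of y."""
    xs = np.sort(x)[::-1]
    ys = np.sort(y)[::-1]
    return np.all(np.cumsum(xs) <= np.cumsum(ys) + 1e-12)

def majorized(x, y):
    """x < y : weak majorization plus equal total sums."""
    return weakly_majorized(x, y) and np.isclose(np.sum(x), np.sum(y))

y = np.array([5.0, 3.0, 1.0])
x = np.array([4.0, 3.0, 2.0])            # obtained from y by pinching (5, 1) -> (4, 2)
print(majorized(x, y), majorized(y, x))  # True False
```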

Theorem 4.1.2 Let x, y ∈ Rn with n ≥ 2. The following conditions are equivalent.

(a) x ≺ y.

(b) There are vectors x1 , x2 , . . . , xk with k < n, x1 = y, xk = x, such that each xj is


obtained from xj−1 by pinching two of its entries.

(c) x = Dy for some doubly stochastic matrix.

Proof. Note that the conditions do not change if we replace (x, y) by (P x, Qy) for any
permutation matrices P, Q. We may make these changes in our proof.
(c) ⇒ (a). We may assume that x = (x1 , . . . , xn )t and y = (y1 , . . . , yn )t with entries
in descending order. Suppose x = Dy for a doubly stochastic matrix D = (dij ). Let
vk = (e1 + · · · + ek ) and vkt D = (c1 , . . . , cn ). Then 0 ≤ cj ≤ 1 and nj=1 cj = k. So,
P

k
X
xj = vkt Dy = c1 y1 + · · · + cn yn
j=1

≤ c1 y1 + ck yk + [(1 − c1 ) + · · · + [1 − ck )]yk ≤ y1 + · · · + yk .

Clearly, the equality holds if k = n.


(a) ⇒ (b). We prove the result by induction on n. If n = 2, the result is clear. Suppose the
result holds for vectors of length less than n. Assume x = (x1 , . . . , xn )t and y = (y1 , . . . , yn )t
has entries arranged in descending order, and x ≺ y. Let k be the maximum integer such
that yk ≥ x1 . If k = n, then for S = nj=1 xj = nj=1 yj ,
P P

n−1
X n−1
X
yn ≥ x1 ≥ · · · xn ≥= S − xj ≥ S − yj = yn
j=1 j=1

so that x = · · · = xn = y1 = · · · = yn . So, x = x1 = y. Suppose k < n and yk ≥


x1 > yk+1 . Then we can replace (yk , yk+1 ) by (ỹk , ỹk+1 ) = (x1 , yk + yk+1 − x1 ). Then
removing x1 from x and removing ỹk in x1 will yield the vectors x̃ = (x2 , . . . , xn )t and
ỹ = (y1 , . . . , yk−1 , ỹk+1 , . . . , yn )t in Rn−1 with entries arranged in descending order. We will
show that x̃ ≺ ỹ. The result will then follows by induction. Now, if ℓ ≤ k, then

x2 + · · · + xℓ ≤ x1 + · · · + xℓ−1 ≤ y1 + · · · + yℓ−1 ;

if ℓ > k, then

x2 + · · · + xℓ ≤ (y1 + · · · + yℓ ) − x1 = y1 + · · · + yk−1 + ỹk+1 + yk+1 + · · · + yℓ

with equality when ℓ = n. The results follows.

(b) ⇒ (c). If xj is obtained from xj−1 by pinching the pth and qth entries. Then there
is a doubly stochastic matrix Pj obtained from I by changing the submatrix in rows and
columns p, q by
 
tj 1 − tj
1 − tj tj
for some tj ∈ (0, 1). Then x = Dy for D = Pk · · · P1 , which is doubly stochastic. □

Theorem 4.1.3 Let d, a ∈ Rn . The following are equivalent.


(a) There is a complex Hermitian (real symmetric) A ∈ Mn with entries of a as eigenvalues
and entries of d as diagonal entries.

(b) The vectors satisfy d ≺ a.

Proof. Let A = U DU ∗ such that D = diag (λ1 , . . . , λn )). Suppose A = (aij ) and U =
(uij ). Then ajj = ni=1 λi |uji |2 . Because (|uji |2 ) is doubly stochastic. So, (a11 , . . . , ann ) ≺
P

(λ1 , . . . , λn ).
We prove the converse by induction on n. Suppose (d1 , . . . , dn ) ≺ (λ1 , . . . , λn ). If n = 2,
let d1 = λ1 cos2 θ + λ2 sin2 θ so that
   
cos θ sin θ λ1 cos θ − sin θ
(aij ) =
− sin θ cos θ λ2 sin θ cos θ
has diagonal entries d1 , d2 .
Suppose n > 2. Choose the maximum k such that λk ≥ d1 . If λn = d1 , then for
S = nj=1 dj = nj=1 λj we have
P P

n−1
X n−1
X
λn ≥ d1 ≥ · · · ≥ dn = S − dj ≥ S − λj = λn .
j=1 j=1
Pn
Thus, λn = d1 = · · · = dn = S/n = j=1 λj /n implies that λ1 = · · · = λn . Hence,
A = λn I is the required matrix. Suppose k < n. Then there is A1 = At1 ∈ M2 (R)
with diagonal entries d1 , λk + λk+1 − d1 and eigenvalues λj , λj+1 . Consider A = A1 ⊕ D
with D = diag (λ1 , . . . , λk−1 , λk+2 , . . . , λn ). As shown in the proof of Theorem 4.1.3, if
λ̃k+1 = λk + λk+1 − d1 , then

(d2 , . . . , dn ) ≺ (λ̃k+1 , λ1 , . . . , λk−1 , λk+2 , . . . , λn ).

By induction assumption, there is a unitary U ∈ Mn−1 such that

U ([λ̃k ] ⊕ D)U ∗ ∈ Mn−1

has diagonal entries d2 , . . . , dn . Thus, A = ([1] ⊕ U )(A1 ⊕ D)([1] ⊕ U ∗ ) has the desired
eigenvalues and diagonal entries. □

4.2 Max-Min and Min-Max characterization of eigenvalues
In this subsection, we give a Max-Min and Min-Max characterization of eigenvalues of a
Hermitian matrix.

Lemma 4.2.1 Let V1 and V2 be subspaces of Cn such that dim(V1 ) + dim(V2 ) > n, then
V1 ∩ V2 ̸= {0}.

Proof. Let {u1 , . . . , up } and {v1 , . . . , vq } be bases for V1 and V2 . Then p + q > n and the
linear system [u1 · · · up v1 · · · vq ]x = 0 ∈ Cn has a non-trivial solution x = (x1 , . . . , xp , y1 , . . . , yq )t .
Note that not all x1 , . . . , xp are zero, else y1 v1 + · · · + yq v1 = 0 implies yj = 0 for all j. Thus,
v = x1 u1 + · · · + xp up = −(y1 v1 + · · · + yq vq ) is a nonzero vector in V1 ∩ V2 . □

Theorem 4.2.2 Let A ∈ Mn be Hermitian. Then for 1 ≤ k ≤ n,

λk (A) = max{λk (X ∗ AX) : X ∈ Mn,k , X ∗ X = Ik }


= min{λ1 (Y ∗ AY ) : Y ∈ Mn,n−k+1 , Y ∗ Y = In−k+1 }.

Equivalently,
λk (A) = max{ min{x∗ Ax : x ∈ V, ∥x∥ = 1} : V ⊆ Cn a subspace with dim V = k }
       = min{ max{x∗ Ax : x ∈ V, ∥x∥ = 1} : V ⊆ Cn a subspace with dim V = n − k + 1 }.

Proof. Suppose {u1 , . . . , un } is a family of orthonormal eigenvectors of A corresponding to


the eigenvalues λ1 (A), . . . , λn (A). Let X = [u1 · · · uk ]. Then X ∗ AX = diag (λ1 (A), . . . , λk (A))
so that
λk (A) ≤ max{λk (X ∗ AX) : X ∈ Mn,k , X ∗ X = Ik }.
Conversely, suppose X has orthonomals column x1 , . . . , xk spanning a subspace V1 . Let
uk , . . . , un span a subspace V2 of dimension n − k + 1. Then there is a unit vector v =
Pk Pn t t
j=1 xj xj = j=k yj uj . Let x = (x1 , . . . , xk ) , y = (yk , . . . , yn−k ) , Y = [uk . . . uk+1 ]. Then
v = Xx = Y y so that Y ∗ AY = diag (λk (A), . . . , λn (A)). By Rayleigh principle,

λk (X ∗ AX) ≤ x∗ X ∗ AXx = y∗ Y ∗ AY y ≤ λk (A).


4.3 Change of eigenvalues under perturbation


Theorem 4.3.1 Suppose A, B ∈ Mn are Hermitian such that A ≥ B. Then λk (A) ≥ λk (B)
for all k = 1, . . . , n.

Proof. Let A = B + P , where P is positive semidefinite. Suppose k ∈ {1, . . . , n}. There
is Y ∈ Mn,k with Y ∗ Y = Ik such that

λk (B) = λk (Y ∗ BY ) = max{λk (X ∗ BX) : X ∈ Mm,n , X ∗ X = Ik }.

Let y ∈ Ck be a unit eigenvector of Y ∗ AY corresponding to λk (X ∗ AX). Then

λk (A) = max{λk (X ∗ AX) : X ∈ Mm,n , X ∗ X = Ik }


≥ λk (Y ∗ AY ) = y∗ Y ∗ (B + P )Y y = y∗ Y ∗ BY y + y∗ Y ∗ P Y y
≥ y∗ Y ∗ BY y ≥ λk (Y ∗ BY ) = λk (B). □

Theorem 4.3.2 (Lidskii) Let A, B, C = A + B ∈ Mn be Hermitian matrices with eigenvalues
a1 ≥ · · · ≥ an , b1 ≥ · · · ≥ bn , c1 ≥ · · · ≥ cn , respectively. Then Σ_{j=1}^n aj + Σ_{j=1}^n bj = Σ_{j=1}^n cj ,
and for any 1 ≤ r1 < · · · < rk ≤ n,

Σ_{j=1}^k bn−j+1 ≤ Σ_{j=1}^k (crj − arj ) ≤ Σ_{j=1}^k bj .

Proof. Suppose 1 ≤ r1 < · · · < rk ≤ n. We want to show Σ_{j=1}^k (crj − arj ) ≤ Σ_{j=1}^k bj .
Replace B by B − bk I. Then each eigenvalue of B and each eigenvalue of C = A + B is
changed by −bk , so this does not affect the inequalities; we may thus assume bk = 0. Suppose
B = Σ_{j=1}^n bj xj xj∗ . Let B+ = Σ_{j=1}^k bj xj xj∗ . Then

Σ_{j=1}^k (crj − arj ) ≤ Σ_{j=1}^k (λrj (A + B+ ) − λrj (A))   because λj (A + B) ≤ λj (A + B+ ) for all j,
                      ≤ Σ_{j=1}^n (λj (A + B+ ) − λj (A))     because λj (A) ≤ λj (A + B+ ) for all j,
                      = tr (A + B+ ) − tr (A) = Σ_{j=1}^k λj (B+ ) = Σ_{j=1}^k bj .

Replacing (A, B, C) by (−A, −B, −C), we get the other inequalities. □

Lemma 4.3.3 Suppose A ∈ Mm,n has nonzero singular values s1 ≥ · · · ≥ sk . Then
[0m A; A∗ 0n ] has nonzero eigenvalues ±s1 , . . . , ±sk .
Theorem 4.3.4 Let A, B, C = A + B ∈ Mm,n with singular values a1 ≥ · · · ≥ an , b1 ≥ · · · ≥ bn
and c1 ≥ · · · ≥ cn , respectively. Then for any 1 ≤ r1 < · · · < rk ≤ n, we have

Σ_{j=1}^k (crj − arj ) ≤ Σ_{j=1}^k bj .

4.4 Eigenvalues of principal submatrices
 
A ∗
Theorem 4.4.1 There is a positive matrix C = with A ∈ Mk so that A, B, C have
∗ B
eigenvalues a1 ≥ · · · ≥ ak , b1 ≥ · · · ≥ bn−k and c1 ≥ · · · ≥ cn , respectively, if and only if
there are positive semi-definite matrices Ã, B̃, C̃ = Ã + B̃ with eigenvalues a1 ≥ · · · ≥ ak ≥
0 = ak+1 = · · · = an , b1 ≥ · · · ≥ bn−k ≥ 0 = bn−k+1 = · · · = bn , and c1 ≥ · · · ≥ cn .
Consequently, for any 1 ≤ j1 < · · · < jk ≤ n, we have kj=1 (crj − arj ) ≤ kj=1 bj .
P P

Proof. To prove the necessity, let C = Ĉ ∗ Ĉ with Ĉ = [C1 C2 ] ∈ Mn with C1 ∈ Mn,k .


Then A = C1∗ C1 has eigenvalues a1 , . . . , ak , and B = C2∗ C2 has eigenvalues b1 , . . . , bn−k . Now,
C̃ = Ĉ Ĉ ∗ = C1 C1∗ + C2 C2∗ also eigenvalues c1 , . . . , cn , and à = C1 C1∗ , B̃ = C2 C2∗ have the
desired eigenvalues.
Conversely, suppose the Ã, B̃, C̃ have the said eigenvalues. Let à = C1 C1∗ , B̃ = C2 C2∗
for some C1 ∈ Mn,k , C2 ∈ Mn,n−k . Then C = [C1 C2 ]∗ = [C1 C2 ] have the desired principal
submatrices. □
By the above theorem, one can apply the inequalities governing the eigenvalues of
Ã, B̃, C̃ = Ã + B̃ to deduce inequalities relating the eigenvalues of a positive semidefi-
nite matrix C and its complementary principal submatrices. One can also consider general
Hermitian matrix by studying C − λn (C)I.

Theorem 4.4.2 There is a Hermitian (real symmetric) matrix C ∈ Mn with principal sub-
matrix A ∈ Mm such that C and A have eigenvalues c1 ≥ · · · ≥ cn and a1 ≥ · · · ≥ am ,
respectively, if and only if

cj ≥ aj and am−j+1 ≥ cn−j+1 , j = 1, . . . , m.

Proof. To prove the necessity, we may replace C by C − λn (C)I and assume that C is
positive semidefinite. Then by the previous theorem,

cj − aj ≥ bn−m ≥ 0, j = 1, . . . , m.

Applying the argument to −C, we get the conclusion.


To prove the sufficiency, we will construct C − cn I with principal submatrix A − cn Im .
Thus, we may assume that all the eigenvalues involved are nonnegative.
We prove the converse by induction on n − m ∈ {1, . . . , n − 1}. Suppose n − m = 1.
We need only address the case µj ∈ (λj+1 , λj ) for j = 1, . . . , n − 1, since the general
case µj ∈ [λj+1 , λj ] follows by a continuity argument. Alternatively, we can take away the
pairs of cj = aj or aj = cj+1 to get a smaller set of numbers that still satisfy the interlacing
inequalities and apply the following arguments.

We will show how to choose a real orthogonal matrix Q such that C = Qt diag(c1 , . . . , cn )Q
has leading principal submatrix A ∈ Mn−1 with eigenvalues a1 ≥ · · · ≥ an−1 . To this
end, let Q have last column u = (u1 , . . . , un )t . By the adjoint formula for the inverse,

[(zI − C)−1 ]nn = det(zIn−1 − A)/det(zI − C) = ∏_{j=1}^{n−1} (z − aj ) / ∏_{j=1}^n (z − cj ),

but we also have the expression

[(zI − C)−1 ]nn = ut (zI − diag(c1 , . . . , cn ))−1 u = Σ_{i=1}^n ui^2 /(z − ci ).

Equating these two, we see that A has characteristic polynomial ∏_{i=1}^{n−1} (z − ai ) if and only
if
Σ_{i=1}^n ui^2 ∏_{j≠i} (z − cj ) = ∏_{i=1}^{n−1} (z − ai ).

Both sides of this expression are polynomials of degree n − 1, so they are identical if and only
if they agree at the n distinct points c1 , . . . , cn , or equivalently,

uk^2 = ∏_{j=1}^{n−1} (ck − aj ) / ∏_{j≠k} (ck − cj ) ≡ wk ,   k = 1, . . . , n.

Since (ck − aj )/(ck − cj ) > 0 for all k ≠ j, we see that wk > 0. Thus if we take uk = wk^{1/2}
then A has eigenvalues a1 , . . . , an−1 .
Now, suppose m < n − 1. Let
c̃j = max{cj+1 , aj } for 1 ≤ j ≤ m, and c̃j = min{cj , am−n+j+1 } for m < j < n.

Then
c1 ≥ c̃1 ≥ c2 ≥ · · · ≥ cn−1 ≥ c̃n−1 ≥ cn ,
and
c̃j ≥ aj ≥ c̃n−m−1+j , j = 1, . . . , m.
By the induction assumption, we can construct a Hermitian C̃ ∈ Mn−1 with eigenvalues
c̃1 ≥ · · · ≥ c̃n−1 , whose m × m leading principal submatrix has eigenvalues a1 ≥ · · · ≥ am ,
and C̃ is the leading principal submatrix of the real symmetric matrix C ∈ Mn such that C
has eigenvalues c1 ≥ · · · ≥ cn . □
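The n − m = 1 step of the proof is completely constructive and can be tested numerically (a sketch; given strictly interlacing values c1 > a1 > c2 > · · · > a_{n−1} > c_n, we form the weights wk, complete u to an orthogonal Q with u as its last column, and check that C = Q^t diag(c) Q and its leading principal submatrix have the prescribed eigenvalues):

```python
import numpy as np

def bordered_matrix(c, a):
    """Given c1 > a1 > c2 > ... > a_{n-1} > c_n, return a real symmetric C with
    eigenvalues c whose leading (n-1)x(n-1) principal submatrix has eigenvalues a."""
    c, a = np.asarray(c, float), np.asarray(a, float)
    n = len(c)
    # w_k = prod_j (c_k - a_j) / prod_{j != k} (c_k - c_j) > 0, and sum_k w_k = 1
    w = np.array([np.prod(c[k] - a) / np.prod(np.delete(c[k] - c, k))
                  for k in range(n)])
    u = np.sqrt(w)
    u /= np.linalg.norm(u)
    # complete u to an orthogonal Q having u as its last column
    Q0, _ = np.linalg.qr(np.column_stack([u, np.eye(n)[:, :n - 1]]))
    Q = np.column_stack([Q0[:, 1:], u])
    return Q.T @ np.diag(c) @ Q

c = [9.0, 6.0, 3.0, 1.0]
a = [7.0, 5.0, 2.0]
C = bordered_matrix(c, a)
print(np.allclose(np.linalg.eigvalsh(C), sorted(c)))              # eigenvalues of C
print(np.allclose(np.linalg.eigvalsh(C[:-1, :-1]), sorted(a)))    # eigenvalues of A
```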

4.5 Eigenvalues and Singular values
 
Theorem 4.5.1 Let A = [A11 A12 ; A21 A22 ] ∈ Mn with A11 ∈ Mk . Then | det(A11 )| ≤ ∏_{j=1}^k sj (A).
The equality holds if and only if A = A11 ⊕ A22 such that A11 has singular values s1 (A), . . . , sk (A).

Proof. Let S(s1 , . . . , sn ) be the set of matrices in Mn with singular values s1 ≥ · · · ≥ sn .


 
A11 A12
Suppose A = ∈ S(s1 , . . . , sn ) with A11 ∈ Mk such that | det(A11 )| attains the
A21 A22
maximum value. We show that A = A11 ⊕ A22 and A11 has singular values s1 ≥ · · · ≥ sk .
Suppose U, V ∈ Mk are such that U ∗ A11 V = diag (ξ1 , . . . , ξk ) with ξ1 ≥ · · · ≥ ξk ≥ 0. We
may replace A by (U ∗ ⊕ In−k )A(V ⊕ In−1 ) and assume that A11 = diag (ξ1 , . . . , ξk ).
Let A = (aij ). We show that A21 = 0 as follows. Suppose there is a nonzero entry as1
 
a11 as1
with k < s ≤ n. Then there is a unitary X ∈ M2 such that X has (1, 1) entry
a1s ass
equal to
ξˆ1 = {|a11 |2 + |as1 |2 }1/2 = {ξ12 + |as1 |2 }1/2 > ξ1 .
Let X̂ ∈ Mn be obtained from In by replacing the submatrix in rows and columns 1, j by X.
Then the leading k × k submatrix of X̂A is obtained from that of A by changing its first row
from (ξ1 , 0, . . . , 0) to (ξˆ1 , ∗, · · · , ∗), and has determinant ξˆ1 ξ2 · · · ξk > ξ1 · · · ξk = det(A11 ),
contradicting the fact that | det(A11 )| attains the maximum value. Thus, the first column of
A21 is zero.
Next, suppose that there is as2 ̸= 0 for some k < s ≤ n. Then there is a unitary X ∈ M2
 
a22 as2
such that X has (1, 1) entry equal to
a2s ass

ξˆ2 = {|a22 |2 + |as2 |2 }1/2 = {ξ22 + |as2 |2 }1/2 > ξ2 .

Then the leading k × k submatrix of X̂A is obtained from that of A by changing its first row
from (0, ξ2 , 0, . . . , 0) to (0, ξˆ2 , ∗, · · · , ∗), and has determinant ξ1 ξˆ2 · · · ξk > ξ1 · · · ξk = det(A11 ),
which is a contradiction. So, the second column of A21 is zero. Repeating this argument, we
see that A21 = 0.
Now, the leading k × k submatrix of At ∈ S(s1 , . . . , sn ) also attains the maximum.
Applying the above argument, we see that At12 = 0. So, A = A11 ⊕ A22 .
Let Û , V̂ ∈ Mn−k be unitary such that Û ∗ A22 V̂ = diag (ξk+1 , . . . , ξn ). We may replace
A by (Ik ⊕ Û ∗ )A(Ik ⊕ V̂ ) so that A = diag (ξ1 , . . . , ξn ). Clearly, ξk ≥ ξk+1 . Otherwise, we
may interchange kth and (k + 1)st rows and also the columns so that the leading k × k
submatrix of the resulting matrix becomes diag (ξ1 , . . . , ξk−1 , ξk+1 ) with determinant larger
than det(A11 ). So, ξ1 , . . . , ξk are the k largest singular values of A. □

Theorem 4.5.2 Let a1 , . . . , an be complex numbers such that |a1 | ≥ · · · ≥ |an |, and let s1 ≥
· · · ≥ sn ≥ 0. Then there is A ∈ Mn with eigenvalues a1 , . . . , an and singular values
s1 , . . . , sn if and only if

∏_{j=1}^n |aj | = ∏_{j=1}^n sj , and ∏_{j=1}^k |aj | ≤ ∏_{j=1}^k sj for k = 1, . . . , n − 1.

Proof. Suppose A has eigenvalues a1 , . . . , an and singular values s1 ≥ · · · ≥ sn ≥ 0. We


may apply a unitary similarity to A and assume that A is in upper triangular form with
diagonal entries a1 , . . . , an . By the previous theorem, if Ak is the leading k × k submatrix of
A, then |a1 · · · ak | = | det(Ak )| ≤ kj=1 sk for k = 1, . . . , n − 1, and | det(A)| = |a1 · · · an | =
Q

s1 · · · sn .
To prove the converse, suppose the asserted inequalities and equality on a1 , . . . , an and
s1 , . . . , sn hold. We show by induction that there is an upper triangular matrix A = (aij )
with singular values s1 ≥ · · · ≥ sn and diagonal values |a1 |, . . . , |an |. Then there will be a
diagonal unitary matrix D such that DA has the desired eigenvalues and singular values.
For notation simplicity, we assume aj = |aj | in the following.
Suppose n = 2. Then a1 ≤ s1 , and a1 a2 = s1 s2 so that s1 ≥ a1 ≥ a2 ≥ s2 . Consider
   
cos θ sin θ s1 cos ϕ − sin ϕ
A(θ, ϕ) = .
− sin θ cos θ s2 sin ϕ cos ϕ

There is ϕ ∈ [0, π/2] such that the (s1 cos ϕ, s2 sin ϕ)t has norm a1 ∈ [s2 , s1 ]. Then we can
find θ ∈ [0, π/2] such that (cos θ, sin θ)(s1 cos ϕ, s2 sin ϕ) = a1 . Thus, the first column of
A(θ, ϕ) equals (a1 , 0)t , and A(θ, ϕ) has the desired eigenvalues and singular values.
Suppose the result holds for matrices of size at most n − 1 ≥ 2. Consider (a1 , . . . , an )
and (s1 , . . . , sn ) satisfying the product equality and inequalities.
If a1 = 0, then sn = 0 and A = s1 E12 + · · · + sn−1 En−1,n has the desired eigenvalues and
singular values.
Suppose a1 > 0. Let k be the maximum integer such that sk ≥ a1 . Then there is
 
a1 ∗
A1 = with s̃k+1 = sk sk+1 /a1 ∈ [sk−1 , sk+1 ]. Let
0 s̃k+1

(s̃1 , . . . , s̃n−1 ) = (s1 , . . . , sk−1 , s̃k+1 , sk+2 , . . . , sn ).

We claim that (a2 , . . . , an ) and (s̃1 , . . . , s̃n−1 ) satisfy the product equality and inequalities.
First, $\prod_{j=2}^{n} a_j = \prod_{j=1}^{n} s_j / a_1 = \prod_{j=1}^{n-1} \tilde s_j$. For ℓ ≤ k,
$$\prod_{j=2}^{\ell} a_j \le \prod_{j=1}^{\ell-1} a_j \le \prod_{j=1}^{\ell-1} s_j = \prod_{j=1}^{\ell-1} \tilde s_j.$$

For ℓ ≥ k + 1,
$$\prod_{j=2}^{\ell} a_j \le \prod_{j=1}^{\ell} s_j / a_1 = \prod_{j=1}^{\ell-1} \tilde s_j.$$

So, there is A2 = U D̃V∗ in upper triangular form with diagonal entries a2, . . . , an, where U, V ∈ Mn−1 are unitary and D̃ = diag(s̃1, . . . , s̃n−1). Let
$$A = \begin{pmatrix} 1 & 0 \\ 0 & U \end{pmatrix}\begin{pmatrix} a_1 & * \\ 0 & \tilde D \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & V^* \end{pmatrix},$$
where the row marked $*$ carries the (1, 2) entry of A1 in the column of s̃k+1. Then A is in upper triangular form with diagonal entries a1, . . . , an and singular values s1, . . . , sn, as desired. □
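The n = 2 step above is easy to carry out numerically. The following is a minimal sketch, not part of the notes, assuming real data with a1 a2 = s1 s2 and s2 ≤ a2 ≤ a1 ≤ s1; the helper name two_by_two is ours. It builds A(θ, ϕ) and checks the eigenvalues and singular values with numpy.

```python
import numpy as np

def two_by_two(a1, a2, s1, s2):
    # Assumes a1*a2 == s1*s2 and s2 <= a2 <= a1 <= s1 (real, nonnegative data).
    R = lambda t: np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
    # Choose phi so that the first column of diag(s1, s2) R(phi) has norm a1.
    c2 = 1.0 if s1 == s2 else (a1**2 - s2**2) / (s1**2 - s2**2)
    phi = np.arccos(np.sqrt(c2))
    v = np.array([s1 * np.cos(phi), s2 * np.sin(phi)])
    theta = np.arctan2(v[1], v[0])        # R(theta) rotates v onto (a1, 0)^t
    Rphi = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
    return R(theta) @ np.diag([s1, s2]) @ Rphi

A = two_by_two(3.0, 2.0, 6.0, 1.0)        # a1*a2 = 6 = s1*s2
print(np.sort(np.abs(np.linalg.eigvals(A))))   # approximately [2, 3]
print(np.linalg.svd(A, compute_uv=False))      # approximately [6, 1]
```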

4.6 Diagonal entries and singular values


Theorem 4.6.1 Let A ∈ Mn have diagonal entries d1 , . . . , dn such that |d1 | ≥ · · · ≥ |dn |
and singular values s1 ≥ · · · ≥ sn .
(a) For any 1 ≤ k ≤ n, we have $\sum_{j=1}^{k} |d_j| \le \sum_{j=1}^{k} s_j$. The equality holds if and only if there is a diagonal unitary matrix D such that DA = A11 ⊕ A22, where A11 is positive semidefinite with eigenvalues s1 ≥ · · · ≥ sk.
(b) We have $\sum_{j=1}^{n-1} |d_j| - |d_n| \le \sum_{j=1}^{n-1} s_j - s_n$. The equality holds if and only if there is a diagonal unitary matrix D such that DA = (aij) is Hermitian with eigenvalues s1, . . . , sn−1, −sn and ann ≤ 0.

Proof. (a) Let S(s1, . . . , sn) be the set of matrices in Mn with singular values s1 ≥ · · · ≥ sn. Suppose $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \in S(s_1, \ldots, s_n)$ with A11 ∈ Mk is such that |a11| + · · · + |akk| attains the maximum value. We may replace A by DA for a suitable diagonal unitary D ∈ Mn and assume that ajj = |ajj| for all j = 1, . . . , n. If aji ≠ 0 for some j > k ≥ i, then there is a unitary X ∈ M2 such that $X\begin{pmatrix} a_{ii} & a_{ij} \\ a_{ji} & a_{jj} \end{pmatrix}$ has (1, 1) entry equal to
$$\tilde a_{ii} = \{|a_{ii}|^2 + |a_{ji}|^2\}^{1/2} > |a_{ii}|.$$

Let X̂ ∈ Mn be obtained from In by replacing the submatrix in rows and columns i, j by X. Then the diagonal entries of the leading k × k submatrix Â11 of X̂A are obtained from those of A11 by changing the (i, i) entry aii to ãii, so that tr Â11 > tr A11, which is a contradiction. So, A21 = 0. Applying the same argument to At, we see that A12 = 0. Now, A11 has singular values ξ1 ≥ · · · ≥ ξk. Then A11 = P V for some positive semidefinite matrix P with

eigenvalues ξ1 , . . . , ξk and a unitary matrix V ∈ Mk . Suppose V = U D̂U ∗ for some diagonal
unitary D̂ ∈ Mk and unitary U ∈ Mk . Then
tr A11 = tr (P U D̂U ∗ ) = tr U ∗ P U D̂ ≤ tr U ∗ P U = tr P,
where the equality holds if and only if D̂ = Ik , i.e., A11 = P is positive semidefinite. In
particular, we can choose B = diag (s1 , . . . , sn ) so that the sum of the k diagonal entries is
Pk Pk
j=1 sj ≥ j=1 ξj = tr A11 . Thus, the eigenvalues of A11 must be s1 , . . . , sk as asserted.

(b) Let A = (aij) ∈ S(s1, . . . , sn) attain the maximum value of $\sum_{j=1}^{n-1} |a_{jj}| - |a_{nn}|$. We may replace A by DA for a suitable diagonal unitary D and assume that ajj ≥ 0 for j = 1, . . . , n − 1, and
ann ≤ 0. Let A11 ∈ Mn−1 be the leading (n − 1) × (n − 1) principal submatrix of A. By part
(a), we may assume that A11 is positive semidefinite so that its trace equals the sum of its singular values. Otherwise, there are unitary U, V ∈ Mn−1 such that U∗A11V = diag(ξ1, . . . , ξn−1) with ξ1 + · · · + ξn−1 > $\sum_{j=1}^{n-1} a_{jj}$. As a result, (U∗ ⊕ [1])A(V ⊕ [1]) ∈ S(s1, . . . , sn) has diagonal entries d̂1, . . . , d̂n−1, ann such that
$$\sum_{j=1}^{n-1} \hat d_j - |a_{nn}| > \sum_{j=1}^{n-1} a_{jj} - |a_{nn}|,$$

which is a contradiction.
Next, for j = 1, . . . , n − 1, let $B_j = \begin{pmatrix} a_{jj} & a_{jn} \\ a_{nj} & a_{nn} \end{pmatrix}$. We show in the following that |ajj| − |ann| = s1(Bj) − s2(Bj) and Bj is Hermitian. Note that s1(Bj)² + s2(Bj)² = |ajj|² + |ajn|² + |anj|² + |ann|² and s1(Bj)s2(Bj) = |ajj ann − ajn anj|, so that −ajj ann = |ajj ann| ≥ s1(Bj)s2(Bj) − |ajn anj|. Hence,
$$(|a_{jj}| - |a_{nn}|)^2 = (a_{jj} + a_{nn})^2 = a_{jj}^2 + a_{nn}^2 + 2a_{jj}a_{nn} \le s_1(B_j)^2 + s_2(B_j)^2 - (|a_{jn}|^2 + |a_{nj}|^2) - 2(s_1(B_j)s_2(B_j) - |a_{jn}a_{nj}|)$$
$$= (s_1(B_j) - s_2(B_j))^2 - (|a_{jn}| - |a_{nj}|)^2 \le (s_1(B_j) - s_2(B_j))^2.$$
Here the two inequalities become equalities if and only if |ajn| = |anj| and |ajn anj| = ajn anj, i.e., ajn = ānj and Bj is Hermitian.
By the above analysis, |ajj | − |ann | ≤ s1 (Bj ) − s2 (Bj ). If the inequality is strict, there are
unitary X, Y ∈ M2 such that X ∗ Bj Y = diag (s1 (Bj ), s2 (Bj )). Let X̂ be obtained from In by
replacing the 2 × 2 submatrix in rows and columns j, n by X. Similarly, we can construct
Ŷ . Then X̂, Ŷ ∈ Mn are unitary and X̂ ∗ AŶ has diagonal entries dˆ1 , . . . , dˆn obtained from
that of A by changing (ajj , ann ) to (s1 (Bj ), s2 (Bj )). As a result,
$$\sum_{j=1}^{n-1} \hat d_j - |\hat d_n| > \sum_{j=1}^{n-1} a_{jj} - |a_{nn}|,$$

which is a contradiction. So, Bj is Hermitian for j = 1, . . . , n − 1. Hence, A is Hermitian, and
tr A = a11 + · · · + ann = a11 + · · · + a_{n−1,n−1} − |ann|.
Suppose A has eigenvalues λ1, . . . , λn with |λj| = sj(A) for j = 1, . . . , n. Because 0 ≥ ann ≥ λn, we see that tr A = $\sum_{j=1}^{n} \lambda_j \le \sum_{j=1}^{n-1} s_j - s_n$. On the other hand, B = diag(s1, . . . , sn) ∈ S(s1, . . . , sn) attains $\sum_{j=1}^{n-1} s_j - s_n$, so the equality holds. The result follows. □
Recall that for two real vectors x = (x1, . . . , xn), y = (y1, . . . , yn), we say that x ≺w y if the sum of the k largest entries of x is not larger than that of y for k = 1, . . . , n.

Theorem 4.6.2 Let d1 , . . . , dn be complex numbers such that |d1 | ≥ · · · ≥ |dn |. Then there
is A ∈ Mn with diagonal entries d1 , . . . , dn and singular values s1 ≥ · · · ≥ sn if and only if

$$(|d_1|, \ldots, |d_n|) \prec_w (s_1, \ldots, s_n) \quad\text{and}\quad \sum_{j=1}^{n-1} |d_j| - |d_n| \le \sum_{j=1}^{n-1} s_j - s_n.$$

Proof. The necessity follows from the previous theorem. We prove the converse by
induction on n ≥ 2. We will focus on the construction of A with singular values s1 , . . . , sn ,
and diagonal entries d1 , . . . , dn−1 , dn with d1 , . . . , dn ≥ 0.
 
Suppose n = 2. We have d1 + d2 ≤ s1 + s2 and d1 − d2 ≤ s1 − s2. Let $A = \begin{pmatrix} d_1 & a \\ -b & d_2 \end{pmatrix}$ such that a, b ≥ 0 satisfy ab = s1s2 − d1d2 and a² + b² = s1² + s2² − d1² − d2². Such a, b exist because
$$2(s_1s_2 - d_1d_2) = 2ab \le a^2 + b^2 = s_1^2 + s_2^2 - d_1^2 - d_2^2.$$

Suppose the result holds for matrices of sizes up to n − 1 ≥ 2. Consider (d1, . . . , dn) and (s1, . . . , sn) that satisfy the inequalities. Let k be the largest integer such that sk ≥ d1. If k ≤ n − 2, there is $B = \begin{pmatrix} d_1 & * \\ * & \hat s \end{pmatrix}$ with singular values sk, sk+1, where ŝ = sk + sk+1 − d1.
One can check that (d2 , . . . , dn ) and (s1 , . . . , sk−1 , ŝ, sk+2 , . . . , sn ) satisfy the inequalities for
the n − 1 case so that there are unitary U, V ∈ Mn−1 such that U DV ∗ has diagonal entries
d2 , . . . , dn , where D = diag (ŝ, s1 , . . . , sk−1 , sk+2 , . . . , sn ). Thus,

A = ([1] ⊕ U )(B ⊕ diag (s1 , . . . , sk−1 , sk+2 , . . . , sn ))([1] ⊕ V ∗ )

has diagonal entries d1 , . . . , dn and singular values s1 , . . . , sn .


Now suppose k ≥ n − 1. Let
$$\hat s = \max\Big\{0,\ d_n + s_n - s_{n-1},\ \sum_{j=1}^{n-1} d_j - \sum_{j=1}^{n-2} s_j\Big\} \le \min\Big\{s_{n-1},\ s_{n-1} + s_n - d_n,\ \sum_{j=1}^{n-2}(s_j - d_j) + d_{n-1}\Big\}.$$

It follows that
(dn , ŝ) ≺w (sn−1 , sn ), |dn − ŝ| ≤ sn−1 − sn ,
$$(d_1, \ldots, d_{n-1}) \prec_w (s_1, \ldots, s_{n-2}, \hat s) \quad\text{and}\quad \sum_{j=1}^{n-2} d_j - d_{n-1} \le \sum_{j=1}^{n-2} s_j - \hat s.$$

So, there is C ∈ M2 with singular values sn−1, sn and diagonal elements ŝ, dn. Moreover, there are unitary matrices X, Y ∈ Mn−1 such that Xdiag (s1 , . . . , sn−2 , ŝ)Y ∗ has diagonal
entries d1 , . . . , dn−1 . Thus,

A = (X ⊕ [1])(diag (s1 , . . . , sn−2 ) ⊕ C)(Y ∗ ⊕ [1])

will have the desired diagonal entries and singular values. □
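The two conditions of Theorem 4.6.2 are straightforward to test numerically. Below is a minimal sketch; the function name diag_sv_compatible is ours, not from the notes.

```python
import numpy as np

def diag_sv_compatible(d, s, tol=1e-12):
    """Test the two conditions of Theorem 4.6.2 for prescribed diagonal entries d and singular values s."""
    d = np.sort(np.abs(np.asarray(d, dtype=float)))[::-1]
    s = np.sort(np.asarray(s, dtype=float))[::-1]
    weak = all(d[:k].sum() <= s[:k].sum() + tol for k in range(1, len(d) + 1))
    extra = d[:-1].sum() - d[-1] <= s[:-1].sum() - s[-1] + tol
    return weak and extra

# [[1, 1], [1, 1]] has diagonal (1, 1) and singular values (2, 0):
print(diag_sv_compatible([1, 1], [2, 0]))   # True
print(diag_sv_compatible([1, 1], [1, 0]))   # False: 1 + 1 > 1 + 0
```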

4.7 Final remarks


The study of matrix inequalities has a long history and is still under active research. One of the most interesting questions, raised in the 1960s and finally solved in the 2000s, is the following.
Problem Determine the necessary and sufficient conditions on three sets of real numbers a1 ≥ · · · ≥ an, b1 ≥ · · · ≥ bn, c1 ≥ · · · ≥ cn for the existence of three (real symmetric) Hermitian matrices A, B and C = A + B with these numbers as their eigenvalues, respectively.
It was proved that the conditions can be described in terms of the equality $\sum_{j=1}^{n} (a_j + b_j) = \sum_{j=1}^{n} c_j$ and a family of inequalities of the form
$$\sum_{j=1}^{k} (a_{u_j} + b_{v_j}) \ge \sum_{j=1}^{k} c_{w_j}$$
for certain subsequences (u1 , . . . , uk ), (v1 , . . . , vk ), (w1 , . . . , wk ) of (1, . . . , n).


There are different ways to specify the subsequences. A. Horn has the following recursive
way to define the sequences.

1. If k = 1, then w1 = u1 + v1 − 1. That is, we have au + bv ≥ cu+v−1 .

2. Suppose k < n and all the subsequences of length up to k − 1 are specified. Consider subsequences (u1, . . . , uk), (v1, . . . , vk), (w1, . . . , wk) satisfying $\sum_{j=1}^{k} (u_j + v_j) = \sum_{j=1}^{k} w_j + k(k+1)/2$, and for any specified subsequences (α1, . . . , αℓ), (β1, . . . , βℓ), (γ1, . . . , γℓ) of (1, . . . , n) of length ℓ < k,
$$\sum_{j=1}^{\ell} (u_{\alpha_j} + v_{\beta_j}) \ge \sum_{j=1}^{\ell} w_{\gamma_j}.$$

Consequently, the subsequences (u1, . . . , uk), (v1, . . . , vk), (w1, . . . , wk) of (1, . . . , n) form a Horn triple of length k if and only if there are Hermitian matrices U, V, W = U + V with eigenvalues
u1 − 1 ≤ u2 − 2 ≤ · · · ≤ uk − k, v1 − 1 ≤ v2 − 2 ≤ · · · ≤ vk − k, w1 − 1 ≤ w2 − 2 ≤ · · · ≤ wk − k,
respectively. This is known as the saturation conjecture/theorem.


Special cases of the above inequalities include the following inequalities of Thompson, which reduce to Weyl's inequalities when k = 1.

Theorem 4.7.1 Suppose A, B, C = A + B ∈ Mn are Hermitian matrices with eigenval-


ues a1 ≥ · · · ≥ an, b1 ≥ · · · ≥ bn and c1 ≥ · · · ≥ cn, respectively. For any subsequences (u1, . . . , uk), (v1, . . . , vk) of (1, . . . , n), if (w1, . . . , wk) is such that wj = uj + vj − j ≤ n for all j = 1, . . . , k, then
$$\sum_{j=1}^{k} (a_{u_j} + b_{v_j}) \ge \sum_{j=1}^{k} c_{w_j}.$$

Proof. We prove the result by induction on n. Suppose n = 2. If k = n so that


(u1 , u2 ) = (v1 , v2 ) = (1, 2), then the equality holds. If k = 1, then ai + bj ≥ ci+j−1 for any
i + j ≤ 3 by the Lidskii inequality.
Now, suppose the result holds for all matrices of size n−1. If k = n so that (u1 , . . . , un ) =
(v1 , . . . , vn ), then the equality holds. Suppose k < n. Let p be the largest integer such that
uj = j for j = 1, . . . , p, and let q be the largest integer such that vj = j for j = 1, . . . , q. We
may assume that q ≤ p < n. Else, interchange the roles of A and B.
Let {y1, . . . , yn} be an orthonormal set of eigenvectors of B and {z1, . . . , zn} be an orthonormal set of eigenvectors of C so that
Byj = bj yj, Czj = cj zj, j = 1, . . . , n.

Suppose Z ∈ Mn,n−1 has orthonormal columns such that the column space of Z contains
z1 , . . . , zq , yq+2 , . . . , yn . Let à = Z ∗ AZ, B̃ = Z ∗ BZ, C̃ = Z ∗ CZ have eigenvalues â1 ≥ · · · ≥
ân−1 , b̂1 ≥ · · · ≥ b̂n−1 , and ĉ1 ≥ · · · ≥ ĉn−1 , respectively. By induction assumption,
$$\sum_{j=1}^{q} \hat c_{u_j+v_j-j} + \sum_{j=q+1}^{k} \hat c_{u_j+(v_j-1)-j} \le \sum_{j=1}^{k} \hat a_{u_j} + \sum_{j=1}^{q} \hat b_j + \sum_{j=q+1}^{k} b_{u_j+(v_j-1)-j}.$$

Because uj + vj − j = j for j = 1, . . . , q, and the column space of Z contains z1 , . . . , zq , we


see that ĉj = cj for j = 1, . . . , q. For j = q + 1, . . . , k, we have cuj +vj −j ≤ ĉuj +vj −j−1 , and
hence
$$\sum_{j=1}^{q} c_{u_j+v_j-j} + \sum_{j=q+1}^{k} c_{u_j+v_j-j} \le \sum_{j=1}^{q} c_{u_j+v_j-j} + \sum_{j=q+1}^{k} \hat c_{u_j+v_j-j-1}.$$

Because b̂j ≤ bj for j = 1, . . . , q, and b̂_{uj+vj−j−1} = b_{uj+vj−j} for j = q + 1, . . . , k as the column space of Z contains yq+1, . . . , yn, we have
$$\sum_{j=1}^{q} \hat b_j + \sum_{j=q+1}^{k} b_{u_j+(v_j-1)-j} \le \sum_{j=1}^{k} b_{u_j+v_j-j}.$$

The result follows. □


Applying the result to −A − B = −C, we see that for any subsequences (u1, . . . , uk), (v1, . . . , vk) and (w1, . . . , wk) with wj = uj + vj − j such that uk + vk − k ≤ n, we have
$$\sum_{j=1}^{k} (a_{n-u_j+1} + b_{n-v_j+1}) \le \sum_{j=1}^{k} c_{n-w_j+1}.$$

Additional results and exercises

1. Suppose n = 3. List all the Horn sequences (u1 , u2 ), (v1 , v2 ), (w1 , w2 ) of length 2, and
list all the Thompson standard sequences (u1 , u2 ), (v1 , v2 ) and (w1 , w2 ) = (u1 + v1 −
1, u2 + v2 − 2).

2. Suppose A, B, C = A + B ∈ Mn are Hermitian matrices with eigenvalues a1 ≥ · · · ≥ an, b1 ≥ · · · ≥ bn and c1 ≥ · · · ≥ cn, respectively. Show that if C = (cij) then $\sum_{j=1}^{k} c_{jj} \le \sum_{j=1}^{k} (a_j + b_j)$; the equality holds if and only if A = A11 ⊕ A22, B = B11 ⊕ B22 with A11, B11 ∈ Mk such that A11 and B11 have eigenvalues a1 ≥ · · · ≥ ak, b1 ≥ · · · ≥ bk, respectively.

3. (Weyl’s inequalities.) Suppose A, B, C = A + B ∈ Mn are Hermitian matrices. For


any u, v ∈ {1, . . . , n} with u + v − 1 ≤ n, show that λu (A) + λv (B) ≥ λu+v−1 (A + B).
Hint: By induction on n ≥ 2. Check the case for n = 2. Assume the result holds
for matrices of size n − 1. Assume v ≤ u. Let {z1 , . . . , zn } and {y1 , . . . , yn } be or-
thonormal sets such that Byj = bj yj and Czj = cj zj for j = 1, . . . , n. If Z ∈ Mn,n−1
with orthonormal columns such that the column space of Z contains y1 , . . . , yu and
zq+2 , . . . , zn . Argue that

cu+v−1 = λu+v−2 (Z ∗ CZ) ≤ λu (Z ∗ AZ) + λv−1 (Z ∗ BZ) ≤ au + bv .

4. Suppose C = A + iB such that A and B have eigenvalues a1, . . . , an and b1, . . . , bn with |a1| ≥ · · · ≥ |an| and |b1| ≥ · · · ≥ |bn|. Show that if C has singular values s1, . . . , sn, then
$$(a_1^2 + b_n^2, \ldots, a_n^2 + b_1^2) \prec (s_1^2, \ldots, s_n^2) \quad\text{and}\quad (s_1^2 + s_n^2, \ldots, s_n^2 + s_1^2)/2 \prec (a_1^2 + b_1^2, \ldots, a_n^2 + b_n^2).$$

Hint: 2(A2 + B 2 ) = CC ∗ + C ∗ C.

5. Suppose c1 ≥ a1 ≥ c2 ≥ a2 ≥ · · · ≥ an−1 ≥ cn ≥ an are 2n real numbers. Show
that there is a nonnegative real vector v ∈ Rn such that D + vv t has eigenvalues
c1 ≥ · · · ≥ cn for D = diag (a1 , . . . , an ).
Hint: Replace cj by cj + γ and aj by aj + γ for j = 1, . . . , n, for a sufficiently large γ > 0, and assume that cn ≥ an > 0. By the interlacing inequalities, there is $\tilde C = \begin{pmatrix} D & y \\ y^t & a \end{pmatrix}$. Show that C = D + vv^t has eigenvalues c1 ≥ · · · ≥ cn.
 
à ∗
6. Suppose A = . Show that
0 ∗

s1 (A) ≥ s1 (Ã) ≥ s2 (A) ≥ s2 (Ã) ≥ · · · ≥ sn−1 (Ã) ≥ sn (A).

7. (Extra credit) Suppose A, B ∈ Mn . For any subsequences (u1 , . . . , uk ), (v1 , . . . , vk ) and


(w1 , . . . , wk ) of (1, . . . , n) such that wj = uj +vj −j for j = 1, . . . , k, and uk +vk −k ≤ n,
we have
$$\prod_{j=1}^{k} s_{u_j}(A)\, s_{v_j}(B) \ge \prod_{j=1}^{k} s_{w_j}(AB).$$

Hint: By induction on n. Check the case for n = 2. Assume that the result holds for
matrices of size n − 1. If k = n, the equality holds. Suppose k < n. Let p be the
largest integer such that uj = j for all j = 1, . . . , p, and q be the largest integer such
that vj = j for all j = 1, . . . , q. We may assume that q ≤ p. Let C = AB, {u1 , . . . , un }
and {v1 , . . . , vn } be orthonormal sets such that

B ∗ Buj = sj (B)2 uj and C ∗ Cvj = sj (C)2 vj .

Suppose U, V are unitary such that the first n − 1 columns span a subspace containing v1, . . . , vq, uq+2, . . . , un, and $V^*BU = \begin{pmatrix} \tilde B & * \\ 0 & * \end{pmatrix}$ with B̃ ∈ Mn−1. Let W be unitary such that $W^*AV = \begin{pmatrix} \tilde A & * \\ 0 & * \end{pmatrix}$. Then $W^*ABU = \begin{pmatrix} \tilde A\tilde B & * \\ 0 & * \end{pmatrix}$. Apply the induction assumption on ÃB̃ to finish the proof.

5 Norms
In many applications of matrix theory, such as approximation theory, numerical analysis, and quantum mechanics, one has to determine the “size” of a matrix, how near one matrix is to another, or how close a matrix is to a special class of matrices. We need the concept of the norm (size) of a matrix. There are different ways to define the norm of a matrix, and the different definitions are useful in different applications.

5.1 Basic definitions and examples


Definition 5.1.1 Let V be a linear space over F = R or C. A function ν : V → [0, ∞) is a norm if
(a) ν(v) ≥ 0 for all v ∈ V ; the equality holds if and only if v = 0.
(b) ν(cv) = |c|ν(v) for any c ∈ F and v ∈ V .
(c) ν(u + v) ≤ ν(u) + ν(v) for all u, v ∈ V .

Example 5.1.2 Let V = Fn . For v = (v1 , . . . , vn )t ∈ Fn , let


$$\ell_\infty(v) = \max\{|v_j| : j = 1, \ldots, n\} \quad\text{and}\quad \ell_p(v) = \Big(\sum_{j=1}^{n} |v_j|^p\Big)^{1/p} \ \text{ for } p \ge 1$$
be the ℓ∞ norm and the ℓp norm.
Note that $\ell_2(v) = (\sum_{j=1}^{n} |v_j|^2)^{1/2}$ is the inner product norm.

For every p ∈ [1, ∞], it is easy to verify (a) and (b). For p = 1, ∞, it is easy to verify the triangle inequality. For 1 < p < ∞, the verification of ℓp(u + v) ≤ ℓp(u) + ℓp(v) is not so easy. We may change all the entries of u and v to their absolute values, and focus on vectors with nonnegative entries. To prove that the ℓp norm satisfies the triangle inequality for 1 < p < ∞, we establish the following.

Lemma 5.1.3 (Hölder’s inequality) Let p, q > 1 be such that 1/p + 1/q = 1. For u =
(u1 , . . . , un )t and v = (v1 , . . . , vn )t with positive entries,
$$\sum_{j=1}^{n} u_j v_j \le \ell_p(u)\ell_q(v).$$
The equality holds if and only if $(u_1^p, \ldots, u_n^p)^t$ and $(v_1^q, \ldots, v_n^q)^t$ are linearly dependent.

Proof. Replace (u, v) by (u/ℓp (u), v/ℓq (v)). We need to show that ut v ≤ 1. Note that for
two positive numbers a, b, we have

ab = exp(ln a + ln b) = exp((1/p) ln(ap ) + (1/q) ln(bq ))

≤ (1/p) exp(ln(ap )) + (1/q) exp(ln(bq )) = ap /p + bq /q,
where the equality holds if and only if a^p = b^q. Thus, we have $u_k v_k \le u_k^p/p + v_k^q/q$, and
$$\sum_{j=1}^{n} u_j v_j \le \ell_p(u)^p/p + \ell_q(v)^q/q = 1,$$
where the equality holds if and only if $u_j^p = v_j^q$ for all j = 1, . . . , n. □

Corollary 5.1.4 (Minkowski inequality) Suppose p ∈ [1, ∞]. We have ℓp (u + v) ≤ ℓp (u) +


ℓp (v).

Proof. The cases for p = 1, ∞ can be readily checked. Suppose p > 1. By the Hölder
inequality, if 1 − 1/p = 1/q, then
$$\sum_{j=1}^{n} (u_j+v_j)^p = \sum_{j=1}^{n} u_j(u_j+v_j)^{p-1} + \sum_{j=1}^{n} v_j(u_j+v_j)^{p-1}$$
$$\le \ell_p(u)\Big(\sum_{j=1}^{n} (u_j+v_j)^p\Big)^{1/q} + \ell_p(v)\Big(\sum_{j=1}^{n} (u_j+v_j)^p\Big)^{1/q} \qquad \text{as } (p-1)q = p$$
$$= (\ell_p(u) + \ell_p(v))\Big(\sum_{j=1}^{n} (u_j+v_j)^p\Big)^{1/q}.$$
Dividing both sides by $\big(\sum_{j=1}^{n} (u_j+v_j)^p\big)^{1/q}$, we get the conclusion. □
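A quick numerical sanity check of Hölder's and Minkowski's inequalities; this is only a sketch with randomly chosen positive vectors, and the helper lp is ours.

```python
import numpy as np

rng = np.random.default_rng(6)
u, v = rng.random(5), rng.random(5)        # vectors with positive entries
p = 3.0
q = p / (p - 1)                            # conjugate exponent, 1/p + 1/q = 1

lp = lambda x, r: (np.abs(x) ** r).sum() ** (1.0 / r)
print(np.dot(u, v) <= lp(u, p) * lp(v, q) + 1e-12)    # Hölder's inequality
print(lp(u + v, p) <= lp(u, p) + lp(v, p) + 1e-12)    # Minkowski's inequality
```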

Next, we consider examples on matrices.

Example 5.1.5 Consider V = Mm,n . Using the inner product ⟨A, B⟩ = tr (AB ∗ ) on Mm,n ,
we have the inner product norm (a.k.a. Frobenius norm)

$$\|A\| = (\operatorname{tr} AA^*)^{1/2} = \Big(\sum_{i,j} |a_{ij}|^2\Big)^{1/2} = \Big(\sum_{j=1}^{m} s_j(A)^2\Big)^{1/2}.$$
One can define $\ell_p(A) = (\sum_{i,j} |a_{ij}|^p)^{1/p}$, and define the Schatten p-norm by
$$S_p(A) = \ell_p(s(A)) = \Big(\sum_{j=1}^{m} s_j(A)^p\Big)^{1/p}.$$

The Schatten ∞-norm reduces to s1 (A), which is also known as the spectral norm or operator
norm defined by
∥A∥ = max{ℓ2 (Ax) : x ∈ Cn , ℓ2 (x) ≤ 1}.

When m = n, the Schatten 1-norm of A is just the sum of the singular values of A, and
is also known as the trace norm.
One can also define the Ky Fan k-norm by $F_k(A) = \sum_{j=1}^{k} s_j(A)$ for k = 1, . . . , m.

Assertion The Ky Fan k-norms and the Schatten p-norms satisfy the triangle inequalities.
Proof. To prove the triangle inequality for the Ky Fan k-norm, note that if C = A + B, then
$$\begin{pmatrix} 0 & C \\ C^* & 0 \end{pmatrix} = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} + \begin{pmatrix} 0 & B \\ B^* & 0 \end{pmatrix}.$$
By the Lidskii inequalities, $\sum_{j=1}^{k} s_j(C) \le \sum_{j=1}^{k} (s_j(A) + s_j(B))$. So, we have proved s(C) ≺w s(A) + s(B). It is easy to show that if (c1, . . . , cm) ≺w (γ1, . . . , γm), then ℓp(c1, . . . , cm) ≤ ℓp(γ1, . . . , γm). Thus, we have
$$S_p(C) = \ell_p(s(C)) \le \ell_p(s(A) + s(B)) \le \ell_p(s(A)) + \ell_p(s(B)) = S_p(A) + S_p(B). \qquad \square$$


For A ∈ Mn , one can define the numerical range and numerical radius of A by

W (A) = {x∗ Ax : x ∈ Cn , x∗ x = 1} and w(A) = max{|x∗ Ax| : x ∈ Cn , x∗ x = 1},

respectively. The spectral radius of A ∈ Mn is defined as
r(A) = max{|λ| : λ is an eigenvalue of A}.


 
Example 5.1.6 If $A = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}$, then
$$W(A) = \{(\bar x_1, \bar x_2)A(x_1, x_2)^t : |x_1|^2 + |x_2|^2 = 1\} = \{2\bar x_1 x_2 : |x_1|^2 + |x_2|^2 = 1\}$$
$$= \{2\cos\theta\sin\theta\, e^{it} : \theta \in [0, \pi/2],\ t \in [0, 2\pi)\} = \{\mu \in \mathbb{C} : |\mu| \le 1\}.$$

Note that the numerical radius is a norm on Mn (homework), but the spectral radius is
not.

Theorem 5.1.7 Let A ∈ Mn . Then W (A) is a compact convex set containing all the
eigenvalues of A, and
r(A) ≤ w(A) ≤ s1 (A) ≤ 2w(A).

Proof. Let x, y ∈ Cn be unit vectors, and α = x∗ Ax, β = y ∗ Ay ∈ W (A). We need to


show that the line segment joining α and β lies in W (A). We assume α ̸= β to avoid trivial
consideration.
Note that W (ξA + µI) = ξW (A) + µ = {ξx∗ Ax + µ : x ∈ Cn , x∗ x = 1}. We may replace
A by B = (A − αI)/(β − α), and show that the line joining x∗ Bx = 0 and y ∗ By = 1 lies in
W (B). We may further assume that x∗ By + y ∗ Bx ∈ R. Else, replace y by eir y for a suitable
r ∈ [0, 2π).

Now, let z(s) = [(1 − s)x + sy]/∥(1 − s)x + sy∥ so that
$$z(s)^* B z(s) = \frac{(1-s)^2 x^*Bx + s(1-s)(x^*By + y^*Bx) + s^2 y^*By}{\|(1-s)x + sy\|^2} \in W(B), \qquad s \in [0, 1],$$
which is real and varies continuously from 0 to 1 as s varies in [0, 1]. So, [0, 1] ⊆ W(B).
That W(A) is compact (i.e., bounded and closed) follows from the fact that W(A) is the image of the compact set of unit vectors in Cn under the continuous function x ↦ x∗Ax.
Now, if λ is an eigenvalue of A, let x be a corresponding unit eigenvector of λ, then
x∗ Ax = λ ∈ W (A). So, r(A) ≤ w(A). Also, we have

w(A) = max{|x∗ Ax| : x ∈ Cn , x∗ x = 1} ≤ max{|x∗ Ay| : x, y ∈ Cn , x∗ x = y ∗ y = 1} ≤ s1 (A).

Finally, if A = H + iG with H = H ∗ , G = G∗ , then there are unit vectors x, y ∈ Cn such


that

s1 (A) ≤ s1 (H + iG) ≤ s1 (H) + s1 (G) = |x∗ Hx| + |y ∗ Gy| ≤ |x∗ Ax| + |y ∗ Ay| ≤ 2w(A).
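The convexity of W(A) underlies the standard way of plotting it: for each angle t, a unit eigenvector of the largest eigenvalue of the Hermitian part of e^{-it}A gives a boundary point of W(A). The sketch below (the function name numerical_range_boundary is ours, not from the notes) recovers the unit disk for the matrix of Example 5.1.6.

```python
import numpy as np

def numerical_range_boundary(A, m=360):
    """Boundary points of W(A): top eigenvector of the Hermitian part of e^{-it}A for each angle t."""
    pts = []
    for t in np.linspace(0, 2 * np.pi, m, endpoint=False):
        M = np.exp(-1j * t) * A
        H = (M + M.conj().T) / 2
        w, V = np.linalg.eigh(H)
        x = V[:, -1]                      # unit eigenvector of the largest eigenvalue
        pts.append(x.conj() @ A @ x)      # a boundary point of W(A)
    return np.array(pts)

A = np.array([[0, 2], [0, 0]], dtype=complex)
print(np.round(np.abs(numerical_range_boundary(A, 8)), 6))   # all ~1: W(A) is the closed unit disk
```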

Definition 5.1.8 A norm ∥ · ∥ on Mn is a matrix/algebra norm if

∥AB∥ ≤ ∥A∥∥B∥ for all A, B ∈ Mn .

Suppose ν is a norm on Fn . Then the operator norm induced by ν is defined by

∥A∥ν = max{ν(Ax) : x ∈ Cn , ν(x) ≤ 1}.

Note that every induced norm is a matrix norm. The Schatten p-norms, the Ky Fan
k-norms, are matrix norms, but the numerical radius is not.

Example 5.1.9 The operator norm induced by the ℓ1-norm on Fn is the column sum norm defined by
$$\|A\|_{\ell_1} = \max\Big\{\sum_{j=1}^{n} |a_{j\ell}| : \ell = 1, \ldots, n\Big\}.$$
The operator norm induced by the ℓ∞-norm on Fn is the row sum norm defined by
$$\|A\|_{\ell_\infty} = \max\Big\{\sum_{j=1}^{n} |a_{\ell j}| : \ell = 1, \ldots, n\Big\}.$$

Theorem 5.1.10 Let A ∈ Mn . Then limk→∞ Ak = 0 if and only if r(A) < 1.

Proof. Let A = S(J1 ⊕· · ·⊕Jk )S −1 , where J1 , . . . , Jk are Jordan blocks. Assume r(A) < 1.
We will show that Aℓ → 0 as ℓ → ∞. It suffices to show that Jiℓ → 0 as ℓ → ∞ for each
i = 1, . . . , k.
Note that if µ satisfies |µ| < 1 and Nm = E12 + · · · + Em−1,m ∈ Mm, then for ℓ > m,
$$(\mu I_m + N_m)^{\ell} = \sum_{p=0}^{m-1} \binom{\ell}{p} \mu^{\ell-p} N_m^p \to 0 \quad \text{as } \ell \to \infty,$$
since $\lim_{\ell\to\infty} \binom{\ell}{p}\mu^{\ell-p} = 0$. Conversely, if Ax = µx for some |µ| ≥ 1 and unit vector x ∈ Cn,
then Ak x = µk x so that Ak ̸→ 0 as k → ∞. □

Theorem 5.1.11 Let ∥ · ∥ be a matrix norm on Mn . Then

$$\lim_{k\to\infty} \|A^k\|^{1/k} = r(A).$$

Proof. Suppose µ is an eigenvalue of A such that |µ| = r(A). Let x be a unit vector such
that Ax = µx. Then |µk |∥[x · · · x]∥ = ∥Ak [x · · · x]∥ ≤ ∥Ak ∥∥[x · · · x]∥. So, |µk | ≤ ∥Ak ∥.
Now, for any ε > 0, let Aε = A/(r(A) + ε). Then limk→∞ Aε^k = 0. So, for sufficiently large k ∈ N we have ∥Ak/(r(A) + ε)k∥ < 1. Hence, for any ε > 0, if k is sufficiently large, then

r(A) ≤ ∥Ak ∥1/k ≤ r(A) + ε.

The result follows. □
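Theorem 5.1.11 (the Gelfand formula) is easy to observe numerically: the quantities ∥A^k∥^{1/k} may start far above r(A) but decrease toward it as k grows. A minimal illustration with the spectral norm; the example matrix is ours.

```python
import numpy as np

A = np.array([[0.5, 2.0],
              [0.0, 0.3]])                 # r(A) = 0.5, but ||A|| is much larger

r = max(abs(np.linalg.eigvals(A)))
for k in (1, 5, 20, 80):
    print(k, np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k))
print("spectral radius:", r)               # the k-th roots approach r(A)
```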

Remark In the proof, we use the fact that the function x ↦ ∥x∥ is continuous. To see this, for any ε > 0, we can let δ = ε; then whenever ∥x − y∥ < δ, we have |∥x∥ − ∥y∥| ≤ ∥x − y∥ < δ = ε.

Corollary 5.1.12 Suppose ∥ · ∥ is a matrix norm on Mn such that ∥A∥ ≥ r(A) for all
A ∈ Mn . If ∥A∥ < 1, then Ak → 0 as k → ∞.

5.2 Geometric and analytic properties of norms


Let ν be a norm on a linear space V . Then

Bν = {x ∈ V : ν(x) ≤ 1}

is the unit ball of the norm ν.

Theorem 5.2.1 Let ν be a norm on a nonzero linear space V . Then Bν satisfies the fol-
lowing.

(a) The zero vector 0 is an interior point.
(b) For any µ ∈ F with |µ| = 1,

Bν = µBν = {µx : x ∈ Bν }.

(c) The set Bν is convex. That is, if x, y ∈ Bν and t ∈ [0, 1], then tx + (1 − t)y ∈ Bν .


Conversely, if V is a finite dimensional linear space over F and B is a set satisfying (a)–(c), then we can define a norm ∥ · ∥ on V by ∥0∥ = 0, and for any nonzero x ∈ V,
∥x∥ = inf{t > 0 : x/t ∈ B} = min{t > 0 : x/t ∈ B}.

Theorem 5.2.2 Suppose νj for j ∈ J is a family of norms on a linear space V so that 0 is


an interior point of ∩Bνj . Then ∩Bνj is the unit norm ball of ν defined by

ν(x) = sup{νj (x) : j ∈ J}.

5.3 Inner product norm and the dual norm


Recall that for a linear space V , a scalar function on V × V is an inner product denoted by
⟨x, y⟩ ∈ F if it satisfies
(a) ⟨x, x⟩ ≥ 0, where the equality holds if and only if x = 0,
(b) ⟨ax + by, z⟩ = a⟨x, z⟩ + b⟨y, z⟩,
(c) ⟨x, z⟩ = $\overline{\langle z, x\rangle}$,
for any a, b ∈ F, x, y, z ∈ V .

Theorem 5.3.1 Suppose V is an inner product space. Then for any x, y ∈ V ,

∥x∥ = ⟨x, x⟩^{1/2}, x ∈ V,

is a norm satisfying the Cauchy inequality

|⟨x, y⟩| ≤ ∥x∥∥y∥

and the parallelogram identity

∥x + y∥2 + ∥x − y∥2 = 2(∥x∥2 + ∥y∥2 ).

Theorem 5.3.2 Suppose ∥ · ∥ is a norm on a linear space V satisfying the parallelogram


identity. Then one can define an inner product by ⟨x, y⟩ = a + ib with

2a = ∥x + y∥2 − ∥x∥2 − ∥y∥2 and 2b = ∥x + iy∥2 − ∥x∥2 − ∥y∥2

such that ∥z∥ = ⟨z, z⟩1/2 for all z ∈ V .

Remark 5.3.3 Suppose V is an inner product space, and ν is a norm on V . One can define
the dual norm on V by
ν D (x) = sup{|⟨x, y⟩| : ν(y) ≤ 1}.
We have (ν D )D = ν.

Example 5.3.4 The dual norm of the ℓp norm on Fn is the ℓq norm with 1/p + 1/q = 1.
The dual norm of the Schatten p norm on Mm,n is the Schatten q norm on Mm,n with 1/p + 1/q = 1.
The dual norm of the Ky Fan k-norm on Mm,n with m ≥ n is $F_k^D(A) = \max\{s_1(A), \tfrac{1}{k}\sum_{j=1}^{n} s_j(A)\}$.

5.4 Symmetric norms and unitarily invariant norms


A norm on Fn is a symmetric norm if ∥x∥ = ∥P x∥ for every permutation matrix P and every diagonal unitary (orthogonal) matrix P .
A norm on Mm,n (F) is a unitarily invariant norm (UI norm) if ∥U AV ∥ = ∥A∥ for any
unitary U ∈ Mm , V ∈ Mn , and any A ∈ Mm,n .

Theorem 5.4.1 Suppose m ≥ n. Every UI norm ∥ · ∥ on Mm,n corresponds to a symmetric


norm ν on Rn such that

∥A∥ = ν(s(A)) for all A ∈ Mm,n .

Proof. Suppose ∥ · ∥ is a UI norm. Then $\|A\| = \|\sum_{j=1}^{n} s_j(A)E_{jj}\|$ for any A ∈ Mm,n. Define ν : Rn → R by $\nu(x) = \|\sum_{j=1}^{n} |x_j|E_{jj}\|$ for x = (x1, . . . , xn)^t ∈ Rn. Then it is easy to verify that ν is a symmetric norm.
Conversely, if ν is a symmetric norm on Rn, then define ∥ · ∥ by ∥A∥ = ν(s(A)) for any A ∈ Mm,n. Then one can check that ∥A∥ is a norm using the fact that s(A + B) ≺w s(A) + s(B), so that ν(s(A + B)) ≤ ν(s(A) + s(B)). □
Denote by GPn the set of matrices equal to the product of a permutation matrix and a diagonal unitary (orthogonal) matrix if F = C (if F = R). Let c = (c1 , . . . , cn ) ∈ Rn with
c1 ≥ · · · ≥ cn ≥ 0. Define the c-norm on Fn by

νc (x) = max{ct P x : P ∈ GPn }

and the c-spectral norm on Mm,n (F) by

∥A∥c = νc (s(A)).

If c = (1, . . . , 1, 0, . . . , 0) with k ones, we get νk(x) and the Ky Fan k-norm Fk(A).

Lemma 5.4.2 Suppose ν is a symmetric norm on Rn. Then for any x ∈ Rn,
ν(x) = max{νc(x) : c = (c1, . . . , cn), c1 ≥ · · · ≥ cn, ν^D(c) = 1}.
Suppose ∥ · ∥ is a UI norm on Mm,n(F). Then for any A ∈ Mm,n,
∥A∥ = max{∥A∥c : c = s(C) for some C ∈ Mm,n, ∥C∥^D = 1}.

Theorem 5.4.3 Let x, y ∈ Fn . The following are equivalent.


(a) νk (x) ≤ νk (y) for all k = 1, . . . , n.
(b) νc (x) ≤ νc (y) for all nonzero c = (c1 , . . . , cn ) with c1 ≥ · · · ≥ cn ≥ 0.
(c) ν(x) ≤ ν(y) for all symmetric norms ν.

Proof. Suppose (a) holds. Then for any c = (c1 , . . . , cn ) with c1 , . . . , cn , if we set dn = cn
and dj = cj − j + 1 for j = 1, . . . , n − 1, then νc (z) = nj=1 dj νj (z). Thus,
P

n
X n
X n
X
νc (x) = dj νj (x) ≤ dj νj (y) = cj yj = νc (y).
j=1 j=1 j=1

Suppose (b) holds. Let ν be a symmetric norm. Then for any c = (c1 , . . . , cn ) with
c1 ≥ · · · ≥ cn ≥ 0 with ν d (c) = 1, we have νc (x) ≤ νc (y). Thus, ν(x) = ν(y).
The implication (b) ⇒ (c) is clear. □

Theorem 5.4.4 Let A, B ∈ Mm,n (F) with m ≥ n. The following are equivalent.
(a) Fk (A) ≤ Fk (B) for all k = 1, . . . , n.
(b) ∥A∥c ≤ ∥B∥c for all nonzero c = (c1 , . . . , cn ) with c1 ≥ · · · ≥ cn ≥ 0.
(c) ∥A∥ ≤ ∥B∥ for all UI norms ∥ · ∥.

Proof. Similar to the last theorem. □

Theorem 5.4.5 Let Rk ⊆ Mm,n be the set of matrices of rank at most k with m ≥ n > k. Suppose ∥ · ∥ is a UI norm. If A ∈ Mm,n is such that $U^*AV = \sum_{j=1}^{n} s_j(A)E_{jj}$, then $A_k = U(\sum_{j=1}^{k} s_j(A)E_{jj})V^*$ satisfies
∥A − Ak∥ ≤ ∥A − X∥ for all X ∈ Rk.

Proof. Let X ∈ Rk and C = A − X. Then sj(X) = 0 for j > k so that
$$\sum_{j=1}^{\ell} s_{k+j}(A) = \sum_{j=1}^{\ell} (s_{k+j}(A) - s_{k+1}(X)) \le \sum_{j=1}^{\ell} s_j(C), \quad\text{for all } \ell = 1, \ldots, n-k.$$
So, (sk+1(A), . . . , sn(A), 0, . . . , 0) ≺w s(C) and ∥A − Ak∥ ≤ ∥C∥ = ∥A − X∥. □
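Theorem 5.4.5 is the Eckart–Young-type best rank-k approximation: the truncated SVD is optimal in every UI norm. Below is a minimal numpy sketch (the helper name best_rank_k and the random test matrix are ours) checking the spectral and Frobenius norm errors.

```python
import numpy as np

def best_rank_k(A, k):
    """Truncated SVD A_k = U diag(s_1..s_k) V^*: the closest rank-k matrix in every UI norm."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :]

A = np.random.default_rng(5).standard_normal((6, 4))
A2 = best_rank_k(A, 2)
s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A - A2, 2), s[2]))                        # spectral error = s_3
print(np.isclose(np.linalg.norm(A - A2, 'fro'), np.sqrt((s[2:]**2).sum())))  # Frobenius error
```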

Theorem 5.4.6 Let A ∈ Mn and ∥ · ∥ be a unitarily invariant norm.


(a) ∥A − (A + A∗ )/2∥ ≤ ∥A − H∥ for any H = H ∗ ∈ Mn .
(b) ∥A − (A − A∗ )/2∥ ≤ ∥A − iG∥ for any G = G∗ ∈ Mn .

Proof. (a) Let H ∈ Mn be Hermitian, and let A − H = Ĥ + iG with Ĥ, G Hermitian. Suppose Q ∈ Mn is unitary such that Q∗GQ is in diagonal form with diagonal entries g1, . . . , gn such that |g1| ≥ · · · ≥ |gn|. If d1, . . . , dn are the diagonal entries of Q∗(Ĥ + iG)Q, then
s(G) = (|g1|, . . . , |gn|) ≺w (|d1|, . . . , |dn|) ≺w s(A − H).
Thus, ∥G∥ = ∥A − (A + A∗)/2∥ ≤ ∥A − H∥.


(b) Similar to (a). □

Theorem 5.4.7 Suppose A, B ∈ Mn have singular values a1 ≥ · · · ≥ an and b1 ≥ · · · ≥ bn .


Then for any UI norm ∥ · ∥,
$$\Big\|\sum_{j=1}^{n} (a_j + b_{n-j+1})E_{jj}\Big\| \le \|A + B\| \le \Big\|\sum_{j=1}^{n} (a_j + b_j)E_{jj}\Big\|.$$

Proof. By the Lidskii inequalities, for all k = 1, . . . , n,
$$\sum_{j=1}^{k} [\lambda_j(A+B) - \lambda_j(A)] \le \sum_{j=1}^{k} \lambda_j(B) \quad\text{and}\quad \sum_{j=1}^{k} \lambda_{i_j}(A) - \sum_{j=1}^{k} \lambda_{i_j}(-B) \le \sum_{j=1}^{k} \lambda_j(A - (-B)).$$
We get the majorization result. □

Theorem 5.4.8 Let ∥ · ∥ be a UI norm on Mn .


(a) If P is positive semidefinite, then ∥P − I∥ ≤ ∥P − V ∥ ≤ ∥P + V ∥ for any unitary
V ∈ Mn .
(b) If A = U P such that P is positive semidefinite and U is unitary, then

∥A − U ∥ ≤ ∥A − V ∥ for any unitary V ∈ Mn .

Proof. (a) Apply the previous theorem with P = A and B = I.


(b) Use the fact that ∥A − V ∥ = ∥U P − V ∥ = ∥P − U ∗ V ∥ ≥ ∥P − I∥ = ∥U P − U ∥. □

5.5 Errors in computing inverse and solving linear equations
Theorem 5.5.1 If B ∈ Mn satisfies r(B) < 1, then I − B is invertible and

$$(I - B)^{-1} = \sum_{k=0}^{\infty} B^k.$$

Consequently, if A ∈ Mn is invertible and E satisfies r(A−1E) < 1, then A + E is invertible and
$$A^{-1} - (A + E)^{-1} = \sum_{k=1}^{\infty} (-1)^{k+1} (A^{-1}E)^k A^{-1}.$$

Furthermore, if ∥ · ∥ is a matrix norm on Mn such that ∥A−1E∥ < 1 and κ(A) = ∥A−1∥∥A∥, then
$$\frac{\|A^{-1} - (A + E)^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|} \le \frac{\kappa(A)}{1 - \kappa(A)(\|E\|/\|A\|)} \cdot \frac{\|E\|}{\|A\|}.$$

Proof. Use the identity $(I - B)(\sum_{j=0}^{k} B^j) = I - B^{k+1}$ and let k → ∞.
The quantity κ(A) is called the condition number of A with respect to the norm ∥ · ∥. An important implication is that the change in the inverse is governed by κ(A). For example, if ∥A∥ = s1(A) and A is unitary, then
$$\frac{\|A^{-1} - (A + E)^{-1}\|}{\|A^{-1}\|} \le \frac{\|E\|}{\|A\| - \|E\|}.$$
So, the computation of A−1 is very “stable”.


We can apply the result to analyze the solution of Ax = b.

Corollary 5.5.2 Let A, E ∈ Mn and x, b ∈ Cn be such that Ax = b and (A + E)x̂ = b. Suppose A and (A + E) are invertible. Then
x − x̂ = [A−1 − (A + E)−1]b = [A−1 − (A + E)−1]Ax.

Suppose ∥ · ∥ is a matrix norm on Mn such that ∥A−1E∥ < 1, and ν is a norm on Cn such that ν(Bz) ≤ ∥B∥ν(z) for all B ∈ Mn and z ∈ Cn. If κ(A) = ∥A−1∥∥A∥, then
$$\frac{\nu(x - \hat x)}{\nu(x)} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|} \le \frac{\kappa(A)}{1 - \kappa(A)(\|E\|/\|A\|)} \cdot \frac{\|E\|}{\|A\|}.$$
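The bound in Corollary 5.5.2 can be observed numerically: the relative error in the computed solution is controlled by κ(A) times the relative perturbation. Here is a small sketch using the spectral norm; the random test data are ours, and it assumes κ(A)∥E∥/∥A∥ < 1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
b = A @ x

E = 1e-8 * rng.standard_normal((n, n))        # small perturbation of A
x_hat = np.linalg.solve(A + E, b)

kappa = np.linalg.cond(A, 2)                  # condition number for the spectral norm
rel_pert = np.linalg.norm(E, 2) / np.linalg.norm(A, 2)
lhs = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
rhs = kappa / (1 - kappa * rel_pert) * rel_pert
print(lhs <= rhs)                             # True: relative error controlled by kappa(A)
```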

6 Additional topics
6.1 Location of eigenvalues
Theorem 6.1.1 (Gershgorin Theorem) Let A = (aij) ∈ Mn, and let
$$G_j(A) = \Big\{\mu \in \mathbb{C} : |\mu - a_{jj}| \le \sum_{i \ne j} |a_{ji}|\Big\}.$$
Then the eigenvalues of A lie in G(A) = ∪nj=1 Gj(A). Furthermore, if C = Gi1(A) ∪ · · · ∪ Gik(A) forms a connected component of G(A), then C contains exactly k eigenvalues counting multiplicities.

Proof. Suppose Av = λv with v = (v1 , . . . , vn ). Then for i = 1, . . . , n,


$$\lambda v_i - a_{ii} v_i = \sum_{j \ne i} a_{ij} v_j.$$
Suppose vi has the maximum size. Then
$$|\lambda - a_{ii}| = \Big|\sum_{j \ne i} a_{ij} v_j / v_i\Big| \le \sum_{j \ne i} |a_{ij}|.$$

To prove the last assertion, let At = D + t(A − D) with D = diag(a11, . . . , ann). Then A0 has eigenvalues a11, . . . , ann, and the eigenvalues and Gershgorin disks change continuously with t ∈ [0, 1] until we get A1 = A.
One can also apply the result to the transpose A^t to get Gershgorin disks of different sizes centered at a11, . . . , ann. Also, one can apply the result to S −1 AS for (simple) invertible S such that
G(S −1 AS) is small. In fact, if A is already in Jordan form, then for any ε > 0 there is
S such that S −1 AS has diagonal entries λ1 , . . . , λn and (i, i + 1) entries equal 0 or ε for
i = 1, . . . , n − 1, and all other entries equal to 0. So, we have the following.

Theorem 6.1.2 Let A ∈ Mn . Then


$$\bigcap_{S \in M_n \text{ invertible}} G(S^{-1}AS) = \{\lambda_1(A), \ldots, \lambda_n(A)\}.$$

One may use the Gershgorin theorem to study the zeros of a (monic) polynomial, namely,
one can apply the result to the companion matrix Cf of f (x) to get some estimate of the
location of the zeros. One can further apply similarity to Cf to get better estimate for the
zeros of f (x).
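Gershgorin disks are immediate to compute from the rows of A. The sketch below (the function name gershgorin_disks and the sample matrix are ours) verifies that every eigenvalue of the sample matrix lies in some disk.

```python
import numpy as np

def gershgorin_disks(A):
    """Row-version disks: center a_jj, radius = sum of |a_ji| over i != j."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return list(zip(centers, radii))

A = np.array([[4.0, 0.5, 0.2],
              [0.1, 1.0, 0.3],
              [0.2, 0.1, -2.0]])
disks = gershgorin_disks(A)
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - c) <= r for c, r in disks)   # each eigenvalue lies in some disk
print(disks)
```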

6.2 Eigenvalues and principal minors
Theorem 6.2.1 Let A ∈ Mn with eigenvalues λ1, . . . , λn. Then
$$\det(zI - A) = (z - \lambda_1)\cdots(z - \lambda_n) = z^n - a_1 z^{n-1} + a_2 z^{n-2} - \cdots + (-1)^n a_n,$$
where for m = 1, . . . , n,
$$a_m = S_m(\lambda_1, \ldots, \lambda_n) = \sum_{1 \le j_1 < \cdots < j_m \le n} \lambda_{j_1} \cdots \lambda_{j_m}$$
is the sum of all m × m principal minors of A.

Proof. For any subset J ⊆ {1, . . . , n}, let A[J] be the principal submatrix of A with row and column indices in J. Consider the expansion of det(zI − A). The coefficient of z^{n−j} comes from the sum of the leading coefficients of (−1)^j det(A[J]) det(zI − A(J)) over all j-element subsets J of {1, . . . , n}, where A(J) is the complementary principal submatrix. The result follows. □

6.3 Nonnegative Matrices


In this section, we consider positive (nonnegative) matrices A, i.e., the entries of A are
positive (nonnegative) real numbers. Denote by |A| the matrix obtained from A by changing
its entries to their absolute values (norm). Similarly, we consider |v| of a vector v.

Theorem 6.3.1 (Perron-Frobenius Theorem) Suppose A ∈ Mn is nonnegative such that Ak


is positive for some positive integer k. Then the following holds.

(a) r(A) > 0 is an algebraically simple eigenvalue of A such that r(A) > |λ| for every other eigenvalue λ of A.

(b) There is a unique positive vector x with ℓ1 (x) = 1 such that Ax = r(A)x, and there is
a unique positive vector y with y t x = 1 and y t A = r(A)y t .

(c) Let x and y be the vectors in (b). Then (r(A)−1 A)m → xy t as m → ∞.

We first prove a lemma.

Lemma 6.3.2 Suppose A ∈ Mn is nonnegative with row sums r1 , . . . , rn .

(a) For any nonnegative matrix P , r(A) ≤ r(A + P ).

(b) If all the row sums are the same, then r(A) = r1 . In general,

min{ri : 1 ≤ i ≤ n} ≤ r(A) ≤ max{ri : 1 ≤ i ≤ n}.

Proof. (a) If B = A + P , then for any positive integer k, B k − Ak is nonnegative so that
∥Ak ∥ℓ∞ ≤ ∥B k ∥ℓ∞ . Hence,

$$r(A) = \lim_{k\to\infty} \|A^k\|_{\ell_\infty}^{1/k} \le \lim_{k\to\infty} \|B^k\|_{\ell_\infty}^{1/k} = r(B).$$

(b) Suppose all the row sums are the same. Let e = (1, . . . , 1)t . Then Ae = r1 e so that
r1 is an eigenvalue. By Gershgorin Theorem all eigenvalues lie in
$$\bigcup_{i=1}^{n} \Big\{\mu \in \mathbb{C} : |\mu - a_{ii}| \le \sum_{j \ne i} a_{ij}\Big\}.$$

Thus, all eigenvalues lie in the set {µ ∈ C : |µ| ≤ r1 }. Hence, r1 = r(A).


In general, let P be a nonnegative matrix such that B = A + P has all row sum equal to
∥A∥ℓ∞ . Then r(A) ≤ r(B) = ∥A∥ℓ∞ .
Similarly, let Q be a nonnegative matrix such that B̂ = A − Q is nonnegative with all
row sum equal to rℓ = min{ri : 1 ≤ i ≤ n}. Then rℓ = r(B̂) ≤ r(A). □

Proof of Theorem 6.3.1. Assume B = Ak is positive. Then r(B) is at least the minimum row sum of B, which is positive, so that 0 < r(B) = r(A)k. Note that Bv is positive for any nonzero vector v ≥ 0.
Assertion 1 Let λ be an eigenvalue of B. Either |λ| < r(B) or λ = r(B) with an eigenvector
x such that x = eiθ |x| for some θ ∈ R.
Proof. Let λ be an eigenvalue of B such that |λ| = r(B), and x be an eigenvector. Then
r(B)|x| = |r(B)x| = |Bx| ≤ B|x|. We claim that the equality holds. If it is not true, we can
set z = B|x| so that y = (B − r(B)I)|x| = z − r(B)|x| ≠ 0 is nonnegative. Then

0 < By = Bz − r(B)B|x| = Bz − r(B)z.

So, z = (z1 , . . . , zn )t has positive entries, and for Z = diag (z1 , . . . , zn ), we have

Z −1 (BZe − r(B)Ze) = Z −1 BZe − r(B)e = Z −1 By > 0.

It follows that Z −1 BZ has minimum row sum at least r(B) + δ, where δ > 0 is the minimum entry of Z −1 By. So, r(Z −1 BZ) ≥ r(B) + δ, which is a contradiction.
Now, r(B)|x| = B|x| has positive entries, and |Bx| = r(B)|x| = B|x|. Thus, x = eiθ |x|, i.e., x lies in the eigenspace of r(B), and λ = r(B). The proof of Assertion 1 is complete.
Assertion 2 The value r(B) is a simple eigenvalue of B with a unique positive eigenvector x satisfying et x = 1 and a unique positive left eigenvector y such that y t x = 1. Moreover, there is an invertible matrix S ∈ Mn such that x is the first column of S and y t is the first row of S −1 satisfying S −1 BS = [r(B)] ⊕ B1 with r(B1 ) < r(B).

Proof. Suppose Bu = r(B)u and Bv = r(B)v for two linearly independent vectors u
and v such that et |u| = et |v| = 1. By the arguments in the previous paragraphs, we see
that there are θ, ϕ ∈ R such that u = eiθ |u| and v = eiϕ |v|, such that |u|, |v| have positive
entries. So, there is β > 0 such that |u| − β|v| is nonnegative with at least one zero entry.
We have r(B)(|u| − β|v|) = B(|u| − β|v|), and B(|u| − β|v|) has positive entries while |u| − β|v| has a zero entry, which is a contradiction. So, |u| = |v|.
Let x be the unique positive eigenvector such that Bx = r(B)x satisfying et x = 1.
Then we can consider B t and obtain a positive vector B t y = r(B)y satisfying xt y = 1. Let
S = [x|S1 ] ∈ Mn be such that the columns of y t S1 = [0, . . . , 0] ∈ R1×n−1 . Then x is not in
the column space of S1 because y t x = 1 ̸= 0. So, S is invertible. Moreover, y t S = [1, 0, . . . , 0]
so that y is the first row of S −1 . Now, if S −1 BS = C, then SC = BS has first column equal
r(B)e1 . Thus, the first column of C is r(B)e1 . Similarly, the first column of CS −1 = S −1 B
equals r(B)y t . Thus, the first row of C is r(B)et1 . Hence, S −1 BS = [r(B)] ⊕ B1 such that
r(B1 ) < r(B). Assertion 2 follows.
Assertion 3 The conclusion of Theorem 6.3.1 holds.
Proof. Note that the vectors x and y in Assertion 2 are the right and left eigenvectors of A corresponding to a simple eigenvalue λ of A with |λ| = r(A). Now, Ax = λx implies that λ = r(A). So, S −1 AS = [r(A)] ⊕ A1 such that r(A1) < r(A). Finally,
$$\lim_{m\to\infty} [A/r(A)]^m = \lim_{m\to\infty} S([1] \oplus (A_1/r(A))^m)S^{-1} = S([1] \oplus 0_{n-1})S^{-1} = xy^t. \qquad \square$$
In general, for any nonnegative matrix A ∈ Mn, we can consider Aε = A + εeet for ε > 0 so that the resulting matrix is positive; then r(Aε) is a simple eigenvalue of Aε with positive left and right eigenvectors xε and yε. By continuity, we have the following.

Corollary 6.3.3 Let A ∈ Mn be a nonnegative matrix. Then r(A) is an eigenvalue of A with at least one pair of nonnegative left and right eigenvectors.

For a nonnegative matrix A, r(A) is called the Perron eigenvalue of A, and the corresponding nonnegative left and right eigenvectors are called the Perron eigenvectors.

Example 6.3.4 Note that Ak is not positive for any positive integer k in all the following.
If A = I2, then r(A) = 1 and all nonzero vectors are left and right eigenvectors.
If $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, then r(A) = 1 with right and left eigenvectors x = (1, 0)^t and y = (0, 1)^t.
If $A = \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix}$, then r(A) = 1 with right and left eigenvectors x = (1, 1)^t/2 and y = (0, 2)^t.

 
If $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, then r(A) = 1 with right and left eigenvectors x = (1, 1)^t/2 and y = (1, 1)^t.

A row (column) stochastic matrix is a matrix with nonnegative entries such that all row (column) sums equal one. Such matrices appear in the study of Markov chains in probability, population models, the Google PageRank matrix, etc. If A ∈ Mn is both row and column stochastic, then it is doubly stochastic.

Corollary 6.3.5 Let A be a row stochastic matrix. Then r(A) = 1. If Ak is positive for some k, then r(A) is a simple eigenvalue with a unique positive right eigenvector x satisfying et x = 1, and a unique positive left eigenvector y with y t x = 1 such that Ak → xy t as k → ∞.
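For a positive row stochastic matrix, the corollary says that every row of A^k converges to the stationary distribution, i.e., A^k → xy^t. A small illustration; the 3 × 3 matrix and the way the stationary distribution is extracted are ours.

```python
import numpy as np

A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.7, 0.2],
              [0.3, 0.3, 0.4]])            # positive row-stochastic, so r(A) = 1

# Stationary distribution pi: left eigenvector of A for eigenvalue 1, normalized to sum 1.
vals, vecs = np.linalg.eig(A.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

Ak = np.linalg.matrix_power(A, 50)
print(np.allclose(Ak, np.outer(np.ones(3), pi)))   # True: A^k -> e pi^t = x y^t
```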

6.4 Kronecker (tensor) products


Definition 6.4.1 Let A = (aij ) ∈ Mm,n , B = (brs ) ∈ Mp,q . Then A ⊗ B = (aij B) ∈ Mmp,nq .

Theorem 6.4.2 The following equations hold for scalars a, b and matrices A, B, C, D, provided that the sizes of the matrices are compatible with the described operations.
(a) (aA + bB) ⊗ C = aA ⊗ C + bB ⊗ C, C ⊗ (aA + bB) = aC ⊗ A + bC ⊗ B.
(b) (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

Proof. (a) By direct verification. (b) It suffices to show (A ⊗ B)(Cj ⊗ Dk ) = (ACj ) ⊗ (BDk ) for all columns Cj of C and Dk of D. □

Corollary 6.4.3 Let A, B be matrices. Then f(A ⊗ B) = f(A) ⊗ f(B) for f(X) = X, X^t or X∗.
(a) If A, B are invertible, then (A ⊗ B)−1 = A−1 ⊗ B−1.
(b) If A and B are unitary, then so is A ⊗ B with inverse (A ⊗ B)∗ = A∗ ⊗ B∗.
(c) If S−1AS and T−1BT are in triangular forms, then so is (S ⊗ T)−1(A ⊗ B)(S ⊗ T).
(d) If A has eigenvalues a1, . . . , am and B has eigenvalues b1, . . . , bn, then A ⊗ B has eigenvalues aibj with 1 ≤ i ≤ m, 1 ≤ j ≤ n; if xi, yj are eigenvectors such that Axi = aixi and Byj = bjyj, then (A ⊗ B)(xi ⊗ yj) = aibj(xi ⊗ yj).
(e) If A and B have singular value decompositions A = U1D1V1∗ and B = U2D2V2∗, then the equation (A ⊗ B)(V1 ⊗ V2) = (U1 ⊗ U2)(D1 ⊗ D2) will yield the information for the singular values and singular vectors.

We have the following application of the tensor product results to matrix equations.

Theorem 6.4.4 Let A ∈ Mm , B ∈ Mn and C ∈ Mm,n . Then the matrix equation

AX + XB = C, X ∈ Mm,n,
can be rewritten as (In ⊗ A)vec(X) + (B t ⊗ Im )vec(X) = vec(C), where for Z ∈ Mm,n we


have vec(Z) ∈ Cmn with the first column of Z as the first m entries, second column of Z as
the next m entries, etc.
Consequently, the matrix equation is solvable if and only if vec(C) lies in the column
space of In ⊗ A + B t ⊗ Im . In particular, if In ⊗ A + B t ⊗ Im is invertible, then the matrix
equation is always solvable.
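Theorem 6.4.4 gives a direct numerical recipe: build I_n ⊗ A + B^t ⊗ I_m and solve one linear system for vec(X). Below is a sketch; the random test data are ours, and it assumes the Kronecker sum is invertible.

```python
import numpy as np

# Solve AX + XB = C via vec: (I_n ⊗ A + B^t ⊗ I_m) vec(X) = vec(C).
m, n = 3, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, n))

K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
x = np.linalg.solve(K, C.flatten(order="F"))      # vec stacks columns, so use Fortran order
X = x.reshape((m, n), order="F")
print(np.allclose(A @ X + X @ B, C))              # True when K is invertible
```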

The Hadamard (Schur) product of two matrices A = (aij ), B = (bij ) ∈ Mm,n is defined
by A ◦ B = (aij bij ).

Corollary 6.4.5 Let A, B ∈ Mm,n .


(a) Then sk (A ⊗ B) ≥ sk (A ◦ B) for k = 1, . . . , m.
(b) If m = n, then $s_{n-k+1}(A \circ B) \ge s_{n^2-k+1}(A \otimes B)$ for k = 1, . . . , n.
(c) If A, B are positive semidefinite, then so is A ◦ B.

Remark Note that if A, B ∈ Mn are invertible, unitary, or normal, it does not follow that
A ◦ B has the same property.

6.5 Compound matrices
Let A ∈ Mm,n and k ≤ min{m, n}. Then the compound matrix Ck(A) is of size $\binom{m}{k} \times \binom{n}{k}$, with rows labeled by increasing subsequences r = (r1, . . . , rk) of (1, . . . , m) and columns labeled by increasing subsequences s = (s1, . . . , sk) of (1, . . . , n) in lexicographic order, such that the (r, s) entry of Ck(A) equals det(A[r, s]), where A[r, s] ∈ Mk is the submatrix of A with rows and columns indexed by r and s.

Example 6.5.1 Let A ∈ M4 . Then C2 (A) ∈ M6 with (r1 , r2 ), (s1 , s2 ) entry equal to det(A[r1 , r2 ; s1 , s2 ]).

It is easy to check that Ck (At ) = Ck (A)t , Ck (A∗ ) = Ck (A)∗ , etc.


We will prove a product formula for the compound matrix. The proof depends on the
following result which generalizes the Cauchy-Binet formula.

Theorem 6.5.2 Let A ∈ Mm,n and B ∈ Mn,m . Then for any 1 ≤ k ≤ m, the sum of the
k × k principal minors of AB is the same as that of BA ∈ Mn .

Note that when k = m ≤ n, the above result is known as the Cauchy Binet formula.
Proof. Recall that if
$$P = \begin{pmatrix} AB & 0 \\ B & 0_n \end{pmatrix}, \quad Q = \begin{pmatrix} 0_m & 0 \\ B & BA \end{pmatrix} \quad\text{and}\quad S = \begin{pmatrix} I_m & A \\ 0 & I_n \end{pmatrix},$$
then S is invertible and
$$PS = \begin{pmatrix} AB & 0 \\ B & 0_n \end{pmatrix}\begin{pmatrix} I_m & A \\ 0 & I_n \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix} = \begin{pmatrix} I_m & A \\ 0 & I_n \end{pmatrix}\begin{pmatrix} 0_m & 0 \\ B & BA \end{pmatrix} = SQ.$$

Thus, P and Q are similar, and

$$z^m \det(zI_n - BA) = \det(zI_{m+n} - Q) = \det(zI_{m+n} - P) = z^n \det(zI_m - AB).$$

Thus the sum of the kth principal minors of P and that of Q are the same. Evidently, the
sum of the kth principal minors of P are the same as that of AB, and the sum of the kth
principal minors of Q are the same as that of BA. The result follows. □

Theorem 6.5.3 Let A ∈ Mm,n , B ∈ Mn,p and k ≤ min{m, n, p}. Then Ck (AB) =
Ck (A)Ck (B).

Proof. Let Γr,k be the set of length k increasing subsequence of (1, . . . , r) for r ≥ k.
Consider the entry of Ck (AB) with row indexes r = (r1 , . . . , rk ) ∈ Γm,k and column indexes
s = (s1 , . . . , sk ) ∈ Γn,k . Let  ∈ Mk,n be obtained from A by using its rows indexed
by (r1 , . . . , rk ), and let B̂ ∈ Mn,k be obtained from B by using its columns indexed by
(s1 , . . . , sk ). Then the (r, s) entry of Ck (AB) equals det(ÂB̂) = Ck (Â)Ck (B̂) by the Cauchy
Binet formula. Note that Ck (Â)Ck (B̂) is the (r, s) entry of Ck (A)Ck (B). The result follows.

Corollary 6.5.4 Let A ∈ Mn and k ≤ n.

(a) If A is invertible (unitary), then so is Ck (A).

(b) Suppose A = UTU∗ with T in triangular form. Then Ck(A) = Ck(U)Ck(T)Ck(U∗), where Ck(T) is in triangular form. Consequently, Ck(A) has eigenvalues $\prod_{j=1}^{k} \lambda_{i_j}(A)$, 1 ≤ i1 < · · · < ik ≤ n.
(c) Suppose U∗AV = D with $D = \sum_{j=1}^{n} s_j(A)E_{jj}$, where U, V are unitary. Then
Ck(U∗)Ck(A)Ck(V) = Ck(D).
Consequently, Ck(A) has singular values $\prod_{j=1}^{k} s_{i_j}(A)$, 1 ≤ i1 < · · · < ik ≤ n.

Corollary 6.5.5 Let A ∈ Mn with eigenvalues λ1 (A), . . . , λn (A) satisfying |λ1 (A)| ≥ · · · ≥
|λn (A)|. Then
$$\prod_{j=1}^{k} |\lambda_j(A)| \le \prod_{j=1}^{k} s_j(A) \quad\text{for } k = 1, \ldots, n.$$
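The compound matrix is simple to compute directly from its definition as a matrix of k × k minors. The sketch below (the function name compound and the random matrices are ours) also verifies the product formula Ck(AB) = Ck(A)Ck(B) of Theorem 6.5.3 numerically.

```python
import numpy as np
from itertools import combinations

def compound(A, k):
    """k-th compound matrix: minors det(A[r, s]) over increasing index sets in lexicographic order."""
    m, n = A.shape
    rows = list(combinations(range(m), k))
    cols = list(combinations(range(n), k))
    return np.array([[np.linalg.det(A[np.ix_(r, s)]) for s in cols] for r in rows])

A = np.random.default_rng(1).standard_normal((4, 4))
B = np.random.default_rng(2).standard_normal((4, 4))
print(np.allclose(compound(A @ B, 2), compound(A, 2) @ compound(B, 2)))   # Theorem 6.5.3
```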

6.6 Additive compound
Let A ∈ Mn and 1 ≤ k ≤ n, and write
$$C_k(tI_n + A) = C_k(A) + \cdots + t^{k-3}D_{3,k}(A) + t^{k-2}D_{2,k}(A) + t^{k-1}D_k(A) + t^k I_{\binom{n}{k}}.$$
The matrix Dk(A) is called the additive compound of A.


Note that Dk (aA + bB) = aDk (A) + bDk (B) for any a, b ∈ C, A, B ∈ Mn .

Theorem 6.6.1 Let A ∈ Mn. Then Dk(S−1AS) = Ck(S)−1Dk(A)Ck(S), so that Dk(A) has eigenvalues $\sum_{j=1}^{k} \lambda_{i_j}(A)$ with 1 ≤ i1 < · · · < ik ≤ n. Consequently, if A is normal (Hermitian, positive semidefinite) then so is Dk(A).

Corollary 6.6.2 Let A ∈ Mn be Hermitian. Then

$$\sum_{j=1}^{k} \lambda_{n-j+1}(A) \le \sum_{j=1}^{k} \lambda_j(A) \le \sum_{j=1}^{k} s_j(A).$$

Theorem 6.6.3 Let A, B ∈ Mn. Then Dk(AB − BA) = Dk(A)Dk(B) − Dk(B)Dk(A). Consequently, if A and B commute, then so do Dk(A) and Dk(B).

Proof. The proof follows from the fact that Dk(X) can be written as
$$V^*\Big(\sum_{j=1}^{k} \underbrace{I_n \otimes \cdots \otimes I_n}_{j-1} \otimes X \otimes \underbrace{I_n \otimes \cdots \otimes I_n}_{k-j}\Big)V,$$
where $V \in M_{n^k \times \binom{n}{k}}$ is such that $V^*V = I_{\binom{n}{k}}$ and the columns of V form a basis for the subspace of $\mathbb{C}^{n^k}$ spanned by
$$\Big\{\sum_{\sigma \in S_k} \chi(\sigma)\, e_{i_{\sigma(1)}} \otimes \cdots \otimes e_{i_{\sigma(k)}} : 1 \le i_1 < \cdots < i_k \le n\Big\},$$
where χ(σ) = 1 if σ ∈ Sk is an even permutation and χ(σ) = −1 otherwise. □

6.7 More block matrix techniques
 
Schur Complement Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \in M_n$ such that A11 ∈ Mk is invertible. Then
$$\begin{pmatrix} I_k & 0 \\ -A_{21}A_{11}^{-1} & I_{n-k} \end{pmatrix}\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{pmatrix}.$$
The matrix A22 − A21A11−1A12 is the Schur complement of A with respect to A11. Clearly, it is useful for block Gaussian elimination. Also, if A is invertible, then the inverse of the Schur complement is the (n − k) × (n − k) bottom right submatrix of A−1.
If A−1 exists, then A22 − A21A11−1A12 is invertible and
$$A^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{pmatrix}^{-1}\begin{pmatrix} I_k & 0 \\ -A_{21}A_{11}^{-1} & I_{n-k} \end{pmatrix} = \begin{pmatrix} \star & \star \\ \star & (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1} \end{pmatrix}.$$
So, (A22 − A21A11−1A12)−1 is the (n − k) × (n − k) matrix in the bottom right block of A−1.
 
Block Hermitian matrices Suppose $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ is Hermitian such that A11 ∈ Mk is invertible. If $S = \begin{pmatrix} I_k & 0 \\ -A_{21}A_{11}^{-1} & I_{n-k} \end{pmatrix}$, then SAS∗ = A11 ⊕ (A22 − A21A11−1A12).

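A quick numerical check of the Schur complement facts above; the random test data are ours, and the sketch assumes A and A11 are invertible.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
A = rng.standard_normal((n, n))
A11, A12 = A[:k, :k], A[:k, k:]
A21, A22 = A[k:, :k], A[k:, k:]

S = A22 - A21 @ np.linalg.solve(A11, A12)            # Schur complement of A11 in A
# The bottom right block of A^{-1} equals S^{-1} (when A and A11 are invertible).
print(np.allclose(np.linalg.inv(A)[k:, k:], np.linalg.inv(S)))
```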
