408 Note
Chi-Kwong Li
The polar form of $z$ is $z = |z|(\cos\theta + i\sin\theta) = |z|e^{i\theta}$. If $z_1 = |z_1|e^{i\theta_1}$ and $z_2 = |z_2|e^{i\theta_2}$, then $z_1 z_2 = |z_1||z_2|e^{i(\theta_1+\theta_2)}$, where we may replace $\theta_1+\theta_2$ by $\theta_1+\theta_2-2\pi$ in case $\theta_1+\theta_2 \ge 2\pi$. If $z_2 \ne 0$, then $z_1/z_2 = (|z_1|/|z_2|)e^{i(\theta_1-\theta_2)}$, where we may replace $\theta_1-\theta_2$ by $\theta_1-\theta_2+2\pi$ in case $\theta_1 < \theta_2$.
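The multiplication rule in polar form can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `polar_product` is introduced here for illustration and is not part of the notes.

```python
import cmath
import math

def polar_product(z1, z2):
    """Multiply two complex numbers via their polar forms."""
    r1, theta1 = cmath.polar(z1)   # modulus and argument in (-pi, pi]
    r2, theta2 = cmath.polar(z2)
    # Multiply the moduli, add the arguments, and reduce to [0, 2*pi).
    theta = (theta1 + theta2) % (2 * math.pi)
    return r1 * r2, theta

r, theta = polar_product(1j, -1 + 0j)   # i * (-1) = -i
z = cmath.rect(r, theta)                # back to rectangular form
```

Here `z` agrees with $-i$ up to rounding, and `theta` is $3\pi/2$.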
1.2 Real or Complex Vectors and Matrices
Let $F = \mathbb{R}$ or $\mathbb{C}$, and let $F^n$ be the set of column vectors with $n$ coordinates.
If $x = (x_1, \dots, x_n)^t$, $y = (y_1, \dots, y_n)^t \in F^n$ and $\gamma \in F$, then the addition and scalar multiplication are defined by
\[ x + y = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix} \quad\text{and}\quad \gamma x = \begin{pmatrix} \gamma x_1 \\ \vdots \\ \gamma x_n \end{pmatrix}, \]
respectively.
The set $F^n$ forms a vector space under addition and scalar multiplication.
The addition is closed, associative, and commutative; there is a zero vector $0 \in F^n$ such that $x + 0 = x$; for any $x \in F^n$ there is an additive inverse $-x$ such that $x + (-x) = 0$; the scalar multiplication always yields an element in $F^n$ and satisfies $\gamma_1(\gamma_2 x) = (\gamma_1\gamma_2)x$ and $1x = x$ for any $\gamma_1, \gamma_2 \in F$ and $x \in F^n$.
Let Mn (F), Mm,n (F) be the set of n × n and m × n matrices over F, respectively. We
write Mn , Mm,n if F = C.
If $A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \in M_{m+n}(F)$ with $A_1 \in M_m(F)$ and $A_2 \in M_n(F)$, we write $A = A_1 \oplus A_2$.
For a complex matrix $A$, $\bar{A}$ denotes the matrix obtained from $A$ by replacing each entry by its complex conjugate. Furthermore, $A^* = (\bar{A})^t$.
If $A \in M_{m,n}$ has columns $u_1, \dots, u_n$ and $B \in M_{n,p}$ has rows $v_1^t, \dots, v_n^t$, then
\[ AB = \sum_{j=1}^n u_j v_j^t. \]
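The column-times-row expansion of a matrix product can be verified on a small example. This is a minimal sketch in plain Python; the matrices `A` and `B` and the helper names are chosen here for illustration only.

```python
def matmul(A, B):
    """Ordinary matrix product of nested lists."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def outer(u, v):
    """Rank-one matrix u v^t (no conjugation)."""
    return [[ui * vj for vj in v] for ui in u]

A = [[1, 2j], [3, 4]]                 # 2 x 2, columns u_1, u_2
B = [[1j, 0, 1], [2, 1, -1j]]         # 2 x 3, rows v_1^t, v_2^t

columns_of_A = [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]
rows_of_B = B

# Accumulate sum_j u_j v_j^t.
total = [[0] * len(B[0]) for _ in range(len(A))]
for u, v in zip(columns_of_A, rows_of_B):
    term = outer(u, v)
    total = [[total[i][j] + term[i][j] for j in range(len(B[0]))]
             for i in range(len(A))]

product = matmul(A, B)
```

The entries are small Gaussian integers, so `total` and `product` agree exactly.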
1.3 Basic concepts and operations for complex vectors & matrices
We can extend the concepts on real vectors and real matrices to complex vectors and complex
matrices.
Column space, row space, null space, and rank of a complex matrix.
Determine h in the above example so that A has rank one or rank 2. Also, determine
bases for the column space, row space, and null space of A for each choice of h.
1.4 Inner product, orthonormal sets, Gram-Schmidt process
Recall that the inner product of u, v ∈ Cn is ⟨u, v⟩ = v ∗ u and satisfies the following:
(1) For any u, u1 , u2 , v ∈ Cn and a, b ∈ C, ⟨au1 + bu2 , v⟩ = a⟨u1 , v⟩ + b⟨u2 , v⟩
(2) For any $u, v \in \mathbb{C}^n$, $\langle u, v\rangle = \overline{\langle v, u\rangle}$.
(3) For any u ∈ Cn , ⟨u, u⟩ ≥ 0, the equality holds if and only if u = 0.
The Euclidean norm (a.k.a. ℓ2 -norm) of v ∈ Cn is defined by ∥v∥ = (v ∗ v)1/2 and satisfies
the following.
(a) For any v ∈ Cn , ∥v∥ ≥ 0. (positive definiteness)
The equality holds if and only if v = 0.
(b) For any a ∈ C and v ∈ Cn , ∥av∥ = |a|∥v∥. (absolute homogeneity)
(c) For any u, v ∈ Cn , ∥u + v∥ ≤ ∥u∥ + ∥v∥. (triangle inequality)
The equality holds if and only if one vector is a nonnegative multiple of the other.
Condition (c) follows from the following inequality.
(d) For any u, v ∈ Cn , |⟨u, v⟩| ≤ ∥u∥∥v∥. (Cauchy-Schwarz inequality)
The equality holds if and only if one vector is a multiple of the other.
A set of vectors {u1 , . . . , um } ⊆ Fn is orthonormal if ⟨ui , uj ⟩ = δij , the Kronecker delta
such that δjj = 1 and δij = 0 if i ̸= j. Equivalently, U ∗ U = Im , where U ∈ Mn,m (F) has
columns u1 , . . . , um .
Note: An orthonormal set {u1 , . . . , um } ⊆ Fn is always linearly independent so that m ≤ n.
A vector v is a linear combination of u1 , . . . , um if and only if v = a1 u1 + · · · + am um with
aj = ⟨v, uj ⟩ for j = 1, . . . , m.
Gram-Schmidt Process Let v1 , . . . , vm ∈ Fn be linearly independent, so that m ≤ n.
Set u1 = v1 /∥v1 ∥.
For k > 1, let fk = vk − (a1 u1 + · · · + ak−1 uk−1 ) with aj = u∗j vk and uk = fk /∥fk ∥.
Then {u1 , . . . , uk } is an orthonormal basis for span {v1 , . . . , vk } for k = 1, . . . , m.
If m < n, one may further extend {u1 , . . . , um } to an orthonormal basis {u1 , . . . , un }.
To see this, one can apply the Gram-Schmidt process to the basic columns of the rank n
matrix [u1 · · · um e1 · · · en ].
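The Gram-Schmidt steps above can be sketched in plain Python for complex vectors. This is only an illustrative sketch: the function names and the two sample vectors are assumptions made here, and the inner product follows the convention $\langle u, v\rangle = v^*u$ used in these notes.

```python
import math

def inner(u, v):
    """<u, v> = v* u (conjugate-linear in the second argument)."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def norm(v):
    return math.sqrt(inner(v, v).real)

def gram_schmidt(vectors):
    """Return an orthonormal list u_1, ..., u_m spanning the same subspaces."""
    basis = []
    for v in vectors:
        v = [complex(c) for c in v]
        f = list(v)
        for u in basis:
            a = inner(v, u)                      # a_j = u_j* v_k
            f = [fi - a * ui for fi, ui in zip(f, u)]
        length = norm(f)                         # nonzero by linear independence
        basis.append([fi / length for fi in f])
    return basis

U = gram_schmidt([[1, 1j, 0], [1, 0, 1]])
```

The output vectors satisfy $\langle u_i, u_j\rangle = \delta_{ij}$ up to rounding. For badly conditioned inputs a modified Gram-Schmidt variant is usually preferred numerically; that is a general remark, not something the notes discuss.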
A set {u1 , . . . , un } is an orthonormal basis for Fn if and only if the matrix U with columns
u1 , . . . , un satisfies U ∗ U = In . When F = C, the matrix U is called a unitary matrix; when
F = R, the matrix U is called an orthogonal matrix.
We will denote by Un (F) the set of matrices U ∈ Mn (F) such that U ∗ U = In .
Exercises
1. Let $A = \begin{pmatrix} 1 & 2i & 3 & 4 \\ 2i & 6 & 1+i & 1-i \\ 1+2i & 6+2i & 4+i & 5-i \end{pmatrix}$.
(a) Reduce the matrix to row echelon form, and find the rank of A.
(b) Find bases for the row space, column space, and null space of A.
(c) Solve the equations $Ax = (2, 2-i, 3-i)^t$ and $Ax = (1, 0, 0)^t$.
2. Let $A = \begin{pmatrix} i & 2 \\ -2 & i \end{pmatrix}$.
(a) Find the eigenvalues $\lambda_1, \lambda_2$ of $A$, and the corresponding unit eigenvectors $u_1, u_2$.
(b) Let $U = [u_1\ u_2]$. Show that $U^*U = I_2$ and $AU = UD$ with $D = \operatorname{diag}(\lambda_1, \lambda_2)$.
(c) Show that $A^k = UD^kU^* = \lambda_1^k u_1u_1^* + \lambda_2^k u_2u_2^*$ for all (positive or negative) integers $k$.
3. Suppose $A = SDS^{-1} \in M_n$ such that $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, where $S$ has columns $x_1, \dots, x_n$ and $S^{-1}$ has rows $y_1^t, \dots, y_n^t$.
(a) Show that $y_i^t x_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$ [Hint: Consider $S^{-1}S$.]
(b) Show that $A^k = SD^kS^{-1} = \sum_{j=1}^n \lambda_j^k x_j y_j^t$ for every positive integer $k$.
(c) If $A$ is invertible, show that $A^k = SD^kS^{-1} = \sum_{j=1}^n \lambda_j^k x_j y_j^t$ for every negative integer $k$.
(d) For any polynomial $f(z) = a_m z^m + \cdots + a_0$, let $f(A) = a_m A^m + \cdots + a_1 A + a_0 I_n$. Show that $f(A) = \sum_{j=1}^n f(\lambda_j) x_j y_j^t$.
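4. Suppose $A = \begin{pmatrix} 1 & i \\ 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} i & 0 & 0 \\ 2 & 2i & 0 \\ 1 & 1 & 3i \end{pmatrix}$.
(a) Show that for any $C \in M_{2,3}$, there is $X \in M_{2,3}$ such that $AX + C = XB$.
[Hint: Let $X = [x_{ij}]$ and set up a linear system of 6 equations to solve for $[x_{ij}]$ for a given $C$.]
(b) Suppose $T = \begin{pmatrix} A & C \\ 0 & B \end{pmatrix}$ for some matrix $C \in M_{2,3}$. Show that there is $X \in M_{2,3}$ such that $TS = S(A \oplus B)$ if $S = \begin{pmatrix} I_2 & X \\ 0 & I_3 \end{pmatrix}$. Find $S^{-1}$ and conclude that $S^{-1}TS = A \oplus B$.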
(c) Show that conclusion (a) may fail if A and B share a common eigenvalue.
5. Let u, v1 , v2 ∈ Cn , a, b ∈ C. Show that ⟨u, av1 + bv2 ⟩ = ā⟨u, v1 ⟩ + b̄⟨u, v2 ⟩.
7. Let u, v ∈ Cn . Prove the Cauchy-Schwarz inequality |⟨u, v⟩| ≤ ∥u∥∥v∥, and the triangle
inequality ∥u + v∥ ≤ ∥u∥ + ∥v∥, and determine the conditions for equality.
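Hint: Let $u, v \in \mathbb{C}^n$ be nonzero. Consider $e^{i\theta}$ such that $\langle u, e^{i\theta}v\rangle = |\langle u, v\rangle|$, so that $\langle e^{i\theta}v, u\rangle = \overline{\langle u, e^{i\theta}v\rangle} = |\langle u, v\rangle|$. Then for any $t \in \mathbb{R}$,
\[ 0 \le \|u + te^{i\theta}v\|^2 = at^2 + 2bt + c \]
with $a = \|v\|^2$, $c = \|u\|^2$, $b = |\langle u, v\rangle|$. Then argue that $b^2 \le ac$ to prove the inequality, and argue that the equality holds if and only if $u + te^{i\theta}v = 0$ for some $t \in \mathbb{R}$.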
9. Let u = (1, 2i, 1 − i)t . Find a unitary U with u/∥u∥ as the first column.
10. Suppose A ∈ Mn,m with m ≤ n with rank m. Show that A = P U such that P ∈ Mn,m
has orthonormal columns, and U is upper triangular.
11. Let $A = \begin{pmatrix} 1 & 1-i & 2+i \\ 1 & 1+i & -2+i \\ i & i & 2 \end{pmatrix}$. Write $A = UR$ for an upper triangular matrix $R$.
[Apply Gram-Schmidt to the columns of $A$ to get a unitary matrix $U$.]
2 Unitary equivalence and unitary similarity
Two matrices A, B ∈ Mm,n are unitarily equivalent if there are unitary U ∈ Mm and V ∈ Mn
such that A = U BV . Two matrices X, Y ∈ Mn are unitarily similar if there is a unitary
W ∈ Mn such that X = W ∗ Y W . It is easy to show that these are equivalence relations,
that is, reflexive, symmetric and transitive.
In this chapter, we consider different canonical forms of matrices under unitary equivalence
and unitary similarity.
Proof. Note that the existence of the maximum $|u^*Av|$ follows from a basic result in analysis. Suppose $U^*AV = (a_{ij})$. If the first column $x = U^*Av = (a_{11}, \dots, a_{m1})^t$ has nonzero entries other than $a_{11}$, then $\tilde u = Ux/\|Ux\| = Ux/\|x\| \in \mathbb{C}^m$ is a unit vector such that
\[ \tilde u^*Av = x^*U^*Av/\|x\| = x^*x/\|x\| = \|x\| > \sqrt{|a_{11}|^2} = |a_{11}| = |u^*Av|, \]
which contradicts the choice of $u$ and $v$. Similarly, if the first row $y^* = u^*AV = (a_{11}, \dots, a_{1n})$ has nonzero entries other than $a_{11}$, then $\tilde v = Vy/\|Vy\| = Vy/\|y\|$ is a unit vector satisfying
Theorem 2.1.2 Let $A$ be an $m \times n$ matrix of rank $r$. Then there are unitary matrices $U \in M_m$, $V \in M_n$ such that
\[ U^*AV = D = \sum_{j=1}^r s_j E_{jj}. \]
Proof. We prove the result by induction on $\max\{m, n\}$. By the previous lemma, there are unitary matrices $U \in M_m$, $V \in M_n$ such that $U^*AV = \begin{pmatrix} u^*Av & 0 \\ 0 & A_1 \end{pmatrix}$. We may replace $U$ by $e^{i\theta}U$ for a suitable $\theta \in [0, 2\pi)$ and assume that $u^*Av = |u^*Av| = s_1$. By the induction assumption, there are unitary matrices $U_1 \in M_{m-1}$, $V_1 \in M_{n-1}$ such that $U_1^*A_1V_1 = \begin{pmatrix} s_2 & & \\ & s_3 & \\ & & \ddots \end{pmatrix}$. Then $([1] \oplus U_1^*)U^*AV([1] \oplus V_1)$ has the asserted form, where $r$ is the rank of $A$. □
Remark 2.1.3 The values $s_1 \ge \cdots \ge s_r > 0$ are the nonzero singular values of $A$, and $s_1^2, \dots, s_r^2$ are the nonzero eigenvalues of $AA^*$ and $A^*A$. The vectors $v_1, \dots, v_r$ are the right singular vectors of $A$, and $u_1, \dots, u_r$ are the left singular vectors of $A$. In particular, the singular values are uniquely determined by $A$. We will denote the singular values of $A$ by $s_1(A) \ge s_2(A) \ge \cdots$
Here is another way to do the singular value decomposition. Let $\{v_1, \dots, v_r\} \subseteq \mathbb{C}^n$ be an orthonormal set of eigenvectors corresponding to the nonzero eigenvalues $s_1^2, \dots, s_r^2$ of $A^*A$. Let $u_j = Av_j/s_j$. Then $\{u_1, \dots, u_r\} \subseteq \mathbb{C}^m$ is an orthonormal family such that $A = \sum_{j=1}^r s_j u_j v_j^*$.
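As a small worked illustration of this recipe (the matrix below is chosen here for convenience and does not come from the notes), take a rank one matrix:

```latex
A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
A^*A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
s_1^2 = 2, \quad v_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\\[4pt]
u_1 = \frac{A v_1}{s_1} = \frac{1}{\sqrt{2}} \begin{pmatrix} \sqrt{2} \\ 0 \end{pmatrix}
    = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
s_1 u_1 v_1^* = \sqrt{2} \begin{pmatrix} 1 \\ 0 \end{pmatrix}
    \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \end{pmatrix}
    = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = A.
```

The only nonzero eigenvalue of $A^*A$ is 2, so $r = 1$ and the sum has a single term.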
If $A \in M_{m,n}(\mathbb{R})$, then one can find real orthogonal matrices $U \in M_m$ and $V \in M_n$ with columns $u_1, \dots, u_m$ and $v_1, \dots, v_n$ such that $A = U\left(\sum_{j=1}^r s_j E_{jj}\right)V^* = \sum_{j=1}^r s_j u_j v_j^*$.
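We may extend the definition of inner product $\langle x, y\rangle$ and inner product norm $\|x\|$ for vectors $x, y \in F^n$ to matrices by
\[ \langle A, B\rangle = \sum_{i,j} a_{ij}\bar b_{ij} = \operatorname{tr}(AB^*) \quad\text{and}\quad \|A\|_F = \langle A, A\rangle^{1/2}. \]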
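The identity between the entrywise sum and the trace formula can be checked directly. The sketch below uses plain Python lists; the helper names and the two sample matrices are assumptions made for illustration.

```python
def adjoint(B):
    """Conjugate transpose B*."""
    return [[B[i][j].conjugate() for i in range(len(B))]
            for j in range(len(B[0]))]

def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def frobenius_inner(A, B):
    """<A, B> = sum_{i,j} a_ij * conj(b_ij)."""
    return sum(a * b.conjugate()
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))

A = [[1 + 1j, 2 + 0j], [0j, -1j]]
B = [[1j, 1 + 0j], [3 + 0j, 2 + 0j]]

entrywise = frobenius_inner(A, B)
via_trace = trace(matmul(A, adjoint(B)))
norm_squared = frobenius_inner(A, A).real   # ||A||_F^2
```

For these matrices both expressions give $3 - 3i$, and $\|A\|_F^2 = 7$.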
Theorem 2.1.4 Suppose $A \in M_{m,n}(F)$ has rank $r$ and singular value decomposition $A = \sum_{j=1}^r s_j u_j v_j^*$, where $s_1 \ge \cdots \ge s_r > 0$ and $\{u_1, \dots, u_r\} \subseteq F^m$, $\{v_1, \dots, v_r\} \subseteq F^n$ are orthonormal
\[ \|A - B\|_F^2 = \|P(A - B)Q\|_F^2 = \sum_{i \ne j} |a_{ij}|^2 + \sum_{j=1}^k |a_{jj} - b_j|^2 + \sum_{j > k} |a_{jj}|^2. \]
Let $C = P(A - B)Q = (c_{ij})$. If there is $1 \le i \le k$ such that $c_{ij} \ne 0$, we may change the $(i, j)$ entry of $PBQ$ to $a_{ij}$ to get a matrix $\hat B$ of rank at most $k$ so that $\|A - \hat B\|_F$ is smaller. Similarly, if there is $1 \le j \le k$ such that $c_{ij} \ne 0$, we may change the $(i, j)$ entry of $PBQ$ to $a_{ij}$ to get a matrix $\hat B$ of rank at most $k$ so that $\|A - \hat B\|_F$ is smaller. Hence, at the minimum, $P(A - B)Q = \begin{pmatrix} 0_k & 0 \\ 0 & A_{22} \end{pmatrix}$. So, $PAQ = \begin{pmatrix} \sum_{j=1}^k b_j E_{jj} & 0 \\ 0 & A_{22} \end{pmatrix}$, and $b_1, \dots, b_k$ are singular values of $A$. Thus,
\[ \|PAQ - PBQ\|_F^2 = \operatorname{tr}(AA^*) - \sum_{j=1}^k b_j^2 = \sum_{j=1}^r s_j(A)^2 - \sum_{j=1}^k b_j^2, \]
such that U ∗ AU is in upper (or lower) triangular form with diagonal entries λ1 , . . . , λn .
Proof. By induction on $n$. If $n = 1$, the result holds. Assume the result holds for matrices of sizes smaller than $n$, and let $A \in M_n$. Let $Au_1 = \lambda_1 u_1$ for a unit vector $u_1$, and let $U_1$ be a unitary matrix with first column equal to $u_1$. Then $U_1^*AU_1 = \begin{pmatrix} \lambda_1 & * \\ 0 & A_2 \end{pmatrix}$. By the induction assumption, there is a unitary $V_1 \in M_{n-1}$ such that $V_1^*A_2V_1 = T$ is in triangular form. If $U = U_1([1] \oplus V_1)$, then $U^*AU = \begin{pmatrix} \lambda_1 & * \\ 0 & V_1^*A_2V_1 \end{pmatrix} = \begin{pmatrix} \lambda_1 & * \\ 0 & T \end{pmatrix}$ is in upper triangular form. □
Note that λ1 , . . . , λn can be arranged in any order we like. Some of the λj could be the
same. If $\mu_1, \dots, \mu_r$ are distinct and $\det(\lambda I - A) = \prod_{j=1}^r (\lambda - \mu_j)^{m_j}$, we say that A has distinct
show that
0 = Z = [U ∗ (A − λ1 I)U ] · · · [U ∗ (A − λn In )U ],
where U ∗ AU = (aij ) is in upper triangular form with diagonal entries λ1 , . . . , λn . Then
Bj = U ∗ (A − λj I)U is in upper triangular form with (j, j) entry equal to zero. .
We will prove by induction on n that if B1 , . . . , Bn ∈ Mn are matrices in upper triangular
form, and the (j, j) entry of Bj equals zero for j = 1, . . . , n, then B1 · · · Bn = 0n .
For $n = 1$, the result is trivial. For $n = 2$, the product of $B_1$ and $B_2$ has the form
\[ \begin{pmatrix} 0 & * \\ 0 & * \end{pmatrix} \begin{pmatrix} * & * \\ 0 & 0 \end{pmatrix}, \]
Consider the (1, 1) entries of A1 A∗1 and A∗1 A1 , we see that all the off-diagonal entries in the
second row of A1 are zero. Repeating this process, we see that à = diag (a11 , . . . , ann ). □
(a) A is Hermitian.
(b) A is unitarily similar to a real diagonal matrix with nonnegative diagonal entries.
Proof. Suppose (a) holds. Then x∗ Ax ≥ 0 for all x ∈ Cn . Thus, there is a unitary
U ∈ Mn such that U ∗ AU = diag (λ1 , . . . , λn ) with λ1 , . . . , λn ∈ R. If there is λj < 0, we
can let x be the jth column of U so that x∗ Ax = λj < 0, which is a contradiction. So, all
λ1 , . . . , λn ≥ 0.
Suppose (b) holds. Then U ∗ AU = D such that D has nonnegative entries. We have
A = B ∗ B with B = U D1/2 U ∗ = B ∗ . Hence condition (c) holds.
Suppose (c) holds. Then for any x ∈ Cn , x∗ Ax = (Bx)∗ (Bx) ≥ 0. Thus, (a) holds. □
$A_k = \sum_{j=1}^k s_j u_j v_j^*$, one can use the power method to get $s_1$, $v_1$ and then $u_1$ from $A^*A$. Then get
Corollary 2.3.7 In fact, if A ∈ Mn,m with n ≥ m and has rank m, then A = V R where
V ∈ Mn,m has orthonormal columns and R ∈ Mm can be chosen to be upper triangular,
lower triangular, or positive definite.
Lemma 2.4.2 Let F ⊆ Mn be a commuting family. Then there is a unit vector v ∈ Cn such
that v is an eigenvector for every A ∈ F.
Proof. Let V ⊆ Cn with minimum positive dimension be such that A(V ) ⊆ V . We will
show that dim V = 1 and the result will follow. First, A(Cn ) ⊆ Cn . So, one can always try
to find V with a minimum positive dimension. We claim that every nonzero vector in V
is an eigenvector of A for every A ∈ F. Then for any non-zero v ∈ V , V0 = span {v} will
satisfy A(V0 ) ⊆ V0 with dim V0 = 1.
Suppose there is $A \in F$ such that not every nonzero vector in $V$ is an eigenvector of $A$. Let $\{u_1, \dots, u_k\}$ be an orthonormal basis of $V$, and let $U$ be unitary with $u_1, \dots, u_k$ as the first $k$ columns. Then $U^*BU = \begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix}$ with $B_{11} \in M_k$ for every $B \in F$. Then there is $v = a_1u_1 + \cdots + a_ku_k \in V$ such that $Av = \lambda v$.
Let V0 = {u ∈ V : Au = λu} ⊂ V . Then V0 is a subspace of V with smaller dimension.
Next, we show that Bu ∈ V0 for any B ∈ F and u ∈ V0 . If B ∈ F and u ∈ V0 , then Bu ∈ V as
B(V ) ⊆ V , and A(Bu) = BAu = Bλu = λBu, i.e., ũ = Bu ∈ V0 . So, V0 satisfies
B(V0 ) ⊆ V0 for every B ∈ F and dim V0 < dim V , which is impossible. The desired result follows. □
Proof. We can consider a basis for the span of $F$, and assume that $F = \{A_1, \dots, A_m\}$ is finite. Assume $A_1$ is nonscalar, and has an eigenvalue $\lambda_1$. Then $A_j(V) \subseteq V$ if $V$ is the null space of $A_1 - \lambda_1 I$. By induction, there is a common unit eigenvector $x$ for all $A_j \in F$. Then construct a unitary $U$ with $x$ as the first column so that $U^*A_jU = \begin{pmatrix} * & * \\ 0 & B_j \end{pmatrix}$, where $\{B_1, \dots, B_m\}$ is a commuting family. Apply induction to finish the proof. □
There is no easy canonical form under unitary similarity.¹ How can one determine whether two matrices are unitarily similar?
¹ Helene Shapiro, A survey of canonical forms and invariants for unitary similarity, Linear Algebra Appl. 147 (1991), 101-167.
2.5 Other canonical forms
Unitary congruence
There is no easy canonical form under unitary congruence for general matrices.
Every complex symmetric matrix $A \in M_n$ is unitarily congruent to $\sum_{j=1}^k s_j E_{jj}$, where $s_1 \ge \cdots \ge s_k > 0$ are the nonzero singular values of $A$.
Two symmetric (skew-symmetric) matrices are unitarily congruent if and only if they
have the same singular values.
Then there is a real orthogonal matrix $P$ such that $P^tAP = (C_{rs})_{0 \le r,s \le k}$ is in upper triangular block form, where $C_{00} \in M_r(\mathbb{R})$ is an upper triangular matrix with diagonal entries $c_1, \dots, c_r$, $C_{jj} \in M_2(\mathbb{R})$ has eigenvalues $a_j \pm ib_j$ for $j = 1, \dots, k$, and $C_{rs}$ is zero if $r > s$.
Furthermore, if $A$ is normal, i.e., $A^tA = AA^t$, then
\[ P^tAP = B_0 \oplus B_1 \oplus \cdots \oplus B_k \]
with $B_0 = \operatorname{diag}(c_1, \dots, c_r)$, and $B_j = \begin{pmatrix} a_j & b_j \\ -b_j & a_j \end{pmatrix} \in M_2(\mathbb{R})$ for $j = 1, \dots, k$.
(a) If $A = A^t$, then $B_1, \dots, B_k$ are vacuous.
(b) If $A = -A^t$, then $B_0 = 0_r$.
(c) If $A$ is orthogonal, then $c_1, \dots, c_r \in \{1, -1\}$ and $a_j^2 + b_j^2 = 1$ for $j = 1, \dots, k$.
Now, $C_1$ has complex eigenvalues $a_1 \pm ib_1$. Suppose $C_1(x + iy) = (a_1 + ib_1)(x + iy)$ for a pair of nonzero real vectors $x, y \in \mathbb{R}^n$. Then $C_1x = a_1x - b_1y$ and $C_1y = a_1y + b_1x$, and $C_1(x - iy) = (a_1 - ib_1)(x - iy)$, i.e., $C_1[x\ y] = [x\ y]B_1$. Now, $x + iy$ and $x - iy$ are eigenvectors of $C_1$ corresponding to the eigenvalues $a_1 \pm ib_1$. So, $\{x + iy, x - iy\}$ is linearly independent and so is $\{x, y\}$. Apply the Gram-Schmidt process to $\{x, y\}$ to get a real orthonormal family $\{q_1, q_2\}$. Then $[x\ y] = [q_1\ q_2]T_1$ for an upper triangular matrix $T_1 \in M_2(\mathbb{R})$. Let $Q_1 \in M_{2k}$ be real orthogonal with $q_1, q_2$ as the first two columns. Then
\[ Q_1^tC_1Q_1 = \begin{pmatrix} C_{11} & \star \\ 0 & C_2 \end{pmatrix} \]
so that $C_{11} = T_1B_1T_1^{-1}$ has eigenvalues $a_1 \pm ib_1$. One can apply an inductive argument to $C_2$ and get the desired form.
In case $A$ is normal, so is $Q^tAQ$. One can then deduce that $Q^tAQ$ has the form $B_0 \oplus \cdots \oplus B_k$. Assertions (a) – (c) can be verified directly. □
3 Similarity and equivalence
We consider other canonical forms in this chapter.
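Proposition 3.1.3 Suppose $A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} \in M_n$ such that $A_{11} \in M_k$, $A_{22} \in M_{n-k}$ have no common eigenvalue. Then $A$ is similar to $A_{11} \oplus A_{22}$.
Proof. By the previous lemma, there is $X$ such that $A_{11}X + A_{12} = XA_{22}$. Let $S = \begin{pmatrix} I_k & X \\ 0 & I_{n-k} \end{pmatrix}$ so that $AS = S(A_{11} \oplus A_{22})$. The result follows. □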
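Definition 3.1.4 Let $J_k(\lambda) \in M_k$ be such that all the diagonal entries equal $\lambda$ and all superdiagonal entries equal 1. Then
\[ J_k(\lambda) = \begin{pmatrix} \lambda & 1 & & \\ & \ddots & \ddots & \\ & & \lambda & 1 \\ & & & \lambda \end{pmatrix} \in M_k \]
is called a (an upper triangular) Jordan block of $\lambda$ of size $k$.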
Proof. We may assume that A = A11 ⊕· · ·⊕Akk . If we can find invertible matrices S1 , . . . , Sk
such that Si−1 Aii Si is in Jordan form, then S −1 AS is in Jordan form for S = S1 ⊕ · · · ⊕ Sk .
Focus on T = Aii − λi Ink . If S −1 T S is in Jordan form, then so is Aii .
One may see http://cklixx.people.wm.edu/teaching/math408/Jordan.pdf for a proof of
this. The note will appear on arXiv soon. □
To determine the Jordan form of a matrix A with det(xI − A) = (x − λ1 )n1 · · · (x − λk )nk ,
one only needs to study the rank of (A − λj I)m for m = 1, . . . , nj .
Let $\ker((A - \lambda I)^i)$ have dimension $\ell_i$. Then there are $\ell_1$ Jordan blocks of $\lambda$, and there are $\ell_i - \ell_{i-1}$ Jordan blocks of size at least $i$.
Example 3.1.6 Let $T = \begin{pmatrix} 0 & 0 & 1 & 2 \\ 0 & 0 & 3 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$. Then $Te_1 = 0$, $Te_2 = 0$, $Te_3 = e_1 + 3e_2$, $Te_4 = 2e_1 + 4e_2$. So, $T(V) = \operatorname{span}\{e_1, e_2\}$. Now, $Te_1 = Te_2 = 0$ so that $e_1, e_2$ form a Jordan basis for $T(V)$. Solving $u_1, u_2$ such that $T(u_1) = e_1$, $T(u_2) = e_2$, we let $u_1 = -2e_3 + 3e_4/2$ and $u_2 = e_3 - e_4/2$. Thus, $TS = S(J_2(0) \oplus J_2(0))$ with
\[ S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -2 & 0 & 1 \\ 0 & 3/2 & 0 & -1/2 \end{pmatrix}. \]
Example 3.1.7 Let $T = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix}$. Then $Te_1 = 0$, $Te_2 = e_1$, $Te_3 = 2e_1 + 3e_2$. So, $T(V) = \operatorname{span}\{e_1, e_2\}$, and $e_2$, $Te_2 = e_1$ form a Jordan basis for $T(V)$. Solving $u_1$ such that $T(u_1) = e_2$, we have $u_1 = (-2e_2 + e_3)/3$. Thus, $TS = SJ_3(0)$ with
\[ S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -2/3 \\ 0 & 0 & 1/3 \end{pmatrix}. \]
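The rule that the number of Jordan blocks of size at least $i$ equals $\dim\ker(T^i) - \dim\ker(T^{i-1})$ can be checked on the two nilpotent examples above. The following is a minimal sketch with exact rational arithmetic; the helper names (`rank`, `nullities`, `blocks_of_size_at_least`) are introduced here and are not from the notes.

```python
from fractions import Fraction

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def nullities(T):
    """Return [dim ker T^0, dim ker T^1, ..., dim ker T^n]."""
    n = len(T)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    result = [0]
    for _ in range(n):
        power = matmul(power, T)
        result.append(n - rank(power))
    return result

def blocks_of_size_at_least(T):
    """Entry i-1 counts the Jordan blocks of eigenvalue 0 with size >= i."""
    ell = nullities(T)
    return [ell[i] - ell[i - 1] for i in range(1, len(ell))]

T1 = [[0, 0, 1, 2], [0, 0, 3, 4], [0, 0, 0, 0], [0, 0, 0, 0]]
T2 = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]
counts1 = blocks_of_size_at_least(T1)
counts2 = blocks_of_size_at_least(T2)
```

For `T1` the counts are `[2, 2, 0, 0]`, matching $J_2(0) \oplus J_2(0)$; for `T2` they are `[1, 1, 1]`, matching $J_3(0)$.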
3.2 Implications of the Jordan form
Theorem 3.2.1 Two matrices are similar if and only if they have the same Jordan form.
\[ J_k(\lambda)^m = \sum_{j=0}^m \binom{m}{j} \lambda^{m-j} N_k^j, \]
where $N_k^0 = I_k$, $N_k^j = 0$ for $j \ge k$, and $N_k^j$ has ones on the $j$th superdiagonal (entries with indices $(\ell, \ell + j)$) and zeros elsewhere.
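The binomial expansion of a power of a Jordan block can be compared with direct multiplication. This is a minimal sketch with integer entries; the function names and the choice $\lambda = 2$, $k = 3$, $m = 5$ are illustrative assumptions.

```python
from math import comb

def jordan_block(lam, k):
    """k x k block with lam on the diagonal and 1 on the superdiagonal."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(k)]
            for i in range(k)]

def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def power_direct(J, m):
    """J^m by repeated multiplication."""
    k = len(J)
    P = [[int(i == j) for j in range(k)] for i in range(k)]
    for _ in range(m):
        P = matmul(P, J)
    return P

def power_by_formula(lam, k, m):
    """sum_j C(m, j) lam^(m-j) N^j; N^j has ones at positions (l, l+j)."""
    P = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(min(m, k - 1 - i) + 1):
            P[i][i + j] = comb(m, j) * lam ** (m - j)
    return P

direct = power_direct(jordan_block(2, 3), 5)
formula = power_by_formula(2, 3, 5)
```

Both computations give the upper triangular Toeplitz matrix with first row $(32, 80, 80)$.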
f (A) = am Am + · · · + a0 In for A ∈ Mn .
$m_A(z) = z^m + a_1z^{m-1} + \cdots + a_m$
Theorem 3.2.5 A polynomial g(z) satisfies g(A) = 0 if and only if it is a multiple of the
minimal polynomial of A.
Theorem 3.2.6 Suppose $A$ has distinct eigenvalues $\lambda_1, \dots, \lambda_k$ such that $r_j$ is the maximum size of a Jordan block of $\lambda_j$ for $j = 1, \dots, k$. Then $m_A(z) = (z - \lambda_1)^{r_1} \cdots (z - \lambda_k)^{r_k}$.
Proof. Following the proof of the Cayley Hamilton Theorem, we see that mA (A) = 0n .
By the last Theorem, if g(A) = 0n , then g(z) = mA (z)q(z). So, taking q(z) = 1 will yield
the monic polynomial of minimum degree satisfying mA (A) = 0. □
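Remark 3.2.7 For any polynomial $g(z)$, the Jordan form of $g(A)$ can be determined in terms of the Jordan form of $A$. In particular, for every Jordan block $J_k(\lambda)$, we can write $g(z) = (z - \lambda)^kq(z) + r(z)$ with $r(z) = a_0 + \cdots + a_{k-1}z^{k-1}$ so that $g(J_k(\lambda)) = r(J_k(\lambda))$. Note that
\[ g(J_r(\lambda)) = \begin{pmatrix}
\frac{g(\lambda)}{0!} & \frac{g'(\lambda)}{1!} & \frac{g''(\lambda)}{2!} & \cdots & \frac{g^{(r-1)}(\lambda)}{(r-1)!} \\
0 & \frac{g(\lambda)}{0!} & \frac{g'(\lambda)}{1!} & \ddots & \frac{g^{(r-2)}(\lambda)}{(r-2)!} \\
0 & 0 & \ddots & \ddots & \vdots \\
\vdots & \cdots & \ddots & \frac{g(\lambda)}{0!} & \frac{g'(\lambda)}{1!} \\
0 & \cdots & \cdots & 0 & \frac{g(\lambda)}{0!}
\end{pmatrix}. \]
One can extend this to functions $g(x)$ that are differentiable up to order $r$ in a domain containing $\lambda$ in its interior.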
Equivalence
∗-congruence
Two Hermitian matrices are ∗-congruent if and only if they have the same inertia.
Congruence or t-congruence
There is no easy canonical form under t-congruence for general matrices; see footnote
2.
Two symmetric (skew-symmetric) matrices are t-congruent if and only if they have the
same rank.
Clearly, xt Ax ∈ R for all real vectors x ∈ Rn , and the condition does not imply that A
is symmetric as in the complex Hermitian case.
The matrix A satisfies xt Ax ≥ 0 for all x ∈ Rn if and only if (A + At )/2 has only nonnegative
eigenvalues. The condition does not automatically imply that A is symmetric as in the
complex Hermitian case.
If $A \in M_n(\mathbb{R})$ has only real eigenvalues, then one can find a real invertible matrix $S$ such that $S^{-1}AS$ is in Jordan form.
If $A \in M_n(\mathbb{R})$, then there is a real invertible matrix $S$ such that $S^{-1}AS$ is a direct sum of real Jordan blocks, and $2k \times 2k$ generalized Jordan blocks of the form $(C_{ij})_{1 \le i,j \le k}$ with $C_{11} = \cdots = C_{kk} = \begin{pmatrix} \mu_1 & \mu_2 \\ -\mu_2 & \mu_1 \end{pmatrix}$, $C_{12} = \cdots = C_{k-1,k} = I_2$, and all other blocks equal to $0_2$.
4 Eigenvalues and singular values inequalities
We study inequalities relating the eigenvalues, diagonal elements, singular values of matrices
in this chapter.
For a Hermitian matrix A, let λ(A) = (λ1 (A), . . . , λn (A)) be the vector of eigenvalues of A
with entries arranged in descending order. Also, we will denote by s(A) = (s1 (A), . . . , sn (A))
the singular values of a matrix A ∈ Mm,n . For two Hermitian matrices, we write A ≥ B if
A − B is positive semidefinite.
Remark The above result will give us what we needed, and we can put the majorization
result as a related result for real vectors.
Lemma 4.1.1 (Rayleigh principle) Let A ∈ Mn be Hermitian. Then for any unit vector
x ∈ Cn ,
λ1 (A) ≥ x∗ Ax ≥ λn (A).
The equalities hold at unit eigenvectors corresponding to the largest and smallest eigenvalues
of A, respectively.
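The Rayleigh principle can be illustrated numerically on a $2 \times 2$ Hermitian matrix, whose eigenvalues have a closed form. The sketch below samples random unit vectors; the matrix entries, the seed, and the helper names are assumptions made here for illustration.

```python
import math
import random

def hermitian_eigs_2x2(a, b, d):
    """Eigenvalues of [[a, b], [conj(b), d]] with a, d real, largest first."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mean + radius, mean - radius

def quad_form(a, b, d, x):
    """x* A x for A = [[a, b], [conj(b), d]]; always real."""
    x1, x2 = x
    return a * abs(x1) ** 2 + d * abs(x2) ** 2 + 2 * (x1.conjugate() * b * x2).real

a, b, d = 2.0, 1 - 1j, 0.0
lam_max, lam_min = hermitian_eigs_2x2(a, b, d)

random.seed(0)
values = []
for _ in range(1000):
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
    length = math.sqrt(sum(abs(c) ** 2 for c in x))
    x = [c / length for c in x]          # normalize to a unit vector
    values.append(quad_form(a, b, d, x))
```

Every sampled value of $x^*Ax$ lies between $\lambda_2(A) = 1 - \sqrt{3}$ and $\lambda_1(A) = 1 + \sqrt{3}$.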
(a) x ≺ y.
Proof. Note that the conditions do not change if we replace (x, y) by (P x, Qy) for any
permutation matrices P, Q. We may make these changes in our proof.
(c) ⇒ (a). We may assume that $x = (x_1, \dots, x_n)^t$ and $y = (y_1, \dots, y_n)^t$ have entries in descending order. Suppose $x = Dy$ for a doubly stochastic matrix $D = (d_{ij})$. Let $v_k = e_1 + \cdots + e_k$ and $v_k^tD = (c_1, \dots, c_n)$. Then $0 \le c_j \le 1$ and $\sum_{j=1}^n c_j = k$. So,
\begin{align*}
\sum_{j=1}^k x_j = v_k^tDy &= c_1y_1 + \cdots + c_ny_n \\
&\le c_1y_1 + \cdots + c_ky_k + [(1 - c_1) + \cdots + (1 - c_k)]y_k \le y_1 + \cdots + y_k.
\end{align*}
\[ y_n \ge x_1 \ge \cdots \ge x_n = S - \sum_{j=1}^{n-1} x_j \ge S - \sum_{j=1}^{n-1} y_j = y_n \]
x2 + · · · + xℓ ≤ x1 + · · · + xℓ−1 ≤ y1 + · · · + yℓ−1 ;
if ℓ > k, then
(b) ⇒ (c). Suppose $x_j$ is obtained from $x_{j-1}$ by pinching the $p$th and $q$th entries. Then there is a doubly stochastic matrix $P_j$ obtained from $I$ by changing the submatrix in rows and columns $p, q$ to
\[ \begin{pmatrix} t_j & 1 - t_j \\ 1 - t_j & t_j \end{pmatrix} \]
for some $t_j \in (0, 1)$, so that $x_j = P_jx_{j-1}$. Then $x = Dy$ for $D = P_k \cdots P_1$, which is doubly stochastic. □
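A pinching step and the resulting majorization can be illustrated in a few lines. The sketch below assumes the standard definition of $x \prec y$ (equal total sums, and each partial sum of the descending rearrangement of $x$ is at most that of $y$); the function names and sample vector are chosen here for illustration.

```python
def majorized_by(x, y, tol=1e-12):
    """Return True if x is majorized by y (x ≺ y)."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    partial_x = partial_y = 0.0
    for a, b in zip(xs, ys):
        partial_x += a
        partial_y += b
        if partial_x > partial_y + tol:
            return False
    return True

def pinch(y, p, q, t):
    """Apply the doubly stochastic matrix equal to I except for the block
    [[t, 1-t], [1-t, t]] in rows and columns p, q."""
    x = list(y)
    x[p] = t * y[p] + (1 - t) * y[q]
    x[q] = (1 - t) * y[p] + t * y[q]
    return x

y = [5.0, 3.0, 1.0]
x1 = pinch(y, 0, 2, 0.75)    # pinch entries 0 and 2
x2 = pinch(x1, 0, 1, 0.5)    # then pinch entries 0 and 1
```

Here `x1` is `[4.0, 3.0, 2.0]` and `x2` is `[3.5, 3.5, 2.0]`; both are majorized by `y`, while `y` is not majorized by `x2`.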
Proof. Let $A = UDU^*$ such that $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Suppose $A = (a_{ij})$ and $U = (u_{ij})$. Then $a_{jj} = \sum_{i=1}^n \lambda_i|u_{ji}|^2$. Because $(|u_{ji}|^2)$ is doubly stochastic, $(a_{11}, \dots, a_{nn}) \prec (\lambda_1, \dots, \lambda_n)$.
We prove the converse by induction on n. Suppose (d1 , . . . , dn ) ≺ (λ1 , . . . , λn ). If n = 2,
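let $d_1 = \lambda_1\cos^2\theta + \lambda_2\sin^2\theta$ so that
\[ (a_{ij}) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \lambda_1 & \\ & \lambda_2 \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]
has diagonal entries $d_1, d_2$.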
Suppose $n > 2$. Choose the maximum $k$ such that $\lambda_k \ge d_1$. If $\lambda_n = d_1$, then for $S = \sum_{j=1}^n d_j = \sum_{j=1}^n \lambda_j$ we have
\[ \lambda_n \ge d_1 \ge \cdots \ge d_n = S - \sum_{j=1}^{n-1} d_j \ge S - \sum_{j=1}^{n-1} \lambda_j = \lambda_n. \]
Thus, $\lambda_n = d_1 = \cdots = d_n = S/n = \sum_{j=1}^n \lambda_j/n$ implies that $\lambda_1 = \cdots = \lambda_n$. Hence, $A = \lambda_nI$ is the required matrix. Suppose $k < n$. Then there is $A_1 = A_1^t \in M_2(\mathbb{R})$ with diagonal entries $d_1$, $\lambda_k + \lambda_{k+1} - d_1$ and eigenvalues $\lambda_k, \lambda_{k+1}$. Consider $A = A_1 \oplus D$
with D = diag (λ1 , . . . , λk−1 , λk+2 , . . . , λn ). As shown in the proof of Theorem 4.1.3, if
λ̃k+1 = λk + λk+1 − d1 , then
has diagonal entries d2 , . . . , dn . Thus, A = ([1] ⊕ U )(A1 ⊕ D)([1] ⊕ U ∗ ) has the desired
eigenvalues and diagonal entries. □
4.2 Max-Min and Min-Max characterization of eigenvalues
In this subsection, we give a Max-Min and Min-Max characterization of eigenvalues of a
Hermitian matrix.
Lemma 4.2.1 Let V1 and V2 be subspaces of Cn such that dim(V1 ) + dim(V2 ) > n, then
V1 ∩ V2 ̸= {0}.
Proof. Let {u1 , . . . , up } and {v1 , . . . , vq } be bases for V1 and V2 . Then p + q > n and the
linear system [u1 · · · up v1 · · · vq ]x = 0 ∈ Cn has a non-trivial solution x = (x1 , . . . , xp , y1 , . . . , yq )t .
Note that not all x1 , . . . , xp are zero, else y1 v1 + · · · + yq vq = 0 implies yj = 0 for all j. Thus,
v = x1 u1 + · · · + xp up = −(y1 v1 + · · · + yq vq ) is a nonzero vector in V1 ∩ V2 . □
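Equivalently,
\[ \lambda_k(A) = \max_{\substack{V \le \mathbb{C}^n \\ \dim V = k}} \ \min_{\substack{x \in V \\ \|x\| = 1}} x^*Ax = \min_{\substack{V \le \mathbb{C}^n \\ \dim V = n-k+1}} \ \max_{\substack{x \in V \\ \|x\| = 1}} x^*Ax. \]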
Proof. Let A = B + P , where P is positive semidefinite. Suppose k ∈ {1, . . . , n}. There
is Y ∈ Mn,k with Y ∗ Y = Ik such that
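Proof. Suppose $1 \le r_1 < \cdots < r_k \le n$. We want to show $\sum_{j=1}^k (c_{r_j} - a_{r_j}) \le \sum_{j=1}^k b_j$.
\begin{align*}
\sum_{j=1}^k (c_{r_j} - a_{r_j}) &\le \sum_{j=1}^k (\lambda_{r_j}(A + B_+) - \lambda_{r_j}(A)) && \text{because } \lambda_j(A + B) \le \lambda_j(A + B_+) \text{ for all } j \\
&\le \sum_{j=1}^n (\lambda_j(A + B_+) - \lambda_j(A)) && \text{because } \lambda_j(A) \le \lambda_j(A + B_+) \text{ for all } j \\
&= \operatorname{tr}(A + B_+) - \operatorname{tr}(A) = \sum_{j=1}^k \lambda_j(B_+) = \sum_{j=1}^k b_j.
\end{align*}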
4.4 Eigenvalues of principal submatrices
Theorem 4.4.1 There is a positive matrix $C = \begin{pmatrix} A & * \\ * & B \end{pmatrix}$ with $A \in M_k$ so that $A, B, C$ have eigenvalues $a_1 \ge \cdots \ge a_k$, $b_1 \ge \cdots \ge b_{n-k}$ and $c_1 \ge \cdots \ge c_n$, respectively, if and only if there are positive semi-definite matrices $\tilde A, \tilde B, \tilde C = \tilde A + \tilde B$ with eigenvalues $a_1 \ge \cdots \ge a_k \ge 0 = a_{k+1} = \cdots = a_n$, $b_1 \ge \cdots \ge b_{n-k} \ge 0 = b_{n-k+1} = \cdots = b_n$, and $c_1 \ge \cdots \ge c_n$.
Consequently, for any $1 \le r_1 < \cdots < r_k \le n$, we have $\sum_{j=1}^k (c_{r_j} - a_{r_j}) \le \sum_{j=1}^k b_j$.
Theorem 4.4.2 There is a Hermitian (real symmetric) matrix C ∈ Mn with principal sub-
matrix A ∈ Mm such that C and A have eigenvalues c1 ≥ · · · ≥ cn and a1 ≥ · · · ≥ am ,
respectively, if and only if
Proof. To prove the necessity, we may replace C by C − λn (C)I and assume that C is
positive semidefinite. Then by the previous theorem,
cj − aj ≥ bn−m ≥ 0, j = 1, . . . , m.
We will show how to choose a real orthogonal matrix $Q$ such that $C = Q^t\operatorname{diag}(c_1, \dots, c_n)Q$ has the leading principal submatrix $A \in M_{n-1}$ with eigenvalues $a_1 \ge \cdots \ge a_{n-1}$. To this end, let $Q$ have last column $u = (u_1, \dots, u_n)^t$. By the adjoint formula for the inverse,
\[ [(zI - C)^{-1}]_{nn} = \frac{\det(zI_{n-1} - A)}{\det(zI - C)} = \frac{\prod_{j=1}^{n-1}(z - a_j)}{\prod_{j=1}^n (z - c_j)}, \]
while $(zI - C)^{-1} = Q^t\operatorname{diag}((z - c_1)^{-1}, \dots, (z - c_n)^{-1})Q$ gives $[(zI - C)^{-1}]_{nn} = \sum_{i=1}^n u_i^2/(z - c_i)$.
Equating these two, we see that $A$ has characteristic polynomial $\prod_{i=1}^{n-1}(z - a_i)$ if and only if
\[ \sum_{i=1}^n u_i^2 \prod_{j \ne i} (z - c_j) = \prod_{i=1}^{n-1} (z - a_i). \]
Both sides of this expression are polynomials of degree $n - 1$ so they are identical if and only if they agree at the $n$ distinct points $c_1, \dots, c_n$, or equivalently,
\[ u_k^2 = \frac{\prod_{j=1}^{n-1}(c_k - a_j)}{\prod_{j \ne k}(c_k - c_j)} \equiv w_k, \quad k = 1, \dots, n. \]
Since $(c_k - a_j)/(c_k - c_j) > 0$ for all $k \ne j$, we see that $w_k > 0$. Thus if we take $u_k = \sqrt{w_k}$ then $A$ has eigenvalues $a_1, \dots, a_{n-1}$.
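A small worked case may help to see the construction at work. The numbers below ($n = 2$, $c_1 = 3$, $c_2 = 1$, $a_1 = 2$) are chosen here for illustration, and $Q$ is one possible real orthogonal matrix with last column $u$:

```latex
w_1 = \frac{c_1 - a_1}{c_1 - c_2} = \frac{1}{2}, \qquad
w_2 = \frac{c_2 - a_1}{c_2 - c_1} = \frac{1}{2}, \qquad
u = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
Q = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},
\\[4pt]
C = Q^t \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} Q
  = \tfrac{1}{2} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
    \begin{pmatrix} 3 & 3 \\ -1 & 1 \end{pmatrix}
  = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.
```

The matrix $C$ has eigenvalues 3 and 1, and its leading $1 \times 1$ principal submatrix is $[2] = [a_1]$, as required.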
Now, suppose m < n − 1. Let
\[ \tilde c_j = \begin{cases} \max\{c_{j+1}, a_j\} & 1 \le j \le m, \\ \min\{c_j, a_{m-n+j+1}\} & m < j < n. \end{cases} \]
Then
c1 ≥ c̃1 ≥ c2 ≥ · · · ≥ cn−1 ≥ c̃n−1 ≥ cn ,
and
c̃j ≥ aj ≥ c̃n−m−1+j , j = 1, . . . , m.
By the induction assumption, we can construct a Hermitian C̃ ∈ Mn−1 with eigenvalues
c̃1 ≥ · · · ≥ c̃n−1 , whose m × m leading principal submatrix has eigenvalues a1 ≥ · · · ≥ am ,
and C̃ is the leading principal submatrix of the real symmetric matrix C ∈ Mn such that C
has eigenvalues c1 ≥ · · · ≥ cn . □
4.5 Eigenvalues and Singular values
Theorem 4.5.1 Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \in M_n$ with $A_{11} \in M_k$. Then $|\det(A_{11})| \le \prod_{j=1}^k s_j(A)$.
The equality holds if and only if $A = A_{11} \oplus A_{22}$ such that $A_{11}$ has singular values $s_1(A), \dots, s_k(A)$.
Then the leading k × k submatrix of X̂A is obtained from that of A by changing its first row
from (0, ξ2 , 0, . . . , 0) to (0, ξˆ2 , ∗, · · · , ∗), and has determinant ξ1 ξˆ2 · · · ξk > ξ1 · · · ξk = det(A11 ),
which is a contradiction. So, the second column of A21 is zero. Repeating this argument, we
see that A21 = 0.
Now, the leading k × k submatrix of At ∈ S(s1 , . . . , sn ) also attains the maximum.
Applying the above argument, we see that At12 = 0. So, A = A11 ⊕ A22 .
Let Û , V̂ ∈ Mn−k be unitary such that Û ∗ A22 V̂ = diag (ξk+1 , . . . , ξn ). We may replace
A by (Ik ⊕ Û ∗ )A(Ik ⊕ V̂ ) so that A = diag (ξ1 , . . . , ξn ). Clearly, ξk ≥ ξk+1 . Otherwise, we
may interchange kth and (k + 1)st rows and also the columns so that the leading k × k
submatrix of the resulting matrix becomes diag (ξ1 , . . . , ξk−1 , ξk+1 ) with determinant larger
than det(A11 ). So, ξ1 , . . . , ξk are the k largest singular values of A. □
Theorem 4.5.2 Let $a_1, \dots, a_n$ be complex numbers such that $|a_1| \ge \cdots \ge |a_n|$, and let $s_1 \ge \cdots \ge s_n \ge 0$. Then there is $A \in M_n$ with eigenvalues $a_1, \dots, a_n$ and singular values $s_1, \dots, s_n$ if and only if
\[ \prod_{j=1}^n |a_j| = \prod_{j=1}^n s_j, \quad\text{and}\quad \prod_{j=1}^k |a_j| \le \prod_{j=1}^k s_j \ \text{ for } k = 1, \dots, n - 1. \]
s1 · · · sn .
To prove the converse, suppose the asserted inequalities and equality on a1 , . . . , an and
s1 , . . . , sn hold. We show by induction that there is an upper triangular matrix A = (aij )
with singular values s1 ≥ · · · ≥ sn and diagonal values |a1 |, . . . , |an |. Then there will be a
diagonal unitary matrix D such that DA has the desired eigenvalues and singular values.
For notation simplicity, we assume aj = |aj | in the following.
Suppose n = 2. Then a1 ≤ s1 , and a1 a2 = s1 s2 so that s1 ≥ a1 ≥ a2 ≥ s2 . Consider
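\[ A(\theta, \phi) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} s_1 & \\ & s_2 \end{pmatrix} \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}. \]
There is $\phi \in [0, \pi/2]$ such that the vector $(s_1\cos\phi, s_2\sin\phi)^t$ has norm $a_1 \in [s_2, s_1]$. Then we can find $\theta \in [0, \pi/2]$ such that $(\cos\theta, \sin\theta)(s_1\cos\phi, s_2\sin\phi)^t = a_1$. Thus, the first column of $A(\theta, \phi)$ equals $(a_1, 0)^t$, and $A(\theta, \phi)$ has the desired eigenvalues and singular values.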
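The $2 \times 2$ construction can be carried out numerically. The following is a minimal sketch assuming real inputs with $s_1 \ge a_1 \ge s_2$ and $a_1 > 0$; the function name `two_by_two` and the sample values $s_1 = 4$, $s_2 = 1$, $a_1 = 2$ are illustrative choices.

```python
import math

def two_by_two(s1, s2, a1):
    """Build A(theta, phi) with singular values s1 >= s2 and first column (a1, 0)^t.

    Assumes s1 >= a1 >= s2 >= 0 and a1 > 0.
    """
    if s1 == s2:
        phi = 0.0
    else:
        # Choose phi so that (s1 cos(phi), s2 sin(phi)) has norm a1.
        phi = math.acos(math.sqrt((a1 ** 2 - s2 ** 2) / (s1 ** 2 - s2 ** 2)))
    w = (s1 * math.cos(phi), s2 * math.sin(phi))
    # Choose theta so that (cos(theta), sin(theta)) points along w.
    theta = math.atan2(w[1], w[0])
    c, s = math.cos(theta), math.sin(theta)
    left = [[c, s], [-s, c]]
    # diag(s1, s2) times the right rotation factor.
    middle = [[s1 * math.cos(phi), -s1 * math.sin(phi)],
              [s2 * math.sin(phi), s2 * math.cos(phi)]]
    return [[sum(left[i][l] * middle[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]

s1, s2, a1 = 4.0, 1.0, 2.0
A = two_by_two(s1, s2, a1)
a2 = s1 * s2 / a1                       # forced by the product equality
frobenius_squared = sum(x * x for row in A for x in row)
determinant = A[0][0] * A[1][1] - A[0][1] * A[1][0]
```

The result is upper triangular with diagonal $(a_1, a_2) = (2, 2)$. Its squared Frobenius norm is $s_1^2 + s_2^2 = 17$ and its determinant is $s_1s_2 = 4$, which pins down the singular values $4$ and $1$.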
Suppose the result holds for matrices of size at most n − 1 ≥ 2. Consider (a1 , . . . , an )
and (s1 , . . . , sn ) satisfying the product equality and inequalities.
If a1 = 0, then sn = 0 and A = s1 E12 + · · · + sn−1 En−1,n has the desired eigenvalues and
singular values.
Suppose $a_1 > 0$. Let $k$ be the maximum integer such that $s_k \ge a_1$. Then there is $A_1 = \begin{pmatrix} a_1 & * \\ 0 & \tilde s_{k+1} \end{pmatrix}$ with $\tilde s_{k+1} = s_ks_{k+1}/a_1 \in [s_{k+1}, s_{k-1}]$. Let
We claim that (a2 , . . . , an ) and (s̃1 , . . . , s̃n−1 ) satisfy the product equality and inequalities.
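First, $\prod_{j=2}^n a_j = \prod_{j=1}^n s_j/a_1 = \prod_{j=1}^{n-1} \tilde s_j$. For $\ell < k$,
\[ \prod_{j=2}^{\ell} a_j \le \prod_{j=1}^{\ell-1} a_j \le \prod_{j=1}^{\ell-1} s_j = \prod_{j=1}^{\ell-1} \tilde s_j. \]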
For $\ell \ge k + 1$,
\[ \prod_{j=2}^{\ell} a_j \le \prod_{j=1}^{\ell} s_j/a_1 = \prod_{j=1}^{\ell-1} \tilde s_j. \]
Proof. (a) Let $S(s_1, \dots, s_n)$ be the set of matrices in $M_n$ with singular values $s_1 \ge \cdots \ge s_n$. Suppose $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \in S(s_1, \dots, s_n)$ with $A_{11} \in M_k$ such that $|a_{11}| + \cdots + |a_{kk}|$ attains the maximum value. We may replace $A$ by $DA$ for a suitable diagonal unitary $D \in M_n$ and assume that $a_{jj} = |a_{jj}|$ for all $j = 1, \dots, n$. If $a_{ij} \ne 0$ for any $j > k \ge i$, then there is a unitary $X \in M_2$ such that $X\begin{pmatrix} a_{ii} & a_{ij} \\ a_{ji} & a_{jj} \end{pmatrix}$ has $(1, 1)$ entry equal to
eigenvalues ξ1 , . . . , ξk and a unitary matrix V ∈ Mk . Suppose V = U D̂U ∗ for some diagonal
unitary D̂ ∈ Mk and unitary U ∈ Mk . Then
tr A11 = tr (P U D̂U ∗ ) = tr U ∗ P U D̂ ≤ tr U ∗ P U = tr P,
where the equality holds if and only if $\hat D = I_k$, i.e., $A_{11} = P$ is positive semidefinite. In particular, we can choose $B = \operatorname{diag}(s_1, \dots, s_n)$ so that the sum of the $k$ diagonal entries is $\sum_{j=1}^k s_j \ge \sum_{j=1}^k \xi_j = \operatorname{tr} A_{11}$. Thus, the eigenvalues of $A_{11}$ must be $s_1, \dots, s_k$ as asserted.
which is a contradiction.
Next, for $j = 1, \dots, n - 1$, let $B_j = \begin{pmatrix} a_{jj} & a_{jn} \\ a_{nj} & a_{nn} \end{pmatrix}$. We show that $|a_{jj}| - |a_{nn}| = s_1(B_j) - s_2(B_j)$ and $B_j$ is Hermitian in the following. Note that $s_1(B_j)^2 + s_2(B_j)^2 = |a_{jj}|^2 + |a_{jn}|^2 + |a_{nj}|^2 + |a_{nn}|^2$ and $s_1(B_j)s_2(B_j) = |a_{jj}a_{nn} - a_{jn}a_{nj}|$ so that $-a_{jj}a_{nn} = |a_{jj}a_{nn}| \ge s_1(B_j)s_2(B_j) - |a_{jn}a_{nj}|$. Hence,
\begin{align*}
(|a_{jj}| - |a_{nn}|)^2 &= (a_{jj} + a_{nn})^2 = a_{jj}^2 + a_{nn}^2 + 2a_{jj}a_{nn} \\
&\le s_1(B_j)^2 + s_2(B_j)^2 - (|a_{jn}|^2 + |a_{nj}|^2) - 2(s_1(B_j)s_2(B_j) - |a_{jn}a_{nj}|) \\
&= (s_1(B_j) - s_2(B_j))^2 - (|a_{jn}| - |a_{nj}|)^2 \\
&\le (s_1(B_j) - s_2(B_j))^2.
\end{align*}
Here the two inequalities become equalities if and only if $|a_{jn}| = |a_{nj}|$ and $|a_{jn}a_{nj}| = a_{jn}a_{nj}$, i.e., $a_{jn} = \bar a_{nj}$ and $B_j$ is Hermitian.
By the above analysis, |ajj | − |ann | ≤ s1 (Bj ) − s2 (Bj ). If the inequality is strict, there are
unitary X, Y ∈ M2 such that X ∗ Bj Y = diag (s1 (Bj ), s2 (Bj )). Let X̂ be obtained from In by
replacing the $2 \times 2$ submatrix in rows and columns $j, n$ by $X$. Similarly, we can construct $\hat Y$. Then $\hat X, \hat Y \in M_n$ are unitary and $\hat X^* A \hat Y$ has diagonal entries $\hat d_1,\dots,\hat d_n$ obtained from those of $A$ by changing $(a_{jj}, a_{nn})$ to $(s_1(B_j), s_2(B_j))$. As a result,
\[ \sum_{j=1}^{n-1} \hat d_j - |\hat d_n| > \sum_{j=1}^{n-1} a_{jj} - |a_{nn}|, \]
which is a contradiction. So, Bj is Hermitian for j = 1, . . . , n − 1. Hence, A is Hermitian,
and
tr A = a11 + · · · + ann = a11 + · · · + an−1,n−1 − ann .
Suppose $A$ has eigenvalues $\lambda_1,\dots,\lambda_n$ with $|\lambda_j| = s_j(A)$ for $j = 1,\dots,n$. Because $0 \ge a_{nn} \ge \lambda_n$, we see that $\mathrm{tr}\, A = \sum_{j=1}^{n} \lambda_j \le \sum_{j=1}^{n-1} s_j - s_n$. Clearly, the equality holds. Else, we have $B = \mathrm{diag}(s_1,\dots,s_n) \in S(s_1,\dots,s_n)$ attaining $\sum_{j=1}^{n-1} s_j - s_n$. The result follows. □
Recall that for two real vectors x = (x1 , . . . , xn ), y = (y1 , . . . , yn ), we say that x ≺w y if
the sum of the k largest entries of x is not larger than that of y for k = 1, . . . , n.
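The definition of weak majorization can be checked mechanically by sorting both vectors and comparing partial sums. The following is a minimal sketch; the function name `weakly_majorized` and the tolerance `tol` are illustrative choices, not notation from these notes.

```python
def weakly_majorized(x, y, tol=1e-12):
    """Return True if x ≺w y: for each k, the sum of the k largest
    entries of x is at most the sum of the k largest entries of y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = sorted(x, reverse=True)  # entries of x in decreasing order
    ys = sorted(y, reverse=True)  # entries of y in decreasing order
    sum_x = sum_y = 0.0
    for a, b in zip(xs, ys):
        sum_x += a
        sum_y += b
        if sum_x > sum_y + tol:  # a partial sum of x exceeds that of y
            return False
    return True


# Moduli of prospective diagonal entries versus prospective singular values.
print(weakly_majorized([2, 2, 1], [3, 2, 1]))
print(weakly_majorized([3, 0, 0], [2, 2, 2]))
```

The second call fails already at k = 1, because the largest entry 3 exceeds 2.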
Theorem 4.6.2 Let d1 , . . . , dn be complex numbers such that |d1 | ≥ · · · ≥ |dn |. Then there
is A ∈ Mn with diagonal entries d1 , . . . , dn and singular values s1 ≥ · · · ≥ sn if and only if
\[ (|d_1|,\dots,|d_n|) \prec_w (s_1,\dots,s_n) \quad\text{and}\quad \sum_{j=1}^{n-1} |d_j| - |d_n| \le \sum_{j=1}^{n-1} s_j - s_n. \]
Proof. The necessity follows from the previous theorem. We prove the converse by
induction on n ≥ 2. We will focus on the construction of A with singular values s1 , . . . , sn ,
and diagonal entries d1 , . . . , dn−1 , dn with d1 , . . . , dn ≥ 0.
Suppose $n = 2$. We have $d_1 + d_2 \le s_1 + s_2$, $d_1 - d_2 \le s_1 - s_2$. Let $A = \begin{pmatrix} d_1 & a \\ -b & d_2 \end{pmatrix}$ be such that $a, b \ge 0$ satisfy $ab = s_1s_2 - d_1d_2$ and $a^2 + b^2 = s_1^2 + s_2^2 - d_1^2 - d_2^2$. Such $a, b$ exist because
Suppose the result holds for matrices of sizes up to n − 1 ≥ 2. Consider (d1 , . . . , dn ) and
(s1 , . . . , sn ) that satisfy the inequalities. Let k be the largest integer k such that sk ≥ d1 .
If $k \le n-2$, there is $B = \begin{pmatrix} d_1 & * \\ * & \hat s \end{pmatrix}$ with singular values $s_k, s_{k+1}$, where $\hat s = s_k + s_{k+1} - d_1$.
One can check that (d2 , . . . , dn ) and (s1 , . . . , sk−1 , ŝ, sk+2 , . . . , sn ) satisfy the inequalities for
the n − 1 case so that there are unitary U, V ∈ Mn−1 such that U DV ∗ has diagonal entries
d2 , . . . , dn , where D = diag (ŝ, s1 , . . . , sk−1 , sk+2 , . . . , sn ). Thus,
It follows that
\[ (d_n, \hat s) \prec_w (s_{n-1}, s_n), \qquad |d_n - \hat s| \le s_{n-1} - s_n, \]
\[ (d_1,\dots,d_{n-1}) \prec_w (s_1,\dots,s_{n-2},\hat s) \quad\text{and}\quad \sum_{j=1}^{n-2} d_j - d_{n-1} \le \sum_{j=1}^{n-2} s_j - \hat s. \]
So, there is C ∈ M2 with singular values sn−1 , sn and diagonal elements ŝ, dn . Moreover,
there are unitary matrices X, Y ∈ Mn−1 such that X diag (s1 , . . . , sn−2 , ŝ)Y ∗ has diagonal
entries d1 , . . . , dn−1 . Thus,
\[ \sum_{j=1}^{k} (a_{u_j} + b_{v_j}) \ge \sum_{j=1}^{k} c_{w_j} \]
2. Suppose $k < n$ and all the subsequences of length up to $k-1$ are specified. Consider subsequences $(u_1,\dots,u_k)$, $(v_1,\dots,v_k)$, $(w_1,\dots,w_k)$ satisfying $\sum_{j=1}^{k} (u_j + v_j) = \sum_{j=1}^{k} w_j + k(k+1)/2$, and for any specified subsequences $(\alpha_1,\dots,\alpha_\ell)$, $(\beta_1,\dots,\beta_\ell)$, $(\gamma_1,\dots,\gamma_\ell)$ of $(1,\dots,n)$ of length $\ell$ with $\ell < k$,
\[ \sum_{j=1}^{\ell} (u_{\alpha_j} + v_{\beta_j}) \ge \sum_{j=1}^{\ell} w_{\gamma_j}. \]
Consequently, the subsequences (u1 , . . . , uk ), (v1 , . . . , vk ), (w1 , . . . , wk ) of (1, . . . , n) form a
Horn sequence triple of length k if and only if there are Hermitian matrices U, V, W = U +V
with eigenvalues
u1 − 1 ≤ u2 − 2 ≤ · · · ≤ uk − k, v1 − 1 ≤ v2 − 2 ≤ · · · ≤ vk − k, w1 − 1 ≤ w2 − 2 ≤ · · · ≤ wk − k,
Byj = aj yj , Czj = zj , j = 1, . . . , n.
Suppose Z ∈ Mn,n−1 has orthonormal columns such that the column space of Z contains
z1 , . . . , zq , yq+2 , . . . , yn . Let à = Z ∗ AZ, B̃ = Z ∗ BZ, C̃ = Z ∗ CZ have eigenvalues â1 ≥ · · · ≥
ân−1 , b̂1 ≥ · · · ≥ b̂n−1 , and ĉ1 ≥ · · · ≥ ĉn−1 , respectively. By induction assumption,
\[ \sum_{j=1}^{q} \hat c_{u_j+v_j-j} + \sum_{j=q+1}^{k} \hat c_{u_j+(v_j-1)-j} \le \sum_{j=1}^{k} \hat a_{u_j} + \sum_{j=1}^{k} \hat b_j + \sum_{j=q+1}^{k} b_{u_j+(v_j-1)-j}. \]
Because $\hat b_j \le b_j$ for $j = 1,\dots,q$, and $\hat b_{u_j+v_j-j-1} = b_{u_j+v_j-j}$ for $j = q+1,\dots,k$ as the column space contains $y_{q+1},\dots,y_n$, we have
\[ \sum_{j=1}^{q} \hat b_j + \sum_{j=q+1}^{k} b_{u_j+(v_j-1)-j} \le \sum_{j=1}^{k} b_{u_j+v_j-j}. \]
1. Suppose n = 3. List all the Horn sequences (u1 , u2 ), (v1 , v2 ), (w1 , w2 ) of length 2, and
list all the Thompson standard sequences (u1 , u2 ), (v1 , v2 ) and (w1 , w2 ) = (u1 + v1 −
1, u2 + v2 − 2).
(a21 +b2n , . . . , a2n +b21 ) ≺ (s21 , . . . , s2n ) and (s21 +s2n , . . . , s2n +s21 )/2 ≺ (a21 +b21 , . . . , a2n +b2n ).
Hint: 2(A2 + B 2 ) = CC ∗ + C ∗ C.
5. Suppose c1 ≥ a1 ≥ c2 ≥ a2 ≥ · · · ≥ an−1 ≥ cn ≥ an are 2n real numbers. Show
that there is a nonnegative real vector v ∈ Rn such that D + vv t has eigenvalues
c1 ≥ · · · ≥ cn for D = diag (a1 , . . . , an ).
Hint: Replace $c_j$ and $a_j$ by $c_j + \gamma$ and $a_j + \gamma$ for $j = 1,\dots,n$, for a sufficiently large $\gamma > 0$, and assume that $c_n \ge a_n > 0$. By interlacing inequalities, there is $\tilde C = \begin{pmatrix} D & y \\ y^t & a \end{pmatrix}$. Show that $C = D + vv^t$ has eigenvalues $c_1 \ge \cdots \ge c_n$.
à ∗
6. Suppose A = . Show that
0 ∗
Hint: By induction on n. Check the case for n = 2. Assume that the result holds for
matrices of size n − 1. If k = n, the equality holds. Suppose k < n. Let p be the
largest integer such that uj = j for all j = 1, . . . , p, and q be the largest integer such
that vj = j for all j = 1, . . . , q. We may assume that q ≤ p. Let C = AB, {u1 , . . . , un }
and {v1 , . . . , vn } be orthonormal sets such that
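Suppose $U, V$ are unitary such that the first $n-1$ columns span a subspace containing $v_1,\dots,v_q, u_{q+2},\dots,u_n$, and $V^*BU = \begin{pmatrix} \tilde B & * \\ 0 & * \end{pmatrix}$ with $\tilde B \in M_{n-1}$. Let $W$ be unitary such that $W^*AV = \begin{pmatrix} \tilde A & * \\ 0 & * \end{pmatrix}$. Then $W^*ABU = \begin{pmatrix} \tilde A\tilde B & * \\ 0 & * \end{pmatrix}$. Apply the induction assumption to $\tilde A\tilde B$ to finish the proof.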
5 Norms
In many applications of matrix theory, such as approximation theory, numerical analysis, and
quantum mechanics, one has to determine the “size” of a matrix, how near one matrix is
to another, or how close a matrix is to a special class of matrices. We need the concept of the
norm (size) of a matrix. There are different ways to define the norm of a matrix, and the
different definitions are useful in different applications.
Note that $\ell_2(v) = (\sum_{j=1}^n |v_j|^2)^{1/2}$ is the inner product norm.
For every p ∈ [1, ∞], it is easy to verify (a) and (b). For p = 1, ∞, it is easy to verify
the triangle inequality. For p > 1, the verification of ℓp (u + v) ≤ ℓp (u) + ℓp (v) is not so
easy. We may change all the entries of u and v to their absolute values, and focus on vectors
with nonnegative entries. To prove that the ℓp norm satisfies the triangle inequality for p > 1,
we establish the following.
Lemma 5.1.3 (Hölder’s inequality) Let $p, q > 1$ be such that $1/p + 1/q = 1$. For $u = (u_1,\dots,u_n)^t$ and $v = (v_1,\dots,v_n)^t$ with positive entries,
\[ \sum_{j=1}^{n} u_j v_j \le \ell_p(u)\ell_q(v). \]
The equality holds if and only if $(u_1^p,\dots,u_n^p)^t$ and $(v_1^q,\dots,v_n^q)^t$ are linearly dependent.
Proof. Replace $(u, v)$ by $(u/\ell_p(u), v/\ell_q(v))$. We need to show that $u^t v \le 1$. Note that for two positive numbers $a, b$, we have, by the convexity of the exponential function,
\[ ab = \exp\big((1/p)\ln(a^p) + (1/q)\ln(b^q)\big) \le (1/p)\exp(\ln(a^p)) + (1/q)\exp(\ln(b^q)) = a^p/p + b^q/q, \]
where the equality holds if and only if $a^p = b^q$. Thus, we have $u_k v_k \le u_k^p/p + v_k^q/q$, and
\[ \sum_{j=1}^{n} u_j v_j \le \ell_p(u)^p/p + \ell_q(v)^q/q = 1, \]
where the equality holds if and only if $u_j^p = v_j^q$ for all $j = 1,\dots,n$. □
Proof. The cases for p = 1, ∞ can be readily checked. Suppose p > 1. By the Hölder
inequality, if 1 − 1/p = 1/q, then
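\[ \begin{aligned}
\sum_{j=1}^{n} (u_j + v_j)^p &= \sum_{j=1}^{n} u_j(u_j + v_j)^{p-1} + \sum_{j=1}^{n} v_j(u_j + v_j)^{p-1} \\
&\le \ell_p(u)\Big(\sum_{j=1}^{n} (u_j + v_j)^p\Big)^{1/q} + \ell_p(v)\Big(\sum_{j=1}^{n} (u_j + v_j)^p\Big)^{1/q} \qquad \text{as } (p-1)q = p \\
&= (\ell_p(u) + \ell_p(v))\Big(\sum_{j=1}^{n} (u_j + v_j)^p\Big)^{1/q}.
\end{aligned} \]
Dividing both sides by $\big(\sum_{j=1}^{n} (u_j + v_j)^p\big)^{1/q}$, we get the conclusion. □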
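Both the Hölder inequality and the resulting triangle inequality for ℓp can be sanity-checked numerically. The sketch below is only an illustration for real vectors; the helper names `lp_norm`, `holder_gap` and `minkowski_gap` are hypothetical and not part of the notes.

```python
def lp_norm(v, p):
    """The l_p norm of a real vector v for 1 <= p <= infinity."""
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def holder_gap(u, v, p):
    """Return l_p(u) * l_q(v) - sum_j |u_j v_j|, where 1/p + 1/q = 1 and p > 1."""
    q = p / (p - 1.0)
    return lp_norm(u, p) * lp_norm(v, q) - sum(abs(a * b) for a, b in zip(u, v))


def minkowski_gap(u, v, p):
    """Return l_p(u) + l_p(v) - l_p(u + v)."""
    w = [a + b for a, b in zip(u, v)]
    return lp_norm(u, p) + lp_norm(v, p) - lp_norm(w, p)


# A generic pair: both gaps are nonnegative.
print(holder_gap([1, 2, 3], [4, 1, 2], 3), minkowski_gap([1, 2, 3], [4, 1, 2], 3))

# Equality case of Hölder for p = 3, q = 3/2: v_j^q = u_j^p, i.e. v_j = u_j^2.
print(holder_gap([1, 2], [1, 4], 3))
```

In the equality example the gap vanishes up to rounding, in line with the linear dependence condition in Lemma 5.1.3.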
Example 5.1.5 Consider V = Mm,n . Using the inner product ⟨A, B⟩ = tr (AB ∗ ) on Mm,n ,
we have the inner product norm (a.k.a. Frobenius norm)
\[ \|A\| = (\mathrm{tr}\, AA^*)^{1/2} = \Big(\sum_{i,j} |a_{ij}|^2\Big)^{1/2} = \Big(\sum_{j=1}^{m} s_j(A)^2\Big)^{1/2}. \]
One can define the $\ell_p(A) = (\sum_{i,j} |a_{ij}|^p)^{1/p}$, and define the Schatten p-norm by
\[ S_p(A) = \ell_p(s(A)) = \Big(\sum_{j=1}^{m} s_j(A)^p\Big)^{1/p}. \]
The Schatten ∞-norm reduces to s1 (A), which is also known as the spectral norm or operator
norm defined by
∥A∥ = max{ℓ2 (Ax) : x ∈ Cn , ℓ2 (x) ≤ 1}.
When m = n, the Schatten 1-norm of A is just the sum of the singular values of A, and
is also known as the trace norm.
One can also define the Ky Fan k-norm by $F_k(A) = \sum_{j=1}^{k} s_j(A)$ for $k = 1,\dots,m$.
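Once the singular values of A are known, the Schatten p-norms and Ky Fan k-norms are simple functions of that list. The sketch below takes the singular values as input (computing them is outside its scope); the function names are illustrative assumptions.

```python
def schatten_norm(singular_values, p):
    """Schatten p-norm: the l_p norm of the vector of singular values."""
    s = sorted(singular_values, reverse=True)
    if p == float("inf"):
        return s[0]  # reduces to the largest singular value s_1(A)
    return sum(t ** p for t in s) ** (1.0 / p)


def ky_fan_norm(singular_values, k):
    """Ky Fan k-norm: the sum of the k largest singular values."""
    s = sorted(singular_values, reverse=True)
    return sum(s[:k])


s = [3.0, 2.0, 1.0]  # hypothetical singular values of some matrix A
print(schatten_norm(s, 2))             # Frobenius norm
print(schatten_norm(s, float("inf")))  # spectral norm, same as ky_fan_norm(s, 1)
print(ky_fan_norm(s, len(s)))          # trace norm, same as schatten_norm(s, 1)
```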
Assertion The Ky Fan k-norms and the Schatten p-norms satisfy the triangle inequalities.
Proof. To prove the triangle inequality for the Ky Fan k-norm, note that if $C = A + B$, then
\[ \begin{pmatrix} 0 & C \\ C^* & 0 \end{pmatrix} = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} + \begin{pmatrix} 0 & B \\ B^* & 0 \end{pmatrix}. \]
By the Lidskii inequalities, $\sum_{j=1}^{k} s_j(C) \le \sum_{j=1}^{k} (s_j(A) + s_j(B))$. So, we have proved $s(C) \prec_w s(A) + s(B)$. It is easy to show that if $(c_1,\dots,c_m) \prec_w (\gamma_1,\dots,\gamma_m)$, then $\ell_p(c_1,\dots,c_m) \le \ell_p(\gamma_1,\dots,\gamma_m)$. Thus, we have
For A ∈ Mn , one can define the numerical range and numerical radius of A by
Note that the numerical radius is a norm on Mn (homework), but the spectral radius is
not.
Theorem 5.1.7 Let A ∈ Mn . Then W (A) is a compact convex set containing all the
eigenvalues of A, and
r(A) ≤ w(A) ≤ s1 (A) ≤ 2w(A).
Now, let z(s) = [(1 − s)x + sy]/∥(1 − s)x + sy∥ so that
has real values vary from 0 to 1 continuously as s varies in [0, 1]. So, [0, 1] ⊆ W (B).
That the set W (A) is compact means that it is bounded and contains all its boundary points.
This follows from the fact that W (A) is the image of the set of unit vectors in Cn under the
continuous function x ↦ x∗ Ax.
Now, if λ is an eigenvalue of A, let x be a corresponding unit eigenvector of λ, then
x∗ Ax = λ ∈ W (A). So, r(A) ≤ w(A). Also, we have
s1 (A) ≤ s1 (H + iG) ≤ s1 (H) + s1 (G) = |x∗ Hx| + |y ∗ Gy| ≤ |x∗ Ax| + |y ∗ Ay| ≤ 2w(A).
□
Note that every induced norm is a matrix norm. The Schatten p-norms, the Ky Fan
k-norms, are matrix norms, but the numerical radius is not.
Example 5.1.9 The operator norm induced by the $\ell_1$-norm on $F^n$ is the column sum norm defined by
\[ \|A\|_{\ell_1} = \max\Big\{ \sum_{j=1}^{n} |a_{j\ell}| : \ell = 1,\dots,n \Big\}. \]
The operator norm induced by the $\ell_\infty$-norm on $F^n$ is the row sum norm defined by
\[ \|A\|_{\ell_\infty} = \max\Big\{ \sum_{j=1}^{n} |a_{\ell j}| : \ell = 1,\dots,n \Big\}. \]
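The two induced norms of Example 5.1.9 are direct to compute from the entries. A minimal sketch, with matrices stored as lists of rows (a representation chosen here only for illustration):

```python
def column_sum_norm(A):
    """Operator norm induced by the l_1 norm: the maximum absolute column sum."""
    n_cols = len(A[0])
    return max(sum(abs(row[l]) for row in A) for l in range(n_cols))


def row_sum_norm(A):
    """Operator norm induced by the l_infinity norm: the maximum absolute row sum."""
    return max(sum(abs(t) for t in row) for row in A)


A = [[1, -2],
     [3, 4]]
print(column_sum_norm(A))  # column sums are 4 and 6
print(row_sum_norm(A))     # row sums are 3 and 7
```

The row sum norm of A equals the column sum norm of the transpose of A, which gives a quick consistency check.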
Proof. Let A = S(J1 ⊕· · ·⊕Jk )S −1 , where J1 , . . . , Jk are Jordan blocks. Assume r(A) < 1.
We will show that Aℓ → 0 as ℓ → ∞. It suffices to show that Jiℓ → 0 as ℓ → ∞ for each
i = 1, . . . , k.
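Note that if $\mu$ satisfies $|\mu| < 1$ and $N_m = E_{12} + \cdots + E_{m-1,m} \in M_m$, then for $\ell > m$,
\[ (\mu I_m + N_m)^{\ell} = \sum_{p=0}^{m-1} \binom{\ell}{p} \mu^{\ell-p} N_m^{p} \to 0 \quad\text{as } \ell \to \infty, \]
as $\lim_{\ell\to\infty} \binom{\ell}{p}\mu^{\ell-p} = 0$. Conversely, if $Ax = \mu x$ for some $|\mu| \ge 1$ and unit vector $x \in \mathbb{C}^n$, then $A^k x = \mu^k x$ so that $A^k \not\to 0$ as $k \to \infty$. □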
Proof. Suppose µ is an eigenvalue of A such that |µ| = r(A). Let x be a unit vector such
that Ax = µx. Then |µk |∥[x · · · x]∥ = ∥Ak [x · · · x]∥ ≤ ∥Ak ∥∥[x · · · x]∥. So, |µk | ≤ ∥Ak ∥.
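Now, for any $\varepsilon > 0$, let $A_\varepsilon = A/(r(A) + \varepsilon)$. Then $\lim_{k\to\infty} A_\varepsilon^k = 0$. So, for sufficiently large $k \in \mathbb{N}$ we have $\|A^k/(r(A) + \varepsilon)^k\| < 1$. Hence, for any $\varepsilon > 0$, if $k$ is sufficiently large, then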
Remark In the proof, we use the fact that the function x ↦ ∥x∥ is continuous. To see this,
for any ε > 0, let δ = ε; then whenever ∥x − y∥ < δ, we have |∥x∥ − ∥y∥| ≤ ∥x − y∥ < δ = ε.
Corollary 5.1.12 Suppose ∥ · ∥ is a matrix norm on Mn such that ∥A∥ ≥ r(A) for all
A ∈ Mn . If ∥A∥ < 1, then Ak → 0 as k → ∞.
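Corollary 5.1.12 can be observed numerically with the row sum norm, which is an induced norm and hence a matrix norm bounding r(A) from above. The sketch below uses an example matrix chosen here for illustration, and the helper names are assumptions.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def row_sum_norm(A):
    """Maximum absolute row sum (the norm induced by l_infinity)."""
    return max(sum(abs(t) for t in row) for row in A)


def power_norms(A, steps):
    """Return the list [||A||, ||A^2||, ..., ||A^steps||] in the row sum norm."""
    norms = []
    P = A
    for _ in range(steps):
        norms.append(row_sum_norm(P))
        P = matmul(P, A)
    return norms


A = [[0.5, 0.4],
     [0.1, 0.3]]           # row sums 0.9 and 0.4, so ||A|| = 0.9 < 1
norms = power_norms(A, 60)
print(norms[0], norms[9], norms[-1])  # the norms of the powers decay towards 0
```

Submultiplicativity gives ∥A^k∥ ≤ ∥A∥^k, so the printed values are bounded by powers of 0.9.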
Bν = {x ∈ V : ν(x) ≤ 1}
Theorem 5.2.1 Let ν be a norm on a nonzero linear space V . Then Bν satisfies the following.
(a) The zero vector 0 is an interior point.
(b) For any µ ∈ F with |µ| = 1,
Bν = µBν = {µx : x ∈ Bν }.
Remark 5.3.3 Suppose V is an inner product space, and ν is a norm on V . One can define
the dual norm on V by
ν D (x) = sup{|⟨x, y⟩| : ν(y) ≤ 1}.
We have (ν D )D = ν.
Example 5.3.4 The dual norm of the ℓp norm on Fn is the ℓq norm with 1/p + 1/q = 1.
The dual norm of the Schatten p norm on Mm,n is the Schatten q norm on Mm,n with
1/p + 1/q = 1.
The dual norm of the Ky Fan k-norm on $M_{m,n}$ with $m \ge n$ is $F_k^d(A) = \max\{\frac{1}{k}\sum_{j=1}^{n} s_j(A),\ s_1(A)\}$.
Proof. Suppose $\|\cdot\|$ is a UI norm. Then $\|A\| = \|\sum_{j=1}^{n} s_j(A)E_{jj}\|$ for any $A \in M_{m,n}$.
∥A∥c = νc (s(A)).
If $c = (\underbrace{1,\dots,1}_{k},0,\dots,0)$, we get the $\nu_k(x)$ and the Ky Fan k-norm $F_k(A)$.
Lemma 5.4.2 Suppose ν on Rn is a symmetric norm. Then for any x ∈ Rn ,
Proof. Suppose (a) holds. Then for any $c = (c_1,\dots,c_n)$ with $c_1 \ge \cdots \ge c_n \ge 0$, if we set $d_n = c_n$ and $d_j = c_j - c_{j+1}$ for $j = 1,\dots,n-1$, then $\nu_c(z) = \sum_{j=1}^{n} d_j\nu_j(z)$. Thus,
\[ \nu_c(x) = \sum_{j=1}^{n} d_j\nu_j(x) \le \sum_{j=1}^{n} d_j\nu_j(y) = \sum_{j=1}^{n} c_j y_j = \nu_c(y). \]
Suppose (b) holds. Let ν be a symmetric norm. Then for any c = (c1 , . . . , cn ) with
c1 ≥ · · · ≥ cn ≥ 0 and ν d (c) = 1, we have νc (x) ≤ νc (y). Thus, ν(x) ≤ ν(y).
The implication (b) ⇒ (c) is clear. □
Theorem 5.4.4 Let A, B ∈ Mm,n (F) with m ≥ n. The following are equivalent.
(a) Fk (A) ≤ Fk (B) for all k = 1, . . . , n.
(b) ∥A∥c ≤ ∥B∥c for all nonzero c = (c1 , . . . , cn ) with c1 ≥ · · · ≥ cn ≥ 0.
(c) ∥A∥ ≤ ∥B∥ for all UI norms ∥ · ∥.
Theorem 5.4.5 Let Rk ⊆ Mm,n be the set of matrices of rank at most k with m ≥ n > k.
Suppose $\|\cdot\|$ is a UI norm. If $A \in M_{m,n}$ is such that $U^*AV = \sum_{j=1}^{n} s_j(A)E_{jj}$, then $A_k = U\big(\sum_{j=1}^{k} s_j(A)E_{jj}\big)V^*$ satisfies
∥A − Ak ∥ ≤ ∥A − X∥ for all X ∈ Rk .
Proof. Let X ∈ Rk and C = A − X. Then sj (X) = 0 for j > k so that
\[ \sum_{j=1}^{\ell} s_{k+j}(A) = \sum_{j=1}^{\ell} (s_{k+j}(A) - s_{k+1}(X)) \le \sum_{j=1}^{\ell} s_j(C), \quad\text{for all } \ell = 1,\dots,n-k. \]
5.5 Errors in computing inverse and solving linear equations
Theorem 5.5.1 If B ∈ Mn satisfies r(B) < 1, then I − B is invertible and
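\[ (I - B)^{-1} = \sum_{k=0}^{\infty} B^k. \]
Furthermore, if $\|\cdot\|$ is a matrix norm on $M_n$ such that $\|A^{-1}E\| < 1$ and $\kappa(A) = \|A^{-1}\|\|A\|$, then
\[ \frac{\|A^{-1} - (A+E)^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|} \le \frac{\kappa(A)}{1 - \kappa(A)(\|E\|/\|A\|)} \frac{\|E\|}{\|A\|}. \]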
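The series in Theorem 5.5.1 can be illustrated by summing finitely many powers of a small matrix B with r(B) < 1 and comparing with the exact inverse of I − B. This is only a sketch: the matrix B, the number of terms, and the helper names are illustrative assumptions, and the exact inverse below was worked out by hand for this triangular example.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def neumann_sum(B, terms):
    """Return I + B + B^2 + ... + B^terms."""
    n = len(B)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = identity
    for _ in range(terms):
        power = matmul(power, B)  # next power of B
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


B = [[0.2, 0.1],
     [0.0, 0.5]]               # triangular, eigenvalues 0.2 and 0.5, so r(B) < 1
approx = neumann_sum(B, 200)
exact = [[1.25, 0.25],
         [0.0, 2.0]]           # inverse of I - B = [[0.8, -0.1], [0, 0.5]]
print(approx)
```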
Suppose ∥ · ∥ is a matrix norm on Mn such that ∥A−1 E∥ < 1, and if ν is a norm on Cn such
that ν(Bz) ≤ ∥B∥ν(z) for all B ∈ Mn and z ∈ Cn . If κ(A) = ∥A−1 ∥∥A∥, then
6 Additional topics
6.1 Location of eigenvalues
Theorem 6.1.1 (Gershgorin Theorem) Let $A = (a_{ij}) \in M_n$, and let
\[ G_j(A) = \Big\{ \mu \in \mathbb{C} : |\mu - a_{jj}| \le \sum_{i \ne j} |a_{ji}| \Big\}. \]
Then the eigenvalues of $A$ lie in $G(A) = \cup_{j=1}^{n} G_j(A)$. Furthermore, if $C = G_{i_1}(A) \cup \cdots \cup G_{i_k}(A)$ forms a connected component of $G(A)$, then $C$ contains exactly $k$ eigenvalues counting multiplicities.
To prove the last assertion, let At = D + t(A − D) with D = diag (a11 , . . . , ann ).
Then A0 has eigenvalues a11 , . . . , ann , and the eigenvalues and Gershgorin disks will change
continuously according to t ∈ [0, 1] until we get A1 = A.
One can apply the result to At to get Gershgorin disks of different sizes centered at
a11 , . . . , ann . Also, one can apply the result to S −1 AS for (simple) invertible S such that
G(S −1 AS) is small. In fact, if A is already in Jordan form, then for any ε > 0 there is
S such that S −1 AS has diagonal entries λ1 , . . . , λn and (i, i + 1) entries equal 0 or ε for
i = 1, . . . , n − 1, and all other entries equal to 0. So, we have the following.
One may use the Gershgorin theorem to study the zeros of a (monic) polynomial, namely,
one can apply the result to the companion matrix Cf of f (x) to get some estimate of the
location of the zeros. One can further apply similarity to Cf to get better estimate for the
zeros of f (x).
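As an illustration of the last remark, the sketch below builds a companion matrix of a monic polynomial and lists its Gershgorin disks (center, radius) using the row sums of Theorem 6.1.1. The particular companion form (ones on the subdiagonal, negated coefficients in the last column) is one common convention and is an assumption here, as are the function names.

```python
def companion(coeffs):
    """Companion matrix of f(x) = x^n + c_{n-1} x^{n-1} + ... + c_0,
    given coeffs = [c_0, ..., c_{n-1}] (assumed convention)."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1             # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -coeffs[i]    # last column carries the coefficients
    return C


def gershgorin_disks(A):
    """Return [(center, radius)] with radius_j = sum over i != j of |a_ji|."""
    n = len(A)
    return [(A[j][j], sum(abs(A[j][i]) for i in range(n) if i != j)) for j in range(n)]


def in_union(z, disks, tol=1e-12):
    """Check whether z lies in the union of the given closed disks."""
    return any(abs(z - c) <= r + tol for c, r in disks)


# f(x) = x^2 - 3x + 2 = (x - 1)(x - 2)
disks = gershgorin_disks(companion([2, -3]))
print(disks)
print(in_union(1, disks), in_union(2, disks))
```

Both zeros of f lie in the union of the two disks, while a point such as 5 lies outside.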
6.2 Eigenvalues and principal minors
Theorem 6.2.1 Let A ∈ Mn with eigenvalues λ1 , . . . , λn . Then
where for m = 1, . . . , n,
\[ a_m = S_m(\lambda_1,\dots,\lambda_n) = \sum_{1 \le j_1 < \cdots < j_m \le n} \lambda_{j_1} \cdots \lambda_{j_m} \]
Proof. For any subset J ⊆ {1, . . . , n}, let A[J] be the principal submatrix of A with row
and column indices in J. Consider the expansion det(zI − A). The coefficient of z n−j comes
from the sum of the leading coefficients of (−1)j det(A[J]) det(zI − A[J]) for all different
j-element subsets J of {1, . . . , n}. The result follows. □
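The relation between principal minors and elementary symmetric functions of the eigenvalues can be verified on a small example. The sketch below uses a 3 × 3 integer matrix whose eigenvalues 1, 3, 5 are known by inspection (it is the direct sum of [[2, 1], [1, 2]] and [5]); the matrix and the function names are illustrative assumptions.

```python
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    n = len(M)
    if n == 0:
        return 1
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def principal_minor_sum(A, m):
    """Sum of det(A[J]) over all m-element index sets J."""
    n = len(A)
    return sum(det([[A[i][j] for j in J] for i in J]) for J in combinations(range(n), m))


def elementary_symmetric(values, m):
    """Sum of all products of m distinct entries of values."""
    total = 0
    for J in combinations(values, m):
        prod = 1
        for t in J:
            prod *= t
        total += prod
    return total


A = [[2, 1, 0],
     [1, 2, 0],
     [0, 0, 5]]
eigenvalues = [1, 3, 5]
for m in (1, 2, 3):
    print(m, principal_minor_sum(A, m), elementary_symmetric(eigenvalues, m))
```

For m = 1 this is the trace identity and for m = n the determinant identity.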
(a) r(A) > 0 is an algebraically simple eigenvalue of A such that r(A) > |λ| for all other
eigenvalues λ of A.
(b) There is a unique positive vector x with ℓ1 (x) = 1 such that Ax = r(A)x, and there is
a unique positive vector y with y t x = 1 and y t A = r(A)y t .
(b) If all the row sums are the same, then r(A) = r1 . In general,
Proof. (a) If B = A + P , then for any positive integer k, B k − Ak is nonnegative so that
∥Ak ∥ℓ∞ ≤ ∥B k ∥ℓ∞ . Hence,
\[ r(A) = \lim_{k\to\infty} \|A^k\|_{\ell_\infty}^{1/k} \le \lim_{k\to\infty} \|B^k\|_{\ell_\infty}^{1/k} = r(B). \]
(b) Suppose all the row sums are the same. Let e = (1, . . . , 1)t . Then Ae = r1 e so that
r1 is an eigenvalue. By Gershgorin Theorem all eigenvalues lie in
\[ \bigcup_{i=1}^{n} \Big\{ \mu \in \mathbb{C} : |\mu - a_{ii}| \le \sum_{j \ne i} a_{ij} \Big\}. \]
Proof of Theorem 6.3.1. Assume B = Ak is positive. Then r(B) is larger than the
minimum row sum of B so that 0 < r(B) = r(A)k . Note that Bv is positive for any nonzero
vector v ≥ 0.
Assertion 1 Let λ be an eigenvalue of B. Either |λ| < r(B) or λ = r(B) with an eigenvector
x such that x = eiθ |x| for some θ ∈ R.
Proof. Let λ be an eigenvalue of B such that |λ| = r(B), and x be an eigenvector. Then
r(B)|x| = |r(B)x| = |Bx| ≤ B|x|. We claim that the equality holds. If it is not true, we can
set z = B|x| so that y = (B − r(B))|x| = z − r(B)|x| ≠ 0 is nonnegative. Then
So, z = (z1 , . . . , zn )t has positive entries, and for Z = diag (z1 , . . . , zn ), we have
It follows that Z −1 BZ has minimum row sum r(B) + δ, where δ = ℓ∞ (Z −1 By) > 0. So,
r(Z −1 BZ) ≥ r(B) + δ, which is a contradiction.
Now, r(B)|x| = B|x| has positive entries, and |Bx| = r(B)|x| = B|x|. Thus, x = eiθ |x|,
i.e., x is in the eigenspace of r(B) and λ = r(B). The proof of Assertion 1 is complete.
Assertion 2 The value r(B) is a simple eigenvalue of B with a unique positive
eigenvector x satisfying et x = 1 and a unique positive left eigenvector y such that y t x = 1.
Moreover, there is an invertible matrix S ∈ Mn such that x is the first column of S and y t
is the first row of S −1 satisfying S −1 BS = [r(B)] ⊕ B1 with r(B1 ) < r(B).
Proof. Suppose Bu = r(B)u and Bv = r(B)v for two linearly independent vectors u
and v such that et |u| = et |v| = 1. By the arguments in the previous paragraphs, we see
that there are θ, ϕ ∈ R such that u = eiθ |u| and v = eiϕ |v|, such that |u|, |v| have positive
entries. So, there is β > 0 such that |u| − β|v| is nonnegative with at least one zero entry.
We have r(B)(|u| − β|v|) = B(|u| − β|v|), and B(|u| − β|v|) has positive entries, which is
a contradiction. So, |u| = |v|.
Let x be the unique positive eigenvector such that Bx = r(B)x satisfying et x = 1.
Then we can consider $B^t$ and obtain a positive vector $y$ with $B^t y = r(B)y$ satisfying $x^t y = 1$. Let $S = [x|S_1] \in M_n$ be such that $y^t S_1 = [0,\dots,0] \in \mathbb{R}^{1\times(n-1)}$. Then $x$ is not in the column space of $S_1$ because $y^t x = 1 \ne 0$. So, $S$ is invertible. Moreover, $y^t S = [1, 0,\dots,0]$ so that $y^t$ is the first row of $S^{-1}$. Now, if $S^{-1}BS = C$, then $SC = BS$ has first column equal to $r(B)x$. Thus, the first column of $C$ is $r(B)e_1$. Similarly, the first row of $CS^{-1} = S^{-1}B$ equals $r(B)y^t$. Thus, the first row of $C$ is $r(B)e_1^t$. Hence, $S^{-1}BS = [r(B)] \oplus B_1$ such that
r(B1 ) < r(B). Assertion 2 follows.
Assertion 3 The conclusion of Theorem 6.3.1 holds.
Proof. Note that the vectors x and y in Assertion 2 are the right and left eigenvectors of
A corresponding to a simple eigenvalue λ of A with |λ| = r(A). Now, Ax = λx implies that
λ = r(A). So, S −1 AS = [r(A)] ⊕ A1 such that r(A1 ) < r(A). Finally,
For a nonnegative matrix A, r(A) is called the Perron eigenvalue of A, and the corresponding
nonnegative left and right eigenvectors are called the Perron eigenvectors.
Example 6.3.4 Note that Ak is not positive for any positive integer k in all the following.
If A = I2 , then r(A) = 1 and all nonzero vectors are left and right eigenvectors.
If $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, then $r(A) = 1$ with right and left eigenvectors $x = (1, 0)^t/2$ and $y = (0, 1)^t$.
If $A = \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix}$, then $r(A) = 1$ with right and left eigenvectors $x = (1, 1)^t/2$ and $y = (0, 2)^t$.
If $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, then $r(A) = 1$ with right and left eigenvectors $x = (1, 1)^t/2$ and $y = (1, 1)^t$.
A row (column) stochastic matrix is a matrix with nonnegative entries such that all row
(column) sums equal one. Such matrices appear in the study of Markov chains in probability, population
models, the Google PageRank matrix, etc. If A ∈ Mn is both row and column stochastic, then
it is doubly stochastic.
Corollary 6.3.5 Let $A$ be a row stochastic matrix. Then $r(A) = 1$. If $A^k$ is positive for some positive integer $k$, then $r(A)$ is a simple eigenvalue with a unique positive right eigenvector $x$ satisfying $e^t x = 1$, and a unique positive left eigenvector $y$ such that $A^k \to xy^t$ as $k \to \infty$.
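The convergence of the powers of a positive row stochastic matrix to a rank one matrix can be observed directly. In the sketch below the 2 × 2 matrix is an illustrative choice; its limit was worked out by hand from the right eigenvector x = (1/2, 1/2)^t and the left eigenvector y = (2/3, 4/3)^t, which satisfy e^t x = 1 and y^t x = 1.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matrix_power(A, k):
    """Return A^k for a positive integer k by repeated multiplication."""
    P = A
    for _ in range(k - 1):
        P = matmul(P, A)
    return P


A = [[0.5, 0.5],
     [0.25, 0.75]]           # positive and row stochastic
x = [0.5, 0.5]               # right eigenvector: A x = x, entries sum to 1
y = [2.0 / 3.0, 4.0 / 3.0]   # left eigenvector: y^t A = y^t, y^t x = 1
limit = [[x[i] * y[j] for j in range(2)] for i in range(2)]  # the matrix x y^t

P = matrix_power(A, 60)
print(P)
print(limit)
```

The second eigenvalue of this A is 1/4, so the powers approach the limit very quickly.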
Theorem 6.4.2 The following equations hold for scalars a, b and matrices A, B, C, D, provided
that the sizes of the matrices are compatible with the described operations.
(a) (aA + bB) ⊗ C = aA ⊗ C + bB ⊗ C, C ⊗ (aA + bB) = aC ⊗ A + bC ⊗ B.
(b) (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
Proof. (a) By direct verification. (b) Suffices to show (A ⊗ B)(Cj ⊗ Dk ) = (ACj ) ⊗ (BDk )
for all columns Cj of C and Dk of D. □
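Part (b) of Theorem 6.4.2 is easy to test on small integer matrices, where the arithmetic is exact. A minimal sketch, with `kron` and `matmul` as illustrative helper names:

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kron(A, B):
    """Tensor (Kronecker) product: the block matrix (a_ij B)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]

lhs = matmul(kron(A, B), kron(C, D))   # (A ⊗ B)(C ⊗ D)
rhs = kron(matmul(A, C), matmul(B, D)) # (AC) ⊗ (BD)
print(lhs == rhs)
```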
We have the following application of the tensor product results to matrix equations.
AX + XB = C X ∈ Mm,n
The Hadamard (Schur) product of two matrices A = (aij ), B = (bij ) ∈ Mm,n is defined
by A ◦ B = (aij bij ).
Remark Note that if A, B ∈ Mn are invertible, unitary, or normal, it does not follow that
A ◦ B has the same property.
6.5 Compound matrices
Let $A \in M_{m,n}$ and $k \le \min\{m, n\}$. Then the compound matrix $C_k(A)$ is of size $\binom{m}{k} \times \binom{n}{k}$ with rows labeled by increasing subsequences $r = (r_1,\dots,r_k)$ of $(1,\dots,m)$ and columns labeled by increasing subsequences $s = (s_1,\dots,s_k)$ of $(1,\dots,n)$ in lexicographic order such that the $(r, s)$ entry of $C_k(A)$ equals $\det(A[r, s])$, where $A[r, s] \in M_k$ is the submatrix of $A$ with rows and columns indexed by $r$ and $s$, respectively.
Example 6.5.1 Let A ∈ M4 . Then C2 (A) ∈ M6 with (r1 , r2 ), (s1 , s2 ) entry equal to det(A[r1 , r2 ; s1 , s2 ]).
Theorem 6.5.2 Let A ∈ Mm,n and B ∈ Mn,m . Then for any 1 ≤ k ≤ m, the sum of the
k × k principal minors of AB is the same as that of BA ∈ Mn .
Note that when k = m ≤ n, the above result is known as the Cauchy Binet formula.
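Proof. Recall that if
\[ P = \begin{pmatrix} AB & 0 \\ B & 0_n \end{pmatrix}, \quad Q = \begin{pmatrix} 0_m & 0 \\ B & BA \end{pmatrix} \quad\text{and}\quad S = \begin{pmatrix} I_m & A \\ 0 & I_n \end{pmatrix}, \]
then $PS = SQ$, so that $P$ and $Q$ are similar and have the same characteristic polynomial.
Thus the sum of the kth principal minors of P and that of Q are the same. Evidently, the
sum of the kth principal minors of P is the same as that of AB, and the sum of the kth
principal minors of Q is the same as that of BA. The result follows. □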
Theorem 6.5.3 Let A ∈ Mm,n , B ∈ Mn,p and k ≤ min{m, n, p}. Then Ck (AB) =
Ck (A)Ck (B).
Proof. Let $\Gamma_{r,k}$ be the set of length $k$ increasing subsequences of $(1,\dots,r)$ for $r \ge k$. Consider the entry of $C_k(AB)$ with row indexes $r = (r_1,\dots,r_k) \in \Gamma_{m,k}$ and column indexes $s = (s_1,\dots,s_k) \in \Gamma_{p,k}$. Let $\hat A \in M_{k,n}$ be obtained from $A$ by using its rows indexed by $(r_1,\dots,r_k)$, and let $\hat B \in M_{n,k}$ be obtained from $B$ by using its columns indexed by $(s_1,\dots,s_k)$. Then the $(r, s)$ entry of $C_k(AB)$ equals $\det(\hat A\hat B) = C_k(\hat A)C_k(\hat B)$ by the Cauchy Binet formula. Note that $C_k(\hat A)C_k(\hat B)$ is the $(r, s)$ entry of $C_k(A)C_k(B)$. The result follows.
□
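The multiplicativity of compound matrices can be checked exactly on integer matrices. The sketch below builds C_k(A) from k × k minors with index sets in lexicographic order, which is the order produced by `itertools.combinations`; the helper names and the example matrices are illustrative assumptions.

```python
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    n = len(M)
    if n == 0:
        return 1
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def compound(A, k):
    """k-th compound matrix: entries det(A[r, s]) over increasing index sets r, s."""
    rows = list(combinations(range(len(A)), k))
    cols = list(combinations(range(len(A[0])), k))
    return [[det([[A[i][j] for j in s] for i in r]) for s in cols] for r in rows]


A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[2, 1, 0], [1, 0, 1], [0, 3, 1]]
print(compound(matmul(A, B), 2) == matmul(compound(A, 2), compound(B, 2)))

# Rectangular case with k = m: this is the Cauchy Binet formula for det(A2 B2).
A2 = [[1, 2, 3], [4, 5, 6]]
B2 = [[1, 0], [0, 1], [1, 1]]
print(compound(matmul(A2, B2), 2), matmul(compound(A2, 2), compound(B2, 2)))
```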
(c) Suppose $U^*AV = D$ with $D = \sum_{j=1}^{n} s_j(A)E_{jj}$, where $U, V$ are unitary. Then
Corollary 6.5.5 Let A ∈ Mn with eigenvalues λ1 (A), . . . , λn (A) satisfying |λ1 (A)| ≥ · · · ≥
|λn (A)|. Then
\[ \prod_{j=1}^{k} |\lambda_j(A)| \le \prod_{j=1}^{k} s_j(A) \quad\text{for } k = 1,\dots,n. \]
6.6 Additive compound
Let A ∈ Mn and 1 ≤ k ≤ n, and
Theorem 6.6.1 Let $A \in M_n$. Then $D_k(S^{-1}AS) = C_k(S)^{-1}D_k(A)C_k(S)$ so that $D_k(A)$ has eigenvalues $\sum_{j=1}^{k} \lambda_{i_j}(A)$ with $1 \le i_1 < \cdots < i_k \le n$. Consequently, if A is normal (Hermi-
\[ \sum_{j=1}^{k} \lambda_{n-j+1}(A) \le \sum_{j=1}^{k} \lambda_j(A) \le \sum_{j=1}^{k} s_j(A). \]
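Proof. The proof follows from the fact that $D_k(X)$ can be written as
\[ V^*\Big( \sum_{j=1}^{k} \underbrace{I_n \otimes \cdots \otimes I_n}_{j-1} \otimes X \otimes \underbrace{I_n \otimes \cdots \otimes I_n}_{k-j} \Big)V, \]
where $V \in M_{n^k \times \binom{n}{k}}$ is such that $V^*V = I_{\binom{n}{k}}$ and the columns of $V$ form a basis for the subspace of $\mathbb{C}^{n^k}$ spanned by
\[ \Big\{ \sum_{\sigma \in S_k} \chi(\sigma)\, e_{\sigma(i_1)} \otimes \cdots \otimes e_{\sigma(i_k)} : 1 \le i_1 < \cdots < i_k \le n \Big\}, \]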
6.7 More block matrix techniques
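Schur Complement Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ be such that $A_{11} \in M_k$ is invertible. Then
\[ \begin{pmatrix} I_k & 0 \\ -A_{21}A_{11}^{-1} & I_{n-k} \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{pmatrix}. \]
The matrix $A_{22} - A_{21}A_{11}^{-1}A_{12}$ is the Schur complement of $A$ with respect to $A_{11}$. Clearly, it is useful for block Gaussian elimination. Also, if $A$ is invertible, then the inverse of the Schur complement is the $(n-k) \times (n-k)$ bottom right block of $A^{-1}$.
If $A^{-1}$ exists, then $A_{22} - A_{21}A_{11}^{-1}A_{12}$ is invertible and
\[ A^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{pmatrix}^{-1} \begin{pmatrix} I_k & 0 \\ -A_{21}A_{11}^{-1} & I_{n-k} \end{pmatrix} = \begin{pmatrix} \star & \star \\ \star & (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1} \end{pmatrix}. \]
So, $(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$ is the $(n-k) \times (n-k)$ matrix in the right bottom block of $A^{-1}$.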
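The statement about the bottom right block of the inverse can be confirmed in exact arithmetic. The sketch below treats only the case k = 1 (so A11 is a nonzero scalar) to keep the code short, and uses rational numbers to avoid rounding; the example matrix and the helper names are illustrative assumptions.

```python
from fractions import Fraction


def det(M):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    n = len(M)
    if n == 0:
        return 1
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def inverse(M):
    """Exact inverse via the adjugate: entry (i, j) is cofactor(j, i) / det(M)."""
    n = len(M)
    d = Fraction(det(M))

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
        return (-1) ** (i + j) * det(minor)

    return [[Fraction(cofactor(j, i)) / d for j in range(n)] for i in range(n)]


def schur_complement_k1(A):
    """Schur complement A22 - A21 A11^{-1} A12 when A11 is the (1, 1) entry."""
    a11 = Fraction(A[0][0])
    n = len(A)
    return [[A[i][j] - A[i][0] * A[0][j] / a11 for j in range(1, n)] for i in range(1, n)]


A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
S = schur_complement_k1(A)
bottom_right = [row[1:] for row in inverse(A)[1:]]  # lower right 2 x 2 block of A^{-1}
print(inverse(S) == bottom_right)
```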
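Block Hermitian matrices Suppose $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ is Hermitian such that $A_{11} \in M_k$ is invertible. If $S = \begin{pmatrix} I_k & 0 \\ -A_{21}A_{11}^{-1} & I_{n-k} \end{pmatrix}$, then $SAS^* = A_{11} \oplus (A_{22} - A_{21}A_{11}^{-1}A_{12})$.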