
Matrix Theory

Jörg Liesen
TU Berlin
Institute of Mathematics

Version of March 18, 2021


Olga once told us,
“To get full insight into matrix theory
is like finding roads in a jungle.”
Helene Shapiro [71, p. 21]

Preface
The quote above is from Helene Shapiro’s recollection of the course Math 223: Matrix
Theory taught by Olga Taussky1 at the California Institute of Technology (Caltech) in
1976–1977. It indicates that Matrix Theory is a vast and tangled area with an abundance
of concepts and results, overgrowing each other to an extent that it becomes almost
impenetrable. Olga Taussky was certainly not the first to recognize this situation. The
main reason for the relentless growth of the area was already indicated in the Preface
of the first systematic book on Matrix Theory, published in 1933 by MacDuffee [51]2 :

Matric algebra is a mathematical abstraction underlying many seemingly diverse theories. Thus bilinear and quadratic forms, linear associative algebra
(hypercomplex systems), linear homogeneous transformations and linear vec-
tor functions are various manifestations of matric algebra. Other branches
of mathematics as number theory, differential and integral equations, contin-
ued fractions, projective geometry etc. make use of certain properties of this
subject. Indeed, many of the fundamental properties of matrices were first
discovered in the notation of a particular application, and not until much
later recognized in their generality.

While MacDuffee in 1933 focused mostly on the inner-mathematical application of matrices as a unifying “abstraction”, matrices have in the meantime become increasingly important in “real world” applications. Today we observe that matrices appear virtually
everywhere in science and engineering, so that results about them are derived from the
most diverse points of view, which makes navigating the Matrix Theory jungle more and
more challenging.
There are (at least) two main ordering principles used by authors of modern books on
Matrix Theory in order to tackle the confusing variety of results in the area: the view-
point of matrix decompositions, and the viewpoint of matrix classes. Under the first
1
Olga Taussky (1906–1995) was a “Torchbearer for Matrix Theory” [81]. Her profound and lasting
influence on the field as well as on the matrix theorists was described in detail in [65] by Hans Schneider
(1927–2014), the first president of the International Linear Algebra Society and founding editor of the
journal Linear Algebra and Its Applications.
2
Cyrus Colton MacDuffee (1895–1961)

viewpoint one tries to collect results related to certain decompositions or (similarity)
transformations, for example triangular factorizations or unitary similarity. And under
the second viewpoint one tries to collect results for certain matrix classes, for example
Hermitian (positive (semi-)definite), unitary, normal and nonnegative matrices, or even
more special classes like tridiagonal, circulant, Cauchy, Toeplitz and Vandermonde ma-
trices. Such structuring of the material is highly useful, and if applied consistently, it
can lead to excellent reference books or jungle guides (to stay in the picture). Examples
are the books by Horn and Johnson from 2012 [40] and Zhang from 2011 [89], and to
some extent also those of Serre from 2002 [68] and Friedland from 2016 [19], which also
contain an applied or algorithmic aspect. It appears that the general reference or sur-
vey style presentation of Matrix Theory is not only present in modern treatments, but
dates back to the classics of the area such as the books of MacDuffee quoted above and
Gantmacher from 1959 [24], who points out in the Preface that he “tried to keep the
individual chapters as far as possible independent of each other”.
In this course we will take a different approach, which is motivated by the content
of Olga Taussky’s Math 223: Matrix Theory. Our main point of view will be a holistic
(meaning: from every angle) analysis of matrix multiplication. The advantage of building
the course around such a mathematical concept is that we will obtain a “plot” for the
course rather than presenting material in the style of a reference book. During our
investigations we will derive, analyze and apply a wide range of tools and techniques
from Matrix Theory, including many matrix decompositions, the field of values, the
matrix exponential, (Sylvester) matrix equations, or properties of many special matrix
classes. Thus, while focussing on results revolving around matrix multiplication, the
course nevertheless will provide quite a large map for the Matrix Theory jungle.
What’s so special about matrix multiplication? Everybody who studied Linear Algebra
knows that this operation is not commutative. Thus, for matrices A, B ∈ K n,n we cannot
expect in general that the products AB and BA are equal. In fact, if we pick “arbitrary”
matrices A, B ∈ K n,n , it is “highly unlikely” that they commute. Since commutativity of
some given A and B appears to be the exception rather than the rule, we are immediately
motivated to ask questions: What are algebraic and analytic properties of commuting
matrices, and are these properties useful? Which matrices commute with a given matrix?
Can we characterize subsets or subspaces of K n,n containing only pairwise commuting
matrices? On the other hand, if A and B do not commute, is there a reasonable concept
of “almost” or “quasi” commutativity? And what are the properties of the additive and
multiplicative commutators, defined by AB − BA and ABA^{-1}B^{-1}? All these questions,
and further related ones, will be addressed in this course.
The course starts with a review of results about the triangularization of a single ma-
trix (Chapter 1). Conceptually these results form the foundation for the simultaneous
triangularization of two or more matrices, which will play a major role in the course.

In order to set up our approach after Chapter 1, we quote from the start of the Intro-
duction of Olga Taussky’s article “Commutativity in finite matrices” [78]:

The real and complex numbers have the properties that ab = ba for all a and b
and for every a ≠ 0 there is an inverse a^{-1}, which implies ab = 0 only if a = 0
or b = 0. A classical theorem of Frobenius states that there exists no other
hypercomplex systems which have these properties. If we want to consider
more general systems, we must give up at least one of these properties. [...]
The set of n × n matrices A, B, . . . , with complex elements can be regarded as
a hypercomplex system with n² base elements and in this system, in general,
AB ≠ BA. In general, also, there is no inverse A^{-1} even if A ≠ 0 and,
furthermore, AB can be zero without A = 0 or B = 0.

The first major topic of the course will be the characterization of “hypercomplex number
systems” (Chapter 2). We will see that associativity and commutativity of the multi-
plication in such systems naturally leads to a study of the properties of commuting
matrices. Moreover, we will prove the classical theorem of Frobenius (from 1878), which
shows that the real and complex numbers are the only examples of such systems that
are associative and commutative.
Next we will study commuting matrices, including some analytic properties (Chapter 3)
and their general structure (Chapter 4), followed by an investigation of the simultaneous
triangularization property of a finite set of matrices, which is implied by commutativity,
and its applications (Chapter 5).
The final part of the course deals with properties of the additive and multiplicative com-
mutators, with a focus on the theorems of McCoy, which characterize the simultaneous
triangularization property of a finite set of (non-commuting) matrices (Chapter 6).
It is assumed that students are familiar with the theory of Linear Algebra and of matrices
as established in the book J. Liesen and V. Mehrmann, Linear Algebra, Springer, 2015,
which will be cited as [LM] in these notes. Many relevant definitions and facts from that
book, and a few further results not contained in that book, are collected in the Appendix
for completeness.
Please send corrections or suggestions to me at [email protected].

Jörg Liesen, Berlin, March 18, 2021

Contents

1 Triangularization of a single matrix and related results 6

2 The existence of real division algebras 15


2.1 Algebras over a field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Division algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3 Some analytic properties of commuting complex matrices 35


3.1 Field of values results for products of commuting matrices . . . . . . . . 35
3.2 The exponential function of commuting matrices . . . . . . . . . . . . . . 43

4 Sylvester equations and the structure of commuting matrices 55

5 Simultaneous triangularization and applications 63


5.1 Simultaneous triangularization . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Applications of the simultaneous triangularization . . . . . . . . . . . . . 73

6 Commutators and the theorems of McCoy 80


6.1 The additive commutator . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2 The theorems of McCoy . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.3 The multiplicative commutator . . . . . . . . . . . . . . . . . . . . . . . 99

A Definitions and facts from Linear Algebra 103

Chapter 1

Triangularization of a single matrix and related results

We start with the following result about the triangularization of a single matrix, which
according to Horn and Johnson is “[p]erhaps the most fundamentally useful fact of ele-
mentary matrix theory” [40, p. 101].

Theorem 1.1. If K is any field, then the following assertions are equivalent for each
A ∈ K n,n :

(1) PA decomposes into linear factors over K.

(2) There exists a matrix S ∈ GLn (K) such that S −1 AS is upper triangular, i.e., A
can be triangularized.

Proof. (2) ⇒ (1): If S^{-1}AS = R = [r_{ij}] is upper triangular, then

    P_A = det(tI − S R S^{-1}) = det(tI − R) = P_R = \prod_{j=1}^{n} (t − r_{jj}).

(1) ⇒ (2): We prove this direction by induction on n. The assertion is trivial for n = 1,
since every 1 × 1 matrix is upper triangular. Suppose that the assertion holds for all
matrices up to order n − 1 for some n ≥ 2, and let A ∈ K n,n be given. Since PA
decomposes into linear factors, there exists an eigenvalue λ of A with a corresponding
eigenvector x ∈ K^n. Let Y ∈ K^{n,n−1} be any matrix such that X = [x, Y] ∈ GL_n(K), then

    AX = [λx, AY] = X \begin{pmatrix} λ & ∗ \\ 0 & A_1 \end{pmatrix},
for some matrix A1 ∈ K n−1,n−1 . From PA = (t − λ)PA1 and our assumption on PA we
see that PA1 decomposes into linear factors. Thus the induction hypothesis implies that
there exists a matrix S_1 ∈ GL_{n−1}(K) such that S_1^{-1} A_1 S_1 is upper triangular. With

    S := X \begin{pmatrix} 1 & 0 \\ 0 & S_1 \end{pmatrix} ∈ GL_n(K)

we now get

    S^{-1} A S = \begin{pmatrix} 1 & 0 \\ 0 & S_1^{-1} \end{pmatrix} X^{-1} A X \begin{pmatrix} 1 & 0 \\ 0 & S_1 \end{pmatrix}
               = \begin{pmatrix} 1 & 0 \\ 0 & S_1^{-1} \end{pmatrix} \begin{pmatrix} λ & ∗ \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & S_1 \end{pmatrix}
               = \begin{pmatrix} λ & ∗ \\ 0 & S_1^{-1} A_1 S_1 \end{pmatrix},

which is upper triangular.

Note that proving Theorem 1.1 is significantly simpler than proving the existence of the
Jordan decomposition of a matrix A ∈ K n,n . However, unlike the Jordan form, the upper
triangular matrix R in Theorem 1.1 gives no information about the geometric structure
of the (generalized) eigenspaces. A discussion of the triangularization in the abstract
setting of endomorphisms on finite dimensional vector spaces as well as a complete proof
of the Jordan decomposition in this context is given in [LM, Section 14.3].
If S −1 AS = R = [rij ] is upper triangular, then the diagonal elements rjj , j = 1, . . . , n, are
the eigenvalues of A. If p ∈ K[t] is any polynomial, then p(A) = p(SRS −1 ) = Sp(R)S −1 ,
where p(R) is upper triangular with diagonal elements given by p(rjj ), j = 1, . . . , n.
These are the eigenvalues of p(A). In short, we can write this observation as

σ(p(A)) = p(σ(A)).

This result for polynomials of matrices can be generalized to analytic functions of


bounded linear operators in (infinite dimensional) Banach spaces, and in the functional
analysis context it is sometimes called the spectral mapping theorem.
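This identity is also easy to test numerically. The following small sketch (in Python with NumPy; the cubic polynomial and the random test matrix are arbitrary illustrative choices, not taken from the text) compares the eigenvalues of p(A) with the values p(λ) for λ ∈ σ(A).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))

    def p_scalar(t):
        # p(t) = t^3 - 2t + 5 (an arbitrary polynomial)
        return t**3 - 2 * t + 5

    def p_matrix(X):
        # the same polynomial, evaluated as a matrix polynomial
        return np.linalg.matrix_power(X, 3) - 2 * X + 5 * np.eye(X.shape[0])

    lhs = np.sort_complex(np.linalg.eigvals(p_matrix(A)))   # sigma(p(A))
    rhs = np.sort_complex(p_scalar(np.linalg.eigvals(A)))   # p(sigma(A))
    assert np.allclose(lhs, rhs)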
If we reverse the order of the columns of the matrix S = [s_1, . . . , s_n], and hence form Ŝ := [s_n, . . . , s_1], we obtain

    A Ŝ = Ŝ \begin{pmatrix} r_{nn} & & \\ \vdots & \ddots & \\ r_{1n} & \cdots & r_{11} \end{pmatrix}.

Consequently, we could replace upper triangular by lower triangular in item (2) of Theorem 1.1. One reason why the theorem is formulated in terms of the upper triangular form is that this form appears naturally in the inductive proof of the implication (1) ⇒ (2). If
we would like to end up with a lower triangular form, then the eigenvector that is found
in the first step of the proof must be the last vector of the basis we construct.
For the special case K = C we get the following result, originally due to Schur1 [67].

Corollary 1.2. If A ∈ Cn,n , then there exists a unitary matrix U ∈ Cn,n such that
U H AU is upper triangular, i.e., A can be unitarily triangularized.

Proof. If A ∈ Cn,n , then PA decomposes into linear factors, since the field C is alge-
braically closed. Thus, there exists a matrix S ∈ GLn (C) such that S −1 AS = R1 is
upper triangular. Let S = U R2 be a QR decomposition of S, where U ∈ Cn,n is unitary
and R2 ∈ GLn (C) is upper triangular. Then

U H AU = U −1 AU = R2 S −1 ASR2−1 = R2 R1 R2−1 ,

which is a product of upper triangular matrices, and hence is upper triangular.
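Numerically, a Schur decomposition is computed by standard library routines (for example scipy.linalg.schur, which uses the QR algorithm rather than the inductive construction above). The following sketch, with an arbitrary random complex test matrix, verifies the properties stated in Corollary 1.2; none of the names below come from the text.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

    R, U = schur(A, output="complex")              # A = U R U^H with U unitary
    assert np.allclose(U @ R @ U.conj().T, A)
    assert np.allclose(U.conj().T @ U, np.eye(5))  # U^H U = I
    assert np.allclose(np.tril(R, -1), 0)          # R is upper triangular
    # the diagonal of R contains the eigenvalues of A
    assert np.allclose(np.sort_complex(np.diag(R)),
                       np.sort_complex(np.linalg.eigvals(A)))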

We will refer to a decomposition of the form A = SRS −1 or A = U RU H as in Theorem 1.1


or Corollary 1.2, respectively, as a Schur decomposition of A, and call the respective
upper triangular matrix R a Schur form of A. The matrices A and R are called unitarily
similar. An extensive survey of results related to unitary similarity is given in [69].
We will next use the Schur decomposition for deriving the unitary diagonalization or
spectral decomposition of normal matrices2 and special classes of normal matrices.

Corollary 1.3. For A ∈ Cn,n the following assertions hold:

(1) A is normal, i.e., AH A = AAH , if and only if there exists a unitary matrix U ∈ Cn,n
such that U H AU is diagonal, i.e., A can be unitarily diagonalized.

(2) A is Hermitian, i.e., A = AH , if and only if A can be unitarily diagonalized and


all the eigenvalues of A are real.

(3) A is unitary, i.e., AH A = I, if and only if A can be unitarily diagonalized and all
eigenvalues λ of A satisfy |λ| = 1.
1
Issai Schur (1875–1941)
2
The concept of a normal matrix (actually, of a normal bilinear form) was introduced by Otto Toeplitz
(1881–1940) in 1918 [83]. In that paper Toeplitz also proved the result stated in Corollary 1.3 (1), and
he mentioned that this result was known to him and Schur for a long time. According to a footnote in a
paper of Alexander Markowitsch Ostrowski (1893–1986) from 1917 [59, p. 118], Toeplitz wrote to Schur
about the result already in 1914.

Proof. (1) Let A = U RU H be a Schur decomposition of A ∈ Cn,n . Then

    A^H A = (U R^H U^H)(U R U^H) = U R^H R U^H,
    A A^H = (U R U^H)(U R^H U^H) = U R R^H U^H,

and hence AH A = AAH implies that RH R = RRH . We will show by induction


on n that R must be diagonal. This is trivial for n = 1. Let R ∈ Cn,n for some
n ≥ 2 be upper triangular with RH R = RRH , and assume that the assertion holds
for all upper triangular matrices of order n − 1. We can write

    R = \begin{pmatrix} ρ & r_1^H \\ 0 & R_1 \end{pmatrix}

for some r_1 ∈ C^{n−1} and some upper triangular R_1 ∈ C^{n−1,n−1}. Now

    R^H R = \begin{pmatrix} \bar{ρ} & 0 \\ r_1 & R_1^H \end{pmatrix} \begin{pmatrix} ρ & r_1^H \\ 0 & R_1 \end{pmatrix} = \begin{pmatrix} |ρ|^2 & \bar{ρ} r_1^H \\ ρ r_1 & r_1 r_1^H + R_1^H R_1 \end{pmatrix},

    R R^H = \begin{pmatrix} ρ & r_1^H \\ 0 & R_1 \end{pmatrix} \begin{pmatrix} \bar{ρ} & 0 \\ r_1 & R_1^H \end{pmatrix} = \begin{pmatrix} |ρ|^2 + ‖r_1‖_2^2 & r_1^H R_1^H \\ R_1 r_1 & R_1 R_1^H \end{pmatrix}.

Comparing the (1,1) elements in these matrices shows that r1 = 0. Moreover, a


comparison of the (2,2) elements gives R1H R1 = R1 R1H . Since R1 is of order n − 1,
the induction hypothesis implies that R1 is diagonal, and thus R is diagonal.
On the other hand, if A = U DU H with a diagonal matrix D, then

AH A = (U DH U H )(U DU H ) = U DH DU H = U DDH U H = AAH ,

i.e., A is normal.

(2) If A is Hermitian, then A is normal, and hence A = U DU H with a diagonal matrix


D. Now
A = U DU H = AH = U DH U H
shows that D = DH , i.e., the eigenvalues of A are all real.
On the other hand, if A = U DU H with D ∈ Rn×n , then

AH = (U DU H )H = U DH U H = U DU H = A,

i.e., A is Hermitian.

(3) If A is unitary, then A is normal, and hence A = U DU H with a diagonal matrix


D. Now
In = AH A = (U DH U H )(U DU H ) = U DH DU H

implies that DH D = In , and thus |dii | = 1.
On the other hand, if A = U DU H with DH D = In , then

AH A = (U DH U H )(U DU H ) = In ,

i.e., A is unitary.

As shown in this corollary, normality of a complex matrix is equivalent with unitary


diagonalizability. Approximately 90 additional equivalent conditions are given in the
articles [27] from 1987 and [15] from 1998.
One of the most widely known and useful equivalent conditions is that A ∈ Cn,n is
normal if and only if there exists a polynomial p with AH = p(A). This is easy to see: If
AH = p(A), then trivially

AH A = p(A)A = Ap(A) = AAH .

On the other hand, if A is normal, then A can be unitarily diagonalized. If A = U DU H


with D = diag(λ1 , . . . , λn ) is such a diagonalization, we have AH = U DH U H with
DH = diag(λ̄1 , . . . , λ̄n ). If p is any (interpolation) polynomial that satisfies p(λj ) = λ̄j
for j = 1, . . . , n, then

AH = U DH U H = U p(D)U H = p(U DU H ) = p(A),

as required. From elementary considerations about (Lagrange) interpolation it is clear


that such a polynomial p can be chosen with degree at most the number of distinct
eigenvalues among λ1 , . . . , λn minus 1.
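For a concrete normal matrix this interpolation polynomial can be computed explicitly. The sketch below (NumPy; the unitary Q and the distinct eigenvalues are arbitrary choices made for the illustration) builds a normal matrix, solves the Vandermonde system for the interpolation conditions p(λ_j) = λ̄_j, and checks that p(A) = A^H.

    import numpy as np

    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
    lam = np.array([1.0 + 1.0j, 1.0 - 1.0j, 2.0j, 3.0])   # distinct eigenvalues
    A = Q @ np.diag(lam) @ Q.conj().T                      # A is normal by construction

    # interpolation conditions p(lam_j) = conj(lam_j), written as a Vandermonde system
    V = np.vander(lam, increasing=True)
    c = np.linalg.solve(V, lam.conj())                     # coefficients of p, degree <= 3
    pA = sum(cj * np.linalg.matrix_power(A, j) for j, cj in enumerate(c))
    assert np.allclose(pA, A.conj().T)                     # A^H = p(A)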
Note that the unitary diagonalization A = U D U^H with U = [u_1, . . . , u_n] and D = diag(λ_1, . . . , λ_n) can be written as

    A = \sum_{j=1}^{n} λ_j P_j,    P_j := u_j u_j^H,   j = 1, . . . , n,

i.e., A is decomposed into the sum of n rank-one matrices. We have P_j^2 = P_j and P_j^H = P_j, and for each x ∈ C^n,

    P_j x ∈ span{u_j},    (I − P_j) x ⊥ span{u_j}.

Thus, the matrix Pj is an orthogonal projector onto the invariant subspace span{uj }.
Corollary 1.4. Let A ∈ Cn,n be Hermitian. Then A is positive definite (semidefinite) if
and only if all eigenvalues of A are positive (nonnegative).

Proof. If A ∈ Cn,n is Hermitian, there exists a unitary diagonalization by (1) in Corol-
lary 1.3, say A = U DU H with a unitary matrix U = [u1 , . . . , un ] ∈ Cn,n and D =
diag(λ1 , . . . , λn ) ∈ Rn,n . In particular, Aui = λi ui for i = 1, . . . , n.
If A is positive definite, i.e., x^H A x > 0 for all nonzero x ∈ C^n, then

    0 < u_i^H A u_i = λ_i u_i^H u_i = λ_i   for all i = 1, . . . , n.

On the other hand, let λ_i > 0 for all i = 1, . . . , n. If x ∈ C^n is arbitrary, then x = \sum_{i=1}^{n} α_i u_i = U [α_1, . . . , α_n]^T for some α_1, . . . , α_n ∈ C. Hence

    x^H A x = [\bar{α}_1, . . . , \bar{α}_n] D [α_1, . . . , α_n]^T = \sum_{i=1}^{n} λ_i |α_i|^2,

which is positive when x ≠ 0. The same argument (with “>” replaced by “≥”) applies
to the semidefinite case.
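As a small numerical illustration of Corollary 1.4 (with an arbitrary Hermitian positive definite test matrix built in NumPy; none of the names below are from the text), one can compare the signs of the eigenvalues with the values of the quadratic form x^H A x:

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B.conj().T @ B + np.eye(4)       # Hermitian and positive definite by construction

    assert np.allclose(A, A.conj().T)
    assert np.all(np.linalg.eigvalsh(A) > 0)          # all eigenvalues are positive
    for _ in range(5):                                # spot-check the quadratic form
        x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
        assert (x.conj() @ A @ x).real > 0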

Exercise. Prove the Schur inequality: If A = [a_{ij}] ∈ C^{n,n} has the eigenvalues λ_1, . . . , λ_n ∈ C, then

    \sum_{j=1}^{n} |λ_j|^2 ≤ \sum_{i,j=1}^{n} |a_{ij}|^2,

with equality if and only if A is normal. (Hint: Use the Schur decomposition and the
unitary invariance of the trace.)

Exercise. Show that if a normal matrix A ∈ Cn,n has its eigenvalues as diagonal
elements, then A is diagonal.

The situation for real matrices is more complicated, since the field R is not algebraically
closed. Of course, we may consider any matrix A ∈ Rn,n as an element of Cn,n , and then
A is unitarily triangularizable. The factors in the resulting decomposition of A will in
general be non-real. If we insist on a real decomposition using an orthogonal (instead
of a unitary) matrix, then we have to give up the upper triangular form, since this
form implies that the characteristic polynomial of A decomposes into linear factors. The
following “real variant” of Corollary 1.2, originally due to Murnaghan and Wintner [57]3
gets us as close as possible to the upper triangular form in the decomposition.
Theorem 1.5. If A ∈ R^{n,n}, then there exists an orthogonal matrix Q ∈ R^{n,n} with

    Q^T A Q = R = \begin{pmatrix} R_{11} & \cdots & R_{1m} \\ & \ddots & \vdots \\ & & R_{mm} \end{pmatrix} ∈ R^{n,n},
3
Francis Dominic Murnaghan (1893–1976) and Aurel Wintner (1903–1958)

where for every j = 1, . . . , m either R_{jj} ∈ R or

    R_{jj} = \begin{pmatrix} r_1^{(j)} & r_2^{(j)} \\ r_3^{(j)} & r_4^{(j)} \end{pmatrix} ∈ R^{2,2}   with r_3^{(j)} ≠ 0.

In the second case Rjj has, considered as complex matrix, a pair of complex conjugate
eigenvalues of the form αj ± iβj with αj ∈ R and βj ∈ R \ {0}.

Proof. We prove the result by induction on n. For n = 1 we have A = [a11 ] = R and


U = [1].
Now suppose that the assertion holds for some n ≥ 1, and let A ∈ Rn+1,n+1 be given. As
an element of Cn+1,n+1 the matrix A has an eigenvalue λ = α + iβ ∈ C, α, β ∈ R, and
a corresponding eigenvector v = x + iy ∈ Cn+1,1 , x, y ∈ Rn+1,1 . Dividing the equation
Av = λv into its real and imaginary parts yields the two real equations

Ax = αx − βy and Ay = βx + αy. (1.1)

We have two cases:


Case 1: β = 0. Then the two equations in (1.1) are Ax = αx and Ay = αy, so that at
least one of the real vectors x or y is an eigenvector corresponding to the real eigenvalue
α of A. Without loss of generality we assume that this is the vector x, and that kxk2 = 1.
We can extend the vector x by the vectors w2 , . . . , wn+1 to an orthonormal basis of Rn+1,1 .
This gives the orthogonal matrix U_1 = [x, w_2, . . . , w_{n+1}] ∈ R^{n+1,n+1}, which satisfies

    U_1^T A U_1 = \begin{pmatrix} α & ∗ \\ 0 & A_1 \end{pmatrix},

for some matrix A1 ∈ Rn,n . By the induction hypothesis there exists an orthogonal
matrix U_2 ∈ R^{n,n} such that R_1 = U_2^T A_1 U_2 has the desired form. The matrix

    U := U_1 \begin{pmatrix} 1 & 0 \\ 0 & U_2 \end{pmatrix}

is orthogonal and satisfies

    U^T A U = \begin{pmatrix} 1 & 0 \\ 0 & U_2^T \end{pmatrix} U_1^T A U_1 \begin{pmatrix} 1 & 0 \\ 0 & U_2 \end{pmatrix} = \begin{pmatrix} α & ∗ \\ 0 & R_1 \end{pmatrix} = R,

where R has the desired form.


Case 2: β ≠ 0. We first assume that x, y are linearly dependent and derive a contradiction, which shows that the vectors are in fact linearly independent. If x = 0, then using β ≠ 0 in the first equation in (1.1) implies that also y = 0. This is not possible, since the eigenvector v = x + iy must be nonzero. Thus, x ≠ 0, and using β ≠ 0 in the second equation in (1.1) implies that also y ≠ 0. Thus, there exists a µ ∈ R \ {0} with x = µy. The two equations in (1.1) then can be written as

    Ax = (α − (1/µ)β)x   and   Ax = (α + µβ)x,

which implies that β(1 + µ²) = 0. Since 1 + µ² ≠ 0 for all µ ∈ R, this implies β = 0, which contradicts our assumption that β ≠ 0. Consequently, x, y are linearly independent.
Combining the two equations in (1.1) yields

    A[x, y] = [x, y] \begin{pmatrix} α & β \\ −β & α \end{pmatrix},

where rank([x, y]) = 2. A QR decomposition of [x, y] is given by

    [x, y] = [q_1, q_2] \begin{pmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{pmatrix} = Q R_1,

with Q^T Q = I_2 and R_1 ∈ GL_2(R). We obtain

    AQ = A[x, y] R_1^{-1} = [x, y] \begin{pmatrix} α & β \\ −β & α \end{pmatrix} R_1^{-1} = Q R_1 \begin{pmatrix} α & β \\ −β & α \end{pmatrix} R_1^{-1}.

The real matrix

    R_2 := R_1 \begin{pmatrix} α & β \\ −β & α \end{pmatrix} R_1^{-1}

has, considered as element of C^{2,2}, the pair of complex conjugate eigenvalues α ± iβ with β ≠ 0. In particular, the (2, 1)-entry of R_2 is nonzero, since otherwise R_2 would have two real eigenvalues.
We can extend q1 , q2 by vectors w3 , . . . , wn+1 to an orthonormal basis of Rn+1,1 . (For
n = 1 the list w3 , . . . , wn+1 is empty.) Then U1 = [Q, w3 , . . . , wn+1 ] ∈ Rn+1,n+1 is
orthogonal, and we have

    U_1^T A U_1 = U_1^T [AQ, A[w_3, . . . , w_{n+1}]] = U_1^T [Q R_2, A[w_3, . . . , w_{n+1}]] = \begin{pmatrix} R_2 & ∗ \\ 0 & A_1 \end{pmatrix},

for some matrix A1 ∈ Rn−1,n−1 . Analogously to the first case, an application of the
induction hypothesis to this matrix yields the desired matrices R and U .
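Numerically, the real Schur form is again available through standard routines; the sketch below (with an arbitrary random real test matrix, not from the text) verifies the quasi-triangular block structure asserted in Theorem 1.5 using scipy.linalg.schur.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))

    R, Q = schur(A, output="real")           # A = Q R Q^T with Q orthogonal, R quasi-triangular
    assert np.allclose(Q @ R @ Q.T, A)
    assert np.allclose(Q.T @ Q, np.eye(6))
    assert np.allclose(np.tril(R, -2), 0)    # at most 2x2 blocks on the diagonal
    # the eigenvalues of A are those of the 1x1 and 2x2 diagonal blocks of R
    assert np.allclose(np.sort_complex(np.linalg.eigvals(R)),
                       np.sort_complex(np.linalg.eigvals(A)))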

The matrix R in Theorem 1.5 is called a real Schur form of A. The theorem implies the
following result for real normal matrices.

Corollary 1.6. A matrix A ∈ Rn,n is normal if and only if there exists an orthogonal
matrix Q ∈ Rn,n with
QT AQ = diag(R1 , . . . , Rm ),
where, for every j = 1, . . . , m, either R_j ∈ R or

    R_j = \begin{pmatrix} α_j & β_j \\ −β_j & α_j \end{pmatrix} ∈ R^{2,2}   with β_j ≠ 0.

In the second case the matrix Rj has, considered as complex matrix, a pair of complex
conjugate eigenvalues of the form αj ± iβj .

Exercise. Prove Corollary 1.6 using Theorem 1.5.

Exercise. Use Corollary 1.6 to show that A ∈ Rn,n is symmetric if and only if there
exists an orthogonal matrix Q ∈ Rn,n such that QT AQ is diagonal, i.e., A can be
orthogonally diagonalized.

Exercise. Analogously to Corollary 1.4, show that a symmetric matrix A ∈ Rn,n is positive definite (semidefinite) if and only if all its eigenvalues are positive (nonnegative).

Exercise. What is the real Schur form of an orthogonal matrix A ∈ Rn,n ?

Chapter 2

The existence of real division algebras

If K is any field, then K n,n is a K-vector space of dimension n2 . In addition to the


vector space operations (matrix addition and scalar multiplication) we have the matrix
multiplication, which is distributive with respect to the vector space operations. Thus,
the algebraic structure (K n,n , +, ·, ∗) is an algebra 1 .
In addition to its non-commutativity, a remarkable property of the matrix multiplication is that for given matrices A ≠ 0 and B we cannot always solve matrix equations of the form AC = B or DA = B. For example, if A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} and B = I_2, then there exist no C, D ∈ K^{2,2} such that AC = B or DA = B. But if B = A, then AC = B for C = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} and DA = B for D = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, in addition to the trivial solutions C = D = I_2.
In this chapter we will analyze the existence of sub-algebras of K n,n , in which matrix
equations always have uniquely determined solutions. This analysis will reveal fundamental properties of number systems.

2.1 Algebras over a field


We begin with the formal definition of an algebra.
1
The word algebra has several different meanings in mathematics, including a large area of mathe-
matics itself. When used as a single word, an algebra usually refers to a certain mathematical structure,
i.e., a set with certain operations, such as the algebra over a field in Definition 2.1. Other examples of
algebras are the σ-algebra of sets, the max-plus algebra of the union of the real numbers and −∞, or
Lie algebras; see Remark 2.4.

Definition 2.1. An algebra over a field K is a K-vector space (V, +, ·) with an additional
multiplication
∗ : V × V → V, (x, y) 7→ x ∗ y,
which satisfies the distributive laws

(α · x + β · y) ∗ z = α · (x ∗ z) + β · (y ∗ z), x ∗ (α · y + β · z) = α · (x ∗ y) + β · (x ∗ z),

for all x, y, z ∈ V and α, β ∈ K. The algebra is called associative or commutative2 , if

x ∗ (y ∗ z) = (x ∗ y) ∗ z or x∗y =y∗x

hold for all x, y, z ∈ V , respectively. The dimension of the algebra is defined as the
dimension of the underlying K-vector space.
A nonzero element e ∈ V is called a unit element if e ∗ x = x ∗ e = x holds for all x ∈ V .
If a unit element e ∈ V exists, an element x ∈ V is called invertible when there exists
an element y ∈ V with x ∗ y = y ∗ x = e. The element y is called an inverse of x.
A nonzero element x ∈ V is called a zero divisor if there exists a nonzero element y ∈ V
such that x ∗ y = 0.

The trivial algebra V = {0} consists just of the zero element of the vector space V .
Without explicitly mentioning it, we will usually assume that an algebra we consider is
not the trivial algebra.
For simplicity of notation we will usually skip the multiplication signs, i.e., we will write
αx and xy instead of α · x and x ∗ y.

Lemma 2.2. If V is an algebra over a field K, then the following properties hold:

(1) Any multiplication with zero elements (from K or V ) yields the zero element in V .

(2) If V contains a unit element, then this element is unique.

(3) If V is associative and contains a unit element, and x ∈ V is invertible, then the
inverse of x is unique. We denote the inverse of x by x−1 .
2
The terms distributive and commutative (actually, the French distributives and commutatives) were
first used by François-Joseph Servois (1767–1847) in his Essai sur un nouveau mode d’exposition des
principes du calcul différential from 1814. The term associative is most likely due to Sir William Rowan
Hamilton (1805–1865), who used it first in the context of his work on quaternions in 1843 [31]; see
Section 2.2 below. Probably the first study of matrix commutativity was made by Arthur Cayley (1821–
1895) in 1858. He used the term convertible and derived “the general form the matrix L convertible
with a given matrix M of the order 2” [7, p. 29].

(4) If V is associative and contains a unit element, then an invertible element x ∈ V
cannot be a zero divisor, and vice versa.

Proof. (1) For each x ∈ V we have

0K · x = (0K + 0K ) · x = 0K · x + 0K · x ⇒ 0K · x = 0V ,
0V ∗ x = (0V + 0V ) ∗ x = 0V ∗ x + 0V ∗ x ⇒ 0V ∗ x = 0V ,
x ∗ 0V = x ∗ (0V + 0V ) = x ∗ 0V + x ∗ 0V ⇒ x ∗ 0V = 0V .

(2) If e ∈ V is a unit element, and ẽ ∈ V is another element with ẽ ∗ x = x ∗ ẽ = x for all x ∈ V, then ẽ = e ∗ ẽ = e.

(3) If y ∈ V is an inverse of x ∈ V, and ỹ ∈ V is another element with xỹ = ỹx = e, then the associativity of the multiplication implies that y = ye = y(xỹ) = (yx)ỹ = eỹ = ỹ.

(4) Suppose that an invertible element x ∈ V is a zero divisor. Then xy = 0 for some y ≠ 0, and a multiplication of both sides from the left with x^{-1} yields y = 0, which is a contradiction. Similarly, if a zero divisor x ∈ V is invertible, then xy = 0 for some y ≠ 0, and a multiplication of both sides from the left with x^{-1} again yields the contradiction y = 0.

Let us give some examples.

Example 2.3. (1) The field K itself is an algebra. Here the scalar multiplication is
identical to the multiplication of two elements of K. Another algebra is given by
K 1,1 , i.e., the 1 × 1 matrices over K. In this algebra there is a formal difference be-
tween the scalar and the matrix multiplication. Both algebras are one-dimensional,
contain a unit element (1 ∈ K and [1] ∈ K 1,1 ), they are associative and com-
mutative, and they do not contain zero divisors. In particular, (K 1,1 , +, ∗) is a
field. The algebras K 1,1 and K can be identified with one another via an algebra
isomorphism; see Definition 2.5 below.

(2) More generally, the vector space K n,n with the usual matrix multiplication is an
associative algebra of dimension n2 with unit element given by In . As discussed
in the introduction of this chapter, for n ≥ 2 this algebra is not commutative and
contains zero divisors. Hence (K n,n , +, ∗) for n ≥ 2 is not a field.

(3) The vector space of the upper (or the lower) triangular matrices in K n,n , n ≥ 2,
with the usual matrix multiplication is an associative and non-commutative algebra
of dimension (n2 + n)/2, with unit element given by In .

(4) If we take any set of matrices {A_1, . . . , A_m} ⊂ K^{n,n}, then V := span{A_1, . . . , A_m} is a vector space. However, V with the usual matrix multiplication may not be an algebra, since the product of two elements of V may not be in V. For example, if A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} and B = \begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix}, then AB = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} ∉ span{A, B}.

(5) For a fixed nonzero matrix A ∈ K n,n , the vector space of the polynomials in A,

V (A) := {p(A) : p ∈ K[t]} = span{Aj : j ≥ 0}, (2.1)

with the usual matrix multiplication is an algebra. This algebra is associative and
commutative, and the unit element is In . We know from the Cayley3 -Hamilton
Theorem that PA (A) = 0, and hence

An ∈ span{In , A, . . . , An−1 },

which implies that dim(V (A)) ≤ n. An interesting extension of this result will be
derived later; see (4.5) and the corresponding discussion.

(6) A matrix A ∈ K^{n,n} that has constant entries on all its 2n − 1 diagonals, i.e.,

    A = \begin{pmatrix} a_0 & a_1 & \cdots & \cdots & a_{n−1} \\ a_{−1} & a_0 & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & a_1 \\ a_{−(n−1)} & \cdots & \cdots & a_{−1} & a_0 \end{pmatrix},

is called a Toeplitz matrix. Using J_n(0), the Jordan block of size n with eigenvalue zero, a Toeplitz matrix can be written as

    A = \sum_{j=1}^{n−1} a_{−j} (J_n(0)^T)^j + \sum_{j=0}^{n−1} a_j J_n(0)^j.

When a−(n−1) = · · · = a−1 = 0, we have an upper triangular Toeplitz matrix. Each


such matrix is a polynomial in Jn (0), and hence the upper triangular Toeplitz ma-
trices form the associative and commutative algebra V (Jn (0)); see (2.1). Similarly,
the lower triangular Toeplitz matrices form the associative and commutative algebra
V (Jn (0)T ).
3
Cayley proved this theorem in 1858 for n = 2 and claimed that he had verified it for n = 3 [7]. He
did not feel it necessary to give a proof for general n. Hamilton proved the theorem for the case n = 4
in 1853 in the context of his investigations of quaternions. One of the first proofs for general matrices
was given by Ferdinand Georg Frobenius (1849–1917) in 1878 [20, §3] (see the comment on Lemma 2.7
below). James Joseph Sylvester (1814–1897) may have coined the name of the theorem in 1884 when
he called a result in this context the “no-little-marvelous Hamilton-Cayley theorem”[76].

(7) An infinite dimensional associative and commutative algebra is given by the vector
space K[t] with the usual multiplication of polynomials. This algebra contains a
unit element, namely the polynomial p = 1, and it does not contain zero divisors.

While no element of an algebra V can simultaneously be invertible and a zero divisor,


there may in general be elements that are neither invertible nor zero divisors. For
example, the algebra Q[t] contains no zero divisors, but the only invertible elements
in Q[t] are the constant nonzero polynomials. On the other hand, in the algebra K n,n
with the usual matrix multiplication, each nonzero element is either invertible or a zero
divisor. If a nonzero matrix A ∈ K n,n is not invertible, then 1 ≤ r := rank(A) < n, and
there exist X, Y ∈ GL_n(K) such that

    A = X \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Y;

see [LM, Theorem 5.11]. Then for every B := Y^{-1} \begin{pmatrix} 0 \\ Z \end{pmatrix} ∈ K^{n,n} with Z ∈ K^{n−r,n} we have AB = 0, and hence A is a zero divisor.
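This construction is easy to carry out explicitly. In the sketch below (NumPy) the factorization quoted from [LM] is replaced by the singular value decomposition, which is an assumption of this illustration rather than the construction used in the text: the columns of B are taken from the null space of a singular 3 × 3 example matrix, so that AB = 0 with B ≠ 0.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [1.0, 1.0, 1.0]])              # rank(A) = 2 < 3

    U, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12))                   # numerical rank
    N = Vh[r:, :].conj().T                       # orthonormal basis of ker(A)
    Z = np.ones((3 - r, 3))                      # arbitrary nonzero coefficient matrix
    B = N @ Z                                    # B in K^{3,3}, B != 0

    assert np.linalg.matrix_rank(A) == 2
    assert np.allclose(A @ B, 0) and not np.allclose(B, 0)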
If V is an associative algebra with a unit element e we define

    x^k := x ∗ · · · ∗ x  (k factors)   for k ≥ 1,   and   x^0 := e.

For p = α_k t^k + · · · + α_1 t + α_0 ∈ K[t] and x ∈ V we define p(x) := α_k · x^k + · · · + α_1 · x + α_0 · e.

Exercise. Show that if (V, +, ·, ∗) is an associative algebra, then (V, +, ∗) is a ring.

Exercise. Show that if (V, +, ·, ∗) is an associative algebra with a unit element and
p, q ∈ K[t], then (pq)(x) = p(x)q(x) holds for all x ∈ V .

Exercise. Show that the vector space R3 with the product

(α1 , α2 , α3 ) ∗ (β1 , β2 , β3 ) := (α2 β3 − α3 β2 , α3 β1 − α1 β3 , α1 β2 − α2 β1 )

is a non-associative and non-commutative algebra.


(This product is called the cross product or vector product in R3 .)

Exercise. Show that the vector space R^{n,n}, n ≥ 2, with the product A ∗ B := (1/2)(AB + BA), where AB and BA are the usual matrix products, is a commutative and non-associative algebra.

Remark 2.4. An important concept that is closely related to an algebra over a field is the
Lie algebra. This is a vector space V over a field K with an operation (multiplication)
[·, ·] : V × V → V called Lie bracket, which is bilinear, alternating (i.e., [x, x] = 0 for all
x ∈ V ) and satisfies the Jacobi identity

[x, [y, z]] + [z, [x, y]] + [y, [z, x]] = 0, for all x, y, z ∈ V .

Bilinearity and alternativity imply the anti-commutativity property

[x, y] = −[y, x], for all x, y ∈ V .

Note that if [x, [y, z]] = [[x, y], z] holds for some x, y, z ∈ V , then the Jacobi identity and
the anti-commutativity imply that [y, [z, x]] = 0, which shows that the Lie bracket is in
general not associative.

Exercise. Show that an associative algebra with the operation [x, y] := xy −yx, where
xy and yx are products in the associative algebra, is a Lie algebra.
(Thus, in particular, the associative algebra K n,n with [A, B] := AB − BA, where AB
and BA are the usual matrix products, is a Lie algebra.)

Exercise. Show that the skew symmetric matrices of K n,n with the operation [A, B] :=
AB − BA, where AB and BA are the usual matrix products, form a Lie algebra.

Exercise. Show that the vector space R3 with the cross product is a Lie algebra.

Of course, algebras like K and K 1,1 should be identified with each other. Formally,
such an identification is done via an algebra isomorphism, which is defined as follows.
Definition 2.5. Let V and W be two algebras over the same field K. A linear map
f : V 7→ W is called an algebra homomorphism if

f (xy) = f (x)f (y)

holds for all x, y ∈ V . If f is bijective, then f is called an algebra isomorphism, and the algebras are called isomorphic to one another.

In particular, if V = R1,1 and W = R, then the map defined by

f : V → W, [x] 7→ x, (2.2)

is an isomorphism between the one-dimensional (real) algebras V and W .


In general, if V and W are finite-dimensional algebras that are isomorphic to one another,
then dim(V ) = dim(W ).

Lemma 2.6. Suppose that the algebras V and W are isomorphic to one another. If the
multiplication in V is associative or commutative, then the multiplication in W is also
associative or commutative, respectively, and vice versa.

Proof. Let the multiplication in V be commutative, and let f : V → W be an algebra


isomorphism. Then for all z1 , z2 ∈ W there exist uniquely determined x1 , x2 ∈ V such
that z1 = f (x1 ) and z2 = f (x2 ), and we have

z1 z2 = f (x1 )f (x2 ) = f (x1 x2 ) = f (x2 x1 ) = f (x2 )f (x1 ) = z2 z1 .

Thus, the multiplication in W is commutative. The proof for associativity is similar.


Since f is bijective, associativity or commutativity of the multiplication in W now also
implies that the same property holds in V .

In (2.2) we have identified a real algebra of matrices and a real algebra of numbers with
one another. Both are one-dimensional, and, motivated by advice of Olga Taussky4 , we
now ask whether such an identification is also possible for two-dimensional real algebras.
As our candidate for the algebra of numbers we take, of course, the complex numbers,

W = C = {α + iβ : α, β ∈ R} = spanR {1, i}.

We also consider the two-dimensional real vector space of matrices

    V = span_R {E_0, E_1} = span_R { I_2, \begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix} }    (2.3)

with the usual matrix multiplication. We then have E_1^2 = −E_0, and hence for any scalars
α1 , β1 , α2 , β2 ∈ R we obtain

(α1 E0 + β1 E1 )(α2 E0 + β2 E1 ) = (α1 α2 − β1 β2 )E0 + (α1 β2 + β1 α2 )E1 . (2.4)

In particular, AB ∈ V holds for all matrices A, B ∈ V , so that (V, +, ·, ∗) is indeed


a two-dimensional real algebra, which is associative since the matrix multiplication is
associative.
The map
f : V → W, αE0 + βE1 7→ α + iβ,
is obviously linear. Moreover, f (αE0 + βE1 ) = 0 holds if and only if α = β = 0, which
shows that ker(f ) = {0}, and hence f is bijective.
4
“When you observe an interesting property of numbers, ask if perhaps you are not seeing, in the
1 × 1 case, an interesting property of matrices.” [81, p. 809]

Using (2.4) and the multiplication in C, for any Aj = αj E0 + βj E1 ∈ V , j = 1, 2, we
have

f (A1 A2 ) = f ((α1 α2 − β1 β2 )E0 + (α1 β2 + β1 α2 )E1 )


= (α1 α2 − β1 β2 ) + i(α1 β2 + β1 α2 )
= (α1 + iβ1 )(α2 + iβ2 )
= f (α1 E0 + β1 E1 )f (α2 E0 + β2 E1 )
= f (A1 )f (A2 ).

Thus, V and W are indeed isomorphic to one another. Since the multiplication in C
is commutative, Lemma 2.6 implies that the multiplication in V is commutative, which
can also easily be verified by a direct computation.
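This isomorphism is also easy to test numerically; in the following small sketch (NumPy) the two sample complex numbers are arbitrary illustrative choices.

    import numpy as np

    E0 = np.eye(2)
    E1 = np.array([[0.0, -1.0], [1.0, 0.0]])

    def f(z):
        # the map C -> V from (2.3)/(2.4): f(alpha + i beta) = alpha*E0 + beta*E1
        return z.real * E0 + z.imag * E1

    z1, z2 = 2.0 + 3.0j, -1.0 + 0.5j
    assert np.allclose(f(z1) @ f(z2), f(z1 * z2))     # f(z1 z2) = f(z1) f(z2)
    assert np.allclose(f(z1) @ f(z2), f(z2) @ f(z1))  # the matrix algebra V is commutative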
Each element of V is of the form

    A = αE_0 + βE_1 = \begin{pmatrix} α & −β \\ β & α \end{pmatrix}

for some scalars α, β ∈ R (depending on A). We have det(A) = α² + β² ≥ 0 with equality if and only if A = 0, and thus each nonzero matrix A ∈ V is invertible. The inverse of A then is given by

    A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} α & β \\ −β & α \end{pmatrix} = \frac{1}{\det(A)} (αE_0 − βE_1).

Analogously, for a complex number z = α + iβ we have |z|² = α² + β² ≥ 0, and if z ≠ 0, then

    z^{-1} = \frac{\bar{z}}{|z|^2} = \frac{α − iβ}{α^2 + β^2}.
In Section 2.2 we will answer the important question whether there exist any further
such examples of dimension larger than two.
Suppose that V is an associative finite dimensional algebra with the unit element e, and
let x ∈ V \{0} be arbitrary. Let m be the smallest possible integer such that the elements
e, x, . . . , xm−1 are linearly independent, and e, x, . . . , xm are linearly dependent. Clearly,
we have 1 ≤ m ≤ dim(V ). By construction,

xm ∈ span{e, x, . . . , xm−1 },

and hence xm = q(x) for some polynomial q ∈ K[t], which is uniquely determined since
e, x, . . . , xm−1 are linearly independent. We thus have

p(x) = 0 for p = tm − q ∈ K[t] with deg(q) ≤ m − 1.

Let p̃ ∈ K[t] be any other monic polynomial with p̃(x) = 0. Then deg(p̃) ≥ deg(p), since otherwise we would have a contradiction to the linear independence of e, x, . . . , x^{m−1}. If deg(p̃) = deg(p), then deg(p − p̃) < m, since both polynomials are monic. But then

    (p − p̃)(x) = p(x) − p̃(x) = 0,

which shows that p − p̃ = 0, since otherwise we would again have a contradiction to the linear independence of e, x, . . . , x^{m−1}. In summary, we have shown the following result.
Lemma 2.7. If V is a finite dimensional associative algebra with a unit element, then
for each nonzero element x ∈ V there exists a uniquely determined monic polynomial
p ∈ K[t] of smallest possible degree deg(p), where 1 ≤ deg(p) ≤ dim(V ), such that
p(x) = 0. This uniquely determined polynomial is called the minimal polynomial5 of x.

In the next example we consider the minimal polynomial of complex matrices.

Example 2.8. Consider the finite dimensional associative algebra Cn,n with the unit
element I. Then every A ∈ Cn,n has a Jordan decomposition A = SJS −1 , where the
Jordan canonical form J = diag(Jd1 (λ1 ), . . . , Jdm (λm )) is determined uniquely up to the
order of the diagonal blocks. For p ∈ C[t] we have p(A) = Sp(J)S −1 , and hence p(A) = 0
if and only if p(J) = 0. This holds if and only if p(Jdi (λi )) = 0 for all i = 1, . . . , m. It
can be shown that
    p(J_{d_i}(λ_i)) = \sum_{j=0}^{\deg(p)} \frac{p^{(j)}(λ_i)}{j!} J_{d_i}(0)^j,    (2.5)

where p(j) (λi ) is the jth derivative of p evaluated at λi . Thus, p(Jdi (λi )) = 0 holds if
and only if λi is a zero of p with multiplicity at least di . The uniquely determined monic
polynomial p ∈ C[t] of smallest possible degree with p(Jdi (λi )) = 0 therefore is given by
p = (t − λ_i)^{d_i}. Consequently, if λ̃_1, . . . , λ̃_k are the distinct eigenvalues of A, and d̃_1, . . . , d̃_k are the sizes of the largest corresponding Jordan blocks, then the minimal polynomial of A is given by

    M_A = \prod_{i=1}^{k} (t − λ̃_i)^{d̃_i}.

Obviously, M_A divides the characteristic polynomial

    P_A = \prod_{i=1}^{m} (t − λ_i)^{d_i},
5
Frobenius introduced the concept of the minimal polynomial in 1878 [20, §3]. In this context
he showed that the minimal polynomial divides the characteristic polynomial, and thus proved the
Cayley-Hamilton Theorem, without explicitly mentioning Cayley or Hamilton. In 1896 he gave another
proof [21, §1], and he then cited Cayler’s paper from 1858 [7] as the first where this fundamental theorem
(“Fundamentalsatz”) was published.

and we have M_A = P_A if and only if for each of the eigenvalues of A there is exactly
one Jordan block, or, equivalently, each eigenvalue has geometric multiplicity one. Such
a matrix A is called nonderogatory6 (cf. Definition 4.3).
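For a matrix whose Jordan structure is known, this formula can be checked directly. In the sketch below (NumPy/SciPy; the block sizes and eigenvalues are an arbitrary example, not from the text) A is built from the Jordan blocks J_2(3), J_1(3) and J_2(5), so that M_A = (t−3)²(t−5)², while P_A = (t−3)³(t−5)².

    import numpy as np
    from scipy.linalg import block_diag

    def jordan_block(lmbda, d):
        # d x d Jordan block with eigenvalue lmbda
        return lmbda * np.eye(d) + np.eye(d, k=1)

    A = block_diag(jordan_block(3.0, 2), jordan_block(3.0, 1), jordan_block(5.0, 2))
    I = np.eye(5)

    M_A_of_A = np.linalg.matrix_power(A - 3 * I, 2) @ np.linalg.matrix_power(A - 5 * I, 2)
    too_small = (A - 3 * I) @ np.linalg.matrix_power(A - 5 * I, 2)

    assert np.allclose(M_A_of_A, 0)           # (t-3)^2 (t-5)^2 annihilates A
    assert not np.allclose(too_small, 0)      # (t-3)(t-5)^2 does not, so deg(M_A) = 4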

Exercise. Prove the equality (2.5).

Suppose that V is an n-dimensional vector space with the basis {v_1, . . . , v_n}. For any two elements x = \sum_{i=1}^{n} α_i v_i and y = \sum_{j=1}^{n} β_j v_j we then define the product

    xy := \sum_{i,j=1}^{n} (α_i β_j)(v_i v_j).    (2.6)

By defining the n2 products vi vj as certain vectors in V we obtain a multiplication that


satisfies, by construction, the distributive laws stated in Definition 2.1. In this way any
finite dimensional vector space yields an algebra. Such an algebra will be associative
or commutative if and only if the respective rules are valid for all products of the basis
vectors, i.e., if and only if
(vi vj )vk = vi (vj vk ) or vi vj = vj vi
hold for all i, j, k = 1, . . . , n, respectively.
Let us construct examples of algebras of matrices based on this idea.
Example 2.9. We start with the vector space (K n,n , +, ·) with the standard basis
{Eij = ei eTj : i, j = 1, . . . , n}. (2.7)
For all i, j, k, ℓ = 1, . . . , n we define the products of the basis vectors as

    E_{ij} ∗ E_{kℓ} := \begin{cases} E_{iℓ}, & j = k, \\ 0, & j ≠ k. \end{cases}

Using this in (2.6), the product of two matrices A = [a_{ij}] and B = [b_{ij}] is then given by

    A ∗ B = \Big( \sum_{i,j=1}^{n} a_{ij} E_{ij} \Big) \Big( \sum_{k,ℓ=1}^{n} b_{kℓ} E_{kℓ} \Big) = \sum_{i,j,k,ℓ=1}^{n} (a_{ij} b_{kℓ})(E_{ij} ∗ E_{kℓ}) = \sum_{i,j,ℓ=1}^{n} (a_{ij} b_{jℓ}) E_{iℓ}.

Thus, for all i, ℓ = 1, . . . , n the (i, ℓ) entry of A ∗ B is given by \sum_{j=1}^{n} a_{ij} b_{jℓ}, which shows that we have defined nothing but the usual matrix multiplication. The fact that this multiplication is not commutative is shown by the equations E_{ij} ∗ E_{jℓ} = E_{iℓ} ≠ 0 for all i, ℓ = 1, . . . , n, and E_{jℓ} ∗ E_{ij} = 0 whenever i ≠ ℓ.
6
Sylvester introduced the term derogatory (actually, the French dérogatoires) in his article “Sur les
quantités formant un groupe de nonions analogues aux quaternions de Hamilton” from 1884; see p. 157
in paper no. 18 in Volume 4 of his Collected Works.

Example 2.10. If we use the basis (2.7) and define the products of the basis vectors as

    E_{ij} ⊙ E_{kℓ} := \begin{cases} E_{ij}, & i = k and j = ℓ, \\ 0, & otherwise, \end{cases}

then the product of A = [a_{ij}] and B = [b_{ij}] is given by

    A ⊙ B = \Big( \sum_{i,j=1}^{n} a_{ij} E_{ij} \Big) ⊙ \Big( \sum_{k,ℓ=1}^{n} b_{kℓ} E_{kℓ} \Big) = \sum_{i,j,k,ℓ=1}^{n} (a_{ij} b_{kℓ})(E_{ij} ⊙ E_{kℓ}) = \sum_{i,j=1}^{n} (a_{ij} b_{ij}) E_{ij}.

We have thus obtained an algebra with the elementwise or Hadamard product of matrices. By definition the basis vectors satisfy E_{ij} ⊙ E_{kℓ} = E_{kℓ} ⊙ E_{ij}, so that the product ⊙ is commutative. (This is also clear from the commutativity of the multiplication in K, which immediately gives A ⊙ B = [a_{ij} b_{ij}] = [b_{ij} a_{ij}] = B ⊙ A.)

As above, suppose that {v_1, . . . , v_n} is a basis of the algebra V over the field K. The products v_j v_k are elements of V if and only if there exist a_{ijk} ∈ K, i, j, k = 1, . . . , n, such that

    v_j v_k = \sum_{i=1}^{n} a_{ijk} v_i.    (2.8)

The n³ scalars a_{ijk} define n matrices

    A_k := [a_{ijk}]_{i,j=1,...,n} ∈ K^{n,n},   k = 1, . . . , n.    (2.9)

Suppose that the multiplication is commutative, i.e., vj vk = vk vj for all j, k = 1, . . . , n.


From (2.8) we see that this holds if and only if

aijk = aikj , i, j, k = 1, . . . , n.

Using the distributivity and the commutativity of the multiplication we obtain

    (v_q v_r) v_s = \Big( \sum_{i=1}^{n} a_{iqr} v_i \Big) v_s = \sum_{i=1}^{n} a_{iqr} (v_i v_s) = \sum_{i=1}^{n} a_{iqr} \sum_{ℓ=1}^{n} a_{ℓis} v_ℓ = \sum_{ℓ=1}^{n} \Big( \sum_{i=1}^{n} a_{ℓis} a_{irq} \Big) v_ℓ,   and

    v_q (v_r v_s) = v_q \Big( \sum_{i=1}^{n} a_{irs} v_i \Big) = \sum_{i=1}^{n} a_{irs} (v_q v_i) = \sum_{i=1}^{n} a_{irs} \sum_{ℓ=1}^{n} a_{ℓqi} v_ℓ = \sum_{ℓ=1}^{n} \Big( \sum_{i=1}^{n} a_{ℓiq} a_{irs} \Big) v_ℓ.

The multiplication is associative, i.e., (v_q v_r) v_s = v_q (v_r v_s) for all q, r, s = 1, . . . , n, if and only if

    \sum_{i=1}^{n} a_{ℓis} a_{irq} = \sum_{i=1}^{n} a_{ℓiq} a_{irs},   or equivalently   (A_s A_q)_{ℓr} = (A_q A_s)_{ℓr},

for all ℓ, q, r, s = 1, . . . , n. In short, A_s A_q = A_q A_s for all s, q = 1, . . . , n. We formalize


this observation in the following result.
Lemma 2.11. Suppose that V is an n-dimensional associative and commutative algebra
with the multiplication defined as in (2.6) and (2.8). Then the matrices A1 , . . . , An in
(2.9), which define the products of the basis elements, are pairwise commuting.
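Lemma 2.11 can be illustrated with a small commutative and associative algebra, for example K[t]/(t³) with the basis 1, t, t². The sketch below (NumPy; the choice of algebra and basis is only an illustrative assumption) computes the structure constants a_{ijk} from the multiplication rule and checks that the resulting matrices A_1, . . . , A_n commute pairwise.

    import numpy as np

    # structure constants of K[t]/(t^3) with basis v_1 = 1, v_2 = t, v_3 = t^2,
    # i.e. v_j v_k = v_{j+k-1} if j+k-1 <= 3, and 0 otherwise (0-based indices below)
    n = 3
    a = np.zeros((n, n, n))
    for j in range(n):
        for k in range(n):
            if j + k < n:
                a[j + k, j, k] = 1.0

    A = [a[:, :, k] for k in range(n)]       # A_k = [a_ijk]_{i,j} as in (2.9)
    for k in range(n):
        for l in range(n):
            assert np.allclose(A[k] @ A[l], A[l] @ A[k])   # pairwise commuting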

Apparently, the properties of associative and commutative algebras are inherently linked
with those of commuting matrices.
Results on the existence of such algebras, or more precisely hypercomplex number sys-
tems, which generalize the real and complex numbers, were published by Weierstraß7 in
1894 [85] and Dedekind in 1895 [9]. The connection to commuting matrices was known
to Frobenius, and motivated him to investigate their properties8 . His fundamental paper
from 1896 [21] will be addressed in detail in the Chapters 4 and 5 below. Interestingly,
Frobenius had completely characterized the existence of the most important classes of
associative algebras already in 1878 [20], and without exploring the connection to matrix
commutativity. The following section is mainly devoted to proving his characterization.

2.2 Division algebras


We will be interested in the existence of the following class of algebras.
Definition 2.12. An algebra V is called a division algebra when for each x ∈ V \ {0}
and y ∈ V the two equations xa = y and bx = y have uniquely determined solutions
a, b ∈ V .

Note that a division algebra does not contain zero divisors, since a zero divisor x ∈ V \{0}
would simultaneously satisfy the two equations xy = 0 for some y ∈ V \{0} and x∗0 = 0,
contradicting the uniqueness of the solution of xa = 0.
7
Karl Theodor Wilhelm Weierstraß (1815–1897)
8
Frobenius wrote to Richard Dedekind (1831–1916) on July 15, 1896: “Everything of a hypercomplex
nature ultimately reduces in reality to the composition of matrices.” (Quoted in [33, p. 528].) According
to MacDuffee [51, p. 4], “the concept of a matrix as a hypercomplex number is due in essence to Hamilton
but more directly to Cayley.” In more modern terms, Cayley established in his fundamental paper from
1858 [7] the idea of the matrix as an independent algebraic object, and thus significantly contributed to
establishing matrix theory as a field of research.

Let us prove some basic properties of finite dimensional associative division algebras.

Lemma 2.13. If V is a finite dimensional associative division algebra over a field K,


then the following properties hold:

(1) There exists a (uniquely determined) unit element e ∈ V .

(2) If K is algebraically closed, then the minimal polynomial of each x ∈ V \ {0} (cf.
Lemma 2.7) is of degree 1, and hence V and K are isomorphic to one another.

Proof. (1) We choose an arbitrary x ∈ V \ {0}, then by definition there exists a uniquely
determined ax ∈ V such that ax x = x. We have ax 6= 0 since x 6= 0. Moreover, since V
is associative,
    a_x x = a_x (a_x x) = a_x^2 x   ⇔   (a_x^2 − a_x) x = 0,

and since V does not contain zero divisors, we must have a_x^2 = a_x. For each y ∈ V we therefore obtain

    a_x (a_x y − y) = a_x^2 y − a_x y = 0,

which implies that a_x y = y. Similarly,

    (y a_x − y) a_x = y a_x^2 − y a_x = 0,

which implies that yax = y. Consequently, ax is the (uniquely determined) unit element
in V .
(2) Let x ∈ V \ {0} be arbitrary, and let p ∈ K[t] be the minimal polynomial of x. Since
K is algebraically closed, we have

p = (t − λ1 ) · · · (t − λm )

for some scalars λ1 , . . . , λm ∈ K, and hence

p(x) = (x − λ1 · e) · · · (x − λm · e) = 0.

Since p is the minimal polynomial and V does not contain zero divisors, we must have
m = 1. The equation x − λ1 · e = 0 then shows that every (nonzero) element x ∈ V can
be identified with a uniquely determined (nonzero) element λ1 ∈ K, and vice versa. The
corresponding map is obviously linear and bijective, so that V and K are isomorphic to one
another.

Exercise. Investigate whether the algebra (K n,n , +, ·, ⊙) with the Hadamard product
is a division algebra.

Exercise. Investigate whether the algebra (K n,n , +, ·, ∗), n ≥ 2, with the product
A ∗ B := (1/2)(AB + BA), where AB and BA are the usual matrix products, is a division
algebra.

Exercise. Show that if (V, +, ·, ∗) is an associative division algebra, then (V \{0}, ∗) is


a group. Conclude that if additionally the multiplication is commutative, then (V, +, ∗)
is a field.

The associative and commutative algebras V = spanR {[1]} and V = spanR {E0 , E1 }
in (2.3) that we studied in Section 2.1 are both division algebras, since each nonzero
element in these algebras is invertible. Following the classical argument of Frobenius
from 1878 [20, §14], we will now analyze if there exist such algebras of dimension larger
than two. This analysis will also show whether there exist any number fields beyond the
real and the complex numbers.
Let E0 , E1 , . . . , Em ∈ Rn,n be linearly independent matrices and consider the real vector
space
V = spanR {E0 , E1 , . . . , Em } ⊆ Rn,n ,
which is of dimension m + 1. Let us assume that we have defined a multiplication so that
V is an associative division algebra with the unit element E0 . Thus, E0 A = AE0 = A
holds for all A ∈ V . Our two examples show that the construction of such an algebra is
possible for m = 0 and m = 1.
The following important result holds for the minimal polynomials of the elements of V .

Lemma 2.14. For each nonzero matrix A ∈ V the minimal polynomial MA ∈ R[t]
satisfies 1 ≤ deg(MA ) ≤ 2, and deg(MA ) = 1 holds if and only if A = αE0 for some
nonzero α ∈ R.

Proof. Let A ∈ V \ {0} be arbitrary. If deg(MA ) ≥ 3, then the uniquely determined


monic polynomial MA would decompose over R into k ≥ 2 monic irreducible factors of
degree at most 2, say MA = p1 · · · pk with pj ∈ R[t] and 1 ≤ deg(pj ) ≤ 2 < deg(MA ) for
j = 1, . . . , k. But then
0 = MA (A) = p1 (A) · · · pk (A).
Since MA is a monic polynomial of smallest possible degree which annihilates A, we
have pj (A) 6= 0 for j = 1, . . . , k. This is a contradiction, since V does not contain zero
divisors. Consequently, 1 ≤ deg(MA ) ≤ 2.
If deg(MA ) = 1, then MA = t − α for some α ∈ R \ {0}, and 0 = MA (A) = A − αE0
implies A = αE0 . On the other hand, if A = αE0 with α 6= 0, then MA = t − α, which
shows that deg(MA ) = 1 if and only if A = αE0 for some α 6= 0.

Consider again the matrices E0 , E1 , . . . , Em which form a basis of the algebra V . Then
Lemma 2.14 implies that deg(ME0 ) = 1, and that for each other basis element Ej there
exist αj , βj ∈ R with

    M_{E_j} = t² − 2α_j t + (α_j² + β_j²),   j = 1, . . . , m.

Here β_j ≠ 0, since M_{E_j} must be irreducible over R. (Otherwise we would have a contradiction as in the proof of Lemma 2.14.) Using E_0 E_j = E_j E_0 = E_j we obtain

    M_{E_j}(E_j) = E_j² − 2α_j E_j + (α_j² + β_j²) E_0 = 0   ⇔   (E_j − α_j E_0)² = −β_j² E_0.

We define

    Ẽ_j := (1/β_j)(E_j − α_j E_0),   j = 1, . . . , m,

then

    V = span_R {E_0, E_1, . . . , E_m} = span_R {E_0, Ẽ_1, . . . , Ẽ_m},

and

    Ẽ_j^2 = −E_0,   j = 1, . . . , m.

We summarize the considerations above as follows.

Lemma 2.15. The real associative division algebra V constructed above is generated by linearly independent matrices E_0, Ẽ_1, . . . , Ẽ_m that satisfy Ẽ_j^2 = −E_0 for j = 1, . . . , m.

For simplicity of notation, let us denote the basis vectors of V again by E_j instead of Ẽ_j, with the understanding that still E_j^2 = −E_0 for j = 1, . . . , m.
It remains to check whether there exist any examples beyond the cases m = 0 and the
case m = 1. Here the constraint Ej2 = −E0 turns out to be very restrictive. Let us
consider the different possibilities:
If m = 2, then E1 E2 ∈ V , hence E1 E2 = αE0 + βE1 + γE2 for some α, β, γ ∈ R. If we
multiply this equation from the left with E1 and use E12 = −E0 we get

−E2 = αE1 − βE0 + γE1 E2 ⇔ (αγ − β)E0 + (βγ + α)E1 + (1 + γ 2 )E2 = 0.

Since E0 , E1 , E2 are linearly independent, we must in particular have 1 + γ 2 = 0, but


this is impossible since γ ∈ R. Thus, there exists no such algebra with m = 2.
Now suppose that the real associative division algebra V is generated by the linearly
independent matrices E0 , E1 , . . . , Em for some m ≥ 3. We know that deg(MA ) = 2 holds
for each nonzero matrix A ∈ span_R {E_1, . . . , E_m}. Thus, if 1 ≤ k, ℓ ≤ m and k ≠ ℓ, then

    deg(M_{E_k+E_ℓ}) = deg(M_{E_k−E_ℓ}) = 2.

Hence there exist α1 , α2 , β1 , β2 ∈ R, depending on k, `, such that

(Ek + E` )2 + α1 (Ek + E` ) + β1 E0 = 0,
(Ek − E` )2 + α2 (Ek − E` ) + β2 E0 = 0.

Using that Ek2 = E`2 = −E0 these two equations can be written as

−E0 + Ek E` + E` Ek − E0 + α1 (Ek + E` ) + β1 E0 = 0, (2.10)


−E0 − Ek E` − E` Ek − E0 + α2 (Ek − E` ) + β2 E0 = 0.

Adding the two equations yields

(α1 + α2 )Ek + (α1 − α2 )E` + (β1 + β2 − 4)E0 = 0.

Since E0 , Ek , E` are linearly independent we must have α1 + α2 = 0 and α1 − α2 = 0,


which yields α1 = α2 = 0. Consequently, (2.10) yields

Ek E` + E` Ek = (2 − β1 )E0 =: 2sk` E0 ,

for some sk` ∈ R, where k, ` = 1, . . . , m and k 6= `. Note that sk` = s`k . Since Ek2 = −E0 ,
the equation holds for all pairs k, ` when we set skk := −1 for k = 1, . . . , m.
By construction, the matrix S = [sk` ] ∈ Rm,m is symmetric and trace(S) = −m. Now
let g = [γ1 , . . . , γm ]T ∈ Rm be arbitrary and let A = Σ_{k=1}^{m} γk Ek . Then

A² = ( Σ_{k=1}^{m} γk Ek )( Σ_{`=1}^{m} γ` E` ) = Σ_{k,`=1}^{m} γk γ` Ek E` .

In this sum we have for each pair k, ` the terms

γk γ` Ek E` + γ` γk E` Ek = γk γ` (Ek E` + E` Ek ) = γk γ` (2sk` E0 ) = γk γ` (sk` + s`k )E0


= (γk γ` sk` + γ` γk s`k )E0 .

Therefore,
A² = ( Σ_{k,`=1}^{m} sk` γk γ` ) E0 = (g T Sg)E0 .

Since V does not contain zero divisors, A2 = 0 holds if and only if A = 0, and thus
g T Sg = 0 holds if and only if g = 0. This shows that the symmetric matrix S is definite,
and from trace(S) = −m we see that S is negative definite. Thus, there exists a matrix
Z = [zij ] ∈ GLm (R) with Z T SZ = −Im , where on the left hand side we have the usual
matrix multiplication. We now define the matrices
Jk := Σ_{i=1}^{m} zik Ei ,   k = 1, . . . , m.

Since Z is invertible (with respect to the usual matrix multiplication), the matrices
J1 , . . . , Jm are linearly independent. Moreover, we have V = spanR {E0 , J1 , . . . , Jm }. By
construction,
Jk J` = Σ_{i,j=1}^{m} zik zj` Ei Ej ,   k, ` = 1, . . . , m,

and hence
Jk J` + J` Jk = Σ_{i,j=1}^{m} (zik zj` + zi` zjk )Ei Ej =: Σ_{i,j=1}^{m} βij(k`) Ei Ej   (where βij(k`) = βji(k`))
= Σ_{i=1}^{m} βii(k`) Ei² + Σ_{1≤i<j≤m} βij(k`) (Ei Ej + Ej Ei )
= ( Σ_{i=1}^{m} βii(k`) sii ) E0 + ( Σ_{1≤i<j≤m} 2βij(k`) sij ) E0
= ( Σ_{i=1}^{m} βii(k`) sii ) E0 + ( Σ_{1≤i<j≤m} (βij(k`) sij + βji(k`) sji ) ) E0
= ( Σ_{i,j=1}^{m} sij βij(k`) ) E0
= ( Σ_{i,j=1}^{m} (zik sij zj` + zi` sij zjk ) ) E0
= (zkT Sz` + z`T Szk ) E0 = 0 if k ≠ `, and = −2E0 if k = `.

In summary, the algebra V is generated by matrices E0 , J1 , . . . , Jm that satisfy

Jk2 = −E0 , k = 1, . . . , m, Jk J` = −J` Jk , k, ` = 1, . . . , m, k 6= `. (2.11)

The property Jk J` = −J` Jk is called anti-commutativity; cf. Example 2.4 and the
discussion following Theorem 5.11 below.
The case m = 3 is possible, as shown by the following example.
Example 2.16. Consider V = spanR {E0 , J1 , J2 , J3 } with the usual (associative) matrix
multiplication, where E0 = I4 and
     
         [ 0  1  0  0 ]        [ 0  0  1  0 ]        [ 0  0  0  1 ]
    J1 = [−1  0  0  0 ],  J2 = [ 0  0  0  1 ],  J3 = [ 0  0 −1  0 ].
         [ 0  0  0 −1 ]        [−1  0  0  0 ]        [ 0  1  0  0 ]
         [ 0  0  1  0 ]        [ 0 −1  0  0 ]        [−1  0  0  0 ]

Elementary computations show that these (skew-symmetric) matrices satisfy the equa-
tions

J12 = J22 = J32 = J1 J2 J3 = −E0 , and


J1 J2 = −J2 J1 = J3 , J1 J3 = −J3 J1 = −J2 , J2 J3 = −J3 J2 = J1 .

Note that these equations yield the following (skew-symmetric) multiplication table:

         J1     J2     J3
   J1   −E0     J3    −J2
   J2   −J3    −E0     J1
   J3    J2    −J1    −E0

Each matrix A ∈ V = span{I4 , J1 , J2 , J3 } is of the form

A = αE0 + βJ1 + γJ2 + δJ3

for some α, β, γ, δ ∈ R (depending on A). If we define


Ã := αE0 − βJ1 − γJ2 − δJ3 ,

then an elementary computation using the multiplication table yields


AÃ = (α² + β² + γ² + δ²)E0 ,

and hence each nonzero A ∈ V is invertible with


A−1 = (1/(α² + β² + γ² + δ²)) Ã .
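The relations in Example 2.16 are easy to check by a small computation. The following sketch in Python/NumPy (added here purely as an illustration; the variable names are ad hoc and the coefficients α, β, γ, δ are arbitrary) verifies the multiplication table and the inverse formula.

import numpy as np

E0 = np.eye(4)
J1 = np.array([[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]], dtype=float)
J2 = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]], dtype=float)
J3 = np.array([[0, 0, 0, 1], [0, 0, -1, 0], [0, 1, 0, 0], [-1, 0, 0, 0]], dtype=float)

# Defining relations and multiplication table of Example 2.16.
assert np.allclose(J1 @ J1, -E0) and np.allclose(J2 @ J2, -E0) and np.allclose(J3 @ J3, -E0)
assert np.allclose(J1 @ J2 @ J3, -E0)
assert np.allclose(J1 @ J2, J3) and np.allclose(J2 @ J3, J1) and np.allclose(J3 @ J1, J2)

# Inverse formula for an arbitrary nonzero element A = alpha*E0 + beta*J1 + gamma*J2 + delta*J3.
alpha, beta, gamma, delta = 1.0, -2.0, 0.5, 3.0
A = alpha * E0 + beta * J1 + gamma * J2 + delta * J3
A_tilde = alpha * E0 - beta * J1 - gamma * J2 - delta * J3
norm2 = alpha**2 + beta**2 + gamma**2 + delta**2
assert np.allclose(A @ A_tilde, norm2 * E0)
assert np.allclose(np.linalg.inv(A), A_tilde / norm2)
print("all relations of Example 2.16 verified")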

Exercise. Investigate whether the (real) vector space V = span{I4 , J1 , J2 , J3 } in


Example 2.16 with the operation [A, B] := AB − BA, where AB and BA are the usual
matrix products, forms a Lie algebra; cf. Remark 2.4.

The associative (but non-commutative) division algebra V constructed in Example 2.16


is isomorphic to the algebra of the quaternions 9 , which is usually denoted by H =
spanR {1, i, j, k}.
9 The quaternions were first discovered by Hamilton while crossing the Royal Canal in Dublin, Ireland,
on the Brougham Bridge. Today there is a commemorative plaque on the bridge that reads: “Here as he
walked by on the 16th of October 1843 Sir William Rowan Hamilton in a flash of genius discovered the
fundamental formula for quaternion multiplication i2 = j 2 = k 2 = ijk = −1 & cut it on a stone of this
bridge.” The book [13] contains many historical details particularly on this episode, and the following
remark on p. 222: “Had Hamilton known of Frobenius’s theorem, he would have been spared years of
hard work in his fruitless search for three-dimensional associative division algebras.”

If z = α + βi + γj + δk is a quaternion, then its corresponding conjugate quaternion is
defined as z̄ := α − βi − γj − δk, its norm is |z| := (z z̄)^{1/2} = (α² + β² + γ² + δ²)^{1/2} ,
and if z ≠ 0, its inverse quaternion is given by z−1 = z̄/|z|² . While this looks like a
straightforward generalization of the complex numbers C, the essential difficulty in this
generalization is the non-commutativity of the multiplication in H.
The set of matrices {E0 , J1 , J2 , J3 } in Example 2.16 is not the only one that generates
an algebra that is isomorphic to H. As shown in [17], there are 48 distinct ordered
sets of linearly independent 4×4 signed permutation matrices (i.e. permutation matrices
where 1 may be replaced by −1) that form a basis of a division algebra isomorphic to H.
Instead of real 4 × 4 matrices we can also use complex 2 × 2 matrices for constructing a
real division algebra that is isomorphic to H.
Example 2.17. Consider the real vector space V = spanR {E0 , J1 , J2 , J3 } with the com-
plex matrices
     
E0 = I2 ,   J1 = [ i  0 ; 0  −i ] ,   J2 = [ 0  −1 ; 1  0 ] ,   J3 = [ 0  −i ; −i  0 ] ,
and the usual matrix multiplication in C2,2 . Now A ∈ V is of the form
 
A = αE0 + βJ1 + γJ2 + δJ3 = [ α + iβ   −γ − iδ ; γ − iδ   α − iβ ]
for some α, β, γ, δ ∈ R (depending on A). Hence
det(A) = (α + iβ)(α − iβ) + (γ + iδ)(γ − iδ) = α2 + β 2 + γ 2 + δ 2 ,
which again shows that every nonzero matrix A ∈ V is invertible.

Our next goal is to show that no such real associative division algebra exists for m ≥ 4, i.e., in dimension m + 1 ≥ 5.
Suppose that V = spanR {E0 , J1 , . . . , Jm } for some m ≥ 4, where E0 , J1 , . . . , Jm are
linearly independent and satisfy (2.11). Consider J1 , J2 , Jk for any k ∈ {3, . . . , m}, then
(J1 J2 Jk )2 = (J1 J2 Jk )(J1 J2 Jk ) = J1 J2 (Jk J1 )J2 Jk = −J1 J2 (J1 Jk )J2 Jk = J1 J2 J1 J2 Jk Jk
= −J1 J2 J1 J2 = J1 J1 J2 J2 = E0 ,
which can be written as
(J1 J2 Jk − E0 )(J1 J2 Jk + E0 ) = 0.
Since V does not contain zero divisors, one of the two factors on the left hand side
must be zero, and hence J1 J2 Jk = ±E0 . A multiplication from the right with Jk yields
J1 J2 = ∓Jk , so that
spanR {J1 J2 } = spanR {Jk } for all k ≥ 3.

Thus, in particular, spanR {J3 } = spanR {J4 }, which contradicts the linear independence
of J3 , J4 .
In summary, we have shown the following fundamental result of Frobenius [20, §14].
Theorem 2.18. Each finite dimensional real associative division algebra is isomorphic to
one of the following: the real numbers R, the complex numbers C, or the quaternions H.
Thus, in particular, a finite dimensional associative real division algebra has dimension
1, 2, or 4. Among these, the multiplication is commutative only in the first two cases.

One can now ask whether there exist finite dimensional real commutative division al-
gebras which are non-associative. An example is given by the 2-dimensional algebra
over R given by the complex numbers C with the additional multiplication defined by
z1 ∗ z2 := z̄1 z̄2 (the complex conjugate of the usual product z1 z2 ). We have
z1 ∗ z2 = z̄1 z̄2 = z̄2 z̄1 = z2 ∗ z1 ,
and the non-associativity is shown, for example, by
i ∗ (i ∗ 1) = i ∗ (−i) = 1 and (i ∗ i) ∗ 1 = (−1) ∗ 1 = −1.
Also note that this algebra does not have a unit element. If there were such an element,
it would be uniquely determined, and it would satisfy z ∗ e = z̄ ē = z, i.e., ē = z/z̄, for
all z ∈ C, which is clearly impossible.
Because of the following result of Hopf from 1940 [38], the previous example of a com-
mutative division algebra is already a typical one.
Theorem 2.19. Each finite dimensional real commutative division algebra has dimen-
sion at most two.

Hopf’s proof is based on sophisticated methods from topology, and until today there
seems to exist no purely algebraic proof of this algebraic statement. This situation has
been called the “topological thorn in the flesh of algebra” [13, p. 223].
Hopf also showed that the dimension of a real division algebra always must be a power
of two. An example of a (non-associative and non-commutative) real division algebra of
dimension 23 = 8, the octonions 10 , was discovered independently by Hamilton’s friend
Graves11 in 1843 and by Cayley in 1845. The search for real division algebras came to
an end more than 100 years later with the following result of Milnor from 1958 [55].
Theorem 2.20. Each finite dimensional real division algebra has dimension 1,2,4, or 8.
10 While this non-associative algebra appears to be rather obscure at first sight, it is still of great
interest particularly because of its applications in theoretical physics; see, e.g., [3] or
https://www.quantamagazine.org/the-octonion-math-that-could-underpin-physics-20180720
11 John Thomas Graves (1806–1870)

Chapter 3

Some analytic properties of commuting complex matrices

In this chapter we will study analytic properties of commuting complex matrices. In the
first part of the chapter we will derive results about the field of values and the numerical
radius of products of commuting matrices (Section 3.1). Since the field of values and
the numerical radius are not studied in [LM], we here also survey and prove their most
important properties for completeness. In the second part of the chapter we will study
the exponential function of commuting matrices (Section 3.2).

3.1 Field of values results for products of commuting matrices
We start with the definition of the field of values of a matrix1 .
Definition 3.1. The field of values of a matrix A ∈ Cn,n is the set
F (A) := {xH Ax : x ∈ Cn , kxk2 = 1}.

We point out that the definition F (A) := {xH Ax : x ∈ Cn , kxk2 = 1} is also used for a
real matrix A. The reason will become apparent below.
 
Example 3.2. (1) If A = [ 1 0 ; 0 0 ], then for each x = [x1 , x2 ]T ∈ C2 we have xH Ax =
|x1 |2 . Since F (A) is the set of all these numbers under the constraint |x1 |2 +|x2 |2 =
1, we see that F (A) is the real interval [0, 1].
1 Toeplitz introduced these sets for bilinear forms and under the name Wertevorrat in 1918 [83].

 
(2) If A = [ 0 1 ; 0 0 ], then for each x = [x1 , x2 ]T ∈ C2 we have xH Ax = x̄1 x2 . Since
|x1 |2 + |x2 |2 = 1 we can parameterize

x1 = cos t eiϕ1 ,   x2 = sin t eiϕ2 ,

and obtain

xH Ax = x̄1 x2 = cos t sin t ei(ϕ2 −ϕ1 ) = (1/2) sin 2t ei(ϕ2 −ϕ1 ) .

Since t, ϕ1 , ϕ2 can be any real numbers, we see that F (A) is a disk centered at the
origin in the complex plane with radius 1/2.
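The two sets in Example 3.2 can also be computed numerically. The following Python/NumPy sketch (an illustration only; the function name fov_boundary is ad hoc) approximates boundary points of F (A) by using, for each angle θ, a unit eigenvector for the largest eigenvalue of the Hermitian part of e^{iθ}A. This standard characterization of the boundary is not derived in this text and is used here as an assumption of the sketch.

import numpy as np

def fov_boundary(A, num_angles=360):
    """Approximate boundary points of the field of values of A."""
    points = []
    for theta in np.linspace(0, 2 * np.pi, num_angles, endpoint=False):
        # Hermitian part of e^{i*theta} A; its top eigenvector yields a boundary point.
        H = (np.exp(1j * theta) * A + np.exp(-1j * theta) * A.conj().T) / 2
        eigvals, eigvecs = np.linalg.eigh(H)
        x = eigvecs[:, -1]                      # unit eigenvector for the largest eigenvalue
        points.append(x.conj() @ A @ x)         # the point x^H A x lies in F(A)
    return np.array(points)

# Example 3.2 (1): F(A) is the real interval [0, 1].
A1 = np.array([[1, 0], [0, 0]], dtype=complex)
b1 = fov_boundary(A1)
print(b1.real.min(), b1.real.max())             # approx. 0.0 and 1.0

# Example 3.2 (2): F(A) is the disk of radius 1/2 around the origin.
A2 = np.array([[0, 1], [0, 0]], dtype=complex)
print(np.abs(fov_boundary(A2)).max())           # approx. 0.5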

Let us collect some properties of the field of values:


(1) It is clear that F (A) is always compact (i.e. closed and bounded) since it is the range
of the continuous function x 7→ xH Ax on the compact set {x ∈ Cn : kxk2 = 1}.
(2) If λ is an eigenvalue of A ∈ Cn,n with a corresponding unit Euclidean norm eigenvector
x ∈ Cn , then xH Ax = λ, which shows that

F (A) contains all eigenvalues of A.

Note that a matrix A ∈ Rn,n in general has complex (non-real) eigenvalues and eigen-
vectors. In order to guarantee that the field of values of such a matrix contains all its
eigenvalues, the definition of F (A) needs to contain all unit Euclidean norm x ∈ Cn (and
not just the unit Euclidean norm x ∈ Rn ). The set {xT Ax : x ∈ Rn , kxk2 = 1} is
called the real field of values of the matrix A, and for a real matrix A this set is always
a subset of the real line.
(3) For any numbers α, β ∈ C and unit Euclidean norm vector x ∈ Cn we have xH (αA +
βIn )x = αxH Ax + β, and hence

F (αA + βIn ) = αF (A) + β.

Moreover, if B ∈ Cn,n , then xH (A + B)x = xH Ax + xH Bx, and hence

F (A + B) ⊆ F (A) + F (B).

The example
A = [ 1 0 ; 0 0 ] ,   B = [ 0 0 ; 0 1 ] ,
shows that a strict inclusion F (A + B) ⊂ F (A) + F (B) can occur.

(4) If U ∈ Cn,n is unitary, then xH U H AU x =: y H Ay, where y := U x ∈ Cn has unit
Euclidean norm, which shows that

F (U H AU ) = F (A).

(5) If A ∈ Cn,n is normal, then A can be unitarily diagonalized, U H AU = diag(λ1 , . . . , λn ),


and we obtain
F (A) = F (diag(λ1 , . . . , λn )) = { Σ_{j=1}^{n} λj |xj |² : Σ_{j=1}^{n} |xj |² = 1 }.

Thus, if A is normal, then F (A) is the convex hull of the eigenvalues of A. The converse,
however, is not true in general: There exist non-normal matrices A ∈ Cn,n for which F (A)
is equal to the convex hull of the eigenvalues of A. Moreover, as shown by A = [ 0 1 ; 0 0 ],
the eigenvalues of A, and hence their convex hull, can in general be contained in the
interior of F (A).

Exercise. Show that F (A) = {α} if and only if A = αIn .

 
Exercise. Show that the field of values of the matrix A = [ α β ; 0 γ ] ∈ C2,2 is an elliptical
disk with foci α and γ and minor axis length |β|.
(For Paul Halmos (1916–2006) this was “analytic geometry at its worst” [30, p. 113].)

Exercise. Show that


 
A = diag( 1, −1, i, −i, [ 0 1 ; 0 0 ] ) ∈ C6,6

is a non-normal matrix with F (A) equal to the convex hull of the eigenvalues of A.

The next result shows that the field of values of a product of commuting matrices is
“well behaved” when one of the matrices is Hermitian positive definite.
Theorem 3.3. If A, B ∈ Cn,n commute and A is Hermitian positive semidefinite, then

F (AB) ⊆ F (A)F (B) := {z1 z2 ∈ C : z1 ∈ F (A) and z2 ∈ F (B)}.

Proof. The Hermitian positive semidefinite matrix A can be unitarily diagonalized with
nonnegative real eigenvalues; see Corollary 1.3 (2) and Corollary 1.4. Thus, let A =
U DU H with U H U = In and D = diag(λ1 , . . . , λn ), where λi ≥ 0 for i = 1, . . . , n. We
define
D1/2 := diag(λ1^{1/2} , . . . , λn^{1/2} ) and A1/2 := U D1/2 U H ,
so that A1/2 is Hermitian positive semidefinite and satisfies A = A1/2 A1/2 .
If p ∈ C[t] is any (interpolation) polynomial with
p(λi ) = λi^{1/2} ,   i = 1, . . . , n,

then
p(A) = p(U DU H ) = U p(D)U H = U D1/2 U H = A1/2 .
Since the matrix B commutes with A, it also commutes with any polynomial in A, and
therefore in particular with A1/2 .
Let x ∈ Cn be any vector with kxk2 = 1. Then either A1/2 x = 0, which implies that
xH Ax = 0 ∈ F (A) and xH ABx = 0 ∈ F (A)F (B), or y := A1/2 x 6= 0 and we obtain

xH ABx = xH A1/2 A1/2 Bx = xH A1/2 BA1/2 x = y H By = (y H y) (y H By)/(y H y)
= (xH Ax) (y H By)/kyk22 ∈ F (A)F (B),
which is what we needed to show.

Note that the inclusion F (AB) ⊆ F (A)F (B) does not hold in general, as shown by the
matrices
A = [ 0 1 ; 0 0 ] ,   B = [ 0 0 ; 1 0 ] ,   AB = [ 1 0 ; 0 0 ] .          (3.1)
Here F (AB) = [0, 1], while F (A) and F (B) are both disks centered at the origin with
radius 1/2.
Toeplitz showed in 1918 [83] that for every matrix A ∈ Cn,n the boundary of F (A) is a
convex curve. Hausdorff 2 extended this result by proving in 1919 [32] that there are no
“holes” in the interior of this curve. This gives the Toeplitz–Hausdorff Theorem.

Theorem 3.4. The field of values of every matrix A ∈ Cn,n is convex.

Proof. Let A ∈ Cn,n be arbitrary. If F (A) is a single point, then this set trivially is
convex. Hence suppose that F (A) contains at least two points, z1 , z2 ∈ C. Since the
2 Felix Hausdorff (1868–1942)

convexity of a set is not altered by shifting, scaling, and rotation, and F (αA + βIn ) =
αF (A) + β for all α, β ∈ C, we can assume without loss of generality that z1 = 0 and
z2 = 1. It now suffices to prove that [0, 1] ⊆ F (A).
We write
A = H + iK,   H := (1/2)(A + AH ),   K := (1/(2i))(A − AH ).
Note that H and K are both Hermitian. Since 0, 1 ∈ F (A), we can find unit Euclidean
norm vectors x, y ∈ Cn with

xH Ax = 0 and y H Ay = 1. (3.2)

Then

0 = xH Ax = xH Hx + i xH Kx ,     1 = y H Ay = y H Hy + i y H Ky ,
where the four numbers xH Hx, xH Kx, y H Hy, y H Ky are all real,

which shows that


xH Hx = xH Kx = y H Ky = 0, y H Hy = 1. (3.3)
Moreover, we may assume without loss of generality that Re(xH Ky) = 0. This can be
done since we may always replace x by αx, |α| = 1, without changing any of the values
in (3.2)–(3.3).
The vectors x, y are linearly independent. In fact, if we had x = µy for some
µ ∈ C, where |µ| = 1 since x, y have unit Euclidean norm, then xH Ax = (µy)H A(µy) = |µ|2 y H Ay = 1,
which contradicts our assumption on x in (3.2). Since x, y are linearly independent, we
have
(1 − λ)x + λy 6= 0 for all λ ∈ [0, 1].
Hence we can define the unit Euclidean norm vector
z(λ) := ((1 − λ)x + λy) / k(1 − λ)x + λyk2    for all λ ∈ [0, 1].
We now obtain
z(λ)H Kz(λ) = ( ((1 − λ)x + λy)H K((1 − λ)x + λy) ) / k(1 − λ)x + λyk22
= ( |1 − λ|2 xH Kx + 2Re((1 − λ)λ xH Ky) + |λ|2 y H Ky ) / k(1 − λ)x + λyk22 = 0,
and therefore
z(λ)H Az(λ) = z(λ)H Hz(λ) =: f (λ).
Here f (λ) is a continuous and real valued function of the variable λ ∈ [0, 1] with f (0) = 0
and f (1) = 1 (cf. (3.2)). Thus, [0, 1] ⊆ f ([0, 1]) ⊆ F (A).

Since every eigenvalue of A ∈ Cn,n is contained in F (A), and F (A) is convex, we see that

F (A) contains the convex hull of the eigenvalues of A.

Exercise. Prove or disprove: A ∈ Cn,n is Hermitian if and only if F (A) is a finite real
interval.

Exercise. What can be said about the field of values of a unitary matrix?

For a block diagonal matrix A = diag(B, C) with square matrices B, C and a corre-
spondingly partitioned z = [xT , y T ]T ∈ Cn , where x 6= 0 6= y, we obtain

z H Az = kxk22 (xH Bx)/kxk22 + kyk22 (y H Cy)/kyk22 .

If 1 = kzk22 = kxk22 + kyk22 , then this equation shows that z H Az ∈ F (A) is a convex
combination of a point in F (B) and a point in F (C). This shows that F (A) is contained
in the convex hull of F (B) ∪ F (C). The reverse inclusion holds as well.

Exercise. Show that F (A) for A = diag(B, C) with square matrices B, C is equal to
the convex hull of F (B) ∪ F (C).

Definition 3.5. The numerical radius of A ∈ Cn,n is the real (nonnegative) number

ω(A) := max{|z| : z ∈ F (A)}.

In other words, the numerical radius of A is the radius of the smallest closed disk centered
at the origin in the complex plane that contains F (A).
The definition of the numerical radius easily extends from matrices to bounded linear
operators on a Hilbert space (possibly infinite dimensional). If A is such an operator,
we define ω(A) := sup{|hAx, xi| : kxk = 1}. Many of the properties we show below for
matrices hold for bounded operators as well; see, e.g., the survey article [26].

Exercise. Show that the map A 7→ ω(A) defines a norm on Cn,n .

Theorem 3.6. If A ∈ Cn,n , then

ρ(A) ≤ ω(A) ≤ kAk2 ≤ 2ω(A),

where ρ(A) := max{|λ| : λ ∈ σ(A)} is the spectral radius of A. In particular, if A is


normal, then ρ(A) = ω(A) = kAk2 .

Proof. The first inequality follows from the fact that every eigenvalue of A is contained
in F (A). Using the Cauchy–Schwarz3 inequality [LM, Theorem 12.5], we obtain for every
unit Euclidean norm vector x ∈ Cn that

|xH Ax| = |hAx, xi| ≤ kAxk2 kxk2 = kAxk2 ≤ kAk2 ,

and hence ω(A) ≤ kAk2 . For the last inequality we use

kAk2 = max { |hAx, yi| : kxk2 = kyk2 = 1 },

the parallelogram identity

kx + yk22 + kx − yk22 = 2(kxk22 + kyk22 ),

and the polarization identity

4hAx, yi = hA(x + y), (x + y)i − hA(x − y), (x − y)i


+ ihA(x + iy), (x + iy)i − ihA(x − iy), (x − iy)i,

which both hold for all x, y ∈ Cn [LM, Exercises 11.7 and 12.8]. This gives

4kAk2 = max_{kxk2 =kyk2 =1} 4|hAx, yi|
= max_{kxk2 =kyk2 =1} | hA(x + y), (x + y)i − hA(x − y), (x − y)i
                        + ihA(x + iy), (x + iy)i − ihA(x − iy), (x − iy)i |
≤ max_{kxk2 =kyk2 =1} ( |hA(x + y), (x + y)i| + |hA(x − y), (x − y)i|
                        + |hA(x + iy), (x + iy)i| + |hA(x − iy), (x − iy)i| )
≤ ω(A) max_{kxk2 =kyk2 =1} ( kx + yk22 + kx − yk22 + kx + iyk22 + kx − iyk22 )
= 4ω(A) max_{kxk2 =kyk2 =1} ( kxk22 + kyk22 )
= 8ω(A),

and hence kAk2 ≤ 2ω(A).


Finally, if A is normal, then F (A) is the convex hull of the eigenvalues of A, and hence
ρ(A) = ω(A). Moreover, in this case kAk2 = ρ(A).


For the matrix A = [ 0 1 ; 0 0 ] we have kAk2 = 1 and ω(A) = 1/2, which shows that the
constant 2 in Theorem 3.6 is the best possible.
3 Augustin-Louis Cauchy (1789–1857) and Hermann Amandus Schwarz (1843–1921)

A matrix A ∈ Cn,n is called spectral when ρ(A) = ω(A), and hence all normal matri-
ces are spectral. An example of a spectral matrix that is not normal is given in [26,
Example 2.1].
If A, B ∈ Cn,n , then the submultiplicativity of the matrix 2-norm and Theorem 3.6 yield

ω(AB) ≤ kABk2 ≤ kAk2 kBk2 ≤ (2ω(A)) (2ω(B)) = 4ω(A)ω(B).

This upper bound on ω(AB) can be attained, as shown by the matrices in (3.1), which
satisfy ω(A) = ω(B) = 1/2 and ω(AB) = 1. Consequently, the norm defined by the
numerical radius is not submultiplicative.
In general the submultiplicativity fails even when A and B commute, and even when
they are powers of the same matrix.

Example 3.7. This example from [26] uses the 4 × 4 Jordan block
 
A = [ 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ; 0 0 0 0 ] ,

for which ω(A) = (1 + √5)/4 ≈ 0.8090. There exist permutation matrices P1 , P2 such that

A2 = [ 0 0 1 0 ; 0 0 0 1 ; 0 0 0 0 ; 0 0 0 0 ] = P1 diag( [ 0 1 ; 0 0 ] , [ 0 1 ; 0 0 ] ) P1T ,

A3 = [ 0 0 0 1 ; 0 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ] = P2 diag( [ 0 1 ; 0 0 ] , [ 0 0 ; 0 0 ] ) P2T ,

and hence ω(A2 ) = ω(A3 ) = 1/2, which yields

ω(A3 ) = 1/2 > ω(A)ω(A2 ) ≈ 0.4045.
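The values in Example 3.7 can be reproduced numerically. The following Python/NumPy sketch (an illustration only; the function name is ad hoc) computes ω(A) by maximizing, over the rotation angle θ, the largest eigenvalue of the Hermitian part of e^{iθ}A. This characterization of the numerical radius is standard but not proved in this text, so the sketch uses it as an assumption.

import numpy as np

def numerical_radius(A, num_angles=2000):
    omega = 0.0
    for theta in np.linspace(0, 2 * np.pi, num_angles, endpoint=False):
        # Hermitian part of e^{i*theta} A; its largest eigenvalue bounds Re(e^{i*theta} x^H A x).
        H = (np.exp(1j * theta) * A + np.exp(-1j * theta) * A.conj().T) / 2
        omega = max(omega, np.linalg.eigvalsh(H)[-1])
    return omega

A = np.diag([1.0, 1.0, 1.0], k=1)           # the 4 x 4 Jordan block from Example 3.7
print(numerical_radius(A))                   # approx. 0.8090 = (1 + sqrt(5))/4
print(numerical_radius(A @ A))               # approx. 0.5
print(numerical_radius(A @ A @ A))           # approx. 0.5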

Although the numerical radius is in general not submultiplicative, it satisfies the power
inequality for any A ∈ Cn,n ,

ω(Am ) ≤ ω(A)m for all m ≥ 1.

According to Goldberg and Tadmor [26, p. 274], this inequality was conjectured by
Halmos and first proven by Berger in 1965. An elementary proof was given by Pearcy in
1966 [60]. We skip the proof here, but note the interesting observation that if ω(A) ≤ 1,
then Theorem 3.6 and the power inequality imply that

ω(Am ) ≤ kAm k2 ≤ 2ω(Am ) ≤ 2ω(A)m ≤ 2

for all m ≥ 1.
For commuting matrices we get the following result of Holbrook [36].

Theorem 3.8. If A, B ∈ Cn,n commute, then ω(AB) ≤ 2ω(A)ω(B).

Proof. Since the numerical radius defines a norm we have ω(αA) = |α| ω(A) for every
α ∈ C. We therefore can assume without loss of generality that ω(A) = ω(B) = 1. We
now have to show that ω(AB) ≤ 2.
Since AB = BA we have 4AB = (A + B)2 − (A − B)2 . Using the power inequality and
the fact that the numerical radius defines a norm, we now compute

4ω(AB) = ω(4AB)
= ω((A + B)2 − (A − B)2 )


≤ ω((A + B)2 ) + ω((A − B)2 )


≤ (ω(A + B))2 + (ω(A − B))2
≤ (ω(A) + ω(B))2 + (ω(A) + ω(B))2
= 8,

which completes the proof.

Holbrook [36] also gave an example of commuting A, B ∈ C4,4 for which the bound in
Theorem 3.8 is attained. Thus, the constant 2 is the best possible. Moreover, he showed
that if AB = BA and B is normal, then ω(AB) ≤ ω(A)ω(B) [36, Theorem 2.2]. This
inequality also holds when A, B “doubly commute”, i.e., AB = BA and AB H = B H A [36,
Theorem 3.4].

3.2 The exponential function of commuting matrices


If A, B ∈ Cn,n commute, then (AB)k = Ak B k for each k ≥ 0. Thus, if

lim_{k→∞} Ak = X and lim_{k→∞} B k = Y,

we have
lim_{k→∞} (AB)k = lim_{k→∞} Ak B k = ( lim_{k→∞} Ak )( lim_{k→∞} B k ) = XY.

On the other hand, if AB 6= BA, then limk→∞ (AB)k may be different from XY (or the
limit may even not exist).

Example 3.9. Consider the matrices


      
A = [ 0 1 ; 0 0 ] ,   B = [ 0 0 ; 1 0 ] ,   AB = [ 1 0 ; 0 0 ] ≠ BA = [ 0 0 ; 0 1 ] ,

then
lim_{k→∞} Ak = X = 0,   lim_{k→∞} B k = Y = 0,

while
AB = [ 1 0 ; 0 0 ] ,   lim_{k→∞} (AB)k = [ 1 0 ; 0 0 ] ≠ XY = 0.

Exercise. Find matrices A, B ∈ Cn,n for some n ≥ 2 such that limk→∞ Ak and
limk→∞ B k both exist, while limk→∞ (AB)k does not exist.

For each A ∈ Cn,n the matrix exponential function



exp(A) := Σ_{j=0}^{∞} (1/j!) Aj          (3.4)

is well defined [LM, Lemma 17.5]. Note that exp(0) = In .


For every S ∈ GLn (C) we have
S −1 exp(A)S = Σ_{j=0}^{∞} (1/j!) S −1 Aj S = Σ_{j=0}^{∞} (1/j!) (S −1 AS)j = exp(S −1 AS).

Thus, if U H AU = R = [rij ] is a Schur form of A (cf. Lemma 1.2), then



exp(R) = Σ_{j=0}^{∞} (1/j!) Rj = U H exp(A)U,

where the infinite sum is an upper triangular matrix with diagonal entries er11 , . . . , ernn .
These numbers are the eigenvalues of exp(A).
For commuting matrices we have the following extension of the well known property
ea eb = eb ea = ea+b of the scalar exponential function.

Lemma 3.10. If A, B ∈ Cn,n commute, then

exp(A) exp(B) = exp(B) exp(A) = exp(A + B).

In particular, exp(A) ∈ GLn (C) for every A ∈ Cn,n , and exp(A)−1 = exp(−A).

Proof. If A, B commute, then it is clear that the binomial formula


(A + B)k = Σ_{j=0}^{k} (k choose j) Aj B k−j

holds for each k ≥ 0. Using this and the Cauchy product formula yields
exp(A) exp(B) = ( Σ_{j=0}^{∞} (1/j!) Aj )( Σ_{`=0}^{∞} (1/`!) B ` ) = Σ_{j=0}^{∞} Σ_{`=0}^{j} (1/`!) A` (1/(j − `)!) B j−`
= Σ_{j=0}^{∞} (1/j!) Σ_{`=0}^{j} (j choose `) A` B j−` = Σ_{j=0}^{∞} (1/j!) (A + B)j
= exp(A + B).

Exchanging the roles of A and B we get exp(B) exp(A) = exp(B + A) = exp(A + B).
Since A and −A commute,

exp(A) exp(−A) = exp(−A) exp(A) = exp(0) = In ,

and hence exp(A) is invertible with exp(A)−1 = exp(−A).

If AB 6= BA, then in general exp(A) exp(B), exp(B) exp(A), and exp(A + B) may be
three different matrices.

Example 3.11. Consider A and B as in Example 3.9, then


   
exp(A) = [ 1 1 ; 0 1 ] ,   exp(B) = [ 1 0 ; 1 1 ] ,

exp(A) exp(B) = [ 2 1 ; 1 1 ] ≠ exp(B) exp(A) = [ 1 1 ; 1 2 ] ,

A + B = [ 0 1 ; 1 0 ] = S [ −1 0 ; 0 1 ] S −1 ,   S = S −1 := [ −c c ; c c ] ,   c := 1/√2 ,

exp(A + B) = S [ 1/e 0 ; 0 e ] S −1 ≈ [ 1.54 1.18 ; 1.18 1.54 ] .
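These matrices are easily checked with a few lines of Python using scipy.linalg.expm (an illustration only; the numerical values agree with those given above up to rounding).

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])

print(expm(A) @ expm(B))    # [[2, 1], [1, 1]]
print(expm(B) @ expm(A))    # [[1, 1], [1, 2]]
print(expm(A + B))          # approx. [[1.5431, 1.1752], [1.1752, 1.5431]]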

The examples in the next two exercises are taken from [86].

Exercise. Consider the complex 2 × 2 matrices


    
A = [ πi 0 ; 0 −πi ] ,   B1 = [ 0 1 ; 0 −2πi ] ,   B2 = [ 0 1 ; 0 0 ] .

(1) Show that AB1 6= B1 A, but exp(A) exp(B1 ) = exp(B1 ) exp(A) = exp(A + B1 ).
(2) Show that AB2 6= B2 A, but exp(A) exp(B2 ) = exp(B2 ) exp(A) 6= exp(A + B2 ).

Exercise. Consider the complex 2 × 2 matrices


   
A = [ α 0 ; 0 β ] ,   B = [ 0 0 ; 0 1 ] .

(1) Determine the values α, β with exp(A) exp(B) = exp(B) exp(A) = exp(A + B).
(2) Show that there exist α, β with exp(A) exp(B) 6= exp(B) exp(A) = exp(A + B).

We will now have a closer look at the exponential function of matrices that have dif-
ferentiable functions of a single (real or complex) variable as their entries. This in-
vestigation will also give some insight into properties of the product exp(A) exp(B) for
non-commuting matrices A, B ∈ Cn,n .
Suppose that A(t) = [aij (t)] is an (n × m)-matrix, where each entry is a differentiable
function aij (t) of the (real or complex) variable t. We define the derivative of A(t) with
respect to t entrywise, i.e.,
(d/dt) A(t) = A0 (t) := [a0ij (t)].
If A(t) = [aij (t)] and B(t) = [bij (t)] are two such matrix-functions of sizes n × m and
m × `, respectively, then
(A(t)B(t))0ij = ( Σ_{k=1}^{m} aik (t)bkj (t) )0 = Σ_{k=1}^{m} ( a0ik (t)bkj (t) + aik (t)b0kj (t) )
= Σ_{k=1}^{m} a0ik (t)bkj (t) + Σ_{k=1}^{m} aik (t)b0kj (t) = (A0 (t)B(t))ij + (A(t)B 0 (t))ij ,

so that we have the usual product rule of differentiation, i.e.,

(A(t)B(t))0 = A0 (t)B(t) + A(t)B 0 (t).

In particular, if B(t) ≡ B is constant, then (A(t)B(t))0 = (A(t)B)0 = A0 (t)B.

If A(t) is square, then its exponential function is defined as in (3.4), i.e., the constant
matrix A is simply replaced by A(t). The entrywise differentiation of exp(A(t)) gives the
same result as the termwise differentiation of the infinite series, i.e., we have

(d/dt) exp(A(t)) = exp(A(t))0 = Σ_{j=0}^{∞} (1/j!) (A(t)j )0 = Σ_{j=1}^{∞} (1/j!) (A(t)j )0 ,

where in the last expression we have used that (A(t)0 )0 = (In )0 = 0.


Because of the chain rule, the derivative of the scalar exponential function satisfies
(ea(t) )0 = a0 (t)ea(t) . The analogous rule for the derivative of exp(A(t)) requires an ad-
ditional assumption, as shown by the following result.
Lemma 3.12. In the notation established above,
exp(A(t))0 = A0 (t) exp(A(t)) = exp(A(t))A0 (t) (3.5)
holds if and only if A(t) and A0 (t) commute.

Proof. In this proof we write A instead of A(t) for brevity. We know that A and exp(A)
commute, i.e., A exp(A) = exp(A)A. Differentiating both sides of this equation and
using the product rule yields
A0 exp(A) + A exp(A)0 = exp(A)0 A + exp(A)A0 .
If (3.5) holds, then this equation simplifies to A exp(A)0 = exp(A)0 A. We now replace
exp(A)0 on the left and right hand sides by A0 exp(A), which yields
AA0 exp(A) = A0 exp(A)A.
Using again that A and exp(A) commute gives AA0 exp(A) = A0 A exp(A), and a multi-
plication from the right with exp(A)−1 = exp(−A) yields AA0 = A0 A.
On the other hand, suppose that A and A0 commute. We first show by induction that
then
(Aj )0 = jA0 Aj−1 , j = 1, 2, 3, . . . .
This equation obviously holds for j = 1. Suppose that it holds for some j ≥ 1, then
the product rule, the induction hypothesis, and the assumption that A and A0 commute
yield
(Aj+1 )0 = (Aj A)0 = (Aj )0 A + Aj A0 = jA0 Aj + A0 Aj = (j + 1)A0 Aj .
For the derivative of the exponential function of A we therefore obtain
exp(A)0 = Σ_{j=1}^{∞} (1/j!) (Aj )0 = Σ_{j=1}^{∞} (1/j!) jA0 Aj−1 = A0 Σ_{j=1}^{∞} (1/(j − 1)!) Aj−1 = A0 exp(A).

Finally, A0 exp(A) = exp(A)A0 since A0 and A commute.

In the simple case A(t) = tA for a fixed matrix A ∈ Cn,n we have A0 (t) = (tA)0 = A, and
hence A(t) and A0 (t) commute. Lemma 3.12 then reduces to the well known formula
exp(tA)0 = (tA)0 exp(tA) = A exp(tA); (3.6)
cf. [LM, pp. 264–265]. The function exp(tA) plays an important role in the solution of
linear homogeneous ordinary differential equation systems of the form
y 0 (t) = Ay(t), t ∈ [0, a], y(0) = y0 . (3.7)
As shown in [LM, Theorem 17.11], the function exp(tA)y0 is the uniquely determined
solution of (3.7).
For two given matrices A, B ∈ Cn,n we now define
f (t) := exp(tA)B exp(−tA) (3.8)
as a function of the (real or complex) variable t. By multiplying out the power series
expansions of exp(tA) and exp(−tA) and collecting equal powers of t we obtain the
representation
f (t) = Σ_{j=0}^{∞} (tj /j!) Gj ,
where the matrices Gj ∈ K n,n are certain sums of products of A and B. Clearly, G0 = B.
Moreover, using (3.6), the product rule, and the fact that A and exp(−tA) commute, we
obtain
f 0 (t) = A exp(tA)B exp(−tA) − exp(tA)BA exp(−tA)
= Af (t) − f (t)A = Σ_{j=0}^{∞} (tj /j!) (AGj − Gj A). (3.9)

On the other hand, differentiating the power series representation of f (t) termwise with
respect to t yields
f 0 (t) = Σ_{j=1}^{∞} (tj−1 /(j − 1)!) Gj = Σ_{j=0}^{∞} (tj /j!) Gj+1 . (3.10)
A comparison of (3.9) and (3.10) shows that
Gj+1 = [A, Gj ] := AGj − Gj A, for all j ≥ 0.
The matrix [A, Gj ] is called the additive commutator of A and Gj . (Additive commuta-
tors of matrices will be studied in detail in Section 6.1.)
In summary, we have shown the following result, which in the theory of Lie algebras (cf.
Remark 2.4) is known as the Hadamard Lemma 4 .
4 Jacques Hadamard (1865–1963)

Lemma 3.13. Any A, B ∈ Cn,n satisfy
exp(tA)B exp(−tA) = Σ_{j=0}^{∞} (tj /j!) Gj ,

where G0 = B and Gj+1 = [A, Gj ] for all j ≥ 0.

In the special case that A and G1 = [A, B] commute, i.e., G2 = [A, G1 ] = 0, we have
Gj = 0 for all j ≥ 2, and the Hadamard Lemma shows that

exp(tA)B exp(−tA) = B + t[A, B]. (3.11)

We next reformulate the identity from Lemma 3.13 using two maps that are important
in the theory of (matrix) Lie groups and Lie algebras (cf. Remark 2.4). Let A ∈ GLn (C)
be given, then
AdA : Cn,n → Cn,n , B 7→ ABA−1 ,
is called the adjoint map of A. This map is linear and bijective with (AdA )−1 = AdA−1 .
Moreover, for all X, Y ∈ Cn,n we have the interesting identity

AdA ([X, Y ]) = A(XY − Y X)A−1 = AXA−1 AY A−1 − AY A−1 AXA−1


= [AdA (X), AdA (Y )].

We also define, for a given matrix A ∈ Cn,n , the linear map

adA : Cn,n → Cn,n , B 7→ AB − BA = [A, B],

which in the theory of Lie algebras is called the adjoint action of A. Note that in the
notation of Lemma 3.13 we have adA (B) = G1 ,

(adA )2 (B) = adA ([A, B]) = [A, [A, B]] = G2 ,

and hence inductively (adA )j (B) = Gj for all j ≥ 0, where (adA )0 (B) = B. We thus
obtain
exp(tA)B exp(−tA) = Adexp(tA) (B) = Σ_{j=0}^{∞} (tj /j!) (adA )j (B),

where the last term can also be written as et adA (B). In the literature this reformulated
version of Lemma 3.13 is often stated without the argument B, i.e., in the compact form

Adexp(tA) = et adA .

From the series representation of the function Adexp(tA) it is easy to see that differentiating
with respect to t at t = 0 yields

(d/dt) Adexp(tA) |t=0 = Σ_{j=1}^{∞} (tj−1 /(j − 1)!) (adA )j |t=0 = adA .

The last two identities are useful when studying the relation between Lie group homo-
morphisms and maps between their corresponding Lie algebras; see, e.g., [29, Section 3.5]
Lemma 3.13 and equation (3.11) will be used in the proof of the next result, which is a
significant generalization of Lemma 3.10.
Theorem 3.14. If A, B ∈ Cn,n both commute with [A, B], then

exp(tA) exp(tB) = exp(t(A + B) + (t2 /2) [A, B])


= exp(t(A + B)) exp((t2 /2) [A, B]).

Proof. With the given A, B ∈ Cn,n we define


M (t) := t(A + B) + (t2 /2) [A, B],
as a function of the (real or complex) variable t, so that M 0 (t) = A + B + t[A, B]. Since
A, B both commute with [A, B], we have M (t)M 0 (t) = M 0 (t)M (t). Using Lemma 3.12
we obtain
exp(M (t))0 = M 0 (t) exp(M (t)).
Since exp(M (0)) = exp(0) = In , we see that for any given initial condition y0 the function
exp(M (t))y0 solves the ordinary differential equation system

y 0 (t) = M 0 (t)y(t), t ∈ [0, a], y(0) = y0 . (3.12)

Let w(t) be another solution of (3.12). We define u(t) := exp(−M (t))w(t), then the
product rule and Lemma 3.12 yield

u0 (t) = −M 0 (t) exp(−M (t))w(t) + exp(−M (t))w0 (t)


= exp(−M (t))(w0 (t) − M 0 (t)w(t))
= 0,

and hence u(t) must be constant. The initial condition in (3.12) implies that

u(0) = exp(−M (0))w(0) = y0 ,

which gives u(t) ≡ y0 , and therefore w(t) = exp(M (t))y0 . Thus, similar to (3.7), the
system (3.12) has a uniquely determined solution, which is given by exp(M (t))y0 .

We now define
g(t) := exp(tA) exp(tB)
as a function of the (real or complex) variable t. Using the product rule and (3.6) we
obtain

g 0 (t) = A exp(tA) exp(tB) + exp(tA)B exp(tB)


= Ag(t) + exp(tA)B exp(−tA)g(t)
= (A + f (t))g(t),

where f (t) is defined in (3.8). Since A and [A, B] commute we can use (3.11), i.e.,
f (t) = B + t[A, B], which yields

g 0 (t) = M 0 (t)g(t). (3.13)

Note that g(0) = In , so that the function g(t)y0 is the uniquely determined solution of
the system (3.12), and hence g(t)y0 = exp(M (t))y0 . Since y0 is arbitrary, we must have
g(t) = exp(M (t)), or

exp(tA) exp(tB) = exp(t(A + B) + t2 /2 [A, B]),

which is the first equation we needed to show. The second equation follows from
Lemma 3.10, since A and B both commute with [A, B].
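The statement of Theorem 3.14 is easy to test numerically. The following Python/SciPy sketch (an illustration only) uses the matrices A = E12 and B = E23 in C3,3 , which do not commute but both commute with their commutator [A, B] = E13 ; the notation Eij for the matrix with a single 1 in position (i, j) is used only in this sketch.

import numpy as np
from scipy.linalg import expm

A = np.zeros((3, 3)); A[0, 1] = 1.0           # A = E12
B = np.zeros((3, 3)); B[1, 2] = 1.0           # B = E23
C = A @ B - B @ A                              # [A, B] = E13, commutes with A and B
assert np.allclose(A @ C, C @ A) and np.allclose(B @ C, C @ B)
assert not np.allclose(A @ B, B @ A)

t = 0.7
lhs = expm(t * A) @ expm(t * B)
rhs1 = expm(t * (A + B) + (t**2 / 2) * C)
rhs2 = expm(t * (A + B)) @ expm((t**2 / 2) * C)
assert np.allclose(lhs, rhs1) and np.allclose(lhs, rhs2)
print(lhs)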

Note that Theorem 3.14 for t = 1 and commuting matrices A, B reduces to Lemma 3.10.
In general, two matrices A, B that both commute with their additive commutator are
called quasi-commuting; see Definition 6.14 and Theorem 6.15 below.
Theorem 3.14 is a special case of the Baker-Campbell-Hausdorff (or BCH) Theorem 5 ,
which for given A, B gives the solution C to the matrix equation

exp(A) exp(B) = exp(C). (3.14)

This theorem is of great interest in the theory of Lie algebras; see, e.g., [29]. A very
thorough treatment of the BCH Theorem, its history, numerous proofs, and applications
is given in [5].
The special case derived in Theorem 3.14 appears in quantum mechanics. As discussed
in more detail in Chapter 6 (see in particular Theorem 6.12 and its discussion), this area
involves operators P, Q whose additive commutator is of the form [P, Q] = αidV , and
hence P and Q both trivially commute with [P, Q].
5 Henry Frederick Baker (1866–1956), John Edward Campbell (1862–1924), and Felix Hausdorff (1868–1942)

Note that if C is a solution of the equation (3.14), then

exp(A) exp(B) = exp(C) = exp(C/2) exp(C/2) = exp(C/2)2 ,

i.e., exp(C/2) is a square root of exp(A) exp(B). Some matrices do not have square roots
(see [34, Sections 1.5–1.7] for the general theory). In the next example we construct real
matrices A and B so that exp(A) exp(B) does not have a real square root, and hence
(3.14) does not have a real solution C.
Example 3.15. We adapt an example of Wei [84]. For a given α ∈ R, let
   
A = [ 0 α ; −α 0 ] and B = [ 0 1 ; 0 0 ] .

Considering A as an element of C2,2 , we have the diagonalization


   
A = S [ iα 0 ; 0 −iα ] S −1 , where S = [ 1 1 ; i −i ] ,

which yields
   
exp(A) = S [ eiα 0 ; 0 e−iα ] S −1 = [ x y ; −y x ] , where eiα = x + iy.
 
With exp(B) = [ 1 1 ; 0 1 ] we obtain
 
exp(A) exp(B) = [ x   x + y ; −y   x − y ] .

It is now easy to pick an α ∈ R for which exp(A) exp(B) does not have a real square
root. Wei's example in [84] uses α = −5π/4, which yields eiα = −1/√2 + i/√2 and
hence
[ x   x + y ; −y   x − y ] = [ −1/√2   0 ; −1/√2   −√2 ] .
Because of the negative diagonal elements, it is clear that this lower triangular matrix
does not have a real square root.

Let us define β := −1/√2. Then the complex logarithm yields

log(β) = log(|β|) + iπ, log(1/β) = − log(|β|) + iπ,

and log(1/β) − log(β) = −2 log(|β|). Thus, for the matrix


     
C = [ log(β)   0 ; −2 log(|β|)   log(1/β) ] = S [ log(β)   0 ; 0   log(1/β) ] S −1 ,   where S = [ 1 0 ; −1 1 ]

we obtain
     √ 
exp(C) = S exp( [ log(β)   0 ; 0   log(1/β) ] ) S −1 = S [ β   0 ; 0   1/β ] S −1 = [ −1/√2   0 ; −1/√2   −√2 ] .
We see that the equation (3.14) for the given real matrices A with α = −5π/4 and B has
a complex solution C.
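The computations in Example 3.15 can be checked numerically as follows (a Python/SciPy sketch added for illustration only; exp(C) agrees with exp(A) exp(B) up to rounding errors).

import numpy as np
from scipy.linalg import expm

alpha = -5 * np.pi / 4
A = np.array([[0.0, alpha], [-alpha, 0.0]])
B = np.array([[0.0, 1.0], [0.0, 0.0]])

P = expm(A) @ expm(B)
print(P)                                   # approx. [[-0.7071, 0], [-0.7071, -1.4142]]

beta = -1 / np.sqrt(2)
logb = np.log(abs(beta)) + 1j * np.pi      # principal complex logarithm of beta
log1b = -np.log(abs(beta)) + 1j * np.pi    # principal complex logarithm of 1/beta
C = np.array([[logb, 0.0], [-2 * np.log(abs(beta)), log1b]])
print(expm(C))                             # equals P up to rounding, so exp(C) = exp(A) exp(B)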

Integration of the function exp(tA) is also done entrywise [LM, p. 266]. Using this
definition of the integral, Wermuth [86] derived the following interesting formula for the
difference of two matrix exponential functions of commuting matrices.
Lemma 3.16. If A, B ∈ Cn,n commute, then
exp(A) − exp(B) = exp(B) ∫_0^1 (A − B) exp(t(A − B)) dt,

and hence in particular


k exp(A) − exp(B)k ≤ k exp(B)k ( ekA−Bk − 1 ) ,


where k · k is any submultiplicative matrix norm on Cn,n , i.e., any norm on Cn,n with
kM1 M2 k ≤ kM1 k kM2 k for all M1 , M2 ∈ Cn,n .

Proof. Standard rules of analysis and Lemma 3.10 yield


∫_0^1 (A − B) exp(t(A − B)) dt = [ exp(t(A − B)) ]_{t=0}^{t=1} = exp(A − B) − In
= exp(A) exp(−B) − In
= exp(A) exp(B)−1 − In
= exp(B)−1 (exp(A) − exp(B)).
A multiplication with exp(B) gives the formula for the difference exp(A) − exp(B).
In order to show the bound on the difference we note that
k exp(A) − exp(B)k ≤ k exp(B)k k ∫_0^1 (A − B) exp(t(A − B)) dt k
= k exp(B)k k exp(A − B) − In k = k exp(B)k k Σ_{j=1}^{∞} (1/j!) (A − B)j k
≤ k exp(B)k Σ_{j=1}^{∞} (1/j!) kA − Bkj = k exp(B)k ( ekA−Bk − 1 ) ,

which completes the proof.

The bound on k exp(A) − exp(B)k also holds when AB 6= BA. In his proof of the general
case Wermuth [86] applied the Lie product formula

lim_{k→∞} (exp(A/k) exp(B/k))k = exp(A + B),

which is valid for all A, B ∈ Cn,n . For a proof of this formula see [39, p. 496].

Chapter 4

Sylvester equations and the structure of commuting matrices

In this chapter we will study the structural properties of commuting matrices in more
detail. Our main goal is to answer the question which matrices commute with a given
matrix.
The following classical result, originally due to Sylvester [77]1 will be used frequently in
our analysis.

Theorem 4.1. Let K be algebraically closed, and let A ∈ K n,n and B ∈ K m,m . Then
the following two assertions are equivalent:

(1) A and B have no common eigenvalue, i.e., σ(A) ∩ σ(B) = ∅.

(2) For each C ∈ K n,m there exists a unique matrix X ∈ K n,m with

AX − XB = C. (4.1)

The equation (4.1) with the known matrices A, B, C is called a Sylvester equation for
the matrix X.

Proof. (1) ⇒ (2): Since the field K is algebraically closed, there exists a Schur decompo-
sition B = SRS −1 (cf. Theorem 1.1) with an upper triangular matrix R = [rij ] ∈ K m,m ,
which has the eigenvalues of B as its diagonal elements r11 , . . . , rmm .
1 Several years before Sylvester's paper [77], Frobenius proved that if σ(A) ∩ σ(B) = ∅ and AP = P B,
then P = 0 [20, §7 Satz XI].

decomposition into the equation (4.1) yields
AX − XSRS −1 = C ⇔ AXS − XSR = CS
⇔ AY − Y R = D, where Y := XS, D := CS.
Clearly, X is uniquely determined if and only if Y is uniquely determined.
We write Y = [y1 , . . . , ym ] and D = [d1 , . . . , dm ], then the first column of the matrix
equation AY − Y R = D can be written as
(A − r11 In )y1 = d1 .
Since σ(A)∩σ(B) = ∅, the matrix A−r11 In is nonsingular, and hence y1 = (A−r11 In )−1 d1
is uniquely determined. Suppose that y1 , . . . , yk−1 for some k ≥ 2 have been uniquely
determined. Then the kth column of AY − Y R = D is
k−1
X
(A − rkk In )yk = dk + rjk yj .
j=1

Again, since σ(A) ∩ σ(B) = ∅ we get a uniquely determined vector


yk = (A − rkk In )−1 ( dk + Σ_{j=1}^{k−1} rjk yj ) .

Thus, the matrix Y is uniquely determined, and hence X = Y S −1 is uniquely determined


as well.
(2) ⇒ (1): Consider the linear map
f : K n,m → K n,m , X 7→ AX − XB.
Assume that λ ∈ σ(A) ∩ σ(B), and let Ax = λx, y T B = λy T for some nonzero vectors
x ∈ K n , y ∈ K m . We define X := xy T ∈ K n,m , then X 6= 0 and
f (X) = Axy T − xy T B = λxy T − x(λy T ) = 0.
Since ker(f ) 6= {0}, the linear map f is not bijective, and hence the matrix equation
AX − XB = C is not uniquely solvable for each C ∈ K n,m .
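The first part of the proof is constructive and can be turned into a simple solver for the Sylvester equation (4.1). The following Python/NumPy/SciPy sketch (an illustration, not a numerically refined implementation) computes a complex Schur form of B and determines the columns of Y by forward substitution exactly as in the proof; note that SciPy also provides scipy.linalg.solve_sylvester for the closely related equation AX + XB = C.

import numpy as np
from scipy.linalg import schur

def solve_sylvester(A, B, C):
    """Solve AX - XB = C, assuming sigma(A) and sigma(B) are disjoint (cf. Theorem 4.1)."""
    n = A.shape[0]
    R, S = schur(B.astype(complex), output='complex')   # B = S R S^H, R upper triangular
    D = C @ S
    Y = np.zeros_like(D, dtype=complex)
    for k in range(R.shape[0]):
        # (A - r_kk I) y_k = d_k + sum_{j<k} r_jk y_j, as in the proof
        rhs = D[:, k] + Y[:, :k] @ R[:k, k]
        Y[:, k] = np.linalg.solve(A - R[k, k] * np.eye(n), rhs)
    return Y @ S.conj().T                                # X = Y S^{-1}

# A small random test: the residual should be (numerically) zero.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3)) + 10 * np.eye(3)         # spectra of A and B are disjoint
C = rng.standard_normal((4, 3))
X = solve_sylvester(A, B, C)
print(np.linalg.norm(A @ X - X @ B - C))                 # approx. 0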

Exercise. Let A = diag(a1 , . . . , an ) ∈ K n,n with pairwise distinct diagonal entries


a1 , . . . , an . Determine all matrices B ∈ K n,n with AB = BA.

A consequence of the implication (2) ⇒ (1) in Theorem 4.1 is that if there exists a
nonzero matrix X ∈ K n,m with AX − XB = 0, then A and B must have a common
eigenvalue. A nice (separate) proof of this fact was given by Drazin in [10, Lemma 2],
and his proof uses the following result (cf. Lemma 2.7).

Lemma 4.2. Let K be algebraically closed and let A ∈ K n,n . Then for every nonzero
X ∈ K n,m there exists a uniquely determined monic polynomial p ∈ K[t] of smallest
possible degree deg(p), where 1 ≤ deg(p) ≤ n, such that p(A)X = 0. Moreover, p divides
the minimal polynomial MA , and hence every zero of p is an eigenvalue of A.

The (uniquely determined) polynomial in Lemma 4.2 is called the minimal polynomial
of A with respect to X.

Exercise. Prove Lemma 4.2.

Following Drazin [10], we now assume that AX − XB = 0 holds for some nonzero matrix
X ∈ K n,m , and we let p ∈ K[t] be the minimal polynomial of A with respect to X. If
λ ∈ K is a zero of p (and thus an eigenvalue of A) we can write p = (t − λ)p0 for some
monic polynomial p0 ∈ K[t] with Y := p0 (A)X 6= 0. Moreover,

λY = λp0 (A)X = Ap0 (A)X − p(A)X = p0 (A)AX = p0 (A)XB = Y B,

and thus B T Y T = λY T . Since λ is an eigenvalue of B T , it is also an eigenvalue of B,


and hence a common eigenvalue of A and B.

Our next goal is to characterize matrices that commute with a given matrix A ∈ K n,n .
It is clear that
CA := {B ∈ K n,n : AB = BA}
is a subspace of K n,n . Moreover, if B = p(A) for some p ∈ K[t], then B ∈ CA .
On the other hand, every matrix commutes with the identity matrix, hence CIn = K n,n ,
while each polynomial in In is of the form p(In ) = αIn for some α ∈ K. This means that
in general CA may contain more matrices than just the polynomials in A.
Our analysis of this situation is based on the following definition.

Definition 4.3. Let K be any field. A matrix A ∈ K n,n is called nonderogatory when
deg(MA ) = n (or, equivalently, MA = PA ).

The following two exercises give further characterizations of the nonderogatory property.

Exercise. Show that A ∈ K n,n is nonderogatory if and only if there exists a vector
v ∈ K n such that the n vectors v, Av, . . . , An−1 v are linearly independent, and hence
form a basis of K n . (Such a vector v is called a cyclic vector for A.)

Exercise. Show that if K is algebraically closed, then A ∈ K n,n is nonderogatory if


and only if each eigenvalue of A has geometric multiplicity one.

We will now show that in case of an algebraically closed field K the only matrices that
commute with a nonderogatory matrix A ∈ K n,n are polynomials in A. The proof uses
that the inverse of an upper triangular Toeplitz matrix is again an upper triangular
Toeplitz matrix. More generally, one can even prove the following result.

Lemma 4.4. Let A = Σ_{j=0}^{n−1} aj Jn (0)j ∈ GLn (K) be an upper triangular Toeplitz matrix.
(Thus, in particular, a0 6= 0 since A is invertible.) Then

A−1 = Σ_{j=0}^{n−1} bj Jn (0)j ∈ GLn (K)

is also an upper triangular Toeplitz matrix with its entries given by

b0 = 1/a0 and bj = −(1/a0 ) Σ_{i=0}^{j−1} aj−i bi ,   j = 1, . . . , n − 1.
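The recurrence in Lemma 4.4 is easily implemented. The following Python/NumPy sketch (illustration only; the helper name is ad hoc) computes the coefficients b0 , . . . , bn−1 from a given first row a0 , . . . , an−1 and checks the result against a directly computed inverse.

import numpy as np

def toeplitz_inverse_coeffs(a):
    """First row of the inverse of the upper triangular Toeplitz matrix with first row a (a[0] != 0)."""
    n = len(a)
    b = np.zeros(n)
    b[0] = 1.0 / a[0]
    for j in range(1, n):
        b[j] = -sum(a[j - i] * b[i] for i in range(j)) / a[0]
    return b

a = np.array([2.0, -1.0, 3.0, 0.5])
N = np.diag(np.ones(3), k=1)                          # the nilpotent Jordan block J_4(0)
A = sum(a[j] * np.linalg.matrix_power(N, j) for j in range(4))
b = toeplitz_inverse_coeffs(a)
Ainv = sum(b[j] * np.linalg.matrix_power(N, j) for j in range(4))
print(np.allclose(A @ Ainv, np.eye(4)))               # True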

Exercise. Prove Lemma 4.4.

Theorem 4.5 (Frobenius [20], §7 Satz XIII). Let K be algebraically closed and let A ∈
K n,n be nonderogatory, i.e., each eigenvalue of A has geometric multiplicity one. Then
each B ∈ K n,n that commutes with A is of the form B = p(A) for some p ∈ K[t], and
dim(CA ) = n.

Proof. Let B ∈ K n,n be such that AB = BA and let

A = SJS −1 , J = diag(Jd1 (λ1 ), . . . , Jdm (λm ))

be a Jordan decomposition of A, where λ1 , . . . , λm are pairwise distinct. Then AB = BA


can be equivalently written as

J B̂ = B̂J,   where B̂ := S −1 BS.

We now partition B̂ = [Bij ], where Bii ∈ K di ,di for i = 1, . . . , m. Then the (i, j)-block
of the equation J B̂ − B̂J = 0 is of the form

Jdi (λi )Bij − Bij Jdj (λj ) = 0. (4.2)

This is a Sylvester equation for the matrix Bij . For i ≠ j we have λi ≠ λj and thus
Theorem 4.1 shows that Bij = 0 is the unique solution. Hence

B̂ = diag(B11 , . . . , Bmm ),

where each diagonal block satisfies
Jdi (λi )Bii − Bii Jdi (λi ) = 0.
Using that Jdi (λi ) = λi Idi + Jdi (0) we obtain the equation
Jdi (0)Bii − Bii Jdi (0) = 0. (4.3)
A direct computation shows that Bii ∈ K di ,di solves this equation if and only if Bii is an
upper triangular Toeplitz matrix of the form
Bii = Σ_{j=0}^{di −1} bj(i) Jdi (0)j ,
i.e., Bii is the upper triangular Toeplitz matrix with first row [ b0(i) , b1(i) , . . . , bdi −1(i) ].

Since the matrices Jdi (0)0 , . . . , Jdi (0)di −1 are linearly independent, there exist exactly di
linearly independent solutions of the equation (4.3). This shows that there exist exactly
d1 + · · · + dm = n linearly independent matrices B̂ = diag(B11 , . . . , Bmm ) that commute
with J, and consequently dim(CA ) = n.
It remains to be shown that B is a polynomial in A. For each i = 1, . . . , m we define
qi = Π_{j≠i} (t − λj )dj ∈ K[t],

then it is easy to see that qi (Jdj (λj )) = 0 for i 6= j. Moreover, qi (Jdi (λi )) is an upper
triangular Toeplitz matrix that is nonsingular, since its diagonal element is given by
Π_{j≠i} (λi − λj )dj ≠ 0.

(Note that in the case m = 1 and hence d1 = n we only have the polynomial q1 = 1 and
the matrix q1 (Jn (λ1 )) = In .)
By Lemma 4.4, the matrix qi (Jdi (λi ))−1 is an upper triangular Toeplitz matrix, and hence
also qi (Jdi (λi ))−1 Bii is an upper triangular Toeplitz matrix, which we write as
qi (Jdi (λi ))−1 Bii = Σ_{j=0}^{di −1} cj(i) Jdi (0)j = Σ_{j=0}^{di −1} cj(i) (Jdi (λi ) − λi Idi )j =: ri (Jdi (λi )),

where ri ∈ K[t] is of degree at most di − 1. We now define pi := qi ri ∈ K[t], then


pi (Jdj (λj )) = 0 for i ≠ j,   and   pi (Jdi (λi )) = Bii .

Consequently, for the polynomial p := p1 + · · · + pm we have
p(A) = S diag(p(Jd1 (λ1 )), . . . , p(Jdm (λm ))) S −1 = S B̂S −1 = B,

which completes the proof.

In particular, if A = diag(λ1 , . . . , λn ) ∈ K n,n has pairwise distinct diagonal entries, then


A is nonderogatory and dim(CA ) = n. Hence CA is given by the n-dimensional subspace
of K n,n consisting of the diagonal matrices.
The essential observation in the proof of Theorem 4.5 is that the Sylvester equation (4.2)
has the unique solution Bij = 0 when λi 6= λj . If A is derogatory, then there exists at
least one pair of indices i 6= j with λi = λj . For this pair the equation (4.2) is
Jdi (λi )Bij − Bij Jdj (λi ) = 0, where Bij ∈ K di ,dj .
Using Jd (λ) = λId + Jd (0) we obtain
Jdi (0)Bij − Bij Jdj (0) = 0.
A direct computation shows that Bij must again be an upper triangular Toeplitz matrix,
with its form depending on the sizes di and dj . More precisely, we have

Bij = [ Tij ; 0 ] with a zero block of size (di − dj ) × dj if di ≥ dj ,   and   Bij = [ 0 , Tij ] with a zero block of size di × (dj − di ) if di < dj ,

where Tij ∈ K d,d with d := min{di , dj } is an upper triangular Toeplitz matrix. If an


off-diagonal block in the matrix B̂ = [Bij ] is nonzero, then B̂ is not a polynomial in J,
and hence B is not a polynomial in A. In other words, when A is derogatory, there exist
matrices B that commute with A, but that are not polynomials in A.
If J = diag(Jd1 (λ1 ), . . . , Jdm (λm )) is a Jordan canonical form of A, where the eigenvalues
λ1 , . . . , λm are not necessarily distinct, then the polynomials
(t − λ1 )d1 , . . . , (t − λm )dm
are called the elementary divisors of A. For i, j = 1, . . . , m, let δij be the degree of the
greatest common divisor of the elementary divisors (t−λi )di and (t−λj )dj . In particular,
δii = di for i = 1, . . . , m, and δij = 0 if λi 6= λj . Based on the argument above and some
further algebra, one can show that
dim(CA ) = Σ_{i,j=1}^{m} δij . (4.4)

A detailed derivation of this formula and a complete characterization of the matrices B


that commute with a derogatory matrix A is given in [24, Chapter VIII].

Example 4.6. The elementary divisors of J = diag(J3 (λ), J2 (λ)) are (t−λ)3 and (t−λ)2 ,
and
δ11 = 3, δ12 = δ21 = 2, δ22 = 2,
giving dim(CA ) = 9. For the identity matrix In we have the n elementary divisors t − 1,
and hence δij = 1 for i, j = 1, . . . , n, which gives dim(CIn ) = n2 , i.e., CIn = K n,n .
Finally, if A is nonderogatory, then δij = 0 when i 6= j, which yields dim(CA ) = n as
shown in Theorem 4.5.
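The dimension formula (4.4) can also be checked numerically: dim(CA ) is the nullity of the linear map X → AX − XA, which in vectorized form is represented by the Kronecker product matrix (I ⊗ A) − (A^T ⊗ I). The following Python/NumPy/SciPy sketch (illustration only) reproduces the values of Example 4.6; the vectorization identity used here is standard but not derived in this text.

import numpy as np
from scipy.linalg import block_diag

def dim_commutant(A):
    # dim C_A = nullity of X -> AX - XA, since vec(AX - XA) = (I kron A - A^T kron I) vec(X).
    n = A.shape[0]
    K = np.kron(np.eye(n), A) - np.kron(A.T, np.eye(n))
    return n * n - np.linalg.matrix_rank(K)

def jordan_block(lam, d):
    return lam * np.eye(d) + np.diag(np.ones(d - 1), k=1)

A = block_diag(jordan_block(2.0, 3), jordan_block(2.0, 2))   # elementary divisors (t-2)^3, (t-2)^2
print(dim_commutant(A))                                      # 9, as in Example 4.6
print(dim_commutant(np.eye(3)))                              # 9 = 3^2, i.e. C_{I_3} = K^{3,3}
print(dim_commutant(np.diag([1.0, 2.0, 3.0])))               # 3, a nonderogatory matrix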

We have shown above that there exist commuting matrices A, B ∈ K n,n for which B is
not a polynomial in A. One can ask whether AB = BA implies that A and B are both
polynomials in a third matrix C ∈ K n,n . The following example of Frobenius [21] shows
that this is in general not true. We will show in Theorem 5.8 below that two matrices
A, B ∈ K n,n can be expressed as polynomials in a third matrix C ∈ K n,n if and only if
they commute and are both diagonalizable.

Example 4.7. Suppose that


   
A = [ 0 1 0 ; 0 0 0 ; 0 0 0 ] = p(C) and B = [ 0 0 1 ; 0 0 0 ; 0 0 0 ] = q(C)

for some matrix C ∈ K n,n and polynomials p, q ∈ K[t]. Then AC = CA and BC = CB.
A straightforward computation shows that any matrix C that commutes with both A and
B is of the form
C = α0 I3 + D with D = [ 0 α1 α2 ; 0 0 0 ; 0 0 0 ]
for some α0 , α1 , α2 ∈ K. Since D2 = 0, every polynomial in C is of the form β0 I + β1 D
for some β0 , β1 ∈ K. Thus,

A = p(C) = β0A I3 + β1A D and B = q(C) = β0B I3 + β1B D.

The first equation implies that β1A α1 = 1 and β1A α2 = 0, and hence in particular α2 = 0.
But the second equation implies that β1B α2 = 1, which is a contradiction.

Exercise. Determine all B ∈ K 3,3 that commute with A = diag(1, J2 (1)). What
changes in this computation when A = diag(1, J2 (−1)) ?

Exercise. Let A ∈ K n,n have the eigenvalue λ of algebraic multiplicity n and the
Jordan form diag(Jd1 (λ), Jd2 (λ), . . . , Jdm (λ)) with d1 ≥ d2 ≥ · · · ≥ dm . Using (4.4),
show that dim(CA ) = d1 + 3d2 + · · · + (2m − 1)dm .
A variant of this formula was first stated without proof by Frobenius in 1878 [20, §7
Satz XV], and he published a proof in 1910, more than 30 years later; cf. equations
(7.) and (8.) in [22]. Note that Frobenius’ formula applied to A = In implies the well
known identity 1 + 3 + · · · + (2n − 1) = n2 .

An interesting alternative approach in this context was considered by Holbrook in [37].


He showed that for a given matrix A ∈ Cn,n we have B ∈ CA if and only if for any ε > 0
there exist a matrix Aε ∈ Cn,n and a polynomial pε ∈ C[t], such that kA − Aε k < ε
and pε (Aε ) = B. Thus, when A is derogatory and CA contains matrices that are not
polynomials in A, we can find an arbitrarily small perturbation of A so that B is a
polynomial in the perturbed matrix.
As a straightforward consequence of the Cayley-Hamilton Theorem we have seen that the
algebra V (A) of the polynomials in a given matrix A ∈ K n,n has dimension at most n;
see (2.1). Now suppose that K is algebraically closed and that A, B ∈ K n,n commute.
Then we can form the vector space over K of the polynomials in A and B,
V (A, B) := { p(A, B) : p(s, t) = Σ_{i,j=0}^{m} γij si tj , m ≥ 0, γij ∈ K }
= span{Ai B j : i, j ≥ 0}. (4.5)

With the usual matrix multiplication, V (A, B) becomes an associative and commutative
algebra with unit element In . If A is nonderogatory, then Theorem 4.5 says that B = p(A)
holds for some polynomial p ∈ K[t]. Then for every polynomial p̃(s, t) in the two
(commuting) variables s, t we have p̃(A, B) = p̃(A, p(A)), which is a polynomial in A.
Thus, V (A, B) is contained in the algebra of the polynomials in A, and in particular
dim(V (A, B)) ≤ n.
Gerstenhaber showed in 1961 [25] that dim(V (A, B)) ≤ n also holds without the as-
sumption that A (or B) is nonderogatory. A proof of Gerstenhaber’s result using only
techniques from matrix theory is given in [4]. Interestingly, the question whether the
same bound holds for three commuting matrices, i.e., whether dim(V (A, B, C)) ≤ n, is
in general still open; see [58, Chapter 5] and [35] for extensive discussions.

Chapter 5

Simultaneous triangularization and applications

In this chapter we will show that commuting matrices can be triangularized simultane-
ously, which generalizes the fundamental Theorem 1.1 about the triangularization of a
single matrix. After proving this result, which relies on the fact that commuting matri-
ces have a common eigenvector, we will discuss simultaneous diagonalizability and derive
some further sufficient conditions for simultaneous triangularizability (Section 5.1). We
will then give several applications of the simultaneous triangularization, where “applica-
tion” means that the simultaneous triangularization is used to prove other results. These
include Frobenius’ spectral mapping theorem, and a theorem of Schur on the maximal
number of linearly independent and mutually commuting matrices in K n,n (Section 5.2).

5.1 Simultaneous triangularization


The following three lemmas will be very useful in several proofs below.
Lemma 5.1. Let K be algebraically closed and A ∈ K n,n . If the subspace U ⊆ K n is
invariant under A and dim(U) ≥ 1, then A has an eigenvector in U.

Proof. Let the columns of X ∈ K n,m , where 1 ≤ m ≤ n, form a basis of U. Since U


is invariant under A, there exists a matrix A1 ∈ K m,m such that AX = XA1 . Since
K is algebraically closed, the matrix A1 has an eigenvector, say A1 y = λy for some
nonzero vector y ∈ K m and λ ∈ K. But then Xy 6= 0 since X has full rank, and
AXy = XA1 y = λXy, which shows that Xy ∈ U is an eigenvector of A.
Lemma 5.2. If K is any field, A, B ∈ K n,n commute, and λ ∈ K is an eigenvalue of
A, then the eigenspace Vλ (A) is invariant under B.

Proof. For each x ∈ Vλ (A) = ker(λIn − A) we have (λIn − A)Bx = (λB − AB)x =
(λB − BA)x = B(λIn − A)x = 0, and hence Bx ∈ Vλ (A).

Lemma 5.3. If K is algebraically closed and A, B ∈ K n,n commute, then A and B have
a common eigenvector.

Proof. Since the field is algebraically closed, A has an eigenvalue λ ∈ K and hence
{0} 6= ker(λIn − A) = Vλ (A). Since A and B commute, Vλ (A) is invariant under B
(cf. Lemma 5.2), and therefore B has an eigenvector in Vλ (A) (cf. Lemma 5.1). By the
definition of Vλ (A), this vector is also an eigenvector of A.

Note that AB = BA does not imply that A and B have a common eigenvalue. For a
simple counterexample consider A = In and B = −In .
Also note that the condition AB = BA is sufficient but not necessary for the existence
of a common eigenvector. For example, any two upper triangular matrices have the
common eigenvector e1 , but not all of them commute, as shown by the matrices
   
A = [ 0 1 ; 0 1 ] and B = [ 1 −1 ; 0 1 ] ,

which satisfy
AB = [ 0 1 ; 0 1 ] ≠ [ 0 0 ; 0 1 ] = BA.
A necessary and sufficient condition for the existence of a common eigenvector is given in
the following result of Shemesh [72]; see also [50, Section 7]. Another sufficient condition
will be derived in Lemma 6.17.

Theorem 5.4. Suppose that K is algebraically closed. Two matrices A, B ∈ K n,n have
a common eigenvector if and only if

{0} ≠ ∩_{i,j=1}^{∞} ker(Ai B j − B j Ai ) = ∩_{i,j=1}^{n−1} ker(Ai B j − B j Ai ).

Proof. We denote U := ∩_{i,j=1}^{∞} ker(Ai B j − B j Ai ). This is a subspace of K n . Because of the
Cayley-Hamilton Theorem, Ai ∈ span{In , A, . . . , An−1 } and B j ∈ span{In , B, . . . , B n−1 }
for all i, j ≥ n, and therefore we can replace ∞ in the intersection by n − 1.
Now suppose that x ∈ K n is a common eigenvector of A and B, i.e., Ax = λx and
Bx = µx. Then for all i, j ≥ 1 we have

(Ai B j − B j Ai )x = Ai (µj x) − B j (λi x) = (λi µj − λi µj )x = 0,

and hence x ∈ U.
On the other hand, suppose that x ∈ U \ {0}. Then (AB − BA)x = 0, or ABx = BAx,
which shows that A and B commute on U. Moreover, for all i, j ≥ 1 we have

(Ai B j − B j Ai )Ax = Ai (B j Ax) − B j Ai+1 x = Ai+1 B j x − B j Ai+1 x = 0,

since by construction x ∈ U ⊆ ker(Ai+1 B j − B j Ai+1 ). Thus, U is invariant under A, and


the same argument shows that U is invariant under B. If the columns of X ∈ K n,m form
a basis of U, then AX = XA1 and BX = XB1 for some matrices A1 , B1 ∈ K m,m , where
m = dim(U) ≥ 1. Since A and B commute on U, we obtain

XA1 B1 = AXB1 = ABX = BAX = BXA1 = XB1 A1 ,

and thus A1 B1 = B1 A1 , since X has full rank. By Lemma 5.3, A1 and B1 have a
common eigenvector, say A1 y = λy and B1 y = µy, and then AXy = XA1 y = λXy and
BXy = XB1 y = µXy, which shows that Xy is a common eigenvector of A and B.

In the intersection
⋂_{i,j=1}^{n−1} ker(A^i B^j − B^j A^i)
in Theorem 5.4 we can replace i, j = 1, . . . , n − 1 by i = 1, . . . , deg(M_A) and j = 1, . . . , deg(M_B). Also note that the intersection is non-trivial if and only if the matrix
 
\begin{pmatrix} AB − BA \\ A^2 B − BA^2 \\ AB^2 − B^2 A \\ \vdots \\ A^{n−1}B^{n−1} − B^{n−1}A^{n−1} \end{pmatrix} ∈ K^{n(n−1)^2, n}

has rank less than n. This is a computational (though very expensive) criterion for the
existence of a common eigenvector.
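For illustration, the criterion can be checked numerically; the following numpy sketch is an ad hoc illustration (the function name and the tolerance handling are my own choices, and the rank decision is only as reliable as the floating-point tolerance).

import numpy as np

def shemesh_common_eigvec_test(A, B, tol=1e-10):
    # Stack the matrices A^i B^j - B^j A^i for i, j = 1, ..., n-1 and test
    # whether the stacked matrix has rank less than n, i.e., whether the
    # intersection of the kernels in Theorem 5.4 is non-trivial.
    n = A.shape[0]
    blocks = []
    for i in range(1, n):
        for j in range(1, n):
            Ai = np.linalg.matrix_power(A, i)
            Bj = np.linalg.matrix_power(B, j)
            blocks.append(Ai @ Bj - Bj @ Ai)
    M = np.vstack(blocks)
    return np.linalg.matrix_rank(M, tol) < n

# Commuting matrices always pass the test (Lemma 5.3), for example:
A = np.diag([1.0, 2.0, 3.0])
B = A @ A + 2 * np.eye(3)
print(shemesh_common_eigvec_test(A, B))    # True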
We now show that commuting matrices can be simultaneously triangularized, which gen-
eralizes Schur’s triangularization theorem for a single matrix from 1909 (Theorem 1.1).
This simultaneous triangularization result is often attributed to Frobenius’ article from
1896 on commuting matrices [21], but it was not explicitly stated there, and Frobenius
considered Corollary 5.14 (see below) as his main result1 .
1 Frobenius wrote to Dedekind about this article [21] on June 4, 1896: “I almost forgot to mention that
today my article Ueber vertauschbare Matrizen will be published in the Sitzungsberichte. ... Hopefully
you will enjoy its artistic structure. Many will find it artificial. Hensel confessed to me that he was
dumber after reading it, and I needed to take the entire building apart before he understood its meaning.”
(My translation of the quote from Frobenius’ letter given in [75].)

Theorem 5.5. If K is algebraically closed and A, B ∈ K n,n commute, then there exists
a matrix S ∈ GLn (K) such that S −1 AS and S −1 BS are both upper triangular, i.e., A
and B are simultaneously triangularizable.

Proof. We prove the result by induction on n. The case n = 1 is trivial. Suppose the
assertion is true for matrices of order n − 1 for some n ≥ 2, and consider two commuting
matrices A, B ∈ K n,n .
By Lemma 5.3, A and B have a common eigenvector, say Ax = λx and Bx = µx. Let
S1 := [x, x2 , . . . , xn ] ∈ GLn (K), then
   
AS_1 = S_1 \begin{pmatrix} λ & ∗ \\ 0 & A_1 \end{pmatrix} and BS_1 = S_1 \begin{pmatrix} µ & ∗ \\ 0 & B_1 \end{pmatrix} (5.1)

for some matrices A1 , B1 ∈ K n−1,n−1 . We have


   
AB = S_1 \begin{pmatrix} λµ & ∗ \\ 0 & A_1B_1 \end{pmatrix} S_1^{−1} = BA = S_1 \begin{pmatrix} λµ & ∗ \\ 0 & B_1A_1 \end{pmatrix} S_1^{−1}.

This shows that A1 and B1 commute. By the induction hypothesis, there exists a matrix
S2 ∈ GLn−1 (K) such that S2−1 A1 S2 and S2−1 B1 S2 are both upper triangular. Defining
 
S := S_1 \begin{pmatrix} 1 & 0 \\ 0 & S_2 \end{pmatrix} ∈ GL_n(K),

we obtain
S^{−1}AS = \begin{pmatrix} λ & ∗ \\ 0 & S_2^{−1}A_1S_2 \end{pmatrix} and S^{−1}BS = \begin{pmatrix} µ & ∗ \\ 0 & S_2^{−1}B_1S_2 \end{pmatrix},
which are both upper triangular.

As in Theorem 1.1, the term upper triangular in Theorem 5.5 can be replaced by lower triangular.
The condition that A and B have a common eigenvector is necessary but not sufficient
for the simultaneous triangularization. On the other hand, the condition AB = BA is
sufficient but not necessary for the simultaneous triangularization. In other words, there
exist matrices with a common eigenvector that are not simultaneously triangularizable,
and there exist simultaneously triangularizable matrices that do not commute. The
second point is clear from the fact that not all upper triangular matrices commute (see
the example above).
The inductive proof of Theorem 5.5 suggests the following algorithm for computing the
simultaneous triangularization of two commuting matrices:
Input: Commuting matrices A, B ∈ K n,n .

Output: Matrix S ∈ GL_n(K) such that S^{−1}AS and S^{−1}BS are upper triangular.
Initialize: A_1 = A, B_1 = B, and S = I_n.
for k = 0, . . . , n − 1 do
Determine a common eigenvector x ∈ K^{n−k} of A_1 and B_1.
Determine x_2, . . . , x_{n−k} ∈ K^{n−k} such that S_1 = [x, x_2, . . . , x_{n−k}] ∈ GL_{n−k}(K).
Set A_1 = [0, I_{n−(k+1)}] S_1^{−1} A_1 S_1 \begin{pmatrix} 0 \\ I_{n−(k+1)} \end{pmatrix} and B_1 = [0, I_{n−(k+1)}] S_1^{−1} B_1 S_1 \begin{pmatrix} 0 \\ I_{n−(k+1)} \end{pmatrix}.
Update S ← S \begin{pmatrix} I_k & 0 \\ 0 & S_1 \end{pmatrix}.
end for
This algorithm can of course be applied also to matrices A, B ∈ K n,n that do not
commute. If they are simultaneously triangularizable, then the algorithm will run until
the final step k = n − 1 and deliver the required matrix S. If they are not simultaneously
triangularizable there will be some step k < n − 1 where no common eigenvector of A1
and B1 exists, and the algorithm will fail.
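For commuting complex matrices, the algorithm above can be sketched in a few lines of numpy/scipy; this is only an illustration under my own choices of names and tolerances. The helper common_eigenvector follows Lemmas 5.1–5.3: it picks an eigenvalue λ of A_1, a basis X of the eigenspace V_λ(A_1), and an eigenvector of the restriction of B_1 to that invariant subspace.

import numpy as np
from scipy.linalg import null_space

def common_eigenvector(A, B, tol=1e-8):
    # For commuting A, B: V_lam(A) is invariant under B (Lemma 5.2), so an
    # eigenvector of the restriction X^H B X yields a common eigenvector.
    lam = np.linalg.eigvals(A)[0]
    X = null_space(A - lam * np.eye(A.shape[0]), rcond=tol)
    _, Y = np.linalg.eig(X.conj().T @ B @ X)
    x = X @ Y[:, 0]
    return x / np.linalg.norm(x)

def simultaneous_triangularization(A, B):
    # Deflation as in the algorithm above: one common eigenvector per step.
    n = A.shape[0]
    A1, B1 = A.astype(complex), B.astype(complex)
    S = np.eye(n, dtype=complex)
    for k in range(n - 1):
        x = common_eigenvector(A1, B1)
        # Complete x to a basis of C^{n-k}; the QR factorization of [x, e_1, ...]
        # returns a unitary S1 whose first column is a multiple of x.
        M = np.column_stack([x, np.eye(n - k, dtype=complex)[:, : n - k - 1]])
        S1, _ = np.linalg.qr(M)
        A1 = (S1.conj().T @ A1 @ S1)[1:, 1:]
        B1 = (S1.conj().T @ B1 @ S1)[1:, 1:]
        E = np.eye(n, dtype=complex)
        E[k:, k:] = S1                      # embed S1 as diag(I_k, S1)
        S = S @ E
    return S

A = np.array([[1.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, 5.0]])
B = A @ A - A                               # any polynomial in A commutes with A
S = simultaneous_triangularization(A, B)
print(np.round(np.linalg.inv(S) @ A @ S, 8))    # (numerically) upper triangular
print(np.round(np.linalg.inv(S) @ B @ S, 8))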
As for the unitary triangularization of a single complex matrix (cf. Corollary 1.2), we can
obtain the simultaneous unitary triangularization of two commuting complex matrices.
Corollary 5.6. If A, B ∈ Cn,n commute, then there exists a unitary matrix U ∈ Cn,n
such that U H AU and U H BU are both upper triangular, i.e., A and B are simultaneously
unitarily triangularizable.

Proof. Since the field C is algebraically closed and A and B commute, we can simulta-
neously triangularize them by Theorem 5.5. Let S ∈ GLn (C) be such that S −1 AS = R1
and S −1 BS = R2 are both upper triangular. Let S = U R be a QR decomposition of S,
where U ∈ Cn,n is unitary and R ∈ GLn (C) is upper triangular. Then S −1 = R−1 U H
and
U H AU = RR1 R−1 and U H BU = RR2 R−1 ,
which are both upper triangular.
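Continuing the numerical sketch above (and assuming the matrices S, A, B from it), the QR-based argument of this proof can be mirrored directly: factor S = UR and check that the unitary factor also triangularizes both matrices.

U, R = np.linalg.qr(S)                     # S = U R, U unitary, R upper triangular
print(np.round(U.conj().T @ A @ U, 8))     # upper triangular, as in Corollary 5.6
print(np.round(U.conj().T @ B @ U, 8))     # upper triangular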

Theorem 5.5 and Corollary 5.6 can be extended inductively to an arbitrary (finite) num-
ber of mutually commuting matrices as follows.
Corollary 5.7. If K is algebraically closed and A1 , . . . , Ak ∈ K n,n are mutually com-
muting matrices, then there exists a matrix S ∈ GLn (K) such that S −1 Aj S is upper
triangular for all j = 1, . . . , k. In particular, if K = C, then S can be chosen unitary.

Exercise. Prove Corollary 5.7.

The next theorem of Drazin, Dungey and Gruenberg characterizes simultaneous diago-
nalization. They considered the result to be “of a trivial nature” [12, p. 221].

Theorem 5.8. Let K be algebraically closed and suppose that A, B ∈ K n,n . Then the
following assertions are equivalent:

(1) A and B commute and are both diagonalizable.


(2) There exists a matrix S ∈ GLn (K) such that S −1 AS and S −1 BS are both diagonal,
i.e., A and B are simultaneously diagonalizable.
(3) There exist a diagonalizable matrix C ∈ K n,n and polynomials p1 , p2 ∈ K[t] such
that A = p1 (C) and B = p2 (C).

Proof. (1) ⇒ (2) : Let

S_A^{−1} A S_A = D_A = diag(λ_1 I_{d_1}, . . . , λ_m I_{d_m})

be a diagonalization of A, where the eigenvalues λ_1, . . . , λ_m are pairwise distinct, and define \hat{B} := S_A^{−1} B S_A. Then, since A and B commute, we have

D_A \hat{B} = (S_A^{−1} A S_A)(S_A^{−1} B S_A) = S_A^{−1} AB S_A = S_A^{−1} BA S_A = (S_A^{−1} B S_A)(S_A^{−1} A S_A) = \hat{B} D_A.

As in the proof of Theorem 4.5 we partition \hat{B} = [B_{ij}], where B_{ii} ∈ K^{d_i,d_i} for i = 1, . . . , m. Then the (i, j)-block of the equation D_A\hat{B} − \hat{B}D_A = 0 is of the form

(λ_i I_{d_i}) B_{ij} − B_{ij} (λ_j I_{d_j}) = 0.

For i ≠ j we have λ_i ≠ λ_j and thus Theorem 4.1 shows that B_{ij} = 0 is the unique solution of this Sylvester equation, which yields

\hat{B} = diag(B_{11}, . . . , B_{mm}).

The matrix \hat{B} is diagonalizable since it is similar to the diagonalizable matrix B, and because of its structure it can be diagonalized by a block-diagonal matrix, i.e.,

S_B^{−1} \hat{B} S_B = D_B, where S_B = diag(S_{d_1}, . . . , S_{d_m}),

for some diagonal matrix DB , and Sdi ∈ K di ,di , i = 1, . . . , m. Now the matrix S := SA SB
yields the simultaneous diagonalization, since

S −1 AS = SB−1 SA−1 ASA SB = SB−1 DA SB = DA , and


S^{−1}BS = S_B^{−1} S_A^{−1} B S_A S_B = S_B^{−1} \hat{B} S_B = D_B.

(2) ⇒ (3) : Let S −1 AS = diag(α1 , . . . , αn ) and S −1 BS = diag(β1 , . . . , βn ) be simulta-


neous diagonalizations of A and B. Let γ1 , . . . , γn be any pairwise distinct elements of

K. (Note that K is algebraically closed and hence is not a finite field.) Then there
exist polynomials p1 , p2 ∈ K[t] of degree at most n − 1 which solve the two (Lagrange)
interpolation problems

p_1(γ_i) = α_i, i = 1, . . . , n, and p_2(γ_i) = β_i, i = 1, . . . , n.

For the diagonalizable matrix C := SDC S −1 , where DC := diag(γ1 , . . . , γn ), we then


obtain
p1 (C) = p1 (SDC S −1 ) = Sp1 (DC )S −1 = SDA S −1 = A,
and similarly p2 (C) = B.
(3) ⇒ (1) : Obvious.

Theorem 5.8 shows, in particular, that commuting diagonalizable matrices can be ex-
pressed as polynomials in the same matrix. We have already seen in Example 4.7 that
this does not hold for general (non-diagonalizable) commuting matrices.
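The construction in the proof of (2) ⇒ (3) can be made concrete with a small numpy sketch (my own ad hoc illustration): it takes two commuting diagonal matrices, chooses pairwise distinct nodes γ_i, and builds C together with the Lagrange interpolation polynomials p_1, p_2 such that A = p_1(C) and B = p_2(C); the general diagonalizable case only adds the similarity transformation with S.

import numpy as np

alpha = np.array([2.0, 2.0, 5.0])     # eigenvalues of A (repetitions allowed)
beta  = np.array([1.0, 3.0, 3.0])     # eigenvalues of B
gamma = np.array([0.0, 1.0, 2.0])     # pairwise distinct interpolation nodes

A, B, C = np.diag(alpha), np.diag(beta), np.diag(gamma)

# Lagrange interpolation: p1(gamma_i) = alpha_i and p2(gamma_i) = beta_i.
p1 = np.polyfit(gamma, alpha, len(gamma) - 1)
p2 = np.polyfit(gamma, beta, len(gamma) - 1)

def poly_at_matrix(coeffs, M):
    # Horner evaluation of a polynomial (leading coefficient first) at a matrix.
    P = np.zeros_like(M)
    for c in coeffs:
        P = P @ M + c * np.eye(M.shape[0])
    return P

print(np.allclose(poly_at_matrix(p1, C), A))   # True: A = p1(C)
print(np.allclose(poly_at_matrix(p2, C), B))   # True: B = p2(C)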

Exercise. Extend the formulation and the proof of Theorem 5.8 to an arbitrary (finite)
number of matrices A1 , . . . , Ak ∈ K n,n .

Exercise. Show that if A, B ∈ Cn,n are normal matrices that commute, then there
exists a unitary matrix U ∈ Cn,n such that U H AU and U H BU are both diagonal, i.e.,
A and B are simultaneously unitarily diagonalizable.
Show that if additionally A and B are Hermitian positive definite (or semidefinite),
then the product AB is also Hermitian positive definite (or semidefinite).

Using Lemma 5.1 we can prove another sufficient condition of Shemesh [73] for the
existence of a simultaneous triangularization of two matrices.

Theorem 5.9. If K is algebraically closed and A, B ∈ K n,n are such that their product
AB commutes with both A and B, then A and B are simultaneously triangularizable.

Proof. Let us write C = AB−BA, then the assumptions A(AB) = (AB)A and B(AB) =
(AB)B can equivalently be written as AC = 0 and CB = 0, respectively. If A ∈ GLn (K),
then C = 0 and the result follows from Theorem 5.5. We therefore can assume that
A ∉ GL_n(K), and thus ker(A) ≠ {0}.
If x ∈ ker(A), then ACx = 0, and hence the non-trivial subspace ker(A) is invariant
under C. By Lemma 5.1 there exists an eigenvector x0 of C in ker(A), i.e., we have

Cx0 = λx0 and Ax0 = 0,

for some λ ∈ K. Let us define

X := {x ∈ K n : Cx = λx and Ax = 0},

which is a subspace of K n with dimension at least one. We will show that X is invariant
under B. For each x ∈ X we have

(C − λIn )Bx = CBx − λBx = −λBx,

and
ABx = (AB − BA + BA)x = Cx = λx.
If λ = 0, then these two equations show that Bx ∈ X . In this case B has an eigenvector
in X , which simultaneously is an eigenvector of A corresponding to the eigenvalue 0, say
Ax = 0 and Bx = µx. This yields a decomposition of the form (5.1) with λ = 0. Now
   
S_1 \begin{pmatrix} 0 & ∗ \\ 0 & A_1(A_1B_1) \end{pmatrix} S_1^{−1} = A(AB) = (AB)A = S_1 \begin{pmatrix} 0 & ∗ \\ 0 & (A_1B_1)A_1 \end{pmatrix} S_1^{−1},
and
S_1 \begin{pmatrix} 0 & ∗ \\ 0 & B_1(A_1B_1) \end{pmatrix} S_1^{−1} = B(AB) = (AB)B = S_1 \begin{pmatrix} 0 & ∗ \\ 0 & (A_1B_1)B_1 \end{pmatrix} S_1^{−1},
show that the product A1 B1 commutes with both A1 and B1 . The result follows by
induction as in the proof of Theorem 5.5.
It remains to show that indeed λ = 0. We will assume that λ ≠ 0 and derive a contradiction. First note that (AB)² = ABAB = A²B², and inductively (AB)^k = A^kB^k for all k ≥ 1. Suppose that 0 ≠ x ∈ X, then ABx = λx ≠ 0. Thus, for all k ≥ 1,

A^kB^kx = (AB)^kx = λ^kx ≠ 0,

but
A^{k+1}B^kx = A(AB)^kx = (AB)^kAx = 0,
which means that
B^kx ∉ ker(A^k), but B^kx ∈ ker(A^{k+1}).
Consequently, for all k ≥ 1 we have the strict inclusion ker(Ak ) ⊂ ker(Ak+1 ). This
however is impossible since the space is finite-dimensional, and therefore there must be
some integer k ≥ 1 with ker(Ak ) = ker(Ak+1 ).

The following example shows that for simultaneous triangularizability it is in general not
sufficient that the product AB commutes with only one of the two matrices A and B.

Example 5.10. For the matrices
   
A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} and B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}

we have

AC = 0 and CB = \begin{pmatrix} 0 & 0 & 0 \\ 0 & −1 & 0 \\ 0 & 0 & 1 \end{pmatrix} ≠ 0, where C = AB − BA.
Both A and B have only the eigenvalue 0 with the corresponding eigenspaces given by
V0 (A) = span{e2 , e3 } and V0 (B) = span{e1 }. Since A and B do not have a common
eigenvector, they are not simultaneously triangularizable.

If the products AB and BA are not equal, they may still be linearly dependent, i.e.,
AB = ωBA for some ω ∈ K. Drazin called such matrices projectively commuting [10],
and he showed the following result (where the case ω = 1 is included).

Theorem 5.11. Let K be algebraically closed, and suppose that A, B ∈ K n,n satisfy
AB = ωBA for some ω ∈ K. Then exactly one of the following holds:

(1) A and B are simultaneously triangularizable.

(2) There exists an integer r with 0 ≤ r ≤ n − 2, such that A and B are simultaneously
similar to matrices of the form
   
\begin{pmatrix} S & X \\ 0 & A_r \end{pmatrix} and \begin{pmatrix} T & Y \\ 0 & B_r \end{pmatrix}, (5.2)

respectively, where S, T ∈ K r,r are upper triangular and Ar , Br ∈ GLn−r (K).

Proof. First note that AB = ωBA implies

AB 2 = (AB)B = (ωBA)B = ωB(AB) = ω 2 B 2 A,

and hence, by induction, Ap(B) = p(ωB)A for any polynomial p ∈ K[t].


Suppose that at least one of the two matrices, say A, is singular. Let x ∈ K n be an
eigenvector corresponding to the eigenvalue 0 of A, and let p ∈ K[t] be the minimal
polynomial of B with respect to x; see Lemma 4.2. If µ ∈ K is any zero of p, then we
can write p = (t−µ)p0 for some monic polynomial p0 ∈ K[t] with p0 (B)x 6= 0. Moreover,

Ap0 (B)x = p0 (ωB)Ax = 0 and Bp0 (B)x = p(B)x + µp0 (B)x = µp0 (B)x,

which shows that A and B have the common eigenvector x1 := p0 (B)x. As in the proof
of Theorem 5.5 we define S1 := [x1 , x2 , . . . , xn ] ∈ GLn (K) and obtain
   
AS_1 = S_1 \begin{pmatrix} 0 & ∗ \\ 0 & A_1 \end{pmatrix} and BS_1 = S_1 \begin{pmatrix} µ & ∗ \\ 0 & B_1 \end{pmatrix}
for some matrices A_1, B_1 ∈ K^{n−1,n−1}. We have
AB = S_1 \begin{pmatrix} 0 & ∗ \\ 0 & A_1B_1 \end{pmatrix} S_1^{−1} = ωBA = S_1 \begin{pmatrix} 0 & ∗ \\ 0 & ωB_1A_1 \end{pmatrix} S_1^{−1},
and hence A1 B1 = ωB1 A1 . We can repeat the process inductively if at least one of the
matrices A1 and B1 is singular (cf. also the proof of Theorem 5.5).
The process terminates either with a simultaneous triangularization of A and B, or when
we arrive at matrices Ar , Br ∈ K n−r,n−r with 0 ≤ r ≤ n−2 that are both nonsingular, and
in this case we obtain a simultaneous similarity of A and B to matrices as in (5.2).

Note that if ω ≠ 1 and the projectively commuting matrices A and B are both nonsingular, then they are not simultaneously triangularizable, and we are in case (2) of Theorem 5.11 with r = 0. If, however, ω ≠ 1 and the projectively commuting matrices A and B are simultaneously triangularizable, then the proof of Theorem 5.11 shows that both products AB and BA are nilpotent, and the union of the spectra of A and B contains at least n eigenvalues equal to zero.
Apart from the usual commutativity (ω = 1), the most important class of projectively
commuting matrices are the anti-commuting matrices 2 , which satisfy AB = −BA, i.e.,
ω = −1 in Theorem 5.11. An example of three nonsingular real 4×4 anti-commuting ma-
trices is given in the context of the quaternions in Example 2.16. The size of these three
matrices is even and they have trace zero, which is true in general for anti-commuting
matrices.
Lemma 5.12. If K is algebraically closed and A, B ∈ GLn (K) are anti-commuting, then
n must be even, and trace(A) = trace(B) = 0.

Proof. Applying the determinant to both sides of the equation AB = −BA yields
det(A) det(B) = (−1)n det(A) det(B).
Since det(A) det(B) ≠ 0, we obtain 1 = (−1)^n, which implies that n is even.
Since B is nonsingular, AB = −BA implies that A = −BAB −1 , and hence trace(A) =
−trace(A), which gives trace(A) = 0. In the same way we obtain trace(B) = 0 from the
nonsingularity of A.
2 Such matrices have been studied already by Cayley, who called them skew convertible in 1858 [7,
p. 30].

Exercise. Is the assumption that K is algebraically closed necessary in Lemma 5.12,
or can the assertions be shown for other fields as well?

5.2 Applications of the simultaneous triangularization
We will now give some applications of Theorem 5.5 and the simultaneous triangulariza-
tion.
We start with an alternative proof (following [58, pp. 33–34]) of the implication (1) ⇒ (2)
in Theorem 4.1: Let K be algebraically closed, and let A ∈ K n,n and B ∈ K m,m be given.
Define the two linear maps

fA : K n,m → K n,m , X 7→ AX, fB : K n,m → K n,m , X 7→ XB.

Eigenvectors V and W of fA and fB must satisfy fA (V ) = λV and fB (W ) = µW


for some scalars λ, µ ∈ K. Equivalently, AV = λV and W B = µW , which shows
that the eigenvalues of the maps fA and fB are equal to those of the matrices A and
B, respectively. Moreover, (fA ◦ fB )(X) = A(XB) = (AX)B = (fB ◦ fA )(X) for all
X ∈ K n,m . Since fA and fB commute, there exists a basis of K n,m in which both their
matrices are upper triangular. Since the eigenvalues of triangular matrices are their
diagonal entries, the eigenvalues of the map fA − fB are the differences of the eigenvalues
of A and B. If σ(A) ∩ σ(B) = ∅, then all differences are nonzero, and hence the map
fA − fB is bijective, which shows that for every C ∈ K n,m there exists exactly one
X ∈ K n,m such that C = (fA − fB )(X) = AX − XB.
Another result for simultaneously triangularizable matrices is the following.

Corollary 5.13. If K is any field, A, B ∈ K n,n are simultaneously triangularizable, and


B is nilpotent, then det(A + B) = det(A).

Proof. Suppose that A = SR_1S^{−1} and B = SR_2S^{−1}, where R_1 = [r_{ij}^{(1)}] and R_2 = [r_{ij}^{(2)}] are upper triangular. Since B is nilpotent we must have r_{ii}^{(2)} = 0 for i = 1, . . . , n, and therefore

det(A + B) = det(R_1 + R_2) = ∏_{i=1}^{n} (r_{ii}^{(1)} + r_{ii}^{(2)}) = ∏_{i=1}^{n} r_{ii}^{(1)} = det(A).

Frobenius proved a variant of this result in [20, §3 Satz VII] and repeated the proof
almost 20 years later in [21, §2 Satz VII3 ]: Suppose that A, B ∈ K n,n commute, A is
invertible, and B is nilpotent. In this case AB = BA can be written as A−1 B = BA−1 ,
which by induction yields (A−1 B)n = A−n B n = 0, and hence A−1 B is nilpotent. Thus,
PA−1 B = det(tIn − A−1 B) = tn . For t = −1 we get
(−1)n = det(−In − A−1 B) = det(−A−1 (A + B)) = (−1)n det(A−1 ) det(A + B),
and using det(A−1 ) = det(A)−1 gives det(A + B) = det(A).
A simple, yet important corollary of Theorem 5.5 is the following spectral mapping the-
orem for commuting matrices.
Corollary 5.14 (Frobenius [21], Satz III4 ). Let K be algebraically closed, let A, B ∈ K^{n,n} commute, and let p = ∑_{i,j=0}^{m} γ_{ij} s^i t^j with γ_{ij} ∈ K for i, j = 0, 1, . . . , m be a polynomial in the two (commuting) variables s, t. Then the eigenvalues of p(A, B) are of the form p(α_1, β_1), . . . , p(α_n, β_n), where α_1, . . . , α_n and β_1, . . . , β_n are the eigenvalues of A and B, respectively, in some appropriate ordering.

Proof. The commuting matrices A, B are simultaneously triangularizable, i.e., A =


SRA S −1 and B = SRB S −1 for some S ∈ GLn (K) and upper triangular matrices RA , RB .
For all i, j ≥ 0 we have
A^i B^j = S R_A^i R_B^j S^{−1}.
The eigenvalues of this matrix are the products of the diagonal elements of R_A^i and R_B^j.
These are the respective powers of the diagonal elements of RA and RB , which are the
eigenvalues of A and B in some ordering. The result now follows inductively from this
fact.

The proof of Corollary 5.14 shows that in order to guarantee that the eigenvalues of
p(A, B) have the form p(αi , βi ) it is sufficient that A and B are simultaneously trian-
gularizable. The fact that these two properties are equivalent, where in general we have
to consider polynomials in non-commuting variables, is part of an important theorem of
McCoy5 that we will prove in Section 6.2 (see Theorem 6.18).
3 After the proof of [20, §3 Satz VII] Frobenius pointed out: “The main progress in the theory of
forms that Weierstrass made beyond Cauchy and Jacobi is that he taught how to further decompose
forms of which a power vanishes, or more generally, whose characteristic equation has only one root,
except when the smallest power that vanishes is the nth.” (My translation.)
4 Frobenius stated his variant of Corollary 5.14 for “arbitrary functions” of the given commuting
matrices, and as one of his main theorems in the Introduction of his article from 1896 [21]. He mentioned
that he knew the result already when he wrote his article [20] from 1878. In that article he stated
without proof the assertion of Corollary 5.14 for the polynomial p = st, and hence the eigenvalues of
the product AB [20, §7 Satz XII]. According to Drazin [11, p. 222], that was the “first significant result
on commutative matrices”.
5 Neal Henry McCoy (1905–2001)

Exercise. Use Corollary 5.7 to extend Corollary 5.14 to an arbitrary (finite) number
of mutually commuting matrices A1 , . . . , Ak ∈ K n,n .

Corollary 5.14 particularly shows that for two commuting matrices A, B the eigenvalues
of A + B are of the form α1 + β1 , . . . , αn + βn . Hence if σ(A) ∩ σ(−B) = ∅, then all
eigenvalues of A + B are nonzero and A + B ∈ GLn (K). Moreover, if AB = BA, then

σ(A + B) ⊆ σ(A) + σ(B).

Taking A = B = αIn , we see that equality may occur. The example of the commuting
2 × 2 real matrices
   
A = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix}, B = \begin{pmatrix} −1 & 0 \\ 0 & 1 \end{pmatrix}, where σ(A + B) = {0} ⊂ σ(A) + σ(B) = {−2, 0, 2}
shows that a strict inclusion may occur as well.


On the other hand, if AB ≠ BA, then there may be no inclusion at all, as shown by the example
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, where σ(A + B) = {−1, 1}, σ(A) = σ(B) = {0}.

Exercise. Find non-commuting matrices A, B ∈ K n,n with σ(A + B) ⊂ σ(A) + σ(B)


(i.e., a strict inclusion of the spectra). Do there exist non-commuting matrices A, B ∈
K n,n with σ(A + B) = σ(A) + σ(B) ?

Exercise. Show that if A, B ∈ Cn,n commute, then the spectral radius satisfies ρ(A +
B) ≤ ρ(A) + ρ(B) and ρ(AB) ≤ ρ(A)ρ(B). Do these inequalities also hold for the
numerical radius?

We next address the question how many linearly independent and mutually commuting
matrices exist in K n,n , which was already asked by Frobenius in the introduction of his
article [21]. Clearly, if A ∈ K n,n , then all matrices in the sequence

In , A, A2 , . . .

commute. However, since PA (A) = 0 according to the Cayley-Hamilton Theorem, there


can be at most n linearly independent matrices in that sequence. We will now show
that a larger number of mutually commuting and linear independent matrices exists.
Consider the standard basis of K n,n , which is given by the matrices

Ek` := ek eT` = [δik δj` ] ∈ K n,n , k, ` = 1, . . . , n.

If ℓ_1 ≠ k_2 and k_1 ≠ ℓ_2, then

E_{k_1ℓ_1} E_{k_2ℓ_2} = e_{k_1} e_{ℓ_1}^T e_{k_2} e_{ℓ_2}^T = 0,
E_{k_2ℓ_2} E_{k_1ℓ_1} = e_{k_2} e_{ℓ_2}^T e_{k_1} e_{ℓ_1}^T = 0,
which shows that Ek1 `1 and Ek2 `2 commute. Consequently, the matrices in the set
S_n := { E_{kℓ} : 1 ≤ k ≤ ⌊n/2⌋, ⌊n/2⌋ + 1 ≤ ℓ ≤ n } ∪ {I_n},
where ⌊x⌋ := max{m ∈ Z : m ≤ x} for all x ∈ R, are linearly independent and mutually commuting. We have
|S_n| = 1 + ⌊n/2⌋ · (n − (⌊n/2⌋ + 1) + 1) = 1 + ⌊n/2⌋ · (n − ⌊n/2⌋) = ⌊n²/4⌋ + 1.

In particular,

S1 = {I1 } , S2 = {I2 , E12 } , S3 = {I3 , E12 , E13 } , S4 = {I4 , E13 , E14 , E23 , E24 } .
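For illustration, the set S_n and the claimed properties can be checked with a short Python sketch (all names are ad hoc choices of mine): it builds S_n, verifies the count, the mutual commutativity, and the linear independence.

import numpy as np
from itertools import combinations

def build_S_n(n):
    # S_n = { E_kl : 1 <= k <= floor(n/2) < l <= n } together with I_n
    h = n // 2
    mats = [np.eye(n)]
    for k in range(h):                 # 0-based indices: k = 0, ..., h-1
        for l in range(h, n):          # l = h, ..., n-1
            E = np.zeros((n, n))
            E[k, l] = 1.0
            mats.append(E)
    return mats

n = 5
mats = build_S_n(n)
print(len(mats) == n**2 // 4 + 1)                                        # True
print(all(np.allclose(X @ Y, Y @ X) for X, Y in combinations(mats, 2)))  # True
V = np.array([M.ravel() for M in mats])          # vectorize and check the rank
print(np.linalg.matrix_rank(V) == len(mats))                             # True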

The following result of Schur [66] shows that if K is algebraically closed, then Sn forms
a maximal set of linearly independent and mutually commuting matrices in K n,n . The
proof we give is due to Mirzakhani6 [56].

Theorem 5.15. If K is algebraically closed, then the maximal number of linearly independent and mutually commuting matrices in K^{n,n} is ⌊n²/4⌋ + 1.

Proof. We have already seen above that there exist at least ⌊n²/4⌋ + 1 linearly independent
and mutually commuting matrices in K n,n . We will show that this is the maximal number
by induction on n. The case n = 1 is trivial.
Suppose that the assertion holds for matrices of order n − 1 for some n ≥ 2, and let
A_1, . . . , A_k ∈ K^{n,n}, k := ⌊n²/4⌋ + 2,
4

be linearly independent and mutually commuting matrices. Our goal is to derive a


contradiction. Since the field K is algebraically closed, the mutually commuting matrices
6 Maryam Mirzakhani (1977–2017) became in 2014 the first woman and the first Iranian who won
the Fields Medal, the most prestigious award in mathematics. She wrote her paper [56] in 1998, while
she was a Bachelor student at Sharif University of Technology in Tehran. After finishing her Bachelor
studies in 1999 she went to Harvard University, where she obtained her PhD in 2004. From 2008 to
2017 she was a professor at Stanford University.

A1 , . . . , Ak are simultaneously triangularizable (cf. Corollary 5.7). Hence we may assume
without loss of generality that
 
A_j = \begin{pmatrix} ∗ & ∗ \\ 0 & R_j \end{pmatrix}, j = 1, . . . , k,

where Rj ∈ K n−1,n−1 is upper triangular. Then


    
A_iA_j = \begin{pmatrix} ∗ & ∗ \\ 0 & R_i \end{pmatrix} \begin{pmatrix} ∗ & ∗ \\ 0 & R_j \end{pmatrix} = \begin{pmatrix} ∗ & ∗ \\ 0 & R_iR_j \end{pmatrix} =
A_jA_i = \begin{pmatrix} ∗ & ∗ \\ 0 & R_j \end{pmatrix} \begin{pmatrix} ∗ & ∗ \\ 0 & R_i \end{pmatrix} = \begin{pmatrix} ∗ & ∗ \\ 0 & R_jR_i \end{pmatrix},

i.e., the matrices R1 , . . . , Rk are mutually commuting.


Define U := span{R1 , . . . , Rk }, then the induction hypothesis tells us that

m := dim(U) ≤ ⌊(n − 1)²/4⌋ + 1 < k.

Suppose that R1 , . . . , Rm form a basis of U, then for each i = 1, . . . , k we have


R_i = ∑_{j=1}^{m} α_{ij} R_j,

for certain scalars αij ∈ K, j = 1, . . . , m. For i = m + 1, . . . , k we now define the matrix


B_i := A_i − ∑_{j=1}^{m} α_{ij} A_j = \begin{pmatrix} ∗ & ∗ \\ 0 & R_i − ∑_{j=1}^{m} α_{ij}R_j \end{pmatrix} = \begin{pmatrix} ∗ & ∗ \\ 0 & 0 \end{pmatrix}, (5.3)

i.e., we can write

B_i = \begin{pmatrix} b_i^T \\ 0 \end{pmatrix} ∈ K^{n,n}
for some bTi ∈ K 1,n . The matrices Bm+1 , . . . , Bk are linearly independent (since A1 , . . . , Ak
are linearly independent), and hence the matrix
 
A := \begin{pmatrix} b_{m+1}^T \\ \vdots \\ b_k^T \end{pmatrix} ∈ K^{k−m,n}

has full rank k − m > 0.

An analogous construction starting with
 
A_j = \begin{pmatrix} \tilde{R}_j & ∗ \\ 0 & ∗ \end{pmatrix}, j = 1, . . . , k,

yields a positive integer


\tilde{m} ≤ ⌊(n − 1)²/4⌋ + 1 < k

and linearly independent matrices

\tilde{B}_i = [0, \tilde{b}_i] ∈ K^{n,n}, i = \tilde{m} + 1, . . . , k, (5.4)

where now \tilde{b}_{\tilde{m}+1}, . . . , \tilde{b}_k ∈ K^n, so that the matrix

\tilde{A} = [\tilde{b}_{\tilde{m}+1}, . . . , \tilde{b}_k] ∈ K^{n,k−\tilde{m}}

has full rank k − \tilde{m} > 0.
By construction, all matrices in (5.3) and (5.4) are mutually commuting, so that
B_i\tilde{B}_j = \begin{pmatrix} b_i^T \\ 0 \end{pmatrix} [0, \tilde{b}_j] = \begin{pmatrix} 0 & b_i^T\tilde{b}_j \\ 0 & 0 \end{pmatrix},
\tilde{B}_jB_i = [0, \tilde{b}_j] \begin{pmatrix} b_i^T \\ 0 \end{pmatrix} = 0.

This shows, in particular, that A\tilde{b}_i = 0 for i = \tilde{m} + 1, . . . , k, and hence dim(ker(A)) ≥ k − \tilde{m}. But then, by the dimension formula for linear maps [LM, Theorem 10.9],

n = dim(im(A)) + dim(ker(A)) = rank(A) + dim(ker(A))
≥ (k − m) + (k − \tilde{m}) = 2k − (m + \tilde{m}) ≥ 2(⌊n²/4⌋ + 2) − 2(⌊(n − 1)²/4⌋ + 1)
= 2 + 2(⌊n²/4⌋ − ⌊(n − 1)²/4⌋) = 2 + 2⌊n/2⌋,
which is indeed a contradiction for all n ≥ 2.
Exercise. Show that ⌊n/2⌋ · (n − ⌊n/2⌋) = ⌊n²/4⌋ and ⌊n²/4⌋ − ⌊(n − 1)²/4⌋ = ⌊n/2⌋ hold for all n ∈ N.

In Theorem 5.15 we have assumed that K is algebraically closed. The example with the set S_n above Theorem 5.15, however, did not make any assumptions on the field K. Thus, for any field K there exist at least ⌊n²/4⌋ + 1 linearly independent and mutually
commuting matrices. Jacobson7 [43] showed that this is indeed the maximal number for
(almost) arbitrary fields. Another set for which this maximum is achieved is shown in
the following exercise.

Exercise. Suppose that n is even and show that the set
{ \begin{pmatrix} αI_{n/2} & A \\ 0 & αI_{n/2} \end{pmatrix} : α ∈ K, A ∈ K^{n/2,n/2} }
contains ⌊n²/4⌋ + 1 linearly independent and mutually commuting matrices. Modify the definition of the set for the case of odd n.

7 Nathan Jacobson (1910–1999)

Chapter 6

Commutators and the theorems of McCoy

In Chapter 5 we have shown that commuting matrices over an algebraically closed field
are simultaneously triangularizable (Theorem 5.5). The converse of this result is however
not true, since there exist simultaneously triangularizable matrices that do not commute.
We have also shown that for two commuting matrices A, B with respective eigenvalues
αi , βi , the eigenvalues of the polynomial p(A, B) are of the form p(αi , βi ) (Corollary 5.14).
Again, the converse is not true, which motivated the following comment of Drazin on
the commutativity property of matrices [11, p. 222]:
“Indeed, no non-trivial result for general commutative matrices has yet been found which
is not true under conditions less stringent than commutativity; thus commuting is, as
such, hardly a fundamental property, and the problem of interest is to find useful gener-
alizations of it.”
A few years later, Olga Taussky expressed the same point of view [78, p. 232]:
“Apparently there are none or at most very few theorems known which say: “such and
such a statement is true if and only if AB = BA”; it is always “if AB = BA then ...”.
Hence we are tempted to replace AB = BA in various ways by weaker hypotheses.”
In this chapter we will study “useful generalizations” (Drazin) or “weaker hypotheses”
(Taussky) that lead to equivalent characterizations of the simultaneous triangularization
property; cf. our discussion of Corollary 5.14. It will turn out that this characterization
involves the additive commutator AB − BA. We will analyze additive commutators in
more detail (Section 6.1), and then prove McCoy’s important characterization theorem.
On our way to this theorem we will also derive further sufficient (but not necessary) con-
ditions for commutativity or simultaneous triangularizability of matrices (Section 6.2).
Finally, we will consider the multiplicative commutator ABA−1 B −1 of two invertible

matrices (Section 6.3).

6.1 The additive commutator


The additive commutator of two matrices A, B ∈ K n,n is the matrix C = [A, B] =
AB − BA. Obviously, C ≠ 0 if and only if A and B do not commute. In this case it is of
interest to study whether the additive commutator still gives useful information about
the relation between AB and BA.
For each S ∈ GLn (K) we have

SCS^{−1} = S(AB − BA)S^{−1} = (SAS^{−1})(SBS^{−1}) − (SBS^{−1})(SAS^{−1}) =: \hat{A}\hat{B} − \hat{B}\hat{A} = [\hat{A}, \hat{B}].

Thus, a similarity transformation of an additive commutator yields another additive


commutator.
Moreover, for all A, B ∈ K n,n we have

trace(C) = trace(AB) − trace(BA) = 0,

i.e., any additive commutator of two square matrices has trace zero.

Exercise. Show that the trace zero matrices of K n,n with the additive commutator
form a Lie algebra.

For our further analysis we need the following theorem from Linear Algebra.

Theorem 6.1. Let K be any field and A ∈ K n,n . Then there exist uniquely determined
(non-constant) monic polynomials p1 , . . . , pk ∈ K[t] with pi |pi+1 for i = 1, . . . , k − 1 and
pk = MA , such that

A = SF S −1 , where F = diag(Cp1 , . . . , Cpk )

for some S ∈ GLn (K). Here Cpi is the companion matrix of the polynomial pi (see
(A.1)), the polynomials p1 , . . . , pk are called the invariant factors of A, and the matrix F
is called the Frobenius normal form or rational canonical form of A.

Proof. We prove the result only for nonderogatory matrices A ∈ K n,n . Let MA = PA =
t^n + α_{n−1}t^{n−1} + · · · + α_0. We know that there exists a cyclic vector v ∈ K^n for A such that the
vectors
v, Av, . . . , An−1 v

are linearly independent. Now

M_A(A)v = A^n v + α_{n−1}A^{n−1}v + · · · + α_0 v = 0, or A(A^{n−1}v) = −(α_{n−1}A^{n−1}v + · · · + α_0 v),

and hence with S := [v, Av, . . . , An−1 v] ∈ GLn (K) we obtain AS = SCMA , which we
needed to show.

We also need the following result of Shoda1 [74, Hilfssatz, p. 364].

Lemma 6.2. If K is any field and A ∈ K n,n has trace(A) = 0, then A is similar to a
matrix that has only zero diagonal elements.

Proof. First recall that the trace is invariant under similarity transformations of a matrix.
After a suitable reordering, which is a similarity transformation, we can therefore assume
without loss of generality that  
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},
where A11 ∈ K j1 ,j1 has only nonzero diagonal elements and A22 ∈ K j2 ,j2 has only zero
diagonal elements. If j1 = 0, we are done.
Thus, we can assume that j1 ≥ 1. We cannot have j1 = 1, since then trace(A) =
trace(A11 ) ≠ 0 contrary to our assumption. Hence we must have j1 ≥ 2. Let S1 ∈
GLj1 (K) be a matrix such that S1−1 A11 S1 is in Frobenius normal form. Then the matrix
\begin{pmatrix} S_1 & 0 \\ 0 & I_{n−j_1} \end{pmatrix}^{−1} A \begin{pmatrix} S_1 & 0 \\ 0 & I_{n−j_1} \end{pmatrix} = \begin{pmatrix} S_1^{−1}A_{11}S_1 & S_1^{−1}A_{12} \\ A_{21}S_1 & A_{22} \end{pmatrix}
has more zero diagonal entries than A. We can reorder this matrix so that all nonzero
diagonal entries are again in the (1, 1)-block. We can then apply another similarity
transformation to the Frobenius normal form, which reduces the number of nonzero
digaonal elements. This process ends when the Frobenius normal form of the transformed
(1, 1)-block contains exactly one block. Since trace(A) = 0, the diagonal of this block,
which contains at most one nonzero element, must be zero.

Lemma 6.2 yields an important theorem of Shoda [74] about additive commutators.

Theorem 6.3. If the field K has characteristic zero, then each A ∈ K n,n with trace(A) =
0 is an additive commutator, i.e., A = [B, C] for some matrices B, C ∈ K n,n .
1 Kenjiro Shoda (1902–1977)

Proof. The matrix A is an additive commutator if and only if S −1 AS is an additive
commutator for any S ∈ GLn (K). Thus, we can use Shoda’s Lemma 6.2 and as-
sume without loss of generality that A = [aij ] with aii = 0 for i = 1, . . . , n. Let
B = diag(b_1, . . . , b_n) ∈ K^{n,n} be any matrix with b_i ≠ b_j for i ≠ j. Since the field has characteristic zero, the differences b_i − b_j for i ≠ j are invertible and we can define

c_{ij} := \begin{cases} a_{ij}/(b_i − b_j), & i ≠ j, \\ \text{arbitrary}, & i = j. \end{cases}

Then an easy calculation shows that B and C = [cij ] satisfy [B, C] = A.
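This “easy calculation” can also be checked numerically; the following numpy sketch (my own illustration of the construction, with ad hoc names) generates a random matrix with zero diagonal and verifies A = [B, C].

import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
np.fill_diagonal(A, 0.0)               # zero diagonal, in particular trace(A) = 0

b = np.arange(1.0, n + 1.0)            # pairwise distinct diagonal entries of B
B = np.diag(b)
C = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            C[i, j] = A[i, j] / (b[i] - b[j])   # c_ii may be chosen arbitrarily

print(np.allclose(B @ C - C @ B, A))   # True: A = [B, C]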

Shoda’s Theorem 6.3 was extended to arbitrary fields by Albert and Muckenhoupt [1].
The next result was shown independently and almost simultaneously by Laffey [49] and
Guralnick [28]. The proof given here follows the later work in [8].

Theorem 6.4. If K is algebraically closed, and A, B ∈ K n,n satisfy rank([A, B]) ≤ 1,


then A and B are simultaneously triangularizable.

Proof. The statement is obvious for n = 1, so we can assume that n ≥ 2. Let C = [A, B].
If rank(C) = 0, then A and B are simultaneously triangularizable by Theorem 5.5.
Thus, let us assume that rank(C) = 1. We will first show that A and B have a common
eigenvector.
Let λ ∈ K be an eigenvalue of A. Then Vλ (A) = ker(A − λIn ) and im(A − λIn ) are both
non-trivial invariant subspaces under A, i.e., both invariant subspaces have a dimension
between 1 and n − 1. If Vλ (A) is invariant under B, then B has an eigenvector in Vλ (A)
by Lemma 5.1, and this vector simultaneously is an eigenvector of A.
Now suppose that V_λ(A) is not invariant under B. Then there exists a nonzero vector x ∈ V_λ(A) such that Bx ∉ V_λ(A), i.e., (A − λI_n)Bx ≠ 0. Moreover, since rank(C) = 1
we have
im(C) = span{y}
for some nonzero vector y ∈ K n . We therefore obtain

Cx = ((A − λIn )B − B(A − λIn ))x = (A − λIn )Bx = αy

for some nonzero α ∈ K. In particular, this shows that y ∈ im(A − λIn ). For any vector
z ∈ K n there exists some µz ∈ K with

Cz = ((A − λIn )B − B(A − λIn ))z = µz y,

and therefore

B(A − λIn )z = µz y − (A − λIn )Bz ∈ im(A − λIn ).

Consequently, A and B have the common invariant subspace U (1) = im(A − λI), where
1 ≤ dim(U (1) ) ≤ n − 1.
Let the columns of U (1) ∈ K n,k1 form a basis of U (1) , where k1 := dim(U (1) ). Then
AU^{(1)} = U^{(1)} Z_A^{(1)} and BU^{(1)} = U^{(1)} Z_B^{(1)}
for some matrices Z_A^{(1)}, Z_B^{(1)} ∈ K^{k_1,k_1}.
If k1 = 1, then U (1) is a common eigenvector of A and B. Otherwise we consider the
equation
CU^{(1)} = U^{(1)} [Z_A^{(1)}, Z_B^{(1)}].
The matrix on the right hand side has rank at most 1, and since U^{(1)} has full rank, we must have
rank([Z_A^{(1)}, Z_B^{(1)}]) ≤ 1.
We can now apply the same reasoning as above to the matrices Z_A^{(1)} and Z_B^{(1)} and continue
recursively. This reduces the dimension of the matrices by at least 1 in every step. After
m ≤ n − 1 steps we obtain
AU = U Z_A^{(m)} and BU = U Z_B^{(m)}
with U = U^{(1)} · · · U^{(m)} of full rank, where Z_A^{(m)} and Z_B^{(m)} must have a common eigenvector. It is easy to see that this vector yields a common eigenvector of A and B.
Since A and B have a common eigenvector, we obtain a decomposition of the form (5.1).
This decomposition yields
C = S_1 \begin{pmatrix} 0 & ∗ \\ 0 & [A_1, B_1] \end{pmatrix} S_1^{−1}.
Here rank([A1 , B1 ]) ≤ 1, and therefore the simultaneous triangularization of A and B
follows inductively as in the proof of Theorem 5.5.

As shown above, each additive commutator [A, B] has trace zero, and hence the sum of
its eigenvalues is zero. But this does in general not mean that all the eigenvalues are
zero, or that [A, B] is nilpotent.

Example 6.5. Consider


     
A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, [A, B] = \begin{pmatrix} −1 & 0 \\ 0 & 1 \end{pmatrix}.

Here [A, B] is invertible, although neither A nor B is. Also note that A and B have
only the eigenvalue 0 with the corresponding eigenspaces given by V0 (A) = span{e2 } and
V0 (B) = span{e1 }. Since A and B do not have a common eigenvector, they are not
simultaneously triangularizable.

This example motivates the following simple result.

Lemma 6.6. Let K be any field and let A, B ∈ K n,n . If A and B have a common
eigenvector, then [A, B] is not invertible. Moreover, if A and B are simultaneously
triangularizable, then [A, B] is nilpotent.

Proof. If x ∈ K n is a common eigenvector of A and B, say Ax = λx and Bx = µx, then


[A, B]x = (λµ − µλ)x = 0, and hence [A, B] is not invertible.
If A, B ∈ K n,n are simultaneously triangularizable, say A = SRA S −1 and B = SRB S −1 ,
then
[A, B] = S(RA RB − RB RA )S −1 .
All diagonal entries of RA RB − RB RA are zero, which shows that [A, B] is nilpotent.

Example 6.16 below shows that the converses of both assertions in this lemma do not hold in general.
The next result, which is sometimes called Jacobson’s Lemma (see [42, Lemma 2]), gives
another sufficient condition for nilpotency of the additive commutator. A general version
of this lemma for linear operators in Banach spaces was proven by Kleinecke [48]. Here
we give a proof using only matrix theory, which is extracted from the proof of [12,
Theorem 2]. Apart from being interesting in its own right, the lemma will be very useful
for proving several results below.

Lemma 6.7. Let K be algebraically closed with characteristic zero or (prime) charac-
teristic p > n, let A, B ∈ K n,n , and C = [A, B]. If AC = CA or BC = CB, then C is
nilpotent.

Proof. If C has an eigenvalue λ with algebraic multiplicity n, then 0 = trace(C) =


λ + · · · + λ = nλ. Since the field K has characteristic zero or p > n, we must have λ = 0,
and thus C is nilpotent.
Now suppose that λ1 , . . . , λm , m ≥ 2, are the pairwise distinct eigenvalues of C with
respective algebraic multiplicities d1 , . . . , dm , where 1 ≤ dj < n for j = 1, . . . , m. If
AC = CA, then for any k ≥ 1 we obtain

C k = C k−1 (AB − BA) = A(C k−1 B) − (C k−1 B)A = [A, C k−1 B].

If BC = CB, then for any k ≥ 1 we obtain

C k = C k−1 (AB − BA) = (C k−1 A)B − B(C k−1 A) = [C k−1 A, B].

Thus, trace(C k ) = 0, which can be written as

d1 λk1 + d2 λk2 + · · · + dm λkm = 0.

This holds for all k ≥ 1, and hence we can take the first m of these equations and form
the linear algebraic system
    
\begin{pmatrix} 1 & 1 & \cdots & 1 \\ λ_1 & λ_2 & \cdots & λ_m \\ \vdots & \vdots & & \vdots \\ λ_1^{m−1} & λ_2^{m−1} & \cdots & λ_m^{m−1} \end{pmatrix} \begin{pmatrix} λ_1 d_1 \\ λ_2 d_2 \\ \vdots \\ λ_m d_m \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

Since λ1 , . . . , λm are pairwise distinct, the (Vandermonde) matrix on the left hand side
is invertible, giving λi di = 0 for i = 1, . . . , m. Since the field has characteristic zero or
p > n, we must have λi = 0 for i = 1, . . . , m, which however contradicts our assumption
that λ1 , . . . , λm are pairwise distinct. Consequently, C can have only one eigenvalue of
algebraic multiplicity n, and, as shown above, this is the eigenvalue λ = 0, so that C is
indeed nilpotent.

A direct consequence of Lemma 6.7 is the next result of Kato2 and Olga Taussky from
1956 [47, Theorem 2]. Two years earlier, Putnam had already established a more general
version of this result for linear operators on a Hilbert space [62].

Corollary 6.8. If A ∈ Cn,n commutes with [A, AH ], then A is normal. (Note that the
converse holds trivially.)

Proof. The field C is algebraically closed and has characteristic zero. Since A commutes
with [A, AH ], we know that [A, AH ] is nilpotent by Lemma 6.7. But [A, AH ] is also
Hermitian, and thus unitarily diagonalizable by Corollary 1.3. Hence [A, AH ] must be
the zero matrix, which means that A is normal.

Exercise. Prove the following generalization of Corollary 6.8, originally due to Olga
Taussky [79, Theorem 3]: If A, B ∈ Cn,n are such that AB and BA are Hermitian and
A commutes with [A, B], then [A, B] = 0.
2 Tosio Kato (1917–1999) published more than 160 articles and 6 monographs in English, among
them his famous Perturbation Theory of Linear Operators [45, 46]. His first article in English appeared
in 1941, and before publishing the article [47] with Olga Taussky in 1956, all his publications were
singly-authored. Thus, Olga Taussky was Kato’s first co-author.

Starting with C1 := [A, AH ], we can form C2 := [A, C1 ], C3 := [A, C2 ], and so on.
Corollary 6.8 then says that if C_1 ≠ 0 (i.e., A is not normal), then C_2 ≠ 0 (i.e., A and C_1 do not commute). But, somewhat surprisingly, C_1 ≠ 0 and C_2 ≠ 0 do in general not imply that C_3 ≠ 0, as shown by the following example of (real) matrices due to Olga Taussky [78]:
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, C_1 = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} ≠ 0, C_2 = \begin{pmatrix} 0 & −2 \\ 0 & 0 \end{pmatrix} ≠ 0, but C_3 = 0.

Lemma 6.7 also is important in the proof of the following result of Shapiro [71] (cf. also
the proof of Theorem 5.9).
Theorem 6.9. Let K be algebraically closed with characteristic zero or (prime) charac-
teristic p > n. If A, B ∈ K n,n are such that [A, B] = p(A) for some p ∈ K[t], then A
and B are simultaneously triangularizable.

Proof. Let A, B ∈ K n,n be two matrices with C = [A, B] = p(A). Since A and C
commute, we know that C is nilpotent (Lemma 6.7), and that A and C have a common
eigenvector (Lemma 5.3), say
Ay = λy and Cy = 0
for some nonzero y ∈ K n . We define
X := {x ∈ K n : Ax = λx and Cx = 0},
which is a subspace of K n with dimension at least one. Note that for each polynomial
q ∈ K[t] and vector x ∈ X we have q(A)x = q(λ)x. Since K has characteristic zero and
we can choose a nonzero vector x, the equation 0 = Cx = p(A)x = p(λ)x then shows
that p(λ) = 0.
For each x ∈ X we have
A(Bx) = (C + BA)x = BAx = λ(Bx),
and inductively we obtain q(A)Bx = q(λ)Bx for every polynomial q ∈ K[t]. In particu-
lar,
C(Bx) = p(A)Bx = p(λ)Bx = 0,
so that X is invariant under B. Consequently, A and B have a common eigenvector in
X , and we obtain a decomposition of the form (5.1). Writing C1 := A1 B1 − B1 A1 , a
straightforward computation shows that
   
C = AB − BA = S \begin{pmatrix} 0 & ∗ \\ 0 & C_1 \end{pmatrix} S^{−1} = p(A) = S \begin{pmatrix} 0 & ∗ \\ 0 & p(A_1) \end{pmatrix} S^{−1}.
Since C1 = p(A1 ), the result follows inductively as in Theorem 5.5.

Theorem 6.9 was also shown by Bourgeois [6], and Ikramov [41] derived a variant of the
result using Shemesh’s necessary and sufficient condition for the existence of a common
eigenvector of two matrices (Theorem 5.4). Combining Theorem 6.9 and Theorem 4.5
shows that if K is algebraically closed with characteristic zero, and A, B ∈ K n,n are such
that A is nonderogatory and commutes with [A, B], then A and B are simultaneously
triangularizable.

Exercise. Show that if K is algebraically closed with characteristic zero, and


A, B ∈ K 2,2 are such that A commutes with [A, B], then A and B are simultaneously
triangularizable.

Exercise. Investigate whether the assumption in Theorem 6.9 that K has character-
istic zero can be replaced by K having (prime) characteristic p > n.

The next result of Shapiro [70] relates the commutativity of two matrices to the commu-
tativity with the additive commutator.
Theorem 6.10. Let K be algebraically closed and let A, B ∈ K n,n . If A is diagonalizable
and commutes with [A, B], then [A, B] = 0.

Proof. Since the field is algebraically closed and A and C = [A, B] commute, there exists
a simultaneous triangularization by Theorem 5.5, i.e., A = SR1 S −1 and C = SR2 S −1
with upper triangular matrices R1 , R2 ∈ K n,n . Since A is diagonalizable, we can choose
S so that R1 is diagonal. Furthermore, we can assume without loss of generality that
R1 = diag(λ1 Id1 , . . . , λk Idk ), where λi ≠ λj for i ≠ j.
Using the simultaneous triangularization, the equation AC = CA yields R1 R2 = R2 R1 .
Let us write R2 = [Rij ] with diagonal blocks Rii ∈ K di ,di for i = 1, . . . , k. Then the
(i, j)-block of the equation R1 R2 = R2 R1 can be written as

(λi Idi )Rij − Rij (λj Idj ) = 0, i, j = 1, . . . , k.

This is a Sylvester equation for the matrix Rij . For i ≠ j we have λi ≠ λj , and
thus Theorem 4.1 implies that Rij = 0 is the unique solution. Consequently, R2 =
diag(R11 , . . . , Rkk ).
We can now write the equation C = SR2 S −1 in the equivalent form
R_1\hat{B} − \hat{B}R_1 = R_2, where \hat{B} := S^{−1}BS.

Let \hat{B} = [B_{ij}] with diagonal blocks B_{ii} ∈ K^{d_i,d_i} for i = 1, . . . , k. Then the (i, j)-block of
the last equation is given by

(λi Idi )Bij − Bij (λj Idj ) = Rij , i, j = 1, . . . , k.

This is a Sylvester equation for the matrix B_{ij}. For i ≠ j we have R_{ij} = 0 and λ_i ≠ λ_j, and thus B_{ij} = 0 is the unique solution by Theorem 4.1. This means that \hat{B} = diag(B_{11}, . . . , B_{kk}), and we get

AB = (SR_1S^{−1})(S\hat{B}S^{−1})
= S diag(λ1 Id1 , . . . , λk Idk ) diag(B11 , . . . , Bkk ) S −1
= S diag(B11 , . . . , Bkk ) diag(λ1 Id1 , . . . , λk Idk ) S −1
= (Sdiag(B11 , . . . , Bkk )S −1 ) (Sdiag(λ1 Id1 , . . . , λk Idk )S −1 )
= BA,

which completes the proof.

A special case of Theorem 6.10, which was first shown by Putnam [61], is the following:
If A, B ∈ Cn,n are such that A is normal and commutes with [A, B], then [A, B] = 0.
The following example gives some more insight into the necessity of the diagonalizability
assumption on A in Theorem 6.10.
Example 6.11. For the matrices
   
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} and B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
we have
C = [A, B] = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, and AC = CA, but [A, B] ≠ 0.
The matrix A is not diagonalizable, and the conclusion of Theorem 6.10 does not hold.

We will now have a brief look at the additive commutator of two linear operators on
a Hilbert space V . This case is important in the theory of quantum mechanics, where
Heisenberg’s relation 3 can be written as [P, Q] = α idV for some scalar α ≠ 0, and P, Q
are self-adjoint quantum-mechanical operators4 .
For finite matrices P, Q such a relation cannot hold because the trace of their additive
commutator is zero. Wintner in 1947 [88] and Wielandt5 in 1949 [87] showed that the
Heisenberg relation is also impossible when the two linear operators are bounded.
Theorem 6.12. If P, Q are two bounded linear operators on a Hilbert space V with
[P, Q] = αidV , then α = 0.
3 Werner Heisenberg (1901–1976)
4 Kato and Olga Taussky remarked about this in 1956 [47, p. 38]: “The non-vanishing of the commutator of canonically conjugate operators is the root of the Heisenberg uncertainty principle. One infers that commutability is not “purely mathematical”, but of metamathematical interest.”
5 Helmut Wielandt (1910–2001)

Proof. Wintner’s proof: Let [P, Q] = αidV . If we define Pλ := P + λidV , then

[Pλ , Q] = (P + λidV )Q − Q(P + λidV ) = P Q + λQ − QP − λQ = [P, Q] = αidV .

Since Pλ satisfies the same relation as P for any scalar λ, we can assume without loss
of generality that P is invertible. Then QP = P −1 (P Q)P , which implies that σ(QP ) =
σ(P Q), and thus

σ(P Q) = σ(QP + αidV ) = σ(QP ) + α = σ(P Q) + α.

Since α is a constant, we must have α = 0.


Wielandt’s proof: Let [P, Q] = αidV . Then

P 2 Q − QP 2 = P 2 Q − P QP + P QP − QP 2 = P (P Q − QP ) + (P Q − QP )P = 2αP,

and hence, by induction, P n Q − QP n = nαP n−1 for n = 1, 2, . . . . If P is nilpotent of


some index k we have P^{k−1} ≠ 0 and thus

0 = P k Q − QP k = αkP k−1 ,

which implies that α = 0. If P is not nilpotent, then for each n = 1, 2, . . . we get

n|α| ‖P^{n−1}‖ = ‖P^nQ − QP^n‖ ≤ 2‖P^{n−1}‖ ‖P‖ ‖Q‖,

which yields |α| ≤ 2‖P‖‖Q‖/n (note that ‖P^{n−1}‖ ≠ 0 since P is not nilpotent). This holds for all n ≥ 1, which implies α = 0.

A consequence of this result is that linear operators P, Q with [P, Q] = αidV and α 6= 0
must be unbounded6 .

Example 6.13. This simple illustrating example is taken from [30, p. 128]. Let V =
L2 (−∞, ∞) and define the (unbounded) linear operators P, Q by

P f(x) = f′(x) and Qf(x) = xf(x)

for all x ∈ R. For these we have

[P, Q]f(x) = P(xf(x)) − Qf′(x) = f(x) + xf′(x) − xf′(x) = f(x),

and hence [P, Q] = idV .


6 Wintner wrote about this observation in 1947 [88, p. 739]: “In fact, the unboundedness of Q, P
is indicated by heuristic considerations concerning the location of possible energy levels. However, the
literature consulted does not contain a general proof.”

6.2 The theorems of McCoy
In this section we will prove two important theorems of McCoy that characterize the
simultaneous triangularization property of matrices. It is clear that commutativity is
sufficient but not necessary for the simultaneous triangularization, and hence a different
property must be found. In a first step in this direction, McCoy introduced the following
definition in 1934 [53].
Definition 6.14. Let K be a field, A, B ∈ K n,n and C = [A, B]. If AC = CA and
BC = CB, then A, B are called quasi-commuting.

It is obvious that commuting matrices are quasi-commuting7 . If A, B are quasi-commuting


and one of these matrices is diagonalizable, then they must be commuting by Theo-
rem 6.10. In other words, if A, B are quasi-commuting matrices that do not commute,
then both A and B must be non-diagonalizable.
If A, B are quasi-commuting, then Lemma 6.7 implies that C = [A, B] is nilpotent. For
any nilpotent matrix C ∈ K n,n , necessary and sufficient conditions for the existence of
A, B ∈ K n,n with C = [A, B], AC = CA, and BC = CB were obtained by McCoy
in [53, Theorem 2]. In particular, he showed that quasi-commuting matrices that are not
commuting exist only for n ≥ 3 [53, p. 335].
The next result is known as the little theorem of McCoy [53]. The strategy of the proof
given here is due to Egan and Ingram [14] (cf. also the proofs of Theorems 5.9 and 6.9).
Theorem 6.15. Let K be algebraically closed with characteristic zero or (prime) char-
acteristic p > n. If A, B ∈ K n,n are quasi-commuting, then A and B are simultaneously
triangularizable.

Proof. Let A, B ∈ K n,n be two quasi-commuting matrices. We write C = AB − BA,


then AC = CA and Lemma 6.7 implies that C is nilpotent. Moreover, A and C have a
common eigenvector by Lemma 5.3, say
Ay = λy and Cy = 0
for some nonzero y ∈ K n . Consider the set
X := {x ∈ K n : Ax = λx and Cx = 0},
which is a subspace of K n with dimension at least one. For each x ∈ X we have
A(Bx) = (C + BA)x = BAx = λ(Bx) and C(Bx) = BCx = 0,
7 In order to have two distinct classes of matrices, McCoy [53] originally defined A, B quasi-commuting if C = [A, B] ≠ 0, AC = CA, and BC = CB.

and hence Bx ∈ X . Thus, X is invariant under B, and Lemma 5.1 implies that B has
an eigenvector in X , which simultaneously is an eigenvector of A. We therefore obtain a
decomposition of the form (5.1), and a straightforward computation shows that
 
C = S \begin{pmatrix} 0 & ∗ \\ 0 & C_1 \end{pmatrix} S^{−1}, where C_1 := [A_1, B_1].

Now AC = CA and BC = CB yield

A1 C1 = C1 A1 and B1 C1 = C1 B1 .

Since A1 , B1 are quasi-commuting, the result follows inductively as in the proof of The-
orem 5.5.

In Theorem 3.14 we have shown that quasi-commuting matrices A, B ∈ Cn,n satisfy

exp(tA) exp(tB) = exp(t(A + B) + (t2 /2) [A, B]).

By Theorem 6.15, the matrices A and B are simultaneously triangularizable. Let A = SR_AS^{−1} and B = SR_BS^{−1} be such triangularizations with R_A = [r_{ij}^{(A)}] and R_B = [r_{ij}^{(B)}],
then
exp(tRA ) exp(tRB ) = exp(t(RA + RB ) + (t2 /2) [RA , RB ]).
In particular, for the diagonal entries, which are the eigenvalues of exp(tA) exp(tB),
e^{t r_{jj}^{(A)}} e^{t r_{jj}^{(B)}} = e^{t(r_{jj}^{(A)} + r_{jj}^{(B)})}, j = 1, . . . , n,

since the (upper triangular) additive commutator [RA , RB ] is nilpotent and hence has a
zero diagonal. What we have derived can be considered a (significant) generalization of
the usual rule ea eb = ea+b for the scalar exponential function.
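As a hedged numerical illustration (my own addition): the matrices A = E_12 and B = E_23 in C^{3,3} are quasi-commuting but not commuting, since C = [A, B] = E_13 satisfies AC = CA = 0 and BC = CB = 0. The sketch below checks the exponential identity and the predicted eigenvalues with scipy.

import numpy as np
from scipy.linalg import expm

A = np.zeros((3, 3)); A[0, 1] = 1.0    # E_12
B = np.zeros((3, 3)); B[1, 2] = 1.0    # E_23
C = A @ B - B @ A                      # equals E_13

print(np.allclose(A @ C, C @ A), np.allclose(B @ C, C @ B))   # True True
print(np.allclose(A @ B, B @ A))                              # False: not commuting

t = 0.7
lhs = expm(t * A) @ expm(t * B)
rhs = expm(t * (A + B) + (t**2 / 2) * C)
print(np.allclose(lhs, rhs))                                  # True
# A and B are nilpotent, so all eigenvalues of exp(tA) exp(tB) equal e^0 e^0 = 1:
print(np.linalg.eigvals(lhs))                                 # all (numerically) 1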
The following example shows that for simultaneous triangularizability it is in general not
sufficient that the additive commutator commutes with only one of the two matrices A
and B.
Example 6.16. For the matrices
   
A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} and B = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}

we have

C = AB − BA = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & −1 \\ 0 & 0 & 0 \end{pmatrix}, AC = CA, and BC ≠ CB.

The additive commutator is nilpotent, as guaranteed by Lemma 6.7. Both A and B have
only the eigenvalue 0 with the corresponding eigenspaces given by V0 (A) = span{e1 , e2 }
and V0 (B) = span{e3 }. Since A and B do not have a common eigenvector, they are not
simultaneously triangularizable.

Exercise. Show that if K is algebraically closed with characteristic zero or (prime)


characteristic p > n, A1 , . . . , Am ∈ K n,n , Cij = [Ai , Aj ], and Ak Cij = Cij Ak for
i, j, k = 1, . . . , m, then A1 , . . . , Am are simultaneously triangularizable.

Two years after his article [53], McCoy was able to give necessary and sufficient conditions
for the simultaneous triangularization property. McCoy’s original proof of Theorem 6.18
below uses abstract algebraic structures. The proof we give uses only matrix theory and
is based on the next lemma of Drazin, Dungey and Gruenberg [12]. This lemma and the
following developments use polynomials p(t1 , . . . , tm ) in the non-commuting variables
t1 , . . . , tm . Such polynomials are linear combinations of words in the variables. For
example, if m = 2, then a word in t1 and t2 is an expression of the form
t_1^{k_1} t_2^{ℓ_1} t_1^{k_2} t_2^{ℓ_2} · · · t_1^{k_j} t_2^{ℓ_j},

where the ki , `i are nonnegative integers, whose sum is called the degree of the word. It
is not allowed to change the order of the monomials in such words, so for example t1 t2 ≠
t2 t1 . Of course, if we evaluate such a polynomial p(t1 , . . . , tm ) at mutually commuting
matrices A1 , . . . , Am , we may again commute the matrices.

Lemma 6.17. Let K be algebraically closed and let A1 , . . . , Am ∈ K n,n . Suppose that
for every polynomial p(t1 , . . . , tm ) in the non-commuting variables t1 , . . . , tm each of the
matrices p(A1 , . . . , Am )[Ak , A` ] for k, ` = 1, . . . , m is nilpotent. Then for every x ∈
K n \{0} the matrices A1 , . . . , Am have a common eigenvector of the form q(A1 , . . . , Am )x
for some polynomial q(t1 , . . . , tm ).

Proof. We prove the result by induction on m.


For m = 1 let A ∈ K n,n and x ∈ K n \ {0} be arbitrary. The n + 1 vectors x, Ax, . . . , An x
are linearly dependent. Hence there exists a unique monic polynomial q1 ∈ K[t] of
smallest possible degree (at least 1 since x 6= 0) with q1 (A)x = 0. Let λ ∈ K be any zero
of q1 , i.e., q1 = (t − λ) · q for some nonzero polynomial q ∈ K[t]. Then q(A)x ≠ 0, since
deg(q) < deg(q1 ), and thus

0 = q1 (A)x = (A − λIn )q(A)x.

This can be written as A(q(A)x) = λq(A)x, i.e., q(A)x is an eigenvector of A.

Now suppose that the result holds for m − 1 and some m ≥ 2. Let x ∈ K n \ {0} be
arbitrary, and let A1 , . . . , Am ∈ K n,n be given, where for every polynomial p(t1 , . . . , tm )
in the non-commuting variables t1 , . . . , tm each of the matrices

p(A1 , . . . , Am )[Ak , A` ], k, ` = 1, . . . , m,

is nilpotent. We write Ck` = [Ak , A` ]. By the induction hypothesis there exists a common
eigenvector of A1 , . . . , Am−1 of the form y1 = q(A1 , . . . , Am−1 )x. We distinguish two cases:
Case 1: For every polynomial p ∈ K[t] and i = 1, . . . , m − 1 we have Cim p(Am )y1 = 0,
or equivalently
(Ai Am ) p(Am )y1 = (Am Ai ) p(Am )y1 . (6.1)
Then, in particular,

0 = Cim y1 = (Ai Am − Am Ai )y1 , or Ai Am y1 = Am Ai y1 .

We claim that then Ai p(Am )y1 = p(Am )Ai y1 for every p ∈ K[t]. Indeed, suppose that
A_iA_m^k y_1 = A_m^kA_iy_1 for some k ≥ 1. Then (6.1) yields

A_iA_m^{k+1}y_1 = (A_iA_m)A_m^ky_1 = (A_mA_i)A_m^ky_1 = A_m^{k+1}A_iy_1.

From the base case of the induction we know that there exits a (nonzero) polynomial
q1 ∈ K[t] such that the (nonzero) vector q1 (Am )y1 is an eigenvector of Am . Recall that
y1 is an eigenvector of Ai for each i = 1, . . . , m − 1. Thus, for each i = 1, . . . , m − 1 there
exists some λi ∈ K such that

Ai q1 (Am )y1 = q1 (Am )(Ai y1 ) = q1 (Am )(λi y1 ) = λi q1 (Am )y1 ,

which shows that q1 (Am )y1 = q1 (Am )q(A1 , . . . , Am−1 )x =: qe(A1 , . . . , Am )x is a common
eigenvector of A1 , . . . , Am .
Case 2: There exists a matrix C1 := Cim for some i ∈ {1, . . . , m − 1} and a polynomial
p1 ∈ K[t] with C1 p1 (Am )y1 ≠ 0. Then, by the induction hypothesis, there exists a
common eigenvector of A1 , . . . , Am−1 of the form

y2 := q1 (A1 , . . . , Am−1 )C1 p1 (Am )y1 .

If we have Cim p(Am )y2 = 0 for every polynomial p ∈ K[t] and i = 1, . . . , m − 1, we can
proceed with y2 as in Case 1 in order to construct a common eigenvector of A1 , . . . , Am
which has the required form.
Otherwise, there exists a matrix C2 := Cim for some i ∈ {1, . . . , m − 1} and a polynomial
p2 ∈ K[t] with C2 p2 (Am )y2 ≠ 0. By the induction hypothesis there now exists a common
eigenvector of A1 , . . . , Am−1 of the form

y3 := q2 (A1 , . . . , Am−1 )C2 p2 (Am )y2 .

Proceeding in this way we can construct a sequence of vectors

yk+1 = qk (A1 , . . . , Am−1 )Ck pk (Am )yk , k = 1, 2, . . . .

This sequence terminates at some step if and only if yk+1 is a common eigenvector of
A1 , . . . , Am−1 that satisfies Cim p(Am )yk+1 = 0 for all i = 1, . . . , m − 1 and p ∈ K[t].
If the sequence terminates with such a vector, we can use the same argument as in
Case 1 in order to construct a common eigenvector of A1 , . . . , Am which has the required
form. We will now show (by contradiction) that the sequence must terminate with such
a vector.
Suppose that the sequence does not terminate. Since the space K n is n-dimensional, the
vectors y1 , . . . , yn+1 are linearly dependent, and hence
∑_{j=1}^{n+1} µ_j y_j = 0,

where µ_1, . . . , µ_{n+1} ∈ K are not all equal to zero. If k ≥ 1 is the smallest integer such that µ_k ≠ 0, then we must have k ≤ n and can write

−µ_k y_k = ∑_{j=k+1}^{n+1} µ_j y_j.

By construction,

yk+1 = qk (A1 , . . . , Am−1 )Ck pk (Am )yk ,


yk+2 = qk+1 (A1 , . . . , Am−1 )Ck+1 pk+1 (Am )yk+1 ,

etc., and hence there exists a polynomial u(t1 , . . . , tm ) such that

−µk yk = u(A1 , . . . , Am )Ck pk (Am )yk .

This yields
(pk (Am )u(A1 , . . . , Am )Ck ) (pk (Am )yk ) = −µk (pk (Am )yk ),
i.e., pk (Am )yk ≠ 0 is an eigenvector of the matrix pk (Am )u(A1 , . . . , Am )Ck correspond-
ing to the nonzero eigenvalue −µk . This contradicts our assumption that each matrix
p(A1 , . . . , Am )Cim is nilpotent, and the proof is complete.

This lemma allows a very simple proof of the following theorem of McCoy, which he
considered “a perfection of Frobenius’ theorem” [54, p. 593]; cf. the discussion of Corol-
lary 5.14. The proof we give is due to Drazin, Dungey and Gruenberg [12].

Theorem 6.18. Let K be algebraically closed and let A1 , . . . , Am ∈ K n,n . Then the
following assertions are equivalent:

(1) The matrices A1 , . . . , Am are simultaneously triangularizable.


(2) For every polynomial p(t1 , . . . , tm ) in the (non-commuting) variables t1 , . . . , tm the
eigenvalues of p(A1 , . . . , Am ) are of the form p(λ_i^{(1)} , . . . , λ_i^{(m)} ), i = 1, . . . , n, where
λ_1^{(j)} , . . . , λ_n^{(j)} are the eigenvalues of Aj , j = 1, . . . , m, in some appropriate ordering.
(3) For every polynomial p(t1 , . . . , tm ) in the (non-commuting) variables t1 , . . . , tm and
all k, ` = 1, . . . , m, the matrix p(A1 , . . . , Am )[Ak , A` ] is nilpotent.
(In particular, each additive commutator [Ak , A` ] is nilpotent.)

Proof. (1) ⇒ (2) : If A1 , . . . , Am are simultaneously triangularizable, i.e., Aj = SRj S −1
with Rj upper triangular for j = 1, . . . , m, then

A_{j_1}^{i_1} A_{j_2}^{i_2} · · · A_{j_m}^{i_m} = S(R_{j_1}^{i_1} R_{j_2}^{i_2} · · · R_{j_m}^{i_m} )S −1 .

Thus, the eigenvalues of A_{j_1}^{i_1} A_{j_2}^{i_2} · · · A_{j_m}^{i_m} are the diagonal elements of R_{j_1}^{i_1} R_{j_2}^{i_2} · · · R_{j_m}^{i_m} . The
implication now follows by induction (cf. the proof of Corollary 5.14).
(2) ⇒ (3) : Let p = p(t1 , . . . , tm ) be any polynomial in the (non-commuting) variables
t1 , . . . , tm , and define pk` := p · (tk t` − t` tk ), so that
pk` (A1 , . . . , Am ) = p(A1 , . . . , Am )[Ak , A` ].
If (2) holds, then the eigenvalues of pk` (A1 , . . . , Am ) are of the form
p(λ_i^{(1)} , . . . , λ_i^{(m)} ) · (λ_i^{(k)} λ_i^{(`)} − λ_i^{(`)} λ_i^{(k)} ) = 0, i = 1, . . . , n.
Thus, pk` (A1 , . . . , Am ) is nilpotent.
(3) ⇒ (1) : If (3) holds, then A1 , . . . , Am have a common eigenvector by Lemma 6.17,
say Aj x1 = λj x1 for j = 1, . . . , m. Taking any matrix S = [x1 , x2 , . . . , xn ] ∈ GLn (K) we
obtain decompositions of the form
 
Aj S = S \begin{pmatrix} λ_j & ∗ \\ 0 & \tilde{A}_j \end{pmatrix}

with \tilde{A}_j ∈ K^{n−1,n−1} for j = 1, . . . , m. For any polynomial p(t1 , . . . , tm ) in the (non-
commuting) variables t1 , . . . , tm , and any pair k, ` we have

p(A1 , . . . , Am )[Ak , A` ] = S \begin{pmatrix} 0 & ∗ \\ 0 & p(\tilde{A}_1 , . . . , \tilde{A}_m )[\tilde{A}_k , \tilde{A}_` ] \end{pmatrix} S^{−1} .

The matrix on the left is nilpotent, hence p(\tilde{A}_1 , . . . , \tilde{A}_m )[\tilde{A}_k , \tilde{A}_` ] is nilpotent, and the
simultaneous triangularization follows by induction as in the proof of Theorem 5.5.

It is clear that for K = C we can write unitarily triangularizable in (1) of Theo-
rem 6.18. Moreover, in (3) the condition that p(A1 , . . . , Am )[Ak , A` ] is nilpotent implies
that [Ak , A` ]p(A1 , . . . , Am ) is nilpotent and vice versa, since these matrices have the same
eigenvalues; cf. Lemma A.1.
For algebraically closed fields with characteristic zero the little theorem of McCoy (The-
orem 6.15) shows that the quasi-commutativity property

Ak [Ai , Aj ] = [Ai , Aj ]Ak , i, j, k = 1, . . . , m,

implies (1)–(3) in Theorem 6.18.

Example 6.19. Consider the matrices


    
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} , B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} , so that [A, B] = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} .

Since [A, B] is invertible, the condition of (3) in Theorem 6.18 fails, e.g., for the polynomial
p = 1. Thus, A and B are not simultaneously triangularizable. Also note that σ(A+B) =
{−1, 1}, while σ(A) + σ(B) = {0}.

Exercise. Show that if A1 , . . . , Am in Theorem 6.18 are normal, then (1)–(3) are
equivalent with [Ai , Aj ] = 0 for all i, j = 1, . . . , m.

Exercise. Use Theorem 6.18 to give an alternative proof of the following special
case of Theorem 5.9, originally due to Schneider [64]: If K is algebraically closed and
A, B ∈ K n,n are such that AB = 0, then A and B are simultaneously triangulariz-
able. (Hint: Consider any polynomial p in two (non-commuting) variables and form
(p(A, B)[A, B])2 .)

The following is an overview of the results related to the simultaneous triangularization


of two matrices A, B ∈ K n,n , where K is an algebraically closed field (sometimes with
characteristic zero or (prime) characteristic p > n) and C = [A, B]:

• C = 0 =⇒ A, B simultaneously triangularizable (Thm. 5.5)

• A diagonalizable and AC = CA =⇒ A, B simultaneously triangularizable (Thm. 6.10)

• A and B commute with AB =⇒ A, B simultaneously triangularizable (Thm. 5.9)

• rank(C) = 1, or C = p(A), or A and B commute with C =⇒ A, B simultaneously triangularizable (Thm. 6.4 & 6.9 & 6.15)

• A, B simultaneously triangularizable ⇐⇒ σ(p(A, B)) = {p(αi , βi )} for any p ∈ K[s, t] ⇐⇒ p(A, B)C nilpotent for any p ∈ K[s, t] (Thm. 6.18, cf. Cor. 5.14)

The sufficient conditions for simultaneous triangularizability of two matrices in Theo-


rems 5.5, 5.9, 6.15, and to some extent also the one in Theorem 6.9 are relatively easy
to check. It is computationally very expensive, however, to test whether the necessary
and sufficient condition (3) in Theorem 6.18 is satisfied. As shown by Al’pin and Ko-
reshkov [2] using a theorem on the nilpotency of a finite-dimensional nilalgebra, the
condition (3) for two matrices A, B ∈ K n,n is equivalent with the following (simpler)
condition:
The trace of every matrix in the set

{ A1 · · · Ak [A, B] : Ai ∈ {A, B}, i = 1, . . . , k, k = 0, 1, . . . , n^2 − 1 }

is equal to zero.
Since [A, B] has trace zero, one needs to test
\sum_{k=1}^{n^2 −1} 2^k = 2^{n^2} − 2

matrices whether they have trace zero. Each of these computations requires O(n^3 ) multi-
plications, and thus a possible algorithm based on the condition above requires O(2^{n^2} n^3 )
multiplications. Bourgeois implemented such a method and concluded “practically, one
cannot test a pair of numerical matrices of dimension greater than five” [6, p. 593].
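
For small matrices the trace condition above is nevertheless easy to try out numerically. The following Python/NumPy sketch (the function name, the tolerance and the final demo are merely illustrative) enumerates all words A1 · · · Ak with Ai ∈ {A, B} and k = 0, . . . , n^2 − 1 and checks whether trace(A1 · · · Ak [A, B]) vanishes; in exact arithmetic this is the Al'pin–Koreshkov condition quoted above, while in floating point it is only an approximate test. Applied to the matrices of Example 6.19 it correctly reports that the pair is not simultaneously triangularizable.

import itertools
import numpy as np

def trace_test(A, B, tol=1e-10):
    # Check trace(A_1 ... A_k [A, B]) = 0 for all words of length k = 0, ..., n^2 - 1.
    n = A.shape[0]
    C = A @ B - B @ A
    for k in range(n * n):
        for word in itertools.product((A, B), repeat=k):
            W = np.eye(n)
            for M in word:
                W = W @ M
            if abs(np.trace(W @ C)) > tol:
                return False           # some word yields a nonzero trace
    return True                        # all traces vanish (numerically)

# Example 6.19: [A, B] is invertible, so the test must fail
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
print(trace_test(A, B))                   # False
print(np.linalg.eigvals(A + B))           # eigenvalues -1 and 1, whereas sigma(A) + sigma(B) = {0}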

6.3 The multiplicative commutator
If G is any group and x, y ∈ G, then the element (x, y) := xyx−1 y −1 ∈ G is called the
group commutator of x and y. If e ∈ G is the unit element, then (x, y) = e holds if and
only if x and y commute. Moreover, (x, y)−1 = (y, x).
For the group GLn (K) and A, B ∈ GLn (K) we accordingly have

(A, B) := ABA−1 B −1 ,

and (A, B) = In if and only if [A, B] = 0. In the following we will call (A, B) the
multiplicative commutator of A and B. We immediately observe that the multiplicative
commutator has determinant one:

det(ABA−1 B −1 ) = det(A) det(B) det(A−1 ) det(B −1 ) = 1.

(Recall that the additive commutator [A, B] has trace zero.)


The next result is (almost) the multiplicative analogue of Theorem 6.3.

Theorem 6.20. If K is any field and A ∈ K n,n is diagonalizable with det(A) = 1, then
A is a multiplicative commutator, i.e., A = (B, C) for some B, C ∈ GLn (K).

Proof. Let A = SΛS −1 be a diagonalization of A, where

Λ = diag(λ1 , . . . , λn ) with \prod_{i=1}^{n} λ_i = 1.

We define µ_k := \prod_{i=1}^{k} λ_i for k = 1, . . . , n, so that in particular µ1 = λ1 and µn = 1.
Moreover, we define

D := diag(µ1 , . . . , µn ) ∈ GLn (K),
B := SP S −1 , where P := [e2 , . . . , en , e1 ],
C := SD−1 S −1 .

Note that then P −1 = P T = [en , e1 , . . . , en−1 ], and hence

P D−1 = [µ_1^{−1} e_2 , . . . , µ_{n−1}^{−1} e_n , µ_n^{−1} e_1 ] and P T D = [µ_1 e_n , µ_2 e_1 , . . . , µ_n e_{n−1} ],

which gives

(P D−1 )(P T D) = diag(µ_n^{−1} µ_1 , µ_1^{−1} µ_2 , . . . , µ_{n−1}^{−1} µ_n ) = Λ.

Consequently,

(B, C) = BCB −1 C −1 = (SP S −1 )(SD−1 S −1 )(SP T S −1 )(SDS −1 ) = S(P D−1 P T D)S −1 = SΛS −1 = A.
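
The construction in this proof is completely explicit and can be turned into a short numerical sketch (Python/NumPy; for defective or ill-conditioned A the eigendecomposition used here is of course unreliable, so this only illustrates the formulas and is not a robust algorithm).

import numpy as np

def commutator_factors(A):
    # Construct B, C with A = B C B^{-1} C^{-1} as in the proof of Theorem 6.20:
    # A = S diag(lambda) S^{-1}, mu_k = lambda_1 ... lambda_k, P the cyclic permutation.
    n = A.shape[0]
    lam, S = np.linalg.eig(A)
    mu = np.cumprod(lam)                       # mu_n = det(A) = 1
    D = np.diag(mu)
    P = np.roll(np.eye(n), 1, axis=0)          # columns e_2, ..., e_n, e_1
    Sinv = np.linalg.inv(S)
    B = S @ P @ Sinv                           # B = S P S^{-1}
    C = S @ np.linalg.inv(D) @ Sinv            # C = S D^{-1} S^{-1}
    return B, C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A /= np.linalg.det(A) ** (1 / 4)               # normalize so that det(A) = 1
B, C = commutator_factors(A)
print(np.linalg.norm(B @ C @ np.linalg.inv(B) @ np.linalg.inv(C) - A))   # close to zero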

In Theorem 6.20 we have made the rather strong assumption that A is diagonalizable.
As shown by Shoda [74, Satz 1], if K is algebraically closed, then every matrix A ∈ K n,n
with det(A) = 1 is a multiplicative commutator8 .
Theorem 6.20 yields the following corollary due to Ky Fan9 [16, Theorems 3 and 5].

Corollary 6.21. If A ∈ Cn,n is normal or unitary with det(A) = 1, then A = (B, C)


for some normal or unitary matrices B, C ∈ GLn (C), respectively.

Exercise. Derive Corollary 6.21 from Theorem 6.20.

For Hermitian matrices we have the following result, also due to Ky Fan [16, Theorem 6].

Theorem 6.22. Let A ∈ Cn,n be Hermitian with det(A) = 1. Then A can be written as
A = (B, C) for some Hermitian matrices B, C ∈ GLn (C) if and only if A and A−1 have
the same eigenvalues.

Proof. Suppose that A = AH with det(A) = 1, and that A = (B, C) for some matrices
B, C ∈ GLn (C). Then

A = BCB −1 C −1 = (BCB −1 C −1 )H = C −1 B −1 CB = (C −1 B −1 )(CB),


A−1 = (BCB −1 C −1 )−1 = CBC −1 B −1 = (CB)(C −1 B −1 ).

Lemma A.1 shows that A and A−1 have the same eigenvalues.
On the other hand, suppose that A = AH with det(A) = 1 is given, where A and A−1
have the same eigenvalues. Since det(A) = 1 we can write

A = U ΛU^H , where Λ = diag(1, . . . , 1, λ_1 , λ_1^{−1} , . . . , λ_k , λ_k^{−1} ).
8 Olga Taussky wrote that she “adored this theorem” [81, p. 802]. Her first official PhD student
Robert C. Thompson (1931–1995) extended Shoda's result to more general fields in 1961 [82].
9 Ky Fan (1914–2010)

Now note that

\begin{pmatrix} λ_j & 0 \\ 0 & λ_j^{−1} \end{pmatrix} = \begin{pmatrix} λ_j & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} λ_j & 0 \\ 0 & 1 \end{pmatrix}^{−1} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{−1} ,

where the matrices \begin{pmatrix} λ_j & 0 \\ 0 & 1 \end{pmatrix} and \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} are Hermitian (even real symmetric). From this
it is easy to see that there exist Hermitian matrices B, C ∈ GLn (C) with A = (B, C).
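
The 2 × 2 identity used here is easy to verify directly; the following minimal NumPy check (with an arbitrarily chosen value of λ) confirms it for one pair λ, λ^{−1}.

import numpy as np

lam = 3.0                                   # any nonzero real eigenvalue
X = np.diag([lam, 1.0])                     # Hermitian (real symmetric)
Y = np.array([[0.0, 1.0], [1.0, 0.0]])      # Hermitian (real symmetric)
lhs = np.diag([lam, 1.0 / lam])
rhs = X @ Y @ np.linalg.inv(X) @ np.linalg.inv(Y)
print(np.allclose(lhs, rhs))                # True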

The next result of Marcus10 and Thompson [52] gives an interesting application of the
field of values in the study of multiplicative commutators (cf. also Theorem 6.10).

Theorem 6.23. If A, B ∈ GLn (C) are normal, 0 ∉ F (B), the multiplicative commutator
C = ABA−1 B −1 is normal, and AC = CA, then AB = BA.

Proof. Since AC = CA and both A and C are normal, we can simultaneously unitarily
diagonalize these matrices by Theorem 5.8, i.e., A = U D1 U H and C = U D2 U H , where

D1 = diag(a1 , . . . , an ) and D2 = diag(c1 , . . . , cn ).

The equation C = ABA−1 B −1 can be written as AB = CBA, which yields

D_1 \hat{B} = D_2 \hat{B} D_1 , where \hat{B} := U^H BU.

With \hat{B} = [b_{ij} ] the last equation gives

a_i b_{ij} = c_i b_{ij} a_j for i, j = 1, . . . , n.

In particular, for i = j we have a_i b_{ii} = c_i b_{ii} a_i and since a_i ≠ 0, we get b_{ii} = c_i b_{ii} . By
assumption 0 ∉ F (B) = F (\hat{B}), and hence 0 ≠ e_i^T \hat{B} e_i = b_{ii} . Therefore c_i = 1, giving
C = In , or AB = BA.

Exercise. Which essential properties in the proof of Theorem 6.23 do not hold when
we assume that A and C are just diagonalizable instead of normal?

The following special case of Theorem 6.23 was already established by Frobenius in
1911 [23, Satz V].

Corollary 6.24. If A, B ∈ Cn,n are unitary, σ(B) is contained in an arc less than a
semicircle, and the multiplicative commutator C = ABA−1 B −1 satisfies AC = CA, then
AB = BA.
10 Marvin David Marcus (1927–2016)

A unitary matrix with eigenvalues in an arc less than a semicircle is sometimes called a
cramped matrix.

Exercise. Derive Corollary 6.24 from Theorem 6.23.

Lemma 6.7 yields the following result for multiplicative commutators.

Lemma 6.25. If K is algebraically closed with characteristic zero or (prime) charac-


teristic p > n, and A, B ∈ GLn (K) are quasi-commuting (cf. Definition 6.14), then
(A, B −1 ) − In is nilpotent.

Proof. Since A commutes with C = [A, B], we know that C is nilpotent by Lemma 6.7.
Moreover, AC = CA and BC = CB imply that CA−1 B −1 = A−1 B −1 C and thus by
induction (CA−1 B −1 )k = (A−1 B −1 )k C k for all k ≥ 0. In particular,

(ABA−1 B −1 − In )n = (CA−1 B −1 )n = (A−1 B −1 )n C n = 0.

Since ABA−1 B −1 − In is nilpotent, we have

{1} = σ(ABA−1 B −1 ) = σ(BA−1 B −1 A),

and hence

{1} = σ((BA−1 B −1 A)−1 ) = σ(A−1 BAB −1 ) = σ(AB −1 A−1 B),

so that (A, B −1 ) − In indeed is nilpotent.

Note that the equalities of the different spectra in the proof of Lemma 6.25 show that
several other matrices are nilpotent as well. Putnam and Wintner [63] showed that the
assertion of Lemma 6.25 also holds when just one of A and B commutes with [A, B].
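
As a tiny illustration (NumPy, with an ad hoc example pair): the matrices A = I + E_{12} and B = I + E_{23} do not commute, but both commute with their additive commutator [A, B] = E_{13}, so they quasi-commute, and the multiplicative commutator shifted by the identity is indeed nilpotent.

import numpy as np

A = np.eye(3) + np.diag([1.0, 0.0], k=1)    # A = I + E_12
B = np.eye(3) + np.diag([0.0, 1.0], k=1)    # B = I + E_23
C = A @ B - B @ A                           # C = E_13
print(np.allclose(A @ C, C @ A), np.allclose(B @ C, C @ B))   # True True

M = A @ np.linalg.inv(B) @ np.linalg.inv(A) @ B - np.eye(3)   # (A, B^{-1}) - I_3
print(np.allclose(np.linalg.matrix_power(M, 3), 0))           # True: nilpotent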

Appendix A

Definitions and facts from Linear Algebra

In this appendix we will recall some important definitions and facts from Linear Algebra,
which mostly can be found in [LM].
Let K be a field. Every field has at least two elements, 0 and 1. The characteristic of a
field is the smallest possible number n ∈ N such that n · 1 = 0. If no such n ∈ N exists,
the characteristic of K is zero. If the characteristic of a field is not zero but some n ∈ N,
then n must be a prime number, which can be seen as follows: If the characteristic n ∈ N
were of the form n = k · m with 1 < k, m < n, then
0 = n · 1 = (k · m) · 1 = (k · 1) · (m · 1).
Since K has no zero divisors, at least one of the numbers m · 1 and k · 1 on the right hand
side must be 0, but this contradicts the minimality assumption on n. Examples for fields
with characteristic zero are Q, R and C. An example of a field with (prime) characteristic
n ∈ N, and hence a finite field, is given in [LM, Example 2.29 and Exercise 3.13].
For a field K we denote by K[t] the ring of polynomials with coefficients in K. If every
non-constant polynomial in K[t] has at least one root in K, then the field K is called
algebraically closed. Equivalently, K is algebraically closed if and only if each p ∈ K[t]
with deg(p) = m ≥ 1 decomposes into linear factors over K, i.e.,
p = α(t − λ1 ) · · · (t − λm ),
for some α, λ1 , . . . , λm ∈ K. The classical example of an algebraically closed field is the
field of the complex numbers C; cf. the proof of the Fundamental Theorem of Algebra
in [LM, Section 15.2].
If K contains finitely many elements, say K = {k1 , . . . , kn }, we can form the polynomial
p = (t−k1 ) · · · (t−kn )+1 ∈ K[t]. This polynomial satisfies p(kj ) = 1 for all j = 1, . . . , n,

and hence does not have a root in K. Consequently, a finite field cannot be algebraically
closed.
By K n,m we denote the set of n × m matrices over K, i.e., the matrices A = [aij ] with
aij ∈ K for i = 1, . . . , n and j = 1, . . . , m. For simplicity, the n × 1 matrices are denoted
by K n instead of K n,1 . Each A ∈ K n,m has an image and a kernel defined by

im(A) = {Ax : x ∈ K m } ⊆ K n and ker(A) = {x : Ax = 0} ⊆ K m ,

respectively. The dimension formula for linear maps says that m = dim(im(A)) +
dim(ker(A)) [LM, Theorem 10.9].
The group of invertible or nonsingular n × n matrices is denoted by GLn (K) (see [LM,
Theorem 4.11]), and the identity (matrix) in GLn (K) is denoted by In = [δij ] ∈ K n,n ,
where δij denotes the Kronecker delta-function.
The transpose of A = [aij ] ∈ K n,m is the matrix AT = [bij ] ∈ K m,n with bij = aji for all
i, j. The Hermitian transpose of A = [aij ] ∈ Cn,m is the matrix AH = [bij ] with b_{ij} = \overline{a_{ji}}
for all i, j, i.e., A^H = \overline{A}^T .
For A ∈ K n,n we denote by

• PA = det(tIn − A) ∈ K[t] the characteristic polynomial of A,

• MA the minimal polynomial of A,

• trace(A) = \sum_{j=1}^{n} a_{jj} the trace of A,

• rank(A) the rank of A,

• σ(A) := {λ ∈ K : Ax = λx for some nonzero x ∈ K n } the spectrum of A,

• Vλ (A) = ker(λIn − A) the eigenspace of A corresponding to the eigenvalue λ.

• g(λ, A) = dim(Vλ (A)) the geometric multiplicity of the eigenvalue λ of A.

The companion matrix of a monic polynomial p = tn + an−1 tn−1 + · · · + a0 ∈ K[t] is


defined by

C_p = \begin{pmatrix} 0 & & & −a_0 \\ 1 & \ddots & & \vdots \\ & \ddots & 0 & −a_{n−2} \\ & & 1 & −a_{n−1} \end{pmatrix} ∈ K^{n,n} , (A.1)
where for n = 1 we set Cp = [−a0 ].
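
A quick way to get a feel for this definition is to build C_p symbolically and confirm that its characteristic polynomial is p again (cf. the corresponding entry in the list of useful facts below). The following SymPy sketch, with an ad hoc helper name and example polynomial, does exactly that.

import sympy as sp

t = sp.symbols('t')

def companion(a):
    # Companion matrix of p = t^n + a_{n-1} t^{n-1} + ... + a_0 as in (A.1),
    # with a = [a_0, ..., a_{n-1}]: ones on the subdiagonal, last column -a.
    n = len(a)
    C = sp.zeros(n, n)
    for i in range(1, n):
        C[i, i - 1] = 1
    for i in range(n):
        C[i, n - 1] = -a[i]
    return C

Cp = companion([-6, 11, -6])                # p = t^3 - 6 t^2 + 11 t - 6
print(Cp.charpoly(t).as_expr())             # t**3 - 6*t**2 + 11*t - 6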

A subspace U ⊆ K n is invariant under A ∈ K n,n if Ax ∈ U for each x ∈ U. For
example, the eigenspace Vλ (A) of each eigenvalue λ of A is an invariant subspace of A
[LM, Lemma 14.4].
The following is a list of important matrix classes introduced in [LM]:

1. A ∈ K n,n is symmetric1 when A = AT , and skew-symmetric when A = −AT .

2. A ∈ Cn,n is Hermitian when A = AH , and skew-Hermitian when A = −AH .

3. Let A ∈ Rn,n be symmetric or A ∈ Cn,n be Hermitian. If xT Ax > 0 holds for all


x ∈ Rn \ {0} or xH Ax > 0 holds for all x ∈ Cn \ {0}, then A is symmetric positive
definite (SPD) or Hermitian positive definite (HPD), respectively.
If a weak inequality “≥” holds, then A is respectively symmetric or Hermitian
positive semidefinite. If the reverse inequality “≤” holds, then A is negative
(semi-) definite.

4. A = [aij ] ∈ K n,n is upper triangular when aij = 0 for all i > j, and lower triangular
when aij = 0 for all j > i (or when AT is upper triangular).

5. A ∈ K n,n is diagonal when it is both upper and lower triangular. We sometimes


denote such a matrix by diag(a11 , . . . , ann ).

6. If a nonzero matrix A ∈ K n,n satisfies Ak = 0 for some k ∈ N, then A is nilpotent.


If at the same time Ak−1 ≠ 0, then A is nilpotent of index k. The zero matrix is
nilpotent of index one.

7. A ∈ Rn,n is orthogonal when AT A = In , or, equivalently, when AT = A−1 .

8. A ∈ Cn,n is unitary when AH A = In , or, equivalently, when AH = A−1 .

9. A ∈ Rn,n is normal when AT A = AAT .

10. A ∈ Cn,n is normal when AH A = AAH .

The following is a list of useful facts:

1. The n × n invertible upper (or lower) triangular matrices over K form a subgroup
of GLn (K). In particular, the inverse of an invertible upper triangular matrix is
upper triangular, and the product of invertible upper triangular matrices is an
invertible upper triangular matrix [LM, Theorem 4.13].
1 “If one knows all about symmetric matrices, then one knows a great deal about matrices. On the
other hand, one can only understand symmetric matrices properly if one understands the links between
general matrices A and their transposes AT .” (Olga Taussky [80, p. 147])

2. Determinant Multiplication Theorem: For all A, B ∈ K n,n we have det(AB) =
det(A) det(B) = det(BA) [LM, Theorem 7.15].
3. If A ∈ K n,n has the characteristic polynomial PA = tn + an−1 tn−1 + · · · + a0 , then
an−1 = −trace(A) and a0 = (−1)n det(A) [LM, Lemma 8.3].
4. Cayley-Hamilton Theorem: For each A ∈ K n,n we have PA (A) = 0 ∈ K n,n [LM,
Theorem 8.6].
5. If A ∈ K n,n is nilpotent, then PA = tn [LM, Exercise 8.3].
6. For each p ∈ K[t] of (exact) degree n ≥ 1 and the corresponding companion matrix
Cp ∈ K n,n , we have PCp = det(tIn − Cp ) = p [LM, Lemma 8.4].
7. The map trace : K n,n → K, [aij ] ↦ \sum_{j=1}^{n} a_{jj} , is linear and it satisfies trace(SAS −1 ) =
trace(A) as well as trace(AB) = trace(BA) for all A, B ∈ K n,n and S ∈ GLn (K)
[LM, Exercise 8.8].
8. The trace of a real or complex matrix is orthogonally or unitarily invariant, respec-
tively, i.e., trace(Q1 AQ2 ) = trace(A) for A ∈ Rn,n and orthogonal Q1 , Q2 ∈ Rn,n ,
and trace(U1 AU2 ) = trace(A) for A ∈ Cn,n and unitary U1 , U2 ∈ Cn,n .
9. For A ∈ Rn,m or A ∈ Cn,m we have rank(A) = rank(AH A) [LM, Corollary 10.25].
10. QR decomposition: For each A ∈ Rn,m with rank(A) = m there exists a matrix
Q ∈ Rn,m with pairwise orthonormal columns, i.e., QT Q = Im , and an upper
triangular matrix R ∈ GLn (R) such that A = QR. An analogous result holds for
A ∈ Cn,m with rank(A) = m; now QH Q = Im [LM, Corollary 12.12].
11. Jordan decomposition: If A ∈ K n,n has a characteristic polynomial that decom-
poses into linear factors, then there exists a decomposition of the form

A = SJS −1 with S ∈ GLn (K), J = diag(J_{d_1} (λ_1 ), . . . , J_{d_m} (λ_m )), where

J_{d_i} (λ_i ) = \begin{pmatrix} λ_i & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & λ_i \end{pmatrix} ∈ K^{d_i , d_i} , i = 1, . . . , m.
The matrix J in this decomposition is called a Jordan canonical form of A. It is
uniquely determined up to the order of the diagonal Jordan blocks Jdi (λi ) [LM,
Theorems 16.10 and 16.12].

We will now derive a few basic and useful properties of the two products AB and BA of
two (possibly non-square) matrices A and B, which are not contained in [LM].

Lemma A.1. If K is any field, A ∈ K n,m and B ∈ K m,n , then tm PAB = tn PBA .

Proof. We define

C = \begin{pmatrix} tI_n & A \\ B & I_m \end{pmatrix} ∈ K^{n+m,n+m} and D = \begin{pmatrix} I_n & 0 \\ −B & tI_m \end{pmatrix} ∈ K^{n+m,n+m} ,

then

CD = \begin{pmatrix} tI_n − AB & tA \\ 0 & tI_m \end{pmatrix} and DC = \begin{pmatrix} tI_n & A \\ 0 & tI_m − BA \end{pmatrix} ,

and hence

t^m P_{AB} = t^m det(tI_n − AB) = det(CD) = det(DC) = t^n det(tI_m − BA) = t^n P_{BA} ,
which completes the proof.

Thus, the nonzero eigenvalues of AB and BA coincide with the same algebraic multi-
plicities.
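
A quick numerical illustration (Python/NumPy, random data): for rectangular A and B the two products share their nonzero eigenvalues, and the larger product simply picks up additional zeros.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((3, 5))
print(np.sort_complex(np.linalg.eigvals(A @ B)))   # 3 nonzero eigenvalues plus 2 (near) zeros
print(np.sort_complex(np.linalg.eigvals(B @ A)))   # the same 3 nonzero eigenvalues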

Example A.2. For the matrices

A = e1 ∈ K n,1 , B = eT1 ∈ K 1,n we have PAB = tn−1 (t − 1), PBA = t − 1.

If n = m, then PAB = PBA , but we may have MAB ≠ MBA . For example,

A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} , B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} yield MAB = t, MBA = t^2 .
0 0 0 0

The next lemma shows that the nonzero eigenvalues of AB and BA also have the same
geometric multiplicities.

Lemma A.3. Let K be any field, A ∈ K n,m and B ∈ K m,n , and let λ ≠ 0 be an
eigenvalue of AB and BA. Then g(λ, AB) = g(λ, BA).

Proof. Let x1 , . . . , xk be linearly independent eigenvectors of AB corresponding to the


eigenvalue λ ≠ 0, i.e.,
(AB)xj = λxj , j = 1, . . . , k.
Define yj := Bxj , j = 1, . . . , k, then

(BA)yj = B(ABxj ) = B(λxj ) = λyj .

Moreover, the vectors y1 , . . . , yk are linearly independent. If \sum_{j=1}^{k} α_j y_j = 0, then

0 = \sum_{j=1}^{k} α_j Bx_j ⇒ 0 = \sum_{j=1}^{k} α_j ABx_j = λ \sum_{j=1}^{k} α_j x_j .

Since λ ≠ 0, we must have \sum_{j=1}^{k} α_j x_j = 0, and thus α1 = · · · = αk = 0, since the vectors
x1 , . . . , xk are linearly independent. This implies that g(λ, AB) ≤ g(λ, BA).
In the same way we can show that g(λ, BA) ≤ g(λ, AB), so that indeed g(λ, BA) =
g(λ, AB).

More details about the relationship between AB and BA, and in particular a detailed
analysis of the Jordan structures, can be found in [18, 44].

Exercise. Suppose that A, B ∈ K n,n are such that In − AB is invertible. Show that
then In − BA is invertible with (In − BA)−1 = In + BXA, where X = (In − AB)−1 .
Show further that for any λ ∈ K, the matrix λIn − AB is invertible if and only if
λIn − BA is invertible.
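
The first identity in this exercise is easy to confirm numerically; the following sketch (NumPy, with random matrices scaled so that In − AB is safely invertible) is only an illustration, not a proof.

import numpy as np

rng = np.random.default_rng(2)
n = 4
A = 0.3 * rng.standard_normal((n, n))       # scaled so that I_n - AB is invertible
B = 0.3 * rng.standard_normal((n, n))
I = np.eye(n)
X = np.linalg.inv(I - A @ B)
lhs = np.linalg.inv(I - B @ A)
rhs = I + B @ X @ A
print(np.linalg.norm(lhs - rhs))            # close to machine precision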

Index

additive commutator, 48, 81 Gerstenhaber’s theorem, 62


nilpotent, 85 group commutator, 99
adjoint action, 49
adjoint map, 49 Hadamard Lemma, 48
algebra isomorphism, 20 Hadamard product, 25
algebra over a field, 16 Hermitian matrix, 8, 105
algebraically closed field, 103 Hermitian transpose, 104
anti-commutativity, 20 hypercomplex number, 26
anti-commuting matrices, 31, 72 identity matrix, 104
image, 104
Baker-Campbell-Hausdorff Theorem, 51
inverse, 16
binomial formula, 45
invertible, 16
characteristic of a field, 103
Jacobi identity, 20
characteristic polynomial, 104
Jacobson’s Lemma, 85
common eigenvector, 64, 93
Jordan decomposition, 106
computational criterion, 65
commuting matrices, 88 kernel, 104
companion matrix, 104
cramped matrix, 102 Lie algebra, 20
cross product in R3 , 19 matrix exponential function, 44
cyclic vector, 57 Lie product formula, 54
diagonal matrix, 105 McCoy
division algebra, 26 little theorem, 91
over R, 34 main theorem, 95
over algebraically closed field, 27 minimal polynomial, 23, 57, 104

eigenspace, 104 negative (semi-) definite, 105


elementary divisor, 60 nilpotent, 105
nonderogatory matrix, 24, 57
field of values, 35 normal matrix, 8, 86, 105
finite field, 103 numerical radius, 40
Frobenius normal form, 81 power inequality, 42
geometric multiplicity, 104 octonions, 34

orthogonal diagonalization, 14 unit element, 16
orthogonal matrix, 105 unitary diagonalization, 8
unitary matrix, 8, 105
parallelogram identity, 41 unitary similarity, 8
polarization identity, 41
positive (semi-) definite, 105 vector product in R3 , 19
projectively commuting matrices, 71
zero divisor, 16
QR decomposition, 106
quasi-commuting matrices, 91
quaternions, 32

rank, 104
rational canonical form, 81
real field of values, 36

Schur decomposition, 8
Schur form, 8
real, 13
Schur inequality, 11
simultaneous diagonalization, 67
simultaneous triangularization, 65, 87, 91,
95
algorithm, 66
overview, 98
unitary, 67
skew-Hermitian matrix, 105
skew-symmetric matrix, 105
spectral decomposition of normal matrices, 8
spectral mapping theorem, 7, 74
spectral matrix, 42
spectral radius, 40
spectrum, 104
symmetric matrix, 105

Toeplitz matrix, 18
Toeplitz–Hausdorff Theorem, 38
trace, 104
transpose, 104
triangular matrix, 105
triangularization, 6
unitary, 8
trivial algebra, 16

Bibliography

[1] A. A. Albert and B. Muckenhoupt, On matrices of trace zeros, Michigan


Math. J., 4 (1957), pp. 1–3.

[2] Y. A. Al’pin and N. A. Koreshkov, On the simultaneous triangulability of


matrices, Mat. Zametki, 68 (2000), pp. 648–652.

[3] J. C. Baez, The octonions, Bull. Amer. Math. Soc. (N.S.), 39 (2002), pp. 145–205.

[4] J. Barría and P. R. Halmos, Vector bases for two commuting matrices, Linear
and Multilinear Algebra, 27 (1990), pp. 147–157.

[5] A. Bonfiglioli and R. Fulci, Topics in Noncommutative Algebra. The Theorem


of Campbell, Baker, Hausdorff and Dynkin, vol. 2034 of Lecture Notes in Mathe-
matics, Springer, Heidelberg, 2012.

[6] G. Bourgeois, Pairs of matrices, one of which commutes with their commutator,
Electron. J. Linear Algebra, 22 (2011), pp. 593–597.

[7] A. Cayley, A memoir on the theory of matrices, Philos. Trans. Roy. Soc. London,
148 (1858), pp. 17–37.

[8] M. D. Choi, C. Laurie, and H. Radjavi, On commutators and invariant sub-


spaces, Linear and Multilinear Algebra, 9 (1980/81), pp. 329–340.

[9] R. Dedekind, Zur Theorie der aus n Haupteinheiten gebildeten komplexen Größen,
Nachrichten von der Königl. Gesellschaft der Wissenschaften und der Georg-
Augusts-Universität zu Göttingen, (1895), pp. 141–159.

[10] M. P. Drazin, A reduction for the matrix equation AB = BA, Proc. Cambridge
Philos. Soc., 47 (1951), pp. 7–10.

[11] , Some generalizations of matrix commutativity, Proc. London Math. Soc. (3),
1 (1951), pp. 222–231.

[12] M. P. Drazin, J. W. Dungey, and K. W. Gruenberg, Some theorems on
commutative matrices, J. London Math. Soc., 26 (1951), pp. 221–228.

[13] H.-D. Ebbinghaus, H. Hermes, F. Hirzebruch, M. Koecher, K. Mainzer,


J. Neukirch, A. Prestel, and R. Remmert, Numbers, vol. 123 of Graduate
Texts in Mathematics, Springer-Verlag, New York, 1990.

[14] M. F. Egan and R. E. Ingram, On commutative matrices, Math. Gaz., 37 (1953),


pp. 107–110.

[15] L. Elsner and K. D. Ikramov, Normal matrices: an update, Linear Algebra


Appl., 285 (1998), pp. 291–303.

[16] K. Fan, Some remarks on commutators of matrices, Arch. Math. (Basel), 5 (1954),
pp. 102–107.

[17] R. W. Farebrother, J. Groß, and S.-O. Troschke, Matrix representation


of quaternions, Linear Algebra Appl., 362 (2003), pp. 251–255.

[18] H. Flanders, Elementary divisors of AB and BA, Proc. Amer. Math. Soc., 2
(1951), pp. 871–874.

[19] S. Friedland, Matrices—algebra, analysis and applications, World Scientific Pub-


lishing Co. Pte. Ltd., Hackensack, NJ, 2016.

[20] F. G. Frobenius, Über lineare Substitutionen und bilineare Formen, J. Reine


Angew. Math., 84 (1878), pp. 1–63.

[21] , Über vertauschbare Matrizen, Sitzungsberichte Akademie der Wiss. zu Berlin,


(1896), pp. 601–614.

[22] , Über die mit einer Matrix vertauschbaren Matrizen, Sitzungsberichte


Akademie der Wiss. zu Berlin, (1910), pp. 3–15.

[23] , Über unitäre Matrizen, Sitzungsberichte Akademie der Wiss. zu Berlin, (1911),
pp. 373–378.

[24] F. R. Gantmacher, The Theory of Matrices. Vol. 1, AMS Chelsea Publishing,


Providence, RI, 1998. Reprint of the 1959 translation.

[25] M. Gerstenhaber, On dominance and varieties of commuting matrices, Ann. of


Math. (2), 73 (1961), pp. 324–348.

[26] M. Goldberg and E. Tadmor, On the numerical radius and its applications,
Linear Algebra Appl., 42 (1982), pp. 263–284.

[27] R. Grone, C. R. Johnson, E. M. Sa, and H. Wolkowicz, Normal matrices,
Linear Algebra Appl., 87 (1987), pp. 213–225.

[28] R. M. Guralnick, A note on pairs of matrices with rank one commutator, Linear
and Multilinear Algebra, 8 (1979/80), pp. 97–99.

[29] B. Hall, Lie Groups, Lie Algebras, and Representations: An Elementary Intro-
duction, vol. 222 of Graduate Texts in Mathematics, Springer, Cham, second ed.,
2015.

[30] P. R. Halmos, A Hilbert Space Problem Book, vol. 19 of Graduate Texts in Math-
ematics, Springer-Verlag, New York, second ed., 1982.

[31] W. R. Hamilton, On a new species of imaginary quantities connected with a theory


of quaternions, Proceedings of the Royal Irish Academy, 2 (1844), pp. 424–434.

[32] F. Hausdorff, Der Wertevorrat einer Bilinearform, Math. Z., 3 (1919), pp. 314–
316.

[33] T. Hawkins, The Mathematics of Frobenius in Context. A Journey through 18th


to 20th Century Mathematics, Sources and Studies in the History of Mathematics
and Physical Sciences, Springer, New York, 2013.

[34] N. J. Higham, Functions of Matrices. Theory and Computation, Society for Indus-
trial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.

[35] J. Holbrook and K. C. O’Meara, Some thoughts on Gerstenhaber’s theorem,


Linear Algebra Appl., 466 (2015), pp. 267–295.

[36] J. A. R. Holbrook, Multiplicative properties of the numerical radius in operator


theory, J. Reine Angew. Math., 237 (1969), pp. 166–174.

[37] , Polynomials in a matrix and its commutant, Linear Algebra Appl., 48 (1982),
pp. 293–301.

[38] H. Hopf, Ein topologischer Beitrag zur reellen Algebra, Comment. Math. Helv., 13
(1940), pp. 219–239.

[39] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge Univer-


sity Press, Cambridge, 1991.

[40] , Matrix Analysis, Cambridge University Press, Cambridge, second ed., 2013.

[41] K. D. Ikramov, A note on the matrix equation XA − AX = X p , Linear Algebra


Appl., 424 (2007), pp. 390–392.

[42] N. Jacobson, Rational methods in the theory of Lie algebras, Ann. of Math. (2),
36 (1935), pp. 875–881.

[43] N. Jacobson, Schur’s theorems on commutative matrices, Bull. Amer. Math. Soc.,
50 (1944), pp. 431–436.

[44] C. R. Johnson and E. A. Schreiner, The relationship between AB and BA,


Amer. Math. Monthly, 103 (1996), pp. 578–582.

[45] T. Kato, Perturbation Theory for Linear Operators, Die Grundlehren der math-
ematischen Wissenschaften, Band 132, Springer-Verlag New York, Inc., New York,
1966.

[46] , Perturbation theory for linear operators, Springer-Verlag, Berlin-New York,


second ed., 1976. Grundlehren der Mathematischen Wissenschaften, Band 132.

[47] T. Kato and O. Taussky, Commutators of A and A∗ , J. Washington Acad. Sci.,


46 (1956), pp. 38–40.

[48] D. C. Kleinecke, On operator commutators, Proc. Amer. Math. Soc., 8 (1957),


pp. 535–536.

[49] T. J. Laffey, Simultaneous triangularization of matrices—low rank cases and the


nonderogatory case, Linear and Multilinear Algebra, 6 (1978/79), pp. 269–305.

[50] , Simultaneous reduction of sets of matrices under similarity, Linear Algebra


Appl., 84 (1986), pp. 123–138.

[51] C. C. MacDuffee, The Theory of Matrices, Verlag von Julius Springer, Berlin,
1933.

[52] M. Marcus and R. C. Thompson, On a classical commutator result, J. Math.


Mech., 16 (1966), pp. 583–588.

[53] N. H. McCoy, On quasi-commutative matrices, Trans. Amer. Math. Soc., 36


(1934), pp. 327–340.

[54] N. H. McCoy, On the characteristic roots of matric polynomials, Bull. Amer.


Math. Soc., 42 (1936), pp. 592–600.

[55] J. Milnor, Some consequences of a theorem of Bott, Ann. Math., 68 (1958),


pp. 444–449.

[56] M. Mirzakhani, A simple proof of a theorem of Schur, Amer. Math. Monthly, 105
(1998), pp. 260–262.

[57] F. D. Murnaghan and A. Wintner, A canonical form for real matrices under
orthogonal transformations, Proc. Natl. Acad. Sci. USA, 17 (1931), pp. 417–420.

[58] K. C. O’Meara, J. Clark, and C. I. Vinsonhaler, Advanced Topics in Lin-


ear Algebra. Weaving Matrix Problems through the Weyr Form, Oxford University
Press, Oxford, 2011.

[59] A. Ostrowski, Über die Existenz einer endlichen Basis bei gewissen Funktionen-
systemen, Math. Ann., 78 (1917), pp. 94–119.

[60] C. Pearcy, An elementary proof of the power inequality for the numerical radius,
Michigan Math. J., 13 (1966), pp. 289–291.

[61] C. R. Putnam, On normal operators in Hilbert space, Amer. J. Math., 73 (1951),


pp. 357–362.

[62] , On the spectra of commutators, Proc. Amer. Math. Soc., 5 (1954), pp. 929–931.

[63] C. R. Putnam and A. Wintner, On the spectra of group commutators, Proc.


Amer. Math. Soc., 9 (1958), pp. 360–362.

[64] H. Schneider, A pair of matrices with property P , Amer. Math. Monthly, 62


(1955), pp. 247–249.

[65] , Olga Taussky-Todd’s influence on matrix theory and matrix theorists, Linear
and Multilinear Algebra, 5 (1977/78), pp. 197–224. A discursive personal tribute.

[66] I. Schur, Zur Theorie der vertauschbaren Matrizen, J. Reine Angew. Math., 130
(1905), pp. 66–76.

[67] , Über die charakteristischen Wurzeln einer linearen Substitution mit einer An-
wendung auf die Theorie der Integralgleichungen, Math. Ann., 66 (1909), pp. 488–
510.

[68] D. Serre, Matrices. Theory and Applications, vol. 216 of Graduate Texts in Mathe-
matics, Springer-Verlag, New York, 2002. Translated from the 2001 French original.

[69] H. Shapiro, A survey of canonical forms and invariants for unitary similarity,
Linear Algebra Appl., 147 (1991), pp. 101–167.

[70] , Commutators which commute with one factor, Pacific J. Math., (1997),
pp. 323–336. Olga Taussky-Todd: in memoriam.

[71] , Notes from Math 223: Olga Taussky Todd’s matrix theory course, 1976–1977,
Math. Intelligencer, 19 (1997), pp. 21–27.

[72] D. Shemesh, Common eigenvectors of two matrices, Linear Algebra Appl., 62
(1984), pp. 11–18.

[73] , A simultaneous triangularization result, Linear Algebra Appl., 498 (2016),


pp. 394–398.

[74] K. Shoda, Einige Sätze über Matrizen, Jpn. J. Math., 13 (1937), pp. 361–365.

[75] U. Stammbach, Vergnügliches aus dem Briefwechsel zwischen Ferdinand Georg


Frobenius und Richard Dedekind, Mitteilungen der DMV, 23 (2015), pp. 113–120.

[76] J. J. Sylvester, The genesis of an idea, or story of a discovery relating to equa-


tions in multiple quantity, Nature, 31 (1884), pp. 35–36.

[77] , Sur l’equations en matrices px = xq, C. R. Acad. Sc. Paris, 99 (1884), pp. 67–
71, 115–116.

[78] O. Taussky, Commutativity in finite matrices, Amer. Math. Monthly, 64 (1957),


pp. 229–235.

[79] , A note on the group commutator of A and A∗ , J. Washington Acad. Sci., 48


(1958), p. 305.

[80] , The role of symmetric matrices in the study of general matrices, Linear Alge-
bra and Appl., 5 (1972), pp. 147–154.

[81] , How I became a torchbearer for matrix theory, Amer. Math. Monthly, 95
(1988), pp. 801–812.

[82] R. C. Thompson, Commutators in the special and general linear groups, Trans.
Amer. Math. Soc., 101 (1961), pp. 16–33.

[83] O. Toeplitz, Das algebraische Analogon zu einem Satze von Fejér, Math. Z., 2
(1918), pp. 187–197.

[84] J. Wei, Note on the global validity of the Baker-Hausdorff and Magnus theorems,
Journal of Mathematical Physics, 4 (1963), pp. 1337–1341.

[85] K. Weierstraß, Zur Theorie der aus n Haupteinheiten gebildeten komplexen


Größen, Nachrichten von der Königl. Gesellschaft der Wissenschaften und der
Georg-Augusts-Universität zu Göttingen, (1894), pp. 395–419.

[86] E. M. E. Wermuth, Two remarks on matrix exponentials, Linear Algebra Appl.,


117 (1989), pp. 127–132.

[87] H. Wielandt, Über die Unbeschränktheit der Operatoren der Quantenmechanik,
Math. Ann., 121 (1949), p. 21.

[88] A. Wintner, The unboundedness of quantum-mechanical matrices, Physical Rev.


(2), 71 (1947), pp. 738–739.

[89] F. Zhang, Matrix Theory. Basic Results and Techniques, Universitext, Springer,
New York, second ed., 2011.
