Matrix Theory Script
Matrix Theory Script
Jörg Liesen
TU Berlin
Institute of Mathematics
Preface
The quote above is from Helene Shapiro’s recollection of the course Math 223: Matrix
Theory taught by Olga Taussky1 at the California Institute of Technology (Caltech) in
1976–1977. It indicates that Matrix Theory is a vast and tangled area with an abundance
of concepts and results, overgrowing each other to an extend that it becomes almost
impenetrable. Olga Taussky was certainly not the first to recognize this situation. The
main reasons for the unrelentless growth of the area was indicated already in the Preface
of the first systematic book on Matrix Theory, published in 1933 by MacDuffee [51]2 :
2
viewpoint one tries to collect results related to certain decompositions or (similarity)
transformations, for example triangular factorizations or unitary similarity. And under
the second viewpoint one tries to collect results for certain matrix classes, for example
Hermitian (positive (semi-)definite), unitary, normal and nonnegative matrices, or even
more special classes like tridiagonal, circulant, Cauchy, Toeplitz and Vandermonde ma-
trices. Such structuring of the material is highly useful, and if applied consistently, it
can lead to excellent reference books or jungle guides (to stay in the picture). Examples
are the books by Horn and Johnson from 2012 [40] and Zhang from 2011 [89], and to
some extend also those of Serre from 2002 [68] and Friedland from 2016 [19], which also
contain an applied or algorithmic aspect. It appears that the general reference or sur-
vey style presentation of Matrix Theory is not only present in modern treatments, but
dates back to the classics of the area such as the books of MacDuffee quoted above and
Gantmacher from 1959 [24], who points out in the Preface that he “tried to keep the
individual chapters as far as possible independent of each other”.
In this course we will take a different approach, which is motivated by the content
of Olga Taussky’s Math 223: Matrix Theory. Our main point of view will be a holistic
(meaning: from every angle) analysis of matrix multiplication. The advantage of building
the course around such a mathematical concept is that we will obtain a “plot” for the
course rather than presenting material in the style of a reference book. During our
investigations we will derive, analyze and apply a wide range of tools and techniques
from Matrix Theory, including many matrix decompositions, the field of values, the
matrix exponential, (Sylvester) matrix equations, or properties of many special matrix
classes. Thus, while focussing on results revolving around matrix multiplication, the
course nevertheless will provide quite a large map for the Matrix Theory jungle.
What’s so special about matrix multiplication? Everybody who studied Linear Algebra
knows that this operation is not commutative. Thus, for matrices A, B ∈ K n,n we can not
expect in general that the products AB and BA are equal. In fact, if we pick “arbitraty”
matrices A, B ∈ K n,n , it is “highly unlikely” that they commute. Since commutativity of
some given A and B appears to be the exception rather than the rule, we are immediately
motivated to ask questions: What are algebraic and analytic properties of commuting
matrices, and are these properties useful? Which matrices commute with a given matrix?
Can we characterize subsets or subspaces of K n,n containing only pairwise commuting
matrices? On the other hand, if A and B do not commute, is there a reasonable concept
of “almost” or “quasi” commutativity? And what are the properties of the additive and
multiplicative commutators, defined by AB − BA and ABA−1 B −1 ? All these questions,
and further related ones, will be addressed in this course.
The course starts with a review of results about the triangularization of a single ma-
trix (Chapter 1). Conceptually these results from the foundation for the simultaneous
triangularization of two or more matrices, which will play a major role in the course.
3
In order to set up our approach after Chapter 1, we quote from the start of the Intro-
duction of Olga Taussky’s article “Commutativity in finite matrices” [78]:
The real and complex numbers have the properties that ab = ba for all a and b
and for every a 6= 0 there is an inverse a−1 , which implies ab = 0 only if a = 0
or b = 0. A classical theorem of Frobenius states that there exists no other
hypercomplex systems which have these properties. If we want to consider
more general systems, we must give up at least one of these properties. [...]
The set of n × n matrices A, B, . . . , with complex elements can be regarded as
a hypercomplex system with n2 base elements and in this system, in general,
AB 6= BA. In general, also, there is no inverse A−1 even if A 6= 0 and,
furthermore, AB can be zero without A = 0 or B = 0.
The first major topic of the course will be the characterization of “hypercomplex number
systems” (Chapter 2). We will see that associativity and commutativity of the multi-
plication in such systems naturally leads to a study of the properties of commuting
matrices. Moreover, we will prove the classical theorem of Frobenius (from 1878), which
shows that the real and complex numbers are the only examples of such systems that
are associative and commutative.
Next we will study commuting matrices, including some analytic properties (Chapter 3)
and their general structure (Chapter 4), followed by an investigation of the simultaneous
triangularization property of a finite set of matrices, which is implied by commutativity,
and its applications (Chapter 5).
The final part of the course deals with properties of the additive and multiplicative com-
mutators, with a focus on the theorems of McCoy, which characterize the simultaneous
triangularization property of a finite set of (non-commuting) matrices (Chapter 6).
It is assumed that students are familiar with the theory of Linear Algebra and of matrices
as established in the book J. Liesen and V. Mehrmann, Linear Algebra, Springer, 2015,
which will be cited as [LM] in these notes. Many relevant definitions and facts from that
book, and a few further results not contained in that book, are collected in the Appendix
for completeness.
Please send corrections or suggestions to me at [email protected].
4
Contents
5
Chapter 1
We start with the following result about the triangularization of a single matrix, which
according to Horn and Johnson is “[p]erhaps the most fundamentally useful fact of ele-
mentary matrix theory” [40, p. 101].
Theorem 1.1. If K is any field, then the following assertions are equivalent for each
A ∈ K n,n :
(2) There exists a matrix S ∈ GLn (K) such that S −1 AS is upper triangular, i.e., A
can be triangularized.
(1) ⇒ (2): We prove this direction by induction on n. The assertion is trivial for n = 1,
since every 1 × 1 matrix is upper triangular. Suppose that the assertion holds for all
matrices up to order n − 1 for some n ≥ 2, and let A ∈ K n,n be given. Since PA
decomposes into linear factors, there exists an eigenvalue λ of A with a corresponding
eigenvector x ∈ K n . Let Y ∈ K n,n−1 be any matrix such that X = [x, Y ] ∈ GLn (K),
then
λ ∗
AX = [λx, AY ] = X ,
0 A1
6
for some matrix A1 ∈ K n−1,n−1 . From PA = (t − λ)PA1 and our assumption on PA we
see that PA1 decomposes into linear factors. Thus the induction hypothesis implies that
there exists a matrix S1 ∈ GLn−1 (K) such that S1−1 A1 S1 is upper triangular. With
1 0
S := X ∈ GLn (K)
0 S1
we now get
−1 1 0 −1 1 0 1 0 λ ∗ 1 0
S AS = X AX =
0 S1−1 0 S1 0 S1−1 0 A1 0 S1
λ ∗
= ,
0 S1−1 A1 S1
Note that proving Theorem 1.1 is significantly simpler than proving the existence of the
Jordan decomposition of a matrix A ∈ K n,n . However, unlike the Jordan form, the upper
triangular matrix R in Theorem 1.1 gives no information about the geometric structure
of the (generalized) eigenspaces. A discussion of the triangularization in the abstract
setting of endomorphisms on finite dimensional vector spaces as well as a complete proof
of the Jordan decomposition in this context is given in [LM, Section 14.3].
If S −1 AS = R = [rij ] is upper triangular, then the diagonal elements rjj , j = 1, . . . , n, are
the eigenvalues of A. If p ∈ K[t] is any polynomial, then p(A) = p(SRS −1 ) = Sp(R)S −1 ,
where p(R) is upper triangular with diagonal elements given by p(rjj ), j = 1, . . . , n.
These are the eigenvalues of p(A). In short, we can write this observation as
σ(p(A)) = p(σ(A)).
Consequently, we could replace upper triangular by lower triangular in item (2) of Theo-
rem 1.1. One reason why the theorem is formulated in terms of the upper triangular form
7
is that this form appears naturally in the iductive proof of the implication (1) ⇒ (2). If
we would like to end up with a lower triangular form, then the eigenvector that is found
in the first step of the proof must be the last vector of the basis we construct.
For the special case K = C we get the following result, originally due to Schur1 [67].
Corollary 1.2. If A ∈ Cn,n , then there exists a unitary matrix U ∈ Cn,n such that
U H AU is upper triangular, i.e., A can be unitarily triangularized.
Proof. If A ∈ Cn,n , then PA decomposes into linear factors, since the field C is alge-
braically closed. Thus, there exists a matrix S ∈ GLn (C) such that S −1 AS = R1 is
upper triangular. Let S = U R2 be a QR decomposition of S, where U ∈ Cn,n is unitary
and R2 ∈ GLn (C) is upper triangular. Then
U H AU = U −1 AU = R2 S −1 ASR2−1 = R2 R1 R2−1 ,
(1) A is normal, i.e., AH A = AAH , if and only if there exists a unitary matrix U ∈ Cn,n
such that U H AU is diagonal, i.e., A can be unitarily diagonalized.
(3) A is unitary, i.e., AH A = I, if and only if A can be unitarily diagonalized and all
eigenvalues λ of A satisfy |λ| = 1.
1
Issai Schur (1875–1941)
2
The concept of a normal matrix (actually, of a normal bilinear form) was introduced by Otto Toeplitz
(1881–1940) in 1918 [83]. In that paper Toeplitz also proved the result stated in Corollary 1.3 (1), and
he mentioned that this result was known to him and Schur for a long time. According to a footnote in a
paper of Alexander Markowitsch Ostrowski (1893–1986) from 1917 [59, p. 118], Toeplitz wrote to Schur
about the result already in 1914.
8
Proof. (1) Let A = U RU H be a Schur decomposition of A ∈ Cn,n . Then
AH A = (U RH U H )(U RU H ) = U RH RU H ,
AAH = (U RU H )(U RH U H ) = U RRH U,
i.e., A is normal.
AH = (U DU H )H = U DH U H = U DU H = A,
i.e., A is Hermitian.
9
implies that DH D = In , and thus |dii | = 1.
On the other hand, if A = U DU H with DH D = In , then
AH A = (U DH U H )(U DU H ) = In ,
i.e., A is unitary.
i.e., A is decomposed into the sum of n rank-one matrices. We have Pj2 = Pj and
PjH = Pj , and for each x ∈ Cm ,
Pj x ∈ span{uj }, (I − Pj )x ⊥ span{uj }.
Thus, the matrix Pj is an orthogonal projector onto the invariant subspace span{uj }.
Corollary 1.4. Let A ∈ Cn,n be Hermitian. Then A is positive definite (semidefinite) if
and only if all eigenvalues of A are positive (nonnegative).
10
Proof. If A ∈ Cn,n is Hermitian, there exists a unitary diagonalization by (1) in Corol-
lary 1.3, say A = U DU H with a unitary matrix U = [u1 , . . . , un ] ∈ Cn,n and D =
diag(λ1 , . . . , λn ) ∈ Rn,n . In particular, Aui = λi ui for i = 1, . . . , n.
If A is positive definite, i.e., xH Ax > 0 for all nonzero x ∈ Cn , then
0 < uH H
i Aui = λi ui ui = λi for all i = 1, . . . , n.
On the other hand, let λi > 0 for all i = 1, . . . , n. If x ∈ Cn is arbitrary, then x =
P n T
i=1 αi ui = U [α1 , . . . , αn ] for some α1 , . . . , αn ∈ C. Hence
n
X
H T
x Ax = [α1 , . . . , αn ]D[α1 , . . . , αn ] = λi |αi |2 ,
i=1
which is positive when x 6= 0. The same argument (with “>” replaced by “≥”) applies
to the semidefinite case.
Exercise. Prove the Schur inequality: If A = [aij ] ∈ Cn,n has the eigenvalues
λ1 , . . . , λn ∈ C, then
Xn Xn
2
|λj | ≤ |aij |2 ,
j=1 i,j=1
with equality if and only if A is normal. (Hint: Use the Schur decomposition and the
unitary invariance of the trace.)
Exercise. Show that if a normal matrix A ∈ Cn,n has its eigenvalues as diagonal
elements, then A is diagonal.
The situation for real matrices is more complicated, since the field R is not algebraically
closed. Of course, we may consider any matrix A ∈ Rn,n as an element of Cn,n , and then
A is unitarily triangularizable. The factors in the resulting decomposition of A will in
general be non-real. If we insist on a real decomposition using an orthogonal (instead
of a unitary) matrix, then we have to give up the upper triangular form, since this
form implies that the characteristic polynomial of A decomposes into linear factors. The
following “real variant” of Corollary 1.2, originally due to Murnaghan and Wintner [57]3
gets us as close as possible to the upper triangular form in the decomposition.
Theorem 1.5. If A ∈ Rn,n , then there exists an orthogonal matrix Q ∈ Rn,n with
R11 . . . R1m
QT AQ = R =
.. .. ∈ Rn,n ,
. .
Rmm
3
Francis Dominic Murnaghan (1893–1976) and Aurel Wintner (1903–1958)
11
where for every j = 1, . . . , m either Rjj ∈ R or
" #
(j) (j)
r1 r2 (j)
Rjj = (j) (j) ∈ R2,2 with r3 6= 0.
r3 r4
In the second case Rjj has, considered as complex matrix, a pair of complex conjugate
eigenvalues of the form αj ± iβj with αj ∈ R and βj ∈ R \ {0}.
for some matrix A1 ∈ Rn,n . By the induction hypothesis there exists an orthogonal
matrix U2 ∈ Rn,n such that R1 = U2T A1 U2 has the desired form. The matrix
1 0
U := U1
0 U2
12
the eigenvector v = x + iy must be nonzero. Thus, x 6= 0, and using β 6= 0 in the second
equation in (1.1) implies that also y 6= 0. Thus, there exists a µ ∈ R \ {0} with x = µy.
The two equations in (1.1) then can be written as
1
Ax = (α − βµ)x and Ax = (β + αµ)x,
µ
which implies that β(1+µ2 ) = 0. Since 1+µ2 6= 0 for all µ ∈ R, this implies β = 0, which
contradicts our assumption that β 6= 0. Consequently, x, y are linearly independent.
Combining the two equations in (1.1) yields
α β
A[x, y] = [x, y] ,
−β α
for some matrix A1 ∈ Rn−1,n−1 . Analogously to the first case, an application of the
induction hypothesis to this matrix yields the desired matrices R and U .
The matrix R in Theorem 1.5 is called a real Schur form of A. The theorem implies the
following result for real normal matrices.
13
Corollary 1.6. A matrix A ∈ Rn,n is normal if and only if there exists an orthogonal
matrix Q ∈ Rn,n with
QT AQ = diag(R1 , . . . , Rm ),
where, for every j = 1, . . . , m either Rj ∈ R or
αj βj
Rj = ∈ R2,2 with βj 6= 0.
−βj αj
In the second case the matrix Rj has, considered as complex matrix, a pair of complex
conjugate eigenvalues of the form αj ± iβj .
Exercise. Use Corollary 1.6 to show that A ∈ Rn,n is symmetric if and only if there
exists an orthogonal matrix Q ∈ Rn,n such that QT AQ is diagonal, i.e., A can be
orthogonally diagonalized.
14
Chapter 2
15
Definition 2.1. An algebra over a field K is a K-vector space (V, +, ·) with an additional
multiplication
∗ : V × V → V, (x, y) 7→ x ∗ y,
which satisfies the distributive laws
(α · x + β · y) ∗ z = α · (x ∗ z) + β · (y ∗ z), x ∗ (α · y + β · z) = α · (x ∗ y) + β · (x ∗ z),
x ∗ (y ∗ z) = (x ∗ y) ∗ z or x∗y =y∗x
hold for all x, y, z ∈ V , respectively. The dimension of the algebra is defined as the
dimension of the underlying K-vector space.
A nonzero element e ∈ V is called a unit element if e ∗ x = x ∗ e = x holds for all x ∈ V .
If a unit element e ∈ V exists, an element x ∈ V is called invertible when there exists
an element y ∈ V with x ∗ y = y ∗ x = e. The element y is called an inverse of x.
A nonzero element x ∈ V is called a zero divisor if there exists a nonzero element y ∈ V
such that x ∗ y = 0.
The trivial algebra V = {0} consists just of the zero element of the vector space V .
Without explicitly mentioning it, we will usually assume that an algebra we consider is
not the trivial algebra.
For simplicity of notation we will usually skip the multiplication signs, i.e., we will write
αx and xy instead of α · x and x ∗ y.
Lemma 2.2. If V is an algebra over a field K, then the following properties hold:
(1) Any multiplication with zero elements (from K or V ) yields the zero element in V .
(3) If V is associative and contains a unit element, and x ∈ V is invertible, then the
inverse of x is unique. We denote the inverse of x by x−1 .
2
The terms distributive and commutative (acutally, the French distributives and commutatives) were
first used by François-Joseph Servois (1767–1847) in his Essai sur un nouveau mode d’exposition des
principes du calcul différential from 1814. The term associative is most likely due to Sir William Rowan
Hamilton (1805–1865), who used it first in the context of his work on quaternions in 1843 [31]; see
Section 2.2 below. Probably the first study of matrix commutativity was made by Arthur Cayley (1821–
1895) in 1858. He used the term convertible and derived “the general form the matrix L convertible
with a given matrix M of the order 2” [7, p. 29].
16
(4) If V is associative and contains a unit element, then an invertible element x ∈ V
cannot be a zero divisor, and vice versa.
0K · x = (0K + 0K ) · x = 0K · x + 0K · x ⇒ 0K · x = 0V ,
0V ∗ x = (0V + 0V ) ∗ x = 0V ∗ x + 0V ∗ x ⇒ 0V ∗ x = 0V ,
x ∗ 0V = x ∗ (0V + 0V ) = x ∗ 0V + x ∗ 0V ⇒ x ∗ 0V = 0V .
Example 2.3. (1) The field K itself is an algebra. Here the scalar multiplication is
identical to the multiplication of two elements of K. Another algebra is given by
K 1,1 , i.e., the 1 × 1 matrices over K. In this algebra there is a formal difference be-
tween the scalar and the matrix multiplication. Both algebras are one-dimensional,
contain a unit element (1 ∈ K and [1] ∈ K 1,1 ), they are associative and com-
mutative, and they do not contain zero divisors. In particular, (K 1,1 , +, ∗) is a
field. The algebras K 1,1 and K can be identified with one another via an algebra
isomorphism; see Definition 2.5 below.
(2) More generally, the vector space K n,n with the usual matrix multiplication is an
associative algebra of dimension n2 with unit element given by In . As discussed
in the introduction of this chapter, for n ≥ 2 this algebra is not commutative and
contains zero divisors. Hence (K n,n , +, ∗) for n ≥ 2 is not a field.
(3) The vector space of the upper (or the lower) triangular matrices in K n,n , n ≥ 2,
with the usual matrix multiplication is an associative and non-commutative algebra
of dimension (n2 + n)/2, with unit element given by In .
17
algebra,
since
the product
of two elements of V may
not be in V . For example, if
0 1 0 −1 1 0
A= and B = , then AB = ∈
/ span{A, B}.
1 0 1 0 0 −1
(5) For a fixed nonzero matrix A ∈ K n,n , the vector space of the polynomials in A,
with the usual matrix multiplication is an algebra. This algebra is associative and
commutative, and the unit element is In . We know from the Cayley3 -Hamilton
Theorem that PA (A) = 0, and hence
An ∈ span{In , A, . . . , An−1 },
which implies that dim(V (A)) ≤ n. An interesting extension of this result will be
derived later; see (4.5) and the corresponding discussion.
(6) A matrix A ∈ K n,n that has constant entries on all its 2n − 1 diagonals, i.e.,
a0 a1 · · · · · · an−1
.. ..
a−1 a0 . .
. . . . ..
A= .
. . . . . . . ,
.
. . .
.. .. . . a1
a−(n−1) · · · · · · a−1 a0
is called a Toeplitz matrix. Using Jn (0), the Jordan block of size n with eigenvalue
zero, a Toeplitz matrix can be written as
n−1
X n−1
X
T j
A= a−j (Jn (0) ) + aj Jn (0)j .
j=1 j=0
18
(7) An infinite dimensional associative and commutative algebra is given by the vector
space K[t] with the usual multiplication of polynomials. This algebra contains a
unit element, namely the polynomial p = 1, and it does not contain zero divisors.
For p = αk tk +· · ·+α1 t+α0 ∈ K[t] and x ∈ V we define p(x) := αk ·xk +· · ·+α1 ·x+α0 ·e.
Exercise. Show that if (V, +, ·, ∗) is an associative algebra with a unit element and
p, q ∈ K[t], then (pq)(x) = p(x)q(x) holds for all x ∈ V .
Exercise. Show that the vector space Rn,n , n ≥ 2, with the product A ∗ B :=
1
2
(AB + BA), where AB and BA are the usual matrix products, is a commutative and
non-associative algebra.
19
Remark 2.4. An important concept that is closely related to an algebra over a field is the
Lie algebra. This is a vector space V over a field K with an operation (multiplication)
[·, ·] : V × V → V called Lie bracket, which is bilinear, alternating (i.e., [x, x] = 0 for all
x ∈ V ) and satisfies the Jacobi identity
[x, [y, z]] + [z, [x, y]] + [y, [z, y]] = 0, for all x, y, z ∈ V .
Note that if [x, [y, z]] = [[x, y], z] holds for some x, y, z ∈ V , then the Jacobi identity and
the anti-commutativity imply that [y, [z, x]] = 0, which shows that the Lie bracket is in
general not associative.
Exercise. Show that an associative algebra with the operation [x, y] := xy −yx, where
xy and yx are products in the associative algebra, is a Lie algebra.
(Thus, in particular, the associative algebra K n,n with [A, B] := AB − BA, where AB
and BA are the usual matrix products, is a Lie algebra.)
Exercise. Show that the skew symmetric matrices of K n,n with the operation [A, B] :=
AB − BA, where AB and BA are the usual matrix products, form a Lie algebra.
Exercise. Show that the vector space R3 with the cross product is a Lie algebra.
Of course, algebras like K and K 1,1 and should be identified with each other. Formally,
such an identification is done via an algebra isomorphism, which is defined as follows.
Definition 2.5. Let V and W be two algebras over the same field K. A linear map
f : V 7→ W is called an algebra homomorphism if
f : V → W, [x] 7→ x, (2.2)
20
Lemma 2.6. Suppose that the algebras V and W are isomorphic to one another. If the
multiplication in V is associative or commutative, then the multiplication in W is also
associative or commutative, respectively, and vice versa.
In (2.2) we have identified a real algebra of matrices and a real algebra of numbers with
one another. Both are one-dimensional, and motivated by an advice of Olga Taussky4 , we
now ask whether such an identification is also possible for two-dimensional real algebras.
As our candidate for the algebra of numbers we take, of course, the complex numbers,
with the usual matrix multiplication. We then have E12 = −E0 , and hence for any scalars
α1 , β1 , α2 , β2 ∈ R we obtain
21
Using (2.4) and the multiplication in C, for any Aj = αj E0 + βj E1 ∈ V , j = 1, 2, we
have
Thus, V and W are indeed isomorphic to one another. Since the multiplication in C
is commutative, Lemma 2.6 implies that the multiplication in V is commutative, which
can also easily be verified by a direct computation.
Each element of V is of the form
α −β
A = αE0 + βE1 =
β α
xm ∈ span{e, x, . . . , xm−1 },
and hence xm = q(x) for some polynomial q ∈ K[t], which is uniquely determined since
e, x, . . . , xm−1 are linearly independent. We thus have
22
Let pe ∈ K[t] be any other monic polynomial with pe(x) = 0. Then deg(e
p) ≥ deg(p), since
otherwise we would have a contradiction to the linear independence of e, x, . . . , xm−1 . If
deg(ep) = deg(p), then deg(p − pe) < m, since both polynomials are monic. But then
(p − pe)(x) = p(x) − pe(x) = 0,
which shows that p − pe = 0, since otherwise we would again have a contradiction to the
linear independence of e, x, . . . , xm−1 . In summary, we have shown the following result.
Lemma 2.7. If V is a finite dimensional associative algebra with a unit element, then
for each nonzero element x ∈ V there exists a uniquely determined monic polynomial
p ∈ K[t] of smallest possible degree deg(p), where 1 ≤ deg(p) ≤ dim(V ), such that
p(x) = 0. This uniquely determined polynomial is called the minimal polynomial5 of x.
Example 2.8. Consider the finite dimensional associative algebra Cn,n with the unit
element I. Then every A ∈ Cn,n has a Jordan decomposition A = SJS −1 , where the
Jordan canonical form J = diag(Jd1 (λ1 ), . . . , Jdm (λm )) is determined uniquely up to the
order of the diagonal blocks. For p ∈ C[t] we have p(A) = Sp(J)S −1 , and hence p(A) = 0
if and only if p(J) = 0. This holds if and only if p(Jdi (λi )) = 0 for all i = 1, . . . , m. It
can be shown that
deg(p) (j)
X p (λi )
p(Jdi (λi )) = Jd (0)j , (2.5)
j=0
j!
where p(j) (λi ) is the jth derivative of p evaluated at λi . Thus, p(Jdi (λi )) = 0 holds if
and only if λi is a zero of p with multiplicity at least di . The uniquely determined monic
polynomial p ∈ C[t] of smallest possible degree with p(Jdi (λi )) = 0 therefore is given by
p = (t−λi )di . Consequently, if λ
e1 , . . . , λ
ek are the distinct eigenvalues of A, and de1 , . . . , dek
are the sizes of the largest corresponding Jordan blocks, then the minimal polynomial of
A is given by
Yk
MA = ei )dei .
(t − λ
i=1
Obviously, MA divides the characteristic polynomial
m
Y
PA = (t − λi )di ,
i=1
5
Frobenius introduced the concept of the minimal polynomial in 1878 [20, §3]. In this context
he showed that the minimal polynomial divides the characteristic polynomial, and thus proved the
Cayley-Hamilton Theorem, without explicitly mentioning Cayley or Hamilton. In 1896 he gave another
proof [21, §1], and he then cited Cayler’s paper from 1858 [7] as the first where this fundamental theorem
(“Fundamentalsatz”) was published.
23
and we have MA = PA if and only if each for each of the eigenvalues of A there is exactly
one Jordan block, or, equivalently, each eigenvalue has geometric multiplicity one. Such
a matrix A is called nonderogatory6 (cf. Definition 4.3).
n
X
xy := (αi βj )(vi vj ). (2.6)
i,j=1
Using this in (2.6), the product of two matrices A = [aij ] and B = [bij ] is then given by
n
X n
X n
X n
X
A∗B = aij Eij bk` Ek` = (aij bk` )(Eij ∗ Ek` ) = (aij bj` )Ei` .
i,j=1 k,`=1 i,j,k,`=1 i,j,`=1
Pn
Thus, for all i, ` = 1, . . . , n the (i, `) entry of A ∗ B is given by j=1 aij bj` , which shows
that we have defined nothing but the usual matrix multiplication. The fact that this
multiplication is not commutative is shown by the equations Eij ∗ Ej` = Ei` 6= 0 for all
i, ` = 1, . . . , n, and Ej` ∗ Eij = 0 whenever i 6= `.
6
Sylvester introduced the term derogatory (actually, the French dérogatoires) in his article “Sur les
quantités formant un groupe de nonions analogues aux quaternions de Hamilton” from 1884; see p. 157
in paper no. 18 in Volume 4 of his Collected Works.
24
Example 2.10. If we use the basis (2.7) and define the products of the basis vector as
(
Eij , i = k and j = `,
Eij Ek` :=
0, otherwise,
We have thus obtained an algebra with the elementwise or Hadamard product of matrices.
By definition the basis vectors satisfy Eij Ek` = Ek` Eij , so that the product is
commutative. (This is also clear from the commutativity of the multiplication in K,
which immediately gives A B = [aij bij ] = [bij aij ] = B A.)
As above, suppose that {v1 , . . . , vn } is a basis of the algebra V over the field K. The
products vj vk are elements of V if and only if there exist aijk ∈ K, i, j, k = 1, . . . , n, such
that n
X
vj vk = aijk vk . (2.8)
k=1
aijk = aikj , i, j, k = 1, . . . , n.
25
The multiplication is associative, i.e., (vq vr )vs = vq (vr vs ) for all q, r, s = 1, . . . , n, if and
only if
Xn n
X
a`is airq = a`iq airs or (As Aq )`r = (Aq As )`r
i=1 i=1
Apparently, the properties of associative and commutative algebras are inherently linked
with those of commuting matrices.
Results on the existence of such algebras, or more precisely hypercomplex number sys-
tems, which generalize the real and complex numbers, were published by Weierstraß7 in
1894 [85] and Dedekind in 1895 [9]. The connection to commuting matrices was known
to Frobenius, and motivated him to investigate their properties8 . His fundamental paper
from 1896 [21] will be addressed in detail in the Chapters 4 and 5 below. Interestingly,
Frobenius had completely characterized the existence of the most important classes of
associative algebras already in 1878 [20], and without exploring the connection to matrix
commutativity. The following section is mainly devoted to proving his characterization.
Note that a division algebra does not contain zero divisors, since a zero divisor x ∈ V \{0}
would simultaneously satisfy the two equations xy = 0 for some y ∈ V \{0} and x∗0 = 0,
contradicting the uniqueness of the solution of xa = 0.
7
Karl Theodor Wilhelm Weierstraß (1815–1897)
8
Frobenius wrote to Richard Dedekind (1831–1916) on July 15, 1896: “Everything of a hypercomplex
nature ultimately reduces in reality to the composition of matrices.” (Quoted in [33, p. 528].) According
to MacDuffee [51, p. 4], “the concept of a matrix as a hypercomplex number is due in essence to Hamilton
but more directly to Cayley.” In more modern terms, Cayley established in his fundamental paper from
1858 [7] the idea of the matrix as an independent algebraic object, and thus significantly contributed to
establishing matrix theory as a field of research.
26
Let us prove some basic properties of finite dimensional associative division algebras.
(2) If K is algebraically closed, then the minimal polynomial of each x ∈ V \ {0} (cf.
Lemma 2.7) is of degree 1, and hence V and K are isomorphic to one another.
Proof. (1) We choose an arbitrary x ∈ V \ {0}, then by definition there exists a uniquely
determined ax ∈ V such that ax x = x. We have ax 6= 0 since x 6= 0. Moreover, since V
is associative,
ax x = ax (ax x) = a2x x ⇔ (a2x − ax )x = 0,
and since V does not contain zero divisors, we must have a2x = ax . For each y ∈ V we
therefore obtain
ax (ax y − y) = a2x y − ax y = 0,
which implies that ax y = y. Similarly,
which implies that yax = y. Consequently, ax is the (uniquely determined) unit element
in V .
(2) Let x ∈ V \ {0} be arbitrary, and let p ∈ K[t] be the minimal polynomial of x. Since
K is algebraically closed, we have
p = (t − λ1 ) · · · (t − λm )
p(x) = (x − λ1 · e) · · · (x − λm · e) = 0.
Since p is the minimal polynomial and V does not contain zero divisors, we must have
m = 1. The equation x − λ1 · e = 0 then shows that every (nonzero) element x ∈ V can
be identified with a uniquely determined (nonzero) element λ1 ∈ K, and vice versa. The
corresponding map is obviously linear and bijective, so that V and K isomorphic to one
another.
Exercise. Investigate whether the algebra (K n,n , +, ·, ) with the Hadamard product
is a division algebra.
27
Exercise. Investigate whether the algebra (K n,n , +, ·, ∗), n ≥ 2, with the product
A ∗ B := 21 (AB + BA), where AB and BA are the usual matrix products, is a division
algebra.
The associative and commutative algebras V = spanR {[1]} and V = spanR {E0 , E1 }
in (2.3) that we studied in Section 2.1 are both division algebras, since each nonzero
element in these algebras is invertible. Following the classical argument of Frobenius
from 1878 [20, §14], we will now analyze if there exist such algebras of dimension larger
than two. This analysis will also show whether there exist any number fields beyond the
real and the complex numbers.
Let E0 , E1 , . . . , Em ∈ Rn,n be linearly independent matrices and consider the real vector
space
V = spanR {E0 , E1 , . . . , Em } ⊆ Rn,n ,
which is of dimension m + 1. Let us assume that we have defined a multiplication so that
V is an associative division algebra with the unit element E0 . Thus, E0 A = AE0 = A
holds for all A ∈ V . Our two examples show that the construction of such an algebra is
possible for m = 0 and m = 1.
The following important result holds for the minimal polynomials of the elements of V .
Lemma 2.14. For each nonzero matrix A ∈ V the minimal polynomial MA ∈ R[t]
satisfies 1 ≤ deg(MA ) ≤ 2, and deg(MA ) = 1 holds if and only if A = αE0 for some
nonzero α ∈ R.
28
Consider again the matrices E0 , E1 , . . . , Em which form a basis of the algebra V . Then
Lemma 2.14 implies that deg(ME0 ) = 1, and that for each other basis element Ej there
exist αj , βj ∈ R with
Here βj 6= 0, since MEj must be irreducible over R. (Otherwise we would have a contra-
diction as in the proof of Lemma 2.14.) Using E0 Ej = Ej E0 = Ej we obtain
We define
ej := 1 (Ej − αj E0 ),
E j = 1, . . . , m,
βj
then
V = spanR {E0 , E1 , . . . , Em } = spanR {E0 , E em },
e1 , . . . , E
and
e 2 = −E0 ,
E j = 1, . . . , m.
j
Lemma 2.15. The real associative division algebra V constructed above is generated by
linearly independent matrices E0 , E
e1 , . . . , E e 2 = −E0 for j = 1, . . . , m.
em that satisfy E
j
For simplicity of notation, let us denote the basis vectors of V again by Ej instead of
ej , with the understanding that still Ej2 = −E0 for j = 1, . . . , m.
E
It remains to check whether there exist any examples beyond the cases m = 0 and the
case m = 1. Here the constraint Ej2 = −E0 turns out to be very restrictive. Let us
consider the different possibilities:
If m = 2, then E1 E2 ∈ V , hence E1 E2 = αE0 + βE1 + γE2 for some α, β, γ ∈ R. If we
multiply this equation from the left with E1 and use E12 = −E0 we get
29
Hence there exist α1 , α2 , β1 , β2 ∈ R, depending on k, `, such that
(Ek + E` )2 + α1 (Ek + E` ) + β1 E0 = 0,
(Ek − E` )2 + α2 (Ek − E` ) + β2 E0 = 0.
Using that Ek2 = E`2 = −E0 these two equations can be written as
Ek E` + E` Ek = (2 − β1 )E0 =: 2sk` E0 ,
for some sk` ∈ R, where k, ` = 1, . . . , m and k 6= `. Note that sk` = s`k . Since Ek2 = −E0 ,
the equation holds for all pairs k, ` when we set skk := −1 for k = 1, . . . , m.
By construction, the matrix S = [sk` ] ∈ Rm,m is symmetric
Pm and trace(S) = −m. Now
T m
let g = [γ1 , . . . , γm ] ∈ R be arbitrary and let A = k=1 γk Ek . Then
m
X m
X m
X
2
A = γk Ek γ` E` = γk γ` Ek E` .
k=1 `=1 k,`=1
Therefore,
m
X
A2 = sk` γk γ` E0 = (g T Sg)E0 .
k,`=1
Since V does not contain zero divisors, A2 = 0 holds if and only if A = 0, and thus
g T Sg = 0 holds if and only if g = 0. This shows that the symmetric matrix S is definite,
and from trace(S) = −m we see that S is negative definite. Thus, there exists a matrix
Z = [zij ] ∈ GLm (R) with Z T SZ = −Im , where on the left hand side we have the usual
matrix multiplication. We now define the matrices
m
X
Jk := zik Ei , k = 1, . . . , m.
i=1
30
Since Z is invertible (with respect to the usual matrix multiplication), the matrices
J1 , . . . , Jm are linearly independent. Moreover, we have V = spanR {E0 , J1 , . . . , Jm }. By
construction,
m
X
Jk J` = zik zj` Ei Ej , k, ` = 1, . . . , m,
i,j=1
and hence
m
X m
X (k`) (k`) (k`)
Jk J` + J` Jk = (zik zj` + zi` zjk )Ei Ej =: βij Ei Ej (where βij = βji )
i,j=1 i,j=1
m
X X
(k`) (k`)
= βii Ei2 + βij (Ei Ej + Ej Ei )
i=1 1≤i<j≤m
m
X X
(k`) (k`)
= βii sii E0 + 2βij sij E0
i=1 1≤i<j≤m
m
X X
(k`) (k`) (k`)
= βii sii E0 + (βij sij + βji sji ) E0
i=1 1≤i<j≤m
m
X
(k`)
= sij βij E0
i,j=1
m
X
= (zik sij zj` + zi` sij zjk ) E0
i,j=1
m
(
X 0, k 6= `,
= (zkT Sz` + z`T Szk ) E0 =
i,j=1
−2E0 , k = `.
The property Jk J` = −J` Jk is called anti-commutativity; cf. Example 2.4 and the
discussion following Theorem 5.11 below.
The case m = 3 is possible, as shown by the following example.
Example 2.16. Consider V = spanR {E0 , J1 , J2 , J3 } with the usual (associative) matrix
multiplication, where E0 = I4 and
0 1 0 0 0 0 1 0 0 0 0 1
−1 0 0 0
, J2 = 0 0 0 1 , J3 = 0 0 −1 0 .
J1 = 0 0 0 −1 −1 0 0 0 0 1 0 0
0 0 1 0 0 −1 0 0 −1 0 0 0
31
Elementary computations show that these (skew-symmetric) matrices satisfy the equa-
tions
Note that these equations yield the following (skew-symmetric) multiplication table:
J1 J2 J3
J1 −E0 J3 −J2
J2 −J3 −E0 J1
J3 J2 −J1 −E0
32
If z = α + βi + γj + δk is a quaternion, then its corresponding conjugate quaternion is
defined as z := α − βi − γj − δk, its norm is |z| := (zz)1/2 = (α2 + β 2 + γ 2 + δ 2 )1/2 ,
and if z 6= 0, its inverse quaternion is given by z −1 = z/|z|2 . While this looks like a
straightforward generalization of the complex numbers C, the essential difficulty in this
generalization is the non-commutativity of the multiplication in H.
The set of matrices V = {E0 , J1 , J2 , J3 } in Example 2.16 is not the only one that generates
an algebra that is isomorphic to to H. As shown in [17], there are 48 distinct ordered
sets of linearly independent 4×4 signed permutation matrices (i.e. permutation matrices
where 1 may be replaced by −1) that form a basis of a division algebra isomorphic to H.
Instead of real 4 × 4 matrices we can also use complex 2 × 2 matrices for constructing a
real division algebra that is isomorphic to H.
Example 2.17. Consider the real vector space V = spanR {E0 , J1 , J2 , J3 } with the com-
plex matrices
i 0 0 −1 0 −i
E0 = I2 , J1 = , J2 = , J3 = ,
0 −i 1 0 −i 0
and the usual matrix multiplication in C2,2 . Now A ∈ V is of the form
α + iβ −γ − iδ
A = αE0 + βJ1 + γJ2 + δJ3 =
γ − iδ α − iβ
for some α, β, γ, δ ∈ R (depending on A). Hence
det(A) = (α + iβ)(α − iβ) + (γ + iδ)(γ − iδ) = α2 + β 2 + γ 2 + δ 2 ,
which again shows that every nonzero matrix A ∈ V is invertible.
Our next goal is to show that no real division algebra with dimension m ≥ 4 can exist.
Suppose that V = spanR {E0 , J1 . . . , Jm } for some m ≥ 4, where E0 , J1 , . . . , Jm are
linearly independent and satisfy (2.11). Consider J1 , J2 , Jk for any k ∈ {3, . . . , m}, then
(J1 J2 Jk )2 = (J1 J2 Jk )(J1 J2 Jk ) = J1 J2 (Jk J1 )J2 Jk = −J1 J2 (J1 Jk )J2 Jk = J1 J2 J1 J2 Jk Jk
= −J1 J2 J1 J2 = J1 J1 J2 J2 = E0 ,
which can be written as
(J1 J2 Jk − E0 )(J1 J2 Jk + E0 ) = 0.
Since V does not contain zero divisors, one of the two factors on the left hand side
must be zero, and hence J1 J2 Jk = ±E0 . A multiplication from the right with Jk yields
J1 J2 = ∓Jk , so that
spanR {J1 J2 } = spanR {Jk } for all k ≥ 3.
33
Thus, in particular, spanR {J3 } = spanR {J4 }, which contradicts the linear independence
of J3 , J4 .
In summary, we have shown the following fundamental result of Frobenius [20, §14].
Theorem 2.18. Each finite dimensional real associative division algebra is isomorphic to
one of the following: the real numbers R, the complex numbers C, or the quaternions H.
Thus, in particular, a finite dimensional associative real division algebra has dimension
1, 2, or 4. Among these, the multiplication is commutative only in the first two cases.
One can now ask whether there exist finite dimensional real commutative division al-
gebras which are non-associative. An example is given by the 2-dimensional algebra
over R given by the complex numbers C with the additional multiplication defined by
z1 ∗ z2 := z1 z2 , where on the right hand side we use the usual multiplication of complex
numbers. We have
z1 ∗ z2 = z1 z2 = z2 z1 = z2 ∗ z1 ,
and the non-associativity is shown, for example, by
i ∗ (i ∗ 1) = i ∗ i · 1 = i ∗ (−i) = i · (−i) = 1 and (i ∗ i) ∗ 1 = i · i ∗ 1 = −1.
Also note that this algebra does not have a unit element. If there was such an element,
it would be uniquely determined, and it would satisfy z ∗ e = z · e = z, or e = z/z, for
all z ∈ C, which is clearly impossible.
Because of the following result of Hopf from 1940 [38], the previous example of a com-
mutative division algebra is already a typical one.
Theorem 2.19. Each finite dimensional real commutative division algebra has dimen-
sion at most two.
Hopf’s proof is based on sophisticated methods from topology, and until today there
seems to exist no purely algebraic proof of this algebraic statement. This situation has
been called the “topological thorn in the flesh of algebra” [13, p. 223].
Hopf also showed that the dimension of a real division algebra always must be a power
of two. An example of a (non-associative and non-commutative) real division algebra of
dimension 23 = 8, the octonions 10 , was discovered independently by Hamilton’s friend
Graves11 in 1843 and by Cayley in 1845. The search for real division algebras came to
an end more than 100 years later with the following result of Milnor from 1958 [55].
Theorem 2.20. Each finite dimensional real division algebra has dimension 1,2,4, or 8.
10
While this non-associative algebra appears to be rather obscure at first sight, it is still of great
interest particularly because of its applications in theoretical physics; see, e.g., [3] or https://www.
quantamagazine.org/the-octonion-math-that-could-underpin-physics-20180720
11
John Thomas Graves (1806–1870)
34
Chapter 3
In this chapter we will study analytic properties of commuting complex matrices. In the
first part of the chapter we will derive results about the field of values and the numerical
radius of products of commuting matrices (Section 3.1). Since the field of values and
the numerical radius are not studied in [LM], we here also survey and prove their most
important properties for completeness. In the second part of the chapter we will study
the exponential function of commuting matrices (Section 3.2).
We point out that the definition F (A) := {xH Ax : x ∈ Cn , kxk2 = 1} is also used for a
real matrix A. The reason will become apparent below.
1 0
Example 3.2. (1) If A = , then for each x = [x1 , x2 ]T ∈ C2 we have xH Ax =
0 0
|x1 |2 . Since F (A) is the set of all these numbers under the constraint |x1 |2 +|x2 |2 =
1, we see that F (A) is the real interval [0, 1].
1
Toeplitz introduced these sets for bilinear forms and under the name Wertevorrat in 1918 [83].
35
0 1
(2) If A = , then for each x = [x1 , x2 ]T ∈ C2 we have xH Ax = x1 x2 . Since
0 0
|x1 |2 + |x2 |2 = 1 we can parameterize
and obtain
1
xH Ax = x1 x2 = cos t sin t ei(ϕ2 −ϕ1 ) = sin 2t ei(ϕ2 −ϕ1 ) .
2
Since t, ϕ1 , ϕ2 can be any real numbers, we see that F (A) is a disk centered at the
origin in the complex plane with radius 12 .
Note that a matrix A ∈ Rn,n in general has complex (non-real) eigenvalues and eigen-
vectors. In order to guarantee that the field of values of such a matrix contains all its
eigenvalues, the definition of F (A) needs to contain all unit Euclidean norm x ∈ Cn (and
not just the unit Euclidean norm x ∈ Rn ). The set {xT Ax : x ∈ Rn , kxk2 = 1} is
called the real field of values of the matrix A, and for a real matrix A this set is always
a subset of the real line.
(3) For any numbers α, β ∈ C and unit Euclidean norm vector x ∈ Cn we have xH (αA +
βIn )x = αxH Ax + β, and hence
F (A + B) ⊆ F (A) + F (B).
The example
1 0 0 0
A= , B= ,
0 0 0 1
shows that a strict inclusion F (A + B) ⊂ F (A) + F (B) can occur.
36
(4) If U ∈ Cn,n is unitary, then xH U H AU x =: y H Ay, where y := U x ∈ Cn has unit
Euclidean norm, which shows that
F (U H AU ) = F (A).
Thus, if A is normal, then F (A) is the convex hull the eigenvalues of A. The converse,
however, is not true in general: There exist non-normal matrices A ∈ Cn,n for which
F (A)
0 1
is equal to the convex hull of the eigenvalues of A. Moreover, as shown by A = ,
0 0
the eigenvalues of A, and hence their convex hull, can in general be contained in the
interior of F (A).
α β
Exercise. Show that the field of values of the matrix A = ∈ C2,2 is an elliptical
0 γ
disk with foci α and γ and minor axis length |β|.
(For Paul Halmos (1916–2006) this was “analytic geometry at its worst” [30, p. 113].)
is a non-normal matrix with F (A) equal to the convex hull of the eigenvalues of A.
The next result shows that the field of values of a product of commuting matrices is
“well behaved” when one of the matrices is Hermitian positive definite.
Theorem 3.3. If A, B ∈ Cn,n commute and A is Hermitian positive semidefinite, then
37
Proof. The Hermitian positive semidefinite matrix A can be unitarily diagonalized with
nonnegative real eigenvalues; see Corollary 1.3 (2) and Corollary 1.4. Thus, let A =
U DU H with U H U = In and D = diag(λ1 , . . . , λn ), where λi ≥ 0 for i = 1, . . . , n. We
define
1/2
D1/2 := diag(λ1 , . . . , λ1/2
n ) and A
1/2
:= U D1/2 U H ,
so that A1/2 is Hermitian positive semidefinite and satisfies A = A1/2 A1/2 .
If p ∈ C[t] is any (interpolation) polynomial with
1/2
p(λi ) = λi , i = 1, . . . , n,
then
p(A) = p(U DU H ) = U p(D)U H = U D1/2 U H = A1/2 .
Since the matrix B commutes with A, it also commutes with any polynomial in A, and
therefore in particular with A1/2 .
Let x ∈ Cn be any vector with kxk2 = 1. Then either A1/2 x = 0, which implies that
xH Ax = 0 ∈ F (A) and xH ABx = 0 ∈ F (A)F (B), or y := A1/2 x 6= 0 and we obtain
y H By
xH ABx = xH A1/2 A1/2 Bx = xH A1/2 BA1/2 x = y H By = y H y
yH y
y H By
= xH Ax ∈ F (A)F (B),
kyk22
which is what we needed to show.
Note that the inclusion F (AB) ⊆ F (A)F (B) does not hold in general, as shown by the
matrices
0 1 0 0 1 0
A= , B= , AB = (3.1)
0 0 1 0 0 0
Here F (AB) = [0, 1], while F (A) and F (B) are both disks centered at the origin with
radius 12 . Clearly, F (AB) is not contained in F (A)F (B).
Toeplitz showed in 1918 [83] that for every matrix A ∈ Cn,n the boundary of F (A) is a
convex curve. Hausdorff 2 extended this result by proving in 1919 [32] that there are no
“holes” in the interior of this curve. This gives the Toeplitz–Hausdorff Theorem.
Proof. Let A ∈ Cn,n be arbitrary. If F (A) is a single point, then this set trivially is
convex. Hence suppose that F (A) contains at least two points, z1 , z2 ∈ C. Since the
2
Felix Hausdorff (1868–1942)
38
convexity of a set is not altered by shifting, scaling, and rotation, and F (αA + βIn ) =
αF (A) + β for all α, β ∈ C, we can assume without loss of generality that z1 = 0 and
z2 = 1. It now suffices to prove that [0, 1] ⊆ F (A).
We write
1 1
A = H + iK, H := (A + AH ), K := (A − AH ).
2 2i
Note that H and K are both Hermitian. Since 0, 1 ∈ F (A), we can find unit Euclidean
norm vectors x, y ∈ Cn with
xH Ax = 0 and y H Ay = 1. (3.2)
Then
0 = xH Ax = x H
| {zHx} +i |xH{z
Kx}, 1 = y H Ay = y H Hy +i y H Ky ,
| {z } | {z }
∈R ∈R ∈R ∈R
39
Since every eigenvalue of A ∈ Cn,n is contained in F (A), and F (A) is convex, we see that
Exercise. Prove or disprove: A ∈ Cn,n is Hermitian if and only if F (A) is a finite real
interval.
Exercise. What can be said about the field of values of a unitary matrix?
For a block diagonal matrix A = diag(B, C) with square matrices B, C and a corre-
spondingly partitioned z = [xT , y T ]T ∈ Cn , where x 6= 0 6= y, we obtain
xH Bx H
2 y Cy
z H Az = kxk22 + kyk 2 .
kxk22 kyk22
If 1 = kzk22 = kxk22 + kyk22 , then this equation shows that z H Az ∈ F (A) is a convex
combination of a point in F (B) and a point in F (C). This shows that F (A) is contained
in the convex hull of F (B) ∪ F (C). The reverse inclusion holds as well.
Exercise. Show that F (A) for A = diag(B, C) with square matrices B, C is equal to
the convex hull of F (B) ∪ F (C).
Definition 3.5. The numerical radius of A ∈ Cn,n is the real (nonnegative) number
In other words, the numerical radius of A is the radius of the smallest closed disk centered
at the origin in the complex plane that contains F (A).
The definition of the numerical radius easily extends from matrices to bounded linear
operators on a Hilbert space (possibly infinite dimensional). If A is such an operator,
we define ω(A) := sup{|hAx, xi| : kxk = 1}. Many of the properties we show below for
matrices hold for bounded operators as well; see, e.g., the survey article [26].
40
Proof. The first inequality follows from the fact that every eigenvalue of A is contained
in F (A). Using the Cauchy–Schwarz3 inequality [LM, Theorem 12.5], we obtain for every
unit Euclidean norm vector x ∈ Cn that
which both hold for all x, y ∈ Cn [LM, Exercises 11.7 and 12.8]. This gives
= 8ω(A),
41
A matrix A ∈ Cn,n is called spectral when ρ(A) = ω(A), and hence all normal matri-
ces are spectral. An example of a spectral matrix that is not normal is given in [26,
Expample 2.1].
If A, B ∈ Cn,n , then the submultiplicativity of the matrix 2-norm and Theorem 3.6 yield
This upper bound on ω(AB) can be attained, as shown by the matrices in (3.1), which
satisfy ω(A) = ω(B) = 21 and ω(AB) = 1. Consequently, the norm defined by the
numerical radius is not submultiplicative.
In general the submultiplicativity fails even when A and B commute, and even when
they are powers of the same matrix.
Example 3.7. This example from [26] uses the 4 × 4 Jordan block
0 1 0 0
0 0 1 0
A= 0 0 0
,
1
0 0 0 0
√
for which ω(A) = 41 (1 + 5) ≈ 0.8090. There exist permutation matrices P1 , P2 such that
0 0 1 0
2
0 0 0 1 0 1 0 1
A = 0 0 0 0 = P1 diag 0 0 , 0 0
P1T ,
0 0 0 0
0 0 0 1
3
0 0 0 0 0 1 0 0
A = = P2 diag , P2T ,
0 0 0 0 0 0 0 0
0 0 0 0
1
ω(A3 ) = > ω(A)ω(A2 ) ≈ 0.4045.
2
Although the numerical radius is in general not submultiplicative, it satisfies the power
inequality for any A ∈ Cn,n ,
42
According to Goldberg and Tadmor [26, p. 274], this inequality was conjectured by
Halmos and first proven by Berger in 1965. An elementary proof was given by Pearcy in
1966 [60]. We skip the proof here, but note the interesting observation that if ω(A) ≤ 1,
then Theorem 3.6 and the power inequality imply that
for all m ≥ 1.
For commuting matrices we get the following result of Holbrook [36].
Proof. Since the numerical radius defines a norm we have ω(αA) = |α| ω(A) for evey
α ∈ C. We therefore can assume without loss of generality that ω(A) = ω(B) = 1. We
now have to show that ω(AB) ≤ 2.
Since AB = BA we have 4AB = (A + B)2 − (A − B)2 . Using the power inequality and
the fact that the numerical radius defines a norm, we now compute
4ω(AB) = ω(4AB)
= ω (A + B)2 − (A − B)2
Holbrook [36] also gave an example of commuting A, B ∈ C4,4 for which the bound in
Theorem 3.8 is attained. Thus, the constant 2 is the best possible. Moreover, he showed
that if AB = BA and B is normal, then ω(AB) ≤ ω(A)ω(B) [36, Theorem 2.2]. This
inequality also holds when A, B “doubly commute”, i.e., AB = BA and AB H = B H A [36,
Theorem 3.4].
43
we have
lim (AB)k = lim Ak B k = lim Ak lim B k = XY.
k→∞ k→∞ k→∞ k→∞
On the other hand, if AB 6= BA, then limk→∞ (AB)k may be different from XY (or the
limit may even not exist).
then
lim Ak = X = 0, lim B k = Y = 0,
k→∞ k→∞
while
1 0 k1 0
AB = , lim (AB) = 6 XY = 0.
=
0 0 k→∞ 0 0
Exercise. Find matrices A, B ∈ Cn,n for some n ≥ 2 such that limk→∞ Ak and
limk→∞ B k both exist, while limk→∞ (AB)k does not exist.
where the infinite sum is an upper triangular matrix with diagonal entries er11 , . . . , ernn .
These numbers are the eigenvalues of exp(A).
For commuting matrices we have the following extension of the well known property
ea eb = eb ea = ea+b of the scalar exponential function.
44
Lemma 3.10. If A, B ∈ Cn,n commute, then
In particular, exp(A) ∈ GLn (C) for every A ∈ Cn,n , and exp(A)−1 = exp(−A).
holds for each k ≥ 0. Using this and the Cauchy product formula yields
∞ ∞ ∞ j
X 1 j X 1 ` X X 1 ` 1
exp(A) exp(B) = A B = A B j−`
j=0
j! `=0
`! j=0 `=0
`! (j − `)!
∞ j
! ∞
X 1X j ` j−`
X 1
= A B = (A + B)j
j=0
j! `=0
` j=0
j!
= exp(A + B).
Exchanging the roles of A and B we get exp(B) exp(A) = exp(B + A) = exp(A + B).
Since A and −A commute,
If AB 6= BA, then in general exp(A) exp(B), exp(B) exp(A), and exp(A + B) may be
three different matrices.
45
The examples in the next two exercises are taken from [86].
(1) Show that AB1 6= B1 A, but exp(A) exp(B1 ) = exp(B1 ) exp(A) = exp(A + B1 ).
(2) Show that AB2 =6 B2 A, but exp(A) exp(B2 ) = exp(B2 ) exp(A) 6= exp(A + B2 ).
(1) Determine the values α, β with exp(A) exp(B) = exp(B) exp(A) = exp(A + B).
(2) Show that there exist α, β with exp(A) exp(B) 6= exp(B) exp(A) = exp(A + B).
We will now have a closer look at the exponential function of matrices that have dif-
ferentiable functions of a single (real or complex) variable as their entries. This in-
vestigation will also give some insight into properties of the product exp(A) exp(B) for
non-commuting matrices A, B ∈ Cn,n .
Suppose that A(t) = [aij (t)] is an (n × m)-matrix, where each entry is a differentiable
function aij (t) of the (real or complex) variable t. We define the derivative of A(t) with
respect to t entrywise, i.e.,
d
A(t) = A0 (t) := [a0ij (t)].
dt
If A(t) = [aij (t)] and B(t) = [bij (t)] are two such matrix-functions of sizes n × m and
m × `, respectively, then
n
X 0 n
X
(A(t)B(t))0ij a0ik (t)bkj (t) + aik (t)b0kj (t)
= aik (t)bkj (t) =
k=1 k=1
n
X n
X
= a0ik (t)bkj (t) + aik (t)b0kj (t) = (A0 (t)B(t))ij + (A(t)B 0 (t))ij ,
k=1 k=1
46
If A(t) is square, then its exponential function is defined as in 3.4, i.e., the constant
matrix A is simply replaced by A(t). The entrywise differentiation of exp(A(t)) gives the
same result as the termwise differentiation of the infinite series, i.e., we have
∞ ∞
d X1 X1
exp(A(t)) = exp(A(t))0 = (A(t)j )0 = (A(t)j )0 ,
dt j=0
j! j=1
j!
Proof. In this proof we write A instread of A(t) for brevity. We know that A and exp(A)
commute, i.e., A exp(A) = exp(A)A. Differentiating both sides of this equation and
using the product rule yields
A0 exp(A) + A exp(A)0 = exp(A)0 A + exp(A)A0 .
If (3.5) holds, then this equation simplifies to A exp(A)0 = exp(A)0 A. We now replace
exp(A)0 on the left and right hand sides by A0 exp(A), which yields
AA0 exp(A) = A0 exp(A)A.
Using again that A and exp(A) commute gives AA0 exp(A) = A0 A exp(A), and a multi-
plication from the right with exp(A)−1 = exp(−A) yields AA0 = A0 A.
On the other hand, suppose that A and A0 commute. We first show by induction that
then
(Aj )0 = jA0 Aj−1 , j = 1, 2, 3, . . . .
This equation obviously holds for j = 1. Suppose that it holds for some j ≥ 1, then
the product rule, the induction hypothesis, and the assumption that A and A0 commute
yield
(Aj+1 )0 = (Aj A)0 = (Aj )0 A + Aj A0 = jA0 Aj + A0 Aj = (j + 1)A0 Aj .
For the derivative of the exponential function of A we therefore obtain
∞ ∞ ∞
0
X 1 j 0 X 1 0 j−1 0
X 1
exp(A) = (A ) = jA A =A Aj−1 = A0 exp(A).
j=1
j! j=1
j! j=1
(j − 1)!
47
In the simple case A(t) = tA for a fixed matrix A ∈ Cn,n we have A0 (t) = (tA)0 = A, and
hence A(t) and A0 (t) commute. Lemma 3.12 then reduces to the well known formula
exp(tA)0 = (tA)0 exp(tA) = A exp(A); (3.6)
cf. [LM, pp. 264–265]. The function exp(tA) plays an important role in the solution of
linear homogeneous ordinary differential equation systems of the form
y 0 (t) = Ay(t), t ∈ [0, a], y(0) = y0 . (3.7)
As shown in [LM, Theorem 17.11], the function exp(tA)y0 is the uniquely determined
solution of (3.7).
For two given matrices A, B ∈ Cn,n we now define
f (t) := exp(tA)B exp(−tA) (3.8)
as a function of the (real of complex) variable t. By multiplying out the power series
expansions of exp(tA) and exp(−tA) and collecting equal powers of t we obtain the
representation
∞ j
X t
f (t) = Gj ,
j=0
j!
where the matrices Gj ∈ K n,n are certain sums of products of A and B. Clearly, G0 = B.
Moreover, using (3.6), the product rule, and the fact that A and exp(−tA) commute, we
obtain
f 0 (t) = A exp(tA)B exp(−tA) − exp(tA)BA exp(−tA)
∞ j
X t
= Af (t) − f (t)A = (AGj − Gj A). (3.9)
j=0
j!
On the other hand, differentiating the power series representation of f (t) termwise with
respect to t yields
∞ ∞ j
X tj−1 X t
f 0 (t) = Gj = Gj+1 . (3.10)
j=1
(j − 1)! j=0
j!
A comparison of (3.9) and (3.10) shows that
Gj+1 = [A, Gj ] := AGj − Gj A, for all j ≥ 0.
The matrix [A, Gj ] is called the additive commutator of A and Gj . (Additive commuta-
tors of matrices will be studied in detail in Section 6.1.)
In summary, we have shown the following result, which in the theory of Lie algebras (cf.
Remark 2.4) is known as the Hadamard Lemma 4 .
4
Jacques Hadamard (1865–1963)
48
Lemma 3.13. Any A, B ∈ Cn,n satisfy
∞ j
X t
exp(tA)B exp(−tA) = Gj ,
j=0
j!
In the special case that A and G1 = [A, B] commute, i.e., G2 = [A, G1 ] = 0, we have
Gj = 0 for all j ≥ 2, and the Hadamard Lemma shows that
We next reformulate the identity from Lemma 3.13 using two maps that are important
in the theory of (matrix) Lie groups and Lie algebras (cf. Remark 2.4). Let A ∈ GLn (C)
be given, then
AdA : Cn,n → Cn,n , B 7→ ABA−1 ,
is called the adjoint map of A. This map is linear and bijective with (AdA )−1 = AdA−1 .
Moreover, for all X, Y ∈ Cn,n we have the interesting identity
which in the theory of Lie algebras is called the adjoint action of A. Note that in the
notation of Lemma 3.13 we have adA (B) = G1 ,
and hence inductively (adA )j (B) = Gj for all j ≥ 0, where (adA )0 (B) = B. We thus
obtain ∞ j
X t
exp(tA)B exp(−tA) = Adexp(tA) (B) = (adA )j (B),
j=0
j!
where the last term can also be written as et adA (B). In the literature this reformulated
version of Lemma 3.13 is often stated without the argument B, i.e., in the compact form
Adexp(tA) = et adA .
49
From the series representation of the function Adexp(tA) it is easy to see that differentiating
with respect to t at t = 0 yields
∞ j−1
d X t j
Adexp(tA) =
(adA ) = adA .
dt t=0 (j − 1)!
j=1
t=0
The last two identities are useful when studying the relation between Lie group homo-
morphisms and maps between their corresponding Lie algebras; see, e.g., [29, Section 3.5]
Lemma 3.13 and equation (3.11) will be used in the proof of the next result, which is a
significant generalization of Lemma 3.10.
Theorem 3.14. If A, B ∈ Cn,n both commute with [A, B], then
Let w(t) be another solution of (3.12). We define u(t) := exp(−M (t))w(t), then the
product rule and Lemma 3.12 yield
and hence u(t) must be constant. The initial condition in (3.12) implies that
u(0) = exp(−A(0))w(0) = y0 ,
which gives u(t) ≡ y0 , and therefore w(t)y0 = exp(M (t))y0 . Thus, similar to (3.7), the
system (3.12) has a uniquely determined solution, which is given by exp(M (t))y0 .
50
We now define
g(t) := exp(tA) exp(tB)
as a function of the (real of complex) variable t. Using the product rule and (3.6) we
obtain
where f (t) is defined in (3.8). Since A and [A, B] commute we can use (3.11), i.e.,
f (t) = B + t[A, B], which yields
Note that g(0) = In , so that the function g(t)y0 is the uniquely determined solution of
the system (3.12), and hence g(t)y0 = exp(M (t))y0 . Since y0 is arbitrary, we must have
g(t) = exp(M (t)), or
which is the first equation we needed to show. The second equation follows from Theo-
rem 3.10, since A and B both commute with [A, B].
Note that Theorem 3.14 for t = 1 and commuting matrices A, B reduces to Theorem 3.10.
In general, two matrices A, B that both commute with their additive commutator are
called quasi-commuting; see Definition 6.14 and Theorem 6.15 below.
Theorem 3.14 is a special case of the Baker-Campbell-Hausdorff (or BCH) Theorem 5 ,
which for given A, B gives the solution C to the matrix equation
This theorem is of great interest in the theory of Lie algebras; see, e.g., [29]. A very
thorough treatment of the BCH Theorem, its history, numerous proofs, and applications
is given in [5].
The special case derived in Theorem 3.14 appears in quantum mechanics. As discussed
in more detail in Chapter 6 (see in particular Theorem 6.12 and its discussion), this area
involves operators P, Q whose additive commutator is of the form [P, Q] = αidV , and
hence P and Q both trivially commute with [P, Q].
5
Henry Frederick Baker (1866–1956), John Edward Campbell (1862–1924), and Felix Hausdorff
(1868–1942)
51
Note that if C is a solution of the equation (3.14), then
i.e., exp(C/2) is a square root of exp(A) exp(B). Some matrices do not have square roots
(see [34, Sections 1.5–1.7] for the general theory). In the next example we construct real
matrices A and B so that exp(A) exp(B) does not have a real square root, and hence
(3.14) does not have a real solution C.
Example 3.15. We adapt an example of Wei [84]. For a given α ∈ R, let
0 α 0 1
A= and B = .
−α 0 0 0
which yields
eiα 0 −1 x y
exp(A) = S S = , where eiα = x + iy.
0 e−iα −y x
1 1
With exp(B) = we obtain
0 1
x x+y
exp(A) exp(B) = .
−y x − y
It is now easy to pick an α ∈ R for which exp(A) exp(B) does not have
√ a real√square
root. Wei’s example in [84] uses α = −5π/4, which yields eiα = −1/ 2 + i/ 2 and
hence √
x x+y −1/√2 0
√
= .
−y x − y −1/ 2 − 2
Because of the negative diagonal elements, it is clear that this lower triangular matrix
does not have a real square root.
√
Let us define β := −1/ 2. Then the complex logarithm yields
52
we obtain
√
log(β) 0 −1 β 0 −1 −1/√2 0
√ .
exp(C) = S exp S =S S =
0 log(1/β) 0 1/β −1/ 2 − 2
We see that the equation (3.14) for the given real matrices A with α = −5π/4 and B has
a complex solution C.
Integration of the function exp(tA) is also done entrywise [LM, p. 266]. Using this
definition of the integral, Wermuth [86] derived the following interesting formula for the
difference of two matrix exponential functions of commuting matrices.
Lemma 3.16. If A, B ∈ Cn,n commute, then
Z 1
exp(A) − exp(B) = exp(B) (A − B) exp(t(A − B)) dt,
0
where k · k is any submultiplicative matrix norm on Cn,n , i.e., any norm on Cn,n with
kM1 M2 k ≤ kM1 k kM2 k for all M1 , M2 ∈ Cn,n .
53
The bound on k exp(A) − exp(B)k also holds when AB 6= BA. In his proof of the general
case Wermuth [86] applied the Lie product formula
which is valid for all A, B ∈ Cn,n . For a proof of this formula see [39, p. 496].
54
Chapter 4
In this chapter we will study the structural properties of commuting matrices in more
detail. Our main goal is to answer the question which matrices commute with a given
matrix.
The following classical result, originally due to Sylvester [77]1 will be used frequently in
our analysis.
Theorem 4.1. Let K be algebraically closed, and let A ∈ K n,n and B ∈ K m,m . Then
the following two assertions are equivalent:
(2) For each C ∈ K n,m there exists a unique matrix X ∈ K n,m with
AX − XB = C. (4.1)
The equation (4.1) with the known matrices A, B, C is called a Sylvester equation for
the matrix X.
Proof. (1) ⇒ (2): Since the field K is algebraically closed, there exists a Schur decompo-
sition B = SRS −1 (cf. Theorem 1.1) with an upper triangular matrix R = [rij ] ∈ K m,m ,
which has the eigenvalues of B as its diagonal elements r11 , . . . , rnn . Inserting the Schur
1
Several years before Sylvester’s paper [77], Frobenius proved that if σ(A) ∩ σ(B) = ∅ and AP = P B,
then P = 0 [20, §7 Satz XI].
55
decomposition into the equation (4.1) yields
AX − XSRS −1 = C ⇔ AXS − XSR = CS
⇔ AY − Y R = D, where Y := XS, D := CS.
Clearly, X is uniquely determined if and only if Y is uniquely determined.
We write Y = [y1 , . . . , ym ] and D = [d1 , . . . , dm ], then the first column of the matrix
equation AY − Y R = D can be written as
(A − r11 In )y1 = d1 .
Since σ(A)∩σ(B) = ∅, the matrix A−r11 In is nonsingular, and hence y1 = (A−r11 In )−1 d1
is uniquely determined. Suppose that y1 , . . . , yk−1 for some k ≥ 2 have been uniquely
determined. Then the kth column of AY − Y R = D is
k−1
X
(A − rkk In )yk = dk + rjk yj .
j=1
A consequence of the implication (2) ⇒ (1) in Theorem 4.1 is that if there exists a
nonzero matrix X ∈ K n,m with AX − XB = 0, then A and B must have a common
eigenvalue. A nice (separate) proof of this fact was given by Drazin in [10, Lemma 2],
and his proof uses the following result (cf. Lemma 2.7).
56
Lemma 4.2. Let K be algebraically closed and let A ∈ K n,n . Then for every nonzero
X ∈ K n,m there exists a uniquely determined monic polynomial p ∈ K[t] of smallest
possible degree deg(p), where 1 ≤ deg(p) ≤ n, such that p(A)X = 0. Moreover, p divides
the minimal polynomial MA , and hence every zero of p is an eigenvalue of A.
The (uniquely determined) polynomial in Lemma 4.2 is called the minimal polynomial
of A with respect to X.
Following Drazin [10], we now assume that AX − XB = 0 holds for some nonzero matrix
X ∈ K n,m , and we let p ∈ K[t] be the minimal polynomial of A with respect to X. If
λ ∈ K is a zero of p (and thus an eigenvalue of A) we can write p = (t − λ)p0 for some
monic polynomial p0 ∈ K[t] with Y := p0 (A)X 6= 0. Moreover,
Our next goal is to characterize matrices that commute with a given matrix A ∈ K n,n .
It is clear that
CA := {B ∈ K n,n : AB = BA}
is a subspace of K n,n . Moreover, if B = p(A) for some p ∈ K[t], then B ∈ CA .
On the other hand, every matrix commutes with the identity matrix, hence CIn = K n,n ,
while each polynomial in In is of the form p(In ) = αIn for some α ∈ K. This means that
in general CA may contain more matrices than just the polynomials in A.
Our analysis of this situation is based on the following definition.
Definition 4.3. Let K be any field. A matrix A ∈ K n,n is called nonderogatory when
deg(MA ) = n (or, equivalently, MA = PA ).
The following two exercises give further characterizations of the nonderogatory property.
Exercise. Show that A ∈ K n,n is nonderogatory if and only if there exists a vector
v ∈ K n such that the n vectors v, Av, . . . , An−1 v are linearly independent, and hence
form a basis of K n . (Such a vector v is called a cyclic vector for A.)
57
We will now show that in case of an algebraically closed field K the only matrices that
commute with a nonderogatory matrix A ∈ K n,n are polynomials in A. The proof uses
that the inverse of an upper triangular Toeplitz matrix is again an upper triangular
Toeplitz matrix. More generally, one can even prove the following result.
Theorem 4.5 (Frobenius [20], §7 Satz XIII). Let K be algebrically closed and let A ∈
K n,n be nonderogatory, i.e., each eigenvalue of A has geometric multiplicity one. Then
each B ∈ K n,n that commutes with A is of the form B = p(A) for some p ∈ K[t], and
dim(CA ) = n.
JB
b = BJ,
b b := S −1 BS.
where B
We now partition B b = [Bij ], where Bii ∈ K di ,di for i = 1, . . . , m. Then the (i, j)-block
b − BJ
of the equation J B b = 0 is of the form
B
b = diag(B11 , . . . , Bmm ),
58
where each diagonal block satisfies
Jdi (λi )Bii − Bii Jdi (λi ) = 0.
Using that Jdi (λi ) = λi Idi + Jdi (0) we obtain the equation
Jdi (0)Bii − Bii Jdi (0) = 0. (4.3)
A direct computation shows that Bii ∈ K di ,di solves this equation if and only if Bii is an
upper triangular Toeplitz matrix of the form
(i) (i) (i)
b0 b1 · · · bdi −1
.. .. .. i −1
dX
. . . (i)
Bii =
..
= bj Jdi (0)j .
(i)
. b1 j=0
(i)
b0
Since the matrices Jdi (0)0 , . . . , Jdi (0)di −1 are linearly independent, there exist exactly di
linearly independent solutions of the equation (4.3). This shows that there exist exactly
d1 + · · · + dm = n linearly independent matrices B b = diag(B11 , . . . , Bmm ) that commute
with J, and consequently dim(CA ) = n.
It remains to be shown that B is a polynomial in A. For each i = 1, . . . , m we define
Y
qi = (t − λj )dj ∈ K[t],
j6=i
then it is easy to see that qi (Jdj (λj )) = 0 for i 6= j. Moreover, qi (Jdi (λi )) is an upper
triangular Toeplitz matrix that is nonsingular, since its diagonal element is given by
Y
(λi − λj )dj 6= 0.
j6=i
(Note that in the case m = 1 and hence d1 = n we only have the polynomial q1 = 1 and
the matrix q1 (Jn (λ1 )) = In .)
By Lemma 4.4, the matrix qi (Jdi (λi ))−1 is an upper triangular Toeplitz matrix, and hence
also qi (Jdi (λi ))−1 Bii is an upper triangular Toeplitz matrix, which we write as
i −1
dX i −1
dX
−1 (i) (i)
qi (Jdi )(λi ) Bii = cj Jdi (0)j = cj (Jdi (λi ) − λi Idi )j =: ri (Jdi (λi )),
j=0 j=0
59
Consequently, for the polynomial p := p1 + · · · + pm we have
p(A) = Sdiag(p(Jd1 (λ1 )), . . . , p(Jdm (λm )))S −1 = S BS
b −1 = B,
60
Example 4.6. The elementary divisors of J = diag(J3 (λ), J2 (λ)) are (t−λ)3 and (t−λ)2 ,
and
δ11 = 3, δ12 = δ21 = 2, δ22 = 2,
giving dim(CA ) = 9. For the identity matrix In we have the n elementary divisors t − 1,
and hence δij = 1 for i, j = 1, . . . , n, which gives dim(CIn ) = n2 , i.e., CIn = K n,n .
Finally, if A is nonderogatory, then δij = 0 when i 6= j, which yields dim(CA ) = n as
shown in Theorem 4.5.
We have shown above that there exist commuting matrices A, B ∈ K n,n for which B is
not a polynomial in A. One can ask whether AB = BA implies that A and B are both
polynomials in a third matrix C ∈ K n,n . The following example of Frobenius [21] shows
that this is in general not true. We will show in Theorem 5.8 below that two matrices
A, B ∈ K n,n can be expressed as polynomials in a third matrix C ∈ K n,n if and only if
they commute and are both diagonalizable.
for some matrix C ∈ K n,n and polynomials p, q ∈ K[t]. Then AC = CA and BC = CB.
A straightforward computation shows that any matrix C that commutes with both A and
B is of the form
0 α1 α2
C = α0 I3 + D with D = 0 0 0
0 0 0
for some α0 , α1 , α2 ∈ K. Since D2 = 0, every polynomial in C is of the form β0 I + β1 D
for some β0 , β1 ∈ K. Thus,
The first equation implies that β1A α1 = 1 and β1A α2 = 0, and hence in particular α2 = 0.
But the second equation implies that β1B α2 = 1, which is a contradiction.
Exercise. Determine all B ∈ K 3,3 that commute with A = diag(1, J2 (1)). What
changes in this computation when A = diag(1, J2 (−1)) ?
61
Exercise. Let A ∈ K n,n have the eigenvalue λ of algebraic multiplicity n and the
Jordan form diag(Jd1 (λ), Jd2 (λ), . . . , Jdm (λ)) with d1 ≥ d2 ≥ · · · ≥ dm . Using (4.4),
show that dim(CA ) = d1 + 3d2 + · · · + (2m − 1)dm .
A variant of this formula was first stated without proof by Frobenius in 1878 [20, §7
Satz XV], and he published a proof in 1910, more than 30 years later; cf. equations
(7.) and (8.) in [22]. Note that Frobenius’ formula applied to A = In implies the well
known identity 1 + 3 + · · · + (2n − 1) = n2 .
With the usual matrix multiplication, V (A, B) becomes an associative and commutative
algebra with unit element In . If A is nonderogatory, then Theorem 4.5 says that B = p(A)
holds for some polynomial p ∈ K[t]. Then for every polynomial pe(s, t) in the two
(commuting) variables s, t we have pe(A, B) = pe(A, p(A)), which is a polynomial in A.
Thus, V (A, B) is contained in the algebra of the polynomials in A, and in particular
dim(V (A, B)) ≤ n.
Gerstenhaber showed in 1961 [25] that dim(V (A, B)) ≤ n also holds without the as-
sumption that A (or B) is nonderogatory. A proof of Gerstenhaber’s result using only
techniques from matrix theory is given in [4]. Interestingly, the question whether the
same bound holds for three commuting matrices, i.e., whether dim(V (A, B, C)) ≤ n, is
in general still open; see [58, Chapter 5] and [35] for extensive discussions.
62
Chapter 5
In this chapter we will show that commuting matrices can be triangularized simultane-
ously, which generalizes the fundamental Theorem 1.1 about the triangularization of a
single matrix. After proving this result, which relies on the fact that commuting matri-
ces have a common eigenvector, we will discuss simultaneous diagonalizability and derive
some further sufficient conditions for simultaneous triangularizability (Section 5.1). We
will then give several applications of the simultaneous triangularization, where “applica-
tion” means that the simultaneous triangularization is used to prove other results. These
include Frobenius’ spectral mapping theorem, and a theorem of Schur on the maximal
number of linearly independent and mutually commuting matrices in K n,n (Section 5.2).
63
Proof. For each x ∈ Vλ (A) = ker(λIn − A) we have (λIn − A)Bx = (λB − AB)x =
(λB − BA)x = B(λIn − A)x = 0, and hence Bx ∈ Vλ (A).
Lemma 5.3. If K is algebraically closed and A, B ∈ K n,n commute, then A and B have
a common eigenvector.
Proof. Since the field is algebraically closed, A has an eigenvalue λ ∈ K and hence
{0} 6= ker(λIn − A) = Vλ (A). Since A and B commute, Vλ (A) is invariant under B
(cf. Lemma 5.2), and therefore B has an eigenvector in Vλ (A) (cf. Lemma 5.1). By the
definition of Vλ (A), this vector is also an eigenvector of A.
Note that AB = BA does not imply that A and B have a common eigenvalue. For a
simple counterexample consider A = In and B = −In .
Also note that the condition AB = BA is sufficient but not necessary for the existence
of a common eigenvector. For example, any two upper triangular matrices have the
common eigenvector e1 , but not all of them commute, as shown by the matrices
0 1 1 −1
A= and B = ,
0 1 0 1
which satisfy
0 1 0 0
AB = 6 = = BA.
0 1 0 1
A necessary and sufficient condition for the existence of a common eigenvector is given in
the following result of Shemesh [72]; see also [50, Section 7]. Another sufficient condition
will be derived in Lemma 6.17.
Theorem 5.4. Suppose that K is algebraically closed. Two matrices A, B ∈ K n,n have
a common eigenvector if and only if
∞
\ n−1
\
i j j i
{0} =
6 ker(A B − B A ) = ker(Ai B j − B j Ai ).
i,j=1 i,j=1
Proof. We denote U := ∞ i j j i n
T
i,j=1 ker(A B −B A ). This is a subspace of K . Because of the
Cayley-Hamilton Theorem, Ai ∈ span{In , A, . . . , An−1 } and B j ∈ span{In , B, . . . , B n−1 }
for all i, j ≥ n, and therefore we can replace ∞ in the intersection by n − 1.
Now suppose that x ∈ K n is a common eigenvector of A and B, i.e., Ax = λx and
Bx = µx. Then for all i, j ≥ 1 we have
64
and hence x ∈ U.
On the other hand, suppose that x ∈ U \ {0}. Then (AB − BA)x = 0, or ABx = BAx,
which shows that A and B commute on U. Moreover, for all i, j ≥ 1 we have
and thus A1 B1 = B1 A1 , since X has full rank. By Lemma 5.3, A1 and B1 have a
common eigenvector, say A1 y = λy and B1 y = µy, and then AXy = XA1 y = λXy and
BXy = XB1 y = µXy, which shows that Xy is a common eigenvector of A and B.
In the intersection
n−1
\
ker(Ai B j − B j Ai )
i,j=1
has rank less than n. This is a computational (though very expensive) criterion for the
existence of a common eigenvector.
We now show that commuting matrices can be simultaneously triangularized, which gen-
eralizes Schur’s triangularization theorem for a single matrix from 1909 (Theorem 1.1).
This simultaneous triangularization result is often attributed to Frobenius’ article from
1896 on commuting matrices [21], but it was not explicitly stated there, and Frobenius
considered Corollary 5.14 (see below) as his main result1 .
1
Frobenius wrote to Dedekind about this article [21] on June 4, 1896: “I almost forgot to mention that
today my article Ueber vertauschbare Matrizen will be published in the Sitzungsberichte. ... Hopefully
you will enjoy its artistic structure. Many will find it artificial. Hensel confessed to me that he was
dumber after reading it, and I needed to take the entire building apart before he understood its meaning.”
(My translation of the quote from Frobenius’ letter given in [75].)
65
Theorem 5.5. If K is algebraically closed and A, B ∈ K n,n commute, then there exists
a matrix S ∈ GLn (K) such that S −1 AS and S −1 BS are both upper triangular, i.e., A
and B are simultaneously triangularizable.
Proof. We prove the result by induction on n. The case n = 1 is trivial. Suppose the
assertion is true for matrices of order n − 1 for some n ≥ 2, and consider two commuting
matrices A, B ∈ K n,n .
By Lemma 5.3, A and B have a common eigenvector, say Ax = λx and Bx = µx. Let
S1 := [x, x2 , . . . , xn ] ∈ GLn (K), then
λ ∗ µ ∗
AS1 = S1 and BS1 = S1 (5.1)
0 A1 0 B1
This shows that A1 and B1 commute. By the induction hypothesis, there exists a matrix
S2 ∈ GLn−1 (K) such that S2−1 A1 S2 and S2−1 B1 S2 are both upper triangular. Defining
1 0
S := S1 ∈ GLn (K),
0 S2
we obtain
−1 λ ∗ −1 µ ∗
S AS = −1 and S BS = ,
0 S2 A1 S2 0 S2−1 B1 S2
which are both upper triangular.
As in Theorem 1.1 we can replace in Theorem 5.5 the term upper triangular by lower
triangular.
The condition that A and B have a common eigenvector is necessary but not sufficient
for the simultaneous triangularization. On the other hand, the condition AB = BA is
sufficient but not necessary for the simultaneous triangularization. In other words, there
exist matrices with a common eigenvector that are not simultaneously triangularizable,
and there exist simultaneously triangularizable matrices that do not commute. The
second point is clear from the fact that not all upper triangular matrices commute (see
the example above).
The inductive proof of Theorem 5.5 suggest the following algorithm for computing the
simultaneous triangularization of two commuting matrices:
Input: Commuting matrices A, B ∈ K n,n .
66
Output: Matrix S ∈ GLn (K) such that S −1 AS and S −1 BS are upper triangular.
Initialize: A1 = A, B1 = B, and S = In .
for k = 0, . . . , n − 1 do
Determine a common eigenvector x ∈ K n−k of A1 and B1 .
Determine x2 , . . . , xn−k ∈ K n−k S1 = [x, x2 , . . . , xn−k ] ∈ GLn−k
such that (K).
0 0
Set A1 = [0, In−(k+1) ]S1−1 AS1 and B1 = [0 In−(k+1) ]S1−1 BS1 .
In−(k+1) In−(k+1)
In−k 0
Update S ← S .
0 S1
end for
This algorithm can of course be applied also to matrices A, B ∈ K n,n that do not
commute. If they are simultaneously triangularizable, then the algorithm will run until
the final step k = n − 1 and deliver the required matrix S. If they are not simultaneously
triangularizable there will be some step k < n − 1 where no common eigenvector of A1
and B1 exists, and the algorithm will fail.
As for the unitary triangularization of a single complex matrix (cf. Corollary 1.2), we can
obtain the simultaneous unitary triangularization of two commuting complex matrices.
Corollary 5.6. If A, B ∈ Cn,n commute, then there exists a unitary matrix U ∈ Cn,n
such that U H AU and U H BU are both upper triangular, i.e., A and B are simultaneously
unitarily triangularizable.
Proof. Since the field C is algebraically closed and A and B commute, we can simulta-
neously triangularize them by Theorem 5.5. Let S ∈ GLn (C) be such that S −1 AS = R1
and S −1 BS = R2 are both upper triangular. Let S = U R be a QR decomposition of S,
where U ∈ Cn,n is unitary and R ∈ GLn (C) is upper triangular. Then S −1 = R−1 U H
and
U H AU = RR1 R−1 and U H BU = RR2 R−1 ,
which are both upper triangular.
Theorem 5.5 and Corollary 5.6 can be extended inductively to an arbitrary (finite) num-
ber of mutually commuting matrices as follows.
Corollary 5.7. If K is algebraically closed and A1 , . . . , Ak ∈ K n,n are mutually com-
muting matrices, then there exists a matrix S ∈ GLn (K) such that S −1 Aj S is upper
triangular for all j = 1, . . . , k. In particular, if K = C, then S can be chosen unitary.
The next theorem of Drazin, Dungey and Gruenberg characterizes simultaneous diago-
nalization. They considered the result to be “of a trivial nature” [12, p. 221].
67
Theorem 5.8. Let K be algebraically closed and suppose that A, B ∈ K n,n . Then the
following assertions are equivalent:
= BD
b A.
For i 6= j we have λi 6= λj and thus Theorem 4.1 shows that Bij = 0 is the unique
solution of this Sylvester equation, which yields
B
b = diag(B11 , . . . , Bmm ).
SB−1 BS
b B = DB , where SB = diag(Sd1 , . . . , Sdm ),
for some diagonal matrix DB , and Sdi ∈ K di ,di , i = 1, . . . , m. Now the matrix S := SA SB
yields the simultaneous diagonalization, since
68
K. (Note that K is algebraically closed and hence is not a finite field.) Then there
exist polynomials p1 , p2 ∈ K[t] of degree at most n − 1 which solve the two (Lagrange)
interpolation problems
Theorem 5.8 shows, in particular, that commuting diagonalizable matrices can be ex-
pressed as polynomials in the same matrix. We have already seen in Example 4.7 that
this does not hold for general (non-diagonalizable) commuting matrices.
Exercise. Extend the formulation and the proof of Theorem 5.8 to an arbitrary (finite)
number of matrices A1 , . . . , Ak ∈ K n,n .
Exercise. Show that if A, B ∈ Cn,n are normal matrices that commute, then there
exists a unitary matrix U ∈ Cn,n such that U H AU and U H BU are both diagonal, i.e.,
A and B are simultaneously unitarily diagonalizable.
Show that if additionally A and B are Hermitian positive definite (or semidefinite),
then the product AB is also Hermitian positive definite (or semidefinite).
Using Lemma 5.1 we can prove another sufficient condition of Shemesh [73] for the
existence of a simultaneous triangularization of two matrices.
Theorem 5.9. If K is algebraically closed and A, B ∈ K n,n are such that their product
AB commutes with both A and B, then A and B are simultaneously triangularizable.
Proof. Let us write C = AB−BA, then the assumptions A(AB) = (AB)A and B(AB) =
(AB)B can equivalently be written as AC = 0 and CB = 0, respectively. If A ∈ GLn (K),
then C = 0 and the result follows from Theorem 5.5. We therefore can assume that
A∈/ GLn (K), and thus ker(A) 6= {0}.
If x ∈ ker(A), then ACx = 0, and hence the non-trivial subspace ker(A) is invariant
under C. By Lemma 5.1 there exists an eigenvector x0 of C in ker(A), i.e., we have
69
for some λ ∈ K. Let us define
X := {x ∈ K n : Cx = λx and Ax = 0},
which is a subspace of K n with dimension at least one. We will show that X is invariant
under B. For each x ∈ X we have
and
ABx = (AB − BA + BA)x = Cx = λx.
If λ = 0, then these two equations show that Bx ∈ X . In this case B has an eigenvector
in X , which simultaneously is an eigenvector of A corresponding to the eigenvalue 0, say
Ax = 0 and Bx = µx. This yields a decomposition of the form (5.1) with λ = 0. Now
0 ∗ −1 0 ∗
S1 S = A(AB) = (AB)A = S −1 ,
0 A1 (A1 B1 ) 1 0 (A1 B1 )A1 1
and
0 ∗ −1 0 ∗
S1 S = B(AB) = (AB)B = S −1 ,
0 B1 (A1 B1 ) 1 0 (A1 B1 )B1 1
show that the product A1 B1 commutes with both A1 and B1 . The result follows by
induction as in the proof of Theorem 5.5.
It remains to show that indeed λ = 0. We will assume that λ 6= 0 and derive a contra-
diction. First note that (AB)2 = ABAB = A2 B 2 , and inductively (AB)k = Ak B k for
all k ≥ 1. Suppose that 0 6= x ∈ X , then ABx = λx 6= 0. Thus, for all k ≥ 1,
Ak B k x = (AB)k x = λk x 6= 0,
but
Ak+1 B k x = A(AB)k x = (AB)k Ax = 0,
which means that
Bkx ∈
/ ker(Ak ), but B k x ∈ ker(Ak+1 ).
Consequently, for all k ≥ 1 we have the strict inclusion ker(Ak ) ⊂ ker(Ak+1 ). This
however is impossible since the space is finite-dimensional, and therefore there must be
some integer k ≥ 1 with ker(Ak ) = ker(Ak+1 ).
The following example shows that for simultaneous triangularizability it is in general not
sufficient that the product AB commutes with only one of the two matrices A and B.
70
Example 5.10. For the matrices
0 0 0 0 1 0
A = 0 0 0 and B = 0 0 1
1 0 0 0 0 0
we have
0 0 0
AC = 0 and CB = 0 −1 0 6= 0.
0 0 1
Both A and B have only the eigenvalue 0 with the corresponding eigenspaces given by
V0 (A) = span{e2 , e3 } and V0 (B) = span{e1 }. Since A and B do not have a common
eigenvector, they are not simultaneously triangularizable.
If the products AB and BA are not equal, they may still be linearly dependent, i.e.,
AB = ωBA for some ω ∈ K. Drazin called such matrices projectively commuting [10],
and he showed the following result (where the case ω = 1 is included).
Theorem 5.11. Let K be algebraically closed, and suppose that A, B ∈ K n,n satisfy
AB = ωBA for some ω ∈ K. Then exactly one of the following holds:
(2) There exists an integer r with 0 ≤ r ≤ n − 2, such that A and B are simultaneously
similar to matrices of the form
S X T Y
and , (5.2)
0 Ar 0 Br
Ap0 (B)x = p0 (ωB)Ax = 0 and Bp0 (B)x = p(B)x + µp0 (B)x = µp0 (B)x,
71
which shows that A and B have the common eigenvector x1 := p0 (B)x. As in the proof
of Theorem 5.5 we define S1 := [x1 , x2 , . . . , xn ] ∈ GLn (K) and obtain
0 ∗ µ ∗
AS1 = S1 and BS1 = S1
0 A1 0 B1
for some matrices A1 , B1 ∈ K n−1,n−1 . We have
0 ∗ −1 0 ∗
AB = S1 S = ωBA = S1 S −1 ,
0 A1 B1 1 0 ωB1 A1 1
and hence A1 B1 = ωB1 A1 . We can repeat the process inductively if at least one of the
matrices A1 and B1 is singular (cf. also the proof of Theorem 5.5).
The process terminates either with a simultaneous triangularization of A and B, or when
we arrive at matrices Ar , Br ∈ K n−r,n−r with 0 ≤ r ≤ n−2 that are both nonsingular, and
in this case we obtain a simultaneous similarity of A and B to matrices as in (5.2).
Note that if the projectively commuting matrices A and B are both nonsingular, then
they are not simultaneously triangularizable, and we are in case (2) of Theorem 5.11 with
r = 0. If, however, the projectively commuting matrices A and B are simultaneously
triangularizable, then the proof of Theorem 5.11 shows that both products AB and BA
are nilpotent, and the union of the spectra of A and B contains at least n eigenvalues
equal to zero.
Apart from the usual commutativity (ω = 1), the most important class of projectively
commuting matrices are the anti-commuting matrices 2 , which satisfy AB = −BA, i.e.,
ω = −1 in Theorem 5.11. An example of three nonsingular real 4×4 anti-commuting ma-
trices is given in the context of the quaternions in Example 2.16. The size of these three
matrices is even and they have trace zero, which is true in general for anti-commuting
matrices.
Lemma 5.12. If K is algebraically closed and A, B ∈ GLn (K) are anti-commuting, then
n must be even, and trace(A) = trace(B) = 0.
Proof. Applying the determinant to both sides of the equation AB = −BA yields
det(A) det(B) = (−1)n det(A) det(B).
Since det(A) det(B) 6= 0, we obtain 1 = (−1)n , which implies that n is even.
Since B is nonsingular, AB = −BA implies that A = −BAB −1 , and hence trace(A) =
−trace(A), which gives trace(A) = 0. In the same way we obtain trace(B) = 0 from the
nonsingularity of A.
2
Such matrices have been studied already by Cayley, who called them skew convertible in 1858 [7,
p. 30].
72
Exercise. Is the assumption that K is algebraically closed necessary in Lemma 5.12,
or can the assertions be shown for other fields as well?
(1) (2)
Proof. Suppose that A = SR1 S −1 and B = SR2 S −1 , where R1 = [rij ] and R2 = [rij ]
(2)
are upper triangular. Since B is nilpotent we must have rii = 0 for i = 1, . . . , n, and
therefore
n
Y n
Y
(1) (2) (1)
det(A + B) = det(R1 + R2 ) = (rii + rii ) = rii = det(A).
i=1 i=1
73
Frobenius proved a variant of this result in [20, §3 Satz VII] and repeated the proof
almost 20 years later in [21, §2 Satz VII3 ]: Suppose that A, B ∈ K n,n commute, A is
invertible, and B is nilpotent. In this case AB = BA can be written as A−1 B = BA−1 ,
which by induction yields (A−1 B)n = A−n B n = 0, and hence A−1 B is nilpotent. Thus,
PA−1 B = det(tIn − A−1 B) = tn . For t = −1 we get
(−1)n = det(−In − A−1 B) = det(−A−1 (A + B)) = (−1)n det(A−1 ) det(A + B),
and using det(A−1 ) = det(A)−1 gives det(A + B) = det(A).
A simple, yet important corollary of Theorem 5.5 is the following spectral mapping the-
orem for commuting matrices.
Corollary 5.14 (Frobenius Pm [21], Satz III4 ). Let K be algebraically closed, let A, B ∈ K n,n
i j
commute, and let p = i,j=0 γij s t with γij ∈ K for i, j = 0, 1, . . . , m be a polynomial
in the two (commuting) variables s, t. Then the eigenvalues of p(A, B) are of the form
p(α1 , β1 ), . . . , p(αn , βn ), where α1 , . . . , αn and β1 , . . . , βn are the eigenvalues of A and B,
respectively, in some appropriate ordering.
The proof of Corollary 5.14 shows that in order to guarantee that the eigenvalues of
p(A, B) have the form p(αi , βi ) it is sufficient that A and B are simultaneously trian-
gularizable. The fact that these two properties are equivalent, where in general we have
to consider polynomials in non-commuting variables, is part of an important theorem of
McCoy5 that we will prove in Section 6.2 (see Theorem 6.18).
3
After the proof of [20, §3 Satz VII] Frobenius pointed out: “The main progress in the theory of
forms that Weierstrass made beyond Cauchy and Jacobi is that he taught how to further decompose
forms of which a power vanishes, or more generally, whose characteristic equation has only one root,
except when the smallest power that vanishes is the nth.” (My translation.)
4
Frobenius stated his variant of Corollary 5.14 for “arbitrary functions” of the given commuting
matrices, and as one of his main theorems in the Introduction of his article from 1896 [21]. He mentioned
that he knew the result already when he wrote his article [20] from 1878. In that article he stated
without proof the assertion of Corollary 5.14 for the polynomial p = st, and hence the eigenvalues of
the product AB [20, §7 Satz XII]. According to Drazin [11, p. 222], that was the “first significant result
on commutative matrices”.
5
Neal Henry McCoy (1905–2001)
74
Exercise. Use Corollary 5.7 to extend Corollary 5.14 to an arbitrary (finite) number
of mutually commuting matrices A1 , . . . , Ak ∈ K n,n .
Corollary 5.14 particularly shows that for two commuting matrices A, B the eigenvalues
of A + B are of the form α1 + β1 , . . . , αn + βn . Hence if σ(A) ∩ σ(−B) = ∅, then all
eigenvalues of A + B are nonzero and A + B ∈ GLn (K). Moreover, if AB = BA, then
Taking A = B = αIn , we see that equality may occur. The example of the commuting
2 × 2 real matrices
1 0 −1 0
A= , B= , where σ(A + B) = {0} ⊂ σ(A) + σ(B) = {−2, 0, 2}
0 −1 0 1
Exercise. Show that if A, B ∈ Cn,n commute, then the spectral radius satisfies ρ(A +
B) ≤ ρ(A) + ρ(B) and ρ(AB) ≤ ρ(A)ρ(B). Do these inequalities also hold for the
numerical radius?
We next address the question how many linearly independent and mutually commuting
matrices exist in K n,n , which was already asked by Frobenius in the introduction of his
article [21]. Clearly, if A ∈ K n,n , then all matrices in the sequence
In , A, A2 , . . .
75
If `1 6= k2 and k1 6= `2 , then
which shows that Ek1 `1 and Ek2 `2 commute. Consequently, the matrices in the set
n jnk jnk o
Sn := Ek` : 1 ≤ k ≤ , + 1 ≤ ` ≤ n ∪ {In },
2 2
where bxc := max {m ∈ Z : m ≤ x} for all x ∈ R, are linearly independent and mutually
commuting. We have
jnk j n k jnk j n k n2
|Sn | = 1 + · n− +1 +1 =1+ · n− = + 1.
2 2 2 2 4
In particular,
S1 = {I1 } , S2 = {I2 , E12 } , S3 = {I3 , E12 , E13 } , S4 = {I4 , E13 , E14 , E23 , E24 } .
The following result of Schur [66] shows that if K is algebraically closed, then Sn forms
a maximal set of linearly independent and mutually commuting matrices in K n,n . The
proof we give is due to Mirzakhani6 [56].
j 2k
Proof. We have already seen above that there exist at least n4 +1 linearly independent
and mutually commuting matrices in K n,n . We will show that this is the maximal number
by induction on n. The case n = 1 is trivial.
Suppose that the assertion holds for matrices of order n − 1 for some n ≥ 2, and let
2
n,n n
A1 , . . . , Ak ∈ K , k := + 2,
4
76
A1 , . . . , Ak are simultaneously triangularizable (cf. Corollary 5.7). Hence we may assume
without loss of generality that
∗ ∗
Aj = , j = 1, . . . , k,
0 Rj
(n − 1)2
m := dim(U) ≤ + 1 < k.
4
77
An analogous construction starting with
ej ∗
R
Aj = , j = 1, . . . , k,
0 ∗
A
e = [ebm+1
e , . . . , ebk ] ∈ K n,k−m
e
In Theorem 5.15 we have assumed that K is algebraically closed. The example with
the set Sn above Theorem 5.15, however, didj notk make any assumptions on the field K.
n2
Thus, for any field K there exist at least 4 + 1 linear independent and mutually
78
commuting matrices. Jacobson7 [43] showed that this is indeed the maximal number for
(almost) arbitrary fields. Another set for which this maximum is achieved is shown in
the following exercise.
7
Nathan Jacobson (1910–1999)
79
Chapter 6
In Chapter 5 we have shown that commuting matrices over an algebraically closed field
are simultaneously triangularizable (Theorem 5.5). The converse of this result is however
not true, since there exist simultaneously triangularizable matrices that do not commute.
We have also shown that for two commuting matrices A, B with respective eigenvalues
αi , βi , the eigenvalues of the polynomial p(A, B) are of the form p(αi , βi ) (Corollary 5.14).
Again, the converse is not true, which motivated the following comment of Drazin on
the commutativity property of matrices [11, p. 222]:
“Indeed, no non-trivial result for general commutative matrices has yet been found which
is not true under conditions less stringent than commutativity; thus commuting is, as
such, hardly a fundamental property, and the problem of interest is to find useful gener-
alizations of it.”
A few years later, Olga Taussky expressed the same point of view [78, p. 232]:
“Apparently there are none or at most very few theorems known which say: “such and
such a statement is true if and only if AB = BA”; it is always “if AB = BA then ...”.
Hence we are tempted to replace AB = BA in various ways by weaker hypotheses.”
In this chapter we will study “useful generalizations” (Drazin) or “weaker hypetheses”
(Taussky) that lead to equivalent characterizations of the simultaneous triangularization
property; cf. our discussion of Corollary 5.14. It will turn out that this characterization
involves the additive commutator AB − BA. We will analyze additive commutators in
more detail (Section 6.1), and then prove McCoy’s important characterization theorem.
On our way to this theorem we will also derive further sufficient (but not necessary) con-
ditions for commutativity or simultaneous triangularizability of matrices (Section 6.2).
Finally, we will consider the multiplicative commutator ABA−1 B −1 of two invertible
80
matrices (Section 6.3).
i.e., any additive commutator of two square matrices has trace zero.
Exercise. Show that the trace zero matrices of K n,n with the additive commutator
form a Lie algebra.
For our further analysis we need the following theorem from Linear Algebra.
Theorem 6.1. Let K be any field and A ∈ K n,n . Then there exist uniquely determined
(non-constant) monic polynomials p1 , . . . , pk ∈ K[t] with pi |pi+1 for i = 1, . . . , k − 1 and
pk = MA , such that
for some S ∈ GLn (K). Here Cpi is the companion matrix of the polynomial pi (see
(A.1)), the polynomials p1 , . . . , pk are called the invariant factors of A, and the matrix F
is called the Frobenius normal form or rational canonical form of A.
Proof. We prove the result only for nonderogatory matrices A ∈ K n,n . Let MA = PA =
tn + αn−1 tn−1 + · · · + α0 . We know that exists a cyclic vector v ∈ K n for A such that the
vectors
v, Av, . . . , An−1 v
81
are linearly independent. Now
and hence with S := [v, Av, . . . , An−1 v] ∈ GLn (K) we obtain AS = SCMA , which we
needed to show.
Lemma 6.2. If K is any field and A ∈ K n,n has trace(A) = 0, then A is similar to a
matrix that has only zero diagonal elements.
Proof. First recall that the trace is invariant under similarity transformations of a matrix.
After a suitable reordering, which is a similarity transformation, we can therefore assume
without loss of generality that
A11 A12
A= ,
A21 A22
where A11 ∈ K j1 ,j1 has only nonzero diagonal elements and A22 ∈ K j2 ,j2 has only zero
diagonal elements. If j1 = 0, we are done.
Thus, we can assume that j1 ≥ 1. We cannot have j1 = 1, since then trace(A) =
trace(A11 ) 6= 0 contrary to our assumption. Hence we must have j1 ≥ 2. Let S1 ∈
GLj1 (K) be a matrix such that S1−1 A11 S1 is in Frobenius normal form. Then the matrix
−1 −1
S1 A11 S1 S1−1 A12
S1 0 S1 0
A =
0 In−j1 0 In−j1 A21 S1 A22
has more zero diagonal entries than A. We can reorder this matrix so that all nonzero
diagonal entries are again in the (1, 1)-block. We can then apply another similarity
transformation to the Frobenius normal form, which reduces the number of nonzero
digaonal elements. This process ends when the Frobenius normal form of the transformed
(1, 1)-block contains exactly one block. Since trace(A) = 0, the diagonal of this block,
which contains at most one nonzero element, must be zero.
Lemma 6.2 yields a important theorem of Shoda [74] about additive commutators.
Theorem 6.3. If the field K has characteristic zero, then each A ∈ K n,n with trace(A) =
0 is an additive commutator, i.e., A = [B, C] for some matrices B, C ∈ K n,n .
1
Kenjiro Shoda (1902–1977)
82
Proof. The matrix A is an additive commutator if and only if S −1 AS is an additive
commutator for any S ∈ GLn (K). Thus, we can use Shoda’s Lemma 6.2 and as-
sume without loss of generality that A = [aij ] with aii = 0 for i = 1, . . . , n. Let
B = diag(b1 , . . . , bn ) ∈ K n,n be any matrix with bi 6= bj for i 6= j. Since the field has
characteristic zero, the differences bi − bj for i 6= j are invertible and we can define
( a
ij
, i 6= j,
cij := bi −bj
arbitrary, i = j.
Shoda’s Theorem 6.3 was extended to arbitrary fields by Albert and Muckenhoupt [1].
The next result was shown independently and almost simultaneously by Laffey [49] and
Guralnick [28]. The proof given here follows the later work in [8].
Proof. The statement is obvious for n = 1, so we can assume that n ≥ 2. Let C = [A, B].
If rank(C) = 0, then A and B are simultaneously triangularizable by Theorem 5.5.
Thus, let us assume that rank(C) = 1. We will first show that A and B have a common
eigenvector.
Let λ ∈ K be an eigenvalue of A. Then Vλ (A) = ker(A − λIn ) and im(A − λIn ) are both
non-trivial invariant subspaces under A, i.e., both invariant subspaces have a dimension
between 1 and n − 1. If Vλ (A) is invariant under B, then B has an eigenvector in Vλ (A)
by Lemma 5.1, and this vector simultaneously is an eigenvector of A.
Now suppose that Vλ (A) is not invariant under B. Then there exists a nonzero vector
x ∈ Vλ (A) such that Bx ∈
/ Vλ (A), i.e., (A − λIn )Bx 6= 0. Moreover, since rank(C) = 1
we have
im(C) = span{y}
for some nonzero vector y ∈ K n . We therefore obtain
for some nonzero α ∈ K. In particular, this shows that y ∈ im(A − λIn ). For any vector
z ∈ K n there exists some µz ∈ K with
83
and therefore
Consequently, A and B have the common invariant subspace U (1) = im(A − λI), where
1 ≤ dim(U (1) ) ≤ n − 1.
Let the columns of U (1) ∈ K n,k1 form a basis of U (1) , where k1 := dim(U (1) ). Then
(1) (1)
AU (1) = U (1) ZA and BU (1) = U (1) ZB
(1) (1)
for some matrices ZA , ZB ∈ K k1 ,k1 .
If k1 = 1, then U (1) is a common eigenvector of A and B. Otherwise we consider the
equation
(1) (1)
CU (1) = U (1) [ZA , ZB ].
The matrix on the right hand side has rank at most 1, and since U (1) has full rank, we
must have
(1) (1)
rank([ZA , ZB ]) ≤ 1.
(1) (1)
We can now apply the same reasoning as above to the matrices ZA and ZB and continue
recursively. This reduces the dimension of the matrices by at least 1 in every step. After
m ≤ n − 1 steps we obtain
(m) (m)
AU = U ZA and BU = U ZB
(m) (m)
with U = U (1) · · · U (m) of full rank, where ZA and ZB must have a common eigenvec-
tor. It is easy to see that this vector yields a common eigenvector of A and B.
Since A and B have a common eigenvector, we obtain a decomposition of the form (5.1).
This decomposition yields
0 ∗
C = S1 S −1 .
0 [A1 , B1 ] 1
Here rank([A1 , B1 ]) ≤ 1, and therefore the simultaneous triangularization of A and B
follows inductively as in the proof of Theorem 5.5.
As shown above, each additive commutator [A, B] has trace zero, and hence the sum of
its eigenvalues is zero. But this does in general not mean that all the eigenvalues are
zero, or that [A, B] is nilpotent.
84
Here [A, B] is invertible, although neither A nor B is. Also note that A and B have
only the eigenvalue 0 with the corresponding eigenspaces given by V0 (A) = span{e2 } and
V0 (B) = span{e1 }. Since A and B do not have a common eigenvector, they are not
simultaneously triangularizable.
Lemma 6.6. Let K be any field and let A, B ∈ K n,n . If A and B have a common
eigenvector, then [A, B] is not invertible. Moreover, if A and B are simultaneously
triangularizable, then [A, B] is nilpotent.
Example 6.16 below shows that the converse of both assertions in this lemma do in
general not hold.
The next result, which is sometimes called Jacobson’s Lemma (see [42, Lemma 2]), gives
another sufficient condition for nilpotency of the additive commutator. A general version
of this lemma for linear operators in Banach spaces was proven by Kleinecke [48]. Here
we give a proof using only matrix theory, which is extracted from the proof of [12,
Theorem 2]. Apart from being interesting in its own right, the lemma will be very useful
for proving several results below.
Lemma 6.7. Let K be algebraically closed with characteristic zero or (prime) charac-
teristic p > n, let A, B ∈ K n,n , and C = [A, B]. If AC = CA or BC = CB, then C is
nilpotent.
C k = C k−1 (AB − BA) = A(C k−1 B) − (C k−1 B)A = [A, C k−1 B].
85
If BC = CB, then for any k ≥ 1 we obtain
This holds for all k ≥ 1, and hence we can take the first m of these equations and form
the linear algebraic system
1 1 ··· 1 λ1 d1 0
λ1 λ2 · · · λm λ2 d2 0
.. .. = .. .
.. ..
. . . . .
m−1 m−1 m−1
λ1 λ2 · · · λm λm dm 0
Since λ1 , . . . , λm are pairwise distinct, the (Vandermonde) matrix on the left hand side
is invertible, giving λi di = 0 for i = 1, . . . , m. Since the field has characteristic zero or
p > n, we must have λi = 0 for i = 1, . . . , m, which however contradicts our assumption
that λ1 , . . . , λm are pairwise distinct. Consequently, C can have only one eigenvalue of
algebraic multiplicity n, and, as shown above, this is the eigenvalue λ = 0, so that C is
indeed nilpotent.
A direct consequence of Lemma 6.7 is the next result of Kato2 and Olga Taussky from
1956 [47, Theorem 2]. Two years earlier, Putnam had already established a more general
version of this result for linear operators on a Hilbert space [62].
Corollary 6.8. If A ∈ Cn,n commutes with [A, AH ], then A is normal. (Note that the
converse holds trivially.)
Proof. The field C is algebraically closed and has characteristic zero. Since A commutes
with [A, AH ], we know that [A, AH ] is nilpotent by Lemma 6.7. But [A, AH ] is also
Hermitian, and thus unitarily diagonalizable by Corollary 1.3. Hence [A, AH ] must be
the zero matrix, which means that A is normal.
Exercise. Prove the following generalization of Corollary 6.8, originally due to Olga
Taussky [79, Theorem 3]: If A, B ∈ Cn,n are such that AB and BA are Hermitian and
A commutes with [A, B], then [A, B] = 0.
2
Tosio Kato (1917–1999) published more than 160 articles and 6 monographs in English, among
them his famous Perturbation Theory of Linear Operators [45, 46]. His first article in English appeared
in 1941, and before publishing the article [47] with Olga Taussky in 1956, all his publications were
singly-authored. Thus, Olga Taussky was Kato’s first co-author.
86
Starting with C1 := [A, AH ], we can form C2 := [A, C1 ], C3 := [A, C2 ], and so on.
Corollary 6.8 then says that if C1 6= 0 (i.e., A is not normal), then C2 6= 0 (i.e., A and
C1 do not commute). But, somewhat surprisingly, C1 6= 0 and C2 6= 0 do in general not
imply that C3 6= 0, as shown by the following example of (real) matrices due to Olga
Taussky [78]:
1 1 1 0 0 −2
A= , C1 = 6= 0, C2 = 6= 0, but C3 = 0.
0 1 0 −1 0 0
Lemma 6.7 also is important in the proof of the following result of Shapiro [71] (cf. also
the proof of Theorem 5.9).
Theorem 6.9. Let K be algebraically closed with characteristic zero or (prime) charac-
teristic p > n. If A, B ∈ K n,n are such that [A, B] = p(A) for some p ∈ K[t], then A
and B are simultaneously triangularizable.
Proof. Let A, B ∈ K n,n be two matrices with C = [A, B] = p(A). Since A and C
commute, we know that C is nilpotent (Lemma 6.7), and that A and C have a common
eigenvector (Lemma 5.3), say
Ay = λy and Cy = 0
for some nonzero y ∈ K n . We define
X := {x ∈ K n : Ax = λx and Cx = 0},
which is a subspace of K n with dimension at least one. Note that for each polynomial
q ∈ K[t] and vector x ∈ X we have q(A)x = q(λ)x. Since K has characteristic zero and
we can choose a nonzero vector x, the equation 0 = Cx = p(A)x = p(λ)x then shows
that p(λ) = 0.
For each x ∈ X we have
A(Bx) = (C + BA)x = BAx = λ(Bx),
and inductively we obtain q(A)Bx = q(λ)Bx for every polynomial q ∈ K[t]. In particu-
lar,
C(Bx) = p(A)Bx = p(λ)Bx = 0,
so that X is invariant under B. Consequently, A and B have a common eigenvector in
X , and we obtain a decomposition of the form (5.1). Writing C1 := A1 B1 − B1 A1 , a
straightforward computation shows that
0 ∗ −1 0 ∗
C = AB − BA = S S = p(A) = S S −1 .
0 C1 0 p(A1 )
Since C1 = p(A1 ), the result follows inductively as in Theorem 5.5.
87
Theorem 6.9 was also shown by Bourgeois [6], and Ikramov [41] derived a variant of the
result using Shemesh’s necessary and sufficient condition for the existence of a common
eigenvector of two matrices (Theorem 5.4). Combining Theorem 6.9 and Theorem 4.5
shows that if K is algebraically closed with characteristic zero, and A, B ∈ K n,n are such
that A is nonderogatory and commutes with [A, B], then A and B are simultaneously
triangularizable.
Exercise. Investigate whether the assumption in Theorem 6.9 that K has character-
istic zero can be replaced by K having (prime) characteristic p > n.
The next result of Shapiro [70] relates the commutativity of two matrices to the commu-
tativity with the additive commutator.
Theorem 6.10. Let K be algebraically closed and let A, B ∈ K n,n . If A is diagonalizable
and commutes with [A, B], then [A, B] = 0.
Proof. Since the field is algebraically closed and A and C = [A, B] commute, there exists
a simultaneous triangularization by Theorem 5.5, i.e., A = SR1 S −1 and C = SR2 S −1
with upper triangular matrices R1 , R2 ∈ K n,n . Since A is diagonalizable, we can choose
S so that R1 is diagonal. Furthermore, we can assume without loss of generality that
R1 = diag(λ1 Id1 , . . . , λk Idk ), where λi 6= λj for i 6= j.
Using the simultaneous triangularization, the equation AC = CA yields R1 R2 = R2 R1 .
Let us write R2 = [Rij ] with diagonal blocks Rii ∈ K di ,di for i = 1, . . . , k. Then the
(i, j)-block of the equation R1 R2 = R2 R1 can be written as
This is a Sylvester equation for the matrix Rij . For i 6= j we have λi 6= λj , and
thus Theorem 4.1 implies that Rij = 0 is the unique solution. Consequently, R2 =
diag(R11 , . . . , Rkk ).
We can now write the equation C = SR2 S −1 in the equivalent form
b − BR
R1 B b 1 = R2 , b := S −1 BS.
where B
Let Bb = [Bij ] with diagonal blocks Bii ∈ K di ,di for i = 1, . . . , k. Then the (i, j)-block of
the last equation is given by
88
This is a Sylvester equation for the matrix Bij . For i 6= j we have Rij = 0 and λi 6=
λj , and thus Bij = 0 is the unique solution by Theorem 4.1. This means that B b =
diag(B11 , . . . , Bkk ), and we get
AB = (SR1 S −1 ) (S BS
b −1 )
= S diag(λ1 Id1 , . . . , λk Idk ) diag(B11 , . . . , Bkk ) S −1
= S diag(B11 , . . . , Bkk ) diag(λ1 Id1 , . . . , λk Idk ) S −1
= (Sdiag(B11 , . . . , Bkk )S −1 ) (Sdiag(λ1 Id1 , . . . , λk Idk )S −1 )
= BA,
A special case of Theorem 6.10, which was first shown by Putnam [61], is the following:
If A, B ∈ Cn,n are such that A is normal and commutes with [A, B], then [A, B] = 0.
The following example gives some more insight into the necessity of the diagonalizability
assumption on A in Theorem 6.10.
Example 6.11. For the matrices
1 1 0 0
A= and B=
0 1 0 1
we have
0 1
C = [A, B] = , and AC = CA, but [A, B] 6= 0.
0 0
The matrix A is not diagonalizable, and the conclusion of Theorem 6.10 does not hold.
We will now have a brief look at the additive commutator of two linear operators on
a Hilbert space V . This case is important in the theory of quantum mechanics, where
Heisenberg’s relation 3 can be written as [P, Q] = αidV for some scalar α 6= 0, and P, Q
are self-adjoint quantum-mechanical operators4 .
For finite matrices P, Q such a relation cannot hold because the trace of their additive
commutator is zero. Wintner in 1947 [88] and Wielandt5 in 1949 [87] showed that the
Heisenberg relation is also impossible when the two linear operators are bounded.
Theorem 6.12. If P, Q are two bounded linear operators on a Hilbert space V with
[P, Q] = αidV , then α = 0.
3
Werner Heisenberg (1901–1976)
4
Kato and Olga Taussky remarked about this in 1956 [47, p. 38]: “The non-vanishing of the commu-
tator of canonically conjugate operators is the root of the Heisenberg uncertainty principle. One infers
that commutability is not “purely mathematical”, but of metamathematical interest.”
5
Helmut Wielandt (1910–2001)
89
Proof. Wintner’s proof: Let [P, Q] = αidV . If we define Pλ := P + λidV , then
Since Pλ satisfies the same relation as P for any scalar λ, we can assume without loss
of generality that P is invertible. Then QP = P −1 (P Q)P , which implies that σ(QP ) =
σ(P Q), and thus
P 2 Q − QP 2 = P 2 Q − P QP + P QP − QP 2 = P (P Q − QP ) + (P Q − QP )P = 2αP,
0 = P k Q − QP k = αkP k−1 ,
which yields |α| ≤ 2kP kkQk/n. This holds for all n ≥ 1, which implies α = 0.
A consequence of this result is that linear operators P, Q with [P, Q] = αidV and α 6= 0
must be unbounded6 .
Example 6.13. This simple illustrating example is taken from [30, p. 128]. Let V =
L2 (−∞, ∞) and define the (unbounded) linear operators P, Q by
[P, Q]f (x) = P (xf (x)) − Qf 0 (x) = f (x) + xf 0 (x) − xf 0 (x) = f (x),
90
6.2 The theorems of McCoy
In this section we will prove two important theorems of McCoy that characterize the
simultaneously triangularization property of matrices. It is clear that commutativity is
sufficient but not necessary for the simultaneous triangularization, and hence a different
property must be found. In a first step in this direction, McCoy introduced the following
definition in 1934 [53].
Definition 6.14. Let K be a field, A, B ∈ K n,n and C = [A, B]. If AC = CA and
BC = CB, then A, B are called quasi-commuting.
91
and hence Bx ∈ X . Thus, X is invariant under B, and Lemma 5.1 implies that B has
an eigenvector in X , which simultaneously is an eigenvector of A. We therefore obtain a
decomposition of the form (5.1), and a straightforward computation shows that
0 ∗
C=S S −1 , where C1 := [A1 , B1 ].
0 C1
A1 C1 = C1 A1 and B1 C1 = C1 B1 .
Since A1 , B1 are quasi-commuting, the result follows inductively as in the proof of The-
orem 5.5.
since the (upper triangular) additive commutator [RA , RB ] is nilpotent and hence has a
zero diagonal. What we have derived can be considered a (significant) generalization of
the usual rule ea eb = ea+b for the scalar exponential function.
The following example shows that for simultaneous triangularizability it is in general not
sufficient that the additive commutator commutes with only one of the two matrices A
and B.
Example 6.16. For the matrices
0 1 0 0 0 0
A = 0 0 0 and B = 1 0 0
0 0 0 0 1 0
we have
0 1 0
C = AB − BA = 0 0 −1 , AC = CA, and BC 6= CB.
0 0 0
92
The additive commutator is nilpotent, as guaranteed by Lemma 6.7. Both A and B have
only the eigenvalue 0 with the corresponding eigenspaces given by V0 (A) = span{e1 , e2 }
and V0 (B) = span{e3 }. Since A and B do not have a common eigenvector, they are not
simultaneously triangularizable.
Two years after his article [53], McCoy was able to give necessary and sufficient conditions
for the simultaneous triangularization property. McCoy’s original proof of Theorem 6.18
below uses abstract algebraic structures. The proof we give uses only matrix theory and
is based on the next lemma of Drazin, Dungey and Gruenberg [12]. This lemma and the
following developments use polynomials p(t1 , . . . , tm ) in the non-commuting variables
t1 , . . . , tm . Such polynomials are linear combinations of words in the variables. For
example, if m = 2, then a word in t1 and t2 is an expression of the form
k `
tk11 t`21 tk12 t`22 · · · t1j t2j ,
where the ki , `i are nonnegative integers, whose sum is called the degree of the word. It
is not allowed to change the order of the monomials in such words, so for example t1 t2 6=
t2 t1 . Of course, if we evaluate such a polynomial p(t1 , . . . , tm ) at mutually commuting
matrices A1 , . . . , Am , we may again commute the matrices.
Lemma 6.17. Let K be algebraically closed and let A1 , . . . , Am ∈ K n,n . Suppose that
for every polynomial p(t1 , . . . , tm ) in the non-commuting variables t1 , . . . , tm each of the
matrices p(A1 , . . . , Am )[Ak , A` ] for k, ` = 1, . . . , m is nilpotent. Then for every x ∈
K n \{0} the matrices A1 , . . . , Am have a common eigenvector of the form q(A1 , . . . , Am )x
for some polynomial q(t1 , . . . , tm ).
93
Now suppose that the result holds for m − 1 and some m ≥ 2. Let x ∈ K n \ {0} be
arbitrary, and let A1 , . . . , Am ∈ K n,n be given, where for every polynomial p(t1 , . . . , tm )
in the non-commuting variables t1 , . . . , tm each of the matrices
p(A1 , . . . , Am )[Ak , A` ], k, ` = 1, . . . , m,
is nilpotent. We write Ck` = [Ak , A` ]. By the induction hypothesis there exists a common
eigenvector of A1 , . . . , Am−1 of the form y1 = q(A1 , . . . , Am−1 )x. We distinguish two cases:
Case 1: For every polynomial p ∈ K[t] and i = 1, . . . , m − 1 we have Cim p(Am )y1 = 0,
or equivalently
(Ai Am ) p(Am )y1 = (Am Ai ) p(Am )y1 . (6.1)
Then, in particular,
We claim that then Ai p(Am )y1 = p(Am )Ai y1 for every p ∈ K[t]. Indeed, suppose that
Ai Akm y1 = Akm Ai y1 for some k ≥ 1. Then (6.1) yields
Ai Ak+1 k k k+1
m y1 = (Ai Am )Am y1 = (Am Ai )Am y1 = Am Ai y1 .
From the base case of the induction we know that there exits a (nonzero) polynomial
q1 ∈ K[t] such that the (nonzero) vector q1 (Am )y1 is an eigenvector of Am . Recall that
y1 is an eigenvector of Ai for each i = 1, . . . , m − 1. Thus, for each i = 1, . . . , m − 1 there
exists some λi ∈ K such that
which shows that q1 (Am )y1 = q1 (Am )q(A1 , . . . , Am−1 )x =: qe(A1 , . . . , Am )x is a common
eigenvector of A1 , . . . , Am .
Case 2: There exists a matrix C1 := Cim for some i ∈ {1, . . . , m − 1} and a polynomial
p1 ∈ K[t] with C1 p1 (Am )y1 6= 0. Then, by the induction hypothesis, there exists a
common eigenvector of A1 , . . . , Am−1 of the form
If we have Cim p(Am )y2 = 0 for every polynomial p ∈ K[t] and i = 1, . . . , m − 1, we can
proceed with y2 as in Case 1 in order to construct a common eigenvector of A1 , . . . , Am
which has the required form.
Otherwise, there exists a matrix C2 := Cim for some i ∈ {1, . . . , m − 1} and a polynomial
p2 ∈ K[t] with C2 p2 (Am )y2 6= 0. By the induction hypothesis there now exists a common
eigenvector of A1 , . . . , Am−1 of the form
94
Proceeding in this way we can construct a sequence of vectors
This sequence terminates at some step if and only if yk+1 is a common eigenvector of
A1 , . . . , Am−1 that satisfies Cim p(Am )yk+1 = 0 for all i = 1, . . . , m − 1 and p ∈ K[t].
If the sequence terminates with such a vector, we can use the same argument as in
Case 1 in order to construct a common eigenvector of A1 , . . . , Am which has the required
form. We will now show (by contradiction) that the sequence must terminate with such
a vector.
Suppose that the sequence does not terminate. Since the space K^n is n-dimensional, the
vectors y1, . . . , yn+1 are linearly dependent, and hence

∑_{j=1}^{n+1} µj yj = 0,

where µ1, . . . , µn+1 ∈ K are not all equal to zero. If k ≥ 1 is the smallest integer such
that µk ≠ 0, then we must have k ≤ n and can write

−µk yk = ∑_{j=k+1}^{n+1} µj yj.
By construction, each yj with j ≥ k + 1 contains the factor Ck pk(Am)yk on the right, so that

−µk yk = u(A1, . . . , Am) Ck pk(Am) yk

for some polynomial u(t1, . . . , tm). Multiplying from the left with pk(Am), this yields

(pk(Am) u(A1, . . . , Am) Ck) (pk(Am) yk) = −µk (pk(Am) yk),

i.e., pk(Am)yk ≠ 0 is an eigenvector of the matrix pk(Am) u(A1, . . . , Am) Ck correspond-
ing to the nonzero eigenvalue −µk. This contradicts our assumption that each matrix
p(A1, . . . , Am)Cim is nilpotent, and the proof is complete.
This lemma allows a very simple proof of the following theorem of McCoy, which he
considered “a perfection of Frobenius’ theorem” [54, p. 593]; cf. the discussion of Corol-
lary 5.14. The proof we give is due to Drazin, Dungey and Gruenberg [12].
Theorem 6.18. Let K be algebraically closed and let A1 , . . . , Am ∈ K n,n . Then the
following assertions are equivalent:
It is clear that for K = C we can write unitarily triangularizable in (1) of Theo-
rem 6.18. Moreover, in (3) the condition that p(A1, . . . , Am)[Ak, Aℓ] is nilpotent implies
that [Ak, Aℓ]p(A1, . . . , Am) is nilpotent and vice versa, since these matrices have the same
eigenvalues; cf. Lemma A.1.
For algebraically closed fields with characteristic zero the little theorem of McCoy (The-
orem 6.15) shows that the quasi-commutativity property
Since [A, B] is invertible, the condition of (3) in Theorem 6.18 fails, e.g., for the polynomial
p = 1. Thus, A and B are not simultaneously triangularizable. Also note that σ(A+B) =
{−1, 1}, while σ(A) + σ(B) = {0}.
Exercise. Show that if A1, . . . , Am in Theorem 6.18 are normal, then (1)–(3) are
equivalent to [Ai, Aj] = 0 for all i, j = 1, . . . , m.
Exercise. Use Theorem 6.18 to give an alternative proof of the following special
case of Theorem 5.9, originally due to Schneider [64]: If K is algebraically closed and
A, B ∈ K n,n are such that AB = 0, then A and B are simultaneously triangulariz-
able. (Hint: Consider any polynomial p in two (non-commuting) variables and form
(p(A, B)[A, B])2 .)
[Overview diagram relating the simultaneous triangularization results: C = 0; A diagonalizable and AC = CA; Thm. 5.5; Thm. 6.10; Thm. 6.4 & 6.9 & 6.15; Thm. 6.18 (cf. Cor. 5.14).]
is equal to zero.
Since [A, B] has trace zero, one needs to test

∑_{k=1}^{n^2−1} 2^k = 2^{n^2} − 2

matrices whether they have trace zero. Each of these computations requires O(n^3) multi-
plications, and thus a possible algorithm based on the condition above requires O(2^{n^2} n^3)
multiplications. Bourgeois implemented such a method and concluded “practically, one
cannot test a pair of numerical matrices of dimension greater than five” [6, p. 593].
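The following Python/NumPy sketch makes the count above concrete. It assumes (consistently with the sum above, but the precise criterion is the one stated in the text) that the test consists of checking trace(w(A, B)[A, B]) = 0 for every word w in A and B of degree at most n^2 − 1; the function name trace_test and the tolerance are ours.

import itertools
import numpy as np

def trace_test(A, B, tol=1e-10):
    # Brute-force word-trace test (a sketch): check whether
    # trace(w(A, B) [A, B]) = 0 for every word w in A and B of degree
    # at most n^2 - 1. The degree-0 word is covered automatically,
    # since trace([A, B]) = 0 for all A and B.
    n = A.shape[0]
    C = A @ B - B @ A                                      # the commutator [A, B]
    for k in range(1, n * n):                              # degrees 1, ..., n^2 - 1
        for word in itertools.product((A, B), repeat=k):   # 2^k words of degree k
            W = np.eye(n)
            for F in word:
                W = W @ F
            if abs(np.trace(W @ C)) > tol:
                return False
    return True

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]])
B = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 3.0]])
print(trace_test(A, B))   # True: A and B are already upper triangular

For n = 5 the loop already visits 2^25 − 2 ≈ 3.4 · 10^7 words, and for n = 6 about 6.9 · 10^10, which matches the practical limit quoted above.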
6.3 The multiplicative commutator
If G is any group and x, y ∈ G, then the element (x, y) := xyx−1 y −1 ∈ G is called the
group commutator of x and y. If e ∈ G is the unit element, then (x, y) = e holds if and
only if x and y commute. Moreover, (x, y)−1 = (y, x).
For the group GLn (K) and A, B ∈ GLn (K) we accordingly have
(A, B) := ABA−1 B −1 ,
and (A, B) = In if and only if [A, B] = 0. In the following we will call (A, B) the
multiplicative commutator of A and B. We immediately observe that the multiplicative
commutator
Theorem 6.20. If K is any field and A ∈ K n,n is diagonalizable with det(A) = 1, then
A is a multiplicative commutator, i.e., A = (B, C) for some B, C ∈ GLn (K).
DP^T = [µ1 en, µ2 e1, . . . , µn e_{n−1}],

which gives

(D^{−1}P)(DP^T) = diag(µn^{−1}µ1, µ1^{−1}µ2, . . . , µ_{n−1}^{−1}µn) = Λ.

Consequently,
In Theorem 6.20 we have made the rather strong assumption that A is diagonalizable.
As shown by Shoda [74, Satz 1], if K is algebraically closed, then every matrix A ∈ K n,n
with det(A) = 1 is a multiplicative commutator8 .
Theorem 6.20 yields the following corollary due to Ky Fan9 [16, Theorems 3 and 5].
For Hermitian matrices we have the following result, also due to Ky Fan [16, Theorem 6].
Theorem 6.22. Let A ∈ Cn,n be Hermitian with det(A) = 1. Then A can be written as
A = (B, C) for some Hermitian matrices B, C ∈ GLn (C) if and only if A and A−1 have
the same eigenvalues.
Proof. Suppose that A = A^H with det(A) = 1, and that A = (B, C) for some Hermitian
matrices B, C ∈ GLn(C). Then

A = A^H = (BCB^{−1}C^{−1})^H = C^{−1}B^{−1}CB = (C^{−1}B^{−1})(CB), while A^{−1} = CBC^{−1}B^{−1} = (CB)(C^{−1}B^{−1}).

Lemma A.1 shows that A and A^{−1} have the same eigenvalues.
On the other hand, suppose that A = A^H with det(A) = 1 is given, where A and A^{−1}
have the same eigenvalues. Since det(A) = 1 we can write

A = U Λ U^H, where Λ = diag(1, . . . , 1, λ1, λ1^{−1}, . . . , λk, λk^{−1}).
8 Olga Taussky wrote that she “adored this theorem” [81, p. 802]. Her first official PhD student
Robert C. Thompson (1931–1995) extended Shoda's result to more general fields in 1961 [82].
9
Ky Fan (1914–2010)
Now note that

diag(λj, λj^{−1}) = diag(λj, 1) [0 1; 1 0] diag(λj, 1)^{−1} [0 1; 1 0]^{−1},

where [0 1; 1 0] denotes the 2 × 2 matrix with rows (0, 1) and (1, 0), and where the matrices
diag(λj, 1) and [0 1; 1 0] are Hermitian (even real symmetric). From this
it is easy to see that there exist Hermitian matrices B, C ∈ GLn(C) with A = (B, C).
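A quick numerical check of the 2 × 2 identity used above (the value λj = 3 is an arbitrary choice for illustration):

import numpy as np

lam = 3.0
X = np.array([[lam, 0.0], [0.0, 1.0]])            # diag(lam, 1), real symmetric
Y = np.array([[0.0, 1.0], [1.0, 0.0]])            # the swap matrix, real symmetric
K = X @ Y @ np.linalg.inv(X) @ np.linalg.inv(Y)   # the multiplicative commutator (X, Y)
print(np.allclose(K, np.diag([lam, 1.0 / lam])))  # True: K = diag(lam, 1/lam)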
The next result of Marcus10 and Thompson [52] gives an interesting application of the
field of values in the study of multiplicative commutators (cf. also Theorem 6.10).
Proof. Since AC = CA and both A and C are normal, we can simultaneously unitarily
diagonalize these matrices by Theorem 5.8, i.e., A = U D1 U^H and C = U D2 U^H, where
U is unitary, D1 = diag(a1, . . . , an) and D2 = diag(c1, . . . , cn). The assumption
C = ABA^{−1}B^{−1} is equivalent to CBA = AB, and hence

D1 B̂ = D2 B̂ D1, where B̂ := U^H BU.

With B̂ = [bij] the last equation gives

ai bij = ci bij aj for i, j = 1, . . . , n.
In particular, for i = j we have ai bii = ci bii ai, and since ai ≠ 0, we get bii = ci bii. By
assumption 0 ∉ F(B) = F(B̂), and hence 0 ≠ ei^T B̂ ei = bii. Therefore ci = 1, giving
C = In, or AB = BA.
Exercise. Which essential properties in the proof of Theorem 6.23 do not hold when
we assume that A and C are just diagonalizable instead of normal?
The following special case of Theorem 6.23 was already established by Frobenius in
1911 [23, Satz V].
Corollary 6.24. If A, B ∈ Cn,n are unitary, σ(B) is contained in an arc less than a
semicircle, and the multiplicative commutator C = ABA−1 B −1 satisfies AC = CA, then
AB = BA.
10
Marvin David Marcus (1927–2016)
A unitary matrix with eigenvalues in an arc less than a semicircle is sometimes called a
cramped matrix.
Proof. Since A commutes with C = [A, B], we know that C is nilpotent by Lemma 6.7.
Moreover, AC = CA and BC = CB imply that CA−1 B −1 = A−1 B −1 C and thus by
induction (CA−1 B −1 )k = (A−1 B −1 )k C k for all k ≥ 0. In particular,
and hence
Note that the equalities of the different spectra in the proof of Lemma 6.25 show that
several other matrices are nilpotent as well. Putnam and Wintner [63] showed that the
assertion of Lemma 6.25 also holds when just one of A and B commutes with [A, B].
Appendix A
In this appendix we will recall some important definitions and facts from Linear Algebra,
which mostly can be found in [LM].
Let K be a field. Every field has at least two elements, 0 and 1. The characteristic of a
field is the smallest possible number n ∈ N such that n · 1 = 0. If no such n ∈ N exists,
the characteristic of K is zero. If the characteristic of a field is not zero but some n ∈ N,
then n must be a prime number, which can be seen as follows: If the characteristic n ∈ N
were of the form n = k · m with 1 < k, m < n, then
0 = n · 1 = (k · m) · 1 = (k · 1) · (m · 1).
Since K has no zero divisors, at least one of the numbers m · 1 and k · 1 on the right hand
side must be 0, but this contradicts the minimality assumption on n. Examples of fields
with characteristic zero are Q, R and C. An example of a field with (prime) characteristic
n ∈ N, and hence a finite field, is given in [LM, Example 2.29 and Exercise 3.13].
For a field K we denote by K[t] the ring of polynomials with coefficients in K. If every
non-constant polynomial in K[t] has at least one root in K, then the field K is called
algebraically closed. Equivalently, K is algebraically closed if and only if each p ∈ K[t]
with deg(p) = m ≥ 1 decomposes into linear factors over K, i.e.,
p = α(t − λ1 ) · · · (t − λm ),
for some α, λ1 , . . . , λm ∈ K. The classical example of an algebraically closed field is the
field of the complex numbers C; cf. the proof of the Fundamental Theorem of Algebra
in [LM, Section 15.2].
If K contains finitely many elements, say K = {k1 , . . . , kn }, we can form the polynomial
p = (t−k1 ) · · · (t−kn )+1 ∈ K[t]. This polynomial satisfies p(kj ) = 1 for all j = 1, . . . , n,
and hence does not have a root in K. Consequently, a finite field cannot be algebraically
closed.
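For instance, for the field with the two elements 0 and 1 the construction gives p = (t − 0)(t − 1) + 1 = t^2 + t + 1, and a two-line check (arithmetic modulo 2, written in Python only for illustration) confirms that p has no root:

# p = (t - 0)(t - 1) + 1 = t^2 + t + 1 over the field {0, 1}, arithmetic mod 2
for t in (0, 1):
    print(t, (t * t + t + 1) % 2)   # prints 1 for both field elements, so p has no root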
By K n,m we denote the set of n × m matrices over K, i.e., the matrices A = [aij ] with
aij ∈ K for i = 1, . . . , n and j = 1, . . . , m. For simplicity, the n × 1 matrices are denoted
by K^n instead of K^{n,1}. Each A ∈ K^{n,m} has an image and a kernel defined by

im(A) = {Ax : x ∈ K^m} ⊆ K^n and ker(A) = {x ∈ K^m : Ax = 0} ⊆ K^m,

respectively. The dimension formula for linear maps says that m = dim(im(A)) +
dim(ker(A)) [LM, Theorem 10.9].
The group of invertible or nonsingular n × n matrices is denoted by GLn (K) (see [LM,
Theorem 4.11]), and the identity (matrix) in GLn (K) is denoted by In = [δij ] ∈ K n,n ,
where δij denotes the Kronecker delta-function.
The transpose of A = [aij] ∈ K^{n,m} is the matrix A^T = [bij] ∈ K^{m,n} with bij = aji for all
i, j. The Hermitian transpose of A = [aij] ∈ C^{n,m} is the matrix A^H = [bij] with bij = āji
for all i, j, i.e., A^H = Ā^T.
For A ∈ K n,n we denote by
A subspace U ⊆ K n is invariant under A ∈ K n,n if Ax ∈ U for each x ∈ U. For
example, the eigenspace Vλ (A) of each eigenvalue λ of A is an invariant subspace of A
[LM, Lemma 14.4].
The following is a list of important matrix classes introduced in [LM]:
4. A = [aij ] ∈ K n,n is upper triangular when aij = 0 for all i > j, and lower triangular
when aij = 0 for all j > i (or when AT is upper triangular).
The following is a list of important facts from [LM]:
1. The n × n invertible upper (or lower) triangular matrices over K form a subgroup
of GLn (K). In particular, the inverse of an invertible upper triangular matrix is
upper triangular, and the product of invertible upper triangular matrices is an
invertible upper triangular matrix [LM, Theorem 4.13].
1
“If one knows all about symmetric matrices, then one knows a great deal about matrices. On the
other hand, one can only understand symmetric matrices properly if one understands the links between
general matrices A and their transposes AT .” (Olga Taussky [80, p. 147])
2. Determinant Multiplication Theorem: For all A, B ∈ K n,n we have det(AB) =
det(A) det(B) = det(BA) [LM, Theorem 7.15].
3. If A ∈ K n,n has the characteristic polynomial PA = tn + an−1 tn−1 + · · · + a0 , then
an−1 = −trace(A) and a0 = (−1)n det(A) [LM, Lemma 8.3].
4. Cayley-Hamilton Theorem: For each A ∈ K n,n we have PA (A) = 0 ∈ K n,n [LM,
Theorem 8.6].
5. If A ∈ K n,n is nilpotent, then PA = t^n [LM, Exercise 8.3].
6. For each p ∈ K[t] of (exact) degree n ≥ 1 and the corresponding companion matrix
Cp ∈ K n,n , we have PCp = det(tIn − Cp ) = p [LM, Lemma 8.4].
7. The map trace : K^{n,n} → K, [aij] ↦ ∑_{j=1}^{n} ajj, is linear and it satisfies trace(SAS^{−1}) =
trace(A) as well as trace(AB) = trace(BA) for all A, B ∈ K^{n,n} and S ∈ GLn(K)
[LM, Exercise 8.8].
8. The trace of a real or complex matrix is invariant under orthogonal or unitary similarity, respec-
tively, i.e., trace(Q^T A Q) = trace(A) for A ∈ R^{n,n} and orthogonal Q ∈ R^{n,n},
and trace(U^H A U) = trace(A) for A ∈ C^{n,n} and unitary U ∈ C^{n,n}.
9. For A ∈ Rn,m or A ∈ Cn,m we have rank(A) = rank(AH A) [LM, Corollary 10.25].
10. QR decomposition: For each A ∈ R^{n,m} with rank(A) = m there exists a matrix
Q ∈ R^{n,m} with pairwise orthonormal columns, i.e., Q^T Q = Im, and an upper
triangular matrix R ∈ GLm(R) such that A = QR. An analogous result holds for
A ∈ C^{n,m} with rank(A) = m; now Q^H Q = Im [LM, Corollary 12.12]. (A small
numerical illustration is given after this list.)
11. Jordan decomposition: If A ∈ K^{n,n} has a characteristic polynomial that decom-
poses into linear factors, then there exists a decomposition of the form

A = SJS^{−1} with S ∈ GLn(K), J = diag(Jd1(λ1), . . . , Jdm(λm)),

where Jdi(λi) ∈ K^{di,di} is the Jordan block with eigenvalue λi, i.e., the di × di upper
triangular matrix with λi on the diagonal, 1 on the first superdiagonal, and zeros
elsewhere, for i = 1, . . . , m. The matrix J in this decomposition is called a Jordan
canonical form of A. It is uniquely determined up to the order of the diagonal
Jordan blocks Jdi(λi) [LM, Theorems 16.10 and 16.12].
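As a small numerical illustration of item 10 in the list above (using NumPy's built-in routine np.linalg.qr, whose 'reduced' mode returns the n × m factor Q with orthonormal columns and the m × m factor R):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))           # has rank 3 with probability 1
Q, R = np.linalg.qr(A, mode='reduced')    # A = Q R with Q of size 5 x 3, R of size 3 x 3
print(np.allclose(Q.T @ Q, np.eye(3)))    # True: pairwise orthonormal columns
print(np.allclose(R, np.triu(R)))         # True: R is upper triangular
print(np.allclose(Q @ R, A))              # True: A is reconstructed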
We will now derive a few basic and useful properties of the two products AB and BA of
two (possibly non-square) matrices A and B, which are not contained in [LM].
Lemma A.1. If K is any field, A ∈ K^{n,m} and B ∈ K^{m,n}, then t^m PAB = t^n PBA.

Proof. We define

C = [t In, A; B, Im] ∈ K^{n+m,n+m} and D = [In, 0; −B, t Im] ∈ K^{n+m,n+m},

then

CD = [t In − AB, tA; 0, t Im] and DC = [t In, A; 0, t Im − BA],

and hence

t^m PAB = det(t Im) det(t In − AB) = det(CD) = det(C) det(D) = det(DC) = det(t In) det(t Im − BA) = t^n PBA.
Thus, the nonzero eigenvalues of AB and BA coincide with the same algebraic multi-
plicities.
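A numerical illustration of Lemma A.1 with rectangular factors (the sizes 5 × 3 and the random matrices are chosen only for illustration): the n eigenvalues of AB consist of the m eigenvalues of BA together with n − m additional (numerically tiny) zeros.

import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
A = rng.standard_normal((n, m))
B = rng.standard_normal((m, n))
print(np.sort_complex(np.linalg.eigvals(A @ B)))   # n = 5 eigenvalues, two of them ~ 0
print(np.sort_complex(np.linalg.eigvals(B @ A)))   # the m = 3 nonzero eigenvalues of AB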
If n = m, then PAB = PBA, but we may have MAB ≠ MBA. For example,

A = [0 1; 0 0] and B = [1 0; 0 0] yield MAB = t and MBA = t^2.
The next lemma shows that the nonzero eigenvalues of AB and BA also have the same
geometric multiplicities.
Lemma A.3. Let K be any field, A ∈ K^{n,m} and B ∈ K^{m,n}, and let λ ≠ 0 be an
eigenvalue of AB and BA. Then g(λ, AB) = g(λ, BA).
Moreover, the vectors y1, . . . , yk are linearly independent. If ∑_{j=1}^{k} αj yj = 0, then

0 = ∑_{j=1}^{k} αj B xj  ⟹  0 = ∑_{j=1}^{k} αj AB xj = λ ∑_{j=1}^{k} αj xj.
More details about the relationship between AB and BA, and in particular a detailed
analysis of the Jordan structures, can be found in [18, 44].
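A small numerical check of Lemma A.3, using the rank-based formula g(λ, M) = size(M) − rank(M − λI); the matrices below are chosen (as an illustration only) so that λ = 2 is an eigenvalue of geometric multiplicity 2 of both products:

import numpy as np

def geom_mult(M, lam):
    # geometric multiplicity of lam as an eigenvalue of the square matrix M
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n))

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # 3 x 2
B = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])     # 2 x 3
print(geom_mult(A @ B, 2.0), geom_mult(B @ A, 2.0))  # prints 2 2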
Exercise. Suppose that A, B ∈ K n,n are such that In − AB is invertible. Show that
then In − BA is invertible with (In − BA)^{−1} = In + BXA, where X = (In − AB)^{−1}.
Show further that for any λ ∈ K, the matrix λIn − AB is invertible if and only if
λIn − BA is invertible.
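A quick numerical sanity check of the first identity in this exercise (random matrices, NumPy; of course this does not replace a proof):

import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
I = np.eye(n)
X = np.linalg.inv(I - A @ B)            # assumes that I - AB is invertible
print(np.allclose(np.linalg.inv(I - B @ A), I + B @ X @ A))   # True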
Index
orthogonal diagonalization, 14
orthogonal matrix, 105
parallelogram identity, 41
polarization identity, 41
positive (semi-)definite, 105
projectively commuting matrices, 71
QR decomposition, 106
quasi-commuting matrices, 91
quaternions, 32
rank, 104
rational canonical form, 81
real field of values, 36
Schur decomposition, 8
Schur form, 8
    real, 13
Schur inequality, 11
simultaneous diagonalization, 67
simultaneous triangularization, 65, 87, 91, 95
    algorithm, 66
    overview, 98
    unitary, 67
skew-Hermitian matrix, 105
skew-symmetric matrix, 105
spectral decomposition of normal matrices, 8
spectral mapping theorem, 7, 74
spectral matrix, 42
spectral radius, 40
spectrum, 104
symmetric matrix, 105
Toeplitz matrix, 18
Toeplitz–Hausdorff Theorem, 38
trace, 104
transpose, 104
triangular matrix, 105
triangularization, 6
    unitary, 8
trivial algebra, 16
unit element, 16
unitary diagonalization, 8
unitary matrix, 8, 105
unitary similarity, 8
vector product in R3, 19
zero divisor, 16
Bibliography
[3] J. C. Baez, The octonions, Bull. Amer. Math. Soc. (N.S.), 39 (2002), pp. 145–205.
[4] J. Barría and P. R. Halmos, Vector bases for two commuting matrices, Linear
and Multilinear Algebra, 27 (1990), pp. 147–157.
[6] G. Bourgeois, Pairs of matrices, one of which commutes with their commutator,
Electron. J. Linear Algebra, 22 (2011), pp. 593–597.
[7] A. Cayley, A memoir on the theory of matrices, Philos. Trans. Roy. Soc. London,
148 (1858), pp. 17–37.
[9] R. Dedekind, Zur Theorie der aus n Haupteinheiten gebildeten komplexen Größen,
Nachrichten von der Königl. Gesellschaft der Wissenschaften und der Georg-
Augusts-Universität zu Göttingen, (1895), pp. 141–159.
[10] M. P. Drazin, A reduction for the matrix equation AB = BA, Proc. Cambridge
Philos. Soc., 47 (1951), pp. 7–10.
[11] , Some generalizations of matrix commutativity, Proc. London Math. Soc. (3),
1 (1951), pp. 222–231.
[12] M. P. Drazin, J. W. Dungey, and K. W. Gruenberg, Some theorems on
commutative matrices, J. London Math. Soc., 26 (1951), pp. 221–228.
[16] K. Fan, Some remarks on commutators of matrices, Arch. Math. (Basel), 5 (1954),
pp. 102–107.
[18] H. Flanders, Elementary divisors of AB and BA, Proc. Amer. Math. Soc., 2
(1951), pp. 871–874.
[23] , Über unitäre Matrizen, Sitzungsberichte Akademie der Wiss. zu Berlin, (1911),
pp. 373–378.
[26] M. Goldberg and E. Tadmor, On the numerical radius and its applications,
Linear Algebra Appl., 42 (1982), pp. 263–284.
[27] R. Grone, C. R. Johnson, E. M. Sa, and H. Wolkowicz, Normal matrices,
Linear Algebra Appl., 87 (1987), pp. 213–225.
[28] R. M. Guralnick, A note on pairs of matrices with rank one commutator, Linear
and Multilinear Algebra, 8 (1979/80), pp. 97–99.
[29] B. Hall, Lie Groups, Lie Algebras, and Representations: An Elementary Intro-
duction, vol. 222 of Graduate Texts in Mathematics, Springer, Cham, second ed.,
2015.
[30] P. R. Halmos, A Hilbert Space Problem Book, vol. 19 of Graduate Texts in Math-
ematics, Springer-Verlag, New York, second ed., 1982.
[32] F. Hausdorff, Der Wertevorrat einer Bilinearform, Math. Z., 3 (1919), pp. 314–
316.
[34] N. J. Higham, Functions of Matrices. Theory and Computation, Society for Indus-
trial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.
[37] , Polynomials in a matrix and its commutant, Linear Algebra Appl., 48 (1982),
pp. 293–301.
[38] H. Hopf, Ein topologischer Beitrag zur reellen Algebra, Comment. Math. Helv., 13
(1940), pp. 219–239.
[40] , Matrix Analysis, Cambridge University Press, Cambridge, second ed., 2013.
[42] N. Jacobson, Rational methods in the theory of Lie algebras, Ann. of Math. (2),
36 (1935), pp. 875–881.
[43] N. Jacobson, Schur’s theorems on commutative matrices, Bull. Amer. Math. Soc.,
50 (1944), pp. 431–436.
[45] T. Kato, Perturbation Theory for Linear Operators, Die Grundlehren der math-
ematischen Wissenschaften, Band 132, Springer-Verlag New York, Inc., New York,
1966.
[51] C. C. MacDuffee, The Theory of Matrices, Verlag von Julius Springer, Berlin,
1933.
[56] M. Mirzakhani, A simple proof of a theorem of Schur, Amer. Math. Monthly, 105
(1998), pp. 260–262.
[57] F. D. Murnaghan and A. Wintner, A canonical form for real matrices under
orthogonal transformations, Proc. Natl. Acad. Sci. USA, 17 (1931), pp. 417–420.
[59] A. Ostrowski, Über die Existenz einer endlichen Basis bei gewissen Funktionen-
systemen, Math. Ann., 78 (1917), pp. 94–119.
[60] C. Pearcy, An elementary proof of the power inequality for the numerical radius,
Michigan Math. J., 13 (1966), pp. 289–291.
[62] , On the spectra of commutators, Proc. Amer. Math. Soc., 5 (1954), pp. 929–931.
[65] , Olga Taussky-Todd’s influence on matrix theory and matrix theorists, Linear
and Multilinear Algebra, 5 (1977/78), pp. 197–224. A discursive personal tribute.
[66] I. Schur, Zur Theorie der vertauschbaren Matrizen, J. Reine Angew. Math., 130
(1905), pp. 66–76.
[67] , Über die charakteristischen Wurzeln einer linearen Substitution mit einer An-
wendung auf die Theorie der Integralgleichungen, Math. Ann., 66 (1909), pp. 488–
510.
[68] D. Serre, Matrices. Theory and Applications, vol. 216 of Graduate Texts in Mathe-
matics, Springer-Verlag, New York, 2002. Translated from the 2001 French original.
[69] H. Shapiro, A survey of canonical forms and invariants for unitary similarity,
Linear Algebra Appl., 147 (1991), pp. 101–167.
[70] , Commutators which commute with one factor, Pacific J. Math., (1997),
pp. 323–336. Olga Taussky-Todd: in memoriam.
[71] , Notes from Math 223: Olga Taussky Todd’s matrix theory course, 1976–1977,
Math. Intelligencer, 19 (1997), pp. 21–27.
[72] D. Shemesh, Common eigenvectors of two matrices, Linear Algebra Appl., 62
(1984), pp. 11–18.
[74] K. Shoda, Einige Sätze über Matrizen, Jpn. J. Math., 13 (1937), pp. 361–365.
[77] , Sur l'équation en matrices px = xq, C. R. Acad. Sc. Paris, 99 (1884), pp. 67–
71, 115–116.
[80] , The role of symmetric matrices in the study of general matrices, Linear Alge-
bra and Appl., 5 (1972), pp. 147–154.
[81] , How I became a torchbearer for matrix theory, Amer. Math. Monthly, 95
(1988), pp. 801–812.
[82] R. C. Thompson, Commutators in the special and general linear groups, Trans.
Amer. Math. Soc., 101 (1961), pp. 16–33.
[83] O. Toeplitz, Das algebraische Analogon zu einem Satze von Fejér, Math. Z., 2
(1918), pp. 187–197.
[84] J. Wei, Note on the global validity of the Baker-Hausdorff and Magnus theorems,
Journal of Mathematical Physics, 4 (1963), pp. 1337–1341.
[87] H. Wielandt, Über die Unbeschränktheit der Operatoren der Quantenmechanik,
Math. Ann., 121 (1949), p. 21.
[89] F. Zhang, Matrix Theory. Basic Results and Techniques, Universitext, Springer,
New York, second ed., 2011.