THE MINIMAL POLYNOMIAL AND SOME APPLICATIONS

KEITH CONRAD
1. Introduction
The easiest matrices to compute with are the diagonal ones. The sum and product of
diagonal matrices can be computed componentwise along the main diagonal, and taking
powers of a diagonal matrix is simple too. All the complications of matrix operations are
gone when working only with diagonal matrices. If a matrix A is not diagonal but can be
conjugated to a diagonal matrix, say D := PAP^{-1} is diagonal, then A = P^{-1}DP, so A^k = P^{-1}D^kP for every integer k ≥ 1, which reduces us to computations with a diagonal matrix. In
many applications of linear algebra (e.g., dynamical systems, differential equations, Markov
chains, recursive sequences) powers of a matrix are crucial to understanding the situation,
so the relevance of knowing when we can conjugate a nondiagonal matrix into a diagonal
matrix is clear.
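As a small numerical illustration of this reduction (a minimal sketch, assuming NumPy is available; the matrix A below is an illustrative choice, not one taken from the text):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])            # symmetric, hence diagonalizable over R

    # Eigendecomposition: the columns of eigvecs are eigenvectors, so with
    # P_inv = eigvecs we have A = P_inv D P, matching A = P^{-1} D P above.
    eigvals, eigvecs = np.linalg.eig(A)
    P_inv = eigvecs
    P = np.linalg.inv(P_inv)

    k = 10
    Ak = P_inv @ np.diag(eigvals**k) @ P  # A^k = P^{-1} D^k P, with D^k computed entrywise
    assert np.allclose(Ak, np.linalg.matrix_power(A, k))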
We want to look at the coordinate-free formulation of the idea of a diagonal matrix, which
will be called a diagonalizable operator. There is a special polynomial, the minimal polyno-
mial (generally not equal to the characteristic polynomial), which will tell us exactly when
a linear operator is diagonalizable. The minimal polynomial will also give us information
about nilpotent operators (those having a power equal to O).
All linear operators under discussion are understood to be acting on nonzero finite-
dimensional vector spaces over a given field F .
2. Diagonalizable Operators
Definition 2.1. We say the linear operator A : V → V is diagonalizable when it admits
a diagonal matrix representation with respect to some basis of V : there is a basis B of V
such that the matrix [A]B is diagonal.
Let’s translate diagonalizability into the language of eigenvectors rather than matrices.
Theorem 2.2. The linear operator A : V → V is diagonalizable if and only if there is a
basis of eigenvectors for A in V .
Proof. Suppose there is a basis B = {e_1, . . . , e_n} of V in which [A]_B is diagonal, say

    [A]_B = diag(a_1, a_2, . . . , a_n).

Then Ae_i = a_i e_i for all i, so each e_i is an eigenvector for A. Conversely, if V has a basis {v_1, . . . , v_n} of eigenvectors of A, with Av_i = λ_i v_i for λ_i ∈ F, then in this basis the matrix representation of A is diag(λ_1, . . . , λ_n).
A basis of eigenvectors for an operator is called an eigenbasis.
An example of a linear operator that is not diagonalizable, over every field F, is ( 1 1 ; 0 1 ) acting on F^2. Its only eigenvectors are the nonzero vectors ( x ; 0 ). There are not enough eigenvectors to form a basis of F^2, so this operator is not diagonalizable. Since ( 1 1 ; 0 1 ) and ( 1 0 ; 0 1 ) have the same characteristic polynomial, and the second matrix is diagonalizable while the first is not, the characteristic polynomial doesn’t determine (in general) whether an operator is diagonalizable.
Here are the main results we will obtain about diagonalizability:
(1) There are ways of determining if an operator is diagonalizable without having to
look explicitly for a basis of eigenvectors.
(2) When F is algebraically closed, “most” operators on a finite-dimensional F -vector
space are diagonalizable.
(3) There is a polynomial, the minimal polynomial of the operator, which can be used
to detect diagonalizability.
(4) If two operators are each diagonalizable, they can be simultaneously diagonalized
(i.e., there is a common eigenbasis) precisely when they commute.
Let’s look at three examples related to diagonalizability over R and C.
Example 2.3. Let R = ( 0 −1 ; 1 0 ), the 90-degree rotation matrix acting on R^2. It is not diagonalizable on R^2 since there are no eigenvectors: a rotation in R^2 sends no nonzero vector to a scalar multiple of itself. This geometric reason is complemented by an algebraic reason: the characteristic polynomial T^2 + 1 of R has no roots in R, so there are no real eigenvalues and thus no eigenvectors in R^2. However, there are roots ±i of T^2 + 1 in C, and there are eigenvectors of R as an operator on C^2 rather than R^2. Eigenvectors of R in C^2 for the eigenvalues i and −i are ( i ; 1 ) and ( −i ; 1 ), respectively. In the basis B = {( i ; 1 ), ( −i ; 1 )}, the matrix of R is [R]_B = ( i 0 ; 0 −i ), where the first diagonal entry is the eigenvalue of the first basis vector in B and the second diagonal entry is the eigenvalue of the second basis vector in B. (Review the proof of Theorem 2.2 to see why this relation between the ordering of vectors in an eigenbasis and the ordering of entries in a diagonal matrix always holds.)

Put more concretely, since passing to a new matrix representation of an operator from an old one amounts to conjugating the old matrix representation by the change-of-basis matrix expressing the old basis in terms of the new basis, we must have ( i 0 ; 0 −i ) = P R P^{-1}, where P = ( [ 1 ; 0 ]_B  [ 0 ; 1 ]_B ) = ( −i/2 1/2 ; i/2 1/2 ) has as its columns the standard basis vectors written in B-coordinates. Verify that this P really conjugates R to ( i 0 ; 0 −i ).
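That verification can also be done numerically (a minimal sketch, assuming NumPy is available; R, P, and diag(i, −i) are the matrices from this example):

    import numpy as np

    R = np.array([[0, -1],
                  [1,  0]], dtype=complex)           # 90-degree rotation matrix
    P = np.array([[-1j/2, 1/2],
                  [ 1j/2, 1/2]])                     # columns are [e_1]_B and [e_2]_B
    D = np.diag([1j, -1j])                           # the expected [R]_B

    assert np.allclose(P @ R @ np.linalg.inv(P), D)  # P R P^{-1} = diag(i, -i)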
Example 2.4. Every A ∈ M_n(R) satisfying A = A^⊤ can be diagonalized over R. This is a significant result, called the real spectral theorem. (Any theorem that gives sufficient conditions under which an operator can be diagonalized is called a spectral theorem, because the set of eigenvalues of an operator is called its spectrum.) The essential step in the proof of the real spectral theorem is to show that every real symmetric matrix has a real eigenvalue.
Example 2.5. Every A ∈ M_n(C) satisfying AA^* = A^*A, where A^* denotes the conjugate transpose of A, is diagonalizable in M_n(C). When A is real, A^* = A^⊤, and saying AA^⊤ = A^⊤A is weaker than saying A = A^⊤. In particular, the real matrix ( 0 −1 ; 1 0 ) commutes with its transpose and thus is diagonalizable over C, but the real spectral theorem does not apply to this matrix, and in fact this matrix isn’t diagonalizable over R (it has no real eigenvalues).
A diagonalizable operator has all of its eigenvalues in F, since the eigenvalues are the diagonal entries of a diagonal matrix representation (see the proof of Theorem 2.2). The converse is false: if all the eigenvalues of an operator are in F, this does not necessarily mean the operator is diagonalizable. Just think about our basic example ( 1 1 ; 0 1 ), whose only eigenvalue is 1. It is a “repeated eigenvalue,” in the sense that the characteristic polynomial (T − 1)^2 has 1 as a repeated root. Imposing an additional condition, that the eigenvalues lie in F and are simple roots of the characteristic polynomial, does force diagonalizability. To prove this, we start with a general lemma on eigenvalues and linear independence.
Lemma 3.1. Eigenvectors for distinct eigenvalues are linearly independent. More precisely, if A : V → V is linear and v_1, . . . , v_r are eigenvectors of A with distinct eigenvalues λ_1, . . . , λ_r, then the v_i's are linearly independent.

Proof. This will be an induction on r. The case r = 1 is easy. If r > 1, suppose there is a linear relation

(3.1)    c_1 v_1 + ··· + c_{r−1} v_{r−1} + c_r v_r = 0

with c_i ∈ F. Apply A to both sides: v_i becomes Av_i = λ_i v_i, so

(3.2)    c_1 λ_1 v_1 + ··· + c_{r−1} λ_{r−1} v_{r−1} + c_r λ_r v_r = 0.

Multiply the linear relation in (3.1) by λ_r:

(3.3)    c_1 λ_r v_1 + ··· + c_{r−1} λ_r v_{r−1} + c_r λ_r v_r = 0.

Subtracting (3.3) from (3.2), the last terms on the left cancel:

    c_1(λ_1 − λ_r)v_1 + ··· + c_{r−1}(λ_{r−1} − λ_r)v_{r−1} = 0.

Now we have a linear relation among r − 1 eigenvectors having distinct eigenvalues. By induction, all the coefficients are 0: c_i(λ_i − λ_r) = 0 for i = 1, . . . , r − 1. Since λ_1, . . . , λ_{r−1}, λ_r are distinct, λ_i − λ_r ≠ 0 for i = 1, . . . , r − 1. Thus c_i = 0 for i = 1, . . . , r − 1. Now our original linear relation (3.1) becomes c_r v_r = 0. The vector v_r is not 0 (eigenvectors are always nonzero by definition), so c_r = 0.
Theorem 3.2. A linear operator on V whose characteristic polynomial is a product of linear factors in F[T] with distinct roots is diagonalizable.

Proof. The assumption is that there are n different eigenvalues in F, where n = dim V. Call them λ_1, . . . , λ_n. Let e_i be an eigenvector with eigenvalue λ_i. The eigenvalues are distinct, so by Lemma 3.1 the e_i's are linearly independent. Since there are n of these vectors, the e_i's are a basis of V, so the operator admits an eigenbasis and is diagonalizable.
Example 3.3. The matrix ( 1 1 ; 0 1 ) has characteristic polynomial (T − 1)^2, which has linear factors in R[T] but the roots are not distinct, so Theorem 3.2 does not say the matrix is diagonalizable in M_2(R), and in fact it isn’t.

Example 3.4. The matrix ( 1 1 ; 1 0 ) has characteristic polynomial T^2 − T − 1, which has 2 different real roots, so the matrix is diagonalizable in M_2(R).
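A quick numerical check of Example 3.4 (a minimal sketch, assuming NumPy is available):

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 0.0]])
    eigvals, eigvecs = np.linalg.eig(A)

    # The eigenvalues are (1 +/- sqrt(5))/2, the two distinct roots of T^2 - T - 1.
    assert np.allclose(sorted(eigvals), sorted([(1 + 5**0.5) / 2, (1 - 5**0.5) / 2]))

    # Distinct eigenvalues, so the eigenvectors form a basis and A is diagonalizable.
    P_inv = eigvecs
    D = np.diag(eigvals)
    assert np.allclose(P_inv @ D @ np.linalg.inv(P_inv), A)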
There are many diagonal matrices with repeated diagonal entries (take the simplest
example, I_n!), and their characteristic polynomials have repeated roots. The criterion in
Theorem 3.2 will never detect a diagonalizable operator with a repeated eigenvalue, so that
criterion is a sufficient but not necessary condition for diagonalizability. In Section 4 we will
see a way to detect diagonalizability using a different polynomial than the characteristic
polynomial that is both necessary and sufficient.
Exactly how common is it for a characteristic polynomial to have distinct roots (whether or not they lie in F)? Consider 2 × 2 matrices: the characteristic polynomial T^2 + bT + c has repeated roots if and only if b^2 − 4c = 0. A random quadratic will usually satisfy b^2 − 4c ≠ 0 (you have to be careful to arrange things so that b^2 − 4c is 0), so “most” 2 × 2 matrices have a characteristic polynomial with distinct roots. Similarly, a random n × n matrix usually has a characteristic polynomial with distinct roots. In particular, over the complex numbers this means a random n × n complex matrix almost certainly has distinct eigenvalues and therefore (since the eigenvalues lie in C) Theorem 3.2 tells us that a random n × n complex matrix is diagonalizable. So diagonalizability is the rule rather than the exception over C, or more generally over an algebraically closed field.
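This “distinct eigenvalues are typical” heuristic is easy to probe empirically. The sketch below (assuming NumPy is available) tests a crude numerical notion of distinctness on random matrices; it is an illustration, not a proof:

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 5, 1000
    distinct = 0
    for _ in range(trials):
        A = rng.standard_normal((n, n))
        w = np.linalg.eigvals(A)
        # Count the trial as "distinct" if no two eigenvalues are numerically equal.
        if min(abs(w[i] - w[j]) for i in range(n) for j in range(i + 1, n)) > 1e-8:
            distinct += 1
    print(distinct, "of", trials, "random 5x5 matrices had numerically distinct eigenvalues")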
² That f(A) = O for some nonconstant f(T) ∈ F[T] can be shown without the Cayley–Hamilton theorem: the space End_F(V) has dimension n^2, so the n^2 + 1 operators I, A, A^2, . . . , A^{n^2} in End_F(V) must have a nontrivial linear dependence relation, and this gives a polynomial relation on A with coefficients in F of degree ≤ n^2, which is weaker than what we know from the Cayley–Hamilton theorem.
Theorem 4.4 justifies speaking of the minimal polynomial. If two monic polynomials are
both of least degree killing A, Theorem 4.4 shows they divide each other, and therefore they
are equal (since they are both monic). Minimal polynomials of linear operators need not
be irreducible (e.g., ( 1 1 ; 0 1 ) has minimal polynomial (T − 1)^2).
Example 4.5. Write V as a direct sum of subspaces, say V = U ⊕ W. Let P : V → V be the projection onto the subspace U coming from this particular decomposition: P(u + w) = u. Since P(u) = u, we have P^2(u + w) = P(u + w), so P^2 = P. Thus P is killed by the polynomial T^2 − T = T(T − 1). If T^2 − T is not the minimal polynomial, then by Theorem 4.4 either T or T − 1 kills P; the first case means P = O (so U = {0}) and the second case means P = id_V (so U = V). As long as U and W are both nonzero, P is neither O nor id_V and T^2 − T is the minimal polynomial of the projection P.
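As a concrete instance (a numerical sketch, assuming NumPy is available; the particular matrix is an illustrative choice, not from the text), the matrix P below is the projection of R^2 onto U = span{(1, 0)} along W = span{(1, 1)}, and the checks mirror the reasoning above:

    import numpy as np

    P = np.array([[1.0, -1.0],
                  [0.0,  0.0]])   # projection onto span{(1,0)} along span{(1,1)}
    I = np.eye(2)

    assert np.allclose(P @ P, P)                           # P is killed by T^2 - T
    assert not np.allclose(P, 0) and not np.allclose(P, I)
    # Neither T nor T - 1 kills P, so T^2 - T is its minimal polynomial.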
While all operators on an n-dimensional space have characteristic polynomials of degree n, the degree of the minimal polynomial varies from one operator to the next. Its computation is not as mechanical as that of the characteristic polynomial, since there isn’t a universal formula for the minimal polynomial. Indeed, consider the matrix

    A_ε = ( 1+ε  0 ; 0  1 ).

For ε ≠ 0, A_ε has two different eigenvalues, 1 + ε and 1. Therefore the minimal polynomial of A_ε is not of degree 1, so its minimal polynomial must be its characteristic polynomial T^2 − (2 + ε)T + (1 + ε). However, when ε = 0 the matrix A_ε is the identity I_2, with minimal polynomial T − 1. When eigenvalue multiplicities change, the minimal polynomial changes in a drastic way. It is not a continuous function of the matrix.
To compute the minimal polynomial of a linear operator, Theorem 4.4 looks useful. For example, the minimal polynomial divides the characteristic polynomial, since the characteristic polynomial kills the operator by the Cayley–Hamilton theorem.³
Example 4.6. Let

    A = ( 0  −1   1 ;
          1   2  −1 ;
          1   1   0 ).

The characteristic polynomial is T^3 − 2T^2 + T = T(T − 1)^2. If the characteristic polynomial is not the minimal polynomial, then the minimal polynomial divides one of the quadratic factors. There are two of these: T(T − 1) and (T − 1)^2. A calculation shows A(A − I_3) = O, so the minimal polynomial divides T(T − 1). Since A and A − I_3 are not O, the minimal polynomial of A is T(T − 1) = T^2 − T.
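The calculation in Example 4.6 can be double-checked numerically (a minimal sketch, assuming NumPy is available; A is the matrix of the example):

    import numpy as np

    A = np.array([[0.0, -1.0,  1.0],
                  [1.0,  2.0, -1.0],
                  [1.0,  1.0,  0.0]])
    I = np.eye(3)

    # Characteristic polynomial coefficients of T^3 - 2T^2 + T = T(T-1)^2.
    assert np.allclose(np.poly(A), [1.0, -2.0, 1.0, 0.0])

    # A(A - I) = O while A and A - I are nonzero, so the minimal polynomial is T(T-1).
    assert np.allclose(A @ (A - I), np.zeros((3, 3)))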
Since the minimal polynomial divides the characteristic polynomial, every root of the
minimal polynomial (possibly in an extension of F ) is an eigenvalue. The converse is also
true:
Theorem 4.7. Any eigenvalue of a linear operator is a root of its minimal polynomial in
F [T ], so the minimal polynomial and characteristic polynomial have the same roots.
Proof. The minimal polynomials of a linear operator and of its matrix representation in some basis are the same, so we pick a basis to work with a matrix A acting on F^n (n = dim_F V). Say λ is an eigenvalue of A, in some extension field E. We want to show m_A(λ) = 0. There is an eigenvector in E^n for this eigenvalue: Av = λv and v ≠ 0. Then A^k v = λ^k v for all k ≥ 1, so f(A)v = f(λ)v for all f ∈ E[T]. In particular, taking f(T) = m_A(T), we have m_A(A) = O, so 0 = m_A(λ)v. Since v ≠ 0, this forces m_A(λ) = 0.
³ In particular, deg m_A(T) ≤ dim_F V. This inequality can also be proved directly by an induction on the dimension, without using the Cayley–Hamilton theorem. See [1].
Remark 4.8. The proof may look a bit funny: nowhere in the argument did we use the minimality of m_A(T). Indeed, we showed that every polynomial killing A has every eigenvalue of A among its roots. Since the minimal polynomial for A divides all other polynomials killing A, it is for the minimal polynomial that this result is most useful, and that’s why we formulated Theorem 4.7 for the minimal polynomial.
Example 4.9. In Example 4.6, χ_A(T) = T(T − 1)^2, so the eigenvalues of A are 0 and 1. Theorem 4.7 says m_A(T) has roots 0 and 1, so m_A(T) is divisible by T and T − 1. Therefore if m_A ≠ χ_A then m_A = T(T − 1). The consideration of (T − 1)^2 in Example 4.6 was unnecessary; it couldn’t have worked because m_A(T) must have both 0 and 1 as roots.
Corollary 4.10. The minimal and characteristic polynomials of a linear operator have the
same irreducible factors in F [T ].
Proof. Any irreducible factor of the minimal polynomial is a factor of the characteristic
polynomial since the minimal polynomial divides the characteristic polynomial. Conversely,
if π(T ) is an irreducible factor of the characteristic polynomial, a root of it (possibly in some
larger field than F ) is an eigenvalue and therefore is also a root of the minimal polynomial
by Theorem 4.7. Any polynomial in F [T ] sharing a root with π(T ) is divisible by π(T ), so
π(T ) divides the minimal polynomial.
If we compare the irreducible factorizations in F[T]

    χ_A(T) = π_1(T)^{e_1} ··· π_k(T)^{e_k},   m_A(T) = π_1(T)^{f_1} ··· π_k(T)^{f_k},

we have 1 ≤ f_i ≤ e_i for each i, since m_A(T) | χ_A(T) and each π_i divides m_A(T) by Corollary 4.10. Since e_i ≤ n ≤ n f_i, we also get a “reverse” divisibility: χ_A(T) | m_A(T)^n in F[T].
We say a polynomial in F[T] splits if it is a product of linear factors in F[T]. For instance, T^2 − 5 splits in R[T] but not in Q[T]. The polynomial (T − 1)^2 splits in every F[T]. Any factor of a polynomial that splits in F[T] also splits in F[T].
Using the minimal polynomial in place of the characteristic polynomial provides a good
criterion for diagonalizability over a field, which is our main result:
Theorem 4.11. Let A : V → V be a linear operator. Then A is diagonalizable if and only
if its minimal polynomial in F [T ] splits in F [T ] and has distinct roots.
Theorem 4.11 gives necessary and sufficient conditions for diagonalizability, rather than
just sufficient conditions as in Theorem 3.2. Because the minimal polynomial is a factor of
the characteristic polynomial, Theorem 4.11 implies Theorem 3.2.
Proof. Suppose m_A(T) splits in F[T] with distinct roots. We will show V has a basis of eigenvectors for A, so A is diagonalizable. Let

    m_A(T) = (T − λ_1) ··· (T − λ_r),

so the λ_i's are the eigenvalues of A (Theorem 4.7) and by hypothesis they are distinct. For an eigenvalue λ_i, let

    E_{λ_i} = {v ∈ V : Av = λ_i v}

be the corresponding eigenspace. We will show

(4.1)    V = E_{λ_1} ⊕ ··· ⊕ E_{λ_r},

so using bases from each E_{λ_i} provides an eigenbasis for A. Since eigenvectors with different eigenvalues are linearly independent, it suffices to show

    V = E_{λ_1} + ··· + E_{λ_r},

as the sum will then automatically be direct by linear independence.

The way to get the eigenspace components of a vector is to show that it is possible to “project” from V to each eigenspace E_{λ_i} using polynomials in the operator A. Specifically, we want to find polynomials h_1(T), . . . , h_r(T) in F[T] such that

(4.2)    1 = h_1(T) + ··· + h_r(T),   h_i(T) ≡ 0 mod m_A(T)/(T − λ_i).

The congruence condition implies the polynomial (T − λ_i)h_i(T) is divisible by m_A(T), so (A − λ_i)h_i(A) acts on V as O. So if we substitute the operator A for T in (4.2) and apply both sides to a vector v ∈ V, we get

    v = h_1(A)(v) + ··· + h_r(A)(v),   (A − λ_i)h_i(A)(v) = 0.

The second equation says h_i(A)(v) lies in E_{λ_i} and the first equation says v is a sum of such eigenvectors, hence V = E_{λ_1} + ··· + E_{λ_r}.

It remains to find h_i(T)'s fitting (4.2). For 1 ≤ i ≤ r, let f_i(T) = m_A(T)/(T − λ_i) = ∏_{j≠i}(T − λ_j). Since the λ_i's are distinct, the polynomials f_1(T), . . . , f_r(T) are relatively prime as an r-tuple, so some F[T]-linear combination of them is equal to 1:

(4.3)    1 = Σ_{i=1}^{r} g_i(T) f_i(T),

where g_i(T) ∈ F[T]. Let h_i(T) = g_i(T) f_i(T)!
Now assume A is diagonalizable, so all eigenvalues of A are in F and V is the direct sum
of the eigenspaces for A. We want to show the minimal polynomial of A in F [T ] splits and
has distinct roots.
Let λ_1, . . . , λ_r be the different eigenvalues of A, so V = E_{λ_1} ⊕ ··· ⊕ E_{λ_r}. We will show f(T) := (T − λ_1) ··· (T − λ_r) ∈ F[T] is the minimal polynomial of A in F[T].

By hypothesis, the eigenvectors of A span V. Let v be an eigenvector, say Av = λv. Then A − λ kills v. The operators A − λ_i commute with each other and one of them kills v, so their product f(A) kills v. Thus f(A) kills the span of the eigenvectors, which is V, so f(A) = O. The minimal polynomial is therefore a factor of f(T). At the same time, each root of f(T) is an eigenvalue of A and therefore is a root of the minimal polynomial of A (Theorem 4.7). Since the roots of f(T) each occur once, f(T) must be the minimal polynomial of A.
Remark 4.12. In (4.3), the polynomial g_i(T) can in fact be taken to be the constant 1/f_i(λ_i). Indeed, the sum

    Σ_{i=1}^{r} (1/f_i(λ_i)) f_i(T)

is a polynomial of degree at most r − 1 (since each f_i(T) has degree r − 1), and at λ_1, . . . , λ_r it takes the value 1 (all but one term in the sum vanishes at each λ_i, and the remaining term is 1). A polynomial of degree at most r − 1 that takes the same value at r distinct points must be that constant value.
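Remark 4.12 makes the projectors in (4.2) completely explicit: h_i(T) = f_i(T)/f_i(λ_i) is the Lagrange interpolation polynomial that equals 1 at λ_i and 0 at the other eigenvalues. The sketch below (assuming NumPy is available; the matrix is an illustrative choice, not from the text) builds the operators h_i(A) for a small diagonalizable matrix and checks that they sum to the identity and map vectors into the eigenspaces:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])             # eigenvalues 1 and 3, so m_A(T) = (T-1)(T-3)
    I = np.eye(2)
    eigenvalues = [1.0, 3.0]

    # h_i(A) = prod_{j != i} (A - lam_j I) / (lam_i - lam_j), the Lagrange form from Remark 4.12
    projectors = []
    for i, lam_i in enumerate(eigenvalues):
        h = I.copy()
        for j, lam_j in enumerate(eigenvalues):
            if j != i:
                h = h @ (A - lam_j * I) / (lam_i - lam_j)
        projectors.append(h)

    assert np.allclose(sum(projectors), I)       # h_1(A) + h_2(A) = identity, as in (4.2)
    for lam_i, h in zip(eigenvalues, projectors):
        v = np.array([1.0, 0.0])
        w = h @ v                                # the component of v in the lam_i-eigenspace
        assert np.allclose(A @ w, lam_i * w)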
Corollary 4.13. Let A be a linear transformation of finite order on a complex vector space: A^m = id_V for some positive integer m. Then A is diagonalizable.

Proof. Since A^m = id_V, A is killed by T^m − 1. Therefore the minimal polynomial of A is a factor of T^m − 1. The polynomial T^m − 1 has distinct roots in C (the m-th roots of unity), so the minimal polynomial of A splits over C with distinct roots, and A is diagonalizable by Theorem 4.11.
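As an illustration of Corollary 4.13 (a numerical sketch, assuming NumPy is available; the permutation matrix is an illustrative choice, not from the text): the matrix A below satisfies A^3 = I_3, its eigenvalues are the three distinct cube roots of unity, and it is diagonalizable over C even though it is not diagonalizable over R.

    import numpy as np

    A = np.array([[0, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0]], dtype=float)    # cyclic permutation of the standard basis, A^3 = I
    assert np.allclose(np.linalg.matrix_power(A, 3), np.eye(3))

    w, V = np.linalg.eig(A)
    assert np.allclose(w**3, 1)                # the eigenvalues are cube roots of unity
    assert np.linalg.matrix_rank(V) == 3       # three distinct eigenvalues give an eigenbasis of C^3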
Earlier (Example 4.5) we saw that the projection from V onto a subspace in a direct sum decomposition is a linear operator P satisfying P^2 = P. Now we can prove the converse.

Corollary 4.14. Any linear operator A : V → V satisfying A^2 = A is the projection of V onto some subspace: there is a decomposition V = U ⊕ W such that A(u + w) = u.
5. Simultaneous Diagonalizability
Now that we understand when a single linear operator is diagonalizable (if and only if the
minimal polynomial splits with distinct roots), we consider simultaneous diagonalizability
of several linear operators Aj : V → V , j = 1, 2, . . . , r. Assuming each Aj has a diagonal
matrix representation in some basis, can we find a common basis in which the Aj ’s are all
diagonal matrices? (This possibility is called simultaneous diagonalization.) A necessary constraint is commutativity: diagonal matrices commute with each other, so if the A_j's can be simultaneously diagonalized, they must commute. Happily, this necessary condition is also sufficient, as we will soon see. What is special about commuting operators is that they
also sufficient, as we will soon see. What is special about commuting operators is they
preserve each other’s eigenspaces: if AB = BA and Av = λv then A(Bv) = B(Av) =
B(λv) = λ(Bv), so B sends each vector in the λ-eigenspace of A to another vector in the
λ-eigenspace of A. Pay attention to how this is used in the next theorem.
Theorem 5.1. If A1 , . . . , Ar are linear operators on V and each Ai is diagonalizable, they
are simultaneously diagonalizable if and only if they commute.
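Here is a numerical sketch of the “commuting implies simultaneously diagonalizable” direction (assuming NumPy is available; the matrices are illustrative choices, not from the text). B is a polynomial in A, so the two certainly commute, and the eigenbasis of A diagonalizes both:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    B = A @ A - 2 * A               # a polynomial in A, so AB = BA
    assert np.allclose(A @ B, B @ A)

    w, P_inv = np.linalg.eig(A)     # A has the distinct eigenvalues 1 and 3
    P = np.linalg.inv(P_inv)

    def is_diagonal(M, tol=1e-10):
        return np.allclose(M, np.diag(np.diag(M)), atol=tol)

    assert is_diagonal(P @ A @ P_inv)   # the eigenbasis of A diagonalizes A ...
    assert is_diagonal(P @ B @ P_inv)   # ... and, since B commutes with A, also B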
sum ( 1 1 ; 0 1 ) is not. The reader is invited to find diagonalizable 2 × 2 matrices with a non-diagonalizable product.
6. Nilpotent Operators
The minimal polynomial classifies not only diagonalizable operators, but also nilpotent
operators (those having a power equal to O).
Write m_A(T) = (T − λ_1)^{e_1} ··· (T − λ_r)^{e_r}, where the λ_i's are the distinct roots. Then the polynomials f_i(T) = m_A(T)/(T − λ_i)^{e_i} are relatively prime, so arguing as in the proof of Theorem 4.11 (where all the exponents e_i are 1) we get

    V = ⊕_i ker((A − λ_i)^{e_i}).

Let W_i = ker((A − λ_i)^{e_i}). Since A commutes with (A − λ_i)^{e_i}, A(W_i) ⊂ W_i. We will show A|_{W_i} has an upper-triangular matrix representation, and by stringing together bases of the W_i's to get a basis of V we will obtain an upper-triangular matrix representation of A on V.

On W_i, (A − λ_i)^{e_i} = O, so A − λ_i is a nilpotent operator on W_i. Write A = λ_i + (A − λ_i), which expresses A on W_i as the sum of a scaling operator and a nilpotent operator. By what we know about nilpotent operators, there is a basis of W_i with respect to which A − λ_i is strictly upper triangular. With respect to every basis, the scaling operator λ_i is diagonal. So using the basis that makes A − λ_i a strictly upper-triangular matrix, the matrix for A is the sum of a diagonal matrix and a strictly upper-triangular matrix, and that is an upper-triangular matrix.
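The existence of an upper-triangular form over C can also be observed numerically. The sketch below (assuming SciPy is available) uses a Schur decomposition, which triangularizes by a unitary change of basis; this is a different route from the generalized-eigenspace argument above and is only an illustration:

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])           # the rotation matrix of Example 2.3; not diagonalizable over R
    T, Z = schur(A, output='complex')     # A = Z T Z^H with T upper triangular and Z unitary

    assert np.allclose(Z @ T @ Z.conj().T, A)
    assert np.allclose(T, np.triu(T))     # T really is upper triangular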
Corollary 6.4. If A1 , . . . , Ar are commuting linear operators on V and each Ai is upper
triangularizable, they are simultaneously upper triangularizable.
Unlike Theorem 5.1, the commuting hypothesis used here is far from being necessary:
most upper triangular matrices (unlike all diagonal matrices) do not commute with each
other! There is a theorem of A. Borel about linear algebraic groups that relaxes the commu-
tativity assumption to a more reasonable hypothesis (solvability, together with some other
technical conditions).
Proof. We argue by induction on the dimension of the vector space (not on the number of
operators, as in the proof of Theorem 5.1). The one-dimensional case is clear. Assume now
dim V ≥ 2 and the corollary is known for lower-dimensional spaces. We may assume the
Ai ’s are not all scalar operators on V (otherwise the result is obvious using an arbitrary
basis of V ). Without loss of generality, let Ar not be a scalar operator.
Since A_r is upper triangularizable on V, its eigenvalues are in F. Let λ ∈ F be an eigenvalue of A_r and set E_λ to be the λ-eigenspace of A_r in V. Then 0 < dim E_λ < dim V, the second inequality because A_r is not a scalar operator.
Since the Ai ’s commute, Ai (Eλ ) ⊂ Eλ for all i. Moreover, the minimal polynomial of
Ai |Eλ is a factor of the minimal polynomial of Ai on V , so every Ai |Eλ is upper triangular-
izable by Theorem 4.17. Since the Ai ’s commute on V , they also commute as operators on
Eλ , so by induction on dimension the Ai ’s are simultaneously upper triangularizable on Eλ .
In particular, the first vector in a simultaneous “upper triangular basis” for the A_i's is a common eigenvector of all the A_i's. Call this vector e_1 and set W = F e_1. Then A_i(W) ⊂ W for all i. The A_i's therefore induce operators on W and also on V/W. Let Ā_i : V/W → V/W be the operator that A_i induces on V/W. On V/W these induced operators commute and are upper triangularizable (since their minimal polynomials divide those of the A_i's on V, which split in F[T]), so again by induction on dimension the Ā_i's are simultaneously upper triangularizable on V/W. If we lift a common upper-triangular basis for the Ā_i's from V/W to V and tack on e_1 as the first member of a basis for V, we obtain a common upper-triangular basis for the A_i's by an argument similar to that in the proof of Corollary 6.3.
References
[1] M. D. Burrow, “The Minimal Polynomial of a Linear Transformation,” Amer. Math. Monthly 80 (1973),
1129–1131.