The Minimal Polynomial
Michael H. Mertens
October 22, 2015
Introduction
In these short notes we explain some of the important features of the minimal
polynomial of a square matrix A and recall some basic techniques to find roots
of polynomials of small degree which may be useful.
Should there be any typos or mathematical errors in this manuscript, I’d
be glad to hear about them via email ([email protected]) and
correct them. Please note that the numbering of theorems and definitions is
unfortunately not consistent with the lecture.
Definition 1.1. (ii) The minimal polynomial of A, denoted by µ_A(X), is the monic (i.e. with leading coefficient 1) polynomial of lowest degree such that

µ_A(A) = 0 ∈ R^{n×n}.
It is perhaps not immediately clear that this definition always makes sense.

Lemma 1.2. The minimal polynomial is always well-defined, and we have deg µ_A(X) ≤ n².
Proof. We can write the entries of an n × n-matrix as a column, e.g. reading the matrix row-wise,

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \mapsto \begin{pmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \\ a_{21} \\ \vdots \\ a_{nn} \end{pmatrix}.$$
With this we can identify R^{n×n} with R^{n²} and, as is easy to see, addition and scalar multiplication of matrices are respected by this map. In particular this means that at most n² matrices in R^{n×n} can be linearly independent, since they can be thought of as living in R^{n²}, whose dimension is n². This means that there must be a minimal d ≤ n² such that the matrices

I_n, A, A², ..., A^d

(viewed as vectors as described above) are linearly dependent, i.e. there are numbers c_0, ..., c_{d−1} such that

A^d + c_{d−1} A^{d−1} + ... + c_1 A + c_0 I_n = 0 ∈ R^{n×n}.

If we now replace A in this equation by the indeterminate X, we obtain a monic polynomial p(X) satisfying p(A) = 0, and the degree d of p is minimal by construction, hence p(X) = µ_A(X) by definition.
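To see the proof in action, here is a small numerical sketch in Python (the example matrix and all names are my own choice, not part of the notes): we flatten I_n, A, A², ... into vectors exactly as above and use a rank computation to detect the first linear dependence, whose position is deg µ_A.

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
n = A.shape[0]
rows = [np.eye(n).flatten()]         # vectorized I_n
P = np.eye(n)
for d in range(1, n * n + 1):
    P = P @ A                        # P = A^d
    rows.append(P.flatten())         # vectorized A^d
    if np.linalg.matrix_rank(np.array(rows)) < len(rows):
        print("deg mu_A =", d)       # here: 2, since mu_A(X) = (X - 2)(X - 3)
        break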
Remark 1.3. In fact, there is a much stronger (sharp) bound on the degree
of the minimal polynomial, namely we have that deg µA (X) ≤ n. This is a
consequence of the Theorem of Cayley-Hamilton (Theorem 1.11), which we
will encounter later.
Recall the definition of eigenvalues, eigenvectors, and eigenspaces from
the lecture.
Definition 1.4. A non-zero vector v ∈ R^n that satisfies the equation

Av = λv

for some λ ∈ R is called an eigenvector of A with eigenvalue λ.
Algorithm 1.7.
INPUT: A ∈ R^{n×n}
OUTPUT: µ_A(X)
ALGORITHM:

(i) Choose a non-zero vector v ∈ R^n, for instance v = e₁.

(ii) While the set S = {v, Av, A²v, ..., A^m v} is linearly independent, compute A^{m+1}v = A(A^m v) and add it to S.

(iii) Write down the (normalized) linear dependency for S, i.e. compute numbers c_0, ..., c_{d−1} such that A^d v = c_{d−1} A^{d−1} v + ... + c_0 v, and use these to define the polynomial µ̃(X) = X^d − (c_{d−1} X^{d−1} + ... + c_0).

(iv) While the vectors found so far don't span R^n, find a vector not in their span and repeat steps (ii) and (iii).
(v) The minimal polynomial is then the least common multiple of all the
polynomials found on the way.
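To illustrate, here is a sketch of Algorithm 1.7 in Python, using sympy for exact rational arithmetic; the function names and the example matrix at the end are my own and not part of the notes.

from sympy import Matrix, Symbol, eye, lcm, expand

X = Symbol('X')

def local_minpoly(A, v):
    # Steps (ii)/(iii): enlarge {v, Av, A^2 v, ...} until it becomes dependent.
    vecs = [v]
    while True:
        w = A * vecs[-1]
        K = Matrix.hstack(*(vecs + [w]))
        if K.rank() == len(vecs):        # w is a combination of the earlier vectors
            break
        vecs.append(w)
    d = len(vecs)
    ns = K.nullspace()[0]                # the (1-dimensional) linear dependency
    ns = ns / ns[d]                      # normalize the coefficient of A^d v to 1
    return expand(X**d + sum(ns[i] * X**i for i in range(d))), vecs

def minimal_polynomial(A):
    n = A.shape[0]
    basis, divisors = [], []
    for j in range(n):                   # step (iv): try the standard basis vectors
        e = eye(n)[:, j]
        if basis and Matrix.hstack(*(basis + [e])).rank() == len(basis):
            continue                     # e_j already lies in the span found so far
        p, vecs = local_minpoly(A, e)
        divisors.append(p)
        for v in vecs:
            if not basis or Matrix.hstack(*(basis + [v])).rank() > len(basis):
                basis.append(v)
        if len(basis) == n:
            break
    mu = divisors[0]                     # step (v): lcm of the divisors found
    for p in divisors[1:]:
        mu = lcm(mu, p)
    return mu

A = Matrix([[2, 0, 0],
            [0, 2, 0],
            [0, 0, -2]])
print(minimal_polynomial(A))             # X**2 - 4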
Hence we have found a linear dependency (A² − 4I₃)e₁ = 0, i.e. a divisor of our minimal polynomial,

µ̃(X) = X² − 4.

Next we compute Ae₂ and find the linear dependence Ae₂ + 2e₂ = 0, i.e. the divisor X + 2 of our minimal polynomial. Note that by chance, e₂ is indeed an eigenvector of A, but this is merely a coincidence. Since the three vectors e₁, Ae₁, e₂ span R³, we can stop now and find

µ_A(X) = lcm(X² − 4, X + 2) = X² − 4.
The motivation for this definition essentially comes from the invertible
matrix theorem, especially Theorem 3.8 of the lecture. More precisely we
have that λ is an eigenvalue of A if and only if E_A(λ) = Kern(A − λI_n) ≠ {0},
which is the case if and only if the matrix A − λIn is not invertible, which
happens if and only if det(A − λIn ) = 0. This proves the following theorem.
Theorem 1.11 (Cayley-Hamilton). The minimal polynomial divides the characteristic polynomial, or in other words, we have

χ_A(A) = 0 ∈ R^{n×n}.
Proof. By extending the definition of the classical adjoint to matrices with polynomials as entries we can write

(XI_n − A) adj(XI_n − A) = det(XI_n − A) I_n = χ_A(X) I_n.  (1.1)

Now by construction the entries of the adjoint are polynomials of degree at most n − 1, hence we can write

adj(XI_n − A) = C_{n−1} X^{n−1} + C_{n−2} X^{n−2} + ... + C_1 X + C_0,

where C_0, ..., C_{n−1} ∈ R^{n×n} are suitable matrices. If we now let

χ_A(X) = X^n + a_{n−1} X^{n−1} + ... + a_1 X + a_0

for some numbers a_0, ..., a_{n−1}, it follows from (1.1) that

(X^n + a_{n−1} X^{n−1} + ... + a_1 X + a_0) I_n
= (XI_n − A)(C_{n−1} X^{n−1} + C_{n−2} X^{n−2} + ... + C_1 X + C_0)
= C_{n−1} X^n + (C_{n−2} − AC_{n−1}) X^{n−1} + ... + (C_0 − AC_1) X − AC_0,

so that by comparing coefficients we see that

C_{n−1} = I_n,   a_j I_n = C_{j−1} − AC_j   (j = 1, ..., n − 1),   a_0 I_n = −AC_0.

With this we obtain that

χ_A(A) = A^n + a_{n−1} A^{n−1} + ... + a_1 A + a_0 I_n
= A^n + A^{n−1}(C_{n−2} − AC_{n−1}) + A^{n−2}(C_{n−3} − AC_{n−2}) + ... + A(C_0 − AC_1) − AC_0
= A^n − A^n + AC_0 − AC_0
= 0,

since every product A^j C_{j−1} in the second line occurs once with each sign and the intermediate terms cancel in pairs, which is what we wanted to show.
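As a quick numerical sanity check (not a substitute for the proof; the example matrix is mine), one can evaluate χ_A at A in Python: np.poly applied to a square array returns the coefficients of its characteristic polynomial.

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
coeffs = np.poly(A)               # [1, -5, -2]: chi_A(X) = X^2 - 5X - 2
n = A.shape[0]
chi_of_A = sum(c * np.linalg.matrix_power(A, n - i)
               for i, c in enumerate(coeffs))
print(np.allclose(chi_of_A, 0))   # True, as Cayley-Hamilton predicts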
Remark 1.12. It is a popular joke to “prove” the theorem of Cayley-Hamilton
by the following short line of equations,
χ_A(A) = det(A · I_n − A) = det(0) = 0.

Why is this “proof” rubbish?
If λ is an eigenvalue of A, then, according to Theorems 1.6, 1.10 and 1.11 we can write

µ_A(X) = (X − λ)^γ p(X)  (1.2)

and

χ_A(X) = (X − λ)^α q(X),  (1.3)

where p(λ), q(λ) ≠ 0. This yields

Definition 1.13. For an eigenvalue λ of a matrix A ∈ R^{n×n} we call the number γ defined by (1.2) the geometric multiplicity of λ. The number α defined by (1.3) is called its algebraic multiplicity.
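As an illustration (the example matrix and all names are mine), both multiplicities can be read off by machine: in sympy, A.eigenvals() returns each eigenvalue together with its algebraic multiplicity α, and one can show that the exponent γ from (1.2) is the first k for which the rank of (A − λI_n)^k stops decreasing.

from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 5]])
n = A.shape[0]

for lam, alpha in A.eigenvals().items():
    N = A - lam * eye(n)
    gamma, M, r = 0, eye(n), n
    while True:
        M = M * N                    # M = (A - lam I)^(gamma + 1)
        if M.rank() == r:            # rank has stabilized
            break
        r = M.rank()
        gamma += 1
    print(f"lambda = {lam}: alpha = {alpha}, gamma = {gamma}")
# lambda = 2: alpha = 2, gamma = 2 and lambda = 5: alpha = 1, gamma = 1,
# consistent with mu_A(X) = chi_A(X) = (X - 2)^2 (X - 5) for this A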
Suppose now that we can write

A = g D g⁻¹

for some invertible matrix g ∈ R^{n×n} and a diagonal matrix D. Then we have for example

A² = (g D g⁻¹)² = g D g⁻¹ g D g⁻¹ = g D² g⁻¹,

and similarly in general

A^k = g D^k g⁻¹.
The good thing about this is of course that it is very easy indeed to compute
a power of a diagonal matrix. We want to study this phenomenon a bit more
closely.
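In code, the identity A^k = g D^k g⁻¹ looks as follows (a numpy sketch with an example matrix of my own; it assumes that A is in fact diagonalizable).

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])         # eigenvalues 5 and 2, so A is diagonalizable
w, g = np.linalg.eig(A)            # columns of g are eigenvectors, D = diag(w)
k = 5
Ak = g @ np.diag(w ** k) @ np.linalg.inv(g)            # A^k = g D^k g^(-1)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True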
Definition 1.14. (i) We call a matrix A ∈ Rn×n similar to a matrix
B ∈ Rn×n , in symbols A ∼ B, if there exists an invertible matrix
g ∈ Rn×n such that A = gBg −1 .
(ii) If A is similar to a diagonal matrix, we call A diagonalizable.
Remark 1.15. (i) If A ∼ B, then we also have B ∼ A, since
A = gBg −1 ⇔ B = g −1 Ag.
(ii) If A ∼ B and B ∼ C then we also have A ∼ C, since it follows from
A = gBg −1 and B = hCh−1 that A = ghCh−1 g −1 = (gh)C(gh)−1 .
In general it is quite hard to decide whether two given matrices are similar
or not. But there are several more or less easy to compute data that may
give some indication. We give some of them in the following theorem.
Theorem 1.16. Let A, B ∈ R^{n×n} be similar. Then the following are all true.

(i) We have χ_A(X) = χ_B(X).

(iii) We have µ_A(X) = µ_B(X).

(v) A and B have the same eigenvalues with the same algebraic and geometric multiplicities.
Proof (of (iii)). We show that µ_A(B) = µ_B(A) = 0, which implies that both minimal polynomials mutually divide each other, which means, since they have the same leading coefficient, that they must be equal. We compute, using B = g⁻¹Ag and (g⁻¹Ag)^k = g⁻¹A^k g,

µ_A(B) = µ_A(g⁻¹Ag) = g⁻¹ µ_A(A) g = 0,

and in the same way µ_B(A) = 0.
Remark 1.17. (i) Theorem 1.16 does NOT say that if the indicated quantities above agree for two matrices, then the matrices are similar. It just states that if they do not agree for two given matrices, then the matrices cannot be similar. It is not even so that two matrices are similar if all of the above quantities agree. For example, for
$$A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

all of these quantities agree (χ_A = χ_B = (X − 2)⁴ and µ_A = µ_B = (X − 2)²), but A and B are not similar: we have rank(A − 2I₄) = 2 but rank(B − 2I₄) = 1, while A = gBg⁻¹ would imply A − 2I₄ = g(B − 2I₄)g⁻¹ and hence equal ranks (see also the sketch after this remark).
(ii) On the other hand one can show that two 2 × 2-matrices are similar
if and only if their minimal polynomials agree (this is false for the
characteristic polynomial!).
(iii) As one can also show, two 3 × 3-matrices are similar if and only if they
have the same minimal and characteristic polynomial.
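The counterexample in (i) can also be checked by machine; here is an exact sympy sketch (my own verification, not from the notes) confirming that A and B share characteristic and minimal polynomial while the ranks of A − 2I₄ and B − 2I₄ differ, so the matrices cannot be similar.

from sympy import Matrix, Symbol, eye, zeros, factor

X = Symbol('X')
A = Matrix([[2, 1, 0, 0], [0, 2, 0, 0], [0, 0, 2, 1], [0, 0, 0, 2]])
B = Matrix([[2, 1, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]])

print(factor(A.charpoly(X).as_expr()))   # (X - 2)**4
print(factor(B.charpoly(X).as_expr()))   # (X - 2)**4
for M in (A, B):                         # mu_M = (X - 2)^2 for both, since
    N = M - 2 * eye(4)                   # (M - 2I)^2 = 0 but M - 2I != 0
    print(N == zeros(4, 4), N * N == zeros(4, 4))          # False True (twice)
print((A - 2 * eye(4)).rank(), (B - 2 * eye(4)).rank())    # 2 1: not similar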
2 Roots of polynomials
2.1 Quadratic and biquadratic equations
Even though it is probably the most well-known topic among those discussed in these notes, we begin by recalling how to solve quadratic equations. Suppose we have an equation of the form

ax² + bx + c = 0,

where a, b, c are given real numbers (with a ≠ 0, otherwise we would in fact be talking about linear equations) and we want to solve for x. Since a ≠ 0, we can divide the whole equation by it and add a clever zero on the left-hand side, giving

$$x^2 + \frac{b}{a}x + \left(\frac{b}{2a}\right)^2 - \left(\frac{b}{2a}\right)^2 + \frac{c}{a} = 0.$$
The first three summands can easily be recognized to equal (x + b/(2a))². This procedure of adding this particular zero is called completing the square. Now we reorder the equation to obtain

$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}.$$
Now there are three cases to distinguish,
(i) If ∆ := b² − 4ac > 0, then we obtain two distinct real solutions by taking the square root, namely

$$x = \frac{-b + \sqrt{\Delta}}{2a} \quad \text{or} \quad x = \frac{-b - \sqrt{\Delta}}{2a}. \qquad (2.1)$$

Note that the square root of a positive real number a is by definition also positive and therefore unique, while the equation x² = a has two solutions, √a and −√a.
(ii) If ∆ = 0, then there is precisely one zero,

$$x = -\frac{b}{2a}.$$

In this case we speak of a double zero, since the derivative of the function f(x) = ax² + bx + c would also vanish there. The zeros in the first case are called simple zeros.
(iii) If ∆ < 0, then there is no real solution, since the square of a real
number is always non-negative.
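The three cases translate directly into a small Python function (the name solve_quadratic is mine); it returns the list of real solutions.

import math

def solve_quadratic(a: float, b: float, c: float) -> list[float]:
    assert a != 0, "not a quadratic equation"
    delta = b * b - 4 * a * c        # the discriminant
    if delta > 0:                    # case (i): two simple zeros
        return [(-b + math.sqrt(delta)) / (2 * a),
                (-b - math.sqrt(delta)) / (2 * a)]
    if delta == 0:                   # case (ii): one double zero
        return [-b / (2 * a)]        # (for general floats, test |delta| < eps)
    return []                        # case (iii): no real solution

print(solve_quadratic(1, 1, -20))    # [4.0, -5.0]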
Proof. If α₁ and α₂ are the two zeros of our polynomial, then we must have

x² + bx + c = (x − α₁)(x − α₂) = x² − (α₁ + α₂)x + α₁α₂,

so comparing coefficients yields b = −(α₁ + α₂) and c = α₁α₂.
An equation of the form

ax⁴ + bx² + c = 0

is called biquadratic. As an example, consider the equation

x⁴ + x² − 20 = 0.
Substituting z = x² gives us the quadratic equation

z² + z − 20 = 0,

whose solutions are z = 4 and z = −5 by (2.1). Since z = x² is non-negative for real x, only z = 4 contributes, and we obtain

x = 2 or x = −2.
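The substitution z = x² is equally mechanical in code; here is a small self-contained sketch (the function name is mine).

import math

def solve_biquadratic(a, b, c):
    # solve a z^2 + b z + c = 0 first, then keep z >= 0 and set x = ±sqrt(z)
    delta = b * b - 4 * a * c
    if delta < 0:
        return []                                  # no real z, hence no real x
    zs = {(-b + math.sqrt(delta)) / (2 * a),
          (-b - math.sqrt(delta)) / (2 * a)}
    roots = []
    for z in zs:
        if z > 0:
            roots += [math.sqrt(z), -math.sqrt(z)]
        elif z == 0:
            roots.append(0.0)
    return roots

print(solve_biquadratic(1, 1, -20))                # x = 2 and x = -2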
In case ∆ < 0, the quadratic equation

ax² + bx + c = 0

has no solutions in the real numbers, which is one motivation for introducing the complex numbers; we recall the most important facts about complex numbers here. We note that one can basically calculate with complex numbers exactly as with real numbers, but since it is not really relevant in the course of the lecture, we won't go into this here. The only thing that we will need is the exponential of a complex number. We will just give it as a definition, although it is possible to derive it properly.
Definition 2.3. For a complex number α = a + bi with real numbers a, b we have

exp(α) := e^α := e^a cos(b) + i e^a sin(b).
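One can let Python confirm that this definition matches the built-in complex exponential; the test value α = 1 + 2i is my own arbitrary choice.

import cmath
import math

a, b = 1.0, 2.0
by_definition = math.exp(a) * math.cos(b) + 1j * math.exp(a) * math.sin(b)
print(cmath.isclose(cmath.exp(complex(a, b)), by_definition))   # True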
We multiply this 2x again by our divisor and subtract the result from the dividend,

    x³ − x² − 3x − 9 = (x − 3)(x² + 2x)
  − x³ + 3x²
        2x² − 3x
      − 2x² + 6x

We repeat this procedure once more and finish up,

    x³ − x² − 3x − 9 = (x − 3)(x² + 2x + 3)
  − x³ + 3x²
        2x² − 3x
      − 2x² + 6x
              3x − 9
            − 3x + 9
                   0
In the next example, the last difference is a polynomial of degree 1, but not 0. This polynomial is our remainder, which we have to add on the right-hand side,

    x⁴ − 3x³ + 2x² − 5x + 7 = (x² − x + 1)(x² − 2x − 1) − 4x + 8
  − x⁴ + x³ − x²
      − 2x³ + x² − 5x
        2x³ − 2x² + 2x
            − x² − 3x + 7
              x² − x + 1
                  − 4x + 8
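The procedure above is easy to mechanize; here is a sketch operating on coefficient lists (highest degree first), exact thanks to Fraction. The function name and representation are my own.

from fractions import Fraction

def polydiv(num, den):
    """Divide num by den; both are coefficient lists, highest degree first."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quot = []
    while len(num) >= len(den):
        factor = num[0] / den[0]     # next coefficient of the quotient
        quot.append(factor)
        padded = den + [Fraction(0)] * (len(num) - len(den))
        num = [a - factor * b for a, b in zip(num, padded)]
        num.pop(0)                   # the leading term cancels by construction
    return quot, num                 # quotient and remainder

print(polydiv([1, -1, -3, -9], [1, -3]))
# quotient [1, 2, 3] (= x^2 + 2x + 3), remainder [0]
print(polydiv([1, -3, 2, -5, 7], [1, -1, 1]))
# quotient [1, -2, -1] (= x^2 - 2x - 1), remainder [-4, 8] (= -4x + 8)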
Lemma 2.4. (i) The product of two numbers is 0 if and only if one of the numbers is zero,

a · b = 0 ⇔ a = 0 or b = 0.
So if we cannot find an integer root, then we have essentially no chance of “guessing” one. In this case, there are numerical methods to obtain approximations for zeros, the most well-known goes back to Sir Isaac Newton
(1642–1727), or (for polynomials of degree 3 and 4) there are even closed
formulas. None of these will be relevant in our course.
Consider for example the polynomial

x³ + 4x² − 7x − 10.

By Proposition 2.5, we have to check the divisors of the absolute term, which is −10 in this case. By trial and error we find that 2 is in fact a zero of this polynomial. Now we use polynomial division,

    x³ + 4x² − 7x − 10 = (x − 2)(x² + 6x + 5)
  − x³ + 2x²
        6x² − 7x
      − 6x² + 12x
              5x − 10
            − 5x + 10
                    0

The remaining zeros of our polynomial are the zeros of x² + 6x + 5 = (x + 1)(x + 5), namely x = −1 and x = −5.
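Finally, the root-guessing strategy can be automated; a sketch (integer_roots is my own name, and I assume a non-zero absolute term, as in Proposition 2.5):

def integer_roots(coeffs):
    """All integer zeros of the polynomial with the given coefficients
    (highest degree first); assumes the constant term is non-zero."""
    c0 = coeffs[-1]
    divisors = [d for d in range(1, abs(c0) + 1) if c0 % d == 0]
    candidates = [s * d for d in divisors for s in (1, -1)]
    def value(x):
        return sum(c * x**i for i, c in enumerate(reversed(coeffs)))
    return [x for x in candidates if value(x) == 0]

print(integer_roots([1, 4, -7, -10]))   # [-1, 2, -5]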
2.3 Exercises
Carry out polynomial division for the following pairs of polynomials.
(i) x² − 4x + 3 divided by x − 3,

Find all roots of the following polynomials.

(i) x² − 4x + 4,

(ii) x² − 4x + 13,

(iv) x³ − 2x² − 5x + 6,