A Proof of The Jordan Normal Form Theorem

The document gives a proof of the Jordan normal form theorem, which states that any matrix is similar to a block-diagonal matrix with Jordan blocks on the diagonal. The proof first decomposes the vector space into a direct sum of invariant subspaces on which the linear operator has a single eigenvalue, and then establishes the Jordan normal form for a nilpotent linear operator (one for which some power of the operator equals zero) to settle the general case.

A proof of the Jordan normal form theorem

The Jordan normal form theorem states that any matrix is similar to a block-
diagonal matrix with Jordan blocks on the diagonal. To prove it, we first
reformulate it in the following way:
Jordan normal form theorem. For any finite-dimensional vector
space V and any linear operator A : V → V, there exist
• a decomposition of V

V = V1 ⊕ V2 ⊕ . . . ⊕ Vk

into a direct sum of invariant subspaces of A;


• a basis e1^(i) , . . . , e_{n_i}^(i) of Vi for each i = 1, . . . , k such that

(A − λi Id)e1^(i) = 0; (A − λi Id)e2^(i) = e1^(i) ; . . . ; (A − λi Id)e_{n_i}^(i) = e_{n_i −1}^(i)

for some λi (which may coincide or differ for different i). The dimensions
of these subspaces and the coefficients λi are determined uniquely up to
permutation.
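As a concrete illustration (not part of the original text), the relations above can be checked numerically for a single 3 × 3 Jordan block with a sample eigenvalue λ = 2, the standard basis vectors playing the role of the chain e1, e2, e3. The sketch assumes Python with numpy is available.

```python
import numpy as np

lam = 2.0
# A single 3x3 Jordan block with eigenvalue lam (illustrative values only)
J = np.array([[lam, 1.0, 0.0],
              [0.0, lam, 1.0],
              [0.0, 0.0, lam]])
e1, e2, e3 = np.eye(3)          # standard basis vectors as the chain
N = J - lam * np.eye(3)         # the operator A - lam*Id

assert np.allclose(N @ e1, 0)   # (A - lam Id) e1 = 0
assert np.allclose(N @ e2, e1)  # (A - lam Id) e2 = e1
assert np.allclose(N @ e3, e2)  # (A - lam Id) e3 = e2
```

Each basis vector is sent by A − λ Id to the previous one, and the first is sent to zero, exactly as in the theorem.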

Invariant subspaces, direct sums, and block matrices


Recall the following important definitions.
Definition 1. A subspace U of a vector space V is called an invariant
subspace of a linear operator A : V → V if for any u ∈ U we have A(u) ∈ U.
Definition 2. For any two subspaces U1 and U2 of a vector space V,
their sum U1 + U2 is defined as the set of all vectors u1 + u2 , where u1 ∈ U1 ,
u2 ∈ U2 . If U1 ∩ U2 = {0}, then the sum of U1 and U2 is called direct, and
is denoted by U1 ⊕ U2 .
Choose any decomposition of V into a direct sum of two subspaces U1 and
U2 . Note that if u1 ∈ U1 and u2 ∈ U2 , then u1 +u2 = 0 implies u1 = u2 = 0.
Indeed, one can rewrite it as u1 = −u2 and use the fact that U1 ∩ U2 = {0}.
This means that any vector of the direct sum can be represented in the form
u1 + u2 , where u1 ∈ U1 and u2 ∈ U2 , in a unique way. An immediate
consequence of that fact is the following important formula:

dim(U1 ⊕ U2 ) = dim U1 + dim U2 .

Indeed, it is clear that we can get a basis for the direct sum by joining
together bases for summands.
Fix a basis for U1 and a basis for U2 . Joining them together, we get a
basis for V = U1 ⊕ U2 . When we write down the matrix of any linear operator
with respect to this basis, we get a block matrix

    ( A11  A12 )
    ( A21  A22 )

whose splitting into blocks corresponds to the way our basis is split into two parts.
Let us formulate important facts which are immediate from the definition
of the matrix of a linear operator.
1. A12 = 0 if and only if U2 is invariant;
2. A21 = 0 if and only if U1 is invariant.
Thus, this matrix is block-triangular if and only if one of the subspaces is
invariant, and is block-diagonal if and only if both subspaces are invariant.
This admits an obvious generalisation to the case of a larger number of
summands in the direct sum.
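Facts 1 and 2 can be made concrete with a small numeric sketch (the matrix entries below are arbitrary, chosen only for illustration): a 4 × 4 matrix whose A21 block is zero, so that U1 = span(e1, e2) is invariant.

```python
import numpy as np

# Block form [[A11, A12], [A21, A22]] with A21 = 0, so U1 = span(e1, e2)
# should be invariant; nonzero entries are arbitrary illustrative values
A = np.array([[1.0, 2.0, 5.0, 6.0],
              [3.0, 4.0, 7.0, 8.0],
              [0.0, 0.0, 9.0, 1.0],
              [0.0, 0.0, 2.0, 3.0]])

for u in np.eye(4)[:2]:           # the two basis vectors of U1
    image = A @ u
    # the image has no components along e3, e4, i.e. it stays inside U1
    assert np.allclose(image[2:], 0)
```

Since A12 is nonzero here, U2 = span(e3, e4) is not invariant, matching fact 1.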

Generalised eigenspaces
The first step of the proof is to decompose our vector space into the direct
sum of invariant subspaces where our operator has only one eigenvalue.
Definition 3. Let N1 (λ) = Ker(A − λ Id), N2 (λ) = Ker(A − λ Id)2 , . . . ,
Nm (λ) = Ker(A − λ Id)m , . . . Clearly,

N1 (λ) ⊂ N2 (λ) ⊂ . . . ⊂ Nm (λ) ⊂ . . .

Since we only work with finite-dimensional vector spaces, this sequence of
subspaces cannot be strictly increasing: if Ni (λ) ≠ Ni+1 (λ), then, obvi-
ously, dim Ni+1 (λ) ≥ 1 + dim Ni (λ). It follows that for some k we have
Nk (λ) = Nk+1 (λ).
Lemma 1. We have Nk (λ) = Nk+1 (λ) = Nk+2 (λ) = . . .
Let us prove that Nk+l (λ) = Nk+l−1 (λ) by induction on l. Note that the
induction basis (the case l = 1) is immediate from the choice of k. Suppose
that Nk+l (λ) = Nk+l−1 (λ); let us prove Nk+l+1 (λ) = Nk+l (λ). If we assume
that it is false, then there is a vector v such that

v ∈ Nk+l+1 (λ), v ∉ Nk+l (λ),

that is
(A − λ Id)k+l+1 (v) = 0, (A − λ Id)k+l (v) ≠ 0.
Put w = (A − λ Id)(v). Obviously, we have

(A − λ Id)k+l (w) = 0, (A − λ Id)k+l−1 (w) ≠ 0,

which contradicts the induction hypothesis, and our statement follows.
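The stabilisation of the chain can be observed numerically. In the hypothetical 4 × 4 example below we take λ = 0, so B plays the role of A − λ Id, and compute dim Nm as dim V − rk Bm by rank-nullity.

```python
import numpy as np

# Hypothetical example with lam = 0: B plays the role of A - lam*Id.
# B sends e3 -> e2 -> e1 -> 0 and e4 -> 0.
B = np.zeros((4, 4))
B[0, 1] = B[1, 2] = 1.0

# dim N_m = dim Ker B^m = dim V - rk B^m, by rank-nullity
dims = [4 - np.linalg.matrix_rank(np.linalg.matrix_power(B, m))
        for m in range(1, 6)]

# strictly increasing until N_k = N_{k+1}, then constant forever
assert dims == [2, 3, 4, 4, 4]
```

Here k = 3: the dimensions grow by at least one at each step until the chain stops, and once two consecutive kernels coincide, all later ones do too, as Lemma 1 asserts.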


Lemma 2. Ker(A − λ Id)k ∩ Im(A − λ Id)k = {0}.

Indeed, assume there is a vector v ∈ Ker(A − λ Id)k ∩ Im(A − λ Id)k .
This means that (A − λ Id)k (v) = 0 and that there exists a vector w
such that v = (A − λ Id)k (w). It follows that (A − λ Id)2k (w) = 0, so
w ∈ Ker(A − λ Id)2k = N2k (λ). But from the previous lemma we know that
N2k (λ) = Nk (λ), so w ∈ Ker(A − λ Id)k . Thus, v = (A − λ Id)k (w) = 0,
which is what we need.
Lemma 3. V = Ker(A − λ Id)k ⊕ Im(A − λ Id)k .
Indeed, consider the direct sum of these two subspaces. It is a subspace of
V of dimension dim Ker(A − λ Id)k + dim Im(A − λ Id)k . Let A′ = (A − λ Id)k .
Earlier we proved that for any linear operator, its rank and the dimension
of its kernel sum up to the dimension of the vector space where it acts.
Since rk A′ = dim Im A′ , we have

dim Ker(A − λ Id)k + dim Im(A − λ Id)k = dim Ker A′ + rk A′ = dim V.

A subspace of V whose dimension is equal to dim V has to coincide with V,
so the lemma follows.
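Lemmas 2 and 3 can be sanity-checked on a small example (values chosen only for illustration): take A with eigenvalue 1 in a 2 × 2 Jordan block and a second eigenvalue 5, and k = 2. The kernel and image of A′ = (A − Id)² then have complementary dimensions and intersect trivially.

```python
import numpy as np

# Illustrative operator: eigenvalue 1 in a 2x2 Jordan block, plus eigenvalue 5
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 5.0]])
lam, k = 1.0, 2
Ak = np.linalg.matrix_power(A - lam * np.eye(3), k)  # A' = (A - lam Id)^k

rank = np.linalg.matrix_rank(Ak)      # dim Im A'
ker_dim = 3 - rank                    # dim Ker A', by rank-nullity
assert ker_dim + rank == 3            # Lemma 3: dim Ker + dim Im = dim V

# Lemma 2: Ker A' and Im A' intersect trivially, so a spanning set of the
# kernel joined with the columns of A' (which span Im A') spans all of V
_, s, Vt = np.linalg.svd(Ak)
ker_basis = Vt[rank:].T               # columns span Ker A'
assert np.linalg.matrix_rank(np.hstack([ker_basis, Ak])) == 3
```

The rank computation mirrors the rank-nullity argument in the proof of Lemma 3; the final rank check is exactly the statement that the sum Ker A′ + Im A′ is direct and fills V.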
Lemma 4. Ker(A − λ Id)k and Im(A − λ Id)k are invariant subspaces
of A.
Indeed, note that A(A − λ Id) = (A − λ Id)A, so
– if (A − λ Id)k (v) = 0, then (A − λ Id)k (A(v)) = A(A − λ Id)k (v) = 0;
– if v = (A−λ Id)k (w), then A(v) = A(A−λ Id)k (w) = (A−λ Id)k (A(w)).
To complete this step, we use induction on dim V. Note that on the
invariant subspace Ker(A − λ Id)k the operator A has only one eigenvalue (if
Av = µv for some 0 ≠ v ∈ Ker(A − λ Id)k , then (A − λ Id)v = (µ − λ)v, and
0 = (A − λ Id)k v = (µ − λ)k v, so µ = λ), and the dimension of Im(A − λ Id)k
is less than dim V (if λ is an eigenvalue of A), so we can apply the induction
hypothesis to A acting on the vector space V′ = Im(A − λ Id)k . This results
in the following
Theorem. For any linear operator A : V → V whose distinct eigen-
values are λ1 , . . . , λk , there exist integers n1 , . . . , nk such that

V = Ker(A − λ1 Id)n1 ⊕ . . . ⊕ Ker(A − λk Id)nk .
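A hypothetical numeric check of this decomposition (example values only): for a block-diagonal A with a 2 × 2 Jordan block of eigenvalue 2 and a 1 × 1 block with eigenvalue 7, the generalised eigenspaces have dimensions 2 and 1, which sum to dim V.

```python
import numpy as np

# Example: a 2x2 Jordan block with eigenvalue 2 and a 1x1 block with eigenvalue 7
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 7.0]])

def gen_eigenspace_dim(A, lam, n):
    """dim Ker (A - lam Id)^n, computed via rank-nullity."""
    M = np.linalg.matrix_power(A - lam * np.eye(len(A)), n)
    return len(A) - np.linalg.matrix_rank(M)

dims = [gen_eigenspace_dim(A, 2.0, 2), gen_eigenspace_dim(A, 7.0, 1)]
assert dims == [2, 1]
assert sum(dims) == 3   # the generalised eigenspaces together fill V
```

Note that n1 = 2 is genuinely needed here: Ker(A − 2 Id) alone is only one-dimensional, which is exactly why generalised eigenspaces, not ordinary eigenspaces, appear in the theorem.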

Normal form for a nilpotent operator


The second step in the proof is to establish the Jordan normal form theorem
for an operator B : V → V such that Bk = 0 (such operators are called
nilpotent). This essentially completes the proof: we then put B = A − λ Id
and use the result that we already obtained; we discuss this more precisely
below.

Let us slightly modify the notation of the previous section:
put N1 = Ker B, N2 = Ker B2 , . . . , Nm = Ker Bm , . . . We have
Nk = Nk+1 = Nk+2 = . . . = V.
To make our proof more neat, we shall use the following definition.
Definition 4. For a vector space V and a subspace U ⊂ V, we say that
a sequence of vectors e1 , . . . , el is a basis of V relative to U if any vector
v ∈ V can be uniquely represented in the form c1 e1 + c2 e2 + . . . + cl el + u,
where c1 , . . . , cl are coefficients, and u ∈ U. In particular, the only linear
combination of e1 , . . . , el that belongs to U is the trivial combination (all
coefficients are equal to zero).
Example 1. The usual notion of a basis is contained in the new notion
of a relative basis: a usual basis of V is a basis relative to U = {0}.
Definition 5. We say that a sequence of vectors e1 , . . . , el is linearly
independent relative to U if the only linear combination of e1 , . . . , el that
belongs to U is the trivial combination (all coefficients are equal to zero).
Exercise 1. Any sequence of vectors that is linearly independent relative
to U can be extended to a basis relative to U.
Now we are going to prove our statement, constructing a required basis
in k steps. First, find a basis of V = Nk relative to Nk−1 . Let e1 , . . . , es be
vectors of this basis.
Lemma 5. The vectors e1 , . . . , es , B(e1 ), . . . , B(es ) are linearly inde-
pendent relative to Nk−2 .
Indeed, assume that

c1 e1 + . . . + cs es + d1 B(e1 ) + . . . + ds B(es ) ∈ Nk−2 .

Since ei ∈ Nk , we have B(ei ) ∈ Nk−1 ⊃ Nk−2 , so

c1 e1 + . . . + cs es ∈ −d1 B(e1 ) − . . . − ds B(es ) + Nk−2 ⊂ Nk−1 ,

which means that c1 = . . . = cs = 0 (e1 , . . . , es form a basis relative to
Nk−1 ). Thus,

B(d1 e1 + . . . + ds es ) = d1 B(e1 ) + . . . + ds B(es ) ∈ Nk−2 ,

so
d1 e1 + . . . + ds es ∈ Nk−1 ,
and we deduce that d1 = . . . = ds = 0 (e1 , . . . , es form a basis relative to
Nk−1 ), so the lemma follows.
Now we extend this collection of vectors by vectors f1 , . . . , ft which
together with B(e1 ), . . . , B(es ) form a basis of Nk−1 relative to Nk−2 . Abso-
lutely analogously one can prove

Lemma 6. The vectors e1 , . . . , es , B(e1 ), . . . , B(es ), B2 (e1 ), . . . , B2 (es ),
f1 , . . . , ft , B(f1 ), . . . , B(ft ) are linearly independent relative to Nk−3 .
We continue that extension process until we end up with a usual basis of
V of the following form:

e1 , . . . , es , B(e1 ), . . . , B(es ), B2 (e1 ), . . . , Bk−1 (e1 ), . . . , Bk−1 (es ),


f1 , . . . , ft , B(f1 ), . . . , Bk−2 (f1 ), . . . , Bk−2 (ft ),
...,
g1 , . . . , gu ,

where the first line contains vectors from Nk , vectors from Nk−1 , . . . , and
vectors from N1 ; the second line contains vectors from Nk−1 , vectors from
Nk−2 , . . . , and vectors from N1 ; . . . ; the last line contains just vectors from N1 .
To get a Jordan basis from this basis, we just renumber the basis vectors.
Note that the vectors

v1 = Bk−1 (e1 ), v2 = Bk−2 (e1 ), . . . , vk−1 = B(e1 ), vk = e1

form a “thread” of vectors for which B(v1 ) = 0 and B(vi ) = vi−1 for i > 1;
these are precisely the formulas for the action of a Jordan block matrix.
Arranging all basis vectors into threads like this, we obtain a Jordan basis.
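The thread construction can be sketched numerically for a hypothetical nilpotent B: starting from any vector e with B2 (e) ≠ 0, the vectors B2 (e), B(e), e form a thread, and in that basis B is a single Jordan block with λ = 0.

```python
import numpy as np

# Hypothetical nilpotent operator: B e2 = e1, B e3 = e2, B e1 = 0
B = np.zeros((3, 3))
B[0, 1] = B[1, 2] = 1.0

e = np.array([1.0, 1.0, 1.0])     # any vector with B^2(e) != 0 will do
v1, v2, v3 = B @ B @ e, B @ e, e  # the thread v1 = B^2(e), v2 = B(e), v3 = e

assert np.allclose(B @ v1, 0)     # B(v1) = 0
assert np.allclose(B @ v2, v1)    # B(v2) = v1
assert np.allclose(B @ v3, v2)    # B(v3) = v2

# In the basis (v1, v2, v3) the matrix of B is a single Jordan block with lam = 0
P = np.column_stack([v1, v2, v3])
assert np.allclose(np.linalg.inv(P) @ B @ P,
                   np.array([[0.0, 1.0, 0.0],
                             [0.0, 0.0, 1.0],
                             [0.0, 0.0, 0.0]]))
```

The change-of-basis check at the end is the matrix form of the statement that a thread realises the action of one Jordan block.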
Remark 1. Note that if we denote by md the number of Jordan blocks
of size d, we have

m1 + m2 + . . . + mk = dim N1 ,
m2 + . . . + mk = dim N2 − dim N1 ,
...
mk = dim Nk − dim Nk−1 ,

so the sizes of the Jordan blocks are uniquely determined by our operator.
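Subtracting consecutive equations of this system gives md = (dim Nd − dim Nd−1 ) − (dim Nd+1 − dim Nd ), which the following hypothetical example checks for a nilpotent B with one block of size 2 and one of size 1.

```python
import numpy as np

# Nilpotent example with one Jordan block of size 2 and one of size 1
B = np.zeros((3, 3))
B[0, 1] = 1.0                     # B e2 = e1; B e1 = B e3 = 0

def dim_N(m):
    """dim N_m = dim Ker B^m, via rank-nullity (so dim N_0 = 0)."""
    P = np.linalg.matrix_power(B, m)
    return len(B) - np.linalg.matrix_rank(P)

# m_d = (dim N_d - dim N_{d-1}) - (dim N_{d+1} - dim N_d)
m = {d: 2 * dim_N(d) - dim_N(d - 1) - dim_N(d + 1) for d in range(1, 4)}
assert m == {1: 1, 2: 1, 3: 0}    # one block of size 1, one of size 2, none of size 3
```

This is exactly the sense in which the block sizes are determined by the operator alone: they are recovered from the dimensions dim Nm , which do not depend on any choice of basis.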

General case of the Jordan normal form theorem


From the second section we know that V can be decomposed into a direct sum
of invariant subspaces Ker(A − λi Id)ni . From the third section, changing the
notation and putting B = A − λi Id, we deduce that each of these subspaces
can be decomposed into a direct sum of subspaces on which A acts by a Jordan
block matrix; the sizes of the blocks can be computed from the dimension data
listed above. This completes the proof of the Jordan normal form theorem.
