BLOCK MATRICES IN LINEAR ALGEBRA

STEPHAN RAMON GARCIA AND ROGER A. HORN

Abstract. Linear algebra is best done with block matrices. As evidence in support of this thesis, we present numerous examples suitable for classroom presentation.

1. Introduction
This paper is addressed to instructors of a first course in linear algebra, who
need not be specialists in the field. We aim to convince the reader that linear
algebra is best done with block matrices. In particular, flexible thinking about the
process of matrix multiplication can reveal concise proofs of important theorems
and expose new results. Viewing linear algebra from a block-matrix perspective
gives an instructor access to useful techniques, exercises, and examples.
Many of the techniques, proofs, and examples presented here are familiar to spe-
cialists in linear algebra or operator theory. We think that everyone who teaches
undergraduate linear algebra should be aware of them. A popular current textbook
says that block matrices “appear in most modern applications of linear algebra be-
cause the notation highlights essential structures in matrix analysis. . . ” [5, p. 119].
The use of block matrices in linear algebra instruction aligns mathematics peda-
gogy better with topics in advanced courses in pure mathematics, computer science,
data science, statistics, and other fields. For example, block-matrix techniques are
standard fare in modern algorithms [3]. Textbooks such as [2–7] make use of block
matrices.
We take the reader on a tour of block-matrix methods and applications. In
Section 2, we use right-column partitions to explain several standard first-course
results. In Section 3, we use left-column partitions to introduce the full-rank factor-
ization, prove the invariance of the number of elements in a basis, and establish the
equality of row and column rank. Instructors of a first linear algebra course will be
familiar with these topics, but perhaps not with a block matrix / column partition
approach to them. Section 4 concerns block-column matrices. Applications include
justification of a matrix-inversion algorithm and a proof of the uniqueness of the re-
duced row echelon form. Block-row and block-column matrices are used in Section
5 to obtain inequalities for the rank of sums and products of matrices, along with
algebraic characterizations of matrices that share the same column space or null
space. The preceding material culminates in Section 6, in which we consider block
matrices of several types and prove that the geometric multiplicity of an eigenvalue
is at most its algebraic multiplicity. We also obtain a variety of determinantal
results that are suitable for presentation in class. We conclude in Section 7 with
Kronecker products and several applications.
Key words and phrases. Matrix, matrix multiplication, block matrix, Kronecker product, rank,
eigenvalues.

Notation: We frame our discussion for complex matrices. However, all of our
numerical examples involve only real matrices, which may be preferred by some
first-course instructors. We use Mm×n to denote the set of all m × n complex
matrices; Mn denotes the set of all n × n complex matrices. Boldface letters, such
as a, b, c, denote column vectors; e1 , e2 , . . . , en is the standard basis of Cn . We
regard elements of Cm as column vectors; that is, m × 1 matrices. If A ∈ Mm×n ,
then each column of A belongs to Mm×1 . The transpose of a matrix A is denoted
by AT . The null space and column space of a matrix A are denoted by null A and
col A, respectively. The trace and determinant of a square matrix A are denoted
by tr A and det A, respectively.

2. Right-column partitions
If
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 4 & 5 & 2 \\ 6 & 7 & 1 \end{bmatrix}, \tag{1}$$
then the entries of AB are dot products of rows of A with columns of B:
$$AB = \begin{bmatrix} 1\cdot 4+2\cdot 6 & 1\cdot 5+2\cdot 7 & 1\cdot 2+2\cdot 1 \\ 3\cdot 4+4\cdot 6 & 3\cdot 5+4\cdot 7 & 3\cdot 2+4\cdot 1 \end{bmatrix} = \begin{bmatrix} 16 & 19 & 4 \\ 36 & 43 & 10 \end{bmatrix}. \tag{2}$$
But there are other ways to organize these computations. We examine right-column
partitions in this section. If A ∈ Mm×r and B = [b1 b2 . . . bn ] ∈ Mr×n , then the
jth column of AB is Abj . That is,
AB = [Ab1 Ab2 . . . Abn ]. (3)
An intentional approach to column partitions can facilitate proofs of important
results from elementary linear algebra.
Example 4. If A and B are the matrices from (1), then B = [b1 b2 b3 ], in which
$$b_1 = \begin{bmatrix} 4 \\ 6 \end{bmatrix}, \quad b_2 = \begin{bmatrix} 5 \\ 7 \end{bmatrix}, \quad\text{and}\quad b_3 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$
Partitioned matrix multiplication yields the expected answer (2):
$$[Ab_1 \ \, Ab_2 \ \, Ab_3] = \begin{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 4 \\ 6 \end{bmatrix} & \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 5 \\ 7 \end{bmatrix} & \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} 16 \\ 36 \end{bmatrix} & \begin{bmatrix} 19 \\ 43 \end{bmatrix} & \begin{bmatrix} 4 \\ 10 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 16 & 19 & 4 \\ 36 & 43 & 10 \end{bmatrix} = AB.$$
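Instructors who want a quick machine check of (3) can use a few lines of NumPy; the sketch below is our added illustration (not part of the original example) and uses the matrices from (1).

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[4, 5, 2], [6, 7, 1]])

# The jth column of AB is A times the jth column of B, as in (3).
columns = [A @ B[:, [j]] for j in range(B.shape[1])]
AB_by_columns = np.hstack(columns)

print(AB_by_columns)                          # [[16 19  4] [36 43 10]]
print(np.array_equal(AB_by_columns, A @ B))   # True
```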
Example 5. Matrix-vector equations can be bundled together. For example, sup-
pose that x1 , x2 , . . . , xk are eigenvectors of A ∈ Mn for the eigenvalue λ and let
X = [x1 x2 . . . xk ] ∈ Mn×k . Then
AX = [Ax1 Ax2 . . . Axk ] = [λx1 λx2 . . . λxk ] = λX.
This observation can be used to prove that the geometric multiplicity of an eigen-
value is at most its algebraic multiplicity; see Example 36.
The following example provides a short proof of an important implication in “the
invertible matrix theorem,” which is at the core of a first course in linear algebra.

Example 6 (Universal consistency yields right inverse). If Ax = b is consistent


for each b ∈ Cn , there are bi ∈ Cn such that Abi = ei for i = 1, 2, . . . , n. Then
A[b1 b2 . . . bn ] = [Ab1 Ab2 . . . Abn ] = [e1 e2 . . . en ] = I,
so AB = I for B = [b1 b2 . . . bn ].
In the preceding example, we obtained a right inverse for a square matrix A. The
fact that a right inverse for A is also a left inverse is nontrivial; it can fail for linear
transformations if the underlying vector space is not finite dimensional [2, P.2.7].
Here is an explanation that is based on column partitions.
Example 7 (One-sided inverses are two-sided inverses). If A, B ∈ Mn and AB = I,
then A(Bx) = x for all x ∈ Cn and hence col A = Cn . The Dimension Theorem [2,
Cor. 2.5.4] ensures that null A = {0}. Partition I − BA = [x1 x2 . . . xn ] according
to its columns. Then
[Ax1 Ax2 . . . Axn ] = A[x1 x2 . . . xn ] = A(I − BA)
= A − (AB)A = A − IA = 0,
so each xi = 0 since null A = {0}. Thus, I − BA = 0 and hence BA = I.
Although it cannot be recommended as a practical numerical algorithm, Cramer’s
rule is an important concept. Why does it work?
Example 8 (Cramer’s rule). Let A = [a1 a2 . . . an ] ∈ Mn be invertible, let b ∈
Cn , and let Ai ∈ Mn be the matrix obtained by replacing the ith column of A with
b. Then there is a unique x = [x1 x2 . . . xn ]T ∈ Cn such that Ax = b. Cofactor
expansion along the ith row of $[e_1 \ \ldots \ e_{i-1} \ x \ e_{i+1} \ \ldots \ e_n]$ (the ith column of each matrix below is the one that has been replaced) reveals that
$$\begin{aligned}
x_i &= \det[e_1 \ \ldots \ e_{i-1} \ x \ e_{i+1} \ \ldots \ e_n] \\
&= \det[A^{-1}a_1 \ \ldots \ A^{-1}a_{i-1} \ A^{-1}b \ A^{-1}a_{i+1} \ \ldots \ A^{-1}a_n] \\
&= \det\bigl(A^{-1}[a_1 \ \ldots \ a_{i-1} \ b \ a_{i+1} \ \ldots \ a_n]\bigr) = \det(A^{-1}A_i) = \frac{\det A_i}{\det A}.
\end{aligned}$$
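The conclusion of Example 8 is easy to test numerically. The sketch below is an added illustration (Cramer's rule is not recommended as a practical algorithm); the 3 × 3 system is our own arbitrary choice.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.linalg.solve(A, b)       # reference solution of Ax = b
det_A = np.linalg.det(A)

for i in range(A.shape[0]):
    A_i = A.copy()
    A_i[:, i] = b               # replace the ith column of A with b
    print(np.linalg.det(A_i) / det_A, x[i])   # the two values agree
```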
3. Left-column partitions
We have gotten some mileage out of partitioning the matrix on the right-hand
side of a product. If we partition the matrix on the left-hand side of a product, other
opportunities emerge. If A = [a1 a2 . . . an ] ∈ Mm×n and x = [x1 x2 . . . xn ]T ∈
Cn , then
Ax = x1 a1 + x2 a2 + · · · + xn an . (9)
That is, Ax is a linear combination of the columns of A.
The next example illustrates that relationships between geometric objects, such
as vectors and subspaces, can often be framed algebraically.
Example 10 (Geometry and matrix algebra). Let A ∈ Mm×n and B ∈ Mm×k .
We claim that
col B ⊆ col A ⇐⇒ there exists an X ∈ Mn×k such that AX = B;
moreover, if the columns of A are linearly independent, then X is unique. If each
column of B = [b1 b2 . . . bk ] ∈ Mm×k is a linear combination of the columns of
A ∈ Mm×n , then (9) ensures that there are xi ∈ Cn such that bi = Axi for each i;

if the columns of A are linearly independent, then the xi are uniquely determined.
Let X = [x1 x2 . . . xk ] ∈ Mn×k . Then
B = [b1 b2 . . . bk ] = [Ax1 Ax2 . . . Axk ] = A[x1 x2 . . . xk ] = AX.
Conversely, if AX = B, then (9) indicates that each column of B lies in col A.
The following example uses Example 10 to show that any two bases for the same
subspace of Cn have the same number of elements [1], [2, P.3.38]. It relies on the
fact that tr XY = tr Y X if both products are defined; see [2, (0.3.5)].
Example 11 (Number of elements in a basis). If a1 , a2 , . . . , ar and b1 , b2 , . . . , bs
are bases for the same subspace of Cn , we claim that r = s. If
A = [a1 a2 . . . ar ] ∈ Mn×r and B = [b1 b2 . . . bs ] ∈ Mn×s ,
then col A = col B. Example 10 ensures that B = AX and A = BY , in which
X ∈ Mr×s and Y ∈ Ms×r . Thus,
A(Ir − XY ) = A − AXY = A − BY = A − A = 0.
Since A has linearly independent columns, each column of Ir − XY is zero; that is,
XY = Ir . A similar argument shows that Y X = Is and hence
r = tr Ir = tr Y X = tr XY = tr Is = s.
Another consequence of the principle in Example 10 is a second explanation of
the equality of left and right inverses.
Example 12 (One-sided inverses are two-sided inverses). Suppose that A, B ∈ Mn
and AB = I. If Bx = 0, then x = Ix = A(Bx) = A0 = 0. This shows that
null B = {0}. The Dimension Theorem ensures that col B = Cn , so there is an
X ∈ Mn such that I = BX (this is where we use Example 10). Then BA = BAI =
BABX = BIX = BX = I.
A fundamental result from elementary linear algebra is the equality of rank A
and rank AT ; that is, “column rank equals row rank.” The identity (9) permits us
to give a simple explanation.
Example 13 (Equality of row and column rank). For A ∈ Mm×n , we claim that
rank A = rank AT . We may assume that k = rank A ≥ 1. Let the columns of
B ∈ Mm×k be a basis for col A. Example 10 ensures that there is an X ∈ Mk×n
such that A = BX. Thus, AT = X T B T , so col AT ⊆ col X T . Then
rank AT = dim col AT ≤ dim col X T ≤ k = rank A.
Now apply the same reasoning to AT and obtain rank A = rank AT .
We finish this section with a matrix factorization that plays a role in many
block-matrix arguments.
Example 14 (Full-rank factorization). Let A = [a1 a2 . . . an ] ∈ Mm×n be nonzero,
let r = rank A, and let the columns of X ∈ Mm×r be a basis for col A. We claim that
there is a unique Y ∈ Mr×n such that A = XY ; moreover, rank Y = rank X = r.
Since the r columns of X are a basis for col A, we have rank X = r and col A = col X.
Example 10 ensures that there is a Y ∈ Mr×n such that A = XY . Moreover, Y is
unique because each column of A is a unique linear combination of the columns of
X. Finally, invoke Example 13 to compute
r = rank AT = dim col(Y T X T ) ≤ dim col Y T ≤ r.

Therefore, rank Y = rank Y T = dim col Y T = r.


In the preceding example, the matrix X is never unique. One way to construct
a basis for col A is related to the reduced row echelon form (RREF) of A. Let A0 =
0 ∈ Cm ; for each j = 1, 2, . . . , n let Aj = [a1 a2 . . . aj ]. For j ∈ {1, 2, . . . , n}, we
say that aj is a basic column of A if aj ∉ col Aj−1 (that is, if rank Aj > rank Aj−1 ).
The basic columns of A comprise a basis for col A and correspond to the pivot
columns of the RREF of A; see [6, Problem 3.9.8].
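For readers who like to experiment, here is a minimal sketch (ours, with an arbitrarily chosen matrix) that selects the basic columns exactly as described above, by keeping the columns whose inclusion increases the rank, and then solves for the factor Y of Example 14.

```python
import numpy as np

def full_rank_factorization(A, tol=1e-10):
    """Return X, Y with A = X @ Y, where the columns of X are the basic columns of A."""
    m, n = A.shape
    basic, r = [], 0
    for j in range(n):
        if np.linalg.matrix_rank(A[:, basic + [j]], tol=tol) > r:
            basic.append(j)
            r += 1
    X = A[:, basic]                              # m x r, a basis for col A
    Y, *_ = np.linalg.lstsq(X, A, rcond=None)    # r x n; unique since X has full column rank
    return X, Y

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 2.0]])
X, Y = full_rank_factorization(A)
print(np.allclose(X @ Y, A), X.shape, Y.shape)   # True (3, 2) (2, 3)
```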

4. Block columns
Let A ∈ Mm×r and B ∈ Mr×n . Write
B = [B1 B2 ],
in which B1 ∈ Mr×k and B2 ∈ Mr×(n−k) ; that is, group the first k columns of B to
create B1 and group the remaining n − k columns of B to create B2 . Then,
AB = A[B1 B2 ] = [AB1 AB2 ]; (15)
this is the block version of (3). It can be generalized to involve multiple blocks Bi .
We consider two pedagogically-oriented applications of the block-column approach
(15) to matrix multiplication: a justification of the “side-by-side” matrix inversion
algorithm and a proof of the uniqueness of the reduced row echelon form of a matrix.
First, we consider some examples that illustrate (15).
Example 16. Let A and B be as in (1) and write B = [B1 B2 ], in which
$$B_1 = \begin{bmatrix} 4 & 5 \\ 6 & 7 \end{bmatrix} \quad\text{and}\quad B_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$
Then
$$AB = [AB_1 \ \ AB_2] = \begin{bmatrix} 16 & 19 & 4 \\ 36 & 43 & 10 \end{bmatrix},$$

as computed in (2).
Example 17 (Extending to a basis). If the list x1 , x2 , . . . , xk ∈ Cn is linearly
independent, then it can be extended to a basis of Cn . Equivalently, if X ∈ Mn×k
has linearly independent columns, then there is a Y ∈ Mn×(n−k) such that [X Y ] ∈
Mn is invertible. This observation has lots of applications; see Example 36.
Example 18 (Inversion algorithm). Let A ∈ Mn be invertible and let R be a
product of elementary matrices that encode a sequence of row operations that row
reduces A to I. Then RA = I; that is, R = A−1 . Then (15) ensures that
R[A I] = [RA R] = [I A−1 ].
Thus, if one can row reduce the block matrix [A I] to [I X], then X = A−1 .
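A bare-bones version of the side-by-side algorithm is easy to script. The following sketch is ours and is only illustrative: it assumes A is invertible and, for simplicity, omits the row interchanges and pivoting safeguards a robust implementation would need.

```python
import numpy as np

def invert_side_by_side(A):
    """Row reduce [A | I] to [I | X]; then X = A^{-1} (no pivoting, for simplicity)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # the block matrix [A I]
    for i in range(n):
        M[i] = M[i] / M[i, i]                     # scale the pivot row
        for k in range(n):
            if k != i:
                M[k] = M[k] - M[k, i] * M[i]      # eliminate column i in the other rows
    return M[:, n:]                               # the right-hand block is A^{-1}

A = np.array([[2.0, 1.0], [1.0, 1.0]])
print(invert_side_by_side(A))                              # [[ 1. -1.] [-1.  2.]]
print(np.allclose(invert_side_by_side(A) @ A, np.eye(2)))  # True
```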
Our second application of block columns is the uniqueness of the RREF. The
RREF underpins almost everything in a typical first linear algebra course. It is used
to parametrize solution sets of systems of linear equations and to compute the rank
of a small matrix (for practical computations other procedures are preferred [3]).

Example 19 (Uniqueness of RREF). We claim that each A ∈ Mm×n has a unique


reduced row echelon form E. If A = 0, then E = 0, so we assume that A ≠ 0 and
proceed by induction on the number of columns of A. In the base case n = 1, the
RREF E = e1 is uniquely determined. Suppose that n ≥ 2, partition A = [A′ a],
and suppose that R ∈ Mm encodes a sequence of row operations (which need not
be unique) that reduce A to its RREF, which we partition as E = [E′ y]. Then
RA = [RA′ Ra] = [E′ y]. The induction hypothesis ensures that RA′ = E′ is the
unique RREF of A′. Let r = rank A′. There are two cases to consider: either
a ∈ col A′ or a ∉ col A′. If a ∉ col A′, then it is a basic column of A and y = e_{r+1}
is uniquely determined. If a ∈ col A′, then it is a unique linear combination of
the basic columns of A′; that is, a = A′x, in which x is uniquely determined by
the condition that it has a zero entry in each position corresponding to a nonbasic
column of A′. Then y = Ra = RA′x = E′x, in which both E′ and x are uniquely
determined.

5. Block rows and columns


What we have done for columns we can also do for rows. The following examples
illustrate a few results derived from block-matrix multiplication. Chief among these
are several important rank inequalities and characterizations of matrices with the
same column space or null space.
Example 20. A numerical example illustrates the general principle. Write
$$A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 3 & 4 \\ 0 & 5 & 0 \end{bmatrix} = [X \ \ Z] \quad\text{and}\quad B = \begin{bmatrix} 3 & 0 \\ 1 & 4 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} Y \\ W \end{bmatrix}.$$
Then
$$AB = [X \ \ Z]\begin{bmatrix} Y \\ W \end{bmatrix} = XY + ZW = \begin{bmatrix} 1 & 0 \\ 0 & 3 \\ 0 & 5 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 1 & 4 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \\ 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 3 & 12 \\ 5 & 20 \end{bmatrix} + \begin{bmatrix} 0 & 2 \\ 0 & 4 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 3 & 2 \\ 3 & 16 \\ 5 & 20 \end{bmatrix}. \tag{21}$$
A computation verifies that this evaluation of AB agrees with the standard method.
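That agreement is simple to confirm by machine; the sketch below (added for illustration) slices A into [X Z] and B into the blocks Y and W and checks that XY + ZW equals AB.

```python
import numpy as np

A = np.array([[1, 0, 2], [0, 3, 4], [0, 5, 0]])
B = np.array([[3, 0], [1, 4], [0, 1]])

X, Z = A[:, :2], A[:, 2:]    # column blocks of A
Y, W = B[:2, :], B[2:, :]    # row blocks of B

print(X @ Y + Z @ W)                          # [[ 3  2] [ 3 16] [ 5 20]]
print(np.array_equal(X @ Y + Z @ W, A @ B))   # True
```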
The rank of A ∈ Mm×n is the dimension of col A. Bundled row and column parti-
tions permit us to derive inequalities for rank(A + B) and rank AB without fiddling
with bases, linear combinations, and spans. Block-matrix notation simplifies and
streamlines our work.
Example 22 (Rank is subadditive). For A, B ∈ Mm×n , we claim that
rank(A + B) ≤ rank A + rank B. (23)
We may assume that r = rank A ≥ 1 and s = rank B ≥ 1 since there is nothing to
prove if r = 0 or s = 0. Let A = XY and B = ZW be full-rank factorizations; see
Example 14. Since [X Z] ∈ Mm×(r+s) , we have
$$\begin{aligned}
\operatorname{rank}(A + B) = \dim \operatorname{col}(A + B) &= \dim \operatorname{col}(XY + ZW) \\
&= \dim \operatorname{col}\!\left([X \ \ Z]\begin{bmatrix} Y \\ W \end{bmatrix}\right) \le \dim \operatorname{col}[X \ \ Z] \\
&\le r + s = \operatorname{rank} A + \operatorname{rank} B.
\end{aligned}$$
The preceding result could be proved by a counting argument: produce bases
for col A and col B and observe that col(A + B) ⊆ col A + col B. However, Example
22 has a natural advantage. Instead of dealing with the notational overhead of
columns and bases, we let a block matrix do the work. This approach produces
other applications too. For example, it is difficult to see a counting argument that
reproduces the following result.
Example 24 (Sylvester’s rank inequality). For A ∈ Mm×k and B ∈ Mk×n , we
claim that
rank A + rank B − k ≤ rank AB.
Let r = rank AB. If r ≥ 1, then let AB = XY be a full-rank factorization (Example
14), in which X ∈ Mm×r and Y ∈ Mr×n . Define
$$C = \begin{cases} A & \text{if } r = 0, \\ [A \ \ X] \in M_{m\times(k+r)} & \text{if } r \ge 1, \end{cases} \qquad D = \begin{cases} B & \text{if } r = 0, \\ \begin{bmatrix} B \\ -Y \end{bmatrix} \in M_{(k+r)\times n} & \text{if } r \ge 1. \end{cases}$$
Then CD = 0 (if r ≥ 1, then CD = AB − XY = 0; if r = 0, then CD = AB = 0), so col D ⊆ null C and
$$\operatorname{rank} A + \operatorname{rank} B \le \operatorname{rank} C + \operatorname{rank} D \le \operatorname{rank} C + \operatorname{nullity} C = k + r = k + \operatorname{rank} AB.$$
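Both rank inequalities can be probed numerically. The sketch below is an added illustration that checks subadditivity (23) and Sylvester's inequality on small random integer matrices; the sizes and the number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
rank = np.linalg.matrix_rank

for _ in range(1000):
    A = rng.integers(-2, 3, size=(5, 4))
    B = rng.integers(-2, 3, size=(5, 4))
    assert rank(A + B) <= rank(A) + rank(B)        # Example 22

    C = rng.integers(-2, 3, size=(5, 4))
    D = rng.integers(-2, 3, size=(4, 6))
    assert rank(C) + rank(D) - 4 <= rank(C @ D)    # Example 24, with k = 4
```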
The following two examples reinforce an important point. Relationships between
geometric objects (subspaces here) can be revealed by matrix arithmetic.
Example 25 (Matrices with the same column space). Let A, B ∈ Mm×n . We
claim that
col A = col B ⇐⇒ there is an invertible S ∈ Mn such that A = BS.
The implication (⇐) is straightforward; we focus on (⇒). Suppose that col A =
col B and let the columns of X ∈ Mm×r be a basis for col A. Example 14 ensures
that there are matrices Y, Z ∈ Mr×n such that
A = XY, B = XZ, and rank Y = rank Z = r.
Let U, V ∈ M(n−r)×n be such that
$$R = \begin{bmatrix} Y \\ U \end{bmatrix} \in M_n \quad\text{and}\quad T = \begin{bmatrix} Z \\ V \end{bmatrix} \in M_n$$
are invertible. Then
$$A = XY = [X \ \ 0]\begin{bmatrix} Y \\ U \end{bmatrix} = [X \ \ 0]R \quad\text{and}\quad B = XZ = [X \ \ 0]\begin{bmatrix} Z \\ V \end{bmatrix} = [X \ \ 0]T,$$
so
$$A = [X \ \ 0]R = [X \ \ 0]T(T^{-1}R) = BS,$$
in which S = T −1 R is invertible.

In a first linear algebra course, row reduction is often used to solve systems of
linear equations. Students are taught that A and B have the same null space if
A = EB, in which E is an elementary matrix. Since a matrix is invertible if and
only if it is a product of elementary matrices, it follows that A and B have the
same null space if they are row equivalent. What about the converse?
Example 26 (Matrices with the same null space). Let A, B ∈ Mm×n . Then
Example 25 ensures that
null A = null B ⇐⇒ (col A∗ )⊥ = (col B ∗ )⊥
⇐⇒ col A∗ = col B ∗
⇐⇒ A∗ = B ∗ S for some invertible S ∈ Mm
⇐⇒ A = RB for some invertible R ∈ Mm .
Thus, if a sequence of elementary row operations is performed on B to obtain a
new matrix A = RB, then the linear systems Ax = 0 and Bx = 0 have the same
solutions. The latter are easily described if R is chosen so that A is in row echelon
form.

6. Block matrices
Having seen the advantages of block row and column partitions, we are now
ready to consider both simultaneously. Let
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$
in which the sizes of the submatrices involved are appropriate for the following matrix multiplications to be defined:
$$AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}. \tag{27}$$
In particular, the diagonal blocks of A and B are square and the dimensions of
the off-diagonal blocks are determined by context. Multiplication of larger block
matrices is conducted in an analogous manner.
Example 28. Here is a numerical example of block matrix multiplication. We use
horizontal and vertical bars to highlight our partitions, although we refrain from
doing so in later examples. If
$$A = \left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 3 & 4 \\ \hline 0 & 5 & 0 \end{array}\right] \quad\text{and}\quad B = \left[\begin{array}{c|c} 3 & 0 \\ 1 & 4 \\ \hline 0 & 1 \end{array}\right],$$
then (27) ensures that
$$AB = \begin{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix}[0] & \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 0 \\ 4 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix}[1] \\[6pt] [0 \ \ 5]\begin{bmatrix} 3 \\ 1 \end{bmatrix} + [0][0] & [0 \ \ 5]\begin{bmatrix} 0 \\ 4 \end{bmatrix} + [0][1] \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} 3 \\ 3 \end{bmatrix} & \begin{bmatrix} 2 \\ 16 \end{bmatrix} \\[4pt] [5] & [20] \end{bmatrix} = \begin{bmatrix} 3 & 2 \\ 3 & 16 \\ 5 & 20 \end{bmatrix}.$$

This agrees with (21) and with the usual computation of the matrix product.
We are now ready for a symbolic example. Although there are more general
formulas for the inverse of a 2 × 2 block matrix [2, P.3.28], the following special
case is sufficient for our purposes.
Example 29 (Inverse of a block triangular matrix). We claim that if Y ∈ Mn and
Z ∈ Mm are invertible, then
$$\begin{bmatrix} Y & X \\ 0 & Z \end{bmatrix}^{-1} = \begin{bmatrix} Y^{-1} & -Y^{-1}XZ^{-1} \\ 0 & Z^{-1} \end{bmatrix}. \tag{30}$$
How can such a result be discovered? Perform row reduction with block matrices, being careful to take into account the noncommutativity of matrix multiplication:
$$\begin{bmatrix} Y^{-1} & 0 \\ 0 & I \end{bmatrix}\begin{bmatrix} Y & X \\ 0 & Z \end{bmatrix} = \begin{bmatrix} I & Y^{-1}X \\ 0 & Z \end{bmatrix} \qquad \text{(1) multiply the first row by } Y^{-1},$$
$$\begin{bmatrix} I & 0 \\ 0 & Z^{-1} \end{bmatrix}\begin{bmatrix} I & Y^{-1}X \\ 0 & Z \end{bmatrix} = \begin{bmatrix} I & Y^{-1}X \\ 0 & I \end{bmatrix} \qquad \text{(2) multiply the second row by } Z^{-1},$$
$$\begin{bmatrix} I & -Y^{-1}X \\ 0 & I \end{bmatrix}\begin{bmatrix} I & Y^{-1}X \\ 0 & I \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} \qquad \text{(3) add } -Y^{-1}X \text{ times the second row to the first row.}$$
The product of the three left factors, taken in the order (3)(2)(1), is therefore the inverse of the block matrix, and multiplying them out yields (30).
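Readers who want to sanity-check (30) can do so with random blocks; the sketch below is our illustration and uses numpy.block to assemble the matrices (the block sizes are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
Y = rng.standard_normal((n, n))     # invertible with probability 1
Z = rng.standard_normal((m, m))
X = rng.standard_normal((n, m))

M = np.block([[Y, X], [np.zeros((m, n)), Z]])
Yi, Zi = np.linalg.inv(Y), np.linalg.inv(Z)
claimed = np.block([[Yi, -Yi @ X @ Zi], [np.zeros((m, n)), Zi]])   # right-hand side of (30)

print(np.allclose(np.linalg.inv(M), claimed))   # True
```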
The formula (30) for the inverse of a 2 × 2 block upper triangular matrix can be
used to prove that the inverse of an upper triangular matrix is upper triangular.
Example 31 (Inverse of an upper triangular matrix). We claim that if A = [aij ] ∈
Mn is upper triangular and has nonzero diagonal entries, then A−1 is upper tri-
angular. We proceed by induction on n. The base case n = 1 is clear. For the
induction step, let n ≥ 2 and suppose that every upper triangular matrix of size
less than n with nonzero diagonal entries has an inverse that is upper triangular.
Let A ∈ Mn be upper triangular and partition it as
$$A = \begin{bmatrix} B & ? \\ 0 & a_{nn} \end{bmatrix},$$
in which B ∈ Mn−1 is upper triangular and ? indicates an (n − 1) × 1 submatrix whose entries are unimportant. It follows from (30) that
$$A^{-1} = \begin{bmatrix} B^{-1} & ? \\ 0 & a_{nn}^{-1} \end{bmatrix}.$$

The induction hypothesis ensures that B −1 is upper triangular and hence so is A−1 .
This completes the induction.
Determinants are a staple of many introductory linear algebra courses. Numer-
ical recipes are often given for 2 × 2 and 3 × 3 matrices. Various techniques are
occasionally introduced to evaluate larger determinants. Since the development of
eigenvalues and eigenvectors is often based upon determinants via the characteristic
polynomial (although this is not how modern numerical algorithms approach the
subject [3]), techniques to compute determinants of larger matrices should be a wel-
come addition to the curriculum. This makes carefully-crafted problems involving

4 × 4 or 5 × 5 matrices accessible to manual computation. Many of the follow-


ing examples can be modified by the instructor to provide a host of interesting
determinant and eigenvalue problems.
To begin, we make an observation: if A ∈ Mn , then
$$\det\begin{bmatrix} A & 0 \\ 0 & I_m \end{bmatrix} = \det A = \det\begin{bmatrix} I_m & 0 \\ 0 & A \end{bmatrix}. \tag{32}$$
There are several ways to establish these identities, each suited to a different ap-
proach to determinants. For example, one could establish the first equality (32)
by row reduction. The same row operations used to compute the RREF of A are
used to compute the RREF of the first block matrix. Thus, the first two deter-
minants are equal. One could also induct on m. In the base case m = 0, the
given block matrices are simply A itself. The inductive step follows from Laplace
(cofactor) expansion along either the last or the first row of the given block
matrix. The identities (32) also follow readily from the combinatorial definition of
the determinant.
If A ∈ Mn , then its characteristic polynomial pA (z) = det(zI − A) is a monic
polynomial of degree n and its zeros are the eigenvalues of A. The following example
indicates how to compute the characteristic polynomial of a block triangular matrix.
Example 33. Let A and D be square. Then
$$\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & D \end{bmatrix}\begin{bmatrix} I & B \\ 0 & I \end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix}, \tag{34}$$
so (32), the multiplicativity of the determinant, and the fact that the middle factor is upper triangular with every diagonal entry equal to 1 ensure that
$$\det\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = (\det A)(\det D). \tag{35}$$
If M denotes the block matrix in (34), then
$$p_M(z) = \det\begin{bmatrix} zI - A & -B \\ 0 & zI - D \end{bmatrix} = \det(zI - A)\det(zI - D) = p_A(z)\,p_D(z).$$
This is an important property of block-triangular matrices: the characteristic poly-
nomial of the block matrix is the product of the characteristic polynomials of the
diagonal blocks. This has many consequences; see Examples 36 and 37.
If A ∈ Mn , then the geometric multiplicity of an eigenvalue λ is dim null(A − λI).
The algebraic multiplicity of λ is its multiplicity as a root of pA . A fundamental
result is that the geometric multiplicity of an eigenvalue cannot exceed its algebraic
multiplicity. Strict inequality can occur even for small matrices that arise in a first
linear algebra course. For example, the elementary matrices
$$A = \begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}, \qquad c \ne 0,$$
have this property. Fortunately, block-matrix multiplication provides a way to
explain what is going on.
Example 36 (Geometric multiplicity ≤ algebraic multiplicity). Let A ∈ Mn and
let λ be an eigenvalue of A. Suppose that the columns of X ∈ Mn×k form a basis

for the corresponding eigenspace; see Example 5. Choose Y ∈ Mn×(n−k) such that
S = [X Y ] ∈ Mn is invertible; see Example 17. Then AX = λX and
$$\begin{bmatrix} I_k & 0 \\ 0 & I_{n-k} \end{bmatrix} = I_n = S^{-1}S = [S^{-1}X \ \ S^{-1}Y], \quad\text{so}\quad S^{-1}X = \begin{bmatrix} I_k \\ 0 \end{bmatrix}.$$
Thus,
$$S^{-1}AS = S^{-1}A[X \ \ Y] = S^{-1}[AX \ \ AY] = S^{-1}[\lambda X \ \ AY] = [\lambda S^{-1}X \ \ S^{-1}AY] = \begin{bmatrix} \lambda I_k & ? \\ 0 & C \end{bmatrix},$$
in which ? denotes a k × (n − k) submatrix whose entries are of no interest. Since
similar matrices have the same characteristic polynomial, Example 33 ensures that
pA (z) = pS −1AS (z) = (z − λ)k pC (z). Consequently, k = nullity(A − λI) is at most
the multiplicity of λ as a zero of pA (z).
Students should be warned repeatedly that matrix multiplication is noncommu-
tative. That is, if A ∈ Mm×n and B ∈ Mn×m , then AB need not equal BA, even if
both products are defined. Students may be pleased to learn that AB ∈ Mm and
BA ∈ Mn are remarkably alike, despite potentially being of different sizes. This
fact has an elegant explanation using block matrices.
Example 37 (AB versus BA). If A ∈ Mm×n and B ∈ Mn×m , then
$$\begin{bmatrix} AB & A \\ 0 & 0_n \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0_m & A \\ 0 & BA \end{bmatrix} \tag{38}$$
are similar since
$$\begin{bmatrix} I_m & 0 \\ B & I_n \end{bmatrix}\begin{bmatrix} AB & A \\ 0 & 0_n \end{bmatrix} = \begin{bmatrix} 0_m & A \\ 0 & BA \end{bmatrix}\begin{bmatrix} I_m & 0 \\ B & I_n \end{bmatrix},$$
in which the intertwining matrix is invertible. Since similar matrices have the same
characteristic polynomial, Example 33 ensures that
z n pAB (z) = z m pBA (z). (39)
Thus, the nonzero eigenvalues of AB and BA are the same, with the same mul-
tiplicities. In fact, one can show that the Jordan canonical forms of AB and BA
differ only in their treatment of the eigenvalue zero [2, Thm. 11.9.1].
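A quick experiment (ours, with arbitrary sizes) makes this concrete: the nonzero eigenvalues of AB and BA coincide, and the extra eigenvalues of the larger product are zeros.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((3, 5))

def nonzero(eigs, tol=1e-10):
    return np.sort_complex(eigs[np.abs(eigs) > tol])

print(nonzero(np.linalg.eigvals(A @ B)))   # three nonzero values (AB is 5 x 5)
print(nonzero(np.linalg.eigvals(B @ A)))   # the same three values (BA is 3 x 3)
```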
The preceding facts about AB and BA are more than just curiosities. Example
37 can be used to compute the eigenvalues of certain large, structured matrices.
Suppose that A ∈ Mn has rank r < n. If A = XY is a full-rank factorization
(Example 14), in which X, Y T ∈ Mn×r , then the eigenvalues of A are the eigenvalues
of the r × r matrix Y X, along with n − r zero eigenvalues. Consider the following
example.
Example 40. What are the eigenvalues of
$$A = \begin{bmatrix} 2 & 3 & 4 & \cdots & n+1 \\ 3 & 4 & 5 & \cdots & n+2 \\ 4 & 5 & 6 & \cdots & n+3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n+1 & n+2 & n+3 & \cdots & 2n \end{bmatrix}?$$

The column space of A is spanned by

e = [1 1 . . . 1]T and r = [1 2 . . . n]T

since the jth column of A is r + je. The list r, e is linearly independent, so it is


a basis for col A. Let X = [r e] and observe that the jth column of A is X[1 j]T .
This yields a full-rank factorization (Example 14) A = XY , in which Y = [e r]T .
Example 37 says that the eigenvalues of A = XY are n−2 zeros and the eigenvalues
of the 2 × 2 matrix
$$YX = \begin{bmatrix} e^T \\ r^T \end{bmatrix}[r \ \ e] = \begin{bmatrix} e^T r & e^T e \\ r^T r & r^T e \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2}n(n+1) & n \\ \tfrac{1}{6}n(n+1)(2n+1) & \tfrac{1}{2}n(n+1) \end{bmatrix},$$

which are
$$n(n+1)\left(\frac{1}{2} \pm \sqrt{\frac{2n+1}{6(n+1)}}\,\right).$$
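The closed form is easy to test; this added sketch builds A for one modest value of n (our choice), computes the spectrum directly, and compares the two nonzero eigenvalues with the formula.

```python
import numpy as np

n = 6
A = np.array([[i + j for j in range(1, n + 1)] for i in range(1, n + 1)])  # entry (i, j) is i + j

eigs = np.sort(np.linalg.eigvals(A).real)
predicted = n * (n + 1) * (0.5 + np.array([-1.0, 1.0]) * np.sqrt((2 * n + 1) / (6 * (n + 1))))

print(eigs)                  # n - 2 eigenvalues that are (numerically) zero, plus two others
print(np.sort(predicted))    # matches the smallest and largest eigenvalues above
```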

Block-matrix computations can do much more than provide bonus problems and
alternative proofs of results in a first linear algebra course. Here are a few examples.

Example 41 (Sylvester’s determinant identity). If X ∈ Mm×n and Y ∈ Mn×m ,


then
det(Im + XY ) = det(In + Y X). (42)
This remarkable identity of Sylvester relates the determinants of an m × m matrix
and an n × n matrix. It follows from (35):
$$\begin{aligned}
\det(I + XY) &= \det\begin{bmatrix} I + XY & 0 \\ Y & I \end{bmatrix} = \det\left(\begin{bmatrix} I & X \\ 0 & I \end{bmatrix}\begin{bmatrix} I & -X \\ Y & I \end{bmatrix}\right) \\
&= \det\begin{bmatrix} I & X \\ 0 & I \end{bmatrix}\det\begin{bmatrix} I & -X \\ Y & I \end{bmatrix} = \det\begin{bmatrix} I & -X \\ Y & I \end{bmatrix}\det\begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \\
&= \det\left(\begin{bmatrix} I & -X \\ Y & I \end{bmatrix}\begin{bmatrix} I & X \\ 0 & I \end{bmatrix}\right) = \det\begin{bmatrix} I & 0 \\ Y & I + YX \end{bmatrix} \\
&= \det(I + YX).
\end{aligned}$$

Another explanation can be based on the fact that XY and Y X have the same
nonzero eigenvalues, with the same multiplicities (see Example 37). With the ex-
ception of the eigenvalue 1, the matrices Im + XY and In + Y X have the same
eigenvalues with the same multiplicities. Since the determinant of a matrix is the
product of its eigenvalues, (42) follows.

The following elegant identity permits the evaluation of the determinant of a


rank-one perturbation of a matrix whose determinant is known. In particular, (44)
is a rare example of the determinant working well with matrix addition.

Example 43 (Determinant of a rank-one update). If A ∈ Mn is invertible and


u, v ∈ Cn , then Sylvester’s identity (42) with X = A−1 u and Y = vT yields

$$\det(A + uv^T) = (\det A)\det(I + A^{-1}uv^T) = (\det A)\bigl(1 + \underbrace{v^T A^{-1} u}_{\text{a scalar}}\bigr). \tag{44}$$

The identity (44) can be used to create large matrices whose determinants can be
computed in a straightforward manner. For example,
$$\begin{bmatrix} 2 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 \\ 1 & 1 & 2 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = A + uv^T, \tag{45}$$
in which A = diag(1, −1, 1, −1, 1) and u = v = [1 1 . . . 1]T . Since A = A−1 and
det A = 1, an application of (44) reveals that the determinant of the matrix in (45)
is 2.
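The identity (44) is also pleasant to verify by machine; this added sketch checks it on the 5 × 5 matrix in (45) and on a random instance.

```python
import numpy as np

# The example (45): A = diag(1, -1, 1, -1, 1) and u = v = ones
A = np.diag([1.0, -1.0, 1.0, -1.0, 1.0])
u = v = np.ones(5)
M = A + np.outer(u, v)
print(np.linalg.det(M))      # 2.0, in agreement with (44)

# A random instance of (44)
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
u, v = rng.standard_normal(4), rng.standard_normal(4)
lhs = np.linalg.det(A + np.outer(u, v))
rhs = np.linalg.det(A) * (1 + v @ np.linalg.inv(A) @ u)
print(np.isclose(lhs, rhs))  # True
```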
If a ≠ 0, then the right-hand side of
$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc \tag{46}$$
equals a(d − ca−1 b), a formula that generalizes to 2 × 2 block matrices.
Example 47 (Schur complement). Let
$$M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \tag{48}$$
in which A and D are square and A is invertible. Take determinants in
$$\begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} I & -A^{-1}B \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix}\begin{bmatrix} I & -A^{-1}B \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix} \tag{49}$$
and use (35) to obtain
$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = (\det A)\det(D - CA^{-1}B); \tag{50}$$
there is an analogous formula if D is invertible. Schur's formula (50) reduces the computation of a large determinant to the computation of two smaller ones. Since left- or right-multiplication by invertible matrices leaves the rank of a matrix invariant, we derive the elegant formula
$$\operatorname{rank} M = \operatorname{rank} A + \operatorname{rank}(M/A),$$
in which M/A = D − CA−1 B is the Schur complement of A in M .
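Both formulas lend themselves to quick experiments; the following sketch (ours, with arbitrary block sizes) checks Schur's determinant formula (50) and the rank identity on a random block matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))          # invertible with probability 1
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 2))

M = np.block([[A, B], [C, D]])
schur = D - C @ np.linalg.inv(A) @ B     # the Schur complement M/A

print(np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(schur)))                # True
print(np.linalg.matrix_rank(M) == np.linalg.matrix_rank(A) + np.linalg.matrix_rank(schur))  # True
```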
Example 51. If A and C commute in (48), then A, B, C, D are square matrices of
the same size and (50) reduces to det(AD−CB), which bears a striking resemblance
to (46). For example,
$$\det\begin{bmatrix} 1 & 1 & 1 & 2 \\ 1 & 1 & 3 & 4 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix} = \det\left(\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\right) = \det\left(\begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\right) = \det\begin{bmatrix} 1 & 0 \\ -1 & -2 \end{bmatrix} = -2.$$

Example 52. Partition
$$M = \begin{bmatrix} 2 & 0 & 0 & 1 & 1 \\ 0 & 2 & 0 & 1 & 1 \\ 0 & 0 & 2 & 1 & 1 \\ 1 & 1 & 1 & 4 & 1 \\ 1 & 1 & 1 & 1 & 4 \end{bmatrix}$$
as in (48). Then (50) ensures that
$$\begin{aligned}
\det M &= \det\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}\det\left(\begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} - \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}^{-1}\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}\right) \\
&= 8\det\left(\begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} - \begin{bmatrix} \tfrac{3}{2} & \tfrac{3}{2} \\ \tfrac{3}{2} & \tfrac{3}{2} \end{bmatrix}\right) = 8\det\begin{bmatrix} \tfrac{5}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{5}{2} \end{bmatrix} = 8 \cdot 6 = 48.
\end{aligned}$$
From a pedagogical perspective, such techniques are desirable since they permit
the consideration of problems involving matrices larger than 3 × 3.

7. Kronecker products
We conclude with a discussion of Kronecker products. It illustrates again that
block-matrix arithmetic can be a useful pedagogical tool.
If A = [aij ] ∈ Mm×n and B ∈ Mp×q , then the Kronecker product of A and B is
the block matrix
$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix} \in M_{mp \times nq}.$$
Example 53. If
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = [5 \ \ 6],$$
then
$$A \otimes B = \begin{bmatrix} B & 2B \\ 3B & 4B \end{bmatrix} = \begin{bmatrix} 5 & 6 & 10 & 12 \\ 15 & 18 & 20 & 24 \end{bmatrix}$$
and
$$B \otimes A = [5A \ \ 6A] = \begin{bmatrix} 5 & 10 & 6 & 12 \\ 15 & 20 & 18 & 24 \end{bmatrix}.$$
The Kronecker product interacts with ordinary matrix multiplication and addition as follows (A, B, C, D are matrices and c is a scalar):
(i) (A ⊗ B)(C ⊗ D) = AC ⊗ BD;
(ii) c(A ⊗ B) = (cA) ⊗ B = A ⊗ (cB);
(iii) (A + B) ⊗ C = A ⊗ C + B ⊗ C;
(iv) A ⊗ (B + C) = A ⊗ B + A ⊗ C;
(v) A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C.

If A and B are square matrices of the same size then the eigenvalues of AB need
not be products of eigenvalues of A and B. However, for square matrices A and B
of any size, all of the eigenvalues of A ⊗ B are products of eigenvalues of A and B,

and all possible products (by algebraic multiplicity) occur; see [2, P.10.39]. This
fact (and a related version for sums of eigenvalues) can be used by instructors who
wish to construct matrices with prescribed eigenvalues and multiplicities.
Example 54. If Ax = λx and By = µy, then
(A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By) = (λx) ⊗ (µy) = λµ(x ⊗ y)
and
[(A ⊗ I) + (I ⊗ B)](x ⊗ y) = (A ⊗ I)(x ⊗ y) + (I ⊗ B)(x ⊗ y)
= Ax ⊗ y + x ⊗ By
= λx ⊗ y + µx ⊗ y
= (λ + µ)(x ⊗ y).
That is, if λ and µ are eigenvalues of A and B, respectively, then λµ is an eigenvalue
of A ⊗ B and λ + µ is an eigenvalue of A ⊗ I + I ⊗ B.
Example 55. The eigenvalues of
$$\begin{bmatrix} 3 & 4 & 6 & 8 \\ 2 & 1 & 4 & 2 \\ 12 & 16 & 9 & 12 \\ 8 & 4 & 6 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix} \otimes \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix}$$
are −5, −5, 1, and 25; these are 5 × (−1), (−1) × 5, (−1) × (−1), and 5 × 5. The
eigenvalues of each factor are −1 and 5.
Example 56. The eigenvalues of
$$\begin{bmatrix} 4 & 4 & 2 & 0 \\ 2 & 2 & 0 & 2 \\ 4 & 0 & 6 & 4 \\ 0 & 4 & 2 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix} \otimes I_2 + I_2 \otimes \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix}$$
are −2, 4, 4, and 10; these are (−1) + (−1), (−1) + 5, 5 + (−1), and 5 + 5.
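Instructors constructing such examples can let the computer do the bookkeeping; the sketch below (added for illustration) reproduces Examples 55 and 56 with numpy.kron.

```python
import numpy as np

A = np.array([[1, 2], [4, 3]])   # eigenvalues -1 and 5
B = np.array([[3, 4], [2, 1]])   # eigenvalues -1 and 5
I2 = np.eye(2)

print(np.sort(np.linalg.eigvals(np.kron(A, B)).real))
# approximately [-5. -5.  1. 25.] : the pairwise products, as in Example 55

print(np.sort(np.linalg.eigvals(np.kron(A, I2) + np.kron(I2, B)).real))
# approximately [-2.  4.  4. 10.] : the pairwise sums, as in Example 56
```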
We conclude with a proof of a seminal result in abstract algebra: the algebraic
numbers form a field. That such a result should have a simple proof using block
matrices indicates the usefulness of the method.
An algebraic number is a complex number that is a zero of a monic polynomial
with rational coefficients. Let
f (z) = z n + cn−1 z n−1 + cn−2 z n−2 + · · · + c1 z + c0 , n ≥ 1.
The companion matrix of f is Cf = [−c0 ] if n = 1 and is
$$C_f = \begin{bmatrix} 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix} \quad \text{if } n \ge 2.$$
Induction and cofactor expansion along the top row of zI − Cf shows that f is the
characteristic polynomial of Cf . Consequently, a complex number is algebraic if
and only if it is an eigenvalue of a matrix with rational entries.
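A small helper (ours) makes the correspondence concrete: build Cf from the coefficients of f and read off zeros of f as eigenvalues; a Kronecker construction then exhibits a sum of algebraic numbers as an eigenvalue of a rational matrix, as in Example 57 below.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of z^n + c_{n-1} z^{n-1} + ... + c_0, with coeffs = [c_0, ..., c_{n-1}]."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)        # ones on the subdiagonal
    C[:, -1] = -np.asarray(coeffs)    # last column holds -c_0, ..., -c_{n-1}
    return C

Cp = companion([-2.0, 0.0])           # p(z) = z^2 - 2, zeros +-sqrt(2)
Cq = companion([-3.0, 0.0])           # q(z) = z^2 - 3, zeros +-sqrt(3)

print(np.linalg.eigvals(Cp))                          # approximately [ 1.414 -1.414]
S = np.kron(Cp, np.eye(2)) + np.kron(np.eye(2), Cq)
print(np.max(np.linalg.eigvals(S).real))              # approximately sqrt(2) + sqrt(3)
```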

Example 57 (The algebraic numbers form a field). Let α, β be algebraic numbers


and suppose that p(α) = q(β) = 0, in which p and q are monic polynomials with
rational coefficients. Then α, β are eigenvalues of the rational matrices Cp and Cq ,
respectively, αβ is an eigenvalue of the rational matrix Cp ⊗ Cq , and α + β is an
eigenvalue of the rational matrix Cp ⊗ I + I ⊗ Cq . If α ≠ 0 and p has degree k, then
there is a rational number c such that f (z) = cz^k p(z^{−1}) is a monic polynomial with
rational coefficients and f (α^{−1}) = 0, so α^{−1} is algebraic as well.
Acknowledgments: This work was partially supported by National Science Foun-
dation Grants DMS-1800123 and DMS-1265973, and the David L. Hirsch III and
Susan H. Hirsch Research Initiation Grant.

References
[1] Stephan Ramon Garcia. Linearly independent spanning sets. Amer. Math. Monthly,
124(8):722, 2017.
[2] Stephan Ramon Garcia and Roger A. Horn. A second course in linear algebra. Cambridge
Mathematical Textbooks. Cambridge University Press, New York, 2017.
[3] Gene H. Golub and Charles F. Van Loan. Matrix computations. Johns Hopkins Studies in the
Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, fourth edition, 2013.
[4] Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press, Cam-
bridge, second edition, 2013.
[5] David C. Lay, Steven R. Lay, and Judi J. McDonald. Linear Algebra and Its Applications.
Pearson, fifth edition, 2015.
[6] Carl Meyer. Matrix analysis and applied linear algebra. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA, 2000. With 1 CD-ROM (Windows, Macintosh and
UNIX) and a solutions manual (iv+171 pp.).
[7] Fuzhen Zhang. Matrix theory. Universitext. Springer, New York, second edition, 2011. Basic
results and techniques.

Department of Mathematics, Pomona College, 610 N. College Ave., Claremont, CA 91711
E-mail address: [email protected]
URL: http://pages.pomona.edu/~sg064747

E-mail address: [email protected]
