Block Matrices in Linear Algebra
1. Introduction
This paper is addressed to instructors of a first course in linear algebra, who
need not be specialists in the field. We aim to convince the reader that linear
algebra is best done with block matrices. In particular, flexible thinking about the
process of matrix multiplication can reveal concise proofs of important theorems
and expose new results. Viewing linear algebra from a block-matrix perspective
gives an instructor access to useful techniques, exercises, and examples.
Many of the techniques, proofs, and examples presented here are familiar to spe-
cialists in linear algebra or operator theory. We think that everyone who teaches
undergraduate linear algebra should be aware of them. A popular current textbook
says that block matrices “appear in most modern applications of linear algebra be-
cause the notation highlights essential structures in matrix analysis. . . ” [5, p. 119].
The use of block matrices in linear algebra instruction aligns mathematics peda-
gogy better with topics in advanced courses in pure mathematics, computer science,
data science, statistics, and other fields. For example, block-matrix techniques are
standard fare in modern algorithms [3]. Textbooks such as [2–7] make use of block
matrices.
We take the reader on a tour of block-matrix methods and applications. In
Section 2, we use right-column partitions to explain several standard first-course
results. In Section 3, we use left-column partitions to introduce the full-rank factor-
ization, prove the invariance of the number of elements in a basis, and establish the
equality of row and column rank. Instructors of a first linear algebra course will be
familiar with these topics, but perhaps not with a block matrix / column partition
approach to them. Section 4 concerns block-column matrices. Applications include
justification of a matrix-inversion algorithm and a proof of the uniqueness of the re-
duced row echelon form. Block-row and block-column matrices are used in Section
5 to obtain inequalities for the rank of sums and products of matrices, along with
algebraic characterizations of matrices that share the same column space or null
space. The preceding material culminates in Section 6, in which we consider block
matrices of several types and prove that the geometric multiplicity of an eigenvalue
is at most its algebraic multiplicity. We also obtain a variety of determinantal
results that are suitable for presentation in class. We conclude in Section 7 with
Kronecker products and several applications.
Key words and phrases. Matrix, matrix multiplication, block matrix, Kronecker product, rank,
eigenvalues.
Notation: We frame our discussion for complex matrices. However, all of our
numerical examples involve only real matrices, which may be preferred by some
first-course instructors. We use Mm×n to denote the set of all m × n complex
matrices; Mn denotes the set of all n × n complex matrices. Boldface letters, such
as a, b, c, denote column vectors; e1 , e2 , . . . , en is the standard basis of Cn . We
regard elements of Cm as column vectors; that is, m × 1 matrices. If A ∈ Mm×n ,
then each column of A belongs to Mm×1 . The transpose of a matrix A is denoted
by AT . The null space and column space of a matrix A are denoted by null A and
col A, respectively. The trace and determinant of a square matrix A are denoted
by tr A and det A, respectively.
2. Right-column partitions
If
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 4 & 5 & 2 \\ 6 & 7 & 1 \end{bmatrix}, \tag{1}$$
then the entries of AB are dot products of rows of A with columns of B:
$$AB = \begin{bmatrix} 1\cdot 4+2\cdot 6 & 1\cdot 5+2\cdot 7 & 1\cdot 2+2\cdot 1 \\ 3\cdot 4+4\cdot 6 & 3\cdot 5+4\cdot 7 & 3\cdot 2+4\cdot 1 \end{bmatrix} = \begin{bmatrix} 16 & 19 & 4 \\ 36 & 43 & 10 \end{bmatrix}. \tag{2}$$
But there are other ways to organize these computations. We examine right-column
partitions in this section. If A ∈ Mm×r and B = [b1 b2 . . . bn ] ∈ Mr×n , then the
jth column of AB is Abj . That is,
AB = [Ab1 Ab2 . . . Abn ]. (3)
An intentional approach to column partitions can facilitate proofs of important
results from elementary linear algebra.
Example 4. If A and B are the matrices from (1), then B = [b1 b2 b3 ], in which
$$\mathbf{b}_1 = \begin{bmatrix} 4 \\ 6 \end{bmatrix}, \quad \mathbf{b}_2 = \begin{bmatrix} 5 \\ 7 \end{bmatrix}, \quad\text{and}\quad \mathbf{b}_3 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$
Partitioned matrix multiplication yields the expected answer (2):
$$[A\mathbf{b}_1 \ \ A\mathbf{b}_2 \ \ A\mathbf{b}_3] = \begin{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\!\begin{bmatrix} 4 \\ 6 \end{bmatrix} & \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\!\begin{bmatrix} 5 \\ 7 \end{bmatrix} & \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\!\begin{bmatrix} 2 \\ 1 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 16 & 19 & 4 \\ 36 & 43 & 10 \end{bmatrix} = AB.$$
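Instructors who like to pair such computations with software can demonstrate (3) in a few lines. Here is a minimal NumPy sketch (one possible classroom illustration) that rebuilds AB column by column for the matrices in (1):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[4, 5, 2], [6, 7, 1]])

# The j-th column of AB is A times the j-th column of B, as in (3).
AB_by_columns = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

print(AB_by_columns)                          # [[16 19  4]  [36 43 10]]
print(np.array_equal(AB_by_columns, A @ B))   # True
```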
Example 5. Matrix-vector equations can be bundled together. For example, sup-
pose that x1 , x2 , . . . , xk are eigenvectors of A ∈ Mn for the eigenvalue λ and let
X = [x1 x2 . . . xk ] ∈ Mn×k . Then
AX = [Ax1 Ax2 . . . Axk ] = [λx1 λx2 . . . λxk ] = λX.
This observation can be used to prove that the geometric multiplicity of an eigen-
value is at most its algebraic multiplicity; see Example 36.
The following example provides a short proof of an important implication in “the
invertible matrix theorem,” which is at the core of a first course in linear algebra.
3. Left-column partitions
We have gotten some mileage out of partitioning the matrix on the right-hand
side of a product. If we partition the matrix on the left-hand side of a product, other
opportunities emerge. If A = [a1 a2 . . . an ] ∈ Mm×n and x = [x1 x2 . . . xn ]T ∈
Cn , then
Ax = x1 a1 + x2 a2 + · · · + xn an . (9)
That is, Ax is a linear combination of the columns of A.
The next example illustrates that relationships between geometric objects, such
as vectors and subspaces, can often be framed algebraically.
Example 10 (Geometry and matrix algebra). Let A ∈ Mm×n and B ∈ Mm×k .
We claim that
col B ⊆ col A ⇐⇒ there exists an X ∈ Mn×k such that AX = B;
moreover, if the columns of A are linearly independent, then X is unique. If each
column of B = [b1 b2 . . . bk ] ∈ Mm×k is a linear combination of the columns of
A ∈ Mm×n , then (9) ensures that there are xi ∈ Cn such that bi = Axi for each i;
if the columns of A are linearly independent, then the xi are uniquely determined.
Let X = [x1 x2 . . . xk ] ∈ Mn×k . Then
B = [b1 b2 . . . bk ] = [Ax1 Ax2 . . . Axk ] = A[x1 x2 . . . xk ] = AX.
Conversely, if AX = B, then (9) indicates that each column of B lies in col A.
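In computational terms, Example 10 says that col B ⊆ col A can be certified by exhibiting a solution X of AX = B. A minimal NumPy sketch (the test matrices below are chosen purely for illustration): solve the least-squares problem column by column and check that the residual vanishes.

```python
import numpy as np

A = np.array([[1., 0.], [0., 1.], [1., 1.]])     # 3 x 2, linearly independent columns
B = A @ np.array([[2., -1.], [3., 4.]])          # built so that col B lies in col A

# Least squares returns the (here unique) X with AX = B; a zero residual
# certifies that every column of B lies in col A.
X, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.allclose(A @ X, B))                     # True
```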
The following example uses Example 10 to show that any two bases for the same
subspace of Cn have the same number of elements [1], [2, P.3.38]. It relies on the
fact that tr XY = tr Y X if both products are defined; see [2, (0.3.5)].
Example 11 (Number of elements in a basis). If a1 , a2 , . . . , ar and b1 , b2 , . . . , bs
are bases for the same subspace of Cn , we claim that r = s. If
A = [a1 a2 . . . ar ] ∈ Mn×r and B = [b1 b2 . . . bs ] ∈ Mn×s ,
then col A = col B. Example 10 ensures that B = AX and A = BY , in which
X ∈ Mr×s and Y ∈ Ms×r . Thus,
A(Ir − XY ) = A − AXY = A − BY = A − A = 0.
Since A has linearly independent columns, each column of Ir − XY is zero; that is,
XY = Ir . A similar argument shows that Y X = Is and hence
r = tr Ir = tr XY = tr Y X = tr Is = s.
Another consequence of the principle in Example 10 is a second explanation of
the equality of left and right inverses.
Example 12 (One-sided inverses are two-sided inverses). Suppose that A, B ∈ Mn
and AB = I. If Bx = 0, then x = Ix = A(Bx) = A0 = 0. This shows that
null B = {0}. The Dimension Theorem ensures that col B = Cn , so there is an
X ∈ Mn such that I = BX (this is where we use Example 10). Then BA = BAI =
BABX = BIX = BX = I.
A fundamental result from elementary linear algebra is the equality of rank A
and rank AT ; that is, “column rank equals row rank.” The identity (9) permits us
to give a simple explanation.
Example 13 (Equality of row and column rank). For A ∈ Mm×n , we claim that
rank A = rank AT . We may assume that k = rank A ≥ 1. Let the columns of
B ∈ Mm×k be a basis for col A. Example 10 ensures that there is an X ∈ Mk×n
such that A = BX. Thus, AT = X T B T , so col AT ⊆ col X T . Then
rank AT = dim col AT ≤ dim col X T ≤ k = rank A.
Now apply the same reasoning to AT to obtain the reverse inequality rank A ≤ rank AT , and conclude that rank A = rank AT .
We finish this section with a matrix factorization that plays a role in many
block-matrix arguments.
Example 14 (Full-rank factorization). Let A = [a1 a2 . . . an ] ∈ Mm×n be nonzero,
let r = rank A, and let the columns of X ∈ Mm×r be a basis for col A. We claim that
there is a unique Y ∈ Mr×n such that A = XY ; moreover, rank Y = rank X = r.
Since the r columns of X are a basis for col A, we have rank X = r and col A = col X.
Example 10 ensures that there is a Y ∈ Mr×n such that A = XY . Moreover, Y is
unique because each column of A is a unique linear combination of the columns of
X. Finally, invoke Example 13 to compute
r = rank AT = dim col(Y T X T ) ≤ dim col Y T ≤ r,
which forces rank Y = rank Y T = r.
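Example 14 translates directly into a computational recipe. The sketch below (a minimal illustration; it takes an orthonormal basis of col A from the SVD, which is only one of many valid choices of X) produces a full-rank factorization in NumPy.

```python
import numpy as np

def full_rank_factorization(A, tol=1e-10):
    """Return X (m x r) and Y (r x n) with A = X @ Y, where r = rank A.

    Sketch: X is an orthonormal basis of col A taken from the SVD, so the
    unique Y with A = X @ Y (Example 14) is simply X.T @ A."""
    U, s, _ = np.linalg.svd(A)
    r = int(np.sum(s > tol))
    X = U[:, :r]          # basis for col A
    Y = X.T @ A           # unique because the columns of X are linearly independent
    return X, Y

A = np.array([[1., 2., 3.], [2., 4., 6.], [1., 0., 1.]])   # rank 2
X, Y = full_rank_factorization(A)
print(X.shape, Y.shape, np.allclose(A, X @ Y))             # (3, 2) (2, 3) True
```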
4. Block columns
Let A ∈ Mm×r and B ∈ Mr×n . Write
B = [B1 B2 ],
in which B1 ∈ Mr×k and B2 ∈ Mr×(n−k) ; that is, group the first k columns of B to
create B1 and group the remaining n − k columns of B to create B2 . Then,
AB = A[B1 B2 ] = [AB1 AB2 ]; (15)
this is the block version of (3). It can be generalized to involve multiple blocks Bi .
We consider two pedagogically oriented applications of the block-column approach
(15) to matrix multiplication: a justification of the “side-by-side” matrix inversion
algorithm and a proof of the uniqueness of the reduced row echelon form of a matrix.
First, we consider some examples that illustrate (15).
Example 16. Let A and B be as in (1) and write B = [B1 B2 ], in which
$$B_1 = \begin{bmatrix} 4 & 5 \\ 6 & 7 \end{bmatrix} \quad\text{and}\quad B_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$
Then
$$AB = [AB_1 \ \ AB_2] = \begin{bmatrix} 16 & 19 & 4 \\ 36 & 43 & 10 \end{bmatrix},$$
as computed in (2).
Example 17 (Extending to a basis). If the list x1 , x2 , . . . , xk ∈ Cn is linearly
independent, then it can be extended to a basis of Cn . Equivalently, if X ∈ Mn×k
has linearly independent columns, then there is a Y ∈ Mn×(n−k) such that [X Y ] ∈
Mn is invertible. This observation has lots of applications; see Example 36.
Example 18 (Inversion algorithm). Let A ∈ Mn be invertible and let R be a
product of elementary matrices that encode a sequence of row operations that row
reduces A to I. Then RA = I; that is, R = A−1 . Now (15) ensures that
R[A I] = [RA R] = [I A−1 ].
Thus, if one can row reduce the block matrix [A I] to [I X], then X = A−1 .
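The side-by-side algorithm is also a pleasant programming exercise. Here is a minimal Gauss–Jordan sketch in NumPy (with partial pivoting; it assumes A is square and invertible, and is meant for illustration rather than production use):

```python
import numpy as np

def invert_side_by_side(A):
    """Row reduce the block matrix [A  I] to [I  A^{-1}], as in Example 18."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])     # the block matrix [A  I]
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))         # choose a pivot row
        M[[j, p]] = M[[p, j]]                       # swap it into place
        M[j] /= M[j, j]                             # scale the pivot row
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]              # clear column j in the other rows
    return M[:, n:]                                 # the right-hand block is A^{-1}

A = np.array([[1., 2.], [3., 4.]])
print(np.allclose(invert_side_by_side(A) @ A, np.eye(2)))   # True
```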
Our second application of block columns is the uniqueness of the reduced row echelon
form (RREF). The RREF underpins almost everything in a typical first linear algebra course. It is used
to parametrize solution sets of systems of linear equations and to compute the rank
of a small matrix (for practical computations other procedures are preferred [3]).
$$\operatorname{rank}(A+B) = \dim\operatorname{col}\!\left([X \ \ Z]\begin{bmatrix} Y \\ W \end{bmatrix}\right) \le \dim\operatorname{col}\,[X \ \ Z] \le r + s = \operatorname{rank} A + \operatorname{rank} B.$$
The preceding result could be proved by a counting argument: produce bases
for col A and col B and observe that col(A + B) ⊆ col A + col B. However, Example
22 has a natural advantage. Instead of dealing with the notational overhead of
columns and bases, we let a block matrix do the work. This approach produces
other applications too. For example, it is difficult to see a counting argument that
reproduces the following result.
Example 24 (Sylvester’s rank inequality). For A ∈ Mm×k and B ∈ Mk×n , we
claim that
rank A + rank B − k ≤ rank AB.
Let r = rank AB. If r ≥ 1, then let AB = XY be a full-rank factorization (Example
14), in which X ∈ Mm×r and Y ∈ Mr×n . Define
$$C = \begin{cases} A & \text{if } r = 0, \\[2pt] [A \ \ X] \in M_{m\times(k+r)} & \text{if } r \ge 1, \end{cases} \qquad D = \begin{cases} B & \text{if } r = 0, \\[2pt] \begin{bmatrix} B \\ -Y \end{bmatrix} \in M_{(k+r)\times n} & \text{if } r \ge 1. \end{cases}$$
In either case CD = AB − XY = 0 (with XY interpreted as 0 if r = 0), so every column of D lies in null C. The Dimension Theorem gives rank C + rank D ≤ k + r. Since A is a submatrix of C and B is a submatrix of D, we have rank A ≤ rank C and rank B ≤ rank D, so rank A + rank B ≤ k + r = k + rank AB; that is, rank A + rank B − k ≤ rank AB.
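Sylvester's inequality is easy to stress-test numerically. The following NumPy sketch (a check for the curious, not part of the argument) verifies it, together with the elementary bound rank AB ≤ min(rank A, rank B), on many random integer matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    m, k, n = rng.integers(1, 6, size=3)
    # Products of small random {-1,0,1} matrices are frequently rank deficient.
    A = rng.integers(-1, 2, size=(m, k)) @ rng.integers(-1, 2, size=(k, k))
    B = rng.integers(-1, 2, size=(k, k)) @ rng.integers(-1, 2, size=(k, n))
    rA = np.linalg.matrix_rank(A)
    rB = np.linalg.matrix_rank(B)
    rAB = np.linalg.matrix_rank(A @ B)
    assert rA + rB - k <= rAB <= min(rA, rB)
print("Sylvester's rank inequality held in every trial")
```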
In a first linear algebra course, row reduction is often used to solve systems of
linear equations. Students are taught that A and B have the same null space if
A = EB, in which E is an elementary matrix. Since a matrix is invertible if and
only if it is a product of elementary matrices, it follows that A and B have the
same null space if they are row equivalent. What about the converse?
Example 26 (Matrices with the same null space). Let A, B ∈ Mm×n . Then
Example 25 ensures that
null A = null B ⇐⇒ (col A∗ )⊥ = (col B ∗ )⊥
⇐⇒ col A∗ = col B ∗
⇐⇒ A∗ = B ∗ S for some invertible S ∈ Mm
⇐⇒ A = RB for some invertible R ∈ Mm .
Thus, if a sequence of elementary row operations is performed on B to obtain a
new matrix A = RB, then the linear systems Ax = 0 and Bx = 0 have the same
solutions. The latter are easily described if R is chosen so that A is in row echelon
form.
6. Block matrices
Having seen the advantages of block row and column partitions, we are now
ready to consider both simultaneously. Let
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$
in which the sizes of the submatrices involved are appropriate for the following
matrix multiplications to be defined:
$$AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}. \tag{27}$$
In particular, the diagonal blocks of A and B are square and the dimensions of
the off-diagonal blocks are determined by context. Multiplication of larger block
matrices is conducted in an analogous manner.
Example 28. Here is a numerical example of block matrix multiplication. We use
horizontal and vertical bars to highlight our partitions, although we refrain from
doing so in later examples. If
$$A = \left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 3 & 4 \\ \hline 0 & 5 & 0 \end{array}\right] \quad\text{and}\quad B = \left[\begin{array}{c|c} 3 & 0 \\ 1 & 4 \\ \hline 0 & 1 \end{array}\right],$$
then (27) ensures that
$$AB = \left[\begin{array}{c|c} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}\!\begin{bmatrix} 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix}[0] & \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}\!\begin{bmatrix} 0 \\ 4 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix}[1] \\ \hline [\,0 \ \ 5\,]\!\begin{bmatrix} 3 \\ 1 \end{bmatrix} + [0][0] & [\,0 \ \ 5\,]\!\begin{bmatrix} 0 \\ 4 \end{bmatrix} + [0][1] \end{array}\right] = \left[\begin{array}{c|c} \begin{matrix} 3 \\ 3 \end{matrix} & \begin{matrix} 2 \\ 16 \end{matrix} \\ \hline 5 & 20 \end{array}\right] = \begin{bmatrix} 3 & 2 \\ 3 & 16 \\ 5 & 20 \end{bmatrix}.$$
This agrees with (21) and with the usual computation of the matrix product.
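For instructors who demonstrate (27) in software, the computation in Example 28 can be reproduced with NumPy's block utilities; the partition below is the one indicated by the bars above (a minimal sketch):

```python
import numpy as np

A = np.array([[1, 0, 2], [0, 3, 4], [0, 5, 0]])
B = np.array([[3, 0], [1, 4], [0, 1]])

# The partition of Example 28: rows/columns of A split 2+1,
# rows of B split 2+1, columns of B split 1+1.
A11, A12, A21, A22 = A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]
B11, B12, B21, B22 = B[:2, :1], B[:2, 1:], B[2:, :1], B[2:, 1:]

AB = np.block([
    [A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
    [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22],
])
print(np.array_equal(AB, A @ B))   # True: (27) reassembles the ordinary product
```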
We are now ready for a symbolic example. Although there are more general
formulas for the inverse of a 2 × 2 block matrix [2, P.3.28], the following special
case is sufficient for our purposes.
Example 29 (Inverse of a block triangular matrix). We claim that if Y ∈ Mn and
Z ∈ Mm are invertible, then
$$\begin{bmatrix} Y & X \\ 0 & Z \end{bmatrix}^{-1} = \begin{bmatrix} Y^{-1} & -Y^{-1}XZ^{-1} \\ 0 & Z^{-1} \end{bmatrix}. \tag{30}$$
How can such a result be discovered? Perform row reduction with block matrices,
being careful to take into account the noncommutativity of matrix multiplication:
$$\begin{bmatrix} Y^{-1} & 0 \\ 0 & I \end{bmatrix}\begin{bmatrix} Y & X \\ 0 & Z \end{bmatrix} = \begin{bmatrix} I & Y^{-1}X \\ 0 & Z \end{bmatrix} \qquad \text{(1) multiply the first block row by } Y^{-1},$$
$$\begin{bmatrix} I & 0 \\ 0 & Z^{-1} \end{bmatrix}\begin{bmatrix} I & Y^{-1}X \\ 0 & Z \end{bmatrix} = \begin{bmatrix} I & Y^{-1}X \\ 0 & I \end{bmatrix} \qquad \text{(2) multiply the second block row by } Z^{-1},$$
The induction hypothesis ensures that B −1 is upper triangular and hence so is A−1 .
This completes the induction.
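A quick numerical sanity check of (30) is also easy to set up (the random test matrices below are an arbitrary illustration; a square Gaussian matrix is invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((3, 3))
Z = rng.standard_normal((2, 2))
X = rng.standard_normal((3, 2))

T = np.block([[Y, X], [np.zeros((2, 3)), Z]])
Yi, Zi = np.linalg.inv(Y), np.linalg.inv(Z)
T_inv = np.block([[Yi, -Yi @ X @ Zi], [np.zeros((2, 3)), Zi]])

print(np.allclose(T @ T_inv, np.eye(5)))   # True, as (30) predicts
```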
Determinants are a staple of many introductory linear algebra courses. Numer-
ical recipes are often given for 2 × 2 and 3 × 3 matrices. Various techniques are
occasionally introduced to evaluate larger determinants. Since the development of
eigenvalues and eigenvectors is often based upon determinants via the characteristic
polynomial (although this is not how modern numerical algorithms approach the
subject [3]), techniques to compute determinants of larger matrices should be a wel-
come addition to the curriculum. This makes carefully crafted problems involving block-matrix determinants a valuable source of classroom exercises.
Example 36 (Geometric versus algebraic multiplicity). Let λ be an eigenvalue of A ∈ Mn
with geometric multiplicity k = dim null(A − λI). We claim that k is at most the multiplicity
of λ as a zero of the characteristic polynomial pA (z). Let the columns of X ∈ Mn×k be a basis
for the corresponding eigenspace; see Example 5. Choose Y ∈ Mn×(n−k) such that
S = [X Y ] ∈ Mn is invertible; see Example 17. Then AX = λX and
$$\begin{bmatrix} I_k & 0 \\ 0 & I_{n-k} \end{bmatrix} = I_n = S^{-1}S = [S^{-1}X \ \ S^{-1}Y], \quad\text{so}\quad S^{-1}X = \begin{bmatrix} I_k \\ 0 \end{bmatrix}.$$
Thus,
$$S^{-1}AS = S^{-1}A[X \ \ Y] = S^{-1}[AX \ \ AY] = S^{-1}[\lambda X \ \ AY] = [\lambda S^{-1}X \ \ S^{-1}AY] = \begin{bmatrix} \lambda I_k & \,?\, \\ 0 & C \end{bmatrix},$$
in which ? denotes a k × (n − k) submatrix whose entries are of no interest. Since
similar matrices have the same characteristic polynomial, Example 33 ensures that
$p_A(z) = p_{S^{-1}AS}(z) = (z - \lambda)^k\, p_C(z)$. Consequently, k = nullity(A − λI) is at most
the multiplicity of λ as a zero of pA (z).
Students should be warned repeatedly that matrix multiplication is noncommu-
tative. That is, if A ∈ Mm×n and B ∈ Mn×m , then both products AB and BA are
defined, but they need not be equal. Students may be pleased to learn that AB ∈ Mm and
BA ∈ Mn are remarkably alike, despite potentially being of different sizes. This
fact has an elegant explanation using block matrices.
Example 37 (AB versus BA). If A ∈ Mm×n and B ∈ Mn×m , then
$$\begin{bmatrix} AB & A \\ 0 & 0_n \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0_m & A \\ 0 & BA \end{bmatrix} \tag{38}$$
are similar since
$$\begin{bmatrix} I_m & 0 \\ B & I_n \end{bmatrix}\begin{bmatrix} AB & A \\ 0 & 0_n \end{bmatrix} = \begin{bmatrix} 0_m & A \\ 0 & BA \end{bmatrix}\begin{bmatrix} I_m & 0 \\ B & I_n \end{bmatrix},$$
in which the intertwining matrix is invertible. Since similar matrices have the same
characteristic polynomial, Example 33 ensures that
$$z^n\, p_{AB}(z) = z^m\, p_{BA}(z). \tag{39}$$
Thus, the nonzero eigenvalues of AB and BA are the same, with the same mul-
tiplicities. In fact, one can show that the Jordan canonical forms of AB and BA
differ only in their treatment of the eigenvalue zero [2, Thm. 11.9.1].
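A short NumPy experiment makes (39) concrete (random matrices chosen only for illustration): after discarding the extra zero eigenvalues of AB, the spectra of AB and BA agree.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 4))

eig_AB = np.linalg.eigvals(A @ B)              # four eigenvalues, two of them (numerically) zero
eig_BA = np.linalg.eigvals(B @ A)              # two eigenvalues

nonzero_AB = eig_AB[np.abs(eig_AB) > 1e-10]    # drop the two zeros
print(np.allclose(np.sort_complex(nonzero_AB), np.sort_complex(eig_BA)))   # True
```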
The preceding facts about AB and BA are more than just curiosities. Example
37 can be used to compute the eigenvalues of certain large, structured matrices.
Suppose that A ∈ Mn has rank r < n. If A = XY is a full-rank
factorization (Example 14), in which X, Y T ∈ Mn×r , then the eigenvalues of A are the eigenvalues
of the r × r matrix Y X, along with n − r zero eigenvalues. Consider the following
example.
Example 40. What are the eigenvalues of
$$A = \begin{bmatrix} 2 & 3 & 4 & \cdots & n+1 \\ 3 & 4 & 5 & \cdots & n+2 \\ 4 & 5 & 6 & \cdots & n+3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n+1 & n+2 & n+3 & \cdots & 2n \end{bmatrix}?$$
Since A has rank 2 (for n ≥ 2), a full-rank factorization A = XY with X ∈ Mn×2 and Y ∈ M2×n reduces the question to the eigenvalues of the 2 × 2 matrix Y X, which are
$$\frac{1}{2}\,n(n+1)\left(1 \pm 2\sqrt{\frac{2n+1}{6(n+1)}}\,\right),$$
together with n − 2 zero eigenvalues.
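A brief NumPy check of this closed form for a modest value of n (a verification sketch; n = 6 is chosen arbitrarily):

```python
import numpy as np

n = 6
A = np.array([[i + j for j in range(1, n + 1)] for i in range(1, n + 1)], dtype=float)

eigs = np.linalg.eigvalsh(A)                       # A is symmetric
largest_two = eigs[np.argsort(np.abs(eigs))[-2:]]  # the two nonzero eigenvalues

closed_form = 0.5 * n * (n + 1) * (1 + 2 * np.sqrt((2 * n + 1) / (6 * (n + 1))) * np.array([-1.0, 1.0]))
print(np.allclose(np.sort(largest_two), np.sort(closed_form)))   # True; the other n - 2 eigenvalues are 0
```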
Block-matrix computations can do much more than provide bonus problems and
alternative proofs of results in a first linear algebra course. Here are a few examples.
Another explanation can be based on the fact that XY and Y X have the same
nonzero eigenvalues, with the same multiplicities (see Example 37). With the ex-
ception of the eigenvalue 1, the matrices Im + XY and In + Y X have the same
eigenvalues with the same multiplicities. Since the determinant of a matrix is the
product of its eigenvalues, (42) follows.
The identity (44) can be used to create large matrices whose determinants can be
computed in a straightforward manner. For example,
$$\begin{bmatrix} 2&1&1&1&1 \\ 1&0&1&1&1 \\ 1&1&2&1&1 \\ 1&1&1&0&1 \\ 1&1&1&1&2 \end{bmatrix} = \begin{bmatrix} 1&0&0&0&0 \\ 0&-1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&-1&0 \\ 0&0&0&0&1 \end{bmatrix} + \begin{bmatrix} 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \\ 1&1&1&1&1 \end{bmatrix} = A + \mathbf{u}\mathbf{v}^T. \tag{45}$$
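One standard identity for rank-one updates, det(A + uvT ) = (1 + vT A−1 u) det A for invertible A, makes the determinant of the matrix in (45) immediate. The following NumPy sketch (with A the diagonal matrix in (45) and u = v the all-ones vector, as the notation A + uvT indicates) checks this against a direct computation:

```python
import numpy as np

A = np.diag([1.0, -1.0, 1.0, -1.0, 1.0])
u = np.ones(5)
v = np.ones(5)
M = A + np.outer(u, v)                     # the 5 x 5 matrix displayed in (45)

direct = np.linalg.det(M)
via_update = (1 + v @ np.linalg.inv(A) @ u) * np.linalg.det(A)
print(round(direct, 6), round(via_update, 6))   # 2.0 2.0
```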
7. Kronecker products
We conclude with a discussion of Kronecker products. It illustrates again that
block-matrix arithmetic can be a useful pedagogical tool.
If A = [aij ] ∈ Mm×n and B ∈ Mp×q , then the Kronecker product of A and B is
the block matrix
$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix} \in M_{mp\times nq}.$$
Example 53. If
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 5 & 6 \end{bmatrix},$$
then
$$A \otimes B = \begin{bmatrix} B & 2B \\ 3B & 4B \end{bmatrix} = \begin{bmatrix} 5 & 6 & 10 & 12 \\ 15 & 18 & 20 & 24 \end{bmatrix}$$
and
$$B \otimes A = \begin{bmatrix} 5A & 6A \end{bmatrix} = \begin{bmatrix} 5 & 10 & 6 & 12 \\ 15 & 20 & 18 & 24 \end{bmatrix}.$$
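NumPy's np.kron builds exactly this block matrix, so Example 53 can be reproduced in two calls (a minimal sketch):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6]])

print(np.kron(A, B))   # [[ 5  6 10 12]
                       #  [15 18 20 24]]
print(np.kron(B, A))   # [[ 5 10  6 12]
                       #  [15 20 18 24]]
```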
The Kronecker product interacts with ordinary matrix multiplication and addition as follows (A, B, C, D are matrices of compatible sizes and c is a scalar):
(i) (A ⊗ B)(C ⊗ D) = AC ⊗ BD;
(ii) c(A ⊗ B) = (cA) ⊗ B = A ⊗ (cB);
(iii) (A + B) ⊗ C = A ⊗ C + B ⊗ C;
(iv) A ⊗ (B + C) = A ⊗ B + A ⊗ C;
(v) A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C.
If A and B are square matrices of the same size then the eigenvalues of AB need
not be products of eigenvalues of A and B. However, for square matrices A and B
of any size, all of the eigenvalues of A ⊗ B are products of eigenvalues of A and B,
BLOCK MATRICES IN LINEAR ALGEBRA 15
and all possible products (by algebraic multiplicity) occur; see [2, P.10.39]. This
fact (and a related version for sums of eigenvalues) can be used by instructors who
wish to construct matrices with prescribed eigenvalues and multiplicities.
Example 54. If Ax = λx and By = µy, then
(A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By) = (λx) ⊗ (µy) = λµ(x ⊗ y)
and
[(A ⊗ I) + (I ⊗ B)](x ⊗ y) = (A ⊗ I)(x ⊗ y) + (I ⊗ B)(x ⊗ y)
= Ax ⊗ y + x ⊗ By
= λx ⊗ y + µx ⊗ y
= (λ + µ)(x ⊗ y).
That is, if λ and µ are eigenvalues of A and B, respectively, then λµ is an eigenvalue
of A ⊗ B and λ + µ is an eigenvalue of A ⊗ I + I ⊗ B.
Example 55. The eigenvalues of
$$\begin{bmatrix} 3 & 4 & 6 & 8 \\ 2 & 1 & 4 & 2 \\ 12 & 16 & 9 & 12 \\ 8 & 4 & 6 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix} \otimes \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix}$$
are −5, −5, 1, and 25; these are 5 × (−1), (−1) × 5, (−1) × (−1), and 5 × 5. The
eigenvalues of each factor are −1 and 5.
Example 56. The eigenvalues of
$$\begin{bmatrix} 4 & 4 & 2 & 0 \\ 2 & 2 & 0 & 2 \\ 4 & 0 & 6 & 4 \\ 0 & 4 & 2 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix} \otimes I_2 + I_2 \otimes \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix}$$
are −2, 4, 4, and 10; these are (−1) + (−1), (−1) + 5, 5 + (−1), and 5 + 5.
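Examples 55 and 56 are easy to reproduce numerically as well; a minimal NumPy sketch:

```python
import numpy as np

A = np.array([[1, 2], [4, 3]])    # eigenvalues -1 and 5
B = np.array([[3, 4], [2, 1]])    # eigenvalues -1 and 5
I2 = np.eye(2)

print(np.sort(np.linalg.eigvals(np.kron(A, B)).real))                     # [-5. -5.  1. 25.]
print(np.sort(np.linalg.eigvals(np.kron(A, I2) + np.kron(I2, B)).real))   # [-2.  4.  4. 10.]
```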
We conclude with a proof of a seminal result in abstract algebra: the algebraic
numbers form a field. That such a result should have a simple proof using block
matrices indicates the usefulness of the method.
An algebraic number is a complex number that is a zero of a monic polynomial
with rational coefficients. Let
$$f(z) = z^n + c_{n-1}z^{n-1} + c_{n-2}z^{n-2} + \cdots + c_1 z + c_0, \qquad n \ge 1.$$
The companion matrix of f is Cf = [−c0 ] if n = 1 and is
$$C_f = \begin{bmatrix} 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix} \quad \text{if } n \ge 2.$$
Induction and cofactor expansion along the top row of zI − Cf shows that f is the
characteristic polynomial of Cf . Consequently, a complex number is algebraic if
and only if it is an eigenvalue of a matrix with rational entries.
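The companion matrix is also convenient computationally. The following sketch builds C_f for a given monic polynomial and confirms that its eigenvalues are the zeros of f; the polynomial f(z) = z^2 − z − 1 is chosen purely as an illustration.

```python
import numpy as np

def companion(c):
    """Companion matrix of f(z) = z^n + c[n-1] z^{n-1} + ... + c[1] z + c[0],
    with c = [c_0, c_1, ..., c_{n-1}]."""
    n = len(c)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)      # ones on the subdiagonal
    C[:, -1] = -np.asarray(c)       # last column carries -c_0, ..., -c_{n-1}
    return C

C = companion([-1.0, -1.0])         # f(z) = z^2 - z - 1
print(np.sort(np.linalg.eigvals(C)))             # approx. [-0.618  1.618]
print(np.sort(np.roots([1.0, -1.0, -1.0])))      # the same zeros of f
```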
References
[1] Stephan Ramon Garcia. Linearly independent spanning sets. Amer. Math. Monthly,
124(8):722, 2017.
[2] Stephan Ramon Garcia and Roger A. Horn. A second course in linear algebra. Cambridge
Mathematical Textbooks. Cambridge University Press, New York, 2017.
[3] Gene H. Golub and Charles F. Van Loan. Matrix computations. Johns Hopkins Studies in the
Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, fourth edition, 2013.
[4] Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press, Cam-
bridge, second edition, 2013.
[5] David C. Lay, Steven R. Lay, and Judi J. McDonald. Linear Algebra and Its Applications.
Pearson, fifth edition, 2015.
[6] Carl Meyer. Matrix analysis and applied linear algebra. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA, 2000. With 1 CD-ROM (Windows, Macintosh and
UNIX) and a solutions manual (iv+171 pp.).
[7] Fuzhen Zhang. Matrix theory. Universitext. Springer, New York, second edition, 2011. Basic
results and techniques.