A Primer on Matrices
Fall 2005
These notes describe the notation of matrices, the mechanics of matrix manipulation,
and how to use matrices to formulate and solve sets of simultaneous linear equations.
We won't cover:

- linear algebra, i.e., the underlying mathematics of matrices
- numerical linear algebra, i.e., the algorithms used to manipulate matrices and solve
  linear equations
- software for forming and manipulating matrices, e.g., Matlab, Mathematica, or Octave
- how to represent and manipulate matrices, or solve linear equations, in computer
  languages such as C/C++ or Java
- applications, for example in statistics, mechanics, economics, circuit analysis, or
  graph theory
Matrices
A matrix is a rectangular array of numbers (also called scalars), written between square
brackets, as in

    A = [ 0    1   -2.3  0.1 ]
        [ 1.3  4   -0.1  0   ]
        [ 4.1 -1    0    1.7 ].
An important attribute of a matrix is its size or dimensions, i.e., the numbers of rows and
columns. The matrix A above, for example, has 3 rows and 4 columns, so its size is 3 × 4.
(Size is always given as rows × columns.) A matrix with m rows and n columns is called an
m × n matrix.

An m × n matrix is called square if m = n, i.e., if it has an equal number of rows and
columns. Some authors refer to an m × n matrix as fat if m < n (fewer rows than columns),
or skinny if m > n (more rows than columns). The matrix A above is fat.
The entries or coefficients of a matrix are the values in the array. The i, j entry is the
value in the ith row and jth column, denoted by double subscripts: the i, j entry of a matrix
C is denoted Cij (which is a number). The positive integers i and j are called the (row and
column, respectively) indices. For our example above, A13 = -2.3, A32 = -1. The row
index of the bottom left entry (which has value 4.1) is 3; its column index is 1.
Two matrices are equal if they are the same size and all the corresponding entries (which
are numbers) are equal.
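Although these notes don't cover software, a small sketch can make the size and indexing
conventions concrete. The following Python snippet (our illustration, not part of the
original notes; the helper names are ours) stores a matrix as a list of rows; note that
Python indexes from 0, so the i, j entry in the 1-based notation above lives at position
[i-1][j-1]:

    # The example matrix A above, stored as a list of rows.
    A = [[0.0, 1.0, -2.3, 0.1],
         [1.3, 4.0, -0.1, 0.0],
         [4.1, -1.0, 0.0, 1.7]]

    def size(M):
        """Return (rows, columns) of a matrix stored as a list of rows."""
        return (len(M), len(M[0]))

    def entry(M, i, j):
        """Return the i, j entry, using the 1-based indexing of these notes."""
        return M[i - 1][j - 1]

    print(size(A))         # (3, 4)
    print(entry(A, 1, 3))  # -2.3, i.e., A13
    print(entry(A, 3, 2))  # -1.0, i.e., A32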
Vectors and scalars
A matrix with only one column, i.e., with size n × 1, is called a column vector or just a
vector. Sometimes the size is specified by calling it an n-vector. The entries of a vector are
denoted with just one subscript (since the other is 1), as in a3. The entries are sometimes
called the components of the vector, and the number of rows of a vector is sometimes called
its dimension. As an example,

    v = [  1  ]
        [ -2  ]
        [ 3.3 ]
        [ 0.3 ]

is a 4-vector (or 4 × 1 matrix, or vector of dimension 4); its third component is v3 = 3.3.
Similarly, a matrix with only one row, i.e., with size 1 × n, is called a row vector. As an
example,

    w = [ -2.1  -3  0 ]

is a row vector (or 1 × 3 matrix).

Zero and identity matrices

The zero matrix (of size m × n) is the matrix with all entries equal to zero, and is denoted
0, the same symbol used for the number 0. You usually have to figure out its size from
context.

An identity matrix is another common matrix. It is always square, and is denoted I. Its
entries are one on the diagonal and zero elsewhere:

    Iij = 1 if i = j,    Iij = 0 if i ≠ j.

Perhaps more illuminating are the examples

    I = [ 1 0 ]      I = [ 1 0 0 0 ]
        [ 0 1 ],         [ 0 1 0 0 ]
                         [ 0 0 1 0 ]
                         [ 0 0 0 1 ],

which are the 2 × 2 and 4 × 4 identity matrices. (Remember that both are denoted with the
same symbol, namely, I.) The importance of the identity matrix will become clear later.
Unit and ones vectors
A vector with one component one and all others zero is called a unit vector. The ith unit
vector, whose ith component is 1 and all others are zero, is usually denoted ei. As with
zero or identity matrices, you usually have to figure out the dimension of a unit vector
from context. The three unit 3-vectors are:

    e1 = [ 1 ]       e2 = [ 0 ]       e3 = [ 0 ]
         [ 0 ],           [ 1 ],           [ 0 ]
         [ 0 ]            [ 0 ]            [ 1 ].

Note that the n columns of the n × n identity matrix are the n unit n-vectors. Another term
for ei is ith standard basis vector. Also, you should watch out, because some authors use
the term unit vector to mean a vector of length one. (We'll explain that later.)
Another common vector is the one with all components one, sometimes called the ones
vector, and denoted 1 (by some authors) or e (by others). For example, the 4-dimensional
ones vector is

    1 = [ 1 ]
        [ 1 ]
        [ 1 ]
        [ 1 ].
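As a quick sketch in the same Python style as before (the helper names are ours), here is
one way to build the ith unit n-vector and the ones n-vector:

    def unit_vector(i, n):
        """The ith unit n-vector: 1 in position i (1-based), 0 elsewhere."""
        return [1.0 if k == i - 1 else 0.0 for k in range(n)]

    def ones_vector(n):
        """The n-vector with all components one."""
        return [1.0] * n

    print(unit_vector(2, 3))  # [0.0, 1.0, 0.0], i.e., e2
    print(ones_vector(4))     # [1.0, 1.0, 1.0, 1.0]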
Matrix operations

Matrices can be combined using various operations to form other matrices.

Matrix transpose

If A is an m × n matrix, its transpose, denoted A^T, is the n × m matrix given by
(A^T)ij = Aji. In words, the rows and columns of A are transposed in A^T. For example,

    [ 0 4 ]^T
    [ 7 0 ]   = [ 0 7 3 ]
    [ 3 1 ]     [ 4 0 1 ].

Transposition converts row vectors into column vectors, and vice versa. If we transpose a
matrix twice, we get back the original matrix: (A^T)^T = A.
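In code, transposition just swaps the row and column indices. A minimal sketch, continuing
with the list-of-rows representation used earlier:

    def transpose(M):
        """Return M^T: the j, i entry of the result is the i, j entry of M."""
        m, n = len(M), len(M[0])
        return [[M[i][j] for i in range(m)] for j in range(n)]

    B = [[0, 4], [7, 0], [3, 1]]
    print(transpose(B))                   # [[0, 7, 3], [4, 0, 1]]
    print(transpose(transpose(B)) == B)   # True: transposing twice recovers B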
Matrix addition
Two matrices of the same size can be added together, to form another matrix (of the same
size), by adding the corresponding entries (which are numbers). Matrix addition is denoted
by the symbol +. (Thus the symbol + is overloaded to mean scalar addition when scalars
appear on its left and right hand side, and matrix addition when matrices appear on its left
and right hand sides.) For example,
    [ 0 4 ]   [ 1 2 ]   [ 1 6 ]
    [ 7 0 ] + [ 2 3 ] = [ 9 3 ]
    [ 3 1 ]   [ 0 4 ]   [ 3 5 ].
A pair of row or column vectors of the same size can be added, but you cannot add
together a row vector and a column vector (except when they are both scalars!).
Matrix subtraction is similar. As an example,

    [ 1 6 ]       [ 0 6 ]
    [ 9 3 ] - I = [ 9 2 ].

Note that this gives an example where we have to figure out what size the identity matrix
is. Since you can only add (or subtract) matrices of the same size, we conclude that I must
refer to a 2 × 2 identity matrix.
Matrix addition is commutative, i.e., if A and B are matrices of the same size, then
A + B = B + A. It's also associative, i.e., (A + B) + C = A + (B + C), so we write both as
A + B + C. We always have A + 0 = 0 + A = A, i.e., adding the zero matrix to a matrix
has no effect. (This is another example where you have to figure out the exact dimensions
of the zero matrix from context. Here, the zero matrix must have the same dimensions as
A; otherwise they could not be added.)
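Since matrix addition is entrywise, it is a short loop in code. A sketch in the same style
as before, with an explicit size check so that incompatible matrices are rejected:

    def add(A, B):
        """Entrywise sum of two matrices of the same size."""
        if len(A) != len(B) or len(A[0]) != len(B[0]):
            raise ValueError("matrices must have the same size")
        return [[A[i][j] + B[i][j] for j in range(len(A[0]))]
                for i in range(len(A))]

    print(add([[0, 4], [7, 0], [3, 1]],
              [[1, 2], [2, 3], [0, 4]]))  # [[1, 6], [9, 3], [3, 5]]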
Scalar multiplication
Another operation is scalar multiplication: multiplying a matrix by a scalar (i.e., number),
which is done by multiplying every entry of the matrix by the scalar. Scalar multiplication
is usually denoted by juxtaposition, with the scalar on the left, as in
         [ 1 6 ]   [  -2 -12 ]
    (-2) [ 9 3 ] = [ -18  -6 ].
         [ 6 0 ]   [ -12   0 ]
Sometimes you see scalar multiplication with the scalar on the right, or even scalar division
with the scalar shown in the denominator (which just means scalar multiplication by one
over the scalar), as in
"
1 6
2 12
9 3
18
6
2
=
,
6 0
12 0
9 6 9
6 0 3
3
"
3 2 3
2 0 1
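Scalar multiplication is likewise entrywise; a one-line sketch reproduces the examples
above:

    def scale(alpha, M):
        """Multiply every entry of M by the scalar alpha."""
        return [[alpha * x for x in row] for row in M]

    print(scale(-2, [[1, 6], [9, 3], [6, 0]]))
    # [[-2, -12], [-18, -6], [-12, 0]]
    print(scale(1 / 3, [[9, 6, 9], [6, 0, 3]]))
    # [[3.0, 2.0, 3.0], [2.0, 0.0, 1.0]], up to floating-point rounding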
Matrix multiplication

It is also possible to multiply two matrices using matrix multiplication. You can multiply
two matrices A and B provided their dimensions are compatible, which means the number of
columns of A equals the number of rows of B. Suppose A and B are compatible, i.e., A has
size m × p and B has size p × n. Then the product matrix C = AB, which has size m × n, is
defined by

    Cij = Ai1 B1j + Ai2 B2j + ··· + Aip Bpj,    i = 1, ..., m,   j = 1, ..., n.

This rule looks complicated, but there are several ways to remember it. To find the i, j
entry of the product C = AB, you need to know the ith row of A and the jth column of B. The
summation above can be interpreted as moving left to right along the ith row of A while
moving top to bottom down the jth column of B. As you go, you keep a running sum of the
product of the corresponding entries from A and B.
As an example, let's find the product C = AB, where

    A = [  1 2 3 ]        B = [  0 -3 ]
        [ -1 0 4 ],           [  2  1 ]
                              [ -1  0 ].
First, we check that they are compatible: A has three columns, and B has three rows, so
they're compatible. The product matrix C will have two rows (the number of rows of A)
and two columns (the number of columns of B). Now let's find the entries of the product
C. To find the 1, 1 entry, we move across the first row of A and down the first column of
B, summing the products of corresponding entries:

    C11 = (1)(0) + (2)(2) + (3)(-1) = 1.

To find the 1, 2 entry, we move across the first row of A and down the second column of B:

    C12 = (1)(-3) + (2)(1) + (3)(0) = -1.

In each product term here, the lefthand number comes from the first row of A, and the
righthand number comes from the second column of B. Two more similar calculations give us
the remaining entries C21 and C22:
"
1 2 3
1 0 4
#
"
0 3
1 1
1 =
.
2
4 3
1 0
At this point, matrix multiplication probably looks very complicated to you. It is, but once
you see all the uses for it, you'll get used to it.
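One way to see that the rule is mechanical: the sum over k becomes the innermost of three
nested loops. This sketch (ours, not from the notes) reproduces the worked example:

    def matmul(A, B):
        """Product C = AB, where A is m x p and B is p x n."""
        m, p, n = len(A), len(B), len(B[0])
        assert len(A[0]) == p, "incompatible dimensions"
        C = [[0.0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                for k in range(p):  # running sum along row i of A, column j of B
                    C[i][j] += A[i][k] * B[k][j]
        return C

    A = [[1, 2, 3], [-1, 0, 4]]
    B = [[0, -3], [2, 1], [-1, 0]]
    print(matmul(A, B))  # [[1.0, -1.0], [-4.0, 3.0]]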
Some properties of matrix multiplication
Now we can explain why the identity has its name: if A is any m × n matrix, then AI = A and
IA = A, i.e., when you multiply a matrix by an identity matrix, it has no effect. (The
identity matrices in the formulas AI = A and IA = A have different sizes. What are they?)
One very important fact about matrix multiplication is that it is (in general) not
commutative: we don't (in general) have AB = BA. In fact, BA may not even make sense, or,
if it makes sense, may be a different size than AB (so that equality in AB = BA is
meaningless). For example, if A is 2 × 3 and B is 3 × 4, then AB makes sense (the dimensions
are compatible) but BA doesn't even make sense (much less equal AB). Even when AB
and BA both make sense and are the same size, i.e., when A and B are square, we don't (in
general) have AB = BA. As a simple example, consider:
"
1 6
9 3
#"
0 1
1 2
"
6 11
3 3
"
0 1
1 2
#"
1 6
9 3
"
9 3
17 0
Matrix multiplication is associative, i.e., (AB)C = A(BC) (provided the products make
sense). Therefore we write the product simply as ABC. Matrix multiplication is also
associative with scalar multiplication, i.e., α(AB) = (αA)B, where α is a scalar and A and
B are matrices (that can be multiplied). Matrix multiplication distributes across matrix
addition: A(B + C) = AB + AC and (A + B)C = AC + BC.
Matrix-vector product
A very important and common case of matrix multiplication is y = Ax, where A is an m × n
matrix, x is an n-vector, and y is an m-vector. We can think of matrix-vector multiplication
(with an m × n matrix) as a function that transforms n-vectors into m-vectors. The formula
is

    yi = Ai1 x1 + ··· + Ain xn,    i = 1, ..., m.
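In code, this is one row-times-vector sum per output component; a short sketch in the same
Python style:

    def matvec(A, x):
        """y = Ax, so y_i = Ai1 x1 + ... + Ain xn."""
        return [sum(a * xj for a, xj in zip(row, x)) for row in A]

    print(matvec([[1, 2, 3], [-1, 0, 4]], [1, 1, 1]))  # [6, 3]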
Inner product
Another important special case of matrix multiplication occurs when v is a row n-vector and
w is a column n-vector. Then the product vw makes sense, and has size 1 × 1, i.e., is a
scalar:

    vw = v1 w1 + ··· + vn wn.

This occurs often in the form x^T y where x and y are both n-vectors. In this case the
product (which is a number) is called the inner product or dot product of the vectors x and
y. Other notation for the inner product is ⟨x, y⟩ or x · y. If x and y are n-vectors, then
their inner product is

    ⟨x, y⟩ = x^T y = x1 y1 + ··· + xn yn.

But remember that the matrix product xy doesn't make sense (unless they are both scalars).
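A sketch of the inner product, which is just the same running sum with a single row:

    def inner(x, y):
        """<x, y> = x1 y1 + ... + xn yn."""
        assert len(x) == len(y), "vectors must have the same size"
        return sum(xi * yi for xi, yi in zip(x, y))

    print(inner([1, 2, 3], [4, 5, 6]))  # 4 + 10 + 18 = 32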
Matrix powers
When a matrix A is square, it makes sense to multiply A by itself, i.e., to form AA. We
refer to this matrix as A^2. Similarly, k copies of A multiplied together is denoted A^k.
(Non-integer powers, such as A^{1/2} (the matrix square root), are pretty tricky: they
might not make sense, or be ambiguous, unless certain conditions on A hold. This is an
advanced topic in linear algebra.)

By convention we set A^0 = I (usually only when A is invertible; see below).
Matrix inverse
If A is square, and there is a matrix F such that FA = I, then we say that A is invertible
or nonsingular. We call F the inverse of A, and denote it A^{-1}. We can then also define
A^{-k} = (A^{-1})^k. If a matrix is not invertible, we say it is singular or noninvertible.

It's important to understand that not all square matrices are invertible, i.e., have
inverses. (For example, a zero matrix never has an inverse.) As a less obvious example, you
might try to show that the matrix

    [  1 -1 ]
    [ -2  2 ]

does not have an inverse.
As an example of the matrix inverse, we have

    [ 1 -1 ]^-1          [  2 1 ]
    [ 1  2 ]    = (1/3)  [ -1 1 ].

The inverse of a general 2 × 2 matrix is given by a formula:

    [ a b ]^-1                   [  d -b ]
    [ c d ]    = (1/(ad - bc))   [ -c  a ],

provided ad - bc ≠ 0. (If ad - bc = 0, the matrix is not invertible.) There are similar, but
much more complicated, formulas for the inverse of larger (invertible) square matrices, but
they are not used in practice.
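The 2 × 2 formula translates directly into code; this sketch also enforces the ad - bc ≠ 0
condition, and recovers the example inverse above:

    def inverse_2x2(M):
        """Inverse of a 2 x 2 matrix via the (ad - bc) formula."""
        (a, b), (c, d) = M
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is singular (ad - bc = 0)")
        return [[d / det, -b / det], [-c / det, a / det]]

    print(inverse_2x2([[1, -1], [1, 2]]))
    # [[0.666..., 0.333...], [-0.333..., 0.333...]], i.e., (1/3) [[2, 1], [-1, 1]]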
The importance of the matrix inverse will become clear when we study linear equations.
Useful identities
We've already mentioned a handful of matrix identities that you could figure out yourself,
e.g., A + 0 = A. Here we list a few others that are not hard to derive, and quite useful.
(We're making no claims that our list is complete!)
- transpose of product: (AB)^T = B^T A^T
- transpose of sum: (A + B)^T = A^T + B^T
- inverse of product: (AB)^{-1} = B^{-1} A^{-1}, provided A and B are square (of the same
  size) and invertible
- products of powers: A^k A^l = A^{k+l} (for k, l ≥ 1 in general, and for all k, l if A is
  invertible)
Block matrices and submatrices
In some applications it's useful to form matrices whose entries are themselves matrices, as
in

    [ A B C ],      [ F I ]
                    [ 0 G ],

where A, B, C, F, and G are matrices (as are 0 and I). Such matrices are called block
matrices; the entries A, B, etc. are called blocks and are sometimes named by indices.
Thus, F is called the 1, 1 block of the second matrix.
Of course the block matrices must have the right dimensions to be able to fit together:
matrices in the same (block) row must have the same number of rows (i.e., the same height);
matrices in the same (block) column must have the same number of columns (i.e., the same
width). Thus in the examples above, A, B and C must have the same number of rows (e.g., they
could be 2 × 3, 2 × 2, and 2 × 1). The second example is more interesting. Suppose that F is
m × n. Then the identity matrix in the 1, 2 position must have size m × m (since it must
have the same number of rows as F). We also see that G must have m columns, say, dimensions
p × m. That fixes the dimensions of the 0 matrix in the 2, 1 block: it must be p × n.
As a specific example, suppose that

    C = [ 2 2 ]       D = [ 0 2 3 ]
        [ 1 3 ],          [ 5 4 7 ].

Then we have

    [ D C ] = [ 0 2 3 2 2 ]
              [ 5 4 7 1 3 ].

Continuing this example, the block expression

    [ C ]
    [ D ]

doesn't make sense, because the top block has two columns and the bottom block has three.
But the block expression
"
#
C
DT
does make sense, because now the bottom block has two columns, just like the top block.
You can also divide a larger matrix (or vector) into blocks. In this context the blocks
are sometimes called submatrices of the big matrix. For example, it's often useful to write
an m × n matrix as a 1 × n block matrix of m-vectors (which are just its columns), or as an
m × 1 block matrix of n-row-vectors (which are its rows).
Block matrices can be added and multiplied as if the entries were numbers, provided the
corresponding entries have the right sizes (i.e., conform) and you're careful about the
order of multiplication. Thus we have

    [ A B ] [ X ]   [ AX + BY ]
    [ C D ] [ Y ] = [ CX + DY ].
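As a numerical sanity check, the sketch below (reusing the matmul and add helpers defined
earlier) assembles small example blocks into one big matrix and confirms that multiplying
directly agrees with the block formula:

    # Small example blocks; X and Y are column vectors written as 2 x 1 matrices.
    A, B = [[1, 0], [0, 1]], [[2, 3], [4, 5]]
    C, D = [[1, 1], [1, 1]], [[0, 1], [1, 0]]
    X, Y = [[1], [2]], [[3], [4]]

    # Assemble [A B; C D] and [X; Y] entrywise.
    big = ([ra + rb for ra, rb in zip(A, B)] +
           [rc + rd for rc, rd in zip(C, D)])
    stacked = X + Y

    blockwise = add(matmul(A, X), matmul(B, Y)) + add(matmul(C, X), matmul(D, Y))
    print(matmul(big, stacked) == blockwise)  # True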
Linear functions
Suppose that f is a function that takes as argument (input) n-vectors and returns (as output)
m-vectors. We say f is linear if it satisfies two properties:

- scaling: for any n-vector x and any scalar α, f(αx) = αf(x)
- superposition: for any n-vectors u and v, f(u + v) = f(u) + f(v)

It's not hard to show that such a function can always be represented as matrix-vector
multiplication: there is an m × n matrix A such that f(x) = Ax for all n-vectors x.
(Conversely, functions defined by matrix-vector multiplication are linear.)
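The representation is constructive: since x = x1 e1 + ··· + xn en, linearity gives
f(x) = x1 f(e1) + ··· + xn f(en), so the jth column of A is f(ej). A sketch (reusing the
unit_vector helper from earlier) that recovers A from a linear function f:

    def matrix_of(f, n):
        """Matrix A with f(x) = Ax: column j of A is f(e_j)."""
        cols = [f(unit_vector(j, n)) for j in range(1, n + 1)]
        m = len(cols[0])
        return [[cols[j][i] for j in range(n)] for i in range(m)]

    f = lambda x: [2 * x[0] + x[1] - x[2], -x[1] + x[2]]  # a linear function
    print(matrix_of(f, 3))  # [[2.0, 1.0, -1.0], [0.0, -1.0, 1.0]]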
We can also write out the linear function in explicit form, i.e., f(x) = y where

    yi = Ai1 x1 + ··· + Ain xn,    i = 1, ..., m.

This gives a simple interpretation of Aij: it gives the coefficient by which yi depends on
xj.
Suppose an m-vector y is a linear function of the n-vector x, i.e., y = Ax where A is
m n. Suppose also that a p-vector z is a linear function of y, i.e., z = By where B is p m.
Then z is a linear function of x, which we can express in the simple form z = By = (BA)x.
So matrix multiplication corresponds to composition of linear functions (i.e., linear functions
of linear functions of some variables).
Linear equations
Any set of m linear equations in (scalar) variables x1, ..., xn can be represented by the
compact matrix equation Ax = b, where x is a vector made from the variables, A is an m × n
matrix, and b is an m-vector. Let's start with a simple example of two equations in three
variables:

    1 + x2 = x3 - 2x1,      x3 = x2 - 2.

The first thing to do is to rewrite the equations with the variables lined up in columns,
and the constants on the righthand side:

    2x1 + x2 - x3 = -1
    0x1 - x2 + x3 = -2
Now it's easy to rewrite the equations as a single matrix equation:

    [ 2  1 -1 ] [ x1 ]   [ -1 ]
    [ 0 -1  1 ] [ x2 ] = [ -2 ],
                [ x3 ]

which has the compact form Ax = b, with

    A = [ 2  1 -1 ]       [ x1 ]        [ -1 ]
        [ 0 -1  1 ],  x = [ x2 ],   b = [ -2 ].
                          [ x3 ]
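To close the loop on this example, a short sketch (reusing the matvec helper from earlier)
checks that a particular x satisfies Ax = b; with two equations in three variables, there
are many such solutions:

    A = [[2, 1, -1], [0, -1, 1]]
    b = [-1, -2]
    x = [-1.5, 0, -2]          # one particular solution of the two equations
    print(matvec(A, x) == b)   # True: Ax equals b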