Linear Algebra Notes
Nikhil Srivastava
February 9, 2015
Scalars are lowercase, matrices are uppercase, and vectors are lowercase bold. All vectors
are column vectors (i.e., a vector in Rn is an n × 1 matrix), unless transposed.
1 Column Picture of Matrix-Vector Multiplication

Recall that if A is an m × n matrix with columns a1, . . . , an ∈ Rm and x = (x1, . . . , xn)^T ∈ Rn, then

Ax = x1 a1 + x2 a2 + . . . + xn an,

i.e., Ax is the linear combination of the columns of A whose coefficients are the entries of x.
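As a quick numerical illustration of this column picture (a minimal numpy sketch with made-up values, not part of the notes themselves), A @ x agrees with the explicit linear combination of the columns of A:

```python
import numpy as np

# A small made-up example: a 3 x 2 matrix and a vector in R^2.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [4.0, -1.0]])
x = np.array([3.0, -2.0])

direct = A @ x                                   # ordinary matrix-vector product
column_combo = x[0] * A[:, 0] + x[1] * A[:, 1]   # x1*a1 + x2*a2, the column picture

print(direct)                             # [-1. -2. 14.]
print(np.allclose(direct, column_combo))  # True
```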
This view makes it transparent that matrices can be used to represent linear transforma-
tions with respect to a pair of bases. Suppose T : Rn → Rm is a linear transformation, and
let e1 , . . . , en be the standard basis of Rn . Then we can write any x ∈ Rn as
x = x1 e1 + . . . + xn en
for some coefficients xi ; indeed, this is what we mean when we identify x with its standard
coordinate vector:
[x] = (x1, x2, . . . , xn)^T.
Since T is linear, it is completely determined by its values on the basis:

T(x) = T(x1 e1 + . . . + xn en) = x1 T(e1) + . . . + xn T(en).

Since these vectors are equal, their coordinate vectors in the standard basis of the “output”
vector space Rm must also be equal:

[T(x)] = x1 [T(e1)] + . . . + xn [T(en)].
But since matrix-vector multiplication is the same thing as taking linear combinations,
we may write this as
[T(x)] = [ [T(e1)] | [T(e2)] | . . . | [T(en)] ] (x1, x2, . . . , xn)^T,
where [v] denotes the standard coordinate vector of v. Note that anything that appears
inside a matrix must be some sort of coordinate vector: it does not make sense to put
an abstract vector inside a matrix whose entries are numbers. However, as before, we will
identify v with its standard coordinate vector [v] and drop the brackets when we are working
in the standard basis. With this identification, we can write:
T(x) = [ T(e1) | T(e2) | . . . | T(en) ] (x1, x2, . . . , xn)^T.
Matrix multiplication is defined precisely so that it corresponds to composition of linear
transformations:

[S ◦ T] = [S][T],

where (S ◦ T)(v) = S(T(v)) denotes the composition of two linear transformations S and T.
More concretely, for any matrix A and column vectors c1, . . . , cn, it is the unique definition
for which

A [ c1 | c2 | . . . | cn ] = [ Ac1 | Ac2 | . . . | Acn ].
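A minimal numpy check of this column-by-column identity, with randomly generated illustrative matrices (made-up values, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
c1, c2, c3 = rng.standard_normal((3, 3))            # three made-up column vectors in R^3

lhs = A @ np.column_stack([c1, c2, c3])              # A ( c1 | c2 | c3 )
rhs = np.column_stack([A @ c1, A @ c2, A @ c3])      # ( Ac1 | Ac2 | Ac3 )

print(np.allclose(lhs, rhs))   # True
```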
2 Change of Basis
We know we can write any x ∈ Rn as a linear combination of the standard basis vectors, for
some coefficients x1 , . . . , xn :
x = x1 e1 + . . . + xn en .
In some situations, it is useful to write x in another basis b1, . . . , bn (we will see many
such situations in the course). By definition, since the bi’s are a basis, there must be unique
coefficients x′1, . . . , x′n such that

x = x′1 b1 + . . . + x′n bn.

We call [x]_B = (x′1, . . . , x′n)^T the coordinate vector of x in the basis B = {b1, . . . , bn}. If
B also denotes the matrix whose columns are b1, . . . , bn (written in standard coordinates),
then the equation above says exactly that

[x] = B [x]_B,

or equivalently,

[x]_B = B^{-1} [x].
Thus, changing basis is equivalent to solving linear equations. Note that this also shows that
the columns of B^{-1} are [e1]_B, . . . , [en]_B, the standard basis vectors written in the B basis.
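Here is a small numpy sketch of this computation (the basis below is a made-up example). In practice one finds [x]_B by solving the linear system B [x]_B = [x] rather than forming B^{-1} explicitly:

```python
import numpy as np

# A made-up basis of R^2 (the columns of B) and a vector x in standard coordinates.
B = np.column_stack([[1.0, 1.0],
                     [1.0, -1.0]])
x = np.array([3.0, 1.0])

x_B = np.linalg.solve(B, x)         # coordinates of x in the basis B
print(x_B)                          # [2. 1.], i.e. x = 2*b1 + 1*b2
print(np.allclose(B @ x_B, x))      # True

# The columns of B^{-1} are the standard basis vectors written in the B basis.
print(np.linalg.inv(B))
```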
Once we know how to change basis for vectors, it is easy to do it for linear transformations
and matrices. For simplicity, we will only consider linear operators (linear transformations
from a vector space to itself), which will allow us to keep track of just one basis rather than
two.
Suppose T : Rn → Rn is a linear operator and [T] is its standard matrix, i.e.,

[T][x] = [T(x)].

Similarly, let [T]_B denote the matrix of T with respect to the basis B, i.e., the matrix for which

[T]_B [x]_B = [T(x)]_B.
Plugging in the relationship between [x]_B and [x] which we derived above, we get

[T]_B B^{-1} [x] = B^{-1} [T(x)],
which, since B is invertible, is equivalent to

B [T]_B B^{-1} [x] = [T(x)].

Since this holds for every x, we must have

B [T]_B B^{-1} = [T],

or equivalently

[T]_B = B^{-1} [T] B,
which are the explicit formulas for change of basis of matrices/linear transformations.
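A quick numpy check of this formula (operator and basis are made-up examples): applying [T]_B to B-coordinates agrees with applying [T] in standard coordinates and then converting.

```python
import numpy as np

# A made-up standard matrix [T] and a made-up basis B of R^2.
T_std = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
B = np.column_stack([[1.0, 1.0],
                     [1.0, -1.0]])

T_B = np.linalg.inv(B) @ T_std @ B      # [T]_B = B^{-1} [T] B

x = np.array([5.0, -2.0])               # a test vector in standard coordinates
x_B = np.linalg.solve(B, x)             # its B-coordinates

# [T]_B [x]_B should equal [T(x)]_B, the B-coordinates of T(x).
print(np.allclose(T_B @ x_B, np.linalg.solve(B, T_std @ x)))   # True
```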
3 Diagonalization
The standard basis e1 , . . . , en of Rn is completely arbitrary, and as such it is just a conven-
tion invented by humans so that they can start writing vectors in some basis. But linear
transformations that occur in nature and elsewhere often come equipped with a much better
basis, given by their eigenvectors.
Let T : V → V be a linear operator (i.e., a linear transformation from a vector space
to itself). If v is a nonzero vector and T(v) = λv for some scalar λ (which may be
complex), then v is called an eigenvector of T and λ is called an eigenvalue of T.
There is an analogous definition for square matrices A, in which we ask that Av = λv.
Note that T (v) = λv if and only if [T ][v] = λ[v], so in particular the operator T and the
matrix [T ] have the same eigenvalues. This fact holds in every basis (see HW3 question 6),
so eigenvalues are intrinsic to operators and do not depend on the choice of basis used to
write the matrix.
The eigenvalues of a matrix may be computed by solving the characteristic equation
det(λI − A) = 0. Since this is a polynomial equation of degree n for an n × n matrix, the
fundamental theorem of algebra tells us that it must have n roots (counted with multiplicity),
whence every n × n matrix has n eigenvalues, possibly repeated and possibly complex. Once
the eigenvalues are known, the corresponding eigenvectors can be obtained by solving the
systems of linear equations (λI − A)v = 0. See your Math 54 text for more information on
how to compute these things.
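In practice, of course, one lets the computer do this; a brief numpy sketch (with an arbitrary example matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # an arbitrary example matrix

eigvals, eigvecs = np.linalg.eig(A)     # the columns of eigvecs are eigenvectors
print(eigvals)                          # e.g. [3. 1.] (the order is not guaranteed)

# Check that A v = lambda v for each eigenpair.
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))  # True
```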
In any case, something wonderful happens when we have an operator/matrix with a
basis of linearly independent eigenvectors, sometimes called an eigenbasis. For instance, let
T : Rn → Rn be such an operator. Then, we have
T (b1 ) = λ1 b1 , T (b2 ) = λ2 b2 , . . . , T (bn ) = λn bn ,
for linearly independent b1 , . . . , bn . Thus, we may write an arbitrary x ∈ Rn in this basis:
x = x′1 b1 + x′2 b2 + . . . + x′n bn,
and then
T(x) = T(x′1 b1 + x′2 b2 + . . . + x′n bn)
     = x′1 T(b1) + x′2 T(b2) + . . . + x′n T(bn)      by linearity of T
     = x′1 λ1 b1 + x′2 λ2 b2 + . . . + x′n λn bn.
So, applying the transformation T is tantamount to multiplying each coefficient x′i by λi.
In particular, T acts on each coordinate completely independently by scalar multiplication,
and there are no interactions between the coordinates. This is about as simple as a linear
transformation can be.
If we write down the matrix of T in the basis B consisting of its eigenvectors, we find
that [T]_B is a diagonal matrix with the eigenvalues λ1, . . . , λn on the diagonal. Appealing to
the change of basis formula we derived in the previous section, this means that

[T] = B [T]_B B^{-1}.
The same reasoning applies to any square matrix A that has a basis of eigenvectors, since
every A is equal to [T] for the linear transformation T(x) = Ax. Using the letter D to denote
the diagonal matrix of eigenvalues, this gives

A = B D B^{-1}.
Factorizing a matrix in this way is called diagonalization, and a matrix which can be diag-
onalized (i.e., one with a basis of eigenvectors) is called diagonalizable. Not all matrices are
diagonalizable, but “most” are: for instance, any n × n matrix with n distinct eigenvalues is
diagonalizable.
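A hedged numpy sketch of diagonalization in action, using an arbitrary diagonalizable example matrix; the factorization also makes powers of A cheap to compute:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])          # arbitrary matrix with distinct eigenvalues (5 and 2)

eigvals, B = np.linalg.eig(A)       # the columns of B form an eigenbasis
D = np.diag(eigvals)

print(np.allclose(A, B @ D @ np.linalg.inv(B)))     # True: A = B D B^{-1}

# Powers are easy in the eigenbasis: A^10 = B D^10 B^{-1}.
print(np.allclose(np.linalg.matrix_power(A, 10),
                  B @ np.diag(eigvals ** 10) @ np.linalg.inv(B)))   # True
```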
As an example, consider the following system of differential equations in two unknown
functions x1(t) and x2(t) (this is the pair of masses coupled by springs with spring constant
k that we discussed in class):

∂²x1(t)/∂t² = −k x1(t) − k(x1(t) − x2(t)),

∂²x2(t)/∂t² = −k x2(t) − k(x2(t) − x1(t)).
It is not immediately obvious how to solve this system, because each equation involves both
of the unknown functions. We will now show how to reduce it to the diagonal case.
We can write this as a single differential equation in the vector-valued function
x(t) = (x1(t), x2(t))^T
as follows:
∂²x(t)/∂t² = A x(t),

where

A = [ −2k    k
        k   −2k ].
It turns out that A has eigenvalues −k and −3k and corresponding eigenvectors
b1 = (1, 1)^T    and    b2 = (1, −1)^T.
We will now show how this simplifies the problem substantially, first with an “implicit”
change of basis, and then with an “explicit” matrix factorization.
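These eigenpairs are easy to confirm numerically; a quick sketch (choosing the arbitrary value k = 1 just for the check):

```python
import numpy as np

k = 1.0                                  # arbitrary positive spring constant for the check
A = np.array([[-2 * k, k],
              [k, -2 * k]])

b1 = np.array([1.0, 1.0])
b2 = np.array([1.0, -1.0])

print(np.allclose(A @ b1, -k * b1))      # True: eigenvalue -k
print(np.allclose(A @ b2, -3 * k * b2))  # True: eigenvalue -3k
```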
Implicit Change of Basis. For every value of t, x(t) is a vector in R2 , so we can write it
as some linear combination of the basis vectors {b1, b2}:

x(t) = a1(t) b1 + a2(t) b2,        (∗)

where the coefficients a1(t) and a2(t) depend on t. Substituting this into our equations, we
have
(∂²a1(t)/∂t²) b1 + (∂²a2(t)/∂t²) b2 = ∂²/∂t² (a1(t) b1 + a2(t) b2)
                                    = A (a1(t) b1 + a2(t) b2)
                                    = a1(t) A b1 + a2(t) A b2
                                    = a1(t) λ1 b1 + a2(t) λ2 b2.
Since b1 , b2 are linearly independent, their coefficients on both sides of this equality must
be equal. Equating coefficients, we obtain the decoupled scalar differential equations:
∂²a1(t)/∂t² = λ1 a1(t),

∂²a2(t)/∂t² = λ2 a2(t),

for which we can easily find the general solutions:
a1(t) = a1(0) cos(√(−λ1) t) + (ȧ1(0)/√(−λ1)) sin(√(−λ1) t),

and

a2(t) = a2(0) cos(√(−λ2) t) + (ȧ2(0)/√(−λ2)) sin(√(−λ2) t).

(Here we use that λ1 = −k and λ2 = −3k are negative, so √(−λ1) and √(−λ2) are real.)
The initial conditions we had in class were x1(0) = 1 and x2(0) = ẋ1(0) = ẋ2(0) = 0. We
again use the equation (∗) to translate these into initial conditions on a1(t), a2(t):
x(0) = (1, 0)^T = (1/2) b1 + (1/2) b2,
so a1(0) = a2(0) = 1/2. We also have ȧ1(0) = ȧ2(0) = 0. Thus, the solution with these
initial conditions is

a(t) = (a1(t), a2(t))^T = (1/2) ( cos(√k t), cos(√(3k) t) )^T,
which in terms of x is just
(x1(t), x2(t))^T = (1/2) ( cos(√k t) + cos(√(3k) t),  cos(√k t) − cos(√(3k) t) )^T.
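As a sanity check on this closed form (again taking k = 1 for concreteness), one can verify numerically that it satisfies ẍ = Ax and the stated initial conditions; a rough sketch using a central-difference approximation of the second derivative:

```python
import numpy as np

k = 1.0
A = np.array([[-2 * k, k],
              [k, -2 * k]])

def x(t):
    # The closed-form solution derived above.
    return 0.5 * np.array([np.cos(np.sqrt(k) * t) + np.cos(np.sqrt(3 * k) * t),
                           np.cos(np.sqrt(k) * t) - np.cos(np.sqrt(3 * k) * t)])

h = 1e-4
for t in (0.0, 0.7, 2.3):
    # Central-difference approximation of the second derivative at time t.
    x_ddot = (x(t + h) - 2 * x(t) + x(t - h)) / h ** 2
    print(np.allclose(x_ddot, A @ x(t), atol=1e-4))            # True: satisfies x'' = A x

print(np.allclose(x(0.0), [1.0, 0.0]))                          # True: x(0) = (1, 0)^T
print(np.allclose((x(h) - x(-h)) / (2 * h), 0.0, atol=1e-4))    # True: x'(0) = 0
```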
Explicit Matrix Notation. Some people understand things better if they are written in
terms of explicit matrix factorizations. If we diagonalize A = B D B^{-1}, we can rewrite our
equation as
∂²x(t)/∂t² = B D B^{-1} x(t).
Defining

a(t) := B^{-1} x(t),    i.e.,    x(t) = B a(t),

this becomes
∂²/∂t² (B a(t)) = B D a(t).
Since B is a fixed matrix that does not depend on t, it commutes with the partial derivative
and we have
B ∂²a(t)/∂t² = B D a(t).
Multiplying both sides by B^{-1} gives
∂²a(t)/∂t² = D a(t),
which is the same diagonal system we solved above.
The eigenvectors of A are called the normal modes of the system.
Recall that a basis v1, . . . , vn of Rn is called orthonormal if vi · vj = 0 whenever i ≠ j and
vi · vi = 1 for every i. In matrix notation, these conditions say exactly that

V^T V = I,
where V = [v1 | . . . |vn ] is a matrix with the vi as columns; such a matrix is called an
orthogonal matrix.
This last identity implies that
V^{-1} = V^T,
which reveals one of the very desirable properties of orthonormal bases: for any x, the change
of basis is simply
[x]_V = V^{-1}[x] = V^T[x] = (v1 · x, v2 · x, . . . , vn · x)^T.
That is, the coefficients are given by dot products, which are much easier to calculate than
solving linear equations as in Section 2.
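A small numpy illustration with an arbitrarily chosen orthonormal basis of R^2 (a rotation): the coordinates are just dot products, and no linear system needs to be solved.

```python
import numpy as np

theta = np.pi / 6                         # an arbitrary angle; the columns of V are orthonormal
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(V.T @ V, np.eye(2)))    # True: V is an orthogonal matrix

x = np.array([2.0, -1.0])
coords = V.T @ x                          # (v1 . x, v2 . x): the V-coordinates of x
print(np.allclose(coords, np.linalg.solve(V, x)))   # True: same as V^{-1} x
```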
Theorem. If A is symmetric, then it is diagonalizable. Moreover, all of its eigenvalues
are real and it has an orthonormal basis of eigenvectors. Thus, A = V D V^T for some
orthogonal V and diagonal D.
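Numerically, this is what the theorem looks like; a hedged sketch using numpy's symmetric eigensolver on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                        # a random symmetric matrix

eigvals, V = np.linalg.eigh(A)           # eigh is specialized to symmetric/Hermitian input
D = np.diag(eigvals)

print(np.allclose(V.T @ V, np.eye(4)))   # True: orthonormal eigenvectors
print(np.allclose(A, V @ D @ V.T))       # True: A = V D V^T (and the eigenvalues are real)
```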
For complex vectors x, y ∈ Cn, the role of the dot product is played by the inner product

⟨x|y⟩ = x1^* y1 + x2^* y2 + . . . + xn^* yn.

This is the same as the real dot product, except that we take the complex conjugate (denoted
by ∗) of the first vector x. Note that if both x and y are real then this doesn’t change anything.
We will refer to this as an “inner product” to distinguish it from the real dot product.
With the inner product in hand, we say that a set of vectors u1 , . . . , un in Cn is orthonor-
mal if
⟨ui|uj⟩ = 0 for i ≠ j

and

⟨ui|ui⟩ = 1.
A matrix with complex orthonormal columns is called unitary. You should check that such
a matrix satisfies
(U^*)^T U = I.
This looks a little weird, but it’s only because the usual transpose isn’t the correct notion
for complex matrices. The right generalization is actually the conjugate transpose

U† := (U^T)^*,
pronounced U “dagger”. In this notation, a unitary matrix is just one which satisfies
U† = U^{-1}.
Again, computing coefficients in such a basis is very easy, and amounts to finding inner
products:
x = ⟨u1|x⟩ u1 + . . . + ⟨un|x⟩ un.
The correct generalization of real symmetric matrices to the complex case is the class of
Hermitian matrices. A matrix A is called Hermitian if A = A† .
Theorem. If A is Hermitian, then it is diagonalizable. Moreover, all of its eigenvalues
are real and it has an orthonormal basis of eigenvectors. Thus, A = U D U† for some
unitary U and diagonal D.
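And the corresponding numerical sketch for the Hermitian case (random Hermitian example; U† is spelled U.conj().T in numpy):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (M + M.conj().T) / 2                    # a random Hermitian matrix: A = A^dagger

eigvals, U = np.linalg.eigh(A)              # real eigenvalues, unitary eigenvector matrix
D = np.diag(eigvals)
U_dag = U.conj().T                          # the conjugate transpose U^dagger

print(np.allclose(U_dag @ U, np.eye(3)))    # True: U is unitary
print(np.allclose(A, U @ D @ U_dag))        # True: A = U D U^dagger
print(np.allclose(eigvals.imag, 0))         # True: the eigenvalues are real
```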
2. ⟨ax1 + bx2|y⟩ = a^*⟨x1|y⟩ + b^*⟨x2|y⟩ for all a, b ∈ C and x1, x2, y ∈ V. This is called
“conjugate-linearity” in the first coordinate.
1. ‖x + y‖ ≤ ‖x‖ + ‖y‖. (triangle inequality)
Item (2) is a simple exercise, and item (1) can be easily derived from item (3). Here is
the proof of item (3): first observe that it is equivalent to show that
| ⟨ x/‖x‖ | y/‖y‖ ⟩ | = |⟨x|y⟩| / (‖x‖ ‖y‖) ≤ 1,
where the first equality is because of linearity in both coordinates with respect to real scalars.
So it suffices to show that
|⟨x|y⟩| ≤ 1
for all unit vectors x, y. We now compute