
Linear Algebra Notes

Nikhil Srivastava
February 9, 2015

Scalars are lowercase, matrices are uppercase, and vectors are lowercase bold. All vectors
are column vectors (i.e., a vector in Rn is an n × 1 matrix), unless transposed.

1 Column Picture of Matrix Vector Multiplication


Suppose A is an m × n matrix with rows r_1^T, . . . , r_m^T and columns c_1, . . . , c_n. In high school, we are taught to think of the matrix-vector product Ax as taking dot products of x with the rows of A:
\[ Ax = \begin{pmatrix} r_1^T \\ r_2^T \\ \vdots \\ r_m^T \end{pmatrix} x = \begin{pmatrix} r_1^T x \\ r_2^T x \\ \vdots \\ r_m^T x \end{pmatrix} = \begin{pmatrix} r_1 \cdot x \\ r_2 \cdot x \\ \vdots \\ r_m \cdot x \end{pmatrix}. \]
This makes sense if we regard matrices as essentially a convenient notation to represent
linear equations, since each linear equation naturally gives a dot product.
A different perspective is to view Ax as taking a linear combination of the columns c_1, . . . , c_n of A, with coefficients equal to the entries of x:
\[ Ax = \big[\, c_1 \,\big|\, c_2 \,\big|\, \cdots \,\big|\, c_n \,\big] \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = x_1 c_1 + x_2 c_2 + \cdots + x_n c_n. \]
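As a quick sanity check (a NumPy sketch added here for illustration, not part of the original notes), both pictures agree with the built-in product A @ x; the matrix and vector below are arbitrary:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])                                     # m = 2, n = 3
x = np.array([1.0, -1.0, 2.0])

row_picture = np.array([A[i, :] @ x for i in range(A.shape[0])])    # dot x with each row
col_picture = sum(x[j] * A[:, j] for j in range(A.shape[1]))        # combine the columns

assert np.allclose(A @ x, row_picture)
assert np.allclose(A @ x, col_picture)
print(A @ x)                                                        # [ 5. 11.]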
This view makes it transparent that matrices can be used to represent linear transformations with respect to a pair of bases. Suppose T : Rn → Rm is a linear transformation, and
let e1 , . . . , en be the standard basis of Rn . Then we can write any x ∈ Rn as
x = x1 e1 + . . . + xn en
for some coefficients xi ; indeed, this is what we mean when we identify x with its standard
coordinate vector:
\[ [x] = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \]

Since T is linear, it is completely determined by its values on the basis:

T(x) = T(x1 e1 + . . . + xn en) = x1 T(e1) + . . . + xn T(en).

Since these vectors are equal, their coordinate vectors in the standard basis of the “output”
vector space Rm must also be equal:

[T(x)] = [T(x1 e1 + . . . + xn en)] = x1 [T(e1)] + . . . + xn [T(en)].

But since matrix-vector multiplication is the same thing as taking linear combinations,
we may write this as
\[ [T(x)] = \Big[\, [T(e_1)] \,\Big|\, [T(e_2)] \,\Big|\, \cdots \,\Big|\, [T(e_n)] \,\Big] \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \]

where [v] denotes the standard coordinate vector of v. Note that anything that appears
inside a matrix must be some sort of coordinate vector: it does not make sense to put
an abstract vector inside a matrix whose entries are numbers. However, as before, we will
identify v with its standard coordinate vector [v] and drop the brackets when we are working
in the standard basis. With this identification, we can write:
\[ T(x) = \Big[\, T(e_1) \,\Big|\, T(e_2) \,\Big|\, \cdots \,\Big|\, T(e_n) \,\Big] \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \]

The matrix above is called the standard matrix of T, and is denoted by [T]. It is a complete description of T, with respect to the standard basis. One of the remarkable things about linear transformations is that they have such compact descriptions — this is not at all true of arbitrary functions from Rn to Rm.
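To make this concrete, here is a hedged NumPy sketch (the map T below is an invented example, not one from the notes) that assembles the standard matrix column by column from T(e1), . . . , T(en) and checks that it reproduces T on an arbitrary input:

import numpy as np

def T(v):
    # an example linear map R^3 -> R^2: (x, y, z) |-> (x + 2y, 3z - y)
    return np.array([v[0] + 2*v[1], 3*v[2] - v[1]])

n = 3
E = np.eye(n)                                            # columns are e_1, ..., e_n
T_std = np.column_stack([T(E[:, j]) for j in range(n)])  # [T(e_1)|...|T(e_n)]

x = np.array([1.0, 2.0, 3.0])
assert np.allclose(T_std @ x, T(x))                      # the matrix agrees with the map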
Remark. The column view of matrix-vector multiplication also explains why matrix-matrix
multiplication is defined the way it is (which usually seems mysterious the first time you see
it). It is the unique definition for which

[S ◦ T] = [S][T],

where (S ◦ T )(v) = S(T (v)) denotes the composition of two linear transformations S and T .
More concretely, for any matrix A and column vectors c_1, . . . , c_n, it is the unique definition for which
\[ A\,\big[\, c_1 \,\big|\, c_2 \,\big|\, \cdots \,\big|\, c_n \,\big] = \big[\, Ac_1 \,\big|\, Ac_2 \,\big|\, \cdots \,\big|\, Ac_n \,\big]. \]
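A minimal numerical check of this column-by-column characterization, on arbitrary random matrices (again just an illustrative sketch):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

col_by_col = np.column_stack([A @ C[:, j] for j in range(C.shape[1])])   # [Ac_1|...|Ac_n]
assert np.allclose(A @ C, col_by_col)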

2 Change of Basis
We know we can write any x ∈ Rn as a linear combination of the standard basis vectors, for
some coefficients x1 , . . . , xn :
x = x1 e1 + . . . + xn en .
In some situations, it is useful to write x in another basis b1 , . . . , bn (we will see many
such situations in the course). By definition, since the bi ’s are a basis, there must be unique
coefficients x′1, . . . , x′n such that

x = x′1 b1 + x′2 b2 + . . . + x′n bn.

Passing to this representation of x is what I called “implicit” change of basis in class. By implicit, I meant that we chose to decompose x in terms of its coefficients in the bi, but we didn’t give an explicit method for computing what those coefficients are.
To find the coefficients, it helps to write down the latter linear combination in matrix notation:
\[ [x] = x'_1 [b_1] + \cdots + x'_n [b_n] = \big[\, b_1 \,\big|\, \cdots \,\big|\, b_n \,\big] \begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix} = B[x]_B, \]
where B is the matrix with columns b1 , . . . , bn and [x]B denotes the coordinate vector of x
in the B basis. This yields the fundamental relationship

[x] = B[x]B ,

which since B is invertible tells us how to find [x]B from [x]:

[x]B = B −1 [x].

Thus, changing basis is equivalent to solving linear equations. Note that this also shows that
the columns of B −1 are [e1 ]B , . . . , [en ]B , the standard basis vectors in the B basis.
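In practice one computes [x]_B by solving the linear system B[x]_B = [x] rather than by forming B^{-1} explicitly. A small NumPy sketch with an invented basis and vector:

import numpy as np

B = np.array([[1.0, 1.0],
              [1.0, -1.0]])          # columns are the basis vectors b_1, b_2
x = np.array([3.0, 1.0])

x_B = np.linalg.solve(B, x)          # solves B [x]_B = [x]
assert np.allclose(B @ x_B, x)       # i.e. x = x'_1 b_1 + x'_2 b_2
print(x_B)                           # [2. 1.], since x = 2 b_1 + 1 b_2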
Once we know how to change basis for vectors, it is easy to do it for linear transformations
and matrices. For simplicity, we will only consider linear operators (linear transformations
from a vector space to itself), which will allow us to keep track of just one basis rather than
two.
Suppose T : Rn → Rn is a linear operator and [T ] is its standard matrix, i.e.

[T ][x] = [T (x)].

By definition, the matrix [T ]B of T in the B basis must satisfy:

[T ]B [x]B = [T (x)]B .

Plugging in the relationship between [x]B and [x] which we derived above, we get

[T ]B B −1 [x] = B −1 [T (x)],

which since B is invertible is equivalent to
B[T ]B B −1 [x] = [T (x)].
Thus, we must have
B[T ]B B −1 = [T ],
or equivalently
[T ]B = B −1 [T ]B,
which are the explicit formulas for change of basis of matrices/linear transformations.
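Here is a hedged NumPy sketch of this formula on an invented example; it checks that B^{-1}[T]B really does send [x]_B to [T(x)]_B:

import numpy as np

T_std = np.array([[2.0, 1.0],
                  [0.0, 3.0]])               # standard matrix [T]
B = np.array([[1.0, 1.0],
              [1.0, 2.0]])                   # columns: the new basis b_1, b_2

T_B = np.linalg.inv(B) @ T_std @ B           # [T]_B = B^{-1} [T] B

x = np.array([5.0, -2.0])
x_B  = np.linalg.solve(B, x)                 # [x]_B
Tx_B = np.linalg.solve(B, T_std @ x)         # [T(x)]_B
assert np.allclose(T_B @ x_B, Tx_B)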

3 Diagonalization
The standard basis e1, . . . , en of Rn is completely arbitrary, and as such it is just a convention invented by humans so that they can start writing vectors in some basis. But linear
transformations that occur in nature and elsewhere often come equipped with a much better
basis, given by their eigenvectors.
Let T : V → V be a linear operator (i.e., a linear transformation from a vector space
to itself). If v is a nonzero vector and T(v) = λv for some scalar λ (which may be complex), then v is called an eigenvector of T and λ is called the corresponding eigenvalue of T.
There is an analogous definition for square matrices A, in which we ask that Av = λv.
Note that T (v) = λv if and only if [T ][v] = λ[v], so in particular the operator T and the
matrix [T ] have the same eigenvalues. This fact holds in every basis (see HW3 question 6),
so eigenvalues are intrinsic to operators and do not depend on the choice of basis used to
write the matrix.
The eigenvalues of a matrix may be computed by solving the characteristic equation
det(λI − A) = 0. Since this is a polynomial of degree n for an n × n matrix, the fundamental
theorem of algebra tells us that it must have n roots, counted with multiplicity, whence every n × n matrix must have n eigenvalues (again counted with multiplicity). Once the eigenvalues are known, the corresponding eigenvectors can be
obtained by solving systems of linear equations (λI − A)v = 0. See your Math 54 text for
more information on how to compute these things.
In any case, something wonderful happens when we have an operator/matrix with a
basis of linearly independent eigenvectors, sometimes called an eigenbasis. For instance, let
T : Rn → Rn be such an operator. Then, we have
T (b1 ) = λ1 b1 , T (b2 ) = λ2 b2 , . . . , T (bn ) = λn bn ,
for linearly independent b1 , . . . , bn . Thus, we may write an arbitrary x ∈ Rn in this basis:
x = x′1 b1 + x′2 b2 + . . . + x′n bn,
and then
T(x) = T(x′1 b1 + x′2 b2 + . . . + x′n bn)
= x′1 T(b1) + x′2 T(b2) + . . . + x′n T(bn)    (by linearity of T)
= x′1 λ1 b1 + x′2 λ2 b2 + . . . + x′n λn bn.

So, applying the transformation T is tantamount to multiplying each coefficient x′i by λi. In particular, T acts on each coordinate completely independently by scalar multiplication, and there are no interactions between the coordinates. This is about as simple as a linear transformation can be.
If we write down the matrix of T in the basis B consisting of its eigenvectors, we find
that [T ]B is a diagonal matrix with the eigenvalues λ1 , . . . , λn on the diagonal. Appealing to
the change of basis formula we derived in the previous section, this means that

[T ] = B[T ]B B −1 .

In general, this can be done for any square matrix A, since every A is equal to [T ] for
the linear transformation T (x) = Ax. Using the letter D to denote the diagonal matrix of
eigenvalues, this gives
A = BDB^{-1}.
Factorizing a matrix in this way is called diagonalization, and a matrix which can be diagonalized (i.e., one with a basis of eigenvectors) is called diagonalizable. Not all matrices are
diagonalizable, but the vast majority of them are.
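For a concrete (purely illustrative) example, numpy.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors, from which the factorization can be checked directly:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, B = np.linalg.eig(A)        # columns of B are eigenvectors of A
D = np.diag(eigvals)

assert np.allclose(A @ B, B @ D)                     # A b_i = lambda_i b_i, column by column
assert np.allclose(A, B @ D @ np.linalg.inv(B))      # A = B D B^{-1}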

4 Coupled Oscillator Example


Here is an example of diagonalization in action. Suppose we have two unit masses connected
by springs with spring constant k as in Figure 12.1 of the book. Let x1 (t) and x2 (t) be
the positions of the two masses at time t. Then, subject to initial positions and velocities
x1(0), x2(0), ẋ1(0), ẋ2(0), the system is governed by the coupled differential equations:
\[ \frac{\partial^2 x_1(t)}{\partial t^2} = -k\,x_1(t) - k\big(x_1(t) - x_2(t)\big), \]
\[ \frac{\partial^2 x_2(t)}{\partial t^2} = -k\,x_2(t) - k\big(x_2(t) - x_1(t)\big). \]
It is not immediately obvious how to solve this because each second derivative depends on
both of the variables. We will now show how to reduce it to the diagonal case.
We can write this as a single differential equation in the vector-valued function
\[ x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} \]

as follows:
\[ \frac{\partial^2 x(t)}{\partial t^2} = A\,x(t), \]
where
\[ A = \begin{pmatrix} -2k & k \\ k & -2k \end{pmatrix}. \]

It turns out that A has eigenvalues −k and −3k and corresponding eigenvectors
\[ b_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad b_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]
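These eigenpairs are easy to verify numerically; the following sketch (with an arbitrary value of k) just checks that Ab_1 = -k b_1 and Ab_2 = -3k b_2:

import numpy as np

k = 1.7                              # arbitrary positive spring constant
A = np.array([[-2*k, k],
              [k, -2*k]])
b1 = np.array([1.0, 1.0])
b2 = np.array([1.0, -1.0])

assert np.allclose(A @ b1, -k * b1)      # eigenvalue -k
assert np.allclose(A @ b2, -3*k * b2)    # eigenvalue -3k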
We will now show how this simplifies the problem substantially, first with an “implicit”
change of basis, and then with an “explicit” matrix factorization.
Implicit Change of Basis. For every value of t, x(t) is a vector in R2 , so we can write it
as some linear combination of the basis vectors {b1 , b2 }:

x(t) = a1 (t)b1 + a2 (t)b2 , (∗)

where the coefficients a1 (t) and a2 (t) depend on t. Substituting this into our equations, we
have
\begin{align*}
\frac{\partial^2 a_1(t)}{\partial t^2}\,b_1 + \frac{\partial^2 a_2(t)}{\partial t^2}\,b_2
&= \frac{\partial^2}{\partial t^2}\big(a_1(t)b_1 + a_2(t)b_2\big) \\
&= A\big(a_1(t)b_1 + a_2(t)b_2\big) \\
&= a_1(t)\,A b_1 + a_2(t)\,A b_2 \\
&= a_1(t)\,\lambda_1 b_1 + a_2(t)\,\lambda_2 b_2.
\end{align*}

Since b1 , b2 are linearly independent, their coefficients on both sides of this equality must
be equal. Equating coefficients, we obtain the decoupled scalar differential equations:

\[ \frac{\partial^2 a_1(t)}{\partial t^2} = \lambda_1\,a_1(t), \qquad \frac{\partial^2 a_2(t)}{\partial t^2} = \lambda_2\,a_2(t), \]
for which we can easily find the general solutions:
\[ a_1(t) = a_1(0)\cos\big(\sqrt{-\lambda_1}\,t\big) + \frac{\dot{a}_1(0)}{\sqrt{-\lambda_1}}\,\sin\big(\sqrt{-\lambda_1}\,t\big), \]
and
\[ a_2(t) = a_2(0)\cos\big(\sqrt{-\lambda_2}\,t\big) + \frac{\dot{a}_2(0)}{\sqrt{-\lambda_2}}\,\sin\big(\sqrt{-\lambda_2}\,t\big). \]
(Recall that λ1 = −k and λ2 = −3k are negative, so these square roots are real.)
The initial conditions we had in class were x1 (0) = 1 and x2 (0) = x˙1 (0) = x˙2 (0) = 0. We
again use the equation (∗) to translate this into initial conditions in a1(t), a2(t):
\[ x(0) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \tfrac{1}{2}\,b_1 + \tfrac{1}{2}\,b_2, \]

so a1(0) = a2(0) = 1/2. We also have ȧ1(0) = ȧ2(0) = 0. Thus, the solution is given by
\[ a(t) = \begin{pmatrix} a_1(t) \\ a_2(t) \end{pmatrix} = \frac{1}{2}\begin{pmatrix} \cos(\sqrt{k}\,t) \\ \cos(\sqrt{3k}\,t) \end{pmatrix}, \]

which in terms of x is just
\[ \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \frac{1}{2}\begin{pmatrix} \cos(\sqrt{k}\,t) + \cos(\sqrt{3k}\,t) \\ \cos(\sqrt{k}\,t) - \cos(\sqrt{3k}\,t) \end{pmatrix}. \]
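As a sanity check (not in the original notes), the closed-form solution can be spot-checked numerically: it should satisfy the initial conditions and, up to finite-difference error, the equation x'' = Ax. The value of k and the test time are arbitrary:

import numpy as np

k = 2.0
A = np.array([[-2*k, k],
              [k, -2*k]])

def x(t):
    return 0.5 * np.array([np.cos(np.sqrt(k)*t) + np.cos(np.sqrt(3*k)*t),
                           np.cos(np.sqrt(k)*t) - np.cos(np.sqrt(3*k)*t)])

assert np.allclose(x(0.0), [1.0, 0.0])                        # x1(0) = 1, x2(0) = 0
t, h = 0.8, 1e-4
second_deriv = (x(t + h) - 2*x(t) + x(t - h)) / h**2          # central-difference x''(t)
assert np.allclose(second_deriv, A @ x(t), atol=1e-4)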

Explicit Matrix Notation. Some people understand things better if they are written in
terms of explicit matrix factorizations. If we diagonalize A = BDB −1 , we can rewrite our
equation as
\[ \frac{\partial^2}{\partial t^2}\,x(t) = BDB^{-1}\,x(t). \]
Defining

a(t) = B −1 x(t) (this is the same as (∗), in matrix notation),

this becomes
\[ \frac{\partial^2}{\partial t^2}\,B\,a(t) = BD\,a(t). \]
Since B is a fixed matrix that does not depend on t, it commutes with the partial derivative
and we have
\[ B\,\frac{\partial^2}{\partial t^2}\,a(t) = BD\,a(t). \]
Multiplying both sides by B^{-1} gives
\[ \frac{\partial^2}{\partial t^2}\,a(t) = D\,a(t), \]
which is the same diagonal system we solved above.
The eigenvectors of A are called the normal modes of the system.

5 Symmetric and Orthogonal Matrices


Not all matrices are diagonalizable, but there are some very important classes that are. A
real matrix A is called symmetric if A = AT . To state the main theorem about diagonalizing
symmetric matrices, we will need a definition.
Definition. A collection of vectors v1, . . . , vn is called orthonormal if they are pairwise orthogonal, i.e.,
(vi · vj) = 0 for i ≠ j,
and they are unit vectors:
(vi · vi) = 1.
In matrix notation, an orthonormal set of vectors has the property that

V T V = I,

where V = [v1 | . . . |vn ] is a matrix with the vi as columns; such a matrix is called an
orthogonal matrix.
This last identity implies that
V −1 = V T ,
which reveals one of the very desirable properties of orthonormal bases: for any x, the change
of basis is simply
\[ [x]_V = V^{-1}[x] = V^T[x] = \begin{pmatrix} v_1 \cdot x \\ \vdots \\ v_n \cdot x \end{pmatrix}. \]
That is, the coefficients are given by dot products

x = (x · v1 )v1 + (x · v2 )v2 + . . . + (x · vn )vn ,

which are much easier to calculate than solving linear equations as in Section 2.
Theorem. If A is symmetric, then it is diagonalizable. Moreover, all of its eigenvalues
are real and it has an eigenbasis of orthonormal eigenvectors. Thus, A = V DV T for some
orthogonal V and diagonal D.
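In NumPy this corresponds to numpy.linalg.eigh, which is intended for symmetric (and Hermitian) matrices and returns real eigenvalues together with orthonormal eigenvectors; a short sketch on an arbitrary symmetric example:

import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])                # a symmetric matrix

eigvals, V = np.linalg.eigh(A)                  # columns of V are orthonormal eigenvectors
D = np.diag(eigvals)

assert np.allclose(V.T @ V, np.eye(3))          # V^T V = I
assert np.allclose(A, V @ D @ V.T)              # A = V D V^T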

6 Complex Inner Products, Hermitian Matrices, and Unitary Matrices
There is an important class of complex matrices which are also diagonalizable with orthogonal
eigenvectors. However, the notion of orthogonality (which is a geometric notion induced by
the dot product) is different for complex vectors.
To see that the usual real dot product is deficient in the complex case, consider that
\[ \begin{pmatrix} 1 \\ i \end{pmatrix} \cdot \begin{pmatrix} 1 \\ i \end{pmatrix} = 1^2 + i^2 = 0, \]
so we have a nonzero vector which is orthogonal to itself. This makes no sense geometrically.
It turns out that there is a way to redefine the dot product which recovers all the nice
geometric properties that we have in the real case:

⟨x|y⟩ = x1∗ y1 + x2∗ y2 + . . . + xn∗ yn.

This is the same as the real dot product, except we take the complex conjugate (denoted by
∗) of the first vector x. Note that if both x and y are real then this doesn’t change anything.
We will refer to this as an “inner product” to distinguish it from the real dot product.
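NumPy's np.vdot conjugates its first argument, so it computes exactly this inner product; a quick illustration (not from the notes) with the vector (1, i) from above:

import numpy as np

x = np.array([1.0, 1.0j])

print(np.dot(x, x))     # 0j: the naive dot product gives 1 + i^2 = 0
print(np.vdot(x, x))    # (2+0j): <x|x> = |1|^2 + |i|^2 = 2, so the length of x is sqrt(2)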
With the inner product in hand, we say that a set of vectors u1, . . . , un in Cn is orthonormal if
⟨ui|uj⟩ = 0 for i ≠ j
and
⟨ui|ui⟩ = 1.

A matrix with complex orthonormal columns is called unitary. You should check that such
a matrix satisfies
(U ∗ )T U = I.
This looks a little weird, but it’s only because the usual transpose isn’t the correct notion
for complex matrices. The right generalization is actually the conjugate transpose

U † := (U T )∗ ,

pronounced U “dagger”. In this notation, a unitary matrix is just one which satisfies

U † = U −1 .

Again, computing coefficients in such a basis is very easy, and amounts to finding inner
products:
x = ⟨u1|x⟩u1 + . . . + ⟨un|x⟩un.
The correct generalization of real symmetric matrices to the complex case is the class of
Hermitian matrices. A matrix A is called Hermitian if A = A† .
Theorem. If A is Hermitian, then it is diagonalizable. Moreover, all of its eigenvalues
are real and it has an eigenbasis of orthonormal eigenvectors. Thus, A = U DU † for some
unitary U and diagonal D.
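The same numpy.linalg.eigh routine handles the complex Hermitian case; a short sketch on an arbitrary 2 × 2 Hermitian example, checking that the eigenvalues come out real and that U is unitary:

import numpy as np

A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])

assert np.allclose(A, A.conj().T)              # A equals its conjugate transpose (Hermitian)

eigvals, U = np.linalg.eigh(A)                 # eigvals are real; columns of U are orthonormal
D = np.diag(eigvals)

assert np.allclose(U.conj().T @ U, np.eye(2))  # U is unitary: U^dagger U = I
assert np.allclose(A, U @ D @ U.conj().T)      # A = U D U^dagger
print(eigvals)                                 # [1. 4.]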

7 Inner Product Spaces


It turns out that the notion of an inner product can be made to work in even more general
settings than Cn . The motivation for doing this is that the inner product is the device that
allows us to define lengths and angles in Euclidean geometry.
Definition. A function ⟨·|·⟩ : V × V → C is called an inner product if it satisfies the
following properties:
1. ⟨x|x⟩ ≥ 0 for all x, with equality iff x = 0.

2. ⟨ax1 + bx2|y⟩ = a∗⟨x1|y⟩ + b∗⟨x2|y⟩ for all a, b ∈ C and x1, x2, y ∈ V. This is called
“conjugate-linearity” in the first coordinate.

3. ⟨x|y⟩∗ = ⟨y|x⟩. In particular, this implies linearity in the second coordinate.


An inner product induces a norm ‖x‖ = √⟨x|x⟩, whose intended interpretation is that it is the length of the vector x.
It turns out that these three properties are all we need to guarantee that the familiar
theorems of Euclidean geometry continue to hold with the new inner product. This is
incredibly powerful; later in the course we will define inner products on vector spaces of
functions, and it will allow us to “visualize” them in the familiar ways that we visualize
Euclidean space.
Here are three important properties shared by all inner products:

1. ‖x + y‖ ≤ ‖x‖ + ‖y‖. (triangle inequality)

2. If ⟨x|y⟩ = 0 then ‖x + y‖² = ‖x‖² + ‖y‖². (Pythagoras Theorem)

3. |⟨x|y⟩| ≤ ‖x‖ ‖y‖. (Cauchy-Schwarz Inequality)

Item (2) is a simple exercise, and item (1) can be easily derived from item (3). Here is
the proof of item (3): first observe that it is equivalent to show that
\[ \left|\left\langle \frac{x}{\|x\|} \,\middle|\, \frac{y}{\|y\|} \right\rangle\right| = \frac{|\langle x|y\rangle|}{\|x\|\,\|y\|} \le 1, \]
where the equality holds because of linearity in both coordinates with respect to real scalars. So it suffices to show that
|⟨x|y⟩| ≤ 1
for all unit vectors x, y. We now compute

⟨x − y|x − y⟩ = ⟨x|x⟩ + ⟨y|y⟩ − ⟨y|x⟩ − ⟨x|y⟩ = 2 − 2 Re⟨x|y⟩.
Since the left-hand side is nonnegative, this gives Re⟨x|y⟩ ≤ 1. Applying the same bound with y replaced by e^{iθ}y (still a unit vector) for a suitably chosen phase θ turns Re⟨x|y⟩ into |⟨x|y⟩|, so |⟨x|y⟩| ≤ 1, as required.
