Vectors and Matrices
Abstract
Algebra and geometry of vectors. The algebra of matrices. 2 × 2 matrices. Inverses. Determinants.
Simultaneous linear equations. Standard transformations of the plane.
Notation 1 The symbol R2 denotes the set of ordered pairs (x, y) – that is, the xy-plane. Similarly
R3 denotes the set of ordered triples (x, y, z) – that is, three-dimensional space described by three
co-ordinates x, y and z – and Rn denotes a similar n-dimensional space.
1 Vectors
A vector can be thought of in two different ways. Let’s for the moment concentrate on vectors in the
xy-plane.
• From one point of view a vector is just an ordered pair of numbers (x, y). We can associate
this vector with the point in R2 which has co-ordinates x and y. We call this vector the
position vector of the point.
• From the second point of view a vector is a ‘movement’ or translation. For example, to get
from the point (3, 4) to the point (4, 5) we need to move ‘one to the right and one up’; this
is the same movement as is required to move from (−2, −3) to (−1, −2) or from (1, −2) to (2, −1).
Thinking about vectors from this second point of view, all three of these movements are
the same vector, because the same translation 'one right, one up' achieves each of them, even
though the 'start' and 'finish' are different in each case. We would write this vector as (1, 1).
Vectors from this second point of view are sometimes called translation vectors.
[Figure: the vector (1, 1) as a position vector, and (1, 1) as a translation taking (3, 4) to (4, 5), (−2, −3) to (−1, −2) and (1, −2) to (2, −1)]
* These handouts are produced by Richard Earl, who is the Schools Liaison and Access Officer for mathematics, statistics and computer science at Oxford University. Any comments, suggestions or requests for other material are welcome at [email protected].
Likewise in three (or higher) dimensions the triple (x, y, z) can be thought of as the point in R3
which is x units along the x-axis, y units along the y-axis and z units along the z-axis, or it can represent
the translation which would take the origin to that point.
Notation 2 For ease of notation vectors are often denoted by a single letter, but to show that this is a
vector quantity, rather than just a single number, the letter is either underlined such as v or written in
bold as v.
The sum of two vectors u = (u1, u2) and v = (v1, v2) is defined componentwise as
    u + v = (u1 + v1, u2 + v2) ,
and geometrically u + v is the translation achieved by translating by u and then by v (in either order).
[Figure: the vectors u and v and their sum u + v]
Given a vector v = (v1, v2) and a real number (a scalar) k then the scalar multiple kv is defined as
kv = (kv1, kv2) .
When k is a positive integer then we can think of kv as the translation achieved when we translate by v
k times. Note that, provided v ≠ 0, the points kv, as k varies over R, make up the line which passes through the origin and the point v.
We write −v for (−1) v = (−v1, −v2) and this is the inverse operation of translating by v. And the
difference of two vectors v = (v1, v2) and w = (w1, w2) is defined as
v − w = v+ (−w) = (v1 − w1, v2 − w2) .
Put another way, v − w is the vector that translates the point with position vector w to the point with
position vector v.
Note that there is also a special vector 0 = (0, 0) which may be viewed either as a special point (the origin, where the axes cross) or as the null translation, the translation that doesn't move anything.
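These componentwise operations are easy to express in code. The following is a minimal Python sketch (plain tuples, no libraries); the function names add, scale and subtract are chosen purely for illustration.

    # Componentwise vector operations, as defined above.
    def add(v, w):
        # Sum of two vectors: (v1 + w1, ..., vn + wn).
        return tuple(vi + wi for vi, wi in zip(v, w))

    def scale(k, v):
        # Scalar multiple kv = (k v1, ..., k vn).
        return tuple(k * vi for vi in v)

    def subtract(v, w):
        # v - w = v + (-w), the translation taking w to v.
        return add(v, scale(-1, w))

    u, v = (1, 1), (1, -2)
    print(add(u, v))       # (2, -1)
    print(scale(3, v))     # (3, -6)
    print(subtract(v, u))  # (0, -3)
    print(scale(-1, v))    # (-1, 2), the inverse translation -v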
The vectors (1, 0) and (0, 1) form the standard or canonical basis for R2. They are denoted by
the symbols i and j respectively. Note that any vector v = (x, y) can be written uniquely as a
linear combination of i and j: that is
    (x, y) = xi + yj
and this is the only way to write (x, y) as a sum of scalar multiples of i and j. Likewise the vectors
(1, 0, 0) , (0, 1, 0) , (0, 0, 1) form the canonical basis for R3 and are respectively denoted as i, j
and k.
Proposition 3 Vector addition and scalar multiplication satisfy the following properties. These proper-
ties verify that R2 is a real vector space (cf. Michaelmas Linear Algebra). Let u, v, w ∈ R2 and λ, µ ∈ R.
Then
    u + 0 = u            u + v = v + u                  0u = 0
    u + (−u) = 0         (u + v) + w = u + (v + w)      1u = u
    (λ + µ)u = λu + µu   λ(u + v) = λu + λv             λ(µu) = (λµ)u
Everything above generalises in an obvious way to the case of Rn and from now on we will discuss
this general case.
The modulus, or length, of a vector v = (v1, v2, ..., vn) ∈ Rn is defined to be
    |v| = √(v1² + v2² + ··· + vn²) .
Note that |v| ≥ 0 and that |v| = 0 if and only if v = 0. Also |λv| = |λ| |v| for any λ ∈ R.
Proposition 4 (Triangle Inequality) Let u = (u1, u2, ..., un), v = (v1, v2, ..., vn) ∈ Rn. Then
    |u + v| ≤ |u| + |v| .
[Figure: geometric interpretation of the triangle inequality – the vectors u, v and u + v forming a triangle]
Proof. Note that for t ∈ R,
    0 ≤ |u + tv|² = (u1 + tv1)² + ··· + (un + tvn)² = |u|² + 2t (u1v1 + ··· + unvn) + t² |v|² .
The RHS of the above inequality is a quadratic in t which is always non-negative and so has non-positive discriminant (i.e. b² ≤ 4ac). Hence
    4 (u1v1 + ··· + unvn)² ≤ 4 |u|² |v|²
and so
    |u1v1 + ··· + unvn| ≤ |u| |v| .
In particular,
    |u + v|² = |u|² + 2 (u1v1 + ··· + unvn) + |v|² ≤ |u|² + 2 |u| |v| + |v|² = (|u| + |v|)² ,
and taking square roots gives the result.
The quantity u1v1 + u2v2 + ··· + unvn is the dot product (or scalar product) of u and v, written u · v, and the angle θ between two non-zero vectors u and v is given by
    θ = cos⁻¹ ( u · v / (|u| |v|) ) ,
which makes sense because |u · v| ≤ |u| |v| from the Cauchy-Schwarz Inequality. If we take the principal values of cos⁻¹ to be in the range
    0 ≤ θ ≤ π
then this formula measures the smaller angle between the vectors. Note that two vectors u and v are
perpendicular precisely when u · v = 0.
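These facts are easily checked numerically. Here is a short Python sketch (standard library only; the vectors are arbitrary examples) verifying the Cauchy-Schwarz and triangle inequalities and computing the angle between two vectors.

    import math

    def dot(u, v):
        # Dot product u . v = u1*v1 + ... + un*vn.
        return sum(ui * vi for ui, vi in zip(u, v))

    def length(v):
        # Modulus |v| = sqrt(v . v).
        return math.sqrt(dot(v, v))

    u, v = (1, 1), (1, -2)
    s = tuple(ui + vi for ui, vi in zip(u, v))          # u + v
    print(abs(dot(u, v)) <= length(u) * length(v))      # Cauchy-Schwarz: True
    print(length(s) <= length(u) + length(v))           # triangle inequality: True
    theta = math.acos(dot(u, v) / (length(u) * length(v)))
    print(theta)   # about 1.89 radians, the smaller angle between u and v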
2 Matrices
At its simplest a matrix is just a two-dimensional array of numbers: for example
    ( 1   2   −3 )          (  1   )          ( 0  0 )
    ( √2  π    0 )   ,      ( −1.2 )   ,      ( 0  0 )
                            ( −1   )
are all matrices. The examples above are respectively a 2 × 3 matrix, a 3 × 1 matrix and a 2 × 2 matrix
(read '2 by 3' etc.); the first figure refers to the number of rows and the second to the number of
columns. So vectors like (x, y) and (x, y, z) are also matrices, respectively 1 × 2 and 1 × 3 matrices.
The 3 × 1 matrix above could just as easily be thought of as a vector – it is after all just a list of
three numbers, but written down rather than across. This is an example of a column vector. When
we need to differentiate between the two, the vectors we have already met like (x, y) and (x, y, z)
are called row vectors.
Notation 7 The set of 1 × n row vectors (or n-tuples) is written as Rn. When we need to differentiate
between row and column vectors we write (Rn)′ or (Rn)∗ for the set of n × 1 column vectors. If there is
no chance of confusion between the two (because we are only using row vectors, or only column vectors)
then Rn can denote either set.
Notation 8 If you are presented with an m × n matrix A = (aij) then the notation here simply means
that aij is the (i, j)th entry. That is, the entry in the ith row and jth column is aij. Note that i can
vary between 1 and m, and that j can vary between 1 and n. So
    ith row = (ai1, ..., ain)        and        jth column = ( a1j )
                                                             ( ... )
                                                             ( amj ) .
1. Matrix Addition: two matrices A = (aij) and B = (bij) of the same size are added entry by entry, so that (A + B)ij = aij + bij.
2. Scalar Multiplication: given a scalar k and a matrix A = (aij), the matrix kA has (i, j)th entry kaij.
That is, we multiply each of the entries of A by k to get the new matrix kA.
3. Matrix Multiplication: Based on how we added matrices then you might think that we multiply
matrices in a similar fashion, namely multiplying corresponding entries, but we do not. At first
glance the rule for multiplying matrices is going to seem rather odd, but we will soon discover
why we multiply them as we do.
The rule is this: we can multiply an m × n matrix A with a p × q matrix B if n = p and we
produce an m × q matrix AB with (i, j)th entry
    (AB)ij = ai1b1j + ai2b2j + ··· + ainbnj        for 1 ≤ i ≤ m and 1 ≤ j ≤ q.
It may help a little to write the rows of A as r1, ..., rm and the columns of B as c1, ..., cq and the
above rule says that
(AB)ij = ri · cj for 1 ≤ i ≤ m and 1 ≤ j ≤ q.
We dot the rows of A with the columns of B.
This will, I am sure, seem pretty bizarre, let alone easy to remember – so here are some
examples.
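A direct, if naive, implementation of the rule in Python makes the row-by-column pattern explicit (a sketch only; the name matmul is illustrative and matrices are stored as lists of rows).

    def matmul(A, B):
        # Product of an m x n matrix A with an n x q matrix B:
        # entry (i, j) is the dot product of the ith row of A with the jth column of B.
        m, n, q = len(A), len(B), len(B[0])
        assert all(len(row) == n for row in A), "columns of A must match rows of B"
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
                for i in range(m)]

    A = [[2, 3],
         [3, 2]]
    v = [[1],
         [2]]              # a 2 x 1 column vector
    print(matmul(A, v))    # [[8], [7]]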
Example 11 Where possible, calculate pairwise the products of the following matrices
    A = ( 1  2 )   ,     B = ( 1  2  3 )   ,     C = ( 1  −1 )
        ( ·  · )             ( ·  ·  · )             ( ·   · ) .
1. Note that AC ≠ CA. That is, matrix multiplication is not generally commutative.
2. Matrix multiplication is, however, associative, that is
    (AB) C = A (BC)
whenever this product makes sense. So we can write down a product like A1A2 . .. An without
having to specify how we go about multiplying all these matrices or needing to worry about
bracketing. But we do have to keep the order in mind. We shall not prove this fact here.
3. Note that CC = 0 even though C is non-zero – not something that happens with numbers.
4. The distributive laws also hold for matrix multiplication, namely
A (B + C) = AB + AC and (A + B) C = AC + BC
Notation 12 We write A² for the product AA and similarly we write Aⁿ for the product AA ··· A (with n factors of A). Note that A must be a square matrix (same number of rows and columns) for this to make sense.
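The remarks above are easy to see numerically. The following NumPy sketch uses matrices chosen purely for illustration (they are not the matrices of Example 11).

    import numpy as np

    A = np.array([[1, 2],
                  [0, 1]])
    C = np.array([[1, -1],
                  [1, -1]])
    print(A @ C)                          # [[ 3 -3] [ 1 -1]]
    print(C @ A)                          # [[ 1  1] [ 1  1]]  -- so AC != CA in general
    print(C @ C)                          # the zero matrix, although C is non-zero
    print(np.linalg.matrix_power(A, 3))   # A^3 = AAA = [[1 6] [0 1]]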
3 Matrices as Maps
We have seen then how we can multiply an m × n matrix A and an n × p matrix B to form a product
AB which is an m × p matrix. Now given the co-ordinates of a point in n-dimensional space, we can
put them in a column to form an n × 1 column vector, an element v ∈ (Rn)′. If we premultiply this
vector v by the m × n matrix A we get an m × 1 column vector Av ∈ (Rm)′.
Importantly though, we can now answer the question of why we choose to multiply matrices as we
do. Take an m × n matrix A and an n × p matrix B. We have two maps associated with
premultiplication by A and B; let's call them α, given by
    α : (Rn)′ → (Rm)′ : v ↦ Av,
and β, given by
    β : (Rp)′ → (Rn)′ : v ↦ Bv.
We also have their composition α ◦ β, that is we do β first and then α, given by
    α ◦ β : (Rp)′ → (Rm)′ : v ↦ A (Bv) = (AB) v.
That is,
the composition α ◦ β is premultiplication by the product AB.
So if we think of matrices more as maps, rather than just simple arrays of numbers, matrix
multiplication is quite natural and simply represents the composition of the corresponding maps.
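A quick numerical check of this fact, with matrices chosen arbitrarily for illustration (a NumPy sketch):

    import numpy as np

    A = np.array([[1, 2, 0],
                  [3, -1, 4]])     # 2 x 3, so alpha maps column vectors in R^3 to R^2
    B = np.array([[1, 0],
                  [2, 1],
                  [0, -1]])        # 3 x 2, so beta maps column vectors in R^2 to R^3
    v = np.array([[5],
                  [7]])            # a 2 x 1 column vector
    print(A @ (B @ v))             # apply beta, then alpha
    print((A @ B) @ v)             # premultiply by the single matrix AB -- same answer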
Consider now the map α : (Rn)′ → (Rm)′ : v ↦ Av on its own, which is just pre-multiplication by A. Because of the distributive laws that hold for matrices, given
column vectors v1, v2 ∈ (Rn)′ and scalars c1, c2 ∈ R then
    α (c1v1 + c2v2) = A (c1v1 + c2v2) = c1Av1 + c2Av2 = c1α (v1) + c2α (v2) .
This means α is what is called a linear map – multiplication by a matrix leads to a linear map. The
important thing is that the converse also holds true – any linear map (Rn)′ → (Rm)′ has an
associated matrix.
To see this most clearly we will return to 2 × 2 matrices. Let α : R2 → R2 be a linear map – we'll
try to work out what its associated matrix is. Let’s suppose we’re right and α is just premultiplication
by some 2 × 2 matrix; let's write it as
    ( a  b )
    ( c  d ) .
Note that if we multiply the canonical basis vectors
    i = ( 1 )        and        j = ( 0 )
        ( 0 )                       ( 1 )
by this matrix, we get
    ( a  b ) ( 1 )  =  ( a )        and        ( a  b ) ( 0 )  =  ( b )
    ( c  d ) ( 0 )     ( c )                   ( c  d ) ( 1 )     ( d ) .
So, if this matrix exists, its first column is α (i) and its second column is α (j). But if we remember that α is
linear then we see now that we have the right matrix; let's call it
    A = ( α (i)   α (j) ) .
Then
    A ( x )   =   xα (i) + yα (j)
      ( y )
              =   α (xi + yj)        [as α is linear]
              =   α ( x )
                    ( y ) .
We shall make use of this later when we calculate the matrices of some standard maps.
The calculations we performed above work just as well generally: if α : (Rn)′ → (Rm)′ is a linear
map then it is the same as premultiplying by an m × n matrix. In the columns of this matrix are the
images of the canonical basis of Rn under α.
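This observation gives a simple recipe for recovering the matrix of a linear map presented as a function: apply it to the canonical basis and use the images as columns. A NumPy sketch (the helper matrix_of and the map alpha here are just illustrative examples):

    import numpy as np

    def matrix_of(alpha, n):
        # The columns of the matrix are the images of the canonical basis vectors.
        basis = np.eye(n)
        return np.column_stack([alpha(basis[:, k]) for k in range(n)])

    # An example linear map of the plane: (x, y) |-> (2x + 3y, 3x + 2y).
    alpha = lambda v: np.array([2 * v[0] + 3 * v[1], 3 * v[0] + 2 * v[1]])
    A = matrix_of(alpha, 2)
    print(A)                          # [[2. 3.] [3. 2.]]
    print(A @ np.array([1.0, 2.0]))   # [8. 7.], the same as alpha((1, 2))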
4 Simultaneous Linear Equations – 2 × 2 Inverses
Suppose we are given two linear equations in two variables x and y. These might have the form
2x + 3y = 1 (2)
and
3x + 2y = 2 (3)
To have solved these in the past, you might have argued along the following lines: multiplying equation (2) by 3 and equation (3) by 2 gives 6x + 9y = 3 and 6x + 4y = 4; subtracting, 5y = −1, so that y = −0.2, and substituting back into (2) gives x = 0.8.
You may even have seen how this situation could be solved graphically by drawing the lines
2x + 3y = 1 and 3x + 2y = 2;
the solution then is their unique intersection.
[Figure: the lines 2x + 3y = 1 and 3x + 2y = 2 meeting at the point (0.8, −0.2)]
We can though, put the two equations (2) and (3) into matrix form. We do this by writing
    ( 2  3 ) ( x )   =   ( 1 )
    ( 3  2 ) ( y )       ( 2 ) .
The two simultaneous equations in the scalar variables x and y have now been replaced by a single
equation in vector quantities – we have in this vector equation two 2 × 1 vectors (one on each side
of the equation), and for the vector equation to be true both co-ordinates of the two vectors must
agree.
We know that it is possible to undo this equation to obtain the solution
    ( x )   =   (  0.8 )
    ( y )       ( −0.2 )
because we have already solved this system of equations. In fact there is a very natural way of
unravelling these equations using matrices.
Consider the matrix
    I2 = ( 1  0 )
         ( 0  1 ) .
• The matrix I2 is called the identity matrix – or more specifically it is the 2×2 identity matrix.
There is an n×n identity matrix which has 1s down the diagonal from top left to bottom right
and 0s elsewhere.
• The identity matrices have the property that
    AIn = A = InA
for any n × n matrix A.
• A square n × n matrix A is said to be invertible if there is a matrix B with BA = In = AB; such a B is called an inverse of A.
• If an inverse B for A exists then it is unique. This is easy to show: suppose B and C were two
inverses then note that
C = InC = (BA) C = B (AC) = BIn = B.
• If BA = In then, in fact, it will follow that AB = In. The converse is also true. We will not
prove this here.
The 2 × 2 matrix
    A = ( a  b )
        ( c  d )
has an inverse precisely when ad − bc ≠ 0. If ad − bc ≠ 0 then
    A⁻¹ =  1/(ad − bc)  (  d  −b )
                        ( −c   a ) .
Proof. Note for any values of a, b, c, d, that
    (  d  −b ) ( a  b )   =   ( ad − bc      0     )   =   (ad − bc) I2 .
    ( −c   a ) ( c  d )       (    0      ad − bc  )
So if ad − bc ≠ 0 then we can divide by this scalar and we have found our inverse.
But if ad − bc = 0 then we have found a matrix
    B = (  d  −b )
        ( −c   a )
such that
    BA = 02 ,
the 2 × 2 zero matrix. Now if an inverse C for A existed, we'd have that
    B = B I2 = B (AC) = (BA) C = 02 C = 02 ,
so that a = b = c = d = 0 and hence A = 02. But then I2 = AC = 02 C = 02, which is absurd, and so A can have no inverse when ad − bc = 0.
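As a concrete check of the formula, the following Python sketch (the helper name inverse2 is illustrative) inverts the matrix of equations (2) and (3) and recovers the solution x = 0.8, y = −0.2.

    def inverse2(a, b, c, d):
        # Inverse of the 2 x 2 matrix (a b; c d), valid only when ad - bc != 0.
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0, so no inverse exists")
        return [[d / det, -b / det],
                [-c / det, a / det]]

    Ainv = inverse2(2, 3, 3, 2)          # inverse of (2 3; 3 2); here ad - bc = -5
    k1, k2 = 1, 2
    x = Ainv[0][0] * k1 + Ainv[0][1] * k2
    y = Ainv[1][0] * k1 + Ainv[1][1] * k2
    print(x, y)                          # 0.8 -0.2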
In terms of simultaneous equations, the case ad − bc = 0 is the degenerate one. Consider the system
    ax + by = k1,
    cx + dy = k2.
• If the two equations are entirely multiples of one another, i.e. a : c = b : d = k1 : k2 then we
essentially just have the one equation and there are infinitely many solutions.
• If the two equations aren’t entirely multiples of one another, just the left hand sides i.e.a : c = b :
d 6= k1 : k2 then we have two contradictory equations and there are no solutions.
The three cases – a unique solution, infinitely many solutions, no solutions – correspond to the lines
    ax + by = k1 and cx + dy = k2
being non-parallel and intersecting once, parallel and concurrent, or parallel and distinct.
[Figure: three panels showing the pair of lines with a unique solution, with infinitely many solutions, and with no solutions]
5 Determinants
The determinant of a square n × n matrix is a number which reflects the way the matrix (or rather its
associated map α) stretches space. We will define this only for 2 × 2 matrices.
Let
    A = ( a  b )
        ( c  d ) .
Then
    det A = ad − bc .
We have already seen that A has an inverse precisely when det A ≠ 0. More generally, |det A| is
an area-scaling factor for α: if R is a region of the plane, then the area of its image α(R) is |det A| times the area of R.
The sign of det A is also important (whether it is positive or negative). If det A is positive then α
will preserve a sense of orientation (such as a rotation does) but if det A is negative then the sense of
orientation will be reversed (such as a reflection does). If det A = 0 then α collapses space: under α
the xy-plane will map to a line (or to a single point in the case of A = 02). Viewed from this
geometric point of view, the following multiplicative property of determinants should not seem too surprising,
if we recall that AB represents the composition of the maps represented by B then A:
    det (AB) = det A det B .
Proof. Let
    A = ( a  b )        and        B = ( e  f )
        ( c  d )                       ( g  h ) .
Then
    det (AB) = det ( ae + bg   af + bh )
                   ( ce + dg   cf + dh )
             = (ae + bg) (cf + dh) − (af + bh) (ce + dg)
             = bgcf + aedh − bhce − afdg        [after cancelling]
             = (ad − bc) (eh − fg)
             = det A det B.
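A quick numerical spot-check of this identity (a NumPy sketch; the matrices are chosen arbitrarily):

    import numpy as np

    A = np.array([[2.0, 3.0],
                  [3.0, 2.0]])
    B = np.array([[1.0, -1.0],
                  [4.0, 0.5]])
    print(np.linalg.det(A @ B))                   # about -22.5
    print(np.linalg.det(A) * np.linalg.det(B))    # about -22.5 as well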
6 Transformations of the Plane
We end with a little geometry. We already noted in the section Matrices as Maps how a matrix A leads
to a linear map α and how every linear map can be achieved by premultiplying by a matrix – we
further saw how to calculate such matrices.
Rotations about the origin and reflections in lines through the origin are examples of linear maps of the xy-plane.
We concentrate on the first of these: Rθ, rotation by an angle θ anticlockwise about the origin. Remember, to find the matrix for Rθ we need to find the
images Rθ (i) and Rθ (j). We note from the diagram
[Figure: the images (cos θ, sin θ) of i and (−sin θ, cos θ) of j under rotation by θ]
that
    Rθ (i) = ( cos θ )        and        Rθ (j) = ( −sin θ )
             ( sin θ )                            (  cos θ ) .
So, with a little abuse of notation, we can write
    Rθ = ( cos θ   −sin θ )
         ( sin θ    cos θ ) .
There are several important things to notice about this matrix:
• det Rθ = +1, unsurprisingly, as it preserves area and it preserves a sense of orientation.
• RθRφ = Rθ+φ – rotating by θ and then by φ is the same as rotating by θ + φ. This is one way
of calculating the cos (θ + φ) and sin (θ + φ) formulas.
• Rθ (v) · Rθ (w) = v · w for any two 2 × 1 vectors v and w. This perhaps is surprising: this equation
says that Rθ is an orthogonal matrix (cf. Michaelmas Geometry I course). One consequence of this
equation is that Rθ preserves distances and angles.
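All three properties can be verified numerically; a short NumPy sketch (the angles and vectors are arbitrary choices):

    import numpy as np

    def R(theta):
        # Matrix of rotation by theta anticlockwise about the origin.
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    t, p = 0.7, 1.1
    print(np.isclose(np.linalg.det(R(t)), 1.0))          # det R_theta = +1
    print(np.allclose(R(t) @ R(p), R(t + p)))            # R_theta R_phi = R_(theta + phi)
    v, w = np.array([1.0, -2.0]), np.array([3.0, 0.5])
    print(np.isclose((R(t) @ v) @ (R(t) @ w), v @ w))    # dot products are preserved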