LA Lecture Notes
Analysis Series
M. Winklmeier
Chigüiro Collection
Work in progress. Use at your own risk.
Contents

1 Introduction
  1.1 Examples of systems of linear equations; coefficient matrices
  1.2 Linear 2 × 2 systems
  1.3 Summary

2 R2 and R3
  2.1 Vectors in R2
  2.2 Inner product in R2
  2.3 Orthogonal Projections in R2
  2.4 Vectors in Rn
  2.5 Vectors in R3 and the cross product
  2.6 Lines and planes in R3
  2.7 Intersections of lines and planes in R3
  2.8 Summary
  2.9 Exercises

4 Determinants
  4.1 Determinant of a matrix
  4.2 Properties of the determinant
  4.3 Geometric interpretation of the determinant
  4.4 Inverse of a matrix
  4.5 Summary
  4.6 Exercises

  7.1 Orthonormal systems and orthogonal bases
  7.2 Orthogonal matrices
  7.3 Orthogonal complements
  7.4 Orthogonal projections
  7.5 The Gram-Schmidt process
  7.6 Application: Least squares
  7.7 Summary
  7.8 Exercises

8 Symmetric matrices and diagonalisation
  8.1 Complex vector spaces
  8.2 Similar matrices
  8.3 Eigenvalues and eigenvectors
  8.4 Properties of the eigenvalues and eigenvectors
  8.5 Symmetric and Hermitian matrices
  8.6 Application: Conic Sections
    8.6.1 Solutions of ax² + bxy + cy² = d as conic sections

Index
Chapter 1
Introduction
This chapter serves as an introduction to the main themes of linear algebra, namely the problem of
solving systems of linear equations for several unknowns. We are not only interested in an efficient
way to find their solutions, but we also wish to understand how the solutions could possibly look
and what we can say about their structure. For the latter, it will be crucial to find a geometric
interpretation of systems of linear equations. In this chapter we will use the “solve and insert”-
strategy for solving linear systems. A systematic and efficient formalism will be given in Chapter 3.
Everything we discuss in this chapter will appear again later on, so you may read it quickly or even
skip (parts of) it.
A linear system is a set of equations for a number of unknowns which have to be satisfied simultaneously and where the unknowns appear only linearly. If the number of equations is m and the number of unknowns is n, then we call it an m × n linear system. Typically the unknowns are called x, y, z or x1 , x2 , . . . , xn . The following is an example of a linear system of 3 equations for 5 unknowns.
A system is not linear if, for example, one of its equations contains a product of two of the unknowns. Also expressions like x², ∛x, xyz, x/y or sin x make a system non-linear.
Now let us briefly discuss the simplest non-trivial case: A system consisting of one linear equation
for one unknown x. Its most general form is
ax = b (1.1)
where a and b are given constants and we want to find all x ∈ R which satisfy (1.1). Clearly, the
solution to this problem depends on the coefficients a and b. We have to distinguish several cases.
Case 1. a ≠ 0. In this case, there is only one solution, namely x = b/a.
Case 2. a = 0, b ≠ 0. In this case, there is no solution because whatever value we choose for x,
the left hand side ax will always be zero and therefore cannot be equal to b.
Case 3. a = 0, b = 0. In this case, there are infinitely many solutions. In fact, every x ∈ R solves
the equation.
So we see that already in this simple case we have three very different types of solution of the
system (1.1): no solution, exactly one solution or infinitely many solutions.
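For illustration, this case distinction can be written as a small Python sketch (added here for illustration; it is not part of the original notes):

    def solve_1x1(a, b):
        """Describe the solution set of the equation a*x = b."""
        if a != 0:
            return ("unique solution", b / a)      # Case 1: x = b/a
        if b != 0:
            return ("no solution", None)           # Case 2: a = 0, b != 0
        return ("every x in R is a solution", None)  # Case 3: a = b = 0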
Now let us look at a system of one linear equation for two unknowns x, y. Its most general form is
ax + by = c. (1.1’)
Here, a, b, c are given constants and we want to find all pairs x, y so that the equation is satisfied.
For example, if a = b = 0 and c ≠ 0, then the system has no solution, whereas if for example a ≠ 0, then there are infinitely many solutions because no matter how we choose y, we can always satisfy the system by taking x = (c − by)/a.
Question 1.1
Is it possible that the system has exactly one solution?
(Come back to this question again after you have studied Chapter 3.)
FT
The general form of a system of two linear equations for one unknown, respectively for two unknowns, is

    a1 x = b1 ,        a11 x + a12 y = c1 ,
    a2 x = b2 ,        a21 x + a22 y = c2 ,

where a1 , a2 , b1 , b2 , respectively a11 , a12 , a21 , a22 , c1 , c2 , are constants and x, respectively x, y, are the unknowns.
Question 1.2
Can you find examples for the coefficients such that the systems have no solution, exactly one solution, or infinitely many solutions?
Can you maybe even give a general rule for when which behaviour occurs?
(Come back to this question again after you have studied Chapter 3.)
Before we discuss general linear systems, we will discuss in this introductory chapter the special
case of a system of two linear equations with two unknowns. Although this is a very special type
of system, it exhibits many properties of general linear systems and they appear very often in
problems.
Example 1.1. Assume that a car dealership sells motorcycles and cars. Altogether they have 25
vehicles in their shop with a total of 80 wheels. How many motorcycles and cars are in the shop?
Solution. First, we give names to the quantities we want to calculate. So let M = number of motorcycles and C = number of cars in the dealership. If we write the information given in the exercise in formulas, we obtain

    1  M + C = 25      (total vehicles)
    2  2M + 4C = 80    (total wheels)
since we assume that every motorcycle has 2 wheels and every car has 4 wheels. Equation 1 tells
us that M = 25 − C. If we insert this into equation 2 , we find
80 = 2(25 − C) + 4C = 50 − 2C + 4C = 50 + 2C =⇒ 2C = 30 =⇒ C = 15.
This implies that M = 25 − C = 25 − 15 = 10. Note that in our calculations and arguments, all
the implication arrows go “from left to right”, so what we can conclude at this instance is that the
system has only one possible candidate for a solution and this candidate is M = 10, C = 15. We
have not (yet) shown that it really is a solution. However, inserting these numbers in the original
equation we see easily that our candidate is indeed a solution.
So the answer is: There are 10 motorcycles and 15 cars (and there is no other possibility).
Solution. Again, let M = number of motorcycles, C = number of cars. The information of the
exercise leads to the following system of equations:
As in the previous exercise, we obtain from 1 and 2 that M = 10, C = 15. Clearly, this also
satisfies equation 3 . So again the answer is: There are 10 motorcycles and 15 cars (and there is
no other possibility).
Example 1.3. Assume that a car dealership sells motorcycles and cars. Altogether they have 25
vehicles in their shop with a total of 80 wheels. Moreover, the shop arranges them in 5 distinct areas
of the shop so that in each area there are either 3 cars or 5 motorcycles. How many motorcycles
and cars are in the shop?
Solution. Again, let M = number of motorcycles, C = number of cars. The information of the
exercise gives the following equations:
As in the previous exercise, we obtain that M = 10, C = 15 using only equations 1 and 2 .
However, this does not satisfy equation 3 ; so there is no way to choose M and C such that all
three equations are satisfied simultaneously. Therefore, a shop as in this example does not exist.
Example 1.4. Assume that a zoo has birds and cats. The total count of legs of the animals is 60.
Feeding a bird takes 5 minutes, feeding a cat takes 10 minutes. The total time to feed the animals
is 150 minutes. How many birds and cats are in the zoo?
Solution. Let B = number of birds, C = number of cats in the zoo. The information of the exercise gives the following equations:

    1  2B + 4C = 60      (total legs)
    2  5B + 10C = 150    (total feeding time)

The first equation gives B = 30 − 2C. Inserting this into the second equation gives 5(30 − 2C) + 10C = 150 − 10C + 10C = 150, which is true for every choice of C.
Remark. The reason for this is that both equations 1 and 2 are basically the same equation.
If we divide the first one by 2 and the second one by 5, then we end up in both cases with the
equation B + 2C = 30, so both equations contain exactly the same information.
Algebraically, the linear system has infinitely many solutions. But our variables represent animals and they only come in nonnegative integer quantities, so we have the 16 different solutions B = 30 − 2C where C ∈ {0, 1, . . . , 15}.
Solution. A polynomial of degree at most 3 is known if we know its 4 coefficients. In this exercise,
the unknowns are the coefficients of the polynomial P . If we write P (x) = αx³ + βx² + γx + δ, then we have to find α, β, γ, δ such that (1.2) is satisfied. Note that P ′(x) = 3αx² + 2βx + γ, so the conditions (1.2) translate into a linear system for α, β, γ, δ.
It is easy to verify that the polynomial P (x) = x³ + 2x² + 3x + 1 has all the desired properties.
Example 1.6. A pole is 5 metres long and shall be coated with varnish. There are two types of
varnish available: The blue one adds 3 g per 50 cm to the pole, the red one adds 6 g per meter to
the pole. Is it possible to coat the pole in a combination of the varnishes so that the total weight
added is
(a) 35 g? (b) 30 g?
Solution. (a) We denote by b the length of the pole which will be covered in blue and r the length
of the pole which will be covered in red. Then we obtain the system of equations
1 b+ r = 5 (total length)
2 6b + 6r = 35 (total weight)
The first equation gives r = 5 − b. Inserting into the second equation yields 35 = 6b + 6(5 − b) = 30
which is a contradiction. This shows that there is no solution.
(b) As in (a), we obtain the system of equations
1 b+ r = 5 (total length)
2 6b + 6r = 30 (total weight)
Again, the first equation gives r = 5−b. Inserting into the second equation yields 30 = 6b+6(5−b) =
30 which is always true, independently of how we choose b and r as long as 1 is satisfied. This
means that in order to solve the system of equations, it is sufficient to solve only the first equation
since then the second one is automatically satisfied. So we have infinitely many solutions. Any pair
b, r such that b + r = 5 gives a solution. So for any b that we choose, we only have to set r = 5 − b
and we have a solution of the problem. Of course, we could also fix r and then choose b = 5 − r to
obtain a solution.
For example, we could choose b = 1, then r = 4, or b = 0.00001, then r = 4.99999, or r = −2 then
b = 7. Clearly, the last example does not make sense for the problem at hand, but it still does
satisfy our system of equations.
Example 1.7. When octane reacts with oxygen, the result is carbon dioxide and water. Find the
equation for this reaction.
Solution. The chemical formulas for the substances are C8 H18 , O2 , CO2 and H2 O. Hence the
reaction equation is
a C8 H18 + b O2 −→ c CO2 + d H2 O
with unknown integers a, b, c, d. Clearly the solution will not be unique since if we have one set of numbers a, b, c, d which works and we multiply all of them by the same number, then we obtain
another solution. Let us write down the system of equations. To this end we note that the number
of atoms of each element has to be equal on both sides of the equation. We obtain:
1 8a = c (carbon)
2 18a = 2d (hydrogen)
3 2b = 2c + d (oxygen)
or, if we put all the variables on the left hand side,

    1  8a − c = 0,
    2  18a − 2d = 0,
    3  2b − 2c − d = 0.
Let us express all the unknowns in terms of a: 1 and 2 show that c = 8a and d = 9a. Inserting this in 3 we obtain 0 = 2b − 2 · 8a − 9a = 2b − 25a, hence b = (25/2)a. If we want all coefficients to be integers, we can choose a = 2, b = 25, c = 16, d = 18 and the reaction equation becomes
2 C8 H18 + 25 O2 −→ 16 CO2 + 18 H2 O .
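The same computation can be carried out in a few lines of Python; this is only an illustrative sketch (using the standard fractions module), not part of the original notes:

    from fractions import Fraction

    # Express b, c, d in terms of a and then clear denominators.
    a = Fraction(1)
    c = 8 * a                 # carbon balance: 8a = c
    d = 9 * a                 # hydrogen balance: 18a = 2d
    b = (2 * c + d) / 2       # oxygen balance: 2b = 2c + d, so b = (25/2) a
    factor = b.denominator    # smallest factor making all coefficients integers
    a, b, c, d = (int(x * factor) for x in (a, b, c, d))
    print(a, b, c, d)         # 2 25 16 18, i.e. 2 C8H18 + 25 O2 -> 16 CO2 + 18 H2O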
All the examples we discussed in this section are so-called systems of linear equations. Let us give
a precise definition of what we mean by this.
Definition 1.8 (Linear system). An m×n system of linear equations (or simply a linear system)
is a system of m linear equations for n unknowns of the form
    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
        ...
    am1 x1 + am2 x2 + · · · + amn xn = bm

where the aij and the bi are given numbers.
Definition 1.9 (Coefficient matrix). The coefficient matrix A of the system is the collection of
all coefficients aij in an array as follows:
        ( a11  a12  ...  a1n )
    A = ( a21  a22  ...  a2n )                                  (1.4)
        (  .    .          . )
        ( am1  am2  ...  amn )

The numbers aij are called the entries or components of the matrix A.
The augmented coefficient matrix (A|b) of the system is the collection of all coefficients aij and the right hand side; it is denoted by

            ( a11  a12  ...  a1n | b1 )
    (A|b) = ( a21  a22  ...  a2n | b2 )                         (1.5)
            (  .    .          . |  . )
            ( am1  am2  ...  amn | bm )
The coefficient matrix is nothing else than the collection of the coefficients aij ordered in some sort
of table or rectangle such that the coefficient aij is placed in the ith row and the jth column.
The augmented coefficient matrix contains additionally the constants from the right hand side.
Important observation. There is a one-to-one correspondence between linear systems and aug-
mented coefficient matrices: Given a linear system, it is easy to write down its augmented coefficient
matrix and vice versa.
Let us write down the coefficient matrices of our examples.
Example 1.1: This is a 2 × 2 system with coefficients a11 = 1, a12 = 1, a21 = 2, a22 = 4 and right hand side b1 = 60, b2 = 200. The system has a unique solution. The coefficient matrix and the augmented coefficient matrix are

    A = ( 1 1 )        (A|b) = ( 1 1 |  60 )
        ( 2 4 )                ( 2 4 | 200 )
Example 1.2: This is a 3 × 2 system with coefficients a11 = 1, a12 = 1, a21 = 2, a22 = 4, a31 = 2, a32 = 3, and right hand side b1 = 60, b2 = 200, b3 = 140. The system has a unique solution. The coefficient matrix and the augmented coefficient matrix are

    A = ( 1 1 )        (A|b) = ( 1 1 |  60 )
        ( 2 4 )                ( 2 4 | 200 )
        ( 2 3 )                ( 2 3 | 140 )
Example 1.3: This is a 3 × 2 system with coefficients a11 = 1, a12 = 1, a21 = 2, a22 = 4, a31 = 2, a32 = 3, and right hand side b1 = 60, b2 = 200, b3 = 100. The system has no solution. The coefficient matrix is the same as in Example 1.2, the augmented coefficient matrix is

    (A|b) = ( 1 1 |  60 )
            ( 2 4 | 200 )
            ( 2 3 | 100 )
Example 1.5: This is a 4 × 4 system with coefficients a11 = 0, a12 = 0, a13 = 0, a14 = 1, a21 = 1, a22 = 1, a23 = 1, a24 = 1, a31 = 0, a32 = 0, a33 = 1, a34 = 0, a41 = 24, a42 = 8, a43 = 2, a44 = 1, and right hand side b1 = 1, b2 = 7, b3 = 3, b4 = 23. The system has a unique solution. The coefficient matrix and the augmented coefficient matrix are

    A = (  0 0 0 1 )        (A|b) = (  0 0 0 1 |  1 )
        (  1 1 1 1 )                (  1 1 1 1 |  7 )
        (  0 0 1 0 )                (  0 0 1 0 |  3 )
        ( 24 8 2 1 )                ( 24 8 2 1 | 23 )
Example 1.7: This is a 3 × 4 homogeneous system with coefficients a11 = 8, a12 = 0, a13 = −1, a14 = 0, a21 = 18, a22 = 0, a23 = 0, a24 = −2, a31 = 0, a32 = 2, a33 = −2, a34 = −1, and right hand side b1 = 0, b2 = 0, b3 = 0. The system has infinitely many solutions. The coefficient matrix and the augmented coefficient matrix are

    A = (  8 0 −1  0 )        (A|b) = (  8 0 −1  0 | 0 )
        ( 18 0  0 −2 )                ( 18 0  0 −2 | 0 )
        (  0 2 −2 −1 )                (  0 2 −2 −1 | 0 )
We saw that Examples 1.1, 1.2 and 1.5 have unique solutions. In Examples 1.4, 1.6 (b) and 1.7 the solution is not unique; they even have infinitely many solutions! Examples 1.3 and 1.6 (a) do
not admit solutions. So given an m × n system of linear equations, two important questions arise
naturally:
• Existence: Does the system have a solution?
• Uniqueness: If the system has a solution, is it unique?
More generally, we would like to be able to say something about the structure of solutions of linear
systems. For example, is it possible that there is only one solution? That there are exactly two
solutions? That there are infinitely many solutions? That there is no solution? Can we give criteria for when each of these cases occurs?

1.2 Linear 2 × 2 systems

In this section we write x and y for the unknowns (instead of M and C). Recall that the system of equations that we are interested in solving is
1 x + y = 60,
(1.6)
2 2x + 4y = 200.
We want to give a geometric meaning to this system of equations. To this end we think of pairs
x, y as points (x, y) in the plane. Let us forget about the equation 2 for a moment and concentrate
only on 1 . Clearly, it has infinitely many solutions. If we choose an arbitrary x, we can always
find y such that 1 is satisfied (just take y = 60 − x). Similarly, if we choose any y, then we only have to take x = 60 − y and we obtain a solution of 1 .
Where in the xy-plane lie all solutions of 1 ? Clearly, 1 is equivalent to y = 60 − x which we easily
identify as the equation of the line L1 in the xy-plane which passes through (0, 60) and has slope
−1. In summary, a pair (x, y) is a solution of 1 if and only if it lies on the line L1 , see Figure 1.1.
If we apply the same reasoning to 2 , we find that a pair (x, y) satisfies 2 if and only if (x, y) lies on the line L2 in the xy-plane given by y = (200 − 2x)/4 (this is the line in the xy-plane passing through (0, 50) with slope −1/2).
Now it is clear that a pair (x, y) satisfies both 1 and 2 if and only if it lies on both lines L1 and
L2 . So finding the solution of our system (1.6) is the same as finding the intersection of the two
lines L1 and L2 . From elementary geometry we know that there are exactly three possibilities for
their intersection:
(i) L1 and L2 are not parallel. Then they intersect in exactly one point.
(ii) L1 and L2 are parallel and not equal. Then they do not intersect.
(iii) L1 and L2 are parallel and equal. Then L1 = L2 and they intersect in infinitely many points
(they intersect in every point of L1 = L2 ).
In our example we know that the slope of L1 is −1 and that the slope of L2 is −1/2, so they are not
parallel and therefore intersect in exactly one point. Consequently, the system (1.6) has exactly
one solution.
Figure 1.1: Graphs of the lines L1 , L2 which represent the equations from the system (1.6) (see also Example 1.1). Their intersection represents the unique solution of the system.
If we look again at Example 1.6, we see that in Case (a) we have to determine the intersection of the lines

    L1 : y = 5 − x,        L2 : y = 35/6 − x.
Both lines have slope −1 so they are parallel. Since the constant terms in both lines are not equal,
they intersect nowhere, showing that the system of equations has no solution, see Figure 1.2.
In Case (b), the two lines that we have to intersect are
G1 : y = 5 − x, G2 : y = 5 − x.
We see that G1 = G2 , so every point on G1 (or G2 ) is a solution of the system and therefore we have infinitely many solutions, see Figure 1.2.
Important observation. Whether a linear 2 × 2 system has a unique solution or not has nothing to do with the right hand side of the system: it depends only on whether the two lines are parallel or not, and this in turn depends only on the coefficients on the left hand side.
Figure 1.2: The lines L1 : y = 5 − x and L2 : y = 35/6 − x from Example 1.6 are parallel and do not intersect.
(ii) The set of solutions is all of the plane. This happens if α = β = γ = 0. In this case, clearly
every pair (x, y) is a solution of (1.7).
(iii) There is no solution. This happens if α = β = 0 and γ ≠ 0. In this case, no pair (x, y) is a solution of (1.7) since the left hand side is always 0.
In the first two cases, (1.7) has infinitely many solutions, in the last case it has no solution.
Two linear equations with two unknowns
1 Ax + By = U
(1.8)
2 Cx + Dy = V.
We are using the letters A, B, C, D instead of a11 , a12 , a21 , a22 in order to make the calculations more readable. If we interpret the system of equations as the intersection of two geometrical objects, in our case lines, we already know that there are exactly three possible types of solutions:
In case (i), the system has exactly one solution, in cases (ii) and (iii) the system has infinitely many
solutions and in case (iv) the system has no solution.
In summary, we have the following very important observation.
Remark 1.10. The system (1.8) has either exactly one solution or infinitely many solutions or
no solution.
Question 1.3
What is the geometric interpretation of
(i) a system of 3 linear equations for 2 unknowns?
(ii) a system of 2 linear equations for 3 unknowns?
What can be said about the structure of its solutions?
Algebraic proof of Remark 1.10. Now we want to prove the Remark 1.10 algebraically and we want
to find a criterion on A, B, C, D which allows us to decide easily how many solutions there are. Let
us look at the different cases.
Case 1. B ≠ 0. In this case we can solve 1 for y and obtain y = (1/B)(U − Ax). Inserting this into 2 we find Cx + (D/B)(U − Ax) = V . If we put all terms with x on one side and all other terms on the other side, we obtain
2’ (AD − BC)x = DU − BV.
(i) If AD − BC ≠ 0 then there is at most one solution, namely x = (DU − BV )/(AD − BC) and consequently y = (1/B)(U − Ax) = (AV − CU )/(AD − BC). Inserting these expressions for x and y in our system of equations, we see that they indeed solve the system (1.8), so that we have exactly one solution.
Case 2. D ≠ 0. This case is analogous to Case 1. We can solve 2 for y and obtain y = (1/D)(V − Cx). Hence 1 becomes Ax + (B/D)(V − Cx) = U . If we put all terms with x on one side and all other terms on the other side, we obtain
1’ (AD − BC)x = DU − BV
Case 3. B = 0 and D = 0. Observe that in this case AD − BC = 0 . In this case the system (1.8)
reduces to
Ax = U, Cx = V. (1.9)
We see that the system no longer depends on y. So, if the system (1.9) has at least one solution,
then we automatically have infinitely many solutions since we may choose y freely. If the system
(1.9) has no solution, then the original system (1.8) cannot have a solution either.
Note that there are no other possible cases for the coefficients.
In summary, we proved the following theorem.
Theorem 1.11. Consider the 2 × 2 linear system

    1  Ax + By = U
    2  Cx + Dy = V.        (1.10)
(i) The system (1.10) has exactly one solution if and only if AD − BC ≠ 0. In this case, the solution is

    x = (DU − BV )/(AD − BC),        y = (AV − CU )/(AD − BC).        (1.11)

(ii) The system (1.10) has no solution or infinitely many solutions if and only if AD − BC = 0.
Definition 1.12. The number d = AD − BC is called the determinant of the system (1.10).
In Chapter 4.1 we will generalise this concept to n × n systems for n ≥ 3.
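The determinant criterion of Theorem 1.11 and the solution formula (1.11) can be mirrored in a short Python sketch (for illustration only; not part of the original notes):

    def solve_2x2(A, B, C, D, U, V):
        """Solve A*x + B*y = U, C*x + D*y = V."""
        d = A * D - B * C                      # determinant of the system
        if d != 0:                             # exactly one solution, formula (1.11)
            return ((D * U - B * V) / d, (A * V - C * U) / d)
        return None                            # no solution or infinitely many

    print(solve_2x2(1, 1, 2, 4, 25, 80))       # Example 1.1: (10.0, 15.0)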
Remark 1.13. Let us see how the determinant connects to our geometric interpretation of the system of equations. Assume that B ≠ 0 and D ≠ 0. Then we can solve 1 and 2 for y to obtain equations for a pair of lines

    L1 : y = −(A/B) x + (1/B) U,        L2 : y = −(C/D) x + (1/D) V.

The two lines intersect in exactly one point if and only if they have different slopes, i.e., if −A/B ≠ −C/D. After multiplication by −BD we see that this is the same as AD ≠ BC, or in other words, AD − BC ≠ 0.
On the other hand, the lines are parallel (hence they are either equal or they have no intersection) if −A/B = −C/D. This is the case if and only if AD = BC, or in other words, if AD − BC = 0.
Question 1.4
Consider the cases when B = 0 or D = 0 and make the connection between Theorem 1.11 and
the geometric interpretation of the system of equations.
Figure 1.3: Example 1.14(a). Graphs of the lines L1 : x + 2y = 11 and L2 : 3x + 4y = 27 and their intersection (5, 3).
Example 1.14. (a)  1  x + 2y = 11
                   2  3x + 4y = 27.
Clearly, the determinant is d = 4 − 6 = −2 ≠ 0. So the system has exactly one solution.
We can check this easily: The first equation gives x = 11 − 2y. Inserting this into the second equation leads to
3(11 − 2y) + 4y = 27 =⇒ −2y = −6 =⇒ y=3 =⇒ x = 11 − 2 · 3 = 5.
So the solution is x = 5, y = 3. (If we did not have Theorem 1.11, we would have to check
that this is not only a candidate for a solution, but indeed is one.)
(b)  1  x + 2y = 1
     2  2x + 4y = 5.
Here, the determinant is d = 4 − 4 = 0, so we expect either no solution or infinitely many solutions. The first equation gives x = 1 − 2y. Inserting into the second equation gives 2(1 − 2y) + 4y = 5. We see that the terms with y cancel and we obtain 2 = 5 which is a contradiction. Therefore, the system of equations has no solution.
(c) 1 x + 2y = 1
2 3x + 6y = 3.
The determinant is d = 6 − 6 = 0, so again we expect either no solution or infinitely many solutions. The first equation gives x = 1 − 2y. Inserting into the second equation gives 3(1 − 2y) + 6y = 3. We see that the terms with y cancel and we obtain 3 = 3 which is true. Therefore, the system of equations has infinitely many solutions given by x = 1 − 2y.
Figure 1.4: Picture on the left: The lines L1 , L2 from Example 1.14(b) are parallel and do not intersect. Therefore the linear system has no solution.
Picture on the right: The lines L1 , L2 from Example 1.14(c) are equal. Therefore the linear system has infinitely many solutions.
Remark. This was somewhat clear since we can obtain the second equation from the first one
by multiplying both sides by 3 which shows that both equations carry the same information
and we lose nothing if we simply forget about one of them.
Solution. We only need to calculate the determinant and find all k such that it is different from
zero. So let us start by calculating
Hence there are exactly two values for k where d = 0, namely k = −1 ± 4, that is k1 = 3, k2 = −5. For all other k, we have that d ≠ 0.
So the answer is: The system has exactly one solution if and only if k ∈ R \ {−5, 3}.
Remark 1.16. (a) Note that the answer does not depend on the right hand side of the system
of the equation. Only the coefficients on the left hand side determine if there is exactly one
solution or not.
(b) If we wanted to, we could also calculate the solution x, y in the case k ∈ R \ {−5, 3}. We
could do it by hand or use (1.11). Either way, we find
    x = (1/d) [2k − 3(15/2 − k)] = (5k − 45/2)/(2k² + 4k − 30),        y = (1/d) [6k − 4] = (6k − 4)/(2k² + 4k − 30).
Note that the denominators are equal to d and they are equal to 0 exactly for the “forbidden”
values of k = −5 or k = 3.
4
    4x − 10y = −4/5,        4x − 10y = 3,
    4x − 6y = 4/3,          4x − 6y = 3.
3
1.3 Summary
A linear system is a system of equations

    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
        ...
    am1 x1 + am2 x2 + · · · + amn xn = bm

where x1 , . . . , xn are the unknowns and the numbers aij and bi (i = 1, . . . , m, j = 1, . . . , n) are given. The numbers aij are called the coefficients of the linear system and the numbers b1 , . . . , bm are called the right hand side of the linear system.
In the special case when all bi are equal to 0, the system is called homogeneous; otherwise it is called inhomogeneous.
The coefficient matrix A and the augmented coefficient matrix (A|b) of the system are

        ( a11  a12  ...  a1n )              ( a11  a12  ...  a1n | b1 )
    A = ( a21  a22  ...  a2n )      (A|b) = ( a21  a22  ...  a2n | b2 )
        (  .    .          . )              (  .    .          . |  . )
        ( am1  am2  ...  amn )              ( am1  am2  ...  amn | bm )
• If d = 0, then (1.12) has either no or infinitely many solutions (it depends on b1 and b2 which
case prevails).
Observe that d does not depend on the right hand side of the linear system.
Chapter 2
R2 and R3
In this chapter we will introduce the vector spaces R2 , R3 and Rn . We will define algebraic
operations in them and interpret them geometrically. Then we will add some additional structure
to these spaces, namely an inner product. This allows us to assign a norm (length) to a vector and
talk about the angle between two vectors; in particular, it gives us the concept of orthogonality. In
Section 2.3 we will define orthogonal projections in R2 and we will give a formula for the orthogonal
projection of a vector onto another. This formula is easily generalised to projections onto a vector
in Rn with n ≥ 3. Section 2.5 is dedicated to the special and very important case R3 since it is the
space that physicists use in classical mechanics to describe our world. In the last two sections we
study lines and planes in Rn and in R3 . We will see how we can describe them in formulas and we
will learn how to calculate their intersections. This naturally leads to the question on how to solve
linear systems efficiently which will be addressed in the next chapter.
2.1 Vectors in R2
Recall that the xy-plane is the set of all pairs (x, y) with x, y ∈ R. We will denote it by R2 .
Maybe you already encountered vectors in a physics lecture. For instance velocities and forces are
described by vectors. The velocity of a particle says how fast it is and in which direction the particle
moves. Usually, the velocity is represented by an arrow which points in the direction in which the
particle moves and whose length is proportional to the magnitude of the velocity.
Similarly, a force has strength and a direction so it is represented by an arrow which points in the
direction in which it acts and with length proportional to its strength.
Observe that it is not important where in the space R2 or R3 we put the arrow. As long as it points
in the same direction and has the same length, it is considered the same vector. We call two arrows
equivalent if they have the same direction and the same length. A vector is the set of all arrows
which are equivalent to a given arrow. Each specific arrow in this set is called a representation of
the vector. A special representation is the arrow that starts in the origin (0, 0). Vectors are usually
denoted by a small letter with an arrow on top, for example ~v .
Given two points P, Q in the xy-plane, we write PQ for the vector which is represented by the arrow that starts in P and ends in Q. For example, let P (2, 1) and Q(4, 4) be points in the xy-plane. Then the arrow from P to Q is PQ = (2, 3)t .
We can identify a point P (p1 , p2 ) in the xy-plane with the vector starting in the point (0, 0) and ending in P . We denote this vector by OP or by (p1 , p2 )t in order to save space (the superscript t stands for “transposed”). p1 is called its x-coordinate or x-component and p2 is called its y-coordinate or y-component.

Figure 2.1: The vector PQ and several of its representations. The green arrow is a special representation whose initial point is in the origin.

On the other hand, every vector (a, b)t describes a unique point in the xy-plane, namely the tip of the arrow which represents the given vector and starts in the origin. Clearly its coordinates are (a, b). Therefore we can identify the set of all vectors in R2 with R2 itself.
Observe that the slope of the arrow ~v = (a, b)t is b/a if a ≠ 0. If a = 0, then the vector is parallel to the y-axis.
For example, the vector ~v = (2, 5)t can be represented as an arrow whose initial point is in the origin and its tip is at the point (2, 5). If we put its initial point anywhere else, then we find the tip by moving 2 units to the right (parallel to the x-axis) and 5 units up (parallel to the y-axis).
A very special vector is the zero vector (0, 0)t . It is usually denoted by ~0.
How should we sum two vectors? Again, let us think of forces. Assume we have two forces F~1
and F~2 both acting on the same particle. Then we get the resulting force if we draw the arrow
representing F~1 and attach to its end point the initial point of the arrow representing F~2 . The total
force is then represented by the arrow starting in the initial point of F~1 and ending in the tip of F~2 .
Convince yourself that we obtain the same result if we start with F~2 and put the initial point of
F~1 at the tip of F~2 .
We could also think of the sum of velocities. For example, if a train moves with velocity ~vt and a passenger on the train is moving with relative velocity ~vp , then her total velocity with respect to the ground is the vector sum of the two velocities.
Now assume that ~v = (a, b)t and ~w = (p, q)t . Algebraically, we obtain the components of their sum by summing the components: ~v + ~w = (a + p, b + q)t , see Figure 2.3.
Our discussion of how the product of a vector and a scalar and how the sum of two vectors should be, leads us to the following formal definition.

Definition 2.1. Let ~v = (a, b)t , ~w = (p, q)t ∈ R2 and c ∈ R. Then:

    Vector sum:               ~v + ~w = (a, b)t + (p, q)t = (a + p, b + q)t ,
    Product with a scalar:    c~v = c (a, b)t = (ca, cb)t .
It is easy to see that the vector sum satisfies what one expects from a sum: (~u + ~v ) + ~w = ~u + (~v + ~w ) (associativity) and ~v + ~w = ~w + ~v (commutativity). Moreover, we have the distributivity laws (a + b)~v = a~v + b~v and a(~v + ~w ) = a~v + a~w. Let us verify for example associativity. To this end, let ~u = (u1 , u2 )t , ~v = (v1 , v2 )t , ~w = (w1 , w2 )t . Then

    (~u + ~v ) + ~w = ((u1 + v1 ) + w1 , (u2 + v2 ) + w2 )t = (u1 + (v1 + w1 ), u2 + (v2 + w2 ))t = ~u + (~v + ~w ).
In the same fashion, verify commutativity and distributivity of the vector sum.
Figure 2.4: The picture illustrates the commutativity of the vector sum.
A second picture illustrates the associativity of the vector sum: (~v + ~w ) + ~z = ~v + (~w + ~z ).
We can take these properties and define an abstract vector space. We shall call a set of things, called
vectors, with a “well-behaved” sum of its elements and a “well-behaved” product of its elements
with scalars a vector space. The precise definition is the following.
Note that we will usually write λv instead of λ · v. Then V is called an R-vector space and its
elements are called vectors if the following holds:
(c) Identity element of addition: There exists an element O ∈ V , called the additive identity
such that for every v ∈ V , we have O + v = v + O = v.
(d) Inverse element: For all v ∈ V , we have an inverse element v 0 such that v + v 0 = O.
FT
(f) Compatibility: For every v ∈ V and λ, µ ∈ R, we have that (λµ)v = λ(µv).
These axioms are fundamental for linear algebra and we will come back to them in Chapter 5.1.
RA
Check that R2 is a vector space, that its additive identity is O = ~0 and that for every vector
~v ∈ R2 , its additive inverse is −~v .
It is important to note that there are vector spaces that do not look like R2 and that we cannot
always write vectors as columns. For instance, the set of all polynomials form a vector space (the
sum and scalar multiple of polynomials is again a polynomial, the sum is associative and commutative;
the additive identity is the zero polynomial and for every polynomial p, its additive inverse is the
D
polynomial −p; we can multiply polynomials with scalars and obtain another polynomial, etc.). The
vectors in this case are polynomials and it does not make sense to speak about its “components” or
“coordinates”. (We will however learn how to represent certain subspaces of the space of polynomials
as subspaces of some Rn in Chapter 6.3.)
After this brief excursion about abstract vector spaces, let us return to R2 . We know that it can
be identified with the xy-plane. This means that R2 has more structure than only being a vector
space. For example, we can measure angles and lengths. Observe that these concepts do not appear
in the definition of a vector space. They are something in addition to the vector space properties.
Let us now look at some more geometric properties of vectors in R2 . Clearly a vector is known if we know its length and its angle with the x-axis. From the Pythagoras theorem it is clear that the length of a vector ~v = (a, b)t is √(a² + b²).
−~v
Figure 2.7: The angle of ~v and −~v with the x-axis. Clearly, ϕ0 = ϕ + π.
2
Definition 2.2 (Norm of a vector in R2 ). The length of ~v = (a, b)t ∈ R2 is denoted by k~v k. It is given by

    k~v k = √(a² + b²).
As already mentioned earlier, the slope of the vector ~v = (a, b)t is b/a if a ≠ 0. If ϕ is the angle of the vector ~v with the x-axis, then tan ϕ = b/a if a ≠ 0. If a = 0, then ϕ = π/2 or ϕ = −π/2. Recall that the range of arctan is (−π/2, π/2), so we cannot simply take the arctan of the fraction b/a in order to obtain ϕ. Observe that arctan(b/a) = arctan(−b/−a), but the vectors (a, b)t and (−a, −b)t = −(a, b)t point in opposite directions, so they do not have the same angle with the x-axis. In fact, their angles differ by π, see Figure 2.7. From elementary geometry, we find that tan ϕ = b/a if a ≠ 0 and

    ϕ = arctan(b/a)          if a > 0,
        arctan(b/a) + π      if a < 0,
        π/2                  if a = 0, b > 0,
        −π/2                 if a = 0, b < 0.

Note that this formula gives angles with values in [−π/2, 3π/2).
Remark 2.3. In order to obtain angles with values in (−π, π], we can use the formula

    ϕ = arccos( a/√(a² + b²) )       if b > 0,
        −arccos( a/√(a² + b²) )      if b < 0,
        0                            if a > 0, b = 0,
        π                            if a < 0, b = 0.
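As an illustration (not part of the original notes), the standard library function math.atan2 performs exactly this case distinction and returns the angle in (−π, π], in agreement with the arccos formula of Remark 2.3:

    import math

    def angle_with_x_axis(a, b):
        """Angle of the vector (a, b) with the positive x-axis, in (-pi, pi]."""
        return math.atan2(b, a)

    print(angle_with_x_axis(1, 1))    #  pi/4
    print(angle_with_x_axis(-1, -1))  # -3*pi/4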
(iii) k~v + ~wk ≤ k~v k + k~wk (triangle inequality).

Proof. Let ~v = (a, b)t , ~w = (c, d)t ∈ R2 and λ ∈ R.
(i) Since k~v k = √(a² + b²) it follows that k~v k = 0 if and only if a = 0 and b = 0. This is the case if and only if ~v = ~0.
(ii) kλ~v k = k(λa, λb)t k = √((λa)² + (λb)²) = √(λ²(a² + b²)) = |λ| √(a² + b²) = |λ| k~v k.
(iii) We postpone the proof of the triangle inequality to Corollary 2.20 when we will have the cosine theorem at our disposal.
Figure: It is shorter to go directly from the origin of the blue vector to its tip than taking a detour along ~v and ~w. In other words, k~v + ~wk ≤ k~v k + k~wk.
D
Note that every vector ~v 6= ~0 defines a unit vector pointing in the same direction as itself by k~v k−1~v .
Remark 2.6. (i) The tip of every unit vector lies on the unit circle, and, conversely, every vector
whose initial point is the origin and whose tip lies on the unit circle is a unit vector.
(ii) Every unit vector is of the form (cos ϕ, sin ϕ)t where ϕ is its angle with the positive x-axis.
~v
ϕ
x
1
FT
    ~e1 = (1, 0)t ,        ~e2 = (0, 1)t .
Clearly, ~e1 is parallel to the x-axis, ~e2 is parallel to the y-axis and k~e1 k = k~e2 k = 1.
a
Remark 2.7. Every vector ~v = (a, b)t can be written as

    ~v = (a, b)t = (a, 0)t + (0, b)t = a~e1 + b~e2 .
Remark 2.8. Another notation for ~e1 and ~e2 is î and ĵ.
Before we give properties of the inner product and explore what it is good for, we first calculate a
few examples to familiarise ourselves with it.
Examples 2.10.
(i) h (2, 3)t , (−1, 5)t i = 2 · (−1) + 3 · 5 = −2 + 15 = 13.
(ii) h (2, 3)t , (2, 3)t i = 2² + 3² = 4 + 9 = 13. Observe that this is equal to k (2, 3)t k².
(iii) h (2, 3)t , (1, 0)t i = 2,    h (2, 3)t , (0, 1)t i = 3.
(iv) h (2, 3)t , (−3, 2)t i = 0.
Remark 2.12. Observe that the proposition shows that the inner product is commutative and
distributive, so it has some properties of the “usual product” that we are used to from the product
in R or C, but there are some properties that show that the inner product is not a product.
(a) The inner product takes two vectors and gives back a number, so it gives back an object that is not of the same type as the two things we put in.
(b) In Example 2.10(iv) we saw that it may happen that ~v ≠ ~0 and ~w ≠ ~0 but still h~v , ~wi = 0, which is impossible for a “decent” product.
(c) Given a vector ~v ≠ ~0 and a number c ∈ R, there are many solutions of the equation h~v , ~xi = c for the vector ~x, in stark contrast to the usual product in R or C. Look for instance at Example 2.10(i) and (ii). Therefore it makes no sense to write something like ~v −1 .
(d) There is no such thing as a neutral element for scalar multiplication.
Now let us see why the inner product is useful. In fact, it is related to the angle between two vectors
and it will help us to define orthogonal projections of one vector onto another. Let us start with a
definition.
(b) Two non-zero vectors ~v and ~w are called orthogonal (or perpendicular) if ∠(~v , ~w ) = π/2. In this case we use the notation ~v ⊥ ~w.
~
The following properties should be intuitively clear from geometry. A formal proof of (ii) and (iii)
can be given easily after Corollary 2.20. The proof of (i) will be given after Remark 2.24.
~ be vectors in R2 . Then:
Proposition 2.16. Let ~v , w
RA
(i) If ~v k w ~ 6= ~0, then there exists λ ∈ R such that ~v = λw.
~ and w ~
(ii) If ~v k w
~ and λ, µ ∈ R, then also λ~v k µw.
~
(iii) If ~v ⊥ w
~ and λ, µ ∈ R, then also λ~v ⊥ µw.
~
Remark 2.17. (i) Observe that (i) is wrong if we do not assume that w ~ 6= ~0 because if w
~ = ~0,
then it is parallel to every vector ~v in R2 , but there is no λ ∈ R such that λ~v could ever
become different from ~0.
D
(ii) Observe that the reverse direction in (ii) and (iii) is true only if λ 6= 0 and µ 6= 0.
h~v , wi
~ = k~v kkwk
~ cos ϕ.
Proof.
~ 2 = k~v k2 + kwk
k~v − wk ~ 2 − 2k~v kkwk
~ cos ϕ. (2.2) ϕ
~v
~ 2 = h~v − w
k~v − wk ~ , ~v − wi
~ = h~v , ~v i − h~v , wi
~ − hw
~ , ~v i + hw ~ = h~v , ~v i − 2h~v , wi
~ , wi ~ + hw
~ , wi
~
= k~v k2 − 2h~v , wi ~ 2.
~ + kwk (2.3)
FT
k~v k2 + kwk
~ 2 − 2k~v kkwk
~ cos ϕ = k~v k2 − 2h~v , wi ~ 2,
~ + kwk
A very important consequence of this theorem is that we can now determine if two vectors are
parallel or perpendicular to each other by simply calculating their inner product as can be seen
from the following corollary.
Corollary 2.20. Let ~v , ~w ∈ R2 and ϕ = ∠(~v , ~w ). Then:
(i) ~v k ~w ⇐⇒ k~v k k~wk = |h~v , ~wi|,
(ii) ~v ⊥ ~w ⇐⇒ h~v , ~wi = 0,
(iii) |h~v , ~wi| ≤ k~v k k~wk,
(iv) k~v + ~wk ≤ k~v k + k~wk (triangle inequality).        (2.4)
Proof. The claims are clear if one of the vectors is equal to ~0 since the zero vector is parallel and orthogonal to every vector in R2 . So let us assume now that ~v ≠ ~0 and ~w ≠ ~0.
(i) From Theorem 2.19 we have that |h~v , ~wi| = k~v k k~wk if and only if | cos ϕ| = 1. This is the case if and only if ϕ = 0 or π, that is, if and only if ~v and ~w are parallel.
(ii) From Theorem 2.19 we have that |h~v , ~wi| = 0 if and only if cos ϕ = 0. This is the case if and only if ϕ = π/2, that is, if and only if ~v and ~w are perpendicular.
(iv) Using Theorem 2.19 and cos ϕ ≤ 1, we find

    k~v + ~wk² = k~v k² + k~wk² + 2 k~v k k~wk cos ϕ ≤ k~v k² + k~wk² + 2 k~v k k~wk = (k~v k + k~wk)².

Taking the square root on both sides gives us the desired inequality.
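For illustration (not part of the original notes), the formula of Theorem 2.19 gives a direct way to compute the angle between two vectors; a minimal Python sketch:

    import math

    def inner(v, w):
        return v[0] * w[0] + v[1] * w[1]

    def norm(v):
        return math.sqrt(inner(v, v))

    def angle(v, w):
        """Angle between v and w via <v, w> = ||v|| ||w|| cos(phi)."""
        return math.acos(inner(v, w) / (norm(v) * norm(w)))

    print(angle((2, 3), (-3, 2)))   # pi/2: the vectors of Example 2.10(iv) are orthogonal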
Question 2.1
When does equality hold in the triangle inequality (2.4)? Draw a picture and prove your claim
using the calculations in the proof of (iv).
Exercise. Prove (ii) and (iii) of Proposition 2.16 using Corollary 2.20.

Exercise.
(i) Prove Corollary 2.20 (iii) without the cosine theorem.
Hint. Start with the inequality 0 ≤ k k~wk ~v − k~v k ~w k² and expand the right hand side similarly to (2.3).
Example 2.21. Theorem 2.19 allows us to calculate the angle of a given vector with the x-axis
easily (see Figure 2.13):
    cos ϕx = h~v , ~e1 i / (k~v k k~e1 k),        cos ϕy = h~v , ~e2 i / (k~v k k~e2 k).

If we now use that k~e1 k = k~e2 k = 1 and that h~v , ~e1 i = v1 and h~v , ~e2 i = v2 , then we can simplify the expressions to

    cos ϕx = v1 / k~v k,        cos ϕy = v2 / k~v k.
y
~v ϕy
ϕx
x
2.3 Orthogonal Projections in R2

Let ~v and ~w be vectors in R2 and ~w ≠ ~0. Geometrically, we have an intuition of what the orthogonal projection of ~v onto ~w should be and that we should be able to construct it as described in the following procedure: We move ~v such that its initial point coincides with that of ~w. Then we extend ~w to a line and construct a line that passes through the tip of ~v and is perpendicular to ~w. The vector from the initial point to the intersection of the two lines should then be the orthogonal projection of ~v onto ~w, see Figure 2.14.
~ in R2 .
Figure 2.14: Some examples for the orthogonal projection of ~v onto w
This procedure decomposes the vector ~v into a part parallel to ~w and a part perpendicular to ~w so that their sum gives us back ~v . The parallel part is the orthogonal projection of ~v onto ~w.
In the following theorem we give the precise meaning of the orthogonal projection, we show that a decomposition as described above always exists and we even derive a formula for the orthogonal
In the following theorem we give the precise meaning of the orthogonal projection, we show that
a decomposition as described above always exists and we even derive a formula for orthogonal
projection. A more general version of this theorem is Theorem 7.34.
    ~v∥ = ( h~v , ~wi / k~wk² ) ~w.        (2.6)
Figure 2.15: Examples of decompositions of ~v into ~v = ~v∥ + ~v⊥ with ~v∥ k ~w and ~v⊥ ⊥ ~w.

Proof. Assume we have vectors ~v∥ and ~v⊥ satisfying (2.5). Since ~v∥ k ~w and ~w ≠ ~0, there exists λ ∈ R such that ~v∥ = λ~w, and we only have to determine λ. For this, we notice that ~v = λ~w + ~v⊥ , so taking the inner product with ~w gives h~v , ~wi = λ k~wk², i.e. λ = h~v , ~wi / k~wk². Hence
    ~v∥ = λ~w = ( h~v , ~wi / k~wk² ) ~w        and        ~v⊥ = ~v − ~v∥ = ~v − ( h~v , ~wi / k~wk² ) ~w.
This already proves uniqueness of the vectors ~vk and ~v⊥ . It remains to show that they indeed have
the desired properties. Clearly, by construction ~vk is parallel to w ~ and ~v = ~vk + ~v⊥ since we defined
~v⊥ = ~v − ~vk . It remains to verify that ~v⊥ is orthogonal to w. ~ This follows from
    h~v⊥ , ~wi = h ~v − ( h~v , ~wi / k~wk² ) ~w , ~w i = h~v , ~wi − ( h~v , ~wi / k~wk² ) h~w , ~wi = 0,

where in the last step we used that h~w , ~wi = k~wk².
Notation 2.23. Instead of ~vk we often write projw~ ~v , in particular when we want to emphasise
onto which vector we are projecting.
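Formula (2.6) translates directly into code; the following Python sketch (for illustration, not part of the original notes) computes the parallel and perpendicular parts in R2:

    def proj(v, w):
        """Orthogonal projection of v onto w (w != 0), formula (2.6)."""
        factor = (v[0] * w[0] + v[1] * w[1]) / (w[0] ** 2 + w[1] ** 2)
        return (factor * w[0], factor * w[1])

    v, w = (2, 3), (4, -1)
    v_par = proj(v, w)                                # parallel part
    v_perp = (v[0] - v_par[0], v[1] - v_par[1])       # perpendicular part, v = v_par + v_perp
    print(v_par, v_perp)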
Proof. (i): By our geometric intuition, this should be clear. Let us give a formal proof. Suppose we want to project ~v onto c~w for some c ∈ R \ {0}. Then

    proj_{c~w} ~v = ( h~v , c~wi / kc~wk² ) (c~w) = ( c h~v , ~wi / (c² k~wk²) ) (c~w) = ( h~v , ~wi / k~wk² ) ~w = proj_{~w} ~v .

This shows that only the direction of ~w matters, not its length.
(ii): Again, by geometric considerations, this should be clear. The corresponding calculation is
    proj_{~w} (c~v ) = ( hc~v , ~wi / k~wk² ) ~w = ( c h~v , ~wi / k~wk² ) ~w = c proj_{~w} ~v .

(iii) follows directly from (i) and (ii).
(iv), (v) and (vi) follow from the uniqueness of the decomposition of the vector ~v as a sum of a vector parallel and a vector perpendicular to ~w.
Now the proof of Proposition 2.16 (i) follows easily.

Proof of Proposition 2.16 (i). We have to show that if ~v k ~w and if ~w ≠ ~0, then there exists λ ∈ R such that ~v = λ~w. From Remark 2.24 (iv) it follows that ~v = proj_{~w} ~v = ( h~v , ~wi / k~wk² ) ~w, hence the claim follows if we choose λ = h~v , ~wi / k~wk².
Examples 2.25. Let ~v = (4, −1)t and ~u = (2, 3)t .
(v) proj_{~v} ~u = ( h~v , ~ui / k~v k² ) ~v = ( (8 − 3) / (4² + (−1)²) ) ~v = (5/17) ~v = (5/17) (4, −1)t .
a
Example 2.26 (Angle with coordinate axes). Let ~v = ∈ R2 \ {~0}. Then cos ^(~v ,~e1 ) =
b
a
v ,~e2 ) = k~vb k , hence
v k , cos ^(~
k~
a cos ^(~v ,~e1 ) cos ϕx
~v = = k~v k = k~v k
b cos ^(~v ,~e2 ) cos ϕy
and
projection of ~v onto the x-axis = proj~e1 ~v = k~v k cos ^(~v ,~e1 )~e1 = k~v k cos ϕx ~e1 ,
projection of ~v onto the y-axis = proj~e2 ~v = k~v k cos ^(~v ,~e2 )~e2 = k~v k cos ϕy ~e2 .
FT
Question 2.2
~ be a vector in R2 \ {~0}.
Let w
~ is equal to ~0?
(i) Can you describe geometrically all the vectors ~v whose projection onto w
(ii) Can you describe geometrically all the vectors ~v whose projection onto w
~ have length 2?
(iii) Can you describe geometrically all the vectors ~v whose projection onto w
~ have length 3kwk?
~
You should have understood
2.4 Vectors in Rn
In this section we extend our calculations from R2 to Rn . If n = 3, then we obtain R3 which
usually serves as model for our everyday physical world and which you probably already are familiar
with from physics lectures. We will discuss R3 and some of its peculiarities in more detail in the
Section 2.5.
First, let us define Rn .
Again we can think of vectors as arrows. As in R2 , we can identify every point in Rn with the arrow
that starts in the origin of coordinate system and ends in the given point. The set of all arrows
with the same lengthandthe same direction is called a vector in Rn . So every point P (p1 , . . . , pn )
p1
defines a vector ~v = ... and vice versa. As before, we sometimes denote vectors as (p1 , . . . , pn )t
pn
in order to save (vertical) space. The superscript t stands for “transposed”.
    Rn × Rn → Rn ,    ~v + ~w = (v1 + w1 , . . . , vn + wn )t ,        R × Rn → Rn ,    c~v = (cv1 , . . . , cvn )t .        (2.7)
Exercise. Show that Rn is a vector space. That is, you have to show that the vector space
axioms on page 29 hold.
As in R2 , we can define the norm of a vector, the angle between two vectors and an inner product.
RA
Note that the definition of the angle between two vectors is not different from the one in R2 since
when we are given two vectors, they always lie in a common plane which we can imagine as some
sort of rotated R2 . Let us give now the formal definitions.
v1 w1
.. ..
Definition 2.28 (Inner product; norm of a vector). For vectors ~v = (v1 , . . . , vn )t and ~w = (w1 , . . . , wn )t in Rn the inner product (or scalar product or dot product) is defined as

    h~v , ~wi = v1 w1 + · · · + vn wn .

The length of ~v = (v1 , . . . , vn )t ∈ Rn is denoted by k~v k and it is given by

    k~v k = √(v1² + · · · + vn²).
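For illustration (not part of the original notes), Definition 2.28 in Python for vectors of any length n:

    import math

    def inner(v, w):
        """Inner product in R^n."""
        return sum(vi * wi for vi, wi in zip(v, w))

    def norm(v):
        """Norm (length) of a vector in R^n."""
        return math.sqrt(inner(v, v))

    print(inner((1, 2, 3), (4, 5, 6)))   # 32
    print(norm((3, 4, 12)))              # 13.0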
(a) ~v k ~w ⇐⇒ ∠(~v , ~w ) ∈ {0, π} ⇐⇒ |h~v , ~wi| = k~v k k~wk,
(b) ~v ⊥ ~w ⇐⇒ ∠(~v , ~w ) = π/2 ⇐⇒ h~v , ~wi = 0.
Remark 2.29. In abstract inner product spaces, the inner product is actually used to define
orthogonality.
(iv) Relation of the inner product with the norm: For all vectors ~v ∈ Rn , we have k~v k2 = h~v , ~v i.
~ ∈ Rn and scalars c ∈ R, we have that kc~v k = |c|k~v k
(v) Properties of the norm: For all vectors ~v , w
and k~v + wk
~ ≤ k~v k + kwk.
~
~ 6= ~0 the
~ ∈ Rn with w
(vi) Orthogonal projections of one vector onto another: For all vectors ~v , w
orthogonal projection of ~v onto w
~ is
    proj_{~w} ~v = ( h~v , ~wi / k~wk² ) ~w.        (2.8)
As in R2 , we have n “special vectors” which are parallel to the coordinate axes and have norm 1:
    ~e1 := (1, 0, . . . , 0)t ,    ~e2 := (0, 1, 0, . . . , 0)t ,    . . . ,    ~en := (0, . . . , 0, 1)t .

In the special case n = 3, the vectors ~e1 , ~e2 and ~e3 are sometimes denoted by î, ĵ, k̂.
For a given vector ~v 6= ~0, we can now easily determine its projections onto the n coordinate axes
and its angle with the coordinate axes. By (2.8), the projection onto the xj -axis is
proj~ej ~v = vj~ej .
h~v ,~ej i vj
ϕj = ^(~v ,~ej ) =⇒ cos ϕx = = .
k~v k k~ej k k~v k
cos ϕ1
It follows that ~v = k~v k ... . Sometimes the notation
cos ϕn
cos ϕ1
~v
v̂ := = k~v k ...
k~v k
cos ϕn
is used for the unit vector pointing in the same direction as ~v . Clearly kv̂k = 1 because kv̂k =
kk~v k−1~v k = k~v k−1 k~v k = 1. Therefore v̂ is indeed a unit vector pointing in the same direction as
the original vector ~v .
• that R2 from chapter 2.1 is a special case of Rn from this section,
• etc.
You should now be able to
• perform algebraic operations in the vector space R3 and, in the case n = 3, visualise them
in space,
• calculate lengths and angles,
• calculate unit vectors, scale vectors,
• perform simple abstract proofs (e.g., prove that Rn is a vector space).
• etc.
2.5 Vectors in R3 and the cross product

In R3 we can define an additional operation with vectors, the so-called cross product. Another name for it is vector product. It takes two vectors and gives back another vector. It has several properties which make it look like a product; however, we will see that it is not a product. Here is its definition.
v1 w1
Definition 2.30 (Cross product). Let ~v = (v1 , v2 , v3 )t , ~w = (w1 , w2 , w3 )t ∈ R3 . Their cross product (or vector product or wedge product) is

    ~v × ~w = (v2 w3 − v3 w2 , v3 w1 − v1 w3 , v1 w2 − v2 w1 )t .
A way to remember this formula is as follows. Write the first and the second component of the
vectors underneath them, so that formally you get a column of 5 components. Then make crosses
as in the sketch below, starting with the cross consisting of a line from v2 to w3 and then from w2
to v3 . Each line represents a product of the corresponding components; if the line goes from top
left to bottom right then it is counted positive, if it goes from top right to bottom left then it is
counted negative.
    v1   w1
    v2   w2          v2 w3 − v3 w2
    v3   w3    →     v3 w1 − v1 w3
    v1   w1          v1 w2 − v2 w1
    v2   w2
The cross product is defined only in R3 !
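For illustration (not part of the original notes), the defining formula written as a small Python function; it reproduces the values computed in Examples 2.31 below:

    def cross(v, w):
        """Cross product of two vectors in R^3 (Definition 2.30)."""
        return (v[1] * w[2] - v[2] * w[1],
                v[2] * w[0] - v[0] * w[2],
                v[0] * w[1] - v[1] * w[0])

    print(cross((1, 2, 3), (5, 6, 7)))   # (-4, 8, -4)
    print(cross((5, 6, 7), (1, 2, 3)))   # (4, -8, 4): swapping the factors flips the sign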
Before we collect some easy properties of the cross product, let us calculate a few examples.

Examples 2.31. Let ~u = (1, 2, 3)t and ~v = (5, 6, 7)t .
• ~u × ~v = (2 · 7 − 3 · 6, 3 · 5 − 1 · 7, 1 · 6 − 2 · 5)t = (14 − 18, 15 − 7, 6 − 10)t = (−4, 8, −4)t ,
• ~v × ~u = (6 · 3 − 7 · 2, 7 · 1 − 3 · 5, 5 · 2 − 6 · 1)t = (18 − 14, 7 − 15, 10 − 6)t = (4, −8, 4)t ,
5 1 6·0−7·0 0
D
(i) ~u × ~0 = ~0 × ~u = ~0.
(iii) ~u × (~v + w)
~ = (~u × ~v ) + (~u × w).
~
(vi) h~u , ~v × wi
~ = h~u × ~v , wi.
~
~v ⊥ ~v × ~u, ~u ⊥ ~v × ~u
Proof. The proofs of the formulas (i) – (v) are easy calculations (you should do them!).
FT
u3 v3 w3 u3 v1 w2 − v2 w1
= u1 (v2 w3 − v3 w2 ) + u2 (v3 w1 − v1 w3 ) + u3 (v1 w2 − v2 w1 )
= u1 v2 w3 − u1 v3 w2 + u2 v3 w1 − u2 v1 w3 + u3 v1 w2 − u3 v2 w1
= u2 v3 w1 − u3 v2 w1 + u3 v1 w2 − u1 v3 w2 + u1 v2 w3 − u2 v1 w3
= (u2 v3 − u3 v2 )w1 + (u3 v1 − u1 v3 )w2 + (u1 v2 − u2 v1 )w3
= h~u × ~v , wi.
~
RA
(vii) It follows from (vi) and (v) that
Note that the cross product is distributive but it is neither commutative nor associative.
Remark 2.33. The property (vii) explains why the cross product makes sense only in R3 . Given
two non-parallel vectors ~v and w,
~ their cross product is a vector which is orthogonal to both of
them and whose length is k~v k kwk
~ sin ϕ (see Theorem 2.34; ϕ = ^(~v , w))
~ and this should define the
result uniquely up to a factor ±1. This factor has to do with the relative orientation of ~v and w
~ to
each other. However, if n 6= 3, then one of the following holds:
• If we were in R2 , the problem is that “we do not have enough space” because then the only
vector orthogonal to ~v and w~ at the same time would be the zero vector ~0 and it would not
make too much sense to define a product where the result is always ~0.
• If we were in some Rn with n ≥ 4, the problem is that “we have too many choices”. We will
see later in Chapter 7.3 that the orthogonal complement of the plane generated by ~v and w ~
has dimension n − 2 and every vector in the orthogonal complement is orthogonal to both
~v and w.~ For example, if we take ~v = (1, 0, 0, 0)t and w~ = (0, 1, 0, 0)t , then every vector of
t
the form ~a = (0, 0, x, y) is perpendicular to both ~v and w
~ and it easy to find infinitely many
vectors of this form which in addition have norm k~v k kwk~ sin ϕ = 1 (~a = (0, 0, sin ϑ, ± cos ϑ)t
for arbitrary ϑ ∈ R works).
Recall that for the inner product we proved the formula h~v , wi
~ = k~v k kwk
~ cos ϕ where ϕ is the angle
between the two vectors, see Theorem 2.19. In the next theorem we will prove a similar relation
for the cross product.
k~v × wk
~ = k~v k kwk
~ sin ϕ
FT
~ 2 = k~v k2 kwk
Proof. A long, but straightforward calculation shows that k~v × wk ~ 2 − h~v , wi
~ 2 . Now it
follows from Theorem 2.19 that
~ 2 = k~v k2 kwk
k~v × wk ~ 2 − h~v , wi
~ 2 = k~v k2 kwk
~ 2 − k~v k2 kwk
~ 2 (cos ϕ)2
= k~v k2 kwk
~ 2 (1 − (cos ϕ)2 ) = k~v k2 kwk
~ 2 (sin ϕ)2 .
If we take the square root of both sides, we arrive at the claimed formula. (We do not need to
worry about taking the absolute value of sin ϕ because ϕ ∈ [0, π], hence sin ϕ ≥ 0.)
RA
~ 2 = k~v k2 kwk
Exercise. Show that k~v × wk ~ 2 − h~v , wi
~ 2.
w
~ h
~v
Proposition 2.35 (Area of a parallelogram). The area of the parallelogram spanned by the
vectors ~v and w
~ is
A = k~v × wk.
~ (2.9)
Proof. The area of a parallelogram is the product of the length of its base with the height. We
can take w ~ and ~v . Then we obtain that h = k~v k sin ϕ and
~ as base. Let ϕ be the angle between w
therefore, with the help of Theorem 2.34
A = kwkh
~ = kwkk~
~ v k sin ϕ = k~v × wk.
~
Volume of a parallelepiped
Any three vectors ~u, ~v , ~w in R3 define a parallelepiped.

Figure: The parallelepiped spanned by ~u, ~v , ~w; its height h is the length of the projection of ~u onto a normal vector ~n of the base.

Proposition 2.36 (Volume of a parallelepiped). The volume of the parallelepiped spanned by the vectors ~u, ~v and ~w is

    V = |h~u , ~v × ~wi|.        (2.10)
Proof. The volume of a parallelepiped is the product of the area of its base with the height. Let us take the parallelogram spanned by ~v , ~w as base. If ~v and ~w are parallel or one of them is equal to ~0, then (2.10) is true because V = 0 and ~v × ~w = ~0 in this case.
Now let us assume that they are not parallel. By Proposition 2.35 we already know that its base has area A = k~v × ~wk. The height is the length of the orthogonal projection of ~u onto the normal vector of the plane spanned by ~v and ~w. We already know that ~v × ~w is such a normal vector.
Hence we obtain that
h = k proj~v×~w ~uk = k ( h~u , ~v × ~wi / k~v × ~wk2 ) (~v × ~w) k = ( |h~u , ~v × ~wi| / k~v × ~wk2 ) k~v × ~wk = |h~u , ~v × ~wi| / k~v × ~wk ,
and therefore
V = A h = k~v × ~wk · |h~u , ~v × ~wi| / k~v × ~wk = |h~u , ~v × ~wi| .
Corollary 2.37. Let ~u, ~v , ~w ∈ R3 . Then
|h~u , ~v × ~wi| = |h~v , ~w × ~ui| = |h~w , ~u × ~v i| .
Proof. The formula holds because each of the expressions describes the volume of the parallelepiped spanned by the three given vectors, since we can take any of the faces of the parallelepiped as its base.
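A small numerical check of the volume formula and of Corollary 2.37 (Python with NumPy; the three vectors are arbitrary sample data):

    import numpy as np

    u = np.array([1.0, 0.0, 2.0])
    v = np.array([0.0, 3.0, 1.0])
    w = np.array([2.0, 1.0, 0.0])

    # V = |<u, v x w>| (Proposition 2.36).
    V1 = abs(np.dot(u, np.cross(v, w)))

    # Corollary 2.37: the cyclic variants give the same volume.
    V2 = abs(np.dot(v, np.cross(w, u)))
    V3 = abs(np.dot(w, np.cross(u, v)))
    print(V1, V2, V3)   # three equal numbers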
Lines
Intuitively, it is clear what a line in R3 should be. In order to describe a line in R3 completely, it
is not necessary to know all its points. It is sufficient to know either
(a) two different points P, Q on the line
or
(b) one point P on the line and the direction of the line.
Figure 2.18: Line L given by: two points P, Q on L; or by a point P on L and the direction of L.
Clearly, both descriptions are equivalent because: If we have two different points P, Q on the line L, then its direction is given by the vector P Q. If on the other hand we are given a point P on L and a vector ~v which is parallel to L, then we easily get another point Q on L by OQ = OP + ~v .
Now we want to give formulas for the line.
Given two points P (p1 , p2 , p3 ) and Q(q1 , q2 , q3 ) with P ≠ Q, there is exactly one line L which passes through both points. In formulas, this line is described as
L = { OP + t P Q : t ∈ R } = { (p1 + (q1 − p1 )t , p2 + (q2 − p2 )t , p3 + (q3 − p3 )t)t : t ∈ R } .      (2.11)
If we are given a point P (p1 , p2 , p3 ) on L and a vector ~v = (v1 , v2 , v3 )t ≠ ~0 parallel to L, then
L = { OP + t~v : t ∈ R } = { (p1 + v1 t , p2 + v2 t , p3 + v3 t)t : t ∈ R } .      (2.12)
The formulas are easy to understand. They say: In order to trace the line, we first move to an arbitrary point on the line (this is the term OP ) and then we move an amount t along the line.
With this procedure we can reach every point on the line, and on the other hand, if we do this,
then we are guaranteed to end up on the line.
The formulas (2.11) and (2.12) are called vector equation for the line L. Note that they are the
same if we set v1 = q1 − p1 , v2 = q2 − p2 , v3 = q3 − p3 . We will mostly use the notation with the
v’s since it is shorter. The vector ~v is called directional vector of the line L.
Question 2.3
Is it true that L passes through the origin if and only if OP = ~0?
Remark 2.38. It is important to observe that a given line has many different parametrisations.
• The vector equation that we write down depends on the points we choose on L. Clearly, we
have infinitely many possibilities to do so.
• Any given line L has many directional vectors. Indeed, if ~v is a directional vector for L, then
c~v is so too for every c ∈ R \ {0}. However, all possible directional vectors are parallel.
Exercise. Check that the following formulas all describe the same line:
(i) L1 = { (1, 2, 3)t + t (6, 5, 4)t : t ∈ R } ,   (ii) L2 = { (1, 2, 3)t + t (12, 10, 8)t : t ∈ R } ,
(iii) L3 = { (13, 12, 11)t + t (6, 5, 4)t : t ∈ R } .
Question 2.4
• How can you see easily if two given lines are parallel or perpendicular to each other?
• How would you define the angle between two lines? Do they have to intersect so that an
angle between them can be defined?
From the formula (2.12) it is clear that a point (x, y, z) belongs to L if and only if there exists t ∈ R such that
x = p1 + v1 t ,   y = p2 + v2 t ,   z = p3 + v3 t ,      (2.13)
or, in terms of the two points P and Q,
x = p1 + (q1 − p1 )t ,   y = p2 + (q2 − p2 )t ,   z = p3 + (q3 − p3 )t .      (2.14)
The system of equations (2.13) or (2.14) are called the parametric equations of L. Here, t is the
parameter.
Observe that for (x, y, z) ∈ L, the three equations in (2.13) must hold for the same t. If we assume
that v1 , v2 , v3 6= 0, then we can solve for t and we obtain that
(x − p1 )/v1 = (y − p2 )/v2 = (z − p3 )/v3 .      (2.15)
If the line is given by the two points P and Q, this becomes
(x − p1 )/(q1 − p1 ) = (y − p2 )/(q2 − p2 ) = (z − p3 )/(q3 − p3 ) .      (2.16)
The system of equations (2.15) or (2.16) is called the symmetric equation of L.
If, for instance, v1 = 0 and v2 , v3 ≠ 0, then the line is parallel to the yz-plane and its symmetric equation is
x = p1 ,   (y − p2 )/v2 = (z − p3 )/v3 .
If v1 = v2 = 0 and v3 ≠ 0, then the line is parallel to the z-axis and its symmetric equation is
x = p1 ,   y = p2 ,   z ∈ R.
Representations of lines in Rn .
In Rn , a line through the point P (p1 , . . . , pn ) with directional vector ~v = (v1 , . . . , vn )t ≠ ~0 has the vector form L = { OP + t~v : t ∈ R }, the parametric form x1 = p1 + v1 t, . . . , xn = pn + vn t with t ∈ R, and, assuming that all vj are different from 0, its symmetric form is
(x1 − p1 )/v1 = (x2 − p2 )/v2 = · · · = (xn − pn )/vn .
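A small computational sketch (Python with NumPy; the points P and Q and the function name are ours) that passes between the representations of a line:

    import numpy as np

    # Line through P(1, 2, 3) and Q(7, 7, 7) (sample points).
    P = np.array([1.0, 2.0, 3.0])
    Q = np.array([7.0, 7.0, 7.0])
    v = Q - P                        # a directional vector of the line

    def point_on_line(t):
        """Parametric/vector equation: OP + t*v."""
        return P + t * v

    # The symmetric equation says that (x_i - p_i) / v_i is the same for every i.
    X = point_on_line(2.5)
    print(point_on_line(0.0), point_on_line(1.0))  # recovers P and Q
    print((X - P) / v)                              # three equal values (namely t = 2.5)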
Figure 2.19: Plane E given by: (a) three points P, Q, R on E, (b) a point P on E and two vectors ~v , ~w parallel to E, (c) a point P on E and a vector ~n perpendicular to E.
Question 2.5. Normal form of a line.
In R2 , there is also the normal form of a line:
L : ax + by = d (2.17)
where a, b and d are fixed numbers. This means that L consists of all the points (x, y) whose
coordinates satisfy the equation ax + by = d.
(i) Given a line in the form (2.17), find a vector representation.
(ii) Given a line in vector representation, find a normal form (that is, write it as (2.17)).
(iii) What is the geometric interpretation of a, b? (Hint: Draw the line L and the vector (a, b)t .)
(iv) Can this normal form be extended/generalised to lines in R3 ? If it is possible, how can it
be done? If it is not possible, explain why not.
Planes
In order to know a plane E in R3 completely, it is sufficient to know
(a) three points P, Q, R on the plane that do not lie on a common line,
or
(b) one point P on the plane and two vectors ~v , ~w which are parallel to the plane but not parallel to each other,
or
(c) one point P on the plane and a vector ~n which is perpendicular to the plane.
Figure 2.20: Plane E given with three points P, Q, R on E, two vectors P Q, P R parallel to E, and a vector ~n perpendicular to E. Note that ~n is parallel to P Q × P R.
First, let us see how we can pass from one description to another. Clearly, the descriptions (a) and (b) are equivalent because given three points P, Q, R on E which do not lie on a line, we can form the vectors P Q and P R. These vectors are then parallel to the plane E but are not parallel to each other. (Of course, we also could have taken QR and QP or RP and RQ.) If, on the other hand, we have one point P on E and two vectors ~v and ~w parallel to E with ~v ∦ ~w, then we can easily get two other points on E, for instance by OQ = OP + ~v and OR = OP + ~w. Then the three points P, Q, R lie on E and do not lie on a line.
Vector equation of a plane
Given a point P on E and two vectors ~v , ~w parallel to E with ~v ∦ ~w, the plane can be written as
E = { OP + s~v + t~w : s, t ∈ R } .
As in the case of the vector equation of a line, it is easy to understand this formula. We first move to an arbitrary point on the plane (this is the term OP ) and then we move parallel to the plane as we please (this is the term s~v + t~w). With this procedure we can reach every point on the plane, and on the other hand, if we do this, then we are guaranteed to end up on the plane.
Question 2.6
Is it true that E passes through the origin if and only if OP = ~0?
Now we want to use the normal vector of the plane to describe it. Assume that we are given a
point P on E and a vector ~n perpendicular to the plane. This means that every vector which is
parallel to the plane E must be perpendicular to ~n. If we take an arbitrary point Q(x, y, z) ∈ R3 ,
then Q ∈ E if and only if P Q is parallel to E, that means that P Q is orthogonal to ~n. Recall that two vectors are perpendicular if and only if their inner product is 0, so Q ∈ E if and only if
0 = h~n , P Qi = h(n1 , n2 , n3 )t , (x − p1 , y − p2 , z − p3 )t i = n1 (x − p1 ) + n2 (y − p2 ) + n3 (z − p3 )
  = n1 x + n2 y + n3 z − (n1 p1 + n2 p2 + n3 p3 ) .
If we set d = n1 p1 + n2 p2 + n3 p3 , then it follows that a point Q(x, y, z) belongs to E if and only if its coordinates satisfy
n1 x + n2 y + n3 z = d. (2.18)
Equation (2.18) is called the normal form for the plane E and ~n is called a normal vector of E.
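The computation of the normal form can be sketched numerically as follows (Python with NumPy; the point P and the normal vector ~n are arbitrary sample data):

    import numpy as np

    # Plane through P with normal vector n.
    P = np.array([1.0, 0.0, 2.0])
    n = np.array([1.0, 2.0, 3.0])

    d = np.dot(n, P)      # d = n1*p1 + n2*p2 + n3*p3
    print("normal form: %gx + %gy + %gz = %g" % (n[0], n[1], n[2], d))

    # A point Q lies on the plane iff <n, Q> = d.
    Q = P + np.cross(n, np.array([1.0, 0.0, 0.0]))   # P plus a vector parallel to the plane
    print(np.isclose(np.dot(n, Q), d))               # True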
Exercise. Show that E passes through the origin if and only if d = 0.
Remark 2.40. As before, note that the normal equation for a plane is not unique. For instance,
x + 2y + 3z = 5 and 2x + 4y + 6z = 10
describe the same plane. The reason is that “the” normal vector of a plane is not unique. If ~n is
normal vector of the plane E, then every c~n with c ∈ R \ {0} is also a normal vector to the plane.
Definition 2.41. The angle between two planes is the angle between their normal vectors.
Note that this definition is consistent with the fact that two planes are parallel if and only if their
normal vectors are parallel.
Remark 2.42. • Assume a plane is given as in (b) (that is, we know a point P on E and two vectors ~v and ~w parallel to E but with ~v ∦ ~w). In order to find a description as in (c) (that is, one point on E and a normal vector), we only have to find a vector ~n that is perpendicular to both ~v and ~w. Proposition 2.32 (vii) tells us how to do this: we only need to calculate ~v × ~w.
Another way to find an appropriate ~n is to find a solution of the linear 2 × 3 system given by {h~v , ~ni = 0, h~w , ~ni = 0}.
• Assume a plane is given as in (c) (that is, we know a point P on E and a normal vector). In order to find vectors ~v and ~w as in (b), we can proceed in many ways:
– Find two solutions of h~x , ~ni = 0 which are not parallel.
– Find two points Q, R on the plane such that P Q ∦ P R. Then we can take ~v = P Q and ~w = P R.
– Find one solution ~v ≠ ~0 of h~n , ~v i = 0, which is usually easy to guess, and then calculate ~w = ~v × ~n. The vector ~w is perpendicular to ~n and therefore it is parallel to the plane. It is also perpendicular to ~v and therefore it is not parallel to ~v . In total, this vector ~w does what we need.
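The conversions described in this remark can be sketched numerically (Python with NumPy; all vectors are arbitrary sample data and the guessed vector v2 is just one possible choice):

    import numpy as np

    # From (b) to (c): two spanning vectors -> a normal vector.
    v = np.array([1.0, 0.0, 2.0])
    w = np.array([0.0, 1.0, -1.0])
    n = np.cross(v, w)                       # perpendicular to v and w
    print(np.dot(n, v), np.dot(n, w))        # 0.0 0.0

    # From (c) to (b): a normal vector -> two spanning vectors,
    # following the third recipe above (guess v2 with <n2, v2> = 0, then w2 = v2 x n2).
    n2 = np.array([1.0, 2.0, 3.0])
    v2 = np.array([2.0, -1.0, 0.0])          # guessed so that <n2, v2> = 0
    w2 = np.cross(v2, n2)
    print(np.dot(n2, v2), np.dot(n2, w2))    # both 0: v2 and w2 are parallel to the plane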
Representations of planes in Rn .
In Rn , the vector form of a plane is
E = { OP + t~v + s~w : s, t ∈ R }
where P is a point on E and ~v , ~w are vectors parallel to E with ~v ∦ ~w.
• how they can be described in formulas,
• etc.
You should now be able to
• pass easily between the different descriptions of lines and planes,
• etc.
2.7 Intersections of lines and planes in R3
Intersection of lines
Given two lines L1 and L2 in R3 , there are three possibilities:
(a) The lines intersect in exactly one point. In this case, they cannot be parallel.
(b) The lines intersect in infinitely many points. In this case, the lines have to be equal. In particular they have to be parallel.
(c) The lines do not intersect. Note that in contrast to the case in R2 , the lines do not have to be parallel for this to happen. For example, the line L : x = y = 1 is a line parallel to the z-axis passing through (1, 1, 0), and G : x = z = 0 is a line parallel to the y-axis passing through (0, 0, 0). The lines do not intersect and they are not parallel.
Example 2.43. Consider the lines
L1 = { (0, 0, 1)t + s (1, 2, 3)t : s ∈ R } ,   L2 = { (2, 4, 7)t + t (2, 4, 6)t : t ∈ R } ,   L3 = { (−1, 0, 0)t + t (1, 1, 2)t : t ∈ R } .
Then L1 ∩ L2 = L1 .
Proof. A point Q(x, y, z) belongs to L1 ∩ L2 if and only if it belongs both to L1 and L2 . This means
# –
that there must exist an s ∈ R such that OQ = p~1 + s~v1 and there must exist a t ∈ R such that
# –
OQ = p~2 + t~v2 . Note that s and t are different parameters. So we are looking for s and t such that
~p1 + s~v1 = ~p2 + t~v2 ,   that is   (0, 0, 1)t + s (1, 2, 3)t = (2, 4, 7)t + t (2, 4, 6)t .      (2.19)
Once we have solved (2.19) for s and t, we insert them into the equations for L1 and L2 respectively,
in order to obtain Q. Note that (2.19) in reality is a system of three equations: one equation for
each component of the vector equation. Writing it out and solving each equation for s, we obtain
0 + s = 2 + 2t s = 2 + 2t
0 + 2s = 4 + 4t ⇐⇒ s = 2 + 2t
1 + 3s = 7 + 6t s = 2 + 2t
This means that there are infinitely many solutions of (2.19). Given any point R on L1 , there is a corresponding s ∈ R such that OR = ~p1 + s~v1 . Now if we choose t = (s − 2)/2, then OR = ~p2 + t~v2 holds, hence R ∈ L2 too. If on the other hand we have a point R′ ∈ L2 , then there is a corresponding t ∈ R such that OR′ = ~p2 + t~v2 . Now if we choose s = 2 + 2t, then OR′ = ~p1 + s~v1 holds, hence R′ ∈ L1 too. In summary, we showed that L1 = L2 .
Remark 2.44. We could also have seen that the directional vectors of L1 and L2 are parallel. In
fact, ~v2 = 2~v1 . It then suffices to show that L1 and L2 have at least one point in common in order
to conclude that the lines are equal.
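The next intersection can also be computed numerically. The following Python (NumPy) sketch uses the parametrisations of L1 and L3 as read off from the example above; it solves the 3 × 2 system for s and t by a least-squares call and then checks whether the solution is exact:

    import numpy as np

    # L1: p1 + s*v1,  L3: p3 + t*v3.
    p1, v1 = np.array([0.0, 0.0, 1.0]), np.array([1.0, 2.0, 3.0])
    p3, v3 = np.array([-1.0, 0.0, 0.0]), np.array([1.0, 1.0, 2.0])

    # p1 + s*v1 = p3 + t*v3 is a 3x2 linear system for (s, t).
    A = np.column_stack([v1, -v3])
    b = p3 - p1
    (s, t), residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)

    # The lines intersect iff the least-squares solution solves the system exactly.
    if np.allclose(A @ np.array([s, t]), b):
        print("intersection point:", p1 + s * v1)   # (1, 2, 4)
    else:
        print("the lines do not intersect")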
Next we determine L1 ∩ L3 . We claim that L1 ∩ L3 = {(1, 2, 4)}. As before, we look for s and t such that ~p1 + s~v1 = ~p3 + t~v3 , which written out gives
1  0 + s = −1 + t        1  s − t = −1
2  0 + 2s = 0 + t   ⇐⇒  2  2s − t = 0
3  1 + 3s = 0 + 2t       3  3s − 2t = −1
Subtracting 1 from 2 gives s = 1, and then t = 2; these values also satisfy 3 . Inserting s = 1 into the equation for L1 gives the intersection point Q(1, 2, 4).
In order to check if this result is correct, we can put t = 2 in the equation for L3 . The result must
be the same. The corresponding calculation is:
OQ = ~p3 + 2 · ~v3 = (−1, 0, 0)t + (2, 2, 4)t = (1, 2, 4)t ,
1 s=3+ t 1 s− t=3
2 2s = t ⇐⇒ 2 2s − t = 0
3 1 + 3s = 5 + 2t 3 3s − 2t = 5
Intersection of planes
Given two planes E1 and E2 in R3 , there are two possibilities:
(a) The planes intersect. In this case, they necessarily intersect in infinitely many points. Their
intersection is either a line (if E1 and E2 are not parallel) or a plane (if E1 = E2 ).
(b) The planes do not intersect. In this case, the planes must be parallel and not equal.
Consider the planes
E1 : x + y + 2z = 3,   E2 : 2x + 2y + 4z = −4,   E3 : 2x + 2y + 4z = 6,   E4 : x + y − 2z = 5.
We claim that E1 ∩ E2 = ∅.
Proof. The set of all points Q(x, y, z) which belong both to E1 and E2 is the set of all x, y, z which
simultaneously satisfy
1 x + y + 2z = 3,
2 2x + 2y + 4z = −4.
Now clearly, if x, y, z satisfies 1 , then it cannot satisfy 2 (the right side would be 6). We can
see this more formally if we solve 1 , e.g., for x and then insert into 2 . We obtain from 1 :
x = 3 − y − 2z. Inserting into 2 leads to
−4 = 2(3 − y − 2z) + 2y + 4z = 6,
which is absurd.
This result was to be expected since the normal vectors of the planes are ~n1 = (1, 1, 2)t and ~n2 = (2, 2, 4)t respectively. Since they are parallel, the planes are parallel and therefore they either are equal or they have empty intersection. Now we see that for instance (3, 0, 0) ∈ E1 but (3, 0, 0) ∉ E2 , so the planes cannot be equal. Therefore they have empty intersection.
E1 ∩ E3 = E1
Proof. The set of all points Q(x, y, z) which belong both to E1 and E3 is the set of all x, y, z which
simultaneously satisfy
1 x + y + 2z = 3,
2 2x + 2y + 4z = 6.
Clearly, both equations are equivalent: if x, y, z satisfies 1 , then it also satisfies 2 and vice versa.
Therefore, E1 = E3 .
Figure 2.21: The left figure shows E1 ∩ E2 = ∅, the right figure shows E1 ∩ E4 which is a line.
E1 ∩ E4 = { (4, 0, −1/2)t + t (−1, 1, 0)t : t ∈ R } .
Proof. First, we notice that the normal vectors ~n1 = (1, 1, 2)t and ~n4 = (1, 1, −2)t are not parallel, so we expect that the solution is a line in R3 .
The set of all points Q(x, y, z) which belong both to E1 and E4 is the set of all x, y, z which
simultaneously satisfy
1 x + y + 2z = 3,
2 x + y − 2z = 5.
Subtracting 2 from 1 gives 4z = −2, hence
z = −1/2 ,   x = 4 − y   with y ∈ R arbitrary,
in other words,
(x, y, z)t = (4 − y, y, −1/2)t = (4, 0, −1/2)t + y (−1, 1, 0)t   with y ∈ R arbitrary.
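The intersection E1 ∩ E4 can also be found numerically. The following Python (NumPy) sketch uses the normal vectors and right hand sides from above; the direction of the intersection line is ~n1 × ~n4 and a particular point is obtained from a least-squares solve:

    import numpy as np

    # E1: x + y + 2z = 3  and  E4: x + y - 2z = 5.
    n1, d1 = np.array([1.0, 1.0, 2.0]), 3.0
    n4, d4 = np.array([1.0, 1.0, -2.0]), 5.0

    # The direction of the intersection line is perpendicular to both normals.
    direction = np.cross(n1, n4)

    # One particular point on both planes (min-norm solution of the 2x3 system).
    A = np.vstack([n1, n4])
    b = np.array([d1, d4])
    p = np.linalg.lstsq(A, b, rcond=None)[0]

    print(direction)                  # a multiple of (-1, 1, 0)
    print(A @ p)                      # [3. 5.] -> p lies on both planes
    print(A @ (p + 2.0 * direction))  # still [3. 5.] -> the whole line lies in both planes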
Proof. The set of all points Q(x, y, z) which belong both to E1 and L2 is the set of all x, y, z which
simultaneously satisfy
Replacing the expression with t from L2 into the equation of the plane E1 , we obtain the following
equation for t:
of 3 equations for 7 unknowns. If we solve them as we did here, the process could become quite
messy. So the next chapter is devoted to find a systematic and efficient way to solve a system of m
linear equations for n unknowns.
2.8 Summary
The vector space Rn is given by
Rn = { (x1 , . . . , xn )t : x1 , . . . , xn ∈ R } .
For points P (p1 , . . . , pn ), Q(q1 , . . . , qn ), the vector whose initial point is P and final point is Q is
P Q = (q1 − p1 , . . . , qn − pn )t   and   OQ = (q1 , . . . , qn )t ,   where O denotes the origin.
On Rn , the sum and the product with scalars are defined by
~v + ~w = (v1 + w1 , . . . , vn + wn )t   and   c~v = (cv1 , . . . , cvn )t .
The norm of a vector is
k~vk = √(v1^2 + · · · + vn^2) .
If ~v = P Q, then k~v k = kP Qk = distance between P and Q.
For vectors ~v and ~w ∈ Rn their inner product is a real number defined by
h~v , ~wi = v1 w1 + · · · + vn wn .
• h~v , ~wi = h~w , ~v i,   h~v , c~wi = ch~v , ~wi,   h~v , ~w + ~ui = h~v , ~wi + h~v , ~ui,
• h~v , ~wi = k~vk k~wk cos ϕ,
• k~v + ~wk ≤ k~vk + k~wk   (triangle inequality),
• ~v ⊥ ~w ⇐⇒ h~v , ~wi = 0,
• h~v , ~v i = k~vk^2 .
The cross product is defined only in R3 . It is a vector defined by
~v × ~w = (v1 , v2 , v3 )t × (w1 , w2 , w3 )t = (v2 w3 − v3 w2 , v3 w1 − v1 w3 , v1 w2 − v2 w1 )t .
Applications
• Area of a parallelogram spanned by ~v , ~w ∈ R3 :   A = k~v × ~wk.
• Volume of a parallelepiped spanned by ~u, ~v , ~w ∈ R3 :   V = |h~u , ~v × ~wi|.
Representations of lines
• Vector equation L = { OP + t~v : t ∈ R }.
  P is a point on the line, ~v is called directional vector of L.
• Parametric equation x1 = p1 + tv1 , . . . , xn = pn + tvn , t ∈ R.
  Then P (p1 , . . . , pn ) is a point on L and ~v = (v1 , . . . , vn )t is a directional vector of L.
• Symmetric equation (x1 − p1 )/v1 = (x2 − p2 )/v2 = · · · = (xn − pn )/vn .
  Then P (p1 , . . . , pn ) is a point on L and ~v = (v1 , . . . , vn )t is a directional vector of L.
If one or several of the vj are equal to 0, then the formula above has to be modified.
Representations of planes
• Vector equation E = { OP + t~v + s~w : s, t ∈ R }.
  P is a point on the plane, ~v and ~w are vectors parallel to E with ~v ∦ ~w.
The parametrisations are not unique!! (One and the same line (or plane) has many different
parametrisations.)
• The angle between two lines is the angle between their directional vectors.
• Two lines are parallel if and only if their directional vectors are parallel.
Two lines are perpendicular if and only if their directional vectors are perpendicular.
• The angle between two planes is the angle between their normal vectors.
• Two planes are parallel if and only if their normal vectors are parallel.
Two planes are perpendicular if and only if their normal vectors are perpendicular.
• A line is parallel to a plane if and only if its directional vector is perpendicular to the normal vector of the plane.
  A line is perpendicular to a plane if and only if its directional vector is parallel to the normal vector of the plane.
2.9 Exercises
1. Let P (2, 3), Q(−1, 4) be points in R2 and let ~v = (3, −2)t be a vector in R2 .
(a) Calculate P Q.
(b) Calculate kP Qk.
(c) Calculate P Q + ~v .
(d) Find all vectors which are orthogonal to ~v .
2. Let ~v = (2, 5)t ∈ R2 .
(c) Find all vectors which have the same direction as ~v and twice the length of ~v .
(d) Find all vectors with norm 2 which are orthogonal to ~v .
Find at least one more vector equation and one more symmetric equation. Find at least two
different parametric equations.
4. For the following vectors ~u and ~v decide whether they are orthogonal, parallel or neither. Calculate the cosine of the angle between them. If they are parallel, find real numbers λ and µ such that ~v = λ~u and ~u = µ~v .
(a) ~v = (1, 4)t , ~u = (5, −2)t ,   (b) ~v = (2, 4)t , ~u = (1, 2)t ,
(c) ~v = (3, 4)t , ~u = (−8, 6)t ,   (d) ~v = (−6, 4)t , ~u = (3, −2)t .
9. (a) Calculate the area of the parallelogram whose adjacent vertices are A(1, 2, 3), B(2, 3, 4), C(−1, 2, −5) and calculate the fourth vertex.
(b) Calculate the area of the triangle with vertices A(1, 2, 3), B(2, 3, 4), C(−1, 2, −5).
(c) Calculate the volume of the parallelepiped determined by the vectors ~u = (5, 2, 1)t , ~v = (−1, 4, 3)t , ~w = (1, −2, 7)t .
10. (a) Show that there is no neutral element for the cross product in R3 . That is: show that there is no vector ~v ∈ R3 such that ~v × ~w = ~w for all ~w ∈ R3 .
(b) Let ~w = (1, 2, 3)t ∈ R3 .
(i) Find all vectors ~a, ~b ∈ R3 such that ~a × ~w = (2, 1, 0)t and ~b × ~w = (2, −1, 0)t .
(ii) Find all vectors ~v ∈ R3 such that h~v , ~wi = 4.
11. For the following lines L1 , L2 and the given point P decide or find:
• whether L1 and L2 are parallel,
• whether L1 and L2 have a point of intersection,
• whether P belongs to L1 and/or to L2 ,
• a line parallel to L2 which passes through P .
(a) L1 : ~r(t) = (3, 4, 5)t + t (1, −1, 3)t ,   L2 : (x − 3)/2 = (y − 2)/3 = (z − 1)/4 ,   P (5, 2, 11).
(b) L1 : ~r(t) = (2, 1, −7)t + t (1, 2, 3)t ,   L2 : x = t + 1, y = 3t − 4, z = −t + 2,   P (5, 7, 2).
(d) Find a point in E and two vectors ~a and ~b in E with ~a ⊥ ~b.
13. For the points P (1, 1, 1), Q(1, 0, −1) and the following planes E:
(a) Find E ∩ L.
(b) Find a line G which intersects neither the plane E nor the line L. Prove your claim. How many lines with this property are there?
16. Let E be a plane in R3 and let ~a, ~b be vectors parallel to E. Show that for all λ, µ ∈ R, the vector λ~a + µ~b is parallel to the plane.
18. For each of the following sets decide whether it is a vector space with the usual sum and product.
(a) V = { (a, a)t : a ∈ R },
(b) V = { (a, a^2)t : a ∈ R },
(c) V is the set of all continuous functions R → R.
(d) V is the set of all continuous functions f : R → R with f (4) = 0.
(e) V is the set of all continuous functions f : R → R with f (4) = 1.
Chapter 3
We will rewrite linear systems as matrix equations in order to solve them systematically and effi-
ciently. We will interpret matrices as linear maps from Rn to Rm which then allows us to define
algebraic operations with matrices, specifically we will define the sum and the composition (=mul-
tiplication) of matrices which then leads naturally to the concept of the inverse of a matrix. We
can interpret a matrix as a system which takes some input (the variables x1 , . . . , xn ) and gives us
back as output b1 , . . . , bm via A~x = ~b. Sometimes we are given the input and we want to find
the bj ; and sometimes we are given the output b1 , . . . , bm and we want to find the input x1 , . . . , xn
which produces the desired output. The latter question is usually the harder one. We will see that
a unique input for any given output exists if and only if the matrix is invertible. We can refine
the concept of invertibility of a matrix. We say that A has a left inverse if for any ~b the equation
A~x = ~b has at most one solution, and we say that it has a right inverse if A~x = ~b has at least one
solution for any ~b.
We will discuss in detail the Gauß and Gauß-Jordan elimination which helps us to find solutions
of a given linear system and the inverse of a matrix if it exists. In Section 3.7 we define the trans-
position of matrices and we have a first look at symmetric matrices. They will become important
in Chapter 8. We will also see the interplay of transposing a matrix and the inner product. In the
last section of this chapter we define the so-called elementary matrices which can be seen as the
building blocks of invertible matrices. We will use them in Chapter 4 to prove important properties
of the determinant.
Recall that the system is called consistent if it has at least one solution; otherwise it is called
inconsistent. According to (1.4) and (1.5) its associated coefficient matrix and augmented coefficient
matrices are
A = ( a11 a12 . . . a1n ; a21 a22 . . . a2n ; . . . ; am1 am2 . . . amn )      (3.2)
and
(A|b) = ( a11 a12 . . . a1n | b1 ; a21 a22 . . . a2n | b2 ; . . . ; am1 am2 . . . amn | bm ) .      (3.3)
Definition 3.1. The set of all matrices with m rows and n columns is denoted by M (m × n). If we
want to emphasise that the matrix has only real entries, then we write M (m × n, R) or MR (m × n).
Another frequently used notation is Mm×n . A matrix A is called a square matrix if its number of
rows is equal to its number of columns.
In order to solve (3.1), we could use the first equation, solve for x1 and insert this in all the other
equations. This gives us a new system with m − 1 equations for n − 1 unknowns. Then we solve
the next equation for x2 , insert it in the other equations, and we continue like this until we have
only one equation left. This of course will fail if for example a11 = 0 because in this case we cannot
solve the first equation for x1 . We could save our algorithm by saying: we solve the first equation
for the first unknown whose coefficient is different from 0 (or we could take an equation where the
coefficient of x1 is different from 0 and declare this one to be our first equation. After all, we can
order the equations as we please). Even with this modification, the process of solving and replacing
is error prone.
Another idea is to manipulate the equations. The question is: Which changes to the equations
are allowed without changing the information contained in the system? We don’t want to destroy
information (thus potentially allowing for more solutions) nor introduce more information (thus
potentially eliminating solutions). Or, in more mathematical terms, what changes to the given
system of equations result in an equivalent system? Here we call two systems equivalent if they
have the same set of solutions.
We can check whether the new system is equivalent to the original one by checking whether there is a way to restore the original one.
For example, if we exchange the first and the second row, then nothing really happened and we end
up with an equivalent system. We can come back to the original equation by simply exchanging
again the first and the second row.
If we multiply both sides of the first equation by some factor, let's say by 2, then again nothing changes. Assume for example that the first equation is x + 3y = 7. If we multiply both sides by 2, we obtain 2x + 6y = 14. Clearly, if a pair (x, y) satisfies the first equation, then it satisfies also the second one and vice versa. Given the new equation 2x + 6y = 14, we can easily restore the old one by simply dividing both sides by 2.
If we take an equation and multiply both of its sides by 0, then we destroy information because we
end up with 0 = 0 and there is no way to get back the information that was stored in the original
equation. So this is not an allowed operation.
Show that squaring both sides of an equation in general does not give an equivalent equation.
Are there cases, when it does?
Squaring an equation or taking the logarithm on both sides or other such things usually are not
interesting to us because the resulting equation will no longer be a linear equation.
Let us denote the jth row of our linear system (3.1) by Rj . The following table contains the so-called elementary row operations. They are the “allowed” operations because they do not alter the information contained in a given linear system since they are reversible.
The first column describes the operation in words, the second introduces its shorthand notation and in the last column we give the inverse operation which allows us to get back to the original system.
Elementary operation                                        | Notation        | Inverse operation
1  Swap rows j and k.                                       | Rj ↔ Rk         | Rj ↔ Rk
2  Multiply row j by some λ ≠ 0.                            | Rj → λRj        | Rj → (1/λ)Rj
3  Replace row k by the sum of row k and λ times Rj         | Rk → Rk + λRj   | Rk → Rk − λRj
   and leave row j unchanged.
Exercise. Show that the operation in the third column reverses the operation from the second
column.
Exercise. Show that in reality 1 is not necessary since it can be achieved by a composition of
operations of the form 2 and 3 (or 2 and 3’ ). Show how this can be done.
Example 3.2.
{ x1 + x2 − x3 = 1 ;  2x1 + 3x2 + x3 = 3 ;  4x2 + x3 = 7 }
−(R2 → R2 − 2R1)→ { x1 + x2 − x3 = 1 ;  x2 + 3x3 = 1 ;  4x2 + x3 = 7 }
−(R3 → R3 − 4R2)→ { x1 + x2 − x3 = 1 ;  x2 + 3x3 = 1 ;  −11x3 = 3 }
−(R3 → −(1/11)R3)→ { x1 + x2 − x3 = 1 ;  x2 + 3x3 = 1 ;  x3 = −3/11 }.
Here we can stop because it is already quite easy to read off the solution. Proceeding from the bottom to the top, we obtain
x3 = −3/11 ,   x2 = 1 − 3x3 = 20/11 ,   x1 = 1 − x2 + x3 = −12/11 .
Note that we could continue our row manipulations to clean up the system even more:
· · · −→ { x1 + x2 − x3 = 1 ;  x2 + 3x3 = 1 ;  −11x3 = 3 }
−(R3 → −(1/11)R3)→ { x1 + x2 − x3 = 1 ;  x2 + 3x3 = 1 ;  x3 = −3/11 }
−(R2 → R2 − 3R3)→ { x1 + x2 − x3 = 1 ;  x2 = 20/11 ;  x3 = −3/11 }
−(R1 → R1 + R3)→ { x1 + x2 = 8/11 ;  x2 = 20/11 ;  x3 = −3/11 }
−(R1 → R1 − R2)→ { x1 = −12/11 ;  x2 = 20/11 ;  x3 = −3/11 }.
Our strategy was to apply manipulations that successively eliminate the unknowns in the lower
equations and we aimed to get to a form of the system of equations where the last one contains the
least number of unknowns possible.
Convince yourself that the first step of our reduction process is equivalent to solving the first equation for x1 and inserting it in the other equations in order to eliminate it there. The next step in the reduction is equivalent to solving the new second equation for x2 and inserting it into the third equation.
It is important to note that there are infinitely many different routes leading to the final result,
but usually some are quicker than others.
Let us analyse what we did. We looked at the coefficients of the system and we applied trans-
formations such that they become 0 because this results in removing the corresponding unknowns
from the equations. So in the example above we could just as well delete all the xj , keep only the
augmented coefficient matrix and perform the line operations in the matrix. Of course, we have
to remember that the numbers in the first columns are the coefficients of x1 , those in the second
column are the coefficients of x2 , etc. Then our calculations are translated into the following:
( 1 1 −1 | 1 ; 2 3 1 | 3 ; 0 4 1 | 7 ) −(R2 → R2 − 2R1)→ ( 1 1 −1 | 1 ; 0 1 3 | 1 ; 0 4 1 | 7 )
−(R3 → R3 − 4R2)→ ( 1 1 −1 | 1 ; 0 1 3 | 1 ; 0 0 −11 | 3 ) −(R3 → −(1/11)R3)→ ( 1 1 −1 | 1 ; 0 1 3 | 1 ; 0 0 1 | −3/11 ) .
If we translate the last matrix back into a linear system, we obtain
x1 + x2 − x3 = 1 ,   x2 + 3x3 = 1 ,   x3 = −3/11 .
want them to be the last unknowns. So as last row we want one that has only zeros in it
or one that starts with zeros, until finally we get a non-zero number say in column k. This
non-zero number can always be made equal to 1 by dividing the row by it. Now we know
how the unknowns xk , . . . , xn are related. Note that all the other unknowns x1 , . . . , xk−1 have
disappeared from the equation since their coefficients are 0.
If k = n as in our example above, then there is only one solution for xn .
• The second non-zero row from the bottom should also start with zeros until we get to a column,
say column l, with non-zero entry which we always can make equal to 1. This column should
be to the left of the column k (that is we want l < k). Because now we can use what we
know from the last row about the unknowns xk , . . . , xn to say something about the unknowns
xl , . . . , xk−1 .
• We continue like this until all rows are as we want them.
Note that the form of such a “nice” matrix looks a bit like it had a triangle consisting of only zeros
in its lower left part. There may be zeros in the upper right part. If a matrix has the form we just
described, we say it is in row echelon form. Let us give a precise definition.
Definition 3.3 (Row echelon form). We say that a matrix A ∈ M (m × n) is in row echelon form if:
(i) All rows which consist only of zeros are at the bottom of the matrix.
(ii) In every non-zero row, the first non-zero entry is equal to 1; it is called the pivot of that row.
(iii) The pivot of every non-zero row is strictly to the right of the pivot of the row above it.
Definition 3.4 (Reduced row echelon form). We say that a matrix A ∈ M (m × n) is in reduced row echelon form if:
(i) A is in row echelon form.
(ii) Every pivot is the only non-zero entry in its column.
Examples 3.5.
(a) The following matrices are in reduced row echelon form. The pivots are highlighted.
1 1 0 0
1 0 0 0
! !
1 6 0 0 , 0 0 1 0
,
1 6 0 1 , 1 0 0 0 , 0 1 0 0 .
0 0 1 0 0 0 0 1
0 0 1 1 0 0 1 1 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 1
0 0 0 0
(b) The following matrices are in row echelon form but not in reduced row echelon form. The
pivots are highlighted.
1 6 3 1 ! ! 1 0 5 0
1 6 3 1 0 0 1 4
1 6 1 0 , 1 0 2 0 , 0 1 0 0
0 0 1 1 , ,
.
0 0 0 1
0 0 1 1 0 0 1 1 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 1
0 0 0 0
Exercise. • Say why the matrices in (b) are not in reduced row echelon form and use ele-
mentary row operations to transform them into a matrix in reduced row echelon form.
• Say why the matrices in (c) are not in row echelon form and use elementary row operations
to transform them into a matrix in row echelon form. Transform them further to obtain a matrix in reduced row echelon form.
Question 3.1
If we interchange two rows in a matrix this corresponds to writing down the given equations in a
different order. What is the effect on a linear system if we interchange two columns?
Remember: if we translate a linear system to an augmented coefficient matrix (A|b), perform the
row operations to arrive at a (reduced) row echelon form (A0 |b0 ), and translate back to a linear
system, then this new system contains exactly the same information as the original one but it is
“tidied up” and it is easy to determine its solution.
The natural question now is: Can we always transform a matrix into one in (reduced) row echelon
form? The answer is that this is always possible and we can even give an algorithm for it.
Gaußian elimination. Let A ∈ M (m × n) and assume that A is not the zero matrix. Gaußian
elimination is an algorithm that transforms A into a row echelon form. The steps are as follows:
• Find the first column which does not consist entirely of zeros. Interchange rows appropriately such that the entry in that column in the first row is different from zero.
• Multiply the first row by an appropriate number so that its first non-zero entry is 1.
• Use the first row to eliminate all coefficients below its pivot.
• Now our matrix has the following form: its first row is ( 0 · · · 0 1 ∗ · · · ∗ ), all entries below this pivot are 0, and below the first row and to the right of the pivot column there remains a matrix A′ with fewer columns than A and m − 1 rows (the ∗ stand for arbitrary numbers).
• Now repeat the process for A′ . Note that in doing so the first columns do not change since we are only manipulating zeros.
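The following Python (NumPy) sketch implements the procedure just described; the function name row_echelon is ours and this is only one possible variant (it does not, for instance, treat rounding errors carefully):

    import numpy as np

    def row_echelon(M, tol=1e-12):
        """Transform a copy of M into a row echelon form using elementary row operations."""
        A = M.astype(float).copy()
        m, n = A.shape
        row = 0
        for col in range(n):
            if row >= m:
                break
            # Find a row at or below `row` with a non-zero entry in this column.
            pivot = next((r for r in range(row, m) if abs(A[r, col]) > tol), None)
            if pivot is None:
                continue                        # column of zeros: go to the next column
            A[[row, pivot]] = A[[pivot, row]]   # swap rows
            A[row] = A[row] / A[row, col]       # make the pivot equal to 1
            for r in range(row + 1, m):         # eliminate the entries below the pivot
                A[r] = A[r] - A[r, col] * A[row]
            row += 1
        return A

    # Augmented matrix of Example 3.2.
    Ab = np.array([[1, 1, -1, 1], [2, 3, 1, 3], [0, 4, 1, 7]])
    print(row_echelon(Ab))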
Definition 3.6. Two m × n matrices A and B are called row equivalent if there are elementary
row operations that transform A into B. (Clearly then B can be transformed by row operations
into A.)
Before we give examples, we note that from the row echelon form we can immediately tell how
many solutions the corresponding linear system has.
Theorem 3.7. Let (A|b) be the augmented coefficient matrix of a linear m × n system and let
(A0 |b0 ) be a row reduced form.
(1) If there is a row of the form (0 · · · 0|β) with β 6= 0, then the system has no solution.
(2) If there is no row of the form (0 · · · 0|β) with β 6= 0, then one of the following holds:
(2.1) If there is a pivot in every column of A0 then the system has exactly one solution.
(2.2) If there is a column in A0 without a pivot, then the system has infinitely many solutions.
Proof. (1) If (A0 |b0 ) has a row of the form (0 · · · 0|β) with β 6= 0, then the corresponding equation
is 0x1 + · · · + 0xn = β which clearly has no solution.
(2) Now assume that (A0 |b0 ) has no row of the form (0 · · · 0|β) with β 6= 0. In case (2.1), the
transformed matrix is then of the form
( 1  a′12  a′13  · · ·  a′1n   |  b′1 )
( 0   1    a′23  · · ·  a′2n   |  b′2 )
(              · · ·                 )
( 0  · · ·  0   1   a′(n−1)n   |  b′n−1 )      (3.4)
( 0  · · ·  0   0      1       |  b′n )
( 0  · · ·  0   0      0       |  0  )
Note that the last zero rows appear only if n < m. This system clearly has the unique solution
xn = b′n ,   xn−1 = b′n−1 − a′(n−1)n xn ,   . . . ,   x1 = b′1 − a′1n xn − · · · − a′12 x2 .
In case (2.2), the transformed matrix consists, besides possible zero rows at the bottom, of rows of the form
( 0 · · · 0  1  ∗ · · · ∗ | b′k ) ,      (3.5)
where the stars stand for numbers. (If we continue the reduction until we get to the reduced
row echelon form, then the numbers over the 1’s must be zeros.) Note that we can choose the
unknowns which correspond to the columns without a pivot arbitrarily. The unknowns which
correspond to the columns with pivots can then always be chosen in a unique way such that
the system is satisfied.
Definition 3.8. The variables which correspond to columns without pivots are called free variables.
We will come back to this theorem later on page 99 (the theorem is stated again in the coloured
box).
From the above theorem we get as an immediate consequence the following.
Theorem 3.9. A linear system has either no, exactly one or infinitely many solutions.
Example 3.10 (Example with a unique solution (no free variables)). We consider the
linear system
2x1 + 3x2 + x3 = 12,
−x1 + 2x2 + 3x3 = 15, (3.6)
3x1 − x3 = 1.
Solution. We form the augmented matrix and perform row reduction.
( 2 3 1 | 12 ; −1 2 3 | 15 ; 3 0 −1 | 1 )
−(R1 → R1 + 2R2)→ ( 0 7 7 | 42 ; −1 2 3 | 15 ; 3 0 −1 | 1 )
−(R3 → R3 + 3R2)→ ( 0 7 7 | 42 ; −1 2 3 | 15 ; 0 6 8 | 46 )
−(R1 ↔ R2)→ ( −1 2 3 | 15 ; 0 7 7 | 42 ; 0 6 8 | 46 )
−(R1 → −R1 , R2 → (1/7)R2)→ ( 1 −2 −3 | −15 ; 0 1 1 | 6 ; 0 6 8 | 46 )
−(R3 → R3 − 6R2)→ ( 1 −2 −3 | −15 ; 0 1 1 | 6 ; 0 0 2 | 10 )
−(R3 → (1/2)R3)→ ( 1 −2 −3 | −15 ; 0 1 1 | 6 ; 0 0 1 | 5 ) .
This shows that the system (3.6) is equivalent to the system
x1 − 2x2 − 3x3 = −15,
x2 + x3 = 6, (3.7)
x3 = 5
Remark. If we continue the reduction process until we reach the reduced row echelon form, then
we obtain
. . . −→ ( 1 −2 −3 | −15 ; 0 1 1 | 6 ; 0 0 1 | 5 ) −(R2 → R2 − R3)→ ( 1 −2 −3 | −15 ; 0 1 0 | 1 ; 0 0 1 | 5 )
−(R1 → R1 + 3R3)→ ( 1 −2 0 | 0 ; 0 1 0 | 1 ; 0 0 1 | 5 ) −(R1 → R1 + 2R2)→ ( 1 0 0 | 2 ; 0 1 0 | 1 ; 0 0 1 | 5 ) ,
from which we can read off the solution directly:
x3 = 5, x2 = 1, x1 = 2.
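The solution can be double-checked numerically, for example with NumPy's built-in solver (an illustration, not part of the elimination method itself):

    import numpy as np

    A = np.array([[2.0, 3.0, 1.0], [-1.0, 2.0, 3.0], [3.0, 0.0, -1.0]])
    b = np.array([12.0, 15.0, 1.0])

    x = np.linalg.solve(A, b)
    print(x)                        # [2. 1. 5.]
    print(np.allclose(A @ x, b))    # True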
Example 3.11 (Example with two free variables). We consider the linear system
3x1 − 2x2 + 3x3 + 3x4 = 3,
2x1 + 6x2 + 2x3 − 9x4 = 2, (3.8)
x1 + 2x2 + x3 − 3x4 = 1.
Solution. We form the augmented matrix and perform row reduction.
( 3 −2 3 3 | 3 ; 2 6 2 −9 | 2 ; 1 2 1 −3 | 1 )
−(R2 → R2 − 2R3)→ ( 3 −2 3 3 | 3 ; 0 2 0 −3 | 0 ; 1 2 1 −3 | 1 )
−(R1 → R1 − 3R3)→ ( 0 −8 0 12 | 0 ; 0 2 0 −3 | 0 ; 1 2 1 −3 | 1 )
−(R1 ↔ R3)→ ( 1 2 1 −3 | 1 ; 0 2 0 −3 | 0 ; 0 −8 0 12 | 0 )
−(R3 → R3 + 4R2)→ ( 1 2 1 −3 | 1 ; 0 2 0 −3 | 0 ; 0 0 0 0 | 0 )
−(R1 → R1 − R2)→ ( 1 0 1 0 | 1 ; 0 2 0 −3 | 0 ; 0 0 0 0 | 0 ) .
The 3rd and the 4th column do not have pivots and we see that the system (3.8) is equivalent to
the system
x1 − x3 = 1,
x2 + x4 = 0.
Clearly we can choose x3 and x4 (the unknowns corresponding to the columns without a pivot)
arbitrarily. We will always be able to adjust x1 and x2 such that the system is satisfied. In order
to make it clear that x3 and x4 are our free variables, we sometimes call them x3 = t and x4 = s.
Then every solution of the system (3.8) is of the form
In vector form we can write the solution as follows. A tuple (x1 , x2 , x3 , x4 ) is a solution of (3.8) if
and only if the corresponding vector is of the form
(x1 , x2 , x3 , x4 )t = (1 + t, −s, t, s)t = (1, 0, 0, 0)t + t (1, 0, 1, 0)t + s (0, −1, 0, 1)t   for some s, t ∈ R.
The last line tells us immediately that the system (3.9) has no solution because there is no choice
of x1 , x2 , x3 such that 0x1 + 0x2 + 0x3 = 30.
• why a given matrix can be transformed into many different row echelon forms, but in only
one reduced row echelon form,
• why a linear system always has either no, exactly one or infinitely many solutions,
• etc.
You should now be able to
• identify if a matrix is in row echelon or a reduced row echelon form,
• use the Gauß- or Gauß-Jordan elimination to solve linear systems,
• say if a system has no, exactly one or infinitely many solutions if you know its echelon form,
• etc.
Theorem 3.13. Let A be the coefficient matrix of a homogeneous linear m × n system and let A0
be a row reduced form.
(i) If there is a pivot in every column then the system has exactly one solution, namely the trivial
solution.
(ii) If there is a column without a pivot, then the system has infinitely many solutions.
Corollary 3.14. A homogeneous linear system has either exactly one or infinitely many solutions.
. . . −(use R2 to clear the 2nd column)→ ( 1 0 −1 ; 0 −1 0 ; 0 0 0 ) −(R2 → −R2)→ ( 1 0 −1 ; 0 1 0 ; 0 0 0 ) ,
so the solutions are
x1 = t, x2 = 0, x3 = t for t ∈ R, or in vector form (x1 , x2 , x3 )t = t (1, 0, 1)t for t ∈ R.
Example 3.16 (Example of a homogeneous system with exactly one solution). We con-
sider the linear system
x1 + 2x2 = 0,
2x1 + 3x2 = 0, (3.11)
3x1 + 5x2 = 0.
In the next section we will see the connection between the set of solutions of a linear system and
the corresponding homogeneous linear system.
• why a homogeneous linear system always has either one or infinitely many solutions,
• etc.
You should now be able to
• use the Gauß- or Gauß-Jordan elimination to solve homogeneous linear systems,
• etc.
Problems of this type are called inverse problems since we are given an output (the right hand side of the system; the “state” that we want to achieve) and we have to find a suitable input in order to obtain the desired output.
Now we change our perspective a bit and we ask ourselves: If we put certain x1 , . . . , xn into the
system, what do we get as a result on the right hand side? To investigate this question, it is very
useful to write the system (3.1) in a short form. First note that we can view it as an equality of
the two vectors with m components each:
( a11 x1 + a12 x2 + · · · + a1n xn , . . . , am1 x1 + am2 x2 + · · · + amn xn )t = ( b1 , . . . , bm )t .      (3.12)
Let A be the coefficient matrix and ~x the vector whose components are x1 , . . . , xn . Then we write
the left hand side of (3.12) as
A~x = A (x1 , . . . , xn )t := ( a11 x1 + a12 x2 + · · · + a1n xn , a21 x1 + a22 x2 + · · · + a2n xn , . . . , am1 x1 + am2 x2 + · · · + amn xn )t .      (3.13)
With this notation, the linear system (3.1) can be written in the very short form
A~x = ~b   with   ~b = (b1 , . . . , bm )t .
A way to remember the formula for the multiplication of a matrix and a vector is that we “multiply
each row of the matrix by the column vector”, so we calculate “row by column”. For example, the
jth component of A~x is “(jth row of A) by (column ~x)”.
(A~x)j = aj1 x1 + aj2 x2 + · · · + ajn xn ,   j = 1, . . . , m.      (3.14)
Definition 3.17. The formula in (3.13) is called the multiplication of a matrix and a vector.
An m × n matrix A takes a vector with n components and gives us back a vector with m compo-
nents.
Observe that something like ~xA does not make sense!
Remark 3.19. Recall that ~ej is the vector which has a 1 as its jth component and has zeros
everywhere else. Formula (3.13) shows that for every j = 1, . . . , n
A~ej = (a1j , a2j , . . . , amj )t = jth column of A.      (3.16)
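A short numerical illustration of the matrix–vector product and of formula (3.16), assuming Python with NumPy (the matrix and vector are arbitrary sample data):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0], [8.0, 6.0, 4.0]])   # a 2 x 3 matrix
    x = np.array([1.0, 1.0, 2.0])

    # A takes a vector with 3 components and returns a vector with 2 components.
    print(A @ x)                    # [ 9. 22.]

    # Formula (3.16): A e_j is the jth column of A.
    for j in range(3):
        e_j = np.zeros(3)
        e_j[j] = 1.0
        print(np.allclose(A @ e_j, A[:, j]))   # True for every j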
Proposition 3.20. Let A ∈ M (m × n), let ~x, ~y ∈ Rn and let c ∈ R. Then
(i) A(c~x) = c A~x ,   (ii) A(~x + ~y ) = A~x + A~y ,   (iii) A~0 = ~0 .
Proof. The proofs are not difficult. They follow by using the definitions and carrying out some straightforward calculations as follows.
(i) A(c~x) = A (cx1 , . . . , cxn )t = ( a11 cx1 + · · · + a1n cxn , . . . , am1 cx1 + · · · + amn cxn )t
    = c ( a11 x1 + · · · + a1n xn , . . . , am1 x1 + · · · + amn xn )t = c A~x.
(ii) A(~x + ~y ) = A (x1 + y1 , . . . , xn + yn )t = ( a11 (x1 + y1 ) + · · · + a1n (xn + yn ) , . . . , am1 (x1 + y1 ) + · · · + amn (xn + yn ) )t
    = ( a11 x1 + · · · + a1n xn , . . . , am1 x1 + · · · + amn xn )t + ( a11 y1 + · · · + a1n yn , . . . , am1 y1 + · · · + amn yn )t = A~x + A~y .
(iii) To show that A~0 = ~0, we could simply do the calculation (which is very easy!) or we can use
(i):
A~0 = A(0~0) = 0A~0 = ~0.
Note that in (iii) the ~0 on the left hand side is the zero vector in Rn whereas the ~0 on the right
hand side is the zero vector in Rm .
Proposition 3.20 gives an important insight into the structure of solutions of linear systems.
Theorem 3.21. (i) Let ~x and ~y be solutions of the linear system (3.1). Then ~x − ~y is a solution
of the associated homogeneous linear system.
(ii) Let ~x be a solution of the linear system (3.1), let ~z be a solution of the associated homogeneous
linear system and let λ ∈ R. Then ~x + λ~z is solution of the system (3.1).
Proof. If ~x and ~y both solve (3.1), then A(~x − ~y ) = A~x − A~y = ~b − ~b = ~0, which shows that ~x − ~y solves the homogeneous equation A~v = ~0. Hence (i) is proved.
In order to show (ii), we proceed similarly. If ~x solves the inhomogeneous system (3.1) and ~z solves
the associated homogeneous system, then
A(~x + λ~z) = A~x + Aλ~z = A~x + λA~z = ~b + λ~0 = ~b.
Corollary 3.22. Let ~x be an arbitrary solution of the inhomogeneous system (3.1). Then the set of all solutions of (3.1) is
{ ~x + ~z : ~z is a solution of the associated homogeneous system } .
x1 + 2x2 − x3 = 3,
2x1 + 3x2 − 2x3 = 3, (3.10’)
3x1 − x2 − 3x3 = −12.
Solution. We form the augmented matrix and perform row reduction.
( 1 2 −1 | 3 ; 2 3 −2 | 3 ; 3 −1 −3 | −12 )
−(R2 → R2 − 2R1 , R3 → R3 − 3R1)→ ( 1 2 −1 | 3 ; 0 −1 0 | −3 ; 0 −7 0 | −21 )
−(use R2 to clear the 2nd column)→ ( 1 0 −1 | −3 ; 0 −1 0 | −3 ; 0 0 0 | 0 )
−(R2 → −R2)→ ( 1 0 −1 | −3 ; 0 1 0 | 3 ; 0 0 0 | 0 ) .
This shows that indeed we obtain all solutions of the inhomogeneous equation as the sum of the particular solution (−3, 3, 0)t and all solutions of the corresponding homogeneous system.
• that an m × n matrix can be viewed as an operator that takes vectors in Rn and returns a
vector in Rm ,
• the structure of the set of all solutions of a given linear system,
• etc.
You should now be able to
• calculate expressions like A~x,
• relate the solutions of an inhomogeneous system with those of the corresponding homoge-
neous one,
• etc.
In the previous section we saw that a matrix A ∈ M (m × n) takes a vector ~x ∈ Rn and returns
a vector A~x in Rm . This allows us to view A as a function from Rn to Rm , and therefore we can
define the sum and composition of two matrices. Before we do this, let us see a few examples of
such matrices. As examples we work with 2 × 2 matrices because their action on R2 can be sketched
in the plane.
Example 3.24. Let us consider A = ( 1 0 ; 0 −1 ). This defines a function TA from R2 to R2 by
TA : R2 → R2 , TA ~x = A~x.
Remark. We write TA to denote the function induced by A, but sometimes we will write simply
A : R2 → R2 when it is clear that we consider the matrix A as a function.
We calculate easily
TA (1, 0)t = (1, 0)t ,   TA (0, 1)t = (0, −1)t ,   in general   TA (x, y)t = (x, −y)t .
So we see that TA represents the reflection of a vector ~x about the x-axis.
Figure 3.1: Reflection about the x-axis: TA ~e1 = ~e1 , TA ~e2 = −~e2 and TA ~v = (v1 , −v2 )t .
Example 3.25. Now let us consider B = ( 0 0 ; 0 1 ). This defines a function TB from R2 to R2 by TB ~x = B~x. We calculate easily
TB (1, 0)t = (0, 0)t ,   TB (0, 1)t = (0, 1)t ,   in general   TB (x, y)t = (0, y)t ,
so TB represents the orthogonal projection onto the y-axis.
Figure 3.2: Orthogonal projection onto the y-axis: TB ~v = (0, v2 )t .
Example 3.26. Let us consider C = ( 0 −1 ; 1 0 ). This defines a function TC from R2 to R2 by
TC : R2 → R2 , TC ~x = C~x.
We calculate easily
TC (1, 0)t = (0, 1)t ,   TC (0, 1)t = (−1, 0)t ,   in general   TC (x, y)t = (−y, x)t .
Figure 3.3: Rotation about π/2 counterclockwise.
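The three matrices of Examples 3.24–3.26 can be applied to a sample vector with a few lines of Python (NumPy); the vector (3, 2)t is an arbitrary choice:

    import numpy as np

    A = np.array([[1.0, 0.0], [0.0, -1.0]])   # reflection about the x-axis (Example 3.24)
    B = np.array([[0.0, 0.0], [0.0, 1.0]])    # projection onto the y-axis (Example 3.25)
    C = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotation by 90 degrees counterclockwise (Example 3.26)

    v = np.array([3.0, 2.0])
    print(A @ v)    # [ 3. -2.]
    print(B @ v)    # [0. 2.]
    print(C @ v)    # [-2.  3.]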
Just as with other functions, we can sum them or compose them. Remember from your calculus classes that functions are summed “pointwise”. That means, if we have two functions f, g : R → R, then the sum f + g is a new function which is defined by
(f + g)(x) := f (x) + g(x).
The multiplication of a function f with a number c gives the new function cf defined by
(cf )(x) := c f (x).
Matrix sum
Let us see how this looks like in the case of matrices. Let A and B be matrices. First note that
they both must depart from the same space Rn because we want to apply them to the same ~x, that
is, both A~x and B~x must be defined. Therefore A and B must have the same number of columns.
They also must have the same number of rows because we want to be able to sum A~x and B~x. So
let A, B ∈ M (m × n) and let ~x ∈ Rn . Then, by definition of the sum of two functions, we have
(A + B)~x := A~x + B~x
  = ( a11 x1 + a12 x2 + · · · + a1n xn , . . . , am1 x1 + am2 x2 + · · · + amn xn )t + ( b11 x1 + b12 x2 + · · · + b1n xn , . . . , bm1 x1 + bm2 x2 + · · · + bmn xn )t
  = ( (a11 + b11 )x1 + (a12 + b12 )x2 + · · · + (a1n + b1n )xn , . . . , (am1 + bm1 )x1 + (am2 + bm2 )x2 + · · · + (amn + bmn )xn )t
  = ( a11 + b11  a12 + b12  · · ·  a1n + b1n ; . . . ; am1 + bm1  am2 + bm2  · · ·  amn + bmn ) ~x .
We see that A + B is again a matrix of the same size and that the components of this new matrix
are just the sum of the corresponding components of the matrices A and B.
We see that cA is again a matrix and that the components of this new matrix are just the product
of the corresponding components of the matrix A with c.
Proposition 3.27. Let A, B, C ∈ M (m × n), let O be the matrix whose entries are all 0 and let λ, µ ∈ R. Moreover, let Ã be the matrix whose entries are the negative entries of A. Then the following is true.
(iv) Additive inverse: A + Ã = O.
(v) 1A = A.
Now let us calculate the composition of two matrices. This is also called the product of the matrices.
Assume we have A ∈ M (m × n) and we want to calculate AB for some matrix B. Note that A
describes a function from Rn → Rm . In order for AB to make sense, we need that B goes from
some Rk to Rn , that means that B ∈ M (n × k). The resulting function AB will then be a map
from Rk to Rm .
Rk −−B−→ Rn −−A−→ Rm ,   so   AB : Rk −→ Rm .
So let B ∈ M (n × k). Then, by the definition of the composition of two functions, we have for every
~x ∈ Rk
(AB)~x := A(B~x)
  = ( a11 [b11 x1 + · · · + b1k xk ] + a12 [b21 x1 + · · · + b2k xk ] + · · · + a1n [bn1 x1 + · · · + bnk xk ] , . . . ,
      am1 [b11 x1 + · · · + b1k xk ] + am2 [b21 x1 + · · · + b2k xk ] + · · · + amn [bn1 x1 + · · · + bnk xk ] )t
  = ( [a11 b11 + a12 b21 + · · · + a1n bn1 ]x1 + · · · + [a11 b1k + a12 b2k + · · · + a1n bnk ]xk , . . . ,
      [am1 b11 + am2 b21 + · · · + amn bn1 ]x1 + · · · + [am1 b1k + am2 b2k + · · · + amn bnk ]xk )t .
We see that AB is a matrix of the size m × k as was to be expected since the composition function
goes from Rk to Rm . The component cj` of the new matrix (the entry in row j and column `) is
cj` = aj1 b1` + aj2 b2` + · · · + ajn bn` .      (3.20)
So in order to calculate this entry we need from A only its jth row and from B we only need its
`th column and we multiply them component by component. You can memorise this again as “row
by column”, more precisely:
Schematically, (3.20) says
AB = ( a11 a12 . . . a1n ; . . . ; aj1 aj2 . . . ajn ; . . . ; am1 am2 . . . amn ) ( b11 . . . b1` . . . b1k ; b21 . . . b2` . . . b2k ; . . . ; bn1 . . . bn` . . . bnk )
   = ( c11 . . . c1` . . . c1k ; . . . ; cj1 . . . cj` . . . cjk ; . . . ; cm1 . . . cm` . . . cmk ) .
For example, for A = ( 1 2 3 ; 8 6 4 ) ∈ M (2 × 3) and B = ( 7 1 2 3 ; 2 0 1 4 ; 2 6 −3 0 ) ∈ M (3 × 4) we compute
AB = ( 1·7+2·2+3·2   1·1+2·0+3·6   1·2+2·1+3·(−3)   1·3+2·4+3·0 ;
       8·7+6·2+4·2   8·1+6·0+4·6   8·2+6·1+4·(−3)   8·3+6·4+4·0 )
   = ( 17 19 −5 11 ; 76 32 10 48 ) .
Let us see some properties of the algebraic operations for matrices that we just introduced.
AB ≠ BA.
That matrix multiplication is not commutative is to be expected since it is the composition of two
functions (think of functions that you know from your calculus classes. For example, it does make
a difference if you first square a variable and then take the arctan or if you first calculate its arctan
and then square the result).
Let us see an example. Let B be the matrix from Example 3.25 and C be the matrix from
Example 3.26. Recall that B represents the orthogonal projection onto the y-axis and that C
represents counterclockwise rotation by 90◦ . If we take ~e1 (the unit vector in x-direction), and we
first rotate and then project, we get the vector ~e2 . If however we project first and rotate then, we
get ~0. That means, BC~e1 6= CB~e1 , therefore BC 6= CB. Let us calculate the products:
BC = ( 0 0 ; 0 1 ) ( 0 −1 ; 1 0 ) = ( 0 0 ; 1 0 )   (first rotation, then projection),
CB = ( 0 −1 ; 1 0 ) ( 0 0 ; 0 1 ) = ( 0 −1 ; 0 0 )   (first projection, then rotation).
Let A be the matrix from Example 3.24, B be the matrix from Example 3.25 and C the matrix
from Example 3.26. Verify that AB 6= BA and AC 6= CA and understand this result geometrically
by following for example where the unit vectors get mapped to.
Note also that usually, when AB is defined, the expression BA is not defined because in general
the number of columns of B will be different from the number of rows of A.
We finish this section with the definition of the so-called identity matrix.
Definition 3.31. Let n ∈ N. Then the n × n identity matrix is the matrix which has 1s on its
diagonal and has zero everywhere else:
idn = ( 1 0 · · · 0 ; 0 1 · · · 0 ; · · · ; 0 0 · · · 1 ) .      (3.21)
As notation for the identity matrix, the following symbols are used in the literature: En , idn , Idn ,
In , 1n , 1n . The subscript n can be omitted if the size of the matrix is clear.
Let us look at a few examples before we give the formal definition.
• Assume we are given the matrix A from Example 3.24 which represents reflection on the
x-axis and we want to find a matrix that restores a vector after we applied A to it. Clearly,
we have to reflect again on the x-axis: reflecting an arbitrary vector ~x twice on the x-axis
leaves the vector where it was. Let us check:
AA = ( 1 0 ; 0 −1 ) ( 1 0 ; 0 −1 ) = ( 1 0 ; 0 1 ) = id2 .
That means that for every ~x ∈ R2 , we have that A2 ~x = ~x, hence A is its own inverse.
• Assume we are given the matrix C from Example 3.26 which represents counterclockwise
rotation by 90◦ and we want to find a matrix that restores a vector after we applied C to it.
Clearly, we have to rotate clockwise by 90◦ . Let us assume that there exists a matrix which
represents this rotation and let us call it C−90◦ . By Remark 3.18 it is enough to know how it
acts on ~e1 and ~e2 in order to write it down. Clearly C−90◦~e1 = −~e2 and C−90◦~e2 = ~e1 , hence
C−90◦ = (−~e2 |~e1 ).
Let us check:
C−90◦ C = ( 0 1 ; −1 0 ) ( 0 −1 ; 1 0 ) = ( 1 0 ; 0 1 ) = id2
and
C C−90◦ = ( 0 −1 ; 1 0 ) ( 0 1 ; −1 0 ) = ( 1 0 ; 0 1 ) = id2
which was to be expected because rotating first 90◦ clockwise and then 90◦ counterclockwise,
leaves any vector where it is.
• Assume we are given the matrix B from Example 3.25 which represents projection onto the y-axis. In this case, we cannot restore a vector ~x after we projected it onto the y-axis. For example, if we know that B~x = (0, 2)t , then ~x could have been (0, 2)t or (7, 2)t or any other vector in R2 whose second component is equal to 2. This shows that B does not have an inverse.
Let us consider the following situation. A grocery sells two different packages of fruits. Type A
contains 1 peach and 3 mangos and type B contains 2 peaches and 1 mango. We can ask two
different type of questions:
(i) Given a certain number of packages of type A and of type B, how many peaches and how
many mangos do we get?
(ii) How many packages of each type do we need in order to obtain a given number of peaches
and mangos?
The first question is quite easy to answer. Let us write down the information that we are given. If
a = number of packages of type A,   b = number of packages of type B,
p = number of peaches,   m = number of mangos,
then
p = 1a + 2b,
m = 3a + 1b.      (3.22)
Using vectors and matrices, we can rewrite this as
( p ; m ) = ( 1 2 ; 3 1 ) ( a ; b ) .
Let A = ( 1 2 ; 3 1 ). Then the above becomes simply
( p ; m ) = A ( a ; b ) .      (3.23)
If we know a and b (that is, we know how many packages of each type we bought), then we can
find the values of p and m by simply evaluating A( ab ) which is relatively easy.
Example 3.33. Assume that we buy 1 package of type A and 3 packages of type B, then we calculate
( p ; m ) = A ( 1 ; 3 ) = ( 1 2 ; 3 1 ) ( 1 ; 3 ) = ( 7 ; 6 ) ,
which shows that we have 7 peaches and 6 mangos.
If on the other hand, we know p and m and we are asked find a and b such that (3.22) holds, we
have to solve a linear system which is much more cumbersome. Of course, we can solve (3.23) using
the Gauß or Gauß-Jordan elimination process, but if we were asked to do this for several pairs p
and m, then it would become long quickly. However, if we had a matrix A0 such that A0 A = id2 ,
then this task would be quite easy since in this case we could manipulate (3.23) as follows:
( p ; m ) = A ( a ; b )   =⇒   A′ ( p ; m ) = A′A ( a ; b ) = id2 ( a ; b ) = ( a ; b ) .
( p ; m ) = A ( a ; b )   ⇐⇒   A′ ( p ; m ) = ( a ; b ) .      (3.24)
The task to find a and b again reduces to performing a matrix multiplication. The matrix A′ , if it exists, is called the inverse of A, and we will dedicate the rest of this section to giving criteria for its existence, investigating its properties and giving a recipe for finding it.
Exercise. Check that A′ = (1/5) ( −1 2 ; 3 −1 ) satisfies A′A = id2 .
Example 3.34. Assume that we want to buy 5 peaches and 5 mangos. Then we calculate
( a ; b ) = A′ ( 5 ; 5 ) = (1/5) ( −1 2 ; 3 −1 ) ( 5 ; 5 ) = ( 1 ; 2 ) ,
that is, we have to buy 1 package of type A and 2 packages of type B.
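The package example can be carried out numerically as follows (Python with NumPy; numpy.linalg.inv computes the inverse that we guessed above):

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 1.0]])       # packages -> (peaches, mangos)
    A_inv = np.linalg.inv(A)

    print(A_inv)                                 # 1/5 * [[-1, 2], [3, -1]]
    print(np.allclose(A_inv @ A, np.eye(2)))     # True: A_inv A = id2

    # Example 3.34: which packages give 5 peaches and 5 mangos?
    print(A_inv @ np.array([5.0, 5.0]))          # [1. 2.]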
Definition 3.35. A matrix A ∈ M (n × n) is called invertible if there exists a matrix A′ ∈ M (n × n) such that
AA′ = A′A = idn .
In this case A′ is called the inverse of A and it is denoted by A−1 . If A is not invertible then it is called non-invertible or singular.
The reason why in the definition we only admit square matrices (matrices with the same number of rows and columns) is explained in the following remark.
Remark 3.36. (i) Let A ∈ M (m×n) and assume that there is a matrix B such that BA = idn .
This means that if for some ~b ∈ Rm the equation A~x = ~b has a solution, then it is unique
because
A~x = ~b =⇒ BA~x = B~b =⇒ ~x = B~b.
From the above it is clear that A ∈ M (m × n) can have an inverse only if for every ~b ∈ Rm
the equation A~x = ~b has at most one solution. We know that if A has more columns than
rows, then the number of columns will be larger than the number of pivots. Therefore,
A~x = ~b has either no or infinitely many solutions (see Theorem 3.7). Hence a matrix A with
more columns than rows cannot have an inverse.
(ii) Again, let A ∈ M (m × n) and assume that there is a matrix B such that AB = idm . This means that for every ~b ∈ Rm the equation A~x = ~b is solved by ~x = B~b because

    A(B~b) = (AB)~b = idm ~b = ~b.

From the above it is clear that A ∈ M (m × n) can have an inverse only if for every ~b ∈ Rm the equation A~x = ~b has at least one solution. Assume that A has more rows than columns.
If we apply Gaussian elimination to the augmented matrix (A|~b), then the last row of the row-echelon form has to be (0 · · · 0|βm ). If we choose ~b such that after the reduction βm ≠ 0, then A~x = ~b does not have a solution. Such a ~b is easy to find: We only need to take ~em
(the mth unit vector) and do the steps from the Gauß elimination backwards. If we take
this vector as right hand side of our system, then the last row after the reduction will be
(0 . . . 0|1). Therefore, a matrix A with more rows than columns cannot have an inverse
because there will always be some ~b such that the equation A~x = ~b has no solution.
In conclusion, we showed that we must have m = n if A is to have an inverse matrix.
Note that a left inverse C of A (a matrix with CA = idn ) and a right inverse D of A (a matrix with AD = idm ) must be n × m matrices. The following examples show that left and right inverses do not need to exist, and if they do, they are not unique.
Examples 3.38. (i) A = ( 0 0 ; 0 0 ) has neither a left nor a right inverse.

(ii) A = ( 1 0 0 ; 0 1 0 ) has no left inverse and has right inverse D = ( 1 0 ; 0 1 ; 0 0 ). In fact, for every x, y ∈ R the matrix ( 1 0 ; 0 1 ; x y ) is a right inverse of A.

(iii) A = ( 1 0 ; 0 1 ; 0 0 ) has no right inverse and has left inverse C = ( 1 0 0 ; 0 1 0 ). In fact, for every x, y ∈ R the matrix ( 1 0 x ; 0 1 y ) is a left inverse of A.
Remark 3.39. We will show in Theorem 3.44 that a matrix A ∈ M (n × n) is invertible if and only
if it has a left- and a right inverse.
Examples 3.40. • From the examples at the beginning of this section we have:

    A = ( 1 0 ; 0 −1 )  =⇒  A−1 = A = ( 1 0 ; 0 −1 ),
    C = ( 0 −1 ; 1 0 )  =⇒  C −1 = ( 0 1 ; −1 0 ),
    B = ( 0 0 ; 0 1 )   =⇒  B is not invertible.

• Let A = diag(4, 5, −3, 2) = ( 4 0 0 0 ; 0 5 0 0 ; 0 0 −3 0 ; 0 0 0 2 ). Then we can easily guess that A−1 = diag(1/4, 1/5, −1/3, 1/2) is an inverse of A. It is easy to check that the product of these matrices gives id4 .
• Let A ∈ M (n × n) and assume that the kth row of A consists of only zeros. Then A is not
invertible because for any matrix B ∈ M (n × n), the kth row of the product matrix AB will
be zero, no matter how we choose B. So there is no matrix B such that AB = idn .
• Let A ∈ M (n × n) and assume that the kth column of A consists of only zeros. Then A is
not invertible because for any matrix B ∈ M (n × n), the kth column of the product matrix
BA will be zero, no matter how we choose B. So there is no matrix B such that BA = idn .
Now let us prove some theorems about inverse matrices. Recall that A ∈ M (n × n) is invertible if there exists a matrix A′ ∈ M (n × n) such that AA′ = A′A = idn .

Theorem 3.41. Let A, B ∈ M (n × n).

(i) If A is invertible, then its inverse is unique.

(ii) If A is invertible, then so is A−1 and (A−1 )−1 = A.

(iii) If A and B are invertible, then so is AB and (AB)−1 = B −1 A−1 .
Proof. (i) Assume that A is invertible and that A′ and A″ are inverses of A. Note that this means that

    AA′ = A′A = idn   and   AA″ = A″A = idn .                               (3.25)

We have to show that A′ = A″. This follows from (3.25) and from the associativity of the matrix multiplication because

    A′ = A′ idn = A′(AA″) = (A′A)A″ = idn A″ = A″.
(ii) Assume that A is invertible and let A−1 be its inverse. In order to show that A−1 is invertible,
we need a matrix C such that CA−1 = A−1 C = idn . This matrix C is then the inverse of
A−1 . Clearly, C = A does the trick. Therefore A−1 is invertible and (A−1 )−1 = A.
(iii) Assume that A and B are invertible. In order to show that AB is invertible and (AB)−1 = B −1 A−1 , we only need to verify that B −1 A−1 (AB) = (AB)B −1 A−1 = idn . We see that this is true using the associativity of the matrix product:

    B −1 A−1 (AB) = B −1 (A−1 A)B = B −1 idn B = B −1 B = idn ,
    (AB)B −1 A−1 = A(BB −1 )A−1 = A idn A−1 = AA−1 = idn .
Note that in the proof we guessed the formula for (AB)−1 and then we verified that it indeed is the inverse of AB. We can also calculate it as follows. Assume that C is a left inverse of AB. Then

    C(AB) = idn  =⇒  CA(BB −1 ) = B −1  =⇒  CA = B −1  =⇒  C(AA−1 ) = B −1 A−1  =⇒  C = B −1 A−1 .

Theorem 3.43 in the next section will show us how to find the inverse of an invertible matrix; see in particular the section on page 100.
You should now have understood

• what invertibility of a matrix means and why it does not make sense to speak of the invertibility of a matrix which is not a square matrix,

• that invertibility of an n × n matrix is equivalent to the fact that for every ~b ∈ Rn the associated linear system A~x = ~b has exactly one solution,

• etc.
You should now be able to
• guess the inverse of simple invertible matrices, for example of matrices which have a clear
geometric interpretation, or of diagonal matrices,
• verify if two given matrices are inverse to each other,
• give examples of invertible and of non-invertible matrices,
• etc.
(1) Equation (3.26) has no solution. ⇐⇒ The reduced row echelon form of the augmented system (A|~b) has a row of the form (0 · · · 0|β) with some β ≠ 0.

(2) Equation (3.26) has at least one solution. ⇐⇒ The reduced row echelon form of the augmented system (A|~b) has no row of the form (0 · · · 0|β) with some β ≠ 0.

In case (2), we have the following two sub-cases:

(2.1) Equation (3.26) has exactly one solution. ⇐⇒ #pivots = #columns.

(2.2) Equation (3.26) has infinitely many solutions. ⇐⇒ #pivots < #columns.
Observe that the case (1), no solution, cannot occur for homogeneous systems.
The next theorem connects the above to invertibility of the matrix representing the system.
We will complete this theorem with one more item in Chapter 4 (Theorem 4.11).
Theorem 3.43. Let A ∈ M (n × n). Then the following statements are equivalent:

(i) A is invertible.

(ii) For every ~b ∈ Rn , the equation A~x = ~b has exactly one solution.

(ii) ⇒ (i) Assume that (ii) holds. We will construct A−1 as follows (this also tells us how we can
calculate A−1 if it exists). Recall that we need a matrix C such that AC = idn . This C will
then be our candidate for A−1 (we still would have to check that CA = idn ). Let us denote the
columns of C by ~cj for j = 1, . . . , n, so that C = (~c1 | · · · |~cn ). Recall that the kth column of AC is
A(kth column of C) and that the columns of idn are exactly the unit vectors ~ek (the vector with a
1 as kth component and zeros everywhere else). Then AC = idn can be written as
(A~c1 | · · · |A~cn ) = (~e1 | · · · |~en ).
By (ii) we know that equations of the form A~x = ~ej have a unique solution. So we only need
to set ~cj = unique solution of the equation A~x = ~ej . With this choice we then have indeed that
AC = idn .
It remains to show that CA = idn . To this end, note that
A = idn A =⇒ A = ACA =⇒ A − ACA = 0 =⇒ A(idn −CA) = 0.
This means that A(idn −CA)~x = ~0 for every ~x ∈ Rn . Since by (ii) the equation A~y = ~0 has the
unique solution ~y = ~0, it follows that (idn −CA)~x = ~0 for every ~x ∈ Rn . But this means that ~x = CA~x for every ~x, hence CA must be equal to idn .
Theorem 3.44. Let A ∈ M (n × n).

(i) If C ∈ M (n × n) satisfies CA = idn , then A is invertible and C = A−1 .

(ii) If D ∈ M (n × n) satisfies AD = idn , then A is invertible and D = A−1 .

Proof. (i) By Theorem 3.43 it suffices to show that A~x = ~0 has the unique solution ~0. So assume that ~x ∈ Rn satisfies A~x = ~0. Then ~x = idn ~x = (CA)~x = C(A~x) = C~0 = ~0. This
shows that A is invertible. Moreover, C = C(idn ) = C(AA−1 ) = (CA)A−1 = idn A−1 = A−1 ,
hence C = A−1 .
(ii) By (i) applied to D, it follows that D has an inverse and that D−1 = A, so by Theo-
rem 3.41 (ii), A is invertible and A−1 = (D−1 )−1 = D.
We only need to solve A~x = ~ek for k = 1, . . . , n. This might be cumbersome and long, but we
already know that if these equations have solutions, then we can find them with the Gauß-Jordan
elimination. We only need to form the augmented matrix (A|~ek ), apply row operations until we get
to (idn |~ck ). Then ~ck is the solution of A~x = ~ek and we obtain the matrix A−1 as the matrix whose
columns are the vectors ~c1 , . . . , ~cn . If it is not possible to reduce A to the identity matrix, then it
is not invertible.
Note that the steps that we have to perform to reduce A to the identity matrix depend only on
the coefficients in A and not on the right hand side. So we can calculate the n vectors ~c1 , . . . ~cn
with only one (big) Gauß-Jordan elimination if we augment our given matrix A by the n vectors
~e1 , . . . ,~en . But the matrix (~e1 | · · · |~en ) is nothing else than the identity matrix idn . So if we take
(A| idn ) and apply the Gauß-Jordan elimination and if we can reduce A to the identity matrix,
then the columns on the right are the columns of the inverse matrix A−1 . If we cannot get to the
identity matrix, then A is not invertible.
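The procedure just described translates directly into a short algorithm. Below is a minimal Python sketch (the function name and the partial-pivoting strategy are our own choices, not from the notes): it forms the augmented matrix (A | idn), runs the Gauß-Jordan elimination, and either returns the right half as A−1 or reports that A is not invertible.

```python
import numpy as np

def inverse_via_gauss_jordan(A, tol=1e-12):
    """Reduce (A | I) to (I | B); then B = A^{-1}. Raise if A is singular."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # augmented matrix (A | id_n)
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))  # pivot row (partial pivoting)
        if abs(M[p, j]) < tol:
            raise ValueError("matrix is not invertible")
        M[[j, p]] = M[[p, j]]                # swap rows      (P_jk)
        M[j] /= M[j, j]                      # scale row      (S_j(c))
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]       # eliminate      (Q_ik(c))
    return M[:, n:]

A = np.array([[1., 2.], [3., 4.]])
print(inverse_via_gauss_jordan(A))           # [[-2. ,  1. ], [ 1.5, -0.5]]
```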
Examples 3.45. (i) Let A = ( 1 2 ; 3 4 ). Let us show that A is invertible by reducing the augmented matrix (A| id2 ):

    (A| id2 ) = ( 1 2 | 1 0 ; 3 4 | 0 1 )
      —R2 − 3R1 → R2→   ( 1 2 | 1 0 ; 0 −2 | −3 1 )
      —R1 + R2 → R1→    ( 1 0 | −2 1 ; 0 −2 | −3 1 )
      —−1/2 R2 → R2→    ( 1 0 | −2 1 ; 0 1 | 3/2 −1/2 ).

Hence A is invertible and A−1 = ( −2 1 ; 3/2 −1/2 ).
We can check our result by calculating

    ( 1 2 ; 3 4 ) ( −2 1 ; 3/2 −1/2 ) = ( −2+3 1−1 ; −6+6 3−2 ) = ( 1 0 ; 0 1 )

and

    ( −2 1 ; 3/2 −1/2 ) ( 1 2 ; 3 4 ) = ( −2+3 −4+4 ; 3/2−3/2 3−2 ) = ( 1 0 ; 0 1 ).
(ii) Let A = ( 1 2 ; −2 −4 ). Let us show that A is not invertible by reducing the augmented matrix (A| id2 ):

    (A| id2 ) = ( 1 2 | 1 0 ; −2 −4 | 0 1 )  —R2 + 2R1 → R2→  ( 1 2 | 1 0 ; 0 0 | 2 1 ).

Since there is a zero row in the left matrix, we conclude that A is not invertible.
(iii) Let A = ( 1 1 1 ; 0 2 3 ; 5 5 1 ). Let us show that A is invertible by reducing the augmented matrix (A| id3 ):

    (A| id3 ) = ( 1 1 1 | 1 0 0 ; 0 2 3 | 0 1 0 ; 5 5 1 | 0 0 1 )
      —R3 − 5R1 → R3→                         ( 1 1 1 | 1 0 0 ; 0 2 3 | 0 1 0 ; 0 0 −4 | −5 0 1 )
      —4R2 + 3R3 → R2 , 4R1 + R3 → R1→        ( 4 4 0 | −1 0 1 ; 0 8 0 | −15 4 3 ; 0 0 −4 | −5 0 1 )
      —2R1 − R2 → R1→                         ( 8 0 0 | 13 −4 −1 ; 0 8 0 | −15 4 3 ; 0 0 −4 | −5 0 1 )
      —1/8 R1 → R1 , 1/8 R2 → R2 , −1/4 R3 → R3→  ( 1 0 0 | 13/8 −1/2 −1/8 ; 0 1 0 | −15/8 1/2 3/8 ; 0 0 1 | 5/4 0 −1/4 ).

Hence A is invertible and A−1 = ( 13/8 −1/2 −1/8 ; −15/8 1/2 3/8 ; 5/4 0 −1/4 ) = (1/8) ( 13 −4 −1 ; −15 4 3 ; 10 0 −2 ).
We can check our result by calculating

    ( 1 1 1 ; 0 2 3 ; 5 5 1 ) ( 13/8 −1/2 −1/8 ; −15/8 1/2 3/8 ; 5/4 0 −1/4 ) = · · · = ( 1 0 0 ; 0 1 0 ; 0 0 1 )

and

    ( 13/8 −1/2 −1/8 ; −15/8 1/2 3/8 ; 5/4 0 −1/4 ) ( 1 1 1 ; 0 2 3 ; 5 5 1 ) = · · · = ( 1 0 0 ; 0 1 0 ; 0 0 1 ).
Let A = ( a b ; c d ). We ask when A is invertible. By Theorem 3.43, A is invertible if and only if for every right hand side the associated linear system has exactly one solution. By Theorem 1.11 this is the case if and only if det A ≠ 0. Recall that det A = ad − bc. So let us assume here that det A ≠ 0.
Case 1. a ≠ 0.

    (A| id2 ) = ( a b | 1 0 ; c d | 0 1 )
      —aR2 − cR1 → R2→              ( a b | 1 0 ; 0 ad−bc | −c a )
      —R1 − b/(ad−bc) R2 → R1→      ( a 0 | 1 + bc/(ad−bc)  −ab/(ad−bc) ; 0 ad−bc | −c a )
                                  = ( a 0 | ad/(ad−bc)  −ab/(ad−bc) ; 0 ad−bc | −c a )
      —1/a R1 → R1 , 1/(ad−bc) R2 → R2→  ( 1 0 | d/(ad−bc)  −b/(ad−bc) ; 0 1 | −c/(ad−bc)  a/(ad−bc) ).
It follows that

    A−1 = 1/(ad − bc) ( d −b ; −c a ).                                      (3.27)
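Formula (3.27) is easy to turn into code. The following small Python function (our own illustration) returns the inverse of a 2 × 2 matrix, or reports that the matrix is singular when ad − bc = 0.

```python
import numpy as np

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via formula (3.27): A^{-1} = 1/(ad-bc) [[d,-b],[-c,a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("det A = 0, so A is not invertible")
    return np.array([[d, -b], [-c, a]]) / det

print(inverse_2x2(np.array([[1., 2.], [3., 4.]])))   # [[-2. ,  1. ], [ 1.5, -0.5]]
```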
You should now have understood

• the relation between the invertibility of a square matrix A and the existence and uniqueness of solutions of A~x = ~b,

• that inverting a matrix is the same as solving a linear system,

• etc.
You should now be able to
• compute the inverse of an invertible matrix using the Gauß-Jordan elimination,

• verify whether a given matrix is invertible,

• etc.
We now consider the transpose of a matrix. For A = (aij ) i=1,...,m, j=1,...,n ∈ M (m × n), the transpose At ∈ M (n × m) is obtained by interchanging the roles of rows and columns:

    At = ( a11 a21 · · · am1 ; a12 a22 · · · am2 ; . . . ; a1n a2n · · · amn ) ∈ M (n × m).

If we denote At = (ãij ) i=1,...,n, j=1,...,m , then ãij = aji for i = 1, . . . , n and j = 1, . . . , m.
For example, the transposes of

    A = ( 1 2 ; 3 4 ),   B = ( 1 2 3 ; 4 5 6 ),   C = ( 1 2 3 ; 4 5 6 ; 7 7 7 ; 3 2 4 )

are

    At = ( 1 3 ; 2 4 ),   B t = ( 1 4 ; 2 5 ; 3 6 ),   C t = ( 1 4 7 3 ; 2 5 7 2 ; 3 6 7 4 ).
Proof. Clear.
Theorem. Let A ∈ M (k × n) and B ∈ M (n × m). Then (AB)t = B t At .

Proof. Note that both (AB)t and B t At are m × k matrices. In order to show that they are equal, we only need to show that they are equal in every entry. Let i ∈ {1, . . . , m} and j ∈ {1, . . . , k}. Then

    ((AB)t )ij = (AB)ji = Σ_{ℓ=1}^{n} ajℓ bℓi = Σ_{ℓ=1}^{n} (B t )iℓ (At )ℓj = (B t At )ij .
Theorem 3.50. Let A ∈ M (n × n). Then A is invertible if and only if At is invertible. In this
case, (At )−1 = (A−1 )t .
Proof. Assume that A is invertible. Then AA−1 = id. Taking the transpose on both sides, we find

    (A−1 )t At = (AA−1 )t = idt = id .

This shows that At is invertible and its inverse is (A−1 )t , see Theorem 3.44. Now assume that At is invertible. From what we just showed, it follows that then also its transpose (At )t = A is invertible.
Next we show an important relation between transposition of a matrix and the inner product on
Rn .
Theorem. Let A ∈ M (m × n). (i) For all ~x ∈ Rn and ~y ∈ Rm we have hA~x , ~y i = h~x , At ~y i. (ii) If B ∈ M (n × m) satisfies hA~x , ~y i = h~x , B~y i for all ~x ∈ Rn and ~y ∈ Rm , then B = At .

Proof. (i) follows from a direct calculation: hA~x , ~y i = Σi Σj aij xj yi = Σj xj Σi aij yi = h~x , At ~y i.

(ii) We have to show: For all i = 1, . . . , m and j = 1, . . . , n we have that aij = bji . Take ~x = ~ej ∈ Rn and ~y = ~ei ∈ Rm . If we take the inner product of A~ej with ~ei , then we obtain the ith component of A~ej . Recall that A~ej is the jth column of A, hence

    hA~ej , ~ei i = aij .

Similarly, if we take the inner product of B~ei with ~ej , then we obtain the jth component of B~ei . Since B~ei is the ith column of B it follows that

    hB~ei , ~ej i = bji .

By assumption hA~ej , ~ei i = h~ej , B~ei i, hence it follows that aij = bji , hence B = At .
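A quick numerical sanity check of the identity hA~x, ~yi = h~x, At ~yi (a sketch of ours, with randomly chosen data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))      # A in M(3 x 4)
x = rng.normal(size=4)           # x in R^4
y = rng.normal(size=3)           # y in R^3

lhs = np.dot(A @ x, y)           # <Ax, y>
rhs = np.dot(x, A.T @ y)         # <x, A^t y>
print(np.isclose(lhs, rhs))      # True
```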
That means that for an upper triangular matrix all entries below the diagonal are zero, for a lower
triangular matrix all entries above the diagonal are zero and for a diagonal matrix, all entries except
the ones on the diagonal must be zero. These matrices look as follows:
    U = ( a11 ∗ · · · ∗ ; 0 a22 · · · ∗ ; . . . ; 0 · · · 0 ann )      (upper triangular matrix),
    L = ( a11 0 · · · 0 ; ∗ a22 · · · 0 ; . . . ; ∗ · · · ∗ ann )      (lower triangular matrix),
    D = diag(a11 , . . . , ann ) = ( a11 0 · · · 0 ; 0 a22 · · · 0 ; . . . ; 0 · · · 0 ann )   (diagonal matrix).
Remark 3.53. A matrix is both upper and lower triangular if and only if it is diagonal.
Examples 3.54.
0 2 4 2 0 0 0 0
1 2 4 0 2 0 0 0 0 0
0 5 2 0 3 0 0
A = 0 2 5 , B =
0
, C = , D = 0 3 0 , E = 0 0 0 .
0 0 8 3 4 0 0
0 0 3 0 0 8 0 0 0
0 0 0 0 5 0 0 1
The matrices A, B, D, E are upper triangular, C, D, E are lower triangular, D, E are diagonal.
Examples 3.56.

    A = ( 1 7 4 ; 7 2 5 ; 4 5 3 ),   B = ( 3 0 4 ; 0 4 0 ; 4 0 1 ),   C = ( 0 2 −5 ; −2 0 −3 ; 5 3 0 ),   D = ( 0 0 8 ; 0 3 0 ; 2 0 0 ).

The matrices A and B are symmetric, C is antisymmetric, and D is neither.
Question 3.2
How many possibilities are there to express a given matrix A ∈ M (n × n) as sum of a symmetric
and an antisymmetric matrix?
Exercise 3.58. Show that the diagonal entries of an antisymmetric matrix are 0.
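For Question 3.2 and Exercise 3.58 it can help to experiment: one such decomposition is A = ½(A + At) + ½(A − At), where the first summand is symmetric and the second antisymmetric. A small Python sketch (our own illustration):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

S = (A + A.T) / 2      # symmetric part:      S^t = S
K = (A - A.T) / 2      # antisymmetric part:  K^t = -K
print(np.allclose(A, S + K), np.allclose(S, S.T), np.allclose(K, -K.T))
# -> True True True; note that the diagonal of K is zero (cf. Exercise 3.58)
```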
You should now have understood

• why (AB)t = B t At ,

• what the transpose of a matrix has to do with the inner product,

• etc.
You should now be able to
• calculate the transpose of a given matrix,
• check if a matrix is symmetric, antisymmetric or none,
• etc.
(i) Sj (c) = diag(1, . . . , 1, c, 1, . . . , 1) for j = 1, . . . , n and c ≠ 0, where the entry c is in position (j, j). All entries outside the diagonal are 0.

(ii) Qjk (c) for j, k = 1, . . . , n with j ≠ k and c ∈ R is the matrix with 1 on the whole diagonal and the number c in row j and column k. All entries apart from c and the diagonal are 0.

(iii) Pjk for j, k = 1, . . . , n is obtained from the identity matrix by swapping rows j and k (or, equivalently, by swapping columns j and k): it has 1 in the positions (ℓ, ℓ) for ℓ ∉ {j, k}, 1 in the positions (j, k) and (k, j), and 0 everywhere else.
For example, in M (2 × 2):

    S1 (5) = ( 5 0 ; 0 1 ),   Q21 (3) = ( 1 0 ; 3 1 ),   P12 = ( 0 1 ; 1 0 ).
Let us see how these matrices act on other n × n matrices. Let A = (aij )ni,j=1 ∈ M (n × n). We
want to calculate EA where E is an elementary matrix.
• Sj (c)A is the matrix A with its jth row multiplied by c: row j of Sj (c)A is (caj1 caj2 · · · cajn ), all other rows are unchanged.

• Qjk (c)A is the matrix A in which c times the kth row has been added to the jth row: row j of Qjk (c)A is (aj1 + cak1  aj2 + cak2  · · ·  ajn + cakn ), all other rows (including row k) are unchanged.

• Pjk A is the matrix A with rows j and k swapped.
These are exactly the row operations from the Gauß or Gauß-Jordan elimination! So we see that
every row operation can be achieved by multiplying from the left by an appropriate elementary
matrix.
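The following Python sketch (the helper names are ours) builds the three types of elementary matrices and confirms that multiplying from the left performs the corresponding row operation.

```python
import numpy as np

def S(n, j, c):                 # scale row j by c (0-based row index j)
    E = np.eye(n); E[j, j] = c; return E

def Q(n, j, k, c):              # add c times row k to row j
    E = np.eye(n); E[j, k] = c; return E

def P(n, j, k):                 # swap rows j and k
    E = np.eye(n); E[[j, k]] = E[[k, j]]; return E

A = np.arange(1, 10).reshape(3, 3).astype(float)
print(S(3, 1, 5) @ A)           # second row of A multiplied by 5
print(Q(3, 2, 0, -3) @ A)       # third row minus 3 times the first row
print(P(3, 0, 2) @ A)           # first and third rows swapped
```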
Remark 3.62. The form of the elementary matrices is quite easy to remember if you recall that
E idn = E for every matrix E, in particular for an elementary matrix. So, if you want to remember
e.g. what the 5 × 5 matrix looks like which adds 3 times the 2nd row to the 4th, just remember that this matrix is

    E = E id5 = (take id5 and add 3 times its 2nd row to its 4th row) = ( 1 0 0 0 0 ; 0 1 0 0 0 ; 0 0 1 0 0 ; 0 3 0 1 0 ; 0 0 0 0 1 ).
Question 3.3
How do elementary matrices act on other matrices if we multiply them from the right?
Hint. There are two ways to find the answer. One is to carry out the matrix multiplication as we
did on page 107. Or you could use that AE = [(AE)t ]t = [E t At ]t . If E is an elementary matrix,
then so is E t , see Proposition 3.64. Since you know what E t At looks like, you can then deduce what its transpose looks like.
Since the action of an elementary matrix can be “undone” (since the corresponding row operation
can be undone), we expect them to be invertible. The next proposition shows that they indeed are
and that their inverse is again an elementary matrix of the same type.
Proposition 3.63. The elementary matrices are invertible with

    Sj (c)−1 = Sj (c−1 ),   Qjk (c)−1 = Qjk (−c),   Pjk −1 = Pjk .

Show that Proposition 3.63 is true. Convince yourself that it is true using their interpretation as row operations.
Exercise 3.65. Show that Qjk (c) = Sk (c−1 )Qjk (1)Sk (c) for c ≠ 0. Interpret the formula in terms of row operations.
Exercise. Show that Pjk can be written as product of matrices of the form Qjk (c) and Sj (c).
Let us come back to the relation of elementary matrices and the Gauß-Jordan elimination process.
Proposition 3.66. Let A ∈ M (n × n) and let A′ be a row echelon form of A. Then there exist elementary matrices E1 , . . . , Ek such that

    A = E1 E2 · · · Ek A′ .

Proof. We know that we can arrive at A′ by applying suitable row operations to A. By Proposition 3.61 they correspond to multiplication of A from the left by suitable elementary matrices Fk , Fk−1 , . . . , F2 , F1 , that is

    A′ = Fk Fk−1 · · · F2 F1 A.

We know that all the Fj are invertible, hence their product is invertible and we obtain

    A = F1−1 F2−1 · · · Fk−1 A′ .

We know that the inverse of every elementary matrix Fj is again an elementary matrix, so if we set Ej = Fj−1 for j = 1, . . . , k, the proposition is proved.
Corollary 3.67. Let A ∈ M (n × n). Then there exist elementary matrices E1 , . . . , Ek and an
upper triangular matrix U such that
A = E1 E2 · · · Ek U.
Proof. This follows immediately from Proposition 3.66 if we recall that every row reduced echelon
form of A is an upper triangular matrix.
The next theorem shows that every invertible matrix is “composed” of elementary matrices.
Theorem 3.68. Let A ∈ M (n × n). Then A is invertible if and only if it can be written as product
of elementary matrices.
Proof. Assume that A is invertible. Then the reduced row echelon form of A is idn . Therefore,
by Proposition 3.66, there exist elementary matrices E1 , . . . , Ek such that A = E1 · · · Ek idn =
E1 · · · Ek .
If, on the other hand, we know that A is the product of elementary matrices, say, A = F1 · · · F` , then
clearly A is invertible since each elementary matrix Fj is invertible and the product of invertible
matrices is invertible.
We finish this section with an exercise where we write an invertible 2 × 2 matrix as a product of elementary matrices. Notice that there are infinitely many ways to write it as a product of elementary matrices, just as there are infinitely many ways of performing row reduction to get to the identity matrix.
Example 3.69. Write the matrix A = ( 1 2 ; 3 4 ) as a product of elementary matrices.
Solution. We use the idea of the proof of Theorem 3.43: we apply the Gauß-Jordan elimination
process and write the corresponding row transformations as elementary matrices.
    ( 1 2 ; 3 4 )  —R2 → R2 − 3R1 , i.e. Q21 (−3)→  ( 1 2 ; 0 −2 )  —R1 → R1 + R2 , i.e. Q12 (1)→  ( 1 0 ; 0 −2 )  —R2 → −1/2 R2 , i.e. S2 (−1/2)→  ( 1 0 ; 0 1 ),

where the successive matrices are Q21 (−3)A, then Q12 (1)Q21 (−3)A, and finally S2 (−1/2)Q12 (1)Q21 (−3)A. So we obtain that

    id2 = S2 (−1/2)Q12 (1)Q21 (−3)A.                                        (3.28)

Since the elementary matrices are invertible, we can solve for A and obtain

    A = [S2 (−1/2)Q12 (1)Q21 (−3)]−1 id2 = Q21 (−3)−1 Q12 (1)−1 S2 (−1/2)−1 = Q21 (3) Q12 (−1) S2 (−2).
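We can let the computer confirm the factorisation obtained in Example 3.69; a small check of ours:

```python
import numpy as np

Q21_3  = np.array([[1., 0.], [3., 1.]])    # Q_21(3)
Q12_m1 = np.array([[1., -1.], [0., 1.]])   # Q_12(-1)
S2_m2  = np.array([[1., 0.], [0., -2.]])   # S_2(-2)

A = Q21_3 @ Q12_m1 @ S2_m2
print(A)                                    # [[1. 2.], [3. 4.]]
```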
3.9 Summary
Elementary row operations (= operations which lead to an equivalent system) for
solving a linear system.
    Elementary operation                                          Notation          Inverse operation
    1  Swap rows j and k.                                         Rj ↔ Rk           Rj ↔ Rk
    2  Multiply row j by some λ ∈ R \ {0}.                        Rj → λRj          Rj → (1/λ)Rj
    3  Replace row k by the sum of row k and λ times row j,       Rk → Rk + λRj     Rk → Rk − λRj
       and keep row j unchanged.
On the solutions of a linear system.
• A linear system has either no, exactly one or infinitely many solutions.
• If the system is homogeneous, then it has either exactly one or infinitely many solutions. It
always has at least one solution, namely the trivial one.
• The set of all solutions of a homogeneous linear system is a vector space.

• The set of all solutions of an inhomogeneous linear system is an affine vector space.
For A ∈ M (m × n) and ~b ∈ Rm consider the equation A~x = ~b. Then the following is true:
(1) No solution ⇐⇒ The reduced row echelon form of the augmented system (A|~b) has a row of the form (0 · · · 0|β) with some β ≠ 0.

(2) At least one solution ⇐⇒ The reduced row echelon form of the augmented system (A|~b) has no row of the form (0 · · · 0|β) with some β ≠ 0.
In case (2), we have the following two sub-cases:
(2.1) Exactly one solution ⇐⇒ # pivots= # columns.
(2.2) Infinitely many solutions ⇐⇒ # pivots< # columns.
Definition. For A = (aij ) ∈ M (m × n) and ~x = (x1 , . . . , xn )t ∈ Rn ,

    A~x = ( a11 x1 + a12 x2 + · · · + a1n xn ; a21 x1 + a22 x2 + · · · + a2n xn ; . . . ; am1 x1 + am2 x2 + · · · + amn xn ),
and for A, B ∈ M (m × n),

    A + B = (aij ) + (bij ) = (aij + bij ) i=1,...,m, j=1,...,n ,
while for A ∈ M (m × n) and B ∈ M (n × k),

    AB = (cjℓ ) j=1,...,m, ℓ=1,...,k   with   cjℓ = Σ_{h=1}^{n} ajh bhℓ = aj1 b1ℓ + aj2 b2ℓ + · · · + ajn bnℓ .
• A1 + A2 = A2 + A1 ,
• (A1 + A2 ) + A3 = A1 + (A2 + A3 ),
• (AB)C = A(BC),
• in general, AB ≠ BA,
• A(~x + c~y ) = A~x + cA~y ,
• (A1 + cA2 )~x = A1 ~x + cA2 ~x,
• (AB)~z = A(B~z),
Transposition of matrices
• (At )t = A,
• (A + B)t = At + B t ,
• (AC)t = C t At ,
• hA~x , ~y i = h~x , At ~y i for all ~x ∈ Rn and ~y ∈ Rm .
A matrix A is called symmetric if At = A and antisymmetric if At = −A. Note that only square
matrices can be symmetric.
A matrix A = (aij )i,j=1,...,n ∈ M (n × n) is called
• upper triangular if aij = 0 whenever i > j,
• lower triangular if aij = 0 whenever i < j,
• diagonal if aij = 0 whenever i 6= j.
Clearly, a matrix is diagonal if and only if it is upper and lower triangular. The transpose of an upper triangular matrix is lower triangular and vice versa. Every diagonal matrix is symmetric.
Invertibility of matrices

For A ∈ M (n × n) the following statements are equivalent:

(i) A is invertible.
(ii) For every ~b ∈ Rn , the equation A~x = ~b has exactly one solution.
(iii) The equation A~x = ~0 has exactly one solution.
(iv) Every row-reduced echelon form of A has n pivots.
(v) A is row-equivalent to idn .
Let A ∈ M (n × n). Form the augmented matrix (A| idn ) and use the Gauß-Jordan elimination to
reduce A to its reduced row echelon form A0 : (A| idn ) → · · · → (A0 |B). If A0 = idn , then A is
invertible and A−1 = B. If A0 6= idn , then A is not invertible.
Inverse of a 2 × 2 matrix

Let A = ( a b ; c d ). Then det A = ad − bc. If det A = 0, then A is not invertible. If det A ≠ 0, then A is invertible and

    A−1 = 1/det A ( d −b ; −c a ).
Elementary matrices
• Sj (c) = (sik )i,k=1,...,n for c ≠ 0, where sik = 0 if i ≠ k, skk = 1 for k ≠ j and sjj = c,

• Qjk (c) = (qiℓ )i,ℓ=1,...,n for j ≠ k, where qjk = c, qℓℓ = 1 for all ℓ = 1, . . . , n and all other coefficients equal to zero,

• Pjk = (piℓ )i,ℓ=1,...,n for j ≠ k, where pℓℓ = 1 for all ℓ ∈ {1, . . . , n} \ {j, k}, pjk = pkj = 1 and all other coefficients equal to zero.

In other words: Sj (c) is the identity matrix with the entry in position (j, j) replaced by c; Qjk (c) is the identity matrix with an additional entry c in row j and column k; Pjk is the identity matrix with rows j and k swapped.
3.10 Exercises
1. Go back to Chapter 1 and do the exercises again using the knowledge acquired in this chapter.
2. Find a polynomial of degree at most 2 that passes through the points (−1, −6), (1, 0), (2, 0). How many such polynomials are there?
3. (a) Is there a polynomial of degree 1 that passes through the three points of Exercise 2? How many such polynomials are there?

(b) Is there a polynomial of degree 3 that passes through the three points of Exercise 2? How many such polynomials are there? Give at least two polynomials of degree 3.
4. Find the partial fraction decomposition of (2x² − 4x + 14) / (x(x − 2)²).
Are there 3 × 3 and 4 × 3 systems with the same solutions? Give examples or explain why they do not exist.
Is there a 4 × 3 system with the same solutions? Give examples or explain why it does not exist.
Find all possible b1 , b2 , b3 , or explain why there are none, such that the system has

(a) exactly one solution,

(b) no solution,

(c) infinitely many solutions.
A = 4 8 1 0 , B= , C = 4 1 0 , D= ,
1 4 3 −2 2
1 4 4 3 1 4 3
5 −4
1
1 4 4
1
0 2 3 −3
3 ,
~r = ~v = , w
~ = 3 ,
5 ,
~x = ~y = , ~z = −2 .
3 5 5
π
6 −1
−1
9. Let A = ( 2 6 −1 ; 1 −2 2 ; 1 2 −2 ) and ~b = ( 17 ; 6 ; 4 ). Find all vectors ~x ∈ R3 such that A~x = ~b.
10. Let M = ( 1 1 ; −1 3 ).

(a) Show that there is no ~y ≠ 0 such that M~y ⊥ ~y .

(b) Find all vectors ~x ≠ 0 such that M~x is parallel to ~x. For each such ~x, find λ ∈ R such that M~x = λ~x.
12. Determine whether the matrices are invertible. If they are, find their inverse matrix.

    A = ( 1 −2 ; 2 7 ),   B = ( −14 21 ; 12 −18 ),   D = ( 1 3 6 ; 4 1 0 ; 1 4 3 ),   E = ( 1 4 6 ; 2 1 5 ; 3 5 11 ).
13. For the following matrices determine whether they are invertible. If they are, find their inverse matrix.

    A = ( 1 0 ; 3 6 ),   B = ( 5 2 ; 8 6 ),   C = ( 4 10 ; 6 15 ),   D = ( 1 3 6 ; 4 1 0 ; 1 4 3 ).
(a) Give an equation of the form A~x = ~b that describes the situation above. Explain what the vectors ~x and ~b mean.

(b) Using the result of (a), calculate how many chocolates and how many mints are contained in:

    (i) 1 box of type A and 3 of type B,      (iii) 2 boxes of type A and 6 of type B,
    (ii) 4 boxes of type A and 2 of type B,   (iv) 3 boxes of type A and 5 of type B.
15. Let Ak = ( 1 3 ; 2 k ) and consider the equation

    Ak ~x = ( 0 ; 0 ).                                            (∗)

(a) Find all k ∈ R such that (∗) has exactly one solution for ~x.

(b) Find all k ∈ R such that (∗) has infinitely many solutions for ~x.

(c) Find all k ∈ R such that (∗) has no solution for ~x.

(d) Do the same for Ak ~x = ( 2 ; 3 ) instead of (∗).

(e) Do the same for Ak ~x = ( b1 ; b2 ) instead of (∗), where ( b1 ; b2 ) is an arbitrary vector different from ( 0 ; 0 ).
16. Write the invertible matrices from Exercises 12 and 13 as products of elementary matrices.
17. For the following matrices find elementary matrices E1 , . . . , En such that E1 E2 · · · En A is upper triangular.

    A = ( 7 4 ; 3 5 ),   B = ( 1 4 −4 ; 2 1 0 ; 3 5 3 ),   C = ( 1 2 3 ; 1 2 0 ; 2 4 3 ).
18. Let A ∈ M (m × n) and let ~x, ~y ∈ Rn , λ ∈ R. Show that A(~x + λ~y ) = A~x + λA~y .
19. Show that the space M (m × n) is a vector space with the usual sum of matrices and the usual product with λ ∈ R.
20. Let A ∈ M (n × n).
21. Let A = (aij ) i=1,...,m, j=1,...,n ∈ M (m × n) and let ~ek be the kth unit vector in Rn (that is, the vector in Rn whose kth entry is 1 and all others are zero). Calculate A~ek for every k = 1, . . . , n and describe in words the relation of the result to the matrix A.
22. (a) Let A ∈ M (m × n) and suppose that A~x = ~0 for all ~x ∈ Rn . Show that A = 0 (the matrix all of whose entries are 0).

(b) Let ~x ∈ Rn and suppose that A~x = ~0 for all A ∈ M (n × n). Show that ~x = ~0.

(c) Find a matrix A ∈ M (2 × 2) and ~v ∈ R2 , both different from zero, such that A~v = ~0.
23. Let ~v = ( 4 ; 5 ) and w~ = ( −1 ; 3 ).

(a) Find a matrix A ∈ M (2 × 2) that maps the vector ~e1 to ~v and the vector ~e2 to w~.

(b) Find a matrix B ∈ M (2 × 2) that maps the vector ~v to ~e1 and the vector w~ to ~e2 .
24. Find a matrix A ∈ M (2 × 2) that describes a rotation by π/3.
RS = SR ⇐⇒ R−1 S −1 = S −1 R−1 .
(e) If A + B is a symmetric matrix, then A, B are symmetric matrices.

(f) If A is a symmetric matrix, then At is symmetric.

(g) AAt = At A.
31. (a) Let P12 = ( 0 1 ; 1 0 ) ∈ M (2 × 2). Show that P12 can be expressed as a product of elementary matrices of the form Qij (c) and Sk (c).

(b) Prove the general case: Let Pij ∈ M (n × n). Show that Pij can be expressed as a product of elementary matrices of the form Qkl (c) and Sm (c).

Remark: The exercise shows that there are really only two types of elementary matrices, since the third type (the permutations) can be reduced to an appropriate product of matrices of type Qij (c) and Sj (c).
Chapter 4
Determinants
In this section we will define the determinant of matrices in M (n × n) for arbitrary n and we will
recognise the determinant for n = 2 defined in Section 1.2 as a special case of our new definition.
We will discuss the main properties of the determinant and we will show that a matrix is invertible
if and only if its determinant is different from 0. We will also give a geometric interpretation of
the determinant and get a glimpse of its importance in geometry and the theory of integration.
Finally we will use the determinant to calculate the inverse of an invertible matrix and we will
prove Cramer’s rule.
again the determinant tells us if a matrix is invertible or not. We will give several formulas for the
determinant. As definition, we use the Leibniz formula because it is non-recursive. First we need
to know what a permutation is.
For example, S2 consists of two permutations, with sign(σ) equal to 1 and −1; the six permutations of S3 have signs 1, 1, 1, −1, −1, −1.
For instance the second permutation has two inversions (1 < 3 but σ(1) > σ(3) and 2 < 3
but σ(2) > σ(3)), the third permutation has two inversions (1 < 2 but σ(1) > σ(2), 1 < 3 but
σ(1) > σ(3)), etc.
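Counting inversions is easy to automate. The following Python sketch (our own illustration) computes sign(σ) = (−1)^(number of inversions) for a permutation given as the tuple of its images (σ(1), . . . , σ(n)):

```python
from itertools import permutations

def sign(sigma):
    """Sign of a permutation, given as a tuple (sigma(1), ..., sigma(n))."""
    inversions = sum(1 for i in range(len(sigma))
                       for j in range(i + 1, len(sigma))
                       if sigma[i] > sigma[j])
    return (-1) ** inversions

# the six permutations of S_3 and their signs
for sigma in permutations((1, 2, 3)):
    print(sigma, sign(sigma))
```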
Definition 4.3. Let A = (aij )i,j=1,...,n ∈ M (n × n). Then its determinant is defined by
    det A = Σ_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · anσ(n) .                   (4.1)
The formula in equation (4.1) is called the Leibniz formula.
One can show that also

    det A = Σ_{σ∈Sn} sign(σ) aσ(1)1 aσ(2)2 · · · aσ(n)n .                   (4.2)

This means: instead of putting the permutation in the column index, we can just as well put it in the row index.
Let us check if this new definition coincides with our old definition for the case n = 2.
    det ( a11 a12 ; a21 a22 ) = Σ_{σ∈S2} sign(σ) a1σ(1) a2σ(2) = a11 a22 − a21 a12 .
Similarly, for a 3 × 3 matrix A = (aij ) one obtains

    det A = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a12 a21 a33 − a11 a23 a32 − a13 a22 a31 .     (4.3)
Now let us group terms with coefficients from the first line of A.
    det A = a11 (a22 a33 − a23 a32 ) − a12 (a21 a33 − a23 a31 ) + a13 (a21 a32 − a22 a31 ).         (4.4)
• a11 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
and column 1.
• a12 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
and column 2.
• a13 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
and column 3.
If we had grouped the terms by coefficients from the second row, we would have obtained something
similar: each term a2j would be multiplied by the determinant of the 2 × 2 matrix obtained from
A by deleting row 2 and column j.
Of course we could also group the terms by coefficients all from the first column. Then the formula
would become a sum of terms where the aj1 are multiplied by the determinants of the matrices
obtained from A by deleting row j and column 1.
This motivates the definition of the so-called minors of a matrix.
Definition 4.5. Let A = (aij )i,j=1,...,n ∈ M (n × n). Then the (n − 1) × (n − 1) matrix Mij which
is obtained from A by deleting row i and column j of A is called a minor of A. The corresponding
cofactor is Cij := (−1)i+j det(Mij ).
With this notation, formula (4.4) becomes

    det A = Σ_{j=1}^{3} (−1)^{1+j} a1j det M1j = Σ_{j=1}^{3} a1j C1j .
This formula is called the expansion of the determinant of A along the first row. We also saw that
we can expand along the second or the third row, or along columns, so
    det A = Σ_{j=1}^{3} (−1)^{k+j} akj det Mkj = Σ_{j=1}^{3} akj Ckj    for k = 1, 2, 3,

    det A = Σ_{i=1}^{3} (−1)^{i+k} aik det Mik = Σ_{i=1}^{3} aik Cik    for k = 1, 2, 3.
The first formula is called expansion along the kth row, and the second formula is called expansion
along the kth column. With a little more effort we can show that an analogous formula is true for
arbitrary n.
Theorem 4.6. Let A = (aij )i,j=1,...,n ∈ M (n × n) and let Mij denote its minors. Then

    det A = Σ_{j=1}^{n} (−1)^{k+j} akj det Mkj = Σ_{j=1}^{n} akj Ckj    for k = 1, 2, . . . , n,     (4.5)

    det A = Σ_{i=1}^{n} (−1)^{i+k} aik det Mik = Σ_{i=1}^{n} aik Cik    for k = 1, 2, . . . , n.     (4.6)
Proof. The formulas can be obtained from the Leibniz formula by straightforward calculations; but
they are long and quite messy so we omit them here.
The formulas (4.5) and (4.6) are called Laplace expansion of the determinant. More precisely, (4.5)
is called expansion along the kth row, (4.6) is called expansion along the kth column.
Note that for calculating for instance the determinant of a 5×5 matrix, we have to calculate five 4×4
determinants for each of which we have to calculate four (3×3) determinants, etc. Computationally,
it is as long as the Leibniz formula, but at least we do not have to find all permutations in Sn first.
Later, we will see how to calculate the determinant using Gaussian elimination. This is computationally much more efficient.
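For small matrices the Laplace expansion is easy to implement directly. A minimal recursive Python sketch (our own illustration; it expands along the first row, so it is exponential in n and only meant for experiments):

```python
import numpy as np

def det_laplace(A):
    """Determinant by Laplace expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 1, column j+1
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[3., 2., 1.], [5., 6., 4.], [8., 0., 7.]])
print(det_laplace(A), np.linalg.det(A))   # both approximately 72
```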
We obtain the same result if we expand the determinant along e.g. the first row:

    det ( 3 2 1 ; 5 6 4 ; 8 0 7 ) = 3 det ( 6 4 ; 0 7 ) − 2 det ( 5 4 ; 8 7 ) + 1 det ( 5 6 ; 8 0 )
                                  = 3 [6 · 7 − 4 · 0] − 2 [5 · 7 − 4 · 8] + [5 · 0 − 6 · 8]
                                  = 3 · 42 − 2 [35 − 32] − 48 = 126 − 6 − 48 = 72.
Example 4.8. We give an example of the calculation of the determinant of a 4 × 4 matrix, expanding first along the first row:

    det ( 1 2 3 4 ; 0 6 0 1 ; 2 0 7 0 ; 0 3 0 1 )
      = 1 det ( 6 0 1 ; 0 7 0 ; 3 0 1 ) − 2 det ( 0 0 1 ; 2 7 0 ; 0 0 1 ) + 3 det ( 0 6 1 ; 2 0 0 ; 0 3 1 ) − 4 det ( 0 6 0 ; 2 0 7 ; 0 3 0 )
      = 7 det ( 6 1 ; 3 1 ) − 2 · 7 det ( 0 1 ; 0 1 ) + 3 · (−2) det ( 6 1 ; 3 1 ) − 4 · (−6) det ( 2 7 ; 0 0 )
      = 7[6 − 3] − 14[0 − 0] − 6[6 − 3] + 24[0 − 0] = 21 − 18 = 3.
Now we calculate the determinant of the same matrix but choose a row with more zeros in the first step, namely the second row. The advantage is that there are only two 3 × 3 minors whose determinants we really have to compute.

    det ( 1 2 3 4 ; 0 6 0 1 ; 2 0 7 0 ; 0 3 0 1 )
      = −0 det ( 2 3 4 ; 0 7 0 ; 3 0 1 ) + 6 det ( 1 3 4 ; 2 7 0 ; 0 0 1 ) − 0 det ( 1 2 4 ; 2 0 0 ; 0 3 1 ) + det ( 1 2 3 ; 2 0 7 ; 0 3 0 )
      = 6 [ −3 det ( 2 0 ; 0 1 ) + 7 det ( 1 4 ; 0 1 ) ] + [ det ( 0 7 ; 3 0 ) − 2 det ( 2 3 ; 3 0 ) ]
      = 6 [ −6 + 7 ] + [ −21 + 18 ] = 6 − 3 = 3.
Rule of Sarrus
We finish this section with the so-called rule of Sarrus. From (4.3) we know that
    det A = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − [a12 a21 a33 + a11 a23 a32 + a13 a22 a31 ],
which can be memorised as follows: Write down the matrix A and append its first and second
column to it. Then we sum the products of the three terms lying on diagonals from the top left to
the bottom right and subtract the products of the terms lying on diagonals from the top right to
the bottom left:

    det A = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − [a13 a22 a31 + a11 a23 a32 + a12 a21 a33 ].
Convince yourself that one could also append the first and the second row below the matrix and
make crosses.
Example 4.9 (Rule of Sarrus).

    det ( 1 2 3 ; 4 5 6 ; 0 8 7 ) = 1 · 5 · 7 + 2 · 6 · 0 + 3 · 4 · 8 − [3 · 5 · 0 + 6 · 8 · 1 + 7 · 2 · 4]
                                  = 35 + 0 + 96 − [0 + 48 + 56] = 131 − 104 = 27.
You should now have understood
• what a permutation is,
• how to derive the Laplace expansion formula from the Leibniz formula,
• etc.
In this section we will show properties of the determinant and we will prove that a matrix is
invertible if and only if its determinant is different from 0.
(D1) The determinant is linear in each of its columns. This means: let ~c1 , . . . , ~cn be the columns of A and suppose that the jth column can be written as ~cj = ~sj + γ~tj . Then

    det A = det(~c1 | · · · |~cj | · · · |~cn ) = det(~c1 | · · · |~sj + γ~tj | · · · |~cn )
          = det(~c1 | · · · |~sj | · · · |~cn ) + γ det(~c1 | · · · |~tj | · · · |~cn ).
This is proved easily by expanding the determinant along the jth column, or it can be seen from
the Leibniz formula as well.
(D2) The determinant is alternating in its rows.
If two rows in a matrix are swapped, then the determinant changes its sign. This means: Let ~r1 , . . . , ~rn be the row vectors of the matrix A and i ≠ j ∈ {1, . . . , n}. Then

    det ( · · · ; ~ri ; · · · ; ~rj ; · · · ) = − det ( · · · ; ~rj ; · · · ; ~ri ; · · · ),

where the matrix on the right hand side is obtained from A by swapping the rows ~ri and ~rj .
This is easy to see when the two rows that shall be interchanged are adjacent. For example, assume
that j = i + 1. Let A be the original matrix and let B be the matrix with rows i and i + 1 swapped.
We expand the determinant of A along the ith row and the determinant of B along the (i + 1)th row. Note that in both cases the minors are equal, that is, M^A_{ik} = M^B_{(i+1)k} (we use superscripts A and B to distinguish between the minors of A and of B). So we find

    det B = Σ_{k=1}^{n} (−1)^{(i+1)+k} b(i+1)k det M^B_{(i+1)k} = Σ_{k=1}^{n} (−1)(−1)^{i+k} aik det M^A_{ik} = − Σ_{k=1}^{n} (−1)^{i+k} aik det M^A_{ik} = − det A.
This can also be seen via the Leibniz formula. Now let us see what happens if i and j are not adjacent rows. Without restriction we may assume that i < j. Then we first swap the jth row (j − i) times with the row above until it is in the ith row. The original ith row is now in row (i + 1). Now we swap it down with its neighbouring rows until it becomes row j. To do this we need j − (i + 1) swaps. So in total we swapped [j − i] + [j − (i + 1)] = 2j − 2i + 1 times neighbouring rows, so the determinant of the new matrix is

    (−1)^{2j−2i+1} det A = − det A.
(D2′) The determinant is alternating in its columns.

If two columns in a matrix are swapped, then the determinant changes its sign. This means: Let ~c1 , . . . , ~cn be the column vectors of the matrix A and i ≠ j ∈ {1, . . . , n}. Then

    det(· · · |~ci | · · · |~cj | · · · ) = − det(· · · |~cj | · · · |~ci | · · · ).

This follows in the same way as the alternating property for rows.
Remark 4.10. It can be shown: Every function f : M (n × n) → R which satisfies (D1), (D2) and f (idn ) = 1 is equal to the determinant.

(D3) det idn = 1.

This is clear from the Leibniz formula (only the identity permutation contributes) or from (D6) below.

(D4) det A = det At .
This follows easily from the Leibniz formula or from the Laplace expansion (if you expand A along
the first row and At along the first column, you obtain exactly the same terms). This also shows
that (D1’) follows from (D1) and that (D2’) follows from (D2) and vice versa.
(D5) If one row of a matrix is a multiple of another row, then its determinant is 0. The same holds for columns.

Suppose that the kth row of A is a multiple of its jth row, say ~rk = c~rj . Then, using (D1) and (D2),

    det A = det ( · · · ; ~rj ; · · · ; c~rj ; · · · )
          = c det ( · · · ; ~rj ; · · · ; ~rj ; · · · )             (by (D1))
          = −c det ( · · · ; ~rj ; · · · ; ~rj ; · · · )            (by (D2), swapping the two equal rows)
          = − det ( · · · ; ~rj ; · · · ; c~rj ; · · · ) = − det A.   (by (D1))
This shows det A = − det A, and therefore det A = 0. If A has a column which is a multiple of
another, then its transpose has a row which is multiple of another row and with the help of (D4) it
follows that det A = det At = 0.
(D6) The determinant of an upper or lower triangular matrix is the product of its
diagonal entries.
Let A be an upper triangular matrix and let us expand its determinant in the first column. Then
only the first term in the Laplace expansion is different from 0 because all coefficients in the first
column are equal to 0 except possibly the one in the first row. We repeat this and obtain
    det A = det ( c1 ∗ ; 0 c2 ∗ ; 0 0 c3 ∗ ; . . . ; 0 · · · 0 cn )
          = c1 det ( c2 ∗ ; 0 c3 ∗ ; . . . ; 0 · · · 0 cn )
          = c1 c2 det ( c3 ∗ ; . . . ; 0 · · · 0 cn )
          = · · · = c1 c2 · · · cn−2 det ( cn−1 ∗ ; 0 cn ) = c1 c2 · · · cn−1 cn ,

where c1 , . . . , cn denote the diagonal entries of A.
The claim for lower triangular matrices follows from (D4) and what we just showed because the
transpose of an upper triangular matrix is lower triangular and the diagonal entries are the same.
Or we could repeat the above proof but this time we would expand always in the first row (or last
column).
(D7) For the elementary matrices we have det Sj (c) = c, det Qij (c) = 1 and det Pij = −1.

The affirmation about Sj (c) and Qij (c) follows from (D6) since they are triangular matrices. The claim for Pij follows from (D2) and (D3) because swapping row i and row j in Pij gives us the identity matrix, so det Pij = − det id = −1.
Now we calculate the determinant of a product of an elementary matrix with another matrix.
(D8) Let E be an elementary matrix and let A ∈ M (n × n). Then det(EA) = det E det A.
Let E be an elementary matrix and let us denote the rows of A by ~r1 , . . . , ~rn . We have to distinguish
between the three different types of elementary matrices.
Case 1. E = Sj (c). We know from (D6) that det E = det Sj (c) = c. Using Proposition 3.61 and (D1) we find that

    det(EA) = det(Sj (c)A) = det ( · · · ; c~rj ; · · · ) = c det ( · · · ; ~rj ; · · · ) = c det A = det Sj (c) det A.
Case 2. E = Qij (c). We know from (D6) that det E = det Qij (c) = 1. Using Proposition 3.61 and (D1) and (D5) we find that

    det(EA) = det(Qij (c)A) = det ( · · · ; ~ri + c~rj ; · · · ; ~rj ; · · · )
            = det ( · · · ; ~ri ; · · · ; ~rj ; · · · ) + c det ( · · · ; ~rj ; · · · ; ~rj ; · · · )
            = det A + 0 = det A = det Qij (c) det A.
Case 3. E = Pjk . We know from (D7) that det E = det Pjk = −1. Using Proposition 3.61 and (D2) we find that

    det(EA) = det(Pjk A) = det ( · · · ; ~rk ; · · · ; ~rj ; · · · ) = − det ( · · · ; ~rj ; · · · ; ~rk ; · · · ) = − det A = det Pjk det A.
Let A ∈ M (n × n) and let A′ be its reduced row echelon form. By Proposition 3.66 there are elementary matrices E1 , . . . , Ek with A = E1 · · · Ek A′ , so by (D8)

    det A = det E1 · · · det Ek det A′ .                                    (4.7)

Recall that the determinant of an elementary matrix is different from zero, so (4.7) shows that det A = 0 if and only if det A′ = 0. If A is invertible, then A′ = id, hence det A′ = 1 ≠ 0 and therefore also det A ≠ 0. If A is not invertible, then the last row of A′ must be zero, hence det A′ = 0 and therefore also det A = 0. In other words: A is invertible if and only if det A ≠ 0.
Next we show that the determinant is multiplicative.

(D10) For all A, B ∈ M (n × n) we have det(AB) = det A det B.

If A is invertible, then by Theorem 3.68 it is a product of elementary matrices, A = E1 · · · Ek , and applying (D8) repeatedly gives det(AB) = det E1 · · · det Ek det B = det A det B.

If on the other hand A is not invertible, then det A = 0. Moreover, the last row of A′ is zero, so also the last row of A′ B is zero, hence A′ B is not invertible and therefore det(A′ B) = 0. So we have det(AB) = 0 by (4.7), and also det(A) det(B) = 0 det(B) = 0, so also in this case det(AB) = det A det B.
Let A ∈ M (n × n). Give two proofs of det(cA) = cn det A using either one of the following:
(i) Apply (D1) or (D1’) n times.
(ii) Use that cA = diag(c, c, . . . , c)A and apply (D10) and (D6).
Theorem 4.11. Let A ∈ M (n × n). Then the following statements are equivalent:

(i) A is invertible.

(ii) For every ~b ∈ Rn , the equation A~x = ~b has exactly one solution.

(iii) The equation A~x = ~0 has exactly one solution.

(iv) Every row-reduced echelon form of A has n pivots.

(v) A is row-equivalent to idn .

(vi) det A ≠ 0.
    det ( 1 2 3 4 ; 1 3 4 6 ; 1 7 8 9 ; 1 5 3 4 )
      (1)= det ( 1 2 3 4 ; 0 1 1 2 ; 0 5 5 5 ; 0 3 0 0 )
      (2)= 5 det ( 1 2 3 4 ; 0 1 1 2 ; 0 1 1 1 ; 0 3 0 0 )
      (3)= 5 det ( 1 2 3 4 ; 0 0 1 2 ; 0 0 1 1 ; 0 3 0 0 )
      (4)= 5 det ( 1 2 3 4 ; 0 0 0 1 ; 0 0 1 1 ; 0 3 0 0 )
      (5)= −5 det ( 1 2 3 4 ; 0 3 0 0 ; 0 0 1 1 ; 0 0 0 1 )
      (6)= −5 · 1 · 3 · 1 · 1 = −15.

(1) We subtract the first row from all the other rows. The determinant does not change.
(2) We factor 5 out of the third row.
(3) We subtract 1/3 of the last row from rows 2 and 3. The determinant does not change.
(4) We subtract row 3 from row 2. The determinant does not change.
(5) We swap rows 2 and 4. This gives a factor −1.
(6) Easy calculation: the matrix is upper triangular, so (D6) applies.
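The strategy of this example, row reduce while keeping track of swaps and factors, is also how determinants are computed in practice. A small Python sketch of ours (Gaussian elimination with partial pivoting; the sign flips for every row swap):

```python
import numpy as np

def det_by_elimination(A):
    """Determinant via Gaussian elimination, tracking row swaps."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    det = 1.0
    for j in range(n):
        p = j + np.argmax(np.abs(U[j:, j]))
        if U[p, j] == 0:
            return 0.0                       # no pivot in this column
        if p != j:
            U[[j, p]] = U[[p, j]]            # row swap: determinant changes sign
            det = -det
        det *= U[j, j]
        U[j+1:] -= np.outer(U[j+1:, j] / U[j, j], U[j])   # Q-type operations: no change
    return det

A = [[1, 2, 3, 4], [1, 3, 4, 6], [1, 7, 8, 9], [1, 5, 3, 4]]
print(det_by_elimination(A), np.linalg.det(A))   # both approximately -15
```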
You should now be able to

• compute determinants using their properties,
• compute abstract determinants,
• use the factorisation of a matrix to compute its determinant,
• etc.
Area in R2
Let ~a = ( a1 ; a2 ) and ~b = ( b1 ; b2 ) be vectors in R2 and let us consider the matrix A = (~a|~b), the matrix whose columns are the given vectors. Then

    A~e1 = ~a   and   A~e2 = ~b.
That means that A transforms the unit square spanned by the unit vectors ~e1 and ~e2 into the
parallelogram spanned by the vectors ~a and ~b. Let area(~a, ~b) be the area of the parallelogram
spanned by ~a and ~b. We can view ~a and ~b as vectors in R3 simply by adding a third component.
Then formula (2.9) shows that the area of the parallelogram spanned by ~a and ~b is equal to
    | ( a1 ; a2 ; 0 ) × ( b1 ; b2 ; 0 ) | = | ( 0 ; 0 ; a1 b2 − a2 b1 ) | = |a1 b2 − a2 b1 | = | det A|,
Figure 4.1: The figure shows how the area of the unit square transforms under the linear transforma-
tion A. The area of the square on left hand side is 1, the area of the parallelogram on the right hand
side is | det A|.
area(~a, ~b) = | det A|. (4.9)
So while A tells us how the shape of the unit square changes, | det A| tells us how its area changes,
see Figure 4.1.
You should also notice the following: The area of the image of the unit square under A is zero
if and only if the two image vectors ~a and ~b are parallel. This is in accordance to the fact that
det A = 0 if and only if the two lines described by the associated linear equations are parallel (or if
one equation describes the whole plane).
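Formula (4.9) and its relation to the cross product are easy to check numerically; a small sketch of ours:

```python
import numpy as np

a = np.array([3.0, 1.0])
b = np.array([1.0, 2.0])

A = np.column_stack([a, b])          # matrix with columns a and b
area_det = abs(np.linalg.det(A))     # |det A|

a3 = np.append(a, 0.0)               # embed in R^3 to use the cross product
b3 = np.append(b, 0.0)
area_cross = np.linalg.norm(np.cross(a3, b3))

print(area_det, area_cross)          # both 5.0
```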
Volumes in R3
Let ~a = ( a1 ; a2 ; a3 ), ~b = ( b1 ; b2 ; b3 ) and ~c = ( c1 ; c2 ; c3 ) be vectors in R3 and let us consider the matrix A = (~a | ~b | ~c) whose columns are the given vectors. Then A transforms the unit cube spanned by the unit vectors ~e1 , ~e2 , ~e3 into the parallelepiped spanned by the vectors ~a, ~b and ~c. Let vol(~a, ~b, ~c) be the volume of the parallelepiped
spanned by the vectors ~a, ~b and ~c. According to formula (2.10), vol(~a, ~b, ~c) = |h~a , ~b × ~ci|. We
calculate
    |h~a , ~b × ~ci| = | h ( a1 ; a2 ; a3 ) , ( b2 c3 − b3 c2 ; b3 c1 − b1 c3 ; b1 c2 − b2 c1 ) i |
                     = |a1 (b2 c3 − b3 c2 ) − a2 (c3 b1 − b3 c1 ) + a3 (b1 c2 − b2 c1 )|
                     = | det A|,
hence

    vol(~a, ~b, ~c) = | det A|                                              (4.10)
Figure 4.2: The figure shows how the volume of the unit cube transforms under the linear transfor-
mation A. The volume of the cube on left hand side is 1, the volume of the parallelepiped on the right
hand side is | det A|.
since we recognise the second to last line as the expansion of det A along the first column. So while
A tells us how the shape of the unit cube changes, | det A| tells us how its volume changes.
You should also notice the following: The volume of the image of the unit cube under A is zero if
and only if the three image vectors lie in the same plane. We will see later that this implies that
the range of A is not all of R3 , hence A cannot be invertible. For details, see Section 6.2.
Exercise. Give two proofs of the following statements: one using the formula (4.9) and linearity of the determinant in each column, the other using geometric arguments.
(ii) Show that the area of the blue parallelogram is six times the area of the green parallelogram.

(iii) Show that the area of the blue and the red parallelogram is equal to the area of the green parallelogram.

(The accompanying figures show the parallelograms spanned by the vectors ±~v, 2~v, ±w~, 3w~ and ±~z.)
You should now have understood
• the geometric interpretation of the determinant in R2 and R3 ,
• the close relation between the determinant and the cross product in R3 and that this is the
reason why the cross product appears in the formulas for the area of a parallelogram and
the volume of a parallelepiped,
• etc.
You should now be able to

• calculate the area of a parallelogram and the volume of a parallelepiped using determinants,
• etc.
Now we want to see what happens if the k in akj and in Mkj are different.
Proposition 4.13. Let A = (aij )i,j=1,...,n ∈ M (n × n) and let k, ` ∈ {1, . . . , n} with k 6= `. Then
    Σ_{j=1}^{n} (−1)^{ℓ+j} akj det Mℓj = 0.                                 (4.12)
Proof. We build the new matrix B from A by replacing its `th row by the kth row. Then B has
two equal rows (row ` and row k), hence det B = 0. Note that the matrices A and B are equal
B A
everywhere except possibly in the `th row, so their minors along the row ` are equal: M`j = M`j
(we put superscripts A, B in order to distinguish the minors of A and of B). If we expand det B
along the `th row then we find
    0 = det B = Σ_{j=1}^{n} (−1)^{ℓ+j} bℓj det M^B_{ℓj} = Σ_{j=1}^{n} (−1)^{ℓ+j} akj det M^A_{ℓj} .
Using the cofactors Cij of A (see Definition 4.5), formulas (4.11) and (4.12) can be written as
    Σ_{j=1}^{n} (−1)^{ℓ+j} akj det M^A_{ℓj} = Σ_{j=1}^{n} akj Cℓj = det A if k = ℓ, and = 0 if k ≠ ℓ.      (4.13)
Definition 4.14. For A ∈ M (n × n) we define its adjugate matrix adj A as the transpose of its
cofactor matrix:
    adj A := ( C11 C12 · · · C1n ; C21 C22 · · · C2n ; . . . ; Cn1 Cn2 · · · Cnn )^t = ( C11 C21 · · · Cn1 ; C12 C22 · · · Cn2 ; . . . ; C1n C2n · · · Cnn ).
Theorem 4.15. Let A ∈ M (n × n) be invertible. Then

    A−1 = 1/det A adj A.                                                    (4.14)

Remark 4.16. Note that the proof of Theorem 4.15 shows that A adj A = det A idn is true for
every A ∈ M (n × n), even if it is not invertible (in this case, both sides of the formula are equal to
the zero matrix).
Formula (4.14) might look quite nice and innocent, however bear in mind that in order to calculate
A−1 with it you have to calculate one n ×n determinant and n2 determinants of the (n −1)×(n −1)
minors of A. This is a lot more than the O(n3 ) steps needed in the Gauß-Jordan elimination.
Finally, we prove Cramer’s rule for finding the solution of a linear system if the corresponding
matrix is invertible.
Theorem 4.17. Let A ∈ M (n × n) be an invertible matrix and let ~b ∈ Rn . Then the unique solution ~x = (x1 , . . . , xn )t of A~x = ~b is given by

    xj = det A~bj / det A   for j = 1, . . . , n,                           (4.15)

where A~bj is the matrix obtained from the matrix A if we replace its jth column by the vector ~b.
Proof. As usual we write Cij for the cofactors of A and Mij for its minors. Since A is invertible, we know that ~x = A−1~b = (1/det A) adj A ~b. Therefore we find for j = 1, . . . , n that

    xj = 1/det A Σ_{k=1}^{n} Ckj bk = 1/det A Σ_{k=1}^{n} (−1)^{k+j} bk det Mkj = 1/det A det A~bj .

The last equality is true because the second to last sum is the expansion of the determinant of A~bj along its jth column.
Note that, even if (4.15) might look quite nice, it involves the computation of n + 1 determinants
of n × n matrices, so it involves O((n + 1)!) steps.
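Cramer's rule is straightforward to implement; the following Python sketch of ours solves a small system with (4.15) and compares it with a library solver (fine for small n, but, as noted above, not an efficient method):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b via Cramer's rule (4.15)."""
    A = np.asarray(A, dtype=float)
    detA = np.linalg.det(A)
    x = np.empty(A.shape[0])
    for j in range(A.shape[0]):
        Aj = A.copy()
        Aj[:, j] = b                 # replace the j-th column by b
        x[j] = np.linalg.det(Aj) / detA
    return x

A = [[1., 2., 3.], [4., 5., 6.], [0., 8., 7.]]
b = [1., 2., 3.]
print(cramer_solve(A, b), np.linalg.solve(A, b))   # identical up to rounding
```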
Example 4.18. Let us calculate the inverse of the matrix A = ( 1 2 3 ; 4 5 6 ; 0 8 7 ) from Example 4.9. We already know that det A = 27. Its cofactors are

    C11 = det ( 5 6 ; 8 7 ) = −13,    C12 = − det ( 4 6 ; 0 7 ) = −28,    C13 = det ( 4 5 ; 0 8 ) = 32,
    C21 = − det ( 2 3 ; 8 7 ) = 10,   C22 = det ( 1 3 ; 0 7 ) = 7,        C23 = − det ( 1 2 ; 0 8 ) = −8,
    C31 = det ( 2 3 ; 5 6 ) = −3,     C32 = − det ( 1 3 ; 4 6 ) = 6,      C33 = det ( 1 2 ; 4 5 ) = −3.
Therefore

    A−1 = 1/det A adj A = 1/det A ( C11 C21 C31 ; C12 C22 C32 ; C13 C23 C33 ) = 1/27 ( −13 10 −3 ; −28 7 6 ; 32 −8 −3 ).
• etc.
4.5 Summary
The determinant is a function from the square matrices to the real or complex numbers. Let A = (aij )i,j=1,...,n ∈ M (n × n).
Formulas for the determinant.

    det A = Σ_{σ∈Sn} sign(σ) a1σ(1) a2σ(2) · · · anσ(n)                      (Leibniz formula)
          = Σ_{j=1}^{n} (−1)^{k+j} akj det Mkj = Σ_{j=1}^{n} akj Ckj          (Laplace expansion along the kth row)
          = Σ_{i=1}^{n} (−1)^{i+k} aik det Mik = Σ_{i=1}^{n} aik Cik          (Laplace expansion along the kth column)
• Mij are the minors of A ((n − 1) × (n − 1) matrices obtained from A by deleting row i and
column j),
Geometric interpretation
The determinant of a matrix A gives the oriented volume of the image of the unit cube under A.
• in R2 : area of parallelogram spanned by ~a and ~b = | det A|,
• in R3 : volume of parallelepiped spanned by ~a, ~b and ~c = | det A|.
4.6 Exercises
1. For the following matrices calculate the determinant. Determine whether the matrices are invertible. If they are, find their inverse matrix and the determinant of the inverse.

    A = ( 1 −2 ; 2 7 ),   B = ( −14 21 ; 12 −18 ),   D = ( 1 3 6 ; 4 1 0 ; 1 4 3 ),   E = ( 1 4 6 ; 2 1 5 ; 3 5 11 ).
2. For the following matrices calculate the determinant. Determine whether the matrices are invertible. If they are, find their inverse matrix and the determinant of the inverse.

    A = ( π 3 ; 5 2 ),   B = ( −1 2 3 ; 1 3 1 ; 4 3 2 ),   C = ( 1 2 3 0 ; 0 1 2 2 ; 1 4 0 3 ; 1 1 5 4 ).
3. Find at least four 3 × 3 matrices whose determinant is 18.
4. Use the factorisations found in Exercises 16 and 17 of Chapter 3 to calculate their determinants.
5. Write the matrix A = ( . . . ; 1 2 6 ; −2 −2 −6 ) as a product of elementary matrices and calculate the determinant of A using the elementary matrices found.
6. Determine all x ∈ R such that the following matrices are invertible:

    A = ( x 2 ; 1 x−3 ),   B = ( x x 3 ; 1 2 6 ; −2 2 −6 ),   C = ( 11−x 5 −50 ; 3 −x −15 ; 2 1 −x−9 ).
7. Suppose that a function y satisfies y[n] = bn−1 y[n−1] + · · · + b1 y′ + b0 y, where b0 , . . . , bn−1 are constant coefficients and y[j] denotes the jth derivative of y. Verify that Y′ = AY where

    A = ( 0 1 0 0 · · · 0 ; 0 0 1 0 · · · 0 ; . . . ; 0 0 0 · · · 0 1 ; b0 b1 b2 · · · bn−1 ),   Y = ( y ; y′ ; y″ ; . . . ; y[n−1] ),

and calculate the determinant of A.
8. Without using expansion formulas for determinants, find for each of the given matrices parameters x, y such that the determinant of the following matrices is equal to zero, and explain why the parameters found work.

    N1 = ( x 2 6 ; 2 5 1 ; 3 4 y ),   N2 = ( 1 x y 2 ; x 0 1 y ; x 5 3 y ; 4 x y 8 ).
    B1 = ( 0 ),   B2 = ( 0 1 ; 1 0 ),   B3 = ( 0 1 1 ; 1 0 1 ; 1 1 0 ),   B4 = ( 0 1 1 1 ; 1 0 1 1 ; 1 1 0 1 ; 1 1 1 0 ),   B5 = ( 0 1 1 1 1 ; 1 0 1 1 1 ; 1 1 0 1 1 ; 1 1 1 0 1 ; 1 1 1 1 0 ),   etc.
Vector spaces
In the following, K always denotes a field. In this chapter, you may always think of K = R,
though almost everything is true also for other fields, like C, Q or Fp where p is a prime number.
Later, in Chapter 8 it will be more useful to work with K = C.
In this chapter we will work with abstract vector spaces. We will first discuss their basic proper-
ties. Then, in Section 5.2 we will define subspaces. These are subsets of vector spaces which are
themselves vector spaces. In Section 5.4 we will introduce bases and the dimension of a vector
space. These concepts are fundamental in linear algebra since they allow us to classify all finite
dimensional vector spaces. In a certain sense, all n-dimensional vector spaces over the same field
K are equal. In Chapter 6 we will study linear maps between vector spaces.
Note that we will usually write λv instead of λ · v. Then V (or more precisely, (V, +, ·)) is called a
vector space over K if for all u, v, w ∈ V and all λ, µ ∈ K the following holds:
Remark 5.2. (i) Note that we use the notation ~v with an arrow only for the special case of
vectors in Rn or Cn . Vectors in abstract vector spaces are usually denoted without an arrow.
(ii) If K = R, then V is called a real vector space. If K = C, then V is called a complex vector
space.
Before we give examples of vector spaces, we first show some basic properties of vector spaces.
Properties 5.3. (i) The identity element is unique. (Note that in the vector space axioms we
only asked for existence of an additive identity element; we did not ask for uniqueness. So one
could think that there may be several elements which satisfy (c) in Definition 5.1. However,
this is not possible as the following proof shows.)
Proof. Assume there are two neutral elements O and O′. Then we know that for every v and w in V the following is true:

    v = v + O,    w = w + O′.

Now let us take v = O′ and w = O. Then, using commutativity, we obtain

    O′ = O′ + O = O + O′ = O.
(ii) The cancellation law holds: if x, y, z ∈ V satisfy x + y = x + z, then y = z.

Proof. Let x′ be an additive inverse of x (that means that x′ + x = O), which must exist by the vector space axioms. Then

    y = O + y = (x′ + x) + y = x′ + (x + y) = x′ + (x + z) = (x′ + x) + z = O + z = z.
(iii) For every v ∈ V , its inverse element is unique. (Note that in the vector space axioms we
only asked for existence of an additive inverse for every element x ∈ V ; we did not ask
for uniqueness. So one could think that there may be several elements which satisfy (d) in
Definition 5.1. However, this is not possible as the following proof shows.)
Proof. Let v ∈ V and assume that there are elements v′ , v″ in V such that

    v + v′ = O,    v + v″ = O.

Then v′ = v′ + O = v′ + (v + v″) = (v′ + v) + v″ = O + v″ = v″, so the additive inverse is unique.

(iv) For every λ ∈ K we have λO = O.

Proof. Observe that λO = λ(O + O) = λO + λO, hence

    λO + O = λO + λO.

Now it follows from (ii) that O = λO.
(v) For every v ∈ V we have 0v = O.

Proof. The proof is similar to the one above. Observe that 0v = 0v + O and 0v = (0 + 0)v = 0v + 0v, hence

    0v + O = 0v + 0v.

Now it follows from (ii) that O = 0v (take x = 0v, y = O and z = 0v in (ii)).
(vi) For every v ∈ V , the additive inverse of v is (−1)v.

Proof. Observe that

    O = 0v = (1 + (−1))v = v + (−1)v.

Hence (−1)v is an additive inverse of v. By (iii), the inverse of v is unique, therefore (−1)v is the inverse of v.
Remark 5.4. From now on, we write −v for the additive inverse of a vector. This notation is justified by Properties 5.3 (vi).
• R is a real vector space. More generally, Rn is a real vector space. The proof is the same
as for R2 in Chapter 2. Associativity and commutativity are clear. The identity element is
the vector whose entries are all equal to zero: ~0 = (0, . . . , 0)t . The inverse for a given vector
~x = (x1 , . . . , xn )t is (−x1 , . . . , −xn )t . The distributivity laws are clear, as is the fact that
1~x = ~x for every ~x ∈ Rn .
• C is a complex vector space. More generally, Cn is a complex space. The proof is the same
as for Rn .
• R is not a complex vector space with the usual definition of the algebraic operations. If it
was, then the vectors would be real numbers and the scalars would be complex numbers. But
then if we take 1 ∈ R and i ∈ C, then the product i1 must be a vector, that is, a real number,
which is not the case.
• R can be seen as a Q-vector space.
• For every n, m ∈ N, the space M (m × n) of all m × n matrices with real coefficients is a real
vector space.
Proof. Note that in this case the vectors are matrices. Associativity and commutativity are
easy to check. The identity element is the matrix whose entries are all equal to zero. Given
a matrix A = (aij )i=1,...,m , its (additive) inverse is the matrix −A = (−aij )i=1,...,m . The
j=1,...,n j=1,...,n
distributivity laws are clear, as is the fact that 1A = A for every A ∈ M (m × n).
• For every n, m ∈ N, the space of all m × n matrices with complex coefficients is a complex vector space.

Proof. As in the case of real matrices.
• Let C(R) be the set of all continuous functions from R to R. We define the sum of two
functions f and g in the usual way as the new function
f + g : R → R, (f + g)(x) = f (x) + g(x).
The product of a function f with a real number λ gives the new function λf defined by
λf : R → R, (λf )(x) = λf (x).
Then C(R) is a vector space with these new operations.
Proof. It is clear that these operations satisfy associativity, commutativity and distributivity
and that 1f = f for every function f ∈ C(R). The additive identity is the zero function
(the function which is constant to zero). For a given function f , its (additive) inverse is the
function −f .
• Let P (R) be the set of all polynomials. With the usual sum and products with scalars, they
form a vector space.
Prove that C is a vector space over R and that R is a vector space over Q.
Observe that the sets M (m × n) and C(R) admit more operations, for example we can multiply
functions, or we can multiply matrices or we can calculate det A for a square matrix. However, all
these operations have nothing to do with the question whether they are vector spaces or not. It
is important to note that for a vector space we only need the sum of two vectors and the product
of a scalar with vector and that they satisfy the axioms from Definition 5.1.
Examples 5.6. • Consider R2 but we change the usual sum to the new sum ⊕ defined by
    ( x ; y ) ⊕ ( a ; b ) = ( x + a ; 0 ).
With this new sum, R2 is not a vector space. The reason is that
there is no additive identity.
To see this, assume that we had an additive identity, say ( α ; β ). Then we must have

    ( x ; y ) ⊕ ( α ; β ) = ( x ; y )   for all ( x ; y ) ∈ R2 .

But the left hand side is ( x + α ; 0 ), so the second components can agree only if y = 0. Hence no ( α ; β ) can work for every ( x ; y ) ∈ R2 .
• Consider R2 but we change the usual sum to the new sum ⊕ defined by

    ( x ; y ) ⊕ ( a ; b ) = ( x + b ; y + a ).

With this new sum, R2 is not a vector space. One of the reasons is that the sum is not commutative. For example

    ( 1 ; 0 ) ⊕ ( 0 ; 1 ) = ( 1 + 1 ; 0 + 0 ) = ( 2 ; 0 ),   but   ( 0 ; 1 ) ⊕ ( 1 ; 0 ) = ( 0 + 0 ; 1 + 1 ) = ( 0 ; 2 ).
Show that there is no additive identity O which satisfies ~x ⊕ O = ~x for all ~x ∈ R2 .
• Let V = R+ = (0, ∞). We make V a real vector space with the following operations: Let x, y ∈ V and λ ∈ R. We define

    x ⊕ y = xy   and   λ ⊙ x = x^λ .

For example, compatibility of the scalar multiplication holds because

    (λµ) ⊙ v = v^{λµ} = (v^µ )^λ = λ ⊙ (v^µ ) = λ ⊙ (µ ⊙ v),

and distributivity because

    (λ + µ) ⊙ v = v^{λ+µ} = v^λ v^µ = (λ ⊙ v)(µ ⊙ v) = (λ ⊙ v) ⊕ (µ ⊙ v),

and similarly one checks the remaining vector space axioms.
• The example above can be generalised: Let f : R → (a, b) be an injective function. Then the
interval (a, b) becomes a real vector space if we define the sum of two vectors x, y ∈ (a, b) by
x ⊕ y = f (f −1 (x) + f −1 (y))
FT
λ x = f (λf −1 (x)).
Note that in the example above we had (a, b) = (0, ∞) and f = exp (that is: f (x) = ex ).
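Since the axioms for these exotic operations are easy to get wrong, a quick numerical sanity check can be reassuring. The following Python sketch (not part of the original notes; the helper names oplus and smul are ad hoc) tests a few of the axioms on random samples from V = (0, ∞).

    import math, random

    def oplus(x, y):      # vector sum on V = (0, inf): x ⊕ y = x*y
        return x * y

    def smul(lam, x):     # scalar product: λ ⊙ x = x**λ
        return x ** lam

    random.seed(0)
    for _ in range(1000):
        x, y = random.uniform(0.1, 10), random.uniform(0.1, 10)
        lam, mu = random.uniform(-3, 3), random.uniform(-3, 3)
        # (λ + µ) ⊙ x = (λ ⊙ x) ⊕ (µ ⊙ x)
        assert math.isclose(smul(lam + mu, x), oplus(smul(lam, x), smul(mu, x)), rel_tol=1e-9)
        # λ ⊙ (x ⊕ y) = (λ ⊙ x) ⊕ (λ ⊙ y)
        assert math.isclose(smul(lam, oplus(x, y)), oplus(smul(lam, x), smul(lam, y)), rel_tol=1e-9)
        # (λµ) ⊙ x = λ ⊙ (µ ⊙ x)
        assert math.isclose(smul(lam * mu, x), smul(lam, smul(mu, x)), rel_tol=1e-9)
    print("all sampled axioms hold")

Of course such a check is no proof; the actual verification is the short calculation with exponents given above.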
You should now be able to
• check if a given set with a given addition and multiplication with scalars is a vector space,
• recite the vector space axioms when woken in the middle of the night,
• etc.
5.2 Subspaces
In this section, we work mostly with real vector spaces for the sake of definiteness. However, all
the statements are also true for complex vector spaces. We only have to replace R by C and the
word real by complex everywhere.
In this section we will investigate when a subset of a given vector space is itself a vector space.
Definition 5.7. Let V be a vector space and let W ⊆ V be a subset of V . Then W is called a
subspace of V if W itself is a vector space with the sum and product with scalars inherited from V .
A subspace W is called a proper subspace if W ≠ {0} and W ≠ V .
• V always contains the following subspaces: {0} and V itself. However, they are not proper
subspaces.
• If V is a vector space, W is a subspace of V and U is a subspace of W , then U is a subspace
of V .
Prove these statements.
Remark 5.9. Let W be a subspace of a vector space V . Let O be the neutral element in V . Then
O ∈ W and it is the neutral element of W .
Proof. Since W is a vector space, it must have a neutral element OW . A priori, it is not clear that
OW = O. However, since OW ∈ W ⊂ V , we know that 0OW = O. On the other hand, since
W is a vector space, it is closed under product with scalars, so O = 0OW ∈ W . Clearly, O is a
neutral element in W . So it follows that O = OW by uniqueness of the neutral element of W , see
Properties 5.3(i).
Now assume that we are given a vector space V and in it a subset W ⊆ V and we would like to
check if W is a vector space. In principle we would have to check all seven vector space axioms
from Definition 5.1. However, if W is a subset of V , then we get some of the vector space axioms
for free. More precisely, the axioms (a), (b), (e), (f) and (g) hold automatically. For example, to
prove (b), we take two elements w1 , w2 ∈ W . They belong also to V since W ⊆ V , and therefore
they commute: w1 + w2 = w2 + w1 .
We can even show the following proposition:
Proposition 5.10. Let V be a real vector space and W ⊆ V a subset. Then W is a subspace of V if and only if the following three properties hold:
(i) W ≠ ∅.
(ii) W is closed under sums: if w1 , w2 ∈ W , then w1 + w2 ∈ W .
(iii) W is closed under products with scalars: if w ∈ W and λ ∈ R, then λw ∈ W .
Given (i), conditions (ii) and (iii) together are equivalent to the single condition:
(iv) W is closed under sums and products with scalars, that is, if we take w1 , w2 ∈ W and λ ∈ R, then λw1 + w2 ∈ W .
Proof of 5.10. Assume that W is a subspace, then clearly (ii) and (iii) hold. (i) holds because every
vector space must contain at least the additive identity O.
Now suppose that W is a subset of V such that the properties (i), (ii) and (iii) are satisfied. In
order to show that W is a subspace of V , we need to verify the vector space axioms (a) – (g) from
Definition 5.1. By assumptions (ii) and (iii), the sum and product with scalars are well defined in
W . Moreover, we already convinced ourselves that (a), (b), (e), (f) and (g) hold. Now, for the
existence of an additive identity, we take an arbitrary w ∈ W (such a w exists because W is not
empty by assumption (i)). Hence O = 0w ∈ W where O is the additive identity in V . This is then
also the additive identity in W . Finally, given w ∈ W ⊆ V , we know from Properties 5.3 (vi) that
its additive inverse is (−1)w, which, by our assumption (iii), belongs to W . So we have verified
that W satisfies all vector space axioms, so it is a vector space.
The proposition is also true if V is a complex vector space. We only have to replace R everywhere
by C.
In order to verify that a given W ⊆ V is a subspace, one only has to verify (i), (ii) and (iii) from
the preceding proposition. In order to verify that W is not empty, one typically checks if it contains
O.
The following definition is very important in many applications.
Definition 5.11. Let V be a vector space and W ⊆ V a subset. Then W is called an affine subspace if there exists a v0 ∈ V such that the set
v0 + W := {v0 + w : w ∈ W }
is a subspace of V .
Examples 5.12. Let V be a vector space. We assume that V is a real vector space, but everything
works also for a complex vector space (we only have to replace R everywhere by C.)
(vi) If W is a subspace of V , then V \ W is not a subspace. This can be easily seen if we recall that
W must contain O. But then V \ W cannot contain O, hence it cannot be a vector space.
Examples 5.13. • The set of all solutions of a homogeneous system of linear equations is a
vector space.
• The set of all solutions of an inhomogeneous system of linear equations is an affine vector
space if it is not empty.
• The set of all solutions of a homogeneous linear differential equation is a vector space.
• The set of all solutions of an inhomogeneous linear differential equation is an affine vector
space if it is not empty.
• W = {(λ, 0)t : λ ∈ R} is a subspace of R2 . This is actually a subspace of the form (iii) from Example 5.12 with ~z = (1, 0)t . Note that geometrically W is a line (it is the x-axis).
• For fixed v1 , v2 ∈ R let ~v = (v1 , v2 )t and let W = {λ~v : λ ∈ R}. Then W is a subspace of R2 .
Geometrically, W is the trivial subspace {~0} if ~v = ~0. Otherwise it is the line in R2 passing
through the origin which is parallel to the vector ~v .
• For fixed a1 , a2 , v1 , v2 ∈ R let ~a = (a1 , a2 )t and ~v = (v1 , v2 )t . Let us assume that ~v ≠ ~0 and set W = {~a + λ~v : λ ∈ R}. Then W is an affine subspace. Geometrically, W represents a line in R2 parallel to ~v which passes through the point (a1 , a2 ). Note that W is a subspace if and only if ~a and ~v are parallel.
Figure 5.2: Sketches of W = {~a + λ~v : λ ∈ R}. In the figure on the left hand side, ~a ∦ ~v , so W is an affine subspace of R2 but not a subspace. In the figure on the right hand side, ~a ∥ ~v and therefore W is a subspace of R2 .
• V = {~x ∈ R2 : ‖~x‖ ≤ 2} is not a subspace of R2 . For example, take ~z = (1, 0)t . Then ~z ∈ V , however 3~z ∉ V .
• W = {(x, y)t : x ≥ 0}. Then W is not a vector space. For example, ~z = (2, 0)t ∈ W , but (−1)~z = (−2, 0)t ∉ W . Note that geometrically W is the right half plane in R2 .
Figure 5.3: The sets V and W in the figures are not subspaces of R2 .
• For fixed a, b, c ∈ R the set W = {(x, y, z)t : ax + by + cz = 0} is a subspace of R3 .
Proof. We use Proposition 5.10 to verify that W is a subspace of R3 . Clearly, ~0 ∈ W since a · 0 + b · 0 + c · 0 = 0. Now let w~1 = (x1 , y1 , z1 )t and w~2 = (x2 , y2 , z2 )t in W and let λ ∈ R. Then w~1 + w~2 ∈ W because
a(x1 + x2 ) + b(y1 + y2 ) + c(z1 + z2 ) = (ax1 + by1 + cz1 ) + (ax2 + by2 + cz2 ) = 0 + 0 = 0.
Also λw~1 ∈ W because
a(λx1 ) + b(λy1 ) + c(λz1 ) = λ(ax1 + by1 + cz1 ) = λ · 0 = 0.
Remark. Note that W is the set of all solutions of a homogeneous linear system of equations
(one equation with three unknowns). Therefore W is a vector space by Theorem 3.21 where
it is shown that the sum and the product with a scalar of two solutions of a homogeneous
linear system is again a solution.
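The closure condition of Proposition 5.10 can also be illustrated numerically. The following Python/NumPy sketch (not part of the notes; the coefficients a, b, c and the particular solutions are an arbitrary choice) checks on random samples that λ ~w1 + ~w2 again solves the equation.

    import numpy as np

    a, b, c = 2.0, -1.0, 3.0                   # hypothetical coefficients of ax + by + cz = 0
    n = np.array([a, b, c])

    w1 = np.array([1.0, 2.0, 0.0])             # 2*1 - 1*2 + 3*0 = 0, so w1 lies in W
    w2 = np.array([0.0, 3.0, 1.0])             # 2*0 - 1*3 + 3*1 = 0, so w2 lies in W

    rng = np.random.default_rng(0)
    for _ in range(100):
        lam = rng.normal()
        v = lam * w1 + w2                      # an arbitrary combination λw1 + w2
        assert abs(n @ v) < 1e-12              # it again satisfies the equation
    print("closure under λw1 + w2 confirmed on samples")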
• For fixed a, b, c, d ∈ R with d ≠ 0 and at least one of the numbers a, b, c different from 0, the set W = {(x, y, z)t : ax + by + cz = d} is not a subspace of R3 , see Figure 5.4, but it is an affine subspace.
Proof. Let us see that W is not a vector space. Let w~1 = (x1 , y1 , z1 )t and w~2 = (x2 , y2 , z2 )t in W . Then w~1 + w~2 ∉ W because
a(x1 + x2 ) + b(y1 + y2 ) + c(z1 + z2 ) = (ax1 + by1 + cz1 ) + (ax2 + by2 + cz2 ) = d + d = 2d ≠ d
since d ≠ 0.
Figure 5.4: The green plane passes through the origin and is a subspace of R3 . The red plane does
not pass through the origin and therefore it is an affine subspace of R3 .
an arbitrary vector from the origin to a point on the plane W . (Note that W0 is the plane
described by ax + by + cz = 0.)
Note that we already showed in Corollary 3.22 that W is an affine vector space.
Remark. If a = b = c = 0, then W = ∅.
• W = {(x, x2 , x3 )t : x ∈ R}. Then W is not a vector space. For example, ~a = (1, 1, 1)t ∈ W , but 2~a = (2, 2, 2)t ∉ W .
• The set of all matrices whose first row is equal to their last row.
Examples 5.17 (Examples and non-examples of subspaces of the set of all functions from R to R). Let V be the set of all functions from R to R. Then V clearly is a real vector space. The following sets are examples of subspaces of V :
Definition 5.18. For n ∈ N0 let Pn be the set of all polynomials of degree less than or equal to n.
Proof. Clearly, the zero function belongs to Pn (it is a polynomial of degree 0). For polynomials
p, q ∈ Pn and numbers λ ∈ R we clearly have that p + q and λp are again polynomials of degree
at most n, so they belong to Pn . By Proposition 5.10, Pn is a subspace of the space of all real
functions, hence it is a vector space.
• check if a given subset of a vector space is a subspace,
• etc.
Definition 5.20. Let V be a real vector space and let v1 , . . . , vk ∈ V and α1 , . . . , αk ∈ R. Then
every vector of the form
v = α1 v1 + · · · + αk vk (5.1)
is called a linear combination of the vectors v1 , . . . , vk ∈ V .
Examples 5.21. • Let V = R3 and let ~v1 = (1, 2, 3)t , ~v2 = (4, 5, 6)t , ~a = (9, 12, 15)t , ~b = (3, 3, 3)t .
Then ~a and ~b are linear combinations of ~v1 and ~v2 because ~a = ~v1 + 2~v2 and ~b = −~v1 + ~v2 .
• Let V = M (2 × 2) and let A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, B = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}, R = \begin{pmatrix} 5 & 7 \\ −7 & 5 \end{pmatrix}, S = \begin{pmatrix} 1 & 2 \\ −2 & 3 \end{pmatrix}.
Then R is a linear combination of A and B because R = 5A + 7B. S is not a linear combination of A and B because clearly every linear combination of A and B is of the form
αA + βB = \begin{pmatrix} α & β \\ −β & α \end{pmatrix},
so it can never be equal to S since S has two different numbers on its diagonal.
Definition and Theorem 5.22. Let V be a real vector space and let v1 , . . . , vk ∈ V . Then the
set of all their possible linear combinations is denoted by
span{v1 , . . . , vk } := {α1 v1 + · · · + αk vk : α1 , . . . , αk ∈ R}.
It is a subspace of V and it is called the linear span of the vectors v1 , . . . , vk . The vectors v1 , . . . , vk
are called generators of the subspace span{v1 , . . . , vk }.
Remark. By definition, the vector space generated by the empty set is the vector space which
consists only of the zero vector, that is, span{} := {O}.
Remark. Other names for “linear span” that are commonly used, are subspace generated by
the v1 , . . . , vk or subspace spanned by the v1 , . . . , vk . Instead of span{v1 , . . . , vk } the notation
gen{v1 , . . . , vk } is used frequently. All these names and notations mean exactly the same thing.
Proof of Theorem 5.22. We have to show that W := span{v1 , . . . , vk } is a subspace of V . To this
end we use Proposition 5.10 again. Clearly, W is not empty since at least O ∈ W (we only need
to choose all the αj = 0). Now let u, w ∈ W and λ ∈ R. We have to show that λu + w ∈ W . Since
u, w ∈ W , there are real numbers α1 , . . . , αk and β1 , . . . , βk such that u = α1 v1 + · · · + αk vk and w = β1 v1 + · · · + βk vk . Then
λu + w = λ(α1 v1 + · · · + αk vk ) + β1 v1 + · · · + βk vk
= λα1 v1 + · · · + λαk vk + β1 v1 + · · · + βk vk
= (λα1 + β1 )v1 + · · · + (λαk + βk )vk
which belongs to W since it is a linear combination of the vectors v1 , . . . , vk .
span{A, B} = {αA + βB : α, β ∈ R} = { \begin{pmatrix} α & −β \\ β & α \end{pmatrix} : α, β ∈ R },
span{A, B, C} = {αA + βB + γC : α, β, γ ∈ R} = { \begin{pmatrix} α + γ & −(β + γ) \\ β + γ & α + γ \end{pmatrix} : α, β, γ ∈ R },
span{A, C} = {αA + γC : α, γ ∈ R} = { \begin{pmatrix} α + γ & −γ \\ γ & α + γ \end{pmatrix} : α, γ ∈ R }.
We see that span{A, B} = span{A, B, C} = span{A, C} (in all cases it consists of exactly those
matrices whose diagonal entries are equal and the off-diagonal entries differ by a minus sign). So
we see that neither the generators nor their number is unique.
Remark. If a vector is a linear combination of other vectors, then the coefficients in the linear
combination are not necessarily unique.
Remark 5.23. Let V be a vector space and let v1 , . . . , vn and w1 , . . . , wm be vectors in V . Then
the following are equivalent:
Examples 5.24. (i) Pn = span{1, X, X 2 , . . . , X n−1 , X n } since every vector in Pn is a polyno-
mial of the form p = αn X n + αn−1 X n−1 + · · · + α1 X + α0 , so it is a linear combination of
the polynomials X n , X n−1 , . . . , X, 1.
(iii) Let V = R3 and let ~v , ~w ∈ R3 \ {~0}.
Solution. • Let us check if q ∈ U . To this end we have to check if we can find α, β such that
q = αp1 + βp2 . Inserting the expressions for p1 , p2 , q we obtain
α+ β=2
−α − 2β = −1
α + 5β = −2.
It follows that α = 3 and β = −1 is a solution, and therefore q = 3p1 − p2 , which shows that
q ∈ U.
• Let us check if r ∈ U . To this end we have to check if we can find α, β such that r = αp1 +βp2 .
Inserting the expressions for p1 , p2 , r we obtain
α + β = 1
−α − 2β = 1
α + 5β = −3.
We see that the system is inconsistent. Therefore r is not a linear combination of p1 and p2 , hence r ∉ U .
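Deciding whether a vector lies in a given span always reduces to the consistency of a linear system, as above. A possible computational sketch in Python/NumPy (not part of the notes; the ordering of the coefficient rows is an assumption and does not affect the outcome):

    import numpy as np

    # Columns are the coefficient vectors of p1 and p2 read off from the equations above.
    P = np.array([[ 1.0,  1.0],
                  [-1.0, -2.0],
                  [ 1.0,  5.0]])
    q = np.array([2.0, -1.0, -2.0])
    r = np.array([1.0,  1.0, -3.0])

    for name, target in [("q", q), ("r", r)]:
        coeffs, _, _, _ = np.linalg.lstsq(P, target, rcond=None)
        consistent = np.allclose(P @ coeffs, target)   # exact solution exists?
        print(name, "in span{p1, p2}:", consistent, "coefficients:", coeffs)

For q this reports consistency with coefficients approximately (3, −1); for r the residual does not vanish, matching the inconsistency found by hand.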
Definition 5.26. A vector space V is called finitely generated if it has a finite set of generators.
• M (m × n) is finitely generated because it is generated by the set of all matrices which are 0 everywhere except for a 1 in exactly one entry.
• Pn is finitely generated as was shown in Example 5.24.
• Let P be the vector space of all real polynomials. Then P is not finitely generated.
Proof. Assume that P is finitely generated and let q1 , . . . , qk be a system of generators of P .
Note that the qj are polynomials. We will denote their degrees by mj = deg qj and we set
M = max{m1 , . . . , mk }. Then any linear combination of them will be a polynomial of degree at most M , no matter how we choose the coefficients. However, there are elements in P which have higher degree, for example X M +1 . Therefore q1 , . . . , qk cannot generate all of P .
Another proof using the concept of dimension will be given in Example 5.54 (f).
Later, in Lemma 5.51, we will see that every subspace of a finitely generated vector space is again
finitely generated.
Now we ask ourselves what is the least number of vectors we need in order to generate Rn . We
know that for example Rn = span{~e1 , . . . , ~en }. So in this case we have n vectors that generate
Rn . Could it be that fewer vectors are sufficient? Clearly, if we take away one of the ~ej , then the
remaining system no longer generates Rn since “one coordinate is missing”. However, could we
maybe find other vectors so that n − 1 or less vectors are enough to generate all of Rn ? The next
remark says that this is not possible.
Remark 5.28. Let ~v1 , . . . , ~vk ∈ Rn . If span{~v1 , . . . , ~vk } = Rn , then k ≥ n.
Proof. Let A = (~v1 | . . . |~vk ) be the matrix whose columns are the given vectors. We know that
there exists an invertible matrix E such that A0 = EA is in reduced echelon form (the matrix E
is the product of elementary matrices which correspond to the steps in the Gauß-Jordan process
to arrive at the reduced echelon form). Now, if k < n, then we know that A0 must have at least one row which consists of zeros only. If we can find a vector ~w such that it is transformed to ~en under the Gauß-Jordan process, then we would have that A~x = ~w is inconsistent, which means that ~w ∉ span{~v1 , . . . , ~vk }. How do we find such a vector ~w? Well, we only have to start with ~en and “do the Gauß-Jordan process backwards”. In other words, we can take ~w = E −1~en . Now if we apply the Gauß-Jordan process to the augmented matrix (A|~w), we arrive at (EA|E ~w) = (A0 |~en ) which we already know is inconsistent.
Therefore, k < n is not possible and therefore we must have that k ≥ n.
Note that the proof above is basically the same as the one in Remark 3.36. Observe that the system
of vectors ~v1 , . . . , ~vk ∈ Rn is a set of generators for Rn if and only if the equation A~y = ~b has a
solution for every ~b ∈ Rn (as above, A is the matrix whose columns are the vectors ~v1 , . . . , ~vk ).
Now we will answer the question when the coefficients of a linear combination are unique. The
following remark shows us that we have to answer this question only for the zero vector.
Remark 5.29. Let V be a vector space, let v1 , . . . , vk ∈ V and let w ∈ span{v1 , . . . , vk }. Then there are unique β1 , . . . , βk ∈ R such that
β1 v1 + · · · + βk vk = w (5.2)
if and only if the only solution of
α1 v1 + · · · + αk vk = O (5.3)
is the trivial one α1 = · · · = αk = 0.
Proof. First note that (5.3) always has at least one solution, namely α1 = · · · = αk = 0. This
solution is called the trivial solution.
Let us assume that (5.2) has two different solutions, so that there are γ1 , . . . , γk ∈ R such that for
at least one j = 1, . . . , k we have that βj 6= γj and
γ1 v1 + · · · + γk vk = w. (5.2’)
In fact, the discussion above should remind you of the relation between solutions of an inhomo-
geneous system and the solutions of its associated homogeneous system in Theorem 3.21. Note
that just as in the case of linear systems, (5.2) could have no solution. This happens if and only if w ∉ span{v1 , . . . , vk }.
If V = Rn then the remark above is exactly Theorem 3.21.
So we see that only one of the following two cases can occur: (5.4) has exactly one solution (namely
the trivial one) or it has infinitely many solutions. Note that this is analogous to the situation of
the solutions of homogeneous linear systems: They have either only the trivial solution or they have
infinitely many solutions. The following definition distinguishes between the two cases.
Definition 5.30. Let V be a vector space. The vectors v1 , . . . , vk in V are called linearly
independent if
α1 v1 + · · · + αk vk = O. (5.4)
has only the trivial solution. They are called linearly dependent if (5.4) has more than one solution.
Remark 5.31. The empty set is linearly independent since O cannot be written as a nontrivial
linear combination of vectors from the empty set.
Proof. Consider the equation αv~1 + β v~2 = ~0. This equation is equivalent to the following
system of linear equations for α and β:
α + 3β = 0
2α + 0β = 0.
We can use the Gauß-Jordan process to obtain all solutions. However, in this case we easily see that α = 0 (from the second line) and then that β = −(1/3)α = 0. Note that we could also have calculated det \begin{pmatrix} 1 & 3 \\ 2 & 0 \end{pmatrix} = −6 ≠ 0 to conclude that the homogeneous system above has only the trivial solution. Observe that the columns of the matrix are exactly the given vectors.
(iii) The vectors ~v1 = (1, 1, 1)t and ~v2 = (2, 3, 4)t ∈ R3 are linearly independent.
Proof. Consider the equation αv~1 + β v~2 = ~0. This equation is equivalent to the following
system of linear equations for α and β:
α + 2β = 0
α + 3β = 0
α + 4β = 0.
If we subtract the first equation from the second, we obtain β = 0 and then α = −2β = 0. So again, this system has only the trivial solution and therefore the vectors ~v1 and ~v2 are linearly independent.
(iv) Let ~v1 = (1, 1, 1)t , ~v2 = (−1, 2, 3)t , ~v3 = (0, 0, 1)t and ~v4 = (0, 6, 8)t ∈ R3 . Then
(a) The system {~v1 , ~v2 , ~v3 } is linearly independent.
(b) The system {~v1 , ~v2 , ~v4 } is linearly dependent.
Proof. (a) Consider the equation αv~1 + β v~2 + γ v~3 = ~0. This equation is equivalent to the
following system of linear equations for α, β and γ:
α − 1β + 0γ = 0
α + 2β + 0γ = 0
α + 3β + 1γ = 0.
We use the Gauß-Jordan process to solve the system. Note that the columns of the
matrix associated to the above system are exactly the given vectors ~v1 , ~v2 , ~v3 .
A = \begin{pmatrix} 1 & −1 & 0 \\ 1 & 2 & 0 \\ 1 & 3 & 1 \end{pmatrix} −→ \begin{pmatrix} 1 & −1 & 0 \\ 0 & 3 & 0 \\ 0 & 4 & 1 \end{pmatrix} −→ \begin{pmatrix} 1 & −1 & 0 \\ 0 & 1 & 0 \\ 0 & 4 & 1 \end{pmatrix} −→ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
Therefore the unique solution is α = β = γ = 0 and consequently the vectors ~v1 , ~v2 , ~v3
are linearly independent.
Observe that we also could have calculated det A = 3 6= 0 to conclude that the homoge-
neous system has only the trivial solution.
(b) Consider the equation αv~1 + β v~2 + δ v~4 = ~0. This equation is equivalent to the following
system of linear equations for α, β and δ:
α − 1β + 0δ = 0
α + 2β + 6δ = 0
α + 3β + 8δ = 0.
We use the Gauß-Jordan process to solve the system. Note that the columns of the
matrix associated to the above system are exactly the given vectors.
A = \begin{pmatrix} 1 & −1 & 0 \\ 1 & 2 & 6 \\ 1 & 3 & 8 \end{pmatrix} −→ \begin{pmatrix} 1 & −1 & 0 \\ 0 & 3 & 6 \\ 0 & 4 & 8 \end{pmatrix} −→ \begin{pmatrix} 1 & −1 & 0 \\ 0 & 1 & 2 \\ 0 & 1 & 2 \end{pmatrix} −→ \begin{pmatrix} 1 & −1 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix} −→ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.
So there are infinitely many solutions. If we take δ = t, then α = β = −2t. Consequently the vectors ~v1 , ~v2 , ~v4 are linearly dependent because, for example, −2~v1 − 2~v2 + ~v4 = ~0 (taking t = 1).
Observe that we also could have calculated det A = 0 to conclude that the system has infinitely many solutions.
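Both computations are easy to reproduce on a computer; the following NumPy sketch (an illustration, not part of the notes) checks the two determinants and the dependence relation found above.

    import numpy as np

    v1, v2 = np.array([1, 1, 1]), np.array([-1, 2, 3])
    v3, v4 = np.array([0, 0, 1]), np.array([0, 6, 8])

    A = np.column_stack([v1, v2, v3])
    B = np.column_stack([v1, v2, v4])
    print(np.linalg.det(A))          # 3.0  -> v1, v2, v3 linearly independent
    print(np.linalg.det(B))          # 0.0  -> v1, v2, v4 linearly dependent
    print(np.linalg.matrix_rank(B))  # 2
    print(-2 * v1 - 2 * v2 + v4)     # [0 0 0], the dependence relation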
(v) The matrices \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} and \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} are linearly independent in M (2 × 2).
(vi) The matrices A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} and C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} are linearly dependent in M (2 × 2) because A − B − C = 0.
After these examples we will proceed with some facts on linear independence. We start with the
special case when we have only two vectors.
Proposition 5.32. Let v1 , v2 be vectors in a vector space V . Then v1 , v2 are linearly dependent if
and only if one vector is a multiple of the other.
Proof. Assume that v1 , v2 are linearly dependent. Then there exist α1 , α2 ∈ R such that α1 v1 + α2 v2 = 0 and at least one of α1 and α2 is different from zero, say α1 ≠ 0. Then we have v1 + (α2 /α1 )v2 = 0, hence v1 = −(α2 /α1 )v2 .
Now assume on the other hand that, e.g., v1 is a multiple of v2 , that is v1 = λv2 for some λ ∈ R.
Then v1 − λv2 = 0 which is a nontrivial solution of α1 v1 + α2 v2 = 0 because we can take α1 = 1 6= 0
and α2 = −λ (note that λ may be zero).
The proposition above cannot be extended to the case of three or more vectors. For instance, the vectors ~a = (1, 0)t , ~b = (0, 1)t , ~c = (1, 1)t are linearly dependent because ~a + ~b − ~c = ~0, but none of them is a multiple of any of the other two vectors.
(ii) If αℓ ≠ 0, then we can solve for vℓ : vℓ = −(α1 /αℓ )v1 − · · · − (αℓ−1 /αℓ )vℓ−1 − (αℓ+1 /αℓ )vℓ+1 − · · · − (αk /αℓ )vk .
(iii) If the vectors v1 , . . . , vk ∈ V are linearly dependent, then there exist α1 , . . . , αk ∈ R such
that at least one of them is different from zero and α1 v1 + · · · + αk vk = O. But then also
α1 v1 + · · · + αk vk + 0w = O which shows that the system {v1 , . . . , vk , w} is linearly dependent.
(v) Suppose that a subsystem of v1 , . . . , vk ∈ V are linearly dependent. Then, by (iii) every
system in which it is contained, must be linearly dependent too. In particular, the original
system of vectors must be linearly dependent which contradicts our assumption. Note that
also the empty set is linearly independent by Remark 5.31.
Now we specialise to the case when V = Rn . Let us take vectors ~v1 , . . . , ~vk ∈ Rn and let us write
(~v1 | · · · |~vk ) for the n × k matrix whose columns are the vectors ~v1 , . . . , ~vk .
Lemma 5.34. With the above notation, the following statements are equivalent:
(i) The vectors ~v1 , . . . , ~vk are linearly dependent.
(ii) There exist α1 , . . . , αk , not all equal to zero, such that α1~v1 + · · · + αk~vk = ~0.
(iii) There exists a vector (α1 , . . . , αk )t ≠ ~0 such that (~v1 | · · · |~vk )(α1 , . . . , αk )t = ~0.
(iv) The homogeneous system corresponding to the matrix (~v1 | · · · |~vk ) has at least one non-trivial
(and therefore infinitely many) solutions.
Proof. (i) =⇒ (ii) is simply the definition of linear dependence. (ii) =⇒ (iii) is only rewriting the vector equation in matrix form. (iv) only says in words what the equation in (iii) means. And finally (iv) =⇒ (i) holds because every non-trivial solution of the homogeneous system associated to (~v1 | · · · |~vk ) gives a non-trivial solution of α1~v1 + · · · + αk~vk = ~0.
Since we know that a homogeneous linear system with more unknowns than equations has infinitely
many solutions, we immediately obtain the following corollary.
Corollary 5.35. Let ~v1 , . . . , ~vk be vectors in Rn .
(i) If k > n, then the vectors ~v1 , . . . , ~vk are linearly dependent.
(ii) If the vectors ~v1 , . . . , ~vk are linearly independent, then k ≤ n.
Observe that (ii) does not say that if k ≤ n, then the vectors ~v1 , . . . , ~vk are linearly independent.
It only says that they have a chance to be linearly independent whereas a system with more than
n vectors always is linearly dependent.
Now we specialise further to the case when k = n.
Theorem 5.36. Let ~v1 , . . . , ~vn be vectors in Rn . Then the following are equivalent:
Proof. The equivalence of (i) and (ii) follows from Lemma 5.34. The equivalence of (ii), (iii) and
(iv) follows from Theorem 4.11.
Theorem 5.37. Let ~v1 , . . . , ~vn be vectors in Rn and let A = (~v1 | · · · |~vn ) be the matrix whose columns are the given vectors ~v1 , . . . , ~vn . Then the following are equivalent:
(iii) det A ≠ 0.
(ii) ⇐⇒ (iii): The vectors ~v1 , . . . , ~vn generate Rn if and only if for every ~w ∈ Rn there exist numbers β1 , . . . , βn such that β1~v1 + · · · + βn~vn = ~w. In matrix form that means that A(β1 , . . . , βn )t = ~w. By Theorem 3.43 we know that this has a solution for every vector ~w if and only if A is invertible (because if we apply Gauß-Jordan to A, we must get to the identity matrix).
The proof of the preceding theorem basically goes like this: We consider the equation Aβ~ = ~w. When are the vectors ~v1 , . . . , ~vn linearly independent? – They are linearly independent if and only if for ~w = ~0 the system has only the trivial solution. This happens if and only if the reduced echelon form of A is the identity matrix. And this happens if and only if det A ≠ 0.
When do the vectors ~v1 , . . . , ~vn generate Rn ? – They do, if and only if for every given vector ~w ∈ Rn the system has at least one solution. This happens if and only if the reduced echelon form of A is the identity matrix. And this happens if and only if det A ≠ 0.
Since a square matrix A is invertible if and only if its transpose At is invertible, Theorem 5.37 leads
immediately to the following corollary.
Corollary 5.38. For a matrix A ∈ M (n × n) the following are equivalent:
(i) A is invertible.
(ii) The columns of A are linearly independent.
(iii) The rows of A are linearly independent.
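A small numerical illustration of Corollary 5.38 may help; the matrix A below is an arbitrary example chosen for this sketch (Python/NumPy), not taken from the notes.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 3.0],
                  [1.0, 0.0, 1.0]])
    print(np.linalg.det(A) != 0)            # invertibility test (det = 5 for this A)
    print(np.linalg.matrix_rank(A) == 3)    # columns linearly independent
    print(np.linalg.matrix_rank(A.T) == 3)  # rows linearly independent

All three tests agree, as the corollary predicts for any square matrix.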
Comparing coefficients, it follows that α1 +2α3 = 0, 2α1 +5α2 −11α3 = 0, −α1 +2α2 −8α3 = 0.
We write this in matrix form and apply Gauß-Jordan:
\begin{pmatrix} 1 & 0 & 2 \\ 2 & 5 & −11 \\ −1 & 2 & −8 \end{pmatrix} −→ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 5 & −15 \\ 0 & 2 & −6 \end{pmatrix} −→ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & −3 \\ 0 & 1 & −3 \end{pmatrix} −→ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & −3 \\ 0 & 0 & 0 \end{pmatrix}.
This shows that the system has non-trivial solutions (find them!) and therefore p1 , p2 and p3
are linearly dependent.
• In V = M (2 × 2) consider A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, C = \begin{pmatrix} 0 & 5 \\ 5 & 0 \end{pmatrix}. Then A, B, C are linearly dependent because A − B − (2/5)C = 0.
• In V = M (2 × 3) consider A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, B = \begin{pmatrix} 2 & 2 & 2 \\ 1 & 1 & 1 \end{pmatrix}, C = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 1 & 1 \end{pmatrix}. Then A, B, C are linearly independent.
• etc.
As before, we work mostly with real vector spaces; all statements remain true for complex vector spaces if we replace R by C and the word real by complex everywhere.
Definition 5.39. Let V be a vector space. A basis of V is a set of vectors {v1 , . . . , vn } in V which
is linearly independent and generates V .
The following remark shows that a basis is a minimal system of generators of V and at the same time a maximal system of linearly independent vectors.
span{v2 , . . . , vn } ≠ V .
Remark 5.40. By definition, the empty set is a basis of the trivial vector space {O}.
Remark 5.41. Every basis of Rn has exactly n elements. To see this note that by Corollary 5.35,
a basis can have at most n elements because otherwise it cannot be linearly independent. On the
other hand, if it had less than n elements, then, by Remark 5.28, it cannot generate Rn .
Examples 5.42. • A basis of R3 is, for example, {(1, 0, 0)t , (0, 1, 0)t , (0, 0, 1)t }. The vectors of this basis are the standard unit vectors. The basis is called the standard basis (or canonical basis) of R3 .
• The standard basis in Pn (or canonical basis in Pn ) is {1, X, X 2 , . . . , X n }.
Exercise. Verify that they form a basis of Pn .
• Let p1 = X, p2 = 2X 2 + 5X − 1, p3 = 3X 2 + X + 2. Then the system {p1 , p2 , p3 } is a basis
of P2 .
Proof. We have to show that the system is linearly independent and that it generates the space P2 . Let q = aX 2 + bX + c ∈ P2 . We want to see if there are α1 , α2 , α3 ∈ R such that
q = α1 p1 + α2 p2 + α3 p3 . If we write this equation out, we find
aX 2 + bX + c = α1 X + α2 (2X 2 + 5X − 1) + α3 (3X 2 + X + 2)
= (2α2 + 3α3 )X 2 + (α1 + 5α2 + α3 )X − α2 + 2α3 .
Comparing coefficients, we obtain the following system of linear equations for the αj :
2α2 + 3α3 = a
α1 + 5α2 + α3 = b
−α2 + 2α3 = c,
in matrix form: \begin{pmatrix} 0 & 2 & 3 \\ 1 & 5 & 1 \\ 0 & −1 & 2 \end{pmatrix} \begin{pmatrix} α1 \\ α2 \\ α3 \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}.
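The argument can be finished by observing that this coefficient matrix is invertible, so the system has a unique solution for every right hand side. A short NumPy check (an illustration, not part of the notes; the right hand side below is chosen arbitrarily):

    import numpy as np

    M = np.array([[0.0,  2.0, 3.0],
                  [1.0,  5.0, 1.0],
                  [0.0, -1.0, 2.0]])
    print(np.linalg.det(M))                     # -7.0, nonzero
    a, b, c = 1.0, -2.0, 4.0                    # an arbitrary polynomial aX^2 + bX + c
    print(np.linalg.solve(M, np.array([a, b, c])))   # the unique coefficients α1, α2, α3

Since the determinant is nonzero, every q ∈ P2 is a unique linear combination of p1 , p2 , p3 , i.e. {p1 , p2 , p3 } is a basis of P2 .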
α1 A + α2 B + α3 C + α4 D = \begin{pmatrix} α1 + α2 + α3 + α4 & α4 \\ α2 + α3 + α4 & α3 + α4 \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
Comparing entries and applying Gauß-Jordan to the corresponding augmented matrix leads to
\begin{pmatrix} 1 & 0 & 0 & 0 & a − c \\ 0 & 1 & 0 & 0 & c − d \\ 0 & 0 & 1 & 0 & d − b \\ 0 & 0 & 0 & 1 & b \end{pmatrix}.
We see that there is exactly one solution for any given M ∈ M (2 × 2). Existence of the
solution shows that the matrices A, B, C, D generate M (2 × 2) and uniqueness shows that
they are linearly independent if we choose M = 0.
The next theorem is very important. It says that if V has a basis which consists of n vectors, then
every basis consists of exactly n vectors.
Theorem 5.43. Let V be a vector space and let {v1 , . . . , vn } and {w1 , . . . , wm } be bases of V . Then n = m.
Proof. Suppose that m > n. We will show that then the vectors w1 , . . . , wm cannot be linearly
independent, hence they cannot be a basis of V . Since the vectors v1 , . . . , vn are a basis of V , every
wj can be written as a linear combination of them. Hence there exist numbers aij such that wj = aj1 v1 + aj2 v2 + · · · + ajn vn for j = 1, . . . , m. Now let c1 , . . . , cm ∈ R with c1 w1 + · · · + cm wm = O. Then
O = c1 (a11 v1 + a12 v2 + · · · + a1n vn ) + · · · + cm (am1 v1 + am2 v2 + · · · + amn vn )
= (c1 a11 + c2 a21 + · · · + cm am1 )v1 + · · · + (c1 a1n + c2 a2n + · · · + cm amn )vn .
Since the vectors v1 , . . . , vn are linearly independent, the expressions in the parentheses must be
equal to zero. So we find
Definition 5.44. • Let V be a finitely generated vector space. Then it has a basis by The-
orem 5.45 below and by Theorem 5.43 the number n of vectors needed for a basis does not
depend on the particular chosen basis. This number is called the dimension of V . It is denoted
by dim V .
• The empty set is a basis of the trivial vector space {O}, hence dim{O} = 0.
Next we show that every finitely generated vector space has a basis and therefore a well-defined
dimension.
Theorem 5.45. Let V be a vector space and assume that there are vectors w1 , . . . , wm ∈ V such
that V = span{w1 , . . . , wm }. Then the set {w1 , . . . , wm } contains a basis of V . In particular, V
has a finite basis and dim V ≤ m.
Proof. Without restriction we may assume that all vectors wj are different from O. We start with
the first vector. If V = span{w1 }, then {w1 } is a basis of V and dim V = 1. Otherwise we set
V1 := span{w1 } and we note that V1 6= V . Now we check if w2 ∈ span{w1 }. If it is, we throw it out
because in this case span{w1 } = span{w1 , w2 } so we do not need w2 to generate V . Next we check
if w3 ∈ span{w1 }. If it is, we throw it out, etc. We proceed like this until we find a vector wi2 in
our list which does not belong to span{w1 }. Such an i2 must exist because otherwise we already
had that V1 = V . Then we set V2 := span{w1 , wi2 }. If V2 = V , then we are done. Otherwise, we
proceed as before: We check if wi2 +1 ∈ V2 . If this is the case, then we can throw it out because
span{w1 , wi2 } = span{w1 , wi2 , wi2 +1 }. Then we check wi2 +2 , etc., until we find a wi3 such that
wi3 ∈
/ span{w1 , wi2 } and we set V3 := span{w1 , wi2 , wi3 }. If V3 = V , then we are done. If not, then
we repeat the process. Note that after at most m repetitions, this comes to an end. This shows
that we can extract from the system of generators a basis {w1 , wi2 , . . . , wik } of V .
The following theorem complements the preceding one.
Theorem 5.46. Let V be a finitely generated vector space. Then any system w1 , . . . , wm ∈ V of
linearly independent vectors can be completed to a basis {w1 , . . . , wm , vm+1 , . . . , vn } of V .
Proof. Note that dim V < ∞ by Theorem 5.45 and set n = dim V . It follows that n ≥ m because
we have m linearly independent vectors in V . If m = n, then w1 , . . . , wm is already a basis of V
and we are done.
If m < n, then span{w1 , . . . , wm } ≠ V and we choose an arbitrary vector vm+1 ∉ span{w1 , . . . , wm } and we define Vm+1 := span{w1 , . . . , wm , vm+1 }. Then dim Vm+1 = m + 1. If m + 1 = n,
and we define Vm+1 := span{w1 , . . . , wm , vm+1 }. Then dim Vm+1 } = m + 1. If m + 1 = n,
then necessarily Vm+1 = V and we are done. If m + 1 < n, then we choose an arbitrary vector
vm+2 ∈ V \ Vm+1 and we let Vm+2 := span{w1 , . . . , wm , vm+1 , vm+2 }. If m + 2 = n, then necessarily
Vm+2 = V and we are done. If not, we repeat the step before. Note that after n − m steps we have
found a basis {w1 , . . . , wm , vm+1 , . . . , vn } of V .
• If the set of vectors v1 , . . . vm generates the vector space V , then it is always possible to extract
a subset which is a basis of V (we need to eliminate m − n vectors).
Example 5.48. • Let A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} ∈ M (2 × 2) and suppose that we want to complete them to a basis of M (2 × 2) (it is clear that A and B are linearly independent, so this makes sense). Since dim(M (2 × 2)) = 4, we know that we need 2 more matrices. We take any matrix C ∉ span{A, B}, for example C = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. Finally we need a matrix D ∉ span{A, B, C}. We can take for example D = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. Then A, B, C, D is a basis of M (2 × 2).
Check that D ∉ span{A, B, C}.
• Suppose we are given vectors ~v1 , . . . , ~v5 ∈ R3 and we want to find a subset of them which forms a basis of R3 .
Note that a priori it is not clear that this is possible because we do not know without further
calculations that the given vectors really generate R3 . If they do not, then of course it is
impossible to extract a basis from them.
Let us start. First observe that we need 3 vectors for a basis since dim R3 = 3. So we start
with the first non-zero vector which is ~v1 . We see that ~v2 = 4~v1 , so we discard it. We keep
~v3 since ~v3 ∈
/ span{~v1 }. Next, ~v4 = ~v3 − ~v1 , so ~v4 ∈ span{~v1 , ~v3 } and we discard it. A little
calculation shows that ~v5 ∈/ span{~v1 , ~v3 }. Hence {~v1 , ~v3 , ~v5 } is a basis of R3 .
Remark 5.49. We will present a more systematic way to solve exercises of this type in
Theorem 6.34 and Remark 6.35.
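In the meantime, a computer algebra system can do the bookkeeping: the pivot columns of the reduced echelon form indicate which of the given vectors to keep. The SymPy sketch below uses hypothetical stand-in vectors (the ones from the example are not reproduced here; these merely mimic the pattern ~v2 = 4~v1 and ~v4 = ~v3 − ~v1 ).

    from sympy import Matrix

    vectors = [Matrix([1, 2, 3]), Matrix([4, 8, 12]), Matrix([1, 0, 1]),
               Matrix([0, -2, -2]), Matrix([0, 1, 0])]
    A = Matrix.hstack(*vectors)          # columns are the given vectors
    _, pivot_columns = A.rref()          # pivot columns of the reduced echelon form
    print(pivot_columns)                 # indices of the vectors to keep
    basis = [vectors[i] for i in pivot_columns]
    print(len(basis))                    # 3 here, so the five vectors generate R^3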
Theorem 5.50. Let V be a vector space with basis {v1 , . . . , vn }. Then every x ∈ V can be written in a unique way as a linear combination of the vectors v1 , . . . , vn .
If we have a vector space V and a subspace W ⊂ V , then we can ask ourselves what the relation
between their dimensions is because W itself is a vector space.
Lemma 5.51. Let V be a finitely generated vector space and let W be a subspace. Then W is
finitely generated and dim W ≤ dim V .
Proof. Let V be a finitely generated vector space with dim V = n < ∞. Let W be a subspace of
V and assume that W is not finitely generated. Then we can construct an arbitrarily large system of linearly independent vectors in W as follows. Clearly, W cannot be the trivial space, so we can
choose w1 ∈ W \ {O} and we set W1 = span{w1 }. Then W1 is a finitely generated subspace of
W , therefore W1 ( W and we can choose w2 ∈ W \ W1 . Clearly, the set {w1 , w2 } is linearly
independent. Let us set W2 = span{w1 , w2 }. Since W2 is a finitely generated subspace of W , it
follows that W2 ( W and we can choose w3 ∈ W \ W2 . Then the vectors w1 , w2 , w3 are linearly
independent and we set W3 = span{w1 , w2 , w3 }. Continuing with this procedure we can construct
subspaces W1 ⊊ W2 ⊊ · · · ⊆ W with dim Wk = k for every k. In particular, we can find a system of n + 1 linearly independent vectors in W ⊆ V which contradicts the fact that any system of more than n = dim V vectors in V must be linearly dependent, see Corollary 5.47. This also shows that any system of more than n vectors in W must be linearly dependent. Since a basis of W consists of
linearly independent vectors, it follows that dim W ≤ n = dim V .
Theorem 5.52. Let V be a finitely generated vector space and let W ⊆ V be a subspace. Then the following is true:
(i) dim W ≤ dim V .
(ii) If dim W = dim V , then W = V .
Remark 5.53. Note that (i) is true even when V is not finitely generated because dim W ≤ ∞ =
dim V whatever dim W may be. However (ii) is not true in general for infinite dimensional vector
spaces. In Example 5.54 (f) and (g) we will show that dim P = dim C(R) in spite of P 6= C(R).
(Recall that P is the set of all polynomials and that C(R) is the set of all continuous functions. So we have P ⊊ C(R).)
(b) dim M (m × n) = mn. This follows because the set of all m × n matrices Aij which have a 1 in
the ith row and jth column and all other entries are equal to zero form a basis of M (m × n)
and there are exactly mn such matrices.
(c) Let Msym (n × n) be the set of all symmetric n × n matrices. Then dim Msym (n × n) = n(n+1)/2.
To see this, let Aij be the n × n matrix with aij = aji = 1 and all other entries equal to 0. Observe that Aij = Aji . It is not hard to see that the set of all Aij with i ≤ j forms a basis of Msym (n × n). The dimension of Msym (n × n) is the number of different matrices of this type. How many of them are there? If we fix j = 1, then only i = 1 is possible. If we fix j = 2, then i = 1, 2 is possible, etc., until for j = n the allowed values for i are 1, 2, . . . , n. In total we have 1 + 2 + · · · + n = n(n+1)/2 possibilities. For example, in the case n = 2, the matrices are
A11 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, A12 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, A22 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
In the case n = 3, the matrices are
A11 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, A12 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, A13 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},
A22 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, A23 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, A33 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
Convince yourself that the Aij form a basis of Msym (n × n).
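The counting argument can also be checked by actually constructing the matrices Aij . A short NumPy sketch (an illustration, not part of the notes):

    import numpy as np

    def sym_basis(n):
        """Matrices A_ij (i <= j) with a_ij = a_ji = 1 and zeros elsewhere."""
        basis = []
        for j in range(n):
            for i in range(j + 1):
                A = np.zeros((n, n))
                A[i, j] = A[j, i] = 1.0
                basis.append(A)
        return basis

    for n in (2, 3, 4):
        B = sym_basis(n)
        print(n, len(B), n * (n + 1) // 2)   # the count matches n(n+1)/2
        # linear independence: stacking the flattened matrices gives full rank
        print(np.linalg.matrix_rank(np.array([A.flatten() for A in B])) == len(B))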
(d) Let Masym (n × n) be the set of all antisymmetric n × n matrices. Then dim Masym (n × n) = n(n−1)/2. To see this, for i ≠ j let Aij be the n × n matrix with aij = −aji = 1 and all other entries equal to 0. It is not hard to see that the set of all Aij with i < j forms a basis of Masym (n × n). How many of these matrices are there? If we fix j = 2, then only i = 1 is possible. If we fix j = 3, then i = 1, 2 is possible, etc., until for j = n the allowed values for i are 1, 2, . . . , n − 1. In total we have 1 + 2 + · · · + (n − 1) = n(n−1)/2 possibilities. For example, in the case n = 2, the only matrix is
A12 = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}.
Remark. Observe that dim Msym (n × n) + dim Masym (n × n) = n2 = dim M (n × n). This is no coincidence. Note that every n × n matrix M can be written as
M = (1/2)(M + M t ) + (1/2)(M − M t ),
where the first summand is symmetric and the second one is antisymmetric.
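A quick NumPy check of this decomposition on a random matrix (illustrative only, not part of the notes):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.normal(size=(4, 4))
    S = 0.5 * (M + M.T)        # symmetric part
    A = 0.5 * (M - M.T)        # antisymmetric part
    print(np.allclose(S, S.T), np.allclose(A, -A.T), np.allclose(S + A, M))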
(f) dim P = ∞. Recall that P is the space of all polynomials.
Proof. We know that for every n ∈ N, the space Pn is a subspace of P . Therefore for every
n ∈ N, we must have that n + 1 = dim Pn ≤ dim P . This is possible only if dim P = ∞.
(g) dim C(R) = ∞. Recall that C(R) is the space of all continuous functions.
Proof. Since P is a subspace of C(R), it follows that dim P ≤ dim(C(R)), hence dim(C(R)) =
∞.
Now we use the concept of dimension to classify all subspaces of R2 and R3 . We already know that
for example lines and planes which pass through the origin are subspaces of R3 . Now we can show
that there are no other proper subspaces.
Subspaces of R2 . Let U be a subspace of R2 . Then U must have a dimension. So we have the
following cases:
• dim U = 1. Then U is of the form U = span{~v1 } with some vector ~v1 ∈ R2 \ {~0}. Therefore
U is a line parallel to ~v1 passing through the origin.
• dim U = 2. In this case dim U = dim R2 . Hence it follows that U = R2 by Theorem 5.52 (ii).
• dim U ≥ 3 is not possible because 0 ≤ dim U ≤ dim R2 = 2.
In conclusion, the only subspaces of R2 are {~0}, lines passing through the origin and R2 itself.
• dim U = 1. Then U is of the form U = span{~v1 } with some vector ~v1 ∈ R3 \ {~0}. Therefore U is a line parallel to ~v1 passing through the origin.
• dim U = 2. Then U is of the form U = span{~v1 , ~v2 } with linearly independent vectors
~v1 , ~v2 ∈ R3 . Hence U is a plane parallel to the vectors ~v1 and ~v2 which passes through the
origin.
• dim U = 3. In this case dim U = dim R3 . Hence it follows that U = R3 by Theorem 5.52 (ii).
In conclusion, the only subspaces of R3 are {~0}, lines passing through the origin, planes passing
through the origin and R3 itself.
We conclude this section with the formal definition of lines and planes.
Definition 5.55. Let V be a vector space with dim V = n and let W ⊆ V be a subspace. Then
W is called a
• line if dim W = 1,
• plane if dim W = 2,
• hyperplane if dim W = n − 1.
You should now be able to explain
• why and how the concept of dimension helps to classify all subspaces of a given vector space,
• why a matrix A ∈ M (n × n) is invertible if and only if its columns are a basis of Rn ,
• etc.
5.5 Summary
Let V be a vector space over K and let v1 , . . . , vk ∈ V .
• The set of all linear combinations of the vectors v1 , . . . , vk is a subspace of V , called the space generated by the vectors v1 , . . . , vk or the linear span of the vectors v1 , . . . , vk . Notation: span{v1 , . . . , vk } or gen{v1 , . . . , vk }.
• The vectors v1 , . . . , vk are called linearly independent if the only solution of
α1 v1 + · · · + αk vk = O
is the trivial one α1 = · · · = αk = 0; otherwise they are called linearly dependent.
• A vector space V is called finitely generated if it has a finite set of generators. In this case, every basis of V has the same number of vectors. The number of vectors needed for a basis of a vector space V is called the dimension of V .
• For v1 , . . . , vk ∈ V , it follows that dim(span{v1 , . . . , vk }) ≤ k with equality if and only if the
vectors v1 , . . . , vk are linearly independent.
• dim{O} = 0,
• dim Rn = n, dim Cn = n,
• dim M (m × n) = mn,
• dim Msym (n × n) = n(n+1)/2,
• dim Masym (n × n) = n(n−1)/2,
• dim Pn = n + 1,
• dim P = ∞,
• dim C(R) = ∞.
For ~v1 , . . . , ~vk ∈ Rn let A = (~v1 | · · · |~vk ) be the matrix whose columns are the given vectors. Then:
• gen{~v1 , . . . , ~vk } = Rn if and only if the system A~x = ~b has at least one solution for every ~b ∈ Rn .
• The vectors ~v1 , . . . , ~vk are linearly independent if and only if the system A~x = ~0 has only the
trivial solution ~x = ~0.
• The vectors ~v1 , . . . , ~vk are a basis of Rn if and only if k = n and A is invertible.
5.6 Exercises
1. Let X be the set of all functions from R to R. Show that X, with the usual sum and product with numbers in R, is a vector space.
For the following subsets of X, decide whether they are subspaces of X.
2. Let A ∈ M (m × n) and let ~a ∈ Rk .
3. Let A ∈ M (m × n) and let ~a ∈ Rk .
Are they subspaces of Rk ?
5. Consider the set R2 with the following operations:
⊕ : R2 × R2 → R2 , (x1 , x2 )t ⊕ (y1 , y2 )t = (x1 + y1 , 0)t ,
⊙ : R × R2 → R2 , λ ⊙ (x1 , x2 )t = (λx1 , λx2 )t ,
for all x, y ∈ R2 , λ ∈ R.
(l) All non-invertible matrices.
(m) All matrices with det A = 1.
9. Show that
V = { (x1 , x2 , x3 , x4 )t : x1 + x2 − 2x3 − x4 = 0, x1 − x2 + x3 + 7x4 = 0 }
is a subspace of R4 .
is an affine subspace of R4 .
Let U be the set of all solutions of (1) and W the set of all solutions of (2). Note that they can be viewed as subsets of R3 .
12. (a) Let v1 = (1, 2)t , v2 = (−2, 5)t ∈ R2 . Write v = (3, 0)t as a linear combination of v1 and v2 .
(b) Is v = (1, 2, 5)t a linear combination of v1 = (1, 7, 2)t and v2 = (1, 5, 2)t ?
(c) Is A = \begin{pmatrix} 13 & −5 \\ 50 & 8 \end{pmatrix} a linear combination of
A1 = \begin{pmatrix} 1 & 0 \\ 2 & 2 \end{pmatrix}, A2 = \begin{pmatrix} 0 & 1 \\ −2 & 2 \end{pmatrix}, A3 = \begin{pmatrix} 2 & 1 \\ 5 & 0 \end{pmatrix}, A4 = \begin{pmatrix} 1 & −1 \\ 5 & 2 \end{pmatrix}?
13. (a) Are the vectors v1 = (1, 2, 3)t , v2 = (2, 2, 5)t , v3 = (3, 0, 1)t linearly independent in R3 ?
(b) Are the vectors v1 = (1, −2, 2)t , v2 = (1, 7, 2)t , v3 = (1, 5, 2)t linearly independent in R3 ?
(c) Are the polynomials p1 = X 2 − X + 2, p2 = X + 3, p3 = X 2 − 1 linearly independent in P2 ? Are they linearly independent in Pn for n ≥ 3?
(d) Are the matrices A1 = \begin{pmatrix} 1 & 3 & 1 \\ −2 & 2 & 3 \end{pmatrix}, A2 = \begin{pmatrix} 1 & 7 & 3 \\ 2 & −1 & 2 \end{pmatrix}, A3 = \begin{pmatrix} 1 & −1 & 0 \\ 5 & 2 & 8 \end{pmatrix} linearly independent in M (2 × 3)?
14. Let ~v1 , . . . , ~vk , ~w ∈ Rn . Suppose that ~w ≠ ~0 and that ~w is orthogonal to all the vectors ~vj . Show that ~w ∉ gen{~v1 , . . . , ~vk }. Does it follow that the system ~w, ~v1 , . . . , ~vk is linearly independent?
15. Determine whether gen{a1 , a2 , a3 , a4 } = gen{v1 , v2 , v3 } for
a1 = (0, 1, 5)t , a2 = (1, 0, 3)t , a3 = (1, 2, 13)t , a4 = (2, 1, 11)t , v1 = (5, −3, 0)t , v2 = (1, 1, 8)t , v3 = (1, −1, −2)t .
16. (a) Do the following matrices generate the space of all symmetric 2 × 2 matrices?
A1 = \begin{pmatrix} 2 & 0 \\ 0 & 7 \end{pmatrix}, A2 = \begin{pmatrix} 13 & 0 \\ 0 & 5 \end{pmatrix}, A3 = \begin{pmatrix} 0 & 3 \\ 3 & 0 \end{pmatrix}.
(b) Do the following matrices generate the space of all symmetric 2 × 2 matrices?
B1 = \begin{pmatrix} 2 & 0 \\ 0 & 7 \end{pmatrix}, B2 = \begin{pmatrix} 13 & 0 \\ 0 & 5 \end{pmatrix}, B3 = \begin{pmatrix} 0 & 3 \\ −3 & 0 \end{pmatrix}.
(c) Do the following matrices generate the space of upper triangular 2 × 2 matrices?
C1 = \begin{pmatrix} 6 & 0 \\ 0 & 7 \end{pmatrix}, C2 = \begin{pmatrix} 0 & 3 \\ 0 & 5 \end{pmatrix}, C3 = \begin{pmatrix} 10 & −7 \\ 0 & 0 \end{pmatrix}.
17. Let n ∈ N and let V be the set of all symmetric n × n matrices with the usual sum and product with λ ∈ R.
(a) Show that V is a vector space over R.
(b) Find matrices that generate V . What is the minimum number of matrices needed to generate V ?
18. Determine whether the following sets of vectors are bases of the indicated vector space.
(a) v1 = (1, 2)t , v2 = (−2, 5)t ; R2 .
(b) A = \begin{pmatrix} 1 & 3 \\ 2 & 1 \end{pmatrix}, B = \begin{pmatrix} 5 & 3 \\ 1 & 2 \end{pmatrix}, C = \begin{pmatrix} 0 & 1 \\ −2 & 2 \end{pmatrix}, D = \begin{pmatrix} 2 & 1 \\ 5 & 0 \end{pmatrix}; M (2 × 2).
(c) p1 = 1 + x, p2 = x + x2 , p3 = x2 + x3 , p4 = 1 + x + x2 + x3 ; P3 .
of the form E : ax + by + cz = d.
(c) Find a vector w ∈ R3 , different from v1 and v2 , such that gen{v1 , v2 , w} = E.
(d) Find a vector v3 ∈ R3 such that gen{v1 , v2 , v3 } = R3 .
23. Let v1 = (1, 2, 3)t , v2 = (0, 4, 1)t , v3 = (4, 2, 5)t , v4 = (2, 8, 3)t , v5 = (1, 0, 1)t .
Determine whether these vectors generate the space R3 . If they do, choose a basis of R3 from the given vectors.
24. Let C1 = \begin{pmatrix} 6 & 0 \\ 0 & 7 \end{pmatrix}, C2 = \begin{pmatrix} 6 & 3 \\ 0 & 12 \end{pmatrix}, C3 = \begin{pmatrix} 6 & −3 \\ 0 & 2 \end{pmatrix}, C4 = \begin{pmatrix} 12 & −9 \\ 0 & −1 \end{pmatrix}.
Determine whether these matrices generate the space of upper triangular 2 × 2 matrices. If they do, choose a basis from the given matrices.
25. Let p1 = x2 + 7, p2 = x + 1, p3 = 3x3 + 7x. Determine whether the polynomials p1 , p2 , p3 are linearly independent. If they are, complete them to a basis of P3 .
26. For the following sets, determine whether they are vector spaces. If they are, compute their dimension.
(c) M3 = {A ∈ M (n × n) : At = −A}.
(d) M4 = {p ∈ P5 : p(0) = 0}.
27. For the following systems of vectors in the vector space V , determine the dimension of the vector space generated by them and choose a subsystem which is a basis of that generated space. Complete this subsystem to a basis of V .
(a) V = R3 , ~v1 = (1, 2, 3)t , ~v2 = (3, 2, 7)t , ~v3 = (3, 2, 1)t .
(b) V = P4 , p1 = x3 + x, p2 = x3 − x2 + 3x, p3 = x2 + 2x − 5, p4 = x3 + 3x + 2.
(c) V = M (2 × 2), A = \begin{pmatrix} 1 & 4 \\ −2 & 5 \end{pmatrix}, B = \begin{pmatrix} 3 & 0 \\ 1 & 4 \end{pmatrix}, C = \begin{pmatrix} 0 & 12 \\ −7 & 11 \end{pmatrix}, D = \begin{pmatrix} 9 & −12 \\ 10 & 1 \end{pmatrix}.
we have wℓ ∈ gen{u1 , . . . , uk }.
(b) Let u1 , . . . , uk , w1 , . . . , wm ∈ V . Show that the following are equivalent:
(i) gen{u1 , . . . , uk } = gen{w1 , . . . , wm }.
(ii) For every j = 1, . . . , k we have uj ∈ gen{w1 , . . . , wm } and for every ℓ = 1, . . . , m
In the first section of this chapter we will define linear maps between vector spaces and discuss their
properties. These are functions which “behave well” with respect to the vector space structure. For
example, m × n matrices can be viewed as linear maps from Rn to Rm . We will prove the so-called
dimension formula for linear maps. In Section 6.2 we will study the special case of matrices. One of
the main results will be the dimension formula (6.4). In Section 6.4 we will see that, after choice of
a basis, every linear map between finite dimensional vector spaces can be represented as a matrix.
This will allow us to carry over results on matrices to the case of linear transformations.
As in previous chapters, we work with vector spaces over R or C. Recall that K always stands for
either R or C.
Other words for linear map are linear function, linear transformation or linear operator.
Remark. Note that very often one writes T x instead of T (x) when T is a linear function.
(iii) The condition (6.1) says that a linear map respects the vector space structures of its
domain and its target space.
Exercise 6.3. Let U, V be vector spaces over K (with K = R or K = C). Let us denote the set
of all linear maps from U to V by L(U, V ). Show that L(U, V ) is a vector space over K. That
means you have to show that the sum of two linear maps is a linear map, that a scalar multiple
of linear map is a linear map and that the vector space axioms hold.
Examples 6.5 (Linear maps). (a) Every matrix A ∈ M (m × n) can be identified with a linear
map Rn → Rm .
(b) Differentiation is a linear map, for example:
(i) Let C(R) be the space of all continuous functions and C 1 (R) the space of all continuously
differentiable functions. Then
T : C 1 (R) → C(R), Tf = f0
is a linear map.
Proof. First of all note that f 0 ∈ C(R) if f ∈ C 1 (R), so the map T is well-defined. Now
we want to see that it is linear. So we take f, g ∈ C 1 (R) and λ ∈ R. We find
(ii) The following maps are linear, too. Note that their action is the same as the one of T
above, but we changed the vector spaces where it acts on.
R : Pn → Pn−1 , Rf = f 0 , S : Pn → Pn , Sf = f 0 .
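Anticipating Section 6.4, such a map can be described by a matrix acting on coefficient vectors. The following NumPy sketch is only an illustration for S on P3 under one particular choice of basis (the monomials 1, X, X 2 , X 3 ); it is not the notation used later in the notes.

    import numpy as np

    # Represent p = a0 + a1 X + a2 X^2 + a3 X^3 by its coefficient vector (a0, a1, a2, a3).
    # Differentiation then becomes multiplication by this matrix.
    D = np.array([[0., 1., 0., 0.],
                  [0., 0., 2., 0.],
                  [0., 0., 0., 3.],
                  [0., 0., 0., 0.]])

    p = np.array([5., -1., 2., 4.])      # p(X) = 5 - X + 2X^2 + 4X^3
    print(D @ p)                         # [-1, 4, 12, 0], i.e. p'(X) = -1 + 4X + 12X^2
    print(np.linalg.matrix_rank(D))      # 3: the image is 3-dimensional, the kernel
                                         # (the constant polynomials) is 1-dimensional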
Proof. Clearly I is well-defined since the integral of a continuous function is again continuous.
In order to show that I is linear, we fix f, g ∈ C(R) and λ ∈ R. We find for every x ∈ R:
I(λf + g)(x) = \int_0^x (λf + g)(t) dt = \int_0^x (λf (t) + g(t)) dt = λ \int_0^x f (t) dt + \int_0^x g(t) dt
= λ(If )(x) + (Ig)(x).
Since this is true for every x, it follows that I(λf + g) = λ(If ) + (Ig).
T : M (n × n) → M (n × n), T (A) = A + At .
The next lemma shows that a linear map always maps the zero vector to the zero vector.
Lemma 6.6. Let T : U → V be a linear map. Then T O = O.
Proof. T O = T (O − O) = T O − T O = O.
(i) T is called injective (or one-to-one) if
x, y ∈ U, x 6= y =⇒ T x 6= T y.
(ii) T is called surjective if for all v ∈ V there exists at least one x ∈ U such that T x = v.
(iii) T is called bijective if it is injective and surjective.
(iv) The kernel of T (or null space of T ) is
ker(T ) := {x ∈ U : T x = 0}.
Remark 6.8. (i) Observe that ker(T ) is a subset of U , Im(T ) is a subset of V . In Proposi-
tion 6.11 we will show that they are even subspaces.
(ii) Clearly, T is injective if and only if for all x, y ∈ U the following is true:
Tx = Ty =⇒ x = y.
(iii) If T is a linear injective map, then its inverse T −1 : Im(T ) → U exists and is linear too.
Proof. (i) From Lemma 6.6, we know that O ∈ ker(T ). Assume that T is injective. Then ker(T )
cannot contain any other element, hence ker(T ) = {O}.
Now assume that ker(T ) = {O} and let x, y ∈ U with T x = T y. By Remark 6.8 it is sufficient to
show that x = y. By assumption, O = T x − T y = T (x − y), hence x − y ∈ ker(T ) = {O}. Therefore
x − y = O, which means that x = y.
(ii) follows directly from the definitions of surjectivity and the image of a linear map.
Examples 6.10 (Kernels and ranges of the linear maps from Examples 6.5).
(a) We will discuss the case of matrices at the beginning of Section 6.2.
(b) If T : C 1 (R) → C(R), T f = f 0 , then it is easy to see that the kernel of T consists exactly
of the constant functions. Moreover T is surjective because every continuous function is the derivative of another function: for every f ∈ C(R) we can set g(x) = \int_0^x f (t) dt. Then g ∈ C 1 (R) and T g = g 0 = f which shows that Im(T ) = C(R).
(c) For the integration operator in Example 6.5 (c) we have that ker(I) = {0} and Im(I) = {g ∈ C 1 (R) : g(0) = 0}. In particular, I is injective but not surjective.
Proof. First we prove the claim about the range of I. Suppose that g ∈ Im(I). Then g is of the form g(x) = \int_0^x f (t) dt for some f ∈ C(R). By the fundamental theorem of calculus, it follows that g ∈ C 1 (R) and g(0) = 0. To show the other inclusion, let g ∈ C 1 (R) with g(0) = 0. Then g is differentiable and g 0 ∈ C(R) and, again by the fundamental theorem of calculus, we have that g(x) = g(x) − g(0) = \int_0^x g 0 (t) dt, so g ∈ Im(I).
Now assume that Ig = 0. If we differentiate, we find that 0 = (Ig)0 (x) = \frac{d}{dx} \int_0^x g(t) dt = g(x)
for all x ∈ R, therefore g ≡ 0, hence ker(I) = {0}.
Proof. First we prove the claim about the range of T . Clearly, Im(T ) ⊆ Msym (n × n) because
for every A ∈ M (n × n) we have that T (A) is symmetric because (T (A))t = (A + At )t =
At + (At )t = At + A = T (A). To prove Msym (n × n) ⊆ Im(T ) we take some B ∈ Msym (n × n).
Then T ((1/2)B) = (1/2)B + ((1/2)B)t = (1/2)B + (1/2)B = B where we used that B is symmetric. In summary we showed that Im(T ) = Msym (n × n).
The claim on the kernel of T follows from ker(T ) = {A ∈ M (n × n) : A + At = 0} = Masym (n × n).
Proof. (i) By Lemma 6.6, O ∈ ker(T ). Let x, y ∈ ker(T ) and λ ∈ K. Then x + λy ∈ ker(T )
because
T (x + λy) = T x + λT y = O + λO = O.
Hence ker(T ) is a subspace of U by Proposition 5.10.
(ii) Clearly, O ∈ Im(T ). Let v, w ∈ Im(T ) and λ ∈ K. Then there exist x, y ∈ U such that T x = v and T y = w. Then v + λw = T x + λT y = T (x + λy) ∈ Im(T ).
Therefore Im(T ) is a subspace of V by Proposition 5.10.
Since we now know that ker(T ) and Im(T ) are subspaces, the following definition makes sense.
Sometimes the notations ν(T ) = dim(ker(T )) and ρ(T ) = dim(Im(T )) are used.
Proof. • First we show the claim about the image of T . We know that differentiation lowers
the degree of a polynomial by 1. Hence Im(T ) ⊆ {q ∈ P3 : deg q ≤ 2}. On the other hand,
we know that every polynomial of degree ≤ 2 is the derivative of a polynomial of degree ≤ 3.
So the claim follows.
• First we show the claim about the kernel of T . Recall that ker(T) = {p ∈ P3 : T p = 0}. So
the kernel of T are exactly those polynomials whose first derivative is 0. These are exactly
the constant polynomials, i.e., the polynomials of degree 0.
Lemma 6.13. Let T : U → V be a linear map between two vector spaces U, V and let {u1 , . . . , uk }
be a basis of U . Then Im T = span{T u1 , . . . , T uk }.
Proof. Clearly, T u1 , . . . , T uk ∈ Im T . Since the image of T is a vector space, all linear combinations
of these vectors must belong to Im T too which shows span{T u1 , . . . , T uk } ⊆ Im T . To show the
other inclusion, let y ∈ Im T . Then there is an x ∈ U such that y = T x. Let us express x as a linear combination of the vectors of the basis: x = α1 u1 + · · · + αk uk . Then we obtain
y = T x = T (α1 u1 + · · · + αk uk ) = α1 T u1 + · · · + αk T uk ∈ span{T u1 , . . . , T uk }.
(i) If the x1 , . . . , xk are linearly dependent, then y1 , . . . , yk are linearly dependent too.
(ii) If the y1 , . . . , yk are linearly independent, then x1 , . . . , xk are linearly independent too.
(iii) Suppose additionally that T invertible. Then x1 , . . . , xk are linearly independent if and only
if y1 , . . . , yk are linearly independent.
In general the implication “If x1 , . . . , xk are linearly independent, then y1 , . . . , yk are linearly
independent.” is false. Can you give an example?
Proof of Proposition 6.14. (i) Assume that the vectors x1 , . . . , xk are linearly dependent. Then
there exist λ1 , . . . , λk ∈ K such that λ1 x1 + · · · + λk xk = O and at least one λj 6= 0. But then
O = T O = T (λ1 x1 + · · · + λk xk ) = λ1 T x1 + · · · + λk T xk
= λ1 y1 + · · · + λk yk ,
(iii) Suppose that the vectors y1 , . . . , yk are linearly independent. Then so are the x1 , . . . , xk by
(i). Now suppose that x1 , . . . , xk are linearly independent. Note that T is invertible, so T −1
exists. Therefore we can apply (i) to T −1 in order to conclude that the system y1 , . . . , yk is linearly
independent. (Note that xj = T −1 yj .)
Exercise 6.15. Assume that T : U → V is an injective linear map and suppose that {u1 , . . . , uℓ } is a set of linearly independent vectors in U . Show that {T u1 , . . . , T uℓ } is a set of linearly independent vectors in V .
The following lemma is very useful and it is used in the proof of Theorem 6.4.
(i) From Lemma 6.13 we know that Im T = span{T u1 , . . . , T uk }. Therefore dim Im T ≤ k = dim U
by Theorem 5.45.
(ii) Assume that T is injective. We will show that T u1 , . . . , T uk are linearly independent. Let
α1 , . . . , αk ∈ K such that α1 T u1 + · · · + αk T uk = O. Then
O = α1 T u1 + · · · + αk T uk = T (α1 u1 + · · · + αk uk ).
Since T is injective, it follows that α1 u1 + · · · + αk uk = O, hence α1 = · · · = αk = 0 which
shows that the vectors T u1 , . . . , T uk are indeed linearly independent. Therefore they are a basis
of span{T u1 , . . . , T uk } = Im T and we conclude that dim Im T = k = dim U .
(iii) Since T is bijective, it is surjective and injective. Surjectivity means that Im T = V and
injectivity of T implies that dim Im T = dim U by (ii). In conclusion,
The previous theorem tells us for example that there is no injective linear map from R5 to R3 ; or
that there is no surjective linear map from R3 to M (2 × 2).
Remark 6.17. Proposition 6.16 is true also for dim U = ∞. In this case, (i) clearly holds whatever
dim Im(T ) may be. To prove (ii) we need to show that dim Im(T ) = ∞ if T is injective. Note that
for every n ∈ N we can find a subspace Un of U with dim Un = n and we define Tn to be the
restriction of T to Un , that is, Tn : Un → V . Since the restriction of an injective map is injective,
it follows from (ii) that dim Im(Tn ) = n. On the other hand, Im(Tn ) is a subspace of V , therefore
dim V ≥ dim Im(Tn ) = n by Theorem 5.52 and Remark 5.53. Since this is true for any n ∈ N, it
follows that dim V = ∞. The proof of (iii) is the same as in the finite dimensional case.
Theorem 6.18. Let U, V be K-vector spaces and T : U → V a linear map. Moreover, let E : U →
U , F : V → V be linear bijective maps. Then the following is true:
(i) Im(T ) = Im(T E), in particular dim(Im(T )) = dim(Im(T E)).
(ii) ker(T E) = E −1 (ker(T )) and dim(ker(T )) = dim(ker(T E)).
(iii) ker(T ) = ker(F T ), in particular dim(ker(T )) = dim(ker(F T )).
(iv) Im(F T ) = F (Im(T )) and dim(Im(T )) = dim(Im(F T )).
In summary we have
and
Proof. (i) Let v ∈ V . If v ∈ Im(T ), then there exists x ∈ U such that T x = v. Set y = E −1 x.
Then v = T x = T EE −1 x = T Ey ∈ Im(T E). On the other hand, if v ∈ Im(T E), then there exists
y ∈ U such that T Ey = v. Set x = Ey. Then v = T Ey = T x ∈ Im(T ).
(ii) To show ker(T E) = E −1 ker(T ) observe that
It follows that
E −1 : ker T → ker(T E)
is a linear bijection and therefore dim ker(T ) = dim ker(T E) by Proposition 6.16(iii) (or Remark 6.17 in
the infinite dimensional case) with E −1 as T , ker(T ) as U and ker(T E) as V .
(iii) Let x ∈ U . Then x ∈ ker(F T ) if and only if F T x = O. Since F is injective, we know that
ker(F ) = {O}, hence F T x = O if and only if T x = O. But this is equivalent to x ∈ ker(T ).
(iv) To show Im(F T ) = F Im(T ) observe that
Im(F T ) = {y ∈ V : y = F T x for some x ∈ U } = {F v : v ∈ Im(T )} = F (Im(T )).
It follows that
F : Im T → Im(F T )
is a linear bijection and therefore dim Im(T ) = dim Im(F T ) by Proposition 6.16(iii) (or Remark 6.17 in
the infinite dimensional case) with F as T , Im(T ) as U and Im(F T ) as V .
but
ker(T E) = span{ (1, 0)t },   Im(F T ) = span{ (0, 1)t }.
Draw a picture to visualise the example above, taking into account that T represents the projection
onto the x-axis and E and F are rotation by 45◦ and a “stretching” by the factor √2.
We end this section with one of the main theorems of linear algebra. In the next section we will
re-prove it for the special case when T is given by a matrix in Theorem 6.33. The theorem below
can be considered a coordinate free version of Theorem 6.33.
Theorem 6.20. Let U, V be vector spaces with dim U = n < ∞ and let T : U → V be a linear
map. Then
dim(ker(T )) + dim(Im(T )) = n. (6.4)
Proof. Let k = dim(ker(T )) and let {u1 , . . . , uk } be a basis of ker(T ). We complete it to a basis
{u1 , . . . , uk , wk+1 , . . . , wn } of U and we set W := span{wk+1 , . . . , wn }. Note that by construction
ker(T ) ∩ W = {O}, so the restriction Te := T |W : W → V is injective and therefore, by Proposition 6.16(ii),
dim Im(Te) = dim W = n − k. (6.5)
Moreover,
Im T = span{T u1 , . . . , T uk , T wk+1 , . . . , T wn } = span{T wk+1 , . . . , T wn } = span{Te wk+1 , . . . , Te wn } = Im Te,
where in the second step we used that T u1 = · · · = T uk = O and therefore they do not contribute
to the linear span and in the third step we used that T wj = Te wj for j = k + 1, . . . , n. So we
showed that Im Te = Im T , in particular their dimensions are equal and the claim follows from (6.5)
because, recalling that k = dim ker(T ),
dim(ker(T )) + dim(Im(T )) = k + dim(Im(Te)) = k + (n − k) = n.
Note that an alternative way to prove the theorem above is to first prove Theorem 6.33 for matrices
and then use the results on representations of linear maps in Section 6.4 to conclude formula (6.4).
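For readers who like to experiment, formula (6.4) is easy to check numerically for a concrete matrix. The following sketch assumes the sympy library is available; the matrix is an arbitrary illustrative choice, viewed as a linear map from R4 to R3.

```python
# Sanity check of the dimension formula (6.4) for a concrete matrix,
# viewing A as the linear map x |-> A x from R^4 to R^3.
import sympy as sp

A = sp.Matrix([[1, 2, 0, 1],
               [0, 1, 1, 1],
               [1, 3, 1, 2]])          # third row = first row + second row

dim_ker = len(A.nullspace())           # dim(ker A) = number of basis vectors of the null space
dim_im  = A.rank()                     # dim(Im A)  = rank of A

print(dim_ker, dim_im, dim_ker + dim_im)   # prints 2 2 4, and 4 = n
```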
You should now be able to
• etc.
Let A ∈ M (m × n). We already know that we can view A as a linear map from Rn to Rm . Hence
ker(A) and Im(A) and the terms injectivity and surjectivity are defined.
Strictly speaking, we should distinguish between a matrix and the linear map induced by it. So
we should write TA : Rn → Rm for the map x 7→ Ax. The reason is that if we view A directly
as a linear map then this implies that we tacitly have already chosen a basis in Rn and Rm , see
Section 6.4 for more on that. However, we will usually abuse notation and write A instead of TA .
If we view a matrix A as a linear map and at the same time as a linear system of equations, then
we obtain the following.
Remark 6.21. Let A ∈ M (m × n) and denote the columns of A by ~a1 , . . . , ~an ∈ Rm . Then the
following is true.
(ii) Im(A) = all vectors ~b such that the system A~x = ~b has a solution
= span{~a1 , . . . , ~an }.
Consequently,
Proof. All claims should be clear except maybe the second equality in (ii). This follows from
Im A = {A~x : ~x ∈ Rn } = { (~a1 | . . . |~an ) (x1 , . . . , xn )t : x1 , . . . , xn ∈ R }
= {x1~a1 + · · · + xn~an : x1 , . . . , xn ∈ R}
= span{~a1 , . . . , ~an }.
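The characterisation of Im(A) in (ii) can also be tested computationally: the system A~x = ~b is solvable exactly when appending ~b to A does not increase the rank. The following sketch assumes sympy is available; the matrix and the two right hand sides are illustrative choices.

```python
# Im(A) = span of the columns of A: A x = b is solvable exactly when b lies in that span,
# i.e. when rank(A) == rank(A | b).
import sympy as sp

A = sp.Matrix([[1, 0], [0, 1], [1, 1]])
b_good = sp.Matrix([2, 3, 5])     # = 2*column1 + 3*column2, so A x = b_good is solvable
b_bad  = sp.Matrix([1, 0, 0])     # not a combination of the columns

print(A.rank() == sp.Matrix.hstack(A, b_good).rank())   # True
print(A.rank() == sp.Matrix.hstack(A, b_bad).rank())    # False
```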
Remark. If A ∈ M (m × n) with n < m (fewer columns than rows), then A cannot be surjective.
Proof using the concept of dimension. We already saw that Im A is the linear span of its columns.
Therefore dim Im A ≤ #columns of A = n < m = dim Rm , therefore Im A ⊊ Rm .
Remark. If A ∈ M (m × n) with n > m (more columns than rows), then A cannot be injective.
Proof with Gauß-Jordan. Let A0 be the row reduced echelon form of A. Then A0 can have at
most m pivots. Since A0 has more columns than pivots, the homogeneous system A~x = ~0 has
infinitely solutions, but then also ker A contains infinitely many vectors, in particular A cannot be
injective.
Proof using the concept of dimension. We already saw that Im A is the linear span of its n columns
in Rm . Since n > m it follows that the column vectors are linearly dependent in Rm , hence A~x = ~0
has a non-trivial solution. Therefore ker A is not trivial and it follows that A is not injective.
Note that the remarks do not imply that A is surjective if m ≤ n or that A is injective if n ≤ m.
Find examples!
From Theorem 3.43 we obtain the following very important theorem for the special case m = n.
Theorem 6.24. Let A ∈ M (n × n). Then the following statements are equivalent:
(i) A is invertible.
(ii) A is injective, that is, ker A = {~0}.
(iii) A is surjective, that is, Im A = Rn .
In particular, A is injective if and only if A is surjective if and only if A is bijective.
Definition 6.25. Let A ∈ M (m × n) and let ~c1 , . . . , ~cn be the columns of A and ~r1 , . . . , ~rm be the
rows of A. We define
(i) CA := span{~c1 , . . . , ~cn } =: column space of A ⊆ Rm ,
(ii) RA := span{~r1 , . . . , ~rm } =: row space of A ⊆ Rn .
The next proposition follows immediately from the definition above and from Remark 6.21(ii).
Proposition 6.26. Let A ∈ M (m × n). Then (i) RA = CAt and (ii) CA = Im(A).
Proposition 6.27. Let A ∈ M (m × n) and let E ∈ M (n × n), F ∈ M (m × m) be invertible. Then
(i) CA = CAE ,
(ii) RA = RF A .
Proof. (i) Note that CA = Im(A) = Im(AE) = CAE , where in the first and third equality we
used Proposition 6.26, and in the second equality we used Theorem 6.4.
(ii) Recall that, if F is invertible, then F t is invertible too. With Proposition 6.26(i) and what
we already proved in (i), we obtain RF A = C(F A)t = CAt F t = CAt = RA .
We immediately obtain the following proposition.
Proof. We will only prove (i). The claim (ii) can be proved similarly (or can be deduced easily
from (i) by applying (i) to the transposed matrices). That A and B are row equivalent means that
we can transform B into A by row transformations. Since row transformations can be represented
by multiplication by elementary matrices from the left, there are elementary matrices F1 , . . . , Fk ∈
M (m × m) such that A = F1 . . . Fk B. Note that all Fj are invertible, hence F := F1 . . . Fk is
invertible and A = F B. Therefore all the claims in (i) follow from Theorem 6.4 and Proposition 6.27.
The proposition above is very useful to calculate the kernel of a matrix A: Let A0 be the reduced
row-echelon form of A. Then the proposition can be applied to A and A0 , and we find that
ker(A) = ker(A0 ).
In fact, we know this since the first chapter of this course, but back then we did not have fancy
words like “kernel” at our disposal. It says nothing else than: the solutions of a homogeneous
system do not change if we apply row transformations, which is exactly why the Gauß-Jordan
elimination works.
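This fact is also easy to verify with a computer algebra system. The following sketch assumes sympy is available; the matrix is an arbitrary illustrative choice. It computes the kernel of A and of its reduced row-echelon form A′ and shows that they coincide.

```python
# Row reduction does not change the kernel: ker(A) = ker(A'), where A' = rref(A).
import sympy as sp

A = sp.Matrix([[1, 2, 1, 0],
               [2, 4, 0, 2],
               [3, 6, 1, 2]])          # third row = first row + second row

A_rref, pivots = A.rref()              # A' and the indices of its pivot columns
print(A.nullspace())                   # basis of ker(A)
print(A_rref.nullspace())              # the same basis: ker(A) = ker(A')
```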
In Examples 6.36 and 6.37 we will calculate the kernel and range of a matrix. Now we will prove
two technical lemmas.
Proof. Let A′ be the reduced row-echelon form of A. Then there exist F1 , . . . , F` ∈ M (m × m) such
that F1 · · · F` A = A′ , and A′ has the usual reduced row-echelon shape (6.7): every non-zero row starts
with a pivot 1, every pivot column has zeros in all its other entries, and the zero rows are at the bottom.
Now clearly we can find “allowed” column transformations such that A′ is transformed into the
form A′′ . If we observe that applying column transformations is equivalent to multiplying A′ from
the right by elementary matrices, then we can find elementary matrices E1 , . . . , Ek such that
A′ E1 · · · Ek is of the form (6.6).
where the ~ej are the standard unit vectors (that is, their jth component is 1 and all other components
FT
are 0).
Proposition 6.31. Let A ∈ M (m × n) and let A′ be its reduced row-echelon form. Then
dim(Im(A)) = dim(RA ) = dim(CA ) = number of pivots of A′ .
That means:
(dimension of the range of A) = (dimension of row space) = (dimension of column space).
As an immediate consequence we obtain the following theorem which is a special case of Theo-
rem 6.20, see also Theorem 6.46.
Theorem 6.33. Let A ∈ M (m × n) and let r = dim(Im(A)). Then
dim(ker(A)) = dim(ker(A′′ )) = n − r,   dim(Im(A)) = dim(Im(A′′ )) = r,
in particular dim(ker(A)) + dim(Im(A)) = n.
Theorem 6.34. Let A ∈ M (m × n) and let A0 be its reduced row-echelon form with columns
~c1 , . . . , ~cn and ~c1 0 , . . . , ~cn 0 respectively. Assume that the pivot columns of A0 are the columns j1 <
· · · < jk . Then dim(Im(A)) = k and a basis of Im(A) is given by the columns ~cj1 , . . . , ~cjk of A.
Proof. Let E be an invertible matrix such that A = EA0 . By assumption on the pivot columns of
A0 , we know that dim(Im(A0 )) = k and that a basis of Im(A0 ) is given by the columns ~cj1 0 , . . . , ~cjk 0 .
By Theorem 6.4 it follows that dim(Im(A)) = dim(Im(A0 )) = k. Now observe that by definition of
E we have that E~c` 0 = ~c` for every ` = 1, . . . , n; in particular this is true for the pivot columns of
A0 . Moreover, since E is invertible and the vectors ~cj1 0 , . . . , ~cjk 0 are linearly independent, it follows
from Theorem 6.14 that the vectors ~cj1 , . . . , ~cjk are linearly independent. Clearly they belong to
Im(A), so we have span{~cj1 , . . . , ~cjk } ⊆ Im(A). Since both spaces have the same dimension, they
must be equal.
Remark 6.35. The theorem above can be used to determine a basis of a subspace given in the
form U = span{~v1 , . . . , ~vk } ⊆ Rm as follows: Define the matrix A = (~v1 | . . . |~vk ). Then clearly
U = Im A and we can apply Theorem 6.34 to find a basis of U .
Image of A: The pivot columns of A′ are the columns 1, 2 and 4. Therefore, by Theorem 6.34, a
basis of Im(A) is given by the columns 1, 2 and 4 of A:
Im(A) = span{ (1, 3, 0, 4)t , (1, 2, 2, 5)t , (1, 1, −1, 1)t }. (6.9)
Alternative method for calculating the image of A: We can use column manipulations of A
to obtain Im A. (If you feel more comfortable with row operations, you could apply row operations
to At and then transpose the resulting matrix again.) We find (Cj stands for “jth column of A”):
Applying the column operations C2 → C2 − C1 , C3 → C3 − 5C1 , C4 → C4 − C1 , then
C3 → C3 − 2C2 , C4 → C4 − 2C2 , then C1 → C1 + 3C2 , C4 → −(1/5)C4 , then C1 → C1 − 3C4 ,
C2 → C2 − 2C4 and finally C2 → −C2 , C3 ↔ C4 , we arrive at
Ã = ( 1 0 0 0 ; 0 1 0 0 ; 3 0 1 0 ; 4 1 1 0 ) .
It follows that
Im(A) = Im(Ã) = span{ (1, 0, 3, 4)t , (0, 1, 0, 1)t , (0, 0, 1, 1)t }. (6.9’)
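Both computations can be reproduced with a few lines of code. The following sketch assumes sympy is available; it applies Theorem 6.34 to the matrix A of this example: it row reduces A and picks out the columns of the original A that correspond to pivot columns.

```python
# Basis of Im(A) from the pivot columns of the reduced row-echelon form (Theorem 6.34).
import sympy as sp

A = sp.Matrix([[1, 1, 5, 1],
               [3, 2, 13, 1],
               [0, 2, 4, -1],
               [4, 5, 22, 1]])

_, pivots = A.rref()                      # indices of the pivot columns of A'
basis = [A.col(j) for j in pivots]        # corresponding columns of the ORIGINAL matrix A
print(pivots)                             # (0, 1, 3): columns 1, 2 and 4
print(basis)                              # (1,3,0,4)^t, (1,2,2,5)^t, (1,1,-1,1)^t
```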
p1 = x3 − x2 + 2x + 2, p2 = x3 + 2x2 + 8x + 13,
p3 = 3x3 − 6x2 − 5, p4 = 5x3 + 4x2 + 26x − 9.
Solution. First we identify P3 with R4 via ax3 + bx2 + cx + d ↔ (a, b, c, d)t . The polynomials
p1 , p2 , p3 , p4 correspond to the vectors
~v1 = (1, −1, 2, 2)t , ~v2 = (1, 2, 8, 13)t , ~v3 = (3, −6, 0, −5)t , ~v4 = (5, 4, 26, −9)t .
Now we use Remark 6.35 to find a basis of span{~v1 , ~v2 , ~v3 , ~v4 }. To this end we consider the matrix A whose
columns are the vectors ~v1 , . . . , ~v4 :
A = ( 1 1 3 5 ; −1 2 −6 4 ; 2 8 0 26 ; 2 13 −5 −9 ) .
Clearly, span{~v1 , ~v2 , ~v3 , ~v4 } = Im(A), so it suffices to find a basis of Im(A). Applying row transfor-
mations to A, we obtain
A = ( 1 1 3 5 ; −1 2 −6 4 ; 2 8 0 26 ; 2 13 −5 −9 ) −→ · · · −→ ( 1 0 4 5 ; 0 1 2 3 ; 0 0 0 0 ; 0 0 0 0 ) = A′ .
The pivot columns of A0 are the first and the second column, hence by Theorem 6.34, a basis of
Im(A) are its first and second columns, i.e. the vectors ~v1 and ~v2 .
It follows that {p1 , p2 } is a basis of span{p1 , p2 , p3 , p4 } ⊆ P3 , hence dim(span{p1 , p2 , p3 , p4 }) = 2.
Remark 6.38. Let us use the abbreviation π = span{p1 , p2 , p3 , p4 }. The calculation above actually
shows that any two vectors of p1 , p2 , p3 , p4 form a basis of π. To see this, observe that clearly any
two of them are linearly independent, hence the dimension of their generated space is 2. On the
other hand, this generated space is a subspace of π which has the same dimension 2. Therefore
they must be equal.
Remark 6.39. If we wanted to complete p1 , p2 to a basis of P3 , we have (at least) the two following
options:
(i) In order to find q3 , q4 ∈ P3 such that p1 , p2 , q3 , q4 forms a basis of P3 we can use the reduction
process that was employed to find A0 . Assume that E is an invertible matrix such that
A = EA0 . Such an E can be found by keeping track of the row operations that transform
A into A0 . Let ~ej be the standard unit vectors of R4 . Then we already know that ~v1 = E~e1
and ~v2 = E~e2 . If we set ~w3 = E~e3 and ~w4 = E~e4 , then ~v1 , ~v2 , ~w3 , ~w4 form a basis of R4 .
This is because ~e1 , . . . ,~e4 are linearly independent and E is injective. Hence E~e1 , . . . , E~e4 are
linearly independent too (by Proposition 6.14).
x1 − x2 +2x3 +2x4 = 0,
x1 +2x2 −6x3 +4x4 = 0
or, in matrix notation, P ~x = 0 where P is the 2 × 4 matrix whose rows are p1 and p2 . Since
clearly Im(P ) ⊆ R2 , it follows that dim(Im(P )) ≤ 2 and therefore dim(ker(P )) ≥ 4 − 2 = 2.
The following theorem is sometimes useful, cf. Lemma 7.27. For the definition of the orthogonal
complement see Definition 7.23.
Theorem 6.40. Let A ∈ M (m × n). Then ker(A) = (RA )⊥ .
Proof. Observe that RA = CAt = Im(At ). So we have to show that ker(A) = (Im(At ))⊥ . Recall
that ⟨Ax , y⟩ = ⟨x , At y⟩. Therefore
x ∈ ker(A) ⇐⇒ Ax = 0 ⇐⇒ Ax ⊥ Rm
⇐⇒ ⟨Ax , y⟩ = 0 for all y ∈ Rm
⇐⇒ ⟨x , At y⟩ = 0 for all y ∈ Rm ⇐⇒ x ∈ (Im(At ))⊥ .
Alternative proof of Theorem 6.40. Let ~r1 , . . . , ~rm be the rows of A. Since RA = span{~r1 , . . . , ~rm },
it suffices to show that ~x ∈ ker(A) if and only if ~x ⊥ ~rj for all j = 1, . . . , m.
By definition ~x ∈ ker(A) if and only if
~0 = A~x = ( ~r1 ; . . . ; ~rm ) ~x = ( ⟨~r1 , ~x⟩ ; . . . ; ⟨~rm , ~x⟩ ).
This is the case if and only if ⟨~rj , ~x⟩ = 0 for all j = 1, . . . , m, that is, if and only if ~x ⊥ ~rj for all
j = 1, . . . , m. (⟨· , ·⟩ denotes the inner product on Rn .)
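The statement of Theorem 6.40 can also be checked numerically: every vector in the kernel must be orthogonal to every row. The following sketch assumes sympy is available; the matrix is an illustrative choice.

```python
# ker(A) is the orthogonal complement of the row space of A (Theorem 6.40):
# every kernel vector is orthogonal to every row of A.
import sympy as sp

A = sp.Matrix([[1, 2, 0, 1],
               [0, 1, 1, 1]])

kernel = A.nullspace()                       # basis of ker(A)
rows = [A.row(i) for i in range(A.rows)]
print(all(r.dot(k) == 0 for k in kernel for r in rows))   # True
```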
In R3 we usually describe a vector by a column of three numbers, for example
~v = (3, 2, −1)t , or more generally, ~w = (x, y, z)t . (6.10)
Such columns of numbers are usually interpreted as the Cartesian coordinates of the tip of the
vector if its initial point is in the origin. So for example, we can visualise ~v as the vector which
we obtain when we move 3 units along the x-axis, 2 units along the y-axis and −1 unit along the
z-axis.
If we set ~e1 , ~e2 , ~e3 the unit vectors which are parallel to the x-, y- and z-axis, respectively, then we
can write ~v as a weighted sum of them:
~v = (3, 2, −1)t = 3~e1 + 2~e2 − ~e3 . (6.11)
So the column of numbers which we use to describe ~v in (6.10) can be seen as a convenient way to
abbreviate the sum in (6.11).
Sometimes however, it may make more sense to describe a certain vector not by its Cartesian
coordinates. For instance, think of an infinitely large chess field (this is R2 ). Then the rook moves
along the Cartesian axes while the bishop moves along the diagonals, that is along
~b1 = (1, 1)t , ~b2 = (−1, 1)t , and the knight moves in directions parallel to ~k1 = (2, 1)t , ~k2 = (1, 2)t . We
suppose that in our imaginary chess game the rook, the bishop and the knight may move in arbitrary
multiples of their directions. Suppose all three of them are situated in the origin of the field and we
want to move them to the field (3, 5). For the rook, this is very easy. It only has to move 3 steps to
the right and then 5 steps up. He would denote his movement as ~vR = (3, 5)R . The bishop cannot
do this. He can move only along the diagonals. So what does he have to do? He has to move 4
steps in the direction of ~b1 and 1 step in the direction of ~b2 . So he would denote his movement with
respect to his bishop coordinate system as ~vB = (4, 1)B . Finally the knight has to move 1/3 steps in
the direction of ~k1 and 7/3 steps in the direction of ~k2 to reach the point (3, 5). So he would denote
his movement with respect to his knight coordinate system as ~vK = (1/3, 7/3)K . See Figure 6.1.
Figure 6.1: The pictures show the point (3, 5) in “bishop” and “knight” coordinates. The vectors for the
bishop are ~b1 = (1, 1)t , ~b2 = (−1, 1)t and ~xB = (4, 1)B . The vectors for the knight are ~k1 = (2, 1)t ,
~k2 = (1, 2)t and ~xK = (1/3, 7/3)K .
Exercise. Check that ~vB = (4, 1)B = 4~b1 + 1~b2 = (3, 5)t and that ~vK = (1/3, 7/3)K = (1/3)~k1 + (7/3)~k2 = (3, 5)t .
Although the three vectors ~v , ~vB and ~vK look very different, they describe the same vector – only
from three different perspectives (the rook, the bishop and the knight perspective). We have to
remember that they have to be interpreted as linear combinations of the vectors that describe their
movements.
What we just did was to perform a change of bases in R2 : Instead of describing a point in the plane
in Cartesian coordinates, we used “bishop”- and “knight”-coordinates.
We can also go in the other direction and transform from “bishop”- or “knight”-coordinates to
Cartesian coordinates. Assume that we know that the bishop moves 3 steps in his direction ~b1 and
−2 steps in his direction ~b2 , where does he end up? In his coordinate system, he is displaced by
the vector ~u = (3, −2)B . In Cartesian coordinates this vector is
~u = (3, −2)B = 3~b1 − 2~b2 = (3, 3)t + (2, −2)t = (5, 1)t .
If we move the knight 3 steps in his direction ~k1 and −2 steps in his direction ~k2 , that is, we move
him along ~w = (3, −2)K according to his coordinate system, then in Cartesian coordinates this vector
is
~w = (3, −2)K = 3~k1 − 2~k2 = (6, 3)t + (−2, −4)t = (4, −1)t .
Can the bishop and the knight reach every point in the plane? If so, in how many ways? The
answer is yes, and they can do so in exactly one way. The reason is that for the bishop and for the
knight, their set of direction vectors each form a basis of R2 (verify this!).
Figure 6.2: The pictures show the vectors (3, −2)B and (3, −2)K .
Let us make precise the concept of change of basis. Assume we are given an ordered basis B =
{~b1 , . . . , ~bn } of Rn . If we write
~x = (x1 , . . . , xn )B t (6.12)
then we interpret it as a vector which is expressed with respect to the basis B and
~x = (x1 , . . . , xn )B t := x1~b1 + · · · + xn~bn . (6.13)
If there is no index attached to the column vector, then we interpret it as a vector with respect to the
canonical basis ~e1 , . . . ,~en of Rn . Now we want to find a way to calculate the Cartesian coordinates
(that is, those with respect to the canonical basis) if we are given a vector in B-coordinates and
vice versa.
It will turn out that the following matrix is very useful:
AB→can = (~b1 | . . . |~bn ) = matrix whose columns are the vectors of the basis B.
Indeed, for ~xB = (x1 , . . . , xn )B t we have AB→can ~xB = x1~b1 + · · · + xn~bn , that is
~x = AB→can ~xB = (y1 , . . . , yn )can t . (6.14)
The last vector (the one with the y1 , . . . , yn in it) describes the same vector as ~xB , but it does so
with respect to the standard basis of Rn . The matrix AB→can is called the transition matrix from
the basis B to the canonical basis (which explains the subscript “B → can”). The matrix is also
called the change-of-coordinates matrix.
Transition from Cartesian coordinates to representation with respect to a given basis.
Suppose we are given a vector ~x in Cartesian coordinates. How do we calculate its coordinates ~xB
with respect to the basis B?
We only need to remember that the relation between ~x and ~xB according to (6.14) is
~x = AB→can ~xB .
In this case, we know the entries of the vector ~x. So we only need to invert the matrix AB→can in
order to obtain the entries of ~xB :
~xB = A_{B→can}^{−1} ~x.
This requires of course that AB→can is invertible. But this is guaranteed by Theorem 5.37 since we
know that its columns are linearly independent. So it follows that the transition matrix from the
canonical basis to the basis B is given by
Acan→B = A_{B→can}^{−1} .
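As a small illustration, the following sketch assumes numpy is available; it builds A_{B→can} for the “bishop” basis from the chess example and converts between B-coordinates and Cartesian coordinates.

```python
# Change of coordinates in R^2: from the basis B = {b1, b2} to the canonical basis
# and back, using A_{B->can} = (b1 | b2) and A_{can->B} = A_{B->can}^{-1}.
import numpy as np

b1, b2 = np.array([1.0, 1.0]), np.array([-1.0, 1.0])   # the "bishop" basis from the text
A_B_to_can = np.column_stack([b1, b2])
A_can_to_B = np.linalg.inv(A_B_to_can)

x_B = np.array([4.0, 1.0])           # coordinates with respect to B
x   = A_B_to_can @ x_B               # Cartesian coordinates: 4*b1 + 1*b2 = (3, 5)
print(x, A_can_to_B @ x)             # [3. 5.] [4. 1.]
```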
Note that we could do this also “by hand”: We are given ~x = (y1 , . . . , yn )can t and we want to find the
entries x1 , . . . , xn of the vector ~xB which describes the same vector. That is, we need numbers
x1 , . . . , xn such that
~x = x1~b1 + · · · + xn~bn .
If we know the vectors ~b1 , . . . , ~bn , then we can write this as an n × n system of linear equations
and then solve it for x1 , . . . , xn , which of course in reality is the same as applying the inverse of the
matrix AB→can to the vector ~x = (y1 , . . . , yn )can t .
Now assume that we have two ordered bases B = {~b1 , . . . , ~bn } and C = {~c1 , . . . , ~cn } of Rn and we
are given a vector ~xB with respect to the basis B. How can we calculate its representation ~xC with
respect to the basis C? The easiest way is to use the canonical basis of Rn as an auxiliary basis.
So we first calculate the given vector ~xB with respect to the canonical basis, we call this vector ~x.
Then we go from ~x to ~xC . According to the formulas above, this is
~xC = Acan→C ~x = Acan→C AB→can ~xB .
Example 6.41. Let us go back to our example of the imaginary chess board. We have the “bishop
basis” B = {~b1 , ~b2 } where ~b1 = (1, 1)t , ~b2 = (−1, 1)t and the “knight basis” K = {~k1 , ~k2 } where ~k1 = (2, 1)t ,
~k2 = (1, 2)t . Then the transition matrices to the canonical basis are
AB→can = ( 1 −1 ; 1 1 ) ,   AK→can = ( 2 1 ; 1 2 ) .
Solution. (~x)K = AB→K ~xB = A_{K→can}^{−1} AB→can ~xB = (1/3) ( 1 −3 ; 1 3 ) ~xB .
Solution. (~y )B = AK→B ~yK = A_{B→can}^{−1} AK→can ~yK = (1/2) ( 3 3 ; −1 1 ) ~yK .
Solution. (~z)B = Acan→B ~z = (1/2) ( 1 1 ; −1 1 ) (1, 3)t = (2, 1)B .
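The transition matrices between two non-canonical bases can be checked in the same way. The following sketch assumes numpy is available; it computes A_{B→K} and A_{K→B} as above and verifies that they are inverse to each other.

```python
# Transition matrices between the "bishop" basis B, the "knight" basis K and the
# canonical basis of R^2 (Example 6.41); A_{B->K} = A_{K->can}^{-1} A_{B->can}.
import numpy as np

A_B_to_can = np.array([[1.0, -1.0], [1.0, 1.0]])
A_K_to_can = np.array([[2.0, 1.0], [1.0, 2.0]])

A_B_to_K = np.linalg.inv(A_K_to_can) @ A_B_to_can
A_K_to_B = np.linalg.inv(A_B_to_can) @ A_K_to_can

# A vector is unchanged after going B -> K -> B.
x_B = np.array([4.0, 1.0])
print(np.allclose(A_K_to_B @ (A_B_to_K @ x_B), x_B))   # True
```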
Example 6.42. Recall the example on page 94 where we had a shop that sold different types of
packages of food. Package type A contains 1 peach and 3 mangos and package type B contains 2
peaches and 1 mango. We asked two types of questions:
Question 1. If we buy a packages of type A and b packages of type B, how many peaches and
mangos will we get? We could rephrase this question so that it becomes more similar to Question
2: How many peaches and mangos do we need in order to fill a packages of type A and b packages
of type B?
Question 2. How many packages of type A and of type B do we have to buy in order to get p
peaches and m mangos?
Recall that we had the relation
a m −1 m a 1 2 −1 1 −1 2
M = , M = where M = and M = .
b p p b 3 1 5 3 −1
(6.15)
We can view these problems in two different coordinate systems. We have the “fruit basis” F =
{~p, ~m} and the “package basis” P = {~A, ~B} where
~p = (1, 0)t , ~m = (0, 1)t , ~A = (1, 3)t , ~B = (2, 1)t .
An example for the first question is: How many peaches and mangos do we need for 1 package of
type A and 3 packages of type B? The answer is 7 peaches and 6 mangos. So the point that we
want to reach is (1, 3)P in “package coordinates” and (7, 6)F in “fruit coordinates”. This is sketched
in Figure 6.3.
An example for the second question is: How many packages of type A and of type B do we have
to buy in order to obtain 5 peaches and 5 mangos? Using (6.15) we find that we need 1 package of
type A and 2 packages of type B. So the point that we want to reach is (1, 2)P in “package coordinates”
and (5, 5)F in “fruit coordinates”. This is sketched in Figure 6.4.
In the rest of this section we will apply these ideas to introduce coordinates in abstract (finitely
generated) vector spaces V with respect to a given basis. This allows us to identify in a certain
sense V with Rn or Cn for an appropriate n.
Assume we are given a real vector space V with an ordered basis B = {v1 , . . . , vn }. Given a vector
w ∈ V , we know that there are uniquely determined real numbers α1 , . . . , αn such that
w = α1 v1 + · · · + αn vn .
So, if we are given w, we can find the numbers α1 , . . . , αn . On the other hand, if we are given the
numbers α1 , . . . , αn , we can easily reconstruct the vector w (just replace in the right hand side of
the above equation). Therefore it makes sense to write
w = (α1 , . . . , αn )B t
where again the index B reminds us that the column of numbers has to be understood as the
coefficients with respect to the basis B. In this way, we identify V with Rn since every column
vector gives a vector w in V and every vector w gives one column vector in Rn . Note that if we
start with some w in V , calculate its coordinates with respect to a given basis and then go back to
V , we get back our original vector w.
Figure 6.3: How many peaches and mangos do we need to obtain 1 package of type A and 3 packages
of type B? Answer: 7 peaches and 6 mangos. Figure (a) describes the situation in the “fruit plane”
while Figure (b) describes the same situation in the “packages plane”. In both figures we see that
~A + 3~B = 7~p + 6~m.
Figure 6.4: How many packages of type A and of type B do we need to get 5 peaches and 5 mangos?
Answer: 1 package of type A and 2 packages of type B. Figure (a) describes the situation in the “fruit
plane” while Figure (b) describes the same situation in the “packages plane”. In both figures we see
that ~A + 2~B = 5~p + 5~m.
Example. Consider the vector space P2 with the three ordered bases B = {p1 , p2 , p3 }, C = {q1 , q2 , q3 } and D = {r1 , r2 , r3 } where
p1 = 1, p2 = X, p3 = X 2 , q1 = X 2 , q2 = X, q3 = 1, r1 = X 2 + 2X, r2 = 5X + 2, r3 = 1.
We want to write the polynomial π(X) = aX 2 + bX + c with respect to the given basis.
• Basis B: Clearly, π = cp1 + bp2 + ap3 , therefore π = (c, b, a)B t .
• Basis C: Clearly, π = aq1 + bq2 + cq3 , therefore π = (a, b, c)C t .
• Basis D: This requires some calculation. We need numbers α, β, γ ∈ R such that
π = (α, β, γ)D t = α r1 + β r2 + γ r3 = α X 2 + (2α + 5β) X + (2β + γ).
Comparing coefficients leads to the linear system
( 1 0 0 ; 2 5 0 ; 0 2 1 ) (α, β, γ)t = (a, b, c)t . (6.16)
Note that the columns of the matrix appearing on the left hand side are exactly the vector
representations of r1 , r2 , r3 with respect to the basis C and the column vector (a, b, c)t is exactly
the vector representation of π with respect to the basis C! The solution of the system is
α = a,   β = −(2/5) a + (1/5) b,   γ = (4/5) a − (2/5) b + c,
therefore
π = ( a , −(2/5) a + (1/5) b , (4/5) a − (2/5) b + c )D t .
We could have found the solution also by doing a detour through R3 as follows: We identify the
vectors q1 , q2 , q3 with the canonical basis vectors ~e1 , ~e2 , ~e3 of R3 . Then the vectors r1 , r2 , r3
and π correspond to
~r1′ = (1, 2, 0)t , ~r2′ = (0, 5, 2)t , ~r3′ = (0, 0, 1)t , ~π′ = (a, b, c)t .
Let R = {~r1′ , ~r2′ , ~r3′ }. In order to find the coordinates of ~π′ with respect to the basis R,
we note that
~π′ = AR→can ~π′R
where AR→can is the transition matrix from the basis R to the canonical basis of R3 whose
columns consist of the vectors ~r1′ , ~r2′ , ~r3′ . So we see that this is exactly the same equation as
the one in (6.16).
Example. Let R = ( 1 1 ; 1 1 ) , S = ( 1 0 ; 0 3 ) , T = ( 0 1 ; 1 0 ) and Z = ( 2 3 ; 3 0 ) ∈ Msym (2 × 2).
(i) Show that B = {R, S, T } is a basis of Msym (2 × 2) (the space of all symmetric 2 × 2 matrices).
(ii) Write Z in terms of the basis B.
Solution. (i) Clearly, R, S, T ∈ Msym (2 × 2). Since we already know that dim Msym (2 × 2) = 3,
it suffices to show that R, S, T are linearly independent. So let us consider the equation
0 = αR + βS + γT = ( α + β   α + γ ; α + γ   α + 3β ) .
Comparing the entries, this is equivalent to the linear system A (α, β, γ)t = ~0 with
A = ( 1 1 0 ; 1 0 1 ; 1 3 0 ) , (6.18)
and since A is invertible, this system has only the trivial solution α = β = γ = 0.
(ii) We need α, β, γ ∈ R such that Z = αR + βS + γT , that is, A (α, β, γ)t = (2, 3, 0)t . Therefore
(α, β, γ)t = A−1 (2, 3, 0)t = (1/2) ( 3 0 −1 ; −1 0 1 ; −3 2 1 ) (2, 3, 0)t = (3, −1, 0)t ,
hence Z = 3R − S = (3, −1, 0)B t .
Now we give an alternative solution (which is essentially the same as the above) doing a detour
through R3 . Let C = {A1 , A2 , A3 } where A1 = ( 1 0 ; 0 0 ) , A2 = ( 0 1 ; 1 0 ) , A3 = ( 0 0 ; 0 1 ) . This is
clearly a basis of Msym (2 × 2). We identify it with the standard basis ~e1 , ~e2 , ~e3 of R3 . Then the
vectors R, S, T in this basis look like
R′ = (1, 1, 1)t , S′ = (1, 0, 3)t , T′ = (0, 1, 0)t and Z′ = (2, 3, 0)t .
(i) In order to show that R, S, T are linearly independent, we only have to show that the vectors
R0 , S 0 and T 0 are linearly independent in R3 . To this end, we consider the matrix A whose
columns are these vectors. Note that this is the same matrix that appeared in (6.18). It is
easy to show that this matrix is invertible (we already calculated its inverse!). Therefore the
vectors R0 , S 0 , T 0 are linearly independent in R3 , hence R, S, T are linearly independent in
Msym (2 × 2).
(ii) Now in order to find the representation of Z in terms of the basis B, we only need to find the
representation of Z 0 in terms of the basis B 0 = {R0 , S 0 , T 0 }. This is done as follows:
Z′B′ = Acan→B′ Z′ = A−1 (2, 3, 0)t = (3, −1, 0)t .
6.4 Linear maps and their matrix representations
Let U, V be K-vector spaces and let T : U → V be a linear map. Recall that T satisfies
T (λ1 x1 + · · · + λk xk ) = λ1 T (x1 ) + · · · + λk T (xk )
for all x1 , . . . , xk ∈ U and λ1 , . . . , λk ∈ K. This shows that in order to know T , it is in reality
enough to know how T acts on a basis of U . Suppose that we are given a basis B = {u1 , . . . , un } of U
and take an arbitrary vector w ∈ U . Then there exist uniquely determined λ1 , . . . , λn ∈ K such
that w = λ1 u1 + · · · + λn un . Hence
T w = T (λ1 u1 + · · · + λn un ) = λ1 T u1 + · · · + λn T un . (6.19)
So T w is a linear combination of the vectors T u1 , . . . , T un ∈ V and the coefficients are exactly the
λ1 , . . . , λ n .
Suppose we are given a basis C = {v1 , . . . , vm } of V . Then we know that for every j = 1, . . . , n, the
vector T uj is a linear combination of the basis vectors v1 , . . . , vm of V . Therefore there exist uniquely
determined numbers aij ∈ K (i = 1, . . . , m, j = 1, . . . , n) such that T uj = a1j v1 + · · · + amj vm , that
is,
AT := ( a11 a12 · · · a1n ; a21 a22 · · · a2n ; · · · ; am1 am2 · · · amn ) . (6.20)
Note that the first column of AT is the vector representation of T u1 with respect to the basis
v1 , . . . , vm , the second column is the vector representation of T u2 , and so on.
Now let us come back to the calculation of T w and its connection with the matrix AT . From (6.19)
and (6.20) we obtain
T w = λ1 T u1 + λ2 T u2 + · · · + λn T un
= (a11 λ1 + a12 λ2 + · · · + a1n λn ) v1 + (a21 λ1 + a22 λ2 + · · · + a2n λn ) v2 + · · ·
+ (am1 λ1 + am2 λ2 + · · · + amn λn ) vm .
The calculation shows that for every k the coefficient of vk is the kth component of the vector AT ~λ!
Now we can go one step further. Recall that the choice of the basis B of U and the basis C of V
allows us to write w and T w as a column vectors:
λ1 a11 λ1 + a12 λ2 + · · · + a1n λn
λ2 a21 λ1 + a22 λ2 + · · · + a2n λn
w=w ~B . , Tw = .
..
RA
.. .
λ1 B
am1 λ1 + am2 λ2 + · · · + amn λn C
Very important remark. This identification of m×n-matrices with linear maps U → V depends
on the choice of the basis! See Example 6.47.
Theorem 6.45. Let U, V be finite dimensional vector spaces, let B = {u1 , . . . , un } be an ordered
basis of U and let C = {v1 , . . . , vm } be an ordered basis of V . Then the following is true:
(i) For every linear map T : U → V there is a unique matrix AT ∈ M (m × n) such that
(T w)C = AT ~wB for every w ∈ U ,
where (T w)C is the representation of T w ∈ V with respect to the basis C and ~wB is the
representation of w ∈ U with respect to the basis B. The entries aij of AT can be calculated
as in (6.20).
(ii) Conversely, every matrix A ∈ M (m × n) defines a linear map TA : U → V by (TA w)C := A ~wB .
(iii) T = TAT and A = ATA . That means: If we start with a linear map T : U → V , calculate its
matrix representation AT and then the linear map TAT : U → V induced by AT , then we get
back our original map T . If on the other hand we start with a matrix A ∈ M (m×n), calculate
the linear map TA : U → V induced by A and then calculate its matrix representation ATA ,
then we get back our original matrix A.
Proof. We already showed (i) and (ii) in the text before the theorem. To see (iii), let us start with a
linear transformation T : U → V and let AT = (aij ) be the matrix representation of T with respect
to the bases B and C. For TAT , the linear map induced by AT , it follows that
TAT uj = a1j v1 + · · · + amj vm = T uj ,   j = 1, . . . , n.
Since this is true for all basis vectors and both T and TAT are linear, they must be equal.
If on the other hand we are given a matrix A = (aij ) i=1,...,m; j=1,...,n ∈ M (m × n), then we have that the
linear transformation TA induced by A acts on the basis vectors u1 , . . . , un as follows:
TA uj = a1j v1 + · · · + amj vm .
But then, by definition of the matrix representation ATA of TA , it follows that ATA = A.
Let us see this identification of matrices with linear transformations a bit more formally. By
choosing a basis B = {u1 , . . . , un } in U and thereby identifying U with Rn , we are in reality defining
a linear bijection
Ψ : U → Rn ,   Ψ(λ1 u1 + · · · + λn un ) = (λ1 , . . . , λn )t .
Recall that we denoted the vector on the right hand side by ~uB .
The same happens if we choose a basis C = {v1 , . . . , vm } of V . We obtain a linear bijection
Φ : V → Rm ,   Φ(µ1 v1 + · · · + µm vm ) = (µ1 , . . . , µm )t .
The maps Ψ and Φ “translate” the spaces U and V to Rn and Rm where the chosen bases serve
as “dictionary”. Thereby they “translate” linear maps T : U → V to matrices A ∈ M (m × n) and
vice versa. In a diagram this looks like this:
T
U V
Ψ Φ
AT
Rn Rm
So in order to go from U to V , we can take the detour through Rn and Rm . The diagram above is
called commutative diagram. That means that it does not matter which path we take to go from
one corner of the diagram to another one as long as we move in the directions of the arrows. Note
that in this case we are even allowed to go in the opposite directions of the arrows representing Ψ
and Φ because they are bijections.
What is the use of a matrix representation of a linear map? Sometimes calculations are easier in
the world of matrices. For example, we know how to calculate the range and the kernel of a matrix.
Therefore, using Theorem 6.45:
• If we want to calculate Im T , we only need to calculate Im AT and then use Φ to “translate
back” to the range of T . In formula: Im T = Im(Φ−1 AT Ψ) = Im(Φ−1 AT ) = Φ−1 (Im AT ).
• If we want to calculate ker T , we only need to calculate ker AT and then use Ψ to “translate
back” to the kernel of T . In formula: ker T = ker(Φ−1 AT Ψ) = ker(AT Ψ) = Ψ−1 (ker AT ).
• If dim U = dim V , i.e., if n = m, then T is invertible if and only if AT is invertible. This is
the case if and only if det AT 6= 0.
Let us summarise. From Theorem 6.24 we obtain again the following very important theorem, see
Theorem 6.20 and Proposition 6.16.
Theorem 6.46. Let U, V be vector spaces and let T : U → V be a linear transformation. Then
dim(ker(T )) + dim(Im(T )) = dim(U ).
Example 6.47. Consider the map T : P3 → P3 , T p = p′ (differentiation of polynomials).
(i) Represent T with respect to the basis B = {p1 , p2 , p3 , p4 } and find its kernel where p1 =
1, p2 = X, p3 = X 2 , p4 = X 3 .
Solution. We only need to evaluate T in the elements of the basis and then write the re-
sult again as linear combination of the basis. Since in this case, the bases are “easy”, the
calculations are fairly simple:
T p1 = 0, T p2 = 1 = p1 , T p3 = 2X = 2p2 , T p4 = 3X 2 = 3p3 .
Therefore the matrix representation of T with respect to the basis B is
A_T^B = ( 0 1 0 0 ; 0 0 2 0 ; 0 0 0 3 ; 0 0 0 0 )
and ker T = span{p1 } = span{1}.
(ii) Represent T with respect to the basis C = {q1 , q2 , q3 , q4 } and find its kernel where q1 =
X 3 , q2 = X 2 , q3 = X, q4 = 1.
Solution. Again we only need to evaluate T in the elements of the basis and then write the
result as linear combination of the basis.
T q1 = 3X 2 = 3q2 , T q2 = 2X = 2q3 , T q3 = X = q4 , T q4 = 0.
Therefore the matrix representation of T is
A_T^C = ( 0 0 0 0 ; 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ) .
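The representation with respect to C can also be computed mechanically: each column consists of the C-coordinates of the image of a basis vector. The following sketch assumes sympy is available and reproduces A_T^C for the differentiation map.

```python
# Matrix representation of the differentiation map T p = p' on P3 with respect to the
# ordered basis C = {X^3, X^2, X, 1}: column j of A_T holds the C-coordinates of T q_j.
import sympy as sp

X = sp.symbols('X')
C = [X**3, X**2, X, 1]                                   # ordered basis of P3

def coords(p):
    """Coordinates of a polynomial p w.r.t. C, i.e. its coefficients of X^3, X^2, X, 1."""
    return [sp.expand(p).coeff(X, k) for k in (3, 2, 1, 0)]

A_T = sp.Matrix([coords(sp.diff(q, X)) for q in C]).T    # columns = images of the basis
print(A_T)   # Matrix([[0, 0, 0, 0], [3, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0]])
```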
(iii) Represent T with respect to the basis B in the domain of T (in the “left” P3 ) and the basis
C in the codomain of T (in the “right” P3 ).
Solution. We calculate
T p1 = 0, T p2 = 1 = q4 , T p3 = 2X = 2q3 , T p4 = 3X 2 = 3q2 ,
hence A_T^{B,C} = ( 0 0 0 0 ; 0 0 0 3 ; 0 0 2 0 ; 0 1 0 0 ) .
(iv) Represent T with respect to the basis D = {r1 , r2 , r3 , r4 } and find its kernel where
r1 = X 3 + X,  r2 = 2X 3 + X 2 + 2X,  r3 = 3X 3 + X 2 + 4X + 1,  r4 = 4X 3 + X 2 + 4X + 1.
Solution 1. Again we only need to evaluate T in the elements of the basis and then write the
result as linear combination of the basis. This time the calculations are a bit more tedious.
T r1 = 3X 2 + 1 = −8r1 + 2r2 + r4 ,
T r2 = 6X 2 + 2X + 2 = −14r1 + 4r2 + 2r3 ,
T r3 = 9X 2 + 2X + 4 = −24r1 + 5r2 + 2r3 + 2r4 ,
T r4 = 12X 2 + 2X + 4 = −30r1 + 8r2 + 2r3 + 2r4 .
Therefore the matrix representation of T is
A_T^D = ( −8 −14 −24 −30 ; 2 4 5 8 ; 0 2 2 2 ; 1 0 2 2 ) .
In order to calculate the kernel of A_T^D , we apply the Gauß-Jordan process and obtain
A_T^D = ( −8 −14 −24 −30 ; 2 4 5 8 ; 0 2 2 2 ; 1 0 2 2 ) −→ · · · −→ ( 1 0 0 2 ; 0 1 0 1 ; 0 0 1 0 ; 0 0 0 0 ) .
The kernel of A_T^D is clearly span{−2~e1 − ~e2 + ~e4 }, hence ker T = span{−2r1 − r2 + r4 } =
span{1}.
Solution 2. We already have the matrix representation A_T^C and we can use it to calculate
A_T^D . To this end define the vectors
ρ~1 = (1, 0, 1, 0)t , ρ~2 = (2, 1, 2, 0)t , ρ~3 = (3, 1, 4, 1)t , ρ~4 = (4, 1, 4, 1)t .
Note that these vectors are the representations of our basis vectors r1 , . . . , r4 in the basis C.
Let us see how this looks in diagrams. We define the two bijections of P3 with R4 which are
given by choosing the bases C and D by ΨC and ΨD :
T T
P3 P3 P3 P3
ΨC ΨC ΨD ΨD
AC AD
R4 T
R4 R4 T
R4
We already know everything in the diagram on the left and we want to calculate AD
T in the
diagram on the right. We can put the diagrams together as follows:
T
P3 P3
ΨC ΨC
ΨD ΨD
SD→C AC SC→D
R4 R4 T
R4 R4
AD
T
We can also see that the change-of-basis maps SD→C and SC→D are
SD→C = ΨC ◦ ΨD −1 ,   SC→D = ΨD ◦ ΨC −1 .
For A_T^D we obtain
A_T^D = ΨD ◦ T ◦ ΨD −1 = SC→D ◦ A_T^C ◦ SD→C .
T
P3 P3
ΨC ΨC
AC
ΨD R4 T
R4 ΨD
S
C C
→ →
D
SD
AD
T
R4 R4
Note that the matrices A_T^B , A_T^C , A_T^D and A_T^{B,C} all look different but they describe the same linear
T , AT , AT and AT all look different but they describe the same linear
transformation. The reason why they look different is that in each case we used different bases to
describe them.
Example 6.48. The next example is not very applied but it serves to practice a bit more. We
consider the operator given by
T : M (2 × 2) → P2 ,   T ( a b ; c d ) = (a + c)X 2 + (a − b)X + a − b + d.
Show that T is a linear transformation and represent T with respect to the bases B = {B1 , B2 , B3 , B4 }
of M (2 × 2) and C = {p1 , p2 , p3 } of P2 where
B1 = ( 1 0 ; 0 0 ) , B2 = ( 0 1 ; 0 0 ) , B3 = ( 0 0 ; 1 0 ) , B4 = ( 0 0 ; 0 1 ) ,
and
p1 = 1, p2 = X, p3 = X 2 .
Find bases for ker T and Im T and their dimensions.
Solution. First we verify that T is indeed a linear map. To this end, we take matrices A1 = ( a1 b1 ; c1 d1 )
and A2 = ( a2 b2 ; c2 d2 ) and λ ∈ R. Then
T (λA1 + A2 ) = T ( λa1 + a2   λb1 + b2 ; λc1 + c2   λd1 + d2 )
= (λa1 + a2 + λc1 + c2 )X 2 + (λa1 + a2 − λb1 − b2 )X + λa1 + a2 − (λb1 + b2 ) + λd1 + d2
= λ[(a1 + c1 )X 2 + (a1 − b1 )X + a1 − b1 + d1 ] + (a2 + c2 )X 2 + (a2 − b2 )X + a2 − b2 + d2
= λT (A1 ) + T (A2 ).
T B1 = X 2 + X + 1 = p1 + p2 + p3 ,
T B2 = −X = −p2 ,
T B3 = X 2 = p3 ,
T B4 = 1 = p1 .
In order to determine the kernel and range of AT , we apply the Gauß-Jordan process:
AT = ( 1 0 0 1 ; 1 −1 0 0 ; 1 0 1 0 ) −→ ( 1 0 0 1 ; 0 −1 0 −1 ; 0 0 1 −1 ) −→ ( 1 0 0 1 ; 0 1 0 1 ; 0 0 1 −1 ) .
So the range of AT is R3 and its kernel is ker AT = span{~e1 + ~e2 − ~e3 − ~e4 }. Therefore Im T = P2 and
ker T = span{B1 + B2 − B3 − B4 } = span{ ( 1 1 ; −1 −1 ) }. For their dimensions we find dim(Im T ) = 3
and dim(ker T ) = 1.
Figure 6.5: The picture shows the reflection R on the line L. The vector ~v is parallel to L, hence
R~v = ~v . The vector ~w is perpendicular to L, hence R~w = −~w .
Example 6.49. Let L be the line in R2 spanned by the vector (2, 3)t and let R : R2 → R2 be the
reflection about L. Find its matrix representation AR with respect to the standard basis.
Solution 1 (use coordinates adapted to the problem). Clearly, there are two directions which
are special in this problem: the direction parallel and the direction orthogonal to the line. So a
basis which is adapted to the exercise is B = {~v , ~w} where ~v = (2, 3)t and ~w = (−3, 2)t . Clearly, R~v = ~v
and R~w = −~w. Therefore the matrix representation of R with respect to the basis B is
A_R^B = ( 1 0 ; 0 −1 ) .
In order to obtain the representation AR with respect to the standard basis, we only need to perform
a change of basis. Recall that change-of-bases matrices are given by
SB→can = (~v |~w ) = ( 2 −3 ; 3 2 ) ,   Scan→B = S_{B→can}^{−1} = (1/13) ( 2 3 ; −3 2 ) .
Therefore
AR = SB→can A_R^B Scan→B = (1/13) ( 2 −3 ; 3 2 ) ( 1 0 ; 0 −1 ) ( 2 3 ; −3 2 ) = (1/13) ( −5 12 ; 12 5 ) .
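The change-of-basis computation of Solution 1 is easy to reproduce numerically. The following sketch assumes numpy is available; it builds AR = S_{B→can} A_R^B S_{B→can}^{−1} for the adapted basis used above.

```python
# Reflection about the line L = span{(2, 3)^t}: build A_R from its diagonal
# representation in the adapted basis B = {v, w} and the change of basis S = (v | w).
import numpy as np

v = np.array([2.0, 3.0])        # parallel to L:   R v =  v
w = np.array([-3.0, 2.0])       # orthogonal to L: R w = -w
S = np.column_stack([v, w])     # S_{B->can}

A_R_B = np.diag([1.0, -1.0])    # representation of R in the basis B
A_R = S @ A_R_B @ np.linalg.inv(S)
print(np.round(13 * A_R))       # 13 * A_R = [[-5, 12], [12, 5]]
```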
Solution 2 (reduce the problem to a known reflection). The problem would be easy if we
were asked to calculate the matrix representation of the reflection on the x-axis. This would simply
be A0 = ( 1 0 ; 0 −1 ) . Now we can proceed as follows: First we rotate R2 about the origin such that
the line L is parallel to the x-axis, then we reflect on the x-axis and then we rotate back. The result
is the same as reflecting on L. Assume that Rot is the rotation matrix. Then AR = Rot−1 A0 Rot.
How can we calculate Rot? We know that Rot~v = ~e1 and that Rot~w = ~e2 . It follows that
Rot−1 = (~v |~w ) = ( 2 −3 ; 3 2 ) . Note that up to a numerical factor, this is SB→can .
Solution 3 (straightforward calculation). We write AR = ( a b ; c d ) with unknown entries and
use that AR~v = ~v and AR ~w = −~w :
(2, 3)t = ~v = AR~v = ( a b ; c d ) (2, 3)t = (2a + 3b, 2c + 3d)t ,
(−3, 2)t = ~w = −AR ~w = − ( a b ; c d ) (−3, 2)t = (3a − 2b, 3c − 2d)t ,
that is,
2a + 3b = 2 ,  2c + 3d = 3 ,  3a − 2b = −3 ,  3c − 2d = 2 .
Its unique solution is a = −5/13 , b = c = 12/13 , d = 5/13 , hence AR = (1/13) ( −5 12 ; 12 5 ) .
Example 6.50. Let E ⊆ R3 be the plane E : x − 2y + 3z = 0 with normal vector ~n = (1, −2, 3)t ,
and let R be the reflection about E and P the orthogonal projection onto E. Find their matrix
representations AR and AP with respect to the standard basis. Note that ~v = (2, 1, 0)t and
~w = (0, 3, 2)t form a basis of E.
Solution 1 (use coordinates adapted to the problem). Clearly, a basis which is adapted to
the exercise is B = {~v , ~w, ~n} because for these vectors we have R~v = ~v , R~w = ~w, R~n = −~n, and
P ~v = ~v , P ~w = ~w, P ~n = ~0. Therefore the matrix representations of R and P with respect to the basis B are
A_R^B = ( 1 0 0 ; 0 1 0 ; 0 0 −1 ) ,   A_P^B = ( 1 0 0 ; 0 1 0 ; 0 0 0 ) .
Figure 6.6: The figure shows the plane E : x − 2y + 3z = 0 and, for a vector ~x, its orthogonal
projection P ~x onto E and its reflection R~x about E, see Example 6.50.
Therefore
SB→can = (~v |~w |~n ) = ( 2 0 1 ; 1 3 −2 ; 0 2 3 ) .
In order to obtain the representations AR and AP with respect to the standard basis, we only need
to perform a change of basis. Recall that the change-of-basis matrices are given by
Scan→B = S_{B→can}^{−1} = (1/28) ( 13 2 −3 ; −3 6 5 ; 2 −4 6 ) .
AR = SB→can A_R^B Scan→B = (1/28) ( 2 0 1 ; 1 3 −2 ; 0 2 3 ) ( 1 0 0 ; 0 1 0 ; 0 0 −1 ) ( 13 2 −3 ; −3 6 5 ; 2 −4 6 )
= (1/7) ( 6 2 −3 ; 2 3 6 ; −3 6 −2 )
and
AP = SB→can A_P^B Scan→B = (1/28) ( 2 0 1 ; 1 3 −2 ; 0 2 3 ) ( 1 0 0 ; 0 1 0 ; 0 0 0 ) ( 13 2 −3 ; −3 6 5 ; 2 −4 6 )
= (1/14) ( 13 2 −3 ; 2 10 6 ; −3 6 5 ) .
Solution 2 (reduce the problem to a known reflection). The problem would be easy if we
were asked to calculate the matrix representation of the reflection on the xy-plane. This would
simply be A0 = ( 1 0 0 ; 0 1 0 ; 0 0 −1 ) . Now we can proceed as follows: First we rotate R3 about the origin
such that the plane E is parallel to the xy-plane, then we reflect on the xy-plane and then we rotate
back. The result is the same as reflecting on the plane E. We leave the details to the reader. An
analogous procedure works for the orthogonal projection.
Solution 3 (straightforward calculation). Lastly, we can form a system of linear equations in
order to find AR . We write AR = ( a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ) with unknowns aij . Again, we use that we know
that AR~v = ~v , AR ~w = ~w and AR~n = −~n. This gives a system of 9 linear equations for the nine
unknowns aij which can be solved.
Remark 6.51. Yet another solution is the following. Let Q be the orthogonal projection onto ~n.
We already know how to calculate its representing matrix:
Q~x = ( ⟨~x , ~n⟩ / ‖~n‖2 ) ~n = ( (x − 2y + 3z) / 14 ) ~n = (1/14) ( 1 −2 3 ; −2 4 −6 ; 3 −6 9 ) (x, y, z)t .
Hence AQ = (1/14) ( 1 −2 3 ; −2 4 −6 ; 3 −6 9 ) . Geometrically, it is clear that P = id − Q and R = id − 2Q. Hence it
follows that
AP = id − AQ = ( 1 0 0 ; 0 1 0 ; 0 0 1 ) − (1/14) ( 1 −2 3 ; −2 4 −6 ; 3 −6 9 ) = (1/14) ( 13 2 −3 ; 2 10 6 ; −3 6 5 )
and
AR = id − 2AQ = ( 1 0 0 ; 0 1 0 ; 0 0 1 ) − (1/7) ( 1 −2 3 ; −2 4 −6 ; 3 −6 9 ) = (1/7) ( 6 2 −3 ; 2 3 6 ; −3 6 −2 ) .
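The formulas of Remark 6.51 translate directly into code. The following sketch assumes numpy is available; it builds the projection Q onto the normal vector and from it the projection onto and the reflection about the plane E.

```python
# Projection onto and reflection about the plane E: x - 2y + 3z = 0 via the
# orthogonal projection Q onto the normal vector n:  P = I - Q,  R = I - 2Q.
import numpy as np

n = np.array([[1.0], [-2.0], [3.0]])        # normal vector of E as a column
Q = (n @ n.T) / (n.T @ n).item()            # projection onto span{n}
P = np.eye(3) - Q                           # orthogonal projection onto E
R = np.eye(3) - 2 * Q                       # reflection about E

x = np.array([1.0, 1.0, 1.0])
print(P @ x, R @ x)                         # P x lies in E, R x is the mirror image of x
```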
Therefore
Aid = ΨC ◦ id ◦ ΨB −1 = ΨC ◦ ΨB −1 = S_{C→can}^{−1} ◦ SB→can = Scan→C ◦ SB→can = SB→C .
You should now be able to
• use the matrix representation of a linear map to calculate its kernel and range,
• interpret a matrix as a linear map between finite dimensional vector spaces,
• etc.
6.5 Summary
Linear maps
A function T : U → V between two K-vector spaces U and V is called linear map (or linear function
or linear transformation) if it satisfies
T (u1 + λu2 ) = T (u1 ) + λT (u2 ) for all u1 , u2 ∈ U and λ ∈ K.
The set of all linear maps from U to V is denoted by L(U, V ).
• The composition of linear maps is a linear map.
• If a linear map is invertible, then its inverse is a linear map.
• If U, V are K-vector spaces then L(U, V ) is a K-vector space. This means: If S, T ∈ L(U, V )
and λ ∈ K, then S + λT ∈ L(U, V ).
For a linear map T : U → V we define the following sets
ker T = {u ∈ U : T u = O} ⊆ U,
Im T = {T u : u ∈ U } ⊆ V.
ker T is called kernel of T or null space of T . It is a subspace of U . Im T is called image of T or
range of T . It is a subspace of V .
The linear map T is called injective if T u1 = T u2 implies u1 = u2 for all u1 , u2 ∈ U . The linear
map T is called surjective if for every v ∈ V exist some u ∈ U such that T u = v. The linear map
T is called bijective if it is injective and surjective.
Let T : U → V be a linear map. If E : U → U and F : V → V are linear bijections, then
ker(F T ) = ker(T ), ker(T E) = E −1 (ker(T )),
Im(F T ) = F (Im(T )), Im(T E) = Im(T ),
and if dim U = n < ∞, then
dim(ker(T )) + dim(Im(T )) = n.
TA : Kn → Km , ~x 7→ A~x.
then these functions are linear and Φ−1 ◦ AT ◦ Ψ = T and Φ ◦ T ◦ Ψ−1 = AT . In a diagram this is
T
U V
Ψ Φ
n AT m
R R
Matrices
Let A ∈ M (m × n).
• The column space CA of A is the linear span of its column vectors. It is equal to Im A.
• The row space RA of A is the linear span of its row vectors. It is equal to the orthogonal
complement of ker A.
• dim RA = dim CA = dim(Im A) = number of columns with pivots in any echelon form of A.
Kernel and image of A:
• dim(ker A) = number of free variables = number of columns without pivots in any row echelon
form of A.
ker A is equal to the solution set of A~x = ~0 which can be determined for instance with the
Gauß or Gauß-Jordan elimination.
• dim(Im A) = dim CA = number of columns with pivots in any row echelon form of A.
Im(A) can be found by either of the following two methods:
(i) row reduction of A. The columns of the original matrix A which correspond to the pivot
columns of the row reduced echelon form of A are a basis of Im A.
(ii) column reduction of A. The remaining non-zero columns are a basis of Im A.
6.6 Exercises
1. Determine whether the following functions are linear. If they are, compute the kernel and the
dimension of the kernel.
(a) A : R3 → M (2 × 2), A (x, y, z)t = ( 2x + y   x − z ; x + y − 3z   z ),
(b) B : R3 → M (2 × 2), B (x, y, z)t = ( 2xy   x − z ; x + y − 3z   z ),
(c) C : M (2 × 2) → M (2 × 2), C(M ) = M + M t ,
(d) D : P3 → P4 , Dp = p′ + xp,
(e) T : P3 → M (2 × 3), T (ax3 + bx2 + cx + d) = ( a + b   b + c   c + d ; 0   a + d   0 ),
(f) T : P3 → M (2 × 3), T (ax3 + bx2 + cx + d) = ( a + b   b + c   c + d ; 0   a + d   3 ).
5. Let U, V be vector spaces over K (with K = R or K = C). We know from Exercise 4 that
L(U, V ) is a vector space. Fix a vector v0 ∈ V . Show that the following function is linear:
(c) Compute Im(A), Im(F A), Im(AE) and their dimensions. Sketch them and describe how
they are related to each other.
(d) Compute ker(A), ker(F A), ker(AE) and their dimensions. Sketch them and describe how
they are related to each other.
11. (a) Find at least two different bijective linear maps from M (2 × 2) to P3 .
(b) Does there exist a bijective linear map S : M (2 × 2) → Pk for k ∈ N, k ≠ 3?
13. (a) Let ~v1 = (1, 4, 7)t , ~v2 = (0, 1, 2)t , ~v3 = (1, 0, 2)t and let B = {~v1 , ~v2 , ~v3 }. Show that B is a
basis of R3 and write the vectors ~x = (1, 2, 3)t , ~y = (0, 1, 1)t in terms of the basis B.
14. Let R = ( 1 2 ; 0 3 ), S = ( 3 2 ; 0 7 ), T = ( 3 2 ; 0 1 ). Show that B = {R, S, T } is a basis of the
space of upper triangular 2 × 2 matrices and express the matrices
K = ( 1 1 ; 0 1 ), L = ( 0 0 ; 0 1 ), M = ( 1 0 ; 0 1 )
in terms of the basis B.
15. Let ~a1 = (1, 2)t , ~a2 = (3, 1)t , ~b1 = (−1, 1)t , ~b2 = (3, 2)t ∈ R2 and let A = {~a1 , ~a2 }, B = {~b1 , ~b2 }.
16. Let B = {~b1 , ~b2 } be a basis of R2 and let ~x1 = (2, 3)t , ~x2 = (−1, 1)t , ~x3 = (4, 6)t (given in
Cartesian coordinates).
(a) If it is known that ~x1 = (3, 1)B and ~x2 = (3, 2)B , is it possible to compute ~b1 and ~b2 ? If
so, compute them. If not, explain why it is not possible.
(b) If it is known that ~x1 = (3, 1)B and ~x3 = (6, 2)B , is it possible to compute ~b1 and ~b2 ? If
so, compute them. If not, explain why it is not possible.
(c) Do there exist ~b1 and ~b2 such that ~x1 = (3, 1)B and ~x2 = (6, 2)B ? If so, compute them.
If not, explain why it is not possible.
(d) Do there exist ~b1 and ~b2 such that ~x1 = (3, 1)B and ~x3 = (2, 5)B ? If so, compute them.
If not, explain why it is not possible.
Φ : M (2 × 2) → M (2 × 2), Φ(A) = At
In this chapter we will work in Rn and not in arbitrary vector spaces since we want to explore in
more detail its geometric properties. In particular we will discuss orthogonality. Note that in an
arbitrary vector space, we do not have the concept of angles or orthogonality. Everything that we
will discuss here can be extended to inner product spaces where the inner product is used to define
angles. Recall that we showed in Theorem 2.19 that for non-zero vectors ~x, ~y ∈ Rn the angle ϕ
between them satisfies the equation
h~x , ~y i
RA
cos ϕ = .
k~xk k~y k
In a general inner product space (V, h· , ·i) this equation is used to define the angle between two
vectors. In particular, two vectors are said to be orthogonal if their inner product is 0. Inner
product spaces are useful for instance in physics, and maybe in some not so distant future there
will be a chapter in these lecture notes about them.
First we will define what the orthogonal complement of a subspace of Rn is and we will see that
the direct sum of a subspace and its orthogonal complement gives us all of Rn .
We already know what the orthogonal projection of a vector ~x onto another vector ~y 6= ~0 is (see
Section 2.3). Since it is independent of the norm of ~y , we can just as well consider it the orthogonal
projection of ~x onto the line generated by ~y . In this chapter we will generalise the concept of an
orthogonal projection onto a line to the orthogonal projection onto an arbitrary subspace.
As an application, we will discuss the least squares method for the approximation of data.
7.1 Orthonormal systems and orthogonal bases
Definition 7.1. (i) A set of vectors ~x1 , . . . , ~xk ∈ Rn is called an orthogonal set if they are
pairwise orthogonal; in formulas we can write this as ⟨~xj , ~x` ⟩ = 0 for j ≠ `.
(ii) A set of vectors ~x1 , . . . , ~xk ∈ Rn is called an orthonormal set if they are pairwise orthonormal;
in formulas we can write this as
⟨~xj , ~x` ⟩ = 1 for j = ` and ⟨~xj , ~x` ⟩ = 0 for j ≠ `.
The difference between an orthogonal and an orthonormal set is that in the latter we additionally
require that each vector of the set satisfies h~xj , ~xj i = 1, that is, that k~xj k = 1. Therefore an
orthogonal set may contain vectors of arbitrary lengths, including the vector ~0, whereas in an
orthonormal set all vectors must have length 1. Note that every orthonormal system is also an
orthogonal system. On the other hand, every orthogonal system which does not contain ~0 can be
converted to an orthonormal one by normalising each vector (that is, by dividing each vector by its
norm).
Examples 7.2. (i) The following systems are orthogonal systems but not orthonormal systems
since the norm of at least one of their vectors is different from 1:
{ (1, −1)t , (3, 3)t } ,   { (0, 0)t , (1, −1)t , (3, 3)t } ,   { (1, 0, 0)t , (0, 1, 2)t , (0, −2, 1)t } .
(ii) The following systems are orthonormal systems:
{ (1/√2) (1, −1)t , (1/√2) (1, 1)t } ,   { (1, 0, 0)t , (1/√5) (0, 1, 2)t , (1/√5) (0, −2, 1)t } .
Lemma 7.3. Let ~x1 , . . . , ~xn ∈ Rn be an orthonormal system. Then ~x1 , . . . , ~xn are linearly independent.
Proof. Let α1 , . . . , αn ∈ R such that α1 ~x1 + · · · + αn ~xn = ~0. We have to show that all αj must be zero.
To do this, we take the inner product on both sides with the vectors ~xj . Let us start with ~x1 . We find
0 = ⟨~0 , ~x1 ⟩ = ⟨α1 ~x1 + · · · + αn ~xn , ~x1 ⟩ = α1 ⟨~x1 , ~x1 ⟩ + α2 ⟨~x2 , ~x1 ⟩ + · · · + αn ⟨~xn , ~x1 ⟩.
Since ⟨~0 , ~x1 ⟩ = 0, ⟨~x1 , ~x1 ⟩ = ‖~x1 ‖2 = 1 and ⟨~x2 , ~x1 ⟩ = · · · = ⟨~xn , ~x1 ⟩ = 0, it follows that
0 = α1 + 0 + · · · + 0 = α1 .
Now we can repeat this process with ~x2 , ~x3 , . . . , ~xn to show that α2 = · · · = αn = 0.
Definition 7.4. An orthonormal basis of Rn is a basis whose vectors form an orthonormal set.
Occasionally we will write ONB for “orthonormal basis”.
Example 7.5. The following systems are orthonormal bases of R3 :
{ (1/√3)(1, −1, 1)t , (1/√2)(1, 1, 0)t , (1/√6)(−1, 1, 2)t }  and
{ (1/√14)(1, 2, 3)t , (1/√10)(−3, 0, 1)t , (1/√35)(1, −5, 3)t } .
cos ϕ − sin ϕ
Exercise 7.6. Show that every orthonormal basis of R2 is of the form ,
sin ϕ cos ϕ
cos ϕ sin ϕ
or , for some ϕ ∈ R. See also Exercise 7.13.
sin ϕ − cos ϕ
RA
We will see in Corollary 7.31 that every orthonormal system in Rn can be completed to an or-
thonormal basis. In Section 7.5 we will show how to construct an orthonormal basis of a subspace
of Rn from a given basis. In particular it follows that every subspace of Rn has an orthonormal
basis.
Orthonormal bases are very useful. Among other things it is very easy to write a given vector
~ ∈ Rn as a linear combination of such a basis. Recall that if we are given an arbitrary basis
w
~z1 , . . . , ~zn of Rn and we want to write a vector ~x as linear combination of this basis, then we have
to find coefficients α1 , . . . , αn such that ~x = α1 ~z1 + · · · + αn ~zn , which means we have to solve an n × n
system in order to determine the coefficients. If however the given basis is an orthonormal basis,
then calculating the coefficients reduces to evaluating n inner products as the following theorem
shows.
Theorem 7.7 (Representation of a vector with respect to an ONB). Let ~x1 , . . . , ~xn be an
orthonormal basis of Rn and let ~w ∈ Rn . Then
~w = ⟨~w , ~x1 ⟩ ~x1 + ⟨~w , ~x2 ⟩ ~x2 + · · · + ⟨~w , ~xn ⟩ ~xn .
Proof. Since ~x1 , . . . , ~xn is a basis of Rn , there exist α1 , . . . , αn ∈ R such that ~w = α1 ~x1 + α2 ~x2 + · · · + αn ~xn .
Now let us take the inner product on both sides with ~xj for j = 1, . . . , n. Note that ⟨~xk , ~xj ⟩ = 0
if k ≠ j and that ⟨~xj , ~xj ⟩ = ‖~xj ‖2 = 1. We obtain
⟨~w , ~xj ⟩ = ⟨α1 ~x1 + α2 ~x2 + · · · + αn ~xn , ~xj ⟩
= α1 ⟨~x1 , ~xj ⟩ + α2 ⟨~x2 , ~xj ⟩ + · · · + αn ⟨~xn , ~xj ⟩
= αj ⟨~xj , ~xj ⟩ = αj .
Note that the proof of this theorem is essentially the same as that of Lemma 7.3. In fact, Lemma 7.3
follows from the theorem above if we choose w ~ = ~0.
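Theorem 7.7 is also the reason why orthonormal bases are so convenient in computations: no linear system has to be solved. The following sketch assumes numpy is available and uses the orthonormal basis of R2 from Examples 7.2 (ii).

```python
# Coordinates with respect to an orthonormal basis are just inner products (Theorem 7.7):
# w = <w, x1> x1 + <w, x2> x2 for the ONB {x1, x2} of R^2.
import numpy as np

x1 = np.array([1.0, -1.0]) / np.sqrt(2)
x2 = np.array([1.0,  1.0]) / np.sqrt(2)
w  = np.array([3.0, 5.0])

c1, c2 = w @ x1, w @ x2                       # no linear system has to be solved
print(np.allclose(c1 * x1 + c2 * x2, w))      # True
```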
Exercise 7.8. If ~x1 , . . . , ~xn form an orthogonal, but not necessarily orthonormal basis of Rn , then
we have for every ~w ∈ Rn that
~w = ( ⟨~w , ~x1 ⟩ / ‖~x1 ‖2 ) ~x1 + ( ⟨~w , ~x2 ⟩ / ‖~x2 ‖2 ) ~x2 + · · · + ( ⟨~w , ~xn ⟩ / ‖~xn ‖2 ) ~xn .
(You can either use a modified version of the proof of Theorem 7.7 or you define yj = k~xj k−1 ~xj ,
show that ~y1 , . . . , ~yn is an orthonormal basis and apply the formula from Theorem 7.7.)
If we exchange the roles of B and C and use that ⟨~wi , ~uj ⟩ = ⟨~uj , ~wi ⟩, then we obtain
AC→B = ( ⟨~w1 , ~u1 ⟩ ⟨~w2 , ~u1 ⟩ · · · ⟨~wn , ~u1 ⟩ ; ⟨~w1 , ~u2 ⟩ ⟨~w2 , ~u2 ⟩ · · · ⟨~wn , ~u2 ⟩ ; · · · ; ⟨~w1 , ~un ⟩ ⟨~w2 , ~un ⟩ · · · ⟨~wn , ~un ⟩ )
= ( ⟨~u1 , ~w1 ⟩ ⟨~u1 , ~w2 ⟩ · · · ⟨~u1 , ~wn ⟩ ; ⟨~u2 , ~w1 ⟩ ⟨~u2 , ~w2 ⟩ · · · ⟨~u2 , ~wn ⟩ ; · · · ; ⟨~un , ~w1 ⟩ ⟨~un , ~w2 ⟩ · · · ⟨~un , ~wn ⟩ ) .
This shows that AC→B = (AB→C )t . If we use that AC→B = (AB→C )−1 , then we find that
Lemma 7.9. Let B = {~u1 , . . . , ~un } and C = {~w1 , . . . , ~wn } be orthonormal bases of Rn and let
Q = AB→C be the transition matrix from the basis B to the basis C. Then
Qt = Q−1 .
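Since the columns of a rotation matrix form an orthonormal basis of R2, a rotation matrix is a typical example of such a transition matrix. The following sketch assumes numpy is available and verifies Qt = Q−1 for it.

```python
# A rotation matrix is orthogonal: its transpose is its inverse.
import numpy as np

phi = 0.3
Q = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

print(np.allclose(Q.T @ Q, np.eye(2)))        # True: Q^t Q = id
print(np.allclose(Q.T, np.linalg.inv(Q)))     # True: Q^t = Q^{-1}
```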
Recall that a matrix Q ∈ M (n × n) is called an orthogonal matrix if Qt Q = id, that is, if Qt = Q−1 .
Proposition 7.11. Let Q ∈ M (n × n). Then the following are equivalent: (i) Q is an orthogonal
matrix; (ii) Qt is an orthogonal matrix; (iii) Q is invertible and Q−1 is an orthogonal matrix.
Proof. (i) =⇒ (ii): Assume that Q is orthogonal. Then it is invertible, hence also Qt is invertible
by Theorem 3.50 and (Qt )−1 = (Q−1 )t = (Qt )t = Q holds. Hence Qt is an orthogonal matrix.
(ii) =⇒ (i): Assume that Qt is an orthogonal matrix. Then (Qt )t = Q must be an orthogonal
matrix too by what we just proved.
(i) =⇒ (iii): Assume that Q is orthogonal. Then it is invertible and (Q−1 )−1 = (Qt )−1 = (Q−1 )t
where in the second step we used Theorem 3.50. Hence Q−1 is an orthogonal matrix.
(iii) =⇒ (i): Assume that Q−1 is an orthogonal matrix. Then its inverse (Q−1 )−1 = Q must be
an orthogonal matrix too by what we just proved.
By Lemma 7.9, every transition matrix from one ONB to another ONB is an orthogonal matrix.
The reverse is also true as the following theorem shows.
Theorem 7.12. Let Q ∈ M (n × n). Then the following is true:
(i) Q is an orthogonal matrix if and only if its columns are an orthonormal basis of Rn .
(ii) Q is an orthogonal matrix if and only if its rows are an orthonormal basis of Rn .
(iii) If Q is an orthogonal matrix, then | det Q| = 1.
Proof. (i): Assume that Q is an orthogonal matrix and let ~cj be its columns. We already know
that they are a basis of Rn since Q is invertible. In order to show that they are also an orthonormal
system, we calculate
id = Qt Q = ( ~c1 t ; · · · ; ~cn t ) (~c1 | · · · | ~cn )
= ( ⟨~c1 , ~c1 ⟩ ⟨~c1 , ~c2 ⟩ · · · ⟨~c1 , ~cn ⟩ ; ⟨~c2 , ~c1 ⟩ ⟨~c2 , ~c2 ⟩ · · · ⟨~c2 , ~cn ⟩ ; · · · ; ⟨~cn , ~c1 ⟩ ⟨~cn , ~c2 ⟩ · · · ⟨~cn , ~cn ⟩ ) . (7.1)
Since the product is equal to the identity matrix, it follows that all the elements on the diagonal
must be equal to 1 and all the other elements must be equal to 0. This means that h~cj , ~cj i = 1 for
j = 1, . . . , n and h~cj , ~ck i = 0 for j 6= k, hence the columns of Q are an orthonormal basis of Rn .
Now assume that the columns ~c1 , . . . , ~cn of Q are an orthonormal basis of Rn . Then clearly (7.1)
holds which shows that Q is an orthogonal matrix.
(ii): The rows of Q are the columns of Qt hence they are an orthonormal basis of Rn by (i) and
Proposition 7.11 (ii).
(iii): Recall that det Qt = det Q. Therefore we obtain
1 = det(id) = det(Qt Q) = det(Qt ) det(Q) = (det Q)2 ,
hence | det Q| = 1.
Note that the converse of (iii) is false: if for example R = ( 1 1 ; 0 1 ),
then det R = 1, but R−1 = ( 1 −1 ; 0 1 ) is different from Rt = ( 1 0 ; 1 1 ), so R is not orthogonal.
Question 7.1
Assume that ~a1 , . . . , ~an ∈ Rn are pairwise orthogonal and let R ∈ M (n × n) be the matrix whose
columns are the given vectors. Can you calculate Rt R and RRt ? What are the conditions on the
vectors such that R is invertible? If it is invertible, what is its inverse? (You should be able to
answer the above questions more or less easily if k~aj k = 1 for all j = 1, . . . , n because in this case
R is an orthogonal matrix.)
Exercise 7.13. Show that every orthogonal 2 × 2 matrix is of the form Q = ( cos ϕ − sin ϕ ; sin ϕ cos ϕ )
or Q = ( cos ϕ sin ϕ ; sin ϕ − cos ϕ ). Compare this with Exercise 7.6.
Exercise 7.14. Use the results from Section 4.3 to prove that | det Q| = 1 if Q is an orthogonal
2 × 2 or 3 × 3 matrix.
It can be shown that every orthogonal matrix represents either a rotation (if its determinant is 1)
or the composition of a rotation and a reflection (if its determinant is −1).
Write the columns of Q as ~c1 = (cos ϕ, sin ϕ)t and ~c2 ± = (∓ sin ϕ, ± cos ϕ)t (see Exercise 7.13).
• In the first case, det Q = det(~c1 |~c2 + ) = det ( cos ϕ − sin ϕ ; sin ϕ cos ϕ ) = cos2 ϕ + sin2 ϕ = 1 and Q
represents the rotation by ϕ counterclockwise.
• In the second case, det Q = det(~c1 |~c2 − ) = det ( cos ϕ sin ϕ ; sin ϕ − cos ϕ ) = − cos2 ϕ − sin2 ϕ = −1
and Q represents the rotation by ϕ counterclockwise followed by a reflection on the direction
given by ~c1 (or: reflection on the x-axis followed by the rotation by ϕ counterclockwise).
Figure 7.1: In case (a), Q represents a rotation and det Q = 1. In case (b) it represents a rotation
followed by a reflection and det Q = −1.
Exercise 7.15 together with Exercise 7.16 show the following.
A matrix Q is an orthogonal matrix if and only if it preserves lengths if and only if it preserves
angles. That is
Q is orthogonal ⇐⇒ Qt = Q−1
⇐⇒ hQ~x , Q~y i = h~x , ~y i for all ~x, ~y ∈ Rn
⇐⇒ kQ~xk = k~xk for all ~x ∈ Rn .
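These equivalences are easy to test numerically. The following sketch assumes numpy is available; it generates a random orthogonal matrix via a QR decomposition and checks that it preserves inner products and lengths.

```python
# Orthogonal matrices preserve inner products and lengths.
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a (random) orthogonal matrix
x, y = rng.standard_normal(3), rng.standard_normal(3)

print(np.allclose((Q @ x) @ (Q @ y), x @ y))                       # <Qx, Qy> = <x, y>
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))       # ||Qx|| = ||x||
```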
Note that every isometry is injective since T ~x = ~0 if and only if ~x = ~0, therefore necessarily n ≤ m.
7.3 Orthogonal complements
The first part of this section works for all vector spaces, not necessarily Rn .
Exercise. • Give an example of two subspaces whose union is not a vector space.
• Give an example of two subspaces whose union is a vector space.
Let us define the sum and the direct sum of vector spaces.
Definition 7.19. Let U, W be subspaces of a vector space V . Then the sum of the vector spaces
U and W is defined as
U + W = {u + w : u ∈ U, w ∈ W }. (7.2)
If in addition U ∩ W = {O}, then the sum is called the direct sum of U and W and one writes
U ⊕ W instead of U + W .
Remark 7.20. (i) Assume that U = span{u1 , . . . , uk } and that W = span{w1 , . . . , wj }, then
U + W = span{u1 , . . . , uk , w1 , . . . , wj }.
(ii) The space U + W is the smallest vector space which contains both U and W .
Examples 7.21. (i) Let V be a vector space and let U ⊆ V be a subspace. Then we always
have:
(a) U + {O} = U ⊕ {O} = U ,
(b) U + U = U ,
(c) U + V = V .
If U and W are subspaces of V , then
(a) U ⊆ U + W and W ⊆ U + W .
(b) U + W = U if and only if W ⊆ U .
(ii) Let U and W be lines in R2 passing through the origin. Then they are subspaces of R2 and
we have that U + W = U if the lines are parallel and U + W = R2 if they are not parallel.
(iii) Let U and W be lines in R3 passing through the origin. Then they are subspaces of R3 and
we have that U + W = U if the lines are parallel; otherwise U + W is the plane containing
both lines.
(iv) Let U be a line and W be a plane in R3 , both passing through the origin. Then they are
subspaces of R3 and we have that U + W = W if the line U is contained in W . If not, then
U + W = R3 .
Proposition 7.22. Let U and W be finite dimensional subspaces of a vector space V. Then dim(U + W) = dim U + dim W − dim(U ∩ W).

Proof. Let dim U = k and dim W = m. Recall that U ∩ W is a subspace of V and that U ∩ W ⊆ U
and U ∩W ⊆ W . Let v1 , . . . , v` be a basis of U ∩W . By Theorem 5.46 we can complete it to a basis
v1 , . . . , v` , u`+1 , . . . , uk of U . Similarly, we can complete it to a basis v1 , . . . , v` , w`+1 , . . . , wm of
W . Now we claim that v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm is a basis of U + W .
• Now we show that the vectors v_1, ..., v_ℓ, u_{ℓ+1}, ..., u_k, w_{ℓ+1}, ..., w_m are linearly independent. Let α_1, ..., α_k, β_{ℓ+1}, ..., β_m ∈ R such that
\[
\alpha_1 v_1 + \dots + \alpha_\ell v_\ell + \alpha_{\ell+1} u_{\ell+1} + \dots + \alpha_k u_k + \beta_{\ell+1} w_{\ell+1} + \dots + \beta_m w_m = O.
\]
It follows that
\[
\underbrace{\alpha_1 v_1 + \dots + \alpha_\ell v_\ell + \alpha_{\ell+1} u_{\ell+1} + \dots + \alpha_k u_k}_{\in U}
= \underbrace{-(\beta_{\ell+1} w_{\ell+1} + \dots + \beta_m w_m)}_{\in W}. \tag{7.3}
\]
Both sides of (7.3) therefore belong to U ∩ W, so the right hand side can be written in terms of the basis v_1, ..., v_ℓ of U ∩ W, say −(β_{ℓ+1} w_{ℓ+1} + ... + β_m w_m) = γ_1 v_1 + ... + γ_ℓ v_ℓ. This gives
\[
\gamma_1 v_1 + \dots + \gamma_\ell v_\ell + \beta_{\ell+1} w_{\ell+1} + \dots + \beta_m w_m = O.
\]
Since the vectors v_1, ..., v_ℓ, w_{ℓ+1}, ..., w_m form a basis of W, they are linearly independent, and we conclude that γ_1 = ... = γ_ℓ = β_{ℓ+1} = ... = β_m = 0. Inserting in (7.3), we obtain
\[
\alpha_1 v_1 + \dots + \alpha_\ell v_\ell + \alpha_{\ell+1} u_{\ell+1} + \dots + \alpha_k u_k = O,
\]
hence α_1 = ... = α_k = 0 because these vectors form a basis of U. It follows that
\[
\dim(U + W) = \ell + (k - \ell) + (m - \ell) = k + m - \ell = \dim U + \dim W - \dim(U \cap W).
\]
For the rest of this section, we will work in Rn . First let us define the orthogonal complement of a
given subspace.
(ii) The orthogonal complement of U is denoted by U⊥ and it is the set of all vectors which are perpendicular to every vector in U, that is,
\[
U^\perp = \{\vec x \in \mathbb{R}^n : \langle \vec x , \vec u\rangle = 0 \text{ for every } \vec u \in U\}.
\]
Remark 7.24. Let U ⊆ Rn be a subspace.

(i) U⊥ is a subspace of Rn.

(ii) U ∩ U⊥ = {~0}.

(iii) (Rn)⊥ = {~0} and {~0}⊥ = Rn.
Proof. (i) Clearly, ~0 ∈ U ⊥ . Let ~x, ~y ∈ U ⊥ and let c ∈ R. Then for every ~u ∈ U we have that
h~x + c~y , ~ui = h~x , ~ui + ch~y , ~ui = 0, hence ~x + c~y ∈ U ⊥ and U ⊥ is a subspace by Theorem 5.10.
(ii) Let ~x ∈ U ∩ U ⊥ . Then it follows that ~x ⊥ ~x, hence k~xk2 = h~x , ~xi = 0 which shows that ~x = ~0
and therefore U ∩ U ⊥ consists only of the vector ~0.
(iii) Assume that ~x ∈ (Rn)⊥. Then ~x ⊥ ~y for every ~y ∈ Rn, in particular also ~x ⊥ ~x. Therefore
‖~x‖² = ⟨~x, ~x⟩ = 0, which shows that ~x = ~0. It follows that (Rn)⊥ = {~0}.
It is clear that h~x , ~0i = 0, hence Rn ⊆ {~0}⊥ ⊆ Rn which proves that {~0}⊥ = Rn .
Examples 7.25. (i) The orthogonal complement of a line in R2 is again a line, see Figure 7.2.
(ii) The orthogonal complement of a line in R3 is the plane perpendicular to the given line. The
orthogonal complement to a plane in R3 is the line perpendicular to the given plane, see
Figure 7.2.
The next goal is to show that dim U + dim U ⊥ = n and to establish a method for calculating
U ⊥ . To this end, the following lemma is useful. It tells us that in order to verify that some ~x is
perpendicular to U we do not have to check that ~x ⊥ ~u for every ~u ∈ U , but that it is enough to
check it for a set of vectors ~u which generate U .
Lemma 7.26. Let U = span{~u1 , . . . , ~uk } ⊆ Rn . Then ~x ∈ U ⊥ if and only if ~x ⊥ ~uj for every
j = 1, . . . , k.
Figure 7.2: The figure on the left shows the orthogonal complement of the line L in R2, which is the line G. The figure on the right shows the orthogonal complement of the plane U in R3, which is the line H. Note that the orthogonal complement of H is U.
Proof. Suppose that ~x ⊥ U , then ~x ⊥ ~u for every ~u ∈ U , in particular for the generating vectors
~u1 , . . . , ~uk . Now suppose that ~x ⊥ ~uj for all j = 1, . . . , k. Let ~u ∈ U be an arbitrary vector in U .
Then there exist α1, ..., αk ∈ R such that ~u = α1~u1 + ... + αk~uk. So we obtain
\[
\langle \vec x , \vec u\rangle = \langle \vec x , \alpha_1\vec u_1 + \dots + \alpha_k\vec u_k\rangle
= \alpha_1\langle \vec x , \vec u_1\rangle + \dots + \alpha_k\langle \vec x , \vec u_k\rangle = 0.
\]
Lemma 7.27. Let U = span{~u1 , . . . , ~uk } ⊆ Rn and let A be the matrix whose rows consist of the
vectors ~u1 , . . . , ~uk . Then
U ⊥ = ker A. (7.4)
Proof. Let ~x ∈ Rn . By Lemma 7.26 we know that ~x ∈ U ⊥ if and only if ~x ⊥ ~uj for every
j = 1, ..., k. This is the case if and only if
\[
\langle \vec u_1 , \vec x\rangle = 0,\quad \dots,\quad \langle \vec u_k , \vec x\rangle = 0,
\]
which is the same as A~x = ~0 by definition of A. In conclusion, ~x ⊥ U if and only if A~x = ~0, that is,
if and only if ~x ∈ ker A.
The next two theorems are the main results of this section.
Theorem 7.28. Let U be a subspace of Rn. Then dim U + dim U⊥ = n.

Proof. Let ~u1, ..., ~uk be a basis of U. Note that k = dim U. Then we have in particular U = span{~u1, ..., ~uk}. As in Lemma 7.27 we consider the matrix A ∈ M(k × n) whose rows are the vectors ~u1, ..., ~uk. Then U⊥ = ker A, so
\[
\dim U^\perp = \dim(\ker A) = n - \dim(\operatorname{Im} A).
\]
Note that dim(Im A) is the dimension of the column space of A, which is equal to the dimension of the row space of A by Proposition 6.32. Since the vectors ~u1, ..., ~uk are linearly independent, this dimension is equal to k. Therefore dim U⊥ = n − k = n − dim U. Rearranging, we obtain the desired formula dim U⊥ + dim U = n.
(We could also have said that the reduced form of A cannot have any zero row because its rows
are linearly independent. Therefore the reduced form must have k pivots and we obtain dim U ⊥ =
dim(ker A) = n − #(pivots of the reduced form of A) = n − k = n − dim U . We basically re-proved
Proposition 6.32.)
Theorem 7.29. Let U be a subspace of Rn. Then

(i) U ⊕ U⊥ = Rn,

(ii) (U⊥)⊥ = U.
Proof. (i) Recall that U ∩ U ⊥ = {~0} by Remark 7.24, therefore the sum is a direct sum. Now let
us show that U + U ⊥ = Rn . Since U + U ⊥ ⊆ Rn , we only have to show that dim(U + U ⊥ ) =
n because the only n-dimensional subspace of Rn is Rn itself, see Theorem 5.52. From
Proposition 7.22 and Theorem 7.28 we obtain
\[
\dim(U + U^\perp) = \dim U + \dim U^\perp - \dim(U \cap U^\perp) = \dim U + \dim U^\perp = n.
\]
(ii) First let us show that U ⊆ (U⊥)⊥. To this end, fix ~u ∈ U. Then, for every ~y ∈ U⊥, we have
that ⟨~u, ~y⟩ = 0, hence ~u ⊥ U⊥, that is, ~u ∈ (U⊥)⊥. Note that dim(U⊥)⊥ = n − dim U⊥ =
n − (n − dim U) = dim U. Since we already know that U ⊆ (U⊥)⊥, it follows that they must
be equal by Theorem 5.52.
The next proposition shows that every subspace of Rn has an orthonormal basis. Another proof of
this fact will be given later when we introduce the Gram-Schmidt process in Section 7.5.
Proposition 7.30. Every subspace U ⊆ Rn with dim U > 0 has an orthonormal basis.
Proof. Let U be a subspace of Rn with dim U = k > 0. Then dim U⊥ = n − k and we can choose a basis ~w_{k+1}, ..., ~w_n of U⊥. Let A0 ∈ M((n − k) × n) be the matrix whose rows are the vectors ~w_{k+1}, ..., ~w_n. Since U = (U⊥)⊥, we know that U = ker A0. Pick any ~u1 ∈ ker A0 with ~u1 ≠ ~0.
Then ~u1 ∈ U . Now we form the new matrix A1 ∈ M ((n−k+1)×n) by adding ~u1 as a new row to the
matrix A0 . Note that the rows of A1 are linearly independent, so dim ker(A1 ) = n−(n−k+1) = k−1.
If k − 1 > 0, then we pick any vector ~u2 ∈ ker A1 with ~u2 ≠ ~0. This vector is orthogonal to all the rows of A1; in particular it belongs to U (since it is orthogonal to ~w_{k+1}, ..., ~w_n) and it is perpendicular to ~u1 ∈ U. Now we form the matrix A2 ∈ M((n − k + 2) × n) by adding the vector ~u2 as a row to A1.
Again, the rows of A2 are linearly independent and therefore dim(ker A2 ) = n − (n − k + 2) = k − 2.
If k − 2 > 0, then we pick any vector ~u3 ∈ ker A2 with ~u3 ≠ ~0. This vector is orthogonal to all the rows of A2; in particular it belongs to U (since it is orthogonal to ~w_{k+1}, ..., ~w_n) and it is
perpendicular to ~u1 , ~u2 ∈ U . We continue this process until we have vectors ~u1 , . . . , ~uk ∈ U which
are pairwise orthogonal and the matrix Ak ∈ M (n × n) consists of linearly independent rows, so its
kernel is trivial. By construction, ~u1 , . . . , ~uk is an orthogonal system of k vectors in U with none of
them being equal to ~0. Hence they are linearly independent and therefore they are an orthogonal
basis of U since dim U = k. In order to obtain an orthonormal basis we only have to normalise
each of the vectors.
Example 7.32. Find a basis of the orthogonal complement of
\[
U = \operatorname{span}\left\{ \begin{pmatrix}1\\2\\3\\4\end{pmatrix}, \begin{pmatrix}1\\0\\1\\0\end{pmatrix} \right\} \subseteq \mathbb{R}^4.
\]
Solution. Recall that ~x ∈ U ⊥ if and only if it is perpendicular to the vectors which generate U .
D
Therefore ~x ∈ U ⊥ if and only if it belongs to the kernel of the matrix whose rows are the generators
of U . So we calculate
\[
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 0 & 1 & 0 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -2 & -2 & -4 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 2 \end{pmatrix}.
\]
Therefore
\[
U^\perp = \ker\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 0 & 1 & 0 \end{pmatrix}
= \operatorname{span}\left\{ \begin{pmatrix}0\\-2\\0\\1\end{pmatrix}, \begin{pmatrix}-1\\-1\\1\\0\end{pmatrix} \right\}.
\]
Example 7.33. Find an orthonormal basis of U⊥ for the subspace U from Example 7.32.

Solution. We will use the method from Proposition 7.30. Another solution of this exercise will be given in Example 7.48. From the solution of Example 7.32 we can take the first basis vector ~w1 = (0, −2, 0, 1)^t. We append it to the matrix from the solution of Example 7.32 and reduce the new matrix (note that the first few steps are identical to the reduction of the original matrix). We obtain
\[
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 0 & 1 & 0 \\ 0 & -2 & 0 & 1 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 2 \\ 0 & -2 & 0 & 1 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 2 & 5 \end{pmatrix},
\]
whose kernel is generated by
\[
\vec w_2 = \begin{pmatrix} 5 \\ 1 \\ -5 \\ 2 \end{pmatrix},
\]
so {~w1, ~w2} is an orthogonal basis of U⊥; normalising the two vectors gives an orthonormal basis of U⊥.
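For readers who want to check such computations by machine, here is a short Python sketch (not part of the original notes; it assumes the numpy and scipy libraries). It computes an orthonormal basis of U⊥ = ker A directly from the matrix A used in Examples 7.32 and 7.33.

\begin{verbatim}
import numpy as np
from scipy.linalg import null_space

# U is spanned by the rows of A, so U^perp = ker A (Lemma 7.27)
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.0, 0.0, 1.0, 0.0]])

# null_space returns an orthonormal basis of ker A as columns; it will in
# general differ from the basis found by hand above, but spans the same space
N = null_space(A)
print(N.shape)                  # (4, 2): dim U^perp = 4 - 2 = 2
print(np.allclose(A @ N, 0))    # True: the columns really lie in ker A
\end{verbatim}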
7.4 Orthogonal projections

Recall from Section 2.3 that for ~v, ~w ∈ Rn with ~w ≠ ~0,
\[
\operatorname{proj}_{\vec w}\vec v := \frac{\langle \vec v , \vec w\rangle}{\|\vec w\|^2}\,\vec w \tag{7.6}
\]
is the unique vector in Rn which is parallel to ~w and satisfies that ~v − proj_~w ~v is orthogonal to ~w. We already know that the projection is independent of the length of ~w. So proj_~w ~v should be regarded as the projection of ~v onto the one-dimensional subspace generated by ~w.
In this section we want to generalise this to orthogonal projections on higher dimensional subspaces,
for instance you could think of the projection in R3 onto a given plane. Then, given a subspace U
of Rn , we want to define the orthogonal projection as the function from Rn to Rn which assigns to
each vector ~v its orthogonal projection onto U . We start with the analogue of Theorem 2.22.
Theorem 7.34 (Orthogonal projection). Let U ⊆ Rn be a subspace and let ~v ∈ Rn . Then there
exist uniquely determined vectors ~v_∥ ∈ U and ~v_⊥ ∈ U⊥ such that
\[
\vec v = \vec v_\parallel + \vec v_\perp. \tag{7.7}
\]
The vector ~v_∥ is called the orthogonal projection of ~v onto U; it is denoted by proj_U ~v.
Proof. First we show the existence of the vectors ~v_∥ and ~v_⊥. If U = Rn, we take ~v_∥ = ~v and ~v_⊥ = ~0. If U = {~0}, we take ~v_∥ = ~0 and ~v_⊥ = ~v. Otherwise, let 0 < dim U = k < n. Choose orthonormal bases ~u1, ..., ~uk of U and ~w_{k+1}, ..., ~w_n of U⊥. This is possible by Theorem 7.28 and Proposition 7.30. Then ~u1, ..., ~uk, ~w_{k+1}, ..., ~w_n is an orthonormal basis of Rn and for every ~v ∈ Rn we find with the help of Theorem 7.7 that
\[
\vec v = \underbrace{\langle \vec u_1 , \vec v\rangle \vec u_1 + \dots + \langle \vec u_k , \vec v\rangle \vec u_k}_{=:\ \vec v_\parallel\ \in\ U}
+ \underbrace{\langle \vec w_{k+1} , \vec v\rangle \vec w_{k+1} + \dots + \langle \vec w_n , \vec v\rangle \vec w_n}_{=:\ \vec v_\perp\ \in\ U^\perp}.
\]
Next we show uniqueness of the decomposition of ~v. Assume that there are vectors ~v_∥, ~z_∥ ∈ U and ~v_⊥, ~z_⊥ ∈ U⊥ such that ~v = ~v_∥ + ~v_⊥ and ~v = ~z_∥ + ~z_⊥. Then ~v_∥ + ~v_⊥ = ~z_∥ + ~z_⊥ and, rearranging, we find that
\[
\underbrace{\vec v_\parallel - \vec z_\parallel}_{\in U} = \underbrace{\vec z_\perp - \vec v_\perp}_{\in U^\perp}.
\]
Since U ∩ U⊥ = {~0}, it follows that ~v_∥ − ~z_∥ = ~0 and ~z_⊥ − ~v_⊥ = ~0, and therefore ~z_∥ = ~v_∥ and ~z_⊥ = ~v_⊥.
Definition 7.35. Let U be a subspace of Rn . Then we define the orthogonal projection onto U as
the map which sends ~v ∈ Rn to its orthogonal projection onto U . It is usually denoted by PU , so
PU : Rn → R n , PU ~v = projU ~v .
Remark 7.36 (Formula for the orthogonal projection). The proof of Theorem 7.34 indicates
how we can calculate the orthogonal projection onto a given subspace U ⊆ Rn . If ~u1 , . . . , ~uk is an
orthonormal basis of U , then
PU ~v = h~u1 , ~v i~u1 + · · · + h~uk , ~v i~uk . (7.8)
This shows that PU is a linear transformation since PU (~x + c~y ) = PU ~x + cPU ~y follows easily from
(7.8).
Exercise. If ~u1, ..., ~uk is an orthogonal basis of U (but not necessarily orthonormal), show that
\[
P_U \vec v = \frac{\langle \vec u_1 , \vec v\rangle}{\|\vec u_1\|^2}\,\vec u_1 + \dots + \frac{\langle \vec u_k , \vec v\rangle}{\|\vec u_k\|^2}\,\vec u_k. \tag{7.9}
\]
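The formulas (7.8) and (7.9) translate directly into code. The following short Python sketch (an illustration only, not part of the notes; it assumes numpy) computes P_U~v from an orthonormal basis and from a merely orthogonal basis.

\begin{verbatim}
import numpy as np

def proj_onb(v, onb):
    """Projection of v onto span(onb), onb an orthonormal basis (formula (7.8))."""
    return sum(np.dot(u, v) * u for u in onb)

def proj_orth(v, orth):
    """Same projection from a merely orthogonal basis (formula (7.9))."""
    return sum(np.dot(u, v) / np.dot(u, u) * u for u in orth)

# U = xy-plane in R^3
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
v = np.array([3.0, -1.0, 5.0])

print(proj_onb(v, [e1, e2]))            # [ 3. -1.  0.]
print(proj_orth(v, [2 * e1, 7 * e2]))   # the same vector
\end{verbatim}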
Remark 7.37 (Formula for the orthogonal projection for dim U = 1). If dim U = 1, we obtain again the formula (7.6) which we already know from Section 2.3. To see this, choose ~w ∈ U with ~w ≠ ~0. Then ~w0 = ‖~w‖^{-1}~w is an orthonormal basis of U and according to (7.8) we have that
\[
\operatorname{proj}_{\vec w}\vec v = \operatorname{proj}_U \vec v
= \langle \vec w_0 , \vec v\rangle \vec w_0
= \|\vec w\|^{-1}\langle \vec w , \vec v\rangle\,\|\vec w\|^{-1}\vec w
= \|\vec w\|^{-2}\langle \vec w , \vec v\rangle\,\vec w
= \frac{\langle \vec w , \vec v\rangle}{\|\vec w\|^2}\,\vec w.
\]
Remark 7.38 (Pythagoras's Theorem). Let U be a subspace of Rn, ~v ∈ Rn and let ~v_∥ and ~v_⊥ be as in Theorem 7.34. Then, since ~v_∥ ⊥ ~v_⊥,
\[
\|\vec v\|^2 = \|\vec v_\parallel\|^2 + \|\vec v_\perp\|^2.
\]
Exercise 7.39. Let U be a subspace of Rn with basis ~u1, ..., ~uk and let ~w_{k+1}, ..., ~w_n be a basis of U⊥. Find the matrix representation of P_U with respect to the basis ~u1, ..., ~uk, ~w_{k+1}, ..., ~w_n.
Exercise 7.40. Let U be a subspace of Rn . Show that PU ⊥ = id −PU . (You can show this either
directly or using the matrix representation of PU from Exercise 7.39.)
Exercise 7.41. Let U be a subspace of Rn . Show that (PU )2 = PU . (You can show this either
directly or using the matrix representation of PU from Exercise 7.39.)
In Theorem 7.34 we used the concept of orthogonality to define the orthogonal projection of ~v onto
a given subspace. We obtained a decomposition of ~v into a part parallel to the given subspace and
a part orthogonal to it. The next theorem shows that the orthogonal projection of ~v onto U gives
us the point in U which is closest to ~v .
Figure 7.3: The figure shows the orthogonal projection proj_U ~v of the vector ~v onto the subspace U (which is a vector) and the distance dist(~v, U) of ~v to U (which is a number: it is the length of the vector ~v − proj_U ~v).
Theorem 7.43. Let U be a subspace of Rn and let ~v ∈ Rn . Then PU ~v is the point in U which is
closest to ~v , that is,
\[
\|\vec v - P_U\vec v\| \le \|\vec v - \vec u\| \qquad\text{for every } \vec u \in U.
\]
Proof. Let ~u ∈ U. Since ~v − P_U~v ∈ U⊥ and P_U~v − ~u ∈ U, Pythagoras's theorem gives
\[
\|\vec v - \vec u\|^2 = \|(\vec v - P_U\vec v) + (P_U\vec v - \vec u)\|^2
= \|\vec v - P_U\vec v\|^2 + \|P_U\vec v - \vec u\|^2 \ge \|\vec v - P_U\vec v\|^2.
\]
Taking the square root on both sides shows the desired inequality.
Definition 7.44. Let U be a subspace of Rn and let ~v ∈ Rn. Then we define the distance of ~v to U as
dist(~v , U ) := k~v − PU ~v k.
This is the shortest distance of ~v to any point in U .
In Remark 7.36 we already found a formula for the orthogonal projection PU of a vector ~v to a
given subspace U . This formula however requires to have an orthonormal basis of U . We want to
give another formula for PU which does not require the knowledge of an orthonormal basis.
Theorem 7.45. Let U be a subspace of Rn with basis ~u1 , . . . , ~uk and let B ∈ M (n × k) be the
matrix whose columns are these basis vectors. Then the following holds.
(i) B is injective.
(ii) B t B : Rk → Rk is a bijection.
(iii) The orthogonal projection onto U is given by the formula
PU = B(B t B)−1 B t .
Proof. (i) By construction, the columns of B are linearly independent. Therefore the unique
solution of B~x = ~0 is ~x = ~0 which shows that B is injective.
(ii) Observe that B^tB ∈ M(k × k) and assume that B^tB~x = ~0 for some ~x ∈ Rk. Then it follows that B~x = ~0 because
\[
0 = \langle B^tB\vec x , \vec x\rangle = \langle B\vec x , B\vec x\rangle = \|B\vec x\|^2.
\]
Since B is injective by (i), we conclude that ~x = ~0. Hence B^tB is injective and, being a square matrix, it is a bijection from Rk to Rk.

(iii) Note that P_U~x ∈ U = Im B for every ~x ∈ Rn. Since B is injective, there exists exactly one ~z ∈ Rk such that P_U~x = B~z. Moreover, ~x − P_U~x ⊥ U = Im B, hence for every ~y ∈ Rk we have that
\[
0 = \langle B\vec y , \vec x - B\vec z\rangle = \langle \vec y , B^t\vec x - B^tB\vec z\rangle.
\]
Since this is true for every ~y ∈ Rk, it follows that B^t~x − B^tB~z = ~0. Now we recall that B^tB is invertible, so we can solve for ~z and obtain ~z = (B^tB)^{-1}B^t~x. This finally gives
\[
P_U\vec x = B\vec z = B(B^tB)^{-1}B^t\vec x.
\]
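As an illustration of Theorem 7.45 (not part of the original notes, and assuming numpy), the following sketch builds P = B(B^tB)^{-1}B^t for the basis vectors from Example 7.32 and checks the characteristic properties of an orthogonal projection.

\begin{verbatim}
import numpy as np

# basis of U as columns of B (need not be orthonormal)
B = np.array([[1.0, 1.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 0.0]])

P = B @ np.linalg.inv(B.T @ B) @ B.T     # formula from Theorem 7.45

v = np.array([1.0, 1.0, 1.0, 1.0])
print(P @ v)                              # orthogonal projection of v onto U

# sanity checks: P is a projection (P^2 = P) and v - Pv is orthogonal to U
print(np.allclose(P @ P, P))              # True
print(np.allclose(B.T @ (v - P @ v), 0))  # True
\end{verbatim}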
Theorem 7.46. Let U be a subspace of Rn with basis ~u1, ..., ~uk. Then there exists an orthonormal basis ~x1, ..., ~xk of U such that
\[
\operatorname{span}\{\vec x_1, \dots, \vec x_j\} = \operatorname{span}\{\vec u_1, \dots, \vec u_j\} \qquad\text{for every } j = 1, \dots, k.
\]
Proof. The proof is constructive, that is, we do not only prove the existence of such basis, but it
tells us how to calculate it. The idea is to construct the new basis ~x1 , . . . , ~xk step by step. In order
to simplify notation a bit, we set Uj = span{~u1 , . . . , ~uj } for j = 1, . . . , k. Note that dim Uj = j
and that Uk = U .
• Set ~x1 = ‖~u1‖^{-1}~u1. Then clearly ‖~x1‖ = 1 and span{~u1} = span{~x1} = U1.
• The vector ~x2 must be a normalised vector in U2 which is orthogonal to ~x1, that is, it must be orthogonal to U1. So we simply take ~u2 and subtract its projection onto U1:
\[
\vec w_2 = \vec u_2 - \operatorname{proj}_{U_1}\vec u_2 = \vec u_2 - \operatorname{proj}_{\vec x_1}\vec u_2 = \vec u_2 - \langle \vec x_1 , \vec u_2\rangle\vec x_1.
\]
Clearly ~w2 ∈ U2 because it is a linear combination of vectors in U2. Moreover, ~w2 ⊥ U1 because
\[
\langle \vec w_2 , \vec x_1\rangle
= \big\langle \vec u_2 - \langle \vec x_1 , \vec u_2\rangle\vec x_1\,,\ \vec x_1\big\rangle
= \langle \vec u_2 , \vec x_1\rangle - \langle \vec x_1 , \vec u_2\rangle\langle \vec x_1 , \vec x_1\rangle
= \langle \vec u_2 , \vec x_1\rangle - \langle \vec x_1 , \vec u_2\rangle = 0.
\]
Finally we normalise:
\[
\vec x_2 = \|\vec w_2\|^{-1}\vec w_2.
\]
Since ~x2 ∈ U2 it follows that span{~x1 , ~x2 } ⊆ U2 . Both spaces have dimension 2, so they must
be equal.
• The vector ~x3 must be a normalised vector in U3 which is orthogonal to U2 = span{~x1, ~x2}. So we simply take ~u3 and subtract its projection onto U2:
\[
\vec w_3 = \vec u_3 - \operatorname{proj}_{U_2}\vec u_3
= \vec u_3 - \big(\operatorname{proj}_{\vec x_1}\vec u_3 + \operatorname{proj}_{\vec x_2}\vec u_3\big)
= \vec u_3 - \langle \vec x_1 , \vec u_3\rangle\vec x_1 - \langle \vec x_2 , \vec u_3\rangle\vec x_2,
\qquad
\vec x_3 = \|\vec w_3\|^{-1}\vec w_3.
\]
Since ~x3 ∈ U3 it follows that span{~x1 , ~x2 , ~x3 } ⊆ U3 . Since both spaces have dimension 3, they
must be equal.
We repeat this k times until we have constructed the basis ~x1, ..., ~xk.
Note that the general procedure is as follows:

• Suppose that we have already constructed ~x1, ..., ~x_ℓ. Then we first construct
\[
\vec w_{\ell+1} = \vec u_{\ell+1} - \sum_{j=1}^{\ell}\langle \vec x_j , \vec u_{\ell+1}\rangle\vec x_j
\]
and then normalise, ~x_{ℓ+1} = ‖~w_{ℓ+1}‖^{-1}~w_{ℓ+1}.
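The general step above is exactly what an implementation of the Gram-Schmidt process does. The following Python sketch (illustrative only, assuming numpy and linearly independent input vectors) mirrors the procedure.

\begin{verbatim}
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis of span(vectors); the input vectors
    are assumed to be linearly independent (as in Theorem 7.46)."""
    basis = []
    for u in vectors:
        w = u - sum(np.dot(x, u) * x for x in basis)  # subtract projection onto previous x_j
        basis.append(w / np.linalg.norm(w))           # normalise
    return basis

u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, 0.0, 1.0])
u3 = np.array([0.0, 1.0, 1.0])
for x in gram_schmidt([u1, u2, u3]):
    print(x)   # pairwise orthogonal unit vectors spanning R^3
\end{verbatim}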
Example 7.47. Let U = span{~u1, ~u2, ~u3} ⊆ R5, where ~u1 = (1, 1, 0, 1, 1)^t. We want to find an orthonormal basis ~x1, ~x2, ~x3 of U using the Gram-Schmidt process.

Solution. (i) ~x1 = ‖~u1‖^{-1}~u1 = ½ ~u1 = ½ (1, 1, 0, 1, 1)^t.

(ii) ~w2 = ~u2 − proj_{~x1}~u2 = ~u2 − ⟨~x1, ~u2⟩~x1 = ~u2 − 4~x1 = ~u2 − 2~u1 = (−3, 2, √2, 1, 0)^t, hence
~x2 = ‖~w2‖^{-1}~w2 = ¼ (−3, 2, √2, 1, 0)^t.

(iii) ~w3 = ~u3 − proj_{span{~x1, ~x2}}~u3 = ~u3 − ⟨~x1, ~u3⟩~x1 − ⟨~x2, ~u3⟩~x2 = ~u3 − 2~x1 + 4~x2 = (0, −2, √2, 2, 0)^t, hence
~x3 = ‖~w3‖^{-1}~w3 = (1/√10) (0, −2, √2, 2, 0)^t.

Therefore the desired orthonormal basis of U is
\[
\vec x_1 = \tfrac12\begin{pmatrix}1\\1\\0\\1\\1\end{pmatrix},\qquad
\vec x_2 = \tfrac14\begin{pmatrix}-3\\2\\\sqrt2\\1\\0\end{pmatrix},\qquad
\vec x_3 = \tfrac{1}{\sqrt{10}}\begin{pmatrix}0\\-2\\\sqrt2\\2\\0\end{pmatrix}.
\]
Note that we will obtain a different basis if we change the order of the given basis ~u1, ~u2, ~u3.
Example 7.48. We will give another solution of Example 7.33. We were asked to find an orthonormal basis of the orthogonal complement of
\[
U = \operatorname{span}\left\{ \begin{pmatrix}1\\2\\3\\4\end{pmatrix}, \begin{pmatrix}1\\0\\1\\0\end{pmatrix} \right\} \subseteq \mathbb{R}^4.
\]
From Example 7.32 we know that U⊥ = span{~v1, ~v2} with ~v1 = (0, −2, 0, 1)^t and ~v2 = (−1, −1, 1, 0)^t. We use the Gram-Schmidt process to obtain an orthonormal basis ~x1, ~x2 of U⊥.

(i) ~x1 = ‖~v1‖^{-1}~v1 = (1/√5) (0, −2, 0, 1)^t.

(ii) \[
\vec y_2 = \vec v_2 - \operatorname{proj}_{\vec x_1}\vec v_2
= \vec v_2 - \langle \vec x_1 , \vec v_2\rangle\vec x_1
= \vec v_2 - \tfrac{2}{\sqrt5}\vec x_1
= \begin{pmatrix}-1\\-1\\1\\0\end{pmatrix} - \tfrac25\begin{pmatrix}0\\-2\\0\\1\end{pmatrix}
= \tfrac15\begin{pmatrix}-5\\-1\\5\\-2\end{pmatrix},
\]
hence ~x2 = ‖~y2‖^{-1}~y2 = (1/√55) (−5, −1, 5, −2)^t.

Therefore
\[
\vec x_1 = \tfrac{1}{\sqrt5}\begin{pmatrix}0\\-2\\0\\1\end{pmatrix},\qquad
\vec x_2 = \tfrac{1}{\sqrt{55}}\begin{pmatrix}-5\\-1\\5\\-2\end{pmatrix}.
\]
• apply the Gram-Schmidt process in order to generate an orthonormal basis of a given subspace,
• etc.
In this section we want to present the least squares method to fit a linear function to certain
measurements. Let us see an example.
Example 7.49. Assume that we want to measure the spring constant k of a spring. By Hooke's law we know that
\[
y = y_0 + km \tag{7.10}
\]
where y_0 is the elongation of the spring without any mass attached and y is the elongation of the spring when we attach the mass m to it.
Assume that we measure the elongation for different masses. If Hook’s law is valid and if our
measurements were perfect, then our measured points should lie on a line with slope k. However,
measurements are never perfect and the points will rather be scattered around a line. Assume that
we measured the following.
m 2 3 4 5
y 4.5 5.1 6.1 7.9
Figure 7.4: The left plot shows the measured data. In the plot on the right we added the two functions g1(x) = x + 2.5 and g2(x) = 1.1x + 2, which seem to be reasonable candidates for linear approximations to the measured data.
The plot gives us some confidence that Hook’s law holds since the points seem to lie more or less
on a line. How do we best fit a line through the points? The slope seems to be around 1. We could
make the following guesses:
Which of the two functions is the better approximation? Are there other approximations that are
even better?
The answer to these questions depends very much on how we measure how “good” an approximation is. One very common way is the following: for each measured point, we take the difference ∆j := yj − g(mj) between the measured value and the value of our test function. Then we square all these differences, sum them and take the square root,
\[
\Big[\sum_{j=1}^{n}\big(y_j - g(m_j)\big)^2\Big]^{\frac12},
\]
see also Figure 7.5. The resulting number will be our measure for how good our guess is.
Figure 7.5: The graph on the left shows points for which we want to find an approximating linear function. The graph on the right shows such a linear function and how to measure the error or discrepancy between the measured points and the proposed line. A measure for the error is \big[\sum_{j=1}^{n}\Delta_j^2\big]^{1/2}.
Before we do this for our data, we make some simple observations.

(i) If all the measured points lie on a line and we take this line as our candidate, then this method gives the total error 0, as it should.

(ii) We take the squares of the errors in the measured points so that every error counts as positive. Otherwise the errors could cancel each other: if we simply summed the errors, the total error could be 0 even though the approximating line is quite far from all the measured points.

(iii) There are other ways to measure the error, for example one could use \sum_{j=1}^{n}|y_j - g(m_j)|, but it turns out that the method with the squares has many advantages. (See a course on optimisation for further details.)
Now let us calculate the errors for our measure points and our two proposed functions.
m 2 3 4 5 m 2 3 4 5
D
y (measured) 4.5 5.1 6.1 7.9 y (measured) 4.5 5.1 6.1 7.9
g1 (m) 4.5 4.5 6.5 7.5 g2 (m) 4.2 5.3 6.4 7.5
y − g1 0 0.6 -0.4 0.4 y − g2 0.3 -0.2 -0.3 0.4
so our second guess seems to be closer to the best linear approximation to our measured points
than the first guess. This exercise will be continued on p. 260.
Now the question arises how we can find the optimal linear approximation.
Best linear approximation. Assume we are given measured data (x1 , y1 ), . . . , (xn , yn ) and we
want to find a linear function g(x) = ax + b such that the total error
\[
\Delta := \Big[\sum_{j=1}^{n}\big(y_j - g(x_j)\big)^2\Big]^{\frac12} \tag{7.11}
\]
is minimal. In other words, we have to find the parameters a and b such that ∆ becomes as small
as possible. The key here is to recognise the right hand side of (7.11) as the norm of a vector (here the particular form of how we chose to measure the error is crucial). Let us rewrite (7.11) as follows:
\[
\Delta = \Big[\sum_{j=1}^{n}\big(y_j - g(x_j)\big)^2\Big]^{\frac12}
= \Big[\sum_{j=1}^{n}\big(y_j - (ax_j + b)\big)^2\Big]^{\frac12}
= \left\| \begin{pmatrix} y_1 - (ax_1 + b) \\ y_2 - (ax_2 + b) \\ \vdots \\ y_n - (ax_n + b) \end{pmatrix} \right\|
= \left\| \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
- \left[ a\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + b\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \right] \right\|.
\]
Let us set
\[
\vec y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},\qquad
\vec x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\qquad\text{and}\qquad
\vec u = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}. \tag{7.12}
\]
Note that these are vectors in Rn. Then
\[
\Delta = \|\vec y - (a\vec x + b\vec u)\| \tag{7.13}
\]
and the question is how we have to choose a and b such that this becomes as small as possible. In
other words, we are looking for the point in the vector space spanned by ~x and ~u which is closest
to ~y . By Theorem 7.43 this point is given by the orthogonal projection of ~y onto that plane.
To calculate this projection, set U = span{~x, ~u} and let P be the orthogonal projection onto U. By Theorem 7.45,¹
\[
P = A(A^tA)^{-1}A^t
\]
where A is the n × 2 matrix whose columns consist of the vectors ~x and ~u. Therefore (7.13) is minimised when
\[
A(A^tA)^{-1}A^t\vec y = a\vec x + b\vec u = A\begin{pmatrix} a \\ b \end{pmatrix}. \tag{7.14}
\]
Since by our assumption the columns of A are linearly independent, A is injective. Therefore we can conclude from (7.14) that
\[
(A^tA)^{-1}A^t\vec y = \begin{pmatrix} a \\ b \end{pmatrix},
\]
which is the formula for the numbers a and b that we were looking for.

¹ Of course, you could simply calculate P~y and then solve the linear n × 2 system a~x + b~u = P~y to find the coefficients a and b.
Theorem 7.50. Let (x1, y1), ..., (xn, yn) be given. The linear function g(x) = ax + b which minimises the total error
\[
\Delta := \Big[\sum_{j=1}^{n}\big(y_j - g(x_j)\big)^2\Big]^{\frac12} \tag{7.15}
\]
is given by
\[
\begin{pmatrix} a \\ b \end{pmatrix} = (A^tA)^{-1}A^t\vec y \tag{7.16}
\]
where ~y, ~x and ~u are as in (7.12) and A is the n × 2 matrix whose columns consist of the vectors ~x and ~u.
In Remark 7.51 we will show how this formula can be derived with methods from calculus.
Example 7.49 (continued). Let us use Theorem 7.50 to calculate the best linear approximation to the data from Example 7.49. Note that in this case the mj correspond to the xj from the theorem and we will write ~m instead of ~x. In this case, we have
\[
\vec m = \begin{pmatrix} 2 \\ 3 \\ 4 \\ 5 \end{pmatrix},\qquad
\vec u = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix},\qquad
A = (\vec m\,|\,\vec u) = \begin{pmatrix} 2 & 1 \\ 3 & 1 \\ 4 & 1 \\ 5 & 1 \end{pmatrix},\qquad
\vec y = \begin{pmatrix} 4.5 \\ 5.1 \\ 6.1 \\ 7.9 \end{pmatrix},
\]
hence
\[
A^tA = \begin{pmatrix} 2 & 3 & 4 & 5 \\ 1 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 3 & 1 \\ 4 & 1 \\ 5 & 1 \end{pmatrix}
= \begin{pmatrix} 54 & 14 \\ 14 & 4 \end{pmatrix},\qquad
(A^tA)^{-1} = \frac{1}{10}\begin{pmatrix} 2 & -7 \\ -7 & 27 \end{pmatrix},
\]
and therefore
\[
\begin{pmatrix} a \\ b \end{pmatrix} = (A^tA)^{-1}A^t\vec y
= \frac{1}{10}\begin{pmatrix} 2 & -7 \\ -7 & 27 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 & 5 \\ 1 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 4.5 \\ 5.1 \\ 6.1 \\ 7.9 \end{pmatrix}
= \frac{1}{10}\begin{pmatrix} -3 & -1 & 1 & 3 \\ 13 & 6 & -1 & -8 \end{pmatrix}
\begin{pmatrix} 4.5 \\ 5.1 \\ 6.1 \\ 7.9 \end{pmatrix}
= \begin{pmatrix} 1.12 \\ 1.98 \end{pmatrix}.
\]
Figure 7.6: The plot shows the measured data and the linear approximation g(m) = 1.12m + 1.98 calculated with Theorem 7.50.
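The computation above can be reproduced numerically. The following Python sketch (assuming numpy; not part of the original notes) evaluates formula (7.16) and compares it with numpy's built-in least squares solver.

\begin{verbatim}
import numpy as np

m = np.array([2.0, 3.0, 4.0, 5.0])
y = np.array([4.5, 5.1, 6.1, 7.9])

A = np.column_stack([m, np.ones_like(m)])   # A = (m | u)
a, b = np.linalg.inv(A.T @ A) @ A.T @ y     # formula (7.16)
print(a, b)                                  # approximately 1.12 and 1.98

# np.linalg.lstsq solves the same least squares problem directly
print(np.linalg.lstsq(A, y, rcond=None)[0])  # [1.12 1.98]
\end{verbatim}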
The method above can be generalised to other types of functions. We will show how it can be
adapted to the case of polynomial and to exponential functions.
Polynomial functions. Assume we are given measured data (x1, y1), ..., (xn, yn) and we want to find a polynomial of degree k which best fits the data points. Let p(x) = a_k x^k + a_{k−1} x^{k−1} + ... + a_1 x + a_0 be the desired polynomial. We define the vectors
\[
\vec y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},\quad
\vec\xi_k = \begin{pmatrix} x_1^k \\ x_2^k \\ \vdots \\ x_n^k \end{pmatrix},\quad
\vec\xi_{k-1} = \begin{pmatrix} x_1^{k-1} \\ x_2^{k-1} \\ \vdots \\ x_n^{k-1} \end{pmatrix},\quad \dots,\quad
\vec\xi_1 = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},\quad
\vec\xi_0 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}.
\]
As in the linear case, the coefficients which minimise the total error are given by
\[
\begin{pmatrix} a_k \\ \vdots \\ a_0 \end{pmatrix} = (A^tA)^{-1}A^t\vec y,
\]
where A = (ξ_k | ... | ξ_0) is the n × (k + 1) matrix whose columns are the vectors ξ_k, ..., ξ_0. Note that by our assumption k < n (otherwise the vectors ξ_k, ..., ξ_0 cannot be linearly independent).
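A possible implementation of the polynomial fit (illustrative only, assuming numpy; the data are taken from Exercise 8 in Section 7.8):

\begin{verbatim}
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([15.0, 8.0, 2.8, -1.2, -4.9, -7.9, -8.7])

k = 2
A = np.vander(x, k + 1)                        # columns x^k, ..., x, 1
coeffs = np.linalg.lstsq(A, y, rcond=None)[0]  # (a_k, ..., a_1, a_0)
print(coeffs)
\end{verbatim}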
Remark. Generally one should have many more data points than the degree of the polynomial one wants to fit; otherwise the problem of overfitting might occur. For example, assume that the curve we are looking for is f(x) = 0.1 + 0.2x and we are given only three measurements: (0, 0.25), (1, 0), (3, 1). Then a linear fit would give us g(x) = (2/7)x + 1/28 ≈ 0.29x + 0.036. The fit with a quadratic function gives p(x) = ¼x² − ½x + ¼, which matches the data points perfectly but is far away from the curve we are looking for. The reason is that we have too many free parameters in the polynomial, so it fits the data too well. (Note that for any given n + 1 points (x1, y1), ..., (x_{n+1}, y_{n+1}) with pairwise distinct x1, ..., x_{n+1}, there exists exactly one polynomial p of degree ≤ n such that p(xj) = yj for every j = 1, ..., n + 1.) If we had a lot more data points and we tried to fit a polynomial to a linear function, then the leading coefficient should become very small, but this effect does not appear if we have very few data points.
Figure 7.7: Example of overfitting when we have too many free variables for a given set of data points. The dots mark the measured points, which are supposed to approximate the red curve f. Fitting a polynomial p of degree 2 leads to the green curve. The blue curve g is the result of a linear fit.
Exponential functions. Assume we are given measured data (x1, y1), ..., (xn, yn) and we want to find a function of the form g(x) = c e^{kx} to fit our data points. Without restriction we may assume that c > 0 (otherwise we fit −g).
Then we only need to define h(x) = ln(g(x)) = ln c + kx so that we can use the method to fit a
linear function to the data points (x1 , ln(y1 )), . . . , (xn , ln(yn )) in order to obtain c and k.
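A sketch of the exponential fit via the logarithm (assuming numpy, not part of the original notes; the data are those of Exercise 7 in Section 7.8):

\begin{verbatim}
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
P = np.array([7.4, 6.5, 5.7, 5.2, 4.9])

# fit ln(P) = ln(c) + k t with a linear least squares fit
A = np.column_stack([t, np.ones_like(t)])
k, ln_c = np.linalg.lstsq(A, np.log(P), rcond=None)[0]
c = np.exp(ln_c)
print(c, k)   # fitted parameters of g(t) = c * exp(k t)
\end{verbatim}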
Remark 7.51. Let us show how the formula in Theorem 7.50 can be derived with analytic methods.
Recall that the problem is the following: Let (x1, y1), ..., (xn, yn) be given. Find a linear function g(x) = ax + b which minimises the total error
\[
\Delta(a,b) = \Big[\sum_{j=1}^{n}\big(y_j - (ax_j + b)\big)^2\Big]^{\frac12}
\]
as a function of the two variables a, b. In order to simplify the calculations a bit, we observe that it is enough to minimise the square of ∆ since ∆(a, b) ≥ 0 for all a, b, and therefore it is minimal if and only if
\[
F(a,b) := \Delta(a,b)^2 = \sum_{j=1}^{n}\big(y_j - ax_j - b\big)^2
\]
is minimal.
To this end, we have to differentiate F. Since F: R² → R, the derivative will be a vector valued function. We find
\[
DF(a,b) = \Big(\frac{\partial F}{\partial a}(a,b),\ \frac{\partial F}{\partial b}(a,b)\Big)
= \Big(\sum_{j=1}^{n} -2x_j\big(y_j - ax_j - b\big),\ \sum_{j=1}^{n} -2\big(y_j - ax_j - b\big)\Big)
= 2\Big(a\sum_{j=1}^{n} x_j^2 + b\sum_{j=1}^{n} x_j - \sum_{j=1}^{n} x_j y_j,\ \ a\sum_{j=1}^{n} x_j + nb - \sum_{j=1}^{n} y_j\Big).
\]
Now we need to find the critical points, that is, a, b such that DF (a, b) = 0. This is the case for
\[
a\sum_{j=1}^{n} x_j^2 + b\sum_{j=1}^{n} x_j = \sum_{j=1}^{n} x_j y_j,
\qquad
a\sum_{j=1}^{n} x_j + bn = \sum_{j=1}^{n} y_j,
\]
that is,
\[
\begin{pmatrix} \sum_{j=1}^{n} x_j^2 & \sum_{j=1}^{n} x_j \\[2pt] \sum_{j=1}^{n} x_j & n \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
= \begin{pmatrix} \sum_{j=1}^{n} x_j y_j \\[2pt] \sum_{j=1}^{n} y_j \end{pmatrix}. \tag{7.18}
\]
Now we can multiply on both sides from the left by the inverse of the matrix and obtain the solution
for a, b. This shows that F has only one critical point. Since F tends to infinity for k(a, b)k → ∞,
RA
the function F must indeed have a minimum in this critical point. For details, see a course on
vector calculus or optimisation.
We observe the following: If, as before, we set
\[
\vec x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},\qquad
\vec u = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix},\qquad
A = (\vec x\,|\,\vec u) = \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix},\qquad
\vec y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},
\]
then
\[
\sum_{j=1}^{n} x_j^2 = \langle \vec x , \vec x\rangle,\quad
\sum_{j=1}^{n} x_j = \langle \vec x , \vec u\rangle,\quad
n = \langle \vec u , \vec u\rangle,\quad
\sum_{j=1}^{n} x_j y_j = \langle \vec x , \vec y\rangle,\quad
\sum_{j=1}^{n} y_j = \langle \vec u , \vec y\rangle,
\]
so (7.18) is nothing but the system A^tA \binom{a}{b} = A^t\vec y,
which becomes our equation (7.16) if we multiply both sides of the equation from the left by
(At A)−1 .
• fit a linear function to given data points,
• fit a polynomial to given data points,
• fit an exponential function to given data points,
• etc.
7.7 Summary
Let U be a subspace of Rn. Then its orthogonal complement is defined by
\[
U^\perp = \{\vec x \in \mathbb{R}^n : \langle \vec x , \vec u\rangle = 0 \text{ for every } \vec u \in U\}.
\]
• U ⊥ is a vector space.
• U ⊥ = ker A where A is any matrix whose rows are formed by a basis of U .
• (U ⊥ )⊥ = U .
• dim U + dim U ⊥ = n.
• U ⊕ U ⊥ = Rn .
• U has an orthonormal basis. One way to construct such a basis is to first construct an
arbitrary basis of U and then apply the Gram-Schmidt orthogonalisation process to obtain
an orthonormal basis.
• PU is a linear transformation.
• PU ~x k U for every ~x ∈ Rn .
• ~x − PU ~x ⊥ U for every ~x ∈ Rn .
• For every ~x ∈ Rn the point in U nearest to ~x is given by PU~x and dist(~x, U) = ‖~x − PU~x‖.
• Formulas for PU: if ~u1, ..., ~uk is an orthonormal basis of U, then PU~x = ⟨~u1, ~x⟩~u1 + ... + ⟨~uk, ~x⟩~uk; if B is a matrix whose columns form a basis of U, then PU = B(B^tB)^{-1}B^t.
Orthogonal matrices
A matrix Q ∈ M (n × n) is called an orthogonal matrix if it is invertible and if Q−1 = Qt . Note
that the following assertions for a matrix Q ∈ M(n × n) are equivalent:
• Q is orthogonal, i.e. Q^{-1} = Q^t;
• the columns of Q form an orthonormal basis of Rn;
• the rows of Q form an orthonormal basis of Rn;
• ⟨Q~x, Q~y⟩ = ⟨~x, ~y⟩ for all ~x, ~y ∈ Rn;
• ‖Q~x‖ = ‖~x‖ for all ~x ∈ Rn.
Every orthogonal matrix represents either a rotation (in this case its determinant is 1) or a com-
position of a rotation with a reflection (in this case its determinant is −1).
7.8 Exercises
1. (a) Complete \begin{pmatrix} 1/4 \\ \sqrt{15/16} \end{pmatrix} to an orthonormal basis of R2. How many possibilities are there to do so?

(b) Complete \begin{pmatrix} 1/\sqrt2 \\ -1/\sqrt2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1/\sqrt3 \\ 1/\sqrt3 \\ 1/\sqrt3 \end{pmatrix} to an orthonormal basis of R3. How many possibilities are there to do so?

(c) Complete \begin{pmatrix} 1/\sqrt2 \\ 1/\sqrt2 \\ 0 \end{pmatrix} to an orthonormal basis of R3. How many possibilities are there to do so?
2. Find a basis for the orthogonal complement of each of the following vector spaces. Find the dimension of the space and the dimension of its orthogonal complement.

(a) U = span{(1, 2, 3, 4)^t, (2, 3, 4, 5)^t} ⊆ R4,

(b) U = span{(1, 2, 3, 4)^t, (3, 4, 5, 6)^t, (2, 3, 4, 5)^t} ⊆ R4.
(i) Find an orthogonal basis of W.

(ii) Let ~a1 = (1, 2, 0, 1)^t, ~a2 = (11, 4, 4, −3)^t, ~a3 = (0, −1, −1, 0)^t. For each j = 1, 2, 3 find the point ~wj ∈ W closest to ~aj and compute the distance between ~aj and ~wj.

(iii) Find the matrix which represents the orthogonal projection onto W (with respect to the standard basis).
4. Let ~v, ~w ∈ R4 and ~a, ~b, ~c, ~d ∈ R5 be the given vectors.

(a) Show that ~v and ~w are linearly independent and find an orthonormal basis of U = span{~v, ~w} ⊆ R4.

(b) Show that ~a, ~b, ~c and ~d are linearly independent. Use the Gram-Schmidt process to find an orthonormal basis of U = span{~a, ~b, ~c, ~d} ⊆ R5. Find a basis of U⊥.
6. A ball rolls along the x-axis with constant velocity. Along the trajectory of the ball, its x-coordinates are measured at certain times t. The measurements are (t in seconds, x in metres):

(b) Use the method of least squares to find the initial position x0 and the velocity v of the ball.

(c) Draw the line in the previous sketch. Where/how can x0 and v be seen?
7. An unstable chemical substance is assumed to decay according to the law P(t) = P0 e^{kt}. Suppose that the following measurements were made:

t  1    2    3    4    5
P  7.4  6.5  5.7  5.2  4.9

Using the method of least squares applied to ln(P(t)), find the P0 and k which best match the measurements. Give an estimate for P(8).
8. Using the method of least squares, find the polynomial y = p(x) of degree 2 which best approximates the following data:

x  −2  −1  0    1     2     3     4
y  15   8  2.8  −1.2  −4.9  −7.9  −8.7
10. (a) Let φ ∈ R and let ~v1 = (cos φ, −sin φ)^t, ~v2 = (sin φ, cos φ)^t. Show that ~v1, ~v2 is an orthonormal basis of R2.

(b) Let α ∈ R. Find the matrix Q(α) ∈ M(2 × 2) which describes the counterclockwise rotation by α.

(c) Let α, β ∈ R. Explain why it is clear that Q(α)Q(β) = Q(α + β). Use this relation to deduce the trigonometric identities
\[
\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta,
\qquad
\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta.
\]
In this chapter we work mostly in Rn and in Cn . We write MR (n × n) or MC (n × n) only if it is
important if the matrix under consideration is a real or a complex matrix.
The first section is dedicated to Cn . We already know that it is a vector space. But now we
introduce an inner product on it. Moreover we define hermitian and unitary matrices on Cn which
are analogous to symmetric and orthogonal matrices in Rn . We define eigenvalues and eigenvectors
in Section 8.3. It turns out that it is more convenient to work over C because the eigenvalues
RA
are zeros of the so-called characteristic polynomial and in C every polynomial has a zero. The
main theorem is Theorem 8.48 which says that an n × n matrix is diagonalisable if it has enough
eigenvectors to generate Cn (or Rn ). It turns out that every symmetric and every hermitian matrix
is diagonalisable.
We end the chapter with an application of orthogonal diagonalisation to the solution of quadratic
equations in two variables.
8.1 Complex vector spaces

Recall that
\[
\mathbb{C}^n = \left\{ \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} : z_1, \dots, z_n \in \mathbb{C} \right\}.
\]
Definition 8.1. For vectors
\[
\vec z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix},\ \vec w = \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} \in \mathbb{C}^n
\]
the inner product (or scalar product or dot product) is defined as
\[
\langle \vec z , \vec w\rangle
= \left\langle \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}, \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} \right\rangle
= \sum_{j=1}^{n} z_j\overline{w_j} = z_1\overline{w_1} + \dots + z_n\overline{w_n}.
\]
The length of ~z = (z1, ..., zn)^t ∈ Cn is denoted by ‖~z‖ and it is given by
\[
\|\vec z\| = \sqrt{|z_1|^2 + \dots + |z_n|^2}.
\]
Other names for the length of ~z are magnitude of ~z or norm of ~z.
Exercise 8.2. Show that the scalar product from Definition 8.1 can be viewed as an extension of
the scalar product in Rn in the following sense: If the components of ~z and ~v happen to be real,
then they can also be seen as vectors in Rn . The claim is that their scalar product as vectors in
Rn is equal to their scalar product in Cn . The same is true for their norms.
Properties 8.3. (i) Norm of a vector: For all vectors ~z ∈ Cn , we have that
h~z , ~zi = k~zk2 .
(iii) Sesquilinearity of the inner product: For all vectors ~v, ~w, ~z ∈ Cn and all c ∈ C, we have that
\[
\langle \vec v + c\vec w , \vec z\rangle = \langle \vec v , \vec z\rangle + c\langle \vec w , \vec z\rangle
\qquad\text{and}\qquad
\langle \vec v , \vec w + c\vec z\rangle = \langle \vec v , \vec w\rangle + \overline{c}\langle \vec v , \vec z\rangle.
\]
For the first equation we calculate
\[
\langle \vec v + c\vec w , \vec z\rangle
= (v_1 + cw_1)\overline{z_1} + \dots + (v_n + cw_n)\overline{z_n}
= v_1\overline{z_1} + \dots + v_n\overline{z_n} + c\big(w_1\overline{z_1} + \dots + w_n\overline{z_n}\big)
= \langle \vec v , \vec z\rangle + c\langle \vec w , \vec z\rangle.
\]
The second equation can be shown by an analogous calculation. Instead of repeating it, we can also use the symmetry property of the inner product:
\[
\langle \vec v , \vec w + c\vec z\rangle
= \overline{\langle \vec w + c\vec z , \vec v\rangle}
= \overline{\langle \vec w , \vec v\rangle + c\langle \vec z , \vec v\rangle}
= \overline{\langle \vec w , \vec v\rangle} + \overline{c}\,\overline{\langle \vec z , \vec v\rangle}
= \langle \vec v , \vec w\rangle + \overline{c}\langle \vec v , \vec z\rangle.
\]
(iv) ‖c~z‖² = ⟨c~z, c~z⟩ = c\overline{c}⟨~z, ~z⟩ = |c|²‖~z‖². Taking the square root on both sides, we obtain the desired equality ‖c~z‖ = |c| ‖~z‖.
For Cn there is no cosine theorem and in general it does not make too much sense to speak about
the angle between two complex vectors (orthogonality still makes sense!).
(i) If ~z ⊥ ~v, then ‖~z + ~v‖² = ‖~z‖² + ‖~v‖² (Pythagoras).

(ii) If ~v ≠ ~0, then the orthogonal projection of ~z onto ~v is
\[
\operatorname{proj}_{\vec v}\vec z = \frac{\langle \vec z , \vec v\rangle}{\|\vec v\|^2}\,\vec v.
\]
Proof. (i) If ~z ⊥ ~v , then k~z +~v k2 = h~z , ~zi + h~z , ~v i + h~v , ~zi + h~v , ~v i = h~z , ~zi + h~v , ~v i = k~zk2 + k~v k2 .
(ii) It is clear that ~z = ~zk + ~z⊥ and that ~zk k ~v by definition of ~zk and ~z⊥ . That ~z⊥ ⊥ ~v follows
from
h~z , ~v i
h~z⊥ , ~v i = h~z − proj~v ~z , ~v i = h~z , ~v i − hproj~v ~z , ~v i = h~z , ~v i − h~v , ~v i = h~z , ~v i − h~z , ~v i = 0.
k~v k2
Finally, by the Pythagoras theorem,
k~zk2 = k(~z − proj~v ~z) + proj~v ~zk2 = k~z − proj~v ~zk2 + k proj~v ~zk2 ≥ k proj~v ~zk2 .
= proj~v ~z1 + c proj~v ~z2 .
Question 8.1
What changes if in the definition of the orthogonal projection we put h~v , ~zi instead of h~z , ~v i?
Now let us show the triangle inequality. Note that the following inequalities (8.1) and (8.2) were proved for real vector spaces in Corollary 2.20 using the cosine theorem.
Proposition 8.6. For all vectors ~v, ~w ∈ Cn we have the Cauchy-Schwarz inequality (which is a special case of the so-called Hölder inequality)
\[
|\langle \vec v , \vec w\rangle| \le \|\vec v\|\,\|\vec w\| \tag{8.1}
\]
and the triangle inequality
\[
\|\vec v + \vec w\| \le \|\vec v\| + \|\vec w\|. \tag{8.2}
\]
Proof. If ~w = ~0, then both sides of the Cauchy-Schwarz inequality are equal to 0. So let us assume now that ~w ≠ ~0. Note that for any λ ∈ C we have that
\[
0 \le \|\vec v - \lambda\vec w\|^2
= \langle \vec v - \lambda\vec w , \vec v - \lambda\vec w\rangle
= \|\vec v\|^2 - \overline{\lambda}\langle \vec v , \vec w\rangle - \lambda\langle \vec w , \vec v\rangle + |\lambda|^2\|\vec w\|^2.
\]
If we choose λ = ⟨~v, ~w⟩ / ‖~w‖², we obtain
\[
0 \le \|\vec v\|^2
- \frac{\overline{\langle \vec v , \vec w\rangle}}{\|\vec w\|^2}\langle \vec v , \vec w\rangle
- \frac{\langle \vec v , \vec w\rangle}{\|\vec w\|^2}\langle \vec w , \vec v\rangle
+ \frac{|\langle \vec v , \vec w\rangle|^2}{\|\vec w\|^4}\|\vec w\|^2
= \|\vec v\|^2 - 2\frac{|\langle \vec v , \vec w\rangle|^2}{\|\vec w\|^2} + \frac{|\langle \vec v , \vec w\rangle|^2}{\|\vec w\|^2}
= \|\vec v\|^2 - \frac{|\langle \vec v , \vec w\rangle|^2}{\|\vec w\|^2}
= \frac{1}{\|\vec w\|^2}\Big[\|\vec v\|^2\|\vec w\|^2 - |\langle \vec v , \vec w\rangle|^2\Big].
\]
Hence |⟨~v, ~w⟩|² ≤ ‖~v‖² ‖~w‖², and taking the square root on both sides gives (8.1).

Now we prove the triangle inequality (8.2). We calculate
\[
\|\vec v + \vec w\|^2
= \langle \vec v + \vec w , \vec v + \vec w\rangle
= \langle \vec v , \vec v\rangle + \langle \vec v , \vec w\rangle + \langle \vec w , \vec v\rangle + \langle \vec w , \vec w\rangle
= \langle \vec v , \vec v\rangle + \langle \vec v , \vec w\rangle + \overline{\langle \vec v , \vec w\rangle} + \langle \vec w , \vec w\rangle
= \|\vec v\|^2 + 2\operatorname{Re}\langle \vec v , \vec w\rangle + \|\vec w\|^2
\le \|\vec v\|^2 + 2|\langle \vec v , \vec w\rangle| + \|\vec w\|^2
\le \|\vec v\|^2 + 2\|\vec v\|\,\|\vec w\| + \|\vec w\|^2
= \big(\|\vec v\| + \|\vec w\|\big)^2.
\]
In the first inequality we used that Re a ≤ |a| for any complex number a and in the second inequality
we used (8.1). If we take the square root on both sides we get the triangle inequality.
Remark 8.7. Observe that the choice of λ in the proof of (8.1) is not as arbitrary as it may seem.
Note that for this particular λ
\[
\vec v - \lambda\vec w = \vec v - \frac{\langle \vec v , \vec w\rangle}{\|\vec w\|^2}\,\vec w = \vec v - \operatorname{proj}_{\vec w}\vec v.
\]
Hence this choice of λ minimises the norm of ~v − λ~w, and ~v − proj_~w ~v ⊥ ~w. Therefore, by Pythagoras,
\[
\|\vec v\|^2 = \|\vec v - \operatorname{proj}_{\vec w}\vec v\|^2 + \|\operatorname{proj}_{\vec w}\vec v\|^2
\ge \|\operatorname{proj}_{\vec w}\vec v\|^2 = \frac{|\langle \vec v , \vec w\rangle|^2}{\|\vec w\|^2},
\]
which is again the Cauchy-Schwarz inequality (8.1).
In the complex case, we want for a given matrix A ∈ MC(m × n) a matrix A* such that
\[
\langle A\vec z , \vec w\rangle = \langle \vec z , A^*\vec w\rangle \qquad\text{for all } \vec z \in \mathbb{C}^n,\ \vec w \in \mathbb{C}^m.
\]
It turns out that A* = \overline{A^t}, that is, A* is obtained from A by transposing and taking the complex conjugate of every entry.

Lemma 8.9. Let A ∈ M(n × n). Then det(A*) = \overline{\det A} (the complex conjugate of det A).

Proof. det A* = det(\overline{A}^t) = det \overline{A} = \overline{\det A}. The last equality follows directly from the definition of the determinant.
A matrix with real entries is symmetric if and only if A = At . The analogue for complex matrices
are hermitian matrices.
A matrix A ∈ MC(n × n) is called hermitian if A* = A. For example, any 2 × 2 matrix whose diagonal entries are real and whose off-diagonal entries are 2 + 3i and 2 − 3i is hermitian.
Exercise 8.12. • Show that the entries on the diagonal of a hermitian matrix must be real.
Another important class of real matrices are the orthogonal matrices. Recall that a matrix Q ∈
MR (n × n) is an orthogonal matrix if and only if Qt = Q−1 . We saw that if Q is orthogonal, then
its columns (or rows) form an orthonormal basis for Rn and that | det Q| = 1, hence det Q = ±1.
The analogue in complex vector spaces are the so-called unitary matrices: a matrix Q ∈ MC(n × n) is called unitary if it is invertible with Q^{-1} = Q*. It is clear from the definition that a matrix is unitary if and only if its columns (or rows) form an orthonormal basis for Cn, cf. Theorem 7.12.
Let Q ∈ MC(n × n). Then the following assertions are equivalent:

(a) Q is unitary.

(b) ⟨Q~x, Q~y⟩ = ⟨~x, ~y⟩ for all ~x, ~y ∈ Cn.

(c) ‖Q~x‖ = ‖~x‖ for all ~x ∈ Cn.

Proof. (a) =⇒ (b): Assume that Q is a unitary matrix and let ~x, ~y ∈ Cn. Then
\[
\langle Q\vec x , Q\vec y\rangle = \langle Q^*Q\vec x , \vec y\rangle = \langle \vec x , \vec y\rangle.
\]
(b) =⇒ (a): Fix ~x ∈ Cn . Then we have hQ~x , Q~y i = h~x , ~y i for all ~y ∈ Cn , hence
0 = ⟨Q~x, Q~y⟩ − ⟨~x, ~y⟩ = ⟨Q*Q~x, ~y⟩ − ⟨~x, ~y⟩ = ⟨Q*Q~x − ~x, ~y⟩ = ⟨(Q*Q − id)~x, ~y⟩.
Since this is true for any ~y ∈ Cn , it follows that (Q∗ Q − id)~x = 0. Since ~x ∈ Cn was arbitrary,
we conclude that Q∗ Q − id = 0, in other words, that Q∗ Q = id.
(b) =⇒ (c): It follows from (b) that kQ~xk2 = hQ~x , Q~xi = h~x , ~xi = k~xk2 , hence kQ~xk = k~xk.
(c) =⇒ (b): Observe that the inner product of two vectors in Cn can be expressed completely in terms of norms as follows:
\[
\langle \vec a , \vec b\rangle = \frac14\Big[\|\vec a + \vec b\|^2 - \|\vec a - \vec b\|^2 + \mathrm{i}\|\vec a + \mathrm{i}\vec b\|^2 - \mathrm{i}\|\vec a - \mathrm{i}\vec b\|^2\Big],
\]
as can be easily verified. Hence we find
\[
\langle Q\vec x , Q\vec y\rangle
= \frac14\Big[\|Q\vec x + Q\vec y\|^2 - \|Q\vec x - Q\vec y\|^2 + \mathrm{i}\|Q\vec x + \mathrm{i}Q\vec y\|^2 - \mathrm{i}\|Q\vec x - \mathrm{i}Q\vec y\|^2\Big]
= \frac14\Big[\|Q(\vec x + \vec y)\|^2 - \|Q(\vec x - \vec y)\|^2 + \mathrm{i}\|Q(\vec x + \mathrm{i}\vec y)\|^2 - \mathrm{i}\|Q(\vec x - \mathrm{i}\vec y)\|^2\Big]
= \frac14\Big[\|\vec x + \vec y\|^2 - \|\vec x - \vec y\|^2 + \mathrm{i}\|\vec x + \mathrm{i}\vec y\|^2 - \mathrm{i}\|\vec x - \mathrm{i}\vec y\|^2\Big]
= \langle \vec x , \vec y\rangle.
\]
Recall that two matrices A, B ∈ M(n × n) are called similar, written A ∼ B, if there exists an invertible matrix C such that
\[
A = C^{-1}BC. \tag{8.3}
\]
Exercise 8.17. Show that A ∼ B if and only if there exists an invertible matrix C̃ such that
\[
A = \widetilde C B \widetilde C^{-1}. \tag{8.4}
\]
Question 8.2
Assume that A and B are similar. Is the matrix C in (8.3) unique or is it possible that there are different invertible matrices C1 ≠ C2 such that A = C1^{-1}BC1 = C2^{-1}BC2?
Remark 8.18. Similarity is an equivalence relation on the set of all square matrices. This means
that it satisfies the following three properties. Let A1 , A2 , A3 ∈ M (n × n). Then:
(i) Reflexivity: A ∼ A for every A ∈ M (n × n).
(ii) Symmetry: If A1 ∼ A2 , then also A2 ∼ A1 .
(iii) Transitivity: If A1 ∼ A2 and A2 ∼ A3 , then also A1 ∼ A3 .
Two matrices A and B ∈ M (n × n) are similar if and only if they represent the same linear
transformation. The matrix C in A = C −1 BC is the transition matrix between the two bases
used in the representations A and B.
If A and B are similar, then det A = det B.

Proof. Let C ∈ M(n × n) be invertible such that A = C^{-1}BC. Then
\[
\det A = \det(C^{-1}BC) = \det(C^{-1})\det B\det C = (\det C)^{-1}\det B\det C = \det B.
\]
Exercise 8.20. Show that det A = det B does not imply that A and B are similar.
Exercise 8.21. Assume that A and B are similar. Show that dim(ker A) = dim(ker B) and that
dim(Im A) = dim(Im B). Why is this no surprise?
Question 8.3
Assume that A and B are similar. What is the relation between ker A and ker B? What is the
relation between Im A and Im B?
Hint. Theorem 6.4.
A very nice class of matrices are the diagonal matrices because it is rather easy to calculate with them. A matrix A ∈ M(n × n) is called diagonalisable if it is similar to a diagonal matrix. In other words, A is diagonalisable if there exists a diagonal matrix D and an invertible matrix C
with
C −1 AC = D. (8.5)
How can we decide if a matrix A is diagonalisable? We know that it is diagonalisable if and only if
it is similar to a diagonal matrix, that is, if and only if there exists a basis ~c1 , . . . , ~cn such that the
representation of A with respect to these vectors is a diagonal matrix. In this case, (8.5) is satisfied
if the columns of C are the basis vectors ~c1 , . . . , ~cn .
Denote the diagonal entries of D by d1, ..., dn. Then it is easy to see that D~ej = dj~ej. This means that if we apply D to some ~ej, then the image D~ej is parallel to ~ej. Since D is nothing else than the representation of A with respect to the basis ~c1, ..., ~cn, we have A~cj = dj~cj.

We can make this more formal: Take equation (8.5) and multiply both sides from the left by C so that we obtain AC = CD. Recall that for any matrix B, we have that B~ej = jth column of B. Hence
\[
AC\vec e_j = A\vec c_j
\qquad\text{and}\qquad
CD\vec e_j = C(d_j\vec e_j) = d_jC\vec e_j = d_j\vec c_j,
\]
and since AC = CD we obtain A~cj = dj~cj.
In summary, we found:
A matrix A ∈ M (n × n) is diagonalisable if and only we can find a basis ~c1 , . . . , ~cn of Rn (or Cn )
and numbers d1 , . . . , dn such that
A~cj = dj ~cj , j = 1, . . . , n.
In this case C −1 AC = D (or equivalently A = CDC −1 ) where D = diag(d1 , . . . , dn ) and C =
(~c1 | · · · |~cn ).
The vectors ~cj are called eigenvectors of A and the numbers dj are called eigenvalues of A. They
will be discussed in greater detail in the next section where we will also see how we can calculate
them.
Diagonalisation of a matrix is very useful when we want to calculate powers of the matrix.

Proposition 8.23. Let A ∈ M(n × n) be diagonalisable, say A = CDC^{-1} with D = diag(d1, ..., dn). Then for every k ∈ N,
\[
A^k = (CDC^{-1})^k = CD^kC^{-1} = C\,\mathrm{diag}(d_1^k, \dots, d_n^k)\,C^{-1}.
\]
If all dj ≠ 0, then D is invertible with inverse D^{-1} = diag(d1, ..., dn)^{-1} = diag(d1^{-1}, ..., dn^{-1}). Hence A is invertible with A^{-1} = CD^{-1}C^{-1}, and for k ∈ Z with k < 0 we obtain
\[
A^k = A^{-|k|} = (A^{-1})^{|k|} = \big(CD^{-1}C^{-1}\big)^{|k|} = C(D^{-1})^{|k|}C^{-1} = CD^{-|k|}C^{-1}
= C\,\mathrm{diag}(d_1^{k}, \dots, d_n^{k})\,C^{-1}.
\]
Proposition 8.23 is useful for example when we describe dynamical systems by matrices or when
we solve linear differential equations with constant coefficients in higher dimensions.
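As a small numerical illustration of Proposition 8.23 (not part of the original notes; it assumes numpy), the following sketch computes powers of the matrix that reappears in Example 8.33 below via its eigendecomposition.

\begin{verbatim}
import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 4.0]])   # eigenvalues 5 and 1 (cf. Example 8.33)

# eigendecomposition: columns of C are eigenvectors, d contains the eigenvalues
d, C = np.linalg.eig(A)

# A^k = C diag(d_1^k, ..., d_n^k) C^{-1}
def power_via_diagonalisation(C, d, k):
    return C @ np.diag(d ** k) @ np.linalg.inv(C)

print(np.allclose(power_via_diagonalisation(C, d, 5), np.linalg.matrix_power(A, 5)))  # True
print(np.allclose(power_via_diagonalisation(C, d, -1), np.linalg.inv(A)))             # True
\end{verbatim}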
Definition 8.24. Let V be a vector space and let T: V → V be a linear map. A number λ is called an eigenvalue of T if there exists a vector v ∈ V with v ≠ O such that
\[
Tv = \lambda v. \tag{8.8}
\]
Every such vector v is called an eigenvector of T corresponding to the eigenvalue λ.

The reason why we exclude v = O in the definition above is that for every λ it is true that TO = O = λO, so (8.8) would be satisfied for any λ if we were allowed to choose v = O, in which case the definition would not make too much sense.
Exercise 8.25. Show that 0 is an eigenvalue of T if and only if dim(ker T ) ≥ 1, that is, if and only
if T is not invertible. Show that v is an eigenvector with eigenvalue 0 if and only if v ∈ ker T \{O}.
Exercise 8.26. Show that all eigenvalues of a unitary matrix have norm 1.
Question 8.4
Let V, W be vector spaces and let T : V → W be a linear transformation. Why does in not make
sense to speak of eigenvalues of T if V 6= W ?
(iv) We can generalise (iii) as follows: If v1 , . . . , vk are eigenvectors of T with the same eigenvalue
λ, then every non-zero linear combination is an eigenvector with the same eigenvalue because
T (α1 v1 + . . . αk vk ) = α1 T v1 + . . . αk T vk = α1 λv1 + · · · + αk λvk = λ(α1 v1 + · · · + αk vk ).
(iv) says that the set of all eigenvectors with the same eigenvalue is almost a subspace. The only
thing missing is the zero vector O. This motivates the following definition.
Definition 8.27. Let V be a vector space and let T: V → V be a linear map with eigenvalue λ. Then the eigenspace of T corresponding to λ is
\[
\operatorname{Eig}_\lambda(T) := \{v \in V : Tv = \lambda v\}.
\]
Proposition 8.28. Eig_λ(T) = ker(T − λ id); in particular, Eig_λ(T) is a subspace of V.

Proof. For v ∈ V we have
\[
v \in \operatorname{Eig}_\lambda(T) \iff Tv = \lambda v \iff Tv - \lambda v = O \iff Tv - \lambda\,\mathrm{id}\,v = O
\iff (T - \lambda\,\mathrm{id})v = O \iff v \in \ker(T - \lambda\,\mathrm{id}).
\]
Note that Proposition 8.28 shows again that Eigλ (T ) is a subspace of V . Moreover it shows that
that λ is an eigenvalue of T if and only if T − λ id is not invertible. For the special case λ = 0 we
have that Eig0 (T ) = ker T .
Examples 8.29. (a) Let V be a vector space and let T = id. Then for every v ∈ V we have that
T v = v = 1v. Hence T has only one eigenvalue, namely λ = 1 and Eig1 (T ) = ker(T − id) =
ker 0 = V. Its geometric multiplicity is dim(Eig1(T)) = dim V.
(b) Let V = R2 and let R be reflection on the x-axis. If ~v is an eigenvector of R, then R~v
must be parallel to ~v . This happens if and only if ~v is parallel to the x-axis in which case
R~v = ~v , or if ~v is perpendicular to the x-axis in which case R~v = −~v . All other vectors
change directions under a reflection. Hence we have the eigenvalues λ1 = 1 and λ2 = −1 and
Eig1 (R) = span{~e1 }, Eig−1 (R) = span{~e2 }. Each eigenvalue has geometric multiplicity 1.
D
Note that the matrix representation of R with respect to the canonical basis of R2 is
\[
A_R = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\qquad\text{and}\qquad
A_R\vec x = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix}.
\]
Hence A_R~x is parallel to ~x if and only if x1 = 0 (in which case ~x ∈ span{~e2}) or x2 = 0 (in which case ~x ∈ span{~e1}).
(c) Let V = R2 and let R be rotation about 90◦ . Then clearly R~v 6k ~v for any ~v ∈ R2 \ {~0}. Hence
R has no eigenvalues.
Note that the matrix representation of R with respect to the canonical basis of R2 is
\[
A_R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\]
If we consider A_R as a real matrix, then it has no eigenvalues. However, if we consider A_R as a complex matrix, then it has the eigenvalues ±i, as we shall see later.
(d) Let A = diag(1, 5, 5, 5, 8, 0) ∈ M(6 × 6). As always, we identify A with the linear map R6 → R6, ~x ↦ A~x. It is not hard to see that the eigenvalues and eigenspaces of A are
\[
\operatorname{Eig}_1(A) = \operatorname{span}\{\vec e_1\},\quad
\operatorname{Eig}_5(A) = \operatorname{span}\{\vec e_2, \vec e_3, \vec e_4\},\quad
\operatorname{Eig}_8(A) = \operatorname{span}\{\vec e_5\},\quad
\operatorname{Eig}_0(A) = \operatorname{span}\{\vec e_6\}.
\]
given by gλ (x) = eλx . In particular, the geometric multiplicity of any λ ∈ R is 1.
(f) Let V = C∞(R) be the space of all infinitely many times differentiable functions from R to R and let T: V → V, Tf = f''. It is easy to see that T is a linear transformation. The eigenvalues of T are those λ ∈ R such that there exists a function f ∈ C∞(R) with f'' = λf. If λ > 0, then the general solution of this differential equation is f_λ(x) = a e^{√λ x} + b e^{−√λ x}. If λ < 0, the general solution is f_λ(x) = a cos(√(−λ) x) + b sin(√(−λ) x). If λ = 0, the general solution is f_0(x) = ax + b. Hence every λ ∈ R is an eigenvalue of T with geometric multiplicity 2.
RA
Write down the eigenspaces for a given λ.
Find the eigenvalues and eigenspaces if we consider the vector space of infinitely differentiable
functions from R to C.
A + At ?
Since any linear transformation on a finite dimensional vector space V can be “translated” to a
matrix by choosing a basis on V , it is sufficient to find eigenvalues of matrices as the next theorem
shows.
Theorem 8.30. Let V be a finite dimensional vector space with basis B = {v1 , . . . , vn } and let
T : V → V be a linear transformation. If AT is the matrix representation of T with respect to
the basis B, then the eigenvalues of T and AT coincide and a vector v = c1 v1 + · · · + cn vn is an
eigenvector of T with eigenvalue λ if and only if ~x = (c1, ..., cn)^t is an eigenvector of A_T with the same eigenvalue λ. In particular, the dimensions of the eigenspaces of T and of A_T coincide.
Proof. Let K = R if V is a real vector space and K = C if V is a complex vector space and
let Φ: V → Kn be the linear map defined by Φ(vj) = ~ej (j = 1, ..., n). That means that Φ “translates” a vector v = c1v1 + ... + cnvn into the column vector ~x = (c1, ..., cn)^t, cf. Section 6.4.
Recall that T = Φ−1 AT Φ. Let λ be an eigenvalue of T with eigenvector v, that is, T v = λv. We
express v as linear combination of the basis vectors from B as v = c1 v1 + · · · + cn vn . Hence
T v = λv ⇐⇒ Φ−1 AT Φv = λv ⇐⇒ AT Φv = Φλv ⇐⇒ AT (Φv) = λ(Φv)
which is the case if and only if λ is an eigenvalue of AT and Φv ∈ Eigλ (AT ).
The proof shows that Eigλ (AT ) = Φ(Eigλ (T )) as was to be expected.
Corollary 8.31. Assume that A and B are similar matrices and let C be an invertible matrix with
A = C −1 BC. Then A and B have the same eigenvalues and for every eigenvalue λ we have that
Eigλ (B) = C Eigλ (A).
Now back to the question about how to calculate the eigenvalues and eigenvectors of a given matrix
A. Recall that λ is an eigenvalue of A if and only if ker(A − λ id) 6= {~0}, see Proposition 8.28. Since
RA
A − λ id is a square matrix, this is the case if and only if det(A − λ id) = 0.
Definition 8.32. The function λ 7→ det(A − λ id) is called the characteristic polynomial of A. It
is usually denoted by pA .
Before we discuss the characteristic polynomial and show that it is indeed a polynomial, we will
describe how to find the eigenvalues and eigenvectors of a given square matrix A.
• Find the zeros λ1 , . . . , λk of the characteristic polynomial. They are the eigenvalues of A.
• For each eigenvalue λj calculate ker(A − λj ), for instance using Gauß-Jordan elimination.
This gives the eigenspaces.
Example 8.33. Find the eigenvalues and eigenspaces of A = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}.
Solution. • The characteristic polynomial of A is
\[
p_A(\lambda) = \det(A - \lambda\,\mathrm{id})
= \det\begin{pmatrix} 2-\lambda & 1 \\ 3 & 4-\lambda \end{pmatrix}
= (2-\lambda)(4-\lambda) - 3 = \lambda^2 - 6\lambda + 5.
\]
• Now we can either complete the square or use the solution formula for quadratic equations to
find the zeros of pA . Here we choose to complete the square.
pA (λ) = λ2 − 6λ + 5 = (λ − 3)2 − 4 = (λ − 5)(λ − 1).
Hence the eigenvalues of A are λ1 = 5 and λ2 = 1.
• Now we calculate the eigenspaces using Gauß elimination.
\[
A - 5\,\mathrm{id} = \begin{pmatrix} 2-5 & 1 \\ 3 & 4-5 \end{pmatrix}
= \begin{pmatrix} -3 & 1 \\ 3 & -1 \end{pmatrix}
\xrightarrow{R_2 \to R_2 + R_1} \begin{pmatrix} -3 & 1 \\ 0 & 0 \end{pmatrix}
\xrightarrow{R_1 \to -R_1} \begin{pmatrix} 3 & -1 \\ 0 & 0 \end{pmatrix},
\]
therefore ker(A − 5 id) = span{(1, 3)^t}.
\[
A - \mathrm{id} = \begin{pmatrix} 2-1 & 1 \\ 3 & 4-1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 3 & 3 \end{pmatrix}
\xrightarrow{R_2 \to R_2 - 3R_1} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},
\]
therefore ker(A − id) = span{(1, −1)^t}.
In summary, we have two eigenvalues,
\[
\lambda_1 = 5,\quad \operatorname{Eig}_5(A) = \operatorname{span}\left\{\begin{pmatrix}1\\3\end{pmatrix}\right\},\ \text{geom. multiplicity } 1,
\qquad
\lambda_2 = 1,\quad \operatorname{Eig}_1(A) = \operatorname{span}\left\{\begin{pmatrix}1\\-1\end{pmatrix}\right\},\ \text{geom. multiplicity } 1.
\]
If we set ~v1 = (1, 3)^t and ~v2 = (1, −1)^t, we can check our result by calculating
\[
A\vec v_1 = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 3 \end{pmatrix}
= \begin{pmatrix} 5 \\ 15 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 3 \end{pmatrix} = 5\vec v_1,
\qquad
A\vec v_2 = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
= \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \vec v_2.
\]
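The result of Example 8.33 can also be checked numerically; the following short sketch (assuming numpy, not part of the original notes) computes the eigenvalues and eigenvectors of A.

\begin{verbatim}
import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 4.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # [5. 1.] (possibly in a different order)

# numpy returns normalised eigenvectors as columns; they are scalar
# multiples of (1, 3)^t and (1, -1)^t found above
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True, True
\end{verbatim}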
Before we give more examples, we show that the characteristic polynomial is indeed a polynomial.
First we need a definition.
Definition 8.34. Let A = (aij )ni,j=1 ∈ M (n × n). The trace of A is the sum of its entries on the
diagonal:
tr A := a11 + a22 + . . . ann .
Theorem 8.35. Let A = (aij )ni,j=1 ∈ M (n × n) and let pA (λ) = det(A − λ id) be the characteristic
polynomial of A. Then the following is true.
(i) pA is a polynomial of degree n.
(ii) Let pA(λ) = cnλ^n + c_{n−1}λ^{n−1} + ... + c1λ + c0. Then we have formulas for the coefficients
cn , cn−1 and c0 :
cn = (−1)n , cn−1 = (−1)n−1 tr A, c0 = det A.
Proof. By definition,
a11 − λ a12 a1n
22 − λ
a a a2n
21
pA (λ) = det(A − λ id) = det
.
an1 an2 ann − λ
According to Remark 4.4, the determinant is the sum of products where each product consists of a
sign and n factors chosen such that it contains one entry from each row and from each column of
A − λ id. Therefore it is clear that pA is a polynomial in λ. The term with the most λ in it is the
one of the form
(a11 − λ)(a22 − λ) · · · (ann − λ). (8.7)
All the other terms contain at most n − 2 factors with λ. To see this, assume for example that in
one of the terms the factor from the first row is not (a11 − λ) but some a1j . Then there cannot be
FT
another factor from the jth column, in particular the factor (ajj − λ) cannot appear. So this term
has already two factors without λ, hence the degree of the term as polynomial in λ can be at most
n − 2. This shows that
pA (λ) = (−1)n λn + (−1)n−1 λn−1 (a11 + · · · + ann ) + terms of order at most n − 2, (8.9)
hence deg(pA ) = n.
Formula (8.9) also shows the claim about cn and cn−1 . The formula for c0 follows from
c0 = pA(0) = det(A − 0 · id) = det A.

Corollary 8.36. Every matrix A ∈ M(n × n) has at most n distinct eigenvalues.
Proof. Let A ∈ M (n × n). Then the eigenvalues of A are exactly the zeros of its characteristic
polynomial. Since it has degree n, it can have at most n zeros.
Now we understand why working with complex vector spaces is more suitable when we are interested
in eigenvalues. They are precisely the zeros of the characteristic polynomial. While a polynomial
may not have real zeros, it always has zeros when we allow them to be complex numbers. Indeed,
any polynomial can always be factorised over C.
Let A ∈ M (n × n) and let pA be its characteristic polynomial. Then there exist complex numbers
λ1 , . . . , λk and integers m1 , . . . , mk ≥ 1 such that
\[
p_A(\lambda) = (\lambda_1 - \lambda)^{m_1}\cdot(\lambda_2 - \lambda)^{m_2}\cdots(\lambda_k - \lambda)^{m_k}.
\]
The numbers λ1 , . . . , λk are precisely the complex eigenvalues of A and m1 +· · ·+mk = deg pA = n.
Definition 8.37. The integer mj is called the algebraic multiplicity of the eigenvalue λj .
Example 8.39. Let A = . Since A − λ id is an upper triangular matrix, its
0 0 0 5 0 0
0 0 0 0 8 0
0 0 0 0 0 8
determinant is the product of the entries on the diagonal. We we obtain
pA (λ) = det(A − λ id) = (1 − λ)(5 − λ)3 (8 − λ)2 .
Therefore the eigenvalues of A are λ1 = 1, λ2 = 5, λ3 = 8. Let us calculate the eigenspaces.
0 0 0 0 0 0 0 4 1 0 0 0
041000 permute rows 004100
• A − 1 id = 00 00 40 14 00 00 −−−−−−−−→ 00 00 00 40 07 00 . This matrix is in row echelon form and
000070 000007
000007 000000
we can see easily that Eig1 (A) = ker(A − 1 id) = span{~e1 } which has dimension 1.
−4 0 0 0 0 0 −4 0 0 0 0 0
0 0 1 0 0 0 permute rows 0 0 1 0 0 0
• A − 5 id = 0 0 0 1 0 0 −−−−−−−−→ 0 0 0 1 0 0 . This matrix is in row echelon form
0 0 0 0 0 0 0 0 0 0 3 0
0 0 0 0 3 0 0 0 0 0 0 3
0 0 0 0 0 3 0 0 0 0 0 0
and we can see easily that Eig5 (A) = ker(A − 5 id) = span{~e2 } which has dimension 1.
−7 0 0 0 0 0
0 −3 1 0 0 0
• A − 8 id = 0 0 −3 1 0 0 . This matrix is in row echelon form and we can see easily that
0 0 0 −3 0 0
0 0 0 000
0 0 0 000
Eig8 (A) = ker(A − 8 id) = span{~e5 , ~e6 } which has dimension 2.
In summary, we have
λ1 = 1, Eig1 (A) = span{~e1 }, geom. multiplicity: 1, alg. multiplicity: 1,
λ2 = 5, Eig5 (A) = span{~e2 }, geom. multiplicity: 1, alg. multiplicity: 3,
λ3 = 8, Eig8(A) = span{~e5, ~e6}, geom. multiplicity: 2, alg. multiplicity: 2.
0 −1
Example 8.40. Find the complex eigenvalues and eigenspaces of R = .
1 0
Solution. From Example 8.29 we already know that R has no real eigenvalues. The characteristic
polynomial of R is
−λ −1
pR (λ) = det(R − λ) = det = λ2 + 1 = (λ − i)(λ + i).
1 −λ
D=
5 0
FT
Solution. We need to find an invertible matrix C and a diagonal matrix D such that D = C −1 AC.
By Example 8.33, A has the eigenvalues λ1 = 5 and λ2 = 1, hence A is indeed diagonalisable. We
know that the diagonal entries of D are the eigenvalues
of C are the corresponding eigenvalues ~v1 =
, C=
1
3
1 1
of A,hence
and ~v2 =
−1
1
D = diag(5, 1) and the columns
, hence
and D = C −1 AC.
0 1 3 −1
Alternatively, we could have chosen D e = diag(1, 5). Then the corresponding C e = (~v2 |~v1
e is C
because the jth column of the invertible matrix must be an eigenvector corresponding the the jth
entry of the diagonal matrix, hence
De= 1 0 , C e= 1 1
and D e =Ce −1 AC.
e
0 5 −1 3
Observe that up to ordering the diagonal elements, the matrix D is uniquely determined by A. For
the matrix C however we have more choices. For instance, if we multiply each column of C by an
arbitrary constant different from 0, it still works.
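The remark can be checked numerically. The sketch below assumes, for concreteness, that the matrix of Example 8.33 is A = (2 1; 3 4) — an assumption made only for this illustration, chosen to be consistent with the eigenpairs used above. It verifies C −1 AC = D and that rescaling the columns of C does not change this.

import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 4.0]])            # assumed matrix with eigenvalues 5 and 1
C = np.array([[1.0, 1.0],
              [3.0, -1.0]])           # columns: eigenvectors v1, v2
D = np.diag([5.0, 1.0])

print(np.allclose(np.linalg.inv(C) @ A @ C, D))    # True

# Rescaling the columns of C by nonzero constants still diagonalises A.
C2 = C @ np.diag([2.0, -0.5])
print(np.allclose(np.linalg.inv(C2) @ A @ C2, D))  # True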
Here A_T denotes the matrix representation of T with respect to the standard basis M1 , M2 , M3 , M4 of M(2 × 2), that is,
A_T = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
• A_T − 2 id = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_2 \to R_2 + R_3,\ R_1 \leftrightarrow R_3} \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. This matrix is in row echelon form, hence Eig2 (A_T ) = ker(A_T − 2 id) = span{(1, 0, 0, 0)t , (0, 1, 1, 0)t , (0, 0, 0, 1)t }. Similarly, Eig0 (A_T ) = ker(A_T ) = span{(0, 1, −1, 0)t }.
This means that the eigenvalues of T are 0 and 2 and that the eigenspaces are
Eig0 (T ) = span{M2 − M3 } = span{ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} } = Masym (2 × 2),
Eig2 (T ) = span{M1 , M2 + M3 , M4 } = span{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} } = Msym (2 × 2).
Remark. We could have calculated the eigenspaces of T directly, without calculating those of A_T first, as follows.
first as follows.
• A matrix M belongs to Eig0 (T ) if and only if T (M ) = 0. This is the case if and only if
M + M t = 0 which means that M = −M t . So Eig0 (T ) is the space of all antisymmetric 2 × 2
matrices.
• etc.
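The remark can also be illustrated numerically. The following sketch (assuming NumPy; the row-wise flattening of a 2 × 2 matrix into a length-4 vector is a convention of this illustration, not of the text) builds the matrix representation of T (M ) = M + M^t and recovers its eigenvalues.

import numpy as np

# Standard basis M1, M2, M3, M4 of M(2x2), each flattened row-wise to a vector.
basis = [np.array(m, dtype=float) for m in
         ([[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]], [[0, 0], [0, 1]])]

# Column j of A_T is the coordinate vector of T(M_j) = M_j + M_j^t.
A_T = np.column_stack([(M + M.T).flatten() for M in basis])
print(A_T)                                   # rows: (2,0,0,0), (0,1,1,0), (0,1,1,0), (0,0,0,2)

eigenvalues, eigenvectors = np.linalg.eig(A_T)
print(np.sort(eigenvalues))                  # approx. [0, 2, 2, 2]
# The kernel (eigenvalue 0) is spanned by the coordinate vector of the
# antisymmetric matrix M2 - M3, as claimed in the remark.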
Assume that α1 ≠ 0. Then ~v1 = −(α2 /α1 )~v2 and
λ1 ~v1 = A~v1 = A(−(α2 /α1 )~v2 ) = −(α2 /α1 ) A~v2 = −(α2 /α1 ) λ2 ~v2 = λ2 (−(α2 /α1 )~v2 ) = λ2 ~v1 =⇒ ~0 = (λ1 − λ2 )~v1 .
Since λ1 6= λ2 and ~v1 6= ~0, the last equality is false and therefore we must have α1 = 0. Then,
by (8.10), ~0 = α1~v1 + α2~v2 = α2~v2 , hence also α2 = 0 which proves that ~v1 and ~v2 are linearly
independent.
Induction step: Assume that we already know for some j < k that the vectors ~v1 , . . . , ~vj are linearly
independent. We have to show that then also the vectors ~v1 , . . . , ~vj+1 are linearly independent. To
this end, let α1 , α2 , . . . , αj+1 be scalars such that
α1 ~v1 + α2 ~v2 + · · · + αj+1 ~vj+1 = ~0. (8.11)
On the one hand we apply A to both sides of the equation and use that the vectors are eigenvectors; on the other hand we multiply both sides by λj+1 . Comparing (that is, subtracting) the two results gives
α1 (λ1 − λj+1 )~v1 + α2 (λ2 − λj+1 )~v2 + · · · + αj (λj − λj+1 )~vj = ~0.
Note that the term with ~vj+1 cancelled. By the induction hypothesis, the vectors ~v1 , . . . , ~vj are linearly independent, hence
α1 (λ1 − λj+1 ) = 0, α2 (λ2 − λj+1 ) = 0, . . . , αj (λj − λj+1 ) = 0.
We also know that λj+1 is not equal to any of the other λ` , hence it follows that
α1 = 0, α2 = 0, ..., αj = 0.
Inserting this in (8.11) gives that also αj+1 = 0 and the proof is complete.
Note that the proposition shows again that an n×n matrix can have at most n different eigenvalues.
and dim(Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A)) = dim(Eigµ1 (A)) + · · · + dim(Eigµk (A)).
Proof. Let α_i^{(j)} be numbers such that
~0 = ~w1 + ~w2 + · · · + ~wk  with  ~wj = α_1^{(j)} ~v_1^{(j)} + · · · + α_{ℓ_j}^{(j)} ~v_{ℓ_j}^{(j)} ∈ Eigµj (A).
Proposition 8.43 implies that ~w1 = · · · = ~wk = ~0. But then also all coefficients α_i^{(m)} = 0 because, for fixed m, the vectors ~v_1^{(m)} , . . . , ~v_{ℓ_m}^{(m)} are linearly independent. Now all the assertions are clear.
Theorem 8.45. (i) Let D = diag(d1 , . . . , dn ) = \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{pmatrix} be a diagonal matrix. Then the eigenvalues of D are precisely the numbers d1 , . . . , dn and the geometric multiplicity of each eigenvalue is equal to its algebraic multiplicity.
(ii) Let B = \begin{pmatrix} d_1 & & * \\ & \ddots & \\ 0 & & d_n \end{pmatrix} and C = \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ * & & d_n \end{pmatrix} be upper and lower triangular matrices respectively. Then the eigenvalues of B and of C are precisely the numbers d1 , . . . , dn and the algebraic multiplicity of an eigenvalue is equal to the number of times it appears on the diagonal. In general, nothing can be said about the geometric multiplicities.
Proof. (i) Since the determinant of a diagonal matrix is the product of its diagonal elements, we obtain for the characteristic polynomial of D
pD (λ) = det(D − λ id) = det \begin{pmatrix} d_1-λ & & 0 \\ & \ddots & \\ 0 & & d_n-λ \end{pmatrix} = (d1 − λ) · · · (dn − λ).
Since the zeros of the characteristic polynomial are the eigenvalues of D, this shows that the numbers on the diagonal of D are precisely its eigenvalues. The algebraic multiplicity of an eigenvalue µ is equal to the number of times it is repeated on the diagonal of D. The geometric multiplicity of µ is equal to dim(ker(D − µ id)). Note that D − µ id is a diagonal matrix and the jth entry on its diagonal is 0 if and only if µ = dj . It is not hard to see that the dimension of the kernel of a diagonal matrix is equal to the number of zeros on its diagonal. So, in summary, we have for an eigenvalue µ of D:
geometric multiplicity of µ = dim(ker(D − µ id)) = #{j : dj = µ} = algebraic multiplicity of µ.
(ii) Since the determinant of a triangular matrix is the product of its diagonal elements, we obtain
for the characteristic polynomial of B
pB (λ) = det(B − λ id) = det \begin{pmatrix} d_1-λ & & * \\ & \ddots & \\ 0 & & d_n-λ \end{pmatrix} = (d1 − λ) · · · (dn − λ),
and analogously for C. The reasoning for the algebraic multiplicities of the eigenvalues is as
in the case of a diagonal matrix. However, in general the algebraic and geometric multiplicity
of an eigenvalue of a triangular matrix may be different as Example 8.39 shows.
Example 8.46. Let D = diag(5, 1, 5, 8, 8, 5). Then pD (λ) = (1 − λ)(5 − λ)³(8 − λ)².
The eigenvalues are 1 (with geom. mult. = alg. mult. = 1), 5 (with geom. mult. = alg. mult. = 3) and 8 (with geom. mult. = alg. mult. = 2).
Theorem 8.47. If A and B are similar matrices, then they have the same characteristic polyno-
mial. In particular, they have the same eigenvalues with the same algebraic multiplicities. Moreover,
also the geometric multiplicities are equal.
Proof. Since A and B are similar, there exists an invertible matrix C such that A = C −1 BC. Then
A − λ id = C −1 BC − λ id = C −1 BC − λC −1 C = C −1 (B − λ id)C,
and we obtain for the characteristic polynomial of A
pA (λ) = det(A − λ id) = det(C −1 (B − λ id)C) = det(C −1 ) det(B − λ id) det C = det(B − λ id)
= pB (λ).
This shows that A and B have the same eigenvalues and that their algebraic multiplicities coincide.
Moreover, Eigµ (A) = ker(A − µ id) = ker(C −1 (B − µ id)C) = C −1 ker(B − µ id) = C −1 Eigµ (B), where in the second to last step we used that C −1 is invertible. The invertibility of C −1 also shows that dim(C −1 Eigµ (B)) = dim(Eigµ (B)), hence dim Eigµ (A) = dim Eigµ (B), which proves that the geometric multiplicity of µ as an eigenvalue of A is equal to that of B.
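A small numerical illustration of Theorem 8.47 (a sketch assuming NumPy; the matrices B and C below are arbitrary choices, not taken from the text): a matrix and any matrix similar to it have the same characteristic polynomial.

import numpy as np

B = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 1.0]])
C = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])        # any invertible matrix
A = np.linalg.inv(C) @ B @ C

# Same characteristic polynomial coefficients, hence the same eigenvalues
# with the same algebraic multiplicities.
print(np.allclose(np.poly(A), np.poly(B)))   # True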
Theorem 8.48. Let A ∈ M (n × n). Then the following statements are equivalent:
(i) A is diagonalisable, that means that there exists a diagonal matrix D and an invertible matrix
C such that C −1 AC = D.
(ii) For every eigenvalue of A, its geometric and algebraic multiplicities are equal.
(iii) A has a set of n linearly independent eigenvectors.
(iv) Kn has a basis consisting of eigenvectors of A.
Proof. Let µ1 , . . . , µk be the different eigenvalues of A and let us denote the algebraic multiplicities
of µj by mj (A) and mj (D) and the geometric multiplicities by nj (A) and nj (D).
(i) =⇒ (ii): By assumption A and D are similar, so they have the same eigenvalues by Theorem 8.47 and
mj (A) = mj (D) and nj (A) = nj (D) for all j = 1, . . . , k,
and Theorem 8.45 shows that mj (D) = nj (D) for every j, hence mj (A) = nj (A) for every j = 1, . . . , k.
(ii) =⇒ (iii): Recall that the geometric multiplicities nj (A) are the dimensions of the kernel of
A − µj id. So in each ker(A − µj ) we may choose a basis consisting of nj (A) vectors. In total we
have n1 (A)+· · ·+nk (A) = m1 (A)+· · ·+mk (A) = n such vectors and they are linearly independent
by Corollary 8.44.
(iii) =⇒ (iv): This is clear because dim Kn = n.
(iv) =⇒ (i): Let B = {~c1 , . . . , ~cn } be a basis of Kn consisting of eigenvectors of A and let d1 , . . . , dn
be the corresponding eigenvalues, that is, A~cj = dj ~cj . Note that the dj are not necessarily pairwise
different. Then the matrix C = (~c1 | · · · |~cn ) is invertible and C −1 AC is the representation of A in
the basis B, hence C −1 AC = diag(d1 , . . . , dn ). In more detail, using that ~cj = C~ej and C −1~cj = ~ej , we find (C −1 AC)~ej = C −1 A~cj = C −1 (dj ~cj ) = dj C −1~cj = dj ~ej , so the jth column of C −1 AC is dj ~ej , that is, C −1 AC = diag(d1 , . . . , dn ).
Corollary 8.49. If A ∈ M (n × n) has n different eigenvalues, then A is diagonalisable.
Proof. If A has n different eigenvalues λ1 , . . . , λn , then for each of them the algebraic multiplicity is equal to 1. Moreover, 1 ≤ geometric multiplicity ≤ algebraic multiplicity = 1 for each eigenvalue. Hence the algebraic and the geometric multiplicity for each eigenvalue are equal
(both are equal to 1) and the claim follows from Theorem 8.48.
Corollary 8.50. If the matrix A ∈ M (n × n) is diagonalisable, then its determinant is equal to the
product of its eigenvalues.
Proof. Let λ1 , . . . , λn be the (not necessarily different) eigenvalues of A and let C be an invertible
matrix such that C −1 AC = D := diag(λ1 , . . . , λn ). Then
det A = det(CDC −1 ) = (det C)(det D)(det C −1 ) = det D = λ1 · · · λn .
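Again this can be checked numerically; a minimal sketch (assuming NumPy, with an arbitrary diagonalisable test matrix):

import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 4.0]])                                  # eigenvalues 5 and 1
eigenvalues = np.linalg.eigvals(A)
print(np.isclose(np.prod(eigenvalues), np.linalg.det(A)))   # True: 5 * 1 = det A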
Theorem 8.51. Let A ∈ M (n × n) with pairwise different eigenvalues µ1 , . . . , µk . Then A is diagonalisable if and only if
Kn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A). (8.12)
Proof. Let us denote the algebraic multiplicity of each µj by mj (A) and its geometric multiplicity by nj (A).
If A is diagonalisable, then the geometric and algebraic multiplicities are equal for each eigenvalue.
Hence
dim(Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A)) = dim(Eigµ1 (A)) + · · · + dim(Eigµk (A))
= n1 (A) + · · · + nk (A) = m1 (A) + · · · + mk (A) = n.
Since every n-dimensional subspace of Kn is equal to Kn , (8.12) is proved.
Now assume that (8.12) is true. We have to show that A is diagonalisable. In each Eigµj we choose
a basis Bj . By (8.12) the collection of all those basis vectors form a basis of Kn . Therefore we found
a basis of Kn consisting of eigenvectors of A. Hence A is diagonalisable by Theorem 8.48.
The above theorem says that A is diagonalisable if and only if there are enough eigenvectors of
A to span Kn . This is the case if and only if Kn splits in the direct sum of subspaces on each of
which A acts simply by multiplying each vector with the number (namely with the corresponding
eigenvalue).
To practice the notions of algebraic and geometric multiplicities a bit, we finish this section with an alternative proof of Theorem 8.48.
Alternative proof of Theorem 8.48. Let us prove (i) =⇒ (iv) =⇒ (iii) =⇒ (ii) =⇒ (i).
(i) =⇒ (iv): This was already discussed after Definition 8.22. Let D = diag(d1 , . . . , dn ) and let ~c1 , . . . , ~cn
be the columns of C. Clearly they form a basis of Kn because C is invertible. By assumption we know
that AC = CD. Hence we have that
A~cj = jth column of AC = jth column of CD = dj · (jth column of C) = dj ~cj .
Therefore the vectors ~c1 , . . . , ~cn are linearly independent and are all eigenvectors of A, hence they are even a basis of Kn .
Since nj (A) − mj (A) ≤ 0 for all j = 1, . . . , k, each of the terms must be zero, which shows that nj (A) = mj (A) for all j, as desired.
(ii) =⇒ (i): For each j = 1, . . . , k let us choose a basis Bj of Eigµj (A). Observe that each basis has
nj (A) vectors. By Corollary 8.44, the system consisting of all these basis vectors is linearly independent.
Moreover, the total number of these vectors is n1 (A) + · · · + nk (A) = m1 (A) + · · · + mk (A) = n where we
used the assumption that the algebraic and geometric multiplicities are equal for each eigenvalue. Hence
the collection of all those vectors form a basis of Kn . That A is diagonalisable follows now as in the proof
of (iv) =⇒ (i):
You should have understood
• algebraic and geometric multiplicities,
• etc.
You should now be able to
• verify if a given matrix is diagonalisable,
• if it is diagonalisable, find its diagonalisation,
• etc.
8.5 Symmetric and Hermitian matrices
In this section we will deal with symmetric and hermitian matrices. The main results are that all
eigenvalues of a hermitian matrix are real, that eigenvectors corresponding to different eigenvalues
are orthogonal and that every hermitian matrix is diagonalisable. Note that symmetric matrices
are a special case of hermitian ones, so whenever we show something about hermitian matrices, the corresponding statement holds for symmetric matrices as well.
Theorem 8.52. Let A be a hermitian matrix. Then all eigenvalues of A are real.
Proof. Let A be hermitian, that is, A∗ = A and let λ be an eigenvalue of A with eigenvector ~v .
Then ~v ≠ ~0 and A~v = λ~v . We have to show that λ = λ̄. Indeed,
λ‖~v ‖² = λh~v , ~v i = hλ~v , ~v i = hA~v , ~v i = h~v , A∗~v i = h~v , A~v i = h~v , λ~v i = λ̄h~v , ~v i = λ̄‖~v ‖².
Since ~v ≠ ~0, it follows that λ = λ̄, which means that the imaginary part of λ is 0, hence λ ∈ R.
Theorem 8.53. Let A be a hermitian matrix and let λ1 , λ2 be two different eigenvalues of A with
eigenvectors ~v1 and ~v2 , that is A~v1 = λ1~v1 and A~v2 = λ2~v2 . Then ~v1 ⊥ ~v2 .
Proof. The proof is similar to the proof of Theorem 8.52. We have to show that h~v1 , ~v2 i = 0. Note that by Theorem 8.52, the eigenvalues λ1 , λ2 are real. Then
λ1 h~v1 , ~v2 i = hλ1~v1 , ~v2 i = hA~v1 , ~v2 i = h~v1 , A∗~v2 i = h~v1 , A~v2 i = h~v1 , λ2~v2 i = λ̄2 h~v1 , ~v2 i = λ2 h~v1 , ~v2 i.
Since λ1 ≠ λ2 , it follows that h~v1 , ~v2 i = 0, that is, ~v1 ⊥ ~v2 .
Corollary 8.54. Let A be a hermitian matrix and let λ1 , λ2 be two different eigenvalues of A.
Then Eigλ1 (A) ⊥ Eigλ2 (A).
The next theorem is one of the most important theorems in Linear Algebra.
Theorem 8.55. Let A be a hermitian matrix and let µ1 , . . . , µk be its pairwise different eigenvalues. Then Cn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A). In particular, A is diagonalisable.
Theorem 8.55*. Let A be a symmetric real matrix and let µ1 , . . . , µk be its pairwise different eigenvalues. Then Rn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A). In particular, A is diagonalisable.
As a corollary we obtain the following very important theorem.
Theorem 8.57. A matrix is hermitian if and only if it is unitarily diagonalisable, that is, there
exists a unitary matrix Q and a diagonal matrix D such that D = Q−1 AQ = Q∗ AQ.
Theorem 8.57*. A matrix is symmetric if and only if it is orthogonally diagonalisable, that is,
there exists an orthogonal matrix Q and a diagonal matrix D such that D = Q−1 AQ = Qt AQ.
In both cases, D = diag(λ1 , . . . , λn ) where the λ1 , . . . , λn are the eigenvalues of A and the columns
of Q are the corresponding eigenvectors.
Proof. Let A be a hermitian matrix. From Theorem 8.55 we know that A is diagonalisable and that
Cn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A),
where µ1 , . . . , µk are the different eigenvalues of A. In each eigenspace Eigµj (A) we can choose an
orthonormal basis Bj consisting of nj vectors ~v_1^{(j)} , . . . , ~v_{n_j}^{(j)} , where nj is the geometric multiplicity of
µj . We know that the eigenspaces are pairwise orthogonal by Corollary 8.54. Hence the system of
all these vectors form an orthonormal basis B of Cn . Therefore the matrix Q whose columns are
the vectors of this basis is a unitary matrix and Q−1 AQ = D.
Now assume that A is unitarily diagonalisable. We have to show that A is hermitian. Let Q be a
unitary matrix and let D be a diagonal matrix such that D = Q∗ AQ. Then A = QDQ∗ and
A∗ = (QDQ∗ )∗ = (Q∗ )∗ D∗ Q∗ = QD∗ Q∗ = QDQ∗ = A,
where we used that D∗ = D because D is a diagonal matrix whose entries on the diagonal are real numbers because they are the eigenvalues of A.
The proof of Theorem 8.57* is the same.
Corollary 8.59. If a matrix A is hermitian (or symmetric), then its determinant is the product
of its eigenvalues.
Proof. This follows from Theorem 8.55 (or Theorem 8.55*) and Corollary 8.50.
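For symmetric (or hermitian) matrices, the orthogonal (unitary) diagonalisation of Theorem 8.57* can be computed numerically with np.linalg.eigh. A short sketch (assuming NumPy; the matrix below is an arbitrary example, not from the text):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])                 # symmetric

eigenvalues, Q = np.linalg.eigh(A)              # eigh is meant for hermitian/symmetric matrices
D = np.diag(eigenvalues)

print(np.allclose(Q.T @ Q, np.eye(3)))          # True: Q is orthogonal
print(np.allclose(Q.T @ A @ Q, D))              # True: Q^t A Q = D
print(np.all(np.isreal(eigenvalues)))           # True: the eigenvalues are real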
Proof of Theorem 8.55. Let us denote the right hand side by U , that is, U := Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A). Then we have to show that U ⊥ = {~0}. For the sake of a contradiction, assume that this is not true and let ℓ = dim(U ⊥ ). In each Eigµj (A) we choose an orthonormal basis ~v_1^{(j)} , . . . , ~v_{n_j}^{(j)} and we choose an orthonormal basis ~w1 , . . . , ~wℓ of U ⊥ . The set of all these vectors is an orthonormal basis B of Cn because all the eigenspaces are orthogonal to each other and to U ⊥ . Let Q be the matrix whose columns are these vectors: Q = (~v_1^{(1)} | · · · | ~v_{n_k}^{(k)} | ~w1 | · · · | ~wℓ ). Then Q is a unitary matrix because its columns are an orthonormal basis of Cn . Next let us define B = Q−1 AQ. Then B is hermitian because B ∗ = (Q−1 AQ)∗ = Q∗ A∗ (Q−1 )∗ = Q−1 AQ = B, where we used that A = A∗ by assumption and that Q−1 = Q∗ because Q is a unitary matrix. On the other hand, B, being the matrix representation of A with respect to the basis B, is of the form
B = \begin{pmatrix} µ_1 I_{n_1} & & & \\ & \ddots & & \\ & & µ_k I_{n_k} & \\ & & & C \end{pmatrix},
where I_{n_j} denotes the nj × nj identity matrix. All the empty spaces are 0 and C is an ℓ × ℓ matrix (it is the matrix representation of the restriction of A to U ⊥ with respect to the basis ~w1 , . . . , ~wℓ ). The characteristic polynomial of C has at least one zero, hence C has at least one eigenvalue λ. Clearly, λ is then also an eigenvalue of B and if
~y ∈ C` is an eigenvector of C, we obtain an eigenvector of B with the same eigenvalue by putting
0s as its first n − ` components and ~y as its last ` components. Since A and B have the same
eigenvalues, λ must be equal to one of the eigenvalues µ1 , . . . , µk , say λ = µj0 . But then the
dimension of the eigenspace Eigµj0 (B) is strictly larger than the dimension of Eigµj0 (A) which
contradicts Theorem 8.47. Therefore U ⊥ = {~0} and the theorem is proved.
Proof of Theorem 8.55*. The proof is essentially the same as that for Theorem 8.55. We only
have to note that, using the notation of the proof above, the matrix C is symmetric (because B
is symmetric). If we view C as a complex matrix, it has at least one eigenvalue λ because in C
its characteristic polynomial has at least one complex zero. However, since C is hermitian, all its
eigenvalues are real, hence λ is real, so it is an eigenvalue of C if we view it as a real matrix.
• unitarily diagonalise a hermitian matrix and orthogonally diagonalise a symmetric matrix,
• etc.
ax + by = d (8.13)
with constants a, b, d. A solution is a tuple (x, y) which satisfies (8.13). We can view the set of all solutions as a subset of the plane R2 . Since (8.13) is a linear equation (a 1 × 2 system of linear equations), we know that we have the following possibilities for the solution set:
(a) a line if a ≠ 0 or b ≠ 0,
(b) the plane R2 if a = 0, b = 0 and d = 0,
(c) the empty set if a = 0, b = 0 and d ≠ 0.
Now we pass to the quadratic equation
ax² + bxy + cy² = d (8.14)
with constants a, b, c, d.
In the following we will always assume that d ≥ 0. This is no loss of generality because if d < 0,
we can multiply both sides of (8.14) by −1 and replace a, b, c by −a, −b, −c. The set of solutions
does not change.
Again, we want to identify the solutions with subsets in R2 and we want to find out what type of
figures they are. The equation (8.14) is not linear, so we have to see what relation (8.14) has with
what we studied so far. It turns out that the left hand side of (8.14) can be written as an inner
product
ax² + bxy + cy² = hG (x, y)t , (x, y)t i with G = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}. (8.15)
Question 8.5
The matrix G from (8.15) is not the only possible choice. Find all possible matrices G such that
hG( xy ) , ( xy )i = ax2 + bxy + cy 2 .
The matrix G is very convenient because it is symmetric. This means that up to an orthogonal
transformation, it is a diagonal matrix. So once we know how to solve the problem when G is
diagonal, then we know it for the general case since the solutions differ only by a rotation and
maybe a reflection. This motivates us to first study the case when G is diagonal, that is, when
b = 0.
Quadratic equation without mixed term (b = 0). In this case (8.14) reduces to
ax² + cy² = d. (8.16)
Remark 8.60. The solution set is symmetric with respect to the x-axis and the y-axis because if
some (x, y) is a solution of (8.16), then so are (−x, y) and (x, −y).
Let us define α := √|a| and γ := √|c|, hence α² = a if a ≥ 0, α² = −a if a < 0, and γ² = c if c ≥ 0, γ² = −c if c < 0.
We have to distinguish several cases according to whether the coefficients a, c are positive, negative
or 0.
Case 1.1: a > 0 and c > 0. In this case, the equation (8.16) becomes
α²x² + γ²y² = d. (8.16.1.1)
(i) If d > 0, then (8.16.1.1) is the equation of an ellipse whose axes are parallel to the x- and the y-axis. The intersection with the x-axis is at ±√d/α = ±√(d/a) and the intersection with the y-axis is at ±√d/γ = ±√(d/c).
Figure 8.1: Solution of (8.16) for det G > 0. If a > 0, c > 0, then the solution is an ellipse (if d > 0) or the point (0, 0) (if d = 0). The right picture shows ellipses with a and c fixed but decreasing d (from red to blue). If a < 0, c < 0, d > 0, then there is no solution.
Case 1.2: a < 0 and c < 0. In this case, the equation (8.16) becomes
−α²x² − γ²y² = d. (8.16.1.2)
(i) If d > 0, then (8.16.1.2) has no solution because the left hand side is always less or equal to
0 while the right hand side is strictly positive.
(ii) If d = 0, then the only solution of (8.16.1.2) is the point (0, 0) .
Case 2.1: a > 0 and c < 0. In this case, the equation (8.16) becomes
α²x² − γ²y² = d. (8.16.2.1)
RA
(i) If d > 0, then (8.16.2.1) is the equation of a hyperbola. If x = 0, the equation has no solution. Indeed, we need |x| ≥ √d/α for the equation to have a solution. Therefore the hyperbola does not intersect the y-axis (in fact, the hyperbola cannot pass through the strip −√d/α < x < √d/α).
• Intersection with the coordinate axes: no intersection with the y-axis; intersection with the x-axis at x = ±√d/α = ±√(d/a).
• Asymptotics: For |x| → ∞ and |y| → ∞, the hyperbola has the asymptotes
y = ±(α/γ) x.
Note that the asymptotes do not depend on d.
Proof. It follows from (8.16.2.1) that |x| → ∞ if and only if |y| → ∞ because otherwise
the difference α2 x2 − γ 2 y 2 cannot be constant. Dividing (8.16.2.1) by x2 and by γ 2 and
rearranging leads to
y²/x² = α²/γ² − d/(γ²x²) ≈ α²/γ² for large |x|, hence y ≈ ±(α/γ) x.
(ii) If d = 0, then (8.16.2.1) becomes α²x² − γ²y² = 0, and its solution is the pair of lines y = ±(α/γ) x.
Figure 8.2: Solution of (8.16) for det G < 0. The solutions are hyperbolas (if d > 0) or a pair of intersecting lines (if d = 0). The left picture shows a solution for a > 0, c < 0 and d > 0. The right picture shows hyperbolas for fixed a and c but decreasing d. The blue pair of lines passing through the origin corresponds to the case d = 0.
Remark 8.62. Note that the intersection points of the hyperbola with the x-axis are proportional to √d. Hence as d decreases, the intersection points move closer to 0 and the turn becomes sharper. If d = 0, the intersection points reach 0 and the hyperbola degenerates into two crossing lines.
Case 2.2: a < 0 and c > 0. In this case, the equation (8.16) becomes
−α²x² + γ²y² = d. (8.16.2.2)
This case is the same as Case 2.1, only with the roles of x and y interchanged. So we find:
• Intersection with the coordinate axes: no intersection with the x-axis; intersection with the y-axis at y = ±√d/γ = ±√(d/c).
• Asymptotics: For |x| → ∞ and |y| → ∞, the hyperbola has the asymptotes y = ±(α/γ) x.
(ii) If d = 0, then (8.16.2.2) becomes −α²x² + γ²y² = 0, and its solution is the pair of lines y = ±(α/γ) x.
Case 3.1: a > 0 and c = 0. Then (8.16) becomes α²x² = d.
• If d > 0, the solutions are the two parallel lines x = ±√d/α.
• If d = 0, the solution is the line x = 0.
Case 3.2: a = 0 and c > 0. Then (8.16) becomes γ²y² = d.
• If d > 0, the solutions are the two parallel lines y = ±√d/γ.
• If d = 0, the solution is the line y = 0.
Case 3.3: a < 0 and c = 0. Then (8.16) becomes −α²x² = d.
Case 3.4: a = 0 and c < 0. Then (8.16) becomes −γ²y² = d.
In both of these cases there is no solution if d > 0, and the solution is the line x = 0 (respectively y = 0) if d = 0.
If G was diagonal, then we immediately could give the solution. We know that G is symmetric,
hence we know that G can be orthogonally diagonalized. In other words, there exists an orthogonal
basis of R2 with respect to which G has a representation as a diagonal matrix. We can even choose
this basis such that they are a rotation of the canonical basis ~e1 and ~e2 (without an additional
reflection).
Let λ1 , λ2 be eigenvalues of G and let D = diag(λ1 , λ2 ). We choose an orthogonal matrix Q such
that
D = Q−1 GQ. (8.18)
Denote the columns of Q by ~v1 and ~v2 . They are normalised eigenvectors of G with eigenvalues λ1
and λ2 respectively. Recall that for an orthogonal matrix Q we always have that det Q = ±1. We
may assume that det Q = 1, because if not we can simply multiply one of its columns by −1. This
column then is still a normalised eigenvector of G with the same eigenvalue, hence (8.18) is still
valid. With this choice we guarantee that Q is a rotation.
From (8.18) it follows that G = QDQ−1 = QDQ∗ . So we obtain from (8.17) that
d = hG~x , ~xi = hQDQ∗ ~x , ~xi = hDQ∗ ~x , Q∗ ~xi = hD~x 0 , ~x 0 i = λ1 x0² + λ2 y 0²,
where ~x 0 = (x0 , y 0 )t = Q∗ ~x = Q−1 ~x.
Observe that the column vector (x0 , y 0 )t is the representation of ~x with respect to the basis ~v1 , ~v2
(recall that they are eigenvectors of G). Therefore the solution of (8.14) is one of the solutions
we found for the case b = 0 only now the symmetry axes of the figures are no longer the x- and
y-axis, but they are the directions of the eigenvectors of G. In other words: Since Q is a rotation,
we obtain the solutions of ax2 + bxy + cy 2 = d by rotating the solutions of ax2 + cy 2 = d with the
matrix Q.
• Quadratic form without mixed terms: d = λ1 x0² + λ2 y 0², where x0 , y 0 are the components of ~x 0 = Q−1 ~x.
• Graphic of the solution: In the xy-coordinate system, indicate the x0 -axis (parallel to ~v1 )
and the y 0 -axis (parallel to ~v2 ). Note that these axes are a rotation of the x- and the y-axis.
The solutions are then, depending on the eigenvalues, an ellipse, hyperbola, etc. whose
symmetry axes are the x0 - and y 0 -axis.
If we want to know only the shape of the solution, it is enough to calculate the eigenvalues λ1 , λ2
of G, or even only det G. Recall that we always assume d ≥ 0.
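The classification by the signs of the eigenvalues of G can be packaged into a small helper. The sketch below assumes NumPy; the function name classify_conic is made up for this illustration, and it only determines the type of the solution set of ax² + bxy + cy² = d for d ≥ 0.

import numpy as np

def classify_conic(a, b, c, d):
    """Type of the solution set of a*x^2 + b*x*y + c*y^2 = d (assuming d >= 0)."""
    G = np.array([[a, b / 2], [b / 2, c]], dtype=float)
    l1, l2 = np.linalg.eigvalsh(G)     # real eigenvalues of the symmetric matrix G (ascending)
    if l1 * l2 > 0:                    # det G > 0
        if d == 0:
            return "single point (0, 0)"
        return "ellipse" if l1 > 0 else "empty set"
    if l1 * l2 < 0:                    # det G < 0
        return "hyperbola" if d > 0 else "two crossing lines"
    # det G = 0 (checked exactly only for exact input): degenerate cases.
    return "degenerate case (parallel lines, one line, empty set or R^2)"

print(classify_conic(10, 6, 2, 4))                 # ellipse
print(classify_conic(-47/17, -32/17, 13/17, 2))    # hyperbola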
Definition 8.63. The axes of symmetry are called the principal axes.
Example 8.64. Describe and sketch the solution set of
10x² + 6xy + 2y² = 4. (8.19)
Solution. (i) First we write (8.19) in the form hG~x, ~xi with a symmetric matrix G. Let us define
G = \begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}. Then (8.19) is equivalent to
hG (x, y)t , (x, y)t i = 4. (8.20)
(ii) Now we calculate the eigenvalues of G. They are the roots of the characteristic polynomial:
0 = det(G − λ id) = (10 − λ)(2 − λ) − 9 = λ² − 12λ + 11 = (λ − 1)(λ − 11),
hence λ1 = 1 and λ2 = 11 with normalised eigenvectors ~v1 = (1/√10)(1, −3)t and ~v2 = (1/√10)(3, 1)t .
(Recall that for symmetric matrices the eigenvectors for different eigenvalues are orthogonal.
If you solve such an exercise it might be a good idea to check if the vectors are indeed
orthogonal to each other.)
Observation. With the information obtained so far, we already can sketch the solution.
Set
Q = (~v1 |~v2 ) = (1/√10) \begin{pmatrix} 1 & 3 \\ -3 & 1 \end{pmatrix},  D = \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 11 \end{pmatrix},
then Q−1 = Qt and D = Q−1 GQ = Qt GQ. Observe that det Q = 1, so Q is a rotation in R2 ; it is the rotation by the angle arctan(−3). If we define
(x0 , y 0 )t = Q−1 (x, y)t = (1/√10)(x − 3y, 3x + y)t ,
then (8.20) gives
4 = hG (x, y)t , (x, y)t i = hDQt (x, y)t , Qt (x, y)t i = hD (x0 , y 0 )t , (x0 , y 0 )t i,
and therefore
4 = x0² + 11y 0² = (1/10)(x − 3y)² + (11/10)(3x + y)².
(iii) The solution of (8.19) is an ellipse whose principal axes are parallel to the vectors ~v1 and ~v2 : x0 is the coordinate along the axis parallel to ~v1 , and y 0 is the coordinate along the axis parallel to ~v2 .
[Sketch: the ellipse in the xy-plane with the rotated axes x0 (parallel to ~v1 , intersections at ±2) and y 0 (parallel to ~v2 , intersections at ±2/√11).]
Example 8.65. Consider the equation
−(47/17)x² − (32/17)xy + (13/17)y² = 2. (8.21)
(i) Write the equation in matrix form.
(ii) Make a change of coordinates so that the quadratic equation (8.21) has no mixed term.
(iii) Describe the solution of (8.21) in geometrical terms and sketch it. Indicate the principal axes
and important intersections.
Solution. (i) First we write (8.21) in the form hG~x, ~xi with a symmetric matrix G. Let us define
G = (1/17) \begin{pmatrix} -47 & -16 \\ -16 & 13 \end{pmatrix}. Then (8.21) is equivalent to
hG (x, y)t , (x, y)t i = 2. (8.22)
(ii) Now we calculate the eigenvalues of G. They are the roots of the characteristic polynomial
0 = det(G − λ id) = (−47/17 − λ)(13/17 − λ) − 256/289 = λ² + 2λ − 611/289 − 256/289 = λ² + 2λ − 3 = (λ − 1)(λ + 3),
hence λ1 = −3 and λ2 = 1. Next we need the normalised eigenvectors. To this end, we calculate ker(G − λj ) using Gauß elimination:
• G − λ1 = (1/17) \begin{pmatrix} 4 & -16 \\ -16 & 64 \end{pmatrix} → \begin{pmatrix} 1 & -4 \\ 0 & 0 \end{pmatrix} =⇒ ~v1 = (1/√17)(4, 1)t ,
• G − λ2 = (1/17) \begin{pmatrix} -64 & -16 \\ -16 & -4 \end{pmatrix} → \begin{pmatrix} 4 & 1 \\ 0 & 0 \end{pmatrix} =⇒ ~v2 = (1/√17)(−1, 4)t .
Observation. With the information obtained so far, we already can sketch the solution.
• The solution is a hyperbola because the eigenvalues have opposite signs.
• The principal axes (symmetry axes) are parallel to the vectors ~v1 and ~v2 . The intersections of the hyperbola with the axis parallel to ~v2 are at ±√2.
Set
Q = (~v1 |~v2 ) = (1/√17) \begin{pmatrix} 4 & -1 \\ 1 & 4 \end{pmatrix},  D = \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix} = \begin{pmatrix} -3 & 0 \\ 0 & 1 \end{pmatrix},
then Q−1 = Qt and D = Q−1 GQ = Qt GQ. Observe that det Q = 1, hence Q is a rotation of R2 ; it is the rotation by the angle arctan(1/4). If we define
(x0 , y 0 )t = Q−1 (x, y)t = (1/√17)(4x + y, −x + 4y)t ,
then (8.22) gives
2 = hG (x, y)t , (x, y)t i = hDQt (x, y)t , Qt (x, y)t i = hD (x0 , y 0 )t , (x0 , y 0 )t i,
hence
2 = −3x0² + y 0² = −(3/17)(4x + y)² + (1/17)(4y − x)².
Asymptotes of the hyperbola. In order to calculate the slopes of the asymptotes of the hyperbola, we first work in the x0 -y 0 -coordinate system. Our starting point is the equation 2 = −3x0² + y 0². Then
2 = −3x0² + y 0² ⇐⇒ y 0²/x0² = 3 + 2/x0² ⇐⇒ y 0 /x0 = ±√(3 + 2/x0²).
We see that |y 0 | → ∞ if and only if |x0 | → ∞ and that y 0 /x0 ≈ ±√3. So the slopes of the asymptotes in x0 -y 0 -coordinates are ±√3.
How do we find the slopes in x-y-coordinates?
• Method 1: Use Q. We know that we obtain the solution of (8.21) by rotating the hyperbola by the linear transformation Q. The asymptote y 0 = −√3 x0 has direction vector ~w 0 = (1, −√3)t in the x0 -y 0 -coordinate system, hence its direction in the x-y-coordinate system is
~w = Q ~w 0 = (1/√17) \begin{pmatrix} 4 & -1 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ -√3 \end{pmatrix} = (1/√17)(4 + √3, 1 − 4√3)t ,
so this asymptote has slope (1 − 4√3)/(4 + √3); the other asymptote is treated analogously.
• Method 2: Use the formulas for x0 and y 0 . The asymptotes are given by y 0 /x0 = ±√3, that is,
±√3 = y 0 /x0 = [(1/√17)(4y − x)] / [(1/√17)(4x + y)] = (4y − x)/(4x + y)
⇐⇒ ±√3 (4x + y) = 4y − x
⇐⇒ (1 ± 4√3) x = (4 ∓ √3) y
⇐⇒ y/x = (1 ± 4√3)/(4 ∓ √3).
• Method 3: Adding angles. We know that the angle between the x0 -axis and an asymptote is arctan √3 and the angle between the x0 -axis and the x-axis is arctan(1/4). Therefore the angle between the asymptote and the x-axis is arctan √3 + arctan(1/4) (see Figure 8.3).
[Figure 8.3, two panels: on the left the hyperbola −3x0² + y 0² = 2 in x0 -y 0 -coordinates; on the right the hyperbola −(47/17)x² − (32/17)xy + (13/17)y² = 2 in x-y-coordinates, with the angles ϕ = arctan(1/4), ϑ = arctan(√3) and α = ϕ + ϑ indicated.]
Figure 8.3: The figure on the right (our hyperbola) is obtained from the figure on the left by
applying the transformation Q to it (that is, by rotating it by arctan(1/4)).
Example 8.66. Describe and sketch the solution set of
9x² − 6xy + y² = 25. (8.23)
Solution 1. • First we write (8.23) in the form hG~x, ~xi with a symmetric matrix G. Let us define
G = \begin{pmatrix} 9 & -3 \\ -3 & 1 \end{pmatrix}. Then (8.23) is equivalent to
hG (x, y)t , (x, y)t i = 25. (8.24)
λ1 = 0, λ2 = 10.
Next we need the normalised eigenvectors. To this end, we calculate ker(G − λj ) using Gauß elimination:
• G − λ1 = \begin{pmatrix} 9 & -3 \\ -3 & 1 \end{pmatrix} → \begin{pmatrix} 3 & -1 \\ 0 & 0 \end{pmatrix} =⇒ ~v1 = (1/√10)(1, 3)t ,
• G − λ2 = \begin{pmatrix} -1 & -3 \\ -3 & -9 \end{pmatrix} → \begin{pmatrix} 1 & 3 \\ 0 & 0 \end{pmatrix} =⇒ ~v2 = (1/√10)(−3, 1)t .
Observation. With the information obtained so far, we already can sketch the solution.
– The solution is a pair of parallel lines because one of the eigenvalues is zero and the other is positive.
– The lines are parallel to ~v1 and their intersections with the axis parallel to ~v2 are at ±√(25/10) = ±√(5/2).
Set
Q = (~v1 |~v2 ) = (1/√10) \begin{pmatrix} 1 & -3 \\ 3 & 1 \end{pmatrix},  D = \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 10 \end{pmatrix},
then Q−1 = Qt and D = Q−1 GQ = Qt GQ. Observe that det Q = 1, hence Q is a rotation in R2 ; it is the rotation by the angle arctan(3). If we define
(x0 , y 0 )t = Q−1 (x, y)t = (1/√10)(x + 3y, −3x + y)t ,
then (8.24) gives
25 = hG (x, y)t , (x, y)t i = hDQt (x, y)t , Qt (x, y)t i = hD (x0 , y 0 )t , (x0 , y 0 )t i,
therefore
25 = 10y 0² = (−3x + y)².
• The solution of (8.23) is a pair of lines parallel to the vector ~v1 which intersect the y 0 -axis at ±√(25/10) = ±√(5/2). In x-y-coordinates the lines are y = 3x ± 5.
Figure 8.4: Ellipses. The plane in the picture on the left is parallel to the xy-plane; therefore the intersection with the cone is a circle. If the plane starts to incline, the intersection becomes an ellipse. The more inclined the plane is, the more elongated the ellipse becomes. As long as the plane is not parallel to the surface of the cone, it intersects only either the upper or the lower part of the cone and the intersection is an ellipse.
Figure 8.5: Parabola. If the plane is parallel to the surface of the cone and does not pass through
the origin, then the intersection with the cone is a parabola (this is not a possible solution of (8.14)).
If the plane is parallel to the surface of the cone and passes through the origin, then the plane is
tangential to the cone and the intersection is one line.
Figure 8.6: Hyperbolas. If the plane is steeper than the cone, then it intersects both the upper and the lower part of the cone. The intersections are hyperbolas. If the plane passes through the origin, then the hyperbola degenerates into two intersecting lines. The plane in the picture in the middle is parallel to the yz-plane; the intersection with the cone is again a hyperbola.
λ1 x0² + λ2 y 0² + r0 x0 + s0 y 0 = d0 .
Now we only need to complete the squares on the left hand side to obtain
λ1 (x0 + r0 /(2λ1 ))² + λ2 (y 0 + s0 /(2λ2 ))² = d0 + r0²/(4λ1 ) + s0²/(4λ2 ).
Note that this can always be done if λ1 and λ2 are not 0 (here we use that G is invertible). If we set d00 = d0 + r0²/(4λ1 ) + s0²/(4λ2 ), x00 = x0 + r0 /(2λ1 ) and y 00 = y 0 + s0 /(2λ2 ), then the equation becomes λ1 x00² + λ2 y 00² = d00 .
Alternatively, we can first shift the coordinate system: write ~x = ~x0 + ~x̃ with a fixed vector ~x0 = (x0 , y0 )t and new variables x̃, ỹ. Then
d = a(x0 + x̃)² + b(x0 + x̃)(y0 + ỹ) + c(y0 + ỹ)² + r(x0 + x̃) + s(y0 + ỹ)
= ax̃² + bx̃ỹ + cỹ² + (2ax0 + by0 + r) x̃ + (bx0 + 2cy0 + s) ỹ + ax0² + bx0 y0 + cy0² + rx0 + sy0 . (8.27)
If we choose ~x0 = −(1/2) G−1 (r, s)t , then the coefficients of x̃ and ỹ in (8.27) vanish and we obtain
d̃ = ax̃² + bx̃ỹ + cỹ² (8.28)
with d̃ := d − ax0² − bx0 y0 − cy0² − rx0 − sy0 ,
which is now in the form of (8.14) (if de is negative, then we must multiply both sides of (8.28) by
−1. In this case, the eigenvalues of G change their sign, hence D also changes sign, but Q does
not). Hence if we set ~x 0 = Q−1 ~x̃, then
d̃ = λ1 x0² + λ2 y 0²
and ~x 0 = Q−1 ~x̃ = Q−1 (~x − ~x0 ) = Q−1 ~x − Q−1 ~x0 = Q−1 ~x + (1/2) Q−1 G−1 (r, s)t . So again we see that the solution of (8.25) is the solution of λ1 x² + λ2 y² = d̃, but rotated by Q and shifted by the vector ~x0 = −(1/2) G−1 (r, s)t .
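The shift vector can be computed directly; a brief numerical sketch (assuming NumPy), using G and the linear part 8x − 2y of the example below:

import numpy as np

G = np.array([[10.0, 3.0],
              [3.0, 2.0]])
r, s = 8.0, -2.0

x0 = -0.5 * np.linalg.solve(G, np.array([r, s]))   # x0 = -(1/2) G^{-1} (r, s)^t
print(x0)                                          # [-1.  2.]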
Consider now the equation with linear terms
10x² + 6xy + 2y² + 8x − 2y = 4. (8.19')
We keep the notation of Example 8.64 (G, λ1 = 1, λ2 = 11, ~v1 , ~v2 and Q as there) and recall that
(x0 , y 0 )t = Q−1 (x, y)t = (1/√10)(x − 3y, 3x + y)t  and  (x, y)t = Q (x0 , y 0 )t = (1/√10)(x0 + 3y 0 , −3x0 + y 0 )t .
Method 1. With the notation above, we know from the general discussion that the solution of (8.19') is an ellipse with the same principal axis directions ~v1 and ~v2 as in Example 8.64, whose centre is ~x0 = −(1/2) G−1 (8, −2)t = (−1, 2)t . The lengths of its semiaxes are √10 and √(10/11) (see Method 2).
Method 2. Set x̃ = x − x0 = x + 1 and ỹ = y − y0 = y − 2. Then
4 = 10x² + 6xy + 2y² + 8x − 2y
= 10(x̃ − 1)² + 6(x̃ − 1)(ỹ + 2) + 2(ỹ + 2)² + 8(x̃ − 1) − 2(ỹ + 2)
= 10x̃² − 20x̃ + 10 + 6x̃ỹ + 12x̃ − 6ỹ − 12 + 2ỹ² + 8ỹ + 8 + 8x̃ − 8 − 2ỹ − 4
= 10x̃² + 6x̃ỹ + 2ỹ² − 6,
hence
10 = 10x̃² + 6x̃ỹ + 2ỹ² = x̃0² + 11ỹ 0²
with
(x̃0 , ỹ 0 )t = Q−1 (x̃, ỹ)t = (1/√10)(x̃ − 3ỹ, 3x̃ + ỹ)t = (1/√10)(x − 3y + 7, 3x + y + 1)t .
8.7 Summary
Cn as an inner product space
Cn is an inner product space if we set
h~z , ~wi = \sum_{j=1}^{n} z_j \overline{w_j}.
We have for all ~v , ~w, ~z ∈ Cn and c ∈ C:
• h~v , ~zi = \overline{h~z , ~v i},
• h~v + c ~w , ~zi = h~v , ~zi + ch~w , ~zi and h~z , ~v + c ~wi = h~z , ~v i + c̄ h~z , ~wi,
• h~z , ~zi = ‖~z‖²,
• |h~v , ~zi| ≤ ‖~v ‖ ‖~z‖,
• ‖~v + ~z‖ ≤ ‖~v ‖ + ‖~z‖.
The adjoint of a matrix A ∈ MC (n × n) is A∗ = \overline{(A^t)} = (\overline{A})^t (= transposed and complex conjugated). The matrix A is called hermitian if A∗ = A. The matrix Q is called unitary if it is invertible and Q∗ = Q−1 .
Note that det A∗ = \overline{det A}.
Characteristic polynomial. The characteristic polynomial of A ∈ M (n × n) is pA (λ) = det(A − λ id). It is a polynomial of degree n. Since every polynomial of degree ≥ 1 has at least one complex root, every complex matrix has at least one eigenvalue (but there are real matrices without real eigenvalues). Moreover, an n × n matrix has at most n eigenvalues. If we factorise pA , we obtain
pA (λ) = (µ1 − λ)^{m1} · · · (µk − λ)^{mk},
where µ1 , . . . , µk are the different eigenvalues of A. The exponent mj is called the algebraic multiplicity of µj . The geometric multiplicity of µj is dim(Eigµj (A)). Note that
1 ≤ geometric multiplicity of µj ≤ algebraic multiplicity of µj .
Similar matrices.
• Two matrices A, B ∈ M (n × n) are called similar if there exists an invertible matrix C such
that A = C −1 BC.
• A matrix A is called diagonalisable if it is similar to a diagonal matrix.
• Let µ1 , . . . , µk be the different eigenvalues of A ∈ M (n × n) with algebraic multiplicities m1 , . . . , mk and geometric multiplicities n1 , . . . , nk . Then the following are equivalent:
(i) A is diagonalisable.
(ii) Cn has a basis consisting of eigenvectors of A.
(iii) Cn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A).
(iv) nj = mj for every j = 1, . . . , k.
(v) n1 + · · · + nk = n.
Hermitian matrices. If A ∈ M (n × n) is hermitian, then:
• det A ∈ R,
• if λ is an eigenvalue of A, then λ ∈ R,
• A is unitarily diagonalisable hence Cn has a basis consisting of eigenvectors of A. They can
be chosen to be mutually orthogonal.
Moreover, A ∈ M (n × n) is hermitian if and only if hA~v , ~zi = h~v , A~zi for all ~v , ~z ∈ Cn .
Moreover, A is symmetric if and only if hA~v , ~zi = h~v , A~zi for all ~v , ~z ∈ Rn .
Solution of ax2 +bxy +cy 2 = d. The equation can be rewritten as hG~x , ~xi = d with the symmetric
matrix
G = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}.
Let λ1 , λ2 be the eigenvalues of G and let us assume that d ≥ 0. Then the solutions are:
• an ellipse if det G > 0; more precisely,
– an ellipse with semiaxes √(d/λ1 ) and √(d/λ2 ) if λ1 , λ2 > 0 and d > 0,
– the point (0, 0) if d = 0,
– the empty set if λ1 , λ2 < 0 and d > 0,
• a hyperbola if det G < 0; more precisely,
– a hyperbola if d > 0,
– two lines crossing at the origin if d = 0,
• two parallel lines, one line or R2 if det G = 0.
8.8 Exercises
1. Let Q be a unitary matrix. Show that all its eigenvalues have norm 1.
(a) Determine whether the vectors u and w are eigenvectors of A. If they are, what are the corresponding eigenvalues?
(b) You may use that det(A − λ) = −λ³ + 21λ² − 138λ + 280. Compute all eigenvalues of A.
4. For the following matrices, find the eigenvalues and eigenspaces, an invertible matrix C and a diagonal matrix D such that C −1 AC = D:
A1 = \begin{pmatrix} -3 & 5 & -20 \\ 2 & 0 & 8 \\ 2 & 1 & 7 \end{pmatrix},  A2 = \begin{pmatrix} -2 & 0 & 1 \\ 0 & 2 & 0 \\ 9 & 0 & 6 \end{pmatrix},  A3 = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 2 & 0 \\ 1 & 3 & 2 \end{pmatrix}.
5. We consider a string of length L which is fixed at both end points. If it is excited, then its vertical elongation satisfies the partial differential equation ∂²u/∂t² (t, x) = ∂²u/∂x² (t, x). If we make the ansatz u(t, x) = e^{iωt} v(x) for some number ω and a function v which depends only on x, we obtain −ω²v = v 00 . If we set λ = −ω², we see that we have to solve the following eigenvalue problem:
T : V → V, T v = v 00
with
V = {f : [0, L] → R : f is twice differentiable and f (0) = f (L) = 0}.
6. For each of the following matrices, determine whether it is diagonalisable. If it is, find a diagonal matrix D to which it is similar, D = CAC −1 :
A1 = \begin{pmatrix} 3 & 1 & -1 \\ 1 & 3 & -1 \\ -1 & -1 & 5 \end{pmatrix},  A2 = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix},  A3 = \begin{pmatrix} -1 & 4 & 2 & -7 \\ 0 & 5 & -3 & 6 \\ 0 & 0 & -5 & 1 \\ 0 & 0 & 0 & 11 \end{pmatrix},  A4 = \begin{pmatrix} 3 & 2 & 5 & 1 \\ 2 & 0 & 2 & 6 \\ 5 & 2 & 7 & -1 \\ 1 & 6 & -1 & 3 \end{pmatrix}.
7. Find an orthogonal substitution which diagonalises the given quadratic forms and determine the diagonal form. Sketch the solutions. If the solution is an ellipse, compute the lengths of its principal axes and the angle they make with the x-axis. If it is a hyperbola, compute the angle its asymptotes make with the x-axis.
8. Find the eigenvalues and eigenspaces of the following n × n matrices:
A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix},  B = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ 1 & 1 & \cdots & 1 & 2 \\ \vdots & & & & \vdots \\ 1 & 1 & \cdots & 1 & n \end{pmatrix}.
9. Let A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}. Compute e^A := \sum_{n=0}^{\infty} \frac{1}{n!} A^n .
Hint. Find an invertible matrix C and a diagonal matrix D such that A = C −1 DC and use this to compute A^n .
10. Let A ∈ M (n × n, C) be a hermitian matrix all of whose eigenvalues are strictly greater than 0. Let h· , ·i be the standard inner product on Cn . Show that A induces an inner product on Cn via h~v , ~wiA := hA~v , ~wi.
11. (a) Let Φ : M (2 × 2, R) → M (2 × 2, R), Φ(A) = At . Find the eigenvalues and eigenspaces of Φ.
(b) Let P2 be the vector space of polynomials of degree at most 2 with real coefficients. Find the eigenvalues and eigenspaces of T : P2 → P2 , T p = p0 + 3p.
(c) Let R be the reflection in the plane P : x + 2y + 3z = 0 in R3 . Compute the eigenvalues and eigenspaces of R.
Complex Numbers
A complex number z is an expression of the form
z = a + ib,
where a, b ∈ R and i is called the imaginary unit. The number a is called the real part of z, denoted
by Re(z) and b is called the imaginary part of z, denoted by Im(z).
The set of all complex numbers is sometimes called the complex plane and it is denoted by C:
C = {a + ib : a, b ∈ R}.
A complex number can be visualised as a point in the plane R2 where a is the coordinate on the
real axis and b is the coordinate on the imaginary axis.
Let a, b, x, y ∈ R. We define the algebraic operations sum and product for complex numbers
z = a + ib, w = x + iy:
z + w = (a + ib) + (x + iy) := a + x + i(b + y),
zw = (a + ib)(x + iy) := ax − by + i(ay + bx).
Exercise A.1. Show that if we identify the complex number z = a + ib with the vector (a, b)t ∈ R2 , then the addition of complex numbers is the same as the addition of vectors in R2 .
We will give a geometric interpretation of the multiplication of complex numbers later after formula
(A.5).
It follows from the definition above that i2 = −1. Moreover, we can view the real numbers R as a
subset of C if we identify a real number x with the complex number x + 0i.
Let a, b ∈ R and z = a + ib. Then the complex conjugate of z is
z = a − ib
and its modulus or norm is
|z| = √(a² + b²).
Geometrically, the complex conjugate is obtained from z by a reflection across the real axis, and its norm is the distance of the point represented by z from the origin of the complex plane.
[Figure: the complex plane with the points 3 + 2i, −1 + i and −(3/2)i marked, and a point z = a + ib together with its complex conjugate z̄ = a − ib.]
The following rules hold for all complex numbers z = a + ib and w = x + iy:
(i) z = Re z + i Im z.
(ii) Re(z + w) = Re(z) + Re(w), Im(z + w) = Im(z) + Im(w).
(iii) \overline{(\overline{z})} = z, \overline{z + w} = z̄ + w̄, \overline{zw} = z̄ w̄.
(iv) z z̄ = |z|².
(v) Re z = (1/2)(z + z̄), Im z = (1/(2i))(z − z̄).
Proof. (i) and (ii) should be clear. For (iii) note that \overline{(\overline{z})} = \overline{a − ib} = a + ib = z,
\overline{z + w} = \overline{a + x + i(b + y)} = a + x − i(b + y) = (a − ib) + (x − iy) = z̄ + w̄,
\overline{zw} = \overline{ax − by + i(ay + bx)} = ax − by − i(ay + bx) = (a − ib)(x − iy) = z̄ w̄.
For (v) note that
z + z̄ = a + ib + (a − ib) = 2a = 2 Re(z),
z − z̄ = a + ib − (a − ib) = 2ib = 2i Im(z).
We call a complex number real if it is of the form z = a + i0 for some a ∈ R and we call it purely imaginary if it is of the form z = 0 + ib for some b ∈ R. Hence z is real if and only if z = z̄, and z is purely imaginary if and only if z = −z̄.
(c) Identity element of addition: There exists an element 0, called the additive identity such
that for every v ∈ C, we have 0 + v = v + 0 = v.
(d) Additive inverse: For all z ∈ C, we have an inverse element −z such that z + (−z) = 0.
(g) Identity element of multiplication: There exists an element 1, called the multiplicative identity, such that for every v ∈ C we have 1 · v = v · 1 = v.
(h) Multiplicative inverse: For all z ∈ C \ {0}, we have an inverse element z −1 such that
z · z −1 = 1.
(i) Distributivity laws: For all u, v, w ∈ C we have
u(w + v) = uw + uv.
It is easy to check that commutativity, associativity and distributivity hold. Clearly, the additive identity is 0 + i0 and the multiplicative identity is 1 + 0i. If z = a + ib, then its additive inverse is −a − ib. If z ∈ C \ {0}, then z −1 = z̄/|z|² = (a − ib)/(a² + b²). This can be seen easily if we recall that |z|² = z z̄.
The proof of the next theorem is beyond the scope of these lecture notes.
Theorem A.3 (Fundamental theorem of algebra). Every non-constant complex polynomial
has at least one complex root.
Proof. Let n = deg(p). If n = 0, then p is constant and it is clearly of the form (A.1). If n > 0, then, by Theorem A.3, there exists µ1 ∈ C such that p(µ1 ) = 0. Hence there exists some polynomial q1 such that p(z) = (z − µ1 )q1 (z). Clearly, deg(q1 ) = n − 1. If q1 is constant, we are done. If q1 is not constant, then it must have a zero µ2 . Hence q1 (z) = (z − µ2 )q2 (z) with some polynomial q2 with deg(q2 ) = n − 2. If we repeat this process n times, we finally obtain that
p(z) = c (z − µ1 )(z − µ2 ) · · · (z − µn )
for some constant c ∈ C. Now we only have to group all terms with the same µj and we obtain the form (A.1).
A power series is a series of the form
\sum_{n=0}^{\infty} c_n (z − a)^n , (A.2)
where the cn are the coefficients and a is the point where the power series is centred. In our case they are complex numbers and z is a complex number. Recall that a series \sum_{n=0}^{\infty} a_n is called absolutely convergent if and only if \sum_{n=0}^{\infty} |a_n | is convergent. It can be shown that every absolutely convergent
series of complex numbers is convergent. Moreover, for every power series of the form (A.2) there
exists a number R > 0 or R = ∞, called the radius of convergence such that the series converges
absolutely for every z ∈ C with |z − a| < R and it diverges for z with |z − a| > R. That means that
the series converges absolutely for all z in the open disc with radius R centred in a, and it diverges
outside the closed disc with radius R centred in a. For z on the boundary the series may converge
or diverge. Note that R = 0 and R = ∞ are allowed. If R = 0, then the series converges only for
z = a and if R = ∞, then the series converges for all z ∈ C.
Important functions that we know from the real numbers and that have a power series are sine, cosine and the exponential function. We can use their power series representations to define them also for complex numbers:
sin z := \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} z^{2n+1},  cos z := \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} z^{2n},  e^z := \sum_{n=0}^{\infty} \frac{1}{n!} z^n . (A.3)
Note that for every z the series in (A.3) are absolutely convergent because, for instance, for the series for the sine function we have \sum_{n=0}^{\infty} |\frac{(-1)^n}{(2n+1)!} z^{2n+1}| = \sum_{n=0}^{\infty} \frac{1}{(2n+1)!} |z|^{2n+1}, which is convergent because |z| is a real number and we know that the sine series is absolutely convergent for every real argument. Hence the sine series is absolutely convergent for any z ∈ C, hence converges. The same argument shows that the series for the cosine and for the exponential function converge for every z ∈ C.
Remark A.6. Since the series for the sine function contains only odd powers of z, it is an odd
function and cosine is an even function because it contains only even powers of z. In formulas:
sin(−z) = − sin z, cos(−z) = cos z.
Next we show the relation between the trigonometric functions and the exponential function.
Theorem (Euler formulas). For every z ∈ C we have
e^{iz} = cos z + i sin z,  cos z = (1/2)(e^{iz} + e^{−iz}),  sin z = (1/(2i))(e^{iz} − e^{−iz}).
Proof. Let us show the formula for e^{iz}. In the calculation we will use that i^{2n} = (i²)^n = (−1)^n and i^{2n+1} = (i²)^n i = (−1)^n i and obtain
e^{iz} = \sum_{n=0}^{\infty} \frac{1}{n!} (iz)^n = \sum_{n=0}^{\infty} \frac{1}{n!} i^n z^n = \sum_{n=0}^{\infty} \frac{1}{(2n)!} i^{2n} z^{2n} + \sum_{n=0}^{\infty} \frac{1}{(2n+1)!} i^{2n+1} z^{2n+1}
= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} z^{2n} + i \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} z^{2n+1} = cos z + i sin z.
Note that the third step needs some proper justification (see a course on integral calculus).
For the proof of the formula for cos z we note that from what we just proved, it follows that
(1/2)(e^{iz} + e^{−iz}) = (1/2)(cos z + i sin z + cos(−z) + i sin(−z)) = (1/2)(cos z + i sin z + cos z − i sin z) = cos z.
(v) Show that the exponential function is 2πi-periodic in C.
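A quick numerical sanity check of the Euler formula e^{iz} = cos z + i sin z (a sketch assuming Python's standard cmath module; the test value z is arbitrary):

import cmath

z = 0.7 - 1.3j
lhs = cmath.exp(1j * z)
rhs = cmath.cos(z) + 1j * cmath.sin(z)
print(abs(lhs - rhs) < 1e-12)   # True up to rounding errors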
Let z ∈ C with |z| = 1 and let ϕ be the angle between the positive real axis and the line connecting
the origin and z. It is called the argument of z and it is denoted by arg(z). Observe that the
argument is only determined modulo 2π. That means, if we add or subtract any integer multiple
of 2π to the argument, we obtain another valid argument.
[Figure: a complex number z with |z| = 1 and argument ϕ (left), and a general complex number z with Re(z) = |z| cos ϕ and Im(z) = |z| sin ϕ (right).]
Then the real and imaginary parts of z are Re(z) = cos ϕ and Im(z) = sin ϕ, and therefore z = cos ϕ + i sin ϕ = e^{iϕ}. We saw in Remark 2.3 how we can calculate the argument of a complex number.
Now let z ∈ C \ {0} and again let ϕ be the angle between the positive real axis and the line connecting the origin with z. Let z̃ = z/|z|. Then |z̃| = 1 and therefore z̃ = e^{iϕ}. It follows that z = |z| z̃ = |z| e^{iϕ}.
[Figure: geometric interpretation of the multiplication of complex numbers: if z has argument α and w has argument β, then zw has argument α + β.]
×, 44 approximation by least squares, 256
∧, 44 argument of a complex number, 323
C, 319 augmented coefficient matrix, 13, 70
Cn , 269
M (m × n), 70 bases
R2 , 25, 28 change of, 204
R3 , 44 basis, 168
Rn , 41 orthogonal, 235
Eigλ (T ), 280 bijective, 189
Im, 319
L(U, V ), 188, 229 canonical basis in Rn , 169
Masym (n × n), 105 Cauchy-Schwarz inequality, 36, 272
Msym (n × n), 105 change of bases, 204
Pn , 155 change-of-coordinates matrix, 207
Re, 319 characteristic polynomial, 282
Sn , 121 coefficient matrix, 13, 70
U ⊥ , 244 augmented, 13, 70
elementary matrix, 106 linear combination, 156
elementary row operations, 71 linear map, 187
empty set, 157, 159, 161, 168, 171 linear maps
entry, 13 matrix representation, 214
equivalence relation, 276 linear operator, see linear map
Euler formulas, 322 linear span, 157
expansion along the kth row/column, 124 linear system, 12, 69
consistent, 12
field, 320 homogeneous, 12
finitely generated, 159 inhomogeneous, 12
free variables, 77 solution, 12
linear transformation, see linear map
Gauß-Jordan elimination, 75 matrix representation, 215
Gaußian elimination, 75 linearly dependent, 161
generator, 157 linearly independent, 161
geometric multiplicity, 280 lower triangular, 105
Gram-Schmidt process, 252
upper triangular, 105 inner, 33, 42, 270
matrix representation of a linear transformation, product of vector in R2 with scalar, 28
215 projection
minor, 123 orthogonal, 249
modulus, 319 proper subspace, 149
Multiplicative identity, 321 Pythagoras Theorem, 250, 271
multiplicity
algebraic, 285 radius of convergence, 322
geometric, 280 range, 189
real part of z, 319
norm, 319 reduced row echelon form, 73
norm of a vector, 30, 42, 270 reflection in R2 , 222
normal form reflection in R3 , 223
line, 53 right hand side, 12, 69
plane, 55 right inverse, 96
normal vector of a plane, 55 row echelon form, 73
null space, 189 row equivalent, 75
row operations, 71
ONB, 235 row space, 197
one-to-one, 189
orthogonal basis, 235 Sarrus
orthogonal complement, 241, 244 rule of, 125
orthogonal diagonalisation, 295 scalar, 26
orthogonal matrix, 237 scalar product, 33, 42, 270
orthogonal projection, 249, 249 sesquilinear, 271
orthogonal projection in R2 , 39 sign of a permutation, 121
orthogonal projection in Rn , 43, 249, 271 similar matrices, 276
orthogonal projection to a plane in R3 , 223 snymmetrix matrix, 295
orthogonal system, 234 solution
orthogonal vectors, 35, 271 vector form, 78
span, 157
square matrix, 70
standard basis in Rn , 169
standard basis in Pn , 169
subspace, 149
affine, 150
sum of functions, 87
surjective, 189
symmetric equation, 52
symmetric matrix, 105
system
orthogonal, 234
orthonormal, 234
trace, 283
transition matrix, 207
triangle inequality, 31, 36, 272
trivial solution, 80, 160
unit vector, 31
unitary matrix, 274
upper triangular, 105
vector, 29
in R2 , 25
norm, 30, 42, 270
unit, 31
vector equation, 51
vector form of solutions, 78
vector product, 44
vector space, 29, 143
direct sum, 241
generated, 157
intersection, 241
polynomials, 155
spanned, 157
subspace, 149
sum, 241
vector sum in R2 , 28
vectors
orthogonal, 35, 271
parallel, 35
perpendicular, 35, 271