LA Nutshell
– in a nutshell –
Julian P. Großmann
www.JP-G.de
[email protected]
Contents

1 Foundations of mathematics
1.1 Logic and sets
1.2 Real Numbers
1.3 Maps
1.4 Natural numbers and induction
1.5 Summary
1.6 Exercises
2 Vectors in Rn
2.1 What are vectors?
2.2 Vectors in the plane
2.3 The vector space Rn
2.4 Linear and affine subspaces (and the like)
2.5 Inner product and norm in Rn
2.6 A special product in R3 (!): The vector product or cross product
2.7 What are complex numbers?
Summary
4 Determinants
4.1 Determinant in two dimensions
4.2 Determinant as a volume measure
4.3 The cofactor expansion
4.4 Important facts and using Gauß
4.5 Determinants for linear maps
4.6 Determinants and systems of equations
4.7 Cramer's rule
Summary
Index
Some words
This text should help you to understand the course Linear Algebra. To expand your
knowledge, you can look into the following books:
• Gilbert Strang: Introduction to Linear Algebra,
• Sheldon Axler: Linear Algebra Done Right,
• Gerald Teschl, Susanne Teschl: Mathematik für Informatiker, Band 1.
• Shin Takahashi, Iroha Inoue: The Manga Guide to Linear Algebra.
• Klaus Jänich: Lineare Algebra.
Linear Algebra is a very important topic and useful in many different applications. We¹ will discuss
simple examples later. The main idea is that we have a problem consisting of
a lot of quantities where some are fixed and others can be altered or are not known.
If we know the relations between the quantities, we can use Linear Algebra to find
all the possible solutions for the unknowns.
[Diagram: simple linear relations between the variables → Linear Algebra → find all solutions for the unknowns]
That would be the calculation side of the world regarding Linear Algebra. In this lecture,
we will concentrate on understanding the field as a whole. Of course, this is not an easy
task and it will be a hiking tour that we will do together. The summit and goal is to
understand why solving equations is indeed a meaningful mathematical theory.
¹ In mathematical texts, usually, the first-person plural is used even if there is only one author. Most of
the time it simply means “we” = “I (the author) and the reader”.
We start in the valley of mathematics and will shortly scale the first hills. Always stay
in shape, practise and don’t hesitate to ask about the ways up. It is not an easy trip but
you can do it. Maybe the following tips can guide you:
• You will need a lot of time for this course if you really want to understand everything
you learn. Hence, make sure that you have enough time each week to do mathematics
and keep these time slots clear of everything else.
• Work in groups, solve problems together and discuss your solutions. Learning math-
ematics is not a competition.
• Explain the content of the lectures to your fellow students. Only the things you can
illustrate and explain to others are really understood by you.
• Learn the Greek letters that we use in mathematics:
α alpha β beta γ gamma Γ Gamma
δ delta ϵ epsilon ε epsilon ζ zeta
η eta θ theta Θ Theta ϑ theta
ι iota κ kappa λ lambda Λ Lambda
µ mu ν nu ξ xi Ξ Xi
π pi Π Pi ρ rho σ sigma
Σ Sigma τ tau υ upsilon Υ Upsilon
φ phi Φ Phi ϕ phi χ chi
ψ psi Ψ Psi ω omega Ω Omega
• Choosing a book is a matter of taste. Look into different ones and choose the book
that really convinces you.
• Keep interested, fascinated and eager to learn. However, do not expect to under-
stand everything at once.
DON’T PANIC J.P.G.
1 Foundations of mathematics
It is a mistake to think you can solve any major problems just with
potatoes.
Douglas Adams
Before starting with Linear Algebra, we first have to learn the mathematical language,
which consists of symbols, logic, sets, numbers, maps and so on. We also talk about the
concept of a mathematical proof. These things build up the mathematical foundation.
A little bit of knowledge about numbers and how to calculate with them is assumed but
not much more than that. All symbols are introduced such that you know how to work
with them. However, if you are interested in a more detailed discussion, I can recommend
you my video series about the foundations of mathematics:
Instead of true, one often writes T or 1 and instead of false, one often writes F or 0.
Not every meaningful declarative sentence fulfils this requirement. There are opinions, alternative
facts, self-contradictory statements, undecidable statements and so on. In fact, a lot of ex-
amples here, outside the mathematical world, work only if we give the words unambiguous
definitions which we will implicitly do.
The last two examples are not logical statements but so-called predicates and will be
considered later.
Logical operations
For given logical statements, one can form new logical statements with so-called logical
operations. In the following, we will consider two logical statements A and B.
Truth table (1.1):

  A | ¬A
  T |  F
  F |  T
Example 1.4. What are the negations of the following logical statements?
(a) The wine bottle is full.
(b) The number 5 is smaller than the number 2.
(c) All students are in the lecture hall.
Truth table (1.2):

  A  B | A ∧ B
  T  T |   T
  T  F |   F
  F  T |   F
  F  F |   F
Truth table (1.3):

  A  B | A ∨ B
  T  T |   T
  T  F |   T
  F  T |   T
  F  F |   F
Truth table (1.4):

  A  B | A → B
  T  T |   T
  T  F |   F
  F  T |   T
  F  F |   T
Truth table (1.5):

  A  B | A ↔ B
  T  T |   T
  T  F |   F
  F  T |   F
  F  F |   T
If a conditional or a biconditional is true, we have a short notation for this that is used
throughout the whole field of mathematics: we write A ⇒ B and A ⇔ B, respectively.
6 1 Foundations of mathematics
This means that we speak of equivalence of A and B if the truth values in the truth table
are exactly the same. For example, we have
A ↔ B ⇔ (A → B) ∧ (B → A) .
Now one can ask: What to do with truth-tables? Let us show that ¬B → ¬A is the same
as A → B.
Truth table (1.6):

  A  B | ¬A  ¬B | ¬B → ¬A
  T  T |  F   F |    T
  T  F |  F   T |    F
  F  T |  T   F |    T
  F  F |  T   T |    T
Therefore:
(A → B) ⇔ (¬B → ¬A) .
This is the proof by contraposition:
“Assume that B does not hold; then we can show that A cannot hold either.” Hence A
implies B.
Contraposition
If A ⇒ B, then also ¬B ⇒ ¬A.
The contraposition is an example of a deduction rule, which basically tells us how to get
new true propositions from other true propositions. The most important deduction rules
are given just by using the implication.
Modus ponens
If A ⇒ B and A is true, then also B is true.
Chain syllogism
If A ⇒ B and B ⇒ C, then also A ⇒ C.
Reductio ad absurdum
If A ⇒ B and A ⇒ ¬B, then ¬A is true.
One can easily prove these rules by truth tables. However, here we do not state every
deduction in this formal manner. We may still use deduction in the intuitive way as well.
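Such truth-table arguments can also be delegated to a few lines of code. The following Python sketch is my own illustration (not part of the lecture); it checks the contraposition and the chain syllogism by brute force over all truth values:

    from itertools import product

    def implies(a, b):
        # "A -> B" is false only when A is true and B is false
        return (not a) or b

    # contraposition: (A -> B) has the same truth value as (not B -> not A)
    print(all(implies(A, B) == implies(not B, not A)
              for A, B in product([True, False], repeat=2)))      # True

    # chain syllogism: ((A -> B) and (B -> C)) -> (A -> C) is always true
    print(all(implies(implies(A, B) and implies(B, C), implies(A, C))
              for A, B, C in product([True, False], repeat=3)))   # True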
Try it here:
Exercise 1.10. Let “All birds can fly” be a true proposition (axiom). Are the following
deductions correct?
Sets
Modern mathematics does not say what sets are, but only specifies rules. This is, however,
too difficult for us right now, and we rather cite the attempt of a definition by Georg
Cantor:
“Unter einer ‚Menge‘ verstehen wir jede Zusammenfassung von bestimmten wohlunterschiedenen
Objekten unserer Anschauung oder unseres Denkens zu einem Ganzen.”
(In English: “By a ‘set’ we understand any gathering of definite, well-distinguished objects of our perception or of our thought into a whole.”)
The symbol “:=” is read as defined by and means that the symbol M is newly
introduced as a set by the given elements.
Example 1.12. • The empty set {} = ∅ is the unique set that has no elements at all.
• The set {∅} that contains the empty set is non-empty since it has exactly one element.
• A finite set of numbers is {1, 2, 3}.
A = {n ∈ N : 1 ≤ n ≤ 300}
P(B) = {M : M ⊂ B} power set: set of all subsets of B
I = {x ∈ R : 1 ≤ x < π} = [1, π) half-open interval
3 ∈ N, 12034 ∈ N, −1 ∉ N, 0 ∉ N, 0 ∈ N₀,
−1 ∈ Z, 0 ∈ Z, −2.7 ∉ Z, 3/2 ∉ Z,
3/2 ∈ Q, −3 ∈ Q, −2.7 ∈ Q, √2 ∉ Q,
√2 ∈ R, √−2 ∉ R, −2/3 ∈ R, 0 ∈ R.
Example 1.18.
X = R,   A(x) = “x < 0”
Then we can define the set
{x ∈ X : A(x)} = {x ∈ R : x < 0}
The quantifiers and predicates are very useful for a compact notation:
• ∀x ∈ X : A(x) for all x ∈ X A(x) is true
• ∃x ∈ X : A(x) there exists at least one x ∈ X for which A(x) is true
• ∃!x ∈ X : A(x) there exists exactly one x ∈ X for which A(x) is true
Negation of statements with quantifiers:
• ¬(∀x ∈ X : A(x)) ⇔ ∃x ∈ X : ¬A(x)
• ¬(∃x ∈ X : A(x)) ⇔ ∀x ∈ X : ¬A(x)
In our notation: ¬(∃n ∈ N : A(n)) is the same as ∀n ∈ N : ¬A(n), i.e. each n ∈ N
is not the greatest natural number. But this is clear, because n + 1 > n.
Example 1.21. The set M := {x ∈ Z : x² = 25} is the set of all integers x that square
to 25. We immediately see that these are just −5 and 5.
In other words: The equation x² = 25 with unknown x has, depending on which number
realm you want to solve it in, one or two solutions, and the equation x² = −25 has no
solution in the real numbers. However, we will find solutions in the complex numbers, as
we will see later.
Operations on sets
• M1 ∪ M2 := {x : x ∈ M1 ∨ x ∈ M2 } (union)
• M1 ∩ M2 := {x : x ∈ M1 ∧ x ∈ M2 } (intersection)
• M1 \ M2 := {x : x ∈ M1 ∧ x ∉ M2 } (set difference)
[Venn diagrams: the intersection M1 ∩ M2, the set difference M1 \ M2, and the inclusion M1 ⊂ M2]

For A = {x, y} and B = {1, 2, 3}, the Cartesian product is

  A × B = {(x, 1), (x, 2), (x, 3), (y, 1), (y, 2), (y, 3)}.
In the same sense, for sets A1 , . . . , An the set of all n-tuples is defined:
A1 × · · · × An := {(a1 , . . . , an ) : a1 ∈ A1 , . . . , an ∈ An }
Z \ N₀ = {−x : x ∈ N}, N ⊂ N₀, Z ⊄ N₀, (Z \ Q) ⊂ N,
−3 ∈ Z \ N₀, 3/7 ∈ Q \ Z, √2 ∈ R \ Q.
Exercise 1.27. Describe the following sets and calculate their cardinalities:
(a) X1 := {x ∈ N : ∃a, b ∈ {1, 2, 3} with x = a − b}
(b) X2 := {(a − b) : a, b ∈ {1, 2, 3}}
(c) X3 := {|a − b| : a, b ∈ {1, 2, 3}}
(d) X4 := {1, ..., 20} \ {n ∈ N : ∃a, b ∈ N with 2 ≤ a and 2 ≤ b and n = a · b}.
(e) X5 := {S : S ⊂ {1, 2, 3}}.
In our lecture, we will also get to know other objects than real numbers, like vectors and
matrices, where some of these internalized laws do not apply any more. So we start by
having a fresh look at these rules.
We can add and multiply real numbers. Moreover, we use parentheses to describe the
order of the computations. We have the notational convention that multiplication binds
more strongly than addition: ab + c means (ab) + c.
Furthermore, we are used to have the neutral numbers 0 and 1 with special properties:
a+0=a a·1=a
and additive inverse elements, denoted by −a, and also the multiplicative inverses, denoted
by a⁻¹ for a ≠ 0. They fulfil a + (−a) = 0 and aa⁻¹ = 1.
A set with such properties is called a field. Here we have the field of real numbers R.
More details can be found in the box below and in the videos.
It is also known from school that the real numbers can be ordered, which simply means
that the relation a < b always makes sense. One can show that the following rules are
sufficient to derive all known calculations properties concerning ordering of numbers.
• For each a ∈ R, exactly one of the following holds: a > 0, or −a > 0, or a = 0.
• For all a, b ∈ R with a > 0 and b > 0, one has a + b > 0 and ab > 0.
a < b :⇔ a − b < 0
and
a ≤ b :⇔ a − b < 0 or a = b.
This order relation is the reason why we can think of the real numbers as a line, the
“real line”.
For describing subsets of the real numbers, we will use intervals. Let a, b ∈ R. Then we
define
[a, b] := {x ∈ R : a ≤ x ≤ b}
(a, b] := {x ∈ R : a < x ≤ b}
[a, b) := {x ∈ R : a ≤ x < b}
(a, b) := {x ∈ R : a < x < b}.
Obviously, in the case a > b, all the sets above are empty. We can also define unbounded
intervals, for example [a, ∞) := {x ∈ R : a ≤ x} and (−∞, b) := {x ∈ R : x < b}.
The first is a useful notation for a sum, i.e. the result of adding two or more summands.
Instead of using dots, we use the Greek letter Σ. For example,

  3 + 7 + 15 + . . . + 127

is not an unambiguous way to describe the sum. Using the sum symbol, there is no
confusion:

  ∑_{i=2}^{7} (2^i − 1).

Of course, the parentheses are necessary here. You can read this as a for loop:
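For instance, the sum above can be computed literally as a for loop in Python (a minimal sketch of my own, not part of the original notes):

    total = 0
    for i in range(2, 8):        # i runs through 2, 3, ..., 7
        total += 2**i - 1        # add the summand 2^i - 1
    print(total)                 # 3 + 7 + 15 + 31 + 63 + 127 = 246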
Example 1.31. Check that:

  ∑_{i=1}^{10} (2i − 1) = 1 + 3 + 5 + . . . + 19 = 100,

  ∑_{i=−10}^{10} i = −10 − 9 − 8 − . . . − 1 + 0 + 1 + · · · + 8 + 9 + 10 = 0,

  ∏_{i=1}^{8} (2i) = (2 · 1) · (2 · 2) · (2 · 3) · . . . · (2 · 8) = 10321920.
1.3 Maps
Example 1.33. (a) f : N → N with f (x) = x2 maps each natural number to its square.
[Diagram: f : N → N, n ↦ n², maps 1 ↦ 1, 2 ↦ 4, 3 ↦ 9, 4 ↦ 16, 5 ↦ 25, . . .]
(b) f : R² → R, (x₁, x₂) ↦ x₁² + x₂²
(c) f : Z × N → Q, (q, p) ↦ q/p
Problem of well-definedness: the equation x² = a can have two solutions in the case that
a > 0, but in the case a < 0 there are no solutions at all.
One possible way out: restrict the domain of definition and the codomain to

  R₀⁺ := {a ∈ R : a ≥ 0}.

Then:

  √· : R₀⁺ → R₀⁺
Definition 1.34.
Let f : X → Y be a function and A ⊂ X and B ⊂ Y some sets. Then

  f(A) := {f(x) : x ∈ A}

is called the image of A under f, and

  f⁻¹(B) := {x ∈ X : f(x) ∈ B}

is called the preimage of B under f.
Note that the preimage can also be the empty set if none of the elements in B are “hit”
by the map.
To describe the behaviour of a map, the following sets are very important:
If these definitions seem too abstract, the following video may help you to get used to the
terms.
Example 1.37. Define the function that maps each student to her or his chair. This
means that X is the set of all students in the room, and Y the set of all chairs in the
room.
• well-defined: every student has a chair
• surjective: every chair is taken
• injective: on each chair there is no more than one student
• bijective: every student has his/her own chair, and no chair is empty
[Diagram: a map f : X → Y that is neither injective nor surjective]
If f is bijective, the inverse map f⁻¹ : Y → X is given by y ↦ x where f(x) = y.
Example 1.38. Consider the function f : N → {1, 4, 9, 16, . . .} given by f (n) = n². This
is a bijective function. The inverse map f⁻¹ is given by 1 ↦ 1, 4 ↦ 2, 9 ↦ 3, 16 ↦ 4, 25 ↦ 5, . . .
Example 1.39. For a function f : R → R, we can sketch the graph {(x, f (x)) : x ∈ X}
in the x-y-plane.

[Sketches: the graphs of x ↦ x² + 1, x ↦ x² − 1 and x ↦ sin x]
These notions might seem a little bit off-putting, but we will use them so often that you
need to get used to them. Maybe the following video helps you as well:
Composition of maps
Definition 1.40.
If f : X → Y and g : Y → Z, we may compose, or concatenate these maps:
g◦f :X →Z
x 7→ g(f (x))
[Diagram: X → Y → Z; the element x ∈ X is mapped by f to y ∈ Y and then by g to z ∈ Z, so g ∘ f maps x directly to z]
(a) For f : R → R, x ↦ x² and g : R → R, x ↦ sin(x) we get

  g ∘ f : R → R, x ↦ sin(x²)   and   f ∘ g : R → R, x ↦ (sin(x))².

(A short Python sketch after this example illustrates that the order matters.)
(b) Let X be a set. Then idX : X → X with x 7→ x is called the identity map. If there
is no confusion, one usually writes id instead of idX . Let f : X → X be a function.
Then
f ◦ id = f = id ◦ f.
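The compositions from (a) can be tried out in a few lines of Python. This is a sketch of my own; the helper compose is a hypothetical name, not standard notation:

    import math

    def f(x):
        return x**2              # f : x |-> x^2

    def g(x):
        return math.sin(x)       # g : x |-> sin(x)

    def compose(outer, inner):
        # returns the map x |-> outer(inner(x))
        return lambda x: outer(inner(x))

    g_after_f = compose(g, f)    # g o f : x |-> sin(x^2)
    f_after_g = compose(f, g)    # f o g : x |-> (sin x)^2

    print(g_after_f(2.0))        # sin(4)      ~ -0.757
    print(f_after_g(2.0))        # (sin 2)^2   ~  0.827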
1.4 Natural numbers and induction
Mathematical induction
Consider the sums sₙ := 1 + 2 + · · · + n of the first n natural numbers. The first values are:
s1 = 1
s2 = s1 + 2 = 3
s3 = s2 + 3 = 6
s4 = s3 + 4 = 10
s5 = s4 + 5 = 15
sn+1 = sn + n + 1
  sₙ = (n+1)n/2   (Induction hypothesis).
Very good! We can verify our formula for these examples. In particular:
  s₁ = (1+1)·1/2 = 1   (Base case).
Induction step: We have to show
  sₙ₊₁ = (n+2)(n+1)/2,  and indeed  sₙ₊₁ = sₙ + (n + 1) = (n+1)n/2 + n + 1,
where we used the induction hypothesis in the last step. So let us compute:
  sₙ + (n + 1) = (n+1)n/2 + n + 1 = (n² + n + 2n + 2)/2 = (n+2)(n+1)/2.
In general, to prove a statement A(n) for all n ∈ N by induction, one proceeds in two steps:
(1) Show that A(1) is true (base case).
(2) Show that A(n + 1) is true under the assumption that A(n) is true (induction step).
Sometimes it can happen that a claim A(n) is indeed false for finitely many natural
numbers, but it is eventually true. This means that the base case cannot be shown for
n = 1 but for some other natural number n₀ ∈ N. Then the induction proof shows that
A(n) is true for all natural numbers n ≥ n₀.
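An induction proof cannot be replaced by testing finitely many cases, but a quick check of the first values (a small Python sketch of my own) helps to catch mistakes in a claimed formula:

    def s(n):
        # the recursively defined sum s_n = 1 + 2 + ... + n
        return 1 if n == 1 else s(n - 1) + n

    for n in range(1, 11):
        assert s(n) == (n + 1) * n // 2    # the formula from the induction hypothesis
    print("formula checked for n = 1, ..., 10")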
1.5 Summary
• For doing Mathematics, we need logic and sets. A set is just a gathering of its
elements.
• Important symbols: ∈, ∉, ∅, ∀, ∃, ⊂, ⊊, ∩, ∪, \
• Implication A ⇒ B: If A holds, then also B.
• Equivalence A ⇔ B: The statement A holds if and only if B holds.
• Sums and products Σ, Π
• A map or function f : X → Y sends each x ∈ X to exactly one y ∈ Y .
• f is surjective: Each y ∈ Y is “hit” (one or more times).
• f is injective: Each y ∈ Y is “hit” at most one time.
• f is bijective: Each y ∈ Y is “hit” exactly once.
• If f : X → Y is bijective, then the inverse map f −1 : Y → X sends each y ∈ Y to
exactly one x ∈ X.
• The composition g ◦ f : X → Z is the application of the function g : Y → Z to the
result of another function f : X → Y : (g ◦ f )(x) = g(f (x)).
• Mathematical induction is a tool for proving mathematical statements for all natural
numbers at once. You have to show a base case and then do the induction step.
1.6 Exercises
Exercise 1
Calculate the following numbers and sets:
(a) ∏_{j=2}^{4} j/(j+1),   (b) ∑_{i=0}^{4} 3,   (c) ⋃_{n=0}^{5} [2n, 2n + 2),   (d) ∑_{k=1}^{50} k.
Exercise 2
(a) Consider the two functions f1 : R → R, x 7→ x2 and f2 : [0, ∞) → R, x 7→ x2 . For
both functions calculate preimages of the sets {1}, [4, 9) and (−1, 0).
(b) Consider the two functions g1 : R → [0, 1], x 7→ |sin(x)| and g2 : [0, 2π] → R,
x 7→ sin(x). For both functions calculate images of the sets (0, π/2), [0, π) and
(0, 2π].
(c) Consider the two functions h₁ : R → R and h₂ : [−1, 1] → [√3, 2] given by
Exercise 3
Let X be the set of all fish in a given aquarium. Define a function f : X → Y by mapping
every fish to its species, where Y denotes the set of all species of fish. What does it mean
if f is injective or surjective or bijective?
Exercise 4
In the lecture you already learnt about the example (A → B) ⇔ (¬B → ¬A) of two
logically equivalent statements. Show that the following statements are also logically
equivalent by using truth tables:
(a) ¬(A ∧ ¬B) ⇔ (A → B),
(b) ¬(A ∧ B) ⇔ ¬A ∨ ¬B.
Exercise 5
One usually deals with subsets A, B, etc. of a given fixed set X. In such a situation it
is useful to introduce Ac := X \ A which is called the complement of A (with respect to
(w.r.t.) the set X). Show for A, B ⊂ X
(a) A \ B = A ∩ B c ,
(b) (A ∩ B)c = Ac ∪ B c .
Exercise 6
Let A, B and C be sets.
(a) Show A × (B ∪ C) = (A × B) ∪ (A × C).
(b) Let |A| = n and |B| = m where n, m ∈ N. Show that
|A × B| = n · m.
2 Vectors in Rn
This is Frank Drebin, Police Squad. Throw down your guns, and come
on out with your hands up. Or come on out, then throw down your
guns, whichever way you want to do it. Just remember the two key
elements here: one, guns to be thrown down; two, come on out!
After the first chapter about the foundations, we can finally start with the first topics
of Linear Algebra. There is a whole video series which can help you understand
the definitions and propositions of the next sections.
In this section we do some informal discussions about the objects of linear algebra. We
will make all objects into rigorous definitions later.
When we are talking about a vector, we often mean an object or a quantity that has a
length and a direction in some sense. Therefore, we can always visualise this object as an
“arrow”, and we write, for example, ~v and ~w for two vectors.
Now there are exactly two things we can do with vectors. First of all, we can scale a vector ~v by a
number λ and get a new vector that has the same direction but a different length. The
second operation is that we combine two vectors into a new one: this is vector addition, where
one sets the tail of one arrow at the tip of the other one.
• Add the two arrows by concatenating them and call the result ~v + ~w.
• Scale an arrow by a (positive or negative) factor λ and call the result λ~v.

[Figure: the arrows ~v and ~w, their sum ~v + ~w, and the scaled arrows 3~v and −(1/2)~v]

  λ~v + µ~w   (linear combination)
2.2 Vectors in the plane
Mostly, there is no confusion about which variables are vectors and which ones are just numbers,
so we will omit the arrow from now on. However, we will use bold letters in this
script to denote vectors most of the time.
We already know that we can describe the two-dimensional plane by the cartesian product
R × R, which consists of all the pairs of real numbers. For each point in the plane, there
is an arrow where the tail sits at the origin. This is what one calls a position vector.
[Figure: the position vector v = (3, 2) in the x-y-plane]

Our vector is given by the point in the coordinate system, which means it consists of exactly
two numbers, an x- and a y-coordinate. The arrow is determined if we know these two numbers;
in the example above we can write

  v = (3, 2).
The first number says how many steps we have to go to the right (or left) and the second
number says how many steps we have to go upwards (or downwards) parallel to the y-axis.
These numerical representations of the arrows are called columns or column vectors.
These are the two things we want to do with vectors, and now we can describe such
arrows in the two-dimensional plane. We have the geometric view given by arrows and
the numerical view given by operating on the coordinates.
For describing each point in the plane, the two unit vectors e₁ = (1, 0) and e₂ = (0, 1) are useful.
Note that we can write every vector v ∈ R² as a linear combination of the two unit vectors:

  v = (v₁, v₂) = v₁e₁ + v₂e₂.

[Figure: v decomposed into v₁e₁ and v₂e₂]
Linear combinations
To compare apples and oranges: An apple has 8mg vitamin C and 4µg vit-
amin K. An orange has 85mg vitamin C and 0.5µg vitamin K:
  Apple a = (8, 4),   Orange b = (85, 0.5)   (entries: VitC in mg, VitK in µg)

Fruit salad: How much vitamin C and vitamin K do I get if I eat 3 apples and 2
oranges? Answer:

  3a + 2b = 3(8, 4) + 2(85, 0.5) = (3 · 8 + 2 · 85, 3 · 4 + 2 · 0.5) = (194, 13).

[Figure: the vectors a, b and 3a + 2b in the VitC-VitK-plane]
A vector written as
  λa + µb with λ, µ ∈ R (2.1)
is called a linear combination of the vectors a and b.
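This is exactly the kind of computation a numerical library performs; a minimal numpy sketch (numpy is my choice of tool here, not prescribed by the notes):

    import numpy as np

    a = np.array([8.0, 4.0])     # apple:  (vitamin C in mg, vitamin K in µg)
    b = np.array([85.0, 0.5])    # orange: (vitamin C in mg, vitamin K in µg)

    print(3 * a + 2 * b)         # [194.  13.]  -> the linear combination 3a + 2b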
Question:
Which vectors v in R² are perpendicular to the vector u = (2, 1)?
v ∈ R² is perpendicular to u = (u₁, u₂)  ⇔  v = λ(−u₂, u₁) for a λ ∈ R   (2.2)

u = (u₁, u₂) and v = (v₁, v₂) are orthogonal ⇔ (v₁, v₂) = λ(−u₂, u₁) for a λ ∈ R
  ⇔ u₁v₁ = −u₂v₂
  ⇔ u₁v₁ + u₂v₂ = 0
Hence, this means that (u₁, u₂) and (v₁, v₂) are orthogonal if the calculation of u₁v₁ + u₂v₂ gives
zero. This number, written ⟨u, v⟩ := u₁v₁ + u₂v₂, is called the (standard) inner product of u and v.
Sometimes it is also called the (standard) scalar product.
By using Pythagoras’ theorem, we can calculate the length of the arrow in the coordinate
system.
  Length of v = √(v₁² + v₂²)

[Figure: the right triangle with legs v₁e₁ and v₂e₂ and hypotenuse v]
Lines in R2
For describing points in the plane, we can use the position vectors and just use the vector
operations to define objects in the plane. One of the simplest objects is a line g inside
the plane:
We already know that all vectors that are orthogonal to a fixed vector u ∈ R², which means
that ⟨u, v⟩ = 0, build a line through the origin. On the other hand, if we have a line g
through the origin, we can find a vector n that is perpendicular to the vectors lying on
the line. Such an orthogonal vector is often called a normal vector of the line.
In this first case, where g goes through the origin, we denote the normal vector by
n = (α, β) ∈ R² and get:
32 2 Vectors in Rn
  g = {v ∈ R² : ⟨n, v⟩ = 0} = {(x, y) ∈ R² : αx + βy = 0}.
Now let the line g go through a point P with position vector p, and consider a point V with
position vector v = (x, y). Such a point V lies on g if and only if the vector v − p points in
the direction of the line, i.e. is orthogonal to n:

  0 = ⟨n, v − p⟩ = ⟨(α, β), (x − p₁, y − p₂)⟩ = α(x − p₁) + β(y − p₂) = αx + βy − (αp₁ + βp₂) =: αx + βy − δ.

Hence

  g = {v ∈ R² : ⟨n, v − p⟩ = 0} = {(x, y) ∈ R² : αx + βy = δ}.
[Figure: the line g through P with normal vector n; a point V lies on g exactly when v − p is orthogonal to n]
2.3 The vector space Rn
The same calculation rules as for R2 also hold for the general case. The most important
properties we should note:
Each set V with an addition and scalar multiplication that satisfies the eight rules above
is called a vector space. We will come back to this in an abstract sense later. First we
will use this notion to talk about vector spaces inside Rn .
Linear subspaces
Rule of thumb:
Linear subspaces correspond to lines, planes,. . . through the origin.
Since we can set all λj to 0, the zero vector o is always contained in U , and therefore
{0} is the smallest possible subspace. On the other hand, Rn itself is the largest possible
subspace. Both are called the trivial subspaces.
Each linear subspace U of the vector space Rn is also a vector space in the sense of
the properties given in Proposition 2.8.
Linear combinations remain in U (by definition), and rules are inherited from V .
u, v ∈ U , λ, µ ∈ R =⇒ λu + µv ∈ U . (2.3)
Proof. We do the proof by induction for k vectors like in the definition of a subspace:
Thus, to check if a given set U is a linear subspace, we only have to check if linear
combinations of two vectors remain in U . Or we can check it separately:
(1) o∈U,
(2) u ∈ U , λ ∈ R =⇒ λu ∈ U ,
(3) u, v ∈ U =⇒ u + v ∈ U .
This subspace is called the span or the linear hull of M . For convenience, we define
Span(∅) := {o}.
An equivalent definition would be: Span (M ) is the smallest linear subspace U ⊂ Rn with
M ⊂ U . See Proposition 2.15.
Most interesting is the case, where M = {u1 , . . . , uk } just consists of finitely many vectors.
We say that U := Span(M ) is spanned by the vectors u1 , . . . , uk or, the other way around,
that {u1 , . . . , uk } is a generating set for U (generates U , spans U ). In this case, we often
write U = Span(u1 , . . . , uk ).
To check, if a vector space is spanned by some vectors, we only have to check this for
some generating set:
Proof. Exercise!
(b) Span((1, 0), (0, 1)) is the whole plane R². Span((1, 0), (1, 1)) is also the whole plane.
(c) Span((1, 0, 0), (0, 1, 0)) is the x-y-plane in R³.
(d) Span((1, 2, 3), (2, 4, 7)) is a plane in R³ going through (0, 0, 0), (1, 2, 3) and (2, 4, 7).
(e) Span((1, 0, 0), (0, 1, 0), (0, 0, 1)) is the whole space R³. Span((1, 1, 0), (0, 1, 1), (1, 0, 1)) is also the whole space.
(f) Span((1, 2, 3, 4, 5)) is a “line” in R⁵, Span((1, 2, 3, 4, 5), (5, 4, 3, 2, 1)) is a “plane”.
Rule of thumb:
Affine subspaces correspond to arbitrary lines, planes,. . . . In other words: translated
linear subspaces.
Consider the position vectors a = (−1, 2) and b = (3, 4) corresponding to the points A and B. Find the centre point M of the line segment between A and B.
[Figures: the points A, B with position vectors a, b, the connecting vector d = −a + b and the centre point M (left); the point P a quarter of the way from A to B (right)]
The center point is then given by going only half way in the direction of d:
  m = a + (1/2)d = (−1, 2) + (1/2)(4, 2) = (−1, 2) + (2, 1) = (1, 3)   (2.4)

In general: m = a + (1/2)d = a + (1/2)(−a + b) = a − (1/2)a + (1/2)b = (1/2)a + (1/2)b = (1/2)(a + b)
  q := a + λ(−a + b) = (1 − λ)a + λb,   where d = −a + b   (2.5)
The corresponding point Q (with position vector q from the equation above) lies
at point A if λ = 0,
at point B if λ = 1,
at the centre point M if λ = 1/2,
between A and B if λ ∈ [0, 1] (2.6)
on the line through A and B for all λ ∈ R, (2.7)
on the line through A and B, “in front of” A for all λ < 0,
on the line through A and B, “behind” B for all λ > 1.
[Figure: the points Q_λ = (1 − λ)a + λb on the line through A and B, e.g. Q₀ = A, Q_{1/4} = P, Q_{1/2} = M, Q₁ = B, and Q₂, Q₋₁ outside the segment]
The analogous formulation to the linear hull is the affine hull. Try to give a definition!
v + S := {x ∈ Rn : x = v + s for s ∈ S}
(iv) Every nonempty affine subspace S can be written in the form S = v + U for
some v ∈ S and U a linear subspace.
Proof. (i) : Follows from the definition because each affine combination is a linear com-
bination.
(ii) : If we have an arbitrary linear combination, we can trivially add also the zero vector.
But if the zero-vector is in a linear combination, we can make it an affine one.
(iii) : Let us write an arbitrary affine combination of elements of v + S:
  ∑_{j=1}^{k} λⱼ(sⱼ + v) = ∑_{j=1}^{k} λⱼsⱼ + (∑_{j=1}^{k} λⱼ) v = ∑_{j=1}^{k} λⱼsⱼ + v ∈ v + S,

since ∑_{j=1}^{k} λⱼ = 1 and the affine combination ∑_{j=1}^{k} λⱼsⱼ lies in S.
a, b ∈ S , λ ∈ R =⇒ λa + (1 − λ)b ∈ S
The sets which contain all possible conical combinations of their elements are called convex
cones, and we can define the conical hull of a set of vectors.
We can summarise this in the following table:

                     no sign imposed        λⱼ ≥ 0
  ∑ λⱼ arbitrary     linear combination     conical combination
  ∑ λⱼ = 1           affine combination     convex combination
For all these types of sets we know ”... combinations“, and ”... hulls“.
This illustrates our strategy: describe things known from R2 and R3 algebraically, and
thus generalise them to arbitrary dimensions.
The number ⟨u, v⟩ := u₁v₁ + · · · + uₙvₙ is called the (standard) inner product of u and v. It is
sometimes also called the (standard) scalar product. If ⟨u, v⟩ = 0, then we call u and v orthogonal.
Proposition 2.24.
The standard inner product h·, ·i : Rn × Rn → R fulfils the following: For all vectors
x, x0 , y ∈ Rn and λ ∈ R, one has
In general, we just need a map h·, ·i with the properties given in Proposition 6.18 to define
orthogonality as follows:
u ⊥ v :⇔ ⟨u, v⟩ = 0.
From the first binomial formula, we obtain directly a generalisation of the Pythagorean
theorem:
u ⊥ v ⇒ ‖u + v‖² = ‖u‖² + ‖v‖².
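These notions can be tried out numerically. The following minimal numpy sketch is my own illustration; numpy's dot and norm are used as stand-ins for ⟨·,·⟩ and ‖·‖:

    import numpy as np

    u = np.array([2.0, 1.0, 0.0])
    v = np.array([-1.0, 2.0, 5.0])

    print(np.dot(u, v))                          # <u, v> = -2 + 2 + 0 = 0, so u ⊥ v
    lhs = np.linalg.norm(u + v) ** 2             # ||u + v||^2
    rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2
    print(np.isclose(lhs, rhs))                  # True: Pythagorean theorem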
U⊥ := {v ∈ Rⁿ : ⟨v, u⟩ = 0 for all u ∈ U}.
Remark:
In some calculations it can be really helpful to use the Levi-Civita symbol:
+1 if (i, j, k) is an even permutation of (1, 2, 3)
εijk = −1 if (i, j, k) is an odd permutation of (1, 2, 3)
0 if i = j, or j = k, or k = i
Then we find a short notation for the cross product of two vectors u, v ∈ R³:

  u × v = ∑_{i,j,k} ε_{ijk} uᵢ vⱼ eₖ.
Since we have a good imagination for the three-dimensional space, we can interpret the
result of the cross product u × v in a geometric way. It is the only vector in R3 that has
the following three properties:
1.) u × v ⊥ u and u × v ⊥ v,
2.) ‖u × v‖ is the area of the parallelogram spanned by u and v,
3.) Orientation: “right-hand rule”.
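A quick numerical illustration of the first two properties (my own sketch using numpy.cross with arbitrary example vectors):

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([0.0, 1.0, 1.0])

    w = np.cross(u, v)                 # u x v = (-1, -1, 1)
    print(np.dot(w, u), np.dot(w, v))  # 0.0 0.0  -> orthogonal to u and to v
    print(np.linalg.norm(w))           # area of the parallelogram spanned by u and v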
2.7 What are complex numbers?
Complex numbers allow us to solve quadratic equations like

  x² + 2bx + c = 0

even when there is no real solution.
In fact, there is the fundamental theorem of algebra, which says that complex numbers
can even be used to solve any algebraic equation.
  aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀ = 0,   aₖ ∈ C for k = 0, . . . , n,
has at least one solution in C if the left-hand side is not constant.
Complex plane

Geometrical identification:  x + iy ∈ C  ←→  (x, y) ∈ R²

[Figure: the complex plane with the point z = x + iy, its real part x on the Re-axis, its imaginary part y on the Im-axis, and the angle ϕ]
Computations in C
Business as usual, only with the new rule i² = −1. For two complex numbers w = u + iv and
z = x + iy we get, for example,

  w + z = (u + x) + i(v + y)   and   wz = (ux − vy) + i(uy + vx).
But they are more than just vectors: vectors cannot be multiplied with each other, whereas complex
numbers can. Just like the reals, they form a field (but have no ordering). This
is very special: there is no 3d analogue to the complex numbers.
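Python has built-in complex numbers (the imaginary unit is written 1j), so these rules can be tried out directly; a minimal sketch of my own:

    w = 1 + 2j
    z = 3 - 1j

    print(w + z)      # (4+1j)
    print(w * z)      # (1+2j)(3-1j) = 3 - 1j + 6j - 2j**2 = (5+5j)
    print(1j * 1j)    # (-1+0j), i.e. i^2 = -1
    print(abs(z))     # modulus |z| = sqrt(9 + 1)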
2.7 What are complex numbers? 45
Definition 2.28.
We can define the following derived quantities for z = x + iy:
Re z = x real part
Im z = y imaginary part
z = x − iy complex conjugate
r = |z| = √(x² + y²)   absolute value, modulus
ϕ = arg z = angle of z with the positive real line   argument
Usually, one either takes −π < arg z ≤ π or 0 ≤ arg z < 2π. In cases of ambiguity, one
has to carefully explain, what is meant.
We have:

  z = |z|(cos(arg z) + i sin(arg z)) = |z| e^{i arg z}.

It holds

  |wz| = |w||z|   and   arg(zw) = arg z + arg w.

So we can write shortly:

  zw = |w||z| e^{i(arg w + arg z)}

However, as usual with an angle, we would like to have arg w + arg z in [0, 2π[. Thus,

  zw = |w||z| e^{iϕ}

where ϕ is chosen by using a k ∈ Z in a way that 0 ≤ ϕ = arg w + arg z − 2kπ < 2π.
Now consider the equation zⁿ = a for a given a ∈ C.
Thus |z| = |a|^{1/n} and 0 ≤ arg a = n arg z − 2kπ < 2π, where k ∈ N can again be chosen.
Thus, we get the following solutions:
  arg z₀ = (1/n) arg a
  arg z₁ = (1/n)(arg a + 2π)
  arg z₂ = (1/n)(arg a + 2 · 2π)
  ...

So in general:

  zⱼ = |a|^{1/n} e^{i(arg a + j · 2π)/n}   for j = 0, . . . , n − 1.

Thus, we have n complex nth roots, which are evenly distributed on the circle around 0
with radius |a|^{1/n}.
It does not matter here, which arg a one takes, if all the results are written again in the
form zk = xk + iyk .
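A small cmath sketch (my own illustration; the sample value a = 8i is arbitrary) that computes all n-th roots with exactly this formula:

    import cmath

    a = 8j                     # an arbitrary sample value
    n = 3
    r = abs(a) ** (1 / n)      # |a|^(1/n)
    phi = cmath.phase(a)       # arg a

    roots = [r * cmath.exp(1j * (phi + j * 2 * cmath.pi) / n) for j in range(n)]
    for z in roots:
        print(z, z**n)         # every z**n is (up to rounding) equal to a = 8j again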
Summary
• Rn denotes the set of all vectors with n real components.
• You can add and scale vectors. Both operations in Rn are realised by doing these
inside the components.
• Rn is an example of an abstract concept, called a vector space.
• Combinations like λv + µu are called linear combinations.
• There are linear subspaces and affine linear spaces. They are the generalisation of
lines and planes one can illustrate in R3 .
• The inner product shows you orthogonality and the norm measures the length of a
vector.
• In R3 we have the cross product to calculate orthogonal vectors.
• Complex numbers are given by a multiplication rule on R2 .
3 Matrices and linear systems
Arthur looked up. “Ford!” he said, “there’s an infinite number of
monkeys outside who want to talk to us about this script for Hamlet
they’ve worked out.”
Douglas Adams
In this chapter, we will study matrices in more detail, and after that, describe systems of
linear equations. First of all, a matrix is just a table of numbers. One writes the numbers
in a rectangle with m rows and n columns, where m, n are natural numbers.
Definition 3.1.
The set of all matrices with m rows and n columns is notated as:
  R^{m×n} := { A = ( a11  a12  . . .  a1n
                     a21  a22  . . .  a2n
                      ⋮                ⋮
                     am1  am2  . . .  amn )  :  aij ∈ R , i = 1, . . . , m , j = 1, . . . , n }
Since we can add matrices of the same size A + B by adding the entries and scale a matrix
λ · A by scaling the entries, the space Rm×n is also a vector space.
Example 3.3.

  ( 1 2 )   ( 1  0 )   ( 1+1  2+0 )   ( 2 2 )
  ( 3 4 ) + ( 2 −1 ) = ( 3+2  4−1 ) = ( 5 3 ).
Attention!
The addition A + B is only defined for matrices with the same height and the same
width.
Example 3.5.

      ( 1 2 )   ( 2·1  2·2 )   ( 2 4 )   ( 1 2 )   ( 1 2 )
  2 · ( 3 4 ) = ( 2·3  2·4 ) = ( 6 8 ) = ( 3 4 ) + ( 3 4 ).
To see why it is interesting to study matrices, we first have to look at linear equations.
This was an example with two unknowns (x and y). Here we give an example with three
unknowns:
2x −3y +4z = −7
−3x +y −z = 0
20x +10y = 80
10y +25z = 90
You can imagine that we can have an arbitrary number of unknowns and also an arbitrary
number of equations. Often these unknowns are denoted by x1 , x2 , . . . , xn and we search
for suitable values such that all equations are satisfied.
Here, the most important part is that the equations are linear. The exact definition will
follow later. The sloppy way to say that an equation is linear is:
As you can see, there are a lot of constants that have to be numeric.
Here, aij and bi are given numbers, mostly just real numbers. A solution of the LES
is a choice of values for x1 , ..., xn such that all m equations are satisfied.
Example 3.7. Having three unknowns x1 , x2 , x3 , we could have different cases for the set
of solutions:
[Figures: the planes E₁, E₂, . . . in R³ corresponding to the equations and their intersection (e.g. a single point s) — sketched for a system with 2 equations and one with 3 equations]
While the system with two variables was very well-arranged, the general case seems more
complicated in spite of representing the same idea. At this point matrix and vector
notation comes in very handy:
Ax = b
This can be seen as a short notation for a system of linear equations. However, this also
defines a product of a matrix and a vector.
The product Ax = A · x (where we mostly do not use a dot) is given as the vector

  Ax := ( a11 x1 + a12 x2 + · · · + a1n xn , . . . , am1 x1 + am2 x2 + · · · + amn xn ) ∈ Rᵐ.
Attention!
The width of A has to be the same as the height of x! Otherwise Ax is not defined.
We can interpret this also as the following: The matrix is a machine where we can put in
vectors x ∈ Rn and we get a new vector Ax ∈ Rm as a linear combination of the columns.
This machine is given by the multiplication of A by x ∈ Rn :
[Diagram: a vector x ∈ Rⁿ is put into the machine “A ·” and the vector Ax ∈ Rᵐ comes out — this machine multiplies each vector from Rⁿ with the matrix A.]
Such a machine is, of course, nothing else than “function” or “map” defined in Section 1.3.
We call this function fA. It maps x ∈ Rⁿ to Ax ∈ Rᵐ:

  fA : Rⁿ → Rᵐ   with   fA : x ↦ Ax   (3.3)

Here, the function fA and the matrix A contain exactly the same information.
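In numpy, the “machine” fA is simply the @ operator; a minimal sketch with example values of my own choosing:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])    # A in R^{2x3}
    x = np.array([1.0, 0.0, -1.0])     # x in R^3

    print(A @ x)   # [-2. -2.] = 1*a1 + 0*a2 + (-1)*a3, a linear combination of the columns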
Here, we use the notation T for the transpose of a column vector. The result is a row
vector with the same entries. We fix this as a space:
  R^{1×n} = { xᵀ = (x₁ . . . xₙ) : x₁, . . . , xₙ ∈ R }
Since a row vector uT ∈ R1×n is just a very flat matrix, the product with a column vector
v ∈ Rn is well-defined:
uT v = (u1 v1 + · · · + un vn ) ∈ R1×1 .
Indeed, this is just a scalar and it coincides with the standard inner product we already
know: ⟨u, v⟩ = uᵀv.
In this view, the product of the matrix with a vector can just be seen as the scalar product
with each row:
3.5 Matrix multiplication
The result is a matrix with m rows and k columns and denoted by AB. It is called the
matrix product of A and B.
for i = 1, . . . , m and j = 1, . . . , k.
Attention!
The product AB is only defined if the width of A coincides with the height of B.
The “inner dimensions” have to match.
Special cases:
• A = aT ∈ R1×n , B = b ∈ Rn×1 : AB = aT b ∈ R
(a) The matrix dimensions are (2 × 2) · (2 × 3) ⇒ 2 × 3:

  ( 1 2 ) ( 1 2 3 )   ( 1·1+2·4  1·2+2·5  1·3+2·6 )   (  9 12 15 )
  ( 3 4 ) ( 4 5 6 ) = ( 3·1+4·4  3·2+4·5  3·3+4·6 ) = ( 19 26 33 )

Here A = (1 2; 3 4) and B = (1 2 3; 4 5 6) with columns b₁, b₂, b₃, and the columns of AB are Ab₁, Ab₂, Ab₃. (A short numpy sketch after these examples reproduces this computation.)
(b) Let A = (1 2 3; 4 5 6) and B = (1 2; 3 4). Then the product AB is not defined
since width(A) = 3 ≠ 2 = height(B). The product in the other order, BA, is defined.
(c) Now the matrix dimensions (3 × 1) · (1 × 3) ⇒ 3 × 3:

  ( 1 )             ( 1 2 3 )
  ( 2 ) (1 2 3)  =  ( 2 4 6 )
  ( 3 )             ( 3 6 9 )
(d) Now the matrix dimensions (1 × 3) · (3 × 1) ⇒ 1 × 1:

          ( 1 )
  (1 2 3) ( 2 )  =  (1 · 1 + 2 · 2 + 3 · 3) = (14) = 14
          ( 3 )
(e) A 2 × 2-example (which you shouldn’t take seriously):
(Example by Florian Dalwigk; images from Super Mario World and Super Mario World 2)
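The computation in (a) can be reproduced with numpy's matrix product; a small sketch (numpy and the @ operator are my choice of tool here, not prescribed by the lecture):

    import numpy as np

    A = np.array([[1, 2],
                  [3, 4]])
    B = np.array([[1, 2, 3],
                  [4, 5, 6]])

    print(A @ B)          # [[ 9 12 15]
                          #  [19 26 33]]
    print(A @ B[:, 0])    # [ 9 19] = A b1, the first column of AB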
We can also ask what happens if we multiply a row vector xT from the left to a matrix
B ∈ Rn×k . By definition, we get:
  xᵀB = (x₁ · · · xₙ) ( β₁ᵀ )
                      (  ⋮  )  =  x₁β₁ᵀ + · · · + xₙβₙᵀ.
                      ( βₙᵀ )
This means the product xᵀB is a linear combination of the rows of B. This is analogous
to the fact that Ax is a linear combination of the columns of A, cf. equation (3.2).
Remark:
Now we can see the matrix product as introduced above column by column:
AB = A(b₁ · · · bₖ) = (Ab₁ · · · Abₖ).
This means that each column of AB consists of a linear combination of the columns
of A. Seeing the product with the other eye,

  AB = ( α₁ᵀ )       ( α₁ᵀB )
       (  ⋮  ) B  =  (  ⋮   ) ,
       ( αₘᵀ )       ( αₘᵀB )

we see that each row of AB consists of a linear combination of the rows of B.
(c) Associative rule: For all A ∈ Rm×n and B ∈ Rn×k and C ∈ Rk×` we have:
A · (B · C) = (A · B) · C.
Proof. All these rules follow from the definition of the matrix product of A and B,

  (AB)_{ij} = ∑_{r=1}^{n} a_{ir} b_{rj},
and the fact that these rules hold for the real numbers air , brj ∈ R. For example, for
showing (c):
  (A(BC))_{ij} = ∑_{r=1}^{n} a_{ir}(BC)_{rj} = ∑_{r=1}^{n} a_{ir} ( ∑_{z=1}^{k} b_{rz}c_{zj} ) = ∑_{z=1}^{k} ( ∑_{r=1}^{n} a_{ir}b_{rz} ) c_{zj} = ((AB)C)_{ij}.
AB ≠ BA (in general).
Remark:
We thus have the following interpretations of the matrix product AB:
• each row αTi of A and each column bj of B are multiplied to form an entry
of the product: (AB)ij = αTi bj ,
All these interpretations are equally valid, and from situation to situation, we can
change our point of view to gain additional insights.
Rule of thumb:
Equation (+) means: First adding, then mapping = First mapping, then adding
Equation ( · ) means: First scaling, then mapping = First mapping, then scaling
We already know that for each matrix A ∈ Rm×n there is an associated map fA . This
map is indeed a linear map.
Proof. This follows immediately from the properties of the matrix product in Proposi-
tion 3.12. However, it may be helpful to write down a direct proof for the case n = 2.
(a) Let x = (x₁, x₂) and y = (y₁, y₂) be vectors in R². Writing A = (a₁ a₂) with columns a₁, a₂ and using (3.2), we have:

  fA(x + y) = A(x + y) = (a₁ a₂)(x₁ + y₁, x₂ + y₂) = (x₁ + y₁)a₁ + (x₂ + y₂)a₂
            = x₁a₁ + x₂a₂ + y₁a₁ + y₂a₂ = (a₁ a₂)(x₁, x₂) + (a₁ a₂)(y₁, y₂)
            = Ax + Ay = fA(x) + fA(y).

  fA(λx) = A(λx) = (a₁ a₂)(λx₁, λx₂) = (λx₁)a₁ + (λx₂)a₂
         = λ(x₁a₁ + x₂a₂) = λ(a₁ a₂)(x₁, x₂) = λAx = λfA(x).
  fA ∘ fB : Rⁿ → Rᵐ,  x ↦ A(Bx) = ABx.
A linear map is completely determined when one knows how it acts on the canonical unit
vectors e₁, . . . , eₙ. Therefore, in R², a good visualisation is to look at “houses”: a house H
is given by the two points p = e₁ and q = e₂ (together with the origin o). Now what happens
under a linear map fA associated to a matrix A? One just has to look at the corners:
  o′ := fA(o) = A(0, 0) = 0a₁ + 0a₂ = o
  p′ := fA(p) = A(1, 0) = 1a₁ + 0a₂ = a₁
  q′ := fA(q) = A(0, 1) = 0a₁ + 1a₂ = a₂

[Figure: the house with corners o, p, q and its image under fA with corners o′, p′ = a₁, q′ = a₂]
With the help of the linearity, we also know what happens with the other parts of the
house, for example the corners of the door:
Since t = (1/2)p and u = (1/4)p, we have:

  fA(t) = fA((1/2)p) = (1/2)fA(p) = (1/2)p′
  fA(u) = fA((1/4)p) = (1/4)fA(p) = (1/4)p′

[Figure: the house with the door corners u, t (and roof corners r, s) and their images under fA]
A map f : R² → R² given by

  f : (x, y) ↦ ( x − (1/5)(cos(πy) − 1) , y + (1/8) sin(2πx) )

is not linear!
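Linearity can be refuted by a single counterexample. The following sketch is my own and uses the map as reconstructed above, so treat the concrete coefficients 1/5 and 1/8 as an assumption; it shows that f(2x) ≠ 2f(x):

    import math

    def f(x, y):
        # coefficients 1/5 and 1/8 as reconstructed above (treat them as an assumption)
        return (x - (math.cos(math.pi * y) - 1) / 5,
                y + math.sin(2 * math.pi * x) / 8)

    x, y = 0.25, 0.5
    fx = f(x, y)
    print(f(2 * x, 2 * y))           # f(2x)  = (0.9, ~1.0)
    print((2 * fx[0], 2 * fx[1]))    # 2 f(x) = (0.9, 1.25)  -> different, so f is not linear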
[Figure gallery: the image of the house under the linear maps fA for various 2×2 matrices, among them the scalings A = (3 0; 0 1), B = (1 0; 0 2), C = (3 0; 0 2) and G = (5 0; 0 5), reflections such as E and F, the identity I = (1 0; 0 1) with f_I = id, the exchange matrix J = (0 1; 1 0), shears, projections, and the rotation R = (cos(π/6) −sin(π/6); sin(π/6) cos(π/6)).]
There is a λ ∈ R with a = λb .
If this is the case, we can build a loop of vectors, starting at o and ending at o again:
o = (−1)a + λb + µc .
If this is not possible, we call the family (v1 , . . . , vk ) linearly independent. This
means that
  ∑_{j=1}^{k} λⱼvⱼ = o  ⇒  λ₁ = . . . = λₖ = 0

holds.
(a) The family ((1, 0), (0, 1), (1, 1)) is linearly dependent since

  (1, 1) = (1, 0) + (0, 1).

(b) The family ((1, 2), (1, 0), (2, 1)) is linearly dependent.
(c) Each family which includes o is linearly dependent. Also each family that has the
same vector twice or more is linearly dependent.
(d) The unit vectors

  e₁ = (1, 0, 0),  e₂ = (0, 1, 0),  e₃ = (0, 0, 1)

form a linearly independent family in R³, since λ₁e₁ + λ₂e₂ + λ₃e₃ = (λ₁, λ₂, λ₃) = o
yields λ₁ = λ₂ = λ₃ = 0.
If we add an arbitrary additional vector

  a = (a₁, a₂, a₃),

the resulting family is linearly dependent because

  a = a₁e₁ + a₂e₂ + a₃e₃.
(ii) There is a vector in Span (v1 , . . . , vk ) that has two or more representations as
a linear combination.
(iii) At least one of the vectors in (v1 , . . . , vk ) is a linear combination of the others.
Proof. Exercise!
Since the opposite of linear dependence is linear independence, we can simply negate
Proposition 3.19 and get the following:
Corollary 3.21.
If the family (v1 , . . . , vk ) is linearly dependent, we can subjoin vectors and the res-
ulting family is still linearly dependent. On the other hand, if the family (v1 , . . . , vk )
is linearly independent, we can omit vectors and the resulting family is still linearly
independent.
The quick answer “k” is in general false since the family (v1 , . . . , vk ) could be linearly
dependent.
Our geometric intuition says that on a plane we cannot have more than two linearly
independent vectors, and in three-dimensional space not more than three. We express
this, by saying, a plane is two-dimensional, and space is three-dimensional. We will again
formalise this:
Sketch of the proof. Pack B and A together to a linearly dependent set, and remove vec-
tors (starting with elements of B) until it is linearly independent. One has to show now,
that the resulting set has again k elements, and that A remains untouched.
Now, we can record that all bases of V have the same number of elements.
Corollary 3.25.
Let V be a subspace of Rn and let B = (v1 , . . . , vk ) be a basis of V . Then:
(a) Each family (w1 , . . . , wm ) consisting of vectors from V where m > k is linearly
dependent.
So we can define:
The unit vectors e₁, . . . , eₙ in Rⁿ form a basis. The linear independence can be seen by:

  x₁e₁ + . . . + xₙeₙ = (x₁, . . . , xₙ) = o  ⇒  x₁ = . . . = xₙ = 0.
We obtain, as expected:
dim(Rn ) = n.
Rule of thumb:
The dimension of a vector space V says how many independent degrees of freedom
are needed to build linear combinations of all vectors in V .
Theorem 3.27.
Let U and V be two linear subspaces of Rn .
(i) One has dim(U ) = dim(V ) if and only if there exists a linear bijective map
between U and V .
Proof. (i) (⇒): If dim(U ) = dim(V ), a bijection can be defined by mapping the basis
vectors of the subspace U to the basis vectors of V and extending it linearly.
(⇐): If there is a bijection between U and V , the image of the basis of U is a basis of
V (linearly independent by injectivity and spanning by surjectivity). Hence, dim(U ) =
dim(V ).
(ii) A basis of U is a linearly independent family in V with dim(U ) = dim(V ) vectors.
Thus it is also a basis for V , and thus U = V .
Example 3.28.
The following subspaces of Rn are very important:
Corollary 3.29.
A family consisting of more than n vectors in Rn is always linearly dependent.
The identity matrix 1ₙ ∈ Rⁿˣⁿ has entries 1 on the diagonal and 0 everywhere else.
The associated linear map Rn → Rn is, of course, the identity id. Other notations for the
identity matrix are 1n , In or En . If the context is clear, one usually omits the index n.
In the space of square matrices A ∈ Rn×n , the identity matrix fulfils:
1n x = x .
Also we can define inverses A−1 , which may or may not exist:
AA−1 = A−1 A = 1n .
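Numerically, an inverse (if it exists) can be computed, for example, with numpy; a minimal sketch of my own verifying AA⁻¹ = A⁻¹A = 1ₙ for an example matrix:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    A_inv = np.linalg.inv(A)                    # [[-2. ,  1. ], [ 1.5, -0.5]]
    print(np.allclose(A @ A_inv, np.eye(2)))    # True: A A^{-1} = 1_2
    print(np.allclose(A_inv @ A, np.eye(2)))    # True: A^{-1} A = 1_2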
Theorem 3.31.
Let A ∈ Rn×n be a square matrix. Then
fA injective ⇔ fA surjective
Hence, if one of these cases holds, then fA is already bijective, i.e., invertible.
Remark:
If f : Rn → Rn is a linear map that is bijective, then f −1 : Rn → Rn is also a linear
map.
3.9 Transposition
Mathematicians found it convenient to write down most things in terms of column vectors,
and mainly think about column matrices. If one wants to talk about the rows of a matrix,
most of the time, one defines the transposed matrix and talks about their columns.
We already know transposition of column vectors:
  ( a₁ )ᵀ
  (  ⋮ )   =  (a₁ . . . aₙ)
  ( aₙ )
(b) A = ( 1 2 ; 3 4 ) ∈ R²ˣ²  ⇒  Aᵀ = ( 1 3 ; 2 4 ) ∈ R²ˣ².
(c) A = ( 1 −2 ; −2 1 ) ∈ R²ˣ²  ⇒  Aᵀ = ( 1 −2 ; −2 1 ) ∈ R²ˣ².
(d) A = ( 1 ; 2 ; 3 ) ∈ R³ˣ¹  ⇒  Aᵀ = (1 2 3) ∈ R¹ˣ³.
(e) A = (4 5 6 7) ∈ R¹ˣ⁴  ⇒  Aᵀ = ( 4 ; 5 ; 6 ; 7 ) ∈ R⁴ˣ¹.
(Rows are separated here by “;”.)
Since we have exchanged the roles of rows and columns, the order of multiplication
changes, too:

  (Ax)ᵀ = xᵀAᵀ   and   xᵀA = (Aᵀx)ᵀ.
and

  (  0  4 1 )   ( 1 4 )   ( 11  26 )
  ( −2 −1 0 ) · ( 2 5 ) = ( −4 −13 ) .
  (  2  2 1 )   ( 3 6 )   (  9  24 )
(d) For all A ∈ Rm×n and for all B ∈ Rn×r we have: (A · B)T = B T · AT .
(e) If A ∈ Rn×n is invertible, then AT is also invertible and we get (AT )−1 =
(A−1 )T .
and from this one can prove all properties. For example, for showing (d), we see

  (Bᵀ · Aᵀ)_{ij} = ∑_k (Bᵀ)_{ik}(Aᵀ)_{kj} = ∑_k A_{jk}B_{ki} = (A · B)_{ji} = ((A · B)ᵀ)_{ij}
for all i, j.
Proof. We already know that for all u, v ∈ Rⁿ, we have ⟨u, v⟩ = uᵀv. Hence, we conclude
that for x ∈ Rⁿ, y ∈ Rᵐ and A ∈ Rᵐˣⁿ, the following holds: ⟨Ax, y⟩ = (Ax)ᵀy = xᵀAᵀy = ⟨x, Aᵀy⟩.
Moreover, Aᵀ is the only matrix B ∈ Rⁿˣᵐ that satisfies the equation ⟨Ax, y⟩ = ⟨x, By⟩
for all x ∈ Rⁿ and y ∈ Rᵐ. Therefore, some people use this as the definition of Aᵀ.
(b) A = ( 0 3 4 ; −3 0 −5 ; −4 5 0 ) is skew-symmetric since Aᵀ = ( 0 −3 −4 ; 3 0 5 ; 4 −5 0 ) = −A.
Ran(A) := {Ax : x ∈ Rn } ⊂ Rm
Note that the range of A coincides with the range of the corresponding map fA and
that the kernel of A corresponds to the fiber of fA for the origin o. In other words,
Ran(A) = Ran(fA ) and Ker(A) = fA−1 ({o}).
In our previous study of matrices, we have already found out quite a lot of things about
matrices:
• Ran(A) = Span (a1 , . . . , an ) where the vectors ai are the columns of A.
• Ker(A) is a subspace of Rn
• For a matrix A the linear mapping fA is injective if and only if Ker(A) = {o} and
surjective if and only if Ran(A) = Rm .
Since our ultimate goal is to understand linear systems given the form:
Ax = b,
we would like to know more about Ran(A) (because it tell us, for which b our system has
a solution) and Ker(A) (because it tells us about the uniqueness of solutions).
We obviously have:
rank(A) ≤ min{m, n}
A is said to have full rank, if rank(A) = min{m, n}.
Let A be some matrix with, say, n columns. Let us assume that somebody gives us
r = rank(A). This means that A has r linearly independent columns, and these columns
are a basis for Ran(A). Let us again assume that we know these columns, and we have
already reordered them, so that they are the first r columns of A:

  A = ( B | F ),   where B has r columns and F has n − r columns.
Later, when we discuss the Gauß algorithm, we will find a way to identify these columns.
With this information, we would like to compute Ker(A), i.e. all x, such that Ax = o. It
is also of interest to obtain its dimension dim(Ker(A)).
dim(Ker(A)) + dim(Ran(A)) = n.
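This dimension formula can be observed numerically; in the following sketch (my own example matrix) the rank is computed with numpy and the kernel dimension then follows as n − rank(A):

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0, 1.0],
                  [2.0, 4.0, 1.0, 0.0],
                  [3.0, 6.0, 1.0, 1.0]])    # third row = first row + second row

    n = A.shape[1]                          # number of columns
    r = np.linalg.matrix_rank(A)            # dim(Ran(A))
    print(r, n - r)                         # 2 2: rank and dim(Ker(A)); their sum is n = 4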
Seeing this as the matrix vector form, we can write Ax = b. Later, we will also put in
the right-hand side and get the augmented matrix (A|b), which here simply is

  ( 2 3 4 | 1 )
  ( 4 6 9 | 1 )
  ( 2 4 6 | 1 ).
There are a lot of viewpoints for the method of solving this system. We start with the
most important one.
Example 3.42.
A typical 2 × 2 LES might be look like this
E1 : x1 + 3x2 = 7
E2 : 2x1 − x2 = 0.
From equation E2 we can conclude x2 = 2x1 . Putting this equation for x2 into the
first equation E1 , we immediately get:
  7 = x₁ + 3x₂ = x₁ + 3 · 2x₁ = 7x₁   (using E₁)

and therefore x₁ = 1. Hence, the system has exactly one solution given by the vector
x = (x₁, x₂) = (1, 2).
My advice here: Please, do not do that. It works for 2×2 LES but you will be in big trouble
while solving larger systems, most of the time. Therefore, we will now learn a systematic
solving recipe, called the Gauß algorithm or often just called Gaussian elimination.
One idea of Gaussian elimination can be shown with the example above. One does row
operations to get:
The next idea is, as always, to rewrite the system in matrix form Ax = b. Operations on
the system, can now be realised by using an invertible matrix M . This reformulates the
LES:
Ax = b ⇔ M Ax = M b
If the product M A has a simpler form than A, this helps to solve our system.
Linear systems can be solved by building linear combinations of rows. This means
multiplication of invertible matrices on A and b from the left.
Recall that for each vector c ∈ Rm the product cT A builds linear combinations of rows of
A which are always denoted by αT1 , . . . , αTm . As a reminder:
  A = ( a11 . . . a1n )   ( α₁ᵀ )
      (  ⋮         ⋮  ) = (  ⋮  ) ,   where αᵢᵀ = (ai1 · · · ain).
      ( am1 . . . amn )   ( αₘᵀ )
cT = (0 . . . ci . . . 0) gives: cT A = ci αTi
cT = (0 . . . ci 0 . . . 0 cj . . . 0) gives: cT A = ci αTi + cj αTj
If we put such a vector cT as the ith row into a matrix M , then the ith row of M A will
contain this result. This gives us the row operations for the example above.
Similar things can be done, of course, with the columns of A by right multiplication of
columns. However, row operations are more important right now.
The next example shows how the first row is added λ-times to the last:
  Z_{3+λ·1} A = ( 1     ) ( α₁ᵀ )   ( α₁ᵀ        )
                (   1   ) ( α₂ᵀ ) = ( α₂ᵀ        ) .
                ( λ   1 ) ( α₃ᵀ )   ( α₃ᵀ + λα₁ᵀ )
Undoing this last operation means using subtraction of such a row again. Thus, for
the last example, we have
  Z_{3−λ·1} Z_{3+λ·1} A = (  1     ) ( α₁ᵀ        )   ( α₁ᵀ               )
                          (    1   ) ( α₂ᵀ        ) = ( α₂ᵀ               ) = A.
                          ( −λ   1 ) ( α₃ᵀ + λα₁ᵀ )   ( α₃ᵀ + λα₁ᵀ − λα₁ᵀ )
In general, we can define Zj+λi that adds the ith row of A to the j th row, where i 6= j. By
the 3 × 3 examples, it is easy to see how one has to define this matrix in the m × m-case.
This is always an invertible matrix since the inverse of Zj+λi is always the matrix Zj−λi
since this undoes the row addition. One gets:
Instead of adding rows, we could also exchange rows: We replace the ith row of A by its
j th row, and vice versa.
The next example shows how to exchange the first and the last row:
  P_{1↔3} A = (     1 ) ( α₁ᵀ )   ( α₃ᵀ )
              (   1   ) ( α₂ᵀ ) = ( α₂ᵀ ) .
              ( 1     ) ( α₃ᵀ )   ( α₁ᵀ )
  P_{i↔j}⁻¹ = P_{i↔j}   (3.8)
M Ax = o ⇔ Ax = M −1 o ⇔ Ax = o.
Note that the range may change a lot by row operations. What happens with the kernel
and the range if you just do column operations?
Corollary 3.49.
Row operations do not change the set of solutions.
Goal
For solving Ax = b, use row operations M to bring A into upper triangular form, which
is a matrix that has zeros below the diagonal, or into row echelon form, which
we will define later. Then, one can construct the solution set for M Ax = M b.
Since we use the same row operations on A and b, it is useful to use the augmented matrix
(A|b). In the end, we obtain (M A|M b).
Attention!
Do not use equation E₁′ anymore at this point!
Otherwise, you would bring the variable x₁ back into the game.
We start with a square matrix A ∈ Rⁿˣⁿ. Let us write Ã := (A|b) as a row matrix:

  Ã = (A|b) = ( a11 a12 . . . a1n | b1 )   ( α̃₁ᵀ )
              ( a21 a22 . . . a2n | b2 )   ( α̃₂ᵀ )
              (  ⋮               |  ⋮ ) = (  ⋮  )
              ( an1 an2 . . . ann | bn )   ( α̃ₙᵀ )
  Z_{n−λₙ·1} · · · Z_{2−λ₂·1} (A|b) = ( a11 a12 . . . a1n | b1 )   ( α̃₁ᵀ          )
                                      (  0  ã22 . . . ã2n | b̃2 )   ( α̃₂ᵀ − λ₂α̃₁ᵀ  )
                                      (  ⋮               |  ⋮ ) = (  ⋮            ) = L₁⁻¹(A|b)
                                      (  0  ãn2 . . . ãnn | b̃n )   ( α̃ₙᵀ − λₙα̃₁ᵀ  )
Since L₁⁻¹ subtracts multiples of the first row from all other rows, its inverse is easily seen to be the matrix
that adds these multiples of the first row to all others:

  L₁⁻¹ = (  1        )          (  1       )
         ( −λ₂  1    ) ,  L₁ =  ( λ₂  1    ) .
         (  ⋮     ⋱  )          (  ⋮    ⋱  )
         ( −λₙ     1 )          ( λₙ     1 )
Once we have eliminated the entries a₂₁, . . . , aₙ₁, we can do the same with ã₃₂, . . . , ãₙ₂.
Here we use the factors λ̃ᵢ = ãᵢ₂/ã₂₂. We then obtain:
  L₂⁻¹ L₁⁻¹ (A|b) = ( a11 a12 a13 . . . a1n | b1 )
                    (  0  ã22 ã23 . . . ã2n | b̃2 )
                    (  0   0  â33 . . . â3n | b̂3 )
                    (  ⋮    ⋮   ⋮          |  ⋮ )
                    (  0   0  ân3 . . . ânn | b̂n )

It (luckily) turns out that

  L₁L₂ = ( 1            )
         ( λ₂  1        )
         ( λ₃  λ̃₃  1    )
         (  ⋮   ⋮    ⋱  )
         ( λₙ  λ̃ₙ     1 )
LU-decomposition A = LU
To solve Ax = b, first solve Lc = b for c (forward substitution) and then U x = c (backward substitution). Then Ax = LU x = Lc = b.
• Since L and U are both triangular, the above solves can be performed easily.
• Since our factorisation is done once and for all, further problems with the same
matrix but different b can be solved later.
• Another point of view on L is the following: we keep track of what is done during
the transformation of the right hand side b → c.
  cᵢ ⇝ cᵢ − λcⱼ  ⇔  lᵢⱼ = λ
If later our system has to be solved for another right hand side b, then we can use
the subdiagonal entries of L to do the same computations to this new b, as was
done to the old one. This can be nicely written as c = L−1 b.
The Gauß algorithm can be performed by hand, or implemented on a computer. The
following pseudo-code describes it in detail:
Here U and c are overwritten, so we do not distinguish uij and ũij , and so on.
(U |c) = (A|b), L = 1n
for j = 1 . . . n (loop over columns)
    for i = j + 1 . . . n (loop over rows)
        lij = uij / ujj
        uij = 0 (eliminate entry)
        for s = j + 1 . . . n
            uis = uis − lij ujs (subtract remaining entries)
        ci = ci − lij cj (subtract rhs)
• We recognise three nested loops, and thus, the cost of this algorithm is proportional
to n³.
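As a concrete counterpart to this pseudo-code, here is a minimal Python/numpy sketch of my own. It performs the same elimination without pivoting, so it assumes that no ujj becomes zero:

    import numpy as np

    def lu_no_pivoting(A, b):
        # Gaussian elimination without pivoting: returns L, U, c with A = LU and c = L^{-1} b.
        n = len(b)
        U = A.astype(float).copy()
        c = b.astype(float).copy()
        L = np.eye(n)
        for j in range(n):                            # loop over columns
            for i in range(j + 1, n):                 # loop over rows below the diagonal
                L[i, j] = U[i, j] / U[j, j]           # assumes U[j, j] != 0
                U[i, j:] -= L[i, j] * U[j, j:]        # eliminate the entry and update the row
                c[i] -= L[i, j] * c[j]                # same operation on the right-hand side
        return L, U, c

    A = np.array([[2.0, 1.0], [4.0, 5.0]])
    b = np.array([3.0, 6.0])
    L, U, c = lu_no_pivoting(A, b)
    print(np.allclose(L @ U, A))      # True: we really factorised A
    x = np.linalg.solve(U, c)         # backward substitution (done here by numpy)
    print(np.allclose(A @ x, b))      # True: x solves the original system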
For a right-hand side b, one simply does the same row calculation steps:

  b = (1, 1, 1)  ⇝  (1, −1, 2)  ⇝  (1, −1, 4) =: c
The bug:
• What, if ujj is zero at some stage of the computation? Then we have division by 0.
• Also, on the computer, if ujj is very small, say 10−14 , then due to round-off error,
problems may also occur.
If you are just interested in the solution(s) of a given LES, then you will just do the
Gaussian elimination step by step until you reach the upper triangle form (or the row
echelon form, see next section).
E1 : x1 + 2x2 + x4 = 3
E2 : 4x1 + 8x2 + 2x3 + 3x4 + 4x5 = 14
(3.9)
E3 : 2x3 + 3x4 + 12x5 = 10
E4 : −3x1 − 6x2 − 6x3 + 8x4 + 4x5 = 4
The entry 1 marked in grey is the first one we have to consider. All entries below it should become zero
after the first elimination:
• multiply E1 by 4/1 = 4 and subtract the result from E2,
• multiply E1 by −3/1 = −3 and subtract the result from E4.
        x1  x2  x3  x4  x5 |                       x1  x2  x3  x4  x5 |
  E1 :   1   2   0   1   0 |  3            E1′ :    1   2   0   1   0 |  3
  E2 :   4   8   2   3   4 | 14  −4·E1     E2′ :    0   0   2  −1   4 |  2
  E3 :   0   0   2   3  12 | 10        ⇝   E3′ :    0   0   2   3  12 | 10
  E4 :  −3  −6  −6   8   4 |  4  +3·E1     E4′ :    0   0  −6  11   4 | 13
The next number on the diagonal is a zero and it seems like that our algorithm has to stop
here. However, since below there are also zeros, the column is already eliminated. We
can just ignore the variable x2 at this point and just restart the algorithm with starting
point 2 .
Just subtract the equation E₂′ with the right factor from the other rows: 2/2 = 1 times
E₂′ from E₃′, and −6/2 = −3 times E₂′ from E₄′.
Now, we cannot use any rows for elimination and we are finished. We get the following
result:
            x1  x2  x3  x4  x5 |
  E1′′′′ :   1   2   0   1   0 |  3
  E2′′′′ :   0   0   2  −1   4 |  2
  E3′′′′ :   0   0   0   4   8 |  8      (3.10)
  E4′′′′ :   0   0   0   0   0 |  3
This is not a triangular matrix like in Example 3.42, but it is an upper triangular matrix by
definition, since below the diagonal there are just zeros. This form is called the row echelon
form and is defined below.
• for each row: the first nonzero number from the left is always strictly to the
right of the first nonzero coefficient from the row above it.
In the row echelon form we can put the variable into two groups:
Example 3.55. Looking at equation (3.10) again, we can distinguish the variables
            x1  x2  x3  x4  x5 |
  G1′′′′ :   1   2   0   1   0 |  3
  G2′′′′ :   0   0   2  −1   4 |  2
  G3′′′′ :   0   0   0   4   8 |  8      (3.11)
  G4′′′′ :   0   0   0   0   0 |  3
In this example x1 , x3 and x4 are the leading variables and x2 and x5 are free.
If you have a LES where the matrix is given in row echelon form, then the solution set
is immediately given. You just have to push the free variables to the right and solve the
leading variables by backward substitution.
For example, the solution of (3.11) is empty since the last row is not satisfiable. However,
we can give another example:
Example 3.56.
The LES
x1 x2 x3 x4 x5
E1 : 1 2 0 1 0 3
E2 : 0 0 2 −1 4 2 (3.12)
E3 : 0 0 0 4 8 8
is already in row echelon form and can be equivalently written as
      x1 x3 x4 |
E1 :   1  0  1 | 3 − 2x2
E2 :   0  2 −1 | 2 − 4x5        (3.13)
E3 :   0  0  4 | 8 − 8x5

S = { (x1, x2, x3, x4, x5)^T = (1 − 2x2 + 2x5, x2, 2 − 3x5, 2 − 2x5, x5)^T
    = (1, 0, 2, 2, 0)^T + x2 (−2, 1, 0, 0, 0)^T + x5 (2, 0, −3, −2, 1)^T : x2, x5 ∈ R }
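For checking such a parametric solution set on the computer, one can use a computer algebra system; the following Python/SymPy sketch (not part of the original text) reproduces the solution of (3.12):

    from sympy import Matrix, linsolve, symbols

    x1, x2, x3, x4, x5 = symbols('x1 x2 x3 x4 x5')
    A = Matrix([[1, 2, 0, 1, 0],
                [0, 0, 2, -1, 4],
                [0, 0, 0, 4, 8]])
    b = Matrix([3, 2, 8])
    # linsolve returns the solution set; the free variables x2 and x5 remain as parameters
    print(linsolve((A, b), x1, x2, x3, x4, x5))
    # should give something like {(-2*x2 + 2*x5 + 1, x2, 2 - 3*x5, 2 - 2*x5, x5)}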
Corollary 3.57.
If A ∈ Rm×n is a row echelon matrix, then rank(A) is the number of leading variables
and dim(Ker(A)) is the number of free variables.
Proof. Obviously, the columns with pivots are linearly independent vectors, whereas the columns belonging to free variables are linear combinations of the other ones.
In the next section, we will generalise what we did in the example before.
Let j be the current column and assume that we want to eliminate all entries below k_rj. (In the standard Gaussian elimination, we always wanted to eliminate all entries below a diagonal element k_jj, but in the general case we have r ≤ j, as already seen in Example 3.62.)
Initialise the permutation matrix Prow = 1 to store permutations and start with K := A and c := b.
• If k_rj = 0, test for i = r + 1, . . . , m whether k_ij ≠ 0.
• At the first occurrence i_pivot, exchange row r and row i_pivot of L\K, of c and of Prow. This means that only the subdiagonal entries of L are exchanged, not the diagonal entries (which are 1).
• If all tested k_ij are zero, continue with the next column.
Question:
One may wonder why only the entries of L below the diagonal are permuted, but not those on the diagonal. As you recall, L is constructed for the purpose of book-keeping: which row is subtracted from which row, and with which scaling factor. The unit diagonal of L has nothing to do with this book-keeping.
If we permute the subdiagonal entries of L, we update the book-keeping according to the permutations of the rows of A. It is as if the rows had been permuted at the beginning, before the start of the elimination.
Since we have exchanged the subdiagonal entries of L, all row exchanges, applied during
the transformation b ; c can be performed at the beginning. Thus, we do not have to
remember when a row exchange took place. We only need the result of all row exchanges,
and apply it at the beginning.
b = (1, 2, 3)^T → (permute first) ⇝ Prow b = (1, 3, 2)^T ⇝ L^{-1} Prow b = c = (1, −1, 0)^T.
This algorithm can also be applied to non-square matrices and leads to K in the so-called row echelon form. This is why there are now two variables r and j: j stands for the column and r for the row of its "head" (the pivot), with r ≤ j.
K = A, L = 1m, c = b, r = 1, Prow = 1m
for j = 1 . . . n (loop over columns)
    perform pivot search for the first non-zero element of K at or below k_rj
    if no pivot was found, continue with the next column
    if i_pivot was found, exchange row r and row i_pivot of L\K, c, and Prow
    for i = r + 1 . . . m (loop over rows)
        l_ir = k_ij / k_rj
        k_ij = 0 (eliminate entry)
        for s = j + 1 . . . n
            k_is = k_is − l_ir k_rs (subtract remaining entries)
        c_i = c_i − l_ir c_r (subtract rhs)
    r = r + 1 (consider the next row)
It does not, however, work properly on the computer, because the test u_jj = 0 is unreliable in the presence of round-off errors. For toy examples, however, it can be used to find all possible solutions of a non-square linear system.
= \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}
We observe that rank(A) = 2.
Although this exchange of rows happens during the course of elimination, the outcome of
the resulting algorithm for matrices A can be written as
Prow A = LK,
where Prow collects all the performed permutations and K is in row echelon form. Hence, for
a right hand side b we may solve Ax = b as follows:
w = Prow b (row permutations)
Lc = w (forward substitution)
Kx = c (backward substitution)
Then Prow Ax = LKx = Lc = w = Prow b. Usually, Prow is not stored as a matrix, but
rather as a vector p of indices: wi = bpi .
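A rough Python sketch of this three-step solve, using SciPy's LU routines (note that scipy.linalg.lu returns a factorisation A = P L U, so its P plays the role of Prow^T here):

    import numpy as np
    from scipy.linalg import lu, solve_triangular

    A = np.array([[2.0, 1.0, 1.0],
                  [4.0, 3.0, 3.0],
                  [8.0, 7.0, 9.0]])
    b = np.array([1.0, 2.0, 3.0])

    P, L, U = lu(A)          # A = P @ L @ U
    w = P.T @ b              # row permutations
    c = solve_triangular(L, w, lower=True)   # forward substitution
    x = solve_triangular(U, c)               # backward substitution
    print(np.allclose(A @ x, b))             # True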
• Subtract 4/1 = 4 times E1′ from E2′,
• subtract −3/1 = (−3) times E1′ from E4′.
Here the solution:
       x1 x2 x3 x4 x5 |
E1′ :   1  2  0  1  0 |  3
E2′ :   4  8  2  3  4 | 14   (−4·E1′)
E3′ :   0  0  2  3 12 | 10
E4′ :  −3 −6 −6  8  4 |  1   (+3·E1′)
⇝
E1′′ :  1  2  0  1  0 |  3
E2′′ :  0  0  2 −1  4 |  2
E3′′ :  0  0  2  3 12 | 10
E4′′ :  0  0 −6 11  4 | 10
Now there is no x1 in the rows 2, 3 and 4. We should not touch the first row any more since otherwise x1 comes back into the game.
Also, x2 remains only in row 1. Hence, we do not have to do anything with x2. We can go to x3.
There the grey 2 in E2′′ is the next pivot. Subtract 2/2 = 1 times E2′′ from E3′′. Also subtract −6/2 = (−3) times E2′′ from E4′′. We get:
        x1 x2 x3 x4 x5 |
E1′′ :   1  2  0  1  0 |  3
E2′′ :   0  0  2 −1  4 |  2
E3′′ :   0  0  2  3 12 | 10   (−1·E2′′)
E4′′ :   0  0 −6 11  4 | 10   (+3·E2′′)
⇝
E1′′′ :  1  2  0  1  0 |  3
E2′′′ :  0  0  2 −1  4 |  2
E3′′′ :  0  0  0  4  8 |  8
E4′′′ :  0  0  0  8 16 | 16
         x1 x2 x3 x4 x5 |
G1′′′ :   1  2  0  1  0 |  3
G2′′′ :   0  0  2 −1  4 |  2
G3′′′ :   0  0  0  4  8 |  8
G4′′′ :   0  0  0  8 16 | 16   (−2·G3′′′)
⇝
E1′′′′ :  1  2  0  1  0 |  3
E2′′′′ :  0  0  2 −1  4 |  2
E3′′′′ :  0  0  0  4  8 |  8
E4′′′′ :  0  0  0  0  0 |  0
The elimination algorithm ends. This is the desired row echelon form:
         x1 x2 x3 x4 x5 |
E1′′′′ :  1  2  0  1  0 |  3
E2′′′′ :  0  0  2 −1  4 |  2
E3′′′′ :  0  0  0  4  8 |  8        (3.15)
E4′′′′ :  0  0  0  0  0 |  0
In the Gaussian elimination everything works on the rows. Now we will look at what we can say about the columns. As a reminder: the LES Ax = b has at least one solution x if and only if b can be written as Ax for some x ∈ Rn, which means that b ∈ Ran(A). Looking at the columns of the matrix
the columns of the matrix
A = (a1 · · · an) ∈ Rm×n,
we can conclude
Ax = (a1 · · · an) (x1, . . . , xn)^T = x1 a1 + · · · + xn an.
In other words:
(ii) b ∈ Ran(A),
[Figure: the linear map fA : Rn → Rm with its range Ran(A) ⊂ Rm; x ↦ Ax, b lies in Ran(A), c does not.]
Example 3.64. Let A = \begin{pmatrix} 3 & 6 \\ 1 & 2 \end{pmatrix}. Then Ax = b has at least one solution if and only if
b ∈ Ran(A) = { x1 (3, 1)^T + x2 (6, 2)^T : x1, x2 ∈ R } = { x1 (3, 1)^T + 2x2 (3, 1)^T : x1, x2 ∈ R }
            = { (x1 + 2x2) (3, 1)^T : x1, x2 ∈ R } = { λ (3, 1)^T : λ ∈ R }.
Remember that for each matrix A there is a linear map fA : Rn → Rm, cf. Section 3.3, defined by
fA : x ∈ Rn ↦ Ax ∈ Rm.
Of course, solving Ax = b is the same as solving fA(x) = b. This means we want to find the preimage of the element b with respect to the map fA. Obviously the image of fA is exactly the range Ran(A) of A, so we get:
(iii) Ran(A) = Rm .
(iv) rank(A) = m ≤ n.
(v) The row echelon form of A, denoted by A0 , has a pivot in every row.
(vi) fA is surjective.
Proof. (i)⇔(ii)⇔(iii)⇔(vi) and (iv)⇔(v) is clear. Assume (iv), rank(A) = m. Then for
each b ∈ Rm we can use rank(A) ≤ rank(A|b) ≤ m and therefore rank(A|b) = m. We
can conclude rank(A) = rank(A|b). This means that b is a linear combination of the
columns of A and, hence, this gives us a solution x, which shows (i).
Show now (i)⇒(v) by contraposition. Assume ¬(v). By doing the elimination A ; A0
to get a row echelon form A0 , we also get at least one zero row. Then, it is possible to
choose b ∈ Rm in such a way that we get a row in (A0 |b0 ) that is given by (0 . . . 0 | c)
with c 6= 0. Such a row cannot be solved and, hence, Ax = b has no solution. Therefore,
we get ¬(i).
Example 3.66. Consider a 3 × 5 matrix A and calculate the row echelon form A0 :
A = \begin{pmatrix} 1 & 4 & 0 & 2 & −1 \\ −1 & 2 & −2 & −2 & 3 \\ −3 & 0 & −4 & −3 & 8 \end{pmatrix} ⇝ · · · ⇝ A′ = \begin{pmatrix} 1 & 4 & 0 & 2 & −1 \\ 0 & 6 & −2 & 0 & 2 \\ 0 & 0 & 0 & 3 & 1 \end{pmatrix}
Each row of A0 has a pivot and (v) from Proposition 3.65 holds. One immediately sees
rank(A) = 3 = m ≤ n = 5.
(i) says that the LES Ax = b has at least one solution for every right-hand side b ∈ R3.
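To verify such a rank statement numerically, one can use NumPy (a small check, not part of the original text):

    import numpy as np

    A = np.array([[ 1, 4,  0,  2, -1],
                  [-1, 2, -2, -2,  3],
                  [-3, 0, -4, -3,  8]])
    print(np.linalg.matrix_rank(A))   # 3, so rank(A) = m and every b in R^3 is reachable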
Proof. Equivalence (i)⇔(iii) follow from S = v0 + Ker(A), see Proposition 3.48. (i)⇔(vi)
holds by the definition of injectivity. The equivalence (ii)⇔(iii) follows from the definition
of Ker(A). (iii)⇔(iv) holds by Theorem 3.41, the Rank-nullity Theorem. (iv)⇔(v) holds
since row operations do not change the rank of a matrix.
Example 3.68. Consider a 4 × 3 matrix A and calculate the row echelon form A0 :
A = \begin{pmatrix} 2 & 3 & 0 \\ 2 & 2 & 5 \\ −4 & −5 & −3 \\ 4 & 7 & 1 \end{pmatrix} ⇝ · · · ⇝ A′ = \begin{pmatrix} 2 & 3 & 0 \\ 0 & −1 & 5 \\ 0 & 0 & −8 \\ 0 & 0 & 0 \end{pmatrix}
(iv) fA is bijective.
(v) Ran(A) = Rn .
(vi) rank(A) = n.
(vii) For A, the row echelon form A0 has a pivot in each row.
(viii) For A, the row echelon form A0 has a pivot in each column.
(ix) fA is surjective.
(x) fA is injective.
(xi) fA is bijective.
Proof. Since m = n, the equations rank(A) = m and rank(A) = n from Proposition 3.65
and Proposition 3.67 are equivalent. Therefore all the claims above are equivalent.
We can conclude:
Summary
• By Rm×n we denote number tables with m rows and n columns.
• We call these number tables matrices and can naturally scale them and add them. Both operations in Rm×n are performed componentwise.
• Linear equations look like a1 x1 + · · · + an xn = b.
• Systems of linear equations (LES) are finitely many of these linear equations.
• A solution of the system is a choice of all unknowns x1 , . . . , xn such that all equations
are satisfied.
• A short notation for LES is the matrix notation: Ax = b.
• This notation leads us to the general matrix product.
• Each matrix A induces a linear map fA : Rn → Rm . A linear map satisfies two
properties ( · ) and (+).
• If fA is bijective, the corresponding matrix is invertible with respect to the matrix
product.
• Linearly independent vectors are the most efficient method to describe a linear sub-
space.
• A linearly independent family that generates the whole subspace U is called a basis
of U .
• Range, rank and kernel are important objects for matrices.
• For solving a LES, we use Gaussian elimination or equivalently LU -decomposition.
In the general case the upper triangular matrix U is substituted by a row echelon
form.
• Solvability and unique solvability can be equivalently formulated and, for example,
read from the row echelon form.
4 Determinants
A learning experience is one of those things that says, ’You know that
thing you just did? Don’t do that.’
Douglas Adams, The Salmon of Doubt
We already know how to solve the system Ax = b if A is a square matrix. The determinant
should then tell us if the system has a unique solution before solving it. For a 2 × 2 LES,
we get (for a11 6= 0):
\left(\begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \end{array}\right) ⇝ \left(\begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ 0 & a_{11}a_{22} − a_{12}a_{21} & b_2 a_{11} − b_1 a_{21} \end{array}\right)
Hence, we know that the LES has a unique solution if and only if in the second column
there is a pivot. This means a11 a22 − a12 a21 6= 0. And that is the determinant of the
system or rather the matrix A.
Definition 4.1.
For a matrix A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} ∈ R^{2×2}, we call det(A) := a_{11}a_{22} − a_{12}a_{21} the determinant of A.
Consider two vectors u = (u1, u2)^T and v = (v1, v2)^T from R2 and the parallelogram they span.
|Area(u, v)| = area of the parallelogram
• Plus sign if the rotation of u towards v (so that the angle between u and v gets smaller) is in the mathematically positive sense,
• minus sign if this rotation is in the mathematically negative sense.
If you look back at Section 2.6, you already know a possible calculation for the area of
a parallelogram. However, this only works in the three-dimensional space R3 since it
involves the cross product. However, if we just embed the vectors u, v ∈ R2 and therefore
the whole parallelogram into R3 , we can calculate |Area(u, v)| in this way. A possible
way is to set the supplementary third component as zero:
ũ := (u1, u2, 0)^T and ṽ := (v1, v2, 0)^T.
Then we find:
|Area(u, v)| = ‖ũ × ṽ‖ = ‖(u2 · 0 − 0 · v2, 0 · v1 − u1 · 0, u1 v2 − u2 v1)^T‖ = ‖(0, 0, u1 v2 − u2 v1)^T‖ = |u1 v2 − u2 v1|.
Without the absolute value this coincides with the determinant of the matrix
A := (u  v).
Proposition 4.2.
Area\left( \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \right) = det \begin{pmatrix} u_1 & v_1 \\ u_2 & v_2 \end{pmatrix} = u_1 v_2 − u_2 v_1   (4.1)

Area(u, v) = Area((3, −2)^T, (2, 1)^T) = 3 · 1 − (−2) · 2 = 7.
Area((2, 1)^T, (3, −2)^T) = 2 · (−2) − 1 · 3 = −7.
Area(u, v) = Area(u, αu) = Area((u1, u2)^T, (αu1, αu2)^T) = u1 · αu2 − u2 · αu1 = 0.
Note that the vectors u and v = αu do not span an actual parallelogram but rather just a stripe. Therefore, the area has to be zero.
In the previous section, we showed that, in two dimensions, the determinant is connected
to a measuring of an area. In three dimensions, we therefore expect that the determinant
will measure a volume. In general, we want that the determinant measures an generalised
n-dimensional volume in Rn . We use the symbol Voln for this. We already know that
Vol2 = Area. Now, we can summarise what one should demand of a meaningful volume
measure.
[Figure: sketches of the defining properties of a volume measure: adding a vector in one argument adds the corresponding volumes, scaling one argument (e.g. 3v) scales the volume, and exchanging two arguments flips the sign (+/−).]
In mathematical terms, this means that the volume function is linear in each entry, anti-
symmetric and normalised to the standard basis. For the case n = 2, we can show that
solely these properties imply equation (4.1).
First, we show an easy consequence that follows from the two properties (2) and (3):
Proposition 4.5. Vol2(u, αu) = 0 for all u ∈ R2 and α ∈ R.
Proof. Because of (3), we find Vol2(u, u) = −Vol2(u, u) and this implies Vol2(u, u) = 0. Since (2) holds, we get Vol2(u, αu) = α Vol2(u, u) = α · 0 = 0.
Proposition 4.6.
If Vol2 fulfils (1), (2), (3), (4), then for all u = (u1, u2)^T, v = (v1, v2)^T ∈ R2 the following holds:
Vol2(u, v) = Vol2\left( \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \right) = u_1 v_2 − u_2 v_1.
Proof.
Vol2(u, v) = Vol2((u1, 0)^T + (0, u2)^T, v) = Vol2((u1, 0)^T, v) + Vol2((0, u2)^T, v)
           = Vol2((u1, 0)^T, (v1, 0)^T) + Vol2((u1, 0)^T, (0, v2)^T) + Vol2((0, u2)^T, (v1, 0)^T) + Vol2((0, u2)^T, (0, v2)^T)
           = u1 v1 Vol2(e1, e1) + u1 v2 Vol2(e1, e2) + u2 v1 Vol2(e2, e1) + u2 v2 Vol2(e2, e2)
           = u1 v1 · 0 + u1 v2 · 1 + u2 v1 · (−1) + u2 v2 · 0     (by Proposition 4.5 and properties (3), (4))
           = u1 v2 − u2 v1.
Note that this proves that the volume function Vol2 is uniquely defined by the four
properties (1), (2), (3) and (4) alone. We expect the same for arbitrary dimension n and
indeed we prove this now.
For a1 = (a_{11}, . . . , a_{n1})^T, . . . , an = (a_{1n}, . . . , a_{nn})^T ∈ Rn, it follows
with n · . . . · n = nn summands, where most of them are zero since det(ei1 · · · ein ) = 0 if
two indices coincide. The possibilities (i1 , i2 , . . . , in ) for that all entries are different can
be counted. The number is the number of all permutations of the set {1, . . . , n}, which
is exactly n · (n − 1) · . . . · 1 = n!. Let Pn be the set of these n! permutations, which can
also be denoted by τ = (i1 , i2 , . . . , in ).
For a permutation τ ∈ Pn , we define sgn(τ ) = 1 if one can use an even number of
exchanges of two elements to get from τ to (1, 2, . . . , n). If one needs an odd number of
exchanges of two elements to get from τ to (1, 2, . . . , n), we define sgn(τ ) = −1.
Repeated usage of (3) shows Voln(e_{i1} · · · e_{in}) = sgn(i1, . . . , in). In summary, we get
Proof. The calculation from above shows that (4.2) is the only function that fulfils all the
four rules.
Remark:
You can remember the Leibniz formula of the determinant det(A) in the following
way:
(1) Build a product of n factors out of the entries in A. From each row and each
column you are only allowed to choose one factor.
(2) Sum up all the possibilities for such a product where you add a minus-sign for
the odd permutations.
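A small Python sketch of the Leibniz formula, only meant to illustrate the rule above (for real computations one uses Gaussian elimination instead):

    import numpy as np
    from itertools import permutations

    def sgn(perm):
        """Sign of a permutation, computed by counting inversions."""
        inversions = sum(1 for i in range(len(perm))
                           for j in range(i + 1, len(perm)) if perm[i] > perm[j])
        return -1 if inversions % 2 else 1

    def det_leibniz(A):
        n = A.shape[0]
        # one product per permutation: exactly one factor from each row and column
        return sum(sgn(p) * np.prod([A[i, p[i]] for i in range(n)])
                   for p in permutations(range(n)))

    A = np.array([[3.0, 2.0], [1.0, 4.0]])
    print(det_leibniz(A), np.linalg.det(A))   # both 10.0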
Example 4.9. Consider the matrix Pk↔` that we used in the Gaussian elimination to
switch the kth row with the `th row. Let’s denote the entries by pij and then we know
p_ij = 1 if i = j and i, j ∉ {k, ℓ},   p_ij = 1 if (i, j) = (k, ℓ) or (i, j) = (ℓ, k),   p_ij = 0 else.
This means that in the Leibniz formula, there is only one non-vanishing term: the one belonging to the permutation that exchanges k and ℓ. Since this permutation is only one single exchange, the sign is −1, hence det(P_{k↔ℓ}) = −1.
this result by property (3) of the volume form.
We have used the Leibniz formula to finally define the determinant of a matrix or, equi-
valently, the volume measure in Rn . However, despite being useful in abstract proofs,
this formula is not a good one for actual calculation. Even for n = 4, we have to sum up
4! = 24 terms. Only for n = 2 and n = 3, we get good calculation formulas, which can be
memorised.
det \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = + a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} − a_{13}a_{22}a_{31} − a_{11}a_{23}a_{32} − a_{12}a_{21}a_{33}
[Sarrus scheme: write the columns u, v, w and then u, v again next to them; the three "downward" diagonals get a plus sign, the three "upward" diagonals a minus sign.]
Vol3(u, v, w) = +u1 v2 w3 + v1 w2 u3 + w1 u2 v3 − u3 v2 w1 − v3 w2 u1 − w3 u2 v1
Moreover, the sign of the three-dimensional volume can be easily seen by the right-hand-
rule:
A = (aij ) with i, j = 1, . . . , n.
• A ∈ R2×2 :
det(A) = a11 a22 − a21 a12
• A ∈ R3×3 :
det(A) = + a11 a22 a33 + a12 a23 a31 + a13 a21 a32
− a13 a22 a31 − a11 a23 a32 − a12 a21 a33
Note that it would be helpful to have an algorithm that reduces an n × n-matrix to these
cases.
Checkerboard of signs:
+ − + − · · ·
− + − + · · ·
+ − + − · · ·
− + − + · · ·
⋮ ⋮ ⋮ ⋮
The entry (i, j) of this matrix is (−1)i+j .
Cofactors:
Definition 4.10.
C = (c_ij) with c_ij := (−1)^{i+j} det(A^{(i,j)}) is called the cofactor matrix of A.
To compute det(A(i,j) ), apply the same formula recursively, until you reach 2 × 2 matrices,
where the corresponding formula can be applied. The proof of this formula follows im-
mediately from the Leibniz formula and is left to the reader.
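The recursion just described can be written down directly; the following Python sketch (illustrative only, and hopelessly slow for large n) expands along the first row:

    import numpy as np

    def det_laplace(A):
        """Determinant via cofactor (Laplace) expansion along the first row."""
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        if n == 2:
            return A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # A^{(1,j)}
            total += (-1) ** j * A[0, j] * det_laplace(minor)
        return total

    A = np.array([[0.0, 2, 3, 4], [2, 0, 0, 0], [1, 1, 0, 0], [6, 0, 1, 2]])
    print(det_laplace(A))   # 4.0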
Here, it would be useful first to expand along the second row since we find three zeros there:
det(A) = det \begin{pmatrix} 0 & 2 & 3 & 4 \\ 2 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 6 & 0 & 1 & 2 \end{pmatrix} = (−2) · det \begin{pmatrix} 2 & 3 & 4 \\ 1 & 0 & 0 \\ 0 & 1 & 2 \end{pmatrix} = (−2) · (−1) · det \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} = 4.
as one can easily see by the Laplace’s formula. Now we can conclude:
Proposition 4.13.
Let A ∈ Rn×n and C be its cofactor matrix. Then
C T A = det(A)1n .
Hence, if det(A) ≠ 0, then
A^{-1} = \frac{C^T}{\det(A)}.
Proof. This is just a matrix multiplication where we consider the (i, j)th entry:
(C^T A)_{ij} = \sum_{k=1}^{n} c_{ki} a_{kj} = \sum_{k=1}^{n} \det(a_1 · · · a_{i−1}\; e_k\; a_{i+1} · · · a_n)\, a_{kj}
Proposition 4.14.
If the square matrix A ∈ Rn×n is in triangular form,
A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ 0 & a_{2,2} & & a_{2,n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & a_{n,n} \end{pmatrix},
then det(A) = a_{1,1} a_{2,2} \cdots a_{n,n}, the product of the diagonal entries.
Proposition 4.16.
For each matrix A ∈ Rn×n , we have det(AT ) = det(A).
We only have to show that both sums consist of the same summands.
By definition of the signum function:
sgn(σ ◦ ω) = sgn(σ)sgn(ω)
From this we get:
sgn(σ −1 ) = sgn(σ)
The multiplication is commutative and we can rearrange the product aσ(1),1 aσ(2),2 · · · aσ(n),n .
Hence, we get
sgn(σ) aσ(1),1 aσ(2),2 · · · aσ(n),n = sgn(σ −1 ) a1,σ−1 (1) a2,σ−1 (2) · · · an,σ−1 (n),
Now we substitute ω for σ −1 and recognise that all summands are, in fact, the same. Here,
it is important that Pn is a so-called group, in which each element has exactly one inverse.
Therefore, we can sum over ω instead of σ without changing anything. In summary, we
have:
X X
sgn(σ) aσ(1),1 aσ(2),2 · · · aσ(n),n = sgn(ω) a1,ω(1) a2,ω(2) · · · an,ω(n) .
σ∈Pn ω∈Pn
Proposition 4.17.
For A, B ∈ Rn×n, we have det(AB) = det(A) det(B).
Proof. Denoting the column vectors of A by a_j and the rows of B by β_j^T, we can write the matrix product as AB = \sum_j a_j β_j^T and get:
det(AB) = Vol_n\Big( \sum_{j_1} a_{j_1} b_{j_1,1}, \; \ldots, \; \sum_{j_n} a_{j_n} b_{j_n,n} \Big).
Now we can use the properties (1), (2), (3), and (4) the volume form has, see Definition 4.4.
We get:
det(AB) = \sum_{j_1,\ldots,j_n} b_{j_1,1} \cdots b_{j_n,n} \, Vol_n(a_{j_1}, \ldots, a_{j_n})
        = \sum_{\sigma \in P_n} b_{\sigma(1),1} \cdots b_{\sigma(n),n} \, Vol_n(a_{\sigma(1)}, \ldots, a_{\sigma(n)})
        = \sum_{\sigma \in P_n} b_{\sigma(1),1} \cdots b_{\sigma(n),n} \, sgn(\sigma) \, Vol_n(a_1, \ldots, a_n)
        = det(A) \sum_{\sigma \in P_n} sgn(\sigma) \, b_{\sigma(1),1} \cdots b_{\sigma(n),n} = det(A) \, det(B).
The determinant function det : Rn×n → R is therefore multiplicative. This is what we can
use for calculating determinants with the Gaussian elimination since this is nothing more
than multiplying matrices from the left.
Corollary 4.18.
Applying a row operation Z_{i+λj} (for i ≠ j and λ ∈ R) does not change the determinant. Switching two rows only changes the sign of the determinant. However, scaling rows with a diagonal matrix D changes the determinant by the product of these scaling factors.
Since det(A^T) = det(A), one can equivalently use column operations if you just want to calculate the determinant. This is not recommended if you actually need the row echelon form for other applications afterwards.
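A possible Python sketch of this strategy: eliminate without scaling, track row swaps, then multiply the diagonal (not the text's own code, just an illustration):

    import numpy as np

    def det_by_gauss(A):
        U = A.astype(float).copy()
        n = U.shape[0]
        sign = 1.0
        for j in range(n):
            # pivot search: swapping in a nonzero pivot flips the sign
            p = next((i for i in range(j, n) if U[i, j] != 0), None)
            if p is None:
                return 0.0
            if p != j:
                U[[j, p]] = U[[p, j]]
                sign = -sign
            for i in range(j + 1, n):
                U[i, j:] -= U[i, j] / U[j, j] * U[j, j:]   # row operation, determinant unchanged
        return sign * np.prod(np.diag(U))

    A = np.array([[0.0, 2, 3, 4], [2, 0, 0, 0], [1, 1, 0, 0], [6, 0, 1, 2]])
    print(det_by_gauss(A), np.linalg.det(A))   # both approximately 4.0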
Example 4.19. We calculate the determinant of the following 5 × 5 matrix A. The third
column already has three zeros but we can generate a fourth zero by using one Gaussian
step: Subtract the fifth row from the second one:
A = \begin{pmatrix} −1 & 1 & 0 & −2 & 0 \\ 0 & 2 & 1 & −1 & 4 \\ 1 & 0 & 0 & −3 & 1 \\ 1 & 2 & 0 & 0 & 3 \\ 0 & −2 & 1 & 1 & 2 \end{pmatrix} ⇝ \begin{pmatrix} −1 & 1 & 0 & −2 & 0 \\ 0 & 4 & 0 & −2 & 2 \\ 1 & 0 & 0 & −3 & 1 \\ 1 & 2 & 0 & 0 & 3 \\ 0 & −2 & 1 & 1 & 2 \end{pmatrix} =: B
Now, we know that det(A) = det(B) holds. Having so many zeros, it is the best to expand
det(B) along the third column:
det(B) = det \begin{pmatrix} −1 & 1 & 0 & −2 & 0 \\ 0 & 4 & 0 & −2 & 2 \\ 1 & 0 & 0 & −3 & 1 \\ 1 & 2 & 0 & 0 & 3 \\ 0 & −2 & 1 & 1 & 2 \end{pmatrix} = (−1)^{3+5} · 1 · det \underbrace{\begin{pmatrix} −1 & 1 & −2 & 0 \\ 0 & 4 & −2 & 2 \\ 1 & 0 & −3 & 1 \\ 1 & 2 & 0 & 3 \end{pmatrix}}_{=: C}
Looking at C, we can use Gaussian elimination to get two more zeros in the second row.
(Add the fourth column to the third column and subtract it from the second column two
times):
C = \begin{pmatrix} −1 & 1 & −2 & 0 \\ 0 & 4 & −2 & 2 \\ 1 & 0 & −3 & 1 \\ 1 & 2 & 0 & 3 \end{pmatrix} ⇝ \begin{pmatrix} −1 & 1 & −2 & 0 \\ 0 & 0 & 0 & 2 \\ 1 & −2 & −2 & 1 \\ 1 & −4 & 3 & 3 \end{pmatrix} =: D
Of course, we have again det(C) = det(D). Now, det(D) should be expanded along the second row:
det(D) = det \begin{pmatrix} −1 & 1 & −2 & 0 \\ 0 & 0 & 0 & 2 \\ 1 & −2 & −2 & 1 \\ 1 & −4 & 3 & 3 \end{pmatrix} = (−1)^{2+4} · 2 · det \underbrace{\begin{pmatrix} −1 & 1 & −2 \\ 1 & −2 & −2 \\ 1 & −4 & 3 \end{pmatrix}}_{=: E}.
In summary: det(A) = det(B) = det(C) = det(D) = 2 · det(E) = 2 · 13 = 26.
Remark:
• det(A^{-1}) = 1/det(A) (if the inverse exists).
• If PA = LU, then det(A) = (1/det(P)) det(L) det(U) = det(P) det(U) = ± det(U).
n     :  2   3   4    5    6    7     8      9       10       · · ·  20
n³/3  :  2   9   21   42   72   114   171    243     333      · · ·  2667
n!    :  2   6   24   120  720  5040  40320  362880  3628800  · · ·  2.4 · 10^18
det(f ) := det(A)
In fact det(f ) is the relative change of all volumes and we remind that we have the
following:
In general, det(A) = det(fA) describes the change of volume for every figure: fAB = fA ◦ fB.
[Figure: applying fB to a figure of area 1 gives a figure of area det(B); applying fA afterwards gives area det(AB) = det(A) · det(B).]
Example 4.21.
A(λ) = \begin{pmatrix} λ & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 1 & 2 \end{pmatrix},   det(A(λ)) = λ(4 − 3) − 1(2 − 2) + 1(3 − 4) = λ − 1.
This matrix is singular if and only if λ = 1, and indeed, for λ = 1, we have for the column vectors a1(λ) + a2 = a3.
Conclusion: singular matrices do not appear very often. Whatever this means.
Warning: this is only good for pen-and-paper computations. In numerical computations,
det(A + round off) says nothing about invertibility of A, only about change of volume:
det \begin{pmatrix} ε & 0 \\ 0 & 1/ε \end{pmatrix} = 1.
Proof. Exercise!
Ax = b .
Ax = b  ⇒  x = A^{-1} b = \frac{C^T b}{\det(A)}.
Proof. Having the cofactor matrix C, we already know that the solution is given by
x = A^{-1} b = \frac{C^T b}{\det(A)}
Therefore, we just have to look at the ith row of the matrix C T b which is given by:
(C^T b)_i = \sum_{k=1}^{n} c_{ki} b_k = \sum_{k=1}^{n} \det(a_1 · · · a_{i−1}\; e_k\; a_{i+1} · · · a_n)\, b_k
If you calculate the determinants by Laplace expansion, then your work is of order n!. If you use Gaussian elimination for calculating the determinants, you only need about n³/3 steps for each component of x, but, of course, in this case you could have solved the whole system Ax = b by Gaussian elimination in the first place.
For computational reasons, Cramer's rule can only be used for small matrices; its real advantage is of theoretical interest. You can use Cramer's rule in proofs if you need claims about a single component x_i of the solution x.
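A short Python sketch of Cramer's rule for illustration, written in the equivalent column-replacement form (replace the i-th column of A by b):

    import numpy as np

    def cramer(A, b):
        """Solve Ax = b by Cramer's rule; only sensible for small, invertible A."""
        detA = np.linalg.det(A)
        n = A.shape[0]
        x = np.empty(n)
        for i in range(n):
            Ai = A.copy()
            Ai[:, i] = b            # replace the i-th column of A by b
            x[i] = np.linalg.det(Ai) / detA
        return x

    A = np.array([[3.0, 2.0], [1.0, 4.0]])
    b = np.array([5.0, 5.0])
    print(cramer(A, b), np.linalg.solve(A, b))   # both [1. 1.]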
Summary
• The determinant is the volume form.
• The determinant fulfils three defining properties:
(1) Linear in each column.
(2) Alternating when exchanging columns.
(3) The identity matrix has determinant 1.
• To calculate a determinant, you have the Leibniz formula, the Laplace expansion or
Gaussian elimination (without scaling!).
5 General inner products, orthogonality and distances
Sadly, very little school maths focuses on how to win free drinks in a
pub.
Matt Parker
We have already encountered the standard inner product (also called Euclidean scalar
product) in Rn :
⟨x, y⟩ = x^T y = \sum_{i=1}^{n} x_i y_i,   for x, y ∈ Rn.
With the help of this inner product, we are able to define and compute many useful things:
• length: ‖x‖ = √⟨x, x⟩
• distances: dist(x, y) = ‖x − y‖
• angle: cos(∠(x, y)) := ⟨x, y⟩ / (‖x‖ ‖y‖)
• orthogonality: x ⊥ y :⇔ ⟨x, y⟩ = 0.
• orthogonal projections, e.g. the height
• rotations about an axis by an angle
• reflections at a hyperplane
hx + y, vi = hx, vi + hy, vi
hλx, vi = λhx, vi
(S4) Symmetry:
hx, yi = hy, xi
We usually summarise (S2) and (S3) to linearity in the first argument. Note that from
(S3) always follows ho, oi = 0 · ho, oi = 0. In combination with (S1), we get:
hx, xi = 0 ⇔ x = o . (5.1)
Also, due to positive definiteness, we can define a norm (or length) via ‖x‖ := √⟨x, x⟩.
Then the second argument actually has different properties than the first.
• In the usual real case, the binomial formulas hold: ‖x ± y‖² = ‖x‖² ± 2⟨x, y⟩ + ‖y‖².
In contrast to Section 2.5, we now use a subscript to denote this special inner product. If
there is no confusion which inner product we use, we can omit the index.
Remark:
Due to its simplicity, this inner product is prominent in theory and practice. How-
ever, in particular for very large scale problems with special structure other “specially
tailored” inner products play a major role.
Example 5.4. Each diagonal matrix D ∈ Rn×n with positive entries on the diagonal is
a positive definite matrix.
Let A ∈ Rn×n be a positive definite matrix. Then the following defines an inner product
on Rn :
hx, yiA := hx, Ayieuklid = xT Ay
It is linear in the first argument, by the linearity of the matrix- (row) vector product,
symmetric by symmetry of A (aij = aji , or A = AT ) and positive definite by positive
definiteness of A.
The simplest case is A = 1, so hx, yi1 = xT y is the standard euclidean product.
A simple case, where A = D is a diagonal matrix makes sense, if R3 corresponds to spatial
coordinates, given in different units (say inch and centimeters).
Our abstract assumptions already yield all the useful formulas, known from our standard
inner product:
Proposition 5.5.
Let h·, ·i be an inner product on a subspace V ⊂ Rn and k · k its associated norm.
Then for all x, y ∈ V and λ ∈ R, we have:
(a) |hx, yi| ≤ kxkkyk (Cauchy-Schwarz inequality). Equality holds if and only if x
and y are colinear (written as x k y).
(This is zero, if y = o, or x = λy, i.e. x and y are colinear). Setting λ = hx, yi/kyk2 ,
we obtain
0 ≤ kxk2 kyk2 − 2hx, yi2 + hx, yi2 = kxk2 kyk2 − hx, yi2 .
The norm properties are left as an exercise.
No matter which inner product we are using, we can define orthogonality as follows:
x⊥y :⇔ hx, yi = 0.
Once we can measure the size of a vector v by a norm kvk, we may think about measuring
the “size” of a linear map. Consider A ∈ Rm×n , and w = Av. Then the following quotient
\frac{‖w‖_{R^m}}{‖v‖_{R^n}} = \frac{‖Av‖_{R^m}}{‖v‖_{R^n}}
tells us, how much longer (or shorter) w = Av is, compared to v. A should be “large”,
if it produces long vectors from short ones, and “small”, if it produces short vectors from
long ones. Thus, we may define
‖A‖ := \max_{v ≠ 0} \frac{‖Av‖_{R^m}}{‖v‖_{R^n}},
[Figure: decomposition x = p + n into two orthogonal vectors: p is parallel to the wanted direction r (the orthogonal projection, think of the flow in the direction of a river's course) and n is orthogonal to this.]
The orthogonal projection and normal component is indeed well-defined. If there are two
decompositions x = p + n = p0 + n0 with p, p0 ∈ U and n, n0 ∈ U ⊥ , then we can use
the subspace properties to conclude n − n0 ∈ U ⊥ and p − p0 ∈ U . Applying the inner
product onto p − p0 = n0 − n, we conclude kp − p0 k = 0 and kn0 − nk = 0. From the
norm properties, we get n = n0 and p = p0 .
Calculation of p and n: Because of p ∈ U = Span(r), we have p = λr for a λ ∈ R,
which we simply have to find. Since x = p + n = λr + n and n ⊥ r, we get:
⟨x, r⟩ = ⟨λr + n, r⟩ = λ⟨r, r⟩ + ⟨n, r⟩ = λ⟨r, r⟩   and therefore   λ = \frac{⟨x, r⟩}{⟨r, r⟩}.
Proposition 5.7. Orthogonal projection and normal component w.r.t. a line
Let x, r ∈ Rn with r ≠ o. For the orthogonal projection p of x onto U = Span(r) and the associated normal component n of x w.r.t. U, one finds:
p = \frac{⟨x, r⟩}{⟨r, r⟩} r   and   n = x − p = x − \frac{⟨x, r⟩}{⟨r, r⟩} r.
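In code, this projection formula is a one-liner; a small NumPy sketch (works for any x, r with r ≠ o and the standard inner product):

    import numpy as np

    def project_onto_line(x, r):
        """Orthogonal projection p of x onto Span(r) and normal component n = x - p."""
        p = (x @ r) / (r @ r) * r
        return p, x - p

    x = np.array([2.0, 1.0, 0.0])
    r = np.array([1.0, 1.0, 1.0])
    p, n = project_onto_line(x, r)
    print(p, n, n @ r)   # n @ r is 0 (up to round-off)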
We reformulate this:
If α is not acute, we can do an analogue calculation. In summary, we can give the following
definition for an angle:
cos(α) = \frac{⟨x, y⟩}{‖x‖ ‖y‖}.   (5.3)
Example 5.9. Consider the cube C in R3 with center M in the origin and the corners
(±1, ±1, ±1)T , where all the combinations with ±-signs occur.
All diagonals of C go through M and intersect at an angle α, which is calculated with the vectors x = (1, 1, 1)^T and y = (−1, 1, 1)^T:
cos(α) = \frac{⟨x, y⟩_{euklid}}{‖x‖ ‖y‖} = \frac{−1 + 1 + 1}{\sqrt{1+1+1}\,\sqrt{1+1+1}} = \frac{1}{3},
which implies α = arccos(1/3) ≈ 70.53°.
Proposition 5.12.
For all nonempty sets M ⊂ Rn we have:
Proof. Exercise!
We state one important property of the orthogonal complement. Another important one can be found at the end of this section.
Proof. (a) For x ∈ U ∩ U ⊥ , we have x ⊥ x, i.e. hx, xi = 0. Using equation (5.1), we get
x = o.
Proof. ⇒ If x ⊥ u holds for all u ∈ U , then, of course, also for all basis elements
u = ui ∈ B ⊂ U .
⇐ If hx, ui i = 0 holds for i = 1, . . . , k and we have u = α1 u1 + . . . + αk uk ∈ U , then
hx, ui = hx, α1 u1 + . . . + αk uk i = α1 hx, u1 i + . . . + αk hx, uk i = α1 0 + . . . + αk 0 = 0.
We decompose x into p ∈ U and n ⊥ U with x = p + n. In other words, we write x as a sum of two vectors, where one lies in U and the other one is orthogonal to U:
x = x_U + x_{U⊥},   i.e.   p = x_U and n = x_{U⊥},   with   x_U = α1 u1 + . . . + αk uk,
where the coefficients α1, . . . , αk solve the linear system
\begin{pmatrix} ⟨u_1, u_1⟩ & \cdots & ⟨u_k, u_1⟩ \\ \vdots & & \vdots \\ ⟨u_1, u_k⟩ & \cdots & ⟨u_k, u_k⟩ \end{pmatrix} \begin{pmatrix} α_1 \\ \vdots \\ α_k \end{pmatrix} = \begin{pmatrix} ⟨x, u_1⟩ \\ \vdots \\ ⟨x, u_k⟩ \end{pmatrix}.
The (k × k) matrix on the left-hand side is called the Gramian matrix G(B). The
normal component n = x U ⊥ is then given by n = x − x U .
Proof. The result for αi follows from equation (5.4). Now we show that the Gramian
matrix G(B) =: G is invertible. This means that we have to show Ker(G) = {o}. Let
(β1 , . . . , βk )T ∈ Ker(G). Then for all i = 1, . . . , k, we have:
0 = β1 hu1 , ui i + . . . + βk huk , ui i = hβ1 u1 + . . . + βk uk , ui i.
Hence, u := β1 u1 + . . . + βk uk is orthogonal to all ui . Using Proposition 5.14, we get u ∈
U ⊥ . Per construction, we get u ∈ U , which means u ∈ U ∩ U ⊥ . Using Proposition 5.13,
we conclude u = o. The family (u1 , . . . , uk ) is linearly independent since it is a basis. So
u = o implies β1 = . . . = βk = 0.
‖x − x_U‖ = \min_{u ∈ U} ‖x − u‖ =: dist(x, U).
In other words: no other vector of U is as close to x as x_U.
Proposition 5.18.
For all nonempty sets M ⊂ Rn we have:
(b) (M ⊥ )⊥ = Span(M ).
Proof. (a): By Proposition 5.16, we know that for each x ∈ Rn there is a decompos-
ition x = p + n with p ∈ Span(M ) and n ∈ (Span(M ))⊥ . Now we just have to use
Proposition 5.12 and Proposition 5.13.
(b): Exercise! (Use part (a))
Rn = U ⊕ U ⊥ ,
and calls it direct sum or, more correctly, orthogonal sum of two subspaces.
(c) (U ⊥ )⊥ = U .
If F is an ONB, then the Gram matrix G(F) is the identity matrix and projections are
very easily calculable.
Example 5.21. Let h·, ·i = h·, ·ieukl the standard inner product.
(a) The canonical unit vectors
e1 = (1, 0, . . . , 0)T , e2 = (0, 1, 0, . . . , 0)T , ..., en = (0, . . . , 0, 1)T
in Rn define an ONB for U = Rn .
(b) The family F = (u1 , u2 , u3 ) given by
u1 = (1, 0, 1)T , u2 = (1, 0, −1)T , u3 = (0, 1, 0)T
defines an OB of R3 . We show this: We immediately have hu1 , u3 i = 0 and hu2 , u3 i =
0. Moreover, we find
⟨u1, u2⟩ = ⟨(1, 0, 1)^T, (1, 0, −1)^T⟩ = 1 + 0 − 1 = 0.
Hence, F is an OS. It remains to show that F is also a basis for R3 . Since dim(R3 ) = 3
and F consists of three linearly independent vectors, we are finished. For showing the
linear independence, the next Proposition 5.22 will be always helpful.
(c) Normalising the vectors from (b), we obtain an ONB ( √12 u1 , √12 u2 , u3 ).
Proof. Let F be an OS. To show the linear independence of F, we only have to show that
α1 u1 + . . . + αk uk = o always implies α1 = . . . = αk = 0. Using the inner product for ui
with i = 1, . . . , k, we get:
0 = ⟨o, ui⟩ = ⟨α1 u1 + . . . + αk uk, ui⟩
  = α1 ⟨u1, ui⟩ + . . . + αi−1 ⟨ui−1, ui⟩ + αi ⟨ui, ui⟩ + αi+1 ⟨ui+1, ui⟩ + . . . + αk ⟨uk, ui⟩
  = αi ‖ui‖²,
since all inner products except ⟨ui, ui⟩ = ‖ui‖² vanish.
Since kui k2 6= 0, the only possibility is αi = 0 and this holds for all i = 1, . . . , k.
Now we can show, how easy it is to calculate Gramian matrices with a basis that is
orthogonal.
G(B) = \begin{pmatrix} ⟨u_1, u_1⟩ & ⟨u_2, u_1⟩ & \cdots & ⟨u_k, u_1⟩ \\ ⟨u_1, u_2⟩ & ⟨u_2, u_2⟩ & \cdots & ⟨u_k, u_2⟩ \\ \vdots & \vdots & \ddots & \vdots \\ ⟨u_1, u_k⟩ & ⟨u_2, u_k⟩ & \cdots & ⟨u_k, u_k⟩ \end{pmatrix} = \begin{pmatrix} ‖u_1‖² & & & 0 \\ & ‖u_2‖² & & \\ & & \ddots & \\ 0 & & & ‖u_k‖² \end{pmatrix}
and therefore
x_U = \frac{⟨x, u_1⟩}{‖u_1‖²} u_1 + . . . + \frac{⟨x, u_k⟩}{‖u_k‖²} u_k   and   x_{U⊥} = x − x_U.
Even if one is not interested in the projection, this can be helpful for calculating the
coefficients for the linear combination.
x = α1 u1 + . . . + αk uk   with   αi = \frac{⟨x, u_i⟩}{‖u_i‖²}   for all i ∈ {1, . . . , k}.   (5.6)
This formula is called the Fourier expansion of x with respect to B, and the numbers αi are called the associated Fourier coefficients. If B even is an ONB, then αi = ⟨x, ui⟩.
Looking at the formula for n = x_{U⊥} from Proposition 5.23, one recognises a general principle of how to construct orthogonal vectors. We summarise this in the following algorithm.
...
(k) In the last step choose the normal component of uk w.r.t. Span(w1, . . . , wk−1),
v_k := u_k − \sum_{i=1}^{k−1} ⟨u_k, w_i⟩ w_i,
and normalise it: w_k := \frac{1}{‖v_k‖} v_k.
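A compact Python sketch of this Gram–Schmidt procedure for the standard inner product (a hypothetical helper, not part of the text; it assumes the input vectors are linearly independent):

    import numpy as np

    def gram_schmidt(vectors):
        """Turn a linearly independent family into an orthonormal system."""
        ws = []
        for u in vectors:
            v = u - sum((u @ w) * w for w in ws)   # subtract the projections onto the previous w_i
            ws.append(v / np.linalg.norm(v))       # normalise
        return ws

    u1 = np.array([1.0, 1.0, 0.0])
    u2 = np.array([2.0, 0.0, 2.0])
    w1, w2 = gram_schmidt([u1, u2])
    print(w1, w2, w1 @ w2)   # w1 @ w2 is 0 (up to round-off)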
Example 5.25. Let u1 = (1, 1, 0)T and u2 = (2, 0, 2)T be two vectors in R3 and U =
Span(u1 , u2 ) the spanned plane. We calculate an ONB (w1 , w2 ) for U . The first vector is
w_1 := \frac{1}{‖u_1‖} u_1 = \frac{1}{\sqrt{2}} (1, 1, 0)^T.
We recall Corollary 5.24: Why are such ONB helpful? Usually, if we want to write a
vector v as a linear combination of basis vectors B = (b1 , . . . , bk ), we have to solve a
linear system:
v = \sum_{i=1}^{k} λ_i b_i.
Thus, each coefficient of the linear combination results from a simple inner product.
Remark: Outlook
It is this principle that the so-called Fourier transformation is built on. It decomposes a signal v(t) into frequencies ui(t) = sin(ωi t). This is, however, a problem formulated in a more abstract vector space.
Let B = (u1, . . . , un) be a basis of Rn. Then each x ∈ Rn can be uniquely written as:
x = α1 u1 + . . . + αn un = \underbrace{(u_1 \cdots u_n)}_{=: A} \begin{pmatrix} α_1 \\ \vdots \\ α_n \end{pmatrix}.
This means that AT A is the Gramian Matrix G(B) for the basis B. For an ONB B, the
matrix G(B) is the identity matrix 1 by Proposition 5.23. This gives us the following:
We immediately see that an orthogonal matrix A has an ONB as columns and fulfils ⟨Ax, Ay⟩ = ⟨x, y⟩ for all x, y ∈ Rn. The last property says that the corresponding linear map fA preserves the inner product, and thus angles and lengths.
(b) AT A = 1.
(c) AAT = 1.
(d) A−1 = AT .
Proof. Exercise!
(b) A “reflection” from Definition 5.29 could also be a point reflection in the case
n ≥ 3.
A = (a_1 \cdots a_n) = Q \begin{pmatrix} r_{11} & r_{12} & r_{13} & \cdots & r_{1n} \\ & r_{22} & r_{23} & \cdots & r_{2n} \\ & & r_{33} & \cdots & r_{3n} \\ & & & \ddots & \vdots \\ & & & & r_{nn} \end{pmatrix} = QR,
i.e. a_1 = Q (r_{11}, 0, \ldots, 0)^T, a_2 = Q (r_{12}, r_{22}, 0, \ldots, 0)^T, and so on.
r_{12} = ⟨a_2, q_1⟩ = −3,   q_2 = (1/3, 2/3, 2/3)^T,   r_{22} = 3,
r_{13} = ⟨a_3, q_1⟩ = 3,   r_{23} = ⟨a_3, q_2⟩ = 6,   q_3 = (2/3, −2/3, 1/3)^T,   r_{33} = 6;
Q = \begin{pmatrix} 2/3 & 1/3 & 2/3 \\ 1/3 & 2/3 & −2/3 \\ −2/3 & 2/3 & 1/3 \end{pmatrix},   R = \begin{pmatrix} 3 & −3 & 3 \\ 0 & 3 & 6 \\ 0 & 0 & 6 \end{pmatrix}.
Example 5.31.
For A = \begin{pmatrix} 2 & −1 & 8 \\ 1 & 1 & 1 \\ −2 & 4 & 4 \end{pmatrix}, Gram–Schmidt gives us q_1 = \frac{a_1}{‖a_1‖} = \begin{pmatrix} 2/3 \\ 1/3 \\ −2/3 \end{pmatrix},
q_2 = \frac{a_2 − (a_2)_{Span(q_1)}}{‖\ldots‖} = \begin{pmatrix} 1/3 \\ 2/3 \\ 2/3 \end{pmatrix},   q_3 = \frac{a_3 − (a_3)_{Span(q_1, q_2)}}{‖\ldots‖} = \begin{pmatrix} 2/3 \\ −2/3 \\ 1/3 \end{pmatrix}.
Hence: Q = \frac{1}{3} \begin{pmatrix} 2 & 1 & 2 \\ 1 & 2 & −2 \\ −2 & 2 & 1 \end{pmatrix} and R = Q^T A = \begin{pmatrix} 3 & −3 & 3 \\ 0 & 3 & 6 \\ 0 & 0 & 6 \end{pmatrix}.
As we have seen in the LR-decomposition, we can also use the QR-decomposition for
solving an LES Ax = b. If A is a square matrix (m = n), we know:
Ax = b  ⟺  QRx = b  ⟺  Rx = Q^T b   (using Q^{-1} = Q^T)   (5.8)
The last system has a triangle form and is solved by backwards substitution. A QR-
decomposition is also possible in the non-square case as we will see later in detail.
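A minimal Python sketch of solving a square system via QR as in (5.8), using NumPy's QR factorisation and a triangular solve from SciPy (NumPy may choose different signs for Q and R than the hand computation, which does not affect the solution):

    import numpy as np
    from scipy.linalg import solve_triangular

    A = np.array([[2.0, -1.0, 8.0],
                  [1.0,  1.0, 1.0],
                  [-2.0, 4.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])

    Q, R = np.linalg.qr(A)              # A = Q R with Q orthogonal, R upper triangular
    x = solve_triangular(R, Q.T @ b)    # backward substitution for R x = Q^T b
    print(np.allclose(A @ x, b))        # True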
{v ∈ Rn : hn, v − pi = 0}
for the shortest distance between v and T and the shortest distance between S and
T , respectively.
If we are using the HNF for a hyperplane, then the expression hn, v − pi can indeed
measure the distances:
Proposition 5.33.
For a hyperplane T = {v ∈ Rn : hn, v − pi = 0} with knk = 1 (this is the HNF),
we have
hn, q − pi = ±dist(q, T ) (5.9)
where the sign “+” holds if q lies on the same side of T as the normal vector n,
and “−” holds if q lies on the other side of T .
Using equation (5.9), we are able to calculate distances. We summarise all possibilities
for such problems for R3 :
Distances in R3
• Point/Point: dist(p, q) = kp − qk, (for completeness’s sake),
• Point/Plane: dist(q, E) = |⟨n, q − p⟩| for the plane E = p + Span(a, b), cf. (5.9).
• Line/Plane:
and therefore, one usually just considers the case of linear subspaces instead of affine
subspaces.
Summary
• Each vector x ∈ Rn can be uniquely decomposed into
– a vector x U in a given subspace U and
– a vector n that is orthogonal to U .
The vector x U is called the orthogonal projection of x onto U . n is equal to x U ⊥ .
• If dim(U) = 1, the calculation of x_U is very easy, since one can use Proposition 5.7; if dim(U) ≥ 2, then one has to choose a basis B for U and either
– solve an LES with the help of the Gramian matrix G(B) (Proposition 5.16) or
– build an ONS or ONB with the help of the Gram-Schmidt procedure and use
Proposition 5.23.
• A matrix A ∈ Rn×n with A−1 = AT is called orthogonal. The determinant is ±1.
Depending on the sign of det(A), the matrix A describes a reflection or a rotation.
6 Eigenvalues and similar things
The first person you should be careful not to fool is yourself. Because
you are the easiest person to fool.
Richard Feynman
Consider again a square matrix A ∈ Rn×n and the associated linear map fA : Rn → Rn
which maps Rn into itself.
Question:
Are there vectors v which are only scaled by fA ? This means that they satisfy:
Av = λv or equivalently (A − λ1)v = o
• λ is called eigenvalue of A,
First conclusions:
• Not very interesting (trivial): v = o.
• v ∈ Ker(A) \ {o} ⇒ Av = 0v, so λ = 0.
• v ∈ Ker(A − λ1) \ {o} ⇒ Av = λv, so λ is an eigenvalue.
• v eigenvector ⇒ αv is also an eigenvector (for α 6= 0).
Example. (a) Av = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} v_1 + v_2 \\ v_2 \end{pmatrix}  ⇒  λ = 1, v = \begin{pmatrix} v_1 \\ 0 \end{pmatrix}
(b) Av = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 3v_1 \\ 2v_2 \end{pmatrix}  ⇒  λ = 2, v = \begin{pmatrix} 0 \\ v_2 \end{pmatrix}, or λ = 3, v = \begin{pmatrix} v_1 \\ 0 \end{pmatrix}
(c) The eigenvalues of a diagonal matrix are the diagonal entries, its eigenvectors are the
(scaled) canonical unit vectors.
(d) Suppose A ∈ R2×2 is a rotation about an angle (not a multiple of 180◦ ). Then
“obviously” there cannot be any eigenvectors (at least no real ones).
This is very general definition and will work later for other cases in the same manner.
Here, we are first interested in matrices A ∈ Rn×n and eigenvalues λ ∈ R. However, you
may already see that this can also work for complex numbers. We may also include λ ∈ C
later.
To get this “optimal coordinate system” we need all the eigenvalues λ1 , λ2 and the cor-
responding eigenvectors x1 and x2 .
Question:
Idea:
Compute det(A − λ1), which yields a polynomial of degree n in λ and determine
its zeros, because
Example 6.3.
A = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}
det(A − λ1) = det \begin{pmatrix} 3 − λ & 2 \\ 1 & 4 − λ \end{pmatrix} = (3 − λ)(4 − λ) − 2 · 1 = 10 − 7λ + λ²
λ_{1,2} = \frac{7 ± \sqrt{49 − 40}}{2} = \frac{7 ± 3}{2}   ⇒   λ_1 = 2, λ_2 = 5
Thus we have the eigenvalues λ_1 = 2 and λ_2 = 5. Let us compute the eigenvectors:
o = (A − 2·1)v = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} v_1 + 2v_2 \\ v_1 + 2v_2 \end{pmatrix}   ⇒   v = α \begin{pmatrix} 2 \\ −1 \end{pmatrix}
o = (A − 5·1)v = \begin{pmatrix} −2 & 2 \\ 1 & −1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} −2v_1 + 2v_2 \\ v_1 − v_2 \end{pmatrix}   ⇒   v = α \begin{pmatrix} 1 \\ 1 \end{pmatrix}
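Such hand computations can be cross-checked numerically; a short NumPy sketch (numpy returns normalised eigenvectors, i.e. scaled versions of the ones above, and possibly in a different order):

    import numpy as np

    A = np.array([[3.0, 2.0], [1.0, 4.0]])
    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)         # eigenvalues 2 and 5
    print(eigenvectors[:, 0])  # a multiple of (2, -1)^T
    print(eigenvectors[:, 1])  # a multiple of (1, 1)^T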
(i) λ is an eigenvalue A.
Proof. Exercise!
Let A ∈ Rn×n . Then we observe that det(A − λ1) = pA (λ) is a polynomial of order n in
the variable λ. For example, there could be coefficients ci such that
Example 6.6. Look at A = \begin{pmatrix} 3 & 2 \\ 1 & 2 \end{pmatrix}.
p_A(λ) = det(A − λ1) = det\left( \begin{pmatrix} 3 & 2 \\ 1 & 2 \end{pmatrix} − λ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) = det \begin{pmatrix} 3 − λ & 2 \\ 1 & 2 − λ \end{pmatrix}
= (3 − λ)(2 − λ) − 2 · 1 = 6 − 3λ − 2λ + λ² − 2 = λ² − 5λ + 4
If we observe the polynomial in the largest number space we know, the complex numbers,
we recall the fundamental theorem of algebra:
an xn + an−1 xn−1 + . . . + a1 x1 + a0 = 0
| {z }
=: p(x)
Sometimes some of the values λ1 . . . λn are equal (e.g. pA (λ) = (λ − 1)2 , and then
λ1 = λ2 = 1), so we have a multiple root.
Definition 6.8.
If the same eigenvalue λ appears α(λ) times in this factorisation, we say:
(c) Also spec(A) = spec(AT ). Hence (a) and (b) also hold for lower triangular
matrices.
Proof. For (b): This immediately follows from Proposition 4.15 since λ ∈ spec(A) if and
only if
0 = det\left( \begin{pmatrix} B & C \\ 0 & D \end{pmatrix} − λ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) = det \begin{pmatrix} B − λ1 & C \\ 0 & D − λ1 \end{pmatrix} = det(B − λ1) det(D − λ1),
1 2 3 4 5
6 7 8 9 10
(b) spec = spec 1 2 ∪ spec 11 12
10 = {1, 6, 10, 12, 15}
6
11 12 13 14 15
13 14 15
1 2 3 4 5 6
7 8 9 10 11 12
(c) spec
12 = spec 1 2 ∪ spec 13 14
13 14
7 15 16 17 18
15 16 17 18 19 20 21
19
20 21
1 2 12 17 18
= spec ∪ spec ∪ spec = {1, 7, 12, 14, 17, 21}
7 13 14 21
Remark:
The characteristic polynomial for A ∈ Rn×n is of the following form:
p_A(λ) = (−λ)^n + tr(A)(−λ)^{n−1} + . . . + det(A),
where tr(A) := \sum_{j=1}^{n} a_{jj} is the sum of the diagonal, the so-called trace of A.
Naturally, we define the addition and scalar multiplication in Cn and Cm×n as we did for
the objects with real entries. Indeed, everything works the same and we find:
We already mentioned that a set V with an addition and scalar multiplication that fulfils
the rules above is called a vector space. However, note that we now can also scale vectors
by using complex numbers. To make this clear, we often speak of the complex vector
space Cn .
Recall the notions of linear dependence, linear independence and basis, which we still use
in the complex vector space Cn .
(1) o∈U,
(2) u ∈ U , λ ∈ C =⇒ λu ∈ U ,
(3) u, v ∈ U =⇒ u + v ∈ U .
Span (M ) := {λ1 u1 + · · · + λk uk : u1 , . . . , uk ∈ M , λ1 , . . . , λk ∈ C , k ∈ N} .
This subspace is called the span or the linear hull of M . For convenience, we define
Span(∅) := {o}.
If this is not possible, we call the family (v1 , . . . , vk ) linearly independent. This
means that
\sum_{j=1}^{k} λ_j v_j = o   ⇒   λ_1 = . . . = λ_k = 0
holds.
Even in the complex vector space Cn , we are able speak of geometry when endowing the
space with an inner product. We try to generalise what we know from the complex plane
C and the real vector space Rn .
is called the (standard) inner product of u and v. Moreover, we define the real
number
‖v‖ := \sqrt{⟨v, v⟩} = \sqrt{|v_1|² + . . . + |v_n|²}
and call it the norm of v.
Attention!
In some other books, you might find an alternative definition of the standard inner
product in Cn where the first argument is the complex conjugated one.
Note that ⟨v, v⟩ is always a real number ≥ 0, so it indeed gives us a length. Again, we find the important property: ⟨v, v⟩ = 0 if and only if v = o. Hence, \sqrt{⟨v, v⟩} is well-defined and the norm ‖ · ‖ has the same properties as in Rn, see Proposition 6.19 below.
Proposition 6.18.
The standard inner product h·, ·i : Cn × Cn → C fulfils the following: For all vectors
x, x0 , y ∈ Cn and λ ∈ C, one has
Looking at the standard inner product in Rn , we know that AT satisfies the equation
hAx, yi = hx, AT yi for all x ∈ Rn , y ∈ Rm (Proposition 3.36). In the standard inner
product in Cn , we also have the complex conjugation involved.
is called the adjoint matrix of A. It is the uniquely determined matrix that fulfils
the equation
hAx, yi = hx, A∗ yi
for all x ∈ Cn and y ∈ Cm .
In analogy to Proposition 6.9 (c), we get the following for complex matrices:
Proof. det(A∗ − λ1) = det (A − λ1)∗ = det(A − λ1) by Proposition 4.15 and z + w =
z + w and z · w = z · w.
Definition 6.22.
A complex matrix A ∈ Cn×n is called
• selfadjoint if A = A∗ (complex version of “symmetric”),
• skew-adjoint if A = −A∗ (complex version of “skew-symmetric”),
• unitary if AA∗ = 1 = A∗ A (complex version of “orthogonal”),
• normal if AA∗ = A∗ A.
Example 6.23. (a) A = \begin{pmatrix} 1 & 2i \\ −2i & 0 \end{pmatrix}  ⇒  A^* = \overline{\begin{pmatrix} 1 & −2i \\ 2i & 0 \end{pmatrix}} = \begin{pmatrix} 1 & 2i \\ −2i & 0 \end{pmatrix} = A
(b) A = \begin{pmatrix} i & −1 + 2i \\ 1 + 2i & 3i \end{pmatrix}  ⇒  A^* = \overline{\begin{pmatrix} i & 1 + 2i \\ −1 + 2i & 3i \end{pmatrix}} = \begin{pmatrix} −i & 1 − 2i \\ −1 − 2i & −3i \end{pmatrix} = −A
(c) A = \begin{pmatrix} 1 + i & 3 − 2i \\ 2i & −1 \end{pmatrix}  ⇒  A^* = \overline{\begin{pmatrix} 1 + i & 2i \\ 3 − 2i & −1 \end{pmatrix}} = \begin{pmatrix} 1 − i & −2i \\ 3 + 2i & −1 \end{pmatrix} ∉ {A, −A}
which implies λ ∈ R.
(b): If A = −A^*, then:
λ = ⟨Ax, x⟩ = ⟨x, A^* x⟩ = ⟨x, −Ax⟩ = \overline{⟨−Ax, x⟩} = −\overline{⟨Ax, x⟩} = −\overline{λ},
which implies that λ is purely imaginary.
Proposition 6.26.
Similar matrices have the same characteristic polynomial and thus the same eigen-
values.
A − λ1 = S −1 BS − λS −1 S = S −1 (B − λ1)S.
Thus, A−λ1 and B−λ1 are similar matrices. Similar matrices have the same determinant:
Remark:
Later, we will see that any matrix A ∈ Cn×n is similar to a triangular matrix.
(a) spec(A) 6= ∅.
Looking at Proposition 6.4, we see what we have to do in order to calculate the eigenvectors
of a given matrix A if we already know the eigenvalues λ:
Note that Eig(λ) is always a linear subspace and also makes sense in the case when λ is not an eigenvalue of A. In this instance, we simply have Eig(λ) = {o}.
and tells you how often the eigenvalue λj occurs in the characteristic polynomial.
We also define
(3) If one eigenvalue is found, we can reduce the characteristic polynomial by equat-
ing coefficients (or polynomial division).
(4) The eigenvectors x are given by the solutions of the LES (A − λ1)x = o for
each eigenvalue, where only the nonzero solutions x 6= o are interesting.
Example 6.32.
p(λ) = −λ3 + 5λ2 − 8λ + 6
• n = 3 is odd: “−λ3 ”
• Try some values and find: λ1 = 3.
Equating coefficients (or polynomial division):
We derive four equations for three unknowns in the following way:
−λ3 + 5λ2 −8λ + 6 = (aλ2 +bλ + c)(λ − 3) = aλ3 + (b − 3a)λ2 + (c − 3b)λ − 3c
−1 =a ⇒ a = −1
5 = b − 3a ⇒ 5=b+3 ⇒ b=2
−8 = c − 3b ⇒ c − 6 = −8 ⇒ c = −2
6 = −3c fulfilled, so λ1 = 3 is really a root of p.
Factorisation:
p(λ) = (−λ2 + 2λ − 2)(λ − 3)
Solution of quadratic equation:
λ_{2,3} = \frac{−b ± \sqrt{b² − 4ac}}{2a} = \frac{−2 ± \sqrt{4 − 8}}{−2} = 1 ± i.
Result:
p(λ) = −(λ − 3)(λ − (1 + i))(λ − (1 − i)).
Exercise 6.33.
Let A be a square matrix and λ1 , λ2 two different eigenvalues. Show that
Proof. Let us denote { p(λ) : λ ∈ spec(A) } by M . Then we have to show two inclusions
to prove spec p(A) = M .
(⊃): For λ ∈ spec(A) with eigenvector x we use (6.4) and get
p(A)x = (pm Am + pm−1 Am−1 + . . . + p1 A + p0 1)x
= pm Am x + pm−1 Am−1 x + . . . + p1 Ax + p0 1x
= pm λm x + pm−1 λm−1 x + . . . + p1 λx + p0 x
= (pm λm + pm−1 λm−1 + . . . + p1 λ + p0 )x = p(λ)x.
Hence, the number p(λ) ∈ M is an eigenvalue of the matrix p(A) with the same eigenvector
x.
(⊂): First, assume that p is a constant polynomial p(z) = p0 ∈ C. Then let λ ∈
spec(p(A)), which means 0 = det(p(A) − λ1) = (p0 − λ)n . We conclude λ = p0 and
λ ∈ M.
Now assume that p is not constant and µ ∈ / M = { p(λ) : λ ∈ spec(A) }. (We do a
contraposition). Then the polynomial q(z) := p(z) − µ can be written in linear factors
q(z) = c \prod_{j=1}^{m} (z − a_j)
We can expand the spectral mapping theorem also to other functions besides polynomials.
For example, it also works for the negative powers. This means that we can calculate the
eigenvalues of A−1 if you know the eigenvalues of A. In this case, you do not have to
calculate the inverse A−1 :
Let all the eigenvalues of A be nonzero (in this case, recalling Proposition 6.28 the inverse
A−1 exists) and fix one of them as λ with corresponding eigenvector x 6= o. Then we can
multiply the equation
Ax = λx
from the left by A−1 . Hence we get x = λA−1 x and also:
Rule of thumb:
A−1 has the same eigenvector x as A – but for the eigenvalue λ−1 instead of λ.
We simply get:
spec(A−1 ) = {λ−1 : λ ∈ spec(A)}.
Of course, λ−1 is always well-defined since λ 6= 0.
Example 6.36. Let A = \begin{pmatrix} 3 & 2 \\ 1 & 2 \end{pmatrix}. Now, one could calculate the inverse, using formulas from Chapter 4:
A^{-1} = \begin{pmatrix} 1/2 & −1/2 \\ −1/4 & 3/4 \end{pmatrix}.
This matrix has the eigenvalues µ_1 = 1/4 and µ_2 = 1 and the eigenvectors x_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix} and x_2 = \begin{pmatrix} 1 \\ −1 \end{pmatrix}.
If one is only interested in the eigenvalues, we do not have to calculate the matrix A−1 .
We just use the rule from above and know that the eigenvalues of A−1 are the reciprocals
of λ1 = 4 and λ2 = 1. The eigenvectors are the same as the eigenvectors of A.
We do not have to stop here. We can multiply A−1 again from the left to equation (6.5)
and, doing this repeatedly, we get
A−2 x = λ−2 x, A−3 x = λ−3 x, etc.,
where A−n means (A−1 )n .
Hence, we can expand the equation (6.4) to all numbers m ∈ Z:
Am x = λm x for all m ∈ Z.
eigenvectors x_1 = (2, 1)^T and x_2 = (1, −1)^T.
Hence we know the two vectors, x1 and x2 , that span the lines g3 and g4 from the start
of the chapter, respectively. Also, we know the “stretch factors” λ1 = 4 and λ2 = 1 that
describe the acting of fA in the direction g3 and g4 , respectively. Therefore, choosing a
coordinate system with respect to (x1 , x2 )-coordinates makes calculations a lot easier:
fA
x1 x1
x2 x2
We immediately see a big advantage for this coordinate system: We can apply A several
times without effort. For example, the operation A^{100} is directly given by: A^{100} u = 4^{100} α_1 x_1 + 1^{100} α_2 x_2.
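A small NumPy sketch of this idea: instead of multiplying by A a hundred times, decompose u in the eigenvector basis and scale the coefficients (using the matrix A = (3 2; 1 2) from Example 6.36 as an illustration):

    import numpy as np

    A = np.array([[3.0, 2.0], [1.0, 2.0]])
    X = np.array([[2.0, 1.0], [1.0, -1.0]])   # eigenvectors (2,1) and (1,-1) as columns
    u = np.array([1.0, 2.0])

    alpha = np.linalg.solve(X, u)             # coordinates of u w.r.t. the eigenbasis
    A100_u = X @ (np.array([4.0**100, 1.0**100]) * alpha)   # 4^100*a1*x1 + 1^100*a2*x2

    print(np.allclose(A100_u, np.linalg.matrix_power(A, 100) @ u))   # True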
However, we already know that in general we cannot expect to stay in the real numbers. If
the eigenvalues are strictly complex numbers, the picture gets a little bit more complicated
but the properties remain. Let A ∈ Cn×n be a n × n matrix with complex entries. Let
λ1 , . . . , λn ∈ C denote the n eigenvalues A (which means the zeros pA ) counted with
algebraic multiplicities, and let x1 , . . . , xn ∈ Cn be the corresponding eigenvectors. Then
we already know:
A (x_1 \cdots x_n) = (λ_1 x_1 \cdots λ_n x_n) = (x_1 \cdots x_n) \underbrace{\begin{pmatrix} λ_1 & & \\ & \ddots & \\ & & λ_n \end{pmatrix}}_{=: D},   (6.6)
A(α1 x1 + . . . + αn xn ) = λ1 α1 x1 + . . . + λn αn xn .
Diagonalisation of A
AX = XD.   (6.7)
Multiplying (6.7) by X^{-1} from the right gives
A = XDX^{-1},   (6.8)
and multiplying (6.7) by X^{-1} from the left gives
X^{-1} A X = D   (coordinates w.r.t. the basis (x_1, . . . , x_n)).
The important question “Is that even possible?” is equivalent to the following:
Example 6.38. (a) The matrix A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} has e_1 and e_2 as eigenvectors and they form a basis of C². Hence, A is diagonalisable.
(b) The matrix B = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix} has \begin{pmatrix} 1 \\ 0 \end{pmatrix} and \begin{pmatrix} 1 \\ 1 \end{pmatrix} as eigenvectors and they form a basis of C². Hence, B is diagonalisable.
(c) The matrix C = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} has only eigenvectors in the direction \begin{pmatrix} 1 \\ 0 \end{pmatrix} and they cannot form a basis of C². Hence, C is not diagonalisable.
Choosing a basis consisting of eigenvectors, we know that A acts like a diagonal matrix.
Proof. We use mathematical induction over k. The case k = 2 was proven Exercise 6.33.
Now the induction hypothesis is that (x1 , . . . , xk ) is linearly independent for k different ei-
genvalues. In the induction step, we now look at k+1 different eigenvalues λ1 , . . . , λk , λk+1
and corresponding eigenvectors (x1 , . . . , xk , xk+1 ). Now assume that this family is linearly
dependent. Then we can choose coefficients βi such that xk+1 = β1 x1 + · · · + βk xk holds.
Hence, applying A and subtracting λ_{k+1} x_{k+1} = λ_{k+1}(β_1 x_1 + · · · + β_k x_k), we get o = \sum_{i=1}^{k} β_i (λ_i − λ_{k+1}) x_i.
By the induction hypothesis, we conclude (λk+1 − λi )βi = 0 for all i. Since not all βi can
be zero, we find at least one j ∈ {1, . . . , k} with λk+1 = λj , which is a contradiction.
Therefore, we conclude:
matrix.
(c) Look at the 3 × 3 matrices:
A = \begin{pmatrix} 4 & 0 & 0 \\ 1 & 6 & 3 \\ −2 & −4 & −2 \end{pmatrix}   and   B = \begin{pmatrix} 8 & 8 & 4 \\ −1 & 2 & 1 \\ −2 & −4 & −2 \end{pmatrix}.
If you calculate the characteristic polynomials, you find
For A from Example 6.41 (c), we find α(0) = 1 = γ(0), α(4) = 2 = γ(4).
For B from Example 6.41 (c), we get α(0) = 1 = γ(0), α(4) = 2 6= 1 = γ(4).
(a) A is diagonalisable,
Proof. Exercise.
For symmetric or selfadjoint matrices, we can improve Proposition 6.39 even more:
Proof. Since ⟨x, λ'x'⟩ = \overline{⟨λ'x', x⟩} = \overline{λ' ⟨x', x⟩} = \overline{λ'}\, \overline{⟨x', x⟩} = λ' ⟨x, x'⟩ (using λ' ∈ R), we have
λ ⟨x, x'⟩ = ⟨λx, x'⟩ = ⟨Ax, x'⟩ = ⟨x, Ax'⟩ = ⟨x, λ'x'⟩ = λ' ⟨x, x'⟩
and, hence, (λ − λ') ⟨x, x'⟩ = 0. This means that the second factor has to be zero.
X = x1 . . . xn
Sketch of proof. Use Proposition 6.43 and Gram-Schmidt for each eigenspace to find an
ONB of Cn . Then X ∗ X = 1 and also X ∗ = X −1 .
Remark: Important!
Actually, we could generalise the Proposition from above and equation (6.9). It
holds if and only if the matrix A is normal (i.e. AA∗ = A∗ A).
Proposition 6.45.
For a diagonalisable A ∈ Cn×n , let λ1 , . . . , λn be the eigenvalues counted with algeb-
raic multiplicities. Then
det(A) = \prod_{i=1}^{n} λ_i   and   tr(A) = \sum_{i=1}^{n} λ_i,
where tr(A) := \sum_{j=1}^{n} a_{jj} is the sum of the diagonal, the so-called trace of A.
Proof. Exercise!
Remark:
Later, we will see that the result of Proposition 6.45 actually holds for all matrices
A ∈ Cn×n .
Rotation of boxes
A box of size 10 cm × 20 cm × 30 cm rotates around an axis given by the vector ω ∈ R3. The whole box has an angular momentum L ∈ R3. L is given by a linear equation in ω, which means
L = Jω
and diagonalise the symmetric matrix A: λ_1 = 4, λ_2 = 0, x_1 = \frac{1}{2} \begin{pmatrix} \sqrt{3} \\ −1 \end{pmatrix}, x_2 = \frac{1}{2} \begin{pmatrix} 1 \\ \sqrt{3} \end{pmatrix}, so that
A = XDX^* = XDX^T   with   X = \frac{1}{2} \begin{pmatrix} \sqrt{3} & 1 \\ −1 & \sqrt{3} \end{pmatrix},   D = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}.
Then we get 2 = x^T A x + b^T x = x^T (XDX^T) x + b^T x = (x^T X) D (X^T x) + b^T X (X^T x). Setting \begin{pmatrix} u \\ v \end{pmatrix} = u := X^T x = \frac{1}{2} \begin{pmatrix} \sqrt{3} & −1 \\ 1 & \sqrt{3} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} simplifies the equation to
2 = u^T D u + b^T X u = (u\;\; v) \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} + \underbrace{(0\;\; −2)}_{b^T X} \begin{pmatrix} u \\ v \end{pmatrix} = 4u² − 2v.
The more complicated equation from above looks a lot simpler in the "optimal" (x_1, x_2)-coordinate system:
2 = 4u² − 2v,   i.e.   v = 2u² − 1.
There you immediately see that it is a parabola. The transformation we did, \begin{pmatrix} x \\ y \end{pmatrix} ↦ \begin{pmatrix} u \\ v \end{pmatrix} = X^T \begin{pmatrix} x \\ y \end{pmatrix},
[Figure: the parabola v = 2u² − 1 in the (u, v)-coordinate system, which is the (x, y)-system rotated by 30°.]
n = 2: det(A) = λ1 λ2 .
• det(A) > 0 ⇒ eigenvalues have the same sign ⇒ A (pos. or neg.) definite. If
a11 = eT1 Ae1 > 0, then pos., otherwise neg. definite.
• det(A) < 0 ⇒ A indefinite
In general: A symmetric A is positive definite if all left upper subdeterminants are positive.
Summary
• All matrices A we considered here were square matrices.
• A vector x 6= o, which A only scales, which means Ax = λx, is called an eigenvector ;
the corresponding scaling factor λ is called an eigenvalue. The set of all eigenvalues
of A is called the spectrum.
• λ is an eigenvalue of A if and only if (A − λ1)x = o has non-trivial solutions x 6= o
(namely the eigenvectors). This is fulfilled if and only if det(A − λ1) = 0.
• For A ∈ Cn×n , we define pA (λ) := det(A − λ1), the characteristic polynomial of A,
which is a polynomial of degree n in the variable λ. It has exactly n complex zeros:
the eigenvalues of A.
• The eigenvalues λ are in general complex numbers, also the eigenvectors are complex
x ∈ Cn . All matrices should be considered as A ∈ Cn×n .
• Also in Cn, we can define inner products. Here, we only use the standard inner product ⟨x, y⟩, defined by x_1 \overline{y_1} + · · · + x_n \overline{y_n}. Hence we get a new operation for matrices: A^* := \overline{A}^T = (\overline{a_{ji}}). It satisfies ⟨Ax, y⟩ = ⟨x, A^* y⟩ for all x, y.
• Checking eigenvalues: Product of all eigenvalues of A is equal to det(A); the sum is
equal to tr(A).
• The matrix A is invertible if and only if all eigenvalues are nonzero.
• The eigenvalues of a triangular matrix are the diagonal entries. The eigenvalues of
a block matrix in triangular form are given by the eigenvalues of the blocks on the
diagonal.
• The eigenvalues of Am are given by the eigenvalues of A to the power of m where
m ∈ Z. The eigenvectors stay the same as for A. For example, 3A17 − 2A3 + 5A−6
has the eigenvalues 3λ17 − 2λ3 + 5λ−6 , where λ goes through all eigenvalues of A.
The eigenvectors still stay the same.
• A is called diagonalisable if it can be written as XDX^{-1}, where D is a diagonal matrix consisting of the eigenvalues of A and X has the eigenvectors as columns. This only works if there are enough eigenvector directions so that X is invertible.
• A is diagonalisable if and only if for all eigenvalues λ the algebraic multiplicity of λ
is the same as the geometric multiplicity.
• If A = A∗ , then A is diagonalisable and the eigenvalues are real and eigenvectors
can be chosen to be orthonormal.
7 General vector spaces
’Oh man, capitalism sucks!’ cries the Kangaroo as it flips the Monopoly
board.
Marc-Uwe Kling
We already mentioned the general definition of an abstract vector space. However, most
of the time, we focused on the vector spaces Rn and Cn . Now, we really start walking
into this new abstract terrain.
Example 7.2. Rn and Cn . At this point, we are very familiar with the space Fn , where
the vectors have n components consisting of numbers from F and the addition and scalar
multiplication is done componentwise:
λ ∈ F, v = (v_1, . . . , v_n)^T   ⇒   λv := (λv_1, . . . , λv_n)^T,
u = (u_1, . . . , u_n)^T, v = (v_1, . . . , v_n)^T   ⇒   u + v := (u_1 + v_1, . . . , u_n + v_n)^T.
This is now just a special case of an F-vector space.
Of course: Vectors from Rn and Cn are also vectors in this new sense. However, now we
have a lot more examples:
Example 7.3. Matrices. The set of matrices V := Fm×n together with the matrix
addition and scalar multiplication
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} + \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{pmatrix} := \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{pmatrix},
λ · \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} := \begin{pmatrix} λa_{11} & \cdots & λa_{1n} \\ \vdots & & \vdots \\ λa_{m1} & \cdots & λa_{mn} \end{pmatrix}
(α · f )(x) := α · f (x)
(f + g)(x) := f (x) + g(x)
for all x ∈ R.
(Figure: the graphs of cos, arctan, the sum arctan + cos and the scaled function 2 · cos.)
This is a natural definition for the α-multiple of a function and the sum of two functions.
Obviously, α · f and f + g are again well-defined functions R → R and hence elements in
F(R). Now we have to check the rules (1)–(8). This is an exercise.
Hence, F(R) with + and · is a vector space.
So, we can also see functions as vectors, since they obey the same calculation rules.
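A small sketch of this point of view in Python: functions are represented as callables, and the pointwise operations above are implemented literally. The concrete functions chosen here are just an illustration.

```python
import math

# Pointwise operations, mirroring (alpha*f)(x) := alpha*f(x) and (f+g)(x) := f(x)+g(x).
def scale(alpha, f):
    return lambda x: alpha * f(x)

def add(f, g):
    return lambda x: f(x) + g(x)

v = add(scale(8, math.cos), scale(15, math.atan))   # the "vector" 8*cos + 15*arctan

print(v(0.0))    # 8*cos(0) + 15*arctan(0) = 8.0
print(v(1.0))    # 8*cos(1) + 15*arctan(1)
```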
Proof. We have 0 · f = (0 + 0) · f = (0 · f ) + (0 · f ) by rule (8). Adding the vector −(0 · f ) on both sides, we get o = 0 · f . For the second claim, we have, using rules (8) and (5),
o = 0 · f = (1 + (−1)) · f = (1 · f ) + ((−1) · f ) = f + ((−1) · f ).
Example 7.6. Polynomial functions. Let P(R) denote the set of polynomial functions
f : R → R. We know that P(R) is a nonempty subset of F(R) (set of all functions
R → R) from Example 7.4. The addition + and scalar multiplication · are just inherited
from F(R).
Is P(R) also a vector space? Before checking (1)–(8), we have to prove that the vector
addition + and scalar multiplication · is well-defined inside P(R), which means that you
cannot leave P(R) by these operations:
We already know equation (7.1): It means that P(R) is closed under the addition + and
the scalar multiplication ·. We can easily show that (7.1) is correct for polynomials.
Now checking (1)–(8) is very fast because:
This finishes the proof that P(R) is a vector space with respect to + and ·.
Note that this proof could be done without much work. The only time you had to look at P(R), which means at the polynomial functions, was equation (7.1). All the other properties are inherited from the superset F(R).
Example 7.8. Quadratic polynomials. Let P2 (R) be the set of all polynomials with
degree ≤ 2, which means
all functions p : R → R, p(x) = a2 x2 + a1 x + a0 with a2 , a1 , a0 ∈ R.
Is P2 (R) with the vector addition + and · from F(R) a vector space?
Obviously, P2 (R) ⊂ F(R) and P2 (R) ≠ ∅. Using Proposition 7.7 we only have to check that P2 (R) is closed under + and ·, which means that we have to check (a) and (b):
Let p, q ∈ P2 (R) and α ∈ R. Then, there are a2 , a1 , a0 , b2 , b1 , b0 ∈ R such that
Hence:
(p + q)(x) = p(x) + q(x) = (a2 x2 + a1 x + a0 ) + (b2 x2 + b1 x + b0 )
= (a2 + b2 )x2 + (a1 + b1 )x + (a0 + b0 ),
(α · p)(x) = α · p(x) = α · (a2 x2 + a1 x + a0 )
= (αa2 )x2 + (αa1 )x + (αa0 )
We conclude that p + q ∈ P2 (R) and α · p ∈ P2 (R). The set P2 (R) is a subspace of F(R) and a vector space in its own right.
Analogously, for n ∈ N0 , we define Pn (R) as the set of all polynomials with degree ≤ n.
It forms again a vector space with the operations + and · from F(R).
Here is another vector space:
Example 7.9. Upper triangular matrices. Let n ∈ N and consider the set of all upper triangular matrices A ∈ Rn×n . The operations + and · are the same as before for all matrices. Since this set contains o (and is hence nonempty), the sum of two upper triangular matrices is again upper triangular, and scaled upper triangular matrices are again upper triangular, we know that it is a subspace of the R-vector space Rn×n (from Example 7.3) and hence a vector space itself.
If we look back at the polynomial spaces, we notice that we have the following inclusions:
P0 (R) ⊂ P1 (R) ⊂ P2 (R) ⊂ · · · ⊂ P(R) ⊂ F(R).
(Figure: nested sets F(R) ⊃ P(R) ⊃ P10 (R) ⊃ P2 (R) ⊃ P1 (R) ⊃ P0 (R).)
The vector space F(R) is the largest of these. The polynomial space Pn (R) gets smaller if we choose n smaller. When we talk about sizes in vector spaces, we remember the definition of the dimension of a subspace. We suspect that the dimension of Pn (R) is n + 1. But first of all, we have to define all the notions again.
• The set of all possible linear combinations for the vectors of a subset M ⊂ V
is called the linear hull or span of M :
Span (M ) := {λ1 u1 + · · · + λk uk : u1 , . . . , uk ∈ M , λ1 , . . . , λk ∈ F , k ∈ N} .
The definitions and proofs for related propositions are literally the same as in Section 3.7
for the vector space Rn and its subspace. Therefore, we just summarise the facts in this
more abstract case:
However, the examples above were all well-known matrix spaces. It would be more in-
teresting to look at our new function spaces like the polynomial space P2 (R) from Ex-
ample 7.8:
Indeed, (7.5) is an LES with three equations and three unknowns α, β, γ, which we can just solve with our known methods. We get α = β = γ = 0. Hence, B = (m0 , m1 , m2 ) is linearly independent and a basis of P2 (R). We also get dim(P2 (R)) = 3.
Sketch of proof. This works the same as for P2 (R), see Example 7.13. In order to show the
linear independence, we have to choose n+1 different values for x. The (n+1)×(n+1)-LES
always has a unique solution.
Now, by knowing that the monomials are linearly independent, we can always solve equa-
tions by equating coefficients:
an x^n + . . . + a1 x + a0 = bn x^n + . . . + b1 x + b0 ,   (7.6)
Proof. Equation (7.6) means (an − bn )mn + . . . + (a1 − b1 )m1 + (a0 − b0 )m0 = o. Because
of the linear independence, we have (an − bn ) = . . . = (a1 − b1 ) = (a0 − b0 ) = 0.
Remark:
Since dim(Pn (R)) = n + 1 and we have the inclusions
we conclude that dim(P(R)) and dim(F(R)) cannot be finite natural numbers. Sym-
bolically, we write dim(P(R)) = ∞ in such a case.
Since B is a generating system and linearly independent, each v from V has a linear
combination
v = α1 b1 + . . . + αn bn (7.7)
where the coefficients α1 , . . . , αn ∈ F are uniquely determined. We call these numbers
the coordinates of v with respect to the basis B and sometimes write vB for the vector
consisting of these numbers:
One also sees the notation [x]B for the coordinate vector. When fixing a basis B in V ,
then each vector v ∈ V uniquely determines a coordinate vector vB ∈ Fn – and vice versa.
Then:
v = 8 cos + 15 arctan.
(Figure: the graphs of sin, cos, arctan and of v = 8 cos + 15 arctan.)
We find v = 8 cos + 15 arctan directly by using its coordinates:
ΦB (v) = (0, 8, 15)^T .
Instead of using v ∈ V , one can also do calculations with ΦB (v) = (α1 , . . . , αn )^T ∈ Fn .
Of course, calculations in Fn might be simpler and more suitable for a computer than the
calculations in an abstract vector space.
Example 7.17. The polynomials p, q ∈ P3 (R) given by p(x) = 2x3 − x2 + 7 and q(x) =
x2 +3 can be represented with the monomial basis B = (m0 , m1 , m2 , m3 ) by the coordinate
vectors:
ΦB (p) = (7, 0, −1, 2)^T ∈ R4   and   ΦB (q) = (3, 0, 1, 0)^T ∈ R4 ,   (7.10)
since p = 7m0 + 0m1 + (−1)m2 + 2m3 and q = 3m0 + 0m1 + 1m2 + 0m3 .
In the same manner, the two polynomials (2p)(x) = 4x³ − 2x² + 14 and (p + q)(x) = 2x³ + 10 have the coordinate vectors
ΦB (2p) = (14, 0, −2, 4)^T = 2 ΦB (p)   and   ΦB (p + q) = (10, 0, 0, 2)^T = ΦB (p) + ΦB (q).
This shows that we can also calculate with the coordinate vectors from equation (7.10).
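A minimal numpy sketch of this calculation with the coordinate vectors from (7.10):

```python
import numpy as np

# Coordinate vectors w.r.t. the monomial basis B = (m0, m1, m2, m3), as in (7.10).
p = np.array([7.0, 0.0, -1.0, 2.0])   # p(x) = 2x^3 - x^2 + 7
q = np.array([3.0, 0.0, 1.0, 0.0])    # q(x) = x^2 + 3

print(2 * p)    # [14.  0. -2.  4.]  ->  (2p)(x) = 4x^3 - 2x^2 + 14
print(p + q)    # [10.  0.  0.  2.]  ->  (p+q)(x) = 2x^3 + 10
```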
Example 7.18. The matrix A = (1 2; 0 3) ∈ R2×2 has the coordinate vector (1, 2, 3)^T with respect to the basis chosen above. The matrix 3A then has the coordinate vector (3, 6, 9)^T.
The matrix
5 0 5
C = 0 2 0
5 0 5
has the following coordinate vector with respect to the basis from equation (7.3): (5, 2)^T . The matrix 2C then has the coordinate vector (10, 4)^T .
which is also invertible. We have called it the basis isomorphism. Its inverse is the map
ΦB^{-1} : Fn → V   with   ΦB^{-1}(ej ) = bj for all j.
For a given element v = α1 b1 + . . . + αn bn , we can write:
v = α1 b1 + . . . + αn bn = α1 ΦB^{-1}(e1 ) + . . . + αn ΦB^{-1}(en ) = ΦB^{-1}(ΦB (v)) .
Example 7.19. Consider the already introduced monomial basis (now with different
order!) B = (m2 , m1 , m0 ) = (x 7→ x2 , x 7→ x, x 7→ 1) of the space P2 (R) and the
polynomial p ∈ P2 (R) defined by p(x) = 4x2 + 3x − 2. Then:
p = ΦB^{-1}( (4, 3, −2)^T ) = ΦB^{-1}(ΦB (p))   since p = 4m2 + 3m1 − 2m0 .
Now, let C = (c1 , . . . , cn ) be another basis of V . Then a given vector v ∈ V can also be written as a linear combination with respect to this new basis, which means v = α′1 c1 + . . . + α′n cn = ΦC^{-1}(ΦC (v)) with some coefficients α′j ∈ F.
(Diagram: the two coordinate maps ΦB^{-1} and ΦC^{-1} from two copies of Fn into V ; we look for the map Fn → Fn that completes the triangle.)
We see that we want to create a map Fn → Fn that makes this translation from one
coordinate system into the other. Looking at the linear maps ΦB and ΦC , we see that
this is just a composition of two linear maps. To get the map from “left to right”, we define f : Fn → Fn , f := ΦC ◦ ΦB^{-1}. More concretely, for all canonical unit vectors ej ∈ Fn , we get:
f (ej ) = ΦC (ΦB^{-1}(ej )) = ΦC (bj )   (7.11)
Since f is a linear map, we find a uniquely determined matrix A such that f (x) = Ax for
all x ∈ Fn . This matrix is determined by equation (7.11) and given a suitable name:
The corresponding linear map gives us a sense of switching from basis B to the basis C.
Also a good mnemonic is:
ΦC^{-1}(TC←B x) = ΦB^{-1}(x)   for all x ∈ Fn   (7.13)
Now, if we have a vector v ∈ V and its coordinate vector ΦB (v) and ΦC (v), respectively,
then we can calculate:
Transformation formula
B = (m2 , m1 , m0 ) = (x ↦ x², x ↦ x, x ↦ 1) =: (b1 , b2 , b3 )
and
C = (m2 − (1/2) m1 , m2 + (1/2) m1 , m0 ) =: (c1 , c2 , c3 )
defines also a basis of P2 (R). Now we know how to change between these two bases.
Therefore, we calculate the transformation matrices. The first thing you should note is
that the basis C is already given in linear combinations of the basis vectors from B. Hence
we get:
ΦB (c1 ) = (1, −1/2, 0)^T ,  ΦB (c2 ) = (1, 1/2, 0)^T ,  ΦB (c3 ) = (0, 0, 1)^T
   ⟹   TB←C =
   1    1   0
 −1/2  1/2  0
   0    0   1 .
Then ΦB (p) = TB←C ΦC (p) gives us the wanted translation. If we want to calculate the reverse translation, we have to calculate the inverse matrix of TB←C .
By calculating the inverse, we get the other transformation matrix TC←B = (TB←C )^{-1} :
TC←B = ( ΦC (b1 ) ΦC (b2 ) ΦC (b3 ) ) =
  1/2  −1  0
  1/2   1  0
   0    0  1 .
In this case neither the matrix TB←C nor TC←B is obviously given. In such a case, it might
be helpful to include a third basis, which is well-known. In R2 this third basis should be
of course the standard basis: E = (e1 , e2 ).
(Diagram: V = R² with the three coordinate maps ΦB^{-1}, ΦE^{-1} = id and ΦC^{-1} from three copies of R²; the unknown maps between the copies are the transformation matrices we look for.)
The idea is to calculate first the transformation matrices TE←B and TE←C and the inverses
and then compose the maps in a way to get the transformation matrices TB←C and TC←B .
The basis elements of B and C are already given in the coordinates of the standard basis.
Hence:
         1 3                  1 2
TE←B =   2 4    and  TE←C =   0 2 .
Question: Can we do a similar thing in the polynomial space? Consider bases B and C that are not the simple monomial basis:
B = (2m2 − m1 , −8m1 − 2m0 , m2 + 4m1 + m0 ) =: (b1 , b2 , b3 )
and
C = (m1 + m0 , 2m2 + 2m1 , m2 + m0 ) =: (c1 , c2 , c3 ).
Answer: Yes, we can do the same by adding the monomial basis (or another well-known basis) in the middle. We call the monomial basis A, which means A = (m2 , m1 , m0 ). Then TA←B and TA←C are immediately given:
2 0 1 0 2 1
TA←B = −1 −8 4 and TA←C = 1 2 0 ,
0 −2 1 1 0 1
and then we get TB←C = TB←A TA←C = (TA←B )^{-1} TA←C .
Since we again have to find the inverse of a matrix, we can use the Gauß-Jordan algorithm again.
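Alternatively, one can let a computer do the inversion. A minimal numpy check of TB←C = (TA←B)⁻¹ TA←C for the matrices above:

```python
import numpy as np

# Transformation matrices read off from the bases B and C above.
T_A_B = np.array([[ 2,  0, 1],
                  [-1, -8, 4],
                  [ 0, -2, 1]], dtype=float)   # T_{A<-B}
T_A_C = np.array([[0, 2, 1],
                  [1, 2, 0],
                  [1, 0, 1]], dtype=float)     # T_{A<-C}

# T_{B<-C} = T_{B<-A} T_{A<-C} = (T_{A<-B})^{-1} T_{A<-C}
T_B_C = np.linalg.solve(T_A_B, T_A_C)          # solve instead of forming the inverse
print(T_B_C)

# Check: going C -> A directly equals going C -> B -> A.
print(np.allclose(T_A_B @ T_B_C, T_A_C))       # True
```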
(Figure: the basis vectors c1 , . . . , c50 ∈ R50, each drawn as a sampled signal over the 50 sample points; c1 , . . . , c25 and c26 , . . . , c50 show oscillation patterns with an increasing number of oscillations, from 1 up to 25.)
One can show: C = (c1 , . . . , c50 ) is also linearly independent and hence a basis of
R50 .
(Figure: a sampled signal f over the 50 sample points.)
Compression: One stores only the coordinates in ΦC (f ). One can also focus on the (for humans) important frequencies and ignore the higher and lower ones (e.g. MP3 file format). All this saves storage space compared to storing the coordinates ΦB (f ) = (f1 , . . . , f50 )^T (e.g. WAV file format). Similar ideas exist for two-dimensional signals like pictures: ⇒ BMP vs. JPG.
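A rough sketch of this compression idea in numpy. The basis used here is an assumption: sampled cosine oscillations (DCT-like), since the exact family from the figure is not reproduced here; the signal f is made up as well.

```python
import numpy as np

n = 50
t = np.arange(n)
# Assumed oscillation basis: column k has k oscillations over the 50 samples.
C = np.stack([np.cos(np.pi * k * (t + 0.5) / n) for k in range(n)], axis=1)

f = np.sin(2 * np.pi * t / n) + 0.3 * np.random.default_rng(0).standard_normal(n)

coeff = np.linalg.solve(C, f)     # coordinates Phi_C(f) w.r.t. the new basis
coeff[10:] = 0.0                  # keep only the 10 "lowest" frequencies
f_compressed = C @ coeff          # back to sample values Phi_B(.)

print(np.linalg.norm(f - f_compressed))   # small if f is dominated by low frequencies
```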
ᾱ :=  α   if F = R,
      the complex conjugate of α   if F = C.
A∗ :=  A^T   if F = R (transpose),
       Ā^T   if F = C (adjoint).
Recall all the properties we could derive from these four rules. For example:
Again, the standard inner product is the most important one in Rn and Cn . Since it describes the usual euclidean geometry, we denote it by ⟨x, y⟩euclid in both cases.
(b) For V = F2 and x = (x1 , x2 )^T , y = (y1 , y2 )^T ∈ F2 we define an inner product by
⟨x, y⟩ := x1 ȳ1 + x1 ȳ2 + x2 ȳ1 + 4 x2 ȳ2 .
(c) In contrast,
⟨x, y⟩ := x1 ȳ2 + x2 ȳ1
does not define an inner product on F2 : it is symmetric and linear in the first argument, but not positive definite. For example, x = (−1, 1)^T gives us ⟨x, x⟩ = −2.
(d) Let V = P([0, 1], F) be the F-vector space of all polynomial functions f : [0, 1] → F.
Then, we define for f , g ∈ V the inner product
⟨f , g⟩ := ∫₀¹ f (x) · \overline{g(x)} dx .
You should see the analogy to ⟨x, y⟩euclid in Fn . All data is now continuously distributed over [0, 1], and we need an integral instead of a sum. Often, we are in the case F = R and can ignore the complex conjugation \overline{g(x)}.
Recall that for a general inner product on Rn , there is a uniquely determined positive definite matrix A such that
⟨x, y⟩ = ⟨Ax, y⟩euclid   for all x, y ∈ Rn .
In the same way this also works for the complex vector space Cn . We just have to expand
the definition of positive definite matrices in this case:
Some authors might be using only equation (7.19) for defining positive definite
matrices in the real case. Therefore to play it safe, we often talk about matrices
that are “selfadjoint and positive definite”.
For example, for A = (1 1; 1 4) and x = (x1 , x2 )^T we compute
⟨Ax, x⟩euclid = x1 x̄1 + x2 x̄1 + x1 x̄2 + 4 x2 x̄2 = |x1 + x2 |² + 3 |x2 |² ≥ 0.
This can only be 0 if x1 = −x2 and x2 = 0, hence only for x = o.
(c) The matrix A = (0 1; 1 0) is selfadjoint but not positive definite. For example, for x = (1, −1)^T the value ⟨Ax, x⟩euclid is negative.
Testing a matrix A ∈ Fn×n for positive definiteness can be much work, even in the case
n = 2. Therefore, the next criterion is very useful:
We skip the proof here. Keep in mind that “positive” always means strictly greater than
zero (>0)! Claim (iv) is called Sylvester’s criterion.
Example 7.29. Let us check the proposition for the matrix A = (1 1; 1 4): it is positive definite, and indeed both left upper subdeterminants are positive, namely 1 > 0 and det(A) = 4 − 1 = 3 > 0.
We have already seen in Proposition 7.26 that hx, yi := hAx, yieuclid defines an inner
product in Fn if A is positive definite. In some sense, also the converse is correct:
Proof. Exercise!
Example 7.31. Look at the R-vector space P2 ([0, 1]) of all real polynomial functions
f : [0, 1] → R with degree ≤ 2. The integral
⟨p, q⟩ := ∫₀¹ p(x) q(x) dx ,   p, q ∈ P2 ([0, 1]),
defines an inner product. Let us check how to use Proposition 7.30 in this case. Choose
a basis B of P2 , for example the monomial basis B = (m0 , m1 , m2 ), and calculate the
associated Gramian matrix:
⟨mi , mj ⟩ = ∫₀¹ x^i x^j dx = ∫₀¹ x^{i+j} dx = [ x^{i+j+1}/(i+j+1) ]₀¹ = (1^{i+j+1} − 0^{i+j+1})/(i+j+1) = 1/(i+j+1)   (7.20)
and
G(B) =
  ⟨m0 , m0 ⟩  ⟨m1 , m0 ⟩  ⟨m2 , m0 ⟩          1/1  1/2  1/3
  ⟨m0 , m1 ⟩  ⟨m1 , m1 ⟩  ⟨m2 , m1 ⟩   =      1/2  1/3  1/4     (using (7.20)).
  ⟨m0 , m2 ⟩  ⟨m1 , m2 ⟩  ⟨m2 , m2 ⟩          1/3  1/4  1/5
Then, by Proposition 7.30: For all a, b, c, d, e, f ∈ R, we get:
⟨a m0 + b m1 + c m2 , d m0 + e m1 + f m2 ⟩ = ⟨ G(B) (a, b, c)^T , (d, e, f )^T ⟩euclid
= ad + 1/2 (ae + bd) + 1/3 (af + be + cd) + 1/4 (bf + ce) + 1/5 cf.
Let’s check this:
Let’s check this:
⟨a m0 + b m1 + c m2 , d m0 + e m1 + f m2 ⟩ = ∫₀¹ (a + bx + cx²)(d + ex + f x²) dx
= ∫₀¹ ad + (ae + bd)x + (af + be + cd)x² + (bf + ce)x³ + cf x⁴ dx
= ad ∫₀¹ dx + (ae + bd) ∫₀¹ x dx + (af + be + cd) ∫₀¹ x² dx + (bf + ce) ∫₀¹ x³ dx + cf ∫₀¹ x⁴ dx
= ad + 1/2 (ae + bd) + 1/3 (af + be + cd) + 1/4 (bf + ce) + 1/5 cf   (using (7.20)).
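The same check can be done numerically. A minimal sketch using the Gramian matrix G(B) from above (the two concrete polynomials are made-up examples):

```python
import numpy as np

# Gramian matrix for the monomial basis of P2([0,1]), cf. (7.20): G_ij = 1/(i+j+1).
G = np.array([[1.0 / (i + j + 1) for j in range(3)] for i in range(3)])

def inner(p, q):
    """<p, q> = integral_0^1 p(x) q(x) dx; p, q are coordinate vectors (a, b, c)
    meaning the polynomial a + b x + c x^2."""
    return p @ G @ q

p = np.array([1.0, 2.0, 3.0])     # 1 + 2x + 3x^2
q = np.array([4.0, 0.0, -1.0])    # 4 - x^2

# Compare with a direct numerical integration over a fine grid on [0, 1].
xs = np.linspace(0.0, 1.0, 200001)
vals = (p[0] + p[1]*xs + p[2]*xs**2) * (q[0] + q[1]*xs + q[2]*xs**2)
print(inner(p, q), vals.mean())   # both approximately 10.5667
```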
Proof. G(B) = G(B)∗ follows from ⟨bi , bj ⟩ = \overline{⟨bj , bi ⟩}. Using Proposition 7.30, we know ⟨G(B)ΦB (x), ΦB (x)⟩euclid = ⟨x, x⟩ > 0 for all x ∈ V \ {o} and hence also for all vectors ΦB (x) ∈ Fn \ {o}.
7.5.2 Norms
As always, let F ∈ {R, C} and V be an F-vector space. Even in the case V not having an
inner product, we can talk about the length of vectors if we define a length measure:
Example 7.34. (a) We already know the euclidean norm for Fn , given by
‖x‖ = ‖(x1 , . . . , xn )^T ‖ = √( |x1 |² + · · · + |xn |² ) ,   x ∈ Fn .   (7.21)
The p-norm
For each real number p ≥ 1, we set
‖x‖p = ‖(x1 , . . . , xn )^T ‖p := ( |x1 |^p + · · · + |xn |^p )^{1/p} ,   x ∈ Fn .   (7.22)
This defines the so-called p-norm. In fact, proving the triangle inequality (N3) is not trivial. The euclidean norm (7.21) is hence also called the 2-norm.
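A quick numerical illustration of (7.22) with numpy (the vector x is a made-up example):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

for p in (1, 2, 5):
    print(p, np.sum(np.abs(x)**p)**(1.0 / p))    # the p-norm (7.22)

print("inf", np.max(np.abs(x)))                   # the limit p -> infinity below
print(np.linalg.norm(x, 2), np.linalg.norm(x, np.inf))   # numpy's built-in norms agree
```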
(c) Another related norm is given by the limit
lim_{p→∞} ( |x1 |^p + · · · + |xn |^p )^{1/p} = max{ |x1 |, . . . , |xn | }.
Therefore, we define the so-called ∞-norm (or maximum norm) by ‖x‖∞ := max{ |x1 |, . . . , |xn | }.
Let us check for n = 2 the three properties in Definition 7.33. Let α ∈ F and x = (x1 , x2 )^T , y = (y1 , y2 )^T ∈ F2 .
On the right-hand side, you see the geometric picture for different norms. Usually, one calls these the “unit circles”, which means the sets
{ x ∈ R2 : ‖x‖p = 1 }.
Such a subset of R2 consists of all vectors with length 1, here for the different values p = 1, 2, 5 and ∞. (Figure: the unit circles for p = 1, 2, 5 and ∞ in the (x1 , x2 )-plane.) For p = 2, this is indeed a usual circle. However, also the different geometric shapes for the other p are interesting:
We can use the same symbol k · kp as before since the context is always clear.
On the right-hand side, you see some polynomial functions f , g, h ∈ P([a, b]). The area of the blue region is
‖f ‖1 = ∫_a^b |f (x)| dx
and the area of the red region is
‖g − h‖1 = ∫_a^b |g(x) − h(x)| dx .
(Figure: the graphs of f , g and h over [a, b] with the corresponding areas shaded.)
In later lectures, like mathematical analysis, we will prove the three properties (N1),(N2)
and (N3) for all these norms.
We look at the examples for inner products from above and calculate the associated norms:
‖(x1 , x2 )^T ‖ = √( ⟨ (1 1; 1 4)(x1 , x2 )^T , (x1 , x2 )^T ⟩euclid ) = √( |x1 |² + x1 x̄2 + x2 x̄1 + 4 |x2 |² ) .
(c) Looking at the F-vector space P([a, b]) of all polynomial functions f : [a, b] → F, we
defined the inner product
⟨f , g⟩ = ∫_a^b f (x) · \overline{g(x)} dx .   (7.24)
The associated norm in P([a, b]) is the already introduced 2-norm since
‖f ‖ = √⟨f , f ⟩ = √( ∫_a^b f (x) \overline{f (x)} dx ) = √( ∫_a^b |f (x)|² dx ) = ‖f ‖2 .
x = p + n =: x_U + x_{U⊥}
• Orthogonal systems that do not contain the zero vector o are always linearly independent.
Example 7.40. (a) The vectors x = (1, i)^T and y = (0, 1)^T from C2 are not orthogonal w.r.t. the standard inner product ⟨·, ·⟩euclid . However, they are orthogonal w.r.t. the inner product given by ⟨x, y⟩ := ⟨Ax, y⟩euclid with A = (2 i; −i 1), since
⟨x, y⟩ = ⟨ A(1, i)^T , (0, 1)^T ⟩euclid = ⟨ (1, 0)^T , (0, 1)^T ⟩euclid = 0.
The orthogonal projection of x onto Span(y) can be different for different inner products. W.r.t. ⟨·, ·⟩ it is o (since x ⊥ y), but w.r.t. ⟨·, ·⟩euclid it is
x_{Span(y)} = (⟨x, y⟩euclid / ⟨y, y⟩euclid) · y = i · (0, 1)^T = (0, i)^T .
(b) Looking at the vector space F([0, 2π]), which contains all functions f : [0, 2π] → R, we define a subspace V that is spanned by the family B = (1, cos, sin). Then w.r.t. the
Algorithm (Gram–Schmidt step for a vector u):
• Set v := u − u_{Span(B)} ;
• If v ≠ o: set w := v / ‖v‖ and add w to B.
If you cancel the algorithm at some point, the family at this point, B = (w1 , . . . , wk ), is an ONB of Span(w1 , . . . , wk ).
Recall that for this ONB B = (w1 , . . . , wk ) the orthogonal projection u_{Span(B)} is calculated by
u_{Span(B)} = ⟨u, w1 ⟩ w1 + . . . + ⟨u, wk ⟩ wk .
Example 7.41. The monomials C = (m0 , m1 , m2 ) do not form an ONB in P([−1, 1]) w.r.t. ⟨f , g⟩ = ∫_{−1}^{1} f (x) g(x) dx. We can apply the Gram-Schmidt procedure to C. Here it is useful to start with the numbering indices 0, 1, 2, . . .
v0 = m0 = 1   ⟹   w0 (x) = v0 (x)/‖v0 ‖ = 1/√2 ,
v1 = m1 − ⟨m1 , w0 ⟩ w0 = m1   (since ⟨m1 , w0 ⟩ = 0)   ⟹   w1 (x) = v1 (x)/‖v1 ‖ = √(3/2) · x ,
v2 = m2 − ⟨m2 , w0 ⟩ w0 − ⟨m2 , w1 ⟩ w1   (with ⟨m2 , w0 ⟩ = √2/3 and ⟨m2 , w1 ⟩ = 0)
   ⟹   w2 (x) = v2 (x)/‖v2 ‖ = √(45/8) · ( x² − 1/3 ) .
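The same orthonormalisation can be reproduced symbolically. A small sketch with sympy, applying the Gram–Schmidt step above to the monomials on [−1, 1]:

```python
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # <f, g> = integral over [-1, 1] of f(x) g(x) dx
    return sp.integrate(f * g, (x, -1, 1))

basis = []
for m in (sp.Integer(1), x, x**2):
    v = m - sum(inner(m, w) * w for w in basis)   # subtract the orthogonal projection
    basis.append(sp.simplify(v / sp.sqrt(inner(v, v))))

for w in basis:
    print(w)
# 1/sqrt(2),  sqrt(6)/2 * x,  3*sqrt(10)/4 * (x**2 - 1/3)   (up to simplification;
# these agree with w0, w1, w2 above since sqrt(3/2) = sqrt(6)/2 and sqrt(45/8) = 3*sqrt(10)/4)
```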
Summary
• Vectors are elements in a set, called a vector space V , that one can add together
and scale with numbers α from R or C, without leaving the set V . The addition
and scalar multiplication just have to satisfy the rules (1)–(8) from Definition 7.1.
• If you know that a set V with two operations + and α· is a vector space and if you want to show that a subset U ≠ ∅ of V also forms a vector space, then you do not have to check (1)–(8) again, but only (a) and (b) from Proposition 7.7. Such a U is called a subspace of V .
• The definitions linear combination, span, generating system, linearly (in)dependent, basis and dimension are literally the same as in Chapter 3.
• If you fix a basis B = (b1 , . . . , bn ) in V , then each x ∈ V has a uniquely determined
linear combination x = α1 b1 + · · · + αn bn . The numbers α1 , . . . , αn ∈ F (F is either
R or C) are called the coordinates of x w.r.t. B. This defines the vector ΦB (x) ∈ Fn .
• Changing the basis of V from B to C also changes the coordinate vector from ΦB (x) ∈ Fn to ΦC (x) ∈ Fn . This change can be described by the transformation matrix TC←B .
• One always has TB←C = (TC←B )^{-1} . Sometimes, it is helpful to take a detour TB←C = TB←A TA←C where A is a simple and well-known basis.
• An inner product h·, ·i is a map, which takes two vectors x, y ∈ V and gives out a
number hx, yi in F. It has to satisfy the rules (S1)–(S4) from Definition 7.23.
• If A ∈ Fn×n is selfadjoint and positive definite, then hx, yi := hAx, yieuclid defines an
inner product in Fn . Here h·, ·ieuclid is the well-known standard inner product in Rn
(Chapter 2) or Cn (Chapter 6).
• A norm ‖ · ‖ is a map that sends a vector x ∈ V to a number ‖x‖ ∈ R and satisfies the rules (N1)–(N3) from Definition 7.33.
• An inner product ⟨·, ·⟩ always defines a norm ‖x‖ := √⟨x, x⟩.
• By having an inner product, we can talk about orthogonal projection x U for a vector
x ∈ V w.r.t. a subspace U ⊂ V .
General linear maps
8
It’s dangerous to go alone! Take this.
Old man in a cave
Example 8.3. (a) For V = W = F, let `(x) = 3x. We can easily check (L+) and (L·).
(b) For V = F and W = F2 , let `(x) = x · (3, 1)^T . Obviously, ` satisfies (L+) and (L·).
A = (−2 5 1).
Now let us look for some abstract vector spaces:
(We also talk about the integration in mathematical analysis next semester.)
The linear maps from Example 8.4 are not directly given by a matrix-vector multiplication.
However, we will see that this is possible if we go over to the representation of the vector
spaces when fixing a basis. Recall the coordinate vectors and the basis isomorphism ΦB .
We will do this in Section 8.3.
As seen in Example 7.4, we can add and scale functions f : R → R. This can be generalised to linear maps:
The zero vector in L(V, W ) is the zero map o : V → W defined by o(x) = o for all
x∈V.
and
(k + `)(αx) = k(αx) + `(αx) = α k(x) + α `(x) = α ( k(x) + `(x) ) = α (k + `)(x),
using Definition 8.5 and the property (L·) of k and `. This means k + ` has the two properties (L+) and (L·) and is also linear. In the same manner, we see that α · ` is linear. Showing the properties (1)–(8) is an exercise for you. I am serious. It could be an exam question.
From now on, we do not write the two operations + and α· in L(V, W ) in red anymore.
However, keep in mind that these are different operations than + and α· in W .
x_G = (⟨x, n⟩euclid / ⟨n, n⟩euclid) · n = ⟨x, n⟩euclid n = n ⟨x, n⟩euclid = n (n^T x) = (n n^T ) x
defines a linear map Rn → Rn . We also know that it is given by the associated matrix: projG = f_{n n^T} .
x = xG + xE .
In other words:
Hence k ◦ ` also has the properties (L+) and (L·) and is linear.
For the reflection, we expect that using it two times brings us back to the beginning,
which means that we should get the identity map:
Example 8.10. (a) Let δ0 : P2 (R) → R, given by δ0 : f ↦ f (0), be the point evaluation and ∂ : P3 (R) → P2 (R) the differential operator ∂ : f ↦ f ′ from Example 8.4 (a) and (b). Then, the composition δ0 ◦ ∂ from P3 (R) to R is given by
f ↦ f ′ ↦ f ′(0),   hence δ0 ◦ ∂ : f ↦ f ′(0).
We get:
f ↦ F ↦ F′ = f ,   hence ∂ ◦ ∫ : f ↦ f ,   which means ∂ ◦ ∫ = id : P2 (R) → P2 (R).
We can also build the converse composition of ∂ and ∫. Is ∫ ◦ ∂ then the identity on P3 (R)? For f (x) = ax³ + bx² + cx + d we get
g(x) = ∫₀ˣ f ′(t) dt = ∫₀ˣ 3at² + 2bt + c dt = [ at³ + bt² + ct ]₀ˣ = ax³ + bx² + cx ,
so the constant term is lost and ∫ ◦ ∂ is in general not the identity.
Recall that bijective and invertible are equivalent notions for maps.
However, here, we are only interested in linear maps between vector spaces. As mentioned
in Chapter 3, we have the following interesting result:
Proof. Let u, v ∈ W be arbitrary and set x := `−1 (u) and y := `−1 (v). Since ` is linear,
we have `(x + y) = `(x) + `(y) = u + v. Hence,
Example 8.12. Recall that we already considered a linear map in Section 7.4, namely
the map ΦB : v 7→ vB , which maps a vector v from an F-vector space V to its coordinate
vector vB ∈ Fn with respect to a basis B. The map ΦB is invertible: It is surjective
because B and the standard basis of Fn are generating families, and it is injective because
B and the standard basis of Fn are linearly independent. By Proposition 8.11, we also know that ΦB^{-1} : vB ↦ v is linear.
Remark:
A linear map ` : V → W exactly conserves the structure of the vector spaces,
meaning vector addition and scalar multiplication. Therefore, mathematicians call
a linear map a homomorphism. A homomorphism ` that is invertible and has an
inverse `−1 that is also a homomorphism is called an isomorphism.
Equation (8.7) says everything: If you know the images of all basis elements, which means `(b1 ), . . . , `(bn ), then you know all images `(x) for each x ∈ V immediately.
Example 8.13. Let V = P3 (R) with the monomial basis B = (m0 , m1 , m2 , m3 ) where
mk (x) = xk . For the differential operator ∂ ∈ L(P3 (R), P2 (R)) where ∂ : f 7→ f 0 , we have
∂(m0 ) = o, ∂(m1 ) = m0 , ∂(m2 ) = 2m1 , ∂(m3 ) = 3m2 , (8.7)
For an arbitrary p ∈ P3 (R), which means p(x) = ax3 + bx2 + cx + d for a, b, c, d ∈ R or
p = dm0 + cm1 + bm2 + am3 , we have
pB = (d, c, b, a)^T   and hence   ∂(p) = d ∂(m0 ) + c ∂(m1 ) + b ∂(m2 ) + a ∂(m3 ) = c m0 + 2b m1 + 3a m2 .
Checking this: p′(x) = 3ax² + 2bx + c, hence ∂(p) = p′ = 3a m2 + 2b m1 + c m0 .
(Diagram: x ∈ V is sent to `(x) ∈ W ; via ΦB and ΦC , the coordinate vector ΦB (x) should be sent to ΦC (`(x)) by a matrix.)
Question:
How do we get the map or the matrix at the bottom? How do we send the coordinate vector ΦB (x) to the coordinate vector ΦC (`(x))?
So, f := ΦC ◦ ` ◦ ΦB^{-1} is a linear map from Fn to Fm . We already know that there is always a corresponding matrix A with f = fA . We get the columns of the matrix by putting the canonical unit vectors into the map:
(ΦC ◦ ` ◦ ΦB^{-1})(ej ) = ΦC (`(ΦB^{-1}(ej ))) = ΦC (`(bj ))
This gives us a matrix that really represents the abstract linear map. It depends, of course,
on the chosen bases B and C in the vector spaces V and W , respectively. Therefore, we
choose a good name:
and call it the matrix representation of the linear map ` with respect to the basis B
and C.
(Diagram: x ↦ `(x) in the vector spaces; on the coordinate level, ΦB (x) ↦ ΦC (`(x)) via multiplication by `C←B .)
Example 8.14. (a) Let ∂ : P3 (R) → P2 (R) with f ↦ f ′ be the differential operator. We use in P3 (R) and P2 (R) the respective monomial bases B = (m3 , m2 , m1 , m0 ) and C = (m2 , m1 , m0 ).
We already know:
ΦC (∂(m3 )) = ΦC (3m2 ) = (3, 0, 0)^T ,   ΦC (∂(m2 )) = ΦC (2m1 ) = (0, 2, 0)^T ,
ΦC (∂(m1 )) = ΦC (m0 ) = (0, 0, 1)^T ,   ΦC (∂(m0 )) = ΦC (o) = (0, 0, 0)^T .
The column vectors from above give us the columns of the matrix ∂C←B :
3 0 0 0
∂C←B = ΦC (∂(m3 )) ΦC (∂(m2 )) ΦC (∂(m1 )) ΦC (∂(m0 )) = 0 2 0 0 . (8.10)
0 0 1 0
Now we can use the map ∂ just on the coordinate level: For f ∈ P3 (R) given by
f (x) = ax3 + bx2 + cx + d with a, b, c, d ∈ R, we have
ΦB (f ) = (a, b, c, d)^T   and hence   ΦC (∂(f )) = ∂C←B ΦB (f ) = (3a, 2b, c)^T .
So we get:
∂(f ) = ΦC^{-1}( (3a, 2b, c)^T ) = 3a m2 + 2b m1 + c m0 .
We check this again by ∂(f ) = f 0 and f 0 (x) = 3ax2 + 2bx + c for all x. Therefore,
∂(f ) = 3am2 + 2bm1 + cm0 . Great!
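The coordinate-level calculation can also be done on a computer. A minimal numpy sketch with the matrix (8.10):

```python
import numpy as np

# Matrix representation (8.10) of the differential operator w.r.t. the monomial bases
# B = (m3, m2, m1, m0) of P3(R) and C = (m2, m1, m0) of P2(R).
d_C_B = np.array([[3, 0, 0, 0],
                  [0, 2, 0, 0],
                  [0, 0, 1, 0]], dtype=float)

f_B = np.array([5.0, -1.0, 2.0, 7.0])   # f(x) = 5x^3 - x^2 + 2x + 7 (a made-up example)
print(d_C_B @ f_B)                       # [15. -2.  2.]  ->  f'(x) = 15x^2 - 2x + 2
```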
(b) Looking again at the map ∫ : P2 ([0, 1]) → P3 ([0, 1]) which sends f to its antiderivative F given by
F(x) = ∫₀ˣ f (t) dt   for all x ∈ [0, 1].
Take again the monomial basis B = (m2 , m1 , m0 ) for P2 ([0, 1]) and C = (m3 , m2 , m1 , m0 ) for P3 ([0, 1]). For getting the matrix ∫C←B , we need the images of B. Because of
(∫ mk )(x) = ∫₀ˣ t^k dt = [ t^{k+1}/(k+1) ]₀ˣ = x^{k+1}/(k+1) = (1/(k+1)) m_{k+1}(x)   for k = 2, 1, 0 ,
we get
ΦC (∫ m2 ) = ΦC ( (1/3) m3 ) = (1/3, 0, 0, 0)^T ,
ΦC (∫ m1 ) = ΦC ( (1/2) m2 ) = (0, 1/2, 0, 0)^T ,
ΦC (∫ m0 ) = ΦC ( m1 ) = (0, 0, 1, 0)^T .
The matrix representation ∫C←B is now given by the coordinate vectors with respect to the basis C:
∫C←B = ( ΦC (∫ m2 )  ΦC (∫ m1 )  ΦC (∫ m0 ) ) =
  1/3   0   0
   0   1/2  0
   0    0   1
   0    0   0 .   (8.11)
(c) Let V = P2 (R) with monomial basis B = (m2 , m1 , m0 ) and W = R with basis
C = (1). Look at the map δ0 : f 7→ f (0) as a linear map V → W . For the basis
vectors from B, we get:
δ0 (m2 ) = m2 (0) = 0, δ0 (m1 ) = m1 (0) = 0, δ0 (m0 ) = m0 (0) = 1
and hence (δ0 )C←B = (0 0 1). If we look at another map given by the evaluation at x = 1, meaning δ1 : f ↦ f (1), then we get (δ1 )C←B = (1 1 1). Let us check the calculations for a vector f ∈ P2 (R), which means f (x) = ax² + bx + c with a, b, c ∈ R.
Then:
f = a m2 + b m1 + c m0 = ΦB^{-1}( (a, b, c)^T ) ,   hence   ΦB (f ) = (a, b, c)^T
and also
ΦC (δ0 (f )) = (0 0 1) (a, b, c)^T = c   and   ΦC (δ1 (f )) = (1 1 1) (a, b, c)^T = a + b + c.
(d) Let A = ( a1 · · · an ) ∈ Fm×n and choose the canonical bases B and C in Fn and Fm . For the canonical unit vectors we get
fA (e1 ) = A e1 = a1 = ΦC^{-1}(a1 ) ,   . . . ,   fA (en ) = A en = an = ΦC^{-1}(an ) ,
hence
(fA )C←B = ( a1 · · · an ) = A.
The matrix representation (fA )C←B of the linear map fA with respect to the canonical bases is the associated matrix A.
(e) Let d : R2 → R2 be the rotation by angle ϕ. Choose in V = W = R2 the canonical
basis B = (e1 , e2 ). We apply the rotation d to the basis elements e1 = (1, 0)^T and e2 = (0, 1)^T :
d(e1 ) = (cos ϕ, sin ϕ)^T = ΦB^{-1}( (cos ϕ, sin ϕ)^T ) ,
d(e2 ) = (−sin ϕ, cos ϕ)^T = ΦB^{-1}( (−sin ϕ, cos ϕ)^T ) .
The matrix representation of d with respect to the standard basis is the so-called rotation matrix
dB←B =
  cos ϕ  −sin ϕ
  sin ϕ   cos ϕ .   (8.12)
ΦB (projG (b1 )) = ΦB (b1 ) = (1, 0, 0)^T ,
ΦB (projG (b2 )) = ΦB (o) = (0, 0, 0)^T ,
ΦB (projG (b3 )) = ΦB (o) = (0, 0, 0)^T .
(a) Let V and W be two F-vector spaces with bases B and C, respectively. For linear
maps k, ` ∈ L(V, W ) and α ∈ F, we have
(b) Let U be a third F-vector space with chosen basis A. For all ` ∈ L(U, V ) and
k ∈ L(V, W ), we have
Please note that on the left-hand side there are the operations +, α· and ◦ for linear maps
and on the right-hand side there are the operations for matrices.
The zero matrix 0 and the identity matrix 1 are exactly the matrix representations of
the zero map o : V → W with x 7→ o and of the identity map id : V → V with x 7→ x,
respectively.
However, for the last equality, you really have to choose the same basis in V . Otherwise,
see equation (8.15) in section 8.3.4 later.
Now choose ` again as a linear map V → W and also a basis B in V and a basis C in W .
If ` is invertible, we immediately get:
(`−1 )B←C `C←B = (`−1 ◦ `)B←B = idB←B = 1 and `C←B (`−1 )B←C = 1.
Hence:
Proof. If ` is invertible, then (8.14) says the m × n-matrix `C←B is invertible. This means
that the matrix is a square one, hence dim(V ) = n = m = dim(W ).
Example 8.17. (a) Let projG ∈ L(R3 , R3 ) be the linear operator given by the orthogonal
projection onto G := Span(n). We choose the same basis B in both R3 like in
Example 8.14 (f). For the projection projE and the reflection reflE with respect to
the plane E := {n}⊥ , Proposition 8.15 gives us:
(projE )B←B = (id − projG )B←B = idB←B − (projG )B←B
  =  1 0 0     1 0 0     0 0 0
     0 1 0  −  0 0 0  =  0 1 0
     0 0 1     0 0 0     0 0 1
(using (8.2) and (8.13)), and
(reflE )B←B = (id − 2 projG )B←B = idB←B − 2 (projG )B←B
  =  1 0 0       1 0 0     −1 0 0
     0 1 0  − 2  0 0 0  =   0 1 0
     0 0 1       0 0 0      0 0 1 .
(b) Next, we again consider the differential operator ∂ : P3 (R) → P2 (R) and the antiderivative operator ∫ : P2 (R) → P3 (R). In P2 (R) and P3 (R) choose the monomial basis B and C, respectively. From Proposition 8.15 and the equations (8.10) and
(8.11), we conclude
(∂ ◦ ∫)B←B = ∂B←C ∫C←B =
  3 0 0 0     1/3  0   0       1 0 0
  0 2 0 0  ·   0  1/2  0   =   0 1 0   = idB←B
  0 0 1 0      0   0   1       0 0 1
               0   0   0
and
(∫ ◦ ∂)C←C = ∫C←B ∂B←C =
  1/3  0   0     3 0 0 0       1 0 0 0
   0  1/2  0  ·  0 2 0 0   =   0 1 0 0   ≠ idC←C .
   0   0   1     0 0 1 0       0 0 1 0
   0   0   0                   0 0 0 0
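These two products can be checked directly with numpy; a minimal sketch (D stands for ∂B←C and I for ∫C←B in the notation of this example):

```python
import numpy as np

D = np.array([[3, 0, 0, 0],
              [0, 2, 0, 0],
              [0, 0, 1, 0]], dtype=float)      # differentiation, (8.10)
I = np.array([[1/3, 0,   0],
              [0,   1/2, 0],
              [0,   0,   1],
              [0,   0,   0]])                  # antiderivative, (8.11)

print(D @ I)   # 3x3 identity: differentiating an antiderivative gives the function back
print(I @ D)   # 4x4 matrix with a zero last row/column: the constant term is lost
```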
Let B = (b1 , . . . , bn ) and C = (c1 , . . . , cn ) be two bases of V . Then, the identity map
id : x 7→ x of V with respect to B and C has the following matrix representation:
idC←B = ( ΦC (id(b1 )) · · · ΦC (id(bn )) ) = ( (b1 )C · · · (bn )C ) = TC←B .   (8.15)
The transformation matrix TC←B from equation (7.12) for the change of basis is in fact the matrix representation of the identity map id (each x stays where it is).
Now, we can expand the notion of the change of basis: Let ` be a linear map V → W . With respect to the bases B in V and C in W , we find the matrix representation `C←B of `. In addition, we now choose second bases B′ in V and C′ in W and ask the following:
Question:
What is the relation between `C←B and `C 0 ←B0 ?
Let us try to calculate the matrices `C 0 ←B0 with the help of `C←B :
(Diagram: the linear map ` : V → W in the middle. On the upper coordinate level, xB is sent to `(x)C by multiplication with the matrix representation `C←B ; on the lower coordinate level, xB′ is sent to `(x)C′ by multiplication with `C′←B′ . The vertical arrows are the transformation matrices TB←B′ , TB′←B , TC′←C and TC←C′ , together with the basis isomorphisms ΦB , ΦC , ΦB′ and ΦC′ .)
Example 8.18. Let us consider the differential operator ∂ : P3 (R) → P2 (R) where V = P3 (R) carries the monomial basis B = (m3 , m2 , m1 , m0 ) and an additional basis
B′ = (2m3 − m1 , m2 + m0 , m1 + m0 , m1 − m0 ) =: (b′1 , b′2 , b′3 , b′4 ) .
Moreover, W = P2 (R) carries the monomial basis C = (m2 , m1 , m0 ) and another basis
C′ = (m2 − (1/2) m1 , m2 + (1/2) m1 , m0 ) =: (c′1 , c′2 , c′3 ).
We already know the transformation matrix TC′←C for the change of basis, see Example 7.21. The matrix representation ∂C←B is also known from equation (8.10). The change-of-basis matrix TB←B′ is directly given by B′ :
ΦB (b′1 ) = ΦB (2m3 − m1 ) = (2, 0, −1, 0)^T ,   ΦB (b′2 ) = ΦB (m2 + m0 ) = (0, 1, 0, 1)^T ,
ΦB (b′3 ) = ΦB (m1 + m0 ) = (0, 0, 1, 1)^T ,   ΦB (b′4 ) = ΦB (m1 − m0 ) = (0, 0, 1, −1)^T .
In summary, we have:
          1/2  −1  0              3 0 0 0               2 0 0  0
TC′←C =   1/2   1  0 ,   ∂C←B =   0 2 0 0 ,   TB←B′ =   0 1 0  0
           0    0  1              0 0 1 0              −1 0 1  1
                                                        0 1 1 −1 .
Using (8.16), we know that the matrix representation ∂C′←B′ is given by the product of these three matrices:
∂C′←B′ = TC′←C ∂C←B TB←B′ =
  3 −2 0 0
  3  2 0 0
 −1  0 1 1 .   (8.17)
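A quick numerical check of this triple product with numpy:

```python
import numpy as np

T_Cp_C = np.array([[0.5, -1, 0],
                   [0.5,  1, 0],
                   [0,    0, 1]])                  # T_{C'<-C}
d_C_B  = np.array([[3, 0, 0, 0],
                   [0, 2, 0, 0],
                   [0, 0, 1, 0]], dtype=float)     # partial_{C<-B}, see (8.10)
T_B_Bp = np.array([[ 2, 0, 0,  0],
                   [ 0, 1, 0,  0],
                   [-1, 0, 1,  1],
                   [ 0, 1, 1, -1]], dtype=float)   # T_{B<-B'}

print(T_Cp_C @ d_C_B @ T_B_Bp)
# [[ 3. -2.  0.  0.]
#  [ 3.  2.  0.  0.]
#  [-1.  0.  1.  1.]]   which is exactly (8.17)
```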
−1 0 1 1
8.3 Finding the matrix for a linear map 199
Alternatively, we could directly calculate ∂C′←B′ from ∂ and the bases B′ and C′ . In order to do this, we apply ∂ to the basis elements from B′ and represent the results with respect to the basis C′ :
ΦC′ (∂(b′1 )) = ΦC′ (∂(2m3 − m1 )) = ΦC′ (6m2 − m0 ) = (3, 3, −1)^T ,
ΦC′ (∂(b′2 )) = ΦC′ (∂(m2 + m0 )) = ΦC′ (2m1 ) = (−2, 2, 0)^T ,
ΦC′ (∂(b′3 )) = ΦC′ (∂(m1 + m0 )) = ΦC′ (m0 ) = (0, 0, 1)^T ,
ΦC′ (∂(b′4 )) = ΦC′ (∂(m1 − m0 )) = ΦC′ (m0 ) = (0, 0, 1)^T .
For f = a b′1 + b b′2 + c b′3 + d b′4 , this gives
∂(f ) = (3a − 2b) c′1 + (3a + 2b) c′2 + (−a + c + d) c′3 ,
so that on the coordinate level
ΦB′ (f ) = (a, b, c, d)^T   ↦   ΦC′ (∂(f )) = (3a − 2b, 3a + 2b, −a + c + d)^T ,
which is exactly multiplication by the matrix from (8.17).
Question:
Are there bases B′ and C′ in V and W , respectively, such that A′ is the matrix representation of `, i.e.
A′ = `C′←B′ ?
(Diagram: on the upper coordinate level, xB is sent to `(x)C by A = `C←B ; on the lower level, xB′ is sent to `(x)C′ by A′ = `C′←B′ , ideally with a nice structure. The vertical arrows are T = TB←B′ and S = TC′←C , so that A′ = SAT — this relation is called equivalence.)
Choosing all possible bases B′ and C′ in V and W , respectively, we get all possible invertible matrices S and T and hence, with `C′←B′ , all matrices that are equivalent to A, i.e. all B with
B = SAT.
A ∼ A, A ∼ B ⇒ B ∼ A, A ∼ B ∧ B ∼ C ⇒ A ∼ C.
Equivalent matrices describe the same linear map, just with respect to different bases.
Here, we have a simple test for equivalence:
Proof. For a given matrix A, we can use Gaussian elimination to bring it into a row echelon form K, namely P A = LK. By using even more row operations (backward substitution) and column exchanges, we can bring the matrix into a so-called normal form, given as a block matrix:
A ⇝  1r  0
      0  0   =: 1r,m,n ∈ Fm×n   with r = rank(A).
Note that in all these steps, we used invertible matrices from left and from right and,
therefore, did not change the rank of the matrix. In other words, there are invertible
matrices S and T with 1r,m,n = SAT , which means A ∼ 1r,m,n .
⇐ If rank(A) =: r is the same as rank(B) =: r′, then we immediately get: A ∼ 1r,m,n and 1r′,m,n ∼ B and therefore A ∼ B.
⇒ Now, let A ∼ B. Then, there are invertible matrices S and T with B = SAT .
Because of
In particular, the natural number rank(`C←B ) depends only on ` and not on the bases B and C. Hence the proposition below works as an alternative definition for the rank of a linear map.
rank(`) := dim(Ran(`)).
Recall that the range of any map is the set of all elements that are “hit” by the map.
Example 8.23. The operator projG ∈ L(R3 , R3 ) given by the orthogonal projection onto G := Span(n) with n ∈ R3 , ‖n‖ = 1, has the matrix representation with respect to B given by (8.13), cf. Example 8.14 (f). On the other hand, with respect to the standard basis E = (e1 , e2 , e3 ), the operator projG is given by n n^T , cf. equation (8.1), which means:
(projG )B←B =
  1 0 0
  0 0 0
  0 0 0
and (projG )E←E = n n^T ∈ R3×3 .
Both are matrix representations of the same linear map projG . Hence, they are equivalent
and have the same rank.
Question:
Which matrices A′ do we get as `B′←B′ when B′ is any basis of V ? What is the connection to A?
(Diagram: on the upper coordinate level, xB is sent to `(x)B by A = `B←B ; on the lower level, xB′ is sent to `(x)B′ by A′ = `B′←B′ , which may have a nicer structure. The vertical arrows are T = TB←B′ and S = TB′←B = T −1 , so that A′ = T −1 AT — this relation is called similarity.)
because S = T −1 by equation (7.15). This means the new matrix representation is given
by multiplying with a suitable matrix T from right and with the inverse T −1 from the
left. Hence, we define:
A≈B ⇒ A∼B
`(x) = b.
The existence of solutions is independent of the right-hand side b if and only if all b ∈ W
lie in Ran(`), which means:
Ran(`) = W (8.18)
We call this, as before, unconditional solvability of the equation `(x) = b and this is
equivalent to the surjectivity of `.
On the other hand, if x − xp ∈ Ker(`) and `(xp ) = b, then also x is a solution because
In summary:
(Figure: the affine solution set S = xp + Ker(`) in V is mapped by ` to the single point b ∈ W ; the kernel Ker(`) is mapped to o.)
Therefore, also in the inhomogeneous case, the solution is unique if and only if Ker(`) = {o} holds, since S contains at most one element in this case, namely xp if it exists. If there is no solution xp ∈ V at all, then S = ∅. We call this unique solvability, and for a linear map ` this is exactly the injectivity.
(Figure: the bijective case Ker(`) = {o} and Ran(`) = W ; each right-hand side b ∈ W has the unique solution set S = {xp }, and the inverse `−1 leads back from b to xp .)
`−1 maps each right-hand side b ∈ W to the unique solution x = xp ∈ V .
(Figure: the map ` : V → W with Ker(`), Ran(`) and the solution set S on the abstract level; below, the corresponding picture on the coordinate level with `C←B , Ker(`C←B ) and Ran(`C←B ), connected by ΦB and ΦC .)
Our equation `(x) = b can be translated to the coordinate level to `C←B ΦB (x) = ΦC (b).
We immediately get
x ∈ Ker(`) ⇐⇒ ΦB (x) ∈ Ker(`C←B )
and
b ∈ Ran(`) ⇐⇒ ΦC (b) ∈ Ran(`C←B ).
This means:
Ker(`) = ΦB^{-1}( Ker(`C←B ) )   and   Ran(`) = ΦC^{-1}( Ran(`C←B ) ).   (8.20)
Hence Ker(`) and Ker(`C←B ) can only simultaneously be trivial, which means {o}, and in the same manner Ran(`) and Ran(`C←B ) can also only simultaneously be the whole space. For matrices `C←B , we can restate our Propositions 3.65 and 3.67 from Section 3.12:
Also the rank-nullity theorem can now be transformed to the abstract case:
Rank-nullity theorem
For two F-vector spaces V , W with dim(V ) < ∞ and a linear map ` : V → W , we
have:
dim Ker(`) + dim Ran(`) = dim(V ).
Example 8.28. (a) The projection operator projG ∈ L(R3 , R3 ) with G = Span(n) has the kernel Ker(projG ) = E := {n}⊥ and the range Ran(projG ) = G. The rank-nullity theorem gives us 2 + 1 = 3. The map is neither injective nor surjective. The equations projG (x) = b have solutions only if b ∈ G. The solution set S is then a line that is parallel to E, hence a translation of the kernel by a particular solution xp .
(b) The reflection reflE = id−2 projG is an invertible map where the inverse is reflE again.
The kernel is {o} and the range R3 . The rank-nullity theorem says 0 + 3 = 3. The
equation reflE (x) = b has a unique solution for all b ∈ R3 .
(c) Consider the differential operator ∂ : f ↦ f ′, defined as a map P3 (R) → P3 (R), where P3 (R) carries the monomial basis B. The matrix representation is:
∂B←B =
  0 0 0 0
  3 0 0 0
  0 2 0 0
  0 0 1 0 .   (8.21)
Obviously,
Ker(∂B←B ) = Span( (0, 0, 0, 1)^T )   and   Ran(∂B←B ) = Span( (0, 3, 0, 0)^T , (0, 0, 2, 0)^T , (0, 0, 0, 1)^T ) ,
Ker(∂) = ΦB^{-1}( Ker(∂B←B ) ) = Span(m0 ) = P0 (R)
and
Ran(∂) = ΦB^{-1}( Ran(∂B←B ) ) = Span(m2 , m1 , m0 ) = P2 (R) .
This makes sense since f 0 = o is solved exactly by the constant functions, which are
all functions f ∈ P0 (R), and the set of all possible outcomes f 0 for f ∈ P3 (R) is exactly
P2 (R).
The equation ∂(f ) = g, which means searching for an antiderivative of a given function g ∈ P3 (R), has the solution set S = ∅ if g ∉ P2 (R) = Ran(∂). For g ∈ P2 (R), we have
S = fp + Ker(∂) = fp + P0 (R)
with a particular solution fp ∈ P3 (R).
(d) We want to find all solutions f ∈ P3 (R) of the differential equation f ″ + 2f ′ − 8f = 2m2 − 3m1 . In order to do this, we define the linear map ` := ∂ ◦ ∂ + 2 ∂ − 8 id as a map P3 (R) → P3 (R) with `(f ) = f ″ + 2f ′ − 8f . Then, we only have to solve the equation `(f ) = 2m2 − 3m1 in P3 (R). Choose for P3 (R) the monomial basis B on both sides:
Since the matrix `B←B is invertible (look at the determinant (−8)^4 ≠ 0), we can use Proposition 8.25 and Proposition 8.26, which say that we have a unique solution for all right-hand sides in P3 (R).
Let us now solve our equation from above: Seeing it in the coordinate language, we can rewrite `(f ) = 2m2 − 3m1 as `B←B fB = (2m2 − 3m1 )B , hence:
  −8  0  0  0             0                        0
   6 −8  0  0   fB   =    2  ,   Solution:  fB =  −1/4
   6  4 −8  0            −3                       1/4
   0  2  2 −8             0                        0
(rows and columns ordered by the basis B = (m3 , m2 , m1 , m0 )), which means f (x) = −x²/4 + x/4.
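On the coordinate level this is an ordinary linear system, so it can be solved numerically; a minimal numpy sketch:

```python
import numpy as np

# l_{B<-B} for l(f) = f'' + 2f' - 8f w.r.t. the monomial basis B = (m3, m2, m1, m0).
L = np.array([[-8,  0,  0,  0],
              [ 6, -8,  0,  0],
              [ 6,  4, -8,  0],
              [ 0,  2,  2, -8]], dtype=float)

rhs = np.array([0.0, 2.0, -3.0, 0.0])    # coordinates of 2*m2 - 3*m1

f_B = np.linalg.solve(L, rhs)
print(f_B)          # [ 0.   -0.25  0.25  0.  ]  ->  f(x) = -x^2/4 + x/4
```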
For this last section, we consider a linear map ` ∈ L(V, V ), which means V = W . All
matrix representations `B←B , where we have the same basis left and right, are similar
matrices and have the same determinant and eigenvalues by Proposition 8.5.
Hence, we define:
det(`) := det(`B←B ).
Knowing the matrix representations, we immediately get the rules for the determinant:
det(k◦`) = det (k◦`)B←B = det kB←B `B←B = det kB←B det `B←B = det(k) det(`)
and det(id) = det(idB←B ) = det(1) = 1.
The notions of eigenvalues and eigenvectors were introduced for matrices and the associated linear maps. Indeed, the whole definition also makes sense in the abstract setting, so that we can also use it for linear maps ` : V → V . In the end, this will be completely connected to the eigenvalues of the matrix representations.
λ is an eigenvalue of `
⇐⇒ the homogeneous equation (` − λ id)(x) = o has a solution x ≠ o
⇐⇒ ` − λ id is not injective
⇐⇒ ` − λ id is not invertible   (by Proposition 8.27)
⇐⇒ (` − λ id)B←B = `B←B − λ idB←B = `B←B − λ1 is not invertible for any basis B of V   (by (8.20))
⇐⇒ λ is an eigenvalue of `B←B for all bases B of V
⇐⇒ det( (` − λ id)B←B ) = 0 for all bases B of V
⇐⇒ det(` − λ id) = 0
Example 8.31. (a) The rotation d ∈ L(R2 , R2 ) from Example 8.3 (e) has the determinant 1 since, for the associated matrix representation (8.12) w.r.t. the standard basis B in R2 :
det(d) = det( dB←B ) = det
  cos ϕ  −sin ϕ
  sin ϕ   cos ϕ
= (cos ϕ)² + (sin ϕ)² = 1.
For projG each vector from G is an eigenvector for the eigenvalue 1, and each vector
from E is an eigenvector for the eigenvalue 0 since E is the kernel of projG . For projE
we have the same with G ↔ E. For reflE each vector from G is an eigenvector for the
eigenvalue −1, and each vector from E is an eigenvector for the eigenvalue 1.
Summary
• A map ` from one F-vector space V to another F-vector space W is called linear if
`(x + y) = `(x) + `(y) and `(αx) = α`(x) for all x, y ∈ V and α ∈ F. We write:
` ∈ L(V, W ).
• Linear maps L(V, W ) can be added and scaled with α ∈ F. Hence, L(V, W ) becomes an F-vector space.
• The composition k ◦ ` of linear maps ` : U → V and k : V → W is linear.
• The inverse map of a bijective linear map is again linear. Therefore a bijective linear
map is called an isomorphism.
• Each linear map ` ∈ L(V, W ), between finite dimensional vector spaces V and W , can be identified with a matrix. In order to do this, choose a basis B = (b1 , . . . , bn ) in V and a basis C = (c1 , . . . , cm ) in W . By using the basis isomorphisms ΦB and ΦC , we get a linear map Fn → Fm . Such a linear map is also represented by an m × n matrix `C←B := (`(b1 )C · · · `(bn )C ). It is called the matrix representation of ` w.r.t. B and C.
• The matrix representation of k + ` is the sum of both matrix representations.
• The matrix representation of α` is α times the matrix representation of `.
• The matrix representation of k ◦ ` is the product of both matrix representations.
• The matrix representation of `−1 is the inverse of the matrix representation of `.
• Kernel and range of a linear map ` can be calculated by `C←B .
• By changing the basis of V from B to B 0 and changing the basis of W from C to C 0 ,
the matrix representation of ` : V → W changes from `C←B to `C 0 ←B0 . In this case,
we have `C 0 ←B0 = TC 0 ←C `C←B TB←B0 .
• We call two matrices A and B equivalent and write A ∼ B if there are invertible
matrices S and T with B = SAT .
• We have A ∼ B if and only if rank(A) = rank(B).
• For the special case ` : V → V , one often chooses the same basis B left and right.
How does the matrix `B←B change when changing the basis B to B 0 ? Then, we have
S = T −1 in the formula above.
• Two matrices A and B are called similar and one writes A ≈ B if there is an
invertible matrix T with B = T −1 AT .
• From A ≈ B follows det(A) = det(B) and spec(A) = spec(B) but the converse is
in general false.
• det(`) for a linear map ` : V → V is defined by det(`B←B ) for any basis B in V .
• λ ∈ F is an eigenvalue of ` : V → V if `(x) = λx for some x ∈ V \ {o}.
Some matrix decompositions
9
The story so far: In the beginning the Universe was created. This has
made a lot of people very angry and been widely regarded as a bad move.
Douglas Adams
In the previous chapters 7 and 8, we considered general vector spaces and linear maps
between them. We could show that we are able to decode abstract vectors into vectors
from Fn or Fm and also abstract linear maps into matrices from Fm×n . Therefore, even in
this new general setting, it is important to know how to deal with matrices. Accordingly,
in this chapter, we only consider matrices and want to find some simpler structures for a
given matrix. We already observed such transformations or decompositions of a matrix
into simpler forms:
• In Section 3.11.3, we discovered how to decompose a square matrix into a lower
and an upper triangular matrix: A = LU . We also generalised this for rectangular
matrices as A = P LK where K is the row echelon form.
• In Section 5.5, we discovered how to decompose a square matrix A with full rank
into an orthogonal matrix and an upper triangular matrix: A = QR. It is not hard
to generalise this method for complex matrices and for rectangular matrices as well.
• In Section 6.7, we found that some matrices A can be decomposed into three parts
A = XDX −1
A = XJX −1   or equivalently   X −1 AX = J.
J is called a Jordan normal form (JNF) of A. The entries Ji are again block matrices, which are called Jordan blocks, and have the following structure:
Ji =
  Ji,1
        ⋱
             Ji,γi     ∈ C^{αi × αi} ,
where the matrices Ji,ℓ are called Jordan boxes and have the following form:
Ji,ℓ =
  λi  1
      λi  ⋱
           ⋱  1
               λi .
Example 9.2. If you have a matrix A ∈ C9×9 and find an invertible matrix X with A = XJX −1 such that
J = diag( J1 , J2 )   with
J1 =
  4 1
    4 1
      4
          4 1
            4         (two Jordan boxes of sizes 3 and 2 for the eigenvalue 4),
J2 =
  −3 1
     −3
         −3
             −3        (three Jordan boxes of sizes 2, 1 and 1 for the eigenvalue −3),
then we can read off:
λ1 = 4, α1 = 5, γ1 = 2, λ2 = −3, α2 = 4, γ2 = 3 .
Size of Ji
The Jordan block Ji for the eigenvalue λi has the size αi × αi because we need λi as
often on the diagonal of J as the algebraic multiplicity says.
dim(Ker(A − λi 1)) =: γi = αi .
In the case γi < αi (which means A is not diagonalisable), we are missing some columns
in X.
To shorten everything, we set N := A − λi 1.
Recall: Ker(N ) has the dimension γi = 4. Suppose that Ker(N 2 ) is of dimension 7 and
that Ker(N 3 ) has dimension 8 = αi . The difference
of the dimensions shows us where to find the four missing vectors. Elements from the
spaces Ker(N k ) are called generalised eigenvectors. To be more precise, we call an element from Ker(N k ) \ Ker(N k−1 ) a generalised eigenvector of rank k. In this sense, the ordinary eigenvectors are now generalised eigenvectors of rank 1.
(Figure: the generalised eigenvectors arranged in levels. The first level, Ker(N¹) with dimension 4 = γi , contains x1,1 , x2,1 , x3,1 , x4,1 ; the second level, Ker(N²) with dimension 7, contains x1,2 , x2,2 , x3,2 , so d2 = 7 − 4 = 3; the third level contains x1,3 . Applying N moves a vector one level down, and the columns form the four Jordan chains.)
As you can see in the picture, the vectors form “chains”, from top to bottom. We call each
of these sequences a Jordan chain and it will be related to a Jordan box.
At this point, we now know the whole block Ji . The next step is to find the corresponding
columns of X, which means that we have to calculate the generalised eigenvectors xj,k :
where x1,k , . . . , xj−1,k are the vectors from the chains before, 1 to j − 1, which lie
on the same level k. Now you can build the whole chain to the bottom xj,1 . We just
have to multiply with N in each step:
For x ∈ Ker(N k ), we have N x ∈ Ker(N k−1 ) since o = N k x = N k−1 (N x).
Note that equation (9.1) guarantees that all generalised eigenvectors on the kth level are
linearly independent and that the linear independence remains on the levels below. All
these αi generalised eigenvectors are put as columns into X.
For our example, this means: Xi = ( x1,1 x1,2 x1,3 x2,1 x2,2 x3,1 x3,2 x4,1 ) ∈ Cn×8 .
After we did the whole procedure for all eigenvalues λ1 , . . . , λr , the only thing that remains
is:
• Set N := A − λi 1.
• Calculate Ker(N ), Ker(N 2 ), . . . , Ker(N m ) until dim(·) = αi .
• Calculate all dk := dim(Ker(N k )) − dim(Ker(N k−1 )).
• Draw the levels 1, . . . , m and the Jordan chains.
• Write down the Jordan block Ji . (Box 9.4)
• Calculate all generalised eigenvectors. (Box 9.5)
• Define Xi with all generalised eigenvectors. (Box 9.6)
Why does this work? Let us look at the X-columns regarding one Jordan chain and its
corresponding Jordan box. Choose the first chain from our example. The chain was given
by x1,2 = N x1,3 and x1,1 = N x1,2 . In this way, we get.
In summary:
A ( x1,1 x1,2 x1,3 ) = ( A x1,1  A x1,2  A x1,3 ) = ( λi x1,1   x1,1 + λi x1,2   x1,2 + λi x1,3 )
= ( x1,1 x1,2 x1,3 ) ·
  λi  1
      λi  1
          λi    =: ( x1,1 x1,2 x1,3 ) Ji,1 .
By using the definition of the Jordan chain, we get the 1s above the diagonal in the matrix
Ji,1 . Only at the ordinary eigenvectors (here: x1,1 ), the chain stops. There, you do not
find a 1 but only λi since Ax1,1 = λi x1,1 .
By putting all Jordan boxes together into a Jordan block, we get γi equations (one per Jordan box), given by
A (xj,1 xj,2 · · · xj,k ) = (xj,1 xj,2 · · · xj,k ) Ji,j ,   j = 1, . . . , γi ,
which combine into one matrix equation AXi = Xi Ji for the ith Jordan block. The final assembly, cf.
(9.2), of the Jordan blocks Ji to the whole matrix J gives us then AX = XJ, which is
exactly the factorisation A = XJX −1 .
Now let us practise:
and we get:
Ker(N 2 ) = { x = (x1 , 0, x3 , 0, x5 )^T : x1 , x3 , x5 ∈ C } .
From this, we conclude dim(Ker(N 2 )) = 3. Now we have reached the algebraic multiplicity α1 = 3 and do not need to consider any higher powers of N , hence m = 2.
• For the differences of the dimension, we get
d1 := dim(Ker(N 1 )) − dim(Ker(N 0 )) = 2 − 0 = 2,
d2 := dim(Ker(N 2 )) − dim(Ker(N 1 )) = 3 − 2 = 1.
• Since we have a Jordan chain of length 2 and another one of length 1, we know that the first Jordan block J1 consists of two Jordan boxes of different sizes:
J1 =
  4 1 0
  0 4 0
  0 0 4 .
• Let us now find and correctly choose the generalised eigenvectors: We have to choose
x1,2 from Ker(N 2 ) \ Ker(N 1 ). Since we already calculated both kernels, we can choose
x1,2 = (1, 0, 0, 0, 0)> for the top level of the first chain. We calculate the chain to the
bottom: x1,1 := N x1,2 = (1, 0, −1, 0, 0)> .
Now, let us do the second chain. On the top level, we find x2,1 ∈ Ker(N 1 ). By choosing
this vector, we have to respect equation (9.1), which means it is not a linear combination
of vectors from Ker(N 0 ) ∪ {x1,1 } = {o, x1,1 }. In this case, this means that x2,1 comes
from Ker(N 1 ) and is not a multiple of x1,1 . Hence, we choose x2,1 := (0, 0, 0, 0, 1)> .
• We have finished the second chain and can give the matrix
X1 = ( x1,1 x1,2 x2,1 ) =
   1  1  0
   0  0  0
  −1  0  0
   0  0  0
   0  0  1 .
Now, we have done everything for the eigenvalue λ1 = 4. Next thing is the eigenvalue
λ2 = 1.
• For the matrix
N := A − λ2 1 = A − 1·1 =
   4 0 1 0 0
   0 0 0 0 0
  −1 0 2 0 0
   0 0 0 0 0
   0 0 0 0 3
• ...and get a bit boring picture with only m = 1 level and d1 = 2 vectors:
• Let us determine the generalised eigenvectors: x1,1 comes from Ker(N 1 ) \ Ker(N 0 ) and we could choose x1,1 = (0, 1, 0, 0, 0)^T . Now, for the second chain, choose x2,1 ∈ Ker(N 1 ) such that it is not given by a linear combination of vectors from Ker(N 0 ) ∪ {x1,1 } = {o, x1,1 }, cf. (9.1). Let us set x2,1 = (0, 0, 0, 1, 0)^T .
• Hence, we have the matrix
X2 = ( x1,1 x2,1 ) =
  0 0
  1 0
  0 0
  0 1
  0 0
hence, A = XJX −1 .
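The whole computation can be checked symbolically. The following sketch uses sympy; the matrix A is reconstructed from N = A − 1·1 given above (note that sympy may order the Jordan boxes differently).

```python
import sympy as sp

# A reconstructed from N = A - 1*1 shown above (add the identity back).
A = sp.Matrix([[ 5, 0, 1, 0, 0],
               [ 0, 1, 0, 0, 0],
               [-1, 0, 3, 0, 0],
               [ 0, 0, 0, 1, 0],
               [ 0, 0, 0, 0, 4]])

X, J = A.jordan_form()                     # returns X, J with A = X J X^{-1}
print(J)                                    # Jordan boxes for the eigenvalues 4 and 1
print(sp.simplify(X * J * X.inv() - A))     # zero matrix, so A = X J X^{-1}
```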
where tr(A) := Σ_{j=1}^{n} a_jj is the sum of the diagonal, the so-called trace of A.
In the diagonalisation and in the Jordan decomposition, we had three parts in the form
A = U DV . There we had
Now, if we drop that condition, we can actually fulfil the following two properties even for rectangular matrices:
(1) D is diagonal,
Since we allow U and V to be any square matrices, which means U ∈ Fm×m and V ∈ Fn×n , we can consider an arbitrary rectangular matrix A ∈ Fm×n .
We will later show that each matrix A has such a decomposition. Since we want to use
the common notations, we will denote the diagonal matrix D by Σ and use V ∗ on the
right-hand side instead of V . We will see below why this is indeed a good idea. Hence,
the wanted decomposition of A is now written as A = U ΣV ∗ and looks like this:
A = U · Σ · V∗ ,   (9.3)
where A ∈ Fm×n is arbitrary, U ∈ Fm×m is unitary, Σ ∈ Fm×n is “diagonal” and V∗ ∈ Fn×n is unitary.
The word “diagonal” for a rectangular matrix Σ is of course not literally correct. It means
the following here:
Σ carries the entries s1 , s2 , . . . on its main diagonal and zeros everywhere else: if m ≥ n, the diagonal entries are s1 , . . . , sn (with m − n additional zero rows below), and if m ≤ n, they are s1 , . . . , sm (with n − m additional zero columns on the right).   (9.4)
with
U = ( u1 · · · um ) = TC←U ,   V = ( v1 · · · vn ) = TB←V ,   V ∗ = V −1 = TV←B .
= ( u1 · · · um ) · diag(s1 , . . . , sr , 0, . . . , 0) = ( s1 u1 · · · sr ur  o · · · o ) .
Therefore, we have:
A∗ u1 = s1 v1 , . . . , A∗ ur = sr vr , A∗ ur+1 = o, . . . , A∗ um = o. (9.7)
A∗ A = (U ΣV ∗ )∗ (U ΣV ∗ ) = V Σ∗ U ∗ U ΣV ∗ = V Σ∗ ΣV ∗ (9.8)
and AA∗ = (U ΣV ∗ )(U ΣV ∗ )∗ = U ΣV ∗ V Σ∗ U ∗ = U ΣΣ∗ U ∗ . (9.9)
Σ∗Σ = diag( |s1 |² , . . . , |sr |² , 0, . . . , 0 ) ∈ Fn×n   and   ΣΣ∗ = diag( |s1 |² , . . . , |sr |² , 0, . . . , 0 ) ∈ Fm×m ,
that are also diagonal. Hence, (9.8) and (9.9) show us the unitary diagonalisations of the square matrices AA∗ and A∗A. Recall that both matrices are self-adjoint and, by Proposition 6.44, in fact have an ONB consisting of eigenvectors. These orthonormal eigenvectors (w.r.t. the standard inner product!) are chosen as the columns of the matrices U and V .
With u := Av we get
A∗Av = λv   ⟹   AA∗u = AA∗(Av) = A(A∗Av) = A(λv) = λ(Av) = λu,
hence spec(A∗A) ⊂ spec(AA∗). The converse works the same. As you know from the
homework, all eigenvalues are non-negative. This also fits in with the diagonal entries
|s1 |2 , . . . , |sr |2 , 0, . . . , 0 of Σ∗ Σ and ΣΣ∗ . Actually, we can choose the number si as we
want in F as long as |si |2 is the ith eigenvalue of A∗ A or AA∗ . A simple choice is, of
course, s1 , . . . , sr being real and positive numbers.
In summary, we now have everything for U, V and Σ:
• Set sr+1 , . . . , sm := 0.
• Add to (u1 , . . . , ur ) a family (ur+1 , . . . , um ) such that it is an ONB of Fm .
• Set U := (u1 · · · um ), V := (v1 · · · vn ) and Σ as in (9.4) or (9.5).
Example 9.11. Consider the matrix
A =   1  √3
     −2   0 .
We have m = n = 2 and F = R.
• Calculate
A∗A =    1  −2     1  √3        5  √3
        √3   0  · −2   0   =   √3   3
and get det(A∗A − λ1) = (5 − λ)(3 − λ) − 3 = λ² − 8λ + 12 = (λ − 2)(λ − 6).
The eigenvalues of A∗A are (in decreasing order) λ1 = 6 and λ2 = 2.
• The eigenvector v1 for λ1 = 6: We solve the linear equation (A∗A − 6·1)v1 = o, i.e.
  −1  √3
  √3  −3   v1 = o ,   which gives the normalised eigenvector v1 = (1/2) · (√3, 1)^T .
• Next, we calculate s1 := √λ1 = √6 and
u1 := (1/s1 ) A v1 = (1/√6) · A · (1/2)(√3, 1)^T = (1/√2) · (1, −1)^T .
• Then s2 := √λ2 = √2 and, with the normalised eigenvector v2 = (1/2) · (−1, √3)^T for λ2 = 2,
u2 := (1/s2 ) A v2 = (1/√2) · A · (1/2)(−1, √3)^T = (1/√2) · (1, 1)^T .
• In summary, we get:
U = (1/√2) ·   1  1        V = (1/2) ·  √3  −1        Σ =   √6   0
              −1  1 ,                    1  √3 ,             0  √2 .
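The same decomposition can be obtained numerically; a minimal numpy check of this example (singular vectors are only determined up to sign, so the columns may differ from the hand computation by a factor −1):

```python
import numpy as np

A = np.array([[ 1.0, np.sqrt(3.0)],
              [-2.0, 0.0]])

U, s, Vh = np.linalg.svd(A)     # A = U @ diag(s) @ Vh, singular values in decreasing order
print(s)                         # [2.449..., 1.414...]  =  [sqrt(6), sqrt(2)]
print(np.allclose(A, U @ np.diag(s) @ Vh))   # True

print(U)        # columns: u1, u2 (up to sign)
print(Vh.T)     # columns: v1, v2 (up to sign)
```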
The unitary matrices U and V are indeed orthogonal matrices if all entries are real and,
by our Definition in 5.29, describe rotations (if det(·) = 1) or reflections (if det(·) = −1).
In the example above, both matrices are rotations: U rotates R2 by −45◦ and V rotates
it by 30◦ .
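These angles can also be read off numerically: for a rotation matrix, the first column is (cos θ, sin θ)ᵀ. A minimal sketch, assuming Python with numpy (not part of the text):
\begin{verbatim}
import numpy as np

U = np.array([[1., 1.], [-1., 1.]]) / np.sqrt(2.)
V = np.array([[np.sqrt(3.), -1.], [1., np.sqrt(3.)]]) / 2.

for name, R in [("U", U), ("V", V)]:
    # det = +1 confirms a rotation; the angle comes from the first column.
    angle = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    print(name, round(np.linalg.det(R), 6), round(angle, 1))
    # prints: U 1.0 -45.0  and  V 1.0 30.0
\end{verbatim}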
In Chapter 3, we have seen that linear maps fA : R2 → R2 with x 7→ Ax can only stretch, rotate and reflect. Hence, a linear map changes the unit circle into an ellipse, or it collapses it into a line or a point. By using the SVD, we can explain in more detail what happens exactly. Let us look at our Example 9.11:
[Figure: on the left, the unit circle with the ONB (v1 , v2 ), where v1 makes an angle of 30° with the x1 -axis; V ∗ rotates this picture so that v1 , v2 land on e1 , e2 ; Σ then stretches the axes with the different factors s1 and s2 (e1 ↦ s1 e1 , e2 ↦ s2 e2 ); finally U rotates by −45°. Altogether, fA = U ΣV ∗ maps the unit circle onto an ellipse with semi-axes s1 u1 and s2 u2 .]
Hence, the map fA = U ΣV ∗ is the composition of
(1) a rotation by −30° (multiplication by V ∗ ),
(2) a stretching separately into two directions (with factor s1 = √6 in direction of the x1 -axis and with factor s2 = √2 in direction of the x2 -axis), and
(3) a rotation by −45° (multiplication by U ).
The major axis and minor axis of the ellipse, which is constructed by fA from the unit circle, point in the directions of u1 and u2 (which are eigenvectors of AA∗ ), and their lengths are given by the singular values s1 and s2 .
The singular values si give us the stretching factors in certain (orthogonal) directions. For the largest singular value, s1 , we have
\[
s_1 = \max_{\|x\|=1} \|Ax\| =: \|A\| . \tag{9.10}
\]
The number ‖A‖ defined here is the matrix norm of A that we already introduced: it tells us how long the vector Ax ∈ Fm can at most be when x ∈ Fn has length 1. The matrix norm fulfils the three properties of a norm.
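A minimal numerical illustration of (9.10), assuming Python with numpy and a random example matrix (neither appears in the text itself): the largest singular value coincides with the spectral norm and bounds ‖Ax‖ for unit vectors x.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))

s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(s[0], np.linalg.norm(A, 2)))     # True: ||A|| = s_1

# ||Ax|| <= s_1 for every unit vector x
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
print(np.linalg.norm(A @ x) <= s[0] + 1e-12)      # True
\end{verbatim}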
Now we look at an important application of the SVD. We start with a calculation:
\[
A = U\Sigma V^*
= U \operatorname{diag}(s_1, 0, \dots, 0)\, V^* + \dots + U \operatorname{diag}(0, \dots, 0, s_r, 0, \dots, 0)\, V^*
= s_1 u_1 v_1^* + \dots + s_r u_r v_r^*
= \sum_{i=1}^{r} s_i u_i v_i^* . \tag{9.11}
\]
As we know, A has rank r, and each of the r terms in (9.11) has rank 1. Depending on the rate of decay of the singular values
s1 ≥ s2 ≥ · · · ≥ sr > 0,
we can omit some of the terms in the sum (9.11) without changing the matrix too much. We call this a low-rank matrix approximation of A.
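To make this concrete, here is a minimal sketch (assuming Python with numpy, which the text itself does not use) that builds the sum of the first k rank-one terms from (9.11):
\begin{verbatim}
import numpy as np

def low_rank_approx(A, k):
    """Sum of the first k rank-one terms s_i u_i v_i* from (9.11)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(k))

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))                        # rank 4 (almost surely)
print(np.allclose(low_rank_approx(A, 4), A))           # all r terms give A back
print(np.linalg.matrix_rank(low_rank_approx(A, 2)))    # 2
\end{verbatim}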
Example 9.12. Let us look at an 8-bit grey-scale picture with 500 × 500 pixels (which shows a beautiful duck on water):
[Picture: a 500 × 500 grey-scale photograph of a duck on water.]
This can be saved as a matrix A ∈ R500×500 where, in the entries, only integer values
0, 1, . . . , 255 are allowed.
\[
A = \begin{pmatrix}
1 & 3 & 7 & 5 & \cdots & 16 & 8\\
3 & 7 & 3 & 3 & \cdots & 11 & 12\\
\vdots & & & \ddots & & & \vdots\\
6 & 7 & \cdots & 248 & \cdots & 7 & 6\\
\vdots & & & & \ddots & & \vdots\\
4 & 8 & 8 & 5 & \cdots & 4 & 8\\
4 & 8 & 3 & 3 & \cdots & 6 & 6\\
2 & 3 & 3 & 4 & \cdots & 9 & 9
\end{pmatrix}
\qquad\text{(187 KB)}
\]
For calculations, we convert the entries into the range [0, 1] instead of [0, 255]. Most pictures should be full-rank matrices, and here we can calculate the rank and indeed get r = n = 500. Now let us write A in the representation given by equation (9.11), but instead of summing all r = 500 terms, we stop the summation after k = 50, 30, 10 or 5 terms and get the following pictures:
[Pictures: the approximations of the duck picture with k = 50, 30, 10 and 5 terms, requiring roughly 50 KB, 30 KB, 10 KB and 5 KB, respectively.]
The decay of the singular values s1 ≥ · · · ≥ s500 , shown in the plot below, explains why we already get a very good approximation with only 30 terms in the sum. The first singular values (s1 ≈ 262, s2 ≈ 28 and s3 ≈ 21) are not shown in the picture, for obvious reasons.
[Plot: the decay of the singular values of the duck picture.]
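The storage figures above are consistent with keeping, for k terms of (9.11), the k numbers si together with the k vectors ui and vi , i.e. about k(2 · 500 + 1) stored numbers. The following minimal sketch assumes Python with numpy, a synthetic stand-in image (the duck picture itself is not available here) and roughly one byte per stored number; all three are assumptions and not part of the text:
\begin{verbatim}
import numpy as np

n = 500
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, n)
# synthetic stand-in for the 500 x 500 grey-scale picture, scaled to [0, 1]
A = 0.5 + 0.5 * np.outer(np.sin(6 * x), np.cos(4 * x))
A = np.clip(A + 0.01 * rng.standard_normal((n, n)), 0.0, 1.0)

U, s, Vh = np.linalg.svd(A, full_matrices=False)

for k in (50, 30, 10, 5):
    Ak = (U[:, :k] * s[:k]) @ Vh[:k, :]          # first k terms of (9.11)
    stored = k * (2 * n + 1)                     # entries of s_i, u_i and v_i
    rel_err = np.linalg.norm(A - Ak, 2) / s[0]
    print(k, "terms:", round(stored / 1024), "KB (at one byte per number),",
          "relative error", round(rel_err, 4))
\end{verbatim}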
In Example 9.12, we have seen that a given matrix A with rank r = 500 can be well
approximated by matrices Ak with rank k = 50, 30, 10 or 5. For
\[
A = \sum_{i=1}^{r} s_i u_i v_i^*
\quad\text{and}\quad
k \in \{1, \dots, r\}
\quad\text{we set}\quad
A_k := \sum_{i=1}^{k} s_i u_i v_i^* .
\]
Ak has rank k and is in fact the best m × n-matrix with rank k for the approximation of
A. We measure the error of approximation by using the matrix norm in equation (9.10):
\[
\|A - A_k\|
= \bigl\| \, U \operatorname{diag}(s_1,\dots,s_k,s_{k+1},\dots,s_r,0,\dots)\, V^*
\;-\; U \operatorname{diag}(s_1,\dots,s_k,0,\dots)\, V^* \, \bigr\|
\]
\[
= \bigl\| \, U \operatorname{diag}(0,\dots,0,s_{k+1},\dots,s_r,0,\dots)\, V^* \, \bigr\|
= s_{k+1} \quad\text{(the largest singular value that is left)}.
\]
In short:
\[
s_{k+1} = \|A - A_k\| = \min\bigl\{\, \|A - B\| : B \in F^{m\times n},\ \operatorname{rank}(B) \le k \,\bigr\}. \tag{9.12}
\]
In particular, s1 is the distance of A to the set of all matrices with rank 0, which consists only of the zero matrix 0.
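Equation (9.12) can also be observed numerically; in the following minimal sketch (assuming Python with numpy and a random example, neither part of the text), the spectral-norm error of the truncated sum equals the first omitted singular value:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 5))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

for k in range(1, 5):
    Ak = (U[:, :k] * s[:k]) @ Vh[:k, :]       # the rank-k approximation A_k
    # s[k] is s_{k+1} in the 1-based notation of the text.
    print(k, np.isclose(np.linalg.norm(A - Ak, 2), s[k]))   # True for every k
\end{verbatim}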
At the end, let us take a look at the special case m = n, which means A and Σ are square
matrices. In this case, eigenvalues and singular values are related in the following sense:
• A is invertible if and only if all the singular values are non-zero (see also Proposi-
tion 6.28 for the same claim with eigenvalues).
The smallest singular value of A, sn , gives the distance of A to the set of all n × n-
matrices with rank n − 1 or smaller (which are exactly the singular matrices) by
equation (9.12).
The equation A−1 = (U ΣV ∗ )−1 = V Σ−1 U ∗ gives the SVD of A−1 . Therefore, the
singular values of A−1 are 1/s1 , . . . , 1/sn . The largest of these, meaning 1/sn , is
kA−1 k.
• We know from Corollary 9.8 that the product of all eigenvalues of a given matrix A is exactly det(A). Since
det(A) = det(U ΣV ∗ ) = det(U ) det(Σ) det(V ∗ ) and |det(U )| = |det(V ∗ )| = 1 for unitary matrices, we get |det(A)| = det(Σ). Hence the product of all singular values, which is det(Σ), is equal to the absolute value of det(A).
• If A is normal, which means A∗ A = AA∗ , then A can be diagonalised by using a
unitary matrix: A = XDX ∗ . Then D = diag(d1 , . . . , dn ) is a diagonal matrix with
the eigenvalues of A as entries and X = (x1 · · · xn ) consists of eigenvectors for A.
Hence A∗ A = XD∗ DX ∗ = X diag(|d1 |^2 , . . . , |dn |^2 ) X ∗ . The eigenvalues λi of A∗ A are, on the one hand, given by λ_i = d_i \overline{d_i} = |d_i|^2 and, on the other hand, they can be written as λ_i = s_i^2 by using the singular values si ≥ 0 of A. Therefore, we get si = |di |: the singular values of A are exactly the absolute values of the eigenvalues of A. (Both this and the determinant identity above are checked numerically in the small sketch after this list.)
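Here is that sketch, assuming Python with numpy and randomly chosen matrices (none of which appear in the text): one random square matrix for the determinant identity, and one normal matrix built as XDX∗ for the eigenvalue claim.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)

# |det(A)| equals the product of the singular values for any square matrix.
A = rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(abs(np.linalg.det(A)), np.prod(s)))          # True

# For a normal matrix N = X D X* (X unitary, D diagonal), the singular
# values are the absolute values of the eigenvalues d_i.
X, _ = np.linalg.qr(rng.standard_normal((4, 4))
                    + 1j * rng.standard_normal((4, 4)))
d = rng.standard_normal(4) + 1j * rng.standard_normal(4)
N = X @ np.diag(d) @ X.conj().T
sN = np.linalg.svd(N, compute_uv=False)
print(np.allclose(np.sort(sN), np.sort(np.abs(d))))           # True
\end{verbatim}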
Summary
• A lot of techniques in Linear Algebra deal with suitable factorisations of a given
matrix A:
• From Section 3.11.5: The steps of the Gaussian elimination are summarised by a left multiplication with a lower triangular matrix, let us call it L̃ here, and a permutation matrix P . Hence, L̃P A is the row echelon form K of A, and we have P A = LK with the lower triangular matrix L := L̃−1 .
• From Section 5.5: A linearly independent family of vectors (a1 , . . . , an ) from Fm can
be transformed into an ONS (q1 , . . . , qn ) by using the Gram-Schmidt procedure.
Therefore, for every k = 1, . . . , n we have ak ∈ Span(q1 , . . . , qk ). For the matrices
A := (a1 . . . an ) and Q := (q1 . . . qn ) we find A = QR, where R ∈ Fn×n is an
invertible upper triangular matrix.
• If we decompose A into a product U DV , there are different possibilities:
• For diagonalisable matrices, we can choose U = X and V = X −1 where in X the
columns are eigenvectors of A and form a basis. Then D has the eigenvalues of A
on the diagonal, counted with multiplicities; see Chapter 6. We also know that selfadjoint and, more generally, normal matrices A are always diagonalisable; in that case we can choose the eigenvectors in such a way that they form an ONB, which means X ∗ = X −1 .
• For non-diagonalisable matrices we can still write A = XDX −1 , but now D is not diagonal: we use the Jordan normal form as a substitute. We get the important
result that all (square) matrices A ∈ Cn×n have such a Jordan normal form and
therefore this decomposition. Note that we actually need the complex numbers
here.
• For the singular value decomposition, the two matrices U and V are no longer coupled to each other, which is why we can bring even rectangular matrices A into a “diagonal” structure. On the diagonal of D (which is often denoted by Σ), we find the so-called singular values of A. The singular value decomposition is used, for example, for low-rank approximation; a small numerical sketch of all these factorisations follows below.
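All of these factorisations can be computed with standard numerical libraries. A minimal sketch, assuming Python with numpy and scipy (neither is used in the text itself; the Jordan normal form is omitted because it cannot be computed in a numerically stable way):
\begin{verbatim}
import numpy as np
from scipy.linalg import lu, qr, eigh, svd

A = np.array([[4., 1., 2.],
              [1., 3., 0.],
              [2., 0., 5.]])

P, L, K = lu(A)        # scipy writes A = P @ L @ K, so the text's "P A = L K"
                       # uses the transpose of this permutation matrix P
Q, R = qr(A)           # A = Q @ R with Q orthogonal, R upper triangular
lam, X = eigh(A)       # A is symmetric, so A = X @ diag(lam) @ X.T
U, s, Vh = svd(A)      # A = U @ diag(s) @ Vh, the singular value decomposition

print(np.allclose(P @ L @ K, A),
      np.allclose(Q @ R, A),
      np.allclose(X @ np.diag(lam) @ X.T, A),
      np.allclose(U @ np.diag(s) @ Vh, A))
\end{verbatim}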