Lectures On Linear Algebra
A field F is a set with two operations, called addition and multiplication, which are denoted
by + and · (often omitted), respectively, and which satisfy the following properties:
3. There exists a unique identity element for each operation, denoted by 0 and 1, i.e.,
0 + a = a + 0 = a and 1a = a1 = a for all a ∈ F
The axiom system above is not the ‘most economical’ one. Check that it implies that 0a = a0 = 0
for every a ∈ F.
Examples of fields.
Fp – the finite field of p elements, p prime. The field Fp is often denoted by Z/pZ or Zp , and
is thought of as the set of p elements {0, 1, . . . , p − 1} where addition and multiplication are done
modulo p.
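For readers who want to experiment, here is a minimal sketch of arithmetic in Fp in Python (p = 5 and the helper names add, mul, inv are our own, not from the text); it checks numerically that every nonzero element has a multiplicative inverse.

```python
# Arithmetic in F_p is done modulo p (here p = 5; any prime works).
p = 5

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

def inv(a):
    # multiplicative inverse of a nonzero element, found by brute force
    return next(b for b in range(1, p) if mul(a, b) == 1)

assert all(add(a, (p - a) % p) == 0 for a in range(p))    # additive inverses exist
assert all(mul(a, inv(a)) == 1 for a in range(1, p))      # multiplicative inverses exist
print([inv(a) for a in range(1, p)])                      # [1, 3, 2, 4]
```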
Suppose V is a set whose elements are called vectors and denoted by v̄. Suppose there exists
a binary operation on V , called addition and denoted by +, which is commutative and associative,
has an identity element, denoted by 0̄ and called the zero vector, and is such that every vector v̄
has an additive inverse, which we denote by −v̄.
Suppose F is a field and there exists a function µ : F × V → V given by (k, v̄) ↦ kv̄,
which satisfies the following axioms:
1. 1v̄ = v̄ for all v̄ ∈ V
An ordered triple ((V, +), F, µ) is called a vector space. In a simpler manner, we just say that
V is a vector space over the field F. The function µ is mentioned very rarely.
The addition on Fn and the multiplication by scalars are defined componentwise: for every
v̄ = (v1 , . . . , vn ), ū = (u1 , . . . , un ) ∈ V , and every k ∈ F,
v̄ + ū = (v1 + u1 , . . . , vn + un ) and kv̄ = (kv1 , . . . , kvn ).
Then V = Fn is a vector space over F, which can be (and has to be) checked easily. For F = R,
we obtain the well familiar space Rn .
• Let V = C(0, 1) be the set of all continuous real valued functions on (0, 1): f : (0, 1) → R.
The addition on C(0, 1) and the scalar multiplication are defined pointwise: for any f, g ∈
C(0, 1), and every k ∈ R, (f + g)(x) = f (x) + g(x) and (kf )(x) = kf (x) for all x ∈ (0, 1).
Again, C(0, 1) is a vector space over R, which has to be checked. Here the fact that + is a
binary operation on C(0, 1), or more precisely, that it is ‘closed’, which means f + g ∈ C(0, 1),
is not a trivial matter. The same holds for kf , though it is a little easier. Similarly we can consider
C(R) or C^1 (0, 1) – the vector space over R of all real-valued differentiable functions from (0, 1)
to R with continuous first derivative.
• V = F[x] – set of all polynomials of x with coefficients from a field F. V is a vector space
over F with respect to the usual addition of polynomials and the multiplication of polynomials
by numbers (elements of F).
• V is the set of all functions y : R → R which are solutions of the differential equation y′′ − 5y′ + 6y = 0.
• V is the set of all sequences of real numbers (xn ), n ≥ 0, defined by the recurrences:
x0 = a, x1 = b, a, b ∈ R, and for all n ≥ 0, xn+2 = 2xn+1 − 3xn .
• V = Mm×n (F) – the set of all m × n matrices with entries from F with respect to usual
addition of matrices and the multiplication of matrices by scalars.
• Let A ∈ Mm×n , i.e., A is an m × n matrix over R. Then the set V of all solutions of the
homogeneous system of linear equations Ax̄ = 0̄, i.e., the set of all vectors x̄ ∈ Rn such that
Ax̄ = 0̄, is a vector space over R with respect to usual addition of vectors and the scalar
multiplication in Rn . Note that in this example elements of Rn are thought of as the column
vectors ( n × 1 matrices).
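As an informal sanity check (not part of the text), one can verify numerically that the solution set of Ax̄ = 0̄ is closed under addition and scalar multiplication. A small sketch with NumPy; the matrix A and the particular solutions are our own choices.

```python
import numpy as np

# A concrete 2x4 real matrix; the solutions of A x = 0 form a subspace of R^4.
A = np.array([[1., 2., 0., -1.],
              [0., 1., 1.,  3.]])

x1 = np.array([2., -1., 1., 0.])   # a solution found by hand
x2 = np.array([7., -3., 0., 1.])   # another solution found by hand
assert np.allclose(A @ x1, 0) and np.allclose(A @ x2, 0)

# Closure under the vector space operations of R^4:
k = 3.5
assert np.allclose(A @ (x1 + x2), 0)
assert np.allclose(A @ (k * x1), 0)
```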
Proof. (i) Indeed, if 0̄′ is another identity element, then 0̄ + 0̄′ = 0̄ as 0̄′ is an identity,
and 0̄ + 0̄′ = 0̄′ as 0̄ is an identity. So 0̄ = 0̄′ . This justifies the notation 0̄.
(ii) Indeed, let b̄, b̄′ be inverses of ā. Then consider the element (b̄ + ā) + b̄′ . Since b̄ + ā = 0̄,
then (b̄ + ā) + b̄′ = 0̄ + b̄′ = b̄′ . Similarly, consider b̄ + (ā + b̄′ ). Since ā + b̄′ = 0̄, then
b̄ + (ā + b̄′ ) = b̄ + 0̄ = b̄. Due to the associativity, (b̄ + ā) + b̄′ = b̄ + (ā + b̄′ ). So b̄′ = b̄. This
justifies the notation −ā.
(iii)
(iv)
(v)
(vi)
If we do not say otherwise , V will denote a vector space over an arbitrary field F.
An expression of the form k1 v1 + . . . + km vm , where k1 , . . . , km ∈ F and v1 , . . . , vm ∈ V , is called a linear combination of the vectors v1 , . . . , vm .
For a subset A of V the set of all (finite) linear combinations of vectors from A is called the
span of A and is denoted by Span(A) or ⟨A⟩. Clearly, A ⊆ Span(A).
Sometimes it is convenient to use Σ_{a∈A} k_a a as a notation for a general linear combination
of finitely many elements of A. In this notation we always assume that only finitely many
coefficients k_a are nonzero.
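Over a finite field the span of a finite set can be listed by brute force, which is a concrete way to see what Span(A) is. A sketch in Python (the field F3, the vectors, and the function name span are our own choices for illustration).

```python
from itertools import product

p = 3  # working over F_3

def span(vectors, p):
    """All linear combinations of the given vectors with coefficients in F_p."""
    n = len(vectors[0])
    result = set()
    for coeffs in product(range(p), repeat=len(vectors)):
        v = tuple(sum(k * vec[i] for k, vec in zip(coeffs, vectors)) % p
                  for i in range(n))
        result.add(v)
    return result

print(len(span([(1, 2), (0, 1)], p)))   # 9 : these two vectors span all of F_3^2
print(len(span([(1, 2)], p)))           # 3 : the span of one nonzero vector over F_3
```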
QUESTION: Given a subset A in a vector space V , what is the smallest subspace of V with
respect to inclusion which is a superset of A?
It turns out that it is Span(A)! It is proved in the following theorem among other properties
of subspaces and spans.
where the intersection is taken for all subspaces U of V for which W is a subset.
It is clear that for every A ⊂ V , A ⊆ Span(A). The second statement means that taking a
span of a subset of V more than once does not produce a greater subspace.
Problems.
When you do these problems, please do not use any notions or facts of linear algebra which we
have not discussed in this course. You must prove all your answers.
2. Prove or disprove:
4. Let v ∈ F3^4 and suppose v is not the zero vector. How many vectors does Span({v}) have?
5. Let P3 be the set of all polynomials of x with real coefficients of degree at most 3. Show
that P3 is a vector space over R. Show that the set {1, x, x2 , x3 } spans P3 , and the set
{1, x − 1, (x − 1)2 , (x − 1)3 } spans P3 .
7. Show that if U and W are subspaces of a vector space V , then U ∪ W need not be a
subspace of V . However, U ∪ W is a subspace of V if and only if U ⊆ W or W ⊆ U .
8. Let V be the set of all sequences of real numbers (xn ), n ≥ 0, defined by the recurrences:
x0 = a, x1 = b, and for all n ≥ 0, xn+2 = −2xn+1 + 3xn .
Every such sequence is completely defined by a choice of a and b. Show that V is a vector
space over R and that it can be spanned by a set of two vectors.
U + W = {u + w : u ∈ U and w ∈ W }.
11. Consider R∞ – the vector space of all infinite sequences of real numbers, with addition
of vectors and multiplication of vectors by scalars defined similarly to the ones in Rn .
Consider the subset l^2 (R) of all those sequences (xn ) such that Σ_{i=1}^{∞} x_i^2 converges. Does
l^2 (R) span R∞ ?
Usually, and in this course, a set can have only distinct elements. Otherwise we would refer to
it as a multiset.
A set X is called finite if there exists an integer n ≥ 0 and a bijection from X to {1, . . . , n}. A
set X is called infinite if there exists a bijection from a subset Y of X to N = {1, 2, . . .}.
A set of vectors is called linearly independent, if every finite subset of it is linearly independent.
Otherwise a set is called linearly dependent.
Examples.
• A vector v ∈ V forms a linearly independent set {v} if and only if v ≠ 0. The set {0} is
linearly dependent.
• A set {(2, 3), (−1, 4), (5, −9)} ⊂ R2 is linearly dependent: 1(2, 3) + (−3)(−1, 4) = (5, −9).
• The set {1, x, e^x } ⊂ C(R) is linearly independent. Indeed, suppose a · 1 + b · x + c · e^x = 0
for all x ∈ R. Substituting x = 0, 1, −1, we obtain
a + c = 0, a + b + ce = 0, a − b + ce^{−1} = 0.
Hence a(1 − e) + b = 0 and a(1 − e^{−1} ) − b = 0. Adding these equalities we get (2 − e − e^{−1} )a =
0. Since e = 2.7182818284590 . . ., 2 − e − e^{−1} ≠ 0. Hence a = 0. Substituting back, we
get b = 0 and c = 0. Hence {1, x, e^x } is linearly independent.
• Let i ∈ N and ei = (x1 , . . . , xn , . . .) be the vector (i.e., the infinite sequence) from the vector
space R∞ such that xi = 1 and xj = 0 for all j ≠ i. The infinite set of all vectors ei is
linearly independent.
3. A set of vectors is linearly dependent if and only if there exists a vector in the set which
is a linear combination of other vectors.
Or, equivalently,
A set of vectors is linearly dependent if and only if, there exists a vector in the set which
is a linear combination of some other vectors from the set.
(This explains why the same definition of linear independence is not made for multiset).
Remark: when we denote a linear combination of vectors from A by Σ_{a∈A} k_a a, we assume
that only finitely many coefficients k_a are nonzero.
5. Let A be a linearly independent subset of V which does not span V . Let b ∈ V \ Span(A).
Then A ∪ {b} is a linearly independent subset of V .
The cardinality of a basis is called the dimension of V . If V has a basis of n vectors, for some
n ≥ 1, we say that V is finite-dimensional, or n-dimensional, or has dimension n, or write
dim V = n.
We assume that the trivial vector space {0} has dimension zero, though it has no basis.
Examples
• Let n ≥ 1, 1 ≤ i ≤ n. Let ei denote the vector from Fn having the i-th component 1 and
all other components 0. Then {e1 , . . . , en } is a basis of Fn , called the standard basis of
Fn .
• Let V = Q[√2] := {a + b√2 : a, b ∈ Q}. Then V is a vector space over Q of dimension 2,
and {1, √2} is a basis.
• Let P = F[x] be the vector space of all polynomials of x over F. Then P is infinite
dimensional. Indeed, if it is not, then it has a finite basis B. Each element of the basis
is a polynomial of some (finite) degree. Let m be the greatest of the degrees of the
polynomials from the basis. Then Span(B) contains only polynomials of degree at most
m and hence Span(B) is a proper subset of P . E.g., x^{m+1} ∉ Span(B). The obtained
contradiction proves that P is infinite-dimensional over F.
We wish to remind ourselves that the field F which we decided not to mention every time is
there. The same set of objects can be a vector space over different fields, and the notion of
dimension depends on the field. For example, C is a vector space over R of dimension two: {1, i}
is a basis. The same C is a vector space over Q of infinite dimension. All these statements have
to be, and will be, proved.
Proof. By Theorem 3, we can assume that A contains no zero vector. We proceed by induction
on m. Let m = 1. Then A = {v}, where v 6= 0. Hence A is a basis.
Suppose the statement is proven for all sets A with |A| = k, 1 ≤ k < m. Now let Span(A) = V ≠ {0} and
|A| = m. If A is linearly independent, then A is a basis, and the proof is finished. Therefore we
assume that A is linearly dependent. Then some v ∈ A is a linear combination of other vectors
from A (Theorem 3). Let A′ = A \ {v}. Then Span(A′ ) = Span(A) = V , and |A′ | = m − 1. By
the induction hypothesis, A′ has a subset which is a basis of V . This subset is also a subset of A,
and the proof is finished.
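The proof above is essentially an algorithm: repeatedly discard a vector that is a linear combination of the remaining ones. Here is a small sketch of the equivalent greedy version for subsets of R^n, using NumPy ranks; the vectors and the function name are ours.

```python
import numpy as np

def extract_basis(vectors):
    """Keep a vector only if it enlarges the span; the kept vectors form a basis of Span(vectors)."""
    basis = []
    for v in vectors:
        candidate = basis + [v]
        if np.linalg.matrix_rank(np.array(candidate)) == len(candidate):
            basis.append(v)
    return basis

A = [[1, 0, 1], [2, 0, 2], [0, 1, 1], [1, 1, 2]]   # spans a 2-dimensional subspace of R^3
print(extract_basis(A))                            # [[1, 0, 1], [0, 1, 1]]
```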
Corollary 6 In a finite-dimensional space every basis has the same number of vectors.
Corollary 7 If a vector space contains an infinite linearly independent subset, then it is infinite-
dimensional.
In mathematics we often look for sets which satisfy a certain property and which are minimal
(maximal) in the sense that no proper subset (superset) of them satisfies this property. E.g., a
minimal set of axioms for a theory, a maximal independent set in a graph, a maximal matching
in a graph, a minimal generating set for a group, etc. At the same time we often want to find
sets which satisfy the property and have the largest (smallest) possible number of elements; such sets are called maximum (minimum).
Example. Consider all families of non-empty subsets of the set {1, 2, 3, 4, 5} such that the in-
tersection of every two subsets of the family is empty. Examples of such families are A =
{{1, 2, 3, 4}, {5}}, or B = {{1, 5}, {2, 3}, {4}}. |A| = 2 and |B| = 3. Both A and B are maximal,
but neither of them is maximum, since the family of singletons C = {{1}, {2}, {3}, {4}, {5}} also
possesses the property and has 5 members. So a maximum family must contain at least five
members.
Proof. Complete! .
When you do these problems, please do not use any notions or facts of linear algebra which we
have not discussed in this course. You must prove all your answers.
1. Let V = F be a field considered as a vector space over itself. Find dim V and describe all
bases of V .
2. Prove or disprove:
3. How many bases does Fp^2 have? You can first try to answer this question for p = 2, 3, 5.
4. Let P3 be the set of all polynomials of x with real coefficients of degree at most three.
Show that P3 is a vector space over R. Show that the set {1, x − 1, (x − 1)2 , (x − 1)3 } is
a basis of P3 .
(Remember, that here P3 is a space of polynomials (as formal sum of monomials), not
polynomial functions.)
Let 1, x, x2 , x3 be functions (polynomial functions) from the vector space C(R). Prove
that they are linearly independent.
V = {(x1 , . . . , xn ) : a1 x1 + . . . + an xn = 0} ⊆ Fn .
Find a basis of V consisting of two geometric progressions, i.e., of sequences of the form
(cr^n ), n ≥ 0. Write the vector (i.e., the sequence) of this space corresponding to a = b = 1
as a linear combination of the vectors from this basis.
10. Let u, v, w be three distinct vectors in Fp^n . How many vectors can Span({u, v, w}) have?
11. Consider V = R∞ – the vector space of all infinite sequences of real numbers, with addition
of vectors and multiplication of vectors by scalars defined similarly as in Rn . Prove that
V is infinite-dimensional.
12. A complex (in particular, real) number α is called transcendental if α is not a root
of a polynomial equation with integer coefficients. For example, the famous numbers
π and e are transcendental, though the proofs are hard. Explain that the existence of
transcendental numbers implies that R is infinite-dimensional as a vector space over Q.
13. Let V = R be the vector space over Q. Prove that the set {1, 2^{1/3} , 2^{2/3} } is linearly
independent.
14. (Optional) Let V = R be the vector space over Q. Prove that the infinite set
(i) {1, 2^{1/2} , 2^{1/2^2} , . . . , 2^{1/2^n} , . . .} is linearly independent;
(ii) {√p : p ∈ Z, p ≥ 2, and p is prime} is linearly independent.
15. (Optional) Prove that the functions ex , e2x form a basis in the vector space of all solutions
of the differential equation y ′′ − 3y ′ + 2y = 0.
We would like to add a few more facts about subspaces and their dimensions. The following
notations are useful: by W ≤ V we will denote the fact that W is a subspace of V , and we
write W < V if W is a proper subspace of V (i.e., W ≠ V ).
Proof. If W < V , then a basis of W , which has n vectors in it, is not a basis of V . By the
previous corollary, it can be extended to a basis of V which will contain n + 1 elements. This
is impossible, since every basis of V contains n vectors.
We claim that
B = {v1 , . . . , vr , u1 , . . . , up−r , w1 , . . . , wq−r }
is a basis of U + W . It is clear that B spans U + W . To verify that B is linearly independent, suppose
a1 v1 + . . . + ar vr + b1 u1 + . . . + bp−r up−r + c1 w1 + . . . + cq−r wq−r = 0.            (1)
Then c1 w1 + . . . + cq−r wq−r = −(a1 v1 + . . . + ar vr + b1 u1 + . . . + bp−r up−r ) lies in U , and it obviously
lies in W , hence it lies in U ∩ W = ⟨v1 , . . . , vr ⟩. Therefore c1 w1 + . . . + cq−r wq−r = d1 v1 + . . . + dr vr
for some di ∈ F, which is a linear relation on vectors of the basis {v1 , . . . , vr , w1 , . . . , wq−r } of W .
Since {v1 , . . . , vr , w1 , . . . , wq−r } is linearly independent (as a basis of W ), all the coefficients
are zeros. In particular all ci are zeros. A similar argument gives that all bi equal zero. Then (1)
gives
a1 v1 + . . . + ar vr = 0,
and as {v1 , . . . , vr } is linearly independent (as a basis of U ∩ W ), all ai are zeros. Hence B is
linearly independent and
dim(U + W ) = |B| = r + (p − r) + (q − r) = p + q − r.
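A quick numerical illustration of dim(U + W) = dim U + dim W − dim(U ∩ W), with subspaces of R^3 chosen by us so that the intersection is known by construction.

```python
import numpy as np

e1, e2, e3 = np.eye(3)
U = np.array([e1, e2])    # U = <e1, e2>
W = np.array([e2, e3])    # W = <e2, e3>, so U ∩ W = <e2> has dimension 1

dim_U = np.linalg.matrix_rank(U)                      # 2
dim_W = np.linalg.matrix_rank(W)                      # 2
dim_sum = np.linalg.matrix_rank(np.vstack([U, W]))    # dim(U + W) = 3
assert dim_sum == dim_U + dim_W - 1                   # 3 = 2 + 2 - 1
```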
Let V1 and V2 be vector spaces over the same field F. Consider the set
V1 × V2 = {(v1 , v2 ) : v1 ∈ V1 , v2 ∈ V2 }.
If we define
(v1 , v2 ) + (v1′ , v2′ ) = (v1 + v1′ , v2 + v2′ ) and k(v1 , v2 ) = (kv1 , kv2 ) for k ∈ F,
then it is easy to check that V1 × V2 is a vector space over F. It is called the direct product
or the external direct product of V1 and V2 , and will be denoted by V1 × V2 . The direct
product of more than two vector spaces over F is defined similarly.
A reader may get a feeling that the notions of the direct sum and direct product are very similar,
and there is no distinction between the resulting vector spaces. Note also that, in contrast to U
and W being subspaces of U ⊕ W , neither V1 nor V2 is a subspace of V1 × V2 . All this may be a
little bothersome. The notion of isomorphism, which we are going to define very soon, will help
to discuss these issues in a precise way.
Problems.
When you do these problems, please do not use any notions or facts of linear algebra which we
have not discussed in this course. You must prove all your answers.
1. It is clear how to generalize the definition of the direct product of two vector spaces to
any finite (or infinite) collection of vector spaces over the same field F.
What about the direct sum of subspaces? Try n ≥ 2 subspaces. Can you state and prove
a statement similar to Proposition 12? If you do not see how to do it for any n ≥ 2
subspaces, maybe do it first for three subspaces. Then state the result for n subspaces.
You do not have to prove it.
Can you extend the definition of the direct sum for infinitely many subspaces? If you can,
please state it.
3. Prove that C(R) = E ⊕ O, where E is the subspace of all even functions and O is the
subspace of all odd functions.
(A function f : R → R is called even (resp. odd) if for all x ∈ R, f (−x) = f (x) (resp.
f (−x) = −f (x) ).)
6. How many k-dimensional subspaces, k ≥ 1 and fixed, does C(R) (over R) have?
7. Prove that functions 1, ex , e2x , . . . , enx , n ≥ 1 and fixed, are linearly independent as vectors
in C ∞ (R).
8. Give an example of three functions f1 , f2 , f3 ∈ C ∞ (R), and three distinct real numbers
a, b, c, such that f1 , f2 , f3 are linearly independent, but the vectors (f1 (a), f2 (a), f3 (a)),
(f1 (b), f2 (b), f3 (b)), (f1 (c), f2 (c), f3 (c)) are linearly dependent as vectors in R3 .
9. Let U = ⟨sin x, sin 2x, sin 3x⟩ and W = ⟨cos x, cos 2x, cos 3x⟩ be two subspaces in C ∞ (R).
Find dim(U ∩ W ).
(The symbol ⟨v1 , . . . , vn ⟩ is just another common notation for Span({v1 , . . . , vn }).)
10. Prove that if dim V = n and U ≤ V , then there exists W ≤ V such that V = U ⊕ W .
Does such W have to be unique for a given U and V ?
11. Prove that Rn is the direct sum of two subspaces defined as:
U = {(x1 , . . . , xn ) : x1 + . . . + xn = 0} and
W = {(x1 , . . . , xn ) : x1 = x2 = · · · = xn }.
13. (Optional) Let V = R be the vector space over Q. Prove that the vectors π and cos^{−1} (1/3)
are linearly independent.
Strange objects (vector spaces) have to be studied by strange methods. This is how we arrive
at linear mappings. Since linear mappings are functions, we review related terminology and
facts. We do it here in a very brief way.
Given sets A and B, a function f from A to B is a subset of A × B such that for every a ∈ A there
exists a unique b ∈ B such that (a, b) ∈ f . The fact that f is a function from A to B is represented
by writing f : A → B. The fact that (a, b) ∈ f is represented by writing f : a ↦ b, or
f a = b, or in the usual way: f (a) = b. We also say that f maps a to b, or that b is the image
of a under f .
Let im f := {b ∈ B : b = f (a) for some a ∈ A}. The set im f is called the image of f , or the
range of f .
We say that f is bijective if it is one-to-one and onto, or, equivalently, if f is both injective
and surjective.
It is easy to show that h is a function from A to C, and h(a) = g(f (a)). It is called the
composition of functions f and g, and denoted by g ◦ f . It is easy to check that h is
It may happen that for f : A → B, the set {(b, a) : f (a) = b} ⊂ B × A is a function from B to
A. This function is denoted by f −1 and called the inverse (function) of f . It is clear that this
happens, i.e., f −1 exists, if and only if f is a bijection.
Then f is called a linear map (or mapping) from V to W . If V = W , a linear map from V to
V is called a linear operator on V , or a linear transformation of V .
Here are some examples of linear maps. Verification that the maps are linear is left to the
reader.
• Let V = C^2 (a, b), W = C^1 (a, b), and let d : V → W be defined via f ↦ f ′ – the derivative
of f .
An easy way to construct a linear map is the following. Let {v1 , . . . , vn } be a basis of V . Choose
arbitrary n vectors {w1 , . . . , wn } in W and define f : V → W via
k1 v1 + . . . + kn vn ↦ k1 w1 + . . . + kn wn for all ki ∈ F.
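A small sketch of this construction for V = R^3 and W = R^2 with NumPy: the coordinates k1, . . . , kn of x in the basis {v1, . . . , vn} are found by solving a linear system, and f(x) is then the corresponding combination of the wi. The bases and images below are our own choices.

```python
import numpy as np

V_basis  = np.array([[1., 1., 0.], [0., 1., 1.], [1., 0., 1.]]).T  # columns v1, v2, v3 (a basis of R^3)
W_images = np.array([[1., 0.], [0., 1.], [1., 1.]]).T              # columns w1, w2, w3 in R^2

def f(x):
    k = np.linalg.solve(V_basis, x)   # coordinates of x in the basis {v1, v2, v3}
    return W_images @ k               # f(x) = k1 w1 + k2 w2 + k3 w3

print(f(np.array([2., 2., 0.])))      # x = 2 v1, so f(x) = 2 w1 = [2. 0.]
```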
For a linear map f : V → W let ker f := {v ∈ V : f (v) = 0}. The set ker f is called the
kernel of f .
Theorem 13 Let V and W be vector spaces over F, and let f : V → W be a linear map. Then
5. im f ≤ W
Proof.
1. Since f is injective, ker f = ⟨0⟩. Hence dim V = dim ker f + dim im f = dim im f ≤
dim W .
2. Since f is surjective, im f = W . Hence dim V = dim ker f + dim im f = dim ker f + dim W ≥ dim W .
4. Let n be the common dimension of V and W , and let {v1 , . . . , vn } and {w1 , . . . , wn } be
bases of V and W , respectively. Define f : V → W by k1 v1 + . . . + kn vn ↦ k1 w1 + . . . + kn wn .
Such an f is linear, injective, and surjective, hence an isomorphism.
Problems.
2. Can you describe ker f and im f for all linear maps f from the examples of this lecture?
3. Let V and W be vector spaces over F, and let f : V → W be a function satisfying one
of the two conditions required for f being linear. For each of the two conditions, find an
example of f which satisfies this condition, but not the other one.
4. Find an isomorphism between Fn+1 and Pn (F) – the vector space of all polynomials over
F of degree at most n.
5. Find an isomorphism between Fmn and Mm×n (F) – the vector space of all m × n matrices
over F.
6. Let V be a finite dimensional space over F. Decide whether the following statements are
true or false? Explain.
(i) If V = V1 ⊕ V2 , then V ≃ V1 × V2 .
(ii) If V ≃ V1 × V2 , where V1 , V2 are subspaces of V , then V = V1 ⊕ V2 .
(iii) Let f : V → V be a linear operator on V . Then V ≃ ker f × im f .
(iv) Let f : V → V be a linear operator on V . Then V = ker f ⊕ im f .
8. (Optional) Suppose S is a set of 2n + 1 irrational real numbers. Prove that S has a subset
T of n + 1 elements such that no nonempty subset of T has a rational sum.
– Yogi Berra
Lecture 7.
What do we do next? There are many natural questions we can ask about vector spaces at
this point. For example, how do they relate to geometry or other parts of mathematics? Or to
physics? Did they help Google or the national security?
I hope we will touch all these relations, but now we will talk about matrices, objects which are
inseparable from vector spaces. We assume that the reader is familiar with basic definitions and
facts about matrices. Below we wish to discuss some natural questions which lead to matrices.
1. Trying to solve systems of linear equations is one of them. There matrices and vectors (the
latter also can be viewed as n × 1 or 1 × n matrices) appear as very convenient notations. As all
of you know, the problem of solving a system of m linear equations each with n variables can
be restated as finding a column vector x ∈ Fn such that Ax = b, where A = (aij ) is the matrix
of the coefficients of the system, and b ∈ Fm is the column vector representing the right hand
sides of the equations. Here the definition for the multiplication of A by x is chosen in such a
way that Ax = b is just a short way of rewriting the system. It seems that nothing is gained
by this rewriting, but not quite. Somehow this way of writing reminds us about the simplest
linear equation ax = b, where a, b ∈ F are given numbers and x
is the unknown number. We know that we can always solve it if a ≠ 0, and the unique solution
is x = a^{−1} b. The logic of arriving at this solution is as follows:
We also know that for a = 0, we either have no solutions (if b ≠ 0), or every element of F is a
solution (if b = 0). Analogy in appearance suggests analogy in approach, and we may try
to invent something like 0, or 1, or A^{−1} for matrices. We may ask the question whether the
product of matrices is associative, etc.
Trying to push the analogy, we may say that, in F, it does not matter whether we are solving
ax = b or xa = b or xa − b = 0. Trying to see whether it is true for matrices, we immediately
realize that xA (what is it ???) has little to do with already introduced Ax, and that the
obvious candidate for zero-matrix, does not allow to claim that if A is not ‘the’ zero matrix,
then Ax = b is always solvable. Thinking about all this, one may come to the usual non-
commutative (but associative) ring of square matrices Mn×n (F). Analyzing further, one realizes
2. Another way to arrive to matrices, especially to the notion of matrix multiplication, can be
through doing changes of variables. The method was used long before vectors or matrices were
born. Mostly in number theory or in geometry, when the coordinates were used. If x = 3a − 2b
and y = a + 5b, and a = e − f − 5g, and b = 2e + f + 7g, then x = 3(e − f − 5g) − 2(2e + f + 7g) =
−e − 5f − 29g, and y = (e − f − 5g) + 5(2e + f + 7g) = 11e + 4f + 30g. The expression for x
and y in terms of e, f , and g, can be obtained via the following computation with matrices:

[ x ]   [ 3 −2 ] [ a ]   [ 3 −2 ] [ 1 −1 −5 ] [ e ]
[ y ] = [ 1  5 ] [ b ] = [ 1  5 ] [ 2  1  7 ] [ f ]            (2)
                                              [ g ]

( [ 3 −2 ] [ 1 −1 −5 ] ) [ e ]   [ −1 −5 −29 ] [ e ]
( [ 1  5 ] [ 2  1  7 ] ) [ f ] = [ 11  4  30 ] [ f ]           (3)
                         [ g ]                 [ g ]
Tracing how the coefficients are transformed, leads to the rule for matrix multiplication. This
rule will become more visible if we use letters instead of numbers for the coefficients in our
transformations.
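The substitution above is easy to check numerically; a one-line verification with NumPy, using the matrices from (2) and (3).

```python
import numpy as np

M1 = np.array([[3, -2], [1, 5]])           # expresses x, y through a, b
M2 = np.array([[1, -1, -5], [2, 1, 7]])    # expresses a, b through e, f, g

print(M1 @ M2)   # [[-1  -5 -29] [11   4  30]] : expresses x, y through e, f, g
```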
It is clear that the correspondence f ↦ Mf defined by (4) between the set of all linear maps
from U to V and the set of all m × n matrices over F is a bijection, and depends heavily on the
choice of the two bases.
Then the composition g ◦ f of linear maps f and g, which is a linear map itself, is given by a
m × q matrix Mg◦f = (cst ), where (g ◦ f )(us ) = cs1 w1 + . . . + csq wq , s ∈ {1, . . . , m}. Let us
express the coefficients cst in terms of aij and bkl .
(g ◦ f )(us ) = g(f (us )) = g( Σ_j asj vj ) = Σ_j asj g(vj ) = Σ_j asj ( Σ_t bjt wt ) = Σ_j Σ_t asj bjt wt = Σ_t ( Σ_j asj bjt ) wt .
Therefore cst = Σ_j asj bjt , and we obtain
Mg◦f = Mf Mg .
Hence
[(g ◦ f )(u1 ), . . . , (g ◦ f )(um )]^T = Mg◦f [w1 , . . . , wq ]^T = (Mf Mg ) [w1 , . . . , wq ]^T .            (6)
Though equality (6) makes the relation between the matrix of a composition of linear maps
and the product of the corresponding matrices very clear, one should not read more from it
than what it displays. Contrary to the associativity of the product of matrices in (2) and
(3), the right hand side of (6) is not the usual product of three matrices: trying to check the
associativity, we have difficulties with the meaning of the ‘products’ involved. If the column
vector of vectors wi is denoted by w⃗ , what is the meaning of Mg w⃗ ? Or of Mf w⃗ , if we hope that
multiplying by Mf first may help?
Problems.
3. Check that the only matrix X ∈ Mm×m (F) with the property that XA = A for all
A ∈ Mm×n (F) is the identity matrix Im =diag(1, 1, . . . , 1). Similarly, the only matrix
Y ∈ Mn×n (F) with the property that AY = A for all A ∈ Mm×n (F) is the identity matrix
In =diag(1, 1, . . . , 1).
4. Check that if AB and BA are defined, then both AB and BA are square matrices. Show
that the matrix multiplication of square matrices is not, in general, commutative.
5. Check that the two products of three matrices, (AB)C and A(BC), always exist
simultaneously and are equal. Hence matrix multiplication is associative. Do this
exercise in two different ways. First, by using the formal definition of matrix multiplication
and manipulating sums. Then, by using the correspondence between matrix multiplication
and composition of linear maps. Prove the fact that composition of three functions is
associative.
6. Check that for any three matrices A, B, C over the same field F, A(B + C) = AB + AC,
and (A + B)C = AC + BC, provided that all operations are defined. These are the
distributive laws.
8. Matrices from Mn×n (F) are also referred to as square matrices of order n. Let A be
a square matrix of order n. A matrix B is called the inverse of A if AB = BA = In .
The inverse matrix, if it exists, is denoted by A^{−1} , and in this case A is called nonsingular;
otherwise A is called singular.
(ii) Give an example of a 2 × 2 matrix A such that A is not the zero matrix and A^{−1} does
not exist.
9. Show that there exist matrices A, B, C ∈ M2×2 (F) such that AB = AC = 0, A ≠ 0, B ≠ C.
10. Let A = (aij ) ∈ Mn×n (F). Define tr A := Σ_{i=1}^{n} aii . The field element tr A is called the
trace of A.
Prove that f : Mn×n (F) → F defined via A ↦ tr A is a linear map, and that tr AB =
tr BA for every two matrices A, B ∈ Mn×n (F).
12. Find the set of all matrices C ∈ Mn×n (F) such that CA = AC for all A ∈ Mn×n (F).
Mf −1 Mf = Mf Mf −1 = In .
With α and β as above, let g : V → V be a linear map. Then we can represent g in two different
ways:
[g(u1 ), . . . , g(un )]^T = Mg,α [u1 , . . . , un ]^T and [g(v1 ), . . . , g(vn )]^T = Mg,β [v1 , . . . , vn ]^T .            (8)
As β is a basis, we have
[u1 , . . . , un ]^T = C [v1 , . . . , vn ]^T ,            (9)
for some matrix C. Since C = Mid with respect to the bases α and β, and id is an isomorphism
of V to V , then C is an invertible matrix, as we proved at the beginning of this lecture. Next
we notice that (9) implies
[g(u1 ), . . . , g(un )]^T = C [g(v1 ), . . . , g(vn )]^T .            (10)
Equalities (8), (9), (10), and the associativity of matrix multiplication imply that
[g(u1 ), . . . , g(un )]^T = (Mg,α C) [v1 , . . . , vn ]^T = (C Mg,β ) [v1 , . . . , vn ]^T .            (11)
As vectors {v1 , . . . , vn } are linearly independent, we obtain Mg,α C = C Mg,β , or, since C is
invertible,
Mg,β = C^{−1} Mg,α C .            (12)
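Formula (12) can be tested numerically. In the sketch below (our own data, standard R^2), the bases α and β are stored as rows, an operator g is given by its standard matrix G acting on columns, and the matrices Mg,α, Mg,β are computed directly from the convention of (8); the assertion checks (12).

```python
import numpy as np

Ua = np.array([[1., 1.], [0., 1.]])    # rows u1, u2  (basis alpha of R^2)
Vb = np.array([[2., 1.], [1., 1.]])    # rows v1, v2  (basis beta of R^2)
G  = np.array([[1., 2.], [3., 4.]])    # g in standard coordinates: g(x) = G x

C = Ua @ np.linalg.inv(Vb)             # u_i = sum_j C_ij v_j, as in (9)

# Convention of (8): g(u_s) = sum_t (Mg)_{st} u_t, i.e. (rows g(u_i)) = Mg * (rows u_i)
Mg_a = Ua @ G.T @ np.linalg.inv(Ua)
Mg_b = Vb @ G.T @ np.linalg.inv(Vb)

assert np.allclose(Mg_b, np.linalg.inv(C) @ Mg_a @ C)   # formula (12)
```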
We conclude this lecture with another important result. Though it is also related to a change
of basis of V , it is about a different issue.
Theorem 15 Let V be a vector space over F, let α = {u1 , . . . , un } and β = {v1 , . . . , vn } be two
bases of V , and let A = (aij ) be an n × n matrix over F such that uj = Σ_i aij vi . Then
[x]β = A [x]α .
Proof. We have
x = Σ_j xj uj = Σ_j xj ( Σ_i aij vi ) = Σ_i ( Σ_j aij xj ) vi .
Now we observe that the (i, 1)-th, or just the i-th, entry of the column vector A [x]α is precisely
Σ_j aij xj .
Remarks
• This theorem becomes very useful when we want to convert coordinates of many vectors
from one fixed basis to another fixed basis.
• Note that according to our definition of the coordinate vector, [uj ]β is the j-th column of
matrix A.
• If α is a standard basis of Fn , and x ∈ Fn , then the i-th components of x and [x]α are
equal.
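A tiny numeric check of Theorem 15 in the simplest setting: β is the standard basis of R^2, the columns of A are the vectors uj written in β-coordinates (as in the remark above), and x is expressed both ways. The data are ours.

```python
import numpy as np

u1, u2 = np.array([1., 1.]), np.array([1., -1.])   # basis alpha, written in the standard basis beta
A = np.column_stack([u1, u2])                      # j-th column of A is [u_j]_beta

x_alpha = np.array([2., 3.])                       # coordinates of x = 2 u1 + 3 u2 in alpha
x_beta  = 2 * u1 + 3 * u2                          # the same vector in beta (standard) coordinates
assert np.allclose(x_beta, A @ x_alpha)            # [x]_beta = A [x]_alpha
```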
For any m × n matrix A = (aij ), let A^T denote the transpose of A, i.e., the n × m matrix
(a′kl ), where a′kl = alk for all k ∈ {1, . . . , n} and l ∈ {1, . . . , m}. If in the statement of Theorem
15 we wrote ui = Σ_j aij vj , then the result would change to
[x]β = A^T [x]α .
One may ask why we do not just use row notation for vectors from Fn . Many texts do it.
In this case, to multiply a vector and a matrix, one would write xA. The approach has both
advantages and disadvantages. One disadvantage is having x on the left of A when we want to
consider a function defined by x ↦ xA.
Problems.
I recommend that you check your computations with any CAP (Computer Algebra Package), e.g., Maple,
Mathematica, etc.
1. Let f : R3 → R2 be a linear map defined as f ([x, y, z]) = [2x − y, x + y − 2z], and let α and β be
bases of R3 and R2 , respectively. Find matrix Mf of f if
2. Let α = {v1 = [1, 1, 0], v2 = [0, 1, 1], v3 = [1, 0, 1]} ⊂ R3 , and let f be a linear operator on R3
such that [f (v1 )]α = [−1, −3, −3], [f (v2 )]α = [3, 5, 3], [f (v3 )]α = [−1, −1, 1]. Find Mf,α . Then
find Mf,β for β = {[1, 1, 1], [1, 0, 0], [1, 0, −3]}.
3. Let V = P3 be the vector space of all polynomials with real coefficients of degree at most 3. View
P3 as a subspace of C ∞ (R). Let d : V → V be defined as f ↦ f ′ = df /dx. Find Md,α , Md,β for the bases
α = {1, x, x^2 , x^3 } and β = {1, x − 2, (x − 2)^2 /2!, (x − 2)^3 /3!}. In which basis is the matrix simpler?
Then compute Md^i ,α , Md^i ,β , i = 2, 3, 4, where d^i := d ◦ d^{i−1} (the composition of d with itself i
times).
4. Identify the Euclidean plane E2 with R2 by introducing a Cartesian coordinate system in E2 , and
matching a point with coordinates (x1 , x2 ) with vector (x1 , x2 ) (same as [x1 , x2 ]). We can depict
the vector as a directed segment from the origin (0, 0) to point (x1 , x2 ).
Write the matrix Mf for the following linear operators on R2 with respect to the standard basis.
(a) f = sl – the symmetry with respect to a line l of E2 , where l passes through the origin. Do
it for l being x-axis; y-axis; line l : y = mx.
7. Let A and B be square matrices of order n. Prove that if AB = In , then BA = In . This means
that the equality AB = In alone implies B = A−1 , i.e., the second condition in the definition of a
nonsingular matrix can be dropped.
8. (Optional) Let f be a rotation of R3 by angle π/2 (or θ) with the axis ⟨(1, 2, 3)⟩ (or ⟨(a, b, c)⟩).
Find Mf with respect to the standard basis.
Now we wish to introduce the notion of the determinant of a square matrix A. Roughly speaking,
the determinant of A is an element of the field constructed by using all n^2 entries of the matrix
in a very special way.
It is denoted by det A. Though det A does not capture all the properties of A, it does capture
a few very important ones. Determinants have always played a major role in linear algebra.
They appeared when people tried to solve systems of linear equations. Then they found use
in analysis, differential equations, geometry. The notion of the volume of a parallelepiped in
Rn , defined by n linearly independent vectors, is introduced as the absolute value of the
determinant of the matrix which has these vectors as rows.
There are several conventional expositions of determinants, each having its own merits. The
one we chose employs the notion of an exterior algebra. So far we have discussed two main
algebraic structures, namely fields and vector spaces. We briefly mentioned non-commutative
rings, which are like fields, but the multiplication is not commutative and the existence of some
multiplicative inverses is not required. The main example of non-commutative ring in this
course is Mn×n (F). Now we introduce the definition of an algebra. In this context ‘algebra’ is
a specific technical term, not the whole field of mathematics known as algebra.
Then V with such a multiplication is called an algebra over F. If the multiplication is com-
mutative or associative, we obtain a commutative or an associative algebra, respectively.
If there exists a vector e ∈ V such that ev = ve = v for all v ∈ V , then e is called the identity
element of V , and the algebra is called an algebra with the identity. It is clear that if the
identity exists, it is unique. If for every non-zero v ∈ V there exists v^{−1} – the inverse of v
with respect to the multiplication on V , then an algebra with the identity is called a division
algebra.
If u = t1 v1 + . . . + tn vn and v = s1 v1 + . . . + sn vn , then
uv = ( Σ_{k=1}^{n} tk vk )( Σ_{k=1}^{n} sk vk ) = Σ_{k=1}^{n} ( Σ_{1≤i,j≤n} ti sj a^k_{ij} ) vk ,
where the elements a^k_{ij} ∈ F are defined by vi vj = Σ_{k=1}^{n} a^k_{ij} vk .
• The vector space F[x] of all polynomials over F is an infinite-dimensional algebra over F
with respect to the usual product of polynomials. It is commutative, associative, with the
identity, but not a division algebra.
• A similar example is the algebra F[x1 , . . . , xn ] of all polynomials over F with n variables.
• There exists a 4-dimensional associative division algebra over R, called the algebra of
quaternions, or the Quaternion algebra (or the algebra of Hamilton Quaternions). It
has a basis {1, i, j, k}, with multiplication of the basis elements defined as
i^2 = j^2 = k^2 = −1, ij = −ji = k, jk = −kj = i, ki = −ik = j,
and continued to the whole algebra by the distributive laws. As we see, this algebra is
not commutative. 1 (the field element viewed as a vector) is the identity element.
There exists one more division algebra over reals. It is the Graves-Cayley algebra of
octonions, which is 8-dimensional, non-commutative and non-associative.
• Mn×n (F) - the n2 -dimensional associative algebra of all n × n matrices over F. It has the
identity, but is not commutative, and is not a division algebra.
Now we define the exterior algebra which will be used to develop the theory of the determinants
of square matrices.
Consider V = Fn , and let {e1 , . . . , en } be the standard basis of V . We define the following
vector spaces over F.
Λ = Λ0 × Λ1 × . . . × Λn
Similarly to complex numbers or polynomials, we may, and we will, think about elements of Λ
as formal sums of elements from Λi , something like
2 + (2√3/15) e2 + e3 − e1 ∧ e2 + e1 ∧ e3 − 2 e1 ∧ e2 ∧ e3 .
We will not write any term with coefficient 0, since it is equal to the zero vector.
Now we introduce the multiplication on Λ, which will distribute over vector addition; thus it
will suffice to define the multiplication on basis elements. Our algebra is also going to be associative.
(i) We postulate that the basis element 1 for Λ0 (= F) is to be the identity element for
multiplication.
(ii) Next we define the product of two basis elements ei and ej from Λ1 to be an element of
Λ2 :
ei ej = ei ∧ ej = −ej ∧ ei for all i, j, even for i = j.
This implies that ei ei = −ei ei , hence 2ei ∧ ei = 0. When we consider exterior algebras, we
will consider only those fields where 2 ≠ 0; e.g., F2 is prohibited. This allows us to conclude
that 2ei ∧ ei = 0 is equivalent to ei ∧ ei = 0.
(iii) The product of basis elements ei1 ∧ ei2 ∧ . . . ∧ eik ∈ Λk and ej1 ∧ ej2 ∧ . . . ∧ ejm ∈ Λm is
defined as follows:
1. Part (iii) of the definition can also be rephrased as follows. The products of two vectors from
the basis of Λ is defined as
(ei1 ∧ ei2 ∧ . . . ∧ eik ) (ej1 ∧ ej2 ∧ . . . ∧ ejm ) = ei1 ∧ ei2 ∧ . . . ∧ eik ∧ ej1 ∧ ej2 ∧ . . . ∧ ejm ,
with the assumption that the latter can be simplified, by using associativity: either to 0, if
some index i_s is equal to some index j_t or if k + m > n, or, otherwise, to a basis element of Λk+m or its
opposite.
2. Why should we use a special notation for the product? Why can we not just write u ∧ w
instead of uw, with the same wedge as in the other symbols? Such notation certainly agrees with the
multiplication of the basis vectors of Λ. That is what we will do, and we will refer to the
multiplication in Λ as the wedge product.
3. It is not obvious that the rule for determining ε in part (iii) agrees with the associativity
of the product of the basis elements, but it is possible to show that it does.
4. Since the zero vector satisfies 0V = 0F v for every v ∈ V , the rule (ku)v = u(kv) = k(uv) for all u, v ∈ V and all
k ∈ F implies that if one of the factors in a wedge product is the zero vector, then the whole
product is 0.
Examples.
• e2 ∧(e1 ∧e2 ) = (e2 ∧e1 )∧e2 = (−e1 ∧e2 )∧e2 = −(e1 ∧e2 )∧e2 = −e1 ∧(e2 ∧e2 ) = −e1 ∧0 = 0.
Proof. Let x = Σ_{i=1}^{n} xi ei and y = Σ_{i=1}^{n} yi ei . Linear dependence of x and y implies that one
of them is a scalar multiple of the other. Suppose x = ky for some k ∈ F. Then (x1 , . . . , xn ) =
(ky1 , . . . , kyn ) and
x ∧ y = ( Σ_{i=1}^{n} xi ei ) ∧ ( Σ_{i=1}^{n} yi ei ) = Σ_{1≤i<j≤n} (xi yj − xj yi )(ei ∧ ej ) = Σ_{1≤i<j≤n} (kyi yj − kyj yi )(ei ∧ ej ) = Σ_{1≤i<j≤n} 0 · (ei ∧ ej ) = 0.
The second statement follows from the fact that x and x are linearly dependent.
Proof. Linear dependence of the vi implies that one of them is a linear combination of the others.
Renumber them so that v1 = a2 v2 + . . . + am vm . Then
v1 ∧ v2 ∧ . . . ∧ vm = (a2 v2 + . . . + am vm ) ∧ v2 ∧ . . . ∧ vm =
a2 v2 ∧ v2 ∧ . . . ∧ vm + . . . + am vm ∧ v2 ∧ . . . ∧ vm = 0,
since when we distribute the product every term will contain some vi twice among its wedge
factors, and, by Proposition 16, each such term is zero.
Let {e1 , . . . , en } be the standard basis of Fn , and let 1 ≤ i1 < i2 < . . . < ip ≤ n. For
A ∈ Mp×p (F), we define the “product of A and eik with respect to (ei1 , ei2 , . . . , eip )” as follows:
A eik := a1k ei1 + a2k ei2 + . . . + apk eip = Σ_{t=1}^{p} atk eit .
Next we define the “product of A and ei1 ∧ . . . ∧ eip with respect to (ei1 , ei2 , . . . , eip )” as
A (ei1 ∧ . . . ∧ eip ) := A ei1 ∧ . . . ∧ A eip = ⋀_{k=1}^{p} ( Σ_{t=1}^{p} atk eit ).
If p = n, then there exists only one increasing sequence of length n in {1, 2, . . . , n}. As
A ∈ Mn×n (F), the coordinate vector of A ei in the basis (e1 , . . . , en ) is the i-th column
of A, which is [a1i , . . . , ani ]^T . In this case A ei can be thought of as the genuine product of two
matrices: A and the column (n × 1 matrix) ei . Again, A e1 ∧ . . . ∧ A en lies in the 1-dimensional
space Λn , hence it is a scalar multiple of the basis vector e1 ∧ . . . ∧ en . Hence
A e1 ∧ . . . ∧ A en = (det A) e1 ∧ . . . ∧ en .            (13)
It is easy to check that if n = 1 and A = (a), then det A = a. If n = 2 and
A = [ a11 a12
      a21 a22 ] ,
then A e1 ∧ A e2 = (a11 e1 + a21 e2 ) ∧ (a12 e1 + a22 e2 ) = (a11 a22 − a12 a21 ) e1 ∧ e2 , hence
det A = a11 a22 − a12 a21 .
A (v1 ∧ . . . ∧ vp ) := A v1 ∧ . . . ∧ A vp .
Proposition 18 1. det In = 1.
5. For each A ∈ Mn×n (F), the inverse matrix A^{−1} exists if and only if det A ≠ 0.
One of them is often called the Laplace expansion. It allows one to compute the determinant of a
matrix by computing (many!) determinants of smaller matrices.
Let A = (aij ) be an n × n square matrix over F, and let Aij denote the (n − 1) × (n − 1) square
matrix obtained from A by deleting the i-th row and the j-th column of A.
Proposition 19
Proof. As you remember, ⋀_{1≤i≤n} A ei = (det A) ⋀_{1≤i≤n} ei .
We begin with the first statement. First we establish the result for j = 1, i.e.,
det A = Σ_{1≤t≤n} (−1)^{t+1} at1 det At1 .
We have:
⋀_{1≤i≤n} A ei = ( Σ_{1≤t≤n} at1 et ) ∧ ⋀_{2≤i≤n} A ei = Σ_{1≤t≤n} (at1 et ) ∧ ⋀_{2≤i≤n} ( Σ_{1≤k≤n} aki ek )
= Σ_{1≤t≤n} (at1 et ) ∧ ⋀_{2≤i≤n} ( Σ_{1≤k≤n, k≠t} aki ek ) = Σ_{1≤t≤n} at1 et ∧ ( det At1 ⋀_{1≤i≤n, i≠t} ei )
= Σ_{1≤t≤n} at1 det At1 ( et ∧ ⋀_{1≤i≤n, i≠t} ei ) = Σ_{1≤t≤n} at1 det At1 (−1)^{t−1} ⋀_{1≤i≤n} ei
= ( Σ_{1≤t≤n} (−1)^{t−1} at1 det At1 ) ⋀_{1≤i≤n} ei .
In order to get a similar result for the expansion with respect to the j-th column, we prove the
following lemma. It states that if two columns of a square matrix are interchanged, then the
determinant of the matrix changes its sign.
Proof. Let 1 ≤ i1 < . . . < ip ≤ n, and let {e1 , . . . , en } be the standard basis of Fn . Then
⋀_{1≤k≤p} A eik = (det A) ⋀_{1≤k≤p} eik .
What happens when we interchange two adjacent columns of A, say the j-th and the (j + 1)-th?
As
A eij ∧ A eij+1 = ( Σ_{1≤k≤p} akj eik ) ∧ ( Σ_{1≤t≤p} at,j+1 eit ) = Σ_{1≤k≤p} Σ_{1≤t≤p} (akj eik ) ∧ (at,j+1 eit ) =
Σ_{1≤k≤p} Σ_{1≤t≤p} akj at,j+1 (eik ∧ eit ) = Σ_{1≤k≤p} Σ_{1≤t≤p} at,j+1 akj (−eit ∧ eik ) = −(A eij+1 ∧ A eij ),
the interchange of two adjacent columns of A leads to a change of sign of the determinant. If
we wish to interchange the 1-st and the j-th columns of A, we can use (j − 1) adjacent column
interchanges to place the j-th column first, and then j −2 adjacent column interchanges to place
the 1-th column of A to be the j-th column of A′ . Since the total number of interchanges of
adjacent columns is an odd integer 2j − 3, the sign of the determinant will change odd number
of times. This proves the lemma.
Now we are ready to prove the formula for the expansion of det A with respect to the j-th column
for arbitrary j. Consider the matrix A′′ obtained from A by subsequent interchanges of the
j-th column with the first j − 1 columns. In other words, the first column of A′′ is the j-th
column of A, the k-th column of A′′ is the (k − 1)-th column of A for 1 < k ≤ j, and it is the
k-th column of A for j < k ≤ n. As it takes (j − 1) interchanges, det A = (−1)^{j−1} det A′′
by Lemma 20. At the same time, det Aij = det A′′i1 for all i. Expanding det A′′ with respect to
the 1-st column we obtain:
det A = (−1)^{j−1} det A′′ = (−1)^{j−1} Σ_{1≤t≤n} (−1)^{t+1} atj det A′′t1 = Σ_{1≤t≤n} (−1)^{t+j} atj det Atj ,
as claimed.
We just wish to mention that for large n, and general A, the computation of det A using the
expansion by permutations takes long, and, hence, is not practical.
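For completeness, here is a direct (and, as just noted for large n, inefficient) sketch of the expansion with respect to the first column, written in plain Python; the function name det is ours.

```python
def det(A):
    """Determinant by Laplace expansion along the first column (exponential time)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for t in range(n):
        minor = [row[1:] for k, row in enumerate(A) if k != t]   # delete row t and column 1
        total += (-1) ** t * A[t][0] * det(minor)
    return total

print(det([[3, -2], [1, 5]]))                     # 17
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))    # -3
```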
In solutions of the following problems you can use the techniques based on exterior algebra, as well as
all other properties of determinants that we proved.
3. A square matrix D = (dij ) of order n is called diagonal if dij = 0 for all i ≠ j. Prove that
det D = d11 d22 · · · dnn .
4. A square matrix A = (aij ) of order n is called upper triangular (lower triangular) if aij = 0 for
all i > j (i < j). Prove that for both upper and lower triangular matrix A, det A = a11 a22 · · · ann .
7. Let A be a square matrix of order n, A1 (A2 ) be a matrix obtained from A by interchanging two
rows (columns) of A, and A3 (A4 ) be a matrix obtained from A by replacing the i-th row (column)
of A by the sum of this row (column) with the j-th row (column) multiplied by a scalar.
Prove that
det A = − det A1 = − det A2 = det A3 = det A4 .
The property det A = det A3 = det A4 is very useful for computing determinants: if aij ≠ 0, then
applying the transformation several times one can make all entries of the j-th column or the i-th
row of A, except aij , equal to zero.
8. (Vandermonde’s determinant) Let {x1 , . . . , xn } be n distinct elements from a field F. Prove that
      [ 1          1          . . .   1         ]
      [ x1         x2         . . .   xn        ]
det   [ x1^2       x2^2       . . .   xn^2      ]   =   Π_{1≤i<j≤n} (xj − xi )
      [ . . .      . . .      . . .   . . .     ]
      [ x1^{n−1}   x2^{n−1}   . . .   xn^{n−1}  ]
(all diagonal entries of the matrix are a, and all other entries are b).
We are going to present a kind of an explicit formula for the inverse of a nonsingular square
matrix. As before, we denote by Aij the matrix obtained from a square matrix A by deleting
the i-th row and the j-th column. Let
bij = (−1)^{i+j} det Aij
be the (ij)-cofactor of A, and let adj A = (bij )^T . The matrix adj A is called the classical
adjoint of A. By GL(n, F) we will denote the set of all nonsingular n × n matrices. Since
it forms a group under multiplication (i.e., the multiplication is an associative operation on
GL(n, F), there exists an identity element, and every element has an inverse with respect to
multiplication), GL(n, F) is called the general linear group.
Theorem 21 Let A be a square matrix of order n over F. Then A(adj A) = (adj A)A =
(det A)In .
Proof. Our proof is based on two facts: (i) the Laplace expansion formula, and (ii) that a
matrix with two equal columns has zero determinant.
Let B = (bij ). Then adj A = B^T . Let C = (cij ) = (adj A)A. Then, for all j, we have
cjj = [b1j , b2j , . . . , bnj ] [a1j , a2j , . . . , anj ]^T = Σ_{k=1}^{n} bkj akj = det A,
since the last sum is exactly the expansion of det A with respect to the j-th column.
For i ≠ j, we get
cij = [b1i , b2i , . . . , bni ] [a1j , a2j , . . . , anj ]^T = Σ_{k=1}^{n} bki akj .
Consider a matrix A′ = (a′ij ) with two equal columns: A′ is obtained from A by replacing the
i-th column of A by the j-th column of A. Note that det A′ki = det Aki for all k, and that
det A′ = 0 since columns of A′ are linearly dependent. Expanding det A′ with respect to the
i-th column we obtain:
0 = det A′ = Σ_{1≤k≤n} a′ki (−1)^{k+i} det A′ki = Σ_{1≤k≤n} akj (−1)^{k+i} det Aki = Σ_{1≤k≤n} akj bki .
Therefore cij = 0 for i ≠ j. This proves that (adj A)A = (det A)In .
The proof of the following corollary is obvious. It gives an explicit formula for the inverse of a
nonsingular square matrix.
We just wish to mention that for large n, and general A, the computation of A−1 using adj A
takes very long, and, hence, is not practical.
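Still, for small matrices the formula is easy to apply directly. A sketch with NumPy (the matrix and the helper name adjugate are ours) that checks Theorem 21 and the resulting formula for the inverse.

```python
import numpy as np

def adjugate(A):
    """Classical adjoint: adj A = (b_ij)^T with b_ij = (-1)^(i+j) det A_ij."""
    n = A.shape[0]
    B = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            B[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return B.T

A = np.array([[2., 1., 0.], [0., 1., 1.], [1., 0., 3.]])
d = np.linalg.det(A)                                    # here det A = 7
assert np.allclose(A @ adjugate(A), d * np.eye(3))      # Theorem 21
assert np.allclose(np.linalg.inv(A), adjugate(A) / d)   # A^{-1} = (1/det A) adj A
```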
Theorem 21 and Corollary 22 allow us to prove Cramer’s rule for solutions of a system of n
linear equations with n unknowns:
a11 x1 + . . . + a1n xn = b1
a21 x1 + . . . + a2n xn = b2
............
an1 x1 + . . . + ann xn = bn .
Let bij be the (ij)-cofactor of A = (aij ). Fix some j, 1 ≤ j ≤ n. Multiplying both sides of the
i-th equation by bij , and then adding all the results, we obtain:
Σ_{1≤i≤n} Σ_{1≤k≤n} aik bij xk = Σ_{1≤k≤n} Σ_{1≤i≤n} aik bij xk = Σ_{1≤i≤n} bi bij .
By Theorem 21, the inner sum Σ_{1≤i≤n} aik bij is equal to 0 if k ≠ j, and is det A for k = j.
Hence we have
(det A) xj = Σ_{1≤i≤n} bi bij .
Let Aj be the matrix obtained from A by replacing its j-th column by the column [b1 , . . . , bn ]^T .
Then Σ_{1≤i≤n} bi bij = det Aj , and we have
(det A) xj = det Aj
for all j. This implies that if the system has a solution x, and det A ≠ 0, then
x = [det A1 / det A, . . . , det An / det A]^T .
It also shows that if det A = 0, but det Aj ≠ 0 for at least one value of j, then the system has
no solutions.
For det A ≠ 0, we can check that xi = det Ai / det A, i = 1, . . . , n, are indeed solutions of the system
by substituting them into an arbitrary equation of the system and simplifying.
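A small numeric check of Cramer's rule for a 2 × 2 system of our own choosing.

```python
import numpy as np

A = np.array([[2., 1.], [1., 3.]])
b = np.array([3., 5.])

detA = np.linalg.det(A)                 # 5.0, so the solution is unique
x = []
for j in range(2):
    Aj = A.copy()
    Aj[:, j] = b                        # replace the j-th column of A by b
    x.append(np.linalg.det(Aj) / detA)

print(x)                                # [0.8, 1.4]
assert np.allclose(A @ np.array(x), b)
```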
If b = 0, i.e., the system is homogeneous, then all det Aj = 0. Therefore, if det A ≠ 0, the
system will have only the trivial solution x = 0. If b = 0 and the system has a non-trivial
solution, then det A = 0. Hence we proved the following theorem.
If b = 0, i.e., the system is homogeneous, then, if det A ≠ 0, the system has only the trivial
solution x = 0. Equivalently, if b = 0 and the system has a non-trivial solution, then det A = 0.
As applications of the Cramer’s rule and Vandermonde’s determinant, we can prove the following
fundamental facts about polynomials.
Proof. If A is the matrix of the coefficients of the corresponding system, then by Cramer’s rule
ai = det Ai / det A = 0/ det A = 0, i = 0, . . . n, as every matrix Ai contains a column of all
zeros.
The corollary above can be restated this way: no polynomial of degree n over a field can have
more than n distinct roots. It turns out that if roots are not distinct, but are counted with
their multiplicities, then it is still true that no polynomial of degree n over a field can have
more than n roots. But a proof in this case must be different.
The following corollary just restates the uniqueness part of Theorem 24. It generalizes the
facts that there exists a unique line passing through two points, and a unique parabola passing
through any three points, etc..
Corollary 26 Let f, g ∈ F[x] be two polynomials of degree at most n. If f (xi ) = g(xi ) for
n + 1 distinct elements xi of F, then f = g.
where the factor x − xi is missing in the numerator and the factor xi − xi = 0 is missing in the
denominator.
Then it is obvious that fi (xi ) = 1 for all i, and fi (xj ) = 0 for all i ≠ j. This implies that the
polynomial
L(x) = y1 f1 (x) + . . . + yn+1 fn+1 (x) =
Σ_{i=1}^{n+1} yi [ (x − x1 ) · · · (x − xi−1 )(x − xi+1 ) · · · (x − xn+1 ) ] / [ (xi − x1 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn+1 ) ]            (17)
has the property that its degree is at most n and L(xi ) = yi for all i. We wish to note that
the polynomial L is exactly the same polynomial as the polynomial f obtained in the proof
of Theorem 24. Though it is not clear from the ways these polynomials are defined, it is the
case, as we showed in the proof of the theorem, and then again in Corollary 26, that such a
polynomial is unique. The form (17) is just a representation of the polynomial in the basis fi
of the vector space of all polynomials over F of degree at most n. This form is often referred
to as the Lagrange Interpolation Formula. For another view of the Lagrange interpolation
formula via Linear Algebra, the one which uses the notion of the dual basis, see the text by
Hoffman and Kunze.
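A direct transcription of (17) into Python (the sample points are our own); each term is yi multiplied by the product of (x − xj)/(xi − xj) over j ≠ i.

```python
def lagrange(points):
    """Return the interpolating polynomial L as a Python function, built as in (17)."""
    def L(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return L

pts = [(1, 1), (2, 4), (3, 9)]          # three points on the graph of x^2
L = lagrange(pts)
print([L(x) for x, _ in pts])           # [1.0, 4.0, 9.0]
print(L(4))                             # 16.0 : the interpolant here is x^2
```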
We conclude with the following amazing fact. The proof is immediate and is left to the reader.
2. Find a polynomial over the reals of degree at most 3 whose graph passes through the points (0, 1), (1, 2), (2, 4),
and (3, 2).
We remind the class of the notions of the three basic elementary row operations, the row-echelon
form of a matrix, and the reduced row-echelon form.
Suggested reading for the review: the text by Dummit and Foote, pages 424 – 431 as the
collection of all main facts, and favorite undergraduate texts.
We assume that we know that any matrix can be transformed into a row-echelon form by using
the elementary row operations.
The row space (column space) of a matrix A ∈ Mm×n (F) is the span of all its row vectors
(column vectors). The dimensions of these spaces are called, respectively, the row-rank and
the column-rank of the matrix. Is there any relation between these spaces? The row space is
a subspace of Fn , and the column space is a subspace of Fm , so, in general, they are distinct
spaces. These two spaces can be distinct even for m = n, as the following example shows:
A = [ 1 1
      0 0 ]
The row space of A is ⟨(1, 1)⟩, and its column space is ⟨(1, 0)⟩. The spaces are distinct, though
both are 1-dimensional. Experimenting with several more examples, like
[ 1  0  0 ]      [ 1  1  2  7 ]      [ 1  1  2 ]
[ 0  3  0 ] ,    [ 0  1  3  3 ] ,    [ 2  1  0 ]
[ 0  0 −1 ]      [ 0  0  1 −1 ]      [ 3  2  2 ]
                                     [ 6  4  4 ]
                                     [ 1  0  1 ] ,
we notice that the dimensions of the row space and the column space of each of these matrices are
equal. It turns out that this is always true: the row-rank of any matrix is equal to its column-
rank. This will be proved in Theorem 30 below. The common value of the row-rank of A and
its column-rank, is called the rank of A, and it is denoted by rank A.
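A quick sanity check of the equality of the two ranks on one of the matrices above (NumPy computes the rank from singular values, so this is only an illustration, not a proof).

```python
import numpy as np

A = np.array([[1, 1, 2],
              [2, 1, 0],
              [3, 2, 2],
              [6, 4, 4],
              [1, 0, 1]])

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 3 3
```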
Lemma 28 Both row-rank and column-rank are preserved by (or invariant under) the elemen-
tary row operations and by the elementary column operations.
Proof. Let A = (apq ) be an m × n matrix. The fact that an elementary row operation does not
change the row space of A, and so its row-rank, is easy to demonstrate, and we leave it to the
reader. We will prove a less obvious fact, namely that an elementary row operation preserves
the column-rank.
Consider the operation that replaces the i-th row of A by its sum with c times the j-th row (the
other types of elementary row operations are handled similarly), and let A′ = (a′pq ) be the resulting
matrix. Let C1 , C2 , . . . , Cn and C1′ , C2′ , . . . , Cn′ denote the columns of A and of A′ , respectively.
Suppose Σ_{k=1}^{n} λk Ck = 0. This is equivalent to Σ_{k=1}^{n} λk atk = 0 for all t ∈ [m]. Then
Σ_{k=1}^{n} λk a′ik = Σ_{k=1}^{n} λk (aik + cajk ) = Σ_{k=1}^{n} λk aik + c Σ_{k=1}^{n} λk ajk = 0 + c · 0 = 0.
As all rows of A and A′ , except maybe the i-th, are equal, we get that Σ_{k=1}^{n} λk Ck = 0 implies
Σ_{k=1}^{n} λk Ck′ = 0.
Suppose Σ_{k=1}^{n} λk Ck′ = 0. This is equivalent to Σ_{k=1}^{n} λk a′tk = 0 for all t ∈ [m]. For t = j, we
get Σ_{k=1}^{n} λk a′jk = Σ_{k=1}^{n} λk ajk = 0. For t = i, we get
Σ_{k=1}^{n} λk a′ik = 0 ⇔ Σ_{k=1}^{n} λk (aik + cajk ) = 0 ⇔ Σ_{k=1}^{n} λk aik + c Σ_{k=1}^{n} λk ajk = 0.
As Σ_{k=1}^{n} λk ajk = 0, we obtain Σ_{k=1}^{n} λk aik = 0. This implies that Σ_{k=1}^{n} λk Ck = 0. Hence
Σ_{k=1}^{n} λk Ck = 0 if and only if Σ_{k=1}^{n} λk Ck′ = 0.
Think about it. Does not this immediately imply what we need, i.e., that the column-ranks of
A and A′ are equal? Of course!
Lemma 29 Let {v1 , . . . , vn } and {u1 , . . . , un } be two sets of n vectors such that, for all λ1 , . . . , λn ∈ F,
λ1 v1 + . . . + λn vn = 0 if and only if λ1 u1 + . . . + λn un = 0.
Then dim ⟨v1 , . . . , vn ⟩ = dim ⟨u1 , . . . , un ⟩.
Proof. Let k = dim ⟨v1 , . . . , vn ⟩. Then there exist vi1 , . . . , vik which form a basis of ⟨v1 , . . . , vn ⟩.
The condition of the lemma implies that {ui1 , . . . , uik } is a basis of ⟨u1 , . . . , un ⟩. Indeed, the
linear independence of ui1 , . . . , uik is clear. If vj = Σ_{t=1}^{k} βt vit for some βt , then (−1)vj +
Σ_{t=1}^{k} βt vit = 0. Hence, (−1)uj + Σ_{t=1}^{k} βt uit = 0, which is equivalent to uj = Σ_{t=1}^{k} βt uit . This
proves that {ui1 , . . . , uik } spans ⟨u1 , . . . , un ⟩, and so it is a basis of ⟨u1 , . . . , un ⟩.
As the row-rank and the column-rank of a matrix in its row-echelon form are equal, we get the
following
Theorem 30 Row-rank and column rank of an arbitrary matrix in Mm×n (F) are equal.
Let A ∈ Mm×n (F). Deleting some (or possibly none) rows or columns of A, we can obtain a
submatrix of A.
1. rank A = rank AT
5. Let B ∈ Mn×p (F). Then rank AB ≤ min{rank A, rank B}. If m > n, then AB is singular.
Proof. Here are hints for proofs. The reader should supply all missing details.
2. Straightforward.
3. One may try two different approaches. For the first, use the row-echelon form of the matrix.
For the second approach, consider the map φ : Rn → Rm given by x ↦ Ax. Then im φ is the
column space of A, and ker φ is the solution space of the system Ax = 0.
4. Follows easily from the earlier result dim(U + W ) = dim U + dim W − dim U ∩ W , where U
and W are subspaces of a finite-dimensional space V .
5. The inequality follows from the observation that a column space of AB is a subspace of the
column space of A; and the row space of AB is a subspace of the row space of B. The second
statement follows from it.
In this lecture we begin to study how the notions of the Euclidean geometry in dimensions 2
and 3 are generalized to higher dimensions and arbitrary vector spaces. Our exposition is close
to the one in [2].
Let V be a vector space over a field F. A bilinear form over V is a map f : V × V → F such
that f is linear in each variable:
f (αu + βv, w) = αf (u, w) + βf (v, w)
and
f (w, αu + βv) = αf (w, u) + βf (w, v)
for all u, v, w ∈ V and all α, β ∈ F.
EXAMPLES.
• V = Fn , f (x, y) = x1 y1 + . . . + xn yn .
The relation between bilinear forms on a finite-dimensional vector space and matrices is de-
scribed in the theorem below. We remind the reader that a matrix B is called symmetric if
B = BT .
Proof. If {e1 , . . . , en } is the standard basis of Fn , then B = (bij ), where bij = f (ei , ej ). Then,
using bilinearity of f ,
f (x, y) = f ( Σ_{1≤i≤n} xi ei , Σ_{1≤j≤n} yj ej ) = Σ_{1≤i,j≤n} xi yj f (ei , ej ) = x^T B y.
This argument shows that B depends on the choice of the basis. If α = {v1 , . . . , vn } is another
basis, let A = (aij ), where aij = f (vi , vj ). Then f (x, y) = [x]α^T A [y]α .
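A tiny numeric check (our own symmetric B) that f(x, y) = x^T B y is linear in the first argument and symmetric when B = B^T.

```python
import numpy as np

B = np.array([[2., 1.], [1., 3.]])                  # a symmetric matrix
f = lambda x, y: x @ B @ y

x, y, z = np.random.rand(2), np.random.rand(2), np.random.rand(2)
a, b = 2.0, -3.0
assert np.isclose(f(a * x + b * z, y), a * f(x, y) + b * f(z, y))   # linearity in the first slot
assert np.isclose(f(x, y), f(y, x))                                 # symmetry, since B = B^T
```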
If f is a symmetric bilinear form over F, we call the pair (V, f ) an inner product space, with
f the inner product.
For the rest of this lecture, f will denote an arbitrary inner product over V = Fn , and B will
denote the associated symmetric matrix.
The vectors u and v are called orthogonal (or perpendicular) if their inner product is zero.
This often is denoted by u ⊥ v. For a subset S ⊆ V , we define the perpendicular space of S
as
S ⊥ = {v ∈ V : f (u, v) = 0 for all u ∈ S}.
1. {0}⊥ = V .
2. S ⊥ T ⇔ S ⊆ T ⊥ ⇔ T ⊆ S ⊥ .
4. If S ⊆ T ⊆ V then T ⊥ ≤ S ⊥ ≤ V .
5. S ⊆ S ⊥⊥ .
For a subspace U ≤ V , define the radical of U as
rad U = U ∩ U ⊥ .
The subspace U is called singular if rad U ≠ ⟨0⟩, and nonsingular otherwise. We call the
inner product singular or nonsingular according to whether or not the whole space V itself
is singular. In other words, (V, f ) is nonsingular if the only vector of V orthogonal to the whole
of V is the zero vector. The terminology suggests that the notion of singularity in inner product spaces
is related to singularity of matrices. Indeed, this is the case.
Theorem 34 Let (V, f ) be an inner product space, dim V = n, B be a matrix associated with
f , and U ≤ V be an arbitrary subspace of V . Then the following holds.
Proof. Let {u1 , . . . , uk } be a basis of U . Then x ∈ U ⊥ if and only if x ⊥ ui for all i, or,
equivalently, x is a solution of the homogeneous system of k linear equations
ui^T B x = 0,   i = 1, . . . , k.
As the rank of this system is at most k, its solution space U ⊥ has dimension at least
n − k (Theorem 31). This proves (a).
Suppose B is nonsingular. Then all vectors uTi B are linearly independent. Indeed, let
∑_{1≤i≤k} λi uTi B = 0.
If B is singular, then rank B < n, and the system By = 0 has a nontrivial solution. Call it v,
v ≠ 0. So Bv = 0. Then for every w ∈ V , wT B v = wT (B v) = wT 0 = 0. Hence v ≠ 0 and v
is orthogonal to the whole space V . Then rad V ≠ ⟨0⟩, and the inner product space is singular. This
proves the second implication in part (c).
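Here is a tiny numerical illustration of this part of the argument; the singular symmetric matrix B below is an arbitrary example, not taken from the text.

    import numpy as np

    # A singular symmetric B yields a nonzero vector orthogonal to the whole space.
    B = np.array([[1., 2.],
                  [2., 4.]])                 # symmetric, det B = 0

    v = np.array([2., -1.])                  # a nontrivial solution of B v = 0
    print(B @ v)                             # [0, 0]

    # v is orthogonal to every w:  f(w, v) = w^T B v = w^T 0 = 0
    for w in [np.array([1., 0.]), np.array([3., 5.]), np.array([-2., 7.])]:
        print(w @ B @ v)                     # 0.0 each time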
1. Let (V, f ) be an inner product space, and let u, w be two isotropic vectors in V . Prove that if
u ⊥ w, then Span({u, w}) is a totally isotropic subspace.
2. For each of the following inner product spaces determine whether it is singular, and if it is find
a nonzero vector orthogonal to the whole space. Find isotropic vectors or show that they do not
exist. Find a maximal totally isotropic subspace (if isotropic vectors exist).
(a) V = R2 , f (x, y) = x1 y1 .
(b) V = R2 , f (x, y) = x1 y2 + x2 y1 .
(c) V = R2 , f (x, y) = x1 y1 − x2 y2 .
(d) V = C[a, b], f (u, v) = ∫_a^b u(t)v(t) dt.
(e) V = C2 with the standard inner product.
(f) V = R4 , f (x, y) = xT By, where
B =
[ 4  2  1  0 ]
[ 2  1  0  0 ]
[ 1  0 −1  1 ]
[ 0  0  1 −1 ].
(g) V = F_p^2 with the standard inner product, for p = 5 and for p = 7.
(h) V = F_2^5 with the standard inner product.
(i) V = R4 with the inner product f (x, y) = x1 y1 + x2 y2 + x3 y3 − x4 y4 . This is the famous
Minkowski space used in the theory of special relativity. Here (x1 , x2 , x3 ) correspond to the
space coordinates of an event in R3 , and x4 to its time coordinate.
3. Let U be a subspace in an inner product space over F, where F is any field we have been using in
this course except F2 . Prove that if every vector of U is isotropic, then U is totally isotropic.
Show that the statement is not necessarily correct for F = F2 .
4. Let V be an anisotropic inner product space. Let {v1 , . . . , vn } be a set of nonzero vectors in V
such that vi ⊥ vj for all i ≠ j. Prove that {v1 , . . . , vn } are linearly independent.
5. Prove that the set of vectors {cos x, cos 2x, . . . , cos nx, sin x, sin 2x, . . . , sin nx} is linearly indepen-
dent in V = C[0, 2π].
(Hint: Consider the inner product f (u, v) = ∫_a^b u(t)v(t) dt, and use the previous exercise.)
7. Let f be the standard inner product on F3 . Let α = {[2, 1, −1], [1, 0, 1], [3, 1, 1]} be another basis
of F3 . Find a matrix A such that f (x, y) = [x]Tα A [y]α .
8. (Optional) Prove that F_p^3 with the standard inner product is isotropic for every prime p.
The goal of this lecture is to discuss the Gram-Schmidt orthogonalization process. Besides
being a very useful tool, it will lead us to some striking conclusions.
Let (V, f ) be an inner product space. A set S of vectors of (V, f ) is called orthogonal if every
two distinct vectors of S are orthogonal.
Proposition 36 Let (V, f ) be an n-dimensional inner product space, and let S be an orthogonal
set of vectors, such that no vector of S is isotropic. Then S is a linearly independent set, and
|S| ≤ n.
So λi f (vi , vi ) = 0 for all i. Since f (vi , vi ) ≠ 0 (S has no isotropic vectors), λi = 0 for all i.
Hence S is linearly independent. Since dim V = n, every n + 1 vectors of V are linearly
dependent. So |S| ≤ n.
1. Prove that the set S = {cos x, . . . , cos nx, sin x, . . . , sin nx} of functions in V = C[0, 2π] is
linearly independent (over R).
Solution. Indeed, let f (u, v) := ∫_0^{2π} u(t)v(t) dt, where u, v ∈ V . Then f is an inner product on V .
Consider the inner product space (V, f ). As
f (cos kt, cos kt) = ∫_0^{2π} cos² kt dt = ∫_0^{2π} sin² kt dt = f (sin kt, sin kt) > 0,
2. People in a city of 100 residents like to form clubs. The only restrictions on these clubs are
the following:
Solution. Let {p1 , . . . , p100 } be the set of all people in the city, and let C1 , . . . , Cm denote a set
of clubs. For each club Ci consider a vector vi ∈ F_2^100 , such that the k-th component of vi is 1 if
pk ∈ Ci , and is 0 otherwise. Consider the standard inner product f on F_2^100 . As each |Ci | is an
odd integer, each vi = (vi1 , . . . , vi100 ) contains an odd number of components equal to 1. Hence
f (vi , vi ) = ∑_{k=1}^{100} vik vik = 1 + . . . + 1 (|Ci | addends) = 1.
As each |Ci ∩ Cj | is an even integer, the vectors vi and vj share 1’s in an even number of the same
components. Hence
f (vi , vj ) = ∑_{k=1}^{100} vik vjk = 1 + . . . + 1 (|Ci ∩ Cj | addends) = 0.
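The same bookkeeping can be done by machine. The following Python sketch uses a toy city of 6 residents and 3 clubs (invented for illustration, with odd sizes and even pairwise intersections); it checks the two displayed identities and verifies that the club vectors are linearly independent over F2.

    import numpy as np

    # Club membership vectors over F_2: odd sizes, even pairwise intersections.
    clubs = [{0, 1, 2},          # |C1| = 3 (odd)
             {1, 2, 3, 4, 5},    # |C2| = 5 (odd), |C1 ∩ C2| = 2 (even)
             {0, 1, 3}]          # |C3| = 3 (odd), intersections with C1, C2 have size 2

    n = 6
    V = np.array([[1 if k in C else 0 for k in range(n)] for C in clubs], dtype=int)

    print((V @ V.T) % 2)         # identity matrix: f(v_i, v_i) = 1, f(v_i, v_j) = 0 in F_2

    def rank_mod2(M):
        """Gaussian elimination over F_2; returns the rank."""
        M = M.copy() % 2
        r = 0
        for c in range(M.shape[1]):
            pivot = next((i for i in range(r, M.shape[0]) if M[i, c]), None)
            if pivot is None:
                continue
            M[[r, pivot]] = M[[pivot, r]]          # move the pivot row up
            for i in range(M.shape[0]):
                if i != r and M[i, c]:
                    M[i] ^= M[r]                   # eliminate the column entry (XOR = addition mod 2)
            r += 1
        return r

    print(rank_mod2(V))          # 3: the club vectors are linearly independent over F_2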
OK, let’s get serious. Everyone likes orthogonal bases. The good news is that often we can build
one. A popular technology is called the Gram-Schmidt Orthogonalization process. And
it is free! And Jørgen P. Gram (1850-1916) and Erhard Schmidt (1876-1959) are two different
people. And the method seems to have been known to Laplace (1749-1827), and was used by Cauchy in
1846...
Proof. Let {v1 , . . . , vn } be a basis of V . Set v1′ = v1 , and try to find v2′ ∈ Span({v1 , v2 }) such
that v1′ ⊥ v2′ and Span({v1′ , v2′ }) = Span({v1 , v2 }).
In order to do this, search for v2′ in the form v2′ = v2 + xv1′ , where the scalar x is unknown.
Then
v1′ ⊥ v2′ ⇔ f (v1′ , v2′ ) = 0 ⇔ f (v1′ , v2 + xv1′ ) = 0 ⇔ f (v1′ , v2 ) + xf (v1′ , v1′ ) = 0 ⇔ x = − f (v1′ , v2 ) / f (v1′ , v1′ ).
Thus we take
v2′ = v2 − ( f (v1′ , v2 ) / f (v1′ , v1′ ) ) v1′ .
As v1′ ⊥ v2′ , as none of them is the zero vector (why?), and as (V, f ) is anisotropic, the vectors
v1′ , v2′ are linearly independent by Proposition 36. As they are in Span({v1 , v2 }), we get
Span({v1′ , v2′ }) = Span({v1 , v2 }).
If n > 2, we search for v3′ in the form v3′ = v3 + xv1′ + yv2′ . We wish to have v1′ ⊥ v3′ and v2′ ⊥ v3′ .
Taking the inner product of v1′ with v3′ , and of v2′ with v3′ , we obtain
x = − f (v1′ , v3 ) / f (v1′ , v1′ )   and   y = − f (v2′ , v3 ) / f (v2′ , v2′ ).
Hence
v3′ = v3 − ( f (v1′ , v3 ) / f (v1′ , v1′ ) ) v1′ − ( f (v2′ , v3 ) / f (v2′ , v2′ ) ) v2′ .
Clearly v1′ , v2′ , v3′ are pairwise orthogonal, and each vector is nonzero. As (V, f ) is anisotropic,
vectors v1′ , v2′ , v3′ are linearly independent by Proposition 36. As Span({v1′ , v2′ , v3′ }) ≤ Span({v1 , v2 , v3 }),
we obtain Span({v1′ , v2′ , v3′ }) = Span({v1 , v2 , v3 }). Continue by induction, if needed.
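In the standard Euclidean space the recursion above is easy to carry out by machine. Here is a minimal Python sketch of the same computation, assuming the standard inner product f (x, y) = x · y on Rn; the basis used is an arbitrary example.

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthogonal list v'_1, ..., v'_n with the same span as the input basis."""
        ortho = []
        for v in vectors:
            v_prime = v.astype(float)
            for u in ortho:
                v_prime = v_prime - (u @ v) / (u @ u) * u    # subtract f(u, v)/f(u, u) * u
            ortho.append(v_prime)
        return ortho

    basis = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
    ortho = gram_schmidt(basis)
    for u in ortho:
        print(u)
    # pairwise inner products of distinct vectors are all 0:
    print([round(ortho[i] @ ortho[j], 10) for i in range(3) for j in range(3) if i < j])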
If f (x, x) = a2 for some scalar a ≠ 0, then for x′ = (1/a) x we have
f (x′ , x′ ) = f ( (1/a) x, (1/a) x ) = (1/a)(1/a) f (x, x) = (1/a2 ) a2 = 1.
For the field of real numbers R, every nonnegative number is a square. Therefore if f has the
property that f (x, x) > 0 for all x ≠ 0 (f is called positive definite in this case), then each
vector has ‘length’ √f (x, x), which is usually called the norm of x in (V, f ), and which is
denoted by ‖x‖. If x ≠ 0, then (1/‖x‖) x is a unit vector. We say that a set of vectors in an inner
product space is orthonormal if it is orthogonal and all vectors in the set are unit vectors.
Corollary 38 Let (V, f ) be an n-dimensional inner product space over R, where f is positive
definite. Then V has an orthonormal basis.
We are ready to pass to the ‘striking conclusions’ promised at the beginning of the lecture.
1. Find an orthogonal basis in each of the following spaces (V, f ). Make the basis you found orthonormal, if
possible.
Fn = {1, cos x, cos 2x, . . . , cos nx, sin x, sin 2x, . . . , sin nx}.
Elements of V are called Fourier polynomials of order at most n. Consider the inner product
f (u, v) = ∫_0^{2π} u(t)v(t) dt on V . Check that Fn is an orthogonal basis of V . Turn it into an orthonormal
basis of V . Let
f = a0 /2 + ∑_{k=1}^{n} (ak cos kx + bk sin kx).
Express the coefficients ai and bi (Fourier coefficients of f ) as inner products of f and vectors
from the orthonormal basis.
4. People in a city of 100 residents like to form clubs. The only restrictions on these clubs are the
following:
Prove that the greatest number of clubs that they can form is 2^50 . (Note the surprising difference
with the example discussed in the lecture.)
5. People in a city of 100 residents like to form clubs. The only restrictions on these clubs are the
following:
Hint: Let {p1 , . . . , p100 } be the set of all people in the city, and let C1 , . . . , Cm denote a set of
clubs. For each club Ci consider a vector vi ∈ R100 , such that the k-th component of vi is 1 if
pk ∈ Ci , and is 0 otherwise. Consider the m × 100 matrix A, whose rows are the vectors vi . Prove
that AAT is nonsingular.
(ii) (Optional) What is the greatest number of clubs that they can form?
6. (Optional) Solve the previous problem where the seemingly strong condition (i) is replaced by the
trivial condition that all clubs are distinct (as subsets of people).
Though there are many advantages in considering general inner product spaces, if one wants
to generalize the usual Euclidean geometry of 2- and 3-dimensional spaces to higher dimensions,
one considers an inner product space (V, f ) over the real numbers with f being a positive definite
form. We will call these spaces Euclidean spaces. If V = Rn and f is the standard inner
product, we call (V, f ) the standard n-dimensional Euclidean space. There are at least two
ways of proceeding with such generalizations. One way is to study arbitrary Euclidean spaces
axiomatically, i.e., based on the definition of a symmetric positive definite bilinear form. Another
way is to explain that all n-dimensional Euclidean spaces are in a certain sense the same, and
to work with just one of them, your favorite. We will discuss both these approaches.
Theorem 39 Let (V, f ) be a Euclidean space, and ‖x‖ = √f (x, x) be the norm.
The equality is attained if and only if all vectors are colinear and of the ‘same direction’,
i.e., there exists i such that for all j, vj = kj vi and kj ≥ 0.
Since ‖y‖² > 0, g(t) is a quadratic function on R which takes only nonnegative values. Hence
its discriminant D is nonpositive, i.e., f (x, y)² ≤ ‖x‖² ‖y‖². Moreover,
D = 0 ⇔ ∃t0 g(t0 ) = 0 ⇔ x = t0 y.
3. For k = 2, we have
Taking square roots, we obtain the result. The equality happens if and only if f (v1 , v2 ) =
‖v1 ‖ ‖v2 ‖. Hence the vectors are colinear (again by Cauchy-Schwarz). If v1 = tv2 , we have
f (v1 , v2 ) = f (tv2 , v2 ) = tf (v2 , v2 ) = t ‖v2 ‖². At the same time, ‖v1 ‖ ‖v2 ‖ = ‖tv2 ‖ ‖v2 ‖ = |t| ‖v2 ‖².
Hence t ≥ 0. If k > 2, one proceeds by a straightforward induction.
For the standard n-dimensional Euclidean space, the inequalities can be rewritten as
∑_{1≤i≤n} ai bi ≤ ( ∑_{1≤i≤n} ai² )^{1/2} ( ∑_{1≤i≤n} bi² )^{1/2} ,
and for f (u, v) = ∫_a^b u(t)v(t) dt on C([a, b]),
∫_a^b u(t)v(t) dt ≤ ( ∫_a^b u(t)² dt )^{1/2} · ( ∫_a^b v(t)² dt )^{1/2} .
The distance between two vectors x and y in a Euclidean space (V, f ) is defined as kx − yk.
This definition is, of course, motivated by the distance between two points. Note that the
notion of a point of a Euclidean space has not been defined. Intuitively, we think about the vectors
as directed segments which have initial points at the origin, and about points as the endpoints
of these vectors, i.e., like in dimensions 2 and 3.
Now we turn to the second approach. We call two inner product spaces (V, f ) and (V ′ , f ′ )
isometric (or isomorphic), if there exists an isomorphism φ : V → V ′ , x 7→ x′ , such that
f (x, y) = f ′ (x′ , y ′ ) for every x, y ∈ V . Such an inner-product-preserving isomorphism is called an
isometry. When V = V ′ and f = f ′ , the isometries are often called orthogonal maps. It is
clear that an isometry also preserves norms associated with f and f ′ . The following theorem
may, at first, look very surprising.
Theorem 40 Every two n-dimensional Euclidean spaces are isometric. In particular, all such
spaces are isometric to the standard Euclidean space.
and φ is an isometry.
Hence, there exists essentially only one n-dimensional Euclidean geometry (up to isometries).
The theorem implies that every ‘geometric’ assertion (i.e., an assertion stated in terms of addi-
tion, inner product and multiplication of vectors by scalars) pertaining to vectors in the standard
n-dimensional Euclidean space, is also true in all other n-dimensional Euclidean spaces. For
n = 2, 3, it allows us to claim certain geometric facts without proof, as long as we know that
they are correct in the usual Euclidean geometry. In particular, the inequality ‖v1 + v2 ‖ ≤ ‖v1 ‖ + ‖v2 ‖
holds because the triangle inequality holds in usual plane geometry. No proof is necessary!
By now the reader understands that we use the word “geometry” quite freely in our discus-
sions. Here we wish to add several other informal comments.
What we mean by saying “geometry” depends on the context. We certainly used the term
much earlier, when we discussed just vector spaces. There we also had a way to identify
different spaces by using isomorphisms (just linear bijections). Those mappings preserved the
only essential features of vector spaces: addition and multiplication by scalars. This gave us
geometry of vector spaces, or just linear geometry.
Now, in addition to linearity, we wanted to preserve more, namely the inner product of vectors.
This is what the isometries do. This leads us to the geometries of inner product spaces.
So a geometry is defined by a set of objects and a set of transformations which preserve certain
relations between the objects, or some functions defined on the sets of objects. We will return
to this discussion later in the course, where new examples of geometries will be considered.
Sometimes, instead of a bilinear form f in the description of (V, f ) which intuitively represents
somehow both the lengths and angles, we can begin with a quadratic form Q, which correspond
to the notion of length only.
(ii) the function g(x, y) := (1/2)[Q(x + y) − Q(x) − Q(y)] is a symmetric bilinear function on V
(i.e., g is an inner product on V ).
Notice that
g(x, x) = (1/2)[Q(2x) − 2Q(x)] = (1/2)[4Q(x) − 2Q(x)] = Q(x),
so g(x, x) = Q(x). Hence Q can be “recovered” from g.
On the other hand, beginning with any symmetric bilinear form g on V , the function H(x) :=
g(x, x) is a quadratic form on V with g being its polar bilinear form. Indeed,
We have seen that, as soon as a basis α in V is chosen, every bilinear function on V can be
represented as xT By = ∑_{1≤i,j≤n} bij xi yj , where [x]α = [x1 , . . . , xn ] and [y]α = [y1 , . . . , yn ].
1. Prove that a2 + b2 + c2 ≥ ab + bc + ca for all real a, b, c, and that the equality is attained if and
only if they are equal.
2. Let x = (x1 , . . . , x5 ) ∈ R5 with standard norm ‖x‖ = 1. Let σ be an arbitrary permutation
(bijection) on {1, 2, 3, 4, 5}, and xσ = (xσ(1) , . . . , xσ(5) ). What is the greatest value of x1 xσ(1) +
. . . + x5 xσ(5) , and for which x is it attained?
3. Prove that if a + b + c = 1, then a2 + b2 + c2 ≥ 1/3, and that the equality is attained if and only
if a = b = c.
4. Consider a plane α : 2x − 3y − z = 5 in the usual 3-space (point space). Find the point in α which
is the closest to the origin.
6. Prove that in the usual 2- or 3-dimensional Euclidean space, the following geometric fact holds: in
any parallelogram, the sum of the squares of the diagonals is equal to the sum of the squares of all sides.
Does it remind you of something from the lectures?
7. Prove that in every tetrahedron ABCD, if two pairs of opposite (skew) sides are perpendicular,
then so is the third pair. Prove also that in such a tetrahedron, the sums of the squares of the lengths of
every pair of opposite sides are equal.
8. The goal of this exercise is to show that an isometry can be characterized in a somewhat more
economical way, namely as a map on V which just fixes the zero vector and preserves all distances between
vectors.
Let (V, f ) be a Euclidean n-dimensional space and φ : V → V be such that
Show that either one of the conditions (i) or (ii) alone does not imply that the map is an isometry.
• To establish some ties between usual Euclidean geometry and Linear algebra.
• To demonstrate that using even the simplest facts from linear algebra enables us to answer
some nontrivial questions from 2- and 3-dimensional Euclidean geometry.
When we think about vector spaces geometrically, we draw diagrams imagining Euclidean 1-,2-,
and 3-dimensional point spaces. Vectors are depicted as directed segments. Often we “tie” all
vectors to the origin, and say that their “endpoints” correspond to the points in the space. This
creates an inconvenience when we want to draw them somewhere else in the space, which is
desirable for many applications of vectors, in particular, in geometry and physics. Then we
agree that two different directed segments define the same vector if they have equal lengths and
are “directed the same”. We explain how to add and subtract vectors geometrically. This type of
discussion is usually not precise, but, nevertheless, we got used to passing from points to vectors
and back. Coordinates allow us to discuss all this with greater rigor, but the relations between the
definitions and the geometric rules for operations on vectors still have to be justified. It is not
hard to make the relation between point spaces and vector spaces precise, but we will not do
it here. See, e.g., [15], [8], or [12] for rigorous expositions. Usually a point space built over a
vector space V is referred to as an affine space associated with V .
Instead of dealing with affine Euclidean spaces, we will translate the usual notions of these spaces
into the language of vector spaces, and try to discuss them by means of linear algebra.
For the rest of this lecture we will deal with the standard inner product space (Rn , f ) only.
{ta : 0 ≤ t ≤ 1}.
Let a, b be two vectors. We define an affine segment spanned by a and b as the following
set of vectors:
The first representation exhibits a vector c = (1 − t)a + tb whose endpoint C lies on (point)
segment AB and divides its length in proportion AC/CB = t/(1 − t) = t1 /t2 , t, t1 > 0. For
t = t1 = 0 we get b and for t = t2 = 1 we get a. The second way of writing is more symmetric.
We say that an affine segment (line) spanned by a and b is parallel to the affine segment (line)
spanned by c and d if vectors b − a and d − c are colinear.
Let b, c be two non-colinear vectors. We define the triangle spanned by b and c as the
following set of vectors:
{tb + sc : t, s ≥ 0, s + t ≤ 1}.
If the definition of the segment made sense to us, the definition of a triangle should too. If
0 < k < 1, then the set of vectors {tb + sc : t, s ≥ 0, s + t = k} is equal to the set of vectors
{t(kb) + s(kc) : t, s ≥ 0, s + t = 1}, which is the segment with the endpoints kb and kc. Hence
the triangle is the union of all such segments.
{t1 a + t2 b + t3 c : ti ≥ 0, t1 + t2 + t3 = 1}.
We leave the verification that these sets are equal to the reader. The affine triangle spanned by
vectors a, b, c is a translation by the vector a of the triangle spanned by b − a and c − a.
{t1 b + t2 c + t3 d : 0 ≤ ti , t1 + t2 + t3 ≤ 1}.
An affine tetrahedron spanned by a, b, c, d, where vectors b−a, c−a, d−a are non-coplanar,
is the set of vectors defined as follows:
{a + t2 (b − a) + t3 (c − a) + t4 (d − a) : 0 ≤ ti , t2 + t3 + t4 ≤ 1}, or
{t1 a + t2 b + t3 c + t4 d : 0 ≤ ti , t1 + t2 + t3 + t4 = 1}.
{t1 b + t2 c + t3 d : 0 ≤ ti ≤ 1},
{a + t1 (b − a) + t2 (c − a) + t3 (d − a) : 0 ≤ ti ≤ 1},
3. The ratio of lengths of parallel segments. In particular, it preserves the ratio of lengths
of segments on the same line. In particular, the midpoint of a segment is mapped to the
midpoint of its image.
4. The ratio of areas or volumes of figures, where those are defined and nonzero.
Proof. 1. Let us demonstrate the property for affine triangles only. Others can be done
similarly. Let T = {t1 a + t2 b + t3 c : 0 ≤ ti , t1 + t2 + t3 = 1} be an affine triangle spanned
by a, b, c. Then b − a, c − a are non-colinear vectors. Then φ(T ) = {φ(t1 a + t2 b + t3 c) : 0 ≤
ti , t1 + t2 + t3 = 1} = {t1 φ(a) + t2 φ(b) + t3 φ(c) : 0 ≤ ti , t1 + t2 + t3 = 1}. As φ is non-singular,
vectors φ(b − a) = φ(b) − φ(a) and φ(c − a) = φ(c) − φ(a) are non-colinear, and, hence, φ(T ) is
the affine triangle spanned by φ(a), φ(b) and φ(c). ✷
3. All these statements follow immediately from the relation φ(x+t(y−x)) = φ(x)+t(φ(y−x)). ✷
4. Let us first restrict ourselves to areas only, and recall the notion of an area in R2 .
Consider a grid of congruent unit squares in R2 . As we have shown in parts 1,2,3, parallel lines
are mapped to parallel lines, and midpoints of segments to midpoints of their images. Therefore
the image of this grid will be a grid of congruent parallelograms.
Let F1 and F2 be two figures in R2 for which areas exist. If the grid of squares is sufficiently
fine, then the ratio of the number of squares in the interior of F1 to the number of squares in
the interior of F2 can be made as close to the ratio of their areas as we wish. Actually it will
be equal to the ratio of areas in the limit, as the length of the side of a square in the square
grid decreases to zero.
Consider now φ(F1 ) and φ(F2 ). The ratio of the numbers of the parallelograms in the interiors
of φ(F1 ) and φ(F2 ) will be exactly the same as the ratio of the numbers of grid squares in the
interiors of F1 and F2 . When the side of a square in the square grid decreases, so does the size
of the parallelogram in the corresponding parallelogram grid. Passing to the limit will give
area (F1 ) area (φ(F1 ))
= .
area (F2 ) area (φ(F2 ))
It is clear that a similar argument can be applied to the ratio of volumes in R3 .
Remark. The statement of part 4 can be restated as follows: there exists a positive constant
c = c(φ), such that area (φ(F )) = c area (F ) for all figures F in E 2 which have areas. The
argument used in the proof of part 4 applies to all figures with positive areas (volumes), and to
all dimensions. It turns out that the coefficient c is just | det Mφ |, where Mφ is the matrix
representing φ in the standard basis. Let us show it for dimensions 2 and 3.
Recall the following fundamental facts from analytical geometry: the area of a parallelogram
spanned by vectors a, b, whose coordinates in the standard basis of R2 are [a] = [a1 , a2 ] and
[b] = [b1 , b2 ], respectively, is the absolute value of the determinant of the matrix
[ a1  a2 ]
[ b1  b2 ],
and the volume of a parallelepiped spanned by vectors a, b, c, whose coordinates in the standard
basis of R3 are [a] = [a1 , a2 , a3 ], [b] = [b1 , b2 , b3 ] and [c] = [c1 , c2 , c3 ], respectively, is the absolute
value of the determinant of the 3 × 3 matrix with rows [a], [b], [c].
Let n = 2, let {e1 , e2 } be the standard basis of R2 , and let [φ(e1 ), φ(e2 )]T = A[e1 , e2 ]T . Hence
| det A| is the area of the parallelogram spanned by φ(e1 ) and φ(e2 ). Since the determinant
of the identity matrix is 1, the area of the unit square (the one spanned by e1 and e2 ) is 1.
So | det A| is the ratio of the area of the parallelogram to the area of the square. Let’s see
that the area of any parallelogram will change by the same factor. Let x and y be two non-
colinear vectors in R2 . Vectors x, y span a parallelogram with the area | det B|, where B is the
matrix having [x] and [y] as its rows. Then vectors φ(x) and φ(y) span a parallelogram with
area | det(BA)|, since the rows of BA are the coordinate vectors of [φ(x)] and [φ(y)]. Since
| det(BA)| = | det(B)| | det A|, we obtain that the area of every parallelogram changes by the
same factor | det A|. A similar argument works also for n = 3.
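A quick numerical check of this scaling, for the map x 7→ Ax with an arbitrary 2 × 2 matrix A and an arbitrary pair of vectors (the numbers below are illustrative, not from the text):

    import numpy as np

    # A linear map scales all parallelogram areas by the same factor |det A|.
    A = np.array([[2., 1.],
                  [1., 3.]])

    x = np.array([1., 4.])
    y = np.array([-2., 5.])

    area_before = abs(np.linalg.det(np.array([x, y])))         # parallelogram spanned by x, y
    area_after = abs(np.linalg.det(np.array([A @ x, A @ y])))  # parallelogram spanned by the images

    print(area_after / area_before)    # 5.0
    print(abs(np.linalg.det(A)))       # 5.0, the common scaling factor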
Proposition 42 Every two segments, two triangles, two parallelograms, two tetrahedra, two
parallelepipeds in V can be mapped to each other by a linear map.
Proof. The statement follows from the definitions of the figures in the statement, and the fact
that every bijection between two sets of linearly independent vectors can be extended to a
linear map of Rn .
Let us present three examples of how Theorem 41 and Proposition 42 can be used to solve
problems in Euclidean geometry.
Example 43 Is there a non-regular pentagon with the property that each of its diagonals is parallel
to one of its sides?
Solution. Yes. It is easy to show that a regular pentagon has this property (do it!). Consider
any linear operator of R2 which maps a regular pentagon to a non-regular one. There are many
such operators: e.g., just pick three consecutive vertices of the regular pentagon and map
the corresponding triangle to an equilateral one. Then the image of the whole pentagon is
not regular, since one of its angles has measure 60◦ . Since parallel segments are mapped to
parallel segments, the image will satisfy the required property. ✷
Example 44 Let A1 , B1 , C1 be points on the sides BC, CA, AB of a △ABC, respectively, such
that
BA1 /A1 C = CB1 /B1 A = AC1 /C1 B = 1/2.
Proof. Consider a linear map which maps △ABC to an equilateral △A′ B ′ C ′ . The points A′1 , B1′ , C1′
will divide the sides of △A′ B ′ C ′ in the same ratio, and therefore A′ C1′ = B ′ A′1 = C ′ B1′ = 1
(we can just choose the scale this way). See Figure ***. Therefore it is sufficient to solve the
problem in this case, since the ratio of areas does not change.
This can be done in many ways. Here is one of them. Rotating △A′ B ′ C ′ counterclockwise by
120◦ around its center, we obtain that A′1 7→ B1′ 7→ C1′ 7→ A′1 , where 7→ means ‘is mapped to’.
This implies that
A′ A′1 7→ B ′ B1′ 7→ C ′ C1′ 7→ A′ A′1 ,
and therefore A′2 7→ B2′ 7→ C2′ 7→ A′2 . It implies that △A′2 B2′ C2′ is equilateral. Using the Cosine
theorem for △A′ C1′ C ′ , we get
C ′ C1′ = √(1² + 3² − 2 · 1 · 3 · cos(π/3)) = √7.
Now, △A′ B2′ C1′ ∼ △A′ B ′ A′1 , since they have two pairs of congruent angles. Therefore B2′ C1′ =
1/√7 and A′ B2′ = C ′ A′2 = 3/√7. Therefore A′2 B2′ = √7 − 1/√7 − 3/√7 = 3/√7. This implies
that A′2 B2′ /A′ B ′ = 1/√7, and therefore
Example 45 Let ABCD be a tetrahedron, and let E and F be the midpoints of segments AB
and CD, respectively. Let α be any plane passing through E and F . Prove that α divides
ABCD into two polyhedra of equal volumes.
Proof. Using a linear operator, map ABCD to a regular tetrahedron A′ B ′ C ′ D′ . Then E and
F are mapped to the midpoints E ′ and F ′ of segments A′ B ′ and C ′ D′ , and the plane α to a plane
α′ passing through E ′ and F ′ .
Now note that the line E ′ F ′ is perpendicular to the sides A′ B ′ and C ′ D′ and lies in α′ . Therefore
a rotation around the line E ′ F ′ by 180◦ maps each of the two polyhedra into which α′ divides
A′ B ′ C ′ D′ to another one. Hence they are congruent, and their volumes are equal. But the
ratio of volumes is preserved by any non-singular linear operator.
Similarly, let a1 , . . . , am be a set of linearly independent vectors in Rn . The set of vectors
{t1 a1 + . . . + tm am : 0 ≤ ti ≤ 1},
It takes some work to verify that this definition of the volume of a parallelepiped can be extended
to definitions of volumes of other figures, and that the obtained function satisfies all the axioms
imposed on volumes as objects of Measure theory and Euclidean geometry. What is obvious, at
least, is that the number is positive, and it becomes zero when the n-parallelepiped degenerates
into one of a smaller dimension, i.e., when the defining vectors are linearly dependent. It also
satisfies our expectation that if two sets of n linearly independent vectors {ai } and {bi } span the
same parallelepiped, then the volumes defined by them should be the same. (Check!)
Before closing this section we wish to introduce another important matrix whose determinant
can also be used to compute volumes.
Let (V, f ) be any n-dimensional Euclidean space, and let a1 , . . . , am be a sequence of vectors
in V . Consider a square matrix G of order m defined as: G = G(a1 , . . . , am ) = (gij ), where
gij = f (ai , aj ). Then G is called the Gram matrix of the sequence of vectors a1 , . . . , am ,
and det G is called the Gram determinant of this sequence of vectors.
3. (Hadamard’s inequality.) Let {a′i } be the orthogonal basis constructed from {ai } by Gram-
Schmidt procedure. Then
4. Let m = n, and let α be an orthonormal basis of (V, f ). If A is the matrix whose i-th row
is [ai ]α for all i, then G = AAT . Consequently,
Proof.
2. Obvious.
3. Follows from Gram-Schmidt procedure and the properties of determinants. Handout was
given in class. The inequality ka′i k2 ≤ kai k2 follows from the Pythagorean Theorem.
4. Straightforward.
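For readers who like to experiment, here is a small numpy illustration of the definition and of the properties above; the three vectors are an arbitrary example in the standard Euclidean space R3.

    import numpy as np

    # The Gram matrix of a sequence of vectors in the standard Euclidean space.
    a = [np.array([1., 2., 0.]),
         np.array([0., 1., 1.]),
         np.array([1., 0., 3.])]      # an arbitrary example

    A = np.array(a)                   # rows are the coordinate vectors [a_i]
    G = A @ A.T                       # g_ij = f(a_i, a_j)

    print(np.allclose(G, G.T))                               # Gram matrices are symmetric
    print(np.linalg.det(G), np.linalg.det(A) ** 2)           # det G = (det A)^2 when m = n
    print(np.linalg.det(G) <= np.prod([v @ v for v in a]))   # Hadamard: det G <= prod ||a_i||^2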
1. Prove that the area of a parallelogram and the volume of a parallelepiped in 2- and 3-dimensional
Euclidean space, respectively, are equal to the absolute values of the determinants of the
corresponding matrices.
You are allowed to use the following facts from elementary geometry: the area of a parallelogram is equal
to the product of the length of a side and the length of the corresponding altitude, and the
volume of a parallelepiped is the product of the area of its base and the corresponding altitude.
2. Given a tetrahedron AA1 A2 A3 . Let A′i be a point on the side AAi such that AA′i /AAi = λi ,
i = 1, 2, 3. Prove that vol (AA′1 A′2 A′3 ) = λ1 λ2 λ3 vol (AA1 A2 A3 ).
3. A plane passes through the midpoints of two skew sides of a tetrahedron, and intersects two other
sides at points M and N . Prove that the points M and N divide the sides (they belong to) in the
same ratio.
{ x = (x1 , x2 ) : x1²/a² + x2²/b² ≤ 1 }.
Prove that every ellipsoid can be viewed as an image of a circular disc with respect to a linear
map. Conclude that the area of the ellipsoid with semi-axes of lengths a and b is πab. State and
prove the generalization of this result to R3 .
5. Given a tetrahedron AA1 A2 A3 . Let A′i be a point on the side AAi such that AA′i /AAi = λi ,
i = 1, 2, 3. Prove that vol (AA′1 A′2 A′3 ) = λ1 λ2 λ3 vol (AA1 A2 A3 ).
6. Let a1 , . . . , am be vectors in the standard inner product space (Rn , f ) such that the distance
between every two of them is equal to 1. Prove that m ≤ n + 1. Construct an example of n + 1
vectors with this property.
Hint: Let ui = ai − a1 for all i. Show that G(u2 , . . . , um ) is nonsingular.
In this lecture we discussed the notion of the distance from a vector (point) in a Euclidean
space (V, f ) to its n-dimensional subspace. We proved the following theorem.
Theorem 47 Let (V, f ) be a Euclidean space, and W be its n-dimensional subspace. Then for
every v ∈ V , there exists a unique vector w0 in W such that
(i) v − w0 ⊥ W , and
(ii) kv − w0 k = min{kv − wk : w ∈ W }.
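The notes prove this abstractly; in the standard Euclidean space Rn one concrete way to compute w0 = proj W v is via the normal equations. Here is a small Python sketch (the subspace and the vector v are arbitrary examples, and we assume the spanning vectors are linearly independent):

    import numpy as np

    def project(v, spanning_vectors):
        """Orthogonal projection of v onto W = span(spanning_vectors) in standard R^n."""
        M = np.array(spanning_vectors, dtype=float).T     # columns span W
        coeffs = np.linalg.solve(M.T @ M, M.T @ v)        # normal equations (M^T M) c = M^T v
        return M @ coeffs

    v = np.array([1., 2., 0.])
    W = [np.array([1., 0., 1.]), np.array([0., 1., 1.])]  # an arbitrary 2-dimensional subspace

    w0 = project(v, W)
    print(w0)
    print(np.round([(v - w0) @ w for w in W], 10))   # v - w0 is orthogonal to W
    print(np.linalg.norm(v - w0))                    # dist(v, W)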
1. Let (V, f ) be a Euclidean space, W be its subspace, and v ∈ V a vector which is not orthogonal to W . Prove
that the angle between v and proj W v is the smallest among all angles between v and w, w ∈ W .
(The measure of the angle between two nonzero vectors v and w in V is, by definition, cos−1 ( f (v, w) / (‖v‖ ‖w‖) ).)
Try to get a good feeling for this result in the standard Euclidean 2- and 3-dimensional spaces,
with W being of dimension 1 and 2, respectively.
2. In R4 , let W = ⟨(1, 1, 0, 0), (1, 1, 1, 2)⟩. Find w ∈ W such that ‖w − (1, 0, 2, 2)‖ is as small as
possible.
3. Find proj W v, and dist (v, W ) for (V, f ), W , and v defined below.
W = {x = (x1 , x2 , x3 , x4 ) : x1 + x2 + x3 − 2x4 = 0 }.
W = {x = (x1 , x2 , x3 , x4 ) : x1 + x2 = x3 − x4 , and x1 = x2 + x3 = 0 }.
(iii) V = C[−1, 1], f (u, v) = ∫_{−1}^{1} u(t)v(t) dt, v = cos x and W = ⟨1, x, x2 , x3 ⟩.
Let V be a vector space over a field F, and let L(V ) denote the set of all linear operators on V .
For φ ∈ L(V ), a subset S ⊆ V such that φ(S) ⊆ S is called stable or invariant with respect
to φ, or φ-stable, or φ-invariant. Clearly {0} and V are φ-stable for every φ. Restricting φ to
its invariant subspace W , we obtain a linear operator φ|W on W . The study of the latter may
be easier than that of φ on the whole of V , mainly because this subspace is of smaller dimension than
V . In particular, if V is a direct sum of its subspaces W1 , W2 , . . ., then φ is completely defined
by all φ|Wi . This simple idea suggests the following approach for studying linear operators.
Given φ ∈ L(V ),
This is exactly what we will try to do during the next several lectures.
As the action of φ on ⟨0⟩ is trivial, the first interesting case is when a φ-invariant subspace is
1-dimensional.
Let φ(v) = λv for some nonzero vector v ∈ V and a scalar λ ∈ F. Then v is called an
eigenvector of φ, and λ is called an eigenvalue of φ. We also say that λ and v correspond
to each other. Every eigenvector v of φ spans a 1-dimensional φ-stable subspace ⟨v⟩.
φ(x) = λ1 x1 v1 + · · · + λn xn vn .
Therefore finding as many as possible linearly independent eigenvectors of φ is useful. How can
one find them? In order to answer this question, we consider matrices which represent φ in
different bases.
Let dim V = n, φ ∈ L(V ), and α = {v1 , . . . , vn } be a basis of V . Let Mφ,α , be the matrix of φ
corresponding to α. We remind ourselves that Mφ,α = (aij ), where the entries aij are defined
by the equalities φ(vi ) = ai1 v1 + · · · + ain vn , i = 1, . . . , n. To simplify the presentation, we
denote MTφ,α by A. Then it is easy to check that
[φ(x)]α = A [x]α
As φ(v) = λv if and only if A [v]α = λ[v]α , we wish also to define the notions of eigenvector and
eigenvalues for matrices.
Let V = Fn , A be a square matrix of order n, and A v = λv for some nonzero vector v and
a scalar λ. Then v is called an eigenvector of A, and λ is called an eigenvalue of A. The
equality A v = λv is equivalent to (λ I − A) v = 0, where I = In is the identity matrix of order
n. The system of homogeneous linear equations (λ I − A) x = 0 has a nontrivial solution if
and only if its matrix of coefficients λ I − A is singular, or if and only if det(λ I − A) = 0.
Therefore, each scalar λ which satisfies the equation det(λ I − A) = 0, is an eigenvalue of A,
and the corresponding eigenvector can always be found by solving the system of equations
(λ I − A) x = 0. Therefore, for a short while, we will discuss how one finds the eigenvalues of A.
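In small examples this recipe is easy to carry out numerically. Here is a sketch in numpy; the matrix and its hand-computed characteristic polynomial x² − 4x + 3 are our own illustrative example.

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 2.]])

    # Roots of det(x I - A) = x^2 - 4x + 3 are the eigenvalues:
    print(np.roots([1., -4., 3.]))        # [3., 1.]

    # numpy computes the same data directly:
    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)                     # 3 and 1
    for lam, v in zip(eigenvalues, eigenvectors.T):   # columns of `eigenvectors` are eigenvectors
        print(np.allclose(A @ v, lam * v))            # A v = lambda v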
Until this moment, we considered determinants of matrices over fields only. A very similar
theory of determinants exists for matrices over commutative rings. The only ring we will be
concerned with is the ring of polynomials F[x]. If we analyze the statements about determinants
which do not refer to the notion of linear independence of vectors, those will hold for matrices
over F[x], and the proofs can be carried verbatim. The expression x I − A can be considered as
a matrix over F[x], and its determinant as a polynomial of x over F. We call det(x I − A) the
characteristic polynomial of A, and denote it by cA (x). We proved the following important
fact.
This theorem implies that in order to find the eigenvalues of a linear operator φ one can choose a
basis α, consider the matrix A = MTφ,α which represents φ in this basis, and find its eigenvalues.
A choice of another basis β leads to another matrix B = MTφ,β . How do the eigenvalues of B
compare to the ones of A? It turns out that they are exactly the same! Let us call the
multiset of all eigenvalues of a matrix A the spectrum of A, and denote it by spec A.
Corollary 49 Let A and B be square matrices which represent φ ∈ L(V ) in bases α and β,
respectively. Then cA (x) = cB (x). Hence spec A = spec B.
Proof. Let α = {v1 , . . . , vn }, β = {u1 , . . . , un }, and C be the matrix whose i-th column is [ui ]α .
Then C is nonsingular, and B = C −1 AC. This implies
cB (x) = det(x I − B) = det(C −1 (x I − A)C) = det(C −1 ) det(x I − A) det(C) = det(x I − A) = cA (x).
The result of this corollary allows us to define the characteristic polynomial cφ (x) of φ ∈
L(V ) as cA (x), where A is the matrix representing φ in some basis.
We have understood that when V has a basis of n eigenvectors, the matrix of φ in this basis is
diagonal. The following theorem provides a simple sufficient condition for existence of linearly
independent eigenvectors. The condition is not necessary, as simple examples show (like φ = id).
As we know, not every polynomial in F[x] has roots in F. And if it does, not all of the roots
must be distinct. What can be said about φ if cφ (x) has this property? We will be discussing
this question in the next lecture.
We wish to finish this section with an example illustrating how useful it can be to diagonalize
a matrix. We will find an explicit formula for the Fibonacci sequence: F0 = F1 = 1, and
Fn = Fn−1 + Fn−2 for n ≥ 2.
The example was presented in class, and was based on the observation that for i ≥ 2,
[ Fi   ]   [ 1  1 ] [ Fi−1 ]
[ Fi−1 ] = [ 1  0 ] [ Fi−2 ] .
Diagonalizing the matrix A = [ 1 1 ; 1 0 ] above leads to an explicit formula for Fn . Some details were left as
exercises.
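Here is how the computation might look in numpy (a sketch, not the in-class derivation; it uses F0 = F1 = 1 as above):

    import numpy as np

    # Diagonalize A = [[1, 1], [1, 0]] to get a closed formula for F_n.
    # With F_0 = F_1 = 1, we have [F_n, F_{n-1}]^T = A^(n-1) [F_1, F_0]^T for n >= 1.
    A = np.array([[1., 1.],
                  [1., 0.]])

    eigenvalues, C = np.linalg.eig(A)        # columns of C are eigenvectors; A = C D C^(-1)
    # The eigenvalues are (1 ± sqrt 5)/2, the "golden ratio" numbers.

    def fib(n):
        # A^(n-1) = C D^(n-1) C^(-1), computed via the diagonalization
        An_minus_1 = C @ np.diag(eigenvalues ** (n - 1)) @ np.linalg.inv(C)
        return (An_minus_1 @ np.array([1., 1.]))[0]   # first component of A^(n-1) [F_1, F_0]^T

    print([round(fib(n)) for n in range(1, 11)])      # 1, 2, 3, 5, 8, 13, 21, 34, 55, 89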
1. (i) Let λ be an eigenvalue of φ ∈ L(V ), and let Vλ = {v ∈ V : φ(v) = λv}. Prove that Vλ is a
subspace of V . Vλ is called the eigenspace of φ corresponding to λ. A similar notion can be
defined for a square matrix of order n.
(ii) Let Vλi , i = 1, . . . , k, be eigenspaces of φ ∈ L(V ) corresponding to pairwise distinct eigenvalues
λ1 , . . . , λk . Let αi be a basis of Vλi . Then the union of all αi is a linearly independent set of vectors
of V .
2. For a matrix A below, find its characteristic polynomial cA (x); spec A; for each λ ∈ spec A, find
a maximum set of linearly independent eigenvectors of A corresponding to λ, i.e. a basis for the
eigenspace of A corresponding to λ. If A is diagonalizable, find C such that C −1 AC is a diagonal
matrix.
Try to do it first without using computer. Then use computer if you have difficulties, and in order
to check your results.
(i) A =
[ 4  1 −1 ]
[ 2  5 −2 ]
[ 1  1  2 ]

(ii) A =
[ 3 −1  1 ]
[ 7 −5  1 ]
[ 6 −6  2 ]

(iii) A =
[ 3  1  0  0 ]
[ 0  3  1  0 ]
[ 0  0  3  1 ]
[ 0  0  0  3 ]
3. Let A = Jn (λ), which is a square matrix of order n, having all diagonal elements equal to λ, all
entries in the (i, i + 1) positions equal to 1 (i = 1, . . . , n − 1), and zero entries everywhere else.
Matrices of the form Jn (λ) are called Jordan matrices or Jordan blocks. Find spec Jn (λ) and
a maximum set of linearly independent corresponding eigenvectors of A.
(This will generalize your computation for part (iii) of Problem 2 of this set.)
6. Given a polynomial f (x) = xn + an−1 xn−1 + . . . + a1 x + a0 , find a square matrix A such that
cA (x) = f (x). Such a matrix is called the companion matrix for f .
(Hint: try a matrix like the one in Problem 5 of this set.)
8. Let A ∈ M3×3 (R) such that all entries of A are positive. Prove that A has an eigenvector having
all its components positive.
9. (i) Is there a matrix A ∈ M6×6 (R) with negative determinant and having no real eigenvalues?
(ii) Is there a matrix A ∈ M6×6 (R) with no real eigenvector?
(iii) Is there a matrix A ∈ M7×7 (R) with no real eigenvector?
10. Let λ be an eigenvalue of φ and v be a corresponding eigenvector. Let p(x) ∈ F[x]. Then p(λ) is
an eigenvalue of p(φ) and v is a corresponding eigenvector, i.e., p(φ) v = p(λ) v.
12. Let φ ∈ L(V ). Is it possible for φ not to have any nontrivial invariant subspaces, but for φ2 to
have one?
13. Prove that if linear operators φ, ψ ∈ L(V ) commute, i.e., φψ = ψφ, then every eigenspace of φ is
an invariant subspace of ψ (and vice versa).
In this lecture we continue our discussion of the question of how to find invariant subspaces for
a linear operator φ on V , dim V = n. Instead of operator φ we will deal with a matrix A which
represents it in some basis. We assume that A acts as an operator on Fn , by mapping x to Ax.
A choice of another basis leads to a transformation of A to a similar matrix C −1 AC. Finding
a basis where φ is presented in a simple way is equivalent to transforming A to a simple form
by means of the similarity transformation. This is usually done by further investigations of the
connections between matrices and polynomials.
For any square matrix A of order n, we can consider the set of all matrices of the form p(A) =
ad Ad + ad−1 Ad−1 + . . . + a1 A + a0 In , where all ai ∈ F. We add and multiply such matrices
similarly to the polynomials in F[x]. We can also think of p(A) as obtained from p(x) = ad xd +
ad−1 xd−1 + . . . + a1 x + a0 by substituting A for x. In order to do it, we just have to interpret
a0 as a0 In = a0 A0 . If the reader is familiar with the notion of a ring (or algebra) homomorphism,
we can just say that p(A) is the image of p(x) under the homomorphism F[x] → Mn×n (F),
where p(x) 7→ p(A). If p(A) = 0, we say that the polynomial p is an annihilating polynomial
of A.
Given A, is there always an annihilating polynomial of A different from zero polynomial? The
answer is Yes, and the proof is surprisingly easy.
The algebra Mn×n (F) is an n2 -dimensional space over F. Therefore the matrices Ad , Ad−1 , . . . , A, I
form a linearly dependent set in Mn×n (F) if d ≥ n2 , as we get at least n2 + 1 matrices in the
set. If λd Ad + λd−1 Ad−1 + . . . + λ1 A + λ0 In = 0 with not all λi equal to zero, then
p(x) = λd xd + λd−1 xd−1 + . . . + λ1 x + λ0 is a nonzero annihilating polynomial of A. Hence we proved the following fact.
Proposition 51 For every matrix A ∈ Mn×n (F), there exists an annihilating polynomial of A
of degree at most n2 .
p(x) = q(x)mA (x) + r(x), where 0 ≤ deg r(x) < deg mA (x).
In order to move our investigations further, we have to recall several basic notions and facts
about polynomials. We remind the readers that for every two polynomials a = a(x) and b = b(x)
in F[x], not both zero polynomials, there exists a unique monic polynomial d = d(x) such that
d is a common divisor of a and b (i.e., d divides both a and b), and d is divisible by every
other common divisor of a and b. It is called the greatest common divisor of a and b, and
it is denoted by gcd(a, b). The gcd(a, b) can be found by the Euclidean algorithm applied to
a and b, and it leads to the following fundamental fact: if d(x) = gcd(a(x), b(x)), there exist
polynomials u(x), v(x) such that d(x) = u(x)a(x) + v(x)b(x).
If gcd(a, b) = 1, a and b are called relatively prime. In this case, the above equality becomes
1 = u(x)a(x) + v(x)b(x).
The following main theorem allows us to reduce the question of finding invariant subspaces of A
to the one of factoring polynomials.
where p1 (x) and p2 (x) are relatively prime. Then V = Fn can be represented as the direct sum
V = V1 ⊕ V2 ,
so p1 (x) and p2 (x) are annihilating polynomials of A|V2 and A|V1 , respectively.
Proof. As p1 and p2 are relatively prime, there exist q1 , q2 ∈ F[x] such that
q1 (x)p1 (x) + q2 (x)p2 (x) = 1,
and hence
q1 (A)p1 (A) + q2 (A)p2 (A) = I.
For every v ∈ V ,
v = I v = q1 (A)p1 (A) v + q2 (A)p2 (A) v = v1 + v2 ,
where vi = qi (A)(pi (A) v). Since pi (A) v ∈ Vi , and Vi is A-stable, Vi is qi (A)-stable. Hence
vi ∈ Vi . This proves that V = V1 + V2 .
For every v ∈ V1 ∩ V2 ,
v = q1 (A)p1 (A) v + q2 (A)p2 (A) v = q1 (A) 0 + q2 (A) 0 = 0.
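To get a concrete feel for the construction, here is a tiny numerical instance; the matrix A and the choice p1 = x − 1, p2 = x − 2, q1 = 1, q2 = −1 are our own illustrative example, not from the text.

    import numpy as np

    # (x - 1)(x - 2) annihilates A, and 1*(x - 1) + (-1)*(x - 2) = 1.
    A = np.array([[1., 1.],
                  [0., 2.]])
    I = np.eye(2)

    E1 = 1.0 * (A - 1 * I)       # q1(A) p1(A)
    E2 = -1.0 * (A - 2 * I)      # q2(A) p2(A)

    print(np.allclose(E1 + E2, I))        # every v splits as v = E1 v + E2 v
    print(np.allclose(E1 @ E2, 0))        # the two pieces lie in complementary subspaces
    print(np.allclose(A @ E1, E1 @ A))    # A commutes with E1, so im E1 (and im E2) is A-stable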
Theorem 53 explains how one can split V in the direct sum of A-stable subspaces. The strategy
is simple:
(2) Represent it as a product of pairwise relatively prime factors: p(x) = p1 (x) · . . . pk (x), such
that each factor pi (x) cannot be further split in this way.
(3) Consider A|im pi (A) , i = 1, . . . , k, and try to find a basis in im pi (A), where the operator
defined by A|im pi (A) can be easily described. The latter is equivalent to finding a matrix similar
to A|im pi (A) and of a simple form.
It turns out that there exists a much better way, found by two of the creators of matrix
theory in the 19th century: an annihilating polynomial of degree n always exists,
and we actually already know what it is!
Theorem 54 ( Hamilton-Cayley Theorem). Let A ∈ Mn×n (F), and let cA (x) = det(x I − A)
be the characteristic polynomial of A. Then cA (A) = 0, i.e., every matrix is annihilated by its
characteristic polynomial.
Striving for more, namely for the minimal polynomial mA (x) of A, which can have a much smaller
degree than n, we can look for it among the factors of cA (x), as it must divide cA (x) due to
Proposition 52.
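For those who want to see the theorem “in action” numerically, here is a small numpy check; the 3 × 3 matrix is an arbitrary example.

    import numpy as np

    # Numerical check of the Hamilton-Cayley theorem.
    A = np.array([[1., 2., 0.],
                  [3., -1., 4.],
                  [0., 1., 2.]])

    coeffs = np.poly(A)            # coefficients of c_A(x) = det(xI - A), leading coefficient first
    n = A.shape[0]

    # Evaluate c_A(A) = A^3 + c_2 A^2 + c_1 A + c_0 I:
    cA_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
    print(np.round(cA_of_A, 10))   # the zero matrix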
Regarding (2). We see that the success in accomplishing part 1 depends heavily on the properties
of polynomials over F and on the particular matrix A. What do we know about the factoring of
polynomials?
There are several fundamental theorems in this regard. A polynomial f ∈ F[x] is called irre-
ducible in F[x] if deg f ≥ 1 and f is not a product of two polynomials of smaller degrees from
F[x].
where k ≥ 2, fi (x) are distinct irreducible monic polynomials, and ei are positive integers. This
representation is unique up to the order of factors.
where λi are all distinct roots of f in R, and qj (x) are irreducible monic quadratic polynomials
over R. Such representation is unique up to order of factors.
Regarding (3). This part is also far from easy. As two success stories, we present the Jordan
canonical form and the Rational Canonical Form. The Jordan form can be used in all those cases
when we have (or there exists) a factorization of an annihilating polynomial into a product
of linear factors. In particular, it exists for matrices over C. The Rational Canonical Form
can be used whenever we have a factorization of an annihilating polynomial into a product of
powers of distinct irreducible factors. Both forms cover the diagonal case, if such is possible.
For particular classes of matrices, more can be said. Those include symmetric real matrices, her-
mitian matrices, and the orthogonal real matrices (the ones which correspond to the isometries
of inner product spaces). The list can be continued.
2. What is wrong with the obvious “proof” of the Hamilton-Cayley Theorem: cA (A) = det(A I −A) =
det(0) = 0.
3. Assuming Hamilton-Cayley Theorem, prove that cA (x) divides (mA (x))t for some positive integer
t.
In this lecture we describe a special basis for an operator ψ ∈ L(V ) having (x − λ)a as its
annihilating polynomial. As (ψ − λ id)a = 0, setting φ = ψ − λ id, we obtain an even simpler
equation φa = 0. An operator φ with the property φa = 0 for some a ∈ N, is called nilpotent,
and the smallest positive b ∈ N such that φb = 0 is called the order of nilpotency of φ.
The most prominent nilpotent operator is, undoubtedly, the differential operator D on a vector
space of polynomials over R (or C) of degree at most m − 1, which maps every polynomial to
its derivative. Consider the following basis in this space: {vi = (1/i!) xi : i = 0, . . . , m − 1}. It is
clear that
vm−1 7−→ vm−2 7−→ · · · 7−→ v1 7−→ v0 7−→ 0,
where each arrow denotes an application of D.
The m × m matrix of D in this basis, ordered (vm−1 , . . . , v0 ), has a very simple form, having
1 in positions (1, 2), (2, 3), . . . , (m − 1, m), and 0 everywhere else. It is denoted by Jm (0). The
matrix Jm (λ) := λ I + Jm (0), which is obtained from Jm (0) by putting a scalar λ on the main
diagonal, is called a Jordan matrix, or a Jordan block. For m = 4,
J4 (0) =
[ 0  1  0  0 ]
[ 0  0  1  0 ]
[ 0  0  0  1 ]
[ 0  0  0  0 ]
and J4 (λ) =
[ λ  1  0  0 ]
[ 0  λ  1  0 ]
[ 0  0  λ  1 ]
[ 0  0  0  λ ]
Observe that xm and (x−λ)m are the annihilating polynomials of Jm (0) and Jm (λ), respectively.
Moreover, they are the minimal polynomials.
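A quick numpy check of this claim for m = 4 (the value λ = 5 is an arbitrary choice):

    import numpy as np

    # (J_4(lambda) - lambda I)^k is nonzero for k < 4 and zero for k = 4,
    # so (x - lambda)^4 is the minimal polynomial of J_4(lambda).
    lam = 5.0
    J = lam * np.eye(4) + np.diag(np.ones(3), k=1)    # J_4(lambda): lambda on the diagonal, 1's above it

    N = J - lam * np.eye(4)                           # this is J_4(0), a nilpotent matrix
    for k in range(1, 5):
        print(k, np.count_nonzero(np.linalg.matrix_power(N, k)))   # 3, 2, 1, 0 nonzero entries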
Though we arrived at Jordan matrices via the example of a particular nilpotent operator D, it
turns out that similar bases exist for other nilpotent operators. This explains the importance
of Jordan matrices in linear algebra. Before we prove the existence of such a basis, we would
like to mention another attractive computational feature of Jordan matrices, wildly admired in
some 19-th and 20-th century pre-computer societies.
Proof. Was discussed in class. See Gel’fand’s book [6], p. 136 – 137, for details.
Mφ,α = diag [Ja1 (0), . . . , Ja1 (0), Ja2 (0), . . . , Ja2 (0), . . . . . . , Jat (0), . . . , Jat (0)],
Proof. Let φ0 = id, and Wi := ker φa−i for i = 0, . . . , a. First we show that
For 1 ≤ i ≤ a − 1, 0 = φa−i (x) = φa−i−1 (φ(x)), hence, φ(x) ∈ ker φa−(i+1) = Wi+1 . So
φ(Wi+1 ) ≥ Wi . Together with (20), it gives Wi ≥ φ(Wi ), which proves that each Wi is φ-
invariant. For i = a the statement is obvious. Suppose that for some i, 0 ≤ i ≤ a − 1,
Wi = Wi+1 . Then Wi ≠ ⟨0⟩, as the order of nilpotency of φ is a. Hence im φa−i = im φa−i−1 ,
since the former is a subspace of the latter and they have equal positive dimensions. This
implies ⟨0⟩ = im φa = im φa−1 = . . . = im φa−i = im φa−i−1 ≠ ⟨0⟩, a contradiction. This proves
that all inclusions in (19) are strict.
Proof. We have
a1 v1 + . . . + ap vp ∈ Wi ⇒ a1 = . . . = ap = 0,
Having Lemma 60, the construction of the desired basis is as follows. Let di = dim Wi . First
choose
α1 = {e1 , . . . , es1 },
a W2 -basis of W1 of d2 − d1 elements. Continue until you get a basis αa of Wa−1 (over Wa = ⟨0⟩)
of da−1 − da = da−1 − 0 = da−1 elements. Let us list all these relative bases in the following
table.
α1 : e1 ... es1
α2 : φ(e1 ) ... φ(es1 ), es1 +1 ... es2
........................................................................
αa : φa−1 (e1 ) . . . φa−1 (es1 ), φa−2 (es1 +1 ) . . . φa−2 (es2 ), esa−1 +1 . . . esa .
Now we collect vectors in this table which stand in the same column. Let
where i = 1, . . . , sa , and
with b1 ≥ b2 ≥ . . . ≥ bsa .
Lemma 61
To finish our proof of the theorem, we just observe that if φi = φ|hβi i , then Mφi ,βi = Jbi (0).
Proof. Since the Jordan form is upper triangular, the statement follows.
Proof. As we mentioned before, no ‘easy’ proof of this theorem exists. Instead of presenting a
proof, we describe four different ideas on which a proof can be based, and refer the reader to
the literature.
http://www.blue-arena.com/mewt/entry.php?id=147 , or
http://www.cs.ut.ee/∼toomas l/linalg/lin1/node19.html
2. For a proof based on Theorem 62, see [13], or [1], p. 173. Of course, in this case Theorem
62 should be proved independently from Theorem 59. The latter can be done by induction and
can be found in [1] p. 84, or [8] p. 64-65.
3. For a proof based on the density of matrices with distinct eigenvalues in the space of all matri-
ces, see http://planetmath.org/encyclopedia/ProofOfCayleyHamiltonTheorem.html The
density is understood relative to the Zariski topology. An advantage of this proof is that it works
for many fields different from C.
4. For a proof based on the isomorphism of rings Mn (F)[x] and Mn (F[x]), see [4] p. 94-95.
Proofs based on this idea can be found in many other books, but many of them do not state
the isomorphism clearly, and develop some weaker results instead.
The following theorem provides more details on the block structure of the Jordan form of an
operator. We remind the reader that the algebraic multiplicity of an eigenvalue λ of φ is
the multiplicity of λ as a root of the characteristic polynomial cφ (x). A geometric multiplicity
of an eigenvalue λ of φ is the dimension of its eigenspace Vλ .
where each
Bi = diag [Jmi,1 (λi ), Jmi,2 (λi ), . . . , Jmi,li (λi )]
Moreover,
(iii) mi,1 = mi , i = 1, . . . , k.
since the matrix xI − Mφ,β is upper triangular and its determinant is equal to the product of
its diagonal entries. Then bi = ei from the uniqueness of a representation of a polynomial as a
product of irreducible factors.
(ii) Each diagonal block of Bi has exactly one eigenvector corresponding to the eigenvalue λi .
Therefore Bi has dim ker (φ − λi id) linearly independent eigenvectors, which is, by definition,
li .
(iv) Each Bi has li Jordan blocks, and the number of 1’s above the diagonal in each block is
one less than the block’s size.
(v) Most proofs are by induction, and are reduced to the uniqueness of the Jordan form of a
nilpotent operator. The idea is to use the fact that the cardinalities of the bases αj (see the table
1. Find all possible Jordan forms for a real matrix A whose characteristic and minimal polynomials
are as follows.
2. Given cA (x) and mA (x) for A ∈ M3 (C), show that these completely determine the Jordan form
of A.
(i) A =
[  2  1 ]
[ −1  4 ]

(ii) A =
[ 1  1 −1 ]
[ 2  2  1 ]
[ 2 −1  4 ]

(iii) A =
[ 2  0  0  0  0 ]
[ 5  2  0  0  0 ]
[ 0  0  8  0  0 ]
[ 0  0  3  1  0 ]
[ 0  0  0  5 −2 ]

(iv) A =
[ 1 −1  0 ]
[ 2  1  3 ]
[ 1  2  0 ]

(v) A =
[ 2  0  0  0 ]
[ 1  2  0  0 ]
[ 0  1  0  0 ]
[ 0  0  1  0 ]

(vi) A =
[ 1  2  3  4 ]
[ 0  1  2  3 ]
[ 0  0  1  2 ]
[ 0  0  0  1 ]
5. Let A ∈ Mn (F) be upper-triangular. Then the multiset of its diagonal entries is precisely spec A.
(This problem has appeared earlier, but we suggest that readers think about it again.)
7. Show that a square matrix A is diagonalizable if and only if the minimal polynomial mA (x) is a
product of distinct linear factors (i.e., mA (x) = (x − λ1 ) · . . . · (x − λk ) where all λ1 , . . . , λk are
distinct).
10. Let A ∈ Mn (C) such that tr A = 0. Prove that there exist B, C ∈ Mn (C) such that A = BC −CB.
[1] S. Axler. Linear Algebra Done Right, 2nd edition, Springer-Verlag, 1997.
I disagree that this text does Linear Algebra “right”, and I disagree with many method-
ological decisions of the author. But some pages and exercises are good.
An unfinished manuscript. Excellent for the title, but also much beyond it. Some chapters
are masterpieces.
A very good undergraduate text. If you find some sections of these notes too fast/hard, try
to find the corresponding material in this book. Sometimes you will not succeed.
[4] M.L. Curtis. Abstract Linear Algebra, Springer-Verlag New York Inc., 1990.
Nice, rather algebraic. A friendly introduction to exterior algebras, though some details are
missing (like in these notes).
[5] D.S. Dummit and R.M. Foote. Abstract Algebra, 3rd edition, John Wiley & Sons, Inc.,
2004.
A quite complete text in Abstract Algebra. Good for references and examples.
A classic. The terminology and notation are sometimes out of fashion. No matter how
many times I read this thin book, I often find something new in it. Great price.
[7] K. Hoffman, R. Kunze. Linear Algebra, 2nd edition, Prentice Hall, 1971.
Quite complete and thorough. Good as a reference, but it is not easy to use.
[8] A.I. Kostrikin, Yu. I. Manin. Linear Algebra and Geometry (Algebra, Logic and Applica-
tions), Gordon and Breach Science Publishers, 1989.
Outstanding and demanding. Details may be read elsewhere. Makes connections with many
advanced mathematical topics. A stimulating exposition of linear algebra related to Quan-
tum Mechanics.
[11] P. D. Lax. Linear Algebra, 3rd edition, John Wiley & Sons, Inc., 1997.
Stimulating, some great examples, rather unusual (for linear algebra texts) content.
[12] B.A. Rosenfeld. Multidimensional Spaces, Nauka, Moscow, 1966. (In Russian).
A very clearly written monograph, which discusses the use of linear algebra in high-
dimensional geometries.
A classic. The terminology and notation are sometimes out of fashion. Complete. The
best treatment of orthogonalization, the Gram matrix/determinant, and volumes. Great price.
A great book in general. Has definition of Euclidean point spaces based on linear algebra.