
A Short Course on

LINEAR ALGEBRA
and its
APPLICATIONS

M. Thamban Nair

Department of Mathematics
Indian Institute of Technology Madras
Contents

Preface iv

1 Vector Spaces 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Definition and Some Basic Properties . . . . . . . . . . 2
1.3 Examples of Vector Spaces . . . . . . . . . . . . . . . . 4
1.4 Subspace and Span . . . . . . . . . . . . . . . . . . . . 8
1.4.1 Subspace . . . . . . . . . . . . . . . . . . . . . 8
1.4.2 Linear Combination and Span . . . . . . . . . . 11
1.5 Basis and Dimension . . . . . . . . . . . . . . . . . . . 14
1.5.1 Dimension of a Vector Space . . . . . . . . . . 18
1.5.2 Dimension of Sum of Subspaces . . . . . . . . . 22

2 Linear Transformations 25
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 What is a Linear Transformation? . . . . . . . . . . . 26
2.3 Space of Linear Transformations . . . . . . . . . . . . 31
2.4 Matrix Representations . . . . . . . . . . . . . . . . . 32
2.5 Rank and Nullity . . . . . . . . . . . . . . . . . . . . . 35
2.6 Composition of Linear Transformations . . . . . . . . 38
2.7 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . 40
2.7.1 Definition and examples . . . . . . . . . . . . . 40
2.7.2 Existence of an eigenvalue . . . . . . . . . . . . 43
2.7.3 Diagonalizability . . . . . . . . . . . . . . . . . 45

3 Inner Product Spaces 47


3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 Definition and Some Basic Properties . . . . . . . . . . 48
3.3 Examples of Inner Product Spaces . . . . . . . . . . . 49
3.4 Norm of a Vector . . . . . . . . . . . . . . . . . . . . . 51
3.5 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . 52


3.5.1 Cauchy-Schwarz inequality . . . . . . . . . . . 52


3.5.2 Orthogonal and orthonormal sets . . . . . . . . 54
3.5.3 Fourier expansion and Bessel’s inequality . . . 57
3.6 Gram-Schmidt Orthogonalization . . . . . . . . . . . . 59
3.6.1 Examples . . . . . . . . . . . . . . . . . . . . . 62
3.7 Diagonalization . . . . . . . . . . . . . . . . . . . . . . 64
3.7.1 Self-adjoint operators and their eigenvalues . . 65
3.7.2 Diagonalization of self-adjoint operators . . . . 67
3.8 Best Approximation . . . . . . . . . . . . . . . . . . . 69
3.9 Best Approximate Solution . . . . . . . . . . . . . . . 72
3.10 QR-Factorization and Best Approximate Solution . . . 74
3.11 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . 75

4 Error Bounds and Stability of Linear Systems 76


4.1 Norms of Vectors and Matrices . . . . . . . . . . . . . 76
4.2 Error Bounds for System of Equations . . . . . . . . . 81

5 Fixed Point Iterations for Solving Equations 84


5.1 Iterative Methods for Solving Ax = b . . . . . . . . . . 88
5.1.1 Jacobi Method . . . . . . . . . . . . . . . . . . 89
5.1.2 Gauss-Seidel Method . . . . . . . . . . . . . . . 90
5.2 Newton’s Method for Solving f (x) = 0 . . . . . . . . . 92
5.2.1 Error Estimates . . . . . . . . . . . . . . . . . . 93

6 Interpolation and Numerical Integration 96


6.1 Interpolation . . . . . . . . . . . . . . . . . . . . . . . 96
6.1.1 Lagrange Interpolation . . . . . . . . . . . . . . 98
6.1.2 Piecewise Lagrange Interpolation . . . . . . . . 99
6.2 Numerical Integration . . . . . . . . . . . . . . . . . . 100
6.2.1 Trapezoidal rule . . . . . . . . . . . . . . . . . 101
6.2.2 Composite Trapezoidal rule . . . . . . . . . . . 101
6.2.3 Simpson’s rule . . . . . . . . . . . . . . . . . . 102
6.2.4 Composite Simpson’s rule . . . . . . . . . . . . 103

7 Additional Exercises 104


Preface

1 Vector Spaces

1.1 Introduction
The notion of a vector space is an abstraction of the familiar set of
vectors in two or three dimensional Euclidean space. For example, let
~x = (x1 , x2 ) and ~y = (y1 , y2 ) be two vectors in the plane R2 . Then
we have the notion of addition of these vectors so as to get a new
vector denoted by ~x + ~y , and it is defined by
~x + ~y = (x1 + y1 , x2 + y2 ).
This addition has an obvious geometric meaning: If O is the coordi-
nate origin, and if P and Q are points in R2 representing the vectors
~x and ~y respectively, then the vector ~x + ~y is represented by a point
R in such a way that OR is the diagonal of the parallelogram for which
OP and OQ are adjacent sides.
Also, if α is a positive real number, then the multiplication of ~x
by α is defined by
α~x = (αx1 , αx2 ).
Geometrically, the vector α~x is an elongated or contracted form of
~x in the direction of ~x. Similarly, we can define α~x for a negative
real number α, in which case α~x points in the direction opposite to ~x.
Representing the coordinate-origin by ~0, and −~x := (−1)~x, we see
that
~x + ~0 = ~x, ~x + (−~x) = ~0.
We may denote the sum ~x + (−~y ) by ~x − ~y .
Now, abstracting the above properties of vectors in the plane, we
define the notion of a vector space.
We shall denote by F the field of real numbers or the field of
complex numbers. If special emphasis is required, then the fields
of real numbers and complex numbers will be denoted by R and C,
respectively.


1.2 Definition and Some Basic Properties


Definition 1.1 (Vector space) A vector space over F is a nonempty
set V together with two operations
(i) addition, which associates with each pair (x, y) of elements in V a
unique element in V denoted by x + y, and
(ii) scalar multiplication, which associates with each pair (α, x),
α ∈ F and x ∈ V , a unique element in V denoted by αx,
satisfying the following conditions:
(a) x+y =y+x ∀ x, y ∈ V .
(b) (x + y) + z = x + (y + z) ∀ x, y, z ∈ V .
(c) ∃ θ ∈ V such that x + θ = x ∀x ∈ V .
(d) ∀ x ∈ V , ∃ x̃ ∈ V such that x + x̃ = θ.
(e) α(x + y) = αx + αy ∀ α ∈ F, ∀ x, y ∈ V .
(f) (α + β)x = αx + βx ∀ α, β ∈ F, ∀ x ∈ V .
(g) (αβ)x = α(βx) ∀ α, β ∈ F, ∀ x ∈ V .
(h) 1x = x ∀x ∈ V .
Elements of a vector space are called vectors, and elements of
the field F (over which the vector space is defined) are often called
scalars.
Proposition 1.1 Let V be a vector space. Then there is exactly one
element θ ∈ V such that x + θ = x for all x ∈ V .
Proof. Suppose there are θ1 and θ2 in V such that

x + θ1 = x and x + θ2 = x ∀ x ∈ V.

Then, using conditions (a) and (c), we have

θ2 = θ2 + θ1 = θ1 + θ2 = θ1 .

This completes the proof.

Definition 1.2 (zero element) Let V be a vector space. The


unique element θ ∈ V such that x + θ = x for all x ∈ V is called the
zero element or simply, the zero in V .
Notation: The zero element in a vector space as well as the zero in
the scalar field are often denoted by the same symbol 0.

Exercise 1.1 Let V be a vector space. For x, y ∈ V , show that


x + y = x implies y = θ. ♦

Proposition 1.2 Let V be a vector space. For each x ∈ V , there is


exactly one element x̃ ∈ V such that x + x̃ = θ.

Proof. Let x ∈ V . Suppose x′ and x″ are in V such that

x + x′ = θ and x + x″ = θ.

Then using the axioms (a), (b), (c), it follows that

x′ = x′ + θ = x′ + (x + x″) = (x′ + x) + x″ = θ + x″ = x″.

This completes the proof.

Definition 1.3 (additive inverse) Let V be a vector space. For


each x ∈ V , the unique element x̃ ∈ V such that x + x̃ = θ is called
the additive inverse of x.
Notation: For x in a vector space, the unique element x̃ which
satisfies x + x̃ = θ is denoted by −x, i.e.,

−x := x̃.

Proposition 1.3 Let V be a vector space. Then, for all x ∈ V ,

0x = θ and (−1)x = −x.

Proof. Let x ∈ V . Since

0x = (0 + 0)x = 0x + 0x,

we have 0x = θ by Exercise 1.1. Now,

x + (−1)x = [1 + (−1)]x = 0x = θ

so that, by the uniqueness of the additive inverse of x, we have


(−1)x = −x.

Notation: For x, y in a vector space, the expression x + (−y) is


denoted by x − y, i.e.,

x − y := x + (−y).

Exercise 1.2 Show that, if x ∈ V and x ≠ 0, then αx ≠ βx for
every α, β ∈ F with α ≠ β. [Hint: Condition (h)] ♦

Remark 1.1 We observe that a vector space V , by definition, cannot


be an empty set. It contains at least one element, viz., the zero
element. If a vector space V contains at least one nonzero element,
then it contains infinitely many nonzero elements: If x is a nonzero
element in V , and if α, β are scalars such that α ≠ β, then αx ≠ βx
(see Exercise 1.2). ♦
Convention: Unless otherwise specified, we always assume that the
vector space under discussion is non-trivial, i.e., it contains at least
one nonzero element.

1.3 Examples of Vector Spaces


EXAMPLE 1.1 (Space Fn ) Consider the set Fn of all n–tuples
of scalars, i.e.,

Fn := {(α1 , . . . , αn ) : αi ∈ F, i = 1, . . . , n}.

For x = (α1 , . . . , αn ), y = (β1 , . . . , βn ) in Fn , and α ∈ F, define the


addition and scalar multiplication coordinate-wise as

x + y = (α1 + β1 , . . . , αn + βn ), αx = (αα1 , . . . , ααn ).

Then it can be seen that Fn is a vector space with zero element


θ := (0, . . . , 0) and additive inverse of x = (α1 , . . . , αn ) as −x =
(−α1 , . . . , −αn ). ♦

NOTATION: We shall, sometimes, denote the ith coordinate of


x ∈ Fn by x(i) for i ∈ {1, . . . , n}. Thus, if x = (α1 , . . . , αn ) ∈ Fn ,
then x(i) = αi for i ∈ {1, . . . , n}.
EXAMPLE 1.2 (Space Pn ) For n ∈ {0, 1, 2, . . .}, let Pn be the
set of all polynomials of degree at most n, with coefficients in F, i.e.,
x ∈ Pn if and only if x is of the form

x = a0 + a1 t + . . . + an tn

for some scalars a0 , a1 . . . , an . Then Pn is a vector space with addi-


tion and scalar multiplication defined as follows:
Examples of Vector Spaces 5

For x = a0 + a1 t + . . . + an tn , y = b0 + b1 t + . . . + bn tn in Pn and
α ∈ F,

x + y = (a0 + b0 ) + (a1 + b1 )t + . . . + (an + bn )tn ,

αx = αa0 + αa1 t + . . . + αan tn .


The zero polynomial, i.e., the polynomial with all its coefficients zero,
is the zero element of the space, and

−x = −a0 − a1 t − . . . − an tn . ♦
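As a computational aside, a polynomial in Pn is determined by its coefficient tuple, and the operations above then reduce to the coordinate-wise operations of Example 1.1. A minimal sketch, assuming Python with NumPy (the polynomials below are arbitrary illustrations):

    import numpy as np

    # x = 1 + 2t and y = 3 - t + 4t^2 in P_2, stored as coefficient arrays (a0, a1, a2)
    x = np.array([1.0, 2.0, 0.0])
    y = np.array([3.0, -1.0, 4.0])

    print(x + y)      # [4. 1. 4.]   i.e. 4 + t + 4t^2
    print(5.0 * x)    # [5. 10. 0.]  i.e. 5 + 10t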


EXAMPLE 1.3 (Space P) Let P be the set of all polynomials
with coefficients in F, i.e., x ∈ P if and only if x ∈ Pn for some
n ∈ {0, 1, 2, . . .}. For x, y ∈ P, let n, m be such that x ∈ Pn and
y ∈ Pm . Then we have x, y ∈ Pk , where k = max {n, m}. Hence we
can define x + y and αx for α ∈ F as in Pk . With this addition and
scalar multiplication, it follows that P is a vector space. ♦
EXAMPLE 1.4 (Space Fm×n ) Let V = Fm×n be the set of all
m × n matrices with entries in F. If A is a matrix with its ij-th
entry aij , then we shall write A = [aij ]. It is seen that V is a vector
space with respect to the addition and scalar multiplication defined
as follows: For A = [aij ], B = [bij ] in V , and α ∈ F,

A + B := [aij + bij ], αA := [αaij ].

In this space, −A = [−aij ], and the matrix with all its entries equal
to zero is the zero element. ♦
EXAMPLE 1.5 (Space Fk ) This example is a special case of the
last one, namely,
Fk := Fk×1 .
This vector space is in one-one correspondence with Fk . One such
correspondence is given by F : Fk → Fk×1 defined by

F ((x1 , . . . , xk )) = [x1 , x2 , . . . , xk ]T , (x1 , . . . , xk ) ∈ Fk . ♦



NOTATION: If x ∈ Fn , we shall denote its j th entry or coordinate


by xj .
EXAMPLE 1.6 (Sequence space) Let V be the set of all scalar
sequences. For (αn ) and (βn ) in V , and α ∈ F, we define

(αn ) + (βn ) = (αn + βn ), α(αn ) = (ααn ).

With this addition and scalar multiplication, V is a vector space with


its zero element as the sequence of zeroes, and −(αn ) = (−αn ). ♦
Exercise 1.3 Verify that the sets considered in Examples 1.1 – 1.6
are indeed vector spaces. ♦

EXAMPLE 1.7 (Space C(I)) Let I be an interval and C(I) be


the set of all real valued continuous functions defined on I. For
x, y ∈ C(I) and α ∈ F, we define x + y and αx point-wise, i.e.,

(x + y)(t) = x(t) + y(t), (αx)(t) = αx(t), t ∈ I.

Then it can be shown that x + y and αx are in C(I), and C(I)


is a vector space over R. The zero element is the zero function,
and the additive inverse of x ∈ C(I) is the function −x defined by
(−x)(t) = −x(t), t ∈ I. ♦

EXAMPLE 1.8 (Space R[a, b]) Let R[a, b] be the set of all real
valued Riemann integrable functions on [a, b]. From the theory of
Riemann integration, it follows that if x, y ∈ R[a, b] and α ∈ F, then
x + y and αx defined pointwise belong to R[a, b]. It is seen that
(Verify) R[a, b] is a vector space over R. ♦

EXAMPLE 1.9 (Product space) Let V1 , . . . , Vn be vector spaces.


Then the cartesian product

V = V1 × · · · × Vn ,

the set of all ordered n-tuples (x1 , . . . , xn ) with xj ∈ Vj for j ∈


{1, . . . , n}, is a vector space with respect to the addition and scalar
multiplication defined by

(x1 , . . . , xn ) + (y1 , . . . , yn ) := (x1 + y1 , . . . , xn + yn ),

α(x1 , . . . , xn ) := (αx1 , . . . , αxn )



with zero element (0, . . . , 0) and additive inverse of x = (x1 , . . . , xn )


defined by −x = (−x1 , . . . , −xn ).
This vector space is called the product space of V1 , . . . , Vn .
As a particular example, the space Fn can be considered as the
product space V1 × · · · × Vn with Vj = F for j = 1, . . . , n. ♦
Exercise 1.4 In each of the following, a set V is given and some
operations are defined. Check whether V is a vector space with
these operations:
(i) Let V = {x = (x1 , x2 ) ∈ R2 : x1 + x2 = 1} with addition and
scalar multiplication as for R2 .
(ii) Let V = R2 , F = R. For x = (x1 , x2 ), y = (y1 , y2 ), let
x + y := (x1 + y1 , x2 + y2 ) and for all α ∈ R,

αx := (0, 0) if α = 0, and αx := (αx1 , x2 /α) if α ≠ 0.

(iii) Let V = C2 , F = C. For x = (x1 , x2 ), y = (y1 , y2 ), let

x + y := (x1 + 2y1 , x2 + 3y2 ) and αx := (αx1 , αx2 ) ∀α ∈ C.

(iv) Let V = R2 , F = R. For x = (x1 , x2 ), y = (y1 , y2 ), let

x + y := (x1 + y1 , x2 + y2 ) and αx := (x1 , 0) ∀α ∈ R.


Exercise 1.5 Let Ω be a nonempty set and W be a vector space.
Let F(Ω, W ) be the set of all functions from Ω into W . For f, g ∈
F(Ω, W ) and α ∈ F, let f + g and αf be defined point-wise, i.e.,

(f + g)(s) = f (s) + g(s), (αf )(s) = αf (s), s ∈ Ω.

Let −f and θ be defined by

(−f )(s) = −f (s), θ(s) = 0, s ∈ Ω.

Show that F(Ω, W ) is a vector space over F with the above opera-
tions. ♦
Exercise 1.6 In Exercise 1.5, let Ω = {1, . . . , n} and W = F. Show
that the map T : F(Ω, F) → Fn defined by

T (f ) = (f (1), . . . , f (n)), f ∈ F(Ω, F),



is bijective.
Also show that for every f, g in F(Ω, F) and α ∈ F,

T (f + g) = T (f ) + T (g), T (αf ) = αT (f ).

Such a map is called a linear transformation. Linear transformations
will be considered in more detail in the
next chapter. ♦

1.4 Subspace and Span


1.4.1 Subspace
We observe that
• V = {x = (x1 , x2 ) ∈ R2 : x2 = 0}, which is a subset of R2 is
a vector space with respect to the addition and scalar multiplication
as in R2 .
• V = {x = (x1 , x2 ) ∈ R2 : 2x1 + 3x2 = 0} which is a sub-
set of R2 is a vector space with respect to the addition and scalar
multiplication as in R2 .
• Pn , which is a subset of the vector space P, is also a vector space.
These examples motivate the following definition.
Definition 1.4 (Subspace) Let V0 be a subset of a vector space
V . If V0 is a vector space with respect to the operations of addition
and scalar multiplication as in V , then V0 is called a subspace of V .
The following theorem is very useful for checking whether a subset
of a vector space is a subspace or not.

Theorem 1.4 Let V be a vector space, and V0 be a subset of V .


Then V0 is a subspace of V if and only if for every x, y in V0 and
α ∈ F,
x + y ∈ V0 and αx ∈ V0 .

Proof. Clearly, if V0 is a subspace of V , then x + y ∈ V0 and


αx ∈ V0 for all x, y ∈ V0 and for all α ∈ F.
Conversely, suppose that x + y ∈ V0 and αx ∈ V0 for all x, y ∈ V0
and for all α ∈ F. Then, for any x ∈ V0 ,

θ = 0 x ∈ V0 and − x = (−1)x ∈ V0 .

Thus, conditions (c) and (d) in the definition of a vector space are
satisfied for V0 . All the remaining conditions can be easily verified
as elements of V0 are elements of V as well.

EXAMPLE 1.10 The space Pn is a subspace of Pm for n ≤ m. ♦


EXAMPLE 1.11 The space C[a, b] is a subspace of R[a, b]. ♦
EXAMPLE 1.12 (Space C k [a, b]) For k ∈ N, let C k [a, b] be the set
of all F-valued functions defined on [a, b] such that the j-th derivative
x(j) of x exists and x(j) ∈ C[a, b] for each j ∈ {1, . . . , k}. It can be
seen that C k [a, b] is a subspace of C[a, b]. ♦
EXAMPLE 1.13 For n ∈ N and (a1 , . . . , an ) ∈ Fn , let

V0 = {(x1 , . . . , xn ) ∈ Fn : a1 x1 + . . . + an xn = 0}.

Then V0 is a subspace of Fn .
Recall from school geometry that, for F = R and n = 3, the
subspace

V0 = {(x1 , x2 , x3 ) ∈ R3 : a1 x1 + a2 x2 + a3 x3 = 0}

is a plane passing through the origin. ♦

Theorem 1.5 Suppose V1 and V2 are subspaces of a vector space V .


Then V1 ∩ V2 is a subspace of V .

Proof. Suppose x, y ∈ V1 ∩ V2 and α ∈ F. Then x, y ∈ V1 and


x, y ∈ V2 . Since V1 and V2 are subspaces, it follows that αx, x+y ∈ V1
and αx, x + y ∈ V2 so that αx, x + y ∈ V1 ∩ V2 . Thus, by Theorem
1.4, V1 ∩ V2 is a subspace.

The union of two subspaces need not be a subspace. To see this


consider the subspaces

V1 := {(x1 , x2 ) : x2 = x1 }, V2 := {(x1 , x2 ) : x2 = 2x1 }

of the space R2 . Note that x = (1, 1) ∈ V1 and y = (1, 2) ∈ V2 , but


x + y = (2, 3) ∉ V1 ∪ V2 . Hence V1 ∪ V2 is not a subspace of R2 .

Exercise 1.7 Let V1 and V2 be subspaces of a vector space. Prove


that V1 ∪ V2 is a subspace if and only if either V1 ⊆ V2 or V2 ⊆ V1 . ♦

Exercise 1.8 Let A be an m × n matrix of scalars. Show that the


sets
V1 = {x ∈ Fn : Ax = 0},
V2 = {y ∈ Fm : y = Ax for some x ∈ Fn }
are subspaces of Fn and Fm , respectively. ♦

Exercise 1.9 Let P[a, b] be the vector space P over R taking its
elements as continuous real valued functions defined on [a, b]. Show that
the space P[a, b] is a subspace of C k [a, b] for every k ≥ 1. ♦

Exercise 1.10 Suppose V0 is a subspace of a vector space V , and


V1 is a subspace of V0 . Then show that V1 is a subspace of V . ♦

Exercise 1.11 Show that

V0 = {(x1 , x2 , x3 ) ∈ R3 : x1 + x2 + x3 = 0, x1 + 2x2 + 3x3 = 0}

is a subspace of R3 . Observe that V0 is the intersection of the sub-


spaces
V1 = {(x1 , x2 , x3 ) ∈ R3 : x1 + x2 + x3 = 0}
and
V2 = {(x1 , x2 , x3 ) ∈ R3 : x1 + 2x2 + 3x3 = 0}.
Note that V1 and V2 are planes through the origin and hence V0 is a
straight line passing through the origin. ♦

Exercise 1.12 Suppose Λ is a set, and for each λ ∈ Λ let Vλ be a


subspace of a vector space V . Then ∩λ∈Λ Vλ is a subspace of V . ♦

Exercise 1.13 For each of the following vector spaces V , see if the


subset V0 is a subspace of V :

(i) V = R2 and V0 = {(x1 , x2 ) : x2 = 2x1 − 1}.

(ii) V = C[−1, 1] and V0 = {f ∈ V : f is an odd function}.

(iii) V = C[0, 1] and V0 = {f ∈ V : f (t) ≥ 0 ∀t ∈ [0, 1]}.

(iv) V = P3 and V0 = {a0 + a1 t + a2 t2 + a3 t3 : a0 = 0}.

(v) V = P3 and V0 = {a0 + a1 t + a2 t2 + a3 t3 : a2 = 0}.



Exercise 1.14 Prove that the only nonzero proper subspaces of R2 are the
straight lines passing through the origin. ♦

Exercise 1.15 Let V be a vector space and u1 , . . . , un be in V . Show


that
V0 := {α1 u1 + . . . + αn un : αi ∈ F, i = 1, . . . , n}
is a subspace of V . ♦

1.4.2 Linear Combination and Span


Definition 1.5 (Linear combination) Let V be a vector space
and u1 , . . . , un belong to V . Then, by a linear combination of
u1 , . . . , un , we mean an element in V of the form α1 u1 + · · · + αn un
with αj ∈ F, j = 1, . . . , n.
Definition 1.6 (Span) Let V be a vector space and u1 , . . . , un
belong to V . Then the set of all linear combinations of u1 , . . . , un
is called the span of u1 , . . . , un , and we write it as

span {u1 , . . . , un }.

In view of Exercise 1.15, if u1 , . . . , un belong to a vector space


V , then span {u1 , . . . , un } is a subspace of V .
More generally, we have the following definition.
Definition 1.7 (Span) Let S be a subset of V . Then the set of all
linear combinations of elements of S is called the span of S, and is
also denoted by span (S).
Thus, for S ⊆ V , x ∈ span S if and only if there exists x1 , . . . , xn
in S and scalars α1 , . . . , αn such that x = α1 x1 + · · · + αn xn .
As a convention, span of the empty set is taken to be the singleton
set {0}.
Remember! By a linear combination, we always mean a linear
combination of a finite number of elements in the space. An expres-
sion of the form α1 x1 + α2 x2 + · · · with x1 , x2 , . . . in V and α1 , α2 , . . .
in F has no meaning in a vector space, unless there is some additional
structure which allows such expression.

Exercise 1.16 Let V be a vector space, and S ⊆ V . Then span (S)


is a subspace of V , and span (S) is the smallest subspace containing

S, in the sense that, if V0 is a subspace of V such that S ⊂ V0 ,


then span (S) ⊆ V0 . ♦
Exercise 1.17 Let S be a subset of a vector space V . Show that S
is a subspace if and only if S = span S. ♦
Exercise 1.18 Let V be a vector space. Show that the following
hold.
(i) Let S be a subset of V . Then
span S = ∩ {Y : Y is a subspace of V containing S}.

(ii) Suppose V0 is a subspace of V and x0 ∈ V \ V0 . Then for


every x ∈ span {x0 ; V0 } := span ({x0 } ∪ V0 ), there exists a unique
pair (α, y) ∈ F × V0 such that x = αx0 + y. ♦

NOTATION (Kronecker1 delta): For (i, j) ∈ N × N, let


δij = 1 if i = j, and δij = 0 if i ≠ j.

EXAMPLE 1.14 Let V = Fn and for each j ∈ {1, . . . , n}, let


ej ∈ Fn be such that its i-th coordinate is δij . Then Fn is the span
of {e1 , . . . , en }. ♦

EXAMPLE 1.15 For 1 ≤ k < n, let


V0 := {(α1 , . . . , αn ) ∈ Rn : αj = 0, j = k + 1, . . . , n}.
Then it is seen that V0 is the span of {e1 , . . . , ek }. ♦
EXAMPLE 1.16 Let V = P, and uj = tj−1 , j ∈ N. Then Pn is
the span of {u1 , . . . , un+1 }, and P = span {u1 , u2 , . . .}. ♦
EXAMPLE 1.17 (Space c00 ) Let V be the set of all sequences
with real entries. For n ∈ N, let
en = (δn1 , δn2 , . . .).
Then span {e1 , e2 , . . .} is the space of all scalar sequences with only a
finite number of nonzero entries. The space span {e1 , e2 , . . .} is usually
denoted by c00 . ♦
1 German mathematician Leopold Kronecker (December 7, 1823 – December 29,
1891). He was quoted as having said, “God made integers; all else is the work of
man”.

Exercise 1.19 Consider the system of equations

a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
. . . . . . . . .
am1 x1 + am2 x2 + . . . + amn xn = bm

Let u1 , u2 , . . . , un be the column vectors formed by the coefficients, i.e.,

u1 := [a11 , a21 , . . . , am1 ]T , u2 := [a12 , a22 , . . . , am2 ]T , . . . , un := [a1n , a2n , . . . , amn ]T .

Show that the above system has a solution vector x = [x1 , . . . , xn ]T
if and only if b = [b1 , . . . , bm ]T is in the span of {u1 , . . . , un }. ♦
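Numerically, the span condition in Exercise 1.19 amounts to a rank test: b lies in span {u1 , . . . , un } exactly when appending b as an extra column does not increase the rank. A small sketch, assuming Python with NumPy (the matrix and right-hand sides below are made-up illustrations):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])           # columns u1, u2 (here u2 = 2 u1)
    b_in = np.array([2.0, 4.0, 6.0])     # equals 2 u1, so it lies in the span
    b_out = np.array([1.0, 0.0, 0.0])    # does not lie in the span

    def solvable(A, b):
        # Ax = b has a solution iff rank(A) == rank([A | b])
        return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

    print(solvable(A, b_in))   # True
    print(solvable(A, b_out))  # False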

Exercise 1.20 Let uj (t) = tj−1 , j ∈ N. Show that span of {u1 , . . . , un+1 }
is Pn , and span of {u1 , u2 , . . .} is P. ♦

Exercise 1.21 Let u1 (t) = 1, and for j = 2, 3, . . . , let uj (t) =


1 + t + . . . + tj−1 . Show that span of {u1 , . . . , un+1 } is Pn , and span of
{u1 , u2 , . . .} is P. ♦

Definition 1.8 (Sum of subsets) Let V be a vector space, x ∈ V ,


and E, E1 , E2 be subsets of V . Then we define the following:

x + E := {x + u : u ∈ E},

E1 + E2 := {x1 + x2 : x1 ∈ E1 , x2 ∈ E2 }.
The set E1 + E2 is called the sum of the subsets E1 and E2 .

Theorem 1.6 Suppose V1 and V2 are subspaces of V . Then V1 + V2


is a subspace of V . In fact,

V1 + V2 = span (V1 ∪ V2 ).

Proof. Let x, y ∈ V1 + V2 and α ∈ F. Then, there exist x1 , y1 ∈ V1
and x2 , y2 ∈ V2 such that x = x1 + x2 , y = y1 + y2 . Hence,

x + y = (x1 + x2 ) + (y1 + y2 ) = (x1 + y1 ) + (x2 + y2 ) ∈ V1 + V2 ,

αx = α(x1 + x2 ) = αx1 + αx2 ∈ V1 + V2 .


Thus, V1 + V2 is a subspace of V .

Now, since V1 ∪ V2 ⊆ V1 + V2 , and since V1 + V2 is a subspace,


we have span (V1 ∪ V2 ) ⊆ V1 + V2 . Also, since V1 ⊆ span (V1 ∪ V2 ),
V2 ⊆ span (V1 ∪ V2 ), and since span (V1 ∪ V2 ) is a subspace, we have
V1 + V2 ⊆ span (V1 ∪ V2 ). Thus,

V1 + V2 ⊆ span (V1 ∪ V2 ) ⊆ V1 + V2 ,

which proves the last part of the theorem.

Exercise 1.22 Suppose V1 and V2 are subspaces of a vector space V


such that V1 ∩ V2 = {0}. Show that every x ∈ V1 + V2 can be written
uniquely as x = x1 + x2 with x1 ∈ V1 and x2 ∈ V2 . ♦
Exercise 1.23 Suppose V1 and V2 are subspaces of a vector space
V . Show that V1 + V2 = V1 if and only if V2 ⊆ V1 . ♦

1.5 Basis and Dimension


Definition 1.9 (Linear dependence) Let V be a vector space. A
subset E of V is said to be linearly dependent if there are u1 , . . . , un ,
n ≥ 2, in E such that at least one of them is a linear combination of
the remaining ones.
Definition 1.10 (Linear independence) Let V be a vector space.
A subset E of V is said to be linearly independent in V if it is not
linearly dependent.
Exercise 1.24 Let E be a subset of a vector space V . Then prove
the following.
(i) E is linearly dependent if and only if there exists u1 , . . . , un
in E and scalars α1 , . . . , αn , with at least one of them nonzero, such
that α1 u1 + · · · + αn un = 0,
(ii) E is linearly independent if and only if for every finite subset
{u1 , . . . , un } of E,

α1 u1 + · · · + αn un = 0 =⇒ αi = 0 ∀ i = 1, . . . , n.


If {u1 , . . . , un } is a linearly independent (respectively, dependent)
subset of a vector space V , then we may also say that u1 , . . . , un are
linearly independent (respectively, dependent) in V .
Note that a linearly dependent set cannot be empty. In other
words, the empty set is linearly independent!

Remark 1.2 If u1 , . . . , un are such that at least one of them is


not in the span of the remaining, then we cannot conclude that
u1 , . . . , un are linearly independent. For the linear independence of
{u1 , . . . , un }, it is required that ui ∉ span {uj : j ≠ i} for every
i ∈ {1, . . . , n}.
Also, if {u1 , . . . , un } is linearly dependent, then it does not follow
that every one of them is in the span of the rest.
To illustrate the above points, consider two linearly independent
vectors u1 , u2 . Then {u1 , u2 , 3u2 } is linearly dependent, but
u1 ∉ span {u2 , 3u2 }. ♦
Exercise 1.25 Let V be a vector space.
(i) Show that a subset {u1 , . . . , un } of V is linearly dependent
if and only if there exists a nonzero (α1 , . . . , αn ) in Fn such that
α1 u1 + · · · + αn un = 0.
(ii) Show that a subset {u1 , . . . , un } of V is linearly independent
if and only if the function (α1 , . . . , αn ) ↦ α1 u1 + · · · + αn un from Fn
into V is injective.
(iii) Show that if E ⊆ V is linearly independent in V , then 0 6∈ E.
(iv) Show that if E ⊆ V is linearly dependent in V , then every
superset of E is also linearly dependent.
(v) Show that if E ⊆ V is linearly independent in V , then every
subset of E is also linearly independent.
(vi) Show that if {u1 , . . . , un } is a linearly independent subset
of V , and if Y is a subspace of V such that {u1 , . . . , un } ∩ Y = ∅,
then every x in the span of {u1 , . . . , un , Y } can be written uniquely
as x = α1 u1 + · · · + αn un + y with (α1 , . . . , αn ) ∈ Fn , y ∈ Y .
(vii) Show that if E1 and E2 are linearly independent subsets
of V such that (span E1 ) ∩ (span E2 ) = {0}, then E1 ∪ E2 is linearly
independent. ♦

Exercise 1.26 Let A be an m × n matrix of scalars with columns


a1 , a2 , . . . , an . Show the following:
(i) The equation Ax = 0 has a non-zero solution x ∈ Fn if and
only if a1 , a2 , . . . , an are linearly dependent.
(ii) For y ∈ Fm , the equation Ax = y has a solution x ∈ Fn if
and only if a1 , a2 , . . . , an , y are linearly dependent, i.e., if and only

if span {a1 , a2 , . . . , an , y} = span {a1 , a2 , . . . , an }, i.e., if and only if


y ∈ span {a1 , a2 , . . . , an }. ♦
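Part (i) of the exercise can also be tested numerically: the columns a1 , . . . , an are linearly dependent exactly when the rank of A is smaller than n. A minimal sketch, assuming Python with NumPy (the matrix is an arbitrary illustration):

    import numpy as np

    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0]])   # third column = first column + second column

    n = A.shape[1]
    print(np.linalg.matrix_rank(A) < n)   # True: columns are linearly dependent,
                                          # e.g. A @ (1, 1, -1) = 0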
Definition 1.11 (Basis) A subset E of a vector space V is said to
be a basis of V if it is linearly independent and span E = V .
EXAMPLE 1.18 For each j ∈ {1, . . . , n}, let ej ∈ Fn be such that
ej (i) = δij , i, j = 1, . . . , n. Then we have seen that {e1 , . . . , en } is
linearly independent and its span is Fn . Hence {e1 , . . . , en } is a basis
of Fn . ♦
EXAMPLE 1.19 Similarly, for each j ∈ {1, . . . , n}, let ej ∈ Fn×1 be the
column vector whose i-th entry is δij , i = 1, . . . , n. Then it is easily seen that
{e1 , . . . , en } is linearly independent and its span is Fn×1 . Hence {e1 , . . . , en }
is a basis of Fn×1 . ♦
Definition 1.12 (Standard bases of Fn and Fn×1 ) The basis {e1 , . . . , en }
of Fn is called the standard basis of Fn , and the basis {e1 , . . . , en } of
Fn×1 is called the standard basis of Fn×1 .
EXAMPLE 1.20 Let uj = tj−1 , j ∈ N. Then {u1 , . . . , un+1 } is a
basis of Pn , and {u1 , u2 , . . .} is a basis of P. ♦
Exercise 1.27 Let u1 = 1, and for j = 2, 3, . . . , let uj = 1 + t + . . . +
tj−1 . Show that {u1 , . . . , un+1 } is a basis of Pn , and {u1 , u2 , . . .} is
a basis of P. ♦
EXAMPLE 1.21 For i = 1, . . . , m; j = 1, . . . , n, let Mij be the
m × n matrix with its (i, j)-th entry as 1 and all other entries 0.
Then
{Mij : i = 1 . . . , m; j = 1, . . . , n}
is a basis of Fm×n . ♦
Remark 1.3 A linearly independent subset of a subspace remains
linearly independent in the whole space. ♦
Theorem 1.7 Let V be a vector space and E ⊆ V . Then the fol-
lowing are equivalent.
(i) E is a basis of V
(ii) E is a maximal linearly independent set in V , i.e., E is
linearly independent, and a proper superset of E cannot be linearly
independent.
(iii) E is a minimal spanning set of V , i.e., span of E is V , and
a proper subset of E cannot span V .

Proof. (i) ⇐⇒ (ii): Suppose E is a basis of V . Suppose Ẽ is a
proper superset of E. Let x ∈ Ẽ \ E. Since E is a basis, x ∈ span (E).
This shows that Ẽ is linearly dependent, since E ∪ {x} ⊆ Ẽ.
Conversely, suppose E is a maximal linearly independent set. If
E is not a basis, then there exists x ∉ span (E). Hence, it is seen
that E ∪ {x} is a linearly independent set which is a proper superset of
E – a contradiction to the maximality of E.
(i) ⇐⇒ (iii): Suppose E is a basis of V . Suppose F is a proper
subset of E. Then, it is clear that there exists x ∈ E \ F which is
not in the span of F , since F ∪ {x} ⊆ E. Hence, F does not span V .
Conversely, suppose E is a minimal spanning set of V . If E is
not a basis, then E is linearly dependent, and hence there exists x ∈ E
such that x ∈ span (E \ {x}). Since E spans V , it follows that E \ {x}, which
is a proper subset of E, also spans V – a contradiction to the fact
that E is a minimal spanning set of V .

Exercise 1.28 For λ ∈ [a, b], let uλ (t) = exp (λt), t ∈ [a, b]. Show
that {uλ : λ ∈ [a, b]} is an uncountable linearly independent subset
of C[a, b]. ♦
Exercise 1.29 If {u1 , . . . , un } is a basis of a vector space V , then
show that every x ∈ V , can be expressed uniquely as

x = α1 u1 + · · · + αn un ,

that is, for every x ∈ V , there exists a unique n-tuple (α1 , . . . , αn )


of scalars such that x = α1 u1 + · · · + αn un . ♦
Exercise 1.30 Consider the system of equations
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
. . . . . . . . .
am1 x1 + am2 x2 + . . . + amn xn = bm

Show that the above system has at most one solution if and only if
the column vectors

w1 := [a11 , a21 , . . . , am1 ]T , w2 := [a12 , a22 , . . . , am2 ]T , . . . , wn := [a1n , a2n , . . . , amn ]T

are linearly independent. ♦

Exercise 1.31 Let u1 , . . . , um be linearly independent vectors in a
vector space V . Let [aij ] be an m × n matrix of scalars, and let

v1 := a11 u1 + a21 u2 + . . . + am1 um ,
v2 := a12 u1 + a22 u2 + . . . + am2 um ,
. . .
vn := a1n u1 + a2n u2 + . . . + amn um .

Show that v1 , . . . , vn are linearly independent if and only if the
column vectors

w1 := [a11 , a21 , . . . , am1 ]T , w2 := [a12 , a22 , . . . , am2 ]T , . . . , wn := [a1n , a2n , . . . , amn ]T

are linearly independent. ♦

Exercise 1.32 Let p1 (t) = 1 + t + 3t2 , p2 (t) = 2 + 4t + t2 , p3 (t) =


2t + 5t2 . Are the polynomials p1 , p2 , p3 linearly independent? ♦
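One way to settle Exercise 1.32 is to write each polynomial as a coefficient vector with respect to {1, t, t2 } and check whether the resulting 3 × 3 matrix has full rank. A sketch, assuming Python with NumPy:

    import numpy as np

    # Columns are the coefficient vectors of p1, p2, p3 with respect to {1, t, t^2}
    P = np.array([[1.0, 2.0, 0.0],    # constant terms
                  [1.0, 4.0, 2.0],    # coefficients of t
                  [3.0, 1.0, 5.0]])   # coefficients of t^2

    print(np.linalg.matrix_rank(P))   # prints 3, so p1, p2, p3 are linearly independent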

1.5.1 Dimension of a Vector Space

Definition 1.13 (Finite dimensional space) A vector space V is


said to be a finite dimensional space if there is a finite basis for V .
Recall that the empty set is considered as a linearly independent
set, and its span is the zero space.
Definition 1.14 (Infinite dimensional space) A vector space
which is not a finite dimensional space is called an infinite dimen-
sional space.

Theorem 1.8 If a vector space has a finite spanning set, then it has
a finite basis. In fact, if S is a finite spanning set of V , then there
exists a basis E ⊆ S.

Proof. Let V be a vector space and S be a finite subset of V such


that span S = V . If S itself is linearly independent, then we are
through. Suppose S is not linearly independent. Then there exists
u1 ∈ S such that u1 ∈ span (S \ {u1 }). Let S1 = S \ {u1 }. Clearly,

span S1 = span S = V.

If S1 is linearly independent, then we are through. Otherwise, there


exists u2 ∈ S1 such that u2 ∈ span (S1 \ {u2 }). Let S2 = S \ {u1 , u2 }.
Then, we have
span S2 = span S1 = V.
If S2 is linearly independent, then we are through. Otherwise, con-
tinue the above procedure. This procedure will stop after a
finite number of steps, as the original set S is a finite set, and we
end up with a subset Sk of S which is linearly independent and
span Sk = V .
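The procedure in the proof can be imitated numerically for a finite spanning set in Fn : scan the vectors and keep one only if it does not lie in the span of those already kept (equivalently, only if it increases the rank). A minimal sketch, assuming Python with NumPy; the spanning set below is an arbitrary illustration.

    import numpy as np

    def extract_basis(vectors):
        """Keep a vector only if it is not in the span of the vectors kept so far."""
        kept = []
        for v in vectors:
            candidate = kept + [v]
            if np.linalg.matrix_rank(np.array(candidate)) == len(candidate):
                kept.append(v)
        return kept

    S = [np.array([1.0, 0.0, 1.0]),
         np.array([2.0, 0.0, 2.0]),    # multiple of the first vector: discarded
         np.array([0.0, 1.0, 0.0]),
         np.array([1.0, 1.0, 1.0])]    # sum of the two kept vectors: discarded

    print(len(extract_basis(S)))       # 2, the dimension of span S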

By definition, an infinite dimensional space cannot have a finite


basis. Is it possible for a finite dimensional space to have an infinite
basis, or an infinite linearly independent subset? The answer is, as
expected, negative. In fact, we have the following result.
Theorem 1.9 Let V be a finite dimensional vector space with a basis
consisting of n elements. Then every subset of V with more than n
elements is linearly dependent.
Proof. Let {u1 , . . . , un } be a basis of V , and {x1 , . . . , xn+1 } ⊂ V .
We show that {x1 , . . . , xn+1 } is linearly dependent.
If {x1 , . . . , xn } is linearly dependent, then {x1 , . . . , xn+1 } is lin-
early dependent. So, let us assume that {x1 , . . . , xn } is linearly in-
dependent. Now, since {u1 , . . . un } is a basis of V , there exist scalars
α1 , . . . , αn such that

x1 = α1 u1 + · · · + αn un .

Since x1 ≠ 0, one of α1 , . . . , αn is nonzero. Without loss of generality,
assume that α1 ≠ 0. Then we have u1 ∈ span {x1 , u2 , . . . , un } so that

V = span {u1 , u2 , . . . , un } = span {x1 , u2 , . . . , un }.


Let α1^(2) , . . . , αn^(2) be scalars such that

x2 = α1^(2) x1 + α2^(2) u2 + · · · + αn^(2) un .

Since {x1 , x2 } is linearly independent, at least one of α2^(2) , . . . , αn^(2) is
nonzero. Without loss of generality, assume that α2^(2) ≠ 0. Then we
have u2 ∈ span {x1 , x2 , u3 , . . . , un } so that

V = span {x1 , u2 , . . . , un } = span {x1 , x2 , u3 , . . . , un }.



Now, let 1 ≤ k ≤ n − 1 be such that

V = span {x1 , x2 , . . . , xk , uk+1 , . . . , un }.


Suppose k < n − 1. Then there exist scalars α1^(k+1) , . . . , αn^(k+1) such
that

xk+1 = α1^(k+1) x1 + · · · + αk^(k+1) xk + αk+1^(k+1) uk+1 + · · · + αn^(k+1) un .

Since {x1 , . . . , xk+1 } is linearly independent, at least one of the scalars
αk+1^(k+1) , . . . , αn^(k+1) is nonzero. Without loss of generality, assume that
αk+1^(k+1) ≠ 0. Then we have uk+1 ∈ span {x1 , . . . , xk+1 , uk+2 , . . . , un }
so that

V = span {x1 , . . . , xk , uk+1 , . . . , un } = span {x1 , . . . , xk+1 , uk+2 , . . . , un }.

Thus, the above procedure leads to V = span {x1 , . . . , xn−1 , un } so


that there exist scalars α1^(n) , . . . , αn^(n) such that

xn = α1^(n) x1 + · · · + αn−1^(n) xn−1 + αn^(n) un .

Since {x1 , . . . , xn } is linearly independent, it follows that αn^(n) ≠ 0.
Hence,
un ∈ span {x1 , . . . , xn }.
Consequently,

V = span {x1 , x2 , . . . , xn−1 , un } = span {x1 , x2 , . . . , xn−1 , xn }.

Thus, xn+1 ∈ span {x1 , . . . , xn }, showing that {x1 , . . . , xn+1 } is lin-


early dependent.

The following three corollaries are easy consequences of Theorem


1.9. Their proofs are left as exercises for the reader.

Corollary 1.10 If V is a finite dimensional vector space, then any


two bases of V have the same number of elements.

Corollary 1.11 If a vector space contains an infinite linearly inde-


pendent subset, then it is an infinite dimensional space.

Corollary 1.12 If (aij ) is an m × n matrix with aij ∈ F and n > m,


then there exists a nonzero (α1 , . . . , αn ) ∈ Fn such that

ai1 α1 + ai2 α2 + · · · + ain αn = 0, i = 1, . . . , m.

Exercise 1.33 Assuming Corollary 1.12, give an alternate proof for


Theorem 1.9. ♦

By Corollary 1.12, we see that if A ∈ Fm×n with n > m, then there
exists a nonzero x ∈ Fn such that

Ax = 0.

Definition 1.15 (n-vector) An n × 1 matrix is also called an n-


vector.
In view of Corollary 1.10, the following definition makes sense.
Definition 1.16 (Dimension) Suppose V is a finite dimensional
vector space. Then the dimension of V is the number of elements in
a basis of V , and this number is denoted by dim V . If V is infinite
dimensional, then its dimension is defined to be infinity and we write
dim V = ∞.
EXAMPLE 1.22 The spaces Fn and Pn−1 are of dimension n. ♦
EXAMPLE 1.23 It is seen that the set {e1 , e2 , . . .} ⊆ F(N, F)
with ej (i) = δij is a linearly independent subset of the spaces ℓ1 (N)
and ℓ∞ (N). Hence, it follows that ℓ1 (N) and ℓ∞ (N) are infinite di-
mensional spaces. ♦
EXAMPLE 1.24 We see that {u1 , u2 , . . . , } with uj (t) = tj−1 , j ∈
N, is linearly independent in C k [a, b] for every k ∈ N. Hence, the
space C k [a, b] for each k ∈ N is infinite dimensional. ♦
EXAMPLE 1.25 Suppose S is a finite set consisting of n elements.
Then F(S, F) is of dimension n. To see this, let S = {s1 , . . . , sn },
and for each j ∈ {1, . . . , n}, define fj ∈ F(S, F) by

fj (si ) = δij , i ∈ {1, . . . , n}.

Then the set {f1 , . . . , fn } is a basis of F(S, F): Clearly,

α1 f1 + · · · + αn fn = 0 =⇒ αi = α1 f1 (si ) + · · · + αn fn (si ) = 0 ∀ i.

Thus, {f1 , . . . , fn } is linearly independent. Also, note that

f = f (s1 )f1 + · · · + f (sn )fn ∀ f ∈ F(S, F).

Thus span {f1 , . . . , fn } = F(S, F). ♦

1.5.2 Dimension of Sum of Subspaces


Theorem 1.13 Suppose V1 and V2 are subspaces of a finite dimen-
sional vector space V . If V1 ∩ V2 = {0}, then

dim (V1 + V2 ) = dim V1 + dim V2 .

Proof. Suppose {u1 , . . . , uk } is a basis of V1 and {v1 , . . . , vℓ } is
a basis of V2 . We show that E := {u1 , . . . , uk , v1 , . . . , vℓ } is a ba-
sis of V1 + V2 . Clearly (Is it clear?) span E = V1 + V2 . So, it is
enough to show that E is linearly independent. For this, suppose
α1 , . . . , αk , β1 , . . . , βℓ are scalars such that α1 u1 + . . . + αk uk + β1 v1 +
. . . + βℓ vℓ = 0. Then we have

x := α1 u1 + . . . + αk uk = −(β1 v1 + . . . + βℓ vℓ ) ∈ V1 ∩ V2 = {0}

so that α1 u1 + . . . + αk uk = 0 and β1 v1 + . . . + βℓ vℓ = 0. From this, by
the linear independence of the ui ’s and vj ’s, it follows that αi = 0 for
i ∈ {1, . . . , k} and βj = 0 for all j ∈ {1, . . . , ℓ}. Hence, E is linearly
independent. This completes the proof.

In fact, the above theorem is a particular case of the following.

Theorem 1.14 Suppose V1 and V2 are subspaces of a finite dimen-


sional vector space V . Then

dim (V1 + V2 ) = dim V1 + dim V2 − dim (V1 ∩ V2 ).

For the proof of the above theorem we shall make use of the
following result.

Proposition 1.15 Let V be a finite dimensional vector space. If E0


is a linearly independent subset of V , then there exists a basis E of
V such that E0 ⊆ E.

Proof. Let E0 = {u1 , . . . , uk } be a linearly independent subset of


V , and let {v1 , . . . , vn } be a basis of V . Let
E1 = E0 if v1 ∈ span (E0 ), and E1 = E0 ∪ {v1 } if v1 ∉ span (E0 ).

Clearly, E1 is linearly independent, and

E0 ⊆ E1 , {v1 } ⊆ span (E1 ).

Then define
E2 = E1 if v2 ∈ span (E1 ), and E2 = E1 ∪ {v2 } if v2 ∉ span (E1 ).

Again, it is clear that E2 is linearly independent, and

E1 ⊆ E2 , {v1 , v2 } ⊆ span (E2 ).

Having defined E1 , . . . , Ej , j < n, we define


Ej+1 = Ej if vj+1 ∈ span (Ej ), and Ej+1 = Ej ∪ {vj+1 } if vj+1 ∉ span (Ej ).

Thus, we get linearly independent sets E1 , E2 , . . . , En such that

E0 ⊆ E1 ⊆ . . . ⊆ En , {v1 , v2 , . . . , vn } ⊆ span (En ).

Since {v1 , . . . , vn } is a basis of V , it follows that E := En is a basis


of V such that E0 ⊆ En = E.

Proof of Theorem 1.14. Let {u1 , . . . , uk } be a basis of the sub-


space V1 ∩ V2 . By Proposition 1.15, there exist v1 , . . . , vℓ in V1
and w1 , . . . , wm in V2 such that {u1 , . . . , uk , v1 , . . . , vℓ } is a basis
of V1 , and {u1 , . . . , uk , w1 , . . . , wm } is a basis of V2 . We show that
E := {u1 , . . . , uk , v1 , . . . , vℓ , w1 , . . . , wm } is a basis of V1 + V2 .
Clearly, V1 + V2 = span (E). Hence, it is enough to show that E
is linearly independent. For this, let α1 , . . . , αk , β1 , . . . , βℓ , γ1 , . . . , γm
be scalars such that

α1 u1 + . . . + αk uk + β1 v1 + . . . + βℓ vℓ + γ1 w1 + . . . + γm wm = 0. (∗)

Then

x := α1 u1 + . . . + αk uk + β1 v1 + . . . + βℓ vℓ = −(γ1 w1 + . . . + γm wm ) ∈ V1 ∩ V2 .

Hence, there exist scalars δ1 , . . . , δk such that

α1 u1 + . . . + αk uk + β1 v1 + . . . + βℓ vℓ = δ1 u1 + . . . + δk uk ,

i.e., (α1 − δ1 )u1 + . . . + (αk − δk )uk + β1 v1 + . . . + βℓ vℓ = 0.

Since {u1 , . . . , uk , v1 , . . . , vℓ } is a basis of V1 , it follows that αi = δi
for all i = 1, . . . , k, and βj = 0 for j = 1, . . . , ℓ. Hence, from (∗),

α1 u1 + . . . + αk uk + γ1 w1 + . . . + γm wm = 0.

Now, since {u1 , . . . , uk , w1 , . . . , wm } is a basis of V2 , it follows that


αi = 0 for all i = 1, . . . , k, and γj = 0 for all j = 1, . . . , m.
Thus, we have shown that {u1 , . . . , uk , v1 , . . . , vℓ , w1 , . . . , wm } is
a basis of V1 + V2 . Since dim (V1 + V2 ) = k + ℓ + m, dim V1 = k + ℓ,
dim V2 = k + m and dim (V1 ∩ V2 ) = k, we get

dim (V1 + V2 ) = dim V1 + dim V2 − dim (V1 ∩ V2 ).

This completes the proof.
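For subspaces of Fn given by spanning vectors, the quantities appearing in Theorem 1.14 can be computed with ranks: dim V1 and dim V2 are the ranks of the spanning matrices, dim (V1 + V2 ) is the rank of the two matrices placed side by side, and dim (V1 ∩ V2 ) is then recovered from the formula. A sketch, assuming Python with NumPy (B1 and B2 are made-up matrices whose columns span V1 , V2 ⊆ R4 ):

    import numpy as np

    B1 = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0],
                   [0.0, 0.0]])     # columns span V1 = span{e1, e2}
    B2 = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])     # columns span V2 = span{e2, e3}

    dim_V1 = np.linalg.matrix_rank(B1)                    # 2
    dim_V2 = np.linalg.matrix_rank(B2)                    # 2
    dim_sum = np.linalg.matrix_rank(np.hstack([B1, B2]))  # dim(V1 + V2) = 3
    dim_int = dim_V1 + dim_V2 - dim_sum                   # dim(V1 ∩ V2) = 1

    print(dim_sum, dim_int)   # 3 1, consistent with V1 ∩ V2 = span{e2}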

Exercise 1.34 Prove that if V is a finite dimensional vector space


and V0 is a proper subspace of V , then dim (V0 ) < dim (V ). ♦
2 Linear Transformations

2.1 Introduction
We may recall from the theory of matrices that if A is an m × n
matrix, and if x is an n-vector, then Ax is an m-vector. Moreover,
for any two n-vectors x and y, and for every scalar α,

A(x + y) = Ax + Ay, A(αx) = αAx.

Also, we recall from calculus that if f and g are real-valued differen-


tiable functions (defined on an interval J), and α is a scalar, then

d/dt (f + g) = df /dt + dg/dt, d/dt (αf ) = α df /dt.

Note also that, if f and g are continuous real-valued functions defined


on an interval [a, b], then

∫_a^b (f + g)(t) dt = ∫_a^b f (t) dt + ∫_a^b g(t) dt, ∫_a^b (αf )(t) dt = α ∫_a^b f (t) dt,

and for every s ∈ [a, b],

∫_a^s (f + g)(t) dt = ∫_a^s f (t) dt + ∫_a^s g(t) dt, ∫_a^s (αf )(t) dt = α ∫_a^s f (t) dt.

Abstracting the above operations between specific vector spaces, we


define the notion of a linear transformation between general vector
spaces.


2.2 What is a Linear Transformation?


Definition 2.1 (Linear transformation) Let V1 and V2 be vector
spaces (over the same scalar field F). A function T : V1 → V2 is
said to be a linear transformation or a linear operator from V1 to V2
if
T (x + y) = T (x) + T (y), T (αx) = αT (x)
for every x, y ∈ V1 and for every α ∈ F.
A linear transformation with codomain space as the scalar field
is called a linear functional. ♦

• Linear functionals are usually denoted by small letters.

• If V1 = V2 = V , and if T : V → V is a linear transformation,


then we say that T is a linear transformation on V .

• The linear transformation which maps each x ∈ V onto itself is


called the identity transformation on V , and is usually denoted
by I. Thus, I : V → V is defined by

I(x) = x ∀ x ∈ V.

• If T : V1 → V2 is a linear transformation, then for x ∈ V1 , we
shall denote the element T (x) ∈ V2 also by T x.

EXAMPLE 2.1 (Multiplication by a scalar) Let V be a vector


space and λ be a scalar. Define T : V → V by T (x) = λx, x ∈ V .
Then we see that T is a linear transformation. ♦
EXAMPLE 2.2 (Matrix as linear transformation) Let A =
(aij ) be an m × n-matrix of scalars. For x ∈ Fn , let T (x) = Ax for
every x ∈ Fn . Then, it can be easily seen that T : Fn → Fm is a
linear transformation.
Note that, writing x as the column vector [x1 , x2 , . . . , xn ]T , T x is the
usual matrix product Ax, whose i-th entry is ai1 x1 + ai2 x2 + · · · + ain xn . ♦
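The two defining identities of Definition 2.1 can be checked numerically for the map x ↦ Ax of Example 2.2. A minimal sketch, assuming Python with NumPy (A, x, y and α are arbitrary test data):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4))    # a 3 x 4 matrix, so T : F^4 -> F^3
    x = rng.standard_normal(4)
    y = rng.standard_normal(4)
    alpha = 2.5

    print(np.allclose(A @ (x + y), A @ x + A @ y))        # True: T(x + y) = Tx + Ty
    print(np.allclose(A @ (alpha * x), alpha * (A @ x)))  # True: T(alpha x) = alpha Tx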



Another example similar to the above:

EXAMPLE 2.3 Let A = (aij ) be an m × n-matrix of scalars. For


x = (α1 , . . . , αn ) in Fn , let
T x = (β1 , . . . , βm ), where βi = ai1 α1 + · · · + ain αn , i = 1, . . . , m.

Then T : Fn → Fm is a linear transformation. ♦


More generally, we have the following.
EXAMPLE 2.4 Let V1 and V2 be finite dimensional vector spaces
with bases E1 = {u1 , . . . , un } and E2 = {v1 , . . . , vm }, respectively.
Let A = (aij ) be an m × n-matrix of scalars. For x = α1 u1 + · · · + αn un ∈ V1 , let

T x = β1 v1 + · · · + βm vm with βi = ai1 α1 + · · · + ain αn for i ∈ {1, . . . , m}.

Then T : V1 → V2 is a linear transformation. Thus,

T (α1 u1 + · · · + αn un ) := ∑_{i=1}^{m} ( ∑_{j=1}^{n} aij αj ) vi . ♦


EXAMPLE 2.5 For each j ∈ {1, . . . , n}, the function fj : Fn → F
defined by fj (x) = xj for x = (α1 , . . . , αn ) ∈ Fn , is a linear func-
tional. ♦
More generally, we have the following example.
EXAMPLE 2.6 Let V be an n-dimensional space and let E =
{u1 , . . . , un } be a basis of V . For x = α1 u1 + · · · + αn un ∈ V , and for each
j ∈ {1, . . . , n}, define fj : V → F by

fj (x) = αj .

Then fj is a linear functional. ♦


Definition 2.2 The linear functionals f1 , . . . , fn defined as in Ex-
ample 2.6 are called coordinate functionals on V with respect to
the basis E of V . ♦

Remark 2.1 We observe that if f1 , . . . , fn are the coordinate func-


tionals on V with respect to the basis E = {u1 , . . . , un } of V , then
fj (ui ) = δij ∀ i, j = 1, . . . , n.
It is to be remarked that these linear functionals depend not only on
the basis E = {u1 , . . . , un }, but also on the order in which u1 , . . . , un
appear in the representation of any x ∈ V . ♦
EXAMPLE 2.7 (Evaluation of functions) For a given point
τ ∈ [a, b], let fτ : C[a, b] → F be defined by
fτ (x) = x(τ ), x ∈ C[a, b].
Then fτ is a linear functional. ♦
More generally, we have the following example.
EXAMPLE 2.8 Given points τ1 , . . . , τn in [a, b], and ω1 , . . . , ωn in
F, let f : C[a, b] → F be defined by

f (x) = x(τ1 )ω1 + · · · + x(τn )ωn , x ∈ C[a, b].

Then f is a linear functional. ♦


EXAMPLE 2.9 (Differentiation) Let T : C 1 [a, b] → C[a, b] be
defined by
T x = x′ , x ∈ C 1 [a, b],
where x′ denotes the derivative of x. Then T is a linear transforma-
tion. ♦
EXAMPLE 2.10 For λ, µ ∈ F, the function T : C 1 [a, b] → C[a, b]
defined by
T x = λx + µx′ , x ∈ C 1 [a, b],
is a linear transformation. ♦
More generally, we have the following example.
EXAMPLE 2.11 Let T1 and T2 be linear transformations from V1
to V2 and λ and µ be scalars. Then T : V1 → V2 defined by
T (x) = λT1 (x) + µT2 (x), x ∈ V1 ,
is a linear transformation.


EXAMPLE 2.12 (Definite integration) Let f : C[a, b] → F be
defined by

f (x) = ∫_a^b x(t) dt, x ∈ C[a, b].

Then f is a linear functional. ♦

EXAMPLE 2.13 (Indefinite integration) Let T : C[a, b] → C[a, b]
be defined by

(T x)(s) = ∫_a^s x(t) dt, x ∈ C[a, b], s ∈ [a, b].

Then T is a linear transformation. ♦


EXAMPLE 2.14 Let V be a finite dimensional vector space and
E1 = {u1 , . . . , un } be a basis of V . For x ∈ Fn , let
T x = x1 u1 + · · · + xn un .

Then T : Fn → V is a linear transformation. ♦


In the above example, if ek ∈ Fn is such that its j-th entry is δkj , then
T ek = uk for k ∈ {1, . . . , n}. More generally, we have the following.

EXAMPLE 2.15 Let V1 and V2 be vector spaces with dim V1 = n.
Let E1 = {u1 , . . . , un } be a basis of V1 and E2 = {v1 , . . . , vn } be a
subset of V2 . For x = α1 u1 + · · · + αn un ∈ V1 , define T : V1 → V2 by

T x = α1 v1 + · · · + αn vn .

Then T is a linear transformation. ♦

Exercise 2.1 Show that the linear transformation T in Example


2.15 is
(a) one-one if and only if E2 is linearly independent,
(b) onto if and only if span (E2 ) = V2 . ♦

Exercise 2.2 Let V1 and V2 be vector spaces, E1 = {u1 , . . . , un }


be a linearly independent subset of V1 and E2 = {v1 , . . . , vm } be a
subset of V2 .

(a) Show that there exists a linear transformation T : V1 → V2


such that
T uj = vj , j ∈ {1, . . . , n}.
(b) Show that the transformation T in (a) is unique if and only
if E1 is a basis of V1 . ♦
Exercise 2.3 Let V1 and V2 be vector spaces, and V0 be a subspace
of V1 . Let T0 : V0 → V2 be a linear transformation. Show that there
exists a linear transformation T : V1 → V2 such that T |V0 = T0 . ♦
Theorem 2.1 Let T : V1 → V2 be a linear transformation which is
one to one and onto. Then T −1 : V2 → V1 is also a linear transfor-
mation.
Proof. For y1 , y2 ∈ V2 , let x1 , x2 ∈ V1 be such that T x1 = y1 and
T x2 = y2 . Then for α, β ∈ F, by linearity of T , we have
αy1 + βy2 = αT x1 + βT x2 = T (αx1 + βx2 ).
Hence,
T −1 (αy1 + βy2 ) = αx1 + βx2 = αT −1 (x1 ) + βT −1 (x2 ).
This completes the proof.
Definition 2.3 (Isomorphism of vector spaces) Vector spaces
V1 and V2 are said to be linearly isomorphic if there exists a bijective
linear transformation T : V1 → V2 , and in that case T is called a
linear isomorphism from V1 onto V2 . ♦
Theorem 2.2 Any two finite dimensional vector spaces of the same
dimension are linearly isomorphic.
Proof. Let V1 and V2 be finite dimensional vector spaces of the
same dimension, say n. Let E1 = {u1 , . . . , un } and E2 = {v1 , . . . , vn }
be bases of V1 and V2 , respectively. For x = α1 u1 + · · · + αn un ∈ V1 , define
T : V1 → V2 by
T x = α1 v1 + · · · + αn vn .
Clearly, T is a linear transformation. Note that T is bijective as well:
For x = α1 u1 + · · · + αn un ∈ V1 ,
T x = 0 =⇒ (α1 , . . . , αn ) = 0 =⇒ x = 0.
Also, for y = β1 v1 + · · · + βn vn ∈ V2 , the element x = β1 u1 + · · · + βn un ∈ V1
satisfies T x = y.

EXAMPLE 2.16 Let dim (V ) = n and E = {u1 , . . . , un } be an
ordered basis of V . Then for x = α1 u1 + · · · + αn un ∈ V , the maps
T1 : V → Fn and T2 : V → Fn×1 defined by

T1 x = (α1 , . . . , αn ), T2 x = [α1 , α2 , . . . , αn ]T

are bijective linear transformations, that is, linear isomorphisms.


The above isomorphisms are called canonical isomorphisms be-
tween the spaces involved w.r.t. the basis E. ♦

2.3 Space of Linear Transformations


Let L(V1 , V2 ) denote the set of all linear transformations from V1 to
V2 . On L(V1 , V2 ) we define addition and scalar multiplication point-
wise, i.e., for T, T1 , T2 in L(V1 , V2 ) and α ∈ F, linear transformations
T1 + T2 and αT are defined by

(T1 + T2 )(x) = T1 x + T2 x,

(αT )(x) = αT x
for all x ∈ V1 . Then it is seen that L(V1 , V2 ) is a vector space with
its zero element as the zero operator O : V1 → V2 defined by

Ox = 0 ∀ x ∈ V1

and the additive inverse −T of T ∈ L(V1 , V2 ) is −T : V1 → V2 defined


by
(−T )(x) = −T x ∀ x ∈ V1 .

Definition 2.4 The space L(V, F) of all linear functionals on V is


called the dual space of V , and this space is also denoted by V ′ . ♦

Theorem 2.3 Let V be a finite dimensional vector space, and let


E = {u1 , . . . , un } be a basis of V . If f1 , . . . , fn are the coordinate
functionals on V with respect to E, then we have the following:

(i) Every x ∈ V can be written as x = f1 (x)u1 + · · · + fn (x)un .

(ii) {f1 , . . . , fn } is a basis of V ′ .



Proof. Since E = {u1 , . . . , un } is a basis of V , for every x ∈ V ,
there exist unique scalars α1 , . . . , αn such that x = α1 u1 + · · · + αn un . Now,
using the relation fi (uj ) = δij , it follows that

fi (x) = α1 fi (u1 ) + · · · + αn fi (un ) = αi , i = 1, . . . , n.

Therefore, the result in (i) follows.


To see (ii), first we observe that if α1 f1 + · · · + αn fn = 0, then

αj = α1 f1 (uj ) + · · · + αn fn (uj ) = 0 ∀ j = 1, . . . , n.

Hence, {f1 , . . . , fn } is linearly independent in L(V, F). It remains to
show that span {f1 , . . . , fn } = L(V, F). For this, let f ∈ L(V, F)
and x ∈ V . Then using the representation of x in (i), we have

f (x) = f1 (x)f (u1 ) + · · · + fn (x)f (un ) = f (u1 )f1 (x) + · · · + f (un )fn (x)

for all x ∈ V . Thus, f = f (u1 )f1 + · · · + f (un )fn so that f ∈ span {f1 , . . . , fn }.
This completes the proof.

Definition 2.5 (Dual basis) Let V be a finite dimensional vector
space and let E = {u1 , . . . , un } be a basis of V , and f1 , . . . , fn be the
associated coordinate functionals. The basis F := {f1 , . . . , fn } of V ′
is called the dual basis of E, or dual to the basis E. ♦
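For V = Fn with a basis u1 , . . . , un written as the columns of an invertible matrix U , the coordinate functionals can be computed explicitly: fi (x) is the i-th entry of U −1 x, so the rows of U −1 represent the dual basis, and fi (uj ) = δij . A sketch, assuming Python with NumPy, with a made-up basis of R2 :

    import numpy as np

    U = np.array([[1.0, 1.0],
                  [1.0, -1.0]])            # columns u1 = (1, 1), u2 = (1, -1)
    F = np.linalg.inv(U)                   # row i of F represents the functional f_i

    print(np.allclose(F @ U, np.eye(2)))   # True: f_i(u_j) = delta_ij

    x = np.array([3.0, 1.0])
    print(F @ x)                           # [2. 1.]: x = 2 u1 + 1 u2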

2.4 Matrix Representations


Let V1 and V2 be finite dimensional vector spaces, and E1 = {u1 , . . . , un }
and E2 = {v1 , . . . , vm } be ordered bases of V1 and V2 , respectively.
In Example 2.4 we have seen that an m × n matrix of scalars induces
a linear transformation from V1 to V2 . Now, we show the reverse.
Let T : V1 → V2 be a linear transformation. Note that for
every x ∈ V1 , there exists a unique (α1 , . . . , αn ) ∈ Fn such that
x = α1 u1 + · · · + αn un . Then, by the linearity of T , we have

T (x) = α1 T (u1 ) + · · · + αn T (un ).

Since T (uj ) ∈ V2 for each j = 1, . . . , n and {v1 , . . . , vm } is a basis of


V2 , T uj can be written as

T (uj ) = a1j v1 + a2j v2 + · · · + amj vm

for some scalars a1j , a2j , . . . , amj . Thus,

T (x) = ∑_{j=1}^{n} αj T (uj ) = ∑_{j=1}^{n} αj ( ∑_{i=1}^{m} aij vi ) = ∑_{i=1}^{m} ( ∑_{j=1}^{n} aij αj ) vi . (∗)

For x := α1 u1 + · · · + αn un ∈ V1 , let ~x ∈ Fn be the column vector [α1 , . . . , αn ]T .
Then the relation (∗) connecting the linear transformation T and the
matrix A = (aij ) can be written as

T x = ∑_{i=1}^{m} (A~x)i vi .

In view of the above representation of T , we say that the m × n


matrix A := (aij ) is the matrix representation of T , with respect
to the ordered bases E1 and E2 of V1 and V2 respectively. This fact
is written as
 
a11 a12 ··· a1n
 a21 a22 ··· a2n 
[T ]E1 ,E2 =
 ···

··· ··· ··· 
am1 am2 ··· amn

or simply [T ] = (aij ) when the bases are understood.

Definition 2.6 The matrix [T ]E1 ,E2 is called the matrix representa-
tion of T , with respect to {E1 , E2 }. ♦
Clearly, the above discussion also shows that for every m × n
matrix A = (aij ), there exists a unique linear transformation T ∈
L(V1 , V2 ) such that [T ] = (aij ). Thus, there is a one-one correspon-
dence between L(V1 , V2 ) and Fm×n , namely,

T ↦ [T ].
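As a concrete instance of this correspondence, the following sketch computes the matrix of the differentiation map T (a0 + a1 t + a2 t2 + a3 t3 ) = a1 + 2a2 t + 3a3 t2 (the map of Exercise 2.8 below) with respect to the ordered bases E1 = {1, t, t2 , t3 } of P3 and E2 = {1, t, t2 } of P2 . Polynomials are represented by coefficient vectors; Python with NumPy is assumed.

    import numpy as np

    def T(p):
        """Derivative of a0 + a1 t + a2 t^2 + a3 t^3, as coefficients in P_2."""
        a0, a1, a2, a3 = p
        return np.array([a1, 2 * a2, 3 * a3])

    # Column j of [T] holds the E2-coordinates of T(u_j), where u_j runs over E1
    E1 = np.eye(4)
    print(np.column_stack([T(u) for u in E1]))
    # [[0. 1. 0. 0.]
    #  [0. 0. 2. 0.]
    #  [0. 0. 0. 3.]]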

Suppose J1 : V1 → Fn and J2 : V2 → Fm are the canonical isomor-
phisms, that is, for x = α1 u1 + · · · + αn un ∈ V1 and y = β1 v1 + · · · + βm vm ∈ V2 ,

J1 (x) := [α1 , α2 , . . . , αn ]T , J2 (y) := [β1 , β2 , . . . , βm ]T .
 ..   .. 
αn βn

Then we see that J2 T J1−1 : Fn → Fm and J2−1 [T ]J1 : V1 → V2 are


linear transformations such that

J2 T J1−1 x = [T ]x, x ∈ Fn ,

J2−1 [T ]J1 x = T x, x ∈ V1 .
Exercise 2.4 Prove the last statement. ♦
Exercise 2.5 Let V be an n-dimensional vector space and {u1 , . . . , un }
be an ordered basis of V . Let f be a linear functional on V . Prove
the following:
(i) There exists a unique (β1 , . . . , βn ) ∈ Fn such that

f (α1 u1 + . . . + αn un ) = α1 β1 + . . . + αn βn .

(ii) The matrix representation [f ]E,{1} is [β1 · · · βn ]. ♦


Exercise 2.6 Let V1 and V2 be finite dimensional vector spaces,
and E1 = {u1 , . . . , un } and E2 = {v1 , . . . , vm } be bases of V1 and V2 ,
respectively. Show the following:
(a) If {g1 , . . . , gm } is the dual of E2 , then for every T ∈ L(V1 , V2 ),
 
[T ]E1 ,E2 = ( gi (T uj ) ).

(b) If T1 , T2 ∈ L(V1 , V2 ) and α ∈ F, then

[T1 + T2 ]E1 ,E2 = [T1 ]E1 ,E2 + [T2 ]E1 ,E2 ,   [αT1 ]E1 ,E2 = α[T1 ]E1 ,E2 .

(c) Suppose {Aij : i = 1, . . . , m; j = 1, . . . , n} is a basis of Fm×n .
If Tij ∈ L(V1 , V2 ) is such that [Tij ]E1 ,E2 = Aij , then

{Tij : i = 1, . . . , m; j = 1, . . . , n}

is a basis of L(V1 , V2 ) (e.g., Aij as in Example 1.21). ♦



Exercise 2.7 Let T : R3 → R3 be defined by

T (x1 , x2 , x3 ) = (x2 + x3 , x3 + x1 , x1 + x2 ).

Find [T ]E1 ,E2 in each of the following cases.


(a) E1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, E2 = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}
(b) E1 = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}, E2 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
(c) E1 = {(1, 1, −1), (−1, 1, 1), (1, −1, 1)},
E2 = {(−1, 1, 1), (1, −1, 1), (1, 1, −1)} ♦

Exercise 2.8 Let T : P 3 → P 2 be defined by

T (a0 + a1 t + a2 t2 + a3 t3 ) = a1 + 2a2 t + 3a3 t2 .

Find [T ]E1 ,E2 in each of the following cases.


(a) E1 = {1, t, t2 , t3 }, E2 = {1 + t, 1 − t, t2 }
(b) E1 = {1, 1 + t, 1 + t + t2 , t3 }, E2 = {1, 1 + t, 1 + t + t2 }
(c) E1 = {1, 1 + t, 1 + t + t2 , 1 + t + t2 + t3 }, E2 = {t2 , t, 1}

Exercise 2.9 Let T : P 2 → P 3 be defined by


T (a0 + a1 t + a2 t2 ) = a0 t + (a1 /2) t2 + (a2 /3) t3 .

Find [T ]E1 ,E2 in each of the following cases.
(a) E1 = {1 + t, 1 − t, t2 }, E2 = {1, t, t2 , t3 },
(b) E1 = {1, 1 + t, 1 + t + t2 }, E2 = {1, 1 + t, 1 + t + t2 , t3 },
(c) E1 = {t2 , t, 1}, E2 = {1, 1 + t, 1 + t + t2 , 1 + t + t2 + t3 }, ♦
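As a quick numerical aside (not one of the cases listed above), the differentiation map of Exercise 2.8 has a particularly simple matrix with respect to the standard monomial bases {1, t, t2 , t3 } and {1, t, t2 }; the basis choice here is mine, made only for illustration.

import numpy as np

# Differentiation T : P_3 -> P_2 in the standard monomial bases.
D = np.array([[0., 1., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 3.]])

p = np.array([5., -1., 4., 2.])     # coefficients of 5 - t + 4t^2 + 2t^3
print(D @ p)                        # [-1.  8.  6.]  i.e.  -1 + 8t + 6t^2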

2.5 Rank and Nullity


Let V1 and V2 be vector spaces and T : V1 → V2 be a linear transfor-
mation. Then it is easily seen that the sets

R(T ) = {T x : x ∈ V1 }, N (T ) = {x ∈ V1 : T x = 0}

are subspaces of V2 and V1 , respectively (Verify!).


Definition 2.7 (Range and null space) The subspaces R(T ) and
N (T ) associated with a linear transformation T : V1 → V2 are called
the range of T and null space of T , respectively.

Definition 2.8 (Rank and nullity) The dimension of R(T ) is


called the rank of T , denoted by rank T , and the dimension of N (T )
is called the nullity of T , denoted by null T .
Let T : V1 → V2 be a linear transformation. We observe that
• T is onto or surjective if and only if R(T ) = V2 ,

• T is one-one or injective if and only if N (T ) = {0}.


The proof of the following theorem is easy, and hence left as an
exercise.
Theorem 2.4 Let T : V1 → V2 be a linear transformation. Then we
have the following.

(a) If u1 , . . . , uk are linearly independent in V1 and if T is one-one,


then T u1 , . . . , T uk are linearly independent in V2 .

(b) If {u1 , . . . , uk } ⊂ V1 is such that T u1 , . . . , T uk are linearly independent in V2 , then u1 , . . . , uk are linearly independent in V1 .
From the above theorem we can deduce the following theorem.
Theorem 2.5 Let V1 and V2 be finite dimensional vector spaces and
T : V1 → V2 be a linear transformation. Then T is one-one if and
only if rank T = dim V1 . In particular, if dim V1 = dim V2 , then

T is one-one if and only if T is onto.

In fact, the above result is a particular case of the following the-


orem as well.
Theorem 2.6 (Rank-nullity theorem) Let V1 and V2 be vector
spaces and T : V1 → V2 be a linear transformation. Then

rank T + null T = dim V1 .

Proof. First we observe that, if either null T = ∞ or rank T = ∞,


then dim V1 = ∞ (Why?). Therefore, assume that both

r := rank T < ∞, k := null T < ∞.

Suppose E0 = {u1 , . . . , uk } is a basis of N (T ) and E = {v1 , . . . , vr }


is a basis of R(T ). Let E1 = {w1 , . . . , wr } ⊆ V1 such that T wj = vj ,
j = 1, . . . , r. We show that E0 ∪ E1 is a basis of V1 . Note that


E0 ∩ E1 = ∅.
Let x ∈ V1 . Since E = {v1 , . . . , vr } is a basis of R(T ), there exist scalars α1 , . . . , αr such that

T x = ∑_{j=1}^r αj vj = ∑_{j=1}^r αj T wj .

Hence,

T ( x − ∑_{j=1}^r αj wj ) = 0,

so that x − ∑_{j=1}^r αj wj ∈ N (T ). Since E0 is a basis of N (T ), there exist scalars β1 , . . . , βk such that

x − ∑_{j=1}^r αj wj = ∑_{j=1}^k βj uj .

Thus, x ∈ span (E0 ∪ E1 ). It remains to show that E0 ∪ E1 is linearly independent. For this, suppose a1 , . . . , ak and b1 , . . . , br are scalars such that

∑_{j=1}^k aj uj + ∑_{j=1}^r bj wj = 0.

Applying T to the above equation, it follows that ∑_{j=1}^r bj vj = 0 so that, by the linear independence of E, bj = 0 for all j = 1, . . . , r. Therefore, we have ∑_{j=1}^k aj uj = 0. Now, by the linear independence of E0 , aj = 0 for all j = 1, . . . , k. This completes the proof.
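A numerical illustration of the rank-nullity theorem, for the map x ↦ Ax on R5 ; the rank-deficient matrix below is an arbitrary example constructed only for this check.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))   # 4x5 matrix, rank 3

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
rank = int((s > tol).sum())                  # dim R(T)

# Right singular vectors with (numerically) zero singular values span N(T).
null_basis = Vt[rank:]                       # rows form a basis of the null space
print(np.allclose(A @ null_basis.T, 0.0))    # True: these vectors are mapped to 0
print(rank + null_basis.shape[0] == A.shape[1])   # True: rank + nullity = 5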

Exercise 2.10 Prove Theorem 2.4. ♦

Recall that for a square matrix A, det(A) = 0 if and only if its


columns are linearly dependent. Hence, in view of Theorem 2.4, we
have the following:

Theorem 2.7 Let T : V1 → V2 be a linear transformation and let


A be a matrix representation of T . Then T is one-one if and only if
columns of A are linearly independent.

Definition 2.9 (finite rank transformations) A linear transfor-


mation T : V → W is said to be of finite rank if rank T < ∞.

Exercise 2.11 Let T : V1 → V2 be a linear transformation between vector spaces V1 and V2 . Show that T is of finite rank if and only if there exist n ∈ N, {v1 , . . . , vn } ⊂ V2 and {f1 , . . . , fn } ⊂ L(V1 , F) such that T x = ∑_{j=1}^n fj (x) vj for all x ∈ V1 . ♦

2.6 Composition of Linear Transformations


Let V1 , V2 , V3 be vector spaces, and let T1 ∈ L(V1 , V2 ), T2 ∈ L(V2 , V3 ).
Then the composition of T1 and T2 , namely, T2 ◦T1 : V1 → V3 defined
by
(T2 ◦ T1 )(x) = T2 (T1 x), x ∈ V1 ,

is a linear transformation and it is denoted by T2 T1 .


Note that if V1 = V2 = V3 = V , then both T1 T2 , T2 T1 are well-
defined and belong to L(V ). In particular, if T ∈ L(V ), we can define
powers of T , namely, T n for any n ∈ N inductively: T 1 := T and for
n > 1,
T n = T (T n−1 ).

Using this, we can define polynomials in T as follows:
For T ∈ L(V ) and p ∈ Pn , say p(t) = a0 + a1 t + · · · + an tn , we define p(T ) : V → V by

p(T ) = a0 I + a1 T + · · · + an T n .

We shall also use the convention: T 0 := I.


EXAMPLE 2.17 Let T1 : R3 → R2 and T2 : R2 → R2 be defined by

T1 (α1 , α2 , α3 ) = (α1 + α2 + 2α3 , 2α1 − α2 + α3 ),

T2 (β1 , β2 ) = (β1 + β2 , β1 − β2 ).

Then the product transformation T2 T1 is given by

(T2 T1 )(α1 , α2 , α3 ) = T2 (α1 + α2 + 2α3 , 2α1 − α2 + α3 )


= (β1 + β2 , β1 − β2 ),

where β1 = α1 + α2 + 2α3 , β2 = 2α1 − α2 + α3 . Thus,

(T2 T1 )(α1 , α2 , α3 ) = (3α1 + 3α3 , −α1 + 2α2 + α3 ).


Now, consider the standard bases on R3 and R2 , that is, E1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} and E2 = {(1, 0), (0, 1)}, respectively. Then we see that

[T1 ]E1 ,E2 =
  [ 1   1  2 ]
  [ 2  −1  1 ] ,

[T2 ]E2 ,E2 =
  [ 1   1 ]
  [ 1  −1 ] ,

[T2 T1 ]E1 ,E2 =
  [  3  0  3 ]
  [ −1  2  1 ] .

Note that
[T2 T1 ]E1 ,E2 = [T2 ]E2 ,E2 [T1 ]E1 ,E2 .

Exercise 2.12 Let V1 , V2 , V3 be finite dimensional vector spaces
with bases E1 , E2 , E3 , respectively. Prove that if T1 ∈ L(V1 , V2 )
and T2 ∈ L(V2 , V3 ), then [T2 T1 ]E1 ,E3 = [T2 ]E2 ,E3 [T1 ]E1 ,E2 .
Verify the above relation for the operators in Example 2.17. ♦
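The relation [T2 T1 ]E1 ,E2 = [T2 ]E2 ,E2 [T1 ]E1 ,E2 for Example 2.17 can also be checked numerically; the following lines simply reproduce the matrices displayed above.

import numpy as np

T1 = np.array([[1.,  1., 2.],
               [2., -1., 1.]])
T2 = np.array([[1.,  1.],
               [1., -1.]])

print(T2 @ T1)
# [[ 3.  0.  3.]
#  [-1.  2.  1.]]   which is [T2 T1]_{E1,E2}, as claimed.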
Recall from set theory that, if S1 and S2 are nonempty sets, then
a function f : S1 → S2 is one-one if and only if there exists a unique
g : R(f ) → S1 such that

g(f (x)) = x, f (g(y)) = y

for all x ∈ S1 and for all y ∈ R(f ).


In the case of a linear transformation we have the following.
Theorem 2.8 Let T ∈ L(V1 , V2 ). Then T is one-one if and only if
there exists a linear transformation Te : R(T ) → V1 such that

Te(T x) = x ∀ x ∈ V1 , T (Tey) = y ∀ y ∈ R(T ),

and in that case, such operator Te is unique.


Proof. The fact that T is one-one if and only if there exists a
unique function Te : R(T ) → V1 such that

Te(T x) = x ∀ x ∈ V1 , T (Tey) = y ∀ y ∈ R(T )

follows as in set theory. Thus, it is enough to prove that Te is linear.


For this, let y1 , y2 be in R(T ) and let x1 , x2 in V1 be such that
T xi = yi , i = 1, 2. Let α ∈ F. Then, by linearity of T , we have

y1 + αy2 = T Tey1 + αT Tey2 = T (Tey1 + αTey2 )



so that

Te(y1 + αy2 ) = TeT (Tey1 + αTey2 ) = Tey1 + αTey2 .

Thus, Te is linear.

Definition 2.10 If T : V1 → V2 is an injective linear operator, then


the unique linear operator Te : R(T ) → V1 defined as in Theorem 2.8
is called the inverse of T , and is denoted by T −1 : R(T ) → V1 . ♦
Clearly, if T ∈ L(V1 , V2 ) is bijective, then its inverse is defined on all of V2 . Thus, T ∈ L(V1 , V2 ) is bijective if and only if there exists a unique operator M ∈ L(V2 , V1 ) such that

T M = IV2 ,   M T = IV1 ,

and in that case (verify), M is also bijective and M = T −1 .

Definition 2.11 A linear operator T : V1 → V2 is said to be invert-


ible if it is bijective. ♦
Exercise 2.13 Prove the following.
(i) If T1 : V1 → V2 and T2 : V2 → V3 are linear transformations
such that T2 T1 is bijective, then T1 is one-one and T2 is onto.
(ii) If T ∈ L(V1 , V2 ) is invertible, then dim (V1 ) = dim (V2 ), and
the converse need not be true.
(iii) If dim (V ) < ∞ and T ∈ L(V ), then T is invertible if and
only if for every basis E of V , det[T ]E,E 6= 0. ♦

2.7 Eigenvalues and Eigenvectors


2.7.1 Definition and examples
Let T : V → V be a linear operator on a vector space V .
Definition 2.12 A scalar λ is called an eigenvalue of T if there
exists a nonzero vector x ∈ V such that

T x = λ x,

and in that case, x is called an eigenvector of T corresponding to


the eigenvalue λ. ♦
Suppose V is a finite dimensional space, say dim (V ) = n, and T ∈ L(V ). Let A be the matrix representation of T w.r.t. a basis {u1 , . . . , un } of V . Then by the discussion in Section 2.4, we have

x = ∑_{i=1}^n (~x)i ui ,   T x = ∑_{i=1}^n (A~x)i ui .

Hence, x ≠ 0 if and only if ~x ≠ 0, and for λ ∈ F,

T x = λx ⇐⇒ A~x = λ~x.

Note that λ ∈ F is an eigenvalue of T if and only if T − λI is


not one-one, and in that case, the subspace N (T − λI) consists of
all eigenvectors of T corresponding to the eigenvalue λ together with
the zero vector.
Definition 2.13 If λ is an eigenvalue of T , then the space N (T −λI)
is called the eigenspace of T corresponding to λ.
The set of all eigenvalues of T is called the eigenspectrum of
T , and will be denoted by Eig(T ). ♦
In view of Theorem 2.7, we have the following:
Theorem 2.9 Let V be finite dimensional and T ∈ L(V ). If A is a
matrix representation of T with respect to a basis of V , then λ is an
eigenvalue of T if and only if det(A − λI) = 0.

EXAMPLE 2.18 The conclusions in (i)-(vi) below can be verified


easily:
(i) Let T : R3 → R3 be defined by

T (α1 , α2 , α3 ) = (α1 , α1 + α2 , α1 + α2 + α3 ).

Then Eig(T ) = {1} and N (T − I) = span {(0, 0, 1)}.


(ii) Let T : F2 → F2 be defined by

T (α1 , α2 ) = (α1 + α2 , α2 ).

Then Eig(T ) = {1} and N (T − I) = span {(1, 0)}.


(iii) Let T : F2 → F2 be defined by

T (α1 , α2 ) = (α2 , −α1 ).


If F = R, then T has no eigenvalues, i.e., Eig(T ) = ∅.

(iv) Let T be as in (iii) above. If F = C, then Eig(T ) = {i, −i},
N (T − iI) = span {(1, i)} and N (T + iI) = span {(1, −i)}.
(v) Let T : P → P be defined by

(T x)(t) = tx(t), x ∈ P.

Then Eig(T ) = ∅.
(vi) Let V = P[a, b] and T : V → V be defined by

(T x)(t) = (d/dt) x(t),   x ∈ V.

Then Eig(T ) = {0} and N (T ) = span {x0 }, where x0 (t) = 1 for all t ∈ [a, b]. ♦
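The eigenvalues in parts (i) and (iii)-(iv) of the above example can be confirmed numerically from the corresponding matrix representations with respect to the standard bases.

import numpy as np

# Matrix of the operator in Example 2.18(i) with respect to the standard basis.
A = np.array([[1., 0., 0.],
              [1., 1., 0.],
              [1., 1., 1.]])
print(np.linalg.eigvals(A))          # [1. 1. 1.]  ->  Eig(T) = {1}

# The operator of Example 2.18(iii)-(iv): no real eigenvalues, but +i and -i over C.
B = np.array([[0., 1.],
              [-1., 0.]])
print(np.linalg.eigvals(B))          # [0.+1.j  0.-1.j]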

Theorem 2.10 Let T ∈ L(V ) and λ ∈ F. Let V0 = {0} and for


j ∈ N, let Vj := N ((T − λI)j ). Then the following hold.

(i) {0} ⊆ N (T − λI) ⊆ N ((T − λI)2 ) ⊆ N ((T − λI)3 ) ⊆ . . ..

(ii) If N ((T − λI)k ) = N ((T − λI)k+1 ) for some k ∈ N ∪ {0}, then


N ((T − λI)k ) = N ((T − λI)k+j ) for all j ∈ N.

Suppose V is finite dimensional. Then every inclusion in Theorem


2.10(i) cannot be proper. Thus, the following corollary is immediate
from Theorem 2.10.

Corollary 2.11 Let T ∈ L(V ) and λ be an eigenvalue of T . If V


is a finite dimensional vector space, then there exists ` ∈ {1, . . . , n}
such that N ((T − λI)` ) = N ((T − λI)`+j ) for all j ∈ N.

Definition 2.14 Let T ∈ L(V ) and λ be an eigenvalue of T . The


number dim N (T − λI) is called the geometric multiplicity of λ. If
there exists ` ∈ N such that N ((T − λI)` ) = N ((T − λI)`+j ) for all
j ∈ N, then
(i) ` is called the index of λ,
(ii) N ((T − λI)` ) is called the generalized eigenspace of T corre-
sponding to λ,
(iii) dim N ((T − λI)` ) is called the algebraic multiplicity of λ. ♦
If V is finite dimensional, T ∈ L(V ) and λ is an eigenvalue of T , then it is obvious that λ has (finite) index and algebraic multiplicity. If gλ , `λ and mλ are the geometric multiplicity, index and algebraic multiplicity, respectively, of λ, then from Theorem 2.10(i), we have

gλ + `λ − 1 ≤ mλ .

It is also known1 that

mλ ≤ `λ gλ .

Thus, if `λ = 1, then gλ = mλ , that is, the generalized eigenspace coincides with the eigenspace.

2.7.2 Existence of an eigenvalue


From the above examples we observe that in those cases in which
the eigenspectrum is empty, either the scalar field is R or the vector
space is infinite dimensional. The next result shows that if the space
is finite dimensional and if the scalar field is the set of all complex
numbers, then the eigenspectrum is nonempty.

Theorem 2.12 Let V be a finite dimensional vector space over C.


Then every linear operator on V has at least one eigenvalue.

Proof. Let dim (V ) = n and T : V → V be a linear opera-


tor. Let x be a nonzero element in V . Since dim (V ) = n, the
set {x, T x, T 2 x, . . . , T n x} is linearly dependent. Let a0 , a1 . . . , an be
scalars with at least one of them being nonzero such that

a0 x + a1 T x + · · · + an T n x = 0.

Let k = max {j : aj 6= 0, j = 1, . . . , n}. Then writing

p(t) = a0 + a1 t + · · · + ak tk , p(T ) = a0 I + a1 T + · · · + ak T k ,

we have
p(T )(x) = 0.
By fundamental theorem of algebra, there exist λ1 , . . . , λk in C such
that
p(t) = ak (t − λ1 )(t − λ2 ) . . . (t − λk ).
1 M.T. Nair: Multiplicities of an eigenvalue: Some observations, Resonance, Vol. 7 (2002) 31-41.

Thus, we have
(T − λ1 I)(T − λ2 I) . . . (T − λk I)(x) = p(T )(x) = 0.
Hence, at least one of T − λ1 I, . . . , T − λk I is not one-one so that
at least one of λ1 , . . . , λk is an eigenvalue of T .
Theorem 2.13 Let λ1 , . . . , λk be distinct eigenvalues of a linear op-
erator T : V → V with corresponding eigenvectors u1 , . . . , uk , respec-
tively. Then {u1 , . . . , uk } is a linearly independent set.
Proof. We prove this result by induction. Clearly {u1 } is linearly
independent. Now, assume that {u1 , . . . , um } is linearly independent,
where m < k. We show that {u1 , . . . , um+1 } is linearly independent.
So, let α1 , . . . , αm+1 be scalars such that
α1 u1 + · · · + αm um + αm+1 um+1 = 0. (∗)
Applying T and using the fact that T uj = λj uj , we have
α1 λ1 u1 + · · · + αm λm um + αm+1 λm+1 um+1 = 0.
From (∗), multiplying by λm+1 , we have
α1 λm+1 u1 + · · · + αm λm+1 um + αm+1 λm+1 um+1 = 0.
Thus,
α1 (λ1 − λm+1 )u1 + · · · + αm (λm − λm+1 )um = 0.
Now, using the fact that {u1 , . . . , um } is linearly independent in V ,
and λ1 , . . . , λm , λm+1 are distinct, we obtain αj = 0 for j = 1, . . . , m.
Therefore, from (∗), αm+1 = 0. This completes the proof.
By the above theorem we can immediately infer that if V is finite
dimensional, then the eigenspectrum of every linear operator on V
is a finite set.
Definition 2.15 Let T ∈ L(V ) and let V0 be a subspace of V . Then V0
is said to be invariant under T if T (V0 ) ⊆ V0 , that is,
x ∈ V0 =⇒ T x ∈ V0 .
If V0 is invariant under T , then the restriction of T to the space V0
is the operator T0 ∈ L(V0 ) defined by
T0 x = T x ∀ x ∈ V0 .


Exercise 2.14 Prove that if T ∈ L(V ), λ ∈ Eig(T ) and, for k ∈ N,
Vk = N ((T − λI)k ) and Wk = R((T − λI)k ), then Vk and
Wk are invariant under T . ♦

Exercise 2.15 Prove that if T ∈ L(V ), V0 is invariant under T , and


T0 ∈ L(V0 ) is the restriction of T to V0 , then Eig(T0 ) ⊆ Eig(T ). ♦

2.7.3 Diagonalizability
Definition 2.16 Suppose V is a finite dimensional vector space and
T : V → V is a linear operator. Then T is said to be diagonalizable
if V has a basis E such that [T ]E,E is a diagonal matrix. ♦
The proof of the following theorem is immediate (Write details!).

Theorem 2.14 Suppose V is a finite dimensional vector space and


T : V → V is a linear operator. Then T is diagonalizable if and only
if V has a basis E consisting of eigenvectors of T .

Hence, in view of Theorem 2.13, we have the following.

Theorem 2.15 Suppose dim (V ) = n and T : V → V is a linear


operator having n distinct eigenvalues. Then T is diagonalizable.

It is to be observed that, in general, a linear operator T : V → V


need not be diagonalizable (See Example 2.18(ii)). However, we
shall see in the next chapter that if V has some additional struc-
ture, namely that V is an inner product space, and if T satisfies
an additional condition with respect to this new structure, namely,
self-adjointness, then T is diagonalizable.

Exercise 2.16 Which of the following linear transformations T are
diagonalizable? For each diagonalizable T , find a basis E and [T ]E,E .
(i) T : R3 → R3 such that

T (x1 , x2 , x3 ) = (x1 + x2 + x3 , x1 + x2 − x3 , x1 − x2 + x3 ).

(ii) T : P3 → P3 such that

T (a0 + a1 t + a2 t2 + a3 t3 ) = a1 + 2a2 t + 3a3 t2 .

(iii) T : R3 → R3 such that

T e1 = 0, T e2 = e1 , T e3 = e2 .

(iv) T : R3 → R3 such that

T e1 = e2 , T e2 = e3 , T e3 = 0.

(v) T : R3 → R3 such that

T e1 = e3 , T e2 = e2 , T e3 = e1 .
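As a computational aside, diagonalizability can be explored numerically. The sketch below uses the operator of Example 2.18(ii) (not one of the cases above) and a second, arbitrary matrix for contrast.

import numpy as np

# For T(a1, a2) = (a1 + a2, a2) the only eigenvalue is 1 and dim N(T - I) = 1 < 2,
# so there is no basis of eigenvectors: T is not diagonalizable.
A = np.array([[1., 1.],
              [0., 1.]])
geom_mult = A.shape[0] - np.linalg.matrix_rank(A - np.eye(2))
print(np.linalg.eigvals(A), geom_mult)        # [1. 1.]  1

# An operator with two distinct eigenvalues (cf. Theorem 2.15) is diagonalizable:
B = np.array([[2., 1.],
              [0., 3.]])
w, V = np.linalg.eig(B)
print(np.allclose(np.linalg.inv(V) @ B @ V, np.diag(w)))   # True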


3
Inner Product Spaces

3.1 Motivation
In Chapter 1 we defined a vector space as an abstraction of the
familiar Euclidean space. In doing so, we took into account only two
aspects of the set of vectors in a plane, namely, the vector addition
and scalar multiplication. Now, we consider the third aspect, namely
the angle between vectors.
Recall from plane geometry that if ~x = (x1 , x2 ) and ~y = (y1 , y2 )
are two non-zero vectors in the plane R2 , then the angle θx,y between
~x and ~y is given by

cos θx,y := (x1 y1 + x2 y2 ) / ( |~x| |~y | ),

where for a vector ~u = (u1 , u2 ) ∈ R2 , |~u| denotes the absolute value


of the vector ~u, i.e.,
|~u| := √(u1² + u2²),

which is the distance of the point (u1 , u2 ) ∈ R2 from the coordinate


origin.
We may observe that the angle θx,y between the vectors ~x and ~y
is completely determined by the quantity x1 y1 + x2 y2 , which is the
dot product of ~x and ~y . Breaking the convention, let us denote this
quantity, i.e., the dot product of ~x and ~y , by h~x, ~y i, i.e.,

h~x, ~y i = x1 y1 + x2 y2 .

A property of the function (~x, ~y ) 7→ h~x, ~y i that one notices immedi-


ately is that, for every fixed ~y ∈ R2 , the function

x 7→ h~x, ~y i, ~x ∈ R2 ,


is a linear transformation from R2 into R, i.e.,

h~x + ~u, ~y i = h~x, ~y i + h~u, ~y i, hα~x, ~y i = αh~x, ~y i (3.1)

for all ~x, ~u in R2 . Also, we see that for all ~x, ~y in R2 ,

h~x, ~xi ≥ 0, (3.2)

h~x, ~xi = 0 ⇐⇒ ~x = ~0, (3.3)


h~x, ~y i = h~y , ~xi. (3.4)
If we take C2 instead of R2 , and if we define h~x, ~y i = x1 y1 + x2 y2 , for
~x, ~y in C2 , then the above properties are not satisfied by all vectors
in C2 . In order to accommodate the complex situation, we define a
generalized dot product, as follows: For ~x, ~y in F2 , let

h~x, ~y i∗ = x1 ȳ1 + x2 ȳ2 ,

where for a complex number z, z̄ denotes its complex conjugation.


It is easily seen that h·, ·i∗ satisfies properties (3.1) – (3.4).
Now, we shall consider the abstraction of the above modified dot
product.

3.2 Definition and Some Basic Properties


Definition 3.1 (Inner Product) An inner product on a vector
space V is a map (x, y) 7→ hx, yi which associates each pair (x, y)
of vectors in V , a unique scalar hx, yi which satisfies the following
axioms:
(a) hx, xi ≥ 0 ∀x ∈ V ,
(b) hx, xi = 0 ⇐⇒ x = 0,
(c) hx + y, zi = hx, zi + hy, zi ∀ x, y, z ∈ V ,
(d) hαx, yi = αhx, yi ∀ α ∈ F and ∀ x, y ∈ V , and
(e) hx, yi equals the complex conjugate of hy, xi, ∀ x, y ∈ V . ♦
Definition 3.2 (Inner Product Space) A vector space together
with an inner product is called an inner product space. ♦
If an inner product h·, ·i is defined on a vector space V , and if V0
is a subspace of V , then the restriction of h·, ·i to V0 × V0 , i.e., the
map (x, y) 7→ hx, yi for (x, y) ∈ V0 × V0 is an inner product on V0 .

Before giving examples of inner product spaces, let us observe


some properties of an inner product.
Proposition 3.1 Let V be an inner product space. For a given
y ∈ V , let f : V → F be defined by
f (x) = hx, yi, x ∈ V.
Then f is a linear functional on V .
Proof. The result follows from axioms (c) and (d) in the definition
of an inner product: Let x, x0 ∈ V and α ∈ F. Then, by axioms (c)
and (d),
f (x + x0 ) = hx + x0 , yi = hx, yi + hx0 , yi = f (x) + f x0 ),
f (αx) = hαx, yi = αhx, yi = αf (x).
Hence, f is a linear transformation.

Proposition 3.2 Let V be an inner product space. Then for every


x, y, u, v in V , and for every α ∈ F,
hx, u + vi = hx, ui + hx, vi, hx, αyi = ᾱhx, yi.
Proof. The result follows from axioms (c), (d) and (e) in the definition of an inner product: Let x, y, u, v be in V and α ∈ F. By (e), hx, u + vi is the complex conjugate of hu + v, xi = hu, xi + hv, xi, and taking conjugates term by term gives hx, u + vi = hx, ui + hx, vi. Similarly, hx, αyi is the complex conjugate of hαy, xi = αhy, xi, which equals ᾱhx, yi.
This completes the proof.

Exercise 3.1 Suppose V is an inner product space over C. Prove


that Rehix, yi = −Imhx, yi for all x, y ∈ V . ♦

3.3 Examples of Inner Product Spaces


EXAMPLE 3.1 For x = (α1 , . . . , αn ) and y = (β1 , . . . , βn ) in Fn ,
define

hx, yi = ∑_{j=1}^n αj β̄j .

It is seen that h·, ·i is an inner product on Fn .


The above inner product is called the standard inner product
on Fn . ♦
EXAMPLE 3.2 Suppose V is a finite dimensional vector space, say of dimension n, and E := {u1 , . . . , un } is an ordered basis of V . For x = ∑_{i=1}^n αi ui , y = ∑_{i=1}^n βi ui in V , let

hx, yiE := ∑_{i=1}^n αi β̄i .

Then it is easily seen that h·, ·iE is an inner product on V . ♦


EXAMPLE 3.3 For f, g ∈ C[a, b], let

hf, gi := ∫_a^b f (t)g(t) dt.

This defines an inner product on C[a, b]: Clearly,

hf, f i = ∫_a^b |f (t)|² dt ≥ 0   ∀ f ∈ C[a, b],

and by continuity of the function f ,

hf, f i := ∫_a^b |f (t)|² dt = 0 ⇐⇒ f (t) = 0 ∀ t ∈ [a, b].

The other axioms can be verified easily. ♦

Exercise 3.2 Let T : V → Fn be a linear isomorphism. Show that

hx, yiT := hT x, T yiFn , x, y ∈ V,

defines an inner product on V . Here, h·, ·iFn is the standard inner


product on Fn . ♦

Exercise 3.3 Let τ1 , . . . , τn+1 be distinct real numbers. Show that

hp, qi := ∑_{i=1}^{n+1} p(τi )q(τi ),   p, q ∈ Pn ,

defines an inner product on Pn . ♦



3.4 Norm of a Vector


Recall that the absolute value of a vector ~x = (x1 , x2 ) ∈ R2 is given by

|~x| = √(x1² + x2²).

Denoting the standard inner product on R2 by h·, ·i2 , it follows that

|~x| = √(h~x, ~xi2).
As an abstraction of the above notion, we define the norm of a
vector.
Definition 3.3 (Norm of a Vector) Let V be an inner product
space. Then for x ∈ V , the norm of x is defined as the non-negative square root of hx, xi, and it is denoted by kxk, i.e.,

kxk := √(hx, xi),   x ∈ V.

The map x 7→ kxk is also called a norm on V . ♦
Definition 3.4 A vector in an inner product space is said to be a
unit vector if it is of norm 1. ♦
Exercise 3.4 If x is a non-zero vector, then show that u := x/kxk
is a vector of norm 1. ♦

Recall from elementary geometry that if a, b are the lengths of


the adjacent sides of a parallelogram, and if c, d are the lengths of
its diagonals, then 2(a2 + b2 ) = c2 + d2 . This is the well-known
parallelogram law. This has a generalized version in the setting of
inner product spaces.
Theorem 3.3 (Parallelogram law) For vectors x, y in an inner
product space V ,
kx + yk2 + kx − yk2 = 2 ( kxk2 + kyk2 ).


Exercise 3.5 Verify the parallelogram law (Theorem 3.3). ♦


Exercise 3.6 Let V be an inner product space, and let x, y ∈ V .
Then, show the following:
(a) kxk ≥ 0.
(b) kxk = 0 iff x = 0.
(c) kαxk = |α| kxk for all α ∈ F. ♦


Exercise 3.7 Let V1 and V2 be an inner product spaces, using the


same notations for inner products (respectively, norms) on both the
spaces. Let T : V1 → V2 be a linear transformation. Prove that, for
all (x, y) ∈ V1 × V2 ,

hT x, T yi = hx, yi ⇐⇒ kT xk = kxk.

[Hint: For the only if part, use hT (x + y), T (x + y)i = hx + y, x + yi


and use Exercise 3.1.] ♦

3.5 Orthogonality
Recall that the angle θx,y between vectors ~x and ~y in R2 is given by

cos θx,y := h~x, ~y i2 / ( |~x| |~y | ).

Hence, we can conclude that the vectors ~x and ~y are orthogonal if


and only if h~x, ~y i2 = 0. This observation motivates us to have the
following definition.
Definition 3.5 (Orthogonal vectors) Vectors x and y in an inner
product space V are said to be orthogonal to each other or x is
orthogonal to y if hx, yi = 0. In this case we write x ⊥ y, and read x
perpendicular to y, or x perp y. ♦
Note that

• for x, y in V , x ⊥ y ⇐⇒ y ⊥ x, and

• 0 ⊥ x for all x ∈ V .

3.5.1 Cauchy-Schwarz inequality


Recall from the geometry of R2 that if ~x and ~y are nonzero vectors
in R2 , then the projection vector ~px,y of ~x along ~y is given by

~px,y := ( h~x, ~y i2 / k~y k2 ) ~y

and its length is at most k~xk, that is, k~px,y k ≤ k~xk. Thus, we have

|h~x, ~y i2 | ≤ k~xk k~y k.

Further, the vectors p~x,y and ~qx,y := ~x − p~x,y are orthogonal, so that
by Pythagoras theorem,

k~xk2 = k~
px,y k2 + k~qx,y k2 .

Now, let us prove these concepts in the context of a general inner


product space.
First recall Pythagoras theorem from elementary geometry that if
a, b, c are lengths of sides of a right angled triangle with c being the
hypotenuse, then a2 + b2 = c2 . Here is the generalized form of it in
the setting of an inner product space.
Theorem 3.4 (Pythagoras theorem) Suppose x and y are vectors
in an inner product space which are orthogonal to each other. Then

kx + yk2 = kxk2 + kyk2 .

Proof. Left as an exercise. (Follows by writing the norms in terms


of inner products and simplifying expressions.)

Exercise 3.8 (i) If the scalar field is R, then show that the converse
of the Pythagoras theorem holds, that is, if kx + yk2 = kxk2 + kyk2 ,
then x ⊥ y.
(ii) If the scalar field is C, then show that the converse of Pythago-
ras theorem need not be true .
[Hint: Take V = C with standard inner product, and for nonzero
real numbers α, β ∈ R, take x = α, y = i β.] ♦
Theorem 3.5 (Cauchy-Schwarz inequality) Let V be an inner
product space, and x, y ∈ V . Then

|hx, yi| ≤ kxk kyk.

Equality holds in the above inequality if and only if x and y are


linearly dependent.
Proof. Clearly, the result holds if y = 0. So, assume that y 6= 0,
and let u0 = y/kyk. Let us write x = u + v, where

u = hx, u0 iu0 , v = x − hx, u0 iu0 .

Note that hu, vi = 0 so that by Pythagoras theorem,

kxk2 = kuk2 + kvk2 = |hx, u0 i|2 + kvk2 .



Thus, |hx, u0 i| ≤ kxk, that is, |hx, yi| ≤ kxk kyk. Equality holds in this inequality if and only if v := x − hx, u0 iu0 = 0, i.e., if and only if x is a scalar multiple of y, i.e., if and only if x and y are linearly dependent.

As a corollary of the above theorem we have the following.

Corollary 3.6 (Triangle inequality) Suppose V is an inner prod-


uct space. Then for every x, y in V ,

kx + yk ≤ kxk + kyk.

Proof. Let x, y ∈ V . Then, using the Cauchy-Schwarz inequality,


we obtain

kx + yk2 = hx + y, x + yi
= hx, xi + hx, yi + hy, xi + hy, yi
= kxk2 + kyk2 + 2 Re hx, yi
≤ kxk2 + kyk2 + 2 |hx, yi|
≤ kxk2 + kyk2 + 2 kxk kyk
= (kxk + kyk)2 .

Thus, kx + yk ≤ kxk + kyk for every x, y ∈ V .

Remark 3.1 For nonzero vectors x and y in an inner product space


V , by Schwarz inequality, we have

|hx, yi|
≤ 1.
kxkkyk

This relation motivates us to define the angle between any two nonzero
vectors x and y in V as
 
−1 |hx, yi|
θx,y := cos .
kxk kyk

Note that if x = cy for some nonzero scalar c, then θx,y = 0, and if


hx, yi = 0, then θx,y = π/2.

3.5.2 Orthogonal and orthonormal sets


Theorem 3.7 Let V be an inner product space, and x ∈ V . If
hx, yi = 0 for all y ∈ V , then x = 0.

Proof. Clearly, if hx, yi = 0 for all y ∈ V , then hx, xi = 0 as well.


Hence x = 0.

As an immediate consequence of the above theorem, we have the


following.

Corollary 3.8 Let V be an inner product space, and u1 , u2 , . . . , un


be linearly independent vectors in V . Let x ∈ V . Then

hx, ui i = 0 ∀ i ∈ {1, . . . , n} ⇐⇒ hx, yi = 0 ∀ y ∈ span {u1 , . . . , un }.

In particular, if {u1 , u2 , . . . , un } is a basis of V , and if hx, ui i = 0


for all i ∈ {1, . . . , n}, then x = 0.

Exercise 3.9 If dim V ≥ 2, and if 0 6= x ∈ V , then find a non-zero


vector which is orthogonal to x. ♦

Definition 3.6 (Orthogonal to a set) Let S be a subset of an


inner product space V , and x ∈ S. Then x is said to be orthogonal
to S if hx, yi = 0 for all y ∈ S. In this case, we write x ⊥ S. The set
of vectors orthogonal to S is denoted by S ⊥ , i.e.,

S ⊥ := {x ∈ V : x ⊥ S}.

Exercise 3.10 Let V be an inner product space.


(a) Show that V ⊥ = {0}.
(b) If S is a basis of V , then show that S ⊥ = {0}. ♦

Definition 3.7 (Orthogonal and orthonormal sets) Let S be a


subset of an inner product space V . Then

(a) S is said to be an orthogonal set if hx, yi = 0 for all distinct x,


y ∈ S, i.e., for every x, y ∈ S, x 6= y implies x ⊥ y.

(b) S is said to be an orthonormal set if it is an orthogonal set and


kxk = 1 for all x ∈ S.

Theorem 3.9 Let S be an orthogonal set in an inner product space


V . If 0 6∈ S, then S is linearly independent.

Proof. Suppose 0 6∈ S and {u1 , . . . , un } ⊆ S. If α1 , . . . , αn are


scalars such that α1 u1 + α2 u2 + . . . + αn un = 0, then for every
j ∈ {1, . . . , n}, we have
0 = h ∑_{i=1}^n αi ui , uj i = ∑_{i=1}^n hαi ui , uj i = ∑_{i=1}^n αi hui , uj i = αj huj , uj i.

Hence, αj = 0 for all j ∈ {1, . . . , n}.

By Theorem 3.9, it follows that every orthonormal set is linearly


independent. In particular, if V is an n-dimensional inner product
space and E is an orthonormal set consisting of n vectors, then E is
a basis of V .
Definition 3.8 (Orthonormal basis) Suppose V is a finite di-
mensional inner product space. An orthonormal set in V which is
also a basis of V is called an orthonormal basis of V . ♦
EXAMPLE 3.4 The standard basis {e1 , . . . , en } is an orthonormal
basis of Fn w.r.t. the standard inner product on Fn . ♦
EXAMPLE 3.5 Consider the vector space C[0, 2π] with inner product defined by

hf, gi := ∫_0^{2π} f (t)g(t) dt

for f, g ∈ C[0, 2π]. For n ∈ N, let

un (t) := sin(nt),   vn (t) := cos(nt),   0 ≤ t ≤ 2π.

Since

∫_0^{2π} cos(kt) dt = 0 = ∫_0^{2π} sin(kt) dt   for every nonzero k ∈ Z,

we have, for n ≠ m,

hun , um i = hvn , vm i = hun , vn i = hun , vm i = 0.

Thus, {un : n ∈ N} ∪ {vn : n ∈ N} is an orthogonal set in C[0, 2π]. ♦

Exercise 3.11 If {e1 , . . . , en } is the standard basis of Fn , then for


every i 6= j, ei + ej ⊥ ei − ej w.r.t. the standard inner product. ♦

Exercise 3.12 Let V1 and V2 be inner product spaces and E1 =


{u1 , . . . , un } and E2 = {v1 , . . . , vm } be orthonormal bases of V1 and
V2 , respectively. Let T ∈ L(V1 , V2 ). Prove that
(a) [T ]E1 ,E2 = ( hT uj , vi i ).
(b) hT x, yi ∀ (x, y) ∈ V1 × V2 ⇐⇒ hT uj , ui i = hui , T uj i
∀ i = 1, . . . , m; j = 1, . . . , n. ♦

3.5.3 Fourier expansion and Bessel’s inequality


Theorem 3.10 Suppose V is an inner product space, and {u1 , . . . , un }
is an orthonormal subset of V . Then, for every x ∈ span {u1 , . . . , un },
x = ∑_{j=1}^n hx, uj i uj ,   kxk2 = ∑_{j=1}^n |hx, uj i|2 .

Proof. Let x ∈ span {u1 , . . . , un }. Then there exist scalars α1 , α2 , . . . , αn such that

x = α1 u1 + · · · + αn un .

Hence, for every i ∈ {1, . . . , n},

hx, ui i = α1 hu1 , ui i + · · · + αn hun , ui i = αi ,

and

kxk2 = hx, xi = h ∑_{i=1}^n αi ui , ∑_{j=1}^n αj uj i = ∑_{i=1}^n ∑_{j=1}^n αi ᾱj hui , uj i = ∑_{i=1}^n |αi |2 = ∑_{i=1}^n |hx, ui i|2 .

This completes the proof.

The proof of the following corollary is immediate from the above


theorem.

Corollary 3.11 (Fourier expansion and Parseval’s identity)


If {u1 , . . . , un } is an orthonormal basis of an inner product space V ,
then for every x ∈ V ,
x = ∑_{j=1}^n hx, uj i uj ,   kxk2 = ∑_{j=1}^n |hx, uj i|2 .
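A short numerical check of the Fourier expansion and Parseval's identity, for an orthonormal basis of R3 chosen here only as an illustration.

import numpy as np

u1 = np.array([1., 1., 0.]) / np.sqrt(2)
u2 = np.array([1., -1., 0.]) / np.sqrt(2)
u3 = np.array([0., 0., 1.])
U = [u1, u2, u3]

x = np.array([2., -3., 5.])
coeffs = [np.dot(x, u) for u in U]              # Fourier coefficients <x, u_j>

expansion = sum(c * u for c, u in zip(coeffs, U))
print(np.allclose(expansion, x))                             # True: x = sum <x,u_j> u_j
print(np.isclose(sum(c**2 for c in coeffs), np.dot(x, x)))   # True: Parseval's identity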

Another consequence of Theorem 3.10 is the following.

Corollary 3.12 (Bessel’s inequality) Suppose V is an inner prod-


uct space, and {u1 , . . . , un } is an orthonormal subset of V . Then, for
every x ∈ V ,
∑_{j=1}^n |hx, uj i|2 ≤ kxk2 .

Proof. Let x ∈ V , and let y = ∑_{i=1}^n hx, ui i ui . By Theorem 3.10,

kyk2 = ∑_{i=1}^n |hy, ui i|2 .

Note that hy, ui i = hx, ui i for all i ∈ {1, . . . , n}, i.e., hx − y, ui i = 0 for all i ∈ {1, . . . , n}. Hence, hx − y, yi = 0. Therefore, by Pythagoras theorem,

kxk2 = kyk2 + kx − yk2 ≥ kyk2 = ∑_{i=1}^n |hx, ui i|2 .

This completes the proof.

EXAMPLE 3.6 Let V = C[0, 2π] with inner product hx, yi := ∫_0^{2π} x(t)y(t) dt for x, y in C[0, 2π]. For n ∈ Z, let un be defined by

un (t) = e^{i nt} / √(2π) ,   t ∈ [0, 2π].

Then it is seen that

hun , um i = (1/2π) ∫_0^{2π} e^{i (n−m)t} dt = { 1 if n = m,  0 if n ≠ m }.

Hence, {un : n ∈ Z} is an orthonormal set in C[0, 2π]. By Theorem 3.10, if x ∈ span {uj : j = −N, −N + 1, . . . , 0, 1, . . . , N },

x(t) = ∑_{n=−N}^{N} an e^{i nt}   with   an = (1/2π) ∫_0^{2π} x(t) e^{−i nt} dt.
j=−N



Theorem 3.13 (Riesz representation theorem) Let V be a finite


dimensional inner product space. Then for every linear functional
f : V → F, there exists a unique y ∈ V such that
f (x) = hx, yi ∀ x ∈ V.
Proof. Let {u1 , . . . , un } be an orthonormal basis of V . Then, by Corollary 3.11, every x ∈ V can be written as x = ∑_{j=1}^n hx, uj i uj . Thus, if f : V → F is a linear functional, then

f (x) = ∑_{j=1}^n hx, uj i f (uj ) = ∑_{j=1}^n hx, \overline{f (uj )} uj i = h x, ∑_{j=1}^n \overline{f (uj )} uj i .

Thus, y := ∑_{j=1}^n \overline{f (uj )} uj satisfies the requirements. To see the uniqueness, let y1 and y2 be in V such that

f (x) = hx, y1 i,   f (x) = hx, y2 i   ∀ x ∈ V.

Then
hx, y1 − y2 i = 0 ∀ x ∈ V,

so that by Theorem 3.7, y1 − y2 = 0, i.e., y1 = y2 .

3.6 Gram-Schmidt Orthogonalization


A question that naturally arises is: Does every finite dimensional
inner product space has an orthonormal basis? We shall answer this
question affirmatively.
Theorem 3.14 (Gram-Schmidt orthogonalization) Let V be
an inner product space and u1 , u2 , . . . , un are linearly independent
vectors in V . Then there exist orthogonal vectors v1 , v2 , . . . , vn in V
such that

span {u1 , . . . , uk } = span {v1 , . . . , vk } ∀ k ∈ {1, . . . , n}.

In fact, the vectors v1 , v2 , . . . , vn defined by

v1 := u1 ,
vk+1 := uk+1 − ∑_{j=1}^k ( huk+1 , vj i / hvj , vj i ) vj ,   k = 1, 2, . . . , n − 1,

satisfy the requirements.



Proof. We construct orthogonal vectors v1 , v2 , . . . , vn in V such that span {u1 , . . . , uk } = span {v1 , . . . , vk } for all k ∈ {1, . . . , n}.
Let v1 = u1 . Let us write u2 as

u2 = αu1 + v2 ,

where α is chosen in such a way that v2 := u2 − αu1 is orthogonal to v1 , i.e., hu2 − αu1 , v1 i = 0, i.e.,

α = hu2 , v1 i / hv1 , v1 i .

Thus, the vector

v2 := u2 − ( hu2 , v1 i / hv1 , v1 i ) v1

is orthogonal to v1 . Moreover, using the linear independence of u1 , u2 , it follows that v2 ≠ 0, and span {u1 , u2 } = span {v1 , v2 }.
Next, we write
u3 = (α1 v1 + α2 v2 ) + v3 ,
where α1 , α2 are chosen in such a way that v3 := u3 − (α1 v1 + α2 v2 )
is orthogonal to v1 and v2 , i.e.,

hu3 − (α1 v1 + α2 v2 ), v1 i = 0, hu3 − (α1 v1 + α2 v2 ), v2 i = 0.

That is, we take

α1 = hu3 , v1 i / hv1 , v1 i ,   α2 = hu3 , v2 i / hv2 , v2 i .

Thus, the vector

v3 := u3 − ( hu3 , v1 i / hv1 , v1 i ) v1 − ( hu3 , v2 i / hv2 , v2 i ) v2

is orthogonal to v1 and v2 . Moreover, using the linear independence of u1 , u2 , u3 , it follows that v3 ≠ 0, and

span {u1 , u2 , u3 } = span {v1 , v2 , v3 }.

Continuing this procedure, we obtain orthogonal vectors v1 , v2 , . . . , vn defined by

vk+1 := uk+1 − ( huk+1 , v1 i / hv1 , v1 i ) v1 − ( huk+1 , v2 i / hv2 , v2 i ) v2 − · · · − ( huk+1 , vk i / hvk , vk i ) vk ,

which satisfy

span {u1 , . . . , uk } = span {v1 , . . . , vk }

for each k ∈ {1, 2, . . . , n}.

Exercise 3.13 Let V be an inner product space, and let u1 , u2 , . . . , un be linearly independent vectors. Define w1 , w2 , . . . , wn iteratively as follows:

v1 := u1   and   w1 = v1 / kv1 k ,

and for each k ∈ {1, 2, . . . , n − 1}, let

vk+1 := uk+1 − ∑_{i=1}^k huk+1 , wi i wi   and   wk+1 = vk+1 / kvk+1 k .

Show that {w1 , w2 , . . . , wn } is an orthonormal set, and

span {w1 , . . . , wk } = span {u1 , . . . , uk }, k = 1, 2, . . . , n.

From Theorem 3.14, we obtain the following two theorems.

Theorem 3.15 Every finite dimensional inner product space has an


orthonormal basis.

From Theorem 3.15 we deduce the following.

Theorem 3.16 (Projection theorem) Suppose V is a finite di-


mensional inner product space and V0 is a subspace of V . Then there
exists a subspace W of V such that

V = V0 + W and V0 ⊥ W.

Proof. If V0 = V , then W = {0}. Now, assume that dim (V ) = n


and dim (V0 ) = k < n. Let E0 = {u1 , . . . , uk } be an orthonormal
basis of V0 . Now, extend E0 to a basis E of V . Now, orthonormal-
ization of E by Gram-Schmidt orthogonalization process will give
an orthonormal basis E e = {u1 , . . . , uk , uk+1 , . . . , un } of V . Then
W = span {uk+1 , . . . , un } will satisfy the requirements.

3.6.1 Examples
EXAMPLE 3.7 Let V = F3 with standard inner product. Consider
the vectors u1 = (1, 0, 0), u2 = (1, 1, 0), u3 = (1, 1, 1). Clearly,
u1 , u2 , u3 are linearly independent in F3 . Let us orthogonalize these
vectors according to the Gram-Schmidt orthogonalization procedure:
Take v1 = u1 , and

v2 = u2 − ( hu2 , v1 i / hv1 , v1 i ) v1 .

Note that hv1 , v1 i = 1 and hu2 , v1 i = 1. Hence, v2 = u2 − v1 = (0, 1, 0). Next, let

v3 = u3 − ( hu3 , v1 i / hv1 , v1 i ) v1 − ( hu3 , v2 i / hv2 , v2 i ) v2 .

Note that hv2 , v2 i = 1, hu3 , v1 i = 1 and hu3 , v2 i = 1. Hence, v3 = u3 − v1 − v2 = (0, 0, 1). Thus,

{(1, 0, 0), (0, 1, 0), (0, 0, 1)}

is the Gram-Schmidt orthogonalization of {u1 , u2 , u3 }. ♦


EXAMPLE 3.8 Again let V = F3 with standard inner product.
Consider the vectors u1 = (1, 1, 0), u2 = (0, 1, 1), u3 = (1, 0, 1).
Clearly, u1 , u2 , u3 are linearly independent in F3 . Let us orthogonalize these vectors according to the Gram-Schmidt orthogonalization procedure:
Take v1 = u1 , and

v2 = u2 − ( hu2 , v1 i / hv1 , v1 i ) v1 .

Note that hv1 , v1 i = 2 and hu2 , v1 i = 1. Hence,

v2 = (0, 1, 1) − (1/2)(1, 1, 0) = (−1/2, 1/2, 1).

Next, let

v3 = u3 − ( hu3 , v1 i / hv1 , v1 i ) v1 − ( hu3 , v2 i / hv2 , v2 i ) v2 .
Note that hv2 , v2 i = 3/2, hu3 , v1 i = 1 and hu3 , v2 i = 1/2. Hence,

v3 = (1, 0, 1) − (1/2)(1, 1, 0) − (1/3)(−1/2, 1/2, 1) = (2/3, −2/3, 2/3).

Thus,

{(1, 1, 0), (−1/2, 1/2, 1), (2/3, −2/3, 2/3)}

is the Gram-Schmidt orthogonalization of {u1 , u2 , u3 }. ♦
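The computations in the last two examples can be reproduced with a few lines of Python; the function below is a direct transcription of the formulas in Theorem 3.14 (classical Gram-Schmidt), checked here against Example 3.8.

import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors (Theorem 3.14)."""
    ortho = []
    for u in vectors:
        v = u.astype(float)
        for w in ortho:
            v = v - (np.dot(u, w) / np.dot(w, w)) * w   # subtract the component along w
        ortho.append(v)
    return ortho

# The vectors of Example 3.8:
u1, u2, u3 = np.array([1, 1, 0]), np.array([0, 1, 1]), np.array([1, 0, 1])
v1, v2, v3 = gram_schmidt([u1, u2, u3])
print(v1, v2, v3)
# [1. 1. 0.]  [-0.5  0.5  1. ]  [ 0.666... -0.666...  0.666...]   i.e. (2/3, -2/3, 2/3)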


EXAMPLE 3.9 Let V = P with the inner product

hp, qi = ∫_{−1}^{1} p(t)q(t) dt,   p, q ∈ V.

Let uj (t) = tj−1 for j = 1, 2, 3 and consider the linearly independent


set {u1 , u2 , u3 } in V . Now let v1 (t) = u1 (t) = 1 for all t ∈ [−1, 1],
and let

v2 = u2 − ( hu2 , v1 i / hv1 , v1 i ) v1 .

Note that

hv1 , v1 i = ∫_{−1}^{1} v1 (t)v1 (t) dt = ∫_{−1}^{1} dt = 2,

hu2 , v1 i = ∫_{−1}^{1} u2 (t)v1 (t) dt = ∫_{−1}^{1} t dt = 0.

Hence, we have v2 (t) = u2 (t) = t for all t ∈ [−1, 1]. Next, let

v3 = u3 − ( hu3 , v1 i / hv1 , v1 i ) v1 − ( hu3 , v2 i / hv2 , v2 i ) v2 .

Here,

hu3 , v1 i = ∫_{−1}^{1} u3 (t)v1 (t) dt = ∫_{−1}^{1} t² dt = 2/3,

hu3 , v2 i = ∫_{−1}^{1} u3 (t)v2 (t) dt = ∫_{−1}^{1} t³ dt = 0.

Hence, we have v3 (t) = t² − 1/3 for all t ∈ [−1, 1]. Thus,

{ 1, t, t² − 1/3 }

is an orthogonal set of polynomials. ♦



Definition 3.9 (Legendre polynomials) The polynomials

p0 (t), p1 (t), p2 (t), . . .

obtained by orthogonalizing 1, t, t² , . . . using the inner product

hp, qi = ∫_{−1}^{1} p(t)q(t) dt,   p, q ∈ P,

are called Legendre polynomials. ♦


It is clear that the n-th Legendre polynomial pn (t) is of degree n.
We have seen in Example 3.9 that
p0 (t) = 1,   p1 (t) = t,   p2 (t) = t² − 1/3.
Exercise 3.14 Let V1 and V2 be finite dimensional inner product
spaces, and E1 and E2 be ordered orthonormal bases of V1 and V2
respectively. Let T ∈ L(V1 , V2 ) and let A : Fn → Fm be the linear transformation corresponding to the matrix [T ]E1 ,E2 . Prove that

hT x, yi = hAx, yi ∀ (x, y) ∈ V1 × V2 .

Deduce that hT x, yi = hx, T yi for all (x, y) ∈ V1 × V2 if and only if [T ]E1 ,E2 is hermitian. ♦

3.7 Diagonalization
In this section we show the diagonalizability of a certain type of operators
as promised at the end of Section 2.7.3.

3.7.1 Self-adjoint operators and their eigenvalues


Recall from the theory of matrices that a square matrix
 
  [ a11  a12  · · ·  a1n ]
  [ a21  a22  · · ·  a2n ]
  [ · · ·  · · ·  · · ·  · · · ]
  [ an1  an2  · · ·  ann ]

with complex entries is said to be hermitian if its conjugate transpose


is itself, that is,
   
  [ a11  a12  · · ·  a1n ]       [ ā11  ā21  · · ·  ān1 ]
  [ a21  a22  · · ·  a2n ]       [ ā12  ā22  · · ·  ān2 ]
  [ · · ·  · · ·  · · ·  · · · ]   =   [ · · ·  · · ·  · · ·  · · · ] .
  [ an1  an2  · · ·  ann ]       [ ā1n  ā2n  · · ·  ānn ]

The real analogue of hermitian matrices are the so called symmetric


matrices, that is, matrices whose transpose is itself.
In the context of a general inner product space, there is an analogue
for the above concepts too.
Definition 3.10 A linear transformation T : V → V on an inner
product space V is said to be self-adjoint if

hT x, yi = hx, T yi ∀ x, y ∈ V.


Suppose V is finite dimensional, E := {u1 , . . . , un } is an (ordered)
orthonormal basis of V , and T : V → V is a self-adjoint linear
transformation. Then we have the following:

• hT uj , ui i = huj , T ui i for all i, j = 1, . . . , n,

• F = C and V = Cn implies [T ]E,E is a hermitian matrix,

• F = R and V = Rn implies [T ]E,E is a symmetric matrix.

Observe that if T : V → V is a self-adjoint operator, then

hT x, xi ∈ R ∀ x ∈ V.

Using this fact we prove the following.


Theorem 3.17 Eigenvalues of a self-adjoint operator are real.

Proof. Let T : V → V be a self-adjoint operator and λ ∈ F be an


eigenvalue of T . Let x be a corresponding eigenvector. Then x 6= 0
and T x = λx. Hence,

hT x, xi = hλx, xi = λhx, xi.

Since hT x, xi is real and hx, xi is a nonzero real number, λ is also real.

Corollary 3.18 Eigenvalues of a Hermitian matrix are real.

We also observe the following.

Theorem 3.19 Eigenvectors associated with distinct eigenvalues of


a self-adjoint operator are orthogonal.

Proof. Let T : V → V be a self-adjoint operator and λ and µ be


distinct eigenvalues of T . Let x and y be eigenvectors corresponding
to λ and µ, respectively. Then, we have

λhx, yi = hλx, yi = hT x, yi = hx, T yi = hx, µyi = µhx, yi.

Hence, hx, yi = 0.

Next we have another important property of self-adjoint opera-


tors.

Theorem 3.20 Every self-adjoint operator on a finite dimensional


inner product space has an eigenvalue.

Proof. We already know that if F = C, then every linear operator


on a finite dimensional linear space has an eigenvalue. Hence, assume
that V is an inner product space over R and T : V → V is self-
adjoint.
Let dim (V ) = n and let A = (aij ) be a matrix representation
of T with respect to an orthonormal basis {u1 , . . . , un }. Then A as
an operator on Cn has an eigenvalue, say λ. Since A : Cn → Cn is
self-adjoint, λ ∈ R. Let x ∈ Cn be an eigenvector corresponding to
λ, that is, x 6= 0 and Ax = λx. Let u and v be real and imaginary
parts of x. Then we have

A(u + iv) = λ(u + iv).

Therefore,
Au = λ u, Av = λ v.
Since x is nonzero, at least one of u and v is nonzero. Without loss of generality, assume that u ≠ 0. Thus, if α1 , . . . , αn are the coordinates of u, then the vector w := ∑_{j=1}^n αj uj in V satisfies the equation T w = λw, so that λ is an eigenvalue of T .

In the Appendix (Section 3.11) we have given another proof for


Theorem 3.20 which does not depend on the matrix representation
of T .
We end this subsection with another property.

Theorem 3.21 Let T : V → V be a self-adjoint operator on a finite


dimensional inner product space V and V0 be a subspace of V . Then

T (V0 ) ⊆ V0 =⇒ T (V0⊥ ) ⊆ V0⊥ .

Proof. Suppose V0 is a subspace of V such that T (V0 ) ⊆ V0 . Now,


let x ∈ V0⊥ and y ∈ V0 . Since T y ∈ V0 , we have

hT x, yi = hx, T yi = 0.

Thus, T y ∈ V0⊥ for every y ∈ V0⊥ .

3.7.2 Diagonalization of self-adjoint operators


First we prove the diagonalization theorem in the general context.
Then state it in the setting of matrices.

Theorem 3.22 (Diagonalization theorem) Let T : V → V be a


self-adjoint operator on a finite dimensional inner product space V .
Then there exists an orthonormal basis for V consisting of eigenvec-
tors of T .

Proof. Let λ1 , . . . , λk be distinct eigenvalues of T and let

V0 = N (T − λ1 I) + · · · + N (T − λk I).

Then the union of orthonormal bases of N (T − λ1 I), . . . , N (T − λk I) will be an orthonormal basis of V0 . Thus, if V0 = V , then we are through.
Suppose V0 ≠ V . Then V0⊥ ≠ {0} (see Theorem 3.16). Now, it can be easily seen that T (V0 ) ⊆ V0 . Hence, by Theorem 3.21, T (V0⊥ ) ⊆ V0⊥ . Also, the operator T1 : V0⊥ → V0⊥ defined by T1 x = T x for every x ∈ V0⊥ is self-adjoint. Hence, T1 has an eigenvalue, say λ. If x ∈ V0⊥ is a corresponding eigenvector, then we also have T x = λx, so that λ = λj for some j ∈ {1, . . . , k}. Therefore, x ∈ N (T − λj I) ⊆ V0 . Thus, we obtain

0 ≠ x ∈ V0 ∩ V0⊥ = {0},
which is clearly a contradiction.

We observe that, the method of the proof of the above theo-


rem shows that, if T : V → V is a self-adjoint operator on a finite
dimensional inner product space V and λ1 , . . . , λk are the distinct
eigenvalues of T and if {uj1 , . . . , ujmj } is an orthonormal basis of
the eigenspace N (T − λj I) for j = 1, . . . , k, then the matrix repre-
sentation of T with respect to the ordered basis

{u11 , . . . , u1m1 , u21 , . . . , u2m2 , . . . , uk1 , . . . , ukmk }

is a diagonal matrix with diagonal entries

λ1 , . . . , λ1 , λ2 , . . . , λ2 , . . . , λk , . . . , λk

with each λj appearing mj times where

m1 + . . . + mk = dim (V ), mj = dim N (T − λj I), j = 1, . . . , k.

If A is a square matrix, then let us denote the conjugate transpose


of A by A∗ . For the next result we introduce the following definition.
Definition 3.11 A square matrix A with entries from F is said to
be a

1. self-adjoint matrix if A∗ = A, and

2. unitary matrix if A∗ A = I = AA∗ .


We note the following (Verify!):

• If A is unitary then it is invertible and A∗ = A−1 and columns


of A form an orthonormal basis for Fn .

In view of Theorem 3.22 and the above observations, we have the


following.

Theorem 3.23 Let A be a self-adjoint matrix. Then there exists a


unitary matrix U such that U −1 AU is a diagonal matrix.
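Numerically, Theorem 3.23 corresponds to the symmetric/hermitian eigenvalue decomposition; the sketch below uses an arbitrary real symmetric matrix as an illustration.

import numpy as np

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])

w, U = np.linalg.eigh(A)      # real eigenvalues and an orthonormal eigenvector basis
print(np.allclose(U.T @ U, np.eye(3)))         # True: U is unitary (orthogonal)
print(np.allclose(U.T @ A @ U, np.diag(w)))    # True: U^{-1} A U is diagonal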

3.8 Best Approximation


In applications one may come across functions which are too com-
plicated to handle for computational purposes. In such cases, one
would like to replace them by functions of "simpler forms" which
are easy to handle. This is often done by approximating the given
function by certain functions belonging to a finite dimensional space
spanned by functions of simple forms. For instance, one may want
to approximate a continuous function f defined on certain interval
[a, b] by a polynomial, say a polynomial p in Pn for some specified n.
It is desirable to find that polynomial p such that

kf − pk ≤ kf − qk ∀ q ∈ Pn .

Here, k · k is a norm on C[a, b]. Now the question is whether such a polynomial exists, and if it exists, then is it unique; and if there is a unique such polynomial, then how can we find it. These are the issues that we discuss in this section, in an abstract framework of inner product spaces.
Definition 3.12 Let V be an inner product space and V0 be a sub-
space of V . Let x ∈ V . A vector x0 ∈ V0 is called a best approx-
imation of x from V0 if

kx − x0 k ≤ kx − vk ∀ v ∈ V0 .

Proposition 3.24 Let V be an inner product space , V0 be a subspace


of V , and x ∈ V . If x0 ∈ V0 is such that x − x0 ⊥ V0 , then x0 is a
best approximation of x, and it is the unique best approximation of
x from V0 .
Conversely, if x0 ∈ V0 is a best approximation of x, then x − x0 ⊥
V0 .

Proof. Suppose x0 ∈ V0 is such that x − x0 ⊥ V0 . Then, for every


u ∈ V0 ,

kx − uk2 = k(x − x0 ) + (x0 − u)k2


= kx − x0 k2 + kx0 − uk2 .

Hence
kx − x0 k ≤ kx − vk ∀ v ∈ V0 ,

showing that x0 is a best approximation.


To see the uniqueness, suppose that v0 ∈ V0 is another best ap-
proximation of x. Then, we have
kx − x0 k ≤ kx − v0 k and kx − v0 k ≤ kx − x0 k,
so that kx − x0 k = kx − v0 k. Therefore, using the fact that
hx − x0 , x0 − v0 i = 0, we have
kx − v0 k2 = kx − x0 k2 + kx0 − v0 k2 .
Hence, it follows that kx0 − v0 k = 0. Thus v0 = x0 .
Conversely, suppose that x0 ∈ V0 is a best approximation of x. Then kx − x0 k ≤ kx − uk for all u ∈ V0 . In particular, if v is a nonzero vector in V0 , then

kx − x0 k ≤ kx − (x0 + αv)k   ∀ α ∈ F.

Hence, for every α ∈ F,

kx − x0 k2 ≤ kx − (x0 + αv)k2
         = h(x − x0 ) − αv, (x − x0 ) − αvi
         = kx − x0 k2 − 2Re hx − x0 , αvi + |α|2 kvk2 .

Taking α = hx − x0 , vi/kvk2 , we have

hx − x0 , αvi = |hx − x0 , vi|2 / kvk2 = |α|2 kvk2 ,

so that

kx − x0 k2 ≤ kx − x0 k2 − 2Re hx − x0 , αvi + |α|2 kvk2 = kx − x0 k2 − |hx − x0 , vi|2 / kvk2 .

Hence, hx − x0 , vi = 0.
By the above proposition, in order to find a best approximation
of x ∈ V from V0 , it is enough to find a vector x0 ∈ V0 such that
x − x0 ⊥ V0 ; and we know that such a vector x0 is unique.
Theorem 3.25 Let V be an inner product space, V0 be a finite di-
mensional subspace of V , and x ∈ V . Let {u1 , . . . , un } be an or-
thonormal basis of V0 . Then for x ∈ V , the vector
x0 := ∑_{i=1}^n hx, ui i ui
is the unique best approximation of x from V0 .
Proof. Clearly, x0 := ∑_{i=1}^n hx, ui i ui satisfies the hypothesis of Proposition 3.24.

The above theorem shows how to find a best approximation from


a finite dimensional subspace V0 , provided we know an orthonormal
basis of V0 . Suppose we know only a basis of V0 . Then, we can find
an orthonormal basis by Gram-Schmidt procedure. Another way to
find a best approximation is to use Proposition 3.24:
Suppose {v1 , . . . , vn } is a basis of V0 . By Proposition 3.24, the vector x0 that we are looking for should satisfy hx − x0 , vi i = 0 for every i = 1, . . . , n. Thus, we have to find scalars α1 , . . . , αn such that

h x − ∑_{j=1}^n αj vj , vi i = 0   ∀ i = 1, . . . , n.

That is, we have to find α1 , . . . , αn such that

∑_{j=1}^n hvj , vi i αj = hx, vi i   ∀ i = 1, . . . , n.

The above system of equations is uniquely solvable (Why?) to get α1 , . . . , αn . Note that if the basis {v1 , . . . , vn } is an orthonormal basis of V0 , then αj = hx, vj i for j = 1, . . . , n.
Exercise 3.15 Show that, if {v1 , . . . , vn } is a linearly independent
subset of an inner product space V , then the columns of the matrix
M := (aij ) with aij = hvj , vi i, are linearly independent. Deduce
that, the matrix is invertible. ♦

EXAMPLE 3.10 Let V = R2 with usual inner product, and V0 =


{x = (x1 , x2 ) ∈ R2 : x1 = x2 }. Let us find the best approximation of
x = (0, 1) from V0 .
We have to find a vector of the form x0 = (α, α) such that x−x0 =
(0, 1) − (α, α) = (−α, 1 − α) is orthogonal to V0 . Since V0 is spanned
by the single vector (1, 1), the requirement is to find α such that
(−α, 1 − α) is orthogonal to (1, 1), i.e., α has to satisfy the equation
−α + (1 − α = 0, i.e., α = 1/2. Thus the best approximation of
x = (0, 1) from V0 is the vector x0 = (1/2, 1/2). ♦
EXAMPLE 3.11 Let V be the vector space C[0, 1] over R with the inner product hx, ui = ∫_0^1 x(t)u(t) dt, and let V0 = P1 . Let us find the best approximation of x defined by x(t) = t² from the space V0 .
We have to find a vector x0 of the form x0 (t) = a0 + a1 t such that the function x − x0 defined by (x − x0 )(t) = t² − a0 − a1 t is orthogonal to V0 . Since V0 is spanned by u1 , u2 , where u1 (t) = 1 and u2 (t) = t, the requirement is to find a0 , a1 such that

hx − x0 , u1 i = ∫_0^1 (t² − a0 − a1 t) dt = 0,

hx − x0 , u2 i = ∫_0^1 (t³ − a0 t − a1 t²) dt = 0.

That is,

∫_0^1 (t² − a0 − a1 t) dt = [t³/3 − a0 t − a1 t²/2]_0^1 = 1/3 − a0 − a1 /2 = 0,

∫_0^1 (t³ − a0 t − a1 t²) dt = [t⁴/4 − a0 t²/2 − a1 t³/3]_0^1 = 1/4 − a0 /2 − a1 /3 = 0.

Hence, a0 = −1/6 and a1 = 1, so that the best approximation x0 of t² from P1 is given by x0 (t) := −1/6 + t. ♦
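The linear system described before Exercise 3.15 can also be solved numerically; for Example 3.11 the inner products have simple closed forms, and the following sketch recovers a0 = −1/6, a1 = 1.

import numpy as np

# Basis v1(t) = 1, v2(t) = t on [0,1]; Gram matrix <v_j, v_i> and right-hand side <t^2, v_i>.
G = np.array([[1.0,   1.0/2.0],
              [1.0/2.0, 1.0/3.0]])
b = np.array([1.0/3.0, 1.0/4.0])

a = np.linalg.solve(G, b)
print(a)                            # [-0.16666667  1.]   i.e.  x0(t) = -1/6 + t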

Exercise 3.16 Let V be an inner product space and V0 be a finite


dimensional subspace of V . Show that for every x ∈ V , there exists
a unique pair of vectors u, v with u ∈ V0 and v ∈ V0⊥ satisfying
x = u + v. In fact,
V = V0 + V0⊥ . 

Exercise 3.17 Let V = C[0, 1] over R with inner product hx, ui = ∫_0^1 x(t)u(t) dt. Let V0 = P3 . Find the best approximation for x from V0 ,
where x(t) is given by
(i) et , (ii) sin t, (iii) cos t, (iv) t4 . ♦

3.9 Best Approximate Solution


In this section we shall make use of the results from the previous
section to define and find a best approximate solution for an equation
Ax = y where A : V1 → V2 is a linear transformation between vector
spaces V1 and V2 with V2 being an inner product space.

Definition 3.13 Let V1 and V2 be vector spaces with V2 being an


inner product space, and let A : V1 → V2 be a linear transformation.
Let y ∈ V2 . Then a vector x0 ∈ V1 is called a best approximate
solution or a least-square solution of the equation Ax = y if

kAx0 − yk ≤ kAu − yk ∀ u ∈ V1 .


It is obvious that x0 ∈ V1 is a best approximate solution of Ax =
y if and only if y0 := Ax0 is a best approximation of y from the
range space R(A). Thus, from Proposition 3.24, we can conclude the
following.

Theorem 3.26 Let V1 and V2 be vector spaces with V2 being an


inner product space, and let A : V1 → V2 be a linear transformation.
If R(A) is a finite dimensional subspace of V2 , then the equation
Ax = y has a best approximate solution. Moreover, a vector x0 ∈ V1
is a best approximate solution if and only if Ax0 − y is orthogonal to
R(A).

Clearly, a best approximate solution is unique if and only if A is


injective.
Next suppose that A ∈ Rm×n , i.e., A is an m × n matrix of real
entries. Then we know that range space of A, viewing it as a linear
transformation from Rn to Rm , is the space spanned by the columns
of A. Let u1 , . . . , un be the columns of A. Then, given y ∈ Rm ,
a vector x0 ∈ Rn is a best approximate solution of Ax = y if and
only if Ax0 − y is orthogonal to ui for i = 1, . . . , n, i.e., if and only if
uTi (Ax0 − y) = 0 for i = 1, . . . , n, i.e., if and only if AT (Ax0 − y) = 0,
i.e., if and only if
AT Ax0 = AT y.
   
EXAMPLE 3.12 Let

A =
  [ 1  1 ]
  [ 0  0 ] ,
y =
  [ 0 ]
  [ 1 ] .

Clearly, the equation Ax = y has no solution. It can be seen that

x0 =
  [  1 ]
  [ −1 ]

is a solution of the equation AT Ax = AT y. Thus, x0 is a best approximate solution of Ax = y.
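A numerical counterpart of Example 3.12. Note that AT A is singular here, so the best approximate solution is not unique; numpy's least-squares routine returns the one of minimal norm, while (1, −1) from the text is another.

import numpy as np

A = np.array([[1., 1.],
              [0., 0.]])
y = np.array([0., 1.])

x0, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x0)                                   # [0. 0.]  (minimal-norm best approximate solution)
print(np.allclose(A.T @ A @ x0, A.T @ y))   # True: x0 satisfies the normal equations
print(np.allclose(A.T @ A @ np.array([1., -1.]), A.T @ y))   # True: so does (1, -1)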


3.10 QR-Factorization and Best Approximate


Solution
Suppose that A ∈ Rm×n , i.e., A is an m × n matrix of real entries
with n ≤ m. Assume that the columns of A are linearly independent.
Then we know that, if the equation Ax = y has a solution, then the
solution is unique. Now, let u1 , . . . , un be the columns of A, and
let v1 , . . . , vn be the orthonormal vectors obtained by orthonormalizing
u1 , . . . , un . Hence, we know that for each k ∈ {1, . . . , n},

span {u1 , . . . , uk } = span {v1 , . . . , vk }.

Hence, there exists an upper triangular n × n matrix R := (aij ) such


that uj = a1j v1 + a2j v2 + · · · + ajj vj , j = 1, . . . , n. Thus,

[u1 , u2 , . . . , un ] = [v1 , v2 , . . . , vn ]R.

Note that A = [u1 , u2 , . . . , un ], and the matrix Q := [v1 , v2 , . . . , vn ]


satisfies the relation
QT Q = I.

Definition 3.14 The factorization A = QR with columns of Q being


orthonormal and R being an upper triangular matrix is called a QR-
factorization of A. ♦
We have seen that if the columns of A ∈ Rm×n are linearly indepen-
dent, then A has a QR-factorization.
Now, suppose that A ∈ Rm×n with columns of A are linearly
independent, and A = QR is the QR-factorization of A. Let y ∈ Rm .
Since columns of A are linearly independent, the equation Ax = y
has a unique best approximate solution, say x0 . Then we know that

AT Ax0 = AT y.

Using the QR-factorization A = QR of A, we have

RT QT QRx0 = RT QT y.

Now, QT Q = I, and RT is injective, so that it follows that

Rx0 = QT y.

Thus, if A = QR is the QR-factorization of A, then the best approx-


imate solution of Ax = y is obtained by solving the equation

Rx = QT y.

For more details on best approximate solution one may see


http://mat.iitm.ac.in/∼mtnair/LRN-Talk.pdf
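
As a quick illustration of the procedure above, the following is a minimal Python sketch (the matrix and right-hand side are chosen only for illustration): it computes a QR-factorization with numpy.linalg.qr and solves R x = Q^T y, comparing the result with the normal-equation solution.

    import numpy as np

    # a small matrix with linearly independent columns (illustrative data)
    A = np.array([[3.0, 1.0],
                  [1.0, 2.0],
                  [2.0, -1.0]])
    y = np.array([1.0, 0.0, -2.0])

    Q, R = np.linalg.qr(A)                 # A = QR with Q^T Q = I, R upper triangular
    x_qr = np.linalg.solve(R, Q.T @ y)     # solve R x = Q^T y

    x_ne = np.linalg.solve(A.T @ A, A.T @ y)   # normal equations, for comparison
    print(x_qr, np.allclose(x_qr, x_ne))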

3.11 Appendix
Another proof for Theorem 3.20. Assuming F = R, another
way of proving Theorem 3.20 is as follows: Consider a new vector
space Ve := {u + iv : u, v ∈ V } over C. For u + iv, u1 + iv1 , u2 + iv2
in Ve and α + iβ ∈ C with (α, β) ∈ R2 the addition and scalar
multiplication are defined as

(u1 + iv1 ) + (u2 + iv2 ) := (u1 + u2 ) + i(v1 + v2 ),

(α + iβ)(u + iv) := (αu − βv) + i(αv + βu).


The inner product on Ve is defined as

hu1 + iv1 , u2 + iv2 i := hu1 , u2 i + hv1 , v2 i.

Define Te : Ve → Ve by

Te(u + iv) := T u + iT v.

Then, self-adjointness of T implies that Te is also self-adjoint. Indeed,

hTe(u1 + iv1 ), u2 + iv2 i = hT u1 + iT v1 , u2 + iv2 i


= hT u1 , u2 i + hT v1 , v2 i
= hu1 , T u2 i + hv1 , T v2 i
= hu1 + iv1 , T u2 + iT v2 i
= hu1 + iv1 , Te(u2 + iv2 )i.

Now, let λ ∈ R be an eigenvalue of Te with a corresponding eigenvec-


tor x̃ := u + iv. Since

Tex̃ = λx̃ ⇐⇒ T u + iT v = λu + iλv ⇐⇒ T u = λu and T v = λv

and since one of u and v is nonzero, it follows that λ is an eigenvalue


of T as well.
4
Error Bounds and Stability of
Linear Systems

4.1 Norms of Vectors and Matrices


Recall that a norm k · k on a vector space V is a function which
associates each x ∈ V a unique non-negative real number kxk such
that the following hold:
(a) For x ∈ V , kxk = 0 ⇐⇒ x = 0
(b) kx + yk ≤ kxk + kyk ∀x, y ∈ V ,
(c) kαxk = |α| kxk ∀α ∈ F, x ∈ V .
We have already seen that if V is an inner product space, then
the function x 7→ kxk := hx, xi1/2 is a norm on V . It can be easily
seen that for x = (x1 , x2 , . . . , xk ) ∈ Rk ,
kxk1 := Σ_{j=1}^k |xj|,   kxk∞ := max_{1≤i≤k} |xi|

define norms on Rk . The norm induced by the standard inner prod-


uct on Rk is denoted by k · k2 , i.e.,
kxk2 := ( Σ_{j=1}^k |xj|^2 )^{1/2}.

Exercise 4.1 Show that kxk∞ ≤ kxk2 ≤ kxk1 for every x ∈ Rk .


Compute kxk∞ , kxk2 , kxk1 for x = (1, 1, 1) ∈ R3 .
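For instance, the inequalities and the values asked for in Exercise 4.1 can be checked with a short Python sketch (illustrative only):

    import numpy as np

    x = np.array([1.0, 1.0, 1.0])
    norm_inf = np.linalg.norm(x, np.inf)    # max |x_i|           -> 1.0
    norm_2   = np.linalg.norm(x, 2)         # (sum |x_i|^2)^(1/2) -> sqrt(3)
    norm_1   = np.linalg.norm(x, 1)         # sum |x_i|           -> 3.0
    print(norm_inf, norm_2, norm_1)
    assert norm_inf <= norm_2 <= norm_1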
We know that on C[a, b],
kxk2 := hx, xi^{1/2} = ( ∫_a^b |x(t)|^2 dt )^{1/2}


defines a norm. It is easy to show that


kxk1 := ∫_a^b |x(t)| dt,   kxk∞ := max_{a≤t≤b} |x(t)|

also define norms on C[a, b].

Exercise 4.2 Show that there exists no constant c > 0 such that
kxk∞ ≤ c kxk1 for all x ∈ C[a, b].

Next we consider norms of matrices. Considering an n × n matrix


as an element of R^{n^2}, we can obtain norms of matrices. Thus, analogous to the norms k · k1 , k · k2 , k · k∞ on Rn , for A = (aij ) ∈ Rn×n , the quantities

Σ_{i=1}^n Σ_{j=1}^n |aij|,   ( Σ_{i=1}^n Σ_{j=1}^n |aij|^2 )^{1/2},   max_{1≤i,j≤n} |aij|

define norms on Rn×n .


Given a vector norm k · k on Rn , it can be seen that

kAk := sup_{kxk≤1} kAxk,   A ∈ Rn×n ,

defines a norm on the space Rn×n . Since this norm is associated with
the norm of the space Rn , and since a matrix can be considered as
a linear operator on Rn , the above norm on Rn×n is called a matrix
norm associated with a vector norm.
The above norm has certain important properties that other norms
may not have. For example, it can be seen that
• kAxk ≤ kAk kxk ∀x ∈ Rn ,
• kAxk ≤ c kxk ∀x ∈ Rn =⇒ kAk ≤ c.
Moreover, if A, B ∈ Rn×n and if I is the identity matrix, then
• kABk ≤ kAk kBk, kIk = 1.

Exercise 4.3 Let k · k be a norm on Rn and A ∈ Rn×n . Suppose


c > 0 is such that kAxk ≤ c kxk for all x ∈ Rn , and there exists x0 6= 0
in Rn such that kAx0 k = c kx0 k. Then show that kAk = c.

In certain cases the operator norm can be computed from the knowl-


edge of the entries of the matrix. Let us denote the matrix norm asso-
ciated with k · k1 and k · k∞ by the same notation, i.e., for p ∈ {1, ∞},

kAkp := sup_{kxkp≤1} kAxkp ,   A ∈ Rn×n .

Theorem 4.1 If A = (aij ) ∈ Rn×n , then


kAk1 = max_{1≤j≤n} Σ_{i=1}^n |aij|,   kAk∞ = max_{1≤i≤n} Σ_{j=1}^n |aij|.

Proof. Note that for x = (x1 , . . . , xn ) ∈ Rn ,


kAxk1 = Σ_{i=1}^n | Σ_{j=1}^n aij xj | ≤ Σ_{i=1}^n Σ_{j=1}^n |aij| |xj|
       = Σ_{j=1}^n ( Σ_{i=1}^n |aij| ) |xj| ≤ ( max_{1≤j≤n} Σ_{i=1}^n |aij| ) Σ_{j=1}^n |xj|.

Thus, kAk1 ≤ max_{1≤j≤n} Σ_{i=1}^n |aij|. Also, note that kAej k1 = Σ_{i=1}^n |aij| for every j ∈ {1, . . . , n}, so that Σ_{i=1}^n |aij| ≤ kAk1 for every j ∈ {1, . . . , n}. Hence, max_{1≤j≤n} Σ_{i=1}^n |aij| ≤ kAk1 . Thus, we have shown that

kAk1 = max_{1≤j≤n} Σ_{i=1}^n |aij|.

Next, consider the norm k · k∞ on Rn . In this case, for x = (x1 , . . . , xn ) ∈ Rn , we have

kAxk∞ = max_{1≤i≤n} | Σ_{j=1}^n aij xj |.

Since

| Σ_{j=1}^n aij xj | ≤ Σ_{j=1}^n |aij| |xj| ≤ kxk∞ Σ_{j=1}^n |aij|,

it follows that

kAxk∞ ≤ ( max_{1≤i≤n} Σ_{j=1}^n |aij| ) kxk∞ .

From this we have kAk∞ ≤ max_{1≤i≤n} Σ_{j=1}^n |aij|. Now, let i0 ∈ {1, . . . , n} be such that max_{1≤i≤n} Σ_{j=1}^n |aij| = Σ_{j=1}^n |ai0 j|, and let x0 = (α1 , . . . , αn ) be such that

αj = |ai0 j|/ai0 j if ai0 j ≠ 0,   and   αj = 0 if ai0 j = 0.

Then kx0 k∞ = 1 and

Σ_{j=1}^n |ai0 j| = Σ_{j=1}^n ai0 j αj = |(Ax0)_{i0}| ≤ kAx0 k∞ ≤ kAk∞ .

Thus, max_{1≤i≤n} Σ_{j=1}^n |aij| = Σ_{j=1}^n |ai0 j| ≤ kAk∞ . Thus we have proved that

kAk∞ = max_{1≤i≤n} Σ_{j=1}^n |aij|.

This completes the proof of the theorem.
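
The formulas of Theorem 4.1 are easy to evaluate from the entries of the matrix. The following Python sketch (with an arbitrary illustrative matrix) computes kAk1 as the maximum absolute column sum and kAk∞ as the maximum absolute row sum, and compares them with numpy.linalg.norm:

    import numpy as np

    A = np.array([[1.0, -2.0],
                  [3.0,  4.0]])

    norm_1   = np.abs(A).sum(axis=0).max()   # maximum absolute column sum
    norm_inf = np.abs(A).sum(axis=1).max()   # maximum absolute row sum

    print(norm_1, np.linalg.norm(A, 1))        # both 6.0
    print(norm_inf, np.linalg.norm(A, np.inf)) # both 7.0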

What about the matrix norm

kAk2 := max_{kxk2≤1} kAxk2 ,   A ∈ Rn×n ,

induced by k · k2 on Rn ? In fact, there is no simple representation


for this in terms of the entries of the matrix. However, we have the
following.

Theorem 4.2 Suppose A = (aij ) ∈ Rn×n . Then

kAk2 ≤ ( Σ_{i=1}^n Σ_{j=1}^n |aij|^2 )^{1/2}.

If λ1 , λ2 , . . . , λn are the (non-negative) eigenvalues of the matrix A^T A, then

kAk2 = max_{1≤j≤n} √λj .

Proof. Using the Cauchy–Schwarz inequality on Rn , we have, for x = (x1 , . . . , xn ) ∈ Rn ,

kAxk2^2 = Σ_{i=1}^n | Σ_{j=1}^n aij xj |^2
        ≤ Σ_{i=1}^n [ ( Σ_{j=1}^n |aij|^2 ) ( Σ_{j=1}^n |xj|^2 ) ]
        ≤ ( Σ_{i=1}^n Σ_{j=1}^n |aij|^2 ) kxk2^2.

Thus, kAk2 ≤ ( Σ_{i=1}^n Σ_{j=1}^n |aij|^2 )^{1/2}.

Since A^T A is a symmetric matrix, it has n real eigenvalues λ1 , . . . , λn (possibly with repetitions) with corresponding orthonormal eigenvectors u1 , u2 , . . . , un . Note that, for every j ∈ {1, 2, . . . , n},

λj = λj huj , uj i = hλj uj , uj i = hA^T A uj , uj i = hAuj , Auj i = kAuj k2^2 ≤ kAk2^2,

so that the λj's are non-negative and √λj ≤ kAk2 for all j. Thus,

max_{1≤j≤n} √λj ≤ kAk2 .

To see the reverse inequality, first we observe that u1 , u2 , . . . , un form an orthonormal basis of Rn . Hence, every x ∈ Rn can be written as x = Σ_{j=1}^n hx, uj i uj , so that

A^T A x = Σ_{j=1}^n hx, uj i A^T A uj = Σ_{j=1}^n hx, uj i λj uj .

Thus, we have kAxk2^2 = hAx, Axi = hA^T Ax, xi, so that

kAxk2^2 = h Σ_{j=1}^n hx, uj i λj uj , Σ_{i=1}^n hx, ui i ui i = Σ_{j=1}^n |hx, uj i|^2 λj ≤ ( max_{1≤j≤n} λj ) kxk2^2.

Hence, kAk2 ≤ max_{1≤j≤n} √λj . This completes the proof.
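The characterization kAk2 = max_j √λj can be checked numerically; the sketch below (with an arbitrary illustrative matrix) computes the eigenvalues of A^T A with numpy.linalg.eigvalsh and compares √λ_max with numpy.linalg.norm(A, 2) and with the Frobenius-type bound of the theorem:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 3.0]])

    lam = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of the symmetric matrix A^T A
    two_norm = np.sqrt(lam.max())         # ||A||_2 = sqrt(largest eigenvalue)
    bound = np.sqrt((A ** 2).sum())       # the bound (sum |a_ij|^2)^(1/2)

    print(two_norm, np.linalg.norm(A, 2)) # the two values agree
    assert two_norm <= bound + 1e-12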

 
Exercise 4.4 Find kAk1 and kAk∞ for the matrix

A =
[ 1  2  3 ]
[ 2  3  4 ]
[ 3  2  1 ].

4.2 Error Bounds for System of Equations


Given an invertible matrix A ∈ Rn×n and b ∈ Rn , consider the
equation
Ax = b.
Suppose the data b is not known exactly, but a perturbed data b̃ is
known. Let x̃ ∈ Rn be the corresponding solution, i.e.,

Ax̃ = b̃.

Then, we have x − x̃ = A−1 (b − b̃) so that

kx − x̃k ≤ kA−1 k kb − b̃k = kA−1 k kb − b̃k (kAxk/kbk) ≤ kAk kA−1 k (kb − b̃k/kbk) kxk,

kb − b̃k ≤ kAk kx − x̃k = kAk kx − x̃k (kA−1 bk/kxk) ≤ kAk kA−1 k (kx − x̃k/kxk) kbk.
Thus, denoting the quantity kAk kA−1 k by κ(A),

(1/κ(A)) (kb − b̃k/kbk) ≤ kx − x̃k/kxk ≤ κ(A) (kb − b̃k/kbk).    (4.1)

From the above inequalities, it can be inferred that if κ(A) is large,


then it can happen that for small relative error kb − b̃k/kbk in the
data, the relative error kx − x̃k/kxk in the solution may be large. In
fact, there do exist b, b̃ such that

kx − x̃k/kxk = κ(A) kb − b̃k/kbk,

where x, x̃ are such that Ax = b and Ax̃ = b̃. To see this, let x0 and
u be vectors such that

kAx0 k = kAk kx0 k, kA−1 uk = kA−1 k kuk,

and let
b := Ax0 , b̃ := b + u, x̃ := x0 + A−1 u.

Then it follows that Ax̃ = b̃ and


kx0 − x̃k/kx0 k = kA−1 uk/kx0 k = kA−1 k kuk/kx0 k = kAk kA−1 k kuk/kAx0 k = κ(A) kb − b̃k/kbk.
The quantity κ(A) := kAk kA−1 k is called the condition num-
ber of the matrix A. To illustrate the observation in the preceding
paragraph, let us consider
   
A =
[ 1      1 + ε ]
[ 1 − ε  1     ],     b = [ b1 ; b2 ].

It can be seen that

A−1 = (1/ε^2)
[ 1       −1 − ε ]
[ −1 + ε  1      ],     so that   x = A−1 b.

From this, it is clear that, if ε is small, then for small kbk, kxk can be very large. In this case, it can be seen that

kAk∞ = 2 + ε,   kA−1 k∞ = (2 + ε)/ε^2,   κ(A) = ((2 + ε)/ε)^2 > 4/ε^2.
In practice, while solving Ax = b numerically, we obtain an approximate solution x̃ in place of the actual solution. One would like to know how much error is incurred by this procedure. We can infer this from (4.1) by taking b̃ := Ax̃.
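The amplification factor κ(A) in (4.1) is easy to compute; the sketch below (illustrative, with a small ε) evaluates κ∞(A) = kAk∞ kA−1 k∞ for the matrix of the preceding paragraph and shows that it is large:

    import numpy as np

    eps = 1e-3
    A = np.array([[1.0,       1.0 + eps],
                  [1.0 - eps, 1.0      ]])

    # kappa(A) = ||A|| * ||A^{-1}|| with respect to the infinity norm
    kappa = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)
    print(kappa)                        # roughly ((2 + eps)/eps)^2, about 4e6
    print(np.linalg.cond(A, np.inf))    # the same quantity via numpy.linalg.cond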
Exercise 4.5 Let A ∈ Rn×n be an invertible matrix. Then there ex-
ist vectors x0 , u such that kAx0 k = kAk kx0 k and kA−1 uk = kA−1 k kuk
– Justify.
Exercise 4.6 1. Suppose A, B in Rn×n are invertible matrices,
and b, b̃ are in Rn . Let x, x̃ are in Rn be such that Ax = b and
B x̃ = b̃. Show that
kx − x̃k/kxk ≤ kAk kB −1 k ( kA − Bk/kAk + kb − b̃k/kbk ).
[Hint: Use the fact that B(x − x̃) = (B − A)x + (b − b̃), and
use the fact that k(B − A)xk ≤ kB − Ak kxk, and kb − b̃k =
kb − b̃kkAxk/kbk ≤ kb − b̃kkAk kxk/kbk.]
2. Let B ∈ Rn×n . If kBk < 1, then show that I − B is invertible,
and k(I − B)−1 k ≤ 1/(1 − kBk).
[Hint: Show that I − B is injective, by showing that for every
x, k(I − B)xk ≥ (1 − kBk)kxk, and then deduce the result.]

3. Let A, B ∈ Rn×n be such that A is invertible, and kA − Bk <


1/kA−1 k. Then, show that, B is invertible, and
kB −1 k ≤ kA−1 k / (1 − kA − Bk kA−1 k).
[Hint: Observe that B = A − (A − B) = [I − (A − B)A−1 ]A,
and use the previous exercise.]

4. Let A, B ∈ Rn×n be such that A is invertible, and kA − Bk <


1/(2kA−1 k). Let b, b̃, x, x̃ be as in Exercise 1. Then, show that,
B is invertible, and

kx − x̃k/kxk ≤ 2κ(A) ( kA − Bk/kAk + kb − b̃k/kbk ).

[Hint: Apply conclusion in Exercise 3 to that in Exercise 1.]


5
Fixed Point Iterations for
Solving Equations

Suppose S is a non-empty set and f : S → S is a function. Our


concern in this chapter is to find an x ∈ S such that x = f (x).
Definition 5.1 A point x ∈ S is called a fixed point of f : S → S
if x = f (x).
It is to be mentioned that a problem of finding zeros of a func-
tion can be converted into a problem of finding fixed points of an
appropriate function. A simplest case is the following:
Suppose S is a subset of a vector space V and g : S → V . Then
for x ∈ S,
g(x) = 0 if and only if x = f (x),

where f (x) = x − g(x). Thus, if x − g(x) ∈ S for every x ∈ S, then


the problem of solving g(x) = 0 is same as finding a fixed point of
f : S → S.
It is to be remarked that a function may not have a fixed point
or may have more than one fixed point. For example

• f : R → R defined by f (x) = x + 1 has no fixed point,

• f : R → R defined by f (x) = 2x+1 has exactly one fixed point,

• f : R → R defined by f (x) = x2 has exactly two fixed points,

• f : R2 → R2 defined by f (x1 , x2 ) = (x2 , x1 ) has infinitely many


fixed points
• f : C[0, 1] → C[0, 1] defined by f (x)(t) = ∫_0^t x(s) ds has exactly one fixed point, namely the zero function.


Now, suppose that S is a subset of a normed vector space V with


a norm k · k. For finding a fixed point of f : S → S, one may consider
the following iterative procedure to construct a sequence (xn ) in S:

Start with some x0 ∈ S, then define iteratively

xn = f (xn−1 ), n = 1, 2, . . . .

One may enquire whether (xn ) converges to a fixed point


of f .

Suppose the above iterations converge to some x ∈ V , i.e., sup-


pose there exists an x ∈ V such that kx − xn k → 0 as n → ∞. Then
the question is whether x ∈ S and f (x) = x.
We require the following definition.
Definition 5.2 Let S be a subset of a normed vector space V , and
f : S → S.
(a) The set S is said to be a closed set if it has the property
that x ∈ V and x = limn→∞ xn for some sequence (xn ) in S implies
x ∈ S.
(b) The function f is said to be continuous (on S) if for every
sequence (xn ) in S which converges to a point x ∈ S, the sequence
(f (xn )) converges to f (x).
Using the above definition, the proof of the following proposition
is obvious.

Proposition 5.1 Suppose S is a subset of a normed vector space


V , and f : S → S. Let x0 ∈ S, and (xn ) is defined iteratively by
xn = f (xn−1 ) for n ∈ N. Suppose that (xn ) converges to an x ∈ V .
If f is continuous and S is closed then x is a fixed point of f .

In the above proposition the assumption that the sequence (xn )


converges is a strong one. Sometimes it is easy to show that a se-
quence is a Cauchy sequence.
Definition 5.3 Let V be a normed vector space. A sequence (xn )
in V is said to be a Cauchy sequence if for every ε > 0 there exists
a positive integer N such that kxn − xm k < ε for all n, m ≥ N .
A normed space in which every Cauchy sequence converges is
called a Banach space.

Examples of Banach spaces are


• Rk with k · k1 or k · k2 or k · k∞ ,
• C[a, b] with kxk∞ := max{|x(t)| : a ≤ t ≤ b}.
It is known that every finite dimensional vector space with any
norm is a Banach space, whereas an infinite dimensional space need not be a Banach space with respect to a given norm. For instance, it can be shown easily that C[a, b] with the norm kxk1 := ∫_a^b |x(t)| dt is not a Banach space.

Theorem 5.2 Suppose S is a closed subset of a Banach space


V , and f : S → S satisfies

kf (x) − f (y)k ≤ ρkx − yk ∀x, y ∈ S,

for some constant ρ satisfying 0 < ρ < 1. Then f has a unique fixed
point. In fact, for any x0 ∈ S, if we define

xn = f (xn−1 ), n = 1, 2, . . . ,

iteratively, then (xn ) converges to a unique fixed point x ∈ S of f ,


and

kxn+1 − xn k ≤ ρ kxn − xn−1 k ≤ ρ^n kx1 − x0 k   ∀ n ∈ N,

kxn − xm k ≤ (ρ^m/(1 − ρ)) kx1 − x0 k   ∀ n > m,

kx − xn k ≤ (ρ^n/(1 − ρ)) kx1 − x0 k   ∀ n ∈ N.
Proof. Let x0 ∈ S, and define

xn = f (xn−1 ), n = 1, 2, . . . .

Then

kxn+1 − xn k = kf (xn ) − f (xn−1 )k ≤ ρ kxn − xn−1 k ∀n ∈ N,

so that
kxn+1 − xn k ≤ ρn kx1 − x0 k ∀n ∈ N.

Now, let n > m. Then

kxn − xm k ≤ kxn − xn−1 k + kxn−1 − xn−2 k + . . . + kxm+1 − xm k


≤ (ρn−1 + ρn−2 + . . . + ρm )kx1 − x0 k
≤ (ρ^m/(1 − ρ)) kx1 − x0 k.
Since ρm → 0 as m → ∞, (xn ) is a Cauchy sequence. Since V is a
Banach space (xn ) converges to some x ∈ V , and since S is a closed
set, x ∈ S. It also follows that, for all m ∈ N,
kx − xm k = lim_{n→∞} kxn − xm k ≤ (ρ^m/(1 − ρ)) kx1 − x0 k.
Observe that

kx − f (x)k ≤ k(x − xm ) + (xm − f (x))k


≤ kx − xm k + kxm − f (x)k
≤ kx − xm k + ρkxm−1 − xk
≤ (ρ^m/(1 − ρ)) kx1 − x0 k + ρ (ρ^{m−1}/(1 − ρ)) kx1 − x0 k
≤ (2ρ^m/(1 − ρ)) kx1 − x0 k.
Since ρm → 0 as m → ∞, it follows that kx − f (x)k = 0, i.e.,
x = f (x), i.e., x is a fixed point of f . Now, to show that there is
only one fixed point of f , suppose u and v are fixed points of f , i.e.,
u = f (u) and v = f (v). Then we have

ku − vk = kf (u) − f (v)k ≤ ρku − vk

so that (1 − ρ)ku − vk ≤ 0. Since 1 − ρ > 0, this gives ku − vk ≤ 0, and hence ku − vk = 0, i.e., u = v.
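
As a concrete illustration of the iteration in Theorem 5.2, the following Python sketch iterates the contraction f(x) = cos x on the closed set S = [0, 1] (here |f'(x)| ≤ sin(1) < 1 on S, and cos maps [0, 1] into itself); the stopping rule uses the a priori bound ρ^n/(1 − ρ) kx1 − x0 k:

    import math

    def fixed_point(f, x0, rho, tol=1e-10, max_iter=200):
        # iterate x_n = f(x_{n-1}) until the a priori bound
        # rho^n/(1 - rho) * |x1 - x0| of Theorem 5.2 falls below tol
        x = f(x0)
        first_step = abs(x - x0)
        n = 1
        while rho ** n / (1.0 - rho) * first_step > tol and n < max_iter:
            x = f(x)
            n += 1
        return x, n

    # f(x) = cos x is a contraction on S = [0, 1] with rho = sin(1) < 1
    x_star, n = fixed_point(math.cos, 0.5, math.sin(1.0))
    print(x_star, n)     # approximately 0.739085..., the unique fixed point of cos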

Remark 5.1 For certain functions f : S → V , sometimes one may


be able to show that kf (x) − f (y)k ≤ ρkx − yk for all x, y ∈ S for
some ρ > 0, but the condition “f (x) ∈ S for all x ∈ S” may not
be satisfied. In this case the above theorem cannot be applied. For
example, suppose f : [1, 2] → R be defined by f (x) = x/2. Then, we
have |f (x) − f (y)| = ρ|x − y| with ρ = 1/2, but f (3/2) = 3/4 6∈ [1, 2].
Now, suppose that kf (x) − f (y)k ≤ ρkx − yk for all x, y ∈ S and
if we also know that

(i) f has a fixed point x∗ , and


(ii) S contains Dr := {x ∈ V : kx − x∗ k ≤ r} for some r > 0.
Then it follows that f (x) ∈ Dr for all x ∈ Dr . Indeed, for x ∈ Dr ,

kf (x) − x∗ k = kf (x) − f (x∗ )k ≤ ρkx − x∗ k < kx − x∗ k ≤ r.

Thus, under the additional assumptions (i) and (ii), we can generate
the iterations with any x0 ∈ Dr .
Remark 5.2 In order to achieve a certain accuracy of the approximation, say for the error kx − xn k to be at most ε > 0, we have to take n large enough so that

(ρ^n/(1 − ρ)) kx1 − x0 k < ε,

that is, the error satisfies kx − xn k ≤ ε for all n satisfying

n ≥ log( kx1 − x0 k / (ε(1 − ρ)) ) / log(1/ρ).

5.1 Iterative Methods for Solving Ax = b


Suppose A ∈ Rn×n and b ∈ Rn . We would like to convert the problem
of solving Ax = b into that of finding a fixed point of certain other
system. In this regard, the following result is of great use.
Theorem 5.3 Let C ∈ Rn×n and d ∈ Rn . Let x(0) ∈ Rn be given
and x(k) ∈ Rn be defined iteratively as

x(k) = Cx(k−1) + d,   k = 1, 2, . . . .

If kCk < 1, then (x(k) ) converges to a (unique) fixed point of the


system x = Cx + d and
kx − x(k) k ≤ (ρ^k/(1 − ρ)) kx(1) − x(0) k,   ρ := kCk.
Proof. Let F : Rn → Rn be defined by

F (x) = Cx + d, x ∈ Rn .

Then we have

kF (x) − F (y)k = kC(x − y)k ≤ kCk kx − yk.

Hence, the result follows from Theorem 5.2.



Now let A ∈ Rn×n and b ∈ Rn . Note that for x ∈ Rn ,

Ax = b ⇐⇒ x = (I − A)x + b.

Hence, from Theorem 5.3, the iterations

x(k) = (I − A) x(k−1) + b,   k = 1, 2, . . .

converge to the unique solution of Ax = b, provided kI − Ak < 1.


EXAMPLE 5.1 Consider the system Ax = b with
 3 1 1   1 1 1

2 5 4 2 5 4
1 3 2 1 1 2
A= 4 2 9
 so that A−I = 4 2 9

1 1 4 1 1
6 3 5 6 3 − 15

Thus, it follows that kI − Ak∞ = 35/36 and kI − Ak1 = 31/30.


Thus, the error estimates for the above described iterative procedure
for this example is valid if we take k · k∞ on R3 , but not with k · k1 .
The idea of resorting to an iterative procedure for finding approx-
imate solution of Ax = b is when it is not easy to solve it exactly.
Suppose we can write A = A1 + A2 , where the system A1 x = v can
be solved easily. Then, we may write Ax = b as A1 x = b − A2 x, so
that the system Ax = b is equivalent to the system

x = A1^{-1} b − A1^{-1} A2 x.

Suppose, for a given x(0) , we define x(k) by

A1 x(k) = b − A2 x(k−1) , k = 1, 2, . . . .

Then, by Theorem 5.3, if kA1^{-1} A2 k < 1, then (x(k)) converges to a unique solution of x = A1^{-1} b − A1^{-1} A2 x, which is the same as Ax = b.

5.1.1 Jacobi Method


Let A = (aij ) be an n × n matrix. In Jacobi method, we assume that
aii 6= 0 for all i = 1, . . . , n, and define the Jacobi iterations by

xi(k) = (1/aii) [ bi − Σ_{j≠i} aij xj(k−1) ],   k = 1, 2, . . .

for i = 1, . . . , n. This is equivalent to splitting of A as A = A1 +


A2 with A1 being the diagonal matrix consisting of the diagonal

entries of A. Thus, convergence of the Jacobi iterations to the unique


solution of Ax = b is ensured if kA1^{-1} A2 k < 1. If we take the norm k · k∞ on Rn , then we have

kA1^{-1} A2 k∞ = max_i (1/|aii|) Σ_{j≠i} |aij|.

Hence, the required condition is

Σ_{j≠i} |aij| < |aii|   ∀ i = 1, . . . , n.

EXAMPLE 5.2 Consider the system Ax = b with


 
A =
[ 9  1   1 ]
[ 2  10  3 ]
[ 3  4   11 ].

For applying the Jacobi method, we take

A1 =
[ 9  0   0 ]
[ 0  10  0 ]
[ 0  0   11 ]
and
A2 =
[ 0  1  1 ]
[ 2  0  3 ]
[ 3  4  0 ].

We see that

A1^{-1} =
[ 1/9  0     0 ]
[ 0    1/10  0 ]
[ 0    0     1/11 ]
so that
A1^{-1} A2 =
[ 0     1/9   1/9 ]
[ 1/5   0     3/10 ]
[ 3/11  4/11  0 ].

Thus, it follows that kA1^{-1} A2 k∞ = 7/11 and kA1^{-1} A2 k1 = 47/99. Thus, the error estimates for the above described iterative procedure for this example are valid if we take either k · k∞ or k · k1 on R3 . For instance, taking k · k1 on R3 and ρ = 47/99, we have

kx − x(k) k1 ≤ (ρ^k/(1 − ρ)) kx(1) − x(0) k1 = (99/52) (47/99)^k kx(1) − x(0) k1 .
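
A direct implementation of the Jacobi iterations is straightforward; the following Python sketch uses the matrix above, with an arbitrary right-hand side b chosen only for the demonstration (the example itself specifies only A):

    import numpy as np

    def jacobi(A, b, x0=None, n_iter=25):
        # Jacobi iterations: x_i^(k) = (b_i - sum_{j != i} a_ij x_j^(k-1)) / a_ii
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        D = np.diag(A)                 # the diagonal entries a_ii (the matrix A1)
        R = A - np.diag(D)             # the off-diagonal part (the matrix A2)
        x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            x = (b - R @ x) / D
        return x

    A = [[9.0, 1.0, 1.0], [2.0, 10.0, 3.0], [3.0, 4.0, 11.0]]
    b = [10.0, 19.0, 0.0]              # arbitrary right-hand side for the demonstration
    print(jacobi(A, b))
    print(np.linalg.solve(np.array(A), np.array(b)))   # exact solution, for comparison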
5.1.2 Gauss-Seidel Method
Let A = (aij ) be an n × n matrix. In this method also, we assume
that aii 6= 0 for all i = 1, . . . , n. In this case we view the system
Ax = b as
A1 x = b − A2 x,

where A1 is the lower triangular part of A including the diagonal,


and A2 = A − A1 , i.e.,
   
A1 =
[ a11  0    0    . . .  0 ]
[ a21  a22  0    . . .  0 ]
[ ·    ·    ·    . . .  · ]
[ an1  an2  an3  . . .  ann ],
A2 =
[ 0  a12  a13  . . .  a1n ]
[ 0  0    a23  . . .  a2n ]
[ ·  ·    ·    . . .  · ]
[ 0  0    0    . . .  0 ].

Thus, the Gauss-Seidel iterations are defined by

xi(k) = (1/aii) [ bi − Σ_{j<i} aij xj(k) − Σ_{j>i} aij xj(k−1) ],   k = 1, 2, . . .

for i = 1, . . . , n. The convergence of the Gauss-Seidel iterations to the unique solution of Ax = b is ensured if kA1^{-1} A2 k < 1.

EXAMPLE 5.3 Again consider the system Ax = b with


 
A =
[ 9  1   1 ]
[ 2  10  3 ]
[ 3  4   11 ].

For applying the Gauss-Seidel method, we take


   
A1 =
[ 9  0   0 ]
[ 2  10  0 ]
[ 3  4   11 ]
and
A2 =
[ 0  1  1 ]
[ 0  0  3 ]
[ 0  0  0 ].

We see that
 1   1 1

9 0 0 0 9 9
A−1
1 =
 −1
45
1
10 0  so that −1 1
A1 A2 =  0 − 45 5 
− 18 .
1 2 1 1 13
− 45 − 55 11 0 − 45 − 99

Thus, it follows that kA1^{-1} A2 k∞ = 3/10 and kA1^{-1} A2 k1 = 103/198 > 3/10. Thus, the error estimates for the above described iterative procedure for this example are valid if we take either k · k∞ or k · k1 on R3 . For instance, taking k · k∞ on R3 and ρ = 3/10, we have

kx − x(k) k∞ ≤ (ρ^k/(1 − ρ)) kx(1) − x(0) k∞ = (10/7) (3/10)^k kx(1) − x(0) k∞ .
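
A corresponding Python sketch for the Gauss-Seidel iterations (same matrix, same arbitrary right-hand side as in the Jacobi sketch above), updating each component with the newest available values:

    import numpy as np

    def gauss_seidel(A, b, x0=None, n_iter=25):
        # Gauss-Seidel iterations: components j < i use the already updated values
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        n = len(b)
        x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
        for _ in range(n_iter):
            for i in range(n):
                s1 = A[i, :i] @ x[:i]          # uses x_j^(k),   j < i
                s2 = A[i, i + 1:] @ x[i + 1:]  # uses x_j^(k-1), j > i
                x[i] = (b[i] - s1 - s2) / A[i, i]
        return x

    A = [[9.0, 1.0, 1.0], [2.0, 10.0, 3.0], [3.0, 4.0, 11.0]]
    b = [10.0, 19.0, 0.0]                      # same arbitrary right-hand side as above
    print(gauss_seidel(A, b))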

5.2 Newton’s Method for Solving f (x) = 0


Suppose f : [a, b] → R is a function having a zero x∗ ∈ [a, b], i.e.,
f (x∗ ) = 0. In practice, the exact location of x∗ may not be known,
but an (initial) approximation of x∗ , say x0 may be known. The idea
of Newton’s method is to find better approximations for x∗ in an
iterative manner. For this first we assume that
• f is differentiable at every x ∈ [a, b], and f 0 (x) 6= 0 for every
x ∈ [a, b].
The idea is to choose an initial point x0 ∈ [a, b], and find a point
x1 as the point of intersection of the tangent at x0 with the x-axis.
Thus x1 has to satisfy
f'(x0) = (f(x0) − 0)/(x0 − x1) = f(x0)/(x0 − x1),

i.e., x1 is defined by

x1 := x0 − f(x0)/f'(x0).

Now, repeat the above procedure with x1 in place of x0 to get a new point

x2 := x1 − f(x1)/f'(x1).

In general, we define

xn := xn−1 − f(xn−1)/f'(xn−1),   n = 1, 2, . . . .
There arises some questions:
• Does each xn belong to [a, b]?
• Does the sequence (xn ) converge to x∗ ?
In order to answer the above questions we define a new function
g : [a, b] → R by
g(x) := x − f(x)/f'(x),   x ∈ [a, b].
Theorem 5.4 Suppose that f : [a, b] → R is twice continuously
differentiable at every x ∈ [a, b], and that there exists x∗ ∈ [a, b] such
that f (x∗ ) = 0. Then there exists a closed interval J0 ⊆ [a, b] such
that g : J0 → J0 is a contraction.

Proof. Note that, under the above assumption, the function g is


continuously differentiable at every x ∈ [a, b], and

g'(x) = f(x) f''(x) / [f'(x)]^2,   x ∈ [a, b].

Now, by mean value theorem, for every x, y ∈ [a, b], there exists ξx,y
in the interval whose end points are x and y, such that

g(x) − g(y) = g 0 (ξx,y )(x − y).

Hence, g is a contraction in an interval J0 if there exists ρ such


that 0 < ρ < 1 and |g 0 (ξx,y )| ≤ ρ for all x, y ∈ J0 . Note that the
function g 0 is continuous in [a, b] and g 0 (x∗ ) = 0. Hence, for every ρ
with 0 < ρ < 1, there exists a closed interval J0 ⊆ [a, b] containing x∗ such that
|g'(x)| ≤ ρ for all x ∈ J0 .

Assume that f : [a, b] → R is twice continuously differentiable at


every x ∈ [a, b], and let J0 be as in the above theorem. Then, taking
x0 ∈ J0 , the sequence (xn ) defined earlier converges to x∗ , and

|x∗ − xn | ≤ ρ |x∗ − xn−1 | ≤ (ρ^n/(1 − ρ)) |x1 − x0 |   ∀ n ∈ N.
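
A minimal Python sketch of these Newton iterations (illustrative; applied here to f(x) = x^2 − 2, whose positive zero is √2, with the common stopping rule |xn − xn−1| ≤ tol):

    def newton(f, fprime, x0, tol=1e-10, max_iter=50):
        # Newton iterations x_n = x_{n-1} - f(x_{n-1}) / f'(x_{n-1})
        x = x0
        for _ in range(max_iter):
            x_new = x - f(x) / fprime(x)
            if abs(x_new - x) <= tol:
                return x_new
            x = x_new
        return x

    # example: the positive zero of f(x) = x^2 - 2 is sqrt(2)
    print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.5))  # 1.41421356...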

5.2.1 Error Estimates


Suppose f : [a, b] → R is a twice continuously differentiable function,
and there exits x∗ ∈ [a, b] such that f (x∗ ) = 0. We have already
seen that for any given ρ ∈ (0, 1), there exists a closed interval J0
centered at x∗ such that x0 ∈ J0 implies xn ∈ J0 for all n ∈ N, and
|x∗ − xn | ≤ ρ |x∗ − xn−1 | ≤ (ρ^n/(1 − ρ)) |x1 − x0 |   ∀ n ∈ N.
Now we see that a better convergence estimate is possible.
Assume that f 0 (x) 6= 0 for all x ∈ [a, b]. Let us assume for the
time being that the sequence (xn ) given iteratively as follows is well-
defined:
xn+1 := xn − f(xn)/f'(xn),   n = 0, 1, 2, . . . .
Then, by mean value theorem, we have

0 = f(x∗) = f(xn) + (x∗ − xn) f'(xn) + (x∗ − xn)^2 f''(ξn)/2

so that

0 = f(xn)/f'(xn) + (x∗ − xn) + (x∗ − xn)^2 f''(ξn)/(2 f'(xn)).

Now, by the definition of xn+1 ,

0 = (xn − xn+1) + (x∗ − xn) + (x∗ − xn)^2 f''(ξn)/(2 f'(xn)),

so that

x∗ − xn+1 = −(x∗ − xn)^2 f''(ξn)/(2 f'(xn)).
From the above relation, it is clear that if J0 is as in Theorem
5.4, and if we know that there exists a constant κ > 0 such that
|f 00 (x)/2f 0 (y)| ≤ κ for all x, y ∈ J0 , then

|x∗ − xn+1 | ≤ κ|x∗ − xn |2 ∀n.

Another way of looking at the issue of error estimates is the


following: Since f 0 (x∗ ) 6= 0, there exists δ > 0 such that J1 :=
[x∗ − δ, x∗ + δ] ⊆ [a, b] and |f 0 (x)| ≥ |f 0 (x∗ )|/2 for all x ∈ J1 . Let
M > 0 be such that |f 00 (x)| ≤ M for all x ∈ J1 . Hence,

|f''(x)/(2 f'(y))| ≤ κ0 := M/|f'(x∗)|   ∀ x, y ∈ J1 .

Then we see that


 2  2 n
κ0 |x∗ − xn+1 | ≤ κ0 |x∗ − xn | ≤ κ0 |x∗ − x0 | ∀ n.

Thus, it is seen that if x0 ∈ J2 := J1 ∩ {x : |x∗ − x| < 1/κ0 }, then


xn ∈ J2 for all n ∈ N (xn ) converges to x∗ as n → ∞. Moreover, we
have the error estimate
|x∗ − xn+1 | ≤ κ0 |x∗ − xn |^2 ≤ κ0^{2^{n+1} − 1} |x∗ − x0 |^{2^{n+1}}   ∀ n.

Exercise 5.1 1. Consider the equation f (x) := x6 − x − 1 = 0.


Apply Newton’s method for this equation and find xn , f (xn )
and xn − xn−1 for n = 1, 2, 3, 4 with initial guesses (i) x0 = 1.0,
(ii) x0 = 1.5, (iii) x0 = 2.0. Compare the results.

2. Using Newton’s iterations, find approximations for the roots of


the following equations with an error tolerance, |xn − xn−1 | ≤
10−6 :
(i) x3 − x2 − x − 1 = 0, (ii) x = 1 + 0.3 cos(x), (iii) x = e−x .

3. Write Newton’s iterations for the problem of finding 1/b for a


number b > 0 with x0 > 0.

4. Show that the Newton’s iterations for finding approximations



for a for a > 0 has the error formula:
√ √
a − xn+1 = − 2x1n ( a − xn )2 .

5. Using Newton’s method find approximations for m-th root of


2 for six significant digits, for m = 2, 3, 4, 5.
6
Interpolation and Numerical
Integration

6.1 Interpolation
The idea of interpolation is to find a function ϕ which takes certain
prescribed values β1 , β2 , . . . , βn at a given set of points t1 , t2 , . . . , tn .
In application the values β1 , β2 , . . . , βn may be values of certain un-
known function f at t1 , t2 , . . . , tn respectively. The function ϕ is to
be of some simple form for computational purposes. Thus, the in-
terpolation problem is to find a function ϕ such that ϕ(ti ) = βi ,
i = 1, . . . , n.
Usually, one looks for ϕ in the span of certain known functions
u1 , . . . , un . Thus, the interpolation problem is to find scalars α1 , . . . , αn such that the function ϕ := Σ_{j=1}^n αj uj satisfies ϕ(ti ) = βi for i = 1, . . . , n, i.e., to find α1 , . . . , αn such that

Σ_{j=1}^n αj uj (ti ) = βi ,   i = 1, . . . , n.

Obviously, the above problem has a unique solution if and only if the
matrix [uj (ti )] is invertible. Thus we have the following theorem.

Theorem 6.1 Suppose u1 , . . . , un are functions defined on [a, b],


and t1 , . . . , tn are points in [a, b]. Then there exists a unique ϕ ∈
span {u1 , . . . , un } satisfying ϕ(ti ) = βi for i = 1, . . . , n if and only if
the matrix [uj (ti )] is invertible.

Exercise 6.1 Suppose u1 , . . . , un are functions defined on [a, b], and


t1 , . . . , tn are points in [a, b]. Show that, if the matrix [uj (ti )] is
invertible, then u1 , . . . , un are linearly independent.
Hint: A square matrix is invertible if and only if its columns are
linearly independent.


Exercise 6.2 Suppose u1 , . . . , un are functions defined on [a, b], and


t1 , . . . , tn are points in [a, b] such that the matrix [uj (ti )] is invertible.
If v1 , . . . , vn are linearly independent functions in span {u1 , . . . , un },
then show that the matrix [vj (ti )] is also invertible.
Hint: Let X0 := span {u1 , . . . , un }. Then observe that, if the
matrix [uj (ti )] is invertible, then the function J : X0 → Rn defined
by J(x) = [x(t1 ), . . . , x(tn ]T is bijective.
Exercise 6.3 Let t1 , . . . , tn be distinct points in R, and for each
j ∈ {1, 2, . . . , n}, let
`j (t) = Π_{i≠j} (t − ti)/(tj − ti).

Then show that {`1 , . . . , `n } is a basis of Pn−1 , and it satisfies `j (ti ) =


δij for all i, j = 1, . . . , n. Deduce from the previous exercise that the
matrix [t_j^{i−1}] is invertible.

In general, if t1 , . . . , tn are distinct points in [a, b], and if u1 , . . . , un are functions which satisfy uj (ti ) = δij , then the function ϕ(t) := Σ_{j=1}^n βj uj (t) satisfies ϕ(ti ) = βi . Thus, if t1 , . . . , tn are distinct points in [a, b], and if u1 , . . . , un are functions which satisfy uj (ti ) = δij , the interpolation function of f : [a, b] → R, associated with the nodes t1 , . . . , tn and the basis functions uj ’s, is

ϕ(t) := Σ_{j=1}^n f(tj) uj (t),   a ≤ t ≤ b.

EXAMPLE 6.1 Let t1 , . . . , tn be distinct points in [a, b]. Define


u1 , . . . , un as follows:

u1 (t) = 1 if a ≤ t ≤ t1 ;   u1 (t) = (t − t2)/(t1 − t2) if t1 ≤ t ≤ t2 ;   u1 (t) = 0 elsewhere;

un (t) = 1 if tn ≤ t ≤ b;   un (t) = (t − tn−1)/(tn − tn−1) if tn−1 ≤ t ≤ tn ;   un (t) = 0 elsewhere;

and for 2 ≤ j ≤ n − 1,

uj (t) = (t − tj−1)/(tj − tj−1) if tj−1 ≤ t ≤ tj ;   uj (t) = (t − tj+1)/(tj − tj+1) if tj ≤ t ≤ tj+1 ;   uj (t) = 0 elsewhere.


Because of their shapes u1 , . . . , un are called hat functions. Note that


uj (ti ) = δij . In this case the interpolation function ϕ is the polygonal
line passing through the points (t1 , f (t1 )), . . . , (tn , f (tn )). In this case it is also true that Σ_{j=1}^n uj (t) = 1 for all t ∈ [a, b]. Hence,

f(t) − ϕ(t) = Σ_{j=1}^n [f(t) − ϕ(tj)] uj (t).

Note that, for ti−1 ≤ t ≤ ti ,

f(t) − ϕ(t) = Σ_{j=i−1}^{i} [f(t) − ϕ(tj)] uj (t).

6.1.1 Lagrange Interpolation


By Exercise 6.3, {`1 , . . . , `n } is a basis of Pn−1 , and it satisfies `j (ti ) =
δij for all i, j = 1, . . . , n. Hence, by Theorem 6.1, it is clear that,
given distinct points t1 , . . . , tn in [a, b], and numbers β1 , . . . , βn , there
exists a unique polynomial Ln (t) ∈ Pn−1 such that Ln (ti ) = βi for
i = 1, . . . , n, and it is given by
Ln (t) := Σ_{j=1}^n βj `j (t),   a ≤ t ≤ b.

The above polynomial is called the Lagrange interpolating polyno-


mial, and the functions `1 , . . . , `n are called Lagrange basis polyno-
mials.
It can be seen that the Lagrange basis polynomials `j (t) also satisfy Σ_{j=1}^n `j (t) = 1. Hence, if Ln is the Lagrange interpolating polynomial of a function f associated with the nodes t1 , . . . , tn , i.e., Ln (t) := Σ_{j=1}^n f(tj) `j (t), then

f(t) − Ln (t) = Σ_{j=1}^n [f(t) − f(tj)] `j (t).

The following theorem can be seen in any standard text book on


Numerical Analysis.
Theorem 6.2 If f is n times continuously differentiable on the interval [a, b], and if Ln is the Lagrange interpolating polynomial of f associated with the nodes t1 , . . . , tn , then for each t ∈ [a, b] there exists ξ ∈ (a, b) such that

f(t) − Ln (t) = ( f^{(n)}(ξ) / n! ) (t − t1)(t − t2) · · · (t − tn).
By the above theorem,

|f(t) − Ln (t)| ≤ ( kf^{(n)} k∞ / n! ) (b − a)^n.

Note that (b − a)^n / n! → 0 as n → ∞. Thus, if f is sufficiently smooth, then we can expect that Ln is close to f for sufficiently large n.
Although we have a nice result in the above theorem, it is not at all clear whether Ln is close to f whenever the maximum width of the subintervals is close to zero. In fact it is known that
• if, for each n ∈ N, t1^{(n)}, . . . , tn^{(n)} are points in [a, b], then there exists f ∈ C[a, b] such that kf − Ln k∞ 6→ 0 as n → ∞, where Ln (t) := Σ_{j=1}^n f(tj^{(n)}) `j (t).
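
The Lagrange form is straightforward to evaluate directly from the definition of the basis polynomials `j ; the following Python sketch does so (the nodes and the sample function are chosen only for illustration):

    def lagrange_interpolate(nodes, values, t):
        # evaluate L_n(t) = sum_j values[j] * l_j(t),
        # where l_j(t) = prod_{i != j} (t - t_i) / (t_j - t_i)
        total = 0.0
        for j, tj in enumerate(nodes):
            lj = 1.0
            for i, ti in enumerate(nodes):
                if i != j:
                    lj *= (t - ti) / (tj - ti)
            total += values[j] * lj
        return total

    # interpolate f(t) = 1/(1 + t) at the nodes 0, 1/2, 1 and evaluate at t = 0.3
    nodes = [0.0, 0.5, 1.0]
    values = [1.0 / (1.0 + s) for s in nodes]
    print(lagrange_interpolate(nodes, values, 0.3), 1.0 / 1.3)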

6.1.2 Piecewise Lagrange Interpolation


The disadvantage of Lagrange interpolation polynomial is that for
large n, more computations are involved in obtaining the coefficients
of the polynomial, and even for large n, it is not guaranteed that
Ln (t) is close to f for less smooth functions f (t). One way to sur-
mount this problem is to divide the interval [a, b] into n equal parts,
say by a partition a = a0 < a1 < a2 < . . . < an = b. In each subin-
terval Ii = [ai−1 , ai ] we consider the Lagrange interpolation of the
function. For this we consider points τi1 , . . . , τik in Ii , and for t ∈ Ii ,
i = 1, . . . , n, define
pi,n (t) := Σ_{j=1}^k f(τij) `ij (t),   `ij (t) = Π_{m≠j} (t − τim)/(τij − τim),

and pn (t) = pi,n (t) whenever t ∈ Ii . Note that pn is a function on


[a, b] such that for each i ∈ {1, . . . , n}, pn |Ii is a polynomial in Pk−1 .
Such a function pn is called a spline. If τi1 = ai−1 and τik = ai , then
we see that pn is a continuous function on [a, b].
Instead of taking arbitrary points τi1 , . . . , τik in Ii one may choose
them by mapping a fixed number of points τ1 , . . . , τk in [−1, 1] to
each subinterval Ii by using functions gi : [−1, 1] → Ii so as to

obtain τim = gi (τm ) for m = 1, . . . , k. Points τ1 , . . . , τk in [−1, 1]


chosen as the zeros of a certain orthogonal polynomial of degree k have
some advantages over other types of points.
If f is k times continuously differentiable, then by Theorem 6.2, for each i ∈ {1, . . . , n} and t ∈ Ii , there exists ξi ∈ Ii such that

f(t) − pn (t) = ( f^{(k)}(ξi) / k! ) Π_{m=1}^k (t − τim).

Hence, for every t ∈ Ii ,

|f(t) − pn (t)| ≤ ( kf^{(k)} k∞ / k! ) sup_{ai−1 ≤ s ≤ ai} Π_{m=1}^k |s − τim|.

In particular, if ai − ai−1 = hn := (b − a)/n for all i ∈ {1, . . . , n}, then

kf − pn k∞ ≤ ( kf^{(k)} k∞ / k! ) hn^k.
Note that hn → 0 as n → ∞. Thus, for large enough n, pn is an
approximation of f .

6.2 Numerical Integration


The idea involved in numerical integration of a (Riemann integrable) function f : [a, b] → R is to replace the integral ∫_a^b f(t) dt by another integral ∫_a^b ϕ(t) dt, where ϕ is an interpolation of f based on certain points in [a, b]. Thus, numerical integration formulas are of the form

Σ_{j=1}^n f(tj) wj ,

where t1 , . . . , tn are called the nodes and w1 , . . . , wn are called the


weights of the formula. Numerical integration formulas are also called
quadrature rules.
Suppose u1 , . . . , un are functions such that uj (ti ) = δij for i, j = 1, . . . , n, and let ϕ be the interpolation of f based on t1 , . . . , tn and u1 , . . . , un , i.e., ϕ(t) = Σ_{j=1}^n f(tj) uj (t). Then

∫_a^b ϕ(t) dt = Σ_{j=1}^n f(tj) wj   with   wj = ∫_a^b uj (t) dt.

The above quadrature rule is called an interpolatory quadrature


rule. Here are some special cases of interpolatory quadrature rules.

6.2.1 Trapezoidal rule


Suppose we approximate f by the interpolation polynomial ϕ of de-
gree atmost 1 based on the points a and b, i.e., we approximate the
graph of f by the straight line joining (a, f (a)) and (b, f (b)). Then
we see that

∫_a^b ϕ(t) dt = ((b − a)/2) [f(a) + f(b)].
This quadrature rule is called the trapezoidal rule.
EXAMPLE 6.2 Consider f (t) = 1/(1 + t) for 0 ≤ t ≤ 1. Then

∫_0^1 ϕ(t) dt = (1/2)[f(0) + f(1)] = (1/2)[1 + 1/2] = 3/4 = 0.75.

We know that ∫_0^1 dt/(1 + t) = ln(2) ≈ 0.693147. Thus,

Error = ln(2) − ∫_0^1 ϕ(t) dt ≈ −0.056852819.

6.2.2 Composite Trapezoidal rule


Let a = a0 < a1 < . . . < an = b be a partition of [a, b]. Suppose
we approximate f by the piecewise interpolation polynomial ϕ which
is of degree atmost 1 in each subinterval [ai−1 , ai ] for i = 1, . . . , n,
based on the points ai−1 and ai , i.e., we approximate the graph of
f by a polygonal line passing through the points (ai , f (ai )) for
i = 0, 1, 2, . . . , n. Then we see that
∫_a^b ϕ(t) dt = Σ_{i=1}^n ∫_{ai−1}^{ai} ϕ(t) dt = Σ_{i=1}^n ((ai − ai−1)/2) [f(ai−1) + f(ai)].

This quadrature rule is called the composite trapezoidal rule. In par-


ticular, if hn := ai − ai−1 = (b − a)/n, then
∫_a^b ϕ(t) dt = hn [ f(a0)/2 + f(a1) + . . . + f(an−1) + f(an)/2 ].

EXAMPLE 6.3 Consider f (t) = 1/(1 + t) for 0 ≤ t ≤ 1. Taking n = 2, hn = 1/2 and

∫_0^1 ϕ(t) dt = (1/2) [ f(0)/2 + f(1/2) + f(1)/2 ] = (1/2) [ 1/2 + 2/3 + 1/4 ] = 17/24 = 0.70833 . . . .

Error = ln(2) − ∫_0^1 ϕ(t) dt ≈ −0.01518.
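
A short Python sketch of the composite trapezoidal rule (illustrative), reproducing the value of Example 6.3 for f(t) = 1/(1 + t) on [0, 1] with n = 2:

    def composite_trapezoid(f, a, b, n):
        # composite trapezoidal rule with n equal subintervals of width h = (b - a)/n
        h = (b - a) / n
        s = 0.5 * (f(a) + f(b))
        for i in range(1, n):
            s += f(a + i * h)
        return h * s

    f = lambda t: 1.0 / (1.0 + t)
    print(composite_trapezoid(f, 0.0, 1.0, 2))     # 0.70833..., as in Example 6.3
    print(composite_trapezoid(f, 0.0, 1.0, 100))   # close to ln(2) = 0.693147...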

6.2.3 Simpson’s rule


Suppose we approximate f by the interpolation polynomial ϕ of de-
gree atmost 2 based on the points a, c := (a + b)/2 and b, i.e., we
approximate the graph of f by a quadratic polynomial ϕ. We know
that

ϕ(t) = f(a) (t − c)(t − b)/((a − c)(a − b)) + f(c) (t − a)(t − b)/((c − a)(c − b)) + f(b) (t − a)(t − c)/((b − a)(b − c)).

If we take h = (b − a)/2, then we see that

∫_a^b (t − c)(t − b)/((a − c)(a − b)) dt = h/3,

∫_a^b (t − a)(t − b)/((c − a)(c − b)) dt = 4h/3,

∫_a^b (t − a)(t − c)/((b − a)(b − c)) dt = h/3.
Hence,
∫_a^b ϕ(t) dt = (h/3) [ f(a) + 4f(c) + f(b) ].
This quadrature rule is called the Simpson’s rule.
EXAMPLE 6.4 Consider f (t) = 1/(1 + t) for 0 ≤ t ≤ 1. Taking h = 1/2,

∫_0^1 ϕ(t) dt = (h/3) [ f(0) + 4f(1/2) + f(1) ] = (1/6) [ 1 + 4 · (2/3) + 1/2 ] = 25/36 ≈ 0.69444.

Error = ln(2) − ∫_0^1 ϕ(t) dt ≈ −0.001297.

6.2.4 Composite Simpson’s rule


Consider the partition a = a0 < a1 < . . . < an = b of [a, b], where n is an even number, and hn := ai − ai−1 = (b − a)/n for every i = 1, 2, . . . , n.
Suppose we approximate f by the piecewise interpolation polynomial
ϕ which is of degree atmost 2 in each subinterval [a2i , a2i+2 ] based
on the points a2i , a2i+1 and a2i+2 for i = 0, 1, 2, . . . , k where 2k = n.
Then we see that
∫_a^b ϕ(t) dt = Σ_{i=0}^{k−1} ∫_{a2i}^{a2i+2} ϕ(t) dt = Σ_{i=0}^{k−1} (hn/3) [ f(a2i) + 4f(a2i+1) + f(a2i+2) ].

Thus,
∫_a^b ϕ(t) dt = (hn/3) [ f(a0) + 4f(a1) + 2f(a2) + 4f(a3) + 2f(a4) + . . . + 2f(an−2) + 4f(an−1) + f(an) ]
             = (hn/3) [ f(a0) + 4 Σ_{i=1}^{k} f(a2i−1) + 2 Σ_{i=1}^{k−1} f(a2i) + f(an) ].

This quadrature rule is called the composite Simpson’s rule.


EXAMPLE 6.5 Consider f (t) = 1/(1 + t) for 0 ≤ t ≤ 1. Taking n = 2k = 4, hn = 1/4 and

∫_0^1 ϕ(t) dt = (hn/3) [ f(a0) + 4f(a1) + 2f(a2) + 4f(a3) + f(a4) ]
             = (1/12) [ f(0) + 4f(1/4) + 2f(1/2) + 4f(3/4) + f(1) ]
             = (1/12) [ 1 + 16/5 + 4/3 + 16/7 + 1/2 ]
             = (1/12) × (1747/210) ≈ 0.693253968.

Error = ln(2) − ∫_0^1 ϕ(t) dt ≈ −0.000106787.
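
Similarly, a Python sketch of the composite Simpson's rule (illustrative; n must be even), reproducing the value of Example 6.5:

    def composite_simpson(f, a, b, n):
        # composite Simpson's rule; n must be even, h = (b - a)/n
        if n % 2 != 0:
            raise ValueError("n must be even")
        h = (b - a) / n
        s = f(a) + f(b)
        for i in range(1, n):
            s += (4 if i % 2 == 1 else 2) * f(a + i * h)
        return h * s / 3.0

    f = lambda t: 1.0 / (1.0 + t)
    print(composite_simpson(f, 0.0, 1.0, 4))   # 0.693253968..., as in Example 6.5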

Exercise 6.4 Apply the trapezoidal rule, composite trapezoidal rule, Simpson's rule, and composite Simpson's rule for approximating the following integrals:

∫_0^1 e^{−t^2} dt,   ∫_0^4 dt/(1 + t^2),   ∫_0^{2π} dt/(2 + cos(t)).
7
Additional Exercises

In the following V denotes a vector space over F which is R or C.

1. Let V be a vector space. For x, y ∈ V , show that x + y = x


implies y = θ.

2. Suppose that x ∈ V is a nonzero vector. Then show that


αx 6= βx for every α, β ∈ F with α 6= β.

3. Let R[a, b] be the set of all real valued Riemann integrable


functions on [a, b]. Show that R[a, b] is a vector space over R.

4. Let V be the set of all polynomials of degree 3. Is it a vector


space with respect to the usual addition and scalar multiplica-
tion?

5. Let S be a nonempty set, s0 ∈ S. Show that the set V of all


functions f : S → R such that f (s0 ) = 0 is a vector space
with respect to the usual addition and scalar multiplication of
functions.

6. Find a bijective linear transformation between Fn and Pn−1 .

7. Let V be the set of real sequences with only a finite number of


nonzero entries. Show that V is a vector space over R and find
a bijective map T : V → P which also satisfies T (x + αy) =
T (x) + αT (y) for all x, y ∈ V and α ∈ R.

8. In each of the following, a set V is given and some operations


are defined. Check whether V is a vector space with these
operations:

(a) Let V = {x = (x1 , x2 ) ∈ R2 : x2 = 0} with addition and


scalar multiplication as in R2 .


(b) Let V = {x = (x1 , x2 ) ∈ R2 : 2x1 +3x2 = 0} with addition


and scalar multiplication as in R2 .
(c) Let V = {x = (x1 , x2 ) ∈ R2 : x1 + x2 = 1} with addition
and scalar multiplication as for R2 .
(d) Let V = R2 , F = R. For x = (x1 , x2 ), y = (y1 , y2 ), let
x + y := (x1 + y1 , x2 + y2 ) and for all α ∈ R,

(0, 0) α = 0,
αx :=
(αx1 , x2 /α), α 6= 0.

(e) Let V = C2 , F = C. For x = (x1 , x2 ), y = (y1 , y2 ), let

x+y := (x1 +2y1 , x2 +3y2 ) and αx := (αx1 , αx2 ) ∀α ∈ C.

(f) Let V = R2 , F = R. For x = (x1 , x2 ), y = (y1 , y2 ), let

x + y := (x1 + y1 , x2 + y2 ) and αx := (x1 , 0) ∀α ∈ R.

9. Let A ∈ Rn×n , O is the zero in Rn×1 . Show that the set V0 of


all n × 1 matrices X such that AX = O, is a subspace of
Rn×1 .

10. Let V be the space of all sequences of real numbers, and let
`1 (N) be the set of all absolutely convergent real sequences.
Show that `1 (N) is a subspace of V .

11. Let V be the space of all sequences of real numbers, and let
`∞ (N) be the set of all bounded sequences of real numbers.
Show that `∞ (N) is a subspace of the space of V .

12. For a nonempty set S, V be the set of all functions from S to


R, and let let B(S) be the set of all bounded functions on S.
Show that B(S) is a subspace of V

13. Suppose V0 is a subspace of a vector space V , and V1 is a


subspace of V0 . Then show that V1 is a subspace of V .

14. Give an example to show that union of two subspaces need not
be a subspace.

15. Let S be a subset of a vector space V . Show that S is a subspace


if and only if S = span S.

16. Let V be a vector space. Show that the following hold.
(i) Let S be a subset of V . Then span S is the intersection of
all subspaces of V containing S.

(ii) Suppose V0 is a subspace of V and x0 ∈ V \ V0 . Then for


every x ∈ span ({x0 } ∪ V0 ), there exists a unique α ∈ F and y ∈ V0
such that x = αx0 + y.

17. Show that

(a) Pn is a subspace of Pm for n ≤ m,


(b) C[a, b] is a subspace of R[a, b],
(c) C k [a, b] is a subspace of C[a, b].

18. For each λ in the open interval (0, 1), let uλ = (1, λ, λ2 , . . .).
Show that uλ ∈ `1 for each λ ∈ (0, 1), and {uλ : 0 < λ < 1} is a
linearly independent subset of `1 .

19. Let A be an m × n matrix, and b be a column m-vector. Show


that the system Ax = b has a solution n-vector if and only if
b is in the span of columns of A.

20. Let e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). What is the span
of {e1 + e2 , e2 + e3 , e3 + e1 }?

21. What is the span of S = {tn : n = 0, 2, 4, . . .} in P?

22. Let S be a subset of a vector space V . Show that S is a subspace


if and only if S = span S.

23. Let V be a vector space. Show that the following hold.

(a) Let S be a subset of V . Then


\
span S = {Y : Y is a subspace of V containing S}.

(b) Suppose V0 is a subspace of V and x0 ∈ V \ V0 . Then


for every x ∈ span ({x0 } ∪ V0 ), there exists a unique α ∈ F
and y ∈ V0 such that x = αx0 + y.

24. Consider the system of equations


a11 x1 + a12 x2 + ... + a1n xn = b1
a21 x1 + a22 x2 + ... + a2n xn = b2
... + ... + ... + ... = ...
am1 x1 + am1 x2 + ... + amn xn = bm
Let
     
u1 := [a11, a21, . . . , am1]^T,   u2 := [a12, a22, . . . , am2]^T,   . . . ,   un := [a1n, a2n, . . . , amn]^T.

(a) Show that the above system has a solution vector x =


[x1 , . . . , xn ]T if and only if b = [b1 , . . . , bm ]T ∈ span {u1 , . . . , un }.
(b) Show that the above system has atmost one solution vec-
tor x = [x1 , . . . , xn ]T if and only if {u1 , . . . , un } is linearly
independent.

25. Show that every superset of a linearly dependent set is linearly


dependent, and every subset of a linearly independent set is
linearly independent.

26. Give an example to justify the following: E is a subset of a vector


space such that there exists a vector u ∈ E which is not a
linear combination of other members of E, but E is linearly
dependent.

27. Is union (resp., intersection) of two linearly independent sets a


linearly independent? Why?

28. Is union (resp., intersection) of two linearly dependent sets a


linearly dependent? Why?

29. Show that vectors u = (a, c), v = (b, d) are linearly independent
in R2 iff ad − bc 6= 0. Can you think of a generalization to n
vectors in Rn .

30. Show that V0 := {x = (x1 , x2 , x3 ) : x1 + x2 + x3 = 0} is a


subspace of R3 . Find a basis for V0 .

31. Show that E := {1 + tn , t + tn , t2 + tn , . . . , tn−1 + tn , tn } is a


basis of Pn .

32. Let u1 , . . . , um be linearly independent vectors in a vector space V . Let [aij ] be an m × n matrix of scalars, and let

v1 := a11 u1 + a21 u2 + · · · + am1 um ,
v2 := a12 u1 + a22 u2 + · · · + am2 um ,
. . .
vn := a1n u1 + a2n u2 + · · · + amn um .

Show that v1 , . . . , vn are linearly independent if and only if the vectors

w1 := [a11, a21, . . . , am1]^T,   w2 := [a12, a22, . . . , am2]^T,   . . . ,   wn := [a1n, a2n, . . . , amn]^T

are linearly independent.

33. Let u1 (t) = 1, and for j = 2, 3, . . . , let uj (t) = 1 + t + . . . + tj .


Show that span of {u1 , . . . , un } is Pn , and span of {u1 , u2 , . . .}
is P.

34. Let p1 (t) = 1 + t + 3t2 , p2 (t) = 2 + 4t + t2 , p3 (t) = 2t + 5t2 .


Are the polynomials p1 , p2 , p3 linearly independent?

35. Show that a basis of a vector space is a minimal spanning set,


and maximal linearly independent set.

36. Suppose V1 and V2 are subspaces of a vector space V such that


V1 ∩ V2 = {0}. Show that every x ∈ V1 + V2 can be written
uniquely as x = x1 + x2 with x1 ∈ V1 and x2 ∈ V2 .

37. Suppose V1 and V2 are subspaces of a vector space V . Show


that V1 + V2 = V1 if and only if V2 ⊆ V1 .

38. Let V be a vector space.

(a) Show that a subset {u1 , . . . , un } of V is linearly indepen-


dent if and only if the function (α1 , . . . , αn ) 7→ α1 u1 +
· · · + αn un from Fn into V is injective.

(b) Show that if E ⊆ V is linearly dependent in V , then every


superset of E is also linearly dependent.

(c) Show that if E ⊆ V is linearly independent in V , then


every subset of E is also linearly independent.

(d) Show that if {u1 , . . . , un } is a linearly independent subset of V , and if Y is a subspace of V such that (span {u1 , . . . , un }) ∩ Y = {0}, then every x in span ({u1 , . . . , un } ∪ Y ) can be written uniquely as x = α1 u1 + · · · + αn un + y with (α1 , . . . , αn ) ∈ Fn , y ∈ Y .

(e) Show that if E1 and E2 are linearly independent subsets


of V such that (span E1 ) ∩ (span E2 ) = {0}, then E1 ∪ E2
is linearly independent.

39. For each k ∈ N, let Fk denotes the set of all column k-vectors,
i.e., the set of all k × 1 matrices. Let A be an m × n matrix of
scalars with columns a1 , a2 , . . . , an . Show the following:

(a) The equation Ax = 0 has a non-zero solution if and only


if a1 , a2 , . . . , an are linearly dependent.

(b) For y ∈ Fm , the equation Ax = y has a solution if and


only if a1 , a2 , . . . , an , y are linearly dependent, i.e., if and
only if y is in the span of columns of A.

40. For i = 1, . . . , m; j = 1, . . . , n, let Eij be the m × n matrix with


its (i, j)-th entry as 1 and all other entries 0. Show that
{Eij : i = 1 . . . , m; j = 1, . . . , n}
is a basis of Fm×n .
41. If {u1 , . . . , un } is a basis of a vector space V , then show that
every x ∈ V , can be expressed uniquely as x = α1 u1 +· · ·+αn un ;
i.e., for every x ∈ V , there exists a unique n-tuple (α1 , . . . , αn )
of scalars such that x = α1 u1 + · · · + αn un .
42. Suppose S is a set consisting of n elements and V is the set of
all real valued functions defined on S. Show that V is a vector
space of dimension n.
43. Given real numbers a0 , a1 , . . . , ak , let X be the set of all solu-
tions x ∈ C k [a, b] of the differential equation
a0 d^k x/dt^k + a1 d^{k−1} x/dt^{k−1} + · · · + ak x = 0.

Show that X is a linear space over R. What is the dimension


of X?

44. Let t0 , t1 , . . . , tn be in [a, b] such that a = t0 < t1 < . . . < tn =


b. For each j ∈ {1, . . . , n}, let uj be in C([a, b], R) such that
uj (ti ) = 1 if i = j,   and   uj (ti ) = 0 if i ≠ j,

and the restriction of uj to each interval [tj−1 , tj ] is a polyno-


mial of degree atmost 1. Show that the span of {u1 , . . . , un } is
the space of all continuous functions whose restrictions to each
subinterval [ti−1 , ti ] is a polynomial of degree atmost 1.

45. State with reason whether T : R2 → R2 in each of the following


is a linear transformation:
(a) T (x1 , x2 ) = (1, x2 ), (b) T (x1 , x2 ) = (x1 , x22 )
(c) T (x1 , x2 ) = (sin(x1 ), x2 ) (d) T (x1 , x2 ) = (x1 , 2 + x2 )

46. Check whether the functions T in the following are linear trans-
formations:

(a) T : R2 → R2 defined by T (x, y) = (2x + y, x + y^2).

(b) T : C^1[0, 1] → R defined by T (u) = ∫_0^1 [u(t)]^2 dt.

(c) T : C^1[−1, 1] → R2 defined by T (u) = ( ∫_{−1}^1 u(t) dt, u'(0) ).

(d) T : C^1[0, 1] → R defined by T (u) = ∫_0^1 u'(t) dt.

47. Let T1 : V1 → V2 and T2 : V2 → V3 be linear transformations.


Show that the function T : V1 → V3 defined by T x = T2 (T1 x),
x ∈ V1 , is a linear transformation.
[The above transformation T is called the composition of T2
and T1 , and is usually denoted by T2 T1 .]

48. If T1 : C^1[0, 1] → C[0, 1] is defined by T1 (u) = u', and T2 : C[0, 1] → R is defined by T2 (v) = ∫_0^1 v(t) dt, then find T2 T1 .

49. Let V1 , V2 , V3 be finite dimensional vector spaces, and let E1 ,


E2 , E3 be bases of V1 , V2 , V3 respectively. If T1 : V1 → V2
and T2 : V2 → V3 are linear transformations. Show that
[T2 T1 ]E1 ,E3 = [T2 ]E2 ,E3 [T1 ]E1 ,E2 .

50. If T1 : Pn [0, 1] → Pn [0, 1] is defined by T1 (u) = u', and T2 : Pn [0, 1] → R is defined by T2 (v) = ∫_0^1 v(t) dt, then find [T1 ]E1 ,E2 ,
[T2 ]E2 ,E3 , and [T2 T1 ]E1 ,E3 , where E1 = E2 = {1, t, t2 , . . . , tn }
and E3 = {1}.
51. Justify the statement: Let T1 : V1 → V2 be a linear transforma-
tion. Then T is bijective iff there exists a linear transformation
T2 : V2 → V1 such that T1 T2 : V2 → V2 is the identity transfor-
mation on V2 and T2 T1 : V1 → V1 is the identity transformation
on V1 .
52. Let V1 and V2 be vector spaces with dimV1 = n < ∞. Let
{u1 , . . . , un } be a basis of V1 and {v1 , . . . , vn } ⊂ V2 . Find a
linear transformation T : V1 → V2 such that T (uj ) = vj for
j = 1, . . . , n. Show that there is only one such linear transfor-
mation.
53. Let T be the linear transformation obtained as in the above
problem. Show that

(a) T is one-one if and only if {v1 , . . . , vn } is linearly indepen-


dent, and
(b) T is onto if and only if span ({v1 , . . . , vn }) = V2 .

54. Let T : R2 → R2 be the linear transformation which satisfies


T (1, 0) = (1, 4) and T (1, 1) = (2, 5). Find the T (2, 3).
55. Does there exists a linear transformation T : R3 → R2 such
that T (1, 0, 2) = (1, 1) and T (1/2, 0, 1) = (0, 1)?
56. Show that if V1 and V2 are finite dimensional vector spaces
of the same dimension, then the there exists a bijective linear
transformation from V1 to V2 .
57. Find bases for N (T ) and R(T ) for the linear transformation T
in each the following:

(a) T : R2 → R2 defined by T (x1 , x2 ) = (x1 − x2 , 2x2 ),


(b) T : R2 → R3 defined by T (x1 , x2 ) = (x1 + x2 , 0, 2x1 − x2 ),
(c) T : Rn×n → R defined by T (A) = trace(A). (Recall
that trace of a square matrix is the sum of its diagonal
elements.)

58. Let T : V1 → V2 is a linear transformation. Given reasons for


the following:

(a) rank(T ) ≤ dimV1 .


(b) T onto implies dimV2 ≤ dimV1 ,
(c) T one-one implies dimV1 ≤ dimV2
(d) Suppose dimV1 = dimV2 < ∞. Then T is one-one if and
only T is onto.

59. Let V1 and V2 be finite dimensional vector spaces, and E1 =


{u1 , . . . , un } and E2 = {v1 , . . . , vm } be bases of V1 and V2 ,
respectively. Show the following:

(a) If {g1 , . . . , gm } is the ordered dual basis of L(V2 , F) with respect to the basis E2 of V2 , then [T ]E1 ,E2 = (gi (T uj )) for every T ∈ L(V1 , V2 ).

(b) If A, B ∈ L(V1 , V2 ) and α ∈ F, then

[A+B]E1 ,E2 = [A]E1 ,E2 +[B]E1 ,E2 , [αA]E1 ,E2 = α[A]E1 ,E2 .

(c) Suppose {Mij : i = 1 . . . , m; j = 1, . . . , n} is a basis of


Fm×n . If Tij ∈ L(V1 , V2 ) is the linear transformation
such that [Tij ]E1 ,E2 = Mij , then {Tij : i = 1 . . . , m; j =
1, . . . , n} is a basis of L(V1 , V2 ).

60. Let V1 and V2 be finite dimensional vector spaces, and E1 =


{u1 , . . . , un } and E2 = {v1 , . . . , vm } be bases of V1 and V2 ,
respectively. Let F1 = {f1 , . . . , fn } be the dual basis of L(V1 , F)
with respect to E1 and F2 = {g1 , . . . , gn } be the dual basis of
L(V2 , F) with respect to E2 . For i = 1, . . . , n; j = 1, . . . , m, let
Tij : V → W defined by

Tij (x) = fj (x)vi , x ∈ V1 .

Show that {Tij : i = 1, . . . , m; j = 1, . . . , n} is a basis of


L(V1 , V2 ).

61. Let T : R3 → R3 be defined by

T (x1 , x2 , x3 ) = (x2 + x3 , x3 + x1 , x1 + x2 ), (x1 , x2 , x3 ) ∈ R3 .

Find the matrix representation of T with respect to the basis


given in each of the following.

(a) E1 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, E2 = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}
(b) E1 = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}, E2 = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
(c) E1 = {(1, 1, −1), (−1, 1, 1), (1, −1, 1)},
E2 = {(−1, 1, 1), (1, −1, 1), (1, 1, −1)}

62. Let T : P 3 → P 2 be defined by

T (a0 + a1 t + a2 t2 + a3 t3 ) = a1 + 2a2 t + 3a3 t2 .

Find the matrix representation of T with respect to the basis


given in each of the following.

(a) E1 = {1, t, t2 , t3 }, E2 = {1 + t, 1 − t, t2 }
(b) E1 = {1, 1 + t, 1 + t + t2 , t3 }, E2 = {1, 1 + t, 1 + t + t2 }
(c) E1 = {1, 1 + t, 1 + t + t2 , 1 + t + t2 + t3 }, E2 = {t2 , t, 1}

63. Let T : P 2 → P 3 be defined by


T (a0 + a1 t + a2 t^2) = a0 t + (a1/2) t^2 + (a2/3) t^3.
Find the matrix representation of T with respect to the basis
given in each of the following.

(a) E1 = {1 + t, 1 − t, t2 }, E2 = {1, t, t2 , t3 },
(b) E1 = {1, 1 + t, 1 + t + t2 }, E2 = {1, 1 + t, 1 + t + t2 , t3 },
(c) E1 = {t2 , t, 1}, E2 = {1, 1 + t, 1 + t + t2 , 1 + t + t2 + t3 },

64. A linear transformation T : V → W is said to be of finite rank


if rank T < ∞.
Let T : V1 → V2 be a linear transformation between vector
spaces V1 and V2 . Show that T is of finite rank if and only if there exist {v1 , . . . , vn } ⊂ V2 and {f1 , . . . , fn } ⊂ L(V1 , F) such that T x = Σ_{j=1}^n fj (x) vj for all x ∈ V1 .

65. Check whether the following are inner product on the given
vector spaces:

(a) hA, Bi := trace(A + B) on R2×2


(b) hA, Bi := trace(AT B) on R3×3
(c) hx, yi := ∫_0^1 x'(t) y'(t) dt on Pn or on C^1[0, 1]

(d) hx, yi := ∫_0^1 x(t) y(t) dt + ∫_0^1 x'(t) y'(t) dt on C^1[0, 1]

66. If {u1 , . . . , un } is an orthonormal basis of an inner product


space V , then show that, for every x, y ∈ V ,
hx, yi = Σ_{i=1}^n hx, ui i hui , yi.

Let Fn be endowed with the usual inner product. Then, deduce


that there is a linear isometry from V onto Fn , i.e., a linear
operator T : V → Fn such that kT (x)k = kxk for all x ∈ V .

67. Let V1 and V2 be inner product spaces with inner products


h·, ·i1 and h·, ·i2 respectively. One V = V1 × V2 , define

h(x1 , x2 ), (y1 , y2 )iV := hx1 , y1 i1 +hx2 , y2 i2 , ∀ (x1 , x2 ), (y1 , y2 ) ∈ V.

Show that h·, ·iV is an inner product on V .

68. Let h·, ·i1 and h·, ·i2 are inner products on a vector space V .
Show that

hx, yi := hx, yi1 + hx, yi2 , ∀ x, y ∈ V

defines another inner product on V .

69. Let V be an n-dimensional inner product space and {u1 , . . . , un }


be an orthonormal basis of V . Show that every linear functional f : V → F can be written as f = Σ_{j=1}^n f(uj) fj , where,
for each j ∈ {1, . . . , n}, fj : V → F is the linear functional
defined by fj (x) = hx, uj i, x ∈ V .

70. For x, y in an inner product space V , show that (x+y) ⊥ (x−y)


if and only if kxk = kyk.

71. Let V be an inner product space. For S ⊂ V , let

S ⊥ := {x ∈ V : hx, ui = 0 ∀u ∈ S}.

Show that

(a) S ⊥ is a subspace of V .
(b) V ⊥ = {0}, {0}⊥ = V .
(c) S ⊂ S ⊥⊥ .
(d) If V is finite dimensional and V0 is a subspace of V , then
V0⊥⊥ = V0 .

72. Find the best approximation of x ∈ V from V0 where

(a) V = R3 , x := (1, 2, 1), V0 := span {(3, 1, 2), 1, 0, 1)}.


(b) V = R3 , x := (1, 2, 1), and V0 is the set of all (α1 , α2 , α3 )
in R3 such that α1 + α2 + α3 = 0.
(c) V = R4 , x := (1, 0, −1, 1) V0 := span {(1, 0, −1, 1), (0, 0, 1, 1)}.
(d) V = C[−1, 1], x(t) = et , V0 = P3 .

73. Let A ∈ Rm×n and y ∈ Rm . Show that, there exists x ∈ Rn


such that kAx − yk ≤ kAu − yk for all u ∈ Rn , if and only if
AT Ax = AT y.

74. Let A ∈ Rm×n and y ∈ Rm . If columns of A are linearly


independent, then show that there exists a unique x ∈ Rn such
that AT Ax = AT y.

75. Find the best approximate solution (least square solution) for
the system Ax = y in each of the following:
   
(a) A =
[ 3  1 ]
[ 1  2 ]
[ 2  −1 ],
y = [ 1, 0, −2 ]^T.

(b) A =
[ 1   1   1 ]
[ −1  0   1 ]
[ 1   −1  0 ]
[ 0   1   −1 ],
y = [ 0, 1, −1, −2 ]^T.
116 Additional Exercises

   
1 1 3 1

 −1 0 5 


 −1 

(c) A = 
 0 1 −2 
; y=
 3 .

 1 −1 1   −2 
1 0 1 0

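Each part can be checked either through the normal equations A^T Ax = A^T y of Exercise 73 or with a library least squares routine; both give the same answer when the columns of A are linearly independent. A sketch for part (a), assuming NumPy:

    import numpy as np

    A = np.array([[3.0, 1.0], [1.0, 2.0], [2.0, -1.0]])
    y = np.array([1.0, 0.0, -2.0])

    x_normal = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations A^T A x = A^T y
    x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)    # the same solution, computed via SVD
    print(x_normal, x_lstsq)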
76. (a) Show that ‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1 for every x ∈ Rk .
(b) Find c1 , c2 , c3 , c4 > 0 such that

    c1 ‖x‖2 ≤ ‖x‖∞ ≤ c2 ‖x‖2 ,    c3 ‖x‖∞ ≤ ‖x‖1 ≤ c4 ‖x‖∞

for all x ∈ Rk .
(c) Compute ‖x‖∞ , ‖x‖2 , ‖x‖1 for x = (1, 1, 1) ∈ R3 .
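Part (c) can be verified directly from the definitions, or with NumPy's norm routine, as in the following sketch:

    import numpy as np

    x = np.array([1.0, 1.0, 1.0])
    print(np.linalg.norm(x, np.inf),    # 1.0
          np.linalg.norm(x, 2),         # sqrt(3) = 1.732...
          np.linalg.norm(x, 1))         # 3.0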
77. Let ‖ · ‖ be a norm on Rn and A ∈ Rn×n . Suppose c > 0 is such that ‖Ax‖ ≤ c ‖x‖ for all x ∈ Rn , and there exists x0 ≠ 0 in Rn such that ‖Ax0 ‖ = c ‖x0 ‖. Then show that ‖A‖ = c.
78. Find ‖A‖1 and ‖A‖∞ for the 3 × 3 matrix A with rows (1, 2, 3), (2, 3, 4), (3, 2, 1).
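Recall that ‖A‖1 is the maximum absolute column sum of A and ‖A‖∞ is the maximum absolute row sum. A quick numerical check, assuming NumPy:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 2.0, 1.0]])
    norm_1 = np.abs(A).sum(axis=0).max()      # maximum column sum; equals np.linalg.norm(A, 1)
    norm_inf = np.abs(A).sum(axis=1).max()    # maximum row sum;    equals np.linalg.norm(A, np.inf)
    print(norm_1, norm_inf)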
79. Suppose A, B in Rn×n are invertible matrices, and b, b̃ are in Rn . Let x, x̃ in Rn be such that Ax = b and B x̃ = b̃. Show that

    ‖x − x̃‖/‖x‖ ≤ ‖A‖ ‖B^{-1}‖ ( ‖A − B‖/‖A‖ + ‖b − b̃‖/‖b‖ ).

[Hint: Use the fact that B(x − x̃) = (B − A)x + (b − b̃), that ‖(B − A)x‖ ≤ ‖B − A‖ ‖x‖, and that ‖b − b̃‖ = ‖b − b̃‖ ‖Ax‖/‖b‖ ≤ ‖b − b̃‖ ‖A‖ ‖x‖/‖b‖.]
80. Let B ∈ Rn×n . If ‖B‖ < 1, then show that I − B is invertible, and ‖(I − B)^{-1}‖ ≤ 1/(1 − ‖B‖).
[Hint: Show that I − B is injective, by showing that ‖(I − B)x‖ ≥ (1 − ‖B‖)‖x‖ for every x, and then deduce the result.]
81. Let A, B ∈ Rn×n be such that A is invertible, and ‖A − B‖ < 1/‖A^{-1}‖. Then show that B is invertible, and

    ‖B^{-1}‖ ≤ ‖A^{-1}‖ / (1 − ‖A − B‖ ‖A^{-1}‖).

[Hint: Observe that B = A − (A − B) = [I − (A − B)A^{-1}]A, and use the previous problem.]
82. Let A, B ∈ Rn×n be such that A is invertible, and ‖A − B‖ < 1/(2‖A^{-1}‖). Let b, b̃, x, x̃ be as in Problem 79. Then show that B is invertible, and

    ‖x − x̃‖/‖x‖ ≤ 2κ(A) ( ‖A − B‖/‖A‖ + ‖b − b̃‖/‖b‖ ),

where κ(A) := ‖A‖ ‖A^{-1}‖ is the condition number of A.
[Hint: Apply the conclusion of Problem 81 to that of Problem 79.]
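The bound of Exercise 82 can be observed numerically on a small, well-conditioned example. A sketch, assuming NumPy; the matrices and perturbation sizes below are illustrative and chosen so that the hypothesis ‖A − B‖ < 1/(2‖A^{-1}‖) holds:

    import numpy as np

    rng = np.random.default_rng(1)
    A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # an illustrative, well-conditioned matrix
    B = A + 1e-3 * rng.standard_normal((4, 4))          # a small perturbation of A
    b = rng.standard_normal(4)
    b_t = b + 1e-3 * rng.standard_normal(4)             # perturbed right-hand side

    x = np.linalg.solve(A, b)
    x_t = np.linalg.solve(B, b_t)

    norm = lambda M: np.linalg.norm(M, 2)               # Euclidean norm / spectral norm
    kappa = norm(A) * norm(np.linalg.inv(A))            # condition number of A
    lhs = norm(x - x_t) / norm(x)
    rhs = 2 * kappa * (norm(A - B) / norm(A) + norm(b - b_t) / norm(b))
    print(lhs <= rhs, lhs, rhs)                         # the bound holds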
83. Consider the system Ax = b, where A is the 3 × 3 matrix with rows (9, 1, 1), (2, 10, 3), (3, 4, 11).

(a) Show that the Jacobi method and the Gauss-Seidel method for the above system converge.
(b) Obtain an error estimate for the k-th iterate (for both methods) w.r.t. the norms ‖ · ‖1 and ‖ · ‖∞ , with the initial approximation x(0) taken as the zero vector.
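Since the matrix above is strictly diagonally dominant, both iterations converge for any right-hand side and any starting vector. A minimal sketch of the two iterations, assuming NumPy; the right-hand side b below is only an illustrative choice, since the exercise does not specify it:

    import numpy as np

    A = np.array([[9.0, 1.0, 1.0], [2.0, 10.0, 3.0], [3.0, 4.0, 11.0]])
    b = np.array([1.0, 1.0, 1.0])              # illustrative right-hand side
    x_jac = np.zeros(3)
    x_gs = np.zeros(3)

    for _ in range(25):
        # Jacobi: every component is updated using only the previous iterate
        x_jac = (b - (A - np.diag(np.diag(A))) @ x_jac) / np.diag(A)
        # Gauss-Seidel: components are updated in place, using the newest values
        for i in range(3):
            x_gs[i] = (b[i] - A[i, :i] @ x_gs[:i] - A[i, i+1:] @ x_gs[i+1:]) / A[i, i]

    print(x_jac, x_gs, np.linalg.norm(A @ x_gs - b, np.inf))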
84. Suppose u1 , . . . , un are functions defined on [a, b], and t1 , . . . , tn
are points in [a, b]. Let β1 , . . . , βn be real numbers. Then
show that there exists a unique ϕ ∈ span {u1 , . . . , un } satisfying
ϕ(ti ) = βi for i = 1, . . . , n if and only if the matrix [uj (ti )] is
invertible.
85. Suppose u1 , . . . , un are functions defined on [a, b], and t1 , . . . , tn
are points in [a, b]. Show that, if the matrix [uj (ti )] is invertible,
then u1 , . . . , un are linearly independent.
[Hint: A square matrix is invertible if and only if its columns
are linearly independent. ]
86. Suppose u1 , . . . , un are functions defined on [a, b], and t1 , . . . , tn
are points in [a, b] such that the matrix [uj (ti )] is invertible. If
v1 , . . . , vn are linearly independent functions in span {u1 , . . . , un },
then show that the matrix [vj (ti )] is also invertible.
[Hint: Let X0 := span {u1 , . . . , un } and recall that [uj (ti )] is invertible. Then observe that the function J : X0 → Rn defined by J(x) = [x(t1 ), . . . , x(tn )]^T is bijective.]
87. Let t1 , . . . , tn be distinct points in R, and let

    ℓj (t) = ∏_{i ≠ j} (t − ti )/(tj − ti ),    j = 1, 2, . . . , n.

Then show that {ℓ1 , . . . , ℓn } is a basis of Pn−1 , and it satisfies ℓj (ti ) = δij for all i, j = 1, . . . , n. Deduce from the previous exercise that the matrix [t_i^{j−1}] is invertible.
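These Lagrange polynomials are easy to construct and test numerically. A sketch, assuming NumPy, with illustrative nodes:

    import numpy as np

    t_nodes = np.array([0.0, 0.5, 1.0, 2.0])      # illustrative distinct nodes

    def lagrange(j, t):
        # Evaluate l_j(t) = prod over i != j of (t - t_i) / (t_j - t_i).
        others = np.delete(t_nodes, j)
        return np.prod((t - others) / (t_nodes[j] - others))

    V = np.array([[lagrange(j, ti) for j in range(len(t_nodes))] for ti in t_nodes])
    print(np.allclose(V, np.eye(len(t_nodes))))   # True: the matrix [l_j(t_i)] is the identity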
88. Let t1 , . . . , tn be distinct points in [a, b] and let u1 , . . . , un be functions in C[a, b] such that ui (tj ) = δij for i, j = 1, . . . , n. Show that

    P x = x(t1 )u1 + · · · + x(tn )un ,    x ∈ C[a, b],

defines a linear transformation from C[a, b] into itself, and that it satisfies (a) (P x)(ti ) = x(ti ) for i = 1, . . . , n, (b) P x = x for all x ∈ R(P ), and (c) P 2 = P , i.e., P (P x) = P x for all x ∈ C[a, b].
89. Let t1 , . . . , tn be distinct points in [a, b]. Show that for every
x ∈ C[a, b], there exists a unique polynomial p(t) of degree
at most n − 1 such that p(tj ) = x(tj ) for j = 1, . . . , n.
90. Apply the trapezoidal rule, the composite trapezoidal rule, Simpson’s rule, and the composite Simpson’s rule for approximating the following integrals:

    ∫_0^1 e^{−t²} dt,    ∫_0^4 dt/(1 + t²),    ∫_0^{2π} dt/(2 + cos(t)).
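The composite rules can be implemented in a few lines, and with one subinterval (trapezoidal) or two (Simpson’s) they reduce to the basic rules. A sketch for the first integral, assuming NumPy; the other two integrals only need a different integrand and interval:

    import numpy as np

    def composite_trapezoid(f, a, b, m):
        t = np.linspace(a, b, m + 1)
        h = (b - a) / m
        return h * (f(t[0]) / 2 + f(t[1:-1]).sum() + f(t[-1]) / 2)

    def composite_simpson(f, a, b, m):          # m must be even
        t = np.linspace(a, b, m + 1)
        h = (b - a) / m
        return h / 3 * (f(t[0]) + 4 * f(t[1:-1:2]).sum() + 2 * f(t[2:-1:2]).sum() + f(t[-1]))

    f = lambda t: np.exp(-t**2)
    print(composite_trapezoid(f, 0, 1, 1),      # plain trapezoidal rule (m = 1)
          composite_trapezoid(f, 0, 1, 16),
          composite_simpson(f, 0, 1, 2),        # plain Simpson's rule (m = 2)
          composite_simpson(f, 0, 1, 16))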