
LINEAR OPERATORS IN APPLIED

ENGINEERING MATHEMATICS
(Lecture Notes)

K. G. Ayappa
Department of Chemical Engineering
Indian Institute of Science
Bangalore
Contents

1 Introduction to matrix, differential and integral equations
    1.1 Matrix Equations
    1.2 Differential Equations
    1.3 Integral equations
    1.4 Linear Operators
    1.5 Summary

2 Properties of Matrices
    2.1 Equality of matrices
    2.2 Addition of matrices
    2.3 Scalar multiplication
    2.4 Multiplication of Matrices
    2.5 Transpose of a matrix
    2.6 Trace of a matrix
    2.7 Symmetric and Hermitian Matrices
    2.8 Inverse
    2.9 Determinants, Cofactors and Adjoints
    2.10 Echelon forms, rank and determinants

3 Vector or Linear Spaces
    3.1 Linear Independence, Basis and Dimension
    3.2 Basis
    3.3 Linear independence of functions
    3.4 Solution of linear equations
        3.4.1 Geometrical Interpretation
    3.5 Summary

4 Inner Products, Orthogonality and the Adjoint Operator
    4.1 Inner Product Spaces
    4.2 Orthogonality
    4.3 Orthogonality and Basis Sets
    4.4 Gram-Schmidt Orthogonalization
    4.5 The Adjoint Operator
    4.6 Adjoints for Differential Operators
    4.7 Existence and Uniqueness for Ax = b Revisited

5 Eigenvalues and Eigenvectors
    5.1 Eigenvectors as Basis Sets
    5.2 Similarity Transforms
        5.2.1 Diagonalization of A
        5.2.2 Using similarity transforms
    5.3 Unitary and orthogonal matrices
    5.4 Jordan Forms
        5.4.1 Structure of the Jordan Block
        5.4.2 Generalized Eigenvectors
    5.5 Initial Value Problems
    5.6 Eigenvalues and Solutions of Linear Equations
        5.6.1 Positive Definite Matrices
        5.6.2 Convergence of Iterative Methods
    5.7 Summary

6 Solutions of Non-Linear Equations
    6.0.1 Contraction Mapping or Fixed Point Theorem
Chapter 1

Introduction to matrix, differential and integral equations

Matrix, differential and integral equations arise out of models developed to describe various physical situations. A few examples of these equations that arise in the analysis of engineering problems are illustrated in this Chapter. Since the course is mainly concerned with linear systems we will conclude this Chapter with a formal definition of linear operators.

1.1 Matrix Equations

Consider the following collection of linear equations, which can be compactly written in matrix
vector notation as
Ax = b (1.1)

where,

    A(n \times n) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
We are interested in finding x given the matrix A and vector b. The matrix equation represents
a collection of linear algebraic equations which arises out of models developed to describe a
wide variety of physical situations. These include chemical reactions, staged processes such
as distillation and gas absorption, electrical networks and normal mode vibrational analysis
of molecules. Matrix equations also arise during numerical solutions of differential equations

using finite difference, finite volume and finite element methods as well as in the numerical
solution of integral equations using quadrature methods.
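To make this concrete, the short Python sketch below (an illustration added to these notes; the 3 × 3 system is hypothetical and chosen only for demonstration) solves Ax = b numerically and verifies the residual:

import numpy as np

# Hypothetical 3x3 system, chosen only for illustration.
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.linalg.solve(A, b)     # solve Ax = b
print(x)
print(np.allclose(A @ x, b))  # residual check: prints True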

1.2 Differential Equations

Reaction-Diffusion Equation
Consider the first order reaction, A → B, occurring in the inner surface of a cylindrical
catalyst pellet of radius R and length L as shown in Fig. 1.1. Performing a mass balance on a

[Figure 1.1: Schematic of a catalyst pellet of radius R and length L, showing a differential element of thickness dz between z = 0 and z = L.]

differential element of thickness ∆z for species A;

    \pi R^2 \Delta z\, \frac{\partial C_A}{\partial t} = j_z\, \pi R^2 - j_{z+\Delta z}\, \pi R^2 - k_1 C_A\, 2\pi R \Delta z \qquad (1.2)

where j_z is the mass flux of species A and k_1 is the first order reaction rate constant. Dividing Eq. 1.2 by πR²Δz and using Fick's law,

    j_z = -D_{AB} \frac{\partial C_A}{\partial z} \qquad (1.3)

Eq. 1.2 reduces to


    \frac{\partial C_A}{\partial t} = D_{AB} \frac{\partial^2 C_A}{\partial z^2} - \alpha_1 C_A \qquad (1.4)
where α_1 = 2k_1/R. Eq. 1.4 is the unsteady state reaction-diffusion equation whose solution yields the concentration C_A(z, t). We further note that Eq. 1.4 is a partial differential equation whose complete formulation requires one initial condition (IC) and two boundary conditions (BCs) to be specified. The initial condition is

CA (z, t = 0) = 0 (1.5)

The boundary conditions are

CA (z, t) = CA0 at z=0 (1.6)

and
    \frac{dC_A}{dz} = 0 \quad \text{at } z = L \qquad (1.7)
Eq. 1.7 assumes that the face of the pore at z = L is non-reactive.
Question: Modify the boundary condition for a reactive pore end at z = L.
Eq. 1.4 is an example of a partial differential equation (PDE) since the dependent variable, C_A(z, t), depends on more than one independent variable (z, t). At steady state, the equation reduces to the following ordinary differential equation (ODE),

    D_{AB} \frac{d^2 C_A}{dz^2} - \alpha_1 C_A = 0 \qquad (1.8)

Eq. 1.8, along with the boundary conditions at z = 0 and z = L, constitutes what is commonly referred to as a two-point boundary value problem (BVP), since the boundary conditions at the two ends of the pore are required to complete the problem specification.
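The following Python sketch (added for illustration; the parameter values are arbitrary and not taken from the notes) shows how discretizing the steady-state BVP of Eq. 1.8 by finite differences leads back to a matrix equation of the type discussed in Section 1.1:

import numpy as np

# Finite-difference sketch of Eq. 1.8: DAB * d2C/dz2 - alpha1*C = 0,
# with C(0) = CA0 and dC/dz = 0 at z = L (dead end pore).
DAB, alpha1, L, CA0, n = 1e-9, 5e-9, 1e-3, 1.0, 101
z = np.linspace(0.0, L, n)
h = z[1] - z[0]

A = np.zeros((n, n))
rhs = np.zeros(n)
A[0, 0] = 1.0
rhs[0] = CA0                       # Dirichlet condition at z = 0
for i in range(1, n - 1):          # central differences at interior nodes
    A[i, i - 1] = DAB / h**2
    A[i, i] = -2.0 * DAB / h**2 - alpha1
    A[i, i + 1] = DAB / h**2
A[-1, -2], A[-1, -1] = -1.0, 1.0   # one-sided difference for dC/dz = 0 at z = L

C = np.linalg.solve(A, rhs)
print(C[0], C[-1])                 # concentration at the pore mouth and dead end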
Unsteady State Heat Conduction Equation
Consider a three dimensional solid object, Ω heated with an internal source p(x, y, z) as
shown in Fig. 1.2. The unsteady state heat conduction equation is,

    \rho C_p \frac{\partial T}{\partial t} = \nabla \cdot k \nabla T + p(x, y, z) \qquad (1.9)

where ρ is the density, C_p is the specific heat capacity and k the thermal conductivity, which can in general be a function of the spatial co-ordinates. The solution of Eq. 1.9 with appropriate initial and boundary conditions yields the temperature T(x, y, z, t). Eq. 1.9 is an example of a
partial differential equation that arises in conduction heat transfer. Unlike the two-point BVP
discussed earlier the boundary condition for the heat equation is specified on the entire boundary

[Figure 1.2: Heat conduction in a 3D object. Ω denotes the domain and Γ the surface of the object. At the surface, heat is lost by convection: n · k∇T = h(T − T∞) on Γ, where h is the heat transfer coefficient.]

Γ as illustrated in Fig. 1.2. If the thermal conductivity is independent of the spatial co-ordinates, Eq. 1.9 reduces to
    \rho C_p \frac{\partial T}{\partial t} = k \nabla^2 T + p(x, y, z) \qquad (1.10)
At steady state, in the absence of the source term, p(x, y, z), the heat conduction equation
reduces to the Laplace equation,
∇2 T = 0 (1.11)

Note: The gradient operator in Cartesian co-ordinates is defined as,


    \nabla T = \frac{\partial T}{\partial x}\, e_x + \frac{\partial T}{\partial y}\, e_y + \frac{\partial T}{\partial z}\, e_z \qquad (1.12)

and the Laplacian is

    \nabla^2 T = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2} \qquad (1.13)
The Schrödinger Wave Equation
The wave equation forms the cornerstone of quantum mechanics. It arises in the description of atomic particle positions and quantization of energy levels. The time independent Schrödinger wave equation is

    \frac{h^2}{8\pi^2 m} \nabla^2 \Psi_n + (E_n - V)\Psi_n = 0 \qquad (1.14)

where h is Planck's constant, m is the particle mass and V is the potential energy field in which the particle is located. E_n are the corresponding energy levels of the particle and Ψ_n(r) is the wavefunction, which is a function of the spatial co-ordinates. The probability of locating a particle in a volume element dr is Ψ(r)Ψ*(r) dr, where Ψ*(r) denotes the complex conjugate of Ψ(r).

1.3 Integral equations

Many physical situations naturally give rise to integral equations. Integral equations can sometimes be derived from differential equations. Integral equations can be broadly classified into Volterra and Fredholm type equations. The Volterra integral equation of the first kind is,
    \int_0^x k(x, y)\, u(y)\, dy = f(x) \qquad (1.15)
where k(x, y) is the kernel of the operator, f (x) is usually some known function and u(y) the
solution we seek lies in the integrand. The kernel of the operator is related to the physics of the
problem that results in the integral equation. A characteristic feature of the Volterra equation is
that the upper limit of the integral is not a fixed quantity.
The Fredholm integral equation of the first kind is,
    \int_a^b k(x, y)\, u(y)\, dy = f(x) \qquad (1.16)
The main difference between the Volterra equation, Eq. 1.15 and the Fredholm integral equation,
Eq. 1.16 is that the limits of the integral in the Fredholm equation are fixed. Broadly speaking
the Volterra type integral equations are related to initial value problems (IVPs) giving rise to
the variable upper limit in the integral of Eq. 1.15 and the Fredholm type equations are related
to boundary value problems (BVPs). The final point to note is that integral equations do not
require the specification of any additional initial and/or boundary conditions. These conditions
are built into the integral equations themselves.
One final classification that is important is that both the integral equations given above
were referred to as first kind equations. The general form of a Volterra integral equation of the second kind is

    \int_0^x k(x, y)\, u(y)\, dy + \alpha u(x) = f(x) \qquad (1.17)
where α is an arbitrary scalar quantity, and the second kind Fredholm equation is,
    \int_a^b k(x, y)\, u(y)\, dy + \alpha u(x) = f(x) \qquad (1.18)

In the second kind equations the unknown u(x) appears both inside and outside the integrand.
Example: The relationship between an IVP and Volterra operators can be illustrated with a
simple example. Consider the following IVP,
    \frac{du}{dt} + \alpha u = 0, \qquad u(t = 0) = u_0 \qquad (1.19)

where α is a constant. Integrating Eq. 1.19 and using the IC, Eq. 1.19 can be transformed
into the following Volterra integral equation of the second kind,
    \int_0^t \alpha u(y)\, dy + u(t) = u_0 \qquad (1.20)

The integral equation has a simple kernel and satisfies the initial conditions of Eq. 1.19. Further
the solution to Eq. 1.19 is, u(t) = u0 exp(−αt). Naturally this is also a solution to its equivalent
integral equation, Eq. 1.20.
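A quick numerical check of this equivalence (an illustration added here, using simple trapezoidal quadrature with arbitrary parameter values):

import numpy as np

# Verify that u(t) = u0*exp(-alpha*t) satisfies Eq. 1.20:
#   int_0^t alpha*u(y) dy + u(t) = u0
alpha, u0, t = 2.0, 3.0, 1.5
y = np.linspace(0.0, t, 2001)
f = alpha * u0 * np.exp(-alpha * y)
integral = np.sum((f[:-1] + f[1:]) * np.diff(y)) / 2.0   # trapezoidal rule
lhs = integral + u0 * np.exp(-alpha * t)
print(lhs, u0)   # the two values agree to quadrature accuracy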

1.4 Linear Operators

Our primary concern as engineers is to obtain solutions to the different classes of equations
presented above. Presented with an equation the natural question one poses is whether the
equation is solvable or not. This is the problem of existence. If the problem is not solvable,
one has to revisit the model assumptions and the underlying physical processes that govern
the equation. If the problem is solvable, we inquire if the solution is unique. These questions
of existence and uniqueness form a general theme in this book. Most of us are familiar with these ideas from the solution of linear equations of the form given in Eq. 1.1. Can we now generalize these ideas to a more general class of equations? For example, under what conditions
can one examine the existence and uniqueness conditions for Eq. 1.4 or Eq. 1.15? A unifying
theory that integrates all the above questions about existence and uniqueness is the theory of
linear operators, which will form an underlying and unifying theme for the course. The theory
is general and as long as the operator is linear it will be applicable. A class of linear operators
which will require a somewhat more specialized treatment are partial differential equations.
Before we proceed further, let us formally define a linear operator.
Consider the generic equation
Lu = f (1.21)

where L is the operator, u is the solution we seek and f is usually specified as part of the problem definition. In the matrix equation, L ≡ A, u ≡ x and f ≡ b. L is said to be a linear operator or linear transformation if it satisfies the following properties

    L(αu) = αLu,   u ∈ X, ∀ α
    L(u + v) = Lu + Lv,   u, v ∈ X

where X is a linear space on which the operator L acts. We will return to a formal definition of
linear spaces which contain both vectors and functions later in the text. L can also be looked
upon as a mapping of elements in X into itself, L : X → X and α lies in the associated complex
scalar field of the operator[1]. We note that both properties must be satisfied for the operator to
be linear. Further one property does not imply the other. If either of the above two properties
are not satisfied the operator is said to be nonlinear. Both the above requirements of a linear
operator can be integrated into a single property,

L(αu + βv) = αLu + βLv u, v ∈ X ∀ α, β (1.22)

Example 1: The identity operator maps elements in X into itself.

Iu = u ∀u ∈ X (1.23)

To show that I is linear, we note that

    I(αu + βv) = αu + βv = αIu + βIv

Example 2: The (n × n) matrix A is a linear transformation since

    A(\alpha u + \beta v) = \sum_{j=1}^{n} a_{ij}(\alpha u_j + \beta v_j), \qquad i = 1 \ldots n
    = \alpha \sum_{j=1}^{n} a_{ij} u_j + \beta \sum_{j=1}^{n} a_{ij} v_j, \qquad i = 1 \ldots n
    = \alpha A u + \beta A v
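The same index argument can be spot-checked numerically; the sketch below (illustrative only, with a randomly generated matrix) confirms Eq. 1.22 for a matrix operator:

import numpy as np

# Check A(alpha*u + beta*v) = alpha*A@u + beta*A@v for a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
u, v = rng.standard_normal(4), rng.standard_normal(4)
alpha, beta = 2.5, -1.3

lhs = A @ (alpha * u + beta * v)
rhs = alpha * (A @ u) + beta * (A @ v)
print(np.allclose(lhs, rhs))   # True: the matrix operator is linear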

Example 3: The reaction diffusion equation, where the operator


    Lu = \frac{\partial u}{\partial t} - D_{AB} \frac{\partial^2 u}{\partial z^2} + \alpha_1 u \qquad (1.24)

    L(\alpha u + \beta v) = \frac{\partial(\alpha u + \beta v)}{\partial t} - D_{AB} \frac{\partial^2 (\alpha u + \beta v)}{\partial z^2} + \alpha_1 (\alpha u + \beta v)
    = \alpha\left(\frac{\partial u}{\partial t} - D_{AB}\frac{\partial^2 u}{\partial z^2} + \alpha_1 u\right) + \beta\left(\frac{\partial v}{\partial t} - D_{AB}\frac{\partial^2 v}{\partial z^2} + \alpha_1 v\right)
    = \alpha L u + \beta L v

Hence the differential operator is linear. Since the operator involves both the differential equation and its associated initial and boundary conditions, the IC and BCs must also satisfy the linearity property for the differential equation to be classified as linear. This can easily be verified for the reaction-diffusion equation, Eq. 1.4.
The linearity property of differential operators has one important consequence from the viewpoint of obtaining solutions. It simply means that if u and v are solutions of the differential equation, then w = αu + βv is also a valid solution. This idea, technically called the principle of superposition, is used widely in the solution of both ordinary and partial differential equations. A familiar example is the solution of the linear differential equation

    \frac{d^2 u}{dx^2} - m^2 u = 0

The solution to the above equation is u(x) = c_1 e^{mx} + c_2 e^{-mx}. Since the differential equation is linear, not only are u_1 = e^{mx} and u_2 = e^{-mx} independent solutions, u = c_1 u_1 + c_2 u_2 is also a valid solution. The constants c_1 and c_2 are obtained by using the appropriate boundary or initial conditions. The principle of superposition has the following consequence for linear operators. If

    u = \sum_{i=1}^{n} c_i u_i

then,

    Lu = L\left(\sum_{i=1}^{n} c_i u_i\right) = \sum_{i=1}^{n} c_i L u_i

Example 4: Volterra Integral equation (Eq. 1.15).

    L(\alpha u + \beta v) = \int_0^x k(x, y)\,[\alpha u(y) + \beta v(y)]\, dy
    = \int_0^x k(x, y)\,\alpha u(y)\, dy + \int_0^x k(x, y)\,\beta v(y)\, dy
    = \alpha \int_0^x k(x, y)\, u(y)\, dy + \beta \int_0^x k(x, y)\, v(y)\, dy
    = \alpha L u + \beta L v

Example 5: To show that if an operator satisfies the property L(u + v) = Lu + Lv, it need not satisfy L(αu) = αLu. Consider L to be the operation of complex conjugation. If z and w are two complex numbers then L(z + w) = Lz + Lw. Clearly L(αz) ≠ αLz if α is a complex scalar.
Example 6: To show that if an operator satisfies the property L(αu) = αLu, it need not satisfy L(u + v) = Lu + Lv. Consider the operation of mapping the components [2], x_1, x_2 of a 2d vector into a point;

    L\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{cases} x_1 + x_2 & x_1 x_2 > 0 \\ 0 & \text{otherwise} \end{cases}

Assuming a real field of scalars α, and vectors u = (1, −1) and v = (1, 1), L(u + v) = 0 whereas Lu + Lv = 2.
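Both examples can be checked numerically. The sketch below (added for illustration) confirms that complex conjugation is additive but not homogeneous over complex scalars, while the 2-D map of Example 6 is homogeneous over real scalars but not additive:

import numpy as np

# Example 5: complex conjugation.
z, w, alpha = 1 + 2j, 3 - 1j, 1j
print(np.conj(z + w) == np.conj(z) + np.conj(w))   # True: additivity holds
print(np.conj(alpha * z) == alpha * np.conj(z))    # False: homogeneity fails

# Example 6: map (x1, x2) -> x1 + x2 if x1*x2 > 0, else 0.
def L(x):
    return x[0] + x[1] if x[0] * x[1] > 0 else 0.0

u, v = np.array([1.0, -1.0]), np.array([1.0, 1.0])
print(L(2.0 * u) == 2.0 * L(u))   # True: homogeneity holds (real alpha)
print(L(u + v) == L(u) + L(v))    # False: additivity fails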
Before we conclude, we briefly discuss the classification of equations into homogeneous and inhomogeneous. In the generic equation, Eq. 1.21, the equation is homogeneous if the right hand side, f = 0. Hence Ax = 0 is an example of a homogeneous set of linear equations. In the case of differential equations f represents the term containing only the independent variables and L represents the operator acting on the dependent variable. Hence the reaction-diffusion equation, Eq. 1.4 and the Schrödinger wave equation, Eq. 1.14 are examples of homogeneous differential equations. The unsteady state heat equation, Eq. 1.9 is inhomogeneous due to the presence of the source term p(x, y, z), and the integral equations given in Eqs. 1.15 and 1.16 are both inhomogeneous integral equations. In the case of differential equations the ICs and BCs can also be classified as homogeneous and inhomogeneous in a similar manner. This classification is important, as we will soon see that determining the solutions to the homogeneous problem forms the first step in studying the existence and uniqueness conditions for the inhomogeneous problem.

1.5 Summary

In this Chapter we have introduced some simple examples of various kinds of equations that commonly arise in engineering and the sciences. Starting with matrix equations, the basic difference between ODEs and PDEs should be clear from the examples presented above. ODEs can further be classified into IVPs, where all the conditions are specified as an initial condition at time t = 0, and BVPs, where the differential equation is accompanied by boundary conditions. The classification presented here is preliminary. PDEs can be further classified into various categories and these will be discussed later in the text. Some examples and a preliminary classification of integral equations were also introduced. Integral equations are more specialized and do not arise as often in the description of physical problems as do differential equations. Sometimes integral equations have an equivalent representation as a differential equation, as we encountered with the IVP example. In many cases this equivalence is not feasible and the integral equation has to be solved directly. Linearity is an important concept and its consequence, the principle of superposition, used routinely in the solution of linear equations, should be recognized. Finally the notion of homogeneous and inhomogeneous equations should be firmly understood.

Bibliography

[1] D. Ramkrishna and N. R. Amundson. Linear Operator Methods in Chemical Engineering. Prentice Hall, 1985.

[2] A. W. Naylor and G. R. Sell. Linear Operator Theory in Engineering and Science. Springer-Verlag, New York, 1982.

PROBLEMS

1. If L is a linear operator show that Ln is also a linear operator. Note that L2 = LL and so
on. Use the method of mathematical induction for your proof.

2. Using the properties of a linear operator, L(u + v) = Lu + Lv and L(αu) = αL(u),


identify which of the following operators are linear.

(a)
    Lu = \frac{d^2 u}{dx^2} + \left(e^{-x} + x^2\right)\frac{du}{dx} + xu.
(b)
    Lc = \frac{\partial c}{\partial t} - \frac{\partial}{\partial x}\left(D(c)\frac{\partial c}{\partial x}\right) - kc
where D(c) is a concentration dependent diffusion coefficient and k is the reaction
rate constant. Rework this problem with D as a function only of x.

(c)
    Lu \equiv \int_0^x \frac{u(y)}{\sqrt{\alpha x - \beta y}}\, dy = f(x)
Differentiate the above Volterra integral equation using the rules for differentiation under an integral sign. Is the resulting operator still linear? The process of differentiation converts the first kind Volterra integral equation to that of the second kind, where the unknown u appears both inside the integral as well as outside. Note: α and β are arbitrary scalars.

3. Using the properties of linear operators, determine which of the following operators are
linear

(a) The divergence operator,

    Lu \equiv \nabla \cdot \big(\alpha(x, y, z)\, \nabla u\big)

(b) The curl operator,


Lu ≡ ∇ × u

(c) The Fredholm integral operator

    Lu \equiv \int_a^b e^{x-y}\, u(x)\, dx + u^2(y)

4. In the following equations of the generic form Lu = f , identify the operator and deter-
mine if the operator is linear or not.

(a) Heat equation with spatially dependent thermal conductivity

    \frac{\partial u}{\partial t} + \sin x\, \frac{\partial}{\partial x}\left(e^{-x}\frac{\partial u}{\partial x}\right) = f(x)
(b) The nth order ordinary differential equation
    a_0 \frac{\partial^n u}{\partial x^n} + a_1 \frac{\partial^{n-1} u}{\partial x^{n-1}} + \cdots + a_n = f(x)
(c) The 3D wave equation:
    \frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u
(d) Integral equation:
    \int_0^x e^{(x-y)}\, e^{u(y)}\, dy + u(x) = f(x)
(e) The Integro-differential equation,

    \frac{du}{dt} = \int_0^t e^{\frac{t-t'}{\tau}}\, u(t')\, dt'

(f) The Korteweg-de Vries (KdV) equation used in the study of water waves

    \frac{\partial u}{\partial t} + cu\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0

(g) Which of the above equations given in parts (a) - (f) are homogeneous.

5. Check the following transforms for linearity

(a) The Laplace transform


    f(s) \equiv L[f(t)] = \int_0^{\infty} e^{-st}\, f(t)\, dt

(b) The Fourier transform


    f(\omega) \equiv L[f(t)] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\omega t}\, f(t)\, dt

6. Using the following dimensionless variables,

u = CA (z)/CA0 , x = z/L

the dimensionless form of the steady state differential equation (Eq. 1.8) is,

    \frac{d^2 u}{dx^2} - \phi^2 u = 0, \qquad 0 < x < 1 \qquad (1.25)

where φ² = 2k_1 L²/(D_AB R), and L is the pore length and R is the radius of the pore. Obtain
analytical solutions for both the dead end pore and the reactive end pore. Qualitatively
sketch your solution for the dimensionless concentration u for small and large values of
the parameter φ2 . Physically interpret these two conditions.

Chapter 2

Properties of Matrices

In this section we review some basic properties of matrices. Let A be an m × n matrix,

    A(m \times n) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}

where m is the number of rows and n the number of columns. a_{ij} denotes the i, j element in the matrix. A column vector x is an n × 1 matrix,

    x(n \times 1) = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}

Matrices arise frequently in engineering applications and can assume a variety of forms, some of
which are illustrated below. Many of these forms arise during numerical solution of differential
equations and recognizing the form of the matrix is important while choosing the appropriate
solution technique.

2.1 Equality of matrices

Two matrices A and B are said to be equal to each other if aij = bij . Only matrices of similar
order can be considered to be equal.

[Figure 2.1: Various classifications of commonly occurring matrices: (a) diagonal, (b) tridiagonal, (c) banded, (d) block diagonal, (e) lower triangular, (f) dense. The solid lines and filled regions represent non-zero elements. Sparse matrices (not shown) contain a sparse distribution of non-zero elements in the matrix.]

2.2 Addition of matrices

Matrices are compatible for addition only if the corresponding numbers of rows and columns
are similar. Matrix addition is both associative and commutative

1a (A + B) + C = A + (B + C) Associative
1b A+B= B+A Commutative

2.3 Scalar multiplication

When a matrix is multiplied by a scalar α, all the elements of the matrix are multiplied by α.

    \alpha A(m \times n) = \begin{pmatrix} \alpha a_{11} & \alpha a_{12} & \cdots & \alpha a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha a_{m1} & \alpha a_{m2} & \cdots & \alpha a_{mn} \end{pmatrix}

2.4 Multiplication of Matrices

Two matrices A(m × n) and B(n × p) are compatible for multiplication if the number of columns of A is equal to the number of rows of B. Matrix multiplication satisfies the following properties

3a A(B + C) = AB + AC
3b (A + B)C = AC + BC
3c (AB)C = A(BC)
3d In general AB ≠ BA

2.5 Transpose of a matrix

The transpose of a matrix A is obtained by interchanging its rows and columns. The transpose
is denoted by AT . The operation of a transpose satisfies the following properties.

4a (A + B)T = AT + BT
4b (AT )T = A
4c (AB)T = BT AT

2.6 Trace of a matrix

The sum of the diagonal elements of a square matrix is known as the trace. The trace of an
(n × n) square matrix,
    \mathrm{Trace}\, A = \sum_{i=1}^{n} a_{ii}
A number of the properties of matrices listed above can be proved using index algebra. We
illustrate these manipulations with some examples which the reader should get acquainted with.
Example: Matrix vector multiplication
    Ax = b \;\rightarrow\; \sum_{j=1}^{n} a_{ij} x_j = b_i, \qquad i = 1 \ldots m

where A is an (m × n) matrix, x is (n × 1) and b has dimensions (m × 1).


Example: Matrix multiplication
    A(m \times p)\, B(p \times n) = C(m \times n) \;\rightarrow\; \sum_{k=1}^{p} a_{ik} b_{kj} = c_{ij}, \qquad i = 1 \ldots m,\; j = 1 \ldots n
Example: To show that (AB)^T = B^T A^T. Let c^T_{ij} be the element of (AB)^T.

    c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
    c^T_{ij} = \sum_{k=1}^{n} a_{jk} b_{ki}

Let d_{ik} be the element of B^T A^T. We need to show that d_{ij} = c^T_{ij}.

    d_{ik} = \sum_{j=1}^{n} b_{ji} a_{kj}
    d_{ij} = \sum_{k=1}^{n} b_{ki} a_{jk} = \sum_{k=1}^{n} a_{jk} b_{ki} = c^T_{ij}

The second line in the algebra above is obtained by interchanging the index k with j. This example illustrates manipulations with indices that the reader should be acquainted with.
Example: To show that, Trace (AB) = Trace (BA)
    \mathrm{Trace}\,(AB) = \sum_{i=1}^{m} \sum_{k=1}^{n} a_{ik} b_{ki}

    \mathrm{Trace}\,(BA) = \sum_{i=1}^{n} \sum_{k=1}^{m} b_{ik} a_{ki} = \sum_{i=1}^{m} \sum_{k=1}^{n} b_{ki} a_{ik} = \mathrm{Trace}\,(AB)

We have assumed that A is an (m × n) matrix and B is an (n × m) matrix. In the second line


of the above equation, the indices have been exchanged and the upper limits in the summation
have been consistently altered.
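Both identities are easy to confirm numerically; the sketch below (illustrative, with randomly generated non-square matrices) checks (AB)^T = B^T A^T and Trace(AB) = Trace(BA):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # (m x n)
B = rng.standard_normal((5, 3))   # (n x m)

print(np.allclose((A @ B).T, B.T @ A.T))             # transpose identity
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # trace identity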

2.7 Symmetric and Hermitian Matrices

The matrix A, is said to be symmetric if

A = AT . (2.1)

We note that the above notion of symmetry is restricted to real matrices. If the matrix has
complex elements then we define A∗ as the matrix obtained by taking the complex conjugate
of AT . Hence
    A^* = \overline{A}^{\,T} \equiv \overline{A^T}

Note that the operation of taking the transpose and complex conjugation commute.
A matrix is said to be Hermitian if

A = A∗ (2.2)

The above definition includes real matrices as well. In the case of real matrices Eq. 2.2 is
equivalent to Eq. 2.1.
Example:
    A = \begin{pmatrix} 1 & i \\ i & 2 \end{pmatrix}, \qquad A^* = \begin{pmatrix} 1 & -i \\ -i & 2 \end{pmatrix}

Example:
    A = \begin{pmatrix} i & 1-i \\ 1+i & 0 \end{pmatrix}, \qquad A^* = \begin{pmatrix} -i & 1-i \\ 1+i & 0 \end{pmatrix}

Example:
    A = \begin{pmatrix} 1 & 1-i \\ 1+i & 0 \end{pmatrix}, \qquad A^* = \begin{pmatrix} 1 & 1-i \\ 1+i & 0 \end{pmatrix}

Example:
    A = \begin{pmatrix} i & i \\ i & i \end{pmatrix}, \qquad A^* = -\begin{pmatrix} i & i \\ i & i \end{pmatrix}
Only the matrix in the third example is Hermitian. Clearly a matrix with complex (non-real) elements on the diagonal cannot be Hermitian. The last example is an example of a skew Hermitian matrix, where A = −A*.
Example: To show that the product of two symmetric matrices need not be symmetric. Let A
and B be two symmetric matrices.

    (AB)^T = B^T A^T = BA ≠ AB

This is an example of a proof which did not involve the use of indices.

2.8 Inverse

The inverse of A denoted by A−1 is such that

AA−1 = A−1 A = I

where the identity matrix I is a diagonal matrix with 1’s on the diagonal.
Example

    A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad A^{-1} = \frac{1}{|A|}\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}

where |A| = a11 a22 −a21 a12 is the determinant of the matrix A. We can generalize the definition
of the inverse by using the adjoint or adjugate of a matrix.

2.9 Determinants, Cofactors and Adjoints

The minor |Mij | of an element aij in the matrix A is the determinant of an (n − 1) × (n − 1)


matrix formed by omitting the ith row and the j th column. The cofactor of the element aij ,

Aij = (−1)i+j |Mij |

The determinant of an n × n matrix expressed as an expansion in terms of the cofactors is


n
X
|A| = a1j A1j
j=1
Xn
= a1j (−1)1+j |M1j |
j=1

The adjoint of a matrix, denoted as adj A, is the transpose of the cofactor matrix whose elements
are made up of Aij (Eq. 2.9). This definition should not be confused with the adjoint operator
whose definition we will encounter later in the text. The inverse of a matrix can be expressed
using the definition of the adjoint by noting that

A adjA = |A|I
(adjA) A = |A|I

which implies that

    A^{-1} = \frac{1}{|A|}\, \mathrm{adj}\, A
Example

    A = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 3 & 2 \\ 1 & -1 & 1 \end{pmatrix}, \qquad \mathrm{adj}\, A = \begin{pmatrix} 5 & -1 & 7 \\ 2 & 2 & -2 \\ -3 & 3 & 3 \end{pmatrix}, \qquad A^{-1} = \frac{1}{12}\, \mathrm{adj}\, A
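The cofactor construction can be reproduced directly; the Python sketch below (added as an illustration) rebuilds adj A for this matrix and recovers A^{-1} = adj A/|A|:

import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 3.0,  2.0],
              [1.0, -1.0, 1.0]])

n = A.shape[0]
cof = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        # minor: delete row i and column j, then take the determinant
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

adjA = cof.T                     # adjugate = transpose of the cofactor matrix
A_inv = adjA / np.linalg.det(A)  # |A| = 12 for this matrix

print(np.round(adjA))                        # matches adj A quoted above
print(np.allclose(A_inv, np.linalg.inv(A)))  # True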

2.10 Echelon forms, rank and determinants

The echelon form for any matrix A is such that the number of zeroes preceding the first non-zero element in every row increases row by row (starting from the 1st row). The echelon forms can be obtained by performing elementary row operations on the matrix. Some examples of row reduced echelon forms are given below,

    \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 & 5 & 2 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 5 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad (2.3)
Clearly if the number of zeroes preceding the first non-zero element is the same in the k and
k + 1 rows, the first non-zero element in the k + 1 row can be made zero by elementary row
operations. Once the echelon forms are obtained it is easy to deduce the rank of the matrix. The
rank r of a matrix A is the number of rows containing non-zero elements in the row reduced
echelon form. Using the above definition, the rank of the matrices given above are 2, 1 and
3 respectively. From the definition of the rank it is easy to see that r ≤ min(m, n), r ≠ 0
(unless all the elements in the matrix are identically zero). A similar definition of the rank can
be generated using column operations. The row rank is equal to the column rank or equivalently
the maximum number of linearly independent rows is equal to the maximum number of linearly
independent columns in a matrix. In subsequent use of the rank, we will use the definition
based on the row rank as this will provide a convenient method for obtaining solutions of linear
systems of equations. An additional definition of the rank is based on the determinant. The rank
is the order of the largest non-zero determinant in the matrix. If the rank of a matrix is k, then
there is at least one determinant of order k that is nonzero. All determinants of order k + 1 must
vanish.
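As a cross-check (an illustration added here), numpy.linalg.matrix_rank applied to the three echelon forms of Eq. 2.3 returns the ranks quoted above:

import numpy as np

M1 = np.array([[1, 1],
               [0, 2]])
M2 = np.array([[1, 1, 0, 2],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
M3 = np.array([[2, 1, 5, 2],
               [0, 1, 2, 3],
               [0, 0, 5, 1],
               [0, 0, 0, 0]])
print([np.linalg.matrix_rank(M) for M in (M1, M2, M3)])   # [2, 1, 3]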

Some examples of obtaining the ranks of matrices using the echelon forms are given below,
Example: (3 × 3) matrix, with r = 3.

    \begin{pmatrix} 1 & 2 & -1 \\ 0 & 3 & 2 \\ 1 & 0 & 1 \end{pmatrix} \;\xrightarrow{R_3 - R_1}\; \begin{pmatrix} 1 & 2 & -1 \\ 0 & 3 & 2 \\ 0 & -2 & 2 \end{pmatrix} \;\xrightarrow{3R_3 + 2R_2}\; \begin{pmatrix} 1 & 2 & -1 \\ 0 & 3 & 2 \\ 0 & 0 & 10 \end{pmatrix} \qquad (2.4)

Example: (3 × 3) matrix, with r = 2.

    \begin{pmatrix} 1 & 2 & -1 \\ 2 & 3 & 2 \\ 1 & 1 & 3 \end{pmatrix} \;\xrightarrow[R_3 - R_1]{R_2 - 2R_1}\; \begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 4 \\ 0 & -1 & 4 \end{pmatrix} \;\xrightarrow{R_3 - R_2}\; \begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 4 \\ 0 & 0 & 0 \end{pmatrix} \qquad (2.5)

The rank of the above matrix can be obtained by column operations,

    \begin{pmatrix} 1 & 2 & -1 \\ 2 & 3 & 2 \\ 1 & 1 & 3 \end{pmatrix} \;\xrightarrow[C_3 - 3C_1]{C_2 - C_1}\; \begin{pmatrix} 1 & 1 & -4 \\ 2 & 1 & -4 \\ 1 & 0 & 0 \end{pmatrix} \;\xrightarrow{C_3 + 4C_2}\; \begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \qquad (2.6)

The above example illustrates that the rank can be determined by either the row or column
reduced echelon forms. This is a consequence of the property that the order of the largest non-zero determinant of a matrix is unchanged by elementary row or column operations as carried
out above (show this). Some elementary properties of determinants can be understood from the
above examples. For a square matrix (n × n) the determinant is non-zero if and only if r = n.
Adding a multiple of one row to another leaves the determinant unchanged. Thus in Eq. 2.4
above, the determinant of the matrix is 10, row operations R3 − R1 leaves the determinant
unchanged, however the last row operation 3R3 + 2R2 changes the determinant to 30, since R3
is multiplied by 3. The last property can easily be proved with the help of cofactor expansions. Let A′ denote the matrix obtained by replacing R_1 with αR_1 + βR_2. Then

    |A'| = \sum_{j=1}^{n} (\alpha a_{1j} + \beta a_{2j}) A_{1j}
         = \alpha \sum_{j=1}^{n} a_{1j} A_{1j} + \beta \sum_{j=1}^{n} a_{2j} A_{1j}
         = \alpha \sum_{j=1}^{n} a_{1j} A_{1j}
         = \alpha |A|
The second sum in the second line is the cofactor expansion of a matrix whose first row is a multiple of its second row, and hence it vanishes identically. If α = 1 then the determinant is unchanged.

PROBLEMS

1. Consider the matrix

    A = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 3 & 5 \\ 2 & 1 & -1 \end{pmatrix}

(a) Find AT , A−1 , A2 , det(A) and det(A5 ). Use the adjoints to find the inverse.

(b) Find the solution to Ax = b where b = (−1, 1, 3)T

2. Consider the matrix

    A = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 3 & 2 \\ 1 & 1 & -4 \end{pmatrix}
Find the solutions to Ax = b where b = (1, 2, 0)T

3. A skew symmetric matrix is such that

AT = −A

(a) Show that a skew symmetric matrix is square.

(b) What are the diagonal elements of a skew symmetric matrix ?

(c) If A is an (n × n) matrix then show that (A − AT ) is skew symmetric.

(d) Show that any square matrix can be decomposed into a sum of symmetric and skew
symmetric matrices.

(e) Show that a Hermitian matrix can be written as the sum of a real symmetric ma-
trix and an imaginary skew symmetric matrix. Check this property with a suitable
example.

4. Show that (AB)T = BT AT .

5. Using the definition of cofactors and adjoints show that

A(adjA) = (adjA)A = |A|I.

6. If A and B are two noncommuting Hermitian matrices such that

AB − BA = iC,

prove that C is Hermitian.

7. The sum of the diagonal elements in a square matrix is known as the trace. Show that

trace(AB − BA) = 0

8. If A and B are Hermitian matrices, show that (AB + BA) and i(AB − BA) are also
Hermitian.

9. If C is non-Hermitian, show that C + C∗ and i(C − C∗ ) are Hermitian.

10. A real matrix is said to be orthogonal if A−1 = AT . Show that the product of two
orthogonal matrices is orthogonal. Further, show that det(A) = ±1. Note: If A is
complex and A−1 = A∗ then A is said to be unitary.

11. Orthogonal matrices arise in co-ordinate transformations. Consider a point (x, y) in the
X − Y plane. If the X − Y plane is rotated counter-clockwise by an angle φ then the
point (x, y) is transformed to the point (x′ , y ′) in the X ′ − Y ′ co-ordinate system. The
rotation operation can be represented by a matrix equation

Ax = x′

or

    \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x' \\ y' \end{pmatrix}

In 3 dimensions, rotation about the z axis by an angle φ is represented by

    B = \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}

Verify that A and B are orthogonal matrices.

Chapter 3

Vector or Linear Spaces

The vector or linear space is the simplest of the abstract spaces that we will encounter. A
vector space X is a collection of vectors that can be combined by addition and each vector
can be multiplied by a scalar. If u, v, w ∈ X and α and β lie in the associated field of scalars, the elements of the vector space satisfy the following axioms,
1. Linearity:

(1a) u + v = v + u


(1b) u + (v + w) = (u + v) + w
(1c) There ∃ a unique vector 0 such that u + 0 = u ∀ u ∈ X
(1d) u + (−u) = 0

2. Multiplication by a scalar:

(2a) α(βu) = αβu


(2b) (α + β)u = αu + βu
(2c) α(u + v) = αu + αv

In order to show that elements in a set X constitute a vector space, the elements must conform to
all the properties of the linear space listed above. The properties of a linear space simply allow
for vector addition and multiplication of the elements by a scalar. Examples of vector spaces are the n-dimensional vector space, which consists of vectors with n real elements, also referred to as R^n. Alternately the elements that constitute the vector can be complex. This is known as the space C^n. Functions can also make up a linear space. Hence the set of all continuous functions on the interval [a, b] makes up a vector space, called C[a, b]. The reader should ensure that these examples satisfy the properties of the linear space.

3.1 Linear Independence, Basis and Dimension

The notions of linear independence and dependence are important and desirable properties for a collection of vectors. The ideas developed in this section are important while obtaining solutions to linear equations and lay a general framework for obtaining solutions to various classes of operators. A collection of vectors u_1, u_2, ..., u_n is said to be linearly independent if the only solution to

    \alpha_1 u_1 + \alpha_2 u_2 + \ldots + \alpha_n u_n = 0 \qquad (3.1)

is the trivial solution, i.e. α_i = 0 for i = 1 ... n. Eq. 3.1 represents a linear combination of vectors. If there exist some values of α_i, not all zero, such that Eq. 3.1 is satisfied then the set of vectors is linearly dependent. In other words, for the set to be linearly dependent, non trivial solutions exist for Eq. 3.1. We illustrate the notion of linear independence by relating it to solutions of homogeneous linear equations. If u_i consists of a collection of vectors in R^n then,

    u_1 = \begin{pmatrix} u_{11} \\ u_{21} \\ \vdots \\ u_{n1} \end{pmatrix}, \quad u_2 = \begin{pmatrix} u_{12} \\ u_{22} \\ \vdots \\ u_{n2} \end{pmatrix}, \quad \ldots, \quad u_n = \begin{pmatrix} u_{1n} \\ u_{2n} \\ \vdots \\ u_{nn} \end{pmatrix}

where uij is the ith element of the vector uj then Eq. 3.1 can be recast as a collection of
homogeneous linear equations which can be represented as

Aα = 0 (3.2)

where

    A(n \times n) = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nn} \end{pmatrix} \quad \text{and} \quad \alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}

Hence in order to examine whether the set of vectors given in Eq. 3.1 is linearly independent
we can equivalently seek solutions of the set of linear algebraic equations Eq. 3.2. If Eq. 3.2
has nontrivial solutions (α_i ≠ 0 for some i) then the set of vectors is linearly dependent. If the only solution is the trivial solution (α_i = 0 for all i) then the set is linearly independent. We illustrate this with some examples.
Example 1: Consider the set of vectors

    u_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}

Recasting them into a set of algebraic equations of the form Eq. 3.2,

    A\alpha = \begin{pmatrix} 1 & 1 & 0 \\ 2 & -2 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = 0

Using row operations, it can be shown that the solution to the above equation is only the trivial solution. The determinant of A is non-zero since the rank = 3. Hence the set of vectors is linearly independent.
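The same conclusion follows numerically; the sketch below (illustration only) assembles the vectors of Example 1 into a matrix and checks its rank and determinant:

import numpy as np

u1, u2, u3 = [1, 2, 1], [1, -2, 1], [0, 1, 1]
A = np.column_stack([u1, u2, u3]).astype(float)

print(np.linalg.matrix_rank(A))   # 3: only the trivial solution of A alpha = 0
print(np.linalg.det(A))           # non-zero (-4), so the set is independent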
Example 2: Consider the set of vectors

    u_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \quad u_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}

Recasting them into a set of algebraic equations of the form Eq. 3.2, it can be shown that the non-trivial solution is,

    \alpha = c \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}

where c is an arbitrary constant. Hence the set of vectors is linearly dependent. From the last example we can see that if any two vectors in a set are linearly dependent then the entire set is linearly dependent. We can generalize this observation.
Theorem: If a subset of vectors in a set of vectors are linearly dependent then the entire set is
linearly dependent.
Proof: Consider a set of n vectors where the first m vectors are linearly dependent.
    \sum_{i=1}^{m} \alpha_i u_i + \sum_{i=m+1}^{n} \alpha_i u_i = 0 \qquad (3.3)

Since the first sum containing vectors from i = 1 to m forms a linearly dependent set, this implies that there are values of α_i, not all zero, for which

    \sum_{i=1}^{m} \alpha_i u_i = 0

Further since the second sum containing terms from i = m + 1 to n are linearly independent
αi = 0 for i = m + 1 to n. Hence there will always exist non trivial values of αi such that
Eq. 3.3 is satisfied and the set is linearly dependent.

3.2 Basis

Linearly independent vectors have a number of useful properties. An important property con-
cerns using a linearly independent set of vectors to represent other vectors. We will see later
that these ideas can be extended to represent functions as well. If a vector x lies in a finite
dimensional space X then we would like to represent x in a collection of suitable vectors which
we will call a basis for the space X. A finite collection of vectors φi is said to form a basis
for the finite dimensional space X if each vector in X can be represented uniquely as a linear
combination of the basis vectors.
    x = \sum_{i=1}^{n} \alpha_i \phi_i \qquad \forall\; x \in X \qquad (3.4)

The term unique in the definition implies that for a given basis set φ_i and x, the α_i values are uniquely determined. Let us illustrate these ideas with some simple examples of basis sets.
Example 3: The vectors

    \phi_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \phi_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \phi_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

form a basis for vectors in R3 , which implies that any vector in R3 can be represented uniquely
using a linear combination of the above vectors. If a, b and c, represent the components of an
arbitrary vector in R^3 then

    \alpha_1 \phi_1 + \alpha_2 \phi_2 + \alpha_3 \phi_3 = \begin{pmatrix} a \\ b \\ c \end{pmatrix}
implies that the coefficients of the expansion are uniquely determined as α1 = a, α2 = b and
α3 = c.
Example 4: The vectors given in Example 1 also constitute a basis for vectors in R3 since the
determinant of the resulting matrix formed from the column vectors is non zero.
From the above examples it is clear that for an n-dimensional vector space any set of n
linearly independent vectors form a suitable basis for the space. In seeking a suitable basis, the
representation is complete when the coefficients of the expansion given in Eq. 3.4 are obtained.
Clearly some basis sets simplify the determination of these coefficients and the basis in Example
3 was one example of a convenient basis, referred to as the orthonormal basis set. Thus, vectors in a basis are linearly independent, and in an n dimensional vector space any set of n linearly independent vectors forms a basis for the space.
Dimension of a basis: The linear space X is n dimensional if it possesses a set of n linearly independent vectors, but every set of n + 1 vectors is linearly dependent. Equivalently, the number of vectors in a basis is its dimension.
Example 5: The set of polynomials of degree < n constitute a basis for an n dimensional linear
space of polynomials of degree < n. The basis set is

φ1 = 1 , φ2 = x , . . . , φn = xn−1

3.3 Linear independence of functions

We next extend the concepts of linear independence for functions. Consider the set of functions,
f1 (x), f2 (x), f3 (x) . . . fn (x) which are differentiable n − 1 times on the interval [a, b]. The
functions are linearly independent on [a, b] if

α1 f1 (x) + α2 f2 (x) + α3 f3 (x) + . . . αn fn (x) = 0 ∀ x ∈ [a, b] (3.5)

implies that αi = 0, i = 1 . . . n. Differentiating Eq. 3.5 n − 1 times, a set of equations involving
the derivatives of the functions can be generated.

    \alpha_1 f_1(x) + \alpha_2 f_2(x) + \ldots + \alpha_n f_n(x) = 0
    \alpha_1 f_1'(x) + \alpha_2 f_2'(x) + \ldots + \alpha_n f_n'(x) = 0
    \vdots
    \alpha_1 f_1^{(n-1)}(x) + \alpha_2 f_2^{(n-1)}(x) + \ldots + \alpha_n f_n^{(n-1)}(x) = 0 \qquad (3.6)

Eq. 3.6 represents a set of homogeneous equations and the Wronskian is the determinant formed by the functions,

    |W(f_1(x), f_2(x), \ldots, f_n(x))| = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}

For Eq. 3.5 to have only the trivial solution, |W| ≠ 0 ∀ x ∈ [a, b]. In this case the set of functions f_1(x), f_2(x), ..., f_n(x) is said to be linearly independent. However if the Wronskian vanishes for some or all x ∈ [a, b] it does not necessarily imply that the set is linearly dependent. Thus |W| ≠ 0 ∀ x ∈ [a, b] is only a sufficient condition for the linear independence of the set of functions.
Example: f_1(x) = sinh x, f_2(x) = cosh x,

    |W(f_1(x), f_2(x))| = \begin{vmatrix} \sinh x & \cosh x \\ \cosh x & \sinh x \end{vmatrix} = \sinh^2 x - \cosh^2 x = -1 \neq 0 \qquad (3.7)
Thus sinh x and cosh x constitute a linearly independent set of functions.
Example: Consider the polynomials, f_1(x) = 1, f_2(x) = x and f_3(x) = x^2,

    |W(f_1(x), f_2(x), f_3(x))| = \begin{vmatrix} 1 & x & x^2 \\ 0 & 1 & 2x \\ 0 & 0 & 2 \end{vmatrix} = 2 \neq 0 \qquad (3.8)

Thus the set of polynomials constitutes a linearly independent set of functions. This can be extended to a set of nth degree polynomials.
Example: f_1(x) = x^2, f_2(x) = 2x^2,

    |W(f_1(x), f_2(x))| = \begin{vmatrix} x^2 & 2x^2 \\ 2x & 4x \end{vmatrix} = 0 \qquad (3.9)
and the set is linearly dependent.
Example: f1 (x) = x, f2 (x) = x2 , x ∈ [0, 1]

α1 x + α2 x2 = 0

For x(α1 + α2 x) = 0 for all x ∈ [0, 1], α1 = α2 = 0. Thus the set is linearly independent.
Upon examining the Wronskian,

    |W(f_1(x), f_2(x))| = \begin{vmatrix} x & x^2 \\ 1 & 2x \end{vmatrix} = x^2 \qquad (3.10)

W vanishes for x = 0. This is an example where the vanishing of the Wronskian for a particular value of x does not imply that the set is linearly dependent. Clearly this set is linearly independent.
Example: f_1(x) = x^2, f_2(x) = x|x|, x ∈ [−1, 1]

    \alpha_1 x^2 + \alpha_2 x|x| = 0

α_1 = −α_2 |x|/x. For −1 < x < 0, α_1 = α_2. For 0 < x < 1, α_1 = −α_2, and at x = 0, α_1, α_2 are arbitrary. Thus the only way in which α_1 x^2 + α_2 x|x| can be identically zero ∀ x is when α_1 = α_2 = 0. Hence f_1(x) = x^2, f_2(x) = x|x|, x ∈ [−1, 1] constitute a linearly independent set. The Wronskian for this case is,

    |W(f_1(x), f_2(x))| = \begin{vmatrix} x^2 & x|x| \\ 2x & |x| + x h(x) \end{vmatrix}, \qquad \text{where } h(x) = \frac{d|x|}{dx} = \begin{cases} 1 & 0 < x < 1 \\ -1 & -1 < x < 0 \end{cases}

In this case W = −x^2|x| + x^3 h(x) = 0 ∀ x. This is another example where the vanishing of the Wronskian does not imply that the set is linearly dependent.
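The Wronskians in the examples above can also be generated symbolically; a short sympy sketch (an illustration, not part of the original notes):

import sympy as sp

x = sp.symbols('x')

def wronskian(funcs):
    # Build the matrix of functions and their derivatives, then take its determinant.
    funcs = [sp.sympify(f) for f in funcs]
    n = len(funcs)
    rows = [funcs] + [[sp.diff(f, x, k) for f in funcs] for k in range(1, n)]
    return sp.simplify(sp.Matrix(rows).det())

print(wronskian([sp.sinh(x), sp.cosh(x)]))   # -1: non-zero, independent
print(wronskian([1, x, x**2]))               # 2: non-zero, independent
print(wronskian([x**2, 2*x**2]))             # 0: dependent
print(wronskian([x, x**2]))                  # x**2: vanishes only at x = 0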

3.4 Solution of linear equations

One of our primary goals lies in seeking solutions to the general class of linear equations of the
form,
Ax = b (3.11)

where A is in general an m × n matrix. While discussing issues relating to the solutions of Eq. 3.11 we will make use of the null space and range space of A, which will also use the ideas of basis sets introduced in this Chapter.

The existence or solvability condition for a set of linear algebraic equations of the form
in Eq. 3.11 can be stated as follows. Ax = b is solvable if the rank of A is equal to the rank
of the augmented matrix A|b. The augmented matrix is obtained by adding an extra column
vector b to the matrix A. We illustrate the solvability conditions with reference to the examples
of the echelon matrices given in the previous Chapter. Consider the following row reduced
forms for the augmented matrices,

    \left(\begin{array}{cc|c} 1 & 1 & 3 \\ 0 & 2 & 0 \end{array}\right), \qquad \left(\begin{array}{cccc|c} 1 & 1 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right), \qquad \left(\begin{array}{cccc|c} 2 & 1 & 5 & 2 & 1 \\ 0 & 1 & 2 & 3 & 2 \\ 0 & 0 & 5 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right) \qquad (3.12)
The first and third augmented matrices satisfy the rank criterion and are hence solvable. Once
the equations are solvable we inquire into the condition of uniqueness. To answer this we first
examine the solutions to the homogeneous problem

Ax = 0 (3.13)

and define the null space of A denoted as N (A). N (A) consists of all vectors x that satisfy
the homogeneous equation, Eq. 3.13. We illustrate how the null space can be obtained for the
following set of algebraic equations,

− x1 + x3 + 2x4 = 0
−x1 + x2 − x4 = 0
−x2 + x3 + 3x4 = 0
x1 − 2x2 + x3 + 4x4 = 0 (3.14)

Using a series of row operations the matrix can be reduced as follows,

    A = \begin{pmatrix} -1 & 0 & 1 & 2 \\ -1 & 1 & 0 & -1 \\ 0 & -1 & 1 & 3 \\ 1 & -2 & 1 & 4 \end{pmatrix} \;\rightarrow\; \begin{pmatrix} -1 & 0 & 1 & 2 \\ 0 & 1 & -1 & -3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad (3.15)
resulting in the following two linear equations

x1 − x3 − 2x4 = 0
x2 − x3 − 3x4 = 0

If x_3 = α_1 and x_4 = α_2, the solution vector x can be written in terms of two basis vectors as follows,

    x = \alpha_1 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 2 \\ 3 \\ 0 \\ 1 \end{pmatrix}
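As a quick check on the solution procedure (a sympy sketch added for illustration), the rank and null-space basis of A can be computed directly and match the result above:

import sympy as sp

A = sp.Matrix([[-1,  0, 1,  2],
               [-1,  1, 0, -1],
               [ 0, -1, 1,  3],
               [ 1, -2, 1,  4]])

print(A.rank())          # 2, so dim N(A) = 4 - 2 = 2
for v in A.nullspace():  # basis vectors (1, 1, 1, 0) and (2, 3, 0, 1)
    print(v.T)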
We make a few observations. As a check on the solution procedure we should ensure that x
given above satisfies the original set of equations. The rank of the matrix in the above example
is 2 which is equal to the number of linearly independent equations. Since the number of
unknowns is 4 the dimension of N (A) is 4 - 2 = 2. The dimension of N (A) is simply the
number of linearly independent vectors in the basis used to represent the solution space x. This
result is easily generalizable. For a general m × n matrix whose rank is r the dimension of
N (A) is n − r. Note that n is the number of unknowns in the set of equations and n − r which
is the number of arbitrary ways in which the unknowns can be specified yields the dimension
of the basis. Clearly there is no unique way of choosing the unknowns and hence the basis
for N(A) is not unique. However the dimension of N(A) is fixed. N(A) is empty when n = r; then the only solution to the homogeneous problem is the trivial solution. This leads to an important result.
Theorem: If Ax = 0 has only the trivial solution, then Ax = b has a unique solution.
Proof: Let the inhomogeneous equation Ax = b, have two solutions u and v. Then

Au = b
Av = b

Subtracting the two equations


Aw = 0 (3.16)

where w = u−v. Since Ax = 0 has only the trivial solution w = 0 and u = v. Hence Ax = b
has a unique solution. The above proof is always true if the matrix is square. The proof is true
for any m × n matrix, provided the inhomogeneous equation Ax = b is solvable. Example 6
in this section illustrates this situation. Later we will see that a similar proof can be used for
some linear differential and integral operators. If the matrix is square and the inverse exists
(determinant of A 6= 0 or equivalently rank, r = n), then Ax = b has a unique solution which
is x = A−1 b. Further the solution exists for any vector b. The last statement is equivalent to

noting that for a general nonsingular matrix n × n whose rank = n, the rank of the augmented
matrix must also equal n and is consistent with our solvability conditions based on the notion
of the rank.
Earlier we saw that a basis could be defined for the null space of A. Using the solvability
condition based on rank equivalence, we can define an additional space relevant to understand-
ing the solutions to linear equations as the range space of A also denoted as R(A). R(A)
consists of all vectors such that Ax = b is solvable. We illustrate this with a simple example.
Consider the augmented matrix where b_1 and b_2 represent elements of vector b.

    \left(\begin{array}{cc|c} 2 & 3 & b_1 \\ 6 & 9 & b_2 \end{array}\right) \;\rightarrow\; \left(\begin{array}{cc|c} 2 & 3 & b_1 \\ 0 & 0 & 3b_1 - b_2 \end{array}\right) \qquad (3.17)
The second matrix is obtained by elementary row operations. The solvability condition requires
that 3b_1 − b_2 = 0, resulting in the following basis for R(A),

    b = \alpha \begin{pmatrix} 1 \\ 3 \end{pmatrix} \qquad (3.18)
where α is an arbitrary scalar. In the above example, dim[R(A)] = 1 ≡ r. Using the definition
of R(A), the solvability condition is equivalent to stating that Ax = b is solvable if b lies in
R(A).
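The rank-based solvability test is easily automated; the sketch below (illustrative, with two hypothetical right hand sides) checks whether b lies in R(A) for the 2 × 2 example above:

import sympy as sp

A = sp.Matrix([[2, 3], [6, 9]])
b_in = sp.Matrix([1, 3])    # satisfies 3*b1 - b2 = 0, so it lies in R(A)
b_out = sp.Matrix([1, 4])   # does not satisfy the condition

for b in (b_in, b_out):
    solvable = A.rank() == A.row_join(b).rank()
    print(solvable)         # True, then False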
To complete the solution scenario for the linear equations we need to discuss the situa-
tion when the homogeneous equation, Ax = 0 has non-trivial solutions i.e when N (A) is not
empty.
Theorem: If Ax = 0 has non-trivial solutions then Ax = b may or may not be solvable. If it
is solvable then it has an infinity of solutions.
If Ax = 0 has non-trivial solutions then the rank r < n for an n × n square matrix, and for an (m × n) matrix r < n for both m < n and m > n. If the homogeneous problem has non-trivial solutions then Ax = b is solvable if and only if the rank of the matrix equals the rank
of the augmented matrix. If this solvability condition is satisfied, then a solution exists, and the
system has an infinity of solutions. The infinity of solutions is due to the non trivial solutions
of the homogeneous problem and hence can be related to N (A). The solution to Ax = b can
in general be split into two parts in the following manner,
k
X
x= αi φi + xp . (3.19)
i=1

The first term on the right hand side represents part of the solution that lies in N (A) whose
dimension (without loss of generality) is assumed to be k, and φi form the basis for N (A). xp
is a particular solution to Ax = b. To show that x given in Eq. 3.19 is a general solution, we
operate on x with A. Hence
    Ax = A\left(\sum_{i=1}^{k} \alpha_i \phi_i\right) + A x_p
       = \sum_{i=1}^{k} \alpha_i A \phi_i + A x_p
       = b

We note that since φi forms the basis for N (A), Aφi = 0. The infinity of solutions is due to
solutions in N (A) since αi are arbitrary scalars. If the only solution to Ax = 0 is the trivial
solution then N (A) is empty and the solution is unique. In this case x = xp assuming that
the solvability condition is satisfied. The existence and uniqueness conditions for Ax = b
discussed above are summarized in Figure 3.1.

[Figure 3.1: Illustration of the various solution scenarios that are encountered while solving linear equations (a flowchart relating Ax = b and Ax = 0 for n × n and m × n systems, trivial versus non-trivial homogeneous solutions, and the rank test r(A) = r(A|b) leading to unique, infinite or no solutions).]

We end this section on solutions to linear equations with a geometric interpretation of the different solution scenarios discussed above.

3.4.1 Geometrical Interpretation

Consider the following set of linear algebraic equations with two unknowns

a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2
(3.20)

Solutions to the above equations can be analyzed by plotting x2 vs x1 on a two dimensional plot
as shown in the Figures below. We assume that a11 /a12 > 0 and a22 /a21 > 0. Hence both lines
in the above equations will have negative slopes.
Case 1: The determinant, a11 a22 − a21 a12 is non-zero. Hence Ax = 0 has the trivial solution.
This is illustrated in Fig. 3.2, where the solutions to Ax = 0 is only the trivial solution indicated
by the intersection of the two lines at the origin. In this situation both lines have different slopes.
Further, Eq. 3.20 has a unique solution for any vector b lying in the plane.

[Figure 3.2: Solution of linear equations illustrating a unique solution in the x_1–x_2 plane. The dashed lines represent solutions to Ax = 0 (only the trivial solution, at the origin); the solid lines intersect at the unique solution of Ax = b.]

Case 2: The determinant, a11 a22 − a21 a12 is zero. This implies that both lines have the same
slopes (Fig. 3.3). Hence Ax = 0 has an infinity of solutions indicated by the dashed line that

passes through the origin. If the solvability criterion (rank condition) is satisfied then the solution to Ax = b consists of all points on a line having the same slope with intercept b_1/a_{12}.

[Figure 3.3: Solution of linear equations illustrating an infinity of solutions. The dashed line represents the solutions to Ax = 0; Ax = b has an infinity of solutions on a parallel line.]

Case 3: The determinant, a11 a22 − a21 a12 is zero. This implies that both lines have the same
slopes. Hence as in Case 2, Ax = 0 has an infinity of solutions indicated by the dashed line
that passes through the origin. If the solvability criterion (rank condition) is not satisfied then
Ax = b does not have a solution as illustrated in Fig. 3.4.
Example 6:
Let us examine the solvability conditions for the set of linear equations,

x1 + 2x2 = b1
2x1 + 4x2 = b2
x1 = b3 (3.21)

It is easy to see that the only solution to the homogeneous equation Ax = 0 is the trivial solution. Hence N(A) is empty. The range space of A consists of,

    b = \alpha_1 \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

[Figure 3.4: Solution of linear equations illustrating no solutions. The dashed line represents the solutions to Ax = 0; the lines corresponding to Ax = b are parallel and do not intersect.]

Hence 2b1 = b2 and b3 is arbitrary. The solutions are,


    x_1 = b_3, \qquad x_2 = \frac{b_1 - b_3}{2} \equiv \frac{b_2 - 2b_3}{4}
This is an illustrative example, as it is a situation of an m × n system where the null space is
empty. If the b lies in the range, then the system of equations has a unique solution. Figure 3.5
graphically illustrates some possible solution scenarios.

3.5 Summary

Starting from the definitions of the linear or vector space, we introduced the concept of linear independence and subsequently notions of basis sets and dimensions of a basis. The idea of representing vectors or functions in a suitable basis has far reaching consequences in functional analysis and solutions of differential equations. In this Chapter we saw how a basis could be used to construct the null space and range space of a matrix and connect the dimensions of these spaces to the now familiar definition of the rank of the matrix. The theorems on solutions of linear systems complete the discussion on existence and uniqueness for this class of inhomogeneous equations which can be represented as Ax = b. The starting point for the analysis was to investigate the solutions of the homogeneous system. Figure 3.1 schematically illustrates the various scenarios.

Figure 3.5: Solution of linear equations illustrating two possible solution scenarios for the set
of linear equations given in Example 6: b3 < b1 and b1 = b2 = 0. In both cases the solution is
the point where the line x2 = −x1 /2 + b1 /2 (which reduces to x2 = −x1 /2 when b1 = b2 = 0)
intersects the vertical line x1 = b3 .

various scenarios.
Since we are interested in existence and uniqueness conditions for ordinary differential
equations, we have to abandon the notions of ranks and determinants that form the basic tools
to analyse a linear system of equations. We begin to develop a more complete theory of linear
operators in the next Chapter by introducing the inner product space and the adjoint operator.
Once we are equipped with this formalism to study linear differential equations later in the
book, we will first revisit the theorems developed in this Chapter to understand the generality
and utility of these tools and ideas.

PROBLEMS

1. Which of the following column vectors can be used to construct a basis for the three
dimensional vector space R3 .

[1, 0, −1]^T , [1, 1, 2]^T , [2, 1, 1]^T , [1, 3, 2]^T , [1, −1, −1]^T

Once you have picked an appropriate basis set, show that any vector in R3 can be uniquely
represented using this basis. In other words show that the vectors you have chosen form
a valid basis for R3 .

2. Consider the space X consisting of all polynomials, f (x), a ≤ x ≤ b, with real coeffi-
cients and degree not exceeding n.

(a) Show that X is a real linear (vector) space.

(b) What is the dimension of this space ?

(c) Define a suitable basis for this space of polynomials.

(d) Show that your basis does constitute a linearly independent set of vectors.

3. Consider the following functions

φn (t) = (1 − t)^(n−1)

for n = 1 to 4.

(a) Do these form a linearly independent set ?

(b) What is the dimension of the vector space they span ?

(c) Using these functions construct a basis to represent the polynomial 3t3 −2t2 +6t−5.
Find the coefficients of the expansion.

4. Show that the presence of a zero vector in a set of linearly independent vectors makes the
set of vectors linearly dependent.

5. Consider the following matrix

A = [1 −1 2; 2 1 6; 1 2 4]

(a) Reduce A to its echelon form.

(b) What is the rank of A?

(c) What is the dimension of the null space, N (A) of A? Find a basis for N (A).

(d) What is the dimension of the range space, R(A) of A? Find a basis for R(A).

(e) Using your answer from part (d), identify which of the following vectors b will yield
a solution to Ax = b:

[3, 2, −1]^T ,  [1, −1, −2]^T ,  [6, 5, 12]^T

(f) Find the solutions to Ax = b for those vectors b in part (e) for which solutions are
feasible. Note that your solutions consist of a vector that belongs to the null space
of A and a vector that satisfies Ax = b.

6. Consider the following set of linear algebraic equations

x1 + 2x2 + x3 + 2x4 − 3x5 = 2


3x1 + 6x2 + 4x3 − x4 − 2x5 = −1
4x1 + 8x2 + 5x3 + x4 − x5 = 1
−2x1 − 4x2 − 3x3 + 3x4 − 5x5 = 3

(a) Reduce to echelon form.


(b) Find the basis for null space of A.
(c) Find the basis for the range of A.
(d) Construct the complete solution to the set of equations.

(e) Does the system have a unique solution? If not, how many solutions does the system
possess?

7. Consider the following matrix

A = [1 −1 2; 2 1 2; 4 −1 9; 2 1 1]
(a) What is the rank of A?

(b) What is the dimension of the null space (N (A)) of A?

(c) What is the dimension of the range space (R(A)) of A? Find a basis for R(A).

(d) Next consider the transpose of the matrix A. Find a basis for the null space of AT .

(e) Construct a vector space such that the vectors in the space are orthogonal to the null
space of AT . What is the dimension of this vector space ? Compare this orthogonal
vector space with R(A). Can you draw any conclusions.

(f) Find the solutions to Ax = b for

b = [3, 0, 0, 2]^T
Does the system have a unique solution and why? Illustrate your solution graph-
ically. Note: This problem is connected with the general Fredholm's Alternative
theorems to be introduced in Chapter 4, Sec. 4.5.

8. Determine the ranks, dimensions and suitable basis for both the (N (A)) and (R(A)) for
the following sets of linear algebraic equations. If the right hand side vector b is given,
obtain a particular solution to the set of equations.

(a)

x1 − x2 + 3x3 + 2x4 = b1
3x1 + x2 − x3 + x4 = b2
−x1 − 3x2 + 7x3 + 3x4 = b3

(b)

x1 + 2x2 − x3 + x5 = b1
3x1 + 2x2 + x4 = b2
x1 − 2x2 + 2x3 + x4 − 2x5 = b3

(c)

5x1 + 10x2 + x3 − 2x4 = 6


−x1 + x2 − 2x3 + x4 = 0
2x1 + 3x2 + x3 − x4 = 2
6x1 + 9x2 + 3x3 − 3x4 = 6

(d)

x1 + x2 − x3 = b1
−2x1 − x2 + x3 = b2
x1 + 2x2 − 2x3 = b3

Chapter 4

Inner Products, Orthogonality and the


Adjoint Operator

While defining the linear space or vector space we were primarily concerned with elements
or vectors that conform to the rules of addition and scalar multiplication. These are algebraic
properties. The simplicity of the linear space was sufficient to introduce ideas such as linear
independence and basis sets in a finite dimensional setting. We observe that notions of distance,
length and angles between the elements of the space, which reflect geometric properties were
not discussed.
In this chapter we define the inner product space which provides the necessary frame-
work to introduce geometric properties. The primary motivation for this is to lay the grounds
for discussing orthogonality and its relationship to representation of vectors or functions in a
suitable basis set. In this Chapter the Gram-Schmidt orthogonalization process and its relation-
ship to well known orthogonal polynomials, such as the Legendre and Hermite polynomials
will be developed. The inner product space allows us to introduce the Schwarz and triangular
inequalities. We end this chapter with the definition of the adjoint operator and its utility in
studying issues of uniqueness and existence of non-homogeneous linear equations,

Ax = b (4.1)

4.1 Inner Product Spaces

The inner product space consists of a linear space X on which the inner product, denoted by
< ·, · > is defined, where the dots represent any two elements in the space. If u, v and w are
contained in X, and α is an arbitrary scalar contained with the scalar field associated with X,
then the inner product satisfies the following axioms

1. Linearity:
< u + v, w >=< u, w > + < v, w >

and
< αu, v >= α < u, v >

2. Symmetry
< u, v > = \overline{< v, u >}

3. Positive Definiteness

< u, u > > 0 when u ≠ 0

4.
< u, αv > = \overline{α} < u, v >

In the above definitions the overbar is used to denote the complex conjugate. Note that the inner
product always results in a scalar quantity.
The inner product inherently contains the definition of the length or norm, denoted by
k · k. The norm of u is related to its inner product in the following manner

kuk2 =< u, u > (4.2)

Example 1
Consider two vectors u and v in an n-dimensional vector space. The inner product,

< u, v > = u1 \overline{v}_1 + u2 \overline{v}_2 + · · · + un \overline{v}_n        (4.3)
          = Σ_{i=1}^{n} ui \overline{v}_i        (4.4)

The norm of the vector u,

kuk = √(< u, u >) = √( Σ_{i=1}^{n} ui \overline{u}_i )

The use of the complex conjugate while defining the inner product is consistent with our notion
of the length of a vector in the complex plane. Consider the point with co-ordinates (1, i) in the
complex plane denoted by the vector

u = [1, i]^T        (4.5)
If one were to use the definition of the inner product in the absence of the complex conjugate
then it would imply that a non-zero vector has a zero length! Using the definition of the inner

product given in Eq. 4.4, the norm of the vector given in Eq. 4.5 is kuk = √2.
Question: Show that the inner product as given in Eq. 4.4 satisfies the axioms of the inner
product space. Thus the vectors of the n-dimensional vector space form an inner product space.
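A short numerical check of Eqs. 4.2–4.4 (an illustrative sketch, not part of the notes) shows why the conjugate is essential; the vector used is the u = [1, i]^T of Eq. 4.5.

    import numpy as np

    def inner(u, v):
        # <u, v> = sum_i u_i * conj(v_i), as in Eq. 4.4
        return np.sum(u * np.conj(v))

    u = np.array([1.0, 1j])
    print(np.sum(u * u))                   # 0j : dropping the conjugate gives zero "length"
    print(inner(u, u))                     # (2+0j)
    print(np.sqrt(inner(u, u).real))       # 1.414..., i.e. ||u|| = sqrt(2)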
Example 2
Consider two functions f (x) and g(x) which belong to the space of continuous functions with
x ∈ [a, b]. The inner product between the two functions,
< f (x), g(x) > = ∫_a^b f (x) \overline{g(x)} dx

The square of the norm,


kf (x)k2 = ∫_a^b f (x) \overline{f (x)} dx = ∫_a^b |f (x)|2 dx        (4.6)

If f (x) = x and g(x) = e^{−ix} then

< f (x), g(x) > = ∫_a^b x e^{ix} dx

Further,

< g(x), f (x) > = ∫_a^b e^{−ix} x dx

and it is easily seen that < f (x), g(x) > = \overline{< g(x), f (x) >}, thereby satisfying the symmetry
property.

In the above examples it is easy to show that the definitions of the inner product sat-
isfy the axioms of the inner product space. One might naturally inquire, if there are alternate
definitions of the inner product. Indeed, other definitions do exist and we will encounter some
of them later in the book. However, as long as the definition of the inner product satisfies the
axioms of the inner product space it is a valid candidate.

4.2 Orthogonality

Two vectors u and v are said to be orthogonal if their inner product is identically zero,
n
X
< u, v >= ui vi = 0
i

Similarly two functions f (x) and g(x) are orthogonal on the interval x ∈ [a, b] if
Z b
< f (x), g(x) >= f (x)g(x), dx = 0
a

The collection of vectors u1 , u2 . . . , un are said to form an orthogonal set if

< ui , uj > = 0 if i ≠ j

and the set is said to be orthonormal if



< ui , uj > = δij = 0 for i ≠ j, and 1 for i = j        (4.7)
The norm of each vector in an orthonormal set is unity. Hence an orthonormal set is obtained
from an orthogonal set by dividing each vector by its length or norm.
Example 3
Consider the vectors

u1 = [a, 0, 0]^T ,  u2 = [0, b, 0]^T ,  u3 = [0, 0, c]^T

These form an orthogonal set, since < ui , uj > = 0 when i ≠ j. The corresponding orthonormal
set obtained by dividing each of the above vectors ui by its norm, kui k, is the familiar set of
unit vectors which constitute a basis in R3 ,

e1 = [1, 0, 0]^T ,  e2 = [0, 1, 0]^T ,  e3 = [0, 0, 1]^T

Example 4
The set of functions, u1 (x) = sin πx, u2 (x) = sin 2πx, . . . un (x) = sin nπx form an orthogonal
set in the interval 0 ≤ x ≤ 1. Hence
< un (x), um (x) > = ∫_0^1 sin nπx sin mπx dx = 0 for m ≠ n, and 1/2 for m = n

The corresponding orthonormal set is vn (x) = √2 sin nπx.
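These orthogonality relations are easily verified numerically; the sketch below (not part of the original notes) uses scipy's quad routine.

    import numpy as np
    from scipy.integrate import quad

    def ip(n, m):
        # <sin(n*pi*x), sin(m*pi*x)> on [0, 1]
        return quad(lambda x: np.sin(n*np.pi*x) * np.sin(m*np.pi*x), 0.0, 1.0)[0]

    print(round(ip(2, 3), 10))   # 0.0  (orthogonal)
    print(round(ip(2, 2), 10))   # 0.5  (norm squared of sin(2*pi*x))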

4.3 Orthogonality and Basis Sets

Perhaps the most elegant and useful property of an orthonormal set is the utility as a basis to rep-
resent other vectors or functions. Consider representing a vector x in a finite dimensional space
using a suitable basis, {φi }. We will first assume that the basis does not form an orthogonal set
and is simply a linearly independent set. Let x be a vector in the n-dimensional complex space, C n .
x = Σ_{i=1}^{N} αi φi        (4.8)

To find the coefficients αi in the expansion, take the inner product of Eq. 4.8 with φj . This
yields
< x, φj > = Σ_{i=1}^{N} < αi φi , φj >    j = 1 . . . N        (4.9)
The procedure of taking inner products generates a set of N linear algebraic equations which
can compactly be written in matrix vector notation as,

Aα = b

where α is the vector of unknown coefficients in the expansion, Eq. 4.8 and

aij = < φj , φi > ≡ \overline{< φi , φj >}


bi = < x, φi >

If the basis forms an orthonormal set as defined in Eq. 4.7 then the solution is greatly simplified.
The matrix A reduces to the Identity matrix and the solution, which are the coefficients in the
expansion (Eq. 4.8)
αi =< x, φi > i = 1...N

The above procedure of obtaining the coefficients is similar in function space as well, with
the appropriate definition of the inner product. We observe that if we had a basis that was
not orthogonal then the procedure results in obtaining a solution to a set of linear algebraic
equations. In the case of functions the elements of the resulting coefficient matrix, A consist
of integrals that have to be evaluated. We illustrate the above procedure with examples in both
vector and function spaces.
Example 5
Consider vectors in R2 .
x = α1 φ 1 + α2 φ 2 (4.10)

where

x = [1, 2]^T ,  φ1 = [2, 1]^T ,  φ2 = [1, 1]^T
The resulting set of linear equations can be solved using standard methods. However, in what
follows we utilize inner products as illustrated in Sec. 4.3 to obtain coefficients in the expansion
given in Eq. 4.10. Taking the inner product of the expansion, with φ1 and φ2 respectively, the
coefficients in the expansion are obtained by solving the following linear equations,

5α1 + 3α2 = 4
3α1 + 2α2 = 3

whose solution yields α1 = −1 and α2 = 3. If we use the following orthogonal basis

φ1 = [−1, 1]^T ,  φ2 = [1, 1]^T

then
α1 = < x, φ1 > / < φ1 , φ1 > = 1/2
α2 = < x, φ2 > / < φ2 , φ2 > = 3/2
Finally, if we use the corresponding orthonormal basis, obtained by normalizing the orthogonal
set above,

φ1 = (1/√2) [−1, 1]^T ,  φ2 = (1/√2) [1, 1]^T

and the coefficients are


α1 = < x, φ1 > = 1/√2

α2 = < x, φ2 > = 3/√2

The above example illustrates the simplification in the analysis obtained by using an orthonor-
mal basis set, over a basis that is simply linearly independent or even orthogonal.
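The calculations of Example 5 can be reproduced numerically; the sketch below (illustrative only, using numpy) contrasts the linear system required for the non-orthogonal basis with the direct inner products available for the orthonormal basis.

    import numpy as np

    x = np.array([1.0, 2.0])

    # non-orthogonal basis: solve A alpha = b with a_ij = <phi_j, phi_i>
    phi = [np.array([2.0, 1.0]), np.array([1.0, 1.0])]
    A = np.array([[np.dot(pj, pi) for pi in phi] for pj in phi])
    b = np.array([np.dot(x, pj) for pj in phi])
    print(np.linalg.solve(A, b))                 # [-1.  3.]

    # orthonormal basis: the coefficients are simply inner products
    e1 = np.array([-1.0, 1.0]) / np.sqrt(2)
    e2 = np.array([1.0, 1.0]) / np.sqrt(2)
    print(np.dot(x, e1), np.dot(x, e2))          # 1/sqrt(2), 3/sqrt(2)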
Example 6
Expansion in basis sets is of central importance in functional approximation using Fourier se-
ries. Consider representing a function f (x), for 0 ≤ x ≤ 1, in an infinite sine series, whose terms
were shown to form an orthogonal set in Example 4 above.

f (x) = Σ_{n=1}^{∞} an sin nπx

Taking inner products with sin mπx,



< f (x), sin mπx > = Σ_{n=1}^{∞} an < sin nπx, sin mπx >

Replacing the inner product with integrals,


∫_0^1 f (x) sin mπx dx = Σ_{n=1}^{∞} an ∫_0^1 sin nπx sin mπx dx ,    m = 1, 2, . . .

Using the orthogonality property of the functions, sin nπx, n = 1 . . . ∞ given in Example 4
above, the expression for the coefficient reduces to,
an = 2 ∫_0^1 f (x) sin nπx dx

The above expression is obtained by noting that for every m in the previous equation, only
the mth term in the expansion survives. We will encounter similar expansions while solving
PDEs with the separation of variables technique. The representation of functions in a series
expansion of orthonormal sets forms the key foundation for solving PDEs and the central ideas
of functional representation presented in this Chapter should be mastered at this point.
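As an illustration of the expansion (a sketch, not part of the original notes), the coefficients an can be evaluated numerically for a test function; the choice f (x) = x below is arbitrary.

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: x
    a = [2.0 * quad(lambda x, n=n: f(x) * np.sin(n*np.pi*x), 0.0, 1.0)[0]
         for n in range(1, 51)]

    xg = np.linspace(0.1, 0.9, 5)
    S = sum(an * np.sin((n+1)*np.pi*xg) for n, an in enumerate(a))
    print(np.round(S, 3))     # close to f(x) = x in the interior of (0, 1)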

4.4 Gram-Schmidt Orthogonalization

The Gram-Schmidt (GS) orthogonalization provides a systematic method of constructing an


orthogonal set from a linearly independent set of vectors. Given a set of n linearly independent
vectors {ui } the GS process can be used to construct, {vi }, the orthogonal set. Let {xi } denote
the corresponding orthonormal set.
v1 = u1    and    x1 = v1 / kv1 k
Construct the next vector v2 as a linear combination of u2 and x1 ,

v2 = u2 − α1 x1

such that the orthogonality condition < v2 , x1 >= 0 is satisfied. Taking inner products of the
above equation with x1 , α1 =< u2 , x1 >. Hence
v2 = u2 − < u2 , x1 > x1    and    x2 = v2 / kv2 k
Proceeding in a similar manner

v3 = u3 − α2 x2 − α3 x1

Setting v3 orthogonal to the x2 and x1 , i.e. < v3 , x2 >= 0 and < v3 , x1 >= 0 results in
α2 =< u3 , x2 > and α3 =< u3 , x1 >. Hence
v3 = u3 − < u3 , x2 > x2 − < u3 , x1 > x1    and    x3 = v3 / kv3 k
Continuing in this manner,

vn = un − < un , xn−1 > xn−1 − < un , xn−2 > xn−2 . . . , − < un , x1 > x1

and

xn = vn / kvn k
It is easy to show that < vn , vm >= 0 for m < n. Hence the set {vi }, is an orthogonal
set and {xi } is an orthonormal set. We note that the above procedure does not depend on the
initial ordering of the set of linearly independent vectors, {ui }. However, each ordering will

result in a different set of orthonormal vectors. Hence for a given vector space there are a large
number of orthonormal sets. It is easy to visualize this in two dimensions, where two orthogonal
vectors in the plane can be rotated by an arbitrary angle to generate an infinite combination
of orthogonal vectors. We illustrate the GS procedure with some examples.
Example 7: Consider the set of linearly independent vectors in C 2 ,

u1 = [1, 0]^T ,  u2 = [i, 1]^T
Using the GS procedure outlined above,

x1 = v1 / kv1 k = [1, 0]^T
We next construct v2 ,

v2 = u2 − < u2 , x1 > x1 = [i, 1]^T − i [1, 0]^T = [0, 1]^T ≡ x2
However, if we reorder the initial set of two vectors such that

u1 = [i, 1]^T ,  u2 = [1, 0]^T
then the resulting orthonormal set is

x1 = (1/√2) [i, 1]^T ,  x2 = (1/√2) [1, i]^T
This example illustrates the non-uniqueness in the orthogonal set obtained using the GS proce-
dure.
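The GS procedure translates directly into a short routine. The sketch below (not part of the original notes) reproduces Example 7; note that numpy's vdot conjugates its first argument, so np.vdot(x, u) equals < u, x > in the notation used here.

    import numpy as np

    def gram_schmidt(U):
        """Orthonormalize the columns of U (assumed linearly independent)."""
        X = []
        for u in U.T:
            v = u.astype(complex)
            for x in X:
                v = v - np.vdot(x, u) * x      # subtract <u, x> x
            X.append(v / np.linalg.norm(v))
        return np.column_stack(X)

    U = np.array([[1.0, 1j],
                  [0.0, 1.0]])                      # columns u1, u2 of Example 7
    Q = gram_schmidt(U)
    print(np.round(Q, 3))                           # columns [1, 0] and [0, 1]
    print(np.allclose(Q.conj().T @ Q, np.eye(2)))   # True: the set is orthonormal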
The Schwarz Inequality
Consider the definition of the dot product of two vectors,

u · v = kukkvk cos θ

where θ is the angle between the two vectors and is defined in terms of the dot product and the
norms of the two vectors. Using the inner product notation

< u, v > = kukkvk cos θ


| < u, v > | = kukkvk| cos θ|

Since 0 ≤ | cos θ| ≤ 1
| < u, v > | ≤ kukkvk

This is known as the Schwarz inequality. We present a more general derivation below

0 ≤ < u + αv, u + αv >


= < u, u + αv > + < αv, u + αv >
= < u, u > + < u, αv > + < αv, u > + < αv, αv >
= kuk2 + \overline{α} < u, v > + α < v, u > + α\overline{α} kvk2        (4.11)

Since the inner product and α are in general complex scalars, let

< u, v >= | < u, v > |eiθ and α = reiθ (4.12)

where r is the modulus and θ the phase of the complex quantity. Substituting Eqs. 4.12 into
Eq. 4.11,
0 ≤ kuk2 + 2r| < u, v > | + r 2 kvk2 ≡ f (r) (4.13)

f (r) in Eq. 4.13 is a quadratic in r. Since f (r) ≥ 0 for all real r, the quadratic cannot have two
distinct real roots (it would change sign between them), and hence the discriminant ∆ ≤ 0: the case ∆ = 0
corresponds to a repeated real root and ∆ < 0 to a pair of complex roots. Hence
b2 − 4ac ≤ 0 → 4| < u, v > |2 − 4kuk2 kvk2 ≤ 0

and
| < u, v > | ≤ kukkvk

which is the Schwarz inequality. There are alternate ways to derive the Schwarz inequality and
one such variant is illustrated below.

0 ≤ < u − αv, u − αv >


= kuk2 − \overline{α} < u, v > − α < v, u > + α\overline{α} kvk2        (4.14)

Substituting
α = < u, v > / < v, v > = < u, v > / kvk2
in Eq. 4.14, we get
0 ≤ kuk2 − | < u, v > |2 / kvk2

which yields the Schwarz inequality.
The Triangular Inequality
We illustrate how the Schwarz inequality can be used to prove the triangular inequality,

ku + vk ≤ kuk + kvk

ku + vk2 = < u + v, u + v >
         = < u, u > + < v, u > + < u, v > + < v, v >
         = kuk2 + \overline{< u, v >} + < u, v > + kvk2
         = kuk2 + 2 Re < u, v > + kvk2
         ≤ kuk2 + 2 | < u, v > | + kvk2
         ≤ kuk2 + 2 kuk kvk + kvk2    (using the Schwarz inequality)
         = (kuk + kvk)2

which yields the triangular inequality,

ku + vk ≤ kuk + kvk

4.5 The Adjoint Operator

Consider the operator L on an inner product space X. L∗ is said to be the adjoint of L if it


satisfies the following identity,

< Lu, v >=< u, L∗ v > ∀ u, v ∈ X (4.15)

The above identity provides a formal route to identifying the adjoint operator L∗ . If L = L∗ then
the operator is said to be self-adjoint. The above definition of the adjoint operator is general,
and can be used to identify the adjoints for matrix, differential and integral operators without
loss of generality. Further, X represents a vector space if L is a matrix or a function space if
L is either a differential or integral operator. This definition should not be confused with the
adjugate or adjoint of a matrix discussed earlier in connection with finding the inverse of the
matrix. We illustrate the procedure for finding the adjoint operator starting with matrices.
Example 1: Let L be the n × n matrix A, and let u and v be n-dimensional vectors.

< Au, v > = Σ_i Σ_j aij uj vi∗
          = Σ_i Σ_j uj aij vi∗
          = Σ_i Σ_j ui aji vj∗        (interchanging the indices i and j)
          = Σ_i ui ( Σ_j aji vj∗ )
          = Σ_i ui ( Σ_j a∗ji vj )∗
          = < u, A∗ v >

From the last two lines of the above manipulations, it should be clear that the adjoint operator
A∗ is simply the Hermitian transpose of A. If the matrix is real symmetric or Hermitian then
A = A∗ . Hence symmetric or Hermitian matrices belong to a class of self-adjoint operators
that we are already familiar with. In Example 1, we interchanged indices on the 3rd line of the
derivation. The reader should be familiar with this index manipulation and derive the definition
of the adjoint for an n × m matrix as an exercise.
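The identity < Au, v > = < u, A∗ v >, with A∗ the Hermitian transpose, can be checked numerically for a random complex matrix; the sketch below (not part of the original notes) uses numpy.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    u = rng.normal(size=3) + 1j * rng.normal(size=3)
    v = rng.normal(size=3) + 1j * rng.normal(size=3)

    inner = lambda a, b: np.sum(a * np.conj(b))     # <a, b> as in Eq. 4.4

    lhs = inner(A @ u, v)
    rhs = inner(u, A.conj().T @ v)                  # adjoint = Hermitian transpose
    print(np.allclose(lhs, rhs))                    # True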
Example 2: Let A be a matrix with real elements,

A = [a11 a12; a21 a22],   u = [u1 , u2 ]^T ,   v = [v1 , v2 ]^T
then,

< Au, v > = Σ_i Σ_j aij uj vi ≡ (a11 u1 + a12 u2 ) v1 + (a21 u1 + a22 u2 ) v2
          = Σ_i Σ_j ui aji vj ≡ u1 (a11 v1 + a21 v2 ) + u2 (a12 v1 + a22 v2 )
          = < u, A∗ v >

where the adjoint,

A∗ = [a11 a21; a12 a22]

Example 3: Let us consider a specific matrix with complex elements,

A = [i 0; i 1]

then,

< Au, v > = iu1 v1∗ + (iu1 + u2 )v2∗


= iu1 v1∗ + iu1 v2∗ + u2 v2∗
= < u, A∗ v >

In this example A ≠ A∗ and hence the matrix is not self-adjoint. Note that < Au, v > = < u, A∗ v >
holds by definition (Eq. 4.15) even though A ≠ A∗ ; Eq. 4.15 only provides a prescription for
identifying the adjoint operator.

4.6 Adjoints for Differential Operators

Consider the differential operator,

Lu = d2 u/dx2 + αu(x) = 0,    0 ≤ x ≤ 1

with the boundary conditions, u′ (0) = 0 and u′ (1) = 0. The prime denotes differentiation with
respect to x. In order to obtain the adjoint operator L∗ , we proceed in the following manner.
< Lu, v > = ∫_0^1 [ d2 u/dx2 + αu(x) ] v(x) dx
          = ∫_0^1 (d2 u/dx2 ) v(x) dx + ∫_0^1 αu(x) v(x) dx
          = [v u′ − v ′ u]_0^1 + ∫_0^1 u (d2 v/dx2 ) dx + ∫_0^1 αu(x) v(x) dx

The last line is obtained by integrating the term containing u′′ (x) twice by parts. The last
step in obtaining the adjoint operator requires incorporating the boundary conditions on u(x).
If B(u, v) represents the boundary terms, then

B(u, v) = [v u′ − v ′ u]_0^1
        = v(1)u′ (1) − v ′ (1)u(1) − v(0)u′ (0) + v ′ (0)u(0)
        = −v ′ (1)u(1) + v ′ (0)u(0)    since u′ (0) = 0, u′ (1) = 0

The boundary conditions on v(x) are chosen such that B(u, v) vanishes. This results in v ′ (0) =
0 and v ′ (1) = 0 and

< Lu, v > = ∫_0^1 u(x) [ d2 v/dx2 + αv(x) ] dx = < u, L∗ v >

Hence the adjoint operator

L∗ v = d2 v/dx2 + αv(x) = 0,    0 ≤ x ≤ 1

with the boundary conditions, v ′ (0) = 0 and v ′ (1) = 0. The above prescription formally defines
the adjoint operator. Note that both L and L∗ are defined along with their boundary conditions.
The boundary conditions for L∗ were obtained with the requirement that the boundary func-
tional B(u, v) = 0. Further, L = L∗ , and the differential operator is said to be self-adjoint.
In the case of differential equations, L = L∗ only when the boundary conditions for L and the
adjoint operator L∗ are identical. This is the case with the example above.

4.7 Existence and Uniqueness for Ax = b Revisited

We return to the question of existence and uniqueness for linear operators in a more general
setting. These theorems are also referred to as the Fredholm alternative theorems and provide
a prescription for analyzing the existence and uniqueness conditions for all linear operators. We
first consider the matrix equation, using the concept of the adjoint operator to tackle existence
and uniqueness. Consider the matrix equation,
Au = b

1. We first analyze the homogeneous problem,

Au = 0.

If Au = 0 has only the trivial solution, then Au = b has a unique solution. If A is an n × n


matrix then this is true for any vector b. However for an n × m matrix Au = b has a unique
solution only when the system is solvable. We have examined the proof of the above statements
in detail in the previous Chapter.
2. The second part of the theorem concerns the conditions for the solvability (or existence
condition) of Au = b. If Au = 0 has non-trivial solutions we have seen earlier that Au = b,
can have no solution or have an infinity of solutions. In order to determine the conditions for
solvability, we examine the homogeneous adjoint problem,

A∗ v = 0 (4.16)

where A∗ is the adjoint of A. The theorem states that Au = b has a solution if and only if

< b, v >= 0 ∀ v s.t. A∗ v = 0 (4.17)

The above condition provides the solvability or existence condition for the inhomogeneous
problem. The statement in Eq. 4.17 is equivalent to stating that the rhs vector b is orthogonal to
the null space of the adjoint operator A∗ , since v satisfies Eq. 4.16. We first show that when u is a
solution to Au = b, then < b, v > = 0, where v satisfies A∗ v = 0.
Proof: Since
Au = b

it follows that,
< Au, v >=< b, v > (4.18)

Now
< Au, v >=< u, A∗ v >= 0 (4.19)

Hence
< b, v >= 0

To complete the proof we need to show that, if < b, v > = 0, then Au = b has a solution,
i.e., b lies in the range of the operator. We do not pursue this here. The theorem can be used
to verify the solvability condition when A is nonsingular. When A is nonsingular then the
only solution to A∗ v = 0 is the trivial solution. Hence < b, v >= 0 for any b and Ax = b
is therefore solvable for all b. Although we have proved the alternative theorems developed
above using A as the linear operator, the theorems are true for linear operator in general. We
will use these alternative theorems to study the existence and uniqueness conditions for some
differential operators later in the text.
Example 4: In this example we use the solvability condition of the alternative theorem to
identify the range space for the set of linear equations,

x1 + x2 + x3 = b1
2x1 − x2 + x3 = b2
x1 − 2x2 = b3

We first identify the null space vectors of A∗ using elementary row operations,

A∗ = [1 2 1; 1 −1 −2; 1 1 0]  →  [1 2 1; 0 1 1; 0 0 0]

The basis for the null space of A∗ is

v = α [1, −1, 1]^T

The solvability condition states that Ax = b is solvable if and only if < b, v >= 0. This
results in b1 − b2 + b3 = 0, which yields the following basis for the range space of A,

b = α [−1, 0, 1]^T + β [1, 1, 0]^T

The range space vectors can also be obtained using the rank equivalence criterion. The reader
should obtain and compare the range space vectors using the rank criterion.
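The solvability test of Example 4 can be carried out numerically; the following sketch (illustrative only) computes a basis for N (A∗ ) with scipy and checks the orthogonality of two trial right hand sides, one of which is chosen to violate b1 − b2 + b3 = 0.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0,  1.0, 1.0],
                  [2.0, -1.0, 1.0],
                  [1.0, -2.0, 0.0]])

    V = null_space(A.T)              # A is real, so A* = A^T
    print(np.round(V / V[0], 3))     # proportional to [1, -1, 1]^T

    b_good = np.array([1.0, 2.0, 1.0])      # b1 - b2 + b3 = 0
    b_bad  = np.array([1.0, 0.0, 1.0])
    print(np.allclose(V.T @ b_good, 0.0))   # True : solvable
    print(np.allclose(V.T @ b_bad, 0.0))    # False: no solution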
We end this Chapter by using the Fredholm alternative theorem to prove the following
theorem concerning the dimensions of N (A) and R(A).
Theorem: For a general m × n matrix A

dim N (A) + dim R(A) = n (4.20)

where r is the rank of A, dim N (A) = n − r and dim R(A) = r.


Case 1: Let m = n. The dimension of N (A) = n − r. Since the rank of A∗ is the same as the
rank of A, the dimension of N (A∗) = n − r. In order to determine the dimension of R(A)
we utilize the solvability condition based on the Fredholms alternative theorem. Ax = b is
solvable if and only if
< b, vi >= 0 i = 1, 2, . . . n − r (4.21)

where vi ∈ N (A∗) i.e. A∗ vi = 0. Since b is a column vector with n unknowns, Eq. 4.21
provides n − r equations. r unknowns can be chosen independently, resulting in dimR(A) = r.
Hence Eq. 4.20 is true.
Case 2: Let A be an m × n matrix of rank r. The dimension of N (A) = n − r. Since the rank of
A∗ is the same as the rank of A and A∗ is an n × m matrix, the dimension of N (A∗) = m − r.
Solvability conditions results in,

< b, vi >= 0 i = 1, 2, . . . m − r

where vi ∈ N (A∗ ), i.e. A∗ vi = 0. Since b is a column vector with m unknowns, the solvability
conditions provide m − r equations; r components can be chosen independently, resulting in dim R(A) = r.
Hence Eq. 4.20 is true. The proof for Case 2, is valid for both m < n or m > n. In either case
rank, r ≤ min(m, n)

Problems

1. Solvability Conditions
Use the Fredholm’s alternative theorem to determine the solvability conditions (existence)
for the following sets of linear equations, by checking if the right hand side vector b is
orthogonal to the null space of A∗ . If the system is solvable, comment on the uniqueness
of the solution.

x1 − x2 + 2x3 = 3
2x1 + x2 + 6x3 = 2
x1 + 2x2 + 4x3 = −1

x1 + 2x2 + x3 + 2x4 − 3x5 = 2


3x1 + 6x2 + 4x3 − x4 + 2x5 = −1
4x1 + 8x2 + 5x3 + x4 − x5 = 1
−2x1 − 4x2 − 3x3 + 3x4 − 5x5 = 3

x1 − x2 + 3x3 + 2x4 = 2
3x1 + x2 − x3 + x4 = −3
−x1 − 3x2 + 7x3 + 3x4 = 7

2. Gram Schmidt Orthogonalization


Find the eigenvalues and eigenvectors of
 
2 −1 0
A = −1 2 −1 .
0 −1 2
(a) Show that the eigenvectors form a linearly independent set.
(b) Using the Gram-Schmidt process construct an orthonormal set of eigenvectors.

3. Gram Schmidt Orthogonalization


Consider the following set of 5 vectors

[1, 0, 2]^T , [1, 1, 1]^T , [3, −1, 4]^T , [1, −1, 0]^T , [0, 2, 1]^T

(a) Using the above vectors construct a subset containing the maximum number of lin-
early independent vectors.

(b) Using the set obtained in part (a) above construct an orthonormal set of vectors using
Gram-Schmidt orthogonalization.

4. Orthonormal Functions
Show that the following functions

φn (x) = exp(2πi nx) n = 0, ±1, ±2 . . . , 0≤x≤1


where i = √−1, form an orthonormal set.

5. Fourier Series Representation of Functions

Consider a piecewise continuous function f (x) defined on the interval [−c, c] with period
2c. The function can be represented as

f (x) = a0 /2 + Σ_{n=1}^{∞} [ an cos(nπx/c) + bn sin(nπx/c) ]

(a) Determine expressions for the coefficients an and bn .

(b) Simplify the series expansions for odd functions f (x) and even functions f (x).

(c) For the function defined by f (x) = −π/2 for −π < x < 0, f (x) = π/2 for 0 < x < π,
and f (0) = 0, evaluate the Fourier series representation.

(d) If SM (x) is the value of the series with M terms in the summation, then plot SM (x)
for the series obtained in part (c) for different values of M. What can you conclude
about the series representation for f (x) ?

(e) It can be shown that

lim_{M→∞} SM (π/(2M)) = ∫_0^π (sin x / x) dx
Use this result to check your limiting value of the summation that you compute.

6. Fourier Series Solutions
The Fourier series solution to the temperature T (x, t) in a time dependent 1D heat con-
duction problem is

T (x, t) = Σ_{n=1}^{∞} an exp(−αn2 π 2 t/c2 ) sin(nπx/c)

where α is the thermal diffusivity.

(a) Using the initial condition T (x, t = 0) = f (x) obtain an expression for the coeffi-
cients an in the expansion.

(b) You will need to evaluate the following integral


∫_0^c sin(nπx/c) sin(mπx/c) dx

for n = m and n ≠ m.

(c) If the initial condition f (x) = x, carry out the integrations and obtain an expression
for the coefficients an in the expansion

7. Fourier Series Solutions


The Fourier series solution to the temperature T (x, y, t) in a time dependent 2D heat
conduction problem on a rectangle with sides of length a and b can be expressed as

T (x, y, t) = Σ_{n=1}^{∞} Σ_{m=1}^{∞} anm exp[−(n2 π 2 /a2 + m2 π 2 /b2 )t] sin(nπx/a) sin(mπy/b)

(a) Using the initial condition T (x, y, t = 0) = f (x, y) obtain a general expression
for the coefficients anm in the expansion. You will need to evaluate the following
integral
∫_0^a sin(nπx/a) sin(mπx/a) dx
for n = m and n ≠ m.

(b) If the initial condition f (x) = xy, carry out the integrations and obtain an expression
for the coefficients anm in the expansion

8. Orthogonal Functions
Consider the following ode
d2 un /dx2 + λn2 un = 0
with the boundary conditions u′n (x = 0) = un (x = 1) = 0.

(a) Obtain the general solution to this equation.


(b) Obtain non-trivial solutions of the form u1 (λ1 x), u2 (λ2 x) . . . un (λn x) for the above
boundary conditions. What are the values of λn ?
(c) Verify that the solutions u1 (x), u2 (x) . . . form an orthogonal set. Construct an or-
thonormal set of functions.
(d) Evaluate the following integrals
∫_0^1 sin nπx sin mπx dx    and    ∫_0^1 cos nπx cos mπx dx
for n, m = 0, 1, . . . . While evaluating the integrals you will have to treat the cases
n ≠ m and n = m separately.

Note: The above ode arises while solving partial differential equations with the separation
of variables method where the functions un (x) are known as the eigenfunctions and λn
are the eigenvalues.

9. Gram-Schmidt Orthogonalization
Consider the functions

φn (x) = exp(−x/2)xn 0≤x≤∞

(a) Using Gram-Schmidt orthogonalization construct an orthonormal basis ψn (x) for n


= 0, 1 and 2.
(b) Show that the orthonormal basis forms a linearly independent set.

10. Series Expansions


Consider the following expansion in a basis
f (x) = Σ_{n=1}^{∞} an φn (x)

(a) If the weighted inner product,
⟨φn , φm ⟩w(x) = ∫_a^b φn (x) φm (x) w(x) dx = δnm

obtain an expression for the coefficients an .

(b) Consider now the finite series representation of f (x)


f (x) ≈ Σ_{n=1}^{M} cn φn (x).

Obtain the coefficients cn by minimizing the least square error


∫_a^b [ f (x) − Σ_{n=1}^{M} cn φn (x) ]2 w(x) dx.

(c) Comment on the value of the coefficients an and cn .

11. Prove the following:

(a)
kx + yk ≤ kxk + kyk (4.22)

(b)
kx + yk2 = kxk2 + kyk2 (4.23)

(c)

kxk − kyk ≤ kx − yk (4.24)

Give a geometric interpretation for a) and b)

12. Consider the Bessel’s inequality


Σ_{i=1}^{M} | < ei , x > |2 ≤ kxk2        (4.25)

where the ei denote the orthonormal basis in the M-dimensional vector space. If N denotes
the dimension of the vector space into which x can be decomposed, show that the equality
sign holds if M = N. What is the condition under which the strict inequality holds?

13. Consider the Schwarz inequality

| < x, y > | ≤ kxkkyk (4.26)

For non-zero ||x|| and ||y|| show that the equality holds if and only if x and y are linearly
dependent. Interpret this geometrically.

14. Use the inner product to verify the following identities

(a)
kx + yk2 + kx − yk2 = 2(kxk2 + kyk2 ) (4.27)

(b)
kz − xk2 + kz − yk2 = (1/2) kx − yk2 + 2 kz − (1/2)(x + y)k2        (4.28)

Chapter 5

Eigenvalues and Eigenvectors

Definition: A complex number λ is an eigenvalue of A if there exists a non-zero vector x called


the eigenvector such that
Ax = λx (5.1)

Eq. 5.1 can be rewritten as


(A − λI)x = 0 (5.2)

From Eq. 5.2, eigenvectors x belong to the null space of (A − λI) and λ’s are scalars which
result in a zero determinant for, A − λI. The null space of A − λI is also referred to as the
eigenspace corresponding to the eigenvalue λ. The eigenvectors corresponding to a particular
eigenvalue form a basis for the eigenspace. In this Chapter our primary focus will be to answer
the following questions. Given an n × n matrix A, can we always obtain n linearly independent
eigenvectors? Under what conditions do these eigenvectors form an orthonormal set? Can these
eigenvectors be used to solve nonhomogeneous problems of the kind Ax = b and initial value
problems of the following form,
dx
= Ax + b(t)
dt
Given an n × n matrix, A the eigenvalues λi ’s i = 1 . . . n, are obtained by solving the
characteristic equation
|A − λI| ≡ f (λ) = 0 (5.3)

The algebraic multiplicity for a given eigenvalue λi is the number of times the root λi is re-
peated. The geometric multiplicity of λi is the dimension of the vector space spanned by the

eigenvectors corresponding to the eigenvalue λi . Equivalently the geometric multiplicity cor-
responding to λi is nothing but the dimension of the null space of A − λi I. The geometric
multiplicity cannot exceed the algebraic multiplicity. We will see that it is desirable to have
matrices where the geometric multiplicity is equivalent to the algebraic multiplicity.
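The two multiplicities are easy to compute numerically. The sketch below (not part of the original text) uses the deficient matrix that appears as Example 2 later in this Chapter.

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [-1.0, 1.0]])       # eigenvalue 2 repeated (Example 2 below)

    lam, X = np.linalg.eig(A)
    print(np.round(lam.real, 6))      # [2. 2.] : algebraic multiplicity 2

    # geometric multiplicity = dim N(A - 2I)
    geo = A.shape[0] - np.linalg.matrix_rank(A - 2.0 * np.eye(2))
    print(geo)                        # 1 : only one independent eigenvector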
Given a matrix A and its corresponding eigenvalues we are interested in the properties of
the eigenvectors. If the eigenvectors are to be used as a basis set, they would have to be linearly
independent. Further, as we saw in the last Chapter it would be desirable to form a basis with
an orthogonal set. Theorem 1, is concerned with the linear dependence of eigenvectors and
Theorem 2, addresses the issue of orthogonality between eigenvectors.
Theorem 1: Eigenvectors corresponding to distinct eigenvalues are linearly independent.
Proof: Let the matrix A have eigenvalues, λi , i = 1, . . . n. Hence

Axi = λi xi i = 1...,n (5.4)

If the eigenvectors form a linearly independent set, then the only solution to

Σ_{i=1}^{n} ci xi = 0        (5.5)

is when the ci ’s are identically zero. Premultiplying Eq. 5.5 sequentially by A, we can generate the
following set of n linear algebraic equations,

Σ_i ci xi = 0
Σ_i ci λi xi = 0
Σ_i ci λi2 xi = 0
  ⋮
Σ_i ci λin−1 xi = 0        (5.6)

which can be written in matrix vector notation as,

[ 1  1  . . .  1 ;  λ1  λ2  . . .  λn ;  . . . ;  λ1^{n−1}  λ2^{n−1}  . . .  λn^{n−1} ] [ c1 x1 ;  c2 x2 ;  . . . ;  cn xn ] = 0        (5.7)

Since the eigenvalues are distinct the above matrix is non-singular and the determinant is
nonzero. The above matrix is also known as the Vandermonde matrix and it can be shown
that the determinant is

∏_{i<j} (λj − λi ) ≠ 0

Since Eq. 5.7 represents a set of homogeneous equations, the only solution is the trivial solution.
Further since xi are the eigenvectors they are non-zero by definition and ci ’s are identically zero.
Theorem 2: If L is a self-adjoint operator, i.e. L = L∗ , then the eigenvalues and eigenvectors
of L satisfy the following properties.
1. The eigenvalues of L are real.
2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
Proof 1:
Lu = λu

Taking inner products with u


< Lu, u >= λ < u, u > (5.8)

Using the definition of the adjoint operator

< Lu, u > = < u, L∗ u >
          = < u, Lu >        (since L = L∗ )
          = \overline{λ} < u, u >        (5.9)

From Eqs. 5.8 and 5.9, λ = \overline{λ}. This is only possible when λ is real.
Proof 2: Let
Lu = λu u and Lv = λv v (5.10)

Taking inner products of the first equation in Eq. 5.10 with v,

< Lu, v >= λu < u, v > (5.11)

Using the definition of the adjoint operator,

< Lu, v > = < u, L∗ v > = < u, Lv >
          = \overline{λv} < u, v >
          = λv < u, v >        (since λv is real)        (5.12)

Equating Eq. 5.11 with Eq. 5.12 we get,

(λu − λv ) < u, v >= 0 (5.13)

Since λu ≠ λv and u and v are non-zero by definition, Eq. 5.13 implies that < u, v > = 0, i.e. u
is orthogonal to v.
Although we are presently occupied with matrix operators, the above proof is valid for
self-adjoint operators in general. The proof also illustrates the utility of using innner prod-
ucts in proving the theorem. We illustrate the implications of Theorem 1 and Theorem 2 with
some examples. We first consider some illustrative examples for nonsymmetric matrices. As
an exercise, the reader should obtain the eigenvalues and eigenvectors for the examples given
below.
Example 1: Nonsymmetric matrix, distinct eigenvalues

A = [1 1; 4 1],   λ1 = 3, λ2 = −1,   x(1) = α [1, 2]^T   and   x(2) = β [−1, 2]^T

The superscript i on the eigenvector corresponds to ith eigenvalue.


Example 2: Nonsymmetric matrix, multiple eigenvalues

A = [3 1; −1 1],   λ1 = λ2 = 2,   x(1) = α [1, −1]^T

The first example illustrates that when a nonsymmetric matrix has distinct eigenvalues,
it is possible to obtain two distinct eigenvectors. By Theorem 1, these eigenvectors are linearly

independent. The second example illustrates a situation where a nonsymmetric matrix with
multiple eigenvalues has only one eigenvector. This is an example where the geometric multi-
plicity is less than the algebraic multiplicity. For a given λi with multiplicity m, this situation
occurs for nonsymmetric matrices when dim N (A − λi I) < m, and it can arise whenever the
algebraic multiplicity is greater than unity. In the
next two examples we will consider Hermitian matrices.
Example 3: Symmetric matrix, distinct eigenvalues

A = [1 2; 2 1],   λ1 = 3, λ2 = −1,   x(1) = α [1, 1]^T   and   x(2) = β [1, −1]^T
The eigenvectors are not only linearly independent, but are also orthogonal by Theorem 2. The
orthonormal set is obtained by dividing each eigenvector by its norm. The orthonormal set is,
   
(1) 1 1 (2) 1 1
x =√ and x = √
2 1 2 −1

Example 4: Symmetric matrix with multiple eigenvalues

A = [2 0; 0 2],   λ1 = λ2 = 2,   x(1) = α [1, 0]^T   and   x(2) = β [0, 1]^T
In this example the matrix has one eigenvalue of multiplicity 2. However unlike the situation in
Example 2, here we are able to obtain two distinct eigenvectors which constitute an orthonormal
set. We can state the following theorem for Hermitian matrices.
Theorem 3: A Hermitian matrix of order n has n linearly independent eigenvectors and these
form an orthonormal set.
Consider a Hermitian matrix with distinct eigenvalues. From Theorem 2, it is clear
that the eigenvectors corresponding to distinct eigenvalues are orthogonal and the eigenvectors
form an orthonormal set. If the matrix has multiple eigenvalues then the orthonormal set is
constructed by using Gram-Schmidt orthogonalization. Consider an n × n Hermitian matrix
with k distinct eigenvalues, λ1 , λ2 , . . . λk . Let the multiplicity of the k + 1th eigenvalue λk+1
be m. For the first k set of distinct eigenvalues there are k eigenvectors which constitute an
orthogonal set (Theorem 2, part 2). Each of these k eigenvectors is orthogonal to the remaining
m eigenvectors corresponding to the eigenvalue λk+1 , which has multiplicity m (Theorem 2, part
2). The missing piece is the orthogonality between the m eigenvectors corresponding to the

repeated eigenvalue λk+1 . Since these are eigenvectors that belong to the same eigenvalue,
Theorem 2 does not apply. However we can use Gram-Schmidt orthogonalization to construct
an orthonormal set of these m eigenvectors. With this the construction is complete and we
have a set of n orthonormal eigenvectors. We note that the Gram-Schmidt orthogonalization
is essentially a process of taking linear combinations of vectors, and we need to show that the
new vectors are still eigenvectors having the same eigenvalue. The following Lemma concerns
this point.
Lemma: Let x1 , x2 . . . xm be m eigenvectors corresponding to the eigenvalue λ. Hence

Axi = λxi i = 1, . . . m

Then any non-zero linear combination of the m eigenvectors is also an eigenvector with eigenvalue λ.
Proof: Let y be the eigenvector obtained by taking a linear combination of m eigenvectors,
y = Σ_{i=1}^{m} αi xi

Now
Ay = A ( Σ_{i=1}^{m} αi xi ) = Σ_{i=1}^{m} αi A xi = Σ_{i=1}^{m} αi λ xi = λ Σ_{i=1}^{m} αi xi = λ y        (5.14)

Hence y is an eigenvector with eigenvalue λ. We note that this proof is true for linear combina-
tions which involve any subset of the m eigenvectors.
Finally we state (without proof) that given a Hermitian matrix, the algebraic multiplicity
always equals the geometric multiplicity. This implies that for an eigenvalue with multiplicity

m,
dim N (A − λi I) = m

Hence for a Hermitian matrix we are ensured of finding all the eigenvectors regardless of the
multiplicities of the eigenvalues. This concludes the proof of Theorem 3.

5.1 Eigenvectors as Basis Sets

Consider the matrix equation


Au = b (5.15)

We will assume that the matrix possesses n linearly independent eigenvectors. Further let us assume that the
determinant of A is non-zero, i.e. λi ≠ 0 for i = 1 . . . n. Let
u = Σ_{i=1}^{n} ci xi        (5.16)

Substitute Eq. 5.16 into Eq. 5.15,


A Σ_i ci xi = b
Σ_i ci A xi = b
Σ_i ci λi xi = b

Taking inner products with xj ,

Σ_{i=1}^{n} ci < λi xi , xj > = < b, xj >    j = 1 . . . n        (5.17)

The above manipulations result in a set of n linear algebraic equations which can be compactly
represented as
Mc = f

where the elements of M are mij = < λj xj , xi > and fi = < b, xi >. A solution of this system yields
the coefficients in the expansion. If the eigenvectors form an orthonormal set, as would be the

case if A were Hermitian then the coefficients can be obtained analytically,

ci = < b, xi > / λi

and the solution can be expressed as,

u = Σ_{i=1}^{n} ( < b, xi > / λi ) xi        (5.18)

The above method is general and can also be used when det A = 0. We leave this as an
exercise. The above solution illustrates the utility of the eigenvectors as a basis while seeking
solutions to matrix equations. In the absence of an orthonormal set of eigenvectors, obtaining
the coefficients involves solving a set of linear equations. If the eigenvectors form an orthonor-
mal set, as is the case with a Hermitian matrix A, then the solution is greatly simplified.
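A compact numerical version of Eq. 5.18 (a sketch, not from the original notes) for the symmetric matrix of Example 3 and an arbitrary right hand side:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 1.0]])          # Hermitian, eigenvalues 3 and -1
    b = np.array([1.0, 5.0])            # arbitrary right hand side

    lam, X = np.linalg.eigh(A)          # columns of X are orthonormal eigenvectors
    c = (X.T @ b) / lam                 # c_i = <b, x_i> / lambda_i
    u = X @ c                           # u = sum_i c_i x_i   (Eq. 5.18)
    print(np.allclose(A @ u, b))        # True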

5.2 Similarity Transforms

Very often equations involving matrices can be conveniently treated using suitable transforma-
tions. Clearly a transformation that preserves the eigenvalues of the matrix will preserve the
underlying physics of the problem. One such transformation is the similarity transform. In this
section we introduce the similarity transform and illustrate its utility for matrix diagonalization,
matrix algebras and solutions of IVPs.
Definition: If there exists a non-singular matrix P such that

P−1AP = B

then B is said to be similar to A.

Theorem: Similar matrices have the same eigenvalues. If P−1 AP = B then A is similar to B
and both A and B have the same eigenvalues.

Proof: Let the eigenvalue of A be λ.

B = P−1 AP
BP−1 = P−1 A
BP−1 x = P−1 Ax
= P−1 λx
= λP−1 x
(5.19)

If P−1 x = y, then the last line of the above equation implies,

By = λy

Hence λ is an eigenvalue of B with eigenvector P−1 x.

5.2.1 Diagonalization of A

If a matrix A has n linearly independent eigenvectors, then

P−1 AP = Λ

P is a nonsingular matrix whose columns are made up of the eigenvectors of A and Λ is a


diagonal matrix with the λ’s on the diagonal,

Λ = diag(λ1 , λ2 , . . . , λn )
We next show that the matrix P made up of the eigenvectors reduces A to a diagonal matrix
under a similarity transformation. Let Axi = λi xi , i = 1, . . . n

AP = A[x1 , x2 , . . . , xn ]
= [Ax1 , Ax2 , . . . , Axn ]
= [λ1 x1 , λ2 x2 , . . . , λn xn ]
= PΛ
(5.20)

In general a square matrix can be reduced to a diagonal matrix if and only if it possesses n
linearly independent eigenvectors. This is always possible for Hermitian matrices.
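The diagonalization, and its use for computing matrix powers (Sec. 5.2.2 below), can be checked numerically; the sketch (not part of the notes) uses the matrix of Example 1.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [4.0, 1.0]])               # Example 1: eigenvalues 3 and -1

    lam, P = np.linalg.eig(A)
    Pinv = np.linalg.inv(P)
    print(np.allclose(Pinv @ A @ P, np.diag(lam)))            # True: P^-1 A P = Lambda

    A5 = P @ np.diag(lam**5) @ Pinv                           # A^5 = P Lambda^5 P^-1
    print(np.allclose(A5, np.linalg.matrix_power(A, 5)))      # True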

5.2.2 Using similarity transforms

Similarity transforms can be used to perform matrix algebra in a convenient manner as illus-
trated below,

1. Powers of Matrices

P−1 AP = Λ
A = PΛP−1
An = (PΛP−1) (PΛP−1) . . . (PΛP−1)
= PΛn P−1
(5.21)

2. Inverse

A−1 = (PΛP−1)−1
= (PΛ−1P−1 )
(5.22)

3. Matrix polynomials

f (A) = a0 Am + a1 Am−1 + · · · + am I
      = a0 PΛm P−1 + a1 PΛm−1 P−1 + · · · + am PP−1        (using An = PΛn P−1 )
      = P(a0 Λm + a1 Λm−1 + · · · + am I)P−1
      = P f (Λ) P−1        (5.23)

In all the above manipulations, the algebra is reduced to taking powers of the diagonal
matrix Λ. To complete the solution, one also requires a knowledge of P−1 . Under certain
conditions P−1 is easily deduced from P. We discuss this next.

5.3 Unitary and orthogonal matrices

Definition: P is said to be a unitary matrix if,

P∗ P = PP∗ = I,

which implies that P−1 = P∗ . As defined earlier, P∗ is the complex conjugate transpose of P.
If P consists of real elements then,

PT P = PPT = I,

which implies that P−1 = PT . Then P is said to be orthogonal. Further, if

P∗ P = PP∗

then the matrix P is said to be normal. Normal matrices provide a broader classification
which includes both unitary and orthogonal matrices. Other examples of normal
matrices are Hermitian, skew-Hermitian and diagonal matrices.
Theorem: If A is a Hermitian matrix then the matrix P whose columns are made up of the
eigenvectors of A is a unitary matrix.
Proof: Since P is made up of the eigenvectors of A,

P = [x1 , x2 , x3 . . . xn ]

Then
P∗ P = [ x1† x1   x1† x2   . . .   x1† xn ;   x2† x1   x2† x2   . . .   x2† xn ;   . . . ;   xn† x1   xn† x2   . . .   xn† xn ]        (5.24)
We have introduced some new notation for clarity. In the above equations, x†i represents the
complex conjugate transpose of the column vector xi . Hence, if xi = [x1 , x2 , . . . , xn ]^T then
xi† = [ \overline{x}_1 , \overline{x}_2 , . . . , \overline{x}_n ].

Noting that the matrix elements in Eq. 5.24 are inner products between the eigenvectors, which
form an orthonormal set (A = A∗ ).

x†i xj =< xi , xj >= δij

Hence
P∗ P = I

In a similar manner one can show that PP∗ = I. Therefore, P is unitary. The proof for
orthogonal matrices follows along similar lines, with x†i replaced with xTi .
Example 5: Consider the non-symmetric matrix from Example 1. Using the two linearly inde-
pendent eigenvectors to construct the matrix P,

P = [1 −1; 2 2],   P−1 = (1/4)[2 1; −2 1]   and   P−1 AP = [3 0; 0 −1]

Note that the order in which the eigenvalues appear in the diagonal matrix is dependent on how
the eigenvectors are ordered in the matrix P.
Example 6: Consider the symmetric matrix from Example 3.

P = [1 −1; 1 1],   P−1 = (1/2)[1 1; −1 1]   and   P−1 AP = [3 0; 0 −1]

If P is constructed using the orthonormal eigenvectors of A then,

P = (1/√2)[1 −1; 1 1]   and   P−1 = (1/√2)[1 1; −1 1]

Here P−1 = PT and P is orthogonal.

5.4 Jordan Forms

In this section we are concerned with n × n matrices that do not possess n linearly independent
eigenvectors. This was the situation in Example 2 above. If A does not possess n linearly
independent eigenvectors then there exists a non-singular matrix P such that,

P−1 AP = J

where J, the Jordan matrix, has the block diagonal structure

J = diag(J1 , J2 , . . . )
where Ji are the Jordan blocks which have λ’s on the diagonal and 1’s on the first superdiagonal.
A typical form for a (3 × 3) Jordan block is illustrated below,

Ji = [λ 1 0; 0 λ 1; 0 0 λ]
J is also referred to as the Jordan canonical form.
If an n × n matrix A has k linearly independent eigenvectors then the matrix P is
constructed using these k eigenvectors as well as the remaining n − k generalized eigenvectors.
Before we embark on determining generalized eigenvectors we spend some time on the structure
of the Jordan forms themselves.

5.4.1 Structure of the Jordan Block

The structure of the Jordan block is best illustrated using some examples. Consider a (3 × 3)
matrix with multiplicity m = 3. We can then have three different situations depending on the
number of linearly independent eigenvectors.
Case 1: 1 eigenvector and 2 generalized eigenvectors and the Jordan matrix has 1 Jordan block.

J = [λ 1 0; 0 λ 1; 0 0 λ]

Case 2: 2 eigenvectors and 1 generalized eigenvector and the Jordan matrix has 2 Jordan
blocks. The two forms of the Jordan matrix are,

J = [λ 1 0; 0 λ 0; 0 0 λ]    or    J = [λ 0 0; 0 λ 1; 0 0 λ]

Case 3: 3 eigenvectors and 0 generalized eigenvectors. This case reduces to the diagonal form
Λ and can be interpreted as having 3 Jordan blocks.

J = [λ 0 0; 0 λ 0; 0 0 λ]

In the above illustrations, the Jordan block is identified as a partitioned matrix. We can
make the following statement relating the number of Jordan blocks to the number of linearly
independent eigenvectors. The number of Jordan blocks in the Jordan canonical form J cor-
responds to the number of linearly independent eigenvectors of the matrix A. Further, from
the examples above, the number of 1’s on the super-diagonal is equivalent to the number of
generalized eigenvectors used to construct the matrix, P.

5.4.2 Generalized Eigenvectors

In this section we illustrate the procedure for finding generalized eigenvectors for a matrix with
deficient eigenvectors. Consider for example a (3 × 3) matrix with eigenvalue λ having multi-
plicity 3 and 1 eigenvector x. In this case we would like to obtain 2 generalized eigenvectors,
q1 and q2 . The situation corresponds to Case 1, above with the Jordan matrix having two 1’s on
the off-diagonal.

P = [x, q1 , q2 ]
AP = [λx, Aq1 , Aq2 ]
PJ = [x, q1 , q2 ] [λ 1 0; 0 λ 1; 0 0 λ] = [λx, x + λq1 , q1 + λq2 ]

Equating AP = PJ, in the above equations we obtain the following equations for the general-
ized eigenvectors q1 and q2 ,

(A − λI)q1 = x and (A − λI)q2 = q1

We make the following observations. Unlike eigenvectors which are obtained as solutions to
a homogeneous problem, generalized eigenvectors are obtained as solutions to inhomogeneous
equations as given above. We consider the generalized eigenvector corresponding to situation
in Case 2 given above. In this case the matrix P is constructed by using only one generalized
eigenvector. Using the same procedure as above,

P = [x1 , x2 , q1 ]
AP = [λx1 , Ax2 , Aq1 ]
PJ = [x1 , x2 , q1 ] [λ 0 0; 0 λ 1; 0 0 λ] = [λx1 , λx2 , x2 + λq1 ]

Equating AP = PJ, in the above equations we obtain the following equations for the general-
ized eigenvector q1 ,
(A − λI)q1 = x2

In the above example there are a number of different variations to obtain the generalized eigen-
vector. P can be constructed by interchanging the vectors x1 and x2 . In this case the equation
for the generalized eigenvector reduces to,

(A − λI)q1 = x1

Additionally, the P matrix can be constructed by taking linear combinations of x1 and x2 . If


u = αx1 + βx2 , then

P = [x1 , u, q1 ]

Working through the same procedure as outlined above, the equation for the generalized eigen-
vector is
(A − λI)q1 = u = αx1 + βx2 .

Clearly the above procedure illustrates that the generalized eigenvector can be constructed in
many ways and is hence non-unique. Regardless of the manner in which the generalized eigen-
vector is obtained, the structure of the Jordan matrix is unaltered. Since the generalized eigen-
vector must be obtained by solving an inhomogeneous equation the issue of solvability must be
confronted. The last example illustrates the number of ways in which the right hand side vector
can be chosen to meet the solvability criterion, or equivalently arrive at a system of equations
that yields a solution.
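Computer algebra systems construct P and J directly. The sketch below (not part of the notes) uses sympy's jordan_form on the deficient matrix of Example 2; the generalized eigenvector returned is one of the many admissible choices discussed above.

    import sympy as sp

    A = sp.Matrix([[3, 1],
                   [-1, 1]])      # Example 2: eigenvalue 2 repeated, one eigenvector

    P, J = A.jordan_form()        # P holds the eigenvector and a generalized eigenvector
    sp.pprint(J)                  # [[2, 1], [0, 2]] : a single 2x2 Jordan block
    print(sp.simplify(P.inv() * A * P - J) == sp.zeros(2, 2))   # True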
Question: In the last example where the matrix has two eigenvectors, derive the equations for
the generalized eigenvector assuming

J = [λ 1 0; 0 λ 0; 0 0 λ]

5.5 Initial Value Problems

Consider the linear IVP of the following form

dx
= Ax + b(t) (5.25)
dt

with initial condition, x(t = 0) = x0 . In Eq 5.25, each element of x and b are functions of time
and the matrix A consists of constant coefficients. Hence

x = [x1 (t), x2 (t), . . . , xn (t)]^T    and    b = [b1 (t), b2 (t), . . . , bn (t)]^T

Such systems of IVPs occur in staged processes, batch reactors, process control and vibration
analysis. We will solve the above equation using the similarity transform technique introduced
in the previous section.

We first consider the situation where A has n linearly independent eigenvectors. In this
case A is diagonalizable. Premultiplying Eq. 5.25 by P−1 ,

d(P−1 x)/dt = P−1 A x + P−1 b(t)
d(P−1 x)/dt = P−1 A P P−1 x + P−1 b(t)

In the last line above we have inserted PP−1 after A. If P−1 x = y and P−1 b(t) = g(t) then
the above system can be rewritten as,
dy/dt = Λ y + g(t)        (5.26)
with the following IC, y(t = 0) ≡ y0 = P−1 x0 . Using the integrating factor e−Λt , Eq 5.26 can
be rewritten as,
d/dt ( e^{−Λt} y ) = e^{−Λt} g(t)        (5.27)
whose general solution is

e^{−Λt} y(t) = ∫_0^t e^{−Λτ} g(τ ) dτ + c        (5.28)
Using the initial condition y0 = P−1 x0 , Eq. 5.28 reduces to,
y(t) = ∫_0^t e^{Λ(t−τ)} g(τ ) dτ + e^{Λt} y0        (5.29)
The solution can be expressed in a more compact form as
x = P y = P [ e^{Λt} y0 + f(t) ]    where    f(t) = [ ∫_0^t e^{λ1 (t−τ)} g1 (τ ) dτ ,  ∫_0^t e^{λ2 (t−τ)} g2 (τ ) dτ ,  . . . ,  ∫_0^t e^{λn (t−τ)} gn (τ ) dτ ]^T
In order to obtain the solution we need to obtain an expression for eΛt . This can be obtained in
the following manner. Expanding eΛt in a Taylor series,
e^{Λt} = I + Λt + (Λt)2 /2! + (Λt)3 /3! + · · ·
Substituting Λ in the above expression and collecting terms,
e^{Λt} = diag( Σ_{n=0}^{∞} (λ1 t)n /n! , . . . , Σ_{n=0}^{∞} (λn t)n /n! ) = diag( e^{λ1 t} , e^{λ2 t} , . . . , e^{λn t} )

If A is not diagonalizable then the above solution procedure remains unaltered. However Λ is
replaced with J. In this case we need to obtain an expression for eJt . Consider the following
example. If

J = [λ 1 0; 0 λ 1; 0 0 λ],        (5.30)
then J = Λ + S, where Λ is the diagonal matrix and S is a matrix containing the off-diagonal
terms. Then

e^{Jt} = e^{Λt} e^{St}    where    Λ = [λ 0 0; 0 λ 0; 0 0 λ]    and    S = [0 1 0; 0 0 1; 0 0 0]

where eΛt is evaluated as illustrated above. In order to evaluate eSt we proceed in a similar
manner and carry out a Taylor expansion,

(St)2
eSt = I + St + + ..., (5.31)
2!
Due to the structure of matrix S, the number of terms that are retained in the Taylor expansion
is only 3 as the powers of S greater than 2 are identically zero. The reader should check this.
Collecting terms in the expansion given in Eq. 5.31 we obtain
e^{St} = [1 t t2 /2; 0 1 t; 0 0 1]    and    e^{Jt} = [e^{λt}  t e^{λt}  (t2 /2) e^{λt};  0  e^{λt}  t e^{λt};  0  0  e^{λt}]

The above procedure can be generalized for an (n × n) Jordan matrix of the form given in
Eq. 5.30 and,

e^{Jt} = [ e^{λt}  t e^{λt}  . . .  (t^{n−1}/(n−1)!) e^{λt} ;  0  e^{λt}  . . .  (t^{n−2}/(n−2)!) e^{λt} ;  ⋮ ;  0  0  . . .  e^{λt} ]
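In practice e^{At} is rarely built by hand; scipy provides expm. The sketch below (not part of the notes) checks the closed form above for a single 2 × 2 Jordan block with λ = −2 (an arbitrary choice) and b = 0.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[-2.0, 1.0],
                  [ 0.0, -2.0]])       # a 2x2 Jordan block, lambda = -2
    x0 = np.array([1.0, 1.0])
    t = 0.5

    x = expm(A * t) @ x0               # solution of dx/dt = A x, x(0) = x0
    x_exact = np.exp(-2.0 * t) * np.array([x0[0] + t * x0[1], x0[1]])
    print(np.allclose(x, x_exact))     # True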

5.6 Eigenvalues and Solutions of Linear Equations

While solving linear equations, Ax = b it is important to understand the sensitivity of the


solution to small changes in the coefficients of the matrix A or the elements in the vector b.

The sensitivity usually arises from round-off error during a numerical solution such as Gauss
elimination. A measure of the sensitivity to small perturbations is known as the condition
number of the matrix. Hence a matrix whose solutions are sensitive to small changes in the
coefficients is said to be poorly conditioned. We will use concepts of matrix and vector norms
to quantify these concepts and connect this issue of sensitivity to the eigenvalues of the matrix.
Normed Space: The norm is simply the notion of length that we encountered while discussing inner product spaces. More formally, ‖x‖ is said to be a norm on a linear space X, x, y ∈ X, if it satisfies the following properties:

(i) ‖x‖ ≥ 0
(ii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangular Inequality)
(iii) ‖αx‖ = |α| ‖x‖
(iv) ‖x‖ = 0 if and only if x = 0

Some examples of commonly encountered norms are:

The 2 norm,
\[ \|x\|_2 = \left[\sum_{i=1}^{n} |x_i|^2\right]^{1/2} \]

The p norm,
\[ \|x\|_p = \left[\sum_{i=1}^{n} |x_i|^p\right]^{1/p}, \quad 1 \le p < \infty \]

The ∞ norm,
\[ \|x\|_\infty = \max_{1\le i\le n} |x_i| \]

The norm induces a distance function or metric, d(x, y),
\[ d(x, y) = \|x - y\| \]
If x = (x_1, x_2) and y = (y_1, y_2), then
\[ d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}, \]
which is the familiar distance in R^2.

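For concreteness, the vector norms defined above can be evaluated directly with numpy; the short sketch below is purely illustrative and uses an arbitrary vector.

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])
print(np.linalg.norm(x, 2))        # 2 norm: sqrt(9 + 16 + 144) = 13.0
print(np.linalg.norm(x, 3))        # p norm with p = 3
print(np.linalg.norm(x, np.inf))   # infinity norm: max |x_i| = 12.0
```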
Matrix Norms: ‖A‖ is a matrix norm for a matrix A if it satisfies the following properties of a normed space:

(i) ‖A‖ ≥ 0
(ii) ‖A + B‖ ≤ ‖A‖ + ‖B‖ (Triangular Inequality)
(iii) ‖αA‖ = |α| ‖A‖
(iv) ‖A‖ = 0 if and only if A = 0

Further,
\[ \|AB\| \le \|A\| \, \|B\| \]
The matrix norm is compatible with a vector norm if
\[ \|Ax\| \le \|A\| \, \|x\| \]

Examples of commonly encountered matrix norms are given below.

The 1 norm, or the maximum column sum,
\[ \|A\|_1 = \max_{1\le j\le n} \sum_{i=1}^{n} |a_{ij}| \]

The ∞ norm, or the maximum row sum,
\[ \|A\|_\infty = \max_{1\le i\le n} \sum_{j=1}^{n} |a_{ij}| \]

The spectral norm,
\[ \|A\|_2 = \left[\rho(A^{*}A)\right]^{1/2} \]
where ρ(·) denotes the spectral radius, the largest modulus among the eigenvalues. If A is Hermitian, A* = A and
\[ \|A\|_2 = |\lambda_{max}| \]

If Ax = λx, then any matrix norm of A compatible with the vector norm is an upper bound on the moduli of the eigenvalues:
\[ |\lambda| \, \|x\| = \|\lambda x\| = \|Ax\| \le \|A\| \, \|x\| \]
and hence
\[ |\lambda| \le \|A\| \]

Errors and Perturbation


Consider the linear equation,
Ax = b (5.32)

If δb is a small perturbation to the vector b, let δx be the corresponding perturbation to the solution vector x, so that
\[ A(x + \delta x) = b + \delta b \tag{5.33} \]
The problem lies in determining a bound on the perturbation to the solution vector x. Expanding Eq. 5.33 and noting that Ax = b,
\[ A\,\delta x = \delta b \tag{5.34} \]
and
\[ \delta x = A^{-1}\delta b \tag{5.35} \]

From Eq. 5.34,
\[ \|\delta b\| = \|A\,\delta x\| \le \|A\| \, \|\delta x\| \tag{5.36} \]
and from Eq. 5.35,
\[ \|\delta x\| = \|A^{-1}\delta b\| \le \|A^{-1}\| \, \|\delta b\| \tag{5.37} \]
From Ax = b,
\[ \|b\| = \|Ax\| \le \|A\| \, \|x\| \tag{5.38} \]
Combining Eqs. 5.37 and 5.38,
\[ \|\delta x\| \, \|b\| \le \|A\| \, \|A^{-1}\| \, \|\delta b\| \, \|x\| \tag{5.39} \]
If ‖b‖ ≠ 0, then Eq. 5.39 reduces to
\[ \frac{\|\delta x\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \, \frac{\|\delta b\|}{\|b\|} \tag{5.40} \]
The condition number κ(A) is defined as
\[ \kappa(A) = \|A\| \, \|A^{-1}\| \tag{5.41} \]
and Eq. 5.40 becomes
\[ \frac{\|\delta x\|}{\|x\|} \le \kappa(A) \, \frac{\|\delta b\|}{\|b\|} \tag{5.42} \]
If the 2 (spectral) norm is used and the matrix is symmetric, then
\[ \kappa(A) = \frac{|\lambda_{max}|}{|\lambda_{min}|} \tag{5.43} \]
Eq. 5.42 indicates that if the condition number is large, then small perturbations in the vector b can produce large errors in the solution vector x. Matrices that are nearly singular (i.e., with one eigenvalue close to zero) are clearly poorly conditioned. Many numerical methods have been developed to improve the conditioning of matrices, and precision-related errors can be alleviated to some extent by using higher-precision arithmetic. While deriving the bound in Eq. 5.42 we assumed that the errors occurred only in the vector b; a similar analysis can be carried out when the error occurs in the matrix itself.
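A short numerical sketch illustrating Eqs. 5.41 and 5.42 is given below; the matrix and the perturbation are arbitrary choices for illustration. A nearly singular matrix has a large condition number, and the relative error in x produced by a small perturbation of b respects the bound of Eq. 5.42.

```python
import numpy as np

# a nearly singular (ill-conditioned) matrix
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])

kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa, np.linalg.cond(A, 2))          # same quantity, roughly 4e4

x = np.linalg.solve(A, b)                   # solution with exact data
db = np.array([1e-6, -1e-6])                # small perturbation of b
dx = np.linalg.solve(A, b + db) - x

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = kappa * np.linalg.norm(db) / np.linalg.norm(b)
print(lhs <= rhs)                           # the bound of Eq. 5.42 holds
```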

5.6.1 Positive Definite Matrices

A matrix A is said to be positive definite if
\[ \langle Ax, x \rangle > 0 \quad \text{for } x \neq 0 \]
If A is symmetric, then A is said to be symmetric positive definite (SPD). To show that the eigenvalues of a positive definite matrix are always positive, let x be an eigenvector with eigenvalue λ; then
\[ \langle Ax, x \rangle = \lambda \langle x, x \rangle = \lambda \|x\|^2 > 0 \]
Hence all λ's are positive. As a consequence, the determinant of a positive definite matrix is non-zero: if A were singular, there would exist a nonzero vector x such that Ax = 0, which implies ⟨Ax, x⟩ = 0, a contradiction.
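In practice, positive definiteness of a symmetric matrix is usually checked numerically through its eigenvalues or through a Cholesky factorization, as sketched below for an arbitrary example matrix.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric example

print(np.linalg.eigvalsh(A))        # all eigenvalues positive => SPD
try:
    np.linalg.cholesky(A)           # succeeds only for positive definite matrices
    print("A is symmetric positive definite")
except np.linalg.LinAlgError:
    print("A is not positive definite")
```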
Spectral Radius: The spectral radius ρ(A) of a matrix A is the maximum value of the modulus of its eigenvalues,
\[ \rho(A) = \max_i |\lambda_i| \]

There are several localization theorems which yield information on bounds for the eigenvalues. The most important is Gerschgorin's theorem.

Gerschgorin's Theorem: Let A be a general n × n matrix whose eigenvalues can be either real or complex. Let
\[ r_i = \sum_{j=1, j\neq i}^{n} |a_{ij}|, \quad i = 1, \ldots, n \]
be the sum of the moduli of the off-diagonal elements in the ith row, and let D_i be the disk in the complex plane of radius r_i centered at a_{ii},
\[ D_i = \{ z : |z - a_{ii}| \le r_i \}, \quad i = 1, \ldots, n \]
Gerschgorin's theorem states that all eigenvalues of A lie in the union of the disks D_i. Thus
\[ \lambda_i \in D_1 \cup D_2 \cup D_3 \cup \cdots \cup D_n, \quad i = 1, \ldots, n \]

Proof: Consider any eigenvalue λ with corresponding eigenvector x. The eigenvalue equation Ax = λx can be expressed as
\[ (\lambda - a_{ii}) x_i = \sum_{j=1, j\neq i}^{n} a_{ij} x_j, \quad i = 1, \ldots, n \tag{5.44} \]
where x_j is the jth component of the eigenvector x. Let x_k be the component with the largest absolute value in x, so that |x_j|/|x_k| ≤ 1 for j = 1, …, n. Eq. 5.44 for i = k can be expressed as
\[ \lambda - a_{kk} = \sum_{j=1, j\neq k}^{n} a_{kj} \frac{x_j}{x_k} \tag{5.45} \]
Taking moduli on both sides,
\[ |\lambda - a_{kk}| \le \sum_{j=1, j\neq k}^{n} |a_{kj}| \frac{|x_j|}{|x_k|} \le \sum_{j=1, j\neq k}^{n} |a_{kj}| = r_k \]
Thus λ is contained in the disk D_k centered at a_{kk}. The same argument applies to every eigenvalue, so the eigenvalues lie in the union of the disks D_k, k = 1, …, n.

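The theorem is easy to check numerically. The sketch below (with an arbitrary example matrix) computes the disc centres a_ii and radii r_i and verifies that every computed eigenvalue lies in at least one Gerschgorin disc.

```python
import numpy as np

A = np.array([[ 4.0, 1.0, 0.5],
              [ 1.0, 3.0, 0.2],
              [-0.5, 0.4, 2.0]])

centres = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centres)   # off-diagonal row sums r_i

eigs = np.linalg.eigvals(A)
for lam in eigs:
    # every eigenvalue must lie in at least one disc |z - a_ii| <= r_i
    in_some_disc = np.any(np.abs(lam - centres) <= radii)
    print(lam, in_some_disc)
```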
5.6.2 Convergence of Iterative Methods

The spectral radius of a matrix is useful when analyzing the convergence of iterative processes. We will show that the sequence of vectors generated by the iterative process
\[ x_{k+1} = A x_k, \quad k = 0, 1, \ldots \tag{5.46} \]
will tend to zero if and only if ρ(A) < 1, where x_0 is an arbitrary initial vector. Further, if ρ(A) > 1 then the sequence will diverge. The convergence can be analyzed by expanding x_0 in a basis made up of the eigenvectors {u_i} of A.
\[ x_0 = \sum_{i=1}^{n} \alpha_i u_i \]
\[ x_1 = A x_0 = \sum_{i=1}^{n} \alpha_i A u_i = \sum_{i=1}^{n} \alpha_i \lambda_i u_i \]
\[ x_2 = A x_1 = \sum_{i=1}^{n} \alpha_i \lambda_i A u_i = \sum_{i=1}^{n} \alpha_i \lambda_i^2 u_i \]
\[ \vdots \]
\[ x_k = \sum_{i=1}^{n} \alpha_i \lambda_i^k u_i \]
Since ρ(A) < 1, the powers of λ_i tend to zero as k → ∞, and
\[ \lim_{k\to\infty} x_k = 0 \]

Alternatively, the iterative process can be analyzed by examining the powers of the matrix A:
\[ x_1 = A x_0, \quad x_2 = A x_1 = A^2 x_0, \quad \ldots, \quad x_k = A^k x_0 \]
If A can be diagonalized using a similarity transform, then
\[ A^k = P \Lambda^k P^{-1} \]
where Λ is the diagonal matrix with the eigenvalues on the diagonal, and
\[ \lim_{k\to\infty} A^k = 0 \quad \text{since} \quad \lim_{k\to\infty} \Lambda^k = 0 \;\text{ if } \rho(A) < 1 \]
and hence
\[ \lim_{k\to\infty} x_k = 0 \]
We leave it as an exercise to show that the limiting vector x_k also converges to zero when
\[ A^k = P J^k P^{-1} \]
where J is the Jordan canonical form, i.e.,
\[ \lim_{k\to\infty} J^k = 0 \quad \text{if } \rho(A) < 1 \]

We illustrate the application of these ideas by analyzing the convergence properties of iterative methods such as the Jacobi and Gauss–Seidel methods, which are used to solve the large sparse systems of linear equations that arise in practice. We briefly outline the procedure for solving Ax = b using these methods.
Jacobi's Method: Consider the solution of Ax = b using Jacobi's method. Rewrite
\[ A = D - B \tag{5.47} \]
where
\[ D = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}, \qquad B = -\begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ a_{21} & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 0 \end{pmatrix} \]
Substituting A = D − B into Ax = b,
\[ Dx = Bx + b \]
This can be solved iteratively using the following scheme,
\[ x_{k+1} = D^{-1}(B x_k + b), \quad k = 0, 1, \ldots \tag{5.48} \]

and the solution is the limiting vector as k → ∞.


The Gauss–Seidel Method: This is an improvement over Jacobi's method as it uses the latest updated components of the vector during each iteration. Here
\[ A = -(L + U) + D \tag{5.49} \]
where D is the diagonal matrix defined above, and L and U are the strictly lower and upper triangular matrices shown below,
\[ L = -\begin{pmatrix} 0 & 0 & \cdots & 0 \\ a_{21} & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{pmatrix}, \qquad U = -\begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{n-1,n} \\ 0 & 0 & \cdots & 0 \end{pmatrix} \]

Substituting Eq. 5.49 into Ax = b and rearranging,
\[ x_{k+1} = (D - L)^{-1}(U x_k + b), \quad k = 0, 1, \ldots \]

Convergence of these iterative methods can be analyzed in the following manner; we illustrate the analysis with Jacobi's method. Let x* be the exact solution to Ax = b. Then Eq. 5.48 gives
\[ x^{*} = D^{-1}(B x^{*} + b) \tag{5.50} \]
Note that x* is also referred to as a fixed point of the mapping. Fixed points will be discussed in Chapter ??. Subtracting Eq. 5.48 from Eq. 5.50,
\[ \varepsilon_{k+1} = H \varepsilon_k \tag{5.51} \]
where ε_{k+1} = x* − x_{k+1} is the error vector at iterate k+1 and H = D^{-1}B. Eq. 5.51 is of the same form as Eq. 5.46, so the error ε_k tends to zero as k → ∞ when the spectral radius ρ(H) < 1. Conditions ensuring ρ(H) < 1 can be obtained by examining the radii of the Gerschgorin discs. The iteration matrix is
discs. The matrix,
 
0 a12 /a11 a13 /a11 ... a1n /a11
 a21 /a22 0 a23 /a22 ... a2n /a22 
H = D−1 B = −  (5.52)
 
.. .. .. .
. .. 
 . . . . . 
an1 /ann an2 /ann ... an−1,1 /ann 0

Since the diagonal elements of H are zero, its Gerschgorin discs are centered at the origin, with radii equal to the sums of the moduli of the off-diagonal elements of H,
\[ r_i = \sum_{j=1, j\neq i}^{n} \frac{|a_{ij}|}{|a_{ii}|}, \quad i = 1, \ldots, n \tag{5.53} \]
For ρ(H) < 1 it is sufficient that
\[ r_i = \sum_{j=1, j\neq i}^{n} \frac{|a_{ij}|}{|a_{ii}|} < 1, \quad i = 1, \ldots, n \tag{5.54} \]
which yields the following condition,
\[ |a_{ii}| > \sum_{j=1, j\neq i}^{n} |a_{ij}| \tag{5.55} \]

Matrices which satisfy the condition given by Eq. 5.55 are referred to as strictly diagonally dominant, and for such matrices the Jacobi iteration converges regardless of the initial vector used. Clearly, a good guess for the starting vector will reduce the number of iterations required to obtain the solution; note that the result only addresses whether the iteration will converge or not. This is an instructive illustration of the utility of Gerschgorin's theorem. The convergence criterion for the Gauss–Seidel method is left as an exercise.

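A minimal implementation of the Jacobi scheme of Eq. 5.48 is sketched below; the matrix, right-hand side, tolerance and function name are illustrative choices, not part of the original notes. It also reports the spectral radius of the iteration matrix H = D⁻¹B; for the strictly diagonally dominant example chosen, ρ(H) < 1 and the iteration converges.

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=500):
    """Jacobi iteration x_{k+1} = D^{-1}(B x_k + b) with A = D - B."""
    D = np.diag(np.diag(A))
    B = D - A                           # B holds the negated off-diagonal part of A
    H = np.linalg.solve(D, B)           # iteration matrix H = D^{-1} B
    print("spectral radius of H:", max(abs(np.linalg.eigvals(H))))
    x = x0
    for k in range(max_iter):
        x_new = np.linalg.solve(D, B @ x + b)
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

# strictly diagonally dominant matrix => guaranteed convergence (Eq. 5.55)
A = np.array([[10.0, 2.0, 1.0],
              [ 1.0, 8.0, 2.0],
              [ 2.0, 1.0, 9.0]])
b = np.array([13.0, 11.0, 12.0])
x, iters = jacobi(A, b, np.zeros(3))
print(x, iters, np.allclose(A @ x, b))
```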
5.7 Summary

The main goal of this Chapter was to analyze the eigenvalue–eigenvector problem for a matrix. Once the eigenvalues are obtained, the main problem reduces to finding the eigenvectors and understanding the relationships between the eigenvectors themselves. Theorem 2 provides the foundation for constructing orthogonal sets of eigenvectors for Hermitian matrices. The theorem was discussed with reference to matrices; however, its generality for differential and integral operators has far-reaching consequences, laying the foundation for obtaining orthogonal eigenfunctions of differential operators and developing a theory of Fourier series. These connections will be drawn in later Chapters. We also discussed the significance of using the eigenvectors as a basis set and illustrated its utility in solving Ax = b. In the last part of this Chapter we discussed similarity transforms and their utility in working with functions of matrices as well as in solving linear initial value problems. In this context we introduced the Jordan canonical form and described a new set of vectors called generalized eigenvectors. The reader should realize that generalized eigenvectors are required for non-symmetric matrices where only a partial set of eigenvectors can be found.

PROBLEMS

1. Skew-Symmetric Matrix
A matrix A is said to be skew symmetric or skew self-adjoint if A = −A∗ . Show that
the eigenvalues are imaginary (or zero) and that eigenvectors corresponding to distinct eigenvalues are orthogonal.

2. Normal Matrices
If AA∗ = A∗ A, then A is said to be normal.

(a) Show that for any complex number α,

||Ax − αx|| = ||A∗ x − α∗ x||

(b) If z is an eigenvector of A with eigenvalue λ show that it is also an eigenvector of


A∗ . What is the corresponding eigenvalue of A∗ ?

(c) Let λ = µ + iν be an eigenvalue of A with eigenvector z. First show that A can be decomposed in the following manner,
A = A_R + iA_I,
where A_R = A_R^* and A_I = A_I^* (i.e., both A_R and A_I are Hermitian). Next show that z is an eigenvector of A_R and A_I with eigenvalues µ and ν respectively.

3. Consider a (4 × 4) matrix with one multiple eigenvalue. Write out the possible Jordan
canonical forms.

4. Symmetric Matrix

Consider the following matrix
\[ A = \begin{pmatrix} 7 & -16 & -8 \\ -16 & 7 & 8 \\ -8 & 8 & -5 \end{pmatrix} \]
(a) Find the eigenvalues and eigenvectors of A.

(b) Find a solution to Ax = b where b = {1, 2, 1} by expanding x in the normalized


eigenvectors of A.

5. Consider the following matrix
\[ A = \begin{pmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{pmatrix} \]
(a) Find the eigenvalues and eigenvectors of A.

(b) Find a solution to Ax = b where b = {0, −2, 3} by expanding x in the eigenvectors


of A.

6. Solvability Conditions

Consider the non-homogeneous equation

(A − λI)u = b, (5.56)

where A is a square matrix of dimension n. Let


\[ u = \sum_{j=1}^{n} c_j \phi_j, \tag{5.57} \]

where cj ’s are the coefficients of the expansion and φj ’s are the eigenvectors of A.

(a) If A is self adjoint and λ in Eq. 5.56 is not an eigenvalue of A, then obtain an ex-
pression for the coefficients cj in the expansion. What are the solvability conditions
for Eq. 5.56 (Fredholm's Alternative Theorem)?

(b) Re-work part a) for the case when λ is a particular eigenvalue of A. Note how the
solvability conditions are connected to the eigenfunctions of A.

(c) If A is a non self adjoint matrix with n linearly independent eigenvectors then the
solution results in
Mc = f.

Write out the components of the matrix M and the vector f.

7. IVP

Using similarity transforms solve the system

\[ \frac{dx}{dt} = Ax \]

with initial conditions x(t = 0) = {1, 1, 1}, where
\[ A = \begin{pmatrix} 5 & -3 & -2 \\ 8 & -5 & -4 \\ -4 & 3 & 3 \end{pmatrix} \]

8. Let
\[ A = \begin{pmatrix} -2 & 1 \\ -1 & -2 \end{pmatrix} \]

(a) Find the eigenvalues and eigenvectors of A.

(b) Do the eigenvectors form an orthogonal set?

(c) Using similarity transforms obtain the solution to

\[ \frac{du}{dt} = Au + b \]

where b = {1, 1} and u(t = 0) = {0, 0}.

(d) How does your solution behave as t tends to ∞?

9. Skew Symmetric System

Using similarity transforms solve the system

\[ \frac{dx}{dt} = Ax + b(t) \]
with initial conditions x(t = 0) = {1/√2, 1/√2, 1}, where
\[ A = \begin{pmatrix} -i & i & 0 \\ i & -i & 0 \\ 0 & 0 & -i \end{pmatrix}, \qquad b(t) = \begin{pmatrix} \sqrt{2}\,t \\ \sqrt{2}\,t \\ e^{-t} \end{pmatrix} \]

Comment on the asymptotic stability of the system.

10. Consider the initial value problem

\[ \frac{d^2u}{dt^2} + 5\frac{du}{dt} + 6u = e^{-t} \tag{5.58} \]

with initial conditions u(t = 0) = u′(t = 0) = 1.

(a) Reduce the above ode to a set of first order linear differential equations and represent
them in matrix vector form,
\[ \frac{du}{dt} = Au + b(t) \tag{5.59} \]
Write out the components for the matrix A and vectors u, b(t) and initial condition
u(t = 0). Obtain the solution to Eq. 5.59 using similarity transformations.

(b) The above equation can also be solved using the corresponding Green's function, g(t, ξ), for the second order differential operator in Eq. 5.58. Using the Green's function, the solution to Eq. 5.58 can be expressed as
\[ u(t) = c_1 u_1(t) + c_2 u_2(t) + \int_0^t g(t, \xi) e^{-\xi}\, d\xi \tag{5.60} \]

where the Green’s function g(t, ξ) = exp[2(ξ − t)] − exp[3(ξ − t)]. u1 (t) and u2 (t)
are two linearly independent solutions to the homogeneous differential equation,

\[ \frac{d^2u}{dt^2} + 5\frac{du}{dt} + 6u = 0 \tag{5.61} \]

Using Eq. 5.60, find the solution u(t) for the initial conditions u(t = 0) = u′(t = 0) = 1, i.e., find u_1(t), u_2(t) and the constants c_1 and c_2. You will have to evaluate the integral in Eq. 5.60 to obtain the complete solution.

11. Consider the following non-symmetric matrix with real coefficients,
\[ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \]

(a) Derive the conditions on the coefficients a_ij for the matrix to have a repeated eigenvalue.

(b) In this situation show that the matrix possesses only one linearly independent eigenvector. Derive a general expression for the eigenvector x in terms of the a_ij.

(c) Write out the equation that will be used to obtain the generalized eigenvector, q.
Write out the form for the Jordan matrix.

(d) Show that when the matrix has a repeated eigenvalue the generalized eigenvector can always be obtained (i.e., the solvability criteria are always satisfied).

(e) Using similarity transforms obtain a solution to the following initial value problem,

dx1 /dt = 3x1 + x2


dx2 /dt = −x1 + x2

for the initial conditions x1(t = 0) = 1 and x2(t = 0) = 1. Is the system asymptotically stable? Why?

(f) Qualitatively sketch your solutions.

12. IVPs

Consider the initial value problem
\[ \frac{d^3x}{dt^3} + a_1\frac{d^2x}{dt^2} + a_2\frac{dx}{dt} + a_3 x = f(t) \]
with initial conditions x(0) = x′(0) = x′′(0) = 0. Reduce this to a system of first order equations of the form
\[ \frac{du}{dt} = Au + b \]
(a) If a3 = 0, what are the conditions on a1 and a2 for the system to have a stable solution?

(b) If a1 = −4, a2 = 3 and f (t) = sin(t) obtain a solution to the IVP using the
similarity transform method.

13. Normal Mode Analysis: Vibration of a CO2 Molecule

Consider a spring and mass model of a CO2 molecule as shown in the figure below. The oxygen atoms have mass mo and the carbon atom has mass mc. The springs have a spring constant k and obey Hooke's law.

[Figure: a linear spring–mass chain, OXYGEN–CARBON–OXYGEN, with displacements x1, x2, x3.]

Using Newton's laws, and assuming that the motion is constrained along the x-axis, the system of equations describing the displacement of the masses is

\[ \frac{d^2x_1}{dt^2} = -a(x_1 - x_2) \]
\[ \frac{d^2x_2}{dt^2} = -b(x_2 - x_1) - b(x_2 - x_3) \]
\[ \frac{d^2x_3}{dt^2} = -a(x_3 - x_2) \]

where a = k/mo and b = k/mc

(a) Assuming a solution of the form
\[ x_n(t) = \bar{x}_n e^{i\omega t}, \quad n = 1, 2, 3 \]
where i = √−1 and ω is a natural frequency of oscillation of the system, reduce the set of ODEs to an eigenvalue problem of the form
\[ Ax = \omega x. \]

(b) Find the eigenvalues ω.

(c) Find the corresponding eigenvectors.

(d) Noting that the components of the eigenvectors correspond to the displacements of the atoms, give a physical interpretation of the eigenvectors.

14. Projection Theorem
Consider the matrix
\[ A = \begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} \]
where i = √−1.

(a) Find the eigenvalues and normalized eigenvectors of A.

(b) Find the projections P1 and P2 of A.

(c) Using the projection theorem evaluate A^2 and e^{At}.

Chapter 6

Solutions of Non-Linear Equations

Non-linear differential and algebraic equations arise in a wide variety of engineering situations
and we have seen some examples of non-linear operators in Chapter ??. Numerical solutions
of non-linear differential equations result in a set of non-linear algebraic equations. Although
there are a number of techniques available for solving non-linear algebraic equations, in this
Chapter we will focus on primarily two methods, the Picard and Newton-Raphson methods.
The primary goal here is to develop a framework to analyze non-linear equations. A large
number of excellent texts cover the variety of numerical methods available for solving non-
linear equations. In order to formally treat non-linear equations and discuss their convergence,
existence and uniqueness aspects, we need to introduce the metric space. In many situations we
can express non-linear or linear equations in the following implicit manner,

u = Lu

where L can be either a linear or a non-linear operator and u is the unknown we seek. Examples of equations that can be cast in this form are

1.
x = tan x

2.
x = x2 + sin x + 2

3.
\[ u(x) = \int_0^x k(x, y)\, u(y)\, dy \]

4.
x = Ax + b

Fixed Points: u is said to be a fixed point of the mapping L if

u = Lu

Thus L operating on u leaves it unchanged, and u is a solution to F(u) = u − Lu = 0. The method of successive substitution can be used to determine the fixed point in the following manner,
\[ u_{n+1} = L u_n, \quad n = 0, 1, \ldots \]

If u is a fixed point of L, then
\[ \lim_{n\to\infty} u_n = u \]
and u = Lu. This method of successive substitution, also known as Picard's iterative method, will work only for a certain class of mappings or operators referred to as contractions. As the name implies, the mapping contracts the distance between successive iterates, and under certain conditions the generated sequence tends to a limiting value, the fixed point. Before introducing the contraction mapping theorem, let us formally define the metric space, which provides a framework for defining distances between elements of a space.
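The method of successive substitution is simple to implement; the following Python sketch is illustrative only (the function name and tolerances are assumptions). Note that x = tan x listed above is not a contraction near its nonzero roots, so the sketch instead iterates x = cos x, which is a contraction on [0, 1], to demonstrate convergence of the Picard iterates.

```python
import math

def picard(F, u0, tol=1e-12, max_iter=200):
    """Successive substitution u_{n+1} = F(u_n); converges when F is a contraction."""
    u = u0
    for n in range(max_iter):
        u_new = F(u)
        if abs(u_new - u) < tol:
            return u_new, n + 1
        u = u_new
    raise RuntimeError("no convergence; F may not be a contraction near u0")

# x = cos(x) defines a contraction on [0, 1] since |F'(x)| = |sin x| < 1 there
root, iters = picard(math.cos, 0.5)
print(root, iters, abs(root - math.cos(root)))
```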
Metric Space: (X, d) is said to be a metric space if the distance between any two points x and y in X, denoted by d(x, y), satisfies the following axioms:

(i) d(x, y) ≥ 0, and d(x, y) = 0 ⇒ x = y (Positivity)
(ii) d(x, y) = d(y, x) (Symmetry)
(iii) d(x, y) ≤ d(x, z) + d(z, y), x, y, z ∈ X (Triangular Inequality)

Thus the metric d(x, y) is simply a distance function and hence a scalar quantity. Some examples of commonly encountered metrics are given below.
If x and y are two vectors in R^n,
\[ d(x, y) = \left[\sum_{i=1}^{n}(x_i - y_i)^2\right]^{1/2} \]

If x = (x_1, x_2) and y = (y_1, y_2), then
\[ d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}, \]
which is the familiar distance in R^2, called the Euclidean distance. This metric is used frequently in least squares fitting of data. The p metric is a more general definition,
\[ d_p(x, y) = \left[\sum_{i=1}^{n}|x_i - y_i|^p\right]^{1/p}, \quad 1 \le p < \infty \]
The ∞ metric,
\[ d_\infty(x, y) = \max_{1\le i\le n} |x_i - y_i| \]

The ∞ metric is useful in many engineering situations; for example, when assessing the uniformity of temperature in an object, the difference between the maximum and minimum temperatures is an instance of d∞. The metric is related to the norm in the following manner,
\[ d(x, y) = \|x - y\| \]
If f(x) and g(x) are two continuous functions in C[a, b], then
\[ d_p(f, g) = \left[\int_a^b |f(x) - g(x)|^p\, dx\right]^{1/p}, \quad 1 \le p < \infty \]

Example: To show that (X, d) is a valid metric space, the distance function must satisfy the axioms of the metric space. We illustrate this with the metric defined above for finite sums,
\[ d_p(x, y) = \left[\sum_{i=1}^{n}|x_i - y_i|^p\right]^{1/p}, \quad 1 \le p < \infty \]
It is easy to see that the positivity and symmetry properties of the metric are satisfied. In order to prove that d_p(x, y) satisfies the triangular inequality we need to use the Minkowski inequality for finite sums,
\[ \left\{\sum_{i=1}^{n}|x_i \pm y_i|^p\right\}^{1/p} \le \left\{\sum_{i=1}^{n}|x_i|^p\right\}^{1/p} + \left\{\sum_{i=1}^{n}|y_i|^p\right\}^{1/p} \]

Then,
\[ d(x, y) = \left\{\sum_{i=1}^{n}|x_i - y_i|^p\right\}^{1/p} = \left\{\sum_{i=1}^{n}|x_i - z_i + z_i - y_i|^p\right\}^{1/p} \]
\[ \le \left\{\sum_{i=1}^{n}|x_i - z_i|^p\right\}^{1/p} + \left\{\sum_{i=1}^{n}|z_i - y_i|^p\right\}^{1/p} \quad \text{(using the Minkowski inequality)} \]
\[ = d(x, z) + d(z, y) \]
Thus,
\[ d(x, y) \le d(x, z) + d(z, y) \]

which is the triangular inequality.


Convergent Sequences: Consider a sequence {u_k}. We say that the sequence {u_k} converges to u, i.e.,
\[ \lim_{k\to\infty} u_k = u, \]
if for every ε > 0, ∃ an N such that
\[ d(u, u_k) \le \varepsilon \quad \forall\, k > N \]
A sequence is said to diverge if it does not converge.

Cauchy Sequence: {u_k} is said to be a Cauchy sequence if ∀ ε > 0, ∃ N such that
\[ d(u_i, u_j) \le \varepsilon \quad \forall\, i, j > N \]

Theorem: If {u_k} converges, then it is a Cauchy sequence.

Proof: If {u_k} converges, then
\[ \lim_{k\to\infty} u_k = u \]
Using the triangular inequality,
\[ d(u_i, u_j) \le d(u_i, u) + d(u, u_j) \]
Since {u_k} is convergent, ∃ an N such that
\[ d(u_i, u) \le \frac{\varepsilon}{2} \quad \text{and} \quad d(u, u_j) \le \frac{\varepsilon}{2}, \quad i, j > N \]
Thus ∃ N such that
\[ d(u_i, u_j) \le \varepsilon \quad \forall\, i, j > N \]

Note that a Cauchy sequence need not be convergent. In a Cauchy sequence the distance between points far along in the sequence becomes arbitrarily small; however, nothing is said about the limit of the sequence, which need not exist in the space. This issue is resolved by invoking the concept of a complete metric space.
Definition: A metric space (X, d) is said to be complete if every Cauchy sequence of points from X converges to a limit in X.
Example: Let X = [0, 1), which includes the value 0 and excludes 1. The sequence u_n = 1 − 1/n is a Cauchy sequence in X that does not converge in X, since its limit as n → ∞, namely 1, is excluded from the space. If X = [0, 1], then the sequence is convergent in X. Thus convergence is clearly concerned with the existence of the limit points in the underlying space.

6.0.1 Contraction Mapping or Fixed Point Theorem

Contraction Mapping: Consider the mapping F (x), such that

x = F (x)

x0 is a fixed point of F if x0 = F (x0 ). Let (X, d) be a metric space and F : X → X. F (x) is


said to be a contraction if ∃ a real number k, 0 ≤ k < 1 (k independent of x and y) such that

d(F (x), F (y)) ≤ k d(x, y) ∀ x, y ∈ X

This situation is illustrated graphically in the Figure below, where the distance between two
points x and y, d(x, y) is reduced upon applying the mapping F (x) to each of the points.
Theorem: Let (X, d) be a complete metric space and let F : X → X be a contraction. Then ∃ a unique point x_0 in X such that x_0 = F(x_0).
Proof: Generate a sequence xn from the mapping F (x) in the following manner,

\[ x_1 = F(x), \quad x_2 = F(x_1), \quad \ldots, \quad x_n = F(x_{n-1}) \]

We first show that xn is a Cauchy sequence. Consider the distances,

\[ d(x_2, x_1) = d(F(x_1), F(x)) \le k\, d(x_1, x) \]
\[ d(x_3, x_2) = d(F(x_2), F(x_1)) \le k\, d(x_2, x_1) \le k^2\, d(x_1, x) \]
\[ \vdots \]
\[ d(x_m, x_{m-1}) \le k^{m-1}\, d(x_1, x) \]

Using the triangular inequality,

\[ d(x_3, x_1) \le d(x_3, x_2) + d(x_2, x_1) \]
\[ d(x_4, x_1) \le d(x_4, x_3) + d(x_3, x_1) \le d(x_4, x_3) + d(x_3, x_2) + d(x_2, x_1) \]
Generalizing, for m > n and using the above results,
\[ d(x_m, x_n) \le d(x_m, x_{m-1}) + d(x_{m-1}, x_{m-2}) + \cdots + d(x_{n+1}, x_n) \le k^{m-1} d(x_1, x) + k^{m-2} d(x_1, x) + \cdots + k^n d(x_1, x) \]
\[ = \left[k^{m-1} + k^{m-2} + \cdots + k^n\right] d(x_1, x) \le k^n \left[k^{m-n-1} + k^{m-n-2} + \cdots + k + 1\right] d(x_1, x) \]

Since 0 ≤ k < 1,
\[ d(x_m, x_n) \le k^n \sum_{i=0}^{\infty} k^i\, d(x_1, x) = \frac{k^n}{1-k}\, d(x_1, x) \]
where we have used the sum of the geometric series,
\[ \sum_{i=0}^{\infty} k^i = \frac{1}{1-k} \]

Since 0 ≤ k < 1, d(x_m, x_n) → 0 as m, n → ∞. Thus {x_n} is Cauchy. Further, since (X, d) is a complete metric space, {x_n} is convergent in X. Let
\[ x_0 = \lim_{n\to\infty} x_n \]

To show that x_0 is a fixed point of F(x) we use the continuity of the mapping (a contraction is automatically continuous, since d(F(x), F(y)) ≤ k d(x, y)). Since F(x) is continuous,
\[ x_0 = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} F(x_n) = F\left(\lim_{n\to\infty} x_n\right) = F(x_0) \]
To show that x_0 is unique, assume that x_0 and y_0 are two fixed points of F(x), i.e., x_0 = F(x_0) and y_0 = F(y_0). Then
\[ d(x_0, y_0) = d(F(x_0), F(y_0)) \le k\, d(x_0, y_0) \]

Hence d(x0 , y0 ) = 0 and x0 = y0 . Thus the fixed point is unique.


Some notes about fixed points: if F(x_0) = x_0, then F^p(x_0) = x_0, where F^p denotes the p-fold composition of F. However, if x_0 is a fixed point of F^p it need not be a fixed point of F(x).

[Figure 6.1: four panels (a)–(d) showing cobweb plots of the Picard iteration for different functions F(x), with the curves y = F(x) and y = x.]

Figure 6.1: Picard iterates are illustrated for different functions F(x). Convergence toward the fixed point x_0 is observed for cases (a) and (c); in these situations |F′(x)| < 1. Since |F′(x)| > 1 for case (b), the iterates diverge, and the iterates oscillate around the fixed point for case (d), where |F′(x)| = 1. In all cases the initial guess is x.
