Methods of Applied Mathematics for Engineers and Scientists
Contents

Preface    page xi

I  MATRIX THEORY

1  Matrix Algebra    3
   1.1  Definitions and Notations    4
   1.2  Fundamental Matrix Operations    6
   1.3  Properties of Matrix Operations    18
   1.4  Block Matrix Operations    30
   1.5  Matrix Calculus    31
   1.6  Sparse Matrices    39
   1.7  Exercises    41

2  Solution of Multiple Equations    54
   2.1  Gauss-Jordan Elimination    55
   2.2  LU Decomposition    59
   2.3  Direct Matrix Splitting    65
   2.4  Iterative Solution Methods    66
   2.5  Least-Squares Solution    71
   2.6  QR Decomposition    77
   2.7  Conjugate Gradient Method    78
   2.8  GMRES    79
   2.9  Newton's Method    80
   2.10 Enhanced Newton Methods via Line Search    82
   2.11 Exercises    86

3  Matrix Analysis    99
   3.1  Matrix Operators    100
   3.2  Eigenvalues and Eigenvectors    107
   3.3  Properties of Eigenvalues and Eigenvectors    113
   3.4  Schur Triangularization and Normal Matrices    116
   3.5  Diagonalization    117
   3.6  Jordan Canonical Form    118
   3.7  Functions of Square Matrices    120
Preface
This book was written as a textbook on applied mathematics for engineers and
scientists, with the express goal of merging analytical and numerical methods
more tightly than other textbooks. The role of applied mathematics has continued to
more tightly than other textbooks. The role of applied mathematics has continued to
grow increasingly important with the advancement of science and technology, ranging
from the modeling and analysis of natural phenomena to the simulation and optimization
of man-made systems. With the huge and rapid advances of computing technology,
larger and more complex problems can now be tackled and analyzed in a very
timely fashion. In several cases, what used to require supercomputers can now be
solved using personal computers. Nonetheless, as the technological tools continue
to progress, it has become even more imperative that the results can be understood
and interpreted clearly and correctly, as well as the need for a deeper knowledge
behind the strengths and limitations of the numerical methods used. This means
that we cannot forgo the analytical techniques because they continue to provide
indispensable insights on the veracity and meaning of the results. The analytical tools
remain of prime importance for a basic understanding of mathematical model building
and data analysis. Still, when it comes to solving large and complex problems,
numerical methods are needed.
The level of exposition in this book is aimed at graduate students, advanced
undergraduate students, and researchers in the engineering and science fields. Thus
the topics were mostly chosen to continue several topics found in most undergraduate textbooks in applied mathematics. We have focused on advanced concepts and
implementation of various mathematical tools to solve the problems that most graduate students are likely to face in their research work and other advanced courses.
The contents of the book can be divided into four main parts: matrix theory,
vector and tensors, ordinary differential equations, and partial differential equations.
We begin the book with matrix theory because the tools developed in matrix theory
form the crucial foundations used in the rest of the book. The next part centers on
the concepts used in vector and tensor theory, including the application of tensor
calculus and integral theorems to develop mathematical models of physical systems,
often resulting in several differential equations. The last two parts focus on the
solution of ordinary and partial differential equations. It can be argued that the
primary needs of applied mathematics in engineering and the physical sciences are
to obtain models for a system or phenomena in the form of differential equations
and then to be able to solve them to predict and understand the effects of changes
in model parameters, boundary conditions, or initial conditions.
Although the methods of applied mathematics are independent of computing
platform and programs, we have chosen to use MATLAB as a particular platform under which we investigate the mathematical methods, techniques, and ideas
so that the approaches can be tested and the results can be visualized. The supplied MATLAB codes are all included on the books website, and the reader can
modify the codes for their own use. There exists several excellent MATLAB toolboxes supplied by third-party software developers, and they have been optimized
for speed, efficiency, and user-friendliness. However, the unintended consequences
of user-friendly tools can sometimes render the users to be button pushers. We
contend that students in applied mathematics still need to discover the mechanism
and ideas behind the full-blown programs at least to apply them to simple test
problems and gain some basic understanding of the various approaches. The links
to the supplemental MATLAB programs and files can be accessed through the link:
www.cambridge.org/Co.
The appendices are collected as chapter fortifications. They include proofs,
advanced topics, additional tables, and examples. The reader should be able to
access these materials through the web via the link: www.cambridge.org/Co.
The index also contains topics that can be found in the appendices, and they are
given page numbers that continue the count from the main text.
Several colleagues and students have helped tremendously in the writing of
this textbook. Mostly, I want to thank my best friend and wife, Faith Morrison, for
the support, encouragement, and sacrifices she has given me to finish this extended
and personally significant project. I hope the textbook will contain useful information to the readers, enough for them to share in the continued exploration of the
methods and applications of mathematics to further improve the understanding and
conditions of our world.
T. Co
Houghton, MI
PART I
MATRIX THEORY
Matrix theory is a powerful field of mathematics that has found applications in the
solution of several real-world problems, ranging from the solution of algebraic equations to the solution of differential equations. Its importance has also been enhanced
by the rapid development of several computer programs that have improved the
efficiency of matrix analysis and the solution of matrix equations.
We have allotted three chapters to discussing matrix theory. Chapter 1 contains
the basic notations and operations. These include conventions and notations for
the various structural, algebraic, differential, and integral operations. As such, this
chapter focuses on how to formulate problems in terms of matrix equations, the
various approaches of matrix algebraic manipulations, and matrix partitions.
Chapter 2 then focuses on the solution of the linear equation given by Ax = b,
and it includes both direct and indirect methods. The most direct method is to find
the inverse of A and then evaluate x = A^{-1}b. However, the major practical issue is
that matrix inverses become unwieldy when the matrices are large. This chapter is
concerned with finding the solutions by reformulating the problem to take advantage
of available matrix properties. Direct methods use various factorizations of A based
on matrices that are more easily invertible, whereas indirect methods use an iterative
process starting with an initial guess of the solution. The methods can then be applied
to linear least-squares problems, as well as to the solution of multivariable nonlinear
equations.
Chapter 3 focuses on matrices as operators. In this case, the discussion is concerned with the analysis of matrices, for example, using eigenvalues and eigenvectors. This allows one to obtain diagonalized matrices or Jordan canonical forms.
These forms provide efficient tools for evaluating matrix functions, which are also
very useful for solving simultaneous differential equations. Other analysis tools such
as singular value decomposition, matrix norms, and condition numbers are also
included in the chapter.
The matrix theory topics are also used in the other parts of this book. In Part II,
we can use matrices to represent vector coordinates and tensors. The operations and
vector/tensor properties can also be evaluated and analyzed efficiently using matrix
theory. For instance, the mutual orthogonalities among the principal axes of a symmetric tensor are immediate consequences of the properties of matrix eigenvectors.
In Part III, matrices are also shown to be indispensable tools for solving ordinary
differential equations. Specifically, the solution and analysis of a set of simultaneous
linear ordinary differential equations can be represented in terms of matrix exponential functions. Moreover, numerical solution methods can now be coded in matrix
forms. Finally, in Part IV of the book, both the finite difference and finite elements
methods reduce partial differential equations to linear algebraic equations. Thus the
tools discussed in Chapter 2 are strongly applicable because the matrices resulting
from either of these methods will likely be large and sparse.
1  Matrix Algebra
recognition purposes. In network analysis, matrix methods are used together with
graph theory to analyze the connectivity and effects of large, complex structures.
Applications include the analysis of communication and control systems, as well as
large power grids.
We now begin with the definition of a matrix and continue with some of the
notations and conventions that are used throughout this book.
Definition 1.1. A matrix is a collection of objects, called the elements of the
matrix, arranged in rows and columns.
These elements of the matrix could be numbers, such as
$$A = \begin{bmatrix} 1 & 0 & 0.3 \\ 2 & -1 & 3+i \end{bmatrix} \qquad \text{with } i = \sqrt{-1}$$
or functions, such as
$$B = \begin{bmatrix} 2x(t)+a & dy/dt \\ 1 & \int \sin(t)\,dt \end{bmatrix}$$
The elements of matrices are restricted to a set of mathematical objects that allow
algebraic binary operations such as addition, subtraction, multiplication, and division. The valid elements of the matrix are referred to as scalars. Note that a scalar is
not the same as a matrix having only one row and one column.
We often use capital letters to denote matrices, whereas the corresponding small
letters stand for the elements. Thus the elements of matrix A positioned at the ith row
and j th column are denoted as aij , for example, for A having N rows and M columns,
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NM} \end{bmatrix} \tag{1.1}$$
The size of the matrix is given by the symbol [=], for example, for matrix A having
N rows and M columns,
$$A\; [=]\; N \times M \qquad\text{or}\qquad A_{[N\times M]} \tag{1.2}$$
A row vector is a matrix having one row, whereas a column vector is a matrix
having one column. The length of a vector means the number of elements of the row
or column vector. If the type of vector has not been specified, we take it to mean a
column vector. We often use bold small letters to denote vectors. A basic vector is
the ith unit vector of length N denoted by ei ,
$$e_i = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \leftarrow i\text{th element} \tag{1.3}$$
The length N of the unit vector is determined by context.
A square matrix is a matrix with the same number of columns and rows. Special cases include lower triangular, upper triangular, and diagonal matrices. Lower
triangular matrices have zero elements above the main diagonal, whereas upper
triangular matrices have zero elements below the main diagonal. Diagonal matrices
have zero off-diagonal elements. The diagonal matrix is also represented by
$$D = \mathrm{diag}\left(d_{11}, d_{22}, \ldots, d_{NN}\right) \tag{1.4}$$
A special diagonal matrix in which the main diagonal elements are all 1s is known
as the identity matrix, denoted by I. If the size of the identity matrix needs to
be specified, then we use IN to denote an N N identity matrix. An extensive
list of different matrices that have special forms such as bidiagonal, tridiagonal,
Hessenberg, Toeplitz, and so forth are given in Tables A.1 through A.5 in Section A.1
as an appendix for easy reference.
For example, if
$$A = \begin{bmatrix} 1+i & 2 \\ 2 & 3 \end{bmatrix} \qquad\text{and}\qquad B = \begin{bmatrix} 1 & 2-i \\ 2+i & 3 \end{bmatrix}$$
then A is symmetric but not Hermitian, whereas B is Hermitian but not symmetric.
On the other hand, when $A = -A^T$, we say that A is skew-symmetric, and when
$A = -A^*$, we say that A is skew-Hermitian.
Table 1.1. Matrix rearrangement operations

1. Column Augment (MATLAB: C=[A,B])
   Notation: C = [A  B]
   Rule:
$$\begin{bmatrix} c_{11} & \cdots & c_{1,M+P} \\ \vdots & & \vdots \\ c_{N1} & \cdots & c_{N,M+P} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1M} & b_{11} & \cdots & b_{1P} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{N1} & \cdots & a_{NM} & b_{N1} & \cdots & b_{NP} \end{bmatrix}$$

2. Row Augment (MATLAB: C=[A;B])
   Notation: $C = \begin{bmatrix} A \\ B \end{bmatrix}$
   Rule:
$$\begin{bmatrix} c_{11} & \cdots & c_{1,M} \\ \vdots & & \vdots \\ c_{N+P,1} & \cdots & c_{N+P,M} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1M} \\ \vdots & & \vdots \\ a_{N1} & \cdots & a_{NM} \\ b_{11} & \cdots & b_{1M} \\ \vdots & & \vdots \\ b_{P1} & \cdots & b_{PM} \end{bmatrix}$$

3. Vectorize (MATLAB: C=A(:))
   Notation: C = vec(A)
   Rule:
$$\begin{bmatrix} c_1 \\ \vdots \\ c_{NM} \end{bmatrix} = \begin{bmatrix} A_{\bullet,1} \\ \vdots \\ A_{\bullet,M} \end{bmatrix}$$
where $A_{\bullet,i}$ is the ith column of A
For instance,
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad\Rightarrow\qquad A^{2,3}_{1,2} = \begin{bmatrix} 2 & 3 \\ 5 & 6 \end{bmatrix}$$
For a square matrix, if the diagonals of the submatrix are a subset of the diagonals
of the original matrix, then we call it a principal submatrix. This happens if the
superscript indices and the subscript indices of the submatrix are the same. For
instance,
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad\Rightarrow\qquad A^{1,3}_{1,3} = \begin{bmatrix} 1 & 3 \\ 7 & 9 \end{bmatrix}$$
then $A^{1,3}_{1,3}$ is a principal submatrix.
Table 1.2. Matrix rearrangement operations
4. Reshape (MATLAB: C=reshape(v,N,M))
   Notation: C = reshape(v, N, M)
   Rule:
$$C = \begin{bmatrix} v_1 & v_{N+1} & \cdots & v_{(M-1)N+1} \\ \vdots & \vdots & & \vdots \\ v_N & v_{2N} & \cdots & v_{MN} \end{bmatrix}$$

5. Transpose (MATLAB: C=A.')
   Notation: $C = A^T$
   Rule: $c_{ij} = a_{ji}$

6. Conjugate Transpose (MATLAB: C=A')
   Notation: $C = A^{*}$
   Rule: $c_{ij} = \bar{a}_{ji}$, where $\bar{a}_{ij}$ denotes the complex conjugate of $a_{ij}$

7. Submatrix (MATLAB: rows=[i1,i2,...]; cols=[j1,j2,...]; C=A(rows,cols))
   Notation: $C = A^{j_1,j_2,\ldots,j_\ell}_{i_1,i_2,\ldots,i_k}$
   Rule:
$$C_{[k\times\ell]} = \begin{bmatrix} a_{i_1 j_1} & \cdots & a_{i_1 j_\ell} \\ \vdots & \ddots & \vdots \\ a_{i_k j_1} & \cdots & a_{i_k j_\ell} \end{bmatrix}$$

8. Redact (MATLAB: C=A; C(i,:)=[ ]; C(:,j)=[ ])
   Notation: $C = A^{-ij}$
   Rule:
$$C_{[(N-1)\times(M-1)]} = \begin{bmatrix} A^{1,\ldots,j-1}_{1,\ldots,i-1} & A^{j+1,\ldots,M}_{1,\ldots,i-1} \\ A^{1,\ldots,j-1}_{i+1,\ldots,N} & A^{j+1,\ldots,M}_{i+1,\ldots,N} \end{bmatrix}$$
Next, the operation to remove some specified rows and columns is referred to
here as the (ij )th redact operation. We use Aij to denote the removal of the ith row
and j th column. For instance,
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad\Rightarrow\qquad A^{-23} = \begin{bmatrix} 1 & 2 \\ 7 & 8 \end{bmatrix} \tag{1.5}$$
The basic algebraic operations, their notations, rules, and corresponding MATLAB commands are summarized below.

1. Sum: C = A + B, with $c_{ij} = a_{ij} + b_{ij}$ (MATLAB: C=A+B)
2. Scalar Product: C = qA, with $c_{ij} = q\,a_{ij}$ (MATLAB: C=q*A)
3. Matrix Product: C = AB, with $c_{ij} = \sum_{k=1}^{K} a_{ik}b_{kj}$ (MATLAB: C=A*B)
4. Hadamard Product: $C = A \circ B$, with $c_{ij} = a_{ij}b_{ij}$ (MATLAB: C=A.*B)
5. Kronecker Product (tensor product): $C = A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1M}B \\ \vdots & & \vdots \\ a_{N1}B & \cdots & a_{NM}B \end{bmatrix}$ (MATLAB: C=kron(A,B))
6. Determinant: q = det(A) or q = |A|, with $q = \sum_{K}\sigma(K)\prod_{i=1}^{N} a_{i,k_i}$ (MATLAB: q=det(A))
7. Cofactor: $q = \mathrm{cof}(a_{ij}) = (-1)^{i+j}\left|A^{-ij}\right|$, see (1.10)
8. Adjugate: C = adj(A), with $c_{ij} = \mathrm{cof}(a_{ji})$
9. Inverse: $C = A^{-1} = \dfrac{1}{|A|}\,\mathrm{adj}(A)$ (MATLAB: C=inv(A))
10. Trace: $q = \mathrm{tr}(A) = \sum_{i=1}^{N} a_{ii}$ (MATLAB: q=trace(A))
11. Real Part: C = Real(A), with $c_{ij} = \mathrm{Re}(a_{ij})$ (MATLAB: C=real(A))
12. Imag Part: C = Imag(A), with $c_{ij} = \mathrm{Im}(a_{ij})$ (MATLAB: C=imag(A))
13. Complex Conjugate: $C = \bar{A}$, with $c_{ij} = \bar{a}_{ij}$ (MATLAB: C=conj(A))
The most basic matrix binary computational operations are matrix sums, scalar
products, and matrix products, which are quite familiar to most readers. To see how
these operations seem the natural consequences of solving simultaneous equations,
we refer the reader to Section A.2 included in the appendices.
Matrix products of A and B are denoted simply by C = AB, which requires
A[=]N × K, B[=]K × M, and C[=]N × M (i.e., the number of columns of A must be equal to the
number of rows of B). If this is the case, we say that A and B are conformable for the operation
AB. Furthermore, based on the sizes of the matrices, $A_{[N\times K]}B_{[K\times M]} = C_{[N\times M]}$, we see
that dropping the common value K leaves the size of C to be N × M. For the matrix
product AB, we say that A premultiplies B, or B postmultiplies A. For instance, let
$$A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$$
then
$$C = AB = \begin{bmatrix} 3 & 4 \\ 5 & 5 \\ 2 & 1 \end{bmatrix}$$
Several special cases of premultiplication and postmultiplication are worth noting. In the examples below, $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$.
1. Postmultiplying A by the unit vector $e_i$ extracts the ith column of A, whereas premultiplying by $e_i^T$ extracts the ith row; for example, $Ae_1 = (1, 4, 7)^T$.
2. Let D be a diagonal matrix. Premultiplication by D scales the rows of A, whereas postmultiplication scales the columns. For instance, with $D = \mathrm{diag}(2, 1, -1)$,
$$DA = \begin{bmatrix} 2 & 4 & 6 \\ 4 & 5 & 6 \\ -7 & -8 & -9 \end{bmatrix} \qquad AD = \begin{bmatrix} 2 & 2 & -3 \\ 8 & 5 & -6 \\ 14 & 8 & -9 \end{bmatrix}$$
3. Premultiplying A by a row vector of ones yields the column sums, whereas postmultiplying by a column vector of ones yields the row sums:
$$\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} A = \begin{bmatrix} 12 & 15 & 18 \end{bmatrix} \qquad A\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 15 \\ 24 \end{bmatrix}$$
4. Let T be an identity matrix, but with additional nonzero nondiagonal elements
in the jth column. Then B = TA is a matrix whose ith row (i ≠ j) is given by the
sum of the ith row of A and $t_{ij}$ times the jth row of A. The jth row of B remains the
jth row of A. For instance,
$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} -6 & -6 & -6 \\ 18 & 21 & 24 \\ 7 & 8 & 9 \end{bmatrix}$$
Likewise, let G be an identity matrix, but with nondiagonal elements in the ith
row. Then C = AG is a matrix whose jth column (j ≠ i) is given by the sum of
the jth column of A and $g_{ij}$ times the ith column of A. The ith column of C remains
the ith column of A. For instance,
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 8 & 3 \\ -2 & 17 & 6 \\ -2 & 26 & 9 \end{bmatrix}$$
5. A square matrix P is known as a row permutation matrix if it is a matrix obtained
by permuting the rows of an identity matrix. If P is a row permutation matrix,
then PA is a matrix obtained by permuting the rows of A in the same sequence as
P. For instance, let P[=]3 3 be obtained by permuting the rows of the identity
matrix according to the sequence [3, 1, 2], then
$$PA = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 7 & 8 & 9 \\ 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
Likewise, a square matrix $\tilde{P}$ is a column permutation matrix if it is obtained by
permuting the columns of an identity matrix. If $\tilde{P}$ is a column permutation matrix,
then $A\tilde{P}$ is obtained by permuting the columns of A in the same sequence as $\tilde{P}$.
For instance, let $\tilde{P}[=]3\times 3$ be obtained by permuting the columns of the identity
matrix according to the sequence [3, 1, 2]; then
$$A\tilde{P} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 3 & 1 & 2 \\ 6 & 4 & 5 \\ 9 & 7 & 8 \end{bmatrix}$$
Remark: Matrices D, T , and P described in items 2, 4, and 5 are known as the
scaling, pairwise combination, and permutation row operators, respectively. Collectively, they are known as the elementary row operators. All three operations
show that premultiplication (left multiplication) is a row operation. On the other
hand, D, G, and
P are elementary column operators, and they operate on matrices
via postmultiplication (right multiplication).1 All these matrix operations are used
extensively in the Gauss-Jordan elimination method for solving linear equations.
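The following short MATLAB sketch (illustrative, not from the text) shows that premultiplication acts on rows while postmultiplication acts on columns; the sequence [3 1 2] matches the permutation example above.

```matlab
% Elementary row and column operators in action (illustrative sketch)
A  = [1 2 3; 4 5 6; 7 8 9];
D  = diag([2 1 -1]);             % scaling operator
P  = eye(3); P  = P([3 1 2],:);  % row permutation matrix
Pt = eye(3); Pt = Pt(:,[3 1 2]); % column permutation matrix
disp(D*A)    % rows of A scaled
disp(A*D)    % columns of A scaled
disp(P*A)    % rows of A permuted to order [3 1 2]
disp(A*Pt)   % columns of A permuted to order [3 1 2]
```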
Aside from scalar and matrix products, there are two more matrix operations
involving multiplication. The Hadamard product, also known as element-wise product, is defined as follows:
$$Q = A \circ B \quad\Longrightarrow\quad q_{ij} = a_{ij}b_{ij}, \qquad i = 1, \ldots, N;\; j = 1, \ldots, M \tag{1.6}$$
For instance,
$$\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} \circ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 6 & 8 \end{bmatrix}$$
The Kronecker product, also known as the Tensor product, is defined as follows:
$$C = A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1M}B \\ \vdots & \ddots & \vdots \\ a_{N1}B & \cdots & a_{NM}B \end{bmatrix} \tag{1.7}$$
where the matrix blocks aij B are scalar products of aij and B. For instance,
$$\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} \otimes \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 3 & 4 & 3 & 4 \\ 2 & 4 & 2 & 4 \\ 6 & 8 & 6 & 8 \end{bmatrix}$$
Both the Hadamard product and Kronecker product are useful when solving general
matrix equations, some of which result from the finite difference methods.
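A quick MATLAB check (illustrative) of the two products for the matrices used above:

```matlab
% Hadamard (element-wise) and Kronecker products in MATLAB
A = [1 1; 2 2];  B = [1 2; 3 4];
H = A .* B        % Hadamard product: [1 2; 6 8]
K = kron(A, B)    % Kronecker product: each a_ij multiplies the whole block B
```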
1.2.2.2 Unary Algebraic Operations
We first look at the set of unary operations applicable only to square matrices. The
first set of unary operations to consider are highly related to each other. These
operations are the determinant, cofactors, adjugates, and inverses. As before, we
refer the reader to Section A.2 to see how these definitions naturally developed
from the application to the solution of simultaneous linear algebraic equations.
Of these unary operations, the matrix inverse can easily be defined independent
of computation.
Definition 1.2. The matrix inverse of a square matrix A is a matrix of the same
size, denoted by $A^{-1}$, that satisfies
$$A^{-1}A = AA^{-1} = I$$
(1.8)
We suggest the use of the mnemonics LR and RC to stand for Left operation acts on Rows and
Right operation acts on Columns, respectively.
$$\sigma(5, 1, 2, 4, 3) = -1 \qquad \sigma(2, 1, 3, 4) = -1 \qquad \sigma(6, 2, 1, 5, 3, 4) = +1$$
The cofactor of element $a_{ij}$, defined in (1.11) as $\mathrm{cof}(a_{ij}) = (-1)^{i+j}\det\left(A^{-ij}\right)$, allows the determinant to be evaluated by expansion along any column or row:
$$\det(A) = \sum_{k=1}^{N} a_{kj}\,\mathrm{cof}(a_{kj}) \quad\text{for any } j \tag{1.12}$$
$$\det(A) = \sum_{k=1}^{N} a_{jk}\,\mathrm{cof}(a_{jk}) \quad\text{for any } j \tag{1.13}$$
By induction, one can show that either the column expansion formula given
in (1.12) or the row expansion formula given in (1.13) will yield the same result as
given in (1.10).
PROOF.
Definition 1.6. The adjugate2 of a square matrix A is a matrix of the same size,
denoted by adj (A), consisting of the cofactors of each element in A but collected
in a transposed arrangement, that is,
$$\mathrm{adj}\,A = \begin{bmatrix} \mathrm{cof}(a_{11}) & \cdots & \mathrm{cof}(a_{N1}) \\ \vdots & \ddots & \vdots \\ \mathrm{cof}(a_{1N}) & \cdots & \mathrm{cof}(a_{NN}) \end{bmatrix} \tag{1.14}$$
The adjugate satisfies
$$\mathrm{adj}(A)\,A = \det(A)\,I \tag{1.15}$$
so that, when $\det(A) \neq 0$,
$$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A) \tag{1.16}$$
PROOF.
EXAMPLE 1.1. Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 0 \end{bmatrix}$$
then
$$\mathrm{cof}(a_{11}) = +\begin{vmatrix} 5 & 6 \\ 8 & 0 \end{vmatrix} = -48;\quad \mathrm{cof}(a_{21}) = -\begin{vmatrix} 2 & 3 \\ 8 & 0 \end{vmatrix} = 24;\quad \mathrm{cof}(a_{31}) = +\begin{vmatrix} 2 & 3 \\ 5 & 6 \end{vmatrix} = -3;$$
$$\mathrm{cof}(a_{12}) = -\begin{vmatrix} 4 & 6 \\ 7 & 0 \end{vmatrix} = 42;\quad \mathrm{cof}(a_{22}) = +\begin{vmatrix} 1 & 3 \\ 7 & 0 \end{vmatrix} = -21;\quad \mathrm{cof}(a_{32}) = -\begin{vmatrix} 1 & 3 \\ 4 & 6 \end{vmatrix} = 6;$$
$$\mathrm{cof}(a_{13}) = +\begin{vmatrix} 4 & 5 \\ 7 & 8 \end{vmatrix} = -3;\quad \mathrm{cof}(a_{23}) = -\begin{vmatrix} 1 & 2 \\ 7 & 8 \end{vmatrix} = 6;\quad \mathrm{cof}(a_{33}) = +\begin{vmatrix} 1 & 2 \\ 4 & 5 \end{vmatrix} = -3$$
so that
$$\mathrm{adj}(A) = \begin{bmatrix} -48 & 24 & -3 \\ 42 & -21 & 6 \\ -3 & 6 & -3 \end{bmatrix} \qquad \mathrm{adj}(A)\,A = A\,\mathrm{adj}(A) = \begin{bmatrix} 27 & 0 & 0 \\ 0 & 27 & 0 \\ 0 & 0 & 27 \end{bmatrix}$$
and
$$A^{-1} = \frac{1}{27}\begin{bmatrix} -48 & 24 & -3 \\ 42 & -21 & 6 \\ -3 & 6 & -3 \end{bmatrix}$$

2. In other texts, the term adjoint is used instead of adjugate. We chose to use the latter, because
the term adjoint is also used to refer to another matrix in linear operator theory.
Although (1.16) is a general method for computing the inverse, there are more
efficient ways to find the matrix inverse that take advantage of special structures
and properties. For instance, the inverse of diagonal matrices is another diagonal
matrix consisting of the reciprocals of the diagonal elements. Another example is
when the transpose happens to also be its inverse. These matrices are known as
orthogonal matrices. To determine whether a given matrix is indeed orthogonal, we
can just compute AT A and AAT and check whether both products yield identity
matrices.
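The two checks just described are easy to carry out in MATLAB. The following sketch (illustrative, assuming nonsingular A) verifies adj(A)A = det(A)I for Example 1.1 and tests a rotation matrix for orthogonality.

```matlab
% Adjugate and orthogonality checks (illustrative sketch)
A = [1 2 3; 4 5 6; 7 8 0];
adjA = det(A) * inv(A);      % adjugate via (1.16), assuming A is nonsingular
disp(adjA * A)               % should equal det(A)*eye(3) = 27*eye(3)

t = pi/6;
Q = [cos(t) -sin(t); sin(t) cos(t)];   % a rotation matrix
disp(Q'*Q); disp(Q*Q')       % both products should be the identity
```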
The other unary operations include the trace, real component, imaginary component, and the complex conjugate operations. The trace of a square matrix A,
denoted tr(A), is defined as the sum of the diagonals.
EXAMPLE 1.2. Let A[=]2 × 2; then for $M = \lambda I - A$, where $\lambda$ is a scalar parameter,
we have the following results:
$$\det(\lambda I - A) = \lambda^2 - \lambda\,\mathrm{tr}(A) + \det A$$
$$\mathrm{adj}(\lambda I - A) = \begin{bmatrix} \lambda - a_{22} & a_{12} \\ a_{21} & \lambda - a_{11} \end{bmatrix}$$
$$(\lambda I - A)^{-1} = \frac{1}{\lambda^2 - \lambda\,\mathrm{tr}(A) + \det A}\begin{bmatrix} \lambda - a_{22} & a_{12} \\ a_{21} & \lambda - a_{11} \end{bmatrix}$$
Note that when $\det(\lambda I - A) = 0$, the inverse will no longer exist, but
$\mathrm{adj}(\lambda I - A)$ will still be valid.
We now show some examples in which the matrices can be used to represent the
indexed equations. The first example involves the matrix formulation of the finite
difference approximation of a partial differential equation. The second involves the
matrix formulation of a quadratic equation.
EXAMPLE 1.3. The unsteady-state heat conduction in a flat rectangular plate of dimension L × W is described by
$$\frac{\partial T}{\partial t} = \alpha\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right) \tag{1.17}$$
subject to the boundary conditions
$$T(0, y, t) = f_0(y) \qquad T(L, y, t) = f_L(y) \qquad T(x, 0, t) = g_0(x) \qquad T(x, W, t) = g_W(x)$$
and initial condition T(x, y, 0) = h(x, y). We can introduce a uniform finite
time increment $\Delta t$ and finite differences for x and y given by $\Delta x = L/(N+1)$
and $\Delta y = W/(M+1)$, respectively, so that $t_k = k\Delta t$, $x_n = n\Delta x$, and $y_m = m\Delta y$,
[Figure 1.1. A schematic of the finite difference approximation of the temperature distribution T of a flat plate in Example 1.3.]
$$T_{n,m}(k+1) = T_{n,m}(k) + \alpha_x\Big[T_{n+1,m}(k) - 2T_{n,m}(k) + T_{n-1,m}(k)\Big] + \alpha_y\Big[T_{n,m+1}(k) - 2T_{n,m}(k) + T_{n,m-1}(k)\Big] \tag{1.18}$$
where
$$\alpha_x = \frac{\alpha\,\Delta t}{(\Delta x)^2} \qquad \alpha_y = \frac{\alpha\,\Delta t}{(\Delta y)^2}$$
The finite difference methods are discussed in more detail in Chapter 13.
first group of terms can be described by the product AT for some constant
N N matrix A. Conversely, the second group of terms in (1.18) involves only
acombination of column elements at fixed n, which means a product TB for
some matrix B[=]M M. In anticipation of boundary conditions, we need an
extra matrix C[=]N M. Thus we should be able to represent (1.18) using a
matrix formulation given by
T (k + 1) = AT (k) + T (k)B + C
(1.19)
where, writing out (1.18) for each interior grid point and collecting terms with
$\beta_x = 1/(2\alpha_x) - 2$ and $\beta_y = 1/(2\alpha_y) - 2$, the coefficient matrices take the general forms
$$A = \alpha_x\begin{bmatrix} \beta_x & 1 & & 0 \\ 1 & \beta_x & \ddots & \\ & \ddots & \ddots & 1 \\ 0 & & 1 & \beta_x \end{bmatrix} [=]N\times N \qquad B = \alpha_y\begin{bmatrix} \beta_y & 1 & & 0 \\ 1 & \beta_y & \ddots & \\ & \ddots & \ddots & 1 \\ 0 & & 1 & \beta_y \end{bmatrix} [=]M\times M$$
$$C = \alpha_x\begin{bmatrix} p_1 & \cdots & p_M \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \\ q_1 & \cdots & q_M \end{bmatrix} + \alpha_y\begin{bmatrix} r_1 & 0 & \cdots & 0 & s_1 \\ \vdots & \vdots & & \vdots & \vdots \\ r_N & 0 & \cdots & 0 & s_N \end{bmatrix}$$
More generally, if the boundary conditions are time-varying, then C = C(k). Also, if the coefficient
$\alpha = \alpha(t)$, then A and B will need to be replaced by A(k) and B(k), respectively.
where $p_m = f_0(m\Delta y)$, $q_m = f_L(m\Delta y)$, $r_n = g_0(n\Delta x)$, and $s_n = g_W(n\Delta x)$.
The initial matrix is obtained using the initial condition, that is, T nm (0) =
h(n
x, m
y). Starting with T (0), one can then march iteratively through time
using (1.19). (A specific example is given in exercise E1.21.)
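A minimal MATLAB sketch of this time-marching scheme is given below; the values of alpha, L, W, the boundary terms, and the initial condition are illustrative assumptions, not the data of the exercise.

```matlab
% Explicit time marching of T(k+1) = A*T(k) + T(k)*B + C, eq. (1.19)
alpha = 1; L = 1; W = 1; N = 20; M = 20; dt = 1e-4;   % assumed values
dx = L/(N+1);  dy = W/(M+1);
ax = alpha*dt/dx^2;   ay = alpha*dt/dy^2;
bx = 1/(2*ax) - 2;    by = 1/(2*ay) - 2;
A = ax*(diag(bx*ones(N,1)) + diag(ones(N-1,1),1) + diag(ones(N-1,1),-1));
B = ay*(diag(by*ones(M,1)) + diag(ones(M-1,1),1) + diag(ones(M-1,1),-1));
C = zeros(N,M);            % boundary contributions (zero boundaries assumed)
T = ones(N,M);             % initial condition h(x,y) = 1 (assumed)
for k = 1:500
    T = A*T + T*B + C;     % march one time step
end
```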
EXAMPLE 1.4. Consider the scalar quadratic form given by
$$\rho = \sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}x_i x_j$$
where
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{bmatrix} \quad\text{and}\quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}$$
Then the quadratic form can be written equivalently as
$$[\rho] = x^T Q x \qquad [\rho] = x^T L x \qquad\text{or}\qquad [\rho] = x^T U x$$
where $Q = (q_{ij})$, $L = (\ell_{ij})$, and $U = (u_{ij})$ are given by
$$q_{ij} = \frac{a_{ij} + a_{ji}}{2};\qquad u_{ij} = \begin{cases} a_{ij} + a_{ji} & \text{if } i < j \\ a_{ii} & \text{if } i = j \\ 0 & \text{if } i > j \end{cases};\qquad \ell_{ij} = \begin{cases} a_{ij} + a_{ji} & \text{if } i > j \\ a_{ii} & \text{if } i = j \\ 0 & \text{if } i < j \end{cases}$$
(The proof that all three forms are equivalent is left as an exercise in E1.34.)
This example shows that more than one matrix formulation is possible
in some cases. Matrix Q is symmetric, whereas L is lower triangular, and U
is upper triangular. The most common formulation is to use the symmetric
matrix Q.
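The following illustrative MATLAB check (with an arbitrary test matrix and vector) confirms that the symmetric matrix Q = (A + A')/2 reproduces the same quadratic form as A itself.

```matlab
% Quadratic form with A and with its symmetric part Q (illustrative check)
A = [1 4; 0 3];  x = [2; -1];     % assumed test data
Q = (A + A')/2;
rho1 = x'*A*x;                    % quadratic form with the original matrix
rho2 = x'*Q*x;                    % same value with the symmetric matrix Q
disp([rho1 rho2])
```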
Commutativity:
  A + B = B + A
  $A^{-1}A = AA^{-1}$
Associativity:
  (A + B) + C = A + (B + C)
  A(BC) = (AB)C
  $A \circ (B \circ C) = (A \circ B) \circ C$
  $A \otimes (B \otimes C) = (A \otimes B) \otimes C$
Distributivity of Products:
  A(B + C) = AB + AC
  (A + B)C = AC + BC
  $A \circ (B + C) = A \circ B + A \circ C$
  $(A + B) \circ C = A \circ C + B \circ C$
  $A \otimes (B + C) = A \otimes B + A \otimes C$
  (B + C)A = BA + CA
Transpose of Products:
  $(AB)^T = B^T A^T$
  $(A \circ B)^T = A^T \circ B^T$
  $(A \otimes B)^T = A^T \otimes B^T$
  $(A^T)^T = A$ and $(A^{-1})^{-1} = A$
Vectorization:
  $\mathrm{vec}(BAC) = (C^T \otimes B)\,\mathrm{vec}(A)$
  $\mathrm{vec}(A \circ B) = \mathrm{vec}(A) \circ \mathrm{vec}(B)$
Table 1.5. Definition of vectors
Vector   Description of elements
x        supply rate of raw material from each of the M sources
y        production rate of each of the N products (y = Fx)
z        price per kilogram of each of the N products
w        cost per kilogram of raw material from each of the M sources
actual computations. They help in simplifying expressions that often yield important
insights about the data or the system being investigated.
The first group of properties list the commutativity, associativity, and distributivity properties of various sums and products. One general rule is to choose associations
of products that would improve computations. For instance, let a, b, c, d, e, and f be
column vectors of the same length; we should use the following associations
abT cdT ef T = a bT c dT e f T
because both bT c and dT e are 1 1. A similar rule holds for using the distributive properties. For example, we can use distributivity to rearrange the following
equation:
AD + ABCD = A(D + BCD) = A(I + BC)D
More importantly, these properties allow for manipulations of matrix equations
to help simplify the equations, as shown in the example that follows.
EXAMPLE 1.5. Consider a processing facility that can take raw material from M
different sources to produce N different products. The fractional yield of product
j per kilogram of material coming from source i can be collected in matrix form
as F = ( f ij ). In addition, define the cost, price, supply rates, and production
rates by the column vectors given in Table 1.5. We simplify the situation by
assuming that all the products are sold immediately after production without
need for inventory. Let S, C, and P = S - C be the annual sale, annual cost,
and annual net profit, respectively. We want to obtain a vector g where the kth
element is the annual net profit per kilogram of material from source k, that is,
P = gT x.
Using matrix representation, we have y = Fx, $S = z^T y$, and $C = w^T x$. Thus
$$P = S - C = z^T F x - w^T x = \left(z^T F - w^T\right) x = g^T x$$
where g is given by
$$g = F^T z - w$$
More generally, the problem of maximizing net profit by adjusting the supply
rates is formulated as a typical linear programming problem:
$$\max_{x}\; g^T x \quad\text{(objective function)}$$
subject to
$$0 \le x \le x_{max} \quad\text{(availability constraints)}$$
$$y_{min} \le y\,(= Fx) \le y_{max} \quad\text{(demand constraints)}$$
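A sketch of this linear program in MATLAB is shown below; the numerical data are illustrative assumptions, and linprog (Optimization Toolbox) minimizes, so the objective is negated to maximize the profit.

```matlab
% Linear program: maximize g'*x subject to bounds and demand constraints
F = [0.5 0.2; 0.3 0.6];  z = [10; 8];  w = [3; 2];     % assumed data
g = F'*z - w;
xmax = [100; 100];  ymin = [10; 10];  ymax = [60; 60]; % assumed limits
Aineq = [F; -F];  bineq = [ymax; -ymin];               % ymin <= F*x <= ymax
x = linprog(-g, Aineq, bineq, [], [], zeros(2,1), xmax);
```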
The transposes of matrix products turn out to be equal to the matrix products of
the transposes but in the reversed sequence. Together with the associative property,
this can be extended to the following results:
$$(ABC \cdots EFG)^T = G^T F^T E^T \cdots C^T B^T A^T$$
$$\left(A^k\right)^T = \left(A^T\right)^k$$
$$\left(A^{-1}\right)^T A^T = I = A^T\left(A^{-1}\right)^T$$
The last result shows that $\left(A^T\right)^{-1} = \left(A^{-1}\right)^T$. Thus we often use the shorthand $A^{-T}$
to mean either $\left(A^T\right)^{-1}$ or $\left(A^{-1}\right)^T$.
Similarly, the inverse of a matrix product is a product of the matrix inverses in
the reverse sequence. This can be generalized to be5
$$(ABC\cdots)^{-1} = \cdots C^{-1}B^{-1}A^{-1}$$
$$\left(A^k\right)^{-1} = \left(A^{-1}\right)^k$$
$$A^k A^{\ell} = A^{k+\ell}$$
Thus we can use $A^{-k}$ to denote either $\left(A^k\right)^{-1}$ or $\left(A^{-1}\right)^k$. Note that these results are
still consistent with $A^0 = I$.
EXAMPLE 1.6. Consider a resistive electrical network consisting of junction points
or nodes that are connected to each other by links, where the links contain three
types of electrical components: resistors, current sources, and voltage sources.
We simplify our network to contain only two types of links. One type of link contains only either one resistor or one voltage source, or both connected in series.6

5. Note that this is not the case for Kronecker products; that is, $(A \otimes B \otimes C \otimes \cdots)^{-1} = A^{-1} \otimes B^{-1} \otimes C^{-1} \otimes \cdots$
6. If multiple resistors with resistance $R_{j1}, R_{j2}, \ldots$ are connected in series in the jth link, then they can
be replaced by one resistor with resistance $R_j = \sum_k R_{jk}$. Likewise, if multiple voltage sources with
signed voltages $s_{j1}, s_{j2}, \ldots$ are connected in series in the jth link, then they can be replaced by one
voltage source with signed voltage $s_j = \sum_k s_{jk}$, where the sign is positive if the polarity goes from
positive to negative along the current flow.
[Figure 1.2. A resistive electrical network with nodes 0 to 3, resistors R1 to R6, voltage source S1, and current source A3,0.]
The other type of link contains only a current source. One such network is shown
in Figure 1.2.
Suppose there are n + 1 nodes and m (n + 1) links. By setting one of
the nodes as having zero potential (the ground node), we want to determine
the potentials of the remaining n nodes as well as the current flowing through
each link and the voltages across each of the resistors. To obtain the required
equations, we need to first propose the directions of each link, select the ground
node (node 0), and label the remaining nodes (nodes 1 to n). Based on the
choices of current flow and node labels, we can form the node-link incidence
matrix $\Delta[=]n \times m$, which is a matrix composed of only 0, +1, and -1. The ith
row of $\Delta$ refers to the ith node, whereas the jth column refers to the jth link.
Note that the links containing only current sources are not included during the
formulation of the incidence matrix. (Instead, these links are involved only during
the implementation of Kirchhoff's current laws.) We set $\Delta_{ij} = +1$ if the current is
flowing into node i along the jth link, and $\Delta_{ij} = -1$ if the current is flowing out
of node i along the jth link. For the network shown in Figure 1.2, the incidence
matrix is given by
$$\Delta = \begin{bmatrix} +1 & -1 & 0 & -1 & 0 & 0 \\ 0 & +1 & -1 & 0 & +1 & 0 \\ 0 & 0 & 0 & +1 & -1 & -1 \end{bmatrix}$$
Let p i be the potential of node i with respect to ground and let e j be the potential
difference along link j between nodes k and $\ell$, that is, where $\Delta_{kj} \neq 0$ and $\Delta_{\ell j} \neq 0$.
Because the current flows from high to low potential,
$$e = -\Delta^T p$$
If the j th link contains a voltage source s j , we assign a positive value if the
polarity is from positive to negative along the chosen direction of the current
flow. Let v j be the voltage across the j th resistor, then
e=v+s
Ohms law states that the voltage across the j th resistor is given by v j = i j Rj ,
where i j and Rj are the current and resistance in the j th link. In matrix form, we
have
$$v = Ri \qquad\text{where}\qquad R = \begin{bmatrix} R_1 & & 0 \\ & \ddots & \\ 0 & & R_m \end{bmatrix}$$
Let the current sources flowing out of the ith node be given by $A_{ij}$, whereas those
flowing into the ith node are given by $A_{\ell i}$. Then the net current inflow at node i
due only to current sources will be
$$b_i = \sum_{\ell} A_{\ell i} - \sum_{j} A_{ij}$$
Kirchhoff's current law states that the net flow of current at the ith node is zero.
Thus we have
$$\Delta i + b = 0$$
In summary, for a given set of resistance, voltage sources, and current sources,
we have enough information to find the potentials at each node, the voltage
across each resistor, and the current flows along the links based on the chosen
ground point and proposed current flows. To solve for the node potentials, we
have
e
v+s
T p
Ri + s
R p R s
R1 T p R1 s
1 T
R p
i = b
b R1 s
1 T 1
b R1 s
R
(1.20)
Using the values of p, we could find the voltages across the resistors,
$$v = -\Delta^T p - s \tag{1.21}$$
and the currents,
$$i = R^{-1}v \tag{1.22}$$
For the network shown in Figure 1.2, suppose the values for the resistors,
voltage source, and current source are given by $\{R_1, R_2, R_3, R_4, R_5, R_6\} = \{1, 2, 3, 0.5, 0.8, 10\}$ ohms,
$S_1 = 1$ V, and $A_{3,0} = 0.2$ amp. Then the solution using equations (1.20) to (1.22) yields
$$p = \begin{bmatrix} 0.6118 \\ 0.4254 \\ 0.4643 \end{bmatrix}\text{volts}, \qquad v = \begin{bmatrix} 0.3882 \\ 0.1864 \\ 0.4254 \\ 0.1475 \\ 0.0389 \\ 0.4643 \end{bmatrix}\text{volts}, \qquad i = \begin{bmatrix} 0.3882 \\ 0.0932 \\ 0.1418 \\ 0.2950 \\ 0.0486 \\ 0.0464 \end{bmatrix}\text{amps}$$
Remarks:
1. $R^{-1}$ is just a diagonal matrix containing the reciprocals of the diagonal
elements of R.
2. $\Delta R^{-1}\Delta^T$ is an n × n symmetric matrix, and its inverse is needed in equation (1.20). If n is large, it is often more efficient to approach the same
problem using the numerical techniques that are covered in the next chapter, such as the conjugate gradient method.
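For small networks, equations (1.20) to (1.22) transcribe directly into MATLAB. The sketch below is illustrative; the signs chosen for s and b follow the conventions stated above for the network of Figure 1.2.

```matlab
% Node potentials, resistor voltages, and link currents, eqs. (1.20)-(1.22)
Delta = [ 1 -1  0 -1  0  0;
          0  1 -1  0  1  0;
          0  0  0  1 -1 -1];           % node-link incidence matrix
R = diag([1 2 3 0.5 0.8 10]);          % link resistances
s = [-1; 0; 0; 0; 0; 0];               % signed voltage source in link 1
b = [0; 0; -0.2];                      % net current-source inflow at each node
p = (Delta/R*Delta') \ (b - Delta/R*s);   % node potentials, eq. (1.20)
v = -Delta'*p - s;                        % resistor voltages,  eq. (1.21)
i = R \ v;                                % link currents,      eq. (1.22)
```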
The last group of properties given in Table 1.3 involves the relationship between
vectorization, matrix products, and Kronecker products. These properties are very
useful in reformulating matrix equations in which the unknown matrices X do not
exclusively appear on the right or left position of the products in the equation.
For example, a form known as Sylvester matrix equation, which often results from
control theory as well as in finite difference solutions, is given by
QX + XR = C
(1.23)
EXAMPLE 1.7.
T (k + 1) = AT (k) + T (k)B + C
where A[=]N N, B[=]M M, C[=]N M, and T [=]N M. At equilibrium,
T (k + 1) = T (k) = T eq , a constant matrix. Thus the matrix equation becomes
T eq = AT eq + T eq B + C
Using the vectorization properties in Table 1.4, we obtain
$$\mathrm{vec}\left(T_{eq}\right) = \left(I_{[M]} \otimes A\right)\mathrm{vec}\left(T_{eq}\right) + \left(B^T \otimes I_{[N]}\right)\mathrm{vec}\left(T_{eq}\right) + \mathrm{vec}(C)$$
or
$$Kx = b \qquad\Longrightarrow\qquad x = K^{-1}b$$
where
$$K = I_{[NM]} - I_{[M]} \otimes A - B^T \otimes I_{[N]}, \qquad x = \mathrm{vec}\left(T_{eq}\right), \qquad b = \mathrm{vec}(C)$$
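An illustrative MATLAB sketch of this vectorization approach, using small random test matrices, is given below.

```matlab
% Solve Teq = A*Teq + Teq*B + C via vectorization and Kronecker products
N = 3; M = 2;
A = 0.1*rand(N);  B = 0.1*rand(M);  C = rand(N,M);     % assumed test data
K = eye(N*M) - kron(eye(M), A) - kron(B', eye(N));
Teq = reshape(K \ C(:), N, M);
norm(Teq - (A*Teq + Teq*B + C))     % residual should be near zero
```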
1. Determinant of Products: det(AB) = det(A) det(B)
2. Determinant of Triangular (or Diagonal) Matrices: $\det A = \prod_{i=1}^{N} a_{ii}$
3. Determinant of Transpose: $\det\left(A^T\right) = \det(A)$
4. Determinant of Inverses: $\det\left(A^{-1}\right) = 1/\det(A)$
5. Scaled Columns: if $B = \begin{bmatrix} \gamma_1 a_{11} & \cdots & \gamma_N a_{1N} \\ \vdots & & \vdots \\ \gamma_1 a_{N1} & \cdots & \gamma_N a_{NN} \end{bmatrix}$, then $\det B = \left(\prod_{j=1}^{N}\gamma_j\right)\det A$
6. Permuted Columns: if B is obtained by permuting the columns of A according to the sequence K, then $\det B = \sigma(K)\det A$, where $\sigma(K)$ is the permutation sign function
7. Multilinearity:
$$\det\begin{bmatrix} a_{11} & \cdots & x_1 + y_1 & \cdots & a_{1N} \\ \vdots & & \vdots & & \vdots \\ a_{N1} & \cdots & x_N + y_N & \cdots & a_{NN} \end{bmatrix} = \det\begin{bmatrix} a_{11} & \cdots & x_1 & \cdots & a_{1N} \\ \vdots & & \vdots & & \vdots \\ a_{N1} & \cdots & x_N & \cdots & a_{NN} \end{bmatrix} + \det\begin{bmatrix} a_{11} & \cdots & y_1 & \cdots & a_{1N} \\ \vdots & & \vdots & & \vdots \\ a_{N1} & \cdots & y_N & \cdots & a_{NN} \end{bmatrix}$$
8. Linear Dependence: det A = 0 if for some $\kappa \neq 0$, $\sum_{j=1}^{N}\kappa_j A_{\bullet,j} = 0$

Using item 3 (i.e., that the transpose operation does not alter the determinant), a dual set of properties
exists for items 5 to 8, in which the columns are replaced by rows.
the product of the diagonals means that there is a tremendous advantage to finding
multiplicative factors that could diagonalize or triangularize the original matrix.
Later, in Chapter 3, we try to find such a nonsingular T whose effect would be to make
$C = T^{-1}AT$ diagonal or triangular. Yet C and A will have the same determinant,
that is,
$$\det\left(T^{-1}AT\right) = \det\left(T^{-1}\right)\det(A)\det(T) = \frac{1}{\det(T)}\det(A)\det(T) = \det(A)$$
The last property in the list is one of the key applications of determinants in
linear algebra. It states that if the columns of a matrix are linearly dependent
(defined next), then the determinant is zero.
Definition 1.7. Vectors {v1 , v2 , . . . , vN } are linearly dependent if
$$\sum_{i=1}^{N}\kappa_i v_i = 0 \tag{1.24}$$
for some $\kappa_k \neq 0$.
This means that if {v1 , . . . , vN } is a linearly dependent set of vectors, then any of the
vectors in the set can be represented as a linear combination of the other (N - 1)
vectors. For instance, let
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \qquad v_2 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \qquad v_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$
We can compute the determinant of $V = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}$ to be 0 and conclude immediately that the columns are dependent. In fact, we check easily that $v_1 = v_2 - v_3$,
$v_2 = v_1 + v_3$, or $v_3 = v_2 - v_1$.
EXAMPLE 1.8. Consider the tetrahedron whose vertices are the points $p_1$, $p_2$, $p_3$, and $p_4$, and let
$$V = \begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \end{bmatrix} \qquad\text{with } v_1 = p_2 - p_1,\; v_2 = p_3 - p_1,\; v_3 = p_4 - p_1$$
It can be shown using the techniques given in Section 4.1, together with Section 4.2, that the volume of the tetrahedron can be found by the determinant
formula
$$\text{Volume} = \frac{1}{6}\left|\det V\right|$$
For the points
$$p_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad p_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \quad p_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad p_4 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$
we have
$$V = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 1 \\ -1 & 1 & 1 \end{bmatrix} \qquad\Rightarrow\qquad \text{Volume} = \frac{1}{6}$$
If instead of $p_4$ we have $\tilde{p}_4 = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}^T$, then
$$V = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \qquad\Rightarrow\qquad \text{Volume} = 0$$
which means $p_1$, $p_2$, $p_3$, and $\tilde{p}_4$ are coplanar, with $v_1 = v_2 - v_3$.
$$D^{-1} = \begin{bmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_N \end{bmatrix}^{-1} = \begin{bmatrix} 1/d_1 & & 0 \\ & \ddots & \\ 0 & & 1/d_N \end{bmatrix} \tag{1.25}$$
LEMMA 1.3.
PROOF.
$$(A + BCD)^{-1} = A^{-1} - A^{-1}B\left(C^{-1} + DA^{-1}B\right)^{-1}DA^{-1} \tag{1.27}$$
PROOF. With $M = C^{-1} + DA^{-1}B$, let Q be the right-hand side of (1.27), that is,
$$Q = A^{-1} - A^{-1}BM^{-1}DA^{-1} \tag{1.28}$$
Then,
$$\begin{aligned}
(A + BCD)\,Q &= (AQ) + (BCDQ) \\
&= \left(AA^{-1} - AA^{-1}BM^{-1}DA^{-1}\right) + \left(BCDA^{-1} - BCDA^{-1}BM^{-1}DA^{-1}\right) \\
&= I + BCDA^{-1} - B\left(I + CDA^{-1}B\right)M^{-1}DA^{-1} \\
&= I + BCDA^{-1} - B\left(CC^{-1} + CDA^{-1}B\right)M^{-1}DA^{-1} \\
&= I + BCDA^{-1} - BC\left(C^{-1} + DA^{-1}B\right)M^{-1}DA^{-1} \\
&= I + BCDA^{-1} - BCDA^{-1} = I
\end{aligned}$$
Remark: The matrix inversion lemma given by (1.27) is usually applied in cases in
which the inverse of A is already known and the size of C is significantly smaller
than A.
EXAMPLE 1.9. Let
$$G = \begin{bmatrix} 1 & 0 & 2 \\ 2 & 2 & 0 \\ 1 & 1 & 3 \end{bmatrix} = T + wv^T$$
where
$$T = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 2 & 0 \\ 1 & 1 & 3 \end{bmatrix} \qquad w = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \qquad v^T = \begin{bmatrix} 0 & 0 & 2 \end{bmatrix}$$
With
$$T^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1/2 & 0 \\ 0 & -1/6 & 1/3 \end{bmatrix}$$
the matrix inversion lemma (1.27) gives
$$G^{-1} = T^{-1} - \frac{\left(T^{-1}w\right)\left(v^T T^{-1}\right)}{1 + v^T T^{-1} w} = \begin{bmatrix} 1 & 1/3 & -2/3 \\ -1 & 1/6 & 2/3 \\ 0 & -1/6 & 1/3 \end{bmatrix}$$
where we took advantage of the fact that $1 + v^T T^{-1} w$ [=] 1 × 1.
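The rank-one update can be checked directly in MATLAB. The sketch below is illustrative and uses the matrices of Example 1.9 as written above.

```matlab
% Rank-one update of an inverse (special case of the matrix inversion lemma)
T = [1 0 0; 2 2 0; 1 1 3];  w = [1; 0; 0];  v = [0; 0; 2];
G  = T + w*v';
Ti = inv(T);
Gi = Ti - (Ti*w)*(v'*Ti)/(1 + v'*Ti*w);   % updated inverse from T^{-1}
disp(norm(Gi - inv(G)))                   % should be near zero
```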
We complete this subsection with the discussion of a technique used in solving
a subset of Ax = b. Suppose we want to solve for only one of the unknowns, for
example, the kth element of x, for a given linear equation Ax = b. One could extract
the kth element of x = A1 b, but this involves the evaluation of A1 , which can be
computationally expensive. As it turns out, finding the inverse is unnecessary if only
one unknown is needed, by using Cramer's rule, as given by the following lemma:
LEMMA 1.5.
For the linear equation Ax = b with A nonsingular,
$$\begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix} = \frac{1}{\det(A)}\begin{bmatrix} \mathrm{cof}(a_{11}) & \cdots & \mathrm{cof}(a_{N1}) \\ \vdots & \ddots & \vdots \\ \mathrm{cof}(a_{1N}) & \cdots & \mathrm{cof}(a_{NN}) \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{bmatrix} \tag{1.29}$$
that is,
$$x_k = \frac{\sum_{j=1}^{n} b_j\,\mathrm{cof}(a_{jk})}{\det(A)}$$
The numerator is just the determinant of a matrix, A[k,b] , which is obtained from A,
with the kth column replaced by b.
EXAMPLE 1.10. Let
$$A = \begin{bmatrix} 1 & 0 & 2 \\ 2 & 2 & 0 \\ 1 & 1 & 3 \end{bmatrix} \qquad\text{and}\qquad b = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}$$
Then for Ax = b, the value of $x_2$ can be found immediately using Cramer's rule,
$$x_2 = \frac{\det\left(A^{[2,b]}\right)}{\det(A)} = \frac{\begin{vmatrix} 1 & 2 & 2 \\ 2 & 3 & 0 \\ 1 & 2 & 3 \end{vmatrix}}{\begin{vmatrix} 1 & 0 & 2 \\ 2 & 2 & 0 \\ 1 & 1 & 3 \end{vmatrix}} = \frac{-1}{6}$$
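A quick MATLAB confirmation (illustrative) of this single-unknown computation:

```matlab
% Cramer's rule for one unknown versus the full solve
A = [1 0 2; 2 2 0; 1 1 3];  b = [2; 3; 2];
Ab = A;  Ab(:,2) = b;            % replace the 2nd column of A by b
x2 = det(Ab)/det(A)              % Cramer's rule for x2
x  = A\b;  x(2)                  % full solution, for comparison
```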
For conformably partitioned matrices, the determinant of a product satisfies
$$\det\left(\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} E & F \\ G & H \end{bmatrix}\right) = \det\begin{bmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{bmatrix} \tag{1.30}$$
and the determinant of a partitioned matrix satisfies
$$\det\begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = \det\begin{bmatrix} A & 0 \\ C & D \end{bmatrix} = \det(A)\det(D) \tag{1.31}$$
$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(A)\det\left(D - CA^{-1}B\right) \quad\text{if } A^{-1}\text{ exists} \tag{1.32}$$
$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(D)\det\left(A - BD^{-1}C\right) \quad\text{if } D^{-1}\text{ exists} \tag{1.33}$$
For the inverse of a partitioned matrix, write
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} W & X \\ Y & Z \end{bmatrix} \tag{1.34}$$
Case 1: A and $\Theta = D - CA^{-1}B$ are nonsingular; then
$$Z = \Theta^{-1}, \quad Y = -\Theta^{-1}CA^{-1} = -ZCA^{-1}, \quad X = -A^{-1}B\Theta^{-1} = -A^{-1}BZ,$$
$$W = A^{-1}\left(I + B\Theta^{-1}CA^{-1}\right) = A^{-1}(I - BY) = (I - XC)A^{-1} \tag{1.35}$$
Case 2: D and $\Phi = A - BD^{-1}C$ are nonsingular; then
$$W = \Phi^{-1}, \quad X = -\Phi^{-1}BD^{-1} = -WBD^{-1}, \quad Y = -D^{-1}C\Phi^{-1} = -D^{-1}CW,$$
$$Z = D^{-1}\left(I + C\Phi^{-1}BD^{-1}\right) = D^{-1}(I - CX) = (I - YB)D^{-1} \tag{1.36}$$
The proofs of (1.30) through (1.36) are given in Section A.4.5. The matrices $\Theta = D - CA^{-1}B$ and $\Phi = A - BD^{-1}C$ are known as the Schur complements of A and D, respectively.
EXAMPLE 1.11. Consider the open-loop process structure consisting of R process
units as shown in Figure 1.4. The local state vector for process unit i is given
T
by xi = x1i , , xNi . For instance, xki could stand for the kth species in
process unit i. The interaction among the process units is given by
$$A_1 x_1 + B_1 x_2 = p_1$$
$$C_{i-1}x_{i-1} + A_i x_i + B_i x_{i+1} = p_i \qquad\text{if } 1 < i < R$$
$$C_{R-1}x_{R-1} + A_R x_R = p_R$$
$$F = \begin{bmatrix} A_1 & B_1 & & 0 \\ C_1 & A_2 & \ddots & \\ & \ddots & \ddots & B_{R-1} \\ 0 & & C_{R-1} & A_R \end{bmatrix} \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_R \end{bmatrix} \qquad p = \begin{bmatrix} p_1 \\ \vdots \\ p_R \end{bmatrix}$$
For a specific numerical case, partitioning the inverse as in (1.34) gives
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = F^{-1}p = \begin{bmatrix} W & X \\ Y & Z \end{bmatrix}\begin{bmatrix} p_1 \\ p_2 \end{bmatrix}$$
Remark: Suppose we are interested in x1 (or x2 ); then one needs only the values
of W and X (or Y and Z, respectively).
advantage of matrix calculus is also to allow for compact notation and thus improve
the tractability of calculating large systems of differential equations. This means
that matrix algebra and matrix analysis tools can be used to study the solution
and behavior of systems of differential equations, the numerical solution of systems
of nonlinear algebraic equations, and the numerical optimization of multivariable
functions.
$$\frac{d}{dt}A(t) = \lim_{\Delta t \to 0}\frac{1}{\Delta t}\Big[A(t + \Delta t) - A(t)\Big] = \begin{bmatrix} \dfrac{da_{11}}{dt} & \cdots & \dfrac{da_{1M}}{dt} \\ \vdots & \ddots & \vdots \\ \dfrac{da_{N1}}{dt} & \cdots & \dfrac{da_{NM}}{dt} \end{bmatrix} \tag{1.37}$$
Based on (1.37), we can obtain the various properties given in Table 1.7. For the
derivative of determinants, the proof is given in Section A.4.6.
Likewise, the integral of matrices of univariable functions is defined as follows:
$$\int_{t_0}^{t_f} A(t)\,dt = \lim_{\Delta t \to 0}\sum_{k=0}^{T} A(k\,\Delta t)\,\Delta t = \begin{bmatrix} \int_{t_0}^{t_f} a_{11}(t)\,dt & \cdots & \int_{t_0}^{t_f} a_{1N}(t)\,dt \\ \vdots & \ddots & \vdots \\ \int_{t_0}^{t_f} a_{N1}(t)\,dt & \cdots & \int_{t_0}^{t_f} a_{NN}(t)\,dt \end{bmatrix} \tag{1.38}$$
where $T = (t_f - t_0)/\Delta t$. Based on the linearity property of the integrals, we have
the properties shown in Table 1.8.
In general, A(t) and its derivative are not commutative. However, for the special
case in which A and its derivative commute, the matrix exponential simplifies to
$$\frac{d}{dt}\exp\big(A(t)\big) = \frac{dA}{dt} + A\frac{dA}{dt} + \frac{1}{2!}A^2\frac{dA}{dt} + \cdots = \exp\big(A(t)\big)\frac{dA}{dt} = \frac{dA}{dt}\exp\big(A(t)\big)$$
One such case is when A(t) is diagonal. Another case is when $A(t) = \alpha(t)M$,
where M is a constant square matrix.
Table 1.7. Properties of derivatives of matrices of univariable functions

1. Sum of Matrices: $\dfrac{d}{dt}\big[M(t) + N(t)\big] = \dfrac{dM}{dt} + \dfrac{dN}{dt}$
2. Scalar Products: $\dfrac{d}{dt}\big[\alpha(t)M(t)\big] = \dfrac{d\alpha}{dt}M + \alpha\dfrac{dM}{dt}$
3. Matrix Products: $\dfrac{d}{dt}\big[M(t)N(t)\big] = \dfrac{dM}{dt}N + M\dfrac{dN}{dt}$
4. Hadamard Products: $\dfrac{d}{dt}\big[M(t)\circ N(t)\big] = \dfrac{dM}{dt}\circ N + M\circ\dfrac{dN}{dt}$
5. Kronecker Products: $\dfrac{d}{dt}\big[M(t)\otimes N(t)\big] = \dfrac{dM}{dt}\otimes N + M\otimes\dfrac{dN}{dt}$
6. Partitioned Matrices: $\dfrac{d}{dt}\begin{bmatrix} A(t) & B(t) \\ C(t) & D(t) \end{bmatrix} = \begin{bmatrix} \frac{dA}{dt} & \frac{dB}{dt} \\ \frac{dC}{dt} & \frac{dD}{dt} \end{bmatrix}$
7. Matrix Transpose: $\dfrac{d}{dt}A(t)^T = \left(\dfrac{dA}{dt}\right)^T$
8. Matrix Inverse: $\dfrac{d}{dt}A(t)^{-1} = -A^{-1}\dfrac{dA}{dt}A^{-1}$
9. Determinants: $\dfrac{d}{dt}\det\big(A(t)\big) = \sum_{k=1}^{N}\det\big(A_k(t)\big)$, where $A_k$ is A with its kth row replaced by $\left(\dfrac{da_{k1}}{dt}, \ldots, \dfrac{da_{kN}}{dt}\right)$
EXAMPLE 1.13. Consider the volume of a tetrahedron (cf. Example 1.8) given by
$$\text{Vol} = \frac{1}{6}\det\begin{bmatrix} (p_2 - p_1)^T \\ (p_3 - p_1)^T \\ (p_4 - p_1)^T \end{bmatrix}$$
Table 1.8. Properties of integrals of matrices of univariable functions
1. Sum of Matrices: $\int\big[M(t) + N(t)\big]dt = \int M\,dt + \int N\,dt$
2. Scalar Products: $\int \alpha M\,dt = \alpha\int M\,dt$ if $\alpha$ is constant; $= \left(\int\alpha\,dt\right)M$ if M is constant
3. Matrix Products: $\int MN\,dt = M\int N\,dt$ if M is constant; $= \left(\int M\,dt\right)N$ if N is constant
4. Hadamard Products: $\int M\circ N\,dt = M\circ\int N\,dt$ if M is constant; $= \left(\int M\,dt\right)\circ N$ if N is constant
5. Kronecker Products: $\int M\otimes N\,dt = M\otimes\int N\,dt$ if M is constant; $= \left(\int M\,dt\right)\otimes N$ if N is constant
6. Partitioned Matrices: $\int\begin{bmatrix} A(t) & B(t) \\ C(t) & D(t) \end{bmatrix}dt = \begin{bmatrix} \int A\,dt & \int B\,dt \\ \int C\,dt & \int D\,dt \end{bmatrix}$
7. Matrix Transpose: $\int A(t)^T dt = \left(\int A\,dt\right)^T$
Using the formula for the derivative of determinants (cf. property 9 in Table 1.7),
the rate of change of Vol per change in t is given by
$$\frac{d}{dt}\text{Vol} = 0 + 0 + \frac{1}{6}\det\begin{bmatrix} (p_2 - p_1)^T \\ (p_3 - p_1)^T \\ (dp_4/dt)^T \end{bmatrix}$$
since only $p_4$ depends on t. With $p_4 = \left(2t + 3,\; t + 1,\; t + 5\right)^T$, we have $dp_4/dt = (2, 1, 1)^T$, and the determinant evaluates to give $d\,\text{Vol}/dt = 1/12$.
EXAMPLE 1.14.
Q() d p
0
sin()
cos()
2
0
@p 2
0 2
p1
=0
p2
2
0
$$f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_N(x) \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, \ldots, x_M) \\ \vdots \\ f_N(x_1, x_2, \ldots, x_M) \end{bmatrix}$$
We denote a row vector of length M, known as the gradient vector, containing the
partial derivatives of f(x), by
$$\frac{d}{dx}f(x) = \left(\frac{\partial f}{\partial x_1}, \;\ldots\;, \frac{\partial f}{\partial x_M}\right) \tag{1.40}$$
$$\frac{d}{dx}f(x) = \frac{d}{dx}\begin{bmatrix} f_1(x) \\ \vdots \\ f_N(x) \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_M} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_N}{\partial x_1} & \cdots & \dfrac{\partial f_N}{\partial x_M} \end{bmatrix} \tag{1.41}$$
EXAMPLE 1.15.
where xA, xB, and xC are the concentrations of A, B, and C, respectively. The
other variables in the model are treated as constants. We can collect these
concentrations in a column vector x as
x1
xA
x = xB = x2
xC
x3
then the differential equations can be represented in matrix form as
d
x =f x
dt
where
$$f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ f_3(x) \end{bmatrix} = \begin{bmatrix} \dfrac{F_{in}}{V}\left(x_{A,in} - x_1\right) - k_A x_1 \\ \dfrac{F_{in}}{V}\left(x_{B,in} - x_2\right) - k_B x_2 + k_A x_1 \\ \dfrac{F_{in}}{V}\left(x_{C,in} - x_3\right) - k_C x_3 + k_B x_2 \end{bmatrix}$$
which can be further recast in matrix product form as:
f x = Kx + b
where K and b are constant matrices given by
$$K = \begin{bmatrix} -\left(k_A + \dfrac{F_{in}}{V}\right) & 0 & 0 \\ k_A & -\left(k_B + \dfrac{F_{in}}{V}\right) & 0 \\ 0 & k_B & -\left(k_C + \dfrac{F_{in}}{V}\right) \end{bmatrix} \qquad b = \frac{F_{in}}{V}\begin{bmatrix} x_{A,in} \\ x_{B,in} \\ x_{C,in} \end{bmatrix}$$
The term Kx, which is just a collection of N linear combination of the elements of
x, is said to have the linear form. Furthermore,
note that K can also be obtained
by taking the Jacobian matrix of f x , that is,
$$\frac{d}{dx}f(x) = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \dfrac{\partial f_1}{\partial x_3} \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \dfrac{\partial f_2}{\partial x_3} \\ \dfrac{\partial f_3}{\partial x_1} & \dfrac{\partial f_3}{\partial x_2} & \dfrac{\partial f_3}{\partial x_3} \end{bmatrix} = K$$
We show later that when f (x) is expanded through a Taylor series, there will
always be a term that has the linear form, precisely due to the Jacobian matrix
being evaluated at a prescribed point.
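When an analytical Jacobian is awkward to derive, a finite-difference approximation is a common alternative. The MATLAB sketch below is illustrative; the rate constants, feed terms, and evaluation point are assumed values, not data from the example.

```matlab
% Finite-difference approximation of the Jacobian d f/dx at a point x0
kA = 1; kB = 0.5; kC = 0.2; FV = 0.1; xin = [1; 0; 0];   % assumed parameters
f = @(x) FV*(xin - x) + [-kA*x(1); kA*x(1)-kB*x(2); kB*x(2)-kC*x(3)];
x0 = [0.5; 0.3; 0.1];  h = 1e-6;  J = zeros(3);
for j = 1:3
    e = zeros(3,1);  e(j) = h;
    J(:,j) = (f(x0 + e) - f(x0))/h;      % jth column of the Jacobian
end
disp(J)
```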
We define the operator $d^2/dx^2$ on f(x) as
$$\frac{d^2}{dx^2}f(x) = \frac{d}{dx}\left[\left(\frac{d}{dx}f(x)\right)^T\right] = \frac{d}{dx}\begin{bmatrix} \partial f/\partial x_1 \\ \vdots \\ \partial f/\partial x_N \end{bmatrix} = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\partial x_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_N\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_N^2} \end{bmatrix} \tag{1.42}$$
THEOREM 1.1. The point $x = x^*$ is a local minimum of f(x) if
$$\left.\frac{d}{dx}f(x)\right|_{x=x^*} = 0 \tag{1.43}$$
and
$$v^T\left[\left.\frac{d^2}{dx^2}f(x)\right|_{x=x^*}\right]v > 0 \quad\text{for all } v \neq 0 \tag{1.44}$$
Likewise, $x = x^*$ is a local maximum of f(x) if
$$\left.\frac{d}{dx}f(x)\right|_{x=x^*} = 0 \tag{1.45}$$
and
$$v^T\left[\left.\frac{d^2}{dx^2}f(x)\right|_{x=x^*}\right]v < 0 \quad\text{for all } v \neq 0 \tag{1.46}$$
PROOF. (See A.4.9)
The conditions given in (1.44) and (1.46) are also known as positive definiteness
condition and negative definiteness conditions, respectively.
Definition 1.8. An N × N matrix A is positive definite, denoted (A > 0), if
$$x^T A x > 0 \quad\text{for all } x \neq 0 \tag{1.47}$$
For instance, consider $f(x_1, x_2) = \exp\left(-(x_1+1)^2 - (x_2-1)^2\right)$, whose gradient vanishes at $(x_1, x_2) = (-1, 1)$. Its Hessian is
$$\frac{d^2 f}{dx^2} = f(x_1, x_2)\begin{bmatrix} 4x_1^2 + 8x_1 + 2 & 4(x_1+1)(x_2-1) \\ 4(x_1+1)(x_2-1) & 4x_2^2 - 8x_2 + 2 \end{bmatrix}$$
Evaluated at $(-1, 1)$,
$$v^T\left(\frac{d^2 f}{dx^2}\right)v = \begin{pmatrix} v_1 & v_2 \end{pmatrix}\begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = -2\left(v_1^2 + v_2^2\right) < 0$$
which satisfies the negative definiteness condition. Thus the point $(x_1, x_2) =
(-1, 1)$ is a local maximum. This can be seen to be the case from Figure 1.5.
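A numerical check of this conclusion is straightforward in MATLAB. The sketch below (illustrative, using the function written above) builds a finite-difference Hessian and inspects its eigenvalues.

```matlab
% Definiteness check of the Hessian at (x1,x2) = (-1,1)
f = @(x) exp(-(x(1)+1)^2 - (x(2)-1)^2);
x0 = [-1; 1];  h = 1e-4;  H = zeros(2);
for i = 1:2
    for j = 1:2
        ei = zeros(2,1); ei(i) = h;  ej = zeros(2,1); ej(j) = h;
        H(i,j) = (f(x0+ei+ej) - f(x0+ei) - f(x0+ej) + f(x0))/h^2;
    end
end
eig(H)    % both eigenvalues negative => negative definite => local maximum
```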
[Figure 1.5. Surface plot of f(x1, x2).]
For the special cases of linear and quadratic forms, the derivatives, gradients,
and Hessians have special formulas as given by the following lemma:
LEMMA 1.6.
$$\frac{d}{dx}(Ax) = A \tag{1.49}$$
$$\frac{d}{dx}\left(x^T A x\right) = x^T\left(A + A^T\right) \tag{1.50}$$
$$\frac{d^2}{dx^2}\left(x^T A x\right) = A + A^T \tag{1.51}$$
PROOF.
Remark: Equations (1.49) to (1.51) are used in the next chapter during the solution
to the least squares problem.
For instance, the finite difference approximation of the second derivative,
$$\frac{d^2 u}{dx^2} \approx \frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2}$$
leads to a matrix representation Au, where
$$A = \frac{1}{\Delta x^2}\begin{bmatrix} -2 & 1 & & 0 \\ 1 & -2 & \ddots & \\ & \ddots & \ddots & 1 \\ 0 & & 1 & -2 \end{bmatrix}$$
Table 1.9. Some MATLAB commands for sparse matrix operation
MATLAB Command: Description
A=sparse(rowA,colA,nzA,N,M): Creates a sparse matrix of size N×M with rowA, colA, and nzA as the vectors for row index, column index, and nonzero elements, respectively
A=sparse(S): Converts a full formatted matrix S to sparse matrix format
[rowA,colA]=find(A): Returns the row and column indices of nonzero elements of matrix A
spy(A): Visualization of the sparsity pattern of matrix A
speye(N): Creates a sparse formatted identity matrix of size N
A=spdiags(V,d,M,N): Creates an M×N sparse matrix by placing columns of V along the diagonals specified by d
Operations: +, -, *, \ : Performs sparse matrix operations and leaves the result in sparse matrix format
Functions: exp, sin, cos, tan, log, kron, inv, ... : Evaluates the functions of all elements (zero and nonzero) but leaves the results in sparse matrix format
The fraction of nonzero elements is $(3N - 2)/N^2$. For instance, with N = 100, this
fraction is approximately 1%.
Two features of sparse matrices can immediately lead to computational advantages. The first feature is that the storage of only nonzero elements of sparse matrices
will result in very significant savings by avoiding the storage of zero elements. The
second feature is that when performing matrix-vector products, it should not be necessary to multiply the zero elements of the matrix with that of the vector. For instance,
$$v^T w = \begin{bmatrix} 2 & -1 & 0 & 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = 2a - b + 3f$$
We discuss only the coordinate format because it is the most flexible and the
simplest approach, although other schemes can have further significant storage savings. In this format, the nonzero elements can be collected in a vector along with
two vectors of the same length: one vector for the row indices and the other vector
for the column indices. For example, let matrix A be given as
$$A = \begin{bmatrix} 0 & a & 0 & 0 & b \\ 0 & 0 & 0 & c & 0 \\ d & 0 & 0 & e & f \\ 0 & 0 & 0 & g & 0 \end{bmatrix}$$
then the three vectors nzA, rowA, and colA, indicating nonzero elements, row indices,
and column indices, respectively, are given by
nzA  = (a  b  c  d  e  f  g)
rowA = (1  1  2  3  3  3  4)
colA = (2  5  4  1  4  5  4)
This implies that the storage requirements will be thrice the length of nzA. As a
guideline, for the storage under the coordinate format to be worthwhile, the fraction
of nonzero elements must be less than 1/3.
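The following illustrative MATLAB sketch builds the matrix A above in sparse coordinate format; the numeric values standing in for a through g are placeholders.

```matlab
% Sparse coordinate-format construction and a sparse matrix-vector product
nzA  = [10 20 30 40 50 60 70];      % placeholder values for a..g
rowA = [1 1 2 3 3 3 4];
colA = [2 5 4 1 4 5 4];
A = sparse(rowA, colA, nzA, 4, 5);
full(A)         % view the stored pattern as a full matrix
w = (1:5)';
A*w             % only the nonzero entries participate in the product
```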
Some MATLAB commands for sparse matrices are listed in Table 1.9. In the
next chapter, we often note solution methods for the linear equation Ax = b that
have been developed to preserve as much of the sparsity of the matrices as possible, because the
amount of computation can be significantly reduced when A is sparse.
1.7 EXERCISES
1.
2.
3.
4.
rij
for all i
n
rij
for all j
j =1
2. if G is triangular, then R = I.
Note: The relative gain array is used in process control theory to determine
the best pairing between manipulated and controlled variables that would
reduce the interaction among the control loops.
E1.6. If A2 = A then A is said to be idempotent.
1. Show that if A is idempotent, then I A and Ak , k > 1, are also idempotent.
2. Show that if A is idempotent, then either det A = 1 or det A = 0.
Furthermore, show that the only nonsingular idempotent matrices are the
identity matrices.
3. Projection matrices are idempotent matrices that are also Hermitian.
Verify that the following matrix is a projection matrix:
$$A = \frac{1}{6}\begin{bmatrix} 2 & -2 & -2 \\ -2 & 5 & -1 \\ -2 & -1 & 5 \end{bmatrix}$$
and evaluate its determinant.
E1.7. Let p, s, and t be three points in the (x, y)-plane, that is,
s
t
px
s= x
t= x
p=
py
sy
ty
A circumcircle for p, s, and t is a circle that would pass through through all
three points, as shown in Figure 1.6.
cx
cy
T
(1.52)
2. Show that (1.52) can be combined to yield the following matrix equation:
$$\begin{bmatrix} (p - s)^T \\ (s - t)^T \end{bmatrix} c = \frac{1}{2}\begin{bmatrix} p^T p - s^T s \\ s^T s - t^T t \end{bmatrix}$$
3. Using the preceding results, find the radius and center of the circumcircle
for
1
1
0
p=
s=
t=
0
0
1
Plot the points and the circumcircle to verify your solution.
2
s=
2
1
p=
1
eTk1
.
.
P=
.
k1 = = kN
eTkN
and
0
..
.
th
ei =
1 i element
0
.
.
.
0
2. Show that the products of orthogonal matrices are also orthogonal.
E1.11. Matrix A is said to be unitary if A A = AA = I. Show that the normalized
Fourier matrix, given by
f 11
1 .
F = ..
N
f N1
..
.
f 1N
.. ;
.
f NN
f k = e2i(k1)(1)/N
is unitary.
E1.12. Determine whether the following statement is true or false, and explain: Let
Q be skew-symmetric, then xT Qx = 0 for all vectors x.
E1.13. A matrix that is useful for the method of Gaussian elimination is the matrix
obtained by replacing the kth column of an identity matrix by vector w as
follows:
1
w1
0
..
..
.
.
1
w
k1
EL(k, w) =
w
k
w
1
k+1
.
.
..
..
0
wN
1
1. Evaluate the determinant of
EL(k, w).
EL(k, w) is given by
2. Assuming wk = 0, show that the inverse of
1
(w1 /wk )
0
..
..
.
.
/w
)
1
(w
k1
k
EL(k, w) =
(1/wk )
(wk+1 /wk ) 1
..
..
.
.
(wN /wk )
3. Consider a matrix A[=]N M with a nonzero element in the kth row and
th column, that is, ak = 0. Based on ak , define the elements of vector
g(ak ) as follows:
ai
if i = k
ak
gi =
if i = k
ak
Discuss the effects of premultiplying A by
EL k, g(ak ) . (Hint: Use the
following specific example,
4 1 3 2
A= 2
1 5 2
1
0 2 1
Construct
EL 3, g(3, 1) and
EL 1, g(1, 4) , then obtain the products
ELA and observe the effects. Using these results, infer as well as prove the
general results.)
E1.14. The Vandermonde matrix is a matrix having a special structure given by
n1
1
n2
1 1
1
n1
2
n2
2 1
2
V = .
..
..
..
..
.
.
. .
.
.
n2
1
n1
n
n
n
and a determinant given by
det V =
(1.53)
(i j )
i<j
1. Verify the formula (1.53) for the case = {1, 1, 2, 3}. What happens to
the determinant if any of the s are repeated?
2. An (n 1)th degree polynomial
y = an1 xn1 + + a1 x + a0
is needed to pass through n given points (xi , yi ). Find a matrix G such that
solving Gv = y yields the vector of coefficients v = (a0 , . . . , an1 )T where
y = (y1 , . . . , yn )T . What is the conditions on the points (xi , yi ) so that a
unique solution is guaranteed to exist?
d1
m1
m1
d2
m2
m2
dN
mN
for j = 1
d1
for 2 j N
d j d j 1
xj =
for j = N + 1
dN
while the force balances for each mass become
mi g = f s,i f s,i+1 = ki xi ki+1 xi+1
where g is the gravitational acceleration.
1. Show that the main equations relating the vector of displacements d =
(d1 , . . . , dN )T and the vector of masses m = (m1 , . . . , mN )T can be put in
matrix form as
HHT d = m
(1.54)
7
Adapted from the discussion in G. Strang, Introduction to Applied Mathematics, WellesleyCambridge Press, Wellesley, MA, 1986, pp. 4044.
mN
k1
1 1
0
0
g
1
1
..
=
H=
.
..
..
.
.
kN+1
0
0
1 1
g
2. We can partition matrices H and further as
0
H= H q
=
0
with q = (0, . . . , 0, 1)T , = kN+1 /g and
k1
1 1
0
g
.
.
..
..
=
=
H
..
. 1
0
1
0
..
.
kN
g
and
are given by
Show that the inverses of H
1 1
r1
.
1
1
.
. . ..
H =
and
=
0
0
..
rN
where r j = g/k j .
3. Using this partitioning, we can reformulate (1.54) as
T + qqT d = m
H
H
(1.55)
Show that after applying the matrix inverse formula of (1.27) to (1.55), we
have
d = (AB)m
where
1
..
A= .
1
0
..
0
..
and
n
j =1
1
RN+1
0
..
.
0
R1
rN
B=I
where Rn =
r1
1
0
0
..
.
0
RN
..
.
1
..
.
1
n
g
rj =
.
kj
j =1
[Figure: a staged cascade with vapor streams V0 to VN and liquid streams L1 to LN+1.]
(1.56)
Mass balance:
n Li+1 n Li n Vi = 0
(1.57)
(1.58)
(1.59)
Component balance:
nL =
n L1
n LN
T
Obtain matrices A, B, and C and vectors q and g such that (1.56), (1.57),
and (1.59) can be represented in the following matrix equations:
where
nV + CnL
gT nL
xLN+1 n LN+1
h=
0
..
.
(1.60)
0
..
f= .
0
n LN+1
HLN+1 n LN+1
3. Assuming the inverses exist, show that the value of nV0 can be obtained
from (1.60) as
n V0 =
(1.61)
G= I
gT
(1.62)
n = nL
v=
nV
n V0
h
f
xLN+1 n LN+1
Show that one can obtain (1.61) by using Cramers rule to solve n V0 from
(1.62). (Hint: use (1.32) and (1.35).)8
E1.17. Prove that the inverse of an upper triangular matrix is also an upper triangular
matrix.
E1.18. Verify the formula (1.26) for the inverse of the following matrices:
1
2 2 0
1 0 0
0 1 1 2
L= 2 1 0 U =
0
0 2 1
2 3 2
0
0 0 1
Extend the inverse formula (1.26) for the triangular matrices to apply to
block-triangular matrices. Verify your block triangular matrix inverse formula to find the inverse of the following matrix:
1 1 1 0 0 0 0
2
0 1 0 0 0 0
1 1 3 0 0 0 0
H= 2
0 1 2 2 0 0
1 1 1 2 0 0
1
0
3 2 1 0 1 1
1 1 1 0 2 2 3
E1.19. Often in recursive data regression, a matrix A is updated as follows: Anew =
A + uvT , where u and v are column vectors.
1. Prove the following identity:
det A + uvT = 1 + vT A1 u det(A)
(1.63)
2. Equation (1.63) implies that the new matrix Anew = A + uv will become
singular if vT A1 u = 1. For the following case:
0
1
1
2
3
A = 0 1 1 , u = 0 , v = 0
0
1
4
1
T
find the value of that would make Anew singular. (This means that any
nonsingular matrix can be made singular by changing the value of just a
single element.)
E1.20. Let A[=]N N and D[=]M M be nonsingular matrices, N = M. Prove (or
disprove) the following equation:
1
1
(1.64)
A BD1 C BD1 = A1 B D CA1 B
E1.21. The unsteady-state heat conduction in a flat rectangular plate of dimension
L W is given by
2
T
T
2T
=
+ 2
t
x2
y
subject to the boundary conditions
W y
T (0, y, t) = 50 +
50
W
W y
T (L, y, t) = 50
50
W
8
In practice, one would simply apply Cramers rule directly to determine n V0 because (1.61) still
contains an inverse calculation. This subproblem is just an academic exercise in applying the various
properties of block matrix operations.
1.7 Exercises
T (x, 0, t)
50
T (x, W, t)
100
49
Condenser
L,x0
D,x0
L,xk
V,yk+1
for k = 1
for k > 1
(1.66)
x0 contains the mole fractions in the distillate stream, p contains the reciprocal
relative volatilities 1/i , and R is the (scalar) reflux ratio.
1. Evaluate the composition of the liquid out of the third stage, x3 , using the
following values:
0.2
7
p = 4 , x0 = 0.1 , R = 0.6
0.7
1
9
This problem is adapted from the example given in N. Amundson, Mathematical Methods in Chemical Engineering, Volume 1, Matrices and Their Applications, Prentice Hall, Englewood Cliffs, NJ,
1966, pp. 149157.
2. Show that for k > 1, the iterative formula (1.66) can be recast as
qk = Aqk1
where
A = D + pxT0
and
D=
(1.67)
Rp 1
..
Rp n
adj (A I)
I pxT0
where
=
n
(Rp i )
i=1
=
(Rp 1 )1
0
n
x0i p i
= 1 +
Rp i
i=1
..
.
1
(Rp n )
(1.68)
(1.69)
(Hint: Use (1.63) for the determinant and (1.27) for the adjugate formula.)10
E1.23. Consider the tridiagonal matrix given by
a1 b1
..
c1 . . .
.
TN =
..
..
.
.
0
cN1
bN1
aN
(1.70)
1 1
0
A = 2
2
1
0
2 2
10
This formula for the adjugate can be used further to obtain a closed formula for xn based on
Sylvesters formula.
k+1
(a
)k+1
(a +
)
if
= 0
2k+1
Dk =
(1.73)
a k
if
= 0
(1 + k)
2
(Hint: The solution given in (1.73) can be derived from (1.72) by treating
it as a difference equation subject to the two initial conditions. These
methods are given in Section 7.4. For this problem, all that is needed is to
check whether (1.73) will satisfy (1.72).)
E1.24. The general equation of an ellipse is given by
2 2
x
y
xy
+
2
cos = sin2
a1
a2
a1 a2
(1.74)
where a1 , a2 , sin() = 0.
Let v = (x, y)T , find matrix A such that equation (1.74) can be written as
vT Av = 1
(Note: vT Av = 1 is the general equation of a conic, and if A is positive definite,
then the conic is an ellipse.)
E1.25. Show that tr (AB) = tr (BA) (assuming conformability conditions are met).
E1.26. Let f (x1 , x2 ) be given by
f (x) = exp 3 (x1 1)2 + (x2 + 1)2 exp 3 (x1 + 1)2 + (x2 1)2
Find the gradient df/dx at x = (0, 0)T , x = (1, 1)T and x = (1, 1)T . Also,
find the Hessian at x = (1, 1)T and x = (1, 1)T and determine whether
they are positive or negative definite.
E1.27. Determine which of the following matrices are positive definite:
3 4
3 4
0 2
A=
B=
C=
1 2
1 0
2 0
E1.28. Let A be a square matrix containing a zero in the main diagonal. Can A be
positive definite? Why or why not?
E1.29. Prove the following equality
d
d
d
A B+A
B
(AB) =
dt
dt
dt
E1.30. If A is nonsingular, prove that
d 1
dA 1
A = A1
A
dt
dt
E1.31. A Marquardt vector update with a scalar parameter is defined by
1 T
p () = J T J + I
J F
(1.75)
1
pT J T J + I
p
d
!
=
T
d
p p
E1.32. Let P be a simply connected polygon (i.e., containing no holes) in a 2D plane
described by points (xi , yi ) indexed in a counterclockwise manner. Then the
area of polygon P, areaP , can be obtained by the following calculations:11
1
x
x
x2
x3
x1
x
y1 y2
y2 y3
yN y1
2
(1.76)
1. Show that (1.76) can be recast as follows:
1
areaP = abs xT Sh ShT y
2
where Sh is the shift matrix given by
0 1
0
..
..
.
Sh = .
0 0
1
1
(1.77)
2. Verify (1.77) for the simple case of the triangle determined by (x, y)1 =
(1, 1), (x, y)2 = (2, 1), and (x, y)3 = (1.8, 2). Furthermore, notice that if we
replace the third point by (x, y)3 = (a, 2) for arbitrary real values for a, we
will obtain the same area. Explain why this is the case, both for the figure
and the formula.
3. The points covering the area of crystal growth was obtained from a scanned
image and given in Table 1.10. Plot the polygon and then obtain the area
using (1.77).
E1.33. Consider the generalized Poisson equation in a rectangular domain L W
given by
2u 2u
+ 2 = (x, y)u + (x, y)
x2
y
subject to the boundary conditions,
u(0, y) = f 0 (y)
u(L, y) = f L(y)
u(x, 0) = g 0 (x)
u(x, W) = g W (x)
and
x2
x2
y2
y2
11
M. G. Stone, A mnemonic for areas of polygons, The American Mathematical Monthly, vol. 93,
no. 6, (1986), pp. 479480.
1.7 Exercises
53
0.2293
0.2984
0.3583
0.3813
0.3813
0.3975
0.4459
0.4988
0.5311
0.5749
0.6555
0.3991
0.4488
0.4781
0.5541
0.6155
0.6915
0.7325
0.6827
0.5892
0.5687
0.5482
0.6970
0.7431
0.7385
0.6901
0.6210
0.5634
0.5173
0.5081
0.4965
0.4666
0.5307
0.5015
0.4693
0.4459
0.3991
0.4079
0.3670
0.2675
0.1944
0.1535
0.4343
0.4067
0.3813
0.3629
0.3445
0.3353
0.3168
0.2800
0.2569
0.2339
0.1447
0.1681
0.2208
0.2675
0.2939
0.3114
0.3260
0.3319
0.3406
0.3582
where n = 1, 2, , N, m = 1, 2, , M,
x = L/(N + 1),
y = W/(M + 1)
and un,m = u(n
x, m
y). Show that the finite difference approximation will
yield the following matrix equation:
AU + UB = Q U + H
(1.78)
where A = (1/
x )T [NN] , B = (1/
y )T [MM] , Q = [qnm ],
2 1
0
1 2 . . .
[=]K K
T [KK] =
..
..
.
. 1
0
1 2
f 0,1 f 0,M
0
11 1M
..
..
..
H = ...
.
.
.
x2
N1 NM
0
0
f L,1 f L,M
0 g W,1
g 0,1 0
1
..
.
..
2 ...
. ..
.
y
0 g W,N
g 0,N 0
2
One of the most basic applications of matrices is the solution of multiple equations.
Generally, problems involving multiple equations can be categorized as either linear
or nonlinear types. If the problems involve only linear equations, then they can
be readily formulated as Ax = b, and different matrix approaches can be used to
find the vector of unknowns given by x. When the problem is nonlinear, more
complex approaches are needed. Numerical approaches to the solution of nonlinear
equations, such as the Newton method and its variants, also take advantage of matrix
equations.
In this chapter, we first discuss the solution of the linear equation Ax = b.
This includes direct and indirect methods. The indirect methods are also known as
iterative methods. The distinguishing feature between these two types of approaches
is that direct methods (or noniterative) methods obtain the solution using various
techniques such as reduction by elimination, factorization, forward or backward
substitution, matrix splitting, or direct inversion. Conversely, the indirect (iterative)
methods require an initial guess for the solution, and the solution is improved using
iterative algorithms until the solution meets some specified criterion of maximum
number of iterations or minimum tolerance on the errors.
The most direct approach is to simply apply the inverse formula given by
x = A1 b
This is a good approach as long as the inverse can be found easily, for example, when
the matrix is orthogonal or unitary. Also, if the matrix is diagonal or triangular,
Section 1.3.3 gives some direct formulas for their inverse. However, in general, the
computation of the matrix inverse using the adjoint formula given in (1.16) is not
the method of choice, specially for large systems.
In Section 2.1, we discuss the first direct method known as the Gauss-Jordan
elimination method. This simple procedure focuses on finding matrices Q and W
such that QAW will yield a block matrix with an identity matrix in the upper left
corner and zero everywhere else. In Section 2.2, a similar approach known as the LU
decomposition method is discussed. Here, matrix A is factored as A = LU, with L and
U being upper triangular and lower triangular matrices, respectively. The triangular
structures of L and U allow for quick computation of the unknowns via forward and
backward substitutions. Other methods such as matrix splitting techniques, which
54
55
take advantage of special structures of A, are also discussed, with details given the
appendix as Section B.6.
In Section 2.4, we switch to indirect (iterative) methods, which include: Jacobi,
Gauss-Seidel, and the succesive over-relaxation (SOR). Other iterative methods,
such as conjugate-gradient (CG) and generalized minimal residual (GMRES) methods, are also discussed briefly, with details given in the appendices as Sections B.11
and B.12.2.
In Section 2.5, we obtain the least-squares solution. These are useful in parameter
estimation of models based on data. We also include a method for handling leastsquares problems that involve linear equality constraints.
Having explored the various methods for solving linear equations, we turn our
attention to the solution of multiple nonlinear equations in Section 2.9. We limit our
discussion to numerical solutions based on Newtons methods, because they involve
the application of matrix equations.
QAx
Qb
Qb
W 1 x
Qb
WQb
(QAW) W
(2.1)
(2.2)
If A is nonsingular, there are several values of Q and W that will satisfy QAW = I.1
EXAMPLE 2.1.
2 1
1
x1
1
4
5 1 x2 = 15
x3
1
3
2
3
Leaving the details for obtaining Q and W for now, note that the following
matrices will satisfy QAW = I:
0
1/5
0
0
1
13/17
Q=
0
3/17
5/17
W = 1 4/5 7/17
17/50 1/10
7/25
0
0
1
1
= TQ and W
W
= WT 1 for any nonsingular T will also yield QA
=I
Suppose QAW = I, then Q
Q.
and A1 = WQ = W
56
1
1
x = WQ 15 = 2
3
1
Ir
QAW =
0
[Nr,r]
0[r,Mr]
0[Nr,Mr]
(2.3)
1 1 1
A = 1 2 3
2 4 3
With
EL =
0
1
we have
0
1
0
1/4
1/2
1/4
and
ELAER =
0
0
ER =
1
0
0
2
1/2
1
1/2
0
0
3/4
1
0
3/2
1/4
Matrix EL and ER can be easily formulated using the formulas given in (B.3) to
(B.6). The next stage is to extract the lower right block matrix and then apply the
elimination process once more. The process stops when the lower right blocks are
all zeros. The complete details of the classic Gauss-Jordan elimination are given in
Section B.1 as an appendix. In addition, a MATLAB program gauss_jordan.m
is available on the books webpage.
57
Ir
0
[Nr,r]
0[r,Mr]
0[Nr,Mr]
yupper
y
lower
Qb
Qupper
Q
lower
(2.4)
(2.5)
(2.6)
= rank
A
(2.7)
Note that when b = 0, both (2.6) and (2.7) are trivially satisfied.
Suppose Qlower b = 0 in (2.6). Using yupper from (2.5) and appending it with
(M r) arbitrary constants ci for ylower , we have
Qupper b
1
W x=y=
c1
..
.
cMr
Qupper b
x=W
c1
..
.
cMr
58
Let WL and WR be matrices formed by the first r columns and the last (M r)
columns of W, respectively; then the solutions are given by
x=
WL
EXAMPLE 2.2.
WR
Qupper b
c1
..
.
cMr
c1
.
= WL Qupper b + WR ..
cNr
1 2 1
x1
0
3 2 4 x2 = 1
x3
2 4 2
0
0
0
1/4
0
0
Q = 0 1/3 1/6 W = 1 1/2
1
0
1/2
0
1
and
(2.8)
0
WL = 1
0
0
1/2
1
1
1/6
2/3
and
1
WR = 1/6
2/3
1
0
x = 1/6 + 1/6 c1
1/3
2/3
Remarks:
1. The difference DOF = (M r) is also known as the degree of freedom, and it
determines the number of arbitrary constants available (assuming that solution
is possible).
2. Full-rank non-square matrices can be further classified as either full-columnrank (when r = M < N), also known as full-tall matrices, or full-row-rank (when
r = N < M), also known as full-wide matrices.
For the full-tall case, assuming that Qlower b = 0, (2.8) implies that only one
solution is possible, because DOF = 0.
However, for the full wide-case, the condition given in (2.6) is not necessary
to check because Qlower is no longer available. This means that for full-wide
matrices, infinite solutions are guaranteed with DOF = M r.
3. In Section 2.5, the linear parameter estimation problem involves full-tall matrices
A, in which the rows are associated with data points whereas the columns are
associated with the number of unknown parameters x1 , . . . , xN . It is very likely
2.2 LU Decomposition
59
that the Ax = b will not satisfy (2.6), that is, there will not be an exact solution.
The problem will have to be relaxed and modified to the search for x that
would minimize the difference (b Ax) based on the Euclidean norm. The
modified problem is called the least-squares problem, which is discussed later in
Section 2.5.
2.2 LU Decomposition
Instead of using the matrices Q and W resulting from the Gauss-Jordan elimination
method, one can use a factorization of a square matrix A known as the LU decomposition in which L and U are lower and upper triangular matrices, respectively, such
that
A = LU
(2.9)
The special structures of L and U allow for a two-phase approach to solving the linear
problem Ax = b: a forward-substitution phase followed by a backward-substitution
phase. Let y = Ux, then the LU decomposition in (2.9) results in
11
21
..
.
22
..
.
N1
N2
0
..
NN
Ax
L (Ux)
Ly
y1
y2
..
.
yN
b1
b2
..
.
bN
u11
..
.
u1,N1
..
.
u1N
..
.
uN1,N1
uN1,N
x1
..
.
xN1
uNN
Ux
y1
..
.
yN1
xN
xN1
yN1 uN1,N xN
=
uN1,N1
x1 =
yN
then
yN
xN =
uNN
y1
N
i=2
u1i xi
u11
(2.11)
60
2 1 1
1
4 0
2 x = 8
6 2
2
8
One LU decomposition
by
L=
2
3
0
0
1
and
U=
0
0
1
1
0
1
2
3
x3 =
6
=2
3
x2 =
5 2(2)
= 1
1
x1 =
1 (1)(2) 1(1)
=1
2
Remarks: In MATLAB, one can use the backslash ( \) operation for dealing with
either forward or backward substitution.2 Thus, assuming we found lower and upper
matrices L and U such that A = LU, then for solving the equation Ax = b, the forward substitution to find y = L1 b can be implemented in MATLAB as: y=L\b;.
This is then followed by x = U 1 y, the backward substitution that can be implemented in MATLAB as: x = U\y.
aij =
N
k=1
ik ukj
i1
i1
=
ii u jj + k=1
ik ukj
j 1
if i < j
if i = j
(2.12)
if i > j
When the backslash operation is used in MATLAB, different cases are assessed first by MATLAB,
that is, the function determines first whether it is sparse, banded, tridiagonal, triangular, full, partialranked, and so forth, and then it chooses the appropriate algorithms. More details are available from
the help file, mldivide.
2.2 LU Decomposition
61
Crouts Method
Doolittles Method
Choleskis Method
Algorithm (For p = 1, . . . , N)
u pp
ip
u pj
pp
u pj
ip
pp
ip
1
1
aip
a pj
p 1
k=1
p 1
k=1
ik ukp
pk ukj / pp
pk ukj
p 1
aip k=1 ik ukp /u pp
a pj
p 1
k=1
"
p 1
a pp k=1 2pk
p 1
aip k=1 ik pk / pp
for i = p, . . . , N
for j = p + 1, . . . , N
for j = p, . . . , N
for i = p + 1, . . . , N
for i = p + 1, . . . , N
Remarks:
1. The nonzero elements of L and U are evaluated column-wise and row-wise,
respectively. For example, in Crouts method, at the pth stage, we first set
u pp = 1 and then evaluate pp . Thereafter, the pth column of L and the pth row
of U are filled in.
2. The advantage of Choleskis method over the other two methods is that the
required storage is reduced by half. However, Choleskis method requires the
square root operation. Other factorizations are available for symmetric positive
definite matrices that avoid the square root operation, for example, A = LDLT ,
where D is a diagonal matrix.
3. Because the methods use the reciprocal of either ii or u jj , pivoting, that is,
permutations of the rows and columns of A, are sometimes needed, as discussed
next.
For nonsingular matrices, pivoting is often needed unless A is positive definite
or A is diagonally dominant, that is, if
|aii |
(2.13)
[aij ]
j =i
A simple rule for pivoting at the pth stage is to maximize | pp | (for Crouts method)
or |u pp | (for Doolittles method) by permuting the last (N p ) columns or rows. In
MATLAB, the command for LU factorization is given by: [L,U,P]=lu(A), where
LU = PA and P is a permutation matrix. (A MATLAB file crout_rowpiv.m is
available on the books webpage that implements Crouts method with row-wise
pivoting.)
If A is singular, permutation of both rows and columns of A are needed, that is,
one must find permutation matrices PL and PR such that
L1 0
U1
C
PLAPR =
(2.14)
B 0
0
I(Nr)
where L1 and U 1 are lower and upper triangular matrices, respectively.
62
1 1 1 1 1 1 1
1 0 1 0 0 0 0
1 1 0 0 0 0 0
A=
1 0 0 1 0 0 0
1 0 0 0 1 0 0
1 0 0 0 0 1 0
1
1
0
0 0 0 0 0
1 1
0 0 0 0 0
0 1 0 0 0 0
L = 1 1 1 2 0 0 0 and U =
1 1 1 1 3 0 0
1
4
1 1 1 1 2 3 0
1
2
1
3
5
4
1
2
1
2
1
3
2
1
3
1
This example shows a significant number of fill-ins compared with the original A.
Thus, if one wants to use LU factors to solve the linear equation Ax = b, where A
is sparse, some additional preconditioning is often performed before solution. For
instance, one could find a permutation matrix P that transforms the problem into
PAPT (Px) = (Pb)
A
x =
b
such that
A = PAPT attains a new structure that would allow minimal fill-ins in L
and U. One popular approach is to find
A that has a small bandwidth.
Definition 2.1. For a square matrix A of size N, the left matrix bandwidth of A
is the value
BWleft (A) = max k = j i i j, aij = 0
Likewise, the right matrix bandwidth of A is the value
BWright (A) = max k = i j i j, aij = 0
The maximum of left or right bandwidths
BW (A) = max BWleft (A) , BWright (A)
is known as the matrix bandwidth of A.
In short, the left bandwidth is the lowest level of the subdiagonal (diagonals
below the main diagonal) that is nonzero, whereas the right bandwidth is the highest
2.2 LU Decomposition
1
0
0
0
0
2
0
0
0
0
0
1
A=
0
0
0
1
0 1
0
0
0
0
0
0
63
0
0
0
0
0
1
then BWleft (A) = 3 and BWright (A)
=
1, so BW (A) = 3. Note also that BWleft AT =
1 and BWright AT = 3, so BW AT = 3. Thus, for symmetric matrices, the bandwidth, the left bandwidth, and the right bandwidth are all the same.
From the LU factorization calculations used in either Crouts or Doolittles
algorithm, it can be shown that the L will have the same left bandwidth as A, and
U will have the same right bandwidth of A. This means that the fill-ins by L and
U factors can be controlled by reducing the bandwidth of A. One algorithm for
obtaining the permutation P such that
A = PAPT has a smaller bandwidth is the
Reverse Cuthill-McKee Reordering algorithm. Details of this algorithm are given in
Section B.4 as an appendix. In MATLAB, the command is given by p=symrcm(A),
where p contains the sequence of the desired permutation.
a1 b1
..
c1 . . .
.
A=
..
..
.
.
0
cN1
bN1
aN
(2.15)
1 (b1 /z1 )
z1
0
0
..
..
c1 . . .
.
.
and U =
L=
..
..
..
. bN1 /zN1
.
.
0
cN1 zN
0
1
where
z1 = a1
and
zk = ak
bk1 ck1
zk1
k = 2, . . . , N
(2.16)
v1
z1
and
yk =
1
(vk ck1 yk1 )
zk
k = 2, . . . , N
k = N 1, . . . , 1
(2.17)
(2.18)
64
One implication of (2.16), (2.17), and (2.18) is that there is no need to form matrices
L and U explicitly. Furthermore, from (2.16), we note that the storage space used
for ak can also be relieved for use by zk . The method just described is known as the
Thomas algorithm and it is often used in solving linear equations resulting from onedimensional finite difference methods. (A MATLAB code thomas.m is available on
the books webpage, which implements the Thomas algorithm and takes advantage
of storage savings.)
EXAMPLE 2.4.
d2 u
du
+ h1 (x)
+ h0 (x)u = f (x)
2
dx
dx
subject to u(0) = u0 and u(L) = uL. We can use the finite difference approximations of the derivatives given by
d2 u
uk+1 2uk + uk1
2
dx
x2
and
du
uk+1 uk1
dx
2
x
Let h2,k = h2 (k
x), h1,k = h1 (k
x), h0,k = h0 (k
x), f k = f (k
x) and
uk = u(k
x), where
x = L/(N + 1). This results in the following linear equation:
a1 b1
0
u1
w1
..
..
u2
w2
.
.
c1
= .
..
.
.
.
..
..
bN1 . .
uN
wN
0
cN1
aN
where,
ak = h0,k
2h2,k
x2
h2,k
h1,k
h2,k+1
h1,k+1
; bk =
+
; ck =
2
2
x
2
x
x
2
x
f 1 c0 u0
if k = 1
fk
if 2 k N 1
wk =
f N bN uL if k = N
yT
The exact solution is u(x) = 5xex e3x . Plots of the numerical solution and
the exact solution are shown in Figure 2.1.
65
1.5
u(x)
0.5
0
Exact
Finite Difference
0.5
1
0
0.5
1.5
2.5
3.5
x
Figure 2.1. A plot of the numerical solution using finite difference approximation and the
exact solution for Example 2.4.
A block matrix LU decomposition is also possible. The details for this are given
in Section B.5. Likewise, a block matrix version of the Thomas algorithm is available
and is left as an exercise (cf. exercise E2.16).
2
1
0
0
0
1 2
1
0
0
1 2
1
0
A=
2
and
3
0
1 2
1
EXAMPLE 2.5.
b=
2
1
0
0
0
1 2
1
0
0
1 2
1
0
M= 0
and
0
0
1 2
1
0
S=
0
0
2
3
2
0
0
0
0
0
1
3
0
1
2
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Formulas for finding the inverse of triangular matrices are given in (1.26) and can be extended to
block triangular matrices as in Exercise E1.18.
66
Then we have
(M + S)x = b
(I + M1 S)x = M1 b
Since M is tridiagonal and only the first column of S contains nonzero terms,
M1 need to be applied only on the first column of S and on b, using the Thomas
algorithm. This results in a lower triangular problem which can be solved using
forward substitution,
2.625
3.5
1.333 0 0 0 0
4.250
6.0
0.667 1 0 0 0
6.750
5.0
0.667 0 0 1 0
1.750
3.5
0.667 0 0 0 1
(2.19)
where H and c are constant, and x(k) is the kth iteration for the solution x. A solution
is accepted when the iterations have converged to a vector x , for example, when the
norm given by
#
$ N
$ (k+1)
(k) 2
(k+1)
(k)
x
x =%
xi
xi
(2.20)
i=1
(2.21)
where i is the ith eigenvalue of H. We discuss eigenvalues and their role in stability in
more detail in Section 3.8, Example 3.10. For now, we simply state the the eigenvalues
of a matrix H[=]N N are the N roots of the polynomial equation,
det (I H) = 0
(2.22)
67
Examples of the stationary iterative methods include the Jacobi method, GaussSeidel method, and successive over-relaxation (SOR) methods. All three methods
result from a matrix splitting
A = A1 + A2
(2.23)
such that A1 x = v or A1
1 is easy to solve. In Section 2.3, we showed how matrix
splitting can be used for direct solution of linear equations if A possesses special
structures such as those given in Sections B.6.1 and B.6.2. However,
when the matri
ces are sparse, yet lacking in special structures, the inverse of I + A1
1 A2 may not
be as easy to calculate directly and is likely to significantly increase the number of
fill-ins. In this case, an iterative approach is a viable alternative.
A1 =
a11
..
0
A2 =
.
aN,N
0
a21
..
.
a12
0
..
.
..
.
a1N
a2N
..
.
aN1
aN2
A1 x
b A2 x
1
A1
1 b A1 A2 x
(2.24)
where x(k) is the kth iterate. One often writes (2.24) in the indexed notation as
xki = vi
sij xk1
(2.25)
j
j =i
where
vi =
bi
aii
sij =
aij
aii
j = i
68
convergence will depend on the spectral radius of A1
1 A2 . If the spectral radius is
not less than 1, additional techniques are needed.
sij xkj
j<i
sij xk1
n
(2.26)
ni
where
bi
aii
vi =
sij =
aij
aii
j = i
a1,1
1 = a2,1
A
.
..
aN,1
0
..
..
..
aN,N1
2 =
A
aN,N
a1,2
..
.
..
.
..
.
a1,N
..
.
aN1,N
0
(2.27)
(2.28)
xi
(k)
= x i
(k1)
+ (1 )xi
(2.29)
where
a2,1
L=
..
.
aN,1
0
..
..
..
U=
aN,N1 0
a1,1
D=
0
0
0
..
a1,2
..
.
69
..
.
..
.
a1,N
..
.
aN1,N
0
.
aN,N
Substituting this into the linear equation and then multiplying both sides by ,
Ax =
D + L + ( 1) D + U
x = b
D + L x
(1 ) D U x + b
(2.30)
Because &
A1 is also a lower triangular matrix, the solution of (2.30) will require
only a forward substitution. The convergence
1 of the SOR method is determined
2 .
by the value of the spectral radius of A 1 A
For positive definite, symmetric matrices, for example, L = U T , 0 < < 2
should be a sufficient condition for convergence. Otherwise, the upper bound on
may be less than 2. Also, the rateof convergence
depends on the relationship
1
2 .
between and the spectral radius of A 1 A
EXAMPLE 2.6.
A=
4.2
0
1
1
0
0
1
1
1 4.2
0
1
1
0
0
1
1
1 4.2
0
1
1
0
0
0
1
1 4.2
0
1
1
0
; b =
0
0
1
1 4.2
0
1
1
1
0
0
1
1 4.2
0
1
1
1
0
0
1
1 4.2
0
0
1
1
0
0
1
1 4.2
6.2
5.4
9.2
6.2
1.2
13.4
4.2
Using an initial guess x(0) = (1, 0, 0, 0, 0, 0, 0, 0)T , we can compare the performance of Jacobi, Gauss-Seidel, and SOR by tracing the convergence toward the
exact solution, which is given by xexact = (1, 2, 1, 0, 1, 1, 2, 1). Let kth error
be defined as
#
$ 8
2
$ (k)
(k)
x x
Err = %
i
i=1
exact,i
70
10
Jacobi
GaussSeidel
10
Err 1
10
10
10
10
10
20
30
40
50
k (iterations)
then Figure 2.2 shows the convergence between the Jacobi and Gauss-Seidel
methods. We see that the Gauss-Seidel method indeed improves on the convergence rate.
When = 1, the SOR method reduces to the Gauss-Seidel method. Figure 2.3 shows the convergence performance for = {0.8, 1, 1.3, 1.5, 1.7}. We
see that > 1 is needed to gain an improvement over Gauss-Seidel. The optimal choice among the values given was for = 1.3. However, the SOR method
was divergent for = 1.7. Note that A is not a symmetric positive definite
matrix.
In Section 2.7 and Section 2.8, we show two more iterative methods: the conjugate gradient method and the generalized minimal residual (GMRES) method,
respectively. These methods have good convergence properties and do not sacrifice the sparsity of the matrices. The iterations involved in the conjugate gradient
method and GMRES are based on optimization problems that are framed under
the linear-algebra perspective of the linear equation Ax = b. Thus, in the next section, we include a brief review of linear algebraic concepts, which also leads to the
least-squares solution.
10
=1.7
10
=1.5
10
Err
=0.8
10
=1.0
=1.3
10
10
10
20
30
k (Iterations)
40
50
71
(2.31)
The Euclidean norm of the residual vector will be used as a measure of closeness
between b and Ax, where the norm of a vector v is defined by
v = v v
(2.32)
As noted in Section B.7, one can find x such that r = 0 only if b resides in the span
of the columns of A. Otherwise, we have to settle for the search of
x that would
minimize the norm of r.
Recall from Theorem 1.1 that for
x to yield a local optimum value of f (x), it is
sufficient to have have the gradient of f be zero and the Hessian of f be positive
definite at x =
x. Furthermore, for the special case in which f (x) is given by
f (x) = x Qx + w x + c
(2.33)
where Q is Hermitian and positive definite, the minimum will be global. To see this,
first set the gradient to zero,
d
f
=
x Q + w = 0[1M]
Q
x = w
dx x=x
Because Q is nonsingular, the value for
x will be unique.
Let us now apply this result to minimize the norm of the residual vector. The
value of
x that minimizes r will also minimize r2 . Thus,
r2
r r = (b Ax) (b Ax)
(b b) + x A Ax 2b Ax
where we used the fact that b Ax is symmetric. Next, evaluating the derivative,
x, we have
d/dx r2 , and setting this to zero when x =
2
x A A 2b A = 0
or
x = A b
(A A)
(2.34)
Equation (2.34) is called the normal equation. The matrix A A is also known as the
x is unique if and only
Grammian (or Gram matrix) of A.4 The solution of (2.34) for
if A A is nonsingular. This condition is equivalent to having the columns of A be an
independent set, as given in the following theorem:
4
Some textbooks refer to A A as the normal matrix. However, the term normal matrix is used in
other textbooks (as we do in this text) to refer more generally to matrices K that satisfy KK = K K.
72
is nonsingular
PROOF.
1
M
j A,j = 0 A ... = 0
j =1
M
Multiplying by A ,
A A ... = 0
M
Thus, if A A is nonsingular, the unique solution is given by j = 0 for all j , that is,
the columns of A are linearly independent if and only if A A is nonsingular.
r = 2
(A A) x A b = 2A A
dx2
dx
Assuming that the columns of A are linearly independent, then A A is positive
definite.5 In summary, suppose the columns of A are linearly independent, then the
least-squares solution of Ax = b is given by
x = (A A)1 A b = A b
(2.35)
With y = Ax,
x A Ax = y y =
N
i=1
y i yi 0
(2.36)
73
P(mm Hg)
C(mole fraction)
T (o F)
P(mm Hg)
C(mole fraction)
60.2
70.1
79.9
90.0
62.3
75.5
77.2
85.3
600
602
610
590
680
672
670
670
0.660
0.672
0.686
0.684
0.702
0.711
0.713
0.720
60.7
69.3
78.7
92.6
60.5
70.6
81.7
91.8
720
720
725
700
800
800
790
795
0.721
0.731
0.742
0.742
0.760
0.771
0.777
0.790
(2.37)
where y is a dependent variable and v j is the j th independent variable. Note that for
the special case in which the elements of the matrices are all real, we can replace the
conjugate transpose by simple transpose.
EXAMPLE 2.7. Suppose want to relate the effects of temperature and pressure on
concentration by a linear model given by
C = T + P +
(2.38)
that would fit the set of experimental data given in Table 2.2. Based on the
model given in (2.38), we can formulate the problem as Ax = b, with
0.660
60.2 600 1
0.672
70.1 602 1
x=
b=
A= .
..
.
.
..
..
..
.
0.790
91.8 795 1
Because exact equality is not possible, we try for a least-squares solution
instead, that is,
0.0010
x = A b = 0.0005
0.2998
The linear model is then given by
C = 0.0010T + 0.0005P + 0.2998
(2.39)
A plot of the plane described by (2.39) together with the data points given in
Table 2.2 is shown in Figure 2.4.
74
C ( mole fraction )
0.9
0.8
0.7
100
600
o
T( F) 80
700
60
800
P (mm Hg)
The least-squares method can also be applied to a more general class of models
called the linear-in-parameter models. This is given by
y (v) =
M
i f j (v)
(2.40)
j =1
where v1 , . . . , vK are the independent variables and y(v), f 1 (v) , . . . , f M (v) are linearly independent functions. Methods for determining whether the functions are
linearly independent are given in Section B.8.
EXAMPLE 2.8. Consider the laboratory data given Table 2.3. The data are to be
used to relate vapor pressure to temperature using the Antoine Equation given
by:
log10 (Pvap ) = A
B
T +C
(2.41)
1.291 29.0 1
37.43
1
1.327 30.5 1
40.47
..
..
..
..
2
.
.
.
.
3
404.29
3.063 132.0 1
The normal equation yields 1 = 222.5, 2 = 7.39, and 3 = 110.28, or in terms
of the original parameters, A = 7.39, C = 222.5, and B = 1534. Figure 2.5 shows
the data points together with the model (2.41).
75
T ( C)
29.0
30.5
40.0
45.3
53.6
60.1
72.0
79.7
20
21
35
46
68
92
152
206
83.5
90.2
105.2
110.5
123.2
130.0
132.0
238
305
512
607
897
1092
1156
Another common situation for some least-squares problem Ax =lsq b is that they
might be accompanied by linear equality constraints. Let the constraints be given by
Cx = z
(2.42)
where A[=]N M, C[=]K M with K < M < N, and both A and C are full rank.
Using Gauss-Jordan elimination on C, we can find Q and W such that
0[K(MK)]
QCW = I[K]
then based on (2.8), the solution to (2.42) is given by
Qz
x=W
(2.43)
where v is a vector containing (M K) unknown constants. Let W = WL WR
with WL[=]M K and WR [=]M (M K). Then applying (2.43) to the leastsquares problem,
Qz
Ax = A WL WR
= b
v = (AWR ) (b AWLQz)
v
AWLQz + AWR v
Pvap ( mm Hg )
1500
1000
500
0
20
40
60
80
o
T ( C)
100
120
140
76
fV
0
0.0718
0.1121
0.1322
0.1753
0.1983
0.2500
0
0.1700
0.2332
0.1937
0.2530
0.3636
0.3478
fL
0.2931
0.3190
0.3362
0.3937
0.4052
0.4483
0.5172
fV
fL
0.4506
0.5257
0.5217
0.4032
0.5968
0.6522
0.6759
fV
0.5690
0.6236
0.6753
0.7443
0.7902
0.9080
0.9167
1.0000
0.7549
0.8103
0.8142
0.8300
0.8972
0.9289
0.9802
1.0000
where (AWR ) is the pseudo-inverse of (AWR ). This result is then used to form the
desired least-squares solution,
Qz
x=W
(2.44)
where v = (AWR ) (b AWLQz)
v
EXAMPLE 2.9. Suppose we want to fit a second-order polynomial model to relate
liquid mole fraction f L to the vapor mole fraction f V ,
f V = + f L + f L2
(2.45)
using data given in Table 2.4. The least-squares problem is then given by Ax = b,
where
1
0
0
0
..
..
..
..
.
.
.
A=
1 ( f L)n ( f L)n b = ( f V )n x =
.
..
..
..
..
.
.
1
1
1
1
The physical constraints due to pure substances require that f L = 0 at f V =
0, and f L = 1 at f V = 1.6 The constraints are Cx = z, where
1 0 0
0
C=
and z =
1 1 1
1
Using (2.44), we obtain
0
x = 1.6868
0.6868
2.6 QR Decomposition
77
0.8
fV
Figure 2.6. Comparison of models using constraints (solid line), models without using constraints (dashed line), and the data points (open
circles).
0.6
0.4
0.2
0
0
0.2
0.4
0.6
0.8
The plots in Figure 2.6 compare both models. Although they appear close to
each other, the violation of the constraints by the second model may present
some complications, especially when they are applied to a process simulator.
2.6 QR Decomposition
In this section, we introduce another factorization of A known as the QR decomposition. This allows for an efficient solution of the least-squares problem.7
The QR decomposition of A is given by A = QR, such that the columns of Q
are orthogonal to each other, that is, Q Q = I, and R is an upper triangular matrix.8
Details of the QR algorithm are included in Section C.2.1 as an appendix. We could
then apply this factorization to solve the normal equation as follows,
A Ax
(R Q ) (QR) x
R Rx
A b
A b
(2.46)
The QR algorithm is also an efficient method for calculating eigenvalues, as discussed in Section C.2.
Another classic method for finding a set of orthogonal vectors that has the same span of a given set
of vectors is the Gram-Schmidt method. Details of this method are given in Section B.9.
Another method is to use SVD to find the Moore-Penrose inverse, (cf. Section 3.9.1).
78
R
0
0
0
(see exercise E2.13, part b). In this case, due to the permutation P, the normal
equation of (2.46) will have to be modified to become
R Rx = PT A b
If Q is not needed, then use instead [R,P]=QR_house(A).
(2.47)
is minimized. Let
x be the exact solution, then f (
x) = 0.
Because the conjugate gradient method is iterative, it can take advantage of the
sparsity in matrix A when evaluating matrix products. However, unlike the Jacobi,
Gauss-Seidel, or SOR methods, the conjugate gradient method is guaranteed to
reach the solution within a maximum of N moves assuming there are no round-off
errors, where N is the size of x. If small roundoff errors are present, the Nth iteration
should still be very close to
x.
The conjugate gradient method is currently one of the more practical methods to
solve linear equations resulting from finite-element methods. The resulting matrices
from finite-element methods are usually very large, sparse, and, in some cases,
symmetric and positive definite.
In its simplest formulation, the method involves only a few calculations in each
step. The update equations are given by
x(i+1) = x(i) + (i) d(i)
(2.48)
where (i) and d(i) are the ith weight and ith update vector, respectively. The new
value will have a residual error given by
r(i+1) = b Ax(i+1)
(i)
10
(2.49)
The conjugate gradient method simply chooses the weight (i) and update vectors
such that
T
d(i) Ar(j ) = 0
for j < i
(2.50)
We will use the un-bold letter xi for the ith element of x.
2.8 GMRES
that is, d(i) will be A-orthogonal, or conjugate, to the past residual vectors r(j ) , j < 1
(which also happens to be the negative gradient of f (x)). This criteria will be satisfied
by choosing the following:
(0)
for i = 0
r
(r(i) )T r(i)
(i)
d =
and (i) = (i) T (i) (2.51)
(i) T
(i1)
(r ) Ad
(d ) Ad
2.8 GMRES
The conjugate gradient method was developed to solve the linear equation Ax =
b, where A is Hermitian and positive definite. For the general case, where A is
nonsingular, one could transform the problem to achieve the requirements of the
conjugate gradient method in several ways, including A Ax = A b or
A r b
I
A
0
x
0
Another approach, known as the Generalized Minimal Residual (GMRES)
Method, introduces an iterative approach to solve Ax = b that updates the solution
x(k) by reducing the norms of residual errors at each iteration. Unlike the conjugate
gradient method, GMRES is well suited for cases with non-Hermitian matrix A.
Briefly, in GMRES, a set of orthogonal vectors u(k) is built sequentially using
another method known as Arnoldis algorithm, starting with u(0) = r(0) . At each
introduction of u(k+1) , a matrix U k+1 is formed, which is then used to solve an
associated least-squares problem
r(0)
0
U k+1 AU k yk =lsq
..
.
0
79
80
to obtain vector yk . This result is then used to solve for the kth estimate,
x(k) = x(0) + U k yk
(2.52)
The process stops when the residual r(k) = b Ax(k) has a norm less than a specified
tolerance.
Although it looks complicated, the underlying process can be shown to minimize
the residual at every iteration. Thus the rate of convergence is quite accelerated.
In some cases, the solution may even be reached in much fewer iterations than N.
However, if round-off errors are present or if A is not well conditioned, the algorithm
may be slower, especially when U k starts to grow too large. A practical solution to
control the growth of U k is to invoke some restarts (with the current estimate as
the initial guess at the restart). This would degrade the efficiency of the algorithm
because the old information would be lost at each restart. Fortunately, the new initial
guesses will always be better than the initial guesses of previous restarts.
The details of GMRES, including some enhancements to accelerate the convergence further, can be found in Section B.12 as an appendix.
Remarks: In MATLAB, the command for the GMRES method is given by
x=GMRES(A,b,m) to solve Ax = b for x with restarts after every m iterations.
Also, a MATLAB function gmres_method.m is available on the books webpage
to allow readers to explore the algorithm directly without implementing restarts.
f 1 (x1 , . . . , xn )
..
F(x) =
=0
.
f n (x1 , . . . , xn )
Newtons method is an iterative search for the values of x such that F(x) is as
close to zero as the method allows. Using an initial guess, x(0) , the values of x(k+1) is
updated from x(k) by adding a correction term
k x,
x(k+1) = x(k) +
k x
To determine the correction term, we can use the Taylor series expansion of
F(x(k+1) ) around x(k) ,
dF
(k+1)
(k)
x
+ ...
F x(k+1) = F x(k) +
x
dx x=x(k)
By forcing the condition that F x(k+1) = 0 while truncating the Taylor series expansion after the second term, we obtain the update equation for Newtons method,
(2.53)
f1
f1
x1
xn
..
..
dF
..
.
.
Jk =
= .
dx x=x(k)
fn
fn
x1
xn
x=x(k)
The updates are then obtained in an iterative manner until the norm of F x(k) is
below a set tolerance, .
In summary, the Newtons method is given by the following procedure:
Algorithm of Newtons Method.
1. Initialize. Choose an initial guess: x(0)
2. Update. Repeat the following steps until either F x(k) or the number of
iterations have been exceeded
(a) Calculate J k .
(If J k is singular, then stop the method and declare
Singular Jacobian.)
(b) Calculate the correction term:
k x = J k1 F x(k)
(c) Update x: x(k+1) = x(k) +
k x
Remarks:
1. In general, the formulation of the exact Jacobian may be difficult to evaluate.
Instead, approximations are often substituted, including the simplest approach,
called the secant method, which uses finite difference to approximate the partial
derivatives, that is,
(k)
(k)
f1
f1
s1N x
x1 xN
s11 x
..
.
.
.
.
.
..
..
..
..
..
Jk = .
(2.54)
fN
(k)
f N
(k)
sNN x
sN1 x
(k)
x1
xN
x=x
where
sij (x) =
f i (x1 , . . . , x j +
x j , . . . , xN ) f i (x1 , . . . , xN )
x j
k F B
k x (
k x)T
=B +
B
(2.55)
T
(
k x)
k x
where
k F = F x(k+1) F x(k)
81
82
and B(0) can be initiated using (2.54). Moreover, because the Newtonupdate
in fact needs the inverse of the Jacobian, that is,
k x = J k1 F x(k)
1 (k)
F x , an update of the inverse of B(k) is more desirable. Thus,
B(k)
using (1.27), (2.55) can be inverted to yield
(
1
1 1 '
1
1
= B(k)
+
B(k+1)
B(k)
k F
k x
k xT B(k)
(2.56)
where
1
=
k xT B(k)
k F
2. One can also extend Newtons method (and its variants) to the solution of
unconstrained minimization
min f (x)
(2.57)
by setting
F(x) =
d
f
dx
T
and
J (x) =
d2
f
dx2
(2.58)
where J (x) is the Hessian matrix of f (x). The point x becomes a minimimum
of f (x) if (df/dx)T (x ) = 0 and d2 f/dx2 (x ) > 0.
One practical concern with Newton methods is that the convergence to the
solutions are strongly dependent on the initial guesses. There are several ways to
improve convergence. One approach is known as the line search method, also known
as the back-tracking method, which we discuss next. Another approach is the doubledogleg method, and the details of this approach are given in Section B.13 as an
appendix.
Remarks: In MATLAB, the command to find the solution of a set of nonlinear
equation is fsolve. Also, another function is available for nonlinear least squares
given by lsqnonlin.
k x = k
(2.59)
1
( x , f( x ) )
df (x )
dx k
0.5
f(x)
0.5
x
k+1
f(x)
0.5
1
2
83
xk+1
xk+2
0.5
1
2
Figure 2.7. The performances of the Newtons method to finding the solution of tanh(x) = 0.
The left plot used x0 = 0.9 as the initial guess, whereas the right plot used x0 = 1.1 as the
initial guess.
where
k = J k1 F x(k)
(2.60)
(2.62)
Setting the derivative of (2.62) equal to zero, we obtain the value 0 that would
minimize P0 ,
k (0)
0 =
2 k (1) k (0) k (0)
(2.63)
84
3
2
m1 m1 a k (m1 ) k (0) m1 k (0)
=
b
m2 2m2
k (m2 ) k (0) m2 k (0)
The minimum of Pm () can be found to be
!
b + b2 3ak (0)
m =
3a
(2.65)
(2.66)
10
m1
2
This is to avoid producing a that is too small, which could mean very slow convergence of the line search updates or even a premature termination of the solution.
On the other hand, if the decrease of is too small, the line search updates are more
likely to miss the regions that accept regular Newton updates.
The remaining issue is the acceptability condition for . A simple criteria is
that an acceptable = occurs when the average rate of change of k is at least a
fraction of the initial rate of change k (0), for example, for (0, 1],
k ( ) k (0)
k (0)
or equivalently,
k ( ) k (0) + k (0)
(2.67)
It can be shown that with 0.25, (2.65) can be guaranteed to have real roots.
However, a usual choice is to set as low as 104 . Because 1, this parameter is
often referred to as the damping coefficient of the line method.
To summarize, we have the following enhanced Newton line search procedure:
Algorithm of Enhanced Newtons Method with Line Search.
1. Initialize. Choose an initial guess: x(0)
2. Update. Repeat the following steps until either F x(k) or the number of
iterations have been exceeded.
(a) Calculate J k .
(If J k is singular, then stop the method anddeclare
Singular Jacobian.)
(b) Calculate the correction term: k = J k1 F x(k)
85
1
1
0.5
f(x)
0.5
f(x)
0
x
xk
k+1
xk+1
xk+2
0.5
0.5
1
2
Figure 2.8. The performances of the line search method to finding the solution of tanh(x) = 0.
The left plot used x0 = 1.1 as the initial guess, whereas the right plot used x0 = 4.0 as the initial
guess.
86
E2.1. The reaction for a multiple reversible first-order batch reaction shown in
Figure 2.9,
is given by
dx
= Ax
dt
where
(kab + kac )
kab
A=
kac
kba
(kba + kbc )
kbc
kca
kcb
(kca + kcb)
xa
x = xb
xc
xi is the mass fraction of component i and kij is the specific rate constant for
the formation of component j from component i.
The equilibrium is obtained by setting dx/dt = 0, i.e. A xeq = 0.
1. Show that A is singular.
2. Because A is singular, the linear equation should yield multiple equilibrium
values, which is not realistic. Is there a missing equation? If so, determine
the missing equation.
3. Is it possible, even with the additional equation, that a case of non-unique
solution or a case with no solution may occur? Explain.
E2.2. Let A[=]N N, B[=]M M, G[=]N N, H[=]M M and C[=]N M by
given matrices. Let X[=]N M be the unknown matrix that must satisfy
AX + XB + GXH = C
Obtain the conditions for which the solution X will be unique, non-unique,
or nonexisting.
E2.3. Find X (possibly infinite solution) that satisfies the following equation:
3 2 1
1 2 3
9 16 23
4 5 6 X + X 0 4 5 = 23 18 32
1 1 1
1 6 8
8 22 31
using the matrices Q and W found based on:
1. Gauss-Jordan method
2. Singular Value Decomposition method
E2.4. Consider the network given in Figure 2.10, given S1 = S3 = 15 v, A4,5 =
10 mA, and all the resistors have 1 K
except for R6 = 100
. Based on the
notations and equations given in Example 1.6,
2.11 Exercises
R2
R3
R1
S1
-
+ S 3
R6
R7
R9
R8
R 11
R 10
R5
R4
+
87
A4,5
Figure 2.10. Resistive network containing two voltage sources and one current source.
we have
1 T
R p = b R1 s
where p is the vector of node potentials, is the node-link incidence matrix,
p is the vector of current sources, s is the vector of voltage sources, and R
is a diagonal matrix consisting of the resistance of each link. Use the LU
factorization approach to solve for p, and then solve for the voltages v across
each resistor, where
v = T p s
E2.5. Redo the boundary value problem given in Example 2.4 but using the following functions:
h2 (x) = 2 + x , h1 (x) = 5x , h0 (x) = 2 + x
f (x) = 5x2 + 11x 8 e2x+6 + 8x2 + 57x + 2
and determine u(x) for 0 x 10 with the boundary conditions u(0) = 1
and u(10) = 81. Use the finite interval
x = 0.01 and compare with the exact
solution,
u(x) = (1 + 8x) + xe2x+6
(Note: Finding the exact solution for a high-order linear differential equation containing variable coefficients is often very difficult. We have an exact
solution in this case because we actually obtained the differential equation
and the boundary conditions using u(x).)
E2.6. Let matrix M be a tri-diagonal matrix plus two additional nonzero elements
m1,3 = s and mN,N2 = t,
a1 b1
s
0
..
..
c1
.
.
.
.
.
..
..
..
(2.68)
M=
..
..
.
.
bN1
t
cN1
aN
88
z1
0
c1 z2
.
.
..
..
L=
cN2
zN1
0
t
N,N1 N,N
s
1 f1
0
a1
1
f2
0
U =
(2.69)
..
..
.
.
1
f N1
0
1
with
z1 = a1
fk =
zk = ak ck1 f k1 , k = 2, . . . , N
bk
, for i = 1 . . . , N 1, i = 2
zk
f2 =
b2
s c1
z2
z2 a1
and
N,N1 = cN1 tf N2
N,N = zN + tf N1 f N2
4 1
1
4
1 1
0
Ax = 1
.
..
..
.
..
..
.
1
0
1
1
4
1
0
1
1
..
.
4
..
.
..
1
0
0
..
.
..
.
..
..
..
.
0
..
.
1
1
0
0
..
.
x =
1
4
1
2
..
.
2
1
2.11 Exercises
89
E2.8. Use the direct matrix splitting method given in Sections 2.3, B.6 to solve the
following problem:
1
2
1
3
4
3
1
2
0
0
0
0
1
2
2
0
0
0
0
5
2
3
0
0
1
1
1
1
1
0
3
3
2
1
2
1
x =
6
8
2
3
7
4
p k (x) = yk+1
yk
k+1
x3k+1
x2k+1
k
x
k+1
1
x3k
6xk+1
x2k
xk
6xk
x3
2
x
(2.70)
x
1
d2 p k
d2 p k
;
= k ;
= k+1
dx2 x=xk
dx2 x=xk+1
xk k1 + 2 (
xk +
xk+1 ) k +
xk+1 k+1 = 6
yk+1
yk
xk+1
xk
(2.71)
where
xk = xk xk1 and
yk = yk yk1 . Equation (2.71) applies only
to k = 2, . . . , N 1. Two additional specifications are needed and can be
obtained by either setting (i) 1 = 0 and N = 0 (also known as the natural
condition), or (ii) 1 = 2 and N = N1 , or (iii) setting 1 , 2 , and 3 to be
collinear and N2 , N1 , and N to be collinear, that is,
3 1
3 2
=
x3 x1
x3 x2
and
N2 N
N2 N1
=
xN2 xN
xN2 xN1
(2.72)
90
Show that (2.71) can be combined with the various types of end conditions
to give the following matrix equation:
y
1
3
2
6
x3
x2
.
..
(2.73)
M . =
6
yN
yN1
xN
xN1
0
where
a
x2
M=
b
2 (
x2 +
x3 )
..
.
x3
..
.
xN1
d
0
..
.
2 (
xN1 +
xN )
e
xN
f
and
End conditions
Parameters
1 = 0
(a, b, c) = (1, 0, 0)
N = 0
(d, e, f ) = (0, 0, 1)
1 = 2
(a, b, c) = (1, 1, 0)
N1 = N
(d, e, f ) = (0, 1, 1)
1 , 2 , 3 Collinear
(a, b, c) = (
x3 , (
x2 +
x3 ) ,
x2 )
N2 , N1 , N Collinear
(d, e, f ) = (
xN , (
xN1 +
xN ) ,
xN1 )
Note that matrix M is tri-diagonal for the first two types of end conditions, but
the third type (i.e., with collinearity) has the form given in (2.68). Thus (2.73)
can be used to generate the values of second derivatives k , k = 1, . . . , N,
which can then be substituted into (2.70) in the appropriate interval [xk , xk+1 ].
Using these results, obtain a cubic spline curve that satisfies (2.72) and
passes through the data points given in Table 2.5.
Table 2.5. Data for spline
curve interpolation
x
0.9931
0.7028
0.4908
0.3433
0.1406
0.0161
0.2143
0.4355
0.6244
1.0023
y
0.9971
1.4942
1.3012
0.9094
1.0380
1.3070
1.4415
1.1023
0.8509
0.6930
2.11 Exercises
91
N
N
i=1 wi
i=1
2
( N
w
)
i
i=1
N
2
i=1 wi
N 2
N i=1 wi
i=1 zi
wi zi
w1
..
A= .
wN
1
..
.
1
x=
z1
b = ...
zN
m
c
E2.11. Using data given in Table 2.6, obtain the parameters 1 , 2 , and 3 that would
yield the least-squares fit of the following model:
z = (1 w2 + 2 w + 3 ) sin(2w)
(2.74)
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
0.65
0.01
0.87
0.55
1.02
0.46
0.08
1.23
1.99
0.89
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
1.82
3.15
4.01
2.51
0.21
3.39
8.18
7.52
4.23
0.15
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3.0
7.12
11.25
14.27
9.11
1.81
9.44
18.24
16.55
11.20
0.64
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
4.0
13.99
24.27
23.53
15.45
0.62
16.37
30.67
31.63
20.13
2.48
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
5.0
23.33
36.52
40.41
24.56
0.63
26.76
46.66
47.83
31.23
0.55
E2.12. Using the vapor liquid equilibrium data shown in Table 2.7, obtain a 5th order
polynomial fit of f V as a function of f L that satisfies the following constraints:
fV = 0
fV = 1
f V = 0.65
when f L = 0
when f L = 1
when f L = 0.65
fV
fL
fV
fL
fV
fL
fV
0.02
0.06
0.11
0.18
0.22
0.07
0.16
0.24
0.33
0.35
0.29
0.37
0.50
0.71
0.78
0.43
0.50
0.56
0.68
0.74
0.85
0.89
0.95
0.99
0.25
0.79
0.86
0.92
0.97
0.37
0.38
0.54
0.81
0.94
0.97
0.45
0.58
0.73
0.88
0.93
92
3 1
5
4 1
6
5 1
x =lsq 7
6 1
8
7 1
9
8 1
10
use R obtained from the QR factorization of A and solve for x. Compare
this to using
R obtained from the Choleski factorization of AT A, that is,
T
T
R
R = A A.
2. Try the same approach on the following problem:
5
3 1 2
6
4 1 3
5 1 4
x =lsq 7
(2.75)
8
6 1 5
9
7 1 6
10
8 1 7
Is a situation possible in which there are no least-squares solution? If
solutions exist, how would one find the infinite least-squares solution x?
Test your approach on (2.75).
E2.14. A generalization of the Euclidean norm for a vector v is the Q-weighted
norm (also known as the Riemanian norm), denoted by vQ , where Q is a
Hermitian positive-definite matrix, defined as
!
vQ = v Qv
(2.76)
The Euclidean norm results by setting Q = I.
1. Show that the function Q satisfies the conditions of norms as given in
Table B.4. (Hint: With Choleski factorization, there exists a nonsingular
S such that Q = S S. Then use vQ = Sv and the fact that Euclidean
norms satisfy the conditions for norms.)
2. Using r2Q , where (r = b Ax) is the residual, show that the weighted
least-squares solution is the solution to the weighted normal equation
AT QAx = AT Qb
(2.77)
..
.
(2.78)
Q=
0
1
2.11 Exercises
93
where 0 < < 1 is known as the forgetting factor. Then Qr attaches heavier weights on more recent data.
Using data given in Table 2.8 and (2.77), obtain a weighted linear
regression model
z = mw + c
for = 1, 0.5, 0.1, and plot the three cases together with the data. Explain
the effects of different choices.
Table 2.8. Data for weighted linear regression
z
0.0104
0.0173
0.0196
0.0403
0.0472
0.0518
0.0588
0.0726
0.0841
0.1094
0.0219
0.0921
0.1652
0.1827
0.3085
0.2588
0.1915
0.3933
0.3289
0.4430
0.1187
0.1671
0.1717
0.1970
0.2293
0.2477
0.3145
0.3353
0.3744
0.3952
0.5219
0.5892
0.5278
0.5892
0.6184
0.6652
0.7032
0.7529
0.7208
0.7880
0.4459
0.4896
0.5012
0.5472
0.6048
0.7039
0.7730
0.8306
0.9528
0.9781
0.7792
0.8465
0.8319
0.8728
0.8728
0.9167
0.9313
0.9459
0.9722
0.9956
(2.79)
(2.80)
3. Use (2.79) and (2.80) to obtain the correction fN in the following recursive
formula:
xN+1 = xN + fN
(2.81)
th
For the inverse of (A + BCD), the Woodbury formula applied to the special case where B[=]N 1
and C[=]1 N is more well known as the Sherman-Morrison-Woodbury formula.
94
A1
C1
G=
B1
A2
..
.
0
B2
..
.
CN2
..
AN1 BN1
0
CN1
AN
where An , Bn , and Cn are all K K block matrices.
Q1
C1
L=
0
Q2
..
.
..
.
0
CN1
and, with C0 = 0, W0 = 0,
U=
QN
W1
..
.
0
..
I
0
Qn
An Cn1 Wn1
[=] K K
Wn
Q1
n Bn
[=] K K
WN1
I
A1
1
Zn
n = 2, 3, . . . , N
v1
x1
v = ...
x = ...
xN
vN
where xn and vn are vectors of length K.
Having both L and U be block-triangular matrices, show that the
forward substitution becomes
y1
Z1 v1
yn
for n = 2, . . . , N
yN
xn
yn Zn Bn xn+1
for n = (N 1) , . . . , 1
2.11 Exercises
95
Note: This the block-matrix version of the Thomas algorithm (cf. Section 2.2.2). If storage is a significant issue, one could use the storage of
An and replace it with Zn , except for the case of patterned An s, as in the
exercise that follows. Furthermore, from the backward substitutions given
previously, there is no need to store Wn either.
4. For n = 1, . . . , 10, let
4
1
1 0
1
0
Bn =
Cn =
An =
1 4
0 1
0 1
and
v1 =
2
2
v10 =
3
1
vk =
0
0
for k = 2, 3, . . . , 9
R I
0
I
R I
.
.
.
..
..
..
A=
I
R I
0
I
R
with
4
1
R=
1
4
..
.
1
..
.
1
1
4
..
.
4
1
Let R[=]25 25 and A[=]625 625. Solve the linear equation Ax = b for x,
when bk = 10 when k = 1, . . . , 5 and bj = 0 for j = 6, . . . , 625 using:
1. LU decomposition followed by forward and backward substitution.
2. Conjugate gradient method.
3. Block-Thomas algorithm (cf. see Exercise E2.16)
E2.18. A hot liquid stream entering the annulus of a double-pipe heat exchanger at
a temperature T hot,in = 300 C is being cooled by cold water counter-flowing
through the inner pipe and entering at T cold,in = 25 C from the other side of
the pipe. The working equations relating the temperatures of all the entering
and exiting flows are given by
T cold,out T cold,in
(T cold,out T cold,in )
U inner (Dinner L)
= 0.3
m
cold Cp,cold
and
m
hot Cp,hot
= 0.25
m
cold Cp,cold
96
f 1 (x, y)
f 2 (x, y)
(T hot,in x) (y T cold,in )
Apply Newtons formula using the secant approximation for the Jacobian
by choosing x = 0.1 and y = 0.1,
0
f
(x
+
x,
y)
f
(x,
y)
f
(x,
y
+
y)
f
(x,
y)
1
1
1
1
1
J
0
(x
+
x,
y)
f
(x,
y)
f
(x,
y
+
y)
f
(x,
y)
f
2
2
2
2
y
and initial guess x = y = 325/2.
2. If one or more equation can easily be solved explicitly, this can improve
computational efficiency by reducing the dimension of the search space.
For instance, one could first solve for y in terms of x in f 2 (x, y) = 0 to
obtain
y = T cold,in + (T hot,in x)
which can then be substituted into f 1 (x, y) to yield
f 3 (x) = f 1 x, T cold,in + (T hot,in x)
Use Newtons method using the secant approximation for the Jacobian
like before, with the same initial guess x = 325/2. (Note: In this case, f 3 (x)
is a scalar function, and one can plot the function to easily determine
whether there are multiple roots.)
E2.19. Given R equilibrium reactions that involves M > R components. Let vij be
the stoichiometric coefficients of component j in reaction i and stored in a
stoichiometric matrix V [=]R M where
M
vj
xj i
j =1
where Ki is the equilibrium constant for the ith reaction. A typical problem
is to determine the mole fractions x j based on a given set of initial moles n 0j
for each component and equilibrium constants Ki . One effective approach
to solve this problem is to use the concept of the extent of reaction i , which
is the number of units that the ith reaction had undergone in the forward
direction. For instance, let 2 = 2.5 and stoichiometric coefficient v23 = 2;
2.11 Exercises
97
then component 3 will gain 2 v23 = 5 moles due to reaction 2. Thus if we let
n j be the number of moles of component j at equilibrium, we have
n j = n 0j +
R
vkj k
k=1
M
vkj
j =1
vi j
R
M
M
n
+
v
0j
vj
k=1 kj k
Ki = 0
f i (1 , . . . , R ) = x j i Ki =
M
R
n
+
j =1 0j
k=1 k k
j =1
j =1
1. Show that the (i, k)th element of the Jacobian of f is given by
M
vij vkj
f i
i k
= (f i + Ki )
k
nj
nT
where n j = n 0j +
on f i , i.e.,
R
k=1
j =1
vkj k and n T =
M
j =1
f i x j
f i
=
k
x j k
M
j =1
and simplify further by using the definition of f i .) Next, show that the
Jacobian can also be put in matrix form as
d
J = f = DS
(2.82)
d
where D and S are a diagonal matrix and a symmetric matrix, respectively,
given by
0
f 1 + K1
..
D=
f R + KR
S=V
n 1
1
0
0
..
.
n 1
M
1
1 .
..
nT
1
..
.
1
..
.
1
which means
that the Newton update is given by
k = J 1 f =
S1 D1 f (k) . (Note: Because S involves n 1
j , one has to avoid choosing
an initial value guess that would make any of the n j equal to zero.)
98
C+D
Reaction 2: A + C
2E
(0)
(0)
(0)
(0)
(0)
y
2.3816
2.3816
2.3728
2.4518
2.5658
2.7763
2.9956
3.0395
3.0570
x
15.2650
19.4124
21.2558
25.4032
27.1313
27.8226
29.3203
31.0484
y
3.0132
3.0044
2.9956
2.7061
2.4868
2.2939
1.9430
1.5658
x
31.8548
34.1590
35.6567
37.6152
40.3802
43.7212
47.6382
49.4816
y
1.4167
0.9956
0.8377
0.7412
0.6711
0.6360
0.6360
0.6447
that would yield a least-squares fit of the data given in Table 2.9. Also, a
MATLAB file Levenmarg.m that implements the LevenbergMarquardt algorithm is available (Note: Use a random Gaussian number
generator to obtain the initial guess).
Matrix Analysis
In Chapter 1, we started with the study of matrices based on their composition, structure, and basic mathematical operations such as addition, multiplication, inverses,
determinants, and so forth, including matrix calculus operations. Then, in Chapter 2, we focused on the use of matrices to solve simultaneous equations of the form
Ax = b, including their applications toward the solution of nonlinear equations via
Newton algorithms. Based on linear algebra, we saw that Ax can also be taken to be
a linear combination of the columns A with the elements of x acting as the weights.
Under this perspective, the least-squares problem shifted the objective to be that of
finding x that would minimize the residual error given by r = b Ax.
In this chapter, we return to the equation Ax = b with a third perspective. Here,
we consider matrix A to be an operator that will transform (or map) an input
vector x to yield an output vector b, as shown schematically in Figure 3.1. We call
this the matrix operator perspective of the linear equation. The main focus is now
on A as a machine that needs to be analyzed, constructed, or modified to achieve
some desired operational characteristics. For instance, we may want to construct a
matrix A that rotates, stretches, or flips various points xi described by vectors. As
another example, a stress tensor (to be discussed in Chapter 4) can be represented
by a matrix T , which can then be used to find the stress vector s pointing in the
direction of a unit vector n by the operation s = T n.
We begin with some general matrix operators in Section 3.1. These include unitary or orthogonal operators, projection operators, reflector operators, and so forth.
We also include an affine extension of the various operators that would allow translation. Then, in Section 3.2, we introduce the important properties called eigenvalues
and objects called eigenvectors, which characterize the behavior of the matrix as an
operator. We also outline the properties of eigenvalues and eigenvectors of specific
classes of matrices. As we show in later chapters, the eigenvalues are very important
tool, for determining stability of differential equations or iterative processes. We
also include an alternative numerical approach to the evaluation of eigenvalues and
eigenvectors, known as the QR method. This method is an extension of another
method known as the power method. Details of the QR method and the power
method are included in Section C.2 as an appendix.
Another application of eigenvalue analysis is also known as spectral analysis.
It provides useful factorizations such as diagonalization, Jordan-block forms, singular value decompositions, and polar decompositions. In Section 3.7, we discuss
99
100
Matrix Analysis
(3.1)
where and are scalars. The associative properties of matrices can then be viewed
as compositions of operator sequences as follows:
A (Bx) = b
(AB) x = b
This is shown in Figure 3.2. As we saw in Chapter 1, matrix products are not commutative in general. This means that when C = AB operates on x, operator B is applied
first to yield y = Bx. Then y is operated on by A to yield b.
101
When using the same operator on several input vectors, one could also collect
the input vectors into a matrix, say X, and obtain a corresponding matrix of output
vectors B.
AX = A x1 x2 xP = b1 b2 bP = B
Note that although X and B are matrices, they are not treated as operators. This is
to emphasize that, as always, the applications will dictate whether a matrix is viewed
an operator or not.
(3.2)
(3.3)
Unitary matrix operators are operators that preserve Euclidean norms defined
as
#
$ n
$
x = %
xi xi = x x
i=1
To see this,
b2 = Ax2 = (Ax) (Ax) = x (A A)x = x x = x2
When the matrices and vectors are real, unitary operators are synonymous with
orthogonal operators. Thus we also get for real vectors and matrices,
b2 = Ax2 = (Ax)T (Ax) = xT (AT A)x = xT x = x2
Note that if A is a unitary (or orthogonal) operator, the inverse operator is found
by simply taking the conjugate transpose (or transpose), that is, A1 = A (or A1 =
AT ).
In the following examples, unless explicitly stated, the matrix operators and
vectors are understood to have real elements. Examples of orthogonal matrices
include the following:
1. Permutation operators. These matrices are obtained by permuting the rows of
an identity matrix. When operating on an input vector, the result is a vector
with the coordinates switched according to sequence in the permutation P. For
instance,
x
0 0 1
x
z
P y = 1 0 0 y = x
z
0 1 0
z
y
2. Rotation operators (also known as Givens operator). The canonical rotation
operators R are obtained from an identity matrix in which two elements in the
diagonal, say, at rk,k and r, , are replaced by cos (), and two other off-diagonal
102
Matrix Analysis
2
1.5
x
x2, b2
0.5
b=R
cw
0.5
1.5
x ,b
1
elements at rk, and r,k are replaced by sin () and sin (), respectively. For a
two-dimensional rotation, we have
cos() sin()
(3.4)
Rcw() =
sin() cos()
as a clockwise rotation of input vectors. To see that Rcw is indeed an orthogonal
matrix, we have
cos() sin()
cos() sin()
T
Rcw Rcw =
sin() cos()
sin()
cos()
0
cos2 () + sin2 ()
=I
=
0
cos2 () + sin2 ()
Similarly, we can show that Rcw RTcw = I.
To illustrate the effect of clockwise rotation, consider
1
2
x=
b = Rcw(/4) x =
1
0
The original point was rotated /4 radians clockwise, as shown in Figure 3.3.
Because Rcw is orthogonal, the inverse, operator, is the counterclockwise
rotation obtained by simply taking the transpose of Rcw , that is,
cos() sin()
1
Rccw() = Rcw () =
(3.5)
sin()
cos()
For three dimensions, the clockwise rotation operators around x, y, and z, assuming the vectors are arranged in the classical order of (x, y, z)T , are given by Rcw,x ,
Rcw,y , and Rcw,z, respectively,
1
0
0
cos() 0 sin()
Rcw,x = 0
cos()
sin()
Rcw,y =
0
1
0
0 sin() cos()
sin() 0
cos()
cos()
sin() 0
Rcw,z = sin() cos() 0
0
0
1
and shown in Figure 3.4.
103
2
ww
w w
(3.6)
=
=
=
2
2
ww )(I ww )
w w
w w
4
4
I ww + 2 ww ww
w w
(w w)
(I
4
4
ww + ww = I
w w
(w w)
Hw x
origin
x
w
104
Matrix Analysis
x x = y y and x y = y x. Then,
H(xy) x =
I
2
(x y) (x y)
(x y) (x y)
x
1
(xx x yx x xy x + yy x)
y x
xx x xy x xx x + yx x + xy x yy x
x x y x
x x
(3.7)
(I P) (I P)
I 2P + P2
IP
Let v reside in the linear space L, and let S be a subspace in L. Then a projection
operation, PS , can be used to decompose a vector into two complementary vectors,
v
(PS + I PS ) v
Pv + (I P) v
vS + vLS
where vS will reside in the subspace S, whereas vLS will reside in the complementary
subspace) L S.
*
Let q1 , . . . , qM be a set of orthonormal basis vectors for S, that is,
qi q j
+
=
0
1
if i = j
if i = j
*
)
S = Span q1 , . . . , qM
and
M
i=1
qi qi
(3.8)
105
and
Q
=I
Q
(3.9)
M
i=1
qi (qi v) =
M
i qi S
i=1
where i = qi v.
3. For any vector v and b, y = PS v is orthogonal to z = PLS b, where PLS =
I PS . To show this,
y z = v PS PLS b = v QQ (I QQ ) b = v (QQ QQ QQ ) b = 0
There is also a strong relationship between the projection operator and the
least-squares solution of Ax = b given by (2.35). Using the QR factorization of A,
=
(A A)1 A b = (R Q QR)1 R Q b
R1 Q b
R
x
Q b
QR
x
QQ b
A
x
PS b
x
Thus A
x is the orthogonal projection of b onto the subspace spanned by the columns
of A.
EXAMPLE 3.1. Suppose we want to obtain the projection operator of a threedimensional vector b onto the subspace spanned by a1 and a2 , where
0.5
1
0
b = 0.5 a1 = 0 a2 = 1
1
1
2
Let A = a1 a2 . Using the QR-algorithm on A, we obtain the orthonormal
basis,
2/2
3/3
Q = q1 q2 =
0
3/3
2/2 3/3
106
Matrix Analysis
1.4
A( line
(a,b)
1.2
line(a,b)
A( line
(c,d)
0.8
0.6
Figure 3.6. Using a linear operator A to translate line(a,b) may not translate another line
line(c,d) .
line
(c,d)
0.4
0.2
0.2
0
0.5
1.5
5 2 1
1
1
1
PS = QQ =
2
2 2
P(LS) = I PS =
2
6
6
1 2
5
1
Solving for the least-squares solution,
1
0
0.5
0
1 x = 0.5
1 2
1
x = R1 Q b =
1
12
5
4
2
4
2
1
2
1
1.75
0.25
0.75
1.25
(3.10)
107
Strictly speaking, affine operators are not linear operators. However, we can
transform the affine operation and make it a linear matrix operation. One approach
is to expand both the operator A and the input vector v as follows:
A
t
v
A=
v=
(3.11)
0 1
1
where vector t is the desired translation. After the affine operation
A has finished
operation on
v, one can then obtain the desired results by simply removing the last
element of
v, that is,
A t
Av + t
v
A
v= I 0
I 0
= I 0
= Av + t
0 1
1
1
T
T Given a line segment defined by its endpoints, a = ax , ay and
b = bx , by . Suppose we want to rotate
T this line segment counterclockwise
,
m
, where mx = (ax + bx ) /2 and my =
by
radians
at
its
midpoint,
m
=
m
x
y
ay + by /2. This operation can be achieved by a sequence of affine operations.
First, we can translate the midpoint of the line to the origin. Second, we rotate
the line. Lastly, we translate the origin back to m. In the 2-D case, this is given
by
1 0 mx
cos () sin () 0
1 0 mx
cos ()
0 0 1 my
Aff = 0 1 my sin ()
0 0
1
0
0
1
0 0
1
cos () sin ()
cos ()
= sin ()
0
0
1
EXAMPLE 3.2.
where
(3.12)
108
Matrix Analysis
0.8
0.7
y
Aff( line
0.6
(a,b)
line(a,b)
0.5
0.4
0.4
0.5
0.6
0.7
0.8
x
Because v = 0 is always a solution to (3.12), it does not give any particular
information about an operator. It is known as the trivial vector. This is why we only
consider nonzero vectors to be eigenvectors.
To evaluate the eigenvectors of A, we can use the condition given by (3.12).
Av = v
(A I) v = 0
(3.13)
(3.14)
Equation (3.14) is known as the characteristic equation of A, and this equation can be
expanded into a polynomial of order N, where N is the size of A. Using the formula
for determinants given by (1.10), we have the following lemma:
Let A[=]N N. Then the characteristic equation (3.14) will yield a polynomial equation given by
LEMMA 3.1.
charpoly() = N + N1 N1 + + 1 + 0 = 0
where
Nk = (1)k
1 <<k
(3.15)
[ ,, ]
det A[11 ,,kk ]
[ ,, ]
and A[11 ,,kk ] is the matrix obtained by extracting the rows and columns indexed by
1 , . . . , k .
The polynomial, charpoly (A), is known as the characteristic polynomial
of A. Specifically, two important coefficients are 0 = (1)N det (A) and N1 =
(1) trace (A).
Remarks: In MATLAB, the command to obtain the coefficients of the characteristic
polynomial is given by c = poly(A), where c will be a vector of the coefficients
of the polynomial of descending powers of . Because the eigenvalues will depend
on the solution of this polynomial, small errors in the coefficients can cause a large
error in the values of the roots. Thus for large matrices, other methods are used
to find the eigenvalues. Nonetheless, the characteristic polynomials themselves are
109
EXAMPLE 3.3.
Let A be given by
1
A= 0
7
7
6
1
0
5
0
( 1)
0
det (I A) = det
0
( 5)
7
(3.16)
7
6
3
2
= 7 38 + 240
( 1)
Once the eigenvalues are obtained, each distinct value of can be used to
generate the corresponding eigenvector, v , by substituting each eigenvalue one at
a time into equation (3.13),
( I A) v = 0
(3.17)
Each of the equations (3.17) will yield infinite solutions, but only the directions
of these vectors will be important, that is, if v is an eigenvector, then so is v for any
scalar = 0. As a standard, the eigenvectors are often normalized to have a norm
equal to 1, that is, with = v1 .
We can generate a procedure of finding the eigenvectors for each distinct eigenvalue of a matrix A. Let W( , k) be a nonsingular submatrix of ( I A) obtained
by removing the kth row and kth column, that is,
W( ,k) = ( I A)(k,k)
and
det W( ,k) = 0
110
Matrix Analysis
Next, let vector c(k) be the kth column of matrix A with the kth element removed, i.e.
c(k)
a1,k
..
.
a
= k1,k
ak+1,k
..
.
aN,k
(3.18)
v
Finally, the eigenvector can be obtained by inserting a 1 in the kth position of
v1
.
..
vk1
(3.19)
v =
1
vk
..
.
vN
1
EXAMPLE 3.4.
Let A be given by
1
A= 4
7
2
5
8
3
6
1
1.0000
1.0000
1.0000
v(12.4542) = 2.3492 , v(5.0744) = 1.3415 , v(0.3798) = 0.8991
2.2519
2.9191
0.1394
and their normalized versions are
0.2937
0.2972
0.7397
v(12.4542) = 0.6901 , v(5.0744) = 0.3987 , v(0.3798) = 0.6650
0.6615
0.8676
0.1031
111
(3.20)
(3.21)
a12
a11 if a11 =
v =
a21
if a22 =
a22
(3.22)
In case = a11 = a22 , then lHospitals rule can still be used where 0/0 occurs.
2. Diagonal matrices. Let D[=]N N be diagonal, then the eigenvalues are simply
the diagonal elements,
0
( d1 )
N
..
=
det (I D) = det
( di )
.
0
( dN )
i=1
0
( 11 )
N
..
..
=
det (I L) = det
( ii )
.
.
i=1
N1
( NN )
To obtain the eigenvector corresponding to = kk (if is not unique, choose
the largest value of k), we can form the eigenvector as k 1 number of zeros
appended to the result of (3.19) applied to the lower (k 1) (k 1) submatrix
of L as follows:
0k1
v(kk ) = 1
q
112
Matrix Analysis
where
(k,k k+1,k+1 )
..
q=
.
N,k+1
..
(k,k NN )
k+1,k
..
.
N,k
C=
0
0
..
.
1
0
..
.
0
1
..
.
..
.
0
0
..
.
0
0
0
1
0
2
1
N1
(3.23)
n1
i i = 0
i=0
v =
..
.
N1
113
k = 1, 2, . . . , N
(3.24)
i=1,i=k
The proofs of these properties are given in Section C.1.1 as an appendix, with
the exception of Property 13, which is left as an exercise (see Exercise E3.20).
Properties 1 through 6 give formulas for the calculation of eigenvalues. Specifically, we have already used property 1 in the previous section for the case of diagonal
and triangular matrices. Property 2 is an extension of property 1 to block-triangular
matrices.
Assuming that the eigenvalues of A are known, properties 3 to 6 give methods to
determine eigenvalues for: (i) scalar products, A; (ii) transposes, AT ; (iii) powers,
Ak ; and (iv) similarity transformations, T 1 AT . Property 6 states that similarity
114
Matrix Analysis
3
2
0
5
and
T =
1
2
2
1
H=T
AT =
7
2
4
1
One can show that the eigenvalues of A and H are both given by (A) = 3, 5. Similarity transformations are used extensively in matrix analysis because they allow one
to change the elements of a matrix and still maintain the same set of eigenvalues. This
type of transformations will be used in several applications such as diagonalization
and triangularization and are discussed in later sections.
Properties 7 and 8 give two interesting relationships between the elements of A
and the eigenvalues of A. First, the determinant of A is equal to the product of all
eigenvalues. This means that when A is singular, at least one of the eigenvalues must
be zero. Second, the sum of the eigenvalues is equal to the trace of A. Thus, for real
matrices, because the trace will be real, the eigenvalues will have to contain either
real values and/or complex conjugate pairs.
Properties 9 through 11 apply to Hermitian matrices and symmetric real matrices. These properties have important implications in real-world applications. For
instance, real-symmetric matrix operators, such as stress tensors and strain tensors,
are guaranteed to have real eigenvalues and eigenvectors. In addition, if the operator is symmetric positive definite, then its action as an operator will only stretch or
contract input vectors without any rotation. Property 10 gives another method for
determining whether a given Hermitian matrix is positive definite. It states that if the
eigenvalues of Hermitian A are all positive, then A is immediately positive definite,
that is, v Av > 0 for any v = 0. Furthermore, the eigenvectors of the Hermitian or
real-symmetric matrices are also guaranteed to be orthogonal.
For instance, let A be given by
8.5
1
A=
3.5
6
1
3.5
8.5
1
1
1
4
v=0.5
1
1
1
= 1 ; v=1 = 1 ; v=2 = 1
2
1
0
One can show that the eigenvectors form an orthogonal set. The ellipsoid shown in
Figure 3.8 is the result of A acting on the points of a unit sphere. The eigenvectors
are alo shown in the figure, and they form three orthogonal directions.
Property 12 guarantees that if the eigenvalues of A are distinct, then the set of
eigenvectors will be linearly independent. This means that the eigenvectors can be
used as a basis for the N-dimensional space. This also means that a matrix V , whose
columns are formed by the eigenvectors, will be nonsingular. This fact is used later
during the diagonalization of matrices.
115
1.5
1
0.5
z
0
-0.5
-1
-1.5
-1
-1
Finally, property 13 gives us a simple way to estimate the location of the eigenvalues without having to evaluate them exactly. To illustrate this property, let A be
given by
10
1
0
0
0
0
1 10
1
0
0
0
1
30
1
0
0
A=
0
0
1
5
1
0
0
0
0
1
1
1
0
20
20
15
Imag()
10
10
15
20
20
10
10
Real( )
20
30
116
Matrix Analysis
a0
a1 a2
an
an
a0 a1 an1
an1 an a0 an2
(3.25)
A=
.
..
..
..
..
..
.
.
.
.
a1
a2
a3
a0
They are used in Fourier analysis, and they appear during finite difference solutions
of problems with periodic boundary conditions. For instance, the following is a
circulant matrix:
1
2
1
A = 1 1
2
(3.26)
2
1 1
It is a normal matrix because
6
A A = 1
1
THEOREM 3.1.
1
6
1
1
1 = AA
6
where 1 , . . . , N are the eigenvalues of A and the columns U are the corresponding
orthonormal eigenvectors of A. Furthermore, a matrix will have orthonormal eigvectors if and only if it is normal.
PROOF.
3.5 Diagonalization
117
For instance, for the circulant matrix A given in (3.26), the Schur triangularization yields a unitary operator U given by
0.5774
U = 0.5774
0.5774
0.5185 + 0.2540i
0.0393 0.5760i
0.4792 + 0.3220i
0.2540 + 0.5185i
0.5760 0.0393i
0.3220 0.4792i
2
U AU = 0
0
0
2.5 0.866i
0
2.5 + 0.866i
0
3.5 Diagonalization
For some square matrices A, there exists a nonsingular matrix T such that the
similarity transformation T 1 AT will be a diagonal matrix .
T 1 AT =
0
..
=
.
N
v1
and
vN
= diag(1 , . . . , N )
=
..
.
1 v1
N vN
AV = V
(3.27)
2. Normal matrices. This class includes matrices that can have repeated roots.
Based on Theorem 3.1, one can find a unitary matrix U, such that U AU = ,
where U can be found using Schur triangularization.
3. Repeated eigenvalues that satisfy rank conditions.
118
Matrix Analysis
THEOREM 3.2.
rank(i I A) = N ki
(3.28)
Let
A=
1
0
0
0
2
0
2
0
1 0 0
0 0 0
1 I A = 1 0 0
2 I A = 1 1 0
0 0 0
0 0 1
Because (rank(1 I A) = 1 = N 2) and (rank(2 I A) = 2 = N 1), the
conditions of Theorem 3.2 are satisfied. This implies that A is diagonalizable.
For 1 = 2, we have two linearly independent eigenvectors,
0
0
v1 = 1
and
v2 = 0
0
1
and for 2 = 1, we have the eigenvector,
1
v3 = 1
0
Thus
0
V = 1
0
0
0
1
1
1
0
2
V 1 AV = 0
0
0
2
0
0
0
1
J =
J1
0
..
.
Jm
(3.29)
119
where J i is called the Jordan block, which is either a scalar or a matrix block of the
form
1
0
..
..
.
.
(3.30)
Ji =
..
. 1
0
The nonsingular matrix T such that T 1 AT = J is known as the modal matrix, and
the columns of the modal matrix are known as the canonical basis of A. Thus the
Jordan decomposition of A, or the Jordan canonical form of A, is given by
A = TJT 1
(3.31)
Note that if all the Jordan blocks are 1 1, (3.31) reduces to the diagonalization of A.
EXAMPLE 3.6.
3
0
A=
1
0
0
Using T given by
T =
0
0.7071
0
0.7071
0
0
0.5843
1.0107
0
0
we have
J =T
0
0
3
0
0
0
1
0
2
0
1
1
0
0
3
AT =
2
0
0
0
0
0
3
0
0
0
1.2992
1.2992
0.8892
0
0
0
0
1.2992
0
0
0
3
0
0
0
0
0
3
0
0
0
0
1
3
0
0
0
0
1
3
0.8892
1.1826
1.8175
0
1.2992
Note that there are three Jordan blocks even though there are only two distinct
eigenvalues.
Details for finding the modal matrix T for a Jordan decomposition is given in
Section C.3. However, existing methods for calculating modal matrices are not very
reliable for large matrices. One approach is to first reduce the problem by obtaining
the eigenvectors to diagonalize part of A, for example, those corresponding to unique
eigenvalues, leaving the rest for a Jordan decomposition.
Remarks: In MATLAB, the command for finding Jordan block is
[T,J]=jordan(A), which will yield the modal matrix T and Jordan block matrix
J = T 1 AT . We have also included a MATLAB function jordan_decomp.m,
which is available on the books webpage. The attached code allows the user to
120
Matrix Analysis
i xi
(3.32)
i=0
which is convergent for |x| < R. Then the function of a square matrix A defined
by
f (A) =
i Ai
(3.33)
i=0
is called a well-defined function if each eigenvalue has an absolute value less than
the radius of convergence R.
For instance, one of the most important functions in the solution of initial value
problems in ordinary differential equations is the exponential function exp (A) where
A is a square matrix. Then it is defined as the matrix version of exp (), that is,
1
1
exp(A) = I + A + A2 + A3 +
(3.34)
2
3!
Sometimes, it is not advisable to calculate the power series of a square matrix
directly from the definition, especially when convergence is slow and the matrices are
large. An alternative approach is to use diagonalization or Jordan canonical forms.
Another alternative is to use the Cayley-Hamilton theorem to obtain an equivalent
finite series. We now discuss each of these cases:
1. Case 1: A is diagonalizable. In this case, one can find T such that T 1 AT = ,
where is diagonal and contains the eigenvalues. We can then substitute the
factorization A = TT 1 into (3.33) to yield
f (A)
T (0 I + 1 + 2 2 + . . .)T 1
0
(0 + 1 1 + . . .)
1
..
T
T
.
0
(0 + 1 N + . . .)
0
f (1 )
1
.
..
T
(3.35)
T
f (N )
121
exp(A) = T
exp(1 )
0
..
.
0
exp(2 )
..
.
..
.
0
0
..
.
exp(n )
1
T
k
Ji =
1
..
.
..
..
k
i
=
.
..
1
0
i
0
[k,k1] k1
i
ki
..
.
..
.
[k,kn+1] ikn+1
[k,kn+2] ikn+2
..
.
ki
(3.36)
where,
k,j =
k!
(k j )!j !
if j 0
otherwise
[i,0] [i,1]
0
[i,0]
f (J i ) = 0 I + 1 J i + 2 J i2 + = .
..
..
..
.
.
0
[i,N1]
[i,N2]
..
.
(3.37)
[i,0]
where,
[i,j ] =
k+j [k+j,k] ki
k=0
1
(k + j )! k
1
=
k+j
i =
j!
k!
j!
k=0
d j f (i )
di
or
f (J i ) =
f (i )
0
..
.
f (1) (i )
f (i )
..
.
..
.
f (i )
where
f (k) () =
dk f ()
dk
(3.38)
122
Matrix Analysis
f (J 1 )
0
0
0
f (J 2 )
1
(3.39)
f (A) = T
T
..
..
.
..
..
.
.
.
0
0
f (J m )
EXAMPLE 3.7.
2
0
0
A = 1 2
0
4
2 2
0 0 1
2
1
0
T = 0 1 0 and J = 0 2
1
2 4 0
0
0 2
The function exp (A) can then be evaluated as follows:
1 2
e2 e2
e
2
1
e2
e2 T 1 = e2 1
exp (A) = T 0
2
0
0
e
0
1
2
0
0
1
3. Case 3: Using finite sums to evaluate matrix functions. We first need to state an
important theorem known as the Cayley-Hamilton theorem:
THEOREM 3.3.
given by
charpoly() = a0 + a1 + + aN N = 0
(3.40)
then matrix A will also satisfy the characteristic polynomial, that is,
charpoly(A) = a0 I + a1 A + + aN AN = 0
PROOF.
(3.41)
Using the Cayley-Hamilton theorem, we can see that AN can be written as a linear
combination of Ai , i = 0, 1, . . . , (N 1).
AN =
1
(a0 I + + an1 AN1 )
aN
(3.42)
123
=
=
=
1
(a0 A + + an1 AN )
aN
'
(
1
1
a0 A + + aN1
(a0 I + + an1 AN1 )
aN
aN
0 I + 1 A + + n1 AN1
(3.43)
We can continue this process and conclude that AN+j , j 0, can always be recast as
a linear combination of I, A, . . ., AN1 . Applying this fact to 3.33, we can conclude
that
f (A) = c0 I + c1 A + + cN1 AN1
(3.44)
for some coefficients c0 , . . . , cN1 . What remains is to determine N linearly independent equations that would yield the values of these coefficients.
Because the suggested derivation of (3.44) was based on the characteristic polynomial, this equation should also hold if A is replaced by i , an eigenvalue of A. Thus
we can get m linearly independent equations from the m distinct eigenvalues:
f (1 )
f (m )
=
..
.
=
c0 + c1 1 + + cn1 N1
1
c0 + c1 m + + cn1 N1
m
(3.45)
For the remaining equations, we can use [dq f ()/dq ]=i , q = 1, ..., ri , where ri is
the multiplicity of i in the spectrum of A, that is,
dq f ()
dq
N1
=
(c0 + c1 + + cn1
)
(3.46)
q
q
d
d
=i
=i
After obtaining the required independent linear equations, c0 , . . . , cn1 can be calculated and used in (3.44).1
EXAMPLE 3.8.
Let
2
A= 0
1
0
3
0
0
1
3
Note that in cases in which the degree of degeneracy of A introduces multiple Jordan blocks
corresponding to the same eigenvalue, this method may not yield n linearly independent equations.
In those cases, however, there exists a polynomial of lower order than the characteristic polynomial
(called minimal polynomials) such that the required number of coefficients will be equal to the
number of linear equations obtained from the method just described.
124
Matrix Analysis
0.1353
0
exp(A) = c0 I + c1 A + c2 A2 = 0.0358 0.0498
0.0855
0
0
0.0498
0.0498
where w .
EXAMPLE 3.9.
k
0
a
k
Ak =
if a = c
k
a c
k
b
c
ac
or
k
0
a
if a = c
Ak =
kbck1 ck
T
Thus, for any nonzero vector v = v1 v2
, we have w(k) = Ak v =
T
w1 (k) w2 (k) , where
k
a ck
b
v1 + ck v2
if a = c
k
a
c
w1 (k) = a v1 and w2 (k) =
if a = c
kbak1 v1 + ak v2
If both |a| < 1 and |c| < 1, we see that w1 () = w2 () = 0 and conclude that
A is an asymptotically stable operator. However, if either |a| > 1 or |c| > 1, then
w1 (k) or w2 (k), or both, will become unbounded as k , thus A will be an
unstable operator.
However, if either a = 1, |c| 1 or c = 1, |a| 1, A will still be stable,
but not asymptotically stable, because the limit may not exist. If a = c with
|a| = 1 and b = 0 then w2 (k) will be unbounded.
125
As Example 3.9 shows, there are several cases that can lead to either asymptotically stable, stable but not asymptotically stable, or unstable operators. However, a
sufficient condition for asymptotically stable matrix operator was simple to check,
that is, if |a| < 1 and |c| < 1. Likewise, a sufficient condition for instability is also
simple to check, that is, if |a| > 1 or |c| > 1, we have instability. For the general
case, we can also find these sufficient conditions for stability by testing whether the
spectral radius is less than or equal to 1.
Definition 3.8. Given a matrix operator A[=]N N having eigenvalues
1 , . . . , N , the spectral radius of A, denoted by (A), is defined as
(3.47)
(A) = max i
i=1,...,N
We then have the sufficient conditions for stability and instability of matrix
operators:
THEOREM 3.4.
THEOREM 3.5.
To prove these theorems, we can use the Jordan block decomposition of A, that
is,
A = TJT
where
J =
J1
0
..
JM
Ji =
1
i
..
..
1
i
J1
0
1
..
Ak = T
T
.
k
JM
where
J ik
ki
ki
..
.
..
.
..
.
ki
with representing elements having the form with some constant and < k.
Thus if any |i | > 1 for some i, that is, (A) > 1, then Ak will be unstable. However,
if |i | < 1 for all i, that is, (A) < 1, then Ak will converge to a zero matrix.
In Section 2.4, the stationary iterative methods for solving linear
equations Ax = b required splitting the matrix A as A = A1 + A2 to obtain an
iterative method given by,
EXAMPLE 3.10.
x(k+1) = Hx(k) + c
(3.48)
126
Matrix Analysis
1
where H = A1
1 A2 and c = A1 b. (For the Jacobi method, we chose A1 as the
diagonal of A, whereas for the Gauss-Siedel method, we chose A1 to be the
lower triangular portion of A. )
At the kth iteration of (3.48),
x(k+1) = Hk+1 x(0) + I + H + + Hk c
x, where
With (H) < 1, we have lim Hk = 0, and lim x(k+1) =
k
x= I+
i=1
The infinite series I + H + H2 + , also known as the Von Neumann series,
is convergent if (H) < 1. Furthermore, this series can be shown to be equal to
(I H)1 , if the inverse exist, that is,
(3.49)
(I H)1 = I + H + H2 +
One can show the equality in (3.49) by simply multiplying both sides by (I H).
Using a Jordan canonical decomposition of H = TJT 1 ,
I H = T (I J ) T 1
and because I J is triangular,
det (I H) =
N
(1 )
=1
which is nonzero if (H) < 1. This means that if (H) < 1, we can use (3.49)
1
together with A = A1 + A2 , H = A1
1 A2 , and c = A1 b to show that the iterative process converges to
2
x
(I H)1 c
1
A
A1
I + A1
2
1
1 b
(A1 + A2 )1 A1 A1
1 b
A1 b
1
which is the desired solution of Ax = b. (Note: The closer A1
1 is to A , i.e., the
closer A2 is to 0, the faster the convergence.)
We could also extend the sufficient stability criterion of matrix operators to matrix
functions, that is,
THEOREM 3.6.
f (A) is asymptotically stable if ( f (A)) = maxi f (i ) < 1.
THEOREM 3.7.
f (A) is unstable if ( f (A)) = maxi f (i ) > 1.
Although the determinant could get very small if several eigenvalues have magnitudes close to 1
and/or N is large.
f (J 1 )
0
1
..
f (A) v = T
T v = v
.
0
f (J M )
f (J 1 )
0
1
..
= T 1 v
T v
.
0
f (J M )
Because f (J i ) are all triangular, we have the diagonal elements as the eigenvalues,
that is,
= f (1 )k1 , f (N )kN
Thus we can apply Theorems 3.4 and 3.5 to these eigenvalues to obtain Theorems 3.6
and 3.7, respectively.
Let f (A) = exp (A). A sufficient condition for stability is given by
maxi ei < 1, which is equivalent to having the real part of i be negative,
for
all i. On the other hand, if any i has a positive real part, then maxi ei > 1.
EXAMPLE 3.11.
Remarks: Note that in this example, we have still defined stability based on integer
powers of matrix operators, that is, the boundedness of w(k) = (exp A)k v for some
v < . We will attach an equivalent criterion for exp (tA) in section 6.6, with t
being a continuous parameter. Fortunately, the sufficient stability and instability
criterion ends up being the same.
127
128
Matrix Analysis
For any matrix A[=]N M, there exists a decomposition called the singular
value decomposition,
A = UV
(3.51)
where U[=]N N and V [=]M M are unitary, and [=]N M is a diagonal matrix
1
0
..
0[r,Mr]
= 0
0[Nr,r]
0[Nr,Mr]
1 2 r are the nonzero singular values of A, and r will be the rank of A. Details
of the SVD algorithm can be found in Section C.4.2 as an appendix.
Remarks: In MATLAB, the command [U,S,V]=svd(A) can be used to obtain
the diagonal matrix S containing the singular values in descending order and unitary
matrices U and V such that A = USV . We have also included a MATLAB code
that is available on the books webpage as svd_alg.m and applies the algorithm
described in Section C.4.2.
AGA = A
GAG = G
(AG) = AG
(GA) = GA
11
0
..
0[r,Nr]
1
(3.52)
1 =
0[Mr,r]
3
0[Mr,Nr]
The terms generalized inverse and pseudo-inverse are synonymous. Nonetheless, we just prefer
to set the term pseudo-inverse aside for A to invoke the use of the formula (A A)1 A .
129
(3.53)
is a generalized inverse of A.
Thus, for Ax = b, the Moore-Penrose solution is given by
x = A
M b = V
1 U b
EXAMPLE 3.12.
Let
1
A= 4
7
2
5
8
3
6
9
10
b = 25
40
0.2148
0.8872 0.4082
0.4797 0.7767
0.4082
U = 0.5206
0.2496
0.8165 V = 0.5724 0.0757 0.8165
0.8263 0.3879 0.4082
0.6651
0.6253
0.4082
0.0594
0
0
16.8481
0
0
=
0
0.93600
0
1.0684 0
1 =
0
0
0
0
0
0
Then,
A M
0.6389
= 0.0556
0.5278
0.1667
0.0000
0.1667
0.3056
0.0556
0.1944
1
5
x = A
M b = 1
3
1
To check, we can see that, indeed, we have Ax = b. Recall that this equality
is possible because it still fulfils the rank condition, that is,
rank A b = rank A = 2
If the rank conditions are not met, equality of Ax = b will not exist. However, the
Moore-Penrose solution will yield a least-squares solution.4
EXAMPLE 3.13.
Let
A=
3
5
2
4
6
0.2298
0.8835
0.4082
U = 0.5247
0.2408 0.8165
0.8196 0.4019
0.4082
4
V =
0.6196
0.7849
0.7849
0.6196
130
Matrix Analysis
9.5255
=
0
0
0
0.5143
0
1 =
0.1050
0
0.3333
0.3333
0
1.9444
0.6667
0.4167
0
0
Ve
U ,1
U ,M
1,
..
.
M,
(3.54)
(3.55)
(3.56)
Note that e will now be a square matrix. In this case, (U e ) U e = I[M] but
(U e ) (U e ) = I[M] . This loss of full unitary property is not important for the leastsquares solution. What is gained is a significant reduction in storage, because in most
least-squares problems, the amount of data can be large, whereas the dimension of x
may be much smaller, that is, N M. More importantly, for N > M, the generalized
inverse can be obtained as
A
M = V
1 U = V e (e )
1 (U e )
(3.57)
Recall Example 3.13. Instead of the standard SVD decomposition, we can obtain the reduced SVD version. For
1 2
A= 3 4
5 6
0.2298
0.8835
0.6196 0.7849
0.2408 V e =
U e = 0.5247
0.7849
0.6196
0.8196 0.4019
9.5255
0
0.1050
0
e
e
1
=
( )
=
0
0.5143
0
1.9444
EXAMPLE 3.14.
0.3333
0.3333
131
0.6667
0.4167
which is exactly the same as the one obtained using the standard SVD.
A,k
= A,k k ...
1
k =
N
1
ai,k
N
i=1
Matrix
A is known as the mean-adjusted data. This is because we are looking only
for rotations of coordinates centered at the mean of the data cluster.
Afterward, the reduced singular-value decomposition is applied to
A,
A = U e e (V e )
The columns of V e immediately represent the unit vectors of the new coordinates,
that is, the desired transformation is given by
x = V e x. One can then use the values
e
of the singular values in to decide the level of dimensional reduction depending
on the ratio of singular values, for example, if ( /1 ) < , then set K = 1 and
reduce the dimensions by M K.
Having found K, the data can now be projected to the reduced spaces. To do so,
we extract the first K columns of V e and obtain our projection operator (cf. (3.9)) as
V
PV e = V
(3.58)
132
Matrix Analysis
where
= (V e )[1,...,K]
V
This will project the data onto the K subspace. If we wish to rotate the data to the
new coordinate system, we need V e one more time. In summary, the projected data
under the new coordinates can be found by
Anew =
A (PV e V e )
(3.59)
5.2447
0
0
0.6446 0.7632
0.0444
and V = 0.4448 0.4217 0.7901
=
0
2.4887
0
0
0
0.1476
0.6218 0.4896 0.6113
EXAMPLE 3.15.
Based on the singular values, it appears that the last dimension can be removed
because it is significantly smaller than the other singular values. Thus we can
take the first two columns of V and obtain the projection operator PV as
(3.60)
where R is unitary and S is a Hermitian positive semidefinite matrix.5 The decomposition (3.60) separates the action of A into two sequences: the first is a stretching
operation by S, and the second is a unitary operation by R, which is usually a rotation
or reflection.
The classic approach to polar decomposition is to first find S such that S2 = A A,
that is, S is the positive square root of A A.
Definition 3.11. A matrix B[=]N N is a square root of A[=]N N if B2 = A.
If in addition, B is positive semidefinite, then B is the positive square root of A,
that is, the eigenvalues of B are all non-negative.
To obtain the square root of A A, we note that A A is Hermitian, and thus the
normalized eigenvectors should form an orthonormal set. Using (3.35),
1
0
..
S=V
(3.61)
V
.
0
N
5
The factorization is called the polar decomposition to imply a generalization of the scalar case
z = rei , where r is a positive real number and exp(i) describes a rotation.
133
0.5
0.5
1
1
Figure 3.10. Using SVD to locate the principal components in the original mean-adjusted data (top
plot). The projected data were then rotated (bottom plot).
0.5
0.5
0.5
z
0
0.5
1
1
1 1
and
S = VV
Then,
RS = (UV ) (VV ) = UV = A
(3.62)
134
Matrix Analysis
2
1.5
0.5
x2
Sv
2
1
1.5
0.5
Sv
x1
EXAMPLE 3.16.
can be found to be
1.25
S=
0.75
0.75
1.25
and
R=
0.866
0.500
0.500
0.866
Figure 3.11 shows how the points of a unit circle are stretched by S. The eigenvalues of S are 1 = 2 and 2 = 0.5 with the corresponding eigenvectors
0.7071
0.7071
v2 =
v1 =
0.7071
0.7071
Along v1 , the points are stretched to twice it original size, whereas along v2 ,
the points are compressed to half its original size. After S has deformed the
unit circle into an ellipse, R will rotate it by an angle = tan1 (0.5/0.866) = 30o
counterclockwise, as shown in Figure 3.12.
1.5
0.5
x2
RS v
0.5
1.5
Sv
x1
135
A 0
A = || A
A + B A + B
A = 0 only if A = 0
I = 1
Ax A x
AB A B
Positivity
Scaling
Triangle Inequality
Unique Zero
Norm of Identity
Product Inequality 1
Product Inequality 2
(3.63)
where
S is a Hermitian positive definite matrix, in which
S is the square root of AA ,
and R is unitary. In this case, the operation will involve a unitary operation followed
by a stretching operation. Note that in general,
R = R and
S = S. The procedure for
finding the dual polar decomposition is included as an exercise in E3.19.
(3.64)
Ax
= max Ax
x=1
x
(3.65)
The Frobenius matrix norm can be used to measure whether a matrix is close
to being a zero matrix. As an extension, it can also be used to measure whether a
matrix is close to being a diagonal matrix. For instance, let E = (A diag (A)); then
EF = 0 only if A is diagonal.6
For the most part, however, we focus on induced matrix norms. The properties
of the induced norm are given in Table 3.1, most of which carry over from the
properties of vector norms. Specifically, using the Euclidean norm x2 , we have the
induced Euclidean norm of matrix A, denoted as A2 , and evaluated as
"
(3.67)
This norm can be used to determine whether the iterated QR method has achieved diagonalization
of a positive definite matrix as contained in the SVD algorithm discussed in Section 3.9.
136
Matrix Analysis
or
A max |i | = (A)
For the special case in which A is Hermitian, we have A = (A).
Based on matrix norms, another useful characterization of matrix A is the condition number,
Definition 3.13. The condition number of A is defined as
(A) =
max
v,w=1
Av
1
Aw
(3.68)
maxi |i |
max
=
(A) = max
=
v,w=1
min
w A Aw
min j j
v A Av
(3.69)
where i is the ith eigenvalue of A A, whereas max and min are the maximum and
minimum singular values of A, respectively.
EXAMPLE 3.17.
As shown in Example 3.16, A will transform the unit circle into a rotated
ellipse. The points of the ellipse are bounded by two circles: an enclosing circle Cmax with radius (Cmax ) and an enclosed circle Cmin with radius (Cmin ),
that is,
y = Ax
(Cmin ) y (Cmax )
These circles are shown together with the ellipse in Figure 3.13. In our case,
(Cmin ) = 0.5 and (Cmax ) = 2, which turn out to be the smallest and largest
singular values of A, respectively. The condition number, (A), is then given by
the ratio of Cmax to Cmin , or (A) = 4.
From Example 3.17, we see that the condition number is an indication of the
shape of the resulting ellipse (or hyper-ellipsoids). A large condition number would
indicate a flatter ellipse. In the extreme case, an infinite condition number means
that an N-dimensional vector will lose at least one dimension after the operation by
A. For nonsingular matrices,7 the condition number can be shown to be
(A) = A1 A
7
(3.70)
The requirement for nonsingularity of A is needed because A1 does not exist for singular A.
However, if we set A1 = for nonsingular A, then (3.70) can be applied to singular matrices.
137
x2
Figure 3.13. The enclosing and enclosed circles for the
ellipse generated by transforming a unit circle by A.
min
Cmax
2
x1
The condition number can be used to predict how roundoff errors can affect
the solution of linear equations, Ax = b. When the conditions numbers of A are
too large, the equation is classified as ill-conditioned. In these cases, the solutions
become unreliable. To see this, suppose b is perturbed by
b; then the solution
for x will also be perturbed by
x. Assuming the unperturbed solution is given by
x = A1 b, then applying the product inequality properties given in Table 3.1,
x +
x
A1 (b +
b)
A1
b
x
A1
b
and
b
b
A
x
(3.71)
Equation (3.71) gives an upper bound of the relative error in x for a given relative
error in b. Thus the solution x +
x is unreliable if the condition number is very
large.
Consider
46
A = 8326
2529
EXAMPLE 3.18.
8261
2454
5761
43 4927
b = 13210
5076
2
7607
T
The solution is given by x = 1 1 1
. If b is perturbed slightly by
0.6
b
b =
0
= 6.1365 105
b
0.8
138
Matrix Analysis
2.5018
x +
x = A1 (b +
b) = 0.2526
1.5444
One source of ill-conditioning is when the spread of singular values are so wide.
In some cases, rescaling the unknown variables could reduce the condition number,
but only within some limits. Another source for ill-conditioning is the proximity of A
to a singularity condition, that is, when the minimum singular value is close to zero.
In this case, one could put a threshold on singular values by replacing k with zeros
if k < . Afterward, the Moore-Penrose generalized inverse can be applied to solve
the linear equation.
In summary, because the linear equations obtained from real-world applications
may often contain data corrupted with measurement errors or roundoff errors, one
should generally consider using the condition number to check whether the solutions
are reliable.
3.12 EXERCISES
1 2 3 4
cos() sin()
0
4 1 2 3
A = sin()
cos()
0
B=
3 4 1 2
0
0
cos()
2 3 4 1
2
cos() sin()
C = I ww
sin()
cos()
w w
E3.2. Determine which of the following statements are always true (Show examples
or counterexamples):
1.
2.
3.
4.
E3.3. Find a single operator that would first rotate a point 30o counterclockwise
around the z-axis then 30o clockwise around the x-axis. Verify the operators
found by using sample vectors and plot the vectors before and after the
operations.
E3.4. Let v and w be real vectors of length N and M, respectively.
1. By treating wT as an operator, what class of vectors are allowed as inputs
to wT ? Describe the outputs of this operator.
3.12 Exercises
139
2. Define A = vwT as the matrix dyadic operator. What class of vectors are
allowed as inputs to A ? Describe the outputs of the operator.
E3.5. The curve shown in Figure 3.14 is generated using the data in Table 3.2.
1. Find an affine operator that would rotate the curve by radians counterclockwise around the point (x, y) = (a, b). Test this operator on the data
given in Table 3.2 with (a, b) = (4, 1.5) and = /2.
2. Find an affine operator that would reflect the curve along a line that
contains points (x1 , y1 ) and (x2 , y2 ). Test this operator on the data given
in Table 3.2 with (x1 , y1 ) = (0, 1) and (x2 , y2 ) = (10, 3).
4
x
Table 3.2. Data for curve shown in Figure 3.14
x
0.0
0.5
1.0
1.5
2.0
2.5
3.0
0.0000
0.0615
0.2348
0.4882
0.7761
1.0484
1.2601
3.5
4.0
4.5
5.0
5.5
6.0
6.5
1.3801
1.3971
1.3205
1.1776
1.0068
0.8494
0.7399
7.0
7.5
8.0
8.5
9.0
9.5
10.0
0.6999
0.7337
0.8283
0.9581
1.0903
1.1939
1.2459
E3.6. Let P be a projection operator. What determinant values are possible for P?
E3.7. Find the projection operator for the space spanned by
1
2
0
1
v1 =
and
v2 =
3
3
2
1
E3.8. An N N companion matrix (also known as a Frobenius matrix) has the
following form:
0
1
0
0
0
1
..
..
..
..
C = ...
.
.
.
.
0
0
0
1
0
N1
10
140
Matrix Analysis
n1
i i = 0
i=0
1
i
vi =
..
N1
i
E3.9. Consider the N N tri-diagonal matrix given by
a b
0
.
.
..
..
c
TN =
..
..
. b
.
0
c
a
= a2 4bc = 0 is given by
det (T N ) =
(a +
)N+1 (a
)N+1
2N+1
k = 1, 2, . . . , N
(3.72)
(3.73)
then det (T N ) = 0. (Hint: First show that for T N = 0, an equivalent
condition becomes
2(N+1)
!
q + q2 1
=1
Then use de Moivres formula for the roots of unity to find the value of q,
i.e., given zM = 1, the Mth roots are given by
2ik
z = exp
k = 1, 2, . . . , M 1
M
where i = 1.)
2. Using the previous result, show that the eigenvalues of the tri-diagonal
matrix T N are given by
k
k = a + 2
k = 1, 2, . . . , N
bc cos
N+1
Verify this formula for N = 5, with a = 3, b = 1 and c = 2.
E3.10. Let be an eigenvalue of A. Show that any nonzero column of adj ( I A)
is an eigenvector of A corresponding to . Verify this for the case in which
A is given by
2 0 0
A= 2 3 0
2 3 4
3.12 Exercises
141
E3.11. For a given 3D real vector v = (a, b, c)T , the matrix cross-product operator,
often denoted by v is defined as
0 c
b
v = c
0 a
(3.74)
b
a
0
1. Let a = b = c = 1. Now collect M points of a closed flat circle as a matrix
X, where
2(k1)
cos M1
X = x1 xM
and xk =
2(k1)
sin M1
0
Then obtain B = v X and plot each column of B as points in a 3D graph.
Observe that the resulting ellipse will be perpendicular to V.
2. Show that the operator v is skew-Hermitian. Obtain the eigenvalues
and thus verify that Property 9 of eigenvalues given in Section 3.3 holds,
that is, that the eigenvalues are either zero or pure imaginary. Obtain
the eigenvalues and their corresponding eigenvectors for the case where
a = b = c = 1.
3. Let a = 0.5, b = 0.5 and c = 1. Find the polar decomposition
of v = RS
where R is orthogonal and S is the square root of vT v . Describe the
behavior of the operator S on any input x to v . Also describe how R will
affect vectors (Sx).
E3.12. Determine which matrices below are diagonalizable and which are not. If
diagonalizable, obtain matrix T that diagonalizes the matrix; otherwise,
determine the modal matrix T that produces a similarity transformation
to the Jordan canonical form.
1 2
0
2
2
1 2
0
a) A =
0
2
1 2
2
0
2
1
4 1 2
b) B = 1 3 1
1 0 2
1 1 1
c) C = 0 2
0
0 0
2
E3.13. Let matrix Ci [=]Ni Ni have the companion form (cf. (3.23)),
0
1
0
0
0
1
..
..
..
..
..
Ci =
.
.
.
.
.
0
0
0
1
0,i
1,i
2,i
(Ni 1),i
142
Matrix Analysis
Now let A be a block triangular matrix having Ci as the ith block diagonal,
that is,
C1
0
A = ...
CK
Write down the characteristic equation for A having this structure.
E3.14. Let matrix A be given by
26 0
0
A = 68 45 0
12 20 10
Evaluate the functions below using: (a) series expansion (3.33), (b) diagonalization, and (c) finite sums.
1. cos (A)
2. B = A1/2 , that is, such that B2 = A
1 0 0
A = 1 21 0
1 1 2
E3.16. Let A be given by
2
0
0
0
0
1 2
0
0
0
2 2
0
0
A= 5
6 2
0 2
0
1
1
1
1 2
Evaluate the functions below using: (a) Jordan decomposition and (b) finite
sums.
1. cos (A)
2. sin (1.5A)
3. exp exp (A)
E3.17. Prove or disprove the following claim: For any square matrix A,
det (exp A) = etrace(A) . (Hint: Use the Jordan decomposition of A and properties of determinants of triangular matrices.)
E3.18. Determine whether the following matrix operators and matrix function operators are stable or unstable: A, (A + I), exp (A), and exp (A + I) where
2
0
0
A= 1
2.5
0
1
1
2
for the following cases: = 0.1, 0.5, and 2.8.
E3.19. Find the dual polar decomposition, A =
S
R, for
1 3 4
A= 4 5 5
0 2 0
where S is Hermitian positive semidefinite matrix square root of AA , and
R
is unitary.
3.12 Exercises
143
E3.20. Show that each of the eigenvalues of matrix A of size N must lie in at least
one of N circles known as the Gershgorin circles centered at akk and given by
N
| akk |
kth circle :
|aki |
i=1,i=k
|akk | >
|aki |
i=1,i=k
Using the result about Gershogorin circles given in the previous exercise,
show that a diagonally dominant matrix is nonsingular.
E3.22. Let v be an eigenvector matrix A. Show that the corresponding eigenvalue
for v can be obtained by using the Rayleigh quotient defined as
v Av
(3.75)
v v
E3.23. Use the power method (cf. Section C.2.2) to find the dominant eigenvalue of
4 2 3
A = 3 0 2
1 1 4
=
E3.24. Let A have distinct eigenvalues; any vector w can be represented as a linear
combination of the eigenvectors, that is,
w=
N
i vi
i=1
N
i=1
i i vi
1 1 v1 +
N
j =2
j
j vj
1
Multiply this equation by Ak and use Property 5 of Section 3.3 to show that the
power method will approach 1 k+1 v1 (or V1 if normalization is performed)
as k . This is the basis for the power method.
E3.25. Consider the data given in Table 3.3 and plotted in Figure 3.15. We want to
use principal component analysis to obtain a 2D curve in 3D space.
1. Obtain the mean-adjusted data matrix
A and take the reduced singular
value decomposition of A, that is, A = UV
2. Obtain the projection matrix PV that would project the mean-adjusted
data onto the space spanned by the first two columns of V . (This assumes
that the last column is attached to the smallest singular value).
144
Matrix Analysis
Table 3.3. Raw data set for the principal component analysis
x
0.0104
0.0288
0.0311
0.0449
0.0657
0.0703
0.1048
0.0541
0.1944
0.2558
0.3085
0.3816
0.4371
0.4898
1.3846
1.0956
0.7672
0.6511
0.3595
0.1628
0.1700
0.1302
0.2131
0.2615
0.3145
0.4090
0.4666
0.5288
0.5161
0.5482
0.5629
0.6096
0.6096
0.5629
0.5512
0.2402
0.2274
0.2011
0.4074
0.4178
0.7435
0.9040
0.6302
0.7523
0.7915
0.9021
0.9389
0.9597
0.9873
0.5395
0.5395
0.5892
0.6213
0.7295
0.7646
0.9342
1.1170
1.4258
1.2890
1.2737
1.0864
1.1498
0.5464
1.5
z
1
0.5
0
1
0.5
Figure 3.15. Plot of raw data for the principal component analysis.
0.5
0.5
yt
0.5
1
1
0.5
xt
0.5
3.12 Exercises
145
counter-clockwise using Rccw(60 ) . This will then make it possible for the
nonlinear regression to be applied to the rotated curve. Let
C = B,1 B,2 RTccw(60 )
and let x f and y f represent the first and second columns of C. Obtain a
fifth-order polynomial regression model for the rotated curve, that is,
y f = a0 + a1 x f + a2 x2f + a3 x3f + a4 x4f + a5 x5f
5. By reversing the operations, we can transform the regression curve y f =
y f (x f ) back to be in terms of the original variables. Thus perform these
operations on the regression curve to the curve shown in Figure 3.17.
1.5
z 1
Figure 3.17. Plot of the data together with
the regression curve in the original space.
0.5
0
1
1
0.5
0
0 1
E3.26. Find the least-squares solution of the following equation using the SVD
decomposition using a tolerance level of 104 (i.e., set J k = 0 if J k 104 )
and the reduced SVD decomposition:
0.7922
0.6555
0.1367
2.5583
3.7049
0.9595
0.1712
0.7883
1.9351
0.6557
0.7060 0.0503
0.1488
0.0357
0.0318
0.0039
0.8491
0.2769
0.5722
x = 3.1600
0.9340
0.0462
0.8878
3.7059
0.6787
0.0971
0.5816
2.6256
0.7577
2.2373
0.8235 0.0657
0.7431
2.3301
0.6948
0.0483
0.3922
0.3171
0.0751
1.2753
E3.27. With the Frobenius norm of A[=]N M defined as
#
$ N M
$ 2
aij
AF = %
i=1 j =1
Show that another method can be used to evaluate AF , for example,
"
AF = trace (A A)
PART II
The next two chapters contain a detailed discussion of vector and tensor analysis.
Chapter 4 contains the basic concepts of vectors and tensors, including vector
and tensor algebra. We begin with a description of vectors as an abstract object
having a magnitude and direction, whereas tensors are then defined as operators on
vectors. Several algebraic operations are summarized together with their matrix representations. Differential calculus of vector and tensors are then introduced with the
aid of gradient operators, resulting in operations such as gradients, divergences, and
curls. Next, we discuss the transformations of rectangular coordinates to curvilinear
coordinates, such as cylindrical, spherical, and other general orthogonal coordinate
systems.
Chapter 5 then focuses on the integral calculus of vectors. Detailed discussions
of line, surface, and volume integrations are included in the appendix, including the
mechanics of calculations. Instead, the chapter discusses various important integral
theorems such as the divergence theorem, the Stokes theorem, and the general Liebnitz formula. An application section is included to show how several physical models,
especially those based on conservation laws, can be cast in terms of tensor calculus,
which is independent of coordinate systems. The models generated are generally in
the form of partial differential equations that are applicable to problems in mechanics, fluid dynamics, general physico-chemical processes, and electromagnetics. The
solutions of these models are the subject of Part III and Part IV of the book.
147
In this chapter, we work with objects that possess a magnitude and a direction.
These objects are known as physical vectors or simply vectors.1 There are two types
of vectors: bound vectors, which are fixed to a specified point in the space, and free
vectors, which are allowed to move around in the space. Ironically, free vectors are
often used when working in rigid domains, whereas bound vectors are often used
when working in flowing or flexible domains. We mostly deal with bound vectors.
We denote vectors with underlined bold letters, such as v, and we denote scalars
with nonunderlined letters, such as , unless otherwise noted. Familiar examples
of vectors are velocity, acceleration, and forces. For these vectors, the concept of
direction and magnitude are natural and easy to grasp. However, it is important to
note that vectors can be built depending on the users interpretation and objectives.
As long as a magnitude and direction can be attached to a physical property, then
vector analysis can be used. For instance, for angular velocities of a rigid body, one
needs to describe how fast the rotation is, whether the rotation is counterclockwise
or clockwise, and where the axis of rotation is. By attaching an arrow whose direction
is along the axis of rotation, whose length determines how fast the rotation is, and
pointing in the direction consistent with a counterclockwise or clockwise convention,
the angular velocity becomes a vector. In our case, we adapt the right-hand screw
convention to represent the counterclockwise direction as a positive direction (see
Figure 4.1).
We begin in Section 4.1 with the description of fundamental vector operations.
The definitions of operations such as vector sums and different type of products,
including scalar, dot, cross, and triple, are done in a geometric sense, that is, based
only on measurements of distance and angles. Later, we introduce unit basis vectors
such as x , y , and z in the rectangular coordinates pointing in the x, y, and z
directions, respectively. When vectors are represented as linear combinations of the
basis unit vectors, an alternative set of efficient calculations can be achieved. These
basis vectors are also used to define an operator known as a tensor. Moreover,
vectors and tensors can also be represented by matrices. In doing so, we can take
advantage of various matrix properties and apply matrix analysis and computation
1
Physical vectors can also be represented by matrix vectors. We defer this matrix representation until
Section 4.3.
149
150
to handle vector operations; that is, the concepts of eigenvalues, eigenvectors, polar
decomposition, diagonalization, and so forth can be applied to the physical vectors
and tensors.
In Section 4.5, we discuss the derivatives of vectors that are dependent on one
variable. The derivative of such a vector is different from that of a scalar function
because vectors have the additional property of directions. Furthermore, the results
of the derivatives are also vectors. The derivative of sums and different types of
products can also be obtained. Due to its ubiquitous importance, we briefly apply
the derivatives to position vectors r and its various derivatives, for example, the
velocity vector and acceleration vector, including other related items such as tangent vectors, normal vectors, binormal vectors, curvature, and torsion. Then, in
Section 4.7, we discuss the differential calculus of vector fields, where a distribution of vectors are specified at different locations of the 3D space. This includes
differential operations such as the gradient, divergence, curl, and Laplacian, among
others.
Finally, in Sections 4.8 and 4.9, we discuss alternative coordinate systems, namely
cylindrical, spherical, and general orthogonal coordinate systems. These are important coordinate systems to consider because in several real-world applications, the
boundaries are cylindrical (e.g., pipe flow) or spherical (e.g., heat transfer from a
spherical surface). Starting with the basic transformation rules between the coordinates, we can generate relationships with those in the rectangular coordinate systems.
Unfortunately, for differential operators, changing the representations based on the
other coordinate systems tends to be more complicated. Nonetheless, the formulas
for gradient, divergence, curl, and Laplacian can still be generated in a straightforward manner.
where
nv =
1
v
v
(4.1)
Here, nv is the normalized unit vector of v, and it will not have any physical units
attached to it.
Notation
Procedure
Addition
c=a+b
Norm
= v
Scalar Product
w = v
Direction of w = direction of v; w = v
Dot Product
=uv
Cross Product
c=ab
Triple Product
= c (a b)
= [c a b]
151
152
Let c be the sum of vectors a and b; then c is also called the resultant, and c
should be in the plane containing both a and b. Furthermore, a, b, and c should
all have the same physical units. This is necessary for the following cosine law to
apply,
c
= a
+ b
2 a
b cos ( )
(4.2)
where is the angle between a and b. Also, the angle between b and c is given by
1
= sin
a
c
sin()
(4.3)
There are four types of vector products: scalar, dot, cross, and triple products.
The resulting products will most likely not be the same as factored vectors, both in
units or in meaning. Thus care must be taken when plotting different type of vectors
in the same 3D space.
The properties for the vector operations are given in Table 4.2. The table is
grouped based on the operations involved (i.e., sums, scalar products, dot products,
cross products, triple products, and norms). Most of the properties for the sums,
scalar products, and dot products are similar to the vectors of matrix theory. On the
other hand, the properties of cross products are quite different. First, we see that cross
products are anti-commutative; that is, the sign is reversed if the order of the vectors
are interchanged. Next, the parallelism property states that two nonzero vectors are
parallel to each other if their cross product is the zero vector.2 Finally, cross products
are not associative. This means that the use of parentheses is imperative; otherwise,
the operation will be ambiguous.
The properties of both the sums and scalar products show that the space of
physical vectors satisfies the conditions given in Table B.3 for a linear vector space.
This means that the properties and definitions attached with linear vector spaces
are also applicable to the space of physical vectors. These include the definitions of
linear combination, linear independence, span, and dimension, as given in Table B.2.3
Thus
)
*
1. A set of vectors, v1 , . . . , vn are linearly independent if the only possible linear
combination that results in a zero vector,
1 v1 + + n vn = 0
is when 1 = = n = 0.
)
*
2. A set of linearly independent vectors, V = v1 , . . . , vn is a basis for a space S,
if S is the span of the vectors of V .
3. The dimension of a subspace S is the number of linearly independent vectors
that would span S.
This is a dual property to orthogonality, in which two nonzero vectors are orthogonal if their dot
product is zero. However, one should note that the parallel property results in a vector, whereas the
orthogonality property results in a scalar.
In fact, the abstract linear space is the generalization of the space of physical vectors.
Associative
2
3
4
Commutative
Identity is 0
Inverse exist and unique
v+w=w+v
0+v=v
v + (v) = 0
Scalar Products
1
2
3
4
(v) = () v
1v = v
( + ) v = v + v
Associative
Identity is 1
Vector is distributive
over scalar sums
Scalar is distributive
over vector sums
(v + w) = v + w
Dot Products
1
2
3
4
5
v =vv
vw=wv
u (v + w) = u v + u w
(v w) = (v) w = v (w)
uv=0
if u = 0, v = 0 or uv
Squared norm
Commutative
Distributive
Scalar Product
Orthogonality
Cross Products
1
2
3
4
Anti-commutative
Distributive
Scalar Product
Parallelism
Non-associativity
Cyclic Permutation
v w = w v
u (v + w) = u v + u w
(v w) = (v) w = v (w)
uv=0
if
u
=
0,
v
=0 or u || v
u v y = u y v (u v) y
(u v) y = y u v y v u
Triple Products
u (v w) = w (u v)
= v (w u)
Norms
1
2
3
4
Positivity
Scaling
Triangle Inequality
Unique Zero
v 0
v = || v
v+w v + w
v = 0 only if v = 0
153
154
(4.4)
The scalars , , and are known as the components along b1 , b2 , and b3 vectors,
respectively.
In particular, it the basis vectors are normalized (magnitude 1 and no physical
units) and orthogonal to each other, the set of basis unit vectors are known as
orthonormal basis vectors. We start with orthonormal basis vectors based on the
Cartesian coordinate system, also known as the rectangular coordinate systems,
described by (x, y, z), using the convention shown in Figure 4.3.4
The unit vectors based on the Cartesian coordinates are denoted by x , y , and
z, each pointing in the positive x, y, and z direction, respectively. Thus
v = vx x + vy y + vzz
(4.5)
and the scalars vx , vy , and vz will be the x- , y-, and z- components of v, respectively
(see Figure 4.4).
The dot products and cross products of Cartesian unit vectors can be summarized
as follows:
i j
ij
(4.6)
i j
ijk k
(4.7)
0
1
if i = j
if i = j
(4.8)
The convention is that when looking down against the positive z-axis, the (x, y) plane, when rotated,
should have the positive y-axis pointing vertically upward and the positive x-axis pointing horizontally to the right.
155
Figure 4.3. The Cartesian coordinate system with three axes perpendicular to each other. The figure on the right (a) is the relationship
among x, y, and z. The figure on the left (b) is the convention by
looking directly down into the positive z-direction.
and ijk is the permutation symbol (also known as Levi-Civita symbol) defined by
ijk
if i = j or j = k or i = k
if (i, j, k) = (x, y, z) or (i, j, k) = (z, x, y) or (i, j, k) = (y, z, x) (4.9)
if (i, j, k) = (x, z, y) or (i, j, k) = (z, y, x) or (i, j, k) = (y, x, z)
Using (4.6) and the distributive property of dot products, the x-component of v can
be found as follows:
v x
vx x + vy y + vzz x
vx x x + vy y x + vzz x = vx
Similarly, vy = v y and vz = v z.
The following identities between the permutation symbols and Kronecker deltas
will also be useful during the derivation of vector operations and properties:
ijk mn
ijk imn
=
=
i
im
in
det j
jm
jn
k
km
kn
jm kn jn km
(4.10)
Equations (4.6) and (4.7), together with the properties given in Table 4.2 yield
alternative approaches to the operations defined in Table 4.1. These are given in
Table 4.3.
156
v+w
=
=
(vx + wx ) x + vy + wy y + (vz + wz) z
(vi + wi ) i
i=x,y,z
Norm
=
=
"
v2x + v2y + v2z
#
$
$ 2
%
vi
i=x,y,z
Scalar Product
=
=
(vx ) x + vy y + (vz) z
vi i
i=x,y,z
Dot Product
vw
=
=
(vx wx ) + vy wy + (vzwz)
ij vi w j =
i,j =x,y,z
Cross Product
vw
vi wi
i=x,y,z
vy wz vzwy x + (vzwx vx wz) y
+ vx wy vy wx z
ijk vi w j k
i,j,k=x,y,z
Triple Product
u (v w)
vy wz vzwy ux + (vzwx vx wz) uy
+ vx wy vy wx uz
ijk ui v j wk
i,j,k=x,y,z
Let us prove the identity for u v y given in Table 4.2. First,
we can expand the cross products in terms of the permutation symbols:
u vy
=
ui i
jk v j yk
EXAMPLE 4.2.
i=x,y,z
j,k,=x,y,z
im jk ui v j yk m
i,j,k,,m=x,y,z
Due to the cyclic nature of the symbols, we see that im = mi and jk = jk .
Thus, using (4.10),
u vy
=
mi jk ui v j yk m
i,j,k,,m=x,y,z
(mj ik mk ij ) ui v j yk m
i,j,k,m=x,y,z
i,j =x,y,z
(ui yi ) v j j
u y v (u v) y
i,k=x,y,z
(ui vi ) yk k
157
Some useful mnemonics exist for the cross product and the triple product. These are
given by
vw
u (v w)
det
vx
wx
ux
det vx
wx
y
vy
wy
uy
vy
wy
z
vz
wz
uz
vz
wz
(4.11)
(4.12)
Note, however, that (4.11) and (4.12) are just memory tools and should not be treated
as definitions.
The unit basis vectors are normalized vectors, and thus they will have no physical
units. The physical units are instead attached to the x-, y-, and z-components of
the vector. Moreover, because scalars can move freely across the various types of
products, the products of unit vectors can be computed in pure geometric terms. For
instance, to compute for work we now have
W = F
s
f x x + f y y + f zz
sx x +
sy y +
szz
( f x x ) (
sx x ) +
( f x
sx ) (x x ) +
v (w x)
v = y
(4.13)
where = w x.
One important concept in transport of properties such as material,
energy, momentum, and so forth is the idea of flux, f . By flux, we mean the
amount of property passing perpendicularly through a specified region per unit
area of that region per unit time. This is a vector because the definition has
both a magnitude and direction, although one has to be cautious about which
EXAMPLE 4.3.
158
b
x
perpendicular direction is of interest; for example, for a closed surface, the unit
normal vector to the surface can be chosen to be the outward normal.5
Consider the triangle
abc determined by three non-collinear points a, b,
and c, each defined by the position vectors a, b, and c, respectively, and shown
in Figure 4.5.6 Assume that the sequence {a, b, c} yields a counterclockwise turn
to yield our desired direction. Then the desired unit normal vector can be found
to be
(b a) (c a)
=
n
(b a) (c a)
Assume a constant property flow, P (e.g., kg of water per ft2 of region perpen.
dicular to its flow), which does not necessarily flow in the same direction as n
The magnitude of the flux through
abc will need to take the projection of P
using the inner product. Thus the flux of P through triangle
abc is given
along n
by
n
f = Pn
or, in terms of the dyad notation,
n
P
f=n
Let P
abc be the total rate of property flowing through the triangle. Then
P
abc can be obtained by taking the normal component of the flux and multiplying it by the area. The area of the triangle
abc is given by
A=
Thus
P
abc
=
=
5
6
1
(b a) (c a)
2
1
) A = P n
A = P (b a) (c a)
(f n
2
P (a b) + P (b c) + P (c a)
This results in a scalar quantity with units of the property per unit time no longer
per unit area. Hopefully, the context of the application will be sufficient to distinguish
the two meanings of flux.
Among the different possible dyads, an important set of nine dyads based on
the Cartesian unit vectors is the set of unit dyads, given by
x y ,
x z
( x x ),
y x ,
y y ,
y z
Z y ,
Z z
( Z x ),
The concept of dyads can be extended to triads, and so forth, collectively known as
polyads.
Definition 4.2. An nth -order polyad based on vectors v1 , . . . , vn denoted by
(vn v1 ) is a multilinear functional operator7 (based on dots products) acting
on a given sequence of n-input vectors x1 , , xn to yield a scalar output, as follows:
n
(4.14)
(vn v1 ) x1 , , xn =
(vi xi )
i=1
Note that dyads are second-order polyads. With respect to (4.13), (w x) v can continue to operate on another input vector, say, a, such that
(vw) x, a = (w x) (v a)
Based on the three Cartesian unit vectors, there will be 3n number of n th -order
th
polyads, which we refer to as the
unit n -order polyads. For example, there will be
33 unit triads such as ( x x x ), x x y , and so forth.
An n th -order tensor is defined as:
)
*
Definition 4.3. Let B = 1 , , m be a set of orthonormal basis vectors of an
m dimensional space S. The nth - order tensor under B is a linear combination of
n th order unit polyads formed by the unit basis vectors in B.
7
159
160
The zeroth order tensors are scalars, and first-order tensors are vectors.8 By convention, the term tensors refers to second-order tensors. Based on this convention,
we denote tensors by a letter with double underlines. We also limit our discussion to
spaces with a maximum of three dimensions. Using the three Cartesian unit vectors,
we have
T = T xx (x x ) + T xy x y + T xz x z + + T zz zz
(4.15)
where the scalar T ij is called the (i, j )component of tensor T . A special tensor,
called the unit tensor, is defined by
= x x + y y + zz
(4.16)
When this tensor operates on any vector, the result is the same vector.
A vector v can be considered a tensor only when it acts as an operator on another vector, say, a
(which in our case is via dot products),
v a =va
Otherwise, a vector is mostly just an object having a magnitude and direction. Also, the n th -order
tensor described in Definition 4.3 is still limited to the space of physical vectors, and the functional
operations are limited to dot products. A generalization to abstract linear vectors is possible, but
requires an additional vector space called the dual vector space.
Figure 4.7. The stress vector with respect to z as a sum of the normal stress and shear stress.
T xx x x + T xy x y + T xzx z
+T yx y x + T yy y y + T yzy z
+T zx zx + T zy zy + T zzzz
Let us explore the coefficients of T . Consider the (x, y) plane at point p (see
Figure 4.7). The unit normal vector is z. When tensor T operates on z,
(p, ) = T z = T xzx + T yzy + T zzz
z
T zz is the term along z, and it yields the normal stress with respect to the (x, y)
plane. Conversely, the other two terms, T xz and T yz, are the x and y components
of the shear stress.
For the general case, we can represent the unit normal vector n as
n = n x x + n y y + n zz
The stress (or traction) vector can then be obtained by letting T operate on n,
(p,n) = T n
(4.17)
=
=
T xx x x + T xy x y + . . . + T yzy z + T zzzz n x x + n y y + n zz
T xx n x + T xy n y + T xzn z x + T yx n x + T yy n y + T yzn z y
+ T zx n x + T zy n y + T zzn z z
Note that to determine the normal stress vector we need to project this vector
along the direction of n. (see Example E4.7).
Equation (4.17) is a statement of the relationship between the stress tensor and the stress vector and is known as Cauchys fundamental theorem for
stress.9
In some books on transport phenomenon or continuum mechanics, the position of the tensor
elements are switched (i.e., the transpose of T described previously) yielding the traction vector as
= n T.
161
162
Using the properties of dot products and Definition 4.1, we have the following
tensor operations:
Tensor Addition:
T +S =
T ij + Sij i j
i=x,y,z j =x,y,z
Scalar Multiplication
T
of a Tensor:
i=x,y,z j =x,y,z
Inner Product
of Two Tensors:
T S
T ij i j
i=x,y,z j =x,y,z
Inner Product of a
Tensor with a Vector:
T v
i=x,y,z
T : S
T ik Skj i j
k=x,y,z
T ij v j i
j =x,y,z
(T ij Sji )
i=x,y,z j =x,y,z
(4.18)
1
0
0
x = 0
y = 1
z = 0
(4.19)
0
0
1
With (4.19), correspondences can be established between tensor operations and
matrix operations. These are given in Table 4.4.
Once the vectors and tensors have been translated to matrix equations, the
various properties of matrices apply. For instance, the eigenvectors of a symmetric
tensor will be orthogonal to each other, associated with three real eigenvalues.
Moreover, if the symmetric tensor is positive semi-definite, then eigenvalues are all
non-negative real.
EXAMPLE 4.5.
2x x + 2x y + x z + 2y x + y y + y z + zx + zy + 4zz
2 2
T = 2 1
1 1
1
1
4
0.6035
0.6228
0.4479
v=0.5688 = 0.7962 : v=2.3644 = 0.4377 ; v=5.2044 = 0.4176
0.0422
0.6484
0.7601
Matrix representation
T
a = ax , ay , az
T
a + b = ax + bx , ay + by , az + bz
T
v = vx , vy , vz
ab
aT b
ab
abT
a = ax x + ay y + azz
a+b
ab
where H[a]
T =
tij
i j
i=x,y,z j =x,y,z
H[a] b
[a]
[a]
= hij ; hij = 3k=1 i,j,k ak
T = tij
T + S = tij + sij
T +S
T
T = (tij )
T S=
tik skj
T S
k=x,y,z
T v
T v=
j =x,y,z
T : S
txj v j ,
tyj v j ,
j =x,y,z
T
tzj v j
j =x,y,z
trace(T S)
Remarks: The items in the right column of Table 4.4 can be translated directly to
matrix operation commands in MATLAB. In addition, MATLAB provides some
10
As usual, the sign convention for tension and compression can be switched depending on which field
of study defines the tensors.
163
164
4
z
2
0
vb
-2
-4
2
0
y -2
-2
Figure 4.8. Stress tensor ellipsoid. The line along va has the maximum
normal stress, whereas the line along vb has the minimum normal stress.
2
x
Definition 4.4. The derivative of v(t) with respect to t, denoted by dv/dt, is given
by
d
1
v = lim
t0
t
dt
v (t +
t) v (t)
(4.20)
In Cartesian coordinates, the vector v(t) = vx (t)x + vy (t)y + vz(t)z will have a
derivative given by
d
v(t)
dt
1
lim
t0
t
vx (t +
t)x + vy (t +
t)y + vz(t +
t)z
vx (t)x + vy (t)y + vz(t)z
dvy
dvx
dvz
+
+
dt x
dt y
dt z
(4.21)
A geometric interpretation of (4.21) is shown in Figure 4.9. Note that the vector
dv/dt does not necessarily point in the same direction as v, and neither does it have
the same physical units as v (unless t is unitless). Using (4.21) together with the
properties of vector operations given in Table 4.2, the following identities can be
obtained:
165
Sums:
Scalar Products:
Dot Products:
Cross Products:
d
u(t) + v(t)
dt
d
(t)v(t)
dt
d
u(t) v(t)
dt
d
u(t) v(t)
dt
du dv
+
dt
dt
d
dv
v+
dt
dt
du
dv
v+u
dt
dt
du
dv
v+u
dt
dt
(4.22)
Note that the order is important for the derivatives of cross product because cross
products are anti-commutative.
d
dx
dy
dz
r =
+ +
dt
dt x dt y dt z
where the components are vx = dx/dt, vy = dy/dt and vz = dz/dt.
The norm of v(t) is the speed, denoted by s ,
"
s (t) = v(t) = v2x + v2y + v2z
11
(4.23)
(4.24)
We reserve the arrow sign above the variable to signify the derivatives of position vectors, including
velocity, acceleration, and so forth, and their components.
166
In as much as the position (x, y, z) changes with the parameter t, the distance traveled
along the path will also increase. We call this distance the arc length at t, denoted by
s(t) and defined as
t
v () d
(4.25)
s(t) =
t0
Note that the integrand is monotonic and nondecreasing, which means s(t) 0.
Differentiation will yield back the speed, that is,
"
ds
= v(t) = s (t) = v2x + v2y + v2z
dt
Vector v(t) will be oriented tangent to the path at t. Thus to find the unit tangent
vector at s, denoted by t(t), a normalization of v(s) is sufficient,
v(t)
t(t) =
(4.26)
v(t)
A derivative of t(t) with respect to t is also possible. Because t(t + dt) and t(t) both
have magnitude equal to 1, the only change will be in the direction. Using this fact,
we can apply the formula for the derivative of dot products given in (4.22),
t(t)
= t t = 1
d
t t
dt
dt
2 t
dt
Thus dt/dt is perpendicular to t. If we normalize this new vector to have a magnitude
, that is,
of 1, we obtain the unit normal to the curve at t, denoted by n
=
n
dt/dt
dt/dt
(4.27)
(t), we can find a third unit vector, called the binormal unit
Once we have t(t) and n
, that is perpendicular to both t and n
defined by
vector, denoted by b
= t n
b
(4.28)
dt/dt
ds/dt
dt/dt
s (t)
167
The radius of curvature, denoted by rcurve , is defined as the reciprocal of the curvature,
rcurve (t) =
1
s (t)
=
(t)
dt/dt
This is the radius of a circle that is tangent to the curve at t and two differentially
close neighboring points.
For the torsion of the path, we have a similar ratio. This time we define torsion
of the path at t, denoted by (t), as the norm of the change in the binormal unit
(t) per change in arclength s(t), that is,
vector b
(t) =
/dt
db
/dt(t)
db
=
ds/dt
s (t)
rtorsion =
1
s
=
/dt
db
Finally, we can take the derivative of velocity v(t) to obtain the acceleration
vector, denoted by a,
dv
d2 x
d2 y
d2 z
= 2 x + 2 y + 2 z
dt
dt
dt
dt
(4.29)
where the components are given by ax = d2 x/dt2 , ay = d2 y/dt2 and az = d2 z/dt2 .
as follows,
Alternatively, we could represent acceleration in terms of t and n
a(t)
aT t + aN n
d st
dv
=
dt
dt
ds
dt
ds
dt
t + s = t + s
n
dt
dt
dt
dt
ds
t + s 2
n
dt
Thus the tangential and normal components of the acceleration vector are given by
aT = ds/dt and aN = s 2 , respectively.
168
EXAMPLE 4.6.
Position Vector:
r
cos(t)x + sin(t)y + tz
Velocity:
v
Speed:
sin(t)x + cos(t)y + z
Unit Tangent:
t
+
sin(t)
2 x
Unit Normal:
n
cos(t)x sin(t)y
Unit Binormal:
b
sin(t)
2 x
Curvature:
1/2
Torsion:
1/2
Acceleration:
a
cos(t)x sin(t)y or n
cos(t)
2 y
cos(t)
2 y
1
2 z
1
2 z
r
x
y
z
=
+ +
x y z
t
r
x
y
z
=
x + y + z
Then we take the cross product of both tangent vectors to find a vector that is
perpendicular to the plane that contains both tangent vectors and then normalize
the result to find a unit normal vector,
=
n
t t
t t
(4.30)
dx
dy
dz
= vx ;
= vy ;
= vz
(4.32)
ds
ds
ds
subject to initial conditions r(0) = r0 . However, in general, these simultaneous
differential equations may not be that easy to evaluate. One could use numerical
ODE solvers, discussed in Chapter 7. In MATLAB, the command streamline
uses finite difference approximations to obtain the streamlines by specifying the
starting points. Another MATLAB command streamtube adds tube-width
information to show the magnitude of the velocity instead of a simple curve.
To illustrate, consider a 2D velocity field described by
17
4
29
5 2
x y+
x + x2 + (y x)
v=
3
6
3
30 y
We can plot the velocity field and streamlines as shown in Figure 4.10. This figure
was generated using a MATLAB file that are available in the books webpage,
streamline_test([0 0 0.04 0.1 0.18 0.3], ([2.05 2.2 2 2
2 2],0,1,2,3)
169
170
2.5
0.5
x
4.7.1 Gradients
Consider a 3D scalar field (x, y, z). For example, let u(x, y, z) be a scalar temperature field, that is, a temperature distribution in the 3D space. Usually, the gradation
of scalar fields will determine the driving forces in mass, momentum, or energy transport. Then it is important to determine a vector field based on the gradations of the
scalar field that can be used to determine the directions (and magnitude) of driving
forces. The desired vector field is known as the gradient vector field. The gradient
of (x, y, z) is then defined as
gradient((x, y, z)) =
x +
y +
x
y
z z
(4.33)
This assumes that the scalar field is continuous and smooth (i.e., differentiable in
each independent variable). The gradient definition can be simplified by introducing
a differential operator called the gradient operator, or grad for short.12 The grad
operator is denoted by , and it is defined as
= x
+ y + z
x
y
z
(4.34)
given by
2
2
(x, y) = e(x +y )
171
1
(x,y)
0.5
0
2
1
0
-1
y
-2
-2
-1
Figure 4.11. The surface plot of (x, y), together one curve at constant y = 1 and another
curve at constant x = 0.5.
/y = 0.573 (about twice as steep as the other slope). The gradient is given
by
(0.5, 1) = 0.287x + 0.573y
whose magnitude is (0.5, 1) = 0.641.
If we had chosen the point (x, y) = (1.5, 1), the resulting gradient would
be
(0, 1) = 0.116x + 0.078y
where the x-component is about twice as large as the y-component, and the
magnitude of the gradient vector is (1.5, 1) = 0.14. One could look
at Figure 4.11 and see that indeed the gradient at the point (x, y) = (0.5, 1)
is much steeper than (x, y) = (1.5, 1). Furthermore, one can see that the
gradient is pointing at an ascending direction.
f (x + v) f (x)
(4.35)
x +
y +
z =
vx +
vy +
vz
x
y
z
x
y
z
172
vx +
vy +
vz = v
x
y
z
(4.36)
The directional derivative is the scalar projection of the gradient along the direction
of vector v. Based on the definition (4.35), the value of the directional derivative is
zero when is a constant along v. For a 2D case, for example, = (x, y) = , this
defines a curve (possibly closed) called the contour curves or contours of (x, y) at
different levels determined by the parameter . However, for a 3D scalar field, that
is, = (x, y, z) = , these surfaces are called the contour surfaces or isosurfaces
determined by the parameter . Because the directional derivative is zero along
the contour lines or contour surfaces, we see from (4.36), that the gradient is
perpendicular to either the contour lines for the 2D case or the contour surfaces for
the 3D case.
Remarks: In MATLAB, the commands contour and contour3 gives the contour plots of a surface, for example, of a 2D scalar field. For a 3D scalar field,
one can instead use contourf to obtain contour slice plots or use the command
isosurface to find the contour surfaces at fixed values (one can then use the command hold to allow several surface plots while introducing NaNs to produce cutouts
of the surfaces). For the gradients, there is the MATLAB command gradient,
which calculates the gradient based on finite differences; but the accuracy depends
on the resolution of the mesh. The result of the gradient can then be plotted using
quiver or quiver3.
Suppose the dimensionless temperature distribution u = (T
T center )/T center is given by the following scalar field:
x 2
u=
+ y2 + z2
3
The contour surfaces, that is, u = =constant, are ellipsoids centered at the
origin. The surfaces corresponding to = 0.5, 1.0, 1.5 are shown in Figure 4.12.
Let use consider two points at the surface defined by = 1: a =
(2.9, 0, 0.2560) and b = (0, 0, 1). At point a, we find that the gradient is given by
u|a = 0.644x + 0.5121z and u|a = 0.823. At point b, we have the gradient
u|b = 2z and u|b = 2. Both gradients can be seen to be normal to the contour surfaces. Note that the distance between the surface at = 1 and = 1.5
around point a is farther than the distance between surfaces around point b, and
yet the magnitude of the gradient at point a is less than half the magnitude of the
gradient at point b. Thus one should be cautious when reading contour maps or
contour surfaces; the closer the adjacent contours are to each other, the greater
the magnitude of the gradient.
EXAMPLE 4.9.
The gradient vector can also be used to determine the rate of change in along
a path C. The differential of (x, y, z) is given by
d(x, y, z) =
dx +
dy +
dz
x
y
z
173
2
1
z0
-1
-2
-5
0
5
-2
Along the path, the rate of change in per change in arclength s is given by
d(x, y, z)
dx dy dz
=
+
+
ds
x ds
y ds
z ds
(4.37)
The right-hand side of the equation can now be factored into a dot product,
d(x, y, z)
dx
dy
dz
= ()
+ +
ds
ds x ds y ds z
=
() t
(4.38)
where t is the unit tangent vector ( cf. (4.26)). Thus the rate of change of along the
path C is the directional derivative along the unit tangent vectors of the path at the
desired point.
Let be the angle between and t. Then (4.38) becomes
d
= cos
ds
(4.39)
This means that at a point, the maximum value for increase of occurs when cos =
1 or = 0, whereas the maximum rate of decrease of occurs when cos = 1 or
= . This generates one of the methods used for the search for local optima of
(x, y, z), that is, to choose an update path C directed along the gradient to find
the local maximum, called the gradient ascent method. In other cases, we choose an
update path negative to the direction of to find a local mimimum, which is called
the gradient descent method. If the rate of change, d/ds, at a point is zero for all ,
that is, for all paths passing through this point, then the gradient must be zero at this
point. This indicates an extreme point or critical point, which is either an optimum
or a saddle point. Thus a necessary condition for optimality is = 0.
Finally, the gradient operation can be used to find the unit normal of any smooth
surface. Let the given surface be given as f (x, y, z) = C. As an alternative to methods
used in Section 4.6, for example, (4.30), the unit normal to f (x, y, z) = C at the point
a = (ax , ay , az) is given by
n=
f
f
(4.40)
174
4.7.2 Divergence
Consider the vector field v (x, y, z),
v(x, y, z) = vx (x, y, z)x + vy (x, y, z)y + vz(x, y, z)z
The divergence of a vector field at a point is defined as
v
vy
vz
x
divergence v (x, y, z) =
+
+
x
y
z
(4.41)
v j
v =
i
vj j =
i
i i j
j =x,y,z
i=x,y,z
i,j =x,y,z
=
vi
vy
vx
vz
=
+
+
i
x
y
z
i=x,y,z
175
1.5
2
1
0.5
z
0
0.5
1
2
1.5
2
0
2
1
0
2 2
0.5
0.5
Figure 4.14. The plot of the vector field v with f (z) = z and the divergence v = 2z.
For the second case, consider f (z) = z2 . The vector field is shown in
Figure 4.15. As the vector field shows, there appears to be no region where the
vector field is converging; thus the divergence is always positive, except at z = 0.
2
1.5
z
0
1
2
0.5
2
0
0
2 2
0
1
0.5
0.5
176
Figure 4.16. Evaluation of the curl of a vector field at a point across the (x, y) plane.
4.7.3 Curl
Due to the distribution in a vector field, neighboring vectors will affect the differential
volume at a point to twirl
In Figure 4.16, for a particle (i.e., a differential
or rotate.
volume) at point P = Px , Py , Pz , we consider the projection of the vector fields in
the (x, y) plane at Pz level. Let Czz be a vector that describes the tendency of the
particle at P to rotate based on the relative vector distributions of vx (x, y, Pz) along
y-direction and the relative vector distribution of vy (x, y, Pz) along the x-direction.
Using the right-hand screw convention, the total effect is given by
vy (x +
x, y, z) vy (x, y, z) vx (x, y +
y, z) vx (x, y, z)
Cz = lim
y
=
vy
vx
x
y
vx
vz
z
x
vz vy
y
z
Adding the three vectors yields a vector known as the Curl of the vector field at
the point (x, y, z),
vy
vz vy
vz
vx
vx
x +
Curl(v(x, y, z)) =
y +
z
(4.42)
y
z
z
x
x
y
177
Using the operator, the curl is written as the cross product of with v,
v j
v j
v =
i
vj j =
i j =
ijk
i
i
i k
j =x,y,z
i=x,y,z
ij
vz vy
y
z
x +
vx
vz
y +
z
x
ijk
vy
vx
x
y
(4.43)
With the rectangular coordinates, each unit vector will have the same direction
and magnitude at any point in space. Under the Cartesian coordinate system, the
following mnemonic is valid:
x
y
z
v = det
x
y
z
vx
vy
vz
For the special case in which v is a velocity field, v is the vector field known as
the vorticity of v. The angular velocity of a rigid body is related to the curl by
=
1
v
2
(4.44)
EXAMPLE 4.11.
!
1 + 5 x2 + y2
v = 20
2 z
!
1 + 10 x2 + y2
The vector field is shown in Figure 4.17, where the flow appears to follow a
helical path. The curl is also a vector field, but for this particular example, it
turns out to only have a z-component that is independent of z. Thus we can plot
the curl as as function only of x and y, which is also shown in Figure 4.17. From
the figure, one sees that the curl increases radially outward in any (x, y) plane.
Also, note that none of the vectors located far from the z-axis are lying in a plane
that is parallel to the (x, y) plane, and yet the curl at all points are directed in the
positive z-direction. This shows that the curl is not necessarily perpendicular to
the vector v at that point. This is because the curl is a differential operation on
a vector field and not a cross product of two vectors.
178
2
1
z
0
0.5
1
2
0
1
1
2 2
1 1
As shown in Example 4.11, the curl is not necessarily perpendicular to the vector
field. However, there are situations in which the curl is perpendicular to the vector
field. If v ( v) = 0, then v is known as a complex lamellar vector field. An
example of a complex lamellar vector field is
v = vx (x, y) x + vy (x, y) y + 0z
Conversely, there are also vector fields whose curls are always parallel to the vector
field. If v ( v) = 0, then v is known as a Beltrami vector field. An example of
a Beltrami vector field is
v = sin z + cos z x + sin z + cos z y + 0z
where and are constant.
4.7.4 Laplacian
As we discussed earlier, the gradient of a scalar field (x, y, z) will result in a vector
field . By taking the divergence of , we obtain the Laplacian of (x, y, z),
Laplacian((x, y, z))
()
(4.45)
2
2
2
+
+
x2
y2
z2
(4.46)
(4.47)
The energy flow following Fouriers law is proportional to the negative gradient of temperature, that is, energy flow = kT , or (divergence of energy flow)
= (kT ). The rate of energy is given by d (Cp T ) /dt. With constant heat
conductivity, density, and heat capacity, the energy balance reduces to
T
= 2 T
t
where = k/(Cp ), k is thermal conductivity, is density, and Cp is the heat capacity. (See Section 5.4.1 for a more detailed derivation that involves the divergence
theorem.)
EXAMPLE 4.12.
then
T = 2T xx + yy
and
2 T = 4 r2 1 T
!
where r = x2 + y2 . Thus the Laplacian of T is a negative factor of T when
r < 1, zero when r = 1, and a positive factor of T when r > 1.
The scalar field, gradient field, and Laplacian field are shown in Figure 4.18.
One can see that the gradient field is directed toward the center and zero at
the origin. Because the gradients are all pointing toward the origin around the
origin, a negative divergence is expected there, and this can be seen for the the
Laplacian.
Scalar functions that satisfy 2 = 0 in a region S are known as harmonic
functions. Harmonic functions in a closed region S have special direct relationships
with the values of at the boundaries of S, and these relationships can be derived
using the divergence theorem and Greens identities, which are topics in the next
chapter.
The Laplacian operator can also be applied to vectors and tensors, for instance
2 v = v. However, we need to first define v, which is the gradient of a vector.
This is discussed in the next section.
vm
v =
k
vm m =
(4.48)
k m
k k m
k=x,y,z
k,m=x,y,z
179
180
1.5
T
0.5
0.5
0.5
0
2
2
0
1.5
2 2
2
2
T
0
4
2
2
0
0
2 2
Figure 4.18. The plot of the temperature field, T (x, y), gradient vector field T , and Laplacian
field 2 T .
vx
x
vx
[v] =
y
vx
vy
x
vy
y
vy
z
vz
x
vz
vz
=x,y,z
w
k,m=x,y,z
vm
=
k k m
m,k=x,y,z
vm
wk
m
k
(4.49)
181
2. Dot-Grad
We define the dot-grad operator based on w as
w = wx
+ wy + wz
x
y
z
(4.50)
+ wy
+ wz
x
y
z
and
(w ) v = wx
v
v
v
+ wy
+ wz
=
x
y
z
k,m=x,y,z
vm
wk
m
k
w = x wy wz
+ y wz wx
+ z w x wy
z
y
x
z
y
x
(4.51)
for which the mnemonic, applicable only for the rectangular coordinate system,
is given by
x
y
z
w
wy wz
x
w = det
x y z
Using the cross-grad operator on a scalar field will yield the following identity,
(w ) = w ()
Another important identity is obtained when we extend the divergence operation, based on grad operators, to the cross-grad operators,
(w ) v = w ( v)
(4.52)
vm
2 v = (v) =
k
k
m
,m=x,y,z
k=x,y,z
k,m=x,y,z
vm
k2 m
2
2 vm m
m=x,y,z
(4.53)
182
Note that the formula given in (4.53) is only for the rectangular coordinates. For
other coordinate systems, one could start from the definition of the Laplacian
as a divergence of a gradient.
v + v
(4.55)
v + () v
(4.56)
(u v)
v u u v
(4.57)
(u v)
v u u v
+ u v v u
(u v)
(4.58)
u v + v u
+ u ( v) + v ( u)
(4.59)
(4.60)
(4.61)
( v)
(4.62)
v v
( v) 2 v
1
vv v v
2
(4.63)
The first three identities involve operations on scalar products, including product
of two scalar fields and the scalar product of a vector field. They are direct results
of implementing the properties of derivatives of products. Note that in (4.56), the
order of v is crucial.
EXAMPLE 4.13.
+ (v) = 0
t
where and v are the density field and velocity field, respectively. This can be
rewritten in a more familiar form by using the identity (4.55) to replace the
second term as follows:
(v) = v + v
+ v + v = 0
(4.64)
t
or by defining another operator D/Dt() known as the substantial rate of change
operator (or substantial time derivative) defined by
D
=
+
v
(4.65)
()
()
Dt
t
the continuity equation (4.64) becomes
D
+ v = 0
(4.66)
Dt
For the special case of incompressible fluid (i.e., is constant), (4.66) reduces to
v = 0.
Equation (4.57) shows that with the gradient operator, the usual cyclicpermutation properties of triple product no longer apply, except when either u or v
is constant. Similarly, for (4.58), the identities for the usual triple vector products no
longer apply as well.
The identity in (4.59) is surprisingly complicated. On the left-hand side is the
gradient of a dot product. However, the right-hand side identity includes cross
products of curls plus dot products of vectors with gradient-vector dyads. Note that
(4.63) is a consequence of (4.59) with v = u.
Equation (4.60) states that gradient vector fields are irrotational. However,
(4.61) states that curls have zero divergence, which means that curl fields (e.g.,
vorticity fields) are solenoidal. Both these identities are very useful in solving vector
differential equations. For instance, in Navier-Stokes equations, a pressure gradient
appears in the momentum balances. By simply taking the curl of the equation,
the dependence on pressure disappears because of identity (4.60). Likewise, if one
needs to remove curls in an equation, one simply needs to take the divergence of
that equation.
The identity given by (4.62) relates the curl of a curl with terms involving the
Laplacian of a vector field, as well as the gradient of a divergence. In some cases, this
formula is used to find the Laplacian of a vector field represented in other coordinate
systems.
Equation (4.63) can be used to yield an alternative definition for a Beltrami
vector field; that is, with v ( v) = 0, a vector field is a Beltrami field if and only
if
v v =
1
(v v)
2
Finally, equation (4.63) is also very useful because the term v v appears in
momentum balance equations and is known as the inertial terms. Thus (4.63) is often
used to introduce the role of vorticity, v, in the equations of fluid dynamics.
The proofs of some of the identities can be lengthy but more or less straightforward. The following example shows the proof of the identity given in (4.59).
183
184
Let us prove the identity given in (4.59). We begin with an expansion of the left-hand side of the identity,
uk
vk
(u v) =
i
uk vk =
i vk
+ uk
(4.67)
i
i
i
i=x,y,z
EXAMPLE 4.14.
k=x,y,z
i,k=x,y,z
v
k
u ( v) =
j uj
km
m
j =x,y,z
m,k,=x,y,z
i ijm km u j
i,j,m,k,=x,y,z
(u ) v =
k=x,y,z
vk
vk
vi
=
i uk
uk
i
k
i,k
(4.68)
vi
uk
vi i =
i uk
k
k
i=x,y,z
i,k
(4.69)
i,k=x,y,z
i,k
vk
i uk
i
(4.70)
uk
i vk
i
(4.71)
Finally, adding (4.70) and (4.71), we arrive at a sum that is equal to (4.67).
y = r sin
(4.72)
= tan (y/x)
z = z
z = z
A summary of the relationships between cylindrical and rectangular coordinates
is given in Table 4.5. The derivation of the items in this table can be done via
geometric arguments. Details of these derivations, aided by matrix methods, are
given in Section D.2.
185
The gradients, curls, and Laplacian in cylindrical coordinates have to be evaluated using the definition of the various operators and the distributive properties of
dot products and cross products. To illustrate, the divergence formula in cylindrical
coordinates can be obtained as follows:
r + + z
v =
vr r + v + vzz
r
r
z
vr
vz
1
=
+
+
vr r + v
r
z
r
vr
vz
1
vr
v
=
+
+ vr +
r v r +
r
z
r
vr
1 v
vr
vz
+
+ +
r
r
r
z
(4.73)
v =
r + + z
vr r + v + vzz
r
r
z
v
vz
vr
v
=
z
+
r
r
r
z
z
1
vr
v
vz
+
r + vr +
v r +
z
r
v
vz
vr
v
=
z
+
r
r
z
z r
1
vr
v
+
+ v z +
r
z
r
1 vz v
vz vr
1 (rv ) vr
=
r +
+
+
z
r
z
r
z
r
r
(4.74)
186
Cylindrical
Unit Vectors
x
y
z
=
=
=
cos r sin
sin r + cos
z
=
=
=
cos x + sin y
sin x + cos y
z
Vector Components
v = vx x + vy y + vzz
vx
vy
vz
=
=
=
v = vr r + v + vzz
vr cos v sin
vr sin + v cos
vz
vr
v
vz
=
=
=
vx cos + vy sin
vx sin + vy cos
vz
=
=
=
sin
r
r
cos
sin
+
r
r
cos
=
=
=
+ sin
x
y
r sin
+ r cos
x
y
z
cos
Gradient Operators
= x
+ y
+ z
x
y
z
= r
+
+ z
r
r
z
r
= ;
= r
EXAMPLE 4.15.
z 0.5
z 0.5
1
0.5
1
0.5
0.5
0
0
0.5
0.5
1
0.5
0
0
0.5
187
0.5
1
Figure 4.20. The Poiselle velocity field at z = 0 (left plot) and the corresponding curl field (right
plot).
y = r sin sin
(4.75)
= tan ( x2 + y2 /z)
tan1 (y/x)
r cos
A summary of important relationships between the rectangular and spherical coordinate systems is given in Table 4.6. In the table, as well as in several places in
this chapter, we use the shorthand notation for sines and cosines (i.e., s = sin ,
c = cos , s = sin , and c = cos ). The derivation of the items in this table can be
done via geometric arguments. Details of these derivations, aided by matrix methods,
are given in Section D.3.
188
Spherical
Unit Vectors
x
y
z
=
=
=
s c r + c c s
s s r + c s + c
c r s
=
=
=
s c x + s s y + c z
c c x + c s y s z
s x + c y
Vector Components
v = vx x + vy y + vzz
v = vr r + v + v
vx
vy
vz
vr
v
v
=
=
=
s c vr + c c v s v
s s vr + c s v + c v
c vr s v
=
=
=
s c vx + s s vy + c vz
c c vx + c s vy s vz
s vx + c vy
c c
+
r
r
s
rs
c s
s s
+
r
r
c
+
rs
s
r
r
s c
+ s s
x
y
+ c
z
r c c
+ r c s
x
y
r s
z
r s s
+ r s c
x
y
s c
Gradient Operators
= x
+ y
+ z
x
y
z
= r
1
1
+
+
r
r
rs
r
= ;
= r ;
= s
= c ;
= r s c
k
=0
m
for k, m = x, y, z
Using these operators, we can find the divergence and curl of a vector field to be
vr
1 v
vr
1 v
v cos
v=
+
+ 2 +
+
(4.76)
r
r
r sin
r
r sin
and
v
1 v
1 v
v cos
1 vr
v
v
+
r +
r
r sin
r sin
r sin
r
r
v
1 vr
v
+
+
(4.77)
r
r
r
189
EXAMPLE 4.16. Let the v be a vector field that is a function of the position vector
r, away from the origin, given by
r
v=
(r r)n/2
v = n1 r
r
With v = v = 0 and vr = vr (r), the curl can be evaluated using (4.77) to yield
a zero vector
v=0
that is, v is irrotational. Conversely, the divergence of v can be evaluated using
(4.76) to yield
vr
vr
3n
+2 = n
r
r
r
and v becomes solenoidal for n = 3.
v=
=
=
=
x (a, b, c)
y (a, b, c)
z (a, b, c)
a
b
c
=
=
=
a (x, y, z)
b (x, y, z)
c (x, y, z)
(4.78)
Then, according to the implicit value theorem, the transformation between the two
coordinates will exist if the Jacobian matrix given by
J (x,y,z)(a,b,c)
is nonsingular.
x
a
=
a
x
b
y
b
z
b
x
c
(4.79)
190
Definition 4.6. Let the Jacobian in (4.79) be nonsingular for the new coordinate
(a, b, c) given in (4.78). The a-coordinate curve (or a-curve) is the locus of points
where b and c are fixed. The a-coordinate surface (or a-surface) is the surface
defined at a fixed a.
Figure 4.22 shows the a-coordinate curve together with the a-coordinate surface.
Using (4.78), the a-curve at b = b0 and c = c0 is given by a set of points described by
the position vectors
r (a, b0 , c0 ) = x (a, b0 , c0 ) x + y (a, b0 , c0 ) y + z (a, b0 , c0 ) z
whereas the a-surface at a = a0 is described by the scalar function
a0 = a (x, y, z)
Definition 4.7. Let p be a point in the 3D space. A vector a (p) that is tangent to
the a-coordinate curve at p is called the a-base vector and defined as
k
r =
a (p) =
k
(4.80)
a p
a
k=x,y,z
p
A vector
a (p) that is normal to the a-coordinate surface at p is called the areciprocal base vector, or a-dual base vector, and defined as
a
a (p) = a (x, y, z) =
k
(4.81)
p
k
k=x,y,z
p
a
a
and
a = a
a
191
Figure 4.23. The a-base vector and the a-reciprocal base vector.
Based on Definitions 4.6 and 4.7, we have the following results and observations:
1. Vector
a is orthogonal to b and c ,
a b
a x a z a z
a
+
+
=
=0
x b y b z b
b
a c
a x a z a z
a
+
+
=
=0
x c
y c z c
c
(4.82)
a x a z a z
a
+
+
=
=1
x a y a z a
a
(4.83)
Similarly,
b b = 1 and
c c = 1.
3. The set (a , b, c ) forms a linearly independent set of vectors that span the 3D
space. Thus any vector v can be represented by a linear combination of the base
vectors, that is,
v = v a a + v bb + v c c
Likewise, the reciprocal base vectors (
a ,
b,
c ) form another linearly independent set of basis vectors. However, they are used more as the basis for the
gradient operator . To see this, we start with the gradient of a scalar field in
rectangular coordinates,
k=x,y,z
k=x,y,z
m=a,b,c
m
k
=
m
m
k
m
m=a,b,c
m
=
k
k
m k
k=x,y,z
m=a,b,c
a
+
b +
c
a
b
c
or
=
a
+
b +
c
a
b
c
(4.84)
192
b = c a
c = a b
(4.85)
Because
a is normal to the a-surface, the orthogonality of the base vectors means
that
a and a are pointing in the same direction. After normalizing the base vectors
and reciprocal base vectors, we have
a = a
b = b
c = c
(4.86)
1
a
b =
1
b
c =
1
c
(4.87)
We call the norms of the base vectors scaling factors, denoted by a , b and c ,
a =
r
a
b =
r
b
c =
r
c
(4.88)
This means that for orthogonal base vectors, the gradient operator can also be
written in terms of the base vectors or unit base vectors as follows:
=
=
=
+
b +
c
a
b
c
a a
b b +
+
c c
a
b
c
1
a a a b b b c c c
a
(4.89)
EXAMPLE 4.17.
1 2
a b2
2
which is a valid coordinate system in a domain where the following Jacobian is
nonsingular:
x = ab cos
y = ab sin
z=
=
=
a2 b2
a2 2z
a=
z+ r
193
!
where r = x2 + y2 + z2 . At fixed values of a, this gives the a-coordinate surface.
Likewise, for the b-coordinate surface, we have
x2 + y2
a2
=
=
a2 b2
b2 + 2z
b=
z + r
whereas for the -coordinate surface, we have = Atan xy , which is the same
as the cylindrical coordinate , that is, a plane containing the z-axis.
Let r be the position vector. Then the base vectors are given by
a
r
= b cos x + b sin y + az
a
r
= a cos x + a sin y bz
b
r
= ab sin x + ab cos y
= a2 b2
and the other dot products are zero. The scaling factors are given by the square
root of these dot products, that is,
!
a = b = a2 + b2 ; = ab
which then yield the gradient operator in the parabolic coordinate system to be
=
1
a2 + b2
+
b
+
a
b ab
a2 + b2
a a a
b b b
c c c
(4.90)
2. Divergence
w=
1
a bc
bc wa + c a wb + a bwc
a
b
c
(4.91)
3. Curl
1
w=
det
a bc
a a
bb
c c
a wa
bwb
c wc
(4.92)
194
2 =
a bc
a
bc
a a
+
a c
b b
+
a b
c c
(4.93)
5. Gradient-Vector Dyad
w =
k=a,b,c m=a,b,c
1 wk
m k +
m m
wk
k
m m m
(4.94)
k=a,b,c m=a,b,c
The proofs of (4.90) to (4.94) are given in the appendix as Section D.1. Applying
these results to cylindrical and spherical coordinates, we obtain an alternative set of
combined formulas given by the following:
Gradient of
Scalar Fields:
1
k k k
k
Partial Derivative
of Vector Fields:
v
m
Divergence of
Vector Fields:
vk
m
2
1 vk +
+ D (v)
k k
k
Curl of
Vector Fields:
+
2
k + pm (v)
det
a a
1
b b
1
c c
+
2
+ q (v)
vb
vc
va
where (a, b, c) = (x, y, z) , (r, , z) or (r, , )
(
Laplacian of
Scalar Fields:
Gradient-vector
dyads:
2
1 2 +
+ L ()
2k k2
k
+
2
1 vk
+ T (v)
m m m k
m
(4.95)
The values of parameters k , q, D, L, p, and T are given in Table 4.7 for both the
cylindrical and spherical coordinates. Note that the preceding equations also apply
to rectangular coordinates by setting x = y = z = 1 and disregarding all the terms
with curly brackets.
Spherical
r = 1, = r, z = 1
r = 1, = r, = rs
q (v) =
r z
q (v) =
D (v) =
vr
r
D (v) = 2
L () =
1
r r
L () =
p (v) = v r + vr
v
vr
T (v) = r +
r
r
v c
v
v
+
rs r
r
r
vr
v c
+
r
rs
2
c
+ 2
r r
r s
p (v)
v r + vr
p (v)
v s r v c + (vr s + v c )
T (v)
v
vr
+
r r
r
v
v c
vr
v c
r
+
+
r
rs
r
rs
Note: Absent items are considered zero, e.g. p r = 0. (Arranged in alphabetical order.)
Let us show that the formulas given in (4.92) and (4.95) for the
curl of a vector field in spherical coordinates are the same as that given by (4.77).
Applying (4.92) for the spherical coordinates, with (a, b, c) = (r, , ) and
(a , b, c ) = (1, r, r sin ), gives
r
r
r sin
v =
det
r sin
r
EXAMPLE 4.18.
r sin v
1
(r sin v ) (rv )
1
vr
(r sin v )
r2 sin r
r sin
r
1
(rv ) vr
+
r
r
vr
rv
v cos
v
v
1
1
v = det
r +
+
r
r
sin
r
r
r
r sin
vr
1 v
1 v
1 vr
v
v
1 vr
r
r
r sin
r sin
r
r
r
v cos
v
v
+
r +
r sin
r
r
195
196
E4.1. Verify all the properties in Table 4.2 using the following values:
= 1.5
u = x + 2y + 2z
w = 2y + z
= 2
v = x + y + 2z
y = x + 4z
;
;
;
E4.2. Prove or disprove: The dot product and cross product can be interchanged
in a triple product, that is,
u (v w) = (u v) w
E4.3. Consider three non-collinear points A = (ax , ay , az), B = (bx , by , bz), and C =
(cx , cy , cz). The minimum distance d from point C to the line passing through
A and B is given by the following formula
d=
where
u
uv
v
(4.96)
(cx ax ) x + cy ay y + (cz az) z
(bx ax ) x + by ay y + (bz az) z
Show that the same value results if the roles of A and B had been interchanged.
E4.4. Prove or disprove the following identities:
1. (a b) c = a (b c)
2. (a b) (c d) = (a c) (b d) (a d) (b c)
3. (a b) (c d) = (a (c d)) b (b (c d)) a
Also, verify the proven identities using
a = 2x + 3y z
b = 3x + 2z
c = 4x 2y + 2z
d = 2y + 3z
E4.5. Determine a formula for the volume of the triangular pyramid whose vertices
are (xi , yi , zi ), i = 1, 2, 3, 4. Under what conditions will the volume be zero?
E4.6. A triangle
in 3D
by three
non-collinear
vertices given by
space
is determined
a = ax , ay , az , b = bx , by , bz , and c = cx , cy , cz . Let the oriented triangular area, denoted by A
(a, b, c), be the vector defined by
1
A
(a, b, c) = (b a) (c a)
(4.97)
2
where a, b, and c are the position vectors based on the coordinates of a, b,
and c.
1. Show that the area of the triangle
abc is given by A
(a, b, c) . Use this
to find the area formed by a = (1, 1, 1), b = (1, 0, 1) and c = (1, 1, 1).
2. Show that an alternative formula is given by
1
A
(a, b, c) = (a b + b c + c a)
2
thus also show that
A
(a, b, c) = A
(b, c, a) = A
(c, a, b)
3. Consider the tetrahedron described by four vertices a, b, c, and d as shown
in Figure 4.24.
4.10 Exercises
197
c
z
a
d
Based on the right-hand rule, the outward normal vectors will be based
on the sequences (within a cyclic permutation) abc, bdc, cda, and dba. Show
that the vector sum of the oriented areas will be a zero vector, that is,
A
(a, b, c) + A
(b, d, c) + A
(c, d, a) + A
(d, b, a) = 0
Verify this fact using a = (1, 2, 1), b = (2, 1, 1), c = (2, 2, 4) and d =
(3, 3, 2).
E4.7. Consider the following symmetric stress tensor T and a unit normal n given
by
2 3 1
1
1
T = 3 1 1
;
n= 1
3
1 1 2
1
1. Find the normal stress vector and shear stress vector that correspond to
the surface determined by the unit normal vector n.
2. Show that, in general, given stress tensor T and unit normal n, the normal
stress vector normal and shear stress vector shear are given by
normal =
nn T n
shear =
I nn T n
where I is the identity tensor. Verify these formulas by comparing them
with the results you found previously.
3. Find the unit vectors, v1 , v2 , and v3 along the principal axes of the stress
tensor T . Show that the normal stress along v1 is equal to v1 multiplied
by the corresponding eigenvalue of T while the shear stress is zero. Show
that the same is true for v2 and v3 .
E4.8. A set of 3D vectors, u, v, and w, spans the whole 3D space. Another related set
of three vectors is called the reciprocal vectors. A vector
u is the reciprocal
vector of u if: (1) it is orthogonal to v and w, and (2)
u u = 1. Find the
formulas for the reciprocal vectors,
u,
v, and
w. (Hint: Use cross products
and triple products.)
Note: The reciprocal vectors are used to define tensors in case the basis
vectors are not orthogonal.
E4.9. For the curve C given by x(t) = 3 cos (t), y = sin (t), and z = t/(1 + 2t), evaluate the following at t = 5:
1. Velocity v and acceleration a
, and binormal unit
2. The tangent unit vector t, normal unit vector n
vector b
3. The tangential and normal components of acceleration a
4. The curvature and torsion
198
5. Verify the following equations known as the Frenet formulas at the given
point:
dt
=
n
ds
d
n
= t + b
ds
db
=
n
ds
Hint: For a vector c, use the relationship
dc
dc/dt
=
ds
s
E4.10. A solid top is shown in Figure 4.25 and has its surface described by the
following equations:
x
sin () cos ()
sin () sin ()
for 0 and 0 2.
1.5
0.5
0.5
0.5
0
y
-0.5
-0.5
4.10 Exercises
199
=
=
=
a (x, y, z)
b (x, y, z)
c (x, y, z)
x
y
z
=
=
=
x (a, b, c)
y (a, b, c)
z (a, b, c)
1.
r
r
r
dx + dy + dz
x
y
z
r
r
r
da + db + dc
a
b
c
For the Cartesian coordinates, we know that ds2 = dx2 + dy2 + dz2 .
Show that the matrix G = J T J , where J is the Jacobian matrix,
x x x
a b c
y
y y
J =
a b c
z z z
a b c
will satisfy the equation,
ds2 =
2.
da
db
dc
da
G db
dc
200
3.
x
a
y = b
c
z
where
= J diag
4.
G11
G22
G33
a
x
b = y
c
z
Verify your results by applying them to both the cylindrical and spherical
coordinate systems.
Note that at this point, the basic ingredients, that is, and , are now
available to find and k /m, for k = a, b, c and m = a, b, c.
Instead of ending each iteration at the point where the directional derivative is zero, which could
mean solving nonlinear equations, one could simply take a path along the gradient line and end at
4.10 Exercises
201
(4.98)
(4.99)
T ij
i j
i,j =x,y,z
and
T : v =
(4.100)
i,j =x,y,z
T ij
v j
i
E4.17. Let and be scalar fields. Using the identities given in (4.57) and (4.60),
show that
( ) = ( ) = 0
that is, is solenoidal.
E4.18. Using the identity (4.98) and the continuity equation (4.64), show that following is true
v
Dv
=
+ v v
(4.101)
Dt
t
where D/Dt is the substantial time derivative defined in (4.65).
E4.19. The stress tensor T of a fluid that follows Newtons law of viscosity is given
by
2
T = v + (v)T +
( v) I
(4.102)
3
where , , and I are the viscosity, dilatational viscosity, and identity tensor,
respectively.
1. Define the tensor (v)T by
(v)T =
j,k=x,y,z
v j
k j k
(4.103)
This means that equation of motion given in (5.26) reduces to the equation
known as the Navier-Stokes equation of motion,
Dv
= p + 2 v + g
Dt
where p and g are the pressure and gravitational acceleration, respectively.
a fixed ratio of the length of the gradient. Using a small ratio has a better chance of not missing the
optimum along the path. However, a small ratio slows down the convergence.
202
E4.20. A material following the Eulers equation of motion (e.g., inviscid flow) is
described in (5.28), which can be rewritten to be
v
+ v v = (Q + )
(4.104)
t
where Q is a potential body force field per unit mass and is a scalar field
whose gradient is defined by = ( p ) /, assuming is a function of
pressure p only.
1. Using the identities given in Section 4.7.6, show that
+ v = v ( v)
t
where = v is the vorticity of v. Or, using the substantial time derivative operator defined in (4.65),
D
= v ( v)
(4.105)
Dt
2. By dividing (4.105) by and then using the equation of continuity given
in (4.66), show that
D 1
1
= v
(4.106)
Dt
1
2
v
v ( v)
2
(x 0.5)2 + (y + 0.5)2
for 1 x 1 and 1 y 1.
1. Calculate T (x, y).
2. Obtain a contour plot of the temperature distribution and overlay it with
the gradient vector field plot.
3. Evaluate 2 T and its gradient.
4.10 Exercises
4.
5.
E4.26. Find the relationship between the unit vectors of the cylindrical coordinate
system with the unit vectors of the spherical coordinate system, that is, determine Rcs such that
r
r
= Rcs
z
(Hint: Use the matrices found in Sections D.2 and D.3.)
(Note that we use
r and
for the spherical coordinates to distinguish them
from the r and variables, respectively, used for the cylindrical coordinates.)
E4.27. For a tensor T , find the relationship between the components in the rectangular coordinates and the components in the spherical coordinates, for
example, T rr = T rr (T xx , . . . , T zz), and so forth.
E4.28. Show that (4.93) is equal to the formula for the Laplacian of a scalar field
given in (4.95) for the case of spherical coordinates.
E4.29. Given the vector field in the spherical coordinates,
1
1
v (r, , ) = 2 4 3 r
r
r
and a constant vector field w in rectangular coordinates
w = 2x + 4y
Evaluate (w v) at r = 1.2 and = = /4 and give the results in spherical
coordinates, as well as in rectangular coordinates.
E4.30. For the parabolic coordinate system described in Example 4.17, obtain the
divergence and curl of a vector field v in this coordinate system. Also, obtain
the reciprocal base vectors
a ,
b, and
and show that a
a = 1, b
b = 1
and
= 1.
203
In this chapter, we discuss the major integral theorems that are used to develop physical laws based on integrals of vector differential operations. The general theorems
include the divergence theorem, the Stokes theorem, and various lemmas such as
the Greens lemma.
The divergence theorem is a very powerful tool in the development of several
physical laws, especially those that involve conservation of physical properties. It
connects volume integrals with surface integrals of fluxes of the property under
consideration. In addition, the divergence theorem is also key to yielding several
other integral theorems, including the Greens identities, some of which are used
extensively in the development of finite element methods.
Stokes theorem involves surface integrals and contour integrals. In particular, it
relates curls of velocity fields with circulation integrals. In addition to its usefulness
in developing physical laws, Stokes theorem also offers a key criteria for path
independence of line integrals inside a given region that can be determined to be
simply connected. We discuss how to determine whether the regions are simply
connected in Section 5.3.
In Section 5.5, we discuss the Leibnitz theorems involving the derivative of
volume integrals in both 1D and 3D space with respect to a parameter in which
the boundaries and integrands are dependent on the same parameter . These are
important when dealing with time-dependent volume integrals.
In applying the various integral theorems, some computations may be necessary.
To this end, we have included extensive discussions of the various types of integrals in
Sections E.1 through E.3 (i.e., line integrals, surface integrals, and volume integrals).
In these sections, we include the details for computation and some examples to
appreciate the implications and evaluations of the various integral theorems useful
for the integral theorems that are covered in this chapter.
In Section 5.4, we discuss two very important applications of the integral theorems. These are the development of conservation laws and the development of
Maxwell equations for electricity and magnetism. Both these applications expose
the power of vector differential operators for the development of the partial differential equations that model several physical systems.
We do not cover the actual solution of partial differential equations in this
chapter. The solution of these partial differential equations can be very complex
and is often difficult to obtain in closed analytical form except for specific cases.
204
205
Instead, their solutions are dealt with in later chapters, starting with Chapter 6
and succeeding chapters for handling problems that can be reduced to ordinary
differential equations; Chapters 10 through 14 handle solution of certain classes of
partial differential equations, containing both analytical and numerical approaches.
dudv
(5.1)
u
v
C
S
LEMMA 5.1.
where
1. C is a sectionally smooth boundary of the surface of integration S.
2. The positive direction of contour C is consistent with the definition of the positive
direction of the vector
=
n
(r/u) (r/v)
r/u (r/v)
For the special case in which the surface of integration S lies in the (x, y)plane, Greens lemma is stated in terms of x and y, that is, in (5.1) u and v are
replaced by x and y, respectively. In this case, the positive direction of the contour
is counterclockwise. Equivalently, the positive contour direction is chosen such that
the region in S is always to the left of the contours path.
EXAMPLE 5.1.
x2 + y
G(x, y, z)
2x + 3y z + 5
Let the surface of integration to be the top portion of the unit sphere with
0 /4 and 0 2, as shown in Figure 5.1.
Using the parameterization based on the spherical coordinates, u = and
v = ,
x = sin(v) cos(u)
y = sin(v) sin(u)
z = cos(v)
and
F (u, v)
G(u, v)
206
dudv =
u
v
2
0
0
The closed contour in the (u, v) plane is shown in Figure 5.2. The positive
direction is counterclockwise, and it yields a normal vector of points in S that is
outward from the center of the sphere.
Based on Figure 5.2, the line integrals can be calculated to be
2
0
3
F (u, v)du =
F (u, 0) du +
F u,
du =
4
2
C
0
2
3
G(u, v)dv
/4
=
G (2, v) dv +
/4
G (0, v) dv
3 2
3 2
+2+5
+
25
=0
2
4
2
4
Combining all the results, we see that Greens lemma applies, that is,
3
3
G F
Fdu + Gdv =
dudv
u
v
C
C
S
207
Figure 5.3. The surface of integration with holes: (a) the original surface S and contour C, (b)
S1 , obtained by removing S2 , (c) the continuous contour
4 C1 for
4 S1 , (d) due to the cancelation
4
of line integrals in oppositely directed segments, C1 = C C2 .
Let 1 and 2 be two opposite paths, that is, they travel through the same set of
points but having switched the end points. Then we note that
F (u, v)ds =
F (u, v)ds
F (u, v)ds +
F (u, v)ds = 0
1
2
1
2
We can then use this fact to evaluate surface integrals in surfaces that contain holes.
For illustration purposes, consider the surface shown in Figure 5.3b. A surface S1
was obtained by removing S2 from the original surface S. In Figure 5.3c, a continuous
contour C1 can be generated as the boundary of S1 . At some point along the outer
boundary, contour C1 cuts a path to reach the inner hole. The contour then traverses
the boundary of this hole. After reaching the point where the contour first entered
the hole, the contour C1 retraces the same cutting path backward to continue tracking
the outer path.
208
The two line integrals that have traversed the same path segments, but in opposite directions, will cancel each other out. Thus, as shown in Figure 5.3d, the line
integral along C1 will be equal to the line integral along C minus the line integral
along C2 , which is the contour of the removed surface S2 , that is,
3
3
3
f ds =
f ds
f ds
(5.2)
C1
C2
S2
dudv =
(Fdu + Gdv)
(Fdu + Gdv)
u
v
S1
C
C2
3
=
(Fdu + Gdv)
(5.4)
C1
Thus Greens lemma can be applied to surfaces with holes by using the strategy of
cutting through the region to connect the outer contour with the inner contours of
the holes.
PROOF.
dS
fg n
S
1
(5.5)
g ( f ) f g
dV
(5.6)
(5.7)
dS
n
209
dV
(5.8)
f dV
(5.9)
f dS
n
dS
( ) n
PROOF.
2 2 dV
(5.11)
One could also view the divergence theorem as a reduction in the dimension of
the integration, that is, a volume integral on one side while a surface integral on the
other. Thus if the volume integration is difficult to obtain, the reduction to a surface
integral would usually make it easier to evaluate, at least numerically. However,
there are several instances in which volume integration would actually be easier to
obtain than the surface integral. This is specially true if the divergence operation
simplifies several terms in the integrand, as shown in the next example.
EXAMPLE 5.2.
S
4 bx + by + bz
3
3
210
Another application of the divergence theorem, via the Greens theorem given
previously, is in the development of weak solutions needed by the finite elements
methods. Briefly, the finite elements method partitions domains in which the first
derivatives are satisfied. However, at the edges where the elements are patched
together, the smoothness is no longer guaranteed. Greens theorem allows for volume integrals of Laplacians such as 2 to be replaced by the surface integrals
involving gradients. Details for the weak-solution formulation of the finite element
methods are covered later in Section 14.1.
A very important application of the divergence theorem is the Gauss theorem,
which is also useful in solving partial differential equations.
THEOREM 5.3.
S
PROOF.
1
dS =
n
r2 r
if origin is outside of S
if origin is inside of S
(5.12)
PROOF.
f dr =
C
f t ds
is known as the circulation of f along the closed path C. Thus (5.13) simply states
that the sum of all the curls of f fluxing out of the surface S bounded by C is equal to
the circulation of f along C (see Figure 5.4).
Stokes theorem has several application in physics, including one approach in the
development of Maxwells equations for electric and magnetic intensity. Another
application is to use it to assess path independence of line integrals.
In fact, based on exterior calculus, a general Stokes theorem can be developed that also generalizes
the divergence theorem.
211
Definition 5.1. Let f be a given vector field and r be the position vector, then the
line integral
f dr
IC =
(5.14)
C
is said to be path independent in a region V if, for any pair of curves, C1 and C2
inside the region V ,
IC1 = IC2
(5.15)
where C1 and C2 are continuous and sectionally smooth curves that have the same
initial point (xi , yi , zi ) and the same end point (x f , y f , zf ).
When V is not specified, then it is usually understood that the path independence
refers to the whole 3D space.
Consider two paths C1,AB and C2,AB in region V that do not intersect each other
except at the start and end points. If the line integrals are independent of path,
f dr =
f dr
(5.16)
C1,AB
C2,AB
C1,AB
C1,AB C2,AB
f dr
C2,AB
f dr
f dr
(5.17)
where C = C1,AB C2,AB is a simple closed path, as shown in Figure 5.5. With any
choice of C1,AB and C2,AB (so far assumed to be nonintersecting at midpath), by
reversing the path direction of C2,AB, a closed path C is generated. Path independence
guarantees that the line integral using the closed path will have to be zero.
Alternatively, we could have rearranged (5.16) to be
3
f dr = 0 =
f dr
(5.18)
C2,AB C1. AB
C
where C = C2,AB C1,AB is also a simple closed path but in the opposite direction
of C.
Now consider the situation shown in Figure 5.6 where path C1,AB and C2,AB might
intersect each other somewhere between the end points, say, at point D. Path C1,AB
can then be partitioned to be the sum of two subpaths: C1,AD and C1,DB. Likewise,
path C2,AB can also be partitioned to be the sum of two subpaths: C2,AD and C2,DB.
212
Figure 5.5. Two paths with the same start point A and end point B.
f dr +
C1,AD
f dr
C1,DB
f dr
C1,AD
C1,AB
f dr
C2,AD
f dr +
C2,AD
C1,AD C2,AD
f dr +
3
f dr
f dr
f dr
f dr
C2,DB
f dr
C1,DB
f dr
C2,AB
C2,DB
C1,DB C2,DB
f dr
CAD
(5.19)
CDB
Figure 5.6. Two paths with the same start point A and end point B plus an intersection at
point D.
where CAD is the closed path formed by adding path C1,AD and reverse of path C2,AD.
Similarly, CDB is the closed path formed by adding path C1,DB and reverse of path
C2,DB. Because the definition of path independence applies to the subpaths, the two
closed paths will each generate a zero line integral also. Thus the condition for path
independence is equivalent to having the line integral of any closed paths in region
V all be zero, including whichever direction the closed path takes.
Having established this equivalence, we can use Stokes theorem to determine
path independence. Recall that Stokes theorem relates the line integral in a closed
path to a surface integral involving the curl of vector field f :
3
( f ) dS
f dr = n
(5.20)
C
If f = 0 for all points in region V , then Stokes theorem suggests that this
condition will also guarantee path independence, because for any surface S inside
region V , a zero curl implies a zero line integral on the left-hand side of (5.20).
One more detail is still needed, however. Stokes theorem requires that the integrand in the surface integral be bounded. This means that in the chosen region
V , no singularities of the integrand can be present. In relation to the closed
path integral, this kind of region is formally referred to as a simply connected
region.
Definition 5.2. A region V is simply connected if any simple closed path in the
region can be continuously deformed to a single point. Otherwise, the region is
multiply connected.
A set of examples is shown in Figure 5.7. The first case in Figure 5.7(a) is a full
rectangular box. In this case, we see that any closed path contained in the region
can be deformed continuously into a point inside the box. The second case is shown
in Figure 5.7(b). For this case, a cylindrical subregion has been removed from the
center. Even though there are some closed paths that could deform to a point, the
existence of at least one closed path that will not continuously deform to a point
is sufficient to have this region be multiply connected. The third case is given in
Figure 5.7(c). In this case, the rectangular box region has a spherical subregion
removed. However, unlike case (b), any closed path inside the region of case (c) can
still be continuously deformed to a point. Thus the third case is a simply connected
region.
We can now summarize the preceding discussion in the following theorem:
Let f = 0 inside a simply connected region V ; then the integral
f
d
r
is
independent
of path inside region V .
C
THEOREM 5.5.
EXAMPLE 5.3.
y x + x y + 2z z
2
x + y2 + z2 x + 2xy y + (y z) z
y+2
x1
x +
+ 2 z
2
2
(x 1) + (y + 2)
(x 1)2 + (y + 2)2 y
213
214
Figure 5.7. Examples of simply and multiply connected regions. (a) Solid rectangular region:
simply connected; (b) solid rectangular region with a cylindrical subregion removed from the
center: multiply connected; (c) solid rectangular region with a spherical subregion removed
from the center: simply connected.
Path C1:
Path C3:
3 + cos(3t)
12t2 + 10t + 4
3 + sin(3t)
3 + 4t(1 t)
2t
2 2(1 t)2
10t2 11t + 1
Path C2:
6t2 + 5t + 1
2t2 5t
2t2 5t
6t2 + 7t
10t2 9t
Path C4:
Figure 5.8 shows paths C1 and C2 . Note that C1 has a helical form. Figure 5.9
shows paths C3 and C4 together. Included in the figure is a line described by
(x, y, z) = (1, 2, z). This line includes the singular point of vector field f . Thus
the line is only relevant when considering f .
5.4 Applications
5.4 Applications
In this section, we discuss two major applications of the vector integral theorems.
The first application mainly uses the divergence theorem for obtaining the various
conservation laws. The second application is in the field of electrodynamics. Note that
the main activity here is to obtain integral and differential equations that describe the
physical laws. The solution of these differential equations will not be treated here.
Instead, the analytical and numerical solutions of these equations are discussed in
later chapters.
215
216
Start: (1, 0, 0)
End: (0, 3, 1)
Vector fields
C1
C2
C3
C4
Curl
h
g
f
2
36.062
4.343
2
45.2
4.343
1
4.4
1.927
1
11.333
4.356
0
x + 2xy
0
Rate of Change
of
inside V
Net flux of
Internal rate of generation
+
out of V
of
across surface S
inside V
External effects
+
on V and S
(5.21)
Equation (5.21) is quite general and applicable to several objects being considered.
In integral form, we have
V
V dV =
S
dS
V v n
dS +
fn
S
G dV +
EV dV +
ES dS
S
(5.22)
where G is the generation of per unit volume, whereas EV and ES are external
effects on per unit volume and per unit surface area, respectively. Here we have
also separated the convective flux
V v from other mechanisms of fluxes described
by f . We treat some of the major applications, including mass, momentum, chemical
component, and energy. For each of these cases, each term (e.g., flux terms and
external effects) will vary and requires extra constitutive equations obtained from
physical and empirical laws (often constrained under specific conditions).
1. Mass Balance
Here, we have = m, which is mass m; then the density is
V = . The flux
of mass out is given only by convective flow normal to the surface boundary,
5.4 Applications
that is,
217
dS
v n
is the unit normal vector in S pointing outward from dS. Because mass
where n
cannot be created nor destroyed, we have G = 0, EV = ES = 0 in (5.22). Thus
(5.22) becomes
dS
dV = v n
t V
S
Because V is assumed fixed, we can move the time derivative inside the integral.3
Then, after applying the divergence theorem, we obtain
+ v dV = 0
(5.23)
t
V
which is the integral form of the continuity equation. Because V , has been set
arbitrarily, we can extract the integrand and obtain
+ v = 0
t
(5.24)
which is the differential form of the continuity equation. Using the substantial
derivative operator defined in (4.65), that is, D/Dt = /t + v , (5.24) can be
rewritten as
D
+ v = 0
Dt
2. Momentum Balance
Let = v be momentum per unit mass and
V = v becomes momentum per
unit volume. We now assume that there is no internal generation of momentum
(e.g., no reactions or explosions). Furthermore, we can take the external forces
to be due only to pressure normal to surface S and gravitational acceleration g
acting on the volume V . Thus the external effects are given by
dS +
External effects = p n
g dV
S
The momentum flux has two components: one is due to convective flow, and the
other is due to the stress at the boundary. The flow of momentum out across S
is given by
) dS
Convective momentum flow out = (v) (v n
S
The other flux, also known as the molecular momentum flux, is due to the
material stress traction vector and is given by the stress vectors pointing outward
of S. Based on the discussion given in Example 4.4, we have
dS
Molecular momentum flow out = T n
S
3
If V is not fixed, one needs to apply Leibnitz rules, as discussed in Section 5.5.
218
where T is the stress tensor field. Combining the various terms, (5.22) now
becomes
dS p n
dS +
v dV =
vv+T n
g dV
t V
S
S
V
After applying the divergence theorem, and consideration that V is fixed but
arbitrary, we arrive at
(v) = ( v v) T p + g
t
(5.25)
Equation (5.25) is known as the Cauchy equation of motion4 or, in terms of the
substantial time derivative operator,
D
=
+v
Dt
t
(5.25) together with (5.24) (see Exercise E4.18) will be reduced to
Dv
= T p + g
(5.26)
Dt
Two special cases are often used in the study of fluid dynamics. The first is when
the material is incompressible and Newtonian, and we have T = 2 v (see
exercise E4.19). This reduces (5.26) to
Dv
= p + 2 v + g
(5.27)
Dt
The other special case is when T is negligible, for example, for an inviscid
fluid, then (5.26) reduces to Eulers equation of motion,
Dv
= p + g
Dt
(5.28)
3. Energy Balance
Let =
E be the energy per unit mass; then
V =
E. The energy per unit
mass is the sum of three terms: the specific internal energy
u, the specific kinetic
2
energy v /2, and the specific potential energy
e p , that is,
2
v
+
ep
2
The flow of energy due to convection across surface S is given by
) dS
E (v n
Flow of energy through S =
E =
u+
As noted during the derivation, the assumptions used (e.g., no internal generation, no other body
forces such as magnetic effects are present, etc.) need to hold when using (5.25); otherwise extra
terms are needed.
In some texts, they consider heat generated by reaction as a generation term. In our case, we consider
heats of reaction and other latent heats as included in the specification of internal energy.
5.4 Applications
219
where q is the heat flux due to heat transfer and is the rate of heat input per
volume due to other mechanisms such as radiation, electric field, and so forth.
The net work done by the surroundings is given by
dS
p v+T v n
Rate of net work done by surroundings =
S
Then, after applying the divergence theorem and including the equation of
continuity to put it in terms of substantial time derivative, (5.21) becomes
D
E
= p v T v q +
(5.29)
Dt
known as the total energy balance equation.
For the special case of potential energy being based only on gravity, we have
e p = g, which can be assumed to be independent of time. Then the substantial
time derivative of
e p becomes
D
ep
= v g
(5.30)
Dt
We could also find the substantial rate of change of kinetic energy by taking a
dot product of the equation of motion given in (5.26) by v. Doing so, we can
combine the substantial derivative of both kinetic and potential energy to be
D 1
2
v +
ep
= v p v T
Dt 2
= p v p ( v) T v T :v
(5.31)
where we used the identity given in (4.100), with the last term being
v j
T :v =
T ij
i
i,j =x,y,z
Equation (5.31) is also known as the mechanical energy balance equation. We
can then remove two terms, namely the substantial time derivative of the specific
kinetic and potential energy, from the total energy balance in (5.29) by using
(5.31) to obtain
D
u
= p ( v) T :v q +
(5.32)
Dt
which is known as the thermal energy balance equation.
As a simple application, consider the energy balance for a stationary solid in
which the only mode of heat transfer from surroundings is via conduction.
Applying the equation for internal energy in terms of constant heat capacity Cv
with v = 0
D
u
T
= Cv
Dt
t
where we use T for temperature. For the heat transfer, we use Fouriers law,
that is,
q = k T
220
(5.33)
cAvA + cBvB
cA + cB
and D is the flux due to diffusion. The internal generation term will be the net
rate of production of A due to reaction.6 This is given by
Rate of net generation of mole A via reaction =
RAdV
V
Substituting all the terms to (5.21) and then applying the divergence theorem,
we obtain the binary component balance equation given by
cA
+ (cAv ) = D + RA
t
(5.34)
(5.35)
5.4.2 Electromagnetics
One of the important applications of vector analysis, in fact a major impetus for
the development of vector calculus, is the study of electricity and magnetism. The
6
Because we are treating only binary mixtures, we are assuming only the reaction A B.
5.4 Applications
221
Notation
Formula
Fields
Electric field
Magnetic field
Q
Flux Densities
Jc
Jd
D
t
Current density
Jc + Jd
J
Integrals and Fluxes
dS
Bn
Magnetic flux
Current
dS
Jn
i
Total charge
Q dV
V
Parameters
Permisivity
Permeability
Conductivity
different fields, flux densities, and fluxes7 are defined and given in Table 5.2, where
J = Jc + Jd is Maxwells decomposition of the current density.8
Based on experimental studies, several laws were obtained that relate the different items in Table 5.2. These are
=
H dr
(5.37)
dS
Dn
(5.38)
dS
Bn
(5.39)
E dr
(5.36)
As mentioned earlier in Example 4.3, the term flux in electromagnetics refers to a scalar quantity
that integrates the flow of a flux density vector field through a given closed surface S. The term
flux in fluid transport equations refers to a vector field.
We temporarily suspend our convention of using small bold letters for vector fields to comply with
the usual conventions of electromagnetics. Likewise, please note that i is set as current and not the
imaginary number.
222
E
(5.40)
1
B
(5.41)
with and constant. Applying Stokes theorem to Faradays law and Amperes
law, and applying the divergence theorem to both of the Gauss laws, while assuming
C, S, and V to be fixed, will yield the following:
dS
E dr =
Bn
t S
C
B
dS =
dS
n
(5.42)
( E) n
S
S t
H dr
C
dS
Jn
=
S
dS
( H) n
E
dS
E +
n
t
S
(5.43)
dS
Dn
S
( D) dV
Q dV
(5.44)
dS
Bn
( B) dV
(5.45)
Because the value for C, S, and V are arbitrary, equations (5.42) to (5.45) reduce
to the set of equations known as Maxwells equations for electricity and magnetism:
E
H
B
=
t
t
E
E +
t
(5.46)
(5.47)
5.4 Applications
223
E = Q
(5.48)
H = 0
(5.49)
where the rightmost equations of the first two equations show the coupled relationships between the fields E and H. These equations could be uncoupled by further
taking the curl of (5.46) and (5.47), together with (5.48), (5.49), and identity (4.62),
( E) = ( E) 2 E
1
Q 2 E
( H) = ( H) 2 H
2 H
( H)
t
2
E
E
t
t2
( E)
E +
t
2
H
H
t
t2
(5.50)
(5.51)
where
M =
+ 2
2
t
t
EXAMPLE 5.4. For the special case of static electric fields, that is, E = E(t), the
operator M reduces to the negative Laplacian, 2 , yielding
1
Q
However, this consists of solving three differential equations because it is a
vector differential equation. An interesting alternative is to observe that (5.46)
under static conditions reduces to
2E =
E=0
Based on Theorem 5.5, work done by E along a path inside a simply connected
region will be path independent or, equivalently, E is a conservative field. For
conservative fields, such as E in this case, we can make use of the vector differential identity (4.60), that is, = 0, and let E = , where is now the
unknown potential field. Substituting this representation into (5.48), we have
Q
which has a form known as Poissons equation. For the additional condition that
Q = 0, this reduces to
E = 2 =
2 = 0
224
a form known as Laplaces equation. Once has been solved from either
the Poisson equation or Laplace equation together with the given boundary
conditions, E can be obtaind by simply taking the gradient of .
Unfortunately, from (5.47), one observes that even for static magnetic fields,
H is generally nonconservative. Nonetheless, one can still use M = 2 , under
static conditions, and solve
2H = 0
d
d
h()
F (, x) dx
h()
g ()
g ()
F (, x)
dx
+F (, h())
PROOF.
h()
g ()
F (, g())
(5.52)
V ()
S()
THEOREM 5.7.
EXAMPLE 5.5.
5.6 Exercises
225
5.6 EXERCISES
E5.1. Let a particle move along a path C due to an attractive force f directed toward
the origin, f = kr. The tangential unit vector along the path is given by
dr
dr
t =
dr
2
2
r
dr (r dr)2
r dr
ds
W =k
+
dr
dr
C
3. Evaluate the work due to f and its associated friction force for the cyclone
path shown in Figure 5.10 given by
t3
t3
3 3
y = sin (16t)
x = cos (16t)
z=
t
4
4
4
from t = 0.1 to t = 1.5 in terms of and k.
226
0.8
0.6
0.4
0.2
0
0.5
0.5
0
0.5
0.5
where axx , ayy , . . . axz, ax , ay , az, c are constants. Calculate the volume integral
V (x, y, z) dV , where the volume of integration is the sphere centered at
the origin with a radius . Determine the volume integral if the sphere V had
its center at (x0 , y0 , z0 ) instead.
E5.3. Consider the top in exercise E4.10 whose surface was described by
x
sin () cos ()
sin () sin ()
for 0 and 0 2.
1. Find the volume of the solid.
2. Assuming a density function
(x, y, z) = 2 x2 + y2 + z2
5.6 Exercises
227
s () = RB 1
RA
RB
2
sin2 ()
This means that the surface integral for this cut piece is given by
s ()
f (x, y, z)dS =
g(s, ) dsd
s ()
F = 4z and
Verify Greens lemma (5.1) with u = and v = s for the surface of the
cut piece described Previously and RA = 1.5 and RB = 1. (Note: For the
contour integral, the closed path can be parameterized by for points
(, s) of the path, ranging from 0 to 1 with the following correspondences:
( = 0) ( , 0), ( = 0.5) ( , 0), and ( = 1) ( , 0)).
E5.7. Let S be a closed surface, and show that the divergence theorem leads to the
following identities:
1
V =
n r dS
3 S
n v dS
0 =
dV
V
v dV
n dS
n v dS
S
where V is the volume enclosed by S and v is a vector field. (Hint: For the last
two identities, apply the divergence theorem on a and v a, respectively,
where a is a constant vector.)
Note that the last two identities plus the divergence theorem can be
combined into one mnemonic equation,
F dV = n F dS
(5.57)
V
228
where
Divergence
Gradient
Curl
[F]
dot product
scalar product
cross product
E5.8. Let C be a closed curve, and show that the following line integrals are zero:
3
L1 =
a dr
C
r dr
L2
3
=
L3
dr
C
3
=
L4
where a is any constant vector, r is the position vector, and , v, and w are
scalar functions.
E5.9. Let C be a closed curve, and using Stokes theorem, show that
3
dr =
n dS
C
then the two equations can be combined into one mnemonic equation,
3
dr [F] = (n ) [F] dS
(5.58)
C
where
Cross Divergence
Cross Gradient
[F]
dot product
scalar product
5.6 Exercises
229
where g =
e p , with
e p being the potential energy per unit mass. Then
for steady state and irrotational velocity field, this equation reduces to the
Bernoulli equation of motion,
ep +
p
1
+
v
= constant
along a streamline.
E5.11. One can show that the stress tensor, in the absence of couple-moments, is
symmetric.
1. Let the stress tensor field of a body be given by T and a body force per
unit volume given by f . Show that for equilibrium, a necessary condition
is given by
TT + f = 0
(5.59)
(Hint: The summation of forces along each coordinate should be zero. The
forces include the surface integral of the force due to the stress tensor on
the body surface and the volume integral of the body forces.)
2. Assuming no couple-moments, the total moments due to both body force
and stress tensor on the surface should also be zero, that is,
r T n
dS +
r f dV = 0
(5.60)
S
Show that after implementation of the divergence formula and (5.59), the
stress tensors indeed need to be symmetric.
E5.12. Obtain the differential volume at a point r under the paraboloid coordinate
system described in Example 4.17.
E5.13. Another orthogonal coordinate system is the torroidal coordinate system,
(, , ), with 0, , and 0 2, described by the following
equations:
x=
a sinh() cos()
cosh() cos()
y=
a sinh() sin()
cosh() cos()
z=
a sin()
cosh() cos()
(5.61)
230
Find the total flow of property q out of a torus with R = 1 and C = 2 for
the flow field per unit area given in (5.62).
E5.14. A spherical solid of radius r containing substance s is decaying in time. The
distribution of s is described by a time-dependent field
z
3
t(x2 +y2 )/3
(x, y, z, t) = 0.8 e
+
2r 2
where the center of the sphere is located at the origin. Suppose the radius of
the sphere is also shrinking, symmetrically with respect to the center, due to
erosion. The radius was found to be shrinking at a rate
dr
= 0.1r
dt
r(0) = R = 10
1. Find r = r(t).
2. Determine the rate of change of s present in the solid at time t = 10.
3. Verify Leibnitz rule (5.53) for this rate of change for any time t > 0.
E5.15. Consider the regions R1 through R3 and singularities contained in S1 through
S4 given in Table 5.3. Determine which pairs (i.e., R1 with S1 removed, R1
with S2 removed, etc.) are simply connected.
Table 5.3. Regions R and singularities S for E5.15
Regions
Singularities
R1:
4 x 4
y0
< z <
S1:
R2:
spherical r 2
S2:
R3:
cylindical r 2
S3:
S4:
(x, y, z) = (0, 0, 0)
Surface
2x + y z = 0
Sphere with radius r = 0.1
and center at (4, 0, 0)
Line passing through
(1, 1, 2) and (2, 1, 1)
2
(under spherical coordinates)
r1 r
2
(under cylindrical coordinates)
sin + 3 r
xzx +
y
+ zz
y+3 y
5.6 Exercises
231
PART III
Several models of physical systems come in the form of differential equations. The
main advantage of these models lies in their flexibility through the specifications of
initial and/or boundary conditions or forcing functions. Although several physical
models result in partial differential equations, there are also several important cases
in which the models can be reduced to ordinary differential equations. One major
class involves dynamic models (i.e., time-varying systems) in which the only independent variable is the time variable, known as initial value problems. Another case
is when only one of the spatial dimensions is the only independent variable. For this
case, it is possible that boundary conditions are specified at different points, resulting
in multiple-point boundary value problems.
There are four chapters included in this part of the book to handle the analytical solutions, numerical solutions, qualitative analysis, and series solutions of
ordinary differential equations. Chapter 6 discusses the analytic approaches to solving first- and second-order differential equations, including similarity transformation
methods. For higher order linear differential equations, we apply matrix methods
to obtain the solutions in terms of matrix exponentials and matrizants. The chapter
also includes the use of Laplace transforms for solving the high-order linear ordinary
differential equations.
Numerical methods for solving ordinary differential equations are discussed
in detail in Chapter 7, including implicit and explicit Runge-Kutta methods and
multistep methods such as Adams-Bashforth and Adams-Moulton and the backward
difference formulas (BDF) methods. We also discuss some simple error-control
approaches by adaptive time-interval methods. The second part of the chapter is
devoted to the solution of boundary value problems such as the shooting method for
both linear and nonlinear ordinary differential equations and the Ricatti method.
The last part of Chapter 7 is a relatively brief, but crucial, discussion of difference
equations and stability. The analysis of difference equations is important for the
application of the Dahlquist tests to determine stability regions. Using the ideas of
stability, one can better descirbe stiff differential equations.
Chapter 8 discusses the qualitative analysis of differential equations such as
phase-plane analysis; stability analysis of equilibrium points, including Lyapunov
methods and linearization techniques; and limit cycles. This chapter also discusses
bifurcation analysis based on one- or two-parameter systems.
233
234
236
solutions do not contain any arbitrary constants, and they provide an envelop that
helps determine regions of the domains where solutions exist.
Next, we begin our discussion of higher order differential equations with the
state-space formulation. Even when one decides to use numerical approaches, the initial step, more often than not, is to first recast the differential equations in the statespace forms. We limit our discussion of analytical solutions only to linear high order
differential equations. The nonlinear cases are often handled much better using
numerical methods. Nonetheless, even by limiting our problems to linear differential equations, we still have two cases: those with constant coefficient and those with
nonconstant coefficients. For the linear equations with constant coefficients represented by a constant matrix A, the solutions are pretty standard, with the matrix
exponentials eAt as the key element of the general solutions. This means we need
the results of earlier chapters (i.e., Chapters 1 through 3) to help in evaluating the
required matrices and functions. For instance, when the matrix A is diagonalizable
(cf. Section 3.5), then eAt = Vet V 1 , where is a diagonal matrix of eigenvalues
and V is the matrix of eigenvectors. For the case in which A is not diagonalizable, we
include a brief section discussing the application of the Cayley-Hamilton theorem
to provide a finite-sum approach to evaluating the solution.
For the case in which the system of linear high-order differential equations have
nonconstant coefficients (i.e., A = A(t)), we have the generalization of the solution in
the form of fundamental matrices, also known as matrizants. Although the evaluation
of matrizants is difficult to find in general, some special cases do exist in which the
matrizants can be found directly. For instance, when A(t)A() is commutative, the
matrizant will involve a simple matrix integral. More importantly, we use the concept
of matrizants in the next chapter, when we solve boundary value problems of linear
differential equations with nonconstant coefficients using numerical approaches.
Next, we discuss the idea of decoupling of differential equations. This is possible when the matrix A is diagonalizable. The idea that the original system can
be transformed into a set of decoupled differential equations is introduced in this
section. The main issue is not so much the solution of the differential equations,
because they are simple restatements of the solution involving exponential matrices
eAt . The distinct feature of the decoupled system lies in its offering of an alternative space where the decoupling condition allows the tracking of solution in a
one-dimensional space. Potentially, it allows for easier design, interpretation, and
analysis of physical experiments. We include one such example in the form of the
Wei-Prater method for the determination of kinetic constants of multiple interacting
equilibrium reactions.
Finally, we include a brief discussion of Laplace transform methods for the
solution of multiple equations. Specifically, we show that this approach also yields
the same results of the approach given in the earlier sections.
(6.1)
237
or in the differential form, also known as the Pfaffian form, given by1
M (x, y) dx + N (x, y) dy = 0
(6.2)
M (x, y)
N (x, y)
(6.3)
Although both forms are useful, (6.1) has the additional interpretation (or constraint) of fixing x and y as the independent and dependent variable, respectively, due
to the definition of derivatives. Conversely, (6.2) treats both variables as independent
variables of an implicit solution, i.e., S(x, y) = C, where C is an arbitrary constant to
fit initial or boundary conditions.2 When required, and if it is possible, an explicit
solution can be attempted by rearranging S(x, y) = C to obtain a form y = y(x, C).
For this reason, we predominantly treat the solution of (6.2), except where the given
ODE can be identified more easily with standard (canonical) forms that are given in
the derivative forms, such as the linear first-order ODE and Bernoulli equations.
Most approaches fall under two major categories: those that are reducible to
separation of variables approach and those that are reducible to exact differential
approach. In this perspective, several techniques focus simply on additional transformations, for example, additional terms, multiplicative factors, or the change of
variables, to reduce the problems into one of these categories. Suppose that after a
set transformation is applied on x and y, new variables
x and
y are obtained resulting
in a separable form given by
(
(
M
x) d
x+N
y) d
y=0
(6.4)
then the solution approach is known as the method of separation of variables, yielding
(
(
C
(6.5)
y) d
y=
x) d
x+ N
S (
x,
y) = M
with
C as arbitrary constant.
However, if after the transformation to new variables, we obtain
M
N
=
y
x
(6.6)
known as the exactness condition, then we say that the transformed differential
equation is an exact differential equation and the solution is given by
(
(
M
x,
y) d
x + g (
y) =
N
x,
y) d
y + h (
x)
S=
y held constant
x held constant
(6.7)
where g (
y) and h (
x) are determined by matching terms in the rightmost equation
in (6.7).
1
dy
F x, y,
=0
dx
238
(6.8)
M
N
= 1 and
= 1, which saty
x
isfies the exactness condition given by (6.8),and no transformations are needed.
Thus the solution is given by
With M(x, y) = x + y and N(x, y) = x, we have
x2
S(x, y) =
+ xy = C
2
1
x2
y=
C
x
2
Alternatively, if we let
y = y/x and
x = x, then (6.8) can be transformed to
be
1
1
d
x+
d
y=0
x
2
y+1
which is a separable form. The solution then becomes
2
1 1
1
C
x
y=
C
y=
2
x2
x 2
2
which is the same solution with
C = 2C.
Remarks:
1. In the next section, we group several techniques as similarity transformation
methods. Most of these techniques should be familiar to most readers. It turns
out that they are just special cases of similarity transformations.
2. Later, in the section after next, we focus on the search for an integrating factor
needed to achieve exactness. Unfortunately, it is often that a systematic search
may yield equations that might even be more formidable than the original problem. Thus we simply outline the guiding equations from which some heuristics
will be needed to make the approach more practical.
3. It is possible that the resulting integrals are still not easily reduced to
closed forms. Thus numerical integrations may still be needed to obtain the
solutions.
4. Although an integrating factor is sometimes hard to find, the use of exact differentials have resulted in other benefits. For instance, in thermodynamics,
Caratheodory successfully used the reciprocal of absolute temperature 1/T as
an integrating factor to show that the change in entropy,
s, defined
as the
integral of the ratio of differential heat transfer to temperature (i.e., Q/T ),
becomes a path-independent function.
239
y =
y(x, y)
(6.9)
&
y = y
(6.10)
EXAMPLE 6.2.
&
y
y
=
&
x
x
Substituting x = &
x and y = &
y, we get
2
2 &
y + 22&
x2 d&
x 3&
x3 d&
y=0
(6.11)
240
y
y
= 2
x
x
Having determined the invariant u, the next theorem guarantees that the original
differential equation can be made separable.
THEOREM 6.1.
y = , where = 0 and
admit a set of similarity transformations given by&
x = and&
= 0. Then, using the invariant u = y x as a new variable, while maintaining x as
the other variable, will transform the differential equation into a separable variables
form given by
1
1
dx +
du = 0
(1)/
x
u u
G(u)
where
PROOF.
M (x, ux)
N (x, ux)
G(u) =
1/
M x, ux
x()/
1/
N x, (ux )
(6.12)
if =
if =
EXAMPLE 6.3.
3
3
1
tan
(u + 1) = ln(Cx) y = x2
3 tan
3 ln(Cx) 1
3
3
3
y
v=
= u
x
In fact, any function of the invariant, (u), is also a valid invariant for the differential equation.
241
A large class of differential equations that immediately admits similarity transformations are the isobaric first-order differential equations given by the form
y
xn1 f
dx dy = 0
(6.13)
xn
where f () is any differentiable function of . This admits a similarity transformation
with = n. In particular, with = 1, = n, and u = yxn , Theorem 6.1 reduces
(6.13) to the following separable form:
dx
du
+
=0
x
nu f (u)
(6.14)
Using the equation in Example 6.2 one more time, this differential
equation can be rearranged to be
y
2
x 2 + 2 dx dy = 0
x
EXAMPLE 6.4.
(6.16)
Another set of equations for which similarity transformations may apply after
some additional transformations is given by the form
f ()dx dy = 0
(6.17)
a1 x + a2 y + a3
b1 x + b2 y + b3
(6.18)
242
the original differential equation can be transformed into a homogeneous-type differential equation given by
b2 f
z
w
+ b1
z
dz a2 f
+ a1 dw = 0
w
which can be made separable under the variables w and = z/w, that is,
dw
b2 f () + b1
=
d
w
(a2 b2 ) f () + (a1 b1 )
EXAMPLE 6.5.
(6.19)
3 11 tan 23 11 ln (C [x + 2]) + 1
(x + 2) 1
y=
50
5
where C is an arbitrary constant.
N M
M
N
=
y
x
x
y
(6.20)
243
d
y
x
=
d
(6.21)
N
M
x
y
If could be found such that
M
N
y
x
= F ()
N
M
x
y
then the integrating factor is immediately given by
= exp
F ()d
EXAMPLE 6.6.
(6.22)
(6.23)
Given xy y2 dx + dy = 0, we have
F =
M/y N/x
x 2y
=
(N)(/x) (M)(/y)
x (xy y2 ) y
With y = 1/y, the denominator can be made closer in form to the numerator.
Thus = ln(y) + f (x). Substituting back to F ,
x
2 + y
2
F =
df
x +y
dx
which can be made constant by setting
can obtain the integrating factor:
x2
= ln(y) +
4
df
x
x2
x = , or f (x) = . Finally, we
dx
2
4
1
x2
(x, y) = 2 exp
y
2
x
exp
+
erf
=C
y
2
2
2
where C is the arbitrary constant of integration, and erf(z) is the error function
defined by
2 z t2
erf(z) =
e dt
0
244
Based on (6.22), two special cases are worth noting. These are:
M/y N/x
Case 1
= p (x) = x = exp
p (x)dx
N
M/y N/x
Case 2
= q(y) = y = exp q(y)dy
M
EXAMPLE 6.7.
(6.24)
(6.25)
we have
M/y N/x
3
=
M
y
Using (6.25), we get = y3 . The solution is then given by
x2 + 2y
+C=0
2y2
(6.26)
(
'
y = exp P(x)dx
exp
P(x)dx Q(x) dx + C
(6.28)
245
where F , V , k, and CA,in is the volumetric flow rate, reactor volume, specific
kinetic constant, and inlet concentration of A, respectively. This can be rewritten
in the standard form of a first-order differential equation,
dCA
F (t)
F (t)
+
+ k CA =
CA,in (t)
dt
V
V
where we can identify P(t) = k + (F (t)/V ) and Q(t) = F (t)CA,in (t)/V , and the
solution via (6.28) is
F (t)
k+
CA(t) = exp
dt
V
'
(
F (t)
F (t)
exp
k+
dt
CA,in (t) dt + CA(0)
V
V
One nonlinear extension of the first-order linear differential equation is to introduce a factor yn to Q(x). This is known as the Bernouli equation,
dy
+ P(x)y = Q(x)yn
dx
(6.29)
where n = 1.5 Instead of finding another integrating factor for (6.29), another transformation is used instead. By multiplying (6.29) by (1 n)yn and letting z = y1n ,
the Bernouli equation is reduced to one that is first-order linear type, that is,
dz
+ (1 n) P(x)z = (1 n) Q(x)
dx
and the solution is given by
8
9
1
z = (x)
(1 n) Q(x)(x) dx + C
where
(x) = exp
(6.30)
(1 n) P(x) dx
246
d2 y
dp
=p
2
dx
dy
or
This will allow the reduction of the second-order differential equation to a first-order
differential equation, depending on whether f (x, y, dy/dx) is missing dependencies
on x or y, as follows:
d2 y
dy
x,
=
f
1
dx2
dx
d2 y
dy
= f 2 y,
dx2
dx
dp
= f 1 (x, p )
dx
(6.32)
dp
1
= f 2 (y, p )
dy
p
(6.33)
or
dy
= S2 (y, C1 )
dx
The steady-state, one-dimensional concentration profile of substance A under a concentration-dependent diffusivity and flowing through a
pipe in the axial direction is described by
d
dCA
dCA
=
v
CA
dz
dz
dz
EXAMPLE 6.9.
where and v are the diffusivity at unit concentration and constant flow velocity,
respectively. This can be rearranged to become
d2 CA
1 dCA
1 dCA 2
=
dz2
CA dz
CA
dz
where = v/. This falls under the case where the differential equation is
not explicitly dependent on z. With p = dCA/dz, we obtain a first-order linear
differential equation given by
dp
1
+
p=
dCA CA
CA
p
dCA
dz
=+
247
m1
CA
(z) = k2 ek1 z
with k1 and k2 as the new pair of arbitrary constants. Thus an explicit solution
for CA(z) is given by
1
CA(z) =
W (z) + 1
k1
where W() is Lamberts W-function (also known as the Omega function),
defined as the inverse relation of f (w) = w ew , that is,
t = qeq q = W(t)
(6.34)
&
y = y
(6.35)
x, y,
dy
dx
(6.36)
if, after substituting (6.35), the new differential equation attains symmetry, given
by
d2&
y
d&
y
=f &
x,&
y,
d&
x2
d&
x
where is the similarity transformation parameter, and and are nonzero real
constants.
248
y
&
y
=
x
&
x
1 dy 1 d&
y
x
= &
x
dx
d&
x
(6.37)
(6.38)
Using these new variables, the original second-order differential equation can be
reduced to a first-order differential equation.6
Let the differential equation (6.36) admit a set of similarity transformations given by (6.35). Using similarity variables u and v defined in (6.37) and (6.38),
respectively, the differential equation (6.36) can be reduced to a first-order differential
equation given by
THEOREM 6.2.
G(u, v) + (1 ) v
dv
=
du
v u
where
G(u, v) = x
PROOF.
(6.39)
dy
f x, y,
dx
EXAMPLE 6.10.
d2 y
dy
=x
y + x2 2y2
2
dx
dx
we can determine whether a similarity transformation is possible. To do so,
let
x4
&
x = x ,
&
y = y
then
d2&
y
d&
y
d&
y
= 2&
x&
y
+ (+2)&
x3
22&
y
2
d&
x
d&
x
d&
x
For symmetry, we need = 2. Taking = 1 and = 2, we have the new
variables,
x4
(+2)&
u=
y
x2
and
v=
1 dy
x dx
dv
=u
du
For the more general case of reducing an n th order ordinary differential equation that admits a
similarity transformation, see Exercise E10.8.
249
2du
dx
=
u2 4u + 2C
x
Solving for u and then for y, the general solution can be simplified to be
'
(
y = 2x2 1 + k1 tan k1 ln(k2 x)
where k1 and k2 are a new pair of arbitrary constants.
d2 y
dy
+ a1 x
+ a0 y = f (x)
2
dx
dx
(6.40)
d2 y
dy
+ (a1 a2 ) + a0 y = f (ez)
2
dz
dz
(6.41)
ai xi
di y
= f (x)
dxi
(6.42)
250
and the same change of variable z = ln(x) will transform it into an nth -order
linear differential equation with constant coefficients, involving derivatives with
respect to z.
2. Euler-Cauchy equations are special cases of linear differential equations in
which the coefficients are analytic; that is, they can be represented by Taylor
series. The general approach for these types of differential equations is the
Frobenius series solution method, which is covered in Chapter 9. However, the
transformation technique described in this section will yield the same solution
as the Frobenius series solution, and it has the advantage of being able to
immediately determine the character of the solution based on the value of the
coefficients a0 , a1 , and a2 .
We end this section with a note that there are several other techniques to solve
differential equations. We include three of these in Section F.1, namely general
Ricatti equations, Legendre transformations, and singular solutions.
(6.43)
where x is an n 1 column vector that we refer to as the state vector whose element,
xi , is known as the ith state variable,
x1
x = ...
xn
(6.44)
f 1 (t, x1 , . . . , xn )
..
f(t, x) =
.
f n (t, x1 , . . . , xn )
(6.45)
251
Consider a reactor in which three first-order reactions are occurring simultaneously, as shown in Figure 6.1.
The equations for the rate of change of concentrations of components A, B,
and C are
dCA
= (kABCA + kBACB) + (kACCA + kCACC)
dt
dCB
= (kBCCB + kCBCC) + (kBACB + kABCA)
dt
dCC
= (kCACC + kACCA) + (kCBCC + kBCCB)
dt
These equations can now be formulated in state-space form
EXAMPLE 6.11.
dC
=KC
dt
where,
CA
C = CB ,
CC
(kAB + kAC)
K=
kAB
kAC
kBA
(kBA + kBC)
kBC
kCA
kCB
(kCA + kCB)
Aside from a set of first-order differential equations that are already of the
form given in (6.43), higher order equations can also cast in state-space forms.
Consider a high-order differential equation in which the highest order derivative
can be explicitly written as follows:
dn y
=f
dtn
dy
dn1 y
t, y, , . . . , n1
dt
dt
(6.46)
By assigning states to each of y and its derivatives all the way to (n 1)th derivative,
x1 = y,
x2 =
dy
,
dt
xn =
dn1 y
dtn1
d
dt
x1
..
.
xn1
xn
x2
..
.
xn
f (t, x1 , . . . , xn )
(6.47)
252
Let x1 = y and x2 =
dy
, then
dt
dx1
=
dt
dx2
=
dt
or
x2
x2 (b x21 ) x1
d
x = f(x) =
dt
x2
bx2 x21 x2 x1
(6.48)
where
a11 (t)
..
A=
.
an1 (t)
..
.
a1n (t)
..
.
ann (t)
b1 (t)
b(t) = ...
bn (t)
In the next two sections, we solve (6.48) by the introduction of matrix exponentials when A is constant. The details for the analytical solution are given in
Section 6.5.2. A concise formula for the explicit solution for the case with constant
A is given in (F.21).
When A = A(t), the matrix exponentials are generalized to matrizants. The
solutions for these cases are difficult to generalize. However, two special cases are
253
considered. One case is when A(t) and A() commutes. The other case is when A(t)
can be represented by power series in t.
t2 2 t3 3
A + A +
2!
3!
(6.49)
THEOREM 6.3.
(6.50)
(6.51)
if and only if AW = WA
d At
e = AeAt = eAt A
dt
(6.52)
(6.53)
x(t) x(0)
eA b (t) d
x(t)
eAt x(0) +
eA(t) b ()d
0
(6.55)
254
If matrix A is diagonalizable (cf. Section 3.5), with A = VV 1 , (6.55) can be simplified to be
x(t)
Ve V 1 b ()d
t
e V 1 b ()d
Vet V 1 x(0) +
(6.56)
EXAMPLE 6.13.
with
A=
2
2
2
1
2
1
2
4
4 et
b(t) =
2
2t
1+e
1
x(0) = 0 .
1
The eigenvalues of A are (3, 2, 1), and A is diagonalizable. Thus A =
VV 1 , with
0 1 1
3
0
0
V = 1 0 1 ; = 0
2
0
2 1 0
0
0
1
Let
q
=
0
e V 1 b()d =
67 + et + 21 e2t 31 e3t
1
2
t 2et + 23 e2t
t + et et
5
+ (t 2) et + t 21 e2t
2
5
et + t + 25 e2t 37 e3t
6
2t
e
Vet V 1 x(0) = 0
e2t
x=r+s=
+ (t 2) et + t + 21 e2t
4
+ t 21 et 2e2t + 67 e3t
3
5
et + t + 27 e2t 37 e3t
6
5
2
EXAMPLE 6.14.
with
A = 2
0
2
1
2
4c2 t2 2c1 t + c0
et
c2 t2 c1 t + c0
The third equation is obtained by taking the derivative, with respect to , of both
sides of equation (6.14) and then setting = 1 (because this is the repeated
root),
tet = 2c2 t2 + c1 t
Combining all three equations, we solve for c2 t2 , c1 t, and c0 instead of c2 , c1 , and
c0 . This saves us from having to invert a nonconstant matrix, and it also adds to
the efficiency in the succeeding steps. Thus
2t
4 2 1
c2 t2
e
1 1 1 c1 t = et
tet
2
1 0
c0
and
c2 t2
(t 1) et + e2t
c1 t = (3t 2) et + 2e2t
2tet + e2t
c0
In Section 3.7 Case 3, these scalars are constants. However, in applying those same methods here,
we take the variable t as a parameter to apply Cayley-Hamiltons theorem. This will necessarily
result in the coefficients ci to be functions of t.
255
256
c2 t2 A2 + c1 tA + c0 I
e2t
0
1 2t
t
(t + 1) et
2 e e
2t
e + et
2tet
e2t
1 2t
t
x=
2 e e
2t
e + et
(t + 1) et
0
t
x0
2t et
(t + 1) e
2tet
2t et
(t + 1) et
6.5.3 Matrizants
Consider the linear matrix differential equation given by
d
x = A(t)x + b(t)
dt
(6.57)
(6.58)
A(1 )d1 x0 +
A(1 )
0
(A(2 )x(2 )) d2 d1
0
Let Qk be defined as
Q1 (t) =
A(1 )d1
0
for k = 1
(6.60)
257
and
Qk (t) =
0
A(1 )
A(2 )
0
k1
A(3 )
A(k )dk d3 d2 d1
(6.61)
for k > 1. Assuming convergence, the solution becomes an infinite series given by
Qk x0 = M(t)x0
(6.62)
x(t) = x0 +
k=1
where
M(t) = I +
Qk (t)
(6.63)
k=1
and
dM
= A(t)M(t)
dt
(6.64)
Let the elements of the state matrix A be bounded; then the matrizant
M defined by (6.63), with Qk defined in (6.61), is invertible.
THEOREM 6.4.
PROOF.
and
(6.65)
To show this,
d 1
d 1
d
M M =
M
M + M1 (M)
dt
dt
dt
d
d
M1 + M1
(M) M1
dt
dt
d 1
M
+ M1 AMM1
dt
d 1
M
dt
M1 A
258
b(t)
M1 b(t)
M1 b(t)
M1 b(t)dt
t
M1 ()b()d
=
0
t
M1 ()b()d
(6.67)
M(t) = eQ1
and
M1 (t) = e(Q1 )
(6.68)
A(1 )d1 . (See Exercise E6.19 for an example of this case.) When A
0
z1
(6.69)
z = ... = V 1 x
zn
where x is the original set of state variables, then
d
d
z = V 1 x
dt
dt
8
and
zo,1
zo = ... = V 1 xo
zo,n
This solution approach, where the solution of the linear homogeneous case b(t) = 0 can be used
to extend the solution to the linear nonhomogeneous case b(t) = 0, is also known as Duhamels
principle.
259
V 1 x + V 1 b(t)
z + Q(t)
(6.70)
q1 (t)
1 z1 + q1 (t)
..
.
dzn
dt
(6.71)
n zn + qn (t)
which is a set of decoupled differential equation. Each decoupled differential equation is a first-order linear differential equation whose solution is given by
t
zk (t) = ek t zo,k +
ek (t) q()d
(6.72)
0
(6.73)
(6.74)
with A constant and b(t) bounded, then x(t) is unstable if any of the eigenvalues of A
has a positive real part.
PROOF.
260
When decoupling is possible, the trajectories will move along straight lines.
In the following example, this fact has been used to solve a parameter estimation
problem in which strong interaction among the original set of variables is present.
Wei Prater Kinetics Wei and Prater9 used the idea of decoupling
to obtain kinetic parameters of simultaneous reversible first-order reactions of
N chemical components.
For three components undergoing first-order kinetics as shown in Figure 6.2,
the system is described by
xA
(kAB + kAC)
kBA
kCA
x
d A
xB
xB = Kx =
kAB
(kBA + kBC)
kCB
dt
xC
kAC
kBC
(kCA + kCB)
xC
EXAMPLE 6.15.
xA
x = xB
xc
(kAB + kAC)
K=
kAB
kAC
kBA
(kBA + kBC)
kBC
kCA
kCB
(kCA + kCB)
The objective is to estimate the six kinetic coefficients, kij , using experimental
data. A typical graph is shown in Figure 6.3. Two things are worth noting in
Figure 6.3: (1) all curves converge to a single point, and (2) there are two
straight-line reactions. The main challenge in this parameter estimation problem
is to determine the kinetic coefficients when no experiment exists that isolates
dependence on only one component at a time. Wei and Prater decided to look for
an alternative set of coordinates such that under new coordinates, the pseudoconcentrations are decoupled.
Thus let us define a set of pseudo-concentration variables as follows:
y1
y = y2 = V 1 x
y3
where V is the matrix of the eigenvectors of K.10 This should result in a decoupled system
dy
= y
dt
9
Wei, J. and Prater, C. D., The Structure and Analysis of Complex Reaction Systems, Advances in
Catalysis, 13, Academic Press, New York (1962).
10 This naturally assumes that K is diagonalizable.
261
Figure 6.3. Experimentally observed compositions for butene isomerization. (Data adapted
from Froment and Bischoff, Chemical Reactor Analysis and Design, J. Wiley and Sons, 1979,
p. 21).
y2 = y2 (0)e2 t
and
y3 = y3 (0)e3 t
y1 v1 + y2 v2 + y3 v3
(6.75)
Note that det(K) = 0 because the last row is the negative of the sum of the
upper two rows. Recall from one of the properties of eigenvalues (cf. Property
7 of Section 3.3) that the product of the eigenvalues of matrix K is equal to the
determinant of K. Because K is singular, then at least one of the eigenvalues of K
must be zero. Without loss of generality, set 1 to zero. Then the corresponding
eigenvector, v1 , behaves according to definition of eigenvectors,
Kv1 = 1 v1 = 0
Thus, had the experiment started at x = y1 (0)e1 t v1 ,
d
x = Kx = 0
dt
which means x is the equilibrium point of the process, xeq , that is,
xeq = y1 (0)v1
Let us now look at the deviations from equilibrium: x xeq . From (6.75),
x xeq = y2 (0)e2 t v2 + y3 (0)e3 t v3
which is a linear combination of vectors v2 and v3 .
(6.76)
262
v2
r
A
v3
xeq = v
1
v2
v3
If we start the process at an initial point where y2 (0) > 0 and y3 (0) = 0, we
obtain a reaction path that follows the direction of v2 , that is, a straight-line path.
Along this path, only the coefficient given by y2 (0)e2 t decreases with time. The
eigenvalue 2 can then be found using the least-squares method by estimating
the slope of the linear equation
ln(x xeq ) = ln(y2 (0)v2 ) + 2 t
as we follow the path along v2 .
Using the other straight-line reaction, we can obtain 3 by starting at y3 (0) >
0 and y2 (0) = 0, which will be a path along v3 . Thus the eigenvalue 3 can be
found in a similar manner, using the least-squares method to estimate the slope
of the linear equation
ln(x xeq ) = ln(y3 (0)v3 ) + 3 t
Eigenvectors v2 and v3 can be obtained directly from the data, as shown in
Figure 6.4, whereas the equilibrium point v1 = xeq can be read off the plot
(point r). By subtracting the mass fractions at the start of one of the straight
lines (point s in the figure) and the equilibrium point (point r), the resulting
vector can be designated as v2 . Likewise, using the other straight-line reaction
that is not along the previous line, one could also subtract the start point (point
q) and the end point (point r) to determine v3 .
Combining all the results so far: v1 = xeq , v2 , v3 , 1 = 0, 2 and 3 , we can
build matrices V and ,
0 0
0
= 0 2 0
V = (xeq |v2 |v3 )
0 0 3
Finally, matrix K can be reconstructed as follows:
K = VV 1
(6.77)
6.8 Exercises
263
the method of Laplace transforms can be used to transform the set of differential
equations into a set of algebraic equations.
The Laplace transform of a function f (t) is defined as
f (t)est dt
(6.78)
L [ f (t)] =
0
where s is the Laplace transform variable, which spans the right-half complex plane.
Details, including several properties of Laplace transforms, can be found in Section 12.4. In our case, we use the property of derivatives given by
' (
df
L
= sL [ f (t)] f (0)
(6.79)
dt
and the convolution theorem
L [ f g] = L [ f ] L [g]
where the convolution of f (t) and g(t) is defined by
t
f g =
f (t ) g () d
(6.80)
(6.81)
For the special case of f = eAt , recall (6.53) and apply (6.79),
'
(
d At
L
e
= L AeAt
dt
At
s L e I = AL eAt
or
1
L eAt = sI A
(6.82)
L [x]
AL [x] + L [b]
1
sI A
x(0) + L [b]
Next, use the inverse Laplace transform, L1 [] defined by L1 L [ f ] = f ,
x = L1 L eAt x(0) + L1 L eAt L [b]
= eAt x(0) + eAt (b)
eA(t) b()d
= eAt x(0) +
0
264
PL P0
L
dy
3x y + 3
=
dx
y 3x + 2
3.
dy
= (2x + y) (2x + y + 4)
dx
This problem is adapted from an example given in Jenson and Jeffreys, Mathematical Methods in
Chemical Engineering, 2nd Ed., Academic Press, 1977.
6.8 Exercises
265
1. Using the change of variable z = ln(x), show that the kth derivatives
dk y/dxk can be put in terms of the derivatives dk y/dzk as follows:
dy
dx
d2 y
dx2
1 dy
x dz
1
dy d2 y
+
x2
dz dx2
..
.
dk y
dxk
k
1
d y
b
k,
xk
dz
=1
..
.
where
bk,
(k 1) bk1,1
=
bk1,1 (k 1) bk1,
if = 1
if 1 < < k
if = k
266
d3 y
d2 y
dy
1
+ x2 2 + x
+y=
3
dx
dx
dx
2x
1
1
B=
1
1
0
1
target
guided
missile
(x,y)
where r and rT are the position vectors of the missile and target, respectively,
and k is a proportionality function. After dividing dx/dt by dy/dt, and then
taking another time derivative followed by one more division by dy/dt, show
that we can end up with
"
1 + (dx/dy)2 dx dyT
d2 x
dxT
y)
+
=0
(y
T
dy2
sm
dy dt
dt
!
where we used the fact that sm = (dx/dt)2 + (dy/dt)2 . Consider the simple
case in which the target follows a straight path with yT (t) = YT 0 and xT (t) =
X T 0 + sT t, where YT 0 and X T 0 are constants, and initial conditions x(0) = 0
and y(0) = 0. Find the relationship between x and y, also known as the pursuit
path.12
12
Problem based on Burghes, D. N. and Borrie, M. S. Modeling with Differential Equations, Ellis
Horword, Ltd, 1981.
6.8 Exercises
267
= y2 + y + 2
dx
x
x
Obtain the general solution if (1 + )2 > 4. (See Section F.1.1 for an
approach to solving Ricatti equations.)
E6.14. Given the set of differential equations,
3
2
0 2
d
x=
1 1
dt
1 1
4
1
0
2
2
1
x
0
2
Obtain
the solution when the initial condition is given by xT =
1 1 0 0 .
E6.15. Let A[=]2 2, such that
=
trace(A)
2
"
=
and
Show that
exp (At) =
trace(A)2 4 det(A)
2
= 0
et
p (t)I + q(t)A
where
p (t) = cosh (t) sinh (t)
and
0
Also, using lHospital rule, show that when = 0,
exp(At) = et (1 t) I + At
and thus the solution to (6.83) when = 0 becomes
t
e(t) [1 (t )] I + A [t ] b()d
x(t) = et (1 t)I + At x0 +
0
d
x = Ax + b(t)
dt
268
0
1
0
0
0
1
..
.
.
..
..
..
A= .
.
0
0
0
0 1 2
0
0
..
.
1
N1
with y = x1 .
2. From E3.8, if the eigenvalues are distinct, the matrix of eigenvectors V ,
for the companion matrix A, is given by the Vandermonde matrix,
1
1
..
V = ...
.
N1
N1
1
N
whose inverse is given by
V 1 =
where
hij
hij
i,j 1
i
( j )
j =i
(i j )
j =i
y(0) = 1,
dy
d2 y
(0) = 2 (0) = 0
dt
dt
A set of MATLAB programs are available on the books webpage for obtaining ternary plots. See
the document plotTernaryTutorial.pdf for instructions.
6.8 Exercises
269
Data set 1
Data set 1
Time
CA
CB
Time
CA
CB
Time
CA
CB
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
5.0
7.0
9.0
11.0
12.0
0.900
0.744
0.629
0.543
0.480
0.433
0.399
0.373
0.354
0.340
0.329
0.309
0.302
0.300
0.300
0.000
0.077
0.135
0.178
0.209
0.233
0.250
0.263
0.272
0.279
0.285
0.295
0.298
0.299
0.299
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
10.0
11.0
13.0
14.0
18.0
21.0
0.100
0.220
0.279
0.305
0.315
0.317
0.316
0.314
0.311
0.307
0.305
0.303
0.302
0.300
0.300
0.000
0.043
0.090
0.134
0.171
0.201
0.225
0.243
0.257
0.276
0.282
0.290
0.292
0.297
0.299
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
11.0
13.0
15.0
21.0
0.100
0.152
0.190
0.219
0.240
0.255
0.267
0.276
0.282
0.287
0.290
0.293
0.296
0.298
0.300
0.900
0.745
0.629
0.544
0.481
0.434
0.399
0.373
0.354
0.340
0.330
0.322
0.312
0.307
0.301
2
2
2. Let
1
A1 =
1
A2 =
12
12
and
f (t) = e2t
g(t) = 2e5t
Ri
dv
dt
1
i
C
di
L
dt
Resistors
Capacitors
Inductors
270
in
L2
C i
2
i1
out
dvin
dt
1
d 2 i2
di2
(i2 i1 ) + L2 2 + R
C
dt
dt
dy
5
dx
2
+ 5x = y + 2
is given by
'
1
y(x) =
(x 5 + C)
2
(2
+ 5x 2
where C is arbitrary. Find the singular solution for this differential equation and plot the singular solution together with the complete solution, that
is, using different values for C, thus verify that the region where the solution exists is determined by an envelope that is obtained from the singular
solution.
E6.23. A spring and dashpot system shown in Figure 6.7 is often used to model the
dynamics of mechanical systems in motion. The force balance yields a linear
second-order system given by
m
dx
d2 x
+ C + kx = F (t)
2
dt
dt
6.8 Exercises
271
s
m
dx
dt
k1
mN
m2
cN
m1
c1
xN
x2
x1
mi
ki
Ci
1
2
3
10
8
12
5
5
10
0.1
0.2
0.1
kbCB k f C2A
k f C2A kbCB
272
E6.25. Let CR be the RNA concentration and CE as the enzyme concentration. One
model for protein synthesis is given by
dCR
= r(CE ) k1 CR
dt
dCE
= k2 CR k3 CE
dt
where r(CE ) is the rate of production of protein based solely on the enzyme
concentration, and k1 , k2 and k3 are constants.
1. Show that this can be combined to yield a second-order reaction given by
d2 CE
dCE
+
= k2 CE r(CE ) CE
2
dt
dt
2. Assuming Michaelis-Menten kinetics for r(CE ), that is,
k4 CE
r(CE ) =
k5 + CE
Reduce the equation into a first-order differential equation with dependent variable p = dCE /dt and independent variable z = CE .
E6.26. Prove that if A(t)A() is commutative, then with Q1 (t) and Qk (t) as defined
in (6.60) and (6.61), respectively, becomes
1
Qk (t) = Q1 (t)k
k!
and thus showing that the matrizant formula given in (6.68)
M(t) = eQ1 (t)
is valid.
274
(7.1)
subject to the initial condition, y(0) = yo . We can replace the derivative by its finite
difference approximation given by
dy
k y
yk+1 yk
=
dt t=tk
k t
tk+1 tk
Let hk =
k t = tk+1 tk for k 0, then
yk+1 = yk + hk f (tk , yk )
(7.2)
275
dt
h
yk yk1 = hf (tk , yk )
or
yk+1 = yk + hf (tk+1 , yk+1 )
(7.3)
Equation (7.3) is also known as the implicit Euler Method or backward Euler
Method. It is implicit because yk+1 appears on both sides of equation (7.3), thus
requiring additional steps for the evaluation of yk+1 .
Recall from Section 6.5 that a system of higher order differential equation can
be recast in a state-space formulation given by
d
y = f (t, y)
dt
(7.4)
where y is the state vector and f is a vector of multivariable functions f i (t, y). The
Euler methods are then given by
yk+1
yk + hf (tk , yk )
: Explicit Euler
(7.5)
yk+1
yk + hf (tk+1 , yk+1 )
: Implicit Euler
(7.6)
EXAMPLE 7.1.
1
yk + hb(tk+1 )
I hA(tk+1 )
: Explicit Euler
: Implicit Euler
(7.7)
(7.8)
dy
+ y = et
dt
subject to y(0) = y0 . The analytical solution is given by
1
1
y(t) = y0 +
et/ +
e
1
1
(ExEu)
Let yk
be the value for y(tk ) using the explicit Euler formula, then
h (ExEu)
(ExEu)
(ExEu)
yk
= yk
yk+1
+ etk
h (ExEu) h tk
yk
=
1
e
(7.9)
(7.10)
276
Let yk
be the value for y(tk ) using the implicit Euler formula, then
h (ImEu)
(ImEu)
(ImEu)
= yk
yk+1
yk+1 + etk+1
h
(ImEu)
=
yk
etk+1
+h
+h
(7.11)
Generally, the explicit methods are used more often than the implicit methods
because they avoid the additional steps of solving (possibly nonlinear) equations
for yk+1 . Example 7.1 shows that as long as the increments h are sufficiently small,
the explicit Euler should be reasonably satisfactory. However, very small values
of h imply larger computational loads. The implicit Euler method for Example 7.1
involved only a simple inverse. However, in general, the implicit solution for yk+1
could be more difficult, often involving nonlinear solvers. Nonetheless, as shown in
Example 7.1, the implicit methods are more stable. The issues of stability is discussed
in more detail in Section 7.4.
yk+1
yk +
s
c j kj
(7.13)
j =1
1
The stage s determines the number of parameters. However, the order refers to the accuracy
based on the terms used in the Taylor series expansion.
277
h = 0.0001
h = 0.0001
0.05
Analytical Solution
Explicit Euler
Implicit Euler
Explicit Euler
Implicit Euler
Error
3
0
0.002
0.004
0.006
0.008
0.05
0
0.01
0.002
0.004
0.006
0.008
0.01
h = 0.001
h = 0.001
0.8
Analytical Solution
Explicit Euler
Implicit Euler
0.6
Explicit Euler
Implicit Euler
0.4
0.2
Error
0.2
0.4
2
0.6
3
0
0.002
0.004
0.006
0.008
0.01
0.8
0
0.002
0.004
0.006
0.008
0.01
h = 0.002
h = 0.002
Analytical Solution
Explicit Euler
Implicit Euler
Error
Explicit Euler
Implicit Euler
1
1
3
0
0.002
0.004
0.006
0.008
0.01
3
0
0.002
0.004
0.006
0.008
Figure 7.1. Performance comparison of the explicit and implicit Euler methods for
example 7.1. The plots on the right column shows the errors of the Euler methods
at different values of h.
0.01
278
where kj are the intermediate update terms. Thus in (7.13), the update k is a linear
combination of kj weighted by c j . The parameters of the Runge-Kutta methods are
a1 , . . . , as , c1 , . . . , cs , and b11 , b12 . . . , bss . The parameters a j affect only tk , whereas
the parameters bj, affect only the yk during the evaluations of f (t, y). All three sets
of parameters are usually arranged in a table called the Runge-Kutta Tableau as
follows:
a1 b11 b1s
a
.
..
..
..
B
.
.
.
= ..
(7.14)
a
b
s
s1
ss
c1 cs
cT
Based on the structure of matrix B, the type of Runge-Kutta method can be classified
as follows:
1. Explicit Runge-Kutta (ERK). If matrix B is strictly triangular, that is, bij = 0,
i = 1, . . . , s, j i, the method is known an explicit Runge-Kutta method. For
these methods, the intermediate updates, kj , given in (7.12) can be evaluated
sequentially.
2. Implicit Runge-Kutta (IRK). If matrix B is not strictly triangular, the method
is known as an implicit Runge-Kutta (IRK) method. Some of the special cases
are:
(a) Diagonally Implicit Runge-Kutta (DIRK).
bij
j >i
b
=
for some 1 s
(7.15)
and
g=
f(x)
1
(7.16)
The advantage of (7.16) is that we will no longer need the parameters a j during the
Runge-Kutta calculations. Nonetheless, in some cases, there are still advantages to
using the original nonautonomous system because the stability solution properties
apply only to y (note that t is allowed to be unbounded).
If we require (7.12) to hold for the special case of [y = t, f = 1] and the case
[ f (y) = f (t)], we have the following consistency condition:
aj =
s
=1
bj
; j = 1, . . . , s
(7.17)
279
h2
= yk + hf (tk , yk ) +
2!
f
f
f
+
+
y tk ,yk
t tk ,yk
(7.18)
To obtain an n th -order approximation, the series will be truncated after the (n + 1)th
term. Thus the Euler method is nothing but a first-order Taylor series approximation.
In the next two sections, we discuss the fourth-order explicit Runge-Kutta method
and the fourth-order implicit Runge-Kutta method.
1
2
1
2
1
2
1
2
1
6
1
3
1
3
1
6
cT
(7.19)
k2
k3
k4
hf (tk , yk )
h
k1
hf tk + , yk +
2
2
h
k2
hf tk + , yk +
2
2
hf tk + h, yk + k3
yk+1
yk +
1
k1 + 2k2 + 2k3 + k4
6
(7.20)
280
where
y=
[1]
..
.
y[M]
f t, y =
f [1] t, y
..
.
f [M] t, y
(7.21)
1
3
1
1
3
2
6
4
4
6
1
3
1
3
1
=
+
+
(7.22)
2
6
4
6
4
1
1
T
c
2
2
or, in actual equations,
k1
1
3
1
1
3
hf t +
h, yk + k1 +
k2
2
6
4
4
6
k2
1
1
3
3
1
+
hf t +
+
h, yk +
k1 + k2
2
6
4
6
4
yk+1
yk +
1
(k1 + k2 )
2
(7.23)
281
k,2
hf tk + a1 h, yk + b11 k,1 + b12 k,2
hf tk + a2 h, yk + b21 k,1 + b22 k,2
yk+1
yk +
1
k1 + k2
2
(7.24)
where
1
3
a1 =
2
6
1
3
a2 = +
2
6
B=
and
1
3
4
6
1
4
1
3
+
4
6
1
4
Remarks: Two MATLAB functions, rk4.m and glirk.m, are available on the
books webpage and implement the fixed-step fourth-order explicit Runge-Kutta
and the fourth-order implicit (Gauss-Legedre) Runge-Kutta methods, respectively.
EXAMPLE 7.2. Let us now compare the performance of the fourth-order explicit
and implicit Runge-Kutta methods applied to the same differential equation
given in Example 7.1, that is,
dy
+ y = et
dt
(ExRK)
yk+1
(ExRK)
yk
1
(k1 + 2k2 + 2k3 + k4 )
6
(ImRK)
Let yk
be the value for y(tk ) using the implicit Runge-Kutta based on the
Gauss-Legendre formulas given in (7.23); then, after further rearrangements of
the equations,
(ImRK)
yk+1
where
q1 = 1 R
1
1
(ImRK)
= q1 yk
+ q2 etk
and q2 = R
(7.25)
ea1 h
ea2 h
282
1
3
1
3
with a1 =
, a2 = +
and
2
6
2
6
+
h 4
1
1
R=
2
2
1
3
4
6
1
1
3
4
6
h 4
Using the parameters used in Example 7.1, = 0.001, = 100, and y0 = 1.0.
Figure 7.2 shows the performance of both the explicit and the implicit RungeKutta methods for h = 0.001, 0.002, 0.003.
Compared with the Euler method, the accuracy has improved significantly,
as expected. At h = 0.001, the Runge-Kutta methods have much smaller errors,
even compared with Euler methods using an increment that is ten times smaller
at h = 0.0001. The explicit Euler method had become unstable at h = 0.2,
whereas the explicit Runge-Kutta did not. The explicit Runge-Kutta did become
unstable when h = 0.03. In all the cases, the implicit Runge-Kutta using the
Gauss-Legendre formulation had the best performance. It can be shown later
in Section 7.4.2 that the Gauss-Legendre implicit Runge-Kutta method will be
stable for all h when > 0, but the errors will still increase as h increases. This
means that even for the implicit Runge-Kutta method, h may need to be varied
to control the errors. A variable step method should be able to improve this
approach further.
From the previous examples, we observe that both accuracy and stability of
the solutions are often improved by using a smaller step size h. However, smaller
step sizes require more computation time and storage. Moreover, different step
sizes are needed at different regions of the solution to attain the desired level
of accuracy while being balanced with computational efficiency. The process of
continually adjusting the step sizes of the numerical methods to attain balance
between accuracy and computational loads at the appropriate regions of the
solution is generally known as error control. Two popular variations to the
explicit Runge-Kutta methods that include error control are the Fehlberg 4/5
method and the Dormand-Prince 4/5 method. Details for both these approaches
are included in section G.5 of the appendix.
m
i=0
ai yki + h
m
j =1
bj f (ykj )
(7.26)
283
h = 0.001
h = 0.001
16
x 10
14
Analytical Solution
Explicit RK
Implicit RK
10
Error
0.5
Explicit RK
Implicit RK
12
0.5
1
0
0.002
0.004
0.006
0.008
2
0
0.01
0.002
0.004
0.006
h = 0.002
h = 0.002
0.008
0.01
1
0.4
Analytical Solution
Explicit RK
Implicit RK
0.3
Error
0.5
Explicit RK
Implicit RK
0.2
0.1
0.5
0
1
0
0.002
0.004
0.006
0.008
0.01
0.1
0
0.002
0.004
0.008
0.01
0.008
0.01
h = 0.003
h = 0.003
0.006
Analytical Solution
Explicit RK
Implicit RK
6
Explicit RK
Implicit RK
Error
1
0
0.002
0.004
0.006
0.008
0.01
1
0
0.002
0.004
0.006
Figure 7.2. Performance comparison of the explicit and implicit Runge-Kutta methods for
example 7.2
284
1
1
1 b0
1 1
m b1
0 1
(7.27)
. .
.
.
.
.
..
.
.. ..
..
.
.
.
.
. .
0 1 2m mm b (1)
m
m+1
Details on how this equation was obtained are given in Section G.3 as an appendix.
The matrix multiplying the vector of b coefficients is a Vandermonde matrix, which
is nonsingular. This means that the coefficients will be unique.
For the fourth-order Adams-Bashforth method,
27
b0
1
b1
2
b2
3
b3
4
285
which yields
55
59
37
9
b1 =
b2 =
b3 =
24
24
24
24
Thus the fourth-order Adams-Bashforth method is given by
h
yk+1 = yk +
55f (yk ) 59f (yk1 ) + 37f (yk2 ) 9f (yk3 )
24
b0 =
(7.28)
(7.29)
[M]
yk+1 = yk
+h
which is just the same equation as tk+1 = tk + h. This means that the AdamsBashforth formulas can be extended immediately to handle the nonautonomous
case, that is,
h
55 f(tk , yk ) 59 f(tk1 , yk1 ) + 37 f(tk2 , yk2 ) 9 f(tk3 , yk3 )
yk+1 = yk +
24
(7.30)
1
1
1
1
b1
1
0
1
m b0
=
(7.31)
..
..
..
..
..
..
..
.
.
.
.
.
.
(1)m+1
(1)m+1
0
1
mm+1 bm
m+2
For the second-order Adams-Moulton method, we have
b1
1 1
1
b1 = b0 = 1
=
1 0 b
2
1
0
2
or
yk+1
h
= yk +
2
f yk+1 + f yk
(7.32)
286
b1
2
b0
4
b1
8
b2
2
=
+1
3
which yields b1 = 9/24, b0 = 19/24, b1 = 5/24, and b2 = 1/24. Thus the fourthorder Adams-Moulton method is given by
yk+1
h
= yk +
24
9f yk+1
+ 19f yk 5f yk1 + f yk2
(7.33)
Note that because multistep methods require past values of y, one still needs to
use one-step methods such as the Runge-Kutta methods for the initialization step.
This means that for a fourth-order Adams-Bashforth or Adams-Moulton method,
we need to use one-step methods to evaluate y1 , y2 , and y3 (in addition to y0 ) before
the Adams-Bashforth could proceed. For the fourth-order Adams-Moulton method,
we need the initial values of y0 , y1 , and y2 .
Notice that with y = t and f (y) = 1, we also end up with tk+1 = tk + h. Thus
the formulas can be extended immediately to include nonautonomous cases and
multivariable cases as follows:
h
yk+1 = yk +
9 f tk+1 , yk+1 + 19 f tk , yk 5 f tk1 , yk1 + f tk2 , yk2
24
(7.34)
h
55 f tk , yk 59 f tk1 , yk1
yk +
24
+ 37 f tk2 , yk2 9 f tk3 , yk3
(7.35)
2. Corrector:
=
yk+1
287
h
9 f tk+1 , zk+1 + 19 f tk , yk
24
5 f tk1 , yk1 + f tk2 , yk2
yk +
(7.36)
One could vary the correction stage by performing additional iterations, for example,
(0)
with wk+1 = zk+1 ,
(j +1)
wk+1 = yk +
(j +1)
h (j )
9f wk+1 + 19f yk 5f yk1 + f yk2
24
(7.37)
(j +1)
(j )
until wk+1 wk+1 < , where is a chosen tolerance, and then set yk+1 = wk+1 .
(j +1)
Even though this may converge to the stable value of wk+1 , it may still not necessarily
be an accurate value. Instead of (7.37), it is more efficient to decrease the step-size
h, using error-control strategies such as those discussed in Section G.5.
m
ai yki + h
m
bj f ykj
j =1
i=0
m
ai yki + hb1 f yk+1
(7.38)
i=0
m
i=0
hb1 +
m
i 1 +
i=0
[ (i + 1) h]q
q!
q=1
i = 1 ;
p 1
i=0
i [ (i + 1)] + b1 = 0 ;
p 1
[ (i + 1)]q
i=0
q!
i = 0 , q = 2, . . . , p
288
0 1 1
... 1
b1
1 1 2
... p
a0
0 1 22 . . . p 2 a1
.. .. ..
..
..
.
.
. . .
.
.
0
2p
...
pp
a p 1
1
0
0
..
.
(7.39)
b1
a0
a1
a2
a3
or
yk+1 =
0
1
0
0
0
1
1
1
1
1
1
2
22
23
24
1
3
32
33
34
1
4
42
43
44
1
0
0
0
12
25
48
25
36
25
16
25
25
48
36
16
3
12
yk yk1 + yk2 yk3 + hf yk+1
25
25
25
25
25
(7.40)
48
36
16
3
12
yk
yk1 +
yk2
yk3 + h f yk+1
25
25
25
25
25
(7.41)
BDF methods can handle several stiff differential equations well, but stability properties do deteriorate at higher orders; that is, they are only stable for orders less
than 7.
Unfortunately, care is needed when extending this result to nonautonomous
differential equations because setting y = t and f (y) = 1 will now yield
tk+1 =
48
36
16
3
12
tk tk1 + tk2 tk3 + h
25
25
25
25
25
which does reduce to tk+1 = tk + h only if the step size h is held constant. Thus for
fixed h, the nonautonomous approach implies the simple replacement of f(yk+1 ) by
f (tk+1 , yk+1 ). When the step sizes are variable, the coefficient will also have to vary
at each step. The generalized formulas for the BDF coefficients when the step sizes
vary are discussed in Section G.4.
EXAMPLE 7.3. We now compare the performance of the four multistep methods
just discussed, applied to the same linear scalar system used in Examples 7.1
and 7.2,
1 dy
+ y = et
dt
h tk+1
yk+1 =
48yk 36yk1 + 16yk2 12 e
25 + 12h
Figure 7.3 shows the performance of these iteration formulas of the various
multistep methods for = 0.001, = 100, and y0 = 1.0. In practical use, one
would use one-step methods such as the Runge-Kutta methods to evaluate the
first four or three iterations. For our example here, however, we used the exact
values coming from the analytical solution to allow us to assess the performance
of the multistep methods, independent of the Runge-Kutta methods.
All the methods performed well for h = 0.0001. However, when compared
with Figure 7.3 for the Runge-Kutta methods, the accuracy and stability of the
explicit Adams-Bashforth method did not appear to be as good at h = 0.001.
The stability of the predictor-corrector formulas is better than that of the explicit
Adams-Bashforth method, but it was unstable when h = 0.002. The two implicit
methods are the Adams-Moulton and BDF methods. It appears for small h
values, the Adams-Moulton performs better than the BDF method. However,
as we see in Figure 7.4, the BDF maintained stability even at h = 0.004, but
the Adams-Moulton did not. This shows that although stability is improved
by using implicit methods, some methods have greater stability ranges than
others. This is one reason that, among multistep methods, the BDF methods are often chosen to handle stiff2 differential equations, and then they
are coupled with other enhancements, such as step-size control, to improve
accuracy.
289
290
Analytical
AdamsBashforth
AdamsMoulton
PredictorCorrector
BDF
1.5
Error
x 10
Analytical Solution
AdamsBashforth
AdamsMoulton
PredictorCorrector
BDF
0.5
h = 0.0001
0.5
0.5
0.5
1
0
0.002
0.004
0.006
0.008
1.5
0
0.01
0.002
0.004
0.006
t
h = 0.001
0.01
h = 0.001
5
Analytical
AdamsBashforth
AdamsMoulton
PredictorCorrector
BDF
Analytical Solution
AdamsBashforth
AdamsMoulton
PredictorCorrector
BDF
Error
0.5
0.008
0.5
1
0
0.002
0.004
0.006
0.008
5
0
0.01
0.002
0.004
t
h = 0.002
0.008
0.01
0.008
0.01
h = 0.002
Analytical Solution
AdamsBashforth
AdamsMoulton
PredictorCorrector
BDF
0.5
0.5
Error
0.5
0.006
1.5
0.5
2.5
Analytical
AdamsBashforth
AdamsMoulton
PredictorCorrector
BDF
1
0
0.002
0.004
0.006
0.008
0.01
3.5
0
0.002
0.004
0.006
Figure 7.3. Performance comparison of the various multistep methods for example 7.3
h = 0.004
0.5
291
0.6
Analytical Solution
AdamsMoulton
BDF
0.4
Analytical Solution
AdamsMoulton
BDF
Error
0.2
0.2
0.5
0.4
1
0
0.05
0.1
0.15
0.05
0.1
Figure 7.4. Performance comparison between Adams-Moulton and BDF at h = 0.04 for
example 7.3
0.15
292
(7.42)
p p = 0
(7.44)
i=0
The roots of (7.44) will be used to form the complementary solution, that is, the
solution of the homogeneous part of (7.43), as stated by the following theorem:
Let f (n) = 0 in the difference equation (7.43), and let the distinct
(possibly repeated)
roots of the p th -order characteristic equation in (7.44) be =
;
:
1 , . . . , M where j is a k j -fold root (i.e., repeated (k j 1) times and M
j =1 k j =
p ). The solution of the homogeneous difference equation is then given by
THEOREM 7.1.
yn = Qn y0 =
M
S (j, n)
(7.45)
j =1
where
S (j, n) =
k j 1
c j, n ( j )n
(7.46)
=0
and c j, are arbitrary constants that are used to fit initial conditions.
PROOF.
EXAMPLE 7.4.
293
10
20
30
or using the polar form and Eulers identity on the complex roots,
yn = C1 (0.5)n + (C2 + C3 n) (0.7)n + rn (A cos (n) + B sin (n))
where
!
r = 0.82 + 0.32
= tan
0.3
0.8
a b
0
c ... ...
AN =
..
..
.
. b
EXAMPLE 7.5.
or
40
294
or
(7.47)
where a, b, q, and are constants, with q as a positive integer. The particular solution,
Y , can then be formulated as
(7.48)
Y = n K bn (A0 + + Aq nq ) cos (n) + (B0 + + Bq nq ) sin (n)
where K = 0 if R = bei is not a root of the characteristic equation (7.44). If R =
bei = j , where j is a k j -fold root of the characteristic equation (7.44), then we need
to set K = k j . The coefficients A0 , . . . , Aq and B0 , . . . , Bq can then be determined
after substituting Y into the difference equation.
7.4.2 Stability
As long as the term b in (7.47) has a magnitude less than 1, f (n) is bounded as
n . This means that the particular solution will also be bounded as n .
One can then conclude that as long as the nonhomogeneous term f (n) is bounded,
the source of instability can only come from the complementary solution. Using
Theorem 7.1, the stability of a linear difference equation will then depend on the
roots of the characteristic equation.
For the linear difference equation (7.43), with f (n) as a linear combination of terms having the form given in (7.47), let f (n) < as n , and let
= (1 , . . . , M ) be the set of distinct roots of the characteristic equation (7.44). Then
the solution of the difference equation, yn , is stable if 4
j < 1
for j = 1, . . . , M
THEOREM 7.2.
The theorem gives a sufficient condition. For each j that is not repeated, the stability condition
could be relaxed to be | j | 1. However, because round-off errors are usually present, it may be
more reasonable to use the strict inequality when applying it to the stability analysis of numerical
solutions.
295
(7.49)
x1n
xn = ...
xM
n
..
.
a11
..
A= .
aM1
a1M
..
.
aMM
Let matrix T be the nonsingular matrix that would yield the Jordan canonical decomposition of A: TJT 1 = A. The solution of (7.49) is
x1
=
..
.
Ax0 + b0
xn
A x0 +
n
n1
Ai1 bi1
i=0
or
xn = TJ n T 1 x0 +
n1
TJ i1 T 1 bi1
(7.50)
i=1
When J is diagonal,
..
J =
.
0
+
M
n1
1n
xn = T
0
..
i=1
1i
0
..
1
T x0
n
M
1
T bi1
i
M
1
0
..
..
.
.
J = diag
..
. 1
xn
n,n1 n1
..
.
n1
i=1
n,nM+1 nM+1
..
n,nM+2 nM+2
..
.
1
T x0
n
i
i,i1 i1
..
.
i,iM+1 iM+1
..
i,iM+2 iM+2
..
.
1
T bi1
296
where
k,j =
k!
(k j )!j !
0
if j 0
otherwise
In either case, the stability again depends on the eigenvalues of A; that is, as long as
bi in (7.49) are bounded, then stability is guaranteed if |i | < 1 for all i = 1, . . . , M.
Based on the preceding canonical forms, for most numerical methods, including single-step or multistep, implicit or explicit, the stability is applied to the the
Dahlquist test case. For the single-step methods, the test involves the following
steps:
1. Apply the chosen numerical method on
dy
= y
dt
with y(0) = 1.
2. With h =
t and z = h, rearrange the equation to be
yn+1 = g (z) yn
3. Find the stability region, which is the region such that
*
)
Stab : z : g (z) < 1
When the s-stage and s-order explicit Runge-Kutta methods are
applied to the Dahlquist test case, the difference equation becomes
s
1
i
yn+1 =
(h) yn
i!
EXAMPLE 7.6.
i=0
The stability region is then shown in Figure 7.6 for s = 1, 2, 3, 4. Note that,
although the stability region increases with the number of stages s, the explicit
Runge-Kutta methods will be conditionally stable; that is, the step size is constrained by the value of .
Applying the backward Euler method to the Dahlquist test, the difference
equation becomes
yn+1 =
1
yn
1 h
The stability region is shown in Figure 7.7. This means that if Real () < 0, the
backward Euler method will be stable for any step size.
Finally, we can assess the stability region for the Gauss-Legendre method,
which is an implicit fourth-order Runge-Kutta. The difference equation after
applying the Gauss-Legendre method to the Dahlquist test is given by
yn+1 =
12 + 6h + (h)2
12 6h + (h)2
yn
297
3
s=4
s=3
s=2
Figure 7.6. Stability region for z = h using the explicit RungeKutta methods (unshaded regions) of order s = 1, 2, 3, 4.
Imag ( z )
1
s=1
3
3
Real ( z )
The stability region is shown in Figure 7.8. For Real () < 0, the Gauss-Legendre
method will also be stable for any step size h.
For multistep methods, the procedure is similar. Instead, the characteristic equations will yield multiple roots, which are functions of z = h. One of the roots,
usually denoted by 1 , will have a value of 1 at z = z . This root is known as the
principal root. The other roots are known as the spurious roots. For accuracy and
convergence properties, the principal root is the most critical, whereas the effects
of spurious roots die out eventually, often quickly, as long as the method is stable.
The stability regions can again be obtained by applying the numerical method to
the Dahlquist test case to obtain the characteristic equation. If any of the roots at
z = z , k (z ), have a magnitude greater than 1, then z belongs to the unstable
region.
1.5
Figure 7.7. Stability region for z = h using the backward Euler methods (unshaded regions).
Imag (z)
0.5
0.5
1.5
2
1
Real (z)
298
Imag (z)
10
10
10
Real (z)
Let us apply the Dahlquist test case, that is, f (t, y) = y, to the
fourth-order Adams-Moulton method given in (7.33); then we obtain the following difference equation:
z
yk+1 = yk +
(9yk+1 + 19yk 5yk1 + yk2 )
24
where z = h, and whose characteristic equation is then given by
9
19
5
1
3
2
1 + z + 1 + z
z +
z =0
24
24
24
24
EXAMPLE 7.7.
From the stability regions shown in Example 7.6, we note that the explicit RungeKutta methods will need small values of step size as Real () becomes more negative.
However, both the backward Euler and the Gauss-Legendre methods are unconditionally stable for Real () < 0. This type of stability is also known as A-stability.
There are other types of stability, some of which are shown in Figure 7.10. Some
of these alternative types of stability are needed to discriminate among different
schemes, especially multistep methods.
As we have noted, for some numerical methods, especially explicit methods,
stability requirements may demand smaller time steps than accuracy requires. When
this situation occurs, we say that the system is a stiff differential equation. In several
cases, differential equations are classified as stiff when the difference between the
299
1.5
Untable
Figure 7.9. The stabliity region of the AdamsMoulton implicit method for the Dahlquist test
case.
Imag (z)
0.5
Stable
0.5
1.5
Real (z)
magnitudes of the largest and smallest eigenvalues is very large. For stiff differential
equations, implicit methods such as the Gauss-Legendre IRK or multistep BDF
schemes are usually preferred. Specifically, D-stability (cf. Figure 7.10), that is, where
the stability is guaranteed inside the region whose real part is less than D < 0, is also
known as stiffly stable. It can be shown that BDF schemes of lower order (e.g., order
6) are stiffly stable.
Finally, note that the stability issues discussed in this section are all based on
linear systems. For nonlinear systems, linear stability analysis can be applied locally
via linearization. Other sophisticated approaches are needed for global analysis and
are not covered here.
qT (x(T ))
(7.51)
300
Im( h)
Im( h)
Re( h)
Re( h)
-D
Im( h)
Im( h)
Re( h)
Re( h)
0
Figure 7.10. Different types of numerical stability. The unshaded regions are stable; for
example, for A -stability, the shaded regions do not cross the dotted lines.
where q0 and qT are general nonlinear functions. In the linear case, we have
Q0 x(0)
QT x(T )
(7.52)
(7.53)
(7.54)
301
an appendix. Yet another approach is the finite difference method, which is discussed
in Chapter 13.
(7.55)
(7.56)
(7.57)
where M(t) is the matrizant of the system that satisfies the homogenous part, that is,
d
M = A(t)M
dt
M(0) = I
(7.58)
M()1 b()d
z(0) = 0
(7.59)
Using (7.57), at t = T ,
x(T ) = M(T )x(0) + z(T )
(7.60)
1
p Qbz(T )
Qa + QbM(T )
(7.61)
In case M(T ) and z(T ) can be evaluated analytically, x(0) can be calculated and
substituted into (7.57). This would yield an analytical solution for x(t). However, in
other cases, one may have to rely on numerical methods to estimate M(T ) and z(T ).
To estimate M(T ), apply the properties given in (7.58). Using IVP solvers such
as Runge-Kutta methods, Adams methods, or BDF methods, while setting initial
conditions to e j (the j th column of the identity matrix), we could integrate the
302
homogenous part of the differential equation until t = T . This yields m j (T ), that is,
the j th column of M(T ). Thus
1
0
d
m1 (t = T )
IVS
m1 = A(t)m1 , m1 (0) = ...
dt
0
0
..
.
0
0
..
.
d
IVS
mn = A(t)mn , mn (0) =
dt
0
1
mn (t = T )
where IVS denotes any initial value solver such as Runge-Kutta methods, Adams
methods, or BDF methods. Afterward, we can combine the results to form the
matrizant at t = T , that is,
M(T ) =
m1 (T )
m2 (T )
mn (T )
Likewise, to estimate z(T ), apply the properties given in (7.59). Using a zero
initial condition, we could integrate the nonhomogeneous part of the differential
equation until t = T to obtain z(T ), that is,
0
z (t = T )
IVS
z = A(t)z + b(t) , x0 = ...
dt
0
Once M(T ) and z(T ) have been estimated, equation (7.61) can be used to
determine the required initial condition, x(0). Finally, we can use the initial value
solvers once more and integrate the (nonhomogenous) differential equation (7.55)
using x(0).
EXAMPLE 7.8.
and
z(4) =
2.4966 100
3.8973 101
303
5
x2
x ,x
x1
10
15
20
25
0
0
20.182
A plot of x(t) is shown in Figure 7.11, which shows that the boundary conditions
are indeed satisfied.
The extension to nonlinear boundary value problems can be handled by using
Newton-type algorithms. Details of these approaches are given in Section G.7.
d
fsys t, y, dt y, u
= f t, z, d z = 0
dt
fctrl (t, y, u)
(7.63)
(7.64)
where z = (y, u)T is an extended state vector. The combined system in (7.64) is an
example of a differential algebraic equation (DAE), and it is a generalized formulation of ordinary differential equations (ODE).
304
In some cases, as in (7.64), the DAE system takes the form known as the semiexplicit DAE form given by
d
y
dt
0
f1 (t, y, u)
f2 (t, y, u)
(7.65)
u = q(t, y)
f1
f1 dy f1 du
+
+
t
y dt
u dt
f2
f2 dy f2 du
+
+
t
y dt
u dt
which becomes an ODE set if f2 /u remains nonsingular for all t, u, and y. This trick
of taking derivatives of a strict DAE set can be done repeatedly until it becomes an
ODE set. The minimum number of times a differentiation process is needed for this
to occur is known as the index of the DAE. Thus if f2 /u remains nonsingular for
the system (7.65), it can be classified as an index-1 DAE. Likewise, an ODE set is
also known as an index-0 DAE.
Several general-purpose DAE solvers are available to solve index-1 DAEs.
Most of them are based on the BDF methods. There are also implicit Runge-Kutta
methods available. There are specific features of numerical solution of DAEs that
distinguish them from that of simple ODEs such as consistency of initial conditions,
but we refer to other sources that discuss these issues in great detail.5 Instead, we
simply outline the general approach for either implicit Runge-Kutta and multistep
BDF methods.
Assume that the following DAE set is index-1:
d
d
F t, y, y = 0 subject to y (t0 ) = y0 and
y(0) = z0
(7.66)
dt
dt
The BDF method (generated in Section G.4) results in the following nonlinear
equation that needs to be solved for yk+1 :
m
1
F
tk+1 , yk+1 ,
(i|k) yki
=0
(7.67)
hk
i=1
where the coefficients (|k) are given in (G.22). Because the BDF method is a
multistep method, one needs to have values to initiate the recursion. One approach
5
See, for example, K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of InitialValue Problems in Differential-Algebraic Equations, North-Holland, New York, 1989.
7.7 Exercises
305
is to use one-step BDF, followed by two-step and so on, until the maximum number
of steps is reached (which has to be less than 7 for stability). This would also mean
that to achieve good accuracy at the start of the BDF method, variable step sizes
may be needed.
For s-stage Runge-Kutta methods, the working equation is given by
s
bj k, , k,
F
=0
j = 1, . . . , s
(7.68)
(tk + a j h) , yk +
=1
yk+1 = yk +
s
c j k,
(7.69)
j =1
7.7 EXERCISES
x(0) =
1
1
306
E7.4. One example of a stiff differential system is given by the chemical reaction
system known as Robertsons kinetics:
A
B+B
C+B
B+C
A+C
tnum
= [0, 0.1, 0.15, 0.2, 0.3, 0.34, 0.4, 0.48, 0.5]
Let xnum and xdata be the numerical (simulated) values and the data values,
respectively, at the time instants given in tdata . Let be the parameters of the
system. Then the parameters can be estimated by performing the following
optimization:
opt = arg min (xnum () xdata )
7.7 Exercises
307
x1
x2
Time
x1
x2
Time
x1
x2
0.2419
0.6106
1.0023
1.3940
1.7166
1.0363
1.1018
1.1813
1.2327
1.3076
0.9146
0.8117
0.6573
0.5497
0.4515
2.2235
2.7074
3.3756
4.1590
4.9654
1.3591
1.4199
1.4480
1.4760
1.4854
0.3111
0.2456
0.1942
0.1801
0.1708
6.4171
7.4539
8.3525
9.0438
9.6198
1.5088
1.5088
1.5275
1.5135
1.5368
0.1661
0.1661
0.1661
0.1614
0.1614
(7.70)
(7.71)
y(2) = 0.5
Use the shooting method and obtain the plot of y(x) from x = 0 to x = 2.
E7.8. Obtain a closed formula for the determinant of the N N matrix AN given
by
5
2
0
..
1
.
5
AN =
.
.
..
. . 2
0
1
5
(Hint: Follow the method used in Example 7.5.)
308
E7.9. Obtain the solution for the two-point boundary value problem given by:
dCa
dt
dCb
dt
dCc
dt
dCd
dt
k1 Ca Cb + k2 Cc k3 Ca Cc
k1 Ca Cb + k2 Cc
k1 Ca Cb k2 Cc k3 Ca Cc
k3 Ca Cc
Cd (10)
Cd (0)
Ca (10)
0.01
Cb(10)
0.08
where k1 = 12, k2 = 0.1 and k3 = 2.5. (Hint: Use the initial guess of Ca (0) =
Cb(0) = Cc (0) = 0.3 and Cd = 0 )
E7.10. For the fourth-order BDF method given in (7.40),
1. Find the principal root for = 0 and the spurious roots.
2. Determine the stability region for the Dahlquist test case.
3. Show that this method is stiffly stable by finding the value of D that would
make it D-stable. Is it also A -stable?
E7.11. Show that the seventh-order BDF method is no longer stable for the
Dahlquist test case with = 0. (Hint: Solve for the roots and show that
some of the spurious roots are greater than 1 when z = h = 0.)
E7.12. Another class of an mth -order implicit multistep method is the Milne-Simpson
method for the equation dy/dt = f (y), which is given by the specific form
yn+1 = yn1 + h
m1
bj f nj
j =1
where f k = f (yk ). Use the same approach as in Section 7.3 (and Section G.3)
of using f (y) = y to generate the necessary conditions to show that the matrix
equation for the parameter values of bj that is given by
(m 1)
..
.
..
.
..
.
..
..
.
(1)m
(m 1)m
b
1 1
b0 2
=
. .
.. ..
b
m1
m+1
7.7 Exercises
where
k =
0
2
309
if k is even
if k is odd
=
=
= kr cAs cBs
ds
ds
ds
Ks
n tot
dP
ds
n tot RT
P
Vin
n A
cAs KaAP
(ctot cAs cBs )
n tot
n B
cBs KaBP
(ctot cAs cBs )
n tot
where the states n A, n B, and P are the molar flow rate of A, molar flow rate of
B, and pressure, respectively, at a point where the total weight of catalyst away
from the inlet is given by s. The total molar flow is n tot = n A + n B + n C + n G ,
where n G is the molar flow rate of an inert gas. The variable V in is the
volumetric flow rate at the inlet. The other variables cAs and cBs are the
adsorbed concentration of A and B per unit mass of catalyst. The total number
of catalytic sites is assumed to be a constant ctot . The parameter results from
using the Ergun equation; kr is the specific rate of reaction at the sites; and
Ks , KaA, and KaB are equilibrium constants for the reaction, adsorption of A,
and adsorption of B, respectively. Finally, R = 8.3145 Pa m3 (mole K)1 is
the universal gas constant.
Assume the following parameter set and inlet conditions:
= 11.66
Pa
kg Cat
ctot = 103
moles sites
kg Cat
kr = 30
moles
sec1
moles sites
Ks = 9.12 105 Pa
T in = T = 373 K
m3
V in = 0.001
sec
Pin = 1 atm
Based on an example from K. Beers, Numerical Methods for Chemical Engineering, Cambridge
University Press, Cambridge, UK, 2007.
310
p A,in V in
RT in
n B(0) = n C(0) = 0
n G (0) =
(Pin p A,in ) V in
= n G
RT in
1. Reduce the DAE to an ODE by first solving for cAs and cBs in terms of
n tot and P, and then substitute the results into the differential equations.
Solve the ODE using the available ODE solvers (e.g., in MATLAB).
2. By converting it to a DAE using the mass matrix form, solve the DAE
directly using available DAE solvers (e.g., in MATLAB). Compare the
solution with that obtained from the previous method.
In some applications, the qualitative behavior of the solution, rather than the explicit
solution, is of interest. For instance, one could be interested in the determination of
whether operating at an equilibrium point is stable or not. In most cases, we may want
to see how the different solutions together form a portrait of the behavior around
particular neighborhoods of interest. The portraits can show how different points
such as sources, sinks, or saddles are interacting to affect neighboring solutions. For
most scientific applications, a better understanding of a process requires the larger
portrait, including how they would change with variations in critical parameters.
We begin this chapter with a brief summary on the existence and uniqueness
of solutions to differential equations. Then we define and discuss the equilibrium
points of autonomous sets of differential equations, because these points determine
the sinks, sources, or saddles in the solution domain. Next, we explain some of the
technical terms, such as integral curves, flows, and trajectories, which are used to
define different types of stability around equilibrium points. Specifically, we have
Lyapunov stability, quasi-asymptotic stability, and asymptotic stability.
We then briefly investigate the various types of behavior available for a linear
second-order system, dx/dt = Ax, A[=]2 2, for example, nodes, focus, and centers.
Using the tools provided in previous chapters, we end up with a convenient map
that relates the different types of behavior, stable or unstable, to the trace and
determinant of A.
Afterward, we discuss the use of linearization to assess the type of stability
around the equilibrium points. However, this approach only applies to equilibrium
points whose linearized eigenvalues have real parts that are non-zero. For the rest
of the cases, we turn to the use of Lyapunov functions. These are functions that
are often related to system energy, yielding a sufficient condition for asymptotic
stability. The main issue with Lyapunov functions, however powerful and general,
is that there are no general guaranteed methods for finding them.
Next, we move our attention to limit cycles. These are special periodic trajectories that are isolated; that is, points nearby are ultimately trapped in the cycle.
Some important oscillators such as van der Pol equations exhibit this behavior. Two
theorems, namely Bendixson and Poincare -Bendixsons theorems, are available for
existence (or nonexistence) of limit cycle in a given region. Another important tool
is the Poincare map, which transforms the analysis to a discrete transition maps.
We explore the use of Poincare maps together with Lyapunov analysis to show the
311
312
existence and uniqueness of a limit cycle for a class of nonlinear systems known as
`
the Lienard
system. We also include a discussion of nonlinear centers, because these
are also periodic trajectories but are not isolated; thus they are not limit cycles.
A brief discussion on bifurcation analysis is also included in Section H.1 as
an appendix. These analyses are aimed at how the phase portraits (i.e., collective
behavior of the system) are affected, as some of the parameters are varied. It could
mean the addition or loss of equilibrium points or limit cycles, as well as changes in
their stabilities.
Qualitative analysis of dynamical systems encompasses many other tools and
topics that we do not discuss, such as nonautonomous systems and chaos.
where x,
xR
which yields
xk (t) xk1 (t) =
[f (, xk1 ) f (, xk2 )] d
The Lipschitz conditions are then used to show that Picards iteration is convergent to the solution x (t), thus showing the existence of a solution.
1
See, for example, A. C. King, J. Billingham, and S. R. Otto, Differential Equations: Linear, Nonlinear,
Ordinary and Partial, Cambridge University Press, UK, 2003.
3. To establish uniqueness, let y(t) and z(t) be two solutions, that is,
t
t
y(t) = x0 +
f (s, y) ds and z(t) = x0 +
f (s, z) ds
t0
t0
or
y(t) z(t) =
t0
Taking the norm of the left-hand side and applying the Lipschitz condition, we
get
t
t
f (s, y) f (s, z) ds K
y(s) z(s) ds
y(t) z(t)
t0
t0
implies
h(t) L exp
g(s)ds
313
314
real-valued and finite, there may even be no equilibrium points. The presence of
multiple, isolated equilibrium points is a special feature of nonlinear systems.2
EXAMPLE 8.1.
and
are given by
x1e =
dx2
= ax21 + bx1 + c
dt
b2 4ac
2a
and
x2e = 0
If b2 < 4ac, the values for x1e are complex numbers. Thus if the x is constrained to be real-valued, we say that no equilibrium points exist for this case.
However, if b2 = 4ac, we see that the only equilibrium point is for x2e = 0 and
x1e = b/(2a). Finally, if b2 > 4ac, we have two possible equilibrium points,
b b2 4ac
b + b2 4ac
2a
2a
and
[xe ]1 =
[xe ]2 =
0
0
Note that in most cases, numerical methods such as the Newton method given
in Section 2.9 may be needed to find the equilibrium points.
f 1 (x1 , . . . , xN )
d
..
C(t) =
.
dt
f n (x1 , . . . , xN )
(8.3)
We need the descriptor isolated because linear systems can also have multiple equilibrium points
but they would not be isolated.
315
1.5
x
y
1
0.5
0.5
x, y
0.5
1
0
0.5
10
15
20
25
30
1
1
0.5
0.5
Figure 8.1. On the left is the plot of solutions x and y as functions of t. On the right is the
integral curve shown in the phase plane (with the initial point shown as an open circle).
This appears to be redundant because C(t) is nothing but x(t) that satisfies (8.1).
One reason for creating another descriptor such as integral curves is to stress the
geometric character of the curves C(t). For instance, an immediate consequence is
that because C(t) are simply curves parameterized by t, we can analyze and visualize
the system behavior in a space involving only the components of x, that is, without the
explicit information introduced by t. We refer to the space spanned by the integral
curves (i.e., spanned by x1 , . . . , xn ) as the phase space, and the analysis of integral
curves in this space is also known as phase-space analysis. For the special case of a
two-dimensional plane, we call it a phase-plane analysis.
EXAMPLE 8.2.
dx
dy
= y and
= 1.04x 0.4y
dt
dt
There is only a single equilibrium point, which is at the origin. For the initial
condition at x0 = (1, 0)T , the solutions x(t) and y(t) are plotted together in
Figure 8.1 as functions of t. Also included in Figure 8.1 is the phase-plane plot
of y versus x of the integral curve starting at (x, y) = (1, 0). When exploring the
solutions starting at different initial conditions, the advantage of using phasespace plot becomes clearer various integral curves can be shown together in
one figure.
Another consequence of the concept of integral curves is that the curves can have
parameterizations other than t, for example, if f 1 = 0, the curves can be described
by
dx2
f2
= ,
dx1
f1
...,
dxn
fn
=
dx1
f1
(8.4)
1.5
316
the slopes (independent of t) can present visual cues to the shapes of the integral
curves. This leads us to the use of direction field plots. A direction field, which we
, is a vector field that gives the slopes of the tangents of the integral curves in
denote d
phase space. It is not the same as the velocity field, because the vectors in a direction
field have the same magnitudes at all points except at the equilibrium points. The
components of a direction field for a given f (x) can be obtained as
0
if f i = 0
(8.5)
di (x) =
f i (x)
otherwise
f (x)
where is a scaling constant chosen based on visual aspects of the field. In the
formulation in (8.5), the equilibrium points are associated with points rather than
vectors to avoid division by zero.3 The main advantage of direction fields is that the
formulas given in (8.5) are often much easier to evaluate. The direction fields are
often evaluated at points specified by rectangular, cylindrical, or spherical meshes.
Furthermore, one could collect the locus of points having the same slopes to
form another set of curves known as isoclines. A special case of isoclines are those
that collect points with slopes that are zero in one of the dimensions, and these
are known as nullclines. For instance, for the 2D case and rectangular coordinates
(x, y), the nullclines are the lines where the x components or y components are zero.
Alternatively, for the 2D case under polar coordinates (r, ), the nullclines are the
lines where the slopes are radially inward or outward (i.e., no angular components)
or those where the slopes are purely angular (i.e., no radial components).
EXAMPLE 8.3.
dx
dy
= y and
= 1.04x 0.4y
dt
dt
Then the direction field at a rectangular grid around the origin is shown in
Figure 8.2. Also, there are four nullclines shown in the right plot in Figure 8.2.
(8.6)
2. (x, 0) = x
3. (x, s + t) = ((x, s), t)
Essentially, flows are the mechanism by which the path of an integral curve
can be traversed. Thus flows specify the forward or backward movements at a
3
In the terminology of direction fields, equilibrium points are called singular points, whereas the rest
are called regular points.
1.5
0.5
317
0.5
y
0
0.5
0.5
1
1
0.5
0.5
1.5
0.5
0.5
Figure 8.2. The figure on the left shows the direction field for Example 8.3, whereas the figure
on the right shows the four nullclines under the rectangular coordinates as dotted lines, that
is, locus of purely left, purely right, purely up, and purely down slopes.
specified point in the phase space, thereby yielding a definite direction in the movement along the paths. In this respect, integral curves equipped with flows are called
trajectories.4
In most texts, there is no distinction between integral curves and trajectories. However, because we
have claimed that integral curves can be considered simply as curves in the phase space, they can be
reparameterized also by t for autonomous systems. Thus we suggest that the term trajectories is
a more appropriate term when the direction of the path as t increases is important.
1.5
318
Some equilibrium points can be Lyapunov stable and not quasi-asymptotic (see,
e.g., Exercise E8.2), whereas others can be quasi-asymptotically stable but not Lyapunov stable. An example of a system that is quasi-asymptotically stable but not
Lyapunov stable is the Vinograd system described in the following example.
EXAMPLE 8.4.
dx
dt
f 1 (x, y) =
x2 (y x) + y5
2
(x2 + y2 ) 1 + (x2 + y2 )
dy
dt
f 2 (x, y) =
y2 (y 2x)
2
(x2 + y2 ) 1 + (x2 + y2 )
(8.7)
The only equilibrium point is the origin. A plot of the direction field for the
Vinograd system (8.7) is shown in Figure 8.5.
As an alternative, we can represent the system in polar coordinates, that is,
dr
= f r (r, )
dt
and
d
= f (r, )
dt
r3 h1 () + rh2 ()
f r =
1 + r4
and
r2 h3 () + h4 ()
f =
1 + r4
(8.8)
where
with
h1 ()
h2 ()
h3 ()
h4 ()
319
1.5
0.5
y
Figure 8.5. The direction field for the Vinograd system
given in (8.7).
0.5
1.5
Using the nullclines of (8.8), we can find different sectors where f r and f
change signs, as shown in Figure 8.6. In the figure, we have the shaded region
where f r > 0, and the unshaded region as f r < 0, with f r = 0 at the boundaries.
However, inside the regions ABDCA, AJKIA, AGHA, and AEFA, we have
f > 0, and outside these regions, f < 0. Along the curves ABD, ACD, AIK, and
AJK, the tangents to the trajectories are pointing radially outward (i.e., v = 0),
whereas along the curves AG, AH, AF , and AE, the tangent to the trajectories is
pointing radially inward. This shows that local trajectories northwest of GACD
or southeast of FAIK will be repelled from the origin, whereas the other regions
will be attracted to the origin.
The trajectories for the Vinograd system starting at different initial points
can also be obtained by numerical evaluation of (8.7) using the IVP solvers
such as Runge-Kutta methods given in the previous chapter. These are shown
in Figure 8.7.5 The plot in Figure 8.7 is consistent with both Figures 8.5 and
8.6. It shows that initial conditions starting in some locations of the (x, y)-plane
will go directly to the origin, whereas starting at other locations may initially
diverge away from the origin but ultimately will converge to the origin. This
is an example of a case in which the equilibrium point is quasi-asymptotically
stable but not Lyapunov stable.
Because the equilibrium point is both unstable and attractive, the numerical round-off errors may
produce artificial errors, possibly showing apparent chaotic behavior. A simple fix is to provide
smaller error tolerance and also setting the values of the derivative functions f 1 (x, y) and f 2 (x, y) to
be zero if they are within the chosen error tolerance.
320
0.3
(+)
E
(+)
y 0
(+)
J
(+)
-0.3
-0.3
0.3
1.5
0.5
0.5
1.5
321
Eigenvectors:
2 tr(A) + det(A) = 0
!
tr(A) tr(A)2 4det(A)
=
2
12
a11 if a11 =
v =
a21
if a22 =
22
322
x2
0.8
0.8
0.4
0.4
x2
0.0
-0.4
0.0
-0.4
-0.8
-0.8
-1.0
-0.5
0.0
0.5
1.0
-1.0
-0.5
0.0
x1
0.5
1.0
x1
Figure 8.9. The right figure shows the trajectories around a stable node with 0 > 1 > 2 . The
left figure shows the trajectories around an unstable node with 1 > 2 > 0.
There are only three possible cases when = a11 = a22 either A is strictly diagonal,
upper triangular, or lower triangular, all of which have in the diagonals. In the
strictly diagonal case, the eigenvectors are e1 and e2 . For the triangular cases, there
is only one linearly independent eigenvector: e1 for the upper triangular case, and
e2 for the lower triangular case.
When A is nonsingular, the origin is the only equilibrium point. If A is singular
and of rank 1, then the equilibrium points will lie in a line containing the eigenvector
that corresponds to = 0. Lastly, when A = 0, the set of all equilibrium points is the
whole space; that is, no motion occurs.
Let 1 and 2 be the eigenvalues of A. If the eigenvalues are both real, then the
trajectories are classified as either nodes, stars, improper nodes, saddles, or degenerate. Otherwise, if the eigenvalues are complex-valued, the trajectories are either
focuses or centers, where centers occur when the eigenvalues are pure imaginary.
We discuss each case next.
1. Nodes. When det(A) > 0 and tr(A)2 > 4 det(A), then both 1 and 2 are realvalued and have the same sign. Both eigenvalues are negative when tr(A) < 0,
and both are positive when tr(A) > 0. In either case, using the diagonalization
procedure of Section 6.6, the solution of (8.9) is given by
(8.10)
x(t) = z10 e1 t v1 + z20 e2 t v2
where
z10
z20
=
v1
v2
1
x0
If both eigenvalues are positive, the equilibrium points are classified as unstable
nodes. Otherwise, if both eigenvalues are negative, the equilibrium points are
stable nodes.
Based on (8.10), the trajectories x(t) are linear combinations of the eigenvectors v1 and v2 . If the initial point x0 happens to be along either of the eigenvectors,
then the trajectories will travel along the same line as that contains the eigenvectors. Otherwise, the trajectories will be half-U-shaped, where the center
of the U is along the eigenvector that corresponds to the eigenvalue with the
larger absolute value. Typical plots of both stable and unstable nodes are shown
in Figure 8.9.
1.0
x2
0.0
-1.0
-1.0
0.0
1.0
x1
Figure 8.10. The trajectory around a saddle with 1 > 0 > 2 where line 1 is along v1 and line
2 is along v2 .
2. Saddles. When det(A) < 0, the eigenvalues are both real-valued, but one of
them will be positive, whereas the other will be negative. Thus let 1 > 0 > 2 ;
then, based on (8.10), x(t) will be a linear combination of an unstable growth
along v1 and a stable decay along v2 . The equilibrium points will then be classified as saddles. Typical plots of trajectories surrounding saddles are shown in
Figure 8.10.
3. Stars. When A = a I, with a = 0, both eigenvalues
will be equal to a, and the
matrix of eigenvectors becomes V = v1 v2 = I. This further implies z =
V x0 = x0 . Thus (8.10) reduces to
x(t) = eat x0
(8.11)
and the trajectories will follow along the vector x0 . The equilibrium points are
then classified as stars, and depending on a, they could be stable stars (if a < 0)
or unstable stars (if a > 0). Typical plots of trajectories surrounding stars are
shown in Figure 8.11.
4. Improper Nodes. Suppose the eigenvalues of A are equal to each other, that is,
1 = 2 = = 0. Using the method of finite sums (cf. 6.14) to solve this case with
repeated roots, the solution is given by
x(t) = et [I + t (A I)] x0
(8.12)
Note that if A = I, we get back the equation of the trajectories around that of a
star node, as given in (8.11). Thus add another condition that A is not diagonal.
In this case, there will be only one eigenvector, which is given by
a22
a12
or
v=
v=
a11
a21
whichever is nontrivial.
If the initial point x0 = v (i.e., it lies along the line containing v), then the
trajectories will travel along that line because (A I)v = 0 and (8.12) becomes
x(t) = e v. If x0 is outside of this line, the trajectories will be curved either
323
324
x2
0.8
0.8
0.4
0.4
x2
0.0
-0.4
0.0
-0.4
-0.8
-0.8
-1.0
-0.5
0.0
0.5
1.0
-1.0
-0.5
x1
0.0
0.5
1.0
x1
Figure 8.11. The trajectories surrounding (a) stable stars and (b) unstable stars.
a12 > 0
as long as
a12 = 0
or a21 < 0
as long as
a21 = 0
a12 < 0
as long as
a12 = 0
or a21 > 0
as long as
a21 = 0
If
Z-type: If
(8.13)
5. Focus and Centers. When tr(A) < 4 det(A), the eigenvalues of A are a complex
conjugate pair given by 1 = + i and 2 = i, where
tr(A)
2
"
1
4 det(A) tr(A)2
2
(8.14)
Figure 8.12. Trajectories surrounding stable (a) S-type and (b) Z-type improper nodes.
principal
line 2
2
4
x2
principal
line 1
x2
-2
-4
-4
-4
bounding ellipse
-2
-4
x1
x1
Figure 8.13. The left figure shows a stable focus and the right figure shows an unstable focus.
Included in both figures are the bounding ellipses. The principal lines are the polar nullclines,
where the points of these lines have no radial components.
cos () I + sin () !
A x0
det(A)
0 2
(8.16)
Typical plots of the trajectories surrounding stable and unstable focuses are
shown in Figure 8.13, whereas the trajectories surrounding centers are shown in
Figure 8.14.
325
326
x2
principal
principal
-4
-4
x1
6. Degenerate Points. When one or both eigenvalues of A are zero, there will be
more than one equilibrium point. Both these cases are classified as degenerate
points or nonisolated equilibrium.
Let 1 = 0 and 2 = 0; then (8.10) will reduce to
x = z01 v1 + z02 e2 t v2
(8.17)
This means that when x0 = v1 then x(t) = v1 ; that is, v1 will lie in the line that
contains all the (nonisolated, non-unique) equilibrium points. Outside this line,
the trajectories are parallel to v2 , that is, an affine operation on v2 . Again, the
equilibrium points will be stable if 2 < 0 and unstable if 2 > 0. Typical plots of
trajectories surrounding degenerate equilibrium points are shown in Figure 8.15.
If both eigenvalues happen to be equal to zero, then we must have both the
trace and determinant of A to be zero. This implies a11 = a22 and a12 a21 = a211 .
From (8.12), we have
x(t) = (I + At) x0
equilibrium
line
equilibrium
line
eigenvector
eigenvector
Figure 8.15. Trajectories surrounding (a) stable and (b) unstable degenerate points.
327
det(A)
2
FOCUS
(stable)
FOCUS
(unstable)
NODES
(stable)
NODES
(unstable)
IMPROPER
CENTER
STARS
Trace(A)
SADDLE
if a12 = 0
a11
v=
a22
if a21 = 0
a21
Outside this line, the trajectories are along straight lines that are parallel to v.
The trajectories for A = 0 are left as Exercise E8.6.
The different types of equilibrium points can be summarized in terms of the
trace and determinant of A, as shown in Figure 8.16.
(8.19)
328
which then yields a linearized approximation of the original nonlinear system given
by
d
x = J (xe ) (x xe )
dt
(8.20)
However, this approximation is true only for small neighborhoods around the
equilibrium points. Moreover, if the Jacobian is singular (i.e., it contains some zero
eigenvalues) or if it contains pure imaginary eigenvalues, then the truncations may
no longer be valid, even for a small neighborhood around the equilibrium points.
Thus we need to classify the condition for which linearization would be sufficiently
close to the actual flows around the equilibrium points, as given by the following
definition:
Definition 8.5. An equilibrium point xe of dx/dt = f(x) is classified as a hyperbolic equilibrium point if none of the eigenvalues of the Jacobian matrix J (xe ) of
f(x) are zero or pure imaginary. Otherwise, it is classified as a non-hyperbolic
equilibrium point.
The following theorem, known as the Hartman-Grobman theorem or the linearization theorem, applies only to hyperbolic equilibrium points:
Let xe be a hyperbolic equilibrium point of dx/dt = f(x). Then for a
small neighborhood around xe , the behavior of the trajectories can be approximated
by (8.20).
THEOREM 8.2.
This implies that the type and stability of the trajectories surrounding hyperbolic equilibrium points can be determined by simply analyzing the linearized
equations.
EXAMPLE 8.5.
(8.21)
where i = 1. Thus all the equilibrium points are hyperbolic, and we can use
linearized approximations of the trajectories around each of the equilibrium
points. Based on the results of Section 8.5, the equilibrium points are saddles
for even values of n and stable focuses for odd values of n.
329
0.5
x2
0.5
0.5
1.5
(8.23)
The equilibrium point is at the origin, and the linearized equation around the
origin is given by
d
0 1
x=
x
1 0
dt
The eigenvalues of the Jacobian matrix are i, which predicts that the trajectories around the origin should be close to those corresponding to centers. The
plot of the trajectory starting at x0 = (0.15, 0.1)T is shown in Figure 8.18. The
trajectory starts out different from trajectories around centers, but at longer
times, it does approach a stable focus, with very slow movement toward the
origin. It can be shown in the next section that the origin in this example is
asymptotically stable (even though it goes very slowly as is nears the origin).
Nonetheless, the linearization does predict the clockwise rotation.
2.5
330
0.1
x20.05
Figure 8.18. The trajectory of (8.23) starting at x0 =
(0.15, 0.1)T .
0
0.1
0.05
x1
THEOREM 8.3.
EXAMPLE 8.7.
331
Referring back to Example 8.6, the system there is for the case with = 60.
In that example, we noted that the origin is a non-hyperbolic equilibrium point
and that the linearization approach is not applicable. We see that the Lyapunov
function approach was still able to assess the stability around the origin. For
= 0, the nonlinear system reduces to a linear system in which the origin will
be a center, which is Lyapunov stable but not asymptotically stable.
N
|f j (x)|
j =1
d
x = Ax
dt
is asymptotically stable without having to determine the eigenvalues. Let P be
a symmetric positive definite matrix; then by definition, V = xT Px > 0. Taking
the derivative of V , we have
dV
d T
d
=
x Px + xT P
x = xT AT P + PA x
dt
dt
dt
which will be a Lyapunov function if we can find P > 0 such that
N = AT P + PA
is a negative definite matrix.
Thus we can choose any N and try to solve for X in
AT X + XA = N
(8.24)
and if X is positive definite, then the linear system will be asymptotically stable.
Thus stability of the linear system can be determined by solving the Lyapunov
matrix equation (8.24) (which is a special case of Sylvester matrix equation given
in (1.23) ) and then proving that the solution is positive definite.6
For instance, let
2.5
4.5
A=
0.5 2.5
332
EXAMPLE 8.9.
dx1
= x2
dt
dx2
= x2 (1 x21 ) x1
(8.25)
dt
The origin is an equilibrium point, and the linearized equation around the
origin is given by
d
0
1
x=
x
1 1
dt
Because the eigenvalue of the Jacobian is given by (0.5 0.866i), the behavior around the origin will be an unstable focus. However, the trajectory does
not go unbounded. Instead, it settles down into a limit cycle, as shown in
Figure 8.19.
systems, we have two important results: Bendixsons criterion and the PoincareBendixson theorem.
6
The Lyapunov function approach has also been used to prove Routh-Hurwitz stability criterion for
a given characteristic polynomial (see, e.g., C. T. Chen, Linear System Theory and Design, Oxford
University Press, 1984).
This is also the main utility of the Routh-Hurwitz method for stability analysis.
x2
1
0
-1
-2
-3
-3
-2
-1
x1
Figure 8.19. Phase-plane plot of the van der Pol system given in (8.25).
THEOREM 8.4.
by
d
x = f(x)
dt
suppose that the divergence
f =
f1
f2
+
x1
x2
is not identically zero, nor does it change sign in a simply connected open region D;
then no limit cycles exist in D.
First assume that there is a limit cycle in D and that the limit cycle is contained
in a closed curve C. Let S be the closed region enclosed by C. Then using Greens
lemma (cf. Equation 5.1)on the divergence of f in region S, we have
3
f1
f2
+
dS =
f 1 dx2 f 2 dx1
x2
S x1
C
T
dx2
dx1
f1
=
f2
dt
dt
dt
0
PROOF.
where T is the period of the cycle. However, for the surface integral to be zero, we
need the divergence of f to either be zero or change sign.
Consider the van der Pol system given in (8.25). Calculating the
divergence of f, we have
EXAMPLE 8.10.
f = 1 x21
Let D be a region bounded by a circle centered at the origin having a radius,
r < 1. The divergence of f in D is always positive. We then conclude that in this
region, there are no limit cycles. This can be verified by observing the plot given
in Figure 8.19.
However, note that the criterion does not prevent portions of limit cycles
to be in the region that satisfy the conditions of the criterion.
333
334
Cout
M
Cin
Poincare-Bendixsons
Theorem. Let M be a region bounded by two
nonintersecting closed curves Cout and Cin , where Cin is inside Cout , as shown in Figure 8.20. For the second-order autonomous system given by
THEOREM 8.5.
d
x = f(x)
dt
If
1. There are no equilibrium points inside M.
2. Along Cin and Cout
fn0
where n is the outward unit normal vector of region M,
3. Inside M
x2
= 0
(f)T
x1
(8.26)
(8.27)
EXAMPLE 8.11.
(8.28)
Next, choose Cin to be a circle centered around the origin of radius rin ,
and choose Cout to be a circle centered around the origin of radius rout , where
0 < rin < rout . The outward unit normal vectors at Cin are
y
cos ()
where = tan1
nin =
, x2 + y2 = rin
sin ()
x
8
335
2
1.5
1
Figure 8.21. A plot of some trajectories of (8.28) showing that the unit circle is a limit cycle.
x2
0.5
0
-0.5
-1
-1.5
-2
-2
-1
x1
2
sin()2
f n = rin 1 rin
0 2
2
f n = rout 1 rout
sin()2
0 2
and we can see that we need rin < 1 and rout > 1 to satisfy (8.26).
As for (8.27), we have
x2
= x21 + x22 = 0
for (x1 , x2 ) M
fT
x1
A plot of some trajectories of the system is shown in Figure 8.21. From the figure,
`
8.8.2 Poincare Maps and Lienard
Systems
`
The Poincare-Bendixson
theorem is a tool to determine the existence of a stable
limit cycle inside a region in a phase-plane, that is, a second-order system. In higher
order cases, the theorem may no longer hold because other trajectories may be
present, such as quasi-periodic or chaotic trajectories. To determine the presence
of limit cycles in these cases, a tool known as Poincare maps (also known as firstreturn maps) can be used. It can also be used to show uniqueness of stable limit
cycles.
Let S be an (n 1)-dimensional hypersurface that is transverse to the flow of an
n-dimensional autonomous system dx/dt = f(x) (i.e., none of the trajectories will be
parallel to S). A Poincare map P is a mapping obtained by following the trajectory
336
xk+1
xk
of an intersecting point xk (of the trajectory with S) to the next intersecting point
xk+1 , that is,
xk+1 = P (xk )
(8.29)
This is shown diagramatically in Figure 8.22. Note that the hypersurface S is often
a bounded or semibounded region in n 1 dimension that slices the n dimensional
region containing the limit cycle.
If a point xeq in the surface S is the fixed point of the Poincare map, that is,
xeq = P xeq , then xeq belongs to a limit cycle.9 A discrete version of Lyapunov
functions can also be used to determine the stability of the limit cycle. This principle
will be used in the following example to show the uniqueness of a limit cycle of a
`
particular class of second-order equations known as Lienard
systems.
EXAMPLE 8.12.
`
Consider the second-order system known as the Lienard
system
given by
d2 x
dx
+ f (x)
+ g(x) = 0
(8.30)
2
dt
dt
`
Instead of the usual state-space representation, the Lienard
coordinates can be
used by defining y as
x
dx
y=
+ F (x) where F (x) =
f ()d
(8.31)
dt
0
Then (8.30) becomes
d
dt
x
y
=
y F (x)
g(x)
(8.32)
We are using a technical distinction between fixed points of discrete maps such as Poincare maps
and equilibrium points for continuous maps that apply to differential equations.
337
y=F(x)
`
Figure 8.23. The nullclines of the Lienard
system.
q
-q
Y+
S-
`
If all these conditions are satisfied, then the Lienard
system given by (8.30) will
have exactly one stable limit cycle, as we show next. (When we refer to the
`
Lienard
system that follows, we are assuming that these conditions are already
`
attached.) The relevance of Lienard
systems is that it is a class of autonomous
oscillators of which the van der Pol equation is a special case.
The Jacobian matrix resulting from linearization of (8.32) at the origin is
given by
J0 =
f (x)
dg/dx
1
0
(x,y)=(0,0)
Because tr(J 0 ) = f (0) > 0 and det(J 0 ) = dg/dx > 0, the origin is hyperbolic
and is either an unstable node or unstable focus (see Figure 8.16).
Let S+ be the strictly positive y-axis (i.e., excluding the origin); then S+
is a nullcline where the trajectories are horizontal and pointing to the right
( dy/dt|(x=0) = g(0) = 0 and dx/dt|(x=0) = y > 0). Another nullcline S is the
strictly negative y-axis where the trajectories are horizontal and pointing to the
left. The other two nullclines Y and Y + will be the graph y = F (x) with x = 0.
The trajectories at Y are vertical and pointing down when x > 0, whereas
the trajectories at Y + are vertical pointing up when x < 0 ( dy/dt = g(x) <
0 for x > 0 and dy/dt = g(x) > 0 when x < 0). The nullclines are shown in
Figure 8.23.
Using all the nullclines, one can apply Poincare -Bendixsons theorem to
prove that a limit cycle exists in a region M whose outer boundary has a distance
from the origin that is greater than q (the root of F (x)) and inner boundary is
a circle of a small radius > 0 surrounding the origin. However, this does not
prove the uniqueness of the limit cycle. To do so, we need to find a unique fixed
point of a Poincare map, which we will choose S+ to help define the Poincare
map P.
Because we are only given conditions for g(x) and F (x), we cannot evaluate
the actual Poincare map P.10 Instead, we use a Lyapunov function defined by
x
1 2
V (x, y) = y +
g()d
2
0
whose derivative is given by
dV
dy
dx
= y + g(x)
= g(x)F (x)
dt
dt
dt
10
`
Even if we are given f (x) and g(x), the Lienard
equation is an Abel equation, which can be very
difficult to solve in most cases.
338
`
Furthermore, note that because F (x) and g(x) are odd functions, the Lienard
system is symmetric with respect to both x and y; that is, the same equation (8.32)
results after replacing x and y by &
x = x and &
y = y, respectively. Thus instead
of the full Poincare map P, we can just analyze the map &
P, which is the map of
the trajectory starting at a point in S+ to a first intersection in S , and conclude
P(y ) due to the special symmethat y is in a limit cycle if and only if y = &
+
try. The fixed point (0, y ) at S is unique if the Lyapunov function between
Poincare maps at S+ is equal, that is,
V (0, y ) = V (0, P(y )) or V (0, y ) = V 0, &
P(y )
Let
Vy0 = V 0, &
P(y0 ) V (0, y0 ). Starting at (0, y0 ) with y0 > 0, the
trajectory will intersect the nullcline Y . If the intersection with Y is at x q,
then dV/dt > 0, which means
Vy0 > 0. However, if the intersection with Y
is at x > q, we can show that the difference
Vy0 > 0 decreases monotonically
with increasing y0 , as sketched here:
1. The value of
Vy0 can be split into three paths yielding
t1
t2
Vy0 =
V (x(t), y(t))dt +
V (x(t), y(t))dt +
0
t1
t3
V (x(t), y(t))dt
t2
y+ (q)
the arc from y+ (q) to y (q) increases in size to the right and F (x) > 0 along
this arc as y0 is increased. However, because y (q) < 0 < y+ (q), this integral
will also decrease as y0 increases.
4. Because all three integrals decrease as y0 increases, we conclude that
Vy0
decreases monotonically as y0 is increased. Following the same arguments
of the integrals, it can be shown that
Vy0 as y0 .
Combining these results, we can see from Figure 8.24 that there is
only one fixed point in S+ (equiv. in S ); that is, the limit cycle of the
`
Lienard
system is stable and unique.
Because the Poincare map is a discrete map xk+1 = P(xk ), one could test the
stability of the limit cycle passing through a fixed point x in the surface S by
introducing a perturbation k , that is,
x + k+1 = P (x + k )
339
Vy >0
Vy
y0
y*
Using Taylor series expansion and truncating the higher order terms, we end up with
a linearized Poincare map J P for the perturbation
k+1 = J P k
where
dP
JP =
dx x=x
(8.33)
Using Theorem 7.2, the perturbations will die out if all the eigenvalues i , i =
1, . . . , n, of the linearized Poincare map J P have magnitudes less than 1. These
eigenvalues are also known as the Floquet multipliers of the Poincare map P. However, similar to the non-hyperbolic case of an equilibrium point in the continuous
case, the stability cannot be determined using linearization if any of the eigenvalues
of J P have a magnitude equal to 1. In those cases, the Lyapunov analysis or other
nonlinear analysis methods need to be used.
x
y
=
f 1 (x, y)
f 2 (x, y)
d
d
x
&
y
=
f 1 (x,&
y)
f 2 (x,&
y)
340
0.6
0.4
0.2
Figure 8.25. Plot of trajectories for (8.34) with initial points at (1, 0), (0.75, 0), (0.5, 0), (0.25, 0), and
(0.11, 0).
0.2
0.4
0.6
0.5
0.5
EXAMPLE 8.13.
(8.34)
8.10 Exercises
for all > 0. This means that the solution is not unique, and this is possible
because although f (x) is continuous, its derivative df/dx is not continuous at
x = 0.
E8.2. Consider the following autonomous system, with a = 0,
dx
= ay
dt
dy
= az
dt
dz
= a2 z
dt
Show that the equilibrium point is Lyapunov stable but not quasiasymptotically stable.
E8.3. Consider the following system given in Glendinning (1994):
dx
xy
= x y + x x2 + y2 + !
dt
x2 + y2
dy
x2
= x + y y x2 + y2 !
dt
x2 + y2
1. Determine the equilibrium points.
2. Obtain a direction field for this system in the domain 2 x 2 and
2 y 2. (We suggest a 20 20 grid.) Ascertain from the direction
field whether the equilibrium is Lyapunov stable, quasi-asymptotic stable,
or asymptotically stable.
3. Use the initial value solvers and obtain the trajectories around the equilibrium.
E8.4. For the linear second-order system dx/dt = Ax, once it can be determined to
be a focus or center, show that the direction of rotation can be determined to
be clockwise or counterclockwise depending on whether a12 > 0 or a12 < 0.
One approach is given by the following steps:
1. Show that a necessary condition for focus or centers is that a12 a21 < 0.
2. Let r = x1 x + x2 y be the position vector of x and v = y1 x + y2 y be the
tangent to trajectory at x, where y = Ax. Show that the cross product is
given by
r v = z
where
= a21 x21 + (a22 a11 ) x1 x2 a12 x22
3. Using the fact that a12 a21 < 0, show that if a21 > 0, we must have a12 < 0
then
'
(2
a22 a11
1
>
x1 a12 x2 0
a12
2
and the rotation becomes counterclockwise. Next, you can show the
reverse for clockwise rotation.
E8.5. Show that the conditions given in (8.13) will indeed determine whether the
improper node is either S-type or Z-type. (Hint: To obtain an S-type improper
node, trajectories at either side of v will have to be counterclockwise for
341
342
stable equilibrium points and clockwise for unstable equilibrium points. For
the Z-type improper nodes, the situations are reversed.)
E8.6. Obtain the trajectories in the phase plane for the degenerate cases with A = 0
where both eigenvalues are zero.
E8.7. For each of the cases for A that follow, determine the type of equilibrium
points and do a phase-plane analysis of each case for
dx
= Ax
dt
by plotting trajectories at various initial points surrounding the equilibrium
point. Also include supplementary information such as bounding ellipse,
direction of rotation, shapes, straight line trajectories, and so forth appropriate for the type of equilibrium points.
a) A =
d) A =
g) A =
1
4
2
1
b) A =
1
4
2
1
2
3
1
2
e) A =
h) A =
1
2
2
1
0
6
1
5
2
4
1
2
c) A =
f) A =
i) A =
2
3
4
2
5
0
0
5
0
0
3
0
E8.8. Let x1 and x2 be the population of two competitive species 1 and 2. One
model for the population dynamics is given by
d
x1 0
b Ax
x=
0 x2
dt
where b and A are constant, with A nonsingular.
1. Show that there are four equilibrium points given by
b1
0
0
; (xe ) = a11 ; (xe ) =
(xe )1 =
2
3
b2
0
0
a22
; (xe ) = A1 b
4
8.10 Exercises
E8.9. Consider the following coupled equations describing the dynamics of a nonisothermal continuously stirred tank reactor based on energy and mass balance for a first-order reaction A B:
dC
F
=
(Cin C) k0 CeE/(R(T +460))
dt
V
dT
F
UA
(
H)
=
k0 CeE/(R(T +460))
(T T j )
(T in T ) +
dt
V
c p
Vc p
where C and T are the concentration of A and temperature in the reactor,
respectively. Using the following parameter set:
F :
V:
T in :
Cin
H :
c p :
U:
A:
ko :
E:
R:
Tj:
1. Find all the equilibrium points and show that they are all hyperbolic.
2. Based on linearization, determine the local behavior of the trajectories
surrounding the equilibrium points.
3. Plot a direction field in a phase that contains all three equilibrium points.
Then using an ODE IVP solver, obtain the phase portrait that shows the
behavior of trajectories using various initial conditions.
E8.10. For the second-order system described by
d
h(x)
k
x=
x
k h(x)
dt
where h(x) is a scalar function of x1 and x2 and k is a real-valued constant.
1. Show that the origin is a non-hyperbolic equilibrium point if h(0) = 0.
2. Show that if h(x) is negative definite (semi-definite), then the origin is
asymptotically stable (Lyapunov stable) by using the Lyapunov function
V (x) = xT x.
3. Determine the system stability/instability if h(x) is positive definite.
E8.11. Use the Lyapunov function approach as in Example 8.8 to show that the
following linear equation is stable:
d
5
1
x=
x
2 4
dt
E8.12. Use Bendixons criterion to show that the system given in Example 8.5,
d
x2
x1
=
x2
cos (x1 ) x2
dt
will never have limit cycles.
343
344
r = r > 0, that is, g(r) < 0 for r < r and g(r) > 0 for r > r . Use PoincareBendixsons theorem to show that a limit cycle will exist around the origin.
Furthermore, show that the limit cycle will have a circular shape. (Hint:
Apply the theorem to the annular region between a circle having a radius
less than r and a circle of radius greater than r .) Test this for g(r) =
(r 1)(r + 1).
<0
x=xr
Show that the equilibrium points at (x, y) = (xr , 0) is non-hyperbolic, having linearized eigenvalues that are pure imaginary.
2. Assume xr = 0; that is, the origin is an equilibrium point where dh/dx < 0
(This can be done by a simple translation of coordinates). Then show that
the function
x
y2
V (x, y) =
h()d
2
0
is a Lyapunov function for the origin for which dV/dt = 0 for a neighborhood around the origin. Thus V = constant in the neighborhood around
the origin, hence a conservative system. Thus the origin will be a nonlinear
center.
3. Let h(x) = x(x + 1)(x + 4). Then according to the preceding results, the
origin should behave as a nonlinear center for this case. Verify by using
IVP solvers to obtain a phase-plane portrait using various initial conditions
around the origin. What about around the neighborhood of x = 4 and
x = 1?
8.10 Exercises
ax + y
dy
dt
x2
by
1 + x2
with a, b > 0.
1. From each equation we obtain two graphs: y = g 1 (x) = ax and y = g 2 (x) =
x2 /[b(1 + x2 )]. The equilibrium points are at the intersection of both
graphs. Plot both graphs under a fixed b to show that with alo > 0 there
is a range of alo < a < ahi such that the number of equilibrium points
will change from three to two. Also, for a > ahi , there will only be one
equilibrium point.
2. Determine the stability of the equilibrium points.
3. Draw a bifurcation diagram for this system for b = 1, and classify the type
of bifurcations at the bifurcation points.
E8.19. The Brusselator reaction is given by following set of reactions:
A
B+x
y+C
2x + y
3x
where the concentrations of CA, CB, CC, and CD are assumed to be constant.
Furthermore, with the reactions assumed isothermal,the specific kinetic rate
345
346
coefficients ki for the ith reaction will be constant. The rate of change in the
concentrations of x and y will then be given by
dCx
= k1 CA k2 CBCx + k3 C2x Cy k4 Cx
dt
dCy
= k2 CBCx k3 C2x Cy
dt
1. Obtain the equilibrium point of the reaction system.
2. Choosing CB as the bifurcation parameter, find the critical point CB,h under
which the Brusselator can satisfy the conditions for Hopf bifurcation.
3. Using the IVP solvers to simulate the process, determine whether the
Brusselator exhibits supercritical or subcritical Hopf bifurcation under
the following fixed parameters: k1 = 1.8, k2 = 2.1, k3 = 1.2, k4 = 0.75, and
CA = 0.9.
(9.1)
We can apply the successive substitution approach known as Picards method and
obtain a series solution. The method begins with an integration of (9.1),
x
y(x) = y0 + a
y(1 )d1
0
347
348
y0 + axy0 + a
0
1
2
0
y0 + a
y(3 )d3 d2 d1
0
(ax)2
(ax)3
y0 1 + ax +
+
+
2!
3!
y0 eax
This example suggests that some of the solutions to linear differential equations
could be represented by power series. Furthermore, it shows that, as in the preceding
process toward forming eax , the series solutions themselves can often generate the
construction of new functions. For instance, we can show that Legendre polynomials and Bessel functions could be defined from series solutions of certain classes
of second-order linear differential equations. However, instead of using Picards
method, we approach the solution by first constructing a power series that contains unknown parameters and coefficients. The parameters are then evaluated after
substitution of the power series into the differential equations.
Consider the homogeneous linear Nth -order differential equation
N (x)
dN y
dy
+ + 1 (x)
+ 0 (x)y = 0
N
dx
dx
(9.2)
d2 y
2x dy
4
+
+ 2
y=0
2
dx
x + 1 dx x (x + 1)2
an (x xo )n+r
(9.4)
n=0
349
means that even though expanding around ordinary points are straightforward to
evaluate, it is still worthwhile to pursue expansions around singular points if doing
so will significantly widen the radius of convergence.
We now list a set of formulas and assumptions that will be used during the
development of the power series solutions:
1. A function f (x) is analytic at a point xo if f (x) has a convergent Taylor series
expansion around xo .1
2. The product of two infinite series is given by
n
i
j
ai x
bj x =
ak bnk xn
(9.5)
j =0
i=0
n=0 k=0
3. The Leibnitz formula for the nth derivative of a product of functions is given by
k nk
n
dn
d f d g
n
f
(x)g(x)
=
(9.6)
k
dxn
dxk dxnk
k=0
where
n
k
=
n!
(n k)!k!
4. Let X = x xo , then
dk y
dk y
=
dX k
dxk
j (X) = (x), we could rewrite (9.2) as
With
N
k (X)
k=0
dk y
=0
dX k
Thus, in the next sections, we will assume that x has been shifted such that we
could just expand the solution around xo = 0. This will significantly simplify the
solutions by avoiding having to expand (x xo )k .
5. The Gamma Function is a function defined by
(x) =
et tx1 dt
(9.7)
0
The plot of (x) is shown in Figure 9.1. Some of the properties of the Gamma
functions are:
lim (x) =
xm
(1)m+1
1
A more complete discussion of analytic functions and singular points is given in L.2.1.
350
20
10
x
1
-10
-20
-3
-2
-1
(x + n k) =
k=0
(x + n + 1)
(x + n m)
(9.9)
(x + n)
(x)
(9.11)
(a1 )Pn (a p )Pn xn
(b1 )Pn (bq )Pn n!
(9.12)
n=0
where (a1 )Pn , (a2 )Pn and so forth are Pochhammer symbols defined in (9.10)
Special cases includes Gauss hypergeometric series defined as
(a)Pn (b)Pn xn
2 F 1 (a, b; c; x) =
n!
(c)Pn
n=0
(9.13)
351
0.5
erf(x)
0.5
1
2
(a)Pn xn
1 F 1 (a; b; x) =
(b)Pn n!
(9.14)
n=0
These functions are useful in evaluating several series solutions. They were originally used to solve differential equations known as hypergeometric equations
(see Exercise E9.3). The MATLAB command for
y=
p F q (a1 , . . . , a p ; b1 , . . . , bq , x)
is y=hypergeom([a1,...,ap],[b1,...,bp],x).
8. The error function, denoted by erf(x), is defined as
2
erf(x) =
ey dy
2
(9.15)
ey dy
2
(9.16)
In MATLAB, the function erf is available for evaluating the error function.
In the sections that follow, we first tackle the series solutions expanded around
an ordinary point. Then we follow it with the discussion of the series solutions
expanded around a regular singular point. We initially explore the solutions of highorder equations. However, at some point, we must limit it to second-order differential
equations to allow for some tractable results.
352
p j (x)
j =0
dj y
=0
dx j
(9.17)
then x = 0 is an ordinary point of (9.17) if the coefficient functions p j (x) are all
analytic around x = 0; that is, the coefficient functions can be expanded as
p j (x) =
j,n xn
j = 0, 1, 2, . . . , N
(9.18)
n=0
THEOREM 9.1.
y=
an xn
(9.19)
n=0
n+N1
n,k ak
n = 0, 1, . . . ,
(9.20)
k=0
where
0,nk +
N
j,nk+j
j =1
n,k = (1)
N,0
j 1
(k i)
i=0
N
(9.21)
(n + i)
i=1
with
j, = 0
PROOF.
<0
EXAMPLE 9.1.
2
d3 y
dy
dy
d2 y
2d y
+
x
+
3x
+
y
=
0
and
y(0)
=
2;
(0)
=
1;
=1
dx3
dx2
dx
dx
dx2
and we want to obtain a power series solution expanded around the origin. The
origin is an ordinary point of the differential equation. The coefficient functions
of the derivatives of y are
p 3 (x) = 1
3,0 = 1
p 2 (x) = x
2,2 = 1
p 1 (x) = 3x
1,1 = 3
p 0 (x) = 1
0,0 = 1
353
3.5
2.5
y 2
1.5
0.5
0
Based on Theorem 9.1, the coefficients in the recurrence equation (9.20) for
N = 3 will be given by
n,k = (1)
The nonzero terms occur only for 3,0 and i,i for i = 0, 1, 2. Thus n,k = 0 only
when n = k, which yields
n,n = (1)
n+1
(n + 2)(n + 3)
n+1
an
(n + 2)(n + 3)
n = 0, 1, 2, . . .
2
=0
a x 1 + !
k=1
3 k k
x
(3j 2 + )2
(3k + )!
j =1
'
(
'
(
4
x3
5 x3
x3
5
7 x3
1
+x 1
2 F 2 1, ; 2, ;
2 F 2 1, ; 2, ;
6
3
3 3
6
3
3 3
'
(
x2
3x3
8 7 x3
+
1
1,
2;
,
F
;
2 2
2
20
3 3 3
The series solutions are compared with the numerical solution using RungeKutta in Figure 9.3.
10
354
In some cases, the series solutions may yield some group of terms that can be
replaced by their closed forms. When these occur, the method of order reduction
may be useful in finding a closed form for the other solutions.
Let the solution of a second-order homogeneous differential equation be given
by y(x) = Au(x) + Bv(x). Suppose u(x) has been found (e.g., using series solution)
and has a known closed form. The method of order reduction simply proceeds by
letting the second solution take the form of v(x) = q(x)u(x), where q(x) will be solved
from an associated equation that will hopefully be from a reduced order equation
or an equation that can be solved using other techniques. More details about the
method of order reduction can be found in Section I.2 as an appendix.
EXAMPLE 9.2.
d2 y
dy
+x
y=0
2
dx
dx
x
q = C p dx = C e
+
erf
x
2a
2a
where erf(x) is the error function (cf. (9.15)).
Finally, we obtain the alternative form to the series solution given by
7
'
(
ex2 /(2a) + x erf x
y = Ax + C
2a
2a
This is relatively easier to evaluate (given that erf(x) is a function that is available
in MATLAB and other programs) than the solution obtained by a complete
implementation of Theorem 9.1, which is given by
1 k (2k 2)!
y = Ax + B 1 2
x2k
(9.22)
2a
(k 1)!(2k)!
k=1
2
3
This equation results from solving the diffusion equation after applying the similarity transform
method (cf. Example 11.15).
This first solution derives from Theorem 9.1, where 1,1 = 0. This leads to a3 = a5 = a7 = = 0.
355
xi &
p i (x)
i=0
di y
=0
dxi
(9.23)
&
i,n xn
&
N,0 = 0
(9.24)
n=0
Then the series solution around the regular singular point x = 0 will take a form that
is slightly different from the series solution around an ordinary point given in (9.19).
To motivate the form needed for the solution, consider the first-order differential
equation,
x
dy
+ (q0 + q1 x) y = 0
dx
1
y = Axq0 eq1 x = Axq0 1 q1 x + (q1 x)2 + = A
an x(nq0 )
2
n=1
where A is an arbitrary constant. Note that the exponents now involve another
parameter q0 .
Thus for the series solution around a regular singular point at x = 0, we use a
more general series known as the Frobenius series given by
y=
an xn+r
(9.25)
n=0
where r is an additional parameter, and the solution method is known as the Frobenius method. Using (9.25), the product of x j and j th derivative of y can be evaluated
as follows:
x
dy
dx
n=0
..
.
dj y
xj i
dx
(n + r)an xn+r
j 1
n + r an xn+r
n=0
(9.26)
=0
After substitution of (9.24) and (9.26) into (9.23), with the implementation of (9.5),
j 1
N
n
0,nk +
xn+r
ak &
&
j,nk (k + r ) = 0
(9.27)
n=0
k=0
j =1
=0
356
j 1
N
a0 &
0,0 +
&
j,0
(r )
j =1
=0
N
&
j,0
j =1
j 1
(r ) = 0
(9.28)
=0
Equation (9.28) is known as the indicial equation of (9.23), and the N roots of (9.28)
r1 , . . . , rN are known as the indicial roots (also known as exponents).
Having set a0 as arbitrary, (9.27) also implies that the other coefficients can be
obtained by using the following recurrence formula:
an Qn,n (r) =
n1
n = 1, 2, . . .
Qn,k (r)ak
(9.29)
k=0
where
0,nk +
Qn,k (r) = &
N
&
j,nk
j =1
j 1
(k + r )
(9.30)
=0
If Qn,n (r) = 0,
n (r)a0
an = &
where
5
&
n (r) =
(9.31)
1
n1
&
1
Q
(r)
(r)
/Qn,n (r)
n,k
k
k=0
if n = 0
if n > 0
(9.32)
When Qn,n (r) = 0 for all indicial roots r, the complete solution is immediately
given by a linear combination of solutions based on each root,
y=
N
ck yk
(9.33)
k=1
where
yk =
n=0
r+n
&
n (r)x
(9.34)
(r=rk )
N
j =1
&
j,0
j 1
=0
(m + rb ) = &
0,0 +
N
j =1
&
j,0
j 1
=0
(ra ) (9.35)
357
which is the indicial equation with r replaced by ra (cf. (9.28) ); thus Qm,m (rb) will be
zero. If the numerator for &
m (rb) in (9.32) happens to be zero, then the recurrence
can be continued by setting &
m (rb) = 0. Otherwise, additional procedures will be
needed to treat these cases.
There are two cases that yield an integer difference between two indicial roots:
when some of the indicial roots are equal and when some indicial roots differ by
a nonzero integer. We can partition the indicial roots into equivalence subsets,
where in each subset the roots are either repeating or differ by integers. Taking
the largest value from each of the equivalence subsets, (9.34) can be used to obtain
at least one solution from each of these subsets. To determine the other solutions,
we can apply a technique known as the dAlembert method of order reduction.
Briefly, this method is used to find another solution v(x) = u(x) z(x)dx, where u(x)
is a known solution and z(x) is a function that solves an associated differential
equation whose order is one less than the original differential equation. Details for
the method of order reduction are given in Section I.2 as an appendix. Unfortunately,
this additional procedure can also get very complicated for high-order differential
equations.
We can directly apply the procedure just discussed to second-order equations,
and the details for doing so are given in Section I.1 as an appendix. We summarize the
results from applying the Frobenius method to the second-order equation, expanded
around a regular singular point at x = 0, in the following theorem:
THEOREM 9.2. Frobenius Series Solution of Linear Second Order ODE Given the
homogenous second-order linear differential equation
P2 (x)
x2&
d2 y
dy &
+ x&
P1 (x)
+ P0 (x)y = 0
dx2
dx
where
&
Pi (x) =
&
i,k xk
i = 0, 1, 2
and &
2,0 = 0
k=0
ra , rb =
1,0 )
2,0 &
(&
"
1,0 )2 4&
0,0&
2,0
2,0 &
(&
2&
2,0
and the complete solution is given by y = Au(x) + Bv(x), where A and B are arbitrary
coefficients and
u(x)
an xn+ra
(9.36)
n=0
v(x)
u(x) ln(x) +
n=0
bn xn+rb
(9.37)
358
If the difference between ra and rb is an integer, let m = (ra rb) 0. Also, define
n (r) and n (r) as follows:
Qn,k (r), &
Qn,k (r)
&
n (r)
n (r)
&
0,nk + &
1,nk (k + r)
+&
2,nk (k + r)(k + r 1)
if k < n
n &
2,0 (2r 1) + &
1,0
2,0 n + &
if k = n
if n = 0
n
(9.38)
n1
k=0
Qn,k (r)&
k (r)
Qn,n (r)
(9.39)
if n > 0
k (r)
1,nk &
2,nk + &
(2r + 2k 1)&
(9.40)
k=0
bn
PROOF.
&
n (ra )
&
n (rb)
n1
Q (r )b + nm (ra )
k=0 n,k b k
Qn,n (rb)
m1
Q (r )&
(r )
k=0 m,k b k b
0 (ra )
(9.41)
if (ra rb) not an integer
or n < m
if n = m
if n > m
(9.42)
(9.43)
We have included three examples in Section I.3 where this theorem has been
applied. Also, in Section 9.3, we use the result of Theorem 9.2 to obtain the solution
of an important class of equations known as Bessel equations.
Solution
m2
1 x2
(where n, m are integers)
y=0
partial differential equations involving spherical coordinates (see Example 9.3). The
series solutions of these equations involve the direct implementation of the results
given in Theorem 9.1 of Section 9.1.1. The details of the series solution for these
Legendre equations and associated Legendre equations can be found in section I.4
as an appendix. These solutions can be formulated using special functions, and they
are summarized in Table 9.1.
The special functions in Table 9.1 result from grouping various terms in the series
solutions. These include Legendre polynomials Pn (x), the Legendre functions Qn (x),
the Legendre functions of the second kind, Leven and Lodd , the associated Legendre
polynomials Pn,m , and the associated Legendre functions Qn,m . The definitions of
these functions are given in Table 9.2.
Because of the importance of Legendre polynomials and functions, several computer programs have built-in routines for the evaluation of these functions. In
MATLAB, the command for the associated Legendre polynomials is
T
legendre(n,x), which yields a vector Pn,0 (x), . . . , Pn,n (x) .
The first five Legendre polynomials are given by,
P0
P1
=
=
P2
1
x
3x2 1
2
P3
P4
5x3 3x
2
35x4 30x2 + 3
8
A plot of these are shown in Figure 9.4. Also, the first four Legendre functions are
given by
Q0 (x)
atanh (x)
Q2 (x)
Q1 (x)
x atanh (x) 1
Q3 (x)
3
3x2 1
atanh (x) x
2
2
5x3 3x
5
2
atanh (x) x2 +
2
2
3
359
360
Definition
Pn (x) =
(n)
k=0
Legendre Polynomial
where (n) =
+
n/2
(n 1)/2
8
Qn (x) = Pn (x)
Legendre Function
Legendre Functions
of 2nd kind
if n is even
if n is odd
9
1
(1 x2 ) (Pn (x))2
Leven (x)
1+
Lodd (x)
x+
n=1
n=1
2n ()x2n
2n+1 ()x2n+1
where,
m
(m)
(
(m)1 '
(1)(m)
( m 2k)
( + m 2k + 1)
m!
k=0
+
m/2
if m is even
(m 1)/2 if m is odd
Associated Legendre
Polynomial
m/2 dm
Pn,m = (1)m 1 x2
Pn (x)
dxm
Associated Legendre
Function
m/2 dm
Qn,m = (1)m 1 x2
Qn (x)
dxm
EXAMPLE 9.3.
dx
by
2 F = 0 = r 2 s
2F
F
2F
F
1 2F
+ 2rs
+ s 2 + c
+
2
r
r
s 2
(9.44)
where s = sin() and c = cos(). This differential equation is linear and can be
solved using the method of separation of variables. This method is discussed in
more detail in Section 11.3. Basically, the solution is first assumed to be a product
of three univariable functions, that is, F = R(r)()(). After substituting this
product into (9.44), the following equation results
A(r) + B() +
1
C() = 0
s2
(9.45)
where
A(r) =
r2 d2 R 2r dR
+
;
R dr2
R dr
B() =
1 d2
c d
+
;
2
d
s d
C() =
1 d2
d2
361
1.5
n=0
1
n=1
n=3
P (x)
0.5
0
n=4
0.5
n=2
1
1
0.5
0.5
x
For (9.45) to be true for arbitrary values of the independent variables, the three
functions A(r), B(), and C() will need to satisfy the following:
A(r) =
C() =
B() + +
=0
s2
+
+ + 2 =0
d2
s d
s
Next, let x = c , then s2 = 1 x2 , dx = s d and
d
d
= s
d
dx
and
2
d2
d
2d
=
c
+
s
d2
dx
dx2
1 x2
2x
+
+
=0
dx2
dx
1 x2
3
n=0
Qn(x)
n=2
n=3
n=1
3
1
0.5
0.5
362
1.2
(x)
0.8
Figure 9.6. The solution to the associated Legendre equation with n = 2 and m = 1. The solid line
is the numerical BVP solution, whereas the open
circles are the analytical series solutions.
0.6
0.4
0.2
0.2
0
0.2
0.4
0.6
0.8
Furthermore, we set = n(n + 1) and = m2 , which then results in an associated Legendre equation,
d2
d
m2
1 x2
2x
+
n(n
+
1)
=0
dx2
dx
1 x2
Using Table 9.1, the solution is = APn,m (x) + BQn,m (x). For a numeric example, let n = 2 and m = 1, and boundary conditions (0) = 1 and (0.8) = 0.
Then we have
(x) = 0.503P2,1 (x) + 0.5Q2,1 (x)
where
'
(
!
!
3x2 1
3
P2,1 (x) = 3x 1 x2 and Q2,1 (x) = 1 x2 3x Atanh(x) +
2(1 x2 ) 2
A plot of (x) for this numeric case is shown in Figure 9.6 together with the
numerical BVP solution.
n
1 d n 2
x
1
2n n! dxn
(9.46)
1
g L(x, t) =
=
Pn (x)tn
2
1 2xt + t
n=0
(9.47)
This means that taking the Taylor series of g L(x, t), based on powers of t while
keeping x constant, the coefficients of the series will yield the value of Legendre
polynomial at x.
363
Solution
d2 y
dy 2
+x
+ x 2 y = 0
dx2
dx
y = AJ (x) + BY (x)
d2 y
dy 2 2
+x
+ x 2 y = 0
dx2
dx
y = AJ (x) + BY (x)
d2 y
dy 2
+x
x + 2 y = 0
dx2
dx
y = AI (x) + BK (x)
d2 y
dy
+ xf 1 (x)
+ f 2 (x)y = 0
dx2
dx
f 1 (x) = a + 2bx p
f 2 (x) = c + bx p (p + a 1) + (bx p )2 + dx2q
0
if m = n
1
Pn (x)Pm (x)dx =
2
if m = n
2n + 1
(9.48)
This property allows the Legendre polynomials to be used as a basis for the
series
approximation of continuous functions in 1 x 1, that is, f (x) =
n=0 an Pn (x).
364
Definition
J (x) =
Y (x) =
i
I (x) = exp
J (ix)
2
K (x) = exp
x 2n+
(1)n
n=0
2
n!(n + + 1)
J (x) cos() J (x)
sin()
( + 1)i
J (ix) + iY (ix)
2
BESSELJ(nu,x) and BESSELY(nu,x), respectively. Also, the MATLAB functions for I (x) and K (x) are given by BESSELI(nu,x) and BESSELK(nu,x),
respectively. A plot J (x), Y (x), I (x) and K (x) can be generated using MATLAB,
and these are shown in Figures 9.7 and 9.8.
Instead of using the Frobenius series solution process outlined in Theorem 9.2,
it is often worthwhile to determine whether a given equation can be transformed
first to either a Bessel or modified Bessel equations, to take advantage of existing
solutions (and functions). To find the conditions in which this might be possible,
let X = X(x) be the new independent variable and
Y = yG(x)
(9.49)
be the new dependent variable.4 Evaluating the derivatives dY/dX and d2 Y/dX 2 ,
dY
dX
yG + Gy
X
d2 Y
dX 2
where we used the prime notation to denote the operation d/dx, that is, G = dG/dx,
G = d2 G/dx2 , and so forth.
In terms of X and Y , the Bessel equation with parameter and order is given
by
X2
d2 Y
dY 2 2
+X
+ X 2 Y = 0
dX 2
dX
(9.50)
d2 y
dy
+ xf 1 (x)
+ f 2 (x)y = 0
dx2
dx
(9.51)
A more general approach is to let X = X(x, y) and Y = Y (x, y). As expected, doing so would yield
much more complicated results.
=0
=0
=1/2
=1
0.5
365
J(x)
Y(x)
=1
0
0.5
0
10
15
8
0
10
15
Figure 9.7. Plots of Bessel functions of the first kind, J (x), and second kind, Y (x).
where
f 1 (x)
f 2 (x)
G
X
X
x 2
+
G
X
X
8
2 9
G
X
X
G
X
2
x2
+
+ (X )
G
X
X
G
X
(9.52)
Based on choices for X(x) and G(x), both f 1 (x) and f 2 (x) should be analytic at x = 0
to guarantee a series solution for (9.51) and, in particular, one that could be written
in terms of Bessel or modified Bessel functions.
One set of specifications that have found many applications is to let
X(x)
xq
G(x)
x exp (x p )
=1/2
=0
=0
=1
0
0
=1
K(x)
I (x)
0.5
1.5
2.5
0
0
0.5
1.5
Figure 9.8. Plots of modified Bessel functions of the first kind, I (x), and second kind, K (x).
2.5
366
where q = 0 and p = 0. The first and second derivatives of X(x) and G(x) are then
given by
q
X
x
X
qxq1 =
X
q(q 1)xq2 =
G
( + px p ) x1 exp (x p ) =
G
q1
X
x
+ px p
G
x
+ px p px p (p 1)
G +
G
x
x2
( + px p )2 + px p (p 1)
G
x2
After substituting X(x), G(x), and their derivatives into (9.52), we obtain
f 1 (x)
f 2 (x)
(2 + 1) + 2px p
2
2 q2 + px p (p + 2) + (px p )2 + 2 q2 x2q
(9.53)
We can summarize the results for this specific form of X(x) and G(x) in the following
theorem.
For the second-order differential equation, with q = 0, p = 0, d = 0
and (1 a)2 4c,
THEOREM 9.3.
x2
d2 y
dy
+ xf 1 (x)
+ f 2 (x)y = 0
2
dx
dx
where
f 1 (x)
f 2 (x)
a + 2bx p
dx2q + c + bx p (p + a 1) + (bx p )2
(9.54)
y=
AJ (xq ) + BY (xq )
if d > 0
q
q
AI (x ) + BK (x )
G
if d < 0
(9.55)
where,
G
x exp (x p )
a1
2
b
p
(a 1)2 4c
2q
!
|d|
q
367
Recall that (9.54) was obtained from the Bessel equation given in (9.50),
with
X(x) = xq
and
The solutions of (9.50) can then be replaced with X(x) and Y (x) to yield (9.55).
EXAMPLE 9.4.
(9.56)
1
3
1
2!
= ,
= 0,
q= ,
= ,
=
|k|
2
2
3
3
The solution is then given by
2
2
3/2
3/2
x
AJ
+
BJ
k
x
if k > 0
k
x
1/3
1/3
3
3
y=
By looking closer at Figures 9.7 and 9.8, we can immediately note that the
Bessel and modified Bessel functions of the first kind, J (x) and I (x), are finite
368
Bi(x)
Airy Functions
Ai(x)
1
1
7
x
Figure 9.9. Plots of Airy functions A(x) and B(x).
A model for the one-dimensional steady-state temperature distribution in a triangular cooling fin as shown in Figure 9.10 is given by
EXAMPLE 9.5.
d2 U
dU
+
2 U = 0
2
d
d
T Ta
Tw Ta
and
x
L
with T w and T a being the temperature of the wall and of the surrounding
air, respectively. The parameter = Lh/ (Wk cos()) is the thermal diffusion
constant of the fin, with h and k being the heat transfer coefficient and thermal
conductivity of the fin, respectively. The boundary conditions are U(0) <
and U(1) = 1.
Using the results in Theorem 9.3, we obtain
!
!
U() = AI0 2 + BK0 2
Because U(0) < and K0 (0) = , we need B = 0. Using the other boundary
condition, we get A = (I0 (2))1 . A plot of the results for different values of
is shown in Figure 9.11.
9.4 Properties and Identities of Bessel Functions and Modified Bessel Functions
369
W
Figure 9.10. Cooling fin.
x
x=L
x=0
1
0
J (0) = I (0) =
if = 0
if > 0
if < 0
(9.60)
Y (0) =
(9.61)
K (0) =
(9.62)
2. Derivatives.
(a) Derivatives of J (x).
d
(x J (x))
dx
d
x J (x)
dx
d
J (x)
dx
x J 1 (x)
(9.63)
x J +1 (x)
(9.64)
=
=
J 1 (x) J (x)
x
J +1 (x) + J (x)
x
(9.65)
(9.66)
0.8
0.6
=1.25
U()
0.4
=2.5
0.2
0
0
=5
0.2
0.4
0.6
=10
0.8
370
x Y1 (x)
(9.67)
x Y+1 (x)
(9.68)
=
=
Y1 (x) Y (x)
x
(9.69)
(9.70)
x I1 (x)
(9.71)
x I+1 (x)
(9.72)
=
=
I1 (x) I (x)
x
(9.73)
(9.74)
x K1 (x)
(9.75)
x K+1 (x)
(9.76)
=
=
K1 (x) K (x)
x
(9.77)
(9.78)
3. Recurrence equations:
J 1 (x)
Y1 (x)
I1 (x)
K1 (x)
2
J (x) J +1 (x)
x
2
Y (x) Y+1 (x)
x
2
I (x) + I+1 (x)
x
2
K (x) + K+1 (x)
x
(9.79)
(9.80)
(9.81)
(9.82)
(The recurrence equations can be obtained directly from the preceding derivative
formulas. For instance, equation (9.79) can be obtained by equating (9.65) and
(9.66)).
9.5 Exercises
371
(1)n J n (x)
(9.83)
Yn (x)
(1)n Yn (x)
(9.84)
In (x)
In (x)
(9.85)
Kn (x)
Kn (x)
(9.86)
5. Generating functions:
'
(
x
1
exp
t
=
J n (x)tn
2
t
n=
(9.87)
6. Orthogonalities:
xJ (n x)J (m x)dx =
if n = m
a2 J 2 (n a) J 1 (n a)J +1 (n a)
2
xI (n x)I (m x)dx =
(9.88)
if n = m
if n = m
a2 I2 (n a) I1 (n a)I+1 (n a)
2
(9.89)
if n = m
9.5 EXERCISES
d2 y
dy
+ f (x)
+ g(x)y = 0
2
dx
dx
where the functions f (x) and g(x) are shown in Table 9.5.
1. Show that x = 0 is an ordinary points for all these cases.
2. For the Chebyshev type 2 equation, show that one of the solutions will be
a finite polynomial if n is a positive integer. (These polynomials are used
in series approximation of functions due to the desirable othorgonality
property of the polynomials.)
Table 9.5. Differential equations with (1 x2 ) as leading coefficient
Type
f (x)
g(x)
Jacobi
Chebyshev type 1
Chebyshev type 2
Gegenbauer
( ) ( + + 2) x
x
3x
(2 + 1) x
n (n + + + 1)
n2
n(n + 2)
n(n + 2)
372
(9.90)
u(R a) = T p T a
The independent variable x = R r is the distance from the outer tip of the
fin. The dependent variable is u = T T a , where T (x) is the temperature
at x and T a is the constant air temperature. As shown in the figure, a and
R are the inner and outer radius of the cooling fin, respectively. T p is the
temperature of the contents flowing in the pipe at x = R a. The parameter
= h/(k sin ) contains the thermal diffusivity, with being the slant angle
of the fin. By using the Frobenius series expanding around x = 0,
1. Obtain the indicial equation and show that the indicial roots are equal.
(According to Theorem 9.2, the second solution will contain a term with
ln(x). Because u(0) < , the second solution will not be necessary.)
2. Using
9.2, show that one of the series solution is given by u =
Theorem
n
x
,
where
A is an arbitrary constant and
A
n
n=0
0 = 1 ; 1 = ; ; k =
5
k(k 1) + R
k1 2 k2
2
kR
kR
9.5 Exercises
373
x=0
x=R-a
x=R
(This series is difficult to reduce to a recursion pattern; however, the convergence is relatively fast; i.e., a few terms of the series may be sufficient.)
3. Using the first ten terms in the series, compare this truncated series solution
with the numerical BVP solution for the case where R = 8 in, a = 4 in,
= 3 in 1 , T a = 70o F , and T p = 300o F . Note that for the numerical BVP,
instead of starting at x = 0, one can start instead at, say, x = = 104 . Also,
the boundary condition at x = 0 can be replaced by setting x = 0 for the
model and then approximating at x = , that is,
du
R
R u() = 0
dx
x=
E9.6. Find the series solution expanded around x = 0 for the equation given by
d2 y
dy
bx
+ cy = 0
dx2
dx
Let b = 2 and c = 2 and the boundary conditions be given by
y(0) = 1
and
y(2) = 3
Compare the solution with a numerical BVP solution. (Hint: The series
solution can also be reduced to
'
(
1
3
y(x) = A 1 x2 2 F 2 1, ; 2, ; x2
+ Bx
2
2
where 2 F 2 is the generalized hypergeometric function, cf. (9.12)).
E9.7. The Hermite equation is given by
d2 y
dy
2x
+ 2y = 0
2
dx
dx
Obtain the series solution and show that one of the solutions is a finite order
polynomial for , a positive integer. These polynomials are known as the
Hermite polynomials.
E9.8. The Laguerre equation is given by
dy
d2 y
+ (1 x)
+ my = 0
dx2
dx
Using Theorem 9.2, show that if m is a positive integer, then one of the
solutions is given by finite polynomial of order m. Obtain the solution for
m = 5. (These polynomials are known as the Laguerre polynomials.)
E9.9. Derive the following identity using the definition of the Bessel function J (x)
defined in 9.4:
7
2
J 1/2 (x) =
sin(x)
(9.91)
x
374
R(a) = F 0
r CA = 0
2
dr
dr
DAB
with boundary conditions,
dCA
CA(R) = CA,surr and
=0
dr r=0
r2
y
X = x and Y =
x
will transform the original equation to either a Bessel or modified Bessel
equation with Y and X as the dependent and independent variables, respectively.
E9.13. Consider the following equation:
1
dy
d2 y
1
+
+
2
+
1
y=0
dx2
x
dx
x2
6
Adapted from H. Adidharma and V. Temyanko, Mathcad for Chemical Engineers, Trafford Publishing, Canada, 2007.
9.5 Exercises
375
Determine whether the conditions in Theorem 9.3 can be met, and if so,
obtain the general solution. For the boundary condition that y(1) = 100 and
y(2) = 20, obtain the analytical solution and compare with the numerical
BVP solution.
E9.14. Consider the following differential equation and the boundary conditions
d2 y
= 2xy
y(1) = 1, y(1) = 0.5
dx2
Obtain the solution in terms of Airy functions Ai(x) and Bi(x). Compare the
solution with the numerical BVP solution.
E9.15. Based on (9.50) and (9.51), let G(x) = 1, that is, no transformation is needed
for the independent variable:
1. For X(x) = x + a show that
x
and
f 1 (x) =
x+a
f 2 (x) = x
2
()
x+a
2
PART IV
This part of the book focuses on partial differential equations (PDEs), including the
solution, both analytical and numerical methods, and some classification methods.
Because the general topic of PDEs is very large, we have chosen to cover only some
general methods mainly applicable to linear PDEs, with the exception of nonlinear
first-order PDEs.
In Chapter 10, we focus on the solution of first-order PDEs, including the method
of characteristics and Lagrange-Charpit methods. The second half of the chapter
is devoted to classification of high-order PDEs, based on the factorization of the
principal parts to determine whether the equations are hyperbolic, parabolic, or
elliptic.
In Chapter 11, we discuss the analytical solutions of linear PDEs. We begin
with reducible PDEs that allow for the method of separation of variables. To satisfy various types of initial and boundary conditions, Sturm-Liouville equations are
used to obtain orthogonal functions. The techniques can then be extended to the
case of nonhomogenous PDEs and nonhomogeneous boundary conditions based on
eigenfunction expansions.
In Chapter 12, we discuss integral transform methods such as Fourier and
Laplace transforms methods. For the Fourier transforms, we cover the important
concepts of the classic transforms, including the use of distribution theory and tempered distributions to find the generalized Fourier transforms of step functions, sine,
and cosine functions. A brief but substantial introduction to distribution theory is
included in the appendix. For numerical implementation purposes, we have also
included a discussion of the fast Fourier transform in the appendix. The Laplace
transform methods are then discussed, including the transform of special functions
such as gamma functions and Bessel functions. Thereafter, they are applied to solve
linear PDEs. To perform the inverse Laplace transform, we employ the theory of
residues. A brief but substantial discussion of complex integration and residue theory
is also included in the appendix.
The last two chapters cover two of the most popular numerical methods available
for solving PDEs. Chapter 13 covers the method of finite differences. Starting with a
discussion of various discretization approaches, the finite differences are then applied
to the solution of time-independent cases, ranging from the one-dimensional cases
to the three-dimensional cases. The time-dependent cases are then handled by the
semi-discrete approach, also known as the method of lines, including forward Euler,
377
378
backward Euler, and Crank Nicholson schemes. Because stability analysis is a key
issue for the successful application of the finite difference method, we include a
discussion of both the eigenvalue approach and the Von Neumann analysis.
Chapter 14 discusses the other major class of numerical methods, known as the
finite element method. The discussion is restricted to the use of triangular finite
elements applied to the weak formulation of the PDEs with only two independent variables. After transforming the PDEs to integral equations, they are further
reduced to matrix equations.
10
379
380
(10.1)
n
i (x, u)
i=1
Semilinear :
n
n
u
= g(x, u)
xi
(10.3)
i (x)
u
= h(x) + (x)u
xi
(10.4)
i (x)
u
= h(x)
xi
(10.5)
i=1
Strictly Linear :
n
i=1
(10.2)
i (x)
i=1
Linear :
u
= f (x, u)
xi
EXAMPLE 10.1. Recall the conservation laws discussed in Section 5.4.1. When
limited to one-dimensional flow with no source term, and assuming sufficient
smoothness in the functions, we can obtain the general formula given by
where u is the conserved quantities and g is the flux of u. For the special case in
which g = v u(t, z), with v as a constant velocity, we obtain
u
u
+ v
=0
t
z
(10.7)
(10.8)
where G(u) = dg/du. Equation (10.8) can thus be classified as a quasilinear firstorder PDE, that is, (10.2) with 1 (x, u) = 1, 2 (x, u) = G(u) and f (x, u) = 0.
We focus first on the solution of quasilinear forms. These methods can be readily applied to both the linear and semilinear cases. For nonlinear cases that are not
381
o
increasing
projection
quasilinear, additional steps known as the Lagrange-Charpit methods, to be discussed later in Section 10.3, can sometimes be used to convert the equations to an
associated quasilinear equation.
Consider a quasilinear equation with two independent variables, x and y,
(x, y, u)
u
u
+ (x, y, u)
= (x, y, u)
x
y
(10.9)
(10.10)
where (xo (a), yo (a)) is a boundary curve parameterized by a. This type of boundary
condition, where the values are fixed along a continuous curve in the domain of the
the independent variables, is known as a Cauchy condition.
The method of characteristics involves the determination of a solution surface
parameterized by a and s in the (x, y, u)-space where x and y are independent
variables. The boundary condition is fixed to be the values at s = 0 and and is
denoted by xo (a) = x(a, s = 0), yo (a) = y(a, s = 0), and uo (a) = u(a, s = 0); that is,
the locus of points of the boundary conditions are specified by the parameter a. At
any fixed value of a, we then track a curve in the solution surface as s increases
from 0. These curves are the integral curves known as the characteristic curves of
the partial differential system given by (10.9) subject to (10.10). This is illustrated in
Figure 10.1.
As the parameter a takes on values of its range, a collection of characteristics
curves can be generated. The projections of these curves on the (x, y) plane are
known as the projected characteristic curves, characteristic base, or characteristics,
which we denote as a . As long as the characteristics do not cross each other, we can
then take the collection of the characteristic curves and bundle them together to
be the solution of (10.9). This means that the solution surface can be put in terms of
parameters a and s. If the problem is well posed, one could transform this solution
back in terms of x and y, that is, u = u(x, y).
382
(10.11)
After comparing (10.11) and (10.9), we obtain the following set of simultaneous
ordinary differential equations, known as the characteristic equations:
du
ds
dx
ds
dy
ds
(x, y, u)
(10.12)
(x, y, u)
(10.13)
(x, y, u)
(10.14)
x (0) = xo (a) ;
y (0) = yo (a)
(10.15)
After solving these simultaneous equations, we can then rearrange the results to
obtain the solution in terms of x and y. The approach is best illustrated by a simple
example.
EXAMPLE 10.2.
u
u
y
= 2x2 6y2
x
y
(10.16)
(10.17)
yo
a = arbitrary constant
uo
3y2o
(10.18)
(10.19)
(10.20)
(10.21)
+ 2yo 5 = 3a + 2a 5
2
(10.22)
(10.23)
es
(10.24)
a es
(10.25)
383
a=2
a=1
y=a e
a=0
a=1
a=2
4
s
x=e
These two equations define the characteristic a (s). Plots of the characteristics
for a = 2, 1, 0, 1, 2 are shown in Figure 10.2. The characteristic curves are
then obtained by solving (10.20). Substituting x = es and y = aes ,
du
= 2e2s 6a2 e2s
ds
which can be integrated to yield
u = e2s + 3a2 e2s + C
(10.26)
(10.27)
2a 6
or
u = e2s + 3a2 e2s + 2a 6
(10.28)
(10.29)
One can show directly that (10.29) satisfies both the partial differential equation
(10.16) and the boundary condition (10.17).
Consider the simplified model for a parametric pump based
on the chromatographic process described by the following partial differential
equation:
EXAMPLE 10.3.
(1 + K(t))
c
dK
c
+ v(t) = v(t)
c
t
z
dt
where c, t, and z are the solute concentration, time, and distance from the
bottom of the column, respectively. The parameter = (1 )/ is the ratio
384
u(x,y)
100
50
0
2
0
y
-2
4
2
1
1 + K+
and
1
1 + K
z=L
T=T +
T=T -
z=0
(+)
(-)
Figure 10.4. The parametric pump process showing the flow configurations during an upswing
and downswing mode.
385
Also, let c+ (z) and c (z) denote the concentrations of the fluid in the column,
during switches where there are no flow, at temperatures T + and T , respectively. Then we can model the process separately during each mode as follows:
1 c
c
+
v
=
0
+ t
z
1
when
nP < t < n +
P
(n)
c(0, t) = cin,bottom
1 c
c
v
= 0
t
z
1
when
n+
P < t < (n + 1)P
(n)
c(L, t) = cin,top
+(n)
+
c
z
(t
P
)v
n
c (z, t) =
(n)
cin,bottom
for 0 z (t Pn )v+
and for Pnh < t Pn+1 ,
(n)
c
z
+
(t
P
)v
nh
c (z, t) =
(n)
cin,top
for L (t Pnh )v z L
(10.31)
(10.32)
where the feed concentrations for n > 0 are obtained by mixing the liquid
(n)
(n)
collected from the previous mode, that is, cin,bottom = cbottom (Pn ) and cin,top =
ctop (Pnh ), where
t
ctop (Pnh )
for Pnh < t Pn+1
386
and
c
(P )
bottom n t
1
cbottom (t) =
c(0, )d
t Pnh Pnh
c
=
=
c+
+
We consider the case where < 1; that is, the amount adsorbed in the solids are
greater during the downswing mode, making the liquid more dilute with solute
as it is being pushed to the bottom reservoir. This implies that the magnitude of
the slopes of the characteristics for the downswing mode is less than that of the
upswing mode, that is, v < v+ . Prior to pumping, we have
1
c (z, Pn ) and c(n) (z) = c (z, Pnh )
To complete the model, we need to be given the data for the initial conditions,
(0)
that is, c+(0) (z) and cin,bottom . It can be shown (as part of Exercise E10.2) that
the top reservoir will keep increasing in solute concentration asymptotically to
a maximum level.
c+(n) (z) =
i (x, u)
i=1
u
= f (x, u)
xi
(10.33)
dxn
= n (x, u)
ds
du
= f (x, u)
ds
(10.34)
..
.
(10.35)
xn (0)
u(0)
uo (xo,1 , . . . , x0,n )
The parameters {a1 , a2 , . . . , an1 } are used to specify the boundary hyper-surface.
1
The boundary conditions are now specified at an (n 1)-dimensional hypersurface that can be
parameterized by {a1 , . . . , an1 }.
387
For the linear and semilinear first-order equations, the characteristic equations
become
dx1
= 1 (x, u) = 1 (x)
ds
dxn
= n (x, u) = n (x)
ds
(10.36)
x1
x1 (xo,1 , . . . , xo,n , s)
..
.
=
xn
(10.37)
xn (xo,1 , . . . , xo,n , s)
Furthermore, if the boundary surface are parameterized by {a1 , . . . , an1 }, then with
a nonsingular Jacobian determinant given by
x1
x1
x1
s
a1
an1
..
..
..
..
(10.38)
.
= 0
.
.
.
xn
x
x
n
n
s
a
a
1
n1
(10.39)
A limitation with this alternate formulation is that one or more functions i may
be zero. In those cases, the notation simply implies that xi = constant, and the term
dxi /i can then be removed from (10.39). Likewise, if f = 0, we have u = constant,
and the last term in (10.39) can also be removed.
388
i = 1, . . . , n
(10.40)
These can further be combined in two possible forms, and both are equivalent
general solutions. One formulation is to use one of the solutions, say n , to be an
arbitrary function of the other solutions, that is,
n = f (1 , . . . , n1 )
(10.41)
(10.42)
EXAMPLE 10.4.
(10.43)
dx
dy
du
=
= 2
x
y
2x 6y2
(10.44)
x
As prescribed in (10.39),
1 = xy = constant
We could then substitute y = 1 /x in the last term of (10.44) and equate the
result with the first term,
du
2x2 6
12
x2
dx
x
12
x2
u 2
x2 + 3
u x2 + 3y2
f (1 )
f (xy)
x2 + 3y2 + f (xy)
(10.45)
which satisfies (10.43). Next, consider the second formulation of the general
solution,
(10.46)
F (1 , 2 ) = F xy, u x2 + 3y2 = constant
389
To show that this also satisfies (10.43), take the partial derivatives of F with
respect to x and y, respectively,
F
F u
y+
2x
= 0
1
2 x
F
F u
x+
6y
= 0
1
2 y
By multiplying the first equation by x and the second by y, the difference will be
F
u
u
x
2x2 y
(10.47)
+ 6y2 = 0
2
x
y
Because F is an arbitrary function, the other factor on the left-hand side of
(10.47) must equal zero, which again yields (10.43).
Thus either formulation of the general solution is valid. The advantage of
the first formulation is that it can often yield an explicit form, u = u(x, y). In
cases in which only implicit solutions are possible, the second formulation is
preferred.
(10.48)
u
u
G x, u,
,...,
x1
xn
=0
(10.49)
It will turn out later that we do not actually need to specify any particular formulation
of G(x, u, . . .).
We introduce the following notations to denote the partial derivatives:
F xk =
F
u
F
=
k
Fu =
F k
F
xk
u
xk
k,
2u
xk x
Gxk =
Gu =
Gk
G
xk
G
u
G
=
k
390
(10.50)
0
F x1
1
1,1
..
..
.. ..
. = . + Fu . + .
0
F xn
n
n,1
Gx1
0
1
1,1
..
..
.. ..
. = . + Gu . + .
0
Gxn
n,1
(10.51)
1,n
F 1
.. .. (10.52)
. .
n,n
F n
1,n
G1
.. .. (10.53)
..
.
. .
n,n
Gn
..
.
(F k Gxk + F k k Gu Gk F xk Gk k F u ) = 0
k=1
or
F 1 Gx1 + + F k Gxn +
n
F k k Gu
k=1
(F x1 + F u 1 ) G1 (F xn + F u n ) Gn = 0
(10.54)
which is a quasilinear partial differential equation for G. Thus we can write the 2n
characteristic equations as follows
dx1
=
F 1
dxn
du
= n
F n
k=1 F k k
d1
dn
= =
F x1 + F u 1
F xn + F u n
(10.55)
which we shall refer to as the Lagrange-Charpit conditions. Among these 2n characteristic equations, we only need n equations that would yield i , i = 1, 2, . . . , n,
in terms of the independent variables x1 , x2 , . . . , xn . Finally, u is obtained by solving
the differential equation,
du = 1 (x1 , . . . , xn ) dx1 + + n (x1 , . . . , xn ) dxn
(10.56)
The solution of (10.56) will involve a set of n arbitrary constants after substituting
back into the differential equation (10.48). This solution is known as the complete
solution. In some cases, the arbitrary constants can be removed to obtain a general
solution, where arbitrary functions replace the arbitrary constants.
391
F
x2
u x1
0 ,
F
= 2
1
F
=1
u
F
= 1 1
2
dx1
dx2
du
d1
d2
=
=
=
=
2
1 1
2 21 2
1 1
2
(10.58)
Of the four equations, we can select only those which can be used to find 1 and
2 as explicit functions of x1 and x2 . To do so, we equate the first and the last
term, as well as equate the second and the second-to-the-last term.
dx1 = d2
and
dx2 = d1
u
= x1 + c1
x2
and
1 =
or
2 =
u
= x2 + c2
x1
(10.59)
(10.60)
k = ak , k = 1, . . . , n
392
where a1 , . . . , an1 are arbitrary constants but with an = f (a1 , . . . , an1 ). The
solution is then given by
n1
ak xk + f (a1 , . . . , an1 ) xn + b
(10.61)
u=
k=1
j = a j n , j = 1, . . . , n 1
n
k=1
th
dxk
dk
=
f k /k
f k /xk
fk
fk
dx +
dk = 0 or f k (xk , k ) = ak
xk
k
n1
j =1
separate term in the given differential equation can be solved for k , that is,
f k (xk , k ) = ak
k = g (ak , xk )
(10.63)
k=1
k = ak , k = 1, . . . , n
where a1 , . . . , an are arbitrary constants (Note that in this case we can set an
arbitrary and independent of the other constants). Substituting back into the
differential equation, we have
u=
n
k=1
ak xk + f (a1 , . . . , an )
(10.64)
393
uy =
2u
xx
uxy =
ux =
uxx =
u
y
2u
xy
uyy =
..
.
2u
yy
(10.65)
u
xi
i,j
2u
,
xi x j
i,j,k
3u
,
xi x j xk
1 i, j n
1 i, j, k n
..
.
(10.66)
The order of the partial differential equation is the highest order derivative
present in the equation. The most general mth -order form is the nonlinear equation
given by
F x, u, [1] , . . . , [m] = 0
(10.67)
where,
x
{x1 , x2 , . . . , xn }
(10.68)
[1]
(10.69)
[2]
{1 , 2 , . . . , n }
)
*
i1 ,i2 , 1 i1 < i2 n
..
.
[m]
(10.70)
*
(10.71)
To be consistent with the classification we used for the first-order partial differential equations, we decompose the equation into two parts: the principal part F prin ,
which contains the highest derivatives, and the nonprincipal part F non , in which the
highest derivatives are absent,
F prin + F non = 0
(10.72)
394
The differential equations can then be classified based on special forms that the
principal parts may take on.
1. Quasilinear Equation
F prin =
i1 ,...,im x, u, [1] , . . . , [m1] i1 ,...,im
(10.73)
1i1 <<im n
which means the highest derivatives are linearly combined, with coefficients
being functions of the independent variables and lower order derivatives.
2. Semilinear Equation
i1 ,...,im (x) i1 ,...,im
(10.74)
F prin =
1i1 <<im n
which means the highest derivatives are linearly combined, with coefficients
being functions of only independent variables.
3. Linear Equation.
i1 ,...,im (x) i1 ,...,im
(10.75)
F prin =
1i1 <<im n
F non
1i1 <<im1 n
(10.76)
1in
u u = G1 (, , u, u , u )
u = G2 (, , u, u , u )
u + u = G3 (, , u, u , u )
For each canonical form, the solution takes on particular behavior and properties.
For hyperbolic equations, the solutions are wave-like in nature and are thus usually
classified as wave equations. For parabolic equations, the solutions are related to
diffusion-like behavior and are thus usually classified as diffusion equations. Lastly,
for the elliptic equations, the solutions are related to potential fields and are thus
usually classified as potential equations. Therefore, one immediate use of classification is to better identify the appropriate solution approach, either analytically or
numerically.
2
The naming of each canonical form closely matches the functional description of a hyperbola: (x/a)2 (y/b)2 = f 1 (x, y, c), a parabola: (y/b)2 = f 2 (x, y, c), and an ellipse: (x/a)2 + (y/b)2 =
f 3 (x, y, c).
395
Let (x, y) and (x, y) be a pair of new coordinates. Applying the chain rule, we
obtain
ux
x u = x u + x u
uy
y u = y u + y u
uxx
uxy
uyy
x ux = x [x u + x u ] + xx u + x [x u + x u ] + xx u
x uy = x y u + y u + xy u + x y u + y u + xy u
y uy = y y u + y u + yy u + y y u + y u + yy u
(10.78)
(10.79)
where,
g = 2 Ax x + Cy y + (B) x y + y x
and Q(, ) is a quadratic form known as the characteristic form of (10.77) defined as
A (w r1 q) (w r2 q) if A = 0
2
2
Q(w, q) = Aw + Bwq + Cq =
(10.80)
if A = 0
(Bw + Cq) (q)
with roots r1 and r2 given by
B + B2 4AC
r1 =
2A
r2 =
B2 4AC
2A
(10.81)
To remove
the
terms involving
u and u in (10.79), we can solve for and such
that Q x , y = 0 and Q x , y = 0, that is,
x r1 y x r2 y = 0 = x r1 y x r2 y
Bx + Cy y = 0 = Bx + Cy y
if A = 0
if A = 0
x r2 y = 0
Bx + Cy = 0 ;
y = 0
if A = 0
if A = 0
(10.82)
If r1 = r2 are real, or equivalently B2 > 4AC (which includes the case when A = 0,
B = 0), then and are also real. Let = + and = . Then with
=
+
=
+
and
+
=
396
+i
and
B2 > 4AC
Parabolic, if
B2 = 4AC
Elliptic, if
B2 < 4AC
EXAMPLE 10.6.
( + ) (u u ) + 2 ( )
( + )2 8 ( )
One of the advantages of finding the canonical forms is that solutions and solution methods, both analytical and numerical, are readily available when the nonprincipal parts take special reduced forms. For instance, if the canonical hyperbolic form (in terms of $\xi$ and $\eta$) is given by

$$u_{\xi\eta} = f(\xi, \eta)$$

then the general solution can be obtained immediately by integration, that is,

$$u(\xi, \eta) = \iint f(\xi, \eta)\,d\eta\,d\xi + g(\xi) + h(\eta)$$
Consider next a system of $m$ first-order partial differential equations,

$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{A}\,\frac{\partial \mathbf{u}}{\partial x} = \mathbf{b} \tag{10.83}$$

where

$$\mathbf{u}_t = \begin{pmatrix} \partial u_1/\partial t \\ \vdots \\ \partial u_m/\partial t \end{pmatrix} \qquad \text{and} \qquad \mathbf{u}_x = \begin{pmatrix} \partial u_1/\partial x \\ \vdots \\ \partial u_m/\partial x \end{pmatrix} \tag{10.84}$$

Let $\lambda_k$ and $\mathbf{g}_k$ be the eigenvalues and corresponding eigenvectors of $\mathbf{A}^T$. Premultiplying (10.83) by $\mathbf{g}^T$ gives

$$\mathbf{g}^T\left(\mathbf{u}_t + \mathbf{A}\,\mathbf{u}_x\right) = \mathbf{g}^T\mathbf{b} \tag{10.85}$$

Assuming that the eigenvectors $\mathbf{g}_k$ are real and linearly independent, the partial differential equations can be reduced to a set of $m$ ordinary differential equations,

$$\mathbf{g}_k^T(x, t, \mathbf{u})\,\frac{d\mathbf{u}}{dt} = \mathbf{g}_k^T(x, t, \mathbf{u})\,\mathbf{b}(x, t, \mathbf{u}) \qquad \text{along} \qquad \frac{dx}{dt} = \lambda_k(x, t, \mathbf{u}) \tag{10.86}$$
Note, however, that unlike the eigenvalue problems we have dealt with in the previous chapters, the matrix A is a matrix of functions. Further, note that even though
there are m ordinary differential equations, the equations are along different characteristics. This means that the equations cannot be solved simultaneously as they are
given in (10.86). Moreover, the system will contain multiple-point boundary value
problems.
If the eigenvalues are all real and the eigenvectors are linearly independent, we
say that (10.83) is a hyperbolic system of PDE. In addition, when the eigenvalues
are real and distinct, we say that it is a strictly hyperbolic system of PDE. However, if none of the eigenvalues are real, then (10.83) is called an elliptic system.
If there are fewer than m linearly independent eigenvectors, we have a parabolic
system.
EXAMPLE 10.7. Consider the special case of (10.83) with $\mathbf{A}\,[=]\,n \times n$ and $\mathbf{b}\,[=]\,n \times 1$ being constant, and $\mathbf{A}^T$ having $n$ real and distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ and corresponding eigenvectors $\mathbf{g}_1, \ldots, \mathbf{g}_n$, that is,

$$\mathbf{A}^T\mathbf{G} = \mathbf{G}\,\Lambda \qquad \text{where} \qquad \mathbf{G} = \begin{pmatrix} \mathbf{g}_1 & \cdots & \mathbf{g}_n \end{pmatrix}, \quad \Lambda = \operatorname{diag}\left(\lambda_1, \ldots, \lambda_n\right)$$

and subject to the initial conditions

$$\mathbf{u}(x, 0) = \mathbf{u}_0(x) \tag{10.87}$$

Then the solution to (10.86), following along the characteristics (details of which are left as an exercise in E10.18), will be given by

$$\mathbf{u} = \mathbf{b}\,t + \left(\mathbf{G}^T\right)^{-1}\begin{pmatrix} \mathbf{g}_1^T\,\mathbf{u}_0(x - \lambda_1 t) \\ \vdots \\ \mathbf{g}_n^T\,\mathbf{u}_0(x - \lambda_n t) \end{pmatrix} \tag{10.88}$$

One can show by direct substitution that (10.88) will satisfy both the system of differential equations (10.83) and the initial conditions (10.87).
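For a concrete check of (10.88), the following is a small numerical sketch for an illustrative constant-coefficient system; the matrix, source vector, and initial profile are arbitrary choices used only for demonstration:

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [3.0, 0.0]])
b = np.array([0.1, -0.2])
u0 = lambda x: np.array([np.sin(x), np.cos(x)])   # initial condition u(x, 0)

lam, G = np.linalg.eig(A.T)          # columns of G are the eigenvectors g_k of A^T
Ginv_T = np.linalg.inv(G.T)          # (G^T)^{-1}

def u(x, t):
    # assemble g_k^T u0(x - lambda_k t) for each k, then map back with (G^T)^{-1}
    w = np.array([G[:, k] @ u0(x - lam[k] * t) for k in range(len(lam))])
    return b * t + Ginv_T @ w

print(np.allclose(u(1.3, 0.0), u0(1.3)))   # True: the initial condition is recovered
```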
EXAMPLE 10.8. Consider the one-dimensional shallow-water equations,

$$\frac{\partial}{\partial t}\begin{pmatrix} h \\ v \end{pmatrix} + \begin{pmatrix} v & h \\ g & v \end{pmatrix}\frac{\partial}{\partial x}\begin{pmatrix} h \\ v \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

where $h(x, t)$ and $v(x, t)$ are the height of the surface and velocity along $x$, respectively, and $g$ is the gravitational acceleration. The first equation is the mass balance equation, whereas the second equation is the momentum balance (reduced under the constraint of the mass balance equation). The eigenvalues of $\mathbf{A}^T$ are the roots of

$$\det\begin{pmatrix} v - \lambda & g \\ h & v - \lambda \end{pmatrix} = 0 \quad\Rightarrow\quad \lambda_1, \lambda_2 = v \pm \sqrt{hg}$$

and the corresponding eigenvectors are

$$\mathbf{g}_1 = \begin{pmatrix} 1 \\ +\sqrt{h/g} \end{pmatrix} \qquad \text{and} \qquad \mathbf{g}_2 = \begin{pmatrix} 1 \\ -\sqrt{h/g} \end{pmatrix}$$

thus yielding

$$\frac{dh}{dt} + \sqrt{\frac{h}{g}}\,\frac{dv}{dt} = 0 \qquad \text{along} \qquad \frac{dx}{dt} = v + \sqrt{hg}$$

$$\frac{dh}{dt} - \sqrt{\frac{h}{g}}\,\frac{dv}{dt} = 0 \qquad \text{along} \qquad \frac{dx}{dt} = v - \sqrt{hg}$$
Because the right-hand sides of both ordinary differential equations are zero, we find that along their respective characteristics the following are constants, that is,

$$R_1 = 2\sqrt{gh} + v = \text{constant} \qquad \text{along} \qquad \frac{dx}{dt} = v + \sqrt{gh}$$

$$R_2 = 2\sqrt{gh} - v = \text{constant} \qquad \text{along} \qquad \frac{dx}{dt} = v - \sqrt{gh}$$

The two quantities are known as the Riemann invariants along their respective characteristics.
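The invariance of R₁ can also be confirmed symbolically. The following sketch (symbol names are illustrative) substitutes the characteristic relation dh + √(h/g) dv = 0 into the differential of R₁ = 2√(gh) + v and verifies that the result vanishes:

```python
import sympy as sp

g, h = sp.symbols('g h', positive=True)
dh, dv = sp.symbols('dh dv')   # shorthand for dh/dt and dv/dt along the characteristic

# characteristic ODE along dx/dt = v + sqrt(h*g):  dh + sqrt(h/g)*dv = 0
dv_val = sp.solve(sp.Eq(dh + sp.sqrt(h / g) * dv, 0), dv)[0]

# time derivative of R1 = 2*sqrt(g*h) + v along the same characteristic
dR1 = sp.diff(2 * sp.sqrt(g * h), h) * dh + dv_val
print(sp.simplify(dR1))        # 0, so R1 is constant along that characteristic
```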
10.6 EXERCISES
E10.1. Classify the following first-order differential equations (i.e., linear, semilinear, nonlinear, etc.)
u2
u2 + 4t
1.
+
= ue2z
t
z
u eu
2.
+
= 4ze2t
t
z
u
3.
+ r u = r , under the rectangular coordinate system, where r is
t
the position vector.
E10.2. Obtain the solutions given in (10.31) and (10.32) of the parametric pump
L
process given in Example 10.3. By setting the period P = 2 + and <
v
1, obtain cbottom (t)/c0 and ctop (t)/c0 , assuming c+(0) (z) = cbottom = c0 . Also,
what is limt ctop (t) ?
E10.3. For the plug flow reactor undergoing a first-order reaction, a simplified model for the concentration of the reactant in the reactor is given by

$$\frac{\partial c}{\partial t} + v\,\frac{\partial c}{\partial z} = -kc$$
Assuming v is constant, obtain the solution given the conditions: c(z, 0) =
c0 (z) and c(0, t) = cin (t)
E10.4. Consider the advection equation with variable coefficients given by
u
u
+ v(t)
=0
t
z
Obtain the solution given the conditions u(z, 0) = u0 (z) and u(0, t) =
uin (t) for a) v(t) = 2 e2t and b) v(t) = 2 + sin(2t)/2.
E10.5. Find the solution to the dynamic, 2D advection problem given by
u
u
u
+ vx
+ vy
=0
t
x
y
subject to the conditions:
= uin (t)
u(x, y, 0) = u0 (x, y) and u(x, y, t)
y=x
for the domains t > 0 and x + y > 0, where vx and vy are positive constants.
Check your solutions by making sure they satisfy the PDE and the boundary
and initial conditions.
c(0, y, t) = f (y, t)
c(x, 0, t) = g(x, t)
for the domains t > 0, x > 0, and y > 0, where vx (t) = 1 et/5 and
vy (t) = 2. Check your solutions by making sure they satisfy the PDE and the
boundary and initial conditions. (A closed solution can be written in terms
of the Lambert W-function [cf. (6.34)]. However, a set of implicit equations
is often regarded as a sufficient form of the solution.)
E10.7. Consider the 3D advection equation with a generation term R(c) given by
c
+ v (x, y, z) (c) = R(c)
t
with the initial condition c(x, y, z, 0) = f (x, y, z). For the special case where
R(c) = k0 c2 and
x
vx
vy = A y
vz
z
obtain the solution c(x, y, z, t) in terms of f and k0 . (Check your solution by
substituting it into the differential equation and the initial conditions.)
E10.8. An n th -order ordinary differential equation is said to admit a similarity
transformation &
x = x and &
y = y (cf. sections 6.2 and 6.4.2 ) if
&
y(n) = f &
y(n) = f x, y, . . . , y(n1)
x,&
y, . . . ,&
y(n1)
where y(k) = dk y/dxk .
1. Using the conditions for similarity, one obtains
n y(n) = f x, y, . . . , n+1 y(n1)
Show that differentiation with respect to followed by the setting of
= 1 will yield a linear first-order partial differential equation given by
x
f
f
f
+ y + + ( n + 1) y(n1) (n1) = ( n) f
x
y
y
(10.89)
v j +1 ( j ) v j
v1 u
dvn1
du
for j = 1, . . . , n 2
which means a reduction of one order from the original ordinary differential equation. (Assuming the procedure can be recursively reduced in
which the set of equations also admits a similarity transformation, it will
end with a separable first-order ordinary differential equation solvable
by quadratures.)
E10.9. Consider the following quasilinear equation:
u
u
+ (1 + u)2 x
=0
t
x
subject to
u(x, 0) = u0 (x)
such that u0 (x) is continuous.
1. Solve for u(x, t) for u0 (x) = tanh(x). Plot the solution for 5 x 5 at
t = 1,5 and 10.
2. Show that as long as duo /dx > 0 and u0 > 1, the solution will not contain
a shock. (Hint: Show that under this condition, x/a = 0 for all t where
a is parameter for x when t = 0; i.e., there will be no break times, and the
characteristics will not intersect).
E10.10. Consider the following quasilinear equation:
u
u
+ (2u + 1)
= 4u + 1
t
x
subject to
u(x, 0) = h(x)
such that h(x) is continuous.
1. Solve for u(x, t) for h(x) = 2 tanh(x) (Note: The solution may need to
be in implicit form). Plot the solution for 5 x 5 at t = 0.01, 0.05,
and 0.1.
2. Show that as long as dh/dx > 4, the solution will not contain a shock.
(Hint: Show that under this condition, x/a = 0 for all t where a is
parameter for x when t = 0; i.e., there will be no break times, and the
characteristics will not intersect).
E10.11. Obtain the general solution for the differential equation given by
x y u
u
+
= ku
x
x + y y
for x 0.
E10.12. Consider the following nonlinear first-order differential equation:
u 2
u 2
u 2
x
+ y
+ z
= c2
x
y
z
Obtain the Lagrange-Charpit conditions and find the complete solution.
E10.13. When the nonlinear first-order partial differential equation does not explicitly contain u and when u/t can be separated as a linear term, we obtain
the equation known as the Hamilton-Jacobi equation:
F (t, x1 , . . . , xn , ut , ux1 , . . . , uxn ) = ut + H (t, x1 , . . . , xn , ux1 , . . . , uxn ) = 0
where ut = u/t and uxk = u/xk . This equation is very useful in classical mechanics where H is known as the Hamiltonian, whereas its solution
H
uxk
duxk
dt
H
xk
(10.90)
(10.91)
d
sin
d
d
cos
(u ()) sin
d
(u ()) cos +
E10.15. Use (10.62) to obtain the complete solution of the following equation:
2 2 2
u
u
u
+
+
= ku
x
y
z
E10.16. Classify the following second-order linear equation for y = 0 and y = x,
(x y)
2u
u
2u
+
2x
y
+
+
y)
=0
(x
x2
x
y2
Find the variables and that would reduce the principal part to u .
E10.17. For the linear homogeneous partial differential equation given by
2u
2u
2u
+
+
b)
+
=0
(a
(ab)
y2
xy
x2
where a and b are real numbers, show that this is a hyperbolic equation and
obtain the alternate canonical form given by u = 0. Obtain the general
solution.
E10.18. Derive the solution u(x, t) of the system of n first-order partial differential
equations given by
u
u
+A
=b
t
x
subject to u(x, 0) = u0 (x), assuming that A and b are constants and the
eigenvalues k , k = 1, . . . , n, of AT are distinct and real with corresponding
eigenvector gk . Also, show that the solution will satisfy the initial conditions
and the system of partial differential equations.
E10.19. From the continuity equation (5.24) and equation of motion (5.25), we can
obtain the inviscid gas dynamics in a tube flowing only in the x-direction by
assuming viscosity = 0 and neglecting gravitational effects due to g,
(v)
+
t
x
v
v
p
+v
+
t
x
x
where and v are the gas density and velocity, respectively. Assume
that
!
the
pressure
is
given
by
p
=
k
where
k
>
0
and
>
1.
Let
=
dp/d
=
!
(1)/2
k
. Show that the Riemann invariants for the system are given by
R1 =
2
+ v = constant
1
along
dx
=v+
dt
R2 =
2
v = constant
1
along
dx
=v
dt
+
u(x, 0) =
uleft
uright
if x a
if x > a
where b uleft < b uright , show that the solution given by
left
u
if x b uleft t + a
xa
b1
if b uleft t + a < x b uright t + a
u(x, t) =
right
u
if x > b uright t + a
is indeed piecewise continuous. Furthermore, verify this to be the case for
a = 4, b(u) = u2 , uleft = 2.5, and uright = 3 at t = 10 and t = 50.
E10.21. An idealized model for traffic flow is given by
u
2u
u
+ 1
a
=0
t
b
x
where u is the car density, a is the maximum car velocity (e.g., at zero car
density), and b is the maximum car density when velocity is zero (e.g., at the
red light).
1. Suppose the red light at x = 0 occurs at t = 0. Note that u, a, and b are
all positive values. Thus let the initial condition be given by
+
c
for x 0
u(x, 0) =
b for x > 0
where c < b. Obtain the shock solution, using the Rankine-Hugoniot
jump condition (J.18) to determine the shock path.
2. Suppose the green light occurred after all the cars have been stopped at
x 0. So this time, assume the initial condition is given by
+
b for x 0
u(x, 0) =
0 for x > 0
Obtain the rarefaction solution.
E10.22. Consider the Riemann problem involving the inviscid Burger equation given
by
0 if x 0
u
u
+u
=0
subject to
u(x, 0) =
1 if 0 < x 1
t
x
0 if x > 1
This is an example in which the left discontinuity will involve a rarefaction,
whereas the right discontinuity propagates a shock.
**
2
t=0
x=0
x=1
s2 (x, t) = x 2t
Thus obtain the solution u(x, t) for t 0.
11
In this chapter, we focus on the case of linear partial differential equations. In general,
we consider a partial differential equation to be linear if the partial derivatives
together with their coefficients can be represented by an operator L such that it
satisfies the property that L (u + v) = Lu + Lv, where and are constants,
whereas u and v are two functions of the same set of independent variables. This
linearity property allows for superposition of basis solutions to fit the boundary and
initial conditions.
We limit our discussion in this chapter to three approaches: operator factorization of reducible linear operators, separation of variables, and similarity transformations. Another set of solution methods, known as the integral transform approach, is discussed in Chapter 12.
The method using operator factorization is described in Section 11.2, and it is applicable to a class of partial differential equations known as reducible differential equations. We show how this approach can be applied to the one-dimensional wave equation to yield the well-known d'Alembert solutions. In Section K.1 of the appendix, we consider how the d'Alembert solutions can be applied and modified to
fit different boundary conditions, including infinite, semi-infinite, and finite spatial
domains.
The separation of variables method is described in Section 11.3. It may not be
as general at first glance, but it is an important and powerful approach because
it has yielded useful solutions to several important linear problems in science and
engineering. These include the linear diffusion, wave, and elliptic problems. The
solutions based on this approach often involve an infinite series of basis functions
that would satisfy the partial differential equations, which is possible because of the
linearity property of the differential operators. In the process of fitting the series
to satisfy initial conditions and boundary conditions, orthogonality properties of
the basis functions are needed. To this end, the theory of Sturm-Liouville systems
can be used to find the components needed to obtain the orthogonality properties.
A brief discussion of theory of Sturm-Liouville systems is given in Section 11.3.3.
However, because Sturm-Liouville systems involve only homogeneous boundary
conditions, additional steps are needed to handle both nonhomogeneous differential
equations and nonhomogeneous boundary conditions. Fortunately, again due to the
linearity properties of the differential operation, we could employ an approach that
splits the required solutions to allow the original problem to be divided into one or
more additional problems, some of which are homogeneous differential equations,
whereas others have homogeneous boundary conditions. These function-splitting
techniques are given in Section 11.4. For the case of nonhomogeneous differential
equations with homogeneous boundary conditions, a generalization of the separation
of variables technique, known as the eigenfunction expansion approach, is described
in Section 11.4.2 and can be used to solve the nonhomogeneous problem.
The third approach, known as the method of similarity transformation (also
known as combination of variables), is described in Section 11.5. It is a special
case of techniques known as symmetry transformation methods. It can be applied
to equations involving two independent variables to yield an ordinary differential
equation based on an independent variable that is a combination of the original pair
of independent variables. If the number of independent variables were more than
two, and if the approach were applicable, it could be used to reduce the number of
independent variables by one.1 However, it has been deemed by several textbooks to be of limited application, because the boundary conditions may not allow for similarity transformations even when they apply to the differential equations. Nonetheless, it has been used to derive important solutions, such as that for diffusion equations with semi-infinite domains. More importantly, the similarity transformation approach, and even more so the general symmetry transformation approach, has been applied to several nonlinear differential equations.
The general linear $m$th-order partial differential equation in $n$ independent variables can be written as

$$\sum_{1 \le i_1 \le \cdots \le i_m \le n} \alpha_{i_1\ldots i_m}(\mathbf{x})\,\partial_{i_1\ldots i_m} + \sum_{1 \le i_1 \le \cdots \le i_{m-1} \le n} \beta_{i_1\ldots i_{m-1}}(\mathbf{x})\,\partial_{i_1\ldots i_{m-1}} + \cdots + \sum_{1 \le i \le n} \gamma_i(\mathbf{x})\,\partial_i + \lambda(\mathbf{x})\,u = f(\mathbf{x}) \tag{11.1}$$

where

$$\partial_{i_1\ldots i_m} = \frac{\partial^m u}{\partial x_{i_1}\cdots \partial x_{i_m}}, \qquad 1 \le i_1, \ldots, i_m \le n \tag{11.2}$$
Similar principles of similarity transformations methods are discussed in Section 6.2 and Section 6.4.2
for the case of ordinary differential equations. In those cases, first-order differential equations are
converted to separable variable types, whereas for second-order or higher order ordinary differential
equations, the approach may be used to reduce the order by one.
It is convenient to collect the derivative terms of (11.1) into a linear operator,

$$L = \sum_{1 \le i_1 \le \cdots \le i_m \le n} \alpha_{i_1\ldots i_m}(\mathbf{x})\,\partial_{i_1\ldots i_m} + \sum_{1 \le i_1 \le \cdots \le i_{m-1} \le n} \beta_{i_1\ldots i_{m-1}}(\mathbf{x})\,\partial_{i_1\ldots i_{m-1}} + \cdots + \sum_{1 \le i \le n} \gamma_i(\mathbf{x})\,\frac{\partial}{\partial x_i} + \lambda(\mathbf{x}) \tag{11.3}$$

which satisfies the linearity property

$$L\left(\alpha u + \beta v\right) = \alpha\,L u + \beta\,L v \tag{11.4}$$

The nonhomogeneous equation is then written compactly as $L u = f(\mathbf{x})$ (11.5), and the corresponding homogeneous equation is

$$L\,u = 0 \tag{11.6}$$

The goal is to obtain a solution of the differential equation
while allowing enough arbitrary functions (or infinite arbitrary constants, as in the
case of Fourier series solutions) for the satisfaction of initial or boundary conditions.
Any solution that satisfies (11.6) is known as the homogeneous solution. Suppose
uhomog,1 and uhomog,2 are two different homogeneous solutions, then due to (11.4), a
linear combination of both solutions will also be a homogeneous solution, that is,
L uhomog,1 + uhomog,2 = L uhomog,1 + L uhomog,2 = 0
(11.7)
Equation (11.7) can be extended to any number of different homogeneous solutions,
including cases in which there are an infinite number of homogeneous solutions.
This process is referred to as superposition of homogeneous solutions. By including
a sufficient number of homogeneous solutions, the complementary homogeneous
solution is given by a linear combination
ucomp =
N
i uhomog,i
(11.8)
i=1
N
i uhomog,i
(11.9)
i=1
Solutions containing arbitrary functions, instead of arbitrary constants, are considered general
solutions. However, complete solutions and general solutions are often interchangeable.
There are three main types of boundary conditions for partial differential
equations:
1. Dirichlet conditions (also known as essential boundary conditions). These are the conditions in which values of the dependent variables are specified at regions of the boundary. For instance, in a 2D rectangular domain Ω: (x, y) ∈ [0, 1] × [0, 1], setting u(x = 0, y) = 10 is a Dirichlet condition.
2. Neumann conditions. These are conditions in which values of the derivatives of the dependent variables are specified at regions of the boundary. For instance, ∂u/∂y(x = 1, y) = f(y) is a Neumann condition.
3. Robin conditions (also known as mixed boundary conditions). These are conditions in which a relationship (often a linear combination) of both the derivatives and the values of the dependent variables is specified at regions of the boundary. For instance, ∂u/∂y(x = 0, y) + u(x = 0, y) = g(y) is a Robin condition.
Both Neumann and Robin conditions often result from the consideration of specifying the flux behavior at portions of the boundary, because fluxes are usually functions
of gradients. For instance, a Neumann condition set to zero is used to specify zero
flux. Both Neumann and Robin conditions are also known as natural boundary
conditions.
THEOREM 11.1.
i=1
where
Li = f 0,i (x) +
n
q=1
f q,i (x)
xq
and
Li Lj = Lj Li
i = j
(11.11)
m
i ui
i=1
(11.12)
j g j (x) ui where Li ui = 0
(11.13)
(Li )k vi = 0 vi =
j =1
where j are arbitrary constants and g j are functions such that Lki g j = 0.
PROOF.
n
c
q=1
xq
(11.14)
where n is the number of independent variables, then the functions g j in (11.13) can
be set up as
p
where
n
i=1
g j = x1 1 x2 2 xnp n
(11.15)
p i k.
EXAMPLE 11.1. Consider the reducible equation $L u = L_1 L_2 u = 0$ with

$$L_1 = \frac{\partial}{\partial x} + \frac{\partial}{\partial y} \qquad \text{and} \qquad L_2 = \frac{\partial}{\partial x} + \frac{\partial}{\partial y} + 3$$

The solutions of $L_1 u_1 = 0$ and $L_2 u_2 = 0$ are

$$u_1 = \phi(x - y) \qquad \text{and} \qquad u_2 = e^{-3x}\,\psi(x - y)$$

so the general solution is $u = u_1 + u_2 = \phi(x - y) + e^{-3x}\psi(x - y)$, where $\phi$ and $\psi$ are arbitrary functions.
As another example, consider the equation with a repeated factor,

$$\left(3\frac{\partial}{\partial x} + 2\frac{\partial}{\partial y} + 1\right)^3 u = 0$$

The general solution of

$$\left(3\frac{\partial}{\partial x} + 2\frac{\partial}{\partial y} + 1\right) w = 0$$

via the method of characteristics is

$$w(x, y) = e^{-x/3}\,\phi(2x - 3y)$$

where $\phi(\cdot)$ is a general function. Because the operator $L = \left(3\,\partial_x + 2\,\partial_y + 1\right)$ is repeated $k = 3$ times, we can use (11.15) to set

$$g(x, y) = \alpha_1 + \alpha_2 x + \alpha_3 y + \alpha_4 x^2 + \alpha_5 xy + \alpha_6 y^2$$

Combining, the complete general solution is given by

$$u(x, y) = \left(\alpha_1 + \alpha_2 x + \alpha_3 y + \alpha_4 x^2 + \alpha_5 xy + \alpha_6 y^2\right)e^{-x/3}\,\phi(2x - 3y)$$
Consider the one-dimensional wave equation

$$\frac{\partial^2 u}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = 0 \tag{11.16}$$

where c is a real constant. It is hyperbolic, homogeneous, and linear, with constant coefficients. Equation (11.16) can be solved easily after transformation to canonical form, as discussed in Section 10.4. Alternatively, we can approach it as a reducible partial differential equation and show that it leads to the same solution.

The linear operator for (11.16) can be factored as

$$L u = L_1 L_2 u = L_2 L_1 u \qquad \text{where} \qquad L_1 = \frac{\partial}{\partial x} - \frac{1}{c}\frac{\partial}{\partial t} \quad \text{and} \quad L_2 = \frac{\partial}{\partial x} + \frac{1}{c}\frac{\partial}{\partial t}$$

For $L_1 u = 0$, the characteristic equations are

$$\frac{dx}{1} = \frac{dt}{-1/c} = \frac{du}{0}$$

and the general solution is given by

$$u_1 = \phi(x + ct)$$

where $\phi$ is an arbitrary function. Similarly, for $L_2 u = 0$, the characteristic equations are

$$\frac{dx}{1} = \frac{dt}{1/c} = \frac{du}{0}$$

and the general solution becomes

$$u_2 = \psi(x - ct)$$

Superposing the two yields the general solution

$$u(x, t) = \phi(x + ct) + \psi(x - ct) \tag{11.17}$$
For the case in which $-\infty < x < \infty$ and initial conditions given by

$$u(x, 0) = f(x) \qquad \text{and} \qquad \frac{\partial u}{\partial t}(x, 0) = g(x) \tag{11.18}$$

(11.17) reduces to

$$u(x, t) = \frac{1}{2}\left[f(x - ct) + f(x + ct)\right] + \frac{1}{2c}\int_{x - ct}^{x + ct} g(\zeta)\,d\zeta \tag{11.19}$$

This result is known as the d'Alembert solution. The details for this solution, as well as its extensions to cases with semi-infinite domain and nonhomogeneous Neumann conditions, are given in Section K.1 as an appendix.
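As a small numerical illustration of (11.19), the following sketch evaluates the d'Alembert solution by quadrature; the wave speed and the initial data f and g are illustrative choices, not taken from the text:

```python
import numpy as np
from scipy.integrate import quad

c = 2.0
f = lambda x: np.exp(-x**2)     # initial displacement u(x, 0)
g = lambda x: 0.0               # initial velocity du/dt(x, 0)

def dalembert(x, t):
    travel = 0.5 * (f(x - c * t) + f(x + c * t))
    integral, _ = quad(g, x - c * t, x + c * t)
    return travel + integral / (2.0 * c)

print(dalembert(0.5, 0.0))      # reproduces f(0.5) at t = 0
print(dalembert(0.5, 1.0))
```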
(11.20)
1 dT
T dt
1 dR2
1 dR
1 d2
1 d2 Z
+
+ 2
+
2
2
R dr
rR dr
r d
Z dz2
Because the left-hand side of the equation involves only the variable t, whereas
the other side does not involve t, both sides must be equal to a constant, say 1 ,
that is,
dT
1 dT
= a2
G1 t, T,
= 1
dt
T dt
and
1 dR2
1 dR
1 d2
1 d2 Z
+
+
+
= 1
R dr2
rR dr
r2 d2
Z dz2
r2 dR2
r dR
+
2 r 2
2
R dr
R dr
=
1 d2
d2
r
2
dr dr
R dr2
R dr
and
d d2
1 d2
G4 , ,
, 2 =
= 3
d d
d2
1 d2 Z
Z dz2
1 2
r2 dR2
r dR
+
2 r 2
R dr2
R dr
a2
1 d2
= 3
d2
and conclude that the partial differential equation is indeed separable.
As may be expected, not all linear differential equations are separable. Sometimes, a change of coordinates is needed to attain separability. However, one case is always separable,

$$\sum_{i=1}^{n}\sum_{j=0}^{m} \alpha_{ij}(x_i)\,\frac{\partial^j u}{\partial x_i^j} + \lambda\,u = 0 \tag{11.23}$$

where $\alpha_{ij}$ is only a function of $x_i$. The distinguishing feature of (11.23) is that each coefficient is only dependent on the corresponding $x_i$. After substituting $u = \prod_{\ell=1}^{n} X_\ell(x_\ell)$ and then dividing again by $u$,

$$\sum_{i=1}^{n}\sum_{j=0}^{m} \alpha_{ij}(x_i)\,\frac{1}{X_i}\frac{d^j X_i}{dx_i^j} + \lambda = 0$$
resulting in

$$\sum_{j=0}^{m} \alpha_{ij}(x_i)\,\frac{1}{X_i}\frac{d^j X_i}{dx_i^j} = \kappa_i, \qquad i = 1, 2, \ldots, n \tag{11.24}$$

where $\displaystyle\sum_{i=1}^{n}\kappa_i + \lambda = 0$.
As an illustration, consider the one-dimensional diffusion equation

$$\alpha\,\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t} \tag{11.25}$$

subject to the homogeneous boundary conditions $u(0, t) = u(L, t) = 0$ and the initial condition $u(x, 0) = f(x)$. Substituting $u = X(x)\,T(t)$ and dividing by $u$ gives

$$\frac{1}{X}\frac{d^2 X}{dx^2} = \frac{1}{\alpha T}\frac{dT}{dt} = \lambda \tag{11.26}$$

whose solutions are

$$X = A\,e^{\sqrt{\lambda}\,x} + B\,e^{-\sqrt{\lambda}\,x} \qquad \text{and} \qquad T = C\,e^{\alpha\lambda t} \tag{11.27}$$

where A, B, and C are arbitrary constants. At this point, constraints from the initial and boundary conditions, as well as physical insights, can be used to move the solution forward. For u to remain bounded, we need $\lambda < 0$. As a consequence, setting $\lambda = -\kappa^2$, the product solution can be written as

$$u = e^{-\kappa^2\alpha t}\left(\sigma\sin(\kappa x) + \rho\cos(\kappa x)\right) \tag{11.28}$$

Applying the boundary condition at $x = 0$,

$$u(0, t) = 0 = e^{-\kappa^2\alpha t}\left(\sigma\sin(0) + \rho\cos(0)\right) = e^{-\kappa^2\alpha t}\,\rho \quad\Rightarrow\quad \rho = 0$$

while the boundary condition at $x = L$ requires

$$u(L, t) = 0 = e^{-\kappa^2\alpha t}\,\sigma\sin(\kappa L) \quad\Rightarrow\quad \sin(\kappa L) = 0 \quad\Rightarrow\quad \kappa = \frac{n\pi}{L}\,;\;\; n = \ldots, -2, -1, 0, 1, 2, \ldots$$

Superposing the nontrivial solutions, with $\kappa_n = n\pi/L$,

$$u = \sum_{n=1}^{\infty} u_n = \sum_{n=1}^{\infty}\theta_n\,e^{-(\kappa_n)^2\alpha t}\sin\left(\kappa_n x\right) \tag{11.29}$$

The initial condition then requires

$$f(x) = \sum_{n=1}^{\infty}\theta_n\sin\left(\frac{n\pi}{L}x\right) \tag{11.30}$$

To evaluate the coefficients, we use the orthogonality property

$$\int_0^L \sin\left(\frac{n\pi}{L}x\right)\sin\left(\frac{m\pi}{L}x\right)dx = \begin{cases} L/2 & \text{if } m = n \\ 0 & \text{if } m \ne n \end{cases} \tag{11.31}$$

We can multiply (11.30) by $\sin(m\pi x/L)$ and integrate from 0 to L. Using (11.31), we get

$$\int_0^L f(x)\sin\left(\frac{m\pi}{L}x\right)dx = \int_0^L \sin\left(\frac{m\pi}{L}x\right)\sum_{n=1}^{\infty}\theta_n\sin\left(\frac{n\pi}{L}x\right)dx = \theta_m\,\frac{L}{2}$$

or

$$\theta_m = \frac{2}{L}\int_0^L f(x)\sin\left(\frac{m\pi}{L}x\right)dx \tag{11.32}$$
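The coefficient formula (11.32) and the series (11.29) are straightforward to evaluate numerically. The sketch below uses quadrature for the coefficients and a truncated sum for the solution; L, α, N, and the initial profile f are illustrative choices:

```python
import numpy as np
from scipy.integrate import quad

L, alpha, N = 10.0, 1.0, 20
f = lambda x: np.sin(np.pi * x / L) + 0.5 * np.sin(3 * np.pi * x / L)   # u(x, 0)

def theta(m):
    integrand = lambda x: f(x) * np.sin(m * np.pi * x / L)
    return (2.0 / L) * quad(integrand, 0.0, L)[0]

coeffs = np.array([theta(m) for m in range(1, N + 1)])
print(coeffs[:4])                # approximately [1, 0, 0.5, 0] for this f

def u(x, t):
    k = np.pi * np.arange(1, N + 1) / L
    return np.sum(coeffs * np.sin(k * x) * np.exp(-alpha * k**2 * t))

print(u(5.0, 0.0), f(5.0))       # the truncated series reproduces f at t = 0
```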
1
2
3
4
1.2090 100
3.3307 1016
2.4586 101
9.7143 1017
5
6
7
8
EXAMPLE 11.4.
m
4.2028 107
9.4369 1016
9.8859 102
5.9674 1016
subject to:
2u
x2
u
t
u(0, t)
u(10, t) = 0
u(x, 0)
where
c
(1 + tanh(ax + b))
4
Based on (11.32), we can evaluate the coefficients of m to be
10
m
m = 0.2
sin
(5,5,1,x) + (5,45,1,x) dx
10
0
(a,b,c,x) =
and
u(L, t) = U L
bounded
where U o and U L are constants that are not both zero. Applying these conditions
to (11.28),
u(0, t) = U o
u(L, t) = U L
2 t
sin
e
L + cos
L
[Figure: partial-sum approximations of the initial condition u(x, 0) on 0 ≤ x ≤ 10, using M = 10, 20, 40, and 50 terms of the sine series.]
M
m=1
(11.33)
where uhBC and unonhBC both satisfy the partial differential equation, but uhBC
satisfies the homogeneous boundary conditions:
uhBC (0, t) = 0 = uhBC (L, t)
and
(11.34)
(11.35)
The solution of uhBC has already been discussed in the previous case, except that
now it depends on an additional function g(x).
[Figure: surface plot of u(x, t) versus position and time.]
We address the solution of $u_{\text{nonhBC}}$ first. The solution of $u_{\text{nonhBC}}$ can take on many possibilities. The simplest approach is to set it independent of t, which implies $u_{\text{nonhBC}} = g(x)$. Substituting $u_{\text{nonhBC}} = g(x)$ into the diffusion equation,

$$\alpha\,\frac{d^2 g}{dx^2} = 0 \quad\Rightarrow\quad g(x) = ax + b \tag{11.37}$$

Applying the boundary conditions $g(0) = U_o$ and $g(L) = U_L$ fixes the constants a and b, and combining with the series solution for $u_{\text{hBC}}$ gives

$$u = \left(U_L - U_o\right)\frac{x}{L} + U_o + \sum_{n=1}^{\infty}\theta_n\,e^{-(\kappa_n)^2\alpha t}\sin\left(\kappa_n x\right) \tag{11.39}$$
EXAMPLE 11.5. Consider the diffusion equation with $\alpha = 1$,

$$\alpha\,\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t}$$

subject to the conditions: $u(0, t) = 1$, $u(10, t) = 2$ and $u(x, 0) = 1 + \tanh(x)$. Then

$$u_{\text{nonhBC}} = \frac{x}{10} + 1 \qquad \text{and} \qquad u_{\text{hBC}} = \sum_{n=1}^{\infty}\theta_n\,e^{-(\kappa_n)^2 t}\sin\left(\kappa_n x\right)$$
The first eight coefficients are:

n     θ_n              n     θ_n
1     6.1150 × 10⁻¹    5     5.3671 × 10⁻²
2     2.7196 × 10⁻¹    6     3.2618 × 10⁻²
3     1.5077 × 10⁻¹    7     1.9879 × 10⁻²
4     8.8998 × 10⁻²    8     1.2129 × 10⁻²
The methods discussed thus far are for cases with constant nonhomogeneous
boundary conditions. For the more general situations in which nonhomogeneous
terms are not constant, homogenization of the boundary conditions will make the
differential equations nonhomogeneous. For the treatment of nonhomogeneous differential equations, refer to Section 11.4 for the method of eigenfunction expansion.
The potential (Laplace) equation is given by

$$\nabla^2 u = 0 \tag{11.40}$$

regardless of the coordinate system used. Functions that satisfy (11.40) are also known as harmonic functions. Using the divergence theorem, several properties can be found for harmonic functions in bounded regions, such as mean-value properties and maximum principles. Our focus is on solving the potential equation (11.40) in a circle of radius $r_{\max} < \infty$ by using the method of separation of variables.

Consider the potential equation in polar coordinates r and θ,

$$\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u}{\partial \theta^2} = 0 \tag{11.41}$$
[Figure: truncated-series approximations on 0 ≤ x ≤ 10 using N = 3, 5, 10, and 15 terms.]
[Figure 11.4. Surface plot of the trajectory of the u distribution for Example 11.5.]
Substituting $u = R(r)\,\Theta(\theta)$ into (11.41) and dividing by u, the following separated equations can be obtained,

$$\frac{1}{R}\left(r^2\frac{d^2 R}{dr^2} + r\frac{dR}{dr}\right) = -\frac{1}{\Theta}\frac{d^2\Theta}{d\theta^2} = \lambda$$

The first equation is a Euler-Cauchy equation (cf. Section 6.4.3) whose solution is given by

$$R = a\,r^{\sqrt{\lambda}} + b\,r^{-\sqrt{\lambda}} \tag{11.42}$$

whereas the second yields

$$\Theta = c\,e^{i\sqrt{\lambda}\,\theta} + d\,e^{-i\sqrt{\lambda}\,\theta} \tag{11.43}$$

For Θ to be periodic with period 2π we need $\sqrt{\lambda} = n$ to be an integer, and for u to remain bounded at the origin we need $b = 0$. Superposing the admissible product solutions,

$$u = \sum_{n=0}^{\infty} u_n(r, \theta) = \sum_{n=0}^{\infty}\left(\frac{r}{r_{\max}}\right)^n\left(\alpha_n\cos(n\theta) + \beta_n\sin(n\theta)\right) \tag{11.44}$$
[Figure: surface plot of the solution u(r, θ).]
EXAMPLE 11.6. Suppose the boundary condition for (11.44) is $u(r_{\max}, \theta) = f(\theta)$. Evaluating the coefficients $\alpha_n$ and $\beta_n$ by orthogonality of the sines and cosines and substituting them back into (11.44) gives

$$u(r, \theta) = \frac{1}{2\pi}\int_0^{2\pi} f(\phi)\left[1 + 2\sum_{n=1}^{\infty}\left(\frac{r}{r_{\max}}\right)^n\left(\cos(n\theta)\cos(n\phi) + \sin(n\theta)\sin(n\phi)\right)\right]d\phi$$

$$= \frac{1}{2\pi}\int_0^{2\pi} f(\phi)\left[1 + 2\sum_{n=1}^{\infty}\left(\frac{r}{r_{\max}}\right)^n\cos\left(n(\theta - \phi)\right)\right]d\phi$$

$$= \frac{1}{2\pi}\int_0^{2\pi} f(\phi)\left[1 + \sum_{n=1}^{\infty}\left(\Omega_{(-)}^{\,n} + \Omega_{(+)}^{\,n}\right)\right]d\phi \tag{11.45}$$

where the last line was obtained after using Euler's identity for cosine, with $\Omega_{(-)}$ and $\Omega_{(+)}$ defined by

$$\Omega_{(-)} = \frac{r}{r_{\max}}\,e^{-i(\theta - \phi)} \qquad \text{and} \qquad \Omega_{(+)} = \frac{r}{r_{\max}}\,e^{+i(\theta - \phi)}$$
Note that

$$\Omega_{(+)} + \Omega_{(-)} = 2\,\frac{r}{r_{\max}}\cos(\theta - \phi) \qquad \text{and} \qquad \Omega_{(+)}\,\Omega_{(-)} = \left(\frac{r}{r_{\max}}\right)^2$$

and, because $\left|\Omega_{(\pm)}\right| < 1$ inside the circle, the geometric series $\sum_{n=1}^{\infty}\Omega^n = \Omega/(1 - \Omega)$ can be used. Thus

$$u(r, \theta) = \frac{1}{2\pi}\int_0^{2\pi} f(\phi)\left[\frac{1}{1 - \Omega_{(-)}} + \frac{1}{1 - \Omega_{(+)}} - 1\right]d\phi$$

$$= \frac{1}{2\pi}\int_0^{2\pi} f(\phi)\left[\frac{1 - \Omega_{(-)}\Omega_{(+)}}{1 - \left(\Omega_{(+)} + \Omega_{(-)}\right) + \Omega_{(-)}\Omega_{(+)}}\right]d\phi$$

$$= \frac{1}{2\pi}\int_0^{2\pi} f(\phi)\,K_{\mathrm{Poisson}}(\varrho, \theta, \phi)\,d\phi \tag{11.46}$$

where $\varrho = r/r_{\max}$ is the normalized radius and $K_{\mathrm{Poisson}}(\varrho, \theta, \phi)$ is known as the Poisson kernel given by

$$K_{\mathrm{Poisson}}\left(\varrho, \theta, \phi\right) = \frac{1 - \varrho^2}{1 - 2\varrho\cos(\theta - \phi) + \varrho^2} \tag{11.47}$$

Equation (11.46) is known as the Poisson integral equation. One advantage of the form given in (11.46) is that the solution is based on a single integration instead of an infinite series.
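The single integration in (11.46) is convenient numerically as well. The following sketch evaluates the Poisson integral by quadrature for an illustrative boundary function f(φ):

```python
import numpy as np
from scipy.integrate import quad

r_max = 1.0
f = lambda phi: 1.0 + np.cos(phi)**2            # boundary data u(r_max, phi)

def poisson_kernel(rho, theta, phi):
    return (1.0 - rho**2) / (1.0 - 2.0 * rho * np.cos(theta - phi) + rho**2)

def u(r, theta):
    rho = r / r_max
    integrand = lambda phi: f(phi) * poisson_kernel(rho, theta, phi)
    return quad(integrand, 0.0, 2.0 * np.pi)[0] / (2.0 * np.pi)

print(u(0.0, 0.0))               # 1.5, the average of f over the boundary
print(u(0.9, np.pi / 3))         # tends toward f(pi/3) = 1.25 as r -> r_max
```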
A set of functions $\{\phi_n(x)\}$ is said to be orthogonal with respect to a weighting function $r(x)$ in the interval $(A, B)$ if

$$\int_A^B r(x)\,\phi_m(x)\,\phi_n(x)\,dx \;\begin{cases} = 0 & \text{if } m \ne n \\ \ne 0 & \text{if } m = n \end{cases} \tag{11.48}$$
For r(x) = 1, the set of functions is classified as simply orthogonal. Furthermore, if

$$\int_A^B r(x)\,\phi_n^2(x)\,dx = 1$$

for all n, then we say that the set of functions is orthonormal.

This means that if the set $\{\phi_k\}$ turns out to be orthogonal with respect to r(x) in the interval (A, B), we could find the $\ell$th coefficient, $a_\ell$, in the following series

$$f(x) = \sum_{k=0}^{\infty} a_k\,\phi_k(x) \tag{11.49}$$

by first multiplying both sides of (11.49) by $r(x)\,\phi_\ell(x)$ and then integrating from A to B,

$$\int_A^B r(x)\,\phi_\ell(x)\,f(x)\,dx = \sum_{k=0}^{\infty} a_k\int_A^B r(x)\,\phi_\ell(x)\,\phi_k(x)\,dx = a_\ell\int_A^B r(x)\,\phi_\ell^2(x)\,dx$$

or

$$a_\ell = \frac{\displaystyle\int_A^B r(x)\,\phi_\ell(x)\,f(x)\,dx}{\displaystyle\int_A^B r(x)\,\phi_\ell^2(x)\,dx} \tag{11.50}$$

Three items need to be determined: the weighting function r(x) and the limits A and B. The properties of Sturm-Liouville systems described next are useful in helping us obtain these items.
Definition 11.2. Let r(x), p(x), and q(x) be continuous functions; then a Sturm-Liouville system is described by the following differential equation

$$\frac{d}{dx}\left(p(x)\frac{d\phi}{dx}\right) + \left(q(x) + \lambda\,r(x)\right)\phi = 0 \tag{11.51}$$

subject to the following boundary conditions

$$\alpha_A\,\phi(A) + \beta_A\,\frac{d\phi}{dx}\bigg|_{x=A} = 0 \qquad \text{and} \qquad \alpha_B\,\phi(B) + \beta_B\,\frac{d\phi}{dx}\bigg|_{x=B} = 0 \tag{11.52}$$

where $(\alpha_A, \beta_A)$ are not simultaneously zero and $(\alpha_B, \beta_B)$ are not simultaneously zero.

Any homogeneous linear second-order differential equation of the form

$$a_2(x)\frac{d^2\phi}{dx^2} + a_1(x)\frac{d\phi}{dx} + \left(a_0(x) + \lambda\right)\phi = 0 \tag{11.53}$$
where $a_2(x)$ has no zeros in the region of interest, can be recast into the Sturm-Liouville form given in (11.51). To do so, multiply (11.53) by the factor

$$\mu(x) = \exp\left(\int\frac{a_1}{a_2}\,dx\right)$$

and then divide the result by $a_2(x)$. This yields

$$\mu(x)\frac{d^2\phi}{dx^2} + \mu(x)\frac{a_1}{a_2}\frac{d\phi}{dx} + \left(\mu(x)\frac{a_0}{a_2} + \lambda\,\frac{\mu(x)}{a_2}\right)\phi = \frac{d}{dx}\left(\mu(x)\frac{d\phi}{dx}\right) + \left(\mu(x)\frac{a_0(x)}{a_2(x)} + \lambda\,\frac{\mu(x)}{a_2(x)}\right)\phi = 0$$

This means that for the given differential equation in (11.53), we can identify the terms in (11.51) as

$$p(x) = \mu(x) = e^{\int(a_1/a_2)\,dx}, \qquad q(x) = \frac{a_0(x)}{a_2(x)}\,e^{\int(a_1/a_2)\,dx}, \qquad r(x) = \frac{1}{a_2(x)}\,e^{\int(a_1/a_2)\,dx} \tag{11.54}$$
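The identifications in (11.54) can be automated with a computer algebra system. The sketch below (an illustrative Euler-Cauchy choice of a₂, a₁, a₀, not from the text) computes p, q, and the weighting function r symbolically:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
a2, a1, a0 = x**2, x, sp.Integer(0)        # an illustrative Euler-Cauchy operator

mu = sp.exp(sp.integrate(a1 / a2, x))      # integrating factor exp( int a1/a2 dx )
p = sp.simplify(mu)                        # p(x)
q = sp.simplify(a0 / a2 * mu)              # q(x)
r = sp.simplify(mu / a2)                   # r(x), the orthogonality weight

print(p, q, r)                             # x, 0, 1/x for this choice
```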
The following theorem states the important result regarding solutions of SturmLiouville systems, which we use to find the items needed for orthogonality, namely
the weighting function r(x) and the limits A and B:
Let n (x) be the solutions of the Sturm-Liouville system given by
(11.51), (11.52) and (11.52), corresponding to = n . Then the set of solutions
= {0 , 1 , 2 , . . .}, are orthogonal with respect to r(x) within the interval (A, B),
where r(x) is obtained in (11.51), whereas the values of A and B are those given in
boundary conditions (11.52).
THEOREM 11.2.
PROOF.
For the special case where n (x) = ei(nx)/p , with p = B A and r(x) = 1, where
f (x) =
an ei(nx)/p
(11.55)
n=
k=0
d2
d
+ 1 (x)
+ 0 (x) + = 0
2
dx
dx
1
2 (x)
4. Identify points A and B, with A < B, such that $\phi_k$ satisfies the boundary conditions in (11.52) for all k. The interval (A, B) should at least cover the region on which f(x) is being approximated.
5. The coefficients ak can then be obtained by the formula given in (11.50),
B
A
r(x)2k (x)dx
In Example 11.7 that follows, we show a situation using step 2a, whereas in
Example 11.8, we show a situation using step 2b in the context of solving a partial
differential equation.
Suppose we want to find the coefficients that would satisfy the
following equation:
EXAMPLE 11.7.
f (x) =
ak cos k x
k=0
k
sin k x
2 x
d 2 k
dx2
k2
k
cos k x + sin k x
4x
4x x
ak
2.4305 101
4.2365 101
5.6718 101
3.1978 102
3.0976 102
0
1
2
3
4
ak
5.7742 102
3.8581 102
1.5017 102
2.4405 102
1.6804 102
5
6
7
8
9
d 2 k
dk
+2
+ k2 k = 0
2
dx
dx
1
exp
4x
1
1
dx =
2x
4 x
(x 10) /10
f (x) =
(30 x) /10
for x 20
for x 20
and
dk
k
(B) = sin k B = 0
dx
2 B
the needed
with A = 2 9.87 and B = 42 39.4. The interval (A, B)covers
region: (10, 30). It might appear valid to set k (A) = cos k A = 0, which
could yield A = (/2)2 . Unfortunately, doing so will satisfy k (A) = 0 only for
odd integers. Finally, we have the following equations for ak ,
1
f (x) cos k x dx
2
4 x
ak = 2
4
1
cos2 k x dx
4 x
2
42
The first ten values of ak are shown in Table 11.3. Figure 11.6 shows the result of
using the first N number of terms in the series. Around N = 50, the error range
at each x is within 103 . Note that the approximations are based on cosines,
which are even functions, and thus a more complete orthogonal set should have
included sines as well.
Note that we avoided the region x 0, because r(x) is either unbounded or complex-valued in that
region.
[Figure 11.6. Partial-sum approximations of f(x) on 10 ≤ x ≤ 30 using the first N terms of the series, for N = 5, 10, 25, and 50.]
subject to
BC :
u(rmax , , t) = 0
IC : u(r, , 0) = f (r, )
0 2 ; t 0
0 2 ; 0 r rmax
dT
= 1 2 T
dt
= Be 2 + Ce 3
R = DJ 2
1 r + EY2
1 r
2
where J (r) and Y (r) are Bessel functions of r of first and second kind,
respectively, each of order with parameter .
Because we need the final solution to be bounded for all t, 1 must be
negative. So, let 1 = 2 . For to be periodic with period 2, we need 2 to be
a positive integer. So, let 2 = n 2 , where n is an integer. Finally, for the solution
to be bounded for all r, including the origin, we need the coefficient of Yn (r)
to be zero. Applying all these into the solutions, and multiplying the values to
obtain u, we have
()2 t
u=e
J n (r) cos (n) + sin (n)
To satisfy the homogeneous boundary condition at r = rmax for all t 0 and
0 2, we need
J n (rmax ) = 0
Let nm be the mth root of J n (r) and let nm =
complete solution is given by
u=
(nm )2 t
nm
. Applying superposition, the
rmax
J n (nm r) nm cos (n) + nm sin (n)
n=0 m=1
To evaluate the coefficients anm and bnm , we use the initial condition,
J n (nm r) nm cos (n) + nm sin (n)
f (r, ) = u(r, , 0) =
n=0 m=1
and apply orthogonality properties. The orthogonality property for sine was
already given in (11.31). A similar property can be used for cosines. Also, the
orthogonality of cosine and sine functions can be shown by direct integration:
2
n
m
sin
x cos
x dx = 0
L
L
0
For the Bessel functions, we can apply the Sturm-Liouville theorem (Theorem 11.2). We begin with the differential equation for R(r), written in such a
way that nm is the mth eigenvalue,
$$\frac{d^2 R}{dr^2} + \frac{1}{r}\frac{dR}{dr} + \left(\lambda_{nm}^2 - \frac{n^2}{r^2}\right)R = 0$$

From this equation, we can immediately identify the weighting function. To avoid confusion, we use w(r) to denote the weighting function,

$$w(r) = \exp\left(\int\frac{1}{r}\,dr\right) = r$$
Note that the eigenvalues for this equation are indexed by m, not n. The parameters indexed by n are eigenvalues connected with the differential equations
involving . To obtain the limits needed for the orthogonality, recall some properties of Bessel functions with r as the independent variable (cf. Equations (9.60)
and (9.66)):
$$\frac{d}{dr}J_n\left(\lambda_{nm}r\right) = -\lambda_{nm}\,J_{n+1}\left(\lambda_{nm}r\right) + \frac{n}{r}\,J_n\left(\lambda_{nm}r\right) \qquad \text{and} \qquad \lim_{r\to 0} J_n\left(\lambda_{nm}r\right) = 0 \quad \text{for } n > 0$$
This implies that we could set one of the homogeneous boundary condition
needed for the Sturm-Liouville conditions at A = 0 to be
d J 0 (nm 0) = 0 for n = 0
dr
BC 1 :
J
for n > 0
n (nm 0) = 0
For the second homogenous boundary condition needed for Sturm-Liouville
conditions, we set B = rmax and obtain
J n (nm rmax ) = 0
BC 2 :
$$u = \sum_{m=1}^{\infty} k_m\,J_0\left(\lambda_m r\right)e^{-(\lambda_m\alpha)^2 t}$$

where

$$k_m = \frac{\displaystyle\int_0^1 r\,f(r)\,J_0\left(\lambda_m r\right)dr}{\displaystyle\int_0^1 r\,J_0^2\left(\lambda_m r\right)dr}$$
and m = m is the mth root of J 0 (r). The first set of twenty positive roots, m ,
of J 0 (r) is given in Table 11.4, whereas the values of km for m = 1, 2, . . . , 20
are given in Table 11.5. Truncation of the infinite series for u(r, 0) after m = 20
resulted in an approximation of the initial condition f (r) with errors within
104 . Figure 11.7 shows some time-lapse sequence of surface plots of u
distribution.
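In practice, the roots λ_m of J₀ and the coefficients k_m are conveniently obtained numerically. The following sketch uses SciPy's Bessel routines and quadrature; the initial profile f(r) is an illustrative choice, not the one in the example:

```python
import numpy as np
from scipy.special import j0, jn_zeros
from scipy.integrate import quad

M = 20
lam = jn_zeros(0, M)                        # first M positive roots of J_0
f = lambda r: 1.0 - r**2                    # illustrative initial profile f(r)

def k_coeff(m):
    num = quad(lambda r: r * f(r) * j0(lam[m] * r), 0.0, 1.0)[0]
    den = quad(lambda r: r * j0(lam[m] * r)**2, 0.0, 1.0)[0]
    return num / den

coeffs = np.array([k_coeff(m) for m in range(M)])
print(lam[:3])                              # 2.4048, 5.5201, 8.6537 (cf. Table 11.4)
print(coeffs[:3])
```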
In general, the eigenvalues n for the Sturm-Liouville system, (11.51) and (11.52),
may not be directly identifiable in terms of sines, cosines, Bessel functions, Legendre
polynomial, and so forth. Instead, the Sturm-Liouville system must be treated as an
4
It would have been more efficient to use the fact that f (r, ) = f (r) and deduce radial symmetry at
the outset. Doing so, we could set 2 u/2 = 0 and arrive at the same results.
Table 11.4. The first twenty positive roots λ_m of J₀

m    λ_m        m    λ_m        m    λ_m        m    λ_m
1    2.4048     6    18.0711    11   33.7758    16   49.4826
2    5.5201     7    21.2116    12   36.9171    17   52.6241
3    8.6537     8    24.3525    13   40.0584    18   55.7655
4    11.7915    9    27.4935    14   43.1998    19   58.9070
5    14.9309    10   30.6346    15   46.3412    20   62.0485
eigenvalue problem. Just as it was for matrix theory, the eigenvalue problem here is described by

$$L(\phi) = \frac{1}{r(x)}\left[\frac{d}{dx}\left(p(x)\frac{d\phi}{dx}\right) + q(x)\,\phi\right] = -\lambda\,\phi$$

such that φ is not identically zero. Unfortunately, the Sturm-Liouville differential equation may not be easy to solve in general, because the coefficients are often functions of the independent variable x. If the solutions are available, the procedure for evaluating the eigenvalues lies in satisfying the boundary conditions such that the solutions are not trivial; that is, φ is not identically zero. We show the process in the next example. One identity that is often useful during the determination of the eigenvalues is

$$e^{i2n\pi} = \cos(2n\pi) + i\sin(2n\pi) = 1, \qquad n = \ldots, -1, 0, 1, \ldots \tag{11.56}$$

EXAMPLE 11.9.
d2
d
+ a1 x
+ a0 =
dx2
dx
(11.57)
subject to
(x0 ) = 0
and
(xL) = 0
with x0 = 0 and xL = 0.
The differential equation (11.57) is a Euler-Cauchy type and can be reduced
to a differential equation with constant coefficient under a new independent
variable z = ln(x) (cf. Section 6.4.3), that is,
a2
d2
d
+ (a1 a2 )
+ (a0 ) = 0
2
dz
dx
Table 11.5. Values of k_m for m = 1, 2, ..., 20

m     k_m              m     k_m
1     7.7976 × 10⁻¹    11    2.2189 × 10⁻³
2     2.5024 × 10⁻¹    12    1.8426 × 10⁻³
3     5.5448 × 10⁻²    13    1.4560 × 10⁻³
4     3.3734 × 10⁻²    14    1.2403 × 10⁻³
5     1.6083 × 10⁻²    15    1.0151 × 10⁻³
6     1.1263 × 10⁻²    16    8.8139 × 10⁻⁴
7     6.9293 × 10⁻³    17    7.4063 × 10⁻⁴
8     5.2733 × 10⁻³    18    6.5260 × 10⁻⁴
9     3.6803 × 10⁻³    19    5.5986 × 10⁻⁴
10    2.9502 × 10⁻³    20    4.9927 × 10⁻⁴
s2 +
a2 a1
2a2
7
=
2 +
s1
s2
a0
a2
xs01
xs02
xsL1
xsL2
=
For A and B to not be simultaneously zero, we need the matrix involving x0 and
xL to be singular, that is,
0=
xs01 xsL2
xs02 xsL1
= (x0 xL)
x0
xL
8
1
xL
x0
2 9
xL
x0
2
= e2 ln(xL /x0 ) = 1
2 n a0
2n ln(xL/x0 ) = i2n + a
2
n
ln(xL/x0 )
2
n
=
ln(xL/x0 )
= i
= a0 a2 2 +
n
ln(xL/x0 )
2
[Figure 11.7. Time-lapse sequence of surface plots of the u distribution for Example 11.8.]
n
An x x x0 x
=
An x0 x
x0
x0
'
n ln(x/x0 )
n
=
An x0 x exp i
ln(xL/x0 )
(
n ln(x/x0 )
exp i
ln(xL/x0 )
ln(x/x0 )
n
=
An x0 (2i) x sin n
ln(xL/x0 )
After dropping the constant coefficient, the n th eigenfunction is given by
ln(x/x0 )
n = x(a2 a1 )/(2a2 ) sin n
ln(xL/x0 )
Furthermore, because the eigenvalue problem satisfies the conditions of a
Sturm-Liouville system, we have the following orthogonality property:
5
xL
(a1 /a2 )2
x
n m dx = 0 if n = m
= 0 if n = m
x0
an approach similar to the variation of parameters, in which a linear functional combination of the homogeneous solutions are substituted into the nonhomogeneous
equations to obtain the final solution. This approach is known as the method of
eigenfunction expansion.
In the next section, we show a procedure that converts nonhomogeneous boundary conditions to homogeneous boundary conditions. This will, in general, introduce
more nonhomogeneous terms into the differential equation. We limit the discussion
to second-order linear differential equations for u = u(x, t).
(11.58)
2
2
2
+ t +
xt
tt 2 + x
2
x
xt
t
x
t
and
u
a0 u(0, t) + a1
= g 1 (t)
x x=0
and
u
c1
= f 2 (x)
t t=0
u
b0 u(1, t) + b1
= g 2 (t)
x x=1
(11.59)
=
=
f 1 (x) c0 S(x, 0)
S
f 2 (x) c1
t t=0
a0 U(0, t) + a1
=
=
S
x x=0
S
g 2 (t) b0 S(1, t) b1
x x=1
g 1 (t) a0 S(0, t) a1
(11.60)
We want to transform the original problem in u(x, t) to a problem in U(x, t) but with
homogeneous boundary conditions; that is, we need
S
S
a0 S(0, t) + a1
= g 1 (t)
and
b0 S(1, t) + b1
= g 2 (t) (11.61)
x x=0
x x=1
One of the simplest choices for S(x, t) is to take the following form,
S(x, t) = S0 (t) + xS1 (t)
(11.62)
(11.63)
and
U
c1
=&
f 2 (x)
t t=0
and
U
b0 U(1, t) + b1
= 0 (11.66)
x x=1
U
a0 U(0, t) + a1
=0
x x=0
where,
(11.64)
&
h(x, t)
h(x, t) L S0 (t) + xS1 (t)
&
f 1 (x)
f 1 (x) c0 S0 (0) + xS1 (0)
&
f 2 (x)
=
f 2 (x) c1
dS0
dS1
+x
dt
dt t=0
(11.65)
(11.67)
and S0 and S1 are given by (11.63), assuming that a0 (b0 + b1 ) = a1 b0 . Thereafter, the
desired solution for u is given by
u(x, t) = U(x, t) + S0 (t) + xS1 (t)
EXAMPLE 11.10.
(11.68)
subject to the initial condition, u(0, x) = 1.5x2 + x + 0.5, and boundary conditions
u
u
= 0.9
and
2u(t, 1) + 0.5
2u(t, 0) 0.1
= 3.5e3t + 2.5
x
x
x=0
x=1
+ x (x)
+ x (x) + t (t) 2 + t (x) + t (t) (11.69)
2
x
x
t
t
and
U
c1
=&
f 2 (x)
t t=0
and
b0 U(t, 1) + b1
U
=0
x x=0
(11.70)
(11.71)
U
= 0 (11.72)
x x=1
LX (X) = X
where
d2
d
+ t (t) + t (t) and
2
dt
dt
LT
t (t)
LX
x (x)
d2
d
+ x (x)
+ x (x)
dx2
dx
(11.74)
The solutions X n to
x (x)
5
d2 X
dX
+ x (x)
+ x (x)X = n X
2
dx
dx
(11.75)
This is similar to the step of finding the complementary solutions of ordinary differential equations.
dX
dX
=
0
and
b
X
(1)
+
b
= 0 are known as the
0 n
1
dx x=0
dx x=1
eigenfunctions corresponding to = n . Based on Theorem 11.2 and (11.54),
1
= 0 if n = m
r(x)X n (x)X m (x)dx =
(11.76)
=
0
if
n
=
m
subject to a0 X n (0) + a1
where
r(x) =
(x /x )dx
U(t, x)
T n (t)X n (x)
(11.77)
&
hn (t)X n (x)
(11.78)
n=0
&
h(t, x)
n=0
where
&
hn (t) =
r(x)&
h(t, x)X n (x) dx
1
r(x)X 2n (x) dx
0
Lsep
T n (t)X n (x) =
Lsep T n (t)X n (x)
n=0
i=0
T n (t) LX X n (x) + X n (x) LT T n (t)
i=0
&
hn (t)X n (x)
n=0
=
i=0
because LX X n (x) = n X n (x). The terms T n (t) can then be obtained by solving
t (t)
d2 T n
dT n
+
(t)
+
Tn = &
(t)
+
hn (t)
t
t
n
dt2
dt
and
dT n
c1
dt
=
t=0
where &
f 1 (x) and &
f 2 (x) are the functions specified in (11.71).
1
0
(11.79)
r(x)&
f 2 (x)X n (x)dx
1
r(x)X 2n (x)dx
0
EXAMPLE 11.11.
U
2U
=
+ e3t 8.7x2 16.313x + 5.298 2.8
2
t
x
subject to
and
U
2U(t, 0) 0.1
x x=0
U
2U(t, 1) + 0.5
x x=1
and
U
2U(t, 0) 0.1 x
x=0
U
2U(t, 1) + 0.5 x
x=1
= T (t)X(x) and = 2 ,
Using U
T (t) = Ce
and
3t
2
&
h(t, x) = e
8.7x 16.313x + 5.298 2.8 =
h n (t)X n (x)
n=1
where
h n (t) =
1
0
e3t 8.7x2 16.313x + 5.298 2.8 X n (x) dx
1
X 2n (x)dx
0
th
Similarly, with U(t, x) =
n=1 T n (t)X n (x), we generate the n -ordinary differential equations for T n given by
1
U(0, x)X n (x)dx
dT n
hn (t) subject to T n (0) = 0 1
+ 2n T n = &
dt
X 2n (x)dx
0
The explicit calculations can be obtained for this example with the aid of symbolic manipulation software.6
Because the original problem in Example 11.10 is for u(t, x) = U(t, x) +
S(t, x), where
25
7 3t
8
35
S(t, x) =
+x
e
e3t
52 104
13 26
The final solution is given by
u(t, x) = S(t, x) +
T n (t)X n (x)
n=1
A plot of the eigenfunction solution using the series truncated after n = 100
is shown in Figure 11.8. The exact solution we used to generate the original
problem in Example 11.10 is known and is given by
14 2
3
1
4
uexact (t, x) = e3t x2 + x +
+ 1 e3t
x x+
2
2
10
10
Also included in Figure 11.8 is a plot of the errors between the eigenfunction
solution and the exact solution.
Using MathCad, we get T n (t) = Tn,a (t) + Tn,b(t) + Tn,c (t)/Tn,d , where
Tn,a (t)
e3t
8
2059 3
67512
n +
n sin (n )
13
13
9
3752 2
+
n + 13920 cos (n ) + 13920 + 46402n
13
8
165 5
12738 3
36960
n
n
n sin (n )
13
13
13
en t
Tn,c (t)
9
3960 4
11088 2
n +
n cos (n )
13
13
1123n + 336n sin (n ) + 22402n 6720 cos (n ) 1
Tn,d (t)
Tn,b(t)
405n + 1203n cos (n ) + 6n 4034n + 12002n cos (n ) sin (n )
+7n + 4375n 13203n
The first ten eigenvalues for Example 11.11 are:

n     eigenvalue    n     eigenvalue
1     2.4664        6     19.8380
2     5.1244        7     22.8825
3     7.9425        8     25.9425
4     13.8145       9     29.0148
5     16.8133       10    32.0971
[Figure 11.8. A plot of the eigenfunction expansion solution for the system given in Examples 11.10 and 11.11. The plot on the right is the error of the expansion solution from the exact solution.]

EXAMPLE 11.12.
[Figure 11.9. A plot of u(x, y) that solves the Poisson equation for f = 1 and boundary conditions u(0, y) = u(1, y) = u(x, 0) = u(x, 1) = 0.]
$$U(x, y) = -\frac{4}{\pi^3}\sum_{n=1,3,5,\ldots}\frac{\sin(n\pi x)}{n^3}\left[\cosh(n\pi y) + \frac{1 - \cosh(n\pi)}{\sinh(n\pi)}\,\sinh(n\pi y)\right] \tag{11.80}$$

subject to $U(0, y) = U(1, y) = 0$ and $U(x, 0) = U(x, 1) = -\dfrac{x(1 - x)}{2}$. A plot of $u = \dfrac{x(1 - x)}{2} + U(x, y)$ is given in Figure 11.9.
and &
u =&
u (u, x)
(11.82)
It can be shown that if the differential equation admits a symmetry transformation based on &
u and &
x, one can reduce the original differential equation to a
differential equation with as a new variable that depends on , = 1, . . . , (n 1).
This means that a combination of variables from the set (u, x) can yield an ordinary differential equation if n = 2 or a partial differential equation with (n 1)
independent variables if n > 2.
However, in general, the determination of the symmetries of a differential equation can be a very long and difficult process. One type of transformation is easy to
check for symmetry. These are the similarity transformations (also known as scaling
or stretch transformations), given by
$$\tilde{u} = \lambda^{\gamma} u \qquad \text{and} \qquad \tilde{x}_k = \lambda^{\gamma_k} x_k \quad \text{for } k = 1, \ldots, n \tag{11.84}$$

where λ is known as the similarity transformation parameter, and at least two of the exponents $\gamma_k$ must be nonzero. To determine whether a given partial differential
equation admits a similarity transformation, one would only need to substitute the
transformations given in (11.84) into the given differential equation (11.81) and
determine whether there exists values of k and that would yield an equation that
does not involve the parameter .
EXAMPLE 11.13. Applying the similarity group of transformations (11.84), $\tilde{t} = \lambda^{\alpha} t$, $\tilde{x} = \lambda^{\beta} x$ and $\tilde{u} = \lambda^{\gamma} u$, to the differential equation given by

$$\frac{\partial u}{\partial t} + x\,\frac{\partial^2 u}{\partial x^2} = A$$

where A is a constant, we obtain

$$\lambda^{\alpha - \gamma}\,\frac{\partial \tilde{u}}{\partial \tilde{t}} + \lambda^{\beta - \gamma}\,\tilde{x}\,\frac{\partial^2 \tilde{u}}{\partial \tilde{x}^2} = A$$

Symmetry is achieved if we set $\alpha = \beta = \gamma = 1$. Conversely, one can show that the following differential equation does not admit symmetry based on a similarity transformation:

$$\frac{\partial u}{\partial t} + \frac{\partial^2 u}{\partial x^2} = x^2 + A$$
At this point, we limit the similarity method to handle only partial differential
equations for u that depend on two independent variables x and t. For this case, we
have the following theorem:
(11.85)
&
x
x
=
&
t
t
and
&
u
u
=
&
t
t
Theorem 11.3 guarantees that a partial differential equation involving two independent variables that admits a similarity transformation can be reduced to an
ordinary differential equation. However, one will need to consider additional complexities. In some cases, especially for linear partial differential equations, there
can be more than one set of similarity transformations. One will need to determine
whether symmetry applies to the initial and boundary conditions as well. Finally, the
ordinary differential equations may not necessarily be easy to solve. In some cases,
numerical methods might be needed.
EXAMPLE 11.14. Consider the diffusion equation

$$\frac{\partial u}{\partial t} = \sigma^2\,\frac{\partial^2 u}{\partial x^2} \tag{11.86}$$

subject to the conditions: $u(x, 0) = u_i$ and $u(0, t) = u_0$. After substitution of $\tilde{t} = \lambda^{\alpha} t$, $\tilde{x} = \lambda^{\beta} x$ and $\tilde{u} = \lambda^{\gamma} u$, we have

$$\lambda^{\alpha - \gamma}\,\frac{\partial \tilde{u}}{\partial \tilde{t}} = \sigma^2\,\lambda^{2\beta - \gamma}\,\frac{\partial^2 \tilde{u}}{\partial \tilde{x}^2}$$

subject to $\tilde{u}(\tilde{x}, 0) = \lambda^{\gamma} u_i$ and $\tilde{u}(0, \tilde{t}) = \lambda^{\gamma} u_0$. For symmetry, we need $\alpha = 2\beta$ and $\gamma = 0$, so we can set $\alpha = 1$, $\beta = 1/2$ and $\gamma = 0$, yielding the invariants $u = \tilde{u}$ and $\eta = x/\sqrt{t} = \tilde{x}/\sqrt{\tilde{t}}$. Substituting these into the partial differential equation (11.86) will yield

$$\sigma^2\,\frac{d^2 u}{d\eta^2} = -\frac{\eta}{2}\,\frac{du}{d\eta}$$

The general solution is given by

$$u(\eta) - u(0) = C_1\int_0^{\eta}\exp\left(-\frac{\zeta^2}{4\sigma^2}\right)d\zeta = A\,\mathrm{erf}\left(\frac{\eta}{2\sigma}\right)$$
[Figure 11.10. The ratio (u − u₀)/(u_i − u₀) at t = 0, 1, 2, 3, 4, and 5.]
with erf(0) = 0 and erf(∞) = 1. After implementing the initial and boundary conditions, that is, $u(0, t) = u(\eta = 0) = u_0$ and $u(x, 0) = u(\eta = \infty) = u_i$, the solution is then given by

$$u(x, t) = \left(u_i - u_0\right)\mathrm{erf}\left(\frac{x}{\sqrt{4\sigma^2 t}}\right) + u_0$$

A plot of the ratio $(u - u_0)/(u_i - u_0)$ along the normalized distance $x/\sqrt{4\sigma^2 t}$ at different values of t is shown in Figure 11.10.
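The similarity solution of Example 11.14 is immediate to evaluate once the error function is available. The sketch below uses illustrative parameter values:

```python
import numpy as np
from scipy.special import erf

sigma, u_i, u_0 = 1.0, 10.0, 2.0

def u(x, t):
    return (u_i - u_0) * erf(x / np.sqrt(4.0 * sigma**2 * t)) + u_0

x = np.linspace(0.0, 10.0, 6)
print(u(x, 1.0))        # equals u_0 at x = 0 and approaches u_i for large x
```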
For the diffusion equation given in (11.86), suppose the conditions are now given by u(x, 0) = 0, limx u(x, t) = 0, and u/x(0, t) = H.
x = x and &
u = u, the same partial differAfter substitution of &
t = t, &
ential equation results as in Example 11.14, except that the initial and boundary
conditions are now given by
u &
x, 0
= 0
&
&
u
and
t) = H
(0, &
&
x
t
= 0
u &
x, &
lim&x &
EXAMPLE 11.15.
=0
d2
22 d
22
d
(0) = H. The solution of this equation can be
d
found in Example 9.2. After applying the boundary conditions, we have
7
2H
2 /(42 )
= H
+
e
erf !
42
42
= 0.1
u/H
= 1.0
= 5.0
= 10.
= 20
8
6
92
2
x
4 t
x
u(x, t) = H x erfc !
exp !
2
4 t
42 t
where erfc(z) = 1 erf(z) is known as the complementary error function. A
plot showing u/H as a function of x at different values of = 42 t is shown in
Figure 11.11.
E11.1. Obtain the general solutions for the following reducible or reduced homogeneous differential equations:
2u
2u
2u
u
u
+
2 2 +8
+3
+ 2u = 0
2
x
xy
y
x
y
2u
u
u
+ a(x, y)
+ b(x, y)
+ c(x, y)u = 0
xy
x
y
(11.87)
Lx =
+ b(x, y)
and
Ly =
+ a(x, y)
x
y
Show that (11.87) can be rearranged in two forms:
Lx Ly (u) = ha (x, y)u
or
where
a
+ a(x, y)b(x, y) c(x, y)
x
b
hb(x, y) =
+ a(x, y)b(x, y) c(x, y)
y
The functions ha and hb are known as the Laplace invariants of (11.87).
2. If ha = hb = 0, (11.87) will be reducible. Solve the case where a(x, y) = x,
b(x, y) = y and c(x, y) = 1 + xy.
3. If only one of the Laplace invariants is zero, one can still proceed by
integrating with respect to one independent variable, followed by the
integration with respect to the other independent variable. For instance,
if Lx Ly u = 0, then first solve Lx z = 0 followed by solving Ly u = z. Using
this approach, obtain the general solution of (11.87) for the case where
a(x, y) = y, b(x, y) = xy and c(x, y) = xy2 .
ha (x, y)
where
(, , , x) =
1 + tanh (x + )
2
and
i
1
2
3
4
i
1
1
1
1
i
4
4
4
10
i
1
1
0.5
0.5
u (x, 0) = e2x
u
1
for x 0
;
(x, 0) =
4x
t
1+e
u(0, t) =
9 e4t
8
t0
(Hint: See Section K.1.2 for the solution of the wave equation that includes
a Dirichlet boundary condition.)
E11.5. Obtain the displacement u(x, t) for a string fixed at two ends described by
the following wave equation
2
2u
2 u
=
t2
x2
subject to u(x, 0) = f (x), u/t(x, 0) = g(x) and u(0, t) = u(L, t) = 0. Plot
the solution for the case where L = 1, g(x) = 0 and
+
0.2x
0 x 0.5
f (x) =
0.2(1 x) 0.5 < x 1
and
U(0,
t)
U
2U(1, t) +
x x=1
du
d2 u
+ nu = 0 r(x) = ex
+ (1 x)
2
dx
dx
d2 u
du
2
2x
+ 2n (n + 1) u = 0 r(x) = ex
2
dx
dx
7. Spherical Bessel functions:
d2
( + 1)
u = 0 r(x) = 1
(xu) + x
dx2
x
E11.8. Let f (x) be a continuous function of x. Show that for a set of functions
n (x) = cos n f (x)
a Sturm-Liouville differential eigenfunction equation (in expanded form)
for these functions can be constructed as follows
2 2
2 3
df
d n
d f
df
dn
+ n 2 n = 0
2
2
dx
dx
dx
dx
dx
Using this result, find a series approximation based on n (x) using f (x) = x2
to approximate
(x)
for 0.25 x 0.5
h(x) =
(1 x) for 0.5 < x 0.75
E11.9. A well-known equation for financial option pricing is the Black-Scholes
equation for a x b and 0 t T given by
u
1
2u
u
= 2 x2 2 + r x
u
t
2
x
x
subject to
u(a, t) = u(b, t) = 0
u(x, 0) = u f (x)
and
where u(x, t) is the value of the option, x is the value of the underlying asset,
t is the time from expiry of option, r is the risk-free interest rate, is the
volatility of the asset, and u f (x) is the final payoff. The boundaries a and b
are barriers under which the option becomes worthless. Obtain the solution
and plots for r = 1, = 2, a = 0.5, b = 2 and
if 0.5 x 1
10x 5
u f (x) =
5x + 10 if 1 x 2
0
otherwise
(Hint: Using separation of variables, an eigenvalue problem that is a special
case of Example 11.9 will result.)
E11.10. Obtain the solution to the following convective-diffusion equation:
D
2 u u
u
+
=
x2
x
t
subject to:
u(0, t)
u(1, t)
=
=
sin (t)
0
and
u(x, 0) = 0
q
2q
= 2 2
t
x
E11.13. The Nusselt problem models the temperature for a laminar flow of a fluid
through a pipe. After normalization, we have
2
u
u 1 u
=
+
0 z L and 0 r 1
z
r2
r r
subject to
u
u(1, z) = uW
(0, z) = 0 and u(r, 0) = u0
r
Solve this equation using the method of separation of variables. Plot the
solution for the case with uW = 5 and u0 = 10.
E11.14. Consider the potential equation for the temperature u of a sphere of radius.
Assume cylindrical symmetry for the surface temperature, u(1, , ) = f ();
then we have only dependence with r and . Thus
2u
u 2 u cos u
+
2r
+ 2 +
=0
r2
r
sin
where 0 and 0 r 1.
2u = 0
r2
1. Let u = R(r)(); then show that separation of variables will yield the
following ordinary differential equations:
r2
d2 R
dR
+ 2r
R
dr2
dr
(11.88)
d2 cos d
+
+ = 0
(11.89)
d2
sin d
2. Letting x = cos and = n(n + 1), show that (11.89) reduces to the
Legendre equation (cf. Section 9.2), that is,
d2
d
2x
+ n(n + 1) = 0
2
dx
dx
whose solutions (considering Qn (0) = ) are
(1 x2 )
n () = Pn (cos )
where Pn is the Legendre polynomial of order n defined in (I.31).
3. Using = n(n + 1), solve for R(r) under the condition that R(0) is finite.
4. Obtain the solution for u(r, ) under the additional boundary conditions
u(1, ) = 1 cos2 (), 0 and plot the solution.
r=0
4
sin (nx)
1 cosh(n)
U(x, y) = 3
cosh(ny)+
sinh(ny)
n3
sinh(n)
n=1,3,5,...
E11.17. Solve the following nonhomogeneous partial differential equation using the
eigenfunction expansion approach:
u
2u
= 2 3ex2t
t
x
0 x 1 and
t0
subject to
u(0, t) = e2t
u(1, t) = e12t
and
u(x, 0) = ex
Plot the solution and compare with the exact solution u(x, t) = ex2t .
Hint: First, use the approach given in Section 11.4 to obtain a splitting of
the solution as u(x, t) = U(x, t) + S(x, t), such that
2U
U
S 2 S
x2t
=
3e
2
t
x2
t
x
subject to
U(0, t) = 0 ,
U(1, t) = 0
and
U(x, 0) = ex S(x, 0)
Then show that the eigenfunction approach solution should yield U(x, t) =
n=1 T n (t) sin(nt) where T n (t) is the solution of the ordinary differential
equation:
dT n
+ (n)2 T n = n e2t subject to T n (0) = p n
dt
where
1
2
n
pn =
(1)
e
1
n [(n)2 + 1]
n = (p n ) (n)2 2
u(x, y) =
mn mn (x, y)
and
f (x, y) =
mn mn (x, y)
m=1 n=1
m=1 n=1
E11.20. The chemical vapor deposition reaction (CVD) that is fed by a diffusionlimited laminar flow between two parallel plates can be modeled approximately by7
u
Q 2u
=
x
y y2
subject to: u(x, 0) = 0, u(0, y) = u0 , lim u(x, y) = u0 and lim u(x, y) = 0,
y
Based on an example in R. G. Rice and D. D. Do, Applied Mathematics and Modeling for Chemical
Engineers, John Wiley & Sons, New York, 1995, pp. 415420.
12
In this chapter, we discuss the integral transform methods for solving linear partial
differential equations. Although there are several types of transforms available, the
methods that are most often used are the Laplace transform methods and the Fourier
transform methods. Basically, an integral transform is used to map the differential
equation domain to another domain in which one of the dimensions is reduced
from derivative operations to algebraic operations. This means that if we begin with
an ordinary differential equation, the integral transform will map the equation to
an algebraic equation (cf. Section 6.7). For a 2D problem, the integral transforms
should map the partial differential equation to an ordinary differential equation, and
so on.
We begin in Section 12.1 with a very brief introduction to general integral
transforms. Then, in Section 12.2, we discuss the details of Fourier transforms: their definition, some particular examples, and their properties. Surprisingly, the classical development of Fourier transforms cannot be applied to some of the most useful functions, including the step function and sinusoidal functions. Although there were several ad hoc approaches to overcome these problems, it was not until the introduction of the theory of distributions that the various ad hoc approaches were unified and gained a solid mathematical grounding. This theory allows the extension of the classical Fourier transform to handle the problematic functions. We have located the discussion of the theory of distributions in the appendix, and we include the details of the Fourier transform of the various difficult functions in the same appendix. Some of the properties of Fourier transforms are then explored and collected
in Section 12.2.2. We have kept these properties to a minimum, with a focus on
solving differential equations. Then, in Section 12.3, we apply Fourier transform to
solve partial differential equations. As some authors have noted, Fourier transform
methods are most useful in handling infinite domains.
Next, starting in Section 12.4, we switch our attention to the Laplace transforms.
We view Laplace transforms as a special case of Fourier transforms. However, with
Laplace transform, one can apply the technique on dimensions that are semi-infinite,
such as time variables. Thus there are strong similarities between the Laplace and
Fourier transforms of functions as well as in their properties. There are also significant differences. For one, the handling of several functions such as step, sinusoidal,
exponential, Bessel, and Dirac delta functions are simpler to obtain. The inversion
of Laplace transforms, however, can be quite complicated. In several cases, using a
table of Laplace transforms, together with separation into partial fractions, is sufficient to obtain the inverse maps. In general, however, one may need to resort to
the theory of residues. We have included a review of the residue methods in the
appendix. Then, in Section 12.5, we apply Laplace transforms to solve some partial
differential equations. Finally, in Section 12.6, we include a brief section to show
how another technique known as method of images can be used to extend the use of
either Fourier or Laplace methods to semi-infinite or bounded domains.
For a given function f(x), the integral

$$\mathcal{I}_{K,a,b}\left[f(x)\right] = \int_a^b K(p, x)\,f(x)\,dx \tag{12.1}$$

is called the integral transform of f, where K(p, x) is called the kernel of the transform and p is called the transform variable of the integral transform. The limits of integration a and b can be finite or infinite.
A list of different useful integral transforms is given in Table 12.1. For our
purposes, we take integral transforms simply as a mapping of the original function
based on variable x to another function based on a new variable p . The expectation
is that in the new domain, convenient properties are obtained, by which the analysis becomes more manageable. For instance, with Laplace transforms, the original
problem could be a set of linear time-invariant differential equations. After taking
Laplace transforms of these equations, the differential equations are replaced by
algebraic equations.
Based on the properties of integrals, integral transforms immediately satisfy the
linearity property, that is, with constants and ,
$$\mathcal{I}_{K,a,b}\left[\alpha f(x) + \beta g(x)\right] = \alpha\int_a^b K(p, x)\,f(x)\,dx + \beta\int_a^b K(p, x)\,g(x)\,dx = \alpha\,\mathcal{I}_{K,a,b}\left[f\right] + \beta\,\mathcal{I}_{K,a,b}\left[g\right]$$
Table 12.1. Some useful integral transforms

Transform          Definition
Mellin             M[f(x)] = ∫₀^∞ f(x) x^{p−1} dx
Fourier cosine     F_c[f(x)] = √(2/π) ∫₀^∞ f(x) cos(xp) dx
Fourier sine       F_s[f(x)] = √(2/π) ∫₀^∞ f(x) sin(xp) dx
Fourier            F[f(x)] = ∫_{−∞}^{∞} f(x) e^{−ixp} dx
Laplace            L[f(x)] = ∫₀^∞ f(x) e^{−xp} dx
Hankel             H_ν[f(x)] = ∫₀^∞ f(x) x J_ν(xp) dx

When an inverse kernel $\tilde{K}(x, p)$ is available, the original function is recovered from the transform by

$$f(x) = \int_a^b \left(\mathcal{I}_{K,a,b}\left[f\right]\right)\tilde{K}(x, p)\,dp \tag{12.2}$$
'
=1
a cos
(
2
2
x + b sin
x
T
T
(12.3)
Ironically, the kernel of the Fourier transform, $K(x, p) = e^{-ixp}$, is not a Fourier kernel; that is, the kernel of the inverse Fourier transformation is $\tilde{K}(x, p) = e^{ixp}/(2\pi)$.
T/2
a =
T/2
T/2
T/2
b =
and
T/2
T/2
T/2
cos (2t/T ) dt
T/2
sin2 (2t/T ) dt
a
b
1
f (t)dt
2
1
t
f (t) cos
dt
1
t
f (t) sin
dt
(12.4)
An efficient way to solve for the coefficients is through the use of fast Fourier
transforms (FFT). The FFT method is described in Section L.1 as an appendix.
Substituting (12.4) into (12.3),
f (x) =
1
2
f (t)dt +
1
f (t) cos
(x t) dt
=1
Letting (1/) =
and / = ,
f (x) =
2
1
f (t)dt +
=1
'
(
f (t) cos ((x t)) dt
(12.5)
an integral. Also, with the assumption that $\int_{-\infty}^{\infty}\left|f(t)\right|dt < \infty$, the first term in (12.5) will become zero. We end up with the formula known as the Fourier integral equation,

$$f(x) = \frac{1}{\pi}\int_0^{\infty}\int_{-\infty}^{\infty} f(t)\cos\left(\lambda(x - t)\right)dt\,d\lambda \tag{12.6}$$
Assuming for now that (12.6) is valid for the given function f(x), we can proceed further by obtaining an alternative form of (12.6) based on Euler's identity. Integrating Euler's identity over a symmetric range,

∫_{−m}^{m} e^{iω(x−t)} dω = ∫_{−m}^{m} cos( ω(x − t) ) dω + i ∫_{−m}^{m} sin( ω(x − t) ) dω = 2 ∫_0^{m} cos( ω(x − t) ) dω

because the sine term vanishes by symmetry, or

∫_0^{∞} cos( ω(x − t) ) dω = (1/2) ∫_{−∞}^{∞} e^{iω(x−t)} dω   (12.7)
Substituting (12.7) into (12.6),

f(x) = (1/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(t) e^{iω(x−t)} dt dω = (1/2π) ∫_{−∞}^{∞} e^{iωx} [ ∫_{−∞}^{∞} f(t) e^{−iωt} dt ] dω   (12.8)
Equation (12.8) can now be deconstructed to yield the Fourier transform pair.
Definition 12.2. For a given function f(x), the operator F acting on f(x) given by

F(ω) = F[f] = ∫_{−∞}^{∞} f(t) e^{−iωt} dt   (12.9)

is called the Fourier transform of f(t). For a given function F(ω), the operator F⁻¹ acting on F(ω) given by

f(x) = F⁻¹[F] = (1/2π) ∫_{−∞}^{∞} F(ω) e^{iωx} dω   (12.10)

is called the inverse Fourier transform of F(ω).
Thus the kernel of the Fourier transform is K(x, ω) = e^{−iωx}, whereas the inverse kernel is given by K̃(x, ω) = (1/2π) e^{iωx}; that is, the signs in the exponential power in both kernels are opposites of each other.2
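Before moving on, it is sometimes useful to check the pair (12.9)–(12.10) numerically. The following MATLAB sketch does this for the Gaussian f(t) = e^{−t²}, whose transform under this sign convention is √π e^{−ω²/4}; the truncation limits and grid sizes are illustrative assumptions only, and the integrals are approximated with trapz.

% Minimal sketch: check the pair (12.9)-(12.10) numerically for the
% Gaussian f(t) = exp(-t^2).  Grids are illustrative choices.
t  = linspace(-20, 20, 4001);
f  = exp(-t.^2);
w  = linspace(-10, 10, 2001);
F  = zeros(size(w));
for k = 1:numel(w)
    F(k) = trapz(t, f .* exp(-1i*w(k)*t));     % classic transform (12.9)
end
err_fwd = max(abs(F - sqrt(pi)*exp(-w.^2/4))); % should be small
% invert with (12.10) and compare against the original function
fr = zeros(size(t));
for k = 1:numel(t)
    fr(k) = trapz(w, F .* exp(1i*w*t(k))) / (2*pi);
end
err_inv = max(abs(fr - f));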
The Fourier integral equation, (12.6), is not valid for all types of functions. However, (12.6) is valid for functions that satisfy the Dirichlet conditions and are absolutely integrable, that is,

∫_{−∞}^{∞} |f(t)| dt < ∞   (12.11)

with the value at a jump discontinuity of f interpreted as the mean of the one-sided limits,

(1/2)[ f(x⁺) + f(x⁻) ] = (1/π) ∫_0^∞ ∫_{−∞}^{∞} f(t) cos( ω(x − t) ) dt dω   (12.12)
Details of Dirichlet conditions and the corresponding Fourier integral theorem, Theorem L.8, can be found in an appendix of this chapter, under Section L.3.
Included in the discussion is the proof that when Dirichlet and integrability conditions are satisfied, the interchangeability of integration sequence used to derive
(12.8) is indeed allowed.
The evaluation of the Fourier and inverse Fourier transforms usually requires
the rules and methods of integration of complex functions. A brief (but relatively
extensive) review of complex functions and integration methods are included in an
appendix under Section L.2. Included in Section L.2 are the theory of residues to
2
There are several versions for the definition of Fourier transform. One version switches the sign
of the exponential. Other versions have different coefficients. Because of the existence of different
definitions, it is crucial to always determine which definition was chosen before using any table of
Fourier transforms.
Figure 12.1. A plot of the square pulse function and its corresponding Fourier transform.
evaluate contour integrals and extensions needed to handle integrals with complex
paths of infinite limits, regions with infinite number of poles, and functions with
branch cuts. Examples addressing Fourier transforms are also included.
EXAMPLE 12.1. Consider the square pulse

f(t) = H(a − |t|)   (12.13)

Its Fourier transform is

F(ω) = ∫_{−a}^{a} e^{−iωt} dt = 2 sin(aω)/ω   (12.14)

A plot of f(t) and F(ω) for a = 1 is shown in Figure 12.1.
(12.15)
=
(12.16)
2
2 ix + 1 ix 1
x +1
EXAMPLE 12.2.
20
30
456
Unfortunately, some useful and important functions, such as the unit step function, sine, cosine, and some exponential functions, do not satisfy the required integrability conditions. The definition of the Fourier transform therefore needs to be expanded to accommodate these important functions.
Before the 1950s, the problematic functions were treated on a case-by-case basis using arguments that involve taking limits of some parameters toward either 0 or infinity. Many of the results obtained with these approaches were verified by successful physical applications and practice, especially in physics and engineering. As a result of their successes, these approaches remain valid and acceptable to most practitioners, even at the present time. Nonetheless, they lacked mathematical rigor and generality. The biggest contention centered on the fact that the Dirac delta function does not satisfy the conditions required of ordinary functions.
With the introduction of distribution theory by Laurent Schwartz, most of the
mathematical rigor needed was introduced. The theory allowed the definition of
the Dirac delta function as a new object called a distribution. A subset of distributions called tempered distributions was subsequently constructed. Using tempered
distributions, a generalized Fourier transform was formulated, and this allowed a
general approach to handle Fourier transforms of functions such as sines, cosines,
and so forth. Fortunately, the previous approaches using limiting arguments were
proven to be equivalent to the methods of distribution theory. Thus the previous
methods were validated with more solid mathematical basis. More importantly, a
general approach had become available for problems in which the limiting argument
may be difficult to perform or justify.
A short introduction to distribution theory and delta distributions is included in
Section L.4 as an appendix. Included in Section L.4 are general properties and operations of distributions and specific properties and operations of delta distributions.
A discussion of tempered distribution and its application to the formulation of generalized Fourier transform continues in Section L.5 as another appendix. Included in
Section L.5 are the definition of the generalized Fourier transforms; the evaluation
of Fourier transforms of sines, cosines, unit step functions, and delta distributions
using the methods of generalized Fourier transforms; and additional properties such
as the Fourier transforms of integrals.
With the formulation of generalized Fourier transforms, we specifically refer to
the original definitions given in (12.9) and (12.10) as the classic Fourier transform
and the classic inverse Fourier transform, respectively. As shown in Section L.5,
the computations used for Fourier transforms of tempered distributions still need
both these definitions. Thus we use the term Fourier transform to imply generalized Fourier transform, because the classic Fourier transform is already included in
the generalized forms. This means that the integral formulas of the classic Fourier
and classic inverse Fourier transforms are used for most evaluations, until a problem with integrability or the presence of delta distributions occurs, at which point
the methods of generalized Fourier transforms using tempered distributions are
applied.
The following are some frequently used properties of Fourier transforms.
1. Linearity. From the linearity of the integral,

F[a f(t) + b g(t)] = a F[f(t)] + b F[g(t)]   (12.17)
F⁻¹[a F(ω) + b G(ω)] = a F⁻¹[F(ω)] + b F⁻¹[G(ω)]   (12.18)
2. Shifting.

F[f(t − a)] = ∫_{−∞}^{∞} f(t − a) e^{−iωt} dt = ∫_{−∞}^{∞} f(ξ) e^{−iω(ξ+a)} dξ   (ξ = t − a)
            = e^{−iaω} F[f(t)]   (12.19)

F⁻¹[F(ω − b)] = (1/2π) ∫_{−∞}^{∞} F(ω − b) e^{iωt} dω = (1/2π) ∫_{−∞}^{∞} F(ξ) e^{i(ξ+b)t} dξ   (ξ = ω − b)
              = e^{ibt} F⁻¹[F(ω)]   (12.20)
3. Scaling. For a > 0,

F[f(at)] = ∫_{−∞}^{∞} f(at) e^{−iωt} dt = (1/a) ∫_{−∞}^{∞} f(ξ) e^{−i(ω/a)ξ} dξ = (1/a) F(ω/a)   (ξ = at)

For a < 0,

F[f(at)] = ∫_{−∞}^{∞} f(at) e^{−iωt} dt = −(1/a) ∫_{−∞}^{∞} f(ξ) e^{−i(ω/a)ξ} dξ = −(1/a) F(ω/a)   (ξ = at)

Combining both cases,

F[f(at)] = (1/|a|) F(ω/a)   (12.21)

and, by the same argument applied to (12.10),

F⁻¹[F(aω)] = (1/|a|) f(t/a)   (12.22)
4. Derivatives. Using integration by parts and assuming f(t) → 0 as |t| → ∞,

F[df/dt] = ∫_{−∞}^{∞} (df/dt) e^{−iωt} dt = [ f(t) e^{−iωt} ]_{−∞}^{∞} + iω ∫_{−∞}^{∞} f(t) e^{−iωt} dt = iω F[f(t)]   (12.23)

or, in general,

F[dⁿf/dtⁿ] = (iω)ⁿ F[f(t)]   (12.24)
Similarly, for the inverse transform,

F⁻¹[dF/dω] = (1/2π) ∫_{−∞}^{∞} (dF/dω) e^{iωt} dω = (1/2π) [ F(ω) e^{iωt} ]_{−∞}^{∞} − (it) (1/2π) ∫_{−∞}^{∞} F(ω) e^{iωt} dω
           = −it F⁻¹[F(ω)] = (−it) f(t)   (12.25)

or in general,

F⁻¹[dⁿF/dωⁿ] = (−it)ⁿ f(t)   (12.26)
5. Integrals.

F[ ∫_{−∞}^{t} f(ξ) dξ ] = (1/(iω)) F(ω) + π δ(ω) F(0)   (12.27)
The derivation of (12.27) is given in the appendix under Section L.5.2 and
requires the use of generalized Fourier transforms.
6. Multiplication and Convolution. For any two functions f(t) and g(t), the convolution operation, denoted by ∗, is defined as

Convolution(f, g) = [f ∗ g](t) = ∫_{−∞}^{∞} f(t − ξ) g(ξ) dξ   (12.28)

The Fourier transform of a convolution is then

F[f ∗ g] = ∫_{−∞}^{∞} e^{−iωt} ∫_{−∞}^{∞} f(t − ξ) g(ξ) dξ dt

Let η = t − ξ; then

F[f ∗ g] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−iω(η+ξ)} f(η) g(ξ) dη dξ = [ ∫_{−∞}^{∞} e^{−iωη} f(η) dη ][ ∫_{−∞}^{∞} e^{−iωξ} g(ξ) dξ ]
         = F[f(t)] F[g(t)] = F(ω) G(ω)   (12.29)

Likewise, for the inverse transform of a convolution in the ω domain,

F⁻¹[F ∗ G] = (1/2π) ∫_{−∞}^{∞} e^{iωt} ∫_{−∞}^{∞} F(ω − ξ) G(ξ) dξ dω   ; let η = ω − ξ
           = (1/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{it(η+ξ)} F(η) G(ξ) dη dξ
           = 2π [ (1/2π) ∫_{−∞}^{∞} e^{iηt} F(η) dη ][ (1/2π) ∫_{−∞}^{∞} e^{iξt} G(ξ) dξ ]
           = 2π F⁻¹[F(ω)] F⁻¹[G(ω)] = 2π f(t) g(t)   (12.30)
When it comes to applications of Fourier transforms to solve differential equations, the dual versions of (12.29) and (12.30) are more frequently used, that is,

F⁻¹[F(ω) G(ω)] = [f ∗ g](t)   (12.31)
F[f(t) g(t)] = (1/2π) [F ∗ G](ω)   (12.32)
These six properties are summarized in Table 12.2. Using either direct computation
or implementation of the properties of Fourier transforms, a list of the Fourier
transforms of some basic functions is given in Table 12.3. In some cases, the Fourier
transforms can also be obtained by using known solutions of related differential
equations. One such case is the Fourier transform of the Airy function, Ai(x) (see
Exercise E12.1).
Table 12.2. Properties of Fourier transforms

  1. Linearity:    F[a f(t) + b g(t)] = a F[f(t)] + b F[g(t)]
                   F⁻¹[a F(ω) + b G(ω)] = a F⁻¹[F(ω)] + b F⁻¹[G(ω)]
  2. Shifting:     F[f(t − a)] = e^{−iaω} F[f(t)]
                   F⁻¹[F(ω − b)] = e^{ibt} F⁻¹[F(ω)]
  3. Scaling:      F[f(at)] = (1/|a|) F(ω/a)
  4. Derivatives:  F[dⁿf/dtⁿ] = (iω)ⁿ F[f(t)]
                   F⁻¹[dⁿF/dωⁿ] = (−it)ⁿ F⁻¹[F(ω)]
  5. Integrals:    F[∫_{−∞}^{t} f(ξ) dξ] = (1/(iω)) F(ω) + π δ(ω) F(0)
  6. Convolution:  F⁻¹[F(ω) G(ω)] = [f ∗ g](t)
                   F[f(t) g(t)] = (1/(2π)) [F ∗ G](ω)
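As a quick numerical sanity check of the table, the derivative property (item 4) can be verified directly by quadrature. The MATLAB sketch below is only illustrative: the test function e^{−t²} and the grids are arbitrary choices, and the transforms are approximated with trapz.

% Minimal sketch: numerical check of property 4 in Table 12.2,
% F[df/dt] = (i*w) F[f(t)], for the test function f(t) = exp(-t^2).
t   = linspace(-20, 20, 4001);
f   = exp(-t.^2);
df  = -2*t.*exp(-t.^2);                  % exact derivative of f
w   = linspace(-8, 8, 401);
Ff  = zeros(size(w));  Fdf = zeros(size(w));
for k = 1:numel(w)
    Ff(k)  = trapz(t, f  .* exp(-1i*w(k)*t));
    Fdf(k) = trapz(t, df .* exp(-1i*w(k)*t));
end
err = max(abs(Fdf - 1i*w.*Ff));          % should be small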
Fourier transforms can be applied only with respect to an independent variable whose domain is the entire real line. For instance, suppose the dependent variable is u = u(x, t), with −∞ < x < ∞ and 0 ≤ t. Then the transform would make sense only with respect to x.
When taking the Fourier transform with respect to x, the other independent variables are fixed during the transformations. For instance, with u(x, t), we define a new variable,

U(ω, t) = F[u(x, t)]

Some basic rules apply (a brief numerical illustration follows the list):
1. When taking derivatives with respect to the other independent variables, one can interchange the order of differentiation with the Fourier transform operation, for example,

F[ ∂^k u(x, t)/∂t^k ] = (d^k/dt^k) F[u(x, t)] = (d^k/dt^k) U(ω, t)   (12.33)

Note that the partial derivative operation has been changed to an ordinary derivative.
2. When taking derivatives with respect to the chosen variable under which Fourier transforms will apply, say x, the derivative property of Fourier transforms can be used, that is,

F[ ∂^k u(x, t)/∂x^k ] = (iω)^k F[u(x, t)] = (iω)^k U(ω, t)   (12.34)
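For a concrete illustration of rule 2, consider the diffusion equation u_t = D u_xx on the whole real line. Transforming in x turns it into dU/dt = −Dω²U, which integrates to U(ω, t) = e^{−Dω²t} U(ω, 0). The MATLAB sketch below mimics this on a large periodic box, using the FFT as a stand-in for the continuous transform; the diffusivity, the initial pulse, and the box size are illustrative assumptions.

% Minimal sketch of rule 2: for u_t = D u_xx, the transform in x gives
% U(w,t) = exp(-D w^2 t) U(w,0).  The transform is approximated by the
% FFT on a large periodic box; D, the pulse, and the grid are illustrative.
D  = 0.5;  tEnd = 1.0;
N  = 1024;  Lbox = 40;                       % box [-Lbox/2, Lbox/2)
x  = -Lbox/2 + (0:N-1)*(Lbox/N);
u0 = exp(-5*x.^2);                           % initial condition
w  = (2*pi/Lbox) * [0:N/2-1, -N/2:-1];       % FFT wavenumbers
u  = real(ifft(exp(-D*w.^2*tEnd) .* fft(u0)));   % u(x, tEnd)
% exact spreading Gaussian for this initial condition, for comparison
uex = exp(-5*x.^2/(1 + 20*D*tEnd)) / sqrt(1 + 20*D*tEnd);
err = max(abs(u - uex));                     % should be small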
Table 12.3. Fourier transforms of some basic functions

    f(t)                                          F[f(t)]
  1. H(t) = 0 for t ≤ 0, 1 for t > 0              π δ(ω) + 1/(iω)
  2. sgn(t) = −1 for t < 0, +1 for t > 0          2/(iω)
  3. H(a − |t|) = 1 for |t| ≤ a, 0 for |t| > a    2 sin(aω)/ω
  4. sin(bt)/t                                    π H(b − |ω|)
  5. δ(t − a)                                     e^{−iaω}
  6. a/(t² + a²)                                  π e^{−a|ω|}
  7. 1                                            2π δ(ω)
  8. e^{iat}                                      2π δ(ω − a)
  9. e^{−|a| t²}                                  √(π/|a|) e^{−ω²/(4|a|)}
 10. cos(at)                                      π [δ(ω + a) + δ(ω − a)]
 11. sin(at)                                      iπ [δ(ω + a) − δ(ω − a)]
462
EXAMPLE 12.3.
2u
1 2u
2 2 =0
2
x
c t
subject to initial conditions
u
(x, 0) = g(x)
t
We first take the Fourier transform of both sides,
' 2
(
u
1 2u
1 d2
2
F
=
U(,
t)
U =
(i)
x2
c2 t2
c2 dt2
u (x, 0) = f (x)
and
d2 U
+ (c)2 U
dt2
0
0
A
B
1
=
2
and
1
1
1/(ic)
1/(ic)
F [f ]
F [g]
Then,
1 ict
1 ict
e + eict F [f (x)] +
e eict F [g(x)]
2
2ic
Next, take the inverse Fourier transforms of the various terms while applying
the shifting theorem (12.19),
'
(
1 ict
1
F 1
=
e F [ f (x)]
f (x + ct)
2
2
'
(
1 ict
1
F 1
=
e
F [ f (x)]
f (x ct)
2
2
8
9
8
' x
( 9
1 ict
1 1
1
ict
F
e F [g(x)]
=
F
g()d
e F
2ic
2c
F 1 eict ()
F [g]
2c
=0
x+ct
1
1
=
g()d
F [g]
2c
4c
=0
8
9
xct
1 ict
1
1
F 1
e
F [g(x)]
=
g()d
F [g]
2ic
2c
4c
=0
U(, t) =
Figure 12.2. A surface plot of u(x, y) given by (12.35) together with the corresponding contour
plot.
EXAMPLE 12.4.
subject to
u(x, 0)
f (x) = H (1 |x|)
u(x, y)
0 as |x| and y
d2 U
=0
dy2
1
x1
x+1
=
Atan
Atan
(12.35)
y
y
Figure 12.2 shows a surface plot of u(x, y) given by (12.35), together with some
curves at constant u values, which is shown separately as a contour plot. The
level curves can be seen to be circles whose centers are located along the line
x = 0.
Fourier transform methods apply to functions defined on the whole real line. To reach functions defined only for x > 0, consider the damped, one-sided function

H(x) e^{−ax} f(x) = 0 for x ≤ 0,  e^{−ax} f(x) for x > 0   (12.36)

with a ≥ 0. Applying the Fourier integral representation (12.8) to this function,

H(x) e^{−ax} f(x) = (1/2π) ∫_{−∞}^{∞} e^{iωx} [ ∫_0^∞ f(t) e^{−(a+iω)t} dt ] dω

or

H(x) f(x) = (1/2π) ∫_{−∞}^{∞} e^{(a+iω)x} [ ∫_0^∞ f(t) e^{−(a+iω)t} dt ] dω   (12.37)
where H (x) is the unit step function.
From (12.37), and letting s = a + iω with a ≥ 0, we can extract the integral transform pair called the Laplace transform and inverse Laplace transform.

Definition 12.3. Let s be a complex variable whose real part is non-negative. For a given function f(t), the operator L acting on f(t) given by

f̄(s) = L[f] = ∫_0^∞ f(t) e^{−st} dt   (12.38)

is called the Laplace transform of f(t). For a given function f̄(s), the operator L⁻¹ acting on f̄(s) given by

f(t) = L⁻¹[f̄] = (1/2πi) ∫_{γ−i∞}^{γ+i∞} f̄(s) e^{st} ds   (12.39)

is called the inverse Laplace transform of f̄(s), where the real constant γ is chosen so that all singularities of f̄(s) lie to the left of the integration path. The integral formula given in (12.39) is also known as the Bromwich integral.
465
transforms via (12.39) may be obtained via the method of residues. Details of the
residue theorem can be found in Sections L.2.2 and L.2.3.4 Briefly, we have
1
2i
+i
f (s)est ds =
N
f (s)est
Resz
(12.40)
=1
1
dk1
lim k1 [z zo ]k g(z)
(k 1)! zzo dz
(12.41)
For the special case when the function g(z) is a rational function with a numerator
function num(z) and denominator function den(z), that is,
g(z) =
num(z)
den(z)
such that a simple pole at (z = z0 ) is a root den(z), then via LHospitals rule, we
have
num(z)
Reszo (g) =
(12.42)
d
den(z)
dz
z=z0
EXAMPLE 12.5. Laplace transform of t^ν. Consider the function f(t) = t^ν where ν > 0 is a real-valued constant. Then, with y = st,

L[t^ν] = ∫_0^∞ e^{−st} t^ν dt = ∫_0^∞ e^{−y} (y/s)^ν (dy/s) = (1/s^{ν+1}) ∫_0^∞ e^{−y} y^ν dy = Γ(ν + 1)/s^{ν+1}   (12.43)

where Γ(x) is the gamma function of x (cf. (9.7)).
EXAMPLE 12.6. Laplace transform of erfc(1/(2√t)). The error function, erf(x), and the complementary error function, erfc(x), are defined as

erf(x) = (2/√π) ∫_0^x e^{−ξ²} dξ   and   erfc(x) = (2/√π) ∫_x^∞ e^{−ξ²} dξ   (12.44)

so that

erfc(x) = 1 − erf(x)   (12.45)
In practice, because the evaluation of the inverse Laplace transforms can be quite complicated, a
table of Laplace transforms is usually consulted first.
466
The Laplace transform of erfc 1/2 x is given by
8
9
1
2
st
L erfc
=
e
e ddt
2 x
0
1/(2 t)
After integration by parts,
8
9
1
L erfc
=
2 x
=
st 1/(4t)
1
e e
dt
2s 0
t t
2
2
2
es/(4q ) eq dq
s 0
1
with q =
2 t
dg
ds
2
d g
ds2
1 s/(4q2 ) q2
e
e dq
4q2
1
42 q4
es/(4q ) eq dq
2
+ g
ds2
s 0
8q2
4
s
2 ds
4
or
d2 g
1 dg
1
s2 2 + s
sg = 0
ds
2 ds
4
This equation is reducible to a Bessel or modified Bessel equation. Using
the methods described in Theorem 9.3, we have, after using g(0) = /2 and
|g()| < ,
g(s) = s1/4 AI1/2 s + BI1/2 s = ae s + be s
s
=
e
2
Combining the results, we have
8
9
1
1
L erfc
= e s
(12.46)
s
2 x
EXAMPLE 12.7.
(s) be given by
Let F
(s) = J (s)
F
J (s)
where = {0, 1}. To find the inverse Laplace transform, we can use (12.39), that
is,
+i
1
J (s)
1
F (x) =
L
ds
est
2i i
J (s)
467
and use the theory of residues to evaluate the complex integral. The poles of
est will be the roots of J (s), except for s = 0, because it a removable singularity
F
as it is also the root of the numerator J (s). Thus let zn be the n th root of J (z).
Using (12.40) and (12.42), plus the formula for the derivative of Bessel function
given in (9.66), we have
J (zn )
(x) =
ezn t
L1 F
J (zn ) J +1 (zn )
n=1,...,;zn =0
zn
Like the Fourier transform, the Laplace transform satisfies the following properties.
1. Linearity.

L[a f(t) + b g(t)] = a L[f(t)] + b L[g(t)]   (12.47)
L⁻¹[a f̄(s) + b ḡ(s)] = a L⁻¹[f̄(s)] + b L⁻¹[ḡ(s)]   (12.48)
where a and b are constants. Both (12.47) and (12.48) are immediate consequences of the properties of integrals.
2. Shifting.
L[H(t − a) f(t − a)] = ∫_0^∞ H(t − a) f(t − a) e^{−st} dt = ∫_0^∞ f(ξ) e^{−s(ξ+a)} dξ   (ξ = t − a)
                     = e^{−as} L[f(t)]   (12.49)

L⁻¹[f̄(s − b)] = (1/2πi) ∫_{γ−i∞}^{γ+i∞} f̄(s − b) e^{st} ds = e^{bt} (1/2πi) ∫_{γ−i∞}^{γ+i∞} f̄(ξ) e^{ξt} dξ   (ξ = s − b)
              = e^{bt} L⁻¹[f̄(s)]   (12.50)
3. Scaling. Let Real(a) ≥ 0, a ≠ 0. Then

L[f(at)] = ∫_0^∞ f(at) e^{−st} dt = (1/a) ∫_0^∞ f(ξ) e^{−(s/a)ξ} dξ = (1/a) f̄(s/a)   (ξ = at)   (12.51)
4. Derivatives.

L[df/dt] = ∫_0^∞ (df/dt) e^{−st} dt = [ f(t) e^{−st} ]_0^∞ + s ∫_0^∞ f(t) e^{−st} dt = s L[f(t)] − f(0)   (12.52)

or, in general,

L[dⁿf/dtⁿ] = sⁿ L[f(t)] − Σ_{k=0}^{n−1} s^{n−k−1} (d^k f/dt^k)|_{t=0}   (12.53)
5. Integrals. Using integration by parts,

L[ ∫_0^t f(ξ) dξ ] = ∫_0^∞ e^{−st} [ ∫_0^t f(ξ) dξ ] dt = [ −(e^{−st}/s) ∫_0^t f(ξ) dξ ]_0^∞ + (1/s) ∫_0^∞ f(t) e^{−st} dt = (1/s) L[f(t)]   (12.54)

6. Convolution. For Laplace transform methods, the convolution of f and g is defined as

Convolution(f, g) = [f ∗ g](t) = ∫_0^t f(t − ξ) g(ξ) dξ   (12.55)
Note that the limits for the convolution used in Laplace transform methods are from 0 to t, whereas the limits of the convolution for Fourier transform methods are from −∞ to ∞.5
The Laplace transform of a convolution is then given by

L[f ∗ g] = ∫_0^∞ e^{−st} [ ∫_0^t f(t − ξ) g(ξ) dξ ] dt
         = ∫_0^∞ ∫_ξ^∞ e^{−st} f(t − ξ) g(ξ) dt dξ
         = ∫_0^∞ ∫_0^∞ e^{−s(η+ξ)} f(η) g(ξ) dη dξ   (η = t − ξ)
         = [ ∫_0^∞ f(η) e^{−sη} dη ][ ∫_0^∞ g(ξ) e^{−sξ} dξ ] = L[f] L[g]   (12.56)
EXAMPLE 12.8. Laplace transforms of exponential and trigonometric functions. For a constant a,

L[e^{−at}] = ∫_0^∞ e^{−(s+a)t} dt   (12.58)
           = 1/(s + a)   (12.59)

For cosine and sine, we can use (12.59), Euler's identity, and the linearity property:

L[cos(ωt)] = (1/2) L[e^{iωt} + e^{−iωt}] = (1/2)[ 1/(s − iω) + 1/(s + iω) ] = s/(s² + ω²)   (12.60)

L[sin(ωt)] = (1/2i) L[e^{iωt} − e^{−iωt}] = (1/2i)[ 1/(s − iω) − 1/(s + iω) ] = ω/(s² + ω²)   (12.61)

5
Based on the original definition of convolution given in (12.28), observe that (12.55) is simply the result of restricting f(t) = g(t) = 0 for t < 0, that is,
∫_{−∞}^{∞} H(t − ξ) f(t − ξ) H(ξ) g(ξ) dξ = ∫_0^t f(t − ξ) g(ξ) dξ
EXAMPLE 12.9.
L J (t)
k=0
(1)k
2k+
L
t
k! 22k+ ( + k + 1)
(1)k ( + 2k + 1) 1 2k++1
k! 22k+ ( + k + 1) s
k=0
(1)k ( + 2k + 1)
1 k
k!
( + k + 1)
4s2
k=0
k
k1
k
1
(1)
1
1 +
(k + + 1) + j
s(2s)
k!
4s2
1
s(2s)
j =0
k=1
(12.62)
where we used the Pochhammer product equation given in (9.11). Note further that in this case, to guarantee that the Pochhammer product will be positive, we need to require that ν > −1. Next, define g(q) as
!
4q + 1 1
1
!
g(q) =
(12.63)
2q
4q + 1
!
or in terms of z = 4q + 1,
2
1
g(z(q)) =
z+ 1
z
Then
g(q)q=0
dg
dq q=0
=
=
g(z(q))z=1 = 1
dz
dg
( + 1)z 1
+1
=2
= ( + 2)
dz
dq z=1
(z + 1)+1 z3 z=1
..
.
dk g
dqk q=0
(1)k
k1
(k + + 1) + j
j =0
k1
(1)k
g(q) = 1 +
(k + + 1) + j qk
k!
k=1
j =0
Combining these results, the Laplace transform of the Bessel function is

L[J_ν(t)] = ( √(s² + 1) − s )^ν / √(s² + 1)   (12.64)

Note that this result is valid for ν > −1, including non-integer values.
EXAMPLE 12.10. Laplace transform of I_ν(t). Using the scaling property, (12.51), with a = i, the Laplace transform of J_ν(t) given in (12.64), and the identity I_ν(t) = i^{−ν} J_ν(it),

L[I_ν(t)] = ( s − √(s² − 1) )^ν / √(s² − 1)   (12.65)

EXAMPLE 12.11. Laplace transforms of the unit step and Dirac delta functions.

L[H(t)] = ∫_0^∞ H(t) e^{−st} dt = [ −(1/s) e^{−st} ]_{t=0}^{t=∞} = 1/s   (12.66)

Using this result plus the fact that δ(t) is defined as the derivative of H(t),

L[δ(t)] = L[(d/dt) H(t)] = s L[H(t)] − H(0) = 1   (12.67)
Table 12.4. Laplace transforms of some basic functions

    f(t)                                          L[f(t)]
  1. H(t) = 0 for t ≤ 0, 1 for t > 0              1/s
  2. δ(t)                                         1
  3. e^{−t}                                       1/(s + 1)
  4. sin(t)                                       1/(s² + 1)
  5. cos(t)                                       s/(s² + 1)
  6. t^ν                                          Γ(ν + 1)/s^{ν+1}
  7. erfc(1/(2√t))                                (1/s) e^{−√s}
  8. (1/√(πt)) e^{−1/(4t)}                        (1/√s) e^{−√s}
  9. (1/√(4πt³)) e^{−1/(4t)}                      e^{−√s}
 10. J_ν(t)                                       (√(s² + 1) − s)^ν / √(s² + 1)
 11. I_ν(t)                                       (s − √(s² − 1))^ν / √(s² − 1)
 12. t^{ν/2} J_ν(2√t)                             e^{−1/s} / s^{1+ν}
A common approach to inverting a rational transform is decomposition into partial fractions. Consider

f̄(s) = N(s)/D(s)   (12.68)

where

N(s) = Σ_{q=0}^{n−1} β_q s^q   (12.69)

D(s) = Π_{k=1}^{m} (s − r_k)^{N_k}   with   Σ_{k=1}^{m} N_k = n   (12.70)

f̄(s) can be separated into n additive terms as follows:

f̄(s) = Σ_{k=1}^{m} Σ_{ℓ=1}^{N_k} A_{kℓ} / (s − r_k)^ℓ   (12.71)

whose inverse Laplace transform, after using (12.43) together with the linearity, scaling, and shifting properties, is given by

f(t) = Σ_{k=1}^{m} Σ_{ℓ=1}^{N_k} [ A_{kℓ} / (ℓ − 1)! ] t^{ℓ−1} e^{r_k t}   (12.72)

To determine the coefficients, multiply (12.71) by (s − r_K)^{N_K}:

f̄(s) (s − r_K)^{N_K} = Σ_{ℓ=1}^{N_K} A_{Kℓ} (s − r_K)^{N_K − ℓ} + Φ_{(K)}(s)   (12.73)

where

Φ_{(K)}(s) = Σ_{k≠K} Σ_{ℓ=1}^{N_k} A_{kℓ} (s − r_K)^{N_K} / (s − r_k)^ℓ   (12.74)

Because

lim_{s→r_K} d^{(N_K − L)} Φ_{(K)} / ds^{(N_K − L)} = 0   (12.75)

the coefficients can be extracted as

A_{KL} = lim_{s→r_K} [ 1/(N_K − L)! ] d^{(N_K − L)}/ds^{(N_K − L)} [ f̄(s) (s − r_K)^{N_K} ]   (12.76)
474
rk
Nk
Ak
1
2
1/2
2
1
2
1 + i
1 i
A11
A21
A22
A31
A32
A41
A42
= 64/225
= 19/18
= 5/6
= (67/100) (31/100)i
= (7/20) (1/5)i
= (67/100) + (31/100)i
= (7/20) + (1/5)i
After using Eulers identities for sin(t) and cos(t), (12.72) can be rearranged
to be
64 t/2
5
19 2t
f (t) =
e
t+
e
225
6
18
8
9
2
31
7
67
+ et
t+
sin(t) + t +
cos(t)
5
50
10
50
(12.77)
When taking the Laplace transform of u(x, t) with respect to t, the other independent variables are fixed, and we define Ū(x, s) = L[u(x, t)]. Some basic rules apply:
1. When taking derivatives with respect to the other independent variables, say x, one can interchange the order of differentiation with the Laplace transform operation, for example,

L[ ∂^k u(x, t)/∂x^k ] = (∂^k/∂x^k) L[u(x, t)] = (d^k/dx^k) Ū(x, s)   (12.78)
2. When taking derivatives with respect to the transformed variable, say t, the
derivative property of Laplace transforms can be used, that is,
L[ ∂^n u(x, t)/∂t^n ] = s^n Ū(x, s) − Σ_{k=0}^{n−1} s^{n−k−1} (∂^k u/∂t^k)|_{t=0}   (12.79)
Note that the initial conditions of the partial derivatives need to be specified by
the problem for the successful application of Laplace transforms.
The general approach to solving the partial differential equation for u(x, t) can be
outlined as follows:
1. Take the Laplace transform of the partial differential equation with respect to t ≥ 0 and reduce it to an ordinary differential equation in terms of Ū(x, s), with x as the independent variable.
2. Take the Laplace transform of the conditions at fixed values of x, for example,

Ū(a, s) = L[u(a, t)]   (12.80)
(d/dx) Ū(a, s) = L[ (∂u/∂x)(a, t) ]   (12.81)

etc.
3. Solve the ordinary differential equation for Ū(x, s).
4. Take the inverse Laplace transform of Ū(x, s).
In several applications, one could use the table of Laplace transforms
together with the properties of Laplace transforms. In other cases, one may
need to resort to using the method of decomposition to partial fractions and
even the theory of residues to evaluate the inverse Laplace transforms.
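When f̄(s) is rational, step 4 can often be carried out with the partial-fraction expansion (12.71)–(12.72), which MATLAB's residue function computes directly. The sketch below is a minimal illustration with an arbitrary transform (not one taken from the text); for simple poles, each term r_j/(s − p_j) inverts to r_j e^{p_j t}.

% Minimal sketch of step 4: invert a rational transform by partial
% fractions using MATLAB's residue.  The transform is an illustrative
% example only.
num = [5 3];                    % F(s) = (5s + 3) / (s^3 + 6s^2 + 11s + 6)
den = [1 6 11 6];               % poles at s = -1, -2, -3 (all simple)
[r, p, k] = residue(num, den);  % F(s) = sum_j r(j)/(s - p(j)) + k(s)
t  = linspace(0, 5, 201);
f  = zeros(size(t));
for j = 1:numel(r)              % simple poles: each term inverts to r*exp(p*t)
    f = f + r(j)*exp(p(j)*t);
end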
EXAMPLE 12.13. Consider the diffusion equation

α ∂²u/∂x² = ∂u/∂t   (12.82)

with a constant initial condition, u(x, 0) = Ci. Taking the Laplace transform of (12.82) with respect to t, we obtain

α d²Ū/dx² = s Ū − Ci   (12.83)

whose general solution is Ū = A e^{λx} + B e^{−λx} + Ci/s, where λ = √(s/α). The solution will depend on the values of the parameters and the boundary conditions.
Let the boundary conditions be given by

u(0, t) = C0   and   u(x, t) → Ci as x → ∞   (12.84)
Applying these to (12.83), we get A = 0 and B = (C0 − Ci)/s. Thus

Ū = [(C0 − Ci)/s] e^{−x√(s/α)} + Ci/s

Based on item 7 in Table 12.4 and the scaling property given in (12.51),

L[ erfc( x/(2√(αt)) ) ] = (1/s) e^{−x√(s/α)}

so the inverse transform gives

u(x, t) = (C0 − Ci) erfc( x/(2√(αt)) ) + Ci   (12.85)
This preceding case applies to the boundary conditions given in (12.84). For
other boundary conditions, the use of residues may be needed to obtain the inverse
Laplace transform. In Section L.7, we include a few more examples to show how the
solutions under different boundary conditions can still be obtained using Laplace
transform methods.
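A minimal MATLAB sketch for evaluating and plotting the solution (12.85) follows; the numerical values chosen for α, C0, and Ci are illustrative assumptions, not values from the text.

% Minimal sketch: evaluate and plot the solution (12.85) of Example 12.13
% at a few times.  alpha, C0, and Ci are illustrative choices.
alpha = 1e-2;  C0 = 100;  Ci = 20;
x = linspace(0, 1, 201);
hold on
for t = [0.5 1 2 5 10]
    u = (C0 - Ci)*erfc(x ./ (2*sqrt(alpha*t))) + Ci;
    plot(x, u)
end
xlabel('x'); ylabel('u(x,t)')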
12.7 Exercises
477
Subproblem A
Subproblem B
u(x, 0) = f (x)
uB (x, 0) = 0
u(0, y) = g(y)
uA(0, y) = 0
u(x, 0) = f (x)
uB (x, 0) = 0
u
(0, y) = g(y)
x
uA
(0, y) = 0
x
uB
(0, y) = g odd (y)
x
can be split into two subproblems for LuA = h(x, y) and LuB = h(x, y), and whose
boundary conditions are given in Table 12.5, where
+
+
f (x)
for x 0
f (x)
for x 0
even
odd
(x) =
f
and
f (x) =
f (x) for x < 0
f (x) for x < 0
EXAMPLE 12.14.
x0
u
(0, y) = 0. Note that using a Laplace transform
x
with respect to x would require another boundary condition because the reduced
ordinary differential equation will be second order. Thus we use the method of
images and then apply the Fourier transform method. In this case, there is no
need for subproblem B.
By extending f to f even and following the same approach in Example 12.4,
we obtain
1
y
1 even
y
even
(x)
u(x, y) = f
=
f
d
()
2
2
x +y
(x )2 + y2
1 even
y
1 even
y
=
f
d
f
d
()
()
2
2
0
(x ) + y
0
(x + )2 + y2
y
1
y
=
f ()
d
0
(x )2 + y2
(x + )2 + y2
subject to u(x, 0) = f (x) and
12.7 EXERCISES
(12.86)
478
2
1 + ex2
t 0, let = !
, thus
42 t
!
1
2
u(x, t) =
e f x 42 t d
12.7 Exercises
1 for |x| 1
f (x) =
0 otherwise
E12.5. Recall the Black-Scholes equation described in Exercise E11.9,
u
1
2u
u
= 2 x2 2 + r x
u
t
2
x
x
(Note: recall that t is the time from expiry of the option not actual time.)
1. Taking a hint from the solution of Euler-Cauchy equations (cf. (6.40)),
show that with z = ln(x), the Black-Scholes equation can be transformed
to be
2 2 u
u
2 u
=
+ r
ru
t
2 z2
2 z
2. Next, using u(z, t) = e(Az+Bt) q(z, t), find the values of A and B such that
the equation reduces to a diffusion equation given by
q
2 2 q
=
t
2 z2
3. Using the solution given in Exercise E12.4, show that the solution that
satisfies the European call option: u(0, t) = 0, u(x, t) x as x
and u(x, 0) = max(x K, 0), where K is the strike price, is the BlackScholes formula for European call option given by
(12.88)
xN (d1 ) Kert N (d2 )
2
r + ( /2) t + ln(x/K)
d1 =
2 t
r ( 2 /2) t + ln(x/K)
d2 =
2 t
where N () is the cumulative normal distribution function defined as
follows:
1
+
erf
/ 2
1
2
N () =
e /2 d =
2
2
(Hint: By completing squares in the arguments, it can be shown that
2
2
1
1
b 4ac
b + 2a
exp a + b + c d = exp
N
4a
a
2a
where a > 0.)
u(x, t)
E12.6. Obtain the solution for the reaction-diffusion equation with time-dependent
coefficient K(t) given by
u
2u
= D 2 K(t)u
t
x
subject to u(x, 0) = f (x) and u(x, t) < for t > 0 and < x < ; use
the Fourier transform method with respect to x. Compare the solution with
one obtained using the transformation suggested in Exercise E11.11.
479
480
1
1
L exp
=
4 e s
(12.90)
4t
t3
E12.9. Using a similar approach as Example 12.9, show that
e1/s
L t/2 J 2 t = +1
(12.91)
s
Thus, using the scaling properties, show that for > 0
'
(
1
L1
exp
= I0 2 x
(12.92)
s
s
E12.10. To obtain the inverse Laplace transform of
1
s
F (s) = exp
s
s+1
First show that
1
s
1
1
s
exp
d + = exp
s+1
s+1
s
s
s+1
0
Using this identity and (12.92), prove that
'
(
!
1
s
L1
exp
= H (t) +
e(t) J 0 2 t d
(12.93)
s
s+1
0
E12.11. The concentration of a reactant A undergoing a first-order reaction A P
in a plug flow reactor can be modeled by
c
c
+v
= kc for 0 x L, t 0
t
x
subject to c(x, 0) = f (x) and c(0, t) = g(t), where v is the constant velocity
of the flow. Use the Laplace transform method to solve this problem. Compare this with the solution obtained using the method of characteristics (cf.
Section 10.1). (Note that the method of characteristics can also handle an
nth -order reaction, i.e., with kcn instead of kc.)
E12.12. Consider a heat exchanger tube immersed in a temperature bath. The temperature of fluid flowing through the tube can be modeled as
u
u
+v
= (H) ub(t) u
t
x
12.7 Exercises
481
(Hint: You may need equation (12.93).) Obtain a set of time-lapse plots of
u(x, t)/u0 .
E12.14. The dynamic equations describing the current and voltage distribution along
a thick cable can be modeled by
I
V
+C
+ GV = 0
t
t
V
I
+ L + RI = 0
t
t
where I(x, t) and V (x, t) are the current and voltage values, whereas C,
R, L, and G are constant capacitance, resistance, inductance, and leakage,
respectively.
1. Show that these equations can be combined to yield
2
2u
u
2 u
+
+
)
+
u
=
(
t2
t
x2
G
R
1
where u is either V (x, t) or I(x, t), with = , = and =
. These
C
L
LC
equations are known as the telegraph equations.
2. For the special case of = , use the Laplace transform method to solve
the equation for the conditions
u
u(0, t) = g(t); u(x, 0) =
(x, 0) = 0 and u(, t) = 0
t
E12.15. Consider the Nusselt problem describing the heat transfer for a plug flow,
as given by the following equation:
v u
2 u 1 u
= 2 +
z
r
r r
0 r R,
0zL
u
(z, 0) = 0.
r
1. Applying Laplace transform with respect to z, show that this results in
uin
uR uin J 0 ia s
L [u] =
+
s
s
J 0 ib s
7
7
v
v
where a = r
and b = R
(with i = 1).
2. Using the approach used in Example 12.7 to find the inverse Laplace
transform, obtain the solution u(z, r).
subject to: u(z, R) = uR , u(0, r) = uin and
482
E12.16. Consider the shallow water model given in Example 10.8. After linearization, we obtain the system described as follows:
h
0
H
0
t
+
=
v
v
g
0
0
t
x
where h(x, t) and v(x, t) are the height of the surface and velocity along
x, respectively. The constants g and H are the gravitational acceleration
and mean depth, respectively. Assume that the initial and boundary conditions are h(x, 0) = v(x, 0) = 0 and v(0, t) = f (t), with v(x, t) < . Using
the Laplace transform method, show that the solution is given by
6
x
H
x
v(x, t) = f t
and
h(x, t) =
f t
g
Hg
Hg
E12.17. Consider the diffusion equation given by
u
2u
=D 2
t
x
subject to u(x, 0) = f (x) and u(0, t) = g(t) for x 0 and t 0. Based on the
discussion in Section 12.6, the problem can be split into two subproblems:
uA
2 uA
=D 2
t
x
uB
2 uB
=D 2
t
x
where f odd is the odd function extension of f (x). Solve for uA using the
Fourier transform method and uB using the Laplace transform method.
(Note that because the time derivative is first order, only one initial condition
is needed.) Thus obtain the final solution as u(x, t) = uA + uB.
13
484
Having developed matrix equations for one, two, or three-dimensional equations, the transition to time-dependent cases is straightforward. This is achieved by
using an approach called the semi-discrete method, which essentially reduces the partial differential equations first into initial value problems. This means that the techniques covered in Chapter 7 can be used directly. However, due to the size of the problems, the Euler methods (forward and backward difference), as well as the averaged version known as the Crank-Nicholson method, are often the methods of choice. As it
was in Chapter 7, the issue of stability becomes important, where implicit methods
have the advantages of much larger stability regions. In Section 13.4, we discuss the
stability analysis based on the spectral radius, but we also include another method
based on Fourier analysis known as the von Neumann analysis.
We include the use of finite difference equations for handling hyperbolic equations in Section M.3 as an appendix. Recall that the challenge for handling these
types of equations is that the solutions are expected to travel as waves. Thus discontinuities will be propagated in the path of characteristics (in nonlinear cases, these
could also involve shock formations). Some of the methods are simple substitution
of finite difference formulas such as the upwind formulas. However, other methods, such as the Lax-Wendroff method, which uses the Taylor series approximation
directly on the differential equation, will arrive at a different scheme altogether.
We do not cover the solution of nonlinear partial differential equations. Instead,
we just note that the extension to some nonlinear problems, such as for semilinear or
quasilinear partial differential equations, are often straightforward. The additional
complexity will be the convergence of the linearized formulas, either by successive
substitution or Newton-type approaches.
In this section, we discuss the first and second steps in more detail, that is, how
to obtain various finite difference approximations of derivatives. Specifically, we
develop the formulas for first- and second-order derivatives. The third step, that is,
the inclusion of boundary conditions, is discussed in the next section.
Let us first set some specific index notations. We assume that the t, x, y, and z domains have been discretized by uniform increments of Δt, Δx, Δy, and Δz, respectively. Thus let u^{(q)}_{k,n,m} be defined as

u^{(q)}_{k,n,m} = u(qΔt, kΔx, nΔy, mΔz)   (13.1)
For instance, if the x domain is 0 ≤ x ≤ 1 with K interior points, the grid points are x = kΔx, k = 0, 1, ..., (K + 1), where Δx = 1/(K + 1). The points x = 0 and x = (K + 1)Δx are at the boundaries.
When some of the dimensions are fixed, the corresponding superscripts or subscripts will be suppressed. For instance, in a time-varying system that is dependent on only one spatial dimension, we use u_k^{(q)} = u(qΔt, kΔx), whereas for a time-independent system in two spatial dimensions, we have u_{k,n} = u(kΔx, nΔy).
Let us now start with the approximation of a first-order derivative. The Taylor series expansion of u_{k+1} = u(x_k + Δx) around x = x_k is given by

u_{k+1} = u_k + (du/dx)|_{x_k} Δx + (1/2)(d²u/dx²)|_{x_k} Δx² + ...   (13.2)

Rearranging,

(du/dx)|_{x_k} = (u_{k+1} − u_k)/Δx + O(Δx)

and dropping the O(Δx) term gives the forward difference approximation

(du/dx)|_{x_k} ≈ (u_{k+1} − u_k)/Δx   (13.3)

Similarly, the Taylor series expansion of u_{k−1} = u(x_k − Δx) around x = x_k is

u_{k−1} = u_k − (du/dx)|_{x_k} Δx + (1/2)(d²u/dx²)|_{x_k} Δx² − ...   (13.4)

Rearranging,

(du/dx)|_{x_k} = (u_k − u_{k−1})/Δx + O(Δx)

which yields the backward difference approximation

(du/dx)|_{x_k} ≈ (u_k − u_{k−1})/Δx   (13.5)

Next, subtracting (13.4) from (13.2),

u_{k+1} − u_{k−1} = 2 (du/dx)|_{x_k} Δx + 2 (Δx³/6)(d³u/dx³)|_{x_k} + ...   (13.6)

from which we get

(du/dx)|_{x_k} = (u_{k+1} − u_{k−1})/(2Δx) + O(Δx²)   (13.7)

and upon dropping the O(Δx²) terms, the central difference approximation

(du/dx)|_{x_k} ≈ (u_{k+1} − u_{k−1})/(2Δx)   (13.8)
486
D
=
j uk+j
P
dxP xk
xP j =
(13.9)
The choice for the indices 1 and 2 will determine both the order of approximation
as well as the bandwidth of the matrices used to solve the finite difference equations
once they are applied to the differential equations. For instance, the forward, backward, and central difference formula given in (13.3), (13.5), and (13.7) uses the limits
(1 , 2 ) = (0, 1), (1 , 2 ) = (1, 0), and (1 , 2 ) = (1, 1), respectively. Both forward
and backward approximation formulas have bandwidths of 2 yielding a first-order
approximation, whereas the central approximation formula has a bandwidth of 3 but
yields a second-order approximation.
One approach to find the coefficients j is known as method of undetermined
coefficients and is based on the Taylor series of u or of its Mth derivative at x = xk+j ,
which we can rewrite in the following form:
dM u
1
=
(M),j
dxM xk+j
xM
(13.10)
=0
where
d u
r =
xr
,
dxr xk
r
r,s
r
s /r!
1
=
if r > 0
if r = 0
(13.11)
if r < 0
The form given in (13.9) does not include derivatives at neighboring points. Thus
substituting (13.10) with M = 0, 1 0, 2 0 and P (2 1 ) into (13.9),
dP u
dxP xk
P
xP
2
1
u
P j k+j
x
j =1
(2 1 )
2
j j +
xP j =
=0
=(2 1 )+1
2
j j
xP j =
1
By setting the second sum on the right side to be the truncation error, we have the
following lemma:
The coefficients j of the finite difference approximation of a Pth -order
derivative under the form given by (13.9), with 1 0, 2 0 and P (2 1 ) is
given by
LEMMA 13.1.
0,1
1
..
..
. =
.
2
2 1 ,1
..
.
0,2
..
.
2 1 ,2
eP+1
(13.12)
487
where eP+1 is the (P + 1)th unit vector of length (2 1 + 1), yielding a truncation
error given by
Error =
=(2 1 )+1
2
j j
j =1
d u
xP
dx xk
(13.13)
1
1
1
1
1
0
1
0 = 1 0
1 0 = 2
1
1
1/2 0 1/2
1
Thus
d2 u
1
x2
(13.14)
Looking at the truncation error, we see that the leading term in the summation
found in (13.13) is zero, that is,
(1)3
13
02+
=0
3!
3!
This means that the level of approximation in (13.14) is second-order, which is a
fortunate case because the leading terms of the truncation errors do not vanish
in general.
3,1 1 + 3,0 0 + 3,1 1 =
The finite difference form given in (13.9) can be extended to include derivatives
at the neighboring points:
dP u
Dp =
dxP xk
j [1 ,2], [0,P]
xP
j,
d u
dx xk+j
(13.15)
LEMMA 13.2.
j,
(),j j, = P
(13.16)
488
where rs is the Kronecker delta (i.e., rr = 1 and 0 otherwise). The truncation error is
given by
d u
(),j j,
Error =
xP
(13.17)
dx xk
=Q+1
1,1
x
+ 0,0 uk + 1,0 uk+1
dx2
x2
dx xk1
EXAMPLE 13.2.
0 1
1
1,1
0
1 0
1 0,0 = 0
1 0 1/2
1,0
1
or
d2 u
2
2
dx
3
x2
1,1
1
0,0 = 2 1
3
1,0
1
du
x
uk + uk+1
dx xk1
(13.18)
Remarks: A MATLAB code to obtain the coefficients of a finite difference approximation of a Pth -order derivative based on a given list of indices J and is available
on the books webpage as eval_pade_gen_coef.m. The function is invoked by
the statement [v,ord]=eval_pade_gen_coef(J,Lambda,P) to obtain v as
the vector of coefficients and ord as the order of approximation.
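For readers without access to that file, the core of the method of undetermined coefficients in Lemma 13.1 fits in a few lines of MATLAB: build the matrix of scaled powers j^r/r! over the chosen stencil and solve against the (P+1)-th unit vector. The sketch below is an independent illustration (it is not the book's eval_pade_gen_coef.m) and uses the central three-point stencil as an example.

% Minimal sketch of the method of undetermined coefficients in Lemma 13.1:
% find the weights of a P-th derivative on the stencil k+j, j in J.
P   = 1;             % order of the derivative
J   = -1:1;          % stencil offsets (central difference)
n   = numel(J);
A   = zeros(n, n);
for r = 0:n-1
    A(r+1, :) = (J.^r) / factorial(r);   % Taylor coefficients j^r / r!
end
e      = zeros(n, 1);  e(P+1) = 1;
lambda = A \ e;      % here lambda = [-1/2; 0; 1/2], i.e. (u_{k+1}-u_{k-1})/(2*dx)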
A list of common finite difference approximations for the first- and secondorder derivatives based on the method of undetermined coefficients is given in
Tables 13.1 and 13.2, respectively. These are used later in formulating the numerical
solution of differential equations. As expected, more terms are needed for increased
precision. A balance has to be struck between a smaller Δx and using more terms in the finite difference approximations. Having more terms in the approximation formula will increase the number of computations due to a matrix with larger bandwidths. However, a smaller Δx will also increase the number of computations because the size of the matrix will have to be enlarged to achieve a similar precision.
Note that items 7 through 12 of Table 13.1 and items 5 through 12 of Table 13.2
involve derivatives at the leftmost or rightmost neighbor. These approximation formulas are used when dealing with Neumann boundary conditions.
For mixed derivatives, the same method of undetermined coefficients can be
used with the Taylor series expansion of multivariable functions but would require
2u
, we have
much more complex formulation. For the case of
tx
2 u
1 (q+1)
(q+1)
(q1)
(q1)
u
u
(13.19)
u
k+1
k1
k+1
k1
xt (xk ,tq )
4
x
t
Item
489
du
dx xk
Approximation formula
Error order
1
uk + uk+1
O (
x)
1
uk1 + uk
O (
x)
1
uk1 + uk+1
2
x
O
x2
1
+uk2 8uk1 + 8uk+1 uk+2
12
x
O
x4
1
3uk1 10uk + 18uk+1 6uk+2 + uk+3
12
x
O
x4
1
uk3 + 6uk2 18uk1 + 10uk + 3uk+1
12
x
O
x4
1
3
'
(
1
2
2
du
uk1 +
uk +
3
x
dx k+1
O
x2
'
(
1
du
38
9
54
7
3
uk1
uk +
uk+1
uk+2
75
dx k2
x
O
x4
10
'
(
1
du
197
279
99
17
18
uk +
uk+1
uk+2 +
uk+3
150
dx k1
O
x4
11
'
(
1
7
54
9
du
38
uk2
uk1 +
uk +
uk+1 3
75
x
x
dx k+2
O
x4
12
'
(
1
17
99
279
du
197
uk3 +
uk2
uk1 +
uk + 18
150
x
dx k+1
O
x4
'
du
dx
2
2
uk +
uk+1
x
k1
O
x2
with Error = O
t2 ,
x2 . Likewise, we have
2 u
1
u
u
+
u
k+1,n+1
k+1,n1
k1,n+1
k1,n1
xy (xk ,yn )
4
x
y
(13.20)
with Error = O
x2 ,
y2 . The details for this result can be found in example M.1
contained in the appendix as Section M.1. For higher order approximations, a simpler
490
d2 u
dx2 xk
Item
Approximation formula
Error order
1
uk1 2uk + uk+1
2
O
x2
1
u
+
16u
30u
+
16u
u
k2
k1
k
k+1
k+2
12
x2
O
x4
1
10u
15u
4u
+
14u
6u
+
u
k1
k
k+1
k+2
k+3
k+4
12
x2
O
x4
1
uk4 6uk3 + 14uk2 4uk1 15uk + 10uk+1
2
12
x
O
x4
'
(
1
du
2
x
2u
+
2u
k
k+1
3
x2
dx k1
O (
x)
(
'
1
du
2u
+
2
x
2u
k1
k
3
x2
dx k+1
O (
x)
'
(
1
du
6
x
4uk + 2uk+1 + 2uk+2
11
x2
dx k1
O
x2
'
(
1
du
2u
+
2u
4u
+
6
x
k2
k1
k
11
x2
dx k+1
O
x2
du
1
30
x
+ 946uk1 1905uk
dx k2
822
x2
+ 996uk+1 31uk+2 6uk+3
O
x4
10
du
1
600
x
+
945u
3548u
k
k+1
dx k1
1644
x2
+ 3918uk+2 1572uk+3 + 257uk+4
O
x4
11
du
+946uk+1 30
x
822
x2
dx k+2
O
x4
12
du
+ 945uk + 600
x
1644
x2
dx k+1
O
x4
but slightly limited approach for mixed partial derivatives is to build up the finite
difference approximation by using the fact that
2u
=
xy
x
u
y
3u
,
=
x 2 y
x
2u
y2
, . . . , etc.
491
Thus (13.20) could also have been derived by applying central difference approximations on the variable y followed by another central difference approximation on
the variable x, that is,
ui,n+1 ui,n1
2u
ui,n+1 ui,n1
1
=
+ O
x2 ,
y2
xy
2
y
2
y
2
x
i=k+1
i=k1
In case one needs to accommodate a Neumann boundary condition at y = 0, another
useful form based on item 7 in Table 13.1 is given by
8
2u
1
y u
u
2
=
(uk+1,n uk1,n )
xy
2
x
y 3
y k+1,n1
y k1,n1
3
(
2
+ (uk+1,n+1 uk1,n+1 )
(13.21)
3
The derivation of (13.21) is included as exercise E13.2.
(13.22)
Note that the problem is a two-point boundary value problem that can also be
approached using the techniques given in Section 7.5 based on shooting methods.
Nonetheless, the matrix formulations in this section are used when they are extended
to the 2D and 3D cases.
Define the following vectors and matrices as:
u1
u = ...
uK
0
..
0
..
.
K
= ...
K
where uk = u(k
x), k = (k
x), k = (k
x), and k = (k
x). The terms connected with the derivatives are given in matrix form as
du
D 1 u + b1
dx
and
d2 u
D 2 u + b2
dx2
The elements and structure of the K K matrix DP are formed using the formulas
in Tables 13.1 and 13.2, depending on the choice of precision and types of boundary
conditions, whereas the elements of the K 1 vector bP contain the given boundary
492
values of the problem. Thus the finite difference approximation of (13.22) can be
represented by a matrix equation
(D2 u + b2 ) + (D1 u + b1 ) + u + = 0
where
A = D2 + D1 +
and
Au = b
(13.23)
b = b 2 + b1 +
and the solution can be found using techniques found in Chapter 2 that take advantage of the band structure and sparsity of matrix A. What remains at this point is to
specify D1 , D2 , b1 , and b2 .
We limit our discussion to two cases. In the first case, the boundary conditions
are Dirichlet type at both x = 0 and x = 1, that is, u(0) = Di0 and u(1) = Di1 . Using
the formulas in Tables 13.1 and 13.2, we have for Error = O
x2 (where we denote
the Dirichlet or Neumann boundary condition by (Di) or (Neu), respectively as a
superscript for x = 0 and as a subscript for x = 1)
Di0
0
1
0
' ((Di)
' ((Di)
..
..
.
.
1
1
1
..
and
b
=
D1
=
1
.
.
.
2
x
2
x
.
.
(Di) O(
x2 )
(Di) O(
x2 )
. 1
.
0
1 0
Di1
(13.24)
' ((Di)
D2
(Di)
1
=
x2
O(
x2 )
..
..
..
1
..
.
.
1
' ((Di)
1
and b2
=
2
(Di) O(
x2 )
x
1
2
Di0
0
..
.
0
Di1
(13.25)
For higher precision, that is, Error = O
x4 , we have
10 18 6
1
8
0
8 1
1
8
0
8
1
' ((Di)
1
.
.
.
..
..
..
..
D1
=
.
12
x
(Di) O(
x4 )
1 8
0
1
8
1
' ((Di)
b1
(Di)
O(
x4 )
3Di0
Di0
0
..
.
12
x
Di1
3Di1
..
.
8
0
18
8
10
(13.26)
493
13
0.6
0.5
0.5
0.4
x 10
Error
0.3
1.5
0.2
0.1
0
0.2
0.4
0.6
0.8
2.5
0
0.2
0.4
0.6
0.8
Figure 13.1. A plot of the exact solution for u (solid line) and finite difference solution (points). On the
right is a plot of the errors of the approximate solution from the exact solution.
'
((Di)
D2
(Di)
O(
x4 )
12
x2
15
16
1
4
30
16
..
.
14
16
30
..
.
6
1
16
..
.
1
..
.
16
1
6
30
16
14
' ((Di)
b2
(Di)
EXAMPLE 13.3.
O(
x4 )
10Di0
Di0
0
..
.
12
x2
Di1
10Di1
..
.
16
30
4
16
15
(13.27)
Consider the boundary value problem

d²u/dx² + (3 − x) du/dx + 5u = 10x³ + 22.5x² − 3x − 5.5

subject to Dirichlet conditions u(0) = 0.1 and u(1) = 0.6. The exact solution is known to be u(x) = 5x³ − 7.5x² + 3x + 0.1.
Using Δx = 0.01 plus the matrices defined in (13.26) and (13.27) for high precision, we obtain the plots for the solution and errors shown in Figure 13.1. Note that the errors are within 10⁻¹². It can be shown that using (13.24) and (13.25) instead would have yielded errors within 10⁻³.
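A minimal MATLAB sketch of this example, using second-order matrices of the form (13.24)–(13.25) assembled with spdiags, is given below; the grid size is an arbitrary choice, and the boundary values are folded into the right-hand side as described above.

% Minimal sketch of Example 13.3 with second-order central differences
% and Dirichlet boundary data folded into the right-hand side.
K  = 99;  dx = 1/(K+1);
x  = (1:K)'*dx;
g  = 10*x.^3 + 22.5*x.^2 - 3*x - 5.5;              % right-hand side
e  = ones(K,1);
D2 = spdiags([e -2*e e], -1:1, K, K) / dx^2;       % d^2u/dx^2
D1 = spdiags([-e zeros(K,1) e], -1:1, K, K) / (2*dx);  % du/dx
Di0 = 0.1;  Di1 = 0.6;                             % boundary values
b2 = zeros(K,1);  b2(1) = Di0/dx^2;    b2(K) = Di1/dx^2;
b1 = zeros(K,1);  b1(1) = -Di0/(2*dx); b1(K) = Di1/(2*dx);
B  = spdiags(3 - x, 0, K, K);                      % coefficient (3 - x)
A  = D2 + B*D1 + 5*speye(K);
u  = A \ (g - b2 - B*b1);
uex = 5*x.^3 - 7.5*x.^2 + 3*x + 0.1;
err = max(abs(u - uex));                           % should be O(dx^2)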
For the second case, we consider the boundary condition at x = 0 to be a Neumann type and the boundary condition at x = 1 to remain a Dirichlet type, that is,
494
du/dx(0) = Neu0 and u(1) = Di1 . Then we can use item 7 in Table 13.1 and item 7
in Table 13.2 for Error = O
x2 ,
'
((Neu)
D1
(Di)
34
1
1
2
x
O(
x2 )
0
..
.
O(
x2 )
2
x
'
((Neu)
D2
(Di)
O(
x2 )
1
=
x2
4
11
O(
x2 )
1
=
x2
x
3
2
..
.
.
0
1
Neu0
0
..
.
x
11
..
.
2
1
Neu0
0
..
.
0
Di1
(13.28)
2
11
1
..
.
1
1
0
..
0
Di1
2
11
' ((Neu)
b2
(Di)
1
..
.
1
' ((Neu)
b1
(Di)
4
3
1
2
(13.29)
The matrices for the second case above corresponding to Error = O
x4 is left as
an exercise in E13.3.
Another approach to handling Neumann boundary conditions is to apply a central difference approximation of the derivative itself. For instance, let the Neumann
boundary condition be given as du/dx(0) = Neu0 . Then this can be approximated
first as
(du/dx)|_{x_0} ≈ (u_1 − u_{−1})/(2Δx)   ⟹   u_{−1} = u_1 − 2Δx Neu_0   (13.30)

Because u_{−1} is a value at a fictitious or ghost point (that is, it is not in the actual problem domain), the approach is sometimes known as the method of ghost points. To
complete this method, the solution domain will be extended to include the point at
x0 ; that is, we extend the previous definitions as
u =
u0
u1
..
.
uK
1
..
0
1
0
=
.
..
.
K
0
K
0
1
..
.
K
495
' ((Neu)
D1
(Di)
O(
x2 )
0
1
1
2
x
0
0
..
.
' ((Neu)
b1
(Di)
O(
x2 )
2
x
' ((Neu)
D2
(Di)
O(
x2 )
2
1
1
x2
' ((Neu)
b2
(Di)
O(
x2 )
1
=
x2
1
..
.
1
..
.
0
1
0
Di1
(13.31)
1
..
.
1
..
.
2
1
2
xNeu0
0
..
.
1
0
2
xNeu0
0
..
.
2
2
..
.
0
Di1
1
2
(13.32)
(13.33)
where
A = D2 + D1 +
EXAMPLE 13.4.
and
b = b2 + b1 +
Consider again the equation

d²u/dx² + (3 − x) du/dx + 5u = 10x³ + 22.5x² − 3x − 5.5

except this time we have a Neumann condition du/dx(0) = 3 and a Dirichlet condition u(1) = 0.6. The exact solution is still u(x) = 5x³ − 7.5x² + 3x + 0.1.
Using Δx = 0.01, the plots shown in Figure 13.2 compare the result using the method of ghost points and the result using the matrices in (13.28) and (13.29). For both methods the errors are within 10⁻², with the direct method slightly better for this example.
1 i K, 1 j N
and u0,j , uK+1,j , ui,0 and ui,N+1 refer to the boundary values. With this convention,
we can put the terms corresponding to finite difference approximations of the partial
derivatives in (13.36) in the following matrix forms:
u
D(1,x) U + B(1,x)
x
2u
D(2,x) U + B(2,x)
x2
u
T
T
UD(1,y)
+ B(1,y)
y
2u
T
T
UD(2,y)
+ B(2,y)
y2
(13.34)
u0,1
u0,N
0
0
1
.
.
..
.
B(1,x) =
=
b(1,x) y1
b(1,x) yN
2
x
0
0
uK+1,1
uK+1,N
For the mixed derivative 2 u/(xy), the matrix formulation can be obtained by
applying the partial derivatives in sequence, that is,
2u
T
D(1,x) UD(1
+ B(1,x,1,y)
y)
xy
(13.35)
497
where
T
T
B(1,x,1,y) = D(1,x) B(1,y)
+ B(1,x) D(1,y)
+ C(1,x,1,y)
and C(1,x,1,y) introduces data from extreme corner points of the domain. For instance,
if the boundary conditions are all of Dirichlet types, while using the central difference
formulas of first-order derivatives, we have
u0,0
0 0
u0,N+1
0
0
.
.
..
.
C(1,x,1,y) =
0
.
4
x
y
0
0
uN+1,0 0 0 uN+1,N+1
With these matrix representations, we can now formulate the finite difference
solution of a linear second-order linear differential equation given by
xx (x, y)
2u
2u
2u
+
(x,
y)
+
(x,
y)
xy
yy
x2
xy
y2
+ x (x, y)
u
u
+ y (x, y)
+ (x, y)u + (x, y) = 0
x
y
(13.36)
vec (BAC)
vec (A B)
vec(A) vec(B) = Adv vec(B)
(13.38)
(13.39)
498
T
xx dv vec B(2,x) + xy dv vec B(1,x,1,y) + yy dv vec B(2,y)
T
+ x dv vec B(1,x) + y dv vec B(1,y)
+ vec
subject to u(0, y) = g a (y), u(1, y) = g b(y), u(x, 0) = hc (x) and u(x, 1) = hd (x).
In this case, xx (x, y) = yy (x, y) = 1 and the other coefficients are zero. Let us
now use the central difference formulas
2 1
0
2 1
0
.
.
1
1
1 2 . .
1 2 . .
and D(2,y) =
D(2,x) =
2
2
.
.
.
.
y
.. .. 1
.. .. 1
0
1 2
0
1 2
hc
hd
(ga )1
(ga )N
1
1
0
0
0
..
..
..
..
B(2,x) =
and
B
=
(2,y)
.
.
.
0
0
0
(gb)1
(gb)N
hc
hd
K
with D(2,x) and D(2,y) having sizes K K and N N, respectively, and (ga )i =
g a (i
y)/
x2 , hc = hc (j
x)/
y2 , . . . , and so forth. The matrices in (13.39)
j
then become
R2D
AK
sI
K
=
sIK
AK
..
.
0
..
..
sIK
sIK
AK
and
(ga )1 + hc
1
h
c
..
= vec
.
hc
K1
(gb)1 + hc
f2D
r
where AK =
(ga )N1
(ga )N + hd
1
hd
hd
(gb)2
r
..
(ga )2
499
(gb)N1
..
.
K1
(gb)N + hd
, s =
y2 , r =
x2 and q = 2 (r + s).
..
..
. r
.
0
r
q
Note that R2D has a block tri-diagonal structure and AK has a tri-diagonal structure. These special structures allow for efficient methods such as the Thomas
algorithm and the block Thomas algorithms discussed in Section 2.2.2 and
Exercise E2.16, respectively, to be used.
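For reference, a minimal MATLAB sketch of the Thomas algorithm is given below; the vector names are illustrative, and no pivoting is performed, which is adequate for the diagonally dominant systems that arise here.

% Minimal sketch of the Thomas algorithm for a tridiagonal system A*u = d
% with sub-, main-, and super-diagonals a, b, c (a(1) and c(n) unused).
function u = thomas(a, b, c, d)
    n = numel(d);
    for k = 2:n                       % forward elimination
        m    = a(k) / b(k-1);
        b(k) = b(k) - m*c(k-1);
        d(k) = d(k) - m*d(k-1);
    end
    u    = zeros(n, 1);
    u(n) = d(n) / b(n);
    for k = n-1:-1:1                  % back substitution
        u(k) = (d(k) - c(k)*u(k+1)) / b(k);
    end
end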
EXAMPLE 13.6.
..
0 x, y 1
(13.40)
where
x
y
5
2
g(x, y) = 2x 2x +
+ cos
2 cos (y) + 52 8 sin (y)
2
2
4
subject to
u(0, y)
u(1, y)
1
y sin (y)
4
1
4
u(x, 0)
u(x, 1)
1
(2x 1)2
4
1
(2x 1)2
4
(13.41)
Figure 13.3. The finite difference solution to (13.40) subject to conditions (13.41) is shown on the left plot, whereas the error from the exact solution is shown on the right plot.
2 u 1 u
1 2u
+
+ 2 2
2
r
r r
r
uk,1 = uk,N+1
u(r,
) = u(r, 2
)
uk,0 = uk,N
(13.42)
Thus, with U = uk,n [=]K N, uk,n = u (rk , n ), the central difference formulas
will yield
per T
u
= U D1
and
per T
2u
= U D2
2
(13.43)
where
per
D1
1
1
=
2
1
..
.
..
..
..
.
1
and Dper = 1 1
2
1
0
1
1
..
.
..
..
..
.
1
1
2
(13.44)
501
which are tri-diagonal matrices with additional terms at positions (1, N) and (N, 1).
per
per
Note that both D1 and D2 are circulant matrices.
There are many approaches to handling the complications at the origin. We
discuss a simple approach that will stretch the domain by introducing fictitious
points at r < 0 and a discretization that would bypass the origin yet make the fictitious points naturally disappear from the finite difference approximation equations.1
Assume that the radius has been normalized to be 0 r 1. Let k = 1, . . . , K with
2
r =
and rk = (k 21 )
r. Thus rK+1 = 1 and the fictitious points occur at
2K + 1
r0 =
r/2. With this choice and under the second-order central difference approximation of Laplacian at k = 1, the formula becomes
u2,n 2u1,n + u0,n
1 u2,n u0,n
1 u1,n+1 2u1,n + u1,n1
2 ur=
r/2
+
+
2
r/2
2
r
r2 /4
2
in which the terms involving u0,n do cancel out to yield
u2,n 2u1,n
1
u2,n
1 u1,n+1 2u1,n + u1,n1
2 ur=
r/2
+
+
2
r
(
r/2) 2
r
r2 /4
2
Assuming Dirichlet conditions at r = 1 given by u(1, ) = g(), we have
per T
2 u D2 U + B2 + V D1 U + V B1 + WU D2
(13.45)
where U = uk,n , uk,n = u(rk , n ), D2 and D1 are K K matrices given by (13.24)
per
1
1
H and B1 =
H, with
2
r
2
r
H=
0
..
.
0
g(1 )
0
..
.
0
g(N )
Au = b
where u = vec (U) and
per
IN (D2 + V D1 ) + D2 W
(13.46)
This method is based on M. C. Lai, A note on finite difference discretizations for Poisson equation on a disk, Numerical Methods for Partial Differential Equations, vol. 17, 2001, pp. 199-203.
Figure 13.4. The errors of the finite difference solution from the exact solution for the Laplace equation in polar coordinates given in Example 13.7.
These results can be extended to handle the more general case of nonhomogeneous
Helmholtz equation given by
2 u + h(r, )u = f (r)
The finite difference solution for this case is included as an exercise in E13.7.
The structure of A is a KN KN, block-circulant matrix, which is a block tridiagonal matrix with additional blocks at the corners, that is,
G H
H
H G ...
A=
.
.
..
.. H
H
H G
and there are efficient algorithms for solving problems with this special structure.
(See, e.g., Section B.6.1, where the matrices can be split to take advantage of the
block tri-diagonal inner structure.)
Consider the Laplace equation in polar coordinates 2 u = 0
with boundary condition u(1, ) = 1 + cos(3). The exact solution was found
in Example 11.6 to be
EXAMPLE 13.7.
u = 1 + r3 cos(3)
Based on (13.46), the errors of the finite difference solution from the exact
solution are shown in Figure 13.4 while using K = 30 and N = 60 or
r = 0.328
and
= 0.1047. The errors are within 2 103 .
For the case of spherical coordinates, the Laplacian is given by
2u =
2 u 2 u
1 2u
cos u
1
2u
+
+
+
+
r2
r r
r2 2
r2 sin
r2 sin2 2
per
per
(13.47)
The same matrices D1 and D2 defined in (13.44) are needed to satisfy the periodicity along the and variables. However, to handle the issue of the origin, there
503
is no need to change the discretization along r as was done for the case of polar or
cylindrical coordinates. This means that we can set rk = k
r, k = 1, . . . , K, where
r = 1/(K + 1). To see this, take the finite difference approximation of the terms in
(13.47) that involve partial derivatives with respect to r at r = r1 =
r
2
u 2 u
u2,n,m 2u1,n,m + u0,n,m
u2,n,m u0,n,m
+
+
2
2
r
r r r=r1 =
r
r2
=
2u2,n,m 2u1,n,m
r2
Thus, for the case of azimuthal symmetry, that is, u = u(r, ), we can use the
regular discretizations k = 1, . . . , K,
r = 1/(K + 1), rk = k
r and n = 1, . . . , N,
= 2/(N + 1), = (n 1)
to obtain a matrix equation similar to (13.45) that
approximates the Laplacian operation in spherical coordinates, that is,
per T
per T
+ WU D1
Q (13.48)
2 u D2 U + B2 + 2V D1 U + 2V B1 + WU D2
per
per
Au = b
(13.49)
per
per
IN D2 + 2V D1 + D2 + QD1
W
Remarks: MATLAB codes that implement (13.46) and (13.49) for the 2D Poisson
equation under polar and spherical (with azimuthal symmetry), respectively,
are available on the books webpage as poisson2d_polar_dirich.m and
poisson2d_polar_sphere.m, respectively. The function is invoked by the
statement
[U,r,th,x,y]=poisson2d_polar_dirich(K,N)
(or [U,r,th,x,y]=poisson2d_sphere_dirich(K,N)) to obtain the solution
U of size K N at grid points x and y (or polar coordinates r=r and th=). The
program will need to be edited to customize its application with user-defined
forcing function f (r, ) and user-defined boundary conditions u(1, ) = g().
504
(13.50)
where u^{(q)} = u(qΔt) contains the values of u(t, x) at the grid points. Equation (13.50) is called the time-marching scheme; that is, from an initial condition u^{(0)} = u_0, (13.50) is evaluated iteratively as q is incremented. If the function f in (13.50) is not dependent on u^{(q+1)}, the scheme is classified as explicit; otherwise, it is classified as implicit. The integer p determines how many past values are used. For p > 0, we end up with a scheme that is known as a multilevel scheme.2
We limit our discussion only on linear time-marching schemes, that is,
u(q+1) =
p
(q)
i u(qi) + g (q)
(13.51)
i=1
(q)
d
u(t) =
F(t)u(t) +
B(t)
dt
(13.52)
(13.53)
When (13.53) is discretized with respect to time, this yields the time-marching equations of the form (13.51). This approach is known as the semi-discrete approach, also
known as the method of lines.
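As a minimal illustration of the method of lines, the sketch below semi-discretizes the one-dimensional diffusion equation u_t = u_xx with homogeneous Dirichlet conditions and hands the resulting stiff system dv/dt = Fv to MATLAB's ode15s; the grid size and initial profile are illustrative choices.

% Minimal sketch of the semi-discrete (method of lines) approach for
% u_t = u_xx on 0 < x < 1 with u(0,t) = u(1,t) = 0.
K  = 50;  dx = 1/(K+1);  x = (1:K)'*dx;
e  = ones(K,1);
F  = spdiags([e -2*e e], -1:1, K, K) / dx^2;    % discrete Laplacian
v0 = sin(pi*x);                                 % initial condition
[t, V] = ode15s(@(t,v) F*v, [0 0.5], v0);       % implicit, suited to stiffness
% exact solution for this initial profile: exp(-pi^2 t) sin(pi x)
err = max(abs(V(end,:)' - exp(-pi^2*t(end))*sin(pi*x)));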
For instance, consider the following linear, time-varying, second-order equation
in 2D space:
t
u
u
2u
2u
+
ts
+
ps
+
s
+ u +
t s=t,x,y ts p,s=x,y
ps s=x,y s
(13.54)
Using the procedures discussed in Section 13.2.2, the dependent variable u and the
coefficients t , ts , ps , s , and , with p, s = x, y, can be represented in matrix
2
More specifically, for p 0, we have a (p + 2)-level scheme in which two of the levels are for
t = (q + 1)
t and t = q
t.
505
u1,1
..
.
uK,1
..
.
u1,N
.. ;
.
uK,N
xx
..
.
(xx )1,1
..
=
.
(xx )K,1
(xx )1,N
..
.
(xx )K,N
; etc.
2u
t2
2u
tx
2u
ty
d
U
dt
d2
U
dt2
d
d
D(1,x) U + B(1,x) = D(1,x)
U +
dt
dt
d
d
T
T
UD(1,y)
+ B(1,y) =
+
U D(1,y)
dt
dt
d
B(1,x)
dt
d
B(1,y)
dt
(13.55)
d2
d
(vec(U)) + M1 (vec(U)) + M0 (vec(U)) + N = 0
dt2
dt
(13.56)
M2
M1
M0
dv
tt
dv
'
(dv
dv
+ tx
IN D(1,x) + ty
D(1,y) IK
R2D
'
(dv
dv d
d
f2D + tx
B(1,x) + ty
B(1,y)
dt
dt
and the terms R2D and f2D were given in (13.39). Next, let
vec(U)
v=
d
(vecU)
dt
(13.57)
For the special case of nonsingular M2 , we could rearrange (13.56) into semi-discrete
form (13.53), that is,
d
v(t) = F(t)v(t) + B(t)
dt
(13.58)
where
F=
M1
2 M1
M1
2 M0
and
B=
0
M1
2 N
506
Another simple case occurs for diffusion problems. In this case, t = 1 and
tt = tx = ty = 0. Then M2 = 0, M1 = 1, M0 = R2D , and N = f2D . With v =
vec(U), (13.56) reduces to
d
v = R2D v + f2D
dt
EXAMPLE 13.8.
(13.59)
by
u
= 2 u + u + (t, x, y)
t
(13.60)
u (t, x, 0) = w0 (t, x)
u (t, 1, y) = v1 (t, y)
u (t, x, 1) = w1 (t, x)
(13.61)
u1,1 . . . u1,N
..
..
U = ...
.
.
uK,1 . . . uK,N
1,1 (t)
..
(t) =
.
K,1 (t)
(13.62)
...
..
.
...
1,N (t)
..
.
K,N (t)
(13.63)
where
v
vec (U)
IN D(2,x) + D(2,y) IK + INK
T
vec + vec B(2,x) + B(2,y)
Once the problem has been reduced to an initial-value problem, the methods discussed in Chapter 7, including Runge-Kutta and multistep methods can be
employed. One concern, however, is that the size of the state vector can be enormous
as the spatial-grid resolution increases to meet some prescribed accuracy. Moreover,
matrix F in (13.53) will become increasingly ill-conditioned as the matrix size grows,
as shown in Example 13.9 that follows. As discussed in Section 7.4.2, stiff differential
equations favor implicit time-marching methods.
Ra Rb
0
Rb . . . . . .
(13.64)
R=
..
..
.
. Rb
0
Rb Ra
EXAMPLE 13.9.
where,
Ra =
..
.
..
..
..
; Rb =
0
..
8
2
; = 2 3; =
x2
maxi (|i |)
min j (| j |)
20
1. Forward Euler Schemes.

(v^{(q+1)} − v^{(q)})/Δt = F^{(q)} v^{(q)} + B^{(q)}   ⟹   v^{(q+1)} = (I + Δt F^{(q)}) v^{(q)} + Δt B^{(q)}   (13.65)
This scheme is an explicit type and is one of the easiest time-marching schemes
to implement. Starting with the initial condition, v(0) , the values of v(q) are
obtained iteratively. However, as with most explicit methods, stability is limited
by the size of time steps used. Sometimes, the time steps required to maintain
stability can be very small, resulting in a very slow time-marching process.3
Nonetheless, due to the ease of implementation, forward schemes are still used
in several applications, with the caveat that stability may be a problem under
certain parametric conditions.
2. Backward Euler Schemes.

(v^{(q+1)} − v^{(q)})/Δt = F^{(q+1)} v^{(q+1)} + B^{(q+1)}   ⟹   v^{(q+1)} = (I − Δt F^{(q+1)})^{−1} ( v^{(q)} + Δt B^{(q+1)} )   (13.66)
In some cases, the required time steps may be too small such that round-off errors become very
significant. In this case, the explicit scheme is impractical.
3. Crank-Nicholson Schemes.

(v^{(q+1)} − v^{(q)})/Δt = (1/2)[ F^{(q+1)} v^{(q+1)} + B^{(q+1)} ] + (1/2)[ F^{(q)} v^{(q)} + B^{(q)} ]

which can be rearranged as

v^{(q+1)} = ( I − (Δt/2) F^{(q+1)} )^{−1} [ ( I + (Δt/2) F^{(q)} ) v^{(q)} + (Δt/2) ( B^{(q+1)} + B^{(q)} ) ]   (13.67)
This is also an implicit method and again requires the inversion of (I − (Δt/2)F^{(q+1)}). One advantage of Crank-Nicholson schemes over backward Euler schemes is an increase in the order of accuracy: it can be shown that the accuracy of a Crank-Nicholson scheme is O(Δt², Δxⁿ), compared with the accuracy of a backward Euler scheme, which is O(Δt, Δxⁿ). However, for discontinuous or non-smooth boundary conditions, the Crank-Nicholson method can introduce undesired oscillations unless the value of Δt is small enough.5
For a simple exploration of the three methods applied to a one-dimensional
diffusion equation, see Exercise E13.9. For equations in two- and three-spatial
dimensions, other modifications such as ADI (alternate-dimension implicit)
schemes are also used (see the appendix given as Section M.4).
All three methods are special cases of an umbrella scheme called the Euler-θ method, which is also known as the weighted-average Euler method. The weighted-average Euler method is given by

(v^{(q+1)} − v^{(q)})/Δt = θ [ F^{(q+1)} v^{(q+1)} + B^{(q+1)} ] + (1 − θ) [ F^{(q)} v^{(q)} + B^{(q)} ]   (13.68)

From (13.68), we see that θ = 0, θ = 1/2, and θ = 1 yield the Euler-forward, Crank-Nicholson, and Euler-backward schemes, respectively.
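A minimal MATLAB sketch of the θ scheme (13.68) for a constant-coefficient system is given below; the function name and its interface are illustrative. Because F and B are taken as constant, the implicit matrix is factored once outside the time loop.

% Minimal sketch of the theta scheme (13.68) for dv/dt = F v + B with
% constant F and B: theta = 0, 1/2, 1 give forward Euler,
% Crank-Nicholson, and backward Euler, respectively.
function V = theta_march(F, B, v0, dt, nsteps, theta)
    n   = numel(v0);
    I   = speye(n);
    Ain = I - theta*dt*F;             % factor once (F, B constant in time)
    Aex = I + (1-theta)*dt*F;
    [L, U, p] = lu(Ain, 'vector');
    V   = zeros(n, nsteps+1);  V(:,1) = v0;
    for q = 1:nsteps
        rhs       = Aex*V(:,q) + dt*B;
        V(:, q+1) = U \ (L \ rhs(p));
    end
end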
EXAMPLE 13.10.
plane,
4
5
(13.69)
If the required solution is only the steady-state profiles, time accuracy may not be as important. In
this case, large time steps can be used to speed up the convergence.
Other additional techniques such as smoothing via averages can be used to reduce the amount of
oscillations.
510
where,
(x, y) =
8
1 + 20r(x, y)
r(x, y) = e22xy
(x, y) =
1
1 + 5s(x, y)
s(x, y) = e8[(x0.8)
y]
q(x, y)
h(x, y)
800r
(1 + 20r)
(1 + 20r)2
8
9
64 2
s2
3200 + 50 16x
5
(1 + 5s)3
8
9
64 2
s
400 + 5 16x
5
(1 + 5s)2
u (t, 1, y) = 1 (t, y)
u (t, x, 0) = 0 (t, x)
u (t, x, 1) = 1 (t, x)
e2t (0, y) + 1 e2t (0, y)
e2t (1, y) + 1 e2t (1, y)
e2t (x, 0) + 1 e2t (x, 0)
e2t (x, 1) + 1 e2t (x, 1)
(13.71)
(13.72)
This initial value-boundary condition problem satisfies the same situation given
in Example 13.8. Thus the matrix formulation is given by
d
v = R v + f(t)
dt
Using the Crank-Nicholson method,
t (q+1)
(q+1)
(q)
(q)
I
R v
= I+
R v +
f
+f
2
2
2
8
1
9
1
t
(q+1)
I
v
I+
=
I
R
R v(q) +
f(q+1) + f(q)
R
2
2
2
2
With Δx = Δy = 0.05 (K = N = 19) and Δt = 0.001, the plots of the approximate solutions at different time slices are shown in Figure 13.6, together with the exact solution. The plots of the error distribution at different time slices are shown in Figure 13.7. The errors are within 2 × 10⁻³.
Figure 13.6. The finite difference solution to (13.70) at different slices of time, subject to conditions (13.71) and (13.72), using the Crank-Nicholson time-marching method. The approximations are shown as points, whereas the exact solutions, (13.69), at the corresponding t
values are shown as surface plots.
Figure 13.7. The error distribution between the finite difference approximation (using central
difference formulas for spatial derivatives and Crank-Nicholson time-marching method) and
the exact solutions, (13.69), at different t values.
In this section, we limit our discussion to linear equations. Let the partial differential equation be

    L v = f        (13.73)

and let the corresponding finite difference scheme be written as

    L_Δ u_p = f_p        (13.74)

If the approximate solution of (13.74) approaches the exact solution of (13.73) as the grid is refined, for all points p in the discretized space, then the scheme (13.74) is said to be convergent with (13.73).

Definition 13.3. Suppose the homogeneous part of (13.74), that is, L_Δ u_p = 0, is rearranged into a two time-level formula given by

    v^(q+1) = C^(q) v^(q)        (13.77)

⁶ For instance, one refinement path could be one that maintains the ratio Δt/(Δx Δy Δz) constant.
For Lax-Richtmyer stability, there exist several tests for necessary conditions,
including spectral radius and von Neumann methods, as well as other sufficient
conditions. In general, stability properties are still easier to prove than convergence
properties.
Fortunately, a crucial theorem called the Lax equivalence theorem states that
for two-level linear consistent schemes, Lax-Richtmyer stability is both a necessary
and sufficient condition for the convergence of a given scheme.7 Thus we describe
next some of the tools of stability analysis. From these analyses, we can determine
the range of time steps that maintains boundedness of the approximate solutions.
Including the nonhomogeneous terms, the two time-level formula becomes

    v^(q+1) = C^(q) v^(q) + Γ^(q)        (13.78)

where the matrix C^(q) is given by

    C^(q) = I + Δt F^(q)                                                for the forward Euler scheme
    C^(q) = ( I − Δt F^(q+1) )^(−1)                                     for the backward Euler scheme
    C^(q) = ( I − (Δt/2) F^(q+1) )^(−1) ( I + (Δt/2) F^(q) )            for the Crank-Nicholson scheme

whereas for matrix Γ^(q),

    Γ^(q) = Δt G^(q)                                                    for the forward Euler scheme
    Γ^(q) = ( I − Δt F^(q+1) )^(−1) Δt G^(q+1)                          for the backward Euler scheme
    Γ^(q) = ( I − (Δt/2) F^(q+1) )^(−1) (Δt/2) ( G^(q+1) + G^(q) )      for the Crank-Nicholson scheme
As long as matrix
(q) is bounded, the stability of the finite difference scheme
will only depend on matrix C(q) . For the case in which C is stationary, that is, not
dependent on q, we have the following result.
THEOREM 13.1. Let Γ^(q) be bounded for all q ≥ 0. Let Λ = {λ₁, . . . , λ_m} be the set of distinct eigenvalues of C, with s_i copies of λ_i. A necessary condition for the scheme given in (13.78) to be numerically stable (i.e., |v^(q)| < ∞) is that

    max_(i=1,...,m) |λ_i| ≤ 1        (13.79)
⁷ For a proof of the Lax equivalence theorem, see Morton and Mayers (2005).
Proof: Any square matrix can be transformed into one of its Jordan canonical forms,

    C = T^(−1) J T

where T is a nonsingular matrix, whereas J is a block diagonal matrix composed of m Jordan blocks,

    J = diag( J₁, . . . , J_m )

with J_i the s_i × s_i bidiagonal block that has λ_i along the diagonal and 1 along the first superdiagonal. Also, recalling formula (3.36), the powers of J_i are given by

    J_i^q = P^(q, s_i)(λ_i)

where P^(q,s)(λ) is the s × s upper triangular matrix whose diagonal entries are λ^q and whose jth superdiagonal entries are [q, q − j] λ^(q−j), with

    [k, j] = k! / ( (k − j)! j! )   if j ≥ 0 ,    and   [k, j] = 0   otherwise

Thus

    v^(q) = C^q v^(0) = T^(−1) diag( P^(q,s₁)(λ₁), . . . , P^(q,s_m)(λ_m) ) T v^(0)

so that if max_i(|λ_i|) > 1, the powers of the corresponding Jordan block grow without bound and |v^(q)| becomes unbounded. This establishes the necessity of (13.79).
Condition (13.79) becomes both a necessary and sufficient condition for numerical stability when all the eigenvalues are distinct. The maximum absolute eigenvalue
is also known as the spectral radius. Thus condition (13.79) is also known as the
spectral radius condition.
Note that numerical stability is only a necessary condition for Lax-Richtmyer stability. For Lax-Richtmyer stability, additional conditions are needed for the refinement paths; that is, numerical stability should be maintained as Δt, Δx → 0. Nonetheless, for practical purposes, (13.79) usually satisfies the needs of most finite difference schemes. In fact, it can be shown that when C is a normal matrix, numerical stability becomes necessary and sufficient for Lax-Richtmyer stability.
EXAMPLE 13.11. Consider again the system of Example 13.10. For the forward Euler scheme, the marching matrix in (13.78) is

    C = I + Δt R        (13.80)

and its spectral radius can be evaluated for different values of Δt to determine the largest stable time step.
The von Neumann approach takes the point formulas, that is, the time-marching difference equation for u(qΔt, kΔx, nΔy, mΔz), and determines whether the difference scheme will amplify the magnitude of u at t + Δt at some points in the (x, y, z)-space. Specifically, the method proceeds as follows:

1. Set up the time-marching difference formula for u((q + 1)Δt, kΔx, nΔy, mΔz) and consider only the homogeneous terms of the partial differential equation.
2. Set u to be

    u(qΔt, kΔx, nΔy, mΔz) = η^q e^(i ω_x kΔx) e^(i ω_y nΔy) e^(i ω_z mΔz)        (13.81)

3. Solve for the amplification factor η and determine the conditions under which

    |η| ≤ 1        (13.82)
EXAMPLE 13.12. Let us now use the von Neumann method to estimate the range of Δt that would yield a stable scheme for Example 13.10. With the forward Euler scheme, the time-marching scheme for (13.70),

    ∂u/∂t = 2 ∇²u − 3u + g(t, x, y)

at point u^(q)_(k,n) is

    (1/Δt) ( u^(q+1)_(k,n) − u^(q)_(k,n) ) = (2/Δx²) ( u^(q)_(k+1,n) − 2 u^(q)_(k,n) + u^(q)_(k−1,n) )
                                            + (2/Δy²) ( u^(q)_(k,n+1) − 2 u^(q)_(k,n) + u^(q)_(k,n−1) )
                                            − 3 u^(q)_(k,n) + g(qΔt, kΔx, nΔy)

To consider only the homogeneous part, we remove g from the equation. Next, we substitute (13.81) and then divide out u^(q)_(k,n) from both sides to obtain

    (1/Δt) ( η − 1 ) = (4/Δx²) ( cos(ω_x Δx) − 1 ) + (4/Δy²) ( cos(ω_y Δy) − 1 ) − 3

or

    η = 1 − ( 3 + 8γ ) Δt

where

    γ = (1/Δx²) sin²(ω_x Δx / 2) + (1/Δy²) sin²(ω_y Δy / 2) ≥ 0

Because the stability requirement should hold for the worst-case situation, the values of Δt needed to keep |η| ≤ 1 can be determined by setting

    Δt ≤ 2 / ( 3 + 8 max_(ω_x, ω_y) γ ) = 2 Δx² Δy² / ( 8 (Δx² + Δy²) + 3 Δx² Δy² )

By setting Δx = Δy = 0.05, we find that Δt ≤ 3.123 × 10⁻⁴. This is comparable to the stability range found using the eigenvalue method (see Example 13.11), which is Δt ≤ 3.20 × 10⁻⁴.
For the backward Euler method, we have

    (1/Δt) ( u^(q+1)_(k,n) − u^(q)_(k,n) ) = (2/Δx²) ( u^(q+1)_(k+1,n) − 2 u^(q+1)_(k,n) + u^(q+1)_(k−1,n) )
                                            + (2/Δy²) ( u^(q+1)_(k,n+1) − 2 u^(q+1)_(k,n) + u^(q+1)_(k,n−1) )
                                            − 3 u^(q+1)_(k,n) + g((q + 1)Δt, kΔx, nΔy)
13.5 EXERCISES
E13.1. Obtain the coefficients for the finite difference formulas of the following and determine the order:

    1.  d²u/dx² |_(x_k) = a (d²u/dx²)_(k−1) + e (d²u/dx²)_(k+1) + (1/Δx²) ( b u_(k−1) + c u_k + d u_(k+1) )

    2.  d²u/dx² |_(x_k) = a (1/Δx) (du/dx)_(k−1) + (1/Δx²) ( b u_k + c u_(k+1) ) + d (d²u/dx²)_(k+1)

E13.2. Obtain a finite difference approximation for the mixed derivative ∂²u/(∂x ∂y), where ∂u/∂y was approximated using the formula in item 7 of Table 13.1.
E13.3. Find matrices D2 , D1 , b2 , and b1 (based on notations given in Section 13.2.1) for a fourth-order finite difference approximation that would
handle the Neumann condition at x = 0 and Dirichlet condition at x = 1,
that is, (du/dx)(0) = Neu0 and u(1) = Di1 .
E13.4. Consider the following second-order differential equation:
2u + 2
2u
u
u
+
2
9y = 0 ;
xy x
y
0 x, y 1
y=0
E13.6. Use the method given in Section 13.2.3 to obtain the finite difference solution of the following Poisson equation in polar coordinates:

    ∇²u = 4 − 21 cos(5θ) ,    0 ≤ r ≤ 1 ,  0 ≤ θ ≤ 2π

subject to u(1, θ) = 2 + cos(5θ). Noting that the exact solution is given by u = 1 + r² ( cos(5θ) + 1 ), plot the solution and the error distribution for the case with K = 30 grid points along r and N = 200 grid points along θ.
E13.7. Modify the formulation of matrix A in (13.46) to handle the generalization of the Poisson equation to the nonhomogeneous Helmholtz equation in polar coordinates given by

    ∇²u + h(r, θ) u = f(r, θ)

Create a program (or modify poisson2d_polar_dirich.m) to accommodate the change and test it on the case with h = 3, f = 18 − 15 r³ sin(3θ), and boundary condition u(1, θ) = 6 − 5 sin(3θ). The exact solution for this case is given by u = 6 − 5 r³ sin(3θ). Plot the error distribution for the case using K = 100 grid points along r and N = 100 grid points along θ.
E13.8. Use the method given in Section 13.2.3 to obtain the finite difference solution of the following Poisson equation in spherical coordinates under azimuthal symmetry:

    ∇²u = ( 5 − 4 cos²(θ) ) / sin(θ) ,    0 ≤ r ≤ 1 ,  0 ≤ θ ≤ 2π

subject to u(1, θ) = 3 + sin(θ). Noting that the exact solution is given by u = 3 + r² sin(θ), plot the solution and the error distribution for the case with K = 30 grid points along r and N = 100 grid points along θ.
E13.9. Consider the one-dimensional diffusion equation ∂u/∂t = ∂²u/∂x² discussed in Section 13.3.

2. Obtain the time-marching equation based on the central difference formula for ∂²u/∂x²; that is, find F and B in (13.51) for this problem.
3. Using Δx = 0.01, try to obtain the finite difference solution from t = 0 to t = 0.1 using the three weighted-Euler methods, that is, the forward Euler, backward Euler, and Crank-Nicholson. First try Δt = 0.005 and then try Δt = 5 × 10⁻⁵. Plot the time-lapse solutions and error distributions of the solutions (if they are bounded).
4. Using the von Neumann stability method, show that the maximum time increment allowed for a stable marching of the forward Euler method for this problem will be Δt = Δx²/2 (thus Δt_max = 5 × 10⁻⁵ will be stable).
5. For the Crank-Nicholson method, the value of Δt need not be as small as the one needed for the stability of the forward Euler method to remove the oscillations. Try Δt = 5 × 10⁻⁴ and show that the oscillations are absent.
    (1 + γ) u^(q+1)_(k+1) + (1 − γ) u^(q+1)_k = (1 − γ) u^(q)_(k+1) + (1 + γ) u^(q)_k        (13.83)

where γ = Δt/Δx.

1. Show that, based on the von Neumann method, the amplification factor is given by |η| = 1 for all real ω.
2. Use this scheme for γ = 0.5, Δt = 0.01 for a finite domain x_k = kΔx.
u (t, x, 0) = w0 (t, x)
u (t, x, 1) = w1 (t, x)
E13.13. Write a general program that would solve the dynamic 2D Poisson equation in polar coordinates given by

    ∂u/∂t + f(r, θ) = ∂²u/∂r² + (1/r) ∂u/∂r + (1/r²) ∂²u/∂θ²

subject to static initial and boundary conditions given by

    u(r, θ, 0) = U_i(r, θ)    and    u(1, θ, t) = U_R(θ)

where f(r, θ) is a forcing function. Test the program with the problem given in Example 11.8 and determine whether you get similar results to those shown in Figure 11.7.
E13.14. Consider the one-dimensional time-dependent heat equation for a sphere that is symmetric around the origin given by

    ∂T/∂t = ∇²T = ∂²T/∂r² + (2/r) ∂T/∂r

with initial and boundary conditions given by T(r, 0) = 0, (∂T/∂r)_(r=0) = 0, and T(1, t) = 1. Based on the separation of variables method, the analytical solution is given by

    T(r, t) = 1 + ( 2/(π r) ) Σ_(n=1)^∞ ( (−1)ⁿ / n ) e^(−(nπ)² t) sin(nπ r)        (13.84)

Compare the finite difference solution with the analytical solution. Then repeat the computation with the surface condition replaced by the Robin condition

    β (∂T/∂r) |_(r=1) + T(1, t) = 1

with β = 0.1. (Hint: You can use the method of ghost points to introduce this boundary condition while adding T(r_(K+1), t_q) as unknowns.)
E13.15. Use the von Neumann method to determine the stability region of the six
basic finite difference schemes given in Table M.1.
E13.16. Apply the same six finite difference schemes on the system given in Example M.3 but instead using an initial condition that contains a triangular pulse
given by:
0
otherwise
Observe whether significant oscillations will occur when using the Crank-Nicholson, leapfrog, or Lax-Wendroff schemes.
14
In this chapter, we discuss the finite element method for the solution of partial differential equations. It is an important solution approach when the shape of the domain
(including possible holes inside the domains) cannot be conveniently transformed
to a single rectangular domain. This includes domains whose boundaries cannot be
formulated easily under existing coordinate systems.
In contrast to finite difference methods which are based on replacing derivatives
with discrete approximations, finite element (FE) methods approach the problem
by piecewise interpolation methods. Thus the FE method first partitions the whole domain Ω into several small pieces Ω_n, which are known as the finite elements, represented by a set of nodes in the domain Ω. The sizes and shapes of the finite
elements do not have to be uniform, and often the sizes may need to be varied to
balance accuracy with computational efficiency.
Instead of tackling the differential equations directly, the problem is to first recast
it as a set of integral equations known as the weak form of the partial differential
equation. There are several ways in which this integral is formulated, including
least squares, collocation, and weighted residual. We focus on a particular weighted
residual method known as the Galerkin method. These integrals are then imposed on
each of the finite elements. The finite elements that are attached to the boundaries
of
will have the additional requirements of satisfying the boundary conditions.
As is shown later, these integrals can be reduced to matrix equations in which
the unknowns are the nodal values. Because neighboring finite elements share the
same nodes, the various local matrix equations have to be assembled to form the
global matrix equation. Basically, the result is either a large linear matrix equation
for steady-state problems or a large matrix iteration scheme for transient problems.
There are several implementations of finite element method. In this chapter,
we limit our discussion to the simplest approaches. First, we only tackle 2D linear
second-order partial differential equations. The construction of the weak form for
these problems is discussed in Section 14.1. We opted to tackle the 2D case because
the 1D case can already be easily handled by finite-difference methods, whereas
there are several 2D domains that are difficult to solve using only finite difference
methods. With a good basic understanding of the 2D case, the extensions to three
dimensions should be straightforward.
Second, we limit our finite element meshes to be composed of triangles and
apply only the simplest shape function (also known as interpolation function) based
on three vertices of the triangle. The various properties and integrals based on triangular elements are discussed in Sections 14.2 and 14.2.1, respectively. We include
much later in Section 14.4 a brief description of mesh construction called the Delaunay triangulation. The use of triangles, although not the most accurate, simplifies the
calculation of the integrals significantly compared with high-order alternatives. The
inclusion of the boundary conditions depends on the types of conditions. The application of Neumann conditions and Robin conditions involves line integrals, which
are discussed in Section 14.2.2. This will require another set of one-dimensional
shape functions.
After the weak form has been applied to the local elements to obtain the various
matrix equations, the assembly of these local matrix equations to a global equation
is discussed in Section 14.3. Once the assembly process has finished, only then are
the Dirichlet conditions included. There are two approaches available for doing
this: the matrix reduction approach and the overloading approach. Following most
implementations of finite element methods, we focus on the overloading approach.
Afterward, the various steps of the finite element method, based on triangular elements are summarized in Section 14.5.
Having a basic description of the particular implementation of the finite element method using triangles for a linear second-order differential equation, we also
include three extensions in this chapter. One is the improvement for convectiondominated cases known as the streamline upwind Petrov-Galerkin (SUPG) method,
discussed briefly in Section N.2 as an appendix. Another extension is the treatment of
axisymmetric cases discussed in Section 14.6, in which the techniques of the 2D finite
element method can be used almost directly but with the inclusion of r as a factor
inside the integrals. Finally, in Section 14.7, we discuss the use of the finite element
method to handle unsteady state problems via the Crank-Nicholson method.
Consider the linear second-order partial differential equation

    ∇ · ( M(x, y) ∇u ) + b(x, y) · ∇u + g(x, y) u + h(x, y) = 0        for (x, y) ∈ Ω        (14.1)

subject to boundary conditions of the Neumann, Robin, or Dirichlet type:

    ( M(x, y) ∇u ) · n = q(x, y)                        on boundary (Ω)_NBC        (14.2)
    ( M(x, y) ∇u ) · n = −α(x, y) u + q(x, y)           on boundary (Ω)_RBC        (14.3)
    u = u_D(x, y)                                       on boundary (Ω)_DBC        (14.4)

where Ω is a region in the (x, y) plane, with the subscripts NBC, RBC, and DBC standing for Neumann, Robin, and Dirichlet boundary conditions, respectively, and n is the unit normal vector pointing outward of Ω. The M tensor is assumed to be symmetric and positive definite. The functions u_D(x, y), q(x, y), and α(x, y) are all assumed to be continuous functions along their respective boundaries.
As mentioned in the introduction, we limit our approach to the use of triangular
finite elements and linear piecewise approximations. This means that the domain will
be partitioned into triangular elements, each defined by the three vertices known
as nodes. By assuming a linear-piecewise approximation of the surface in each element, a stitching of the neighboring elements will provide a continuous, albeit nonsmooth, surface solution to the partial differential equation. To illustrate, consider
Figure 14.1. The finite element domain,
n , is a triangular region in the (x, y)-plane
defined by three vertices (or nodes), p1 , p2 , and p3 . The collection of u-values at each
node, that is, u (p j ), will form the desired numerical solution to the partial differential
equation. The linear approximation will then be a plane, denoted by u (
n ), as shown
in Figure 14.1. Doing so, we see that the triangular planes will stitch continuously
with their neighboring elements. Obviously, the approximation can be improved by
increasing the order of approximation, that is, allowing for curved surfaces. When
doing so, one needs to make sure that the surface elements will form a continuous
surface when combined. In our case, we assume that increasing the resolution of the
mesh will be sufficient for improving accuracy.
In committing to use first-order approximations for each finite element, the
first-order derivatives u/x and u/y will in general not be continuous across
the shared edges of neighboring finite elements. This means that the second-order
partial derivatives can no longer be evaluated around these neighborhoods. We
need to reformulate the original partial differential equation to proceed with the
first-order approximation.
One such formulation is called the weak form. It transforms the original secondorder partial differential equation as given in (14.1) into an integral equation in
which the integrands will involve only first-order partial derivatives. We begin by
including a weighting function on the right-hand side of (14.1). We can choose the variation of u, denoted by δu, as the weighting function,¹

    0 = ∫∫_Ω δu F dA = ∫∫_Ω δu ∇·( M(x, y) ∇u ) dA + ∫∫_Ω δu b(x, y) · ∇u dA + ∫∫_Ω δu g(x, y) u dA + ∫∫_Ω δu h(x, y) dA        (14.5)
Thus the solution is to find the values of u at the nodes that would make the sum
of integrals in (14.5) as close to zero as possible, based on the chosen forms of the
weights.
An exact solution u(x, y) has the property that the values and derivatives of u
will satisfy F = 0 as given in (14.1), while satisfying the boundary conditions. These
are known as the strong solutions. By fixing the solutions to be composed of flat
triangles, we are immediately setting the objective toward obtaining an approximate
solution that is defined by the node values, and we do not expect F = 0, except in
special circumstances. Instead, the factor inside the integrand that is weighted by u
can be taken instead as a residual error or residual. Thus the weak form given
in (14.5) is also known as the weighted residual method. And the solution to (14.5)
is known as a weak solution.
The first integrand in (14.5) contains the second-order partial derivatives. As we
had already mentioned, this causes a problem because the second derivatives of a
surface composed of flat triangles will be zero inside each element, while becoming
indeterminate at the edges of the elements. To alleviate this problem, we use two
previous results: the divergence of a scalar product of a vector (cf. (4.55)) and the
divergence theorem (cf. (5.5)), which we repeat below:
    ∇ · ( v F ) = v ∇·F + ∇v · F        (14.6)

    ∫∫∫_V ∇·F dV = ∮_S F · n dS        (14.7)

Applying (14.6) to the first integrand of (14.5),

    δu ∇·( M(x, y) ∇u ) = ∇·( δu M(x, y) ∇u ) − (∇δu) · M(x, y) ∇u        (14.8)

Next, using (14.7) (which is equivalently Green's lemma for the 2D case) on the first term on the right-hand side of equation (14.8),

    ∫∫_Ω ∇·( δu M(x, y) ∇u ) dA = ∮_(Bound(Ω)) ( δu M(x, y) ∇u ) · n ds        (14.9)
¹ Instead of using the concept of weight, a similar approach is to treat δu as a test function.
where Bound(Ω) is the boundary of Ω and n is the outward unit normal at this boundary. Substituting (14.8) and (14.9) into (14.5), and then after some rearranging, we get

    ∫∫_Ω (∇δu) · M(x, y) ∇u dA − ∮_(Bound(Ω)) δu ( M(x, y) ∇u ) · n ds − ∫∫_Ω δu b(x, y) · ∇u dA
        = ∫∫_Ω δu g(x, y) u dA + ∫∫_Ω δu h(x, y) dA        (14.10)

We refer to (14.10) as our weak-form working equation for the finite element method.
EXAMPLE 14.1. Consider the triangle with vertices p₁ = (1, 1), p₂ = (2, 1.5), and p₃ = (1.5, 2); then, using (14.11),

    φ₁(x, y) = det[ (p₂ − p)  (p₃ − p) ] / det[ (p₂ − p₁)  (p₃ − p₁) ]
             = [ (2 − x)(2 − y) − (1.5 − x)(1.5 − y) ] / 0.75 = 7/3 − (2/3) x − (2/3) y

    φ₂(x, y) = det[ (p₃ − p)  (p₁ − p) ] / det[ (p₃ − p₂)  (p₁ − p₂) ]
             = [ (1.5 − x)(1 − y) − (1 − x)(2 − y) ] / 0.75 = −2/3 + (4/3) x − (2/3) y

    φ₃(x, y) = det[ (p₁ − p)  (p₂ − p) ] / det[ (p₁ − p₃)  (p₂ − p₃) ]
             = [ (1 − x)(1.5 − y) − (2 − x)(1 − y) ] / 0.75 = −2/3 − (2/3) x + (4/3) y
From this point on, we assume that the indexing of the points follows a cyclic sequence, that is, (i, j, k) = (1, 2, 3), (2, 3, 1), or (3, 1, 2), so that p_i → p_j → p_k traces a counterclockwise path. Let D/2 be the area of the triangle formed by p_i, p_j, and p_k, that is (cf. (1.76) in Exercise E1.32),

    D = det[ p_i  p_j ] + det[ p_j  p_k ] + det[ p_k  p_i ] = det[ (1, 1, 1) ; (p_i, p_j, p_k) ]        (14.12)

Using the multilinearity property of determinants (cf. item 7 in Table 1.6), one can show that

    D = det[ (p_j − p_i)  (p_k − p_i) ]

which is the denominator in (14.11). Thus with p constrained to be inside or on the triangle formed by p_i, p_j, and p_k, we have

    φ_i = 2 Area(p p_j p_k) / ( 2 Area(p_i p_j p_k) )    and    0 ≤ φ_i(p) ≤ 1
We now list some of the properties of the shape functions (with i, j, k ∈ {1, 2, 3} and i ≠ j ≠ k). These results can be obtained by direct application of (14.11).

1. φ_i(p) has a maximum value of 1 at p = p_i,

    φ₁(p₁) = 1 ;    φ₂(p₂) = 1 ;    φ₃(p₃) = 1        (14.13)

2. φ_i(p_j) = φ_i(p_k) = 0 for i ≠ j ≠ k        (14.14)

3. 0 ≤ φ_i(p) ≤ 1        (14.15)

4. At the centroid p* = (1/3)(p₁ + p₂ + p₃),        (14.16)

    φ₁(p*) = φ₂(p*) = φ₃(p*) = 1/3        (14.17)

6. The surface integral of φ_i with respect to Ω_n is equal to D/6,

    ∫∫_(Ω_n) φ_i dA = D/6        (14.18)
Equation (14.18) is just the volume of the pyramid formed by the shape function φ_i, as shown in Figure 14.2. The altitude of the pyramid is 1, whereas the area of the triangular base has been determined earlier to be D/2. Thus, because the volume of a pyramid is one-third the product of base area and altitude, we obtain the value of D/6, where D is given by (14.12).
A continuous function f(x, y) can then be approximated linearly inside Ω_n in terms of its nodal values f_i = f(p_i),

    f(p) ≈ f₁ φ₁(p) + f₂ φ₂(p) + f₃ φ₃(p)        (14.19)

so that, using (14.18),

    ∫∫_(Ω_n) f dA ≈ (D/2) ( f₁ + f₂ + f₃ ) / 3        (14.20)

Alternatively, approximating f by its value at the centroid, f* = f(p*),        (14.21)

    ∫∫_(Ω_n) f dA ≈ f* ∫∫_(Ω_n) dA = (D/2) f*        (14.22)

Thus the surface integrals given by (14.20) and by (14.22) are equal. The constant value, f*, acts like a mean value of f in Ω_n, as shown in Figure 14.4.
We will be replacing f(p) with several objects. In some cases, we will substitute f(p) with the elements of M(x, y) and b(x, y), the functions g(x, y) and h(x, y), as well as with u and δu themselves.

Figure 14.4. The volume in (a) is the same as the volume in (b). The volume in (a) is generated by the surface integral of f = Σ_(i=1)^3 f_i φ_i. The volume in (b) is a prism of constant height and is generated by the surface integral of f = Σ_(i=1)^3 f(p*) φ_i.

For u evaluated at the centroid,

    u(p*) = u₁ φ₁(p*) + u₂ φ₂(p*) + u₃ φ₃(p*) = Φᵀ [u]        (14.24)

where

    [u] = ( u₁ ; u₂ ; u₃ )        (14.25)

and

    Φ = (1/3) ( 1 ; 1 ; 1 )        (14.26)

Similarly, for δu,

    δu ≈ δu₁ φ₁ + δu₂ φ₂ + δu₃ φ₃        (14.27)

    δu(p*) = δu₁ φ₁(p*) + δu₂ φ₂(p*) + δu₃ φ₃(p*) = Φᵀ [δu]        (14.28)

where [δu] = ( δu₁ ; δu₂ ; δu₃ ).
A finite element approach in which the particular choice of the weight δu is formulated by (14.27) is known as the Galerkin method.² It can be shown that for b = 0 and M that is symmetric, the Galerkin method yields an optimal solution of the weak form (14.5) for a given mesh.

² Other weights can be chosen, which may indeed produce a more accurate solution, for example, the Petrov-Galerkin approach to be discussed later.
The gradients ∇u and ∇(δu) will be approximated by ∇ũ and ∇(δũ) given below. Substituting (14.16) into (14.24) and rearranging, the gradient of the linear approximation is constant inside the element and can be expressed in terms of the nodal values through the 2 × 3 matrix

    T = (1/D) [ p_(2y) − p_(3y)    p_(3y) − p_(1y)    p_(1y) − p_(2y)
                p_(3x) − p_(2x)    p_(1x) − p_(3x)    p_(2x) − p_(1x) ]        (14.29)

so that

    ∇ũ = T [u]        (14.30)

Similarly, for δu,

    ∇(δũ) = T [δu]        (14.31)
The four area integrals appearing in the weak form can then be approximated element by element as

    ∫∫_(Ω_n) (∇δu) · [ M(x, y) ∇u ] dA ≈ (D/2) [δu]ᵀ Tᵀ M(p*) T [u]

    ∫∫_(Ω_n) δu [ b(x, y) · ∇u ] dA ≈ (D/2) [δu]ᵀ Φ bᵀ(p*) T [u]

    ∫∫_(Ω_n) [ δu g(x, y) u ] dA ≈ (D/2) [δu]ᵀ Φ g(p*) Φᵀ [u]

    ∫∫_(Ω_n) δu h(x, y) dA ≈ (D/2) [δu]ᵀ Φ h(p*)        (14.32)
In case (d) of Figure 14.5, all three vertices belong to boundary(Ω). However, for simplicity, we avoid this situation by constraining the mesh triangulation of Ω to allow only cases (a), (b), or (c).³
Let p_a and p_b be two points of Ω_n that belong to boundary(Ω). As before, we can define shape functions for each point that would yield a linear approximation along the segment of the boundary. This time, the shape functions for points p_a and p_b will be given by ψ_a(p) and ψ_b(p), respectively, defined by

    ψ_a(p) = det[ p_b  p ] / det[ p_b  p_a ]    and    ψ_b(p) = det[ p_a  p ] / det[ p_a  p_b ]        (14.33)

where p ∈ p_a p_b, with p_a p_b being the line segment connecting p_a and p_b given by

    p_a p_b = { (x ; y) = λ p_a + (1 − λ) p_b ,  0 ≤ λ ≤ 1 }

Figure 14.6 shows the shape function ψ_b.
Some of the properties of these shape functions are:

    ψ_a(p_a) = ψ_b(p_b) = 1 ;    ψ_a(p_b) = ψ_b(p_a) = 0 ;    ψ_a(p) + ψ_b(p) = 1

    ψ_a( (p_a + p_b)/2 ) = ψ_b( (p_a + p_b)/2 ) = 1/2        (14.34)

    ∫_(p_a p_b) ψ_a ds = ∫_(p_a p_b) ψ_b ds = ‖p_b − p_a‖ / 2

Along points in p_a p_b, a continuous function f(x, y) can be approximated linearly by

    f(p) ≈ f(p_a) ψ_a(p) + f(p_b) ψ_b(p)        (14.35)

The line integral can then be approximated by

    ∫_(p_a p_b) f(x, y) ds ≈ ‖p_b − p_a‖ ( f(p_a) + f(p_b) ) / 2        (14.36)

³ For instance, when case (d) occurs, we can split the triangle into two triangles along the middle vertex such that the two smaller triangle elements satisfy case (c).
where

    ‖p_b − p_a‖ = sqrt( (p_b − p_a)ᵀ (p_b − p_a) )

Recalling the terms involving a line integral in (14.10), we can apply (14.36),

    ∮_(Bound(Ω_n)) δu ( M(x, y) ∇u ) · n ds ≈ ‖p_b − p_a‖ ( δu_a q_a + δu_b q_b ) / 2 = L [δu]ᵀ ( e_a  e_b ) ( q_a/2 ; q_b/2 )        (14.37)

with

    q_a = ( M(x, y) ∇u ) · n |_(p = p_a) ,    q_b = ( M(x, y) ∇u ) · n |_(p = p_b) ,    L = sqrt( vᵀ v ) ,    v = ( p₁  p₂  p₃ ) ( e_b − e_a )

and

    e₁ = ( 1 ; 0 ; 0 ) ,    e₂ = ( 0 ; 1 ; 0 ) ,    e₃ = ( 0 ; 0 ; 1 )
In (14.37), we are assuming that the Neumann boundary conditions will supply
the values of qa and qb. If this is not the case, the Dirichlet conditions will set the
values of u at the boundaries making the line integral calculations unnecessary.
This result can be generalized easily for the Robin (i.e., mixed) conditions. The only change necessary is to replace q_a and q_b by

    ( M(x, y) ∇u ) · n |_(p = p_a) = −α_a u(p_a) + q_a ;    ( M(x, y) ∇u ) · n |_(p = p_b) = −α_b u(p_b) + q_b

where α_a, α_b, q_a, and q_b should all be supplied at the boundary points with the Robin conditions.
Collecting terms for one element Ω_n,

    ∫∫_(Ω_n) (∇δu) · [ M(x, y) ∇u ] dA − ∫∫_(Ω_n) δu [ b(x, y) · ∇u ] dA − ∫∫_(Ω_n) [ δu g(x, y) u ] dA ≈ [δu]ₙᵀ K_n [u]ₙ        (14.38)

and

    ∮_(Bound(Ω_n)) ( δu M(x, y) ∇u ) · n ds + ∫∫_(Ω_n) δu h(x, y) dA ≈ [δu]ₙᵀ Γ_n        (14.39)

where

    K_n ≡ (D/2) [ Tᵀ M(p*) T − Φ bᵀ(p*) T − Φ g(p*) Φᵀ ]        (14.40)

    Γ_n ≡ (D/2) [ Φ h(p*) + Q ]        (14.41)

The constant matrices Φ and T were defined in (14.26) and (14.29), respectively, and

    Q = (L/D) ( e_a  e_b ) ( q_a ; q_b )    if p_a, p_b ∈ Bound(Ω)_NBC ;    Q = 0    otherwise        (14.42)

The formulas for D and L were defined in (14.12) and (14.37), respectively. Note that the sizes of matrices K_n and Γ_n are 3 × 3 and 3 × 1, respectively.
For the special case in which p_a and p_b both belong to the boundary points specified by Robin conditions, a term will have to be added to K_n as follows:

    K_n = (D/2) [ Tᵀ M(p*) T − Φ bᵀ(p*) T − Φ g(p*) Φᵀ + Λ ]        (14.43)

    Γ_n = (D/2) [ Φ h(p*) + Q + Q^(rbc) ]        (14.44)

where

    Λ = ( e_a  e_b ) [ α_a  0 ;  0  α_b ] ( e_a  e_b )ᵀ (L/D)    if p_a, p_b ∈ Bound(Ω)_RBC ;    Λ = 0    otherwise

and

    Q^(rbc) = ( e_a  e_b ) ( q_a^(rbc) ; q_b^(rbc) ) (L/D)    if p_a, p_b ∈ Bound(Ω)_RBC ;    Q^(rbc) = 0    otherwise
The integrals over the whole domain are the sums of the integrals over the elements,

    ∫∫_Ω [·] dA = Σ_(n=1)^(N_elements) ∫∫_(Ω_n) [·] dA    and    ∮_(Bound(Ω)) [·] ds = Σ_(n=1)^(N_elements) ∮_(Bound(Ω_n)) [·] ds

where [·] stands for the appropriate integrands. Using (14.38) and (14.39), applied to the weak-form working equation given in (14.10), we obtain

    Σ_(n=1)^(N_elements) [δu]ₙᵀ K_n [u]ₙ = Σ_(n=1)^(N_elements) [δu]ₙᵀ Γ_n        (14.45)

However, because [u]_i and [u]_j, with i ≠ j, refer to different vectors, they cannot be added together directly. The same is true for [δu]_i and [δu]_j, i ≠ j. Instead, we need to first represent (14.45) using u, the global vector of node values.
To proceed at this point, assume that a mesh of triangular elements has been
generated for domain
. One particular method for mesh generations that would
yield a triangulation is known as the Delaunay triangulation, which is discussed in
Section 14.4. Regardless of the triangulation approach, we assume that the mesh is
represented by two sets of data. One set of data is the collection of the node positions
given by node matrix P,
P=
x1
y1
x2
y2
...
xNnodes
yNnodes
(14.46)
where Nnodes is the number of nodes in
. The second set of data is represented by
a matrix of indices (of the nodes) that make up each triangular finite element, given
by index matrix I,
I=
I1,1
I2,1
..
.
I1,2
I2,2
I1,3
I2,3
INelements ,1
INelements ,2
INelements ,3
(14.47)
where Nelements are the number of elements in
and Ii,j {1, 2, . . . , Nnodes } are the
indices of the vertices of the ith triangular element. To illustrate,consider the mesh
triangulation shown in Figure 14.7; then the corresponding matrices P and I are given by

    P = [ 0.67  2.22  1.90  3.60  2.38  3.78  0.96
          1.63  2.13  1.25  1.43  0.31  0.45  0.29 ]

    I = [ 1 3 2 ;  2 3 4 ;  1 7 3 ;  3 7 5 ;  3 5 4 ;  4 5 6 ]

Note that the sequence of the nodes in each row of I is such that it follows a counterclockwise path.
For the nth element, we can use the rows of I to define a matrix operator E_n of size (N_nodes × 3) whose elements are given by

    E_n(i,j) = 1   if i = I_(n,j) ;    E_n(i,j) = 0   otherwise        (14.48)

When the transpose of E_n premultiplies a vector u of length N_nodes, the result is a (3 × 1) vector, [u]_n, whose elements are extracted from the positions indexed by {I_(n,1), I_(n,2), I_(n,3)}.⁴

To illustrate, suppose N_nodes = 6 and I_n = {I_(n,1), I_(n,2), I_(n,3)} = {6, 1, 3}; then

    E_nᵀ u = [ 0 0 0 0 0 1 ;  1 0 0 0 0 0 ;  0 0 1 0 0 0 ] ( u₁ ; u₂ ; u₃ ; u₄ ; u₅ ; u₆ ) = ( u₆ ; u₁ ; u₃ )
Returning to the main issue of summing the various terms in (14.45), we see that
the (3 × 1) vector [u]_n given by

    [u]_n = ( u_(I_(n,1)) ; u_(I_(n,2)) ; u_(I_(n,3)) )        (14.49)

can be extracted from the global vector of node values,

    u = ( u₁ ; u₂ ; ⋯ ; u_(N_nodes) )        (14.50)

using the matrix operator E_n, that is,

    [u]_n = E_nᵀ u        (14.51)

    [δu]_n = E_nᵀ δu        (14.52)

⁴ We can also use E_n to generate the matrix [p]_n = (p₁, p₂, p₃)_n from matrix P given in (14.46), that is, [p₁, p₂, p₃]_n = P E_n = [ x_(I_(n,1))  x_(I_(n,2))  x_(I_(n,3)) ;  y_(I_(n,1))  y_(I_(n,2))  y_(I_(n,3)) ]. The local matrix [p]_n can then be used to evaluate matrices K_n and Γ_n in (14.38) and (14.39).
Substituting (14.51) and (14.52) into (14.45),

    Σ_(n=1)^(N_elements) δuᵀ E_n K_n E_nᵀ u = Σ_(n=1)^(N_elements) δuᵀ E_n Γ_n

or

    δuᵀ ( Σ_(n=1)^(N_elements) E_n K_n E_nᵀ ) u = δuᵀ ( Σ_(n=1)^(N_elements) E_n Γ_n )

that is,

    δuᵀ K u = δuᵀ Γ        (14.53)

where

    K ≡ Σ_(n=1)^(N_elements) K_n^(G) ,    Γ ≡ Σ_(n=1)^(N_elements) Γ_n^(G) ,    K_n^(G) ≡ E_n K_n E_nᵀ ,    Γ_n^(G) ≡ E_n Γ_n        (14.54)
One final detail still needs to be addressed. The Dirichlet conditions in (14.4)
have not yet been included in (14.54). To address this issue, let the nodes that are
attached to the Dirichlet conditions be indexed by vector D,
D = (D1 , D2 , . . . , DNDBC )
(14.55)
where Di {1, 2, . . . , Nnodes } and NDBC is the number of nodes that are involved in
the Dirichlet boundary conditions. We now describe two possible approaches.
1. Approach 1: Reduction of unknowns. In this approach, u is split into two vectors:
uDirichlet , which contains the known u-values at the Dirichlet boundary, and
u_nonDirichlet, which contains the unknown u-values. Let vector D_non be the vector of indices that remain after the removal of D, that is,

    D_non = ( D_non_1 , . . . , D_non_(N_nodes − N_DBC) )        (14.56)

where D_non_i ∈ {1, 2, . . . , N_nodes} \ D and D_non_i < D_non_j if i < j. Then

    [u_Dirichlet]_i = u(D_i) ,    i = 1, 2, . . . , N_DBC        (14.57)

and

    [u_nonDirichlet]_j = u(D_non_j) ,    j = 1, 2, . . . , (N_nodes − N_DBC)        (14.58)

The rows and columns of K and Γ associated with the Dirichlet nodes are then removed, after first transferring the known contributions, Σ_(ℓ=1)^(N_DBC) K_(i, D_ℓ) [u_Dirichlet]_ℓ, to the right-hand side (cf. (14.59)-(14.60)), leaving a reduced system for u_nonDirichlet.
symmetric matrix, the overloading approach has become the preferred route in
several finite element solutions. For our discussion, we use the overloading approach
due to its simpler form. Specifically, in (14.63), u refers to the same vector in (14.54).
The classical approach is the structured method of mapping quadrilateral subregions that make up
the domain. The structured approach often yields more uniform patterns. However, these procedures
sometimes demand several inputs and setup time from the users. Some versions also require the
solution of associated partial differential equations.
(c) Internal nodes, that is, excluding the boundary points, may need to be moved
around locally so that the nodes are mostly equidistant to each other. This
step is called smoothing. We do not discuss these smoothing techniques, but
some of the more popular methods include the Laplacian smoothing and
the force-equilibrium method.6 Instead, for simplicity, we just generate the
points as close to equal size in the desired domain.
The result of the first part of the procedure is the matrix of node positions P, that is,

    P = [ x₁  x₂  ⋯  x_(N_nodes) ;  y₁  y₂  ⋯  y_(N_nodes) ]        (14.64)
2. Identify the Delaunay triangles. One of the simpler methods for finding the
Delaunay triangles is to use a process known as node lifting. Let (xo , yo ) be
a point near the center of domain
Ω. A paraboloid function defined by

    z(x, y) = (x − x_o)² + (y − y_o)²        (14.65)

is used to lift the nodes onto the paraboloid, giving

    P̃ = [ P̃₁ , P̃₂ , . . . , P̃_(N_nodes) ]    where    P̃_i = ( x_i ; y_i ; z_i(x_i, y_i) )        (14.66)
as shown in Figure 14.9. Next, a tight polygonal cover of the lifted nodes can
be generated to form a convex hull of these nodes using triangular facets. One
particular method, which we refer to as the simplified-QuickHull algorithm, can
be used to obtain such a convex hull of the lifted points. Details of this algorithm
are given in the appendix as Section N.1.7
When the nodes of each facet are projected down onto the (x, y) plane,
the triangular mesh that is generated will satisfy the conditions of a Delaunay
triangulation, thus identifying the nodes of a finite element from each projected
facet. This can be seen in Figure 14.9, where a vertical circular cylinder enclosing
a facet of the triangulated paraboloid generates a circumcircle of the finite element in the (x, y)-plane. Because this cylinder that cuts through the paraboloid
will include only the three points of the triangular facet, the projected triangle in the (x, y)-plane will have a circumcircle that should not contain other
nodes.
6
7
See, for example, P. O. Persson and G. Strang, A Simple Mesh Generator in MATLAB, SIAM
Review, vol. 46, pp. 329345, June 2004.
The facets belonging to the top of the paraboloid will need to be removed. These facets are distinguished from the bottom facets using the property that their outward normal vectors are pointing
upwards, that is, having a positive z-component.
Figure 14.9. Projection of the triangular facets of the paraboloid will yield a Delaunay triangulation in the (x, y) plane.

Using the nodes identified with each facet, the matrix I can now be given by

    I = [ I_(1,1)  I_(1,2)  I_(1,3) ;  I_(2,1)  I_(2,2)  I_(2,3) ;  ⋯ ]        (14.67)
We have found that the function delaunayn is more robust than delaunay.
I:
D:
N:
R:
p
=
Nnodes
1
2
where
pGlobal
=
i
xi
yi
I1
I2
.
INelements
;
:
set of indices of Dirichlet boundary nodes = D1 , D2 , . . . , DNDBC
;
:
set of indices of Neumann boundary nodes = N1 , N2 , . . . , NNNBC
;
:
set of indices of Robin boundary nodes = R1 , R2 , . . . , RNRBC
Local
p1
n
Local
p2
n
Dn
Tn
pn
Qn
QRBC
n
n
det
1
Dn
1
PnLocal
1
0
0
1
n
T
. For the
Local
pi
= pGlobal
, i = 1, 2, 3
Ini
n
0
Local
1
Pn
1
1
0
1
1
1
0
PnLocal ()
ea
ea
if
otherwise
qa (RBC)
Ln
eb
(RBC) Dn
qb
ea
qa
Ln
eb
Dn
qb
if
otherwise
a
eb
0 eTa L
n
T Dn
b
eb
if
otherwise
Ln = vT v , v = PnLocal (eb ea )
1
0
0
e1 = 0 , e2 = 1 , e3 = 0
0
0
1
Dn
TTn M(pn ) Tn bT(pn ) Tn g (pn ) T n
2
Dn
h(pn ) + Qn + Q(RBC)
n
2
where
Kn
1,
Local
p3
n
where
1
1,
3
(a) Assemble the global matrix and vector,

    K = Σ_(n=1)^(N_elements) E_n K_n E_nᵀ    and    Γ = Σ_(n=1)^(N_elements) E_n Γ_n

where

    (E_n)_(ij) = 1   if i = I_(n,j) ;    (E_n)_(ij) = 0   otherwise
(Remarks: In MATLAB, once the local matrices K_n and Γ_n have been found, instead of using E_n, we could use the simpler command lines such as

K(In,In) = K(In,In) + K_n ;
Gamma(In) = Gamma(In) + Gamma_n ;

where In is the nth row of I; thus K and Γ are updated iteratively from n = 1 to N_elements.)
(b) Include Dirichlet boundary conditions using the "overloading" approach:

    K_(ij) → (1/ε) K_(ii)    if i = j = D_ℓ ,  ℓ = 1, . . . , N_DBC ,  0 < ε ≪ 1 ;    K_(ij) unchanged otherwise

    Γ_i → (1/ε) K_(ii) u_(D_ℓ)    if i = D_ℓ ,  ℓ = 1, . . . , N_DBC ,  0 < ε ≪ 1 ;    Γ_i unchanged otherwise
EXAMPLE 14.3.
2 2
3 + x y /2
M=
1/4
1/4
x+1
; b=
y+1
2
; g = (x y)2
and
h = ( cos(2x) + sin(2x) + )
where
= 4 1 + xy2 x
= 242 (2xy)2 + 2 (x y)2
= 21.2x2 y3 10.6y2 x3 + y 5.3x4 15.9x2 + 10.6x + 31.8 + 5.3x(1 + x)
We now fix the domain to be a unit square domain with a hole similar to the one
given in Example 14.2 but with a circular hole of radius 0.2 located in the center
of the square and with the lower left corner situated at the origin. For our test,
we set the Dirichlet conditions the left, top, and bottom sides of the square, plus
at the circular edge of the hole given by
x=0
, 0y1
0x1
,
y=0
u = 5.3x2 y + 2 sin(2x) for
0x1
,
y=1
! 2
x + y2 = 0.2
For the right side of the square, we set the Neumann conditions given by
6 + x2 y2 2 cos(2x)
[M u] n =
at x = 1, 0 y 1 :
+ 31.8xy + 5.3x3 y3 + 1.325x2
The exact solution of this problem is known (which was in fact used to set h and
the boundary conditions) and given by
u(x, y) = 5.3x2 y + 2 sin(2x)
After applying the finite element method based on a Delaunay triangulation generated by the same method used in Example 14.2, we obtain the solution shown in Figure 14.11 together with a plot of the errors between the numerical and the exact solution. The errors are within 10⁻², where the larger errors occur in this case around the boundary with Neumann conditions, suggesting potential areas for improvement of the method. It is expected that increasing the resolution may improve the accuracy. However, there are higher-order accuracy improvements that can be implemented (which are not discussed here). The results shown here can be reproduced by running the MATLAB m-file fem_sqc_test1.m that is available on the book's webpage.
As we have noted in the introduction to this chapter, there are cases where
convection dominates over the diffusion terms; that is, the effects of the coefficients
b are much larger than those of M. In these cases, a slight modification known as
the Streamline-Upwind Petrov-Galerkin (SUPG) method can handle some of the
problems. A brief introduction to this method is given in Section N.2 as an appendix.
For three-dimensional problems, the weak-form working equation corresponding to (14.10) becomes

    ∫∫∫_Ω (∇δu) · M(x, y, z) ∇u dV − ∮_(Bound(Ω)) δu ( M(x, y, z) ∇u ) · n dS − ∫∫∫_Ω δu b(x, y, z) · ∇u dV
        = ∫∫∫_Ω δu g(x, y, z) u dV + ∫∫∫_Ω δu h(x, y, z) dV        (14.68)
We do not discuss the general finite element method for the 3D case. Instead,
we just focus on the special case known as the axisymmetric case; that is, we assume
symmetry about the z-axis, which can be formalized as ()/ = 0 in the cylindrical
coordinate system. Under this condition, the terms in (14.68) reduce to
u
(u)
(u)
m11 (r, z) m12 (r, z)
r 2r dr dz
,
u
m21 (r, z) m22 (r, z)
r
z
(r,z)
z
u
r
where q(r, z) = u M(r, z) u n. By close observation, this is the same as (14.10)
after division by 2, with the exception that, other than replacing x and y by r and z,
respectively, the coefficients are now multiplied by r, that is, M are replaced by rM,
and so forth. Thus the procedure will be similar as before; however, note that the
integrands must now be multiplied by r evaluated at p , that is, at the centroids of
the triangular finite elements.
EXAMPLE 14.4.
r=0
=0
equation given by

    F( t, x, y, u, ∂u/∂t, ∂u/∂x, ∂u/∂y, ∂²u/∂x², ∂²u/∂y² ) = 0

and, in particular, the linear form

    c(t, x, y) ∂u/∂t = ∇ · [ M(t, x, y) ∇u ] + [ b(t, x, y) · ∇u ] + g(t, x, y) u + h(t, x, y)        for (x, y) ∈ Ω        (14.70)

subject to

    [ M(t, x, y) ∇u ] · n = q(t, x, y)        on boundary (Ω)_NBC        (14.71)

or the corresponding Robin condition (14.72), together with

    u = u_D(t, x, y)        on boundary (Ω)_DBC        (14.73)

and an initial condition

    u = u₀(x, y)        for t = 0        (14.74)
Compared with the steady-state version given by (14.1), the new system
described by (14.70) now includes an additional term: [c(t, x, y) u/t]. Furthermore, the coefficients: M, b, g and h, have all been given time dependence.
As discussed previously, when the equation becomes convection-dominated, the Galerkin method may become increasingly inaccurate. In fact, in the limit of M = 0 with b ≠ 0, we have hyperbolic equations, which contain propagated wave solutions. These convection-dominated cases may need to be handled differently, most likely moving the solution along the characteristics. We defer the discussion of these characteristics-based solutions to other sources. Instead, we only treat equations that can be considered to be diffusion-dominated, that is, where the effects of b are much less than the effects due to M.

We can use the same semi-discrete method, also known as the method of lines, introduced in Section 13.3.1. As before, we start with a spatial discretization of the problem and then convert the original partial differential equation into an initial value problem, that is, an ordinary differential equation in time. First, we let u̇ = ∂u/∂t and treat it as another independent variable. Using the approach in Section 14.1, while keeping t fixed, we obtain a weak formulation of (14.70),
    ∫∫_Ω (∇δu) · [ M(t, x, y) ∇u ] dA − ∮_(Bound(Ω)) ( δu M(t, x, y) ∇u ) · n ds − ∫∫_Ω δu [ b(t, x, y) · ∇u ] dA
        = ∫∫_Ω [ δu g(t, x, y) u ] dA + ∫∫_Ω δu h(t, x, y) dA − ∫∫_Ω [ δu c(t, x, y) u̇ ] dA        (14.75)
For the nth element, we can use the linear approximation given in (14.19),

    u̇ ≈ { u̇₁ φ₁ + u̇₂ φ₂ + u̇₃ φ₃ }_n        (14.76)

so that

    ∫∫_(Ω_n) [ δu c(t, x, y) u̇ ] dA ≈ (D/2) [δu]ₙᵀ c(t, p*) Φ Φᵀ [u̇]ₙ        (14.77)

where [u̇]ₙ = ( u̇₁ ; u̇₂ ; u̇₃ )ₙ. Assembling the elements as before, the semi-discrete system becomes

    C(t) du/dt + K(t) u = Γ(t)        (14.78)

where

    C(t) ≡ Σ_(n=1)^(N_elements) E_n C_n(t) E_nᵀ    and    C_n(t) = (D/2) c(t, p*) Φ Φᵀ
and both K(t) and (t) already include the handling of Dirichlet and/or Neumann
conditions, which are possibly time-dependent as well. Matrix C is often also referred
to as a mass matrix.
Equation (14.78) is now a set of simultaneous ordinary differential equations.
If the resulting system is linear with constant coefficients, then the solution can
be obtained analytically using the techniques discussed in Section 6.5.1. If C is
singular with rank r, then the equation could be transformed into a set of r ordinary
differential equations and a set of Nnodes r algebraic equations.
More generally, the ODE solvers discussed in Chapter 7 can be used. In the
event that C(t) becomes singular, the system will be a set of differential-algebraic
equations (DAE) and would then require special treatments. Among the ODE
solvers, due to the large, albeit sparse, matrices involved, the usual preferred choice is
the Euler method based on central difference approximation of the time derivatives.
A slight, yet significant, variation to the central difference approximation is the
Crank-Nicolson method, which is also used in finite difference methods (cf. Section
13.3.2). In this approach, the equations are modified to locate the points at the middle
of two consecutive time points used in the time march. This results in a method with
good stability properties and accuracy (as long as the boundary data are sufficiently
smooth).
Thus the Crank-Nicolson method uses two approximations:

    du/dt |_(t + Δt/2) ≈ [ u(t + Δt) − u(t) ] / Δt        (14.79)

    u(t + Δt/2) ≈ [ u(t + Δt) + u(t) ] / 2        (14.80)

where the time derivative is centered at t + Δt/2. Equation (14.80) sets the value of u at the half-time increment to be the average of the values at t and t + Δt. Applying these approximations to (14.78) at t + Δt/2 will yield, after some rearrangements,

    L_k u_(k+1) = S_k u_k + H_k        (14.81)
where

    L_k ≡ C(τ_k) + (Δt/2) K(τ_k) ,    S_k ≡ C(τ_k) − (Δt/2) K(τ_k) ,    H_k ≡ Δt Γ(τ_k) ,    and    τ_k = t_k + (1/2) Δt        (14.82)
Note that instead of using the "overloading" approach used in solving steady-state equations, a simpler solution for the transient case is to substitute the values of u at the boundary nodes by the boundary values specified by the Dirichlet conditions at each t_k.
Consider the linear reaction-diffusion type differential equation
equation and boundary conditions given in Example 13.10,
EXAMPLE 14.5.
u
= 2 2 u 3u + h(t, x, y)
t
(14.83)
where
h(t, x, y)
a (x, y)
b(x, y)
800r
(1 + 20r)
(1 + 20r)2
8
9
64 2
s2
3200 + 50 16x
5
(1 + 5s)3
8
9
s
64 2
400 + 5 16x
5
(1 + 5s)2
8
1 + 20r(x, y)
(x, y) =
r(x, y) = e22xy
s(x, y) = e8[(x0.8)
(x, y) =
1
1 + 5s(x, y)
2
y]
Figure 14.13. The left plot shows the Delaunay triangulation used for the finite element
solution. The plots on the right are the finite element solution at different time instants using
the Crank-Nicholson method. The approximations are shown as points, whereas the exact
solutions are shown as surface plots.
Figure 14.14. The error distribution between the exact solution and the finite element solution
using the Crank-Nicholson method at different time instants.
E14.1. Generate a triangulation, that is, the set of points P and a corresponding
index set I defined in (14.64) and (14.67), respectively, for the domains
given in Figure 14.15 such that the edge (or characteristic length) of the
triangles are at most 0.1. (Note: Once the positions have been assigned for
both boundary and internal points, you may use the MATLAB function
delaunay to generate both P and I as well as the MATLAB function
triplot to obtain a mesh plot.)
(5,3)
(3,1)
(2,2)
(0,0)
(2,-1)
R=1.5
R=1.5
(0,0)
(0,0)
r=0.5
E14.2. As a small tutorial to help understand the procedure given in Section 14.5,
consider the following differential equation
    2 ∇²u − ∂u/∂x + ∂u/∂y − u + (2x − 3y + 7) = 0

subject to

    u(1, y) = −3y + 4 ;    u(x, −1) = 2x + 5 ;    u(x, 0.5) = 2x + 0.5

and

    ∂u/∂x |_(x=2) = 2.0

in the domain 1 ≤ x ≤ 2 and −1 ≤ y ≤ 0.5. Let the mesh be described based on the node position matrix P and element index matrix I given by

    P = [  1.0   1.0   1.0   1.0   1.5   1.5   1.5   1.5   2.0   2.0   2.0   2.0
          −1.0  −0.5   0.0   0.5  −1.0  −0.5   0.0   0.5  −1.0  −0.5   0.0   0.5 ]

    I = [ 5 2 1 ;  6 2 5 ;  6 5 10 ;  10 5 9 ;  6 10 7 ;  7 10 11 ;  2 6 7 ;  3 2 7 ;  4 3 7 ;  7 8 4 ;  11 12 7 ;  12 8 7 ]
67
7 71
67
1
1
K6 =
5
79 71
and
6 =
211
72
144
77 65 145
211
67
5 59
4
1
1
K7 =
5
67 59
and
7 = 4
72
9
77 77 157
4
3. Show that after assembly, the first five columns of the global matrix K
and
, prior to any overloading, are given by
145
77
0
0
65
71
71
264
280
83
0
14
112
0
65
278 77
0
105
0
0
71 134
0
71
294
10
0
0
304
1
1
0
154
0
0
130
276
,(1,...,5) =
=
K
;
0
2
154
10
0
480
72
144
108
0
0
0 77
0
223
0
0
0
0
77
576
0
0
0
0
2
416
0
0
0
0
0
0
0
0
0
0
261
4. After overloading, solve for the finite element solution u. Compare this
with the exact solution given by u = 2x 3y + 2. Note that because the
solution is a plane, the finite element solution should also be quite close
to the exact solution.
E14.3. Consider the Laplace equation ∇²u = 0 on a unit circle domain R = 1 subject to the Dirichlet boundary condition u(R = 1) = 1 + cos(3θ), 0 ≤ θ < 2π. Obtain the finite element solution using the rectangular coordinate system and compare the solution with the exact solution given by u(r, θ) = 1 + r³ cos(3θ) (cf. Example 11.6). (Hint: For the triangulation of the unit circle, you can use the command [x,y,Ind]=circle_mesh_example(1,20) that is available on the book's webpage for a circle of radius 1 and 20 concentric circles.)
E14.4. Consider the Poisson equation 2 u = 1 in the domain 0 x 1, and
0 y 1, subject to the Dirichlet boundary conditions u(0, y) = u(1, y) =
u(x, 0) = u(x, 1) = 0. Obtain the finite element solution and compare it with
the series solution given in (11.80); that is, plot the difference.
E14.5. A cross-section of a cooling fin is shown in Figure 14.16 with the base
located at y = 0. The steady-state temperature distribution u inside the fin
satisfies the Laplacian equation 2 u = 0, subject to the Dirichlet boundary
condition: u(y = 0) = T 0 , and a mixed boundary condition at y > 0:
T u n
= (hT ) T surr u
Boundary,y>0
Figure 14.16. The dimensions for a trapezoid-shaped fin.
y=0
1. Build a program (under MATLAB or other platforms) that would generate plots of u(x, y) that takes the following as parameters: a, b, c, T , hT ,
T surr , and T 0 , as well as , which is a characteristic length of the triangles.
(Hint: You can use the program trapezoid_mesh.m, or the algorithm
within, that is available on the books webpage, for the triangulation of
the domain.)
2. Try the following values: a = 4, b = 1, c = 1, , T surr = 25, T 0 = 10, and =
0.1. Use the following cases T /hT = 0.5, 1, 10 and observe what happens
to the temperature distribution. Discuss the trend of the finite element
solution obtained here at the various ratios of kT /hT , and compare them
with the trend obtained in Example 9.5, where it was assumed that the
temperature at each cross-sectional slice of the fin is fixed, that is, u =
u(y).
E14.6. Consider the nonhomogeneous Helmholtz equation in polar coordinates given by

    ∇²u + 3u = 18 − 15 r³ sin(3θ)

in a unit circle with a Dirichlet boundary condition given by u = 6 − 5 sin(3θ) at r = 1. Obtain a finite element solution using rectangular coordinates. (You may use the triangulation generated by the command [x,y,Ind]=circle_mesh_example(1,20).) Compare the finite element solution with those found using the finite difference method (cf. Exercise E13.7). (Note: The exact solution is given by u(r, θ) = 6 − 5 r³ sin(3θ).)
E14.7. For the equation
2 u v
with
u
ku = h(x, y)
y
where 0 x 1; 0 y 10
2
h(x, y) = ex A + Bex/2 ;
A = 2 + k 4x2 ;
5
35
+ v 5k + 20x2
4
2
x2
subject to u(x, 0) = 4e , u(1, y) = e1 5ey/2 1 and
u
u
=0
and
=0
y y=10
x x=0
B=
x=L
that is, the flow entering contains only an x-component, and the velocity is
uniform and equal at the entrance and at the exit. Also, because the flow
cannot penetrate the walls of the plates or the cylinder, we have,
n
= n
= n
=0
2
2
2
y=0
(xxc ) +(yyc ) =r
y=Y
1.2
1.2
0.8
0.8
y 0.6
0.6
0.4
0.4
0.2
0.2
0.2
0.2
0.5
1.5
0.5
1.5
Figure 14.17. The streamlines for a potential flow around one cylinder or an array of cylinders.
1. Using the finite element method, solve the Laplacian equation for
subject to the boundary conditions given above. Specifically, let v0 =
3, L = 2, Y = 1, (xc , yc ) = (1, 0.5) and r = 0.2. (Hint: You can use the
MATLAB file rect_circ_mesh.m to build the mesh).
2. Using (14.30), write a program that would find the gradients of each
finite element, to be located at the centroid of the triangle; that is, using
the results found for P, I, and u, evaluate the vectors:
xgrad,1
ygrad,1
vx,1
vy,1
.
.
..
..
xgrad =
; ygrad =
; vx = .. ; vy = ..
.
.
xgrad,N
9
ygrad,N
vx,N
vy,N
For an inviscid and incompressible fluid flowing in two dimensions, the Helmholtz vorticity equation
given in (4.106) implies irrotational flow at steady state because ( v) v = 0 in a 2D flow.
556
where xgrad and ygrad are positions of the centroids of the triangles, and vx
and vy are the x- and y- components of the velocitiy, and N is the number
of finite elements.
3. With the vectors, xgrad , ygrad , vx and vy , try to reproduce the streamline plot shown in the left plot in Figure 14.17. (Hint: Once all the
necessary vectors have been found, you can use the MATLAB file
fem_exer_streamline.m that is available on the books webpage.)
4. Apply the same solution approach to a path involving an array of
horizontal cylinders as shown in the right plot in Figure 14.17. Hint: You
can use the following MATLAB commands that are available on the
books webpage as follows:
[choles,rholes]=circ_arrays(3,4,0.4,0.3,0.12,1,0.5);
[x,y,Ind]=rect_circ_mesh([0;0],[2;1],choles,rholes,0.03);
and
y=Y
T n
Q
= 40
Cv
Let the thermal diffusivity = 1. Using the velocity field found in Exercise
E14.8, let the entrance velocity v0 = 4, and use the finite element method to
obtain the temperature field and a tomographic plot of T . Note: Once the
values for x, y, I, and T (which are the x and y node positions, nodes indices
of the elements, and temperature values at the nodes, respectively) have
been found, a temperature tomography plot can be obtained in MATLAB,
with Ind=I, using the following commands:
trisurf( Ind, x, y, T);
axis equal; view([0 90]);
shading interp;
z=L
Use the finite element method under the axisymmetric assumption to find
the distribution u(r, z) for L = 10, R = 1, = 0.2, v = 3, = 0.3, and uR =
80. Generate a tomographic with contour map of u(r, z) for R r R and
0 z L.
E14.11. At steady state, a parallel, Newtonian viscous flow in the z-direction, through a pipe having a fixed cross-section domain Ω, will have v_x = v_y = 0, and v_z modeled by

    ∂²v_z/∂x² + ∂²v_z/∂y² = −β

where β = −(1/μ) d(p + ρgz)/dz, with μ, p, and g being the viscosity, pressure, and gravitational acceleration, respectively. The domain Ω can either
be simply connected (without holes) or multiply connected (with holes).
Either way, the no-slip boundary condition will have to apply; that is, we
have the Dirichlet conditions, vz = 0 at Boundary(
).
1. Let the cross-section domain Ω be formed by a circle of radius R but with a hole formed by a smaller circle of radius r₀ = κR (κ < 1) whose center coincides with the larger circle. The exact solution of this flow is known,¹⁰ and it is given by

    v_z(r) = ( β R² / 4 ) [ 1 − (r/R)² − ( (1 − κ²) / ln(1/κ) ) ln(R/r) ]        (14.84)

and the average velocity is given by

    v_(z,ave) = ∫∫_Ω v_z dΩ / ∫∫_Ω dΩ = ( β R² / 8 ) [ (1 − κ⁴)/(1 − κ²) − (1 − κ²)/ln(1/κ) ]        (14.85)

For simplicity, let β = 1, κ = 0.3, and R = 1. Using the finite element method, compare the numerical solution using a mesh based on rectangular coordinates with the exact solution given in (14.84). (Hint: You can use the triangulation obtained by the following MATLAB command that is available on the book's webpage:

    [x,y,Ind]=circle_annulus(c,r,choles,rholes,dL);

where c and r are the center and radius of the outer circle, respectively, whereas choles and rholes are the centers and radii of the holes, respectively, and dL is the approximate characteristic length of the elements.)

Furthermore, using the same triangulation and the numerical solution of v_z for the annular flow found above, use the approximation of the integrals based on the shape functions (cf. (14.22)) to determine the
¹⁰ See, for example, R. B. Bird et al., Transport Phenomena, 2nd edition, J. Wiley & Sons, New York, 2002, pp. 54-55.
average velocity vz,ave , and compare it with the exact value given by
(14.85).
2. Instead of one hole at the center, consider the case as shown in Figure 14.18, where there are three holes with centers and their corresponding radii given by:

    choles = [ c₁  c₂  c₃ ] = [ 1/2    (1/2) cos(2π/3)    (1/2) cos(4π/3)
                                 0     (1/2) sin(2π/3)    (1/2) sin(4π/3) ]

and rholes = [ 0.2, 0.2, 0.2 ]. Obtain a tomographic plot of v_z under viscous flow through Ω for this case, as well as the average velocity. Finally, for density ρ = 1, determine the mass flow rate Q = ρ v_(z,ave) A_Ω, where A_Ω = ∫∫_Ω dΩ is the area of Ω.
E14.12. Consider the torus shown in Figure 14.19, with a = 2 and R = 1. Assume
the temperature to be initially at u = 60. At time t > 0, the surface of the
torus was set to 120. The conduction inside the torus can be modeled by the
heat diffusion equation with α = 0.5,

    ∂u/∂t = α ∇²u
Obtain a time-lapse temperature (surface plot or tomograph plot) of a
circular slice of the torus at t = 0.0, 0.1, 0.2, . . . , 0.8.
a
Figure 14.19. A solid torus with major radius a and minor radius R.
E14.13. A cylindrical tank with a hemispherical bottom contains a fluid that is sealed
from the top. A cylindrical rod is then inserted at the center. (See Figure 14.20). The temperature distribution inside the tank can be modeled
using the diffusion equation
T
= 2 u
t
subject to T (r, , z, 0) = T w , T along wall = T w , T along rod = T rod and
T
T
=0
and
=0
r r=0
z along top surface
ro
Figure 14.20. A cylindrical tank with hemisphere bottom
of radius R, including a solid cylindrical heating rod at the
center. The right plot shows the half cross-section from
the cental vertical axis.
APPENDIX A
K
D =
0
0
..
K
dN
Another set of related classes of matrices are the idempotent, projection, involutory, nilpotent, and convergent matrices. These classes are based on the results of
integral powers. Matrix A is idempotent if A2 = A, and if, in addition, A is hermitian,
then A is known as a projection matrix. Projection matrices are used to partition an
N-dimensional space into two subspaces that are orthogonal to each other. A matrix
B is involutory if it is its own inverse, that is, if B2 = I. For example, a reflection
matrix such as the Householder matrix is given by
    H = I − 2 ( v v* ) / ( v* v )
Definition
lim Ak = 0
k
r
Defective (Deficient)
Remarks
k Ak = 0
k=0
r = 0 ; r < N
Diagonalizable
T 1 AT is diagonal
for some nonsingular T
Elementary
Gram
A = B B for some B
Are Hermitian
Hermitian
A =A
(B + B ) /2 is the hermitian
part of B.
B B and BB are hermitian
Idempotent
A2 = A
det(A) = 1 or det(A) = 0
Involutory
A2 = I, i.e. A = A1
Negative definite
x Ax < 0
x = 0
Negative semidefinite
x Ax 0
x = 0
Ak = 0 ; k > 0
det(A) = 0
Normal
AA = A A
Are diagonalizable
Nonsingular (Invertible)
|A| = 0
for procedures that implement iterative computations. If, in addition, k < for
Ck = 0, then the stable matrix will belong to the subclass of nilpotent matrices.
Aside from the classifications given in Tables A.1 and A.2, we also list some special matrices based on the structure and composition of the matrices. These are given
in Table A.3. Some of the items in this table serve as a glossary of terms for the special
matrices already described in this chapter. Some of the matrices refer to matrix structures based on the positions of zero and nonzero elements such as banded, sparse,
triangular, tridiagonal, diagonal, bidiagonal, anti-diagonal, and Hessenberg. Some
involve additional specifications on the elements themselves. These include identity, reverse identity, shift, real, complex, polynomial, rational, positive/negative, or
nonpositive/nonnegative matrices. For instance, positive (or nonnegative) matrices
Definition
Orthogonal
AT = A1
Positive definite
x Ax > 0 ; x = 0
Positive semidefinite
x Ax 0 ; x = 0
Projection
Reducible
Skew-symmetric
A = A
det(A) = 0 if N is odd
aii = 0, thus trace(A) = 0
(B BT )/2 is the
skew-symmetric part of B
Skew-hermitian
A = A
Symmetric
A = AT
Unitary
A = A1
Remarks
are matrices having only positive (or nonnegative) elements.1 Some special matrices
depend on specifications on the pattern of the nonzero elements. For instance, we
have Jordan, Toeplitz, Shift, Hankel, and circulant matrices, as well as their block
matrix versions, that is, block-Jordan, block-Toeplitz, and so forth. There are also
special matrices that depend on collective properties of the rows or columns. For
instance, stochastic matrices are positive matrices in which the sum of the elements
within each row should sum up to unity. Another example is the class of diagonally dominant
matrices, where for the elements of any fixed row, the sum of the magnitudes of the
off-diagonal elements should be less than the magnitude of the diagonal element in
that row. Finally, there are matrices whose entries depend on their row and column
indices, such as Fourier, Hadamard, Hilbert, and Cauchy matrices. Fourier and
Hadamard matrices are used in signal-processing applications.
As can be expected, these tables are not exhaustive. Instead, the collection
shows that there are several classes and special matrices found in the literature.
They often contain interesting patterns and properties such as analytical formulas
for determinants, trace, inverses, and so forth, that could be taken advantage of
during analysis and computations.
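As an illustration (not part of the original tables), several of these special matrices can be generated directly with standard MATLAB built-ins; the sample data below are arbitrary:

    v = [1 2 3 4];                     % sample data
    H  = hilb(4);                      % Hilbert matrix, a_ij = 1/(i+j-1)
    T  = toeplitz([1 5 6 7]);          % symmetric Toeplitz matrix from its first column
    K  = hankel([1 2 3 4]);            % Hankel matrix from its first column
    V  = vander(v);                    % Vandermonde matrix
    C  = compan([1 -6 11 -6]);         % companion matrix of s^3 - 6 s^2 + 11 s - 6
    Hd = hadamard(4);                  % Hadamard matrix of order 4
    R  = gallery('circul', v);         % circulant matrix with first row v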
1   Note that positive matrices are not the same as positive definite matrices. For instance, with
        A = [ 1  5 ; 5  1 ]   and   B = [ 1  2 ; 0  2 ]
    A is positive but not positive definite, whereas B is positive definite but not positive.
Definition
Antidiagonal
A=
0
N
Remarks
...
i>j+p
aij = 0 if
or
j >i+q
p is the right-bandwidth
q is the left-bandwidth
Bidiagonal
(Stieltjes)
Binary
Cauchy
1
1
A=
.
N1
1
i
i1
1
k
if i > j , bij =
i
k
if j = i, bii =
k=j
MATLAB:
A=diag(v)+diag(w,-1)
where v= (1 , . . . , N )
w= (1 , . . . , N1 )
1
N
A=
p n1
1
A=
Complex
..
Companion
2
..
.
aij = 0 or 1
Circulant
det(A) = N
i=1 i
Let B = A1 then
if j > i, bij = 0
2 N
1 N1
3 1
p 1
0
..
.
1
p 0
0
..
.
0
p k are coefficients of a
polynomial:
sN + p N1 sn1 + p 1 s + p 0
MATLAB: A=compan(p)
where p= (1, p n1 , . . . , p 1 , p 0 )
Name
Definition
Diagonal
A=
Remarks
0
..
|aii | >
det(A) = i i
MATLAB: A=diag(alpha)
where alpha= (1 , . . . , N )
Diagonally dominant
aij
Nonsingular (based on
Gersgorins theorem)
i= j
i = 1, 2, . . . , N
Are orthogonal
Used in Fourier transforms
MATLAB:
h=ones(N,1)*[0:N-1];
W=exp(-2*pi/N*1i);
A=W.(h.*h)/sqrt(N)
2
W = exp 1
N
Givens (Rotation)
Hadamard
Hk [=]2k 2k
1
1
Hk =
Hk1
1 1
H0 = 1
Elements either 1 or 1
Are orthogonal
MATLAB: A=hadamard(2k)
Hankel
A=
Hessenberg
a j +k,j = 0
2 k (N j )
Hilbert
1
aij =
i+j 1
Identity
A=
0
..
.
1
Often denoted by IN
det(A) = 1
AB = BA = B
MATLAB: A=eye(N)
(continued)
565
566
Definition
Imaginary
A = iB
where Bis real
and i = 1
Jordan block
A=
Remarks
1
..
.
0
..
..
1
s
Are bidiagonal
det(A) = sN
MATLAB:
A=gallery(jordbloc,N,s)
det(A) =
N
aii
i=1
Lower Triangular
ai,j = 0 ; j > i
=1
MATLAB: A=tril(B)
extracts the lower triangle of B
Negative
aij < 0
Non-negative
aij 0
Non-positive
aij 0
P=
T
ek1
Permutation
ekN
k1 = = kN
e j is the j th unit vector
Persymmetric
A[=]N N
ai,j = a(N+1j ),(N+1i)
Positive
aij > 0
Polynomial
aij are
polynomial functions
Real
Rational
aij are
rational functions
Rectangular
(non-square)
A[=]N M; N = M
Reverse identity
A=
Sparse
Significant number of
elements are zero
Name
Definition
Remarks
Stochastic
(Probability,
transition)
A is real, nonnegative
and N
j =1 aij = 1
for i = 1, 2, . . . N
aka Right-Stochastic
Left-Stochastic if
N
i=1 aij = 1, j
Doubly-Stochastic if
both right- and left- stochastic
Shift
A=
Toeplitz
A=
0
..
.
0
..
..
.
Tridiagonal
A= 1
..
..
0
..
2
..
.
..
.
N1
N1
N
Unit
aij = 1
MATLAB: A=ones(N,M)
Unitriangular
A is (lower or upper)
triangular and aii = 1
det(A) = 1
det(A) =
N
aii
i=1
Upper Triangular
ai,j = 0 ; j < i
=1
MATLAB: A=triu(B)
extracts the upper triangle
portion of B
Vandermonde
A=
M1
1
..
.
M1
N
Zero
aij = 0
1
..
.
1
..
.
If square, det(A) = i<j (i j )
Becomes ill-conditioned for
large N
MATLAB: A=vander(v)
where v= (1 , . . . , N )
MATLAB: A=zeros(N,M)
    y1 = a11 x1 + · · · + a1M xM
    ⋮
    yN = aN1 x1 + · · · + aNM xM
or, more compactly,
    yi = Σ_{j=1}^{M} aij xj ,   i = 1, 2, . . . , N        (A.1)
Collecting the terms into
    y = (y1, . . . , yN)^T ,   x = (x1, . . . , xM)^T ,   A = [ a11 ⋯ a1M ; ⋮ ⋱ ⋮ ; aN1 ⋯ aNM ]        (A.2)
the pair of equations
    y1 = x1 + 3 x2 ,   y2 = x1 − 2 x2
for example, can be written as y = Ax where
    A = [ 1  3 ; 1  −2 ] ,   y = [ y1 ; y2 ] ,   x = [ x1 ; x2 ]
As we proceed from here, we show that the postulated form in (A.2) to represent
(A.1) will ultimately result in a definition of matrix products C = AB, which is a
generalization of (A.2), that is, with y = C and x = B.
Now let y1, . . . , yN and z1, . . . , zN be related to x1, . . . , xM as follows:
    yi = Σ_{j=1}^{M} aij xj   and   zi = Σ_{j=1}^{M} bij xj ,   i = 1, . . . , N
Let ui = yi + zi, i = 1, . . . , N; then
    ui = Σ_{j=1}^{M} aij xj + Σ_{j=1}^{M} bij xj = Σ_{j=1}^{M} (aij + bij) xj = Σ_{j=1}^{M} gij xj
where gij are the elements of a matrix G. From the rightmost equality, we can then
define the matrix sum by the following operation:
    G = A + B :   gij = aij + bij ,   i = 1, . . . , N ,  j = 1, . . . , M        (A.3)
Next, multiplying each yi by a scalar α,
    α yi = α Σ_{j=1}^{M} aij xj = Σ_{j=1}^{M} (α aij) xj = Σ_{j=1}^{M} hij xj
where hij are the elements of a matrix H. From the rightmost equality, we can then
define the scalar product by the following operation:
    H = α A :   hij = α aij ,   i = 1, . . . , N ,  j = 1, . . . , M        (A.4)
Next, let wk = Σ_{i=1}^{N} cki yi, k = 1, . . . , K, and yi = Σ_{j=1}^{M} aij xj, i = 1, . . . , N, where
cki and aij are elements of matrices C and A, respectively; then
    wk = Σ_{i=1}^{N} cki ( Σ_{j=1}^{M} aij xj ) = Σ_{j=1}^{M} ( Σ_{i=1}^{N} cki aij ) xj = Σ_{j=1}^{M} fkj xj
where fkj are the elements of a matrix F. From the rightmost equality, we can then
define the matrix product by the following operation:
    F = C A :   fkj = Σ_{i=1}^{N} cki aij ,   k = 1, . . . , K ,  j = 1, . . . , M        (A.5)
Now consider the pair of equations
    a11 x1 + a12 x2 = b1
    a21 x1 + a22 x2 = b2        (A.6)
One of the unknowns (e.g., x2) can be eliminated by multiplying the first equation by a22,
multiplying the second equation by a12, and then subtracting the second result from the first.
Doing so, we obtain
    (a11 a22 − a12 a21) x1 = a22 b1 − a12 b2        (A.7)
Eliminating x1 in the same manner gives
    (a11 a22 − a12 a21) x2 = a11 b2 − a21 b1        (A.8)
The coefficients of x1 and x2 in (A.7) and (A.8) are essentially the same, which we
now define as the determinant function of a 2 × 2 matrix,
    det(M) = det [ m11  m12 ; m21  m22 ] = m11 m22 − m12 m21        (A.9)
Equations (A.7) and (A.8) can then be combined to yield a matrix equation,
    det(A) [ x1 ; x2 ] = [ a22  −a12 ; −a21  a11 ] [ b1 ; b2 ]        (A.10)
If det(A) ≠ 0, then we have
    [ x1 ; x2 ] = (1/det(A)) [ a22  −a12 ; −a21  a11 ] [ b1 ; b2 ]
Next, consider a system of three equations,
    a11 x1 + a12 x2 + a13 x3 = b1
    a21 x1 + a22 x2 + a23 x3 = b2
    a31 x1 + a32 x2 + a33 x3 = b3        (A.11)
We can rearrange the first two equations in (A.11) and move terms with x3 to the
other side to mimic (A.6), that is,
a11 x1 + a12 x2
b1 a13 x3
a21 x1 + a22 x2
b2 a23 x3
a12
a11
3 = det
a11
a21
a12
a22
b1 a13 x3
b2 a23 x3
(A.12)
a12
a11
=
=
3
(A.15)
where
3 = det
a22
a32
a21
a31
and
3 = det
a11
a12
a31
a32
Substituting (A.15) into (A.14) and rearranging to solve for unknown x3 , we obtain
a13
3 a23
3 + a33
3 x3 =
3 b1
3 b2 +
3 b3
(A.16)
Looking closer at the three 2 × 2 determinants defined above, we see that they are just the
determinants of the submatrices A13, A23, and A33, respectively (where Aij are the matrices
obtained by removing the ith row and jth column, cf. (1.5)). The determinant of Aij is also
known as the ijth minor of A. We can further incorporate the positive or negative signs
appearing in (A.16) with the minors and define them as the cofactor of aij, denoted by
cof(aij) and given by
    cof(aij) = (−1)^{i+j} det(Aij)        (A.17)
Equation (A.16) can then be written as
    ( Σ_{i=1}^{3} ai3 cof(ai3) ) x3 = [ cof(a13)  cof(a23)  cof(a33) ] [ b1 ; b2 ; b3 ]        (A.18)
Instead of applying the same sequence of steps to solve for x1 and x2, we just
switch indices. Thus, to find the equation for x1, we can exchange the roles of indices
1 and 3 in (A.18). Likewise, for x2, we can exchange the roles of indices 2 and 3 in
(A.18). Doing so, we obtain
    ( Σ_{i=1}^{3} ai1 cof(ai1) ) x1 = [ cof(a11)  cof(a21)  cof(a31) ] [ b1 ; b2 ; b3 ]        (A.19)
    ( Σ_{i=1}^{3} ai2 cof(ai2) ) x2 = [ cof(a12)  cof(a22)  cof(a32) ] [ b1 ; b2 ; b3 ]        (A.20)
i=1
3
i=1
3
i=1
(A.21)
The sum of the six terms in (A.21) can now be defined as the determinant of a
3 3 matrix. By comparing it with the determinant of a 2 2 matrix given in (A.9),
we can inductively define the determinants and cofactors for any size matrix as given
in Definitions 1.4 and 1.5, respectively.
    [ x1 ; x2 ] = (1/det(A)) [ cof(a11)  cof(a21) ; cof(a12)  cof(a22) ] [ b1 ; b2 ]        (A.22)
Likewise, if we combine (A.19), (A.20), and (A.18) for a size-3 problem, we obtain
    [ x1 ; x2 ; x3 ] = (1/det(A)) [ cof(a11)  cof(a21)  cof(a31) ; cof(a12)  cof(a22)  cof(a32) ; cof(a13)  cof(a23)  cof(a33) ] [ b1 ; b2 ; b3 ]        (A.23)
The multivariable Taylor series of a scalar function f(x), x = (x1, . . . , xN), around a point x̄ can be written as
    f(x) = Σ_{K=0}^{∞} F_K( f, x̄, Δx ) ,   Δx = x − x̄        (A.24)
where
    F_K( f, x̄, Δx ) = Σ_{ k1, . . . , kN ≥ 0 ;  k1 + · · · + kN = K }  [ 1/(k1! · · · kN!) ]  [ ∂^K f / ( ∂x1^{k1} · · · ∂xN^{kN} ) ]_{x = x̄}  Π_{i=1}^{N} (Δxi)^{ki}        (A.25)
THEOREM A.1.
F1
F2
(A.26)
PROOF.
d
f (x
x)
dx x=x
(A.27)
EXAMPLE A.1.
where,
g (x1 , x2 ) = 4
Figure A.1. Surface plot of f(x1, x2).
64
=
e
0.5)
(x
1
x1 2
f
= 8eg (x2 + 0.5)
x2
2 f
2
g
8
64
=
e
+
0.5)
(x
2
x2 2
2 f
2 f
=
= 64eg (x1 0.5) (x2 + 0.5)
x1 x2
x2 x1
Choosing
x = (0, 0), we have the first-order approximation given by
x1
2
2
f Lin,(0,0)T (x) = 1 e
+ 4e
1 1
x2
or
f Lin,(0,0)T (x) = 1 e2 + 4e2 x1 + x2
Quad,(0,0)T
(x)
1 e2 + 4e2 1
2
+ 4e
or
x1
x2
1
1
2
x1
x2
2
1
x1
x2
f Quad,(0,0)T (x) = 1 e2 + 4e2 x1 + x2 x21 + 4x1 x2 x22
Figure A.2. The errors from f of the first-order approximation (left) and the second-order
approximation (right) at x̄ = (0, 0)^T.

From Figure A.1, we can see that the minimum value of f(x) occurs at x1 = 0.5
and x2 = −0.5. If we had chosen to expand the Taylor series around the point
x̄ = (0.5, −0.5)^T, the gradient would be df/dx = (0, 0). The Hessian would be given by
    d²f/dx² = [ 8  0 ; 0  8 ]
and the second-order approximation is
    f_Quad,(0.5,−0.5)^T (x) = 4 [ (x1 − 0.5)² + (x2 + 0.5)² ]
A plot of f_Quad,(0.5,−0.5)^T (x) for a region centered at x̄ = (0.5, −0.5)^T is shown
in Figure A.3. Second-order approximations are useful in locating the value of
x that would yield a local minimum for a given scalar function. At the local
minimum, the gradient must be a row vector of zeros. Second, if the shape of the
curve is strictly convex in a small neighborhood around that point, then a minimum
is present. The convexity will depend on whether the Hessian, d²f/dx², is positive
or negative definite.

Figure A.3. A plot of f_Quad,(0.5,−0.5)^T (x) for a region centered at x̄ = (0.5, −0.5)^T.
    A + (B + C) = (A + B) + C
For the identity (AB) ⊗ (CD) = (A ⊗ C)(B ⊗ D), let A[=]m × p and B[=]p × n
and then expand the right-hand side,
    (A ⊗ C)(B ⊗ D) = [ a11 C ⋯ a1p C ; ⋮ ⋱ ⋮ ; am1 C ⋯ amp C ] [ b11 D ⋯ b1n D ; ⋮ ⋱ ⋮ ; bp1 D ⋯ bpn D ]
        = [ Σ_{i=1}^{p} a1i bi1 CD   ⋯   Σ_{i=1}^{p} a1i bin CD ; ⋮ ⋱ ⋮ ; Σ_{i=1}^{p} ami bi1 CD   ⋯   Σ_{i=1}^{p} ami bin CD ]
        = (AB) ⊗ (CD)
2. Transposes of Products.
Let A[=]N M, B[=]M L, then
(AB)T
ij
M
a jm bmi =
m=1
(AB) = B AT
ij
(A B)T
AT BT
bmi a jm = BT AT ij
m=1
T
a11 B
..
aN1 B
M
a ji bji = AT BT ij
(A B)T = AT BT
T
..
a1M B
a11 B
..
..
=
.
.
aNM B
a1M BT
aN1 B
..
..
.
aMN BT
Thus, with C = B^{−1} A^{−1},
    C (AB) = B^{−1} A^{−1} A B = B^{−1} B = I   and   (AB) C = A B B^{−1} A^{−1} = A A^{−1} = I
so (AB)^{−1} = B^{−1} A^{−1}. Similarly, for the Kronecker product,
    (A^{−1} ⊗ B^{−1}) (A ⊗ B) = (A^{−1} A) ⊗ (B^{−1} B) = I
Thus
    (A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}
4. Vectorization of Sums and Products.
Let A, B, C[=]N M and C = A + B
=
th
(X)C(,j ) =
X (,1)
X (,r)
c1j
r
.
cij X (,i)
.. =
i=1
crj
r
(BX)C(,j ) = B XC(,j ) =
cij BX (,i)
i=1
c1j B
c1j B
(BXC)(,1)
(BXC)(,2)
vec (BXC) =
..
.
=
c2j B
c2j B
(BXC)(,s)
T
C B vec (X)
crj B
X (,1)
X (,2)
..
.
X (,r)
crj B
c11 B
c12 B
..
.
c21 B
c22 B
..
.
..
.
cr1 B
cr2 B
..
.
c1s B
c21 B
crs B
vec (X)
vec (X)
577
578
5. Reversible Operations.
    [ (A^T)^T ]_{ij} = A_{ij} ,   thus   (A^T)^T = A
Likewise, let C = A^{−1}; then C A = A C = I, which means C^{−1} = A, that is,
    (A^{−1})^{−1} = A
n
k1 , . . . , kn
ci,ki
k1 =k2 ==kn
i=1
k1 , . . . , kn
n
k1 =k2 ==kn 1 =1
n
1 =1
but
n
a11 b1 k1
n =1
n
aii
i=1
ann bn kn
n
bi ki
j =1
k1 , . . . , kn (b1 k1 bn kn )
(a11 ann )
n
n =1
k1 , . . . , kn
n
n =1
k1 =k2 ==kn
k1 , . . . , kn (b1 k1 bn kn ) = 0
1 =1
k1 =k2 ==kn
n
if i = j
k1 =k2 ==kn
so
det(C) =
1 =2 ==n
k1 , . . . , kn (b1 k1 bn kn )
(a11 ann )
k1 =k2 ==kn
n
det(C) =
1 , . . . , n
aii
1 =2 ==n
i=1
k1 =k2 ==kn
det(A) det(B)
k1 , . . . , kn
bjkj
j =1
u11
0
u12
u22
= u11 u22
det
d11
0
det
0
d22
11
21
0
22
= 11 22
= d11 d22
Then using induction and row expansion formula (1.12), the result can be proved
for any size N.
3. Determinant of Transposes.
For a 2 × 2 matrix,
    det [ a11  a12 ; a21  a22 ] = a11 a22 − a12 a21 = det [ a11  a21 ; a12  a22 ]
By induction, and using (1.12), the result det(A^T) = det(A) can be shown to be true for matrices
of size N.
4. Determinant of Inverses.
Because A^{−1} A = I, we can take the determinant of both sides, and use the
property of the determinant of products. Thus
    det(A^{−1} A) = det(A^{−1}) det(A) = 1   ⟹   det(A^{−1}) = 1/det(A)
A,k1
A,kN
=A
ek1
ekN
Using (1.10),
det
ek1
ekN
= K
ekN
= K det A
1 a11
..
.
1 aN1
N a1N
a11
..
..
= .
.
N aNN
aN1
a1N
1
..
.
0
aNN
0
..
.
N
580
N
1 a11
N a1N
a11
a1N
..
..
..
det
i det ...
=
.
.
i=1
1 aN1
N aNN
aN1
aNN
7. Multilinearity Property.
Let aij = vij = wij , j = k and vik xi , wik = yi , aik = xi + yi for i, j = 1, 2, . . . , n,
By expanding along the kth column,
det A
n
i=1
n
xi cof (vik ) +
i=1
=
8. Determinant when
n
i=1
n
yi cof (wik )
i=1
det V + det W
Let the elements of matrix V (k, ) be the same as the identity matrix I except
for the kth row replaced by 1 , . . . N , that is,
1
1
0
..
..
.
.
1 k1
V (k, ) =
1
k+1
.
.
.
.
.
0
.
N
1
where k = 0. Then evaluating the determinant by expanding along the kth row,
det V (k, ) = k
Postmultiplying A by V (k, ), we have
N
A V (k, ) =
A,1 A,(k1)
A
A,(k+1)
j =1 i ,j
=
A,1 A,(k1) 0 A,(k+1) A,N
A,N
N
=1
ai cof(a j )
Using (1.13), bij is the determinant of a matrix formed from A except that the j th
row is replaced by the ith row of A, that is,
a11 a1N
..
ai1 aiN
..
bij = det
..
.
aN1
aNN
We can use the property of determinant of matrices with linearly dependent rows to
conclude that bij = 0 when i = j , and bii = det (A), that is,
det A
0
..
A adj(A) = B =
= det A I
.
0
det A
or
1
det A
adj(A) = I
adj(A) A = I
det A
Thus
A1 =
1
adj(A)
det A
x1
cof(a11 )
x2
1
..
.. =
.
. det(A)
cof(a1N )
xn
Thus, for the kth entry in x,
n
xk =
j =1
..
.
b1
cof(aN1 )
b2
..
..
.
.
cof(aNN )
bN
bj cof(akj )
det(A)
The numerator is just the determinant of a matrix, A[k,b] , which is obtained from A
with kth column replaced by b.
581
582
det
A
C
0
D
= det(A)det(D)
where
z1j
cof(z1j )
=
=
if j n
j >n
G1j
1+j
(1) det
Cj
g 1j
0
0
B
, j n
then
n
det Z =
g 1j cof(g 1j )det B = det G det B
j =1
0
D CA1 B
B
D
AA1 I + B1 CA1 + B 1 CA1
I + B 1 CA1 B 1 CA1 = I
AX + BZ
CW + DY
CA1 I + B1 CA1 + D 1 CA1
AW + BY
=
=
1 = I
CX + DZ
k1 =k2 ==kN
+ ... +
d
k1 , . . . , kN
a1,k1 a2,k2 . . . aN,kN
dt
k1 =k2 ==kN
N
d
k1 , . . . , kN a1,k1 a2,k2 . . .
aN,kN
dt
det
A
n
n=1
where
det
A
n =
k1 =k2 ==kN
d
an,kn . . . aN,kN
k1 , . . . , kN a1,k1 . . .
dt
583
584
d(a11 x1 )/dx1
d
..
Ax =
=A
.
dx
d(am1 x1 )/dx1
Assume (1.49) is true for
A[=]M (N 1) and
x[=](N 1) 1. Let
x
A= A v
and x =
=
=
=
=
d
A
dx
d
A
x + v
dx
d
A
x + v
d
x
A v =A
x
A
x + v
wT
where v[=](N 1) 1, w[=](N 1) 1, and , are scalars. Then
d T
d
T
T
2
x Ax =
x A
x+
x + (w + v)
dx
dx
=
AT +
A + (wT + vT )
xT (w + v) + 2
xT
T
A +A w+v
=
xT T
w + vT
2
T
A
w
v
A
= xT
+
T
T
w
xT AT + A
where we used the fact that xT v and xT w are symmetric. Thus equation (1.50)
can be shown to be true for any N by induction.
3. Proof of (1.51): d2 (xT Ax)/dx2 = A + AT
'
(T
T
d2 T
d d T
d T T
x
Ax
=
Ax
=
x
x
A
+
A
= A + AT
dx2
dx dx
dx
SK ( f, x,
x)
K=1
where
x) =
SK ( f, x,
k1 0
BC
N
i
ak1 ,,kN
kN 0
N
xi )ki
(xi
i=1
ki =K
At x =
x, we see that SK (f,
x,
x) = 0 for K > 0, or
f (
x) = a0
Then, for a fixed value of k1 , . . . , kN ,
K f
xk1 1 xkNN
i=1
i >0
After setting x =
x and rearranging, we have
ak1 ,,kN =
1
k1 ! kN !
K f
kN
k1
x1 xN
x)
(x=
f (x +
x) = f (x ) +
x) + (
x)
f
f
(x
(
x)
dx x=x
2
dx2 x=x
585
586
becomes
1
f (x +
x) f (x ) = (
x)T
2
d2
f
(
x)
dx2 x=x
With the additional condition that the Hessian is positive definite, that is,
2
d
f
x = 0
(
x)T
(
x) > 0
2
dx
x=x
then
f (x +
x) > f (x )
for all
x = 0
which means that x satisfying both (1.43) and (1.44) are sufficient conditions for x
to be a local minimum.
for all x = 0
(A.29)
for all x
(A.30)
N
N
aij xi x j
(A.31)
i=1 j =1
(A.32)
[ f (x)] = x Qx
(A.33)
or
(x Ax)
x A x
1
x (A + A ) x = x Qx
2
for all x = 0
(A.34)
for all x
(A.35)
EXAMPLE A.2.
Let N = 2, and
[f (x)] = x Qx
q12
q12
q11 q22 q12 q12
q11 x1 +
x1 +
x2
x2 +
x2 x2
q11
q11
q11
q11 yy +
det(Q)
x2 x2
q11
where
y = x1 +
q12
x2
q11
x Ax = 5 x1 + x2
x1 + x2 +
x2 x2
10
10
5
which we can see will always have a positive value if x = 0. Thus A is positive
definite.
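As a quick numerical illustration of this point (a sketch with an arbitrary sample matrix, not the matrix of this example), positive definiteness of a real nonsymmetric matrix can be tested through the eigenvalues of its Hermitian (symmetric) part:

    A = [5 2; -1 3];               % hypothetical nonsymmetric matrix
    Q = (A + A')/2;                % Hermitian part of A
    lam = eig(Q);                  % A is positive definite iff all eigenvalues of Q are positive
    isPD = all(lam > 0)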
Note that A does not have to be symmetric or Hermitian to be positive definite.
However, for the purpose of determining positive definiteness of a square matrix A,
one can simply analyze the Hermitian component Q = (A + A ) /2, which is what
the theorem below will be focused on. We can generalize the procedure shown in
Example A.2 to N > 2. The same process of completing the square will produce
587
588
(A.36)
dk = det .
..
..
..
..
.
.
.
hk1 hk2 hkk
EXAMPLE A.3.
2
3
3
H= 3
6
0.1
3 0.1
2
Using Sylvesters criterion, we take the determinants of the principal submatrices of increasing size starting from the upper left corner:
2
3
3
2 3
det (2) = 2 , det
= 3 , det 3
6
0.1 = 40.52
3 6
3 0.1
2
Because one of the determinants is not positive, we conclude that H is not
positive definite.
Now consider the matrix Q given by
    Q = [ 3  −1  0.1 ; −1  4  2 ; 0.1  2  3 ]
Using Sylvester's criterion on Q, the determinants of the leading principal submatrices are:
    det(3) = 3 ,   det [ 3  −1 ; −1  4 ] = 11 ,   det [ 3  −1  0.1 ; −1  4  2 ; 0.1  2  3 ] = 20.56
Because all the determinants are positive, we conclude that Q is positive definite.
Note that matrices that contain only positive elements are known as positive
matrices. Thus H given previously is a positive matrix. However, as we just
showed, H is not positive definite. Conversely, Q given previously is not a
positive matrix because it contains some negative elements. However, Q is
positive definite. Therefore, it is crucial to distinguish between the definitions
of positive definite matrices and positive matrices.
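A small MATLAB sketch of Sylvester's criterion, applied to the matrix Q of this example:

    Q = [3 -1 0.1; -1 4 2; 0.1 2 3];     % matrix Q from Example A.3
    d = zeros(3,1);
    for k = 1:3
        d(k) = det(Q(1:k,1:k));          % leading principal minors
    end
    isPD = all(d > 0)                    % Sylvester's criterion: all minors positive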
APPENDIX B
where B is a matrix that contains the pivot element in the upper-left corner.
By choosing the element with the largest absolute value as the pivot, the pivot is
0 only when A = 0. This can then be used as a stopping criterion for the elimination
process. Thus if A = 0, matrix B will have a nonzero value in the upper-left corner,
yielding the following partitioned matrix:
    P_r A P_c^T = B = [ b11   w^T ; v   B̃ ]        (B.1)
where P_r and P_c are the row and column permutations that bring the chosen pivot to the
(1, 1) position. The elimination process takes the values of b11, v, and w^T to form an
elementary row operator matrix GL[=]N × N and a column elementary operator matrix
GR[=]M × M given by
    GL = [ 1/b11        0^T ; −(1/b11) v   I ]   and   GR = [ 1   −(1/b11) w^T ; 0   I ]        (B.2)
These matrices can now eliminate, or zero out, the nondiagonal elements in the
first row and first column, while normalizing the (1, 1) element, that is,
    GL B GR = [ 1   0^T ; 0   B̃ − (1/b11) v w^T ]
Let = a, be the pivot of A. For computational convenience, we could combine
T
GR , then
the required matrices, that is, let EL = GLP() and ER = P()
If = 1,
1/
0 0
a2, / 1
0
EL =
(B.3)
..
.
.
.
.
am, /
otherwise, if > 1,
EL
0
0
1/
a2, /
..
.
0
0
..
.
a1, /
a1, /
0
..
.
0
..
.
a+1, /
..
.
am, /
0
0
..
.
0
1
0
..
.
0
..
..
0
0
..
.
0 th row
th column
(B.4)
If = 1
ER =
1
0
..
.
a,2 /
1
..
a,n /
(B.5)
591
otherwise, if > 1
ER =
0
0
.
..
..
.
0
0
1
..
1
0
..
.
0
0
..
.
0
0
a,2
0
..
.
a,1
a,1
a,+1
0
..
.
0
..
.
..
0
0
..
.
a,n
th
row
th column
EXAMPLE B.1.
Let A be given by
1
A = 1
2
1
2
4
(B.6)
1
3
3
0 0
1/4
0
1
0
EL = 0 1 1/2
ER = 1 1/2 3/4
1 0 1/4
0
0
1
from which we get
1
ELAER = 0
0
0
2
1/2
0
3/2
1/4
592
1. Determine the pivot = max
ij . If = 0, then stop; otherwise, continue.
i,j
EL ER =
0 0
1
0
..
.
0
3. Update Q, W, and
as follows:
L
Ir
0[r(Mr)]
0[(Nr)r]
EL
R
if r = 0
Q
otherwise
if r = 0
Ir
0[r(Mr)]
0[(Nr)r]
ER
otherwise
4. Increment r by 1.
EXAMPLE B.2.
1
A = 1
2
1
3
3
1
2
4
Iteration
1
1
2
1
3
3
1
2
4
2
1/2
3/2
1/4
5/8
3 2
0
1
2 1 1
5
8
EL
0 1/4
0
1
1 1/2
1 1/2
0 1/4
0
0
1/2
1/4
1 1
ER
0
1
8/5
1
0
0
3/4
1
3/4
1
1
1 0
0
1
0
0 0 1/2
Q = 0 1
0 0 8/5
0
1/4
0
0
1/4
= 0
1/2
1/4
8/5
2/5
3/5
0
1
0
0
1
0
1
1/2
0
1
1/2
0
1
0
3/4 0
1
0
3/4
9/8
1
0
0
0 0
1
1
0
1
0
0
1
0
0
1
3/4 0
1
0
1/4
1/2
1/4
0
1
0
0
0
1
Remarks:
1. By choosing the pivot to be the element having the largest absolute value,
accuracy is also improved because division by small values can lead to larger
roundoff errors.
2. The value of rank r is an important property of a matrix. If the matrix is square,
r = M = N implies a nonsingular matrix; otherwise it is singular. For a nonsquare M N matrix, if r = min(M, N), then the matrix is called a matrix of full
rank; otherwise we refer to them as partial rank matrices.1
3. Because roundoff errors resulting from the divisions by the pivot tend to propagate with each iteration, the Gauss-Jordan elimination method is often used for
medium-sized problems only. This means that in some cases, the value of zero
may need to be relaxed to within a specified tolerance.
4. The Gauss-Jordan elimination algorithm can also be used to find the determinant
of A. Assuming r = M = N, the determinant can be found by taking the product of the
pivots times (−1) raised to the number of row interchanges plus the number of column
interchanges. For example, using the calculations performed in Example B.2, there is
one row interchange and one column interchange, while the pivots are 4, 2, and 5/8.
Thus the determinant is given by
    det A = (−1)^{1+1} (4)(2)(5/8) = 5
5. A MATLAB file gauss_jordan.m is available on the book's webpage that
finds the matrices Q and W, as well as the inverses Q^{−1} and W^{−1}. The program allows
one to prescribe the tolerance level while taking advantage of the sparsity of EL
and ER.
As discussed later, the rank r determines how many columns or rows are linearly independent.
(B.7)
1
0
..
where i > 0, i = 1, . . . , r
(B.8)
=
..
.
0
The details for obtaining U, V , and can be found in Section 3.9. Based on (B.7),
Q and W can be found as follows:
Q =
1 U
where,
1
11
W =V
and
(B.9)
0
..
.
r1
1
..
EXAMPLE B.3.
Let A be given by
12
0
A=
10
3
32
4
24
8
28
2
22
7
595
Then the singular value decomposition can be obtained using MATLABs svd
command to be
0.7749
0.2179
0.0626 0.5900
0.0728
0.8848 0.2314
0.3978
U =
0.5972 0.4083 0.3471
0.5968
0.1937
0.0545
0.9066
0.3708
0.2781 0.6916
0.6667
V =
0.7186 0.6103 0.3333
0.6374 0.3864 0.6667
57.0127
0 0
0.0175
0
0
0
1.8855
0
1 =
0 0.5304
0
=
0
0 0
0
0 1.0000
0
0 0
Finally,
Q = U
and
W = V 1
0.0049
= 0.0126
0.0112
0.3668
0.3237
0.2049
0.6667
0.3333
0.6667
0 0 1
GB = 1 0 1
0 0 0 B
(We use the subscript B to indicate that the elements are boolean.)
Figure B.1. The digraph G given in (B.10).
596
EXAMPLE B.4.   Consider the set of functional dependencies
    x1 = f1(x5, x6)          x5 = f5(x2, x3)
    x2 = f2(x2, x5, x7)      x6 = f6(x4, x7)
    x3 = f3(x2, x7)          x7 = f7(x3, x5)
    x4 = f4(x1, x2)        (B.11)
The corresponding boolean influence matrix is
    GB = [ 0 0 0 0 1 1 0
           0 1 0 0 1 0 1
           0 1 0 0 0 0 1
           1 1 0 0 0 0 0
           0 1 1 0 0 0 0
           0 0 0 1 0 0 1
           0 0 1 0 1 0 0 ]_B        (B.12)
The vertices of Figure B.2 can be moved around to show a clearer structure
and a partitioning into two subgraphs G1 and G2 , as shown in Figure B.3, where
G1 = {x2 , x3 , x5 , x7 } and G2 = {x1 , x4 , x6 }. Under this partitioning, any of the
vertices in G1 can link to vertices in G2 , but none of the vertices of G2 can
reach the nodes of G1 . This decomposition implies that functions { f 2 , f 3 , f 5 , f 7 }
in (B.11) can be used first to solve for {x2 , x3 , x5 , x7 } as a group because they are
not influenced by either x1 , x4 , or x6 . Afterward, the results can be substituted
to functions { f 1 , f 4 , f 6 } to solve for {x1 , x4 , x6 }.2
The process in which a set of nonlinear equations are sequenced prior to actual solving of the
unknowns is known as precedence ordering.
vertex in the same collection. Obviously, as the number of vertices increases, the
complexity will likely increase such that the decomposition to strongly connected
subgraphs will be very difficult to ascertain by simple inspection alone. Instead, we
can use the boolean matrix representation of the influence digraph to find the desired
partitions.
Because the elements of the boolean matrices are boolean (or logic) variables,
the logical OR and logical AND operations replace the sum (+) and product (·)
operations, respectively, during the matrix product operations. This means
that we have the following rules³, illustrated by the example in (B.13):
    (0 + 0)_B = 0        (0 + 1)_B = (1 + 0)_B = (1 + 1)_B = 1
    (0 · 0)_B = (1 · 0)_B = (0 · 1)_B = 0        (1 · 1)_B = 1
    (A · B)_B = C   where   cij = (ai1 · b1j)_B + · · · + (aiK · bKj)_B
    (A^k)_B = (A · A · · · A)_B   (k factors)
1 0 1
0
0 0 1 1
1 0 1
1
0
1
0
1
1
1
=
0
1
1
B
0
0
0
(B.13)
1
1
1 B
Using the rules in (B.13), we note that for a digraph with boolean matrix A[=]N × N, the result of
(A^k)_B with k ≤ N is to add new arcs (vi, vj) to the original digraph if there exists
a path consisting of at most k arcs that links vi to vj. Thus, to find the strongly
connected components, we could simply perform the boolean matrix conjunctions
enough times until the resulting digraph has settled to a fixed boolean matrix, that
is, find k ≤ N such that (A^k)_B = (A^{k+1})_B.
0 0 0 0 1 1 0
1 1 1 1 1 1 1
0 1 0 0 1 0 1
0 1 1 0 1 0 1
0 1 0 0 0 0 1
0 1 1 0 1 0 1
4
3
G B=
1 1 0 0 0 0 0 = 1 1 1 1 1 1 1 = G B
0 1 1 0 0 0 0
0 1 1 0 1 0 1
0 0 0 1 0 0 1
1 1 1 1 1 1 1
0 0 1 0 1 0 0 B
0 1 1 0 1 0 1 B
3
From the result of G B , we see that columns {2, 3, 5, 7} have the same entries,
whereas columns {1, 4, 6} have the same entries. These two groups of indices
determine the subgraphs G1 and G2 obtained in Example B.4.
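A minimal MATLAB sketch of this fixed-point computation, using the boolean matrix GB of (B.12); the loop stops once a further boolean conjunction no longer changes the result (as happens here at the third power, cf. Example B.5):

    GB = logical([0 0 0 0 1 1 0; 0 1 0 0 1 0 1; 0 1 0 0 0 0 1; 1 1 0 0 0 0 0; ...
                  0 1 1 0 0 0 0; 0 0 0 1 0 0 1; 0 0 1 0 1 0 0]);
    P = GB;
    Pnext = (double(P)*double(GB)) > 0;      % one more boolean conjunction
    while ~isequal(P, Pnext)
        P = Pnext;
        Pnext = (double(P)*double(GB)) > 0;
    end
    % columns of P sharing the same pattern identify the groups {2,3,5,7} and {1,4,6}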
For the special case of linear equations, we could use the results just obtained
to determine whether a matrix is reducible or not, and if it is, we could also find the
required permutation matrix P that is needed to find the reduced form.
Definition B.1. A square matrix A is a reducible matrix if there exists a permutation
matrix P such that
    P A P^T = B = [ B11   0 ; B21   B22 ]
A matrix that is not reducible is simply known as an irreducible matrix.
More generally,
    B = [ B11   0    ⋯   0
          B21   B22  ⋯   0
          ⋮     ⋮    ⋱   ⋮
          BM1   BM2  ⋯   BMM ]
where the diagonal blocks Bii[=]Mi × Mi and Σ_{i=1}^{M} Mi = N.
Remarks: A MATLAB code that implements the algorithm for finding the reduced
form is available on the book's webpage as matrixreduce.m.
EXAMPLE B.6.
0 0 0 0
2
0 1 0 0 1
0 3 0 0
0
2
1
0
0
0
A=
0 1 1 0
0
0 0 0 4
0
0 0 1 0 1
1
0
0
0
0
0
0
0
1
1
0
0
1
0
then the influence graph is given by the same boolean matrix GB given in example B.5. The algorithm then suggests the following sequence: [2, 3, 5, 7, 1, 4, 6]
for P, that is,
0 1 0 0 0 0 0
0 0 1 0 0 0 0
0 0 0 0 1 0 0
P=
0 0 0 0 0 0 1
1 0 0 0 0 0 0
0 0 0 1 0 0 0
0
1
3
T
PAP = A =
0
0
1
0
1
0
0
1
2
0
0
1
1
0
0
0
0
1
0
0
0
0
0
2
0
0
0
0
0
0
0
4
0
0
0
0
1
0
0
Once the block triangular structure has been achieved, a special case of (1.34)
can be used, that is,
    [ B11   0 ; B21   B22 ]^{−1} = [ B11^{−1}   0 ; −B22^{−1} B21 B11^{−1}   B22^{−1} ]
Consider
    A = [ 1  1  0  0  2
          1  2  0  0  0
          0  0  3  1  0
          0  0  1  4  1
          2  0  0  1  5 ]
The degrees of each index are given by (deg(1), deg(2), deg(3), deg(4), deg(5)) = (2, 1, 1, 2, 2).
Suppose that the current sequences U and V are given by
U = {3, 4, 5}
4
and
V = [2, 1]
We use a pair of curly brackets to indicate a collection in which the order is not relevant and a pair
of square brackets to indicate that the order sequence is important.
then
Ne(1) = {2, 5}
Ne (1) = [2, 5]
and
F
1
Bs =
R1
D1
..
.
0
..
..
..
..
..
.
Fm
Rm
Dm
and where the diagonal blocks Di are square. The value m is the maximal value that
attains a block tri-diagonal structure for Bs and is known as the depth of Bs . Let
be the size of the last diagonal block, Dm . Then we can test the indices determined
by the last entries of Vs as starting indices and apply the Cuthill-McKee algorithm
to each of these indices. If any of these test indices, say index w, yield a smaller
bandwidth, then we update s with w and the whole process is repeated.5 Otherwise,
V = Vs is chosen to be the desired sequence.
Remarks:
1. Often, especially in the solution of finite element methods, the reversed ordering
has shown a slight computational improvement. Thus a slight modification yields
the more popular version known as the Reverse Cuthill-McKee reordering,
which is to simply reverse the final sequence in V found by the Cuthill-McKee
algorithm.
2. The MATLAB command that implements the reverse Cuthill-McKee reordering algorithm of matrix A is p=symrcm(A), and the permuted matrix can be
obtained as B=A(p,p) (see the usage sketch below).
3. A MATLAB function p=CuthillMcKee(A) is available on the book's webpage that implements the Cuthill-McKee algorithm.
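A short usage sketch of the built-in reordering (the test matrix below is an arbitrary sparse example, not one from the text):

    A = gallery('poisson', 10);              % sparse test matrix
    p = symrcm(A);                           % reverse Cuthill-McKee ordering
    B = A(p,p);                              % permuted matrix
    [i,j] = find(A);  bw0 = max(abs(i-j));   % bandwidth before reordering
    [i,j] = find(B);  bw1 = max(abs(i-j));   % bandwidth after reordering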
EXAMPLE B.7. Consider the C60 molecule (or geodesic dome popularized by
Buckminster Fuller), which is a form of pure carbon with 60 atoms in a nearly
spherical configuration. A graphical figure is shown in Figure B.4. An adjacency
(boolean) matrix describing the linkage among the atoms is shown in Figure B.5
in which the dots are TRUE and the unmarked positions are FALSE. The bandwidth of the original indexing is 34. Note that each node is connected to three
other nodes; thus the degrees of each node is 3 for this case. After applying
the Cuthill-McKee reordering algorithm, the atoms are relabeled and yield the
adjacency matrix shown in Figure B.6. The bandwidth of the reordered matrix
is 10.
The method of choosing new starting indices based on the last block of the level structure is based
partially on the method developed by Gibbs, Poole, and Stockmeyer (1976) for choosing the initial
index. Unlike their method, the one discussed here continues with using the Cuthill-McKee algorithm
to generate the rest of the permutation.
Figure B.5. Adjacency matrix of the C60 molecule under the original atom indexing (bandwidth 34).
from modular units of connected subsystems, for example, from physical processes
composed of different parts. In some cases, it results from the geometry of the
problem (e.g., from the finite difference solutions of elliptic partial differential equations). In other cases, the block structure results from reordering of equations and
re-indexing of the variables.
Figure B.6. Adjacency matrix of the C60 molecule after Cuthill-McKee reordering (bandwidth 10).
Consider a lower block triangular system Ax = b with
    A = [ L11   0   ⋯   0 ; ⋮    ⋱        ; Ln1   ⋯   Lnn ]        (B.14)
    x = [ x1 ; ⋮ ; xn ]   and   b = [ b1 ; ⋮ ; bn ]        (B.15)
where xk and bk are column vectors of length Nk.
It is possible that even though the original matrix A does not have the lower
block triangular structure of (B.14), one might still be able to find permutation
matrices P such that
A = PAPT attains a lower block triangular structure. If so, then
A is known as reducible. One could use boolean matrices to find the required P,
and details of this method are given in Section B.3 as an appendix. Furthermore,
a MATLAB code that implements the algorithm for finding the reduced form is
available on the books webpage as matrixreduce.m.
Assuming that the block diagonal matrices Lkk are square and nonsingular, the
solution can be obtained by the block matrix version of forward substitution, that is,
    x1 = L11^{−1} b1   and   xk = Lkk^{−1} ( bk − Σ_{ℓ=1}^{k−1} Lk,ℓ xℓ ) ,   k = 2, . . . , n        (B.16)
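A minimal MATLAB sketch of (B.16), assuming the blocks are stored in a cell array L{k,ell} and the right-hand sides in a cell array b{k} (these variable names are illustrative only):

    function x = block_forward(L, b)
    % Block forward substitution for a lower block triangular system, cf. (B.16).
    n = numel(b);
    x = cell(n,1);
    x{1} = L{1,1} \ b{1};
    for k = 2:n
        s = b{k};
        for ell = 1:k-1
            s = s - L{k,ell} * x{ell};   % subtract contributions of known blocks
        end
        x{k} = L{k,k} \ s;               % solve with the diagonal block
    end
    end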
Similarly, for an upper block triangular matrix
    A = [ U11   ⋯   U1n ;       ⋱   ⋮   ; 0        Unn ]        (B.17)
where Ui,j[=]Ni × Nj, i ≤ j, and assuming that the block diagonal matrices Ukk are
square and nonsingular, the solution can be obtained by the block matrix version of
backward substitution, that is,
    xn = Unn^{−1} bn   and   xk = Ukk^{−1} ( bk − Σ_{ℓ=k+1}^{n} Uk,ℓ xℓ ) ,   k = n − 1, . . . , 1        (B.18)
Let A be partitioned as
    A = [ A11   ⋯   A1n ; ⋮   ⋱   ⋮ ; An1   ⋯   Ann ]        (B.19)
where Aij[=]Ni × Nj with Akk square. Then block matrix computation can be
extended to yield block LU decompositions. The block-Crout method and the
block-Doolittle method are given in Table B.1. Note that Lij and Uij are matrices
of size Ni × Nj and are not triangular in general. Furthermore, when A is block
tri-diagonal, a block version of the Thomas algorithm becomes another natural
extension. (See Exercise E2.16 for the block-Thomas algorithm.)

Table B.1. Block LU decomposition algorithms (for p = 1, . . . , n)

Block-Crout:
    Upp = I_{Np}
    Lip = Aip − Σ_{k=1}^{p−1} Lik Ukp ,   i = p, . . . , n
    Upj = Lpp^{−1} ( Apj − Σ_{k=1}^{p−1} Lpk Ukj ) ,   j = p + 1, . . . , n

Block-Doolittle:
    Lpp = I_{Np}
    Upj = Apj − Σ_{k=1}^{p−1} Lpk Ukj ,   j = p, . . . , n
    Lip = ( Aip − Σ_{k=1}^{p−1} Lik Ukp ) Upp^{−1} ,   i = p + 1, . . . , n
S11 S12
S = PR SPCT =
(B.20)
S21
0
Assume that the size of
S11 is significantly smaller than the full matrix. If either
S21 = 0, then an efficient solution method known as the Diakoptic method
S12 = 0 or
is available.
Case 1.
S12 = 0
In this case, PR = I. With S = A M and
S = SPCT , the problem Ax = b can be recast
as follows:
Ax = (M + S) x = b
I + H
S y=z
where H = PCM1 , y = PCx and z = PCM1 b. Let
S11 [=]r r. With
S12 =
S22 = 0,
L = (I + HS) will be block lower triangular matrix, that is,
0
H11 H12
S11 0
yr
zr
Ir
+
=
0 INr
H21 H22
yNr
zNr
S21 0
L11
L21
0
L22
yr
yNr
=
zr
zNr
606
S11 + H12
S21 , L21 = H21
S11 +
where L11 = Ir + H11
H22 S21and L22 = INr . Note that
the blocks of L are obrained by just partitioning I + H
S .
Assuming L11 is nonsingular,
yr
yNr
L1
11 zr
zNr L21 yr
=
=
x=
5 0
1
0 1 0
1 4 1
0 0 0
0 1
4
0 2 0
A=
1 0
1
4 0 0
1 2
0
1 5 0
1 0
2 1 1 5
yr
PCT
(B.21)
yNr
EXAMPLE B.8.
b=
and
0 0
5 0 0
0 0 0
0 0
1 4 0
0
0
0
0 0
0 1 4
0 0 0
M=
and S = 0 0
1
0
1
4
0
0
0 0
1 2 0
1 5 0
0 0
1 0 2 1 1 5
Let PC be
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
1
0
0
1
0
0
0
0
0.513
1.016
0.2
0.05
0.178
0.204
0
0
1
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
0
0
1
PC =
then we obtain
I + H
S =
1.075
0.094
0.2
0.3
0.069
0.023
0
0
0
0
0
1
yr =
3
1
;
yNr
0
0
0
0
0
0
0
0
0
0
0
0
z=
and
and
1
0
2
0
0
0
1
2
=
1
2
1
1
0
0
0
0
Finally, we get
9
6
16
0
9
3
x=
1
2
3
1
1
2
3.737
1.297
1.8
1.05
1.384
2.271
607
Case 2.
S21 = 0
For this case, PC = I. With matrices S = A M and
S = PR SPCT , the problem Ax = b
can be recast as follows:
Ax = (M + S) x = b
I +
SH
y =
z
= M1 PT ,
z = PR b. Let
S11 [=]r r. With
S21 =
S22 = 0,
where H
R y = PR Mx and
Ir
0
0
INr
+
S11
0
S12
0
11
H
H21
12
H
H22
U 11
0
U 12
U 22
yr
yNr
yr
yNr
zr
zNr
=
zr
zNr
11 +
21 , U 12 =
12 +
22 and U 22 = INr . The blocks
where U 11 = Ir +
S11 H
S12 H
S11 H
S12 H
. Assuming U 11 is nonof U are obtained by simple partitioning of I +
SH
singular,
yNr
yr
=
=
zNr
1
yNr )
U 11
zr U 12
(
5
1 0 1 1 1
0
4 1 0 2
0
1 1 4 1 0
2
A=
0
0
0
4
1
1
1
0 2 0 5
1
0
0 0 0 0
5
x=M
PRT
yr
yNr
(B.22)
EXAMPLE B.9.
b=
and
0
0
5 1 0 1 1 1
0 4 1 0 2
0
0
0
1 1
0 0 4 1 0
2
M=
0 0 0 4 1 1 and S = 0
0
1
0 0 0 0 5
0
1
0
0
0 0 0 0 0
5
Let PR be given by
PR =
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
1
9
13
6
1
10
10
0
0
0
0
2
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
608
then
1.075
0.513
0
I + SH =
0
0
0
0.094
1.016
0
0
0
0
0.2
0.2
1
0
0
0
0.3
0.05
0
1
0
0
0.069
0.178
0
0
1
0
Finally, we obtain
yNr
9
13
=
1
10
0.023
0.204
0
;
0
;
yr =
7
3
and
x=
1
2
3
1
1
2
6
10
z=
13
1
10
A
0
A
11
1n
..
..
.
.
A=
(B.23)
A
A
n1,n1
n1,n
An,1
An,n1
An,n
EXAMPLE B.10.   Consider the domain given in Figure B.7 in which a larger rectangular region (Subdomain I) is attached to a smaller rectangular region (Subdomain II). We identify points {1, 2, 3, . . . , 10} and {11, 12, 13, 14} to be the interior
points of Subdomain I and Subdomain II, respectively. The remaining interior
points {15, 16} are the interface points that link both subdomains.
The partial differential equation that models the steady-state temperature
distribution is given by
    ∂²u/∂x² + ∂²u/∂y² = 0
subject to values that are fixed for u at the boundaries. Let the boundary points
described by the various points shown in Figure B.7 have the following values:
(ua , ub, uc , ud , ue , u f , ug ) = (100, 90, 80, 70, 60, 50, 40)
Figure B.7. The two attached rectangular subdomains, showing the interior points 1 to 16 and the boundary segments a through g.
A11
0
A13
x1
b1
0
A22 A23 x2 = b2
A31 A32 A33
x3
b3
where,
4
1
0
A11 =
0
0
0
1
4
0
1
0
0
0
0
0
0
1
0
4
1
1
0
0
0
0
0
0
1
1
4
0
1
0
0
0
0
0
0
1
0
4
1
1
0
0
0
4
1
1 4
1
0
0
1
0 1
A31 =
0 0
1 0
A32 =
0 0
1
0
4
1
190
70
A22 =
bT1
bT2
bT3
80
60
90
70
70
0
0
0
1
100
60
90
0
0
0
1
1
4
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
1
4
1
1 4
1
0
0
1
0
1
1
4
0
0
1
4
1
0
A23 =
0
0
0
0
0
0
0
0
1
0
4
1
0
1
0
1
0
; A13 =
0
0
0
0
0
1
0
0 0 0 0 0 0
0 0 0 0 0 0
0
4
1
; A33 =
0
1 4
100
70
100
70
190
150
0
0
0
0
610
xT1
xT2
xT3
u1
u2
u11
u12
u15
u16
u3
u4
u13
u5
u6
u7
u8
u9
u10
u14
Note that the structures of A11 , A22 , and A33 are block tri-diagonal, whose inverse
can be obtained using the block LU methods (or more specifically, using the
block-Thomas algorithm; see Exercise E2.16).
(I + H) x = z
A11
M=
0
..
.
Ann
we have
H=M S
A1
11
0
A1
22
..
.
A1
nn
A1,n
..
An,1
..
.
0
An1,n
An,n1
A1
11 A1,n
0
..
A1
nn An,1
..
.
A1
n1,n1 An1,n
A1
nn An,n1
1
n1
Let Bk = A1
. Note that the product
Ann is
k=1 An,k Bk
kk Akn and
= Ann
the inverse of the Schur complement of Ann . Using the block matrix inverse formula
given in (1.36), we obtain
W
X
z
(B.24)
x = (I + H)1 z =
Y
Z
611
where,
Z =
Ann
Y =
B1
X = ...
Ann
Bn1
An1
An,n1
B1
W = I + ...
An1
Bn1
An,n1
EXAMPLE B.11.
zT
80.357, 52.698, 78.731, 50.436, 84.131, 70.313, 87.481, 76.686,
89.107, 78.948, 33.75, 41.25, 33.75, 41.25, 23.333, 23.333
    v1 = [ 0 ; 1 ; 1 ]   and   v2 = [ 2 ; 1 ; −1 ]
Based on the definition given in Table B.3, the span of v1 and v2 is the collection of
all vectors obtained by a linear combination of these two vectors. A representative
vector is then given by
    v = a v1 + b v2 = [ x ; y ; z ] = [ 2b ; a + b ; a − b ]
Now consider the vector
    v3 = [ 1 ; 1 ; 1 ]
This point is no longer in the span of v1 and v2 because the three elements of v3 do
not satisfy z = y − x.

Table B.2. Conditions for a linear vector space
    Vector addition:
        Associative: v + (w + y) = (v + w) + y
        Commutative: v + w = w + v
        Identity is 0: 0 + v = v
        Inverses exist and are unique: v + (−v) = 0
    Scalar multiplication:
        Associative: α(βv) = (αβ)v
        Identity is 1: 1·v = v
        Vector is distributive over scalar sums: (α + β)v = αv + βv
        Scalar is distributive over vector sums: α(v + w) = αv + αw
Table B.3. Some important definitions for linear vector spaces
1. w is a linear combination of {v1, . . . , vK} based on {α1, . . . , αK}:   w = Σ_{i=1}^{K} αi vi
2. Span of {v1, . . . , vK} is the space of possible linear combinations:   Span(v1, . . . , vK) = { w such that w = Σ_{i=1}^{K} αi vi for αi ∈ F }
3. {v1, . . . , vK} are linearly independent:   Σ_{i=1}^{K} αi vi = 0 only if αi = 0 for all i
4. {v1, . . . , vK} are linearly dependent:   Σ_{i=1}^{K} αi vi = 0 for some αi ≠ 0
5. {v1, . . . , vK} is a basis of subspace S:   {v1, . . . , vK} is linearly independent and Span(v1, . . . , vK) = S
6. An integer d = dim(S) is the dimension of subspace S:   there exist {v1, . . . , vd} that form a basis of S
Table B.4. Conditions for a vector norm
    Positivity:            ‖v‖ ≥ 0
    Scaling:               ‖α v‖ = |α| ‖v‖
    Triangle inequality:   ‖v + w‖ ≤ ‖v‖ + ‖w‖
    Unique zero:           ‖v‖ = 0 only if v = 0

The space of column vectors, using the matrix algebra operations discussed in
Section 1.2, satisfies the conditions in Table B.2. Vectors v1, . . . , vM (each of length
N) can then be linearly combined using scalars xi, that is,
    Σ_{i=1}^{M} xi vi   or   A x ,  where  A = [ v1  ⋯  vM ]        (B.25)
Among the various possible norms for column vectors, we have the Euclidean
norm, denoted ‖v‖2, defined by
    ‖v‖2 = √( v* v ) = √( Σ_{i=1}^{N} vi* vi )        (B.26)
In most cases, we default to the Euclidean norm and drop the subscript 2, unless
the discussion involves other types of norms. One can show that this definition
satisfies the conditions given in Table B.4 (we include the proof that (B.26) is a
norm in Section B.10.1 as an appendix). If the vectors are represented by points in
an N-hyperspace, then the norm is simply the distance of the points from the origin.
Note also that only the zero vector will have a zero norm.
(B.27)
One method to determine whether functions {f1, . . . , fM} are linearly independent is the Wronskian approach, extended to multivariable functions. First, take the
linear combination
    α1 f1(v) + · · · + αM fM(v) = 0        (B.28)
Differentiating (B.28) repeatedly with respect to the independent variables v1, v2, . . . yields
    [ f1          ⋯   fM
      ∂f1/∂v1     ⋯   ∂fM/∂v1
      ∂f1/∂v2     ⋯   ∂fM/∂v2
      ⋮                ⋮       ] [ α1 ; ⋮ ; αM ] = [ 0 ; 0 ; ⋮ ]
Enough equations are generated until a nonsingular submatrix can be obtained.
If this occurs, then it establishes that {f1, . . . , fM} are linearly independent.
However, the Wronskian approach requires the evaluation of partial derivatives
and determinants that involve the independent variables. This means that, except for
a small number of functions, the general case will be cumbersome to solve symbolically.
Another method, called the substitution approach, first chooses different values
for v, say vi with i = 1, . . . , M, and then substitutes them into fj(v). A matrix Ã can
then be formed as follows:
    Ã = [ f1(v1)   ⋯   fM(v1) ; ⋮   ⋱   ⋮ ; f1(vM)   ⋯   fM(vM) ]
If Ã is nonsingular, we conclude that { f1(v), . . . , fM(v) } are linearly independent.
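For instance, a MATLAB sketch of the substitution approach; the four functions are those used later in Example B.13, while the test points are arbitrary:

    f = {@(v) 1, @(v) v(1)^2, @(v) (v(1)-v(2))*v(2), @(v) v(2)^2};   % functions of Example B.13
    V = [1 1; -1 0; 0 1; 2 -1];            % four arbitrary test points v1,...,v4 (as rows)
    Atil = zeros(4,4);
    for i = 1:4
        for j = 1:4
            Atil(i,j) = f{j}(V(i,:));      % Atil(i,j) = f_j(v_i)
        end
    end
    independent = abs(det(Atil)) > 1e-12   % nonsingular implies linear independence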
EXAMPLE B.12.
(B.29)
Here, we have one independent variable v. The functions are f i (v) = vi1 .
Using the Wronskian approach, we have
0
a0
..
W
= ..
.
.
aM1
0
where
W =
1
0
..
.
v
1
..
.
vM1
(M 1) vM2
..
.
(M 1)!
615
1
1
M1
1
1
2
M1
2
A= .
.
.
.
..
..
..
..
M1
1 M1 M1
)
EXAMPLE B.13.
(B.30)
Here, we have two independent variables v1 and v2 . The functions are f 1 (v) = 1,
f 2 (v) = v21 , f 3 (v) = (v1 v2 ) v2 and f 4 (v) = v22 .
Using the extended-Wronskian approach, we have
1 v21
v22
(v1 v2 ) v2
0 2v2
v2
0
0
2v2
0
2v2
W =
0
2
0
0
0
0
1
0
0
0
2
2
We can take two different tracks. The first is to take the determinant of the
Grammian W T W. This will mean a 4 4 determinant involving symbolic manipulations. The other method is to choose rows and determine whether a nonsingular submatrix emerges. We show the second track by choosing rows 1, 4, 5,
and 6. Doing so, we have
W[1,4,5,6] =
0 0
1
0
0 0
2
2
whose determinant is 4. Thus the functions are linearly independent.
616
1 1
0 0
1 0 1 1
A=
1 1
0 1
1 1 2 1
whose determinant is 2. Thus it also shows that the functions are linearly
independent.
(B.31)
Definition B.3. Let a and b be two vectors of the same length. Then a and b are
orthogonal to each other if ⟨a, b⟩ = 0. A set of vectors z1, . . . , zN is an orthonormal set if
    ⟨zi, zj⟩ = 0 if i ≠ j   and   ⟨zi, zj⟩ = 1 if i = j        (B.32)
Gram-Schmidt Algorithm:
Let {a1, . . . , aN} be linearly independent. Set z1 = a1/‖a1‖. For k = 2, . . . , N,
    yk = ak − Σ_{i=1}^{k−1} ⟨ak, zi⟩ zi   and   zk = yk/‖yk‖
Given
    a1 = [ 1 ; 2 ; 1 ] ,   a2 = [ 0 ; 1 ; 2 ] ,   a3 = [ 1 ; 1 ; 0 ]
the algorithm yields
    z1 = [ 0.408 ; 0.816 ; 0.408 ] ,   z2 = [ −0.436 ; −0.218 ; 0.873 ] ,   z3 = [ 0.802 ; −0.534 ; 0.267 ]
We can check that ⟨z1, z1⟩ = ⟨z2, z2⟩ = ⟨z3, z3⟩ = 1, and ⟨zi, zj⟩ = 0 for i ≠ j.
v = v v = %
vi vi
i=1
|| v
3. Because vi vi = 0 if and only if vi = 0, the only vector that will yield a zero norm
is v = 0.
4. The triangle inequality is more involved. It requires a relationship known as the
Cauchy-Schwarz inequality:
    |v* w| ≤ ‖v‖ ‖w‖        (B.33)
The proof of the Cauchy-Schwarz inequality is given later. For now, we apply
(B.33) to prove the triangle inequality of Euclidean norms:
    ‖v + w‖² = v*v + v*w + w*v + w*w ≤ ‖v‖² + 2 ‖v‖ ‖w‖ + ‖w‖² = ( ‖v‖ + ‖w‖ )²
Thus
    ‖v + w‖ ≤ ‖v‖ + ‖w‖
(B.34)
and
0
(|a| |b|)2
2|a||b|
|a||b|
|a|2 + |b|2
1 2
|a| + |b|2
2
(B.35)
|ab|
(B.36)
1
v
v
and
a=
1
w
w
N
N
1 2 2
|ai | +
|bi | = 1
2
i=1
Then
a b
|v w|
v w
v w
i=1
v w
T
1 T
rk rk + rTk J k
k x +
k x J kT J k
k x
2
We limit the proof only for the case of Euclidean norms, although the Cauchy-Schwarz can be
applied to different norms and inner products.
619
If the minimum of lies inside the trust region, then the problem resembles the
unconstrained problem, whose solution is immediately given by
1 T
J k rk
k x = J kT J k
that is, = 0.
However, if the trust region is smaller, then the minima will be on the boundary of
the trust region, that is,
k x = Mk . This means that the solution to the constrained
minimization problem given in (B.109) will be a step
k x such that when we perturb
it by another vector, say v, the value of can be decreased only at the expense of
moving it outside the trust region.
A perturbation v will minimize
k x further only if
T
k
k
T
k
T
=
r Jk +
x
J k J k v + vT J kT J k v
0 <
x +v
x
or, because vT J kT J k v 0,
T
T
k
T
r Jk +
x
Jk Jk v > 0
(B.37)
k x + v >
k x
T
k x v > 0
(B.38)
The implication is that the vectors premultiplying v in (B.37) and (B.38) must point
in the opposite directions, or
J kT r + J kT J k
k x =
k x
for some > 0. Thus
k x is given by the form
1 T
k x = J kT J k + I
J k rk
To show uniqueness, let
1 T
s () = J kT J k + I
Jk r
and let q () be the difference
q () = s () Mk
whose derivative is given by
3 T
Jk r
rT J k J kT J k + I
dq
=
d
s ()
The derivative dq/d is always negative for > 0, and equal to zero only when
rTk J k = 0 (which occurs only when x[k] is already the minimum of ). This implies
that q () is zero only for a unique value of .
620
(B.39)
The mismatch between b and Ax(i) is called the ith residual vector, denoted by r(i),
that is,
    r(i) = b − A x(i)        (B.40)
By taking the gradient of f(x), we can see that the residual vector is the transpose
of the negative gradient of f(x) at x = x(i), that is,
    [ d f(x)/dx ]_{x=x(i)} = [ x^T A − b^T ]_{x=x(i)} = −(r(i))^T        (B.41)
The relationship between the residual vectors and error vectors can be obtained
by adding A x̃ − b = 0 to (B.40),
    r(i) = b − A x(i) + ( A x̃ − b ) = −A ( x(i) − x̃ ) = −A err(i)        (B.42)
(B.43)
where d(i) is the ith correction vector and (i) is a factor that will scale the correction
vector optimally for the ith update. (The choice for d(i) and (i) is discussed later).
The residual vector is
r(i+1) = b Ax(i+1)
(B.44)
x(i+1)
x
x(i)
x + (i) d(i)
err(i+1)
A err(i+1)
r(i+1)
(B.45)
(B.46)
This means that the conjugate gradient method begins with the same search direction as a gradient
descent method, because r(0) is the negative gradient of f (x) at x(0) .
621
Afterward, the next correction vector will be a combination of the previous correction vector and the most recent residual vector, that is,
    d(i+1) = γ(i+1) r(i+1) + β(i+1) d(i)        (B.47)
The scaling factors are chosen so that two criteria are met:
1. The direction vectors are A-orthogonal (conjugate) to one another, that is,
    (d(i))^T A d(j) = 0   for j < i        (B.48)
2. The ith residual vector is orthogonal to previous residual vectors, and it is also
orthogonal to previous direction vectors, that is,
    (r(i))^T r(j) = 0 ;  (r(i))^T d(j) = 0   for j < i        (B.49)
As is shown later in Lemma B.1, these criteria are achieved by using the following
values for the scaling factors:
    γ(i+1) = 1 ;   α(i) = (r(i))^T r(i) / ( (d(i))^T A d(i) ) ;   β(i+1) = − (r(i+1))^T A d(i) / ( (d(i))^T A d(i) )        (B.50)
In summary, starting from an initial guess x(0), the conjugate gradient iteration is
    d(0) = r(0) = b − A x(0)        (B.51)
    α(i) = (r(i))^T r(i) / ( (d(i))^T A d(i) )        (B.52)
    x(i+1) = x(i) + α(i) d(i)        (B.53)
    r(i+1) = r(i) − α(i) A d(i)        (B.54)
    β(i+1) = − (r(i+1))^T A d(i) / ( (d(i))^T A d(i) )        (B.55)
    d(i+1) = r(i+1) + β(i+1) d(i)        (B.56)
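A minimal MATLAB sketch of the iteration (B.51) through (B.56), assuming A is symmetric positive definite; the stopping tolerance and iteration limit are illustrative parameters:

    function x = cg_sketch(A, b, x, tol, maxit)
    % Conjugate gradient iteration following (B.51)-(B.56).
    r = b - A*x;                          % initial residual
    d = r;                                % initial direction, (B.51)
    for i = 1:maxit
        Ad = A*d;
        alpha = (r'*r) / (d'*Ad);         % (B.52)
        x = x + alpha*d;                  % (B.53)
        rnew = r - alpha*Ad;              % (B.54)
        if norm(rnew) < tol, break, end
        beta = -(rnew'*Ad) / (d'*Ad);     % (B.55)
        d = rnew + beta*d;                % (B.56)
        r = rnew;
    end
    end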
622
The relationships among the various residual vectors and direction vectors are
outlined in the following lemma:
For i 1 and j < i, using the conjugate gradient algorithm (B.51) to
(B.56), we have the following identities for r(i) and d(i) :
LEMMA B.1.
(B.57)
(j )
(r ) d
(B.58)
(r(i) )T r(i)
(r(i) )T d(i)
(B.59)
(d(i) )T Ad(j )
(B.60)
(r(i) )T r(j )
(i) T
(i)
(r ) Ad
(r(i) )T Ad(i1)
(d(i1) )T Ad(i1)
(r(i) )T Ar(i1)
(r(i) )T Ad(i1)
(B.63)
(j )
(B.64)
(r(i+1) )T Ad(j )
(B.65)
(r(i+1) )T Ar(j )
(B.66)
(i) T
(d ) Ad
(i) T
(d ) Ar
PROOF.
(i) T
(i)
(r(i) )T r(i)
(r(i1) )T r(i1)
(B.61)
(B.62)
2.
3.
4.
5.
(r(i+1) )T r(i+1)
(r(i) )T r(i)
(B.67)
623
6. Equation (B.63) underlines the fact that although r(i+1) is A-orthogonal to r(j )
with j < i, r(i+1) is not A-orthogonal to r(i) .
One of the more important implications is that if the round-off errors are not
present, the solution can be found in a maximum of N moves, that is,
THEOREM B.1. Let A[=]N N be symmetric positive-definite. Then, as long as there
are no round-off errors, the conjugate gradient algorithm as given by (B.51) to (B.56)
will have a zero error vector after at most N iterations.
PROOF.
We now give a simple example to illustrate the inner workings of the conjugate
gradient method for a 2D case.
EXAMPLE B.15.
(0)
1.5
1.5
3.5
0.25
0
0
The method terminated after two iterations, and we see that x(2) solves the linear
equation.
To illustrate how the method proceeds, we can plot the three iterations of
x as shown in Figure B.8. Attached to points x(0) and x(1) are concentric ellipses
that are the equipotential contours of f (x), where
[ f (x)] = xT Ax xT b
Because A is a symmetric positive definite matrix, we could factor A to be equal
to ST S,
1.4142 0.3536
S=
0
1.0607
624
p0
x2
x1
Then we could plot the same points x(0) , x(1) , and x(2) using a new coordinate
system,
y1
= Sx
y=
y2
which yields,
1.5910
(0)
y =
1.5910
(1)
0.9342
1.4533
(2)
0.8839
0.5303
625
y 1
2
y1
coordinates y.9 Instead, the conditions of A-orthogonality of the direction vectors are achieved efficiently by the conjugate gradient method by working only
in the original coordinate system of x.
(r(1) )T r0 = 0
(d ) Ad(1) = (r(1) )T Ad(1)
(r(2) )T Ar0 = 0
(1) T
(r(1) )T Ad0
(r(1) )T r(1)
=
T
(d0 ) Ad0
(r0 )T r0
(B.68)
(r(i) )T r(i)
(d(i) )T Ar(i) = (r(i) )T r(i) (r(i) )T r(i) = 0
(d(i) )T Ad(i)
(r(i) )T r(i)
(d(i) )T Ar(j ) = 0
(d(i) )T Ad(i)
(B.69)
626
(r(i) )T r(i)
(d(i) )T Ad(i) = (r(i) )T d(i) (r(i) )T r(i) = 0
(d(i) )T Ad(i)
(r(i) )T r(i)
(d(i) )T Ad(j ) = 0
(d(i) )T Ad(i)
(B.70)
(r(i+1) )T r(i+1)
(r(i+1) )T r(i+1)
(r(i+1) )T Ad(i)
(B.71)
4. Using (B.56),
(d(i+1) )T Ad(i)
(B.72)
(r(i+1) )T Ad(i+1)
(r(i+1) )T Ad(i+1)
(r(i+1) )T r(i)
or rearranging
(r(i+1) )T r(i+1)
(r(i+1) )T Ad(i)
=
(r(i) )T r(i)
(d(i) )T Ad(i)
(B.74)
627
(r(i+1) )T Ar(i)
(r(i+1) )T Ar(i)
(r(i+1) )T Ar(i)
(B.76)
(ri+2 )T Ad(i)
(d ) Ad
(r(i) )T r(i)
(i+1) (i+1) T (i+1)
(i)
(d
)
A
r
r
(1)
(r(i+1) )T r(i+1) (i) T (i)
(d ) Ad
(r(i) )T r(i)
(r(i+1) )T r(i+1) (i) T (i)
+
(d ) Ad
=0
(r(i) )T r(i)
Whereas using (B.54) for rTi+2 , multiplying by Ad(j ) , and then using (B.65) and
(B.76),
(r(i+2) )T Ad(j )
=
=
(d
) A (j ) r
=0
r
(B.77)
(B.78)
Thus (B.69) through (B.78) show that if (B.57) to (B.66) apply to i with j < i, then
the same equations should also apply to i + 1. The lemma then follows by induction
from i = 1.
628
(d(i) )T Ad(j )
N1
k d(k)
(B.79)
k=0
To identify the coefficients k , multiply (B.79) by (d() )T A while using the Aorthogonal properties of d() ,
(d() )T A err(0)
(d() )T Ad()
(d() )T A err(0)
d() r(0)
=
(d() )T Ad()
(d() )T Ad()
(B.80)
x(0) + 0 d(0)
x2
..
.
x(i)
x(0) +
i1
m d(m)
m=0
i1
m d(m)
(B.81)
m=0
(i)
=r
(0)
i1
m Ad(m)
(B.82)
m=0
(r() )T r()
=
(d() )T Ad()
(B.83)
629
N1
k d(k)
(B.84)
k=0
k=0
Thus when i = N,
err(N) =
N1
N1
k d(k) +
m d(m)
=0
m=0
k=0
r(0)
r(0)
(B.85)
uk+1 =
pk
pk
(B.87)
One can show that Arnoldi's method will yield the following property of Uk and
Uk+1:
    A Uk = Uk+1 H̃k        (B.88)
where H̃k is a (k + 1) × k upper Hessenberg matrix (its entries below the first
subdiagonal are zero). The kth update is then chosen from the small least-squares problem
    U*k+1 A Uk yk = H̃k yk =lsq ( ‖r(0)‖, 0, . . . , 0 )^T        (B.89)
where yk has a length k, which is presumably much smaller than N. Because of
the special Hessenberg structure of H̃k, efficient approaches for the least-squares
solution of (B.89) are also available. Thus, with yk, the kth solution update is given by
    x(k) = x(0) + Uk yk        (B.90)
To show that (B.89) together with update (B.90) is equivalent to minimizing the
kth residual, we simply apply the properties of U k obtained using Arnoldis method
as follows:
= min b A x(0) + U k y
min r(k)
=
=
min r(0) AU k y
min (r(0) )u1 U k+1 U k+1
AU k y
k y
min U k+1 ck H
(B.91)
=
T
631
included in Section B.12.2, where the GMRES algorithm is also outlined in that
section with the improvements already incorporated.
Note that for both the conjugate gradient method and GMRES method, we
have
r(k) =
k1
ck Ak r(0)
(B.92)
i=0
where ci are constant coefficients. For the conjugate gradient method, this results
directly from (B.45).
For the GMRES method, we have from Arnoldis method,
j
u1 Au j
1
..
u j +1 =
ai ui
Au j u1 u j
= bj Au j +
.
j
i=1
u j Au j
for some coefficients ai , i = 1, . . . , j and bj . When applied to j = 2, 3, . . . , k, together
with u1 = r(0) / r(0) , we can recursively reduce the last relationship to
uk =
k1
qi Ai1 r(0)
i=1
r(k) =
k1
ci Ai r(0)
i=0
Seen in this space, the critical difference between the conjugate
gradient and GMRES methods lies in how each method determines the coefficients
ci for the linear combination of A^i r(0). Otherwise, they both produce updates that
reside in a subspace known as the Krylov subspace.
Definition B.4. A kth-order Krylov subspace based on a square matrix A[=]N × N
and a vector v[=]N × 1 is the subspace spanned by the vectors A^{i−1} v, with i = 1, . . . , k,
k ≤ N, that is,
    Kk(A, v) = Span( v, A v, . . . , A^{k−1} v )
There are several other methods that fall under the class known as Krylov
subspace methods including Lanczos, QMR, and BiCG methods. By restricting
the updates to fall within the Krylov subspace, the immediate advantage is that the
components of Krylov subspace involve repeated matrix-vector products of the form
Av. When A is dense, Krylov methods may just be comparable to other methods,
direct or iterative. However, when A is large and sparse, the computations of Av can
be significantly reduced by focusing only on the nonzero components of A.
More importantly, for nonsingular A, it can be shown that the solution of Ax = b
lies in the Krylov subspace, Kk (A, b) for some k N. Thus as we had noted earlier,
both the conjugate gradient method and the GMRES method are guaranteed to
reach the exact solution in at most N iterations, assuming no round-off errors. In some
cases, the specified tolerance of the error vectors may even be reached at k iterations
that are much fewer than the maximal N iterations. However, the operative word
632
here is still nonsingular. Thus it is possible that the convergence will still be slow
if A is nearly singular or ill-conditioned. Several methods are available that choose
matrix multipliers C, called preconditioners, such that the new matrix
A = CA has
an improved condition number, but this has to be done without losing much of the
advantages of sparse matrices.
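In MATLAB, for example, the built-in Krylov solvers accept such preconditioners directly; the sparse test system, tolerances, and incomplete factorizations below are illustrative choices only:

    A = gallery('poisson', 30);  b = ones(size(A,1),1);   % a sparse sample system
    L = ichol(A);                                         % incomplete Cholesky preconditioner
    x1 = pcg(A, b, 1e-8, 200, L, L');                     % preconditioned conjugate gradient
    setup.type = 'nofill';
    [Li, Ui] = ilu(A, setup);                             % incomplete LU preconditioner
    x2 = gmres(A, b, 20, 1e-8, 50, Li, Ui);               % restarted GMRES(20)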
B.12.2 Enhancements
We address two improvements of the basic GMRES. One improvement centers
on taking advantage of the truncated Hessenberg structure of H̃k during the least-squares
solution. The other improvement is to enhance the Arnoldi method for
calculating uk by incorporating the values of the residuals r(k).
We begin with an explicit algorithm for Arnoldi's method:
Arnoldi Method:
Given: A[=]n × n, 1 < k ≤ n, and p1[=]n × 1.
Initialize:
    u1 = p1/‖p1‖   and   U1 = [ u1 ]
Iterate: Loop for i = 1, . . . , k,
    wi = A ui
    hi = Ui* wi
    pi = wi − Ui hi
    δi = ‖pi‖
    If δi > 0 and i < n
        ui+1 = pi/δi   and   Ui+1 = [ Ui   ui+1 ]
    Else
        Exit and report the value of i as the maximum number of orthogonal
        vectors found.
    End If
End Loop
i =
At the termination of the Arnoldi algorithm, we can generate matrices H
h j (i) for k j i,
Hk
and
H
Hk (i, j ) =
=
(B.93)
j
for k j = i 1
k
01(k1) k
0
otherwise
Using the QR decomposition of Hk = Qk Rk , where Qk is unitary and Rk is upper
triangular, we can form an orthogonal matrix
Qk
0k1
k+1 =
Q
01k
1
such that
H
Q
k+1 k =
Rk
01k
633
I[k1]
0
((k1)2)
Gk+1 =
c
s
0(2(k1))
s c
"
where c = Rk (k, k)/k , s = k /k and k = Rk (k, k)2 + 2k . Then
Rk
H
k =
Gk+1 Q
k+1
01(k+1)
where
Rk [=]k k is an upper triangular matrix that is equal to Rk except for the
lower corner element,
Rk (k, k) = k .
be the combined orthognal matrix. Premultiplying both
Let
k+1 = Gk+1 Q
k+1
sides of (B.89) by
k+1 , the least-squares problem reduces to
k+1 (1, 1)
..
Rk y = r(0)
(B.94)
.
k+1 (k, 1)
Because
Rk is upper triangular, the value of y can be found using the back-substitution
process.
Recursion formulas for
and Rk are given by
0()1
+1 = G+1
(B.95)
01()
1
R =
(B.96)
R1
h
Using
1 = [1] and R0 as a null matrix, the recursions (B.95) and (B.96) can be
incorporated inside the Arnoldi iterations without having to explicitly solve for Qk .
Furthermore, when the equality in (B.94) is satisfied, the norm of the kth residual
is given by
r(k)
k y
k+1 ck H
r(0)
r(0)
k+1 (1, 1)
..
.
k+1 (k + 1, 1)
k+1 (k + 1, 1)
Rk y
0
(B.97)
This means that the norm of the kth residual can be incorporated inside the iterations
of Arnoldis method, without having to explicitly solve for x(k) .
When
k+1 (k + 1, 1) = 0, (B.97) implies that x(k) = x(0) + U k y is an exact solution. Note that the Arnoldi method will stall at the ith iteration if i = 0 because
634
i as
ui+1 requires a division by i . However, this does not prevent the formation of H
given in (B.93). This implies that (Qk+1 Hk ) is already in the form required by Rk and
, or
k+1 (k + 1, 1) = 0. Thus when the Arnoldi process in GMRES
that
k+1 = Q
k+1
stalls at a value i, the update xi at that point is already an exact solution to Ax = b.
This is also the situation when i = n, assuming no roundoff errors.
In summary, we have the GMRES algorithm given below:
GMRES Algorithm:
Given: A[=]n n, b[=]n 1, initial guess x(0) [=]n 1 and tolerance tol.
Initialize:
r(0) = b A x(0)
U=
Q=
= r(0)
i=0 ;
u = r(0) /
R=[]
Iterate: i i + 1
While > tol and > tol
w = Au;
h = U w;
p = w Uh;
if > tol
U
=
p
p
"
r(i)2 + 2
c=
ri ;
I[i1]
0
R
r = Qh;
r(i)
;
R
0
s=
Qi+1,1
end if
End While Loop
Solve for y using back-substitution:
Q1,1
Ry = ...
Qi,1
Evaluate the final solution:
x = x(0) + Uy
r
0
Q
c s
0
s c
= p
0
1
635
Figure B.10. The trust region and the local quadratic model based on x(k). The right figure
shows the contour plot and the double-dogleg step.
(B.98)
1
N
=
J
F
x(k)
k
k
(B.99)
2. Newton Update
Because the Newton update was based on a local model derived from a truncated
Taylor series, we could limit the update step to be inside a sphere centered around
x(k), known as the model-trust region approach, that is, with Mk > 0,
    ‖Δk x‖ ≤ Mk        (B.100)
Assuming the Newton step is the optimum local step, the local problem is that of
minimizing a scalar function φk given by
    φk(Δx) = ½ (Fk + Jk Δx)^T (Fk + Jk Δx) = ½ Fk^T Fk + Fk^T Jk Δx + ½ Δx^T Jk^T Jk Δx        (B.101)
F
J
J
F
J
J
F
k
k k
k
k k k k
k
k
2 k
2
x_CP^(k) = x^(k) − ( F_k^T J_k J_k^T F_k / ‖ J_k J_k^T F_k ‖² ) J_k^T F_k        (B.102)

Note that if x_CP^(k) is outside the trust region, x_CP^(k) will need to be set as the intersection
of the line along the gradient-descent direction with the boundary of the trust region. In
Figure B.10, the contour plot is shown with an arrow originating from x^(k) but
terminating at the Cauchy point.

The full Newton step will take x^(k) to the point denoted by x_Newton^(k), which is the
minimum point located at the center of the elliptical contours. The Cauchy point,
full-Newton update point, and other relevant points, together with the important
line segments, are blown up and shown in Figure B.11.
One approach is to draw a line segment from x_Newton^(k) to the Cauchy point x_CP^(k).
Then the next update can be set as the intersection of this line segment with the
boundary of the trust region. This approach is known as the Powell update, or the
single-dogleg step. However, it has been found that convergence can be further
improved by taking another point along the Newton step direction, which we denote
by x_N^(k). The Dennis-Mei approach suggests that x_N^(k) is evaluated as follows:

x_N^(k) = x^(k) + η Δ_N x^(k) = x^(k) − η J_k^{−1} F( x^(k) )        (B.103)
where
8
= 0.2 + 0.8
9
FTk J k J kT Fk
FTk Fk
(k)
(B.104)
where

λ = ( −b + √( b² − a c ) ) / a

with

a = ‖ x_N^(k) − x_CP^(k) ‖² ,   b = ( x_N^(k) − x_CP^(k) )^T x_CP^(k) ,   c = ‖ x_CP^(k) ‖² − M_k²

and M_k is the radius of the trust region. In case the update does not produce satisfactory results, the radius will need to be reduced using an approach similar to
the line search method.
To summarize, we have the following enhanced Newton with double-dogleg
procedure:

Algorithm of the Enhanced Newton's Method with Double-Dogleg Search.
1. Initialize. Choose an initial guess: x^(0).
2. Update. Repeat the following steps until either ‖F( x^(k) )‖ ≤ ε or the number of
   iterations has been exceeded:
   (a) Calculate J_k.
       (If J_k is singular, then stop the method and declare "Singular Jacobian".)
   (b) Calculate G_k and Δ_N^(k) (cf. (B.98) and (B.99), respectively).
   (c) Calculate the Cauchy point x_CP^(k) from (B.102) and the shortened Newton point x_N^(k) from (B.103).
   (d) Form the double-dogleg update
       Δ_k x = (1 − λ) x_CP^(k) + λ x_N^(k)
       where λ is obtained by (B.104).
   (e) Check if Δ_k x is acceptable. If

       ‖ F( x^(k) + Δ_k x ) ‖² > ‖ F( x^(k) ) ‖² + 2 F_k^T J_k Δ_k x

       then reduce the trust-region radius M_k and return to step (d); otherwise
       accept the update and set x^(k+1) = x^(k) + Δ_k x.
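A compact MATLAB sketch of a single double-dogleg step is given below. It follows the construction of the Cauchy point and the shortened Newton point described above; the particular form used for the Dennis-Mei factor eta and the clamping of lambda are assumptions made for illustration and are not taken from the algorithm statement.

    function dx = double_dogleg_step(F, J, M)
    % One double-dogleg step for solving F(x) = 0, cf. (B.99)-(B.104).
    % F and J are the residual vector and Jacobian at the current iterate;
    % M is the trust-region radius.
      g    = J'*F;                                % gradient of (1/2)||F||^2
      dxN  = -J\F;                                % full Newton step, cf. (B.99)
      dxCP = -((g'*g)/norm(J*g)^2) * g;           % Cauchy step, cf. (B.102)
      if norm(dxN) <= M
          dx = dxN;                               % Newton point inside the region
      elseif norm(dxCP) >= M
          dx = (M/norm(dxCP))*dxCP;               % gradient step hits the boundary
      else
          eta  = 0.2 + 0.8*(g'*g)^2/((F'*F)*norm(J*g)^2);  % assumed Dennis-Mei factor
          dxNh = eta*dxN;                         % shortened Newton point
          a  = norm(dxNh - dxCP)^2;
          bq = (dxNh - dxCP)'*dxCP;
          c  = norm(dxCP)^2 - M^2;
          lambda = (-bq + sqrt(bq^2 - a*c))/a;    % boundary intersection, cf. (B.104)
          lambda = min(lambda, 1);
          dx = (1 - lambda)*dxCP + lambda*dxNh;
      end
    end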
Figure B.12. A surface plot of f (x1 , x2 ) of (B.105) is shown in the left figure. The right figure
shows the contour plot and the performance of the enhanced Newton with double-dogleg
method with the initial guess at (x1 , x2 ) = (4, 6).
EXAMPLE B.16.
where
(B.105)
x1
x2
1
1 (x1 , x2 ) = 5 tanh +
3
3
2
x2
2 (x1 , x2 ) = 1
2
A surface plot of f (x1 , x2 ) is shown in Figure B.12. When the enhanced Newton
with double-dogleg method was used to find the minimum of f (x1 , x2 ), we see in
Figure B.12 that starting with (x1 , x2 )0 = (4, 6), it took only three iterations to
settle at the minimum point of (x1 , x2 ) = (0.5, 2) which yields the value f = 2.
Conversely, applying the line-search method, in this case with the same initial
point, will converge to a different point (x1 , x2 ) = (432, 2) with f = 27.
A particular property of the function f (x1 , x2 ) in (B.105) is that the minimum is located in a narrow trough. When the line-search approach was
used, starting at (x1 , x2 )0 = (4, 6), the first Newton step pointed away from
(x1 , x2 ) = (0.5, 2). However, the double-dogleg method constrained the search
to a local model-trust region while mixing the gradient search direction with the
Newton direction. This allowed the double-dogleg method a better chance of
locating the minimum that is close to the initial guess.
min_x  (1/2) ‖ r(x) ‖²        (B.106)

where

r(x) = [ r_1(x_1, . . . , x_n) ; ⋮ ; r_m(x_1, . . . , x_n) ] ,   r_i = r_i(x_1, . . . , x_n) ,  i = 1, . . . , m

One could apply Newton's method directly to (B.106). However, doing so would
involve the calculation of d²r/dx²,

(d²/dx²) ( (1/2) ‖r‖² ) = ( dr/dx )^T ( dr/dx ) + Σ_{i=1}^{m} r_i ( d² r_i / dx² )

Instead, r is linearized around a point x_0 using the Jacobian

J(x_0) = [ ∂r_1/∂x_1  ⋯  ∂r_1/∂x_n ; ⋮  ⋱  ⋮ ; ∂r_m/∂x_1  ⋯  ∂r_m/∂x_n ] |_{x = x_0}

This transforms the nonlinear least-squares problem (B.106) back to a linear least-squares problem (cf. Section 2.5), that is,

min_{x − x_0}  (1/2) ‖ r(x_0) + J(x_0) ( x − x_0 ) ‖²        (B.107)
With

J_k = J( x^(k) ) ;   r_k = r( x^(k) )

the local problem at the kth iterate is

min_{Δ_k x}  (1/2) ‖ r_k + J_k Δ_k x ‖²   subject to   ‖ Δ_k x ‖ ≤ M_k        (B.109)

where Δ_k x = x^(k+1) − x^(k) is the update step and M_k is the radius of the trust region.
From Figure B.10, we see that there is a unique point on the boundary of the
trust region where the value of the function on the convex surface is minimized. This
observation can be formalized by the following lemma:

LEMMA B.2. (Levenberg-Marquardt Update Form)
The solution to the minimization problem (B.109) is given by

Δ_k x = − ( J_k^T J_k + μ I )^{−1} J_k^T r_k        (B.110)

where μ ≥ 0 is chosen such that

q(μ) = ‖ s_μ ‖ − M_k = 0 ,   with   s_μ = − ( J_k^T J_k + μ I )^{−1} J_k^T r_k        (B.111)

Note that we set μ = 0 if ‖ s_0 ‖ < M_k. Also, the derivative of q(μ) is given by

q′(μ) = dq/dμ = − s_μ^T ( J_k^T J_k + μ I )^{−1} s_μ / ‖ s_μ ‖        (B.112)

Although the Newton method can be used to solve (B.111), the Moré method
has been shown to have improved convergence. Details of the Moré algorithm are
included in Section B.14.1.
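A minimal MATLAB sketch of a Levenberg-Marquardt iteration based on (B.110) is shown below. The simple damping-update rule stands in for the Moré radius strategy; the function name lm_simple and the handles resfun and jacfun are illustrative.

    function x = lm_simple(resfun, jacfun, x0, maxit, tol)
    % Basic Levenberg-Marquardt sketch for min (1/2)||r(x)||^2.
    % resfun(x) returns the residual vector r; jacfun(x) returns the Jacobian J.
    % mu is increased when a step is rejected and decreased when accepted,
    % a simple stand-in for the trust-region radius update.
      x  = x0;
      mu = 1e-3;
      for k = 1:maxit
          r = resfun(x);  J = jacfun(x);
          g = J'*r;
          if norm(g) < tol, break; end
          dx = -(J'*J + mu*eye(length(x))) \ g;     % LM step, cf. (B.110)
          if 0.5*norm(resfun(x + dx))^2 < 0.5*norm(r)^2
              x  = x + dx;                          % accept step
              mu = max(mu/10, 1e-12);
          else
              mu = mu*10;                           % reject step, increase damping
          end
      end
    end

For the curve fit discussed next, resfun would return the difference between the model y = d exp(ax² + bx + c) + e evaluated at the data abscissas and the measured ordinates.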
> r x(k)
+ 2rTk J k
k x
rTk J k
k x
2
r x(k) +
k x
r x(k)
2rTk J k
k x
the function:
y = d exp( a x² + b x + c ) + e
to fit the data given in Table B.5. Applying the Levenberg-Marquardt method
with the initial guess: (a, b, c, d, e) = (0, 0, 0, 0, 0), we obtain the estimates:
(a, b, c, d, e) = (0.0519, 0.9355, 1.1346, 0.0399, 2.0055). A plot of the model,
together with data points, is shown in Figure B.13. We also show, in the right
plot of same figure, the number of iterations used and the final value of the
residual norm.
0
M
=
k1
|x=x(k1)
Mk
if k = 0
otherwise
Table B.5. Data points (x, y) used for the fit

x          y          x          y          x          y
0.1152     2.0224     6.5207     2.6118     12.6037    2.4487
0.7604     2.0303     7.0276     2.7355     13.1106    2.3750
1.2673     2.0408     7.5346     2.7855     13.7097    2.2855
2.7419     2.1197     8.3180     2.8539     14.3548    2.2013
3.4332     2.1803     9.4700     2.8645     15.0461    2.1355
4.3088     2.2776     10.3917    2.7934     16.2442    2.0671
4.8618     2.3645     11.2212    2.6645     17.5806    2.0250
5.3687     2.4382     12.0507    2.5487     19.7465    2.0039
6.4747     2.6303
2. Update.
(j )
s(j 1)
= (j 1)
Mk
q
q =(j 1)
(j )
(j )
if Lo j (j ) Hi j
!
max
Lo j Hi j , 103 Hi j
otherwise
where
Lo j
q(0)
q
(0)
q
, Lo j 1
max
q =(j 1)
if j = 0
otherwise
Figure B.13. The model together with the data given in Table B.5. On the right plot, we have
the number of iterations performed and the corresponding norm of the residuals.
Hi j
J kT rk
Mk
(j 1)
min
Hi
,
j
1
Hi j 1
if j = 0
if q (j 1) < 0
otherwise
4. Repeat until:
s(j ) 0.9Mk , 1.1Mk
APPENDIX C
∏_{i=1}^{N} ( a_ii − λ ) = 0

∏_{i=1}^{N} det ( A_ii − λ I ) = 0   or   det ( A_ii − λ I ) = 0

Property 3: The eigenvalues of αA are αλ.
    (αA) v = (αλ) v

Property 4: The eigenvalues of A and A^T are the same.
    Because det(B) = det(B^T),
    det ( A − λI ) = det ( (A − λI)^T ) = det ( A^T − λI ) = 0
    Thus the characteristic equation for A and A^T is the same, yielding the same
    eigenvalues.
(Note: Property 7 implies that eigenvalues are nonzero for nonsingular matrices.)
Then for k < 1,
v = A1 Av = A1 v
A1 v =
det (A I)
Because the characteristic polynomials for both A and T 1 AT are the same, the
eigenvalues will also be the same.
If v is an eigenvector of A corresponding to then
Av = v
1
TBT
1 v
B T v
1
0
U AU = .
..
0
=
=
v
T 1 v
2
..
.
..
.
..
.
where U is unitary and represent possible nonzero entries. After taking the
determinant of both sides,
N
U |A| |U| = |A| =
i
r Property 8: = tr (A).
i
Using Schur triangularization,
U AU =
i=1
1
0
..
.
2
..
.
..
.
..
.
N
i=1
r Property 9: Eigenvalues of Hermitian matrices are real, and eigenvalues of skewHermitian matrices are pure imaginary.
Let H be Hermitian; then

( v* H v )* = v* H* v = v* H v

which means v* H v is real. Now let λ be an eigenvalue of H. Then

H v = λ v   so that   λ = ( v* H v ) / ( v* v )

Because both v* H v and v* v are real, λ has to be real. Similarly, if H is skew-Hermitian,
( v* H v )* = v* H* v = − v* H v, so v* H v is pure imaginary; because v* v is real, λ = ( v* H v )/( v* v )
has to be pure imaginary.

Property 10: Eigenvalues of positive definite Hermitian matrices are positive.
    Because H is positive definite, v* H v > 0, where v is an eigenvector of H.
    However, v* H v = λ |v|². Because |v|² > 0, we must have λ > 0.
r Property 11: Eigenvectors of Hermitian matrices are orthogonal.
If H is Hermitian, H H = H2 = HH . Thus, according to Definition 3.5, H
is a normal matrix. Then the orthogonality of the eigenvectors of H follows as a
corollary to Theorem 3.1.
r Property 12: Distinct eigenvalues yield linearly independent eigenvectors.
Let 1 , . . . , M be a set of distinct eigenvalues of A[=]N N, with M N,
and let v1 , . . . , vM be the corresponding eigenvectors. Then
Ak vi = i Ak1 vi = = ki vi
We want to find a linear combination of the eigenvector that would equal the
zero vector,
1 v1 + + n vn = 0
After premultiplication by A, A2 , . . . , AM1 ,
1 1 v1 + + M M vM
..
.
v1 + + M M1 vM
1 M1
1
1 v1
M vM
1
1
..
.
1
2
..
.
..
.
1M1
2M1
..
.
M1
M
= 0[NM]
U AU = B =
b12
..
.
..
.
..
.
b1,N
..
.
bN1,N
N
N
|b1k |2
k=2
This is possible only if b1k = 0, for k = 2, . . . , N. Having established this, we can now
equate the second diagonal element of B B to the second diagonal element of BB
as follows:
|2 |2 = |2 |2 +
N
|b2k |2
k=3
and conclude that b2k = 0, for k = 3, . . . , N. We can continue this logic until the
(N 1)th diagonal of B B. At the end of this process, we will have shown that B is
diagonal.
We have just established that as long as A is normal, then U AU = , where
contains all the eigenvalues of A, including the case of repeated roots. Next, we can
show that the columns of U are the eigenvectors of A,
AU ,1
U AU
AU
U
1 U ,1
AU ,N
N U ,N
or
AU ,i = i U ,i
Because U is unitary, the eigenvectors of a normal matrix are orthonormal.
, where
{v1 , , vN
}
corresponding
to
eigenvalues
,
that
is,
V
,
.
.
.
,
CV
=
1
N
= diag 1 , . . . , N .
(V CV ) (V CV ) = V C CV
(V CV ) (V CV ) = V CC V
=
, we have
Because
C C = CC
This means that when all the eigenvectors are orthonormal, the matrix is guaranteed
to be a normal matrix.
=T
charpoly(J 1 )
0
..
.
0
charpoly(J 2 )
..
.
..
.
0
0
..
.
charpoly(J M )
1
T
(C.1)
C.2.1 QR Algorithm
QR Decomposition Algorithm (using Householder operators):
Given A[=]N×M.
1. Initialize. K = A, Q̃ = I_N
2. Iterate.
   For j = 1, . . . , min(N, M) − 1
   (a) Extract first column of K:   u = K_{•,1}
   (b) Construct a Householder matrix: first replace the first element of u by
           u_1 ← u_1 − ‖u‖
       and then set
           H = I − 2 u u* / ( u* u )
   (c) Update K:   K ← H K
   (d) Update rows j, . . . , N of Q̃:   Q̃_{[j,...,N],•} ← H Q̃_{[j,...,N],•}
   (e) Remove the first row and first column of K.
3. If N > M, trim the last (N − M) rows of Q̃, that is, remove rows [M + 1, . . . , N] of Q̃.
4. Obtain Q and R:   Q = Q̃* ,   R = Q̃ A
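A direct MATLAB transcription of this procedure is sketched below; it is meant only to illustrate the bookkeeping (the built-in qr function should be preferred in practice), and the function name is illustrative.

    function [Q, R] = qr_householder(A)
    % Sketch of the QR decomposition algorithm above. Qt accumulates the
    % Householder reflections so that Qt*A is upper triangular and Q = Qt'.
      [N, M] = size(A);
      K = A;  Qt = eye(N);
      for j = 1:min(N,M)-1
          u = K(:,1);
          u(1) = u(1) - norm(u);                      % step (b): modify first element
          if norm(u) > 0
              H = eye(length(u)) - 2*(u*u')/(u'*u);   % Householder operator
              K = H*K;                                % step (c)
              Qt(j:N,:) = H*Qt(j:N,:);                % step (d)
          end
          K = K(2:end, 2:end);                        % step (e)
      end
      if N > M, Qt = Qt(1:M,:); end                   % step 3: trim extra rows
      Q = Qt';  R = Qt*A;
    end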
Let square matrix A have a dominant eigenvalue, that is, |1 | > j , j > 1. An
iterative approach known as the power method can be used to find 1 and its
corresponding eigenvector v1 .
Power Method Algorithm:
Given matrix A[=]N×N and tolerance ε > 0.
1. Initialize. Set w = 0 and select a random vector for v.
2. Iterate. While ‖v − w‖ > ε:
       w ← v ;   v ← A w ;   v ← v / ‖v‖
3. Obtain eigenvalue:
       λ_1 = ( v* A v ) / ( v* v )

Figure C.1. Convergence of the eigenvector estimation using the power method.

A short proof for the validity of the power method is left as an exercise
(cf. E3.24). The power method is simple but is limited to finding only the dominant
eigenvalue and its eigenvector. Also, if the eigenvalue with the largest magnitude is
close in magnitude to the second largest, then convergence is very slow. In particular,
convergence will suffer when a complex-conjugate pair has the largest magnitude, because
the two eigenvalues then have equal magnitude. In those cases, there are block versions
of the power method.
EXAMPLE C.1.
Let A be given by

A = [ 1  2  3
      2  1  3
      2  1  3 ]

Starting from a random initial vector, the power method found the largest eigenvalue λ = 6 and its corresponding eigenvector v = (0.5774, 0.5774, 0.5774)^T in a few iterations. The norm
‖v^(k+1) − v^(k)‖ is shown in Figure C.1.
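A MATLAB sketch of the power method, applied to the matrix of Example C.1, is given below; the tolerance and the random starting vector are illustrative.

    % Power method sketch for the matrix of Example C.1.
    A   = [1 2 3; 2 1 3; 2 1 3];
    tol = 1e-10;
    v   = rand(3,1);  v = v/norm(v);  w = zeros(3,1);
    while norm(v - w) > tol
        w = v;
        v = A*w;
        v = v/norm(v);              % renormalize at each iteration
    end
    lambda1 = (v'*A*v)/(v'*v);      % dominant eigenvalue, approximately 6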
(C.2)
then A[
1] simply has reversed the order of Q and R. Because the eigenvalues are
preserved under similarity transformations (cf. Section 3.3), A and A[
1] will have
the same set of eigenvalues. One could repeat this process k times and obtain
A[
k]
Q[ k] R[ k]
A[ k+1]
R[ k] Q[ k]
where the eigenvalues of A[
k] will be the same as those of A. Because R[
k] is upper
triangular, one can show1 that A[
k] will converge to a matrix that can be partitioned
as follows:
B C
lim A[
k] =
(C.3)
0 F
k
where F is either a 1 1 or a 2 2 submatrix. Because the last matrix is block
triangular, the eigenvalues of A will be the union of the eigenvalues of B and the
eigenvalues of F . If F [=]1 1, then F is a real eigenvalue of A; otherwise, two
eigenvalues of A can be found using (3.21).
The same process can now be applied on B. The process continues with QR
iterations applied to increasingly smaller matrices until all the eigenvalues of A are
found.
EXAMPLE C.2.
A=
1
1
1
0
1
0
1
0
1.3333
1.1785 0.4083
A[
33] = 0.9428 0.6667
0.5774
0.0000
0.0000
1.0000
which means one eigenvalue can be found as 1 = 1. For the remaining two
eigenvalues, we can extract the upper left 2 2 submatrix and use (3.21) to
obtain 2 = 1 + i and 3 = 1 i.
Although the QR method will converge to the required eigenvalues, the convergence can also be slow sometimes, as shown in preceding example. Two enhancements significantly help in accelerating the convergence. The first enhancement is
called the shifted QR method. The second enhancement is the Hessenberg formulation. Both of these enhancements combine to form the modified QR method, which
will find the eigenvalues of A with reasonable accuracy. The details of the modified
QR method are included in Section C.2.4.
For a detailed proof, refer to G. H. Golub and C. Van Loan, Matrix Computations, 3rd Edition,
1996, Johns Hopkins University Press.
(C.4)
(C.5)
Â_{k+1} = R̂_k Q̂_k + μ_k I        (C.6)

Even with the modifications given by (C.4), (C.5), and (C.6), Â_{k+1} will still be a
similarity transformation of Â_k, starting with Â_0 = A. To see this,

Â_{k+1} = R̂_k Q̂_k + μ_k I = Q̂_k^{−1} ( Q̂_k R̂_k + μ_k I ) Q̂_k = Q̂_k^{−1} Â_k Q̂_k

Note that these modifications introduce only 2N extra operations: the subtraction of
μ_k I from the diagonal of Â_k, and the addition of μ_k I to the diagonal of
R̂_k Q̂_k. Nonetheless, the improvements in convergence toward attaining the form
given in (C.3) will be significant.
H = [ ×  ×  ⋯  ⋯  ×
      ×  ×  ⋯  ⋯  ×
      0  ×  ×  ⋯  ×
      ⋮  ⋱  ⋱  ⋱  ⋮
      0  ⋯  0  ×  × ]        (C.7)

where × denotes arbitrary values.
To obtain the upper Hessenberg form, we use the Householder operators U_xy
given in (3.7),

U_xy = I − 2 ( x − y )( x − y )* / ( ( x − y )*( x − y ) )

which will transform x to y, as long as ‖x‖ = ‖y‖. With the aim of introducing zeros,
we will choose y to be

y = ( ‖x‖, 0, . . . , 0 )^T
Two properties of Householder operators are noteworthy: they are unitary and
Hermitian. The following algorithm will generate a Householder matrix H such
that HAH will have an upper Hessenberg form. Also, because HAH is a similarity
transformation of A, both A and HAH will have the same set of eigenvalues.
Algorithm for Householder Transformations of A to Upper Hessenberg Form:
Start with G ← A.
For k = 1, . . . , (N − 2)
1. Extract vector w:   w_i = G_{k+i,k} ;   i = 1, . . . , (N − k)
2. Evaluate H:
       H = I_[N]                              if ‖w − y‖ = 0
       H = [ I_[k]   0 ; 0   U_wy ]           otherwise
   where
       y = ( ‖w‖, 0, . . . , 0 )^T   and   U_wy = I − 2 ( w − y )( w − y )^T / ( ( w − y )^T ( w − y ) )
3. Update G:   G ← H G H
End loop for k
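The following MATLAB sketch implements the reduction for a real matrix; blkdiag is used to embed the Householder operator U_wy, and the function name is illustrative.

    function G = hessenberg_householder(A)
    % Householder reduction of A to upper Hessenberg form. Each update is a
    % similarity transformation G <- H*G*H with H symmetric and orthogonal,
    % so G and A share the same eigenvalues.
      G = A;  N = size(A,1);
      for k = 1:N-2
          w = G(k+1:N, k);
          y = [norm(w); zeros(N-k-1,1)];
          if norm(w - y) > 0
              u   = (w - y)/norm(w - y);
              Uwy = eye(N-k) - 2*(u*u');       % Householder operator U_wy
              H   = blkdiag(eye(k), Uwy);
              G   = H*G*H;                     % similarity update
          end
      end
    end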
Let
A=
3
0
0
0
0
4
1
0
2
2
0
0
2
0
0
12
0
3
5
6
12
2
3
6
7
3
4
0
12
0
1
1.4142
1
0
2.8284
1
8.4853
G = HAH =
0
0
0
1.6213
0
0
0
0.6213
12
1
8.4853
3.6213
2.6213
One can check that the eigenvalues of G and A will be the same.
Note that for this example, the resulting Hessenberg form is already in
the desired block-triangular forms, even before applying the QR or shifted QR
algorithms. In general, this will not be the case. Nonetheless, it does suggest
that starting off with the upper Hessenberg forms will reduce the number of QR
iterations needed for obtaining the eigenvalues of A.
b + b2 4c
b b2 4c
1 =
; 2 =
2
2
and
(Gk1,k1 + Gk,k )
2. Update G by removing the last two rows and last two columns.
Case 3: (|Gk,k1 | > ) and (|Gk1,k2 | > ).
Iterate until either Case 2 or Case 3 results:
    Let μ = G_{k,k}.
    1. Find Q and R such that:   Q R = G − μ I
    2. Update G:   G ← R Q + μ I
End While-loop

r Termination:
    Case 1: G = [λ]; then add λ to the eigenvalue list.
    Case 2: G [=] 2 × 2; then add λ_1 and λ_2 to the list of eigenvalues, where

    λ_1 = ( −b + √( b² − 4c ) ) / 2 ;   λ_2 = ( −b − √( b² − 4c ) ) / 2

    and b = −( G_11 + G_22 ) , c = G_11 G_22 − G_12 G_21.
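A MATLAB sketch that combines the Hessenberg reduction, the shifted QR steps, and the deflation logic is given below. The built-in hess function is used for the initial reduction, and the trailing 2 × 2 blocks are resolved with the quadratic formula; the function name and tolerance handling are illustrative.

    function lam = eig_shifted_qr(A, epsilon)
    % Sketch of the modified QR method: Hessenberg form, shifted QR steps,
    % and deflation of converged trailing 1x1 or 2x2 blocks.
      G = hess(A);                          % reduce to upper Hessenberg form
      lam = [];
      k = size(G,1);
      while k > 2
          while abs(G(k,k-1)) > epsilon && abs(G(k-1,k-2)) > epsilon
              mu = G(k,k);                  % shift
              [Q, R] = qr(G - mu*eye(k));
              G = R*Q + mu*eye(k);
          end
          if abs(G(k,k-1)) <= epsilon       % deflate one real eigenvalue
              lam = [lam; G(k,k)];
              G = G(1:k-1, 1:k-1);  k = k - 1;
          else                              % deflate a 2x2 block (possibly complex pair)
              B = G(k-1:k, k-1:k);
              lam = [lam; roots([1, -(B(1,1)+B(2,2)), det(B)])];
              G = G(1:k-2, 1:k-2);  k = k - 2;
          end
      end
      if k == 1
          lam = [lam; G(1,1)];
      else
          lam = [lam; roots([1, -(G(1,1)+G(2,2)), det(G)])];
      end
    end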
Let
A=
1
2
1
2
0
2
1
0
2
0
0
0
1
1
2
1
0
2
0
0
1
2
1
3
1
1
2
0.1060 1.3072 0.5293
3
1.8889
1.1542
2.3544
1.7642
0
1.0482
0.8190
0.6139
0.4563
G=
0
0
1.2738
0.0036
2.9704
0
0.8456
1.7115
4.2768
0.2485
2.2646
2.2331
5.7024
0
1.8547
2.3670
1.3323
0.2085
0
1.5436
0.4876
1.0912
0.0094
0
0
0.2087
0.3759
0.0265
0
1.2856
and we could extract 1.2856 as one of the eigenvalues. Then the size of G is
reduced by deleting the last row and column, that is,
G = [ 4.2768    0.2485    2.2646    2.2331
      0         1.8547    2.3670    1.3323
      0         1.5436    0.4876    1.0912
      0         0         0.2087    0.3759 ]
Note that along the process, even though G will be modified and shrunk, it will
still have an upper Hessenberg form.
The process is repeated until all the eigenvalues of A are obtained: 1.2856,
0.0716, 0.5314 1.5023i, and 4.2768.
(A I)r1 vr = 0
j = (r 1), . . . , 1
(C.8)
Note: If the order of the chain is 1, then the chain is composed of only one eigenvector.
Algorithm for Obtaining Chain (A,,r).
1. Obtain vector vr to begin the chain.
(a) Construct matrix M,
(A I)r1
M(, r) =
(A I)r
I
0
j = 1, 2, . . . , q
j = q + 1, . . . , 2n
j = (r 1), . . . , 1
Note that as mentioned in Section B.2, the matrices Q and W can also be found
based on the singular value decomposition. This means that with UV = M, we
can replace W above by V of the singular value decomposition. Furthermore, the
rationale for introducing randomly generated numbers in the preceding algorithm
is to find a vector that spans the last (2n q) columns of W without having to
determine which vectors are independent.
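For small, well-scaled examples, the chain construction can be sketched in MATLAB using null spaces of powers of (A − λI), as below; the random combination mirrors the rationale just described, and the function name and tolerance are illustrative choices.

    function V = jordan_chain(A, lambda, r)
    % Sketch of chain(A, lambda, r): pick v_r with (A - lambda I)^r v_r = 0 but
    % (A - lambda I)^(r-1) v_r ~= 0, then back out v_j = (A - lambda I) v_{j+1}.
      n  = size(A,1);
      B  = A - lambda*eye(n);
      Nr = null(B^r);                        % null space of (A - lambda I)^r
      for trial = 1:50                       % random vector in Nr, outside the
          v = Nr * randn(size(Nr,2),1);      % null space of the lower power
          if norm(B^(r-1)*v) > 1e-8, break; end
      end
      V = zeros(n, r);  V(:,r) = v;
      for j = r-1:-1:1
          V(:,j) = B * V(:,j+1);             % remaining vectors of the chain
      end
    end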
EXAMPLE C.5.
Let
A=
3
0
1
0
0
0
3
0
0
0
0
0
3
0
0
0
1
0
2
0
1
1
0
0
3
0
1.2992
0.8892
0
1.2992
1.1826
0.8892
1.8175
chain(A, 3, 3) = v1 v2 v3 = 1.2992
0
0
0
0
0
1.2992
we can directly check that
(A I)3 v3 = 0
(A I)2 v3 = 0
v2 = (A I) v3
v1 = (A I) v2
and
To obtain the canonical basis, we still need to determine the required eigenvector
chains. To do so, we need to calculate the orders of matrix degeneracy with respect
to an eigenvalue i , to be denoted by Ni,k , which is just the difference in ranks of
succeeding orders, that is,
Ni,k = rank(A i I)k1 rank(A i I)k
(C.9)
Using these orders of degeneracy, one can calculate the required orders for
the eigenvector chains. The algorithm that follows describes in more detail the
procedure for obtaining the canonical basis.
Algorithm for Obtaining Canonical Basis.
Given A[=]N N.
For each distinct i :
1. Determine multiplicity mi .
2. Calculate order of required eigenvector chains.
Let
(
'
p i = arg min rank(A 1 I) p = (N mi )
1p n
if k = p i
if k < p i
where,
Ni,k = rank(A i I)k1 rank(A i I)k
3. Obtain the required eigenvector chains.
For each i,k > 0, find i,k sets of chain(A, i , k) and add to the collection of
canonical basis.
One can show that the eigenvector chains found will be linearly independent.
This means that T is nonsingular. The Jordan canonical form can then be obtained
by evaluating T 1 AT = J .
Although Jordan decomposition is not reliable for large systems, it remains
very useful for generating theorems that are needed to handle both diagonalizable
and non-diagonalizable matrices. For example, the proof of Cayley-Hamilton theorem uses Jordan block decompositions without necessarily having to evaluate the
decompositions.
EXAMPLE C.6.
3 0
0 3
A=
1 0
0 0
0 0
0
0
3
0
0
0
1
0
2
0
1
1
0
0
3
then
i
mi
pi
Ni,k
ordi
2
3
1
4
1
3
[1]
[2, 1, 1]
[1]
[1, 0, 1]
chain(A, 2, 1) =
0
0.707
0
0.707
0
chain(A, 3, 3) =
chain(A, 3, 1) =
0
0
1.2992
0
0
1.2992
1.2992
0.8892
0
0
0
0.5843
1.0107
0
0
0.8892
1.1826
1.8175
0
1.2992
0
0
0
1.2992
0.7071 0.5843
0
1.2992
0
1.0107
1.2992
0.8892
T =
0.7071
0
0
0
0
0
0
0
0.8892
1.1826
1.8175
0
1.2992
J =T
AT =
2
0
0
0
0
0
3
0
0
0
0
0
3
0
0
0
0
1
3
0
0
0
0
1
3
Let Hm =
w1
wm1
; then use
Gm Hm
Hm
=
bT
Gm1
I[Nm]
0
0
Hm
r)
(d) Find Uq [=]N
(M
such that Uq is orthogonal to U r .
(e) Set U = U r Uq .
+
if i = j r
i
4. Form [=]N M: ij =
0
otherwise
THEOREM C.1.

f(A) = Σ_{k=1}^{N} f(λ_k) v_k w_k        (C.10)

f(A) = Σ_{k=1}^{N} f(λ_k) ∏_{ℓ≠k} ( λ_ℓ I − A ) / ( λ_ℓ − λ_k )        (C.11)

and

f(A) = Σ_{k=1}^{N} f(λ_k) adj( λ_k I − A ) / ∏_{ℓ≠k} ( λ_k − λ_ℓ )        (C.12)
The advantage of (C.11) is that it does not require the computation of eigenvectors.
However, there are some disadvantages to both (C.11) and (C.12). One is that all
the eigenvalues have to be distinct; otherwise, a problem arises in the denominator.
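When the eigenvalues are indeed distinct, (C.11) translates directly into a few lines of MATLAB; the sketch below uses the algebraically equivalent factors (A − λ_ℓ I)/(λ_k − λ_ℓ), and the function name is illustrative.

    function F = funm_spectral(A, f)
    % Sketch of (C.11): valid only when all eigenvalues of A are distinct.
    % f is a handle acting on scalars, e.g. @exp or @(s) s.^3.
      lam = eig(A);  n = length(lam);  I = eye(n);  F = zeros(n);
      for k = 1:n
          P = I;
          for l = [1:k-1, k+1:n]
              P = P * (A - lam(l)*I) / (lam(k) - lam(l));   % product over l ~= k
          end
          F = F + f(lam(k)) * P;
      end
    end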
To show that (C.10) can be derived from (3.35), we need to first show that the
rows of V^{−1} are left eigenvectors of A. Let w_k be the kth row of V^{−1}. Then A V = V Λ
implies V^{−1} A = Λ V^{−1}, that is,

[ w_1 ; ⋮ ; w_N ] A = [ λ_1 ; ⋱ ; λ_N ] [ w_1 ; ⋮ ; w_N ]

or

w_k A = λ_k w_k

Thus w_k is a left eigenvector of A. Using this partitioning of V^{−1}, (3.35) becomes

f(A) = [ v_1  ⋯  v_N ] [ f(λ_1) ; ⋱ ; f(λ_N) ] [ w_1 ; ⋮ ; w_N ]
     = [ v_1  ⋯  v_N ] ( Σ_{k=1}^{N} f(λ_k) e_k e_k^T ) [ w_1 ; ⋮ ; w_N ]
     = f(λ_1) v_1 w_1 + · · · + f(λ_N) v_N w_N
C=
0
0
..
.
1
0
..
.
0
1
..
.
..
.
0
0
..
.
0
0
0
1
0
2
1
n1
(C.13)
(C.14)
(C.15)
v= .
..
n1
Thus with a similarity transformation of A based on a similarity transformation
by S
C1
0
Q21
C2
S1 AS = .
(C.16)
..
..
..
.
.
Qr1 Qr2 Cr
where Ci are n i n i companion matrices to polynomials
p i (s) = sni + ni 1 sni 1 + + 1 s + 0
[i]
[i]
[i]
r
p i (s)
(C.17)
i=1
Danilevski Algorithm:
Let A[=]N N; then Danilevski(A) should yield matrix S such that (C.16) is satisfied.
Initialize k = 0 and S = IN
While k < N,
kk+1
If N = 1,
S=1
else
Let j max = arg max j {i+1,...,N} aij and q = ai,j max
If q = 0
Interchange rows i + 1 and j max of A
Interchange columns i + 1 and j max of A
Interchange columns i + 1 and j max of S
X = (xi j )
ak,j /ak,k+1
1/ak,k+1
xij =
1
Y = (yi j )
ak,j
yij =
1
YAX
SX
if i = k + 1, j = k + 1
if i = k + 1, j = k + 1
if i = j = k + 1
otherwise
if i = k + 1
if i = j = k + 1
otherwise
else
Extract the submatrix formed by rows and columns i + 1 to N of A as H,
then solve for G = Danilevkii(H)
SS
Ii
0
0
G
kN
end If
End while
The Danilevskii algorithm is known to be among one of the more precise methods for determination of characteristic polynomials and is relatively efficient compared with Leveriers approach, although the latter is still considered very accurate
but slow.
A MATLAB function charpoly is available on the books webpage for the
evaluation of the characteristic polynomial via the Danilevskii method. The program
obtains the matrix S such that S1 AS is in the form of a block triangular matrix
[k]
given in (C.16). It also yields a set of polynomial coefficients p nk saved in a cell
array. Finally, the set of eigenvalues is also available by solving for the roots of the
polynomials. A function poly(A) is also available in MATLAB, which is calculated
in reverse; that is, the eigenvalues are obtained first, and then the characteristic
polynomial is formed.
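The reverse route taken by MATLAB's poly can be checked in a few lines; the test matrix below is an arbitrary illustration, not one of the book's examples.

    A   = [1 2 0; 3 -1 1; 0 2 4];     % illustrative test matrix
    p   = poly(A);                    % coefficients of det(sI - A), highest power first
    lam = roots(p);                   % eigenvalues recovered from the polynomial
    err = norm(sort(lam) - sort(eig(A)));   % compare with the direct computation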
Given
A=
1
4
1
2
1
2
5
2
1
1
3
0
0
0
1
0
0
0
1
0
0
0
0
2
1
1
0
0
0
2.75 0.25
0.25
0
1.5
0.5
0.1667
0
S=
0
0
0
1
0
0
0
0.5
0
1
0
0
0
0
1
0
6
6
0
S1 AS =
39
0.75 0.25
0.25
0
5.75
1.25
.5833 1
0
0
0
0
0.5
0
0
0
1
2
=
=
s3 6s2 6s + 39
p (s)
s2 2s + 1
=
=
p 1 (s)p 2 (s)
s5 8s4 + 7s3 + 45s2 84s + 39
APPENDIX D
(D.1)
(D.2)
wa (bc b c)
=
=
wa
1 (bc )
a bc a
(D.3)
1 wa
1 (bc )
1 (wa bc )
+ wa
=
a a
a bc a
a bc
a
1 (a wbc )
a bc
b
(wc c ) =
1 (a bwc )
a bc
c
Combining,
1
w=
a bc
3. Curl (4.92): Using (4.56) and (4.61), the curl of wa a can be expanded as
follows:
(wa a )
(wa a a)
wa a ( a) + (wa a ) a
A BC D
=0
1 (wa a )
1 (wa a )
1 (wa a )
1
+ b
+ c
a a a
b
b
c
c
a a
=
=
1
(wa a )
1
(wa a )
c
+
b
a b
b
a c
c
Similarly,
(wbb)
1
(wbb)
1
(wbb)
c
a
ba
a
bc
c
(wc c )
1
(wc c )
1
(wc c )
a
b
c b
b
c a
a
1
a a
a bc
(c wc ) (bwb)
b
c
+ bb
+ c c
(a wa ) (c wc )
c
a
(bwb) (a wa )
a
b
1
1
1
a
+ b
+ c
a a
b b
c c
into (4.91),
'
(
'
(
'
(
1
bc
a c
a b
=
+
+
a bc a
a
a
b
b
b
c
c
c
w =
wk k
k=a,b,c
(wk ) k + wk k
k=a,b,c
k=a,b,c m=a,b,c
1 wk
m k +
m m
wk
k
m m m
k=a,b,c m=a,b,c
R_rc = [  cos θ   sin θ   0
         −sin θ   cos θ   0
          0       0       1 ]        (D.4)
Because Rrc is an orthogonal matrix,
x
x
r
r
= Rrc y y = RTrc
z
z
z
z
(D.5)
which is relationship 1 in Table 4.6. We can then apply (D.5) for vector v,
r
r
x
T
vr v vz = v = vx vy vz y = vx vy vz Rrc
z
z
z
Comparing both ends of the equations, we have
vr
vx
vx
vr
v = Rrc vy vy = RTrc v
vz
vz
vz
vz
(D.6)
cos
sin
0
r r
r
r
x
x y z
r sin r cos 0
y
= y =
x y z
0
0
1
z
z z z
z
z
1
Note that the operator in (D.4) will rotate an input vector clockwise by an angle . However, because
we are rotating the reference axes, the operator would do the reverse; that is, it rotates the axes
counterclockwise.
Let Drc = diag 1, r, 1 . Then,
r
x
= Drc Rrc y
z
z
x
r
T
1
y = Rrc Drc
z
z
(D.7)
x
x
z y =
r
T
=
r y
(D.8)
= = z =0
r
r
r
2. Likewise, the direction and magnitude of r , , and z will not change if we just
modify the z position. Thus
= = z =0
z
z
z
3. If we just change the position, the direction or magnitude of z will also not
change. Thus
z
=0
where the subtraction is a vector subtraction. This is shown in (the right side of)
Figure D.1. As
0, we can see that the vector difference will be pointing perpendicular to r (r, , z). Thus
r
direction
= direction ( )
(D.9)
(r, +
, z) (r, , z)
= lim
The vector subtraction is shown in Figure D.2, where the limit yields a vector that is
pointing in opposite direction of r . The magnitude of the limit is also 1. Thus
= r
(D.10)
r
z
z
z
r
0
Rrc Rrc = 0
r
z
0
r
Rrc RTrc
sin
cos
0
cos
cos sin 0 sin
0
0
0
0
r
0
r
0
Rrc RTrc = 0
z
z
0
sin
cos
0
0
r
0
0
z
Rrs1
cos
= sin
0
sin
cos
0
0
0
1
Rrs2
cos
= 0
sin
0
1
0
sin
0
cos
0
Ers = 1
0
0
0
1
1
0
0
sin cos
= cos cos
sin
sin sin
cos sin
cos
cos
sin
0
(D.11)
Then, following the same approach used during transformations between rectangular and cylindrical coordinates, we have
x
x
r
r
T
= Rrs y
y = Rrs
(D.12)
z
z
vx
vr
v = Rrs vy
v
vz
vx
vr
vy = RTrs v
vz
v
(D.13)
The partial differential operators between the rectangular and spherical coordinate system are obtained by using the chain rule,
x
y
z
s c s s
c
r r
r
r
x
y
z
rc
c
rc
s
rs
=
=
y
x
y
z
0
rs s rs c
z
z
Let Drs = diag 1, r, r sin . Then,
r
x
x
r
= Drs Rrs y y = RTrs D1
rs
z
z
(D.14)
x
r
T
1
x y z y =
R
=
R
D
rs rs rs
r
x
r y
1
r sin z
(D.15)
= =
=0
r
r
r
2. The direction and magnitude of will not change if we just modify the
position. Thus
=0
The remaining partial derivatives of unit vectors will change their direction
based on their position in space. For a fixed r and , the vector subtractions are
shown in Figure D.3, and the partial derivatives are then given by
r
=
= r
(D.16)
For a fixed r and , the vector subtractions are shown in Figure D.4. Note that
four of the unit vectors are first projected into the horizontal plane prior to taking
limits. The partial derivatives are then given by:
= cos sin r ;
= sin ;
= cos
(D.17)
Figure D.4. Unit vectors at fixed r and . The unit vectors are represented by: a = r (r, , ),
b = (r, , ), c = (r, ), d = r (r, , +
), f = (r, , +
), g = (r, , +
).
The unit vectors projected into the horizontal planes are: &
a = r (r, , ) sin , &
b=
(r, , ) cos , &
d = r (r, , +
) sin , &
f = (r, , +
) cos .
r
0
r
=
Rrs Rrs = 0
r
r
0
r
r
=
Rrs RTrs
s c c c s
r
c c
c s
s
= s c s s c s s c s
c
0
0
0
c
s
0
z
= r
0
r
r
=
Rrs RTrs
s c c c s
r
s s s c 0
= c s c c 0 s s c s
c
s 0
c
s
0
c
z
r
0
0
s
= 0
0
c
s c 0
APPENDIX E
lim
i 0,N
N
F (xi , yi , zi )
i
(E.1)
i=0
In most applications, the differential d is set to either dx, dy, dz or ds, where
"
ds = dx2 + dy2 + dz2
(E.2)
For the 2D case, F = F (x, y) and the path C = C(x, y). Figure E.1 gives the area
interpretation of the line integrals. The integral C F (x, y)ds is the area under the
curve F (x, y)as the point travels along curve C. Conversely, the line integral with
respect to
x, C F (x, y)dx is the area projected onto the plane y = 0. The projected
integral C Fdx is with respect to segments where C(x, y) has to be single-valued with
respect to x. Otherwise, the integration path will have to be partitioned into segments
such that it is single-valued with respect to x. For example, the integration path from
A to B in Figure E.2 will have to be partitioned into segment ADE, segment EF , and
segment FGB. Thus for the integration path shown in Figure E.2, the line integral
with respect to x is given by
F (x, y)dx =
F (x, y)dx +
F (x, y)dx +
F (x, y)dx
(E.3)
C
[ADE]
[EF ]
[FGB]
For the 3D case, another interpretation is more appropriate. One could visualize
a mining activity that accumulates substance, say, Q, along path C in the ground
containing a concentration distribution of Q. Let F (x, y, z) be the amount of Q
gathered per unit length traveled. Then, along the differential path ds, an amount
F (x, y, z)ds will have been accumulated, and the total amount gathered along the
path C becomes C F (x, y, z)ds. Conversely, the integral C F (x, y, z)dx is the amount
of Q gathered
along the projected path in the x-direction. In this mining scenario, the
line integral C F (x, y, z)dx does not appear to be as relevant compared with the line
integral with respect to s. However, these line integrals are quite useful during the
computation of surface integrals and volume integrals because differential surfaces
s are often described by dx dy, dx dz, or dy dz, and differential volumes are often
described by the product dx dy dz.1 Another example is when the integral involves
the position vector r of the form
f dr =
f x dx + f y dy + f zdz
C
A 3D path can be described generally by C = x(t), y(t), z(t) = r(t), where r is
the position vector and t is a parameter going from t = 0 to t = 1.2 In some cases,
the curve can be parameterized by either x = t, y = t or z = t. In these cases, the
other variables are said to possess an explicit form, for example, for x = t, we can
use y = y(x) and z = z(x).3
1
2
3
One could then expect that in other coordinate systems, d may need involve those coordinates, for
example, dr, d, d, and so forth.
A more general formulation would be to let the parameter start at t = a and end with t = b, where
b > a. Using translation and scaling, this case could be reduced back to a = 0 and b = 1.
The parameterizations can also originate from coordinate transformations such as polar, cylindrical,
or spherical coordinates.
EXAMPLE E.1.
( (x + 3)/2 )² + ( y + 2 )² = 4        (E.4)

1. Parameterized form:

   x(t) = −3 − 4 cos(2πt) ,   y(t) = −2 − 2 sin(2πt)      from t = 0 to t = 1
2. Explicit function of x.
   Path C_abcda = C_abc + C_cda
   where
   C_abc :  y = −2 − √( 4 − ( (x + 3)/2 )² )      from x = −7 to x = 1
   C_cda :  y = −2 + √( 4 − ( (x + 3)/2 )² )      from x = 1 to x = −7
3. Explicit function of y.
   Path C_abcda = C_ab + C_bcd + C_da
   where
   C_ab  :  x = −3 − 2 √( 4 − (y + 2)² )      from y = −2 to y = −4
   C_bcd :  x = −3 + 2 √( 4 − (y + 2)² )      from y = −4 to y = 0
   C_da  :  x = −3 − 2 √( 4 − (y + 2)² )      from y = 0 to y = −2
∫_C F(x, y, z) dx = ∫_0^1 g(t) (dx/dt) dt

∫_C F(x, y, z) dy = ∫_0^1 g(t) (dy/dt) dt

∫_C F(x, y, z) dz = ∫_0^1 g(t) (dz/dt) dt

∫_C F(x, y, z) ds = ∫_0^1 g(t) √( (dx/dt)² + (dy/dt)² + (dz/dt)² ) dt        (E.6)

where g(t) = F( x(t), y(t), z(t) ). For the explicit form
x = t, (E.6) are modified by replacing dx/dt = 1, dy/dt = dy/dx and dz/dt = dz/dx
with the lower limit xstart and upper limit xend . For example,
xend
F (x, y, z)dx =
F (x, y(x), z(x))dx
C
xstart
EXAMPLE E.2.
F (x, y) = 2x + y + 3
and the counter-clockwise elliptical path of integration given in Example E.1.
Using the parameterized form based on t,

x(t) = −3 − 4 cos(2πt) ,   y(t) = −2 − 2 sin(2πt)

F( x(t), y(t) ) = 2 ( −3 − 4 cos(2πt) ) + ( −2 − 2 sin(2πt) ) + 3

and

dx = 8π sin(2πt) dt ;   dy = −4π cos(2πt) dt ;   ds = 4π √( 4 sin²(2πt) + cos²(2πt) ) dt

Thus the integrals ∫_C F(x, y) dx, ∫_C F(x, y) dy, and ∫_C F(x, y) ds are obtained by
substituting these expressions and integrating over t from 0 to 1.
Cabc
Ccda
C = Cabc + Ccda
6
x+3 2
: y = yabc = 2 4
2
6
x+3 2
: y = ycda = 2 + 4
2
from x = 7 to x = 1
from x = 1 to x = 7
6
2
x
+
3
F (x, y)abc = 2x + 3 + 2 4
2
F (x, y)cda
2x + 3 + 2 +
6
4
x+3
2
2
dy
dx
dy
dx
ds
dx
ds
dx
=
abc
x+3
+ !
2 (1 x) (x + 7)
"
1 + dy2abc
"
1 + dy2cda
=
cda
x+3
!
2 (1 x) (x + 7)
abc
cda
Note that ds has a negative sign for the subpath [cda]. This is because the
direction of ds is opposite that of dx in this region.
The line integrals are then given by
1
7
F (x, y)dx =
F (x, y)abc dx +
F (x, y)cda dx
7
=
8
F (x, y)dy
F (x, y)ds
F (x, y)abc
dy
dx
dx +
F (x, y)cda
1
abc
dy
dx
dx
cda
16
1
7
1
7
F (x, y)abc
ds
dx
dx +
F (x, y)cda
1
abc
ds
dx
dx
cda
96.885
This shows that either the parameterized form or the explicit form approach
can be used to obtain the same values. The choice is usually determined by the
tradeoffs between the complexity of the parameterization procedure and the
complexity of the resulting integral.
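The parameterized route of Example E.2 is also straightforward to evaluate numerically. The MATLAB sketch below uses the counter-clockwise parameterization from Example E.1 and the built-in integral function; the variable names are illustrative.

    % Numerical evaluation of the three line integrals of Example E.2.
    F    = @(x,y) 2*x + y + 3;
    x    = @(t) -3 - 4*cos(2*pi*t);    dxdt = @(t)  8*pi*sin(2*pi*t);
    y    = @(t) -2 - 2*sin(2*pi*t);    dydt = @(t) -4*pi*cos(2*pi*t);
    Fdx  = integral(@(t) F(x(t),y(t)).*dxdt(t), 0, 1);
    Fdy  = integral(@(t) F(x(t),y(t)).*dydt(t), 0, 1);
    Fds  = integral(@(t) F(x(t),y(t)).*sqrt(dxdt(t).^2 + dydt(t).^2), 0, 1);

The third quantity reproduces the magnitude 96.885 obtained above.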
lim
Ai 0,N
N
F (xi , yi , zi )
Ai
(E.7)
i=0
In most applications, the differential area is specified for either dA = dx dy, dydz,
dx dz, or dS, where dS is the differential area of the surface of integration
To visualize surface integrals, we could go back to the mining scenario for
the substance Q, except now the accumulation
is obtained by traversing a surface
instead of a path. Thus the surface integral S f (x, y, z)dS can be thought of as the
total amount mined by sweeping the total surface area S.
(E.9)
Other explicit forms are possible, for example, y = y(x, z) and x = x(y, z).
Two important variables are needed during the calculation of surface integrals:
, and the differential area dS at the point (x, y, z). As
the unit normal vector n
discussed in Section 4.6, the unit normal to a surface is given by (4.30), that is,
=
n
tu tv
(E.10)
tu tv
where
tu =
Specifically, we have
tu tv =
r
u
and tv =
r
v
(y, z)
(z, x)
(x, y)
+
+
(E.11)
where we used the shorthand notation for the Jacobian determinants given by
a a
c d
(a, b)
= det
b b
(c, d)
c d
However, the differential surface area is given by the area of the parallelogram
formed by differential arcs form by movement along constant v and u, respectively,
that is, the area formed by tu du and tv dv. Thus
tu du tv dv = tu tv du dv
dS =
6
(y, z) 2
(z, x) 2
(x, y) 2
=
+
+
du dv
(E.12)
(u, v)
(u, v)
(u, v)
If the explicit form z = z(x, y) is possible, that is, with x = u and y = v, the
formulas reduce to the more familiar ones, that is,
z
z
x y + z
x
y
n = 6
(E.13)
2 2
z
z
1+
+
x
y
6
dS
1+
z
x
2
+
z
y
2
dxdy
(E.14)
Note that with the square root, the choice for sign depends on the interpretation
of the surface direction. In most application, for a surface that encloses a region of
3D space, the surface outward of the enclosed region is often given a positive sign.
x
tu tv = det cos
r sin
y
sin
r cos
x
y
tu tv = det r sin r cos
cos
sin
z
0 = rz
0
z
0 = rz
0
top
n
dStop
r dr d
bottom
n
dSbottom
r dr d
x
y
z
tu tv = det R sin R cos 0 = R cos x + sin y = Rr
0
0
1
which the yields
side = r
n
and
dSside = Rddz
Figure E.6. The two possible domain descriptions: (a) boundary is partitioned into two segments such that v = (u), and (b) boundary is partitioned into two segments such that u = (v)
.
If the ranges of u and v are independent, then domain D can, without loss of
generality, be given as
D : 0u1 ;
0v1
F (x, y, z)dS =
S
g(u, v)du dv
where
6
g(u, v) = f (x(u, v), y(u, v), z(u, v))
(y, z)
(u, v)
2
+
(z, x)
(u, v)
2
+
(x, y)
(u, v)
2
(E.15)
Thus
h(v)
g(u, v)du
holding v constant
F (x, y, z)dS
S
h(v)dv
(E.16)
If u and v are interdependent at the boundary of the parameter space, then two
domain descriptions are possible:
Du : ulower u uupper
0 (u) v 1 (u)
(E.17)
0 (v) u 1 (v)
(E.18)
or
Dv : vlower v vupper
where ulower , uupper , vlower and vupper are constants. Both domain descriptions are
shown in Figure E.6, and both are equally valid.
With the first description given by (E.17), the surface integral is given by
1 (u)
g(u, v)dv
holding u constant
h(u) =
0 (u)
=
F (x, y, z)dS
uupper
h(u)du
(E.19)
ulower
where g(u, v) is the same function as in (E.15). Similarly, using the second description
given in (E.18),
1 (v)
g(u, v)du
holding u constant
h(v) =
0 (v)
=
F (x, y, z)dS
vupper
vlower
h(v)dv
(E.20)
For the special case in which the surface is given by z = z(x, y),
u
v=y
6
g(u, v)
ulower
2
z 2
z
g(x, y) = f (x, y, z(x, y)) 1 +
+
x
y
xlower
uupper = xupper
1 (u)
1 (x)
0 (u) = 0 (x)
vlower
ylower
vupper = yupper
1 (v)
1 (y)
0 (v) = 0 (y)
EXAMPLE E.4.
y = sin(u) sin(v) ;
0v
(x, z)
(u, v)
z = cos(u)
(y, z)
(u, v)
F (x, y, z)
(u, v)(u, v)
2
+
(z, x)
(u, v)
2
+
(x, y)
(u, v)
2
where
(u, v) = 2 sin(u) cos(v) + 2 sin(u) sin(v) cos(u) + 3
7
(u.v) =
3 cos2 (v) (cos(u) 1)2 (cos(u) + 1)2 + (1 + 2 cos2 (u) 3 cos4 (u))
As an alternative, we can partition the elliptical surface into two halves. The
upper half and lower half can be described by zu and z , respectively, where
6
zu =
1 x2
y2
2
z = 1 x2
y2
2
!
!
2 1 x2 y 2 1 x2
dzu
y/2
=!
dy
4 4x2 y2
with an integrand
g u (x, y) = 2x + y
1 x2
6
2 16
1
y2
3y
+3
2
2 4 + 4x2 + y2
dz
y/2
=!
dy
4 4x2 y2
with an integrand
g (x, y) = 2x + y +
1 x2
6
2 16
1
y2
3y
+3
2
2 4 + 4x2 + y2
Combining everything, we can calculate the surface integral via numerical integration to be
Iu
I
fdS
S
+1
2 1x2
2 1x2
+1 2 1x2
2 1x2
Iu + I = 64.4
which is the same value as the previous answer using the parameterized description.
Remark: In the example just shown, we have used numerical integration. This is
usually the preferred route when the integrand becomes too complicated to integrate analytically. There are several ways in which the numerical approximation can
be achieved, including the rectangular or trapezoidal approximations or Simpsons
methods. We have also included another efficient numerical integration technique
called the Gauss-Legendre quadrature method in the appendix as Section E.4.
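As an illustration of the numerical route, the surface integral of Example E.4 can be evaluated with integral2 using the parameterized description; the integrand f = 2x + y − z + 3 is read off from φ(u, v) above, and the variable names are illustrative.

    % Numerical check of Example E.4 with x = sin(u)cos(v), y = 2 sin(u)sin(v),
    % z = cos(u), for 0 <= u <= pi and 0 <= v <= 2*pi.
    f  = @(x,y,z) 2*x + y - z + 3;
    dS = @(u,v) sin(u).*sqrt(4*sin(u).^2.*cos(v).^2 + sin(u).^2.*sin(v).^2 ...
                             + 4*cos(u).^2);            % |r_u x r_v|
    g  = @(u,v) f(sin(u).*cos(v), 2*sin(u).*sin(v), cos(u)).*dS(u,v);
    I  = integral2(g, 0, pi, 0, 2*pi);    % should reproduce the value 64.4 above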
lim
Wi 0,N
N
F (xi , yi , zi )
Wi
(E.21)
i=0
r
du
u
b=
r
dv ;
v
c=
r
dw
w
x
u
y
dV = c (a b) = det
u
z
u
x
w
y
w
z
w
du dv dw
(E.22)
cos
r sin
wmax vmax umax
(x, y, z)
FdV =
G
v,
w)
(u,
(u, v, w) du dv dw
V
wmin
vmin
umin
(E.23)
wmin w wmax
where
G(u, v, w) = F x(u, v, w), y(u, v, w), z(u, v, w)
(E.24)
with boundaries,
0 u 1 ; 0 v 2 ; 0 w
Let sw = sin(w), cw = cos(w), sv = sin(v), and cv = cos(v). Then the differential
volume is
s
c
uc
c
us
s
v w
v w
v w
2sv sw 2ucv sw 2usv cw
dV = det
du dvdw
cv
usv
0
2u2 |sv | du dvdw
Alternatively, we could use the original variables x, y and z. Doing so, the
differential volume is dV = dx dy dz, whereas the boundary of the volume of
integration is given by
7
7
y 2
y 2
Surface boundary:
1 z2
x 1 z2
2
2
!
!
2 1 z2 y 2 1 z2
1
2 1z2
2 1z2
1z2 (y/2)2
1z2 (y/2)2
(2x + y z) dx dy dz = 8
and
F (x)dx
n
i=1
Wi F (xi )
(E.26)
2n1
dx =
am x
m=0
n
Wi
2n1
i=1
am xm
i
(E.27)
m=0
Approximations having the form given in (E.26) are generally called quadrature
formulas. Other quadrature formulas include Newton-Cotes formulas, Simpsons
formulas, and trapezoidal formulas. The conditions given in (E.27) distinguish the
values found for Wi and xi as being Gauss-Legendre quadrature parameters.
A direct approach to determine Wi and xi is obtained by generating the required
equations using (E.27):
2n1
1
2n1
2n1
1
1
m=0
am x
dx
m=0
am xm
xm dx
n
Wi
i=1
m=0
am
m
n
xm dx
m=0
Wi xm
i
2n1
2n1
am xm
i
m=0
am
n
Wi xm
i
i=1
(E.28)
i=1
Because the condition in (E.27) should be true for any polynomial of order
(2n 1), (E.28) should be true for arbitrary values of am , m = 0, 1, . . . , (2n 1).
This yields
n
Wi xm
i = m
for m = 0, 1, . . . , (2n 1)
(E.29)
i=1
where
m =
1
1
xm dx =
2/(m + 1)
if m is even
if m is odd
(E.30)
0
This means that we have 2n independent equations that can be used to solve the
2n unknowns: xi and Wi . Unfortunately, the equation becomes increasingly difficult
to solve as n gets larger. This is due to the nonlinear terms such as Wi xm
i appearing
in (E.29).
An alternative approach is to separate the problem of identifying the xi values
from the problem of identifying the Wi values. To do so, we use Legendre polynomials and take advantage of their orthogonality properties.
We first present some preliminary formulas:
1. Any polynomial of finite order can be represented in terms of Legendre polynomials, that is,
q
i=0
ci xi =
q
j =0
bj P j (x)
(E.31)
With this definition, the integral of R(2n1) (x), with limits from 1 to 1, is
guaranteed to be zero. To see this, we apply (E.31) to the first polynomial on the
right-hand side of (E.32), integrate both sides, and then apply the orthogonality
properties of Legendre polynomials (cf. (9.48)), that is,
9
1 8
1
n1
R(2n1) (x)dx =
bi Pi (x) (Pn (x)) dx
1
n1
i=0
8
bi
i=0
9
Pi (x)Pn (x)dx
(E.33)
3. One can always decompose a (2n 1)th order polynomial, say, (2n1) (x), into
a sum of two polynomials
(2n1) (x) = (n1) (x) + R(2n1) (x)
(E.34)
where (n1) (x) is an (n 1)th order polynomial and R(2n1) (x) is a (2n 1)th
order polynomial that satisfies the form given in (E.32).
To show this fact constructively, let r1 , . . . , rn be the roots of the n th -order
Legendre polynomial, Pn (x). By virtue of the definition given in (E.32), we see
that R(2n1) (ri ) = 0 also. Using this result, we can apply each of the n roots to
(E.34) and obtain
(2n1) (ri ) = (n1) (ri )
i = 1, 2, . . . , n
(E.35)
th
One can then obtain (n1) (x) to be
1) order polynomial that
the unique (n
passes through n points given by ri , (2n1) (ri ) . Subsequently, R(2n1) (x) can
be found by subtracting (n1) (x) from (2n1) (x).
4. Using the decomposition given in (E.34) and the integral identity given in
(E.33), an immediate consequence is the following identity:
1
1
(2n1) (x)dx =
(n1) (x)dx
(E.36)
1
This means the integral of an (2n 1)th order polynomial can always be
replaced by the integral of a corresponding (n 1)th order polynomial.
We now use the last two results, namely (E.35) and (E.36), to determine the
Gauss-Legendre parameters. Recall (E.27), which is the condition for a GaussLegendre quadrature, and apply it to (2n1) (x),
1
n
(2n1) (x)dx =
Wi (2n1) (xi )
(E.37)
1
i=1
Now set xi = ri , that is, the roots of the n th order Legendre polynomial. Next, apply
(E.35) on the right-hand side, and apply (E.36) on the left-hand side of the equation:
(2n1) (x)dx
1
1
n1
k=0
(n1) (x)dx
n1
1
n1
bk x dx
n
Wi (n1) (ri )
n
Wi
i=1
bk
k=0
n
k=0
(E.38)
1 k=0
bk
Wi (2n1) (ri )
i=1
n1
n
i=1
xk dx
Wi rik k
bk
n
Wi rik
i=1
k=0
bk rik
k=0
n1
n1
(E.39)
i=1
where
k =
x dx =
k
2/(k + 1)
if k is even
if k is odd
(E.40)
0
1
r1
..
.
1
r2
..
.
...
...
..
.
1
rn
..
.
r1n1
r2n1
...
rnn1
W1
W2
..
.
Wn
0
1
..
.
(E.41)
n1
In summary, to obtain the parameters for an n-point Gauss-Legendre quadrature, first solve for the roots ri of the n th -order Legendre polynomial, i = 1, . . . , n.
After substituting these values into (E.41), we can solve for Wi , i = 1, . . . , n.4
4
The first equation in (E.41), ni=1 Wi = 2, can be viewed as a partition of the domain 1 x 1
into n segments having widths Wi . As each of these partitions are given the corresponding heights
of F (xi = ri ), the integral approximation is seen as a sum of rectangular areas. This means that the
process replaces the original shape of the integration area into a set of quadrilaterals. Hence, the
general term quadrature. For integrals of function in two dimensions, a similar process is called
cubature.
For n = 3, we have
P3 (x) =
x 2
5x 3
2
1 1
2
W1
1
0.6 0
0.6 W2 = 0
W3
2/3
0.6
0
0.6
whose solution is given by W1 = W3 = 5/9 and W2 = 8/9.
Note that r1 = r3 . This is not a coincidence but a property of Legendre
polynomials. In general, for an n th -order Legendre polynomial: (1) for n odd,
one of the roots will always be zero, and (2) each positive root will have a
corresponding negative root of the same magnitude.
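The two-stage construction (roots of P_n, then the linear system (E.41) for the weights) is easy to carry out in MATLAB. The sketch below does so for n = 3 and then applies the resulting quadrature to a sample integrand; the integrand is illustrative.

    % 3-point Gauss-Legendre parameters from the roots of P_3 and system (E.41).
    n  = 3;
    p3 = [5/2 0 -3/2 0];                 % coefficients of P_3(x) = (5x^3 - 3x)/2
    r  = sort(roots(p3));                % nodes: -sqrt(0.6), 0, sqrt(0.6)
    V  = zeros(n);  mu = zeros(n,1);
    for m = 0:n-1
        V(m+1,:) = r'.^m;                % rows of the Vandermonde system (E.41)
        if mod(m,2) == 0, mu(m+1) = 2/(m+1); end
    end
    W  = V \ mu;                         % weights: 5/9, 8/9, 5/9
    % usage: approximate the integral of F over [-1, 1]
    F  = @(x) exp(x).*cos(x);
    I  = sum(W .* F(r));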
1
1
[f (x1 , . . . , xn )] dx1 dx p
=
1
1
..
.
n
i1 =1
1
1
n
Wi p f x1 , . . . , x p 1 , ri p dx1 dx p 1
i p =1
n
Wi1 Wi p f (ri1 , . . . , ri p )
(E.42)
i p =1
where Wi and ri are the same values obtained earlier for the one-dimensional case.
0 (u) v 1 (u)
F (u, v)
du dv
v
uupper
=
0 (u)
ulower
uupper
1 (u)
F (u, v)
dv du
v
ulower
F (u, v)du
(E.43)
0 (v) u 1 (v)
G(u, v)
dudv
u
=
vlower
=
vupper
1 (v)
0 (v)
vupper
vlower
G(u, v)
dv du
u
G(u, v)dv
(E.44)
Combining (E.43) and (E.44), we arrive at the formula given in Greens lemma,
3
G(u, v)dv +
C
F (u, v)du =
C
G(u, v)
du dv
u
F (u, v)
du dv
v
f dV =
V
fx
dV +
x
V
fy
dV +
y
V
fz
dV
z
(E.45)
Figure E.9. The normal vector to the surface x = max (y, z) is given by N1 which
has a magnitude equal to the differential
surface, dS1 .
For the first term in (E.45), we can use the following description of the volume of
integration: 5
V :
zmin
zmax
min (z)
max (z)
min (y, z)
max (y, z)
min (z)
The first surface integral in (E.46) is with respect to the surface: S1: x = max (y, z). To
determine the differential area of the surface, dS1 at a point in the surface, we can
use the position vector r of the point in surface S1 . Along the curve in the surface,
in which z is fixed, we have a tangent vector given by (r/y) dy. Likewise, along the
curve in the surface, in which y is fixed, we have a tangent vector given by (r/z)dz.
This is shown in Figure E.9. By taking the cross product of these two tangent vectors,
we obtain a vector N1 which is normal to surface S1 whose magnitude is the area of
the parallelogram formed by the two tangent vectors, that is,
1
N1 = dS1 n
1 is the unit normal vector.
where n
Thus, with the position vector r along the surface given by
r = max (y, z) x + y y + z z
5
This assumes that any line that is parallel to the x axis will intersect the surface boundary of region V
at two points, except at the edges of the boundary, where it touches at one point. If this assumption
is not true for V , it can always be divided into subsections for which this assumption can hold.
After applying the divergence theorem to these smaller regions, they can be added up later, and the
resulting sum can be shown to satisfy the divergence theorem.
Figure E.10. The normal vector to the surface x = min (y, z) is given by N2 which has
a magnitude equal to the differential surface, dS2 .
we have
1
dS1 n
=
=
=
r
r
y z
dydz
max
max
+ y
+ z dydz
y x
z x
max
max
x
dydz
z y
y z
(E.47)
The same arguments can be used for the other surface given by x = min (y, z). The
difference is that, as shown in Figure E.10, the normal vector N2 = (r/z) (r/y),
and thus
n2 x ) dS2 = dydz
(
(E.48)
Returning to equation (E.46), we can now use the results in (E.47) and (E.48) to
obtain,
fx
n 1 x ) +
dV =
f x (
n 2 x ) =
f x (
f x x n
(E.49)
V x
S1
S2
S
Following the same procedure, we could show that the other two terms in (E.45) can
be evaluated to be
fy
dV =
f y y n
(E.50)
V y
S
fz
f z z n
dV =
(E.51)
V z
S
Adding the three equations: (E.49), (E.50) and (E.51), we end up with the divergence
theorem, that is,
fy
fy
fx
dS
+
+
dV =
f x x + f y y + f z z n
(E.52)
x
y
z
V
S
( ) + 2
()
( ) + 2
n
dS
+
n
2 r
2 r
r
r
r
&
V
S1
S2
Focusing on S2 , the unit normal is given by r , and
1
1
1
dS = 4
n= 2
n
2 r
r2 r
r
r
S2
1
dS = 4
n
r2 r
3
x
x
y
y
du + dv +
fy
du + dv
u
v
u
v
C
3
z
z
+ fz
du + dv
u
v
C
fx
C
3
=
(E.53)
where,
x
y
z
+ fy
+ fz
u
u
u
x
y
z
g(u, v) = f x
+ fy
+ fz
v
v
v
Applying Greens lemma, (5.1), to (E.53)
3
g
f
du dv
( f (u, v)du + g(u, v)dv) =
v
C
S u
f (u, v)
fx
(E.54)
The integrand of the surface integral in (E.54) can be put in terms of the curl of f as
follows:
f y y
g
f x x
f
2x
2y
f z z
2z
=
+ fx
+
+ fy
+
+ fz
u v
u v
vu
u v
vu
u v
vu
2
2
f y y
f x x
x
y
f z z
2z
+ fx
+
+ fy
+
+ fz
v u
uv
v u
uv
v u
uv
F k m k
F k m k
=
m u v m=x,y,z
m v u
m=x,y,z
k=x,y,z
k=x,y,z
f y (x, y) f y (z, y)
f x (y, x) f x (z, x)
+
+
+
y (u, v)
z (u, v)
x (u, v)
z (u, v)
f z (x, z)
f z (y, z)
+
+
x (u, v)
y (u, v)
fy
f x (x, y)
f z f y (y, z)
x
y (u, v)
y
z (u, v)
f z (z, x)
fx
+
z
x (u, v)
(y, z)
(z, x)
(x, y)
+
+
(E.55)
( f )
(u, v) x (u, v) y (u, v) z
dS is given by
Recall that n
dS =
n
(y, z)
(z, x)
(x, y)
x +
y +
z du dv
(u, v)
(u, v)
(u, v)
(E.56)
dS
( f ) n
f dr =
C
h()
F (, x) dx
g ()
1
lim
h(+
)
F ( +
, x)dx
g(+
)
h()
F (, x)dx
g()
(E.57)
The first integral in the left-hand side of (E.57) can be divided into three
parts,
h(+
)
g (+
)
h(+
)
F ( +
, x) dx =
F ( +
, x) dx
h()
h()
g ()
+
F ( +
, x) dx +
F ( +
, x) dx
g ()
g (+
)
(E.58)
Furthermore, the first integral in the left-hand side of (E.58) can be approximated by the trapezoidal rule,
h(+
)
h()
1
F +
, h(+
)
2
+ F +
, h() h(+
) h()
F ( +
, x) dx
(E.59)
Likewise, we can also approximate the third integral in the left-hand side of
(E.58) as
g ()
g (+
)
1
F +
, g (+
)
2
+ F +
, g () g () g (+
)
F ( +
, x) dx
(E.60)
Substituting (E.59) and (E.60) into (E.58), and then into (E.57),
d
d
8
g ()
=
F ( +
, x) F (, x)
F (, x) dx = lim
dx
g())
F +
, h(+
) + F +
, h()
+
h(+
) h()
2
9
F +
, g (+
) + F +
, g ()
+
g () g (+
)
2
h()
h()
g())
h()
dh
dg
F (, x) dx + F (, h())
F (, g())
d
d
0
V (+
)
V ()
By adding and subtracting the term V () f (x, y, z, +
)dV in the right-hand
side,
d
d
f (x, y, z, )dV
V ()
1
= lim
'
f (x, y, z, +
)dV
V ()
f (x, y, z, )dV
V ()
'
(
1
+ lim
f (x, y, z, +
)dV
f (x, y, z, +
)dV
V (+
)
V ()
'
f
1
dV + lim
=
f (x, y, z, +
)dV
V ()
V (+
)
(
(E.61)
f (x, y, z, +
)dV
V ()
The last group of terms in the right-hand side (E.61) is the difference of two
volume integrals involving the same integrand. We can combine these integrals
by changing the volume of integration to be the region between V ( +
) and
V ().
f (x, y, z, +
)dV
f (x, y, z, +
)dV =
V (+
)
V ()
f (x, y, z, +
)dV
(E.62)
V (+
)V ()
Figure E.12. Graphical representation of differential volume emanating from points in S()
towards S( +
).
Recall that
r
r
dS
du
dv = n
u
v
d du dv
u u
r
d dS
n
The volume integral for points bounded between the surfaces of V () and
V ( +
) can now be approximated as follows:
r
dS
f (x, y, z, +
)dV
f (x, y, z, +
)
n
V(+
) V()
S()
=
(E.63)
Substituting (E.63) into (E.62) and then to (E.61),
d
f
f (x, y, z, )dV =
dV
d V ()
V ()
1
r
dS
+ lim
f (x, y, z, +
)
n
0
S()
=
V ()
f
dV +
f (x, y, z, )
S()
r
dS
n
APPENDIX F
(F.1)
Note that when P(x) = 0, we have a first-order linear differential equation, and when
R(x) = 0, we have the Bernouli differential equation.
Using a method known as the Ricatti transformation,
y(x) =
1
dw
P(x)w dx
we obtain
dy
dx
Py2
Qy
1 d2 w
1
+
2
Pw dx
Pw2
1
dw 2
Pw2 dx
dw
dx
2
1 dP
+ 2
p w dx
dw
dx
Q dw
Pw dx
+ Q(x)
+ P(x)R(x)w = 0
dx2
P(x)
dx
(F.2)
Noting that P(x) = x, Q(x) = 2/x and R(x) = 1/x3 , the Ricatti transformation y = (dw/dx)/(xw) converts it to a linear second-order differential equation given by
d2 w
dw
+x
w=0
dx2
dx
which is a Euler-Cauchy equation (cf. Section 6.4.3). Thus we need another
transformation z = ln(x), which would transform the differential equation further to be
x2
d2 w
=w
dz2
whose solution becomes,
w(z) = Aez + Bez
1
w(x) = A + Bx
x
A
B
2
1 C x2
x
= 2
A
x C + x2
x
+ Bx
x
(x,y)
Figure F.1. Description of a curve as an envelope of tangent lines used for Legendre transformation rules.
x
-q
(F.4)
where p is the slope and q is the y-intercept. This is illustrated in Figure F.1.
The Legendre transformation uses the following transformations:
p=
dy
;
dx
q=x
dy
y and
dx
dq
=x
dp
(F.5)
where p is the new independent variable and q is the new dependent variable. The
inverse Legendre transformations are given by
x=
dq
;
dp
y=p
dq
q
dp
and
dy
=p
dx
(F.6)
(F.7)
(F.8)
It is hoped that (F.8) will be easier to solve than (F.7), such as when the derivative
dy/dx appears in nonlinear form while x and y are in linear or affine forms. If this is
the case, one should be able to solve (F.8) to yield a solution of the form given by:
S(p, q) = 0. To return to the original variables, we observe that
S dq
S
+
= 0 g(p, xp y) + h(p, xp y)x = 0
p
q dp
where g and h are functions resulting from the partial derivatives. Together with
f (x, y, p ) = 0, one needs to remove the presence of p to obtain a general solution s(x, y) = 0. In some cases, if this is not possible, p would have to be left as
a parameter, and the solution will be given by curves described by x = x(p ) and
y = y(p ).
(F.9)
where (p ) = p .1 For instance, one may have a situation in which the dependent
variable y is modeled empirically as a function of p = dy/dx in the form given by
(F.9). After using the Legendre transformation, we arrive at
dq
1
(p )
+
q=
dp
(p ) p
p (p )
which is a linear differential equation in variables p and q.
EXAMPLE F.2.
q=
dp
2p
2
whose solution is given by
q=
p2
+C p
3
3
2
where is a parameter for the solution (y(), x()), and C is an arbitrary
constant.
y = x() + 2 ; subject to x() =
(F.11)
= S(x, y) = 0
C
(F.12)
where S(x, y) is obtained with the aid of (F.11). To determine whether S(x, y) = 0 is
indeed a singular solution, one needs to check if
S S dy
+
=0
x
y dx
(F.13)
will satisfy the original differential equation (F.10). If it does, then it is a singular
solution.
EXAMPLE F.3.
(F.14)
2 x
4u + 1 1 + 4u
whose solution is given by
ln( 4u + 1 1) = ln(x) + k
y = Cx + C2
(F.15)
then
= x 2C = 0
(F.16)
C
where C can be eliminated from (F.16) using (F.15) to obtain
"
x2
S(x, y) = x2 + 4y = 0 y =
(F.17)
4
Finally, we can check that (F.17) satisfies (F.14), thus showing that (F.17) is
indeed a singular solution of (F.14).
A simpler alternative approach to solving Clairauts equation is to take the
derivative of (F.14) with respect to x while letting p = dy/dx, that is,
dp
dp
+ 2p
dx
dx
p +x
dp
(x + 2p )
dx
Figure F.2. Plots of y1(x; C) for several values of C, together with y2(x).
then
dp
=0
and
dx
yielding two solutions of different forms
y1 = c1 x + c2
and
p =
y2 =
x
2
x2
+ c3
4
Substituting both solutions into (F.14) will result in c3 = 0 and c2 = c21 . Thus the
general solution is given by
y1 = cx + c2
while the singular solution is given by
y2 =
x2
4
10
Q1
q0,0 (i )
..
..
Q = . Qi =
.
qmi 1,0 (i )
Qp
where,
qj, () =
q0,n1 (i )
..
[=] mi n (F.18)
.
qmi 1,n1 (i )
..
.
if
<j
!
(j )
( j )!
g1
g(t) = ... gi =
gp
3. Combining the results, we have
i t
e [=] mi 1
(F.19)
tmi 1
c0
..
.
v(t) =
1
t
..
.
1
= Q g(t)
cn1 tn1
where Q, as given in (F.18), is a matrix of constants. The matrix exponential is
then given by
eAt =
n1
v+1 (t)A
(F.20)
=0
A is constant
(F.21)
where
H1
H2
w (t)
x0 Ax0 An1 x0 Q1
I[n] A An1 Q1 I[n]
t
g t b d [=] n 2 1
0
The span of the columns of H1 is also known as the Krylov subspace of A based on x0 .
m
m!
1
et et
( )k tk
m+1
k!
t
( )
k=0
(t )m e(t) e d =
m+1
t
0
et
m
+
1
if =
if =
(F.22)
Because Q is formed by the eigenvalues of A, both H1 and H2 are completely
determined by A and x0 alone, that is, they are both independent of b(t). A MATLAB
function linode_mats.m is available on the books webpage. This function can be
used to evaluate H1 and H2 , with A, x0 as inputs.
Consider the linear system
0
0
1
2
1
d
2
2 x +
x=
2
2
dt
e3t
1
2
0
EXAMPLE F.4.
1
x(0) = 0
1
2t
1 2
4
e
Q = 1 1
1
;
g = et
0
1 2
tet
Matrices H1 and H2 are
1
0
0
1
1
1
1
1
H2 =
H1 =
2
2
2
2
1
2
1
1
and the integrals can be evaluated to be
0
1
2
1
1 e2t /2
1 e2t
e3t e2t
1 et
t
2 2et
w (t) =
g t b d =
3t
e et /2
1 (1 + t) et
2 2 (1 + t) et
e3t + (1 2t) et /4
0
1
Figure F.3. A plot of the solution of the system given in Example F.4.
Note that w(0) = 0 as it should, because the initial conditions are contained
only in H1 . Furthermore, note that columns 2, 3, and 7 in H2 are all zero, which
implies that the corresponding elements in w(t) are not relevant to the solution
of x(t).
Combining the results using (F.21), a plot of the solutions is shown in
Figure F.3.
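For a quick numerical check of solutions such as the one in Example F.4, the variation-of-parameters form x(t) = e^{At} x(0) + ∫_0^t e^{A(t−τ)} b(τ) dτ can be evaluated directly with MATLAB's expm and integral. The matrix A, initial state, and forcing term below are illustrative placeholders, not the system of Example F.4.

    % Solution of x' = A x + b(t), x(0) = x0, via the matrix exponential.
    A  = [0 1; -2 -3];        x0 = [1; 0];        % illustrative system
    b  = @(tau) [0; exp(-3*tau)];                 % illustrative forcing term
    x  = @(t) expm(A*t)*x0 + integral(@(tau) expm(A*(t-tau))*b(tau), 0, t, ...
                                      'ArrayValued', true);
    x(2.0)                                        % state at t = 2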
(F.23)
where F (x, y) = M(x, y)/N(x, y). Taking the partial derivative of (F.23) with
respect to ,
1
1 F x, y
1 F x, y
F (x, y) = x
+ y
( )
( x)
( y)
Next, fix = 1 to obtain
x
F
F
+ y
= ( ) F
x
y
which is a linear first-order partial differential equation, that is solvable using the
method of characteristics.3 The characteristic equations are given by,
dx
dy
dF
=
=
x
y
( ) F
3
y
=u
x
2 =
and
F
x()/
F =
dy
= x()/ G(u)
dx
=
=
=
d&
y
f &
x,&
y,
d&
x
1 dy
f x, y
dx
1 dy
f x, y,
dx
Next, we take the partial derivative of this equation with respect to and then set
= 1,
f
dy
f
f
x + y + ( 1)
= ( 2) f
x
y
dx (dy/dx)
which is a linear first-order partial differential equation. The characteristic equations4
are given by
dx
dy
d(dy/dx)
df
=
=
=
x
y
( 1) (dy/dx)
( 2) f
which yields three invariants, 1 , 2 , and 3 ,
y
=u
x
(dy/dx)
=v
x1
f
2
x
The general solution for the partial differential equation is then given by
3 = G (1 , 2 )
x2
= G(u, v)
du
dx
dv
dx
=
=
v u
d2 y 1
f
+ (1 ) v = 2 + (1 ) v
dx2 x2
x
G(u, v) + (1 ) v
ti i
A,
i!
Hj =
sj j
A
j!
G0 + G1 + G2 + G3 +
eAs
H0 + H1 + H2 + H3 +
=
=
(G0 + G1 + G2 + G3 + ) (H0 + H1 + H2 + H3 + )
G0 H0 + G1 H0 + G2 H0 + G3 H0 +
+ G0 H1 + G1 H1 + G2 H1 + G3 H1 +
+ G0 H2 + G1 H2 + G2 H2 + G3 H2 +
+
Q0 + Q1 + Q2 +
where
Qk
k
Gi Hki
i=0
k i
t
i=0
i!
i
1 k
k!
A
ti ski
k!
i! (k i)!
k
i=0
ski
Aki
(k i)!
1 k
A (s + t)k
k!
Thus
eAt eAs
I + (s + t)A +
eA(s+t)
(s + t)2 2
A +
2!
which proves (6.51). Note also that matrices eAt and eAs commute.
By letting s = t,
eAt eAt = eAt eAt = I
Thus eAt is the inverse of eAt
Now let matrices
i and j be defined as
i =
ti i
A,
i!
j =
tj j
W
j!
0 + 1 + 2 + 3 +
eWt
0 + 1 + 2 + 3 +
=
=
(
0 +
1 +
2 +
3 + ) (0 + 1 + 2 + 3 + )
0 0 +
1 0 +
2 0 +
3 0 +
+
0 1 +
1 1 +
2 1 +
3 1 +
+
0 2 +
1 2 +
2 2 +
3 2 +
+
R0 + R1 + R2 +
where
Rk
k
i ki
i=0
k i
t
i=0
i!
Ai
tki
Wki
(k i)!
1 k
k!
t
Ai Wki
k!
i! (k i)!
k
i=0
(A + W)3
(A + W) (A + W)
A2 + WA + AW + W2
A2 + 2AW + W2
(A + W)2 (A + W)
A3 + 2AWA + W2 A
(F.24)
+A2 W + 2AW2 + W3
=
..
.
(A + W)k
A3 + 3A2 W + 3AW2 + W3
k
i=0
k!
Ai Wki
i! (k i)!
which will not be true in general unless A and W commute. Thus, if and only if A
and W commutes, (F.24) becomes
Rk =
tk
(A + W)k
k!
and
eAt eWt
t2
(A + W)2 +
2!
I + t (A + W) +
e(A+W)t
d
t2 2 t3 3
I + At + A + A +
dt
2!
3!
A + A2 t +
(F.26)
t2 3 t3 4
A + A +
2!
3!
t2
t3
A I + At + A2 + A3 +
2!
3!
AeAt = eAt A
Now consider a matrix M(t) satisfying dM/dt = A M. The derivative of det(M) can be written as a sum of determinants,

    (d/dt) det(M) = sum_{k=1}^{n} det M_k

where M_k is the matrix obtained from M by differentiating its kth row, that is,

    M_k = [ m_ij^(k) ],        m_ij^(k) = m_ij  if i != k,        m_ij^(k) = dm_ij/dt  if i = k

with

    dm_ij/dt = sum_{l=1}^{n} a_{il} m_{lj}                                             (F.25)

where a_ij and m_ij are the (i, j)th elements of A and M, respectively. Then the kth row of M_k is ( sum_l a_{kl} m_{l1}, ..., sum_l a_{kl} m_{ln} ), while the remaining rows are those of M, and

    det M_k = sum_{l=1}^{n} a_{kl} det( M with its kth row replaced by the lth row of M )

Each determinant in this sum has two equal rows and therefore vanishes, except the term l = k, so det M_k = a_{kk} det(M). Thus

    (d/dt) det(M) = a_11 det(M) + ... + a_nn det(M) = trace(A) det(M)

Integrating,

    det(M) = e^{ integral of trace(A) dt }

Because the trace of A is bounded, the determinant of M will never be zero; that is, M^{-1} exists.
With J in Jordan canonical form,

    J = blockdiag( J_1, ..., J_m ),        J_k = [ lambda_k   1
                                                              lambda_k   1
                                                                         ...   1
                                                                        lambda_k ]

the matrix exponential of each block is

    e^{J_k t} = [ e^{lambda_k t}   t e^{lambda_k t}   (t^2/2!) e^{lambda_k t}   ...
                  0                e^{lambda_k t}     t e^{lambda_k t}          ...
                                                          ...
                  0                                   e^{lambda_k t} ]

that is, an upper triangular matrix with e^{lambda_k t} on the diagonal. If any of the eigenvalues has a positive real part, then some elements of z will grow unbounded as t increases. Because x = T z, the system will be unstable under this condition.
APPENDIX G
Consider the system

    dy1/dt = (2 e^{-t} + 1) y2 - 3 y1
    dy2/dt = -2 y2                                                                     (G.1)

with y1(0) = 2 and y2(0) = 1. Then the following steps are needed:

1. Build a file, say derfunc.m, to evaluate the derivatives of the state-space model:

    function dy = derfunc(t,y)
        y1  = y(1);
        y2  = y(2);
        dy1 = (2*exp(-t)+1)*y2 - 3*y1;
        dy2 = -2*y2;
        dy  = [dy1; dy2];

2. Run the initial value solver

    >> [t,y]=ode45(@derfunc,[0,2],[2;1]);
where [t,y] are the output time and states, respectively, derfunc is the file
name of the derivative function, [0,2] is the time span, and [2;1] is the vector
of initial values. A partial list of solvers that are possible alternatives to ode45
is given in Table G.1. It is often suggested to first try ode45. If the program
takes too long, then it could be due to the system being stiff. In those cases, one
can attempt to use ode15s.
There are more advanced options available for these solvers in MATLAB. In addition to setting relative or absolute error tolerances, one can also include event handling (e.g., modeling a bouncing ball), pass model parameters, or solve equations in mass-matrix form, that is,

    M(t, y) (d/dt) y = f(t, y)

where M(t, y) is either singular (as it would be for DAE problems) and/or has a preferable sparsity pattern.

Table G.1. A partial list of MATLAB initial value solvers

    Solver     Description                              Remarks
    ode23      Explicit Runge-Kutta (2,3) pair          Nonstiff problems
    ode45      Explicit Runge-Kutta (4,5) pair          Nonstiff problems; usual first try
    ode113     Adams-Bashforth-Moulton                  Nonstiff problems
               predictor-corrector
    ode15s     Variable-order BDF (Gear's method)       Stiff problems and DAEs
    ode23s     Modified Rosenbrock method               Stiff problems
    ode23t     Trapezoid method                         Moderately stiff problems and DAEs
    ode23tb    Implicit Runge-Kutta (TR-BDF2)           Stiff problems
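As a brief illustration of these options (the tolerance values and the event function below are illustrative assumptions, not settings taken from this example), the call might look like:

    % Sketch: tighter tolerances plus a simple event function (illustrative values).
    opts = odeset('RelTol',1e-8,'AbsTol',1e-10,'Events',@hitGround);
    [t,y,te,ye,ie] = ode45(@derfunc,[0,2],[2;1],opts);

    function [value,isterminal,direction] = hitGround(~,y)
        value      = y(1);    % event: y(1) crosses zero
        isterminal = 1;       % stop the integration at the event
        direction  = -1;      % detect only downward crossings
    end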
Consider again the system (G.1), now subject to the boundary conditions

    y1(1) = 0.3        and        y2(0) = 1

The derivative file derfunc.m built earlier can be reused; the additional steps are:

2. Build a file, say bconds.m, that returns the residuals of the boundary conditions:

    function r = bconds(yinit,yfinal)
        r1 = yfinal(1) - 0.3;
        r2 = yinit(2)  - 1;
        r  = [r1; r2];

   Note that this file does not know that the final point is at t = 1. That information will have to come from a structured trial solution, trialSoln, that is formed in the next step.

3. Create a trial solution, trialSoln,

    >> trialSoln.x = linspace(0,1,10);
    >> trialSoln.y = [0.5;0.2]*ones(1,10);

   The data in trialSoln.x give the initial point t = 0, the final point t = 1, and 10 mesh points. One can vary the mesh points so that finer mesh sizes are concentrated around certain regions. The data in trialSoln.y simply repeat the initial guess at each mesh point. This could also be altered to be closer to the expected solution. (Another MATLAB command, bvpinit, is available to create the same initial data and has other advanced options.)

4. Run the BVP solver,

    >> soln = bvp4c(@derfunc,@bconds,trialSoln);

The output, soln, is also a structure. Thus, for plotting or other postprocessing of the output data, one may need to extract the t and y variables as follows:

    >> t=soln.x;
    >> y=soln.y;

There are several advanced options for bvp4c, including the solution of multipoint BVPs, some singular BVPs, and BVPs containing unknown parameters. The solver used in bvp4c is a finite difference (collocation) method based on the three-stage Lobatto IIIa implicit Runge-Kutta formula.
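Putting the pieces together, a minimal end-to-end script is sketched below; it assumes the derfunc.m and bconds.m files from the steps above are on the MATLAB path and simply repeats the calls already shown:

    % Sketch: assemble and plot the BVP solution (assumes derfunc.m and bconds.m exist).
    trialSoln = bvpinit(linspace(0,1,10),[0.5;0.2]);   % same trial data as above
    soln = bvp4c(@derfunc,@bconds,trialSoln);
    t = soln.x;                     % mesh points
    y = soln.y;                     % states, one row per state variable
    plot(t,y(1,:),'-',t,y(2,:),'--'); xlabel('t'); legend('y_1','y_2');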
3. Set the parameter options to include the mass matrix information using the command

    >> options=odeset('Mass',[1,0;0,0]);

4. Run the DAE solver

    >> [t,y]=ode15s(@daevdpol,[0,2],[-0.0997;0.1],options);

where [0,2] is the time span.
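The file daevdpol.m itself is not listed in the text. A self-contained sketch, built around an assumed semi-explicit van der Pol-type DAE that is consistent with the mass matrix [1,0;0,0] (but not necessarily the system actually used above), is:

    % Sketch only: assumed DAE  y1' = y2,  0 = mu*(1-y1^2)*y2 - y1.
    mu   = 1;                                 % assumed parameter value
    y10  = -0.0997;                           % initial y1 taken from the call above
    y20  = y10/(mu*(1 - y10^2));              % chosen so the algebraic equation holds at t = 0
    opts = odeset('Mass',[1 0; 0 0]);
    f    = @(t,y) [y(2); mu*(1 - y(1)^2)*y(2) - y(1)];
    [t,y] = ode15s(f,[0,2],[y10; y20],opts);
    plot(t,y(:,1));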
Repeated differentiation of dy/dt = f(t, y) gives

    d^2 y/dt^2 = df/dt + f (df/dy)

    d^3 y/dt^3 = d^2 f/dt^2 + 2 f d^2 f/dt dy + f^2 d^2 f/dy^2 + ( df/dt + f df/dy ) (df/dy)
        ...

with all functions evaluated at (t_k, y_k). The number of terms increases rapidly with the order of differentiation. These equations, including those for higher orders, can be made more tractable using an elegant formulation based on labeled trees (see, e.g., Hairer and Wanner [1993]).
As an alternative, we simplify the process by picking specific forms for f (t, y).
The first choice is to let f (t, y) = t3 . The analytical solution from yk to yk+1 is given by
d
y
dt
t3
k1
h tk3
k2
h (tk + a2 h)3
k3
h (tk + a3 h)3
(G.3)
k4
h (tk + a4 h)3
yk+1
yk + c1 k1 + c2 k2 + c3 k3 + c4 k4
yk + (c1 + c2 + c3 + c4 ) tk3 h
+ 3 (c2 a2 + c3 a3 + c4 a4 ) tk2 h2
+ 3 c2 a22 + c3 a23 + c4 a24 tk h3
+ c2 a32 + c3 a33 + c4 a34 tk h4
719
(G.4)
c2 a2 + c3 a3 + c4 a4
1
2
1
3
1
4
(G.5)
ln
d
y
dt
yk+1
yk
yk+1
=
=
=
ty
(tk + h)2 tk2
2
(tk + h)2 tk2
yk exp
2
(G.6)
(G.7)
=
=
h tk yk
h (tk + a j h) yk +
j 1
bj k
j = 2, 3, 4
=1
yk+1
=
=
yk + c1 k1 + c2 k2 + c3 k3 + c4 k4
yk 1 + 1,1 tk h + 2,0 + 2,2 tk2 h2
3,1 tk + 3,3 tk3 h3
4,0 + 4,2 tk2 + 4,4 tk4 h4 + O(h5 )
(G.8)
720
where,
1,1
4
ci
i=1
2,0
4
ci ai
i=2
2,2
3,1
4
ci
i=2
j =1
4
i1
ci
4
i=3
4,0
4
ci
4
4
ci ai
i1
bij
j =1
i=2
bij bj
=1 j =+1
ci ai
i1
bij a j
j =2
ci ai
i1
i2
=1 j =+1
i=3
4,4
a bi, +
i1
i2
i=3
4,2
bij
=2
i=3
3,3
i1
bij bj +
4
ci
i=3
i1
=2
bi a
1
j =1
Now compare the coefficients of (G.8) and (G.7). Using (7.17) and including (G.5), we end up with the eight equations necessary for the fourth-order
approximation:
c1 + c2 + c3 + c4
1
6
c2 a2 + c3 a3 + c4 a4
1
2
1
8
1
3
+ c4 b43 a23 + b42 a22
1
12
1
4
c4 b43 b32 a2
1
24
c3 b32 a22
(G.9)
After replacing a j with bj , there are ten unknowns (bij and c j , i < j , j = 1, 2, 3, 4)
with only eight equations, yielding two degrees of freedom. One choice is to set
b31 = b41 = 0. This will result in the coefficients given in the tableau shown in (7.14).
Another set of coefficients that satisfies the eight conditions given in (G.9) is the
Runge-Kutta tableau given by
1
3
1
3
2
3
31
1
8
3
8
3
8
1
8
cT
(G.10)
or
k1
k2
k1
k2
=
(1/h) b11
b21
b12
(1/h) b22
1
1
1
yk
=
=
(1/h) b11
b21
1 + p 1 h + p 2 h2
yk
1 + q1 h + q2 h2
yk +
c1
c2
b12
(1/h) b22
1
1
1
yk
(G.11)
where
p1
c1 + c2 b11 b22
p2
q1
b11 b22
q2
The usual development of the Gauss-Legendre method is through the use of collocation theory, in
which a set of interpolating Lagrange polynomials is used to approximate the differential equation
at the collocation points. Then the roots of the s-degree Legendre polynomials are used to provide
the collocation points. See, e.g., Hairer, Norsett and Wanner (1993).
721
722
2
1
12
c1 + c2 b11 b22
b11 b22
(G.13)
which still leaves two degrees of freedom. A standard choice is to use the roots of
the second-degree Legendre polynomial to fix the values of a1 and a2 ,2 that is,
P2 (t) = t2 t + (1/6)
yielding the roots
1
3
a1 =
2
6
1
3
a2 = +
2
6
and
1
3
= b11 + b12
and
2
6
1
3
+
= b21 + b22
2
6
(G.14)
(G.15)
From
(G.13) and (G.15),
m
m
bj f ykj = yk + h
bj ykj
j =0
(G.16)
j =0
(G.17)
m
bj ejh
j =0
bj 1 jh +
+
2!
3!
2!
3!
j =0
(G.18)
h
h2
1+
+
+
2!
3!
m
j =0
(jh)2
(jh)3
bj 1 jh +
+
2!
3!
m
bj h
j =0
m
j =0
m
2
h
j bj +
j 2 bj +
2!
j =0
.
..
..
.
..
.
..
b
0
m b1
.. ..
. .
mm bm
1
2
..
.
(1)m
m+1
(G.19)
(i|k) yki = hk f yk+1
(G.20)
i=1
Using the same technique of finding the necessary conditions by the simple application of the approximation to dy/dt = y, that is, f (y) = y and y = et y0 , we note
that
ykj = e(tkj tk+1 ) yk+1
Then (G.20) reduces to
m
m
i=1
(i|k)
i=1
(tki tk+1 )2
1 + (tki tk+1 ) +
+
2!
hk yk+1
hk
The form (G.20), in which the derivative function f is kept on one side without unknown coefficients,
is often preferred when solving differential algebraic equations (DAE).
723
724
For the p th -order approximation, we again let m = p 1, and the equation will
yield
.
.
.
...
(tk+1 tk )
(tk+1 tk1 )
...
(tk+1 tk )2
(tk+1 tk1 )2
...
..
.
..
.
..
(tk+1 tk ) p
(tk+1 tk1 ) p
...
(1|k)
2
(tk+1 tkp +1 ) (1|k) = 0
. .
..
. .
.
. .
p
0
(p 1|k)
(tk+1 tkp +1 )
(G.21)
Because the right-hand side is just hk e2 , this equation can be solved directly using
Cramers rule and using the determinant formulas of Vandermonde matrices. The
results are
p
tk+1 tk
if = 1
t
k+1 tkj
j =0
(|k) =
(G.22)
p
t
t
k+1
kj
k+1
k
if 0
tk+1 tk
tk tkj
j 0,j =
Note that this formula involves product terms that are the Lagrange formulas used
in polynomial interpolation, which is how most textbooks derive the formulas for
BDF coefficients. The approach taken here, however, has the advantage that it
automatically fixes the order of approximation when we truncated the Taylor series
of the exponential functions based on the chosen order.
When the step sizes are constant, that is, hk = h, then (tk+1 tkj ) = (j + 1)h,
and (G.22) can be used to find independent of k. For instance, for the sixth-order
BDF method, that is, p = 6, the coefficient of yk3 becomes
1
3 =
4
1
2
3
5
6
14 24 34 54 64
=
15
4
To determine the appropriate value for hk , we can first set hk = hk1 and then
use either of the error-control methods given in Section G.5 to modify it. The stepdoubling approach might be simpler for the general nonlinear case.
Let z_{k+1} and w_{k+1} be the estimates obtained from an nth-order and an (n+1)th-order method, respectively. The local truncation error can then be estimated as

    tau_{k+1}(h_k) ~ |w_{k+1} - z_{k+1}| / h_k                                         (G.23)

In addition, we expect that the truncation error tau_{k+1}(h_k) is of the order of h_k^n, that is, for some constant C,

    tau_{k+1}(h_k) = C h_k^n        so that        C h_k^n = |w_{k+1} - z_{k+1}| / h_k  (G.24)

Suppose the step had instead been theta h_k. To meet a prescribed tolerance epsilon we require

    tau_{k+1}(theta h_k) = C (theta h_k)^n = theta^n |w_{k+1} - z_{k+1}| / h_k <= epsilon

Rearranging,

    theta <= [ epsilon h_k / |w_{k+1} - z_{k+1}| ]^{1/n}                               (G.25)

To incorporate (G.25), we can set theta equal to the right-hand side of (G.25) with epsilon divided by 2, that is,

    theta = [ epsilon h_k / ( 2 |w_{k+1} - z_{k+1}| ) ]^{1/n}                          (G.26)

This guarantees a strict inequality in (G.25).

The implementation of (G.26) is shown in the flowchart given in Figure G.1. In the flowchart, we see that if the truncation error tau_{k+1} is less than the tolerance epsilon, we can set y_{k+1} to be w_{k+1}. Otherwise, we choose theta-hat to be

    theta-hat = min( theta_max, max( theta_min, theta ) )                              (G.27)

If tau_{k+1} happens to be much less than epsilon, the scaling factor theta will be greater than unity, which means the previous step size was unnecessarily small. Thus the step size could be increased. However, if tau_{k+1} is greater than epsilon, theta will be less than unity, which means the step size has to be reduced to satisfy the accuracy requirements. As shown in the flowchart, we also need to constrain the step size h_k to be within a preset maximum bound, hmax, and minimum bound, hmin.
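A bare-bones sketch of this logic, using an assumed embedded pair (explicit Euler as the nth-order method and Heun as the (n+1)th-order method, so n = 1) on an illustrative scalar equation, is:

    % Sketch of the step-size control loop of Figure G.1 (Euler/Heun pair, n = 1).
    f = @(t,y) -y + sin(t);                 % illustrative ODE
    t = 0; y = 1; tf = 10; n = 1;
    h = 0.1; hmin = 1e-6; hmax = 1.0; epsilon = 1e-6;
    T = t; Y = y;
    while t < tf
        h  = min(h, tf - t);
        k1 = f(t, y);
        z  = y + h*k1;                      % nth-order (Euler) estimate
        k2 = f(t + h, z);
        w  = y + (h/2)*(k1 + k2);           % (n+1)th-order (Heun) estimate
        err = abs(w - z)/h;                 % truncation-error estimate, (G.23)
        if err <= epsilon
            t = t + h; y = w;               % accept the step
            T(end+1) = t; Y(end+1) = y;
        end
        theta = (epsilon/(2*max(err,eps)))^(1/n);   % (G.26)
        theta = min(4, max(0.25, theta));           % (G.27), assumed bounds
        h = min(hmax, max(hmin, theta*h));
    end
    plot(T, Y);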
Figure G.1. Flowchart of the step-size control strategy: an nth-order and an (n+1)th-order method produce z_{k+1} and w_{k+1}; the error estimate tau_{k+1} = |w_{k+1} - z_{k+1}|/h_k is compared with epsilon, the step is accepted if tau_{k+1} <= epsilon, and the next step size is set to h_k <- min(hmax, max(hmin, theta-hat h_k)) with theta-hat given by (G.26) and (G.27).
Two widely used embedded Runge-Kutta pairs are the Fehlberg 4(5) pair and the Dormand-Prince 5(4) pair. Their coefficients, written in the Butcher tableau arrangement (nodes a_i, coefficients b_ij, and the two weight rows giving z_{k+1} and w_{k+1}), are

F45:

    0
    1/4      1/4
    3/8      3/32         9/32
    12/13    1932/2197   -7200/2197    7296/2197
    1        439/216     -8            3680/513     -845/4104
    1/2     -8/27         2           -3544/2565     1859/4104   -11/40
    -------------------------------------------------------------------------
    z_{k+1}  25/216       0            1408/2565     2197/4104   -1/5      0
    w_{k+1}  16/135       0            6656/12825    28561/56430 -9/50     2/55
                                                                                (G.28)

DP54:

    0
    1/5      1/5
    3/10     3/40         9/40
    4/5      44/45       -56/15        32/9
    8/9      19372/6561  -25360/2187   64448/6561   -212/729
    1        9017/3168   -355/33       46732/5247    49/176      -5103/18656
    1        35/384       0            500/1113      125/192     -2187/6784    11/84
    ---------------------------------------------------------------------------------
    w_{k+1}  35/384       0            500/1113      125/192     -2187/6784    11/84    0
    z_{k+1}  5179/57600   0            7571/16695    393/640     -92097/339200 187/2100 1/40
                                                                                (G.29)
Figure G.2. Numerical solution for Example G.1, showing the varying step sizes produced by the error-control strategy with tolerance epsilon = 10^{-8}.

EXAMPLE G.1. Consider the system

    dy1/dt = ( mu - D ) y1
    dy2/dt = D ( y2f - y2 ) - ( mu / Y ) y1

with

    mu = mu_max y2 / ( km + y2 )

where Y = 0.4, D = 0.3, y2f = 4.0, mu_max = 0.53, and km = 0.12 are the yield, dilution rate, feed composition, maximum rate, and Michaelis-Menten parameter, respectively. Assuming an initial condition of y(0) = (0.1, 0)^T, we have the plots shown in Figure G.2 after applying the Fehlberg 4/5 embedded Runge-Kutta method using error control with tolerance epsilon = 10^{-8}. We see that the step sizes are smaller near t = 0 but increase as the integration proceeds.
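This example is easy to reproduce with the built-in adaptive solvers as well; a sketch (the final time of 20 is an assumption, chosen only to show the transient) is:

    % Sketch: Example G.1 with ode45 and a comparable tolerance.
    Y = 0.4; D = 0.3; y2f = 4.0; mumax = 0.53; km = 0.12;
    mu = @(y2) mumax*y2./(km + y2);
    f  = @(t,y) [ (mu(y(2)) - D)*y(1);
                  D*(y2f - y(2)) - mu(y(2))*y(1)/Y ];
    opts  = odeset('RelTol',1e-8,'AbsTol',1e-8);
    [t,y] = ode45(f,[0,20],[0.1;0],opts);    % final time is an assumed value
    plot(t,y); xlabel('t'); legend('y_1','y_2');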
In the step-doubling approach, the same nth-order method is used to advance from t_k to t_{k+2} once with step size 2 h_k and twice with step size h_k, giving the two estimates w_{k+2} and z_{k+2}. Subtracting the two estimates,

    w_{k+2} - z_{k+2} ~ 2^{n+1} C h_k^{n+1} - 2 C h_k^{n+1} = ( 2^{n+1} - 2 ) Err(h_k)

or

    Err(h_k) = | w_{k+2} - z_{k+2} | / ( 2^{n+1} - 2 )

Requiring the error of the adjusted step theta h_k to satisfy

    Err(theta h_k) = C ( theta h_k )^{n+1} <= alpha epsilon

where alpha < 1, for example alpha = 0.9, yields the formula for theta based on the step-doubling approach:

    theta = [ alpha epsilon ( 2^{n+1} - 2 ) / | w_{k+2} - z_{k+2} | ]^{1/(n+1)}        (G.30)

The MATLAB code for the Gauss-Legendre IRK is available on the book's webpage as glirk.m, and it incorporates the error control based on the step-doubling method.
EXAMPLE G.2. Consider the van der Pol equation

    dy1/dt = y2
    dy2/dt = mu ( 1 - y1^2 ) y2 - y1

subject to the initial condition y = (1, 1)^T. When mu = 500, the system becomes practically stiff. Specifically, for the range t = 0 to t = 800, the Fehlberg 4/5 Runge-Kutta will appear to hang. Instead, we could apply the Gauss-Legendre implicit Runge-Kutta, together with error control based on the step-doubling approach using tolerance epsilon = 10^{-6}. This results in the plot shown in Figure G.3, which shows that small step sizes are needed where the slopes are nearly vertical.

Figure G.3. The response y1 for a van der Pol oscillator with mu = 500, using the Gauss-Legendre IRK method with error control based on the step-doubling procedure.
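For comparison, the same stiff problem can be handled by MATLAB's built-in stiff solver; the following sketch uses ode15s in place of the book's glirk.m:

    % Sketch: stiff van der Pol (mu = 500) with ode15s.
    mu = 500;
    f  = @(t,y) [ y(2); mu*(1 - y(1)^2)*y(2) - y(1) ];
    opts  = odeset('RelTol',1e-6,'AbsTol',1e-6);
    [t,y] = ode15s(f,[0,800],[1;1],opts);
    plot(t,y(:,1)); xlabel('t'); ylabel('y_1');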
k j 1
k j 1
n!
( j )n
c j, n ( j )n =
j,
S (j, n) =
(n )!
=0
k j 1
=
where D j =
d
=0
j, ( j ) D j ( j )n
=0
i=0
k j 1
i=0
j, ( j ) D j
( j ) ( j )n
=0
k j 1
j, ( j )
=0
m=0
p
!
m
Dm
( j )n
j ( j ) D j
m!( m)!
= 0, 1, . . . , k j 1
i Qi (S (j, n)) = 0
i=0
i Q (yn ) =
i
p
M
j =1
i=0
=0
i Q S (j, n)
i
(G.31)
Consider a boundary value problem based on the system (G.31), dx/dt = F(t, x), subject to the boundary conditions

    q( x(0), x(T) ) = 0                                                                (G.32)

Let x_0 denote a guess for the initial state x(0), and let x_T(x_0) denote the corresponding terminal state,

    x_T(x_0) = x(T)                                                                    (G.33)

and these vectors could be evaluated by using any initial value solver, such as a Runge-Kutta method, after setting x_0 as the initial condition.

The main idea of the shooting method is to find the appropriate value for x_0 such that the boundary conditions given in (G.32) are satisfied, that is,

    Find x_0 such that q( x_0, x_T(x_0) ) = 0                                          (G.34)

This can be attempted with a Newton iteration,4

    x_0^{(k+1)} = x_0^{(k)} + Delta x_0^{(k)},
    Delta x_0^{(k)} = - J^{-1} q( x_0^{(k)}, x_T( x_0^{(k)} ) )                        (G.35)

where

    J = ( dq/dx_0 ) evaluated at x_0 = x_0^{(k)}                                       (G.36)

Once q( x_0^{(k)}, x_T( x_0^{(k)} ) ) is close to zero, we can set x_0 = x_0^{(k)} and solve for x(t) from t = 0 to t = T. If the number of iterations exceeds a maximum, then either a better initial guess is required or a different method needs to be explored.

The terms in (G.36) generate a companion set of initial value problems. Specifically, J involves the square Jacobian matrix of q. The added complexity stems from the dependence of q on x_T, which in turn depends on x_0 through the integration process of (G.31).

Let the boundary conditions be given as

    q( x_0, x_T ) = [ q_1( x_01, ..., x_0n, x_T1, ..., x_Tn ) ; ... ; q_n( x_01, ..., x_0n, x_T1, ..., x_Tn ) ] = 0      (G.37)

4 The Newton-search approach is not guaranteed to converge for all systems. It is a local scheme and thus requires a good initial guess.
then

    dq/dx_0 = dq(a, b)/da |_{a=x_0, b=x_T} + dq(a, b)/db |_{a=x_0, b=x_T} (db/dx_0)
            = Q_a( x_0, x_T ) + Q_b( x_0, x_T ) M(T)                                   (G.38)

where

    Q_a = [ dq_i/da_j ] evaluated at a_k = x_0k, b_l = x_Tl,   i, j = 1, ..., n        (G.39)-(G.40)
    Q_b = [ dq_i/db_j ] evaluated at a_k = x_0k, b_l = x_Tl,   i, j = 1, ..., n        (G.41)

and

    M(T) = dx_T/dx_0                                                                   (G.42)

The sensitivity matrix M(t) = dx/dx_0 satisfies

    (d/dt) M(t) = A(t, x) M(t)                                                         (G.44)

where

    A(t, x) = dF/dx                                                                    (G.46)

and

    M(0) = I                                                                           (G.47)

Note that A(t, x) depends on the x consistent with the x_0 used. Thus the following integration needs to be performed simultaneously:

    (d/dt) x = F(t, x),        x(0) = x_0                                              (G.48)
    (d/dt) M = A(t, x) M,      M(0) = I                                                (G.49)

Figure G.4. A flowchart for nonlinear shooting implemented with Newton's method.

Having calculated x_T = x(T) and M(T), we can then substitute these values, together with x_0, to determine Q_a, Q_b, and dq/dx_0. Thereafter, the update to x_0 can be determined, and the iteration continues until the desired tolerance on q is obtained. A flowchart showing the calculation sequence is given in Figure G.4.
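A compact sketch of this loop is given below; the functions F, A, q, Qa, and Qb are placeholders to be supplied by the user (this is not the book's nbvp.m):

    % Sketch: nonlinear shooting with Newton updates, following Figure G.4.
    function x0 = shootNewton(F, A, q, Qa, Qb, x0, T, tol, maxIter)
        n = numel(x0);
        for iter = 1:maxIter
            % integrate states and sensitivities M = dx/dx0 together, (G.48)-(G.49)
            w0  = [x0; reshape(eye(n),[],1)];
            rhs = @(t,w) [ F(t,w(1:n));
                           reshape(A(t,w(1:n))*reshape(w(n+1:end),n,n),[],1) ];
            [~,w] = ode45(rhs,[0 T],w0);
            xT = w(end,1:n).';
            MT = reshape(w(end,n+1:end).',n,n);
            r  = q(x0,xT);
            if norm(r) < tol, return; end
            J  = Qa(x0,xT) + Qb(x0,xT)*MT;    % (G.38)
            x0 = x0 - J\r;                    % Newton update, (G.35)
        end
        warning('shootNewton: no convergence; try a better initial guess.');
    end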
EXAMPLE G.3. Consider the system

    (d/dt) [ x1 ; x2 ; x3 ] = [ -k1 e^{-2t} x1 x2 + k3 ;
                                -k1 e^{-2t} x1 x2 + k2 x3 ;
                                 k1 e^{-2t} x1 x2 - k2 x3 ]

subject to the boundary conditions

    q( x(0), x(T) ) = [ x1(0) - x2(T) - 0.164 ;
                        x2(0) x2(T) - 0.682 ;
                        x3(0) + x3(T) - 1.136 ] = 0

with T = 2, k1 = 10, k2 = 3, and k3 = 1. The partial derivatives of q give

    Q_a = [ 1   0        0
            0   x2(T)    0
            0   0        1 ]

    Q_b = [ 0   -1       0
            0   x2(0)    0
            0   0        1 ]

Similarly, we can calculate A = dF/dx,

    A(t, x) = [ -k1 e^{-2t} x2    -k1 e^{-2t} x1     0
                -k1 e^{-2t} x2    -k1 e^{-2t} x1     k2
                 k1 e^{-2t} x2     k1 e^{-2t} x1    -k2 ]

Figure G.5. Plots of the solution components x1(t), x2(t), and x3(t) for Example G.3.

Plots of the solutions are shown in Figure G.5. (A MATLAB file nbvp.m is available on the book's webpage and solves this specific example. The code contains sections that are customizable to apply to different nonlinear boundary value problems.)
(G.50)
with separated boundary conditions such that k conditions are specified at t = 0 and
(n k) conditions are specified at t = T ,
Q0 x(0)
(G.51)
QT x(T )
(G.52)
(G.53)
where S(t) is an n n transformation matrix and z(t) is the new state vector. The
aim of the transformation is to recast the original problem into a partially decoupled
problem such that the solution of first k values of z can be solved independently of
the last (n k) values of z, that is,
d
z1
H11 (t)
q1 (t)
0
z1
(G.54)
=
+
z2
z2
H21 (t) H22 (t)
q2 (t)
dt
where z1 (t)[=]k 1 and z2 (t)[=](n k) 1. In addition, the transformation will be
done such that the z1 (0) can be specified from (G.51), whereas z2 (T ) can be specified
from (G.52).
Thus z1 is first solved using initial value solvers to determine z1 (t = T ). Afterward, z1 (T ) is combined with z2 (T ), after using (G.52), to form z(t = T ). The terminal condition for x(t) at t = T can then be found from (G.53). Having x(T ), the
trajectory of x(t) can be evaluated by integrating backward from t = T to t = 0.
To obtain the form in (G.54), we first apply (G.53) to the original equation,
(G.50),
d
d
S z + S z = ASz + b
dt
dt
d
d
1
z = S (t) A(t)S(t) S z + S1 (t)b(t)
dt
dt
=
where
H(t)z + q(t)
d
S (t) A(t)S(t) S
dt
H(t)
d
S
dt
A(t)S(t) S(t)H(t)
(G.55)
and
q(t) = S1 b(t)
(G.56)
=
Ik
0
R(t)
Ink
(G.57)
(G.58)
735
736
A21 A22
0
I
0
I
H21
0
0
A11 (H11 + RH21 ) A11 R + A12 RH22
=
A21 H21
A21 R + A22 H22
0
H22
A21
H22
A21 R + A22
H11
d
R
dt
(G.59)
(A11 RA21 ) z1 + q1
(G.60)
z1 (0) = C1 0
(G.62)
In summary, the first phase, known as the forward-sweep phase of the Ricatti
equation method, is to solve for R(t) and z1 (t), that is,
d
R = A11 R + A12 RA21 R RA22
dt
C D , followed by
where Q0 =
d
z1 = (A11 RA21 ) z1 +
dt
R(0) = C1 D
I
z1 (0) = C1 0
(G.63)
(G.64)
The second phase of the method is to find the conditions for x(T ) by combining
the results from the first phase with the other set of boundary conditions given by
(G.52). By partitioning QT as
F G
QT =
where F is (n k) n and G is (n k) (n k), we get
I R(T )
z1 (T )
F
G
QT x(T ) =
z2 (T )
0
I
F z1 + FR(T )z2 (T ) + Gz2 (T )
z2 (T ) = (FR(T ) + G)
(T F z1 (T ))
T
(G.65)
(G.66)
Having evaluated x(T ) means we now have all the information at one boundary.
We could then use the original differential equations given in (G.50) and integrate
backward starting from t = T until t = 0. This second phase is also known as the
backward-sweep phase of the Ricatti equation method.
The Ricatti equation method (which is also sometimes called the invariant
embedding method) is sometimes more stable than the shooting method, especially when the process (7.55) is unstable. However, there are also situations when
the shooting methods turn out to be more stable. Thus both methods may need to
be explored in case one or the other does not yield good results. Note also that
our development of the Ricatti equation method is limited to cases with separated
boundary conditions.
737
APPENDIX H

Table H.1. Normal forms of the basic one-dimensional bifurcations

    Type                        Normal form         Bifurcation diagram
    Saddle-node                 x' = r + x^2        two equilibria (one stable, one unstable)
                                                    for r < 0 that merge and disappear at r = 0
    Transcritical               x' = r x - x^2      equilibria x = 0 and x = r exchange
                                                    stability as r crosses 0
    Pitchfork (supercritical)   x' = r x - x^3      x = 0 stable for r < 0; for r > 0, x = 0
                                                    becomes unstable and a stable pair appears
    Pitchfork (subcritical)     x' = r x + x^3      x = 0 stable for r < 0 together with an
                                                    unstable pair; x = 0 unstable for r > 0

Consider a one-dimensional system dx/dt = f(x, r) and expand f in a Taylor series around (x, r) = (0, 0),

    f(x, r) = f + x (df/dx) + r (df/dr) + (x^2/2) (d^2 f/dx^2) + r x (d^2 f/dr dx) + (r^2/2) (d^2 f/dr^2) + ...
where all the various partial derivatives are evaluated at (x, r) = (0, 0). Because
(x, r) = (0, 0) is a non-hyperbolic equilibrium point, the first two terms are zero, that
is, f (0, 0) = 0 and f/x(0, 0) = 0.
We will truncate the series after the second-order derivatives to yield bifurcation
analysis of saddle-node bifurcations and transcritical bifurcations. This means that
equilibrium points near (x, r) = (0, 0) will be given by the roots of the second-order
polynomial in x,
    alpha_2(r) x^2 + alpha_1(r) x + alpha_0(r) = 0                                     (H.1)

where

    alpha_2(r) = (1/2) d^2 f/dx^2,    alpha_1(r) = r d^2 f/dr dx,    alpha_0(r) = r df/dr + (r^2/2) d^2 f/dr^2

which was obtained by setting the right-hand side of the Taylor series expansion to zero. Solving for the roots of (H.1), we find the neighboring equilibrium points around (x, r) = (0, 0),

    x_eq = [ -r (d^2 f/dr dx) +/- sqrt( r^2 (d^2 f/dr dx)^2 - 2 (d^2 f/dx^2) ( r df/dr + (r^2/2) d^2 f/dr^2 ) ) ] / ( d^2 f/dx^2 )      (H.2)

For saddle-node bifurcations, consider |r| << 1. Then (H.2) will reduce to

    x_eq, saddle-node = +/- sqrt( -2 r (df/dr) / (d^2 f/dx^2) )                        (H.3)

which then requires

    r (df/dr) / (d^2 f/dx^2) < 0                                                       (H.4)

For transcritical bifurcations, we require instead that df/dr = 0 at (0, 0). Then (H.2) reduces to

    x_eq = r [ -(d^2 f/dr dx) +/- sqrt( (d^2 f/dr dx)^2 - (d^2 f/dx^2)(d^2 f/dr^2) ) ] / ( d^2 f/dx^2 )      (H.5)

A pair of equilibrium points will then exist if the value inside the square root is positive, plus d^2 f/dx^2 != 0, that is,

    ( d^2 f/dr dx )^2 - ( d^2 f/dx^2 )( d^2 f/dr^2 ) > 0        and        d^2 f/dx^2 != 0      (H.6)
As r changes sign, the stability of the equilibrium points will switch, thereby giving
the character of transcritical bifurcations.
For both saddle-node and transcritical bifurcations, the stability can be assessed by regrouping the Taylor series approximation as

    dx/dt ~ alpha_0(r) + [ alpha_1(r) + alpha_2(r) x ] x = alpha_0(r) + lambda(x, r) x

where

    lambda(x, r) = alpha_1(r) + alpha_2(r) x

Then, applying the formulas for x_eq (Equation (H.3) for saddle-node bifurcations and Equation (H.5) for transcritical bifurcations), we find that x_eq,i is stable if the derivative of the right-hand side, evaluated at x_eq,i, is negative, for i = 1, 2.

For pitchfork bifurcations, the Taylor series will need to include third-order derivatives so that a third-order polynomial can be obtained for the equilibrium points. The computations are lengthier, but with the additional conditions that df/dr(0, 0) = 0 and d^2 f/dx^2(0, 0) = 0, the analysis simplifies to the requirements given in (H.8).
As shown in the accompanying figure, the region containing three equilibrium points is bounded by a cusp point and two loci of
saddle-node bifurcations. When h > 1.089 and gradually decreased, the equilibrium points following the top curve in Figure H.3 will also decrease continuously.
However, as h moves past the critical value h = 1.089, the equilibrium point will
jump to follow the values of the lower curve. The opposite thing happens for the
lower curve; that is, as h gradually increases until it passes the value of 1.089, the
equilibrium point jumps to follow the upper locus of equilibrium points. This characteristic of having the behavior depend on the direction of parameter change is
known as hysteresis.
The bifurcations of second-order systems include all three types of the first-order
cases, namely saddle-node, transcritical, and pitchfork bifurcations. These three
types of bifurcations are extended by means of simply adding one more differential
equation. The canonical forms are given in Table H.2. These types of bifurcations
are centered at non-hyperbolic equilibrium points that have zero eigenvalues.
Table H.2. Normal forms of bifurcations for second-order systems

    Type            Normal form
    Saddle-node     x' = r + x^2,      y' = -y
    Transcritical   x' = r x - x^2,    y' = -y
    Pitchfork       Supercritical:  x' = r x - x^3,   y' = -y
                    Subcritical:    x' = r x + x^3,   y' = -y
    Hopf            rho' = mu rho + a rho^3,   theta' = omega
                    (supercritical if a < 0, subcritical if a > 0)

The Hopf bifurcation is a type of bifurcation that is not available to one-dimensional systems because it involves pure imaginary eigenvalues. These bifurcations yield the appearance or disappearance of limit cycles. A supercritical Hopf bifurcation occurs when a stable focus shifts to a stable limit cycle. Conversely, a subcritical Hopf bifurcation occurs when an unstable limit cycle changes to an unstable focus. The canonical form of a Hopf bifurcation, given in terms of polar coordinates (rho, theta), is

    d rho/dt = mu rho + a rho^3        and        d theta/dt = omega

where rho = sqrt(x^2 + y^2) and theta = tan^{-1}(y/x). It can be shown that when a < 0, the system exhibits a supercritical Hopf bifurcation. However, when a > 0, the system exhibits a subcritical Hopf bifurcation. These are shown in Figure H.4.
It turns out that Hopf bifurcations can occur for systems of order n >= 2. A general theorem is available that prescribes a set of sufficient conditions for the existence of a Hopf bifurcation.

THEOREM H.1. Let mu_h be a value of the parameter mu such that the system dx/dt = f(x; mu) has an equilibrium point x_eq(mu_h), with the Jacobian matrix J = df/dx at x = x_eq(mu_h) having a pair of pure imaginary eigenvalues +/- i omega(mu_h) (i = sqrt(-1)), whereas the rest of the eigenvalues have nonzero real parts. In addition, let the real and imaginary parts of these eigenvalues lambda(mu) be smooth functions of the parameter mu, with

    (d/d mu) Re( lambda(mu) ) != 0

in a neighborhood around mu_h. Under these conditions, the system will have a Hopf bifurcation at mu = mu_h.
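Numerically, the hypotheses of the theorem can be checked by tracking the eigenvalues of the Jacobian as the parameter varies. The sketch below does this for the Brusselator mentioned in the following paragraph (with its other parameter fixed at 1, an assumed value):

    % Sketch: locate a Hopf point by following max Re(lambda) of the Jacobian.
    Jf = @(x,b) [ -(b+1) + 2*x(1)*x(2),  x(1)^2;
                   b - 2*x(1)*x(2),     -x(1)^2 ];   % Brusselator Jacobian, a = 1
    bvals = linspace(1.5, 2.5, 101);
    reL   = zeros(size(bvals));
    for i = 1:numel(bvals)
        b      = bvals(i);
        xeq    = [1; b];                 % equilibrium for a = 1
        reL(i) = max(real(eig(Jf(xeq,b))));
    end
    plot(bvals, reL); xlabel('b'); ylabel('max Re(\lambda)');
    % The zero crossing (near b = 2 for this model) marks the Hopf bifurcation.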
There are several physical systems that exhibit Hopf bifurcations, such as in
the fields of biomedical science, aeronautics, fluid mechanics, and chemistry.1 In
1
A good elementary treatment of Hopf bifurcations, including several examples and exercises, can
be found in S. Strogatz, Nonlinear Dynamics and Chaos, Perseus Book Publishing, Massachusetts,
1994.
743
744
chemistry, there are several well-known reaction systems, such as the Belousov-Zhabotinsky (BZ) system, known collectively as oscillating chemical reactions.
Depending on the critical conditions, the systems can oscillate spontaneously. One of
the well-known examples of a Hopf bifurcation is the Brusselator reaction, which is
given in Exercise E8.19. Although it is strictly fictitious, its simplification still allows
one to understand the onset of Hopf bifurcations in real systems.
APPENDIX I
    x^2 P2(x) (d^2 y/dx^2) + x P1(x) (dy/dx) + P0(x) y = 0                             (I.1)

where

    P2(x) = alpha_{2,0} + alpha_{2,1} x + alpha_{2,2} x^2 + ...
    P1(x) = alpha_{1,0} + alpha_{1,1} x + alpha_{1,2} x^2 + ...
    P0(x) = alpha_{0,0} + alpha_{0,1} x + alpha_{0,2} x^2 + ...                        (I.2)

and alpha_{2,0} != 0.

The indicial equation (9.28) becomes

    alpha_{0,0} + alpha_{1,0} r + alpha_{2,0} r (r - 1) = 0
    alpha_{2,0} r^2 + ( alpha_{1,0} - alpha_{2,0} ) r + alpha_{0,0} = 0                (I.3)

whose roots are

    r = [ ( alpha_{2,0} - alpha_{1,0} ) +/- sqrt( ( alpha_{2,0} - alpha_{1,0} )^2 - 4 alpha_{0,0} alpha_{2,0} ) ] / ( 2 alpha_{2,0} )      (I.4)

We denote the larger root (if real) by ra and the other root by rb. When the roots differ by an integer, say ra - rb = m >= 0,

    ra + rb = 2 ra - m = 1 - alpha_{1,0}/alpha_{2,0}
    ra = (1/2) ( m + 1 - alpha_{1,0}/alpha_{2,0} )                                     (I.5)

and, in particular, when the roots are equal (m = 0),

    ra = rb = (1/2) ( 1 - alpha_{1,0}/alpha_{2,0} )                                    (I.6)
The first solution is then

    u(x) = sum_{n=0}^{infinity} a_n(ra) x^{ra + n}                                     (I.7)

where

    a_n(ra) = 1                                                         if n = 0
    a_n(ra) = - [ sum_{k=0}^{n-1} Q_{n,k}(ra) a_k(ra) ] / Q_{n,n}(ra)   if n > 0

with

    Q_{n,k}(r) = alpha_{0,n-k} + alpha_{1,n-k} (k + r) + alpha_{2,n-k} (k + r)(k + r - 1)

If (ra - rb) is not an integer, the second solution, v(x), is immediately given by

    v(x) = sum_{n=0}^{infinity} a_n(rb) x^{rb + n}                                     (I.8)

where

    a_n(rb) = 1                                                         if n = 0
    a_n(rb) = - [ sum_{k=0}^{n-1} Q_{n,k}(rb) a_k(rb) ] / Q_{n,n}(rb)   if n > 0
If the indicial roots differ by an integer, that is, m 0, we can use the dAlembert
method of order reduction (cf. Lemma I.1 in Section I.2) to find the other solution.
For N = 2, this means the second solution is given by
v(x) = u(x) z(x)dx
(I.9)
where z(x) is an intermediate function that solves a first-order differential equation
resulting from the dAlembert order reduction method. Using u(x) as obtained in
(I.7), z(x) can be obtained by solving
dz
du
2&
2&
&
+ 2x P2 (x)
+ xP1 (x)u z = 0
x P2 (x)u
dx
dx
&
P1 (x)
1 dz
1 du
=
+2
(I.10)
z dx
u dx
x&
P2 (x)
P1 (x) defined by equations (I.7) and (I.2), respectively, the leftWith u, &
P2 (x) and &
hand side of (I.10) can be replaced by an infinite series,
1 dz
=
(n + n ) xn
z dx
n=1
(I.11)
2ra
n =
&
n+1 (ra ) n1
(r
)
2(ra + n + 1)&
k
nk
a
k=1
1,0 /&
2,0
&
n1
&
2,0
&
1,n+1
k=1 k 2,nk /&
747
if n = 1
if n 0
(I.12)
if n = 1
if n 0
(I.13)
&
1,0
+ 2ra = m + 1
&
2,0
1 dz
m+1
n
=
+
(n + n ) x
z dx
x
n=0
m+1
(n + n ) n+1
ln(z) = ln x
+
x
n+1
n=0
8
9
(n + n )
(m+1)
n+1
z = x
exp
x
n+1
n=0
k=0 k x
1
nm1
+ m x +
if m > 0
n=m+1 n x
z=
n1
0 x1 +
if m = 0
n=1 n x
and
m1
km
k=0 (k /k m)
x
n
0 ln |x| +
n=1 (n /n) x
if m > 0
if m = 0
748
This integral can now be combined with u to yield the form for the second
independent solution, that is,
v(x) = u(x) zdx
=
&
n x
ra +n
zdx
n=0
v(x)
u ln |x| + n=0 bn xrb+n
u ln |x| +
rb +n
n=1 bn x
if m > 0
(I.14)
if m = 0
Note that for m = 0, the infinite series starts at n = 1 and the coefficient of
(u ln |x|) is one. The parameter is set equal to 1 when m = 0 because will later
be combined with a constant of integration. However, when m > 0, should not be
fixed to 1, because = 0 in some cases. Instead, we will set b0 = 1 in anticipation of
merging with the arbitrary constant of integration.
Having found the necessary forms of the second solution, Theorem 9.2 summarizes the general solution of a second-order linear differential equation that includes
the recurrence formulas needed for the coefficients of the power series based on the
Frobenius method.
i (x)
i=0
di y
=0
dxi
(I.15)
Suppose we know one solution, say, u(x), that solves (I.15). By introducing another
function, q(x), as a multiplier to u(x), we can obtain
y = q(x)u(x)
(I.16)
LEMMA I.1.
(I.17)
F i (x)
i=1
di1 z
=0
dxi1
(I.18)
with
F i (x) =
N
k=i
k!
d(ki) u
k (x) (ki)
(k i)!i!
dx
(I.19)
First, applying Leibnitzs rule (9.6) to the n th derivative of the product y = qu,
j
i
di y
d q d(ij ) u
i
=
j
dxi
dx j dx(ij )
j =0
where
i
j
i!
j !(i j )!
q
N
i=0
di u
i (x) i
dx
i (x)
N
j
i
d q d(ij ) u
i
j
dx j dx(ij )
j =0
i=0
j
i
d q d(ij ) u
i
j
dx j dx(ij )
i (x)
i=1
j =1
Because u satisfies (I.15), the first group of terms vanishes. The remaining terms can
then be reindexed to yield
N
N
k
d(ki) u di q
k (x) (ki)
=0
i
dxi
dx
i=1
k=i
This method can be used repeatedly for the reduced order differential equations.
However, in doing so, we require that at least one solution is available at each stage of
the order reductions. Fortunately, from the results of the previous section, it is always
possible to find at least one solution for the differential equations using the Frobenius
method. For instance, with N = 3, the Frobenius series method will generate one
solution, say, u. Then via dAlemberts method, another solution given by y = qu
produces a second-order differential equation for z = dq/dt. The Frobenius series
method can generate one solution for this second-order equation, say, v. Applying
the order reduction method one more time for z = wv, we end up with having to
solve a first-order differential equation for w.1
Having solved for w, we can go backward:
1 v + 2 wv
1 vdx + 2 wv dx
1 u + 2 qu
1 u + 2 1 u
vdx + 2 2 u
wv dx
749
750
EXAMPLE I.1.
d2 y
dy
+ x(1 x)
y=0
dx2
dx
The terms for &
i,j are &
2,0 = 2, &
1,0 = 1, &
1,1 = 1, and &
0,0 = 1, whereas
the rest are zero. The indicial roots become ra = 1 and rb = 0.5. Because the
n (rb). The only nonzero values of
difference is not an integer, = 0 and bn = &
Qn,k are
2x2
Qn,n1 (r) = (n 1 + r)
and
n1+r
&
n1 (r)
n (2n + 4r 1)
Thus
&
n (ra )
=
=
&
n (rb)
1
&
n1 (ra ) =
2n + 3
1
2n + 3
where
n>0
1
1
&
0
2n + 1
5
(2n + 2)(2n) 6 4!
2n+1 (n + 1)!
=3
(2n + 3)(2n + 2)(2n + 1) 5 4!
(2n + 3)!
1
1
1
1
1
&
n1 (rb) =
0 (rb) = n
2n
2n
2(n 1)
2
2 n!
2n+1 (n + 1)! n+1
1 n(1/2)
x
3
x
+B
(2n + 3)!
2n n!
n=0
n=0
d2 y
dy
+x
+ xy = 0
dx2
dx
The terms for &
i,j are &
2,0 = 1, &
1,0 = 1, and &
0,1 = 1, whereas the rest are zero.
The indicial roots are ra = rb = 0. Because the difference is an integer with
m = 0, we have = 1. The only nonzero values of Qn,k are
x2
Qn,n (0) = n 2
Thus &
0 (0) = 1 and for n > 0,
&
n (0) =
1
&
n1 (0) =
n2
Qn,n1 (0) = 1
and
1
n2
1
(n 1)2
(1) =
(1)n
(n!)2
(1)n
n=0
(n!)2
xn
(1)n 2n
(n!)2
=
=
=
1
(1)n
(1)n
(1)n
(1)n
b
2
=
b
2
+
2
n1
0
n2
n(n!)2
(n!)2
(n!)2
n(n!)2
(1)n
1
1
2
1 + + +
(n!)2
2
n
n
1
(n!)2
k
(x)n
n=1
k=1
EXAMPLE I.3.
d2 y
dy
+ 3x
+ (2x 8)y = 0
2
dx
dx
The terms for &
i,j are &
2,0 = 9, &
1,0 = 3, &
0,0 = 8, and &
0,1 = 2. The indicial
roots are ra = 4/3 and rb = 2/3. The difference is an integer; then m = 2.
The only nonzero values of Qn,k are
Qn,n (r) = n 9n + 9(2r 1) + 3
and
Qn,n1 (r) = 2
9x2
751
752
Thus &
0 (r) = 1 and
&
n (r) =
2
&
n1 (r)
n (9n + 9(2r 1) + 3)
0
9n(n + 2)
9(n 1)(n + 1)
913
3
=
(1)n 2n+1
9n (n!)(n + 2)!
n=0
(1)n 2n+1
xn+(4/3)
9n (n!)(n + 2)!
F
1 2
27
9
Because m = 2, we only need &
1 (rb) for the second solution,
2
2
2
&
1
=
=
3
9(1)
9
Next, we need n (ra ) and ,
n (ra )
n (ra ) =
[9(2ra + 2n 1) + 3] &
m1 (rb)
Qm,m1&
2
= 2
0 (ra )
9
For the coefficients bn , we have b0 = 1, b1 = 2/9, b2 = 0 and the rest are found
by recurrence, that is,
bn
=
=
=
=
Qn,n1 (rb)
nm (ra )
bn1
Qn,n (rb)
Qn,n (rb)
2
(n 1)
(1)n 2n+1
bn1 +
9n(n 2)
9n (n 2)!n!
n(n 2)
(2)n2 2
2
(1)n 2n+1
(n 1)
b2 +
+ +
9n2 n!(n 2)!
9n (n 2)!n!
(3)(1)
(n)(n 2)
n3
(1)n 2n+1
(n 1 k)
,
for n > 2
9n (n 2)!n!
(n k)(n 2 k)
k=0
753
2
2
x2/3 + x1/3 u ln(x)
9
81
n3
(1)n 2n+1
(n 1 k)
n(2/3)
+
x
9n (n 2)!n!
(n k)(n 2 k)
n=3
k=0
y=
an xn
n=0
With N = 2, the coefficients i,j are: 2,0 = 1, 2,2 = 1, 1,1 = 2, and 0,0 =
( + 1). Based on (9.21), the only nonzero values are for n = k, that is,
n,n
( + n + 1)( n)
(n + 1)(n + 2)
( + n + 1)( n)
an
(n + 1)(n + 2)
n1
(1)n
[ 2(n k)] [ + 2(n k) + 1] a0
(2n)!
(I.21)
n1
(1)n
[ 2(n k) 1] [ + 2(n k) + 2] a1
(2n + 1)!
(I.22)
k=0
a2n+1
k=0
n1
(1)n
( 2(n k)) ( + 2(n k) + 1)
(2n)!
(I.23)
k=0
2n+1 ()
n1
(1)n
( 2(n k) 1) ( + 2(n k) + 2) (I.24)
(2n + 1)!
k=0
754
y = a0 1 +
2n ()x2n + a1 x +
2n+1 ()x2n+1
n=1
(I.25)
n=1
The two infinite series are called the Legendre functions of the second kind,
namely Leven (x) and Lodd (x), where
=
Leven (x)
1+
2n ()x2n
(I.26)
2n+1 ()x2n+1
(I.27)
n=1
Lodd (x)
x+
n=1
For the special case when = even is an even integer, even +2j (even ) = 0,
j = 1, . . ., and thus Leven (x) becomes a finite sum. Similarly, when = odd is an
odd integer, odd +2j (odd ) = 0, j = 1, . . ., and Lodd (x) becomes a finite sum. In
either case, the finite sums will define a set of important polynomials. By carefully
choosing the values of a0 and a1 , either of the finite polynomials can be normalized
to be 1 at x=1. If = even = 2, we need
a0 = A(1)
(2)!
2 (!)2
(I.28)
Conversely, if = odd = 2 + 1,
a1 = A(1)
(2 + 2)!
2 !( + 1)!
(I.29)
where A is arbitrary. Thus with these choices for a0 and a1 , we can rewrite (I.25) to
be
y = APn (x) + BQn (x)
(I.30)
Int(n/2)
Pn (x) =
k=0
where
+
Int(n/2) =
2n k!(n
n/2
(n 1)/2
if n even
if n odd
(I.31)
(I.32)
The Legendre functions, Qn (x), has a closed form that can be obtained more conveniently by using the method of order reduction. Applying dAlemberts method of
order reduction, we can set Qn (x) = q(x)Pn (x), where q(x) is obtained via Lemma I.1
given in Section I.2. Applying this approach to (I.20),
dz
2
2 dPn
+ 2 1x
0 =
1 x Pn
2xPn z
dx
dx
dz
z
2x dx
dPn
2
2
1x
Pn
=
=
Thus with q = zdz,
exp
(Pn )2
2x
1 x2
755
dx
1
(1 x2 ) (Pn )2
8
Qn (x) = Pn (x)
1
(1 x2 ) (Pn (x))2
9
dx
(I.33)
where we included a factor of (1) to make it consistent with (I.26) and (I.27).
(I.35)
With y = qw, where q = (1 x2 )m/2 , the terms on the right-hand side of (I.34) can
each be divided by q and then evaluated to be
1
m2
m2
n(n + 1)
y
=
n(n
+
1)
w
q
1 x2
1 x2
2x dy
q dx
1 x2 d2 y
q dx2
2mx2
dw
w 2x
1 x2
dx
2
m (m 1)x2 1
dw
2 d w
w
2mx
+
1
x
1 x2
dx
dx2
(I.36)
(I.37)
Then S satisfies the Legendre equation given by (I.20). With f (x) = 1 x2 , df/dx =
2x and a = n(n + 1), (I.20) can be rewritten with S replacing y, as
f
d2 S df dS
+
+ aS = 0
dx2
dx dx
(I.38)
756
Furthermore, with d2 f/dx2 = 2 and dk f/dxk = 0 for k > 2, the mth derivative of
each term in (I.38) is, using the Leibnitz rule (9.6),
dm
dxm
d2 S
f 2
dx
=
dm
dxm
df dS
dx dx
=
=
dm
(aS)
dxm
k (2+mk)
m
d f
d
S
m
k
dxk
dx(2+mk)
k=0
(1+m)
d(2+m) S
df
S
d
f
+m
(2+m)
(1+m)
dx
dx
dx
m
m(m 1) d2 f
d S
+
2
2
dx
dxm
k+1
m
d f
d(1+mk) S
m
k
dxk+1
dx(1+mk)
k=0
2 m
df d(1+m) S
d f
d S
+
m
dx dx(1+m)
dx2
dxm
dm S
dxm
dm S
dxm
2(m + 1)x
d
dx
dm S
dxm
+ (n m)(n + m + 1)
dm S
dxm
=0
(I.39)
m/2
1 x2
y
dm S
dxm
dm Pn
dm Qn
A m +B
dx
dxm
(I.40)
where P_{n,m} and Q_{n,m} are the associated Legendre polynomials and associated Legendre functions, respectively, of order n and degree m, defined by2

    P_{n,m}(x) = (-1)^m (1 - x^2)^{m/2} (d^m/dx^m) P_n(x)
    Q_{n,m}(x) = (-1)^m (1 - x^2)^{m/2} (d^m/dx^m) Q_n(x)                              (I.41)

2 In some references, the factor (-1)^m is neglected, but we chose to include it here because MATLAB happens to use the definition given in (I.41).
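MATLAB's built-in legendre function follows this convention, returning P_{n,m}(x) for all m = 0, ..., n at once. A quick check against the explicit case n = 2, m = 1 might read:

    % Sketch: compare legendre() with (I.41) for n = 2, m = 1.
    x  = linspace(-0.9, 0.9, 7);
    P  = legendre(2, x);                   % rows correspond to m = 0, 1, 2
    P21_matlab  = P(2, :);                 % the m = 1 row
    P21_formula = -3*x.*sqrt(1 - x.^2);    % (-1)^1 (1-x^2)^{1/2} dP_2/dx,  P_2 = (3x^2-1)/2
    disp(max(abs(P21_matlab - P21_formula)));   % should be near machine precision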
757
d2 y
dy 2
+x
+ x 2 y = 0
2
dx
dx
(I.42)
Using a series expansion around the regular singular point x = 0, we can identify the
1,0 = 1, &
0,0 = 2 , and &
0,2 = 1. The indicial roots
following coefficients: &
2,0 = 1, &
using (9.2) are ra = and rb = . Applying the Frobenius method summarized in
Theorem 9.2, the only nonzero values of Qn,k are
Qn,n2 (r) = 1
and
1 (r) = 0, &
n (r) = &
n2 (r)/[n(n + 2r)], for n > 1, and n (r) = (2r +
thus &
0 (r) = 1, &
&
1 (r) = 0, functions corresponding to odd subscripts
2n)n (r). Furthermore, because &
will be zero, that is,
&
2n+1 (r) = 0
for n = 0, 1, . . .
=
=
1
&
2n2 =
4n(n + r)
1
1
&
0
4n(n + r)
4(1)(1 + r)
(1)n
4n n!
n1
k=0
(n + r k)
Depending on the value of the order , we have the various cases to consider:
r Case 1: 2 is not an integer. We have a
2k+1 = b2k+1 = 0, k = 0, 1, . . ., and for n =
1, 2, . . .
a2n =
(1)n
n1
4n n! k=0 (n + k)
and
b2n =
(1)n
n1
4n n! k=0 (n k)
n=0
(1)n x2n+
4n n! n1
k=0 (n + k)
and
v(x) =
n=0
(1)n x2n
4n n! n1
k=0 (n k)
These results can further be put in terms of Gamma functions (cf. (9.9)), and after extracting constants out of the summations, we obtain

    u(x) = 2^nu Gamma(nu + 1) J_nu(x)        and        v(x) = 2^{-nu} Gamma(1 - nu) J_{-nu}(x)

where J_nu(x) is known as the Bessel function of the first kind, defined by

    J_nu(x) = sum_{n=0}^{infinity} [ (-1)^n / ( n! Gamma(n + nu + 1) ) ] (x/2)^{2n + nu}      (I.43)

where the order nu in the definition (I.43) may or may not be an integer. Thus, in terms of Bessel functions, the complete solution for nu not an integer is given by

    y = A J_nu(x) + B J_{-nu}(x)                                                       (I.44)
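As a quick numerical check of (I.43), the truncated series can be compared against MATLAB's besselj (the order nu = 1/3 and the point x = 2.5 below are arbitrary illustrative choices):

    % Sketch: truncated series (I.43) versus the built-in besselj.
    nu = 1/3; x = 2.5; N = 30;
    n  = 0:N;
    Jseries  = sum( ((-1).^n)./(gamma(n+1).*gamma(n+nu+1)) .* (x/2).^(2*n+nu) );
    Jbuiltin = besselj(nu, x);
    fprintf('series: %.12f   besselj: %.12f\n', Jseries, Jbuiltin);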
758
(I.45)
r Case 3: 2 = 0 is an even integer. Let = with an integer. For the first root
ra = , we have a2n = &
2n () and the first solution becomes
u(x) = 2 !J (x)
(I.46)
For the second solution, we will separate v(x) into three parts: v1 , v2 , and v3 ,
where v1 contains the terms with b2n (x), n < , v2 is the term with ln(x) and v3
contains the rest of the terms.
( n 1)!
2n () = n
For v1 , we take n < , for which b2n = &
and obtain
4 n!( 1)!
v1 (x) =
1 2n
x
( n 1)!
4n n!( 1)!
n=0
'
=
(
1
1
x 2n ( n 1)!
(I.47)
2 ( 1)!
2
n!
n=0
22 ()
Q2,22 ()&
2
=
0 ()
4 !( 1)!
(
1
J (x) ln(x)
v2 (x) = u(x) ln(x) = 2
2 ( 1)!
(I.48)
=
=
(1)n
b2
4n n!(n + )!
('
'
( 8
9
n1
(1)n
1
1
1
+
+
4 ( 1)! 4n n!(n + )!
nk nk+
k=0
n
x 2n+ (1)n
1
1
1
v3 (x) =
+
(I.49)
2 ( 1)!
2
n!(n + )!
k k+
'
n=1
k=1
759
Adding up (I.47), (I.48) and (I.49), we have the second solution v(x) as
v(x)
=
=
+
(I.50)
2
n!(n + )!
k k+
n=1
k=1
(I.51)
where the function Y (x) is known as Bessel function of the second kind (also
known as the Neumann function), defined as
x
1 x 2n ( n 1)!
2
J (x) ln
+
2
n!
n=0
8 n+ 9
1
1 x 2n+ (1)n
2
n!(n + )!
k
1
Y (x)
n=0
(I.52)
k=1
(I.53)
(I.54)
where
x
2 x 2n (1)n
2
Y0 (x) = J 0 (x) ln
+
2
(n!)2
n=1
n
1
k
(I.55)
k=1
An alternative method for computing the Bessel functions is to define the Bessel
function of the second kind as
Y (x) =
(I.56)
(I.57)
This means we can unify the solutions to both cases of being an integer or not, as
y(x) = AJ (x) + BY (x)
(I.58)
760
d2 y
dy 2 2
+x
+ x 2 y = 0
dx2
dx
(I.59)
1
dw
dy
dy
=
dx
dw
2
d2 y
2d y
=
dx2
dw2
d2 y
dy 2
+w
+ w 2 y = 0
dw2
dw
(I.60)
x2
x2
(I.61)
d2 y
dy 2 2
+x
+ (i) x 2 y = 0
2
dx
dx
(I.62)
where I (x) is the modified Bessel equation of the first kind of order defined by
i
J (ix)
I (x) = exp
(I.63)
2
and K (x) is the modified Bessel equation of the second kind of order defined by
( + 1)i
K (x) = exp
(I.64)
[J (ix) + iY (ix)]
2
761
an xn
(I.65)
n=0
nan xn1 =
n=1
(n + 1)an+1 xn
n=0
(n + 1)(n)an+1 x
n1
=
(n + 2)(n + 1)an+2 xn
n=1
n=0
..
.
dN y
dxN
(n + N)!
n!
n=0
an+N xn
(I.66)
After substitution of (9.18) and (I.66) into (9.17), while using (9.5),
n
N
(k + j )!
n
ak+j j,nk
x
=0
k!
k=0 j =0
n=0
for n = 0, 1, . . . ,
(I.67)
k=0 j =0
(k + j )!
= j,nk
k!
j,mj =
0,nm
if j = 0
j,nm+j
j 1
i=0
(m i)
if j > 0
j =1
0,2
..
.
1,1
..
1,2
..
.
..
N,0
..
..
N,1
N,2
..
.
0,n
j =N
a0
a1
1,0
1,n
N,n
..
.
an+N
where the group of terms to the left of am are summed up as the coefficient of
am . Note that j, = 0 if < 0. In addition, we can define j, = 0 for < 0, and
762
N
j,mj
if m < n + N
0,m +
j =1
coef (am ) =
if m = n + N
N,n
n+N1
N
N,n an+N +
am 0,m +
j,mj = 0
j =1
m=0
n+N1
an+N
=
=
am 0,m +
j,mj
j =1
m=0
n+N1
N
N,n
n,m am
m=0
where
0,nm +
N
j,nm+j
j =1
n,m = (1)
N,0
j 1
(m i)
i=0
N
(n + i)
i=1
and
j, = 0
<0
u ln(x) +
'
(
du
u + x ln(x)
+
bn (n + rb)xn+rb
dx
'
(
du
d2 u
2
+ x ln(x) 2 +
u + 2x
bn (n + rb)(n + rb 1)xn+rb
dx
dx
bn xn+rb
n=0
x2
dv
dx
d2 v
dx2
n=0
n=0
Substituting into
P2 (x)
x2&
we have
d2 v
dv &
+ x&
P1 (x)
+ P0 (x)v = 0
2
dx
dx
du &
ln(x)
+ x&
P1 (x)
+ P0 (x)u
dx
'
(
du
+ &
P2 (x) u + 2x
+&
P1 (x)u
dx
d2 u
P2 (x) 2
x2&
dx
+&
P2 (x)
bn (n + rb)(n + rb 1)xn+rb
n=0
+&
P1 (x)
bn (n + rb)xn+rb
n=0
+&
P0 (x)
bn xn+rb
n=0
n
n+ra
n=0
&
k (ra ) (&
1,nk + (2ra + 2k 1)&
2,nk )
k=0
xn+rb
n
n=0
bk Qn,k (rb)
k=0
xn+rb
n=m
nm
&
k (ra ) (&
1,nmk + (2ra + 2k 1)&
2,nmk )
k=0
xn+rb
n=0
n
bk Qn,k (rb)
k=0
Using the definition of n (r) given in (9.40), we arrive at the working equation,
m1
n
n+rb
x
bk Qn,k (rb)
n=0
rb +m
+x
+
k=0
n=m+1
8
n+rb
nm (ra ) +
m1
bk Qm,k (rb)
k=0
n
k=0
bk Qn,k (rb)
9
=
763
764
Thus for n < m, the formula for bn becomes those for &
n (rb). For n = m, note that
Qm,m (rb) = 0, and we have bm arbitrary, which we can set to zero. Doing so and
making the coefficient of xm+rb be equal to zero,
m1
=
k=0
bk Qm,k (rb)
0 (ra )
For n > m > 0, each coefficient of xn+rb can be set to zero, which yields the recurrence
formula for bn ,
nm (ra ) + n1
k=0 Qn,k (rb)bk
bn =
Qn,n (rb)
Finally, if m = 0, a similar derivation can be followed, except that we can set = 1
as discussed before. The working equation is now given by
xrb (0 (ra ) + bm Qm,m (rb))
9
8
n
n+rb
x
bk Qn,k (rb)
+
nm (ra ) +
n=1
k=0
1,0 /&
2,0 )/2, which means 0 = 0. With
Note that for this case, ra = rb = (1 &
Q0,0 (rb) = 0, b0 can be arbitrary and thus can be set to be zero. The remaining
coefficients then become
n (ra ) + n1
k=0 Qn,k (rb)bk
bn =
Qn,n (rb)
m=0
x 2m+
(1)m
m!(m + + 1) 2
To show (9.63), multiply J (x) by x and then take the derivative with respect
to x,
8
9
d
d
(1)m x2m+2
(x J (x)) =
dx
dx
m!(m + + 1)22m+
m=0
(1)m (2m + 2)x2m+21
m!(m + + 1)22m+
m=0
m=0
x 2m+1
(1)m
m!(m + ) 2
x J 1 (x)
To show (9.64), multiply J (x) by x and then take the derivative with respect
to x,
8
9
d
d
(1)m x2m
x J (x)
=
dx
dx
m!(m + + 1)22m+
m=0
m=1
m=1
(1)m (2m)x2m1
m!(m + + 1)22m+
(1)m x2m1
(m 1)!(m + + 1)22m+1
m=0
x 2m++1
(1)m
m!(m + + 2) 2
x J +1 (x)
d
J (x)
dx
d
J (x)
dx
x J 1 (x)
J 1 (x) J (x)
x
d
J (x)
dx
d
J (x)
dx
x J +1 (x)
J +1 (x) + J (x)
x
1
2 x
1 ( m 1)! x 2m
ln
+ J (x)
m!
2
m=0
8
9
m
1 (1)m x 2m+ 1
m!(m + )! 2
k
m=1
k=1
8m+ 9
1 (1)m x 2m+ 1
m!(m + )! 2
k
m=0
k=1
765
766
To show (9.67), multiply Y (x) by x and then take the derivative with respect
to x, while incorporating (9.63),
d
(x Y (x))
dx
2 d x
ln
+ x J (x)
dx
2
1
1 d ( m 1)!x2m
dx
m!22m
m=0
8 m 9
1
1 d (1)m x2m+2
dx
m!(m + )!22m+
k
m=1
k=1
8m+ 9
1
1 d (1)m x2m+2
dx
m!(m + )!22m+
k
m=0
k=1
2 1
2 x
x J (x) +
ln
+ x J 1 (x)
1
1 ( m 1)!x2m1
(m 1)!22m1
m=1
8 m 9
1
1
(1)m x2m+21
2m+1
m!(m + 1)!2
k
m=1
k=1
1
(1)m x2m+21
m!(m + 1)!22m+1
m=0
8m+1 9
1
k
k=1
2 1 (1)m x 2m+
x
m!(m + )! 2
m=0
2 x
ln
+ x J 1 (x)
2
1 ( m)! x 2m+1
x
(m)!
2
m=0
8 m 9
x 2m+1
1
(1)m
1
x
m!(m + 1)! 2
k
m=1
x 2m+1
(1)m
1
x
m!(m + 1)! 2
m=0
x Y1 (x)
k=1
8m+1 9
1
k
k=1
To show (9.68), multiply Y (x) by x and then take the derivative with respect
to x, while incorporating (9.64),
d
x Y (x)
dx
2 d x
ln
+ x J (x)
dx
2
1
1 d ( m 1)!x2m2
dx
m!22m
m=0
8 m 9
1
1 d
(1)m x2m
dx
m!(m + )!22m+
k
m=1
k=1
8
9
m+
1
1 d
(1)m x2m
dx
m!(m + )!22m+
k
m=0
k=1
2 1
2 x
x
J (x)
ln
+ x J +1 (x)
2
+
1
1 ( m)!x2m21
m!22m1
m=0
8 m 9
1
1
(1)m x2m1
2m+1
(m 1)!(m + )!2
k
m=1
k=1
8m+ 9
1
1
(1)m x2m1
2m+1
(m 1)!(m + )!2
k
m=1
k=1
2 x
ln
+ x J +1 (x)
1
( m)! x 2m1
+ x
(m)!
2
m=0
8 m 9
x 2m++1
1
(1)m
1
+ x
m!(m + + 1)! 2
k
m=1
+
=
1
x
m=0
k=1
8
9
x 2m++1 m++1
1
(1)m
m!(m + + 1)! 2
k
k=1
x Y1 (x)
d
Y (x)
dx
d
Y (x)
dx
x Y1 (x)
Y1 (x) Y (x)
x
767
768
d
Y (x)
dx
d
Y (x)
dx
x Y+1 (x)
=
=
=
1)
x exp
i J 1 (ix)
2
x I1 (x)
To show (9.72), multiply I (x) by x and then take the derivative with respect
to x, while using (9.66),
d
x I (x)
dx
=
=
=
d
I (x)
dx
d
I (x)
dx
x I1 (x)
I1 (x) I (x)
x
d
I (x)
dx
d
I (x)
dx
x I+1 (x)
( + 1)
i (J (ix) + iY (ix))
2
To show (9.75), multiply K (x) by x and then take the derivative with respect
to x, while using (9.65) and (9.69),
d
x K (x)
dx
=
=
( + 1) 1
exp
i x (J (ix) + iY (ix))
2
To show (9.72), multiply I (x) by x and then take the derivative with respect
to x, while using (9.66) and (9.70),
d
x K (x)
dx
=
=
( + 1)
i x1 (J (ix) + iY (ix))
2
exp
x K+1 (x)
d
K (x)
dx
d
K (x)
dx
x K1 (x)
K1 (x) I (x)
x
769
770
2n
J n (x) J n+1 (x)
x
2n
J n1 (x) =
J n (x) J n+1 (x)
x
Adding and subtracting these equations,
J n1 (x)
J n1 (x)
J n1 (x)
2n
(J n (x) J n (x))
x
(J n+1 (x) + J n1 (x)) J n+1 (x)
(I.68)
2n
(J n (x) + J n (x))
x
(J n+1 (x) J n1 (x)) + J n+1 (x)
(I.69)
If n is even, while using the inductive hypothesis, that is, supposing that J n (x) =
J n (x) and J n1 (x) = J n+1 (x), we can then use (I.68) and see that
J (n+1) (x) = J n+1 (x)
If n is odd, while using the inductive hypothesis, that is, supposing that J n (x) =
J n (x) and J n1 (x) = J n+1 (x), we can then use (I.69) and see that
J (n+1) (x) = J n+1 (x)
To complete the proof, we note that
J 0 (x) = (1)0 J 0 (x)
and with the recurrence formula,
J 1 (x) = J 1 (x)
We can then continue the induction process to show that the identity is satisfied
for n = 2, 3, . . . and conclude that
J n (x) = (1)n J n (x)
Similar approaches can be used to show the identities for Yn (x), In (x) and
Kn (x).
APPENDIX J
Consider the first-order quasilinear partial differential equation

    du/dt + b(x, t, u) du/dx = c(x, t, u)                                              (J.1)

subject to the Cauchy initial condition

    u(x, 0) = u_0(x)                                                                   (J.2)

The method of characteristics replaces (J.1) by the system of ordinary differential equations

    dx/ds = b(x, t, u),        du/ds = c(x, t, u)                                      (J.3)

along dt/ds = 1; that is, with t = s,

    dx/ds = b(x, s, u),        du/ds = c(x, s, u)                                      (J.4)

which can be solved either analytically or numerically for fixed values of a, where a is the parameter along the Cauchy condition. Because of the coupling of the equations in (J.4), the solution for x and u is a curve C(x, u) that is parameterized by a and s. Unfortunately, these curves can contain folds; that is, several u values may correspond to a point (x, t).

As an example, consider the inviscid Burger equation

    du/dt + u du/dx = 0                                                                (J.5)

with the Cauchy initial condition (J.2). Then the solution of (J.4) with b(x, s, u) = u, c(x, s, u) = 0, u(a, s = 0) = u_0(a), and x(a, s = 0) = a, is given by

    u(a, s) = u_0(a)        and        x(a, s) = u_0(a) s + a

Take, for instance, the initial profile u_0(x) defined in (J.6), a smooth pulse centered near x = 10 and expressed in terms of q(x) = ((x - 10)/10)^2. We can plot u(a, s) versus x(a, s) at different fixed values of s, with -80 <= a <= 100, as shown in Figure J.1.
From the plots in Figure J.1, we see that as s increases, the initial shape moves
to the right and slants more and more to the right. At s = 29.1, portions of the curve
near x = 41.0 will have a vertical slope, and a fold is starting to form. When s = 80,
three values of u correspond to values in the neighborhood of x = 78. At s = 120,
portions of the curve near x = 54.8 will again have a vertical slope. Then at s = 300,
we see that around x = 165 and x = 235, three values of u correspond to each of these
x values. Finally, we see that at s = 600, there are five values of u that correspond to
x = 370.
A shock first forms (the solution "breaks") at the time s_break when neighboring characteristics first cross, that is, when

    dx/da = 0        at s = s_break                                                    (J.7)

Suppose the shock at the break time belongs to a characteristic starting from a value of a in a range [a_left, a_right]. For instance, one could plot the characteristics based on a uniform distribution of a and then determine adjacent values of a whose characteristics intersect, as shown in Figure J.2. The values of a_left and a_right can then be chosen to cover this pair of adjacent values of a. The break time s_break and the critical point a_critical can then be determined by solving the following minimization problem:

    min s        such that        dx/da = 0,   a in [a_left, a_right]                 (J.8)

The value of x at s_break along the characteristic corresponding to a_critical will be the break position, denoted by x_break,

    x_break = x( a_critical, s_break )                                                 (J.9)
Figure J.1. Plots of u(a, s) versus x(a, s) at the fixed values s = 0, 29.1, 80, 120, 300, and 600, showing the gradual formation of folds.
Figure J.2. Determination of a_left and a_right from adjacent characteristics, starting at t (= s) = 0, that intersect.
Figure J.3. The characteristics corresponding to uniformly distributed values of a. Also included are two characteristics along a_critical. The circles are the break points (x_break, s_break).
In particular, the characteristics in the (x, t) plane for the inviscid Burger equation (J.5) are given by the straight lines

    t = (x - a) / u_0(a)        if u_0(a) != 0                                         (J.10)

(If u_0(a) = 0, the characteristic is a vertical line at a.) For the initial data u_0(x) of (J.6), a set of characteristics corresponding to uniformly distributed values of a is shown in Figure J.3. From this figure, we could set [a_left, a_right] = [0, 50] to determine the break time of the first shock point, and [a_left, a_right] = [-50, 0] to determine the break time of the other shock point. Solving the minimization problem (J.8) for each of these intervals yields the following results:

    s_break,1 = 29.1 ;    a_critical,1 = 19.84  ;    x_break,1 = 41.0
    s_break,2 = 120  ;    a_critical,2 = -15.25 ;    x_break,2 = 54.8

In Figure J.3, this information is indicated by two darker lines starting at (t, x) = (0, a_critical) and ending at the points (t, x) = (s_break, x_break). These break times and break positions are also shown in Figure J.1, for s = 29.1 and s = 120, to be the correct values where portions of the curves are starting to fold.
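The minimization (J.8) is easy to carry out numerically. The sketch below uses a smooth stand-in for the profile (J.6), since for x = u_0(a) s + a the first crossing of characteristics occurs where u_0'(a) is most negative; the profile used here is an assumption, so the numbers it produces will differ from those quoted above:

    % Sketch: break time for u_t + u u_x = 0 with characteristics x = u0(a)*s + a.
    u0  = @(a) 0.5 + 0.5*exp(-((a - 10)/10).^2);          % assumed stand-in for (J.6)
    du0 = @(a) 0.5*exp(-((a - 10)/10).^2).*(-2*(a - 10)/100);
    aGrid       = linspace(0, 50, 2001);
    [dmin, idx] = min(du0(aGrid));        % most negative slope of u0
    sBreak      = -1/dmin;                % dx/da = 1 + s*u0'(a) = 0
    aCritical   = aGrid(idx);
    xBreak      = u0(aCritical)*sBreak + aCritical;
    fprintf('s_break = %.2f, a_critical = %.2f, x_break = %.2f\n', sBreak, aCritical, xBreak);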
A bounded function u-bar(x, t) is a weak solution of a partial differential equation, such as (J.1),

    du/dt + b(x, t, u) du/dx = c(x, t, u)

if

    integral_{t_low}^{t_high} integral_{x_left}^{x_right} phi(x, t) [ du-bar/dt + b(x, t, u-bar) du-bar/dx - c(x, t, u-bar) ] dx dt = 0      (J.11)

for an arbitrary continuous function phi(x, t). By integrating (J.11) by parts, differentiation of a discontinuous u-bar(x, t) can be avoided by transferring the derivative operations instead onto the continuous functions phi(x, t).

Figure J.4. The location of x_shock based on the equal-area rule: Area1 = Area2.

Another important point is that the function phi(x, t) is kept arbitrary; that is, there is no need to specify this function or the domain given by x_right, x_left, t_low, or t_high. This will keep the number of discontinuities to a minimum. For instance, if a continuous u-bar can satisfy (J.11) for arbitrary phi, then no discontinuity needs to be introduced, and u-bar = u, a classic solution.
For the special case in which c(x, t, u) = c(x, t) is continuous, let the desired discontinuity that satisfies (J.11) occur at (t = s, x_shock(s)). The value of x_shock occurs where two characteristics, one initiated at a = a^(-) and another initiated at a = a^(+), intersect to yield x_shock. The condition (J.11) implies that x_shock is located at the position where the area of the chopped region to the right of x_shock is equal to the area of the chopped region to the left of x_shock, as shown in Figure J.4. This equal-area rule can be written as

    integral_{a^(-)}^{a^(+)} u(a, s) (dx/da) da = 0                                    (J.12)

such that x( a^(-), s ) = x( a^(+), s ) = x_shock(s).
Generally, the location of the shock path, especially one that is based on the
equal area rule, will require numerical solutions. We outline a scheme to determine
the shock path in a region where the folds yield triple values u for some x (i.e., the
case shown in Figure J.4). This scheme depends on the following operations that
require nonlinear solvers:
1. Detection of Fold Edges. Let acritical be the value found at the break time of the
shock, then
aedge,1
aedge,2
= EDGE (acritical )
where
x
x
=
0
=
a aedge,1
a aedge,2
and
(J.13)
776
a 1
a 2 = FINDa (xg , s)
a 3
(J.14)
where
a1 > a2 > a3
and
a3 (y)
a1 (y)
'
(
x
u(s, a)
da
a
(J.15)
where a1 (y) and a3 (y) are found using the operation FINDa(y).
Shock-Fitting Scheme:
r Given: s
break ,
s and acritical
r For s = s
break +
s, sbreak + 2
s, . . .
1. Calculate xg as the average of the edge values,
xg =
1
x s, aedge,1 + x s, aedge,2
2
777
Figure J.5. Two shock paths for the Burger equation under the conditions given by (J.6), computed using the shock-fitting scheme based on the equal-area principle.
J.1.4 Jump Conditions

We further limit our discussion to the case where b(x, t, u) = b(u) in (J.1). Under this condition, the differential equation (J.1) results from (or can be recast as) a conservation equation given by

    (d/dt) integral_{alpha}^{beta} u(x, t) dx = flux( u(alpha, t) ) - flux( u(beta, t) ) + integral_{alpha}^{beta} c(x, t, u) dx      (J.16)

where flux(u) = integral of b(u) du and c(x, t, u) is the volumetric rate of generation for u. Now suppose at time t, alpha < beta is chosen so that the shock discontinuity is at x = x_s, located between alpha and beta. Let x_s^- and x_s^+ be locations slightly to the left and right of x_s, respectively. Then

    (d/dt) [ integral_{alpha}^{x_s^-} u(x, t) dx + integral_{x_s^+}^{beta} u(x, t) dx ]
        = flux( u(alpha, t) ) - flux( u(beta, t) ) + integral_{alpha}^{beta} c(x, t, u) dx      (J.17)

Applying the Leibnitz rule (5.52) to (J.17), we obtain

    integral_{alpha}^{x_s^-} (du/dt) dx + u( x_s^-, t ) (dx_s^-/dt)
        + integral_{x_s^+}^{beta} (du/dt) dx - u( x_s^+, t ) (dx_s^+/dt)
        = flux( u(alpha, t) ) - flux( u(beta, t) ) + integral_{alpha}^{beta} c(x, t, u) dx

Next, letting alpha -> x_s^- and beta -> x_s^+ (so that the remaining integrals vanish) and then taking the limits x_s^- -> x_s and x_s^+ -> x_s, we obtain

    u( x_s^-, t ) (dx_s^-/dt) - u( x_s^+, t ) (dx_s^+/dt) = flux( u(x_s^-, t) ) - flux( u(x_s^+, t) )

where the integral of c between x_s^- and x_s^+ vanishes if we assume that c(x, t, u) is piecewise continuous.1 As the previous section showed, the shock propagation is continuous, which implies that dx_s^+/dt = dx_s^-/dt = dx_s/dt. Using the jump notation, [[g]] = g(x_s^-) - g(x_s^+), we arrive at the Rankine-Hugoniot jump condition

    dx_s/dt = [[ flux(u) ]] / [[ u ]]                                                  (J.18)

1 A more complete assumption for c is that it does not contain any Dirac delta distribution (i.e., delta impulses).
EXAMPLE J.1. Consider the inviscid Burger equation

    du/dt + u du/dx = 0

subject to the discontinuous condition

    u(x, 0) = 1   if x <= a
    u(x, 0) = 0   if x > a

For this problem, b(u) = u, and the flux is

    flux(u) = integral of u du = u^2 / 2

Because the initial condition is immediately discontinuous, the break time in this case is at t = 0. Using the Rankine-Hugoniot jump condition (J.18),

    dx_s/dt = [[ u^2/2 ]] / [[ u ]] = ( u^+ + u^- ) / 2

Because u = constant along the characteristics, u^- = 1 and u^+ = 0, yielding

    dx_s/dt = 1/2        so that        x_s = t/2 + a

Thus the solution is given by

    u(x, t) = 1   if x <= t/2 + a
    u(x, t) = 0   if x > t/2 + a
The jump conditions given in (J.18) will generally not guarantee a unique solution. Instead, additional conditions known as admissibility conditions, more popularly known as Lax entropy conditions, are needed to achieve physical significance and uniqueness. We now state without proof the following condition, known as the Lax entropy condition, applicable to the case where flux(u) is convex, that is, d^2 flux/du^2 > 0:

    ( d flux/du ) at u = u^-    >    dx_s/dt    >    ( d flux/du ) at u = u^+          (J.19)

Thus these conditions put the necessary bounds on the shock speed, at least for the case of convex fluxes.3 This condition simply implies that if the characteristics appear to be intersecting in the direction of decreasing t (time reversal), then this solution is not admissible.

EXAMPLE J.2. Consider the inviscid Burger equation subject to the discontinuous initial condition u(x, 0) = A for x <= 0 and u(x, 0) = B for x > 0, where A < B. Let m be any value with A < m < B; then a solution that contains two shock paths, given by

    u(x, t) = A   for x <= (A + m) t / 2
    u(x, t) = m   for (A + m) t / 2 < x <= (m + B) t / 2                               (J.20)
    u(x, t) = B   for x > (m + B) t / 2

will satisfy the Rankine-Hugoniot jump conditions at both regions of discontinuity. This means there is an infinite number of possible solutions that will satisfy the differential equation and the jump discontinuity conditions.

However, using the entropy conditions given in (J.19), we obtain

    A > dx_s/dt > B

which is not true (because it was given in the initial condition that A < B). This means that the discontinuous solutions in (J.20) are inadmissible based on the entropy conditions. We see in the next section that the rarefaction solution turns out to be the required solution.
J.1.5 Rarefaction
When a first-order quasilinear PDE is coupled with a discontinuous initial condition,
we call this problem a Riemann problem. We already met these types of problems in
previous sections. In Example J.1, we saw that the Riemann problem there resulted in
a shock propagated solution for the inviscid Burger equation, where u(x a, 0) = 1
and u(x > a, 0) = 0. However, if the conditions were switched, that is, with u(x
a, 0) = 0 and u(x > a, 0) = 1, the method of characteristics will leave a domain in the
(x, t) plane without specific characteristic curves, as shown in Figure J.6.4 In contrast
3
4
A set of more general conditions are given by Oleinik entropy conditions, which are derived using
the approach known as the vanishing viscosity methods.
If the initial condition were not discontinuous, this would have been filled in without any problem,
especially because the characteristics would not even intersect and no shocks would occur.
779
780
u(x,t)=?
t
Figure J.6. Rarefaction in a Riemann problem.
0
to the shock-fitting problem, this case is called the rarefaction, a term that originates
from the phenomenon involving wave expansion of gases.
We limit our discussion to the case of (J.1), where b(x, t, u) = b(u) and c(x, t, u) =
0 with the additional assumption that the inverse function b1 () can be obtained.
Consider
u
u
+ b(u)
=0
t
x
(J.21)
subject to
5
u(x, 0) =
uleft
right
if x a
if x > a
(J.22)
where b uleft < b uright . Let the initial data be parameterized by , that is, at s = 0,
t(s = 0) = 0, x(s = 0) = and u(, 0) = uleft or u(, 0) = uright when a or > a,
respectively. Then the characteristics are given by
left
t+
if a
b u
x = b (u(, 0)) t + x =
right
b u
t + if > a
Rarefaction will start at x = a when t = 0. The characteristics at this point can be
rearranged to be
1 x a
u(a, 0) = lim b
(x,t)(a,0)
t
We could pose that the solution in the rarefaction domain to be of the form
xa
u(x, t) = b1
t
and see that this will satisfy the differential equation, that is,
u
u
1
xa xa
d
xa
+ b(u)
=0
+
b1
=0
t
x
t
t
t
d ((x a)/t)
t
The solution of (J.21) subject to (J.22) is then given by
left
u
if x b uleft t + a
1 x a
b
if b uleft t + a < x b uright t + a
u(x, t) =
right
u
if x > b uright t + a
It is left as an exercise (E10.20) to show that (J.23) is piecewise continuous.
(J.23)
781
EXAMPLE J.3. For the inviscid Burgers equation and initial conditions given by

    du/dt + u du/dx = 0     subject to     u(x, 0) = {  0.5    if x <= 2
                                                        1.5    if x > 2

the rarefaction solution (J.23) becomes

    u(x, t) = {  0.5           if x <= 0.5t + 2
                 (x - 2)/t     if 0.5t + 2 < x <= 1.5t + 2                       (J.24)
                 1.5           if x > 1.5t + 2
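As a quick numerical illustration (added here, not part of the text), the piecewise
solution (J.24) of Example J.3 can be evaluated and plotted directly in MATLAB; the
grid and time instants below are arbitrary choices.

    % Sketch: evaluate the rarefaction-fan solution (J.24) of Example J.3
    % u_t + u u_x = 0,  u(x,0) = 0.5 for x <= 2, 1.5 for x > 2
    x = linspace(0, 6, 601);             % spatial grid
    for t = [0.5 1 2]                    % a few time instants
        u = 0.5*ones(size(x));           % left state
        fan   = (x > 0.5*t + 2) & (x <= 1.5*t + 2);
        right = (x > 1.5*t + 2);
        u(fan)   = (x(fan) - 2)/t;       % rarefaction fan: u = (x-2)/t
        u(right) = 1.5;                  % right state
        plot(x, u); hold on;
    end
    xlabel('x'); ylabel('u(x,t)'); title('Rarefaction solution (J.24)');

The plotted profiles show the expanding fan joining the two constant states, consistent
with the entropy-admissible solution discussed above.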
We now turn to the classification of second-order equations in n independent
variables, whose principal part has the general form

    F_prin(x_1, . . . , x_n) = sum_{i=1}^{n} sum_{j=1}^{n} A_{i,j} d^2 u/(dx_i dx_j)
Just as we did in the previous section, we look for a new set of independent
variables {xi_1, . . . , xi_n}, such that under the new coordinates,

    F_prin(xi_1, . . . , xi_n) = sum_{i=1}^{n} alpha_i u^(xi)_{i,i}    where alpha_i = 0, -1, or +1     (J.25)

where

    u_i = du/dxi_i
    u^(xi)_{i,j}   = d^2 u/(dxi_i dxi_j),         1 <= i, j <= n
    u^(xi)_{i,j,k} = d^3 u/(dxi_i dxi_j dxi_k),   1 <= i, j, k <= n
    ...                                                                         (J.26)
Definition J.1. Under the new coordinates, the equation is said to be in canonical form,

    sum_{i=1}^{n} alpha_i u^(xi)_{i,i} = f(xi_1, . . . , xi_n, u, u_1, . . . , u_n)      (J.27)

and it is classified as elliptic if all alpha_i are nonzero and of the same sign;
hyperbolic if all alpha_i are nonzero and exactly one differs in sign from the rest;
ultrahyperbolic if all alpha_i are nonzero with more than one of each sign; and
parabolic if at least one alpha_i = 0.
In general, obtaining a set of new coordinates

    xi_i = xi_i(x_1, . . . , x_n),    i = 1, 2, . . . , n                        (J.28)

that would yield the canonical forms (J.27) may not always be possible. However,
when the coefficients in the principal part are constants, the equation can be
transformed into the canonical forms given in Definition J.1.
THEOREM J.1. Consider the second-order partial differential equation

    sum_{i=1}^{n} sum_{j=1}^{n} A_{i,j} d^2 u/(dx_i dx_j) = f(x, u, u_1, . . . , u_n)     (J.29)

where A_{i,j} = A_{j,i} are constants. Let (xi_1, xi_2, . . . , xi_n) be a set of new
independent variables defined by

    (xi_1, xi_2, . . . , xi_n)^T = D U (x_1, x_2, . . . , x_n)^T                 (J.30)

where U is an orthogonal matrix such that U A U^T is diagonal, D = diag(d_1, . . . , d_n) with

    d_i = {  1/sqrt(|lambda_i|)    if lambda_i != 0
             0                     if lambda_i = 0

and lambda_i is the ith eigenvalue of A. Then under the change of coordinates given by
(J.30), the partial differential equation (J.29) becomes

    sum_{i=1}^{n} alpha_i u^(xi)_{i,i} = f(xi_1, . . . , xi_n, u, u^(xi)_1, . . . , u^(xi)_n),    alpha_i = 0, 1, or -1     (J.31)
PROOF. Under the change of coordinates (J.30), the partial differential operators
transform as

    (d/dx_1, d/dx_2, . . . , d/dx_n)^T = U^T D (d/dxi_1, d/dxi_2, . . . , d/dxi_n)^T

Using these operators, the partial differential equation (J.29) can be written as

    (d/dx_1, . . . , d/dx_n) A (d/dx_1, . . . , d/dx_n)^T u = f(x, u, u_1, . . . , u_n)

or

    (d/dxi_1, . . . , d/dxi_n) D U A U^T D (d/dxi_1, . . . , d/dxi_n)^T u = f(x, u, u_1, . . . , u_n)

Because U A U^T = diag(lambda_1, . . . , lambda_n) and D diag(lambda_i) D = diag(sign(lambda_i)),
this reduces to

    sum_{i=1}^{n} sign(lambda_i) u^(xi)_{i,i} = f(xi_1, . . . , xi_n, u, u^(xi)_1, . . . , u^(xi)_n)

where

    sign(lambda_i) = {  +1    if lambda_i > 0
                         0    if lambda_i = 0
                        -1    if lambda_i < 0
EXAMPLE J.4. Consider the second-order differential equation with three independent
variables x, y, and z,

    3 d^2u/dx^2 + 5 d^2u/(dx dy) - 2 d^2u/(dx dz) + d^2u/dy^2 + 2 d^2u/(dy dz) + 3 d^2u/dz^2 = ku     (J.32)

We now look for new coordinates xi_1, xi_2, and xi_3 that would transform (J.32) into
the canonical form given in (J.27) for purposes of classification.
Extracting the coefficients into the symmetric matrix A,

    A = [  3     2.5   -1
           2.5   1      1
          -1     1      3  ]

Using Schur triangularization, we can obtain the orthogonal matrix U,

    U = [  0.5436   -0.7770    0.3176
          -0.0153    0.3692    0.9292
           0.8392    0.5099   -0.1888  ]

and the diagonal normalizing matrix D,

    D = diag(0.9294, 0.5412, 0.4591)

The new coordinates are obtained as follows:

    [ xi_1 ]           [ x ]     [  0.5052 x - 0.7221 y + 0.2952 z ]
    [ xi_2 ]  =  D U   [ y ]  =  [ -0.0083 x + 0.1998 y + 0.5029 z ]
    [ xi_3 ]           [ z ]     [  0.3853 x + 0.2341 y - 0.0867 z ]

As a check, we can apply the change of coordinates while noting that the second-order
derivatives of xi_i, e.g., d^2 xi_i/(dx dy), are zero. Thus

    d^2u/(dp dq) = sum_{i=1}^{3} sum_{j=1}^{3} (dxi_i/dp)(dxi_j/dq) d^2u/(dxi_i dxi_j),    for p, q = x, y, z

and substituting into (J.32) gives

    sum_{i=1}^{3} sum_{j=1}^{3} beta_{ij} d^2u/(dxi_i dxi_j) = ku                (J.33)

where

    beta_{ij} = a_11 (dxi_i/dx)(dxi_j/dx) + a_12 [ (dxi_i/dx)(dxi_j/dy) + (dxi_i/dy)(dxi_j/dx) ] + . . . + a_33 (dxi_i/dz)(dxi_j/dz)

For instance,

    beta_12 = (3)(0.5052)(-0.0083) + (2.5)(0.5052)(0.1998) + . . . + (3)(0.2952)(0.5029)

After performing the computations, we find beta_11 = -1, beta_22 = beta_33 = 1, and
beta_{ij} = 0 for i != j, i.e.,

    - d^2u/dxi_1^2 + d^2u/dxi_2^2 + d^2u/dxi_3^2 = ku                            (J.34)

so the equation is hyperbolic.
Prior to determining whether a second-order equation in two variables could be
transformed to the hyperbolic, elliptic, or parabolic canonical form, the roots of the
characteristic form were the critical quantity: for hyperbolic equations the roots
were real, for parabolic equations the roots were equal, and for elliptic equations
the roots were complex. By using the character of the roots, we can then extend the
concepts of hyperbolic, parabolic, and elliptic to higher orders.
Definition J.2. For an mth-order semilinear partial differential equation in two
independent variables x and y,

    sum_{i=0}^{m} A_i(x, y) d^m u/(dx^{m-i} dy^{i}) = f(x, y, u, u^[1], . . . , u^[m-1])     (J.35)

the roots r_1(x, y), . . . , r_m(x, y) of the characteristic polynomial

    sum_{i=0}^{m} A_i(x, y) r^i = 0                                              (J.37)

determine the classification: the equation is hyperbolic if all m roots are real,
parabolic if the roots are real and repeated, and elliptic if the roots are complex.

Thus, for the hyperbolic case, we can determine m characteristics xi^(i)(x, y) by
solving the m characteristic equations given by

    xi^(i)_x - r_i(x, y) xi^(i)_y = 0,    i = 1, 2, . . . , m                    (J.38)
APPENDIX K
Consider the one-dimensional wave equation subject to the initial conditions

    u(x, 0) = f(x)     and     du/dt (x, 0) = g(x)

Applying the initial conditions to the general solution for u given in (11.17),
u = phi(x + ct) + psi(x - ct),

    u(x, 0)       = phi(x) + psi(x) = f(x)
    du/dt (x, 0)  = c dphi/dx - c dpsi/dx = g(x)                                 (K.1)

Solving for the derivatives of phi and psi,

    dphi/dx = (1/2) df/dx + (1/(2c)) g(x),        dpsi/dx = (1/2) df/dx - (1/(2c)) g(x)     (K.2)
Figure K.1. A surface plot of the trajectories of ua (left) and a set of four snapshots of ua at
different time instants (right) for d'Alembert's solution based on zero initial velocity.
and, upon integration,

    phi(x) = (1/2) f(x) + (1/(2c)) integral^x g(xi) dxi + kappa_1
    psi(x) = (1/2) f(x) - (1/(2c)) integral^x g(xi) dxi + kappa_2

Combining these results, the solution is

    u(x, t) = (1/2) [ f(x - ct) + f(x + ct) ] + (1/(2c)) integral_{x-ct}^{x+ct} g(xi) dxi     (K.3)

Equation (K.3) is known as d'Alembert's solution of the initial value problem.
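A minimal MATLAB sketch (added here, not from the text) of evaluating d'Alembert's
solution (K.3) on a grid; the initial data f and g and the wave speed c below are
illustrative choices, not those of the example that follows.

    % Sketch: evaluate d'Alembert's solution (K.3) numerically
    c = 1;                                      % wave speed (assumed value)
    f = @(x) exp(-x.^2);                        % illustrative initial displacement
    g = @(x) zeros(size(x));                    % illustrative initial velocity
    u = @(x,t) 0.5*(f(x - c*t) + f(x + c*t)) + ...
               arrayfun(@(xx,tt) integral(g, xx - c*tt, xx + c*tt)/(2*c), x, t);
    % Usage: snapshot of the solution at t = 3
    x = linspace(-10, 10, 401);
    plot(x, u(x, 3*ones(size(x))));
    xlabel('x'); ylabel('u(x,3)');

The quadrature term vanishes here because g = 0, so the profile is simply the two
half-height copies of f traveling in opposite directions, as described below.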
EXAMPLE K.1. Let f(x) = sum_{i=1}^{4} sigma(alpha_i, beta_i, gamma_i, x), where

    sigma(alpha, beta, gamma, x) = alpha [ 1 + tanh(beta x + gamma) ] / 2

and the parameters are

    i       :   1      2      3      4
    alpha_i :   1      1      1      1
    beta_i  :   4      4      4     10
    gamma_i :   1      1     0.5    0.5
Let ua(x, t) = (1/2) [ f(x + ct) + f(x - ct) ] and
ub(x, t) = (1/(2c)) integral_{x-ct}^{x+ct} g(s) ds.
From Figure K.1, we see that the initial distribution given by f(x) is gradually split
into two shapes that are both half the height of the original distribution. Both shapes
move at a constant speed equal to c but travel in opposite directions. For ub, however,
we see from Figure K.2 that the influence of the initial velocities is propagated within
a triangular area determined by speed c. Combining both effects, the solution
u = ua + ub is shown in Figure K.3.
Figure K.2. A surface plot of the trajectories of ub (left) and a set of four snapshots of ub at
different time instants (right) for d'Alembert's solution based on zero initial distribution.
Next, consider the semi-infinite domain problem

    d^2u/dx^2 = (1/c^2) d^2u/dt^2          for x >= 0

    u(x, 0) = f(x),    du/dt (x, 0) = g(x)     for x >= 0
    u(0, t) = h(t),    t >= 0                                                    (K.4)

where, for continuity, h(0) = f(0) and dh/dt(0) = g(0). We can first find a solution,
v(x, t), whose domain is -infinity < x < infinity. The desired solution, u(x, t), will
then be obtained by restricting v(x, t) to the values at 0 <= x < infinity, that is,

    u(x, t) = v(x, t) |_{x >= 0}                                                 (K.5)
Figure K.3. A surface plot of the trajectories (left) and four snapshots of the distribution at
different time instants (right) for u = ua + ub.
The extended problem for v(x, t) is

    d^2v/dx^2 = (1/c^2) d^2v/dt^2

    v(x, 0) = f_e(x),    dv/dt (x, 0) = g_e(x),    v(0, t) = h(t),    t >= 0

where

    f_e(x) = f(x)    and    g_e(x) = g(x)    for x >= 0

Note that f_e and g_e have not been defined completely. The solution for v(x, t) is
the d'Alembert solution, given by v = phi_e(x + ct) + psi_e(x - ct), where

    phi_e(s) = (1/2) f_e(s) + (1/(2c)) integral_0^s g_e(xi) dxi    and
    psi_e(s) = (1/2) f_e(s) - (1/(2c)) integral_0^s g_e(xi) dxi

For x >= 0, we have (x + ct) > 0, and so phi_e(x + ct) is immediately given by

    phi_e(x + ct) = (1/2) f(x + ct) + (1/(2c)) integral_0^{x+ct} g(xi) dxi

However, because (x - ct) < 0 when x < ct, psi_e(x - ct) has to be handled differently
because f_e(s < 0) and g_e(s < 0) have not been defined. At x = 0, we have

    v(0, t) = phi_e(ct) + psi_e(-ct) = h(t)

or

    psi_e(s) = h(-s/c) - phi_e(-s)

so that

    psi_e(x - ct) = h(t - x/c) - phi_e(ct - x)

Combining both cases, the solution is

    u(x, t) = {  (1/2)[ f(ct + x) - f(ct - x) ] + (1/(2c)) integral_{ct-x}^{ct+x} g(xi) dxi + h(t - x/c)     for 0 <= x < ct

                 (1/2)[ f(x - ct) + f(x + ct) ] + (1/(2c)) integral_{x-ct}^{x+ct} g(xi) dxi                  for x >= ct      (K.6)
u (x, 0)
u
(x, 0)
t
=
=
2u
1 2u
2 2
2
x
c t
f (x)
for x 0
g(x)
=
;
x0
u(0, t) = (t)
t0
(K.7)
790
df
(0). Again, we solve the following extended problem
dx
but this time with the Neumann boundary condition,
2v
1 2v
x2
c2 t2
v (x, 0)
v
(x, 0)
t
f e (x)
g e (x)
v
(0, t) = (t)
x
1
1
f (x + ct) +
2
2c
x+ct
g()d
0
However, for e (x ct), we can use the Neumann condition to handle the range
0 x < ct,
v
(0, t) = (t) = e (ct) + e (ct)
x
from which
s
e (s) = e (s)
c
e (s)
e (x ct)
s/c
()d e (s)
0 t(x/c)
0
()d e (ct x)
1
1 x+ct
f (ct + x) f (ct x) +
g()d
2
2c ctx
t(x/c)
c
()d
for 0 x ct
0
u(x, t) =
(K.8)
1 x+ct
1
f (x ct) + f (x + ct) +
g()d
for x ct
2
2c xct
u (x, 0)
u
(x, 0)
t
f (x)
g(x)
1 2u
2u
x2
c2 t2
for 0 x L
=
;
for
u(0, t) = 0
0xL
t0
(K.9)
where, for continuity, we need f (0) = 0 = f (L). For this case, we use the method of
reflection given by the following extension,
2v
1 2v
=0
x2
c2 t2
v (x, 0)
v
(x, 0)
t
f e (x)
g e (x)
x
v(0, t) = 0
f (x)
for 0 x L
f (x)
for L x 0
f e (x) =
|x| > L
f e (x 2L)
The solution can then given by
u(x, t) = v(x, t)
(K.10)
x0
where
v(x, t) =
1
1
( f e (x + ct) + f e (x ct)) +
2
2c
x+ct
g e ()d
xct
L1 L2 (1 u1 + 2 u2 ) = L1 (1 L2 u1 + 2 L2 u2 )
1 L1 L2 u1 = 1 L2 (L1 u1 ) = 0
Next, assume the theorem is true for m = 1. Then with L = LAL = L LA where
1
LA = 1
i=1 Li whose solution is given by uA =
i=1 i ui , and with u = uA + u ,
we have
Lu
LAL uA = L (LAuA) = 0
Then, by induction we have proven that (11.12) is a solution for the case when
Li = Lj for i = j , i, j = 1, . . . , m.
For the case where Li is repeated k times, note that
Lki (g j ui )
k
=0
=
Thus
k!
ui
L g j Lk
i
(k )!! i
ui Lki g j = 0
k
Lki
g j ui = 0
j =1
791
792
'
(
dn
dm
z(x) = p (x) m
n
dx
dx
r(x)n m dx
B=
m (B)
dm /dx(B)
n (B)
dn /dx(B)
u = u, we have
&
x = x and &
F x, t, u, . . . , (m) [,m] . . . , = 0
(K.12)
where
[,m] =
u
m&
t) ( m&
x)
( &
After taking the derivative with respect to and then setting = 1, we obtain a
quasilinear differential equation given by
F
F
F
F
&
x
+ &
t
+ &
u
+ + (m ) [,m] [,m] + = 0
&
x
&
t
&
u
where the other terms include the partial derivatives of F with respect to the partial derivatives &
u/&
t, &
u/x, etc. Method of characteristics yields the following
equations:
d&
x
d&
t
d&
u
d[,m]
dF
=
=
=
= =
&
&
x
t
&
u
0
(m ) [,m]
At this point, we assume that = 1 for brevity.1 Solving the first equations excluding
the last term will yield the following invariants
d&
x
d&
t
=
&
t
&
x
d&
t
d&
u
=
&
t
&
u
&
x
&
t
&
u
&
t
..
.
d&
t
d[,m]
=
&
t
(m ) [,m]
,m =
[,m]
&
t((m))
..
.
plus F , which is another invariant. We also can now use x, t, and u instead of&
x,&
t, and
&
u because the invariants also satisfy the symmetry conditions. The general solution
of the quasilinear equation can now be given by
F = g (, , . . . , ,m , . . .) = 0
For the invariants with = 0, that is, the partial derivatives with respect to x only,
we have
[0,m] =
m u m dm
= t
xm
dm
0,m =
dm
dm
With
[,m] =
[0,m]
j =0
c j j
dm+j
dm+j
,m =
c j j
j =0
dm+j
dm+j
793
794
APPENDIX L

The Fourier series representation of a function g(t) that is periodic with period T is

    g_FS(t) = sum_{k=-infinity}^{infinity} C_k exp( 2 pi i k t / T )             (L.1)

To determine the coefficients, multiply both sides by exp(-2 pi i l t / T) and
integrate over one period,

    integral_0^T g(t) exp(-2 pi i l t / T) dt = sum_k C_k integral_0^T exp( 2 pi i (k - l) t / T ) dt

Because

    e^{2 pi m i} = cos(2 pi m) + i sin(2 pi m) = 1,    with m an integer

we have

    integral_0^T exp( 2 pi i (k - l) t / T ) dt = {  [ T / (2 pi i (k - l)) ] ( e^{2 pi i (k - l)} - 1 ) = 0    if k != l
                                                     T                                                          if k = l

Thus

    C_l = (1/T) integral_0^T g(t) exp( -2 pi i l t / T ) dt                      (L.2)
Approximating the integral in (L.2) by the trapezoidal rule, with N + 1 uniformly
spaced samples g_k = g(k Delta_t), Delta_t = T/N, gives

    C_l ~= (1/N) [ (g_0 + g_N)/2 + sum_{k=1}^{N-1} g_k exp( -2 pi i l k / N ) ]

If we define

    x_k = {  (g_0 + g_N)/2    for k = 1
             g_{k-1}          for k = 2, . . . , N                               (L.3)

then we obtain

    y_l = sum_{k=1}^{N} x_k W[N]^{(k-1)(l-1)},    l = 1, . . . , N               (L.4)

so that C_{l-1} ~= y_l / N.
where W[N] = e^{-(2 pi / N) i}. Equation (L.4) is known as the discrete Fourier transform
of the vector x = (x_1, . . . , x_N)^T. For the determination of y_l, l = 1, . . . , N,
a matrix representation of (L.4) is given by

    y = F[N] x                                                                   (L.5)

where

    F[N] = [ 1    1              1                . . .    1
             1    W[N]           W[N]^2           . . .    W[N]^{N-1}
             1    W[N]^2         W[N]^4           . . .    W[N]^{2(N-1)}
             :    :              :                 .       :
             1    W[N]^{N-1}     W[N]^{2(N-1)}    . . .    W[N]^{(N-1)(N-1)} ]   (L.6)
For the special case of N = 2^m for some integer m >= 1, we can obtain the classic
algorithm known as the Radix-2 Fast Fourier Transform, often simply called the Fast
Fourier Transform (FFT). The FFT algorithm significantly reduces the number of
operations in the evaluation of (L.5).
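To make the cost difference concrete, here is a small MATLAB sketch (added here, not
from the text) that builds the DFT matrix F[N] of (L.6) explicitly and compares the
O(N^2) matrix-vector product with the built-in fft.

    % Sketch: direct evaluation of y = F[N]*x versus the built-in FFT
    N = 8;
    W = exp(-2*pi*1i/N);                  % W[N] = e^{-(2*pi/N)i}
    [k, l] = meshgrid(0:N-1, 0:N-1);      % zero-based exponents (k-1)(l-1)
    F = W.^(l.*k);                        % DFT matrix of (L.6)
    x = randn(N,1);
    y_direct = F*x;                       % O(N^2) operations
    y_fft    = fft(x);                    % O(N log N) operations
    max(abs(y_direct - y_fft))            % should be near machine precision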
First, note from (L.6) that F[1] = 1. For N = 2^m, m >= 1, we can separate the odd
and even indices and use the fact that W[N]^2 = W[N/2] to obtain a rearrangement of
(L.4) as follows:
    y_l = sum_{k=1}^{N/2} [ x_{2k-1} W[N]^{(2k-2)(l-1)} + x_{2k} W[N]^{(2k-1)(l-1)} ]

        = sum_{k=1}^{N/2} x_{2k-1} ( W[N]^2 )^{(k-1)(l-1)} + sum_{k=1}^{N/2} x_{2k} ( W[N]^2 )^{(k-1/2)(l-1)}

        = sum_{k=1}^{N/2} x_{2k-1} W[N/2]^{(k-1)(l-1)} + W[N]^{l-1} sum_{k=1}^{N/2} x_{2k} W[N/2]^{(k-1)(l-1)}     (L.7)
Equation (L.7) is known as the Danielson-Lanczos equation. Let y = ( y_A^T  y_B^T )^T,
where y_A = (y_1, . . . , y_{N/2})^T and y_B = (y_{(N/2)+1}, . . . , y_N)^T. Because
W[N]^N = 1 and W[N]^{N/2} = -1, (L.7) can be written as

    y_A = F[N/2] P[N]^{odd} x + Lambda[N/2] F[N/2] P[N]^{even} x
    y_B = F[N/2] P[N]^{odd} x - Lambda[N/2] F[N/2] P[N]^{even} x

where

    P[N]^{odd}  = ( e_1   e_3   . . .   e_{N-1} )^T
    P[N]^{even} = ( e_2   e_4   . . .   e_N )^T
    Lambda[N/2] = diag( 1, W[N], W[N]^2, . . . , W[N]^{(N/2)-1} )

and e_j is the jth unit column vector. Stacking the two halves, with
P[N]^{(o|e)} = ( P[N]^{odd} ; P[N]^{even} ),

    F[N] = Z[N] ( I_2 kron F[N/2] ) P[N]^{(o|e)}                                 (L.8)

where

    Z[N] = [ I_{N/2}     Lambda[N/2]
             I_{N/2}    -Lambda[N/2] ]

Applying (L.8) recursively to F[N/2], F[N/4], . . . , F[2] yields

    F[N] = G[N] P[N]^{bitreverse}                                                (L.9)

where

    G[N] = Z[N] ( I_2 kron Z[N/2] ) ( I_4 kron Z[N/4] ) . . . ( I_{N/2} kron Z[2] )

    P[N]^{bitreverse} = ( I_{N/2} kron P[2]^{(o|e)} ) . . . ( I_4 kron P[N/4]^{(o|e)} ) ( I_2 kron P[N/2]^{(o|e)} ) P[N]^{(o|e)}
It can be shown that the effect of P[N]^{bitreverse} on x is to rearrange the elements
of x by reversing the bits of the binary representation of the (zero-based) indices.
To illustrate, let N = 8; then

    P[8]^{bitreverse} x = ( x_1, x_5, x_3, x_7, x_2, x_6, x_4, x_8 )^T

Instead of building the permutation matrices, we could look at the bit reversal of the
binary equivalents of the indices of x (beginning with index 0):

    index (binary)    reverse bits    decimal    add 1
    000               000             0          1
    001               100             4          5
    010               010             2          3
    011               110             6          7
    100               001             1          2
    101               101             5          6
    110               011             3          4
    111               111             7          8
FFT Algorithm:
    Given: x of dimension 2^m x 1
    y <- ReverseBits(x)
    For r = 1, . . . , m
        y <- ( I_{2^{m-r}} kron Z[2^r] ) y
    End

Remark: The MATLAB command for the FFT function is y = fft(x).
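The algorithm above can be written almost verbatim in MATLAB. The sketch below is added
here as an illustration (it is not from the text); the function name myfft is
hypothetical, and the bit-reversal permutation is built with a short loop so that no
toolbox functions are needed.

    function y = myfft(x)
    % Sketch of the radix-2 (decimation-in-time) FFT; length of x must be 2^m
    x = x(:);  N = numel(x);  m = log2(N);
    idx = 0:N-1;  rev = zeros(1, N);            % build the bit-reversal permutation
    for b = 1:m
        rev = rev*2 + mod(idx, 2);
        idx = floor(idx/2);
    end
    y = x(rev + 1);                             % ReverseBits(x)
    for r = 1:m                                 % apply I_{2^(m-r)} kron Z[2^r]
        L = 2^r;  half = L/2;
        W = exp(-2i*pi/L).^((0:half-1).');      % twiddle factors (Lambda)
        for s = 1:L:N
            a = y(s:s+half-1);
            b = W .* y(s+half:s+L-1);
            y(s:s+half-1)   = a + b;            % butterfly: top half
            y(s+half:s+L-1) = a - b;            % butterfly: bottom half
        end
    end
    end

Saved as myfft.m, the result of myfft(x) agrees with fft(x) to machine precision for
any x whose length is a power of two.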
EXAMPLE L.1. Let

    g(t) = {  t^2/4              if t < 20
              cos( pi t / 20 )   if 20 <= t < 80
              28 - t/10          if t >= 80

Now apply the Fourier series to approximate g(t) for 0 <= t <= 200, with T = 200 and
sampling N + 1 = 2^10 + 1 uniformly distributed data points of g(t).
Using x as defined in (L.3) and y = FFT(x), we can obtain a finite series approximation
given by

    g_FFT,L(t) = sum_{k=-L}^{L} C_k e^{2 pi i k t / T}
               ~= (1/N) [ y_1 + 2 sum_{k=1}^{L} Real( y_{k+1} e^{2 pi i k t / T} ) ]

Note that only the first N/2 = 2^{m-1} terms of y are useful for the purpose of
approximation, that is, L <= N/2.
Figure L.1 shows the quality of the approximation for L = 10 and L = 25.
Figure L.1. Fourier series approximation of g(t) in Example L.1 using L = 10 (left) and
L = 25 (right) terms.
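A MATLAB sketch (added here, not from the text) of the computation behind Example L.1:
sample the signal, form x as in (L.3), apply fft, and rebuild the truncated series.
The signal g below is an illustrative stand-in; any piecewise function sampled the same
way works identically.

    % Sketch of the FFT-based Fourier-series approximation of Example L.1
    T  = 200;  N = 2^10;                            % period and number of subintervals
    t  = linspace(0, T, N+1);                       % N+1 uniformly spaced samples
    g  = @(s) cos(pi*s/20).*exp(-((s-80)/40).^2);   % illustrative g(t)
    gs = g(t);
    x  = [ (gs(1) + gs(end))/2 , gs(2:end-1) ].';   % x_k as in (L.3), length N
    y  = fft(x);                                    % y_l of (L.4)
    L  = 25;                                        % truncation order, L <= N/2
    gFFT = real(y(1))/N * ones(size(t));            % C_0 term
    for k = 1:L
        gFFT = gFFT + (2/N)*real( y(k+1) * exp(2i*pi*k*t/T) );
    end
    plot(t, g(t), t, gFFT, '--');  xlabel('t');  legend('g(t)', 'g_{FFT,L}');

Increasing L improves the fit everywhere except near any discontinuities of g, where
the usual Gibbs oscillations remain.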
    f(z) = sum_{k=0}^{infinity} alpha_k (z - z_o)^k                              (L.10)

where

    alpha_k = (1/k!) d^k f/dz^k |_{z = z_o}                                      (L.11)
THEOREM L.1. Let f(z) = f_re + i f_im be analytic at z = z_re + i z_im; then

    d f_re / d z_re = d f_im / d z_im                                            (L.12)
    d f_re / d z_im = - d f_im / d z_re                                          (L.13)
The pair of equations (L.12) and (L.13) are known as the Cauchy-Riemann
conditions.
Some Important Properties of Analytic Functions:
Let f (z), f 1 (z) and f 2 (z) be analytic in the same domain D, then
1. Linear combinations of analytic functions are analytic; that is, f sum (z) =
1 f 1 (z) + 2 f 2 (z) is analytic.
2. Products of analytic functions are analytic; that is, f prod (z) = f 1 (z) f 2 (z) is analytic.
3. Division of analytic functions are analytic except at the zeros of the denominator;
that is, f div (z) = f 1 (z)/ f 2 (z) is analytic except at the zeros of f 2 (z).
4. Composition of analytic functions are analytic; that is, f comp (z) = f 1 ( f 2 (z)) is
analytic.
5. The inverse function, f 1 ( f (z)) = z, is analytic if df/dz = 0 in D and f (z1 ) =
f (z2 ) when z1 = z2 .
6. The chain rule is given by
d
df 2 df 1
[f 2 (f 1 (z))] =
dz
df 1 dz
(L.14)
(L.15)
we are assuming that C is a simple-closed curve; that is, C is a curve that begins
and ends at the same point without intersecting itself midway. Furthermore, the line
THEOREM L.2.
curve C, then
3
f (z)dz = 0
(L.16)
PROOF.
dzim
= n re
ds
For higher dimensional regions, if any simple closed path in D can be shrunk to a point, then D is
simply connected.
801
802
b
a
Figure L.2. The curves C, C , and H used for proof of Theorem L.3.
Thus
3
f (z)dz
( f re + if im ) (nim + i n re ) ds
3
=
( f im n re + f re n im ) ds + i
( f re n re f im n im ) ds
C
dzre dzim
zre
zim
zre
zim
C
Because analytic functions satisfy the Cauchy-Riemann conditions, the integrands are both zero.
THEOREM L.3.
C
PROOF.
Based on Figure L.2, we see that the integral based on curve H is given by
3
b
a
3
3
f (z)dz =
f (z)dz +
f (z)dz +
f (z)dz
f (z)dz
H
C
C
Theorem L.3 does not constrain how the shrinking of curve C to C occurs
except for the conditions given in the theorem. For instance, if f (z) is analytic
throughout the interior of C, then the smaller curve C can be located anywhere
inside C.
We now shift our focus on point zo and the contours surrounding it.
Definition L.4. For a given point zo and function f (z), let C be a simple closed
curve that encircles zo such that zo is the only possible singularity of f (z) inside
C; then the residue of f (z) at zo is defined as
Reszo (f ) =
1
2i
3
f (z)dz
(L.18)
THEOREM L.4.
f (z), then
Reszo ( f ) =
1
dk1
lim k1 [z zo ]k f (z)
(k 1)! zzo dz
(L.19)
PROOF.
z = zo + ei
for 0 2
and
(z zo ) = ei
dz = iei d
Thus
3
(z zo ) dz = i+1
O
2
0
ei(+1) d =
2i
if = 1
(L.20)
if = 1
A result known as Morera's theorem guarantees that if Res_{z_o}(f) = 0, then f(z) is
analytic in a small neighborhood around z_o.
Note that Theorems L.2 and L.4 are both associated with Cauchy's name, but the two
theorems are not the same. Strictly speaking, Cauchy's integral representation actually
refers only to the case of a simple pole, that is, k = 1.
803
804
Because zo is a pole of order k of f (z), there exists a curve C such that the function
g(z) = (z zo )k f (z)
is analytic inside and on a curve C, which includes zo as an interior point; that is, it
could be expanded into a Taylor series around zo ,
(z zo ) f (z)
k
g(z) =
n (z zo )n
n=0
f (z)
n (z zo )nk
(L.21)
n=0
where
n = lim
zzo
1 dn g
1 dn
k
=
lim
f
(z)
z
]
[z
o
zzo n! dzn
n! dzn
(L.22)
Based on the definition of the residue, choose the curve C to be a small circle O
centered at zo such that f (z) is analytic on and inside the circle O except at zo . This
means that the radius of the circle, , must be chosen to be small enough such that,
inside O , zo is the only singular point of f (z). Taking the contour integral of (L.21),
with substitutions of (L.20) and (L.22),
3
f (z)dz
n=0
3
n
(z zo )nk dz
O
2i k1
2i
dk1
lim k1 [z zo ]k f (z)
(k 1)! zzo dz
Thus
Reszo ( f ) =
1
dk1
lim k1 [z zo ]k f (z)
(k 1)! zzo dz
We now state a generalization of Theorem L.3. This theorem is very useful for
the evaluation of contour integrals in the complex plane.
Residue Theorem. Let f (z) be analytic on and inside the closed curve
C except for isolated singularities: z , = 1, 2, . . . , n. Then the contour integral of f (z)
along C is given by
3
n
f (z)dz = 2i
Resz ( f )
(L.23)
THEOREM L.5.
=1
We prove the theorem only for n = 2, but the same arguments can be generalized easily for n > 2.
Let C1 and C2 be nonintersecting closed curves inside C such that the pole z1 is
inside C1 only and the pole z2 is inside C2 only. As shown in Figure L.3, f (z) will be
analytic in the curve H as well as in the interior points of H.
PROOF.
z2 C 2
z1
z2
z1
C1
Thus
3
f (z)dz = 0 =
H
f (z)dz
C
f (z)dz
C1
f (z)dz
C2
or
3
f (z)dz =
C
f (z)dz +
C1
f (z)dz
C2
n
Resz (z)
=1
Note that Theorem L.5 is true whether the isolated singularities are poles or
essential singularities. However, we limit our applications only to singularities involving poles. As such, the formula for calculating residues when singularities are poles
(cf. Theorem L.4) is used when invoking the method of residues.
(L.24)
(L.25)
805
806
P
bR
right
aR
right
(aR ,bR )
|z(t = b)| =
and
(L.26)
path P. Likewise, we denote the arc by aR ,bR if the arc starting from aR is to the
right of path P (see Figure L.4).
The main idea is to combine either the left or right arc with the subpath,
P(aR , bR ), to obtain a simple closed curve from which we can apply the method
of residues.
3. Convergence Assumptions. In handling the path integration along the left circular arcs, we assume the following condition:
lim R
max
| f (z)| = 0
(L.27)
We refer to (L.27) as the convergence condition in the left arc. Together with
the following inequality (also known as Darbouxs inequality),
f (z) |dz| < 2R
max | f (z)| (L.28)
f
(z)dz
(aR ,bR )left
we obtain
lim
R
f (z)dz = 0
(L.29)
max
| f (z)| = 0
(L.30)
and obtain
lim
R
f (z)dz = 0
(L.31)
4. Cauchy Principal Value. With finite limits, the following identity is true:
P(aR ,br )
f (z)dz =
f (z)dz +
P(aR ,0)
f (z)dz
P(0,br )
However, the integral P(aR ,0) f (z)dz or the integral P(0,bR ) f (z)dz, or both, may
diverge as aR , bR , even though the integral
f (z)dz
(L.32)
PV ( f ) = lim
aR ,bR P(aR ,bR )
converges. In our calculations of P fdz that follow, we mean the limit calculation
of (L.32). The integral in (L.32) is known as the Cauchy principal value of f (z).
We now state a theorem that shows how the method of residues can be applied
to complex integrations along infinite paths.
THEOREM L.6.
Let P(t) be an infinite path that does not pass through any singular
points of f (z).
1. Let z1 , z2 , . . . , zn be the singularities in the region to the left of path P(t), and f (z)
satisfies the absolute convergence in the left arc condition given in (L.27), then
f (z)dz =
P
n
Resz ( f )
(L.33)
=1
2. Let z 1 , z 2 , . . . , z m be the singularities in the region to the right of path P(t), and
f (z) satisfies the absolute convergence in the right arc condition given in (L.30),
then
m
f (z)dz =
Resz ( f )
(L.34)
P
=1
PROOF. Based on Figure L.5, where R is chosen large enough such that the contour
formed by the subpath P(aR , bR ) and -(aR , bR )left will contain all the singular points
of f (z) that are to the left of P. Then using the theorem of residues,
P(aR ,bR )
f (z)dz
f (z)dz =
=1
n
=1
n
Resz ( f )
Resz ( f )
807
808
(aR,bR)left
R
zn
z2
bR
z1
aR
P(aR,bR)
Likewise, based on Figure L.6, where R is chosen large enough such that the contour
formed by the subpath P(aR , bR ) and (aR , bR )right will contain all the singular
points of f (z) that are to the right of P. Then using the theorem of residues,
P(aR ,bR )
f (z)dz +
f (z)dz =
m
Resz ( f )
=1
n
Resz ( f )
=1
Note that the convergence conditions, (L.27) and (L.30), are sufficient conditions
that may sometimes be too conservative. In some cases, they can be relaxed.
In particular, we have the result known as Jordan's lemma, which is useful when
calculating Fourier transforms and Fourier-sine/Fourier-cosine transforms.
Let f (z) = g(z)eiz, where > 0, with (aR , bR )left and (aR , bR )right
as the semicircle in the upper half and lower half, respectively, of the complex plane,
THEOREM L.7.
1. If
lim
then
max
z2
aR
(L.35)
f (z)dz = 0
(L.36)
lim
R
-P(aR,bR)
|g(z)| = 0
bR
zm
z1
(aR,bR)
right
2. If
f (z)dz = 0
(L.38)
lim
R
PROOF.
(L.37)
max
lim
then
|g(z)| = 0
We show the theorem only for the left arc, that is, upper half of the complex
plane,
On the semicircle, we have z = Rei . Thus
dz = Rei d
|dz| = R|d|
and
eiz
eiR(cos +i sin )
iz
e = eR sin
g(z) eiwz |dz|
max
<
max
<
max
|g(z)|
|g(z)|
<
<
/2
|g(z)|
2R
R sin
0
/2
2R/
max |g(z)|
1 eR
max |g(z)|
2R
iwz
e |dz|
f (z)dz = 0
Theorem L.7 assumed > 0 and > 0. For < 0, we need to traverse the
path in the opposite directions; that is, we need to replace by .
809
810
zIm
left
aR
R
aR
P(aR,bR)
zRe
bR
-P(aR,bR)
bR
zRe
(aR,bR)right
(L.39)
Res[(1+i)/2] ( f )
Res[(1+i)/2] ( f )
x3 ix
e
dx
1 + x4
x3 ix
e
1 + x4
'
(
1+i
1
z
f
(z)
= e(1i)/ 2
4
z(1+i)/ 2
2
'
(
1 + i
1
z
lim
f (z) = e(1+i)/ 2
4
z(1+i)/ 2
2
'
(
i cos e/ 2
2
lim
For > 0, we can use the closed-contour in the lower region of the complex
plane. Doing so, we have
x3 ix
( f ) + Res
(f )
e
dx
=
2i
Res
(L.41)
(1i)/
2
(1i)/
2
[
]
[
]
4
1 + x
and
Res[(1i)/2] ( f )
'
(
1i
1
z
f
(z)
= e(1i)/ 2
4
z(1i)/ 2
2
Res[(1i)/2] ( f )
'
(
1 i
1
z
f
(z)
= e(1+i)/ 2
4
z(1i)/ 2
2
lim
lim
'
x3 ix
e dx
1 + x4
(
/ 2
i cos e
2
x
x3 ix
F
=
e
dx
=
i
cos
e[||/ 2]
[sgn()]
4
4
1+x
2
1 + x
Similarly, we have
'
g(x) sin(x)dx = Im
(
g(x)e dx
ix
(L.44)
Based on Jordans lemma, that is, Theorem L.7, with = > 0, we need to
satisfy only the condition given in (L.35) and apply it to the contour in the upper
region of the complex plane,
max |g(z)| = 0
(L.45)
lim
R
EXAMPLE L.3.
|z|=R,zim 0
811
812
Using a semicircle in the upper region of the complex plane as the contour of
integration, we apply Theorem L.7 to obtain
R
f (zre )dzre = 2i Res[1+i] ( f ) + Res[1+i] ( f )
lim
R R
where,
z2 eiz
1 + z4
f (z) =
with
Then,
Res[1+i]/2 ( f )
Res[1+i]2 ( f )
2(1 i) (1+i)/2
e
8
2(1 + i) (1i)/2
e
8
'
x2 cos x
dx
1 + x4
(
x2 eix
Re
dx
4
1 + x
'
(
1
1
sin
e(1/ 2)
cos
2
2
2
=
=
2. Rectangular Contours. Sometimes the limits involve a line that is shifted parallel
to the real axis or the imaginary axis. In these cases, it may often be convenient
to use evaluations already determined for the real line or imaginary axis. To do
so, we need a rectangular contour. This is best illustrated by an example.
EXAMPLE L.4.
where > 0.
First, consider > 0. We could simplify the integral by first completing the
squares,
8
2 9
i 2
i
i
2
2
x ix = x + x +
2
2
=
thus
x2 ix
i 2 2
x +
2
4
dx
=
=
2 /(4)
/(4)
e[x+i/(2)] dx
+i/(2)
+i/(2)
ez dz
2
813
zIm
zRe
R+i/(2)
ez dz = 0
2
R+i/(2)
and
lim
ez dz = 0
2
R R+i/(2)
resulting with
+i/(2)
+i/(2)
ez dz =
2
7
ez dz =
2
Using a rectangular contour in the lower region of the complex plane, a similar
approach can be used to handle omega < 0. Combining all the results, we obtain

    F[ e^{-beta x^2} ] = sqrt( pi / beta ) e^{-omega^2 / (4 beta)}

This says that the Fourier transform of a Gaussian function is another Gaussian
function.
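This transform pair is easy to confirm numerically. The MATLAB check below is added
here (not part of the text); the values of beta and omega are arbitrary, and the real
line is truncated to a wide finite interval where the Gaussian is negligible.

    % Verify F[exp(-beta*x^2)] = sqrt(pi/beta)*exp(-w^2/(4*beta)) numerically
    beta = 0.7;  w = 2.3;                                    % arbitrary test values
    F_num   = integral(@(x) exp(-beta*x.^2).*exp(-1i*w*x), -50, 50);
    F_exact = sqrt(pi/beta)*exp(-w^2/(4*beta));
    abs(F_num - F_exact)                                     % ~1e-12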
3. Path P Contains a Finite Number of Simple Poles. When the path of integration
contains simple poles, the path is often modified to avoid the poles using a
semicircular indentation having a small radius epsilon, as shown in Figure L.9.
Assuming convergence, the calculation for the integral proceeds by taking the limit as
epsilon -> 0.

Figure L.9. Indentation of the path P around a simple pole z_0: (a) original path;
(b) modified path.

EXAMPLE L.5. Evaluate integral_0^infinity [ sin(x)/x ] dx.                       (L.46)

First, we evaluate the integral with limits from -infinity to infinity. Using the
technique for solving integrals with sinusoids given in (L.44),

    integral_{-infinity}^{infinity} ( sin(x)/x ) dx = Im [ integral_{-infinity}^{infinity} ( e^{ix}/x ) dx ]

Using the path along the real line, z = 0 is a pole on the path. Thus, modifying the
path to avoid the origin, we obtain the closed contour shown in Figure L.10, given by
C = l_(-) + gamma_epsilon + l_(+) + Gamma_R.

Figure L.10. Closed contour used in Example L.5: straight segments l_(-) and l_(+)
along the real axis, a small indentation gamma_epsilon around the origin, and the
large arc Gamma_R.

The integral along gamma_epsilon can be evaluated by setting z = epsilon e^{i theta}.
As a consequence, dz/z = i d theta and

    lim_{epsilon -> 0} integral_{gamma_epsilon} ( e^{iz}/z ) dz
        = - lim_{epsilon -> 0} integral_0^{pi} exp( i epsilon e^{i theta} ) i d theta = -i pi

Conversely, we have

    lim_{R -> infinity} integral_{Gamma_R} ( e^{iz}/z ) dz = 0

Because the integrand is analytic inside the closed contour,

    closed-integral_C ( e^{iz}/z ) dz = 0

or

    integral_{-infinity}^{infinity} ( e^{ix}/x ) dx = i pi
        and so
    integral_{-infinity}^{infinity} ( sin(x)/x ) dx = Im[ i pi ] = pi

Because the function sin(x)/x is an even function, we could just divide this value by 2
to obtain the integral with limits from 0 to infinity, that is,

    integral_0^{infinity} ( sin(x)/x ) dx = pi/2                                 (L.47)
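The value in (L.47) can also be checked numerically (a check added here, not from the
text). Cutting the integral off at an odd multiple of pi/2 makes the truncation error
very small, so ordinary adaptive quadrature suffices.

    % Numerical check of (L.47): integral of sin(x)/x from 0 to infinity = pi/2
    A = (2*1000 + 1)*pi/2;                       % cut off at an odd multiple of pi/2
    I = integral(@(x) sin(x)./x, 0, A);          % finite-range adaptive quadrature
    [I, pi/2]                                    % agree to several decimal places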
815
zIm
zn
[(x2 + 4)
2i
z3
z2
z1
zRe
P
4. Regions Containing Infinite Number of Poles. In case there is an infinite number of poles in the region inside the contour, we simply extend the summation of
the residues to contain all the poles in that region. If the infinite sum of residues
converge, then the method of residues will still be valid, that is,
3
f (z)dz =
Resz ( f )
(L.48)
C
EXAMPLE L.6.
=1
(x + 4) cosh(x)
(L.49)
From the roots of (z2 + 4) and the roots of cosh(z) = cos(iz), the singularities
are all simple poles given by:
z0 = 2i,
z =
2 1
i,
2
= 1, 2, . . . ,
f (z)dz
f (z)dz = 2i Res(2i) [ f ] +
Res(z ) [ f ]
(L.50)
lim
R
P
R
=1
Res(2i) [ f ] = lim
and with z = i(2 1)/2, together with the application of LHospitals rule,
Res(z ) [ f ]
=
=
=
lim
zz
z2
(z2
z z
+ 4) cosh(z)
1
1
i
sin(iz
+4
)
4
(1)
i 42 (2 1)2 2
816
rb
a
-1
ra
1
zRe
1
1
dx
=
+
8
(1)
2
2
2 cos(2)
4 (2 1)2 2
(x + 4) cosh(x)
=1
(L.51)
5. Integrals along Branch Cuts. When the integrand involves multivalued complex
functions, a branch cut is necessary to evaluate the integral. This means that a
Riemann sheet4 has to be specified by selecting the range of the argument of the
complex variable z. Usually, the ranges for the argument are either 0 < arg(z) < 2 pi,
-pi < arg(z) < pi, (pi/2) < arg(z) < (5 pi/2), or -(pi/2) < arg(z) < (3 pi/2) for
branch cuts along the positive real line, negative real line, positive imaginary line,
or negative imaginary line, respectively. In other cases, the range of arg(z) may be
a finite segment in the complex plane.
EXAMPLE L.7.
dx
+ 1) 1 x2
(L.52)
This is a finite integral in which the integrand contains a square root in the
denominator. One can check that the points z = 1 and z = 1 are branch points5
of f (z) where
f (z) =
(z2 + 1) z2 1
(Note that we used z2 1. The form 1 x2 will show up from the calculations
later.)
We could be rewrite the square root terms as
"
!
z2 1 =
(z 1)(z + 1)
"
=
(ra eia ) (rbeib )
=
ra rb ei(a +b)/2
where,
z 1 = ra eia
and
z + 1 = rbeib
(L.53)
817
zIm
R
1,R
-1,1
Figure L.13. Contour used for solving the integral in Example L.7.
-1
1,-1
R,1
We can then specify the branch cut by fixing the ranges on a and b to be
0 < a < 2 and
0 < b < 2
Aside from being branch points, the points z = 1 are also singular points.
We can then choose the contour shown in Figure L.13 and implement the method
of residues. The closed-contour C is given by
C
=
=
Following earlier methods, we can evaluate the integrals along the three circular
paths: the outer circle R and the pair of inner circles (1) (1) , to yield zero
values as the limits of R and 0 are approached, respectively. Thus
we need to evaluate only the four remaining straight paths. Because f (z) is
multivalued, the path along a common segment, but in opposite directions, may
not necessarily cancel. We now show that the integrals along 1,R and R,1 will
cancel, whereas the integrals along 1,1 and 1,1 will not.
Along the path R,1 , we have zim = 0, 1 < zre R, a = 2 and b = 2, thus
f (z)R,1 =
1
1
=
(1 + x2 ) ra rb e2i
(1 + x2 ) x2 1
Similarly, along path 1,R , we have zim = 0, 1 < zre < R, a = 0 and b = 0,
f (z)1,R =
1
1
=
2
(1 + x2 ) ra rb
(1 + x ) x2 1
The sum of integrals along both 1,R and R,1 is then given by
1,R
f (z)dz +
R,1
f (z)dz
1
dx
1 (1 + x2 ) x2 1
1
1
+
dx
2
R (1 + x ) x2 1
Along the path 1,1 , we have zim = 0, 1 < zre 1, a = and b = 2, thus
f (z)1,1 =
1
1
=
2
(1 + x2 ) ra rb e3i/2
(1 + x )i 1 x2
zRe
818
Similarly, along path 1,1 , we have zim = 0, 1 < zre < 1, a = and b = 0,
f (z)1,1 =
(1 +
x2 )
1
1
=
i/2
ra rb e
(1 + x2 )i 1 x2
1,1
f (z)dz +
1,1
f (z)dz
+
2
i
(1 +
1
1
1
1
dx
1 x2
x2 )i
(1 +
1
dx
1 x2
x2 )i
1
dx
(1 + x2 ) 1 x2
Next, we need to calculate the residues at the poles z = i. Note that because
the function is multivalued, we need to be careful when taking the limits of the
square root. First, consider the pole z = i. At this point, we have
z 1 = i 1
z + 1 = 1 + i
2 e5i/4
2 e7i/4
Thus
Resi [ f ]
=
=
=
z+ i
!
lim
zi (1 + z2 ) (z 1)(z + 1)
1
1
2i
2 e3i/2
2 2
z+ 1 = i + 1
3i/4
2e
i/4
2e
and
Resi [ f ]
=
=
=
z i
!
lim
2
zi (1 + z ) (z 1)(z + 1)
1
1
2i
2 ei/2
1
2 2
2
i
f (z)dz
2i (Resi [ f ] + Resi [ f ])
1
dx
(1 + x2 ) x2 1
1
2i
2
1
dx
x2 1
(1 +
x2 )
THEOREM L.8.
where
f (x+ ) = lim f (x + ||)
0
and
f (x ) = lim f (x ||)
0
f (x ) + f (x ) = lim
f (x + )
d
(L.55)
2
PROOF.
as long as f(x) satisfies Dirichlet's conditions. The proof of (L.55) is given in Section L.6.1 (page 836).
Let t = x + . Also, we use the fact that
sin ()
=
cos () d
(L.56)
0
Substituting (L.56) into (L.55) with x held fixed, we get
1
1
+
f (x ) + f (x ) = lim
f (t)
cos ((x t)) d dt
2
0
(L.57)
819
820
f (t)
So with (L.58) substituted to (L.57), we obtain the Fourier integral equation given
in (L.54)
0
1
if
if
t<0
t0
(L.59)
The delta distribution is often defined as the derivative of the Heaviside step
function. Unfortunately, because of the discontinuity at t = 0, the derivative is not
defined there. However, the integral
E
F
H (t) , g(t) [a,b] =
H (t) g(t)dt
(L.60)
with g(t) at least piecewise continuous, does not present any computational or conceptual problems. We can use this fact to explore the action of (t) by studying the
integral,
E
F
(t) , g(t) =
(t) g(t)dt
(L.61)
dt
dg
=
H (t) dt
dt
=
=
+ H () g () H () g ()
dg
dt + g ()
dt
0
g(0)
(L.62)
Thus (t) can be defined based on the associated action on g(t), resulting with a
number g(0). If g(t) = 1,
(t) dt = 1
(L.63)
The operational definition of delta(t) given in (L.62) may suffice for some
applications. Other applications, however, require extensions of this operation to
accommodate algebraic operations and calculus involving delta(t). To do so, the theory
of distributions was developed by L. Schwartz as a framework to define mathematical
objects called distributions and their operations, of which delta(t) is one particular
example.
if t a
0
ab
ab (t) =
exp 1 (ta)(tb)
if a < t < b
0
if t b
(L.64)
Definition L.7. A distribution, Dist (t), is a mapping from the set of test functions, test , to the set of real (or complex) numbers given by
E
F
Dist (t) (t)dt
(L.65)
Dist (t) , (t) =
821
822
ab(t)
0
a
t
Figure L.14. A plot of the smooth pulse function defined by (L.64).
(L.66)
and
2. EContinuous: For
any convergent sequence of test functions n 0 then
F
Dist (t) , n (t) 0, where the convergence of sequence of test functions satisfies.
(a) All the test functions in the sequence have the same compact support.
(b) For each k, the kth derivatives of the test functions converges uniformly
to zero.
Note that although we denote a distribution by Dist (t), (L.65) shows that the
argument t is an integration variable. Distributions are also known as generalized
functions because functions can also act as distributions. Moreover, using a very
narrow smooth-pulse function, for example, ab(t) in (L.64) centered around to with
a b and under appropriate normalization, the distribution based on a function f (t)
reduces to the same evaluation operation of f (t) at t = to . However, the important
difference is that distributions are mappings from test functions to real (or complex)
numbers, whereas functions are mappings from real (or complex) numbers to real
(or complex) numbers, as shown in Figure L.15.
<
<
f(t)
Dist(t), (t)
RI
RI
test
RI
Based on the conventional rules of integration, the following operation on distributions also yield distributions:
1. Linear Combination of Distributions. Let g 1 (t), g 2 (t) C , that is, infinitely differentiable functions, then
Distcomb (t) = [g 1 (t)Dist1 (t) + g 2 (t)Dist2 (t)]
is a distribution and
E
F
[g 1 (t)Dist1 (t) + g 2 (t)Dist2 (t)] , (t) =
F
E
F
E
Dist1 (t) , g 1 (t)(t) + Dist2 (t) , g 2 (t)(t)
(L.67)
(L.68)
[g 1 (t)Dist1 (t) (t)] dt +
F E
F
E
= Dist1 (t) , g 1 (t)(t) + Dist2 (t) , g 2 (t)(t)
2. Invertible Monotonic Transformation of Argument. Let (t) be an invertible
and monotonic transformation of argument t, that is, (d/dt = 0), then
Dist (t) = Dist ((t))
is also a distribution, and
N
O
E
F
1 (z)
Dist ((t)) , (t) = Dist (z) ,
&(z)
(L.69)
where
z
(t)
&(z)
(t)
d
dt
1 (z)
(L.70)
(L.71)
823
824
R
S
1
t
=
Dist (t) ,
(L.72)
||
(L.73)
=
()
()
1
dz
d/dt
(L.74)
d
=
Dist (t)
dt
dt
R
S
d(t)
= Dist (t) ,
dt
(L.75)
dt
Using the preceding operations of distributions, we have the following theorem
that describes the calculus available for distributions.
Let Dist (t), Dist1 (t), and Dist2 (t) be distributions, g(t) be a C function, and be a constant, then
THEOREM L.9.
(L.76)
(L.77)
(L.78)
(t ) f (t)dt
=
=
(t) f (t + )dt
f ()
1
||
1
f (0)
||
(L.80)
(t) f (t/)dt
(L.81)
tn
0
d
(t) =
m
(1)n m! d(mn) (t)
dt
(mn)! dt(mn)
m
if
0m<n
if
0nm
(L.82)
(L.83)
(L.84)
(L.85)
(L.86)
825
826
n
k=1
1
(t rk )
|dg/dt|(t=rk )
(L.87)
(L.88)
1.
2.
3.
is piecewise
continuous
f(t)
f (t)dt < and lim|t| f (t) = 0
f (t)dt = 1
(L.89)
lim F (, t) = (t)
(L.90)
PROOF.
(L.91)
F (, t) = e(x) /2
2
(L.92)
and
827
F(,t)=(2)1/2e
( x) /2
=4
1
=2
=1
0
5
2. Rectangular Pulse. Let H (t) be the unit Heaviside step function; then the unit
rectangular pulse function is given by
1
1
f (t) = H t +
H t
(L.93)
2
2
and
1
1
F (, t) = H t +
H t
2
2
(L.94)
=5
4
=2.5
=1
0
1
0.5
0
t
0.5
828
0.6
=2
F(,t)=sin( t)/( t)
0.5
0.4
0.3
=1
0.2
0.1
0.1
0.2
20
10
0
t
10
20
x=
x1
x2
..
.
(L.97)
xn
the delta distribution of x is given by
(x) = (x1 ) (x2 ) (xn )
(L.98)
Under this definition, the properties of (t) can be used while integrating along
each dimension. For instance, the sifting property for g(x) with p Rn becomes
(L.99)
Note, however, that when dealing with the general curvilinear coordinates, normalization is needed to provide consistency.
Definition L.9. Let = (1 , 2 , . . . , n ) be a set of n curvilinear coordinates,
1
1 (x1 , x2 , . . . , xn )
2 (x1 , x2 , . . . , xn )
..
.
n
(L.100)
1 (x1 , x2 , . . . , xn )
that is invertible with the Cartesian coordinates x = (x1 , . . . , xn ), that is, the Jacobian matrix,
J C =
(1 , . . . , n )
(x1 , . . . , xn )
1 /x1
1 /xn
..
.
..
..
.
n /x1
(L.101)
n /xn
is nonsingular.
Then, the delta distribution under the new coordinates of is given by
() =
(1 ) (2 ) (n )
det (J C )
(L.102)
(L.103)
(x) dV
() dV
=
xn,hi
xn,lo
=
n,lo
=
1
n,hi
n,hi
n,lo
x1,hi
x1,lo
1,lo
1,hi
1,hi
1,lo
( (1 ) (n ))
dx1 dxn
det (J C )
( (1 ) (n ))
det (J C ) d1 dn
det (J C )
( (1 ) (n )) d1 dn
where we used the relationship of multidimensional volumes in curvilinear coordinates, that is,
dV = dx1 dxn = det (J C ) d1 dn
and x is an interior point in region V .
829
830
r sin () cos ()
r sin () sin ()
r cos ()
x/r
y/r
z/r
sin () cos ()
sin () sin ()
cos ()
r2 sin ()
x/
y/
z/
x/
y/
z/
r cos () cos ()
r cos () sin ()
r sin ()
r sin () sin ()
r sin () cos ()
0
Thus
(r, , ) =
(r) () ()
r2 sin ()
(L.104)
831
exp( | t | )
0.8
0.6
0.4
0.2
0
4
832
THEOREM L.11.
PROOF.
f (t)dt
(L.107)
Next, we need two derivative formulas. The first formula is given by,
'
(
F (it)m f (t)
=
eit (it)m f (t)dt
=
dm
dm
dm
dm
dm it
e
f (t)dt
dm
eit f (t)dt
F [f ]
(L.108)
(i)n F [ f ]
(L.109)
(L.110)
(L.111)
(L.112)
Because f is a Schwartz function, the term on the right-hand side can be replaced
by a constant Cnm . This means that F [ f ] is also a Schwartz function.
With this fact, we can define the Fourier transform of tempered distributions.
Definition L.12. Let TDist (t) be a tempered distribution and (t) a Schwartz
function. Then the generalized Fourier transform of TDist (t), denoted by
F [TDist (t)], is a tempered distribution defined by the following operation
Q
P
Q P
(L.113)
F [TDist (t)] , () = TDist () , F [(t)]
Note that (L.113) is acceptable because TDist () was already assumed to be
a tempered distribution and F [(t)] is guaranteed to be a Schwartz function (via
Theorem L.11). Also, note the change of independent variable from t to , because
the Fourier transform yields a function in . The tempered distribution TDist () will
be based on .
With this definition, we are able to define Fourier transforms of functions such
as cosines and sines and distributions such as delta distributions. Moreover, the
Fourier transforms of distributions will yield the same Fourier transform operation
if the distribution is a function that allow the classical Fourier transform.
Fourier transform of delta distribution. Let () be a Schwartz
function.
P
Q
F [(t a)] , ()
=
( a)F [(t)] d
EXAMPLE L.9.
=
=
( a)
it
(t)dt
eiat (t)dt
P
Q
eiat , (t)
Q
P
eia , ()
where we used the sifting property of the delta distribution. Also, in the last line,
we substituted omega for t, since t is only a dummy integration variable.
Comparing both sides, we conclude that
F [(t a)] = eia
(L.114)
(L.115)
EXAMPLE L.10.
833
834
where the right-hand side can be seen as 2 times the inverse Fourier transform
at t = a, that is,
ia
1
e F [(t)] d = 2F F [(t)]
t=a
= 2 (t)t=a = 2(a)
= 2
(t a)(t)dt
Q
2 (t a), (t)
P
Q
2 ( a), ()
=
=
(L.117)
(L.118)
(L.119)
Using (L.118), Eulers identity, and the linearity property of tempered distributions, we have
' iat
(
e + eiat
F [cos (at)] = F
2
1 iat
F e + F eiat
2
( a) + ( + a)
(L.120)
F [sin (at)] = i ( + a) ( a)
(L.121)
=
=
Similarly for sine, we obtain
=
=
(t)
()
N '
=
it
eit f ()ddt
eit f (t)dt d
(
f (t)dt , ()
where we exchanged the roles of and t in the last two lines. Thus we have
F [ f (t)] =
eit f (t)dt
This shows that we indeed obtained a generalization of the classic Fourier transform.
(L.122)
PROOF.
=
=
(t)
it
f ()d d dt
(t) cos (t) + sin (t) + (t) dt
(L.123)
where the terms cos (t), sin (t) and (t) are obtained after integration by parts6 to be
cos(t)
cos (t) = lim
f ()d
(L.124)
it
sin(t)
sin (t) = lim
f ()d
it
(L.125)
= (t) F [ f ()]
t=0
1 it
(t) =
e
f ()d
it
=
1
F [f ()]
it
(L.126)
Next, expand (L.123) to obtain the three additive terms evaluated as,
(t)cos (t)dt = 0 (treated as a principal value integral) (L.127)
N'
(t)sin (t)dt
=
N'
(t)(t)dt
() F [ f (t)]
(
1
F [ f (t)]
i
=0
O
, ()
(L.128)
(
, ()
(L.129)
Let u = f ()d and ( dv = exp(it) ). Then, v = [1/(it)] exp(it). And using Leibnitz
rule, du = f ()d.
835
836
We again switched the roles of t and in (L.128) and (L.129). Adding these three
terms together and then comparing with the right-hand side of (L.123), we obtain
(L.122).
EXAMPLE L.11. Fourier transform of the unit step function and signum function.
The (dual) definition of the unit step function is that it is the integral to the delta
distribution. Using (L.122) and the fact that F [(t)] = 1 (cf. (L.115)), we have
(
' t
H () d
F [H (t)] = F
() F [(t)]
=0
1
F [(t)]
i
1
(L.130)
i
Furthermore, with the relationship between H (t) and sgn(t) given by,
=
() +
sgn(t) = 2H (t) 1
(L.131)
2
i
(L.132)
f (x+ )
if 0 = a < b
b
2
sin ()
(L.133)
d =
f (x + )
lim
f (x )
if a < b = 0
a
if 0 < a < b
0
or a < b < 0
THEOREM L.13.
sin(q)
dq =
q
2
(L.134)
q2
lim
q1 q
1
sin(q)
dq = 0
q
(L.135)
sin()
f (x + )
d
f (x + )
f (x + + )
sin()
d + f (x + )
sin(q)
dq + f (x + )
q
sin()
d
sin(q)
dq
q
lim
f (x + )
sin()
d = 0
(L.136)
Note that so far, (L.136) has been shown to apply to a subinterval where f (x + ) is
monotonic. However, because f (x + ) satisfies Dirichlets conditions, the interval
(a, b) can be partitioned into n subintervals (ai , ai+1 ), with
0 < a = a0 < a1 < < an = b
such that f (x) is monotonic inside each subinterval (e.g., with ai occurring either at
a discontinuity, minima, or maxima of f (x)). Thus
lim
(a>0)
sin()
d = lim
n1
f (x + )
i=0
ai+1
ai
f (x + )
sin()
d = 0
(L.137)
(b<0)
lim
f (x + )
sin()
d = 0
(L.138)
Next, for the case when a = 0, we need to focus only on the first interval, (0, a1 ), in
which f (x) is monotonic because (L.137) says the integral in the interval (a1 , b) is
zero. Using the mean value theorem again, there exists 0 < < a1 such that
a1
0
sin()
d = f (x+ )
f (x + )
= f (x+ )
sin()
d + f (x + a1 )
sin(q)
dq + f (x + a1 )
q
a1
sin()
d
a1
sin(q)
dq
q
837
838
f (x + )
lim
d = f (x+ )
0
2
or with 0 = a < b,
2
lim
f (x + )
sin()
d = f (x+ )
(L.139)
(L.140)
For the last case, that is, a < 0 < b, we simply add (L.139) and (L.140) to obtain
2 (b>0)
sin()
lim
f (x + )
d = f (x+ ) + f (x )
(L.141)
(a<0)
then
lim
f (t)
PROOF.
+
f (t)
(L.143)
and
0
0
where 0 < .
With < and < , the sequence of integrals with finite limits can be interchanged, that is,
f (t)
0
0
| f (t)|dt <
f (t)
0
cos ((x t)) d dt
sin((x t))
f
(t)
dt
xt
sin()
f
(x
+
)
d
x+
1
<
|
f
(x
+
)|
d
2
=
=
<
and
f (t) cos ((x t)) dt d
<
0
<
2
| f (t)| dt d
d =
2
f (t)
f (t) cos ((x t)) dt d
<
1
1+
<
2
Thus
f (t)
(L.146)
Taking the difference between (L.143) and (L.144), and then substituting (L.145)
and (L.146),
f (t)
0
0
(L.147)
(L.148)
f (t)
0
0
839
840
f (t)
cos ((x t)) d dt =
EXAMPLE L.12. Here we extend the results of Example 12.13 to handle a different
set of boundary conditions. Thus with
2u
u
=
(L.149)
x2
t
under a constant initial condition, u(x, 0) = Ci . In Example 12.13, we have
already partially found the solution in the Laplace domain to be given by (12.83);
this was found to be
= Aex + Bex + Ci
U
(L.150)
s
where i = s/. Now we investigate the solutions for a different set of boundary conditions.
2
and
u(L, t) = CL
Ci
s
A+B+
AeL + BeL +
Ci
s
or
A=
eL(C0 Ci ) (CL Ci )
s (eL eL)
and
B=
eL(C0 Ci ) + (CL Ci )
s (eL eL)
b = 1 sinh (x)
U
s sinh (L)
To evaluate the inverse Laplace transform, we can use the residue theorem for
an infinite number of simple poles (cf. Section L.2.3).7 Fortunately, the poles of
b are all simple poles. They are given by8
a and U
both U
s=0
and
k
L = ik sk =
L
2
, k = 1, 2, . . .
a
Res 0 est U
a
Res sk est U
lim est
s0
sinh((L x))
Lx
=
sinh(L)
L
(L x) sk
esk t
s sk
sinh
lim
(L x) sk
2 sk
esk t
1
sinh
sk
cosh (L sk /)
L
'
(
2
Lx
k 2
(1)
sin k
exp
t
k
L
L
k
b,
Similarly, for U
b
L1 U
b
Res 0 est U
b
Res sk est U
=
=
1
2i
b(x, s)ds =
est U
b
Resz est U
z=0,sk
x
L
'
(
x
2
k 2
(1)
sin k
exp
t
k
L
L
k
Combining all the results, we have the solution for u(x, t):
u = usteadystate + utransient
7
The following identities are also useful for the calculations in this example:
sinh(i|z|) = i sin(|z|) , cosh(i|z|) = cos(|z|)
and
d
d
sinh(z) = cosh(z) ,
cosh(z) = sinh(z)
dz
dz
With f (x = i|z|) = sinh(i|z|) = i sin(|z|), the roots of f (x) are then given by x = i arcsin(0).
841
842
u(x,t)
utransient(x,t)
50
100
50
10
50
50
0
5
x
0
0
0.5
1 0
0.5
1 0
where
usteadystate
utransient
x
x
Ci + (C0 Ci ) 1
+ (CL Ci )
L
L
k (t) (C0 Ci ) k (L x) + (CL Ci ) k (x)
k=1
with
'
(
2
k 2
k (t) = (1)
exp
t
k
L
k
y
k (y) = sin k
L
and
Plots of utransient (x, t) and u(x, t) are shown in Figure L.20, for = 0.1, L = 1,
C0 = 0, Ci = 50, and CL = 100, where the summation for utransient was truncated
after k = 250.
1. Dirichlet Conditions and Neumann Conditions, in Finite Domain. Let
u(0, t) = C0 ;
u
(L, t) = 0
x
and
u(x, 0) = 0
where =
and
0 = AeL BeL
or
A=
eL
eL + eL
and
B=
e+L
eL + eL
Thus
U
C0 e(Lx) + e(Lx)
C0 e(2Lx) + ex
=
L
L
s
e +e
s
1 + e2L
1
=
(1)n qn
1+q
n=0
(L.151)
843
u(x,t)
1
0.5
1
0
0
0.5
x
5
10 0
1
1
= C0
U
(1)n en (x) s +
(1)n en (x) s
s
s
n=0
n=0
where,
2L(n + 1) x
2Ln + x
and
n (x) =
(x)
(x)
n
n
(1)n erfc
u(x, t) = C0
+ erfc
(L.152)
2
t
2
t
n=0
n (x) =
2u
u
=
+ u
(L.153)
2
x
t
with a constant initial condition u(x, 0) = Ci and boundary conditions
2
u (0, t) = f (t)
and
d2 U
Ci + U
= sU
dx2
844
and
B = L [f ] +
Ci
s
= L [ f ] Ci e( s+)x/ + Ci
U
s
s
d
1
= 1 e1/(4t)
=
erfc
dt
2 t
2 t3
Next, applying both shifting and scaling,
1
1
L1 e (s+a)/b =
exp
at
4bt
2 bt3
Thus with a = and b = (/x)2 ,
t
2
x
f
(t
C
x
i
u(x, t) =
exp 2 d + Ci
4
2 0
3
(L.154)
x (C0 Ci )
I(x, t) + Ci
I(x, t) =
0
2
1
x
exp
d
42
a
a
q1 () = + b
and
q2 () = b
845
u(x,t)
1
0.5
0
0
0.4
0.3
0.2
0.5
x
then
dq1 =
1
a
b
+ d
2
dq2 =
and
0.1
1 0
1
a
b
d
2
a2
+ b2 = q12 2ab = q22 + 2ab
2
e2ab
a
2
e2ab
a
q1 (t)
q2 (t)
where
g(x, t) =
0
1
x2
exp 2 d
4
The integral g(x, t) is just as difficult to integrate as I(x, t). Fortunately, we avoid
this by adding the two forms of I(x, t) based on q1 and q2 to obtain
'
(
2ab
x
x
2ab
I(x, y) =
e erfc
erfc
+ +e
x
2
2
or
u(x, y)
'
C0 Ci x/
x
e
erfc
+ t
2
2 t
(
x
+ ex / erfc
t + Ci
2 t
(L.155)
846
u(x,t)
u(x,t)
0.5
0.5
0
0
0.4
0.3
0.5
x
0.2
0.1
0
0
0.5
0.2
1 0
0.4
whose differential is
dp =
1
d
2
u(x, t) =
f t 2 Ci exp
2 dp + Ci (L.156)
p
2
p
p (t)
Take as an example, f (t) as a Gaussian function given by
f (t) = e200(t0.2)
=
=
Dist ()
()
()
()
()
=
R
=
'
(
d dt d
dt
dt d dt
d
dt
Dist ()
d(t)
d
d
d
Dist () (t)d
d
d
d
Dist () (t) dt
d
dt
d
[Dist ()] , (t)
d
by parts,
R
S
dn
tn n (t) , (t)
dt
8 n
9
n d(ni)
di
n
n
= (1)
(t)
t
dt
i
dti
dt(ni)
i=0
8 n
9
n n! di
(t) (t)dt + (1)n
(t)
ti
= (1)n n!
dt
i
i!
dti
i=1
F
E
= (1)n n! (t) , (t)
Thus
tn
Let > 0 then
R
t tn
dn
(t) = (1)n n! (t)
dtn
dn
(t) , (t)
dtn
S
=
E
F
(1)n n! (t) , t (t)
Thus
tn
dm
(t) = 0
dtm
if
0m<n
847
848
d(n+k)
t (n+k) (t) , (t)
dt
= (1)
= (1)
8 n
9
n d(ni)
dk
di
n
(t)
x
dt
i
dtk
dti
dt(ni)
i=0
k
i
n
n! i
d
d
n
t
(t)
dt
k
i
i!
dt
dti
i=0
= (1)
R
= (1)
min(n,k)
i=0
(1)i (n!)2 k!
(i!)2 (n i)!(k i)!
(n + k)!
k!
d(ki)
di
dt
(t)
dti
dt(ki)
dk
(t) (t)dt
dtk
(n + k)! dk
(t) , (t)
k!
dtk
Thus
tn
dm
m!
d(mn)
n
=
(1)
(t)
(t)
dtm
(m n)! dt(mn)
if
0nm
E
F
(g(t)) , (t)
=
=
(g(t)) (t)dt
N
k=1
rk +
rk
(g(t)) (t)dt
where > 0 is small enough such that g(t) is monotonic and invertible in the range
(rk ) t (rk + ) for all k.
We can now apply an equation similar to (L.69) for each integral term,
g(rk +)
rk +
g 1 (z)
(g(t)) (t)dt =
(z)
dz
|dg/dt|(g 1 (z))
g(rk )
rk
=
(rk )
|dg/dt|t=rk
E
F
1
(t rk ) , (t)
|dg/dt|t=rk
N
k=1
N
=
E
F
1
(t rk ) , (t)
|dg/dt|t=rk
N '
k=1
O
(
1
(t rk ) , (t)
|dg/dt|t=rk
Using (L.72),
R
S
E
F
t
F (, t), (t) = f (t),
then
E
F
F (, t) (t) , (t)
S
t
f (t),
(0)
then with
f (t)dt = 1
t
f (t)
dt
where
(t) = (t) (0)
Taking absolute values of both sides, we obtain the following inequality,
E
F
F (, t) (t) , (t) A + B
where,
A
q
t
t
dt +
f (t)
dt
f (t)
q
q
t
dt
f (t)
849
850
then
or
2 max |(t)|
t
f (t)dt
E
F
F (, t) (t) , (t) 2 max |(t)| +
t
f (t)dt
Because all the terms on the right-hand side of the inequality is fixed except for ,
we can choose arbitrarily small. Hence
lim F (, t) = (t)
APPENDIX M
m,i
f m,
,j
(M.1)
m=0 =0
where
f m,
m+
= m
t x
u
(xk ,tq )
tm
x
and
m,i
,j
= ,j m,i
x
y
xy (xk ,yn )
EXAMPLE M.1.
i=1 j =1
2
2
1
1
m,i
f m,
,j
i,j f 1,1 =
t
x (Error)
=0 m=0
i=1 j =1
f m,
1
1
=3 m=0
i=1 j =1
1
1
=0 m=3
f m,
m,i
,j
i,j
m,i
,j
i,j
i=1 j =1
(M.3)
851
852
Setting the left-hand side (M.3) equal to 0 results in a set of nine independent
linear equations:
+
0 if (i, j ) = (1, 1)
m,i
i,j =
,j
1 if (i, j ) = (1, 1)
Solving these equations, we obtain
ij
i, j = 1, 0, 1
4
which yields the following finite difference approximation of the mixed partial
derivative:
2 u
1
(k+1)
(k+1)
(k1)
(k1)
u
(M.4)
u
+
u
n1
n+1
n1
xy
4
x
y n+1
i,j =
(xk ,yn )
To determine the order of the truncation error, note that the coefficients of the
lower order terms f_{m,l} vanish whenever l = 0 or m = 0, while for l = 1 they are
proportional to [1 + (-1)^{m+1}]/(2 m!) and for m = 1 to [1 + (-1)^{l+1}]/(2 l!).
The leading nonzero contributions therefore occur at (l, m) = (1, 3) and (3, 1),
yielding

    Error = (Delta_t^2 / 3!) d^4u/(dt^3 dx) |_{(x_bar, t_bar)} + (Delta_x^2 / 3!) d^4u/(dt dx^3) |_{(x_bar, t_bar)} + . . .

or Error = O( Delta_t^2, Delta_x^2 ).
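A small MATLAB sketch (added here, not from the text) that checks (M.4) and its
second-order accuracy on a smooth test function; the function and evaluation point are
arbitrary choices.

    % Check the mixed-derivative formula (M.4) on u(x,y) = sin(x)*exp(y)
    u   = @(x,y) sin(x).*exp(y);
    uxy = @(x,y) cos(x).*exp(y);                 % exact mixed partial derivative
    x0 = 0.7;  y0 = -0.3;
    for h = [0.1 0.05 0.025]                     % dx = dy = h
        approx = ( u(x0+h, y0+h) - u(x0+h, y0-h) ...
                 - u(x0-h, y0+h) + u(x0-h, y0-h) ) / (4*h^2);
        fprintf('h = %6.3f   error = %10.3e\n', h, abs(approx - uxy(x0, y0)));
    end
    % The error drops by roughly a factor of 4 each time h is halved, i.e., O(h^2).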
xx (x, y, z)
2u
2u
2u
+
(x,
y,
z)
+
(x,
y,
z)
yy
zz
x2
y2
z2
+ xy (x, y, z)
2u
2u
2u
+ yz(x, y, z)
+ xz(x, y, z)
xy
yz
xz
+ x (x, y, z)
u
u
u
+ y (x, y, z)
+ z(x, y, z)
x
y
z
+ (x, y, z)u + (x, y, z)
(M.5)
Let the superscript (3() denote matrix augmentation that will flatten the 3D
tensor into a matrix representation. For instance, for k = 1, . . . , K, n = 1, . . . , N, and
u1,1,1 u1,N,1
u1,1,M
.
.
..
(3()
.
..
..
..
..
U
=
.
.
uK,1,1
uK,N,1
uK,1,M
1,1,1
..
.
K,1,1
..
.
1,N,1
..
.
1,1,M
..
.
(3()
K,N,1
K,1,M
u1,N,M
..
.
uK,N,M
..
.
1,N,M
..
.
K,N,M
etc.
where uk,n,m = u(k
x, n
y, m
z), k,n,m = (k
x, n
y, m
z), etc. Likewise, let the
superscripts (2(, x) and (2(, y) denote column augmentation, as the indices are
incremented along the x and y directions, respectively. For instance,
(2(,x)
b(1,z) = b(1,z) k=1 b(1,z) k=K
The partial derivatives can then be approximated by finite difference approximations in matrix forms as follows:
u
(3()
D(1,x) U (3() + B(1,x)
x
(3()
u
T
T
U (3() IM D(1,y)
+ B(1,y)
y
u
T
(1,z)
U (3() D(1,z)
IN + B
z
;
;
;
2u
(3()
D(2,x) U (3() + B(2,x)
x2
(3()
2u
(3()
T
T
I
+
B
D
M
(2,y)
(2,y)
y2
2u
(3()
T
(2,z)
D
+B
I
N
(2,z)
z2
2u
xy
(3()
T
D(1,x) U (3() IM D(1,y)
+ B(1,x,1,y)
2u
yz
T
T
(1,y,1,z)
+B
D(1,y)
U (3() D(1,z)
2u
xz
T
(1,x,1,z)
D(1,x) U (3() D(1,z)
IN + B
where D(1,x) , D(1,y) , D(1,z) , D(2,x) , D(2,y) , and D(2,z) are matrices that can take forms such
as those given in Section 13.2.2 depending on order of approximation and boundary
conditions. The matrices B(1,x) , B(2,x) , . . . , and so forth contain the boundary data.
2,z are given by a sequence of transformations as
1,z and B
The new matrices B
'
(2(,y) (T
(2(,x)
(1,z) = reshape
b
B
, K, NM
(M.6)
(1,z)
(2,z)
B
'
(T
(2(,x) (2(,y)
b(2,z)
reshape
, K, NM
1,x,1,z and B
1,y,1,z are left as exercises.)
(The matrices B
(M.7)
853
854
Just as in the previous section, we can use the properties of matrix vectorizations
to obtain the following linear equation problem corresponding to (M.5):
R3D vec U (3() = f3D
(M.8)
where,
R3D
'
(dv
dv
xx (3() IM IN D(2,x) + yy (3()
IM D(2,y) IK
+
dv
zz(3() D(2,z) IN IK
'
'
(dv
(dv
(3()
(3()
+ xy
IM D(1,y) D(1,x) + yz
D(1,z) D(1,y) IK
+
dv
xz(3() D(1,z) IN D(1,x)
'
(dv
dv
(3()
(3()
+ x
IM IN D(1,x) + y
IM D(1,y) IK
+
+
f3D
dv
z(3() D(1,z) IN IK
dv
(3()
'
(dv
dv
(3()
(3()
xx (3() vec B(2,x) + yy (3()
vec B(2,y)T
+
dv
(2,z)
zz(3() vec B
(dv
'
'
(dv
(3()
(3()
(3()
(1,y,1,z)
+ xy
vec B(1,x,1,y) + yz
vec B
+
dv
(1,x,1,z)
xz(3() vec B
'
(dv
dv
(3()
(3()
T
x (3() vec B(1,x) + y (3()
vec B(1,y)
+
dv
(1,z)
z(3() vec B
+ vec (3()
EXAMPLE M.2.
0 x, y, z 1
(M.9)
where,
(2
4
(x, y, z) = exp 2 [z x] 5 1 z y
5
2
2
4
52
122
+ 16 (x z)2 + 100 1 z y + 8 z 8y + 4x
5
5
5
'
(2
4
u (x, y, z) = exp 2 [z x] 5 1 z y
5
(M.10)
'
(M.11)
Using
x =
y =
z = 0.05, and central difference formulas for D(2,x) , D(2,y) ,
B(2,x)
D(2,y)
, B(2,y) , and B(2,z) , the linear equation (M.8) can be solved for
, (3()
. The results are shown in Figure M.1 at different values of z, where
vec U
the approximations are shown as points, whereas the exact solutions are shown
as surface plots. (A MATLAB file poisson_3d.m is available on the book's webpage that
implements the finite difference solution and obtains the plots shown in this example.)
The errors from the exact solution (M.11) are shown in Figure M.2 at different fixed
values of z. The errors are in the range of about 1.7 x 10^-3.
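As a reduced illustration of the Kronecker-product assembly used above (a
two-dimensional sketch written for this appendix, not the book's poisson_3d.m), the
Laplacian with homogeneous Dirichlet data can be assembled and solved as follows; the
grid size and forcing are illustrative.

    % 2D sketch of the Kronecker-product finite-difference assembly
    n  = 49;  h = 1/(n+1);                       % interior points per direction
    e  = ones(n,1);
    D2 = spdiags([e -2*e e], -1:1, n, n)/h^2;    % central-difference second derivative
    I  = speye(n);
    A  = kron(I, D2) + kron(D2, I);              % discrete Laplacian u_xx + u_yy
    [X, Y] = meshgrid(h*(1:n), h*(1:n));
    f  = -2*pi^2*sin(pi*X).*sin(pi*Y);           % forcing with known exact solution
    u  = A \ f(:);                               % homogeneous Dirichlet boundaries
    err = max(abs(u - reshape(sin(pi*X).*sin(pi*Y), [], 1)))   % O(h^2)

The same vectorization pattern extends to three dimensions by adding one more Kronecker
factor per term, which is what the assembly of (M.8) does.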
Consider next the system of first-order hyperbolic equations

    du~/dt + A du~/dx = c~                                                       (M.12)
Figure M.1. The finite difference solution to (M.9) at different values of z
(z = 0.1, 0.2, . . . , 0.9), subject to conditions (M.10). The approximations are shown
as points, whereas the exact solutions, (M.11), at the corresponding z values are shown
as surface plots.
where u~ = (u~_1, . . . , u~_J)^T and A is a constant J x J matrix. If A is
diagonalizable, that is, there exist a nonsingular matrix V and a diagonal matrix
Lambda such that A = V Lambda V^{-1}, then with

    u = V^{-1} u~    and    c = V^{-1} c~                                        (M.13)

the system decouples into J independent scalar advection equations.
Figure M.2. The error distribution between the finite difference approximation (using
central difference formulas) and the exact solutions, (M.11), at different z values.
forward-time-forward-space (FTFS), forward-time-central-space (FTCS), forward-time-
backward-space (FTBS), backward-time-forward-space (BTFS), backward-time-central-space
(BTCS), and backward-time-backward-space (BTBS). Each scheme will have different
stability ranges for Delta_t in relation to Delta_x and lambda. In Table M.1, we
summarize the different upwind schemes and their stability based on another parameter,

    nu = lambda Delta_t / Delta_x                                                (M.14)

which is known as the Courant number. The stability conditions included in the table
are obtained using the von Neumann method and are given as an exercise (E13.15).
We can make the following observations:
1. The forward-time schemes: FTFS, FTCS, and FTBS are explicit schemes,
whereas the backward-time schemes: BTFS, BTCS and BTBS are implicit
schemes.
2. The central-space schemes are given by FTCS and BTCS, with the explicit FTCS
being unstable and the implicit BTCS being unconditionally stable.
3. The noncentral space schemes have their stability dependent on the sign of , or
equivalently on the sign of . Both forward-space schemes, FTFS and BTFS, are
858
Approximation equation
FTFS
uk
FTCS
uk
FTBS
uk
BTFS
(1 ) uk
BTCS
uk
BTBS
(1 + ) uk
Leapfrog
uk
Lax-Friedrichs
uk
(q)
(q+1)
= (1 + ) uk uk+1 +
tck
(q+1)
= uk
(q+1)
= uk1 + (1 ) uk +
tck
(q)
(q+1)
(q+1)
(q+1)
= uk
(q+1)
(q+1)
(q)
1 0
(q)
(q)
(q)
uk+1 uk1 +
tck
2
(q)
(q+1)
None
(q)
(q+1)
+ uk+1 +
tck
01
(q)
= uk
(q+1)
(q+1)
(q+1)
(q)
uk+1 uk1 +
tck
= uk
2
(q+1)
uk
(q)
(q)
(q+1)
(q+1)
uk1 +
tck
(q1)
(q)
(q)
= uk
(q)
(q)
|| 1
un+1 + un1 + 2
tck
(q)
1 2 uk +
1
2
All
1
1+
(q)
(q)
(q)
uk+1 +
uk1 +
tck
2
2
|| 1
(q)
2
uk+1
(q)
+ 21 2 + uk1
c
c (q) 2
(q)
+ ck
t +
t
t
x k
Lax-Wendroff
Crank-Nicholson
Stability region
(q+1)
(q+1)
(q+1)
+ uk
uk1 =
u
4 k+1
4
(q)
(q)
(q)
(q+1/2)
uk+1 + uk + uk1 +
tck
4
4
|| 1
All
stable only for negative values, whereas both backward-space schemes, FTBS
and BTBS, are stable only for positive values.1
From the last observation, we can still recover the use of noncentral schemes
by switching between forward-space and backward-space schemes depending on the
sign of . This combination is called the upwind schemes, because the direction of
space difference is adjusted to be opposite to the wave speed . Specifically, with
(+) =
1
+ ||
2
and
() =
||
2
(M.15)
Note that even though BTFS and BTBS are both implicit schemes, neither are unconditionally
stable.
we have the explicit upwind scheme, which combines both FTFS and FTBS in one
equation,
(q+1)
(q)
(q)
(q)
= 1 + () (+) uk + (+) uk1 () uk+1
(M.16)
uk
and whose stability range is given by 0 < || < 1. Likewise, we have the implicit
upwind scheme, which combines both BTFS and BTBS in one equation,
(q+1)
(q+1)
(q+1)
(q)
1 () + (+) uk
(+) uk1 + () uk+1 = uk
(M.17)
whose stability range is given by 0 < ||.
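A compact MATLAB sketch (added here, not from the text) of the explicit upwind scheme
(M.16) for du/dt + lambda du/dx = 0 on a periodic domain; the wave speed, grid, and
initial pulse are illustrative choices.

    % Explicit upwind scheme (M.16) for u_t + lambda*u_x = 0, periodic in x
    lambda = 0.5;  dx = 0.01;  dt = 0.01;        % Courant number nu = 0.5
    nu  = lambda*dt/dx;
    nup = (nu + abs(nu))/2;  num = (nu - abs(nu))/2;   % nu^(+), nu^(-) of (M.15)
    x   = (0:dx:1-dx).';
    u   = exp(-8*(5*x - 1).^2);                  % illustrative initial condition
    for q = 1:round(1/dt)                        % march to t = 1
        u = (1 + num - nup)*u + nup*circshift(u, 1) - num*circshift(u, -1);
    end
    plot(x, u);  xlabel('x');  ylabel('u(x,1)');

For lambda > 0 only the backward-space term is active, and the pulse is advected to the
right with the mild numerical smearing that is typical of first-order upwinding.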
The leapfrog scheme approximates both derivatives by central differences,

    ( u_k^(q+1) - u_k^(q-1) ) / ( 2 Delta_t ) + lambda ( u_{k+1}^(q) - u_{k-1}^(q) ) / ( 2 Delta_x ) = c_k^(q)     (M.18)

Note that the leapfrog scheme needs values at both t_q and t_{q-1} to obtain values at
t_{q+1}. Thus leapfrog schemes often require another one-step marching scheme, such as
Lax-Friedrichs or Lax-Wendroff, to provide the values at t_1, and then continue with
the leapfrog for t_q, q >= 2.
The Lax-Friedrichs scheme approximates the time derivative as a forward time
difference, but between u_k^(q+1) and the average of the neighboring values at the
current time, (1/2)( u_{k+1}^(q) + u_{k-1}^(q) ). Thus the scheme is given by

    [ u_k^(q+1) - (1/2)( u_{k+1}^(q) + u_{k-1}^(q) ) ] / Delta_t + lambda ( u_{k+1}^(q) - u_{k-1}^(q) ) / ( 2 Delta_x ) = c_k^(q)     (M.19)

Note that the leapfrog scheme uses the values at t_{q-1}, whereas the Lax-Friedrichs
scheme stays within t_q.
The third explicit finite difference scheme uses the Taylor series approximation for u,

    u_k^(q+1) = u_k^(q) + du/dt |_{t = q Delta_t, x = k Delta_x} Delta_t
              + (1/2) d^2u/dt^2 |_{t = q Delta_t, x = k Delta_x} Delta_t^2 + O( Delta_t^3 )     (M.20)

and then substitutes the following identities obtained from the given differential
equation,

    du/dt = - lambda du/dx + c    and    d^2u/dt^2 = lambda^2 d^2u/dx^2 - lambda dc/dx + dc/dt

Replacing the spatial derivatives by central differences then yields the Lax-Wendroff
scheme,

    u_k^(q+1) = ( 1 - nu^2 ) u_k^(q) + ( (nu^2 - nu)/2 ) u_{k+1}^(q) + ( (nu^2 + nu)/2 ) u_{k-1}^(q)
              + c_k^(q) Delta_t + (1/2) ( dc/dt - lambda dc/dx )_k^(q) Delta_t^2          (M.21)
Using the von Neumann method, one can show that the stability ranges of
the three explicit schemes, namely the leapfrog, Lax-Friedrichs, and Lax-Wendroff
schemes given in (M.18), (M.19), and (M.21), respectively, are all
given by |λ| ≤ 1. The approximation errors for these methods are O(Δx², Δt²), O(Δx, Δt), and
O(Δx², Δt²) for the leapfrog, Lax-Friedrichs, and Lax-Wendroff schemes, respectively.
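As noted above, the leapfrog scheme needs a one-step method to supply the values at t_1. The sketch below (illustrative, not the book's code) starts the leapfrog march with a single Lax-Friedrichs step, again for the homogeneous case c = 0 with periodic end treatment.

```matlab
% Leapfrog marching (M.18) started with one Lax-Friedrichs step (M.19).
nu = 0.5;  dx = 0.01;  dt = 0.01;  lam = nu*dt/dx;
x  = (0:dx:1)';
u0 = exp(-8*(5*x - 1).^2);                              % values at t_0
up = @(v) circshift(v, -1);                             % u_{k+1}
um = @(v) circshift(v,  1);                             % u_{k-1}
u1 = (up(u0) + um(u0))/2 - (lam/2)*(up(u0) - um(u0));   % Lax-Friedrichs start
for q = 2:100
    u2 = u0 - lam*(up(u1) - um(u1));                    % leapfrog step
    u0 = u1;  u1 = u2;                                  % shift the time levels
end
plot(x, u1)
```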
The Crank-Nicholson scheme is an implicit scheme that can be seen as an
attempt to improve the accuracy of the BTCS scheme, which may be unconditionally
stable but only has approximation errors of O(Δx², Δt). However, unlike the
leapfrog scheme, where values at t_{q−1} are introduced, this method avoids those extra
values by using a central difference approximation at a point between
t_{q+1} and t_q, that is, at t = t_{q+1/2}, with a time increment Δt/2. By doing so, however,
the spatial derivative at t = t_{q+1/2} must be estimated by averages. Thus the Crank-Nicholson scheme uses the following approximation:
ν [ (u_{k+1}^{(q+1)} + u_{k+1}^{(q)})/2 − (u_{k−1}^{(q+1)} + u_{k−1}^{(q)})/2 ] / (2Δx) + (u_k^{(q+1)} − u_k^{(q)}) / (2(Δt/2)) = c_k^{(q+1/2)}

or

(λ/4) u_{k+1}^{(q+1)} + u_k^{(q+1)} − (λ/4) u_{k−1}^{(q+1)} = −(λ/4) u_{k+1}^{(q)} + u_k^{(q)} + (λ/4) u_{k−1}^{(q)} + Δt c_k^{(q+1/2)}     (M.22)
The approximation error of the Crank-Nicholson scheme is O(Δx², Δt²). Using the
von Neumann method, we can show that the Crank-Nicholson scheme, like the
BTCS scheme, is unconditionally stable.
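The unconditional stability can also be checked numerically from the amplification factor: substituting u_k^{(q)} = G^q e^{ikθ} into (M.22) with c = 0 gives G = (1 − i(λ/2) sin θ)/(1 + i(λ/2) sin θ). The short check below is an illustrative sketch, not taken from the text.

```matlab
% Amplification factor of the Crank-Nicholson scheme (M.22) with c = 0.
theta = linspace(0, 2*pi, 400);
for lambda = [0.5 2 10 100]                 % a few arbitrary Courant numbers
    G = (1 - 1i*(lambda/2)*sin(theta)) ./ (1 + 1i*(lambda/2)*sin(theta));
    fprintf('lambda = %6.1f   max|G| = %.12f\n', lambda, max(abs(G)));
end                                         % max|G| = 1 for every lambda
```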
EXAMPLE M.3.   For the first-order hyperbolic equation

∂u/∂t + 0.5 ∂u/∂x = 0     (M.23)

we consider both a continuous initial condition and a discontinuous initial
condition.

1. Continuous initial condition. Let the initial condition be a Gaussian function
given by

u(x, 0) = e^{−8(5x−1)²}     (M.24)
(M.24)
t 0
0
x
BTCS
t0
x
0
Lax-Wendroff
u
x
x =
t = 0.01 are shown in Figure M.3. It appears that the leapfrog, LaxWendroff, and Crank-Nicholson schemes yielded good approximations.
2. Discontinuous initial condition. Let the initial condition be a square pulse
given by

u(x, 0) = { 1 if 0.2 ≤ x ≤ 0.4 ;  0 otherwise }     (M.25)
Figure M.3. Numerical solutions for continuous initial condition using the various schemes (explicit and implicit upwind, leapfrog, Lax-Wendroff, Crank-Nicholson, and BTCS).

Figure M.4. Comparison with exact solutions for different schemes at t = 1. The exact solution is given as dashed lines.
Figure M.5. Numerical solutions for discontinuous initial condition using the various schemes.
The results using Δx = Δt = 0.01 are shown in Figure M.5. As one can observe from the
plots, none of the schemes match the exact solution very well. This is due in part to the
numerical dissipation introduced by some of the schemes; the dissipation was instrumental for stability, but it also smoothed out the discontinuity. The other
schemes exhibited growing oscillations, which are due to their spurious roots. Significant amounts of oscillation throughout the
spatial domain can be observed in both the leapfrog and Crank-Nicholson
schemes. The Lax-Wendroff scheme appears to perform the best; however, a smaller
mesh size should improve the approximations.
More important, however, is that if one had chosen |λ| = 1, both the
Lax-Wendroff and Lax-Friedrichs schemes reduce to yield an exact solution,
as shown in Figure M.7, because the discontinuity will travel along the characteristic; that is, with c(x, t) = 0 and |ν| Δt = Δx (or |λ| = 1), both schemes
reduce to

u_k^{(q+1)} = u_{k+1}^{(q)} if λ = −1,   and   u_k^{(q+1)} = u_{k−1}^{(q)} if λ = +1

The example shows that the Lax-Wendroff scheme performed quite well, especially
when Δt was chosen carefully so that |λ| = 1. Note that the case in which it yielded
an exact solution (at the grid points) is limited primarily to a constant wave speed and a zero
nonhomogeneous term, that is, c(x, t) = 0. The other issue remains that Lax-Wendroff
and Lax-Friedrichs are still explicit time-marching methods.
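The reduction to an exact shift at |λ| = 1 is easy to verify numerically; the following lines are an illustrative sketch (periodic wrap-around assumed), not the book's code.

```matlab
% With |lambda| = 1 and c = 0, the Lax-Wendroff update (M.21) reduces to
% u_k^(q+1) = u_(k-1)^(q), i.e., an exact shift along the characteristic.
lam = 1;
x = (0:0.01:1)';
u = double(x >= 0.2 & x <= 0.4);                     % square pulse (M.25)
uLW = (1 - lam^2)*u + ((lam^2 - lam)/2)*circshift(u, -1) ...
      + ((lam^2 + lam)/2)*circshift(u, 1);           % one Lax-Wendroff step
max(abs(uLW - circshift(u, 1)))                      % returns 0: exact shift
```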
[Figure M.7 appears here, showing the solution for the discontinuous initial condition obtained with |λ| = 1, which matches the exact solution at the grid points.]
The alternating direction implicit (ADI) schemes replace a single multidimensional implicit update by a sequence of one-dimensional implicit sweeps, while retaining the same consistency and convergence, as well as the same range of stability, as the original schemes.
Because the computations are now reduced to solving two or more sequences of
tridiagonal systems via the Thomas algorithm, the improvements in computational
efficiency, in terms of both storage and number of computations, are very significant compared with direct LU factorizations.
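For reference, a minimal sketch of the Thomas algorithm is given below (cf. its description in Chapter 2); the function name and calling convention are illustrative, with a, b, and c holding the sub-, main-, and super-diagonals.

```matlab
% Thomas algorithm for a tridiagonal system A x = d (illustrative sketch).
function x = thomas(a, b, c, d)
    n = numel(d);
    for k = 2:n                         % forward elimination
        w    = a(k) / b(k-1);
        b(k) = b(k) - w*c(k-1);
        d(k) = d(k) - w*d(k-1);
    end
    x = zeros(n, 1);
    x(n) = d(n) / b(n);
    for k = n-1:-1:1                    % back substitution
        x(k) = (d(k) - c(k)*x(k+1)) / b(k);
    end
end
```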
The original ADI schemes were developed by Douglas, Peaceman, and Rachford
to improve the Crank-Nicholson schemes for parabolic equations. For a simple
illustration of the ADI approach, we take the linear second-order diffusion equation
for 2D space, without any mixed partial derivatives, given by
∂u/∂t = α_xx(t, x, y) ∂²u/∂x² + α_yy(t, x, y) ∂²u/∂y² + β_x(t, x, y) ∂u/∂x + β_y(t, x, y) ∂u/∂y + γ(t, x, y) u + δ(t, x, y)     (M.26)
with boundary conditions

u(t, 0, y) = v_0(t, y),   u(t, 1, y) = v_1(t, y),   u(t, x, 0) = w_0(t, x),   u(t, x, 1) = w_1(t, x)
Collect the values of u and of the coefficients at the grid points into K × N matrices, for example,

U = [ u_{11} ⋯ u_{1N} ; ⋮ ⋱ ⋮ ; u_{K1} ⋯ u_{KN} ],   γ = [ γ_{11} ⋯ γ_{1N} ; ⋮ ⋱ ⋮ ; γ_{K1} ⋯ γ_{KN} ],   etc.     (M.27)
The semi-discrete form of (M.26) can then be written as

dv/dt = (M_x + M_y) v + B = F v + B

where

v ≡ vec(U)
M_x ≡ dv(α_xx) (I_N ⊗ D_(2,x)) + dv(β_x) (I_N ⊗ D_(1,x)) + ½ dv(γ)
M_y ≡ dv(α_yy) (D_(2,y) ⊗ I_K) + dv(β_y) (D_(1,y) ⊗ I_K) + ½ dv(γ)
B ≡ dv(α_xx) vec(B_(2,x)) + dv(α_yy) vec(B_(2,y)) + dv(β_x) vec(B_(1,x)) + dv(β_y) vec(B_(1,y)) + vec(δ)

Applying the Crank-Nicholson scheme to this equation gives

( I − (Δt/2) F^{(q+1)} ) v^{(q+1)} = ( I + (Δt/2) F^{(q)} ) v^{(q)} + (Δt/2) ( B^{(q+1)} + B^{(q)} )     (M.28)
By subtracting the term ( I − (Δt/2) F^{(q+1)} ) v^{(q)} from both sides of (M.28),

( I − (Δt/2) F^{(q+1)} ) Δ_t v^{(q)} = (Δt/2) [ ( F^{(q+1)} + F^{(q)} ) v^{(q)} + B^{(q+1)} + B^{(q)} ]     (M.29)

where Δ_t v^{(q)} = v^{(q+1)} − v^{(q)}.
Next, consider the operator on the left-hand side of (M.29). Using F^{(q+1)} = M_x^{(q+1)} + M_y^{(q+1)} (see (M.28)),

( I − (Δt/2) F^{(q+1)} ) Δ_t v^{(q)}
  = ( I − (Δt/2) M_x^{(q+1)} − (Δt/2) M_y^{(q+1)} ) Δ_t v^{(q)}
  = ( I − (Δt/2) M_x^{(q+1)} ) ( I − (Δt/2) M_y^{(q+1)} ) Δ_t v^{(q)} − (Δt²/4) M_x^{(q+1)} M_y^{(q+1)} Δ_t v^{(q)}
  = G_x^{(q+1)} G_y^{(q+1)} Δ_t v^{(q)} − O(Δt⁴)

where

G_x^{(q)} = I − (Δt/2) M_x^{(q)} ;   G_y^{(q)} = I − (Δt/2) M_y^{(q)}     (M.30)
The last term is O(Δt⁴) because the Crank-Nicholson scheme guarantees that
Δ_t v^{(q)} = v^{(q+1)} − v^{(q)} = O(Δt²). By neglecting terms of order O(Δt⁴),
(M.29) can then be replaced by

G_x^{(q+1)} G_y^{(q+1)} Δ_t v^{(q)} = (Δt/2) [ ( F^{(q+1)} + F^{(q)} ) v^{(q)} + B^{(q+1)} + B^{(q)} ]     (M.31)
However, G_x and G_y are, respectively, block diagonal with tridiagonal blocks and block tridiagonal with diagonal blocks, thus allowing
easy implementation of the Thomas and block-Thomas algorithms. Equation (M.31)
is known as the delta-form of the ADI scheme.² The values of U^{(q+1)} are then
obtained from

U^{(q+1)} = U^{(q)} + Δ_t U^{(q)}     (M.32)

where Δ_t U^{(q)} is obtained by reshaping Δ_t v^{(q)} back into a K × N array.
It can be shown by direct application of the von Neumann analysis that the ADI
scheme given in (M.31) will not change the stability conditions; that is, if the Crank-Nicholson scheme is unconditionally stable, then the corresponding ADI schemes
will also be unconditionally stable. Furthermore, because the only change from the
original Crank-Nicholson scheme was the removal of terms that are fourth order
in Δt, the ADI scheme is also consistent. The application of the Lax equivalence
theorem then implies that the ADI schemes will be convergent. The extension of
the ADI approach to 3D space is straightforward and is given as an exercise.
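To make the two-sweep structure concrete, the following is a minimal, self-contained MATLAB sketch of the delta-form step (M.31)-(M.32) for the constant-coefficient case u_t = u_xx + u_yy with homogeneous Dirichlet conditions; the grid sizes, variable names, and the use of the sparse backslash in place of explicit Thomas sweeps are assumptions of this sketch.

```matlab
% Delta-form ADI marching (illustrative sketch, not the book's code).
K = 19;  N = 19;                          % interior nodes in x and y
h = 1/(K + 1);  dt = 0.01;
e  = ones(K, 1);
D2 = spdiags([e -2*e e], -1:1, K, K)/h^2; % 1D second-difference matrix
Mx = kron(speye(N), D2);                  % x-direction operator
My = kron(D2, speye(K));                  % y-direction operator
F  = Mx + My;
Gx = speye(K*N) - (dt/2)*Mx;              % factors of (M.30)
Gy = speye(K*N) - (dt/2)*My;
[xg, yg] = ndgrid(h*(1:K), h*(1:N));
v = reshape(sin(pi*xg).*sin(pi*yg), [], 1);   % initial condition in vec form
for q = 1:50
    rhs = dt*(F*v);                       % right-hand side of (M.31), B = 0
    dv  = Gy \ (Gx \ rhs);                % the two directional solves
    v   = v + dv;                         % update, cf. (M.32)
end
```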
² The scheme is named Alternating Direction Implicit (ADI) based on the fact that the factors G_x^{(q)}
and G_y^{(q)} deal separately along the x and y directions, respectively. Also, the term Implicit (the I in
ADI) is a reminder that ADI schemes are developed to improve the computation of implicit schemes
such as the backward-Euler or Crank-Nicholson, where matrix inversions or LU factorizations are
required.
Other approaches to steady-state solutions include relaxation methods for solving large sparse linear
equations, such as Jacobi, Gauss-Seidel, and SOR. Currently, various Krylov subspace approaches, such
as conjugate gradient and GMRES (see Sections 2.7 and 2.8), are used for very large sparse problems.
APPENDIX N
Barber, C. B., Dobkin, D. B., and Huhdanpaa, H. The QuickHull Algorithm for Convex Hulls.
ACM Trans. on Math. Software, 1995.
then the ridge set is

R(Φ) = { (p_a, p_b), (p_b, p_c), (p_c, p_d), (p_d, p_e), (p_e, p_f), (p_f, p_g), (p_g, p_a) }

Note that it will be beneficial for our purposes to specify the sequence of each
edge such that it follows a counterclockwise traversal, for example (p_a, p_b)
instead of (p_b, p_a), and so forth.
3. Extend operation. Let p be an outside point to a set of connected facets, Φ.
Then the operation Extend(p, Φ) will take p and attach it to each ridge in
R(Φ) to form m new facets, where m is the number of ridges of Φ, that is,

{ F_{M+1}, . . . , F_{M+m} } = Extend(p, Φ)     (N.1)
For example, using the same set shown in Figure N.2, suppose p_h is an outside
point to the facets in Φ; then we have

Extend(p_h, Φ) = { F_17 = (p_h, p_a, p_b),  F_18 = (p_h, p_b, p_c),  F_19 = (p_h, p_c, p_d),  F_20 = (p_h, p_d, p_e),  F_21 = (p_h, p_e, p_f),  F_22 = (p_h, p_f, p_g),  F_23 = (p_h, p_g, p_a) }

Note that each new facet generated will also have a sequence that goes counterclockwise.
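To illustrate the bookkeeping, here is a small MATLAB sketch of the Extend operation; representing each ridge as a row of two point indices is an assumption of this sketch, not the book's data structure.

```matlab
% Extend(p, R): attach outside point p to every ridge in R (illustrative sketch).
% Each row of R is a ridge [i j] listed counterclockwise, so the new facet
% [p i j] is also ordered counterclockwise.
function Fadd = extend(p, R)
    m    = size(R, 1);              % number of ridges
    Fadd = [repmat(p, m, 1), R];    % each new facet is (p, ridge)
end
```

For instance, with the seven ridges of the example above and p_h assigned index 8, extend(8, [1 2; 2 3; 3 4; 4 5; 5 6; 6 7; 7 1]) returns the seven new facets (p_h, p_a, p_b), . . . , (p_h, p_g, p_a).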
Simplified-QuickHull Algorithm:
Let P = {p 1 , . . . , p N } be the set of available points.
1. Initialization.
(a) Create a tetrahedron as the initial convex hull (e.g., using the points in P
corresponding to the three largest z-components and connecting them to the
point with the smallest z-component):
F = {F 1 , F 2 , F 3 , F 4 }
(b) Remove, from P, the points that were assigned to F .
(c) Obtain the collection of current visible sets:
V = { Vis(p_i), p_i ∈ P }
2. Expand the convex hull using unassigned point p i .
(a) Obtain the ridge set of the visible set of p i :
R = R (Vis(p i ))
(b) Update the facets of the hull:
i. Generate new facets: F_add = Extend(p_i, R).
ii. Combine with F: F ← F ∪ F_add.
iii. Remove Vis(p_i) from F: F ← F − Vis(p_i).
(c) Update the collection of visibility sets:
i. Remove, from each set in V, any reference to the facets in Vis(p_i) (thus
also removing Vis(p_i) from V).
ii. Add facet F_k ∈ F_add to Vis(p_j) if point p_j is outside of facet F_k.
(d) Remove p_i from the set of available points.
This version is a simplification of the QuickHull algorithm. We have assumed
that all the points are boundary points; that is, each point will end up as a vertex of
the triangular patches forming the convex hull. Because of this, the algorithm steps
through each unassigned point and modifies the visibility sets of these points as the
convex hull grows in size.²
Doing so, the modifications will simply end up with the addition of one term each
to K_n and Φ_n as defined in (14.43) and (14.44), respectively; that is, we now instead
use
K_n = K_n^{Gal} + τ (D_n/2) Tᵀ b(p̄) b(p̄)ᵀ T     (N.4)

Φ_n = Φ_n^{Gal} + τ (D_n/2) Tᵀ b(p̄) h(p̄)     (N.5)

where K_n^{Gal} and Φ_n^{Gal} denote the element stiffness matrix and load vector of (14.43) and (14.44).
When τ = 0, we get back the Galerkin method.
The last detail is the evaluation of the stabilization parameter τ. Although several
studies have found an optimal value for τ in the one-dimensional case, the formulation of the optimal values for 2D and 3D cases remains largely heuristic. For
simplicity, we can choose the rule we refer to as the Shakib formula,

τ = [ ( 2‖b‖ / ℓ )² + 9 ( 4‖M‖ / ℓ² )² ]^{−1/2}     (N.6)
² In the original QuickHull algorithm of Barber and co-workers, the procedure steps through each
facet that has a non-empty outside set and then builds the visible set of the farthest outside point.
This will involve checking whether the chosen point is outside of the adjacent facets. In case there
are points that eventually reside inside the convex hull, the original version will likely be more
efficient. Nonetheless, we opted to describe the revised approach because of its relative simplicity.
For each triangular element, the projection parameter

ξ = ( (p_k − p_i) · (p_k − p_j) ) / ‖p_k − p_j‖²     (N.7)

will yield 0 ≤ ξ ≤ 1. Then ℓ = ‖s‖, where s = (p_k − p_i) − ξ (p_k − p_j), is the length of the segment from node i to the edge
containing nodes j and k.
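Based on the reconstruction of (N.6) and (N.7) above, the short MATLAB sketch below computes ξ, the element length ℓ, and the stabilization parameter τ for one triangle; the vertex coordinates, convection vector, and diffusion magnitude are arbitrary illustrative values.

```matlab
% Element length (N.7) and Shakib-type stabilization parameter (N.6) -- an
% illustrative sketch that follows the reconstructed formulas above.
p_i = [0; 0];  p_j = [1; 0];  p_k = [0.4; 0.8];        % triangle vertices
b   = [1; 2];                                          % convection vector
m   = 0.01;                                            % diffusion magnitude
xi  = dot(p_k - p_i, p_k - p_j) / norm(p_k - p_j)^2;   % projection parameter
s   = (p_k - p_i) - xi*(p_k - p_j);                    % perpendicular from node i to edge jk
ell = norm(s);                                         % element length scale
tau = ((2*norm(b)/ell)^2 + 9*(4*m/ell^2)^2)^(-1/2)
```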
EXAMPLE N.1.   Consider the case in which

M = [ 0.001  0 ; 0  0.001 ],   g = 0,   b = [ −2 ; 3 ]

and

h = 1.5 (3x − 2y) + [ 0.32 (x² + y²) − 80 (1.5y − x + 0.001) ] e^{−4(x² + y²)}
Figure N.4. The triangulation mesh is shown in the left plot, whereas the SUPG solution
(dots) is shown together with the exact solution (surface) in the right plot.
Figure N.5. The errors obtained using the Galerkin method are shown in the left plot, whereas
the errors obtained using the SUPG method are shown in the right plot.
Let the domain be a square of width 2 centered at the origin. Also, let all the
boundary conditions be Dirichlet, with

u = 1.5xy + 5 e^{−4(x²+y²)}   for   { x = −1, −1 ≤ y ≤ 1 ;  x = 1, −1 ≤ y ≤ 1 ;  −1 ≤ x ≤ 1, y = −1 ;  −1 ≤ x ≤ 1, y = 1 }

The exact solution of this problem is known (it was in fact used to set h and
the boundary conditions) and is given by

u = 1.5xy + 5 e^{−4(x²+y²)}

After applying the SUPG method based on the Delaunay mesh shown in the left
plot of Figure N.4, we obtain the solution shown in the right plot of Figure N.4.
The improvements of the SUPG method over the Galerkin method are shown
in Figure N.5. The errors for the Galerkin and the SUPG solutions are 1.2 and 0.3,
respectively.
Of course, as the mesh sizes are decreased, the accuracy will also increase.
Furthermore, note that from (N.6), the stabilization parameter τ for each element will approach 0 as the element size ℓ approaches 0, reducing the SUPG method to a simple Galerkin
method.
Remarks: The results for this example were generated by the MATLAB function fem_sq_test2.m, which uses the function linear_2d_supg.m, a general SUPG finite element solver for the linear second-order partial differential
equation. Both of these files are available on the book's webpage.
Index
Coordinate systems (cont.)
rectangular coordinate system, 149
spherical coordinate system, 187189
Table, relationship between rectangular and
cylindrical, 186
Table, relationship between rectangular and
spherical, 188
toroidal coordinate system, 229
vector differential operation in cylindrical and
spherical coordinates, 194
Courant number, 857
Crank-Nicholson method. See (Finite difference method)
Cubic spline interpolation, 89
Curl
See also (Vector differential operation)
of a vector field, 176
vorticity, 177
Curvature of path/radius of curvature, 166
Cuthill-McKee algorithm, 601
d'Alembert method of order reduction, 357
procedure, 748–750
d'Alembert solutions, 786–791
Dahlquist stability test, 291, 296
Danielevskii method, 660663
Danielson-Lanczos equation, 796
Darbouxs inequality, 806
Delaunay triangulation. See (Finite element
methods)
Delta functions. See (Distributions)
Difference equation
linear, with constant coefficients
characteristic equation, 292
complementary solution, 292
stability, 294
Differential algebraic equation (DAE)
index, 304
mass matrix form, 305
MATLAB DAE solver, 717
semi-explicit, 303
Digraph, 595
strongly connected subdigraphs, 596
Directional derivative, 171
Dirichlet integral theorem, 836
Discrete Fourier transforms. See (Fast Fourier
transforms)
Distribution theory
definition, 821
delta distribution (delta function), 820
in higher dimensions, 828830
limit identities, 826827
properties and identities, 825826
derivative (theorem) of distributions,
824
properties, 823824
tempered distributions, 830831
definition, 831
generalized Fourier transform. See Fourier
transforms-generalized Fourier transforms
test functions, 821
Divergence
See also (Vector differential operation)
of a vector field, 174
Divergence theorem, 208210
Duhamels principle, 258
Eigenfunctions, 435
Eigenvalues and eigenvectors, 107115
definition, 107
left eigenvectors, 113
list of properties, 113
modified QR method, 651655
of companion matrices, 112
power method, 649
QR method, 650
spectral radius, 125
spectrum, 109
Electromagnetics
Table, terms and relationships, 221
Energy balance, 218220
mechanical energy balance, 219
thermal energy balance, 219
total energy balance, 219
Error function, 465
Euler equation of motion, 218
Faradays law, 222
Fast Fourier transforms
algorithm, 798
discrete Fourier transforms, 796
Ficks law, 220
Finite difference equations
ADI schemes, 863866
Finite difference method
consistency, 513
convergence, 513
finite difference approximations
backward difference, 485
central difference, 485
forward difference, 485
method of undetermined coefficients,
486491, 851
finite difference approximations
finite difference approximation lemma,
487
Table, for first-order derivatives, 489
Table, for second-order derivatives, 490
hyperbolic equations, 855862
Courant number, 857
Crank-Nicholson scheme, 860
Lax-Friedrichs scheme, 859
Lax-Wendroff scheme, 860
leapfrog scheme, 859
Table, basic schemes, 858
upwind schemes, 856
Wendroff scheme, 521
Lax-Richtmyer stability, 513
stability analysis
amplification factor, 517
eigenvalue method, 514516
Von Neumann method, 516519
time-dependent
backward Euler, 508
Crank-Nicholson, 509
forward Euler, 508
semi-discrete approach (or method of lines),
504507
weighted Euler methods, 509
time-independent
one dimension, 491496
polar and spherical coordinates, 500503
three dimensions, 852854
two dimensions, 496499
Finite element methods
streamlined-upwind-Petrov-Galerkin (SUPG),
870
Shakib formula, 870
stabilization parameter, 870
assembly
method of reduction of unknowns, 537
overloading method, 538
axisymmetric problems, 546547
Delaunay triangulation, 539541
node lifting, 540
quick-hull algorithm, 540, 867870
Galerkin method, 530
summary, main steps, 542544
time-dependent
Crank-Nicholson method, 549
mass matrix, 549
semi-discrete approach, 548
triangular finite elements, 527533
index matrix, 535
line integrals, 531533
node matrix, 535
properties of shape functions, 528529
shape functions, 527
surface integrals, 529531
weak solution, 526
weighted residual method, 526
First-order PDE
Cauchy condition, 381
characteristic curves, 381
characteristics
characteristics, 381
Clairauts equation, 392
general solution, form, 388
Lagrange-Charpit conditions, 390
Lagrange-Charpit method, 389
method of characteristics, 380387
special forms (quasilinear, semilinear, etc.),
380
Floquet multipliers, 339
Fourier integral equation, 453
Fourier integral theorem, 819
technical lemma, 838
Fourier kernel, 452
Fourier series, 423, 452
Fourier transforms
definition, Fourier/inverse Fourier transform,
454
convolution, 458
Initial value problems (IVP) (cont.)
Adams-Moulton (implicit multistep), 285
backward difference formula (BDF) method
(Gears method), 287
BDF with variable step size, 723
Milne-Simpson method, 308
trapezoidal method, 285
Runge-Kutta methods, 276
explicit fourth order, 279
explicit second order (Heun's method), 307
implicit fourth order (Gauss-Legendre), 280
Runge-Kutta tableau, 278
stability
Dahlquist stability test, 296
principal roots/spurious roots, 297
stability regions
Adams-Moulton method, 299
backward Euler method, 297
explicit Runge-Kutta, 297
fourth-order Gauss-Legendre, 298
types of numerical stability, 300
stiff differential equation, 298
Table, MATLAB IVP solvers, 716
Integral theorems
divergence theorem
(Gauss-Ostrogradski-Green), 208
Gauss theorem, 210
Greens Lemma, 205
Greens theorem (Greens identities), 209
Stokes theorem, 210
Integral transforms
definition, 451
Dirichlet conditions, 819
Table, examples, 452
Jacobi equation, 371
Jordan block, 119
Jordan canonical form (Jordan decomposition),
118
Kronecker delta, 154
Krylov subspace, 631, 706
Laguerre equation and polynomials, 373
Lamberts W function (Omega function), 247
Laplace invariants, 444
Laplace transforms
definition, Laplace/inverse Laplace transforms,
464
convolution operation, 468
inverse transformation via partial fractions,
472474
List, properties, 467469
Table, transforms of basic functions, 472
Laplacian
See also (Vector differential operation)
of a scalar field, 178
of vector fields, 181
operator, 178
Lax entropy conditions, 779
Least-squares solution, 7177
forgetting factor, 93
Levenberg-Marquardt method, 639
algorithm, 641
More method, 641
normal equation, 71
recursive least squares, 93
weighted least squares, 92
with linear constraints, 76
Legendre equations, 358363
Plots, Legendre functions of orders 0 to 4, 361
Plots, Legendre polynomials of orders 0 to 4,
361
properties of Legendre polynomials, 362
Table, Legendre polynomials and functions, 360
Table, types and solutions, 359
Leibnitz derivative formula (Leibnitz rule),
224225
for one dimension, 224
for three dimensions, 224
Leibnitz formula (n th derivative of products), 349
Levenberg-Marquardt
See (Least-squares solution)
Levi-Civita symbol, 155
Liénard systems, 336
Limit cycle, 332
Bendixsons criterion, 333
Poincaré-Bendixson theorem, 334
Line integral, 673
Linear PDE
boundary conditions (Dirichlet, Neumann and
Robin), 408
complementary solution, 407
dAlembert solution, 411
linear partial differential operator, 407
non-homogeneous PDE
homogenization of boundary conditions, 432
homogenization of PDE, 438
particular solution, 407
reducible PDEs, 408409
similarity transformation and similarity
transformation parameter, 440
solution method
Fourier transform method, 459463
Laplace transform methods, 474476
method of eigenfunction expansion, 434437
method of images, 476477
separation of variables, 411428
similarity transformation, 439443
superposition, 407
Lipschitz condition, 312
Logistic solution, 265
LU decomposition, 5965
block LU decomposition, 605
Choleskis method, 61
Crouts method, 61
Doolittles method, 61
Thomas algorithm, 63
Lyapunov function, 330
Krasovskii form, 331
Rosenbrock function, 331
Lyapunov matrix equation, 331
Mass balance, 216217
Mass matrix. See (Finite element methodstime
dependent)
Matrix
definition, 5
adjugate, 14
asymptotically stable, 124
bandwidth, 62
block matrix inverse, 30
block matrix product, 30
Boolean, 595
characteristic equation, 108
circulant, 116
classes
Table, based on operational properties,
562563
Table, based on structure and composition,
567
cofactor, 13
companion (Frobenius), 112
condition number, 136
Cramers rule, 29
cross-product operator, 141
derivative, multivariable
gradient, 35
Hessian, 37
Jacobian, 36
derivative, univariable function
definition, 32
Table of properties, 33
determinant
definition, 13
block matrix formulas, 30
Table of properties, 25
via row/column expansion, 13
diagonalizable, 117
diagonally dominant, 61, 143
elementary row/column operators, 11
exponential, 32, 253
Fourier, 43
grammian, 71
Hadamard product, 12
Hermitian/skew-Hermitian, 6
Hessenberg, 652
algorithm based on Householder operations,
653
idempotent, 42, 104
ill-conditioned, 137
integral, univariable function
definition, 32
Table of properties, 34
inverse
definition, 12
block matrix formulas, 30
Moore-Penrose generalized inverse, 128
of diagonal matrices, 27
of triangular matrices, 27
via adjugates, 14
Woodbury matrix formula, 28
Jordan canonical basis, 119
Kronecker product, 12
modal, 119
negative definite, 38
nilpotent, 43
nondefective, 117
normal, 116
norms, 135138
operations, algebraic, 1012
Table, 9
Table of properties, 19
operations, restructuring, 68
Table, 78
operators, 100107
affine, 106
permutation, 101
projection, 104
reflection (Householder), 103
rotation (Givens), 101
orthogonal, 101
permutation, 11, 101
positive definite, 38, 587
Sylvesters criterion, 588
projection, 42, 104
pseudo-inverse, 72
quadratic form, 18
gradient and Hessian, 39
rank, 56
redact, 8
reducible/irreducible, 598600
reshape, 6
semisimple, 117
sparse matrix, 3940
coordinate format, 40
Table of MATLAB commands, 40
spectrum, 109
square root, 132
stable, 124
submatrix, 7
symmetric/skew-symmetric, 6
unitary, 43, 101
vectorization, 6
Matrix diagonalization, 117118
Matrix iterative methods
conjugate gradient method, 78
Gauss-Seidel method, 68
GMRES method, 79
Jacobi method, 67
successive over-relaxation (SOR), 68
Matrix norms, 135138
Frobenius, 135
induced norm, 135
Matrix splitting
diakoptic method, 605
Schur complements method, 608
Matrix, functions of, 120124
evaluation
diagonalization, 120
finite sums, 122
Jordan decomposition, 121
Sylvesters theorem, 659
well-defined functions, 120
Maxwells equations, 222
Method of ghost points, 494
Method of lines
See (Finite difference method-timedependent-semidiscrete approach)
Michaelis-Menten kinetics, 272, 728
Molecular momentum flux, 217
Momentum balance, 217218
Multiply connected regions, 213, 801
Potential equation. See (Partial differential
equationsLaplace equation)
Principal axes, 163
Principal component analysis, 131132
Projection operator, 104106
Pursuit path, 266
QR decomposition, 77
algorithm, 649
Rarefaction. See (Shocks and rarefaction)
Rayleigh quotient, 143
Residues. See (Analytic functions)
Riemann invariants, 399
Riemann space, 199
Rodrigues formula (for Legendre polynomials), 362
Schur triangularization, 116117
algorithm, 658
Schwartz class (of functions), 830
Series solution (ODE)
around ordinary points, 352
around regular singular points
Frobenius series, 355
around regular singular points (Frobenius
method), 355
indicial equation and roots, 356
linear second-order ODE, Frobenius series, 357
ordinary point, 348
radius of convergence, 348
singular point, 348
Shallow water model, 482
Shocks and rarefaction, 771
break times, 772
jump notation, 778
Rankine-Hugoniot jump condition, 778
rarefaction, 780
Riemann problems, 779
shock-fitting scheme, 776
shock speed, 778
shocks/shock paths, 771
weak solutions, 774
Similarity transformation
in ODEs, 239
in PDEs, 440
of matrices, 113
Simply connected regions, 213, 801
Singular value decomposition (SVD), 127132
algorithm, 659
application to principal component analysis, 131
reduced (economical) SVD, 130
to obtain Gauss-Jordan factors, 594
to obtain polar decomposition, 133
Singular values, 127
Spectral radius, 66, 125, 515
Spring and dashpot system, 270
Stokes theorem, 210215
Streamlines, 169
potential flow around cylinders, 555
Sturm-Liouville systems
definition, 421
orthogonality of (eigen-)solutions, 423
Substantial time derivative, 183
surface integral, 678
Sylvester matrix equation, 24
Sylvester's criterion (for positive definite matrix), 588
Sylvesters theorem, 121, 659
Symmetry transformation
in ODEs, 238
in PDEs, 440
Taylor series expansion
definition, 572
linearization, 573
second-order approximation, 573
Tensors
See also (Vector (and tensors))
definition, n th order, 159
metric tensors (fundamental tensor), 199
operations, 162
stress tensor, 160, 218
Table, correspondence with matrix operations,
163
unit tensor, 160
Thermal conductivity coefficient, 220
Thermal diffusivity coefficient, 220
Torsion of path/radius of torsion, 167
Traffic flow (idealized model), 403
Unit vector
for matrices, 5
for vectors and tensors, 154
van der Pol equation, 332, 717
Vector and scalar fields, 169
Vector differential operation
curl, 176178
divergence, 174175
gradients, 170173
Laplacian, 178179
list of identities, 182
List, spherical and cylindrical, 194
miscellaneous operations, 179182
Vector field
Beltrami, 178, 183
complex lamellar, 178
gradient vector field, 170
irrotational, 177
solenoidal, 174
Vectors (and tensors)
acceleration vector, 167
basis unit vectors, 154
binormal unit vector, 166
derivative, 164165
List of identities, 164
dimension, 152
dyad, 157
linearly dependent/independent, 152
polyads, 159
position vector, 165
Vector field (cont.)
Table, correspondence with matrix operations,
163
Table, fundamental operations, 151
Table, operations based on unit vectors, 156
Table, properties, 153
traction vector, 160
unit normal vector (to curve), 166
unit normal vector (to surface), 168, 173