Numerical Methods and Methods of Approximation in Science and Engineering
Karan S. Surana
Department of Mechanical Engineering
The University of Kansas
Lawrence, Kansas
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
Preface xv
1 Introduction 1
1.1 Numerical Solutions . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Numerical Methods without any Approximation . . . 1
1.1.2 Numerical Methods with Approximations . . . . . . . 2
1.2 Accuracy of Numerical Solution, Error . . . . . . . . . . . . 2
1.3 Concept of Convergence . . . . . . . . . . . . . . . . . . . . . 3
1.4 Mathematical Models . . . . . . . . . . . . . . . . . . . . . . 4
1.5 A Brief Description of Topics and Methods . . . . . . . . . . 4
8.2.1 First Derivative df/dx at x = xi . . . . . . . . . . . . 349
8.2.2 Second Derivative d²f/dx² at x = xi : Central Difference Method . . . . . . . . . . . . 350
8.2.3 Third Derivative d³f/dx³ at x = xi . . . . . . . . . . . . 351
8.3 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . 354
BIBLIOGRAPHY 467
INDEX 471
Preface
a computer program.
In this book, all numerical methods are clearly grouped into two categories:
performing some numerical studies, and for bringing the original preliminary version of the manuscript of the book to a significant level of completion. Aaron's interest in the subject, hard work, and commitment to this book project were instrumental in the completion of the major portion of this book. Also my very sincere and special thanks to Mr. Dhaval Mysore, my current Ph.D. student, for completing the typing and typesetting of much of the newer material in Chapters 7 through 11. His interest in the subject, hard work, and commitment have helped in the completion of the final manuscript of this book. My sincere thanks to many of my colleagues in the mechanical engineering department at the University of Kansas, and in particular to my colleague and good friend Professor Peter TenPas, for valuable suggestions and many discussions that have helped me in improving the manuscript of the book.
This book contains so many equations, derivations, mathematical details, and tables of solutions that it is hardly possible to avoid some typographical and other errors. The author would be grateful to those readers who are willing to draw attention to the errors using the email [email protected].
1
Introduction
Remarks.
(a) For a given class of mathematical models, some methods of obtaining nu-
merical solutions may be numerical methods (no approximation), while
others may be methods of approximation. For example, if the math-
ematical model consists of a system of linear simultaneous algebraic
equations (Chapter 2), then methods like Gauss elimination, Gauss-
Jordan method, and Cramer’s rule for obtaining their solution are nu-
merical methods without any approximation, while Gauss-Seidel and
Jacobi methods are methods of approximation.
(i) If the true solution is known, the error can be measured as the difference
between the true solution and the calculated solution in the pointwise
sense, or if possible in the sense of L2 -norm.
(ii) When the theoretical solution is not known, as is the case with most
practical applications, we can possibly consider some of the following.
(a) We can attempt to estimate the error bounds. This provides the
least upper bound of the error in the solution, i.e., the true error
is less than or equal to the estimated error bound. In many cases
(but not always), this estimation of the error bound is possible.
(b) There are methods of approximation in which errors can be computed based on the current numerical solution without knowledge of the theoretical solution. The residual functional or L2-norms of residuals in the finite element methods with minimally conforming approximation spaces are examples of this approach. This approach is highly meritorious as it provides a quantitative measure of error in the computed solution without knowledge of the theoretical solution, hence can be used to compute errors in practical applications.
(c) There are methods in which the solution error can neither be es-
timated nor computed but there is some vague indication of im-
provement. Order of truncation errors in finite difference processes
fall under this category. With increasing order of truncation, the
solution errors are expected to reduce.
[Figure: flowchart for a linear physical system: linear mathematical model (BVP or IVP as examples) (A), discretization into linear algebraic equations (B), solution, error estimate or error computation, and a convergence check leading to the converged solution of (A).]

[Figure: flowchart for a non-linear physical system: non-linear mathematical model (BVP or IVP as examples) (A), discretization into linear algebraic equations (B), iterative solution procedure with its own convergence check, error computation or estimation, and a final convergence check leading to the approximate (converged) solution of (A).]
fi (x1 , x2 , . . . , xn ) = bi ; i = 1, 2, . . . , n (2.1)
(3) If the number of equations is large (large value of n in equation (2.1)), then the representation (2.2) is cumbersome, i.e., not very compact. We use matrix and vector notation to represent (2.2).
Definition 2.3 (Matrix). A matrix is an ordered rectangular (in general)
arrangement of elements and is generally denoted by a symbol. Thus, n × m
elements aij ; i = 1, 2, . . . , n; j = 1, 2, . . . , m can be represented by a symbol
[A] called the matrix A as follows:
[A] = [ a11  a12  ...  a1m ]
      [ a21  a22  ...  a2m ]
      [  .    .         .  ]        (2.4)
      [ an1  an2  ...  anm ]
The elements along each horizontal line are called rows whereas the elements
along each vertical line are called columns. Thus, the matrix [A] has n rows
and m columns. We refer to [A] as an n × m matrix. We identify each
element of [A] by row and column location. Thus, the element aij of [A] is
located at row i and column j. The first subscript in aij is the row location
and the second subscript is the column location. This is a standard notation
and is used throughout the book.
The notation (2.12) is helpful when expressing [I] in terms of its components
(Einstein notation). Thus δij is in fact the identity matrix expressed in
Einstein notation. If we consider the product of [A] and [I], then we can
write:
[A][I] = aij δjk = aik = [A] ; i, j, k = 1, 2, . . . , n (2.13)
Likewise:
[I][I] = δij δjk = δik = [I] (2.14)
[D] is defined by
[C](n×l) is defined by
We note that the number of columns in [A] must be the same as the number
of rows in [B], otherwise the product of [A] and [B] is not valid. Consider
[A] = [ a11  a12 ]        [B] = [ b11  b12 ]        (2.24)
      [ a21  a22 ]              [ b21  b22 ]
      [ a31  a32 ]
Then
[C] = [A][B] = [ a11  a12 ] [ b11  b12 ]   [ (a11 b11 + a12 b21)  (a11 b12 + a12 b22) ]
               [ a21  a22 ] [ b21  b22 ] = [ (a21 b11 + a22 b21)  (a21 b12 + a22 b22) ]        (2.25)
               [ a31  a32 ]                [ (a31 b11 + a32 b21)  (a31 b12 + a32 b22) ]
We note that [A](n×n) [I](n×n) = [I](n×n) [A](n×n) = [A](n×n) .
Distributive Property:
The sum of [A] and [B] multiplied with [C] is the same as [A] multiplied with [C] plus [B] multiplied with [C], i.e., ([A] + [B])[C] = [A][C] + [B][C].
Commutative Property:
The product of [A] and [B] is not the same as product of [B] and [A].
Thus, in taking the product of [A] and [B], their positions cannot be changed.
A singular matrix is one for which its inverse does not exist. The inverse is
only defined for a square matrix.
then
[A]^T = [ a11  a21 ]
        [ a12  a22 ]        (2.32)
        [ a13  a23 ]  (3×2)
Row one of [A] is the same as column one of [A]T . Likewise row one of
[A]T is the same as column one of [A] and so on. That is, rows of [A] are
same as columns of [A]T and vice versa.
then
[A]^T(m×1) = { a1 }
             { a2 }        (2.34)
             {  . }
             { am }  (m×1)
Transpose of a Vector:
If {A} is a vector defined by
{A}(n×1) = { a1 }
           { a2 }        (2.35)
           {  . }
           { an }  (n×1)
then
{A}^T(1×n) = [ a1  a2  ...  an ]  (1×n)        (2.36)
That is, the transpose of a column vector is a row matrix.
Likewise:
([A][B][C])^T = [C]^T [B]^T [A]^T        (2.38)
and
([A](m×n) {c}(n×1))^T = {c}^T [A]^T  (1×m)        (2.39)
Thus, the transpose of the product of matrices is the product of their trans-
poses in reverse order.
or
[A]^T = −[A]        (2.43)
Therefore, we have:
([A][B])^T = [B]^T [A]^T = −[B][A]        (2.45)
Likewise:
([B][A])^T = [A]^T [B]^T = [A](−[B]) = −[A][B]        (2.46)
One can conclude from this that the product of a symmetric matrix and a
skew-symmetric matrix is a skew-symmetric matrix.
and strictly greater than zero, and the associated eigenvectors are real (see
Chapter 4).
for some square matrix [B]. Neither [A] nor [B] are necessarily symmetric.
When [A] is not symmetric, [B] and [B]∗ are complex. If [B] is not complex,
then [B]∗ = [B]T . This is only ensured if [A] is symmetric. Thus, if [A] is
symmetric then [B] is also symmetric and in this case (see Chapter 4 for
proof):
{x}^T [A]{x} = {x}^T [B]^T [B]{x} ≥ 0   ∀ {x} ≠ {0}        (2.51)
and
{x}^T [A]{x} = 0   for some {x} ≠ {0}        (2.52)
when {x}i and {x}j are orthogonal with respect to [I]. Thus, when (2.56)
holds, so does (2.53). We note that (2.56) is a special case of (2.55) with
[M ] = [I].
Consider
[Aag] = [ a11  a12  a13 | b1 ]
        [ a12  a22  a23 | b2 ]        (2.64)
        [ a13  a23  a33 | b3 ]
[Aag] in this case is the (3 × 4) matrix defined by augmenting [A] by a vector whose components are b1, b2, and b3.
Definition 2.25 (Linear Dependence and Independence of Rows).
If a row of a matrix can be generated by a linear combination of the other
rows of the matrix, then this row is called linearly dependent. Otherwise,
the row is called linearly independent.
Definition 2.26 (Linear Dependence and Independence of Columns).
If a column of a matrix can be generated by a linear combination of the other
columns of the matrix, then this column is called linearly dependent. Oth-
erwise, the column is called linearly independent.
Definition 2.27 (Rank of a Matrix). The rank of a square matrix is the
number of linearly independent rows or columns. In a (n × n) square matrix,
if all rows and all columns are linearly independent, then n is the rank of
the matrix.
Definition 2.28 (Rank Deficient Matrix). In a rank deficient (n × n)
square matrix, there is at least one row and one column that can be expressed
as a linear combination of the other rows and columns. Thus, in a (n × n)
matrix of rank (n−m) there are m rows and columns that can be expressed as
linear combinations of the others. In such matrices, a reduced (n−m×n−m)
matrix can be formed by removing the linearly dependent rows and columns
that would have a rank of (n − m).
The determinant is only defined for a square matrix. Obviously, the cal-
culation of det[A] is facilitated by choosing a row or a column containing
zeros.
Find |A|.
Solution:
Determine |A| using the first row of [A].
(i) The minors m11 and m12 of a11 and a12 are given by
(ii) The cofactors of a11 and a12 are given by the signed minors of a11 and
a12 .
Find |A|.
Solution:
Determine |A| using the first row of [A].
(i) Minors m11 , m12 , and m13 of a11 , a12 , and a13 are given by
(iii)
|A| = a11 ā11 + a12 ā12 + a13 ā13
Substituting for ā11, ā12, and ā13:
|A| = a11 (1) | a22  a23 | + a12 (−1) | a21  a23 | + a13 (1) | a21  a22 |
              | a32  a33 |            | a31  a33 |           | a31  a32 |
Similarly:
| a21  a23 | = a21 a33 − a23 a31
| a31  a33 |
| a21  a22 | = a21 a32 − a22 a31
| a31  a32 |
|A| = a11 (a22 a33 − a23 a32 ) − a12 (a21 a33 − a23 a31 ) + a13 (a21 a32 − a22 a31 )
In matrix [A], row two is identical to row one. It can be shown that if [A] is a square matrix (n × n) and if any two rows are the same, then |A| = 0, regardless of n.
fi (x1 , x2 , . . . , xn ) = bi i = 1, 2, . . . , n (2.70)
in which
[A] = [ a11  a12  ...  a1n ]     {b} = { b1 }     {x} = { x1 }
      [ a21  a22  ...  a2n ]           { b2 }           { x2 }
      [  .    .         .  ]           {  . }           {  . }        (2.73)
      [ an1  an2  ...  ann ]           { bn }           { xn }
The matrix [A] is called the coefficient matrix, {b} is called the right-hand
side or non-homogeneous part, and {x} is a vector of unknowns to be deter-
mined such that (2.72) holds. Sometimes we augment [A] by {b} by including
it as (n + 1)th column in [A]. Thus augmented matrix [Aag ] would be:
[Aag] = [ a11  a12  ...  a1n | b1 ]
        [ a21  a22  ...  a2n | b2 ]
        [  .    .         .  |  . ]        (2.74)
        [ an1  an2  ...  ann | bn ]
We rewrite (2.76) by solving each equation for y, i.e., by dividing the first equation by a12 and the second equation by a22 (provided a12 ≠ 0 and a22 ≠ 0).
y = (−a11/a12) x + (b1/a12)
y = (−a21/a22) x + (b2/a22)        (2.77)
If we define
m1 = (−a11/a12)     m2 = (−a21/a22)
c1 = (b1/a12)       c2 = (b2/a22)        (2.78)
then (2.77) can be written as
y = m1 x + c1
y = m2 x + c2        (2.79)
Remarks.
(1) When the determinant of the coefficient matrix in (2.76), i.e., (a11 a22 − a12 a21), is not equal to zero, the intersection of the straight lines is distinct and we clearly have a unique solution (as shown in Figure 2.1).
a11 x + a12 y = b1
a11 x + a12 y = b2        (2.80)
[Figure 2.1: Graphical method of obtaining solution of two linear simultaneous equations; the lines y = m1 x + c1 and y = m2 x + c2 intersect at a distinct point.]
or
y = (−a11/a12) x + (b1/a12)
y = (−a11/a12) x + (b2/a12)        (2.81)
Let
m1 = −a11/a12     c1 = b1/a12     c2 = b2/a12        (2.82)
Hence, (2.81) can be written as
y = m1 x + c1
y = m1 x + c2        (2.83)
Equations (2.83) are the equations of straight lines that are parallel.
Parallel lines have the same slopes but different intercepts, thus these
will never intersect. In this case we obviously cannot find a solution
(x, y) of (2.83). We also note that the determinant of the coefficient
matrix of (2.80) is zero. In (2.80) row one of the coefficient matrix is
the same as row two and the columns are multiples of each other. This
system of equations (2.80) is rank deficient. Figure 2.2 shows plots of
(2.83).
(3) Consider a case in which column two of the coefficient matrix in (2.76)
is a multiple of column one, i.e., for a scalar s we have
a12 = sa11
a22 = sa21 (2.84)
[Figure 2.3: Infinitely many solutions when two equations are identical; the lines y = m1 x + c1 and y = m1 x + c2 coincide (c1 = c2).]
(5) Consider system of equations (2.76). It could happen that the coeffi-
cients aij ; i, j = 1, 2 are such that the determinant of the coefficient
matrix (a11 a22 − a12 a21 ) may not be zero but may be close to zero. In
this case the straight lines defined by the two equations in (2.76) do have
an intersection but their intersection may not be distinct (Figure 2.4).
(7) We clearly see that for n > 3, the graphical approach is difficult and
impractical. However, the graphical approach gives deeper insight into
the meaning of the solutions of linear simultaneous equations.
Then
x1 = (1/|A|) | b1  a12  a13 |
             | b2  a22  a23 |
             | b3  a32  a33 |

x2 = (1/|A|) | a11  b1  a13 |
             | a21  b2  a23 |        (2.91)
             | a31  b3  a33 |

x3 = (1/|A|) | a11  a12  b1 |
             | a21  a22  b2 |
             | a31  a32  b3 |
Thus, to calculate x1, we replace the first column of [A] by {b}, then divide its determinant by the determinant of [A]. For x2 and x3 we use the second and third columns of [A] with {b} respectively, with the rest of the procedure remaining the same as for x1.
Remarks.
(1) If det[A] is zero then xj ; j = 1, 2, 3 involve division by zero, i.e., they are not defined.
(2) When n is large, calculation of determinants is tedious and time consuming. Hence, this method is not preferred for large systems of linear simultaneous algebraic equations. However, unlike the graphical method, this method can be used for n ≥ 3.
x1 + x2 + x3 = 6
0.1x1 + x2 + 0.2x3 = 2.7
x1 + 0.2x2 + x3 = 4.4
In this case
[A] = [ 1    1    1   ]     {x} = { x1 }     {b} = {  6  }
      [ 0.1  1    0.2 ]           { x2 }           { 2.7 }
      [ 1    0.2  1   ]           { x3 }           { 4.4 }
in
[A]{x} = {b}
We use Cramer's rule to obtain the solution {x}. Following (2.91):
det[A] = | 1    1    1   |
         | 0.1  1    0.2 | = 0.08
         | 1    0.2  1   |
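The computation above can be sketched in a few lines of Python; this is an illustrative sketch only (the function names are not from the text), using cofactor expansion for the determinants as in (2.91). As Remark (2) below notes, this approach is not recommended for large n.

    def det(A):
        # determinant by cofactor expansion along the first row
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0.0
        for j in range(n):
            minor = [row[:j] + row[j+1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det(minor)
        return total

    def cramer(A, b):
        # x_j = det(A with column j replaced by b) / det(A), per (2.91)
        d = det(A)
        x = []
        for j in range(len(b)):
            Aj = [row[:] for row in A]
            for i in range(len(b)):
                Aj[i][j] = b[i]
            x.append(det(Aj) / d)
        return x

    A = [[1.0, 1.0, 1.0], [0.1, 1.0, 0.2], [1.0, 0.2, 1.0]]
    b = [6.0, 2.7, 4.4]
    print(cramer(A, b))   # expected [1.0, 2.0, 3.0]; det(A) = 0.08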
upper triangular, without switching rows or columns. The row and column
locations in [Aag ] are preserved during the elimination process.
Consider (2.92) with n = 3, i.e., three linear simultaneous algebraic equa-
tions in three unknowns: x1 , x2 , and x3 . [A]{x} = {b} is given by:
[ a11  a12  a13 ] { x1 }   { b1 }
[ a21  a22  a23 ] { x2 } = { b2 }        (2.93)
[ a31  a32  a33 ] { x3 }   { b3 }
In (2.96), we note that all elements below the diagonal are zero in [A], i.e., [A] in (2.96) is in upper triangular form. This is the main objective of elimination.
x3 = b″3 / a″33        (2.98)
In this case a″33 is the pivot element. Next we can use the second equation in (2.97) to solve for x2, as x3 is already known.
x2 = (b′2 − a′23 x3) / a′22        (2.99)
Now using the first equation in (2.97), we can solve for x1 as x2 and x3 are already known.
x1 = (b1 − a12 x2 − a13 x3) / a11        (2.100)
Thus, the complete solution [x1 x2 x3]^T is known.
Remarks.
(1) The elements a11, a′22, and a″33 are pivots that are used to divide other coefficients. These cannot be zero, otherwise this method will fail.
(2) In this method we maintain the positions of rows and columns in the augmented matrix, i.e., we do not perform row and column interchanges even if zero pivots are encountered, hence the name naive Gauss elimination.
(3) It is a two-step process: in the first step we make the matrix [A] in the augmented form upper triangular using elementary row operations. In the second step we use back substitution to obtain the solution {x}.
(4) When a solution is required for more than one {b} (i.e., more than one
right side), then the matrix [A] can be augmented by all of the right side
vectors before performing elementary row operations to make [A] upper
triangular. As an example consider (2.92), a (3 × 3) system in which we
desire solutions {x} for {b} = {p} and {b} = {q}.
{b} = {p} = { p1 }        {b} = {q} = { q1 }
            { p2 }                    { q2 }        (2.101)
            { p3 }                    { q3 }
and
Now we can use back substitution for (2.104) and (2.105) to find solutions
for {x} for {b} = {p} and {b} = {q}.
x1 + x2 + x3 = 6
0.1x1 + x2 + 0.2x3 = 2.7
x1 + 0.2x2 + x3 = 4.4
where
[A] = [ 1    1    1   ]     {x} = { x1 }     {b} = {  6  }
      [ 0.1  1    0.2 ]           { x2 }           { 2.7 }
      [ 1    0.2  1   ]           { x3 }           { 4.4 }
Make column one in [Aag ] upper triangular by using the elementary row
operations shown below.
                     [ 1    1    1   |  6   ]
R2 − (0.1/1) R1      [ 0    0.9  0.1 |  2.1 ]
R3 − (1/1) R1        [ 0   −0.8  0   | −1.6 ]
Next we make column two in the modified [Aag] upper triangular using the elementary row operations shown below.
                     [ 1    1    1                  |  6                    ]
                     [ 0    0.9  0.1                |  2.1                  ]
R3 − (−0.8/0.9) R2   [ 0    0    0 + (0.8/0.9)(0.1) | −1.6 + (0.8/0.9)(2.1) ]
which simplifies to
[ 1    1    1     |  6    ]
[ 0    0.9  0.1   |  2.1  ]
[ 0    0    0.8/9 |  0.8/3 ]
Back Substitution
From the third row in the upper triangular form:
(0.8/9) x3 = 0.8/3
∴ x3 = 3
From the second row, 0.9 x2 + 0.1 x3 = 2.1, so with x3 = 3:
∴ x2 = 2
From the first row:
x1 = 6 − x2 − x3 = 6 − 2 − 3 = 1
Hence,
{x} = { x1 }   { 1 }
      { x2 } = { 2 }
      { x3 }   { 3 }
The solution {x} is the same as that obtained using Cramer’s rule.
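A minimal Python sketch of the two-step process just illustrated (forward elimination on the augmented matrix followed by back substitution), assuming no zero pivots are encountered; the function name is illustrative only.

    def naive_gauss(A, b):
        n = len(b)
        # form the augmented matrix [A | b]
        aug = [A[i][:] + [b[i]] for i in range(n)]
        # forward elimination: make [A] upper triangular
        for k in range(n - 1):
            for i in range(k + 1, n):
                factor = aug[i][k] / aug[k][k]      # fails if the pivot is zero
                for j in range(k, n + 1):
                    aug[i][j] -= factor * aug[k][j]
        # back substitution
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (aug[i][n] - s) / aug[i][i]
        return x

    A = [[1.0, 1.0, 1.0], [0.1, 1.0, 0.2], [1.0, 0.2, 1.0]]
    b = [6.0, 2.7, 4.4]
    print(naive_gauss(A, b))   # approximately [1.0, 2.0, 3.0]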
In some cases the coefficients in the system of equations (2.106) may be such that a11 = 0 even though the system of equations (2.106) does have a unique solution. In this case the naive Gauss elimination method will fail due to the fact that we must divide by the pivot a11. In such situations we can employ partial pivoting that helps in avoiding zero pivots. This procedure involves the interchange of rows for a column under consideration during upper triangulation such that the largest element (absolute value) in this column becomes the pivot. This is followed by the upper triangulation for the column under consideration. This procedure is continued for subsequent columns, keeping in mind that the columns (and corresponding rows) that are already in upper triangular form are exempted, i.e., are not considered in the subsequent pivot search.
x1 + x2 + x3 = 6
8x1 + 1.6x2 + 8x3 = 35.2
0.1x1 + x2 + 0.2x3 = 2.7
We want to make column one upper triangular by using the largest element,
i.e., 8, as the pivot. This requires that we interchange rows one and two in
the augmented matrix.
R1 ↔ R2     [ 8    1.6  8   | 35.2 ]
            [ 1    1    1   |  6   ]
            [ 0.1  1    0.2 |  2.7 ]
We then make column one upper triangular by using elementary row oper-
ations.
                     [ 8    1.6   8   | 35.2  ]
R2 − (1/8) R1        [ 0    0.8   0   |  1.6  ]
R3 − (0.1/8) R1      [ 0    0.98  0.1 |  2.26 ]
Next we consider column two. The elements on the diagonal and below the
diagonal in column two are 0.8 and 0.98. We want to use 0.98 as the pivot
(the larger of the two). This requires that we interchange rows two and
three.
R2 ↔ R3     [ 8    1.6   8   | 35.2  ]
            [ 0    0.98  0.1 |  2.26 ]
            [ 0    0.8   0   |  1.6  ]
We now make column two upper triangular by elementary row operations.
                       [ 8    1.6   8   | 35.2  ]
                       [ 0    0.98  0.1 |  2.26 ]
R3 − (0.8/0.98) R2     [ 0    0.8 − (0.8/0.98)(0.98)   0 − (0.8/0.98)(0.1)   1.6 − (0.8/0.98)(2.26) ]
which after simplification becomes:
[ 8    1.6   8       | 35.2    ]
[ 0    0.98  0.1     |  2.26   ]
[ 0    0    −0.0816  | −0.2449 ]
This augmented form contains the final upper triangular form of [A]. Now
we can find x3 , x2 , and x1 using back substitution. Using the last equation:
x3 = (−0.2448980) / (−0.0816327) = 3
Using the second equation with the known value of x3:
x2 = (2.26 − 0.1 x3) / 0.98 = (2.26 − 0.1(3)) / 0.98 = 1.96 / 0.98 = 2
Now, we can find x1 using the first equation and the known values of x2 and x3:
x1 = (1/8)(35.2 − 1.6 x2 − 8 x3) = (1/8)(35.2 − 1.6(2) − 8(3)) = 8/8 = 1
Hence we have the solution {x}:
{x} = { x1 }   { 1 }
      { x2 } = { 2 }
      { x3 }   { 3 }
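A Python sketch of Gauss elimination with partial pivoting, under the assumption that [A] is nonsingular; at each column the row with the largest magnitude entry on or below the diagonal is swapped into the pivot position, as described above. Names are illustrative.

    def gauss_partial_pivot(A, b):
        n = len(b)
        aug = [A[i][:] + [b[i]] for i in range(n)]
        for k in range(n - 1):
            # choose as pivot the largest |a_ik| for i >= k, and swap rows
            p = max(range(k, n), key=lambda i: abs(aug[i][k]))
            aug[k], aug[p] = aug[p], aug[k]
            for i in range(k + 1, n):
                factor = aug[i][k] / aug[k][k]
                for j in range(k, n + 1):
                    aug[i][j] -= factor * aug[k][j]
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (aug[i][n] - s) / aug[i][i]
        return x

    A = [[1.0, 1.0, 1.0], [8.0, 1.6, 8.0], [0.1, 1.0, 0.2]]
    b = [6.0, 35.2, 2.7]
    print(gauss_partial_pivot(A, b))   # approximately [1.0, 2.0, 3.0]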
2. Search the entire matrix [A] for the element with the largest magnitude
(absolute value).
5. Next consider column two of the reduced sub-matrix without row one
and column one. Search for the element with largest magnitude (absolute
value) in this sub-matrix. Perform row and column interchanges in the
augmented matrix (with column one in upper triangular form) so that the
element with the largest magnitude is the pivot for column two. Make
column two upper triangular by elementary row operations.
6. We continue this procedure for the remaining columns until [Aag ] becomes
upper triangular.
7. The solution {x} is then calculated using the upper triangular form of [A], obtaining the components of {x} in reverse order, i.e., xn, xn−1, . . . , x1.
Remarks.
(3) It is important to note that row interchanges do not affect the order of the variables in vector {x}, but column interchanges require that we also interchange the corresponding unknowns in {x}, i.e., keep track of the new ordering of the variables.
Consider the (2 × 2) sub-matrix (i.e., a022 , a023 , a032 , and a033 ). The element
with the largest magnitude is 2.9 (a023 ). We want 2.9 to be pivot, i.e., at
location (2,2). This could be done in two ways:
(i) First interchange row two with row three and then interchange columns
two and three.
(ii) Alternatively, first interchange columns two and three and then inter-
change rows two and three.
Regardless of whether we choose (i) or (ii), the end result is the same. In
the following we consider (i).
Interchange rows two and three.
              x1    x2    x3
            [ 8    1.6   8    | 35.2  ]
            [ 0    0.98  2.9  | 10.66 ]
            [ 0    0.8   0    |  4.6  ]
Now interchange columns two and three. In doing so, we should also interchange x2 and x3 respectively.
              x1    x3    x2
            [ 8    8     1.6  | 35.2  ]
            [ 0    2.9   0.98 | 10.66 ]
            [ 0    0     0.8  |  4.6  ]
Since a032 is already zero, column two is already in upper triangular form,
hence no elementary row operations are required. This system of equations
are in the desired upper triangular form. Now, we can use back substitution
to calculate x2 , x3 , and x1 (in this order).
Consider the last equation from which we can calculate x2.
x2 = 4.6 / 0.8 = 5.75
Using the second equation and the known value of x2 we can calculate x3.
x3 = (10.66 − 0.98 x2) / 2.9 = (10.66 − 0.98(5.75)) / 2.9 = 1.73
Now using the first equation, we can calculate x1.
x1 = (35.2 − 8 x3 − 1.6 x2) / 8 = (35.2 − 8(1.73) − 1.6(5.75)) / 8 = 1.52
Hence
{x} = { x1 }   { 1.52 }
      { x2 } = { 5.75 }
      { x3 }   { 1.73 }
Comparing (2.115) with (2.117) suggests that if we augment [A] by {b} and then make [A] an identity matrix by elementary row operations, the modified {b} will be the solution {x}.
Consider a general (3 × 3) system of linear simultaneous algebraic equa-
tions.
Augment the coefficient matrix [A] of the coefficients aij by the right side
vector {b}.
[Aag] = [ a11  a12  a13 | b1 ]
        [ a21  a22  a23 | b2 ]        (2.120)
        [ a31  a32  a33 | b3 ]
In the first step our goal is to make a11 unity and to bring column one into upper triangular form using elementary row operations. First, we make a11 unity.
R1 → R1/a11     [ 1    a′12  a′13 | b′1 ]
                [ a21  a22   a23  | b2  ]        (2.121)
                [ a31  a32   a33  | b3  ]
Make column one in (2.121) upper triangular using elementary row operations.
                [ 1    a′12  a′13 | b′1 ]
R2 − a21 R1     [ 0    a′22  a′23 | b′2 ]        (2.122)
R3 − a31 R1     [ 0    a′32  a′33 | b′3 ]
Next, divide row two by a′22 so that the diagonal element in column two becomes unity (equation (2.123)).
We make the elements of column two below the diagonal zero by using row two and elementary row operations in (2.123).
                [ 1    a′12  a′13 | b′1 ]
                [ 0    1     a″23 | b″2 ]        (2.124)
R3 − a′32 R2    [ 0    0     a″33 | b″3 ]
Next, divide row three by a″33 so that the diagonal element in column three becomes unity (equation (2.125)). Make the elements of column three in (2.125) above the diagonal zero by using row three and elementary row operations.
                [ 1    a′12  0 | b‴1 ]
R2 − a″23 R3    [ 0    1     0 | b‴2 ]        (2.126)
                [ 0    0     1 | b‴3 ]
Lastly, make the elements of column two in (2.126) above the diagonal zero using row two and elementary row operations.
R1 − a′12 R2    [ 1  0  0 | b‴1 ]
                [ 0  1  0 | b‴2 ]        (2.127)
                [ 0  0  1 | b‴3 ]
In (2.127), the vector [b‴1  b‴2  b‴3]^T is the solution vector {x} = [x1  x2  x3]^T.
Remarks.
(1) If the solution of (2.115) is required for more than one right-hand side,
then [A] in (2.121) must be augmented by all right-hand sides before
making [A] identity. When [A] becomes identity, the locations of the
right-hand side column vectors contain solutions for them.
or
[ 1  0  0 | 1 ]
[ 0  1  0 | 2 ]        (2.134)
[ 0  0  1 | 3 ]
The vector in the location of {b} in (2.134) is the solution vector {x}. Thus:
{x} = { x1 }   { 1 }
      { x2 } = { 2 }        (2.135)
      { x3 }   { 3 }
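A Python sketch of the Gauss-Jordan procedure applied to the augmented matrix, assuming nonzero pivots (no pivoting is performed, as in the description above); the function name is illustrative.

    def gauss_jordan(A, b):
        n = len(b)
        aug = [A[i][:] + [b[i]] for i in range(n)]
        for k in range(n):
            # normalize the pivot row so the diagonal entry becomes unity
            pivot = aug[k][k]                      # assumed nonzero
            aug[k] = [v / pivot for v in aug[k]]
            # zero all of column k except the pivot row
            for i in range(n):
                if i != k:
                    factor = aug[i][k]
                    aug[i] = [aug[i][j] - factor * aug[k][j] for j in range(n + 1)]
        return [aug[i][n] for i in range(n)]       # [A] is now [I]; last column is {x}

    A = [[1.0, 1.0, 1.0], [0.1, 1.0, 0.2], [1.0, 0.2, 1.0]]
    b = [6.0, 2.7, 4.4]
    print(gauss_jordan(A, b))   # approximately [1.0, 2.0, 3.0]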
Consider
[A]{x} = {b} (2.136)
The rules for determining the coefficients of [L] and [U ] are established by
forming the product [L][U ] and equating the elements of the product to the
corresponding elements of [A]. We present details in the following. Consider
First Row of [U ]:
To determine the first row of [U ] we equate coefficients in the first row
on both sides of (2.139).
That is, the first row of [U ] is the same as the first row of the coefficient
matrix [A].
Second Row of [U ]:
Equate the coefficients of the second row on both sides of (2.139).
Third Row of [U ]:
Equate coefficients of the third row on both sides of (2.139).
Remarks.
(1) The coefficients of [U] and [L] can be expressed more compactly as follows:

u1j = a1j                                          j = 1, 2, . . . , n   (i = 1)

Li1 = ai1 / u11                                    i = 2, 3, . . . , n   (j = 1)

uij = aij − Σ (k = 1 to i−1) Lik ukj               i = 2, 3, . . . , n ;  j = i, i+1, . . . , n   (for each value of i)        (2.147)

Lij = ( aij − Σ (k = 1 to j−1) Lik ukj ) / ujj     j = 2, 3, . . . , n ;  i = j+1, . . . , n

Using n = 4 in (2.147) we can obtain (2.140) – (2.146). The form in (2.147) is helpful in programming [L][U] decomposition.
(2) We can economize in the storage of the coefficients of [L] and [U].
(i) There is no need to store zeros in either [L] or [U].
(ii) Ones on the diagonal of [L] do not need to be stored either.
(iii) A closer examination of the expressions for the coefficients of [L] and [U] shows that once the elements aij of [A] are used, they do not appear again in the further calculations of the coefficients of [L] and [U].
(iv) Thus we can store the coefficients of [L] and [U] in the same storage space as [A]:
uij stored in the same locations as aij ;   i = 1, 2, . . . , n ;   j = i, i+1, . . . , n (for each i)
Lij stored in the same locations as aij ;   j = 1, 2, . . . , n ;   i = j+1, . . . , n
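A Python sketch of [L][U] decomposition following the compact formulas (2.147), together with the forward and backward passes; the names are illustrative, and no pivoting or zero-pivot protection is included.

    def lu_doolittle(A):
        # [L] carries the unit diagonal; formulas follow (2.147)
        n = len(A)
        L = [[0.0] * n for _ in range(n)]
        U = [[0.0] * n for _ in range(n)]
        for i in range(n):
            L[i][i] = 1.0
        for i in range(n):
            for j in range(i, n):       # row i of [U]
                U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
            for r in range(i + 1, n):   # column i of [L]
                L[r][i] = (A[r][i] - sum(L[r][k] * U[k][i] for k in range(i))) / U[i][i]
        return L, U

    def lu_solve(L, U, b):
        n = len(b)
        y = [0.0] * n                   # forward pass: [L]{y} = {b}
        for i in range(n):
            y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
        x = [0.0] * n                   # backward pass: [U]{x} = {y}
        for i in range(n - 1, -1, -1):
            x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
        return x

    A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
    b = [7.85, -19.3, 71.4]
    L, U = lu_doolittle(A)
    print(lu_solve(L, U, b))   # approximately [3.0, -2.5, 7.0]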
[A]{x} = {b}
in which
[A] = [ 3    −0.1  −0.2 ]     {x} = { x1 }     {b} = {  7.85 }
      [ 0.1   7    −0.3 ]           { x2 }           { −19.3 }
      [ 0.3  −0.2  10   ]           { x3 }           {  71.4 }

[L] = [ 1       0        0 ]     [U] = [ 3   −0.1      −0.2     ]
      [ 0.0333  1        0 ]           [ 0    7.00333  −0.29333 ]
      [ 0.1    −0.02713  1 ]           [ 0    0         10.012  ]
Let
[U]{x} = {y}
∴ [L]{y} = {b}
[ 1       0        0 ] { y1 }   {  7.85 }
[ 0.0333  1        0 ] { y2 } = { −19.3 }
[ 0.1    −0.02713  1 ] { y3 }   {  71.4 }
y1 = 7.85
y2 = −19.3 − (0.0333)(7.85) = −19.561405
y3 = 71.4 − (0.1)(7.85) − (−0.02713)(−19.561405) = 70.0843
Now we know {y}, hence we can use [U]{x} = {y} to find {x} (backward pass).
[ 3   −0.1      −0.2     ] { x1 }   {   7.85    }
[ 0    7.00333  −0.29333 ] { x2 } = { −19.56125 }
[ 0    0         10.012  ] { x3 }   {  70.0843  }
The rules for determining the elements of [L] and [U ] are established by
forming the product [L][U ] and equating the elements of the product to the
corresponding elements of [A]. We present details in the following. Consider
[A] to be a (4 × 4) matrix. We equate [A] to the product of [L] and [U ].
[ a11  a12  a13  a14 ]   [ L11  0    0    0   ] [ 1  u12  u13  u14 ]
[ a21  a22  a23  a24 ] = [ L21  L22  0    0   ] [ 0  1    u23  u24 ]        (2.153)
[ a31  a32  a33  a34 ]   [ L31  L32  L33  0   ] [ 0  0    1    u34 ]
[ a41  a42  a43  a44 ]   [ L41  L42  L43  L44 ] [ 0  0    0    1   ]
To determine the elements of [L] and [U ], we form the product of [L] and
[U ] in (2.153).
[ a11  a12  a13  a14 ]   [ L11   L11 u12          L11 u13                   L11 u14                         ]
[ a21  a22  a23  a24 ] = [ L21   L21 u12 + L22    L21 u13 + L22 u23         L21 u14 + L22 u24               ]        (2.154)
[ a31  a32  a33  a34 ]   [ L31   L31 u12 + L32    L31 u13 + L32 u23 + L33   L31 u14 + L32 u24 + L33 u34     ]
[ a41  a42  a43  a44 ]   [ L41   L41 u12 + L42    L41 u13 + L42 u23 + L43   L41 u14 + L42 u24 + L43 u34 + L44 ]
First Row of [U]:
Equating elements of the first row on both sides of (2.154):
u1j = a1j / L11 ;   j = 1, 2, . . . , 4 (or n in general)        (2.156)
Second Column of [L]:
Equating elements of the second column on both sides of (2.154):
L22 = a22 − L21 u12
L32 = a32 − L31 u12        (2.157)
L42 = a42 − L41 u12
Second Row of [U]:
Equating elements of the second row on both sides of (2.154):
u23 = (a23 − L21 u13) / L22
u24 = (a24 − L21 u14) / L22        (2.158)
Third Row of [U ]:
Equating the elements of the third row in (2.154):
Remarks.
(1) The elements of [L] and [U ] can be expressed more compactly as follows:
Li1 = ai1                                          i = 1, 2, . . . , n   (j = 1 for this case)

u1j = a1j / L11                                    j = 2, 3, . . . , n   (i = 1 for this case)

Lij = aij − Σ (k = 1 to j−1) Lik ukj               j = 2, 3, . . . , n ;  i = j, j+1, . . . , n   (for each j)        (2.162)

uij = ( aij − Σ (k = 1 to i−1) Lik ukj ) / Lii     i = 2, 3, . . . , n ;  j = i+1, i+2, . . . , n
(2) Just like Cholesky decomposition, [L] and [U ] can be stored in the same
space that is used for [A], however [A] is obviously destroyed in this case.
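For comparison with the earlier sketch, a Python sketch of the Crout form following (2.162), in which [U] carries the unit diagonal; the names are illustrative and no zero-pivot protection is included.

    def lu_crout(A):
        # [U] has unit diagonal; formulas follow (2.162)
        n = len(A)
        L = [[0.0] * n for _ in range(n)]
        U = [[0.0] * n for _ in range(n)]
        for i in range(n):
            U[i][i] = 1.0
        for j in range(n):
            for i in range(j, n):       # column j of [L]
                L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
            for c in range(j + 1, n):   # row j of [U]
                U[j][c] = (A[j][c] - sum(L[j][k] * U[k][c] for k in range(j))) / L[j][j]
        return L, U

    A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
    L, U = lu_crout(A)
    print(L[1][1], U[1][2])   # about 7.003333 and -0.0418848, as in the example below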
[A]{x} = {b}
in which
[A] = [ 3    −0.1  −0.2 ]     {x} = { x1 }     {b} = {  7.85 }
      [ 0.1   7    −0.3 ]           { x2 }           { −19.3 }
      [ 0.3  −0.2  10   ]           { x3 }           {  71.4 }
L11 = a11 = 3
L21 = a21 = 0.1
L31 = a31 = 0.3
First Row of [U]:
u12 = a12 / L11 = −0.1 / 3 = −0.033333
u13 = a13 / L11 = −0.2 / 3 = −0.066667
Second Column of [L]:
Second Row of [U]:
u23 = (a23 − L21 u13) / L22 = (−0.3 − 0.1(−0.066667)) / 7.003333
u23 = −0.0418848
[A] = [L][U] = [ 3     0         0         ] [ 1  −0.033333  −0.066667  ]
               [ 0.1   7.003333  0         ] [ 0   1         −0.0418848 ]
               [ 0.3  −0.19      10.012042 ] [ 0   0          1         ]
By taking the product of [L] and [U] we recover [A].
∴ [L]{y} = {b}
Using the forward pass, calculate {y} using [L]{y} = {b}.
[ 3     0         0         ] { y1 }   {  7.85 }
[ 0.1   7.003333  0         ] { y2 } = { −19.3 }
[ 0.3  −0.19      10.012042 ] { y3 }   {  71.4 }
y1 = 7.85 / 3 = 2.616667
y2 = (−19.3 − (0.1)(2.616667)) / 7.003333 = −2.7931936
y3 = (71.4 − (0.3)(2.616667) − (−0.19)(−2.7931936)) / 10.012042 = 7
∴ {y} = { y1 }   {  2.616667  }
        { y2 } = { −2.7931936 }
        { y3 }   {  7         }
Consider
[A]{x} = {b} (2.163)
When making the first column in (2.164) upper triangular, we need to make a21 and a31 zero by elementary row operations. In this process we multiply row one by a21/a11 = C21 and by a31/a11 = C31 and then subtract these from rows two and three of (2.164). This results in
[ a11  a12   a13  ] { x1 }   { b1  }
[ 0    a′22  a′23 ] { x2 } = { b′2 }        (2.165)
[ 0    a′32  a′33 ] { x3 }   { b′3 }
To make the second column upper triangular we multiply the second row in (2.165) by a′32/a′22 = C32 and subtract it from row three of (2.165).
[ a11  a12   a13  ] { x1 }   { b1  }
[ 0    a′22  a′23 ] { x2 } = { b′2 }        (2.166)
[ 0    0     a″33 ] { x3 }   { b″3 }
or
[ 1    0    0 ] [ a11  a12   a13  ] { x1 }   { b1 }
[ C21  1    0 ] [ 0    a′22  a′23 ] { x2 } = { b2 }        (2.168)
[ C31  C32  1 ] [ 0    0     a″33 ] { x3 }   { b3 }
By using C21 = a21/a11, C31 = a31/a11, and C32 = a′32/a′22 in (2.168) and by carrying out the product of [L] and [U] in (2.168), the matrix [A] is recovered.
C32 = a′32 / a′22 = −0.19 / 7.003333 = −0.027130        (2.171)
The new upper triangular form of [A] is in fact [U] and is given by:
[U] = [ 3   −0.1       −0.2      ]
      [ 0    7.003333  −0.293333 ]        (2.172)
      [ 0    0          10.012   ]
and
[L] = [ 1    0    0 ]   [ 1    0    0 ]   [ 1       0        0 ]
      [ L21  1    0 ] = [ C21  1    0 ] = [ 0.0333  1        0 ]        (2.173)
      [ L31  L32  1 ]   [ C31  C32  1 ]   [ 0.1    −0.02713  1 ]
We can check that the product of [L] in (2.173) and [U ] in (2.172) is in fact
[A].
= [ (L̃11)²          (L̃11)(L̃21)                 (L̃11)(L̃31)                  ]
  [ (L̃21)(L̃11)     (L̃21)² + (L̃22)²            (L̃21)(L̃31) + (L̃22)(L̃32)     ]        (2.176)
  [ (L̃31)(L̃11)     (L̃31)(L̃21) + (L̃32)(L̃22)    (L̃31)² + (L̃32)² + (L̃33)²    ]
Using (2.188):
[I]{x} = [A]−1 {b} (2.190)
or
{x} = [A]−1 {b} (2.191)
Thus, if we can find [A]−1 then the solution {x} of (2.187) can be obtained using (2.191).
4. Find the cofactor matrix of [A], i.e., [Ā], by using cofactors āij ; i, j = 1, 2, . . . , n.
[Ā] = [ ā11  ā12  ...  ā1n ]
      [ ā21  ā22  ...  ā2n ]        (2.193)
      [  .    .         .  ]
      [ ān1  ān2  ...  ānn ]
[ [A]  [I]  {b} ]        (2.200)
Using the Gauss-Jordan method, when [A] is made identity using elementary row operations, we have:
[ [I]  [A]−1  {x} ]        (2.201)
The location of [I] in (2.200) contains [A]−1 and the location of {b} in (2.200) contains the solution vector {x}.
Let
[B] = [A]−1 (2.203)
To obtain the first column of [B] we solve the following system of equations:
[L][U] { b11 }   { 1 }
       { b21 }   { 0 }
       {  .  } = { . }        (2.204)
       { bi1 }   { . }
       {  .  }   { . }
       { bn1 }   { 0 }
Here {b11, b21, . . . , bn1} is the first column of [B] and the right-hand side is the first column of [I], i.e., unity in the first row and zeros elsewhere.
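A Python sketch of computing [A]−1 column by column as in (2.204); it reuses the lu_doolittle and lu_solve sketches given earlier in this chapter (illustrative names, not from the text).

    def inverse_via_lu(A):
        # column j of [B] = inverse of [A] solves [A]{b_j} = {e_j}, the j-th column of [I]
        n = len(A)
        L, U = lu_doolittle(A)                   # factor once
        cols = []
        for j in range(n):
            e = [1.0 if i == j else 0.0 for i in range(n)]
            cols.append(lu_solve(L, U, e))       # one forward/backward pass per column
        # transpose the list of columns into the rows of [B]
        return [[cols[j][i] for j in range(n)] for i in range(n)]

    A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
    B = inverse_via_lu(A)
    # check: the product [A][B] should be close to [I]
    print([[round(sum(A[i][k] * B[k][j] for k in range(3)), 6) for j in range(3)] for i in range(3)])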
(iv) Use the {x} vector from (2.212) in (2.209) to solve for x2, say x″2.
(v) Update the {x} in (2.212) by replacing x̃2 with x″2, hence the updated {x} becomes:
{x} = { x′1 }
      { x″2 }        (2.213)
      { x̃3 }
(vi) Use the vector {x} from (2.213) in (2.210) to solve for x3, say x‴3.
Convergence Criterion
Let {x}j−1 and {x}j be two successive solutions at the end of (j − 1)th
and j th iterations. We consider the iterative solution procedure converged
when the corresponding components of {x}j−1 and {x}j are within a preset
tolerance ∆.
(εi)j is the percentage error in the ith component of {x}, i.e., xi, based on the most up-to-date solution for the ith component of {x}, (xi)j. When (2.215) is satisfied, we consider the iterative process converged and we have an approximation {x}j of the true solution {x} of (2.207).
Solve for x1, x2, and x3 using the first, second, and third equations in (2.216).
x1 = (7.85 + 0.1 x2 + 0.2 x3) / 3        (2.217)
x2 = (−19.3 − 0.1 x1 + 0.3 x3) / 7        (2.218)
x3 = (71.4 − 0.3 x1 + 0.2 x2) / 10        (2.219)
(i) Choose
{x} = { x̃1 }   { 0 }
      { x̃2 } = { 0 }        (2.220)
      { x̃3 }   { 0 }
(ii) Solve for x1 using (2.217) and (2.220) and denote the new value of x1 by x′1.
x1 = (7.85 + 0 + 0) / 3 = 2.616667 = x′1        (2.221)
(iii) Using the new value of x1, i.e. x′1, update the starting solution vector (2.220).
{x} = { x′1 }   { 2.616667 }
      { x̃2 } = { 0        }        (2.222)
      { x̃3 }   { 0        }
(iv) Using the most recent {x} (2.222), calculate x2 using (2.218) and denote the new value of x2 by x″2.
x2 = (−19.3 − 0.1(2.616667) + 0) / 7 = −2.794524 = x″2        (2.223)
(vi) Calculate x3 using (2.219) and (2.224) and denote the new value of x3 by x‴3.
x3 = (71.4 − 0.3(2.616667) + 0.2(−2.794524)) / 10 = 7.00561 = x‴3        (2.225)
(vii) Update {x} in (2.224) using the new value of x3, i.e. x‴3.
{x}1 = { x′1 }   {  2.616667 }
       { x″2 } = { −2.794524 }        (2.226)
       { x‴3 }   {  7.00561  }
Steps (i)-(vii) complete the first iteration. At the end of the first iteration,
{x} in (2.226) is the most recent estimate of the solution. We denote this
by {x}1 .
Using (2.226) as the initial solution for the second iteration and repeating
steps (ii)-(vii), the second iteration would yield the following new estimate
of the solution {x}.
x′1 = (7.85 + 0.1(−2.794524) + 0.2(7.005610)) / 3 = 2.990557
x″2 = (−19.3 − 0.1(2.990557) + 0.3(7.005610)) / 7 = −2.499625        (2.227)
x‴3 = (71.4 − 0.3(2.990557) + 0.2(−2.499625)) / 10 = 7.000291
Thus at the end of the second iteration the solution vector {x} is
{x}2 = {  2.990557 }
       { −2.499625 }        (2.228)
       {  7.000291 }
Using {x}1 and {x}2 in (2.226) and (2.228), we can compute (εi)2 (using (2.215)).
(ε1)2 = | (2.990557 − 2.616667) / 2.990557 | × 100 = 12.5%
(ε2)2 = | (−2.499625 − (−2.794524)) / (−2.499625) | × 100 = 11.8%        (2.229)
(ε3)2 = | (7.000291 − 7.00561) / 7.000291 | × 100 = 0.076%
Thus, at the end of iteration 6, {x}6 is the converged solution in which each component of {ε}6 < 10⁻⁷.
Thus, at the end of iteration 8, {x}8 is the converged solution in which each component of {ε}8 < 10⁻⁷.
Remarks.
(1) In Gauss-Seidel method, we begin with a starting or assumed solution
vector and obtain new values of the components of {x} individually.
The new computed vector {x} is used as the starting vector for the next
iteration.
(2) We observe that the coefficient matrix [A] in (2.216), if well-conditioned, generally has the largest elements on the diagonal, i.e., it is a diagonally dominant coefficient matrix. Iterative methods have good convergence characteristics for algebraic systems with such coefficient matrices.
(3) The choice of starting vector is crucial. Sometimes the physics from
which the algebraic equations are derived provides enough information to
prudently select a starting vector. When this information is not available
or helpful, null or unity vectors are often useful as initial guess solutions.
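A Python sketch of the Gauss-Seidel iteration as described above, with the percentage-error convergence test of (2.215); the names and the default tolerance are illustrative.

    def gauss_seidel(A, b, x0, tol=1.0e-7, max_iter=100):
        # each new component overwrites the old one immediately, as in (2.217)-(2.219)
        n = len(b)
        x = x0[:]
        for it in range(1, max_iter + 1):
            max_err = 0.0
            for i in range(n):
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                new_xi = (b[i] - s) / A[i][i]
                if new_xi != 0.0:
                    max_err = max(max_err, abs((new_xi - x[i]) / new_xi) * 100.0)
                x[i] = new_xi
            if max_err <= tol:          # percentage-error criterion, as in (2.215)
                return x, it
        return x, max_iter

    A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
    b = [7.85, -19.3, 71.4]
    x, iters = gauss_seidel(A, b, [0.0, 0.0, 0.0])
    print(x, iters)   # approximately [3.0, -2.5, 7.0] after a few iterations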
or
{x} = {b̂} − ([Â] − [I]){x} (2.233)
It is more convenient to write (2.233) in the following form for performing
iterations.
{x}j+1 = {b̂} − ([Â] − [I]){x}j (2.234)
{x}j+1 is the most recent estimate of {x} and {x}j is the immediately pre-
ceding estimate of {x}.
(i) Assume a guess or initial vector {x}1 for {x} (i.e., j = 1 in (2.234)).
This could be a null vector, a unit vector, or any other appropriate
choice.
(ii) Use (2.234) to solve for {x}2 . {x}2 is the improved estimate of the
solution.
(iii) Check for convergence using the criterion defined in the next section
or using the same criterion as in the case of Gauss-Seidel method,
(2.215). We repeat steps (ii)-(iii) if the most recent estimate of {x} is
not converged.
or
{x}j+1 = {b̂} − [B̂]{x}j (2.236)
For j = 1:
{x}2 = {b̂} − [B̂]{x}1 (2.237)
For j = 2:
{x}3 = {b̂} − [B̂]{x}2 (2.238)
Substitute {x}2 from (2.237) into (2.238).
For j = 3:
{x}4 = {b̂} − [B̂]{x}3 (2.241)
Substituting for {x}3 from (2.240) in (2.241) and rearranging terms:
Thus, at the end of iteration 8, {x}8 is the converged solution in which each component of {ε}8 < 10⁻⁷.
Thus, at the end of iteration 17, {x}17 is the converged solution in which each component of {ε}17 < 10⁻⁷.
Remarks.
(1) We note that the convergence characteristics of the Jacobi method for
this example are much poorer than the Gauss-Seidel method in terms of
the number of iterations.
(2) The observation in Remark (1) is not surprising due to the fact that in the Jacobi method the entire solution from the previous iteration is used to compute the new solution, whereas in the Gauss-Seidel method the most recently updated values of the components of {x} are used immediately.
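A Python sketch of the Jacobi iteration; in contrast to the Gauss-Seidel sketch above, the entire previous iterate is held fixed while the new one is computed. Names are illustrative.

    def jacobi(A, b, x0, tol=1.0e-7, max_iter=200):
        n = len(b)
        x = x0[:]
        for it in range(1, max_iter + 1):
            x_new = [0.0] * n
            for i in range(n):
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x_new[i] = (b[i] - s) / A[i][i]
            max_err = max((abs((x_new[i] - x[i]) / x_new[i]) * 100.0
                           for i in range(n) if x_new[i] != 0.0), default=0.0)
            x = x_new
            if max_err <= tol:
                return x, it
        return x, max_iter

    A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
    b = [7.85, -19.3, 71.4]
    print(jacobi(A, b, [0.0, 0.0, 0.0]))   # same solution, but more iterations than Gauss-Seidel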
2. If 0 < λ < 1, the vectors {x}c and {x}p get multiplied with factors be-
tween 0 and 1. For this choice of λ, the method is called under-relaxation.
3. If 1 < λ < 2, then the vector {x}c is multiplied with a factor greater
than one and {x}p is multiplied with a factor that is less than zero. The
motivation for this is that {x}c is supposedly more accurate than {x}p ,
hence a bigger weight factor is appropriate to assign to {x}c compared to
{x}p . For this choice of λ, the method is called successive or simultaneous
over-relaxation or SOR.
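A Python sketch of successive over-relaxation built on the Gauss-Seidel update, with the weighting of {x}c and {x}p described in the items above; the relaxation factor used in the call is only an illustrative choice.

    def sor(A, b, x0, lam=1.2, tol=1.0e-7, max_iter=200):
        # lam = 1 recovers Gauss-Seidel; 0 < lam < 1 under-relaxes, 1 < lam < 2 over-relaxes
        n = len(b)
        x = x0[:]
        for it in range(1, max_iter + 1):
            max_err = 0.0
            for i in range(n):
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                xc = (b[i] - s) / A[i][i]                # Gauss-Seidel value {x}_c
                new_xi = lam * xc + (1.0 - lam) * x[i]   # weighted with the previous value {x}_p
                if new_xi != 0.0:
                    max_err = max(max_err, abs((new_xi - x[i]) / new_xi) * 100.0)
                x[i] = new_xi
            if max_err <= tol:
                return x, it
        return x, max_iter

    A = [[3.0, -0.1, -0.2], [0.1, 7.0, -0.3], [0.3, -0.2, 10.0]]
    b = [7.85, -19.3, 71.4]
    print(sor(A, b, [0.0, 0.0, 0.0], lam=1.1))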
Problems
2.1 Matrices [A], [B], [C] and the vector {x} are given in the following:
[A] = [ 1   2  −1   3 ]        [B] = [ 0   1   0 ]
      [ 0   4  −2   1 ]              [ 1   2  −1 ]
      [ 3  −1   1   1 ]              [ 1  −1   3 ]
                                     [ 2  −2   1 ]
[C] = [ 0   1   4   2 ]        {x} = { −1.2 }
      [−1  −2   3   1 ]              {  3.7 }
      [ 0   1   2  −1 ]              {  2.0 }
2.3 Write the following system of equations in the matrix form using x, y, z
as vector of unknowns (in this order).
d1 = b1 y + c1 z
d2 = b2 y + a2 x
d3 = a3 x + b3 y
3x1 − x2 + 2x3 = 12
x1 + 2x2 + 3x3 = 11
2x1 − 2x2 − x3 = 2
4x1 + x2 − x3 = −2
5x1 + x2 + 2x3 = 4
6x1 + x2 + x3 = 6
(a) Obtain solution {x} using Gauss elimination with partial pivoting.
(b) Obtain solution {x} using Gauss elimination with full pivoting.
2x1 + x2 − x3 = 1
5x1 + 2x2 + 2x3 = −4
3x1 + x2 + x3 = 5
(a) Calculate solution {x} using Gauss-Jordan method with partial piv-
oting.
(b) Calculate solution {x} using Gauss-Jordan method with full pivot-
ing.
2.9 Consider the following system of equations in which the coefficient ma-
trix [A] is symmetric.
2x1 − x2 = 1.5
−x1 + 2x2 − x3 = −0.25
−x2 + x3 = −0.25
[A] = [L̃][L̃]T
For each iteration tabulate the starting solution, computed solution, and
percentage error in each component of the computed solution using the cal-
culated solution as the improved solution. Allow maximum of 20 iterations
and use a convergence tolerance of 0.1 × 10−6 for the percentage error in
each component of the solution vector.
(a)
[ 3.0  −0.1  −0.2 ] { x1 }   {  7.85 }
[ 0.1   7.0  −0.3 ] { x2 } = { −19.3 }  ;  use {x} = {0, 0, 0}^T as initial or starting solution
[ 0.3  −0.2  10.0 ] { x3 }   {  71.4 }
(b)
[ 10   1   2   3  ] { x1 }   { 10 }
[  1  20   2   3  ] { x2 } = { 20 }  ;  use {x} = {0, 0, 0, 0}^T as initial or starting solution
[  2   2  30   4  ] { x3 }   { 30 }
[  3   3   4  40  ] { x4 }   { 40 }
Based on the numerical studies for (a) and (b) comment on the performance
of Gauss-Seidel method and Jacobi method.
show whether this system of equations has a unique solution or not without
computing the solution.
6x + 4y = 4
4x + 5y = 1
3.1 Introduction
Nonlinear simultaneous equations are nonlinear expressions in unknown
quantities of interest. These may arise in some physical processes directly
due to consideration of the mathematical descriptions of their physics. On
the other hand, in many physical processes described by nonlinear differ-
ential or partial differential equations, the use of approximation methods
such as finite difference, finite volume, and finite element methods for ob-
taining their approximate numerical solutions naturally results in nonlinear simultaneous equations. Solutions of these nonlinear simultaneous equations provide the solutions of the associated nonlinear differential and partial differential equations.
In this chapter we consider systems of nonlinear simultaneous equations
and methods of obtaining their solutions. Consider a system of n nonlinear
simultaneous equations:
fi (x1 , x2 , . . . , xn ) = bi ; i = 1, 2, . . . , n (3.1)
[Figure: graph of a function f(x) over the interval [a, b] showing three roots x1, x2, and x3.]
We note that for the root x1 , this condition holds in the immediate neigh-
borhood of x = x1 as long as xl < x1 and xu > x1 . For the second root x2
we have:
f (xl ) < 0 , f (xu ) > 0
(3.5)
f (xl )f (xu ) < 0 for xl < x2 , xu > x2
For the third root x3 :
f (xl ) > 0 , f (xu ) < 0
(3.6)
f (xl )f (xu ) < 0
[Figure: a root x1 bracketed by x = xl and x = xu, with f(xl) > 0 and f(xu) < 0.]
in which xi is the root of f(x), holds for each root. Thus, the condition (3.7) is helpful in the root-finding methods considered in the following sections.
In the graphical method, we simply plot a graph of f (x) versus x and
locate values of x for which f (x) = 0 in the range [a, b]. These of course are
the approximate values of the desired roots within the limits of the graphical
precision.
[Figure: plot of f(x) versus x over the range −4 ≤ x ≤ 4.]
for each i value in (3.13), calculate f (xi+1 ). Using two successive values
f (xi ) and f (xi+1 ) of f (x), consider:
f(xi) f(xi+1) < 0  ⟹  a root in [xi, xi+1]
f(xi) f(xi+1) > 0  ⟹  no root in [xi, xi+1]        (3.14)
∆x = 0.41000E + 00
xmin = −0.40000E + 01
xmax = 0.40000E + 01
Remarks.
(2) A value of ∆x larger than 0.41 can be used too, but in this case ∆x may
be too large, hence we may miss one or more roots.
(3) For each range of the roots in (3.15), we can perform incremental search
with progressively reduced ∆x to eventually obtain an accurate value of
each root. This approach to obtaining accurate values of each root is
obviously rather inefficient.
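A Python sketch of the incremental search, returning the bracketing sub-intervals detected by the sign test (3.14). The cubic used in the call is the one given later in (3.72), which appears to be the f(x) of these examples; names are illustrative.

    def incremental_search(f, x_min, x_max, dx):
        # return the sub-intervals [x_i, x_i + dx] over which f changes sign, per (3.14)
        brackets = []
        x = x_min
        while x < x_max:
            x_next = min(x + dx, x_max)
            if f(x) * f(x_next) < 0.0:
                brackets.append((x, x_next))
            x = x_next
        return brackets

    f = lambda x: x**3 + 2.3 * x**2 - 5.08 * x - 7.04
    print(incremental_search(f, -4.0, 4.0, 0.41))   # three brackets, one per root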
(ii) Divide the interval [xl , xu ] into two equal intervals, [xl , xk ] and [xk , xu ],
in which xk = 1/2(xl + xu ).
Since the root lies in [xl , xk ], the interval [xk , xu ] can be discarded.
(iv) We now reinitialize the x values, i.e., keep x = xl the same but set
xu = xk to create a new, smaller range of [xl , xu ].
[Figure: bisection of the bracket [xl, xu] at xk, with f(xl) > 0, f(xk) < 0, and f(xu) < 0.]
We discard the range x that does not contain the root and reinitialize the
other half-interval to [xl , xu ]. We repeat steps in (3.19) and (3.20) until:
% Error = | (xu − xl) / xu | × 100 ≤ ∆, a preset tolerance        (3.21)
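A Python sketch of the bisection procedure with the convergence test (3.21); function and parameter names are illustrative, and a bracket with f(xl) f(xu) < 0 is assumed.

    def bisection(f, xl, xu, tol=1.0e-4, max_iter=20):
        # assumes the bracket [xl, xu] contains a root, i.e., f(xl) * f(xu) < 0
        for it in range(1, max_iter + 1):
            xk = 0.5 * (xl + xu)
            if f(xl) * f(xk) < 0.0:      # root lies in [xl, xk]
                xu = xk
            else:                        # root lies in [xk, xu]
                xl = xk
            err = abs((xu - xl) / xu) * 100.0     # criterion (3.21)
            if err <= tol:
                break
        return 0.5 * (xl + xu), it

    f = lambda x: x**3 + 2.3 * x**2 - 5.08 * x - 7.04
    print(bisection(f, 1.74, 2.15))   # approximately 2.0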
Table 3.2: Results of bisection method for the first root of equation (3.12)
xl = -0.35900E+01
xu = -0.31800E+01
∆= 0.10000E−03
I= 20
Table 3.3: Results of bisection method for the second root of equation (3.12)
xl = -0.11300E+01
xu = -0.72000E+00
∆= 0.10000E−03
I= 20
Table 3.4: Results of bisection method for the third root of equation (3.12)
xl = 0.17400E+01
xu = 0.21500E+01
∆= 0.10000E−03
I= 20
(i) Connect the points (xl , f (xl )) and (xu , f (xu )) by a straight line (see
Figure 3.5) and define the intersection of this line with the x-axis as
x = xr . Using the equation of the straight line, similar triangles, or
Figure 3.5: Method of false position (the straight line joining (xl, f(xl)) and (xu, f(xu)) intersects the x-axis at xr).
simply equating tan(θ) from the two triangles shown in Figure 3.5, we obtain:
f(xl) / (xl − xr) = f(xu) / (xu − xr)        (3.23)
Solving for xr:
xr = (xu f(xl) − xl f(xu)) / (f(xl) − f(xu))        (3.24)
Equation (3.24) is known as the false position formula. An alternate form of (3.24) can be obtained. From (3.24):
xr = xu f(xl) / (f(xl) − f(xu)) − xl f(xu) / (f(xl) − f(xu))        (3.25)
xr = xu + xu f(xl) / (f(xl) − f(xu)) − xl f(xu) / (f(xl) − f(xu)) − xu        (3.26)
Next, consider the products
f(xl) f(xr)  and  f(xr) f(xu)        (3.28)
Check which is less than zero. From Figure 3.5 we note that
Hence, the root lies in the interval [xr , xu ] (for the function shown in
Figure 3.5). Therefore, we discard the interval [xl , xr ].
xl = xr
and xu = xu unchanged
(iv) In this method xr is the new estimate of the root. We check the
convergence of the method using the following (approximate percentage
relative error):
| ((xr)i+1 − (xr)i) / (xr)i+1 | × 100 < ∆        (3.30)
in which (xr )i is the estimate of the root in the ith iteration. When
converged, i.e., when (3.30) is satisfied, (xr )i+1 is the final value of the
root.
Example 3.3 (False Position Method). In this method, once a root has
been bracketed we use the following to obtain an estimate of xr of the root
in the bracketed range:
xr = xu − f(xu)(xu − xl) / (f(xu) − f(xl))        (3.31)
Then, we consider the products f (xl )f (xr ) and f (xr )f (xu ) to determine the
range containing the root. We discard the range not containing the root and
reinitialize the range containing the root to [xl , xu ]. We iterate (3.31) and
use the steps following it until:
% Error = | ((xr)i+1 − (xr)i) / (xr)i+1 | × 100 ≤ ∆        (3.32)
We consider f (x) = 0 defined by (3.12) and the bracketed ranges of the roots
determined in Example 3.1 to present details of the false position method
for each root. We choose ∆ = 0.0001 and maximum of twenty iterations
(I = 20).
Table 3.5: Results of false position method for the first root of equation (3.12)
xl = -0.35900E+01
xu = -0.31800E+01
∆= 0.10000E−03
I= 20
Table 3.6: Results of false position method for the second root of equation (3.12)
xl = -0.11700E+01
xu = -0.72000E+00
∆= 0.10000E−03
I= 20
Table 3.7: Results of false position method for the third root of equation (3.12)
xl = 0.17400E+01
xu = 0.21500E+01
∆= 0.10000E−03
I= 20
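A Python sketch of the false position iteration of Example 3.3, using (3.24) for xr and (3.32) as the convergence test; names are illustrative.

    def false_position(f, xl, xu, tol=1.0e-4, max_iter=20):
        xr_old = xu
        for it in range(1, max_iter + 1):
            # equation (3.24): intersection of the chord with the x-axis
            xr = (xu * f(xl) - xl * f(xu)) / (f(xl) - f(xu))
            if f(xl) * f(xr) < 0.0:
                xu = xr                  # root in [xl, xr]
            else:
                xl = xr                  # root in [xr, xu]
            if xr != 0.0 and abs((xr - xr_old) / xr) * 100.0 <= tol:   # criterion (3.32)
                return xr, it
            xr_old = xr
        return xr_old, max_iter

    f = lambda x: x**3 + 2.3 * x**2 - 5.08 * x - 7.04
    print(false_position(f, -1.17, -0.72))   # approximately -1.1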
f(xi) ≠ 0        (3.33)
If we neglect O((∆x)²) in (3.35) (valid when (∆x)² << ∆x) then (3.35) is approximate, and we can solve for ∆x.
∆x = − f(xi) / f′(xi)        (3.36)
xi+1 = xi + ∆x        (3.37)
xi+1 = xi − f(xi) / f′(xi)        (3.38)
Remarks.
f′(xi) = df/dx evaluated at x = xi        (3.39)
f′(xi) ≈ f(xi) / (xi − xi+1)        (3.40)
xi+1 = xi − f(xi) / f′(xi)        (3.41)
xi+1 is the improved value (i.e., more accurate) of the root of f (x) in [xl , xu ].
Clearly (3.41) is the same as (3.38), hence earlier remarks hold here as well.
[Figure: geometric interpretation of the Newton-Raphson method; the tangent to f(x) at x = xi intersects the x-axis at xi+1 within the bracket [xl, xu].]
Convergence Criterion
| (xi+1 − xi) / xi+1 | × 100 ≤ ∆ ;   ∆ is a preset value        (3.42)
(1) The method requires a range [xl , xu ] that brackets the desired root and
an initial guess x0 ∈ [xl , xu ] of the root.
(2) The method works extremely well if f 0 (x) and f 00 (x) are well-behaved
in the range [xl , xu ].
(3) The method fails if f 0 (xi ) becomes zero, i.e., f 0 (x) changes sign in the
neighborhood of xi .
(4) When the initial guess xi is sufficiently close to the root of f (x), the
method has quadratic convergence (shown in a later section), hence only
a few iterations are required to obtain a highly accurate value of the root.
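A Python sketch of the Newton-Raphson iteration (3.38) with the convergence test (3.42); it assumes f′(x) does not vanish during the iterations, per Remark (3) in the list above. Names are illustrative.

    def newton_raphson(f, df, x0, tol=1.0e-4, max_iter=10):
        x = x0
        for it in range(1, max_iter + 1):
            x_new = x - f(x) / df(x)         # equation (3.38); fails if df(x) = 0
            if abs((x_new - x) / x_new) * 100.0 <= tol:   # criterion (3.42)
                return x_new, it
            x = x_new
        return x, max_iter

    f  = lambda x: x**3 + 2.3 * x**2 - 5.08 * x - 7.04
    df = lambda x: 3.0 * x**2 + 4.6 * x - 5.08
    print(newton_raphson(f, df, 1.9))   # approximately 2.0 in a few iterations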
where
xi+1 = xi + ∆x        (3.44)
Expand f(xi + ∆x) = f(xi+1) in a Taylor series about xi.
f(xi+1) = f(xi) + f′(xi) ∆x + f″(ξ) (∆x)²/2 = 0 ;   ξ ∈ [xl, xu]        (3.45)
If we neglect the f″(ξ) term in (3.45), then we obtain Newton's linear method or the Newton-Raphson method.
or
f(xi) + f′(xi)(xi+1 − xi) = 0        (3.47)
∴ xi+1 = xi − f(xi) / f′(xi)        (3.48)
We can use Taylor series expansion to estimate the error in (3.48). We go back to the Taylor series expansion (3.45) and use ∆x = xi+1 − xi.
f(xi) + f′(xi)(xi+1 − xi) + (f″(ξ)/2)(xi+1 − xi)² = 0        (3.49)
Since (3.49) is exact (in the sense that the influence of all terms in the Taylor series expansion is accounted for in the last term), xi+1 in (3.49) must be the exact root or true root, say xi+1 = xt. We substitute xt for xi+1 in (3.49).
f(xi) + f′(xi)(xt − xi) + (f″(ξ)/2)(xt − xi)² = 0        (3.50)
Subtract (3.47) from (3.50), noting that xi+1 in (3.47) is not xt.
f′(xi)(xt − xi+1) + (f″(ξ)/2)(xt − xi)² = 0        (3.51)
Since xt is the true solution:
f(xi + ∆x) = f(xi) + ∆x f′(xi) + ((∆x)²/2!) f″(xi) + O((∆x)³) = 0        (3.60)
If we neglect O((∆x)³), then (3.60) can be written as
f(xi + ∆x) = f(xi) + ∆x f′(xi) + ((∆x)²/2!) f″(xi) = 0        (3.61)
In (3.61), f (xi ), f 0 (xi ), and f 00 (xi ) are known and have numerical values.
∆x is unknown, but in the following we treat it as a known increment in
part of the expression, or as completely unknown and to be determined. We
consider both cases in the following.
xi+1 = xi + ∆x1
xi+1 = xi + ∆x2        (3.62)
Of the two values of xi+1 in (3.62), the value that lies in the range (xl, xu) is the correct improved or new value of the root. We check for convergence based on the approximate percentage relative error given by:
| (xi+1 − xi) / xi | × 100 ≤ ∆        (3.63)
If (3.63) is satisfied, then xi+1 is the desired value of the root, otherwise
use xi+1 as the new initial or starting value instead of xi and repeat the
calculations, beginning with (3.61).
∆x = − f(xi) / ( f′(xi) − f(xi) f″(xi) / (2 f′(xi)) )        (3.67)
xi+1 = xi + ∆x        (3.68)
or
xi+1 = xi − f(xi) / ( f′(xi) − f(xi) f″(xi) / (2 f′(xi)) )        (3.69)
| (xi+1 − xi) / xi | × 100 ≤ ∆        (3.70)
If (3.70) is satisfied then we have a converged value of the root, i.e., xi+1 is
the desired root of f (x) in the range [xl , xu ]. If not, then using xi+1 as the
new initial or guess value, we repeat the calculations using (3.69).
Remarks.
(2) Case I does require the solution of ∆x, i.e., ∆x1 and ∆x2 , using the
expression for roots of a quadratic equation.
(3) Newton’s second order method requires f 00 (x) to be well-behaved in the
neighborhood of xi .
(4) As in the case of Newton’s linear method, Newton’s second order method
(both Case I and Case II) also have good convergence characteristics as
long as the initial or starting solution is in a sufficiently small neighbor-
hood of the correct value of the root.
(5) The convergence rate of Case II is similar to Newton-Raphson method
due to the introduction of approximation (3.65).
Case I:
In this approach we use
f(xi) + f′(xi) ∆x + ((∆x)²/2) f″(xi) = 0        (3.71)
in which
f(x) = x³ + 2.3x² − 5.08x − 7.04
f′(x) = 3x² + 4.6x − 5.08        (3.72)
f″(x) = 6x + 4.6
and xi is the current estimate of the root in the bracketed range. Choose
i = 0, and hence x0 as the initial guess, and calculate f (xi ), f 0 (xi ), and
f 00 (xi ) using (3.72). Using these values and (3.71) find two values of ∆x, say
∆x1 and ∆x2, using the quadratic formula. Let
xi+1 = xi + ∆x1
xi+1 = xi + ∆x2        for i = 1, 2, . . .        (3.73)
Choose the value of xi+1 that falls within [xl, xu], the range that brackets the root. Increment i = i + 1 and repeat the calculations beginning with (3.71). The convergence criterion is given as (approximate percentage relative error)
| (xi+1 − xi) / xi+1 | × 100 ≤ ∆        (3.74)
All roots have been determined within the desired accuracy of ∆ = 0.0001
in three iterations, compared to four iterations for Newton’s linear method.
Strictly in terms of iteration count, this is an improvement, but this method
also requires additional calculation of two values of ∆x at each iteration
using the quadratic formula, and determination of which choice of ∆x is ap-
propriate. This may result in worse overall efficiency compared to Newton’s
linear method.
Case II
In this approach we approximate ∆x/2 using Newton's linear method to obtain:
xi+1 = xi − f(xi) / ( f′(xi) − f(xi) f″(xi) / (2 f′(xi)) ) ;   i = 0, 1, 2, . . .        (3.75)
f′(xi) ≅ (f(xi) − f(xi−1)) / (xi − xi−1)        (3.77)
This is a backward difference approximation for f′(xi). Substituting from (3.77) into (3.76) for f′(xi):
Figure 3.7: Secant method (the line through (xi−1, f(xi−1)) and (xi, f(xi)) determines xi+1 within the bracket [xl, xu]).
This is known as the secant method. This expression for xi+1 in (3.78) is
the same as that derived for false position method (see Example 3.3 for a
numerical example).
Remarks.
(2) This method is helpful when f (x) is complicated, in which case deter-
mining f 0 (x) may be involved, but is avoided in this case.
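A Python sketch of the secant iteration, with f′(xi) replaced by the backward difference (3.77); two starting values are required. Names are illustrative.

    def secant(f, x_prev, x_curr, tol=1.0e-4, max_iter=20):
        for it in range(1, max_iter + 1):
            # derivative replaced by a backward difference, per (3.77)
            x_next = x_curr - f(x_curr) * (x_curr - x_prev) / (f(x_curr) - f(x_prev))
            if abs((x_next - x_curr) / x_next) * 100.0 <= tol:
                return x_next, it
            x_prev, x_curr = x_curr, x_next
        return x_curr, max_iter

    f = lambda x: x**3 + 2.3 * x**2 - 5.08 * x - 7.04
    print(secant(f, 1.74, 2.15))   # approximately 2.0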
We begin with xi ∈ [xl , xu ] as the assumed value of the root in the bracketed
range [xl , xu ] and iterate using (3.79) until converged, i.e., until the following
holds (approximate percentage relative error):
| (xi+1 − xi) / xi+1 | × 100 ≤ ∆        (3.80)
Case (a)
Consider
f(x) = x² − 4x + 3
We express f(x) = 0 as x = g(x).
x = x² − 3x + 3 = g(x)
∴ xi+1 = xi² − 3xi + 3 ;   i = 0, 1, . . . ;   x0 is initial guess
(iii) x = ±√(4x − 3) = ±g(x)
∴ xi+1 = ±√(4xi − 3) ;   i = 0, 1, . . . ;   x0 is initial guess
Case (b)
Consider
f(x) = sin(x) = 0
Add x to both sides of f(x) = 0.
x = sin(x) + x = g(x)
∴ xi+1 = sin(xi) + xi ;   i = 0, 1, . . .
Case (c)
Consider
f(x) = e^(−x) − x
∴ x = e^(−x) = g(x)
Hence,
xi+1 = e^(−xi) ;   i = 0, 1, . . . ;   x0 is initial guess
We present a numerical study of Case (c) using x0 = 0. Calculated values
are tabulated in the following.
Table 3.17: Results of fixed point method for Case (c) with x0 = 0.
x0 = 0.00000E+00
∆= 0.10000E−03
I= 20
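A Python sketch of the fixed point iteration x_(i+1) = g(x_i) applied to Case (c); the tolerance and iteration limit mirror the table parameters above, and the names are illustrative.

    import math

    def fixed_point(g, x0, tol=1.0e-4, max_iter=20):
        x = x0
        for it in range(1, max_iter + 1):
            x_new = g(x)                      # x_(i+1) = g(x_i), per (3.79)
            if x_new != 0.0 and abs((x_new - x) / x_new) * 100.0 <= tol:
                return x_new, it
            x = x_new
        return x, max_iter                    # returned after max_iter if not yet within tol

    g = lambda x: math.exp(-x)                # Case (c): f(x) = e^(-x) - x, so g(x) = e^(-x)
    print(fixed_point(g, 0.0))                # slow convergence toward about 0.567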
(3) Newton’s linear method has even better performance than false position
method due to the fact that in false position method (or secant method),
the function derivative is approximated. Error O(10−5 ) (lower than (1)
and (2)) required only four iterations for each root. From third to fourth
iteration the error (relative error) reduces from O(10−1 ) or O(10−2 ) to
O(10−5 ) or O(10−6 ), better than the theoretical quadratic convergence
rate.
(6) Fixed point method is even worse than bisection method. For the nu-
merical example considered here error O(10−1 ) required 20 iterations.
f (x, y, z) = 0
g(x, y, z) = 0 (3.81)
h(x, y, z) = 0
f(xi, yi, zi) ≠ 0
g(xi, yi, zi) ≠ 0        (3.82)
h(xi, yi, zi) ≠ 0
xi+1 = xi + ∆x
yi+1 = yi + ∆y (3.83)
zi+1 = zi + ∆z
∴ { xi+1 }   { xi }   [ fx  fy  fz ]⁻¹       { f }
  { yi+1 } = { yi } − [ gx  gy  gz ]         { g }        (3.88)
  { zi+1 }   { zi }   [ hx  hy  hz ]         { h }
             (the Jacobian and the functions are evaluated at xi, yi, zi)
xi+1, yi+1, zi+1 from (3.88) are improved values of the solution compared to xi, yi, zi (previous iteration). Now we check for convergence (approximate percentage relative error):
| (xi+1 − xi) / xi+1 | × 100 ≤ ∆
| (yi+1 − yi) / yi+1 | × 100 ≤ ∆        (3.89)
| (zi+1 − zi) / zi+1 | × 100 ≤ ∆
Remarks.
(2) When these nonlinear equations describe a physical process, the physics is generally of help in estimating or guessing a good starting solution. For example, Stokes flow is a good assumption as a starting solution for a more general nonlinear viscous flow calculation.
(3) Often a null vector or a vector of ones may also serve as a crude guess. Such a choice may require many more iterations, since it may be far away from the true solution. In some cases, this choice may also result in lack of convergence.
(4) The most important point to remember is that Newton’s method has
excellent convergence characteristics, provided the starting solution is
in a sufficiently small neighborhood of the correct solution. Thus, a
choice of initial solution close to the true solution is necessary, otherwise
the method may require too many iterations to converge or may not
converge at all.
or
xi+1 = xi − f(xi) / f′(xi)    (3.91)
which is the same as Newton’s linear method or Newton-Raphson method
derived in Section 3.2.5 for a single nonlinear equation.
We wish to find all possible values of x,y that satisfy the above equations,
i.e., all roots, using Newton’s linear method or Newton’s first order method.
For a system of two equations described by f (x, y) = 0 and g(x, y) = 0, we
have:
{xi+1, yi+1}ᵀ = {xi, yi}ᵀ − [ fx fy ; gx gy ]⁻¹(xi, yi) {f, g}ᵀ(xi, yi)    (3.93)
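As a hedged illustration of (3.93), the Python sketch below applies Newton's linear method to a two-equation system; the sample system (taken from one of the end-of-chapter problems), the starting point, and the tolerance are illustrative choices.

    # Newton's linear method for two nonlinear equations, eq. (3.93).
    import numpy as np

    def newton_system(F, J, x0, delta=1.0e-5, max_iter=20):
        x = np.array(x0, dtype=float)
        for k in range(max_iter):
            dx = np.linalg.solve(J(x), F(x))     # [J]^{-1}{F} via a linear solve
            x_new = x - dx
            # approximate percentage relative error, checked component-wise
            if np.all(np.abs((x_new - x) / x_new) * 100.0 <= delta):
                return x_new, k + 1
            x = x_new
        return x, max_iter

    # Illustrative system: f = x^2 - y + 1 = 0, g = 3cos(x) - y = 0
    F = lambda v: np.array([v[0]**2 - v[1] + 1.0, 3.0*np.cos(v[0]) - v[1]])
    J = lambda v: np.array([[2.0*v[0], -1.0], [-3.0*np.sin(v[0]), -1.0]])
    print(newton_system(F, J, [1.0, 2.0]))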
[Figure: graphs of y = [(21 − x³)/3]^(1/2), y = −[(21 − x³)/3]^(1/2), and y = (−2 − x²)/2 in the xy-plane for −2.5 ≤ x ≤ 2; the intersection points (x1, y1) and (x2, y2) locate the roots.]
[Iteration table columns: xi, yi, f(xi, yi), g(xi, yi), fx, fy, gx, gy.]
Remarks.
(1) The method has convergence rates similar to Newton's linear method for finding roots of f(x).
(2) This method has good mathematical foundation and is perhaps the best
method for obtaining solutions of nonlinear systems of equations.
(3) It is important to note again that Newton’s methods (both linear and
quadratic) have small radii of convergence. The radius of convergence
is the region or neighborhood about the root such that a choice of the
starting solution or guess solution for the root in this region is assured to
yield a converged solution. Due to the fact that the radius of convergence
is small for Newton’s methods, close proximity of the guess or starting
solution to the true root or solution is essential for the convergence of
the methods.
Problems
Consider the following cubic algebraic polynomial.
3.1 Plot a graph of f (x) versus x to locate the roots of (1), approximately.
3.2 Using the brackets of the roots determined in 3.1, use bisection method
to determine more accurate values of each root. Use
( (xm)i+1 − (xm)i ) / (xm)i+1 × 100 ≤ 0.1 × 10⁻³
3.3 Using brackets of the roots determined in 3.1, use method of false posi-
tion to determine more accurate values of each root. Use
( (xr)i+1 − (xr)i ) / (xr)i+1 × 100 ≤ 0.1 × 10⁻³
as convergence criterion, where (xr )i is the value of the root in the ith iter-
ation. Tabulate your calculations in the same manner as in the examples.
Clearly show the values of the roots. Limit number of iterations to 20.
3.4 Using the brackets of the roots determined in 3.1, use Newton’s linear
method (Newton-Raphson method) to determine more accurate values of
each root. Use -0.8, 3.3 and 4.8 as starting values (initial guess) of the roots
(in ascending order). Use
( xi+1 − xi ) / xi+1 × 100 ≤ 0.1 × 10⁻⁴
as convergence criterion, where xi is the value of the root in the ith iteration.
Tabulate your calculations in the same manner as in the examples. Limit
the number of iterations to 10.
3.5 Using the brackets of the roots determined in 3.1, use Newton’s second
order method:
Use -0.8, 3.3 and 4.8 as initial guess of the roots (considered in ascending
order) for Case I. Use -0.8, 2.9, and 4.8 as initial guess of the roots (consid-
ered in ascending order) for Case II.
3.6 Based on the studies performed in problems 3.2 - 3.5, write a short
discussion regarding the accuracy of various methods, convergence charac-
teristics, and efficiency.
3.7 Consider
f(x) = −x² + 5.5x + 11.75
3.8 Consider
f(x) = 3x³ − 2.5x² + 3.5x − 1
3.9 Consider
3.10 Consider
f(x) = −x³ + 6.8x² − 8.8x − 4.4
3.11 Consider
f(x) = −x² + 1.889x + 2.778
3.12 Consider
f(x) = x³ − 8x² + 12x − 4
3.13 Consider
f(x) = 0.5x³ − 3x² + 5.5x − 3.05
(c) Also find roots of f (x) = 0 using secant method with the same accuracy
as in (b).
3.15 Consider
x² − 3x − 4 = 0    (1)
(a) Use basic iteration method or fixed point method to find a root of (1)
near x = 3. Perform five iterations
(b) Let (xl , xu ) = (3.2, 5) contain root of (1). Determine a value of the root.
Perform four iterations only.
(c) Use method of false position to obtain a root of (1). Perform four
iterations only.
Use decimal place accuracy of the computed solutions in (b) and (c) to
compare their convergence characteristics.
3.16 Find the square root of π (3.14159265) using Newton's linear method (but without taking a square root) starting with a value of 1.0. Calculate five new estimates.
Hint: Let x = √π.
3.17 Calculate e⁻¹ (the inverse of e) without taking its inverse, using Newton's linear method with accuracy up to four decimal places. Use e = 2.7182818 and a starting value of 0.3.
Hint: x = e⁻¹.
3.18 Let f (x) = cos(x) where x is in radians. Find a root of f (x) starting
with x0 = 1.0 using fixed point or basic iteration method. Calculate five
new estimates.
3.19 Find the cube root of 8 accurate up to three decimal places using Newton's linear method starting with a value of 1.
−2x² + 2x − 2y + 1 = 0
0.2x² − xy − 0.2y = 0    (1)
Use Newton’s linear method to find solutions of (1) using x0 , y0 = 1.0, 1.0
as initial guess with at least five decimal place accuracy. Write a computer
program to perform calculations.
(x − 2)² + (y − 2)² − 3 = 0
x² + y² − 4 = 0    (1)
Plot graphs of the functions in (1) to obtain initial guess of the roots. Use
Newton’s linear method to obtain values of the roots accurate upto five
decimal places. Write a computer program to perform all calculations.
x² − y + 1 = 0
3 cos(x) − y = 0    (1)
Plot graphs of the functions in (1) in the xy-plane to obtain initial values
of the roots. Use Newton’s linear method to obtain the values of the roots
accurate upto five decimal places. Write a computer program to perform all
calculations.
4  Algebraic Eigenvalue Problems
4.1 Introduction
Algebraic eigenvalue problems play a central and crucial role in dynamics,
mathematical physics, continuum mechanics, and many areas of engineer-
ing. Broadly speaking eigenvalue problems are mathematically classified as
standard eigenvalue problems or generalized eigenvalue problems.
holds, then (4.1) is called the standard eigenvalue problem. The scalar λ
(or λs) and the corresponding vector(s) {φ} are called eigenvalue(s) and
eigenvector(s). Together we refer to (λ,{φ}) as an eigenpair of the standard
eigenvalue problem (4.1).
(ii) Expansion of (4.3) and (4.4) using Laplace expansion will result in a
nth degree polynomial in λ called the characteristic polynomial p(λ)
corresponding to the eigenvalue problems (4.1) or (4.2).
(iii) The nth degree characteristic polynomial p(λ) has n roots λ1 , λ2 , . . . , λn
called eigenvalues of the eigenvalue problem (4.1) or (4.2). We generally
arrange λi s in ascending order.
λ1 < λ2 < · · · < λn (4.5)
(v) Consider the SEVP (4.1). When [A] is symmetric its eigenvalues λi ; i = 1, 2, . . . , n are real. When [A] is positive-definite, the λi are real and positive, i.e., λi > 0 ; i = 1, 2, . . . , n. When [A] is positive semi-definite, all eigenvalues of [A] are real, but one or more of the smallest eigenvalues can be zero. When [A] is non-symmetric, its eigenvalues can be all real, partly real and partly complex, or all complex. In this course, as far as possible, we only consider [A] to be symmetric. In the case of the GEVP the same rules hold for [A] and [B] together, i.e., both either symmetric or non-symmetric.
Let (λi , {φ}i ) and (λj , {φ}j ) be two distinct eigenpairs of (4.11), i.e., λi 6= λj .
Then we have:
or
(λi − λj ){φ}Tj [I]{φ}i = 0 (4.18)
We note that
{φ}Ti {φ}i > 0 (4.20)
and is equal to zero if and only if {φ}i = {0}, a null vector. Since the
eigenvectors only represent a direction, we can normalize them such that
their length is unity (in this case). Let ||{φ}i || be the euclidean norm or the
length of the eigenvector {φ}i .
||{φ}i|| = √( {φ}iᵀ{φ}i )    (4.21)
Consider:
{φ̃}i = (1/||{φ}i||) {φ}i    (4.22)
||{φ̃}i|| = √( (1/||{φ}i||) {φ}iᵀ (1/||{φ}i||) {φ}i ) = √( ||{φ}i||² / (||{φ}i|| ||{φ}i||) ) = 1    (4.23)
Thus, {φ̃}i is the normalized {φ}i such that the norm of {φ̃}i is one. With this normalization (4.19) reduces to:
{φ̃}iᵀ[I]{φ̃}j = {φ̃}iᵀ{φ̃}j = δij = 1 if j = i ; 0 if j ≠ i    (4.24)
The quantity δij is called the Kronecker delta. The condition (4.24) is called
the orthonormality condition of the normalized eigenvectors of SEVP.
Let (λi , {φ}i ) and (λj , {φ}j ) be two eigenpairs of (4.25) in which λi and λj
are distinct, i.e., λi 6= λj . Then we have:
Take the transpose of (4.29) (since [A] and [B] are symmetric [A]T = [A]
and [B]T = [B]).
Consider:
{φ̃}i = {φ}i / ||{φ}i||B    (4.36)
Taking the [B]-norm of {φ̃}i:
||{φ̃}i||B = √( {φ̃}iᵀ[B]{φ̃}i )    (4.37)
4.2.2.1 SEVP
If (λi , {φ}i ) is an eigenpair of the SEVP, then:
or
β([A]{φ}i − λi [I]{φ}i ) = {0} (4.42)
Since β 6= 0:
[A]{φ}i − λi [I]{φ}i = {0} must hold (4.43)
Hence, (λi , β{φ}i ) is an eigenpair of SEVP as (4.43) is identical to the SEVP.
GEVP
or
β([A]{φ}i − λi [B]{φ}i ) = {0} (4.46)
Since β 6= 0:
[A]{φ}i − λi [B]{φ}i = {0} must hold    (4.47)
Consider:
[A]{φ} − λ[I]{φ} = {0} (4.48)
[A]{φ̃}i − λi[I]{φ̃}i = {0}    (4.49)
Premultiply (4.49) by {φ̃}iᵀ:
{φ̃}iᵀ[A]{φ̃}i − λi{φ̃}iᵀ[I]{φ̃}i = 0    (4.50)
Since {φ̃}i is normalized with respect to [I]:
{φ̃}iᵀ[I]{φ̃}i = 1    (4.51)
∴ {φ̃}iᵀ[A]{φ̃}i = λi    (4.52)
Consider:
[A]{φ} − λ[B]{φ} = {0} (4.53)
For an eigenpair (λi, {φ̃}i):
[A]{φ̃}i − λi[B]{φ̃}i = {0}    (4.54)
Premultiply (4.54) by {φ̃}iᵀ:
{φ̃}iᵀ[A]{φ̃}i − λi{φ̃}iᵀ[B]{φ̃}i = 0    (4.55)
Since {φ̃}i is normalized with respect to [B]:
{φ̃}iᵀ[B]{φ̃}i = 1    (4.56)
Basic Steps:
Let
[B1] = [A] ;  p1 = tr[B1] = Σⁿᵢ₌₁ (b1)ii
[B2] = [A]([B1] − p1[I]) ;  p2 = (1/2) tr[B2] = (1/2) Σⁿᵢ₌₁ (b2)ii
[B3] = [A]([B2] − p2[I]) ;  p3 = (1/3) tr[B3] = (1/3) Σⁿᵢ₌₁ (b3)ii    (4.62)
  ⋮
[Bn] = [A]([Bn−1] − pn−1[I]) ;  pn = (1/n) tr[Bn] = (1/n) Σⁿᵢ₌₁ (bn)ii
or
| 2−λ   −1    0  |
|  −1   2−λ  −1  | = 0
|   0   −1   1−λ |
Laplace expansion using the first row:
(2 − λ) | 2−λ  −1 ; −1  1−λ | − (−1) | −1  −1 ; 0  1−λ | + (0) | −1  2−λ ; 0  −1 | = 0
(2 − λ)[ (2 − λ)(1 − λ) − (−1)(−1) ] + [ (−1)(1 − λ) − (−1)(0) ] = 0
or
p(λ) = −λ3 + 5λ2 − 6λ + 1 = 0
[B2] = [A]([B1] − p1[I])
     = [ 2 −1 0 ; −1 2 −1 ; 0 −1 1 ] ( [ 2 −1 0 ; −1 2 −1 ; 0 −1 1 ] − (5)[ 1 0 0 ; 0 1 0 ; 0 0 1 ] )
     = [ 2 −1 0 ; −1 2 −1 ; 0 −1 1 ] [ −3 −1 0 ; −1 −3 −1 ; 0 −1 −4 ]
∴ [B2] = [ −5 1 1 ; 1 −4 2 ; 1 2 −3 ] ;  p2 = (1/2) tr[B2] = (1/2)(−5 − 4 − 3) = −6
[B3] = [A]([B2] − p2[I])
     = [ 2 −1 0 ; −1 2 −1 ; 0 −1 1 ] ( [ −5 1 1 ; 1 −4 2 ; 1 2 −3 ] − (−6)[ 1 0 0 ; 0 1 0 ; 0 0 1 ] )
     = [ 2 −1 0 ; −1 2 −1 ; 0 −1 1 ] [ 1 1 1 ; 1 2 2 ; 1 2 3 ]
∴ [B3] = [ 1 0 0 ; 0 1 0 ; 0 0 1 ] ;  p3 = (1/3) tr[B3] = (1/3)(1 + 1 + 1) = 1
The characteristic polynomial p(λ) is given by:
p(λ) = (−1)³(λ³ − p1λ² − p2λ − p3)
or
p(λ) = (−1)³(λ³ − 5λ² − (−6)λ − 1)
or
p(λ) = −λ³ + 5λ² − 6λ + 1
which is the same as obtained using the determinant method. We note
that:
[A]⁻¹ = (1/p3)([B2] − p2[I]) = (1/1) [ 1 1 1 ; 1 2 2 ; 1 2 3 ]
∴ [A]⁻¹ = [ 1 1 1 ; 1 2 2 ; 1 2 3 ]
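The Faddeev-Leverrier recursion (4.62) is easy to mechanize; the following Python sketch (an illustration, with numpy assumed available) reproduces p1, p2, p3 and [A]⁻¹ for the 3 × 3 matrix of this example.

    import numpy as np

    def faddeev_leverrier(A):
        n = A.shape[0]
        I = np.eye(n)
        B = A.copy()
        p = [np.trace(B)]              # p1 = tr[B1], with [B1] = [A]
        C = None                       # holds [B_{n-1}] - p_{n-1}[I] at the end
        for k in range(2, n + 1):
            C = B - p[-1] * I
            B = A @ C                  # [Bk] = [A]([B_{k-1}] - p_{k-1}[I])
            p.append(np.trace(B) / k)  # pk = (1/k) tr[Bk]
        A_inv = C / p[-1]              # [A]^{-1} = ([B_{n-1}] - p_{n-1}[I]) / pn
        return p, A_inv

    A = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 1.0]])
    p, A_inv = faddeev_leverrier(A)
    print(p)       # [5.0, -6.0, 1.0], so p(λ) = (-1)^3 (λ^3 - 5λ^2 + 6λ - 1)
    print(A_inv)   # [[1,1,1],[1,2,2],[1,2,3]]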
λ² − 4λ + 3 = 0  ⟹  (λ − 1)(λ − 3) = 0    (4.72)
∴ λ = 1 and λ = 3
Hence, the eigenvalues λ1 and λ2 are (in ascending order):
λ1 = 1 , λ2 = 3 (4.73)
Eigenvectors
Corresponding to eigenvalues λ1 = 1, λ2 = 3, we calculate eigenvectors in
the following. Each eigenvector must satisfy the eigenvalue problem (4.69).
Using λ = λ1 = 1 in (4.69):
( [ 2 −1 ; −1 2 ] − (1)[ 1 0 ; 0 1 ] ) {φ1, φ2}ᵀ = {0, 0}ᵀ    (4.74)
{φ}1 = {φ1, φ2}1ᵀ is the desired eigenvector corresponding to λ1 = 1. From equation (4.74):
[ 1 −1 ; −1 1 ] {φ1, φ2}ᵀ = {0, 0}ᵀ    (4.75)
Obviously det[ 1 −1 ; −1 1 ] = 0, as expected.
To determine {φ}1, we must choose a value for either φ1 or φ2 in (4.75) and then solve for the other. Let φ1 = 1; then using φ1 − φ2 = 0 (or −φ1 + φ2 = 0) we obtain φ2 = 1. Hence {φ}1 = {1, 1}ᵀ and the first eigenpair is (λ1, {φ}1) = (1, {1, 1}ᵀ).
Using λ = λ2 = 3 in (4.69):
( [ 2 −1 ; −1 2 ] − (3)[ 1 0 ; 0 1 ] ) {φ1, φ2}ᵀ = {0, 0}ᵀ    (4.78)
{φ}2 = {φ1, φ2}2ᵀ is the desired eigenvector corresponding to λ2 = 3. From equation (4.78):
[ −1 −1 ; −1 −1 ] {φ1, φ2}ᵀ = {0, 0}ᵀ    (4.79)
In this case we also note that the determinant of the coefficient matrix
in (4.79) is zero (as expected). To determine {φ}2 we must choose a
value of φ1 or φ2 in (4.79) and then solve for the other. Let φ1 = 1, then
using:
−φ1 − φ2 = 0 (4.80)
we obtain:
φ2 = −1
Hence
{φ}2 = {φ1, φ2}ᵀ = {1, −1}ᵀ    (4.81)
We now have the second eigenpair corresponding to the second eigen-
value λ2 = 3:
(λ2, {φ}2) = ( 3, {1, −1}ᵀ )    (4.82)
Thus, the two eigenpairs in ascending order of the eigenvalues are:
( 1, {1, 1}ᵀ )  and  ( 3, {1, −1}ᵀ )    (4.83)
Orthogonality of Eigenvectors
We note that
{φ}1ᵀ[I]{φ}2 = {φ}1ᵀ{φ}2 = [1 1]{1, −1}ᵀ = 0
{φ}2ᵀ[I]{φ}1 = {φ}2ᵀ{φ}1 = [1 −1]{1, 1}ᵀ = 0    (4.84)
That is, {φ}1 and {φ}2 are orthogonal to each other or with respect to [I].
Since
{φ}1ᵀ{φ}1 = [1 1]{1, 1}ᵀ = 2
and {φ}2ᵀ{φ}2 = [1 −1]{1, −1}ᵀ = 2    (4.85)
Therefore
{φ}Ti {φ}j 6= δij ; i, j = 1, 2 (4.86)
Hence, {φ}1 and {φ}2 are not orthonormal.
and
||{φ}2|| = √( {φ}2ᵀ[I]{φ}2 ) = √( {φ}2ᵀ{φ}2 ) = √2    (4.89)
∴ {φ̃}2 = (1/||{φ}2||) {φ}2 = (1/√2) {1, −1}ᵀ = {1/√2, −1/√2}ᵀ    (4.90)
Thus, {φ̃}1 and {φ̃}2 are orthogonal and normalized (with respect to [I]); hence these eigenvectors are orthonormal.
The only difference between (4.92) and (4.93) is that in the right side of
(4.92) we have [I] instead of [B]. Thus (4.92) can be obtained from (4.93)
by redefining [B] as [I]. Hence, instead of (4.92) and (4.93) we could define
a new eigenvalue problem:
or
[I]{x} = λ[A]−1 {x} (4.95)
If we define [Ã] = [I] and [B̃] = [A]⁻¹ in (4.95), then we obtain (4.94). In the case of (4.93), we could premultiply it by [A]⁻¹ (provided [A]⁻¹ exists) to obtain:
[A]⁻¹[A]{x} = λ[A]⁻¹[B]{x}
or
[I]{x} = λ([A]⁻¹[B]){x}    (4.96)
If we define [Ã] = [I] and [B̃] = [A]⁻¹[B] in (4.96), then again we obtain (4.94). Alternatively, in the case of (4.93), we can also premultiply by [B]⁻¹ (provided [B]⁻¹ exists) to obtain:
[B]⁻¹[A]{x} = λ[B]⁻¹[B]{x}
or
([B]⁻¹[A]){x} = λ[I]{x}    (4.97)
If we define [Ã] = [B]⁻¹[A] and [B̃] = [I] in (4.97), then we also obtain (4.94).
Thus, the eigenvalue problem in the form (4.94) is the most general representation of any one of the five eigenvalue problem forms defined by (4.92), (4.93), (4.95), (4.96), and (4.97). Regardless of the form we use, the eigenvalues remain unaffected due to the fact that in all these various forms λ has never been changed. However, a [B̃]-normalized eigenvector may be different if [B̃] is not the same.
It then suffices to consider the eigenvalue problem (4.94) for presenting details of the vector iteration method. The specific choices of [Ã] and [B̃] may be application dependent, but these choices do not influence the eigenvalues. As mentioned earlier, when the vector iteration method is applied to an eigenvalue problem such as (4.94), it always yields the lowest eigenpair (proof omitted) (λ1, {φ}1). Calculation of the lowest eigenpair by vector iteration is also called the Inverse Iteration Method.
Remarks.
(1) The method described above to calculate the smallest eigenpair is called
the inverse iteration method, in which we iterate for an eigenvector and
then for an eigenvalue, hence the method is sometimes also called the
vector iteration method.
(3) Using the method presented here it is only possible to determine (λ1 , {φ}1 )
(proof omitted), the smallest eigenvalue and the corresponding eigenvec-
tor.
Since the vector iteration technique described in the previous section only determines the lowest eigenpair, we must recast (4.108) and (4.109) in alternate forms that will allow us to determine the largest eigenpair. Divide (4.108) and (4.109) by λ:
[I]{x} = (1/λ)[A]{x}    (4.110)
[B]{x} = (1/λ)[A]{x}    (4.111)
Let
1/λ = λ̃    (4.112)
Then, (4.110) and (4.111) become:
[I]{x} = λ̃[A]{x}    (4.113)
[B]{x} = λ̃[A]{x}    (4.114)
[Ã]{x} = λ̃[B̃]{x}    (4.115)
The alternate forms of (4.113) and (4.114) described in Section 4.3.2.1 are
possible to define here too. If we premultiply (4.113) by [A]−1 (provided
[A]−1 exists), we obtain:
The lowest eigenpair (λ̃1, {φ̃}1) of this form gives us (1/λ̃1, {φ̃}1) = (λn, {φ}n), the largest eigenpair of the original problem.
[Ã]{x} = λ̃[B̃]{x}    (4.119)
The details are exactly the same as presented for calculating (λ1 , {φ}1 ), but
we repeat these in the following for the sake of completeness. We want
to calculate (λ̃1, {φ̃}1), in which λ̃1 is the lowest eigenvalue and {φ̃}1 is the associated eigenvector.
[Ã]{x̄}k+1 = P̃({x̄}k+1)[B̃]{x̄}k+1    (4.121)
{x̄}k+1ᵀ[Ã]{x̄}k+1 = P̃({x̄}k+1){x̄}k+1ᵀ[B̃]{x̄}k+1
∴ P̃({x̄}k+1) = ( {x̄}k+1ᵀ[Ã]{x̄}k+1 ) / ( {x̄}k+1ᵀ[B̃]{x̄}k+1 )    (4.122)
5. For each value of k we check for the convergence of the calculated eigen-
pair.
For eigenvalue:  ( P̃({x̄}k+1) − P̃({x̄}k) ) / P̃({x̄}k+1) ≤ ∆1    (4.126)
Remarks.
(1) In the vector iteration method described above for k = 1, 2, . . . the
following holds.
lim (k→∞) P̃({x̄}k) = λ̃1 ; lowest eigenvalue    (4.128)
1/λ̃1 = λn ; largest eigenvalue    (4.129)
lim (k→∞) {x}k = {φ̃}1 ; eigenvector corresponding to λn    (4.130)
(2) Using the method presented it is only possible to determine one eigen-
pair.
(3) Use of (4.115) in the development of the computational procedure permits treatment of the SEVP as well as the GEVP in any one of the desired forms shown earlier by choosing appropriate definitions of [Ã] and [B̃].
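A compact Python sketch of the vector iteration defined by (4.121)-(4.122) is given below; the matrix, starting vector, and tolerance are illustrative, and choosing [Ã] = [A], [B̃] = [I] corresponds to inverse iteration for the lowest eigenpair of the SEVP used in the following example.

    import numpy as np

    def vector_iteration(At, Bt, x0, tol=1.0e-5, max_iter=100):
        x = np.array(x0, dtype=float)
        lam_old = None
        for k in range(max_iter):
            x_bar = np.linalg.solve(At, Bt @ x)                 # [Ã]{x̄}_{k+1} = [B̃]{x}_k
            lam = (x_bar @ At @ x_bar) / (x_bar @ Bt @ x_bar)   # eq. (4.122)
            x = x_bar / np.sqrt(x_bar @ Bt @ x_bar)             # [B̃]-normalize
            if lam_old is not None and abs((lam - lam_old) / lam) <= tol:
                break
            lam_old = lam
        return lam, x

    A = np.array([[2.0, -1.0], [-1.0, 4.0]])
    lam1, phi1 = vector_iteration(A, np.eye(2), [1.0, 1.0])
    print(lam1, phi1)    # about 1.5858 and the corresponding normalized eigenvector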
For k = 1
We have ( 1, {1, 1}ᵀ ) as the initial guess of the eigenpair ({1, 1}ᵀ is not orthogonal to {φ}1). Using (4.138), we have:
[ 2 −1 ; −1 4 ] {x̄1, x̄2}2ᵀ = [ 1 0 ; 0 1 ] {1, 1}ᵀ
Hence
{x̄1, x̄2}2ᵀ = {0.71429, 0.42857}ᵀ = {x̄}2
P̃({x̄}2) = ( {x̄}2ᵀ[A]{x̄}2 ) / ( {x̄}2ᵀ[I]{x̄}2 ) = 1.6471
{x}2 = {x̄}2 / √( {x̄}2ᵀ[I]{x̄}2 ) = {0.85749, 0.51450}ᵀ
∴ The new estimate of the first eigenpair is
( P̃({x̄}2), {x1, x2}2ᵀ ) = ( 1.6471, {0.85749, 0.51450}ᵀ )
For k = 2
[ 2 −1 ; −1 4 ] {x̄1, x̄2}3ᵀ = [I] {x1, x2}2ᵀ
Hence
{x̄1, x̄2}3ᵀ = {0.56350, 0.26950}ᵀ = {x̄}3
P̃({x̄}3) = ( {x̄}3ᵀ[A]{x̄}3 ) / ( {x̄}3ᵀ[I]{x̄}3 ) = 1.5938
{x}3 = {x̄}3 / √( {x̄}3ᵀ[I]{x̄}3 ) = {0.90213, 0.43146}ᵀ
[Table 4.1 columns: k, P̃({x̄}k+1), (P̃({x̄}k+1) − P̃({x̄}k))/P̃({x̄}k+1), and the normalized eigenvector components x1, x2.]
The convergence tolerance ∆1 = O(10−5 ) has been used for the eigenvalue
in the results listed in Table 4.1.
Example 4.4 (Determining the Largest Eigenpair: Forward Itera-
tion Method). Consider the following SEVP:
[ 2 −1 ; −1 4 ] {x1, x2}ᵀ = λ [ 1 0 ; 0 1 ] {x1, x2}ᵀ    (4.139)
The basic steps to be used in this method have already been presented,
so we do not repeat these here. Instead we present calculation details and
numerical results.
Choose λ̃ = 1.0 and {x1, x2}1ᵀ = {1.0, 1.0}ᵀ and rewrite (4.140) in the difference equation form with λ̃ = 1. The vector {x}1 must not be orthogonal to {φ̃}1.
[ 1 0 ; 0 1 ] {x̄1, x̄2}k+1ᵀ = [ 2 −1 ; −1 4 ] {x1, x2}kᵀ  ;  k = 1, 2, . . .    (4.141)
For k = 1
We have ( 1, {1, 1}ᵀ ) as the initial guess of the largest eigenvalue and the corresponding eigenvector, hence:
[ 1 0 ; 0 1 ] {x̄1, x̄2}2ᵀ = [ 2 −1 ; −1 4 ] {1, 1}ᵀ
∴ {x̄1, x̄2}2ᵀ = {1.0, 3.0}ᵀ = {x̄}2 ; new estimate of eigenvector
For k = 2
[ 1 0 ; 0 1 ] {x̄1, x̄2}3ᵀ = [ 2 −1 ; −1 4 ] {x1, x2}2ᵀ
[ 1 0 ; 0 1 ] {x̄1, x̄2}3ᵀ = [ 2 −1 ; −1 4 ] {0.17678, 0.53033}ᵀ
{x̄1, x̄2}3ᵀ = {−0.17678, 1.9445}ᵀ = {x̄}3
Using {x̄}3 and (4.140), we obtain a new estimate of λ̃.
P̃({x̄}3) = ( {x̄}3ᵀ[Ã]{x̄}3 ) / ( {x̄}3ᵀ[B̃]{x̄}3 ) = ( {x̄}3ᵀ[I]{x̄}3 ) / ( {x̄}3ᵀ[ 2 −1 ; −1 4 ]{x̄}3 ) = 0.24016
Normalize {x̄}3 to obtain {x}3:
{x}3 = {x̄}3 / √( {x̄}3ᵀ[B̃]{x̄}3 ) = {−0.044368, 0.48805}ᵀ
[Iteration results table: k, P̃({x̄}k+1), (P̃({x̄}k+1) − P̃({x̄}k))/P̃({x̄}k+1), normalized eigenvector components x1, x2.]
As shown in the previous example, we can use (4.143) and the forward iteration method to find the smallest λ̃, i.e., λ̃1, and hence the largest eigenvalue of the original problem.
[Iteration results table: k, P̃({x̄}k+1), (P̃({x̄}k+1) − P̃({x̄}k))/P̃({x̄}k+1), normalized eigenvector components x1, x2.]
Thus, λ̃1 = 0.22653547 (same as in the previous example) and we have:
λ2 = 1/λ̃1 = 1/0.22653547 = 4.414320 (largest eigenvalue)
The eigenvector in this case is different than previous example due to the
fact that it is normalized differently. Hence, we have
(λ2, {φ}2) = ( 4.414320, {−0.38246, 0.92397}ᵀ )
We recall that the inverse iteration method only yields the lowest eigen-
pair while the forward iteration method gives the largest eigenpair. These
two methods do not have a mechanism for determining intermediate or sub-
sequent eigenpairs. For this purpose we utilize Gram-Schmidt orthogonaliza-
tion or the iteration vector deflation method in conjunction with the inverse
or forward iteration method.
The basis for Gram-Schmidt orthogonalization or iteration vector defla-
tion is that in order for an assumed eigenvector (iteration vector) to converge
to the desired eigenvector in the inverse or forward iteration method, the it-
eration vector must not be orthogonal to the desired eigenvector. In other
words, if the iteration vector is orthogonalized to the eigenvectors that have
already been calculated, then we can eliminate the possibility of convergence
of iteration vector to any one of them and hence convergence will occur to
the next eigenvector. A particular orthogonalization procedure used exten-
sively is called Gram-Schmidt orthogonalization process or iteration vector
deflation method. This procedure can be used for the SEVP as well as the
GEVP in the inverse or forward iteration methods.
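The following Python sketch illustrates inverse iteration combined with Gram-Schmidt (iteration vector) deflation; the GEVP matrices are those of the worked example that follows, while the starting vectors and tolerance are assumed for illustration.

    import numpy as np

    def inverse_iteration_deflated(A, B, known_phis, x0, tol=1.0e-5, max_iter=200):
        x = np.array(x0, dtype=float)
        lam = None
        for k in range(max_iter):
            # deflation: x̃ = x - Σ αi {φ}i, with αi = {φ}iᵀ[B]{x}
            for phi in known_phis:
                x = x - (phi @ B @ x) * phi
            x_bar = np.linalg.solve(A, B @ x)
            lam_new = (x_bar @ A @ x_bar) / (x_bar @ B @ x_bar)
            x = x_bar / np.sqrt(x_bar @ B @ x_bar)        # [B]-normalized iterate
            if lam is not None and abs((lam_new - lam) / lam_new) <= tol:
                return lam_new, x
            lam = lam_new
        return lam, x

    A = np.array([[1.0, -1.0], [-1.0, 2.0]])
    B = np.array([[2.0, 1.0], [1.0, 2.0]])
    lam1, phi1 = inverse_iteration_deflated(A, B, [], [1.0, 1.0])
    lam2, phi2 = inverse_iteration_deflated(A, B, [phi1], [1.0, 1.0])
    print(lam1, lam2)    # the second value is about 2.5352, as in the example below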
Based on the material presented for the inverse and forward iteration
methods it suffices to consider:
Remarks.
(1) In the inverse iteration method we find the lowest eigenpair (λ1 , {φ}1 )
using the usual procedure and then use vector deflation to find (λ2 , {φ}2 ),
(λ3 , {φ}3 ),. . . , in ascending order.
(2) In the forward iteration method we find the largest eigenpair (λn , {φ}n )
using usual procedure and then use iteration vector deflation to find
(λn−1 , {φ}n−1 ), (λn−2 , {φ}n−2 ), . . . , in descending order.
[B]{x} = λ̃[A]{x} ;  λ̃ = 1/λ    (4.156)
or [Ã]{x} = λ̃[B̃]{x}    (4.157)
where [Ã] = [B] ; [B̃] = [A]    (4.158)
5. Calculate a new estimate of λm+1 , say Pm+1 ({x̄}k+1 ), using (4.161) and
{x̄}k+1 .
Pm+1({x̄}k+1) = ( {x̄}k+1ᵀ[A]{x̄}k+1 ) / ( {x̄}k+1ᵀ[B]{x̄}k+1 )    (4.165)
7. Convergence check:
Remarks.
[A]{x} = λ[B]{x}
in which
[A] = [ 1 −1 ; −1 2 ]  and  [B] = [ 2 1 ; 1 2 ]
We want to calculate both eigenpairs of this GEVP by using inverse
iteration method with vector deflation.
[Iteration results table: k, P({x̄}k+1), (P({x̄}k+1) − P({x̄}k))/P({x̄}k+1), normalized eigenvector components x1, x2.]
We already have:
{φ}1 = {0.49079, 0.31971}ᵀ
hence m, the number of known eigenpairs, is one.
For k = 1
αi = {φ}iᵀ[B]{x}1
[A]{x̄}2 = [B]{x̃}1
Calculate {x̄}2 using {x̃}1 from Step 3.
{x̄}2 = [ 1 −1 ; −1 2 ]⁻¹ [ 2 1 ; 1 2 ] {−0.19335, 0.22262}ᵀ = {−0.076285, 0.087799}ᵀ
For k = 2
1. Choose λ = 1 and {x}2 = {−0.65268, 0.75120}ᵀ from Step 6 for k = 1.
2. Calculate the scalar α1:
α1 = {φ}1ᵀ[B]{x}2 = {0.49079, 0.31971} [ 2 1 ; 1 2 ] {−0.65268, 0.75120}ᵀ
or
α1 = −0.31083 × 10⁻³
[Iteration results table: k, P({x̄}k+1), (P({x̄}k+1) − P({x̄}k))/P({x̄}k+1), normalized eigenvector components x1, x2.]
From the relative error, we note that for k = 2, we have converged values
of the second eigenpair, hence:
(λ2, {φ}2) = ( 2.5352, {−0.65268, 0.75120}ᵀ )
Thus, the second eigenpair is determined using (λ1 , {φ}1 ) and iteration
vector deflation method. Just in case the estimates of the second eigenpair
are not accurate enough for k = 2, the process can be continued for
k = 3, 4, . . . until desired accuracy is achieved.
(iii) Shifting can be used to calculate eigenpairs other than (λ1 , {φ}1 ) and
(λn , {φ}n ) in inverse and forward iteration methods.
in which η and {y} are an eigenvalue and eigenvector of the GEVP (4.173).
µ is called the shift and the GEVP defined by (4.173) is called the shifted
GEVP.
λ=η+µ or η =λ−µ
(4.175)
{y} = {x}
Thus,
(i) The eigenvectors of the original GEVP and shifted GEVP are the same.
Remarks.
(1) Shifting also holds for the SEVP as in this case the only difference com-
pared to GEVP is that [B] = [I].
or
[P1]ᵀ[[A] − λ[I]][P1]{x}1 = {0}    (4.178)
In the eigenvalue problem (4.178), {x}1 is the eigenvector (and not {x}),
thus a change of basis alters eigenvectors. To determine the eigenvalues of
or
det[P1 ]T det[[A] − λ[I]] det[P1 ] = 0 (4.180)
Since [P1 ] is orthogonal:
in which [A] and [B] are symmetric matrices. Here also, we show that an
orthogonal transformation on (4.183) does not alter its eigenvalues but the
eigenvectors change. As in the case of the SEVP, we replace {x} by {x}1
through an orthogonal transformation of the type:
in which
[P1 ]−1 = [P1 ]T and det[P1 ] = det[P1 ]T = 1 (4.185)
Substituting from (4.184) into (4.183) and premultiplying (4.183) by [P1 ]T :
or
[P1]ᵀ[[A] − λ[B]][P1]{x}1 = {0}    (4.187)
In the eigenvalue problem (4.187), {x}1 is the eigenvector (and not {x}), thus
change of basis alters eigenvectors. To determine the eigenvalues of (4.187),
Remarks.
(a) In the case of the SEVP, [A] becomes a diagonal matrix but [I]
matrix remains unaltered. Then, the diagonals of this transformed
[A] matrix are the eigenvalues and columns of the products of the
transformation matrices contain the eigenvectors.
(b) In the case of the GEVP, we make both [A] and [B] diagonal ma-
trices through orthogonal transformations. Then, the ratios of the
corresponding diagonals of transformed [A] and [B] are the eigen-
values and the columns of the products of transformation matrices
contain the eigenvectors.
(3) The eigenpairs are not in any particular order, hence these must be
arranged in ascending order.
(4) Just like the root-finding methods used for the characteristic polynomial,
the transformation methods are also iterative. Thus, the eigenpairs are
determined only within the accuracy of preset thresholds for the eigenval-
ues and eigenvectors. The transformation methods are indeed methods
of approximation.
(5) In the following sections we present details of the Jacobi method for the SEVP and the Generalized Jacobi method for the GEVP, and only provide an outline of the Householder method with QR iterations.
and
{x} = [P1 ][P2 ]{x}2 (4.200)
Equations (4.200) describe how the eigenvectors {x}2 of (4.199) are related
to the eigenvectors {x} of the original SEVP (4.191). We construct [P2 ]
such that the transformation (4.199) makes an off-diagonal element zero in
[P2 ]T ([P1 ]T [A][P1 ])[P2 ]. This process is continued by choosing off-diagonal el-
ements of the progressively transformed [A] in sequence until all off-diagonal
elements have been considered. In this process it is possible that when we
zero out a specific element of the transformed [A], the element that was made
zero in the immediately preceding transformation may not remain zero, but
may be of a lower magnitude than its original value. Thus, to make [A] diago-
nal it may be necessary to make more than one pass through the transformed
off-diagonal elements of [A]. We discuss details in the following sections.
Thus, after k transformations, we obtain:
[Pk]ᵀ[Pk−1]ᵀ . . . [P2]ᵀ[P1]ᵀ[A][P1][P2] . . . [Pk−1][Pk]{x}k = λ[I]{x}k    (4.201)
lim (k→∞) [Pk]ᵀ[Pk−1]ᵀ . . . [P2]ᵀ[P1]ᵀ[A][P1][P2] . . . [Pk−1][Pk] = [Λ]    (4.202)
in which [Λ] is a diagonal matrix containing the eigenvalues and the columns
of the square matrix [Φ] are the corresponding eigenvectors. Thus in the
end, when [A] becomes the diagonal matrix [Λ] we have all eigenvalues in
[Λ] and [Φ] contains all eigenvectors. In (4.202) and (4.203), each value of
k corresponds to a complete pass through all of the off-diagonal elements of
the progressively transformed [A] that are not zero.
[Pl ] orthogonal matrices in the Jacobi method are called rotation matri-
ces as these represent rigid rotations of the coordinate axes. A specific [Pl ] is
designed to make a specific off-diagonal term of the transformed [A] (begin-
ning with [P1 ] for [A]) zero. Since [A] is symmetric, [Pl ] can be designed to
make an off-diagonal term of [Al ] (transformed [A]) as well as its transposed
term zero at the same time.
Let us say that we have already performed some transformations and the
new transformed [A] is [Al ]. We wish to make alij and alji of [Al ] zero. We
design [Pl ] as follows to accomplish this.
[Pl] is an identity matrix except for the entries in rows and columns i and j:
(Pl)ii = cos θ , (Pl)ij = −sin θ , (Pl)ji = sin θ , (Pl)jj = cos θ    (4.204)
In [Pl ], the elements corresponding to rows i, j and columns i, j are non-zero.
The remaining diagonal elements are unity, and all other elements are zero.
θ is chosen such that in the following transformed matrix:
The elements at the locations i, j and j, i, i.e., alij = alji , have become zero.
This allows us to determine θ.
tan(2θ) = 2aˡij / (aˡii − aˡjj)  ;  aˡjj ≠ aˡii
and θ = π/4 when aˡjj = aˡii    (4.206)
1. Consider the off-diagonal elements row-wise, and each element of the row
in sequence. That is first consider row one of the original matrix [A] and
the off-diagonal element a12 (a21 = a12 ). We use a11 , a22 , and a12 to
determine θ using (4.206) and then use this value of θ to construct [P1 ]
using (4.204) and perform the orthogonal transformation on [A] to obtain
[A1 ].
[P1 ]T [A][P1 ] = [A1 ] (4.208)
In [A1 ], a112 and a121 have become zero.
2. Next consider [A1 ] and the next element in row one, i.e., a113 (a131 = a113 ).
Using a111 , a133 , and a113 and (4.206) determine θ and then use this value
7. Threshold Jacobi
In threshold Jacobi we perform an orthogonal transformation to zero out
an off-diagonal element of [A] (or of the most recently transformed [A])
and then check the magnitude of the next off-diagonal element to be made
zero. If it is below the threshold we skip the orthogonal transformation
for it and move to the next off-diagonal element in sequence, keeping
in mind that the same rule applies to the current off-diagonal element as well as those to come. It is clear that in this procedure we avoid
unnecessary transformation for the elements that are already within the
threshold of zero. Thus, the threshold Jacobi clearly is more efficient than
cyclic Jacobi.
[A]{x} = λ[I]{x}
in which
[A] = [ 2 −1 ; −1 4 ]
In this case there is only one off-diagonal term in [A], a12 = a21 = −1 (row 1,
column 2 ; i = 1, j = 2). We construct [P1 ] or [P12 ] to make a12 = a21 = −1
zero in [A]. The subscript 12 in [P12 ] implies the matrix [P ] corresponding
to the element of [A] located at row 1, column 2.
[P12] = [ cos θ  −sin θ ; sin θ  cos θ ]
tan 2θ = 2a12 / (a11 − a22) = 2(−1) / (2 − 4) = −2/−2 = 1
∴ 2θ = π/4 ; θ = π/8
cos(π/8) = 0.92388 ; sin(π/8) = 0.38268
∴ [P12] = [ 0.92388  −0.38268 ; 0.38268  0.92388 ]
∴ [A1 ] = [P12 ]T [A][P12 ]
or
[A1] = [ 0.92388  0.38268 ; −0.38268  0.92388 ] [ 2 −1 ; −1 4 ] [ 0.92388  −0.38268 ; 0.38268  0.92388 ]
     = [ 0.92388  0.38268 ; −0.38268  0.92388 ] [ 1.46508  −1.68924 ; 0.60684  4.0782 ]
     = [ 1.58578  0 ; 0  4.41421 ] = [ λ1  0 ; 0  λ2 ]
Hence, we have:
(λ1, {φ}1) = ( 1.58578, {0.92388, 0.38268}ᵀ )
and (λ2, {φ}2) = ( 4.41421, {−0.38268, 0.92388}ᵀ )
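A short Python sketch of the cyclic Jacobi method for a symmetric SEVP is given below; it uses the rotation angle of (4.206), and the convergence threshold and sweep limit are assumed values.

    import numpy as np

    def jacobi_eigen(A, tol=1.0e-10, max_sweeps=20):
        A = A.astype(float).copy()
        n = A.shape[0]
        Phi = np.eye(n)                  # accumulates [P1][P2]... (columns -> eigenvectors)
        for _ in range(max_sweeps):
            off = max(abs(A[i, j]) for i in range(n) for j in range(n) if i != j)
            if off < tol:
                break
            for i in range(n - 1):
                for j in range(i + 1, n):
                    if abs(A[i, j]) < tol:
                        continue                    # threshold-style skip
                    if A[i, i] == A[j, j]:
                        theta = np.pi / 4.0         # special case of (4.206)
                    else:
                        theta = 0.5 * np.arctan(2.0 * A[i, j] / (A[i, i] - A[j, j]))
                    P = np.eye(n)
                    P[i, i] = P[j, j] = np.cos(theta)
                    P[i, j] = -np.sin(theta)
                    P[j, i] = np.sin(theta)
                    A = P.T @ A @ P
                    Phi = Phi @ P
        return np.diag(A), Phi

    lam, Phi = jacobi_eigen(np.array([[2.0, -1.0], [-1.0, 4.0]]))
    print(lam)    # approximately [1.58579, 4.41421]
    print(Phi)    # columns approximately equal to the eigenvectors found above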
in which [A] and [B] are symmetric matrices and [B] 6= [I]. Our aim in the
generalized Jacobi method is to perform a series of orthogonal transformation
on (4.211) such that:
(i) [A] becomes a diagonal matrix and [B] becomes [I]. The diagonal ele-
ments of the final transformed [A], i.e., [Ak ] (after k transformations),
will be the eigenvalues of (4.211) and the columns of the product of the
transformation matrices will contain the corresponding eigenvectors.
(ii) [A] and [B] both become diagonal, but [B] is not an identity matrix. In
this approach the ratios of the diagonal elements of the transformed [A]
and [B], i.e., [Ak ] and [B k ] (after k transformations), will be the eigen-
values and the columns of the product of the transformation matrices
will contain the corresponding eigenvectors.
(iii) Using either (i) or (ii), the results remain unaffected. In designing
transformation matrices, (ii) is easier.
in which [P1 ] is orthogonal, i.e., [P1 ]T = [P1 ]−1 . Substituting from (4.212)
in (4.211) and premultiplying by [P1 ]T :
We choose [P1 ] such that an off-diagonal element of [A] and the corresponding
off-diagonal element of [B] become zero. If we define:
[A1 ] and [B 1 ] are the transformed [A] and [B] after the first orthogonal
transformation. Perform another change of basis on (4.215).
We choose [P2 ] such that it makes an off-diagonal element of [A1 ] and the
corresponding element of [B 1 ] zero and
The [Pl ] matrices are called rotation matrices, same as in the case of
the Jacobi method for the SEVP. In the design of [Pl ] we take into account
that [A] and [B] are symmetric. To be general, consider [Al ] and [B l ] after l
transformations. Let us say that we want to make alij and blij zero (alji and
blji are automatically made zero as we consider symmetry of [A] and [B] in
designing [Pl+1 ]), then [Pl+1 ] can have the following form.
[Pl+1] is an identity matrix except for the entries in rows and columns i and j:
(Pl+1)ii = 1 , (Pl+1)ij = α , (Pl+1)ji = β , (Pl+1)jj = 1    (4.225)
The parameters α and β are determined such that in the transformed [Aˡ] and [Bˡ], i.e., in
[Pl+1]ᵀ[Aˡ][Pl+1] and [Pl+1]ᵀ[Bˡ][Pl+1]    (4.226)
aˡij and bˡij are zero (hence, aˡji and bˡji are zero). Using [Pl+1] in (4.226) and setting aˡij = bˡij = 0 gives us the following two equations.
a12 = a21 = −1
b12 = b21 = 1
[P12] = [ 1  α ; β  1 ]
We note that λ’s are not in ascending order, but can be arranged in ascending
or descending order.
2. Using the tridiagonal form of [A] and [I] in the transformed (4.229),
we perform QR iterations to extract the eigenvalues and eigenvectors of
(4.229).
(ii) Unlike the Jacobi method, in the Householder method once a row and
the corresponding column are in tridiagonal form, subsequent transfor-
mations for other rows and columns do not affect them.
[An−2] = [Pn−2]ᵀ[Pn−3]ᵀ . . . [P2]ᵀ[P1]ᵀ[A][P1][P2] . . . [Pn−3][Pn−2]    (4.234)
{x} = [P1][P2] . . . [Pn−3][Pn−2]{x}n−2    (4.235)
[An−2 ] is the final tridiagonal form of [A]. We note that [I] remains unaf-
fected.
[P1] = [ 1  [0] ; {0}  [P̄1] ]  ;  [A] = [ a11  {a1}ᵀ ; {a1}  [A11] ]  ;  {w1} = { 0 ; {w̄1} }    (4.238)
where [P̄1], {w̄1} and [A11] are of order (n − 1). Premultiply [A] by [P1]ᵀ and post-multiply by [P1] to obtain [A1].
[A1] = [ 1  [0] ; {0}  [P̄1]ᵀ ] [ a11  {a1}ᵀ ; {a1}  [A11] ] [ 1  [0] ; {0}  [P̄1] ]    (4.239)
or
[A1] = [ a11  {a1}ᵀ[P̄1] ; [P̄1]ᵀ{a1}  [P̄1]ᵀ[A11][P̄1] ]    (4.240)
In [A1] the first row and the first column should be tridiagonal, i.e., [A1] should have the following form: the first row is (a11, x, 0, 0, . . . , 0), the first column is (a11, x, 0, . . . , 0)ᵀ, and the remaining (n − 1) × (n − 1) block is [Ā1].    (4.241)
[P̄1 ] is called the reflection matrix. We are using [P̄1 ] to reflect {a1 } of
[A] into a vector such that only its first component is non-zero (obvious by
comparing (4.240) and (4.241)). Since the length of the vector corresponding
to row one or column one (excluding a11 ) must be the same as the length of
{a1 }, we can use this condition to determine {w1 } (i.e., first {w̄1 } and then
{w1 }).
where a21 is the element (2,1) of matrix [A]. The vector {w1 } is obtained
using {w̄1 } from (4.244) in (4.238). Thus, [P1 ] is defined and we can perform
the Householder transformation for column one and row one to obtain [A1 ],
in which column one and row one are in tridiagonal form.
Next we consider column two and row two to obtain [P2 ] and then use:
In [A2 ] the first two columns and rows are in tridiagonal form. We continue
this (n − 2) times to finally obtain [An−2 ] in tridiagonal form, and we can
write:
[An−2 ]{x}n−2 = λ[I]{x}n−2 (4.247)
and {x} = [P1 ][P2 ] . . . [Pn−3 ][Pn−2 ]{x}n−2 (4.248)
Equation (4.248) is essential to recover the original eigenvector {x}.
The matrix [Q] is orthogonal and [R] is upper triangular. Since [Q] is or-
thogonal we perform a change of basis on (4.247).
or
[Q1 ]T [An−2 ][Q1 ]{x}1n−2 = λ[I]{x}1n−2 (4.253)
Using (4.251) in the left side of (4.253) for [An−2 ]:
[Q1 ]T [An−2 ][Q1 ] = [Q1 ]T [Q1 ][R1 ][Q1 ] = [R1 ][Q1 ] (4.254)
sin θ = aⁿ⁻²ji / √( (aⁿ⁻²ii)² + (aⁿ⁻²ji)² )  ;  cos θ = aⁿ⁻²ii / √( (aⁿ⁻²ii)² + (aⁿ⁻²ji)² )    (4.258)
contain the eigenvectors of the SEVP and the diagonal elements of the final
transformed [An−2 ] are the eigenvalues. We also note that these eigenvalues
are not in any particular order.
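As a hedged illustration of the QR iteration, the Python sketch below repeatedly factors the current matrix and forms [R][Q], accumulating the [Q] factors; numpy's built-in QR factorization is used in place of explicit Givens rotations such as (4.258), and the tolerance is an assumed value.

    import numpy as np

    def qr_iteration(A, tol=1.0e-10, max_iter=500):
        A = A.astype(float).copy()
        n = A.shape[0]
        V = np.eye(n)                      # product of the [Q] matrices
        for _ in range(max_iter):
            Q, R = np.linalg.qr(A)         # [A_k] = [Q][R]
            A = R @ Q                      # [A_{k+1}] = [R][Q] = [Q]^T [A_k][Q]
            V = V @ Q
            if np.max(np.abs(A - np.diag(np.diag(A)))) < tol:
                break
        return np.diag(A), V

    # tridiagonal matrix [A2] obtained in the Householder example that follows
    A2 = np.array([[5.0, 4.1231, 0.0, 0.0],
                   [4.1231, 7.8823, -4.0276, 0.0],
                   [0.0, -4.0276, 7.3941, 2.3219],
                   [0.0, 0.0, 2.3219, 1.7236]])
    lam, V = qr_iteration(A2)
    print(np.sort(lam))                    # eigenvalues, sorted in ascending order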
{a1}ᵀ = [−4 1 0]
sign(a21) = sign(−4) = −
||a1|| = √( (−4)² + (1)² + (0)² ) = √17 = 4.1231
∴ {w̄1} = {−4, 1, 0}ᵀ − 4.1231 {1, 0, 0}ᵀ = {−8.1231, 1, 0}ᵀ
{w1} = { 0 ; {w̄1} } = {0, −8.1231, 1, 0}ᵀ
θ = 2 / ( {w1}ᵀ{w1} ) = 2 / ( 0 + (−8.1231)(−8.1231) + (1)(1) + 0 ) = 0.029857
∴ [P1] = [I] − θ{w1}{w1}ᵀ =
| 1      0        0       0 |
| 0   −0.9701   0.2425    0 |
| 0    0.2425   0.9701    0 |
| 0      0        0       1 |
[A1] = [P1]ᵀ[A][P1] =
| 5       4.1231    0        0      |
| 4.1231  7.8823    3.5294  −1.9403 |
| 0       3.5294    4.1177  −3.6380 |
| 0      −1.9403   −3.6380   5      |
∴ {w̄2} = {3.5294, −1.9403}ᵀ + 4.0276 {1, 0}ᵀ = {7.5570, −1.9403}ᵀ
{w2} = {0, 0, 7.5570, −1.9403}ᵀ ;  θ = 2 / ( {w2}ᵀ{w2} ) = 0.032855
∴ [P2] = [I] − θ{w2}{w2}ᵀ =
| 1   0     0        0      |
| 0   1     0        0      |
| 0   0   −0.8763   0.4817  |
| 0   0    0.4817   0.8763  |
∴ [P2][A1][P2] = [A2] =
| 5       4.1231    0        0      |
| 4.1231  7.8823   −4.0276   0      |
| 0      −4.0276    7.3941   2.3219 |
| 0       0         2.3219   1.7236 |
[A2 ] is the final tridiagonal form of [A]. The tridiagonal form is symmetric
as expected.
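The Householder reflections used above can be sketched in a few lines of Python; the 4 × 4 matrix below is an assumed illustration (only its first row and column, 5, −4, 1, 0, are taken from this example), and numpy is assumed available.

    import numpy as np

    def householder_tridiagonalize(A):
        A = A.astype(float).copy()
        n = A.shape[0]
        P_total = np.eye(n)
        for c in range(n - 2):
            a = A[c + 1:, c]                           # column below the diagonal
            w = a.copy()
            w[0] += np.sign(a[0]) * np.linalg.norm(a)  # {w̄} = {a} + sign(a)||a||{e1}
            wf = np.zeros(n); wf[c + 1:] = w
            theta = 2.0 / (wf @ wf)
            P = np.eye(n) - theta * np.outer(wf, wf)   # reflection matrix
            A = P.T @ A @ P
            P_total = P_total @ P
        return A, P_total

    A = np.array([[5.0, -4.0, 1.0, 0.0],               # assumed illustrative matrix
                  [-4.0, 6.0, 2.0, -1.0],
                  [1.0, 2.0, 4.0, -3.0],
                  [0.0, -1.0, -3.0, 5.0]])
    T, P = householder_tridiagonalize(A)
    print(np.round(T, 4))                              # tridiagonal form
    print(np.allclose(np.sort(np.linalg.eigvalsh(T)),
                      np.sort(np.linalg.eigvalsh(A)))) # eigenvalues are preserved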
(2) We find the projections of [K] and [M ] in the space spanned by the q
matrices using:
(3) We solve the eigenvalue problem constructed using [Kk+1 ] and [Mk+1 ].
(2) Based on the diagonal elements of [K] and [M ], some guidelines can be
used.
(a) Construct ri = kii/mii . If {e1 }, {e2 }, . . . , {en } are unit vectors con-
taining one at rows 1, 2, . . . , n, respectively, and zeroes everywhere
else and if rj , rk , rl are progressively increasing values of ri , then
[x]n×3 consists of [{ej }, {ek }, {el }].
(3) The eigenvalue problem (4.264) can be solved effectively using Gener-
alized Jacobi or QR-Householder method in which all eigenpairs are
determined.
in which
[K] = [ 2 −1 0 0 ; −1 2 −1 0 ; 0 −1 2 −1 ; 0 0 −1 1 ]  ;  [M] = [ 1 0 0 0 ; 0 2 0 0 ; 0 0 4 0 ; 0 0 0 3 ]
In this case:
ri = kii/mii = 2, 1, 0.5, 1/3  for  i = 1, 2, 3, 4
(2) The vector iteration method is quite effective in determining the smallest
and the largest eigenpairs. The vector iteration method with iteration
vector deflation is quite effective in calculating a few eigenpairs. For a
large number of eigenpairs, the orthogonalization process becomes er-
ror prone due to inaccuracies in the numerically computed eigenvectors
(hence, not ensuring their orthogonal property). This can cause the
computations to become erroneous or even to break down completely.
(3) The Jacobi and generalized Jacobi methods yield all eigenpairs, hence
are not practical for eigensystems larger than (50 × 50) or at the most
(100 × 100). In these methods, the off-diagonal terms made zero in
an orthogonal transformation become non-zero in the next orthogonal
transformation, thus these methods may require a larger number of cy-
cles or sweeps before convergence is achieved.
(4) In the Householder method with QR iterations for the SEVP (only), we
tridiagonalize the [A] matrix by Householder transformations and then
use QR iterations on the tridiagonal form to extract the eigenpairs. The
tridiagonalization process is not iterative, but QR iterations are (as the
name suggests). This method also yields all eigenpairs and hence is only
efficient for eigensystems that are smaller than (50 × 50) or at the most
(100 × 100). Extracting eigenpairs using QR iterations is more efficient
with the tridiagonal form of [A] than the original matrix [A]. This is
the main motivation for converting (transforming) [A] to the tridiagonal
form before extracting eigenpairs.
(5) The subspace iteration method is perhaps the most practical method
for larger eigensystems as in this method a large eigenvalue problem
is reduced to a very small eigenvalue problem. Computation of the
eigenpairs only requires working with an eigensystem that is (q × q),
q > p, where p is the desired number of eigenpairs. Generally we choose
q = 2p.
Problems
4.1 Use minors to expand and compute the determinant of
| 2−λ   2    10  |
|  8   3−λ   4   |
| 10    4   5−λ  |
Use Faddeev-Leverrier method to perform the same computations. Also
compute the matrix inverse and verify that it is correct.
where
[A] = [ 1 0 0 ; 0 2 0 ; 0 0 3 ]  ;  [B] = [ 1 0 0 ; 0 1 0 ; 0 0 1 ]
(a) Use inverse iteration method to compute the lowest eigenvalue and
the corresponding eigenvector.
(b) Transform (1) into new SEVP such that the transformed eigenvalue
problem can be used to determine the largest eigenvalue and the
corresponding eigenvector of (1). Compute the largest eigenvalue
and the corresponding eigenvector using this form.
(c) For eigenvalue problem (1), use inverse iteration with iteration vec-
tor deflation technique to compute all its eigenpairs.
(d) Consider the transformed SEVP in (b). Apply inverse iteration
with iteration vector deflation to compute all eigenpairs.
(e) Tabulate and compare the eigenpairs computed in (c) and (d). Dis-
cuss and comment on the results.
where
[A] = [ 2 −1 0 0 ; −1 2 −1 0 ; 0 −1 2 −1 ; 0 0 −1 1 ]  ;  [B] = [ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ]
(a) Use inverse iteration method with iteration vector deflation to com-
pute all eigenpairs of (1).
(b) Transform (1) into SEVP such that the transformed eigenvalue
problem can be used to find the largest eigenvalue and the cor-
responding eigenvector of (1). Apply inverse iteration method to
this transformed eigenvalue problem with iteration vector deflation
to compute all eigenpairs.
(c) Tabulate and compare the eigenpairs computed in (a) and (b). Dis-
cuss and comment on the results.
4.4 (a) Show that λ = 1 and λ = 4 are the eigenvalues of the following
eigenvalue problem without calculating them.
[ 5 3 ; 3 5 ] {x, y}ᵀ = λ [ 2 0 ; 0 2 ] {x, y}ᵀ
Calculate both eigenpairs of (1) using standard Jacobi method. Discuss and
show the orthogonality of the eigenvectors calculated in this method.
are its eigenvectors, then how are the eigenvalues and the eigenvectors of
this problem related to the eigenvalue problem [K̂]{x} = µ[M ]{x}?
Where
[K̂] = [K] + 1.5[M ]
[K] and [M ] are same as in problem 4.10
4.12 Consider the following eigenvalue problem [A]{x} = λ[B]{x}
[ 1 −1 ; −1 1 ] {x1, x2}ᵀ = λ [ 1 0 ; 0 1 ] {x1, x2}ᵀ    (1)
5.1 Introduction
When taking measurements in experiments, we often collect discrete data
that may describe the behavior of a desired quantity of interest at selected
discrete locations. These data in general may be over irregular domains
in R1 , R2 , and R3 . Constructing a mathematical description of these data
is helpful and sometimes necessary if we desire to perform operations of
integration, differentiation, etc. for the physics described by these data. One
of the techniques or methods of constructing a mathematical description
for discrete data is called interpolation. Interpolation yields an analytical
expression for the discrete data. This expression then can be integrated or
differentiated, thus permitting operations of integration or differentiation on
discrete data. The interpolation technique ensures that the mathematical
expression so generated will yield precise function values at the discrete
locations that are used in generating it.
When the discrete data belong to irregular domains in R1 , R2 , and R3 ,
the interpolations may be quite difficult to construct. To facilitate the in-
terpolations over irregular domains, the data in the irregular domains are
mapped into regular domains of known shape and size in R1 , R2 , and R3 .
The desired operations of integration and differentiation are also performed
in the mapped domain and then mapped back to the original (physical)
irregular domain. Details of the mapping theories and interpolations are
considered in this chapter. First, we introduce the concepts of interpolation
theory in R1 in the physical coordinate space (say x). This is followed by
mapping theory in R1 that maps data in the physical coordinate space to
the natural coordinate space ξ in a domain of two unit length with the origin
located at the center of the two unit length. The concepts of piecewise map-
ping in R1 , R2 , and R3 as well as interpolations over the mapped domains
are presented with illustrative examples.
[Figure: discrete data f1, f2, f3, . . . , fi, fi+1, . . . , fn+1 plotted versus x at the locations x1, x2, x3, . . . , xi, xi+1, . . . , xn+1.]
f(x) = a0 + a1x + a2x² + · · · + anxⁿ    (5.3)
f (x)|x=xi = fi ; i = 1, 2, . . . , n + 1 (5.4)
Remarks.
(1) In order for {a} to be unique det[A] 6= 0 must hold. This is ensured if
each xi location is distinct or unique.
(2) When two data point locations (i.e., xi values) are extremely close to
each other the coefficient matrix [A] may become ill-conditioned.
(3) For large data sets (large values of n), this method requires solutions of a
large system of linear simultaneous algebraic equations. This obviously
leads to inefficiency in its computations.
Theorem 5.1. There exists a unique polynomial ψ(x) of degree not exceed-
ing n called the Lagrange interpolating polynomial such that
ψ(xi ) = fi ; i = 1, 2, . . . , n (5.7)
Proof. The existence of the polynomial ψ(x) can be proven if we can estab-
lish the existence of polynomials Lk (x) ; k = 1, 2, . . . , n with the following
properties:
Hence ψ(x) has the desired properties of f (x), an interpolation for the data
set (xi , fi ) ; i = 1, 2, . . . , n.
n
X
f (x) = ψ(x) = fi Li (x) (5.10)
i=1
Remarks.
(1) Lk (x) ; k = 1, 2, . . . , n are polynomials of degree less than or equal to n.
(2) ψ(x) is a linear combination of fk and Lk (x), hence ψ(x) is also a poly-
nomial of degree less than or equal to n.
(3) ψ(xk ) = fk = f (xk ) because Lk (xi ) = 0 for i 6= k and Lk (xk ) = 1.
(4) Lk (x) are called Lagrange interpolating polynomials or Lagrange inter-
polation functions.
(5) The property Σⁿᵢ₌₁ Li(x) = 1 is essential due to the fact that if fi = f* ; i = 1, 2, . . . , n, then f(x) from (5.10) must be f* for all values of x, which is only possible if Σⁿᵢ₌₁ Li(x) = 1.
The functions Lk (x) defined in (5.11) have the desired properties (5.8).
Hence we can write
n
X
f (x) = ψ(x) = fi Li (x) ; f (xi ) = fi (5.12)
i=1
Therefore we have:
L1(x) = Π³(m=1, m≠1) (x − xm)/(x1 − xm) = (x − x2)(x − x3)/[(x1 − x2)(x1 − x3)] = (x − 0)(x − 1)/[(−1 − 0)(−1 − 1)] = x(x − 1)/2
L2(x) = Π³(m=1, m≠2) (x − xm)/(x2 − xm) = (x − x1)(x − x3)/[(x2 − x1)(x2 − x3)] = (x − (−1))(x − 1)/[(0 − (−1))(0 − 1)] = 1 − x²
L3(x) = Π³(m=1, m≠3) (x − xm)/(x3 − xm) = (x − x1)(x − x2)/[(x3 − x1)(x3 − x2)] = (x − (−1))(x − 0)/[(1 − (−1))(1 − 0)] = x(x + 1)/2
    (5.15)
L1 (x), L2 (x), and L3 (x) defined in (5.15) are the desired Lagrange polyno-
mials in (5.13), hence f (x) is defined.
Remarks. Lk (x); k = 1, 2, 3 in (5.15) have the desired properties.
(i) Li(xj) = 1 if j = i ; 0 if j ≠ i    (5.16)
(ii) Σ³ᵢ₌₁ Li(x) = x(x − 1)/2 + (1 − x²) + x(x + 1)/2 = 1    (5.17)
[Sketch: the three points x1 = −1, x2 = 0, x3 = 1 on the x-axis.]
(iv) We have
f (x) = f1 L1 (x) + f2 L2 (x) + f3 L3 (x) (5.18)
Substituting for L1 (x), L2 (x), and L3 (x) from (5.15) in (5.18):
f(x) = f1 x(x − 1)/2 + f2 (1 − x²) + f3 x(x + 1)/2    (5.19)
where f1 , f2 , f3 are given numerical values. The function f (x) in (5.19)
is the desired interpolating polynomial for the three data points. We
note that f (x) is a quadratic polynomial in x (i.e., a polynomial of
degree two).
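A small Python sketch of the Lagrange construction (5.11) and the interpolant (5.10) follows; the data values fi used in the demonstration are assumed, while the three locations x = −1, 0, 1 are those of this example.

    # Lagrange basis functions Lk(x) and the interpolant f(x) = Σ fk Lk(x).
    def lagrange_basis(xs, k, x):
        Lk = 1.0
        for m, xm in enumerate(xs):
            if m != k:
                Lk *= (x - xm) / (xs[k] - xm)     # product form of (5.11)
        return Lk

    def lagrange_interpolant(xs, fs, x):
        return sum(fs[k] * lagrange_basis(xs, k, x) for k in range(len(xs)))

    xs = [-1.0, 0.0, 1.0]
    fs = [2.0, 5.0, 3.0]                          # illustrative fi values
    print([lagrange_basis(xs, k, 0.5) for k in range(3)])   # these sum to 1
    print(lagrange_interpolant(xs, fs, 0.0))                # reproduces f2 = 5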
The Lagrange interpolating polynomial f (x) for this data set can be written
as:
L3(x) = Π⁴(m=1, m≠3) (x − xm)/(x3 − xm) = (x − x1)(x − x2)(x − x4)/[(x3 − x1)(x3 − x2)(x3 − x4)]
      = (27/16)(1 + x)(1 − x)(1/3 + x)
L4(x) = Π⁴(m=1, m≠4) (x − xm)/(x4 − xm) = (x − x1)(x − x2)(x − x3)/[(x4 − x1)(x4 − x2)(x4 − x3)]
      = −(9/16)(1/3 + x)(1/3 − x)(1 + x)
    (5.22)
Li (x); i = 1, 2, . . . , 4 in (5.22) are the desired Lagrange interpolating poly-
nomials in (5.20).
(i) Li(xj) = 1 if j = i ; 0 if j ≠ i    (5.23)
(ii) Σⁿᵢ₌₁ Li(x) = 1    (5.24)
5.3 Mapping in R1
Consider a line segment in one-dimensional space x with equally spaced
coordinates xi ; i = 1, 2, . . . , n (Figure 5.3). We want to map this line seg-
ment in another coordinate space ξ in which its length becomes two units
and xi ; i = 1, 2, . . . , n are mapped into locations ξi ; i = 1, 2, . . . , n respec-
tively. Thus x1 maps into location ξ1 = −1, and xn into location ξn = +1
and so on. This can be done rather easily if we recall that for the data set
(xi , fi ), the Lagrange interpolation f (x) is given by:
f(x) = Σⁿᵢ₌₁ fi Li(x)    (5.25)
[Figure 5.3: points 1, 2, 3, . . . , n at x1, x2, x3, . . . , xn in x-space mapped to ξ1 = −1, ξ2, ξ3, . . . , ξn = +1 in the two unit long ξ-space with the origin ξ = 0 at its center.]
If we replace fi by xi, then the data set (xi, fi) becomes (ξi, xi) and (5.25) becomes:
x(ξ) = Σⁿᵢ₌₁ xi Li(ξ)    (5.26)
The ξ-coordinate space is called natural coordinate space. The origin of the
ξ-coordinate space is considered at the middle of the map of two unit length
(Figure 5.3). Equation (5.26) indeed is the desired equation that describes
the mapping of points in x- and ξ-coordinate spaces.
Remarks.
(1) The Lagrange interpolation functions Li (ξ) are constructed using the
configuration of Figure 5.3 in the ξ-coordinate space.
(2) xi ; i = 1, 2, . . . , n are the Cartesian coordinates of the points on the line
segment in the Cartesian coordinate space.
(3) In equation (5.26), x(ξ) is expressed as a linear combination of the La-
grange polynomials Li (ξ) using the Cartesian coordinates of the points
in the x-space.
(4) If we choose a point −1 ≤ ξ ∗ ≤ 1, then (5.26) gives its corresponding
location in x-space.
x* = x(ξ*) = Σⁿᵢ₌₁ xi Li(ξ*)    (5.27)
(6) The Lagrange interpolation functions Lk (ξ) are given by replacing x and
xi ; i = 1, 2, .., n with ξ and ξi ; i = 1, 2, . . . , n in (5.11).
Lk(ξ) = Πⁿ(m=1, m≠k) (ξ − ξm)/(ξk − ξm)  ;  k = 1, 2, . . . , n    (5.28)
∴ x(ξ) = x1 (1 − ξ)/2 + x2 (1 + ξ)/2
or x(ξ) = (2)(1 − ξ)/2 + (6)(1 + ξ)/2
or x(ξ) = (1 − ξ) + 3(1 + ξ)
or x(ξ) = 4 + 2ξ    (5.31)
∴ L1(ξ) = (ξ − ξ2)(ξ − ξ3)/[(ξ1 − ξ2)(ξ1 − ξ3)] = ξ(ξ − 1)/2
L2(ξ) = (ξ − ξ1)(ξ − ξ3)/[(ξ2 − ξ1)(ξ2 − ξ3)] = 1 − ξ²    (5.33)
L3(ξ) = (ξ − ξ1)(ξ − ξ2)/[(ξ3 − ξ1)(ξ3 − ξ2)] = ξ(ξ + 1)/2
and x(ξ) = x1 ξ(ξ − 1)/2 + x2 (1 − ξ²) + x3 ξ(ξ + 1)/2
x(ξ) = (2) ξ(ξ − 1)/2 + (4)(1 − ξ²) + (6) ξ(ξ + 1)/2
x(ξ) = 4 + 2ξ    (5.34)
Remarks.
(1) We note that the mapping (5.34) is the same as (5.31). This is not a
surprise as the mapping in this case is also a linear stretch mapping due
to the fact that points in x-space are equally spaced. Hence, in this case
we could have used points 1 and 3 with coordinates x1 and x3 in x-space
and linear Lagrange interpolation
functions corresponding to points at ξ = −1 and ξ = 1, i.e., (1 − ξ)/2 and (1 + ξ)/2, to derive the mapping:
x(ξ) = (2)(1 − ξ)/2 + (6)(1 + ξ)/2 = 4 + 2ξ    (5.35)
(2) The conclusion in (1) also holds for more than three equally spaced
points in the x-space.
(3) From (1) and (2) we conclude that when the points in x-space are equally spaced it is only necessary to use the coordinates of the two end points with the linear Lagrange polynomials (1 − ξ)/2 and (1 + ξ)/2 for the mapping between x- and ξ-spaces. Thus, if we have n equally spaced points in x-space that are mapped into ξ-space, the following x(ξ) can be used for defining the mapping.
x(ξ) = x1 (1 − ξ)/2 + xn (1 + ξ)/2    (5.36)
x(ξ) = (2) ξ(ξ − 1)/2 + (3)(1 − ξ²) + (6) ξ(ξ + 1)/2
or x(ξ) = 3 + 2ξ + ξ²    (5.37)
On the other hand, if we used linear mapping, i.e., x1, x3 and (1 − ξ)/2, (1 + ξ)/2, we obtain:
x(ξ) = (2)(1 − ξ)/2 + (6)(1 + ξ)/2
or x(ξ) = 4 + 2ξ (a linear stretch mapping)    (5.38)
When
ξ = −1 : x = 2 = x1
ξ = 1 : x = 6 = x3
ξ = 0 : x = 4 ≠ x2
Thus, mapping (5.38) is not valid in this case.
Remarks.
(1) Mapping (5.37) is not a stretch mapping due to the fact that points in
x-space are not equally spaced. In mapping (5.37) the length between
points 2 and 1 in x-space (x2 − x1 = 3 − 2 = 1) is mapped into unit length (ξ2 − ξ1 = 1) in the ξ-space. On the other hand the length between points 3 and 2 (x3 − x2 = 6 − 3 = 3) is also mapped into the unit length
(ξ3 − ξ2 = 1) in the ξ-space. Hence, this mapping is not a linear stretch
mapping for the entire domain in x-space.
(2) From this example we conclude that when the points in the x-space are
not equally spaced, we must utilize all points in the x-space in deriving the mapping. This is necessitated due to the fact that in this case the
mapping is not a linear stretch mapping.
i 1 2 3
xi 2 4 6
fi 0 10 0
Derive Lagrange interpolating polynomial for this data set using the map
of xi ; i = 1, 2, 3 in ξ-space in two unit length.
in which
L1(ξ) = ξ(ξ − 1)/2 ; L2(ξ) = (1 − ξ²) ; L3(ξ) = ξ(ξ + 1)/2
∴ f(ξ) = (0) ξ(ξ − 1)/2 + (1 − ξ²)(10) + (0) ξ(ξ + 1)/2
∴ f(ξ) = 10(1 − ξ²)    (5.43)
Hence, (5.41) and (5.43) complete the interpolation of the data in the table.
For a given ξ ∗ , we obtain f (ξ ∗ ) from (5.43) that corresponds to x∗ (in x-
space) obtained using (5.41), i.e., x∗ = x(ξ ∗ ).
[Figure: data points 1, 2, . . . , n in x-space grouped into subdomains Ω̄⁽ᵉ⁾; a typical subdomain with points xi−1, xi, xi+1 is mapped to ξ = −1, 0, +1 in ξ-space.]
Remarks.
(1) Rather than choosing three points for a subdomain Ω̄(e) , we could have
chosen four points in which case f (e) (ξ) over Ω̄(ξ) would be a polynomial
of degree three.
(2) Choice of the number of points for a subdomain depends upon the degree
p(e) of the Lagrange interpolation f (e) (ξ) desired.
(3) We note that when the entire data set (xi , fi ) ; i = 1, 2, . . . , n is interpo-
lated using a single Lagrange interpolating polynomial f (ξ), then f (ξ) is
a polynomial of degree (n − 1), hence it is of class C n−1 , i.e., derivatives
of f (ξ) of up to order (n − 1) are continuous.
(4) When we use piecewise mapping and interpolation, then f⁽ᵉ⁾(ξ) is of class C^(pe) ; pe ≤ n, but f(ξ) given by ∪ᴹ(e=1) f⁽ᵉ⁾(ξ) is only of class C⁰. This
is due to the fact that at the mating boundaries between the subdomains
Ω̄(e) , only the function f is continuous. This is a major and fundamental
difference between piecewise interpolation and using a single Lagrange
polynomial describing the interpolation of the data.
i : 1, 2, 3 ; xi : 2, 4, 6 (i.e., x1, x2, x3) ; fi : 0, 10, 0 (i.e., f1, f2, f3)
we have:
x(ξ) = 4 + 2ξ (5.46)
2
f (ξ) = 10(1 − ξ ) (5.47)
In the second case, we consider piecewise mapping and interpolations using
the subdomains Ω̄(1) = [x1 , x2 ], Ω̄(2) = [x2 , x3 ].
Consider Ω̄(1)
x1 = 2 , x2 = 4 ; f1 = 0 , f2 = 10
[Sketch: (a) subdomain Ω̄⁽¹⁾ with points 1, 2 at x1 = 2, x2 = 4 in x-space ; (b) its map with ξ1 = −1, ξ2 = 1 in ξ-space.]
∴ x⁽¹⁾(ξ) = x1 (1 − ξ)/2 + x2 (1 + ξ)/2 = (2)(1 − ξ)/2 + (4)(1 + ξ)/2
∴ x⁽¹⁾(ξ) = 3 + ξ    (5.48)
and
f⁽¹⁾(ξ) = f1 (1 − ξ)/2 + f2 (1 + ξ)/2 = (0)(1 − ξ)/2 + (10)(1 + ξ)/2
∴ f⁽¹⁾(ξ) = 5(1 + ξ)    (5.49)
Consider Ω̄(2)
x2 = 4 , x3 = 6 ; f2 = 10 , f3 = 0
[Sketch: (a) subdomain Ω̄⁽²⁾ with points 2, 3 at x2 = 4, x3 = 6 in x-space ; (b) its map with ξ1 = −1, ξ2 = 1 in ξ-space.]
∴ x⁽²⁾(ξ) = x2 (1 − ξ)/2 + x3 (1 + ξ)/2 = (4)(1 − ξ)/2 + (6)(1 + ξ)/2
x⁽²⁾(ξ) = 5 + ξ    (5.50)
and
f⁽²⁾(ξ) = f2 (1 − ξ)/2 + f3 (1 + ξ)/2 = (10)(1 − ξ)/2 + (0)(1 + ξ)/2
∴ f⁽²⁾(ξ) = 5(1 − ξ)    (5.51)
Summary
(i) Single Lagrange polynomial for the whole domain Ω = [x1 , x3 ]
x(ξ) = 4 + 2ξ
(5.52)
f (ξ) = 10(1 − ξ 2 )
(ii) Piecewise interpolation
(a) Subdomain Ω̄(1) :
x(1) (ξ) = 3 + ξ
(5.53)
f (1) (ξ) = 5(1 + ξ)
(b) Subdomain Ω̄(2) :
x(2) (ξ) = 5 + ξ
(5.54)
f (2) (ξ) = 5(1 − ξ)
Figure 5.6 shows plots of f (ξ), f (1) (ξ), and f (2) (ξ) over Ω̄ = [x1 , x3 ] = [2, 6].
[Figure 5.6: plots of f(ξ) = 10(1 − ξ²), f⁽¹⁾(ξ) = 5(1 + ξ), and f⁽²⁾(ξ) = 5(1 − ξ) versus x over Ω̄ = [2, 6].]
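The piecewise interpolation summarized in (5.53) and (5.54) can be evaluated with a short Python sketch; the inverse of the linear map used to find ξ for a given x is written out explicitly, and the subdomain data are those of this example.

    # Piecewise linear interpolation over two mapped subdomains.
    def piecewise_eval(x, subdomains):
        # subdomains: list of ((x_left, x_right), (f_left, f_right))
        for (xl, xr), (fl, fr) in subdomains:
            if xl <= x <= xr:
                xi = (2.0 * x - (xl + xr)) / (xr - xl)    # inverse of the linear map
                return fl * (1.0 - xi) / 2.0 + fr * (1.0 + xi) / 2.0
        raise ValueError("x outside the interpolated domain")

    subdomains = [((2.0, 4.0), (0.0, 10.0)), ((4.0, 6.0), (10.0, 0.0))]
    print(piecewise_eval(3.0, subdomains))   # 5.0, from f^(1)(ξ) = 5(1+ξ) at ξ = 0
    print(piecewise_eval(5.0, subdomains))   # 5.0, from f^(2)(ξ) = 5(1-ξ) at ξ = 0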
and
f(ξ) = Σⁿᵢ₌₁ fi Li(ξ)    (5.56)
L̃i(ξ) and Li(ξ) are suitable Lagrange polynomials in ξ for mapping of points and interpolation. We note that (5.55) only describes mapping of points, i.e., given ξ* we can obtain x* using (5.55), x* = x(ξ*). Mapping of length in x- and ξ-spaces requires a different relationship than (5.55). Consider the differential of (5.55):
dx(ξ) = Σⁿᵢ₌₁ xi ( dL̃i(ξ)/dξ ) dξ = ( Σⁿᵢ₌₁ xi dL̃i(ξ)/dξ ) dξ    (5.57)
Let
J = Σⁿᵢ₌₁ xi dL̃i(ξ)/dξ    (5.58)
∴ dx(ξ) = Jdξ (5.59)
Equation (5.59) describes a relationship between elemental lengths dξ and
dx in ξ- and x-spaces. J is called the Jacobian of mapping.
From (5.56), we note that f is a function of ξ; thus if we require df/dx, it cannot be obtained directly using (5.56). Differentiate (5.56) with respect to ξ and, since x = x(ξ), we also have ξ = ξ(x) (inverse of the mapping), hence we can use the chain rule of differentiation whenever needed.
df(ξ)/dx = Σⁿᵢ₌₁ fi dLi(ξ)/dx    (5.60)
5.6. MAPPING OF LENGTH AND DERIVATIVES OF F (·) 215
2 k
The derivatives dLdξ i (ξ)
, d dξ
Li (ξ)
2 , . . . , d dξ
Li (ξ)
k ; k = 1, 2, . . . , n can be deter-
mined by differentiating Li (ξ); i = 1, 2, . . . , n with respect to ξ. Hence, the
derivatives of f (ξ) with respect to x of any desired order can be determined.
The mapping of length and derivatives of f in the two spaces (x and ξ)
is quite important in many other instances than just obtaining derivatives
of f (·) with respect to x. We illustrate this in the following.
I = ∫_{−1}^{1} f (ξ) J dξ                              (5.69)
Use all three data points to derive the mapping x(ξ). Also derive the mapping
using points x1 and x3. Construct the Lagrange polynomial f (ξ) using all three data points.
But since the points are equally spaced, x2 = (x1 + x3)/2, hence we obtain the
same mapping as (5.71); in this case also J = h/2. Thus, for a linear stretch
mapping between x and ξ, the mapping is always linear in ξ and hence
J = h/2, h being the length of the domain in x-space.
As derived earlier, f (ξ) is given by

f (ξ) = 10(1 − ξ²) ,   J = (x3 − x1)/2 = (6 − 2)/2 = 2

df/dx = (1/J) df/dξ = (1/2)(−20ξ) = −10ξ

d²f/dx² = (1/J²) d²f/dξ² = (1/2²)(−20) = −5
Choice of data points for subdomains (i.e., Figure 5.8(a) or 5.8(b)) is not
arbitrary but depends upon the degree of interpolation desired for Ω̄(e) as
shown later.
Instead of choosing quadrilateral subdomains we could have chosen tri-
angular or any other desired shape of subdomain. We illustrate the details
using quadrilateral subdomains. Constructing interpolations of data over
subdomains of Figures 5.8(a) and 5.8(b) is quite a difficult task due to irreg-
ular geometry. This task can be simplified by using mapping of the domains
of Figures 5.8(a) and 5.8(b) into regular shapes such as two unit squares,
and then constructing interpolation in the mapped domain.
[Figure 5.8: (a), (b) two discretizations of Ω̄ into quadrilateral subdomains; typical four-node and nine-node subdomains Ω̄(e) in xy-space.]

The Lagrange polynomials Li(ξ, η) ; i = 1, 2, . . . , 4 used in the mapping have the following
properties.
1. Li(ξj, ηj) = 1 for j = i and 0 for j ≠ i ;   i, j = 1, 2, . . . , 4

2. Σ_{i=1}^{4} Li(ξ, η) = 1                            (5.75)

3. Li(ξ, η) ; i = 1, 2, . . . , 4 are polynomials of degree
   less than or equal to 2 in ξ and η
Equations (5.76) are the desired equations for mapping of points between
Ω̄(e) and Ω̄(ξη) .
Remarks.
(1) Equations (5.76) are explicit in ξ and η, i.e., given values of ξ and η
(ξ ∗ , η ∗ ), we can use (5.76) to determine their map (x∗ , y ∗ ) using (5.76),
(x∗ , y ∗ ) = (x(ξ ∗ , η ∗ ), y(ξ ∗ , η ∗ )).
(2) Equations (5.76) are implicit in x and y. Given (x∗, y∗) in xy-space, de-
termination of its map (ξ∗, η∗) in ξη-space requires the solution of simultaneous
(generally nonlinear) algebraic equations in ξ and η.
Figure 5.9: (a) Four-node Ω̄(e) in xy-space; (b) map of Ω̄(e) into Ω̄(ξ,η), a two unit square in ξη coordinate space; (c) a nine-node Ω̄(e) in xy-space; (d) map of Ω̄(e) into Ω̄(ξ,η), a two unit square in ξη coordinate space
(4) As shown subsequently, this mapping is bilinear due to the fact that
Li (ξ, η) in this case are linear in both ξ and η.
(5) For the nine-node configuration of Figures 5.9(c) and 5.9(d) we can write:

x(ξ, η) = Σ_{i=1}^{9} Li(ξ, η) xi
                                                    (5.77)
y(ξ, η) = Σ_{i=1}^{9} Li(ξ, η) yi
(6) Choice of the configurations of nodes (as in Figures 5.9(a) and 5.9(c)) is
not arbitrary and is based on the degree of the polynomial desired in ξ
and η, and can be determined using Pascal’s rectangle.
x(ξ, η) = c0 + c1 ξ + c2 η + c3 ξη (5.78)
y(ξ, η) = d0 + d1 ξ + d2 η + d3 ξη (5.79)
In this case the choice of monomials 1, ξ, η, and ξη was not too difficult.
However in case of nine-node configuration of Figure 5.9(d) the choice of the
monomials in ξ and η is not too obvious. Pascal’s rectangle facilitates (i)
the selection of monomials in ξ and η for complete polynomials of a chosen
degree in ξ and η, and (ii) determination of the number of nodes and their
location in the two unit square in ξη-space.
Consider increasing powers of ξ and η in the horizontal and vertical
directions (see Figure 5.10). This arrangement is called Pascal’s rectangle.
We can choose up to the desired degree in ξ and η using Figure 5.10. The
terms located at the intersections of ξ and η lines are the desired monomial
terms. The locations of the monomial terms are the locations of the nodes
in the ξη configuration.
Figure 5.10: Pascal's rectangle (increasing powers of ξ horizontally, increasing powers of η vertically):

1     ξ      ξ²      ξ³      ξ⁴
η     ξη     ξ²η     ξ³η     ξ⁴η
η²    ξη²    ξ²η²    ξ³η²    ξ⁴η²
η³    ξη³    ξ²η³    ξ³η³    ξ⁴η³
η⁴    ξη⁴    ξ²η⁴    ξ³η⁴    ξ⁴η⁴
Remarks.
(3) Pascal’s rectangle is still extremely useful as it can tell us the nodal
configurations and the monomials for complete polynomials of desired
degrees in ξ and η.
[Figure 5.11: (a) four-node configuration in ξη-space with corner nodes at (−1, −1), (1, −1), (1, 1), (−1, 1); (b) two-node configurations in the ξ- and η-directions.]
In the ξ-direction:

Lξ1(ξ) = (1 − ξ)/2 ,   Lξ2(ξ) = (1 + ξ)/2             (5.84)

In the η-direction:

Lη1(η) = (1 − η)/2 ,   Lη2(η) = (1 + η)/2             (5.85)
Arrange Lξ1 (ξ) and Lξ2 (ξ) as a vector along with their ξ coordinates of −1 and
+1. Note that ±1 are not elements of the vector, they have been included
to indicate the location of the node corresponding to each Lk . In this case,
this arrangement gives a 2 × 1 vector of Lξ1 and Lξ2 .
{Lξ1(ξ) ; Lξ2(ξ)}ᵀ = {(1 − ξ)/2 ; (1 + ξ)/2}ᵀ ,  located at ξ = −1 and ξ = +1          (5.86)

Arrange Lη1(η) and Lη2(η) as a row matrix along with their η coordinates of
−1 and +1:

[Lη1(η) , Lη2(η)] = [(1 − η)/2 , (1 + η)/2] ,  located at η = −1 and η = +1            (5.87)
Take the product of Lξi (ξ) in (5.86) with Lηj (η) in (5.87), keeping their ξ, η co-
ordinates together with the product terms. This is called the tensor product.
{Lξ1(ξ) ; Lξ2(ξ)} [Lη1(η) , Lη2(η)] = [ Lξ1(ξ)Lη1(η)   Lξ1(ξ)Lη2(η) ]
                                      [ Lξ2(ξ)Lη1(η)   Lξ2(ξ)Lη2(η) ]                 (5.88)

with the entries located at (−1, −1), (−1, +1), (+1, −1), and (+1, +1), respectively.
Substituting for Lξ1(ξ), Lξ2(ξ), Lη1(η), and Lη2(η) in (5.88):

((1 − ξ)/2)((1 − η)/2) = L1(ξ, η)   at (−1, −1)
((1 + ξ)/2)((1 − η)/2) = L2(ξ, η)   at (+1, −1)
((1 + ξ)/2)((1 + η)/2) = L3(ξ, η)   at (+1, +1)
((1 − ξ)/2)((1 + η)/2) = L4(ξ, η)   at (−1, +1)                                        (5.89)
The coordinates ξ, η associated with the terms and their comparisons with
the ξ, η coordinates of the four nodes in Figure 5.11(a) identifies Li (ξ, η) ; i =
1, 2, . . . , 4 for the four-node configuration of Figure 5.11(a). We could view
this process as the two-node configuration in η direction (Figure 5.11(b))
traversing along the ξ-direction. As it encounters a node in the ξ-direction,
we obtain a trace of the nodes. Each node of the trace contains products of
1D functions in ξ and η as the two 2D functions in ξ and η. Thus, we have
for the four-node configuration of Figure 5.11(a):
L1(ξ, η) = ((1 − ξ)/2)((1 − η)/2)

L2(ξ, η) = ((1 + ξ)/2)((1 − η)/2)
                                                    (5.90)
L3(ξ, η) = ((1 + ξ)/2)((1 + η)/2)

L4(ξ, η) = ((1 − ξ)/2)((1 + η)/2)
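The tensor-product construction in (5.88)–(5.90) is easy to check numerically; a minimal sketch follows (Python with NumPy; the helper names are illustrative, and the node ordering is the one used in (5.89)–(5.90)).

```python
import numpy as np

# 1D linear Lagrange functions on [-1, 1]
def L1d(s):
    return np.array([(1.0 - s) / 2.0, (1.0 + s) / 2.0])

# 2D bilinear functions by tensor product, ordered as in (5.90):
# node 1 (-1,-1), node 2 (+1,-1), node 3 (+1,+1), node 4 (-1,+1)
def L2d(xi, eta):
    lx, le = L1d(xi), L1d(eta)
    return np.array([lx[0]*le[0], lx[1]*le[0], lx[1]*le[1], lx[0]*le[1]])

nodes = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
for j, (xj, ej) in enumerate(nodes):
    vals = L2d(xj, ej)
    # Kronecker delta property and partition of unity at the nodes
    assert abs(vals[j] - 1.0) < 1e-14 and abs(vals.sum() - 1.0) < 1e-14

print(L2d(0.3, -0.2), L2d(0.3, -0.2).sum())   # partition of unity at an interior point
```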
[Figure: nine-node configuration in ξη-space (nodes 1, 2, 3 along η = −1; 8, 9, 4 along η = 0; 7, 6, 5 along η = +1) and the three-node 1D configurations Lη1, Lη2, Lη3 at η = −1, 0, +1.]

Taking the tensor product of the 1D quadratic functions in ξ and η and comparing locations with the nine-node configuration gives

[ L1(ξ, η)   L8(ξ, η)   L7(ξ, η) ]
[ L2(ξ, η)   L9(ξ, η)   L6(ξ, η) ]
[ L3(ξ, η)   L4(ξ, η)   L5(ξ, η) ]
Thus Li (ξ, η) ; i = 1, 2, . . . , 9 are completely determined.
Recall
Lξ1(ξ) = ξ(ξ − 1)/2 ,   Lξ2(ξ) = (1 − ξ²) ,   Lξ3(ξ) = ξ(ξ + 1)/2
                                                                    (5.92)
Lη1(η) = η(η − 1)/2 ,   Lη2(η) = (1 − η²) ,   Lη3(η) = η(η + 1)/2
Remarks.
(1) Using this procedure it is possible to determine the complete polynomials
Li (ξ, η) for any desired degree in ξ and η.
(2) Hence, for mapping of geometry Ω̄(e) to Ω̄(ξ,η) we can write in general:

x(ξ, η) = Σ_{i=1}^{ñ} L̃i(ξ, η) xi
                                                    (5.93)
y(ξ, η) = Σ_{i=1}^{ñ} L̃i(ξ, η) yi

The choice of ñ and L̃i(ξ, η) depends upon the degrees of the polynomial in ξ
and η and is defined by Pascal's rectangle. We have intentionally used ñ
and L̃i(ξ, η) for the mapping of geometry.
x(ξ, η) = Σ_{i=1}^{ñ} L̃i(ξ, η) xi
                                                    (5.94)
y(ξ, η) = Σ_{i=1}^{ñ} L̃i(ξ, η) yi

ñ and L̃i(ξ, η) are a suitably chosen number of nodes and the Lagrange inter-
polation polynomials for mapping of Ω̄(e) to Ω̄(ξ,η). The function values fi
at the nodes of Ω̄(e) or Ω̄(ξ,η) can be interpolated using:

f (e)(ξ, η) = Σ_{i=1}^{n} Li(ξ, η) fi                  (5.95)
in which

dx = (∂x/∂ξ) dξ + (∂x/∂η) dη
                                                    (5.96)
dy = (∂y/∂ξ) dξ + (∂y/∂η) dη

Hence

{ dx }   [ ∂x/∂ξ   ∂x/∂η ] { dξ }        { dξ }
{ dy } = [ ∂y/∂ξ   ∂y/∂η ] { dη }  = [J] { dη }     (5.97)

where

[J] = [ ∂x/∂ξ   ∂x/∂η ]
      [ ∂y/∂ξ   ∂y/∂η ]                             (5.98)

The matrix [J] is called the Jacobian of the mapping. The matrix [J] provides
a relationship between the elemental lengths dξ, dη and dx, dy in ξη- and xy-
spaces.
[Figure: elemental lengths dx, dy in Ω̄(e) (xy-space, unit vectors ~i, ~j) and dξ, dη in Ω̄(ξ,η) (ξη-space, unit vectors ~eξ, ~eη).]
The magnitude of this vector represents the area formed by these two vectors,
i.e., dΩ. Thus:

dx~i × dy~j = dx dy (~i × ~j) = dx dy ~k              (5.99)

But

dx~i = (∂x/∂ξ) dξ ~eξ + (∂x/∂η) dη ~eη                (5.100)

dy~j = (∂y/∂ξ) dξ ~eξ + (∂y/∂η) dη ~eη                (5.101)

∴ dx~i × dy~j = dx dy ~k = ((∂x/∂ξ) dξ ~eξ + (∂x/∂η) dη ~eη) × ((∂y/∂ξ) dξ ~eξ + (∂y/∂η) dη ~eη)
                                                      (5.102)

Expanding the right side of (5.102):

dx dy ~k = (∂x/∂ξ)(∂y/∂ξ) dξ dξ (~eξ × ~eξ) + (∂x/∂η)(∂y/∂ξ) dη dξ (~eη × ~eξ)
         + (∂x/∂ξ)(∂y/∂η) dξ dη (~eξ × ~eη) + (∂x/∂η)(∂y/∂η) dη dη (~eη × ~eη)         (5.103)
Noting that ~eξ × ~eξ = ~eη × ~eη = 0 and ~eη × ~eξ = −(~eξ × ~eη), (5.103) reduces to dx dy = det[J] dξ dη,

or dΩ = |J| dΩ(ξ,η)                                   (5.108)
df/dx = Σ_{i=1}^{n} fi ∂Li(ξ, η)/∂x                    (5.109)

df/dy = Σ_{i=1}^{n} fi ∂Li(ξ, η)/∂y                    (5.110)

∴ { ∂Li/∂x }              { ∂Li/∂ξ }
  { ∂Li/∂y } = [J^T]^{−1} { ∂Li/∂η }                   (5.113)

Hence, ∂f/∂x and ∂f/∂y in (5.109) and (5.110) are now explicitly defined, hence
can be determined.
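A compact numerical sketch of (5.98) and (5.113) follows (Python with NumPy; the helper names are illustrative). The node coordinates and nodal values are those of the worked four-node example appearing later in this chapter.

```python
import numpy as np

# Four-node bilinear quad, node order (-1,-1), (+1,-1), (+1,+1), (-1,+1)
xy = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [0.0, 2.0]])
f  = np.array([0.0, 1.0, 2.0, 1.0])

def dshape(xi, eta):
    """dL_i/dxi and dL_i/deta of the bilinear functions, as an n x 2 array."""
    return 0.25 * np.array([[-(1 - eta), -(1 - xi)],
                            [ (1 - eta), -(1 + xi)],
                            [ (1 + eta),  (1 + xi)],
                            [-(1 + eta),  (1 - xi)]])

xi, eta = 0.25, -0.4
dL = dshape(xi, eta)
J = xy.T @ dL                        # [J] of eq. (5.98)
dL_xy = dL @ np.linalg.inv(J)        # eq. (5.113) applied to all nodes at once
grad_f = f @ dL_xy                   # [df/dx, df/dy], eqs. (5.109)-(5.110)
print(np.linalg.det(J), grad_f)
```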
Remarks.
(1) Many remarks made in Section 5.6 for mapping and interpolation in R1
hold here as well.
(2) We recognize that piecewise interpolation over the Ω̄(e) facilitates the process
    but is not the same as interpolation over the entire Ω̄, due to the limited
    differentiability of f (x, y) = ⋃_e f (e)(x, y) (only C^0 in this case) in the
    piecewise process.
and

Σ_{i=1}^{m} Ni(ξ, η) = 1                              (5.115)
[Figure: nine-node and four-node configurations in ξη-space; the sides of Ω̄(ξη) are the straight lines 1 + ξ = 0, 1 − ξ = 0, 1 + η = 0, 1 − η = 0.]
(a) First, we note that the four sides of the domain Ω̄(ξη) are described by the
equations of the straight lines as shown in the figure. Consider node 1.
N1 (ξ, η) is one at node 1 and zero at nodes 2,3 and 4. Hence, equations
of the straight lines connecting nodes 2 and 3 and nodes 3 and 4 can be
used to derive N1 (ξ, η). That is,
N1 (ξ, η) = c1 (1 − ξ)(1 − η) (5.116)
in which c1 is a constant. But N1 (−1, −1) = 1, hence using (5.116) we
get
N1(−1, −1) = 1 = c1 (1 − (−1))(1 − (−1))  ⇒  c1 = 1/4          (5.117)
Thus, we have

N1(ξ, η) = (1/4)(1 − ξ)(1 − η)                        (5.118)
which is the correct approximation function for node 1 of the bilinear
approximation functions. Similarly, for nodes 2, 3, and 4 we can write
N2 (ξ, η) = c2 (1 + ξ)(1 − η)
N3 (ξ, η) = c3 (1 + ξ)(1 + η) (5.119)
N4 (ξ, η) = c4 (1 − ξ)(1 + η)
But

N2(1, −1) = 1  ⇒  c2 = 1/4
N3(1, 1)  = 1  ⇒  c3 = 1/4                            (5.120)
N4(−1, 1) = 1  ⇒  c4 = 1/4
Thus, from (5.119) and (5.120) we obtain

N2(ξ, η) = (1/4)(1 + ξ)(1 − η)
N3(ξ, η) = (1/4)(1 + ξ)(1 + η)                        (5.121)
N4(ξ, η) = (1/4)(1 − ξ)(1 + η)
(5.118) and (5.121) are the correct approximation functions for the four-
node bilinear approximation functions.
(b) In the above derivations we have only utilized the property (5.114), hence
we must show that the interpolation functions in (5.118) and (5.121)
satisfy (5.115). In this case, obviously they do. However, this may not
always be the case.
Since

N1(ξ, η)|_(−1,−1) = 1  ⇒  c1 = −1/4                   (5.123)
[Figure: eight-node serendipity configuration in ξη-space; the sides are 1 + ξ = 0, 1 − ξ = 0, 1 + η = 0, 1 − η = 0, and the line 1 + ξ + η = 0 passes through the two mid-side nodes adjacent to corner node 1.]
we obtain

N1(ξ, η) = −(1/4)(1 − ξ)(1 − η)(1 + ξ + η)            (5.124)
For nodes 3, 5, and 7 one may use the equations of the lines indicated in
Fig. 5.17 and the conditions similar to (5.123) for N2 , N3 , and N4 .
[Figure: equations of the straight lines used for corner nodes 3, 5, and 7: 1 − ξ + η = 0, 1 − ξ − η = 0, 1 + ξ − η = 0, together with the sides 1 ± ξ = 0, 1 ± η = 0.]
For the mid-side nodes, the product of the equations of the straight lines not
containing the mid-side node provides the needed expression, and we have
N1 = (1/4)(1 − ξ)(1 − η)(−1 − ξ − η)
N2 = (1/2)(1 − ξ²)(1 − η)
N3 = (1/4)(1 + ξ)(1 − η)(−1 + ξ − η)
N4 = (1/2)(1 + ξ)(1 − η²)
N5 = (1/4)(1 + ξ)(1 + η)(−1 + ξ + η)
N6 = (1/2)(1 − ξ²)(1 + η)
N7 = (1/4)(1 − ξ)(1 + η)(−1 − ξ + η)
N8 = (1/2)(1 − ξ)(1 − η²)
In this case also we must show that Σ_{i=1}^{8} Ni(ξ, η) = 1, which holds.
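The partition-of-unity check can be done numerically; a minimal sketch follows (Python with NumPy; the function name is illustrative and the node ordering is the one used above, corner nodes 1, 3, 5, 7 and mid-side nodes 2, 4, 6, 8).

```python
import numpy as np

def serendipity8(xi, eta):
    """Eight-node serendipity functions N_1 ... N_8 as listed above."""
    return np.array([
        0.25 * (1 - xi) * (1 - eta) * (-1 - xi - eta),
        0.50 * (1 - xi**2) * (1 - eta),
        0.25 * (1 + xi) * (1 - eta) * (-1 + xi - eta),
        0.50 * (1 + xi) * (1 - eta**2),
        0.25 * (1 + xi) * (1 + eta) * (-1 + xi + eta),
        0.50 * (1 - xi**2) * (1 + eta),
        0.25 * (1 - xi) * (1 + eta) * (-1 - xi + eta),
        0.50 * (1 - xi) * (1 - eta**2),
    ])

# partition of unity at random points in the two unit square
for xi, eta in np.random.uniform(-1.0, 1.0, size=(5, 2)):
    assert abs(serendipity8(xi, eta).sum() - 1.0) < 1e-12
print("sum of N_i = 1 verified")
```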
[Figure: twelve-node serendipity configuration in ξη-space; corner nodes 1, 4, 9, 12 and two equally spaced nodes on each side.]
N1  = (1/32)(1 − ξ)(1 − η)[−10 + 9(ξ² + η²)]
N2  = (9/32)(1 − ξ²)(1 − η)(1 − 3ξ)
N3  = (9/32)(1 − ξ²)(1 − η)(1 + 3ξ)
N4  = (1/32)(1 + ξ)(1 − η)[−10 + 9(ξ² + η²)]
N5  = (9/32)(1 − ξ)(1 − η²)(1 − 3η)
N6  = (9/32)(1 + ξ)(1 − η²)(1 − 3η)
N7  = (9/32)(1 − ξ)(1 − η²)(1 + 3η)
N8  = (9/32)(1 + ξ)(1 − η²)(1 + 3η)
N9  = (1/32)(1 − ξ)(1 + η)[−10 + 9(ξ² + η²)]
N10 = (9/32)(1 − ξ²)(1 + η)(1 − 3ξ)
N11 = (9/32)(1 − ξ²)(1 + η)(1 + 3ξ)
N12 = (1/32)(1 + ξ)(1 + η)[−10 + 9(ξ² + η²)]
Remarks.
(1) Serendipity interpolations are obviously incomplete polynomials in ξ and
η, hence have poorer local approximation compared to the local approx-
imations based on Pascal’s rectangle.
(2) There is no particular theoretical basis for deriving them.
(3) In view of p-version hierarchical approximations [49, 50], serendipity ap-
proximations are precluded and are of no practical significance.
[Figure: (a) eight-node irregular hexahedron Ω̄(e) in xyz-space; (b) its map in a two unit cube in ξηζ-space; (c) a 27-node distorted hexahedron in xyz-space; (d) its map in ξηζ natural coordinate space in a two unit cube.]
x(ξ, η, ζ) = Σ_{i=1}^{ñ} L̃i(ξ, η, ζ) xi

y(ξ, η, ζ) = Σ_{i=1}^{ñ} L̃i(ξ, η, ζ) yi               (5.125)

z(ξ, η, ζ) = Σ_{i=1}^{ñ} L̃i(ξ, η, ζ) zi
ñ is suitably chosen for the mapping depending upon the degrees of the polynomials
in ξ, η, and ζ, and L̃i(ξ, η, ζ) are the Lagrange polynomials associated with the ñ
nodes. L̃i(ξ, η, ζ) have the following properties (similar to mapping in R²).
The importance of these properties has already been discussed for mapping
in R². The conclusions drawn for R² mapping hold here as well. Equations
(5.125) map Ω̄(e) from xyz-space to Ω̄(m), a two unit cube in ξηζ-space.
Once we know L̃i(ξ, η, ζ) ; i = 1, 2, . . . , ñ, the mapping is completely defined
by (5.125).
5.9.1.1 Construction of L̃i(ξ, η, ζ) using the Polynomial Approach
x(ξ, η, ζ) = c0 + c1 ξ + c2 η + c3 ζ + . . .
y(ξ, η, ζ) = d0 + d1 ξ + d2 η + d3 ζ + . . . (5.127)
z(ξ, η, ζ) = b0 + b1 ξ + b2 η + b3 ζ + . . .
(i) The locations of the terms are the locations of the points of the nodes
in Ω̄(m) and Ω̄(e) configuration.
(ii) The terms or monomials (and their products) are the choice we should
use in the linear combination (5.127).
[Figure: Pascal's rectangle extended to three dimensions, with increasing powers of ξ and η in the plane and increasing powers of ζ in the third direction.]

For the eight-node (trilinear) configuration the monomials are

1, ξ, η, ζ, ξη, ηζ, ζξ, ξηζ

and for the 27-node (triquadratic) configuration the monomials are all products
ξ^a η^b ζ^c with a, b, c = 0, 1, 2.
Remarks.
(1) This polynomial approach requires inverse of the coefficient matrix which
can be avoided by using tensor product approach similar to R2 .
The tensor product approach gives

L̃i(ξ, η, ζ) ; i = 1, 2, . . . , (n)(m)(q)

from the products of n one-dimensional Lagrange polynomials Lξi(ξ), m polynomials Lηj(η), and q polynomials Lζk(ζ).

[Figure 5.22: 1D Lagrange polynomials in ξ, η, and ζ.]
[Figures 5.23–5.25: tensor product construction in ξη-space using the 1D linear functions Lξ1 = (1 − ξ)/2, Lξ2 = (1 + ξ)/2, Lη1 = (1 − η)/2, Lη2 = (1 + η)/2, and Lζ1 = (1 − ζ)/2, Lζ2 = (1 + ζ)/2 in the ζ-direction; the products Lξi Lηj give the four bilinear functions in the ξη-plane.]
[Figure 5.26: eight-node configuration in ξηζ-space; each node carries the corresponding product Lξi Lηj Lζk.]
These are the desired Lagrange polynomials that are linear in ξ, η, and ζ
and correspond to the eight-node configuration in Figure 5.26.
ñ and L̃i(ξ, η, ζ) are a suitably chosen number of nodes and the Lagrange
interpolation polynomials for mapping of Ω̄(e) to Ω̄(m) in a two unit cube.
If fi are the function values at the nodes of Ω̄(e) or Ω̄(m), then these can be
interpolated using

f (e)(ξ, η, ζ) = Σ_{i=1}^{n} Li(ξ, η, ζ) fi

in which

n = 8    for linear Li(ξ, η, ζ) in ξ, η, and ζ
n = 27   for quadratic Li(ξ, η, ζ) in ξ, η, and ζ
n = 64   for cubic Li(ξ, η, ζ) in ξ, η, and ζ

and so on. Li(ξ, η, ζ) are determined using the tensor product. The functions
L̃i(ξ, η, ζ) and Li(ξ, η, ζ) are generally not the same but can be the same
if so desired. The choice of L̃i(ξ, η, ζ) depends on the geometry mapping
considerations, whereas Li(ξ, η, ζ) are chosen based on the data points to be
interpolated.
dx = (∂x/∂ξ) dξ + (∂x/∂η) dη + (∂x/∂ζ) dζ

dy = (∂y/∂ξ) dξ + (∂y/∂η) dη + (∂y/∂ζ) dζ             (5.130)

dz = (∂z/∂ξ) dξ + (∂z/∂η) dη + (∂z/∂ζ) dζ

or

{ dx }        { dξ }
{ dy } = [J]  { dη }                                  (5.131)
{ dz }        { dζ }

where

      [ ∂x/∂ξ   ∂x/∂η   ∂x/∂ζ ]
[J] = [ ∂y/∂ξ   ∂y/∂η   ∂y/∂ζ ]                       (5.132)
      [ ∂z/∂ξ   ∂z/∂η   ∂z/∂ζ ]
dx~i = (∂x/∂ξ) dξ ~eξ + (∂x/∂η) dη ~eη + (∂x/∂ζ) dζ ~eζ

dy~j = (∂y/∂ξ) dξ ~eξ + (∂y/∂η) dη ~eη + (∂y/∂ζ) dζ ~eζ        (5.133)

dz~k = (∂z/∂ξ) dξ ~eξ + (∂z/∂η) dη ~eη + (∂z/∂ζ) dζ ~eζ

We note that the elemental volume is given by the scalar triple product

dΩ = dx dy dz = dx~i · (dy~j × dz~k)                   (5.134)

Substituting for dx~i, dy~j, and dz~k from (5.133) into (5.134) and using the
properties of the dot product and cross product in ξηζ- and xyz-spaces, we
obtain:

dx dy dz = det[J] dξ dη dζ                             (5.135)

We note that for (5.135) to hold, det[J] > 0 must hold. Thus, for the mapping
between Ω̄(e) and Ω̄(m) to be valid (one-to-one and onto), det[J] > 0 must
hold. This is an important conclusion from (5.135).
we have:

∂f/∂x = Σ_{i=1}^{n} fi ∂Li(ξ, η, ζ)/∂x

∂f/∂y = Σ_{i=1}^{n} fi ∂Li(ξ, η, ζ)/∂y                 (5.137)

∂f/∂z = Σ_{i=1}^{n} fi ∂Li(ξ, η, ζ)/∂z

Thus, ∂f/∂x, ∂f/∂y, and ∂f/∂z can be determined if we know ∂Li/∂x, ∂Li/∂y,
and ∂Li/∂z.
Since Li = Li(ξ, η, ζ) and x = x(ξ, η, ζ), y = y(ξ, η, ζ), and z = z(ξ, η, ζ), we
can use the chain rule of differentiation to obtain:

∂Li/∂ξ = (∂Li/∂x)(∂x/∂ξ) + (∂Li/∂y)(∂y/∂ξ) + (∂Li/∂z)(∂z/∂ξ)

∂Li/∂η = (∂Li/∂x)(∂x/∂η) + (∂Li/∂y)(∂y/∂η) + (∂Li/∂z)(∂z/∂η)   ; i = 1, 2, . . . , n   (5.138)

∂Li/∂ζ = (∂Li/∂x)(∂x/∂ζ) + (∂Li/∂y)(∂y/∂ζ) + (∂Li/∂z)(∂z/∂ζ)

or

{ ∂Li/∂ξ }          { ∂Li/∂x }
{ ∂Li/∂η } = [J^T]  { ∂Li/∂y }    ; i = 1, 2, . . . , n          (5.139)
{ ∂Li/∂ζ }          { ∂Li/∂z }

∴ { ∂Li/∂x }              { ∂Li/∂ξ }
  { ∂Li/∂y } = [J^T]^{−1} { ∂Li/∂η }   ; i = 1, 2, . . . , n     (5.140)
  { ∂Li/∂z }              { ∂Li/∂ζ }
[Figure: (a) four-node Ω̄(e) in xy-space with node coordinates 1: (0, 0), 2: (2, 0), 3: (4, 4), 4: (0, 2); (b) its map Ω̄(ξ,η), a two unit square in ξη-space.]
Solution
(a) Equations describing the mapping: The Lagrange polynomials Li(ξ, η) ; i = 1, 2, . . . , 4 are

L1 = ((1 − ξ)/2)((1 − η)/2) ,   L2 = ((1 + ξ)/2)((1 − η)/2)
L3 = ((1 + ξ)/2)((1 + η)/2) ,   L4 = ((1 − ξ)/2)((1 + η)/2)

∴ x(ξ, η) = Σ_{i=1}^{4} Li xi ,   y(ξ, η) = Σ_{i=1}^{4} Li yi
Similarly

y(ξ, η) = ((1 − ξ)/2)((1 − η)/2)(0) + ((1 + ξ)/2)((1 − η)/2)(0)
        + ((1 + ξ)/2)((1 + η)/2)(4) + ((1 − ξ)/2)((1 + η)/2)(2)

or

y(ξ, η) = ((1 + η)/2)(3 + ξ)

and, proceeding in the same way for x,

x(ξ, η) = ((1 + ξ)/2)(3 + η)

Equations describing the mapping:

x(ξ, η) = ((1 + ξ)/2)(3 + η) ,   y(ξ, η) = ((1 + η)/2)(3 + ξ)

(b) Determination of det[J]:

det[J] = ((3 + η)/2)((3 + ξ)/2) − ((1 + ξ)/2)((1 + η)/2) = (1/4)(8 + 2ξ + 2η)
(c) Derivatives of Li with respect to x, y:

[J^T] = [ (3 + η)/2   (1 + η)/2 ]
        [ (1 + ξ)/2   (3 + ξ)/2 ]

[J^T]^{−1} = (1/det[J]) [  (3 + ξ)/2   −(1 + η)/2 ]
                        [ −(1 + ξ)/2    (3 + η)/2 ]

           = (1/((1/4)(8 + 2ξ + 2η))) [  (3 + ξ)/2   −(1 + η)/2 ]
                                      [ −(1 + ξ)/2    (3 + η)/2 ]

{ ∂Li/∂x }              { ∂Li/∂ξ }
{ ∂Li/∂y } = [J^T]^{−1} { ∂Li/∂η } ;   [J^T]^{−1} is defined above
and

∂L1/∂ξ = −(1/2)((1 − η)/2)     ∂L1/∂η = −(1/2)((1 − ξ)/2)
∂L2/∂ξ =  (1/2)((1 − η)/2)     ∂L2/∂η = −(1/2)((1 + ξ)/2)
∂L3/∂ξ =  (1/2)((1 + η)/2)     ∂L3/∂η =  (1/2)((1 + ξ)/2)
∂L4/∂ξ = −(1/2)((1 + η)/2)     ∂L4/∂η =  (1/2)((1 − ξ)/2)

Hence, ∂Li/∂x, ∂Li/∂y ; i = 1, 2, 3, 4 are completely defined.
(d) Determination of f (ξ, η):

f (ξ, η) = Σ Li(ξ, η) fi
         = ((1 − ξ)/2)((1 − η)/2)(0) + ((1 + ξ)/2)((1 − η)/2)(1)
         + ((1 + ξ)/2)((1 + η)/2)(2) + ((1 − ξ)/2)((1 + η)/2)(1)

After simplifying,

f (ξ, η) = (1/4)(4 + 2ξ + 2η)
(e) Determination of ∂f/∂x and ∂f/∂y:

∂f/∂ξ = (∂f/∂x)(∂x/∂ξ) + (∂f/∂y)(∂y/∂ξ)
∂f/∂η = (∂f/∂x)(∂x/∂η) + (∂f/∂y)(∂y/∂η)

∴ { ∂f/∂ξ }   [ ∂x/∂ξ   ∂y/∂ξ ] { ∂f/∂x }           { ∂f/∂x }
  { ∂f/∂η } = [ ∂x/∂η   ∂y/∂η ] { ∂f/∂y }  = [J^T]  { ∂f/∂y }

From f (ξ, η) = (1/4)(4 + 2ξ + 2η):

∂f/∂ξ = 1/2 ,   ∂f/∂η = 1/2

∴ { ∂f/∂x }                            [  (3 + ξ)/2   −(1 + η)/2 ] { 1/2 }
  { ∂f/∂y } = (1/((1/4)(8 + 2ξ + 2η))) [ −(1 + ξ)/2    (3 + η)/2 ] { 1/2 }

            = (1/((1/4)(8 + 2ξ + 2η))) {  (3 + ξ)/4 − (1 + η)/4 }
                                       { −(1 + ξ)/4 + (3 + η)/4 }
f (x) = C0 + C1 x + C2 x² + · · · + Cn x^n             (5.141)

We can also write (5.141) in an alternate way using the locations of the data
points.
Expanding (5.143) and collecting powers of x, if we define:

C0 = a0 − a1 x1 + a2 x1 x2
C1 = a1 − a2 x1 − a2 x2                                (5.146)
C2 = a2

then

f (x) = C0 + C1 x + C2 x²                              (5.147)
(i) If we let x = x1 , then except for a0 , all other terms become zero due
to the fact that they all contain (x − x1 ). Thus, we obtain:
f (x1 ) = a0 = f1 (5.148)
(ii) If we let x = x2 , then except the first two terms on the right side of
(5.142), all others are zero and we obtain (after substituting for a0 from
(5.148)):
∴ a1 = (f (x2) − f (x1))/(x2 − x1) = (f2 − f1)/(x2 − x1) = f [x2, x1]          (5.150)
f [x2 , x1 ] is a convenient notation. It is called first divided difference
between the points x1 and x2 . Thus, a1 is determined.
(iii) If we let x = x3, then except for the first three terms on the right side of
(5.142), all others are zero and we obtain:

f (x3) = f (x1) + ((f (x2) − f (x1))/(x2 − x1))(x3 − x1) + a2 (x3 − x1)(x3 − x2)        (5.152)

Solving for a2:

a2 = (f [x3, x2] − f [x2, x1])/(x3 − x1) = f [x3, x2, x1]                               (5.154)
a0 = f (x1 )
a1 = f [x2 , x1 ]
a2 = f [x3 , x2 , x1 ] (5.155)
..
.
an = f [xn+1 , xn , . . . , x1 ]
in which the f values in the square brackets are the divided differences
defined by:

f [x2, x1] = (f (x2) − f (x1))/(x2 − x1)                                 ; first divided difference

f [x3, x2, x1] = (f [x3, x2] − f [x2, x1])/(x3 − x1)                     ; second divided difference

f [x4, x3, x2, x1] = (f [x4, x3, x2] − f [x3, x2, x1])/(x4 − x1)         ; third divided difference

  ⋮

f [x_{n+1}, x_n, . . . , x1] = (f [x_{n+1}, x_n, . . . , x2] − f [x_n, x_{n−1}, . . . , x1])/(x_{n+1} − x1)
                                                                         (5.156)
f (x) = a0 + a1 (x − x1) + a2 (x − x1)(x − x2) + a3 (x − x1)(x − x2)(x − x3)            (5.157)
where
f (xi ) = fi ; i = 1, 2, . . . , 4
and
a0 = f (x1 )
(5.158)
a1 = f [x2 , x1 ]
a2 = f [x3 , x2 , x1 ]
a3 = f [x4 , x3 , x2 , x1 ]
i      1    2    3    4
xi     1    2    3    4
fi     0   10    0   −5
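The divided-difference table for this data, and the evaluation of (5.157), can be computed with a short sketch (Python with NumPy; the printed table in the original is only partially legible, so the code reproduces it rather than quoting it).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
f = np.array([0.0, 10.0, 0.0, -5.0])

def divided_differences(x, f):
    """Newton coefficients a0, a1, ..., built from eq. (5.156)."""
    n = len(x)
    a = f.astype(float).copy()
    for j in range(1, n):
        a[j:] = (a[j:] - a[j-1:-1]) / (x[j:] - x[:n-j])
    return a                      # a[k] = f[x_{k+1}, ..., x_1], cf. (5.155)

def newton_eval(x, a, t):
    """Evaluate the Newton form (5.157) at t by nested multiplication."""
    p = a[-1]
    for k in range(len(a) - 2, -1, -1):
        p = a[k] + (t - x[k]) * p
    return p

a = divided_differences(x, f)
print(a)                                      # a0, a1, a2, a3
print([newton_eval(x, a, t) for t in x])      # reproduces the data values
```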
x2 = x1 + h
x3 = x1 + 2h
  ⋮                                                    (5.159)
xn = x1 + (n − 1)h
f ′(x1) = (f (x2) − f (x1))/h = f [x2, x1]

f ″(x1)/2! = (f (x3) − 2f (x2) + f (x1))/(2h²) = f [x3, x2, x1]            (5.162)

and so on. Hence, the Newton interpolating polynomial can be written as

f (x) = f (x1) + f ′(x1)(x − x1) + (f ″(x1)/2!)(x − x1)(x − x2) + . . .     (5.164)

Equation (5.164) is an important form of Newton's interpolating polynomial.
If we let

(x − x1)/h = α    ∴ x − x1 = hα

then

x − x2 = x − (x1 + h) = x − x1 − h = αh − h = h(α − 1)
x − x3 = x − (x2 + h) = x − (x1 + 2h) = x − x1 − 2h                        (5.165)
       = αh − 2h = h(α − 2)
f (x) = f (x1) + f ′(x1) hα + (f ″(x1)/2!) h² α(α − 1) + · · · + (f ⁽ⁿ⁾(x1)/n!) hⁿ α(α − 1) . . . (α − (n − 1)) + Rn
                                                                            (5.166)

Rn = (f ⁽ⁿ⁺¹⁾(ξ)/(n + 1)!) hⁿ⁺¹ α(α − 1)(α − 2) . . . (α − n)   ; remainder  (5.167)
Remarks.
Problems
5.1 Consider the following table of data
i 1 2 3
xi 0 3 6
fi f1 f2 f3
We can express
3
X
f (x) = Lk (x)fk (1)
k=1
i 1 2 3 4 5
xi 0 1 2 3 4
fi 1 3 7 13 21
i 1 2 3
xi -1.0 -0.5 1.0
fi f1 f2 f3
i 1 2 3
xi 0 2 4
fi 1 5 17
Determine f (3) using Lagrange interpolating polynomial for the data in the
table.
i 1 2 3 4
xi 3 4 2.5 5
fi 8 2 7 1
i 1 2 3 4 5
xi 0 1 2 3 4
fi 0 2 6 12 20
5.7 Consider a two node configuration Ω̄(e) in R1 shown in Figure (a) with
coordinates. Figure (b) shows its map Ω̄(ξ) in the natural coordinate space
ξ.
[Figure: (a) a two-node domain Ω̄(e) with x = 1 and x = 4; (b) its map Ω̄(ξ) in the natural coordinate space ξ.]
(c) If f1 and f2 are function values at nodes 1 and 2 of Figure (a), then
establish interpolation f (ξ) in the natural coordinate space ξ i.e.
Ω̄(ξ) .
df (ξ)
(d) Derive an expression for dx using the interpolation f (ξ) derived
in (c).
(e) Using f1 = 10 and f2 = 20 calculate values of f at x = 1.75 and
x = 3.25 using the interpolation f (ξ) in (c).
df
(f) Also calculate dx at x = 1.75 and x = 3.25.
1 2 3
1 2 3 ξ
x −1 0 1
x=1 x = 2.5 x = 4 2
(a) A three node configuration Ω̄(e) (b) Map Ω̄(ξ) of Ω̄(e)
η
y
1 2 3
1 2 3 ξ
x −1 0 1
x = 1 x = 1.75 x=4 2
(a) A three node configuration Ω̄(e) (b) Map Ω̄(ξ) of Ω̄(e)
1 2 3
1 2 3 ξ
x −1 0 1
x=1 x = 3.25 x = 4 2
(a) A three node configuration Ω̄(e) (b) Map Ω̄(ξ) of Ω̄(e)
1 2 3 1 2 3
x ξ
x1 x2 x3 ξ=0 ξ=1 ξ=2
(a) A three node configuration Ω̄(e) (b) Map Ω̄(ξ) of Ω̄(e)
y η
3 (0, 2) (2, 2)
4 3
(p, q)
4
(0, 2)
1 2 1 2
x ξ
(0, 0) (2, 0) (0, 0) (2, 0)
(a) Ω̄(e) in x, y space (b) Map Ω̄(ξ,η) of Ω̄(e)
The coordinates of the nodes are also given in the two spaces in Figures (a)
and (b).
(a) Determine the equations describing the mapping of points in xy and
ξη spaces for Ω̄(e) and Ω̄(ξη) i.e. determine x = x(ξ, η), y = y(ξ, η).
Simplify the expressions till no further simplification is possible.
(b) Determine the relationship between p and q (the Cartesian coordi-
nates of node 3) for their admissibility in the geometric description
of the geometry Ω̄(e) in the xy space. Simplify the final expression
or equation.
5.13 Consider a four node quadrilateral bilinear geometry Ω̄(e) in R2 shown
in Figure (a).
3
B
(3, 3)
4
(0, 2)
A
1 2
x
(0, 0) (2, 0)
(a) A four node Ω̄(e) in R2
6 4
5 η
(0, 1) (1, 1)
6 5 4
y
1 2
3
1 2 3
ξ
x (0, 0) (1, 0)
(a) Ω̄(e) in R2 (b) Map Ω̄(ξ,η) of Ω̄(e)
5.15 Consider a four node bilinear Ω̄(e) in R2 shown in Figure (a). Its map
Ω̄(ξη) in ξη space is shown in Figure (b).
y η
4
(0, 4) 3
(0, 1) (1, 1)
(s, t) 4 3
4
1 2
x
(0, 0) (2, 0)
1 2
ξ
2
(0, 0) (1, 0)
(a) Ω̄(e) in R2 (b) Map Ω̄(ξ,η) of Ω̄(e)
5.16 Consider a six node para-linear Ω̄(e) in R2 shown in Figure (a). Its
map Ω̄(ξη) in ξη space is shown in Figure (b).
y
η
6 5 6
6 (1.5, 2.25) (3, 3)
(0, 3)
5 4
2 ξ
(1.5, 0.75)
3 1 2 3
1
x 2 (3, 0)
(0, 0) 2
(e) 2
(a) Ω̄ in R (b) Map Ω̄(ξ,η) of Ω̄(e)
y
√
5
3
B
A
4
√
5 3
4
2.25
2
1 2
x
5.18 Consider two-dimensional Ω̄(e) in R2 shown in Figure (a), (b), and (c).
The Cartesian coordinates of the nodes are given. The domains Ω̄(e) are
mapped into ξη-space into a two-unit square.
[Figure 1: the domains Ω̄(e) in R² for Problem 5.18, parts (a), (b), and (c), with the node coordinates and dimensions as indicated in the original figures.]
Provide program listing, results, tables, and plots along with a write-up on
the equations used as part of the report. Also provide a discussion of your
results.
6
Numerical Integration or
Quadrature
6.1 Introduction
In many situations, due to the complexity of integrands and irregular-
ity of the domain in definite integrals it becomes necessary to approximate
the value of the integral. The numerical integration methods or quadra-
ture methods are methods of obtaining approximate values of the definite
integrals. Many simple numerical integration methods are derived using the
simple fact that if we wish to calculate the integral of f (x) between the limits
x = A to x = B, i.e.,
I = ∫_A^B f (x) dx                                    (6.1)

[Figure: the shaded area under the curve f (x) between x = A and x = B is the integral I.]
methods are based on approximating the actual area under the curve f (x)
versus x between x = A and x = B.
Numerical integration methods such as the trapezoid rule, Simpson's 1/3
and 3/8 rules, Newton-Cotes integration, Richardson's extrapolation, and
the Romberg method presented in the following sections are all methods of
approximation. These methods are only effective in R1. Gauss quadrature,
presented later in this chapter, is equally effective in R1, R2, and R3.
(1) In the first category of methods the integration interval [A, B] is divided
into subintervals (may be considered of equal width for convenience).
The integration methods are developed for calculating the approximate
value of the integral for a subinterval. The sum of the approximated
integral values for each subinterval then yields the approximate value of
the integral over the entire interval of integration [A, B]. We consider
two methods:
in which

Ii = ∫_{ai}^{bi} f (x) dx                             (6.3)

f (x) ≈ f̃ (x) = f (ai) + ((f (bi) − f (ai))/(bi − ai)) (x − ai)             (6.4)

Ii ≈ Ĩi = ∫_{ai}^{bi} f̃ (x) dx                        (6.5)

Ii ≈ ∫_{ai}^{bi} ( f (ai) + ((f (bi) − f (ai))/(bi − ai)) (x − ai) ) dx     (6.6)

or

Ii ≈ ((f (ai) + f (bi))/2) (bi − ai)                  (6.7)
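A composite implementation of (6.7) is a few lines of code; the sketch below (Python with NumPy; function names are illustrative) uses the integrand f (x) = (sin x)² eˣ of Examples 6.1–6.3, with the limits [0, 2] inferred from the subinterval widths and totals in Tables 6.1–6.5.

```python
import numpy as np

def trapezoid(f, A, B, n):
    """Composite trapezoid rule, eq. (6.7), with n uniform subintervals."""
    edges = np.linspace(A, B, n + 1)
    a, b = edges[:-1], edges[1:]
    return np.sum((f(a) + f(b)) / 2.0 * (b - a))

f = lambda x: np.sin(x)**2 * np.exp(x)
for n in (1, 2, 4, 8, 16):
    print(n, trapezoid(f, 0.0, 2.0, n))
```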
[Figure: linear approximation f̃ (x) of f (x) over a subinterval [ai, bi]; the trapezoid area approximates Ii.]
6.2.2 Simpson's 1/3 Rule

Consider Ii for a subinterval [ai, bi].

Ii = ∫_{ai}^{bi} f (x) dx                             (6.8)
We calculate f (ai), f ((ai + bi)/2), and f (bi) using f (x), the integrand in (6.8).
Using (x1, f (x1)), (x2, f (x2)), and (x3, f (x3)), we establish a quadratic in-
terpolating polynomial f̃ (x) (say, using Lagrange polynomials) that is con-
sidered to approximate f (x) ∀x ∈ [ai, bi].

f̃ (x) = ((x − x2)(x − x3))/((x1 − x2)(x1 − x3)) f (x1) + ((x − x1)(x − x3))/((x2 − x1)(x2 − x3)) f (x2)
      + ((x − x1)(x − x2))/((x3 − x1)(x3 − x2)) f (x3)                      (6.10)

Ii ≈ Ĩi = ∫_{ai}^{bi} f̃ (x) dx                        (6.11)
This is called Simpson’s 13 Rule. Figure 6.3 shows fe(x) and the true f (x)
for the subinterval [ai , bi ]. We note that (6.12) can also be written as:
Ĩi = (bi − ai) (f (x1) + 4f (x2) + f (x3))/6 = (bi − ai) Hi                 (6.13)

i.e., the width (bi − ai) times the average height Hi.
Figure 6.3: f (x) and f̃ (x), quadratic approximation of f (x) for the subinterval [ai, bi]
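The corresponding composite form of (6.13) is sketched below (Python with NumPy; same assumed limits [0, 2] and integrand as in the trapezoid sketch above).

```python
import numpy as np

def simpson_13(f, A, B, n):
    """Composite Simpson's 1/3 rule, eq. (6.13), with n uniform subintervals."""
    edges = np.linspace(A, B, n + 1)
    a, b = edges[:-1], edges[1:]
    m = (a + b) / 2.0
    return np.sum((b - a) * (f(a) + 4.0 * f(m) + f(b)) / 6.0)

f = lambda x: np.sin(x)**2 * np.exp(x)
for n in (1, 2, 4, 8, 16):
    print(n, simpson_13(f, 0.0, 2.0, n))
```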
6.2.3 Simpson's 3/8 Rule

Consider Ii for a subinterval [ai, bi]. We divide the subinterval into three
equal parts and define the coordinates as:

x1 = ai ;   x2 = ai + (bi − ai)/3 ;   x3 = ai + 2(bi − ai)/3 ;   x4 = bi     (6.14)

We calculate f (x1), f (x2), f (x3), and f (x4) using f (x) in the integrand of Ii.

Ii = ∫_{ai}^{bi} f (x) dx                             (6.15)

Ii ≈ Ĩi = (3/8) hi (f (x1) + 3f (x2) + 3f (x3) + f (x4)) ;   hi = (bi − ai)/3         (6.18)
Figure 6.4: f (x) and f̃ (x), cubic approximation of f (x) for the interval [ai, bi]
We note that f̃ (x) can be quite different from f (x). From (6.19), we
can interpret Simpson's 3/8 rule as the area of a rectangle with base (bi − ai)
and height Hi (given by (6.19)). We calculate Ĩi for each subinterval and use
Ĩ = Σ_{i=1}^{n} Ĩi to obtain the approximate value of the integral (6.1).
Remarks.
(1) As in the other methods discussed, here also the accuracy of the method
is dependent on the size of the subinterval. The smaller the subinterval,
the better the accuracy of the approximated value of the integral I.
(2) It can be shown that the truncation error in calculating Ii using (6.19)
is of the order of O(h6 ) (proof omitted).
(3) This definition of hi is different than used in Sections 6.2.1 and 6.2.2.
In each of the three methods, we calculate I for each subinterval [ai , bi ] and
sum them to obtain I.
Example 6.1 (Trapezoid Rule). Consider (6.20) with the integrand f (x) = (sin x)² eˣ. For each subinterval,
Ii ≈ ((bi − ai)/2)(f (ai) + f (bi)), where f (ai) and f (bi) are calculated using f (x) = (sin x)² eˣ.
Table 6.1: Results of trapezoid rule for (6.20) using one subinterval
subintervals = 1
bi − ai = 0.200000E+01
i ai bi Ii
TOTAL 0.610943E+01
Table 6.2: Results of trapezoid rule for (6.20) using two subintervals
subintervals = 2
bi − ai = 0.100000+01
i ai bi Ii
TOTAL 0.497946E+01
Table 6.3: Results of trapezoid rule for (6.20) using four subintervals
subintervals = 4
bi − ai = 0.500000+00
i ai bi Ii
TOTAL 0.490884E+01
Table 6.4: Results of trapezoid rule for (6.20) using eight subintervals
subintervals = 8
bi − ai = 0.250000+00
i ai bi Ii
TOTAL 0.489874E+01
i ai bi Ii
TOTAL 0.489660E+01
Example 6.2 (Simpson's 1/3 Rule). For each subinterval,
Ii ≈ ((bi − ai)/6)(f (x1) + 4f (x2) + f (x3)), where f (x1), f (x2), and f (x3) are calculated using f (x) = (sin x)² eˣ.
Table 6.6: Results of Simpson's 1/3 rule for (6.20) using one subinterval
subintervals = 1
bi − ai = 0.200000E+01
i ai bi Ii
TOTAL 0.460280E+01
Table 6.7: Results of Simpson's 1/3 rule for (6.20) using two subintervals
subintervals = 2
bi − ai = 0.100000+01
i ai bi Ii
TOTAL 0.488530E+01
Table 6.8: Results of Simpson's 1/3 rule for (6.20) using four subintervals
subintervals = 4
bi − ai = 0.500000+00
i ai bi Ii
TOTAL 0.489538E+01
Table 6.9: Results of Simpson's 1/3 rule for (6.20) using eight subintervals
subintervals = 8
bi − ai = 0.250000+00
i ai bi Ii
TOTAL 0.489589E+01
Table 6.10: Results of Simpson's 1/3 rule for (6.20) using 16 subintervals
subintervals = 16
bi − ai = 0.250000+00
i ai bi Ii
TOTAL 0.489591E+01
Example 6.3 (Simpson's 3/8 Rule). For each subinterval,
Ii ≈ ((bi − ai)/8)(f (x1) + 3f (x2) + 3f (x3) + f (x4)), where f (x1), . . . , f (x4) are calculated using f (x) = (sin x)² eˣ.
Table 6.11: Results of Simpson's 3/8 rule for (6.20) using one subinterval
subintervals = 1
bi − ai = 0.200000E+01
i ai bi Ii
TOTAL 0.477375E+01
Table 6.12: Results of Simpson's 3/8 rule for (6.20) using two subintervals
subintervals = 2
bi − ai = 0.100000+01
i ai bi Ii
TOTAL 0.488530E+01
Table 6.13: Results of Simpson's 3/8 rule for (6.20) using four subintervals
subintervals = 4
bi − ai = 0.500000+00
i ai bi Ii
TOTAL 0.489568E+01
Table 6.14: Results of Simpson's 3/8 rule for (6.20) using eight subintervals
subintervals = 8
bi − ai = 0.250000+00
i ai bi Ii
TOTAL 0.489591E+01
Table 6.15: Results of Simpson's 3/8 rule for (6.20) using 16 subintervals
subintervals = 16
bi − ai = 0.250000+00
i ai bi Ii
TOTAL 0.489592E+01
The results of Examples 6.1 – 6.3 are summarized in Tables 6.1 – 6.15. In these studies all subintervals are
of uniform width; however, use of non-uniform width subintervals presents
no problem. In this case one needs to be careful to establish ai and bi based
on the subinterval widths for evaluating Ii for the subinterval [ai, bi].
In each method, as the number of subintervals is increased, the accuracy
of the value of the integral improves. For the same number of subintervals,
Simpson's 1/3 method produces integral values with better accuracy (error
O(h⁴)) compared to the trapezoid rule (error O(h²)), and Simpson's 3/8 rule (error
O(h⁶)) is more accurate than Simpson's 1/3 rule. In Simpson's 3/8 rule the
integral values for 8 and 16 subintervals are accurate up to four decimal
places.
I ≈ I_{h1} + C h1^N                                   (6.21)

I ≈ I_{h2} + C h2^N                                   (6.22)

where I_{h1} is the value of the integral I using subinterval size h1 and I_{h2} is
the value of the integral I using subinterval size h2. N depends upon the
polynomial approximation used in approximating the actual f (x). C h1^N is the
error in I_{h1} and C h2^N is the error in I_{h2}. The expressions (6.21) and (6.22)
are based on several assumptions:
(i) The constant C is not the same in (6.21) and (6.22), but we assume it
to be.
(ii) Since Ih2 is based on h2 < h1 , Ih2 is more accurate than Ih1 and hence
we expect I in (6.22) to have better accuracy than I in (6.21).
First, assuming I to be the same in (6.21) and (6.22), we can solve for C:

C ≈ (I_{h1} − I_{h2})/(h2^N − h1^N)                   (6.23)

or

I ≈ I_{h2} + (I_{h2} − I_{h1})/((h1/h2)^N − 1) = ((h1/h2)^N I_{h2} − I_{h1})/((h1/h2)^N − 1)          (6.25)
Value of N :
1. In trapezoid rule ; N =2
2. In Simpson’s 13 rule ; N =4
3. In Simpson’s 38 rule ; N = 6 and so on
Remarks.
(1) Use of (6.25) requires Ih1 for subinterval width h1 and Ih2 for subin-
terval width h2 < h1 . The integral value I in (6.25) is an improved
approximation of the integral.
(2) Thus, we can view the truncation error in the numerical integration
process when the integrand f (x) is approximated by a polynomial in a
subinterval to be of the following form, a series in h:
Et = C1 h2 + C2 h4 + C3 h6 + . . . (6.26)
(3) In Richardson’s extrapolation if Ih1 and Ih2 are obtained using trapezoid
rule, then N = 2 and by using (6.25), we eliminate errors O(h2 ).
(4) On the other hand if Ih1 and Ih2 are obtained using Simpson’s 13 rule,
then N = 4 and by using (6.25), we eliminate errors of the order of
O(h4 ) and so on.
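Equation (6.25) is a one-line computation; a minimal sketch follows (Python; the numerical inputs are the trapezoid-rule totals from Tables 6.1 and 6.2).

```python
def richardson(I_h1, I_h2, h1, h2, N):
    """Improved integral value from two approximations, eq. (6.25)."""
    r = (h1 / h2) ** N
    return I_h2 + (I_h2 - I_h1) / (r - 1.0)

# Trapezoid-rule totals from Tables 6.1 and 6.2 (h1 = 2, h2 = 1, N = 2):
print(richardson(6.10943, 4.97946, 2.0, 1.0, 2))   # about 4.60280, cf. Table 6.6
```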
I = ∫_A^B f (x) dx                                    (6.27)

Let us consider the trapezoid rule and calculate numerical values of the
integral I using one, two, three, etc. uniform subintervals. Then all these
integral values have truncation error of the order of O(h2 ), shown in Table
6.16 below.
We use the values of the integral in column two, which contain errors O(h²), in
Richardson's extrapolation to eliminate the O(h²) errors. In doing so we use

I ≈ I_{h2} + (I_{h2} − I_{h1})/((h1/h2)^N − 1)        (6.28)
and
f (x) = 0.2 + 25x − 200x2 + 675x3 − 900x4 + 400x5
Consider the trapezoid rule with 1, 2, 4, and 8 subintervals of uniform width
for calculating the numerical value of the integral. These numerical values
of the integral contain truncation errors O(h²). The numerical values of this
integral I are listed in Table 6.17 in column two.
We use this with values of the integral in column 2 to calculate the integral
values in column three of Table 6.17, which contain leading truncation error
O(h4 ). With the integral values in column three we use:
to obtain integral values in column four of table 6.17, which contain leading
truncation error O(h6 ). Using integral values in column four and
we obtain the integral value in column five that contains leading truncation
error O(h8 ). This is the final most accurate value of the integral I based on
Romberg method employing Richardson’s extrapolation.
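The repeated extrapolation can be organized as a small triangular table; the sketch below (Python with NumPy) uses the polynomial integrand of this example. The integration limits are not legible in this excerpt, so [0, 0.8] is used purely as an illustrative placeholder.

```python
import numpy as np

def trapezoid(f, A, B, n):
    edges = np.linspace(A, B, n + 1)
    return np.sum((f(edges[:-1]) + f(edges[1:])) / 2.0 * np.diff(edges))

def romberg(f, A, B, levels=4):
    """Romberg table: repeated Richardson extrapolation of trapezoid values."""
    T = [[trapezoid(f, A, B, 2**k)] for k in range(levels)]
    for j in range(1, levels):
        for k in range(j, levels):
            r = 4.0**j
            T[k].append((r * T[k][j-1] - T[k-1][j-1]) / (r - 1.0))
    return T[-1][-1]          # eliminates errors O(h^2), O(h^4), O(h^6)

f = lambda x: 0.2 + 25*x - 200*x**2 + 675*x**3 - 900*x**4 + 400*x**5
print(romberg(f, 0.0, 0.8))   # limits assumed for illustration only
```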
Remarks.
(1) Numerical integration methods such as the Newton-Cotes methods dis-
cussed in R1 are difficult to extend to integrals in R2 and R3 .
(2) Richardson's extrapolation and the Romberg method are specifically de-
signed to improve the accuracy of the numerically calculated values of
the integrals from the trapezoid rule, Simpson's 1/3 method, Simpson's 3/8
method, and in general Newton-Cotes methods. Thus, their extensions
to R2 and R3 are not possible either.
(3) In the next section we present Gauss quadrature that overcomes these
shortcomings.
f (ξ) = C0 + C1 ξ + C2 ξ 2 + C3 ξ 3 + C4 ξ 4 + C5 ξ 5 (6.46)
Let N , the degree of the polynomial f (ξ), be such that the n-point quadra-
ture rule integrates it exactly. Then, we can write:
I = Σ_{i=1}^{n} wi f (ξi)                             (6.48)
(2) Values of wi and ξi for various values of n are generally tabulated (see
Table 6.18). Since the locations of the quadrature points in the interval
[−1, 1] are symmetric about ξ = 0, the values of ξi in the table are listed
only in the interval [0, 1]. For example, for n = 3 we have −ξ1, ξ2 (= 0),
and ξ1, thus only ξ1 and ξ2 need to be listed as given in Table 6.18. For
n = 4, we have −ξ1, −ξ2, ξ2, and ξ1, thus only ξ1 and ξ2 need to be listed
in Table 6.18. The weight factors for ±ξi are the same, i.e., wi applies
to +ξi as well as −ξi. The values of wi and ξi are listed up to fifteen
decimal places.
(wi, ξi) ; i = 1, 2, . . . , n

(iv) Then

I = Σ_{i=1}^{n} wi f (ξi)                             (6.51)
Table 6.18: Sampling points and weight factors for Gauss quadrature for integration
limits [−1, 1]
I = ∫_{−1}^{+1} f (x) dx = Σ_{i=1}^{n} Wi f (xi)

±xi                          Wi
n=1
0 2.00000 00000 00000
n=2
0.57735 02691 89626 1.00000 00000 00000
n=3
0.77459 66692 41483 0.55555 55555 55556
0.00000 00000 00000 0.88888 88888 88889
n=4
0.86113 63115 94053 0.34785 48451 37454
0.33998 10435 84856 0.65214 51548 62546
n=5
0.90617 98459 38664 0.23692 68850 56189
0.53846 93101 05683 0.47862 86704 99366
0.00000 00000 00000 0.56888 88888 88889
n=6
0.93246 95142 03152 0.17132 44923 79170
0.66120 93864 66265 0.36076 15730 48139
0.23861 91860 83197 0.46791 39345 72691
n=7
0.94910 79123 42759 0.12948 49661 68870
0.74153 11855 99394 0.27970 53914 89277
0.40584 51513 77397 0.38183 00505 05119
0.00000 00000 00000 0.41795 91836 73469
n=8
0.96028 98564 97536 0.10122 85362 90376
0.79666 64774 13627 0.22238 10344 53374
0.52553 24099 16329 0.31370 66458 77887
0.18343 46424 95650 0.36268 37833 78362
n=9
0.96816 02395 07626 0.08127 43883 61574
0.83603 11073 26636 0.18064 81606 94857
0.61337 14327 00590 0.26061 06964 02935
0.32425 34234 03809 0.31234 70770 40003
0.00000 00000 00000 0.33023 93550 01260
n = 10
0.97390 65285 17172 0.06667 13443 08688
0.86506 33666 88985 0.14945 13491 50581
0.67940 95682 99024 0.21908 63625 15982
0.43339 53941 29247 0.26926 67193 09996
0.14887 43389 81631 0.29552 42247 14753
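In practice the quadrature points and weights of Table 6.18 can also be generated numerically; the sketch below (Python with NumPy, using the library routine numpy.polynomial.legendre.leggauss) applies (6.51) to the degree-5 polynomial of Problem 6.3, for which n = 3 points suffice.

```python
import numpy as np

def gauss_legendre(f, n):
    """n-point Gauss quadrature of f over [-1, 1], eq. (6.51)."""
    xi, w = np.polynomial.legendre.leggauss(n)   # points and weights, cf. Table 6.18
    return np.sum(w * f(xi))

f = lambda x: 10 + 5*x**2 + 2.5*x**3 + 1.25*x**4 + 0.625*x**5
print(gauss_legendre(f, 3))   # = 143/6, the exact value of the integral over [-1, 1]
```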
we find that integration variables are x and ξ in (6.52) and (6.53), but
that makes no difference. Secondly, the limits of integration in (6.52) (the
integral we want to evaluate) are [a, b], whereas in (6.53) they are [−1, 1].
By performing a change of variable from ξ to x in (6.53), we obtain (6.52).
We proceed as follows
(i) Determine the highest degree of the polynomial f (x) in (6.52), say N;
then the minimum number of quadrature points n is determined using
n = (N + 1)/2 (rounded to the next highest integer).

(ii) From Table 6.18 determine wi ; i = 1, 2, . . . , n and ξi ; i = 1, 2, . . . , n
for the integration interval [−1, 1].

(iii) Transform (wi, ξi) ; i = 1, 2, . . . , n to the integration interval [a, b] in
(6.52) using:

xi = (a + b)/2 + ((b − a)/2) ξi ;   wi^x = ((b − a)/2) wi ;   i = 1, 2, . . . , n        (6.54)
(iv) Now using the weight factors wi^x ; i = 1, 2, . . . , n and quadrature points
xi ; i = 1, 2, . . . , n for the integration interval [a, b], we can integrate
f (x) in (6.52):

I = Σ_{i=1}^{n} wi^x f (xi)                           (6.55)

This is the exact numerical value of the integral (6.52) using Gauss quadrature.
I = ∫_c^d ∫_a^b f (x, y) dx dy = ∫_c^d ( ∫_a^b f (x, y) dx ) dy              (6.61)

nx = (Nx + 1)/2 ;   ny = (Ny + 1)/2     (rounded to the next higher integers)           (6.62)

Now using (6.63) in (6.61), first we integrate with respect to x using Gauss
quadrature, holding y constant:

I = ∫_c^d ( Σ_{i=1}^{nx} wi^x f (xi, y) ) dy                                 (6.64)

I = Σ_{j=1}^{ny} wj^y ( Σ_{i=1}^{nx} wi^x f (xi, yj) )                       (6.65)

This is the exact numerical value of the integral (6.61) obtained using Gauss
quadrature.
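A tensor-product implementation of (6.65) is sketched below (Python with NumPy; function names are illustrative). The check uses the integrand xy + x²y² of Example 6.7 over [−1, 1] × [−1, 1], for which the exact value is 4/9.

```python
import numpy as np

def gauss_2d(f, a, b, c, d, nx, ny):
    """Tensor-product Gauss quadrature of f(x, y) over [a, b] x [c, d], eq. (6.65)."""
    xi, wx = np.polynomial.legendre.leggauss(nx)
    eta, wy = np.polynomial.legendre.leggauss(ny)
    x = (a + b) / 2.0 + (b - a) / 2.0 * xi          # transform (6.54) in x
    y = (c + d) / 2.0 + (d - c) / 2.0 * eta         # transform (6.54) in y
    wx = (b - a) / 2.0 * wx
    wy = (d - c) / 2.0 * wy
    return sum(wy[j] * sum(wx[i] * f(x[i], y[j]) for i in range(nx)) for j in range(ny))

f = lambda x, y: x * y + x**2 * y**2
print(gauss_2d(f, -1.0, 1.0, -1.0, 1.0, 2, 2))      # = 4/9, cf. Example 6.7
```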
or

I = ∫_{−1}^{1} ∫_{−1}^{1} ∫_{−1}^{1} f (ξ, η, ζ) dξ dη dζ                    (6.67)

nξ = (Nξ + 1)/2 ,   nη = (Nη + 1)/2 ,   nζ = (Nζ + 1)/2
(rounded to the next higher integers)                                        (6.68)

I = ∫_{−1}^{1} Σ_{j=1}^{nη} wj^η ( Σ_{i=1}^{nξ} wi^ξ f (ξi, ηj, ζ) ) dζ      (6.70)

This is the exact numerical value of the integral using Gauss quadrature.
I = ∫_e^f ∫_c^d ∫_a^b f (x, y, z) dx dy dz                                   (6.72)

or

I = ∫_e^f ∫_c^d ( ∫_a^b f (x, y, z) dx ) dy dz                               (6.73)

Let Nx, Ny, and Nz be the highest degrees of the polynomial f (x, y, z) in
x, y, and z; then nx, ny, and nz, the minimum numbers of quadrature points,
are given by

nx = (Nx + 1)/2 ,   ny = (Ny + 1)/2 ,   nz = (Nz + 1)/2
(rounded up to the next higher integer)                                      (6.74)
Now using (6.73) and (6.75) we can integrate f (x, y, z) with respect to x, y,
and z using Gauss quadrature:

I = Σ_{k=1}^{nz} wk^z ( Σ_{j=1}^{ny} wj^y ( Σ_{i=1}^{nx} wi^x f (xi, yj, zk) ) )        (6.76)
Remarks.
or
I = 1.9333334000
This value agrees with the theoretical value of I up to six decimal places.
We could check that if we use n = 3 (one order higher than minimum
quadrature rule) the value of the integral remains unaffected up to six deci-
mal places. Details are given in the following.
I = Σ_{i=1}^{3} wi f (xi)

From Table 6.18, for n = 3, we have:

x1 = −0.7745966910 ;   w1 = 0.5555555820
x2 =  0.0          ;   w2 = 0.8888888960
x3 =  0.7745966910 ;   w3 = 0.5555555820

Using these values of wi, xi ; i = 1, 2, 3 and I = Σ_{i=1}^{3} wi f (xi), we obtain:

I = 1.9333334000
which agrees with the integral value calculated using n = 2 up to all com-
puted decimal places. Thus, using n = 3, the integral value neither improved
nor deteriorated.
or I = 46.212463400
This value is accurate up to six decimal places when compared with the
theoretical value of I.
As in Example 6.5, here also if we use n = 3 instead of n = 2 (minimum)
and repeat the integration process, we find that I remains unaffected.
Determine the quadrature point location and weight factors in x and y using
Table 6.18 for the interval [−1, 1].
I = 0.4444443880
This value agrees with theoretical values up to at least seven decimal places.
It can be verified that using Gauss quadrature higher than (nx × ny ) =
(2 × 2), the value I of the integral remains unaffected.
I= (xy + x2 y 2 ) dx dy (6.84)
0.31 0.11
In this example the integrand is the same as in Example 6.7, but the limits
of integration are not [−1, 1] in x and y as in Example 6.7. In this case also
N x = 2, N y = 2, hence:
nx = (Nx + 1)/2 = (2 + 1)/2 = 3/2  ;   nx = 2
ny = (Ny + 1)/2 = (2 + 1)/2 = 3/2  ;   ny = 2
Determine the quadrature points and the weight function factors in ξ and η
for the integration interval [−1, 1] using Table 6.18.
I = 0.5034378170
This value agrees with the theoretical value up to at least seven decimal
places.
2 0.577189866000E−02
3 0.686041173000E−02
4 0.687233079000E−02
5 0.687239133000E−02
6 0.687239319000E−02
7 0.687239412000E−02
Problems
6.1 Calculate the value of the integral
I = ∫_1^2 ( x + 1/x )² dx
numerically.
6.2 Use Romberg method to evaluate the following integral with accuracy
of the order of O(h8 )
I = ∫_0^3 x e^{2x} dx
Hint: Use trapezoidal rule with 1, 2, 4 and 8 steps then apply Romberg
method.
6.3 Use lowest order Gauss quadrature to obtain exact value of the following
integral.
I = ∫_{−1}^{1} ( 10 + 5x² + 2.5x³ + 1.25x⁴ + 0.625x⁵ ) dx
6.4 Use lowest order Gauss quadrature to obtain exact value of the following
integral.
I = ∫_1^2 ( 4x² + 2x⁴ + x⁶ ) dx
Provide details of the sampling point locations and the weight functions.
6.5 Use two, three and four point Gauss quadrature to evaluate the following
integral.
I = ∫_0^{1/2} sin(πx) dx
Will the value of the integral improve with 5, 6 and higher order quadrature
and why? Can you determine the order of the Gauss quadrature that will
yield exact value of the integral I, explain.
I = ∫_c^d ( ∫_a^b f (x, y) dx ) dy
f (x, y) and the limits [a, b] and [c, d] are given in the following. Use lowest
order Gauss quadrature in each case.
(a)
(b)
f (x, y) = 1 + x + y + xy + x2 + y 2 + x3 + x2 y + xy 2 + y 3
[a, b] = [−1, 1] ; [c, d] = [1, 2]
(c)
f (x, y) = x2 y 2 exy
[a, b] = [1, 2] ; [c, d] = [1.2, 2.1]
Calculate numerical values of the integral I in (1) using (a), (b) and (c) with
1, 2, 4, 8, 16 and 32 number of strips for the following f (x) and [a, b].
I.   f (x) = ( x + 1/x )²                           ;  [a, b] = [1, 2]
II.  f (x) = x e^{2x}                               ;  [a, b] = [0, 3]
III. f (x) = 10 + 5x² + 2.5x³ + 1.25x⁴ + 0.625x⁵    ;  [a, b] = [−1, 1]
IV.  f (x) = 4x² + 2x⁴ + x⁶                         ;  [a, b] = [1, 2]
V.   f (x) = sin(πx)                                ;  [a, b] = [0, 1/2]
For each f (x), tabulate your results in the following form. Provide a listing
of your program and document your work.
Integral Value
Number of steps
Trapezoid rule Simpson’s 1/3 rule Simpson’s 3/8 rule
1
2
..
.
32
For each of the three methods ((a), (b) and (c)) apply Romberg method
to obtain the most improved values of the integrals. When using Romberg
method, provide a separate table for each of the three methods for each f (x).
6.8 Consider the following integral
I = ∫_2^3 ∫_1^2 ( x² + y² )^{1/3} dx dy
obtain numerical values of the integral I using Gauss quadrature: 1×1, 2×2,
. . . , n × n to obtain the integral value with seven decimal place accuracy.
Tabulate your calculations.
6.9 Consider the following integral
I = ∫_0^{π/2} cos(x) dx
Can this be integrated exactly using Gauss quadrature? If yes, then
find the minimum number of quadrature points in x and y. Clearly
illustrate and justify your answers.
6.12 Consider the following table of data.
i 1 2 3 4 5
xi 0 0.25 0.5 0.75 1.0
fi = f (xi ) 0.3989 0.3867 0.3521 0.3011 0.2410
(a) Compute I = ∫_0^1 f (x) dx with strip widths of h = 0.25, h = 0.5, and
h = 1.0 using Trapezoid rule. Using these computed values employ
Romberg method to compute the most accurate value of the integral
I. Tabulate your calculations. Show the orders of the errors being
eliminated.
(b) given
Z1 Z3.9 Z2.7
4.2 1.2 2 3 −y 3.6 1
I= x (1 + x ) y e z 1 + 2.6 dx dy dz
z
0.5 2.5 1.6
What is the lowest order of Gauss quadrature in x, y and z to
calculate exact value of I. Explain your reasoning.
7.1 Introduction
In the interpolation theory, we construct an analytical expression, say
f (x), for the data points (xi , fi ); i = 1, 2, . . . , n. The function f (x) is such
that it passes through the data points, i.e., f (xi ) = fi ; i = 1, 2, . . . , n. This
polynomial representation f (x) of the data points may some times be a
poor representation of the functional relationship described by the data (see
Figure 7.1).
Figure 7.1: Polynomial representation of data and comparison with the true functional relationship described by the data
we have a linear least squares fit to the data. When g(x) is a non-linear
function of the unknown constants or coefficients, we obviously have a non-
linear least squares fit. In this chapter we consider linear as well as non-linear
least squares fits. In case of linear least squares fit we also consider weighted
least squares fit in which the more accurate data points can be assigned
larger weight factors to ensure that the resulting fit is biased towards these
data points. The non-linear least squares fit is first presented for a special
class of g(x) in which taking log or ln of both sides of g(x) yields a form that
is suitable for linear least squares fit with appropriate correction so that true
residual is minimized. This is followed by a general non-linear least squares
fit process that is applicable to any form of g(x) in which g(x) is a desired
non-linear function of the unknown constants or coefficients. A weighted
non-linear least squares formulation of this non-linear least squares fit is
also presented. It is shown that this non-linear least squares fit formulation
naturally degenerates to linear and weighted linear least squares fit when
g(x) is a linear function of the unknown constants or coefficients.
Since g(x) does not necessarily pass through the data points, if xi ; i =
1, 2, . . . , n are substituted in g(x) to obtain g(xi ); i = 1, 2, . . . , n, these may
not agree with fi ; i = 1, 2, . . . , n. Let r1 , r2 , . . . , rn be the differences
between g(x1 ), g(x2 ), . . . , g(xn ) and f1 , f2 , . . . , fn , called residuals at each
of the location xi ; i = 1, 2, . . . , n.
Σ_{k=1}^{m} ck gk(xi) − fi = ri ;   i = 1, 2, . . . , n          (7.2)
or
[G]n×m {c}m×1 − {f }n×1 = {r}n×1 (7.3)
where
g1 (x1 ) g2 (x1 ) . . . gm (x1 )
g1 (x2 ) g2 (x2 ) . . . gm (x2 )
[G] = (7.4)
..
.
g1 (xn ) g2 (xn ) . . . gm (xn )
{c} = {c1, c2, . . . , cm}ᵀ ;   {f } = {f1, f2, . . . , fn}ᵀ ;   {r} = {r1, r2, . . . , rn}ᵀ          (7.5)
The vector {r} is called the residual vector. It represents the difference
between the assumed fit g(x) and the actual function values fi . In the least
squares fit we minimize the sum of squares of the residuals, i.e., we consider
minimization of the sum of the squares of the residuals R.
(R)minimize = ( Σ_{i=1}^{n} (ri)² )minimize            (7.6)

or

Σ_{i=1}^{n} 2 ri ∂ri/∂ck = 0 ;   k = 1, 2, . . . , m

or

Σ_{i=1}^{n} ri ∂ri/∂ck = 0 ;   k = 1, 2, . . . , m      (7.8)

But

∂ri/∂ck = gk(xi) ;   from (7.2)                        (7.9)

Hence, (7.8) can be written as:

Σ_{i=1}^{n} ri gk(xi) = 0 ;   k = 1, 2, . . . , m       (7.10)

or

{r}ᵀ [G] = [0, 0, . . . , 0]                            (7.11)

or

[G]ᵀ {r} = {0}                                          (7.12)

Substituting for {r} from (7.3) into (7.12):

[G]ᵀ ([G]{c} − {f }) = {0}                              (7.13)

or

[G]ᵀ [G]{c} = [G]ᵀ {f }                                 (7.14)
Using (7.14), the unknowns {c} can be calculated. Once {c} are known, the
desired least squares fit is given by g(x) in (7.1).
g(x) = c1 + c2 x3
g1(x) = 1 ,   g2(x) = x³

{f }ᵀ = [2.4   3.4   13.8   39.5]

[G]ᵀ[G] = [ 4.0   36.0 ;  36.0   794.0 ]

[G]ᵀ{f } = { 59.1 ;  1180.0 }

∴ [G]ᵀ[G] {c1 ; c2} = [G]ᵀ{f }

or

[ 4.0   36.0 ;  36.0   794.0 ] {c1 ; c2} = { 59.1 ;  1180.0 }
∴ c1 = 2.35883
c2 = 1.37957
Hence,
g(x) = 2.35883 + 1.37957x3
is the least squares fit to the data in the table.
For this least squares fit we have

R = Σ_{i=1}^{4} ri² = 0.291415
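Example 7.1 can be reproduced with a short computation; the sketch below (Python with NumPy) assembles the normal equations (7.14) directly. The xi values 0, 1, 2, 3 are inferred from the printed [G]ᵀ[G] and [G]ᵀ{f}, since the data table itself is not legible in this excerpt.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])        # inferred data locations
f = np.array([2.4, 3.4, 13.8, 39.5])

G = np.column_stack([np.ones_like(x), x**3])   # g1(x) = 1, g2(x) = x^3
c = np.linalg.solve(G.T @ G, G.T @ f)          # normal equations, eq. (7.14)
r = G @ c - f                                  # residuals, eq. (7.3)
print(c, np.sum(r**2))                         # about [2.35883, 1.37957], R about 0.2914
```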
[Figure 7.2: data points (xi, fi) and the curve fit g(x) versus x.]
Figure 7.2 shows plots of data points (xi , fi ) and g(x) versus x. In this case
g(x) provides good approximation to the data (xi , fi ).
i 1 2 3
xi 1 2 3
fi 4.5 9.5 19.5
Here we demonstrate the use of weight factors (considered unity in this case).
Determine the constants c1 and c2 for g(x) given below to be a least squares
fit to the data in the table. Use weight factors of 1.0 for each data point.
g(x) = c1 + c2 x2
g1(x) = 1 ,   g2(x) = x²

[G] = [ g1(x1)  g2(x1) ;  g1(x2)  g2(x2) ;  g1(x3)  g2(x3) ] = [ 1  1 ;  1  4 ;  1  9 ]

[W ] = diag(1, 1, 1) ;   {f } = { 4.5 ;  9.5 ;  19.5 }

∴ [G]ᵀ[W ][G] {c1 ; c2} = [G]ᵀ[W ]{f }

where

[G]ᵀ[W ][G] = [ 3.0   14.0 ;  14.0   98.0 ]

[G]ᵀ[W ]{f } = { 33.5 ;  218.0 }

[ 3.0   14.0 ;  14.0   98.0 ] {c1 ; c2} = { 33.5 ;  218.0 }

∴ c1 = 2.35714
  c2 = 1.88775

Hence,

g(x) = 2.35714 + 1.88775x²

is a least squares fit to the data in the table with weight factors of 1.0
assigned to each data point. We note that the least squares fit with or without
the weight factors will yield the same results due to the fact that the weight
factors are unity in this example.
In this case we have

R = Σ_{i=1}^{3} ri² = 0.255102
[Figure 7.3: data points (xi, fi) and the curve fit g(x) versus x.]
Figure 7.3 shows plots of data points (xi , fi ) and g(x) versus x. We note
that g(x) is a good fit to the data (xi , fi ).
[W ] = diag(1, 2, 1) ;   {f } = { 4.5 ;  9.5 ;  19.5 }

[G]ᵀ[W ][G] = [ 4.0   18.0 ;  18.0   114.0 ]

[G]ᵀ[W ]{f } = { 43.0 ;  256.0 }

[ 4.0   18.0 ;  18.0   114.0 ] {c1 ; c2} = { 43.0 ;  256.0 }

∴ c1 = 2.227273
  c2 = 1.89394

∴ g(x) = 2.227273 + 1.89394x²

This is a least squares fit to the data in the table with weight factors of 1,
2, and 1 for the three data points. Due to w2 = 2, c1 and c2 have changed
slightly compared to Example 7.2. In this case we have

R = Σ_{i=1}^{3} wi ri² = 0.378788
[Figure 7.4: data points (xi, fi) and the curve fit g(x) versus x.]
Figure 7.4 shows plots of (xi, fi) and g(x) versus x. The weight factor of 2 for
data point two does not appreciably alter g(x).
i 1 2 3 4
xi 1 2 2 3
fi 4.5 9.5 9.5 19.5
g(x) = c1 + c2 x²

g1(x) = 1 ,   g2(x) = x²

[G] = [ 1  1 ;  1  4 ;  1  4 ;  1  9 ]

[W ] = diag(1, 1, 1, 1) ;   {f } = { 4.5 ;  9.5 ;  9.5 ;  19.5 }

[G]ᵀ[W ][G] = [ 4.0   18.0 ;  18.0   114.0 ]

[G]ᵀ[W ]{f } = { 43.0 ;  256.0 }

[ 4.0   18.0 ;  18.0   114.0 ] {c1 ; c2} = { 43.0 ;  256.0 }

∴ c1 = 2.227273
  c2 = 1.89394       exactly the same as in Example 7.3
Thus, assigning an integer weight factor k to a data point is the same as repeating that data point k times
with a weight factor of one. In this case we have

R = Σ_{i=1}^{4} ri² = 0.378788
[Figure: data points (xi, fi) and the curve fit g(x) versus x for Example 7.4.]
then due to the specific form of g(x) in (7.23), the minimization of (7.24)
will result in a system of nonlinear algebraic equations in c and k. This can
be avoided by considering the following: consider (7.23) and take the log of
both sides.
log(g(x)) = log c + k log x = c1 g1 (x) + c2 g2 (x) (7.25)
where c1 = log c
c2 = k
(7.26)
and g1 (x) = 1
g2 (x) = log x
Now we can use (7.25) and apply linear least squares fit.
(R̃)minimize = ( Σ_{i=1}^{n} (log(g(xi)) − log(fi))² )minimize              (7.27)
We note that in (7.27), we are minimizing the sum of squares of the residuals
of logs of g(xi ) and fi . However, if we still insist on using (7.27), then some
adjustments or corrections must be made so that (7.27) indeed would result
in what we want.
Let ∆fi be the error in fi; then the corresponding error in log(g(xi)),
i.e., ∆(log(g(xi))), can be approximated:

∆(log(g(xi))) = d(log(g(xi))) ≃ d(log(fi)) = (d(log(fi))/dfi) dfi = (1/fi) dfi = ∆fi/fi           (7.28)

∴ ∆fi = fi ∆(log(g(xi)))                              (7.29)

∴ (∆fi)² = (fi)² (∆(log(g(xi))))²                     (7.30)
From (7.30), we note that minimization of the square of the error in fi requires
minimization of the square of the error in log(g(xi)) multiplied by fi², i.e., fi²
behaves like a weight factor. Thus, instead of considering the minimization (7.27),
we consider:

(R)minimize = ( Σ_{i=1}^{n} wi (log(g(xi)) − log(fi))² )minimize            (7.31)
with [W ] = diag(f1², f2², . . . , fn²) and

{f̂} = {log(f1), log(f2), . . . , log(fn)}ᵀ            (7.34)

The procedure described above involves the approximation (7.28), but it helps
in avoiding the nonlinear algebraic equations that would otherwise result from the least squares fit.
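The weighted log-transform fit is a direct application of (7.31) with linear least squares; a minimal sketch follows (Python with NumPy), using the data of Example 7.5 and the fit g(x) = c·x^k.

```python
import numpy as np

# Example 7.5 data; fit g(x) = c * x**k via the log transform of Section 7.4
x = np.array([1.0, 2.0, 3.0])
f = np.array([1.2, 3.63772, 6.959455])

G = np.column_stack([np.ones_like(x), np.log10(x)])   # log10 g = c1 + c2*log10 x
W = np.diag(f**2)                                      # weight factors f_i^2, eq. (7.31)
fhat = np.log10(f)
c1, c2 = np.linalg.solve(G.T @ W @ G, G.T @ W @ fhat)  # weighted normal equations
c, k = 10.0**c1, c2
print(c, k)        # about 1.2 and 1.6, i.e. g(x) = 1.2 x^1.6
```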
i 1 2 3
xi 1 2 3
fi 1.2 3.63772 6.959455
Let g(x) = cxk be a least squares fit to the data in the table. Determine c
and k using non-linear least squares fit procedure described in section 7.4.
Take log10 of both sides of g(x) = cxk .
Here g1(x) = 1 and g2(x) = log₁₀ x.

[W ] = diag(f1², f2², f3²) = diag(1.44, 13.233, 48.434)

{f̂} = { log₁₀ 1.2 ;  log₁₀ 3.63772 ;  log₁₀ 6.959455 }

∴ [G]ᵀ[W ][G] = [ 63.1070   27.0924 ;  27.0924   12.2249 ]

[G]ᵀ[W ]{f̂} = { 48.3448 ;  21.7051 }

∴ [G]ᵀ[W ][G] {c1 ; c2} = [G]ᵀ[W ]{f̂}  gives

[ 63.1070   27.0924 ;  27.0924   12.2249 ] {c1 ; c2} = { 48.3448 ;  21.7051 }

c1 = 0.0791822 = log₁₀ c   ∴ c = 1.2
c2 = 1.6 = k
Hence,
g(x) = 1.2x1.6
is the least squares fit to the data. For this least squares fit we have
R = Σ_{i=1}^{3} ri² = Σ_{i=1}^{3} (fi − g(xi))² = 1.16858 × 10⁻¹¹
[Figure 7.6: data points (xi, fi) and the curve fit g(x) versus x.]
Figure 7.6 shows plots of (xi , fi ) and g(x) versus x. The fit by g(x) is
almost exact. This is also obvious from such low value of R.
[Figure 7.7: data points (xi, fi) and the curve fit g(x) versus x.]
Figure 7.7 shows plots of (xi , fi ) and g(x) versus x. g(x) is almost exact fit
to the data. This is also obvious from very low R.
g1(x) = 1 ,   g2(x) = x

[G] = [ g1(x1)  g2(x1) ;  g1(x2)  g2(x2) ;  g1(x3)  g2(x3) ] = [ 1  0 ;  1  1 ;  1  2 ]

[W ] = diag(f1², f2², f3²) = diag(1.44, 7.1324, 35.327)

{f̂} = { ln(1.2) ;  ln(2.67065) ;  ln(5.94364) }

Therefore we have

[G]ᵀ[W ][G] {c1 ; c2} = [G]ᵀ[W ]{f̂}

where

[G]ᵀ[W ][G] = [ 43.8992   77.7861 ;  77.7861   148.440 ]

[G]ᵀ[W ]{f̂} = { 70.2327 ;  132.93 }

[ 43.8992   77.7861 ;  77.7861   148.440 ] {c1 ; c2} = { 70.2327 ;  132.93 }

∴ c1 = 0.1823226 = ln(c)  ⟹  c = 1.2
  c2 = 0.8 = k

Hence

g(x) = 1.2e^{0.8x}

is the least squares fit to the data given in the table. For this least squares fit we have

R = Σ_{i=1}^{3} ri² = Σ_{i=1}^{3} (fi − g(xi))² = 9.59730 × 10⁻¹³
6
Given Data
5.5 Curve fit g(x)
4.5
Data fi or g(x)
3.5
2.5
1.5
1
0 0.5 1 1.5 2
x
Figure 7.8 shows graphs of (xi , fi ) and g(x) versus x. In this case also g(x)
is almost exact fit to the data
∂g(xi , c1 , c2 , . . . , cm )k ∂g(xi , c1 , c2 , . . . , cm )k
(∆c1 )k + (∆c2 )k + · · · +
∂c1 ∂c2
g(xi , c1 , c2 , . . . , cm )k − fi = ri ; i = 1, 2, . . . , n (7.39)
in which
∂g(x
1 ,c1 ,c2 ,...,cm )k
∂g(x1 ,c1 ,c2 ,...,cm )k ∂g(x1 ,c1 ,c2 ,...,cm )k
∂c1 ∂c ... ∂cm
∂g(x2 ,c1 ,c2 ,...,cm )k ∂g(x2 ,c1 ,c22,...,cm )k ∂g(x2 ,c1 ,c2 ,...,cm )k
∂c1 ∂c2 ... ∂cm
[G]k =
.. .. ..
(7.41)
. . .
∂g(xn ,c1 ,c2 ,...,cm )k ∂g(xn ,c1 ,c2 ,...,cm )k ∂g(xn ,c1 ,c2 ,...,cm )k
∂c1 ∂c2 ... ∂cm n×m
n
!
X
(Rk )minimization = (ri )2k (7.44)
i=1 minimization
We note that (7.40) is similar to (7.3) when {d}k in (7.40) takes the place
of {f } in (7.3), hence the least squares fit becomes
[G]Tk [G]k {∆c}k = [G]Tk {d}k (7.45)
We solve for {∆c}k using (7.45). Improved values of {c} i.e. {c}k+1 are
given by
{c}k+1 = {c}k + {∆c}k (7.46)
Convergence check for the iterative process (7.45) and (7.46) is given by
(ci )k+1 − (ci )k
100 ≤ ∆ ; i = 1, 2, . . . , m (7.47)
(ci )k+1
or simply |(ci )k+1 − (ci )k | ≤ ∆1 .
We note that the method requires initial or starting values of ci ; i = 1, 2, . . . , m
i.e. {c}k so that coefficients of [G]k and g(x, c1 , c2 , . . . , cm )k in {d}k in (7.5)
can be calculated.
When the convergence criteria in (7.47) is satisfied we have the solution
{c}k+1 for ci ; i = 1, 2, . . . , m in g(x, c1 , c2 , . . . , cm ) otherwise we increment k
by one and repeat (7.45) and (7.46) till (7.47) is satisfied.
7.5.1.1 Using general non-linear least squares fit for linear least
squares fit
In case of linear least squares fit we have
m
X
g(x, c1 , c2 , . . . , cm ) = ci gi (x) (7.50)
i=1
7.5. GENERAL FORMULATION FOR NON-LINEAR LEAST SQUARES FIT (GNLSF) 331
Hence
∂g
= gk (x) (7.51)
∂ck
Hence, [G]k in (7.45) reduces to [G] ((7.4)) in linear least squares fit and we
have (omitting subscript k)
[G]T [G]{∆c} = [G]T {d} (7.52)
in which di = fi − g(xi ); i = 1, 2, . . . , n.
(1) With the initial choice of ci = 0; i = 1, 2, . . . , m, {d} becomes {f }, hence
(7.52) reduces to
[G]T [G]{∆c} = {f } (7.53)
we note that (7.53) is same as (7.4) in linear least squares fit. Clearly
{∆c} = {c}.
(2) With any non-zero choices of {c}, the iterative process converges in two
iterations as [G] is not a function of {c}.
Example 7.8. We consider the same problem as example 7.5 but apply the
general formulation for non-linear least squares fit presented in section 7.5.
g(x, c, k) = cxk
∂g ∂g
= xk , = ckxk−1
∂c ∂k
we consider matrix [G] and vector {d} using (7.41) and (7.43)
∂g(x1 ,ck ) ∂g(x1 ,ck )
∂g(x∂c2 ,ck ) ∂g(x∂k2 ,ck )
∂c ∂k
[G] = .. ..
. .
∂g(xn ,ck ) ∂g(xn ,ck )
∂c ∂k
43.8243 32.2682
[G]T1 [G]1 =
32.2682 25.9323
{[G]T1 {d}1 }T = [−3.65106 × 10−6 − 2.10685 × 10−6 ]
{c}T2 = [1.1999998 1.6000003]
[G]T1 [G]1 {∆c}1 = [G]T1 {d}1
gives
{∆c}T1 = [−2.08034 × 10−7 2.6759 × 10−7 ]
are
{c}T2 = {c}T1 + {∆c}T1 = [1.1999998 1.600003] = [c, k]
{c}2 is converged solution based on tolerance ∆1 = 10−6 .
Since {c}1 (initial values of c and k) are the correct values, the non-linear
iterations solution procedure converges in only one iteration and we have
g(x) = 1.2x1.6 with R = 1.38914 × 10−12
the desired least squares fit.
7
Given Data
Curve fit g(x)
6
5
Data fi or g(x)
1
1 1.5 2 2.5 3
x
Example 7.9. Here we consider the same problem as example 7.7 but apply
the general formulation for non-linear least squares fit.
g(x, c, k) = cekx
∂g ∂g
= ekx , = ckekx
∂c ∂k
Matrix [G] and vector {d} are constructed using using (7.41) and (7.43)
∂g(x1 ,ck ) ∂g(x1 ,ck )
∂g(x∂c2 ,ck ) ∂k
∂g(x2 ,ck )
∂c ∂k
[G] = .. ..
. .
∂g(xn ,ck ) ∂g(xn ,ck )
∂c ∂k
33.1457 70.5365
[G]T2 [G]2 =
70.5365 160.978
{[G]T2 {d}2 }T = [−1.41707 − 3.26531]
[G]T2 [G]2 {∆c}2 = [G]T2 {d}2
gives
{∆c}T2 = [6.1246 × 10−3 − 2.2968 × 10−2 ]
Hence,
{c}T3 = {c}T2 + {∆c}T2 = [1.19964123 0.800565183]
with R = 2.50561 × 10−5
For k = 3 (iteration three)
1.0 0.0
[G]3 = 2.22680 2.67136
4.95863 11.8972
T 30.5467 64.9423
[G]3 [G]3 =
64.9423 148.679
{[G]T3 {d}3 }T = [−2.57275 × 10−2 − 6.06922 × 10−2 ]
[G]T3 [G]3 {∆c}3 = [G]T3 {d}3
gives
{∆c}T3 = [3.5895 × 10−4 − 5.6500 × 10−4 ]
Hence,
{c}T4 = {c}T3 + {∆c}T3 = [1.20000017 0.80000019]
and R = 3.53244 × 10−12
For k = 4 (iteration four)
1.0 0.0
[G]4 = 2.22554 2.67065
4.95303 11.8873
T 30.4856 64.8218
[G]4 [G]4 =
64.8218 148.44
{[G]T4 {d}4 }T = [−9.38711 × 10−6 − 2.22698 × 10−5 ]
[G]T4 [G]4 {∆c}4 = [G]T4 {d}4
gives
{∆c}T4 = [1.5505 × 10−7 − 2.1773 × 10−7 ]
7.5. GENERAL FORMULATION FOR NON-LINEAR LEAST SQUARES FIT (GNLSF) 335
Hence,
{c}T5 = {c}T4 + {∆c}T4 = [1.200000290 0.799999952]
with R = 0.313247 × 10−13
Absolute value of each components of {c}4 − {c}3 is less than or equal to
∆1 = 10−6 , hence
{c}T2 = [c, k] = [1.2 0.8]
Thus we have
g(x) = 1.2e0.8x
is the desired least squares fit. This is same as in example 7.7.
6
Given Data
5.5 Curve fit g(x)
4.5
Data fi or g(x)
3.5
2.5
1.5
1
0 0.5 1 1.5 2
x
Remarks.
(1) Examples 7.1 - 7.4, linear least squares fit have also been solved using the
general non-linear least squares fit (section 7.5), the results are identical
to those obtained in examples 7.1 - 7.4, hence are not repeated here.
(2) In examples 7.1 - 7.4 when using formulations of section 7.5 the initial
336 CURVE FITTING
f (t) = f (t + T ) (7.54)
in which T is called period. T is the smallest value of time for which (7.54)
holds i.e. f (··) repeats after every value of time as a multiple of T .
In least squares fit we can generally use functions in time of the forms
repeats and φ is called phase shift that defines how the function is shifted
horizontally. Negative φ implies lag whereas positive φ results in lead.
An alternative to (7.55) is to use
or
g(t) = c̃1 + c̃2 (cos ωt cos φ − sin ωt sin φ)
(7.58)
= c̃1 + (c̃2 cos φ) cos ωt + (−c̃2 sin φ) sin ωt
Let
c1 = c̃1 , c2 = c̃2 cos φ , c3 = −c̃2 sin φ (7.59)
Then, we can write (7.57) as
or
[G]{c} − {f } = {r} (7.70)
In weighted least squares curve fit we consider
Xn
(R)minimizing =( wi ri2 )minimizing (7.71)
i=1
Hence
n
1 X
c1 = ( fi )
n
i=1
n
2 X
c2 = ( fi cos ωti ) (7.76)
n
i=1
n
2 X
c3 = ( fi sin ωti )
n
i=1
Example 7.10. In this example we consider least squares fit using sinusoidal
functions.
Consider
f (t) = 1.5 + 0.5 cos 4t + 0.25 sin 4t
for t ∈ [0, 1.5]. We generate ti and f (ti ) or fi ; i = 1, 2, . . . , 16 in equal incre-
ment ∆t = 0.1. (ti , fi ); i = 1, 2, . . . , 16 are given in the following (n = 16).
340 CURVE FITTING
i ti fi
1 0.00000E+00 0.20000E+01
2 0.10000E+00 0.20579E+01
3 0.20000E+00 0.20277E+01
4 0.30000E+00 0.19142E+01
5 0.40000E+00 0.17353E+01
6 0.50000E+00 0.15193E+01
7 0.60000E+00 0.13002E+01
8 0.70000E+00 0.11126E+01
9 0.80000E+00 0.98626E+00
10 0.90000E+00 0.94099E+00
11 0.10000E+01 0.98398E+01
12 0.11000E+01 0.11084E+01
13 0.12000E+01 0.12947E+01
14 0.13000E+01 0.15134E+01
15 0.14000E+01 0.17300E+01
16 0.15000E+01 0.19102E+01
3
X
g(t) = c1 + c2 cos 4t + c3 sin 4t = ck gk (t)
k=1
in which
g1 (t) = 1 , g2 (t) = cos 4t and g3 (t) = sin 4t
or
1 cos 4t1 sin 4t1
1 cos 4t2 sin 4t2
[G] = .
.. ..
.. . .
1 cos 4tn sin 4tn
7.6. LEAST SQUARES FIT USING SINUSOIDAL FUNCTIONS (LSFSF) 341
0.100000E+01 0.100000E+01 0.000000E+00
0.100000E+01 0.921061E+00 0.389418E+00
0.100000E+01 0.696707E+00 0.717356E+00
0.100000E+01 0.362358E+00 0.932039E+00
0.100000E+01 -0.291995E-01 0.999574E+00
0.100000E+01 -0.416147E+00 0.909297E+00
0.100000E+01 -0.737394E+00 0.675463E+00
0.100000E+01 -0.942222E+00 0.334988E+00
[G] =
0.100000E+01 -0.998295E+00 -0.583742E-01
0.100000E+01 -0.896758E+00 -0.442520E+00
0.100000E+01 -0.653644E+00 -0.756802E+00
0.100000E+01 -0.307333E+00 -0.951602E+00
0.100000E+01 0.874992E-01 -0.996165E+00
0.100000E+01 0.468516E+00 -0.883455E+00
0.100000E+01 0.775566E+00 -0.631267E+00
0.100000E+01 0.960170E+00 -0.279415E+00
0.160000E+02 0.290885E+00 -0.414649E-01
[G]T [G] = 0.290885E+00 0.814368E+01 -0.418135E-01
-0.414649E-01 -0.418135E-01 0.785632E+01
2.2
Given Data
Curve fit g(x)
2
1.8
Data fi or g(x)
1.6
1.4
1.2
0.8
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6
x
well as non-linear least squares fit, both weighted as well as without weight
factors.
344 CURVE FITTING
Problems
7.1 Consider the following data
i 1 2 3 4
xi −2 −1 0 1
fi = f (xi ) 6 4 3 3
Consider g(x) = c1 +c2 x to be least squares fit to this data. Find coefficients
c1 and c2 . Plot graphs of data points and g(x) versus x as well as tabulate.
i 1 2 3
xi 0 π/4 π/2
fi = f (xi ) 0 1 0
g(x) = c1 + c2 ex
i 1 2 3
xi 0 1 2
fi = f (xi ) 1 2 2
i 1 2 3
xi 0 1 2
fi = f (xi ) 2 6.40 22.046
i 1 2 3
xi 1 2 4
fi = f (xi ) 1.083 3.394 9.6
7.6. LEAST SQUARES FIT USING SINUSOIDAL FUNCTIONS (LSFSF) 345
Obtain constants a and b in g(x) = axb so that g(x) is a least squares fit to
the data in the table. Use formulations in section 7.4 and 7.5. Compare and
discuss the results obtained from the two formulations.
i 1 2 3
xi 0 1 2
fi = f (xi ) 10 3 1
Using a form of the type g(x) = k1 e−k2 x . Calculate k1 and k2 using the
formulations in section 7.4 as well as 7.5. Compare and discuss the values
of k1 and k2 obtained using the two formulations.
i 1 2 3
xi 0 1 2
fi = f (xi ) 10 12 18
i 1 2 3
xi 0.1 0.2 0.3
fi = f (xi ) 12.161 13.457 20.332
8.1 Introduction
In many situations, given the discrete data set (xi , fi ); i = 1, 2, . . . , n, we
are faced with the problem of determining the derivative of a function f with
respect to x. The discrete data set (xi , fi ); i = 1, 2, . . . , n may be from an
experiment in which we have only determined values fi at discrete locations
xi . In such a situation the value of a function f ∀x 6= xi ; i = 1, 2, . . . , n
is not known. Secondly, we only have discrete data points. A function f (x)
describing this data set is not known yet.
For the data set (xi , fi ); i = 1, 2, . . . , n we wish to determine approximate
value of the derivative of f with respect to x. We consider the following two
approaches in this chapter.
dk f
8.1.1 Determination of Approximate Value of dxk
; k = 1, 2, . . . .
using Interpolation Theory
In this approach we consider the data set (xi , fi ); i = 1, 2, . . . , n and
establish the interpolating polynomial f (x) using (see Chapter 5):
347
348 NUMERICAL DIFFERENTIATION
(xi , fi ) ; i = 1, 2, . . . , n (8.1)
x1 x2 x3 xi−1 xi xi+1 xn
xi+1 = xi + h ; i = 1, 2, . . . , n − 1 (8.2)
The scalar h is the spacing between the two successive data points.
k
If we pose the problem of determining ddxfk ; k = 1, 2, . . . at x = xi ,
k
then by letting i = 1, 2, . . . we can determine ddxfk ; k = 1, 2, . . . at x = xi ;
i = 1, 2, . . . , n. Consider x = xi and fi and two sets (for example) of data
points immediately preceding x = xi as well as immediately following x = xi
(see Figure 8.2). Since fi is the value of f at xi , we can define:
h h h h
fi = f (xi ) ; i = 1, 2, . . . , n (8.3)
dk f
Our objective is to determine dxk
at x = xi ; k = 1, 2, . . . ; i = 1, 2, . . . , n.
df
8.2.1 First Derivative of dx
at x = xi
(a) Forward difference method or first forward difference
Consider Taylor expansion of f (xi+1 ) about x = xi .
h2 h3
f (xi+1 ) = f (xi ) + f 0 (xi )h + f 00 (xi ) + f 000 (xi ) + . . . (8.4)
2! 3!
or
h2 h3
f (xi+1 ) − f (xi ) = f 0 (xi )h + f 00 xi + f 000 (xi ) (8.5)
2! 3!
or
f (xi+1 ) − f (xi ) = f 0 (xi )h + O(h2 ) (8.6)
or
f (xi+1 ) − f (xi )
= f 0 (xi ) + O(h) (8.7)
h
f (xi+1 ) − f (xi )
∴ f 0 (xi ) ' (8.8)
h
The approximate value of the derivative of f with respect to x at x = xi
given by (8.8) has truncation error of the order of h O(h). This is called
df
forward difference approximation of dx at x = xi . By letting i = 1, 2, . . .
df
in (8.8), we can obtain dx at x = xi ; i = 1, 2, . . . , n − 1.
h2 h3
f (xi−1 ) = f (xi ) − f 0 (xi )h + f 00 (xi ) − f 000 (xi ) (8.9)
2! 3!
or
h2 h3
f (xi−1 ) − f (xi ) = −f 0 (xi )h + f 00 (xi ) − f 000 (xi ) (8.10)
2! 3!
or
f (xi−1 ) − f (xi ) = −f 0 (xi )h + O(h2 ) (8.11)
or
f (xi−1 ) − f (xi )
= −f 0 (xi ) + O(h) (8.12)
h
f (xi ) − f (xi−1 )
∴ f 0 (xi ) ' (8.13)
h
The approximate value of the derivative of f with respect to x at x = xi
given by (8.13) has truncation error of the order of h O(h). This is
350 NUMERICAL DIFFERENTIATION
df
called backward difference approximation of dx at x = xi . By letting
df
i = 1, 2, . . . in (8.13), we can obtain dx at x = xi ; i = 2, 3, . . . ,n.
(c) First central difference method
Consider Taylor series expansion (8.5) and (8.10).
h2 h3
f (xi+1 ) − f (xi ) = f 0 (xi )h + f 00 (xi ) + f 000 (xi ) + . . . (8.14)
2! 3!
h 2 h3
f (xi−1 ) − f (xi ) = −f 0 (xi )h + f 00 (xi ) − f 000 (xi ) + . . . (8.15)
2! 3!
Subtracting (8.15) from (8.14):
h3
f (xi+1 ) − f (xi−1 ) = 2hf 0 (xi ) + 2f 000 (xi ) (8.16)
3!
or
f (xi+1 ) − f (xi−1 ) = 2hf 0 (xi ) + O(h3 ) (8.17)
or
f (xi+1 ) − f (xi−1 )
= f 0 (xi ) + O(h2 ) (8.18)
2h
f (xi+1 ) − f (xi−1 )
∴ f 0 (xi ) ' (8.19)
2h
df
The approximate value of dx at x = xi given by (8.19) has truncation
2
error of the order of O(h ). This is known as central difference approx-
df
imation of dx at x = xi for u = 2, 3, . . . , n − 1.
Remarks.
(1) Forward difference and backward difference approximation have the same
order of truncation error O(h), hence we expect similar accuracy in ei-
ther of these two approaches.
(2) The central difference method has truncation error of the order of O(h2 ),
hence this method is superior to forward or backward difference method
and will yield better accuracy. Thus, this is higher order approximation
by one order than (a) and (b).
d2 f
8.2.2 Second Derivative dx2
at x = xi : Central Difference
Method
Consider Taylor series expansions (8.5) and (8.10).
h2 h3
f (xi+1 ) − f (xi ) = f 0 (xi ) + f 00 (xi ) + f 000 (xi ) + . . . (8.20)
2! 3!
h 2 h3
f (xi−1 ) − f (xi ) = −f 0 (xi ) + f 00 (x)i) − f 000 (xi ) + . . . (8.21)
2! 3!
8.2. NUMERICAL DIFFERENTIATION USING TAYLOR SERIES EXPANSIONS 351
or
f (xi+1 ) − 2f (xi ) + f (xi−1 )
= f 00 (xi ) + O(h2 ) (8.23)
h2
f (xi+1 ) − 2f (xi ) + f (xi−1 )
∴ f 00 (xi ) ' (8.24)
h2
2
The approximation of ddxf2 at x = xi ; i = 2, 3, . . . , n − 1 given by (8.24) has
truncation error of the order of O(h2 ).
d3 f
8.2.3 Third Derivative dx3
at x = xi
Recall (8.5) and (8.10) based on Taylor series expansions of f (xi+1 ) and
f (xi−1 ) about x = xi .
h2 h3
f (xi+1 ) − f (xi ) = f 0 (xi )h + f 00 (xi ) + f 000 (xi ) + . . . (8.25)
2! 3!
h 2 h3
f (xi−1 ) − f (xi ) = −f 0 (xi )h + f 00 (xi ) − f 000 (xi ) + . . . (8.26)
2! 3!
Also consider Taylor series expansions of f (xi+2 ) and f (xi−2 ) about x = xi .
(2h)2 (2h)3
f (xi+2 ) = f (xi ) + f 0 (xi )(2h) + f 00 (xi ) + f 000 (xi ) + ... (8.27)
2! 3!
(2h)2 (2h)3
f (xi−2 ) = f (xi ) − f 0 (xi )(2h) + f 00 (xi ) − f 000 (xi ) + ... (8.28)
2! 3!
Subtracting (8.26) from (8.25):
1
f (xi+1 ) − f (xi−1 ) = 2f 0 (xi )h + f 000 (xi )h3 + O(h5 ) (8.29)
3
Subtracting (8.28) from (8.27):
8
f (xi+2 ) − f (xi−2 ) = 4f 0 (xi )h + f 000 (xi )h3 + O(h5 ) (8.30)
3
Multiply (8.29) by 2 and subtract it from (8.30).
3
The approximation of ddxf3 at x = xi ; i = 3, 4, . . . , n − 2 using (8.33) has
truncation error of O(h2 ). Since in this derivation we have considered two
data points immediately before and after x = xi , (8.33) can be labeled as
3
central difference approximation of ddxf3 at x = xi ; 3, 4, . . . , n − 2.
Remarks.
f (xi+1 ) − f (xi )
f 0 (xi ) =
h
f (xi+2 ) − 2f (xi+1 ) + f (xi )
f 00 (xi ) =
h2
f (xi+3 ) − 3f (xi+2 ) + 3f (xi+1 ) − f (xi )
f 000 (xi ) =
h3
f (xi+4 ) − 4f (x i+3 ) + 6f (xi+2 ) − 4f (xi+1 ) + f (xi )
f iv (xi ) =
h4
(8.34)
f (xi ) − f (xi−1 )
f 0 (xi ) =
h
f (xi ) − 2f (xi−1 ) + f (xi−2 )
f 00 (xi ) =
h2
f (xi ) − 3f (xi−1 ) + 3f (xi−2 ) − f (xi−3 )
f 000 (xi ) =
h3
f (xi ) − 4f (xi−1 ) + 6f (xi−2 ) − 4f (xi−3 ) + f (xi−4 )
f iv (xi ) =
h4
(8.35)
(2) Approximating derivatives using Taylor series expansion works well and
is easier to use when the data points are equally spaced or have uniform
spacing.
(3) The various differencing expressions are often called finite difference ap-
proximations of the derivatives of the function f defined by the discrete
8.2. NUMERICAL DIFFERENTIATION USING TAYLOR SERIES EXPANSIONS 353
x1 = 0 x2 = 1 x3 = 2 x4 = 3
f1 = 0 f2 = 4 f3 = 0 f4 = −2
df
Thus we can determine dx using central difference at x = 1 and x = 2.
df f3 − f1 0−0
= = =0
dx x=1 2(1) 2(1)
df f1 − f2 −2 − (4)
= = = −3
dx x=2 2(1) 2
354 NUMERICAL DIFFERENTIATION
df f2 − f1 4−0
= = =4
dx x=0 (1) 1
df f4 − f3 −2 − 0
= = = −2
dx x=3 (1) 1
i 1 2 3 4
xi 0 1 2 3
df
dx x=x 4 0 -3 -2
i
i 1 2 3 4
xi 0 1 2 3
df 34
dx x 3 − 53 − 14
3
11
3
i
df
We note that dx values in the two tables are quite different. This is gen-
erally the case when only few data points are available and the spacing
between them is relatively large as is the case in this example.
Problems
8.1 Consider the following table of xi , f (xi ).
i 1 2 3 4 5 6 7
xi 0 1/16 1/8 3/16 1/4 3/8 1/2
df
Compute dx at x = 1/8 and x = 1/4 using forward difference and backward
difference approximation with truncation error of the order O(h) (h = 1/16 in
2
this case). Also compute ddxf2 at x = 1/8 and x = 1/4 using central difference
approximation with truncation error of the order O(h2 ).
Using f (x) = sin(πx) as the actual function describing the data in the ta-
ble, calculate percentage error in the estimates of the first and the second
derivatives.
i 1 2 3 4 5
xi 0 5 10 15 20
fi = f (xi ) 0 1.60944 2.30259 2.70805 2.99573
df
Compute dx at x =5, 10 and 15 using forward difference and backward dif-
ference approximation with truncation error of the order O(h) (h = 5 in
2
this case). Also compute ddxf2 at x = 5, 10 and 15 using central difference
approximation with truncation error of the order O(h2 ).
Using f (x) = ln(x) as the actual function describing the data in the ta-
ble, calculate percentage error in the estimates of the first and the second
derivatives.
i 1 2 3 4 5
xi 1 2 3 4 5
fi = f (xi ) 2.71828 7.3896 20.08554 54.59815 148.41316
df
Compute dx at x = 2, 3 and 4 using forward difference and backward differ-
ence approximation with truncation error of the order O(h) (h = 1 in this
2
case). Also compute ddxf2 at x = 2, 3 and 4 using central difference approxi-
mation with truncation error of the order O(h2 ).
Using f (x) = ex as the actual function describing the data in the table, calcu-
late percentage error in the estimates of the first and the second derivatives.
356 NUMERICAL DIFFERENTIATION
i 1 2 3 4 5
xi 1 2 3 4 5
fi = f (xi ) 0.36788 0.13536 0.04979 0.01832 0.006738
df
Compute dx at x = 2, 3 and 4 using forward difference and backward differ-
ence approximation with truncation error of the order O(h) (h = 1 in this
2
case). Also compute ddxf2 at x = 2, 3 and 4 using central difference approxi-
mation with truncation error of the order O(h2 ).
Using f (x) = e−x as the actual function describing the data in the table,
calculate percentage error in the estimates of the first and the second deriva-
tives.
i 1 2 3 4 5
xi 0 1 2 3 4
fi = f (xi ) 0 0.2 1.6 5.4 32
df
Compute dx at x = 2, 3 and 4 using forward difference and backward differ-
ence approximation with truncation error of the order O(h) (h = 1 in this
2
case). Also compute ddxf2 at x = 2, 3 and 4 using central difference approxi-
mation with truncation error of the order O(h2 ).
Using f (x) = 0.2x3 as the actual function describing the data in the ta-
ble, calculate percentage error in the estimates of the first and the second
derivatives.
d2 f (x)
and dx2
estimated in problem 8.2 using finite difference approximation as
8.3. CONCLUDING REMARKS 357
well as with those calculated using f (x) = ln(x). Also calculate percentage
2 f (x)
error in dfdx
(x)
and d dx 2 values using f (x) = ln(x) as the true behavior of
data in the table.
9.1 Introduction
A boundary value problem (BVP) describes a stationary process in which
the state of the process does not change over time, hence the values of the
dependent variables remain the same or fixed for all values of time. The
mathematical description of BVP result in ordinary or partial differential
equations in dependent variables and spatial coordinates x, y, and z but
not time t. The BVPs also have boundary conditions that may consist
of, specified values of dependent variables and/or their derivatives on the
boundaries of the domain of definition of the BVP.
There are many methods currently employed for obtaining approximate
numerical solutions of the BVPs:
(e) Others
359
360 NUMERICAL SOLUTIONS OF BVPS
Remarks.
(1) We see that integration of the ODE yields its solution and the differen-
tiation of the solution gives back the ODE.
(2) At this stage even though we do not know the specific details of the
various methods mentioned above (but regardless of the details), one
thing is clear, the methods of solution of ODEs and PDEs must consider
their integration in some form or the other over their domain of definition
as this is the only mathematically justifiable approach for obtaining their
solutions.
(3) We generally represent ODEs and PDEs using differential operators and
dependent variable(s). The differential operator contains operations of
differentiation (including differentiation of order zero). When the differ-
ential operator acts on the dependent variable it produces the original
differential or partial differential equations. For example in case of (9.1),
we can write
Aφ = x2 ∀x ∈ Ω (9.5)
in which the differential operator is A = d/dx.
If
dφ 1 d2 φ
− = f (x) ∀x ∈ (a, b) = Ω (9.6)
dx P e dx2
9.2. INTEGRAL FORMS 361
Aφ = f (x) ∀x ∈ Ω
d 1 d2 (9.7)
A= −
dx P e dx2
If
dφ 1 d2 φ
φ − = f (x) ∀x ∈ (a, b) = Ω (9.8)
dx Re dx2
is the BVP, then we can write (9.8) as
Aφ = f (x) ∀x ∈ (a, b) = Ω
d 1 d2 (9.9)
A=φ −
dx Re dx2
In (9.9) the differential operator is a function of the dependent variable
φ.
If
d2 φ
+ φ = f (x) ∀x ∈ (a, b) = Ω (9.10)
dx2
is the BVP, then we can write (9.10) as
Aφ = f (x) ∀x ∈ Ω
d2 (9.11)
A= +1
dx2
∂φn
v(x) = δφn (Ci ) = = ψj (x) ; j = 1, 2, . . . , n (9.15)
∂Ci
Z Z
(Aφn )v dx = f v dx (9.17)
Ω̄ Ω̄
9.2. INTEGRAL FORMS 363
or
n
Z
P
Ci Aψi (x) ψj (x) dx =
i=1
Ω̄
Z Z
f ψj (x) dx − Aψ0 (x)ψj (x) dx ; j = 1, 2, . . . , n (9.20)
Ω̄ Ω̄
[K]{C} = {F } (9.21)
Using (9.21), we calculate {C}. Then, equation (9.14) defines the approxi-
mation φn (x) of φ(x) over Ω̄.
Remarks.
R
(1) When v = δφn , (Aφn −f )v dx = 0 is called the Galerkin method (GM).
(2) When v(x) = w(x) = 0 where φn is specified but w(x) 6= δφn (x), then:
Z Z
(Aφn (x) − f )v(x) dx = (Aφn (x) − f )w(x) dx = 0 (9.23)
Ω̄ Ω̄
364 NUMERICAL SOLUTIONS OF BVPS
In B(φn , v) all terms contain both φn and v are included. The additional
expression l(v) is due to integration by parts and contains those terms
that
R only have
e v. It is called the concomitant. We can combine l(v) and
f v dx to obtain: e
B(φn , v) = l(v)
Z
l(v) = f v dx + l(v) (9.25)
e
Ω̄
This method is called the Galerkin method with weak form (GM/WF)
(v = δφn ) and the integral form (9.25) is called the weak form of (9.17).
The reason for transferring differentiation from φn to v in (9.17) is to
ensure that each term of the integrand of B(φn , v) contains equal orders
of differentiation
R of φn and v. We only perform integration by parts for
those terms in (Aφn )v dx that yield this. Thus, integration by parts is
Ω̄
performed on those terms that contain even order derivatives of φn . In
such terms, after integration by parts, φn and v are interchangeable in
the integrand in GM/WF in which v = δφn .
(4) We note that the integrals over Ω̄ are definite integrals, hence pro-
duce numbers after the limits are substituted. Such integralsR are called
functionals. Thus, (9.13), (9.17), B(φn , v), l(v), l(v), and Ω̄ f v dx are
all functionals. In GM, PGM, and WRM also e we can write (9.13) as
B(φn , v) = l(v) in which:
Z Z
B(φn , v) = (Aφn )v dx and l(v) = f v dx (9.26)
Ω̄ Ω̄
(5) The domain of definition Ω of the BVP is not discretized, hence GM,
PGM, WRM, and GM/WF considered here are often referred to as clas-
sical methods of approximation.
(6) These methods, as we have seen here, are rather simple and straight-
forward in principle. The major difficulty lies in the selection of ψi (x);
9.2. INTEGRAL FORMS 365
(8) We note that in GM/WF, [K] is symmetric when the differential oper-
ator A contains only even order derivatives.
or
E = [k]{c} − f ; E T = [c]{k} − f (9.28)
e e
in which
ki = Aψi ; i = 1, 2, . . . , n
(9.29)
f = −f + Aψ0
e
The residual functional I is given by:
Z Z Z
2 T
I = E dx = E E dx = [c]{k} − f [k]{c} − f dx (9.30)
e e
Ω̄ Ω̄ Ω̄
or Z
I= [c]{k}[k]{c} − f [k]{c} − f [c]{k} + f 2 dx (9.31)
e e e
Ω̄
Hence, we have:
Z Z
{k}[k]dx {c} = f dx (9.34)
e
Ω̄ Ω̄
or
[K]{C} = {F } (9.35)
in which
Z
Kij = (Aψi Aψj )dx
Ω̄
; i, j = 1, 2, . . . , n (9.36)
Fi = (f − Aψ0 )Aψi
(2) Here also we have the same problems associated with the choice of ψi (x)
as described in Section 9.2.1, hence its usefulness for practical applica-
tions is extremely limited.
(3) If we have more than one PDE, then we have a residual function for each
PDE, Ei ; i = 1, 2, . . . , m, and the residual functional is defined as:
m m
Z
(Ei )2 dx
P P
I= Ii = (9.37)
i=1 i=1
Ω̄
The details for each Ii follow what has been described for a single residual
function.
A, E
P
Ω̄
P
y
Ω̄T Ω̄e
1 2 3 x
y xe xe+1
P x
1 2 3 4 a typical element e
y
Ω̄T Ω̄e
1 2 3 x
y xe xe 1 xe+2
P x
1 2 3 4 a typical element e
t, E, ν
Ω̄ σx
Ω̄e
σx
a typical element e
x
y
Ω̄T
Ω̄e
σx
a typical element e
x
y
Ω̄T
Ω̄e
σx
a typical element e
x
in which Ω̄e = Ωe ∪Γe is the domain of an element with its closed boundary
Γe (see Figures 9.1 and 9.2). Let φh (x) be the approximation of φ over Ω̄T ,
then:
φh (x) = ∪φeh (x) (9.39)
e
in which φeh (x)is the approximation of φ(x) over an element e with domain
Ω̄e , called the local approximation of φ.
The test function v = δφh for GM, GM/WF and v(x) = w(x) 6= δφh for
in PGM, WRM. Since the definite integral in (9.40) is a functional, we can
write this as a sum of the integrals over the elements.
Z XZ
(Aφh − f )v dx = (Aφeh − f )v dx = 0 (9.41)
e
Ω̄T Ω̄e
or
B e (φeh , v) − le (v) = 0
P
(9.42)
e
Z
e
B (φeh , v) = (Aφeh )v dx
Ω̄e
Z (9.43)
e
l (v) = f v dx
Ω̄e
1 2
(p=1)
1 2 3
(a) Using two-node linear element
1 2
(p=2)
1 2 3 4 5
(b) Using three-node quadratic element
and two in Figure 9.3(a), the degrees of freedom are {δ 1 } and {δ 2 }. Using
Lagrange interpolating polynomials (Chapter 5), we can easily define local
approximations φ1h and φ2h for elements one and two. First, the elements are
mapped into ξ-space, i.e., Ω̄e → Ω̄ξ = [−1, 1] using (for an element):
1−ξ 1+ξ
x(ξ) = xe + xn (9.44)
2 2
For elements one and two of Figure 9.3(a), we have (e, n) = (1, 2) and (2, 3),
whereas for elements one and two of Figure 9.3(b) we have (e, n) = (1, 3)
and (3, 5). The mapping (9.44) is a linear stretch in both cases. The local
approximations φ1h (ξ) and φ2h (ξ) can now be established in ξ-space using
Lagrange interpolation and {δ 1 } and {δ 2 }.
In the case of Figure 9.3(a), the functions N1 (ξ) and N2 (ξ) for both φ1h (ξ)
and φ2h (ξ) are:
1−ξ 1+ξ
N1 (ξ) = ; N2 (ξ) = (9.46)
2 2
For Figure 9.3(b) we have N1 (ξ), N2 (ξ), and N3 (ξ).
ξ(ξ − 1) ξ(ξ + 1)
N1 (ξ) = ; N2 (ξ) = 1 − ξ 2 ; N3 (ξ) = (9.47)
2 2
Thus, we could write (9.45) as:
n
φeh (ξ) = Ni (ξ)δie
P
(9.48)
i=1
v = wj ; j = 1, 2, . . . , n (9.50)
in which
Z1
Z
e
Kij = Ni (ANj ) dx = Ni (ANj )J dξ ; J = he/2
e Ω̄ −1
i, j = 1, 2, . . . , n
Z1
fie =
f Ni J dξ
−1
(9.53)
Thus, for elements (1) and (2) of Figure 9.3(a) we have:
Z 1 1
1
1 K11 K12 φ1 f1
(Aφh )v dx = 1 1 − ; element one (9.54)
K21 K22 φ2 f21
Ω̄1
2 K2 f12
Z
K11 φ2
(Aφ2h )v dx = 12
2 K2 − ; element two (9.55)
K21 22 φ3 f22
Ω̄2
These are called the element equations.
For the two-element discretization Ω̄T we have:
X2 Z
(Aφeh − f )v dx = 0 (9.56)
e=1 T
Ω̄
Equations (9.54) and (9.55) must be substituted into (9.56) to obtain their
sum. Since the degrees of freedom for elements are different, the summation
372 NUMERICAL SOLUTIONS OF BVPS
process, or assembly, of the element equations requires care. From the dis-
cretization shown in Figure 9.3(a) and the dofs at the nodes or grid points,
we know that (9.56) will yield:
[K]{δ} = {F } (9.57)
Remarks.
(1) The assembly process remains the same for Figure 9.3(b), except that
in this case the element matrices and vectors are (3 × 3) and (3 × 1) and
the assemble [K] and {F } are (5 × 5) and (5 × 1).
integrand that contain even order derivatives of φeh , we transfer half of the
differentiation to v. By doing so, we can make the order of differentiation on
φeh and v in these terms the same. This results in a symmetric coefficient ma-
trix for the element corresponding to these terms. The integration by parts
results in boundary terms or boundary integrals, called the concomitant.
Thus, in this process we have:
Z Z Z
e e
(Aφh − f )v dx = (Aφh )v dx − f v dx (9.59)
Ω̄e Ω̄e Ω̄e
9.3. FINITE ELEMENT METHOD FOR BVPS 373
or Z Z
(Aφeh − f )v dx = B e
(φeh , v) e
− l (v) − f v dx (9.60)
e
Ω̄e Ω̄e
We note that (9.60) is due to (9.59) after integration by parts. This is
referred to as weak form of (9.59) due to the fact that it contains lower
order derivatives of φeh compared to the BVP. B e (φeh , v) contains only those
terms that contain both φeh and v. The concomitant le (v) only contains the
terms that resulting from integration by parts thatehave v (and not φeh ).
After substituting for φeh and v = δφeh = Nj ; j = 1, 2, . . . , n, we obtain:
Z
(Aφeh − f )v dx = [K e ]{δ e } − {P e } − {f e } (9.61)
Ω̄e
The vector {P e } is due to the concomitant and is called the vector of sec-
ondary variables. The assembly process for [K e ] and {f e } follows Section
9.3.1.1. Assembly for {P e }, giving {P }, is the same as that for {f e }.
Remarks.
(1) When the differential operator A contains only even order derivatives,
[K e ] and [K] are assured to be symmetric. This is not true in GM,
PGM, and WRM.
We shall see that NBC are naturally satisfied or absorbed while EBC
need to be specified or imposed on the assembled equations to ensure
uniqueness of the solution.
In R2 the concomitant is a contour integral over closed contour Γe of an
element e. In R3 the concomitant is a surface integral. Simplification of
concomitant in R2 and R3 requires that we split the integral over Γe into
integral over Γe1 , Γe2 on which EBC and NBC are specified. For specific
details on these see reference [49].
or
δI e = [K e ]{δ e } − {f e } (9.70)
9.3. FINITE ELEMENT METHOD FOR BVPS 375
in which Z
e
Kij = (ANi )(ANj ) dx
Ω̄e
Z i, j = 1, 2, . . . , n (9.71)
fie = f (ANi ) dx
Ω̄e
[K]{δ} = {F } (9.72)
[K] = [K e ] ; {δ} = ∪{δ e } ; {f e }
P P
{F } = (9.73)
e e e
Remarks.
(2) Surana, et al. [49] have shown that [K e ] and [K] can also be made
symmetric when A is nonlinear.
dα
+ φ = f (x) ∀x ∈ Ω (9.75)
dx
dφ
α= (9.76)
dx
in dependent variables φ and α. α is called auxiliary variable and equa-
tion (9.76) is called auxiliary equation. Using the same approach a higher
oder ODE in φ can be reduced to a system of first order ODEs.
d2 T
+ T = f (x) ∀x ∈ (0, 1) = Ω ⊂ R1 (9.77)
dx2
with boundary conditions
AT = f (x) ∀x ∈ Ω
d2 (9.79)
A= +1
dx2
Since A contains even order derivatives, GM/WF is ideally suited for de-
signing finite element process for (9.77). Let Ω̄T = ∪Ω̄e be discretization of
e
Ω̄ = [0, 1] in which Ω̄e = [xe , xe+1 ] is an element e.
Let Th be approximation of T over Ω̄T and The be approximation of T over
Ω̄e , then
Th = ∪The (9.80)
e
9.3. FINITE ELEMENT METHOD FOR BVPS 377
we consider Z
(ATh − f (x))v(x) dx = 0 ; v = δTh (9.81)
Ω̄T
or
XZ
(AThe − f )v(x) dx = 0 ; v = δThe (9.82)
e
Ω̄e
consider
d2 The
Z Z Z
(AThe − f )v(x) dx = + Th
e
v(x) dx − f v dx (9.83)
dx2
Ω̄e Ω̄e Ω̄e
In which e xe+1
dTh
< AThe , v >Γe = v(x) (9.85)
dx xe
is the concomitant resulting due to integration by parts. In this case since
we have an ODE in R1 , the concomitant consists of boundary terms. From
(9.85), we find that
• T is PV and T = T0 (given) on some boundary Γ∗1 is EBC.
Let
dThe dThe
= −P2e and = P1e (9.87)
dx xe+1 dx xe
dv dThe
Z Z
e
(ATh , v)Ωe = − + Th v dx − f v dx − v(xe )P1e − v(xe+1 )P2e
e
dx dx
Ω̄e Ω̄e
(9.89)
or
(AThe , v)Ωe = B e (The , v) − le (v) (9.90)
in which
dv dThe
Z
e
B (The , v) = − + The v dx (9.91)
dx dx
Ω̄e
Z
le (v) = f v dx + v(xe )P1e + v(xe+1 )P2e (9.92)
Ω̄e
B e (The , v) = B e (v, The ) i.e. interchanging the roles of The and v does not
change B e (·, ·), hence B e (·, ·) is symmetric. (9.90) is the weak form of the
integral from (9.83).
Consider a five element uniform discretization using two node linear element
1 1 2 2 3 3 4 4 5 5 6
x
T1 T2 T3 T4 T5 T6
Figure 9.4: A five element uniform discretization using two node elements
1 2
ξ
ξ = −1 Ω̄ξ ξ = +1
in which δ1e and δ2e are nodal degrees of freedom for nodes 1 and 2 (local
node numbers) of element e. Mapping of points is defined by
1−ξ 1+ξ
x(ξ) = xe + xe+1 (9.94)
2 2
Hence,
dx
dx = dξ = Jdξ (9.95)
dξ
Where
d 1−ξ d 1+ξ xe+1 − xe he
J= xe + xe+1 = = (9.96)
dξ 2 dξ 2 2 2
dv dThe
Z
e e e
B (Th , v) = − + Th v dx
dx dx
Ω̄e
2 2
Z ! ! !
dNj X dNi X
= − δie + Ni δie Nj dx
dx dx
i=1 i=1
Ω̄e
2 2
Z ! ! !
1 dNj X 1 dNi e X
= − δ + Ni δie Nj Jdξ
J dξ J dξ i
i=1 i=1
Ω̄e
Z2 2
!! Z+1 X
2
!
2 dNj X dNi he
= − δie dξ + Ni δie Nj dξ
he dξ dξ 2
i=1 i=1 −1 i=1
1 e e 2 e e
= [ K ]{δ } + [ K ]{δ }
(9.100)
380 NUMERICAL SOLUTIONS OF BVPS
in which
Z+1
1e 2 dNi dNj
Kij =− dξ ; i, j = 1, 2 (9.101)
he dξ dξ
−1
Z+1
2e he
Kij = Ni Nj dξ ; i, j = 1, 2 (9.102)
2
−1
Z+1
l (N1 ) = f (ξ)N1 J dξ + N1 (−1)P1e + N1 (1)P2e
e
(9.105)
−1
for j = 2
Z+1
le (N2 ) = f (ξ)N2 J dξ + N2 (−1)P1e + N2 (1)P2e (9.106)
−1
Since
N1 (−1) = 1 , N1 (1) = 0
(9.107)
N2 (−1) = 0 , N2 (1) = 1
We can write
le (v) = {F e } + {P e }
Z+1
Fie = f (ξ)Ni J dξ ; i = 1, 2 (9.108)
−1
{P } = [P1e
e T
P2e ]
in which
e −4.93333 5.03333
[K ] = ; e = 1, 2, . . . , 5 (9.118)
5.03333 −4.93333
382 NUMERICAL SOLUTIONS OF BVPS
T1 T6 T2 T3 T4 T5
−4.933 0 5.033 0 0 0 T
1
0 −4.933 0 0 0 5.033 T6
(−4.933
5.033 0 5.033 0 0 T2
− 4.933)
(−4.933
[K] = 0 0 5.033 5.033 0 T3
− 4.933)
(−4.933
0 0 0 5.033 5.033 T4
− 4.933)
(−4.933
0 5.033 0 0 5.033 T5
− 4.933)
(9.121)
0.88889 × 10−4
−1
0.819711 × 10
−4 −2
0.31111 × 10 + 0.20889 × 10
{F } = (9.122)
0.39111 × 10−2 + 0.10489 × 10−1
0.15511 × 10−1 + 0.30089 × 10−1
0.39911 × 10−1 + 0.65689 × 10−1
{δ}T = T1 T6 T2 T3 T4 T5
(9.123)
9.3. FINITE ELEMENT METHOD FOR BVPS 383
P21 + P12 = P2 = 0
P22 + P13 = P3 = 0
(9.124)
P23 + P14 = P4 = 0
P24 + P15 = P5 = 0
(2) Where primary variables are specified (or given) at a node i.e. when
the essential boundary conditions are given at a node, the sum of the
secondary variables is unknown at that node. Thus P11 = P1 and P25 =
P6 are unknown.
in which
{δ}T1 = T1 T6 = [0.0 − 0.5] ; known
{δ}T2 = T2 T3 T4 T5 ; unknown
{F }T1 = F1 F6 ; {F }T2 = F2 F3 F4 F5
(9.126)
{P }T1 = P1 P6 ; unknown
{P }T2 = P2 P3 P4 P5 ; known
and from the first set of equations in (9.127) we can solve for {P }1 .
First, we solve for {δ}2 using (9.128) and calculate {P }1 using (9.129).
in which
−4.93333 0
[K11 ] =
0 −4.93333
5.03333 0 0 0 (9.130)
[K12 ] =
0 0 0 5.03333
[K21 ] = [K12 ]T
−9.86666 5.03333 0 0
5.03333 −9.86666 5.03333 0
[K22 ] = (9.131)
0 5.03333 −9.86666 5.03333
0 0 5.03333 −9.86666
{F }T1 = 0.88889 × 10−4 0.818711 × 10−1
(9.132)
{F }T2 = 0.212 × 10−2 0.144 × 10−1 0.456 × 10−1 1.056 × 10−1
(9.133)
{[K21 ]{δ}1 }T = 0.0 0.0 0.0 −2.516667
(9.134)
{P }T2 = 0 0 0 0
(9.135)
Thus, using (9.128), we now have using (9.131), (9.133), (9.134) and (9.135)
in (9.128) we can solve for {δ}2 .
0.88889 × 10−4
(9.137)
0.819711 × 10−1
9.3. FINITE ELEMENT METHOD FOR BVPS 385
or
P1 −0.651643
{P }1 = = (9.138)
P6 0.111952
Thus, now {δ}T = [T1 , T2 , . . . , T6 ] is known, hence using local approximation
for each element we can describe T over each element domain Ω̄ξ .
1−ξ 1+ξ
T (ξ) = Te + Te+1 ; e = 1, 2, . . . , 5 (9.139)
2 2
node x T
1 0.0 0.0
2 0.2 -0.12945
3 0.46 -0.25328
4 0.6 -0.36418
5 0.8 -0.45155
6 1.0 -0.5
e
Table 9.2: dTh/dx versus x (example 9.1)
1 0.0 -0.64725
1
2 0.2 -0.64725
2 0.2 -0.6195
2
3 0.4 -0.6195
3 0.4 -0.5545
3
4 0.5 -0.5545
4 0.6 -0.43685
4
5 0.8 -0.43685
5 0.8 -0.24225
5
6 1.0 -0.24225
Table 9.2 gives dThe/dx values calculated at each of the five elements of the
discretization.
0
-0.1
Temperature Th
e
-0.2
-0.3
-0.4
-0.5
0 0.2 0.4 0.6 0.8 1
x
-0.2
-0.25
-0.3
-0.35
-0.4
d(Th)/dx
-0.45
e
-0.5
-0.55
-0.6
-0.65
-0.7
0 0.2 0.4 0.6 0.8 1
x
Figures 9.7 and 9.8 show plots of T versus x and dT/dx versus x. Since The is
of class C 0 (Ω̄e ), we observe inter element discontinuity of dThe/dx at the inter
element boundaries. Upon mesh refinement the jumps in dThe/dx diminishes
at the inter element boundaries.
(9.144)
388 NUMERICAL SOLUTIONS OF BVPS
0.88889 × 10−4
−4 + 0.20889 × 10−2
0.31111 × 10
−4 −1
0.39111 × 10 + 0.10489 × 10
{F } = −1 −1 (9.145)
0.15511 × 10 + 0.30089 × 10
0.39911 × 10 + 0.65689 × 10−1
−1
0.819711 × 10 −1
Using the rules for defining the sum of the secondary variables in example
(9.1), we have
P2 = 0 , P3 = 0 , P4 = 0 , P5 = 0 , P6 = −20 (9.146)
and P1 is unknown.
Due to EBC, we have
are unknown.
Assembled equations (9.120) with [K], {F } and {P } defined by (9.144) –
(9.147) can be written in partitioned form
[K11 ] [K12 ] {δ}1 {F }1 {P }1
= + (9.148)
[K21 ] [K22 ] {δ}2 {F }2 {P }2
in which
{δ}T1 = {0.0} ; known
{δ}T2 = T2 T3 T4 T5 T6 ; unknown
First, we solve for {δ}2 using (9.151) and then calculate {P }1 using (9.152).
in which
[K11 ] = [−4.93333]
[K12 ] = 5.03333 0.0 0.0 0.0 0.0 (9.153)
[K21 ] = [K12 ]T
−9.86666 5.03333 0 0 0
5.03333 −9.86666 5.03333 0 0
[K22 ] =
0 5.03333 −9.86666 5.03333 0
(9.154)
0 0 5.03333 −9.86666 5.03333
0 0 0 5.03333 −4.93333
{F }T1 = {0.88889 × 10−4 } (9.155)
{P }1 = P1 = [−4.93333]{0.0}+
7.2469
14.206
5.03333 0.0 0.0 0.0 0.0 20.604 −
26.192
30.761
dT 1 dT (ξ) 1 he
= = (Te+1 − Te ) ; J = e = 1, 2, . . . , 5 (9.163)
dx J dξ he 2
node x T
1 0.0 0.0
2 0.2 7.2469
3 0.46 14.206
4 0.6 20.604
5 0.8 26.192
6 1.0 30.761
e
Table 9.4: dTh/dx versus x (example 9.2)
30
25
Temperature Th
e
20
15
10
0
0 0.2 0.4 0.6 0.8 1
x
40
35
d(Th)/dx
30
e
25
20
0 0.2 0.4 0.6 0.8 1
x
Figures 9.9 and 9.10 show plots of T versus x and dT/dx versus x. Since The is
of class C 0 (Ω̄e ), we observe inter element discontinuity of dThe/dx at the inter
element boundaries. Upon mesh refinement the jumps in dThe/dx diminishes
at the inter element boundaries.
d2 u
+ λu = 0 ∀x ∈ (0, 1) = Ω ⊂ R1 (9.164)
dx2
BCs: u(0) = 0 , u(1) = 0 (9.165)
we wish to determine the eigenvalue λ and the corresponding eigenvectors
using finite element method.
We can write (9.164) as
Au = 0 ∀x ∈ Ω
d2 (9.166)
A= +λ
dx2
Since the operator A has even order derivatives, we consider GM/WF. Let
Ω̄T = ∪Ω̄e be discretization of Ω̄ = [0, 1] in which Ω̄e is an element. We
e
consider GM/WF over an element Ω̄e = Ωe ∪ Γe ; Γe is boundary of Ω̄e
consisting of xe and xe+1 end points.
Let uh be approximation of u over Ω̄T such that
uh = ∪ueh (9.167)
e
in which ueh is the local approximation of u over Ω̄e . Consider integral form
over Ω̄e = [xe , xe+1 ]
d2 ueh
Z Z
(Aueh )v dx = e
+ λuh v dx = 0 ; v = δueh (9.168)
dx2
Ω̄e Ω̄e
Consider a two node linear element Ω̄e with degrees of freedom δ1e and δ2e at
the nodes, then
Let
1−ξ 1+ξ
ueh = δ1e + δ2e = N1 (ξ)δ1e + N2 (ξ)δ2e (9.170)
2 2
and
v = δueh = Nj ; j = 1, 2 (9.171)
First, concomitant in (9.169)
dueh dueh
< Aueh , v >Γe = v(xe+1 ) − v(xe )
dx xe+1 dx xe
Let (9.172)
dueh dueh
− = P1e , = P2e
dx xe dx xe+1
Then,
dv dueh
Z Z
e
(Auh )v(x) dx = − + λuh v dx+v(xe+1 )P2e +v(xe )P1e (9.173)
e
dx dx
Ω̄e Ω̄e
2 2
Z Z ! ! !
dNj X dNi X
(Aueh )v(x) dx = − δie +λ Ni δie Nj dx+
dx dx
i=1 i=1
Ω̄e Ω̄e
Nj (xe+1 )P2e + Nj (xe )P1e (9.174)
Z+1 Z+1
e 2 dNi dNj λhe
Kij =− dξ + Ni Nj dξ (9.176)
he dξ dξ 2
−1 −1
394 NUMERICAL SOLUTIONS OF BVPS
1 1 2 2 3 3 4 4 5
x
u1 u2 u3 u4 u5
Figure 9.11: A four element uniform discretization using two node elements
Thus, for each element of the discretization of Figure 9.11 we can write
Z e
e
1 e 2 e
ue P1
(Auh )v dx = [ K ] + λ[ K ] + ; e = 1, 2, . . . , 4 (9.178)
ue+1 P2e
Ω̄e
1 1
−4 4
[1K e ] = ; [2K e ] = 12
1
24
1 (9.179)
4 −4 24 12
Assembly of the element equations can be written as
Z 4 Z
X 4 4 4
X X X
(Aueh )v dx 1 e
[ K ] {δ}+ {P e } = {0}
2 e
(Auh )v dx = = [ K ]+λ
e=1 e=1 e=1 e=1
Ω̄T Ω̄e
(9.180)
{δ} = ∪{δ e }
e
or
[K]{δ} = −{P }
" 4 4
#
X X
[K] = [1K e ] + λ [2K e ]
e=1 e=1
(9.181)
= [1K] + λ[2K]
4
X
and {P } = {P e }
e=1
{δ}T = u1 u5 u2 u3 u4
(9.182)
9.3. FINITE ELEMENT METHOD FOR BVPS 395
so that known u1 and u5 are together, hence the assembled equations will
be in partitioned form. Thus, we label rows and columns of [K] as u1 , u2 ,
u3 , u4 and u5 . Assembled [1K e ], [2K e ] and {P e } are shown in the following.
u1 u5 u2 u3 u4
−4 0 4 0 0 u
1
0 −4 0 0 4 u5
(−4
4 0 4 0 u2
1 e − 4)
[K ]= (9.183)
(−4
0 0 4 4 u3
− 4)
(−4
0 4 0 4 u4
− 4)
u1 u5 u2 u3 u4
1 1
12
0 24
0 0 u
1
1 1
0 12
0 0 24 u5
( 1/12
1 1
0 0 u2
24 24
2 e + 1/12)
[K ]= (9.184)
( 1/12
1 1
0 0 u3
24 + 1/12) 24
( 1/12
1 1
0 24
0 24
u4
+ 1/12)
P11
P1
4
P P5
2
{P } = P21 + P12 = P2 (9.185)
P 2 + P13 P3
23
P2 + P1 4
P4
using (9.185) – (9.187) in (9.188), we obtain the following from the second
set of partitioned equations
1
[ K22 ] + λ[2K22 ] {δ}2 = {0}
(9.191)
1 −4 0
[ K11 ] =
0 −4
1 400
[ K12 ] =
004
[1K21 ] = [1K12 ]T
−8 4 0
[1K22 ] = 4 −8 4
0 4 −8
" #
1 (9.192)
2 3 0
[ K11 ] =
0 13
" #
1
6 0 0
[2K12 ] =
0 0 16
[2K21 ] = [2K12 ]T
2 1
3 6 0
2 1 2 1
[ K22 ] = 6 3 6
0 16 23
Using [1K22 ] and [2K22 ] from (9.192) in (9.191) and changing sign throughout
2 1
8 −4 0
3 6 0 u2
−4 8 −4 − λ 16 2 1
u3 = {0} (9.193)
3 6
0 −4 8 u4
1 2
0 6 3
1 2 3 4 5
x
x1 x2 x3 x4 x5
h h h h
x=0 x=L
(b) Express the derivatives in the differential equation describing the BVP
in terms of their finite difference approximations using Taylor series ex-
pansions about the nodes in [0, L] and substitute these in the differential
equation.
(c) We also do the same for the derivative boundary conditions if there are
any.
(d) As far as possible we use finite difference expressions for the various
derivatives that have truncation error of the same order so that the
order of the truncation error in the solution is clearly defined. In case of
using finite difference expressions for the derivatives that have truncation
errors of different orders, it is the lowest order truncation error that
controls the order of the truncation error in the numerically computed
solution.
(e) In step (b), the differential equation is converted into a system of alge-
braic equations. Arrange the final equations resulting in (b) in matrix
form and solve for the numerical values of the unknown dependent vari-
ables at the grid or node points.
T1 T2 T3 T4 T5 T6
T1 = T (0) = 0 T6 = T (1) = −0.5
(BC) (BC)
i−1 i i+1
0.2 0.2
Using
Ti = T (xi ) ; i = 1, 2, . . . with h = 0.2 (9.197)
we can write:
d2 T Ti+1 − 2Ti + Ti−1 Ti+1 − 2Ti + Ti−1
= = (9.198)
dx2 x=xi h2 (0.2)2
Consider (9.195) at node i.
d2 T
+ T |x=i = x3i (9.199)
dx2 x=xi
d2 T
Substituting for dx2 x=x
from (9.198) into (9.199):
i
or
−49 25 0 0 T2
0.008
25 −49 25 0
T3 = 0.064
(9.203)
0 25 −49 25 T 0.216
4
0 0 25 −49 T5 13.012
T2 = −0.128947, T3 = −0.252416
(9.204)
T4 = −0.363228, T5 = −0.450872
node x T
1 0.0 0.0
2 0.2 -0.128947
3 0.46 -0.25328
4 0.6 -0.36418
5 0.8 -0.45155
6 1.0 -0.5
9.4. FINITE DIFFERENCE METHOD 401
-0.1
Temperature T
-0.2
-0.3
-0.4
-0.5
0 0.2 0.4 0.6 0.8 1
distance x
Table 9.5 gives values of T at the grid points. Figures 9.15 show plot of
(Ti , xi ); i = 1, 2, . . . , 6.
Remarks.
(1) We note that the coefficient matrix in (9.203) is in tridiagonal form,
hence we can take advantage in storing the coefficients of the matrix as
well as in solution methods for calculating Ti , i = 2, 3, . . . , 5.
(a) The finite difference expression for the derivatives are approximate.
(b) We have only used a finite number of node points (only six) in Ω̄T .
(c) If the number of points in Ω̄T are increased, accuracy of the com-
puted nodal values of T will improve.
(3) Both boundary conditions in (9.196) are function values, i.e., values of
T at the two boundaries x = 0 and x = 1.0.
(4) We only know the solution at the grid or node points. Between the node
points we only know that the solution is continuous and differentiable,
but we do not know what it is. This is not the case in finite element
method.
402 NUMERICAL SOLUTIONS OF BVPS
T1 T2 T3 T4 T5 T6 T7
dT
T1 = 0.0 dx = 20
x1 = 0.0 x6 = 1.0
i−1 i i+1
h h
d2 T
Substituting for dx2 x=x
in (9.208) from (9.207):
i
Since at x = 1.0, dT
dx is given, T at x = 1.0 is not known, hence (9.210) must
also hold at x = 1.0, i.e., at node 6 in addition to i = 2, 3, 4, 5. In order to
satisfy the BC dTdx = 20 at x = 1.0 using a central difference approximation
dT
of dx , we need an additional node 7 (outside the domain) as shown in Figure
9.16. Using a three-node stencil i − 1, i and i + 1, we can write the following
(central difference).
dT dT
= = 20 = 2.5(T7 − T5 ) (9.212)
dx x=x6 dx x=1
or
T7 = T5 + 8.0 (9.213)
Thus T7 is known in terms of T5 . Using (9.210) for i = 2, 3, 4, 5 and 6:
T2 = 7.32918 , T3 = 14.36551 ,
T6 = 31.07091 (9.216)
T4 = 20.82977 , T5 = 26.46949 ,
node x T
1 0.0 0.0
2 0.2 7.32918
3 0.46 14.36551
4 0.6 20.82977
5 0.8 26.46949
6 1.0 31.07091
30
25
Temperature T
20
15
10
0
0 0.2 0.4 0.6 0.8 1
distance x
Table 9.6 gives values of T at the grid points. Figures 9.18 show plot of
(Ti , xi ); i = 1, 2, . . . , 6.
Remarks.
(1) Node points such as point 7 that are outside the domain Ω̄ are called
imaginary points. These are necessary when the derivative (first or sec-
ond) boundary conditions are specified at the boundary points.
(2) This example demonstrates how the function value and derivative
boundary condition (first derivative in this case) are incorporated in
the finite difference solution procedure.
9.4. FINITE DIFFERENCE METHOD 405
(3) The accuracy of the approximation in general is poorer when the deriva-
tive boundary conditions are specified at the boundary points due to the
additional approximation of the boundary condition.
(4) Here also, we only know the solution at the grid or node points. Be-
tween the node points we only know that the solution is continuous and
differentiable, but we do not know what it is. This is not the case in
finite element method.
i−1 i i+1
ui−1 ui ui+1
d2 u
+ λui = 0 (9.220)
dx2 x=xi
Since nodes 1 and 5 have function u specified, at these locations the solution
is known. Thus, (9.222) must only be satisfied at xi ; i = 2, 3, 4.
or
32 −16 0 u2 1 0 0 u2
−16 32 16 u3 − λ 0 1 0 u3 = {0} (9.225)
0 −16 32 u4 001 u4
We can find the eigenpairs of (9.225) using any one of the methods discussed
in Chapter 4, and we obtain:
0.5
(λ1 , {φ}1 ) = 9.37, 0.707
0.5
0.70711
(λ2 , {φ}2 ) = 32.0, 0.0 (9.228)
−0.70711
0.49957
(λ3 , {φ}3 ) = 54.62, −0.70722
0.49957
Ω̄ = Ω ∪ Γ (9.236)
∂2φ ∂2φ
+ 2 = 0 ∀x, y ∈ Ω = (0, L) × (0, M ) (9.237)
∂x2 ∂y
BCs : φ(0, y) = φ(x, 0) = φ(L, y) = 0 , φ(x, M ) = 100 (9.238)
Figure 9.21 shows the domain Ω, its boundary Γ, and the boundary condi-
tions (9.238).
2 ∂2y
In the finite difference method we approximate ∂∂xφ2 and ∂x2 by their finite
difference approximation at a countable number of points in Ω̄. Consider
the following uniform grid (discretization Ω̄T of Ω̄) in which ∆x and ∆y are
spacing in x- and y-directions.
9.4. FINITE DIFFERENCE METHOD 409
φ = 100 ∀x ∈ [0, L] at y = M
y=M
φ=0
φ=0
x
x=0 x=L
φ=0
3
∆y
2
j=1 x
i=1 2 3 4 5
∆x
i, j + 1
j
i − 1, j i, j i + 1, j
i, j − 1
x
i
Figure 9.23: Node(i, j) and five-node stencil
1,4 5,4
2,4 3,4 4,4
φ=0 φ=0
1,3 5,3
2,3 3,3 4,3
1,2 5,2
2,2 3,2 4,2
x
1,1 2,1 3,1 4,1 5,1
φ=0
A A-A is line of symmetry
We note that A − A is a line (or plane) of symmetry. Ω̄, BCs, and the
solution φ are all symmetric about this line, i.e., the left half and right half
of line A − A are reflections of each other. Hence, for interior points we have:
φ4,4 = φ2,4
φ4,3 = φ2,3 (9.242)
φ4,2 = φ2,2
Using (9.244), we can obtain the following for the nine interior grid points
(after substituting the value of φ for boundary nodes):
matrix form:
1 −0.25 0 −0.25 0
0
φ2,2
0
−0.25 1 −0.25 0 −0.25 0
φ 2,3
0
0 −0.25 1 0 0 −0.25
φ 2,4
25
−0.5
= (9.247)
0 0 1 −0.25 0 φ3,2
0
0 −0.5 0 −0.25 1 −0.25 φ3,3 0
0 0 −0.5 0 −0.25 1
φ3,4 25
The solution values in (9.248) can be shown schematically (see Figure 9.25).
We note that this solution holds for any L = M i.e. for all square domains
Ω̄.
A
0.0 0.0
42.86 52.68 42.86
0.0 0.0
18.75 25.00 18.75
0.0 0.0
7.14 9.82 7.14
Consider:
∂2φ ∂2φ
+ 2 = −2 ∀x, y ∈ Ω = (0, L) × (0, L) (9.249)
∂x2 ∂y
In this case Ω is a square domain with side length L. Following the details
for Laplace equation, we can write the following for a grid points (i, j) (using
∆x = ∆y = h).
−4φ2,2 = −2(1)2
(9.252)
∴ φ2,2 = 0.5
2,3 3,3
1,3
L=1
1,2 3,2
2,2
x
1,1 2,1 3,1
h h
L
φ=0
5
4
C φ=0
B 3 φ=0 B
2 L=2
h = 0.5
j=1
φ=0
C i=1 2 3 4 5
A
Figure 9.27: A 25-node discretization of Ω̄T
At node (2,2):
C
2,3 3,3
1,3
fictitious nodes
(or imaginary nodes)
1,2 3,2
2,2
At node (2,3):
At node (3,3):
or in matrix form:
−4 2 0 φ2,2 −0.5
2 −4 1 φ2,3 = −0.5 (9.256)
0 4 −4 φ3,3 −0.5
(3) This method, though crude, is simple for obtaining quick solutions that
give some idea about the solution behavior.
(4) The finite element method of approximation for BVPs has sound math-
ematical foundation. This method eliminates or overcomes many of the
difficulties encountered in the finite difference method.
Problems
9.1 Consider the following 1D BVP in R1
d2 u
= x2 + 4u ∀x ∈ (0, 1) (1)
dx2
with boundary conditions:
d2 u
= x2 + 4u ∀x ∈ (0, 1) (1)
dx2
with boundary conditions:
d2 u du
2
+4 + 2u = x2 ∀x ∈ (0, 2) (1)
dx dx
with boundary conditions:
du
u(0) = 0 , u0 (2) = = 0.6 (2)
dx x=2
solution of (1) with BCs in (2) using a uniform discretization containing five
grid points (i.e. ∆x = 0.5). Plots graph of u, du/dx and d2 u/dx2 versus x.
du/dx and d2 u/dx2 can be calculated using central difference approximation
and forward and backward difference approximations at grid points 1 and 5
for du/dx.
d2 T x
− 1 − T =x ∀x ∈ (1, 3) (1)
dx2 5
with boundary conditions:
d2 T
T (1) = 10 , T 00 (3) = =6 (2)
dx2 x=3
9.5 Consider the same BVP (i.e. (1)) as in problem 9.4, but with the new
BCs :
d2 T x
− 1 − T = x ∀x ∈ (1, 3) (1)
dx2 5
with boundary conditions:
d2 T dT
T 00 (1) = = 2 , T 0 (3) = =1 (2)
dx2 x=1 dx x=3
∂2T ∂2T
+ = 0 ∀(x, y) ∈ (Ωxy ) = Ωx × Ωy (1)
∂x2 ∂y 2
The domain Ω̄xy and the discretization including boundary conditions are
given in the following.
418 NUMERICAL SOLUTIONS OF BVPS
y
∂T
=0
∂y
o
o ∂T ∂T
8 =0, =0
T = 15 7 9
∂x ∂y
5
T = 15 4 5
o ∂T
6 =0
∂x
5
T = 15 1 2 3 x
T = 30 T = 30
10 10
9 10 11 12 φ = 100
φ=5
y=4
φ=5 5 6 7 8 φ = 100
y=2
u(1, y) = 3y
3
u(0, y) = 0
x
u(x, 0) = 0
3
(a) Schematic of Ω̄xy
9
10 11 12
4 5 6
5
6 7 8
1 2 3 1 2 3 4
(b) A nine grid point discretization (c) A sixteen grid point discretiza-
tion
420 NUMERICAL SOLUTIONS OF BVPS
∂2u ∂2u
+ 2 = 0 ∀(x, y) ∈ (Ωxy ) = Ωx × Ωy (1)
∂x2 ∂y
The domain Ω̄xy and the discretization including boundary conditions are
shown in Figure (a) in the following.
y u(x, 1) = x2
u(1, y) = y 2
3
u(0, y) = 0
x
u(x, 0) = 0
3
(a) Schematic of Ω̄xy
7 8 9 13 14 15 16
9
10 11 12
4 5 6
5
6 7 8
1 2 3 1 2 3 4
(b) A nine grid point discretization (c) A sixteen grid point discretiza-
tion
9.5. CONCLUDING REMARKS 421
imations of the derivative in the BVP (1) using central difference method to
construct a system of algebraic equations for (1) using this approximation
of the derivatives. Find numerical values of u5 for discretization in Figure
(b) and the numerical values of u6 , u7 , u10 and u11 for the discretization in
Figure (c).
dφ 1 d2 φ
− = 0 ∀x ∈ (Ωx ) = (0, 1) (1)
dx P e dx2
with boundary conditions:
1 2 3 4 5
x
x=0 x=1
φ1 = 1 φ2 φ3 φ4 φ5 = 0
(a) A five grid point uniform discretization
dφ 1 d2 φ
φ − = 0 ∀x ∈ (Ωx ) = (0, 1) (1)
dx Re dx2
with boundary conditions:
This is a nonlinear BVP, hence the resulting algebraic system will be a system
of nonlinear algebraic equations. Consider finite difference approximation of
the derivatives using central difference method to obtain an algebraic system
for (1). Consider a five grid point uniform discretization shown below.
422 NUMERICAL SOLUTIONS OF BVPS
1 2 3 4 5
x
x=0 x=1
φ1 = 1 φ2 φ3 φ4 φ5 = 0
(a) A five grid point uniform discretization
d2 u
= x2 + 4u ∀x ∈ (0, 1) = Ω (1)
dx2
with boundary conditions:
d2 u
= x2 + 4u ∀x ∈ (0, 1) = Ω (1)
dx2
with boundary conditions:
of (1) over an element Ω̄e of Ω̄T using Galerkin method with weak form
(GM/WF). Assemble element equations and obtain numerical values of u at
the grid points with unknown u using boundary conditions (2). Plot graphs
of u versus x and du/dx versus x. Calculate du/dx using element local approx-
imation. Compare this solution with the solution calculated in problem 9.2.
Also calculate values of the unknown secondary variables (if any).
d2 u du
2
−4 + 2u = x2 ∀x ∈ (0, 2) = Ω (1)
dx dx
with boundary conditions:
d2 T x
− 1 − T =x ∀x ∈ (1, 3) (1)
dx2 5
with boundary conditions:
dφ 1 d2 φ
− = 0 ∀x ∈ (Ωx ) = (0, 1) (1)
dx P e dx2
424 NUMERICAL SOLUTIONS OF BVPS
425
426 NUMERICAL SOLUTION OF INITIAL VALUE PROBLEMS
t=τ
Γ4
BCs Γ1 Γ2 BCs
t=0 x
x=0 ICs Γ3 x=L
When the initial value problem contains two spatial coordinates, we have
space-time slab Ω̄xt shown in Fig. 10.2 in which
t
D1
C1
Γ1 D Γ4 C
Γ2
A1 B1
t=τ L2
Γ3
A B
t=0 x
L1
shown in Fig. 10.3. For an increment of time ∆t, that is for 0 ≤ t ≤ ∆t,
(1)
consider the first space-time strip Ω̄xt = [0, L] × [0, ∆t]. If we are only
interested in the evolution up to time t = ∆t and not beyond t = ∆t, then
the evolution in the space-time domain [0, L] × [∆t, τ ] has not taken place
(1)
yet, hence does not influence the evolution for Ω̄xt , t ∈ [0, ∆t]. We also note
(1)
that for Ω̄xt , the boundary at t = ∆t is open boundary that is similar to
the open boundary at t = τ for the whole space-time domain. We remark
(1)
that BCs and ICs for Ω̄xt and Ω̄xt are identical in the sense of those that
(2)
are known and those that are not known. For Ω̄xt , the second space-time
(1)
strip, the BCs are the same as for Ω̄xt but the ICs at t = ∆t are obtained
(1)
from the computed evolution for Ω̄xt at t = ∆t. Now, with the known ICs
(2)
at t = ∆t, the second space-time strip Ω̄xt is exactly similar to the first
(1) (1)
space-time strip Ω̄xt in terms of BCs, ICs, and open boundary. For Ω̄xt ,
(2)
t = ∆t is open boundary whereas for Ω̄xt , t = 2∆t is open boundary. Both
open boundaries are at final values of time for the corresponding space-time
strips.
t
open boundary
t=τ
Γ4
t = tn + ∆t = tn+1
(n)
Ω̄xt
t = tn
BCs Γ1 Γ2 BCs
(n−1)
ICs from Ω̄xt
t = 2∆t = t3
(2)
Ω̄xt
t = ∆t = t2
(1)
Ω̄xt
t = 0 = t1 x
x=0 ICs Γ3 x=L
Figure 10.3: Space-time domain with 1st , 2nd , and nth space-time strips
In this process the evolution is computed for the first space-time strip
(1)
Ω̄xt = [0, L]×[0, ∆t] and refinements are carried out (in discretization and p-
(1)
levels in the sense of finite element processes) until the evolution for Ω̄xt is a
430 NUMERICAL SOLUTION OF INITIAL VALUE PROBLEMS
(1)
converged solution. Using this converged solution for Ω̄xt , ICs are extracted
(2)
at t = ∆t for Ω̄xt and a converged evolution is computed for the second
(2)
space-time strip Ω̄xt . This process is continued until t = τ is reached.
Remarks.
h(t):
φ(x, t) = g(x)h(t) (10.4)
where g(x) is a known function that satisfies differentiability, continuity,
and the completeness requirements (and others) as dictated by (10.1). We
substitute (10.4) in (10.1) and obtain
Integrating (10.5) over Ω̄x = [0, L] while assuming h(t) and its time deriva-
tives to be constant for an instant of time, we can write
Z
(A (g(x)h(t)) − f (x, t)) dx = 0 (10.6)
Ω̄x
Since g(x) is known, the definite integral in (10.6) can be evaluated, thereby
eliminating g(x), its spatial derivatives (due to operator A), and more specif-
ically spatial coordinate x altogether. Hence, (10.6) reduces to
Remarks.
Ω̄ex that are obtained using interpolation theory irrespective of BCs and
ICs.
(5) In principle, (10.4) holds for all of the methods of approximation listed in
Section 10.2. In all these methods spatial coordinate is eliminated using
(10.4) for discretization in space that may be followed by integration
of A(g(x)h(t)) − f (x, t) over Ω̄Tx depending upon the method chosen.
In doing so the IVP (10.1) reduces to a system of ordinary differential
equations in time which are then integrated simultaneously using explicit
or implicit time integration methods or finite element method in time
after imposing BCs and the ICs of the IVP.
∂φ ∂φ 1 ∂2φ
+ − ∀(x, t) ∈ Ωxt = (0, 1) × (0, τ ) = Ωx × Ωt (10.9)
∂t ∂x P e ∂x2
with some BCs and ICs. Equation (10.9) is a linear partial differential equation
in the dependent variable φ, space coordinate x, and time t. Pe is the
Péclet number. Let
φ(x, t) = g(x)h(t) (10.10)
in which g(x) ∈ V ⊂ H³(Ω̄x). Substituting (10.10) in (10.9)
Let

C_1 = ∫_{Ω̄x} g(x) dx ;   C_2 = ∫_{Ω̄x} (dg(x)/dx) dx ;   C_3 = ∫_{Ω̄x} (d²g(x)/dx²) dx        (10.13)
Let

C_1 = ∫_{Ω̄x} g(x) dx ;   C_2 = ∫_{Ω̄x} g(x) (dg(x)/dx) dx ;   C_3 = ∫_{Ω̄x} (d²g(x)/dx²) dx        (10.19)
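To make the decoupling concrete, the following Python sketch (ours, not the author's program) illustrates the idea for the model problem (10.9) with zero forcing: a hypothetical choice g(x) = sin(πx) is made for the known spatial function, the constants C1, C2, C3 of (10.13) are evaluated by numerical quadrature, and the single ODE in time produced by integrating the residual over Ω̄x, C1 dh/dt + (C2 − C3/Pe) h = 0, is marched with explicit Euler steps. The value of Pe and the IC h(0) = 1 are assumed only for illustration.

```python
# A minimal sketch (not from the text) of the space-time decoupled idea for
# d(phi)/dt + d(phi)/dx - (1/Pe) d2(phi)/dx2 = 0 on (0, 1), with phi(x, t) = g(x) h(t).
# g(x) = sin(pi x) is a hypothetical choice; C1, C2, C3 follow the definitions in (10.13).
import numpy as np

def trap(y, x):
    """Composite trapezoidal rule for the quadratures defining C1, C2, C3."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))

Pe = 10.0                               # Peclet number (assumed value)
x = np.linspace(0.0, 1.0, 2001)
g = np.sin(np.pi * x)                   # assumed known spatial function
dg = np.pi * np.cos(np.pi * x)          # dg/dx
d2g = -np.pi ** 2 * np.sin(np.pi * x)   # d2g/dx2

C1, C2, C3 = trap(g, x), trap(dg, x), trap(d2g, x)

# Integrating the residual over the spatial domain gives C1 h'(t) + (C2 - C3/Pe) h(t) = 0,
# i.e. a single ODE in h(t); march it with explicit Euler steps (assumed IC h(0) = 1).
lam = -(C2 - C3 / Pe) / C1
h, dt, t_final, t = 1.0, 1.0e-4, 0.1, 0.0
while t < t_final - 1e-12:
    h += dt * lam * h
    t += dt
print(f"C1 = {C1:.4f}, C2 = {C2:.4f}, C3 = {C3:.4f}, h({t_final}) ≈ {h:.4f}")
```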
in which N_i(x) are local approximation functions and δ_i^e(t) are nodal degrees of freedom
for an element e with spatial domain Ω̄_x^e. Using (10.1) we construct the integral form over
Ω̄_x^T using any of the standard methods of approximation. Let us consider the Galerkin method
with weak form:

(Aφ_h − f, v)_{Ω̄_x^T} = ∫_{Ω̄_x^T} (Aφ_h − f) v(x) dx = 0 ;   v = δφ_h        (10.22)
Remarks.
BCs). In case of (10.30), a first order ODE in time, we need one initial
condition at the commencement of the evolution (at t = t1 or simply t1 = 0).
φ(0) = φ|_{t=0} = φ_0        (10.31)
∫_{t_i}^{t_{i+1}} (dφ/dt) dt = ∫_{t_i}^{t_{i+1}} f(φ, t) dt        (10.32)

or   ∫_{t_i}^{t_{i+1}} dφ = ∫_{t_i}^{t_{i+1}} f(φ, t) dt        (10.33)

or   φ|_{t_{i+1}} − φ|_{t_i} = ∫_{t_i}^{t_{i+1}} f(φ, t) dt        (10.34)

or   φ_{i+1} = φ_i + ∫_{t_i}^{t_{i+1}} f(φ, t) dt        (10.35)

where   φ|_{t_{i+1}} = φ_{i+1}   and   φ|_{t_i} = φ_i        (10.36)
(Figure: f(φ, t) over the interval [t_i, t_{i+1}]; in Euler's method the integral of f(φ, t) over [t_i, t_{i+1}] is approximated by the rectangle of area h f(φ_i, t_i).)
That is,

∫_{t_i}^{t_{i+1}} f(φ, t) dt ≅ h f(φ_i, t_i)        (10.37)

so that

φ_{i+1} = φ_i + h f(φ_i, t_i)        (10.38)

The error made in doing so is illustrated by the empty area bounded by the dotted line, which
is neglected in the approximation (10.37) and hence in (10.38). In Euler's method we begin with
i = 0 and φ_0 defined using the initial condition at time t = 0, and march the solution in time
using (10.38). It is obvious that smaller values of h = ∆t will yield more accurate results.
Euler's method is one of the simplest and crudest approximation techniques for ODEs in time.
It is called an explicit method because the solution at the new value of time is explicitly
expressed in terms of the solution at the current value of time; thus, computing the solution
at the new value of time is simply a matter of substitution in (10.38). By using more accurate
approximations of ∫_{t_i}^{t_{i+1}} f(φ, t) dt we can devise methods that yield a more accurate
numerical solution φ(t) of (10.30).
dφ/dt − t − φ = 0   for t > 0        (10.39)

IC:  φ(0) = 1        (10.40)
We consider numerical solution of (10.39) with IC (10.40) using ∆t = 0.2,
0.1 and 0.05 for 0 ≤ t ≤ 1.
We rewrite (10.39) in the standard form (10.30).
dφ/dt = t + φ = f(φ, t)        (10.41)
Thus using (10.38) for (10.41), we have
Table 10.1: Euler's method solution of (10.41) for h = 0.2, 0 ≤ t ≤ 1 (columns: step number, time t, function value φ, time derivative dφ/dt|_i = f(φ_i, t_i))

Table 10.2: Euler's method solution of (10.41) for h = 0.1, 0 ≤ t ≤ 1 (same columns)
From Tables 10.1 and 10.2 we note that even for h = 0.2 and 0.1, rather large time increments,
the values of φ and dφ/dt are quite reasonable. Plots of φ and dφ/dt versus t, shown in
Figures 10.6 and 10.7, illustrate this.
Figure 10.6: φ versus time t (h = 0.2 and h = 0.1)

Figure 10.7: dφ/dt versus time t (h = 0.2 and h = 0.1)
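The tabulated values and the curves in Figures 10.6 and 10.7 can be regenerated with the short Python sketch below (ours, not the author's program). It marches (10.38) for f(φ, t) = t + φ with φ(0) = 1; the exact solution φ(t) = 2e^t − t − 1 is printed only for comparison.

```python
# Minimal sketch of Euler's method, phi_{i+1} = phi_i + h f(phi_i, t_i),
# applied to d(phi)/dt = t + phi, phi(0) = 1 (the example above).
import math

def euler(f, phi0, t0, t_end, h):
    """Return lists of time values and Euler approximations of phi."""
    ts, phis = [t0], [phi0]
    t, phi = t0, phi0
    while t < t_end - 1.0e-12:
        phi += h * f(phi, t)        # explicit update (10.38)
        t += h
        ts.append(t)
        phis.append(phi)
    return ts, phis

f = lambda phi, t: t + phi
exact = 2.0 * math.exp(1.0) - 1.0 - 1.0     # phi(1) from phi(t) = 2 e^t - t - 1
for h in (0.2, 0.1, 0.05):
    ts, phis = euler(f, 1.0, 0.0, 1.0, h)
    print(f"h = {h:<4}  phi(1) = {phis[-1]:.5f}   exact phi(1) = {exact:.5f}")
```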
Recall (10.35):

φ_{i+1} = φ_i + ∫_{t_i}^{t_{i+1}} f(φ, t) dt        (10.43)

∫_{t_i}^{t_{i+1}} f(φ, t) dt = h(a_1 k_1 + a_2 k_2 + ... + a_n k_n)        (10.44)

k_1 = f(φ_i, t_i)
k_2 = f(φ_i + q_{11} k_1 h, t_i + p_1 h)
k_3 = f(φ_i + q_{21} k_1 h + q_{22} k_2 h, t_i + p_2 h)
k_4 = f(φ_i + q_{31} k_1 h + q_{32} k_2 h + q_{33} k_3 h, t_i + p_3 h)        (10.46)
⋮
k_n = f(φ_i + q_{n−1,1} k_1 h + q_{n−1,2} k_2 h + ... + q_{n−1,n−1} k_{n−1} h, t_i + p_{n−1} h)
φ_{i+1} = φ_i + f(φ_i, t_i) h + f′(φ_i, t_i) h²/2!        (10.50)

where   f′(φ, t) = ∂f(φ, t)/∂t + (∂f(φ, t)/∂φ)(∂φ/∂t)        (10.51)

∴  φ_{i+1} = φ_i + f(φ_i, t_i) h + [ ∂f(φ, t)/∂t + (∂f(φ, t)/∂φ)(dφ/dt) ]_{t_i} h²/2!        (10.52)

But dφ/dt = f(φ, t), hence (10.52) becomes

φ_{i+1} = φ_i + f(φ_i, t_i) h + [ ∂f(φ_i, t_i)/∂t + (∂f(φ_i, t_i)/∂φ) f(φ_i, t_i) ] h²/2!        (10.53)
Consider the Taylor series expansion of f(·) in (10.49), using it as a function of two variables as in

g(x + u, y + v) = g(x, y) + (∂g/∂x) u + (∂g/∂y) v + ...        (10.54)

∴  k_2 = f(φ_i + q_{11} k_1 h, t_i + p_1 h) = f(φ_i, t_i) + p_1 h (∂f/∂t) + q_{11} k_1 h (∂f/∂φ) + O(h²)        (10.55)

Substituting from (10.55) and (10.48) into (10.47),

φ_{i+1} = φ_i + a_1 h f(φ_i, t_i) + a_2 h f(φ_i, t_i) + a_2 p_1 h² (∂f/∂t) + a_2 q_{11} k_1 h² (∂f/∂φ) + O(h³)        (10.56)

Rearranging terms in (10.56),

φ_{i+1} = φ_i + (a_1 + a_2) h f(φ_i, t_i) + a_2 p_1 h² (∂f/∂t) + a_2 q_{11} k_1 h² (∂f/∂φ) + O(h³)        (10.57)
For φ_{i+1} in (10.53) and (10.57) to be equivalent, the following must hold:

a_1 + a_2 = 1
a_2 p_1 = 1/2        (10.58)
a_2 q_{11} = 1/2

Equations (10.58) are three equations in four unknowns (a_1, a_2, p_1, and q_{11}), hence they
do not have a unique solution. There are infinitely many solutions, so an arbitrary choice must
be made at this point.
φ_{i+1} = φ_i + h k_2

where:

k_1 = f(φ_i, t_i)        (10.60)
k_2 = f(φ_i + k_1 h/2, t_i + h/2)
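A sketch (ours) of this particular choice, a_1 = 0, a_2 = 1, p_1 = q_{11} = 1/2 (the midpoint rule), applied to the same f(φ, t) = t + φ used in the Euler example; the step size, IC, and printed values are only illustrative.

```python
# Second order Runge-Kutta step with a1 = 0, a2 = 1, p1 = q11 = 1/2 (midpoint rule):
# phi_{i+1} = phi_i + h k2, k1 = f(phi_i, t_i), k2 = f(phi_i + k1 h/2, t_i + h/2).
def rk2_midpoint_step(f, phi, t, h):
    k1 = f(phi, t)
    k2 = f(phi + 0.5 * h * k1, t + 0.5 * h)
    return phi + h * k2

f = lambda phi, t: t + phi        # model problem d(phi)/dt = t + phi
phi, t, h = 1.0, 0.0, 0.2         # assumed IC phi(0) = 1
for _ in range(5):                # march to t = 1
    phi = rk2_midpoint_step(f, phi, t, h)
    t += h
print(f"phi({t:.1f}) ≈ {phi:.5f}")
```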
dφ/dt = f(φ, t)   ∀t ∈ (t_1, t_2) = (0, τ) = Ω_t        (10.61)

Then
φ_{i+1} = φ_i + h(a_1 k_1 + a_2 k_2 + a_3 k_3)        (10.62)

where
a_1 = 1/6 ,   a_2 = 4/6 ,   a_3 = 1/6        (10.63)

and
k_1 = f(φ_i, t_i)
k_2 = f(φ_i + k_1 h/2, t_i + h/2)        (10.64)
k_3 = f(φ_i + 2 h k_2 − h k_1, t_i + h)
dφ/dt = f(φ, t)   ∀t ∈ (t_1, t_2) = (0, τ) = Ω_t        (10.65)

Then
φ_{i+1} = φ_i + h(a_1 k_1 + a_2 k_2 + a_3 k_3 + a_4 k_4)        (10.66)

where
a_1 = a_4 = 1/6 ,   a_2 = a_3 = 1/3        (10.67)

and
k_1 = f(φ_i, t_i)
k_2 = f(φ_i + h k_1/2, t_i + h/2)
k_3 = f(φ_i + h k_2/2, t_i + h/2)        (10.68)
k_4 = f(φ_i + h k_3, t_i + h)
2nd and 3rd order Runge-Kutta methods as we see fit. For (10.72), we give
details for 4th order Runge-Kutta method.
du/dt = f_1(u, v, t) ;   dv/dt = f_2(u) = u        (10.73)
u_{i+1} = u_i + (h/6)(k_1 + 2k_2 + 2k_3 + k_4) ;   v_{i+1} = v_i + (h/6)(l_1 + 2l_2 + 2l_3 + l_4)

k_1 = f_1(u_i, v_i, t_i)                            ;   l_1 = f_2(u_i) = u_i
k_2 = f_1(u_i + h k_1/2, v_i + h l_1/2, t_i + h/2)  ;   l_2 = f_2(u_i + h k_1/2) = u_i + h k_1/2
k_3 = f_1(u_i + h k_2/2, v_i + h l_2/2, t_i + h/2)  ;   l_3 = f_2(u_i + h k_2/2) = u_i + h k_2/2
k_4 = f_1(u_i + h k_3, v_i + h l_3, t_i + h)        ;   l_4 = f_2(u_i + h k_3) = u_i + h k_3
u_{i+1} = u_i + h( (1/2) k_1 + (1/2) k_2 )

k_1 = f(u_i, t_i)
k_2 = f(u_i + h k_1, t_i + h)
For i = 1:
t = t1 = 0, u1 = u(0) = 1
For i = 2:
k_1 = f(u_1, t_1) = 1 + 0 = 1
k_2 = f(u_1 + k_1 h, t_1 + h) = (1 + 1(0.2)) + 0.2 = 1.4
u_2 = 1 + 0.2(1/2 + 1.4/2) = 1 + 0.24 = 1.24
For i = 3:
t = t3 = 2∆t = 0.4 using ∆t = h = 0.2
Thus we have

t       u         du/dt = f(u, t)
0       1         1
0.2     1.24      1.44
0.4     1.5768    1.9768
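The two steps tabulated above can be checked with a short Python sketch (ours, not the book's program) of the a_1 = a_2 = 1/2 scheme; it should reproduce u ≈ 1.24 at t = 0.2 and u ≈ 1.5768 at t = 0.4.

```python
# Second order Runge-Kutta (a1 = a2 = 1/2): u_{i+1} = u_i + (h/2)(k1 + k2),
# k1 = f(u_i, t_i), k2 = f(u_i + h k1, t_i + h), applied to du/dt = u + t, u(0) = 1.
def rk2_step(f, u, t, h):
    k1 = f(u, t)
    k2 = f(u + h * k1, t + h)
    return u + 0.5 * h * (k1 + k2)

f = lambda u, t: u + t
u, t, h = 1.0, 0.0, 0.2
for _ in range(2):                                  # two steps: t = 0.2 and t = 0.4
    u = rk2_step(f, u, t, h)
    t += h
    print(f"t = {t:.1f}, u = {u:.4f}, du/dt = {f(u, t):.4f}")
# expected: u(0.2) = 1.24 and u(0.4) = 1.5768, as in the table above
```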
For i = 1:
t = t1 = 0, u1 = 1, f (u, t) = u + t, h = 0.2
For i = 2:
t = t2 = ∆t = 0.2 using ∆t = h = 0.2
k_1 = f(u_1, t_1) = 1 + 0 = 1
k_2 = f(u_1 + k_1 h/2, t_1 + h/2) = (1 + 1(0.2)/2) + (0 + 0.2/2) = 1.2
k_3 = f(u_1 + 2 h k_2 − h k_1, t_1 + h) = (1 + 2(0.2)(1.2) − 0.2(1)) + (0 + 0.2) = 1.48
u_2 = u_1 + (h/6)(k_1 + 4k_2 + k_3)
u_2 = 1 + (0.2/6)(1 + 4(1.2) + 1.48) = 1.24267
For i = 3:
For i = 2:
t = t2 = ∆t = 0.2 using ∆t = h = 0.2
k_1 = f(u_1, t_1) = 1 + 0 = 1
k_2 = f(u_1 + h k_1/2, t_1 + h/2) = (u_1 + h k_1/2) + (t_1 + h/2)
    = (1 + (0.2)(1)/2) + (0 + 0.2/2) = 1.2
k_3 = f(u_1 + h k_2/2, t_1 + h/2) = (u_1 + h k_2/2) + (t_1 + h/2)
    = (1 + (0.2)(1.2)/2) + (0 + 0.2/2) = 1.22
k_4 = f(u_1 + h k_3, t_1 + h) = (u_1 + h k_3) + (t_1 + h)
    = (1 + (0.2)(1.22)) + (0 + 0.2) = 1.444
u_2 = u_1 + (h/6)(k_1 + 2k_2 + 2k_3 + k_4)
    = 1 + (0.2/6)(1 + 2(1.2) + 2(1.22) + 1.444) = 1.2428
For i = 3:
t = t3 = 2∆t = 0.4 using ∆t = h = 0.2
k_1 = 1.4428
k2 = 1.68708
k3 = 1.71151
k4 = 1.9851
u_3 = u_2 + (h/6)(k_1 + 2k_2 + 2k_3 + k_4)
    = 1.2428 + (0.2/6)(1.4428 + 2(1.68708) + 2(1.71151) + 1.9851)
    = 1.58364
u4 = 2.044218
Thus, we have

t       u          du/dt = f(u, t)
0       1          1
0.2     1.2428     1.4428
0.4     1.58364    1.98364
0.6     2.044218   2.644218
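The values above can be verified with a short Python sketch (ours, not the book's program) of the 4th order scheme (10.66)–(10.68); for du/dt = u + t with u(0) = 1 and h = 0.2 it reproduces u = 1.2428, 1.58364, and 2.044218 at t = 0.2, 0.4, and 0.6.

```python
# Classical 4th order Runge-Kutta step, (10.66)-(10.68), applied to du/dt = u + t, u(0) = 1.
def rk4_step(f, u, t, h):
    k1 = f(u, t)
    k2 = f(u + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(u + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(u + h * k3, t + h)
    return u + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

f = lambda u, t: u + t
u, t, h = 1.0, 0.0, 0.2
for _ in range(3):                                  # steps to t = 0.2, 0.4, 0.6
    u = rk4_step(f, u, t, h)
    t += h
    print(f"t = {t:.1f}, u = {u:.6f}, du/dt = {f(u, t):.6f}")
# to the figures shown in the table: 1.2428, 1.58364, 2.044218
```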
Remarks.
(3) The 4th order Runge-Kutta method has very good accuracy and is widely used in practical applications.
Example 10.5 (4th order Runge-Kutta Method for a System of First Order
ODEs in Time). Consider the following system of ODEs in time.
dx/dt = x y + t = f(x, y, t) ;   dy/dt = x + y t = g(x, y, t)   ∀t ∈ (t_1, t_2) = (0, τ) = Ω_t
k_2 = f(x_0 + k_1 h/2, y_0 + l_1 h/2, t_0 + h/2)
    = (1 + (−1)(0.2)/2)(−1 + (1)(0.2)/2) + 0.2/2
    = −0.71

l_2 = g(x_0 + k_1 h/2, y_0 + l_1 h/2, t_0 + h/2)
    = (1 + (−1)(0.2)/2) + (−1 + (1)(0.2)/2)(0.2/2)
    = 0.81

k_3 = f(x_0 + k_2 h/2, y_0 + l_2 h/2, t_0 + h/2)
    = (1 + (−0.71)(0.2)/2)(−1 + (0.81)(0.2)/2) + 0.2/2
    = −0.754

l_3 = g(x_0 + k_2 h/2, y_0 + l_2 h/2, t_0 + h/2)
    = (1 + (−0.71)(0.2)/2) + (−1 + (0.81)(0.2)/2)(0.2/2)
    = 0.837
k_4 = f(x_0 + k_3 h, y_0 + l_3 h, t_0 + h)
    = (1 + (−0.754)(0.2))(−1 + (0.837)(0.2)) + 0.2
    = −0.507

l_4 = g(x_0 + k_3 h, y_0 + l_3 h, t_0 + h)
    = (1 + (−0.754)(0.2)) + (−1 + (0.837)(0.2))(0.2)
    = 0.68

x_1 = x_0 + (h/6)(k_1 + 2k_2 + 2k_3 + k_4)
    = 1 + (0.2/6)(−1 + 2(−0.71) + 2(−0.754) − 0.507)
x_1 = 0.8522

y_1 = y_0 + (h/6)(l_1 + 2l_2 + 2l_3 + l_4)
    = −1 + (0.2/6)(1 + 2(0.81) + 2(0.837) + 0.68)
y_1 = −0.834

Hence the solution at t = 0.2 is (x_1, y_1) = (0.8522, −0.834).
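A short Python sketch (ours) of the coupled update confirms the hand calculation; the initial values x(0) = 1, y(0) = −1 and h = 0.2 are those implied by the computation above.

```python
# One 4th order Runge-Kutta step for the coupled system
# dx/dt = x*y + t = f(x, y, t), dy/dt = x + y*t = g(x, y, t).
def rk4_system_step(f, g, x, y, t, h):
    k1, l1 = f(x, y, t), g(x, y, t)
    k2 = f(x + 0.5 * h * k1, y + 0.5 * h * l1, t + 0.5 * h)
    l2 = g(x + 0.5 * h * k1, y + 0.5 * h * l1, t + 0.5 * h)
    k3 = f(x + 0.5 * h * k2, y + 0.5 * h * l2, t + 0.5 * h)
    l3 = g(x + 0.5 * h * k2, y + 0.5 * h * l2, t + 0.5 * h)
    k4 = f(x + h * k3, y + h * l3, t + h)
    l4 = g(x + h * k3, y + h * l3, t + h)
    x_new = x + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    y_new = y + h * (l1 + 2.0 * l2 + 2.0 * l3 + l4) / 6.0
    return x_new, y_new

f = lambda x, y, t: x * y + t
g = lambda x, y, t: x + y * t
x1, y1 = rk4_system_step(f, g, 1.0, -1.0, 0.0, 0.2)
print(f"x1 = {x1:.4f}, y1 = {y1:.4f}")      # approximately 0.8522 and -0.8342
```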
Example 10.6 (4th Order Runge-Kutta Method for a Second Order ODE
in Time). Consider the following second order ODE in time.
d²θ/dt² + (32.2/r) sin θ = 0   ∀t ∈ (t_1, t_2) = (0, τ) = Ω_t ,   θ = θ(t)

with
θ(0) = θ|_{t=0} = θ_0 = 0.8 radians
dθ/dt|_{t=0} = 0 ;   r = 2

Use the fourth order Runge-Kutta method to calculate θ, dθ/dt, and d²θ/dt² for t = 0.1
using ∆t = h = 0.1. Convert the 2nd order ODE to a system of first order ODEs in time.

Let
dθ/dt = u = f(u) ;   du/dt = −16.1 sin θ = g(θ)   (r = 2)
Hence   u|_{t=0} = u_0 = 0
For i = 0:

k_1 = f(u_0) = 0 ;   l_1 = g(θ_0) = −16.1 sin(0.8) = −11.55
k_2 = f(u_0 + l_1 h/2) = 0 + (−11.55)(0.1)/2 = −0.578 ;   l_2 = g(θ_0 + k_1 h/2) = −16.1 sin(0.8 + 0(0.1)/2) = −11.55
k_3 = f(u_0 + l_2 h/2) = 0 + (−11.55)(0.1)/2 = −0.578 ;   l_3 = g(θ_0 + k_2 h/2) = −16.1 sin(0.8 + (−0.578)(0.1)/2) = −11.22
k_4 = f(u_0 + l_3 h) = 0 + (−11.22)(0.1) = −1.122 ;   l_4 = g(θ_0 + k_3 h) = −16.1 sin(0.8 + (−0.578)(0.1)) = −10.882
θ_1 = θ_0 + (h/6)(k_1 + 2k_2 + 2k_3 + k_4) = 0.8 + (0.1/6)(0 + 2(−0.578) + 2(−0.578) − 1.122)
θ_1 = 0.7429

u_1 = u_0 + (h/6)(l_1 + 2l_2 + 2l_3 + l_4) = 0 + (0.1/6)(−11.55 + 2(−11.55) + 2(−11.22) − 10.882)
u_1 = −1.133 = dθ/dt

d²θ/dt²|_{t=0.1} = (−32.2/2) sin(θ_1) = −16.1 sin(0.7429) = −10.89
θ|_{t=0.1} = 0.7429
dθ/dt|_{t=0.1} = −1.133
d²θ/dt²|_{t=0.1} = −10.89
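The single step above can be reproduced with the following Python sketch (ours, not the book's program), which uses the system dθ/dt = u, du/dt = −16.1 sin θ and one RK4 step of h = 0.1.

```python
# One RK4 step for d(theta)/dt = u, du/dt = -16.1 sin(theta),
# with theta(0) = 0.8, u(0) = 0, h = 0.1 (pendulum example; 32.2/r = 16.1 for r = 2).
import math

f = lambda u: u                               # d(theta)/dt
g = lambda theta: -16.1 * math.sin(theta)     # du/dt

theta0, u0, h = 0.8, 0.0, 0.1
k1, l1 = f(u0),                g(theta0)
k2, l2 = f(u0 + 0.5 * h * l1), g(theta0 + 0.5 * h * k1)
k3, l3 = f(u0 + 0.5 * h * l2), g(theta0 + 0.5 * h * k2)
k4, l4 = f(u0 + h * l3),       g(theta0 + h * k3)
theta1 = theta0 + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
u1 = u0 + h * (l1 + 2.0 * l2 + 2.0 * l3 + l4) / 6.0
print(f"theta(0.1) ≈ {theta1:.4f}, d(theta)/dt ≈ {u1:.4f}, d2(theta)/dt2 ≈ {g(theta1):.2f}")
# approximately 0.7429, -1.133, -10.89, matching the summary above
```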
(2) In case of two first order simultaneous ODEs the area constants ki ;
i = 1, 2, ..., 4 and li ; i = 1, 2, ..., 4 associated with the two ODEs must
be computed in the following order.
This is due to the fact that (k2 , l2 ) contain (k1 , l1 ) and (k3 , l3 ) contain
(k2 , l2 ) and so on.
(3) When there are more than two first order ODEs, we also follow rule (2):
(k_1, l_1, m_1, ...) are computed first, followed by (k_2, l_2, m_2, ...), and so on.
Problems

10.1 Consider the following ordinary differential equation in time:

du/dt = t u²   ∀t ∈ (1, 2) = (t_1, t_2) = Ω_t        (1)

with IC:  u(1) = 1        (2)

(a) Use Euler's method to calculate the solution u(t) and u′(t) ∀t ∈ (1, 2] using an
integration step of 0.1. Tabulate your results and plot graphs of u versus t and
u′ versus t.

(b) Repeat the calculations for a step size of 0.05. Tabulate and plot graphs of the
computed solution and compare the solution computed here with the results obtained
in (a). Write a short discussion of the results calculated in (a) and (b).
(a) Calculate u, u′, v and v′ ∀t ∈ (0, 1.0] with a time step of 0.1 using the second order
and fourth order Runge-Kutta methods. Plot graphs of u, u′, v and v′ versus t. Tabulate
your computed solution. Compare the two solutions from the second and the fourth order
methods.

(b) Repeat the calculations in (a) using a time step of 0.05. Tabulate your computed
solution and plot graphs similar to those in (a). Compare the computed solution here
with that in (a). Write a short discussion. Also compare the two solutions obtained
here from the second and fourth order Runge-Kutta methods.

(a) Calculate φ(t), φ′(t) ∀t ∈ (π/2, π/2 + 3] using the second order and fourth order
Runge-Kutta methods with a time step of 0.1. Tabulate your
(a) Calculate u(t), u′(t) ∀t ∈ (0, 2.0] with a time step of 0.1 using the second and fourth
order Runge-Kutta methods. Plot graphs of u(t) and u′(t) versus t using the calculated
solutions. Compare the solutions obtained using the second and fourth order Runge-Kutta
methods.

(b) Repeat the calculations and all the details in (a) using a time step of 0.05. Compare
the solution obtained here with that obtained in (a). Provide a short discussion.

(a) Calculate u(t), u′(t) ∀t ∈ (0, 2.0] using the Runge-Kutta methods of second and fourth
order with an integration time step of 0.2. Tabulate your results and plot graphs of u(t)
and u′(t) versus t using the calculated solutions. Compare the two sets of solutions.

(b) Repeat the calculations and all other details in (a) using a time step of 0.01. Compare
these results with those in (a). Write a short discussion.

(a) Calculate φ(t), φ′(t) ∀t ∈ (1, 3] using the first order and second order Runge-Kutta
methods with a time step of 0.2. Tabulate your calculated solution and plot graphs of
φ(t), φ′(t) versus t. Compare the two sets of solutions.

(b) Repeat the calculations and details in (a) using an integration time step of 0.1.
Compare this computed solution with the one calculated in (a). Write a short discussion.
11
Fourier Series
11.1 Introduction
In many applications such as initial value problems often the periodic
forcing functions i.e. periodic non-homogeneous part may not be analytical.
Such forcing functions are not continuous and differential every where in the
domain of definition. Rectangular or square waves, triangular waves, saw
tooth waves, etc. are a few examples. In such cases solutions of the initial
value problems may be difficult to obtain. Fourier series provides means
of approximate representation of such functions that are continuous and
differentiable everywhere in the domain of definitions, hence are meritorious
in the solutions of the IVPs.
Hence,

a_0 = (1/T) ∫_0^T f(t) dt        (11.4)
(ii) Determination of ak ; k = 1, 2, . . . , j, . . . , ∞
To determine aj , we multiply (11.1) by sin(jωt) and integrate with
respect to t with limits [0, T ].
∫_0^T f(t) sin(jωt) dt = a_0 ∫_0^T sin(jωt) dt + Σ_{k=1}^∞ ∫_0^T sin(jωt) ( a_k sin(kωt) + b_k cos(kωt) ) dt        (11.5)
We note that

∫_0^T sin(jωt) dt = 0
∫_0^T sin(jωt) sin(kωt) dt = 0 ;   k = 1, 2, . . . , ∞ ,  k ≠ j
∫_0^T sin(jωt) cos(kωt) dt = 0 ;   k = 1, 2, . . . , ∞ ,  k ≠ j        (11.6)
∫_0^T sin(jωt) sin(jωt) dt = ∫_0^T sin²(jωt) dt = T/2
Hence,

a_j = (2/T) ∫_0^T f(t) sin(jωt) dt ;   j = 1, 2, . . . , ∞        (11.8)
(iii) Determination of bk ; k = 1, 2, . . . , j, . . . , ∞
To determine bj , we multiply (11.1) by cos(jωt) and integrate with
respect to time with limits [0, T ].
∫_0^T f(t) cos(jωt) dt = a_0 ∫_0^T cos(jωt) dt + Σ_{k=1}^∞ ∫_0^T cos(jωt) ( a_k sin(kωt) + b_k cos(kωt) ) dt        (11.9)
We note that

∫_0^T cos(jωt) dt = 0
∫_0^T cos(jωt) sin(kωt) dt = 0 ;   k = 1, 2, . . . , ∞ ,  k ≠ j
∫_0^T cos(jωt) cos(kωt) dt = 0 ;   k = 1, 2, . . . , ∞ ,  k ≠ j        (11.10)
∫_0^T cos(jωt) cos(jωt) dt = ∫_0^T cos²(jωt) dt = T/2
Remarks.
(1) Regardless of whether f(t) is analytic or not, its Fourier series approximation (11.1) is always analytic.

(3) It is possible to define the L₂-norm of the error between the actual f(t) and its Fourier approximation to quantitatively judge the accuracy of the approximation.
(Figure: square wave f(t) of period T, with f(t) = 1 for −T/4 ≤ t ≤ T/4 and f(t) = −1 for T/4 < |t| ≤ T/2.)
a_0 = (1/T) ∫_0^T f(t) dt = (1/T) ( ∫_{−T/2}^{−T/4} (−1) dt + ∫_{−T/4}^{T/4} (1) dt + ∫_{T/4}^{T/2} (−1) dt )
    = (1/T) ( −(−T/4 + T/2) + (T/4 + T/4) − (T/2 − T/4) )
    = (1/T)(0) = 0
a_j = (2/T) ∫_{−T/2}^{T/2} f(t) sin(jωt) dt

or

a_j = (2/T) ( ∫_{−T/2}^{−T/4} (−1) sin(jωt) dt + ∫_{−T/4}^{T/4} (1) sin(jωt) dt + ∫_{T/4}^{T/2} (−1) sin(jωt) dt )
    = (2/T)(1/(jω)) ( [cos(jωt)]_{−T/2}^{−T/4} − [cos(jωt)]_{−T/4}^{T/4} + [cos(jωt)]_{T/4}^{T/2} )

or

a_j = (2/T)(1/(jω)) ( cos(jωT/4) − cos(jωT/2) − cos(jωT/4) + cos(jωT/4) + cos(jωT/2) − cos(jωT/4) ) = (2/T)(0) = 0
and

b_j = (2/T) ∫_{−T/2}^{T/2} f(t) cos(jωt) dt

or

b_j = (2/T) ( ∫_{−T/2}^{−T/4} (−1) cos(jωt) dt + ∫_{−T/4}^{T/4} (1) cos(jωt) dt + ∫_{T/4}^{T/2} (−1) cos(jωt) dt )

or

b_j = (2/T)(1/(jω)) ( −[sin(jωt)]_{−T/2}^{−T/4} + [sin(jωt)]_{−T/4}^{T/4} − [sin(jωt)]_{T/4}^{T/2} )
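Carrying the evaluation through gives b_j = (4/(jπ)) sin(jπ/2), i.e. b_j vanishes for even j and |b_j| = 4/(jπ) for odd j, while a_0 = a_j = 0 as found above. The following Python sketch (ours, not from the text) checks these results by numerical quadrature of the coefficient formulas; the choice T = 2 is arbitrary.

```python
# Numerical check (ours) of the square wave Fourier coefficients, using the book's convention:
# a0 = (1/T) int f dt, a_j = (2/T) int f sin(j w t) dt, b_j = (2/T) int f cos(j w t) dt,
# integrating over one period [-T/2, T/2] with w = 2 pi / T.
import math

T = 2.0                      # arbitrary period for the check
w = 2.0 * math.pi / T

def f(t):
    """Square wave: +1 on [-T/4, T/4] and -1 on the rest of the period."""
    return 1.0 if abs(t) <= T / 4.0 else -1.0

def integrate(func, a, b, n=20000):
    """Composite trapezoidal rule."""
    dt = (b - a) / n
    s = 0.5 * (func(a) + func(b)) + sum(func(a + i * dt) for i in range(1, n))
    return s * dt

a0 = integrate(f, -T / 2, T / 2) / T
print(f"a0 ≈ {a0:.2e}")
for j in range(1, 6):
    aj = 2.0 / T * integrate(lambda t: f(t) * math.sin(j * w * t), -T / 2, T / 2)
    bj = 2.0 / T * integrate(lambda t: f(t) * math.cos(j * w * t), -T / 2, T / 2)
    bj_exact = 4.0 / (j * math.pi) * math.sin(j * math.pi / 2.0)
    print(f"j = {j}: a_j ≈ {aj:+.4f}, b_j ≈ {bj:+.4f}, closed form b_j = {bj_exact:+.4f}")
```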
Problems
11.1 Figure (a) below shows a rectangular wave.
(Figure (a): a rectangular wave of period T and amplitude A.)

(Figure (a): a sawtooth wave of period T and amplitude A.)

(Figure (a): a triangular wave of time period T.)
[32]
BIBLIOGRAPHY
[1] Allaire, F.E.: Basics of the Finite Element Method. William C. Brown,
Dubuque, IA (1985)
[5] Bathe, K.J., Wilson, E.L.: Numerical Methods in Finite Element Analysis. Prentice-Hall, Englewood Cliffs, NJ (1976)
[7] Burden, R.L., Faires, J.D.: Numerical Analysis, 5th edn. PWS Pub-
lishing, Boston (1993)
[8] Carnahan, B., Luther, H.A., Wilkes, J.O.: Applied Numerical Methods.
Wiley, New York (1969)
[10] Cheney, W., Kincaid, D.: Numerical Mathematics and Computing, 2nd
edn. Brooks/Cole, Monterey, CA (1994)
[17] Forsythe, G.E., Malcolm, M.A., Moler, C.B.: Computer Methods for
Mathematical Computation. Prentice-Hall, Englewood Cliffs, NJ (1977)
[18] Froberg, C.E.: Introduction to Numerical Analysis. Addison-Wesley
Publishing Company (1969)
[19] Gear, C.W.: Numerical Initial-Value Problems in Ordinary Differential
Equations. Prentice-Hall, Englewood Cliffs, NJ (1971)
[20] Gear, C.W.: Applied Numerical Analysis, 3rd edn. Addison-Wesley,
Reading, MA (1989)
[21] Hamming, R.W.: Numerical Methods for Scientists and Engineers. Wi-
ley, New York (1973)
[22] Hartley, H.O.: The modified Gauss-Newton method for fitting non-linear regression functions by least squares. Technometrics 3, 269–280 (1961)
[23] Henrici, P.H.: Elements of Numerical Analysis. Wiley, New York (1964)
[24] Henrici, P.: Error Propagation for Finite Difference Methods. John Wiley & Sons (1963)
[25] Hildebrand, F.B.: Introduction to Numerical Analysis, 2nd edn.
McGraw-Hill, New York (1974)
[26] Hoffman, J.: The Theory of Matrices in Numerical Analysis. Blaisdell,
New York (1964)
[27] Hoffman, J.: Numerical Methods for Engineers and Scientists. McGraw-
Hill, New York (1992)
[28] Householder, A.S.: Principles of Numerical Analysis. McGraw-Hill,
New York (1953)
[29] Hurty, W.C., Rubinstein, M.F.: Dynamics of Structures. Prentice-Hall
(1964)
[30] Isaacson, E., Keller, H.B.: Analysis of Numerical Methods. Wiley, New
York (1966)
[31] Lapidus, L., Pinder, G.F.: Numerical Solution of Partial Differential
Equations in Science and Engineering. Wiley, New York (1981)
[32] Lapidus, L., Seinfield, J.H.: Numerical Solution of Ordinary Differential
Equations. Academic Press, New York (1971)
[33] Maron, M.J.: Numerical Analysis, A Practical Approach. Macmillan,
New York (1982)
[37] Paz, M.: Structural Dynamics: Theory and Computations. Van Nos-
trand Reinhold Company (1984)
[38] Ralston, A., Rabinowitz, P.: A First Course in Numerical Analysis, 2nd
edn. New York (1978)
[39] Reddy, J.N.: An Introduction to the Finite Element Method, 3rd edn.
McGraw-Hill (2006)
[44] Stasa, F.L.: Applied Finite Element Analysis for Engineers. Holt, Rinehart and Winston, New York (1985)
[46] Surana, K.S., Ahmadi, A.R., Reddy, J.N.: The k-version of finite ele-
ment method for self-adjoint operators in BVP. International Journal
of Computational Engineering Science 3(2), 155–218 (2002)
[47] Surana, K.S., Ahmadi, A.R., Reddy, J.N.: The k-version of finite ele-
ment method for non-self-adjoint operators in BVP. International Jour-
nal of Computational Engineering Science 4(4), 737–812 (2003)
[48] Surana, K.S., Ahmadi, A.R., Reddy, J.N.: The k-version of finite ele-
ment method for non-linear operators in BVP. International Journal of
Computational Engineering Science 5(1), 133–207 (2004)
[49] Surana, K.S., Reddy, J.N.: The Finite Element Method for Boundary
Value Problems: Mathematics and Computations. CRC Press/Taylor
& Francis Group (2016)
[50] Surana, K.S., Reddy, J.N.: The Finite Element Method for Initial Value
Problems: Mathematics and Computations. CRC Press/Taylor & Fran-
cis Group (2017)
[51] Surana, K.S., Reddy, J.N., Allu, S.: The k-version of finite element
method for initial value problems: Mathematical and computational
framework. International Journal for Computational Methods in Engi-
neering Science and Mechanics 8(3), 123–136 (2007)
[53] Wilkinson, J.H., Reinsch, C.: Linear Algebra: Handbook for Automatic Computation, vol. II. Springer-Verlag, Berlin (1971)
F Periodic, 459
Periodic functions, 459–466
Factorization or decomposition Representation of arbitrary periodic func-
Crout Decomposition, 56–60 tion, 459
[L][U ] decomposition, 49–54 Sawtooth wave, 464
False position method, 99 Square wave, 462–463, 464
Convergence, 100 Time period, 459
Derivation, 99 Triangular wave, 465
Relative error (stopping criteria), 100 Fourth order Runge-Kutta, 445–447
Finite difference methods, 397–415
BVPs, 397–415 G
ODEs, 397–407
Eigenvalue problem, 405–407 Gauss Elimination, 34–46
Second order non-homogeneous ODE, Full pivoting, 43–46
397–407 back substitution, 43–46
Function values as BCs, 397– upper triangular form, 43–44
407 Naive, 34–39
Function values and derivatives back substitution, 36–38
as BCs, 402–405 upper triangular form, 35–36
PDEs Partial pivoting, 39–43
Laplace equation, 408–412 back substitution, 41–43
Poisson’s equation, 408, 412–415 upper triangular form, 39–40
IVPs Gauss-Jordan method, 46–49
Heun method, 444-445 Algorithm, 46–48
Runge-Kutta methods, 442-454 Examples, 48–49
Numerical differentiation, 347–354 Gauss quadrature, 288–300
Finite element method, 359–397 Examples, 300–305
Differential operators, 360–361 in R1 over [−1, 1], 288–295
Discretization, 366-369 n point quadrature, 292–293
FEM based on FL, 369–374 Three point quadrature, 290–292
FEM based on residual functional, Two point quadrature, 289–290
374–375 in R1 over [a, b], 295–296
FEM using GM/WF, 372 in R2 over [−1, 1] × [−1, 1], 296
concomitant, 373 in R2 over [a, b] × [c, d], 273
EBC, NCM, 373 in R3 over a two unit cube, 297
PV, SV, 373 in R3 over a prism, 299
weak form, 372–373 Gauss-Seidel method, 68–74
FEM using GM, PGM, WRM, 371– Algorithm, 68–69
372 Convergence criterion, 70
assembly, 372 Examples, 70–74
element equations, 371 Gradient method, also see Newton’s method
integral form, 369–374 or Newton-Raphson method, 102
local approximations in R1 , 379 in R1 , 102–107
mapping in R1 , 379 error analysis, 105–106
second order ODE, 375 examples, 106–107
Global approximations, 369 method, 102–104
Integral form, 361 in R2 , 118–123
based on Fundamental Lemma, 362– example, 120–123
365 method, 118–120
residual functional, 365–366
Local approximations, 369–370 H
First Backward difference, 349–350
First Forward difference, 349 Half interval method, 95–98
First order approximation, 349–350 Harmonics, 459
First order ODEs, 360–407, 437–454 Heun’s method, 444–445
First order Runge-Kutta methods, 442 Higher order approximation, 350–353
Fourier series, 459–463 Householder’s method, also see Eigenvalue
Determination of coefficients, 459–461 problems, 180–186
Fundamental frequency, 459 House holder transformation, see Eigen-
Harmonic, 459 value problems, 181–183
Open interval, see BVPs and IVPs and root Trapezoidal Rule, 271–272
finding methods in R2
Ordinary Differential Equations Examples, 300–305
Boundary Value Problem, 359–407 over [−1, 1] × [−1, 1], 296
Finite difference method, 397–407 over [a, b] × [c, d], 297
Finite element method, 366–397 in R3
Initial Value Problem, 425–454 over a prism, 299
Finite element method, 434–436 over two unit cube, 298
Time integration of ODEs, 437–454
R
P
Relative error
Partial differential equation Bisection method, 95
Finite difference method, 408–415 False position method, 100
Partial pivoting, 39–43 Fixed pint method, 114
PDEs, see partial differential equations Gauss-Seidel method, 70
Pivoting, 39–43, 43–46 Newton’s method
Gauss elimination first order, Newton-Raphson, 104
full, 43–46 nonlinear simultaneous equations, 119
partial, 39–43 second order, 108
Poisson’s equation, 408 Romberg integration, 285
Polynomial interpolation or polynomial, also Relaxation techniques, 80, 82
see Interpolation Residual, 312-315, 365, 394
in R1 Romberg integration, 285–286
approximate error, see Approximate Roots of equations
relative error Bisection method (Half-interval method),
definition, 195–196 95–98
Lagrange interpolating polynomial, False position, 99-102
198–217 Fixed point, 114–116
Newton’s interpolating polynomial, Graphical, 91–92
251–255 Incremental search, 92–95
Pascale rectangle, 222 Newton-Raphson (Newton’s linear) method,
piecewise linear, 196 102–107
polynomial interpolation, 197–198 Newton’s second order method, 108–
in R2 , Lagrange interpolation, Tensor 113
product, 217–237, 224–231 Secant method, 113–114
in R3 , Lagrange interpolation, Tensor Runge-Kutta methods, 442–454
product, 237–247, 237–247 First order Runge-Kutta method, 442
Fourth order Runge-Kutta method, 445–
Q 447
Second order Runge-Kutta method, 443–
QR iteration, 183 445
Quadratic convergence, 102–106 Third order Runge-Kutta method, 445
Quadratic interpolation, 198–217
Quadrature, also see Integration or numeri- S
cal integration
in R1 , 269 Secant method, 113–114
Examples, 276–283, 286–288, 300– Serendipity (interpolation), 232–237
305 Shape functions, 369–370
Gauss quadrature, 288–300 Simpson’s method
n-point quadrature, 292–293 1/3 Rule, 272–274
over [−1, 1], 288–295 3/8 Rule, 274–276
over [a, b], 295–296 Simultaneous equations, also System of lin-
two point quadrature, 289–290 ear algebraic equations, 10–80
three point quadrature, 290–292 Definitions (linear, nonlinear), 10
Newton-Cotes integration, 276 Elementary row operations, 26
Richardson’s extrapolation, 284–285 Linear dependence, 20
Romberg method, 285–286 Linear independence, 20
Simpson’s 1/3 Rule, 272–274 Matrix and vector representation, 25
Simpson’s 3/8 Rule, 274–276 Methods of solution
T
Taylor series, 102, 105, 108, 109, 118, 256,
329, 348–354, 397, 398, 437, 443
Third order Runge-Kutta method, 445
Trace of matrices, 15
Transpose of a matrix, 15
Trapezoid rule, 271–273
Triangular matrices, 13
Tridiagonal matrices (banded), 13