Problem Discretization using Approximation Theory

The original problem, which may not be directly computable, is replaced by a computationally tractable approximation, and we compute an approximate solution using the computable version. In this module, we explain the process of problem approximation using various approaches available in the literature. In the end, we distill out generic equation forms that frequently arise in the process of problem approximation.
1 Unified Problem Representation

Using the generalized concepts of vectors and vector spaces discussed in the previous module, we can look at mathematical models in engineering as transformations, which map a subset of vectors from one vector space to a subset in another space.

Definition 1 (Transformation): Let $X$ and $Y$ be linear spaces and let $M$ be a subset of $X$. A rule which associates with every element $x \in M$ an element $y \in Y$ is said to be a transformation from $X$ to $Y$ with domain $M$. If $y$ corresponds to $x$ under the transformation, we write $y = T(x)$, where $T(\cdot)$ is called an operator.

The set of all elements for which an operator $T$ is defined is called the domain of $T$, and the set of all elements generated by transforming elements in the domain by $T$ is called the range of $T$. If for every $y \in Y$ there is at most one $x \in M$ for which $T(x) = y$, then $T(\cdot)$ is said to be one to one. If for every $y \in Y$ there is at least one $x \in M$, then $T$ is said to map $M$ onto $Y$. A transformation is said to be invertible if it is one to one and onto.
Definition 2 (Linear Transformation): A transformation $T$ mapping a vector space $X$ into a vector space $Y$ is said to be linear if for every $x^{(1)}, x^{(2)} \in X$ and all scalars $\alpha, \beta$ we have

$$T(\alpha x^{(1)} + \beta x^{(2)}) = \alpha\, T(x^{(1)}) + \beta\, T(x^{(2)}) \qquad (1)$$

Note that any transformation that does not satisfy the above definition is not a linear transformation.
Definition 3 (Continuous Transformation): A transformation $T : M \rightarrow Y$ is continuous at a point $x^{*} \in M$ if and only if $x^{(n)} \rightarrow x^{*}$ implies $T(x^{(n)}) \rightarrow T(x^{*})$. If $T(\cdot)$ is continuous at each $x \in M$, then it is said to be continuous on $M$.

The computable approximation of the original problem takes the form

$$\widetilde{y} = \widetilde{T}(\widetilde{x}) \qquad (6)$$

where $\widetilde{x} \in X_n$ and $\widetilde{y} \in Y_n$ belong to finite dimensional spaces and $\widetilde{T}(\cdot)$ is an approximation of the original operator $T(\cdot)$. This process is called discretization. The main strategy used for discretization is the approximation of continuous functions using finite order polynomials. In the sections that follow, we discuss the theoretical basis for this choice and different commonly used polynomial based approaches for problem discretization.
2 Polynomial Approximation [3]

Given an arbitrary continuous function over an interval, can we approximate it with another simple function with an arbitrary degree of accuracy? This question assumes significant importance while developing many numerical methods. In fact, this question can be posed in any general vector space. We often use such simple approximations while performing computations. The classic examples of such approximations are the use of a rational number to approximate an irrational number (e.g. 22/7 is used in place of $\pi$, or a finite series expansion of the number $e$) and the polynomial approximation of a continuous function. This section discusses the rationale behind such approximations.

Definition 5 (Dense Set): A set $D$ is said to be dense in a normed space $X$ if for each element $x \in X$ and every $\varepsilon > 0$ there exists an element $d \in D$ such that $\|x - d\| < \varepsilon$.

Thus, if a set $D$ is dense in $X$, then there are points of $D$ arbitrarily close to any element of $X$. Given any $x \in X$, a sequence can be constructed in $D$ which converges to $x$. A classic example of such a dense set is the set of rational numbers in the real line. Another dense set, which is widely used for approximations, is the set of polynomials. This set is dense in $C[a,b]$ and any continuous function $f(t) \in C[a,b]$ can be approximated by a polynomial function $p(t)$ with an arbitrary degree of accuracy, as is evident from the following result. This classical result is stated here without proof.
Theorem 6 (Weierstrass Approximation Theorem): Consider the space $C[a,b]$, the set of all continuous functions over the interval $[a,b]$, together with the norm defined on it as

$$\|f(t)\|_{\infty} = \max_{t \in [a,b]} |f(t)| \qquad (7)$$

Given any $\varepsilon > 0$, for every $f(t) \in C[a,b]$ there exists a polynomial $p_n(t)$ such that $\|f(t) - p_n(t)\| < \varepsilon$.

This fundamental result forms the basis of problem discretization in the majority of cases. It may be noted that this is only an existence theorem and does not provide any method of constructing a polynomial approximation. The following three approaches are mainly used for constructing approximating polynomials:

- Taylor series expansion
- Polynomial interpolation
- Least square approximation

These approaches and their applications to problem discretization are discussed in detail in the sections that follow.
3 Discretization using Taylor Series Approximation

3.1 Local approximation by Taylor series expansion [14, 9]

To begin with, let us consider the Taylor series expansion for a real valued scalar function. Given any scalar function $f(x) : \mathbb{R} \rightarrow \mathbb{R}$, which is continuously differentiable $n+1$ times at $x = \bar{x}$, the Taylor series expansion attempts to construct a local polynomial approximation of the form

$$p_n(x) = a_0 + a_1(x - \bar{x}) + \dots + a_n(x - \bar{x})^n \qquad (8)$$

of $f(x)$ in the neighborhood of the point $x = \bar{x}$, such that

$$\frac{d^k p_n(\bar{x})}{dx^k} = \frac{d^k f(\bar{x})}{dx^k} \qquad (9)$$

for $k = 0, 1, 2, \dots, n$. For $k = 0$, we have

$$p_n(\bar{x}) = a_0 = f(\bar{x})$$

Similarly, for $k = 1$, the derivative condition (9) reduces to

$$\frac{dp_n(\bar{x})}{dx} = \left[a_1 + 2a_2(x - \bar{x}) + \dots + n\,a_n(x - \bar{x})^{n-1}\right]_{x=\bar{x}} = a_1 = \frac{df(\bar{x})}{dx}$$

and, in general, for the $k$th derivative we have

$$\frac{d^k p_n(\bar{x})}{dx^k} = \left[(k!)\,a_k + \left((k+1)k\cdots 2\right)a_{k+1}(x - \bar{x}) + \dots + \left(n(n-1)\cdots(n-k)\right)a_n(x - \bar{x})^{n-k}\right]_{x=\bar{x}} = k!\,a_k$$

$$\Rightarrow \quad a_k = \frac{1}{k!}\,\frac{d^k f(\bar{x})}{dx^k} \qquad (10)$$
Thus, the local polynomial approximation $p_n(x)$ can be expressed as

$$p_n(x) = f(\bar{x}) + \left[\frac{df(\bar{x})}{dx}\right]\delta x + \frac{1}{2!}\left[\frac{d^2 f(\bar{x})}{dx^2}\right](\delta x)^2 + \dots + \frac{1}{n!}\left[\frac{d^n f(\bar{x})}{dx^n}\right](\delta x)^n \qquad (11)$$

where $\delta x = x - \bar{x}$. The residual, or approximation error, $r_n(\bar{x}, \delta x)$, defined as

$$r_n(\bar{x}, \delta x) = f(x) - p_n(x) \qquad (12)$$

plays an important role in the analysis. The Taylor theorem gives the following analytical expression for the residual term

$$r_n(\bar{x}, \delta x) = \frac{1}{(n+1)!}\,\frac{d^{n+1} f(\bar{x} + \lambda\,\delta x)}{dx^{n+1}}\,(\delta x)^{n+1}, \qquad \text{where } 0 < \lambda < 1 \qquad (13)$$

which is derived by application of the mean value theorem and Rolle's theorem on the interval $[\bar{x}, x]$ [14]. Thus, given a scalar function $f(x) : \mathbb{R} \rightarrow \mathbb{R}$, which is continuously differentiable $n+1$ times at $x = \bar{x}$, the Taylor series expansion of this function can be expressed as follows

$$f(x) = f(\bar{x}) + \left[\frac{df(\bar{x})}{dx}\right]\delta x + \frac{1}{2!}\left[\frac{d^2 f(\bar{x})}{dx^2}\right](\delta x)^2 + \dots + \frac{1}{n!}\left[\frac{d^n f(\bar{x})}{dx^n}\right](\delta x)^n + r_n(\bar{x}, \delta x) \qquad (14)$$
While developing numerical methods, we require a more general, multi-dimensional version of the Taylor series expansion. Given a function $\mathbf{F}(\mathbf{x}) : \mathbb{R}^n \rightarrow \mathbb{R}^m$, which is continuously differentiable $n+1$ times at $\mathbf{x} = \bar{\mathbf{x}}$, the Taylor series expansion of this function in the neighborhood of the point $\mathbf{x} = \bar{\mathbf{x}}$ can be expressed as follows

$$\mathbf{F}(\mathbf{x}) = \mathbf{P}_n(\mathbf{x}) + \mathbf{R}_n(\bar{\mathbf{x}}, \delta\mathbf{x}) \qquad (15)$$

$$\mathbf{P}_n(\mathbf{x}) = \mathbf{F}(\bar{\mathbf{x}}) + \left[\frac{\partial\mathbf{F}(\bar{\mathbf{x}})}{\partial\mathbf{x}}\right]\delta\mathbf{x} + \frac{1}{2!}\left[\frac{\partial^2\mathbf{F}(\bar{\mathbf{x}})}{\partial\mathbf{x}^2}\right](\delta\mathbf{x}, \delta\mathbf{x}) + \dots + \frac{1}{n!}\left[\frac{\partial^n\mathbf{F}(\bar{\mathbf{x}})}{\partial\mathbf{x}^n}\right](\delta\mathbf{x}, \delta\mathbf{x}, \dots, \delta\mathbf{x}) \qquad (16)$$

where $\delta\mathbf{x} = \mathbf{x} - \bar{\mathbf{x}}$ and the residual $\mathbf{R}_n(\bar{\mathbf{x}}, \delta\mathbf{x})$ is defined as follows

$$\mathbf{R}_n(\bar{\mathbf{x}}, \delta\mathbf{x}) = \frac{1}{(n+1)!}\,\frac{\partial^{n+1}\mathbf{F}(\bar{\mathbf{x}} + \lambda\,\delta\mathbf{x})}{\partial\mathbf{x}^{n+1}}\,(\delta\mathbf{x}, \delta\mathbf{x}, \dots, \delta\mathbf{x}), \qquad \text{where } 0 < \lambda < 1 \qquad (17)$$

Here, $\mathbf{F}(\bar{\mathbf{x}}) \in \mathbb{R}^m$, the Jacobian $\left[\partial\mathbf{F}(\bar{\mathbf{x}})/\partial\mathbf{x}\right]$ is a matrix of dimension $(m \times n)$, $\left[\partial^2\mathbf{F}(\bar{\mathbf{x}})/\partial\mathbf{x}^2\right]$ is an $(m \times n \times n)$ dimensional array, and so on. In general, $\left[\partial^r\mathbf{F}(\bar{\mathbf{x}})/\partial\mathbf{x}^r\right]$ is an $(m \times n \times n \times \dots \times n)$ dimensional array such that when the vector $\delta\mathbf{x}$ operates on it $r$ times, the result is an $m \times 1$ vector. It may be noted that the multi-dimensional polynomial given by equation (16) satisfies the condition

$$\frac{d^k\mathbf{P}_n(\bar{\mathbf{x}})}{d\mathbf{x}^k} = \frac{d^k\mathbf{F}(\bar{\mathbf{x}})}{d\mathbf{x}^k} \qquad (18)$$

for $k = 1, 2, \dots, n$. The following two multi-dimensional cases are used very frequently in numerical analysis.
Case A: Scalar Function $\phi(\mathbf{x}) : \mathbb{R}^n \rightarrow \mathbb{R}$

$$\phi(\mathbf{x}) = \phi(\bar{\mathbf{x}}) + \left[\nabla\phi(\bar{\mathbf{x}})\right]^T \delta\mathbf{x} + \frac{1}{2!}\,\delta\mathbf{x}^T\left[\nabla^2\phi(\bar{\mathbf{x}})\right]\delta\mathbf{x} + R_3(\bar{\mathbf{x}}, \delta\mathbf{x})$$

$$\nabla\phi(\bar{\mathbf{x}}) = \left[\frac{\partial\phi(\bar{\mathbf{x}})}{\partial\mathbf{x}}\right] = \left[\begin{array}{cccc} \dfrac{\partial\phi}{\partial x_1} & \dfrac{\partial\phi}{\partial x_2} & \dots & \dfrac{\partial\phi}{\partial x_n} \end{array}\right]^T_{\mathbf{x}=\bar{\mathbf{x}}}$$

$$\nabla^2\phi(\bar{\mathbf{x}}) = \left[\frac{\partial^2\phi(\bar{\mathbf{x}})}{\partial\mathbf{x}^2}\right] = \left[\begin{array}{cccc} \dfrac{\partial^2\phi}{\partial x_1^2} & \dfrac{\partial^2\phi}{\partial x_1\partial x_2} & \dots & \dfrac{\partial^2\phi}{\partial x_1\partial x_n} \\ \dfrac{\partial^2\phi}{\partial x_2\partial x_1} & \dfrac{\partial^2\phi}{\partial x_2^2} & \dots & \dfrac{\partial^2\phi}{\partial x_2\partial x_n} \\ \dots & \dots & \dots & \dots \\ \dfrac{\partial^2\phi}{\partial x_n\partial x_1} & \dfrac{\partial^2\phi}{\partial x_n\partial x_2} & \dots & \dfrac{\partial^2\phi}{\partial x_n^2} \end{array}\right]_{\mathbf{x}=\bar{\mathbf{x}}}$$

$$R_3(\bar{\mathbf{x}}, \delta\mathbf{x}) = \frac{1}{3!}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\frac{\partial^3\phi(\bar{\mathbf{x}} + \lambda\,\delta\mathbf{x})}{\partial x_i\,\partial x_j\,\partial x_k}\,\delta x_i\,\delta x_j\,\delta x_k \,; \qquad (0 < \lambda < 1)$$

Here, $\nabla\phi(\bar{\mathbf{x}})$, referred to as the gradient, is an $n \times 1$ vector and $\left[\nabla^2\phi(\bar{\mathbf{x}})\right]$, known as the Hessian, is an $n \times n$ matrix. It may be noted that the Hessian is always a symmetric matrix.
Example 7: Consider the scalar function $\phi(\mathbf{x}) : \mathbb{R}^2 \rightarrow \mathbb{R}$

$$\phi(\mathbf{x}) = x_1^2 + x_2^2 + e^{(x_1 + x_2)}$$

which can be approximated in the neighborhood of $\bar{\mathbf{x}} = \left[\begin{array}{cc} 1 & 1 \end{array}\right]^T$ using the Taylor series expansion as

$$\phi(\mathbf{x}) = \phi(\bar{\mathbf{x}}) + \left[\begin{array}{cc} \dfrac{\partial\phi}{\partial x_1} & \dfrac{\partial\phi}{\partial x_2} \end{array}\right]_{\mathbf{x}=\bar{\mathbf{x}}}\delta\mathbf{x} + \frac{1}{2}\left[\delta\mathbf{x}\right]^T\left[\begin{array}{cc} \dfrac{\partial^2\phi}{\partial x_1^2} & \dfrac{\partial^2\phi}{\partial x_1\partial x_2} \\ \dfrac{\partial^2\phi}{\partial x_2\partial x_1} & \dfrac{\partial^2\phi}{\partial x_2^2} \end{array}\right]_{\mathbf{x}=\bar{\mathbf{x}}}\delta\mathbf{x} + R_3(\bar{\mathbf{x}}, \delta\mathbf{x}) \qquad (19)$$

$$= (2 + e^2) + \left[\begin{array}{cc} (2 + e^2) & (2 + e^2) \end{array}\right]\left[\begin{array}{c} x_1 - 1 \\ x_2 - 1 \end{array}\right] + \frac{1}{2}\left[\begin{array}{c} x_1 - 1 \\ x_2 - 1 \end{array}\right]^T\left[\begin{array}{cc} (2 + e^2) & e^2 \\ e^2 & (2 + e^2) \end{array}\right]\left[\begin{array}{c} x_1 - 1 \\ x_2 - 1 \end{array}\right] + R_3(\bar{\mathbf{x}}, \delta\mathbf{x}) \qquad (20)$$
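The quadratic Taylor approximation of Example 7 can be checked numerically. The following sketch is illustrative and not part of the original notes; the evaluation point is an arbitrary assumption.

```python
import numpy as np

# Quadratic Taylor approximation of phi(x) = x1^2 + x2^2 + exp(x1 + x2) around xbar = [1, 1].
def phi(x):
    return x[0]**2 + x[1]**2 + np.exp(x[0] + x[1])

xbar = np.array([1.0, 1.0])
e2 = np.exp(2.0)
grad = np.array([2.0 + e2, 2.0 + e2])               # gradient at xbar, as in (20)
hess = np.array([[2.0 + e2, e2], [e2, 2.0 + e2]])   # Hessian at xbar, as in (20)

def phi_quad(x):
    dx = x - xbar
    return phi(xbar) + grad @ dx + 0.5 * dx @ hess @ dx

x = np.array([1.1, 0.95])
print(phi(x), phi_quad(x))   # the two values agree up to the O(||dx||^3) residual R3
```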
Case B: Function Vector $\mathbf{F}(\mathbf{x}) : \mathbb{R}^n \rightarrow \mathbb{R}^n$

$$\mathbf{F}(\mathbf{x}) = \mathbf{F}(\bar{\mathbf{x}}) + \left[\frac{\partial\mathbf{F}(\bar{\mathbf{x}})}{\partial\mathbf{x}}\right]\delta\mathbf{x} + \mathbf{R}_2(\bar{\mathbf{x}}, \delta\mathbf{x}) \qquad (21)$$

$$\left[\frac{\partial\mathbf{F}(\bar{\mathbf{x}})}{\partial\mathbf{x}}\right] = \left[\begin{array}{cccc} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \dots & \dfrac{\partial f_1}{\partial x_n} \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \dots & \dfrac{\partial f_2}{\partial x_n} \\ \dots & \dots & \dots & \dots \\ \dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \dots & \dfrac{\partial f_n}{\partial x_n} \end{array}\right]_{\mathbf{x}=\bar{\mathbf{x}}}$$

Here, $\left[\partial\mathbf{F}(\bar{\mathbf{x}})/\partial\mathbf{x}\right]$, referred to as the Jacobian matrix, is an $n \times n$ matrix.
Example 8: Consider the function vector $\mathbf{F}(\mathbf{x}) : \mathbb{R}^2 \rightarrow \mathbb{R}^2$

$$\mathbf{F}(\mathbf{x}) = \left[\begin{array}{c} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \end{array}\right] = \left[\begin{array}{c} x_1^2 + x_2^2 + 2x_1 x_2 \\ x_1 x_2\,e^{(x_1 + x_2)} \end{array}\right]$$

which can be approximated in the neighborhood of $\bar{\mathbf{x}} = \left[\begin{array}{cc} 1 & 1 \end{array}\right]^T$ using the Taylor series expansion as follows

$$\mathbf{F}(\mathbf{x}) = \left[\begin{array}{c} f_1(\bar{\mathbf{x}}) \\ f_2(\bar{\mathbf{x}}) \end{array}\right] + \left[\begin{array}{cc} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} \end{array}\right]_{\mathbf{x}=\bar{\mathbf{x}}}\delta\mathbf{x} + \mathbf{R}_2(\bar{\mathbf{x}}, \delta\mathbf{x})$$

$$= \left[\begin{array}{c} 4 \\ e^2 \end{array}\right] + \left[\begin{array}{cc} 4 & 4 \\ 2e^2 & 2e^2 \end{array}\right]\left[\begin{array}{c} x_1 - 1 \\ x_2 - 1 \end{array}\right] + \mathbf{R}_2(\bar{\mathbf{x}}, \delta\mathbf{x})$$
3.2 Discretization using Finite Difference Method [2]

To begin with, we present an application of the scalar Taylor series expansion to discretization of ODE-BVPs and PDEs. Even when the domain of the function under consideration is multivariate, the Taylor series approximation is applied locally by considering one variable at a time.

3.2.1 Discretization of ODE-BVPs

Consider the following general form of 2nd order ODE-BVP problem frequently encountered in engineering problems

$$\Psi\left[\frac{d^2 u}{dz^2}, \frac{du}{dz}, u, z\right] = 0 \quad \text{for } z \in (0, 1) \qquad (22)$$

$$\text{B.C. 1 (at } z = 0\text{):} \quad f_1\left(\frac{du}{dz}, u, 0\right) = 0 \qquad (23)$$

$$\text{B.C. 2 (at } z = 1\text{):} \quad f_2\left(\frac{du}{dz}, u, 1\right) = 0 \qquad (24a)$$

Let $u^{*}(z) \in C^{(2)}[0,1]$ denote the exact / true solution of the above ODE-BVP. Depending on the nature of the operator, it may or may not be possible to find the true solution to the problem. In the present case, however, we are interested in finding an approximate numerical solution, say $\widetilde{u}(z)$, to the above ODE-BVP. The basic idea in the finite difference approach is to convert the ODE-BVP into a set of coupled linear or nonlinear algebraic equations using local approximation of the derivatives based on the Taylor series expansion. In order to achieve this, the domain $0 \leq z \leq 1$ is divided into $(n+1)$ grid points $z_1, \dots, z_n, z_{n+1}$ located such that

$$z_1 = 0 < z_2 < z_3 < \dots < z_{n+1} = 1$$

The simplest option is to choose them equidistant, i.e.

$$z_i = (i-1)(\Delta z) = (i-1)/n \quad \text{for } i = 1, 2, \dots, n+1$$

which is considered for the subsequent development. Let the value of the approximate solution $\widetilde{u}(z)$ at location $z_i$ be denoted as $u_i = \widetilde{u}(z_i)$. If $\Delta z$ is sufficiently small, then, using the Taylor series expansion, we can write

$$u_{i+1} = u(z_{i+1}) = u(z_i + \Delta z) = u_i + \frac{du(z_i)}{dz}(\Delta z) + \frac{1}{2!}\frac{d^2 u(z_i)}{dz^2}(\Delta z)^2 + \frac{1}{3!}\frac{d^3 u(z_i)}{dz^3}(\Delta z)^3 + r_4(z_i, \Delta z) \qquad (25)$$

Similarly, using the Taylor series expansion, we can express $u_{i-1} = u(z_{i-1}) = u(z_i - \Delta z)$ as follows

$$u_{i-1} = u_i - \frac{du(z_i)}{dz}(\Delta z) + \frac{1}{2!}\frac{d^2 u(z_i)}{dz^2}(\Delta z)^2 - \frac{1}{3!}\frac{d^3 u(z_i)}{dz^3}(\Delta z)^3 + r_4(z_i, \Delta z) \qquad (26)$$
From equations (25) and (26) we can arrive at several approximate expressions for $\left[du(z_i)/dz\right]$. Rearranging equation (25) we obtain

$$\frac{du(z_i)}{dz} = \frac{(u_{i+1} - u_i)}{\Delta z} - \left[\frac{d^2 u(z_i)}{dz^2}\left(\frac{\Delta z}{2}\right) + \dots\right] \qquad (27)$$

and, when $\Delta z$ is sufficiently small, we obtain the forward difference approximation of the local first order derivative by neglecting the higher order terms, i.e.

$$\frac{du(z_i)}{dz} \simeq \frac{(u_{i+1} - u_i)}{\Delta z}$$

Similarly, starting from equation (26), we can arrive at the backward difference approximation of the local first order derivative, i.e.

$$\frac{du(z_i)}{dz} \simeq \frac{(u_i - u_{i-1})}{\Delta z} \qquad (28)$$

It may be noted that the errors in the approximation in each case are of the order of $\Delta z$, which is denoted as $O(\Delta z)$. Alternatively, subtracting equation (26) from (25) and rearranging, we can arrive at the following expression

$$\frac{du(z_i)}{dz} = \frac{(u_{i+1} - u_{i-1})}{2(\Delta z)} - \left[u_i^{(3)}\frac{(\Delta z)^2}{3!} + \dots\right] \qquad (29)$$

and, for sufficiently small $\Delta z$, we obtain the central difference approximation of the local first order derivative by neglecting the terms of order higher than $(\Delta z)^2$, i.e.

$$\frac{du(z_i)}{dz} \simeq \frac{(u_{i+1} - u_{i-1})}{2(\Delta z)} \qquad (30)$$

The central difference approximation is accurate to $O[(\Delta z)^2]$ and is more commonly used.
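The stated orders of accuracy can be verified numerically. The following check is an illustrative addition, using an assumed test function $u(z) = \sin(z)$.

```python
import numpy as np

# Numeric check of the orders of accuracy of the forward (27) and central (29) differences.
u, du = np.sin, np.cos
z = 1.0
for dz in [0.1, 0.05, 0.025]:
    fwd = (u(z + dz) - u(z)) / dz
    cen = (u(z + dz) - u(z - dz)) / (2 * dz)
    print(dz, abs(fwd - du(z)), abs(cen - du(z)))
# Halving dz roughly halves the forward difference error, O(dz),
# and quarters the central difference error, O(dz^2).
```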
To arrive at an approximation for the second order derivative at the $i$th grid point, adding equation (26) to (25) and rearranging, we have

$$\frac{d^2 u(z_i)}{dz^2} = \frac{(u_{i+1} - 2u_i + u_{i-1})}{(\Delta z)^2} - \left[2\,\frac{d^4 u(z_i)}{dz^4}\,\frac{(\Delta z)^2}{4!} + \dots\right] \qquad (31)$$

When $\Delta z$ is sufficiently small, we obtain the following approximation for the second derivative

$$\frac{d^2 u(z_i)}{dz^2} \simeq \frac{u_{i+1} - 2u_i + u_{i-1}}{(\Delta z)^2} \qquad (32)$$

Note that the errors in the approximations (30) and (32) are of order $O[(\Delta z)^2]$. These approximations of the local derivatives can now be used to discretize the ODE-BVP. While discretizing the ODE, it is preferable to use approximations having similar accuracies. The basic idea is to enforce the approximation of equation (22) at each internal grid point. The remaining equations are obtained from discretization of the boundary conditions. The steps involved in the discretization can be summarized as follows:
Step 1: Force the residual $R_i$ at each internal grid point to zero, i.e.,

$$R_i = \Psi\left[\frac{(u_{i+1} - 2u_i + u_{i-1})}{(\Delta z)^2}, \frac{(u_{i+1} - u_{i-1})}{2(\Delta z)}, u_i, z_i\right] = 0 \qquad (33)$$

$$i = 2, 3, \dots, n \qquad (34)$$

This gives rise to $(n-1)$ equations in $(n+1)$ unknowns $\{u_i : i = 1, 2, \dots, n+1\}$.

Step 2: Use the boundary conditions to generate the remaining algebraic equations. This can be carried out using either of the following two approaches.

Approach 1: Use one-sided derivatives only at the boundary points, i.e.,

$$f_1\left[\frac{(u_2 - u_1)}{\Delta z}, u_1, 0\right] = 0 \qquad (35)$$

$$f_2\left[\frac{(u_{n+1} - u_n)}{\Delta z}, u_{n+1}, 1\right] = 0 \qquad (36)$$

This gives the remaining two equations.

Approach 2: This approach introduces two more variables $u_0$ and $u_{n+2}$ at two hypothetical grid points located at

$$z_0 = z_1 - \Delta z = -\Delta z \,; \qquad z_{n+2} = z_{n+1} + \Delta z = 1 + \Delta z$$

With the introduction of these hypothetical points, the boundary conditions are evaluated as

$$f_1\left[\frac{(u_2 - u_0)}{2\Delta z}, u_1, 0\right] = 0 \qquad (37)$$

$$f_2\left[\frac{(u_{n+2} - u_n)}{2\Delta z}, u_{n+1}, 1\right] = 0 \qquad (38)$$

Now we have $n+3$ variables and $n+1$ algebraic constraints. Two additional algebraic equations are generated by setting the residuals at the boundary points, i.e. at $z_1$ and $z_{n+1}$, to zero:

$$R_1\;(z = 0) = \Psi\left[\frac{(u_2 - 2u_1 + u_0)}{(\Delta z)^2}, \frac{(u_2 - u_0)}{2(\Delta z)}, u_1, 0\right] = 0$$

$$R_{n+1}\;(z = 1) = \Psi\left[\frac{(u_{n+2} - 2u_{n+1} + u_n)}{(\Delta z)^2}, \frac{(u_{n+2} - u_n)}{2(\Delta z)}, u_{n+1}, 1\right] = 0$$

This results in $(n+3)$ equations in $(n+3)$ unknowns $\{u_i : i = 0, 1, 2, \dots, n+2\}$.

It may be noted that the local approximations of the derivatives are developed under the assumption that $\Delta z$ is chosen sufficiently small. Consequently, it can be expected that the quality of the approximate solution would improve with an increase in the number of grid points.
Example 9: Consider steady state heat transfer / conduction in a slab of thickness $L$, in which energy is generated at a constant rate of $q$ W/m$^3$. The boundary at $z = 0$ is maintained at a constant temperature $T^{*}$, while the boundary at $z = L$ loses heat by convection to the surroundings at temperature $T_{\infty}$. The governing equation and boundary conditions are

$$k\,\frac{d^2 T}{dz^2} + q = 0 \quad \text{for } 0 < z < L \qquad (39)$$

$$\text{B.C. at } z = 0: \quad T(0) = T^{*} \qquad (40)$$

$$\text{B.C. at } z = L: \quad k\left[\frac{dT}{dz}\right]_{z=L} = h\left[T_{\infty} - T(L)\right] \qquad (41)$$

Note that this problem can be solved analytically. However, it is used here to introduce the concepts of discretization by the finite difference approach. Dividing the region $0 \leq z \leq L$ into $n$ equal subregions with $\Delta z = L/n$ and setting the residuals to zero at the internal grid points, we have

$$\frac{(T_{i+1} - 2T_i + T_{i-1})}{(\Delta z)^2} + \frac{q}{k} = 0 \qquad (42)$$

for $i = 2, 3, \dots, n$. Using the boundary condition (40), i.e. $T_1 = T^{*}$, the residual at $z_2$ reduces to

$$-2T_2 + T_3 = -(\Delta z)^2\left(\frac{q}{k}\right) - T^{*} \qquad (43)$$

Using a one-sided derivative at $z = L$, boundary condition (41) reduces to

$$k\,\frac{(T_{n+1} - T_n)}{(\Delta z)} = h\,(T_{\infty} - T_{n+1}) \qquad (44)$$

or

$$T_{n+1}\left(1 + \frac{h\,\Delta z}{k}\right) - T_n = \frac{h\,\Delta z\,T_{\infty}}{k} \qquad (45)$$

Rearranging the equations in matrix form, we have $\mathbf{A}\mathbf{x} = \mathbf{y}$, where

$$\mathbf{x} = \left[\begin{array}{cccc} T_2 & T_3 & \dots & T_{n+1} \end{array}\right]^T$$

$$\mathbf{y} = \left[\begin{array}{ccccc} -(\Delta z)^2(q/k) - T^{*} & -(\Delta z)^2(q/k) & \dots & -(\Delta z)^2(q/k) & h\,\Delta z\,T_{\infty}/k \end{array}\right]^T$$

$$\mathbf{A} = \left[\begin{array}{cccccc} -2 & 1 & 0 & 0 & \dots & 0 \\ 1 & -2 & 1 & 0 & \dots & 0 \\ 0 & 1 & -2 & 1 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots \\ \dots & \dots & \dots & \dots & -2 & 1 \\ 0 & 0 & \dots & \dots & -1 & (1 + h\,\Delta z/k) \end{array}\right]$$

Thus, after discretization, the ODE-BVP is reduced to a set of linear algebraic equations and the transformed operator is $\widetilde{T} = \mathbf{A}$. It may also be noted that we end up with a tridiagonal matrix $\mathbf{A}$, which is a sparse matrix, i.e. it contains a large number of zero elements.
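The assembly and solution of the tridiagonal system of Example 9 can be sketched as follows. The code is an illustrative addition; the physical parameter values are assumptions, not part of the original example.

```python
import numpy as np

# Finite difference solution of k T'' + q = 0, T(0) = Tstar, k T'(L) = h (Tinf - T(L)).
L, n = 0.1, 50
k, h, q = 20.0, 100.0, 1.0e5
Tstar, Tinf = 300.0, 300.0
dz = L / n

A = np.zeros((n, n))          # unknowns T_2 ... T_{n+1}
y = np.zeros(n)
for row in range(n - 1):      # interior residuals (42)
    if row > 0:
        A[row, row - 1] = 1.0
    A[row, row] = -2.0
    A[row, row + 1] = 1.0
    y[row] = -dz**2 * q / k
y[0] -= Tstar                 # boundary condition (40) folded into the first residual, eq. (43)
A[n - 1, n - 2] = -1.0        # discretized convective boundary condition (45)
A[n - 1, n - 1] = 1.0 + h * dz / k
y[n - 1] = h * dz * Tinf / k

T = np.linalg.solve(A, y)
print(T[0], T[-1])            # approximate temperatures at z_2 and z = L
```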
Example 10: Consider the ODE-BVP describing the steady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out at a constant temperature. The steady state behavior can be modelled using the following ODE-BVP:

$$\frac{1}{Pe}\frac{d^2 C}{dz^2} - \frac{dC}{dz} - Da\,C^2 = 0 \quad (0 \leq z \leq 1) \qquad (46)$$

$$\text{B.C. at } z = 0: \quad \frac{dC}{dz} = Pe\,(C - 1) \;\text{ at } z = 0 \qquad (47)$$

$$\text{B.C. at } z = 1: \quad \frac{dC}{dz} = 0 \;\text{ at } z = 1 \qquad (48)$$

Forcing the residuals at the $(n-1)$ internal grid points to zero, we have

$$\frac{1}{Pe}\,\frac{C_{i+1} - 2C_i + C_{i-1}}{(\Delta z)^2} - \frac{C_{i+1} - C_{i-1}}{2(\Delta z)} = Da\,C_i^2 \qquad i = 2, 3, \dots, n$$

Defining

$$\alpha = \left[\frac{1}{(\Delta z)^2\,Pe} - \frac{1}{2(\Delta z)}\right]; \quad \beta = \left[\frac{2}{Pe\,(\Delta z)^2}\right]; \quad \gamma = \left[\frac{1}{(\Delta z)^2\,Pe} + \frac{1}{2(\Delta z)}\right]$$

the above set of nonlinear equations can be rearranged as follows

$$\alpha\,C_{i+1} - \beta\,C_i + \gamma\,C_{i-1} = Da\,C_i^2 \qquad i = 2, 3, \dots, n$$

The two boundary conditions yield two additional equations

$$\frac{C_2 - C_1}{\Delta z} = Pe\,(C_1 - 1) \,; \qquad \frac{C_{n+1} - C_n}{\Delta z} = 0$$

The resulting set of nonlinear algebraic equations can be arranged as follows

$$\widetilde{T}(\widetilde{\mathbf{x}}) \equiv \mathbf{A}\widetilde{\mathbf{x}} - \mathbf{G}(\widetilde{\mathbf{x}}) = \mathbf{0} \qquad (49)$$

where

$$\widetilde{\mathbf{x}} = \left[\begin{array}{c} C_1 \\ C_2 \\ \vdots \\ C_{n+1} \end{array}\right]; \quad \mathbf{G}(\widetilde{\mathbf{x}}) = \left[\begin{array}{c} -Pe\,(\Delta z) \\ Da\,C_2^2 \\ \vdots \\ Da\,C_n^2 \\ 0 \end{array}\right]; \quad \mathbf{A} = \left[\begin{array}{cccccc} -(1 + \Delta z\,Pe) & 1 & 0 & \dots & \dots & 0 \\ \gamma & -\beta & \alpha & \dots & \dots & \dots \\ \dots & \dots & \dots & \dots & \dots & 0 \\ \dots & \dots & \dots & \gamma & -\beta & \alpha \\ 0 & \dots & \dots & \dots & -1 & 1 \end{array}\right] \qquad (50)$$

Thus, the ODE-BVP is reduced to a set of coupled nonlinear algebraic equations after discretization.
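The resulting nonlinear algebraic system can be handed to a standard root-finding routine. The sketch below is an illustrative addition; the values of $Pe$ and $Da$ and the initial guess are assumptions.

```python
import numpy as np
from scipy.optimize import fsolve

# Solving the discretized TRAM equations of Example 10 for assumed Pe and Da.
Pe, Da, n = 6.0, 2.0, 50
dz = 1.0 / n
alpha = 1.0 / (dz**2 * Pe) - 1.0 / (2.0 * dz)
beta  = 2.0 / (Pe * dz**2)
gamma = 1.0 / (dz**2 * Pe) + 1.0 / (2.0 * dz)

def residuals(C):
    R = np.zeros(n + 1)
    R[0] = (C[1] - C[0]) / dz - Pe * (C[0] - 1.0)          # B.C. at z = 0
    for i in range(1, n):                                   # internal grid points
        R[i] = alpha * C[i + 1] - beta * C[i] + gamma * C[i - 1] - Da * C[i]**2
    R[n] = (C[n] - C[n - 1]) / dz                           # B.C. at z = 1
    return R

C0 = np.ones(n + 1)              # initial guess: inlet concentration everywhere
C = fsolve(residuals, C0)
print(C[0], C[-1])               # concentration at the inlet and outlet grid points
```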
3.3 Discretization of PDEs using Finite Difference [2]

Typical second order PDEs that we encounter in engineering problems are of the form

$$\frac{\partial u}{\partial t} - \left[\alpha\,\nabla^2 u + \beta\,\nabla u + \gamma\,g(u)\right] = f(x, y, z, t)$$

$$x_L < x < x_H \,; \quad y_L < y < y_H \,; \quad z_L < z < z_H$$

subject to appropriate boundary conditions and initial conditions. For example, the Laplacian operator $\nabla^2$ and the gradient operator $\nabla$ are defined in Cartesian coordinates as follows

$$\nabla u = \frac{\partial u}{\partial x} + \frac{\partial u}{\partial y} + \frac{\partial u}{\partial z} \,; \qquad \nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}$$

In the Cartesian coordinate system, we construct grid lines parallel to the x, y and z axes and force the residuals to zero at the internal grid points. For example, the partial derivatives of the dependent variable $u$ with respect to $x$ at grid point $(x_i, y_j, z_k)$ can be approximated as follows

$$\left[\frac{\partial u}{\partial x}\right]_{ijk} = \frac{(u_{i+1,j,k} - u_{i-1,j,k})}{2(\Delta x)} \,; \qquad \left[\frac{\partial^2 u}{\partial x^2}\right]_{ijk} = \frac{(u_{i+1,j,k} - 2u_{i,j,k} + u_{i-1,j,k})}{(\Delta x)^2}$$

The partial derivatives in the remaining directions can be approximated in an analogous manner. It may be noted that the partial derivatives are approximated by considering one variable at a time, which is equivalent to the application of the Taylor series expansion of a scalar function.

When the PDE involves only spatial derivatives, the discretization process yields either a coupled set of linear / nonlinear algebraic equations or an ODE-BVP. When the PDE involves time derivatives, the discretization is carried out only in the spatial coordinates. As a consequence, the discretization process yields coupled nonlinear ODEs with the initial conditions specified, i.e. an ODE-IVP.
Example 11: Consider the PDE describing the unsteady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out.

$$\frac{\partial C}{\partial t} = \frac{1}{Pe}\frac{\partial^2 C}{\partial z^2} - \frac{\partial C}{\partial z} - Da\,C^2 \quad \text{in } (0 < z < 1) \qquad (51)$$

$$t = 0: \quad C(z, 0) = f(z) \quad \text{in } (0 < z < 1) \qquad (52)$$

$$\text{B.C. at } z = 0: \quad \frac{\partial C(0, t)}{\partial z} = Pe\left(C(0, t) - 1\right) \quad \text{for } t \geq 0 \qquad (53)$$

$$\text{B.C. at } z = 1: \quad \frac{\partial C(1, t)}{\partial z} = 0 \quad \text{for } t \geq 0 \qquad (54)$$

Using the finite difference method along the spatial coordinate $z$ with $n-1$ internal grid points, we have

$$\frac{dC_i(t)}{dt} = \frac{1}{Pe}\left[\frac{C_{i+1}(t) - 2C_i(t) + C_{i-1}(t)}{(\Delta z)^2}\right] - \left[\frac{C_{i+1}(t) - C_{i-1}(t)}{2(\Delta z)}\right] - Da\left[C_i(t)\right]^2 \qquad (55)$$

$$i = 2, 3, \dots, n$$

The boundary conditions yield

$$\text{B.C. 1:} \quad \frac{C_2(t) - C_1(t)}{\Delta z} = Pe\left(C_1(t) - 1\right) \;\Rightarrow\; C_1(t) = \left[\frac{1}{\Delta z} + Pe\right]^{-1}\left[\frac{C_2(t)}{\Delta z} + Pe\right] \qquad (57)$$

and

$$\text{B.C. 2:} \quad \frac{C_{n+1}(t) - C_n(t)}{\Delta z} = 0 \;\Rightarrow\; C_{n+1}(t) = C_n(t) \qquad (58)$$

These boundary conditions can be used to eliminate the variables $C_1(t)$ and $C_{n+1}(t)$ from the set of ODEs (55). This gives rise to a set of $(n-1)$ coupled ODEs together with the initial conditions

$$C_2(0) = f(z_2), \; C_3(0) = f(z_3), \; \dots, \; C_n(0) = f(z_n) \qquad (59)$$

Thus, defining the vector $\mathbf{x}$ of concentration values at the internal grid points as

$$\mathbf{x} = \left[\begin{array}{cccc} C_2(t) & C_3(t) & \dots & C_n(t) \end{array}\right]^T$$

the discretized problem is an ODE-IVP of the form

$$\widetilde{T}(\mathbf{x}) \equiv \frac{d\mathbf{x}}{dt} - \mathbf{F}(\mathbf{x}) = \mathbf{0} \qquad (60)$$

subject to the initial condition $\mathbf{x}(0)$. Needless to say, a better approximation is obtained if a large number of grid points is selected.
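The ODE-IVP (60) can be integrated with a standard stiff solver. The following sketch is an illustrative addition; the values of $Pe$ and $Da$ and the initial profile $f(z)$ are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrating the ODE-IVP obtained by finite difference discretization of the unsteady TRAM model.
Pe, Da, n = 6.0, 2.0, 50
dz = 1.0 / n

def rhs(t, x):
    # x holds C_2 ... C_n; the boundary values follow from (57) and (58)
    C1 = (x[0] / dz + Pe) / (1.0 / dz + Pe)
    Cnp1 = x[-1]
    C = np.concatenate(([C1], x, [Cnp1]))
    dxdt = np.empty_like(x)
    for k in range(x.size):
        i = k + 1                              # index into the padded array C
        dxdt[k] = ((C[i + 1] - 2 * C[i] + C[i - 1]) / (Pe * dz**2)
                   - (C[i + 1] - C[i - 1]) / (2 * dz)
                   - Da * C[i]**2)
    return dxdt

x0 = np.zeros(n - 1)                           # assumed initial profile f(z) = 0
sol = solve_ivp(rhs, (0.0, 2.0), x0, method="BDF")
print(sol.y[0, -1], sol.y[-1, -1])             # C at z_2 and z_n at the final time
```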
Example 12 (Furnace PDE): The Laplace equation represents a prototype for steady state diffusion processes. For example, the 2-dimensional Laplace equation

$$\alpha\left[\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right] = f(x, y) \qquad (61)$$

$$0 < x < 1 \,; \quad 0 < y < 1$$

where $T$ is temperature and $x, y$ are dimensionless space coordinates. Equations similar to this arise in many problems of fluid mechanics, heat transfer and mass transfer. In the present case, $T(x, y)$ represents the dimensionless temperature distribution in a furnace and $\alpha$ represents thermal diffusivity. Three walls of the furnace are insulated and maintained at a constant temperature. Convective heat transfer occurs from the fourth boundary to the atmosphere. The boundary conditions are as follows:

$$x = 0: \; T = T^{*} \,; \qquad x = 1: \; T = T^{*} \qquad (62)$$

$$y = 0: \; T = T^{*} \qquad (63)$$

$$y = 1: \quad k\,\frac{\partial T(x, 1)}{\partial y} = h\left[T_{\infty} - T(x, 1)\right] \qquad (64)$$

We construct a 2-dimensional grid with $(n_x + 1)$ equispaced grid lines parallel to the y axis and $(n_y + 1)$ equispaced grid lines parallel to the x axis. The temperature $T$ at the $(i, j)$th grid point is denoted as $T_{ij} = T(x_i, y_j)$. We then force the residual to be zero at each internal grid point to obtain the following set of equations:

$$\frac{(T_{i+1,j} - 2T_{i,j} + T_{i-1,j})}{(\Delta x)^2} + \frac{(T_{i,j+1} - 2T_{i,j} + T_{i,j-1})}{(\Delta y)^2} = f(x_i, y_j)/\alpha \qquad (65)$$

for $(i = 2, 3, \dots, n_x)$ and $(j = 2, 3, \dots, n_y)$. Note that, regardless of the size of the system, each equation contains no more than five unknowns, resulting in a sparse linear algebraic system. Consider the special case when

$$\Delta x = \Delta y = \Delta$$

For this case the above equations can be written as

$$T_{i-1,j} + T_{i,j-1} - 4T_{i,j} + T_{i,j+1} + T_{i+1,j} = \Delta^2 f(x_i, y_j)/\alpha \qquad (66)$$

for $(i = 2, 3, \dots, n_x)$ and $(j = 2, 3, \dots, n_y)$.

Using the boundary conditions, we have the additional equations

$$T_{1,j} = T^{*} \,; \quad T_{n_x+1,j} = T^{*} \quad \text{for } j = 1, 2, \dots, n_y$$

$$T_{i,1} = T^{*} \quad \text{for } i = 1, 2, \dots, n_x + 1$$

$$k\,\frac{T_{i,n_y+1} - T_{i,n_y}}{\Delta y} = h\left[T_{\infty} - T_{i,n_y+1}\right] \;\Rightarrow\; T_{i,n_y+1} = \frac{1}{(k/\Delta y) + h}\left[h\,T_{\infty} + (k/\Delta y)\,T_{i,n_y}\right] \quad \text{for } i = 1, 2, \dots, n_x + 1$$

that can be used to eliminate the boundary variables from the set of equations. Thus, we obtain $(n_x - 1)\times(n_y - 1)$ linear algebraic equations in $(n_x - 1)\times(n_y - 1)$ unknowns. Defining the vector $\mathbf{x}$ as

$$\mathbf{x} = \left[\begin{array}{cccccc} T_{2,2} & T_{2,3} & \dots & T_{2,n_y} & \dots & T_{n_x,2} & \dots & T_{n_x,n_y} \end{array}\right]^T$$

we can rearrange the resulting set of equations in the form $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A}$ turns out to be a large sparse matrix. Even for a modest choice of 10 internal grid lines in each direction, we would get a $100 \times 100$ sparse matrix associated with 100 variables.
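The sparse assembly can be sketched as follows. This is an illustrative addition with assumed values and, for simplicity, all four walls held at $T^{*}$ (i.e. pure Dirichlet conditions), unlike the convective fourth wall in the example above.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

# Assembling the sparse system (66) for a Laplace problem on a uniform grid.
nx = ny = 11                       # grid lines; interior unknowns number (nx-1)*(ny-1)
d = 1.0 / nx
alpha, Tstar = 1.0, 1.0
f = lambda x, y: 0.0               # assumed zero source term

N = (nx - 1) * (ny - 1)
A = lil_matrix((N, N))
b = np.zeros(N)
idx = lambda i, j: (i - 2) * (ny - 1) + (j - 2)   # interior point (i, j) -> unknown number

for i in range(2, nx + 1):
    for j in range(2, ny + 1):
        r = idx(i, j)
        A[r, r] = -4.0
        b[r] = d**2 * f((i - 1) * d, (j - 1) * d) / alpha
        for (ii, jj) in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]:
            if 2 <= ii <= nx and 2 <= jj <= ny:
                A[r, idx(ii, jj)] = 1.0
            else:                  # neighbour lies on a wall held at Tstar
                b[r] -= Tstar

T = spsolve(A.tocsr(), b)
print(T.reshape(nx - 1, ny - 1)[0, 0])   # temperature at the first interior point
```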
Example 13 (Converting a PDE to an ODE-BVP by the method of lines [2]): Consider the 2-D steady state heat transfer problem in the previous example. By the method of lines, we discretize only in one spatial direction. For example, we choose $n_x - 1$ internal grid points along the x coordinate and construct $n_x - 1$ grid lines parallel to the y-axis. The temperature $T$ along the $i$th grid line is denoted as

$$T_i(y) = T(x_i, y) \qquad (67)$$

Now, we equate the residuals to zero at each internal grid line as

$$\frac{d^2 T_i}{dy^2} = -\frac{1}{(\Delta x)^2}\left[T_{i+1}(y) - 2T_i(y) + T_{i-1}(y)\right] + f(x_i, y)/\alpha \qquad (68)$$

$$i = 2, 3, \dots, n_x$$

The boundary conditions at $x = 0$ and $x = 1$ yield

$$T_1(y) = T^{*} \,; \qquad T_{n_x+1}(y) = T^{*}$$

which can be used to eliminate variables in the above set of ODEs that lie on the corresponding edges. The boundary conditions at $y = 0$ and $y = 1$ are:

$$T_i(0) = T^{*} \qquad (69)$$

$$k\,\frac{dT_i(1)}{dy} = h\left(T_{\infty} - T_i(1)\right) \qquad (70)$$

$$i = 2, 3, \dots, n_x$$

Thus, defining

$$\mathbf{u} = \left[\begin{array}{cccc} T_2(y) & T_3(y) & \dots & T_{n_x}(y) \end{array}\right]^T$$

discretization of the PDE using the method of lines yields an ODE-BVP of the form

$$\widetilde{T}(\mathbf{u}) \equiv \frac{d^2\mathbf{u}}{dy^2} - \mathbf{F}[\mathbf{u}] = \mathbf{0}$$

subject to the boundary conditions

$$\mathbf{u}(0) = \mathbf{T}^{*} \,; \qquad \frac{d\mathbf{u}(1)}{dy} = \mathbf{G}[\mathbf{u}(1)]$$
Example 14: Consider the 2-dimensional unsteady state heat transfer problem

$$\frac{\partial T}{\partial t} = \alpha\left[\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right] + f(x, y, t) \qquad (71)$$

$$t = 0: \quad T = H(x, y) \qquad (72)$$

$$x = 0: \; T(0, y, t) = T^{*} \,; \qquad x = 1: \; T(1, y, t) = T^{*} \qquad (73)$$

$$y = 0: \; T(x, 0, t) = T^{*} \qquad (74)$$

$$y = 1: \quad k\,\frac{\partial T(x, 1, t)}{\partial y} = h\left(T_{\infty} - T(x, 1, t)\right) \qquad (75)$$

where $T(x, y, t)$ is the temperature at location $(x, y)$ at time $t$ and $\alpha$ is the thermal diffusivity. By the finite difference approach, we construct a 2-dimensional grid with $n_x - 1$ equispaced grid lines parallel to the y-axis and $n_y - 1$ grid lines parallel to the x-axis. The temperature $T$ at the $(i, j)$th grid point is given by

$$T_{ij}(t) = T(x_i, y_j, t) \qquad (76)$$

Now, we force the residual to zero at each internal grid point to generate a set of coupled ODE-IVPs as

$$\frac{dT_{ij}}{dt} = \frac{\alpha}{(\Delta x)^2}\left[T_{i+1,j} - 2T_{i,j} + T_{i-1,j}\right] + \frac{\alpha}{(\Delta y)^2}\left[T_{i,j+1} - 2T_{i,j} + T_{i,j-1}\right] + f(x_i, y_j, t) \qquad (77)$$

for $i = 2, 3, \dots, n_x$ and $j = 2, 3, \dots, n_y$.

Using the boundary conditions, we have constraints at the four boundaries

$$T_{1,j}(t) = T^{*} \,; \quad T_{n_x+1,j}(t) = T^{*} \quad \text{for } j = 1, 2, \dots, n_y + 1$$

$$T_{i,1}(t) = T^{*} \quad \text{for } i = 1, 2, \dots, n_x + 1$$

$$k\,\frac{T_{i,n_y+1} - T_{i,n_y}}{\Delta y} = h\left[T_{\infty} - T_{i,n_y+1}\right] \;\Rightarrow\; T_{i,n_y+1}(t) = \frac{1}{(k/\Delta y) + h}\left[h\,T_{\infty} + (k/\Delta y)\,T_{i,n_y}(t)\right] \quad \text{for } i = 2, \dots, n_x$$

These constraints can be used to eliminate the boundary variables from the set of ODEs (77). Thus, defining the vector

$$\mathbf{x}(t) = \left[\begin{array}{cccccc} T_{2,2}(t) & T_{2,3}(t) & \dots & T_{2,n_y}(t) & \dots & T_{n_x,2}(t) & \dots & T_{n_x,n_y}(t) \end{array}\right]^T$$

the PDE after discretization is reduced to a set of coupled ODE-IVPs of the form

$$\widetilde{T}(\mathbf{x}) \equiv \frac{d\mathbf{x}}{dt} - \mathbf{F}(\mathbf{x}, t) = \mathbf{0}$$

subject to the initial condition

$$\mathbf{x}(0) = \left[\begin{array}{ccccc} H(x_2, y_2) & H(x_2, y_3) & \dots & H(x_{n_x}, y_2) & \dots & H(x_{n_x}, y_{n_y}) \end{array}\right]^T$$
3.4 Newton's Method for Solving Nonlinear Algebraic Equations

The most prominent application of the multivariate Taylor series expansion in numerical analysis is arguably Newton's method, which is used for solving a set of simultaneous nonlinear algebraic equations. Consider a set of $n$ coupled nonlinear equations of the form

$$f_i(\mathbf{x}) = 0 \quad \text{for } i = 1, \dots, n \qquad (78)$$

which have to be solved simultaneously. Here, each $f_i(\cdot) : \mathbb{R}^n \rightarrow \mathbb{R}$ is a scalar function. Defining a function vector

$$\mathbf{F}(\mathbf{x}) = \left[\begin{array}{cccc} f_1(\mathbf{x}) & f_2(\mathbf{x}) & \dots & f_n(\mathbf{x}) \end{array}\right]^T$$

the problem at hand is to solve the vector equation $\mathbf{F}(\mathbf{x}) = \mathbf{0}$.

Suppose $\mathbf{x}^{*}$ is a solution such that $\mathbf{F}(\mathbf{x}^{*}) = \mathbf{0}$. If each function $f_i(\mathbf{x})$ is continuously differentiable, then, in the neighborhood of a guess solution $\widetilde{\mathbf{x}}$, we can write

$$\mathbf{F}(\mathbf{x}^{*}) = \mathbf{F}\left[\widetilde{\mathbf{x}} + (\mathbf{x}^{*} - \widetilde{\mathbf{x}})\right] = \mathbf{F}(\widetilde{\mathbf{x}}) + \left[\frac{\partial\mathbf{F}}{\partial\mathbf{x}}\right]_{\mathbf{x}=\widetilde{\mathbf{x}}}(\mathbf{x}^{*} - \widetilde{\mathbf{x}}) + \mathbf{R}_2\left(\widetilde{\mathbf{x}}, \mathbf{x}^{*} - \widetilde{\mathbf{x}}\right) \qquad (79)$$

If the guess solution is sufficiently close to the true solution, then, neglecting terms higher than the first order, we can locally approximate the nonlinear transformation $\mathbf{F}(\mathbf{x}^{*})$ as follows

$$\mathbf{F}(\mathbf{x}^{*}) \simeq \widetilde{\mathbf{F}}(\mathbf{x}^{*}) = \mathbf{F}(\widetilde{\mathbf{x}}) + \left[\frac{\partial\mathbf{F}}{\partial\mathbf{x}}\right]_{\mathbf{x}=\widetilde{\mathbf{x}}}\Delta\mathbf{x} \,; \qquad \Delta\mathbf{x} = \mathbf{x}^{*} - \widetilde{\mathbf{x}}$$

and solve for $\widetilde{\mathbf{F}}(\mathbf{x}^{*}) = \mathbf{0}$. The approximated operator equation can be rearranged as follows

$$\left[\frac{\partial\mathbf{F}}{\partial\mathbf{x}}\right]_{\mathbf{x}=\widetilde{\mathbf{x}}}\left[\Delta\mathbf{x}\right] = -\mathbf{F}(\widetilde{\mathbf{x}})$$

$$(n \times n)\ \text{matrix} \times (n \times 1)\ \text{vector} = (n \times 1)\ \text{vector}$$

which corresponds to the standard form $\mathbf{A}\mathbf{x} = \mathbf{b}$. Solving the above linear equation yields $\Delta\mathbf{x}$ and, if the guess solution $\widetilde{\mathbf{x}}$ is sufficiently close to the true solution, then

$$\mathbf{x}^{*} \approx \widetilde{\mathbf{x}} + \Delta\mathbf{x} \qquad (80)$$

However, we may not reach the true solution in a single iteration. Thus, equation (80) is used to generate a new guess solution, say $\widetilde{\mathbf{x}}^{\text{New}}$, as follows

$$\widetilde{\mathbf{x}}^{\text{New}} = \widetilde{\mathbf{x}} + \Delta\mathbf{x} \qquad (81)$$

This process is continued till

$$\left\|\mathbf{F}(\widetilde{\mathbf{x}}^{\text{New}})\right\| < \varepsilon_1 \qquad \text{or} \qquad \frac{\left\|\widetilde{\mathbf{x}}^{\text{New}} - \widetilde{\mathbf{x}}\right\|}{\left\|\widetilde{\mathbf{x}}^{\text{New}}\right\|} < \varepsilon_2$$

where the tolerances $\varepsilon_1$ and $\varepsilon_2$ are some sufficiently small numbers. The above derivation indicates that Newton's method is likely to converge only when the guess solution is sufficiently close to the true solution $\mathbf{x}^{*}$, so that the residual $\mathbf{R}_2(\widetilde{\mathbf{x}}, \mathbf{x}^{*} - \widetilde{\mathbf{x}})$ can be neglected.
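A minimal sketch of the iteration (81) follows. The two-equation system, its Jacobian and the initial guess are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Newton's method for F(x) = 0 with an assumed two-equation system (a root lies at x = (2, 1)).
def F(x):
    return np.array([x[0]**2 + x[1]**2 - 5.0,
                     x[0] * x[1] - 2.0])

def jacobian(x):
    return np.array([[2 * x[0], 2 * x[1]],
                     [x[1], x[0]]])

x = np.array([2.5, 0.5])                         # guess solution
for _ in range(25):
    dx = np.linalg.solve(jacobian(x), -F(x))     # linearized step from (79)
    x = x + dx                                   # new guess, equation (81)
    if np.linalg.norm(F(x)) < 1e-10:
        break
print(x, F(x))
```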
4 Discretization using Polynomial Interpolation

4.1 Polynomial Interpolation

Consider a function $u(z)$, continuous and defined over $z \in [a, b]$, and let $u_1, u_2, \dots, u_{n+1}$ denote its values at the points $z_1, z_2, \dots, z_{n+1}$. The interpolation polynomial that passes exactly through these points can be written as $p_n(z) = \theta_0 + \theta_1 z + \dots + \theta_n z^n$, and its coefficients satisfy $\mathbf{A}\boldsymbol{\theta} = \mathbf{u}$ with

$$\mathbf{A} = \left[\begin{array}{cccc} 1 & z_1 & \dots & (z_1)^n \\ 1 & z_2 & \dots & (z_2)^n \\ \dots & \dots & \dots & \dots \\ 1 & z_{n+1} & \dots & (z_{n+1})^n \end{array}\right] \qquad (84)$$

$$\boldsymbol{\theta} = \left[\begin{array}{cccc} \theta_0 & \theta_1 & \dots & \theta_n \end{array}\right]^T \qquad (85)$$

$$\mathbf{u} = \left[\begin{array}{cccc} u_1 & u_2 & \dots & u_{n+1} \end{array}\right]^T \qquad (86)$$

Since matrix $\mathbf{A}$ and vector $\mathbf{u}$ are known, the coefficients of the Lagrange interpolation polynomial can be found by solving for the vector $\boldsymbol{\theta}$.

4.2 Piecewise Polynomial Interpolation [2]

Matrix $\mathbf{A}$ appearing in equation (84) is known as the Vandermonde matrix. Larger dimensional Vandermonde matrices tend to become numerically ill-conditioned. Also, if the number of data points is large, fitting a large order polynomial can result in a polynomial which exhibits unexpected oscillatory behavior. In order to avoid such oscillations and the difficulties arising from ill conditioning of the Vandermonde matrix, the data is divided into sub-intervals and a lower order spline approximation is developed on each sub-interval. Let $[a, b]$ be a finite interval. We introduce a partition of the interval by placing points

$$a \leq \zeta_1 < \zeta_2 < \zeta_3 < \dots < \zeta_{n+1} \leq b$$

where the $\zeta_i$ are called nodes. A function is said to be a piecewise polynomial of degree $k$ on this partition if in each subinterval $\zeta_i \leq z \leq \zeta_{i+1}$ it is a $k$th degree polynomial. For example, a piecewise polynomial of degree one consists of straight line segments. Such an approximation is continuous at the nodes but will have discontinuous derivatives. In some applications it is important to have a smooth approximation with continuous derivatives. A piecewise $k$th degree polynomial which has continuous derivatives up to order $k-1$ is called a spline of degree $k$. In particular, the case $k = 3$, i.e. the cubic spline, has been studied extensively in the literature. In this section, we restrict our discussion to the development of cubic splines. Thus, given a set of points $z_1 = a < z_2 < z_3 < \dots < z_{n+1} = b$, the nodes are chosen as

$$\zeta_i = z_i \quad \text{for } i = 1, 2, \dots, n+1$$
and the $n$ cubic splines that fit the $(n+1)$ data points can be expressed as

$$p_1(z) = a_{0,1} + a_{1,1}(z - z_1) + a_{2,1}(z - z_1)^2 + a_{3,1}(z - z_1)^3 \qquad (87)$$
$$(z_1 \leq z \leq z_2) \qquad (88)$$

$$p_2(z) = a_{0,2} + a_{1,2}(z - z_2) + a_{2,2}(z - z_2)^2 + a_{3,2}(z - z_2)^3 \qquad (89)$$
$$(z_2 \leq z \leq z_3) \qquad (90)$$

$$\dots\dots$$

$$p_n(z) = a_{0,n} + a_{1,n}(z - z_n) + a_{2,n}(z - z_n)^2 + a_{3,n}(z - z_n)^3 \qquad (z_n \leq z \leq z_{n+1}) \qquad (91)$$

There are in total $4n$ unknown coefficients $a_{0,1}, a_{1,1}, \dots, a_{3,n}$ to be determined. In order to ensure continuity and smoothness of the approximation, the following conditions are imposed.

Initial point of each polynomial:

$$p_i(z_i) = u_i \quad \text{for } i = 1, 2, \dots, n \qquad (92)$$

Terminal point of the last polynomial:

$$p_n(z_{n+1}) = u_{n+1} \qquad (93)$$

Conditions for ensuring continuity between two neighboring polynomials:

$$p_i(z_{i+1}) = p_{i+1}(z_{i+1}) \,; \quad i = 1, 2, \dots, n-1 \qquad (94)$$

$$\frac{dp_i(z_{i+1})}{dz} = \frac{dp_{i+1}(z_{i+1})}{dz} \,; \quad i = 1, 2, \dots, n-1 \qquad (95)$$

$$\frac{d^2 p_i(z_{i+1})}{dz^2} = \frac{d^2 p_{i+1}(z_{i+1})}{dz^2} \,; \quad i = 1, 2, \dots, n-1 \qquad (96)$$

These result in $4n - 2$ conditions, including the earlier conditions. Two additional conditions are imposed at the boundary points

$$\frac{d^2 p_1(z_1)}{dz^2} = \frac{d^2 p_n(z_{n+1})}{dz^2} = 0 \qquad (97)$$

which are referred to as free boundary conditions. If the first derivatives at the boundary points are known,

$$\frac{dp_1(z_1)}{dz} = d_1 \,; \qquad \frac{dp_n(z_{n+1})}{dz} = d_{n+1} \qquad (98)$$

then we get the clamped boundary conditions.
Using constraints (92-96) and defining $\Delta z_i = z_{i+1} - z_i$, we get the following set of coupled linear algebraic equations

$$a_{0,i} = u_i \,; \quad (i = 1, 2, \dots, n) \qquad (99)$$

$$a_{0,n} + a_{1,n}(\Delta z_n) + a_{2,n}(\Delta z_n)^2 + a_{3,n}(\Delta z_n)^3 = u_{n+1} \qquad (100)$$

$$a_{0,i} + a_{1,i}(\Delta z_i) + a_{2,i}(\Delta z_i)^2 + a_{3,i}(\Delta z_i)^3 = a_{0,i+1} \qquad (101)$$

$$a_{1,i} + 2a_{2,i}(\Delta z_i) + 3a_{3,i}(\Delta z_i)^2 = a_{1,i+1} \qquad (102)$$

$$a_{2,i} + 3a_{3,i}(\Delta z_i) = a_{2,i+1} \qquad (103)$$

$$\text{for } i = 1, 2, \dots, n-1$$

In addition, using the free boundary conditions, we have

$$a_{2,1} = 0 \qquad (104)$$

$$a_{2,n} + 3a_{3,n}(\Delta z_n) = 0 \qquad (105)$$

Eliminating $a_{3,i}$ using equations (103) and (105), we have

$$a_{3,i} = \frac{a_{2,i+1} - a_{2,i}}{3(\Delta z_i)} \quad \text{for } i = 1, 2, \dots, n-1 \qquad (106)$$

$$a_{3,n} = -\frac{a_{2,n}}{3(\Delta z_n)} \qquad (107)$$

and eliminating $a_{1,i}$ using equations (100), (101), we have

$$a_{1,i} = \frac{1}{\Delta z_i}\left(a_{0,i+1} - a_{0,i}\right) - \frac{\Delta z_i}{3}\left(2a_{2,i} + a_{2,i+1}\right) \qquad (108)$$

$$\text{for } i = 1, 2, \dots, n-1$$

$$a_{1,n} = \frac{u_{n+1} - a_{0,n}}{\Delta z_n} - (\Delta z_n)\,a_{2,n} - a_{3,n}(\Delta z_n)^2 \qquad (109)$$

Thus, we are left with only $\{a_{2,i} : i = 1, \dots, n\}$ as unknowns, and the resulting set of linear equations assumes the form

$$a_{2,1} = 0 \qquad (110)$$

$$(\Delta z_{i-1})\,a_{2,i-1} + 2(\Delta z_i + \Delta z_{i-1})\,a_{2,i} + (\Delta z_i)\,a_{2,i+1} = b_i \qquad (111)$$

$$\text{for } i = 2, \dots, n-1$$

where

$$b_i = \frac{3(a_{0,i+1} - a_{0,i})}{\Delta z_i} - \frac{3(a_{0,i} - a_{0,i-1})}{\Delta z_{i-1}} = \frac{3(u_{i+1} - u_i)}{\Delta z_i} - \frac{3(u_i - u_{i-1})}{\Delta z_{i-1}}$$

for $i = 2, \dots, n-1$, and

$$\frac{1}{3}(\Delta z_{n-1})\,a_{2,n-1} + \frac{2}{3}(\Delta z_{n-1} + \Delta z_n)\,a_{2,n} = b_n \qquad (112)$$

$$b_n = \frac{u_{n+1}}{\Delta z_n} - \left(\frac{1}{\Delta z_n} + \frac{1}{\Delta z_{n-1}}\right)u_n + \frac{u_{n-1}}{\Delta z_{n-1}}$$
Defining the vector $\boldsymbol{\theta}_2$ as

$$\boldsymbol{\theta}_2 = \left[\begin{array}{cccc} a_{2,1} & a_{2,2} & \dots & a_{2,n} \end{array}\right]^T$$

the above set of $n$ equations can be rearranged as

$$\mathbf{A}\,\boldsymbol{\theta}_2 = \mathbf{b} \qquad (113)$$

where $\mathbf{A}$ is an $(n \times n)$ matrix and $\mathbf{b}$ is an $(n \times 1)$ vector. Elements of $\mathbf{A}$ and $\mathbf{b}$ can be obtained from equations (110-112). Note that matrix $\mathbf{A}$ will be a nearly tridiagonal matrix, i.e. a sparse matrix. Once all the $a_{2,i}$ are obtained, $a_{1,i}$ and $a_{3,i}$ can be easily computed.
4.3 Interpolation using Linearly Independent Functions

While a polynomial is a popular choice as a basis for interpolation, any set of linearly independent functions defined on $[a, b]$ can be used for developing an interpolating function. Let $f_0(z), f_1(z), \dots, f_n(z)$ represent a set of linearly independent functions in $C[a, b]$. Then, we can construct an interpolating function, $g(z)$, as follows

$$g(z) = \theta_0 f_0(z) + \dots + \theta_n f_n(z) \qquad (114)$$

Forcing the interpolating function to have values $u_i$ at $z = z_i$ leads to the following set of linear algebraic equations

$$\theta_0 f_0(z_i) + \dots + \theta_n f_n(z_i) = u_i \qquad (115)$$

$$i = 0, 1, \dots, n$$

which can be further rearranged as $\mathbf{A}\boldsymbol{\theta} = \mathbf{u}$, where [with $z_0 = 0$ and $z_n = 1$]

$$\mathbf{A} = \left[\begin{array}{cccc} f_0(0) & f_1(0) & \dots & f_n(0) \\ f_0(z_1) & f_1(z_1) & \dots & f_n(z_1) \\ \dots & \dots & \dots & \dots \\ f_0(1) & f_1(1) & \dots & f_n(1) \end{array}\right] \qquad (116)$$

and the vectors $\boldsymbol{\theta}$ and $\mathbf{u}$ are defined by equations (85) and (86), respectively. Commonly used interpolating functions are

- Shifted Legendre polynomials
- Chebyshev polynomials
- Trigonometric functions, i.e. sines and cosines
- Exponential functions $\{e^{\lambda_i z} : i = 0, 1, \dots, n\}$ with $\lambda_0, \dots, \lambda_n$ specified, i.e.

$$g(z) = \theta_0 e^{\lambda_0 z} + \theta_1 e^{\lambda_1 z} + \dots + \theta_n e^{\lambda_n z} \qquad (117)$$
4.4 Discretization using Orthogonal Collocations [2]

One of the important applications of polynomial interpolation is the method of orthogonal collocations. In this approach, the differential operator over a spatial / temporal domain is approximated using an interpolation polynomial.

4.4.1 Discretization of ODE-BVP

Consider the second order ODE-BVP given by equations (22), (23) and (24a). To see how the problem discretization can be carried out using Lagrange interpolation, consider a selected set of collocation (grid) points $\{z_i : i = 1, \dots, n+1\}$ in the domain $[0, 1]$ such that $z_1 = 0$, $z_{n+1} = 1$ and $z_2, z_3, \dots, z_n \in (0, 1)$, with

$$z_1 = 0 < z_2 < z_3 < \dots < z_{n+1} = 1$$

Let $\{u_i = u(z_i) : i = 1, 2, \dots, n+1\}$ represent the values of the dependent variable at these collocation points. Given these points, we can propose an approximate solution, $\widetilde{u}(z)$, of the form

$$\widetilde{u}(z) = \theta_0 + \theta_1 z + \dots + \theta_n z^n$$

to the ODE-BVP as an interpolation polynomial that passes exactly through $\{u_i : i = 1, \dots, n+1\}$. This requires that the following set of equations hold at the collocation points

$$\widetilde{u}(z_i) = \theta_0 + \theta_1 z_i + \dots + \theta_n z_i^n = u_i \qquad i = 1, 2, \dots, n+1$$

The unknown polynomial coefficients $\{\theta_i : i = 0, 1, \dots, n\}$ can be expressed in terms of the unknowns $\{u_i : i = 1, \dots, n+1\}$ as follows

$$\boldsymbol{\theta} = \mathbf{A}^{-1}\mathbf{u}$$
where the matrix $\mathbf{A}$ is defined in equation (84). To approximate the ODE-BVP in $(0, 1)$, we force the residuals at the collocation points to zero using the approximate solution $\widetilde{u}(z)$, i.e.

$$R_i = \Psi\left[\frac{d^2\widetilde{u}(z_i)}{dz^2}, \frac{d\widetilde{u}(z_i)}{dz}, \widetilde{u}(z_i), z_i\right] = 0 \qquad (118)$$

for $i = 2, 3, \dots, n$. Thus, we need to compute the first and second derivatives of the approximate solution $\widetilde{u}(z)$ at the collocation points. The first derivative at the $i$th collocation point can be computed as follows

$$\frac{d\widetilde{u}(z_i)}{dz} = 0\cdot\theta_0 + \theta_1 + 2\theta_2 z_i + \dots + n\,\theta_n z_i^{n-1} \qquad (119)$$

$$= \left[\begin{array}{ccccc} 0 & 1 & 2z_i & \dots & n z_i^{n-1} \end{array}\right]\boldsymbol{\theta} \qquad (120)$$

$$= \left[\begin{array}{ccccc} 0 & 1 & 2z_i & \dots & n z_i^{n-1} \end{array}\right]\mathbf{A}^{-1}\mathbf{u} \qquad (121)$$

Defining the vector

$$\left[\mathbf{s}^{(i)}\right]^T = \left[\begin{array}{ccccc} 0 & 1 & 2z_i & \dots & n z_i^{n-1} \end{array}\right]\mathbf{A}^{-1}$$

we have

$$\frac{d\widetilde{u}(z_i)}{dz} = \left[\mathbf{s}^{(i)}\right]^T\mathbf{u}$$

Similarly, the second derivative can be expressed in terms of the vector $\mathbf{u}$ as follows:

$$\frac{d^2\widetilde{u}(z_i)}{dz^2} = 0\cdot\theta_0 + 0\cdot\theta_1 + 2\theta_2 + \dots + n(n-1)\,\theta_n z_i^{n-2} \qquad (122)$$

$$= \left[\begin{array}{ccccc} 0 & 0 & 2 & \dots & n(n-1) z_i^{n-2} \end{array}\right]\boldsymbol{\theta} \qquad (123)$$

$$= \left[\begin{array}{ccccc} 0 & 0 & 2 & \dots & n(n-1) z_i^{n-2} \end{array}\right]\mathbf{A}^{-1}\mathbf{u} \qquad (124)$$

Defining the vector

$$\left[\mathbf{t}^{(i)}\right]^T = \left[\begin{array}{ccccc} 0 & 0 & 2 & \dots & n(n-1) z_i^{n-2} \end{array}\right]\mathbf{A}^{-1}$$

we have

$$\frac{d^2\widetilde{u}(z_i)}{dz^2} = \left[\mathbf{t}^{(i)}\right]^T\mathbf{u}$$

Substituting for the first and the second derivatives of $\widetilde{u}(z_i)$ in equations (118), we have

$$\Psi\left[\left[\mathbf{t}^{(i)}\right]^T\mathbf{u},\; \left[\mathbf{s}^{(i)}\right]^T\mathbf{u},\; u_i,\; z_i\right] = 0 \qquad (125)$$

for $i = 2, 3, \dots, n$. At the boundary points, we have two additional constraints

$$f_1\left[\frac{d\widetilde{u}(0)}{dz}, u_1, 0\right] = f_1\left[\left[\mathbf{s}^{(1)}\right]^T\mathbf{u}, u_1, 0\right] = 0$$
Table 1: Roots of Shifted Legendre Polynomials

Order (n) | Roots
1 | 0.5
2 | 0.21132, 0.78868
3 | 0.1127, 0.5, 0.8873
4 | 0.9305, 0.6703, 0.3297, 0.0695
5 | 0.9543, 0.7662, 0.5034, 0.2286, 0.0475
6 | 0.9698, 0.8221, 0.6262, 0.3792, 0.1681, 0.0346
7 | 0.9740, 0.8667, 0.7151, 0.4853, 0.3076, 0.1246, 0.0267
$$f_2\left[\frac{d\widetilde{u}(1)}{dz}, u_{n+1}, 1\right] = f_2\left[\left[\mathbf{s}^{(n+1)}\right]^T\mathbf{u}, u_{n+1}, 1\right] = 0 \qquad (126)$$

Thus, we have $(n+1)$ algebraic equations to be solved simultaneously in $(n+1)$ unknowns, i.e. $\{u_i : i = 1, \dots, n+1\}$.
It may be noted that the collocation points need not be chosen equispaced. It has been shown that, if these collocation points are chosen at the roots of an $n$th order orthogonal polynomial, then the error in the approximate solution is small; the roots of the shifted Legendre polynomials, listed in Table 1, are commonly used for this purpose. Defining matrices

$$\mathbf{S} = \left[\begin{array}{c} \left[\mathbf{s}^{(1)}\right]^T \\ \left[\mathbf{s}^{(2)}\right]^T \\ \dots \\ \left[\mathbf{s}^{(n+1)}\right]^T \end{array}\right] \,; \qquad \mathbf{T} = \left[\begin{array}{c} \left[\mathbf{t}^{(1)}\right]^T \\ \left[\mathbf{t}^{(2)}\right]^T \\ \dots \\ \left[\mathbf{t}^{(n+1)}\right]^T \end{array}\right] \qquad (127)$$
In addition, let us define matrices $\mathbf{C}$ and $\mathbf{D}$ as follows

$$\mathbf{C} = \left[\begin{array}{ccccc} 0 & 1 & 2z_1 & \dots & n(z_1)^{n-1} \\ 0 & 1 & 2z_2 & \dots & n(z_2)^{n-1} \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 1 & 2z_{n+1} & \dots & n(z_{n+1})^{n-1} \end{array}\right] \,; \qquad \mathbf{D} = \left[\begin{array}{cccccc} 0 & 0 & 2 & 6z_1 & \dots & n(n-1)(z_1)^{n-2} \\ 0 & 0 & 2 & 6z_2 & \dots & n(n-1)(z_2)^{n-2} \\ \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 2 & 6z_{n+1} & \dots & n(n-1)(z_{n+1})^{n-2} \end{array}\right]$$

It is easy to see that

$$\mathbf{S} = \mathbf{C}\mathbf{A}^{-1} \,; \qquad \mathbf{T} = \mathbf{D}\mathbf{A}^{-1} \qquad (128)$$

where matrix $\mathbf{A}$ is defined by equation (84).
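The matrices $\mathbf{S}$ and $\mathbf{T}$ of equation (128) can be generated directly from the chosen collocation points. The sketch below is an illustrative addition, using the points of Example 15.

```python
import numpy as np

# Computing S and T of equation (128) for z = 0, 0.1127, 0.5, 0.8873, 1 (n = 4).
z = np.array([0.0, 0.1127, 0.5, 0.8873, 1.0])
n = z.size - 1

A = np.vander(z, n + 1, increasing=True)            # Vandermonde matrix of equation (84)
C = np.zeros((n + 1, n + 1))
D = np.zeros((n + 1, n + 1))
for j in range(1, n + 1):
    C[:, j] = j * z**(j - 1)                         # d/dz of z^j at each point
    if j >= 2:
        D[:, j] = j * (j - 1) * z**(j - 2)           # d^2/dz^2 of z^j at each point

Ainv = np.linalg.inv(A)
S = C @ Ainv
T = D @ Ainv
print(np.round(S, 2))    # rows reproduce matrix (130) of Example 15
print(np.round(T, 2))    # rows reproduce matrix (131)
```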
Example 15 [2]: Consider the ODE-BVP describing steady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out. Using the method of orthogonal collocation with $n = 4$ and defining the vector

$$\mathbf{C} = \left[\begin{array}{cccc} C_1 & C_2 & \dots & C_5 \end{array}\right]^T$$

at

$$z_1 = 0, \; z_2 = 0.1127, \; z_3 = 0.5, \; z_4 = 0.8873 \; \text{and} \; z_5 = 1$$

we get the following set of five simultaneous nonlinear algebraic equations

$$\frac{1}{Pe}\left[\left[\mathbf{t}^{(i)}\right]^T\mathbf{C}\right] - \left[\left[\mathbf{s}^{(i)}\right]^T\mathbf{C}\right] - Da\,C_i^2 = 0 \qquad i = 2, 3, 4$$

$$\left[\left[\mathbf{s}^{(1)}\right]^T\mathbf{C}\right] - Pe\,(C_1 - 1) = 0$$

$$\left[\left[\mathbf{s}^{(5)}\right]^T\mathbf{C}\right] = 0$$

where the matrices $\mathbf{A}$, $\mathbf{S}$ and $\mathbf{T}$ for the selected set of collocation points are

$$\mathbf{A} = \left[\begin{array}{ccccc} 1 & 0 & 0 & 0 & 0 \\ 1 & 0.1127 & (0.1127)^2 & (0.1127)^3 & (0.1127)^4 \\ 1 & 0.5 & (0.5)^2 & (0.5)^3 & (0.5)^4 \\ 1 & 0.8873 & (0.8873)^2 & (0.8873)^3 & (0.8873)^4 \\ 1 & 1 & 1 & 1 & 1 \end{array}\right] \qquad (129)$$
$$\mathbf{S} = \left[\begin{array}{c} \left[\mathbf{s}^{(1)}\right]^T \\ \left[\mathbf{s}^{(2)}\right]^T \\ \left[\mathbf{s}^{(3)}\right]^T \\ \left[\mathbf{s}^{(4)}\right]^T \\ \left[\mathbf{s}^{(5)}\right]^T \end{array}\right] = \left[\begin{array}{ccccc} -13 & 14.79 & -2.67 & 1.88 & -1 \\ -5.32 & 3.87 & 2.07 & -1.29 & 0.68 \\ 1.5 & -3.23 & 0 & 3.23 & -1.5 \\ -0.68 & 1.29 & -2.07 & -3.87 & 5.32 \\ 1 & -1.88 & 2.67 & -14.79 & 13 \end{array}\right] \qquad (130)$$

$$\mathbf{T} = \left[\begin{array}{c} \left[\mathbf{t}^{(1)}\right]^T \\ \left[\mathbf{t}^{(2)}\right]^T \\ \left[\mathbf{t}^{(3)}\right]^T \\ \left[\mathbf{t}^{(4)}\right]^T \\ \left[\mathbf{t}^{(5)}\right]^T \end{array}\right] = \left[\begin{array}{ccccc} 84 & -122.06 & 58.67 & -44.60 & 24 \\ 53.24 & -73.33 & 26.67 & -13.33 & 6.76 \\ -6 & 16.67 & -21.33 & 16.67 & -6 \\ 6.76 & -13.33 & 26.67 & -73.33 & 53.24 \\ 24 & -44.60 & 58.67 & -122.06 & 84 \end{array}\right] \qquad (131)$$

Thus, discretization yields a set of nonlinear algebraic equations.
Remark 16: Are the two methods presented above, i.e. finite difference and collocation methods, doing something fundamentally different? Let us compare the following two cases: (a) the finite difference method with 3 internal grid points, and (b) collocation with 3 internal grid points, on the basis of the expressions used for approximating the first and second order derivatives computed at one of the grid points. For the sake of comparison, we have taken equi-spaced grid points for the collocation method instead of taking them at the roots of the 3rd order orthogonal polynomial. Thus, for both the collocation and the finite difference method, the grid (or collocation) points are at $z_1 = 0, \; z_2 = 1/4, \; z_3 = 1/2, \; z_4 = 3/4, \; z_5 = 1$. Let us compare the expressions for the approximate derivatives at $z = z_3$ used in both the approaches.

Finite Difference:

$$\frac{du(z_3)}{dz} = \frac{(u_4 - u_2)}{2(\Delta z)} = 2u_4 - 2u_2 \,; \qquad \Delta z = 1/4$$

$$\frac{d^2 u(z_3)}{dz^2} = \frac{(u_4 - 2u_3 + u_2)}{(\Delta z)^2} = 16u_4 - 32u_3 + 16u_2$$

Collocation:

$$\frac{du(z_3)}{dz} = 0.33u_1 - 2.67u_2 + 2.67u_4 - 0.33u_5$$

$$\frac{d^2 u(z_3)}{dz^2} = -1.33u_1 + 21.33u_2 - 40u_3 + 21.33u_4 - 1.33u_5$$

It is clear from the above expressions that the essential difference between the two approaches is the way the derivatives at any grid (or collocation) point are approximated. The finite difference method takes only the immediate neighboring points for approximating the derivatives, while the collocation method finds the derivatives as a weighted sum of all the collocation (grid) points. As a consequence, the approximate solutions generated by these approaches will be different.
4.4.2 Discretization of PDEs [2]

Example 17: Consider the PDE describing unsteady state conditions in a tubular reactor with axial mixing (TRAM) given earlier. Using the method of orthogonal collocation with $n-1$ internal collocation points, we get

$$\frac{dC_i(t)}{dt} = \frac{1}{Pe}\left[\left[\mathbf{t}^{(i)}\right]^T\mathbf{C}(t)\right] - \left[\left[\mathbf{s}^{(i)}\right]^T\mathbf{C}(t)\right] - Da\left[C_i(t)\right]^2 \qquad i = 2, 3, \dots, n$$

where

$$\mathbf{C}(t) = \left[\begin{array}{cccc} C_1(t) & C_2(t) & \dots & C_{n+1}(t) \end{array}\right]^T$$

$C_i(t)$ represents the time varying concentration at the $i$th collocation point, $C(z_i, t)$, and the vectors $\left[\mathbf{t}^{(i)}\right]^T$ and $\left[\mathbf{s}^{(i)}\right]^T$ represent the row vectors of the matrices $\mathbf{T}$ and $\mathbf{S}$ defined by equation (127). The two boundary conditions yield the following algebraic constraints

$$\left[\mathbf{s}^{(1)}\right]^T\mathbf{C}(t) = Pe\left(C_1(t) - 1\right)$$

$$\left[\mathbf{s}^{(n+1)}\right]^T\mathbf{C}(t) = 0$$

Thus, the process of discretization in this case yields a set of differential algebraic equations of the form

$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, \mathbf{z}) \,; \qquad \mathbf{0} = \mathbf{G}(\mathbf{x}, \mathbf{z})$$

which have to be solved simultaneously subject to the specified initial conditions on $(\mathbf{x}, \mathbf{z})$. In the present case, since the algebraic constraints are linear, they can be used to eliminate the variables $C_1(t)$ and $C_{n+1}(t)$ from the set of ODEs resulting from the discretization. For example, when we select 3 internal grid points as discussed in Example 15, the boundary constraints can be stated as follows

$$-(13 + Pe)\,C_1(t) + 14.79\,C_2(t) - 2.67\,C_3(t) + 1.88\,C_4(t) - C_5(t) = -Pe$$

$$C_1(t) - 1.88\,C_2(t) + 2.67\,C_3(t) - 14.79\,C_4(t) + 13\,C_5(t) = 0$$
These equations can be used to eliminate the variables $C_1(t)$ and $C_5(t)$ from the three ODEs for $C_2(t), C_3(t), C_4(t)$ by solving the following linear algebraic equation

$$\left[\begin{array}{cc} (13 + Pe) & 1 \\ 1 & 13 \end{array}\right]\left[\begin{array}{c} C_1(t) \\ C_5(t) \end{array}\right] = \left[\begin{array}{c} 14.79\,C_2(t) - 2.67\,C_3(t) + 1.88\,C_4(t) + Pe \\ 1.88\,C_2(t) - 2.67\,C_3(t) + 14.79\,C_4(t) \end{array}\right]$$

Thus, the resulting set of $(n-1)$ ODEs together with the initial conditions

$$C_2(0) = f(z_2), \; \dots, \; C_n(0) = f(z_n) \qquad (132)$$

is the discretized problem.
Example 18 [2]: Consider the 2-dimensional Laplace equation given in Example 12. We consider a scenario where the thermal diffusivity $\alpha$ is a function of temperature. To begin with, we choose $(n_x - 1)$ internal collocation points along the x-axis and $(n_y - 1)$ internal collocation points along the y-axis. Using $n_x - 1$ internal grid lines parallel to the y axis and $n_y - 1$ grid lines parallel to the x-axis, we get $(n_x - 1)\times(n_y - 1)$ internal collocation points. Corresponding to the chosen collocation points, we can compute matrices $(\mathbf{S}_x, \mathbf{T}_x)$ and $(\mathbf{S}_y, \mathbf{T}_y)$ using equations (128). Using these matrices, the PDE can be transformed to a set of coupled algebraic equations as follows

$$\alpha(T_{i,j})\left[\left[\mathbf{t}^{(i)}_x\right]^T\mathbf{T}^{(j)}_x + \left[\mathbf{t}^{(j)}_y\right]^T\mathbf{T}^{(i)}_y\right] = f(x_i, y_j)$$

$$i = 2, \dots, n_x \,; \quad j = 2, \dots, n_y$$

where the vectors $\mathbf{T}^{(j)}_x$ and $\mathbf{T}^{(i)}_y$ are defined as

$$\mathbf{T}^{(j)}_x = \left[\begin{array}{cccc} T_{1,j} & T_{2,j} & \dots & T_{n_x+1,j} \end{array}\right]^T \,; \qquad \mathbf{T}^{(i)}_y = \left[\begin{array}{cccc} T_{i,1} & T_{i,2} & \dots & T_{i,n_y+1} \end{array}\right]^T$$

At the boundaries, we have

$$T_{1,j} = T^{*} \,; \quad (j = 1, \dots, n_y + 1)$$

$$T_{n_x+1,j} = T^{*} \,; \quad (j = 1, \dots, n_y + 1)$$

$$T_{i,1} = T^{*} \,; \quad (i = 1, \dots, n_x + 1)$$

$$k\left[\mathbf{s}^{(n_y+1)}_y\right]^T\mathbf{T}^{(i)}_y = h\left(T_{\infty} - T_{i,n_y+1}\right) \quad \text{for } (i = 2, \dots, n_x)$$

The above discretization procedure yields a set of $(n_x + 1)\times(n_y + 1)$ nonlinear algebraic equations in $(n_x + 1)\times(n_y + 1)$ unknowns, which have to be solved simultaneously.
4.5 Orthogonal Collocations on Finite Elements (OCFE)

The main difficulty with polynomial interpolation is that the Vandermonde matrix becomes ill conditioned when the order of the interpolation polynomial is selected to be large. A remedy to this problem is to sub-divide the region into finite elements and assume a lower order polynomial spline solution. The collocation points are then selected within each finite element, where the residuals are forced to zero. The continuity conditions (equal slopes) at the boundaries of neighboring finite elements give rise to additional constraints. We illustrate this method by taking a specific example.

Example 19 [2]: Consider the ODE-BVP describing steady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out. It is desired to solve this problem by the OCFE approach.

Step 1: The first step is to create finite elements in the domain. Let us assume that we create 3 sub-domains. Finite Element 1: $0 \leq z \leq 0.3$, Finite Element 2: $0.3 \leq z \leq 0.7$, Finite Element 3: $0.7 \leq z \leq 1$. It may be noted that these sub-domains need not be equi-sized.

Step 2: On each finite element, we define a scaled spatial variable as follows

$$\eta_1 = \frac{z - \zeta_1}{\zeta_2 - \zeta_1} \,; \qquad \eta_2 = \frac{z - \zeta_2}{\zeta_3 - \zeta_2} \,; \qquad \eta_3 = \frac{z - \zeta_3}{\zeta_4 - \zeta_3}$$

where $\zeta_1 = 0$, $\zeta_2 = 0.3$, $\zeta_3 = 0.7$ and $\zeta_4 = 1$ represent the boundary points of the finite elements. It is desired to develop a polynomial spline solution such that the polynomial on each finite element is of 4th order. Thus, within each element, we select 3 collocation points at the roots of the 3rd order shifted Legendre polynomial, i.e.,

$$\eta_{i,1} = 0.1127, \quad \eta_{i,2} = 0.5 \quad \text{and} \quad \eta_{i,3} = 0.8873 \quad \text{for } i = 1, 2, 3$$

In other words, the collocation points are placed at

$$\zeta_i + 0.1127(\zeta_{i+1} - \zeta_i), \quad \zeta_i + 0.5(\zeta_{i+1} - \zeta_i) \quad \text{and} \quad \zeta_i + 0.8873(\zeta_{i+1} - \zeta_i) \quad \text{for } i = 1, 2, 3$$

in the $i$th element $\zeta_i \leq z \leq \zeta_{i+1}$. Thus, in the present case, we have a total of 9 collocation points. In addition, we have two points where the neighboring polynomials meet, i.e. at $\zeta_2 = 0.3$ and $\zeta_3 = 0.7$. Thus, there are a total of 11 internal points and two boundary points, i.e. $\zeta_1 = 0$ and $\zeta_4 = 1$.

Step 3: Let the total set of points created in the previous steps be denoted as $z_1, z_2, \dots, z_{13}$. Since, on the $i$th element,

$$\eta_i = \frac{z - \zeta_i}{\zeta_{i+1} - \zeta_i} = \frac{z - \zeta_i}{h_i}$$

for $i = 1, 2, 3$, the ODE in each finite element is modified as follows
$$\frac{1}{Pe}\left[\frac{1}{h_i^2}\right]\frac{d^2 C}{d\eta_i^2} - \left[\frac{1}{h_i}\right]\frac{dC}{d\eta_i} - Da\,C^2 = 0 \quad \text{for } \zeta_i \leq z \leq \zeta_{i+1} \text{ and } i = 1, 2, 3 \qquad (134)$$

The main difference here is that only the variables associated with an element are used while discretizing the derivatives. Thus, at the collocation point $z_2$ in finite element 1, the residual is computed as follows

$$R_2 = \frac{1}{Pe}\left[\frac{1}{h_1^2}\right]\left[\left[\mathbf{t}^{(2)}\right]^T\mathbf{C}^{(1)}\right] - \left[\frac{1}{h_1}\right]\left[\left[\mathbf{s}^{(2)}\right]^T\mathbf{C}^{(1)}\right] - Da\,(C_2)^2 = 0 \qquad (135)$$

$$\left[\mathbf{t}^{(2)}\right]^T\mathbf{C}^{(1)} = \left(53.24\,C_1 - 73.33\,C_2 + 26.67\,C_3 - 13.33\,C_4 + 6.76\,C_5\right)$$

$$\left[\mathbf{s}^{(2)}\right]^T\mathbf{C}^{(1)} = \left(-5.32\,C_1 + 3.87\,C_2 + 2.07\,C_3 - 1.29\,C_4 + 0.68\,C_5\right)$$

where the vectors $\left[\mathbf{s}^{(2)}\right]^T$ and $\left[\mathbf{t}^{(2)}\right]^T$ are the 2nd rows of matrices (130) and (131), respectively.
Similarly, at the collocation point $z = z_8$, which corresponds to $\eta_{2,3} = 0.8873$ in finite element 2, the residual is computed as follows

$$R_8 = \frac{1}{Pe}\left[\frac{1}{h_2^2}\right]\left[\left[\mathbf{t}^{(4)}\right]^T\mathbf{C}^{(2)}\right] - \left[\frac{1}{h_2}\right]\left[\left[\mathbf{s}^{(4)}\right]^T\mathbf{C}^{(2)}\right] - Da\,(C_8)^2 = 0 \qquad (136)$$

$$\left[\mathbf{t}^{(4)}\right]^T\mathbf{C}^{(2)} = 6.76\,C_5 - 13.33\,C_6 + 26.67\,C_7 - 73.33\,C_8 + 53.24\,C_9$$

$$\left[\mathbf{s}^{(4)}\right]^T\mathbf{C}^{(2)} = -0.68\,C_5 + 1.29\,C_6 - 2.07\,C_7 - 3.87\,C_8 + 5.32\,C_9$$
The other equations arising from forcing the residuals to zero are

Finite Element 1: $R_3 = R_4 = 0$

Finite Element 2: $R_6 = R_7 = 0$

Finite Element 3: $R_{10} = R_{11} = R_{12} = 0$
In addition to these 9 equations arising from the residuals at the collocation points, there are two constraints at the element junction points $z_5$ and $z_9$, which ensure smoothness between the two neighboring polynomials, i.e.

$$\left[\frac{1}{h_1}\right]\left[\mathbf{s}^{(5)}\right]^T\mathbf{C}^{(1)} = \left[\frac{1}{h_2}\right]\left[\mathbf{s}^{(1)}\right]^T\mathbf{C}^{(2)}$$

$$\left[\frac{1}{h_2}\right]\left[\mathbf{s}^{(5)}\right]^T\mathbf{C}^{(2)} = \left[\frac{1}{h_3}\right]\left[\mathbf{s}^{(1)}\right]^T\mathbf{C}^{(3)}$$

The remaining two equations come from discretization of the boundary conditions.

$$\left[\frac{1}{h_1}\right]\left[\left[\mathbf{s}^{(1)}\right]^T\mathbf{C}^{(1)}\right] = Pe\,(C_1 - 1)$$

$$\left[\frac{1}{h_3}\right]\left[\left[\mathbf{s}^{(5)}\right]^T\mathbf{C}^{(3)}\right] = 0$$
Thus, we have 13 equations in 13 unknowns. It may be noted that, when we collect all the equations together, we get the following form of equation

$$\mathbf{A}\mathbf{C} = \mathbf{F}(\mathbf{C})$$

$$\mathbf{A} = \left[\begin{array}{ccc} \mathbf{A}_1 & [0] & [0] \\ \relax [0] & \mathbf{A}_2 & [0] \\ \relax [0] & [0] & \mathbf{A}_3 \end{array}\right]_{13 \times 13} \,; \qquad \mathbf{C} = \left[\begin{array}{cccc} C_1 & C_2 & \dots & C_{13} \end{array}\right]^T$$

and $\mathbf{F}(\mathbf{C})$ is a $13 \times 1$ function vector containing all the nonlinear terms. Here, $\mathbf{A}_1$, $\mathbf{A}_2$ and $\mathbf{A}_3$ are each $5 \times 5$ matrices and matrix $\mathbf{A}$ is a sparse block diagonal matrix.

The method described above can be easily generalized to any number of finite elements. Also, the method can be extended to the discretization of PDEs in a similar way. These extensions are left to the reader as an exercise and are not discussed separately. Note that block diagonal and sparse matrices naturally arise when we apply this method.
5 Least Square Approximations

While constructing an interpolation polynomial, we require that the interpolating function passes exactly through the specified set of points. Alternatively, one can relax the requirement that the approximating function passes exactly through the desired set of points. Instead, an approximating function is constructed such that it captures the trend in the variation of the dependent variable in some optimal way.

In the development that follows, we slightly change the way the data points are numbered and the first data point is indexed as $(u_1, z_1)$. Thus, we are given a data set $\{(u_i, z_i) : i = 1, \dots, n\}$, where $u_i$ denotes the value of the dependent variable at $z = z_i$ such that $\{z_i : i = 1, \dots, n\} \in [a, b]$. Let $f_1(z), \dots, f_m(z)$ represent a set of linearly independent functions in $C[a, b]$. Then, we propose to construct an approximating function, say $\widehat{g}(z)$, as follows

$$\widehat{g}(z) = \alpha_1 f_1(z) + \dots + \alpha_m f_m(z) \qquad (137)$$

where $m < n$ and the unknown coefficients $\alpha_1, \dots, \alpha_m$ are determined from the data set in some optimal manner. Defining the approximation error at point $z_i$ as

$$e_i = u_i - \left[\alpha_1 f_1(z_i) + \dots + \alpha_m f_m(z_i)\right] \qquad (138)$$

$$i = 1, 2, \dots, n$$

and the error vector, $\mathbf{e}$, as

$$\mathbf{e} = \left[\begin{array}{cccc} e_1 & e_2 & \dots & e_n \end{array}\right]^T$$
the problem of finding the best approximation $\widehat{g}(z)$ is posed as finding the parameters $\alpha_1, \dots, \alpha_m$ such that some norm of the error vector $(\mathbf{e})$ is minimized. The most commonly used norm is the weighted two norm, i.e.

$$\|\mathbf{e}\|_{W,2}^2 = \langle\mathbf{e}, \mathbf{e}\rangle_W = \mathbf{e}^T\mathbf{W}\mathbf{e} = \sum_{i=1}^{n} w_i\,e_i^2$$

where

$$\mathbf{W} = \text{diag}\left[\begin{array}{cccc} w_1 & w_2 & \dots & w_n \end{array}\right]$$

and $w_i > 0$ for all $i$. The set of equations (138) can be expressed as follows

$$\mathbf{e} = \mathbf{u} - \mathbf{A}\boldsymbol{\theta}$$

$$\boldsymbol{\theta} = \left[\begin{array}{cccc} \alpha_1 & \alpha_2 & \dots & \alpha_m \end{array}\right]^T \qquad (139)$$

$$\mathbf{u} = \left[\begin{array}{cccc} u_1 & u_2 & \dots & u_n \end{array}\right]^T \qquad (140)$$

$$\mathbf{A} = \left[\begin{array}{cccc} f_1(z_1) & f_2(z_1) & \dots & f_m(z_1) \\ f_1(z_2) & f_2(z_2) & \dots & f_m(z_2) \\ \dots & \dots & \dots & \dots \\ f_1(z_n) & f_2(z_n) & \dots & f_m(z_n) \end{array}\right] \qquad (141)$$

It may be noted that $\mathbf{e} \in \mathbb{R}^n$, $\mathbf{u} \in \mathbb{R}^n$, $\boldsymbol{\theta} \in \mathbb{R}^m$ and $\mathbf{A}$ is a non-square matrix of dimension $(n \times m)$. Thus, it is desired to choose a solution that minimizes the scalar quantity $\varphi = \mathbf{e}^T\mathbf{W}\mathbf{e}$, i.e.

$$\min_{\boldsymbol{\theta}}\;\varphi = \min_{\boldsymbol{\theta}}\;\mathbf{e}^T\mathbf{W}\mathbf{e} = \min_{\boldsymbol{\theta}}\;(\mathbf{u} - \mathbf{A}\boldsymbol{\theta})^T\mathbf{W}(\mathbf{u} - \mathbf{A}\boldsymbol{\theta}) \qquad (142)$$

The resulting approximating function is called the least square approximation. Another option is to find the parameters such that the infinity-norm of the vector $\mathbf{e}$ is minimized with respect to the parameters, i.e.

$$\min_{\boldsymbol{\theta}}\;\|\mathbf{e}\|_{\infty} = \min_{\boldsymbol{\theta}}\;\left[\max_i |e_i|\right]$$

These problems involve optimization of a scalar function with respect to the minimizing argument $\boldsymbol{\theta}$, which is a vector. The necessary and sufficient conditions for qualifying a point to be an optimum are given in the Appendix.
5.1 Solution of Linear Least Square Problem

Consider the minimization problem

$$\min_{\boldsymbol{\theta}}\;\left[\varphi = (\mathbf{u} - \mathbf{A}\boldsymbol{\theta})^T\mathbf{W}(\mathbf{u} - \mathbf{A}\boldsymbol{\theta})\right] \qquad (143)$$

To obtain a unique solution to this problem, the matrices $\mathbf{A}$ and $\mathbf{W}$ should satisfy the following conditions:

Condition C1: Matrix $\mathbf{W}$ should be positive definite.

Condition C2: Columns of matrix $\mathbf{A}$ should be linearly independent.

Using the necessary condition for optimality, we have

$$\frac{\partial\varphi}{\partial\boldsymbol{\theta}} = \mathbf{0}$$

Rules for differentiation of a scalar function $\varphi = \mathbf{x}^T\mathbf{B}\mathbf{y}$ with respect to the vectors $\mathbf{x}$ and $\mathbf{y}$ can be stated as follows

$$\frac{\partial}{\partial\mathbf{x}}\left(\mathbf{x}^T\mathbf{B}\mathbf{y}\right) = \mathbf{B}\mathbf{y} \qquad (144)$$

$$\frac{\partial}{\partial\mathbf{y}}\left[\mathbf{x}^T\mathbf{B}\mathbf{y}\right] = \mathbf{B}^T\mathbf{x} \qquad (145)$$

$$\frac{\partial}{\partial\mathbf{x}}\left[\mathbf{x}^T\mathbf{B}\mathbf{x}\right] = 2\,\mathbf{B}\mathbf{x} \quad \text{(when } \mathbf{B} \text{ is symmetric)} \qquad (146)$$

Applying the above rules to the scalar function

$$\varphi = \mathbf{u}^T\mathbf{W}\mathbf{u} - (\mathbf{A}\boldsymbol{\theta})^T\mathbf{W}\mathbf{u} - \mathbf{u}^T\mathbf{W}\mathbf{A}\boldsymbol{\theta} + \boldsymbol{\theta}^T(\mathbf{A}^T\mathbf{W}\mathbf{A})\boldsymbol{\theta}$$

together with the necessary condition for optimality yields the following constraint

$$\frac{\partial\varphi}{\partial\boldsymbol{\theta}} = -\mathbf{A}^T\mathbf{W}\mathbf{u} - \mathbf{A}^T\mathbf{W}\mathbf{u} + 2(\mathbf{A}^T\mathbf{W}\mathbf{A})\boldsymbol{\theta} = \mathbf{0} \qquad (147)$$

Rearranging the above equation, we have

$$(\mathbf{A}^T\mathbf{W}\mathbf{A})\,\boldsymbol{\theta}_{LS} = \mathbf{A}^T\mathbf{W}\mathbf{u} \qquad (148)$$

It may be noted that we have used the fact that $\mathbf{W}^T = \mathbf{W}$ and that the matrix $\mathbf{A}^T\mathbf{W}\mathbf{A}$ is symmetric. Also, even though $\mathbf{A}$ is a non-square $(n \times m)$ matrix, $\mathbf{A}^T\mathbf{W}\mathbf{A}$ is an $(m \times m)$ square matrix. When Conditions C1 and C2 are satisfied, the matrix $(\mathbf{A}^T\mathbf{W}\mathbf{A})$ is invertible and the least square estimate of the parameters can be computed as

$$\boldsymbol{\theta}_{LS} = \left[\mathbf{A}^T\mathbf{W}\mathbf{A}\right]^{-1}\left[\mathbf{A}^T\mathbf{W}\right]\mathbf{u} \qquad (149)$$

Thus, the linear least square estimation problem is finally reduced to solving linear equations. Using the sufficient condition for optimality, the Hessian matrix

$$\left[\frac{\partial^2\varphi}{\partial\boldsymbol{\theta}^2}\right] = 2(\mathbf{A}^T\mathbf{W}\mathbf{A}) \qquad (150)$$

should be positive definite or positive semi-definite for the stationary point to be a minimum. When Conditions C1 and C2 are satisfied, it can easily be shown that

$$\mathbf{x}^T\left[\mathbf{A}^T\mathbf{W}\mathbf{A}\right]\mathbf{x} = (\mathbf{A}\mathbf{x})^T\mathbf{W}(\mathbf{A}\mathbf{x}) \geq 0 \quad \text{for any } \mathbf{x} \in \mathbb{R}^m \qquad (151)$$

Thus, the sufficiency condition is satisfied and the stationary point is a minimum. As $\varphi$ is a convex function, it can be shown that the solution $\boldsymbol{\theta}_{LS}$ is the global minimum of $\varphi = \mathbf{e}^T\mathbf{W}\mathbf{e}$.
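The normal equations (148) are easily evaluated numerically. The sketch below is an illustrative addition; the data, the weight matrix and the monomial basis are assumptions.

```python
import numpy as np

# Weighted least squares estimate (149) for a basis of monomials f_1..f_4 = 1, z, z^2, z^3.
z = np.linspace(0.0, 1.0, 20)
u = np.sin(2.0 * z) + 0.01 * np.random.randn(z.size)     # noisy data (assumed)
W = np.diag(np.ones(z.size))                              # W = I here
A = np.column_stack([np.ones_like(z), z, z**2, z**3])     # matrix (141)

theta_ls = np.linalg.solve(A.T @ W @ A, A.T @ W @ u)      # normal equations (148)
print(theta_ls)
print(np.allclose(theta_ls, np.linalg.lstsq(A, u, rcond=None)[0]))  # cross-check for W = I
```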
5.2 Geometric Interpretation of Least Squares Approximation [11]

A special case of the above result is when $\mathbf{W} = \mathbf{I}$. The least square estimate of the parameter vector can then be computed as follows

$$\boldsymbol{\theta}_{LS} = \left[\mathbf{A}^T\mathbf{A}\right]^{-1}\mathbf{A}^T\mathbf{u} \qquad (152)$$

In the previous subsection, this result was derived by purely algebraic manipulations. In this section, we interpret this result from the geometric viewpoint.

5.2.1 Distance of a Point from a Line

Suppose we are given a vector $\mathbf{b} \in \mathbb{R}^3$ and we want to find its distance from the line in the direction of a vector $\mathbf{a} \in \mathbb{R}^3$. In other words, we are looking for a point $\mathbf{p}$ along the line that is closest to $\mathbf{b}$ (see Figure 5.2.1), i.e. $\mathbf{p} = \theta\mathbf{a}$ such that

$$\|\mathbf{e}\|_2 = \|\mathbf{p} - \mathbf{b}\|_2 = \|\theta\mathbf{a} - \mathbf{b}\|_2 \qquad (153)$$

is minimum. This problem can be solved by minimizing $\varphi = \|\mathbf{e}\|_2^2$ with respect to $\theta$, i.e.

$$\min_{\theta}\;\varphi = \min_{\theta}\;\langle\theta\mathbf{a} - \mathbf{b}, \theta\mathbf{a} - \mathbf{b}\rangle \qquad (154)$$

$$= \min_{\theta}\;\left[\theta^2\langle\mathbf{a}, \mathbf{a}\rangle - 2\theta\langle\mathbf{a}, \mathbf{b}\rangle + \langle\mathbf{b}, \mathbf{b}\rangle\right] \qquad (155)$$

Using the necessary condition for optimality,

$$\frac{\partial\varphi}{\partial\theta} = \theta\langle\mathbf{a}, \mathbf{a}\rangle - \langle\mathbf{a}, \mathbf{b}\rangle = 0 \qquad (156)$$

$$\Rightarrow\; \theta_{LS} = \frac{\langle\mathbf{a}, \mathbf{b}\rangle}{\langle\mathbf{a}, \mathbf{a}\rangle} \qquad (157)$$

$$\mathbf{p} = \theta_{LS}\,\mathbf{a} = \frac{\langle\mathbf{a}, \mathbf{b}\rangle}{\langle\mathbf{a}, \mathbf{a}\rangle}\,\mathbf{a} \qquad (158)$$

Now, equation (156) can be rearranged as

$$\langle\mathbf{a}, \theta_{LS}\mathbf{a}\rangle - \langle\mathbf{a}, \mathbf{b}\rangle = \langle\mathbf{a}, \theta_{LS}\mathbf{a} - \mathbf{b}\rangle = \langle\mathbf{a}, \mathbf{p} - \mathbf{b}\rangle = 0 \qquad (159)$$

which implies that the error vector $\mathbf{e} = \mathbf{p} - \mathbf{b}$ is perpendicular to $\mathbf{a}$. From school geometry, we know that if $\mathbf{p}$ is such a point, then the vector $(\mathbf{b} - \mathbf{p})$ is perpendicular to the direction $\mathbf{a}$. We have derived this geometric result using principles of optimization. Equation (158) can be further rearranged as

$$\mathbf{p} = \left\langle\frac{\mathbf{a}}{\sqrt{\langle\mathbf{a}, \mathbf{a}\rangle}}, \mathbf{b}\right\rangle\frac{\mathbf{a}}{\sqrt{\langle\mathbf{a}, \mathbf{a}\rangle}} = \langle\widehat{\mathbf{a}}, \mathbf{b}\rangle\,\widehat{\mathbf{a}} \qquad (160)$$

where $\widehat{\mathbf{a}} = \mathbf{a}/\sqrt{\langle\mathbf{a}, \mathbf{a}\rangle}$ is the unit vector along the direction of $\mathbf{a}$, and the point $\mathbf{p}$ is the projection of the vector $\mathbf{b}$ along the direction $\mathbf{a}$. Note that the above derivation holds in any general $n$ dimensional space $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$ or even any infinite dimensional vector space.

The equation can be rearranged as

$$\mathbf{p} = \mathbf{a}\left[\frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}\right] = \left[\frac{1}{\mathbf{a}^T\mathbf{a}}\right]\left[\mathbf{a}\mathbf{a}^T\right]\mathbf{b} = \mathbf{P}_r\,\mathbf{b} \qquad (161)$$

where $\mathbf{P}_r = \dfrac{1}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}\mathbf{a}^T$ is an $n \times n$ matrix and is called the projection matrix, which projects the vector $\mathbf{b}$ onto its column space.
5.2.2 Distance of a point from Subspace
The situation is exactly same when we are given a point b 1
3
and plane o in 1
3
, which is
spanned by two linearly independent vectors
_
a
(1)
. a
(2)
_
. We would like to nd distance of
b from o.i.e. a point p o such that |p b|
2
is minimum (see Figure 5.2.2). Again, from
school geometry, we know that such point can be obtained by drawing a perpendicular from
b to o ; p is the point where this perpendicular meets o (see Figure 5.2.2). We would like
to formally derive this result using optimization.
More generally, consider a : dimensional subspace o of 1
n
such that
o = :jc:
_
a
(1)
. a
(2)
. ..... a
(m)
_
where the vectors
_
a
(1)
. a
(2)
. ..... a
(m)
_
1
n
are linearly independent vectors. Given an
arbitrary point b 1
n
, the problem is to nd a point p in subspace o such that it is closest
to vector b (see Figure 5.2.2). As p o we have
p = c
1
a
(1)
+ c
2
a
(2)
+ .... + c
m
a
(m)
=
m
i=1
c
i
a
(i)
(162)
In other words, we would like to nd a point p o such that 2-norm of the error vector,
e = p b.i.e.
|e|
2
= |p b|
2
=
_
_
_
_
_
_
m
i=1
c
i
a
(i)
_
b
_
_
_
_
_
2
(163)
43
44
is minimum. This problem is equivalent to minimizing c = |e|
2
2
. i.e.
LS
=
min
c =
min
__
m
i=1
c
i
a
(i)
b
_
.
_
m
i=1
c
i
a
(i)
b
__
(164)
Using the necessary condition for optimality, we have
Jc
Jc
j
=
_
a
(j)
.
_
m
i=1
c
i
a
(i)
b
__
=
_
a
(i)
_
. (p b)
_
= 0 (165)
, = 1. 2. ...:
Equation (165) has a straight forward geometric interpretation. Vector p b is orthogonal
to each vector a
(i)
, which forms the basis of o, and the point p is the projection of b into
subspace o. Equation (165) can be further rearranged as follows
_
a
(j)
.
m
i=1
c
i
a
(i)
_
=
m
i=i
c
i
a
(j)
. a
(i)
_
=
a
(j)
. b
_
(166)
, = 1. 2. ...: (167)
Collecting the above set of equations and using vector-matrix notation, we arrive at the
following matrix equation
_
a
(1)
. a
(1)
_
a
(1)
. a
(2)
_
....
a
(1)
. a
(m)
_
a
(2)
. a
(1)
_
a
(2)
. a
(2)
_
....
a
(2)
. a
(m)
_
..... ..... ..... .....
a
(m)
. a
(1)
_
a
(m)
. a
(2)
_
.....
a
(m)
. a
(m)
_
_
_
_
_
c
1
c
2
....
c
m
_
_
=
_
a
(1)
. b
_
a
(2)
. b
_
....
a
(m)
. b
_
_
_
(168)
which is called the normal equation. Now, consider the : : matrix A constructed such
that vector a
(i)
forms ith column of A. .i.e.
A =
_
a
(1)
a
(2)
... a
(m)
_
It is easy to see that
    A^T A = [ ⟨a^(1), a^(1)⟩   ⟨a^(1), a^(2)⟩   ....   ⟨a^(1), a^(m)⟩ ]
            [ ⟨a^(2), a^(1)⟩   ⟨a^(2), a^(2)⟩   ....   ⟨a^(2), a^(m)⟩ ]
            [ .....             .....            ....   .....          ]
            [ ⟨a^(m), a^(1)⟩   ⟨a^(m), a^(2)⟩   ....   ⟨a^(m), a^(m)⟩ ]

    A^T b = [ ⟨a^(1), b⟩   ⟨a^(2), b⟩   ....   ⟨a^(m), b⟩ ]^T
In fact, equation (168) is general and holds for any definition of the inner product, such as

    ⟨ a^(i), a^(j) ⟩_W = ( a^(i) )^T W a^(j)

For the latter choice of the inner product, the normal equation (168) reduces to

    ( A^T W A ) α_LS = A^T W b

which is identical to equation (148).
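As a quick numerical illustration of the weighted normal equation (a sketch only; the matrix A, weight W and vector b below are made-up data), one can form (A^T W A) α = A^T W b directly and compare the unweighted case W = I with NumPy's built-in least squares solver.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 3))                # 6 x 3, linearly independent columns
    b = rng.standard_normal(6)
    W = np.diag([1.0, 1.0, 2.0, 2.0, 5.0, 5.0])    # positive definite weight matrix

    # Weighted normal equation: (A^T W A) alpha = A^T W b
    alpha_w = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

    # Unweighted case (W = I) should agree with the built-in least squares solver.
    alpha = np.linalg.solve(A.T @ A, A.T @ b)
    alpha_ref = np.linalg.lstsq(A, b, rcond=None)[0]

    print("weighted  :", alpha_w)
    print("unweighted:", alpha, " matches lstsq:", np.allclose(alpha, alpha_ref))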
5.2.3 Additional Geometric Insights
To begin with, we define the fundamental subspaces associated with a matrix.

Definition 20 (Column Space): The space spanned by the column vectors of matrix A is
defined as the column space of the matrix and is denoted as R(A).

It may be noted that when matrix A operates on vector x, it produces a vector Ax ∈
R(A), i.e. a vector in the column space of A. Thus, the system Ax = b can be solved if
and only if b belongs to the column space of A, i.e., b ∈ R(A).

Definition 21 (Row Space): The space spanned by the row vectors of matrix A is called the
row space of the matrix and is denoted as R(A^T).

Definition 22 (Null Space): The set of all vectors x such that Ax = 0 is called the null
space of matrix A and is denoted as N(A).

Definition 23 (Left Null Space): The set of all vectors y such that A^T y = 0 is called the
left null space of the matrix and is denoted as N(A^T).

The following fundamental result, which relates the dimensions of the row and column spaces
with the rank of a matrix, holds true for any m × n matrix A.

Theorem 24 (Fundamental Theorem of Linear Algebra): Given an m × n matrix A

    dim[ R(A) ] = number of linearly independent columns of A = rank(A)
    dim[ N(A) ] = n − rank(A)
    dim[ R(A^T) ] = number of linearly independent rows of A = rank(A)
    dim[ N(A^T) ] = m − rank(A)
With this background on the vector spaces associated with a matrix, the following comments
regarding the projection matrix are in order.

If the columns of A are linearly independent, then the matrix A^T A is invertible and the point
p, which is the projection of b onto the column space of A (i.e. R(A)), is given as

    p = A α_LS = A ( A^T A )^{-1} A^T b = [ P_r ] b                      (169)

    P_r = A ( A^T A )^{-1} A^T                                           (170)
Here the matrix P_r is the projection matrix, which projects vector b onto R(A), i.e. the
column space of A. Note that [ P_r ] b is the component of b in R(A), while

    b − ( P_r ) b = [ I − P_r ] b                                        (171)

is the component of b perpendicular to R(A). Thus we have a matrix formula for splitting a vector
into two orthogonal components.
The projection matrix has two fundamental properties:

    [ P_r ]^2 = P_r
    [ P_r ]^T = P_r

Conversely, any symmetric matrix with A^2 = A represents a projection matrix.
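A small numerical check of the splitting formula (171) and of the two fundamental properties is sketched below; the matrix A and vector b are arbitrary illustrative data.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 2))               # columns span a 2-D subspace of R^5
    b = rng.standard_normal(5)

    P_r = A @ np.linalg.inv(A.T @ A) @ A.T        # eq. (170)
    b_par = P_r @ b                               # component of b in R(A)
    b_perp = (np.eye(5) - P_r) @ b                # component perpendicular to R(A), eq. (171)

    print("b_par + b_perp == b :", np.allclose(b_par + b_perp, b))
    print("A^T b_perp ~ 0      :", np.allclose(A.T @ b_perp, 0))
    print("P_r^2 == P_r        :", np.allclose(P_r @ P_r, P_r))
    print("P_r^T == P_r        :", np.allclose(P_r.T, P_r))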
Suppose b ∈ R(A). Then b can be expressed as a linear combination of the columns of
A, i.e. the projection of b is still b itself,

    p = A α_LS = b                                                       (172)

This implies

    p = A ( A^T A )^{-1} A^T b = A ( A^T A )^{-1} ( A^T A ) α_LS = A α_LS = b          (173)
The closest point p to b is b itself.
At the other extreme, suppose b ⊥ R(A). Then

    p = A ( A^T A )^{-1} A^T b = A ( A^T A )^{-1} 0 = 0                  (174)
When A is square and invertible, every vector projects onto itself, i.e.

    p = A ( A^T A )^{-1} A^T b = ( A A^{-1} ) ( A^T )^{-1} A^T b = b
The matrix ( A^T A )^{-1} A^T is called the pseudo-inverse of matrix A.

The same construction carries over to any inner product space: given a vector u and linearly
independent vectors { a^(1), a^(2), ....., a^(m) }, the projection p = α_1 a^(1) + .... + α_m a^(m) of u onto
span{ a^(1), ....., a^(m) } is characterized by the requirement that the error u − p be orthogonal to
each basis vector, i.e.

    ⟨ u − p , a^(i) ⟩ = ⟨ u − ( α_1 a^(1) + α_2 a^(2) + .... + α_m a^(m) ) , a^(i) ⟩ = 0          (176)

    for i = 1, 2, ... m

This set of m equations can be written as
    [ ⟨a^(1), a^(1)⟩   ⟨a^(1), a^(2)⟩   ....   ⟨a^(1), a^(m)⟩ ] [ α_1 ]     [ ⟨a^(1), u⟩ ]
    [ ⟨a^(2), a^(1)⟩   ⟨a^(2), a^(2)⟩   ....   ⟨a^(2), a^(m)⟩ ] [ α_2 ]  =  [ ⟨a^(2), u⟩ ]          (177)
    [ .....             .....            ....   .....          ] [ ... ]     [ ....       ]
    [ ⟨a^(m), a^(1)⟩   ⟨a^(m), a^(2)⟩   ....   ⟨a^(m), a^(m)⟩ ] [ α_m ]     [ ⟨a^(m), u⟩ ]
This is the general form of the normal equation resulting from the minimization problem. The
m × m matrix G on the L.H.S. is called the Gram matrix. If the vectors { a^(1), a^(2), ........, a^(m) } are
linearly independent, then the Gram matrix is nonsingular. Moreover, if the set { a^(1), a^(2), ........, a^(m) }
is chosen to be an orthonormal set, say { e^(1), e^(2), ........, e^(m) }, then the Gram matrix reduces to the
identity matrix, i.e. G = I, and we have

    p = α_1 e^(1) + α_2 e^(2) + .... + α_m e^(m)                         (178)
where

    α_i = ⟨ e^(i), u ⟩

as ⟨ e^(i), e^(j) ⟩ = 0 when i ≠ j. It is important to note that, if we choose an orthonormal set
{ e^(1), e^(2), ........, e^(m) } and we want to include an additional orthonormal vector, say e^(m+1),
in this set, then we can compute α_{m+1} as

    α_{m+1} = ⟨ e^(m+1), u ⟩

without requiring to recompute α_1, ...., α_m.
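The following sketch illustrates this incremental property in R^n: an orthonormal set is generated by applying Gram-Schmidt (via QR factorization) to a few arbitrary vectors, the coefficients α_i = ⟨e^(i), u⟩ are computed, and adding one more orthonormal vector leaves the earlier coefficients unchanged. The vectors are random illustrative data.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    u = rng.standard_normal(n)

    # Orthonormalize 4 arbitrary vectors; the columns of Q form an orthonormal set.
    Q, _ = np.linalg.qr(rng.standard_normal((n, 4)))

    alpha_3 = Q[:, :3].T @ u      # coefficients using e^(1), e^(2), e^(3)
    alpha_4 = Q[:, :4].T @ u      # add e^(4): the first three coefficients are unchanged

    print("alpha with 3 basis vectors:", alpha_3)
    print("alpha with 4 basis vectors:", alpha_4)
    print("first three unchanged:", np.allclose(alpha_3, alpha_4[:3]))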
Remark 26 Given any Hilbert space X and an orthonormal basis for the Hilbert space
{ e^(1), e^(2), ..., e^(m), ... } we can express any vector u ∈ X as

    u = α_1 e^(1) + α_2 e^(2) + .... + α_m e^(m) + ......                (179)

    α_i = ⟨ e^(i), u ⟩                                                   (180)

The series

    u = ⟨ e^(1), u ⟩ e^(1) + ⟨ e^(2), u ⟩ e^(2) + ........... + ⟨ e^(i), u ⟩ e^(i) + ....          (181)

      = Σ_{i=1}^{∞} ⟨ e^(i), u ⟩ e^(i)                                   (182)

which converges to the element u ∈ X, is called the generalized Fourier series expansion
of the element u, and the coefficients α_i = ⟨ e^(i), u ⟩ are the corresponding Fourier coefficients.
The well known Fourier expansion of a continuous function over the interval [−π, π] using
{ sin(kt), cos(kt) : k = 0, 1, ... } is a special case of this general result.
5.3.1 Simple Polynomial Models and Hilbert Matrices [11, 7]
Consider the problem of approximating a continuous function, say u(z), over the interval [0, 1] by a
simple polynomial model of the form

    û(z) = α_1 + α_2 z + α_3 z^2 + ...... + α_{m+1} z^m                  (183)

Let the inner product on C^(2)[0, 1] be defined as

    ⟨ h(z), g(z) ⟩ = ∫_0^1 h(z) g(z) dz

We want to find a polynomial of the form (183) which approximates u(z) in the least square
sense. Geometrically, we want to project u(z) onto the (m+1) dimensional subspace of C^(2)[0, 1]
spanned by the vectors

    f_1(z) = 1 ;  f_2(z) = z ;  f_3(z) = z^2 , ....... , f_{m+1}(z) = z^m          (184)
Using the projection theorem, we get the normal equation

    [ ⟨1, 1⟩     ⟨1, z⟩     ....   ⟨1, z^m⟩   ] [ α_1     ]     [ ⟨1, u(z)⟩   ]
    [ ⟨z, 1⟩     ⟨z, z⟩     ....   ⟨z, z^m⟩   ] [ α_2     ]  =  [ ⟨z, u(z)⟩   ]          (185)
    [ ....        ....       ....   ....       ] [ ...     ]     [ ....        ]
    [ ⟨z^m, 1⟩   ⟨z^m, z⟩   ....   ⟨z^m, z^m⟩ ] [ α_{m+1} ]     [ ⟨z^m, u(z)⟩ ]
The element h_ij of the matrix on the L.H.S. can be computed as

    h_ij = ∫_0^1 z^{i+j−2} dz = 1 / ( i + j − 1 )                        (186)
and this reduces the above equation to

    H_{m+1} [ α_1  α_2  ....  α_{m+1} ]^T = [ ⟨1, u(z)⟩   ⟨z, u(z)⟩   ....   ⟨z^m, u(z)⟩ ]^T          (187)

where

    H_{m+1} = [ 1          1/2    1/3    ...    1/(m+1)  ]
              [ 1/2        1/3    1/4    ...    1/(m+2)  ]
              [ ...        ...    ...    ...    ...      ]              (188)
              [ 1/(m+1)    ...    ...    ...    1/(2m+1) ]_{(m+1)×(m+1)}
The matrix H_{m+1} is known as the Hilbert matrix and this matrix is highly ill-conditioned for
m + 1 > 3. The following table shows condition numbers for a few values of m. (Refer to the
Lecture Notes on Solving Linear Algebraic Equations for the concept of condition
number and matrix conditioning.)

    m + 1        3       4        5        6       7        8
    c_2(H)      524    1.55e4   4.67e5   1.5e7   4.75e8   1.53e10       (189)
Thus, for polynomial models of small order, say m = 3, we obtain a good situation, but
beyond this order, whatever be the method of solution, we get approximations of less and
less accuracy. This implies that approximating a continuous function by a polynomial of type
(183) with the choice of basis vectors as in (184) is an extremely ill-conditioned problem from the
viewpoint of numerical computations. Also, note that if we want to increase the degree of the
polynomial to say (m + 1) from m, then we have to recompute α_1, ....., α_{m+1} along with α_{m+2}.
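The rapid growth of the condition number reported in (189) is easy to reproduce; the short sketch below uses SciPy's hilbert helper and the 2-norm condition number (exact values depend on the norm used).

    import numpy as np
    from scipy.linalg import hilbert

    for m_plus_1 in range(3, 9):
        H = hilbert(m_plus_1)                     # (m+1) x (m+1) Hilbert matrix
        print(m_plus_1, "cond_2(H) = %.3e" % np.linalg.cond(H, 2))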
On the other hand, consider the model

    û(z) = β_1 p_1(z) + β_2 p_2(z) + β_3 p_3(z) + ............. + β_m p_m(z)          (190)

where p_i(z) represents the ith order orthonormal basis function on C^(2)[0, 1], i.e.

    ⟨ p_i(z), p_j(z) ⟩ = { 1  if i = j
                           0  if i ≠ j }                                 (191)
Then the normal equation reduces to

    [ 1   0   ....   0 ] [ β_1 ]     [ ⟨p_1(z), u(z)⟩ ]
    [ 0   1   ....   0 ] [ β_2 ]  =  [ ⟨p_2(z), u(z)⟩ ]                  (192)
    [ ..... ..... ..... ] [ ... ]     [ ....            ]
    [ 0   0   ....   1 ] [ β_m ]     [ ⟨p_m(z), u(z)⟩ ]

or simply

    β_i = ⟨ p_i(z), u(z) ⟩  ;   i = 1, 2, ...., m                        (193)
Obviously, the approximation problem is extremely well conditioned in this case. In fact, if
we want to increase the degree of the polynomial to say (m + 1) from m, then we do not have
to recompute β_1, ....., β_m as in the case of basis (184). We simply have to compute the additional
coefficient β_{m+1} as

    β_{m+1} = ⟨ p_{m+1}(z), u(z) ⟩                                       (194)
The above illustration of approximation of a function by orthogonal polynomials is a special
case of what is known as generalized Fourier series expansion.
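As a concrete sketch of this well conditioned alternative, the script below approximates u(z) = exp(z) on [0, 1] using orthonormal shifted Legendre polynomials p_i(z) (constructed here from NumPy's Legendre routines by mapping [−1, 1] to [0, 1]; the choice of u and of the degree m is illustrative only). Each coefficient β_i = ⟨p_i, u⟩ is an independent integral, so raising the degree never requires recomputing earlier coefficients.

    import numpy as np
    from numpy.polynomial import legendre
    from scipy.integrate import quad

    def p(i, z):
        """Orthonormal shifted Legendre polynomial of degree i on [0, 1]."""
        # P_i on [-1, 1] mapped to [0, 1]; the norm of P_i(2z-1) on [0, 1] is 1/sqrt(2i+1).
        c = np.zeros(i + 1); c[i] = 1.0
        return np.sqrt(2 * i + 1) * legendre.legval(2.0 * z - 1.0, c)

    u = np.exp
    m = 5
    beta = [quad(lambda z, i=i: p(i, z) * u(z), 0.0, 1.0)[0] for i in range(m)]

    # Evaluate the approximation and report the maximum error on a fine grid.
    zz = np.linspace(0.0, 1.0, 200)
    u_hat = sum(b * p(i, zz) for i, b in enumerate(beta))
    print("max error with", m, "terms:", np.max(np.abs(u_hat - u(zz))))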
5.3.2 Approximation of Numerical Data by a Polynomial [7]
Suppose we only know numerical values u_1, u_2, ......, u_n at points z_1, z_2, ......, z_n ∈ [0, 1] and we
want to develop a simple polynomial model of the form given by equation (183). Substituting
the data into the polynomial model leads to an overdetermined set of equations

    u_i = α_1 + α_2 z_i + α_3 z_i^2 + ..... + α_m z_i^{m−1} + e_i        (195)

    i = 1, 2, ....., n                                                   (196)
The least square estimates of the model parameters (for W = I) can be obtained by solving the
normal equation

    ( A^T A ) α_LS = A^T u                                               (197)

where

    A = [ 1     z_1    z_1^2    ...    z_1^{m−1} ]
        [ ...   ...    ...      ...    .......   ]                       (198)
        [ 1     z_n    z_n^2    ...    z_n^{m−1} ]
    A^T A = [ n              Σ z_i          Σ z_i^2        ....   Σ z_i^{m−1}  ]
            [ Σ z_i          Σ z_i^2        .....          ....   Σ z_i^m      ]
            [ .....          .....          .....          ....   ......       ]          (199)
            [ Σ z_i^{m−1}    .....          .....          ....   Σ z_i^{2m−2} ]
i.e.,

    ( A^T A )_{jk} = Σ_{i=1}^{n} z_i^{j+k−2}                             (200)
Let us assume that the z_i are uniformly distributed in the interval [0, 1]. For large n, approximating
dz ≈ z_i − z_{i−1} ≈ 1/n, we can write

    ( A^T A )_{jk} = Σ_{i=1}^{n} z_i^{j+k−2}  ≅  n ∫_0^1 z^{j+k−2} dz = n / ( j + k − 1 )          (201)

    ( j, k = 1, 2, ......, m )                                           (202)
Thus, we can approximate the ( A^T A ) matrix by the Hilbert matrix

    ( A^T A ) ≅ n ( H ) = n [ 1      1/2    1/3    ...    1/m      ]
                            [ 1/2    1/3    1/4    ...    1/(m+1)  ]
                            [ ...    ...    ...    ...    ...      ]    (203)
                            [ 1/m    ...    ...    ...    1/(2m−1) ]
which is highly ill-conditioned for large m. Thus, whether we have a continuous function
or numerical data over the interval [0, 1], the numerical difficulties persist as the Hilbert matrix
appears in both cases.
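The approximation (203) can be checked numerically; the following minimal sketch uses an arbitrary sample size n and model order m and compares the assembled A^T A with n times the Hilbert matrix.

    import numpy as np
    from scipy.linalg import hilbert

    n, m = 500, 6
    z = np.linspace(0.0, 1.0, n)                  # uniformly spaced points in [0, 1]
    A = np.vander(z, m, increasing=True)          # columns 1, z, ..., z^(m-1)

    AtA = A.T @ A
    print("cond(A^T A)     = %.3e" % np.linalg.cond(AtA))
    print("cond(n * H_m)   = %.3e" % np.linalg.cond(n * hilbert(m)))
    print("max entry error = %.3e" % np.max(np.abs(AtA - n * hilbert(m))))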
5.4 Problem Discretization using Minimum Residual Methods
In interpolation based methods, we force the residuals to zero at a finite set of collocation points.
Based on the least squares approach discussed in this section, one can think of constructing
an approximation so that the residual becomes small (in some sense) on the entire domain.
Thus, given an ODE-BVP / PDE, we seek an approximate solution as a linear combination
of a finite number of linearly independent functions. Parameters of this approximation are
determined in such a way that some norm of the residuals is minimized. There are many
discretization methods that belong to this broad class. In this section, we provide a brief
introduction to these discretization approaches.
5.4.1 Rayleigh-Ritz method [11, 12]

To understand the motivation for developing this approach, first consider a linear system of
equations

    Ax = b                                                               (204)

where A is an n × n positive definite and symmetric matrix and it is desired to solve for vector
x. We can pose this as a minimization problem by defining an objective function of the form

    φ(x) = (1/2) x^T A x − x^T b                                         (205)

         = (1/2) ⟨x, Ax⟩ − ⟨x, b⟩                                        (206)

If φ(x) attains its minimum at x = x*, the necessary condition for optimality requires

    ∂φ/∂x = A x* − b = 0                                                 (207)

which is precisely the equation we want to solve. Since the Hessian matrix ∂²φ/∂x² = A
is positive definite, the solution x = x* = A^{-1} b is the unique minimum of φ(x). A similar
minimization formulation can be developed for solving the ODE-BVP

    L[u(z)] ≡ −d²u/dz² = f(z)     ( 0 < z < 1 )                          (208)

    B.C. 1 :  u(0) = 0                                                   (209)

    B.C. 2 :  u(1) = 0                                                   (210)

provided the operator L has properties analogous to those of a symmetric positive definite matrix.
When matrix A has all real elements, we have

    x^T ( A y ) = ( A^T x )^T y

and it is easy to see that A* = A^T, i.e.

    ⟨ x, A y ⟩ = ⟨ A^T x, y ⟩                                            (211)
The matrix A is called self-adjoint if A^T = A. Does the operator L defined by equations (208-
210) have similar properties of symmetry and positiveness? Analogous to the concept of the
adjoint of a matrix, we first introduce the concept of the adjoint of an operator L on an inner
product space.
Definition 28 (Adjoint of Operator): An operator L* is said to be the adjoint of operator L
if, for every pair of functions u and v in the relevant space, ⟨ v, L u ⟩ = ⟨ L* v, u ⟩ together with
the associated adjoint boundary conditions B.C. 1* and B.C. 2*.
Further, the operator L is said to be self-adjoint if L* = L, B.C. 1* = B.C. 1 and
B.C. 2* = B.C. 2.
To begin with, let us check whether the operator L defined by equations (208-210) is
self-adjoint.

    ⟨ v, L u ⟩ = ∫_0^1 v(z) ( −d²u/dz² ) dz

               = [ −v(z) du/dz ]_0^1 + ∫_0^1 (dv/dz)(du/dz) dz

               = [ −v(z) du/dz ]_0^1 + [ (dv/dz) u(z) ]_0^1 + ∫_0^1 ( −d²v/dz² ) u(z) dz

Using the boundary conditions u(0) = u(1) = 0, we have

    [ (dv/dz) u(z) ]_0^1 = (dv/dz)(1) u(1) − (dv/dz)(0) u(0) = 0

If we set

    B.C. 1* :  v(0) = 0

    B.C. 2* :  v(1) = 0

then

    [ v(z) du/dz ]_0^1 = 0
and we have

    ⟨ v, L u ⟩ = ∫_0^1 ( −d²v/dz² ) u(z) dz = ⟨ L* v, u ⟩

In fact, it is easy to see that the operator L is self-adjoint as L* = L, B.C. 1* = B.C. 1 and
B.C. 2* = B.C. 2. The correspondence with the matrix case is  A ≡ L ;  x ≡ u(z) ;  b ≡ f(z).
Let u(z) = u*(z) represent the true solution of the ODE-BVP. Now, taking motivation from
the optimization formulation for solving Ax = b, we can formulate a minimization problem
to compute the solution

    φ[u(z)] = (1/2) ⟨ u(z), −d²u/dz² ⟩ − ⟨ u(z), f(z) ⟩                  (212)

            = (1/2) ∫_0^1 u(z) ( −d²u/dz² ) dz − ∫_0^1 u(z) f(z) dz      (213)

    u*(z) = Min_{u(z)} φ[u(z)]                                           (214)

          = Min_{u(z)} [ (1/2) ⟨ u(z), L u(z) ⟩ − ⟨ u(z), f(z) ⟩ ]       (215)

    u(z) ∈ C^(2)[0, 1]                                                   (216)

    subject to u(0) = u(1) = 0
Thus, solving the ODE-BVP has been converted to solving a minimization problem.
Integrating the first term in equation (213) by parts, we have

    ∫_0^1 u(z) ( −d²u/dz² ) dz = ∫_0^1 ( du/dz )^2 dz − [ u (du/dz) ]_0^1          (217)
Now, using the boundary conditions, we have

    [ u (du/dz) ]_0^1 = u(1) (du/dz)_{z=1} − u(0) (du/dz)_{z=0} = 0      (218)
This reduces φ(u) to

    φ(u) = [ (1/2) ∫_0^1 ( du/dz )^2 dz ] − [ ∫_0^1 u f(z) dz ]          (219)
The above equation is similar to an energy function, where the first term is analogous to
kinetic energy and the second term is analogous to potential energy. As

    ∫_0^1 ( du/dz )^2 dz

is positive and symmetric, we are guaranteed to find the minimum. The main difficulty
in performing the search is that, unlike the previous case where we were working in R^n,
the search space is infinite dimensional as u(z) ∈ C^(2)[0, 1]. One remedy to alleviate this
difficulty is to reduce the infinite dimensional search problem to a finite dimensional search
space by constructing an approximate solution using a finite number of trial functions. Let
φ^(0)(z), ......, φ^(n)(z) represent the trial functions. Then, the approximate solution is constructed
as follows

    û(z) = θ_0 φ^(0)(z) + ..... + θ_n φ^(n)(z)                           (220)

where φ^(i)(z) represents the ith trial function. Using this approximation, we convert the infinite
dimensional optimization problem to a finite dimensional optimization problem as follows
    Min_θ  φ̂(θ) = [ (1/2) ∫_0^1 ( dû/dz )^2 dz ] − [ ∫_0^1 û f(z) dz ]                   (221)

                = (1/2) ∫_0^1 [ θ_0 ( dφ^(0)(z)/dz ) + ..... + θ_n ( dφ^(n)(z)/dz ) ]^2 dz
                  − ∫_0^1 f(z) [ θ_0 φ^(0)(z) + ..... + θ_n φ^(n)(z) ] dz                  (222)
The trial functions φ^(i)(z) are chosen in advance and the coefficients θ_0, ...., θ_n are treated as
the unknowns. Also, let us assume that these functions are selected such that û(0) = û(1) = 0.
Then, using the necessary conditions for optimality, we get

    ∂φ̂/∂θ_i = 0   for  i = 0, 1, 2, ... n                                (223)
These equations can be rearranged as follows

    ∂φ̂/∂θ = A θ − b = 0                                                  (224)

where

    θ = [ θ_0   θ_1   ...   θ_n ]^T

    A = [ ⟨ dφ^(0)/dz , dφ^(0)/dz ⟩   ........   ⟨ dφ^(0)/dz , dφ^(n)/dz ⟩ ]
        [ ..................           ........   .............             ]          (225)
        [ ⟨ dφ^(n)/dz , dφ^(0)/dz ⟩   ........   ⟨ dφ^(n)/dz , dφ^(n)/dz ⟩ ]

    b = [ ⟨ φ^(0)(z), f(z) ⟩   ............   ⟨ φ^(n)(z), f(z) ⟩ ]^T                   (226)
Thus, the optimization problem under consideration can be recast as follows

    Min_θ  φ̂(θ) = Min_θ [ (1/2) θ^T A θ − θ^T b ]                        (227)

It is easy to see that the matrix A is positive definite and symmetric and the global minimum
of the above optimization problem can be found by using the necessary condition for optimality,
i.e. ∂φ̂/∂θ = A θ − b = 0, or θ = A^{-1} b. Note the similarity of the above equation with
the normal equation arising from the projection theorem. Thus, the steps in the Rayleigh-Ritz
method can be summarized as follows

1. Choose an approximate solution.

2. Compute matrix A and vector b.

3. Solve A θ = b.
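A minimal sketch of these three steps for the model problem −d²u/dz² = f(z), u(0) = u(1) = 0, is given below. The trial functions φ^(i)(z) = sin(iπz) (which already satisfy the boundary conditions) and the forcing f(z) = 1 are illustrative choices, and the entries of A and b are evaluated by numerical quadrature rather than analytically.

    import numpy as np
    from scipy.integrate import quad

    n = 4                                   # number of trial functions
    f = lambda z: 1.0

    phi  = lambda i, z: np.sin(i * np.pi * z)              # trial functions, i = 1..n
    dphi = lambda i, z: i * np.pi * np.cos(i * np.pi * z)

    # Step 2: assemble A_ij = <dphi_i/dz, dphi_j/dz> and b_i = <phi_i, f>.
    A = np.array([[quad(lambda z: dphi(i, z) * dphi(j, z), 0, 1)[0]
                   for j in range(1, n + 1)] for i in range(1, n + 1)])
    b = np.array([quad(lambda z: phi(i, z) * f(z), 0, 1)[0] for i in range(1, n + 1)])

    # Step 3: solve A theta = b.
    theta = np.linalg.solve(A, b)

    # Compare with the exact solution u(z) = z(1 - z)/2 at a few points.
    zz = np.linspace(0, 1, 5)
    u_hat = sum(t * phi(i, zz) for i, t in zip(range(1, n + 1), theta))
    print("u_hat :", u_hat)
    print("exact :", zz * (1 - zz) / 2)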
5.4.2 Discretization of ODE-BVP / PDEs using Finite Element Method
The finite element method is a powerful tool for solving PDEs, particularly when the system
under consideration has complex geometry. This method is based on the least square approx-
imation. In this section, we provide a very brief introduction to the discretization of PDEs
and ODE-BVPs using the finite element method.
Discretization of an ODE-BVP using Finite Elements [11]   Similar to the finite difference
method, we begin by choosing (n − 1) equidistant internal node (grid) points as follows

    z_i = i Δz   ( i = 0, 1, 2, ...., n )

and defining n finite elements

    z_{i−1} ≤ z ≤ z_i   for  i = 1, 2, ... n

Then we formulate the approximate solution using piecewise polynomials on each
finite element. The simplest possible choice is a line

    û_i(z) = a_i + b_i z                                                 (228)

    z_{i−1} ≤ z ≤ z_i   for  i = 1, 2, ... n                             (229)
With this choice, the approximate solution for the ODE-BVP can be expressed as

    û(z) = { a_1 + b_1 z   for  z_0 ≤ z ≤ z_1
             a_2 + b_2 z   for  z_1 ≤ z ≤ z_2
             .....
             a_n + b_n z   for  z_{n−1} ≤ z ≤ z_n }                      (230)
In principle, we can work with this piecewise polynomial approximation. However, the
resulting optimization problem has the coefficients ( a_i, b_i : i = 1, 2, ... n ) as unknowns. If the
optimization problem has to be solved numerically, it is hard to generate an initial guess for
these unknown coefficients. Thus, it is necessary to parameterize the polynomial in terms of
unknowns for which it is relatively easy to generate the initial guess. This can be achieved
as follows. Let û_i denote the value of the approximate solution û(z) at z = z_i, i.e.

    û_i = û( z_i )                                                       (231)
Then, at the boundary points of the ith element, we have

    û( z_{i−1} ) = û_{i−1} = a_i + b_i z_{i−1}                           (232)

    û( z_i ) = û_i = a_i + b_i z_i                                       (233)
Using these equations, we can express ( a_i, b_i ) in terms of the unknowns ( û_{i−1}, û_i ) as follows

    a_i = ( û_{i−1} z_i − û_i z_{i−1} ) / Δz  ;   b_i = ( û_i − û_{i−1} ) / Δz          (234)
Thus, the polynomial on the ith segment can be written as

    û_i(z) = ( û_{i−1} z_i − û_i z_{i−1} ) / Δz + [ ( û_i − û_{i−1} ) / Δz ] z          (235)

    z_{i−1} ≤ z ≤ z_i   for  i = 1, 2, ... n
and the approximate solution can be expressed as follows

    û(z) = { ( û_0 z_1 − û_1 z_0 ) / Δz + [ ( û_1 − û_0 ) / Δz ] z           for  z_0 ≤ z ≤ z_1
             ( û_1 z_2 − û_2 z_1 ) / Δz + [ ( û_2 − û_1 ) / Δz ] z           for  z_1 ≤ z ≤ z_2
             ..............
             ( û_{n−1} z_n − û_n z_{n−1} ) / Δz + [ ( û_n − û_{n−1} ) / Δz ] z   for  z_{n−1} ≤ z ≤ z_n }          (236)
Thus, now we can work in terms of the unknown values û_0, û_1, ...., û_n instead of the parameters
a_i and b_i. Since the unknowns û_0, û_1, ...., û_n correspond to some physical variable, it is relatively
easy to generate good guesses for these unknowns from knowledge of the underlying physics
of the problem. The resulting form is still not convenient from the viewpoint of evaluating
the integrals involved in the computation of φ[û(z)]. A more elegant and useful form of equation
(236) can be found by defining shape functions. To arrive at this representation, consider
the rearrangement of the line segment equation on the ith element as follows
    û_i(z) = ( û_{i−1} z_i − û_i z_{i−1} ) / Δz + [ ( û_i − û_{i−1} ) / Δz ] z          (237)

           = [ ( z_i − z ) / Δz ] û_{i−1} + [ ( z − z_{i−1} ) / Δz ] û_i
Let us define two functions, M_i(z) and N_i(z), which are called shape functions, as follows

    M_i(z) = ( z_i − z ) / Δz  ;   N_i(z) = ( z − z_{i−1} ) / Δz

    z_{i−1} ≤ z ≤ z_i   for  i = 1, 2, ... n

The graphs of these shape functions are straight lines and they have the fundamental properties

    M_i(z) = { 1  ;  z = z_{i−1}
               0  ;  z = z_i     }                                       (238)

    N_i(z) = { 0  ;  z = z_{i−1}
               1  ;  z = z_i     }                                       (239)
This allows us to express û_i(z) as

    û_i(z) = û_{i−1} M_i(z) + û_i N_i(z)

    i = 1, 2, ... n

Note that the coefficient û_i appears in the polynomials û_i(z) and û_{i+1}(z), i.e.

    û_i(z) = û_{i−1} M_i(z) + û_i N_i(z)

    û_{i+1}(z) = û_i M_{i+1}(z) + û_{i+1} N_{i+1}(z)
Thus, we can define a continuous trial function by combining N_i(z) and M_{i+1}(z) as follows

    φ^(i)(z) = { N_i(z) = ( z − z_{i−1} ) / Δz = 1 + ( z − z_i ) / Δz   ;   z_{i−1} ≤ z ≤ z_i
                 M_{i+1}(z) = ( z_{i+1} − z ) / Δz = 1 − ( z − z_i ) / Δz   ;   z_i ≤ z ≤ z_{i+1}
                 0   Elsewhere                                             }          (240)

    i = 1, 2, .... n
This yields the simplest and most widely used hat function, which is shown in Figure 1. This
is a continuous linear function of z, but it is not differentiable at z_{i−1}, z_i, and z_{i+1}. Also, note
that at z = z_j we have

    φ^(i)( z_j ) = { 1  if i = j
                     0  otherwise }                                      (241)

    j = 1, 2, ....., n
Thus, the plot of this function looks like a symmetric triangle. The two functions at the boundary
points are defined as ramps

    φ^(0)(z) = { M_1(z) = 1 − z / Δz   ;   0 ≤ z ≤ z_1
                 0   Elsewhere          }                                (242)

    φ^(n)(z) = { N_n(z) = 1 + ( z − z_n ) / Δz   ;   z_{n−1} ≤ z ≤ z_n
                 0   Elsewhere                    }                      (243)
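A small helper that evaluates the hat trial function φ^(i)(z) defined by (240), (242) and (243) is sketched below; the node count and grid spacing are arbitrary illustrative choices. Such a routine is convenient when assembling the finite element equations or simply plotting the basis.

    import numpy as np

    def hat(i, z, z_nodes):
        """Piecewise linear trial function phi^(i) on the node set z_nodes."""
        z = np.asarray(z, dtype=float)
        dz = z_nodes[1] - z_nodes[0]              # equidistant grid assumed
        zi = z_nodes[i]
        # Rising piece N_i on [z_{i-1}, z_i] and falling piece M_{i+1} on [z_i, z_{i+1}].
        left  = np.where((z >= zi - dz) & (z <= zi), 1.0 - (zi - z) / dz, 0.0)
        right = np.where((z >  zi) & (z <= zi + dz), 1.0 - (z - zi) / dz, 0.0)
        # The boundary ramps (242)-(243) follow automatically because the pieces
        # outside [0, 1] are never evaluated when z stays inside [0, 1].
        return left + right

    z_nodes = np.linspace(0.0, 1.0, 6)            # n = 5 elements
    zz = np.linspace(0.0, 1.0, 11)
    print(hat(2, zz, z_nodes))                    # ~1 at z = z_2, 0 at the other nodes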
Introduction of these trial functions allows us to express the approximate solution as

    û(z) = û_0 φ^(0)(z) + ...... + û_n φ^(n)(z)                          (244)

    0 ≤ z ≤ 1

and now we can work with u = [ û_0  û_1  ...  û_n ]^T as the unknowns. Now, we have two
boundary conditions, i.e.

    û_0 = 0   and   û_n = 0
and the set of unknowns is reduced to u = [ û_1  û_2  ...  û_{n−1} ]^T. The optimum parameters
u can be computed by solving the equation

    A u − b = 0                                                          (245)

where

    ( A )_{ij} = ⟨ dφ^(i)/dz , dφ^(j)/dz ⟩                               (246)

and

    dφ^(i)/dz = {  1/Δz   on the interval to the left of z_i
                  −1/Δz   on the interval to the right of z_i }

If the intervals do not overlap, then

    ⟨ dφ^(i)/dz , dφ^(j)/dz ⟩ = 0                                        (247)
The intervals overlap when

    i = j :     ⟨ dφ^(i)/dz , dφ^(i)/dz ⟩ = ∫_{z_{i−1}}^{z_i} (1/Δz)^2 dz + ∫_{z_i}^{z_{i+1}} (1/Δz)^2 dz = 2/Δz          (248)

or

    i = j + 1 :   ⟨ dφ^(i)/dz , dφ^(i−1)/dz ⟩ = ∫_{z_{i−1}}^{z_i} (1/Δz)(−1/Δz) dz = −1/Δz          (249)

    i = j − 1 :   ⟨ dφ^(i)/dz , dφ^(i+1)/dz ⟩ = ∫_{z_i}^{z_{i+1}} (1/Δz)(−1/Δz) dz = −1/Δz          (250)
Thus, the matrix A is a tridiagonal matrix

    A = (1/Δz) [  2   −1   ....   ....    0 ]
               [ −1    2   −1     ....  ... ]
               [ ...  ....  ....  ....  ... ]                            (251)
               [  0   ....  ....  −1     2 ]
which is similar to the matrix obtained using the finite difference method. The components of
the vector b on the R.H.S. are computed as

    b_i = ⟨ φ^(i), f(z) ⟩                                                (252)

        = ∫_{z_{i−1}}^{z_i} f(z) [ 1 + ( z − z_i ) / Δz ] dz + ∫_{z_i}^{z_{i+1}} f(z) [ 1 − ( z − z_i ) / Δz ] dz          (253)

    i = 1, 2, ...., n − 1                                                (254)
Figure 1: (a) Trial functions and (b) Piece-wise linear approximation
which is a weighted average of f(z) over the interval z_{i−1} ≤ z ≤ z_{i+1}. Note that the R.H.S.
is significantly different from that in the finite difference method.
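Putting the pieces together, the sketch below assembles the tridiagonal system (251)-(253) for −d²u/dz² = f(z), u(0) = u(1) = 0, with the illustrative choice f(z) = 1, evaluates the load integrals by quadrature, and compares the nodal values with the exact solution u(z) = z(1 − z)/2.

    import numpy as np
    from scipy.integrate import quad

    n = 10                                   # number of elements
    z = np.linspace(0.0, 1.0, n + 1)
    dz = z[1] - z[0]
    f = lambda s: 1.0

    # Stiffness matrix (251) for the interior nodes 1 .. n-1.
    A = (np.diag(2.0 * np.ones(n - 1)) +
         np.diag(-1.0 * np.ones(n - 2), 1) +
         np.diag(-1.0 * np.ones(n - 2), -1)) / dz

    # Load vector (253): weighted average of f over [z_{i-1}, z_{i+1}].
    b = np.array([quad(lambda s: f(s) * (1.0 + (s - z[i]) / dz), z[i - 1], z[i])[0] +
                  quad(lambda s: f(s) * (1.0 - (s - z[i]) / dz), z[i], z[i + 1])[0]
                  for i in range(1, n)])

    u_hat = np.linalg.solve(A, b)            # nodal values u_1 .. u_{n-1}
    u_exact = z[1:-1] * (1.0 - z[1:-1]) / 2.0
    print("max nodal error:", np.max(np.abs(u_hat - u_exact)))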
In this sub-section, we have developed an approximate solution using a piecewise linear ap-
proximation. It is possible to develop piecewise quadratic or piecewise cubic approximations
and generate better approximations. Readers are referred to Computational Science and
Engineering by Gilbert Strang [13].
Discretization of a PDE using the Finite Element Method [12]   The Rayleigh-Ritz method
can be easily applied to discretize PDEs when the operators are self-adjoint. Consider the
Laplace / Poisson equation

    L u ≡ −∂²u/∂x² − ∂²u/∂y² = f(x, y)                                   (255)

in an open set S, with u(x, y) = 0 on the boundary. Let the inner product on the space
C^(2)[0, 1] × C^(2)[0, 1] be defined as

    ⟨ f(x, y), g(x, y) ⟩ = ∫_0^1 ∫_0^1 f(x, y) g(x, y) dx dy            (256)
We formulate an optimization problem

    φ(u) = (1/2) ⟨ u(x, y), −∂²u/∂x² − ∂²u/∂y² ⟩ − ⟨ u(x, y), f(x, y) ⟩          (257)
Integrating by parts, we can show that
    φ(u) = ∫∫ [ (1/2)(∂u/∂x)^2 + (1/2)(∂u/∂y)^2 − f u ] dx dy                        (258)

         = (1/2) ⟨ ∂u/∂x, ∂u/∂x ⟩ + (1/2) ⟨ ∂u/∂y, ∂u/∂y ⟩ − ⟨ f(x, y), u(x, y) ⟩    (259)
We begin by choosing (n − 1) × (n − 1) equidistant (with Δx = Δy = h) internal node
(grid) points at ( x_i, y_j ), where

    x_i = i h   ( i = 1, 2, ...., n − 1 )

    y_j = j h   ( j = 1, 2, ...., n − 1 )

In two dimensions, the simplest element divides the region into triangles on which simple poly-
nomials are fitted. For example, û(x, y) can be approximated as

    û(x, y) = a + b x + c y

where the coefficients a, b, c can be expressed in terms of the values of û(x, y) at the triangle vertices.
For example, consider the triangle defined by ( x_i, y_j ), ( x_{i+1}, y_j ) and ( x_i, y_{j+1} ). The values of
the approximate solution at the corner points are denoted by

    û_{i,j} = û( x_i, y_j )  ;   û_{i+1,j} = û( x_{i+1}, y_j )  ;   û_{i,j+1} = û( x_i, y_{j+1} )
Then, û(x, y) can be written in terms of shape functions as follows

    û(x, y) = û_{i,j} + [ ( û_{i+1,j} − û_{i,j} ) / h ] ( x − x_i ) + [ ( û_{i,j+1} − û_{i,j} ) / h ] ( y − y_j )

            = û_{i,j} [ 1 − ( x − x_i )/h − ( y − y_j )/h ] + û_{i+1,j} [ ( x − x_i )/h ] + û_{i,j+1} [ ( y − y_j )/h ]          (260)
Now, the coefficient û_{i,j} appears in the shape functions of the four triangular elements around
( x_i, y_j ). Collecting these shape functions, we can define a two dimensional trial function as follows
    φ^(i,j)(x, y) = { 1 − ( x − x_i )/h − ( y − y_j )/h   ;   x_i ≤ x ≤ x_{i+1} ,  y_j ≤ y ≤ y_{j+1}
                      1 + ( x − x_i )/h − ( y − y_j )/h   ;   x_{i−1} ≤ x ≤ x_i ,  y_j ≤ y ≤ y_{j+1}
                      1 − ( x − x_i )/h + ( y − y_j )/h   ;   x_i ≤ x ≤ x_{i+1} ,  y_{j−1} ≤ y ≤ y_j
                      1 + ( x − x_i )/h + ( y − y_j )/h   ;   x_{i−1} ≤ x ≤ x_i ,  y_{j−1} ≤ y ≤ y_j
                      0   Elsewhere                        }
Figure 2: Trial function in two dimensions.
The shape of this trial function is like a pyramid (see Figure 2). We can define trial functions
at the boundary points in a similar manner. Thus, expressing the approximate solution using
trial functions and using the fact that û(x, y) = 0 at the boundary points, we get

    û(x, y) = û_{1,1} φ^(1,1)(x, y) + .... + û_{n−1,n−1} φ^(n−1,n−1)(x, y)

where φ^(i,j)(x, y) represents the (i, j)th trial function. For the sake of convenience, let us re-
number these trial functions and coefficients using a new index l = 0, 1, ......, N such that

    l = i + ( n − 1 ) j   ;   i = 1, ... n − 1  and  j = 0, 1, ... n − 1

    N = ( n − 1 ) × ( n − 1 )

The approximate solution can now be expressed as

    û(x, y) = û_0 φ_0(x, y) + .... + û_N φ_N(x, y)
The minimization problem can be reformulated as

    Min_u φ(û) = Min_u [ (1/2) ⟨ ∂û/∂x, ∂û/∂x ⟩ + (1/2) ⟨ ∂û/∂y, ∂û/∂y ⟩ − ⟨ f(x, y), û(x, y) ⟩ ]

where

    u = [ û_0  û_1  ...  û_N ]^T

Thus, the above objective function can be reformulated as

    Min_u φ(u) = Min_u [ (1/2) u^T A u − u^T b ]                         (261)
where

    ( A )_{ij} = ⟨ ∂φ_i/∂x , ∂φ_j/∂x ⟩ + ⟨ ∂φ_i/∂y , ∂φ_j/∂y ⟩           (262)

    b_i = ⟨ f(x, y), φ_i(x, y) ⟩                                         (263)
Again, the matrix A is a symmetric and positive definite matrix and this guarantees that the
stationary point of φ(u) is the minimum. At the minimum, we have

    ∂φ/∂u = A u − b = 0                                                  (264)

The matrix A will also be a sparse matrix. The main limitation of the Rayleigh-Ritz method is
that it works only when the operator L is symmetric or self-adjoint.
5.4.3 Method of Least Squares [4]

This is probably the best known minimum residual method. When used for solving linear op-
erator equations, this approach does not require self-adjointness of the linear operator. To
understand the method, let us first consider a linear ODE-BVP

    L[u(z)] = f(z)                                                       (265)

    B.C. 1 :  u(0) = 0                                                   (266)

    B.C. 2 :  u(1) = 0                                                   (267)

Consider an approximate solution constructed as a linear combination of a finite number
of linearly independent functions

    û(z) = θ_1 u_1(z) + θ_2 u_2(z) + .... + θ_n u_n(z)

Let us assume that these basis functions are selected such that the two boundary conditions
are satisfied, i.e. u_i(0) = u_i(1) = 0. Given this approximate solution, the residual is defined
as follows

    R(z) = L[û(z)] − f(z)   where  0 < z < 1
The idea is to determine

    θ = [ θ_1  θ_2  ...  θ_n ]^T

such that the norm of the residual, ⟨ R(z), R(z) ⟩, is minimized, i.e.

    Min_θ ⟨ L û − f , L û − f ⟩

Using the necessary conditions for optimality, we obtain the normal equation

    [ ⟨ L u_1, L u_1 ⟩   ⟨ L u_1, L u_2 ⟩   ....   ⟨ L u_1, L u_n ⟩ ] [ θ_1 ]     [ ⟨ L u_1, f(z) ⟩ ]
    [ ⟨ L u_2, L u_1 ⟩   ⟨ L u_2, L u_2 ⟩   ....   ⟨ L u_2, L u_n ⟩ ] [ θ_2 ]  =  [ ⟨ L u_2, f(z) ⟩ ]          (268)
    [ .....               .....              ....   .....            ] [ ... ]     [ ....             ]
    [ ⟨ L u_n, L u_1 ⟩   ⟨ L u_n, L u_2 ⟩   ....   ⟨ L u_n, L u_n ⟩ ] [ θ_n ]     [ ⟨ L u_n, f(z) ⟩ ]

which can be solved analytically.
Example 29 [4] Use the least squares method to find an approximate solution of the equation

    L[u(z)] ≡ −d²u/dz² + u = 1                                           (269)

    B.C. 1 :  u(0) = 0                                                   (270)

    B.C. 2 :  u(1) = 0                                                   (271)

Let us select the function expansion as

    û(z) = θ_1 sin(πz) + θ_2 sin(2πz)

It may be noted that this choice ensures that the boundary conditions are satisfied. Now,

    L[ u_1(z) ] = ( π² + 1 ) sin(πz)

    L[ u_2(z) ] = ( 4π² + 1 ) sin(2πz)
With the inner product defined as

    ⟨ f, g ⟩ = ∫_0^1 f(z) g(z) dz

the normal equation becomes

    [ ( π² + 1 )² / 2          0           ] [ θ_1 ]     [ 2 ( π² + 1 ) / π ]
    [       0           ( 4π² + 1 )² / 2   ] [ θ_2 ]  =  [        0         ]

and the approximate solution is

    û(z) = [ 4 / ( π ( π² + 1 ) ) ] sin(πz)

which agrees with the exact solution

    u(z) = 1 − ( e^z + e^{1−z} ) / ( e + 1 )

to within 0.006.
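The coefficients in Example 29 can be reproduced numerically. The sketch below assumes the operator L[u] = −u'' + u as written in (269), evaluates the inner products by quadrature instead of analytically, and compares the resulting approximation against the exact solution quoted above.

    import numpy as np
    from scipy.integrate import quad

    # Basis functions and their images under L[u] = -u'' + u.
    u1  = lambda z: np.sin(np.pi * z)
    u2  = lambda z: np.sin(2 * np.pi * z)
    Lu1 = lambda z: (np.pi**2 + 1) * np.sin(np.pi * z)
    Lu2 = lambda z: (4 * np.pi**2 + 1) * np.sin(2 * np.pi * z)

    ip = lambda g, h: quad(lambda z: g(z) * h(z), 0.0, 1.0)[0]

    A = np.array([[ip(Lu1, Lu1), ip(Lu1, Lu2)],
                  [ip(Lu2, Lu1), ip(Lu2, Lu2)]])
    b = np.array([ip(Lu1, lambda z: 1.0), ip(Lu2, lambda z: 1.0)])
    theta = np.linalg.solve(A, b)
    print("theta:", theta, " expected theta_1 =", 4 / (np.pi * (np.pi**2 + 1)))

    # Compare with the exact solution u(z) = 1 - (e^z + e^(1-z)) / (e + 1).
    zz = np.linspace(0.0, 1.0, 101)
    u_hat = theta[0] * u1(zz) + theta[1] * u2(zz)
    u_ex = 1.0 - (np.exp(zz) + np.exp(1.0 - zz)) / (np.e + 1.0)
    print("max |u_hat - u_exact|:", np.max(np.abs(u_hat - u_ex)))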
When boundary conditions are non-homogeneous, it is sometimes possible to transform
them to homogeneous conditions. Alternatively, the optimization problem is formulated in
such a way that the boundary conditions are satisfied in the least square sense [4]. While this
method can, in principle, be extended to the discretization of a general ODE-BVP of type (22-
24a), working with the parameter vector θ as the minimizing argument can pose practical
difficulties, as the resulting minimization problem has to be solved numerically. Coming up with
an initial guess of θ to start the iterative algorithms can prove to be a tricky task. Alternatively,
one can work with trial solutions of the form (244) or (260) to make the problem computationally
tractable.
5.4.4 Galerkin's Method [4, 2]

Galerkin's method can be applied even when the differential operator is not self-adjoint
or symmetric. Instead of minimizing φ(û), we solve for

    ⟨ φ^(i)(z), L û(z) ⟩ = ⟨ φ^(i)(z), f(z) ⟩

    i = 1, 2, ....., n

where û(z) is chosen as a finite dimensional approximation to u(z)

    û(z) = û_1 φ^(1)(z) + ...... + û_n φ^(n)(z)                          (272)

Rearranging the above equations as

    ⟨ φ^(i)(z), ( L û(z) − f(z) ) ⟩ = 0   for  ( i = 1, 2, .... n )

we can observe that the parameters û_1, ....., û_n are computed such that the error or residual
vector

    e(z) = ( L û(z) − f(z) )

is orthogonal to the n dimensional subspace spanned by the set S defined as

    S = { φ^(i)(z) : i = 1, 2, .... n }
This results in a linear algebraic equation of the form

    A u = b                                                              (273)

where

    A = [ ⟨ φ^(1), L( φ^(1) ) ⟩   ........   ⟨ φ^(1), L( φ^(n) ) ⟩ ]
        [ .............             ........   ...........           ]          (274)
        [ ⟨ φ^(n), L( φ^(1) ) ⟩   ........   ⟨ φ^(n), L( φ^(n) ) ⟩ ]

    b = [ ⟨ φ^(1)(z), f(z) ⟩   .............   ⟨ φ^(n)(z), f(z) ⟩ ]^T
Solving for u gives the approximate solution given by equation (272). When the operator L is
self-adjoint, Galerkin's method reduces to the Rayleigh-Ritz method.
Example 30 Consider the ODE-BVP

    L u ≡ −∂²u/∂z² + ∂u/∂z = f(z)                                        (275)

    in ( 0 < z < 1 )                                                     (276)

    subject to u(0) = 0 ;  u(1) = 0                                      (277)

It can be shown that

    L* ( = −∂²/∂z² − ∂/∂z )  ≠  ( −∂²/∂z² + ∂/∂z ) = L

Thus, the Rayleigh-Ritz method cannot be applied to generate an approximate solution to this
problem; however, Galerkin's method can be applied.
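A sketch of Galerkin's method applied to the non-self-adjoint operator of Example 30 is given below, assuming the operator −d²u/dz² + du/dz as written in (275), the illustrative forcing f(z) = 1, and sine trial functions. The matrix ⟨φ^(i), Lφ^(j)⟩ is no longer symmetric, but the linear system (273) can still be assembled and solved.

    import numpy as np
    from scipy.integrate import quad

    n = 4
    f = lambda z: 1.0
    phi  = lambda i, z: np.sin(i * np.pi * z)
    # L[phi_j] for L[u] = -u'' + u' applied to sin(j*pi*z).
    Lphi = lambda j, z: (j * np.pi)**2 * np.sin(j * np.pi * z) + j * np.pi * np.cos(j * np.pi * z)

    A = np.array([[quad(lambda z: phi(i, z) * Lphi(j, z), 0, 1)[0]
                   for j in range(1, n + 1)] for i in range(1, n + 1)])
    b = np.array([quad(lambda z: phi(i, z) * f(z), 0, 1)[0] for i in range(1, n + 1)])

    u_coef = np.linalg.solve(A, b)
    print("A symmetric?", np.allclose(A, A.T))     # False: the operator is not self-adjoint
    print("Galerkin coefficients:", u_coef)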
It may be noted that one need not restrict to linear transformations while applying
Galerkin's method. This approach can be used even when the ODE-BVP or PDE at hand
is a nonlinear transformation. Given a general nonlinear transformation of the form

    T(u) = f(z)

we select a set of trial functions { φ^(i)(z) : i = 0, 1, ... n } and an approximate solution of the
form (272), and solve for

    ⟨ φ^(i)(z), T( û(z) ) ⟩ = ⟨ φ^(i)(z), f(z) ⟩   for  i = 0, 1, 2, .... n
Example 31 [4] Use Galerkin's method to find an approximate solution of the equation

    L[u(z)] ≡ −d²u/dz² + u = 1                                           (278)

    B.C. 1 :  u(0) = 0                                                   (279)

    B.C. 2 :  u(1) = 0                                                   (280)

Let us select the function expansion as follows

    û(z) = θ_1 sin(πz) + θ_2 sin(2πz)

which implies

    L[ û(z) ] = θ_1 ( π² + 1 ) sin(πz) + θ_2 ( 4π² + 1 ) sin(2πz)
With the inner product defined as

    ⟨ f, g ⟩ = ∫_0^1 f(z) g(z) dz

the normal equation becomes

    [ ( π² + 1 ) / 2          0           ] [ θ_1 ]     [ 2/π ]
    [       0           ( 4π² + 1 ) / 2   ] [ θ_2 ]  =  [  0  ]

and the approximate solution is

    û(z) = [ 4 / ( π ( π² + 1 ) ) ] sin(πz)

which turns out to be identical to the least square solution.
Example 32 [2] Consider the ODE-BVP describing steady state conditions in a tubular re-
actor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out:

    T(C) ≡ (1/Pe) d²C/dz² − dC/dz − Da C² = 0     ( 0 ≤ z ≤ 1 )

    dC/dz = Pe ( C − 1 )   at  z = 0 ;

    dC/dz = 0              at  z = 1 ;

The approximate solution is chosen as

    Ĉ(z) = Ĉ_1 φ^(1)(z) + ...... + Ĉ_{n+1} φ^(n+1)(z) = Σ_{i=1}^{n+1} Ĉ_i φ^(i)(z)          (281)
and we then evaluate the following set of equations

    ⟨ φ^(i)(z), (1/Pe) d²Ĉ(z)/dz² − dĈ(z)/dz − Da Ĉ(z)² ⟩ = ⟨ φ^(i)(z), f(z) ⟩   for  i = 2, .... n

where the inner product is defined as

    ⟨ g(z), h(z) ⟩ = ∫_0^1 g(q) h(q) dq
It may be noted that the evaluation of integrals such as

    ⟨ φ^(i)(z), Ĉ(z)² ⟩ = ∫_0^1 φ^(i)(q) [ Σ_j Ĉ_j φ^(j)(q) ]² dq

will give rise to equations that are nonlinear in terms of the unknown coefficients. Two additional
equations arise from enforcing the boundary conditions, i.e.

    dĈ(0)/dz = Pe ( Ĉ(0) − 1 )

    dĈ(1)/dz = 0
Thus, we get (n+1) nonlinear algebraic equations in (n+1) unknowns, which have to be solved
simultaneously to compute the unknown coefficients Ĉ_1, ... Ĉ_{n+1}. Details of computing these
integrals and developing piecewise approximating functions on finite elements can be found in
[2].
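The sketch below illustrates how such a nonlinear Galerkin system can be set up and solved. It is a minimal sketch, not the finite element implementation described in [2]: it uses a simple global monomial basis, illustrative parameter values Pe = 6 and Da = 2, and scipy.optimize.fsolve for the resulting nonlinear algebraic equations (stiffer parameter values may require a better initial guess).

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import fsolve

    Pe, Da = 6.0, 2.0          # illustrative parameter values
    n = 4                      # number of basis functions is n + 1

    # Monomial basis phi_i(z) = z^(i-1), i = 1 .. n+1, with derivatives of C(z).
    def C(c, z):    return sum(ci * z**i for i, ci in enumerate(c))
    def dC(c, z):   return sum(i * ci * z**(i - 1) for i, ci in enumerate(c) if i >= 1)
    def d2C(c, z):  return sum(i * (i - 1) * ci * z**(i - 2) for i, ci in enumerate(c) if i >= 2)

    def residual_eqs(c):
        res = lambda z: d2C(c, z) / Pe - dC(c, z) - Da * C(c, z)**2
        eqs = [quad(lambda z: z**(i - 1) * res(z), 0.0, 1.0)[0] for i in range(2, n + 1)]
        eqs.append(dC(c, 0.0) - Pe * (C(c, 0.0) - 1.0))   # BC at z = 0
        eqs.append(dC(c, 1.0))                            # BC at z = 1
        return eqs

    c0 = np.zeros(n + 1); c0[0] = 1.0                     # initial guess: C(z) = 1
    c_sol = fsolve(residual_eqs, c0)
    zz = np.linspace(0.0, 1.0, 5)
    print("C(z) at", zz, ":", [round(C(c_sol, z), 4) for z in zz])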
6 Errors in Discretization and Computations [4]

As evident from the various examples discussed in this module, the process of discretization
converts the original (often intractable) problem, typically posed in an infinite dimensional space,
into a computationally tractable form in finite dimensions. Obviously, solving the discretized
version of the problem, ŷ = T̂(x̂), only yields an approximation of the solution of the original
problem.
Suppose x = x̄ is a minimum point of φ(x), i.e. φ(x̄) < φ(x) for any x ∈ R^N in a neighborhood
of x̄, and consider a perturbation Δx in which only the kth component Δx_k is nonzero. Using
the Taylor series expansion of φ(x) in the neighborhood of x̄, we can write

    φ( x̄ + Δx ) = φ( x̄ ) + Σ_{i=1}^{N} ( ∂φ/∂x_i )( x̄ ) Δx_i + R_2( x̄, Δx )          (287)

i.e.

    φ( x̄ + Δx ) − φ( x̄ ) = Δx_k ( ∂φ/∂x_k )( x̄ ) + R_2( x̄, Δx )                       (288)
Since R_2( x̄, Δx ) is of order ( Δx_k )², the term of order Δx_k will dominate over the higher
order term for sufficiently small Δx. Thus, the sign of φ( x̄ + Δx ) − φ( x̄ ) is decided by the sign of

    Δx_k ( ∂φ/∂x_k )( x̄ )
Suppose

    ( ∂φ/∂x_k )( x̄ ) > 0                                                 (289)

Then, choosing Δx_k < 0 implies

    φ( x̄ + Δx ) − φ( x̄ ) < 0   ⟹   φ( x̄ + Δx ) < φ( x̄ )                  (290)

and φ(x) can be further reduced by reducing Δx_k. This contradicts the assumption that
x = x̄ is a minimum point. Similarly, if
    ( ∂φ/∂x_k )( x̄ ) < 0                                                 (291)

then, choosing Δx_k > 0 implies

    φ( x̄ + Δx ) − φ( x̄ ) < 0   ⟹   φ( x̄ + Δx ) < φ( x̄ )                  (292)

and φ(x) can be further reduced by increasing Δx_k. This again contradicts the assumption that
x = x̄ is a minimum point. Thus, x = x̄ will be a minimum of φ(x) only if
    ( ∂φ/∂x_k )( x̄ ) = 0   for  k = 1, 2, ... N                          (293)

Similar arguments can be made if x = x̄ is a maximum of φ(x).
8.3 Sufficient Condition for Optimality

The sufficient condition for optimality, which can be used to establish whether a stationary
point is a maximum or a minimum, is given by the following theorem.
Theorem 41 A sufficient condition for a stationary point x = x̄ to be an extreme point (i.e.
maximum or minimum) is that the matrix [ ∂²φ/∂x_i ∂x_j ] (the Hessian of φ) evaluated at x = x̄ is

1. positive definite when x = x̄ is a minimum

2. negative definite when x = x̄ is a maximum
Proof: Using the Taylor series expansion, we have

    φ( x̄ + Δx ) = φ( x̄ ) + Σ_{i=1}^{N} ( ∂φ/∂x_i )( x̄ ) Δx_i
                  + (1/2!) Σ_{i=1}^{N} Σ_{j=1}^{N} [ ∂²φ( x̄ + λ Δx ) / ∂x_i ∂x_j ] Δx_i Δx_j     ( 0 < λ < 1 )          (294)
Since x = x̄ is a stationary point, we have

    ∇φ( x̄ ) = 0                                                          (295)

Thus, the above equation reduces to

    φ( x̄ + Δx ) − φ( x̄ ) = (1/2!) Σ_{i=1}^{N} Σ_{j=1}^{N} [ ∂²φ( x̄ + λ Δx ) / ∂x_i ∂x_j ] Δx_i Δx_j          (296)

    ( 0 < λ < 1 )
This implies that the sign of φ( x̄ + Δx ) − φ( x̄ ) at the extreme point x̄ is the same as the sign of
the R.H.S. Since the 2nd partial derivative ∂²φ/∂x_i ∂x_j is continuous in the neighborhood of
x = x̄, its value at x = x̄ + λ Δx will have the same sign as its value at x = x̄ for all sufficiently
small Δx. If the quantity
    Σ_{i=1}^{N} Σ_{j=1}^{N} [ ∂²φ( x̄ + λ Δx ) / ∂x_i ∂x_j ] Δx_i Δx_j  ≅  ( Δx )^T [ ∇²φ( x̄ ) ] Δx  ≥ 0          (297)

for all Δx, then x = x̄ will be a local minimum. In other words, if the Hessian matrix [ ∇²φ( x̄ ) ] is
positive semi-definite, then x = x̄ will be a local minimum. If the quantity
    Σ_{i=1}^{N} Σ_{j=1}^{N} [ ∂²φ( x̄ + λ Δx ) / ∂x_i ∂x_j ] Δx_i Δx_j  ≅  ( Δx )^T [ ∇²φ( x̄ ) ] Δx  ≤ 0          (298)

for all Δx, then x = x̄ will be a local maximum. In other words, if the Hessian matrix [ ∇²φ( x̄ ) ]
is negative semi-definite, then x = x̄ will be a local maximum.
It may be noted that the need to define positive definite or negative definite matrices
naturally arises from geometric considerations while qualifying a stationary point in a
multi-dimensional optimization problem. Whether a matrix is positive (semi) definite,
negative (semi) definite or indefinite can be established using algebraic conditions, such as the
signs of the eigenvalues of the matrix. If the eigenvalues of a matrix are all real and non-negative
(i.e. λ_i ≥ 0 for all i), then the matrix is positive semi-definite. If the eigenvalues of a matrix are
all real and non-positive (i.e. λ_i ≤ 0 for all i), then the matrix is negative semi-definite. When the
eigenvalues have mixed signs, the matrix is indefinite.
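A small helper (a sketch only) that classifies a stationary point from the eigenvalues of its Hessian, following the criteria above, is shown below; the two example Hessians are arbitrary illustrations.

    import numpy as np

    def classify_stationary_point(hessian, tol=1e-10):
        """Classify a stationary point from the eigenvalues of its (symmetric) Hessian."""
        lam = np.linalg.eigvalsh(hessian)
        if np.all(lam > tol):
            return "minimum (positive definite)"
        if np.all(lam < -tol):
            return "maximum (negative definite)"
        if np.all(lam >= -tol) or np.all(lam <= tol):
            return "semi-definite: test is inconclusive"
        return "saddle point (indefinite)"

    print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, 3.0]])))    # minimum
    print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -3.0]])))   # saddle point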
References
[1] Bazara, M.S., Sherali, H. D., Shetty, C. M., Nonlinear Programming, John Wiley, 1979.
[2] Gupta, S. K.; Numerical Methods for Engineers. Wiley Eastern, New Delhi, 1995.
[3] Kreyszig, E.; Introduction to Functional Analysis with Applications, John Wiley, New
York, 1978.
[4] Linz, P.; Theoretical Numerical Analysis, Dover, New York, 1979.
[5] Luenberger, D. G.; Optimization by Vector Space Approach , John Wiley, New York,
1969.
[6] Luenberger, D. G.; Optimization by Vector Space Approach, John Wiley, New York,
1969.
[7] Gourdin, A. and M Boumhrat; Applied Numerical Methods. Prentice Hall India, New
Delhi.
[8] Moursund, D. G., Duris, C. S., Elementary Theory and Application of Numerical Analy-
sis, Dover, NY, 1988.
[9] Rall, L. B.; Computational Solutions of Nonlinear Operator Equations. John Wiley,
New York, 1969.
[10] Rao, S. S., Optimization: Theory and Applications, Wiley Eastern, New Delhi, 1978.
[11] Strang, G.; Linear Algebra and Its Applications. Harcourt Brace Jovanovich College
Publishers, New York, 1988.
[12] Strang, G.; Introduction to Applied Mathematics. Wellesley Cambridge, Massachusetts,
1986.
[13] Strang, G.; Computational Science and Engineering. Wellesley-Cambridge Press, MA,
2007.
[14] Philips, G. M.,Taylor, P. J. ; Theory and Applications of Numerical Analysis (2nd Ed.),
Academic Press, 1996.