Lecture Notes On Modelling
Sachin C. Patwardhan,
Department of Chemical Engineering,
Indian Institute of Technology, Bombay,
Mumbai 400076, India.
CHAPTER 1
1. Introduction
A modern chemical plant consists of interconnected units such as heat exchangers, reactors, distillation columns and mixers, with a high degree of integration to achieve energy efficiency. Design and operation of such complex plants is a challenging problem. Mathematical modeling and simulation is a cost effective way of designing these chemical plants and understanding their behavior when compared to study through experiments. Mathematical modeling cannot substitute experimentation; however, it can be used effectively to plan experiments and to create scenarios under different operating conditions. Thus, the best approach to solving most chemical engineering problems involves a judicious combination of mathematical modeling and carefully planned experiments.
To begin with, let us look at the types of problems that can arise in the context of modeling and simulation. Consider a typical small chemical plant consisting of a reactor and a distillation column, which is used to separate the product as overhead (see Figure 1). The reactants, which are separated as the bottom product of the distillation column, are recycled to the reactor. We can identify the following problems:
• Process Design problem
Given: desired product composition, raw material composition and availability.
To find: raw material flow rates, reactor volume and operating conditions (temperature, pressure etc.), distillation column configuration (feed locations and product draws), reboiler and condenser sizes and operating conditions (recycle and reflux flows, steam flow rate, operating temperatures and pressure etc.)
• Process Retrofitting: improvements in the existing set-up or operating conditions
The plant may have been designed for a certain production capacity and assuming a certain raw material quality. We are often required to assess whether
From a mathematical viewpoint, these models can be classified into two broad classes.
The above two classes of models, together with the various scenarios under consideration, give rise to different types of equation forms, such as linear / nonlinear algebraic equations, ordinary differential equations or partial differential equations. In order to provide motivation for studying these different kinds of equation forms, we present examples of different models in chemical engineering and derive abstract equation forms in the following section.
Figure 2
(2.1) y = 20.5x
where $y$ is the mass fraction of acetone in the vapor stream and $x$ is the mass fraction of acetone in the liquid stream. The operating conditions of the process are as follows:
• Air inlet flow: 600 lb/hr with 8 mass % acetone
• Water flow rate: 500 lb/hr
It is required that the waste water should have an acetone content of 3 mass %, and we are required to determine the concentration of acetone in the vapor stream and the flow rates of the product streams.
Mass Balance:
Equilibrium Relation:
(2.6) y = 20.5 x
(2.7) ⇒ y = 20.5 × 0.03 = 0.615
Substituting for all the known values and rearranging, we have
(2.8) $\begin{bmatrix} 0.97 & 0 & 0 \\ 0 & 0.03 & 0.615 \\ 0.03 & 0.385 & 0.97 \end{bmatrix} \begin{bmatrix} A_o \\ L \\ V \end{bmatrix} = \begin{bmatrix} 0.92 \times 600 \\ 0.08 \times 600 \\ 500 \end{bmatrix}$
The above model is a typical example of a system of linear algebraic equations, which have to be solved simultaneously. The above equation can be represented in the abstract form of a set of linear algebraic equations
(2.9) $Ax = b$
where $x$ and $b$ are $(n \times 1)$ vectors (i.e. $x, b \in R^n$) and $A$ is an $(n \times n)$ matrix.
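As a simple illustration, the small system (2.8) can be solved directly with a standard linear equation solver. The following Python sketch (using NumPy; the numerical entries are simply those quoted in (2.8)) shows the generic pattern of assembling $A$ and $b$ and calling a solver for $Ax = b$.

```python
import numpy as np

# Coefficient matrix and right-hand side as written in equation (2.8);
# the unknowns are the three outlet stream flows [Ao, L, V].
A = np.array([[0.97, 0.0,   0.0  ],
              [0.0,  0.03,  0.615],
              [0.03, 0.385, 0.97 ]])
b = np.array([0.92 * 600, 0.08 * 600, 500.0])

x = np.linalg.solve(A, b)   # direct solution of Ax = b
print(x)
```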
Figure 4
vaporize and two phases at the equilibrium with each other will be present in
the flash drum. The equilibrium relationships are
• Temperature of the liquid phase = temperature of the vapor phase.
• Pressure of the liquid phase = pressure of the vapor phase.
• Chemical potential of the $i$'th component in the liquid phase = chemical potential of the $i$'th component in the vapor phase
• Component balance
(2.12) $z_i F = x_i L + y_i V \quad (i = 1, 2, 3)$
(2.13) $z_i F = x_i L + k_i x_i V$
(2.14) $\sum_i x_i = 1$
(2.15) $f_1(x_1, x_2, x_3, L, V) = 0$
(2.16) $f_2(x_1, x_2, x_3, L, V) = 0$
............................ = 0
(2.17) $f_5(x_1, x_2, x_3, L, V) = 0$
(2.23) $F(x) = 0 \; ; \; x \in R^n$
$x = \begin{bmatrix} x_1 & x_2 & ... & x_n \end{bmatrix}^T$
(2.24) $F(x) = \begin{bmatrix} f_1(x) & f_2(x) & ... & f_n(x) \end{bmatrix}^T$
A→B
(2.27) $-r_{ai}^{e} = k_0\, C_{ai}^{\,n} \exp\left(\frac{-E}{R\, T_i}\right)$
then the problem is to choose the parameters $\{k_0, E, n\}$ such that the sum of the squares of the errors between the measured and estimated rates is minimum, i.e.
(2.28) $\min_{k_0, E, n} \; \Phi(k_0, E, n) = \sum_{i=1}^{N} \left[(-r_{ai}) - (-r_{ai}^{e})\right]^2$
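A minimal sketch of this parameter estimation problem, using SciPy's nonlinear least-squares routine, is given below. The measured concentrations, temperatures and rates here are purely hypothetical placeholders; in practice they would come from the kinetic experiments described above.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical measured data (placeholders, not from the notes)
Ca      = np.array([1.0, 0.8, 0.6, 0.5, 0.4])      # concentrations Ca_i
T       = np.array([300., 310., 320., 330., 340.]) # temperatures T_i (K)
ra_meas = np.array([0.05, 0.09, 0.15, 0.22, 0.31]) # measured rates (-ra_i)

R = 8.314

def residuals(p):
    k0, E, n = p
    ra_est = k0 * Ca**n * np.exp(-E / (R * T))   # rate expression, eq. (2.27)
    return ra_meas - ra_est                      # errors whose squares are minimized

# Minimize the sum of squared residuals, i.e. the objective of eq. (2.28)
sol = least_squares(residuals, x0=[1.0e3, 2.0e4, 1.0])
print(sol.x)   # estimated (k0, E, n)
```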
(2.30) $\frac{dn_a}{dt} = \frac{d(C_a V)}{dt} = \sum_{i:\,inlet} C_{ai} F_i - \sum_{j:\,outlet} C_{aj} F_j \pm rV$
Total energy balance
(2.31) $\frac{dE}{dt} = \frac{d(U + K + P)}{dt} = \sum_{i:\,inlet} \rho_i F_i h_i - \sum_{j:\,outlet} \rho_j F_j h_j \pm Q \pm W_S \simeq \frac{dH}{dt}$
where
ρ: density of the material in the system
(2.32) $\frac{d(\rho A h)}{dt} = \rho F_i - \rho F$
(2.33) $A \frac{dh}{dt} = F_i - F$
[Figure: schematic of the stirred tank heater with flow (FT/FC), level (LT/LC) and temperature (TT) instrumentation, control valves CV-1, CV-2 and CV-3, and steam flow to the heating coil.]
Thus,
(2.34) $A \frac{dh}{dt} + k\sqrt{h} = F_i$
The total energy of the liquid in the tank is given by
$E = U + K + P$
$T_{ref}$ represents the reference temperature at which the specific enthalpy of the liquid is assumed to be zero. Now, using the energy conservation principle,
(2.37) $\frac{d\left(\rho A h C_p (T - T_{ref})\right)}{dt} = \rho F_i C_p (T_i - T_{ref}) - \rho F C_p (T - T_{ref}) + Q$
where Q is the amount of heat supplied by the steam per unit time. Assuming
Tref = 0, we have
(2.38) $A \frac{d(hT)}{dt} = F_i T_i - F T + \frac{Q}{\rho C_p}$
$A \frac{d(hT)}{dt} = A h \frac{dT}{dt} + A T \frac{dh}{dt} = A h \frac{dT}{dt} + T (F_i - F) = F_i T_i - F T + \frac{Q}{\rho C_p}$
Or
$A h \frac{dT}{dt} = F_i (T_i - T) + \frac{Q}{\rho C_p}$
Summarizing the modelling steps,
(2.39) $\frac{dh}{dt} = \frac{1}{A}(F_i - F) = \frac{1}{A}\left(F_i - k\sqrt{h}\right)$
(2.40) $\frac{dT}{dt} = \frac{F_i}{Ah}(T_i - T) + \frac{Q}{A h \rho C_p}$
The associated variables can be classified as
• State (or dependent) variables: $h, T$
• Input (or independent) variables: $T_i, F_i, Q$
• Parameters: $A, \rho, C_p$
The steady state behavior can be computed by solving the following two equations
(2.41) $\frac{dh}{dt} = \frac{1}{A}\left(F_i - k\sqrt{h}\right) = 0$
(2.42) $\frac{dT}{dt} = \frac{F_i}{Ah}(T_i - T) + \frac{Q}{A h \rho C_p} = 0$
Once we choose the independent variables $F_i = \overline{F}_i$, $T_i = \overline{T}_i$ and $Q = \overline{Q}$, the steady state $h = \overline{h}$ and $T = \overline{T}$ can be computed by simultaneously solving the nonlinear algebraic equations (2.41-2.42).
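A minimal sketch of this steady state computation, using SciPy's nonlinear equation solver, is shown below. The parameter and input values are assumed purely for illustration and are not taken from the notes.

```python
import numpy as np
from scipy.optimize import fsolve

# Illustrative parameter and input values (assumed)
A, k, rho, Cp = 1.0, 2.0, 1000.0, 4.2
Fi, Ti, Q = 4.0, 350.0, 1.0e3

def steady_state(v):
    h, T = v
    f1 = Fi - k * np.sqrt(h)                                  # from eq. (2.41)
    f2 = (Fi / (A * h)) * (Ti - T) + Q / (A * h * rho * Cp)   # from eq. (2.42)
    return [f1, f2]

h_bar, T_bar = fsolve(steady_state, x0=[1.0, 300.0])
print(h_bar, T_bar)
```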
The system will be disturbed from the steady state if the input variables suddenly change value at $t = 0$. Consider the following two situations in which we need to investigate the transient behavior of the above process:
• $T_i$ decreases by 10% from its steady state value $\overline{T}_i$ at $t = 0$. The liquid level remains at the same steady state value, as $T_i$ does not influence the total mass in the tank. The temperature $T$ in the tank will start decreasing with time (see Figure 7). How $T(t)$ changes with time is determined by the solution of the model equations.
The model we considered above did not contain variation of the variables
with respect to space. Such models are called as ’Lumped parameter models’
and are described by ordinary differential equations of the form
(2.43) $\frac{dx_1}{dt} = f_1\left[x_1(t), x_2(t), ..., x_n(t), u_1(t), .., u_m(t)\right]$
............................................
(2.44) $\frac{dx_n}{dt} = f_n\left[x_1(t), x_2(t), ..., x_n(t), u_1(t), .., u_m(t)\right]$
$x_1(0) = x_{1,0}, \; ...., \; x_n(0) = x_{n,0}$ (initial conditions)
where $\{x_i(t)\}$ denote the state (or dependent) variables and $\{u_i(t)\}$ denote independent inputs (or forcing functions) specified for $t \geq 0$. Using vector notation, we can write the above set of ODEs in the more compact form
(2.45) $\frac{dx}{dt} = F(x, u)$
(2.46) $x(0) = x_0$
where
(2.50) $F(x, u) = 0$
Given $u(t)$ as a function of time for $t \geq 0$ and the initial state $x(0)$, integrate
(2.52) $\frac{dx}{dt} = F(x, u(t))$
over the interval $0 \leq t \leq t_f$ to determine the state trajectories.
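As an illustration of such a dynamic simulation, the stirred tank heater model (2.39)-(2.40) can be integrated with a standard ODE-IVP solver. The sketch below uses SciPy's solve_ivp with the same assumed parameter values as before and a 10% step decrease in the inlet temperature at $t = 0$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values (assumed), consistent with the earlier sketch
A, k, rho, Cp = 1.0, 2.0, 1000.0, 4.2
Q = 1.0e3

def tank(t, x):
    h, T = x
    Fi, Ti = 4.0, 315.0                                  # Ti stepped down 10% from 350 at t = 0
    dhdt = (Fi - k * np.sqrt(h)) / A                     # eq. (2.39)
    dTdt = (Fi / (A * h)) * (Ti - T) + Q / (A * h * rho * Cp)   # eq. (2.40)
    return [dhdt, dTdt]

# Start from the previously computed steady state and integrate over 0 <= t <= 10
sol = solve_ivp(tank, t_span=(0.0, 10.0), y0=[4.0, 350.06], max_step=0.05)
print(sol.y[:, -1])   # state (h, T) at the final time
```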
Example 6. Consider the double pipe heat exchanger in which a liquid flow-
ing in the inner tube is heated by steam flowing countercurrently around the tube
(Figure 10). The temperature in the pipe changes not only with time but also
along the axial direction z. While developing the model, it is assumed that the
temperature does not change along the radius of the pipe. Consequently, we have only two independent variables, i.e. $z$ and $t$. To perform the energy balance, we consider an element of length $\Delta z$ as shown in the figure. For this element, over a period of time $\Delta t$,
(2.55) $\rho C_p A \Delta z\left[(T)_{t+\Delta t} - (T)_t\right] = \rho C_p V A (T)_z \Delta t - \rho C_p V A (T)_{z+\Delta z} \Delta t$
(2.56) $\qquad\qquad +\; Q \Delta t\, (\pi D \Delta z)$
This equation can be explained as
[accumulation of enthalpy during the time period $\Delta t$]
= [flow in of enthalpy during $\Delta t$] − [flow out of enthalpy during $\Delta t$]
+ [enthalpy transferred from steam to the liquid through the wall during $\Delta t$]
where
Q: amount of heat transferred from the steam to the liquid per unit time
and per unit heat transfer area.
A: cross section area of the inner tube.
V : average velocity of the liquid(assumed constant).
D: external diameter of the inner tube.
Dividing both the sides by (∆z∆t) and taking limit as ∆t → 0 and ∆z → 0,
we have
(2.57) $\rho C_p A \frac{\partial T(z,t)}{\partial t} = -\rho C_p V A \frac{\partial T(z,t)}{\partial z} + \pi D Q$
(2.58) $Q = U\left[T_{st} - T\right]$
Boundary condition:
$T(t, z = 0) = T_1$ for $t \geq 0$
Initial condition
(2.59) $T(t = 0, z) = T_0(z)$
This results in an ODE-IVP, which can be solved to obtain the steady state profile $T(z)$ for a specified heat load and liquid velocity.
Dynamic Simulation
(2.62) $\rho C_p A \frac{\partial T}{\partial t} = -\rho C_p V A \frac{\partial T}{\partial z} + \pi D Q$
with
This results in a Partial Differential Equation (PDE) model for the distributed
parameter system.
Example 7. Now, let us consider the situation where the some hot liquid
is used on the shell side to heat the tube side fluid (see Figure 10). The model
where subscript t denotes tube side and subscript s denotes shell side. The initial
and boundary conditions become
These are coupled PDEs and have to be solved simultaneously to understand the
transient behavior. The steady state problem can be stated as
(2.72) $\rho_t C_{pt} V_t A_t \frac{dT_t(z)}{dz} = \pi D U\left[T_s(z) - T_t(z)\right]$
(2.73) $\rho_s C_{ps} V_s A_s \frac{dT_s(z)}{dz} = \pi D U\left[T_s(z) - T_t(z)\right]$
(2.74) $T_t(0) = T_{t0}$ at $z = 0$
(2.75) $T_s(1) = T_{s1}$ at $z = 1$
3. Summary
These lecture notes introduce various basic forms of equations that appear in steady state and dynamic models of simple unit operations. The following generic forms or problem formulations have been identified:
• Linear algebraic equations
• Nonlinear algebraic equations
• Unconstrained optimization
• Ordinary Differential Equations : Initial Value Problem (ODE-IVP)
• Ordinary Differential Equations : Boundary Value Problem (ODE-
BVP)
• Partial Differential Equations (PDEs)
Methods for dealing with numerical solutions of these generic forms / for-
mulations will be discussed in the later parts.
$\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$
• Non-homogeneous equations: contain terms other than the dependent variable
(4.3) $\partial u/\partial t = \partial^2 u/\partial x^2 + \sin x$
(4.5) $\sum_{i=1}^{4} \sum_{j=1}^{4} a_{ij} \frac{\partial^2 u}{\partial x_i \partial x_j} = f\left[\partial u/\partial x_1, ......, \partial u/\partial x_4, u, x_1, ........., x_4\right]$
$a_{ij}$ are assumed to be independent of $u$ and its derivatives. They can be functions of $(x_i)$. $a_{ij}$ can always be written as $a_{ij} = a_{ji}$ for $i \neq j$ since
(4.6) $\frac{\partial^2 u}{\partial x_i \partial x_j} = \frac{\partial^2 u}{\partial x_j \partial x_i}$
Thus, $a_{ij}$ are elements of a real symmetric matrix $A$. Obviously $A$ has real eigenvalues. The PDE is called
• Elliptic: if all eigenvalues are +ve or all are -ve.
• Hyperbolic: if some eigenvalues are +ve and the rest are -ve.
• Parabolic: if at least one eigenvalue is zero.
1. Introduction
When we begin to use the concept of vectors in formulating mathematical models for engineering systems, we tend to associate the concept of a vector space with the three dimensional coordinate space. The three dimensional space we are familiar with can be looked upon as a set of objects called vectors, which satisfy certain generic properties. While working with mathematical modeling, however, we need to deal with a variety of sets containing such objects. It is possible to 'distill' the essential properties satisfied by vectors in the three dimensional vector space and develop a more general concept of a vector space, which is a collection of objects satisfying these properties. Such a generalization can provide a unified view of problem formulations and solution techniques.
Generalization of the concept of vector and vector space to any general set other than the collection of vectors in three dimensions is not sufficient by itself. In order to work with these sets of generalized vectors, we need various algebraic and geometric structures on these sets, such as the norm of a vector, the angle between two vectors or the convergence of a sequence of vectors. To understand why these structures are necessary, consider the fundamental equation that arises in numerical analysis
(1.1) $F(x) = 0$
(1.1) F (x) = 0
where $x$ is a vector and $F(.)$ represents some linear or nonlinear operator, which when it operates on $x$ yields the zero vector $\mathbf{0}$. In order to generate a numerical approximation to the solution of equation (1.1), this equation is further transformed to formulate an iteration sequence as follows
(1.2) $x^{(k+1)} = G\left[x^{(k)}\right] \; ; \; k = 0, 1, 2, ......$
where $\left\{x^{(k)} : k = 0, 1, 2, ......\right\}$ is a sequence of vectors in the vector space under consideration. The iteration equation is formulated in such a way that the solution $x^*$ of equation (1.2), i.e.
$x^* = G\left[x^*\right]$
is also a solution of equation (1.1). Here we consider two well known examples of such formulations.
(1.8) $x^{(k+1)}(z) = x_0 + \int_0^{z} f\left[x^{(k)}(q), q\right] dq$
Note that this procedure generates a sequence of functions $x^{(k+1)}(z)$ over the interval $0 \leq z \leq 1$ starting from an initial guess solution $x^{(0)}(z)$. If we can treat a function $x^{(k)}(z)$ over the interval $0 \leq z \leq 1$ as a vector, then it is easy to see that equation (1.8) is of the form given by equation (1.2).
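A minimal numerical sketch of this Picard (successive approximation) iteration is given below for the scalar case $f(x, z) = x$ with $x_0 = 1$, whose exact solution is $e^z$. The function $x^{(k)}(z)$ is represented by its values on a grid, and the integral in (1.8) is evaluated with the trapezoidal rule; these discretization choices are assumptions made only for the illustration.

```python
import numpy as np

z = np.linspace(0.0, 1.0, 101)
x0 = 1.0
x = np.full_like(z, x0)          # initial guess x(0)(z) = x0

def cumtrapz(y, z):
    """Cumulative trapezoidal integral of y(z) from z[0] to each grid point."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(z))
    return out

for k in range(20):
    x = x0 + cumtrapz(x, z)      # x(k+1)(z) = x0 + integral of f(x(k)(q), q)

print(x[-1], np.exp(1.0))        # the iterates approach exp(z)
```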
2. Vector Spaces
Associated with every vector space is a set of scalars F (also called as
scalar field or coefficient field) used to define scalar multiplication on the space.
In functional analysis, the scalars will be always taken to be the set of real
numbers (R) or complex numbers (C).
is a linear space Lp .
Similarly, if $X = l_\infty$ and $x^{(k)} \in l_\infty$ represents the k'th vector in the set, then $x^{(k)}$ represents a sequence with infinitely many components
(2.4) $x^{(k)} = \begin{bmatrix} x_1^{(k)} & .... & x_i^{(k)} & ...... \end{bmatrix}^T$
Definition 6. (Span of Set of Vectors): Let S be a subset of vector
space X. The set generated by all possible linear combinations of elements of S
is called as span of S and denoted as [S]. Span of S is a subspace of X.
Definition 7. (Linear Dependence): A vector $x$ is said to be linearly dependent upon a set $S$ of vectors if $x$ can be expressed as a linear combination of vectors from $S$. Alternatively, $x$ is linearly dependent upon $S$ if $x$ belongs to the span of $S$, i.e. $x \in [S]$. A vector is said to be linearly independent of a set $S$ if it is not linearly dependent on $S$. A necessary and sufficient condition for the set of vectors $x^{(1)}, x^{(2)}, ..... x^{(m)}$ to be linearly independent is that the expression
(2.5) $\sum_{i=1}^{m} \alpha_i x^{(i)} = 0$
holds only when $\alpha_i = 0$ for all $i = 1, 2, ....., m$.
Example 24. $(C[a,b], \|x(t)\|_\infty)$: The normed linear space $C[a,b]$ together with the infinity norm
(3.5) $\|x(t)\|_\infty = \max_{a \leq t \leq b} |x(t)|$

(3.9) $\|x(t)\|_2 = \left[\int_a^b |x(t)|^2\, dt\right]^{1/2}$
Example 26. Let $X = (Q, \|.\|_1)$, i.e. the set of rational numbers $(Q)$ with the scalar field also taken as the set of rational numbers $(Q)$ and the norm defined as
(3.11) $\|x\|_1 = |x|$
A vector in this space is a rational number. In this space, we can construct Cauchy sequences which do not converge to a rational number (or rather they converge to irrational numbers). For example, the well known Cauchy sequence
$x^{(1)} = 1/1$
$x^{(2)} = 1/1 + 1/(2!)$
.........
$x^{(n)} = 1/1 + 1/(2!) + ..... + 1/(n!)$
converges to $e$, which is an irrational number. Similarly, consider the sequence
$x^{(n+1)} = 4 - \left(1/x^{(n)}\right)$
Starting from the initial point $x^{(0)} = 1$, we can generate the sequence of rational numbers
$3/1, \; 11/3, \; 41/11, ....$
which converges to $2 + \sqrt{3}$ as $n \to \infty$. Thus, the limits of the above sequences lie outside the space $X$ and the space is incomplete.
Example 28. Let $X = (C[0,1], \|.\|_1)$, i.e. the space of continuous functions on $[0,1]$ with the one norm defined on it, i.e.
(3.12) $\|x(t)\|_1 = \int_0^1 |x(t)|\, dt$
However, as can be observed from Figure 1, the sequence does not converge to a continuous function.
Given two vectors in $R^3$, say $\widehat{x}$ and $\widehat{y}$, the angle between these two vectors is defined using the inner (or dot) product of the two vectors as
(4.1) $\cos(\theta) = \left(\widehat{x}\right)^T \widehat{y} = \left(\frac{x}{\|x\|_2}\right)^T \left(\frac{y}{\|y\|_2}\right)$
(4.2) $\qquad = \widehat{x}_1 \widehat{y}_1 + \widehat{x}_2 \widehat{y}_2 + \widehat{x}_3 \widehat{y}_3$
The fact that the cosine of the angle between any two unit vectors is always less than one can be stated as
(4.3) $|\cos(\theta)| = \left|\left\langle \widehat{x}, \widehat{y} \right\rangle\right| \leq 1$
Moreover, vectors $x$ and $y$ are called orthogonal if $(x)^T y = 0$. Orthogonality is probably the most useful concept while working in three dimensional Euclidean space. Inner product spaces and Hilbert spaces generalize these simple geometrical concepts in three dimensional Euclidean space to higher or infinite dimensional spaces.
(4.7) $\Rightarrow -\lambda \langle x, y\rangle - \lambda \langle y, x\rangle = -\frac{2 \langle x, y\rangle \langle y, x\rangle}{\langle y, y\rangle}$
(4.8) $\qquad = -\frac{2 \langle x, y\rangle \langle x, y\rangle}{\langle y, y\rangle} = -\frac{2\left|\langle x, y\rangle\right|^2}{\langle y, y\rangle}$
(4.9) $\Rightarrow 0 \leq \langle x, x\rangle - \frac{\left|\langle x, y\rangle\right|^2}{\langle y, y\rangle}$
or $\left|\langle x, y\rangle\right| \leq \sqrt{\langle x, x\rangle \langle y, y\rangle}$
The triangle inequality can be established easily using the Cauchy-Schwarz inequality as follows.
Definition 14. (Angle) The angle $\theta$ between any two vectors in an inner product space is defined by
(4.20) $\theta = \cos^{-1}\left[\frac{\langle x, y\rangle}{\|x\|_2\, \|y\|_2}\right]$
Example 29. Inner Product Spaces
(1) Space $X \equiv R^n$ with $x = (\xi_1, \xi_2, \xi_3, ....... \xi_n)$ and $y = (\eta_1, \eta_2, ........ \eta_n)$
(4.21) $\langle x, y\rangle = x^T y = \sum_{i=1}^{n} \xi_i \eta_i$
(4.22) $\langle x, x\rangle = \sum_{i=1}^{n} (\xi_i)^2 = \|x\|_2^2$
(2) Space $X \equiv R^n$ with $x = (\xi_1, \xi_2, \xi_3, ....... \xi_n)$ and $y = (\eta_1, \eta_2, ........ \eta_n)$
(4.23) $\langle x, y\rangle_W = x^T W y$
where $W$ is a positive definite matrix. The corresponding 2-norm is defined as $\|x\|_{W,2} = \sqrt{\langle x, x\rangle_W} = \sqrt{x^T W x}$
(4.25) $\langle x, x\rangle = \sum_{i=1}^{n} \overline{\xi}_i\, \xi_i = \sum_{i=1}^{n} |\xi_i|^2 = \|x\|_2^2$
is an inner product space and denoted as $L_2[a,b]$. Well known examples of spaces of this type are the set of continuous functions on $[-\pi, \pi]$ or $[0, 2\pi]$, which we consider while developing Fourier series expansions of continuous functions on these intervals using $\sin(nt)$ and $\cos(nt)$ as basis functions.
(5) Space of polynomial functions on $[a,b]$ with inner product
(4.27) $\langle x, y\rangle = \int_a^b x(t)\, y(t)\, dt$
is probably the most important result in plane geometry, and it remains true in any inner product space.
(4.29) $e^{(1)} = \frac{x^{(1)}}{\left\|x^{(1)}\right\|_2}$
We form the unit vector $e^{(2)}$ in two steps.
(4.30) $z^{(2)} = x^{(2)} - \left\langle x^{(2)}, e^{(1)}\right\rangle e^{(1)}$
where $\left\langle x^{(2)}, e^{(1)}\right\rangle$ is the component of $x^{(2)}$ along $e^{(1)}$.
(4.31) $e^{(2)} = \frac{z^{(2)}}{\left\|z^{(2)}\right\|_2}$
By direct calculation it can be verified that $e^{(1)} \perp e^{(2)}$. The remaining orthonormal vectors $e^{(i)}$ are defined by induction. The vector $z^{(k)}$ is formed according to the equation
(4.32) $z^{(k)} = x^{(k)} - \sum_{i=1}^{k-1} \left\langle x^{(k)}, e^{(i)}\right\rangle e^{(i)}$
and
(4.33) $e^{(k)} = \frac{z^{(k)}}{\left\|z^{(k)}\right\|_2} \; ; \; k = 1, 2, ......... n$
Again it can be verified by direct computation that $z^{(k)} \perp e^{(i)}$ for all $i < k$.
(4.36) $z^{(2)} = x^{(2)} - \left\langle x^{(2)}, e^{(1)}\right\rangle e^{(1)} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - \frac{1}{\sqrt{2}} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ 0 \\ -\frac{1}{2} \end{bmatrix}$

(4.37) $e^{(2)} = \frac{z^{(2)}}{\left\|z^{(2)}\right\|_2} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{bmatrix}$

(4.38) $z^{(3)} = x^{(3)} - \left\langle x^{(3)}, e^{(1)}\right\rangle e^{(1)} - \left\langle x^{(3)}, e^{(2)}\right\rangle e^{(2)} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} - \sqrt{2}\begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{bmatrix} - \sqrt{2}\begin{bmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ -\frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$

$e^{(3)} = \frac{z^{(3)}}{\left\|z^{(3)}\right\|_2} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}^T$

Note that the vectors in the orthonormal set will depend on the definition of the inner product. Suppose we define the inner product as follows
(4.39) $\langle x, y\rangle_W = x^T W y$
$W = \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & -1 \\ 1 & -1 & 2 \end{bmatrix}$
where $W$ is a positive definite matrix. Then the length $\left\|x^{(1)}\right\|_{W,2} = \sqrt{6}$ and the unit vector $\widehat{e}^{(1)}$ becomes
(4.40) $\widehat{e}^{(1)} = \frac{x^{(1)}}{\left\|x^{(1)}\right\|_{W,2}} = \begin{bmatrix} \frac{1}{\sqrt{6}} & 0 & \frac{1}{\sqrt{6}} \end{bmatrix}^T$
The remaining two orthonormal vectors have to be computed using the inner product defined by equation (4.39).
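The Gram-Schmidt process is easy to mechanize once an inner product is chosen. The following Python sketch (assuming the three vectors of this example are $x^{(1)} = [1\;0\;1]^T$, $x^{(2)} = [1\;0\;0]^T$, $x^{(3)} = [2\;1\;0]^T$, as reconstructed above) orthonormalizes them under both the standard inner product and the weighted inner product of equation (4.39).

```python
import numpy as np

def gram_schmidt(X, inner):
    """Orthonormalize the columns of X under the supplied inner product."""
    E = []
    for x in X.T:
        z = x.astype(float).copy()
        for e in E:
            z -= inner(x, e) * e                 # subtract projections, eq. (4.32)
        E.append(z / np.sqrt(inner(z, z)))       # normalize, eq. (4.33)
    return np.array(E).T

X = np.array([[1., 1., 2.],
              [0., 0., 1.],
              [1., 0., 0.]])     # columns are x(1), x(2), x(3)

# Standard inner product: reproduces e(1), e(2), e(3) of the worked example
E = gram_schmidt(X, lambda x, y: x @ y)
print(E)

# Weighted inner product <x, y>_W = x^T W y of equation (4.39)
W = np.array([[2., -1., 1.], [-1., 2., -1.], [1., -1., 2.]])
E_W = gram_schmidt(X, lambda x, y: x @ W @ y)
print(E_W)    # its first column is x(1)/sqrt(6), as in equation (4.40)
```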
(4.44) $\left\langle e^{(1)}(t), x^{(2)}(t)\right\rangle = \int_{-1}^{1} \frac{t}{\sqrt{2}}\, dt = 0$
(4.45) $z^{(2)}(t) = t - \left\langle x^{(2)}, e^{(1)}\right\rangle e^{(1)} = t = x^{(2)}(t)$
(4.46) $e^{(2)} = \frac{z^{(2)}}{\left\|z^{(2)}\right\|}$
(4.47) $\left\|z^{(2)}(t)\right\|^2 = \int_{-1}^{1} t^2\, dt = \left[\frac{t^3}{3}\right]_{-1}^{1} = \frac{2}{3}$
(4.48) $\left\|z^{(2)}(t)\right\| = \sqrt{\frac{2}{3}}$
(4.49) $e^{(2)}(t) = \sqrt{\frac{3}{2}}\; t$

$z^{(3)}(t) = x^{(3)}(t) - \left\langle x^{(3)}(t), e^{(1)}(t)\right\rangle e^{(1)}(t) - \left\langle x^{(3)}(t), e^{(2)}(t)\right\rangle e^{(2)}(t)$
$\qquad = t^2 - \left(\int_{-1}^{1} \frac{1}{\sqrt{2}}\, t^2\, dt\right) e^{(1)}(t) - \left(\sqrt{\frac{3}{2}} \int_{-1}^{1} t^3\, dt\right) e^{(2)}(t)$
(4.50) $\qquad = t^2 - \frac{1}{3} - 0 = t^2 - \frac{1}{3}$
(4.51) $e^{(3)}(t) = \frac{z^{(3)}(t)}{\left\|z^{(3)}(t)\right\|}$
(4.52) where $\left\|z^{(3)}(t)\right\|^2 = \left\langle z^{(3)}(t), z^{(3)}(t)\right\rangle = \int_{-1}^{1} \left(t^2 - \frac{1}{3}\right)^2 dt$
$\qquad = \int_{-1}^{1} \left(t^4 - \frac{2}{3}t^2 + \frac{1}{9}\right) dt = \left[\frac{t^5}{5} - \frac{2t^3}{9} + \frac{t}{9}\right]_{-1}^{1} = \frac{2}{5} - \frac{4}{9} + \frac{2}{9} = \frac{18 - 10}{45} = \frac{8}{45}$
(4.53) $\left\|z^{(3)}(t)\right\| = \sqrt{\frac{8}{45}} = \frac{2}{3}\sqrt{\frac{2}{5}}$
The orthonormal polynomials constructed above are the well known Legendre polynomials. It turns out that
(4.54) $e_n(t) = \sqrt{\frac{2n+1}{2}}\; p_n(t) \; ; \; (n = 0, 1, 2.......)$
where
(4.55) $p_n(t) = \frac{(-1)^n}{2^n\, n!}\, \frac{d^n}{dt^n}\left\{\left(1 - t^2\right)^n\right\}$
are the Legendre polynomials. It can be shown that this set of polynomials forms an orthonormal basis for the set of continuous functions on $[-1, 1]$.
The set of all elements for which an operator $F$ is defined is called as domain of $F$, and the set of all elements generated by transforming elements in the domain by $F$ is called as range of $F$. If for every $y \in Y$ there is at most one $x \in M$ for which $F(x) = y$, then $F(.)$ is said to be one to one. If for every $y \in Y$ there is at least one $x \in M$, then $F$ is said to map $M$ onto $Y$.
Note that any transformation that does not satisfy the above definition is not a linear transformation.
(5.2) $y = Ax$
(5.3) $y = Ax + b$
does not satisfy equation (2.1) and does not qualify as a linear transformation.
(2) $d/dt(.)$ is an operator from the space of continuously differentiable functions to the space of continuous functions.
(3) The operator $\int_0^1 [.]\, dt$ maps the space of integrable functions into $R$.
(5.4) $y = F(x)$
(5.5) $Ax = b$
can be rearranged as
(5.6) $F(x) = Ax - b = 0$
(5.7) $F(x) = 0$
$F(x) = \begin{bmatrix} f_1(x) & f_2(x) & ... & f_n(x) \end{bmatrix}^T_{\,n \times 1}$
• ODE-IVP: $X \equiv C^n[0, \infty)$
(5.8) $\begin{bmatrix} \frac{dx(t)}{dt} - F(x(t), t) \\ x(0) - x_0 \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}$
Boundary Conditions
(5.11) $f_1[du/dz, u, z] = 0$ at $z = 0$
(5.12) $f_2[du/dz, u, z] = 0$ at $z = 1$
which can be written in the abstract form
$F[u(z)] = 0 \; ; \; u(z) \in C^{(2)}[0, 1]$
Here, the operator $F[u(z)]$ consists of the differential operator $\Psi(.)$ together with the boundary conditions, and $C^{(2)}[0, 1]$ represents the set of twice differentiable continuous functions.
As evident from the above abstract representations, all the problems can be
reduced to one fundamental equation of the form
(5.13) F(x) = 0
where x represents a vector in the space under consideration. It is one funda-
mental problem, which assumes different forms in different context and different
vector spaces. Viewing these problems in a unified framework facilitates better
understanding of the problem formulations and the solution techniques.
5.2. Numerical Solution. Given any general problem of the form (5.13),
the approach to compute a solution to this problem typically consists of two
steps
• Problem Transformation: In this step, the original problem is transformed into one of the known standard forms, for which numerical tools are available.
• Computation of the Numerical Solution: Use a standard tool or a combination of standard tools to construct a numerical solution to the transformed problem. The three most commonly used tools are (a) linear algebraic equation solvers, (b) ODE-IVP solvers and (c) numerical optimization. A schematic diagram of the generic solution procedure is presented in Figure (2). In the sub-section that follows, we describe the two most commonly used tools for problem transformation. The tools used for constructing solutions and their applications in different contexts will form the theme for the rest of the book.
Figure 2: Schematic of the generic solution procedure: the original problem is transformed (using the Taylor series / Weierstrass theorem) into a problem in a standard format, which is then solved using one or more of three standard tools: (1) solutions of linear algebraic equations, (2) solutions of ODE-IVPs and (3) numerical optimization.
$R_3(x, \delta x) = \frac{1}{3!} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{\partial^3 f(x + \lambda\, \delta x)}{\partial x_i\, \partial x_j\, \partial x_k}\, \delta x_i\, \delta x_j\, \delta x_k \; ; \; (0 < \lambda < 1)$
Note that the gradient $\nabla f(x)$ is an $n \times 1$ vector and the Hessian $\nabla^2 f(x)$ is an $n \times n$ matrix.
This gives nonlinear equations in three unknowns, which can be solved simultaneously to compute $a_1, a_2$ and $a_3$. In effect, the Weierstrass theorem has been used to convert an ODE-IVP to a set of nonlinear algebraic equations.
6. Summary
In this chapter, we review important concepts from functional analysis and linear algebra, which form the basis of synthesis and analysis of the numerical methods. We begin with the concept of a general vector space and define various algebraic and geometric structures like norm and inner product. We also interpret the notion of orthogonality in a general inner product space and develop the Gram-Schmidt process, which can generate an orthonormal set from a linearly independent set. We later introduce two important results from analysis, namely Taylor's theorem and the Weierstrass approximation theorem, which play a pivotal role in the formulation of iteration schemes. We then proceed to develop the theory for analyzing convergence of linear and nonlinear iterative schemes using eigenvalue analysis and the contraction mapping principle, respectively. In the end, we establish necessary and sufficient conditions for optimality of a scalar valued function, which form the basis of the optimization based numerical approaches.
7. Exercise
(1) While solving problems using a digital computer, arithmetic operations
can be performed only with a limited precision due to finite word length.
Consider the vector space X ≡ R and discuss which of the laws of
algebra (associative, distributive, commutative) are not satisfied for
the floating point arithmetic in a digital computer.
(2) Show that the set of solutions of the differential equation
$\frac{d^2x}{dt^2} + x = 0$
is a linear space. What is the dimension of this space?
(3) Show that functions 1, exp(t), exp(2t), exp(3t) are linearly independent
over any interval [a,b].
(4) Does the set of functions of the form
$\langle x, y\rangle = x^T W y$
(18) Show that in $C[a,b]$ with the maximum norm, we cannot define an inner product $\langle x, y\rangle$ such that $\langle x, x\rangle^{1/2} = \|x\|_\infty$. In other words, show that in $C[a,b]$ the following function
$\langle x(t), y(t)\rangle = \max_{t} |x(t)\, y(t)|$
cannot define an inner product.
(19) In $C^{(1)}[a, b]$, is
$\langle x, y\rangle = \int_a^b x'(t)\, y'(t)\, dt + x(a)\, y(a)$
an inner product?
(20) Show that in $C^{(1)}[a, b]$,
$\langle x, y\rangle = \int_a^b w(t)\, x(t)\, y(t)\, dt$
is an inner product.
(1.5) $Ax = b$
(1.6) $A = \begin{bmatrix} a_{11} & . & . & a_{1n} \\ . & . & . & . \\ . & . & . & . \\ a_{m1} & . & . & a_{mn} \end{bmatrix}$
Figure 1
There are two ways of interpreting the above matrix vector equation geometrically.
• Row picture: If we consider the two equations separately as
(1.8) $2x - y = \begin{bmatrix} 2 \\ -1 \end{bmatrix}^T \begin{bmatrix} x \\ y \end{bmatrix} = 1$
(1.9) $x + y = \begin{bmatrix} 1 \\ 1 \end{bmatrix}^T \begin{bmatrix} x \\ y \end{bmatrix} = 5$
then each one is a line in the x-y plane, and solving this set of equations simultaneously can be interpreted as finding the point of their intersection (see Figure 1 (a)).
• Column picture: We can interpret the equation as a linear combination of the column vectors, i.e. as vector addition
(1.10) $x_1 \begin{bmatrix} 2 \\ 1 \end{bmatrix} + x_2 \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$
(1.13) $\begin{bmatrix} \left(r^{(1)}\right)^T x \\ \left(r^{(2)}\right)^T x \\ .... \\ \left(r^{(n)}\right)^T x \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ .... \\ b_n \end{bmatrix}$
Each of these equations $\left(r^{(i)}\right)^T x = b_i$ represents a hyperplane in $R^n$ (i.e. a line in $R^2$, a plane in $R^3$ and so on). The solution of $Ax = b$ is the point $x$ at which all these hyperplanes intersect (if at all they intersect in one point).
Column picture: Let $A$ be represented as $A = \begin{bmatrix} c^{(1)} & c^{(2)} & ......... & c^{(n)} \end{bmatrix}$, where $c^{(i)}$ represents the i'th column of $A$. Then we can look at $Ax = b$ as one vector equation
(1.14) $x_1 c^{(1)} + x_2 c^{(2)} + .............. + x_n c^{(n)} = b$
The components of the solution vector $x$ tell us how to combine the column vectors to obtain the vector $b$. In the singular case, the n hyperplanes have no point in common, or equivalently the n column vectors are not linearly independent. Thus, both these geometric interpretations are consistent with each other.
Now, to understand behavior of solutions of type (1.5), we can define four
fundamental spaces associated with matrix A
Definition 23. (Row Space): The space spanned by row vectors of matrix
A is called as row space of matrix A and denoted as R(AT ).
Definition 24. (Null space): The set of all vectors x such that Ax = 0̄
is called as null space of matrix A and denoted as N(A).
A non-zero null space is obtained only when columns of A are linearly de-
pendent. If columns of A are linearly independent, then N(A) ≡ {0̄}.
Definition 25. (Left Null Space): The set of all vectors $y$ such that $A^T y = \bar{0}$ is called as left null space of matrix $A$ and denoted as $N(A^T)$.
A non-zero left null space is obtained only when rows of $A$ are linearly dependent. If rows of $A$ are linearly independent, then $N(A^T) \equiv \{\bar{0}\}$.
The following fundamental result, which relates dimensions of row and col-
umn spaces with the rank of a matrix, holds true for any m × n matrix A.
Note that
• When matrix A operates on vector x ∈ R(AT ) (i.e. a vector belonging
to row space of A) it produces a vector Ax ∈ R(A) (i.e. a vector in
column space of A)
as the vector on R.H.S. belongs to R(A). But, this is not the only solution.
We can write
" # " # " #
1 1 2
(1.17) 3 + (−1) =
2 2 4
h iT h iT
This implies that x1 x2 = 3 −1 is also a solution to the above
problem. Why does this happen and how can we characterize all possible solu-
tions to this problem? To answer this question, let us find null space of ma-
trix A. In this particular case, by simple visual inspection, we can find that
h iT h iT
x1 x2 = 1 −1 is a vector belonging to the null space of A.
" # " # " #
1 1 0
(1.18) (1) + (−1) =
2 2 0
h iT
In fact, null space of A can be written as N(A) = α 1 −1 for any real
scalar α. Thus,
" #" # " #
1 1 α 0
(1.19) =
2 2 −α 0
h iT h iT
This implies that, if x1 x2 = 1 1 is a solution to (1.15), then any
vector
" # " #
1 1
(1.20) x= +α
1 −1
Thus, if we add any vector from the null space of A to the solution of (1.5),
then we get another solution to equation (1.5). If N(A) ≡ {0̄} and a solution
exists, i.e. b ∈ R(A), then the solution is unique. If N(A) 6= {0̄} and b ∈ R(A),
then there are infinite solutions to equation (1.5).
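This structure of the solution set is easy to verify numerically. The following sketch (using NumPy; the tolerance used to detect the zero singular value is an arbitrary choice for illustration) recovers the null space direction of the singular matrix above from its SVD, computes one particular (minimum-norm) solution, and checks that adding a null-space vector still satisfies $Ax = b$.

```python
import numpy as np

A = np.array([[1., 1.],
              [2., 2.]])
b = np.array([2., 4.])

# Null space from the SVD: right singular vectors with (near) zero singular value
U, s, Vt = np.linalg.svd(A)
null_vec = Vt[s < 1e-12 * s.max()]        # here a multiple of [1, -1]
print(null_vec)

# One particular (minimum-norm) solution via the pseudo-inverse;
# adding any multiple of the null-space vector gives another solution.
x_p = np.linalg.pinv(A) @ b
print(x_p, A @ (x_p + 5.0 * null_vec.ravel()))
```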
Methods for solving linear algebraic equations can be categorized as (a)
direct or Gaussian elimination based schemes and (b) iterative schemes. In the
sections that follow, we discuss these techniques in detail.
some of the sparse matrix algorithms are discussed in detail. This is meant to
be a brief introduction to sparse matrix computations and the treatment of the
topic is, by no means, exhaustive.
Boundary Conditions
(3.2) f1 [dy/dz, y, z] = 0 at z = 0
(3.3) f2 [dy/dz, y, z] = 0 at z = 1
Let $y^*(z) \in C^{(2)}[0, 1]$ denote the true solution to the above ODE-BVP. Depending on the nature of the operator $\Psi$, it may or may not be possible to find the true solution to the problem. In the present case, however, we are interested in finding an approximate numerical solution, say $y(z)$, to the above ODE-BVP.
The basic idea in finite difference approach is to convert the ODE-BVP to
a set of linear or nonlinear algebraic equations using Taylor series expansion as
basis. In order to achieve this, the domain 0 ≤ z ≤ 1 is divided into (n + 1)
equidistant grid points z0 , z1 ......, zn located such that
Let the value of $y$ at location $z_i$ be denoted as $y_i = y(z_i)$. Using the Taylor series expansion, $y_{i+1} = y(z_{i+1}) = y(z_i + \Delta z)$ can be written as
(3.4) $y_{i+1} = y_i + (dy/dz)_i\, (\Delta z) + (1/2!)\, y_i^{(2)} (\Delta z)^2 + (1/3!)\, y_i^{(3)} (\Delta z)^3 + ..................$
where
(3.5) $y_i^{(k)} = \left(d^k y / dz^k\right)_{z = z_i}$
From equations (3.4) and (3.6) we can arrive at several expressions for the first derivative $(dy/dz)_i$.
From equation (3.4) we get [6]
(3.7) $(dy/dz)_i = \frac{(y_{i+1} - y_i)}{\Delta z} - \left[y_i^{(2)} (\Delta z / 2) + ....\right]$
From equation (3.6) we get
(3.8) $(dy/dz)_i = \frac{(y_i - y_{i-1})}{\Delta z} + \left[y_i^{(2)} (\Delta z / 2) - .......\right]$
Combining equations (3.4) and (3.6) we get
(3.9) $(dy/dz)_i = \frac{(y_{i+1} - y_{i-1})}{2(\Delta z)} - \left[y_i^{(3)} (\Delta z^2 / 3!) + ........\right]$
The first two formulae are accurate to $O(\Delta z)$ while the last one is accurate to $O[(\Delta z)^2]$ and so is more commonly used. Equations (3.7) and (3.8) can be combined to give an equation for the second derivative $y_i^{(2)}$ at location $i$:
(3.14) $f_2\left[\frac{(y_n - y_{n-1})}{\Delta z},\; y_n,\; 1\right] = 0$
This gives the remaining two equations.
— Approach 2:
(3.15) $f_1\left[\frac{(y_1 - y_{-1})}{2\Delta z},\; y_0,\; 0\right] = 0$
(3.16) $f_2\left[\frac{(y_{n+1} - y_{n-1})}{2\Delta z},\; y_n,\; 1\right] = 0$
This approach introduces two more variables $y_{-1}$ and $y_{n+1}$ at hypothetical grid points. Thus we have $n + 3$ variables and $n + 1$ equations. Two more algebraic equations can be generated by setting the residual to zero at the boundary points, i.e. at $z_0$ and $z_n$, i.e.
$R_0 = 0$ and $R_n = 0$
This results in $(n + 3)$ equations in $(n + 3)$ unknowns.
$0 < z < L$
(Note that this problem can be solved analytically.) Dividing the region $0 \leq z \leq L$ into $n$ equal subregions and setting the residuals to zero at the internal grid points, we have
(3.20) $\frac{(T_{i-1} - 2T_i + T_{i+1})}{(\Delta z)^2} + \frac{q}{k} = 0$
$i = 1, 2, .......(n - 1)$
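A short numerical sketch of this problem is given below. The boundary conditions are not spelled out in the text above, so fixed end temperatures $T(0) = T(L) = T_a$ are assumed purely for illustration; equation (3.20) then produces a tridiagonal linear system for the internal node temperatures.

```python
import numpy as np

# Finite-difference solution of d2T/dz2 + q/k = 0 on 0 <= z <= L,
# assuming (for illustration) fixed end temperatures T(0) = T(L) = Ta.
L, n = 1.0, 50
q_by_k = 100.0
Ta = 300.0
dz = L / n

# Tridiagonal system for the internal nodes T1 ... T(n-1), from eq. (3.20)
main = -2.0 * np.ones(n - 1)
off  = np.ones(n - 2)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
b = -q_by_k * dz**2 * np.ones(n - 1)
b[0]  -= Ta          # known boundary values moved to the right-hand side
b[-1] -= Ta

T_internal = np.linalg.solve(A, b)
print(T_internal.max())   # compare with the analytical value Ta + q L^2 / (8 k)
```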
3.1.2. Solution of PDEs using the Finite Difference Method [6]. Consider elliptic PDEs described by
$\nabla^2 u = cu + f(x, y, z)$
or $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = cu + f(x, y, z)$
which is solved on a 3 dimensional bounded region $V$ with boundary $S$. The boundary conditions on the spatial surface $S$ are given as
These equations can be transformed into a set of linear (or nonlinear) algebraic equations by using Taylor's theorem and approximating
$\left(\frac{\partial^2 u}{\partial x^2}\right)_{ijk} = \frac{(u_{i+1,j,k} - 2u_{i,j,k} + u_{i-1,j,k})}{(\Delta x)^2}$
$\left(\frac{\partial^2 u}{\partial y^2}\right)_{ijk} = \frac{(u_{i,j+1,k} - 2u_{i,j,k} + u_{i,j-1,k})}{(\Delta y)^2}$
$\left(\frac{\partial^2 u}{\partial z^2}\right)_{ijk} = \frac{(u_{i,j,k+1} - 2u_{i,j,k} + u_{i,j,k-1})}{(\Delta z)^2}$
and so on.
Example 40. The Laplace equation represents a prototype for steady state diffusion processes. For example, the 2-dimensional Laplace equation
(3.29) $x = 0 : T = T_1 \; ; \; x = L_x : T = T_3$
(3.30) $y = 0 : T = T_2 \; ; \; y = L_y : T = T_4$
Construct the 2-dimensional grid with $(n_x + 1)$ equispaced grid lines parallel to the y axis and $(n_y + 1)$ equispaced grid lines parallel to the x axis. The temperature $T$ at the $(i, j)$'th grid point is denoted as $T_{ij} = T(x_i, y_j)$. We force the residual to be zero at each internal grid point to obtain the following set of equations:
Now we define
(3.35) $x = \begin{bmatrix} T_{11} & T_{12} & ............. & T_{1,n_y-1} & .......... & T_{n_x-1,1} & ............. & T_{n_x-1,n_y-1} \end{bmatrix}^T$
and rearrange the above set of equations in the form $Ax = b$; then $A$ turns out to be a large sparse matrix. Even for 10 internal grid lines in each direction we would get a $100 \times 100$ sparse matrix associated with 100 variables.
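The sparse structure is easy to see when the system is assembled in code. The following sketch (using SciPy's sparse module; the boundary temperatures and the use of Kronecker products to build the 5-point Laplacian are illustrative assumptions, not taken from the notes) assembles and solves exactly such a 100 × 100 system.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Sparse 5-point Laplacian on a unit square with 10 internal grid lines per
# direction; Dirichlet boundary temperatures T1..T4 as in (3.29)-(3.30).
n = 10
T1, T2, T3, T4 = 100.0, 150.0, 200.0, 250.0   # illustrative boundary values

I = sp.identity(n)
D = sp.diags([1, -2, 1], [-1, 0, 1], shape=(n, n))
A = sp.kron(I, D) + sp.kron(D, I)             # 100 x 100 sparse matrix
b = np.zeros(n * n)

# Fold the known boundary temperatures into the right-hand side
for j in range(n):                 # nodes adjacent to the x = 0 and x = Lx faces
    b[0 * n + j]       -= T1
    b[(n - 1) * n + j] -= T3
for i in range(n):                 # nodes adjacent to the y = 0 and y = Ly faces
    b[i * n + 0]       -= T2
    b[i * n + n - 1]   -= T4

T = spla.spsolve(A.tocsr(), b)
print(T.reshape(n, n)[n // 2, n // 2])        # temperature near the centre
```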
$y = f(z) = \alpha_0 + \alpha_1 z + \alpha_2 z^2 + ... + \alpha_n z^n$
Note that the values of $y$ and $z$ are exactly known, and this is a problem of finding an n'th degree interpolation polynomial that passes through all the points. The coefficients of this polynomial can be easily found by solving the equation
(3.36) $\begin{bmatrix} 1 & z_0 & ... & (z_0)^n \\ 1 & z_1 & ... & (z_1)^n \\ ... & ... & ... & ..... \\ 1 & z_n & ... & (z_n)^n \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ ..... \\ \alpha_n \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ .... \\ y_n \end{bmatrix}$
(3.37) or $A\alpha = y$
for $( i = 0, 1, 2, ..., n - 2 )$
(3.55) $\alpha_{2,0} = 0$
(3.56) $\alpha_{2,n-1} + 3\alpha_{3,n-1} (\Delta z_{n-1}) = 0$
(3.61) $\alpha_{2,0} = 0$
(3.62) $(\Delta z_{i-1})\, \alpha_{2,i-1} + 2(\Delta z_i + \Delta z_{i-1})\, \alpha_{2,i} + (\Delta z_i)\, \alpha_{2,i+1} = b_i$
for $( i = 1, 2, ..., n - 2 )$
where
$b_i = \frac{3(\alpha_{0,i+1} - \alpha_{0,i})}{\Delta z_i} - \frac{3(\alpha_{0,i} - \alpha_{0,i-1})}{\Delta z_{i-1}} = \frac{3(y_{i+1} - y_i)}{\Delta z_i} - \frac{3(y_i - y_{i-1})}{\Delta z_{i-1}}$
for $( i = 1, 2, ..., n - 2 )$
(3.63) $\frac{1}{3}(\Delta z_{n-2})\, \alpha_{2,n-2} + \frac{2}{3}(\Delta z_{n-2} + \Delta z_{n-1})\, \alpha_{2,n-1} = b_n$
(3.64) $b_n = \frac{y_n}{\Delta z_{n-1}} - \left(\frac{1}{\Delta z_{n-1}} + \frac{1}{\Delta z_{n-2}}\right) y_{n-1} + \frac{y_{n-2}}{\Delta z_{n-2}}$
Defining the vector $\alpha_2$ as
$\alpha_2 = \begin{bmatrix} \alpha_{2,0} & \alpha_{2,1} & ....... & \alpha_{2,n} \end{bmatrix}^T$
(3.66) $\begin{bmatrix} b_1 & c_1 & 0 & ... & ... & ... & ... & 0 \\ a_2 & b_2 & c_2 & 0 & ... & ... & ... & 0 \\ 0 & a_3 & b_3 & c_3 & ... & ... & ... & 0 \\ 0 & 0 & a_4 & b_4 & c_4 & ... & ... & ... \\ ... & ... & ... & ... & ... & ... & ... & ... \\ ... & ... & ... & ... & ... & ... & c_{n-2} & 0 \\ ... & ... & ... & ... & ... & a_{n-1} & b_{n-1} & c_{n-1} \\ 0 & 0 & 0 & 0 & ... & 0 & a_n & b_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ .... \\ .... \\ .... \\ .... \\ x_n \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ .... \\ .... \\ .... \\ .... \\ .... \\ d_n \end{bmatrix}$
where the matrix $A$ is a tridiagonal matrix.
Step 1: Triangularization: forward sweep with normalization
(3.67) $\gamma_1 = c_1 / b_1$
(3.68) $\gamma_k = \frac{c_k}{b_k - a_k \gamma_{k-1}} \; ; \; k = 2, 3, ....(n-1)$
(3.69) $\beta_1 = d_1 / b_1$
(3.70) $\beta_k = \frac{(d_k - a_k \beta_{k-1})}{(b_k - a_k \gamma_{k-1})} \; ; \; k = 2, 3, ....n$
This sequence of operations finally results in the following system of equations
$\begin{bmatrix} 1 & \gamma_1 & 0 & .... & 0 \\ 0 & 1 & \gamma_2 & .... & 0 \\ ... & 0 & 1 & .... & . \\ .... & .... & .... & .... & \gamma_{n-1} \\ 0 & 0 & ... & .... & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ .... \\ .... \\ x_n \end{bmatrix} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ . \\ . \\ \beta_n \end{bmatrix}$
Step 2: Backward sweep leads to the solution vector
$x_n = \beta_n$
(3.71) $x_k = \beta_k - \gamma_k x_{k+1}$
(3.72) $k = (n-1), (n-2), ......, 1$
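A compact implementation of this forward-backward sweep (the Thomas algorithm) is sketched below; the small test system and the comparison against a dense solver are included only to check the recursion.

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c and right-hand side d (equations 3.67-3.72)."""
    n = len(b)
    gamma = np.zeros(n)
    beta = np.zeros(n)
    gamma[0] = c[0] / b[0]
    beta[0] = d[0] / b[0]
    for k in range(1, n):                      # forward sweep
        denom = b[k] - a[k] * gamma[k - 1]
        gamma[k] = c[k] / denom
        beta[k] = (d[k] - a[k] * beta[k - 1]) / denom
    x = np.zeros(n)
    x[-1] = beta[-1]
    for k in range(n - 2, -1, -1):             # backward sweep, eq. (3.71)
        x[k] = beta[k] - gamma[k] * x[k + 1]
    return x

# Small test system; its exact solution is [1, 1, 1, 1]
a = np.array([0., 1., 1., 1.])     # a[0] is unused
b = np.array([4., 4., 4., 4.])
c = np.array([1., 1., 1., 0.])     # c[-1] is unused
d = np.array([5., 6., 6., 5.])
print(thomas(a, b, c, d))
print(np.linalg.solve(np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1), d))
```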
ϕ = 5n − 8
Γ1 = [B1 ]−1 C1
(3.80) $x_i = \frac{\left[b_i - \sum_{j=1}^{i-1} l_{ij} x_j\right]}{l_{ii}} \; ; \; i = 2, 3, .....n$
The operational count $\varphi$, i.e. the number of multiplications and divisions, for this elimination process is
(3.83) $\eta^{(i)} = (A_{ii})^{-1}\left[b^{(i)} - \sum_{j=1}^{i-1} A_{ij}\, \eta^{(j)}\right] \; ; \; i = 2, 3, .....n$
The above form does not imply that the inverse $(A_{ii})^{-1}$ should be computed explicitly. For example, we can find $\eta^{(1)}$ by Gaussian elimination to solve the system $A_{11}\, \eta^{(1)} = b^{(1)}$.
(3.87) $A_{21}[A_{11}]^{-1}\left[b^{(1)} - A_{12} x^{(2)}\right] + A_{22} x^{(2)} = b^{(2)}$
(3.88) $\left[A_{22} - A_{21}[A_{11}]^{-1} A_{12}\right] x^{(2)} = b^{(2)} - (A_{21} A_{11}^{-1})\, b^{(1)}$
or
$x^{(2)} = \left[A_{22} - A_{21}[A_{11}]^{-1} A_{12}\right]^{-1} \left[b^{(2)} - (A_{21} A_{11}^{-1})\, b^{(1)}\right]$
(4.1) A=S−T
Sx = T x + b
(4.22) $x_2^{(k+1)} = \left[b_2 - a_{21} x_1^{(k+1)} - a_{23} x_3^{(k)} - ..... - a_{2n} x_n^{(k)}\right] / a_{22}$
(4.23) $x_3^{(k+1)} = \left[b_3 - a_{31} x_1^{(k+1)} - a_{32} x_2^{(k+1)} - a_{34} x_4^{(k)} - ..... - a_{3n} x_n^{(k)}\right] / a_{33}$
...... = .............................................................
(4.24) $x_n^{(k+1)} = \left[b_n - a_{n1} x_1^{(k+1)} - ...... - a_{n,n-1} x_{n-1}^{(k+1)}\right] / a_{nn}$
Again it is implicitly assumed that the pivots $a_{ii}$ are non-zero. Now, in the above set of equations, if we move all terms involving $x_i^{(k+1)}$ from the R.H.S. to the L.H.S., we get
(4.25) $\begin{bmatrix} a_{11} & 0 & 0 & . \\ a_{21} & a_{22} & . & . \\ . & . & . & . \\ a_{n1} & . & . & a_{nn} \end{bmatrix} \begin{bmatrix} x_1^{(k+1)} \\ . \\ . \\ x_n^{(k+1)} \end{bmatrix}$
(4.26) $= \begin{bmatrix} 0 & -a_{12} & . & . & -a_{1n} \\ . & . & . & . & . \\ . & . & . & .. & -a_{n-1,n} \\ 0 & . & . & . & 0 \end{bmatrix} \begin{bmatrix} x_1^{(k)} \\ . \\ . \\ x_n^{(k)} \end{bmatrix} + \begin{bmatrix} b_1 \\ . \\ . \\ b_n \end{bmatrix}$
Thus, we have
$S = L + D$
$T = -U$
Gauss-Seidel Algorithm
INITIALIZE: $b, A, x, k_{max}, \varepsilon$
$k = 0$
$\delta = 100 * \varepsilon$
WHILE [($\delta > \varepsilon$) AND ($k < k_{max}$)]
    FOR $i = 1 : n$
        $r_i = b_i - \sum_{j=1}^{n} a_{ij} x_j$
        $x_i = x_i + (r_i / a_{ii})$
    END FOR
    $r = b - Ax$
    $\delta = \|r\| / \|b\|$
    $k = k + 1$
END WHILE
(4.29) $\widetilde{y} - y = \omega\left(\widehat{y} - y\right)$
(4.30) or $\widetilde{y} = \omega\, \widehat{y} + (1 - \omega)\, y$
Relaxation Algorithm
INITIALIZE: $b, A, x, k_{max}, \varepsilon, \omega$
$k = 0$
$\delta = 100 * \varepsilon$
WHILE [($\delta > \varepsilon$) AND ($k < k_{max}$)]
    FOR $i = 1 : n$
        $q_i = b_i - \sum_{j=1}^{n} a_{ij} x_j$
        $z_i = x_i + (q_i / a_{ii})$
        $x_i = \omega z_i + (1 - \omega) x_i$
    END FOR
    $r = b - Ax$
    $\delta = \|r\| / \|b\|$
    $k = k + 1$
END WHILE
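A minimal Python rendering of these two algorithms is given below; note that with $\omega = 1$ the relaxation update reduces to the plain Gauss-Seidel step. The small diagonally dominant test system is an arbitrary illustration.

```python
import numpy as np

def gauss_seidel(A, b, x0, omega=1.0, tol=1e-8, kmax=500):
    """Gauss-Seidel iterations; omega != 1 gives the relaxation variant."""
    x = x0.astype(float).copy()
    n = len(b)
    for k in range(kmax):
        for i in range(n):
            r_i = b[i] - A[i, :] @ x          # residual of the i'th equation
            x[i] = x[i] + omega * r_i / A[i, i]
        delta = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
        if delta < tol:
            break
    return x, k + 1

# Diagonally dominant test system (illustrative)
A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
b = np.array([2., 4., 10.])
x_gs, iters_gs = gauss_seidel(A, b, np.zeros(3))
x_sor, iters_sor = gauss_seidel(A, b, np.zeros(3), omega=1.1)
print(x_gs, iters_gs, iters_sor)
```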
(4.35) $x^* = \left(S^{-1} T\right) x^* + S^{-1} b$
(4.42) $\lim_{k \to \infty} e^{(k)} = 0$
(4.43) i.e. $\lim_{k \to \infty} \left[S^{-1} T\right]^k e^{(0)} = 0$
for any initial guess vector $e^{(0)}$. It may be noted that equation (4.37) is a linear difference equation of the form
(4.44) $z^{(k+1)} = B\, z^{(k)}$
subject to the initial condition $z(0)$. Here $z \in R^n$ and $B$ is an $n \times n$ matrix. In the next sub-section, we analyze the behavior of the solutions of linear difference equations of type (4.44). We then proceed with applying these general results to the specific problem at hand, i.e. convergence of iteration schemes for solving linear algebraic equations.
4.4.1. Eigenvalue Analysis. To begin with, let us consider the scalar linear iteration scheme
if and only if $|b| < 1$. To generalize this notion to the multidimensional case, consider an equation of type (4.44) where $z^{(k)} \in R^n$. Taking motivation from the scalar case, we propose a solution to equation (4.44) of the type
(4.47) $z^{(k)} = \lambda^k v$
(4.50) $(\lambda I - B)\, v = 0$
Note that equation (4.51) is nothing but the characteristic polynomial of matrix $B$ and its roots are called the eigenvalues of matrix $B$. For each eigenvalue $\lambda_i$ we can find the corresponding eigenvector $v^{(i)}$ such that
Thus, we get n fundamental solutions of the form $(\lambda_i)^k v^{(i)}$ to equation (4.44), and a general solution to equation (4.44) can be expressed as a linear combination
(4.59) $\rho(B) = \max_i |\lambda_i|$
then the condition for convergence of iteration equation (4.44) can be stated as
$\rho(B) < 1$
The above sufficient condition is more useful from the viewpoint of computations, as $\|B\|_1$ and $\|B\|_\infty$ can be computed quite easily. On the other hand, the spectral radius of a large matrix can be comparatively difficult to compute.
4.4.2. Convergence Criteria for Iteration Schemes. The criterion for conver-
gence of iteration equation (4.37) can be derived using results derived above.
The necessary and sufficient condition for convergence of (4.37) can be stated
as
ρ(S −1 T ) < 1
i.e. the spectral radius of matrix S −1 T should be less than one.
The necessary and sufficient condition for convergence stated above requires
computation of eigen values of S −1 T, which is a computationally demanding
task when the matrix dimension is large. If for a large dimensional matrix,
we could check this condition before starting iterations, then we might as well
solve the problem by a direct method rather than using iterative approach to
save computations. Thus, there is a need to derive some alternate criteria
for convergence, which can be checked easily before starting iterations. For
example, using Theorem 3.3, we can obtain sufficient conditions for convergence
$\left\|S^{-1} T\right\|_1 < 1 \quad \text{OR} \quad \left\|S^{-1} T\right\|_\infty < 1$
which are significantly easier to evaluate than the spectral radius. Also, if the matrix $A$ has some special properties, such as diagonal dominance or symmetry and positive definiteness, then we can derive easily computable criteria by exploiting these properties.
Thus, the error norm at each iteration is reduced by factor of 1/4. This implies
that, for the example under consideration
(4.98) λ1 = λ2 = ωopt − 1
at optimum ω.Now,
computed as x∗ = A−1 b.
From example 3, we can clearly see that the rate of convergence depends on
ρ(S −1 T ). From analysis of some simple problems, we can generate the following
table [5]
(5.1) $A + \delta A = L'U'$
instead of the right matrix $A = LU$. In fact, due to the round-off errors inherent in any computation using a computer, we actually end up solving the equation
The question is, how serious are the errors $\delta x$ in the solution $x$ due to round-off errors in matrix $A$ and vector $b$? Can these errors be avoided by rearranging
(5.5) $x_2 = 0.999899$
(5.6) $x_2 = 1$
in our computer, which keeps only a limited number of significant digits. The solution then becomes
(5.7) $\begin{bmatrix} x_1 & x_2 \end{bmatrix}^T = \begin{bmatrix} 0.0 & 1 \end{bmatrix}^T$
5.2. Induced Matrix Norms. We have already mentioned that the set of all $m \times n$ matrices with real entries (or complex entries) can be viewed as a linear vector space. In this section, we introduce the concept of the induced norm of a matrix, which plays a vital role in numerical analysis. A norm of a matrix can be interpreted as the amplification power of the matrix. To develop a numerical measure for ill conditioning of a matrix, we first have to quantify this amplification power of the matrix.
(5.21) $B = \Psi \Lambda \Psi^T$
where $\Psi$ is the matrix with eigenvectors as columns and $\Lambda$ is the diagonal matrix with the eigenvalues of $B \,(= A^T A)$ on the diagonal. Note that in this case $\Psi$ is a unitary matrix, i.e.,
$\Psi^T \Psi = \Psi \Psi^T = I$
and the eigenvectors are orthogonal. Using the fact that $\Psi$ is unitary, we can write
(5.23) $x^T x = x^T \Psi \Psi^T x = y^T y$
(5.24) or $\frac{x^T B x}{(x^T x)} = \frac{y^T \Lambda y}{(y^T y)}$ where $y = \Psi^T x$
Suppose the eigenvalues $\lambda_i$ of $A^T A$ are numbered such that
(5.25) $0 \leq \lambda_1 \leq \lambda_2 \leq .................. \leq \lambda_n$
Then
(5.26) $\frac{y^T \Lambda y}{(y^T y)} = \frac{(\lambda_1 y_1^2 + ................ + \lambda_n y_n^2)}{(y_1^2 + ................. + y_n^2)} \leq \lambda_n$
This implies
(5.27) $\frac{y^T \Lambda y}{(y^T y)} = \frac{x^T B x}{(x^T x)} = \frac{x^T (A^T A) x}{(x^T x)} \leq \lambda_n$
The equality holds only at the corresponding eigenvector of $A^T A$, i.e.,
(5.28) $\frac{\left[v^{(n)}\right]^T (A^T A)\, v^{(n)}}{\left[v^{(n)}\right]^T v^{(n)}} = \frac{\left[v^{(n)}\right]^T \lambda_n v^{(n)}}{\left[v^{(n)}\right]^T v^{(n)}} = \lambda_n$
Thus,
$\|A\|_2 = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \sqrt{\lambda_n}$
Remark 2. There are other matrix norms, such as the Frobenius norm, which are not induced matrix norms. The Frobenius norm is defined as
$\|A\|_F = \left[\sum_{i=1}^{n}\sum_{j=1}^{n} |a_{ij}|^2\right]^{1/2}$
(5.38) $\Rightarrow \frac{\|\delta x\|}{\|x\|} \leq \left(\|A^{-1}\|\, \|A\|\right) \frac{\|\delta b\|}{\|b\|}$
The above inequality holds for every $b$ and $\delta b$ vector. The number
(5.39) $C(A) = \|A^{-1}\|\, \|A\|$
is called as condition number of matrix $A$. Thus the condition number
(5.40) $\frac{\|\delta x\| / \|x\|}{\|\delta b\| / \|b\|} \leq C(A) = \|A^{-1}\|\, \|A\|$
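The amplification predicted by the condition number is easy to demonstrate numerically. The sketch below (the nearly singular matrix and the size of the perturbation are illustrative choices) shows a tiny change in $b$ producing a large change in the solution $x$.

```python
import numpy as np

# Condition number C(A) = ||A|| * ||A^-1|| and its effect on the solution
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])          # nearly singular, hence ill conditioned
b = np.array([2.0, 2.0001])

print(np.linalg.cond(A))               # large condition number in the 2-norm

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b + np.array([0.0, 1e-4]))  # tiny change in b
print(x, x_pert)                       # relative change in x is amplified roughly C(A)-fold
```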
$(A^{-1} A)^T = I$
$A^T (A^{-1})^T = I$
(5.50) $(A^T)^{-1} = (A^{-1})^T$
(5.54) $A^T A = \Psi \Lambda \Psi^T$
$\Rightarrow (A^T A)^{-1} = \left[\Psi \Lambda \Psi^T\right]^{-1} = (\Psi^T)^{-1} \Lambda^{-1} \Psi^{-1} = \Psi \Lambda^{-1} \Psi^T$
This ordinary looking matrix is near singular with eigen values (computed using
MATLAB)
(6.3) $Ax = G(x)$
(6.4) $G(x) = \begin{bmatrix} g_1(x) & g_2(x) & .... & g_n(x) \end{bmatrix}^T$
such that the solution of equation (6.3) is also a solution of equation (6.2). The nonlinear equation (6.3) can be used to formulate an iteration sequence of the form
(6.5) $A\, x^{(k+1)} = G\left[x^{(k)}\right]$
Given a guess $x^{(k)}$, the R.H.S. is a fixed vector, say $b^{(k)} = G\left[x^{(k)}\right]$, and computation of the next guess $x^{(k+1)}$ essentially involves solving the linear algebraic equation
$A\, x^{(k+1)} = b^{(k)}$
at each iteration. Thus, the set of nonlinear algebraic equations is solved by formulating a sequence of linear sub-problems. A computationally efficient method of solving such a sequence of linear problems is to use the LU decomposition of matrix $A$.
A special case of interest is when matrix $A = I$ in equation (6.5). In this case, if the set of equations given by (6.3) can be rearranged as $x_i = g_i(x)$, the successive substitution iterations take the form
$x_i^{(k+1)} = g_i\left[x^{(k)}\right] \; ; \; i = 1, .........., n$
• Relaxation Method
$x_i^{(k+1)} = x_i^{(k)} + \omega\left[g_i(x^{(k)}) - x_i^{(k)}\right] \; ; \; i = 1, .........., n$
A popular method of this type is the Wegstein iteration. Given an initial guess vector $x^{(0)}$,
$x_i^{(1)} = g_i(x^{(0)}) \; ; \; (i = 1, ............., n)$
$s_i^{(k)} = \frac{\left[g_i(x^{(k)}) - g_i(x^{(k-1)})\right]}{\left[x_i^{(k)} - x_i^{(k-1)}\right]}$
(6.8) $\omega_i^{(k)} = \frac{s_i^{(k)}}{\left[s_i^{(k)} - 1\right]}$
$x_i^{(k+1)} = g_i(x^{(k)}) + \omega_i^{(k)}\left[x_i^{(k)} - g_i(x^{(k)})\right] \; ; \; i = 1, ........., n$
The iterations can be terminated when
(6.9) $\left\|x^{(k+1)} - x^{(k)}\right\| < \varepsilon$
$i = 1, 2, ...n - 1$
where
(6.15) $\alpha = \left(\frac{1}{(\Delta z)^2\, Pe} + \frac{1}{2(\Delta z)}\right) \; ; \; \beta = \frac{2}{(\Delta z)^2\, Pe}$
(6.16) $\frac{C_1 - C_0}{\Delta z} = Pe\,(C_0 - 1)$
(6.17) $\frac{C_n - C_{n-1}}{\Delta z} = 0$
The above set of nonlinear algebraic equations can be arranged as
(6.18) $\begin{bmatrix} -(1 + \Delta z\, Pe) & 1 & 0 & ..... & .... & 0 \\ \alpha & -\beta & \alpha & .... & ..... & ..... \\ .... & ..... & ..... & ..... & ..... & 0 \\ ..... & ..... & ..... & ..... & -\beta & \alpha \\ 0 & ..... & ..... & ... & -1 & 1 \end{bmatrix} \begin{bmatrix} C_0 \\ C_1 \\ . \\ . \\ C_n \end{bmatrix} = \begin{bmatrix} -Pe\,(\Delta z) \\ Da\, C_1^2 \\ ..... \\ Da\, C_{n-1}^2 \\ 0 \end{bmatrix}$
If we define
(6.19) $x = \begin{bmatrix} C_0 & C_1 & ... & C_n \end{bmatrix}^T$
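The sketch below solves this discretized TRAM model by exactly the iteration $A\, x^{(k+1)} = G[x^{(k)}]$ described above, freezing the nonlinear right-hand side at the previous iterate and solving a linear system at each pass. The matrix is assembled directly from the central-difference formulas (so the two off-diagonal coefficients are written out separately rather than through the single symbol $\alpha$ of (6.18)); the Pe and Da values and the relaxation factor are illustrative assumptions.

```python
import numpy as np

Pe, Da, n = 6.0, 2.0, 50
dz = 1.0 / n

A = np.zeros((n + 1, n + 1))
A[0, 0], A[0, 1] = -(1.0 + dz * Pe), 1.0          # boundary condition (6.16) at z = 0
for i in range(1, n):                             # central differences at interior nodes
    A[i, i - 1] = 1.0 / (Pe * dz**2) + 1.0 / (2 * dz)
    A[i, i]     = -2.0 / (Pe * dz**2)
    A[i, i + 1] = 1.0 / (Pe * dz**2) - 1.0 / (2 * dz)
A[n, n - 1], A[n, n] = -1.0, 1.0                  # boundary condition (6.17) at z = 1

C = np.ones(n + 1)                                # initial guess
for k in range(500):
    b = np.zeros(n + 1)
    b[0] = -Pe * dz
    b[1:n] = Da * C[1:n]**2                       # G(x) evaluated at the old iterate
    C_new = np.linalg.solve(A, b)
    if np.linalg.norm(C_new - C) < 1e-10:
        break
    C = C + 0.8 * (C_new - C)                     # damped update for robustness

print(C[0], C[-1])    # inlet and outlet concentrations
```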
In other words, both $J^{(k)}$ and $J^{(k+1)}$ will predict some change in the direction perpendicular to $\Delta x^{(k)}$.
(2) $J^{(k+1)}$ predicts, for $\Delta x^{(k)}$, the same $\Delta F^{(k)}$ in the linear expansion, i.e.,
or
As
We have
which yields
(6.46) $y^{(k)} = \frac{\left[\Delta F^{(k)} - J^{(k)} \Delta x^{(k)}\right]}{\left[\left[\Delta x^{(k)}\right]^T \Delta x^{(k)}\right]}$
converges to $x^*$ with
$\left\|x^{(k)} - x^*\right\| \leq \theta^k \left\|x^{(0)} - x^*\right\|$
The proof of this theorem can be found in Rall [14] and Linz [9].
is not a contraction mapping near (1,1) and the iterations do not converge even if we start from a value close to the solution. On the other hand, the rearrangement
(6.66) $x^{(k+1)} = \sqrt{\left(y^{(k)} + 2\right)/3}$
(6.67) $y^{(k+1)} = \sqrt{\left(x^{(k)} + 1\right)/2}$
where
$e^{(k)} = x^{(k)} - x^*$
and using the definition of the induced matrix norm, we can write
(6.72) $\frac{\|e^{(k+1)}\|}{\|e^{(k)}\|} < \left\|\left[\frac{\partial G}{\partial x}\right]_{x = x^{(k)}}\right\|$
It is easy to see that the successive errors will reduce in magnitude if the following condition is satisfied at each iteration, i.e.
(6.73) $\left\|\left[\frac{\partial G}{\partial x}\right]_{x = x^{(k)}}\right\| < 1 \quad \text{for } k = 1, 2, ....$
Note that this is only a sufficient condition. If the condition is not satisfied, then the iteration scheme may or may not converge. Also, note that the introduction of a step length parameter $\lambda^{(k)}$ in the Newton-Raphson step as
(6.75) $x^{(k+1)} = x^{(k)} + \lambda^{(k)} \Delta x^{(k)}$
such that $\left\|F^{(k+1)}\right\| < \left\|F^{(k)}\right\|$ ensures that $G(x)$ is a contraction map and ensures convergence.
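A minimal sketch of such a damped Newton-Raphson iteration is given below; the step length is simply halved until the residual norm decreases, which is one common (assumed here, not prescribed by the notes) way of choosing $\lambda^{(k)}$. The test system is the one behind equations (6.66)-(6.67), with solution near (1, 1).

```python
import numpy as np

def damped_newton(F, J, x0, kmax=50, tol=1e-10):
    """Newton-Raphson with a simple step-length (damping) strategy:
    lambda is halved until ||F|| decreases, in the spirit of eq. (6.75)."""
    x = np.asarray(x0, dtype=float)
    for k in range(kmax):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        dx = np.linalg.solve(J(x), -Fx)       # full Newton step
        lam = 1.0
        while lam > 1e-4 and np.linalg.norm(F(x + lam * dx)) >= np.linalg.norm(Fx):
            lam *= 0.5                        # shrink the step until the residual drops
        x = x + lam * dx
    return x

# Example system: 3x^2 - y - 2 = 0, 2y^2 - x - 1 = 0 (solution near (1, 1))
F = lambda v: np.array([3*v[0]**2 - v[1] - 2, 2*v[1]**2 - v[0] - 1])
J = lambda v: np.array([[6*v[0], -1.0], [-1.0, 4*v[1]]])
print(damped_newton(F, J, [2.0, 2.0]))
```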
6.4. Condition Number of a Nonlinear Set of Equations. The concept of condition number can be easily extended to analyze the numerical conditioning of a set of nonlinear algebraic equations. Consider nonlinear algebraic equations of the form
(6.76) $F(x, u) = 0 \; ; \; x \in R^n, \; u \in R^m$
where $F$ is an $n \times 1$ function vector and $u$ is a set of known parameters or independent variables on which the solution depends. The condition number measures the worst possible effect on the solution $x$ caused by a small perturbation in $u$. Let $\delta x$ represent the perturbation in the solution caused by a perturbation $\delta u$, i.e.
(6.77) $F(x + \delta x, u + \delta u) = 0$
Then the condition number of the system of equations is defined as
(6.78) $C(x) = \sup_{\delta u} \frac{\|\delta x\| / \|x\|}{\|\delta u\| / \|u\|}$
(6.79) $\Rightarrow \frac{\|\delta x\| / \|x\|}{\|\delta u\| / \|u\|} \leq C(x)$
If the solution does not depend continuously on $u$, then $C(x)$ becomes infinite and such systems are called (numerically) unstable systems. Systems with large condition numbers are more susceptible to computational errors.
(7.2) $f_1[dy/dz, y, z] = 0$ at $z = 0$
(7.3) $f_2[dy/dz, y, z] = 0$ at $z = 1$
The true solution to the problem is a function, say $y^*(z) \in C^{(2)}[0, 1]$, which belongs to the set of twice differentiable continuous functions. According to the Weierstrass theorem, any continuous function over an interval can be approximated with arbitrary accuracy using a polynomial function of appropriate degree. Thus, we assume an (approximate) n'th order polynomial solution to the ODE-BVP of the form
$\{p_i(z) \; ; \; i = 0, 1, 2, ..... n + 1\}$ should be linearly independent vectors in $C^{(2)}[0, 1]$. A straightforward choice of such linearly independent vectors is
In fact, the name orthogonal collocation can be attributed to the choice of the collocation points at the roots of orthogonal polynomials. After selecting the locations of the collocation points, the approximate solution (4.7) is used to convert the ODE-BVP together with the BCs into a set of nonlinear algebraic equations by setting the residuals at the n collocation (grid) points and the two boundary points equal to zero.
Approach 1
Let us denote
(7.10) $y_i'(\theta) = [dy/dz]_{z=z_i} = \theta_0\, p_0'(z_i) + ....................... + \theta_{n+1}\, p_{n+1}'(z_i)$
(7.11) $y_i''(\theta) = \left[d^2y/dz^2\right]_{z=z_i} = \theta_0\, p_0''(z_i) + ....................... + \theta_{n+1}\, p_{n+1}''(z_i)$
where
(7.12) $\theta = [\theta_0 \;...........\; \theta_{n+1}]^T$
Substituting for $y$, $y'$ and $y''$ in equation (3.1) and enforcing the residual to be equal to zero at the grid points, we get
(7.13) $\Psi\left[y_i''(\theta), y_i'(\theta), y_i(\theta), z_i\right] = 0 \; ; \; i = 1, ......n$
Similarly, enforcing the residuals at the boundary points to be zero yields
(7.14) $f_1\left[y_0'(\theta), y_0(\theta), 0\right] = 0$
where the matrix $M$ is computed at the internal collocation points and the boundary points. Using equation (7.20), we can write
(7.21) $\theta = M^{-1} y = R\, y$
Using equation (7.21), we can express the vector of first derivatives as
(7.22) $\begin{bmatrix} y_0' \\ y_1' \\ ... \\ y_{n+1}' \end{bmatrix} = \begin{bmatrix} p_0'(0) & p_1'(0) & .... & p_{n+1}'(0) \\ p_0'(z_1) & p_1'(z_1) & .... & p_{n+1}'(z_1) \\ .... & .... & .... & .... \\ p_0'(1) & p_1'(1) & .... & p_{n+1}'(1) \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ ... \\ \theta_{n+1} \end{bmatrix}$
(7.23) $= N\theta = [NR]\, y = S\, y$
If we express the matrix $S$ as
(7.24) $S = \begin{bmatrix} \left[s^{(0)}\right]^T \\ \left[s^{(1)}\right]^T \\ .... \\ \left[s^{(n+1)}\right]^T \end{bmatrix}$
$i = 1, ......n$
(7.30) $f_1\left[\left[s^{(0)}\right]^T y,\; y_0,\; 0\right] = 0$
(7.31) $f_2\left[\left[s^{(n+1)}\right]^T y,\; y_{n+1},\; 1\right] = 0$
Example 48. [6] Consider the ODE-BVP describing steady state conditions in a tubular reactor with axial mixing (TRAM) in which an irreversible 2nd order reaction is carried out
(7.36) $\frac{1}{Pe} \frac{d^2C}{dz^2} - \frac{dC}{dz} - Da\, C^2 = 0 \quad (0 \leq z \leq 1)$
(7.37) $\frac{dC}{dz} = Pe\,(C - 1)$ at $z = 0$;
(7.38) $\frac{dC}{dz} = 0$ at $z = 1$;
Using the method of orthogonal collocation with $n = 3$ and defining
(7.39) $C = \begin{bmatrix} C_0 & C_1 & ... & C_4 \end{bmatrix}^T$
$i = 1, 2, 3$
(7.41) $\left[s^{(0)}\right]^T C - Pe\,(C_0 - 1) = 0$
(7.42) $\left[s^{(4)}\right]^T C = 0$
(7.43) $S = \begin{bmatrix} -13 & 14.79 & -2.67 & 1.88 & -1 \\ -5.32 & 3.87 & 2.07 & -1.29 & 0.68 \\ 1.5 & -3.23 & 0 & 3.23 & -1.5 \\ -0.68 & 1.29 & -2.07 & -3.87 & 5.32 \\ 1 & -1.88 & 2.67 & -14.79 & 13 \end{bmatrix}$
(7.44) $T = \begin{bmatrix} 84 & -122.06 & 58.67 & -44.60 & 24 \\ 53.24 & -73.33 & 26.27 & -13.33 & 6.67 \\ -6 & 16.67 & -21.33 & 16.67 & -6 \\ 6.76 & -13.33 & 26.67 & -73.33 & 53.24 \\ 24 & -44.60 & 58.67 & -122.06 & 84 \end{bmatrix}$
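A minimal sketch of how these collocation equations can be solved numerically is given below. The residuals at the internal points are written directly from (7.36) using the first- and second-derivative weight matrices $S$ and $T$ quoted above, and the resulting five nonlinear equations are passed to a standard solver. The Pe and Da values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import fsolve

S = np.array([[-13.0, 14.79, -2.67, 1.88, -1.0],
              [-5.32, 3.87, 2.07, -1.29, 0.68],
              [1.5, -3.23, 0.0, 3.23, -1.5],
              [-0.68, 1.29, -2.07, -3.87, 5.32],
              [1.0, -1.88, 2.67, -14.79, 13.0]])
T = np.array([[84.0, -122.06, 58.67, -44.60, 24.0],
              [53.24, -73.33, 26.27, -13.33, 6.67],
              [-6.0, 16.67, -21.33, 16.67, -6.0],
              [6.76, -13.33, 26.67, -73.33, 53.24],
              [24.0, -44.60, 58.67, -122.06, 84.0]])
Pe, Da = 6.0, 2.0

def residuals(C):
    r = np.empty(5)
    r[0] = S[0] @ C - Pe * (C[0] - 1.0)              # boundary residual at z = 0, eq. (7.41)
    for i in (1, 2, 3):                              # residuals of (7.36) at the collocation points
        r[i] = (1.0 / Pe) * (T[i] @ C) - S[i] @ C - Da * C[i]**2
    r[4] = S[4] @ C                                  # boundary residual at z = 1, eq. (7.42)
    return r

C = fsolve(residuals, np.ones(5))
print(C)      # concentrations at the five collocation points
```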
$0 < x < 1 \; ; \; 0 < y < 1$
where $u(x, y)$ represents the dimensionless temperature distribution in a furnace and $x, y$ are the space coordinates. The boundary conditions are as follows:
(7.46) $x = 0 : u = u^* \; ; \; x = 1 : u = u^*$
(7.47) $y = 0 : u = u^* \; ; \; y = 1 : k(\partial u / \partial y) = h(u_\infty - u(x, 1))$
Using $n_x$ internal grid lines parallel to the y axis and $n_y$ internal grid lines parallel to the x axis, we get $n_x \times n_y$ internal collocation points. Corresponding to the chosen collocation points, we can compute the matrices $(S_x, T_x)$ and $(S_y, T_y)$ using equations (7.23) and (7.27). Using these matrices, the PDE can be transformed as
(7.48) $\left[t_x^{(i)}\right]^T U^{(i)} + \left[t_y^{(j)}\right]^T U^{(j)} = f(x_i, y_j) \; ; \; i = 1, 2, ...n_x \; ; \; j = 1, 2, ...n_y$
(7.49) $U^{(i)} = \begin{bmatrix} u_{0,i} & u_{1,i} & ... & u_{n_x+1,i} \end{bmatrix}^T$
(7.50) $U^{(j)} = \begin{bmatrix} u_{j,0} & u_{j,1} & ... & u_{j,n_y+1} \end{bmatrix}^T$
Remark 3. Are the two methods presented above, i.e. the finite difference and collocation methods, doing something fundamentally different? Suppose we choose an n'th order polynomial (4.7); we are essentially approximating the true solution vector $y^*(z) \in C^{(2)}[0,1]$ by another vector (i.e. the polynomial function) in an $(n + 2)$ dimensional subspace of $C^{(2)}[0,1]$. If we choose n internal grid points by the finite difference approach, then we are essentially finding a vector $y$ in $R^{n+2}$ that approximates $y^*(z)$. In fact, if we compare Approach 2 presented above and the finite difference method, the similarities are more apparent, as the underlying $(n + 2)$ dimensional subspaces used in the approximations become identical. Let us compare the following two cases: (a) the finite difference method with 3 internal grid points, and (b) collocation with 3 internal grid points, on the basis of the expressions used for approximating the first and second order derivatives computed at one of the grid points. For the sake of comparison, we have taken equi-spaced grid points for the collocation method instead of taking them at the roots of the 3'rd order orthogonal polynomial. Thus, for both the collocation and finite difference methods, the grid (or collocation) points are at $\{z_0 = 0, z_1 = 1/4, z_2 = 1/2, z_3 = 3/4, z_4 = 1\}$ and we want to estimate the approximate solution vector $y = \begin{bmatrix} y_0 & y_1 & y_2 & y_3 & y_4 \end{bmatrix}^T$ in both cases. Let us compare the expressions for the derivatives at $z = z_2$ used in the two approaches.
Finite Difference
(7.55) $(dy/dz)_2 = \frac{(y_3 - y_1)}{2(\Delta z)} = 2y_3 - 2y_1 \; ; \; \Delta z = 1/4$
(7.56) $(d^2y/dz^2)_2 = \frac{(y_3 - 2y_2 + y_1)}{(\Delta z)^2} = 16y_3 - 32y_2 + 16y_1$
Collocation
It becomes clear from the above expressions that the essential difference between the two approaches is the way the derivatives at any grid (or collocation) point are approximated. The finite difference method uses only the immediately neighboring points to approximate the derivatives, while the collocation method computes the derivatives as a weighted sum of all the collocation (grid) points. As a consequence, the approximate solutions generated by these approaches will be different.
8. Summary
In these lecture notes, we have developed methods for efficiently solving large dimensional linear algebraic equations. To begin with, we introduce induced matrix norms and use them to understand matrix ill conditioning and the susceptibility of matrices to round-off errors. The direct methods for dealing with sparse matrices are discussed next. Iterative solution schemes and their convergence characteristics are discussed in the subsequent section. We then present techniques for solving nonlinear algebraic equations, which are based on successive solutions of linear algebraic sub-problems. In the last section, we have discussed the orthogonal collocation technique, which converts ODE-BVPs or a certain class of PDEs into a set of nonlinear algebraic equations, which can be solved using the Newton-Raphson method.
9. Appendix
9.1. Proof of Theorem 2 [4]: For the Jacobi method,
(9.1) $S^{-1} T = -D^{-1}[L + U]$
(9.2) $= \begin{bmatrix} 0 & -\frac{a_{12}}{a_{11}} & ..... & -\frac{a_{1n}}{a_{11}} \\ -\frac{a_{21}}{a_{22}} & 0 & ..... & .... \\ ..... & ..... & .... & -\frac{a_{n-1,n}}{a_{n-1,n-1}} \\ -\frac{a_{n1}}{a_{nn}} & ..... & ..... & 0 \end{bmatrix}$
(9.4) $\Rightarrow \sum_{j=1 (j \neq i)}^{n} \left|\frac{a_{ij}}{a_{ii}}\right| < 1 \quad \text{for } i = 1, 2, ...n$
(9.5) $\left\|S^{-1} T\right\|_\infty = \max_i \left[\sum_{j=1 (j \neq i)}^{n} \left|\frac{a_{ij}}{a_{ii}}\right|\right] < 1$
Since

||x* − x^(k)||_∞ = max_j | x*_j − x_j^(k) |
we can write

(9.10)  | x_i^(k+1) − x*_i | ≤ p_i ||x* − x^(k+1)||_∞ + q_i ||x* − x^(k)||_∞

where

(9.11)  p_i = Σ_{j=1}^{i−1} | a_ij / a_ii | ;    q_i = Σ_{j=i+1}^{n} | a_ij / a_ii |

or

(9.14)  ||x* − x^(k+1)||_∞ ≤ [ q_s / (1 − p_s) ] ||x* − x^(k)||_∞
Let

(9.15)  μ = max_j [ q_j / (1 − p_j) ]

Then

(9.16)  ||x* − x^(k+1)||_∞ ≤ μ ||x* − x^(k)||_∞

Also, by diagonal dominance,

(9.19)  0 < p_i + q_i = Σ_{j=1, j≠i}^{n} | a_ij / a_ii | < 1
Let

(9.20)  β = max_i [ Σ_{j=1, j≠i}^{n} | a_ij / a_ii | ]
Then, we have
(9.21) pi + qi ≤ β < 1
It follows that
(9.22) qi ≤ β − pi
and
(9.23)  μ = q_i / (1 − p_i) ≤ (β − p_i) / (1 − p_i) ≤ (β − p_i β) / (1 − p_i) = β < 1

Thus, it follows from inequality (9.17) that

(9.24)  ||x* − x^(k)||_∞ ≤ μ^k ||x* − x^(0)||_∞
we have

(9.33)  λ = − α / (α + σ)   ⇒   |λ| = | α / (α + σ) |

Note that σ > 0 follows from the fact that the trace of matrix A is positive, as the eigenvalues of A are positive. Using positive definiteness of matrix A, we have

(9.34)  ⟨Ae, e⟩ = ⟨Le, e⟩ + ⟨De, e⟩ + ⟨L^T e, e⟩

(9.35)          = σ + 2α > 0
This implies
(9.36) −α < (σ + α)
Since σ > 0, we can say that
(9.37) α < (σ + α)
i.e.
(9.38) |α| < (σ + α)
This implies
(9.39)  |λ| = | α / (α + σ) | < 1
10. Exercise
(1) A polynomial
y = a0 + a1 x + a2 x2 + a3 x3
passes through point (3, 2), (4, 3), (5, 4) and (6, 6) in an x-y coordi-
nate system. Setup the system of equations and solve it for coefficients
a0 to a3 by Gaussian elimination. The matrix in this example is a (4 × 4) Vandermonde matrix. Larger matrices of this type tend to become ill-conditioned.
(2) Solve using Gauss Jordan method.
u + v + w = −2
3u + 3v − w = 6
u − v + w = −1
to obtain A−1 . What coefficient of v in the third equation, in place
of present −1 , would make it impossible to proceed and force the
elimination to break down?
(3) Decide whether vector b belongs to column space spanned by x(1), x(2) , ....
(a) x(1) = (1, 1, 0); x(2) = (2, 2, 1); x(3) = (0, 2, 0); b = (3, 4, 5)
(b) x(1) = (1, 2, 0); x(2) = (2, 5, 0); x(3) = (0, 0, 2); x(4) = (0, 0, 0); any
b
(4) Find dimension and construct a basis for the four fundamental sub-
spaces associated with each of the matrices.

A1 = [ 0 1 4 0 ; 0 2 8 0 ] ;    U2 = [ 0 1 4 0 ; 0 0 0 0 ]

A2 = [ 0 1 0 ; 0 0 1 ; 0 0 0 ] ;   A3 = [ 1 2 0 1 ; 0 1 1 0 ; 1 2 0 1 ] ;   U1 = [ 1 2 0 1 ; 0 1 1 0 ; 1 2 0 1 ]
(5) Find a non-zero vector x* orthogonal to all rows of

A = [ 1 2 1 ; 2 4 3 ; 3 6 4 ]

(In other words, find the null space of matrix A.) If such a vector exists, can you claim that the matrix is singular? Using the above matrix A, find one possible solution x for Ax = b when b = [ 4 9 13 ]^T. Show that
A(x + αx∗ ) = b
Obtain A−1 , det(A) and also solve for [x1 x2 ]T . Obtain numerical
values for ε = 0.01, 0.001 and 0.0001. See how sensitive is the solution
to change in ε.
(21) If A is an orthogonal matrix, show that ||A|| = 1 and also c(A) = 1. Orthogonal matrices and their multiples (αA) are the only perfectly conditioned matrices.

(22) For the positive definite matrix A = [ 2 −1 ; −1 2 ], compute ||A^{-1}|| = 1/λ1 and c(A) = λ2/λ1. Find the right side b and a perturbation vector δb such that the error is worst possible, i.e. find vector x such that
and to

λ_max(AB) ≤ λ_max(A) λ_max(B)

(29) Prove the following inequalities / identities

||A + B|| ≤ ||A|| + ||B||
1. Motivation
In these lecture notes, we undertake the study of solution techniques for multivariable and coupled ODE-IVPs. The numerical techniques for solving ODE-IVPs form the basis of a number of numerical schemes and are used for solving a variety of problems, such as

• Dynamic simulation of lumped parameter systems
• Solution of ODE-BVPs
• Solving parabolic / hyperbolic PDEs
• Solving simultaneous nonlinear algebraic equations

and so on. In order to provide motivation for studying the numerical methods for solving ODE-IVPs, we first formulate numerical schemes for the above problems in the following subsections.
Consider three isothermal CSTRs in series in which a first order liquid phase
reaction of the form
(1.1) A −→ B
is carried out. It is assumed that the volume and liquid density remain constant in each tank and

(1.2)  V1 dC_A1/dt = F (C_A0 − C_A1) − k V1 C_A1

(1.3)  V2 dC_A2/dt = F (C_A1 − C_A2) − k V2 C_A2

(1.4)  V3 dC_A3/dt = F (C_A2 − C_A3) − k V3 C_A3
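For later reference, the following Python fragment (a minimal sketch, not part of the original notes) codes the right hand side of (1.2)-(1.4); the numerical values are the ones used in Exercise (2) at the end of this chapter, with F taken as 1 so that the residence times V_i/F equal 1, 2 and 3 min.

import numpy as np

def cstr_series_rhs(c, t, F=1.0, k=0.5, V=(1.0, 2.0, 3.0), ca0=1.8):
    # Right hand side of (1.2)-(1.4): dC_Ai/dt for the three tanks in series
    ca1, ca2, ca3 = c
    dca1 = (F * (ca0 - ca1) - k * V[0] * ca1) / V[0]
    dca2 = (F * (ca1 - ca2) - k * V[1] * ca2) / V[1]
    dca3 = (F * (ca2 - ca3) - k * V[2] * ca3) / V[2]
    return np.array([dca1, dca2, dca3])

Any of the integration schemes developed later in this chapter can be applied to this function to simulate the concentration dynamics.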
(1.10)  dS/dt = F2(X, S, P, D, Sf) = D (Sf − S) − (1/Y_{X/S}) μ X

(1.11)  dP/dt = F3(X, S, P, D, Sf) = −D P + (α μ + β) X
where X represents the effluent cell-mass or biomass concentration, S represents the substrate concentration and P denotes the product concentration. It is assumed that the substrate concentration (S) and the cell-mass concentration (X) are measured process outputs, while the dilution rate (D) and the feed substrate concentration (Sf) are process inputs which can be manipulated. The model parameter μ represents the specific growth rate, Y_{X/S} represents the cell-mass yield, α and
β are the yield parameters for the product. The specific growth rate model is
allowed to exhibit both substrate and product inhibition:
(1.12)  μ = μ_m (1 − P/P_m) S / ( K_m + S + S²/K_i )

where μ_m represents the maximum specific growth rate, P_m represents the product saturation constant, K_m the substrate saturation constant and K_i represents the substrate inhibition constant. Defining the state and input vectors as

(1.13)  x = [ X  S  P ]^T ;   u = [ D  Sf ]^T
(1.17) F (x) = 0
Example 53. Consider the 2-dimensional unsteady state heat transfer prob-
lem
(1.29)  ∂T/∂t = α [ ∂²T/∂x² + ∂²T/∂y² ]

(1.30)  t = 0 :  T = F(x, y)

(1.32)  y = 0 :  T(x, 0, t) = T* ;   y = 1 :  k dT(x, 1, t)/dy = h (T∞ − T(x, 1, t))
Now, we force the residual to zero at each internal grid point to generate a set of coupled ODE-IVPs as

(1.34)  dT_{i,j}/dt = [α/(Δx)²] [T_{i+1,j} − 2 T_{i,j} + T_{i−1,j}] + [α/(Δy)²] [T_{i,j+1} − 2 T_{i,j} + T_{i,j−1}]

At the boundary y = 1, the flux condition yields

(1.35)  k dT_{i,ny}(t)/dy = h (T∞ − T_{i,ny}(t))

i = 1, 2, ..., nx − 1
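A minimal Python sketch of the semi-discretization (1.34) is given below (it is illustrative only and handles just the interior grid points; the edge values are assumed to be supplied by the boundary conditions (1.30)-(1.32) and (1.35)).

import numpy as np

def heat2d_rhs(T, alpha, dx, dy):
    # dT/dt at the interior grid points, equation (1.34)
    dTdt = np.zeros_like(T)
    dTdt[1:-1, 1:-1] = (alpha / dx**2) * (T[2:, 1:-1] - 2.0*T[1:-1, 1:-1] + T[:-2, 1:-1]) \
                     + (alpha / dy**2) * (T[1:-1, 2:] - 2.0*T[1:-1, 1:-1] + T[1:-1, :-2])
    return dTdt

Coupled with any of the ODE-IVP solvers developed later in this chapter, this converts the PDE (1.29) into a problem of the standard form dx/dt = F(x, t).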
i = 1, 2, 3, ..., m

[t^(0)]^T C = Pe (C0(t) − 1)

[t^(m+1)]^T C = 0

where t^(i) and s^(i) represent the row vectors of matrices T and S, as defined in Section 5 (Example 6) of the lecture notes on linear algebraic equations and related numerical schemes. Here, C_i(t) represents the concentration at the i'th collocation point. The above set of ODEs, together with initial conditions,
Example 56. The ODE-BVP describing a tubular reactor with axial mixing (TRAM), in which an irreversible 2nd order reaction is carried out, is given as

(1.36)  (1/Pe) d²C/dz² − dC/dz − Da C² = 0    (0 ≤ z ≤ 1)

(1.37)  dC/dz = Pe (C − 1)  at z = 0 ;    dC/dz = 0  at z = 1

where C is the dimensionless concentration, z is the axial position, Pe is the Peclet number for mass transfer and Da is the Damkohler number. Now, defining new state variables

(1.38)  x1 = C   and   x2 = dC/dz
(1.53)  x = 0 :  T = T* ;   x = 1 :  T = T*

(1.54)  y = 0 :  T = T* ;   y = 1 :  k ∂T/∂y = h (T∞ − T)
We construct ny + 1 grid lines parallel to the x-axis. The temperature T along
the j th grid line is denoted as
(1.55) Tj (x) = T (x, yj )
Now, we equate residuals to zero at each internal grid line as
(1.56)  d²T_j/dx² = − [1/(Δy)²] [ T_{j+1}(x) − 2 T_j(x) + T_{j−1}(x) ]

j = 2, ..., ny − 1
The boundary conditions at y = 0 can be used to eliminate the variables on the corresponding edge. At the boundary y = 1, using the B.C. we get

(1.57)  k dT_{ny}/dy = h (T∞ − T_{ny})
j = 1, 2, ....ny − 1
The above set of ODE’s can be integrated from x = 0 to x = 1 with initial
condition
(2.1)  dx/dt = A x ;   x = x(0)  at  t = 0

x ∈ R^m ,  A is a (m × m) matrix
To begin with, we develop solution for the scalar case and generalize it to the
multivariable case.
Now,
where v is a constant vector. The above candidate solution must satisfy the
ODE, i.e.,
(2.9)  d/dt ( e^{λt} v ) = A ( e^{λt} v )   ⇒   λ v e^{λt} = A v e^{λt}
Cancelling eλt from both the sides, as it is a non-zero scalar, we get an equation
that vector v must satisfy,
(2.10) λv = Av
This fundamental equation has two unknowns, λ and v, and the resulting problem is the well known eigenvalue problem in linear algebra. The number λ is called an eigenvalue of the matrix A and v is called the corresponding eigenvector. Now, λv = Av is a nonlinear equation, as λ multiplies v; if we discover λ, then the equation for v becomes linear. This fundamental equation can be rewritten as
(2.11) (A − λI)v = 0
This implies that vector v should be ⊥ to the row space of (A − λI). This is
possible only when rows of (A − λI) are linearly dependent. In other words,
λ should be selected in such a way that rows of (A − λI) become linearly
dependent, i.e., (A − λI) is singular. This implies that λ is an eigenvalue of A
if and only if
Then, it can be shown that x(t) also satisfies equation (2.1). Thus, a general
solution to the linear ODE-IVP can be constructed as a linear combination of
the fundamental solutions eλi t v(i) .
The next task is to see to it that the above equation reduces to the initial
conditions at t = 0. Defining vectors C and matrix Ψ as
(2.14)  C = [ c1  c2  ...  cm ]^T ;    Ψ = [ v^(1)  v^(2)  .....  v^(m) ]
we can write
(2.15) x(0) = ΨC
(2.18)  e^{At} = I + At + (1/2!) (At)² + .......

Using the fact that matrix A can be diagonalized as

(2.19)  A = Ψ Λ Ψ^{-1}

where matrix Λ is

Λ = diag [ λ1  λ2  ....  λm ]
we can write
This system exhibits entirely different dynamic characteristics for different set
of parameter values (Marlin, 1995). The nominal parameter values and nomi-
nal steady state operating conditions of the CSTR for the stable and unstable
operating points are given in Table 1.
• Perturbation model at stable operating point

(2.33)  d/dt [ δC_A ; δT ] = [ −7.559  −0.09315 ; 852.7  5.767 ] [ δC_A ; δT ]

The eigenvalues of [∂F/∂x] evaluated at x = x̄ are approximately −0.896 ± 5.92j, i.e. a complex conjugate pair with negative real parts (a numerical check is sketched after this list),
and all the trajectories for the unforced system (i.e. when all the inputs
are held constant at their nominal values) starting in the neighborhood
of the steady state operating point converge to the steady state.
• Perturbation model at unstable operating point: here the Jacobian has at least one eigenvalue with a positive real part, and all the trajectories for the unforced system starting in any small
neighborhood of the steady state operating point diverge from the
steady state.
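The stability assessment for the stable operating point can be checked numerically; the fragment below (a minimal sketch, not part of the original notes) computes the eigenvalues of the Jacobian appearing in (2.33).

import numpy as np

# Jacobian of the perturbation model (2.33) at the stable operating point
A = np.array([[-7.559, -0.09315],
              [852.7,   5.767]])

eigvals = np.linalg.eigvals(A)
print(eigvals)                     # complex pair with negative real parts
print(np.all(eigvals.real < 0))    # True => the steady state is locally stable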
3.1. Why develop methods only for the set of first order ODE’s?
In the above illustrations, the system of equations under consideration is the
set of simultaneous first order ODEs represented as
dx
(3.1) = F (x, t) ; x(0) = x0 ; x ∈ Rn
dt
In practice, not all models appear as first order ODEs. In general, one can get
an m’th order ODE of the type:
(3.2)  d^m y/dt^m = f [ y, dy/dt, d²y/dt², ....., d^{m−1}y/dt^{m−1}, t ]

(3.3)  Given  y(0), ......, d^{m−1}y/dt^{m−1}(0)

Now, do we develop separate methods for each order? It turns out that such an exercise is unnecessary, as an m'th order ODE can be converted to m first order
(3.6)  F(x) = [ x2 ;  ..... ;  x_m ;  f(x1, x2, x3, ......., x_m, t) ]

we can write the above m'th order ODE as the set of first order ODEs

(3.7)  dx/dt = F(x, t)

(3.8)  x(0) = [ y(0)  dy/dt(0)  .......  d^{m−1}y/dt^{m−1}(0) ]^T
Thus, it is sufficient to study only the solution methods for n first order ODEs, since any set of higher order ODEs can be reduced to a set of first order ODEs. Also, as forced systems (non-homogeneous systems) can be looked upon as unforced systems (homogeneous systems) with variable parameters, it is sufficient to study the solution methods for a homogeneous set of equations of the type (3.1).
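As an illustration (a minimal sketch, not from the original notes), the helper below converts a generic m'th order ODE of the form (3.2) into the first order form (3.7)-(3.8); the example at the bottom uses the linear equation from Exercise (1) of this chapter.

import numpy as np

def to_first_order(f, m):
    # Given d^m y/dt^m = f(x, t) with x = [y, dy/dt, ..., d^{m-1}y/dt^{m-1}],
    # return F such that dx/dt = F(x, t), as in equations (3.6)-(3.7)
    def F(x, t):
        dx = np.empty(m)
        dx[:-1] = x[1:]        # x1' = x2, x2' = x3, ...
        dx[-1] = f(x, t)       # xm' = f(x1, ..., xm, t)
        return dx
    return F

# Example: d2y/dt2 + 3 dy/dt + 2 y = 0  =>  f(x, t) = -3*x[1] - 2*x[0]
F = to_first_order(lambda x, t: -3.0 * x[1] - 2.0 * x[0], 2)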
each defined over a smaller interval [tn , tn+1 ] . This generates a sequence of
approximate solution vectors {x(tn ) : n = 1, .......f }. The difference hn = tn −
tn−1 is referred to as the integration step size or the integration interval. Two
possibilities can be considered regarding the choice of the sequence {tn }
• Fixed integration interval: The numbers tn are equispaced, i.e., tn = nh
for some h > 0
• Variable size integration intervals
For the sake of convenience, we introduce the notation
The new value x(n + 1) is a function of only the past value of x i.e.,
x(n). This is a non-iterative scheme.
• Implicit Euler method:
(3.15)  dx/dt ≅ [ x(n + 1) − x(n) ] / h = F [ x(n + 1), t_{n+1} ]

x(n + 1) = x(n) + h F(n + 1) ,   (n = 0, 1, ......., N − 1)

Each of the above equations has to be solved by an iterative method. For example, if we use the successive substitution method for solving the resulting nonlinear equation(s), the algorithm can be stated as follows:
Initialize: x(0), tf, h, ε, N = tf / h
FOR n = 0 TO n = N − 1
    x^(0)(n + 1) = x(n) + h F [ x(n), tn ]
    WHILE ( δ > ε )
        x^(k+1)(n + 1) = x(n) + h F [ x^(k)(n + 1), t_{n+1} ]
        δ = ||x^(k+1)(n + 1) − x^(k)(n + 1)|| / ||x^(k)(n + 1)||
    END WHILE
    x(n + 1) = x^(k)(n + 1)
END FOR
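A compact Python version of this scheme is sketched below (a minimal illustration assuming a fixed step size); the inner loop is the successive substitution iteration of the pseudo-code above.

import numpy as np

def implicit_euler(F, x0, tf, h, tol=1e-8, max_iter=50):
    # Implicit Euler (3.15) with successive substitution for the implicit equation
    N = int(round(tf / h))
    t, x = 0.0, np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for n in range(N):
        x_new = x + h * F(x, t)                       # initial guess: explicit Euler
        for _ in range(max_iter):
            x_next = x + h * F(x_new, t + h)          # successive substitution step
            delta = np.linalg.norm(x_next - x_new) / max(np.linalg.norm(x_new), 1e-12)
            x_new = x_next
            if delta < tol:
                break
        x, t = x_new, t + h
        trajectory.append(x.copy())
    return np.array(trajectory)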
3.2.2. Variable stepsize implementation with accuracy monitoring. One prac-
tical difficulty involved in the integration with fixed stepsize is the choice of
stepsize such that the approximation errors are kept small. Alternatively, a
variable stepsize algorithm is implemented with error monitoring as follows.
Given: tn , x(n) = x(tn ), ε
• Step 1: Choose stepsize h1 and let t^(1)_{n+1} = tn + h1
• Step 2: Compute x^(1)(n + 1) using an integration method (say explicit Euler).
• Step 3: Define h2 = h1/2 ;  t^(2)_{n+1} = tn + h2 ;  t^(2)_{n+2} = tn + 2 h2 ( = t^(1)_{n+1} )
  Compute x^(2)(n + 1) and x^(2)(n + 2) by the same integration method.
• Step 4: IF ( ||x^(1)(n + 1) − x^(2)(n + 2)|| < ε ),
(Accept x(1) (n + 1) as the new value)
Set x(n + 1) = x(1) (n + 1), and n = n + 1 and proceed to
Step 1.
ELSE
set h1 = h2 and proceed to the step 2.
END IF
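The same accept/reject logic can be written compactly as follows (a minimal sketch using explicit Euler as the base method; the error test compares the full step against two half steps, as in Steps 1-4 above).

import numpy as np

def adaptive_euler_step(F, x, t, h1, eps):
    # One accept/reject cycle of the step-halving strategy described above
    while True:
        x_full = x + h1 * F(x, t)                    # Step 2: one step of size h1
        h2 = 0.5 * h1                                # Step 3: two steps of size h2
        x_half = x + h2 * F(x, t)
        x_half = x_half + h2 * F(x_half, t + h2)
        if np.linalg.norm(x_full - x_half) < eps:    # Step 4: accept the step
            return x_full, t + h1, h1
        h1 = h2                                      # otherwise halve the step and retry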
(3.18)  [ y1(t) ; y2(t) ] = [ 2 e^{−100t} ;  (103/99) e^{−t} − (4/99) e^{−100t} ]
It can be observed that the terms with e−100t lead to a sharp decrease in y1 (t)
and to a small maximum in y2 (t) at t = 0.0137. The term y2 (t) is dominated
by e−t which decreases slowly. Thus,
(3.19)  y1(t) < 0.01 y1(0)   for t > 0.03
is a function of time. Thus, for stiff systems it is better to use variable step size
methods or special algorithms for stiff systems.
and so on. Let us now suppose that instead of actual solution x∗ (n), we have
available an approximation to x∗ (n), denoted as x(n). With this information,
we can construct x(n + 1), as
(3.28)  x(n + 1) = x(n) + h f(n) + (h²/2) [ (∂f/∂x)_n f(n) + (∂f/∂t)_n ] + .......
We can now make a further approximation by truncating the infinite series.
If the Taylor series is truncated after the term involving hk , then the Taylor’s
series method is said to be of order k.
• Order 1(Euler explicit formula)
• Order 2
(3.30)  x(n + 1) = x(n) + h f(n) + (h²/2) [ (∂f/∂x)_n f(n) + (∂f/∂t)_n ]
The real numbers a, b, α, β are chosen such that the R.H.S. of (3.31) approximates the R.H.S. of the Taylor series method of order 2 (ref. 3.30). To achieve this, consider the Taylor series expansion of the function k2 about (tn, x(n)):

(3.34)  k2 / h = f ( tn + αh, x(n) + βh f(n) )

(3.35)         = f (tn, x(n)) + αh (∂f/∂t)_n + βh (∂f/∂x)_n f(n) + O(h²)
Substituting this in equation (3.31), we have
Now, suppose we want to use this method for solving m simultaneous ODE-IVPs
dx
(3.48) = F(x, t)
dt
(3.49) x(0) = x0
where x ∈ Rm and F (x, t) is a m×1 function vector. Then, the above algorithm
can be modified as follows
(3.50)  x(n + 1) = x(n) + (h/6) ( k1 + 2 k2 + 2 k3 + k4 )
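A compact implementation of one step of the classical 4th order Runge-Kutta method for the vector case is sketched below (illustrative only; the stage slopes k_i are written here as values of F, so that the combination (3.50) carries the factor h/6 explicitly).

import numpy as np

def rk4_step(F, x, t, h):
    # One step of the 4th order Runge-Kutta method for dx/dt = F(x, t)
    k1 = F(x, t)
    k2 = F(x + 0.5 * h * k1, t + 0.5 * h)
    k3 = F(x + 0.5 * h * k2, t + 0.5 * h)
    k4 = F(x + h * k3, t + h)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)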
and
{f (n), f (n − 1), f (n − 2).........f (0)}
which can be used to construct the polynomial approximation. We approxi-
mate x(t) in the neighborhood of t = tn by constructing a local polynomial
approximation of type
and use it to estimate or extrapolate x(n + 1). (Note that the subscript 'n' used for the coefficients indicates that they correspond to the polynomial approximation at time t = tn.) Here, the coefficients of the polynomial are estimated using the state and derivative information from the past and possibly f(n + 1). In order to see how this can be achieved, consider a simple case where we want to construct a second order approximation.

Now, there are several ways we could go about estimating the unknown parameters of the polynomial.
• Explicit algorithm: Let us use only the current and the past infor-
mation of state and derivatives, which will lead to an explicit algorithm.
f (n − 1) = a1,n − 2a2,n h
(3.60) f (n) = a1,n
(3.75)  f(x, t) = dx/dt = a_{1,n} + 2 a_{2,n} t + ........ + m a_{m,n} t^{m−1} = Σ_{j=1}^{m} j a_{j,n} t^{j−1}
(3.78) x(n − i) = a0,n + a1,n (−ih) + a2,n (−ih)2 + ....... + am,n (−ih)m
(3.79) i = 0, 1, ...., p
Substitution of equations (3.77),(3.78) and (3.80) into (3.71) gives what is known
as the exactness constraints for the algorithm as
(3.82)  Σ_{j=0}^{m} a_{j,n} h^j = Σ_{i=0}^{p} α_i [ Σ_{j=0}^{m} a_{j,n} (−ih)^j ] + h Σ_{i=−1}^{p} β_i [ Σ_{j=1}^{m} j a_{j,n} (−ih)^{j−1} ]

        = ( Σ_{i=0}^{p} α_i ) a_{0,n} + ( Σ_{i=0}^{p} (−i) α_i + Σ_{i=−1}^{p} (−i)^0 β_i ) a_{1,n} h + ...

          ... + ( Σ_{i=0}^{p} (−i)^m α_i + m Σ_{i=−1}^{p} (−i)^{m−1} β_i ) a_{m,n} h^m
(3.85) (m + 1) ≤ 2p + 3
(3.86) m = 2(p + 1)
(3.100) p=m−2
(3.101) α1 = α2 = ....... = αp = 0
For j = 0, we have
(3.102)  Σ_{i=0}^{p} α_i = 1   ⇒   α_0 = 1
where y(n) represents sum of all terms which are known from the past data.
The above implicit equation has to be solved iteratively to obtain x(n + 1).
3.5.3. Predictor-Corrector Algorithms. We saw that an m-step Adams-Bashforth algorithm is exact for polynomials of order m, while an m-step Adams-Moulton algorithm is exact for polynomials of order (m + 1). However, the Adams-Moulton algorithm is implicit, i.e.,
where the quantity y(n) depends on x(n), ......., x(n − p) and is known. The
above implicit equation can be solved iteratively as
£ ¤
(3.107) x(k+1) (n + 1) = y(n) + hβ −1 f x(k) (n + 1), tn+1
If we choose the initial guess x(0) (n + 1) reasonably close to the solution, the
convergence of the iterations is accelerated. To achieve this, we choose x(0) (n +
1) as the value generated by an explicit m−step algorithm and then apply
the iterative formula. This is known as the predictor-corrector method. For
example, a two-step predictor-corrector algorithm can be given as
∙ ¸
(0) 3 1
(3.109) x (n + 1) = x(n) + h f (n) − f (n − 1) (Predictor)
2 2
(3.110) ∙ ¸
(k+1) 1 (k) 1
x (n + 1) = x(n) + h f (x (n + 1), tn+1 ) + f (n) (Corrector)
2 2
If the stepsize is selected properly, relatively few applications of the correction
formula are enough to determine x(n + 1), with a high degree of accuracy.
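The two-step scheme (3.109)-(3.110) translates almost literally into code; the sketch below is illustrative only and assumes that the two most recent derivative values f(n) and f(n−1) are available, e.g. from a self-starting method such as Runge-Kutta.

import numpy as np

def predictor_corrector_step(F, x, f_n, f_nm1, t, h, n_corr=2):
    # Adams-Bashforth predictor (3.109) followed by a few passes of the
    # trapezoidal (Adams-Moulton) corrector (3.110)
    x_new = x + h * (1.5 * f_n - 0.5 * f_nm1)                 # predictor
    for _ in range(n_corr):
        x_new = x + h * (0.5 * F(x_new, t + h) + 0.5 * f_n)   # corrector
    return x_new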
Gear’s Predictor-Corrector Algorithms: A popular algorithm used
for numerical integration is Gear’s predictor corrector. The equations for this
algorithm are as follows:
• Gear’s m-th order predictor algorithm is an explicit algorithm, with
(3.111) p = m−1
(3.112) β −1 = β 1 = ....... = β p = 0; β 0 6= 0
(3.113) x(n + 1) = α0 x(n) + α1 x(n − 1) + ... + αp x(n − p) + hβ 0 f (n)
• Gear's m-th order corrector algorithm is an implicit algorithm, with

(3.114)  p = m − 1

(3.115)  β_0 = β_1 = ....... = β_p = 0 ;   β_{−1} ≠ 0

(3.116)  x(n + 1) = α_0 x(n) + α_1 x(n − 1) + ....... + α_p x(n − p) + h β_{−1} f(n + 1)
3.5.4. Multivariate Case. Even though the above derivations have been worked out for the single dependent variable case, these methods can be easily extended to the multivariable case
dx
(3.117) = F (x, t) ; x ∈ Rn
dt
where F (x, t) is a n × 1 function vector. In the multivariable extension, the
scalar function f (x, t) is replaced by the function vector F (x, t), i.e.
(3.118)
x(n + 1) = α0 x(n) + α1 x(n − 1) + ....... + αp x(n − p)
£ ¤
+h β −1 F (n + 1) + β 0 F (n) + β 1 F (n − 1) + .... + β p F (n − p)
where
(3.119) F (n − i) ≡ F [x(tn − ih), (tn − ih)]
i = −1, 0, 1, ...p
© ª
and the scalar coefficients α0 ....αp , β −1 , β 0 , β 1 , ......β p are identical with the
coefficients derived for the scalar case as described in the above section.
The main advantages and limitations of multi-step methods can be summa-
rized as follows
• Advantages:
There are no extraneous ’inter-interval’ calculations as in the case
of Runge-Kutta methods.
Can be used for stiff equations if integration interval is chosen care-
fully.
• Limitations:
Time instances should be uniformly spaced and selection of integra-
tion interval is a critical issue.
Now consider the approximate solution of the above ODE-IVP by explicit Euler
methods
(4.4)  x(n + 1) = x(n) + h f(n) = (1 + ah) x(n)   ⇒   x(n) = (1 + ah)^n x(0)
Defining approximation error introduced due to numerical integration,
we can write
£ ¤
(4.6) e(n + 1) = (1 + ah)e(n) + eah − (1 + ah) x∗ (n)
(4.10)  | 1 / (1 − ah) | < 1   ⇒   ah < 0

(4.12)  | (1 + ah/2) / (1 − ah/2) | < 1   ⇒   ah < 0

(4.15)  | 1 + ah + (ah)²/2 | < 1   ⇒   −2 < ah + (ah)²/2 < 0
Following similar arguments as in the scalar case, it can be shown that the conditions for choosing the integration interval are as follows.
• Explicit Euler
(4.17) x(n + 1) = (I + hA)x(n)
(4.18) ρ [I + hA] < 1
where ρ(.) represents spectral radius of the matrix [I + hA] . When
matrix A is diagonalizable, i.e. A = ΨΛΨ−1 , we can write
(4.19) I + hA = Ψ [I + hΛ] Ψ−1
and eigen values of matrix I + hA are {(1 + hλi ) : i = 1, 2, ..., n} where
{λi : i = 1, 2, ..., n} represent eigenvalues of matrix A. Thus, the stabil-
ity requirement reduces to
(4.20) |1 + hλi | < 1 for i = 1, 2, ..., n
• Implicit Euler
(4.21) x(n + 1) = (I − hA)−1 x(n)
£ ¤
(4.22) ρ (I − hA)−1 < 1
• Trapezoidal Rule:

(4.23)  x(n + 1) = ( I − (h/2) A )^{-1} ( I + (h/2) A ) x(n)

(4.24)  ρ [ ( I − (h/2) A )^{-1} ( I + (h/2) A ) ] < 1
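These spectral radius conditions are easy to check numerically. The sketch below (illustrative only; the matrix A is a made-up stiff-looking example, not from the notes) scans a grid of step sizes and reports the largest h for which the explicit Euler condition (4.18) holds.

import numpy as np

def max_stable_step(A, h_grid):
    # Largest h in h_grid with spectral radius of (I + hA) below 1, condition (4.18)
    n = A.shape[0]
    h_ok = 0.0
    for h in h_grid:
        rho = max(abs(np.linalg.eigvals(np.eye(n) + h * A)))
        if rho < 1.0:
            h_ok = h
    return h_ok

A = np.array([[-100.0, 0.0],
              [1.0,   -1.0]])          # eigenvalues -100 and -1
print(max_stable_step(A, np.linspace(1e-3, 0.1, 100)))   # about 2/100 = 0.02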
Similar error analysis (or stability analysis) can be performed for other inte-
gration methods. For example, when the 3-step algorithm is used for obtaining
the numerical solution of
Defining

(4.27)  z(n) = [ x(n − 2)  x(n − 1)  x(n) ]^T ;   z(n + 1) = [ x(n − 1)  x(n)  x(n + 1) ]^T ;   B = [ 0 1 0 ; 0 0 1 ; η2 η1 η0 ]
we have
(4.30) z∗ (n + 1) = B ∗ z∗ (n)
(4.31) x(n + 1) = Cz∗ (n)
where

(4.32)  B* = diag [ e^{ah}  e^{ah}  e^{ah} ]
The evolution of the approximation error is given as
The stability criterion that can be used to choose the integration interval h can be derived from the characteristic equation of B,

(4.36)  λ³ − η_0 λ² − η_1 λ − η_2 = 0

by requiring that all its roots lie inside the unit circle.
• Even though the first and second order Adams-Moulton methods (implicit Euler and Crank-Nicolson) are A-stable, the higher order techniques have restricted regions of stability. These regions are larger than those of the Adams-Bashforth family of the same order.
• All forms of the R-K algorithms with order ≤ 4 have identical stability envelopes.
• Explicit R-K techniques have better stability characteristics than explicit Euler.
• For predictor-corrector schemes, accuracy of scheme improves with or-
der. However, stability region shrinks with order.
Remark 6. The conclusions reached from studying linear systems can be
extended to general nonlinear systems locally using Taylor expansion.
dx
(4.37) = F (x)
dt
can be approximated as
(4.38)  dx/dt ≅ F(x(n)) + [∂F/∂x]_{x=x(n)} ( x − x(n) )

(4.39)        = [∂F/∂x]_{x=x(n)} x + { F[x(n)] − [∂F/∂x]_{x=x(n)} x(n) }

(4.40)        = (A)_n x + (d)_n
Applying some numerical technique to solve this problem will lead to

(4.41)  x(n + 1) = (B)_n x(n) + (c)_n

and stability will depend on the choice of h such that ρ[(B)_n] < 1 for all n. Note that it is difficult to perform a global analysis for general nonlinear systems.
5. Summary
In these lecture notes, we have undertaken the study of solutions of multivariable and coupled ODE-IVPs. To begin with, we showed that a variety of problems, such as solving ODE-BVPs, hyperbolic / parabolic PDEs or sets of nonlinear algebraic equations, can be reduced to solving ODE-IVPs. A special class of problems, i.e. coupled linear ODE-IVPs, can be solved analytically. Thus, before starting the development of the numerical methods, we developed analytical solutions for the unforced (homogeneous) linear ODE-IVP problem and investigated their asymptotic behavior using eigenvalue analysis. We later discussed the development of numerical algorithms based on
• Taylor series approximations (Runge-Kutta methods)
6. Exercise
(1) Express the following set of equations in the standard form
(b) Set 2
d2 z/dt2 + 3dz/dt + 2z = 0
z(0) = 1; dz/dt = 0
Compare the coefficients of the characteristic equation, i.e. det(
λI − A) = 0, and those of the ODE(s) for the first two sets. Also,
comment upon the asymptotic behavior of the solution in each case
based on eigenvalues of matrix A.
(2) Consider the dynamic model of three isothermal CSTRs in series (Example 1). The model parameters are the residence time values τ1 = 1 min, τ2 = 2 min, τ3 = 3 min and the reaction rate constant k = 0.5 min⁻¹.
(a) Assuming that the CSTR is at a steady state initially ( i.e., dc/dt =
0) with CA0 = 1.8, find the corresponding steady state concentra-
tion by solving the resulting linear algebraic equations.
(b) Suppose we decide to shut down the reactor system and reduce CA0 to 0 at t = 0. Integrate the resulting set of homogeneous ODEs analytically to obtain the concentration profile C(t), starting from the steady state obtained above.
(c) Use explicit Euler method to integrate the above set of equations
from t =0 to t = 2 with integration interval of 0.1, 0.25, 0.5, 1.0
and compare the approximate solutions with the corresponding
analytical solution in each case.
(d) Repeat (c) for the case h=0.25 using implicit Euler method.
(3) Consider the PDE given below
∂C/∂t = ∂ 2 C/∂z 2
C(0, t) = C(1, t) = 0 for all 0 ≤ t ≤ ∞
C(z, 0) = 1 for 0 ≤ z ≤ 1
(a) Use the finite difference technique on the dimensionless diffusion equation to obtain a set of ODE-IVPs, assuming N internal grid points. In particular, for the case N = 3, obtain the analytical solution to the resulting ODE-IVP.
(b) Repeat the above exercise using orthogonal collocation to dis-
cretize in space with two internal collocation points.
(4) Consider Van der Pol equation given below
d2 y/dt2 − (1 − y 2 )dy/dt + 3y = 0
y(0) = 2; dy/dt = 0 at t = 0
(a) Express the above ODE-IVP in standard form
dx/dt = F (x); x = x(0) at t = 0
(b) Linearize the resulting equation in the neighborhood of x = [ 0 0 ]
and obtain the perturbation solution analytically. Comment upon
the asymptotic behavior of the solution.
(5) Consider the quadruple tank setup shown in Figure 1.
1. Introduction
These lecture notes deal with multivariable unconstrained optimization techniques and their application to computing numerical solutions of various types of problems. One of the major applications of unconstrained optimization is model parameter estimation (function approximation or multivariate regression). Thus, we begin by providing a detailed description of the model parameter estimation problem. We then derive necessary and sufficient conditions for optimality for a general multivariable unconstrained optimization problem. If the model has a nice structure, for example if it is linear in the parameters or can be transformed to a linear-in-parameters form, then the associated parameter estimation problem can be solved analytically. The parameter estimation problem for linear-in-parameters models (multivariate linear regression) is treated next. Geometric and statistical properties of the linear least square problem are discussed in detail to provide further insights into this formulation. Numerical methods for estimating parameters of nonlinear-in-parameters models are presented in the subsequent section. The application of optimization formulations to problems such as solving linear / nonlinear equations and the solution of PDEs using the finite element method is discussed in the last section.
2. Principles of Optimization
2.1. Necessary Conditions for Optimality. Given a real valued
scalar function F (z) : Rn → R defined for any z ∈ Rn .
(2.2)  F(z* + Δz) = F(z*) + Σ_{i=1}^{N} (∂F/∂z_i)(z*) Δz_i + R2(z*, Δz)

(2.3)  i.e.  F(z* + Δz) − F(z*) = Δz_k (∂F/∂z_k)(z*) + R2(z*, Δz)
Since R2 (z∗ , ∆z) is of order (∆zi )2 , the terms of order ∆zi will dominate over
the higher order terms for sufficiently small ∆z. Thus, sign of F (z∗ +∆z)−F (z∗ )
is decided by the sign of

Δz_k (∂F/∂z_k)(z*)
Suppose

(2.4)  (∂F/∂z_k)(z*) > 0

then, choosing Δz_k < 0 implies
(2.5) F (z∗ + ∆z) − F (z∗ ) < 0 ⇒ F (z∗ + ∆z) < F (z∗ )
and F (z) can be further reduced by reducing ∆zk . This contradicts the assump-
tion that z = z∗ is a minimum point. Similarly, if
(2.6)  (∂F/∂z_k)(z*) < 0
then, choosing ∆zk > 0 implies
(2.7) F (z∗ + ∆z) − F (z∗ ) < 0 ⇒ F (z∗ + ∆z) < F (z∗ )
and F (z) can be further reduced by increasing ∆zk . This contradicts the as-
sumption that z = z∗ is a minimum point. Thus, z = z∗ will be a minimum of
F (z) only if
(2.8)  (∂F/∂z_k)(z*) = 0    for k = 1, 2, ..., N
Similar arguments can be made if z = z∗ is a maximum of F (z).
A symmetric matrix A is said to be positive definite if

(2.9)  z^T A z > 0

whenever z ≠ 0, and positive semi-definite if

(2.10)  z^T A z ≥ 0

for all z. Similarly, A is negative definite if

(2.11)  z^T A z < 0

whenever z ≠ 0, and negative semi-definite if

(2.12)  z^T A z ≤ 0

for all z.
The sufficient condition for optimality, which can be used to establish whether
a stationary point is a maximum or a minimum, is given by the following theo-
rem.
Theorem 12. A sufficient condition for a stationary point z = z* to be an extreme point is that the matrix [∂²F/∂z_i∂z_j] (the Hessian of F) evaluated at z = z* is

(1) positive definite when z = z* is a minimum
(2) negative definite when z = z* is a maximum
Proof: Using Taylor series expansion, we have

(2.13)  F(z* + Δz) = F(z*) + Σ_{i=1}^{N} (∂F/∂z_i)(z*) Δz_i + (1/2!) Σ_{i=1}^{N} Σ_{j=1}^{N} [∂²F(z* + λΔz)/∂z_i∂z_j] Δz_i Δz_j ,    (0 < λ < 1)
(2.14) ∇F (z∗ ) = 0
(2.15)  F(z* + Δz) − F(z*) = (1/2!) Σ_{i=1}^{N} Σ_{j=1}^{N} [∂²F(z* + λΔz)/∂z_i∂z_j] Δz_i Δz_j ,    (0 < λ < 1)
This implies that the sign of F(z* + Δz) − F(z*) at the extreme point z* is the same as the sign of the R.H.S. Since the second partial derivative [∂²F/∂z_i∂z_j] is continuous in the neighborhood of z = z*, its value at z = z* + λΔz will have the same sign as its value at z = z* for all sufficiently small Δz. If the quantity
(2.16)  Σ_{i=1}^{N} Σ_{j=1}^{N} [∂²F(z* + λΔz)/∂z_i∂z_j] Δz_i Δz_j ≃ (Δz)^T [∇²F(z*)] Δz ≥ 0
for all ∆z, then z = z∗ will be a local minimum. In other words, if Hessian ma-
trix [∇2 F (z∗ )] positive semi-definite, then z = z∗ will be a local minimum.
If the quantity
(2.17)  Σ_{i=1}^{N} Σ_{j=1}^{N} [∂²F(z* + λΔz)/∂z_i∂z_j] Δz_i Δz_j ≃ (Δz)^T [∇²F(z*)] Δz ≤ 0
for all ∆z, then z = z∗ will be a local maximum. In other words, if Hessian
matrix [∇2 F (z∗ )] negative semi-definite, then z = z∗ will be a local maxi-
mum.
then
(2.20) (∆z)T [∇2 F (z∗ )]∆z = 4 (∆z1 )2 + (∆z2 )2 + (1/9) (∆z3 )2 = 1
As the coefficients of quadratic terms are unequal and positive, we get an ellip-
soid instead of a sphere in the neighborhood of z = z∗ .
Now, suppose we consider a dense and positive definite [∇²F(z*)] such as

(2.21)  [∇²F(z*)] = [ 5 4 ; 4 5 ]

then we have

(2.22)  (Δz)^T [∇²F(z*)] Δz = 5 (Δz1)² + 8 (Δz1 Δz2) + 5 (Δz2)² = 1

This is still an ellipse in the neighborhood of z = z*; however, its axes are not aligned parallel to the coordinate axes. The matrix [∇²F(z*)] can be diagonalized as

(2.23)  [ 5 4 ; 4 5 ] = [ 1/√2  1/√2 ; −1/√2  1/√2 ] [ 1 0 ; 0 9 ] [ 1/√2  1/√2 ; −1/√2  1/√2 ]^T
Defining rotated coordinates Δy as

(2.24)  Δy = [ Δy1 ; Δy2 ] = Ψ^T Δz

(2.25)     = [ 1/√2  1/√2 ; −1/√2  1/√2 ]^T [ Δz1 ; Δz2 ]

we have

(2.26)  (Δz)^T [∇²F(z*)] Δz = Δy^T [ 1 0 ; 0 9 ] Δy = (Δy1)² + 9 (Δy2)² = 1
Figure 1 shows this ellipse when z* = [ 0 0 ]^T. Note that the coordinate transformation Δy = Ψ^T Δz has rotated the axes of the space to match the axes of the ellipse. Moreover, the major (longer) axis is aligned along the eigenvector corresponding to the smallest magnitude eigenvalue (its semi-axis length is 1/√λ_min), and the minor axis is aligned along the eigenvector corresponding to the largest magnitude eigenvalue.
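The numbers in this example can be checked with a few lines of Python (an illustrative sketch only):

import numpy as np

H = np.array([[5.0, 4.0],
              [4.0, 5.0]])            # Hessian from (2.21)
lam, Psi = np.linalg.eigh(H)          # lam = [1, 9]; columns of Psi are eigenvectors
print(lam)
print(1.0 / np.sqrt(lam))             # semi-axis lengths of the level set (2.26):
                                      # 1.0 along the eigenvector of lambda = 1 (major axis)
                                      # 1/3 along the eigenvector of lambda = 9 (minor axis)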
In more general settings, when z ∈ Rn , let 0 < λ1 ≤ λ2 ≤ .... ≤ λn
represent eigenvalues of the Hessian matrix. Using the fact that Hessian is
positive definite, we can write
(2.27) [∇2 F (z∗ )] = ΨΛΨT
Figure 1
where
Ψ = [ v^(1)  v^(2)  ....  v^(n) ]
represents unitary matrix with eigenvectors of the Hessian as its columns and Λ
is a diagonal matrix with eigenvalues on the diagonal. Using this transforma-
tion, we have
(2.28)  q(Δz) = (Ψ^T Δz)^T Λ (Ψ^T Δz) = (Δy)^T Λ (Δy)

(2.29)        = λ1 (Δy1)² + λ2 (Δy2)² + .... + λn (Δyn)² = 1
(i.e. λi ≤ 0 for all i) then, the matrix is negative semi-definite. When eigen
values have mixed signs, the matrix is indefinite.
(3.1) CP = a + bT + cT 2
y ≡ Cp ;   x ≡ T ;   θ ≡ [ a  b  c ]^T
(2) Dimensional analysis in mass transfer / heat transfer

y ≡ f ;   x ≡ Re ;   θ ≡ [ α  β ]^T
(3.5)  P = RT / (V − b) − a / [ T^{1/2} (V + b) V ]

y ≡ P ;   x ≡ [ T  V ]^T ;   θ ≡ [ a  b ]^T
(3.7a)  log(Pv) = A − B / (T + C)
Example 60. Reaction rate models:
(3.8)  −r_A = − (dC_A/dt) = k_o exp(−E/RT) (C_A)^n

y ≡ −r_A ;   x ≡ [ C_A  T ]^T ;   θ ≡ [ n  E  k_o ]^T

y ≡ δT ;   x ≡ t ;   θ ≡ [ K  α1  α2  τ1  τ2 ]^T
where FT (.) represents true functional relationship that relates yt with xt and
Θ represent the parameter vector. When we collect data from an experiment,
we get a set of measurements x(k) and yk such that
x^(k) = x_t^(k) + ε^(k)

y_k = y_{t,k} + v_k

k = 1, 2, ......, N
where ε(k) and v (k) represent errors in measurements of independent and depen-
dent variables, respectively, and N represents the size of data sample. Given
these measurements, the model relating these measured quantities can be stated
as

y_{t,k} = F ( x_t^(k), θ ) + e_k

where F(.) represents the proposed approximate functional relationship, θ represents the parameter vector and e_k represents the equation error for the k'th sample point. Thus, the most general problem of estimating model parameters from
experimental data can be stated as follows:
Estimate θ such that

min_θ  f ( e1, e2, ....., eN, ε^(1), ...., ε^(N), v1, ..., vN )

subject to

e_i = y_i − F ( x_t^(i), θ )

ε^(i) = x^(i) − x_t^(i)

v_i = y_i − y_{t,i}

for i = 1, 2, ...., N
where, f (.) represents some scalar objective function.
Given a data set, formulation and solution of the above general modeling
problem is not an easy task. In these lecture notes, we restrict ourselves to a
special class of models, which assume that
(1) Measurement errors in all independent variables are negligible, i.e. x = x_t
(2) The effect of all unknown disturbances and modeling errors can be adequately captured using the equation error model, i.e.

y = F(x, θ) + e
The term e on R.H.S. can be interpreted either as modeling error or as
measurement error while recording y.
Figure 2. Interpolation
Figure 3. Approximation
and the vector of model predictions ŷ as

ŷ = [ ŷ1  ŷ2  ....  ŷN ]^T    (N × 1)

f (e1, e2, ....., eN) = || y − ŷ ||_p
(3.21)  || y − ŷ ||_1 = Σ_{i=1}^{N} | y_i − ŷ_i |

(3.22)  || y − ŷ ||_2 = Σ_{i=1}^{N} ( y_i − ŷ_i )²
(3.24)  || y − ŷ ||_{2,w} = Σ_{i=1}^{N} w_i ( y_i − ŷ_i )²

(3.25)  || y(z) − ŷ(z) ||_1 = ∫_a^b | y(z) − ŷ(z) | dz

(3.26)  || y(z) − ŷ(z) ||_2 = ∫_a^b [ y(z) − ŷ(z) ]² dz

(3.27)  || y(z) − ŷ(z) ||_{2,w} = ∫_a^b [ y(z) − ŷ(z) ]² w(z) dz
The procedures for determining the best approximation are analogous for the continuous and the discrete cases.
(4.4)  ŷ1 = θ1 f1(x^(1)) + θ2 f2(x^(1)) + .......... + θm fm(x^(1))

(4.5)  ŷ2 = θ1 f1(x^(2)) + θ2 f2(x^(2)) + .......... + θm fm(x^(2))

       ..... = .................................................

(4.6)  ŷN = θ1 f1(x^(N)) + θ2 f2(x^(N)) + .......... + θm fm(x^(N))

Defining

(4.7)  θ = [ θ1  θ2  ....  θm ]^T ∈ R^m

(4.8)  φ^(i) = [ f1(x^(i))  f2(x^(i))  ....  fm(x^(i)) ]^T ∈ R^m

(4.9)  ŷ = [ ŷ1  ŷ2  ....  ŷN ]^T ∈ R^N

and

(4.10)  Φ = [ f1(x^(1)) ... fm(x^(1)) ; ...... ; f1(x^(N)) ... fm(x^(N)) ]_{N×m} = [ (φ^(1))^T ; ...... ; (φ^(N))^T ]

we get an over-determined set of equations

(4.11)  ŷ = Φ θ
(4.15)  θ̂ = min_θ  e^T W e

where

(4.16)  W = diag [ w1  w2  ....  wN ]

is a diagonal weighting matrix.
(4.17)  θ̂ = min_θ  e^T W e ;    e = y − ŷ = y − Φ θ

Using the necessary condition for optimality, we have

∂ [ e^T W e ] / ∂θ = 0
The rules of differentiation of a scalar function f = u^T B v with respect to the vectors u and v can be stated as follows:

(4.18)  ∂(u^T B v)/∂u = B v ;    ∂(u^T B v)/∂v = B^T u

(4.19)  ∂(u^T B u)/∂u = 2 B u    when B is symmetric
Now, applying the above rules,

(4.20)  e^T W e = [y − Φθ]^T W [y − Φθ] = y^T W y − (Φθ)^T W y − y^T W Φ θ + θ^T (Φ^T W Φ) θ

(4.21)  ∂[e^T W e]/∂θ = −Φ^T W y − Φ^T W y + 2 (Φ^T W Φ) θ̂ = 0

(4.22)  ⇒  (Φ^T W Φ) θ̂_LS = Φ^T W y

In the above derivation, we have used the fact that the matrix Φ^T W Φ is symmetric. If the matrix (Φ^T W Φ) is invertible, the least square estimate of the parameters θ̂ can be computed as

(4.23)  θ̂_LS = [ Φ^T W Φ ]^{-1} Φ^T W y

The Hessian of the objective function,

(4.24)  ∂² [ e^T W e ] / ∂θ² = 2 (Φ^T W Φ)

is positive definite (for a full column rank Φ and positive weights), and hence the sufficiency condition is satisfied and the stationary point is a minimum.
As the objective function e^T W e is convex in θ, it can be shown that the solution θ̂ is its global minimum.
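The estimate (4.23) can be computed directly; the following sketch (illustrative only, with a made-up straight-line data set) forms the weighted normal equations and solves them rather than inverting Φ^T W Φ explicitly.

import numpy as np

def weighted_least_squares(Phi, y, w):
    # Weighted linear least squares estimate (4.23): solve (Phi^T W Phi) theta = Phi^T W y
    W = np.diag(w)
    return np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ y)

# Example: fit y = theta1 + theta2*x to noisy data with equal weights
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x + 0.05 * np.random.randn(20)
Phi = np.column_stack([np.ones_like(x), x])
print(weighted_least_squares(Phi, y, np.ones_like(x)))   # approximately [2, 3]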
Thus, linear least square estimation problem is finally reduced to solving
equation of the form Ax = b where
(5.2)  min_θ Φ = min_θ ⟨ θa − y, θa − y ⟩

(5.3)          = min_θ [ θ² ⟨a, a⟩ − 2 θ ⟨a, y⟩ + ⟨y, y⟩ ]
Figure 4
Figure 5
origin, we want to find the distance of y from S, i.e. a point p ∈ S such that ||p − y||_2 is minimum (see Figure 5). Again, from school geometry, we know that such a point can be obtained by drawing a perpendicular from y to S; p is the point where this perpendicular meets S (see Figure 5). We would like to formally derive this result using optimization.
More generally, given a point y ∈Rm and subspace S of Rm , the problem
is to find a point p in subspace S such that it is closest to vector y. Let
S = span { a^(1), a^(2), ...., a^(m) } and as p ∈ S we have

(5.10)  p = θ1 a^(1) + θ2 a^(2) + .... + θm a^(m) = Σ_{i=1}^{m} θ_i a^(i)
Collecting the above set of equations and using vector-matrix notation, we have
(5.16)  [ ⟨a^(1), a^(1)⟩  ⟨a^(1), a^(2)⟩  ....  ⟨a^(1), a^(m)⟩ ;
          ⟨a^(2), a^(1)⟩  ⟨a^(2), a^(2)⟩  ....  ⟨a^(2), a^(m)⟩ ;
          .....           .....           ....  .....          ;
          ⟨a^(m), a^(1)⟩  ⟨a^(m), a^(2)⟩  ....  ⟨a^(m), a^(m)⟩ ]  [ θ1 ; θ2 ; .... ; θm ]  =  [ ⟨a^(1), y⟩ ; ⟨a^(2), y⟩ ; .... ; ⟨a^(m), y⟩ ]
This is nothing but the classic normal equation derived in the above subsec-
tion.
Let us now interpret the least square parameter estimation problem stated in the last section using the above geometric arguments. The least square problem was posed as choosing the parameter vector θ̂ such that

(5.17)  || e ||_{W,2} = || y − Φθ ||_{W,2} = √( [y − Φθ]^T W [y − Φθ] )

is minimized. This is precisely the problem of finding the shortest distance of the vector y from a subspace S. The subspace involved is nothing but the column space of Φ. Let Φ be represented as

(5.18)  Φ = [ a^(1)  a^(2)  ....  a^(m) ]

where a^(i) ∈ R^N are the columns of the matrix Φ. Let us define the inner product for any u, v ∈ R^N as

(5.19)  ⟨u, v⟩ = u^T W v

(5.20)  p = θ̂1 a^(1) + θ̂2 a^(2) + .... + θ̂m a^(m) = Φ θ̂
Remark 10. Given any Hilbert space X and an orthonormal basis for the Hilbert space { e^(1), e^(2), ..., e^(m), ... }, we can express any vector u ∈ X as

(5.42)  u = ⟨e^(1), u⟩ e^(1) + ⟨e^(2), u⟩ e^(2) + ........... + ⟨e^(i), u⟩ e^(i) + ....

(5.43)    = Σ_{i=1}^{∞} ⟨e^(i), u⟩ e^(i)
Here, ϕ_i represents the true mean or (entire) population mean and ϕ̂_i denotes the estimate of the population mean generated using the given data set. The sample mean
(6.12)  ϕ̂ = (1/N) Σ_{j=1}^{N} ϕ^(j)

It may be noted that the vector 1 makes equal angles with all the coordinate axes in R^N, and the vector ϕ̂_j 1 represents the projection of η^(j) along the vector 1.
A measure of the spread of the data elements { φ_{j,i} : j = 1, 2, ..., N } around the estimated sample mean ϕ̂_i is given by the sample variance, defined as

(6.13)  s_i² = [1/(N − 1)] Σ_{j=1}^{N} [ φ_{j,i} − ϕ̂_i ]²

(6.14)  e^(i) = η^(i) − ϕ̂_i 1

(6.15)  s_i² = [1/(N − 1)] [e^(i)]^T e^(i)
Note that the vector ( η^(i) − ϕ̂_i 1 ) is orthogonal to the vector ( ϕ̂_i 1 ), which is the best approximation of η^(i) along the vector 1. The square root of the sample variance is called the sample standard deviation.
Now consider data obtained for two different random variables, say ϕi and
ϕk . A measure of linear association between these two variables can be estimated
as
(6.16)  s_{i,k} = [1/(N − 1)] Σ_{j=1}^{N} [ φ_{j,i} − ϕ̂_i ] [ φ_{j,k} − ϕ̂_k ] = [1/(N − 1)] [e^(i)]^T e^(k)

i = 1, 2, ..., m ;   k = 1, 2, ..., m
Here si,k are (i, k)th elements of sample covariance matrix Sϕ of the random
variable vector ϕ. Alternatively, this matrix can also be estimated as
(6.17)  S_ϕ = Cov(ϕ) = [1/(N − 1)] Σ_{j=1}^{N} [ ϕ^(j) − ϕ̂ ] [ ϕ^(j) − ϕ̂ ]^T

(6.18)      = [1/(N − 1)] [ Φ − 1 (ϕ̂)^T ]^T [ Φ − 1 (ϕ̂)^T ]

It may be noted that the sample covariance matrix S_ϕ is an estimate of the population covariance matrix Σ.
Finally, sample correlation coefficients (normalized or standardized covariances) are defined as

(6.19)  r_{i,k} = Σ_{j=1}^{N} [ φ_{j,i} − ϕ̂_i ] [ φ_{j,k} − ϕ̂_k ] / { √( Σ_{j=1}^{N} [ φ_{j,i} − ϕ̂_i ]² ) √( Σ_{j=1}^{N} [ φ_{j,k} − ϕ̂_k ]² ) }

              = s_{i,k} / ( √s_{i,i} √s_{k,k} )

              = [e^(i)]^T e^(k) / { √( [e^(i)]^T e^(i) ) √( [e^(k)]^T e^(k) ) } = cos(θ_{i,k})

i = 1, 2, ..., m ;   k = 1, 2, ..., m
If two deviation vectors e^(i) and e^(k) have nearly the same orientation, the sample correlation coefficient will be close to 1. If two deviation vectors e^(i) and e^(k) are nearly perpendicular, the sample correlation coefficient will be close to 0. If two deviation vectors e^(i) and e^(k) have nearly opposite orientations, the sample correlation coefficient will be close to −1. Thus, r_{i,k} is a measure of linear association between two random variables.
r_{i,k} = 0  ⇒  indicates lack of linear association between ϕ_i and ϕ_k

r_{i,k} < 0  ⇒  tendency for one value in the pair to be larger than its average when the other is smaller than its average

r_{i,k} > 0  ⇒  tendency for one value in the pair to be large when the other is large, and for both values to be small together
It may be noted that two random variables having nonlinear association may
have ri,k = 0 i.e. lack of linear association. Thus, ri,k = 0 implies only lack of
linear association and not lack of association between two random variables.
The sample correlation matrix Rϕ can also be estimated as
(6.21) Rϕ = D−1/2 Sϕ D−1/2
where

(6.22)  D^{1/2} = diag [ √s_1  √s_2  ...  √s_m ]

Suppose we define a variable transformation where the φ_{i,j} are replaced by normalized or standardized variables defined as

(6.23)  υ_{i,j} = ( φ_{i,j} − ϕ̂_j ) / √s_j
Then the vector υ of these scaled random variables will have zero mean and its
sample covariance will be identical to its sample correlation matrix.
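These sample statistics are computed below for a generic data matrix (an illustrative sketch; Phi holds the N samples as rows, matching the definition of Φ used in this section).

import numpy as np

def sample_stats(Phi):
    # Sample mean (6.12), covariance (6.17) and correlation (6.21) of the data matrix
    N = Phi.shape[0]
    mean = Phi.mean(axis=0)
    dev = Phi - mean                           # deviation vectors e^(i) as columns
    S = dev.T @ dev / (N - 1)                  # sample covariance matrix S_phi
    Dinv = np.diag(1.0 / np.sqrt(np.diag(S)))  # D^{-1/2}
    R = Dinv @ S @ Dinv                        # sample correlation matrix R_phi
    return mean, S, R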
it follows that

(6.27)  C = 1 / [ (2π)^{m/2} [det(Σ)]^{1/2} ]

Thus, the multivariate Gaussian distribution has the form

(6.28)  f(ϕ) = 1 / [ (2π)^{m/2} [det(Σ)]^{1/2} ]  exp [ −(1/2) (ϕ − ϕ̄)^T Σ^{-1} (ϕ − ϕ̄) ]
(6.33)  t = u / √(χ²_m / m) = [ (ϕ̂ − ϕ) / σ_ϕ ] / ( s_ϕ / σ_ϕ ) = ( ϕ̂ − ϕ ) / s_ϕ

where ϕ̂ is the sample mean and s_ϕ is the sample standard deviation. The probability density function for t is

(6.34)  p(t) = [ 1 / √(πm) ] [ Γ((m + 1)/2) / Γ(m/2) ] [ 1 + t²/m ]^{−(m+1)/2}
Note that the above covariance expression implies that the elements e_i of the vector e are independent and normally distributed variables such that
£ (i) ¤T
(6.41) yi = ϕ θ + ei for i = 1, 2...N
(6.42) cov(ei , ej ) = 0 when i 6= j ; i = 1, 2...N, j = 1, 2...N
(6.43) var(ei ) = σ 2 for i = 1, 2...N
• The parameter vector θ and the error vector e are not correlated.
Thus, statistical model for experimental data given by equation (6.41) can
be expressed as
(6.44) y = ΦθT + e
(6.45)  Φ = [ (ϕ^(1))^T ; ........ ; (ϕ^(N))^T ]
where θ_T represents the true parameter values and the vectors y and e have been defined by equations (4.12) and (4.13), respectively. Taking the expectation (mean) on both sides of the above equation, we have

(6.46)  E(y) = E(Φθ_T + e) = Φθ_T

From the least square analysis, we have the least square solution, given as

(6.47)  θ̂ = (Φ^T Φ)^{-1} Φ^T y

(6.48)  E(θ̂) = (Φ^T Φ)^{-1} Φ^T E(y) = (Φ^T Φ)^{-1} Φ^T E(Φθ_T + e)

(6.49)        = (Φ^T Φ)^{-1} Φ^T Φ θ_T = θ_T
The above result guarantees that, if we collect a sufficiently large number of samples, the least square estimate will approach the true values of the model parameters. In statistical terms, θ̂ is an unbiased estimate of θ_T. To calculate the covariance of θ̂, we proceed as follows:

(6.50)  V = cov(θ̂, θ̂) = E [ (θ̂ − θ_T)(θ̂ − θ_T)^T ]

Now

(6.51)  θ̂ − θ_T = (Φ^T Φ)^{-1} Φ^T (Φθ_T + e) − θ_T

(6.52)           = θ_T + (Φ^T Φ)^{-1} Φ^T e − θ_T

(6.53)           = (Φ^T Φ)^{-1} Φ^T e

This implies

(6.54)  V = cov(θ̂, θ̂) = E [ (Φ^T Φ)^{-1} Φ^T (e e^T) Φ (Φ^T Φ)^{-1} ]

(6.55)    = (Φ^T Φ)^{-1} Φ^T E(e e^T) Φ (Φ^T Φ)^{-1}
(6.56)  E(e e^T) = σ² I

we have

(6.57)  V = cov(θ̂, θ̂) = σ² (Φ^T Φ)^{-1}

Now, since e is a zero mean and normally distributed vector, it can be shown that (θ̂ − θ_T) is also a zero mean and normally distributed vector with the covariance matrix given by equation (6.57). Thus, given the matrix Φ and σ², we can determine the confidence interval of each parameter using the parameter variances.
In general, σ² may not be known a priori for a given set of data. However, an estimate of σ² (denoted as σ̂²) can be obtained from the estimated error vector as

(6.59)  σ̂² = [1/(N − m)] Σ_{i=1}^{N} ê_i² = ê^T ê / (N − m)

(6.60)  ê = y − Φ θ̂
from ybi . Thus, this correlation coefficient can be used to compare different mod-
els developed using same data. Further it can be shown that the quantity
(6.63)  f = [ (N − m) / (m + 1) ] [ R² / (1 − R²) ]
has a Fisher distribution with (m + 1) and (N − m) degrees of freedom. To judge whether the regression is significant, we can test the hypothesis H0, rejecting it with a risk of α%. The test threshold with risk α% can be computed as

(6.64)  ε = F_α (m + 1, N − m)

where F denotes the Fisher distribution. If f > ε, then we conclude that f probably does not belong to the Fisher distribution and the model is not suitable. If f < ε, then the smoothing is significant and the model is satisfactory.
In many situations, the model parameters have physical meaning. In such
cases, it is important to determine confidence interval for parameter θi . Defining
variable
(6.65)  t_i = ( θ̂_i − θ_i ) / √(V̂_ii)

where θ_i is the true value of the parameter and θ̂_i is the estimated value, it can be shown that t_i follows a t (Student) distribution with N − m degrees of freedom. Thus, the confidence interval with a risk of α% for θ_i is

(6.66)  θ̂_i − √(V̂_ii) t(α/2, N − m)  <  θ_i  <  θ̂_i + √(V̂_ii) t(α/2, N − m)
In many situations, when we are fitting a functional form to explain variation
in experimental data, we do not know apriori the exact function form (e.g.
polynomial order) that fits data best. In such a case, we need to assess whether
a particular term fj (x) contributes significantly to yb or otherwise. In order to
measure the importance of the contribution of θi in
(6.67)  ŷ = Σ_{j=1}^{m} θ_j f_j(x)

one can test the hypothesis H0, namely θ_i = 0 (i.e. f_i(x) does not influence ŷ). Thus, we have

(6.68)  τ_i = ( θ̂_i − 0 ) / ( σ̂ √(V̂_ii) )
If
(6.69) −t(α/2, N − m) < τ i < t(α/2, N − m)
then we cannot reject the hypothesis H0, i.e. f_i(x) does not have a significant influence on ŷ and the term f_i(x) can be removed from ŷ; otherwise, f_i(x) contributes significantly to ŷ and should be retained.
(7.1)  ŷ(z) = θ1 p^(1)(z) + θ2 p^(2)(z) + θ3 p^(3)(z) + ............ + θm p^(m)(z)
            = θ1 + θ2 z + θ3 z² + ................... + θm z^{m−1}

(7.2)  ⟨h(z), g(z)⟩ = ∫_0^1 h(z) g(z) dz
We want to find a polynomial of the form (7.1) which approximates f (z) in the
least square sense. Geometrically, we want to project f (z) in the m dimensional
subspace of C2 [0, 1] spanned by vectors
(7.3) p(1) (z) = 1; p(2) (z) = z ; p(3) (z) = z 2 , ........, p(m) (z) = z m−1
(7.5)  h_ij = ⟨ x^(i)(z), x^(j)(z) ⟩ = ∫_0^1 z^{j+i−2} dz = 1 / (i + j − 1)

where

(7.7)  H = [ 1     1/2   1/3   ...  ...  1/m      ;
             1/2   1/3   1/4   ...  ...  1/(m+1)  ;
             ...   ...   ...   ...  ...  ...      ;
             1/m   ...   ...   ...  ...  1/(2m−1) ]
The matrix H is known as Hilbert matrix and this matrix is highly ill-conditioned
for m > 3. The following table shows condition numbers for a few values of m.
(7.8)
  m     :   3       4        5        6       7        8
  c(H)  :   524     1.55e4   4.67e5   1.5e7   4.75e8   1.53e10
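These condition numbers are easy to reproduce (an illustrative sketch; numpy's 2-norm condition number is used here):

import numpy as np

def hilbert(m):
    # Hilbert matrix (7.7): H[i, j] = 1/(i + j - 1) with 1-based indices
    i, j = np.indices((m, m)) + 1
    return 1.0 / (i + j - 1)

for m in (3, 4, 5, 6, 7, 8):
    print(m, np.linalg.cond(hilbert(m)))   # grows roughly as in table (7.8)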
Thus, for polynomial models of small order, say m = 3, the situation is acceptable; beyond this order, whatever the method of solution, we obtain approximations of progressively poorer accuracy. This implies that approximating a continuous function by a polynomial of type (7.1), with the choice of basis vectors as in (7.3), is an extremely ill-conditioned problem from the viewpoint of numerical computations. Also, note that if we want to increase the degree of the polynomial from m to, say, (m + 1), then we have to recompute θ1, ...., θm along with θ_{m+1}.
On the other hand, consider the model
(7.9) yb(z) = α1 p1 (z) + α2 p2 (z) + α3 p3 (z) + ............. + αm pm (z)
where p_i(z) represents the i'th order orthonormal basis function on C2[0, 1]. This corresponds to the choice of basis functions as
(7.10) x(1) (z) = p1 (z); x(2) (z) = p2 (z) ; ........, x(m) (z) = pm (z)
and since

(7.11)  ⟨p_i(z), p_j(z)⟩ = { 1 if i = j ;  0 if i ≠ j }

the normal equation reduces to

(7.12)  [ 1 0 .... 0 ; 0 1 .... 0 ; ..... ; 0 0 ..... 1 ] [ α1 ; α2 ; .... ; αm ] = [ ⟨p1(z), f(z)⟩ ; ⟨p2(z), f(z)⟩ ; .... ; ⟨pm(z), f(z)⟩ ]
or simply
(7.13) αi = hpi (z),f (z)i ; i = 1, 2, ....m
Obviously, the approximation problem is extremely well conditioned in this case.
In fact, if we want to increase the degree of the polynomial from m to, say, (m + 1), then we do not have to recompute α1, ...., αm as in the case of the basis (7.3), whose
vectors are linearly independent but not orthogonal. We simply have to compute
the αm+1 as
(7.14) αm+1 = hpm+1 (z),f (z)i
The above illustration of approximation of a function by orthogonal polynomials
is a special case of what is known as generalized Fourier series expansion.
Let us assume that z_i is uniformly distributed in the interval [0, 1]. For large N, approximating dz ≃ 1/N, we can write

(7.21)  [Φ^T Φ]_{jk} = Σ_{i=1}^{N} z_i^{j+k−2} ≅ N ∫_0^1 z^{j+k−2} dz = N / (j + k − 1)

(7.22)  ( j, k = 1, 2, ............, m )
By definition, a set of polynomials {pj (zi )}are orthogonal over a set of points
{zi } with weights wi , if
Σ_{i=1}^{N} w_i p_j(z_i) p_k(z_i) = 0 ;    j, k = 1, ..., m  and  ( j ≠ k )
Thus, each term α_j p_j(z_i) is (statistically) independent of any other term and contains information that the other terms do not have, and the resulting parameter estimation problem is very well conditioned. A set of orthogonal polynomials can be obtained by different approaches. One simple algorithm is
INITIALIZE
p−1 (zi ) = 0
p0 (zi ) = 1 (f or i = 1, .............N)
Ψ0 = 0
FOR (k = 0, 1, ............., m − 1)

    Ψk = [ Σ_{i=1}^{N} w_i [ p_k(z_i) ]² ] / [ Σ_{i=1}^{N} w_i [ p_{k−1}(z_i) ]² ]

    Γ_{k+1} = [ Σ_{i=1}^{N} w_i z_i [ p_k(z_i) ]² ] / [ Σ_{i=1}^{N} w_i [ p_k(z_i) ]² ]
The first two approaches use the linear least square formulation as the basis, while the nonlinear programming approach is a separate class of algorithms.

(8.17)  ( i = 1, ........., N )

(8.19)  ∂ē_i / ∂e_i = 1 / ( ỹ_i + e_i ) = 1 / y_i

(8.20)  ⇒  w_i = ( y_i )²
Let us denote

(8.25)  J^(k−1) = [ ∂F / ∂θ ]_{θ = θ^(k−1)}

and

(8.26)  F^(k−1) = F ( X, θ^(k−1) )
The least square solution to the above sub-problem can be obtained by solving the normal equation

(8.29)  [ J^(k−1) ]^T W J^(k−1) Δθ^(k) = [ J^(k−1) ]^T W [ y − F^(k−1) ]

(8.30)  Δθ^(k) = { [ J^(k−1) ]^T W J^(k−1) }^{-1} [ J^(k−1) ]^T W [ y − F^(k−1) ]

and an improved guess can be obtained as

(8.31)  θ^(k) = θ^(k−1) + Δθ^(k)

Termination criterion: defining e^(k) = y − F^(k) and

(8.32)  Φ^(k) = [ e^(k) ]^T W e^(k)

terminate the iterations when Φ^(k) changes only by a small amount, i.e.

(8.33)  | Φ^(k) − Φ^(k−1) | / | Φ^(k) | < ε
Gauss-Newton Algorithm:
INITIALIZE: θ^(0), ε1, ε2, α, k = 0, δ1, δ2, kmax
e^(0) = y − F[ X, θ^(0) ]
δ1 = [e^(0)]^T W e^(0)
WHILE [ (δ1 > ε1) AND (δ2 > ε2) AND (k < kmax) ]
    e^(k) = y − F[ X, θ^(k) ]
    Solve [ J^(k)T W J^(k) ] Δθ^(k) = J^(k)T W e^(k)
    λ^(0) = 1, j = 0
    θ^(k+1) = θ^(k) + λ^(0) Δθ^(k)
    δ0 = [e^(k)]^T W e^(k)
    e^(k+1) = y − F[ X, θ^(k+1) ]
    δ1 = [e^(k+1)]^T W e^(k+1)
    WHILE [ δ1 > δ0 ]
        j = j + 1
        λ^(j) = α λ^(j−1)
        θ^(k+1) = θ^(k) + λ^(j) Δθ^(k)
        e^(k+1) = y − F[ X, θ^(k+1) ]
        δ1 = [e^(k+1)]^T W e^(k+1)
    END WHILE
    δ2 = || θ^(k+1) − θ^(k) || / || θ^(k+1) ||
    k = k + 1
END WHILE
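A compact Python version of the same iteration is sketched below (illustrative only; it implements the normal-equation update (8.29)-(8.31) and the termination test (8.33), but omits the step-length damping loop of the pseudo-code above).

import numpy as np

def gauss_newton(F, jac, y, theta0, W=None, tol=1e-8, max_iter=50):
    # F(theta): model predictions (length N); jac(theta): N x p Jacobian dF/dtheta
    theta = np.asarray(theta0, dtype=float)
    W = np.eye(len(y)) if W is None else W
    phi_old = np.inf
    for _ in range(max_iter):
        e = y - F(theta)
        J = jac(theta)
        dtheta = np.linalg.solve(J.T @ W @ J, J.T @ W @ e)   # normal equations (8.29)
        theta = theta + dtheta                               # update (8.31)
        phi = e @ W @ e
        if abs(phi - phi_old) < tol * max(abs(phi), 1.0):    # criterion (8.33)
            break
        phi_old = phi
    return theta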
i.e. the new guess. Typically, λ is selected by minimizing f(x^(k) + λ s^(k)) with respect to λ, i.e. by carrying out a one dimensional optimization with respect to λ.
The iterations are terminated when either one of the following termination
criteria are satisfied:
(9.2)  | f(x^(k+1)) − f(x^(k)) | / | f(x^(k+1)) | < ε1

(9.4)  f(x^(m)) = max_{i ∈ (1, ...., n+1)}  f(x^(i))

(9.5)  x^(c) = (1/n) Σ_{i=1, i≠m}^{n+1} x^(i)
Then, we find a new point x^(new) by moving in the direction of x^(c) − x^(m) as follows

(9.6)  x^(new) = x^(m) + λ ( x^(c) − x^(m) )
By tracing out the level surfaces one by one, we obtain a contour plot (see Figure 6 and Figure 7). Suppose z = z̄ is a point lying on one of the level surfaces. If Φ(z) is continuous and differentiable, then using a Taylor series expansion in a neighborhood of z̄ we can write

(9.11)  Φ(z) = Φ(z̄) + [∇Φ(z̄)]^T (z − z̄) + (1/2) (z − z̄)^T [∇²Φ(z̄)] (z − z̄) + ....

If we neglect the second and higher order terms, we obtain
[∇Φ(z̄)]^T Δz < 0

then

(9.16)  Φ(z̄ + Δz) < Φ(z̄)

and Δz is called a descent direction. Suppose we restrict ourselves to the unit sphere in the neighborhood of z = z̄, i.e. the set of all z such that ||Δz|| ≤ 1, and want to
Figure 8
    λ*_k = min_λ  Φ( z^(k) − λ s^(k) )
    z^(k+1) = z^(k) − λ*_k s^(k)
    δ = || ∇Φ(z^(k+1)) ||_2
END WHILE
A numerical approach to the above one dimensional minimization problem
is given in the Appendix. Alternate criteria which can be used for termination
of iterations are as follows
| Φ(z^(k+1)) − Φ(z^(k)) | / | Φ(z^(k+1)) |  ≤  ε

max_i | ∂Φ(z^(k+1)) / ∂z_i |  ≤  ε

|| z^(k+1) − z^(k) ||  ≤  ε
The method of steepest descent may appear to be the best unconstrained minimization method. However, because the steepest descent direction is only a local property, the method is not effective in many problems. If the contours of the objective function are distorted (elongated), the method can be hopelessly slow.
9.3. Method of Conjugate Gradients. The convergence characteristics of steepest descent can be greatly improved by modifying it into a conjugate gradient method. The method of conjugate gradients displays the positive characteristics of Cauchy's (steepest descent) and the second order (i.e. Newton's) method while using only first order information. This procedure sets up each new search direction as a linear combination of all previous search directions and the newly determined gradient. The set of directions p^(1), p^(2), ........ are called A-conjugate if they satisfy the condition
Proof. Let z = z̄ minimize Φ(z); then, applying the necessary condition for optimality, we have

(9.22)  ∇Φ(z̄) = b + A z̄ = 0

Now, given a point z^(0) and n linearly independent directions s^(0), ....., s^(n−1), constants β_i can be found such that

(9.23)  z̄ = z^(0) + Σ_{i=0}^{n−1} β_i s^(i)

(9.26)  β_j = − [ b + A z^(0) ]^T s^(j) / ( [s^(j)]^T A s^(j) )
Now, consider an iterative procedure for minimization that starts at z^(0) and successively minimizes Φ(z) in the directions s^(0), ......., s^(n−1), where the directions are A-conjugate. The successive points are determined by the relation

(9.27)  z^(k+1) = z^(k) + λ^(k)_min s^(k)

where λ^(k)_min is found by minimizing Φ( z^(k) + λ s^(k) ) with respect to λ. At the optimum λ, we have

(9.28)  ∂Φ/∂λ = [ ∂Φ/∂z ]^T_{z = z^(k+1)}  ∂z^(k+1)/∂λ = 0
(9.32)  ⇒  z^(k) = z^(0) + Σ_{j=0}^{k−1} λ^(j)_min s^(j)
(9.33)  [z^(k)]^T A s^(k) = [z^(0)]^T A s^(k) + Σ_{j=0}^{k−1} λ^(j)_min [s^(j)]^T A s^(k)

(9.34)                     = [z^(0)]^T A s^(k)

(9.35)  ⇒  λ^(k)_min = − [ b + A z^(0) ]^T s^(k) / ( [s^(k)]^T A s^(k) )

which is identical with β_k. Thus, z̄ can be expressed as

(9.36)  z̄ = z^(0) + Σ_{j=0}^{n−1} β_j s^(j) = z^(0) + Σ_{i=0}^{n−1} λ^(i)_min s^(i)
This implies that z can be reached in n steps or less. Since above equation holds
good for any z(0) , the process converges independent of any choice of starting
point z(0) and any arbitrary set of conjugate directions. ¤
Defining the gradient g^(k) = ∇Φ(z^(k)), the new search direction is constructed as

(9.43)  s^(k) = − g^(k) + Σ_{i=0}^{k−1} α_i s^(i)
which is linear combination of all previous directions with s(0) = −g(0) . Note
that at each iteration, a line search is performed to determine optimum step
length λk such that
(9.44)  ∂Φ(z^(k+1)) / ∂λ = ∇Φ[z^(k+1)]^T s^(k)

(9.45)                    = [ g^(k+1) ]^T s^(k) = 0
(9.50)  [ g^(1) + α_0 g^(0) ]^T A ( Δz^(1) / λ_0 ) = 0

But we have

(9.51)  A Δz^(1) = Δg^(1)
    λ*_k = min_λ  Φ( z^(k) − λ p^(k) )
    z^(k+1) = z^(k) − λ*_k p^(k)
    δ = || ∇Φ(z^(k+1)) ||_2
END WHILE
(9.65)  ∇Φ(z̄) = 0

if z = z̄ is the optimum. Note that equation (9.65) defines a system of m equations in m unknowns. If ∇Φ(z) is continuously differentiable in the neighborhood of z = z̄, then, using a Taylor series expansion, we can express the optimality condition (9.65) as

(9.66)  ∇Φ(z̄) = ∇Φ[ z^(k) + (z̄ − z^(k)) ] ≃ ∇Φ[z^(k)] + [ ∇²Φ(z^(k)) ] Δz^(k) = 0

Defining the Hessian matrix H^(k) as

H^(k) = [ ∇²Φ(z^(k)) ]

an iteration scheme can be developed by solving equation (9.66):

z^(k+1) = z^(k) + λ Δz^(k) ;    Δz^(k) = − [ H^(k) ]^{-1} ∇Φ[z^(k)]
In order that Δz^(k) is a descent direction, it should satisfy the condition

(9.67)  [ ∇Φ[z^(k)] ]^T Δz^(k) < 0

or

(9.68)  [ ∇Φ[z^(k)] ]^T [ H^(k) ]^{-1} ∇Φ[z^(k)] > 0

i.e. in order that Δz^(k) is a descent direction, the Hessian H^(k) should be a positive definite matrix. This method has good convergence, but demands a large amount of computation, i.e. solving a system of linear equations and evaluating the Hessian at each step.
Algorithm
INITIALIZE: z(0), ε, kmax, λ(0)
k = 0
δ = 100 ∗ ε
WHILE [(δ > ε) AND (k < kmax)]
Solve H(k) s(k) = −∇Φ(k)
λ∗_k = min_λ Φ(z(k) + λ s(k))
z(k+1) = z(k) + λ∗_k s(k)
δ = ||∇Φ[z(k+1)]||₂
k = k + 1
END WHILE
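A minimal Python sketch of the damped Newton iteration in the algorithm above is given below. The smooth convex test function and the simple step-halving rule used in place of an exact one dimensional minimization are illustrative assumptions, not part of the notes.

import numpy as np

def newton_minimize(Phi, grad, hess, z0, eps=1e-10, kmax=50):
    # Newton's method: at each iteration solve H(k) s(k) = -grad Phi(k)
    # and take a damped step z(k+1) = z(k) + lam * s(k) so that Phi decreases.
    z = np.asarray(z0, dtype=float)
    for k in range(kmax):
        g = grad(z)
        if np.linalg.norm(g) <= eps:
            break
        s = np.linalg.solve(hess(z), -g)     # Newton direction (H assumed positive definite)
        lam = 1.0
        while Phi(z + lam * s) > Phi(z) and lam > 1e-12:   # simple step-length damping
            lam *= 0.5
        z = z + lam * s
    return z

# Illustrative smooth convex objective (not from the notes)
Phi  = lambda z: np.exp(z[0] + z[1]) + np.exp(z[0] - z[1]) + np.exp(-z[0])
grad = lambda z: np.array([np.exp(z[0] + z[1]) + np.exp(z[0] - z[1]) - np.exp(-z[0]),
                           np.exp(z[0] + z[1]) - np.exp(z[0] - z[1])])
hess = lambda z: np.array([[np.exp(z[0] + z[1]) + np.exp(z[0] - z[1]) + np.exp(-z[0]),
                            np.exp(z[0] + z[1]) - np.exp(z[0] - z[1])],
                           [np.exp(z[0] + z[1]) - np.exp(z[0] - z[1]),
                            np.exp(z[0] + z[1]) + np.exp(z[0] - z[1])]])
print(newton_minimize(Phi, grad, hess, [1.0, 1.0]))   # approaches about [-0.347, 0]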
9.4.1. Quasi-Newton Method. A major disadvantage of Newton's method or its variants is the need to compute the Hessian at each iteration. The quasi-Newton methods overcome this difficulty by constructing an approximate Hessian from the gradient information available at successive iterations. Quasi-Newton methods thus mimic the positive characteristics of Newton's method while using only gradient information. All methods of this class generate the search direction as
(9.69) s(k) = −D(k) ∇f(k)
which is similar to Newton's update, i.e.
s(k) = −[H(k)]⁻¹ ∇f(k)
Here, D(k) is an n × n matrix (called the metric), which changes with iterations (variable metric methods). A variable metric method is a quasi-Newton method if it is designed so that the iterates satisfy the following quadratic function property
(9.70) ∆z = A⁻¹ ∆g
Let us assume a recursion for the estimate of the inverse of the Hessian
(9.71) D(k+1) = D(k) + Dc(k)
The basic idea is to form Dc(k) such that the sequence D(0), D(1), ......, D(k+1) approaches [H(z̄)]⁻¹ = [∇²f(z̄)]⁻¹. We know that for a quadratic f(z) of the form
(9.72) f(z) = 0.5 zᵀAz + bᵀz + c
we can show
(9.73) ∆z = A⁻¹ ∆g
Let us assume that our approximation of A⁻¹ is of the form
(9.74) A⁻¹ = β D(k)
where β is a scalar. We would like D(k+1) to satisfy
(9.75) ∆z(k) = z(k+1) − z(k) = β D(k+1) ∆g(k) = β D(k+1) [g(k+1) − g(k)]
(9.78) ∆z(k)/β = [D(k) + Dc(k)] ∆g(k)
(9.79) Dc(k) ∆g(k) = ∆z(k)/β − D(k) ∆g(k)
One can verify by direct substitution that
(9.80) Dc(k) = (1/β) [∆z(k) yᵀ / (yᵀ ∆g(k))] − [D(k) ∆g(k) ηᵀ / (ηᵀ ∆g(k))]
is a solution. As y and η are arbitrary vectors, this is really a family of solutions. If we let
(9.81) y = ∆z(k) and η = D(k) ∆g(k)
we get the Davidon-Fletcher-Powell update
(9.82) D(k) = D(k−1) + ∆z(k−1)[∆z(k−1)]ᵀ / ([∆z(k−1)]ᵀ ∆g(k−1)) − D(k−1)∆g(k−1)[D(k−1)∆g(k−1)]ᵀ / ([∆g(k−1)]ᵀ D(k−1) ∆g(k−1))
Thus, the matrix D(k) is computed iteratively using only gradient information.
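The following Python sketch illustrates a quasi-Newton iteration that uses the DFP update (9.82) to build the inverse-Hessian estimate D(k) from steps and gradient differences. The backtracking line search, the curvature safeguard ∆zᵀ∆g > 0 and the quadratic test problem are illustrative choices, not part of the notes.

import numpy as np

def dfp_minimize(Phi, grad, z0, eps=1e-8, kmax=200):
    # Quasi-Newton (DFP) iteration: s(k) = -D(k) g(k), with D updated from the
    # steps dz and gradient differences dg so that D approaches the inverse Hessian.
    z = np.asarray(z0, dtype=float)
    n = len(z)
    D = np.eye(n)                       # initial metric: identity (first step is steepest descent)
    g = grad(z)
    for k in range(kmax):
        if np.linalg.norm(g) <= eps:
            break
        s = -D @ g
        lam = 1.0
        while Phi(z + lam * s) > Phi(z) and lam > 1e-12:   # crude backtracking line search
            lam *= 0.5
        z_new = z + lam * s
        g_new = grad(z_new)
        dz, dg = z_new - z, g_new - g
        if dz @ dg > 1e-12:             # DFP rank-two update of the inverse-Hessian estimate
            D = (D + np.outer(dz, dz) / (dz @ dg)
                   - np.outer(D @ dg, D @ dg) / (dg @ D @ dg))
        z, g = z_new, g_new
    return z

# Illustrative quadratic test problem (not from the notes)
Phi  = lambda z: (z[0] - 3.0)**2 + 10.0 * (z[1] + 1.0)**2
grad = lambda z: np.array([2.0 * (z[0] - 3.0), 20.0 * (z[1] + 1.0)])
print(dfp_minimize(Phi, grad, np.zeros(2)))   # approaches [3, -1]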
[H(k) + λ_k I] ∆z(k) = −∇Φ[z(k)]
z(k+1) = z(k) + ∆z(k)
Here λ_k is used to set both the search direction and the step length. To begin the search, a large value of λ_0 (≅ 10⁴) is selected so that
[H(0) + λ(0) I] ≅ [λ_0 I]
Thus, for a sufficiently large λ(0), ∆z(0) is in the negative gradient direction, i.e. along −∇Φ[z(0)]. As λ_k → 0, ∆z(k) shifts from the steepest descent direction to the Newton direction.
• Advantages: simplicity and excellent convergence near z̄.
• Disadvantages: the Hessian matrix H(k) must be computed and a set of linear equations has to be solved at each iteration.
Algorithm
INITIALIZE: z(0), ε, kmax, λ(0)
k = 0
δ = 100 ∗ ε
WHILE [(δ > ε) AND (k < kmax)]
STEP 1: Compute H(k) and ∇Φ[z(k)]
STEP 2: Solve [H(k) + λ(k) I] s(k) = −∇Φ[z(k)] and set z(k+1) = z(k) + s(k)
IF (Φ[z(k+1)] < Φ[z(k)])
λ(k+1) = (1/2) λ(k)
δ = ||∇Φ[z(k+1)]||
k = k + 1
ELSE
λ(k) = 2 λ(k)
GO TO STEP 2
END WHILE
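A minimal Python sketch of the above λ-adaptation scheme is shown below. The initial λ of 10⁴ and the halving/doubling rules follow the algorithm, while the Rosenbrock test function, the safeguard on λ and the function names are illustrative assumptions.

import numpy as np

def marquardt_minimize(Phi, grad, hess, z0, eps=1e-8, kmax=200, lam=1.0e4):
    # Marquardt's modification: solve [H(k) + lam*I] s(k) = -grad Phi(k).
    # Large lam gives a (scaled) steepest-descent step; lam -> 0 gives the Newton step.
    z = np.asarray(z0, dtype=float)
    n = len(z)
    for k in range(kmax):
        g = grad(z)
        if np.linalg.norm(g) <= eps:
            break
        H = hess(z)
        s = np.linalg.solve(H + lam * np.eye(n), -g)
        while Phi(z + s) >= Phi(z) and lam < 1e15:   # unsuccessful step: increase lam and retry
            lam = 2.0 * lam
            s = np.linalg.solve(H + lam * np.eye(n), -g)
        z = z + s
        lam = 0.5 * lam                              # successful step: accept and relax lam
    return z

# Illustrative test: Rosenbrock's function (not from the notes)
Phi  = lambda z: 100.0 * (z[1] - z[0]**2)**2 + (1.0 - z[0])**2
grad = lambda z: np.array([-400.0 * z[0] * (z[1] - z[0]**2) - 2.0 * (1.0 - z[0]),
                           200.0 * (z[1] - z[0]**2)])
hess = lambda z: np.array([[1200.0 * z[0]**2 - 400.0 * z[1] + 2.0, -400.0 * z[0]],
                           [-400.0 * z[0], 200.0]])
print(marquardt_minimize(Phi, grad, hess, [-1.2, 1.0]))   # approaches [1, 1]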
Φ(β) = a + bβ + cβ² + dβ³
dΦ(β)/dλ = b + 2cβ + 3dβ²
i.e. by solving
[ β²   β³ ] [c]   [ Φ(z(k) − βs(k)) − a − βb ]
[ 2β  3β² ] [d] = [ −[s(k)]ᵀ ∇Φ(z(k) − βs(k)) − b ]
The application of the necessary condition for optimality,
(9.94) dΦ(λ)/dλ = b + 2cλ + 3dλ² = 0
yields
(9.95) λ∗ = [−c ± √(c² − 3bd)] / (3d)
One of the two values corresponds to the minimum. The sufficiency condition for a minimum requires
d²Φ(λ∗)/dλ² = 2c + 6dλ∗ > 0
The fact that dΦ/dλ has opposite signs at λ = 0 and λ = β ensures that equation (9.94) does not have imaginary roots.
Algorithm
INITIALIZE: z(k) , s(k) , h
Step 1: Find β
β=h
WHILE [dΦ(β)/dλ < 0]
β = 2β
END WHILE
Step 2: Solve for a, b, c and d using z(k) , s(k) and β
Step 3: Find λ∗ using sufficient condition for optimality
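A Python sketch of this cubic interpolation line search, following the three steps of the algorithm, is given below. The bracketing increment h, the quartic test function and the helper names are illustrative choices, not part of the notes.

import numpy as np

def cubic_line_search(Phi, grad, z, s, h=0.1):
    # Fit Phi(z - lam*s) over [0, beta] by a cubic a + b*lam + c*lam^2 + d*lam^3
    # and return the minimizing step length lam* (equation (9.95)).
    phi  = lambda lam: Phi(z - lam * s)
    dphi = lambda lam: -s @ grad(z - lam * s)
    # Step 1: bracket the minimum by doubling beta until the slope becomes non-negative
    beta = h
    while dphi(beta) < 0.0:
        beta *= 2.0
    # Step 2: a and b follow from the values at lam = 0; solve a 2x2 system for c and d
    a, b = phi(0.0), dphi(0.0)
    M   = np.array([[beta**2, beta**3], [2.0 * beta, 3.0 * beta**2]])
    rhs = np.array([phi(beta) - a - beta * b, dphi(beta) - b])
    c, d = np.linalg.solve(M, rhs)
    # Step 3: stationary points of the cubic; keep the root with positive curvature
    roots = [(-c + np.sqrt(c**2 - 3.0 * b * d)) / (3.0 * d),
             (-c - np.sqrt(c**2 - 3.0 * b * d)) / (3.0 * d)]
    return min((lam for lam in roots if 2.0 * c + 6.0 * d * lam > 0.0), key=phi)

# Illustrative use along the gradient direction of a quartic objective
Phi  = lambda z: z[0]**4 + z[1]**2
grad = lambda z: np.array([4.0 * z[0]**3, 2.0 * z[1]])
z0 = np.array([1.0, 1.0])
print(cubic_line_search(Phi, grad, z0, grad(z0)))   # approximate minimizing step length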
(10.1) Ax = b ; x, b ∈ Rⁿ
(10.3) Aᵀ(Ax − b) = 0
Or
(10.8) F(x) = 0 ; x ∈ Rⁿ
x = [x₁ x₂ ... xₙ]ᵀ
(10.10) Φ(x) = [F(x)]ᵀ F(x) = [f₁(x)]² + [f₂(x)]² + .... + [fₙ(x)]²
(10.13) F (x) = 0
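One way to exploit the formulation (10.10) numerically is sketched below in Python: a root of F(x) = 0 is sought by driving Φ(x) = F(x)ᵀF(x) towards zero. The damped Gauss-Newton step used here, (JᵀJ)s = −JᵀF, is one standard choice and is not spelled out in the notes; the example system and all names are illustrative.

import numpy as np

def solve_by_least_squares(F, J, x0, eps=1e-10, kmax=100):
    # Solve F(x) = 0 by minimizing Phi(x) = F(x)'F(x), as in equation (10.10).
    # The minimization uses damped Gauss-Newton steps: (J'J) s = -J'F.
    x = np.asarray(x0, dtype=float)
    for k in range(kmax):
        Fx, Jx = F(x), J(x)
        Phi = Fx @ Fx
        if Phi <= eps:
            break
        s = np.linalg.solve(Jx.T @ Jx, -Jx.T @ Fx)
        lam = 1.0
        while F(x + lam * s) @ F(x + lam * s) > Phi and lam > 1e-12:
            lam *= 0.5                   # damp the step until Phi decreases
        x = x + lam * s
    return x

# Illustrative system (not from the notes): x1^2 + x2^2 = 5 and x1*x2 = 2
F = lambda x: np.array([x[0]**2 + x[1]**2 - 5.0, x[0] * x[1] - 2.0])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]])
print(solve_by_least_squares(F, J, [1.0, 0.5]))   # approaches a root such as [2, 1]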
10.3. Finite Element Method for Solving ODE-BVP and PDEs [15, 16]. The finite element method is a powerful tool for solving PDEs, particularly when the system under consideration has complex geometry. This method is based on the optimization formulation. In this section, we provide a very brief introduction to the finite element method.
10.3.1. Rayleigh-Ritz method. Consider the linear system of equations
(10.14) Ax = b
which is precisely the equation we want to solve. Since the Hessian matrix
∂²Φ/∂x² = A
when u(z) is a non-zero vector in C(2)[0, 1]. In other words, solving the ODE-BVP is analogous to solving Ax = b by the optimization formulation, where A is a symmetric and positive definite matrix, i.e.
A ↔ −d²/dz² ; x ↔ u(z) ; b ↔ f(z)
Let u(z) = u∗(z) represent the true solution of the ODE-BVP. Now, taking motivation from the optimization formulation for solving Ax = b, we can formulate a minimization problem to compute the solution
(10.30) Φ[u(z)] = (1/2) ⟨u(z), −d²u/dz²⟩ − ⟨u(z), f(z)⟩
= (1/2) ∫_0^1 u(z)(−d²u/dz²) dz − ∫_0^1 u(z) f(z) dz
u∗(z) = min_{u(z)} Φ[u(z)]
(10.31) = min_{u(z)} [(1/2) ⟨u(z), Lu⟩ − ⟨u(z), f(z)⟩]
(10.33) ∫_0^1 u(z)(−d²u/dz²) dz = ∫_0^1 (du/dz)² dz − [u(du/dz)]_0^1
(10.35) Φ(u) = [1/2 ∫_0^1 (du/dz)² dz] − [∫_0^1 u f(z) dz]
The above equation is similar to an energy function, where the first term is analogous to kinetic energy and the second term is analogous to potential energy.
As the term ∫_0^1 (du/dz)² dz is always positive (the operator is symmetric and positive definite), we are guaranteed that a minimum exists. The main difficulty in performing the search is that, unlike the previous case where we were working in Rⁿ, the search space is infinite dimensional, as u(z) ∈ C(2)[0, 1]. One remedy to alleviate this difficulty is to reduce the infinite dimensional search problem to a finite dimensional one by constructing an approximate solution using trial functions. Let v(0)(z), ....., v(n)(z) represent the trial functions. The approximate solution is constructed as
û(z) = c₀ v(0)(z) + c₁ v(1)(z) + ..... + cₙ v(n)(z)
where v(i)(z) represents the i'th trial function. Using this approximation, we convert the infinite dimensional optimization problem to a finite dimensional optimization problem as follows
(10.37) min_c Φ̂(c) = [1/2 ∫_0^1 (dû/dz)² dz] − [∫_0^1 û f(z) dz]
(10.38) = 1/2 ∫_0^1 [c₀ dv(0)(z)/dz + ..... + cₙ dv(n)(z)/dz]² dz
(10.39) − ∫_0^1 f(z) [c₀ v(0)(z) + ..... + cₙ v(n)(z)] dz
The trial functions v(i)(z) are chosen in advance and the coefficients c₀, ....., cₙ are treated as unknowns. Then, the above optimization problem can be recast as
(10.40) min_c Φ̂(c) = min_c [1/2 cᵀAc − cᵀb]
(10.41) c = [c₀ c₁ ... cₙ]ᵀ
(10.42) (A)ij = ⟨dv(i)/dz, dv(j)/dz⟩ ; i, j = 0, 1, ....., n
(10.43) b = [⟨v(0)(z), f(z)⟩ ..... ⟨v(n)(z), f(z)⟩]ᵀ
It is easy to see that the matrix A is symmetric and positive definite, and the global minimum of the above optimization problem can be found by using the necessary condition for optimality as follows
(10.44) ∂Φ̂/∂c = Ac∗ − b = 0
or
(10.45) c∗ = A⁻¹b
Note the similarity of the above equation with the normal equation arising from the projection theorem. Thus, the steps in the Rayleigh-Ritz method can be summarized as follows
(1) Choose an approximate solution (i.e. the trial functions).
(2) Compute matrix A and vector b.
(3) Solve Ac = b.
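These three steps can be illustrated with a short Python sketch for the model problem −d²u/dz² = f(z) with u(0) = u(1) = 0, using sine trial functions. The choice of trial functions, the numerical quadrature and all names are illustrative assumptions, not part of the notes.

import numpy as np

def trapezoid(y, z):
    # simple trapezoidal quadrature on the grid z
    return float(np.sum((y[1:] + y[:-1]) * (z[1:] - z[:-1])) / 2.0)

def rayleigh_ritz(f, n_trial=5, n_quad=401):
    # Rayleigh-Ritz for -d2u/dz2 = f(z), u(0) = u(1) = 0, with v(i)(z) = sin(i*pi*z):
    # (1) choose trial functions, (2) build A and b, (3) solve Ac = b.
    z = np.linspace(0.0, 1.0, n_quad)
    A = np.zeros((n_trial, n_trial))
    b = np.zeros(n_trial)
    for i in range(1, n_trial + 1):
        for j in range(1, n_trial + 1):
            dvi = i * np.pi * np.cos(i * np.pi * z)
            dvj = j * np.pi * np.cos(j * np.pi * z)
            A[i - 1, j - 1] = trapezoid(dvi * dvj, z)           # (A)ij = <dv(i)/dz, dv(j)/dz>
        b[i - 1] = trapezoid(f(z) * np.sin(i * np.pi * z), z)   # b_i = <v(i), f>
    c = np.linalg.solve(A, b)
    return lambda zz: sum(c[i - 1] * np.sin(i * np.pi * zz) for i in range(1, n_trial + 1))

# Check with f(z) = pi^2 sin(pi z), whose exact solution is u(z) = sin(pi z)
u_hat = rayleigh_ritz(lambda z: np.pi**2 * np.sin(np.pi * z))
print(u_hat(0.5))   # close to 1.0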
Similar to the finite difference method, we begin by choosing (n − 1) equidistant internal node (grid) points as follows
zi = i ∆z (i = 0, 1, 2, ...., n) with ∆z = 1/n
On each segment, the solution is approximated by a linear polynomial
(10.47) ûi(z) = ai + bi z
(10.48) zi−1 ≤ z ≤ zi for i = 1, 2, ...., n
Figure 9
This implies
(10.51) ai = (ûi−1 zi − ûi zi−1)/∆z ; bi = (ûi − ûi−1)/∆z
Thus, the polynomial on the i'th segment can be written as
(10.52) ûi(z) = (ûi−1 zi − ûi zi−1)/∆z + [(ûi − ûi−1)/∆z] z
zi−1 ≤ z ≤ zi for i = 1, 2, ...., n
This allows us to express ûi(z) as
ûi(z) = ûi−1 Mi(z) + ûi Ni(z) ; i = 1, 2, ...., n
where Mi(z) = (zi − z)/∆z and Ni(z) = (z − zi−1)/∆z.
Figure 10
Note that the coefficient ûi appears in the polynomials ûi(z) and ûi+1(z), i.e.
ûi(z) = ûi−1 Mi(z) + ûi Ni(z)
ûi+1(z) = ûi Mi+1(z) + ûi+1 Ni+1(z)
Thus, we can define a continuous trial function by combining Ni(z) and Mi+1(z) as follows
(10.56) v(i)(z) =
  Ni(z) = (z − zi−1)/∆z        for zi−1 ≤ z ≤ zi
  Mi+1(z) = 1 − (z − zi)/∆z    for zi ≤ z ≤ zi+1
  0                            elsewhere
  (i = 1, 2, ...., n − 1)
The simplest and most widely used trial functions are these piecewise linear (hat) functions, shown in Figure 10. Each v(i)(z) is a continuous function of z, but it is not differentiable at zi−1, zi, and zi+1. Also, note that at the node points z = zj we have
(10.57) v(i)(zj) = 1 if i = j, and 0 otherwise
  (i = 1, ...., n − 1)
Thus, the plot of this function looks like a symmetric triangle. The two functions at the boundary points are defined as ramps
(10.58) v(0)(z) =
  M1(z) = 1 − z/∆z    for 0 ≤ z ≤ z1
  0                   elsewhere
(10.59) v(n)(z) =
  Nn(z) = (z − zn−1)/∆z    for zn−1 ≤ z ≤ zn
  0                        elsewhere
(10.61) Aû − b = 0
where
(10.62) (A)ij = ⟨dv(i)/dz, dv(j)/dz⟩
and
dv(i)/dz = 1/∆z on the interval to the left of zi, and −1/∆z on the interval to the right of zi
If the intervals do not overlap, then
(10.63) ⟨dv(i)/dz, dv(j)/dz⟩ = 0
The intervals overlap when
(10.64) i = j : ⟨dv(i)/dz, dv(i)/dz⟩ = ∫_{zi−1}^{zi} (1/∆z)² dz + ∫_{zi}^{zi+1} (−1/∆z)² dz = 2/∆z
or
(10.65) i = j + 1 : ⟨dv(i)/dz, dv(i−1)/dz⟩ = ∫_{zi−1}^{zi} (1/∆z)(−1/∆z) dz = −1/∆z
(10.66) i = j − 1 : ⟨dv(i)/dz, dv(i+1)/dz⟩ = ∫_{zi}^{zi+1} (1/∆z)(−1/∆z) dz = −1/∆z
(10.68) bi = ⟨v(i), f(z)⟩
(10.69) = ∫_{zi−1}^{zi} f(z) [(z − zi−1)/∆z] dz + ∫_{zi}^{zi+1} f(z) [1 − (z − zi)/∆z] dz
which is a weighted average of f(z) over the interval zi−1 ≤ z ≤ zi+1. Note that this right-hand side is different from that of the finite difference method.
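Putting the pieces together, the following Python sketch assembles the tridiagonal stiffness matrix (equations (10.64)-(10.66)) and an approximate load vector for −d²u/dz² = f(z), assuming homogeneous boundary conditions u(0) = u(1) = 0 so that only the interior hat functions appear. The one-point quadrature used for bᵢ and the function names are illustrative simplifications.

import numpy as np

def fem_1d(f, n):
    # Finite element (hat function) solution of -d2u/dz2 = f(z) on [0, 1]
    # with u(0) = u(1) = 0, using n-1 interior piecewise-linear trial functions.
    dz = 1.0 / n
    z = np.linspace(0.0, 1.0, n + 1)
    # Stiffness matrix: 2/dz on the diagonal, -1/dz on the off-diagonals
    A = (np.diag(2.0 * np.ones(n - 1))
         - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / dz
    # Load vector b_i = <v(i), f>, approximated here by f(z_i) * dz (one-point quadrature)
    b = f(z[1:n]) * dz
    u = np.zeros(n + 1)
    u[1:n] = np.linalg.solve(A, b)   # coefficients equal nodal values of the approximate solution
    return z, u

# Illustrative check: f(z) = pi^2 sin(pi z) has exact solution u(z) = sin(pi z)
f = lambda z: np.pi**2 * np.sin(np.pi * z)
z, u = fem_1d(f, 20)
print(np.max(np.abs(u - np.sin(np.pi * z))))   # small discretization error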
The Rayleigh-Ritz method can be easily applied to problems in higher dimensions when the operators are self-adjoint. Consider the Laplace / Poisson equation, −∂²u/∂x² − ∂²u/∂y² = f(x, y), for which the corresponding functional is
(10.73) Φ(u) = ∫∫ [1/2 (∂u/∂x)² + 1/2 (∂u/∂y)² − f u] dx dy
(10.74) = (1/2) ⟨∂u/∂x, ∂u/∂x⟩ + (1/2) ⟨∂u/∂y, ∂u/∂y⟩ − ⟨f(x, y), u(x, y)⟩
We begin by choosing (n − 1) × (n − 1) equidistant (with ∆x = ∆y = h) internal node (grid) points at (xi, yj), where
xi = ih (i = 1, 2, ...., n − 1)
yj = jh (j = 1, 2, ...., n − 1)
In two dimensions, the simplest element divides the region into triangles on which simple polynomials are fitted. For example, u(x, y) can be approximated as
(10.75) û(x, y) = a + bx + cy
(10.76) û(x, y) = ûi,j + [(ûi+1,j − ûi,j)/h] (x − xi,j) + [(ûi,j+1 − ûi,j)/h] (y − yi,j)
= ûi,j [1 − (x − xi,j)/h − (y − yi,j)/h]
(10.77) + ûi+1,j [(x − xi,j)/h] + ûi,j+1 [(y − yi,j)/h]
Now, the coefficient ûi,j appears in the shape functions of the four triangular elements around (xi, yj). Collecting these shape functions, we can define a two dimensional trial function as follows
(10.78) v(i,j)(x, y) =
  1 − (x − xi,j)/h − (y − yi,j)/h    for xi ≤ x ≤ xi+1, yj ≤ y ≤ yj+1
  1 + (x − xi,j)/h − (y − yi,j)/h    for xi−1 ≤ x ≤ xi, yj ≤ y ≤ yj+1
  1 − (x − xi,j)/h + (y − yi,j)/h    for xi ≤ x ≤ xi+1, yj−1 ≤ y ≤ yj
  1 + (x − xi,j)/h + (y − yi,j)/h    for xi−1 ≤ x ≤ xi, yj−1 ≤ y ≤ yj
  0                                  elsewhere
The shape of this trial function is like a pyramid (see Figure 11). We can define trial functions at the boundary points in a similar manner. Thus, expressing the approximate solution using the trial functions and using the fact that û(x, y) = 0 at the boundary points, we get
û(x, y) = Σ_i Σ_j ûi,j v(i,j)(x, y)
where v(i,j)(x, y) represents the (i, j)'th trial function.
Figure 11
For the sake of convenience, let us re-number these trial functions and coefficients using a new index l defined as
l = i + (n − 1) j ; i = 1, ..., n − 1 and j = 0, 1, ..., n − 1
N = (n − 1) × (n − 1)
so that
(10.79) û(x, y) = û0 v(0)(x, y) + .... + ûN v(N)(x, y)
where
(10.81) û = [û0 û1 ... ûN]ᵀ
The resulting finite dimensional optimization problem is
min_û Φ(û) = min_û [1/2 ûᵀAû − ûᵀb]
where
(10.82) (A)ij = (1/2) ⟨∂v(i)/∂x, ∂v(j)/∂x⟩ + (1/2) ⟨∂v(i)/∂y, ∂v(j)/∂y⟩
(10.83) bi = ⟨f(x, y), v(i)(x, y)⟩
Again, the matrix A is a symmetric and positive definite matrix, and this guarantees that the stationary point of Φ(û) is the minimum. At the minimum, we have
(10.90) Aû = b
where
(10.91) (A)ij = ⟨v(i), L(v(j))⟩ ; i, j = 0, 1, ....., n
b = [⟨v(0)(z), f(z)⟩ ..... ⟨v(n)(z), f(z)⟩]ᵀ
(10.96) dC/dz = Pe (C − 1) at z = 0 ;
(10.97) dC/dz = 0 at z = 1 ;
The approximate solution is chosen as
(10.98) Ĉ(z) = Ĉ0 v(0)(z) + ...... + Ĉn v(n)(z) = Σ_{i=0}^{n} Ĉi v(i)(z)
The unknown coefficients are determined by substituting this approximate solution into the problem formulation and using the boundary conditions. This yields nonlinear algebraic equations, which have to be solved simultaneously to compute the unknown coefficients Ĉ0, ....., Ĉn.
11. Summary
In these lecture notes, we have presented various numerical schemes based on the multivariate unconstrained optimization formulation. One of the major applications of unconstrained optimization is function approximation or multivariate regression. Thus, we begin by providing a detailed description of the model parameter estimation problem. We then derive the necessary and sufficient conditions for optimality for a general multivariable unconstrained optimization problem. If the model has some nice structure, such as being linear in the parameters, then the parameter estimation problem can be solved analytically. Thus, the linear model parameter estimation (linear regression) problem is treated elaborately. Geometric and statistical properties of the linear least squares problem are discussed in detail to provide further insights into this formulation. Numerical methods for estimating parameters of nonlinear-in-parameter models are presented next. Other applications of the optimization formulation, such as solving nonlinear algebraic equations and the finite element method for solving PDEs and ODE-BVPs, are discussed at the end.
12. Exercise
(1) Square the matrix P = aaᵀ/aᵀa, which projects onto a line, and show that P² = P. Is a projection matrix invertible?
(2) Compute a matrix that projects every point in the plane onto line
x + 2y = 0.
(3) Solve Ax = b by least squares and find p = Ax if
A = [1 0 ; 0 1 ; 1 1] ; b = [1 1 0]ᵀ
b = C + Dt + e
The choice of C and D which minimizes Σ eᵢ² comes from the least squares normal equations:
[ n     Σtᵢ  ] [C]   [ Σbᵢ   ]
[ Σtᵢ   Σtᵢ² ] [D] = [ Σbᵢtᵢ ]
(12) Use all of the data given in the following table to fit the following two-dimensional models for the diffusion coefficient D as a function of temperature (T) and weight fraction (X).
T (°C)            20    20    20    25    25    25    30    30    30
X                 0.3   0.4   0.5   0.3   0.4   0.5   0.3   0.4   0.5
D × 10⁵ (cm²/s)   0.823 0.639 0.43  0.973 0.751 0.506 1.032 0.824 0.561
Model 1 : D = c₁ + c₂T + c₃X
Model 2 : D = c₁ + c₂T + c₃X + c₄T² + c₅TX + c₆X²
In each case, calculate D at T = 22, X = 0.36 and compare the two results.
(13) The weighted least squares method is to be used with the following linearizable model
(16) Solve the following equations using the steepest descent method and Newton's optimization method
2x1 + x2 = 4; x1 + 2x2 + x3 = 8; x2 + 3x3 = 11
by taking the starting point as x = [0 0 0]T .
(17) In optimization algorithms, instead of performing 1-D minimization
to find step length λ(k) at each iteration, an alternate approach is to
choose λ(k) such that
f [x(k+1) ] = f [x(k) + λ(k) s(k) ] < f [x(k) ]
Develop an algorithm to choose λ(k) at each iteration step by the
latter approach.
Bibliography