
Numerical Methods

CEGN-2073

Eyaya B

26th March 2019


Contents

1 Basic Concepts in Error Estimation 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Errors and approximations in computations . . . . . . . . . . . . . . . . 2
1.2.1 Sources of Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Measure of Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 Approximation of Errors . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Types of Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 Truncation Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 Round-off Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.3 Propagation Errors . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 General Error Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Inverse problems of the theory of errors . . . . . . . . . . . . . . . . . . . 18

2 Nonlinear Equations 21
2.1 Locating Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Bisection Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.1 Number of Iterations Needed in the Bisection Method to Achieve
Certain Accuracy: . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 False Position (Regular-falsi) Method . . . . . . . . . . . . . . . . . . . . 29
2.4 Fixed-point Iteration Method . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5 Newton-Raphson Method . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.6 Secant Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 System of Equations 50
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2 Direct Methods for System of Linear Equations (SLE) . . . . . . . . . . . 51
3.2.1 Gaussian Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.2 Gaussian method with partial pivoting . . . . . . . . . . . . . . . 64
3.2.3 Gauss Jordan Method . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2.4 Gauss Jordan Method for matrix inversion . . . . . . . . . . . . . 73

3.3.2 Gauss Seidel method . . . . . . . . . . . . . . . . . . . . . . . . . 98


3.4 Systems of non-linear equations using Newton’s method . . . . . . . . . . 100
3.5 Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.6 Basic Properties of eigenvalues and eigenvectors . . . . . . . . . . . . . . 102
3.6.1 Power method for finding dominant eigenvalues . . . . . . . . . . 103
3.6.2 Approximated dominant eigenvalue . . . . . . . . . . . . . . . . . 104
3.6.3 Inverse power method . . . . . . . . . . . . . . . . . . . . . . . . 107

4 Interpolation 109
4.1 Finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.1.1 Shift operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.1.2 Averaging Operator . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.1.3 Differential Operator . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.1.4 Forward difference operator . . . . . . . . . . . . . . . . . . . . . 112
4.1.5 Backward difference operator . . . . . . . . . . . . . . . . . . . . 116
4.1.6 Central difference operator . . . . . . . . . . . . . . . . . . . . . . 117
4.1.7 Relations between operators . . . . . . . . . . . . . . . . . . . . . 119
4.2 Interpolations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.2.1 Linear interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.2.2 Quadratic interpolation . . . . . . . . . . . . . . . . . . . . . . . . 125
4.3 Lagrange’s interpolation formula . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Divided difference formula . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.5 The Newton-Gregory Interpolation Formulae (with equidistant data points) . . . 134
4.5.1 The Newton-Gregory Forward Interpolation Formula . . . . . . . 135
4.5.2 The Newton-Gregory Backward Interpolation Formula . . . . . . 139
4.6 Error in Polynomial Interpolation . . . . . . . . . . . . . . . . . . . . . . 142
4.6.1 Spline Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.6.2 Cubic Spline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

5 Numerical Differentiation and Integration 151


5.1 Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.1.1 Differentiation based on Newton’s forward interpolation formula . 152
5.1.2 Differentiation based on Newton’s backward interpolation formula 155
5.2 Integration (Trapezoidal and Simpson’s rule) . . . . . . . . . . . . . . . . 159
5.2.1 Trapeziodal Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.2.2 Simpson’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

6 Least Squares Method 170


6.1 Discrete Least Squares Approximation . . . . . . . . . . . . . . . . . . . 170


6.1.1 Linear Least-Squares . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.1.2 Non-linear least-squares (polynomial and exponential) Approxima-
tions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.2 Continuous Least-Squares Approximation . . . . . . . . . . . . . . . . . . 180
6.2.1 Approximation by Polynomials . . . . . . . . . . . . . . . . . . . 180

7 Numerical methods for ODEs 187


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
7.2 Initial Value problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7.2.1 Euler’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
7.2.2 Other Examples of One-Step Method . . . . . . . . . . . . . . . . 190
7.2.3 Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . . . . 197



Chapter 1
Basic Concepts in Error Estimation

1.1 Introduction

Many problems in science and engineering cannot be solved analytically, and as a result numerical
solutions are often required. Numerical solutions provide only approximate answers, and they are
not unique: different numerical algorithms might yield different approximations.

Definition 1.1 Numerical Analysis

Numerical Analysis deals with the design and analysis of numerical algorithms that operate
on continuous or discrete quantities, and it considers and analyzes the effects of the
approximations involved.

Definition 1.2 An Error


An error is how far an answer or computed value is from the true value.

The reliability of a numerical result depends on an error estimate or bound; therefore, the
analysis of errors and their sources in numerical methods is a critically important part of
the study of numerical techniques.

In general, solving a computational problem usually involves the following steps:

• Mathematical modeling: In general, models:

– are abstractions of reality!

– are representations of a particular thing, idea or condition.

Mathematical Models: are simplified representations of some real-world entity that
can be expressed in mathematical equations or computer code. A mathematical
model can be broadly defined as a formulation or equation that expresses the essential
features of a physical system or process in mathematical terms. In a very general
sense, it can be represented as a functional relationship of the form

Dvar = f (Indvars, Params, ffuncs) (1.1)

where Dvar (dependent variable) is a characteristic that typically reflects the
behavior or state of the system;
Indvars (independent variables) are usually dimensions, such as time and space,
along which the system's behavior is being determined;
Params (parameters) are reflective of the system's properties or composition; and
ffuncs (forcing functions) are external influences acting upon it.
The actual mathematical expression of Equation (1.1) can range from a simple
algebraic relationship to large, complicated sets of differential equations.

• Algorithm design: involves

– Building an algorithm to solve the mathematical problem formulation and


– Analyzing the algorithm for its performance.

• Implementation and Evaluation: involves

– Programming an implementation of the algorithm and


– Evaluating its performance with real data.

1.2 Errors and approximations in computations

When we use numerical methods or algorithms and compute with finite precision, errors
of approximation, rounding, and truncation are introduced. It is important to have a
notion of their nature and their order. A newly developed method is worthless without
an error analysis. Neither does it make sense to use methods which introduce errors with
magnitudes larger than the estimated error bound. On the other hand, using a method
with very high accuracy might be computationally too expensive to justify the gain in
accuracy.

Before we discuss these sources of errors in detail, let us first define what an error
is and how errors are measured in a numerical calculation.

1.2.1 Sources of Errors

Error in solving an engineering or science problem can arise due to several factors. A
paramount goal in numerical analysis is to assess the accuracy of the results of calculations.
Errors contained in the numerical answers to problems generally arise in two areas:

Type-I Those inherent in the mathematical formulation of the problems, such as

(a) The error incurred when the mathematical statement of a problem is only an
approximation to the physical situation;
(b) The error due to inaccuracies in the physical data.

Type-II Those incurred in the numerical computation process, such as

(a) Programming blunder;


(b) Truncation error, i.e., inexact evaluation of mathematical operators;
(c) Roundoff errors, i.e., inexact arithmetic calculations.
(d) Propagation errors,i.e, errors can be amplified as they propagate through an
iterative computation.

Type-I errors are beyond the control of the calculation and are usually negligible. It is
understood, however, that the worth of a computed solution must be carefully weighed
against these errors. A programming blunder, which results in the correct calculation of
the wrong result, can usually be detected by verification. It is the last three sources of
computational error that chiefly interest us and that should be controlled by any feasible
numerical method.

1.2.2 Measure of Errors

An error can be measured in three different ways:

Definition 1.3 Absolute Error


Absolute error, denoted by ε_abs, is the numerical difference between the computed
(or estimated) value of a quantity and the true value. Usually, the absolute error is
defined in terms of the magnitude of the difference between the true value and the
approximate value:

    ε_abs = |x − x_a|,    (1.2)

where x denotes the exact value and x_a denotes the approximate value. The absolute
error is expressed in the same units as the exact and approximate values.

The absolute error alone does not always give a meaningful measure of an error. For instance, a 0.1
pound absolute error is a very small error when measuring a person's weight, but the same
error can be disastrous when measuring the dosage of a medicine. On the other
hand, the relative error and percentage error defined below give a better measure of an
error.

Definition 1.4 Relative Error


Relative error, ε_rel, is the absolute error divided by the true value:

    ε_rel = |x − x_a| / |x| = ε_abs / |x|.    (1.3)

The relative error is dimensionless, i.e., it is independent of units.

Definition 1.5 Percentage Error

The percentage error, ε_p, committed in approximating the true value x by x_a is
given by

    ε_p = ε_rel × 100% = (|x − x_a| / |x|) × 100%.    (1.4)


Example 1.1

Consider x = 0.33 and fl(x) = 0.30. Clearly,

    ε_abs = 0.03    and    ε_rel = 0.03 / 0.33 = 0.09091 ≈ 9.1%.

When x = 0.33 × 10^(-5) and fl(x) = 0.30 × 10^(-5), we have

    ε_abs = 0.03 × 10^(-5) = 3 × 10^(-7),    but    ε_rel = 0.09091 ≈ 9.1%.

Note that the relative error is unchanged, while the absolute error changed by a
factor of 10^5.

Therefore, from the above observation we can conclude the following:

• The absolute error is strongly dependent on the magnitude of x.

• The absolute error is misleading unless it is stated what it is an error of.

• The relative error is a measure of the number of significant digits of x that are
correct; we will discuss this in detail in Section 1.3.2.

• A relative error has meaning even when x is not known. It is given as a percentage
value.

Example 1.2

Three approximate values of the number 1/3 are given as 0.30, 0.33, and 0.34. Which
of these three approximations is the best?
Solution: The number with the least absolute error gives the best approximation:

    E_a1 = |1/3 − 0.30| = 1/30,
    E_a2 = |1/3 − 0.33| = 1/300,
    E_a3 = |1/3 − 0.34| = 1/150.

Therefore, since 1/300 is the smallest of all the absolute errors, 0.33 is the best
approximation for 1/3.

Approximate Numbers

• Exact Number: a number with which no uncertainty is associated and for which no
approximation is taken. Example: 5, 1/2, 21/6.

• Approximate Number: there are numbers which are not exact. For example,
e = 2.7182··· , √2 = 1.41421··· , etc. They contain infinitely many non-recurring
digits. Therefore, the numbers obtained by retaining a few digits are called
approximate numbers.
Example: the approximate numbers e ≈ 2.718 and π ≈ 3.142.

• Significant digits (figures): are the digits used to express the number. The digits
1, 2, 3, · · · , 9 are significant digits, and 0 is also a significant digit except when it
is used to fix the decimal point or to fill the places of discarded digits.
Example: 5879 and 0.4762 contain four significant digits, 0.00486 and 0.000382
contain three significant digits, and 2.0682 contains five significant digits.

Example 1.3

Find the absolute, relative and percentage errors if x = 0.005998 is rounded off to three
decimal digits.
Solution: If x is rounded off to three decimal places we get x_a = 0.006. Therefore,

    Error = True value − Approximate value = 0.005998 − 0.006 = −0.000002.

Therefore,

    Absolute Error E_a = |Error| = |−0.000002| = 0.000002,

    Relative Error E_r = |Error| / |True value| = 0.000002 / 0.005998 = 0.0003334,    and

    Percentage Error E_p = E_r × 100% = 0.0003334 × 100% = 0.03334%.

1.2.3 Approximation of Errors

Oftentimes the true value is unknown to us, which is usually the case in numerical
computing. In this case we will have to quantify errors using approximate values only.
For example, when an iterative method is used we get an approximate value at the end
of each iteration. The approximate error (Ea ) is defined as the difference between the
current (present) approximate value and the previous approximation. In general,

approximate error (Ea ) = present approximation − previous approximation.

Similarly we can calculate the relative approximate error (Er ) by dividing the approximate
error by the present approximate value.

approximate error
relative approximate error(Er ) = .
present approximation

Assume our iterative method yields a better approximation as the iteration carries on.
Oftentimes we can set an acceptable tolerance to stop the iteration when the relative
approximate error is small enough. We often set the tolerance in terms of the number of
significant digits - the number of digits that carry meaningful contribution to its precision.
It corresponds to the number of digits in the scientific notation to represent a number’s
significand or mantissa.

A practical stopping rule is as follows: if the absolute relative approximate error is
less than or equal to a predefined tolerance, then the acceptable error has been reached
and no more iterations are required.
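As an illustration, a minimal Python sketch of this stopping rule is given below. The helper names and the square-root update used for the demonstration are our own assumptions, not part of the notes.

import math

def iterate_until_tolerance(update, x0, tol=1e-6, max_iter=100):
    """Repeat x_new = update(x_old) until |x_new - x_old| / |x_new| <= tol."""
    x_old = x0
    for k in range(1, max_iter + 1):
        x_new = update(x_old)
        approx_error = x_new - x_old                   # present - previous approximation
        rel_approx_error = abs(approx_error / x_new)   # relative approximate error
        if rel_approx_error <= tol:
            return x_new, k
        x_old = x_new
    return x_old, max_iter

# Example: Heron's iteration for sqrt(2); stops once the relative
# approximate error is at most 1e-6.
root, iterations = iterate_until_tolerance(lambda x: 0.5 * (x + 2.0 / x), x0=1.0)
print(root, iterations, abs(root - math.sqrt(2.0)))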


1.3 Types of Errors

1.3.1 Truncation Errors

Truncation error refers to the error in a method which occurs because some series (finite
or infinite) is truncated to a smaller number of terms. Examples of this include the compu-
tation of a definite integral through approximation by a sum or the numerical integration
of an ordinary differential equation by some finite difference method. Such errors are
essentially algorithmic errors and we can predict the extent of the error that will occur
in the method. Simply,

Definition 1.6 Truncation Error


Truncation error is defined as the error caused by truncating a mathematical for-
mula or procedure.

Example 1.4

For example, the Maclaurin series for ex is given as

    e^x = 1 + x + x^2/2! + x^3/3! + ···

This series has an infinite number of terms, but when using this series to calculate
e^x only a finite number of terms can be used. For example, if one uses three terms
to calculate e^x, then

    e^x ≈ 1 + x + x^2/2!.

Thus, the truncation error for such an approximation is

    Truncation Error (E_T) = e^x − (1 + x + x^2/2!) = x^3/3! + x^4/4! + ···
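A short sketch (not from the original notes) that exposes this truncation error numerically by comparing the three-term sum with math.exp:

import math

def exp_truncated(x, n_terms=3):
    """Sum the first n_terms of the Maclaurin series 1 + x + x^2/2! + ..."""
    return sum(x**k / math.factorial(k) for k in range(n_terms))

x = 0.5
approx = exp_truncated(x, n_terms=3)        # 1 + x + x^2/2!
truncation_error = math.exp(x) - approx     # the neglected terms x^3/3! + x^4/4! + ...
print(approx, truncation_error)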


1.3.2 Round-off Errors

When a calculator or digital computer is used to perform numerical calculations, an


unavoidable error, called round-off error, must be considered. This error arises due to the
fact that floating point numbers are represented by finite precision on a computing device
and it occurs because of the computing device’s inability to deal with certain numbers.
Thus, calculations are performed with approximate representations of the actual numbers
which results in a round-off error. Let’s see the result of round-off error in a real-world
problem.

Example 1.5

Problems created by round off error

• 28 Americans were killed on February 25, 1991 by an Iraqi Scud missile in


Dhahran, Saudi Arabia.

• The patriot defense system failed to track and intercept the Scud. Why?

Problem with Patriot missile

• The Patriot defense system consists of an electronic detection device called


the range gate. It calculates the area in the air space where it should look
for a Scud. To find out where it should aim next, it calculates the velocity
of the Scud and the last time the radar detected the Scud. Time is saved in
a register that has 24 bits length. Since the internal clock of the system is
measured for every one-tenth of a second, 1/10 is expressed in a 24 bit-register
as 0.00011001100110011001100. However, this is not an exact representation.
In fact, it would need infinite numbers of bits to represent 1/10 exactly. So,
the error in the representation, in decimal format, is

    1/10 − (0 × 2^(-1) + 0 × 2^(-2) + 0 × 2^(-3) + 1 × 2^(-4) + ··· + 1 × 2^(-22) + 0 × 2^(-23) + 0 × 2^(-24)) ≈ 9.5 × 10^(-8) s.

The battery was on for 100 consecutive hours, hence causing an inaccuracy of

    (9.5 × 10^(-8) s / 0.1 s) × 100 hr × (3600 s / 1 hr) = 0.342 s.

• The shift in the range gate due to this 0.342 s error was calculated as 687 m.
For the Patriot missile defense system, the target is considered out of range
if the shift is more than 137 m.

For example, a number like 1/3 may be represented as 0.33333 on a computing device.
The round-off error in this case is 1/3 − 0.33333 = 0.0000033··· . There are also other
numbers that cannot be represented exactly on a computing machine; for example, π and
√2 are numbers that need to be approximated in computer calculations.

There are two major approaches in approximating the actual number in a computer:
chopping and rounding:

• When chopping a number to a specified number of decimal places, say n, the first
n digits of the mantissa are retained, simply chopping off the remaining digits.

• When rounding a number, the computer chooses the closest number that is rep-
resentable by the computer.

A natural question one may ask is what error is committed when a number is chopped
or rounded to n digits? Consider the number

    x = 0.d_1 d_2 ··· d_n d_{n+1} ··· ,    (1.5)

then chopping to n digits produces the number

    fl(x) = 0.d_1 d_2 ··· d_n,    (1.6)

with an error

    Error = x − fl(x) = 0.d_{n+1} d_{n+2} ··· × 10^(-n) ≤ 0.99999··· × 10^(-n) ≤ 10^(-n).    (1.7)
Does rounding x to n digits increase or decrease this error? We now show that the error will
decrease. Once again, consider

x = 0.d1 d2 · · · dn dn+1 · · · , (1.8)


If d_{n+1} < 5, chop x to n digits. Then the error is

    Error = x − fl(x) = 0.d_{n+1} d_{n+2} ··· × 10^(-n) ≤ 0.49999··· × 10^(-n) < (1/2) × 10^(-n).    (1.9)

When 5 < d_{n+1} ≤ 9, add 0.5 × 10^(-n) and chop the result. If x* = x + 0.5 × 10^(-n), then

    round(x) = chop(x*) = 0.d_1 d_2 ··· d*_n,    (1.10)

where d*_n = d_n + 1. The error is therefore

    |x − round(x)| = |0.d_1 d_2 ··· d*_n − 0.d_1 d_2 ··· d_n d_{n+1} ··· |
                   = |d*_n − d_n − 0.d_{n+1} d_{n+2} ··· | × 10^(-n)
                   = |1 − 0.d_{n+1} d_{n+2} ··· | × 10^(-n)    (1.11)
                   < (1/2) × 10^(-n).

The last inequality follows since 5 < d_{n+1} ≤ 9.

When d_{n+1} = 5, increase d_n by one if it is odd; otherwise, leave it unchanged, and then
apply chopping.

Example: If the number x = 11.675 is rounded off to two decimal places, then we get
x_a = 11.68. Likewise, if we round off the number x = 11.685 to two decimal places, we
get x_a = 11.68.

In summary, the error committed by rounding a normalized number x to n digits (in
base 10) is at most (1/2) × 10^(-n), which is half the maximum error committed by chopping.
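The following is an illustrative sketch (not from the notes) of chopping versus rounding a positive number to n decimal digits; the helper names are assumptions for this example only.

import math

def chop(x, n):
    """Keep the first n decimal digits and discard the rest."""
    factor = 10**n
    return math.floor(x * factor) / factor

def round_n(x, n):
    """Round to the nearest number with n decimal digits."""
    factor = 10**n
    return math.floor(x * factor + 0.5) / factor

x = 2 / 3
print(chop(x, 4), abs(x - chop(x, 4)))        # error can be close to 10^-4
print(round_n(x, 4), abs(x - round_n(x, 4)))  # error at most 0.5 * 10^-4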

Definition 1.7 True relative error


The formula for the relative error,

    (x − fl(x)) / x = −ε_rel,    (1.12)

is usually written as

    fl(x) = (1 + ε_rel) x.


True Relative Error and Significant digits

If the number x is rounded to n significant digits, then ε_abs = (1/2) × 10^(-n).

Example: Suppose x = 0.51 is correct to two decimal places. Then E_a = 0.005 and the
relative accuracy is given by

    E_p = E_r × 100% = (0.005 / 0.51) × 100% = 0.98%.

Our intuition tells us that the more accurate a number is, the more digits (that follow the
decimal point) will be correct, once the number is expressed in normalized format. To
make this statement more precise, we relate the number of correct digits to the relative
error.

Definition 1.8
We say p* approximates p to t significant digits if t is the largest non-negative integer
such that

    E_r = |p − p*| / |p| ≤ 5 × 10^(-t).

Taking log10 of both sides results in

    log10(E_r) ≤ log10(5 × 10^(-t)) = log10(5) − t,

so that

    t ≤ log10(5 / E_r).

A related statement is that if x and y are normalized floating-point machine numbers
such that x > y > 0 and 1 − y/x ≥ 2^(-k), then at most k significant binary bits are lost
in the subtraction x − y.

Example 1.6

Given a relative error Er = 0.5, how many significant digits do we have?


Solution:

    t ≤ log10(5 / (5 × 10^(-1))) = log10(10) = 1,

thus we have one significant digit.

Given a relative error E_r = 0.1,

    t ≤ log10(5 / 10^(-1)) = log10(50) ≈ 1.7,

so that 1 ≤ t < 2 and hence we again have one significant digit.

Example 1.7

Consider x = 3.29 and f l(x) = 3.2. How accurate is the approximation f l(x)?
Solution: The true relative error is

    E_r = (3.29 − 3.2) / 3.29 = (9 × 10^(-2)) / 3.29 ≈ 3 × 10^(-2),

so that

    t ≤ log10(5 / (3 × 10^(-2))) = 2 + log10(5/3),

which implies that t lies between 2 and 3. Thus, there are 2 significant digits.

The number of significant digits is only weakly dependent on the value of the relative
error mantissa. For example, as long as

    E_r ∈ [0.5 × 10^(-2), 5.0 × 10^(-2)],    (1.13)

there are only two significant digits.
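A quick sketch (our own addition, not from the notes) of this rule: the number of significant digits t is the largest integer with E_r ≤ 5 × 10^(-t), i.e. t = floor(log10(5 / E_r)).

import math

def significant_digits(p, p_star):
    """Estimate the number of significant digits of agreement between p and p_star."""
    rel_error = abs(p - p_star) / abs(p)
    return math.floor(math.log10(5.0 / rel_error))

print(significant_digits(3.29, 3.2))   # 2, as in Example 1.7
print(significant_digits(0.33, 0.30))  # 1, matching a ~9% relative error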

Example 1.8

Another way to pose the question is given x = 3.2, what is the worst possible
approximation to x which is accurate to 2 significant digits?
Solution: Let the approximation be x*. Therefore

    2 = t = log10(5 / |(3.2 − x*) / 3.2|).

Taking the antilogarithm of both sides (10 to the power),

    100 = 5 / |(3.2 − x*) / 3.2|.

Rearranging terms,

    20 / 3.2 = |3.2 − x*|^(-1),

or

    |3.2 − x*| ≈ 1 / 6.25 = 0.16.

The worst approximation to x while retaining 2 significant digits is therefore x* =
3.04 or x* = 3.36.

Example 1.9

Given the solution of a problem as xa = 35.25 with the relative error in the solution
at most 0.02. Find, to four decimal digits, the range of values within which the
exact value of the solution must lie.
Solution: If x_t is the exact value of the solution, then according to the given
information we have

    E_r = E_a / |x_t| = |x_t − x_a| / |x_t| = |(x_t − x_a) / x_t| < 0.02
    ⟹ |1 − x_a / x_t| < 0.02
    ⟹ −0.02 < 1 − x_a / x_t < 0.02
    ⟹ x_a / x_t < 1.02  and  0.98 < x_a / x_t
    ⟹ x_a / 1.02 < x_t  and  x_t < x_a / 0.98
    ⟹ x_a / 1.02 < x_t < x_a / 0.98.

Now, substituting the approximate value x_a = 35.25 we get

    35.25 / 1.02 < x_t < 35.25 / 0.98
    ⟹ 34.55882353 < x_t < 35.96938775.

Hence, correct to 4 decimal digits, the range of values within which the exact value
of the solution lies, is 34.5588 < xt < 35.9694.


The last two examples demonstrate that the concept of significant digits is not
simply a matter of counting the number of digits which are correct.

1.3.3 Propagation Errors

Error analysis of algorithms generally assumes perfect precision, i.e., no round-off error.
However, round-off error is present and is worth keeping in mind, especially if you are doing many
sequential calculations where the output from one is the input to another. In this way,
errors can be propagated and your final answer can be garbage.

If a calculation is made with numbers that are not exact, then the calculation itself
will have an error. How do the errors in each individual number propagate through the
calculations? Let’s look at the concept via some examples.

Example 1.10

Find the bounds for the propagation error in adding two numbers. For example if
one is calculating x + y where

x = 1.5 ± 0.05, y = 3.4 ± 0.04.

Solution: By looking at the numbers, the maximum possible value of x and y are
x = 1.55 and y = 3.44. Hence

x + y = 1.55 + 3.44 = 4.99

is the maximum value of x + y.


The minimum possible value of x and y are x = 1.45 and y = 3.36. Hence

x + y = 1.45 + 3.36 = 4.81

is the minimum value of x + y. Therefore

4.81 ≤ x + y ≤ 4.99.

Here, we can see that the nominal sum of the two numbers is 4.9, with a maximum absolute
error of 0.09, which is greater than the errors in x and y.
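A minimal sketch (our own, not from the notes) of the interval reasoning used in Example 1.10: propagate the worst-case bounds of x and y through the sum x + y.

def add_with_bounds(x, dx, y, dy):
    """Return (lowest, nominal, highest) possible values of x + y."""
    low = (x - dx) + (y - dy)
    high = (x + dx) + (y + dy)
    return low, x + y, high

low, nominal, high = add_with_bounds(1.5, 0.05, 3.4, 0.04)
print(low, nominal, high)   # 4.81  4.9  4.99
print(high - nominal)       # 0.09, the propagated absolute error bound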


Example 1.11

Suppose a real number x is represented by the machine as x∗ where x∗ = x(1 + δ)


and δ (the initial relative error) is small. Suppose we want to calculate x2 . In the
machine we will get (x∗ )2 = x2 (1 + δ)2 . Now we get an error

    E = (x*)^2 − x^2 = x^2 [(1 + δ)^2 − 1] ≈ x^2 (2δ),    since δ^2 ≪ 1,

which can be very big, especially if x is big. If we look at the relative error E/x^2, we still
get a relative error of 2δ. Notice that the relative error doubled. This is an example
of an error being propagated.

1.4 General Error Formula

What if the evaluations we are making are function evaluations instead of arithmetic
operations? How do we find the value of the propagation error in such cases? In this
section, we derive a general formula for the error committed in using a certain formula
for a functional relation. Consider

u = f (x1 , x2 , · · · , xn ),

be a function of several variables x_1, x_2, ··· , x_n and let the error in each x_i be Δx_i. Then
the error Δu in u is given by

    u + Δu = f(x_1 + Δx_1, x_2 + Δx_2, ··· , x_n + Δx_n).


Using Taylor’s series expansion of f about (x1 , x2 , · · · , xn ) in the above equation we get

    u + Δu = f(x_1, x_2, ··· , x_n)
            + (∂u/∂x_1) Δx_1 + (∂u/∂x_2) Δx_2 + ··· + (∂u/∂x_n) Δx_n
            + (1/2!) Σ_{i=1}^{n} Σ_{j=1}^{n} (∂^2u/∂x_i ∂x_j) Δx_i Δx_j
            + ···

Assuming that the errors Δx_1, Δx_2, ··· , Δx_n in the x_i are all small and that Δx_i / x_i ≪ 1,
the terms containing Δx_1^2, Δx_2^2, ··· , Δx_n^2 and higher powers of Δx_1, Δx_2, ··· , Δx_n
can be neglected. Therefore,

    u + Δu ≈ f(x_1, x_2, ··· , x_n) + (∂u/∂x_1) Δx_1 + (∂u/∂x_2) Δx_2 + ··· + (∂u/∂x_n) Δx_n,

which implies that

    Δu ≈ (∂u/∂x_1) Δx_1 + (∂u/∂x_2) Δx_2 + ··· + (∂u/∂x_n) Δx_n.    (1.14)

Equation (1.14) represents the general formula for errors. If we divide the above
equation by u on both sides we get the relative error

    E_r = Δu / u ≈ (Δx_1 / u)(∂u/∂x_1) + (Δx_2 / u)(∂u/∂x_2) + ··· + (Δx_n / u)(∂u/∂x_n).    (1.15)

Also, from Equation (1.14), by taking the modulus we get the maximum absolute error

    |Δu| ≤ |∂u/∂x_1| |Δx_1| + |∂u/∂x_2| |Δx_2| + ··· + |∂u/∂x_n| |Δx_n|.    (1.16)


In addition, from Equation (1.15), by taking the modulus we get the maximum relative
error as

    E_r ≤ |(Δx_1 / u)(∂u/∂x_1)| + |(Δx_2 / u)(∂u/∂x_2)| + ··· + |(Δx_n / u)(∂u/∂x_n)|.    (1.17)

Example 1.12
Let u = 5xy^2 / z^3, with Δx = Δy = Δz = 0.0001 and x = y = z = 1. Find the
maximum absolute and relative errors.
Solution:

    ∂u/∂x = 5y^2 / z^3;    ∂u/∂y = 10xy / z^3;    ∂u/∂z = −15xy^2 / z^4.

Thus, the absolute error bound is given as

    (Δu)_max = |Δx (5y^2 / z^3)| + |Δy (10xy / z^3)| + |Δz (−15xy^2 / z^4)|
             = |0.0001 × 5| + |0.0001 × 10| + |0.0001 × (−15)|
             = 0.003,

and the relative error bound is given by

    E_r = (Δu)_max / u = 0.003 / 5 = 0.0006.
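As a check on Example 1.12, the following sketch (our own, not from the notes) evaluates the bound (1.16) numerically, estimating the partial derivatives by central differences; the function name and step size are illustrative assumptions.

def max_absolute_error(f, x, dx, h=1e-6):
    """Estimate the propagated error bound sum(|du/dxi| * |dxi|) for u = f(x1, ..., xn)."""
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        partial = (f(*xp) - f(*xm)) / (2 * h)   # central-difference estimate of du/dxi
        total += abs(partial) * abs(dx[i])
    return total

u = lambda x, y, z: 5 * x * y**2 / z**3
du = max_absolute_error(u, [1.0, 1.0, 1.0], [0.0001, 0.0001, 0.0001])
print(du, du / u(1.0, 1.0, 1.0))   # approximately 0.003 and 0.0006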

1.5 Inverse problems of the theory of errors

Definition 1.9
Error Inverse Problem: What must the absolute errors of the independent variables
of a function be so that the absolute error of the function does not exceed a given
quantity?

Given the errors of several independent quantities or approximate numbers, the direct
problem requires us to find the error of any function of these quantities. However, the
inverse problem requires us to find the allowable errors in several independent quantities
in order to obtain a prescribed degree of accuracy in any function. The direct problem is
straightforward. The formula to be used is

    Δu = (∂f/∂x_1) Δx_1 + (∂f/∂x_2) Δx_2 + ··· + (∂f/∂x_n) Δx_n.

However, the inverse problem, namely, finding the allowable errors in x_1, x_2, ··· , x_n when u
is of a desired accuracy, is mathematically indeterminate since there is only one equation
for Δu and there are several unknowns Δx_1, Δx_2, ··· , Δx_n.

The problem is solved with minimum effort by using what is known as the principle
of equal effects. This principle assumes that the partial differentials

    (∂f/∂x_i) Δx_i,    i = 1, 2, ··· , n,

are all equal. Thus, we have

    Δu = n (∂f/∂x_i) Δx_i
    ⟹ Δx_i = Δu / (n ∂f/∂x_i).
Therefore, the absolute error of each independent variable is given by

The Inverse Problem

    Δx_i = Δu / (n ∂f/∂x_i),    for i = 1, 2, ··· , n,    (1.18)

where n is the number of independent variables.

Example 1.13

The base of a cylinder has radius r ≈ 2 m and the altitude of the cylinder is h ≈ 3 m.
With what absolute errors must we determine r and h so that the volume v may
be computed to within 0.1 m^3?
Solution: We have v = πr^2 h and Δv = 0.1 m^3. Here we can see that v is a
function of three variables, i.e., π, r and h. Thus, putting r = 2 m, h = 3 m and
π = 3.14, we approximately get:

    ∂v/∂π = r^2 h = 2^2 × 3 m^3 = 12 m^3,
    ∂v/∂r = 2πrh = 2 × π × 2 × 3 m^2 = 37.7 m^2,
    ∂v/∂h = πr^2 = π × 2^2 m^2 = 12.6 m^2.

Since n = 3, using the inverse formula (1.18) above we have

    Δr = Δv / (3 × ∂v/∂r) = 0.1 / (3 × 37.7) = 0.000884173298 < 0.001,
    Δh = Δv / (3 × ∂v/∂h) = 0.1 / (3 × 12.6) = 0.0026455026455 < 0.003.
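The following sketch (our own, not from the notes) applies the principle of equal effects (1.18) to the worked example above, with the partial derivatives estimated numerically; the helper name, argument order and step size are illustrative assumptions.

def equal_effects_tolerances(f, x, du, h=1e-6):
    """Allowable error in each variable so that the error in f stays within du."""
    n = len(x)
    tolerances = []
    for i in range(n):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        partial = (f(*xp) - f(*xm)) / (2 * h)      # central-difference df/dxi
        tolerances.append(du / (n * abs(partial)))  # equal-effects tolerance (1.18)
    return tolerances

volume = lambda pi, r, hgt: pi * r**2 * hgt
d_pi, d_r, d_h = equal_effects_tolerances(volume, [3.14, 2.0, 3.0], du=0.1)
print(d_r, d_h)   # roughly 0.00088 and 0.0026, as in Example 1.13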



Chapter 2
Nonlinear Equations

2.1 Locating Roots

Solving nonlinear equations is one of the most important and challenging problems in
science and engineering applications. The root finding problem is one of the most relevant
computational problems. It arises in a wide variety of practical applications in Physics,
Chemistry, Biosciences, Engineering, etc.

The problem of nonlinear root finding can be stated in an abstract sense as follows:

Given some function f (x), determine the value(s) of x such that f (x) = 0.

The solution x is called the root of the equation or the zero of the function f and the
problem is called root finding or zero finding.

The common root-finding methods include: Bisection, Newton-Raphson, False position,


Secant methods etc. Different methods converge to the root at different rates. That is,
some methods are faster in converging to the root than others. The rate of convergence
could be linear, quadratic, etc. The higher the order, the faster the method converges.

The central concept to all root finding methods is iteration or successive approxi-
mation. The idea is that we make some guess at the solution, and then we repeatedly
improve upon that guess, using some well-defined operations, until we arrive at an ap-
proximate answer that is sufficiently close to the actual answer. We refer to this process as
an iterative method. We call the sequence of approximations the iterates and denote
them by x_0, x_1, x_2, ··· , x_n, ··· .

Iterative methods generally involve an infinite number of steps to obtain the exact so-
lution. However, the beauty and power of these methods is that typically after a finite,
relatively small number of steps the iteration can be terminated with the last iterate
providing a very good approximation to the actual solution. One of the primary concerns
of an iterative method is thus the rate at which it converges to the actual solution, called
order of convergence.

Definition 2.1 Order of Convergence:

Let α ∈ R be the actual solution of f(x) = 0. A sequence of iterates x_n ∈ R,
n = 0, 1, 2, ··· , is said to converge to α if

    lim_{n→∞} |x_n − α| = 0.

Thus, if there exists a constant c > 0, an integer N_0 ≥ 0 and p ≥ 0 such that for
all n > N_0 we have

    |α − x_n| ≤ c |α − x_{n−1}|^p,    i.e.,    |α − x_n| / |α − x_{n−1}|^p → c as n → ∞,    (2.1)

then we say the sequence of iterates converges to α with order at least p.

If p = 1 and c < 1 then the convergence is called linear and we can write Equation (2.1) as

    |α − x_n| ≈ c^n |α − x_0|, as n → ∞.    (2.2)

If p > 1 then the convergence is called superlinear for any c > 0. In particular,
the values p = 2 and p = 3 are given the special names quadratic and cubic
convergence, respectively.

Definition 2.2 Error Equation:

Notation: e_n = |x_n − α| is the error in the nth iteration. The equation

    e_{n+1} = c e_n^p + O(e_n^{p+1})    (2.3)

is called the error equation. By substituting e_n = x_n − α for all n in any iterative
method and simplifying, we obtain the error equation for that method. The value
of p obtained is called the order of convergence of the method.

In addition to the order of convergence, the factors for deciding whether an iterative
method for a root-finding problem is good or not are accuracy, stability, efficiency, and
robustness. Each of these can be defined as follows:

• Accuracy: The error |α − xn | becomes small as n is increased.

• Stability: If the input parameters are changed by small amounts the output of
the iterative method should not be wildly different, unless the underlying problem
exhibits this type of behavior.

• Efficiency: The number of operations and the time required to obtain an approx-
imate answer should be minimized.

• Robustness: The iterative method should be applicable to a broad range of inputs.

In the iterative methods that we study we will see how each one of these concepts applies.

2.2 Bisection Method

This is one of the simplest iterative methods and is strongly based on the property
of intervals (bracketing). The bisection method is a bracketing method for finding a
numerical solution of an equation of the form f (x) = 0.

As the name suggests, the method is based on repeated bisections of an interval containing
the root. The basic idea is very simple. Suppose we now want to approximate the solution
to f (x) = 0 for a general continuous function f (x). The key to the bisection method is
to keep the actual solution bracketed between the guesses. Thus, in addition to being
given f (x), we need an interval a ≤ x ≤ b where f (a) and f (b) differ in sign. We can
write this requirement mathematically as

    f(a) × f(b) < 0.    (2.4)

It seems reasonable to conclude that since f (x) is continuous and has different signs at
each end of the interval [a, b], there must be at least one point α ∈ [a, b], such that
f (α) = 0. Thus, f (x) has at least one root in the interval. This result is in fact known
as the corollary of Intermediate Value Theorem.


Theorem 2.1 Intermediate value theorem


Suppose f is continuous on a closed interval [a, b]. Let p be any number between
f (a) and f (b) so that f (a) ≤ p ≤ f (b) or f (b) ≤ p ≤ f (a). Then there exists a
number c in (a, b) such that f (c) = p.

Figure 2.1: Root finding using Bisection method

At each step the method divides the interval in two by computing the midpoint c =
(a + b)/2 of the interval and the value of the function f (c) at that point. Unless c is itself
a root (which is very unlikely, but possible) there are now only two possibilities: either
f (a) and f (c) have opposite signs and bracket a root, or f (c) and f (b) have opposite
signs and bracket a root. The method selects the sub-interval that is guaranteed to be
a bracket as the new interval to be used in the next step. In this way an interval that
contains a zero of f is reduced in width by 50% at each step.

The process is continued until the interval is sufficiently small. Explicitly, if f (a) and
f (c) have opposite signs, then the method sets c as the new value for b, and if f (b) and
f (c) have opposite signs then the method sets c as the new a. (If f (c) = 0 then c may be
taken as the solution and the process stops.) In both cases, the new f (a) and f (b) have
opposite signs, so the method is applicable to this smaller interval.

Now that we have an idea for how the bisection method works for a general problem
f (x) = 0, it is time to write down a formal procedure for it using well defined operations.
We call such a procedure an algorithm.


Algorithm 1: Bisection Method

Input: Continuous function f(x);
    Interval [a, b] such that f(a) f(b) < 0;
    Error tolerance ε.
Output: Approximate solution that is within ε of a root of f(x).
Step 1. n = −1 (initialize the counter);
Step 2. while b − a ≥ 2ε do
    x_{n+1} = (a + b) / 2 (bisect the interval);
    if f(x_{n+1}) == 0 then
        return x_{n+1} (we have found a solution);
    end
    if f(x_{n+1}) f(a) ≤ 0 then
        b = x_{n+1}
    else
        a = x_{n+1}
    end
    n = n + 1 (update the counter)
end
Step 3. x_{n+1} = (a + b) / 2 (bisect the interval one last time);
Step 4. return x_{n+1} (return the solution)
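A possible Python rendering of Algorithm 1 is sketched below; it follows the same bracketing logic but is not a verbatim transcription of the pseudocode, and the function name is our own.

def bisection(f, a, b, eps=1e-6):
    """Return an approximation within eps of a root of f on [a, b]."""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a >= 2 * eps:
        c = (a + b) / 2.0            # bisect the interval
        if f(c) == 0:
            return c                 # exact root found
        if f(a) * f(c) <= 0:
            b = c                    # root lies in [a, c]
        else:
            a = c                    # root lies in [c, b]
    return (a + b) / 2.0             # final bisection

# Example 2.1: largest root of x^6 - x - 1 on [1, 2]
print(bisection(lambda x: x**6 - x - 1, 1.0, 2.0, eps=0.001))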

We can think of an algorithm as a recipe for solving some mathematical problem. However,
instead of the basic ingredients of flour, sugar, eggs, and salt, the fundamental
building blocks of an algorithm are the basic mathematical operations of addition, subtraction,
multiplication, and division, as well as the for, if, and while constructs.

In the Bisection Algorithm 1, the line while b − a ≥ 2ε is called the stopping criterion,
and we call ε the error tolerance. This line says that we are going to continue bisecting
the interval until the length of the interval is ≤ 2ε. This guarantees that the value returned
by the algorithm is at most a distance ε away from the actual solution. The value for ε is
given as an input to the algorithm.

Note that the smaller the value of ε, the longer it takes the bisection method to converge.
Typically, we choose this value to be something small like ε = 10^(-6). The stopping criterion
that we have chosen is called an absolute criterion. Some other types of criterion are
relative and residual. These correspond to b − a < 2ε|x_n| and |f(x_n)| ≤ ε, respectively.
There is no single correct choice.


2.2.1 Number of Iterations Needed in the Bisection Method to Achieve Certain Accuracy

Let a_n, b_n and x_n denote the nth computed values of a, b and x respectively. Then we have

    b_{n+1} − a_{n+1} = (1/2)(b_n − a_n),    n ≥ 1,

and also

    b_n − a_n = (1/2^{n−1})(b − a),    n ≥ 1,    (2.5)

where (b − a) is the length of the original interval with which we started. Since the root
α is in either the interval (a_n, x_n) or (x_n, b_n), we know that

    |α − x_n| ≤ x_n − a_n = b_n − x_n = (1/2)(b_n − a_n).    (2.6)

Equation (2.6) is the error bound for x_n that is used in Step 2 of Algorithm 1.
Now, combining Equations (2.5) and (2.6), we get the further bound

    |α − x_n| ≤ (1/2^n)(b − a),    (2.7)

which shows that the iterate x_n converges to α as n → ∞. Therefore, the bisection
method converges linearly to the solution at a rate of 1/2. Note that this bound is
entirely independent of the function f(x).

Let us now find the minimum number of iterations n needed with the bisection method
to achieve a certain desired accuracy, say ε. Suppose we want to have

    |α − x_n| ≤ (1/2^n)(b − a) ≤ ε.

Taking logarithms (with any convenient base) of both sides of the above equation and
simplifying the resulting expression, we obtain

    n ≥ log((b − a)/ε) / log 2.    (2.8)
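A one-line check of the bound (2.8), sketched here for the data of Example 2.2 below (our own addition, not from the notes):

import math

a, b, eps = 1.0, 2.0, 1e-3
n_min = math.ceil(math.log((b - a) / eps) / math.log(2))
print(n_min)   # 10 iterations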


Example 2.1

Find the largest root of

    f(x) = x^6 − x − 1 = 0,

accurate to within ε = 0.001.
Solution: By sketching the graph of f (or by checking signs), it is easy to see that 1 < α < 2.

We choose a = 1, b = 2; then f(a) = −1, f(b) = 61, and Equation (2.4) is satisfied.
Thus, applying the Bisection Method results in

i a b c f (a) f (b) f (c) f (a) × f (c) |b − a|


1 1.0000 2.0000 1.5000 -1.0000 61.0000 8.8906 -8.8906 1.0000
2 1.0000 1.5000 1.2500 -1.0000 8.8906 1.5647 -1.5647 0.5000
3 1.0000 1.2500 1.1250 -1.0000 1.5647 -0.0977 0.0977 0.2500
4 1.1250 1.2500 1.1875 -0.0977 1.5647 0.6167 -0.0603 0.1250
5 1.1250 1.1875 1.1562 -0.0977 0.6167 0.2333 -0.0228 0.0625
6 1.1250 1.1562 1.1406 -0.0977 0.2333 0.0616 -0.0060 0.0312
7 1.1250 1.1406 1.1328 -0.0977 0.0616 -0.0196 0.0019 0.0156
8 1.1328 1.1406 1.1367 -0.0196 0.0616 0.0206 -0.0004 0.0078
9 1.1328 1.1367 1.1348 -0.0196 0.0206 0.0004 -0.0000 0.0039
10 1.1328 1.1348 1.1338 -0.0196 0.0004 -0.0096 0.0002 0.0020

Therefore, the largest root of the given function is approximately 1.134765625, which is
correct to 5 significant digits.


Example 2.2

Approximate √2 with an accuracy of ε = 10^(-3) and also compute the minimum
number of iterations needed.
Solution: Let x = √2 be a root of a function f. First, let us find the function f:

    α = √2 ⟹ α^2 = 2 ⟹ α^2 − 2 = 0.

Thus, let f(x) = x^2 − 2 and take the interval [1, 2]. We have f(1) × f(2) < 0 and
√2 ∈ [1, 2], so the convergence of the bisection method is guaranteed on this interval.
The minimum number of iterations is given by

    n ≥ log((2 − 1)/10^(-3)) / log 2 = 3 / log10(2) ≈ 9.965784.

Thus, the minimum number of iterations required to achieve the given accuracy is 10.
The table below shows the numerical value and error of the first 10 iterates of the
bisection algorithm for approximating √2 using the function f(x) = x^2 − 2 on the
interval [1, 2] with an error tolerance ε = 10^(-3).

n a b c sign(f (a)) ∗ sign(f (c)) b−a |α − c|


1 1.00000 2.00000 1.50000 −ve 1.00000 0.08579
2 1.00000 1.50000 1.25000 +ve 0.50000 0.16421
3 1.25000 1.50000 1.37500 +ve 0.25000 0.03921
4 1.37500 1.50000 1.43750 −ve 0.12500 0.02329
5 1.37500 1.43750 1.40625 +ve 0.06250 0.00796
6 1.40625 1.43750 1.42188 −ve 0.03125 0.00766
7 1.40625 1.42188 1.41406 +ve 0.01562 0.00015
8 1.41406 1.42188 1.41797 −ve 0.00781 0.00376
9 1.41406 1.41797 1.41602 −ve 0.00391 0.00180
10 1.41406 1.41602 1.41504 −ve 0.00195 0.00083

As we can see from the table, it takes a minimum of 10 iterations to reach an accuracy
of 10^(-3), i.e., |b − a|/2 < ε. In addition, we can see that |α − c| ≤ (1/2)|b − a|.

The most difficult part about using the bisection method is finding an interval [a, b] where
the continuous function f(x) changes sign. Once this is found, the algorithm is guaranteed
to converge. Thus, we would say that the bisection method is very robust. Also, as long
as f(x) has only one root in the interval [a, b], and it does not have another root
very close to a or b, we can make small changes to a or b and the method will converge
to the same solution. Thus, we would say the bisection method is stable. Additionally,
Equation (2.7) tells us that the error |α−xn | can be made as small as we like by increasing
n. Thus, we would say the bisection method is accurate. Finally, the method converges
linearly which is acceptable, but, as we will see in the next two sections, it is by no means
the best we can do. Thus, we would say that the bisection method is not very efficient.

Advantages and disadvantages of the bisection method

• The method is guaranteed to converge

• The error bound decreases by half with each iteration

• The bisection method converges very slowly

• The bisection method cannot detect multiple roots

2.3 False Position (Regular-falsi) Method

The false position method retains the main feature of the bisection method, namely that the
root is trapped in a sequence of intervals of decreasing size; therefore, the regula falsi
method is also a bracketing method. This method uses the point where the secant line
intersects the x-axis. The secant line over the interval [a, b] is the chord between (a, f(a))
and (b, f(b)), as shown in Figure 2.2. The two right triangles in the figure are
similar, which means that we have

    (b − c) / f(b) = (c − a) / (−f(a))

(the legs of the triangles have lengths f(b) and −f(a) when f(a) < 0 < f(b)). This implies that

    c = (a f(b) − b f(a)) / (f(b) − f(a)).    (2.9)

We can then compute f(c) and repeat the process with the interval [a, c] if f(a) × f(c) < 0,
or with the interval [c, b] if f(c) × f(b) < 0.

The rate of convergence is still linear but faster than that of the bisection method. Both
the Bisection and Regular-falsi methods will fail if f has a double root.


Figure 2.2: Root finding using Regular-Falsi method

The formal procedure of the Regular-falsi method is given below in Algorithm 2.

Algorithm 2: Regular-Falsi Method

Input: Continuous function f(x);
    Interval [a, b] such that f(a) f(b) < 0;
    Error tolerance ε.
Output: Approximate solution that is within ε of a root of f(x).
Step 1. c = b (for the residual criterion);
Step 2. while |f(c)| ≥ ε do
    c = (a f(b) − b f(a)) / (f(b) − f(a));
    if f(c) == 0 then
        return c (we have found a solution);
    end
    if f(a) f(c) ≤ 0 then
        b = c
    else
        a = c
    end
end
Step 3. return c (return the solution)
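A possible Python sketch of Algorithm 2 (false position), using the residual stopping criterion |f(c)| < ε as in the pseudocode above; the function name and iteration cap are our own assumptions.

import math

def false_position(f, a, b, eps=1e-6, max_iter=100):
    """Return an approximate root of f on [a, b] with f(a) * f(b) < 0."""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = b
    for _ in range(max_iter):
        c = (a * f(b) - b * f(a)) / (f(b) - f(a))  # where the secant crosses the x-axis
        if abs(f(c)) < eps:
            break
        if f(a) * f(c) <= 0:
            b = c                                  # root lies in [a, c]
        else:
            a = c                                  # root lies in [c, b]
    return c

# Example 2.3: root of x * e^x - 3 on [1, 1.5]
print(false_position(lambda x: x * math.exp(x) - 3, 1.0, 1.5, eps=1e-4))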

We can rewrite the above algorithm as follows so that we can use the iteration number.
Given an interval [x_0, x_1] such that sign(f(x_0)) × sign(f(x_1)) < 0, there exists a
root in this interval, and the next approximate root, x_2, is computed as

    x_2 = (x_0 f(x_1) − x_1 f(x_0)) / (f(x_1) − f(x_0)).

Now, check whether this approximate root is within the given tolerance, i.e., if |f(x_2)| < ε then we
take x_2 as the approximate root; otherwise we need to find the next interval depending
on the sign of f(x_0) × f(x_2). If the sign is negative then the root lies in [x_0, x_2]; otherwise
the root lies in [x_2, x_1]. If the root lies in the first interval, the next approximate root, x_3,
is computed as

    x_3 = (x_0 f(x_2) − x_2 f(x_0)) / (f(x_2) − f(x_0));

if the root lies in the second interval, x_3 is computed as

    x_3 = (x_1 f(x_2) − x_2 f(x_1)) / (f(x_2) − f(x_1)).

We repeat this procedure until |f(x_{n+1})| < ε.

In general, at the nth iteration we have an interval [x_{n−1}, x_n] such that f(x_{n−1}) × f(x_n) < 0,
and the next approximate root is given as

    x_{n+1} = (x_{n−1} f(x_n) − x_n f(x_{n−1})) / (f(x_n) − f(x_{n−1})),

or, equivalently,

    x_{n+1} = x_n − (x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})) × f(x_n).

Example 2.3

Find a real root of the equation f (x) = xex − 3 using Regula-falsi method correct
to three decimal places.

Solution: As seen from the graph of f, we can take x_0 = 1 and x_1 = 1.5 as initial
guesses; then f(1) = −0.2817 and f(1.5) = 3.7225, and hence the root lies between 1 and 1.5.

• Iteration-1: x_2 = (x_0 f(x_1) − x_1 f(x_0)) / (f(x_1) − f(x_0)) = (1 × 3.7225 − 1.5 × (−0.2817)) / (3.7225 − (−0.2817)) = 1.0352.
  Since f(x_2) is negative, the next approximate root lies between x_1 and x_2, and x_1 and
  x_2 do not yet agree to three decimal places.

• Iteration-2: x_3 = (x_1 f(x_2) − x_2 f(x_1)) / (f(x_2) − f(x_1)) = 1.0456.
  Now, f(x_3) is negative and hence the root again lies between x_1 and x_3.

In a similar manner, x_4 = 1.0487, x_5 = 1.0496 and x_6 = 1.0498. Hence the approximate
root is 1.0498, correct to three decimal places.
The exact root is approximately α = 1.0499088949636644.

n xn−1 xn xn+1 f (xn−1 ) × f (xn+1 ) f (xn+1 ) α − xn+1


1 1.00000 1.50000 1.03518 -ve 0.08535 0.01473
2 1.03518 1.50000 1.04560 -ve 0.02518 0.00431
3 1.04560 1.50000 1.04865 -ve 0.00737 0.00126
4 1.04865 1.50000 1.04954 -ve 0.00215 0.00037
5 1.04954 1.50000 1.04980 -ve 0.00063 0.00011
6 1.04980 1.50000 1.04988 -ve 0.00018 0.00003

The error analysis for the false-position method is not as easy as it is for the bisection
method, however, if one of the end points becomes fixed, it can be shown that it is still a
linear order of convergence, that is, it is the same rate as the bisection method, usually
faster, but possibly slower. For differentiable functions, the closer the fixed end point is
to the actual root, the faster the convergence.

Order of convergence of Regular Falsi Method

Let α be the exact root of f(x) = 0 and let x_{n−1}, x_n and x_{n+1} be successive approximate
solutions. If ε_{n−1}, ε_n and ε_{n+1} are the corresponding errors, we have

    x_n = α + ε_n    and    x_{n+1} = α + ε_{n+1}.

The false position method is given by

    x_{n+1} = x_n − (x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})) × f(x_n)

    ⟹ α + ε_{n+1} = α + ε_n − (ε_n − ε_{n−1}) / (f(α + ε_n) − f(α + ε_{n−1})) × f(α + ε_n).


Expanding f(α + ε_n) and f(α + ε_{n−1}) about α using Taylor's series gives

    ε_{n+1} = ε_n − (ε_n − ε_{n−1}) [f(α) + ε_n f'(α) + (ε_n^2/2) f''(α) + ···] / { [f(α) + ε_n f'(α) + (ε_n^2/2) f''(α) + ···] − [f(α) + ε_{n−1} f'(α) + (ε_{n−1}^2/2) f''(α) + ···] }.

Ignoring higher-order terms and using f(α) = 0, the denominator becomes
(ε_n − ε_{n−1}) f'(α) + ((ε_n^2 − ε_{n−1}^2)/2) f''(α) = (ε_n − ε_{n−1}) [f'(α) + ((ε_n + ε_{n−1})/2) f''(α)],
so that

    ε_{n+1} = ε_n − [ε_n f'(α) + (ε_n^2/2) f''(α)] / [f'(α) + ((ε_n + ε_{n−1})/2) f''(α)].

Dividing numerator and denominator by f'(α),

    ε_{n+1} = ε_n − [ε_n + (ε_n^2/2)(f''(α)/f'(α))] [1 + ((ε_n + ε_{n−1})/2)(f''(α)/f'(α))]^(-1).

Now, using (1 + x)^(-1) = 1 − x + x^2 − ··· and ignoring higher powers of the errors, we get

    ε_{n+1} = ε_n − [ε_n + (ε_n^2/2)(f''(α)/f'(α))] [1 − ((ε_n + ε_{n−1})/2)(f''(α)/f'(α))]
            ≈ (1/2) ε_n ε_{n−1} (f''(α)/f'(α)).

Thus, we have

    ε_{n+1} = M ε_n ε_{n−1},    (2.10)

where M = (1/2)(f''(α)/f'(α)) is a constant.

In order to find the order of convergence, it is necessary to find a relation of the type

    ε_{n+1} = c ε_n^p.    (2.11)


From (2.11), ε_n = c ε_{n−1}^p, so that

    ε_{n−1} = (ε_n / c)^(1/p).

Substituting this value of ε_{n−1} into Equation (2.10), we get

    ε_{n+1} = ε_n M (ε_n / c)^(1/p)
    ⟹ c ε_n^p = ε_n M (ε_n / c)^(1/p)
    ⟹ ε_n^p = M c^(-(1 + 1/p)) ε_n^(1 + 1/p).

Comparing the powers of ε_n on both sides we get

    p = 1 + 1/p
    ⟹ p^2 − p − 1 = 0
    ⟹ p ≈ 1.618 (taking the positive root).

Putting this value of p into Equation (2.11) we get

    ε_{n+1} = c ε_n^1.618.

Therefore, the rate of convergence of the Regular Falsi Method is 1.618.

2.4 Fixed-point Iteration Method

In this section we give a more general introduction to iteration methods, presenting


a general theory for one-point iteration formulae. This method is also known as
substitution method or method of fixed iterations. To find the root of the equation
f (x) = 0 by successive approximation, we rewrite the given equation in the form of
x = g(x) (in an infinite number of ways). A root α of f (x) = 0 is also a fixed point of the
function g(x), meaning that α is a number for which α = g(α). So the iteration procedure

xn+1 = g(xn ) (2.12)

converges to α under certain conditions.


Figure 2.3: Root finding using Fixed-point iteration method

The formal procedure of the Fixed-point iteration Method is given below in Algorithm 3.

Algorithm 3: Fixed-point Iteration Method

Input: Continuous function g(x) (and the corresponding f(x));
    Initial guess x_0;
    Error tolerance ε.
Output: Approximate solution that is within ε of a root of f(x).
Step 1. n = 0 (initialize the counter);
Step 2. while |f(x_n)| ≥ ε do
    x_{n+1} = g(x_n);
    if f(x_{n+1}) == 0 then
        return x_{n+1} (we have found a solution);
    end
    n = n + 1;
end
Step 3. return x_n (return the solution)
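A possible Python sketch of Algorithm 3; the residual test on f and the iteration cap are used as a safeguard, and the function name and example below are our own assumptions.

import math

def fixed_point(g, f, x0, eps=1e-6, max_iter=100):
    """Return an approximate fixed point of g (i.e., a root of f)."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < eps:
            break
        x = g(x)                      # one fixed-point update x_{n+1} = g(x_n)
    return x

# Example 2.4: solve cos(x) = 3x - 1 with g(x) = (cos(x) + 1) / 3
root = fixed_point(lambda x: (math.cos(x) + 1) / 3,
                   lambda x: math.cos(x) - 3 * x + 1, x0=0.0, eps=1e-4)
print(root)   # approximately 0.607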

Example: Consider the function f (x) = x3 − 2. The equation f (x) = 0 can be written as

x = x3 + x − 2 (2.13)
2 + 5x − x3
x= (2.14)
5

Taking x0 = 1.2 we have


n Equation (2.13) Equation (2.14)


1 0.928 1.2544
2 −0.273 1.2596
3 −2.293 1.2599
4 −16.349 1.25992

Thus Equation (2.14) gives the correct root, while Equation (2.13) does not converge.
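This behaviour is easy to reproduce with a few lines of code. The following Python sketch is illustrative only; the helper name fixed_point and the variables g1, g2 are my own and are not part of the text.

# Illustrative fixed-point iteration x_{n+1} = g(x_n); not from the text.
def fixed_point(g, x0, tol=1e-6, max_iter=50):
    x = x0
    for n in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:        # stop when successive iterates agree
            return x_new, n
        x = x_new
    return x, max_iter                   # may not have converged

g1 = lambda x: x**3 + x - 2             # rearrangement (2.13)
g2 = lambda x: (2 + 5*x - x**3) / 5     # rearrangement (2.14)

print(fixed_point(g2, 1.2))             # converges near 1.259921, the cube root of 2
print(fixed_point(g1, 1.2, max_iter=4)) # iterates run away, as in the table above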

We begin by asking whether the equation x = g(x) has a solution. For this to occur,
the graphs of y = x and y = g(x) must intersect, as seen in the earlier Figure 2.3. The
following lemma gives conditions under which we are guaranteed there is a fixed point α
which is the root of the function f .

Lemma: Let g(x) be a continuous function on the interval [a, b], and suppose
it satisfies the property

a ≤ x ≤ b =⇒ a ≤ g(x) ≤ b (2.15)

Then the equation x = g(x) has at least one solution α in the interval [a, b].

The next question is under what conditions Equation (2.12) will converge to the fixed
point α. Suppose g(x) and g'(x) are continuous; then from the Taylor series expansion
about a point xn,

g(x) = g(xn) + (x − xn) g'(xn) + ((x − xn)²/2!) g''(xn) + ···

by Taylor’s Theorem we have

g(α) = g(xn ) + (α − xn )g0 (ξn ), where xn ≤ ξn ≤ α. (2.16)


Now, let x0 be an initial guess to the fixed point, then from Equation (2.12) we have

x1 = g(x0 )
=⇒ α − x1 = α − g(x0 ) = g(α) − g(x0 )
= (α − x0 )g0 (ξ0 ), x0 ≤ ξ0 ≤ α, (using the above Equation (2.16))
α − x2 = g0 (ξ1 )(α − x1 )
= g0 (ξ0 )g0 (ξ1 )(α − x0 ), x1 ≤ ξ1 ≤ α
..
.
α − xn = g0 (ξ0 )g0 (ξ1 ) · · · g0 (ξn−1 )(α − x0 )
(2.17)
So, if |g'(ξn)| ≤ M for all n, then |α − xn| ≤ M^n |α − x0| and convergence is assured if M < 1, i.e.
if |g0 (x)| < 1 in a neighbourhood containing both α and x0 ; this condition dictates the
version of the method which is to be used. Thus, condition for convergence is given as

Convergence condition The fixed-point iteration method based on the func-


tion g converge to α if
• g and g0 are continuous functions on an interval I and

• |g0 (x)| < 1 ∀x ∈ I


Note: If |g'(x)| > 1, then the iteration will not converge to α, and when
|g'(x)| = 1 no conclusion can be made.

Example 2.4

Find the root of the equation cos x = 3x − 1 correct to three decimal places using
fixed-point iteration method.
Solution: Here we have f (x) = cos x − 3x + 1. Take x0 = 0 and x1 = π/2; then
f (0) = 2 > 0 and f (π/2) = −3(π/2) + 1 < 0. Thus, the root lies between 0 and π/2.
Now, the given equation can be rewritten as x = (cos x + 1)/3 = g(x) (say). Then we
can check that g'(x) = −(sin x)/3 and |g'(x)| < 1 in [0, π/2]; hence, the Fixed-point
iteration method can be applied.
Let x0 = 0 be the initial guess; then x1 = g(x0 ) = 0.667, x2 = g(x1 ) =
0.5953, · · · , x5 = 0.6072, x6 = 0.6071. Therefore, since x5 and x6 agree to
three decimal places, the required root is given by 0.6071.


n xn−1 xn |xn − xn−1 | |f (xn )|


1 0.00000 0.66667 0.66667 0.21411
2 0.66667 0.59530 0.07137 0.04210
3 0.59530 0.60933 0.01403 0.00795
4 0.60933 0.60668 0.00265 0.00151
5 0.60668 0.60718 0.00050 0.00029
6 0.60718 0.60709 0.00010 0.00005

Rate Of Convergence Of Iteration Method

In general Fixed-point Iteration converges linearly with asymptotic error constant |g0 (α)|,
since, by the definition of ξn and the continuity of g0 ,

lim_{n→∞} |en+1 / en| = lim_{n→∞} |g'(ξn)| = |g'(α)|.                        (2.18)

Recall that the conditions we have stated for linear convergence are nearly identical to
the conditions for g to have a unique fixed point in [a, b]. The only difference is that now,
we also require g0 to be continuous on [a, b].

Now, suppose that in addition to the previous conditions on g, we assume that g 0 (α) = 0,
and that g is twice continuously differentiable on [a, b]. Then, using Taylor’s Theorem,
we obtain

en+1 = g(xn ) − g(α) = g'(α)(xn − α) + (1/2) g''(ξn )(xn − α)² = (1/2) g''(ξn ) en²,

where ξn lies between xn and α. It follows that for any initial iterate x0 ∈ [a, b], Fixed-
point iteration converges at least quadratically, with asymptotic error constant |g 00 (α)/2|.
This discussion implies the following general result

Theorem 2.2 General Convergence

Let g(x) be a function that is n times continuously differentiable on an interval [a, b].
Furthermore, assume that g(x) ∈ [a, b] for x ∈ [a, b], and that |g'(x)| ≤ M
on (a, b) for some constant M < 1. If the unique fixed point α in [a, b] satisfies

g0 (α) = g00 (α) = · · · = g(n−1) (α) = 0, (2.19)


Then for any x0 ∈ [a, b], Fixed-point Iteration converges to α with order n and
asymptotic error constant |g^(n)(α)/n!|.

2.5 Newton-Raphson Method

It is also called Newton's method, and it is one of the most widely used root-finding methods. This method
requires only one appropriate starting point x0 as an initial assumption of the root of
the function f (x) = 0. At (x0 , f (x0 )) a tangent to f (x) = 0 is drawn. Equation of this
tangent is given by

y = f 0 (x0 )(x − x0 ) + f (x0 ) (2.20)

The point of intersection, say x1 , of this tangent with the x-axis (y = 0) is taken to be the next
approximation to the root of f (x) = 0. So on substituting y = 0 in the tangent equation
we get
f (x0 )
x1 = x 0 − (2.21)
f 0 (x0 )
If |f (x1 )| < ε we have got an acceptable approximate root of f (x) = 0; otherwise we
replace x0 by x1 , draw a tangent to f (x) = 0 at (x1 , f (x1 )), and take its intersection,
say x2 , with the x-axis as an improved approximation to the root of f (x) = 0. If |f (x2 )| ≥ ε,
we iterate the above process till the convergence criterion is satisfied. This geometrical
description of the method may be clearly visualized in the figure below:

Figure 2.4: Geometric representation of Newton’s Method


The various steps involved in calculating the root of f (x) = 0 by Newton Raphson Method
are described compactly in the algorithm below.

Algorithm 4: Newton-Raphson Method


Input: Continuously differentiable function f (x) ;
Initial guess x0 ;
Error tolerance . ;
Output: Approximate solution that is within  of a root of f (x) ;
Step 1. n = 0 initialize the counter ;
Step 2. while |f (xn )| ≥  do
f (xn )
xn+1 = xn − 0 ;
f (xn )
if f (xn+1 ) == 0 then
return xn+1 (we have found a solution);
end
n=n+1
end
Step 3. return xn (return the solution)

Remark (1): This method converges faster than the earlier methods. In fact the method
converges at a quadratic rate. We will prove this later.

Remark (2): This method can also be derived directly from the Taylor expansion of f (x) in
the neighbourhood of the root α of f (x) = 0. The starting approximation x0 to α is to
be properly chosen so that the first order Taylor series approximation of f (x0 + h) in the
neighbourhood of x0 leads to x1 , an improved approximation to α, i.e.

f (x0 + h) = f (x0 ) + h f '(x0 ) + (h²/2) f ''(x0 ) + ··· = 0

Since h << 1, neglecting h² and its higher powers, we get

f (x0 ) + h f '(x0 ) = 0

i.e.

h = − f (x0 ) / f '(x0 ),    and since h = x1 − x0 ,


x1 = x0 − f (x0 ) / f '(x0 )

Now the successive approximations x2 , x3 , . . . may be calculated by the iterative formula:

xn+1 = xn − f (xn ) / f '(xn )
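A minimal Python sketch of this iteration (the function names are my own; the stopping test |f(xn)| < ε matches Algorithm 4):

import math

# Hedged sketch of the Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n).
def newton(f, fprime, x0, eps=1e-6, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < eps:              # stopping criterion |f(x_n)| < eps
            return x
        x = x - fx / fprime(x)
    return x

# Applied to f(x) = cos x - 3x + 1 (the equation solved in Example 2.5 below)
root = newton(lambda x: math.cos(x) - 3*x + 1,
              lambda x: -math.sin(x) - 3,
              math.pi / 4)
print(round(root, 7))                  # about 0.6071016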

Example 2.5

Find the root of the equation cos x = 3x − 1 correct to three decimal places using
the Newton-Raphson’s Method.
Solution: Given
f (x) = cos x − 3x + 1

and hence
f 0 (x) = − sin x − 3
Assume x0 = π/4 = 0.7854.
Iteration 1:
∴ f (x0 ) = −0.6491; f '(x0 ) = −3.7071
∴ x1 = x0 − f (x0 )/f '(x0 ) = 0.7854 − (−0.6491)/(−3.7071) = 0.6103
Since |f (x1 )| = 0.0114 > (1/2)·10−3 , and since none of the digits of x0 and x1 agree yet,
we repeat the Newton-Raphson procedure.
Iteration 2:
f (x1 ) = −0.0114; f 0 (x1 ) = −3.5731

We have
x2 = x1 − f (x1 )/f '(x1 ) = 0.6103 − (−0.0114)/(−3.5731) = 0.6071028260
Now, x1 and x2 agree to only one decimal place, so we need to continue the iteration.
Iteration 3:

f (x2 ) = −4.20566891868e−06; f '(x2 ) = −3.57049039628

We have
x3 = x2 − f (x2 )/f '(x2 ) = 0.6071028260 − (−0.0000042057)/(−3.5704903963) = 0.6071016481


n xn−1 xn |xn − xn−1 | |f (xn )|


1 0.7853981634 0.6103053626 0.1750928008 0.0114430406
2 0.6103053626 0.6071028260 0.0032025367 0.0000042055
3 0.6071028260 0.6071016481 0.0000011778 0.0000000000

Convergence Analysis for Newton's Method

Using the same approach as with Fixed-point Iteration, we can determine the convergence
rate of Newton’s Method applied to the equation f (x) = 0, where we assume that f is
continuously differentiable near the exact solution α , and that f 00 exists near α. Using
Taylor’s Theorem, we obtain

en+1 = xn+1 − α
     = xn − f (xn )/f '(xn ) − α
     = en − f (xn )/f '(xn )
     = en − (1/f '(xn )) [ f (α) − f '(xn )(α − xn ) − (1/2) f ''(ξn )(α − xn )² ]        (Taylor's Theorem, f (α) = 0)
     = en + (1/f '(xn )) [ f '(xn )(α − xn ) + (1/2) f ''(ξn )(α − xn )² ]
     = en + (1/f '(xn )) [ −f '(xn ) en + (1/2) f ''(ξn ) en² ]

Thus, we have

en+1 = [ f ''(ξn ) / (2 f '(xn )) ] en²                                      (2.22)
where ξn is between xn and α. We conclude that if f '(α) ≠ 0, then Newton's Method
converges quadratically, with asymptotic error constant |f ''(α) / (2 f '(α))|. It is easy to
see from this constant, however, that if f '(α) is very small, or zero, then convergence can
be very slow or may not even occur.

Example 2.6

Solve 2x3 − 2.5x − 5 = 0 for the root in [1,2] by Newton Raphson method with a
tolerance  = 10−6 .


Solution: Given

f (x) = 2x3 − 2.5x − 5 = 0 and f 0 (x) = 6x2 − 2.5 with  = 10−6

Iteration 1: Take x0 = 2 as an initial guess

∴ f (x0 ) = 6 ; f 0 (x0 ) = 21.5

∴ x1 = x0 − f (x0 )/f '(x0 ) = 2 − 6.0/21.5 = 1.72093023
Since, |f (x1 )| = 0.8910913504 > 10−6 we repeat the process.
Results are tabulated below:

n xn−1 xn |xn − xn−1 | |f (xn )|


0 2.0000000000 1.7209302326 0.2790697674 0.8910913504
1 1.7209302326 1.6625730361 0.0583571965 0.0347669334
2 1.6625730361 1.6601046517 0.0024683844 0.0000607495
3 1.6601046517 1.6601003235 0.0000043282 0.0000000002
4 1.6601003235 1.6601003235 0.0000000000 0.0000000000

Examining the numbers in the table above, we can see that the number of cor-
rect decimal places approximately doubles with each iteration, which is typical of
quadratic convergence.

Advantages and disadvantages of Newton’s Method

• The error decreases rapidly with each iteration

• Newton’s method is very fast. (Compare with bisection method!)

• Unfortunately, for bad choices of x0 (the initial guess) the method can fail to con-
verge! Therefore the choice of x 0 is VERY IMPORTANT!

• Each iteration of Newton’s method requires two function evaluations, while the
bisection method requires only one.

Note: A good strategy for avoiding failure to converge would be to use the bisection
method for a few steps (to give an initial estimate and make sure the sequence of guesses


is going in the right direction) followed by Newton’s method, which should converge very
fast at this point.

2.6 Secant Method

It is the most important variant of Netwon-Raphson method. The idea behind the Secant
Method is as follows. Assume we need to find a root of the equation f (x) = 0, called
α. Consider the graph of the function f (x) and two initial estimates of the root, x0
and x1 . Unlike the Bisection and Regular-falsi method, the two initial guesses do not
need to bracket the root of the equation. Thus, The secant method is an open method
but a two-point iteration method and may or may not converge. However, when secant
method converges, it will typically converge faster than the Bisection method. However,
since the derivative is approximated as given by Equation (2.23), it converges slower than
the Newton-Raphson method.

The two points (x0 , f (x0 )) and (x1 , f (x1 )) on the graph of f (x) determine a straight line,
called a secant line which can be viewed as an approximation to the graph.

Figure 2.5: Root finding using the Secant method

The straight line passing through the two points (x0 , f (x0 )) and (x1 , f (x1 )) can be expressed
as


(y − f (x1 )) / (x − x1 ) = (f (x0 ) − f (x1 )) / (x0 − x1 )
We let y = 0 and solve the equation for x, now renamed as x2 , as an approximation of
the root
x2 = x1 − f (x1 ) (x1 − x0 ) / (f (x1 ) − f (x0 ))
The point x2 where this secant line crosses the x−axis is then the next approximation
for the root α.

The general procedure for Secant Method is given bellow in Algorithm 5

Algorithm 5: Secant Method


Input: Continuous function f (x) ;
Initial guesses x0 and x1 ;
Error tolerance . ;
Output: Approximate solution that is within  of a root of f (x);
Step 1. n = 1 (Initialize the counter);
Step 2. xn−1 = x0 ; xn = x1 ;
Step 3. while |f (xn )| ≥  do
xn − xn−1
xn+1 = xn − f (xn ) ;
f (xn ) − f (xn−1 )
if f (xn+1 ) == 0 then
return xn+1 (we have found a solution);
end
n=n+1
end
Step 4. return xn (return the solution)

Alternatively, we can view the secant method as Newton's method in which, instead of
using f '(xn ), we approximate the derivative by a finite difference, the secant, i.e. the
slope of the straight line that goes through the two most recent approximations xn and
xn−1 . This slope is given by

f '(xn ) ≈ (f (xn ) − f (xn−1 )) / (xn − xn−1 ).                             (2.23)

Inserting this expression for f 0 (xn ) in Newton’s method simply gives us the secant method:

xn+1 = xn − f (xn ) / [ (f (xn ) − f (xn−1 )) / (xn − xn−1 ) ] ,


or

xn+1 = xn − f (xn ) (xn − xn−1 ) / (f (xn ) − f (xn−1 ))                     (2.24)
Comparing Equation (2.24) to the graph in Figure 2.5, we see how two chosen starting
points x0 , x1 , and corresponding function values are used to compute x2 . Once we have
x2 , we similarly use x1 and x2 to compute x3 . As with Newton’s method, the procedure
is repeated until |f (xn )| or |xn+1 − xn |is below some chosen error tolerance ().
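As a hedged illustration (the helper name secant is mine, not the text's), Equation (2.24) translates almost line for line into Python:

# Secant iteration from Equation (2.24); illustrative sketch only.
def secant(f, x0, x1, eps=1e-6, max_iter=50):
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if abs(f1) < eps:                        # stop when |f(x_n)| < eps
            return x1
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)     # Equation (2.24)
        x0, x1 = x1, x2
    return x1

# Same test problem as Example 2.7 below: 2x^3 - 2.5x - 5 = 0 with x0 = 1, x1 = 2
print(secant(lambda x: 2*x**3 - 2.5*x - 5, 1.0, 2.0))   # about 1.6601003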

Example 2.7

Solve 2x3 − 2.5x − 5 = 0 for the root with a = 1, b = 2 by secant method to an


accuracy of 10−6 .
Solution:
Iteration 1: Set x0 = a = 1 ; x1 = b = 2, and we have

f (x0 ) = f (1) = −5.5 and f (x1 ) = f (2) = 6.0

Therefore,

x2 = x1 − f (x1 )(x1 − x0 ) / (f (x1 ) − f (x0 ))
   = (f (x1 ) x0 − f (x0 ) x1 ) / (f (x1 ) − f (x0 ))
   = (6(1) − (−5.5)(2)) / (6 − (−5.5))
   = 1.4782608696

Since |f (x2 )| = 2.2348977 > 10−6 we repeat the process with
(x1 , f (x1 )), (x2 , f (x2 )) and so on till we get a ξ = xn such that |f (ξ)| < ε = 10−6 .
These results are tabulated below:

n xn−1 xn xn+1 |xn − xn−1 | |f (xn )|


1 1.0000000000 2.0000000000 1.4782608696 0.5217391304 2.2348976740
2 2.0000000000 1.4782608696 1.6198574765 0.1415966069 0.5488317259
3 1.4782608696 1.6198574765 1.6659486215 0.0460911450 0.0824254417
4 1.6198574765 1.6659486215 1.6599303406 0.0060182810 0.0023855241
5 1.6659486215 1.6599303406 1.6600996200 0.0001692795 0.0000098734
6 1.6599303406 1.6600996200 1.6601003236 0.0000007035 0.0000000012


Convergence Analysis for Secant Method

We now consider the order of convergence of the secant method. Let α be the true root
of the equation f (α) = 0, and the error of xn+1 is:

en+1 = xn+1 − α
     = xn − f (xn )(xn − xn−1 ) / (f (xn ) − f (xn−1 )) − α
     = [ (xn−1 − α) f (xn ) − (xn − α) f (xn−1 ) ] / (f (xn ) − f (xn−1 ))
     = [ en−1 f (xn ) − en f (xn−1 ) ] / (f (xn ) − f (xn−1 ))

The Taylor expansion of f (xn ) is

f (xn ) = f (α + en ) = f (α) + f '(α) en + (1/2) f ''(α) en² + O(en³) = f '(α) en + (1/2) f ''(α) en² + O(en³)

(as α is the solution of f (α) = 0) and similarly

f (xn−1 ) = f '(α) en−1 + (1/2) f ''(α) en−1² + O(en−1³)

Substituting these into the expression for en+1 we get

en+1 = { en−1 [ f '(α) en + (1/2) f ''(α) en² + O(en³) ] − en [ f '(α) en−1 + (1/2) f ''(α) en−1² + O(en−1³) ] }
       / { [ f '(α) en + (1/2) f ''(α) en² + O(en³) ] − [ f '(α) en−1 + (1/2) f ''(α) en−1² + O(en−1³) ] }

     = [ en−1 en f ''(α)(en − en−1 )/2 + O(en−1⁴) ] / [ (en − en−1 ) f '(α) + (en² − en−1²) f ''(α)/2 + O(en−1³) ]

     = [ en−1 en f ''(α)/2 + O(en−1³) ] / [ f '(α) + (en + en−1 ) f ''(α)/2 + O(en−1²) ]

     = [ en−1 en f ''(α)/2 + O(en−1³) ] / [ f '(α) + O(en ) + O(en−1²) ]
                                                                             (2.25)

When n → ∞, all error terms in both the numerator and denominator of order higher
than the lowest order term approach to zero, we have


en+1 = en−1 en f ''(α) / (2 f '(α)) = en en−1 C

where we have defined C = f 00 (α)/2f 0 (α). To find the order of convergence, we need to
find p in

|en+1 | ≤ |en−1 | |en | |C| = µ|en |p

Solving the equation above for |en | we get

|en | = (|C|/µ)^(1/(p−1)) |en−1 |^(1/(p−1))

However, when n → ∞ we also have

|en | = µ |en−1 |^p

Comparing the two equations above we get

p = 1/(p − 1) ,        µ = (|C|/µ)^(1/(p−1))

Solving these two equations we get

p = (1 + √5)/2 = 1.618 ,        µ = |C|^(p/(p+1)) = |C|^((√5−1)/2) = |C|^0.618

i.e.,

|en+1 | = | f ''(α) / (2 f '(α)) |^0.618 |en |^1.618

We see that the order of convergence p = 1.618 of the secant method is better than linear
but worse than quadratic.


Advantages and disadvantages of Secant Method

• The error decreases slowly at first but then rapidly after a few iterations.

• The secant method is slower than Newton’s method but faster than the bisection
method.

• Each iteration of Newton’s method requires two function evaluations, while the
secant method requires only one

• The secant method does not require differentiation



Chapter 3
System of Equations

3.1 Introduction

In this chapter we will learn how to solve system of Equations of the form

f1 (x1 , x2 , · · · , xn ) = 0
f2 (x1 , x2 , · · · , xn ) = 0
.. (3.1)
.
fn (x1 , x2 , · · · , xn ) = 0,

where each fi is a real-valued function of the n variables x1 , x2 , · · · , xn .

Definition 3.1 Linear and nonlinear functions:


A function f : R → R is defined as being nonlinear when it does not satisfy the
superposition principle that is

f (x1 + x2 + · · · + xn ) 6= f (x1 ) + f (x2 ) + · · · + f (xn ).

On the other hand, if f satisfies the superposition principle then f is a linear function
and hence we can rewrite the equation f (x1 , x2 , · · · , xn ) = 0 as

a1 x1 + a2 x2 + · · · an xn = b.

If all the functions fi in Equation (3.1) are linear, then Equation (3.1) is called a
System of Linear Equations (SLEs); otherwise, if one of the fi 's is nonlinear, then it is
called a System of Nonlinear Equations (SNLEs). In the subsequent sections of this
chapter we will discuss methods for solving SLEs, and at the end we will give one
particular method, Newton's Method, for solving SNLEs.

In general, we can divide the approaches to the solution of linear algebraic equations into
two broad areas. The first of these involve algorithms that lead directly to a solution
of the problem after a finite number of steps while the second class involves an initial
"guess" which then is improved by a succession of finite steps, each set of which we will
call an iteration. If the process is applicable and properly formulated, a finite number of
iterations will lead to a solution.

3.2 Direct Methods for System of Linear Equations (SLE)

There are several methods which directly solve systems of linear equations, such as
Cramer's rule, Gaussian elimination, Gauss-Jordan, and QR factorization. However,
Cramer's rule is unsuitable for computer implementation and is not discussed here.
Among the direct methods, we only present Gaussian elimination and Gauss-Jordan
in detail.

In general, we may write a system of linear algebraic equations in the form

a11 x1 + a12 x2 + · · · a1n xn = b1


a21 x1 + a22 x2 + · · · a2n xn = b2
.. (3.2)
.
an1 x1 + an2 x2 + · · · ann xn = bn

where x1 , x2 , · · · , xn are the unknowns, a11 , a12 , · · · , ann are the coefficients of the sys-
tem, and b1 , b2 , · · · , bn the constant terms. The above system of linear equations (3.2)
can be written in matrix form as
Ax = b,

where the coefficient matrix A, the unknown column vector x and the constant column


vector b are given, respectively, as


     
      [ a11  a12  ···  a1n ]        [ x1 ]             [ b1 ]
A  =  [ a21  a22  ···  a2n ]   x =  [ x2 ]   and  b =  [ b2 ]
      [  ⋮    ⋮    ⋱    ⋮  ]        [  ⋮ ]             [  ⋮ ]
      [ an1  an2  ···  ann ]        [ xn ]             [ bn ]

3.2.1 Gaussian Method

The Gaussian elimination is arguably the most used method for solving a set of linear
algebraic equations. It makes use of the fact that a solution of a special system of linear
equations, namely the systems involving triangular matrices, can be constructed very
easily.

Forward and back substitution

A matrix equation Lx = b or U x = b is very easy to solve by an iterative process
called forward substitution or back substitution, respectively.

Consider the system of equations of the form Lx = b,


    
[ l11   0    0   ···   0  ] [ x1 ]   [ b1 ]
[ l21  l22   0   ···   0  ] [ x2 ]   [ b2 ]
[ l31  l32  l33  ···   0  ] [ x3 ] = [ b3 ]
[  ⋮    ⋮    ⋮    ⋱    ⋮  ] [  ⋮ ]   [  ⋮ ]
[ ln1  ln2  ln3  ···  lnn ] [ xn ]   [ bn ]

where L is a lower triangular matrix, i.e., lij = 0 for all i < j, and lii ≠ 0 for all i. We can
solve this system using forward substitution as follows. The process is so called because
we first compute x1 from the first equation, then substitute it forward into the next
equation to solve for x2 , and repeat through to xn .

Observe that the first equation l11 x1 = b1 only involves x1 , and hence we can compute
x1 directly. The second equation only involves x1 and x2 , so we can substitute the
computed value of x1 into this equation and solve for x2 . Continuing in this way, the
j th equation only involves x1 , x2 , · · · , xj , and one can solve for xj by substituting the
previously solved values x1 , x2 , · · · , xj−1 . Thus, we can write this procedure in a formal


way as below:

Forward Substitution (FS):

x1 = b1 / l11
x2 = (b2 − l21 x1 ) / l22
 ⋮                                                                           (3.3)
xi = (1/lii) ( bi − Σ_{j=1}^{i−1} lij xj )   for i = 3, 4, · · · , n.

On the other hand, if we consider a system of linear equation given by the following
matrix equation
Ux = β (3.4)

Or

[ u11  u12  u13  ···  u1n ] [ x1 ]   [ β1 ]
[  0   u22  u23  ···  u2n ] [ x2 ]   [ β2 ]
[  0    0   u33  ···  u3n ] [ x3 ] = [ β3 ]
[  ⋮    ⋮    ⋮    ⋱    ⋮  ] [  ⋮ ]   [  ⋮ ]
[  0    0    0   ···  unn ] [ xn ]   [ βn ]
where U is an upper triangular matrix such that all elements below the main diagonal
are zero and all the diagonal elements are non-zero, i.e., uii ≠ 0 for all i. For an upper
triangular matrix, similarly, we can work backwards, first computing xn , then substituting
xn back into the previous equation to solve for xn−1 , and repeating through to x1 .

To solve the system (3.4) one can start from the last equation

xn = βn / unn ,

and then proceed as follows:

xn−1 = (1/un−1,n−1) [ βn−1 − un−1,n xn ] ,
xn−2 = (1/un−2,n−2) [ βn−2 − un−2,n−1 xn−1 − un−2,n xn ] ,


In general, for the ith element xi we can write

xi = (1/uii) ( βi − Σ_{j=i+1}^{n} uij xj )

where i = n − 1, n − 2, · · · , 1. Since we proceed from i = n down to i = 1, this set of
calculations is referred to as back substitution. Thus, we have

Backward Substitution (BSF):

xn = βn / unn
                                                                             (3.5)
xi = (1/uii) ( βi − Σ_{j=i+1}^{n} uij xj ) ,   i = n − 1, · · · , 1.

Thus, the solution procedure for solving this system of equations involving a special type
of upper triangular matrix is particularly simple. However, the trouble is that most of
the problems encountered in real applications do not have such special form.
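A small Python sketch of the substitution formulas (3.3) and (3.5); the function names are my own, and the routines assume nonzero diagonal entries:

# Forward substitution for L x = b, following (3.3); L lower triangular, l_ii != 0.
def forward_substitution(L, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x

# Back substitution for U x = beta, following (3.5); U upper triangular, u_ii != 0.
def back_substitution(U, beta):
    n = len(beta)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (beta[i] - s) / U[i][i]
    return x

# The triangular system produced later in Example 3.1; prints values close to [1, 2, 3].
U = [[1.0, 1.0, 1.0], [0.0, -3.0, -1.0], [0.0, 0.0, 1.0/3.0]]
print(back_substitution(U, [6.0, -9.0, 1.0]))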

Now, suppose we want to solve a system of equations Ax = b where A is a general full


rank square matrix without any special form. Then, by some series of transformations,
can we convert the system from Ax = b form to U x = β form, i.e. to the desired special
form? It is important to carry out this transformation such that the original system of
equations and the transformed system of equations have identical solutions, x.

The following row operations on the augmented matrix of a system produce the aug-
mented matrix of an equivalent system, i.e., a system with the same solution as the
original one.

• Interchange any two rows.

• Multiply each element of a row by a nonzero constant.

• Replace a row by the sum of itself and a constant multiple of another row of the
matrix.

For these row operations, we will use the following notations.

• Ri ⇐⇒ Rj means: Interchange row i and row j.


• αRi means: Replace row i with α times row i.

• Ri + αRj means: Replace row i with the sum of row i and α times row j.

Now, we shall assume that the system has a unique solution, i.e., we assume aii 6= 0 and
proceed to describe the simple Gaussian Elimination method(GEM) for finding the
solution. The method reduces the system to an upper triangular system using elementary
row operations (ERO).

Gaussian Elimination method to solve a system of linear equations is described in the


following steps.

Step -1: write the augmented matrix of the system as follows:

 
          [ a11^(1)  a12^(1)  a13^(1)  ···  a1n^(1) | b1^(1) ]
          [ a21^(1)  a22^(1)  a23^(1)  ···  a2n^(1) | b2^(1) ]
A(1)  =   [ a31^(1)  a32^(1)  a33^(1)  ···  a3n^(1) | b3^(1) ] ,   where aij^(1) = aij and bj^(1) = bj .
          [    ⋮        ⋮        ⋮      ⋱      ⋮    |   ⋮    ]
          [ an1^(1)  an2^(1)  an3^(1)  ···  ann^(1) | bn^(1) ]

Step -2: Perform elementary row operations to get zeros below the diagonal.
2 -1: Assuming the pivot element a11^(1) ≠ 0, we apply ERO on A(1) to reduce all entries
below a11^(1) to zero. Let the resulting matrix be denoted by A(2):

A(1) ──(Ri ← Ri + mi1^(1) R1)──→ A(2) ,   where  mi1^(1) = − ai1^(1) / a11^(1) .

Here, the mi1^(1) are called the multipliers for each row i, and A(2) is of the form

          [ a11^(1)  a12^(1)  a13^(1)  ···  a1n^(1) | b1^(1) ]
          [    0     a22^(2)  a23^(2)  ···  a2n^(2) | b2^(2) ]
A(2)  =   [    0     a32^(2)  a33^(2)  ···  a3n^(2) | b3^(2) ]
          [    ⋮        ⋮        ⋮      ⋱      ⋮    |   ⋮    ]
          [    0     an2^(2)  an3^(2)  ···  ann^(2) | bn^(2) ]

In addition, the above row operations on A(1) can be effected by pre-multiplying


A(1) by M(1), where

          [    1      0  0  ···  0 ]
          [ m21^(1)                ]
M(1)  =   [ m31^(1)      In−1      ] ,
          [    ⋮                   ]
          [ mn1^(1)                ]

where In−1 is the (n − 1) × (n − 1) identity matrix. Thus, we have

M(1) A(1) = A(2)

Then the system Ax = b is equivalent to A(2) x = b(2)


2-2: Assume the pivot element a22^(2) ≠ 0 and reduce all entries below a22^(2) to zero by ERO:

A(2) ──(Ri ← Ri + mi2^(2) R2)──→ A(3) ,   where  mi2^(2) = − ai2^(2) / a22^(2) ,  for i > 2,

and M(2) is given by

          [ 1     0      0  ···  0 ]
          [ 0     1      0  ···  0 ]
M(2)  =   [ 0  m32^(2)             ] ,
          [ 0  m42^(2)     In−2    ]
          [ ⋮     ⋮                ]
          [ 0  mn2^(2)             ]

and hence M(2) A(2) = A(3), where A(3) is of the form

          [ a11^(1)  a12^(1)  a13^(1)  ···  a1n^(1) | b1^(1) ]
          [    0     a22^(2)  a23^(2)  ···  a2n^(2) | b2^(2) ]
A(3)  =   [    0        0     a33^(3)  ···  a3n^(3) | b3^(3) ]
          [    ⋮        ⋮        ⋮      ⋱      ⋮    |   ⋮    ]
          [    0        0     an3^(3)  ···  ann^(3) | bn^(3) ]

Repeat these procedures until you get A(n). These procedures are generalized
as follows at the kth step.
2 -k: Assume akk^(k) ≠ 0, then reduce all entries below akk^(k) to zero by applying


ERO, i.e.,

A(k) ──(Ri ← Ri + mik^(k) Rk)──→ A(k+1) ,   where  mik^(k) = − aik^(k) / akk^(k) ,  for i > k,

and M(k) is of the form

          [        Ik              0     ]
          [ 0 ··· 0  m(k+1,k)^(k)        ]
M(k)  =   [ 0 ··· 0  m(k+2,k)^(k)        ]
          [ ⋮           ⋮         In−k   ]
          [ 0 ··· 0  m(n,k)^(k)          ]

and hence M(k) A(k) = A(k+1), where A(k+1) has zeros below the diagonal in its first k columns:

            [ a11^(1)  a12^(1)  ···  a1,k+1^(1)        ···  a1n^(1)        | b1^(1)       ]
            [    0     a22^(2)  ···  a2,k+1^(2)        ···  a2n^(2)        | b2^(2)       ]
A(k+1)  =   [    ⋮              ⋱         ⋮                    ⋮           |    ⋮         ]
            [    0        0     ···  a(k+1,k+1)^(k+1)  ···  a(k+1,n)^(k+1) | b(k+1)^(k+1) ]
            [    ⋮                        ⋮                    ⋮           |    ⋮         ]
            [    0        0     ···  a(n,k+1)^(k+1)    ···  a(n,n)^(k+1)   | bn^(k+1)     ]

Continue this procedure until the (n − 1)th step, which is given below.
2 -(n−1): Assuming a(n−1,n−1)^(n−1) ≠ 0, reduce all entries below a(n−1,n−1)^(n−1) to zero by applying

A(n−1) ──(Rn ← Rn + m(n,n−1)^(n−1) Rn−1)──→ A(n) ,
where  m(n,n−1)^(n−1) = − a(n,n−1)^(n−1) / a(n−1,n−1)^(n−1) ,

and M(n−1) is of the form

            [                              ]
            [        In−1             0    ]
M(n−1)  =   [                              ]
            [ 0  0  ···  m(n,n−1)^(n−1)  1 ]


and hence M(n−1) A(n−1) = A(n), where A(n) is of the form

          [ a11^(1)  a12^(1)  a13^(1)  ···  a1n^(1) | b1^(1) ]
          [    0     a22^(2)  a23^(2)  ···  a2n^(2) | b2^(2) ]
A(n)  =   [    0        0     a33^(3)  ···  a3n^(3) | b3^(3) ]
          [    ⋮        ⋮        ⋮      ⋱      ⋮    |   ⋮    ]
          [    0        0        0     ···  ann^(n) | bn^(n) ]

At the end of the (n − 1)th -step, in general, we have

M (n−1) M (n−2) · · · M (1) A(1) = A(n) , (3.6)

where A(1) is an augmented matrix of the original matrix A and vector b.

Step -3: Inspect the resulting matrix and re-interpret it as a system of equations.

a If you get a zero diagonal element then the system cannot be solved using
GEM or does not have a solution. In this case the matrix is probably a
singular matrix.
b If you get less equations than unknowns after the reduction and if there is a
solution then there is an infinite number of solutions.
c If you get as many equations as unknowns after the reduction and if there is
a solution then there is exactly one solution.

Step -4: Apply the BSF to get the solution of the given system if part (c) in step -3 is true.

Note, as a "by-product", in addition to solving the system Ax = b, the simple GEM can
also be used to evaluate det A, provided akk^(k) ≠ 0 for each k. Note further that each
M(k) is a lower triangular matrix with all diagonal entries equal to 1, so det M(k) = 1
for every k. Now, from Equation (3.6), taking the un-augmented parts A'(n) and A'(1)
of A(n) and A(1), we have

det A'(n) = det M(n−1) det M(n−2) · · · det M(1) det A'(1) = det A'(1) = det A,   since A'(1) = A.

Now A'(n) is an upper triangular matrix and hence its determinant is the product of its
diagonal elements a11^(1) a22^(2) · · · ann^(n).


Thus det A is given by

det A = a11^(1) a22^(2) · · · ann^(n) .

In addition, note that all the matrices M(k) are lower triangular and nonsingular, as
det M(k) = 1 ≠ 0 for all k. They are therefore all invertible and their inverses are lower
triangular, i.e., if

N = M(n−1) M(n−2) · · · M(1)

then N is lower triangular and nonsingular, and N−1 is also lower triangular. Now

N A = N A'(1) = M(n−1) M(n−2) · · · M(1) A'(1) = A'(n) .

Therefore

A = N−1 A'(n) .

Now N−1 is lower triangular, which we denote by L, and A'(n) is upper triangular, which
we denote by U, and we thus get the so-called LU decomposition

A = LU ,

of the given matrix A as a product of a lower triangular matrix with an upper triangular
matrix. This is another application of the simple GEM.

REMEMBER: IF AT ANY STAGE WE GET akk^(k) = 0, WE CANNOT PROCEED
FURTHER WITH THE SIMPLE GEM.

Example 3.1:
Solve the following system using Gauss elimination method.

x + y + z = 6
2x − y + z = 3                                                               (3.7)
x + z = 4

Solution: Step-1: Form the augmented matrix:

          [ 1   1  1 | 6 ]
A(1)  =   [ 2  −1  1 | 3 ]                                                   (3.8)
          [ 1   0  1 | 4 ]


Step -2: Perform elementary row operations to get zeros below the diagonal.
2-1: First compute the multipliers for the second and third row:

m21^(1) = − a21^(1)/a11^(1) = − 2/1 = −2   and   m31^(1) = − a31^(1)/a11^(1) = − 1/1 = −1.

Thus,
          [  1  0  0 ]
M(1)  =   [ −2  1  0 ]                                                       (3.9)
          [ −1  0  1 ]

Now, multiplying matrix (3.8) by matrix (3.9), we get A(2), i.e.,

         [  1  0  0 ] [ 1   1  1 | 6 ]   [ 1   1   1 |  6 ]
A(2)  =  [ −2  1  0 ] [ 2  −1  1 | 3 ] = [ 0  −3  −1 | −9 ]                  (3.10)
         [ −1  0  1 ] [ 1   0  1 | 4 ]   [ 0  −1   0 | −2 ]

2-2: Next compute the multiplier for the third row:

m32^(2) = − a32^(2)/a22^(2) = − (−1)/(−3) = − 1/3 .

Thus,
          [ 1    0   0 ]
M(2)  =   [ 0    1   0 ]                                                     (3.11)
          [ 0  −1/3  1 ]

Now, multiplying matrix (3.10) by matrix (3.11), we get A(3), i.e.,

         [ 1    0   0 ] [ 1   1   1 |  6 ]   [ 1   1    1  |  6 ]
A(3)  =  [ 0    1   0 ] [ 0  −3  −1 | −9 ] = [ 0  −3   −1  | −9 ]
         [ 0  −1/3  1 ] [ 0  −1   0 | −2 ]   [ 0   0   1/3 |  1 ]

Step-3 Inspect the resulting system.

Dropping the augmentation of A(3) gives

       [ 1   1   1  ]              [  6 ]
A'  =  [ 0  −3  −1  ]   and  b' =  [ −9 ]
       [ 0   0  1/3 ]              [  1 ]


and hence we have the system of equations

x + y + z = 6
−3y − z = −9                                                                 (3.12)
(1/3) z = 1.

Note that Equation (3.12) and Equation (3.7) have the same solution since the
applied ERO do not change the solutions.
Here, we can see that there are no zeros on the diagonal of A' and we have as many
equations as unknowns. Thus, the system has a unique solution and the BSF method
can be applied to solve the system.

Step-4 Solve the system using the BSF:

From the last equation we get

z = 1 / (1/3) = 3.

Next, from the second equation we solve for y, with z = 3:

y = (−9 + z) / (−3) = (−6) / (−3) = 2.

Finally, from the first equation we solve for x, with y = 2 and z = 3:

x = 6 − y − z = 6 − 2 − 3 = 1.

Therefore, the solution of the given system of equations (3.7) is

x = 1, y = 2 and z = 3.
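The four steps above condense into a short routine. The following Python sketch of the simple GEM (no pivoting) is illustrative only, uses my own function name, and assumes all pivots are nonzero:

# Simple Gaussian elimination with back substitution; assumes nonzero pivots.
def gauss_eliminate(A, b):
    n = len(b)
    A = [row[:] for row in A]              # work on copies
    b = b[:]
    for k in range(n - 1):                 # stage k of the elimination
        for i in range(k + 1, n):
            m = -A[i][k] / A[k][k]         # multiplier m_ik
            for j in range(k, n):
                A[i][j] += m * A[k][j]
            b[i] += m * b[k]
    x = [0.0] * n                          # back substitution (BSF)
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# Example 3.1: x + y + z = 6, 2x - y + z = 3, x + z = 4; prints values close to [1, 2, 3]
print(gauss_eliminate([[1, 1, 1], [2, -1, 1], [1, 0, 1]], [6, 3, 4]))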

Exercise 3.1:
Solve the following system of linear equations using the GEM:

(a)  x + y + z = 6        (b)  x + y + z = 6        (c)  x + y + z = 6
     2x − y + z = 3            2x − y + z = 3            2x − y + z = 3
     x + z = 4                 x + z = 4                 2x + 2y + 2z = 12
                               2x + y + z = 7            2x + y + z = 8


We observed that in order to apply the simple GEM we need

akk^(k) ≠ 0

at each stage k. This may not always be satisfied, so we have to modify the simple GEM
in order to overcome this situation. Further, even if the condition akk^(k) ≠ 0 is satisfied
at each stage, simple GEM may not be a very accurate method to use. What do we mean
by this?

Example 3.2:
Consider, as an example, the following system:

(0.000003)x + (0.213472)y + (0.332147)z = 0.235262


(0.215512)x + (0.375623)y + (0.476625)z = 0.127653 (3.13)
(0.173257)x + (0.663257)y + (0.625675)z = 0.285321

Solve this system by GEM and perform the computations to 6 significant digits.
Solution:
Step-1: Form the augmented matrix:
 
0.000003 0.213472 0.332147 | 0.235262
 
A(1) = 0.215512 0.375623 0.476625 | 0.127653
 
 
0.173257 0.663257 0.625675 | 0.285321

Step -2: Perform elementary row operations to get zeros below the diagonal.
2-1: First compute the multipliers for the second and third row. Since a11^(1) ≠ 0, we have

m21^(1) = − a21^(1)/a11^(1) = − 0.215512/0.000003 = −71837.3   and
m31^(1) = − a31^(1)/a11^(1) = − 0.173257/0.000003 = −57752.3.

Thus,
          [     1      0  0 ]
M(1)  =   [ −71837.3   1  0 ]
          [ −57752.3   0  1 ]


Now, multiplying A(1) by M (1) results in A(2) , i.e.,


 
0.000003 0.213472 0.332147 | 0.235262
 
A(2) =  0 −15334.9 −23860.0 | −16900.5
 
 
0 −12327.8 −19181.7 | −13586.6

(2)
2-2: Compute the multipliers for third row, since a22 = −15334.9 6= 0 we have

(2)
(2) a32 −12327.8
m3,2 =− (2)
=− = −0.803905
a22 −15334.9

Thus,  
1 0 0
 
M (2) = 0 1 0
 
 
0 −0.803905 1

Thus, A(3) is obtained by multiplying A(2) by M(2), and we have

         [ 0.000003   0.213472   0.332147 |  0.235262 ]
A(3)  =  [     0     −15334.9   −23860.0  | −16900.5  ]
         [     0          0      −0.50000 |  −0.20000 ]

Step-3 Inspect the resulting system.

Dropping the augmentation of A(3) gives

       [ 0.000003   0.213472   0.332147 ]              [  0.235262 ]
A'  =  [     0     −15334.9   −23860.0  ]   and  b' =  [ −16900.5  ]
       [     0          0      −0.50000 ]              [  −0.20000 ]

and hence we have the system of equation

0.000003x + 0.213472y + 0.332147z = 0.235262


−15334.9y − 23860.0z = −16900.5 (3.14)
−0.50000z = −0.20000

Here, we can see that there are no zeros on the diagonal of A' and we have as many
equations as unknowns. Thus, the system has a unique solution and the BSF method
can be applied to solve the system.

Step-4 Solve the system using the BSF:


From the last equation we get

z = −0.20000 / (−0.50000) = 0.400000.

Next, from the second equation we solve for y, with z = 0.400000:

y = (−16900.5 + 23860.0 z) / (−15334.9) = 0.479723.

Finally, from the first equation we solve for x, with y = 0.479723 and z = 0.400000:

x = (0.235262 − (0.213472 y + 0.332147 z)) / 0.000003 = −1.33333.

Therefore, the computed solution of the system of equations (3.13) is

x = −1.33333,
y = 0.479723,
z = 0.400000.

This compares poorly with the correct answers (to 10 digits) given by

x = −0.9912894252
y = 0.0532039339 (3.15)
z = 0.6741214691

of the given system of equations (3.13).

Thus we see that the simple Gaussian Elimination method needs modification in order
to handle situations that may lead to akk^(k) = 0 for some k, or situations such as the
one arising in the above example. In order to alleviate such problems we introduce the
idea of Partial Pivoting.

3.2.2 Gaussian method with partial pivoting

When performing Gaussian elimination, the diagonal element that one uses during the
elimination procedure is called the pivot. To obtain the correct multiple, one uses the
pivot as the divisor to the elements below the pivot. Gaussian elimination in this form
will fail if the pivot is zero. In this situation, a row interchange must be performed.


Even if the pivot is not identically zero, a small value can result in big round-off errors.
For very large matrices, one can easily lose all accuracy in the solution. To avoid these
round-off errors arising from small pivots, row interchanges are made, and this technique
is called partial pivoting (partial pivoting is in contrast to complete pivoting, where both
rows and columns are interchanged).

The idea of partial pivoting is as follows. At the k th stage we shall be trying to reduce
all the entries below the k th diagonal to zero as we did in the simple GEM. However,
before we do this we look at the entries in the k th diagonal and below it and then pick
the one that has the largest absolute value and we bring it to the k th diagonal position
by a row interchange, and then reduce the entries below the k th diagonal to zero. When
we incorporate this idea at each stage of the Gaussian elimination process we get the
Gaussian Elimination Method with Partial Pivoting. We now illustrate this with
a few examples:

Example 3.3:
Solve the following system using Gauss elimination method.

x + y + 2z = 4
2x − y + z = 2
x + 2y = 3

Solution:
Step-1: Form the augmented matrix:

          [ 1   1  2 | 4 ]
A(1)  =   [ 2  −1  1 | 2 ]
          [ 1   2  0 | 3 ]

Step -2: Perform elementary row operations to get zeros below the diagonal.
2-1 Select the pivot row by comparing the element a11^(1) and the entries below it and
picking the one with the largest absolute value. Thus, the pivot element has to be chosen
as a21^(1) = 2, as this is the largest absolute valued entry in the first column. Therefore
we need to interchange row-1 and row-2, and hence we have

          [ 0  1  0 ]
M(1)  =   [ 1  0  0 ] ,
          [ 0  0  1 ]


and
                       [ 0  1  0 ] [ 1   1  2 | 4 ]   [ 2  −1  1 | 2 ]
A(2)  =  M(1) A(1)  =  [ 1  0  0 ] [ 2  −1  1 | 2 ] = [ 1   1  2 | 4 ]
                       [ 0  0  1 ] [ 1   2  0 | 3 ]   [ 1   2  0 | 3 ]

Note that multiplying the matrix A(1) by the matrix M(1) simply interchanges
row-1 and row-2 of the matrix A(1).

2-2: Now, compute the multipliers for the second and third row:

m21^(2) = − a21^(2)/a11^(2) = − 1/2   and   m31^(2) = − a31^(2)/a11^(2) = − 1/2 .

Thus,
          [   1   0  0 ]
M(2)  =   [ −1/2  1  0 ]
          [ −1/2  0  1 ]

Now, multiplying matrix A(2) by matrix M(2), we get A(3), i.e.,

          [ 2  −1    1   | 2 ]
A(3)  =   [ 0  3/2  3/2  | 3 ]
          [ 0  5/2  −1/2 | 2 ]

2-3 Next, the pivot element a32^(3) = 5/2 is selected, since this is the entry with the
largest absolute value in the first column of the remaining submatrix. So we have to do
another row interchange, i.e., interchange row-2 and row-3. Let

          [ 1  0  0 ]
M(3)  =   [ 0  0  1 ] ,
          [ 0  1  0 ]

and we have
                       [ 2  −1    1   | 2 ]
A(4)  =  M(3) A(3)  =  [ 0  5/2  −1/2 | 2 ]
                       [ 0  3/2  3/2  | 3 ]

2-4: Compute the multiplier for the third row:

m32^(4) = − a32^(4)/a22^(4) = − (3/2)/(5/2) = − 3/5 .


Thus,
          [ 1    0   0 ]
M(4)  =   [ 0    1   0 ]
          [ 0  −3/5  1 ]

and here we have
                       [ 2  −1    1   |  2  ]
A(5)  =  M(4) A(4)  =  [ 0  5/2  −1/2 |  2  ] .
                       [ 0   0   9/5  | 9/5 ]

This completes the reduction, and we have that the given system is equivalent to
the system

A' x = b' ,

where
       [ 2  −1    1   ]              [  2  ]
A'  =  [ 0  5/2  −1/2 ]   and  b' =  [  2  ]
       [ 0   0   9/5  ]              [ 9/5 ]

Step-3 Inspect the resulting system.

2x − y + z = 2
(5/2) y − (1/2) z = 2
(9/5) z = 9/5 .

Here, we can see that there are no zeros on the diagonal of A' and we have as many
equations as unknowns. Thus, the system has a unique solution and the BSF method
can be applied to solve the system.

Step-4 Solving the above system using the BSF results in

x = 1, y = 1 and z = 1.
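A hedged Python sketch of GEM with partial pivoting (the function name is my own): at each stage it searches the current column for the entry of largest absolute value and swaps rows before eliminating.

# Gaussian elimination with partial pivoting and back substitution (illustrative).
def gauss_partial_pivot(A, b):
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n - 1):
        # choose the pivot row: largest |a_ik| for i >= k
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if p != k:
            A[k], A[p] = A[p], A[k]
            b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = -A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] += m * A[k][j]
            b[i] += m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# Example 3.3: x + y + 2z = 4, 2x - y + z = 2, x + 2y = 3; prints values close to [1, 1, 1]
print(gauss_partial_pivot([[1, 1, 2], [2, -1, 1], [1, 2, 0]], [4, 2, 3]))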

Exercise 3.2:
Solve the system of linear equations in Equation (3.13) by using GEM with partial
pivoting and compare the results with the correct answer in Equation (3.15).


Determinant Evaluation

Notice that even in the partial pivoting method we get matrices M (k) , M (k−1) · · · M (1)
such that the product M (k) M (k−1) · · · M (1) A0 is upper triangular matrix and therefore
det M (k) det M (k−1) · · · det M (1) det A0 = Product of the diagonal entries in the final
upper triangular matrix.

Now if M(i) is used to make entries below a diagonal zero, then det M(i) = 1; and if it
is used for a row interchange needed for partial pivoting, then det M(i) = −1. Therefore,
det M(k) det M(k−1) · · · det M(1) = (−1)^m, where m is the number of row interchanges
effected in the reduction. Hence, det A = (−1)^m × (product of the diagonal elements in
the final upper triangular matrix A').

3.2.3 Gauss Jordan Method

Consider the following matrix


 
      [ 1  0  0   4 ]
A  =  [ 0  1  0   3 ]
      [ 0  0  1  −1 ]

This is an example of a matrix in reduced row-echelon form. To be in this form, a


matrix must have the following properties:

1. If a row does not consist entirely of zeros, then the first nonzero number in the row
is a 1. (We call this the leading 1)

2. If there are any rows that consist entirely of zeros, then they are grouped together
at the bottom of the matrix.

3. In any two successive rows that do not consist entirely of zeros, the leading 1 in the
lower row occurs farther to the right than the leading 1 in the higher row.

4. Each column that contains a leading 1 has zeros everywhere else.

A matrix having properties 1, 2 and 3 (but not necessarily 4) is said to be in row-echelon


form. The following two matrices are in row-echelon form:

[ 1  2   4   2 ]        [ 0  2  4  7  10 ]
[ 0  7  −1   5 ]  ,     [ 0  0  3  3  −6 ]
[ 0  0   3  −1 ]        [ 0  0  0  4  −1 ]
                        [ 0  0  0  0   0 ]

Typical examples of matrices in reduced row-echelon form are

[ 1  2  0  0 | 10 ]        [ 1  0  0 |  2 ]
[ 0  0  1  0 | −6 ]        [ 0  1  0 |  5 ]
[ 0  0  0  1 | −1 ]  ,     [ 0  0  1 | −1 ] .
[ 0  0  0  0 |  0 ]

Gaussian elimination can be used to reduce an augmented matrix to row-echelon form.


A process called Gauss-Jordan elimination (which is an extended version of Gaussian
elimination) is used to reduce an augmented matrix to reduced row-echelon form.
Recall that with Gaussian elimination it was only necessary to get zeros below each
leading 1. With Gauss-Jordan elimination it is desirable to get zeros above and below
each leading 1 and we could also do the reduction here by partial pivoting.

Consider the following example. Gauss-Jordan elimination is the exact same as Gaussian
elimination until the matrix is in row-echelon form.

Example 3.4:
Solve the following system using Gauss-Jordan method.

x + y + 2z = 9
2x + 4y − 3z = 1
3x + 6y − 5z = 0

Solution:
Step-1: Form the augmented matrix:

          [ 1  1   2 | 9 ]
A(1)  =   [ 2  4  −3 | 1 ]
          [ 3  6  −5 | 0 ]

Step-2 Reduce the matrix to reduced row-echelon form.


2-1: Normalize the diagonal element in the first column to get a leading 1; here it is
already a leading 1.

2-2: Perform elementary row operations to get zeros below the diagonal of column-1, i.e.,
compute the multipliers for the second and third row:

m21^(1) = − a21^(1)/a11^(1) = − 2/1 = −2   and   m31^(1) = − a31^(1)/a11^(1) = − 3/1 = −3.

Thus,
          [  1  0  0 ]
M(1)  =   [ −2  1  0 ]
          [ −3  0  1 ]

Now, multiplying matrix A(1) by matrix M(1), we get A(2), i.e.,

          [ 1  1    2  |   9  ]
A(2)  =   [ 0  2   −7  | −17  ]
          [ 0  3  −11  | −27  ]

2-3 Normalize the diagonal element in the second column to get the leading 1. Here, we
need to divide the second row by 2. Thus, we have

          [ 1   0   0 ]
M(2)  =   [ 0  1/2  0 ]
          [ 0   0   1 ]

and
                       [ 1  1    2   |    9   ]
A(3)  =  M(2) A(2)  =  [ 0  1  −7/2  | −17/2  ]
                       [ 0  3  −11   |  −27   ]

2-4 Compute the multiplier for the third row: m32^(3) = − a32^(3)/a22^(3) = − 3/1 = −3.

Thus,
          [ 1   0  0 ]
M(3)  =   [ 0   1  0 ]
          [ 0  −3  1 ]


and hence
                       [ 1  1    2   |   9    ]
A(4)  =  M(3) A(3)  =  [ 0  1  −7/2  | −17/2  ] .
                       [ 0  0  −1/2  |  −3/2  ]

2-5 Normalize the diagonal element of the third row, i.e.,

          [ 1  0   0 ]
M(4)  =   [ 0  1   0 ]
          [ 0  0  −2 ]

and hence
                       [ 1  1    2   |   9    ]
A(5)  =  M(4) A(4)  =  [ 0  1  −7/2  | −17/2  ] .
                       [ 0  0    1   |   3    ]

This matrix is now in row-echelon form (upper triangular). To solve this matrix
using Gauss-Jordan method we need to do one more step.
Step-3: Beginning with the last nonzero row and working upwards, add suitable multi-
ples of each row to the rows above it to introduce zeroes above the leading 1’s.

3-1 Compute the multipliers for the first and second row using the third row as a pivot:

m13^(5) = − a13^(5)/a33^(5) = −2   and   m23^(5) = − a23^(5)/a33^(5) = 7/2 .

Thus,
          [ 1  0  −2  ]
M(5)  =   [ 0  1  7/2 ]
          [ 0  0   1  ]

and hence
                       [ 1  1  0 | 3 ]
A(6)  =  M(5) A(5)  =  [ 0  1  0 | 2 ] .
                       [ 0  0  1 | 3 ]
We are now finished with column 3, as there are all zeros above the leading 1. The final
step involves looking at column 2 and getting a zero above the leading 1. To do this we
compute the multiplier for the first row using the second row as a pivot, i.e.,

m12^(6) = − a12^(6)/a22^(6) = −1


and hence we have

          [ 1  −1  0 ]
M(6)  =   [ 0   1  0 ]
          [ 0   0  1 ]

and hence
                       [ 1  0  0 | 1 ]
A(7)  =  M(6) A(6)  =  [ 0  1  0 | 2 ] .
                       [ 0  0  1 | 3 ]
This completes the reduction, and we have that the given system is equivalent to the
system

A' x = b' ,

where
       [ 1  0  0 ]              [ 1 ]
A'  =  [ 0  1  0 ]   and  b' =  [ 2 ]
       [ 0  0  1 ]              [ 3 ]

Step-4 Inspect the resulting system.

x + 0y + 0z = 1
0x + y + 0z = 2
0x + 0y + z = 3.

Here, there are no zeros on the diagonal of A' and we have as many equations as
unknowns. Thus, the system has a unique solution, which is equal to b', i.e.,

x = 1, y = 2, and z = 3.

Note that the final coefficient matrix is in reduced row-echelon form and is the identity matrix.
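For completeness, here is a compact Python sketch of Gauss-Jordan reduction of the augmented matrix [A | b] to reduced row-echelon form; it is illustrative only, uses my own function name, assumes nonzero pivots, and does no pivoting.

# Gauss-Jordan: reduce [A | b] to [I | x]; assumes nonzero pivots (illustrative).
def gauss_jordan_solve(A, b):
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for k in range(n):
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]                 # normalize to a leading 1
        for i in range(n):                             # zero the rest of column k
            if i != k:
                factor = M[i][k]
                M[i] = [vi - factor * vk for vi, vk in zip(M[i], M[k])]
    return [M[i][n] for i in range(n)]                 # last column is the solution

# Example 3.4; prints values close to [1, 2, 3]
print(gauss_jordan_solve([[1, 1, 2], [2, 4, -3], [3, 6, -5]], [9, 1, 0]))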

Remark:

The natural question that we need to ask at this point is "why the extra computations
in the Gauss-Jordan Method?". The Gauss-Jordan Method also yields the inverse of the
original matrix, and the determinant of the matrix is obtained by simply taking the
product of the determinants of the M(k)'s. The inverse of a matrix can be computed in
two ways: one way is to take the product M(k) M(k−1) · · · M(1) = A−1, and the other
way is by augmenting the original matrix with an identity matrix. The Gauss-Jordan
method will reduce the original matrix to an identity matrix, and the appended identity
matrix will become the inverse of


the original matrix.

3.2.4 Gauss Jordan Method for matrix inversion

As explained above, the Gauss-Jordan method can also be used to find the inverse of a
given matrix. We simply formalize the procedure using the following example.

Example 3.5:
Solve the following system using Gauss-Jordan method and also find the inverse of the
coefficient matrix and it’s determinant.

x + 2y + 3z = 12
3x + 2y + z = 24
2x + y + 3z = 36

Solution:
Step-1: In order to solve the system and find the inverse of the coefficient matrix, the
first thing we need to do is form the augmented matrix of A, b and I, where I is an
identity matrix of the same order as A:

          [ 1  2  3 | 12 | 1  0  0 ]
A(1)  =   [ 3  2  1 | 24 | 0  1  0 ]
          [ 2  1  3 | 36 | 0  0  1 ]
Step-2 Reduce the matrix to reduced row-echelon form. Here, we will be a little more
systematic so that we can reduce the computation: we are not going to form the
matrices M(k); rather we record the row operations directly.
2-1: Normalize all rows by factoring out the leading elements of the first column, so that

        (1/3)R2, (1/2)R3     [ 1   2    3   | 12 | 1   0    0  ]
A(1)  ──────────────────→    [ 1  2/3  1/3  |  8 | 0  1/3   0  ]  (1)(3)(2) = A(2) .
                             [ 1  1/2  3/2  | 18 | 0   0   1/2 ]

Note: the factored-out product (1)(3)(2) is kept for the computation of the determinant.

- The first row can then be subtracted from the remaining rows (i.e. rows 2 and 3)


to yield

        R2−R1, R3−R1     [ 1    2     3   | 12 |  1   0    0  ]
A(2)  ───────────────→   [ 0  −4/3  −8/3  | −4 | −1  1/3   0  ]  (6) = A(3) .
                         [ 0  −3/2  −3/2  |  6 | −1   0   1/2 ]

2-2 Now repeat the cycle, normalizing by factoring out the elements of the second column,
getting

        (1/2)R1, (−3/4)R2, (−2/3)R3     [ 1/2  1  3/2 |  6 | 1/2    0    0  ]
A(3)  ───────────────────────────────→  [  0   1   2  |  3 | 3/4  −1/4   0  ]  (6)(2)(−4/3)(−3/2) = A(4) .
                                        [  0   1   1  | −4 | 2/3    0  −1/3 ]

- Subtracting the second row from the remaining rows (i.e. rows 1 and 3) gives

        R1−R2, R3−R2     [ 1/2  0  −1/2 |  3 | −1/4   1/4    0  ]
A(4)  ───────────────→   [  0   1    2  |  3 |  3/4  −1/4    0  ]  (24) = A(5) .
                         [  0   0   −1  | −7 | −1/12  1/4  −1/3 ]

2-3 Again repeat the cycle, normalizing by the elements of the third column, so

        (−2)R1, (1/2)R2, (−1)R3     [ −1   0   1 |  −6  | 1/2   −1/2    0  ]
A(5)  ────────────────────────→     [  0  1/2  1 |  3/2 | 3/8   −1/8    0  ]  (24)(−1/2)(2)(−1) = A(6) .
                                    [  0   0   1 |   7  | 1/12  −1/4   1/3 ]

- and subtract the third row from the remaining rows to yield

        R1−R3, R2−R3     [ −1   0   0 |  −13  | 5/12  −1/4  −1/3 ]
A(6)  ───────────────→   [  0  1/2  0 | −11/2 | 7/24   1/8  −1/3 ]  (24) = A(7) .
                         [  0   0   1 |   7   | 1/12  −1/4   1/3 ]

2-4 Finally normalize each row by the diagonal elements so as to produce the unit


matrix on the left hand side, so that

        (−1)R1, 2R2     [ 1  0  0 |  13 | −5/12  1/4   1/3 ]
A(7)  ──────────────→   [ 0  1  0 | −11 |  7/12  1/4  −2/3 ]  (24)(−1)(1/2)(1) = A(8) .
                        [ 0  0  1 |   7 |  1/12 −1/4   1/3 ]

Step-3 The solution to the equations is now contained in the centre column, while the
right-hand matrix contains the inverse of the original coefficient matrix. The scalar
quantity accumulating in front of the matrix is the determinant, as it represents the
factors taken out of the individual rows of the original matrix. Thus our complete
solution is

x = 13, y = −11 and z = 7.

The inverse of the coefficient matrix is

          [ −5/12   1/4   1/3 ]
A−1  =    [  7/12   1/4  −2/3 ]
          [  1/12  −1/4   1/3 ]

and the determinant of the coefficient matrix is det A = (24)(−1)(1/2)(1) = −12.
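A hedged Python sketch of the same idea, augmenting A with the identity and reducing; the running product of pivots gives the determinant. The function name is my own, it assumes nonzero pivots, and it uses no pivoting.

# Gauss-Jordan inversion: reduce [A | I] to [I | A^-1]; det accumulates the pivots.
def gauss_jordan_inverse(A):
    n = len(A)
    M = [list(map(float, A[i])) + [1.0 if j == i else 0.0 for j in range(n)]
         for i in range(n)]
    det = 1.0
    for k in range(n):
        piv = M[k][k]                       # assumes a nonzero pivot
        det *= piv
        M[k] = [v / piv for v in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [vi - f * vk for vi, vk in zip(M[i], M[k])]
    inverse = [row[n:] for row in M]
    return inverse, det

inv, det = gauss_jordan_inverse([[1, 2, 3], [3, 2, 1], [2, 1, 3]])
print(det)   # -12.0, as found above
print(inv)   # rows close to [-5/12, 1/4, 1/3], [7/12, 1/4, -2/3], [1/12, -1/4, 1/3]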

3.2.5 Matrix Decomposition Method

As described in the previous sections, Gauss elimination is designed to solve systems of


linear algebraic equations
Ax = b

Although it certainly represents a sound way to solve such systems, it becomes inefficient
when solving equations with the same coefficients A, but with different right-hand-side
constants (the bs). Recall that Gauss elimination involves two steps: forward elimination
and back-substitution. Of these, the forward-elimination step comprises the bulk of the
computational effort. This is particularly true for large systems of equations. On the other
hand, LU decomposition methods separate the time-consuming elimination of the matrix
A from the manipulations of the right-hand side b. Thus, once the matrix A has been
decomposed, multiple right-hand-side vectors can be evaluated in an efficient manner.
Before showing how this can be done, let us first provide a mathematical overview of the


decomposition strategy.

Overview of the LU Decomposition

We shall now consider the LU decomposition of matrices. Suppose A is an n × n matrix.


If L and U are lower and upper triangular n×n matrices respectively such that A = LU .
We say that this is the LU decomposition of A. Note that LU decomposition is not
unique. For example, if A = LU is a decomposition then A = Lα Uα is also an LU
decomposition, where α ≠ 0 is any scalar, Lα = αL and Uα = (1/α)U .

A two-step strategy for obtaining solutions of systems of equations of the form

Ax = b

can be explained as follows:

• LU decomposition step. A is factored or decomposed into lower L and upper


U triangular matrices.

• Substitution step. L and U are used to determine a solution x for a right-hand


side column vector b. This step itself consists of two steps. First, an intermediate
vector y is generated by forward substitution. Then, the result is substituted back
to solve for x by back substitution.

A = LU ,

y = U x,

LU x = Ly = b,

- The forward-substitution step can be represented concisely as

y1 = b1 / l11 ,
                                                                             (3.16)
yi = (1/lii) ( bi − Σ_{j=1}^{i−1} lij yj ) ,   for i = 2, 3, · · · , n.


- The back-substitution step can be represented concisely as

xn = yn / unn ,
                                                                             (3.17)
xi = (1/uii) ( yi − Σ_{j=i+1}^{n} uij xj ) ,   for i = n − 1, · · · , 1.

Further if A = LU is a LU decomposition then det A can be calculated as det A =


det L × det U = l11 l22 · · · lnn u11 u22 · · · unn , where lii are the diagonal entries of L and uii
are the diagonal entries of U .
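A compact Python sketch of this two-step strategy: compute L (with unit diagonal, the Doolittle choice discussed below) and U from the Gauss-elimination multipliers, then perform one forward and one back substitution per right-hand side. The names are my own, it is illustrative only, and it uses no pivoting.

# LU decomposition (unit diagonal in L) via Gauss elimination multipliers; no pivoting.
def lu_decompose(A):
    n = len(A)
    U = [list(map(float, row)) for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]          # assumes nonzero pivot
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def lu_solve(L, U, b):
    n = len(b)
    y = [0.0] * n                           # forward substitution, L y = b (l_ii = 1)
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n                           # back substitution, U x = y
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

L, U = lu_decompose([[1, 2, 3], [3, 2, 1], [2, 1, 3]])   # decompose once
print(lu_solve(L, U, [12, 24, 36]))                       # reuse for each b: close to [13, -11, 7]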

We shall now give methods to find the LU decomposition of a matrix. Basically, we shall
be considering three cases. First, we shall consider the Doolittle's and Crout's methods
for a general matrix; secondly, Cholesky's method for a symmetric matrix; and thirdly,
the decomposition of a tridiagonal matrix. Before discussing these methods, let us
discuss the LU decomposition of a given matrix A using the Gaussian Elimination method.

LU decomposition using Gauss elimination procedure

When the Gauss elimination procedure is applied to a matrix , the elements of the
matrices L and U are actually calculated. The upper triangular matrix U is the matrix
of coefficients A that is obtained at the end of the Gauss elimination procedure. For the
lower triangular matrix L, the elements on the diagonal are all 1, and the elements below
(k)
the diagonal are the negative of multipliers mij that multiply the pivot equation when
it is used to eliminate the elements below the pivot coefficient. For the case of a system
of three equations, the decomposition has the form:
    
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & m_{32} & 1 \end{pmatrix} \begin{pmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} \\ 0 & 0 & a_{33}^{(3)} \end{pmatrix},$$

where

$$m_{21} = \frac{a_{21}^{(1)}}{a_{11}^{(1)}}, \qquad m_{31} = \frac{a_{31}^{(1)}}{a_{11}^{(1)}}, \qquad m_{32} = \frac{a_{32}^{(2)}}{a_{22}^{(2)}}.$$


Let the LU decomposition of A be given as:

$$L = \begin{pmatrix} l_{11} & 0 & 0 & \cdots & 0 \\ l_{21} & l_{22} & 0 & \cdots & 0 \\ l_{31} & l_{32} & l_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn} \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & u_{nn} \end{pmatrix} \tag{3.18}$$

In order to obtain a unique decomposition it is convenient to choose either lii = 1 or uii = 1.

Depending on this choice we have two methods of LU decomposition.

a) if lii = 1 for i = 1, 2 · · · , n the method is called the Doolittle’s method and

b) if uii = 1 for i = 1, 2 · · · , n the method is called Crout’s method.

LU decomposition using Doolittle’s Method

Let A = (aij ). We seek an LU decomposition in which the diagonal entries li of L are


all 1. Let L = (lij ); U = (uij ). Since L is a lower triangular matrix, we have lij = 0 if
j > i; and by our choice, lii = 1. Similarly, since U is an upper triangular matrix, we
have uij = 0 if i > j.

We determine L and U as follows: The 1st row of U and 1st column of L are determined
as follows : n
X
a11 = l1k uk1
k=1

= l11 u11 , since l1k = 0 for k > 1,


= u11 , since l11 = 1.
Therefore, in general

u11 = a11 . (3.19)

Now,
n
X
a1j = l1k ukj
k=1

= l11 u1j since l1k = 0 for k > 1,


= u1j since l11 = 1.


=⇒ u1j = a1j (3.20)

Thus the first row of U is the same as the first row of A. The first column of L is
determined as follows:
n
X
aj1 = ljk uk1
k=1

= lj1 u11 since uk1 = 0 for k > 1,

aj1
=⇒ lj1 = (3.21)
u11

Note : u11 is already obtained from Equation (3.19)

Thus Equation (3.20) and Equation (3.21) determine respectively the first row of U and
first column of L. The other rows of U and columns of L are determined recursively as
given below: Suppose we have determined the first i − 1 rows of U and the first i − 1
columns of L. Now we proceed to describe how one then determines the ith row of U
and ith column of L. Since first i − 1 rows of U have been determined, this means, ukj
are all known for 1 ≤ k ≤ i − 1 ; 1 ≤ j ≤ n. Similarly, since first i − 1 columns are known
for L, this means, lik are all known for 1 ≤ i ≤ n ; 1 ≤ k ≤ i − 1.

Now,
n
X
aij = lik ukj
k=1
Xi
= lik ukj since lik = 0 for k > i,
k=1
i−1
X
= lik ukj + lii uij
k=1
i−1
X
= lik ukj + uij since lii = 1.
k=1

Thus, solving for uij results in

i−1
X
=⇒ uij = aij − lik ukj . (3.22)
k=1


Note that on the RHS we have aij which is known from the given matrix. Also the sum on
the RHS involves lik for 1 ≤ k ≤ i − 1 which are all known because they involve entries in
the first i − 1 columns of L; and they also involve ukj ; 1 ≤ k ≤ i − 1 which are also known
since they involve only the entries in the first i − 1 rows of U . Thus Equation (3.22)
determines the ith row of U in terms of the known given matrix and quantities determined
upto the previous stage. Now we describe how to get the ith column of L:
n
X
aji = ljk uki
k=1
Xi
= ljk uki since uki = 0for k > i,
k=1
i−1
X
= ljk uki + lji uii ,
k=1

solving for lji gives us:

i−1
" #
1 X
lji = aji − ljk uki . (3.23)
uii k=1

Once again we note the RHS involves uii , which has been determined using Equa-
tion (3.22); aij which is from the given matrix; ljk ; 1 ≤ k ≤ i − 1 and hence only entries
in the first i − 1 columns of L; and uki , 1 ≤ k ≤ i − 1 and hence only entries in the first
i − 1 rows of U . Thus RHS in Equation (3.23) is completely known and hence lji , the
entries in the ith column of L are completely determined by Equation (3.23).

Summary:
The summary of Doolittle’s procedure is as follows:
Step-1 determining 1st row of U and 1st column of L :

a) The diagonal elements of L are 1, i.e.,

lii = 1, for i = 1, 2 · · · , n.

b) The 1st row of U are given by the 1st row of A, i.e.,

u1,j = a1,j for j = 1, 2, · · · , n.


c). The 1st column of L is given by Equation (3.21), i.e.,

aj1
lj1 = for j = 1, 2, · · · , n.
u11

Step-i for i = 2, 3, · · · , n we determine:

a) Determine the ith row of U using Equation (3.22), i.e.,

i−1
X
uij = aij − lik ukj , for j = i, i + 1, · · · , n.
k=1

Note that for j < i we have uij = 0.


b.) Determine the ith column of L using Equation (3.23), i.e.,

i−1
" #
1 X
lji = aji − ljk uki , for j = i, i + 1, · · · , n.
uii k=1

Note that for j < i we have lji = 0.


We observe that the method fails if uii = 0 for some i.
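As an illustration of the summary above, the steps of Doolittle's method can be arranged as a short Python/NumPy routine (a sketch; the function name and the error handling are our own choices, not part of the notes):

```python
import numpy as np

def doolittle(A):
    """Doolittle LU decomposition A = LU with l_ii = 1 (Eqs. 3.22 and 3.23)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):                # i-th row of U
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        if U[i, i] == 0:
            raise ZeroDivisionError("zero pivot: the method fails when u_ii = 0")
        for j in range(i + 1, n):            # i-th column of L
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U
```

Applied to the matrix of Example 3.6 below, this routine reproduces the factors L and U obtained by hand.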

Example 3.6:
Determine the the Doolittle’s decomposition for the matrix.
 
2 1 −1 3
 
−2 2 6 −4
 
A=  
 4 14 19 4
 
 
6 0 −6 12

Solution:
Step-1 Determine the 1st row of U and 1st column of L :

a) The diagonal elements of L are 1, i.e.,

lii = 1, for i = 1, 2 · · · , n.

Thus, we have,
l11 = 1, l22 = 1, l33 = 1, and l44 = 1

b) 1st row of U


The first row of U is given by the first row of A, i.e.,

u1,j = a1,j for j = 1, 2, · · · , n.

Thus we have

u11 = 2, u12 = 1, u13 = −1, and u14 = 3

c). 1st column of L


The first column of L is given by Equation (3.21), i.e.,

aj1
lj1 = for j = 1, 2, · · · , n.
u11

Thus, j = 1
=⇒ l11 = 1,

j=2
a21 −2
=⇒ l21 = = = −1,
u11 2
j=3
a31 4
=⇒ l31 = = = 2.
u11 2
and j = 4
a41 6
=⇒ l41 = = = 3,
u11 2
Step-2 a) 2nd row of U :
The second row of U is given by Equation (3.22) when i=2, i.e.,

2−1
X
u2j = a2j − l2k ukj , for j = 2, 3, 4.
k=1

Thus, we have
j=2
=⇒ u22 = a22 − l21 u12 = 2 − (−1)(1) = 3

j=3
=⇒ u23 = a23 − l21 u13 = 6 − (−1)(−1) = 5

j=4
=⇒ u24 = a24 − l21 u14 = −4 − (−1)(3) = −1


b.) 2nd column of L: The second column of L is given by Equation (3.23) when
i = 2, i.e.,
2−1
" #
1 X
lj2 = aj2 − ljk uk2 , for j = 2, 3, 4.
u22 k=1

Thus, we have j = 2
=⇒ l22 = 1

j=3
1 1
=⇒ l32 = [a32 − l31 u12 ] = [14 − (2)(1)] = 4
u22 3
j=4
1 1
=⇒ l42 = [a42 − l41 u12 ] = [0 − (3)(1)] = −1
u22 3

Step-3 a) 3rd row of U :


The third row of U is given by Equation (3.22) when i=3, i.e.,

3−1
X
u3j = a3j − l3k ukj , and j = 3, 4.
k=1

Thus, we have
j=3

=⇒ u33 = a33 − (l31 u13 + l32 u23 ) = 19 − ((2)(−1) + (4)(5)) = 1

j=4

=⇒ u34 = a34 − (l31 u14 + l32 u24 ) = 4 − ((2)(3) + (4)(−1)) = 2

b.) 3rd column of L: The second column of L is given by Equation (3.23) when
i = 3, i.e.,
3−1
" #
1 X
lj3 = aj3 − ljk uk3 , for j = 3, 4.
u33 k=1

Thus, we have j = 3
=⇒ l33 = 1


j=4
1
l43 = [a43 − (l41 u13 + l42 u23 )]
u33
=⇒ 1
= [−6 − ((3)(−1) + (−1)(5))]
1
=2

Step-4 a) 4th row of U :


The fourth row of U is given by Equation (3.22) when i=4, i.e.,

4−1
X
u4j = a4j − l4k ukj , and j = 4.
k=1

Thus, we have
j=4
u44 = a44 − (l41 u14 + l42 u24 + l43 u34 )
=⇒ = 12 − ((3)(3) + (−1)(−1) + (2)(2))
= −2

b.) 4th column of L: The second column of L is given by Equation (3.23) when
i = 4, i.e.,
4−1
" #
1 X
lj4 = aj4 − ljk uk4 , for j = 4.
u44 k=1

Thus, we have j = 4
=⇒ l44 = 1

Now, collecting the elements of L and U we get


   
1 0 0 0 2 1 −1 3
   
−1 1 0 0 0 3 5 −1
   
L=   and U =  
 2 4 1 0 0 0 1 2
   
   
3 −1 2 1 0 0 0 −2

This completes the LU decomposition by Doolittle’s method for the given A.

LU decomposition using Crout’s method

The second method for decomposing a general matrix A into the LU factor is the Crout’s
Method. In this method the matrix is decomposed into the product LU , where the


diagonal elements of the matrix U are all 1s. It turns out that in this case, the elements
of both matrices can be determined using formulas that can be easily programmed just
like the Doolittel’s method.

A procedure for determining the elements of the matrices L and U using the Crout’s
method can be written as follow.

Step-1: Calculate the first column of L and the first row of U .

a.) Substituting 1s in the diagonal of U :

uii = 1 for i = 1, 2, · · · n.

b.) Calculating the first column of L by using

li1 = ai1 , for i = 1, 2, · · · n.

c.) Calculating the first row of U by using

a1j
u1j = for j = 2, · · · n.
l11

Step-i: Calculate the ith column of L and the ith row of U , for i = 2, 3, · · · , n.

a.) Calculate the ith column of L:

j−1
X
lij = aij − lik ukj for j = 2, 3, · · · i.
k=1

b.) Calculate the ith row of U :

i−1
" #
1 X
uij = aij − lik ukj , for j = i + 1, i + 2, · · · , n
lii k=1
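For comparison, a sketch of Crout's procedure in the same style (again assuming Python/NumPy; the function name is illustrative) is given below; the essential change from Doolittle's routine is which factor carries the unit diagonal:

```python
import numpy as np

def crout(A):
    """Crout LU decomposition A = LU with u_ii = 1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    U = np.eye(n)
    for i in range(n):
        for j in range(i, n):                # i-th column of L
            L[j, i] = A[j, i] - L[j, :i] @ U[:i, i]
        for j in range(i + 1, n):            # i-th row of U (above-diagonal part)
            U[i, j] = (A[i, j] - L[i, :i] @ U[:i, j]) / L[i, i]
    return L, U
```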

Example 3.7:

Solve the following system of linear equations by Crout’s Method (LU factorization or
decomposition method):


9x1 + 3x2 + 3x3 + 3x4 = 24


3x1 + 10x2 − 2x3 − 2x4 = 17
3x1 − 2x2 + 18x3 + 10x4 = 45
3x1 − 2x2 + 10x3 + 10x4 = 29
Solution: The given system of equation can be written in matrix form as
    
9 3 3 3 x1 24
    
3 10 −2 −2 x  17
    
   2 =  
3 −2 18 10  x  45
    
   3  
3 −2 10 10 x4 29

Step-1: Calculate the first column of L and the first row of U .

a.) Substituting 1s in the diagonal of U :

uii = 1 for i = 1, 2, · · · n.

Thus, we have

u11 = 1, u22 = 1, u33 = 1, and u44 = 1.

b.) Calculating the first column of L by using

li1 = ai1 , for i = 1, 2, · · · n.

Here we have

l11 = a11 = 9, l21 = a21 = 3, l31 = a31 = 3, and l41 = a41 = 3.

c.) Calculating the first row of U by using

a1j
u1j = for j = 2, 3, · · · n.
l11

j = 2 =⇒ u12 = a12 /l11 = 3/9 = 1/3

j = 3 =⇒ u13 = a13 /l11 = 3/9 = 1/3


j = 4 =⇒ u14 = a14 /l11 = 3/9 = 1/3
Step-2: Calculate the 2nd column of L and the 2nd row of U , for i = 2, 3, · · · , n.

a.) Calculate the 2nd column of L:

j−1
X
l2j = a2j − l2k ukj for j = 2 · · · i.
k=1

j=2
1
l22 = a22 − l21 u12 = 10 − (3)( ) = 9
3
b.) Calculate the 2th row of U :

2−1
" #
1 X
u2j = a2j − l2k ukj , forj = 3, 4
l22 k=1

j=3
1 1 1 1
 
u23 = [a23 − l21 u13 ] = −2 − (3)( ) = −
l22 9 3 3
j=4
1 1 1 1
 
u24 = [a24 − l21 u14 ] = −2 − (3)( ) = −
l22 9 3 3
Step-3: Calculate the 3rd column of L and the 3rd row of U , for i = 2, 3.

a.) Calculate the 3rd column of L:

j−1
X
l3j = a3j − l3k ukj for j = 2, 3 · · · i.
k=1

j=2
1
l32 = a32 − l31 u12 = −2 − (3)( ) = −3
3
j=3

1 1
 
l33 = a33 − [l31 u13 + l32 u23 ] = 18 − (3)( ) + (−3)(− ) = 16
3 3

b.) Calculate the 3rd row of U :

3−1
" #
1 X
u3j = a3j − l3k ukj , forj = 4
l33 k=1


j=4

1 1 1 1 1
  
u34 = [a34 − (l31 u14 + l32 u24 )] = 10 − (3)( ) + (−3)(− ) =
l33 16 3 3 2

Step-4: Calculate the 4th column of L and the 4th row of U , for i = 2, 3, 4.

a.) Calculate the 4th column of L:

j−1
X
l4j = a4j − l4k ukj for j = 2, 3 · · · i.
k=1

j=2
1
l42 = a42 − l41 u12 = −2 − (3)( ) = −3
3
j=3

1 1
 
l43 = a43 − [l41 u13 + l42 u23 ] = 10 − (3)( ) + (−3)(− ) = 8
3 3

j=4

1 1 1
 
l44 = a44 − [l41 u14 + l42 u24 + l43 u34 ] = 10 − (3)( ) + (−3)(− ) + (8)( ) = 4
3 3 2

b.) Calculate the 4th row of U :


u44 = 1

Now, collecting the elements of L and U we get


   
$$L = \begin{pmatrix} 9 & 0 & 0 & 0 \\ 3 & 9 & 0 & 0 \\ 3 & -3 & 16 & 0 \\ 3 & -3 & 8 & 4 \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} 1 & \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ 0 & 1 & -\tfrac{1}{3} & -\tfrac{1}{3} \\ 0 & 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

This completes the LU decomposition by Doolittle’s method for the given A. Now, let
us solve the given system

- Forward substitution gives


Ly = b


we have     
9 0 0 0 y1 24
    
3 9 0 0 y  17
    
   2 =  
3 −3 16 0 y  45
    
   3  
3 −3 8 4 y4 29
Using the forward substitution formula we get

8
9y1 = 24 =⇒ y1 =
3
3y1 + 9y2 = 17 =⇒ y2 = 1
5
3y1 − 3y2 + 16y3 = 45 =⇒ y3 =
2
3y1 − 3y2 + 8y3 + 4y4 = 29 =⇒ y4 = 1.

 
8
3
1
 
Thus y =  
5 and U x = y gives
 
2
1

    
1 1 1 8
1 3 3 3
x1
   3
0 1 − 31 − 13  x  1
    
   2 =  .
0 1  5
0 1 x 
  
2   3
 
 2
0 0 0 1 x4 1

- By using the backward substitution formula we get

x4 = 1
1 5
x3 + x4 = =⇒ x3 = 2
2 2
1 1
x2 − x3 − x4 = 1 =⇒ x2 = 2
3 3
1 1 1 8
x1 + x2 + x3 + x4 = =⇒ x1 = 1.
3 3 3 3

Therefore, the required solution by Crout’s method is

x1 = 1, x2 = 2, x3 = 2 and x4 = 1.


Cholesky Decomposition for Symmetric Matrices

Definition 3.2 Symmetric Matrix

A symmetric matrix is a square matrix whose transpose is the matrix itself. Formally, a matrix A is symmetric if

AT = A.

Definition 3.3 positive definite symmetric Matrix

An n × n real symmetric matrix A is said to be positive definite if the scalar


xT Ax is positive for every non-zero column vector x of n real numbers.

The Cholesky decomposition of a positive-definite symmetric matrix A is a decomposition


of the form
A = LLT ,

where L is a lower triangular matrix with real positive diagonal elements and LT is
the transpose of L. The Cholesky decomposition is unique when A is positive definite;
there is only one lower triangular matrix L with strictly positive diagonal entries such
that A = LLT . However, the decomposition need not be unique when A is positive
semidefinite. A partial converse holds: if A can be written as LLT for some lower triangular matrix L with strictly positive diagonal entries, then A is symmetric and positive definite.

The general algorithm for Cholesky decomposition is given as follows:

$$\begin{aligned}
l_{11} &= \sqrt{a_{11}},\\
l_{i1} &= \frac{a_{i1}}{l_{11}}, \quad i = 2, 3, \cdots, n,\\
\text{for } j = 2, 3, \cdots, n:\qquad
l_{jj} &= \sqrt{a_{jj} - \sum_{k=1}^{j-1} l_{jk}^2},\\
l_{ij} &= \frac{1}{l_{jj}}\left(a_{ij} - \sum_{k=1}^{j-1} l_{ik}\, l_{jk}\right), \quad i = j+1, j+2, \cdots, n.
\end{aligned} \tag{3.24}$$


The Cholesky factor L is evaluated column by column (starting from the first column) and, in each column, the elements are evaluated from top to bottom. That is, in each column the diagonal element is evaluated first using the square-root formula in Equation (3.24) (the elements above the diagonal are zero), and then the elements below it in the same column are evaluated using the last formula in Equation (3.24). This is carried out for each column starting from the first one.

Steps for solving Ax = b, where A is symmetric, using Cholesky decomposition is given


below.

1. Factorize A in to A = LLT .

2. Solve for x

(a) Forward substitution:

Solve for y using Ly = b

(b) Back substitution:


Solve for x using LT x = y
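The algorithm in Equation (3.24), together with the two substitution steps above, can be sketched in Python/NumPy as follows (a minimal sketch; the names cholesky and solve_spd are illustrative assumptions):

```python
import numpy as np

def cholesky(A):
    """Cholesky factorization A = L L^T of a symmetric positive definite A (Eq. 3.24)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    for j in range(n):
        s = A[j, j] - L[j, :j] @ L[j, :j]        # a_jj minus the sum of squares in row j
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(s)
        for i in range(j + 1, n):                # entries below the diagonal in column j
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

def solve_spd(A, b):
    """Solve Ax = b via A = L L^T: forward substitution, then back substitution."""
    L = cholesky(A)
    n = len(b)
    y = np.zeros(n)
    x = np.zeros(n)
    for i in range(n):                           # Ly = b
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    for i in range(n - 1, -1, -1):               # L^T x = y (row i of L^T is column i of L)
        x[i] = (y[i] - L[i + 1:, i] @ x[i + 1:]) / L[i, i]
    return x
```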

Example 3.8:
Solve the following system of equations using the Cholesky decomposition.

4x1 + 2x2 + 14x3 = 14


2x1 + 17x2 − 5x3 = −101
14x1 − 5x2 + 83x3 = 155

Solution: The coefficient matrix is given by


 
4 2 14
 
A=2 17 −5
 
 
14 −5 83

In order to solve the above system first we need to get the cholesky decomposition of the
coefficient matrix A. Now, we use the formulas in Equation (3.24) determine the LLT
decomposition as follows. The computation of L is column by column.
Step-1 First column when j = 1:


The diagonal element l11 is given by

√ √
l11 = a11 = 4=2

The elements below the diagonal are computed as:

ai1
li1 = , i = 2, 3
l11

Thus, when i = 2 we have


a21 2
l21 = = =1
l11 2
when i = 3
a31 14
l31 = = =7
l11 2

Step-2 Compute the 2nd column, i.e., j = 2:


The diagonal element l22 is given by
$$l_{22} = \sqrt{a_{22} - \sum_{k=1}^{1} l_{2k}^2} = \sqrt{17 - l_{21}^2} = \sqrt{17 - 1^2} = 4$$

The elements below l22 are computed as:

2−1
!
1 X
li2 = ai2 − lik l2k , i = 3
l22 k=1

when i = 3 we have

1 1
l32 = (a32 − l31 l21 ) = (−5 − (7)(1)) = −3
l22 4

Step-3: Compute the 3rd column, i.e., j = 3 Here we only have one element to compute l33
and is given by
$$l_{33} = \sqrt{a_{33} - \sum_{k=1}^{2} l_{3k}^2} = \sqrt{a_{33} - (l_{31}^2 + l_{32}^2)} = \sqrt{83 - (49 + 9)} = 5$$


Collecting the elements of L we get


 
2 0 0
 
L = 1 4 0
 
 
7 −3 5

Therefore, the Cholesky decomposition of A is given by


    
4 2 14 2 0 0 2 1 7
    
A=2 17 −5 = 1 4 0 0 4 −3 = LLT .
    
    
14 −5 83 7 −3 5 0 0 5

Exercise: Using the above decomposition, solve the given system.

3.2.6 Tri-diagonal matrix method

All the methods described so far generally require about n³ operations to obtain the solution. However, there is one frequently occurring system of equations for which extremely
efficient solution algorithms exist. This system of equations is called tri-diagonal because
there are never more than three unknowns in any equation and they can be arranged so
that the coefficient matrix is composed of non-zero elements on the main diagonal and
the diagonal immediately adjacent to either side. Thus such a system would have the
form
Ax = b

where A is a tridiagonal matrix given by

$$A = \begin{pmatrix}
b_1 & c_1 & & & & \\
a_2 & b_2 & c_2 & & & \\
 & a_3 & b_3 & c_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & a_{n-1} & b_{n-1} & c_{n-1} \\
 & & & & a_n & b_n
\end{pmatrix}$$

Here, we can solve the given system using the two-stage strategy:

• LU decomposition step. A is factored or decomposed into lower L and upper


U triangular matrices of the form:

$$L = \begin{pmatrix}
1 & & & & \\
l_2 & 1 & & & \\
 & l_3 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & l_n & 1
\end{pmatrix}, \qquad
U = \begin{pmatrix}
u_1 & v_1 & & & \\
 & u_2 & v_2 & & \\
 & & u_3 & \ddots & \\
 & & & \ddots & v_{n-1} \\
 & & & & u_n
\end{pmatrix}$$

To determine the matrices L and U we use the following formula:

b1 = u1 =⇒ u1 = b1
c1 = v1 =⇒ v1 = c1

for k = 2, 3, · · · , n

ak
ak = lk uk−1 =⇒ lk = ,
uk−1
bk = lk vk−1 + uk =⇒ uk = bk − lk vk−1 ,
ck = vk =⇒ vk = ck .

• Substitution step. L and U are used to determine a solution x for a right-hand


side column vector b. This step itself consists of two steps. First, an intermediate
vector y is generated by forward substitution. Then, the result is substituted back
to solve for x by back substitution.

- The forward-substitution step to solve

Ly = b,

where y is obtained using:

y 1 = b1 ,
for k = 2, · · · n
lk yk−1 + yk = bk =⇒ yk = bk − lk yk−1

- the back-substitution to solve


U x = y,
yn
un xn = yn =⇒ xn = ,
un


for k = n − 1, n − 2, · · · , 1:

yk − vk xk+1
uk xk + vk xk+1 = yk =⇒ xk =
uk
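The complete procedure (factorization, forward substitution, back substitution) needs only O(n) operations. A minimal Python/NumPy sketch under the notation above (a, b, c are the sub-, main- and super-diagonals; the name thomas for this tridiagonal solver is our own label):

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system. a[k] is the sub-diagonal (a[0] unused),
    b[k] the main diagonal, c[k] the super-diagonal (c[n-1] unused), d the RHS."""
    n = len(b)
    u = np.zeros(n); v = np.zeros(n); l = np.zeros(n)
    y = np.zeros(n); x = np.zeros(n)
    u[0], v[0] = b[0], c[0]
    for k in range(1, n):                    # LU decomposition step
        l[k] = a[k] / u[k - 1]
        u[k] = b[k] - l[k] * v[k - 1]
        if k < n - 1:
            v[k] = c[k]
    y[0] = d[0]
    for k in range(1, n):                    # forward substitution Ly = d
        y[k] = d[k] - l[k] * y[k - 1]
    x[n - 1] = y[n - 1] / u[n - 1]
    for k in range(n - 2, -1, -1):           # back substitution Ux = y
        x[k] = (y[k] - v[k] * x[k + 1]) / u[k]
    return x
```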

3.3 Indirect methods for SLE

Introduction

Direct solvers such as Gaussian Elimination and LU decomposition allow for efficient
solving. In this section we introduce iterative solutions methods. The choice of a direct
method or an indirect method is a combination of the efficiency of the method (and in
general iterative methods are more efficient), the particular structure of the matrix sys-
tem, a trade-off between compute time and memory, and the computer architecture being
used. Iterative methods work by refining a guess to the solution and converging as quickly
as possible from that guess to the actual solution. You may have met iterative methods
previously in, for example, the general purpose solution of non-linear equations– such
as bisection or Newton-Raphson techniques (along with their more advanced cousins).
Iterative methods for linear systems have become a widespread and powerful tool for
solving the most complex scientific and engineering problems and can be extremely ef-
fective, especially when starting from a good guess at the final solution – and often effort
is expended in making that initial guess as good as possible and which will start you off
close to the final solution and yield a more rapid convergence to the answer. Their only
drawback is that they may not necessarily converge to a solution for a particular matrix
system.

By this approach, we start with some initial guess solution, say x(0) ; for solution x and
generate an improved solution estimate x(k+1) from the previous approximation x(k) :
This method is a very effective for solving differential equations, integral equations and
related problems. Let the residue vector r be defined as

$$r_i^{(k)} = b_i - \sum_{j=1}^{n} a_{ij}\, x_j^{(k)}, \quad i = 1, 2, \cdots, n,$$

i.e., r (k) = b − Ax(k) . The iteration sequence {x(k) : k := 0, 1, · · · } is terminated when


some norm of the residue ||r (k) || = ||Ax(k) − b|| becomes sufficiently small.


In this section, to begin with, some well known iterative schemes are presented. Their
convergence analysis is presented next. In the derivations that follow, it is implicitly assumed that the diagonal elements of matrix A are non-zero, i.e. aii ≠ 0. If this is not the case, a simple row exchange is often sufficient to satisfy this condition.

3.3.1 Gauss Jacobi method

Consider the set of equations

4x1 − x2 + x3 = 7
4x1 − 8x2 + x3 = −21 (3.25)
−2x1 + x2 + 5x3 = 15

This could be written as


7 + x 2 − x3
x1 =
4
21 + 4x1 + x3
x2 =
8
15 + 2x1 − x2
x3 =
5
And we could derive an iteration scheme which cycles through each of the values of x1 , x2
, and x3 in turn to refine an initial guess. If k is the k th iteration, then

(k) (k)
(k+1) 7 + x2 − x3
x1 =
4
(k) (k)
(k+1) 21 + 4x1 + x3
x2 =
8
(k) (k)
(k+1) 15 + 2x1 − x2
x3 =
5

Starting with an initial guess of x(0) = [1, 2, 2]T we obtain:

$$\begin{aligned}
x_1^{(1)} &= \frac{7 + x_2^{(0)} - x_3^{(0)}}{4} = \frac{7 + 2 - 2}{4} = 1.75\\
x_2^{(1)} &= \frac{21 + 4x_1^{(0)} + x_3^{(0)}}{8} = \frac{21 + 4 + 2}{8} = 3.375\\
x_3^{(1)} &= \frac{15 + 2x_1^{(0)} - x_2^{(0)}}{5} = \frac{15 + 2 - 2}{5} = 3.000
\end{aligned}$$


In general we can write the Jacobi Method as:

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij}\, x_j^{(k)}\right) \tag{3.26}$$

The following table shows subsequent iterations of the above example:

k      x1(k)       x2(k)      x3(k)
0      1.0         2.0        2.0
1      1.75        3.375      3.0
2      1.84375     3.875      3.025
3      1.9625      3.925      2.9625
...    ...         ...        ...
19     2.0000      4.0000     3.0000

Stopping Criteria

The iterations are stopped when the absolute relative change is less than a prespecified tolerance ε for all unknowns, i.e.,

$$\frac{|x_i^{(k+1)} - x_i^{(k)}|}{|x_i^{(k+1)}|} < \epsilon, \quad \text{for all } i = 1, 2, \cdots, n. \tag{3.27}$$
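A compact Python/NumPy sketch of the Jacobi iteration with the stopping test (3.27), applied to the system (3.25), is shown below (the function name and the tolerance defaults are illustrative assumptions, not part of the notes):

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-4, max_iter=100):
    """Jacobi iteration (Eq. 3.26) with the relative-change test (Eq. 3.27)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x0, dtype=float)
    D = np.diag(A)                  # diagonal entries a_ii
    R = A - np.diagflat(D)          # off-diagonal part of A
    for k in range(1, max_iter + 1):
        x_new = (b - R @ x) / D
        if np.all(np.abs(x_new - x) < tol * np.abs(x_new)):
            return x_new, k
        x = x_new
    return x, max_iter

A = np.array([[4., -1., 1.], [4., -8., 1.], [-2., 1., 5.]])
b = np.array([7., -21., 15.])
print(jacobi(A, b, x0=[1., 2., 2.]))   # approaches [2, 4, 3]
```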

Convergence

Strict diagonal dominance of the coefficient matrix is a sufficient condition for the Jacobi method to converge from any given starting vector.

Definition 3.4 Strictly diagonally dominant matrix

A matrix is said to be strictly diagonally dominant if


n
X
|aii | > |aij | i = 1, 2, · · · , n.
j=1
j6=i


In the above example we have  


4 −1 1
 
A=
 4 −8 1

 
−2 1 5
and
Row 1: |a11 | = |4| = 4 > | − 1| + |1| = 2
Row 2: |a22 | = | − 8| = 8 > |4| + |1| = 5
Row 3: |a33 | = |5| = 5 > | − 2| + |1| = 3.
Thus, the matrix is strictly diagonally dominant and the Jacobi method will always
converge for any given starting vector or initial guess.

Exercise 3.3:
Solve the linear equation A2 x = b2 using Jacobi Iteration, where
   
−2 1 5 15
   
A2 =  4 −1 1 , and b2 = −21
   
   
4 −8 1 7

and explain what happens.

3.3.2 Gauss Seidel method

When matrix A is large, there is a practical difficulty with the Jacobi method: all components of x(k) must be stored in the computer memory (as separate variables) until the calculation of x(k+1) is over. The Gauss-Seidel method overcomes this difficulty by using each component x_i^{(k+1)} immediately in the next equation while computing x_{i+1}^{(k+1)}. This modification leads to the following set of equations

$$\begin{aligned}
x_1^{(k+1)} &= \frac{1}{a_{11}}\left(b_1 - a_{12}x_2^{(k)} - a_{13}x_3^{(k)} - \cdots - a_{1n}x_n^{(k)}\right)\\
x_2^{(k+1)} &= \frac{1}{a_{22}}\left(b_2 - \{a_{21}x_1^{(k+1)}\} - \{a_{23}x_3^{(k)} + \cdots + a_{2n}x_n^{(k)}\}\right)\\
x_3^{(k+1)} &= \frac{1}{a_{33}}\left(b_3 - \{a_{31}x_1^{(k+1)} + a_{32}x_2^{(k+1)}\} - \{a_{34}x_4^{(k)} + \cdots + a_{3n}x_n^{(k)}\}\right)
\end{aligned}$$


In general, for i’th element of x, we have


 
i−1 n
(k+1) 1  X (k+1) X (k)
xi = bi − aij xj − aij xj  (3.28)
aii j=1 j=i+1

Now we are using the new values of x as soon as they are available at each iteration.
Thus the equations in Equation (3.25) would become:

(k) (k)
(k+1) 7 + x2 − x 3
x1 =
4
(k+1) (k)
(k+1) 21 + 4x1 + x3
x2 =
8
(k+1) (k+1)
(k+1) 15 + 2x1 − x2
x3 =
5

Making this change and repeating the above makes the iteration to the solution [2, 4, 3]T
take only 10 steps as per the table below.

k      x1(k)        x2(k)          x3(k)
0      1.0          2.0            2.0
1      1.75         3.75           2.95
2      1.95         3.96875        2.98625
3      1.995625     3.99609375     2.99903125
...    ...          ...            ...
10     2.0000       4.0000         3.0000

The stopping criteria of the Gauss Seidel iteration is the same as Gauss Jacobi as given
in Equation (3.27).
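The only change from the Jacobi sketch above is that updated components are used immediately, as Equation (3.28) prescribes. A minimal Python/NumPy sketch (illustrative names, not from the notes):

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-4, max_iter=100):
    """Gauss-Seidel iteration (Eq. 3.28)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x0, dtype=float)
    n = len(b)
    for k in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(n):
            s_new = A[i, :i] @ x[:i]            # components already updated this sweep
            s_old = A[i, i + 1:] @ x[i + 1:]    # components still from the previous sweep
            x[i] = (b[i] - s_new - s_old) / A[i, i]
        if np.all(np.abs(x - x_old) < tol * np.abs(x)):
            return x, k
    return x, max_iter
```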

Convergence

The Gauss Seidel method always convergences if matrix A is either:

• strictly diagonally dominant

• symmetric positive-definite.


3.4 Systems of non-linear equations us-


ing Newton’s method

In this section, we will now extend the Newton’s Method that we discussed in Section 2.5
of Chapter 2 further to systems of many nonlinear equations. Consider the general system
of n nonlinear equations in n unknowns:

f1 (x1 , x2 , · · · , xn ) = 0
f2 (x1 , x2 , · · · , xn ) = 0
.. (3.29)
.
fn (x1 , x2 , · · · , xn ) = 0,

where each fi : Rn → R is a nonlinear function of the n variables x1 , x2 , · · · , xn .


For convenience we can think of (x1 , x2 , x3 , · · · , xn ) as a vector x and (f1 , f2 , · · · , fn ) as
a vector-valued function f . With this notation, the system of non-linear equations (3.29)
may then be compactly written as
f (x) = 0,

in which 0 denotes the zero vector [0, 0, · · · , 0]t ∈ Rn . In order to find x such that f goes
to 0, an initial estimate x0 is chosen, and Newton’s iterative method for converging to
the solution is used:
x1 = x0 − J −1 (x0 )f (x0 ) (3.30)

where J (x) is the Jacobian matrix of partial derivatives of f with respect to x, given as

$$J(x) = \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(x) & \dfrac{\partial f_1}{\partial x_2}(x) & \cdots & \dfrac{\partial f_1}{\partial x_n}(x) \\[6pt]
\dfrac{\partial f_2}{\partial x_1}(x) & \dfrac{\partial f_2}{\partial x_2}(x) & \cdots & \dfrac{\partial f_2}{\partial x_n}(x) \\[6pt]
\vdots & \vdots & \ddots & \vdots \\[2pt]
\dfrac{\partial f_n}{\partial x_1}(x) & \dfrac{\partial f_n}{\partial x_2}(x) & \cdots & \dfrac{\partial f_n}{\partial x_n}(x)
\end{pmatrix}$$

The formula in Equation (3.30) is the vector equivalent of the Newton’s method formula
we learned before. However, in practice we never use the inverse of a matrix for
computations, so we cannot use this formula directly. Rather, we can do the following.
First solve the equation

$$J(x_0)\,h = -f(x_0).$$


Since J (x0 ) is a known matrix and f (x0 ) is a known vector, this equation is just a
system of linear equations, which can be solved efficiently and accurately. Once we have
the solution vector h we can obtain our improved estimate x1 by

x1 = x0 + h.

This, then, fully defines Newton’s method for systems of non-linear equations as

1. Choose some initial guess x0

2. For all k = 0, 1, · · · until convergence

a) Compute the Jacobian matrix J (x) = Df (x)


b) Solve the linear system J (xk )hk = −f (xk ) with respect to hk .
c) set xk+1 = xk + hk .

The convergence criteria is given by

||xk+1 − xk || < ,

where ε is the given error tolerance.

Example 3.6:
Solve the following systems of nonlinear equations using the Newton’s method

x1 − x2 + 1 = 0
x21 + x22 − 4 = 0
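Since no worked solution is given here, the following hedged Python/NumPy sketch shows how the algorithm above can be applied to this system; the Jacobian written out below and the starting guess x0 = (1, 2) are our own illustrative choices:

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-8, max_iter=50):
    """Newton's method for f(x) = 0: solve J(x_k) h_k = -f(x_k), set x_{k+1} = x_k + h_k."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        h = np.linalg.solve(jac(x), -f(x))   # linear solve instead of forming J^{-1}
        x = x + h
        if np.linalg.norm(h) < tol:
            return x, k
    return x, max_iter

f = lambda x: np.array([x[0] - x[1] + 1.0, x[0]**2 + x[1]**2 - 4.0])
jac = lambda x: np.array([[1.0, -1.0], [2.0 * x[0], 2.0 * x[1]]])
print(newton_system(f, jac, x0=[1.0, 2.0]))   # converges to a root near (0.8229, 1.8229)
```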

3.5 Eigenvalue Problems

In this section we will discuss, in some detail, some iterative methods for finding single
eigenvalue-eigenvector pairs (eigenpairs is a common term) of a given real matrix A; we
will also give an overview of more powerful and general methods that are commonly used
to find all the eigenpairs of a given real A.


3.6 Basic Properties of eigenvalues and


eigenvectors

The algebraic eigenvalue problem is as follows: Given a matrix A ∈ Rn×n , find a nonzero
vector x ∈ Rn and the scalar λ such that

Ax = λx

Note that this says that the vector Ax is parallel to x, with λ being an amplification
factor, or gain. Note also that the above implies that

$$(A - \lambda I)x = 0,$$

so that A − λI is a singular matrix. Hence, det(A − λI) = 0; it is easy to show that this
determinant is a polynomial (of degree n) in λ, known as the characteristic polynomial
of A, p(λ), so that the eigenvalues are the roots of a polynomial. Although this is not a
good way to compute the eigenvalues, it does give us some insight into their properties.
Thus, we know that an n × n matrix has n eigenvalues, that the eigenvalues can be
repeated, and that a real matrix can have complex eigenvalues, but these must occur in
conjugate pairs. We summarize these and a number of other basic eigenvalue properties
in the following theorem, presented without proof.

Theorem 3.1
Basic Eigenvalue Properties Let A ∈ Rn×n be given. Then we have the following:

1. There are exactly n eigenvalues, counting multiplicities; complex eigenvalues


will occur in conjugate pairs.

2. Eigenvectors corresponding to distinct eigenvalues are linearly independent.

3. If an n × n matrix A has n independent eigenvectors, then there exists a


nonsingular matrix P such that P −1 AP = D is diagonal and A is called
diagonalizable. Moreover, the columns of P are the eigenvectors of A and
the elements dii = λi are the eigenvalues of A.

4. If A is symmetric (A = AT ), then the eigenvalues are real and we can choose


the eigenvectors to be real and orthogonal.


5. If A is symmetric, then there is an orthogonal matrix Q such that QT AQ =


D is diagonal, where the elements dii = λi , are the eigenvalues of A.

6. If A is triangular, then the eigenvalues are the diagonal elements, λi = aii .

3.6.1 Power method for finding dominant eigen-


values

The power method is an iterative technique to locate the dominant eigenvalue of a given matrix; it also computes the associated eigenvector.

Let A be an n × n matrix with eigenvalues λ1 , λ2 , λ3 , · · · , λn , not necessarily distinct, that satisfy the relation

$$|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_n|.$$

The eigenvalue λ1 which is the largest in magnitude, is known as dominant eigenvalue of


the matrix A. Furthermore, we assume that the associated eigenvectors v1 , v2 , v3 , · · · , vn
are linearly independent, and therefore form a basis for Rn . It should be noted that not all
matrices have eigenvalues and eigenvectors which satisfy the conditions we have assumed
here.

Let x0 be a nonzero element of Rn . Since the eigenvectors of A form a basis for Rn , it


follows that x0 can be written as a linear combination of v1 , v2 , v3 , · · · , vn , that is, there
exists constants α1 , α2 , α3 , · · · , αn such that

x0 = α1 v1 + α2 v2 + α3 v3 + · · · + αn vn

Next, construct the sequence {xm } according to the rule xm = Axm−1 for m ≥ 1. By


direct calculation we find

x1 = Ax0
= α1 (Av1 ) + α2 (Av2 ) + α3 (Av3 ) + · · · + αn (Avn )
= α1 (λ1 v1 ) + α2 (λ2 v2 ) + α3 (λ3 v3 ) + · · · + αn (λn vn )
x2 = Ax1 = A2 x0
= α1 (A2 v1 ) + α2 (A2 v2 ) + α3 (A2 v3 ) + · · · + αn (A2 vn )
= α1 (λ21 v1 ) + α2 (λ22 v2 ) + α3 (λ23 v3 ) + · · · + αn (λ2n vn )

and, in general

xm = Axm−1 = · · · = Am x0
= α1 (Am v1 ) + α2 (Am v2 ) + α3 (Am v3 ) + · · · + αn (Am vn )
= α1 (λm m m m
1 v1 ) + α2 (λ2 v2 ) + α3 (λ3 v3 ) + · · · + αn (λn vn )

In deriving these equations we have made repeated use of the relation Avj = λj vj , which follows from the fact that vj is an eigenvector associated with the eigenvalue λj .

Factoring λ1m from the right-hand side of the equation for xm gives

$$x_m = \lambda_1^m\left[\alpha_1 v_1 + \alpha_2\left(\frac{\lambda_2}{\lambda_1}\right)^m v_2 + \alpha_3\left(\frac{\lambda_3}{\lambda_1}\right)^m v_3 + \cdots + \alpha_n\left(\frac{\lambda_n}{\lambda_1}\right)^m v_n\right]. \tag{3.31}$$

By assumption, |λj /λ1 | < 1 for each j ≥ 2, so |λj /λ1 |^m → 0 as m → ∞. It therefore follows that

$$\lim_{m \to \infty} \frac{x_m}{\lambda_1^m} = \alpha_1 v_1.$$

Since any nonzero constant times an eigenvector is still an eigenvector associated with the same eigenvalue, we see that the scaled sequence {xm /λ1m } converges to an eigenvector associated with the dominant eigenvalue, provided α1 ≠ 0. Furthermore, convergence towards the eigenvector is linear with asymptotic error constant |λ2 /λ1 |.

3.6.2 Approximated dominant eigenvalue

An approximation for the dominant eigenvalue of A can be obtained from the sequence
(m−1)
{x(m) } as follows. Let i be an index for which xi 6= 0, and consider the ratio of the
ith element from the vector xm to the ith element from x(m−1) .


By equation (3.31),

$$\frac{x_i^{(m)}}{x_i^{(m-1)}} = \frac{\lambda_1^{m}\, \alpha_1 v_{1,i}\left[1 + O\!\left((\lambda_2/\lambda_1)^{m}\right)\right]}{\lambda_1^{m-1}\, \alpha_1 v_{1,i}\left[1 + O\!\left((\lambda_2/\lambda_1)^{m-1}\right)\right]} = \lambda_1\left[1 + O\!\left((\lambda_2/\lambda_1)^{m-1}\right)\right],$$

provided v1,i ≠ 0, where v1,i denotes the ith element of the vector v1 . Hence, the ratio xi(m) /xi(m−1) converges towards the dominant eigenvalue, and the convergence is linear with asymptotic rate constant |λ2 /λ1 |.

To avoid overflow or underflow problems when calculating the sequence {x(m) }, it is common practice to scale the vectors x(m) so that they are all of unit length. Here, we will use the l∞ norm to measure vector length. Thus, in a practical implementation of the power method, the vectors x(m) would be computed in two steps: first multiply the previous vector by the matrix A, and then scale the resulting vector to unit length.

Step 1: Select x(0) ≠ 0; set k = 1, Nmax , tol, and λ(0) = 0.
Step 2: Compute y(k) = A x(k−1) and set λ(k) = yp(k) , where |yp(k) | = ‖y(k) ‖∞ .
Step 3: If |λ(k) − λ(k−1) | < tol or k ≥ Nmax , stop.
Step 4: Otherwise set x(k) = y(k) /λ(k) , increase k by 1, and go to Step 2.

Algorithm 1: Power method algorithm

Theorem 3.2 Determining an Eigenvalue from and Eigenvector

If x is an eigenvector of a matrix A, then its corresponding eigenvalue is given by

Ax · x
λ=
x·x

This quotient is called the Rayleigh quotient.
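Algorithm 1 and the Rayleigh quotient can be combined in a short Python/NumPy sketch (the function name and defaults below are illustrative assumptions):

```python
import numpy as np

def power_method(A, x0, tol=1e-3, max_iter=100):
    """Power method with infinity-norm scaling; the Rayleigh quotient refines the result."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float)
    lam_old = 0.0
    for k in range(1, max_iter + 1):
        y = A @ x
        lam = y[np.argmax(np.abs(y))]        # signed component of largest magnitude
        x = y / lam
        if abs(lam - lam_old) < tol:
            break
        lam_old = lam
    rayleigh = (A @ x) @ x / (x @ x)         # eigenvalue estimate from the eigenvector
    return rayleigh, x
```

Running it on the matrix of Example 3.1 below with x0 = (1, 1, 1) reproduces the dominant eigenvalue 3 found by hand.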


Example 3.1

Approximate dominant eigenvalue using the power method for the matrix
 
1 2 0
 
A = −2 1 2
 
 
1 3 1

Use x0 = (1, 1, 1) as the initial approximation and tol = 0.001


Solution:

Step 1: First iteration of the power method produces


    
1 2 0 1 3
    
(1) (0)
y = Ax = −2 1 2 1 = 1
    
    
1 3 1 1 5

Thus, λ(1) = ky 1 k∞ = 5 and hence we have


 
0.60
y (1)  
x(1) = = 
0.20

λ(1) 


1.00

Since |λ(1) − λ(0) | = 5 ≥ tol we need to repeat the power method.

Step 2: Second iteration of the power method produces


    
1 2 0 0.60 1.00
    
y (2) = Ax(1) = −2 1 2 0.20 = 1.00
    
    
1 3 1 1.00 2.20

Thus, λ(2) = ky 2 k∞ = 2.20 and hence we have


 
0.45
y (2)  
x(2) = = 0.45
 
(2)
 
λ  
1.00

Since |λ(2) − λ(1) | = |2.20 − 5| = 2.80 ≥ tol we need to repeat the power
method.

Continuing this process, you obtain the sequence of approximations shown in the table below.



        x(0)    x(1)    x(2)    x(3)    x(4)    x(5)    x(6)    x(7)
        1.00    0.60    0.45    0.48    0.51    0.50    0.50    0.50
        1.00    0.20    0.45    0.55    0.51    0.49    0.50    0.50
        1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00

        λ(0)    λ(1)    λ(2)    λ(3)    λ(4)    λ(5)    λ(6)    λ(7)
        0       5.0     2.20    2.82    3.13    3.02    2.99    3.00

Then the dominant eigenvalue can be obtained using the Rayleigh quotient after the
convergence of the power method. Hence

Ax(7) · x(7)
λ1 = = 3.0
x(7) · x(7)

Exercise 3.1
Determine the largest eigenvalue and the corresponding eigenvector of the matrix

$$A = \begin{pmatrix} 5 & 4 \\ 1 & 2 \end{pmatrix}$$

using the power method with $x_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ as an initial guess.

Answer: After the 6th iteration you will find that λ(6) = λ(5) = 6 and x(6) = x(5) = (1, 0.25)T .

3.6.3 Inverse power method

Suppose λ is an eigenvalue of matrix A with eigenvector x; then:

$$\begin{aligned}
Ax &= \lambda x\\
\iff A^{-1}Ax &= \lambda A^{-1}x\\
\iff x &= \lambda A^{-1}x\\
\iff A^{-1}x &= \tfrac{1}{\lambda}\,x\\
\iff \tfrac{1}{\lambda} &\ \text{is an eigenvalue of } A^{-1}.
\end{aligned}$$

Thus if λ is an eigenvalue of A and x is the corresponding eigenvector, then 1/λ is an eigenvalue of A−1 corresponding to the same eigenvector. Hence the reciprocal of the largest


eigenvalue of A−1 is the smallest eigenvalue of A. Therefore to find the smallest eigenvalue
of A, we apply power method on A−1 . This process is called inverse power method.
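In a practical sketch one would not form A−1 explicitly; each step of the power method simply solves a linear system instead. A minimal Python/NumPy illustration (an assumed sketch, not prescribed by the notes):

```python
import numpy as np

def inverse_power_method(A, x0, tol=1e-6, max_iter=100):
    """Power method applied to A^{-1}: each iteration solves A y = x."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float)
    lam_old = 0.0
    for k in range(1, max_iter + 1):
        y = np.linalg.solve(A, x)            # y = A^{-1} x without forming the inverse
        lam = y[np.argmax(np.abs(y))]        # dominant eigenvalue estimate of A^{-1}
        x = y / lam
        if abs(lam - lam_old) < tol:
            break
        lam_old = lam
    return 1.0 / lam, x                      # smallest-magnitude eigenvalue of A
```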

Example 3.2

Perform3 iterations

of the inverse
 
power method to obtain the smallest eigenvalue
1 6 1 1
   
1 2 0 using x(0) = 0 as an initial guess.
of A =    
   
0 0 3 0



Chapter 4
Interpolation

4.1 Finite differences

Different types of finite difference operators are used in Numerical Analysis, viz. the shift (E), averaging (mean) (µ), forward difference (∆), backward difference (∇) and central difference (δ) operators; these are discussed in this chapter.

When a function is known explicitly, it is easy to calculate the value (or values) of f (x),
corresponding to a fixed given value . However, when the explicit form of the function is
not known, it is possible to obtain an approximate value of the function up to a desired
level of accuracy with the help of finite differences. The calculus of finite differences deals
with the changes that take place in the value of a function due to finite changes in the
independent variable.

Definition 4.1 Finite Differences


If x0 , x1 , x2 · · · , xn be given set of observations and let y0 = f (x0 ), y1 = f (x1 ), y2 =
f (x2 ), · · · , yn = f (xn ) be their corresponding values for the curve y = f (x), y1 −
y0 , y2 − y1 , · · · , yn − yn−1 is called as finite difference.

Different types of finite difference operators are defined, among them forward difference,
backward difference and central difference operators are widely used for equally spaced
data points.

Let the function y = f (x), x begin the independent variable and y a dependent variable,
be defined on the closed interval [a, b] and let x0 , x1 , · · · , xn be the n discrete values of


x on the given interval. Assumed that these values are equidistant, i.e. xi = x0 + ih,
i = 0, 1, 2, · · · , n; h is a suitable real number called the difference of the interval or step
size. When x = xi , the value of y is denoted by yi and is defined by yi = f (xi ). The
values of independent variable x are called arguments and the dependent variable y are
called entries.

Definition 4.2 Operator

An operator is a symbol which, when applied to a function, transforms it into another function; equivalently, we can say that an operator maps one sequence to another.

4.1.1 Shift operators

If the shift operator E is operating on yi , the y value is shifted down to the next provided
y value.

Definition 4.3 Shift Operator

Let y = f (x) be a function x, and let x takes the consecutive values x, x + h, x +


2h, etc. We then define an operator having the property

Ef (x) = f (x + h).

Thus, when E operates on f (x), the result is the next value of the function. Here,
E is called the shift operator.

If we apply the shift operator twice on f (x), we get

E 2 f (x) = E(Ef (x))


= Ef (x + h)
= f (x + 2h)

Thus, in general, if we apply the operator E n times, we get

E n f (x) = f (x + nh),

or
E n yx = y(x + nh),


or using the index i


E n yi = yi+n

For example

Ey0 = y1 , E 2 y0 = Ey1 = y2 , E 4 y0 = y4 , · · · , E 2 y2 = y4 .

Definition 4.4 Inverse Shift Operator

The inverse operator E −1 is defined as

E −1 f (x) = f (x − h), or E −1 yx = yx−h or E −1 yi = yi−1

Similarly,
E −n f (x) = f (x − nh).

4.1.2 Averaging Operator

The averaging operator µ is defined as

$$\mu f(x) = \frac{1}{2}\left[f\!\left(x + \tfrac{1}{2}h\right) + f\!\left(x - \tfrac{1}{2}h\right)\right]$$

4.1.3 Differential Operator

The differential operator D is defined as

d
Df (x) = f (x) = f 0 (x),
dx (4.1)
d2
D2 f (x) = 2 f (x) = f 00 (x).
dx


4.1.4 Forward difference operator

Definition 4.5 Forward difference Operator

The forward difference is denoted by (4) and is defined by

4 f (x) = f (x + h) − f (x). (4.2)

When x = xi then from above equation

4f (xi ) = f (xi + h) − f (xi ), i.e., 4yi = yi+1 − yi , i = 0, 1, 2, · · · , n − 1.

In particular,

4y0 = y1 − y0 , 4y1 = y2 − y1 , · · · , 4yn−1 = yn − yn−1 .

These are called First order Forward Differences. The differences of the first order
forward differences are called Second order Forward Differences and are denoted by
42 y0 , 42 y1 , 42 y2 , · · · , 42 yn . In particulat two second order differences are

42 y0 = 4y1 − 4y0 = (y2 − y1 ) − (y1 − y0 ) = y2 − 2y1 + y0 .


42 y1 = 4y2 − 4y1 = (y3 − y2 ) − (y2 − y1 ) = y3 − 2y2 + y1 .

The third order forward differences are also defined in similar manner, i.e.

43 y 0 = 42 y 1 − 4 2 y 0
= (y3 − 2y2 + y1 ) − (y2 − 2y1 + y0 .)
= y3 − 3y2 + 3y1 − y0 .
! ! !
1 3 3 3
= y3 + (−1) y2 + (−1)2 y1 + (−1)3 y0
2 1 0
43 y1 = y4 − 3y3 + 3y2 − y1 ,
! ! !
1 3 3 3
= y4 + (−1) y3 + (−1)2 + (−1)3 y1 ,
2 1 0
!
n n!
where the combination = .
k k!(n − k)!


In general, higher order differences can be defined as follows


h i h i
4n f (x) = 4 4n−1 f (x) , i.e., 4n yi = 4 4n−1 yi , n = 1, 2, · · · .

Or

$$\Delta^n y_i = y_{n+i} + \sum_{k=1}^{n} (-1)^k \binom{n}{n-k}\, y_{n+i-k}. \tag{4.3}$$

It must be remembered that ∆⁰ ≡ identity operator, i.e. ∆⁰f (x) = f (x), and ∆¹ ≡ ∆.

Example 4.1

Find 44 y3 ?
Solution: Here, we can either use the formula in Equation (4.3) or successive
application of the forward difference operator on y3 four times.
Using the formula, we have n = 4 and i = 3 hence

4
! !
4
X
k 4
4 y3 = y4+3 + (−1) y4+3−k
k=1 4−k
! ! ! !
1 4 4 4 4
= y7 + (−1) y6 + (−1)2 y5 + (−1)3 y4 + (−1)4 y3
3 2 1 0
= y7 − 4y6 + 6y5 − 4y4 + 1y3 .

Applying the forward difference operator 4 times on y3 yields

44 y3 = 43 y4 − 43 y3
   
= 42 y 5 − 4 2 y 4 − 42 y 4 − 4 2 y 3
= [(4y6 − 4y5 ) − (4y5 − 4y4 )] − [(4y5 − 4y4 ) − (4y4 − 4y3 )]
= [((y7 − y6 ) − (y6 − y5 )) − ((y6 − y5 ) − (y5 − y4 ))]
− [((y6 − y5 ) − (y5 − y4 )) − ((y5 − y4 ) − (y4 − y3 ))]
= [(y7 − 2y6 + y5 ) − (y6 − 2y5 + y4 )] − [(y6 − 2y5 + y4 ) − (y5 − 2y4 + y3 )]
= [y7 − 3y6 + 3y5 − y4 ] − [y6 − 3y5 + 3y4 − y3 ]
= y7 − 4y6 + 6y5 − 4y4 + y3 .

∴ 44 y3 = y7 − 4y6 + 6y5 − 4y4 + y3 .

All the forward differences can be represented in a tabular form, called the forward
difference or diagonal difference table. Let x0 , x1 , · · · , x4 be five arguments. All the forward differences of these arguments are shown in Table 4.1 below.


x y 4 42 43 44
x0 y0
4y0
x1 y1 42 y 0
4y1 43 y0
2
x2 y2 4 y1 44 y 0
4y2 43 y1
2
x3 y3 4 y2
4y3
x4 y4

Table 4.1: Forward difference table.

The first entry, i.e., y0 is called leading term and 4y0 , 42 y0 , 43 y0 , · · · are called the
leading differences.

Error propagation in a difference table

If any entry of the difference table has an error, then this error spreads over the table in a convex (fan-out) manner. The propagation of error in a difference table is illustrated in Table 4.2. Let us assume that y3 has an error and that the amount of the error is ε.

x y 4 42 43 44 45 46
x0 y0
4y0
x1 y1 42 y 0
4y1 43 y 0 + 
x2 y2 42 y 1 +  44 y0 − 4
3
4y2 +  4 y1 − 3 45 y0 + 10
x3 y3 +  42 y2 − 2 44 y1 + 6 46 y0 − 20
3 5
4y3 −  4 y2 + 3 4 y1 − 10
x4 y4 42 y 3 +  44 y2 − 4
4y4 43 y 3 − 
2
x5 y5 4 y4
4y5
x6 y6

Table 4.2: Error propagation in a finite difference table.

Following observations are noted from Table 4.2.

(i) The error increases with the order of the differences.


(ii) The error is maximum (in magnitude) along the horizontal line through the erro-
neous tabulated value.

(iii) In the k th difference column, the coefficients of the errors are the binomial coefficients in the expansion of (1 − x)k . In particular, the errors in the second difference column are ε, −2ε, ε; in the third difference column these are ε, −3ε, 3ε, −ε; and so on.

(iv) The algebraic sum of errors in any complete column is zero.

Example 4.2

Construct a forward diagonal difference table for the following set of values:

xi 0 2 4 6 8 10 12 14
yi 625 81 1 1 81 625 2401 6561

Solution: The forward difference table is given by

xi yi 4 42 43 44 45
0 625
-544
2 81 464
-80 -384
4 1 80 384
0 0 0
6 1 80 384
80 384 0
8 81 464 384
544 768 0
10 625 1232 384
1776 1152
12 2401 2384
4160
14 6561

Note: If f (x) is a polynomial of degree n in x, then its nth forward difference is a constant and its (n + 1)th forward difference is zero.


Conversely, if the (n + 1)th difference of a polynomial is zero, then the degree of the


polynomial is less or equal to n. Example: Let f (x) = x2 + 8x − 5, then

4f (x) = f (x + h) − f (x)
h i h i
= (x + h)2 + 8(x + h) − 5 − x2 + 8x − 5
= 2xh + h2 + 8h

Now,
42 f (x) = 4f (x + h) − 4f (x)
h i h i
= 2h(x + h) + h2 + 8h − 2hx + h2 + 8h
= 2h2 , which is a constant.
Hence,
43 f (x) = 42 f (x + h) − 42 f (x)
= 2h2 − 2h2 = 0.
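Because each column of a difference table is simply the list of successive differences of the previous column, the whole table can be generated with a few lines of Python (a sketch; the data are those of Example 4.2, with the last entry taken as 6561):

```python
def forward_difference_table(y):
    """Return the columns of the forward (diagonal) difference table.

    Column 0 is y itself; column k holds the k-th order forward differences."""
    table = [list(y)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

y = [625, 81, 1, 1, 81, 625, 2401, 6561]
for k, col in enumerate(forward_difference_table(y)):
    print(k, col)   # the 4th-order column is the constant 384; the 5th is all zeros
```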

4.1.5 Backward difference operator

Definition 4.6 Backward Difference Operator

The symbol 5 is used to represent backward difference operator. The back-


ward difference operator is defined as

5 f (x) = f (x) − f (x − h). (4.4)

Please note that the backward difference of f (x + h) is same as the forward difference of
f (x), that is
5f (x + h) = 4f (x).

When x = xi , the above relation reduces to

5yi = yi − yi−1 , i = n, n − 1, · · · , 1.

In particular,
5y1 = y1 − y0 , 5y2 = y2 − y1 , · · · , 5yn = yn − yn−1 ,

These are called the first order backward differences. The second order backward
differences are denoted by 52 y2 , 52 y3 , · · · , 52 yn . First three second order backward


differences are
52 y2 = 5 (5y2 ) = 5 (y2 − y1 )
= 5y2 − 5y1
= (y2 − y1 ) − (y1 − y0 )
= y2 − 2y1 + y0 ,
52 y3 = y3 − 2y2 + y1 , and
52 y4 = y4 − 2y3 + y2 .
The other second order differences can be obtained in similar manner.

In general
5k yi = 5k−1 yi − 5k−1 yi−1 , i = n, n − 1, · · · , k,

where 50 yi = yi , 51 yi = 5yi . Like forward differences, these backward differences can


be written in a tabular form, called backward difference or horizontal difference table.
All backward difference table for the arguments x0 , x1 , · · · , x4 are shown in Table 4.3.

x y 5 52 53 54
x0 y0
5y1
x1 y1 52 y 2
5y2 53 y3
2
x2 y2 5 y3 54 y 4
5y3 53 y4
x3 y3 52 y 4
5y4
x4 y4

Table 4.3: Backward difference table.

It is observed from the forward and backward difference tables that, for a given table of values, both tables contain the same entries. Practically, there is no difference between the values in the tables, but theoretically they have separate significance, which will be discussed in the next chapter.

4.1.6 Central difference operator

There is another kind of finite difference operator known as central difference opera-
tor.


Definition 4.7 Central Difference Operator

The central difference operator is denoted by the symbol δ and is defined by

h h
δf (x) = f (x + ) − f (x − ). (4.5)
2 2

When x = xi , then the first order central difference, in terms of ordinates is

δyi = yi+ 1 − yi− 1


2 2

where yi+ 1 = f (xi + h2 ) and yi− 1 = f (xi − h2 ). In particular,


2 2

δy 1 = y1 − y0 , δy 3 = y2 − y1 , · · · , δyn− 1 = yn − yn−1 .
2 2 2

The second order central differences are

δ 2 yi = δyi+ 1 − δyi− 1
2 2

= (yi+1 − yi ) − (yi − yi−1 )


= yi+1 − 2yi + yi−1 .

In general,
δ n yi = δ n−1 yi+ 1 − δ n−1 yi− 1 .
2 2

All central differences for the five arguments x0 , x1 , · · · , x4 is shown in Table 4.4.

x y δ δ2 δ3 δ4
x0 y0
δy1/2
x1 y1 δ 2 y1
δy3/2 δ 3 y3/2
x2 y2 δ 2 y2 δ 4 y2
δy5/2 δ 3 y5/2
2
x3 y3 δ y3
δy7/2
x4 y4

Table 4.4: Central difference table.

It may be observed that all odd (even) order central differences have fractional (integral) suffixes, respectively.


4.1.7 Relations between operators

Lot of useful and interesting results can be derived among the operators discussed above.
First of all, we determine the relation between forward and backward difference operators.

∆ and ∇

∆yi = yi+1 − yi = ∇yi+1 .

∆²yi = ∆yi+1 − ∆yi = yi+2 − 2yi+1 + yi = ∇²yi+2 .

In general

∆ⁿyi = ∇ⁿyi+n , i = 0, 1, 2, · · · (4.6)

4 and E

There is a good relation between E and 4 operators.

4f (x) = f (x + h) − f (x)
= Ef (x) − f (x) (4.7)
= (E − 1)f (x).

From this relation one can conclude that the operators 4 and E − 1 are equivalent. That
is,

4 ≡ E − 1, (4.8)

or

E ≡ 4 + 1, (4.9)


The expression for higher order forward differences in terms of function values can be
derived as per following way:

43 yi = (E − 1)3 yi
= (E 3 − 3E 2 + 3E − 1)yi = y3 − 3y2 + 3y1 − y0 .

5 and E

The relation between 5 and E operators is derived below:

5f (x) = f (x) − f (x − h)
= f (x) − E −1 f (x)
= (1 − E −1 )f (x).

That is,

5 ≡ 1 − E −1 , (4.10)

δ and E

The relation between the operators δ and E is given below:

$$\delta f(x) = f\!\left(x + \tfrac{h}{2}\right) - f\!\left(x - \tfrac{h}{2}\right) = E^{1/2} f(x) - E^{-1/2} f(x) = \left(E^{1/2} - E^{-1/2}\right) f(x).$$

That is

1 1
δ ≡ E 2 − E− 2 . (4.11)

Every operator defined earlier can be expressed in terms of other operator(s). Few more
relations among the operators 4, 5, E and δ are given in the following.


E 4 5 δ q
hD
δ2 δ2
E E 4+1 (1 − 5)−1 1+ 2q
+ δ 1+ 4
ehD
−1 δ2 δ2
4 E−1 4 (1 − 5) −1 2
+δ 1+ 4
ehD − 1
2
q
2
5 1 − E −1 1 − (1 + 4)−1 5 − δ2 + δ 1 + δ4 1- e−hD
1 1
δ E 2 − E− 2 4(1 + 4)−1/2 5(1 − 5)−1/2 δ 2 sinh(hD/2)
hD log E log(1 + 4) − log(1 − 5) −2 sinh−1 (δ/2) hD

Table 4.5: Relationship between the operators

4.2 Interpolations

An interpolation task usually involves a given set of data points: where the values yi can,

xi x0 x1 ··· xn
f (xi ) y0 y1 ··· yn

for example, be the result of some physical measurement or they can come from a long
numerical calculation. Thus we know the value of the underlying function f (x) at the
set of points {xi }, and we want to find an analytic expression for f . In practice, often
we can measure a physical process or quantity (e.g., temperature) at a number of points
(e.g., time instants for temperature), but we do not have an analytic expression for the
process that would let us calculate its value at an arbitrary point. Interpolation provides
a simple and good way of estimating the analytic expression, essentially a function, over
the range spanned by the measured points.

In interpolation, the task is to estimate f (x) for arbitrary x that lies between the smallest
and the largest xi . If x is outside the range of the xi ’s, then the task is called extrapo-
lation, which is considerably more hazardous.

Definition 4.8 Interpolation and extrapolation

Suppose that the function f (x) is known at (n + 1) points


(x0 , y0 ), (x1 , y1 ), · · · , (xn , yn ) where the pivotal points xi spread out over the
interval [a, b] satisfy a = x0 < x1 < · · · < xn = b and yi = f (xi ) then finding
the value of the function at any non-tabular point xs (x0 < xs < xn ) is called
interpolation.

Interpolation is done by approximating the required function using simpler functions


such as, polynomials. Polynomial approximations assume the data as exact at the (n + 1)
tabular points and generate an nth degree polynomial passing through these (n + 1)
points. However, if the given data has some errors then these errors also will reflect in the
corresponding approximated function. More accurate approximations can be obtained using splines and Chebyshev, Legendre and Hermite polynomials, but polynomials of degree n or less passing through (n + 1) points are easy to develop and useful in understanding
numerical differentiation and numerical integral. Hence the present chapter is devoted to
developing and using polynomial interpolation formulae to the required functions.

Definition 4.9
The points x0 , · · · , xn are called the interpolation points. The property of “pass-
ing through these points” is referred to as interpolating the data or called inter-
polation condition. The function that interpolates the data is an interpolant
or an interpolating polynomial (or whatever function is being used).

The purpose of interpolation are:

1. Replace a set of data points {(xi , yi )} with a function given analytically. Here we
have several aspects

• Given a set of data points {(xi , yi )}, find a curve passing thru these points
that is “pleasing to the eye”. In fact, this is what is done continually with
computer graphics. How do we connect a set of points to make a smooth
curve? Connecting them with straight line segments will often give a curve
with many corners, whereas what was intended was a smooth curve.
• We may want to take function values f (x) given in a table for selected values
of x, often equally spaced, and extend the function to values of x not in the
table. For example, given numbers from a table of logarithms, estimate the
logarithm of a number x not in the table.
• The data may be from a known class of functions. Interpolation is then used
to find the member of this class of functions that agrees with the given data.
For example, data may be generated from functions of the form

p(x) = a0 + a1 ex + a2 e2x + · · · + an enx

Then we need to find the coefficients {aj } based on the given data values.

2. Approximate functions with simpler ones, usually polynomials or ’piecewise poly-


nomials’. Here, the interpolation is to approximate the function f (x) by a simpler


function, perhaps to make it easier to integrate or differentiate f (x). That will be
the primary reason for studying interpolation in this course and the application will
be discussed in Chapter 6. As an example of why this is important, consider the
problem of evaluating Z 1
dx
0 1 + x10
This is very difficult to do analytically. But we will look at producing polynomial
interpolants of the integrand; and polynomials are easily integrated exactly.

4.2.1 Linear interpolation

The simplest form of interpolation is probably the straight line, connecting two points
by a straight line. Consider the data (x0 , y0 ), (x1 , y1 ). The problem is to find a function
P1 (x) which passes through these two data points. Since there are only two data points
available, the maximum degree of the unique polynomial which passes through these
points is one. Let us assume that P1 (x) = ax + b is the straight line passing through the
two points then
ax0 + b = y0 ,
ax1 + b = y1 .
Solving for a and b gives
y1 − y0
a=
x1 − x 0
x1 y0 − x0 y1
b=
x1 − x0
Thus, P1 (x) can be written in more convenient ways as

x − x1 x − x0
P1 (x) = y0 + y1
x0 − x1 x1 − x0
(x1 − x)y0 + (x0 − x)y1
=
x1 − x0 (4.12)
x − x0
= y0 + [y1 − y0 ]
x1 − x0
y1 − y0
 
= y0 + (x − x0 )
x1 − x 0

Check each of these by evaluating them at x = x0 and x1 to see if the respective values
are y0 and y1 .

As we will see, the interpolating polynomial can be written in a variety of forms, among

Numerical Analysis-I Page 123


Chapter-4: Interpolation 4.2. Interpolations

these are the Lagrange form and the Newton form. These forms are equivalent in
the sense that the polynomial in question is the one and the same (in fact, the solution
to the interpolation task is given by a unique polynomial)

Example 4.3

Suppose we have the following velocity versus time data (a car accelerating from a
rest position)

Time, s Velocity, m.p.h


0 0
1 10
2 25
3 36
4 52
5 59

Use linear interpolation to estimate the car’s velocity at time t = 1.5 and at t =
4.25.
Solution: Let's denote the discrete times by ti and the velocities by vi , i.e., the velocity
at ti .
To estimate the car's velocity at t = 1.5 we use the data points (t1 , v1 ) and
(t2 , v2 ), i.e., (1, 10) and (2, 25) respectively, in Equation (4.12). Thus, we have

v(t) = [(t − t2)/(t1 − t2)] v1 + [(t − t1)/(t2 − t1)] v2 ,
v(1.5) = [(1.5 − 2)/(1 − 2)] (10) + [(1.5 − 1)/(2 − 1)] (25)
       = 5 + 12.5 = 17.5

Next, to estimate the velocity at t = 4.25 we use the last two data points and we
have

v(t) = [(t − t5)/(t4 − t5)] v4 + [(t − t4)/(t5 − t4)] v5 ,
v(4.25) = [(4.25 − 5)/(4 − 5)] (52) + [(4.25 − 4)/(5 − 4)] (59)
        = 53.75


Example 4.4

Given f (x) = ex and x0 = 0.82, x1 = 0.83, find P1 (x) which approximates ex on


the interval [0.82, 0.83] and evaluate P1 (0.826).
Solution: First we find the interpolation points,i.e., y0 = ex0 = 2.2705 and y1 =
ex1 = 2.29332. Hence, we have

0.83 − x x − 0.82
P1 (x) = 2.2705 + 2.29332
0.83 − 0.82 0.83 − 0.82

Hence
P1 (0.826) = 2.284192.

Remark: In general, if y0 = f (x0 ) and y1 = f (x1 ) for some function f , then P1 (x) is a
linear approximation of f (x) for all x ∈ [x0 , x1 ].

4.2.2 Quadratic interpolation

Given the data points (x0 , y0 ), (x1 , y1 ), (x2 , y2 ), we want to find a quadratic
polynomial that passes through these points.

Figure 4.1: Quadratic interpolation

Similar to the linear case, the equation of this parabola is given by

P2 (x) = a0 + a1 x + a2 x2 ,


which satisfies
P2 (xi ) = yi ,  for i = 0, 1, 2,

for the given data points. One formula for such a polynomial follows:

P2 (x) = y0 L0 (x) + y1 L1 (x) + y2 L2 (x) (4.13)

with

L0 (x) = (x − x1 )(x − x2 ) / [(x0 − x1 )(x0 − x2 )],
L1 (x) = (x − x0 )(x − x2 ) / [(x1 − x0 )(x1 − x2 )],
L2 (x) = (x − x0 )(x − x1 ) / [(x2 − x0 )(x2 − x1 )].

Equation (4.13) is called Lagrange's form of the interpolation polynomial and the
functions L0 , L1 and L2 are called Lagrange's interpolating basis functions. The
Lagrange basis functions have the property that deg(Li ) ≤ 2 and

Li (xj ) = δij = { 1, if i = j
                { 0, if i ≠ j.

Here, δij is called the Kronecker delta.

Example 4.5

Construct the quadratic polynomial interpolation P2 (x) that interpolates the points
(1, 4), (2, 1), and (5, 6).

4.3 Lagrange’s interpolation formula

Lagrange's formula applies whether the values of the independent variable occur at
equal or unequal intervals, but it is particularly useful when the intervals of the given
independent variable are unequal.

Given n + 1 discrete data points (xi , yi ), i = 0, 1, 2...n, since there are (n + 1) data points
(xi , yi ), we can define a polynomial of degree n as

Pn (x) =a0 (x − x1 )(x − x2 ) · · · (x − xn ) + a1 (x − x0 )(x − x2 ) · · · (x − xn )


+ a2 (x − x0 )(x − x1 )(x − x3 ) · · · (x − xn ) + · · · + an (x − x0 ) · · · (x − xn−1 )
(4.14)


and we have assumed that, i.e., the interpolation condition:

yi = Pn (xi ) i = 0, 1, .....n

For i = 0, we have
y0 = Pn (x0 ) = a0 (x0 − x1 ) · · · (x0 − xn )

∴ a0 = y0 / [(x0 − x1 ) · · · (x0 − xn )]

For i = 1, we get

y1 = Pn (x1 ) = a1 (x1 − x0 )(x1 − x2 ) · · · (x1 − xn )

∴ a1 = y1 / [(x1 − x0 )(x1 − x2 ) · · · (x1 − xn )]

Similarly, for i = 2, . . . , n − 1, we get

ai = yi / [(xi − x0 )(xi − x1 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn )]

and for i = n, we get

an = yn / [(xn − x0 ) · · · (xn − xn−1 )]


Thus, substituting the ai 's in Equation (4.14) we get

Pn (x) = y0 (x − x1 )(x − x2 ) · · · (x − xn ) / [(x0 − x1 )(x0 − x2 ) · · · (x0 − xn )]
       + y1 (x − x0 )(x − x2 ) · · · (x − xn ) / [(x1 − x0 )(x1 − x2 ) · · · (x1 − xn )]
       + · · ·
       + yi (x − x0 )(x − x1 ) · · · (x − xi−1 )(x − xi+1 ) · · · (x − xn ) / [(xi − x0 )(xi − x1 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn )]
       + · · ·
       + yn (x − x0 )(x − x1 ) · · · (x − xn−1 ) / [(xn − x0 )(xn − x1 ) · · · (xn − xn−1 )]

This can be rewritten in a compact form as:

Pn (x) = L0 (x)y0 + L1 (x)y1 + · · · + Ln (x)yn = Σ_{i=0}^{n} Li (x)yi

where

Li (x) = (x − x0 )(x − x1 ) · · · (x − xi−1 )(x − xi+1 ) · · · (x − xn ) / [(xi − x0 )(xi − x1 ) · · · (xi − xi−1 )(xi − xi+1 ) · · · (xi − xn )]

Or, using product notation,

Li (x) = ∏_{k=0, k≠i}^{n} (x − xk ) / ∏_{k=0, k≠i}^{n} (xi − xk )

Therefore, the Lagrange interpolation polynomial of degree n can be written as

Pn (x) = Σ_{i=0}^{n} Li (x)yi ,        (4.15)

where

Li (x) = ∏_{k=0, k≠i}^{n} (x − xk ) / ∏_{k=0, k≠i}^{n} (xi − xk ).        (4.16)
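
Equations (4.15)-(4.16) are straightforward to program. The following is a minimal sketch (an addition to these notes, assuming plain Python with no external libraries); it is applied to the data of Example 4.6 below.

# Sketch only: evaluate the Lagrange interpolant through (xs[i], ys[i]) at x.
def lagrange_eval(xs, ys, x):
    n = len(xs)
    total = 0.0
    for i in range(n):
        Li = 1.0
        for k in range(n):
            if k != i:
                Li *= (x - xs[k]) / (xs[i] - xs[k])   # basis function L_i(x)
        total += Li * ys[i]
    return total

# Data of Example 4.6: estimate f(1.25).
print(lagrange_eval([0, 1, 2, 3], [1, 2.25, 3.75, 4.25], 1.25))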


Example 4.6

Given the following data table, construct the Lagrange interpolation polynomial
P (x), to fit the data and find f (1.25)

xi 0 1 2 3
yi 1 2.25 3.75 4.25

4.4 Divided difference formula

A major difficulty with the Lagrange Interpolation is that one is not sure about the degree
of interpolating polynomial needed to achieve a certain accuracy. Thus, if the accuracy
is not good enough with polynomial of a certain degree, one needs to increase the degree
of the polynomial, and computations need to be started all over again. Furthermore,
computing various Lagrangian polynomials is an expensive procedure. It is, indeed,
desirable to have a formula which makes use of Pk−1 (x) in computing Pk (x).

The following form of interpolation, known as Newton’s interpolation allows us to do


so. The idea is to obtain the interpolating polynomial Pn (x) in the following form:

Pn (x) = a0 +a1 (x−x0 )+a2 (x−x0 )(x−x1 )+· · ·+an (x−x0 )(x−x1 ) · · · (x−xn−1 ), (4.17)

such that
Pn (xi ) = fi = yi , (Interpolation Condition.)

Hence, the constants a0 through an can be determined as follows using the interpolation
condition.

For i = 0 we get
f0 = Pn (x0 ) = a0

For i = 1 we have

f1 = pn (x1 ) = a0 + a1 (x1 − x0 )

f1 − f0
∴ a1 =
x1 − x0


For i = 2, we have

f2 = pn (x2 ) = a0 + a1 (x2 − x0 ) + a2 (x2 − x0 )(x2 − x1 )

∴ a2 = { [(f2 − f1 )/(x2 − x1 )] − [(f1 − f0 )/(x1 − x0 )] } / (x2 − x0 )

Similarly we can find a3 , . . . , an . To express the ai , i = 0, 1, . . . , n, in a compact manner, let
us first define the following notation, called divided differences:

f [xk ] = fk

f [xk , xk+1 ] = (f [xk+1 ] − f [xk ]) / (xk+1 − xk )

f [xk , xk+1 , xk+2 ] = (f [xk+1 , xk+2 ] − f [xk , xk+1 ]) / (xk+2 − xk )

Now the coefficients can be expressed in terms of divided differences as follows:

a0 = f0 = f [x0 ]

a1 = (f1 − f0 ) / (x1 − x0 ) = f [x0 , x1 ]

a2 = { [(f2 − f1 )/(x2 − x1 )] − [(f1 − f0 )/(x1 − x0 )] } / (x2 − x0 )
   = (f [x1 , x2 ] − f [x0 , x1 ]) / (x2 − x0 )
   = f [x0 , x1 , x2 ]

a3 = (f [x1 , x2 , x3 ] − f [x0 , x1 , x2 ]) / (x3 − x0 )
   = f [x0 , x1 , x2 , x3 ]
...
ai = (f [x1 , x2 , · · · , xi ] − f [x0 , x1 , x2 , · · · , xi−1 ]) / (xi − x0 )
   = f [x0 , x1 , x2 , · · · , xi ]
...


an = (f [x1 , x2 , · · · , xn ] − f [x0 , x1 , · · · , xn−1 ]) / (xn − x0 )
   = f [x0 , x1 , x2 , · · · , xn ]

Note that a1 is called the first divided difference, a2 the second divided
difference, and so on. Now the polynomial in Equation (4.17) can be rewritten as:

Pn (x) =f [x0 ] + f [x0 , x1 ](x − x0 ) + f [x0 , x1 , x2 ](x − x0 )(x − x1 )


(4.18)
+ · · · + f [x0 , x1 , · · · , xn ](x − x0 )(x − x1 ) · · · (x − xn−1 )
which can be rewritten in a compact form as

Pn (x) = Σ_{k=0}^{n} f [x0 , x1 , · · · , xk ] ∏_{i=0}^{k−1} (x − xi )        (4.19)

Thus, Equation (4.18), or equivalently (4.19), is called Newton's Divided Difference interpolation polynomial.

It may also be noted that for calculating the higher order divided differences we have used
lower order divided differences. In fact, starting from the given zeroth order differences,
one can systematically arrive at any of the higher order divided differences. For clarity
the entire calculation may be depicted in the form of a table called the Newton Divided
Difference Table.
xi f [xi ] First order Second order Third order Fourth order
difference difference difference difference
x0 f [x0 ]
f [x0 , x1 ]
x1 f [x1 ] f [x0 , x1 , x2 ]
f [x1 , x2 ] f [x0 , x1 , x2 , x3 ]
x2 f [x2 ] f [x1 , x2 , x3 ] f [x0 , x1 , x2 , x3 , x4 ]
f [x2 , x3 ] f [x1 , x2 , x3 , x4 ]
x3 f [x3 ] f [x2 , x3 , x4 ]
f [x3 , x4 ]
x4 f [x4 ]

Table 4.6: Newton Divided difference table.

In the above Newton divided difference table the bold faces are the coefficients of the


polynomial. Suppose again that we are given the data set (xi , fi ), i = 0, 1, · · · , 4 and
that we are interested in finding the 4th order Newton Divided Difference interpolating
polynomial. Let us first construct the Newton Divided Difference Table, in which one
can clearly see how the lower order differences are used in calculating the higher order
divided differences:

Example 4.7

Construct the Newton Divided Difference Table for generating Newton interpola-
tion polynomial with the following data set:

xi 0 1 2 3 4
f (xi ) = yi 0 1 8 27 64

Solution: Here n = 4. One can find a fourth order Newton Divided Difference
interpolation polynomial to the given data. Let us generate Newton Divided Dif-
ference Table first; as requested.

 xi   f [xi ]   1st order            2nd order           3rd order          4th order
                difference           difference          difference         difference
 0    0
                (1−0)/(1−0) = 1
 1    1                              (7−1)/(2−0) = 3
                (8−1)/(2−1) = 7                          (6−3)/(3−0) = 1
 2    8                              (19−7)/(3−1) = 6                       (1−1)/(4−0) = 0
                (27−8)/(3−2) = 19                        (9−6)/(4−1) = 1
 3    27                             (37−19)/(4−2) = 9
                (64−27)/(4−3) = 37
 4    64

Note: One may note that the given data corresponds to the cubic polynomial x3 .
To fit such a data 3rd order polynomial is adequate. From the Newton Divided
Difference table we notice that the fourth order difference is zero. Further the
divided differences in the table can be directly used for constructing the Newton
Divided Difference interpolation polynomial that would fit the data as follows

P3 (x) = 0 + 1 × (x − 0) + 3 × (x − 0)(x − 1) + 1 × (x − 0)(x − 1)(x − 2)


= x + 3(x2 − x) + x3 − 3x2 + 2x
= x3
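
The construction of the table and the evaluation of the Newton form can be automated. The following is a minimal sketch (an addition to these notes, assuming plain Python with no external libraries), checked against Example 4.7.

# Sketch only: divided-difference coefficients f[x0], f[x0,x1], ..., f[x0,...,xn].
def divided_differences(xs, ys):
    n = len(xs)
    table = list(ys)                      # zeroth-order differences
    coeffs = [table[0]]
    for order in range(1, n):
        table = [(table[i + 1] - table[i]) / (xs[i + order] - xs[i])
                 for i in range(n - order)]
        coeffs.append(table[0])
    return coeffs

def newton_eval(xs, coeffs, x):
    # Evaluate the Newton form (4.19) by nested multiplication.
    result = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[i]) + coeffs[i]
    return result

xs, ys = [0, 1, 2, 3, 4], [0, 1, 8, 27, 64]
coeffs = divided_differences(xs, ys)      # [0, 1, 3, 1, 0], as in the table above
print(newton_eval(xs, coeffs, 2.5))       # 15.625 = 2.5**3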


The advantage of the above method is that there is no need to start all over again
if additional pairs of data are added. We simply need to compute additional divided
differences. Since nth order polynomial interpolation of a given (n + 1) pairs of data is
unique, thus the above polynomial and Lagrangian polynomial are exactly the same.

Example 4.8

Use Newton’s Divided Difference formula and evaluate f (3.0) by a third and fourth
degree polynomial, given

xi 3.2 2.7 1.0 4.8 5.6


f (xi ) = yi 22.0 17.8 14.2 38.3 51.7

Solution: Here n = 4. One can find a fourth order Newton Divided Difference
interpolation polynomial to the given data. Let us generate Newton Divided Dif-
ference Table first:

 xi    f [xi ]   1st order    2nd order    3rd order    4th order
                 difference   difference   difference   difference
 3.2   22.0
                 8.400
 2.7   17.8                   2.856
                 2.118                     −0.5280
 1.0   14.2                   2.012                     0.2560
                 6.342                      0.0865
 4.8   38.3                   2.263
                 16.750
 5.6   51.7

The third degree polynomial fitting all points from x0 = 3.2 to x3 = 4.8 is given by

P3 (x) = 22.0 + 8.400(x − 3.2) + 2.856(x − 3.2)(x − 2.7)


− 0.5280(x − 3.2)(x − 2.7)(x − 1.0)
Hence
f (3) ≈ P3 (3) = 22.0 + 8.400(3 − 3.2) + 2.856(3 − 3.2)(3 − 2.7)
− 0.5280(3 − 3.2)(3 − 2.7)(3 − 1.0)
= 20.2120

The fourth degree polynomial fitting all points from x0 = 3.2 to x4 = 5.6 is also


given by

P4 (x) = P3 (x) + a4 (x − x0 )(x − x1 )(x − x2 )(x − x3 )
       = P3 (x) + 0.2560(x − 3.2)(x − 2.7)(x − 1.0)(x − 4.8)
Hence
f (3) ≈ P4 (3) = 20.2120 + 0.2560(3 − 3.2)(3 − 2.7)(3 − 1.0)(3 − 4.8)
= 20.2120 + 0.055296
= 20.267296.

4.5 The Newton-Gregory Interpolation


Formulae (with equidistant data points)

If the data points are equally spaced, that is,

xi+1 = xi + h,  i = 0, 1, · · · , n − 1,

then we have
xi = x0 + ih, i = 1, 2, · · · n.

Thus, we have then n + 1 data points as

(x0 , y0 ), (x0 + h, y1 ), (x0 + 2h, y2 ), · · · , (x0 + nh, yn )

Using these we can easily simplify the Newton divided differences using the forward, back-
ward and central difference and we can obtain the corresponding interpolating polynomial
formula.


4.5.1 The Newton-Gregory Forward Interpola-


tion Formula

Recalling the Divided difference and the forward difference we have the following:

f [x0 , x1 ] = (f (x1 ) − f (x0 )) / (x1 − x0 ) = ∆f (x0 ) / h

f [x0 , x1 , x2 ] = (f [x1 , x2 ] − f [x0 , x1 ]) / (x2 − x0 )
                 = { [f (x2 ) − f (x1 )]/h − [f (x1 ) − f (x0 )]/h } / (2h)
                 = [f (x2 ) − 2f (x1 ) + f (x0 )] / (2h²)
                 = ∆²f (x0 ) / (2!h²)

f [x0 , x1 , x2 , x3 ] = (f [x1 , x2 , x3 ] − f [x0 , x1 , x2 ]) / (x3 − x0 )
                      = { [f (x3 ) − 2f (x2 ) + f (x1 )]/(2h²) − [f (x2 ) − 2f (x1 ) + f (x0 )]/(2h²) } / (3h)
                      = [f (x3 ) − 3f (x2 ) + 3f (x1 ) − f (x0 )] / (3!h³)
                      = ∆³f (x0 ) / (3!h³)

Or, in general, we have

f [x0 , x1 , · · · , xn ] = ∆ⁿf (x0 ) / (n!hⁿ).
with this notation the Newton’s Divided Difference interpolating polynomial 4.18 can be
written as


Pn (x) = f (x0 ) + [∆f (x0 )/h] (x − x0 ) + [∆²f (x0 )/(2!h²)] (x − x0 )(x − x1 ) + · · ·
       + [∆ⁿf (x0 )/(n!hⁿ)] (x − x0 )(x − x1 ) · · · (x − xn−1 )        (4.20)

which simplifies to

Pn (x) = f (x0 ) + k ∆f (x0 ) + k(k − 1) ∆²f (x0 )/2! + · · · + k(k − 1) · · · (k − n + 1) ∆ⁿf (x0 )/n!,        (4.21)

where
k = (x − x0 ) / h.
This is called the Newton’s forward interpolation formula or forward Newton-
Gregory formula.

By looking at the forward difference table 4.1 we can see that this formula uses the values
along the diagonal of the differences of y - it is a FORWARD DIFFERENCE formula. It
is therefore used for interpolation near the beginning of a table where k is small.
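
Formula (4.21) is easy to program once the leading forward differences ∆ʲy0 are available. The following is a minimal sketch (an addition to these notes, assuming plain Python); with the data of Example 4.9 below it reproduces f(2.4) ≈ 9.9296.

from math import factorial

# Sketch only: [y0, Δy0, Δ²y0, ...], the top edge of the forward difference table.
def forward_differences(ys):
    diffs, row = [ys[0]], list(ys)
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[0])
    return diffs

def ng_forward(x0, h, ys, x):
    diffs = forward_differences(ys)
    k, term, total = (x - x0) / h, 1.0, diffs[0]
    for j in range(1, len(diffs)):
        term *= (k - (j - 1))              # k(k-1)...(k-j+1)
        total += term * diffs[j] / factorial(j)
    return total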

Example 4.9

In the following table, use the Newton-Gregory Forward Interpolation formula to


find

(a) f (2.4) (b) f (8.7)

xi 2 4 6 8 10
f (xi ) = yi 9.68 10.96 12.32 13.76 15.28

Solution: Form a difference table and note that all forward differences > 2 are
zero.


xi   f (xi )   ∆   ∆²
2 9.68
1.28
4 10.96 0.08
1.36
6 12.32 0.08
1.44
8 13.76 0.08
1.52
10 15.28

(a) Here we have x = 2.4, x0 = 2, h = 2 and k = (x − x0)/h = (2.4 − 2)/2 = 0.2. Using
Equation (4.20) we have

P2 (x) = f (x0 ) + [∆f (x0 )/h] (x − x0 ) + [∆²f (x0 )/(2!h²)] (x − x0 )(x − x1 )
       = 9.68 + (1.28/2)(x − 2) + [0.08/(2! × 2²)](x − 2)(x − 4).
Hence
f (2.4) ≈ P2 (2.4)
        = 9.68 + (1.28/2)(2.4 − 2) + [0.08/(2! × 2²)](2.4 − 2)(2.4 − 4)
        = 9.9296.

Or, using Equation (4.21), we get

P2 (x) = f (x0 ) + k ∆f (x0 ) + k(k − 1) ∆²f (x0 )/2!
Hence
f (2.4) ≈ P2 (2.4)
        = 9.68 + 0.2 × 1.28 + 0.2(0.2 − 1)(0.08/2!)
        = 9.9296

(b) Here we have x = 8.7; x0 = 2; h = 2 and k = (8.7 − 2)/2 = 3.35. Now using


Equation (4.21) we get

f (8.7) ≈ P2 (8.7) = 9.68 + 3.35(1.28) + 3.35(3.35 − 1)(0.08/2!)
        = 14.2829

Example 4.10

In the following table of e^x use the Newton-Gregory formula of forward interpola-
tion to calculate

(a) f (0.12) (b) f (2)

xi 0.1 0.6 1.1 1.6 2.1


exi 1.1052 1.8221 3.0042 4.9530 8.1662

Solution: First let’s construct the forward difference table:

xi   f (xi )   ∆   ∆²   ∆³   ∆⁴
0.1 1.1052
0.7169
0.6 1.8221 0.4652
1.1821 0.3015
1.1 3.0042 0.7667 0.1962
1.9488 0.4977
1.6 4.9530 1.2644
3.2132
2.1 8.1662

Note that in this case there is no difference column that is constant. This is to be
expected since ex cannot be represented by a polynomial function of finite degree.

(a) Here we have x = 0.12, x0 = 0.1, h = 0.5 and k = 0.04.

e^{0.12} ≈ 1.1052 + 0.04(0.7169) + 0.04(0.04 − 1)(0.4652/2!)
         + 0.04(0.04 − 1)(0.04 − 2)(0.3015/3!)
         + 0.04(0.04 − 1)(0.04 − 2)(0.04 − 3)(0.1962/4!)
         = 1.1269.   (correct value to 5 d.p. is 1.12750)


(b) Here we have x = 2; x0 = 0.1; h = 0.5 and k = 3.8 and we get

e2.00 ≈ 1.1052 + 2.72422 + 2.47486 + 0.96239 + 0.12525


= 7.3919

In Example 4.9 the interpolation formula is identical with f (x), which is a quadratic
function, and the results for f (2.4) and f (8.7) are therefore correct to the number
of decimal places retained. In Example 4.10 the function e^x is replaced by a 4th degree
polynomial which takes the value of e^x at the five given entries. Because the successive
differences decrease, higher differences are relatively small and the value of the estimate
converges. From direct calculation it turns out that the error in the estimate for e^{0.12} is
about 0.05 percent and for e^{2.00} it is about 0.04 percent.

4.5.2 The Newton-Gregory Backward Interpola-


tion Formula

For interpolating the value of the function y = f (x) near the end of the given data points
and also to extrapolate value of the function a short distance forward from yn , Newton’s
backward interpolation formula is used.

Let y = f (x) represent a function which assumes the values y0 , y1 , · · · , yn corresponding


to the equidistant interpolation points x0 , x1 , · · · , xn of the argument x, the common
difference being h. With these n + 1 data points, we may consider a polynomial of degree
n as our interpolation formula. Suppose the nth degree polynomial is takes the form:

Pn (x) = an +an−1 (x−xn )+an−2 (x−xn )(x−xn−1 )+· · ·+a0 (x−xn )(x−xn−1 ) · · · (x−x1 ).
(4.22)
Now, the constants an , an−1 , · · · , a0 are computed such that

Pn (xn ) = yn , Pn (xn−1 ) = yn−1 · · · Pn (x0 ) = y0 .

Thus, evaluating Equation (4.22) at x = xn we get

Pn (xn ) = an =⇒ an = yn


Again,
Pn (xn−1 ) = an + an−1 (xn−1 − xn )
=⇒ yn−1 = yn + an−1 (xn−1 − xn )
=⇒ an−1 = (yn − yn−1 ) / (xn − xn−1 ) = ∇yn / h.

When we evaluate Pn (x) at x = xn−2 we get

Pn (xn−2 ) = an + an−1 (xn−2 − xn ) + an−2 (xn−2 − xn )(xn−2 − xn−1 )
=⇒ yn−2 = yn + [(yn − yn−1 )/h](−2h) + an−2 (−2h)(−h)
        = yn − 2(yn − yn−1 ) + an−2 (2h²)
=⇒ an−2 = (yn−2 − 2yn−1 + yn ) / (2!h²) = ∇²yn / (2!h²).

In general, we get

ai = ∇^{n−i} yn / [(n − i)! h^{n−i}],   i = n − 3, n − 4, · · · , 0.

Substituting the values of the ai 's in Equation (4.22) results in

Pn (x) = yn + (∇yn /h)(x − xn ) + [∇²yn /(2!h²)](x − xn )(x − xn−1 ) + · · ·
       + [∇ⁿyn /(n!hⁿ)](x − xn )(x − xn−1 ) · · · (x − x0 ).        (4.23)
Now, setting
x − xn
k= ,
h
then x − xn = kh and x − xn−1 = x − (xn − h) = x − xn + h = kh + h = (k + 1)h Similarly,

x − xn−2 = (k + 2)h
x − xn−3 = (k + 3)h
..
.
x − x1 = (k + (n − 1))h

Thus, Equation (4.23) reduces to


Pn (x) = yn + k ∇yn + k(k + 1) ∇²yn /2! + · · · + k(k + 1) · · · (k + (n − 1)) ∇ⁿyn /n!,        (4.24)

where
k = (x − xn ) / h.
This is called Newton's backward interpolation formula or the Newton-Gregory
backward formula.

Example 4.11

For the following table of values, estimate f (7.5)

xi 1 2 3 4 5 6 7 8
yi = f (xi ) 1 8 27 64 125 216 343 512

Solution: The value to be interpolated is at the end of the table. Hence, it is


appropriate to use Newton’s backward interpolation formula. Let’s first construct
the backward difference table for the given data.

xi   f (xi )   ∇   ∇²   ∇³   ∇⁴
1 1
7
2 8 12
19 6
3 27 18 0
37 6
4 64 24 0
61 6
5 125 30 0
91 6
6 216 36 0
127 6
7 343 42
169
8 512

Since the 4th and higher order differences are zero, the required Newton’s backward


interpolation formula is

yx = yn + k ∇yn + k(k + 1) ∇²yn /2! + k(k + 1)(k + 2) ∇³yn /3!.        (4.25)

In this problem,
x − xn 7.5 − 8
k= = = −0.5,
1 1
hence, we have

y(7.5) ≈ 512 + (−0.5)(169) + (−0.5)(−0.5 + 1)(42/2!)
       + (−0.5)(−0.5 + 1)(−0.5 + 2)(6/3!)
       = 512 − 84.5 − 5.25 − 0.375
       = 421.875        (4.26)
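
Formula (4.24) can be coded in the same way as the forward formula. The following is a minimal sketch (an addition to these notes, assuming plain Python), checked against Example 4.11.

from math import factorial

# Sketch only: [yn, ∇yn, ∇²yn, ...], the bottom edge of the difference table.
def backward_differences(ys):
    diffs, row = [ys[-1]], list(ys)
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[-1])
    return diffs

def ng_backward(xn, h, ys, x):
    diffs = backward_differences(ys)
    k, term, total = (x - xn) / h, 1.0, diffs[0]
    for j in range(1, len(diffs)):
        term *= (k + (j - 1))              # k(k+1)...(k+j-1)
        total += term * diffs[j] / factorial(j)
    return total

ys = [1, 8, 27, 64, 125, 216, 343, 512]    # y at x = 1, 2, ..., 8
print(ng_backward(8, 1, ys, 7.5))          # 421.875 = 7.5**3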

4.6 Error in Polynomial Interpolation

Our goal in this section is to provide estimates on the “error" we make when interpolating
data that is taken from sampling an underlying function f (x). While the interpolant and
the function agree with each other at the interpolation points, there is, in general, no
reason to expect them to be close to each other elsewhere. Nevertheless, we can estimate
the difference between them, a difference which we refer to as the interpolation error.
The interpolating polynomial Pn is an approximation to f , but unless f itself is a
polynomial of degree n, there will be a nonzero error e(x) = f (x) − Pn (x). At times it is
useful to have an explicit expression for the error and is given by the following theorem
which we are not going to prove.

Theorem 4.1
Let f be a given function on [a, b] and Pn be the polynomial of degree less than or
equal to n interpolating f at the n + 1 data points x0 , x1 , x2 , · · · , xn in [a, b].
If f has n + 1 continuous derivatives and the xi are distinct, then

en (x) = f (x) − Pn (x) = [f^(n+1) (ξx ) / (n + 1)!] ∏_{k=0}^{n} (x − xk ),        (4.27)

where a ≤ x ≤ b and ξx is between the minimum and maximum of x0 , x1 , · · · , xn .


In addition to the interpretation of the divided difference of order n as the coefficient of


xn in some interpolation polynomial, it can also be characterized in another important
way. Consider, e.g., the first-order divided difference

f [x0 , x1 ] = (f (x1 ) − f (x0 )) / (x1 − x0 ).

Since the order of the points does not change the value of the divided difference, we can
assume, without any loss of generality, that x0 < x1 . If we assume, in addition, that f (x)
is continuously differentiable in the interval [x0 , x1 ], then this divided difference equals to
the derivative of f (x) at an intermediate point, i.e.,

f [x0 , x1 ] = f 0 (ξ), ξ ∈ (x0 , x1 )

In other words, the first-order divided difference can be viewed as an approximation of the
first derivative of f (x) in the interval. It is important to note that while this interpretation
is based on additional smoothness requirements from f (x) (i.e. its being differentiable),
the divided differences are well defined also for non-differentiable functions. This notion
can be extended to divided differences of higher order as stated bellow

Let x, x0 , · · · , xn−1 be n + 1 distinct points. Let a = min(x, x0 , · · · , xn−1 ) and
b = max(x, x0 , · · · , xn−1 ). Assume that f (x) has a continuous derivative of
order n in the interval (a, b). Then

f [x, x0 , x1 , · · · , xn−1 ] = f^(n) (ξ) / n!,        (4.28)

where ξ ∈ (a, b).

Example 4.12

If P (x) is the polynomial that interpolates the function f (x) = sin(x) at 10 points
on the interval [0, 1], what is the greatest possible error?
Solution: In this example we have n + 1 = 10 data points, hence n = 9 and

f (n+1) (x) = f (10) (x) = − sin(x).


Thus, the largest possible error is the maximum of |e9 (x)|, i.e., of

(1/10!) |f^(10) (ξx )| ∏_{i=0}^{9} |x − xi |,

for x, x0 , x1 , · · · , x9 , ξx ∈ [0, 1]. Clearly, on the interval [0, 1]

max |x − xi | ≤ 1

and

max |f^(10) (ξx )| = max |−sin(x)| ≤ 1,

so the maximum error is at most

(1/10!)(1)(1)^{10} ≈ 2.8 × 10⁻⁷.

Example 4.13

Determine the spacing h in a table of evenly spaced values of the function f (x) = √x
between 1 and 2, so that interpolation with a second-degree polynomial in this
table will yield a desired accuracy ε.
Solution: Such a table contains the values f (xi ), i = 0, 1, · · · , n = 1/h, at the points
xi = 1 + ih. If x ∈ [xi−1 , xi+1 ], then we approximate f (x) with P2 (x), where
P2 (x) is the polynomial of degree 2 that interpolates f at xi−1 , xi , xi+1 . Thus, by
Theorem 4.1, the error is

e2 (x) = [f ′′′(ξx ) / 3!] (x − xi−1 )(x − xi )(x − xi+1 ),

where ξx depends on x. Though ξx is unknown to us, we can derive the following
error bound:

|f ′′′(ξx )| ≤ max_{1≤x≤2} |f ′′′(x)| = max_{1≤x≤2} (3/8) x^{−5/2} = 3/8,


and for any x ∈ [xi−1 , xi+1 ],

|(x − xi−1 )(x − xi )(x − xi+1 )| ≤ max_{y∈[−h,h]} |(y − h) y (y + h)| = [2/(3√3)] h³.

In the last step, the maximum absolute value of g(y) = (y − h)y(y + h) over [−h, h]
is obtained as follows. Since

g(−h) = g(h) = g(0) = 0,

g(y) attains its maximum (or minimum) in the interior of the interval. We have

g′(y) = 3y² − h².

Thus, g′(y) = 0 yields y = ±h/√3, where the maximum of |g(y)| is attained. Hence
we have derived a bound on the interpolation error:

|e2 (x)| ≤ (1/3!) · (3/8) · [2/(3√3)] h³ = h³/(24√3).        (4.29)

Therefore, we require
h³/(24√3) < ε.
In particular, suppose we want accuracy of at least 7 places after the decimal point. We
should choose h such that
h³/(24√3) < 5 × 10⁻⁸.
This gives us h ≈ 0.0127619, and the number of entries is about 79.

Exercise 4.1
Construct the Lagrange and Newton forms of the interpolating polynomial P3 (x)
for the function f (x) = ∛x which passes through the points (0, 0), (1, 1), (8, 2) and
(27, 3). Calculate the interpolation error at x = 5 and compare with the theoretical
error bound.


4.6.1 Spline Interpolation

Linear Spline

Definition 4.10
Function S is called a spline of degree one or linear spline on [a, b] if:

i. S is continuous on [a, b].

ii. There is a partitioning of the interval a = x0 < x1 < x2 < · · · < xn = b such
that S is a linear polynomial on each sub interval.

Figure 4.2: Linear Spline

Example 4.14

Determine whether the following function is a linear spline or not:

s(x) = { x,        −1 ≤ x ≤ 0
       { 1 − x,     0 < x < 1
       { 2x − 2,    1 ≤ x ≤ 2

Solution: Each piece is a linear polynomial, but s is not continuous at x = 0 (the
first piece gives 0 there while the second piece tends to 1), so s is not a linear spline.


Example 4.15

State whether the following piecewise polynomial is a linear spline or not.

s(x) = { x + 1,    −1 ≤ x ≤ 0
       { 2x + 1,    0 < x < 1
       { 4 − x,     1 ≤ x ≤ 2

Solution: Each piece is linear and s is continuous at the knots (both adjacent pieces
give 1 at x = 0 and 3 at x = 1), so s is a linear spline.

Linear spline can be used for interpolation

Example 4.16

Obtain the piecewise linear interpolating polynomial (linear spline) for the function
f (x) defined by the data:

x 1 2 4 8
f (x) 3 7 21 73

and estimate the value of f (3) and f (7).


Solution: For the interval [1, 2] (i.e., for the points (1, 3) and (2, 7)) we have:

P1 (x) = l0 (x)y0 + l1 (x)y1
       = [(x − x1 )/(x0 − x1 )] y0 + [(x − x0 )/(x1 − x0 )] y1
       = [(x − 2)/(1 − 2)](3) + [(x − 1)/(2 − 1)](7)
       = −3x + 6 + 7x − 7
       = 4x − 1,   1 ≤ x ≤ 2.

For the interval [2, 4] (i.e., for the points (2, 7) and (4, 21)) we have:

P2 (x) = [(x − 4)/(2 − 4)](7) + [(x − 2)/(4 − 2)](21)
       = 7x − 7,   2 ≤ x ≤ 4.


For the interval [4, 8] (i.e., for the points (4, 21) and (8, 73)) we have:

P3 (x) = [(x − 8)/(4 − 8)](21) + [(x − 4)/(8 − 4)](73)
       = 13x − 31,   4 ≤ x ≤ 8.

Hence the linear spline interpolating the above data is:

P (x) = { 4x − 1,     1 ≤ x ≤ 2
        { 7x − 7,     2 ≤ x ≤ 4
        { 13x − 31,   4 ≤ x ≤ 8

Since 3 lies in 2 ≤ x ≤ 4, f (3) ≈ 7(3) − 7 = 14. Similarly, since 7 lies in
4 ≤ x ≤ 8, f (7) ≈ 13(7) − 31 = 60.
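
The piecewise evaluation in Example 4.16 can be coded directly. The following is a minimal sketch (an addition to these notes, assuming plain Python with no external libraries).

# Sketch only: evaluate the linear spline through (xs[i], ys[i]) at x; xs increasing.
def linear_spline(xs, ys, x):
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            return ys[i] + (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) * (x - xs[i])
    raise ValueError("x outside the data range")

xs, ys = [1, 2, 4, 8], [3, 7, 21, 73]
print(linear_spline(xs, ys, 3))   # 14
print(linear_spline(xs, ys, 7))   # 60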

Example 4.17

Approximate the function f (x) = 4x on [−1, 1] by linear spline method and find
the value of f (0.125).

4.6.2 Cubic Spline

Definition 4.11
A cubic spline,S(x) is defined by the following conditions:

i. S(x) is a cubic polynomial in each of the subintervals [xj−1 , xj ], j = 1, 2, · · · , n,
and

ii. S(x), S′(x) and S″(x) are continuous at each point.

If the data

x      x0   x1   · · ·   xn
f (x)  y0   y1   · · ·   yn

is given, then the cubic spline s(x) on this data is found by finding the spline on each
[xj−1 , xj ], j = 1, 2, · · · , n, using the formula:

s(x) = [1/(6h)] [(xj − x)³ Mj−1 + (x − xj−1 )³ Mj ]
     + (1/h)(xj − x)[yj−1 − (h²/6) Mj−1 ]
     + (1/h)(x − xj−1 )[yj − (h²/6) Mj ]        (4.30)

and

Mj−1 + 4Mj + Mj+1 = (6/h²)(yj−1 − 2yj + yj+1 ),   j = 1, 2, · · · , n − 1,        (4.31)

where s″(xj ) = Mj .

Equation (4.31) gives a system of (n − 1) linear equations in the (n + 1) unknowns M0 , M1 , M2 , · · · , Mn .
Two more conditions, called the end conditions, have to be prescribed to obtain (n + 1)
equations in the (n + 1) unknowns M0 , M1 , M2 , · · · , Mn .

Different types of cubic splines are obtained when different end conditions are supplied.

1. When we assume the condition that M0 = 0 and Mn = 0 the cubic interpolation is


called the Natural Cubic Spline.

2. We may also assume S″(x) to be constant near the end points, i.e. M0 = M1 and
Mn = Mn−1 .

3. We can impose the first derivative condition at the end points as follows:

S′(x0 ) = y0′ and S′(xn ) = yn′ .

4. Similarly we can impose the second derivative condition at the end points as follows:

M0 = y0″ and Mn = yn″ .

Example 4.18

The following values of x and y are given:

x   1   2   3   4
y   1   2   5   11

Find the natural cubic spline and evaluate y(1.5) and y′(3).
Solution:
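
A worked-by-hand solution requires solving system (4.31) with the natural end conditions M0 = Mn = 0. As a quick cross-check, the following minimal sketch (an addition to these notes, assuming SciPy is available) builds the natural cubic spline for this data with a library routine and evaluates the two requested quantities.

import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 5.0, 11.0])

# bc_type='natural' imposes S''(x0) = S''(xn) = 0, i.e. the natural spline.
s = CubicSpline(x, y, bc_type='natural')

print(s(1.5))     # estimate of y(1.5)
print(s(3.0, 1))  # estimate of y'(3); the second argument is the derivative order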



Chapter 5
Numerical Differentiation and Integration

5.1 Differentiation

Differentiation is a very important mathematical process and a great deal of effort has
been devoted to the development of analytic techniques of finding the derivatives of
various mathematical functions. It often occurs, however, that it is not possible to utilise
these traditional methods. This happens when:

(i) The function is too complex

(ii) The function is unknown (when data are collected from some experiment).

In this section numerical techniques are described which provide an estimate of the deriva-
tive of a tabulated function. The main concept of numerical differentiation is stated
below:

construct an appropriate interpolation polynomial from the given set of values of x and y
and then differentiate it at any value of x.

Like interpolation, lot of formulae are available for differentiation. Based on the given
set of values, different types of formulae can be constructed. The common formulae are
based on Lagrange’s and Newton’s interpolation formulae. These formulae are discussed
in this section.


5.1.1 Differentiation based on Newton’s forward


interpolation formula

We know that the Newton’s forward and backward interpolation formulae are applicable
only when the arguments are in equispaced. So, we assumed that the given arguments
are equispaced.

Let the function y = f (x) be known at the (n + 1) equispaced arguments x0 , x1 , · · · , xn
with yi = f (xi ) for i = 0, 1, · · · , n. Since the arguments are equispaced, one
can write xi = x0 + ih. Also, let k = (x − x0 )/h, where h is called the spacing. For
this data set, Newton's forward interpolation formula is

Pn (x) = y0 + k ∆y0 + [k(k − 1)/2!] ∆²y0 + · · · + [k(k − 1) · · · (k − n + 1)/n!] ∆ⁿy0
       = y0 + k ∆y0 + [(k² − k)/2!] ∆²y0 + [(k³ − 3k² + 2k)/3!] ∆³y0
       + [(k⁴ − 6k³ + 11k² − 6k)/4!] ∆⁴y0 + [(k⁵ − 10k⁴ + 35k³ − 50k² + 24k)/5!] ∆⁵y0 + · · ·
        (5.1)
The error term of this interpolation formula is

En (x) = [k(k − 1)(k − 2) · · · (k − n)/(n + 1)!] h^{n+1} f^{(n+1)}(ξ),        (5.2)

where min{x, x0 , x1 , · · · , xn } < ξ < max{x, x0 , x1 , · · · , xn }.

Differentiating Equation (5.1) three times gives the first three derivatives of Pn (x):

Pn′(x) = (1/h) [∆y0 + ((2k − 1)/2!) ∆²y0 + ((3k² − 6k + 2)/3!) ∆³y0
        + ((4k³ − 18k² + 22k − 6)/4!) ∆⁴y0
        + ((5k⁴ − 40k³ + 105k² − 100k + 24)/5!) ∆⁵y0 + · · ·]   (since dk/dx = 1/h)
        (5.3)

Pn″(x) = (1/h²) [∆²y0 + ((6k − 6)/3!) ∆³y0 + ((12k² − 36k + 22)/4!) ∆⁴y0
        + ((20k³ − 120k² + 210k − 100)/5!) ∆⁵y0 + · · ·]
        (5.4)


" #
1 24k − 36 4 60k 2 − 240k + 210 5
Pn000 (x) = 3 43 y 0 + 4 y0 + 4 y0 + · · · (5.5)
h 4! 5!
and so on.

In this way, we can find all other derivatives. It may be noted that 4y0 , 42 y0 , 43 y0 , · · ·
are constants.

The above three formulae give the first three (approximate) derivatives of f (x) at any
arbitrary argument x where x = x0 +kh. The above formulae become simple when x = x0
, i.e. k = 0. That is,

Pn′(x0 ) = (1/h) [∆y0 − (1/2)∆²y0 + (1/3)∆³y0 − (1/4)∆⁴y0 + (1/5)∆⁵y0 − · · ·]        (5.6a)
Pn″(x0 ) = (1/h²) [∆²y0 − ∆³y0 + (11/12)∆⁴y0 − (5/6)∆⁵y0 + · · ·]        (5.6b)
Pn‴(x0 ) = (1/h³) [∆³y0 − (3/2)∆⁴y0 + (7/4)∆⁵y0 − · · ·]        (5.6c)

Error in Newton’s forward differentiation formula


The error in Newton’s forward differentiation formula is calculated from the expression
of error in Newton’s forward interpolation formula. The error in Newton’s forward inter-
polation formula is given by

En (x) = k(k − 1)(k − 2) · · · (k − n) h^{n+1} f^{(n+1)}(ξ)/(n + 1)!

Differentiating this expression with respect to x, we have

En′(x) = [h^{n+1} f^{(n+1)}(ξ)/(n + 1)!] (1/h) d/dk [k(k − 1) · · · (k − n)]
       + [k(k − 1) · · · (k − n)/(n + 1)!] h^{n+1} d/dx [f^{(n+1)}(ξ)]
       = [hⁿ f^{(n+1)}(ξ)/(n + 1)!] d/dk [k(k − 1) · · · (k − n)]
       + [k(k − 1) · · · (k − n)/(n + 1)!] h^{n+1} f^{(n+2)}(ξ1)

where ξ and ξ1 are two quantities depend on x and min{x, x0 , · · · , xn } < ξ, ξ1 < max{x, x0 , · · · , xn }.

The expression for error at the starting argument x = x0 , i.e., k = 0 is evaluated as


En′(x0 ) = [hⁿ f^{(n+1)}(ξ)/(n + 1)!] d/dk [k(k − 1) · · · (k − n)]_{k=0} + 0
         = hⁿ (−1)ⁿ n! f^{(n+1)}(ξ)/(n + 1)!   (since d/dk [k(k − 1) · · · (k − n)]_{k=0} = (−1)ⁿ n!)
         = (−1)ⁿ hⁿ f^{(n+1)}(ξ)/(n + 1),
where ξ lies between min{x, x0 , · · · , xn } and max{x, x0 , · · · , xn }.

Example 5.1

Consider the following table

x : 1.0 1.5 2.0 2.5 3.0 3.5


y : 1.234 2.453 7.625 12.321 18.892 23.327

dy d2 y dy
Find the value of , 2
at x = 1 and when x = 1.2
dx dx dx
Solution: The forward difference table is

x   y   ∆y   ∆²y   ∆³y   ∆⁴y   ∆⁵y
1.0 1.234
3.425
1.5 2.453 3.095
6.520 0.36
2.0 7.625 3.455 0.71
9.975 1.07 −0.680
2.5 12.321 4.525 0.03
14.500 1.10
3.0 18.892 5.625
20.125
3.5 23.327

Here x = x0 = 1 and h = 0.5. Then k = 0, hence we can use Equation (5.6a) to
find the first derivative at x = 1. Thus,

y′(1) ≈ (1/h) [∆y0 − (1/2)∆²y0 + (1/3)∆³y0 − (1/4)∆⁴y0 + (1/5)∆⁵y0 ]
      = (1/0.5) [3.425 − (1/2)(3.095) + (1/3)(0.36) − (1/4)(0.71) + (1/5)(−0.680)]
      = 3.36800.


Similarly, using Equation (5.6b) we can compute the second derivative at x = 1:

y″(1) ≈ (1/h²) [∆²y0 − ∆³y0 + (11/12)∆⁴y0 − (5/6)∆⁵y0 ]
      = 4.0 × [3.095 − 0.36 + (11/12)(0.71) − (5/6)(−0.680)]
      = 15.8100.

Now, at x = 1.2, h = 0.5, k = (x − x0 )/h = (1.2 − 1)/0.5 = 0.4.
Therefore, using Equation (5.3) we have:

y′(1.2) ≈ (1/0.5) [∆y0 + ((2k − 1)/2!) ∆²y0 + ((3k² − 6k + 2)/3!) ∆³y0
         + ((4k³ − 18k² + 22k − 6)/4!) ∆⁴y0
         + ((5k⁴ − 40k³ + 105k² − 100k + 24)/5!) ∆⁵y0 ]
        = (1/0.5) [3.425 + ((2(0.4) − 1)/2!)(3.095) + ((3(0.4)² − 6(0.4) + 2)/3!)(0.36)
         + ((4(0.4)³ − 18(0.4)² + 22(0.4) − 6)/4!)(0.71)
         + ((5(0.4)⁴ − 40(0.4)³ + 105(0.4)² − 100(0.4) + 24)/5!)(−0.68)]
        = 6.26948.
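
Formula (5.6a) is easy to automate once the leading forward differences are available. The following is a minimal sketch (an addition to these notes, assuming plain Python); the check uses y = x³ sampled at x = 0, 1, 2, 3, 4, for which the exact derivative at x0 = 0 is 0.

# Sketch only: first derivative at x0 from forward differences of equispaced data.
def forward_diff_column(ys):
    diffs, row = [], list(ys)
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[0])                      # Δy0, Δ²y0, Δ³y0, ...
    return diffs

def first_derivative_at_x0(ys, h):
    d = forward_diff_column(ys)
    # Formula (5.6a): (1/h)[Δy0 − Δ²y0/2 + Δ³y0/3 − ...]
    return sum((-1) ** j * d[j] / (j + 1) for j in range(len(d))) / h

print(first_derivative_at_x0([0, 1, 8, 27, 64], 1))   # 0.0, the exact value of d(x³)/dx at 0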

5.1.2 Differentiation based on Newton’s back-


ward interpolation formula

Like Newton's forward differentiation formula, one can derive Newton's backward dif-
ferentiation formula based on Newton's backward interpolation formula.

Suppose the function y = f (x) is not known explicitly, but it is known at (n + 1) arguments
x0 , x1 , · · · , xn . That is, yi = f (xi ), i = 0, 1, 2, · · · , n are given. Since Newton's
backward interpolation formula is applicable only when the arguments are equispaced,
we take xi = x0 + ih, i = 0, 1, 2, · · · , n and k = (x − xn )/h.


Newton's backward interpolation formula is

Pn (x) = yn + k ∇yn + [k(k + 1)/2!] ∇²yn + [k(k + 1)(k + 2)/3!] ∇³yn
       + [k(k + 1)(k + 2)(k + 3)/4!] ∇⁴yn + [k(k + 1)(k + 2)(k + 3)(k + 4)/5!] ∇⁵yn + · · ·

Differentiating this formula with respect to x successively, the formulae for derivatives of
different order can be derived as

Pn′(x) = (1/h) [∇yn + ((2k + 1)/2!) ∇²yn + ((3k² + 6k + 2)/3!) ∇³yn
        + ((4k³ + 18k² + 22k + 6)/4!) ∇⁴yn
        + ((5k⁴ + 40k³ + 105k² + 100k + 24)/5!) ∇⁵yn + · · ·]
        (5.7)

Pn″(x) = (1/h²) [∇²yn + ((6k + 6)/3!) ∇³yn + ((12k² + 36k + 22)/4!) ∇⁴yn
        + ((20k³ + 120k² + 210k + 100)/5!) ∇⁵yn + · · ·]
        (5.8)

Pn‴(x) = (1/h³) [∇³yn + ((24k + 36)/4!) ∇⁴yn + ((60k² + 240k + 210)/5!) ∇⁵yn + · · ·]        (5.9)

and so on.
and so on.
The above formulae give the approximate values of dy/dx, d²y/dx², d³y/dx³, and so on, at any
value of x, where min{x, x0 , · · · , xn } < x < max{x, x0 , · · · , xn }.

When x = xn then k = 0. In this particular case, the above formulae reduce to the
following form.

Pn′(xn ) = (1/h) [∇yn + (1/2)∇²yn + (1/3)∇³yn + (1/4)∇⁴yn + (1/5)∇⁵yn + · · ·]        (5.10a)
Pn″(xn ) = (1/h²) [∇²yn + ∇³yn + (11/12)∇⁴yn + (5/6)∇⁵yn + · · ·]        (5.10b)
Pn‴(xn ) = (1/h³) [∇³yn + (3/2)∇⁴yn + (7/4)∇⁵yn + · · ·]        (5.10c)


Error in Newton’s backward differentiation formula

The error can be calculated by differentiating the error in Newton’s backward interpola-
tion formula. Such error is given by

En (x) = k(k + 1)(k + 2) · · · (k + n) h^{n+1} f^{(n+1)}(ξ)/(n + 1)!,

where k = (x − xn )/h and ξ lies between min{x, x0 , · · · , xn } and max{x, x0 , · · · , xn }. Differ-
entiating En (x), we get

En′(x) = hⁿ [f^{(n+1)}(ξ)/(n + 1)!] d/dk [k(k + 1)(k + 2) · · · (k + n)]
       + h^{n+1} [k(k + 1)(k + 2) · · · (k + n)/(n + 1)!] f^{(n+2)}(ξ1),
        (5.11)

where ξ and ξ1 are two quantities depending on x, with min{x, x0 , · · · , xn } < ξ, ξ1 < max{x, x0 , · · · , xn }.

This expression gives the error in differentiation at any argument x. In particular, when
x = xn , i.e. when k = 0, then

En′(xn ) = hⁿ [f^{(n+1)}(ξ)/(n + 1)!] d/dk [k(k + 1) · · · (k + n)]_{k=0} + 0
         = hⁿ n! f^{(n+1)}(ξ)/(n + 1)!   (since d/dk [k(k + 1) · · · (k + n)]_{k=0} = n!)
         = hⁿ f^{(n+1)}(ξ)/(n + 1),

Example 5.2

A slider in a machine moves along a fixed straight rod. It’s distance x (in cm)
along the rod are given in the following table for various values of the time t (in
second):

(sec) t : 0 2 4 6 8
(cm) x : 20 50 80 120 180

Find the velocity and acceleration of the slider at time t = 8.


Solution: The backward difference table is


x y ∇y ∇2 y ∇3 y ∇4 y
0 20
30
2 50 0
30 10
4 80 10 0
40 10
6 120 20
60
8 180

Here t = tn = 8 and h = 2. The velocity at t = 8 is given by

v(8) = dx/dt |_{t=8} ≈ (1/h) [∇y4 + (1/2)∇²y4 + (1/3)∇³y4 + (1/4)∇⁴y4 + · · ·]
     = (1/2) [60 + (1/2)(20) + (1/3)(10) + 0]
     = 0.5 × (70 + 10/3)
     = 36.66667
Thus, the velocity of the slider when t = 8 is v = 36.6667.


The acceleration at t = 8 is

d2 x 1 11 4 5 5
 
2 3
a(8) = |t=8 ≈ ∇ y 4 + ∇ y 4 + ∇ y 4 + ∇ y4 + · · ·
dt2 h2 12 6
1
= [20 + 10 + 0]
2
2 10
 
= 2 × 70 +
2 3
= 7.50
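
Formulas (5.10a)-(5.10b) can be checked programmatically. The following is a minimal sketch (an addition to these notes, assuming plain Python), applied to the data of Example 5.2.

# Sketch only: [∇yn, ∇²yn, ∇³yn, ...], the bottom edge of the difference table.
def backward_diff_column(ys):
    diffs, row = [], list(ys)
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[-1])
    return diffs

xs = [20, 50, 80, 120, 180]                       # positions at t = 0, 2, 4, 6, 8
h = 2
d = backward_diff_column(xs)                      # [60, 20, 10, 0]

velocity = sum(d[j] / (j + 1) for j in range(len(d))) / h          # formula (5.10a)
acceleration = (d[1] + d[2] + 11 / 12 * d[3]) / h ** 2             # formula (5.10b)
print(velocity, acceleration)                     # 36.666..., 7.5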

Choice of differentiation formula

Choice of differentiation formula is same as choice of interpolation formula. That is, if the
given argument is at the beginning of the table then the Newton’s forward differentiation
formula is used. Similarly, when the given argument is at the end of the table then
the Newton’s backward differentiation formula is used. The Lagrange’s differentiation
formula is used for any argument.


5.2 Integration (Trapezoidal and Simp-


son’s rule)

Integration is a very common and fundamental tool of integral calculus. However, evaluating
an integral analytically is not easy for all kinds of functions, even when the function is known
completely. Moreover, in many real-life problems only a set of values of x and y is available
and we still have to integrate such a function. In these situations, separate methods are
developed, and these methods are known as numerical integration or quadrature.

The problem of numerical integration is stated below:

Given a set of points (x0 , y0 ), (x1 , y1 ), · · · , (xn , yn ) of a function y = f (x), the problem
is to find the value of the definite integral

I = ∫_{x0}^{xn} f (x) dx.

The function f (x) is replaced by a suitable interpolating polynomial Pn (x).

The approximate value of the definite integral is then evaluated by the formula

∫_{x0}^{xn} f (x) dx ≈ ∫_{x0}^{xn} Pn (x) dx.

A quadrature formula is said to be of closed type if the limits of integration a (= x0 )
and b (= xn ) are taken as two interpolating points. If a and b are not included among the
interpolation points, then the formula is known as an open type formula.

Newton’s cotes Quadrature formula

Let f (x) be an unknown function whose numerical values are given at (n + 1) equidistant
points xi in the interval [a, b], where xi = x0 + ih, i = 0, 1, · · · , n, such that a = x0 and
b = xn , i.e., b − a = nh. Then

∫_a^b f (x) dx ≈ ∫_{x0}^{xn} Pn (x) dx
             = h ∫_0^n Pn (p) dp,   where x = x0 + ph and dx = h dp.


Expressing Pn (p) by Newton's forward difference formula, we get

∫_a^b f (x) dx ≈ h ∫_0^n [f0 + p ∆f0 + (p(p − 1)/2!) ∆²f0 + · · · + (p(p − 1) · · · (p − n + 1)/n!) ∆ⁿf0 ] dp
             = h [p f0 + (p²/2) ∆f0 + (1/2!)(p³/3 − p²/2) ∆²f0 + · · · + last term]_0^n

Thus, the Newton-Cotes quadrature formula is given by:

∫_a^b f (x) dx ≈ h [n f0 + (n²/2) ∆f0 + (1/2!)(n³/3 − n²/2) ∆²f0 + · · · + last term]        (5.12)

From this formula one can derive many simple formulae for different values of n =
1, 2, 3, · · · . Some particular cases are discussed below.

5.2.1 Trapeziodal Rule

One of the simple quadrature formula is trapezoidal formula. To obtain this formula, we
substitute n = 1 to the Equation (5.12). Obviously, we can fit a straight line through
these two points or we can say, one can obtain only first order differences from two points
then neglecting second and high order differences in Equation (5.12) we get,
∫_a^b f (x) dx ≈ h [f0 + (1/2) ∆f0 ] = h [f0 + (1/2)(f1 − f0 )]
             = (h/2)(f0 + f1 )

Hence

∫_a^b f (x) dx ≈ (h/2)(f0 + f1 )        (5.13)

This is known as Trapezoidal quadrature formula or the Trapezoidal rule.

Note that the formula is very simple and it gives a very rough approximation of the inte-
gral. So, if the interval [a, b] is divided into some subintervals and the formula is applied
to each of these subintervals, then much better approximate result may be obtained. This
formula is known as composite trapezoidal formula, described below.


Composite Trapezoidal formula

Suppose the interval [a, b] is divided into n equal subintervals by a = x0 , x1 , x2 , · · · , xn = b.
That is, xi = x0 + ih, i = 1, 2, · · · , n, where h is the length of each subinterval.

Now the trapezoidal formula is applied to each of the subintervals, and we obtain the
composite formula as follows:

∫_a^b f (x) dx = ∫_{x0}^{x1} f (x) dx + ∫_{x1}^{x2} f (x) dx + · · · + ∫_{x_{n−1}}^{x_n} f (x) dx
             ≈ (h/2)(f0 + f1 ) + (h/2)(f1 + f2 ) + · · · + (h/2)(f_{n−1} + f_n )
             = (h/2) [f0 + 2(f1 + f2 + · · · + f_{n−1}) + f_n ].

Therefore, the composite trapezoidal rule is given by

∫_a^b f (x) dx ≈ (h/2) [f0 + 2 Σ_{j=1}^{n−1} f_j + f_n ].        (5.14)

Example 5.3

The following points were found empirically.

x 2.1 2.4 2.7 3.0 3.3 3.6


y 3.2 2.7 2.9 3.5 4.1 5.2

Use the trapezoidal rule to estimate

∫_{2.1}^{3.6} y dx.

Solution: Here h = 0.3; therefore

∫_{2.1}^{3.6} y dx ≈ (0.3/2) [3.2 + 2 × (2.7 + 2.9 + 3.5 + 4.1) + 5.2]
                  = 5.22
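
The composite rule (5.14) takes only a few lines of code. The following is a minimal sketch (an addition to these notes, assuming plain Python), checked against Example 5.3.

# Sketch only: composite trapezoidal rule for equispaced ordinates ys with spacing h.
def trapezoid(ys, h):
    return h / 2 * (ys[0] + 2 * sum(ys[1:-1]) + ys[-1])

ys = [3.2, 2.7, 2.9, 3.5, 4.1, 5.2]
print(trapezoid(ys, 0.3))   # 5.22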


Error in Trapezoidal Rule

Since Trapezoidal rule is a numerical formula, it must have an error. The error in trape-
zoidal formula is calculated below.

The error of the trapezoidal rule in one step, i.e., the local error, is

E_L = ∫_{x0}^{x1} [(x − x0 )(x − x1 )/2!] f″(ξ1 ) dx
    = (h³/2) f″(ξ1 ) ∫_0^1 (p² − p) dp = −(h³/12) f″(ξ1 ) → O(h³),

i.e., the local error of the trapezoidal rule is O(h³). The global error of the trapezoidal rule
is the sum of all n local errors, which is

Σ_{i=0}^{n−1} E_{L_i} = −(h³/12) [f″(ξ1 ) + f″(ξ2 ) + · · · + f″(ξn )],   x_{i−1} ≤ ξ_i ≤ x_i , i = 1, 2, · · · , n.

If we assume that f″(x) is continuous on (a, b), then there exists some ξ in (a, b)
such that Σ_{i=1}^{n} f″(ξi ) = n f″(ξ).

Therefore, the global error of the trapezoidal rule is given by

E_T = −(h³/12) n f″(ξ) = −[(b − a)/12] h² f″(ξ) → O(h²),  since nh = b − a,        (5.15)

which is one order lower than the local error.

Note: The error term in trapezoidal formula indicates that if the second and higher order
derivatives of the function f (x) vanish, then the trapezoidal formula gives exact result.
That is, the trapezoidal formula gives exact result when the integrand is linear.

Geometrical interpretation of trapezoidal formula

In the trapezoidal formula, the integrand y = f (x) is replaced by the straight line AB
joining the points (xi , yi ) and (xi+1 , yi+1 ) (see Figure 5.1).

Figure 5.1: Geometrical interpretation of trapezoidal formula

The area bounded by the curve y = f (x), the ordinates x = xi , x = xi+1 and the x-axis is
approximated by the area of the trapezium bounded by the straight line AB, the straight
lines x = xi , x = xi+1 and the x-axis. That is, the value of the integral ∫_{xi}^{xi+1} f (x) dx
obtained by the trapezoidal formula is nothing but the area of this trapezium.

5.2.2 Simpson’s Rule

When substituting n = 2 in formula (5.12) and, similarly to the case n = 1, neglecting
third and higher order differences, we get

∫_a^b f (x) dx ≈ h [2f0 + (2²/2) ∆f0 + (1/2!)(2³/3 − 2²/2) ∆²f0 ]
             = h [2f0 + 2(f1 − f0 ) + (1/3)(f2 − 2f1 + f0 )]
             = (h/3) [f0 + 4f1 + f2 ]

Hence, Simpson's 1/3 formula is given by

∫_a^b f (x) dx ≈ (h/3) [f0 + 4f1 + f2 ]        (5.16)


Composite Simpson’s 1/3 Rule

In the above formula, the interval of integration [a, b] is divided into two subintervals.
Now we divide the interval [a, b] into n (an even number of) equal subintervals by the arguments
x0 , x1 , x2 , · · · , xn , where xi = x0 + ih, i = 1, 2, · · · , n.

∫_a^b f (x) dx = ∫_{x0}^{x2} f (x) dx + ∫_{x2}^{x4} f (x) dx + · · · + ∫_{x_{n−2}}^{x_n} f (x) dx
             ≈ (h/3)(f0 + 4f1 + f2 ) + (h/3)(f2 + 4f3 + f4 ) + · · · + (h/3)(f_{n−2} + 4f_{n−1} + f_n )
             = (h/3) [f0 + 4(f1 + f3 + · · · + f_{n−1}) + 2(f2 + f4 + · · · + f_{n−2}) + f_n ]
             = (h/3) [f0 + 4 (sum of fi with odd subscripts) + 2 (sum of fi with even subscripts) + f_n ].
        (5.17)

Thus, the composite Simpson's 1/3 quadrature formula is given by

∫_a^b f (x) dx ≈ (h/3) [f0 + 4 Σ_{j=1}^{n/2} f_{2j−1} + 2 Σ_{j=1}^{n/2−1} f_{2j} + f_n ].        (5.18)

Note: Simpson’s 1/3 -rule requires the division of the whole range into an even number
of subintervals of width h.

Error in Simpson’s 1/3 quadrature formula

The local error of Simpson's 1/3 rule on the interval [x0 , x2 ] is given by

E_L ≈ −(h⁵/90) f⁽⁴⁾(ξ),

where x0 < ξ < x2 , and the global error of the composite Simpson's 1/3 rule is given by

E_T = −[(b − a)/180] h⁴ f⁽⁴⁾(ξ),  where a < ξ < b.        (5.19)

Example 5.4

Approximate the value of the integral

∫_0^1 e^{−x} dx

using the composite Simpson's 1/3 rule with n = 4 subintervals. Determine an upper
bound for the absolute error using the error term. Verify that the absolute error is
within this bound.
Solution: Here n = 4 and hence h = (1 − 0)/4 = 1/4. Using the composite Simpson's
rule we get

∫_0^1 e^{−x} dx ≈ [1/(3 × 4)] [e^0 + 4e^{−0.25} + 2e^{−0.5} + 4e^{−0.75} + e^{−1}]
              ≈ 0.6321342

Next, we need to find an upper bound for the absolute error using the general error
term for the composite Simpson's rule,

E = −[(b − a)/180] h⁴ f⁽⁴⁾(ξ),

where a < ξ < b. Taking absolute values, and inserting a = 0, b = 1 and h = 1/4,
the absolute error is

|E| = [1/(180 × 4⁴)] |f⁽⁴⁾(ξ)|.

Since f⁽⁴⁾(x) = e^{−x} is a decreasing positive function, we have the bound |f⁽⁴⁾(ξ)| ≤
e^0 = 1. Therefore the error bound is given by

|E| ≤ 1/46080 ≈ 2.2 × 10⁻⁵.

To verify that the bound holds here, we easily compute the exact value of the
integral:

∫_0^1 e^{−x} dx = [−e^{−x}]_0^1 = 1 − e^{−1} = 0.6321206.

Thus the actual absolute error is (correct to the digits used)

|0.6321206 − 0.6321342| ≈ 1.4 × 10⁻⁵,

so the absolute error is within our bound. The bound quite closely bounds the
actual absolute error in this case.
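
The computation above is easy to reproduce. The following is a minimal sketch (an addition to these notes, assuming plain Python) of the composite Simpson's 1/3 rule (5.18), checked against Example 5.4.

from math import exp

# Sketch only: composite Simpson's 1/3 rule; n must be even.
def simpson_13(f, a, b, n):
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    odd = sum(f(xs[i]) for i in range(1, n, 2))    # f1 + f3 + ...
    even = sum(f(xs[i]) for i in range(2, n, 2))   # f2 + f4 + ...
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))

approx = simpson_13(lambda x: exp(-x), 0, 1, 4)
print(approx)                          # about 0.6321342
print(abs(approx - (1 - exp(-1))))     # about 1.4e-5, inside the 2.2e-5 bound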

Exercise 5.1
Evaluate ∫_1^3 (x + 1) e^{x²} dx taking 10 intervals, by (i) the trapezoidal rule and (ii) Simpson's
1/3 rule. Ans. (i) 6149.2217 (ii) 5557.9445

Simpson’s 3/8 Rule:

We put n = 3 in Equation (5.12); we then have the four points (x0 , y0 ), (x1 , y1 ), (x2 , y2 ), (x3 , y3 ),
so that all forward differences higher than third order in Equation (5.12) are zero.
Hence we obtain:

∫_a^b f (x) dx ≈ h [3f0 + (3²/2) ∆f0 + (1/2!)(3³/3 − 3²/2) ∆²f0 + (1/3!)(3⁴/4 − 3³ + 3²) ∆³f0 ]
             = h [3f0 + (9/2) ∆f0 + (9/4) ∆²f0 + (9/24) ∆³f0 ]
             = (3/8) h [8f0 + 12 ∆f0 + 6 ∆²f0 + ∆³f0 ]
             = (3/8) h [8f0 + 12(f1 − f0 ) + 6(f2 − 2f1 + f0 ) + (f3 − 3f2 + 3f1 − f0 )]
             = (3/8) h [f0 + 3f1 + 3f2 + f3 ]

Therefore, Simpson's 3/8 rule is given by

∫_a^b f (x) dx ≈ (3/8) h [f0 + 3f1 + 3f2 + f3 ]        (5.20)


Composite Simpson’s 3/8 Rule

In the above formula, the interval of integration [a, b] is divided into three subintervals.
Now we divide the interval [a, b] into n (a multiple of three) equal subintervals by the
arguments x0 , x1 , x2 , · · · , xn , where xi = x0 + ih, i = 1, 2, · · · , n.

∫_a^b f (x) dx = ∫_{x0}^{x3} f (x) dx + ∫_{x3}^{x6} f (x) dx + · · · + ∫_{x_{n−3}}^{x_n} f (x) dx
             ≈ (3h/8)(f0 + 3f1 + 3f2 + f3 ) + (3h/8)(f3 + 3f4 + 3f5 + f6 ) + · · ·
               + (3h/8)(f_{n−3} + 3f_{n−2} + 3f_{n−1} + f_n )
             = (3h/8) [f0 + 3(f1 + f2 + f4 + f5 + · · · + f_{n−1}) + 2(f3 + f6 + f9 + · · · + f_{n−3}) + f_n ]

Thus, the composite Simpson's 3/8 quadrature formula is given by

∫_a^b f (x) dx ≈ (3h/8) [f0 + 3(f1 + f2 + f4 + f5 + · · · + f_{n−1}) + 2(f3 + f6 + · · · + f_{n−3}) + f_n ]        (5.21)

In using Equation (5.21) the number of subintervals should be taken as a multiple of 3.

Example 5.5

Approximate the value of the integral

∫_0^6 1/(1 + x²) dx

by dividing the interval [0, 6] into six equal subintervals and using:

i. the trapezoidal rule

ii. Simpson's 1/3 rule

iii. Simpson's 3/8 rule

Solution: Since we have six subintervals, i.e. n = 6, the step size is
h = (6 − 0)/6 = 1.


As a result we obtain the values of the function f (x) = 1/(1 + x²) at the nodal points:

xi       0   1     2     3     4        5        6
f (xi )  1   0.5   0.2   0.1   0.0588   0.0385   0.027

i. Trapezoidal rule:

∫_0^6 1/(1 + x²) dx ≈ (h/2) [f0 + 2(f1 + f2 + f3 + f4 + f5 ) + f6 ]
                   = (1/2) [1 + 2(0.5 + 0.2 + 0.1 + 0.0588 + 0.0385) + 0.027]
                   ≈ 1.4108

ii. Simpson's 1/3 rule:

∫_0^6 1/(1 + x²) dx ≈ (h/3) [f0 + 4(f1 + f3 + f5 ) + 2(f2 + f4 ) + f6 ]
                   = (1/3) [1 + 4(0.5 + 0.1 + 0.0385) + 2(0.2 + 0.0588) + 0.027]
                   ≈ 1.3662

iii. Simpson's 3/8 rule:

∫_0^6 1/(1 + x²) dx ≈ (3h/8) [f0 + 3(f1 + f2 + f4 + f5 ) + 2f3 + f6 ]
                   = (3/8) [1 + 3(0.5 + 0.2 + 0.0588 + 0.0385) + 2(0.1) + 0.027]
                   ≈ 1.3571
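
The three rules are compared below in a minimal sketch (an addition to these notes, assuming plain Python) that uses the tabulated, rounded values of f at x = 0, 1, ..., 6 with h = 1, as in Example 5.5.

# Sketch only: trapezoidal, Simpson's 1/3 and Simpson's 3/8 on the same data.
f = [1, 0.5, 0.2, 0.1, 0.0588, 0.0385, 0.027]
h = 1

trap   = h / 2 * (f[0] + 2 * sum(f[1:-1]) + f[-1])
simp13 = h / 3 * (f[0] + 4 * (f[1] + f[3] + f[5]) + 2 * (f[2] + f[4]) + f[6])
simp38 = 3 * h / 8 * (f[0] + 3 * (f[1] + f[2] + f[4] + f[5]) + 2 * f[3] + f[6])
print(trap, simp13, simp38)   # about 1.4108, 1.3662, 1.3571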

A more illuminating explanation of why Simpson’s rule is “more accurate than it ought
to be” can be had by looking at the extent to which it integrates polynomials exactly.
This leads us to the notion of degree of precision for a quadrature rule.

Definition 5.1 Degree of Precision

The degree of precision of a quadrature formula is the positive integer n such


that E(Pm ) = 0 for all polynomials Pm (x) of degree m ≤ n, but E(Pn+1 ) ≠ 0
for some polynomial Pn+1 (x) of degree n + 1.


Example 5.6

Determine the degree of precision of Simpson's 1/3 rule.

Solution: It suffices to apply the rule over the interval [0, 2] (so h = 1):

∫_0^2 1 dx = 2 = (1/3)(1 + 4 + 1),      ∫_0^2 x dx = 2 = (1/3)(0 + 4 + 2),
∫_0^2 x² dx = 8/3 = (1/3)(0 + 4 + 4),   ∫_0^2 x³ dx = 4 = (1/3)(0 + 4 + 8),

but

∫_0^2 x⁴ dx = 32/5 ≠ (1/3)(0 + 4 + 16) = 20/3.

Therefore, the degree of precision is 3.



Chapter 6
Least Squares Method

6.1 Discrete Least Squares Approxima-


tion

Given a set of data points (x1 , y1 ), (x2 , y2 ), · · · , (xm , ym ), a normal and useful practice in
many applications in statistics, engineering and other applied sciences is to construct a
curve that is considered to be the “best fit” for the data, in some sense. So far, we have
discussed two data-fitting techniques, polynomial interpolation and piecewise polynomial
interpolation. Interpolation techniques, of any kind, construct functions that agree ex-
actly with the data. That is, given points (x1 , y1 ), (x2 , y2 ), · · · , (xm , ym ), interpolation
yields a function f (x) such that f (xi ) = yi for i = 1, 2, · · · , m.

However, fitting the data exactly may not be the best approach to describing the data
with a function. We have seen that high-degree polynomial interpolation can yield os-
cillatory functions that behave very differently than a smooth function from which the
data is obtained. Also, it may be pointless to try to fit data exactly, for if it is obtained
by previous measurements or other computations, it may be erroneous. Therefore, we
consider revising our notion of what constitutes a “best fit” of given data by a function.

One alternative approach to data fitting is to solve the minimax problem, which is the
problem of finding a function f (x) of a given form for which

max_{1≤i≤m} |f (xi ) − yi |,

is minimized. However, this is a very difficult problem to solve.

Another approach is to minimize the total absolute deviation of f (x) from the data. That
is, we seek a function f (x) of a given form for which

m
X
|f (xi ) − yi |,
i=1

is minimized. However, we cannot apply standard minimization techniques to this func-


tion, because, like the absolute value function that it employs, it is not differentiable.

This defect is overcome by considering the problem of finding f (x) of a given form for
which

Σ_{i=1}^{m} [f (xi ) − yi ]²

is minimized. This is known as the least squares problem. In summary, the problem
of least squares is the following

Discrete Least-Squares Approximation Problem


Given a set of m discrete data points (xi , yi ), i = 1, 2, · · · , m,
find the algebraic polynomial

Pn (x) = a0 + a1 x + a2 x² + · · · + an xⁿ   (n ≤ m),

such that the error E(a0 , a1 , a2 , · · · , an ) in the least-squares sense is minimized; that
is,

E(a0 , a1 , a2 , · · · , an ) = Σ_{i=1}^{m} [a0 + a1 xi + a2 xi² + · · · + an xiⁿ − yi ]²

is minimum.
Here E(a0 , a1 , a2 , · · · , an ) is a function of (n + 1) variables: a0 , a1 , a2 , · · · , an .

We will first show how this problem is solved for the case where f (x) is a linear function
of the form f (x) = a1 x + a0 , and then generalize this solution to other types of functions.


6.1.1 Linear Least-Squares

When f (x) is linear, the least squares problem is the problem of finding constants a0 and
a1 such that the function
E(a0, a1) = ∑_{i=1}^{m} [yi − (a0 + a1 xi)]^2

is minimum. In order to minimize this function of a0 and a1 , we must compute its partial
derivatives with respect to a0 and a1 and set these partial derivatives to zero. This yields
∂E(a0, a1)/∂a0 = −2 ∑_{i=1}^{m} [yi − (a0 + a1 xi)],      ∂E(a0, a1)/∂a1 = −2 ∑_{i=1}^{m} [yi − (a0 + a1 xi)] xi.

At a minimum, both of these partial derivatives must be equal to zero. This yields the
system of linear equations
m a0 + (∑_{i=1}^{m} xi) a1 = ∑_{i=1}^{m} yi,

(∑_{i=1}^{m} xi) a0 + (∑_{i=1}^{m} xi^2) a1 = ∑_{i=1}^{m} xi yi.

Using the formula for the inverse of a 2 × 2 matrix,

[a  b; c  d]^{-1} = (1/(ad − bc)) [d  −b; −c  a],

we obtain the solution


a0 = [ (∑ xi^2)(∑ yi) − (∑ xi)(∑ xi yi) ] / [ m ∑ xi^2 − (∑ xi)^2 ],

a1 = [ m ∑ xi yi − (∑ xi)(∑ yi) ] / [ m ∑ xi^2 − (∑ xi)^2 ],

where every sum runs over i = 1, · · · , m.

Example 6.1

We wish to find the linear function y = a1 x + a0 that best approximates the data
shown in the following table, in the least-squares sense.


i xi yi
1 2.0774 3.3123
2 2.3049 3.8982
3 3.0125 4.6500
4 4.7092 6.5576
5 5.5016 7.5173
6 5.8704 7.0415
7 6.2248 7.7497
8 8.4431 11.0451
9 8.7594 9.8179
10 9.3900 12.2477

Using the summations ∑ xi = 56.2933, ∑ yi = 73.8373, ∑ xi^2 = 380.5426 and ∑ xi yi = 485.9487, we obtain

a0 = (380.5426 × 73.8373 − 56.2933 × 485.9487) / (10 × 380.5426 − 56.2933^2) = 742.5703/636.4906 = 1.1667,

a1 = (10 × 485.9487 − 56.2933 × 73.8373) / (10 × 380.5426 − 56.2933^2) = 702.9438/636.4906 = 1.1044.

We conclude that the linear function that best fits this data in the least-squares
sense is
y = 1.1044x + 1.1667.

The data, and this function, are shown in Figure 6.1 below.

Figure 6.1: Data points (xi , yi ) (circles) and least-squares line (solid line)
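The closed-form expressions above translate directly into a few lines of code. The following Python sketch is illustrative only (the function name fit_line is ours); it reproduces the coefficients of Example 6.1:

# Linear least-squares fit y = a1*x + a0 from the closed-form solution of the normal equations.
def fit_line(xs, ys):
    m = len(xs)
    Sx  = sum(xs)
    Sy  = sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    d   = m * Sxx - Sx ** 2              # common denominator
    a0  = (Sxx * Sy - Sx * Sxy) / d      # intercept
    a1  = (m * Sxy - Sx * Sy) / d        # slope
    return a0, a1

xs = [2.0774, 2.3049, 3.0125, 4.7092, 5.5016, 5.8704, 6.2248, 8.4431, 8.7594, 9.3900]
ys = [3.3123, 3.8982, 4.6500, 6.5576, 7.5173, 7.0415, 7.7497, 11.0451, 9.8179, 12.2477]
a0, a1 = fit_line(xs, ys)
print(a0, a1)   # approximately 1.1667 and 1.1044, as computed above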


6.1.2 Non-linear Least-Squares (Polynomial and Exponential) Approximations

In the previous section, we learned how to compute the coefficients of a linear function that
best fits given data in the least-squares sense. We now consider the problem of finding a
polynomial of degree n, or an exponential function, that gives the best least-squares fit.

Polynomial least-square method

As before, let (x1 , y1 ), (x2 , y2 ), · · · , (xm , ym ) be given data points that need to be approx-
imated by a polynomial of degree n. We assume that n < m − 1, for otherwise, we can
use polynomial interpolation to fit the points exactly.

Let the least-squares polynomial have the form


Pn(x) = ∑_{j=0}^{n} aj x^j,

Our goal is to minimize the sum of squares of the deviations of Pn(x) from each y-value,

E(a) = ∑_{i=1}^{m} [Pn(xi) − yi]^2 = ∑_{i=1}^{m} ( ∑_{j=0}^{n} aj xi^j − yi )^2,

where a = [a0, a1, · · · , an]^T is the column vector of the unknown coefficients of Pn(x).

Since E(a0 , a1 , · · · , an ) is a function of the variables, a0 , a1 , · · · , an , for this function to


be minimum, we must have:

∂E/∂aj = 0,      j = 0, 1, · · · , n.


Now, simple computations of these partial derivatives yield:


∂E/∂a0 = −2 ∑_{i=1}^{m} (yi − a0 − a1 xi − a2 xi^2 − · · · − an xi^n)

∂E/∂a1 = −2 ∑_{i=1}^{m} xi (yi − a0 − a1 xi − a2 xi^2 − · · · − an xi^n)
  ⋮
∂E/∂an = −2 ∑_{i=1}^{m} xi^n (yi − a0 − a1 xi − a2 xi^2 − · · · − an xi^n)

Setting these equations to be zero, we have


a0 ∑_{i=1}^{m} 1 + a1 ∑_{i=1}^{m} xi + · · · + an ∑_{i=1}^{m} xi^n = ∑_{i=1}^{m} yi

a0 ∑_{i=1}^{m} xi + a1 ∑_{i=1}^{m} xi^2 + · · · + an ∑_{i=1}^{m} xi^{n+1} = ∑_{i=1}^{m} xi yi
  ⋮
a0 ∑_{i=1}^{m} xi^n + a1 ∑_{i=1}^{m} xi^{n+1} + · · · + an ∑_{i=1}^{m} xi^{2n} = ∑_{i=1}^{m} xi^n yi

Set

sk = ∑_{i=1}^{m} xi^k,      k = 0, 1, · · · , 2n,

bk = ∑_{i=1}^{m} xi^k yi,    k = 0, 1, · · · , n.

Using these notations, the above equations can be written as:

s0 a0 + s1 a1 + · · · + sn an = b0
s1 a0 + s2 a1 + · · · + s_{n+1} an = b1
  ⋮                                                     (6.1)
sn a0 + s_{n+1} a1 + · · · + s_{2n} an = bn

This is a system of (n + 1) equations in (n + 1) unknowns a0 , a1 , · · · , an . These equations


are called Normal Equations. This system now can be solved to obtain these (n + 1)
unknowns, provided a solution to the system exists. We will now show that this system
has a unique solution if the xi's are distinct.


The system (6.1) can be written in the following matrix form:

[ s0      s1       · · ·  sn      ] [ a0 ]   [ b0 ]
[ s1      s2       · · ·  s_{n+1} ] [ a1 ]   [ b1 ]
[ ⋮       ⋮        ⋱       ⋮      ] [ ⋮  ] = [ ⋮  ]          (6.2)
[ sn      s_{n+1}  · · ·  s_{2n}  ] [ an ]   [ bn ]

or

Sa = b,          (6.3)

where S is the (n + 1) × (n + 1) matrix with entries s_{i+j} displayed above,
a = [a0, a1, · · · , an]^T and b = [b0, b1, · · · , bn]^T.
Define the m × (n + 1) matrix

V = [ 1   x1   x1^2  · · ·  x1^n ]
    [ 1   x2   x2^2  · · ·  x2^n ]
    [ 1   x3   x3^2  · · ·  x3^n ]
    [ ⋮    ⋮     ⋮    ⋱      ⋮   ]
    [ 1   xm   xm^2  · · ·  xm^n ]

Then the system (6.3) has the form

V^T V a = b.          (6.4)

The matrix V is known as the Vandermonde matrix, and this matrix has full rank if
the xi's are distinct. In this case, the matrix S = V^T V is symmetric and positive definite
[Exercise] and is therefore nonsingular. Thus, if the xi's are distinct, the equation (6.3) has
a unique solution.
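A small code sketch may help make the Vandermonde construction concrete. The fragment below (using NumPy; the function name poly_least_squares is ours, not from the notes) builds V, forms S = V^T V and b = V^T y, and solves the normal equations:

import numpy as np

def poly_least_squares(x, y, n):
    """Coefficients a0, ..., an of the degree-n least-squares polynomial (normal equations)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    V = np.vander(x, n + 1, increasing=True)   # columns 1, x, x^2, ..., x^n
    S = V.T @ V                                # entries are the sums s_k = sum of x_i^k
    b = V.T @ y                                # entries are b_k = sum of x_i^k * y_i
    return np.linalg.solve(S, b)               # unique solution when the x_i are distinct

In practice one would often solve the least-squares problem for V directly (for example by QR factorization) rather than form V^T V, since S can be badly conditioned; the normal-equation form is shown here because it matches the derivation above.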

Theorem 6.1 Existence and uniqueness of Discrete Least-Squares Solutions

Let (x1 , y1 ), (x2 , y2 ), · · · , (xm , ym ) be m distinct points. Then the discrete least-
square approximation problem has a unique solution.

Example 6.2

We wish to find the quadratic function y = a2 x2 + a1 x + a0 that best approximates


the data shown in the following table, in the least-squares sense.


i      xi        yi
1    2.0774    2.7212
2    2.3049    3.7798
3    3.0125    4.8774
4    4.7092    6.6596
5    5.5016   10.5966
6    5.8704    9.8786
7    6.2248   10.5232
8    8.4431   23.3574
9    8.7594   24.0510
10   9.3900   27.4827
First, let us compute the required summations:

i      xi        yi        xi^2       xi^3        xi^4          xi·yi       xi^2·yi


1 2.0774 2.7212 4.3156 8.9652 18.6243 5.6530 11.7436
2 2.3049 3.7798 5.3126 12.2449 28.2233 8.7121 20.0804
3 3.0125 4.8774 9.0752 27.3389 82.3585 14.6932 44.2632
4 4.7092 6.6596 22.1766 104.4339 491.8000 31.3614 147.6870
5 5.5016 10.5966 30.2676 166.5202 916.1278 58.2983 320.7337
6 5.8704 9.8786 34.4616 202.3034 1187.6016 57.9913 340.4323
7 6.2248 10.5232 38.7481 241.1994 1501.4180 65.5048 407.7544
8 8.4431 23.3574 71.2859 601.8743 5081.6849 197.2089 1665.0542
9 8.7594 24.0510 76.7271 672.0833 5887.0461 210.6723 1845.3632
10 9.3900 27.4827 88.1721 827.9360 7774.3192 258.0626 2423.2074
Sum 56.2933 123.9275 380.5423 2864.8995 22969.2037 908.1578 7226.3193
Thus, the matrix S and the vector b are given by

S = [ 10          56.2933      380.5423   ]        b = [ 123.9275  ]
    [ 56.2933     380.5423     2864.8995  ]            [ 908.1578  ]
    [ 380.5423    2864.8995    22969.2037 ]            [ 7226.3193 ]

Solving the normal equations


Sa = b

we obtain the coefficients

a0 = 4.7681, a1 = −1.5193, a2 = 0.4251.


We conclude that the quadratic function that best fits this data in the least-squares
sense is
y = 0.4251x^2 − 1.5193x + 4.7681.

The data, and this function, are shown in Figure 6.2.

Figure 6.2: Data points (xi , yi ) (circles) and least-squares line (solid line)
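As a check on Example 6.2, the sums sk and bk can be accumulated and the 3 × 3 normal system solved numerically. The sketch below is illustrative only (NumPy assumed; the data are taken from the summation table above):

import numpy as np

xs = np.array([2.0774, 2.3049, 3.0125, 4.7092, 5.5016, 5.8704, 6.2248, 8.4431, 8.7594, 9.3900])
ys = np.array([2.7212, 3.7798, 4.8774, 6.6596, 10.5966, 9.8786, 10.5232, 23.3574, 24.0510, 27.4827])

n = 2                                                           # degree of the fitting polynomial
s = [np.sum(xs ** k) for k in range(2 * n + 1)]                 # s_0, ..., s_4
b = np.array([np.sum((xs ** k) * ys) for k in range(n + 1)])    # b_0, b_1, b_2
S = np.array([[s[i + j] for j in range(n + 1)] for i in range(n + 1)])
a = np.linalg.solve(S, b)
print(a)   # approximately [4.7681, -1.5193, 0.4251], as in Example 6.2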

Exponential least-squares method

Least-squares fitting can also be used to fit data with functions that are not linear combi-
nations of functions such as polynomials. Suppose we believe that given data points can
best be matched to an exponential function of the form y = b e^{ax}, where the constants a
and b are unknown. Taking the natural logarithm of both sides of this equation yields

ln y = ln b + ax.

If we define z = ln y and c = ln b, then the problem of fitting the original data points
{(xi, yi)}_{i=1}^{m} with an exponential function is transformed into the problem of fitting the
data points {(xi, zi)}_{i=1}^{m} with a linear function of the form c + ax, for unknown constants
a and c.

Similarly, suppose the given data is believed to approximately conform to a function


of the form y = b x^a, where the constants a and b are unknown. Taking the natural


logarithm of both sides of this equation yields

ln y = ln b + a ln x.

If we define z = ln y, c = ln b and w = ln x, then the problem of fitting the original data


points {(xi, yi)}_{i=1}^{m} with a constant times a power of x is transformed into the problem of
fitting the data points {(wi, zi)}_{i=1}^{m} with a linear function of the form c + aw, for unknown
constants a and c.

Example 6.3

We wish to find the exponential function y = b e^{ax} that best approximates the data
shown in the following table, in the least-squares sense.

i xi yi
1 2.0774 1.4509
2 2.3049 2.8462
3 3.0125 2.1536
4 4.7092 4.7438
5 5.5016 7.7260
First, let us compute the required summations:

i      xi        yi        zi = ln yi    xi^2       xi·zi
1 2.0774 1.4509 0.3722 4.3156 0.7732
2 2.3049 2.8462 1.0460 5.3126 2.4109
3 3.0125 2.1536 0.7671 9.0752 2.3110
4 4.7092 4.7438 1.5568 22.1766 7.3315
5 5.5016 7.7260 2.0446 30.2676 11.2485
Sum 17.6056 18.9205 5.7867 71.1475 24.0751
By defining

S = [ 5         17.6056 ]        b = [ 5.7867  ]
    [ 17.6056   71.1475 ]            [ 24.0751 ]

and solving the normal equations S c = b,

we obtain the coefficients

c0 = −0.2653, c1 = 0.4040

and hence a = c1 = 0.4040 and b = e^{c0} = e^{−0.2653} = 0.7670. We conclude that


the exponential function that best fits this data in the least-squares sense is

y = 0.7670 e^{0.4040x}.

The data, and this function, are shown in Figure 6.3.

Figure 6.3: Data points (xi , yi ) (circles) and least-squares line (solid line)
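The log-linearization of Example 6.3 is easy to reproduce. A hedged Python sketch (standard library only; variable names are ours):

import math

xs = [2.0774, 2.3049, 3.0125, 4.7092, 5.5016]
ys = [1.4509, 2.8462, 2.1536, 4.7438, 7.7260]
zs = [math.log(y) for y in ys]              # z = ln y, so z = c + a x with c = ln b

m   = len(xs)
Sx  = sum(xs)
Sz  = sum(zs)
Sxx = sum(x * x for x in xs)
Sxz = sum(x * z for x, z in zip(xs, zs))
d   = m * Sxx - Sx ** 2
c   = (Sxx * Sz - Sx * Sxz) / d             # intercept of the linearized fit, c = ln b
a   = (m * Sxz - Sx * Sz) / d               # slope of the linearized fit, the exponent a
b   = math.exp(c)                           # undo the logarithm
print(a, b)   # approximately 0.4040 and 0.7670, as in Example 6.3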

6.2 Continuous Least-Squares Approximation

In the previous section, we have described least-squares approximation to fit a set of


discrete data. Here we first describe continuous least-square approximations of a
function f (x) by using polynomials and later in the subsequent sections using orthogonal
polynomials and Fourier series.

6.2.1 Approximation by Polynomials

First, consider approximation by a polynomial with monomial basis: {1, x, x2 , · · · , xn }.


Least-Square Approximations of a Function Using Monomial Polynomials


Given a function f (x), continuous on [a, b], find a polynomial Pn (x) of degree at most
n:
Pn(x) = a0 + a1 x + a2 x^2 + · · · + an x^n,

such that the integral of the square of the error is minimized. That is,

E(a0, a1, a2, · · · , an) = ∫_a^b [f(x) − Pn(x)]^2 dx

is minimized.

The polynomial Pn (x) is called the Least-Squares Polynomial. For minimization, we


must have
∂E/∂ai = 0,      i = 0, 1, · · · , n.
As before, these conditions will give rise to a system of (n + 1) normal equations in
(n + 1) unknowns: a0 , a1 , · · · , an . Solution of these equations will yield the unknowns:
a0 , a1 , · · · , an .

Setting up the Normal Equations

Since

E = ∫_a^b [ f(x) − (a0 + a1 x + a2 x^2 + · · · + an x^n) ]^2 dx,

differentiating E with respect to each ai results in


∂E/∂a0 = −2 ∫_a^b [ f(x) − a0 − a1 x − a2 x^2 − · · · − an x^n ] dx,

∂E/∂a1 = −2 ∫_a^b x [ f(x) − a0 − a1 x − a2 x^2 − · · · − an x^n ] dx,
  ⋮
∂E/∂an = −2 ∫_a^b x^n [ f(x) − a0 − a1 x − a2 x^2 − · · · − an x^n ] dx.

Thus, we have
∂E/∂a0 = 0  =⇒  a0 ∫_a^b 1 dx + a1 ∫_a^b x dx + a2 ∫_a^b x^2 dx + · · · + an ∫_a^b x^n dx = ∫_a^b f(x) dx.


Similarly,
∂E/∂ai = 0  =⇒  a0 ∫_a^b x^i dx + a1 ∫_a^b x^{i+1} dx + a2 ∫_a^b x^{i+2} dx + · · · + an ∫_a^b x^{i+n} dx = ∫_a^b x^i f(x) dx,

i = 0, 1, 2, · · · , n.

So, the (n + 1) normal equations in this case are:


i = 0 :  a0 ∫_a^b 1 dx   + a1 ∫_a^b x dx       + a2 ∫_a^b x^2 dx     + · · · + an ∫_a^b x^n dx     = ∫_a^b f(x) dx,

i = 1 :  a0 ∫_a^b x dx   + a1 ∫_a^b x^2 dx     + a2 ∫_a^b x^3 dx     + · · · + an ∫_a^b x^{n+1} dx = ∫_a^b x f(x) dx,
  ⋮
i = n :  a0 ∫_a^b x^n dx + a1 ∫_a^b x^{n+1} dx + a2 ∫_a^b x^{n+2} dx + · · · + an ∫_a^b x^{2n} dx  = ∫_a^b x^n f(x) dx.

Denoting
si = ∫_a^b x^i dx,   i = 0, 1, 2, · · · , 2n,      and      bi = ∫_a^b x^i f(x) dx,   i = 0, 1, 2, · · · , n,

the above (n + 1) equations can be written as

s0 a0 + s1 a1 + · · · + sn an = b0
s1 a0 + s2 a1 + · · · + s_{n+1} an = b1
  ⋮
sn a0 + s_{n+1} a1 + · · · + s_{2n} an = bn,

or in matrix notation

[ s0      s1       · · ·  sn      ] [ a0 ]   [ b0 ]
[ s1      s2       · · ·  s_{n+1} ] [ a1 ]   [ b1 ]
[ ⋮       ⋮        ⋱       ⋮      ] [ ⋮  ] = [ ⋮  ]
[ sn      s_{n+1}  · · ·  s_{2n}  ] [ an ]   [ bn ]
Hence, we have the system of normal equations

Sa = b,          (6.5)

where S = (s_{i+j}), i, j = 0, 1, · · · , n, is the matrix displayed above,
a = [a0, a1, · · · , an]^T and b = [b0, b1, · · · , bn]^T.


The solution of Equation (6.5) will yield the coefficients a0 , a1 , · · · , an of the least-squares
polynomial Pn (x).

Method-1 Least-Squares Approximation using Monomial Polynomials

Inputs: (i) f (x) - A continuous function on [a, b].


(ii) n - The degree of the desired least-squares polynomial

Output: The coefficients a0 , a1 , · · · , an of the desired least-squares polyno-


mial: Pn(x) = a0 + a1 x + · · · + an x^n.

Step 1: Compute s0 , s1 , · · · , s2n :


for i = 0, 1, 2, · · · , 2n do
si = ∫_a^b x^i dx

end

Step 2: Compute b0 , b1 , · · · , bn :
for i = 0, 1, 2, · · · , n do
bi = ∫_a^b x^i f(x) dx

end

Step 3: Form the matrix S from the numbers s0 , s1 , · · · , s2n and the vector
b from the numbers b0, b1, · · · , bn, i.e.,

S = [ s0      s1       · · ·  sn      ]        b = [ b0 ]
    [ s1      s2       · · ·  s_{n+1} ]            [ b1 ]
    [ ⋮       ⋮        ⋱       ⋮      ]            [ ⋮  ]
    [ sn      s_{n+1}  · · ·  s_{2n}  ]            [ bn ]

Step 4: Solve the (n + 1) × (n + 1) system of equations for a0, a1, · · · , an:

              Sa = b,      where  a = [a0, a1, · · · , an]^T.
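Method-1 translates almost line for line into code. The sketch below is illustrative only: the integrals are evaluated with a simple composite Simpson rule rather than exactly, and the helper names are ours. Applied to f(x) = e^x on [−1, 1] with n = 2 it reproduces, approximately, the coefficients of Example 6.4 below.

import numpy as np

def simpson(g, a, b, m=200):
    """Composite Simpson rule with m (even) subintervals."""
    xs = np.linspace(a, b, m + 1)
    w = np.ones(m + 1)
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    return (b - a) / (3.0 * m) * np.dot(w, g(xs))

def continuous_ls(f, a, b, n):
    """Coefficients a0, ..., an of the degree-n least-squares polynomial for f on [a, b]."""
    s = [simpson(lambda x, k=k: x ** k, a, b) for k in range(2 * n + 1)]           # Step 1
    bv = [simpson(lambda x, k=k: (x ** k) * f(x), a, b) for k in range(n + 1)]     # Step 2
    S = np.array([[s[i + j] for j in range(n + 1)] for i in range(n + 1)])         # Step 3
    return np.linalg.solve(S, np.array(bv))                                        # Step 4

print(continuous_ls(np.exp, -1.0, 1.0, 2))   # roughly [0.9963, 1.1037, 0.5368]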


A Special Case:

Let the interval be [0, 1]. Then


si = ∫_0^1 x^i dx = 1/(i + 1),      i = 0, 1, 2, · · · , 2n.

Thus, in this case the matrix of the normal equations is

S = [ 1          1/2        · · ·  1/(n+1)  ]
    [ 1/2        1/3        · · ·  1/(n+2)  ]
    [ ⋮          ⋮          ⋱      ⋮        ]
    [ 1/(n+1)    1/(n+2)    · · ·  1/(2n+1) ],

which is a Hilbert Matrix. It is well-known to be ill-conditioned.
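The ill-conditioning is easy to observe numerically; the following quick check (NumPy assumed, illustrative only) prints the condition number of the (n + 1) × (n + 1) Hilbert matrix for a few values of n:

import numpy as np

for n in (2, 4, 8, 12):
    H = np.array([[1.0 / (i + j + 1) for j in range(n + 1)] for i in range(n + 1)])
    print(n, np.linalg.cond(H))   # the condition number grows explosively with n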

Example 6.4

Find linear and quadratic least-squares approximations to f(x) = e^x on [−1, 1].


Solution:
Linear approximation (n = 1): P1(x) = a0 + a1 x.
Step 1:

s0 = ∫_{−1}^{1} 1 dx = 2,

s1 = ∫_{−1}^{1} x dx = [x^2/2]_{−1}^{1} = 1/2 − 1/2 = 0,

s2 = ∫_{−1}^{1} x^2 dx = [x^3/3]_{−1}^{1} = 1/3 − (−1/3) = 2/3.
Step 2:

b0 = ∫_{−1}^{1} e^x dx = [e^x]_{−1}^{1} = e − 1/e = 2.3504,

b1 = ∫_{−1}^{1} x e^x dx = 2/e = 0.7358.
Step 3: Form the matrix S and vector b:

S = [ 2    0   ]        b = [ 2.3504 ]
    [ 0    2/3 ]            [ 0.7358 ]


Step 4: Solve the normal system

[ 2    0   ] [ a0 ]   [ 2.3504 ]
[ 0    2/3 ] [ a1 ] = [ 0.7358 ]

This gives
a0 = 1.1752, a1 = 1.1037

The linear least-squares polynomial P1 (x) = 1.1752 + 1.1037x.


Accuracy Check:
P1(0.5) = 1.7270,      e^{0.5} = 1.6487.

Relative Error:

|e^{0.5} − P1(0.5)| / |e^{0.5}| = |1.6487 − 1.7270| / |1.6487| = 0.0475.

Quadratic fitting (n = 2): P2(x) = a0 + a1 x + a2 x^2.


Step 1: Compute si ’s

s0 = 2,    s1 = 0,    s2 = 2/3,

s3 = ∫_{−1}^{1} x^3 dx = [x^4/4]_{−1}^{1} = 1/4 − 1/4 = 0,

s4 = ∫_{−1}^{1} x^4 dx = [x^5/5]_{−1}^{1} = 1/5 − (−1/5) = 2/5.

Step 2: Compute bi ’s

b0 = 2.3504,    b1 = 0.7358,

b2 = ∫_{−1}^{1} x^2 e^x dx = e − 5/e = 0.8789.

Step 3: Form the matrix S and vector b:

S = [ 2      0      2/3 ]        b = [ 2.3504 ]
    [ 0      2/3    0   ]            [ 0.7358 ]
    [ 2/3    0      2/5 ]            [ 0.8789 ]


Step 4: Solve the normal system

[ 2      0      2/3 ] [ a0 ]   [ 2.3504 ]
[ 0      2/3    0   ] [ a1 ] = [ 0.7358 ]
[ 2/3    0      2/5 ] [ a2 ]   [ 0.8789 ]

This gives
a0 = 0.9963, a1 = 1.1037, a2 = 0.5368.

The quadratic least-squares polynomial is P2(x) = 0.9963 + 1.1037x + 0.5368x^2.


Accuracy Check:
P2(0.5) = 1.6824,      e^{0.5} = 1.6487.

Relative Error:

|e^{0.5} − P2(0.5)| / |e^{0.5}| = |1.6487 − 1.6824| / |1.6487| = 0.0204.



Chapter 7
Numerical methods for ODEs

7.1 Introduction

Consider y(x) to be a function of a variable x. A first order Ordinary differential


equation is an equation relating x, y and the first derivative of y. The most general
form is

F(x, y(x), y'(x)) = 0.

The variable y is known as the dependent variable and x as the independent variable. The
equation is of first order because the highest derivative appearing in it is the first derivative.
Sometimes it is possible to rewrite the equation in the form

y'(x) = f(x, y(x)).          (7.1)

A function y = y(x) is a solution of the first order differential equation (7.1) if

i) y(x) is differentiable, and

ii) substitution of y(x) and y'(x) in (7.1) satisfies the differential equation identically.

Differential equations commonly arise as mathematical representations of real-world
problems, and the solution of the underlying problem then rests on the solution of the
differential equation. In this chapter we are concerned with solving differential equations
numerically.


7.2 Initial Value Problem

At first we concentrate on the so-called first order Initial Value Problem (IVP).
A first order differential equation together with a specified initial condition at x = x0 is
written as

y'(x) = f(x, y(x))    with    y(x0) = y0.          (7.2)

There exist several methods for finding solutions of differential equations. However, not all
differential equations are solvable analytically. The following well-known theorem from the theory of
differential equations establishes the existence and uniqueness of a solution of the IVP:

Theorem 7.1 Existence and Uniqueness Theorem

Let f(x, y) be continuous in a domain D = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d} ⊆ R^2.
If f satisfies a Lipschitz condition in the variable y and (x0, y0) lies in D, then the
IVP has a unique solution y = y(x) on some interval a ≤ x ≤ b. (The function
f satisfies a Lipschitz condition means that there exists a positive constant L such
that |f(x, y) − f(x, w)| < L|y − w| for all (x, y), (x, w) in D.)

The theorem gives conditions on the function f(x, y) for existence and uniqueness of the
solution, but the solution still has to be obtained by some method. It may not be possible
to obtain an analytical (closed-form) solution of a given first order differential equation
by known methods even when the above theorem guarantees its existence, and sometimes
it is very difficult to obtain. In such cases, an approximate solution of the given
differential equation can be obtained using numerical methods.

Discretization

The aim of this chapter is to devise numerical methods to obtain an approximate solution
of the initial value problem (7.2) at only a discrete set of points. That is, if we are
interested in obtaining the solution of (7.2) on an interval [a, b], then we first discretize the
interval as
a = x0 < x1 < · · · < xN = b,


where each point xi , i = 0, 1, · · · , N is called a node. Unless otherwise stated, we


always assume that the nodes are equally spaced. That is,

xi = x0 + ih, i = 0, 1, · · · N

for a sufficiently small positive real number h, called the stepsize. We use the notation
for the approximate solution as

yi = yh (xi ) ≈ y(xi ), i = 0, 1, · · · , N.

Let us assume that we have somehow found approximations yi ≈ y(xi), for i =
0, 1, · · · , n, and we want to find an approximation y_{n+1} ≈ y(x_{n+1}), where x_{n+1} = xn + h.
Basically, there are two different classes of methods in practical use.

1). One-step methods: Only yn is used to find the approximation yn+1 . One-step
methods usually require more than one function evaluation per step.
They can all be put in a general abstract form

yn+1 = yn + hφ(xn , yn ; h).

2). Linear multistep methods: y_{n+1} is approximated from y_{n−k+1}, · · · , yn.

In the next section a very basic one-step method known as Euler method is being dis-
cussed.

7.2.1 Euler’s Method

Euler’s method is the natural starting point for any discussion of numerical methods for
IVPs. Although it is not the most accurate of the methods we study, it is by far the
simplest, and much of what we learn from analyzing Euler’s method in detail carries over
to other methods without a lot of difficulty.

Euler's Method assumes our solution can be written in the form of a Taylor series. This
gives us a reasonably good approximation if we take plenty of terms and if the value of
h is reasonably small.


For Euler's Method, we take only the first two terms:

y(x + h) = y(x) + h y'(x) + (h^2/2!) y''(η),      where x < η < x + h.

Using the fact that y' = f(x, y(x)), we obtain a numerical scheme by truncating the
Taylor series after the second term.

Method-2 Euler’s method


For approximating the solution to the initial-value problem

y'(x) = f(x, y),      y(x0) = y0

at the points xi = x0 + ih (i = 0, 1, 2, · · · , N) on [x0, xN], use

y_{i+1} = yi + h f(xi, yi),    with    y0 = y(x0).          (7.3)

Example 7.1

Consider the initial-value problem

y' = y − x,      y(0) = 1/2.

Use Euler’s method (a) with h = 0.1 and (b) with h = 0.05 to obtain an approxi-
mation to y(1). Given the exact solution to the initial value problem is

y(x) = x + 1 − (1/2) e^x,

compare the errors in the two approximations to y(1).
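A minimal Python sketch of Euler's method applied to this problem (the helper name euler is ours; illustrative only):

import math

def euler(f, x0, y0, h, n_steps):
    """Euler's method (7.3): y_{i+1} = y_i + h f(x_i, y_i)."""
    x, y = x0, y0
    for _ in range(n_steps):
        y = y + h * f(x, y)
        x = x + h
    return y

f = lambda x, y: y - x
exact = lambda x: x + 1.0 - 0.5 * math.exp(x)

for h in (0.1, 0.05):
    n = round(1.0 / h)                       # number of steps to reach x = 1
    y_approx = euler(f, 0.0, 0.5, h, n)
    print(h, y_approx, abs(exact(1.0) - y_approx))
# Halving h roughly halves the error, as expected for a first-order method.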

7.2.2 Other Examples of One-Step Method

Assume that (xn, yn) is known. The exact solution y(x_{n+1}), with x_{n+1} = xn + h, of equa-
tion (7.1) passing through this point is given by

y(xn + h) = yn + ∫_{xn}^{xn+h} y'(τ) dτ = yn + ∫_{xn}^{xn+h} f(τ, y(τ)) dτ.          (7.4)


The idea is to find approximations to the last integral. The simplest idea is to use
f (τ, y(τ )) ≈ f (xn , yn ), in which case we get the Euler method again:

yn+1 = yn + hf (xn , yn ).

Modified Euler Method

The integral in (7.4) can be approximated by the midpoint (rectangular) rule of
integration as

∫_{xn}^{xn+h} f(τ, y(τ)) dτ ≈ h f(xn + h/2, y(xn + h/2)).

By inserting the forward Euler step for the missing value y(xn + h/2)

y(xn + h/2) ≈ yn + (h/2) f(xn, yn),

we obtain the Modified Euler method as

y_{n+1} = yn + h f( xn + h/2,  yn + (h/2) f(xn, yn) ),

which we can rewrite as:

Method-3 Modified Euler’s Method

yn+1 = yn + k2 ,
k1 = hf (xn , yn ) (7.5)
k2 = hf (xn + h/2, yn + k1 /2)

Improved Euler Method

The above numerical method can be improved for a more accurate solution by using
the trapezoidal rule instead of using the rectangular rule. Approximating the integral in
Equation (7.4) by Trapezoidal rule results in
y_{n+1} = yn + ∫_{xn}^{xn+h} f(τ, y(τ)) dτ ≈ yn + (h/2) ( f(xn, yn) + f(x_{n+1}, y_{n+1}) ).          (7.6)

Here yn+1 is available by solving a (usually) nonlinear system of equations. Such methods
are called implicit methods. To avoid this extra difficulty, we could replace yn+1 on the


right hand side by the approximation from Euler's method. The method that we consider
here is an example of what is called a predictor-corrector method. The idea is to use
the formula from Euler's method to obtain a first approximation to the solution y(x_{n+1});
we denote this approximation by

y*_{n+1} = yn + h f(xn, yn).

Hence, Equation (7.6) becomes

y_{n+1} = yn + (h/2) ( f(xn, yn) + f(xn + h, yn + h f(xn, yn)) ),          (7.7)

which can be rewritten as:

Method-4 Improved Euler Method

y_{n+1} = yn + (1/2)(k1 + k2),
where
k1 = h f(xn, yn)
k2 = h f(xn + h, yn + k1)

Example 7.2

Apply (i) Euler’s method, (ii) Modified Euler’s method and (iii) Improved Euler’s
method to compute y(x) at x = 0.3 with step-size h = 0.1 for the initial value
problem:
dy/dx = 2x(1 − y),      y(0) = 2.

Compare the errors en = |y(xn) − yn| at each step with the exact solution
y(x) = 1 + e^{−x^2}.
Solution: Since we are solving the problem on the interval [0, 0.3] with step-size
h = 0.1 we have the nodes x0 = 0, x1 = 0.1, x2 = 0.2 and x3 = 0.3 and the initial
value y0 = 2. In addition we have f (x, y) = 2x(1 − y).

i.) Euler’s method: is given by

yn+1 = yn + hf (xn , yn ),
where
f (xn , yn ) = 2xn (1 − yn )


When n = 0, i.e., at x = 0.1 we have

y1 = y0 + hf (x0 , y0 ) = y0 + h [2x0 (1 − y0 )]
= 2 + 0.1 × [2 × 0 × (1 − 2)]
=2

When n = 1, i.e., at x = 0.2 we have

y2 = y1 + hf (x1 , y1 ) = y1 + h [2x1 (1 − y1 )]
= 2 + 0.1 × [2 × 0.1 × (1 − 2)]
= 1.98

When n = 2, i.e., at x = 0.3 we have

y3 = y2 + hf (x2 , y2 ) = y2 + h [2x2 (1 − y2 )]
= 1.98 + 0.1 × [2 × 0.2 × (1 − 1.98)]
= 1.9408

ii.) Modified Euler’s method: is given by

yn+1 = yn + k2,
where
k1 = hf (xn , yn )
k2 = hf (xn + h/2, yn + k1/2)


When n = 0, i.e., at x = 0.1 we have

k1 = hf (x0 , y0 )
= h (2x0 (1 − y0 ))
= 0.1 × (2 × 0 × (1 − 2))
=0
k2 = hf (x0 + h/2, y0 + k1/2)
= h (2(x0 + h/2) (1 − (y0 + k1/2)))
= 0.1 × (2 × (0 + 0.1/2) × (1 − (2 + 0/2)))
= −0.01
y1 = y0 + k2
= 2 + (−0.01)
= 1.99

When n = 1, i.e., at x = 0.2 we have

k1 = hf (x1 , y1 ) = h (2x1 (1 − y1 ))
= 0.1 × [2 × 0.1 × (1 − 1.99)]
= −0.0198

k2 = hf (x1 + h/2, y1 + k1/2)


= h (2(x1 + h/2) (1 − (y1 + k1/2)))
= 0.1 × [2 × (0.1 + 0.1/2) × (1 − (1.99 + (−0.0198)/2))]
= −0.0294
y2 = y1 + k2
= 1.99 + (−0.0294)
= 1.9606


When n = 2, i.e., at x = 0.3 we have

k1 = hf (x2 , y2 ) = h (2x2 (1 − y2 ))
= 0.1 × [2 × 0.2 × (1 − 1.9606)]
= −0.0384
k2 = hf(x2 + h/2, y2 + k1/2)
= h (2(x2 + h/2) (1 − (y2 + k1/2)))
= 0.1 × [2 × (0.2 + 0.1/2) × (1 − (1.9606 + (−0.0384)/2))]
= −0.0471
y3 = y2 + k2
= 1.9606 + (−0.0471)
= 1.9135

iii.) Improved Euler’s method: is given by

y_{n+1} = yn + (1/2)(k1 + k2),
where
k1 = hf (xn , yn )
k2 = hf (xn + h, yn + k1)

When n = 0, i.e., at x = 0.1 we have

k1 = hf (x0 , y0 )
= h (2x0 (1 − y0 ))
= 0.1 × (2 × 0 × (1 − 2))
=0


k2 = hf (x0 + h, y0 + k1)
= h (2(x0 + h) (1 − (y0 + k1)))
= 0.1 × (2 × (0 + 0.1) × (1 − (2 + 0)))
= −0.02
y1 = y0 + (1/2)(k1 + k2)
= 2 + 0.5 × (0 + (−0.02))
= 1.99

When n = 1, i.e., at x = 0.2 we have

k1 = hf (x1 , y1 ) = h (2x1 (1 − y1 ))
= 0.1 × [2 × 0.1 × (1 − 1.99)]
= −0.0198
k2 = hf (x1 + h, y1 + k1)
= h (2(x1 + h) (1 − (y1 + k1)))
= 0.1 × [2 × (0.1 + 0.1) × (1 − (1.99 + (−0.0198)))]
= −0.0388
y2 = y1 + (1/2)(k1 + k2)
= 1.99 + 0.5 × (−0.0198 − 0.0388)
= 1.9607

When n = 2, i.e., at x = 0.3 we have

k1 = hf (x2 , y2 ) = h (2x2 (1 − y2 ))
= 0.1 × [2 × 0.2 × (1 − 1.9607)]
= −0.0384
k2 = hf(x2 + h, y2 + k1)
= h (2(x2 + h) (1 − (y2 + k1)))
= 0.1 × [2 × (0.2 + 0.1) × (1 − (1.9607 + (−0.0384)))]
= −0.0553
y3 = y2 + (1/2)(k1 + k2)
= 1.9607 + 0.5 × (−0.0384 − 0.0553) = 1.9138


Summary

x      Exact value   Euler's   Modified   Improved
0.1    1.9900        2.0000    1.9900     1.9900
0.2    1.9608        1.9800    1.9606     1.9607
0.3    1.9139        1.9408    1.9135     1.9138

Error at x = 0.3:   0    −0.0269    0.0004    0.0001
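The three hand computations above are easy to reproduce with a short script. A Python sketch (illustrative only; it simply codes the three update formulas used in this example):

import math

f = lambda x, y: 2.0 * x * (1.0 - y)
exact = lambda x: 1.0 + math.exp(-x * x)

h = 0.1
x, y_e, y_m, y_i = 0.0, 2.0, 2.0, 2.0        # Euler, Modified Euler, Improved Euler
for _ in range(3):                            # three steps: x = 0.1, 0.2, 0.3
    # Euler
    y_e = y_e + h * f(x, y_e)
    # Modified Euler (midpoint rule)
    k1 = h * f(x, y_m)
    y_m = y_m + h * f(x + h / 2.0, y_m + k1 / 2.0)
    # Improved Euler (trapezoidal predictor-corrector)
    k1 = h * f(x, y_i)
    k2 = h * f(x + h, y_i + k1)
    y_i = y_i + 0.5 * (k1 + k2)
    x += h
    print(round(x, 1), exact(x), y_e, y_m, y_i)
# The final row reproduces the summary table: 1.9139, 1.9408, 1.9135, 1.9138.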

7.2.3 Runge-Kutta Methods

Although Euler's method is easy to implement, it is not very efficient, in the sense
that one needs a very small step size to get a good approximation. One way to obtain
better accuracy is to include higher order terms of the Taylor expansion in the formula,
but the higher order terms involve higher derivatives of y. The Runge-Kutta methods
attempt to obtain greater accuracy and at the same time avoid the need for higher
derivatives, by evaluating the function f(x, y) at selected points in each subinterval. A
general Runge Kutta algorithm is given as

yn+1 = yn + hφ(xn , yn , h) (7.8)

The function φ is termed the increment function. The mth order Runge-Kutta method gives
accuracy of order O(h^m). The function φ is chosen in such a way that, when expanded, the
right hand side of (7.8) matches the Taylor series up to the desired order. This means
that for a second order Runge-Kutta method the right side of (7.8) matches up to the second
order terms of the Taylor series.

Second Order Runge-Kutta

The Second order Runge Kutta methods are known as RK2 methods. For the derivation
of second order Runge Kutta methods, it is assumed that φ is the weighted average of
two functional evaluations at suitable points in the interval [xn , xn+1 ], i.e., φ(xn , yn , h) =
w1 k1 + w2 k2 . Thus, we have:

yn+1 = yn + [w1 k1 + w2 k2 ] (7.9)


where
k1 = hf (xn , yn ), k2 = hf (xn + αh, yn + βk1 ) (7.10)

Here w1 , w2 , α and β are constants to be determined so that equation (7.9) agrees with
the Taylor expansion to as high an order as possible.

Now, let us write down the Taylor series expansion of y in the neighborhood of xn, correct
to the h^2 term, i.e.

y(x_{n+1}) = y(xn) + h f(xn, y(xn)) + (h^2/2) f'(xn, y(xn)) + O(h^3)          (7.11)

Then, using the chain rule for the derivative f'(xn, y(xn)) we get

f'(xn, y(xn)) = ∂f(xn, y(xn))/∂x + f(xn, y(xn)) ∂f(xn, y(xn))/∂y,

and thus

y(x_{n+1}) = y(xn) + h f(xn, y(xn)) + (h^2/2) [ ∂f(xn, y(xn))/∂x + f(xn, y(xn)) ∂f(xn, y(xn))/∂y ] + O(h^3).          (7.12)
In addition, equation (7.9) and (7.10) can be rewritten as:

y_{n+1} = yn + w1 h f(xn, yn) + w2 h f(xn + αh, yn + βh f(xn, yn))

        = yn + w1 h f(xn, yn) + w2 h [ f(xn, yn) + αh ∂f(xn, yn)/∂x + βh f(xn, yn) ∂f(xn, yn)/∂y + O(h^2) ]

        = yn + h(w1 + w2) f(xn, yn) + h^2 [ w2 α ∂f(xn, yn)/∂x + w2 β f(xn, yn) ∂f(xn, yn)/∂y ] + O(h^3).

Therefore,

y_{n+1} = yn + h(w1 + w2) f(xn, yn) + h^2 [ w2 α ∂f(xn, yn)/∂x + w2 β f(xn, yn) ∂f(xn, yn)/∂y ] + O(h^3).          (7.13)
Assuming y(xn ) ≈ yn and comparing equations (7.12) and (7.13) yields

w1 + w2 = 1,      w2 α = 1/2      and      w2 β = 1/2.          (7.14)

Observe that four unknowns are to be evaluated from three equations. Accordingly many
solutions are possible for (7.14). Two examples of second-order Runge-Kutta methods of
the form (7.9) and (7.10) are the modified Euler method and the improved Euler method.


(a) The modified Euler method. In this case we take β = 1/2, which by (7.14) forces
w2 = 1, w1 = 0 and α = 1/2, and we obtain

y_{n+1} = yn + h f( xn + (1/2) h,  yn + (h/2) f(xn, yn) ).

(b) The improved Euler method, usually called RK2. This is arrived at by choosing
β = 1 (so that w1 = w2 = 1/2 and α = 1), which gives

k1 = h f(xn, yn),
k2 = h f(xn + h, yn + k1),
y_{n+1} = yn + (1/2)(k1 + k2).

Fouth Order Runge-Kutta

A similar but more complicated analysis is used to construct Runge-Kutta methods of


higher order. One of the most frequently used methods of the Runge-Kutta family is
often known as the classical fourth-order method, RK4, given by:

y_{n+1} = yn + (1/6)(k1 + 2k2 + 2k3 + k4)          (7.15)

where

k1 = h f(xn, yn)
k2 = h f(xn + h/2, yn + k1/2)
k3 = h f(xn + h/2, yn + k2/2)
k4 = h f(xn + h, yn + k3).

Example 7.3

Consider the initial value problem

y' = y,      y(0) = 1.

Approximate y(0.05) with a step-size h = 0.01 using RK2 and RK4.


Solution: Here f(x, y) = y, x0 = 0, y0 = 1, and h = 0.01. First let us use
RK2 for n = 0, 1, · · · , 4.


At x = 0.01, or when n = 0:

k1 = hf (x0 , y0 ) = hy0 = 0.010000


k2 = hf (x0 + h, y0 + k1 ) = h(y0 + k1 ) = 0.01(1 + 0.01) = 0.010100
y1 = y0 + (1/2)(k1 + k2) = 1.0 + 0.5(0.010000 + 0.010100) = 1.010050

At x = 0.02 or when n = 1:

k1 = hf (x1 , y1 ) = hy1 = 0.01(1.010050) = 0.010100


k2 = hf (x1 + h, y1 + k1 ) = h(y1 + k1 ) = 0.01(1.010050 + 0.010100) = 0.010202
y2 = y1 + (1/2)(k1 + k2) = 1.010050 + 0.5(0.010100 + 0.010202) = 1.020201

Repeating this until x = 0.05 we get the following

xi      k1         k2         yi
0.00    –          –          1.000000
0.01    0.010000   0.010100   1.010050
0.02    0.010100   0.010202   1.020201
0.03    0.010202   0.010304   1.030454
0.04    0.010305   0.010408   1.040810
0.05    0.010408   0.010512   1.051270

Therefore, y(0.05) ≈ 1.051270.


Now, let us use RK4 for n = 0, 1, · · · , 4.
At x = 0.01, or when n = 0:

k1 = hf (x0 , y0 ) = hy0 = 0.010000


k2 = h f(x0 + h/2, y0 + k1/2) = h(y0 + k1/2) = 0.01(1 + 0.005) = 0.010050
k3 = h f(x0 + h/2, y0 + k2/2) = h(y0 + k2/2) = 0.01(1 + 0.005025) = 0.010050
k4 = h f(x0 + h, y0 + k3) = h(y0 + k3) = 0.01(1 + 0.010050) = 0.010101
y1 = y0 + (1/6)(k1 + 2k2 + 2k3 + k4) = 1.0 + (1/6)(0.010000 + 2 × 0.010050 + 2 × 0.010050 + 0.010101)
   = 1.010050


At x = 0.02 or when n = 1:

k1 = hf (x1 , y1 ) = hy1 = 0.01(1.010050) = 0.010101


k2 = h f(x1 + h/2, y1 + k1/2) = h(y1 + k1/2) = 0.01(1.010050 + 0.005050) = 0.010151
k3 = h f(x1 + h/2, y1 + k2/2) = h(y1 + k2/2) = 0.01(1.010050 + 0.5 × 0.010151) = 0.010151
k4 = h f(x1 + h, y1 + k3) = h(y1 + k3) = 0.01(1.010050 + 0.010151) = 0.010202
y2 = y1 + (1/6)(k1 + 2k2 + 2k3 + k4) = 1.010050 + (1/6)(0.010101 + 2 × 0.010151 + 2 × 0.010151 + 0.010202)
   = 1.020201.

Repeating the above procedure until x = 0.05 we get the following

xi      k1         k2         k3         k4         yi
0.00    –          –          –          –          1.000000
0.01    0.010000   0.010050   0.010050   0.010101   1.010050
0.02    0.010101   0.010151   0.010151   0.010202   1.020201
0.03    0.010202   0.010253   0.010253   0.010305   1.030455
0.04    0.010305   0.010356   0.010356   0.010408   1.040811
0.05    0.010408   0.010460   0.010460   0.010513   1.051271

Therefore, y(0.05) ≈ 1.051271.
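The classical RK4 step is compact enough to code directly. A Python sketch that reproduces this example (the helper name rk4_step is ours):

import math

def rk4_step(f, x, y, h):
    """One step of the classical fourth-order Runge-Kutta method (7.15)."""
    k1 = h * f(x, y)
    k2 = h * f(x + h / 2.0, y + k1 / 2.0)
    k3 = h * f(x + h / 2.0, y + k2 / 2.0)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

f = lambda x, y: y                 # Example 7.3: y' = y, y(0) = 1
x, y, h = 0.0, 1.0, 0.01
for _ in range(5):                 # five steps of size 0.01 reach x = 0.05
    y = rk4_step(f, x, y, h)
    x += h
print(y, math.exp(0.05))           # 1.051271..., matching the exact value e^{0.05}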
