
Introduction to

Numerical Analysis

By
Tibebesellassie T/Maraim

March 2005
Contents
Introduction.................................................................................................................... iv
1 Basic Concepts in error estimation .................................................................. 1
1.1 Sources of error......................................................................................................... 1
1.2 Number Representation .......................................................................................... 2
1.3 Rounding off errors................................................................................................. 4
1.4 Absolute and relative errors .................................................................................... 4
1.5 Propagation of Errors.............................................................................................. 5
2 Nonlinear equations ....................................................................................................... 8
2.1 Introduction............................................................................................................. 8
2.2 Locating Roots ........................................................................................................ 8
2.3 Bisection Method .................................................................................................... 9
2.4 False-Position Method .......................................................................................... 13
2.5 The Secant Method ............................................................................................... 16
2.6 Fixed-Point Iteration Method................................................................................ 19
2.7 The Newton-Raphson iterative method ................................................................ 22
3 Systems of Linear Equations ....................................................................................... 29
Introduction................................................................................................................... 29
3.1 Direct Methods........................................................................................................ 31
3.1.1 Upper-triangular Linear Systems..................................................................... 31
3.1.2 Gauss elimination method................................................................................ 34
3.1.3 Gaussian Elimination With Partial Pivoting .................................................. 38
3.1.4 Gauss-Jordan Method .................................................................................... 41
3.1.5 Matrix Inversion Using Jordan Elimination .................................................. 44
3.1.6 Matrix Decomposition ................................................................................... 45
3.1.7 Tri Diagonal Matrix Method.......................................................................... 54
3.2 Indirect methods.................................................................................................... 56
3.2.1 Gauss Jacobi Iterative Method....................................................................... 56
3.2.2 Gauss-Seidel Method ..................................................................................... 58
4 Solving Nonlinear Equations Using Newton’s Method............................................... 62
4.1 Introduction............................................................................................................. 62
4.2 Newton Method ...................................................................................................... 62
5 Finite differences ......................................................................................................... 65
Introduction................................................................................................................... 65
5.1 The shift operator E .............................................................................................. 67
5.2 The forward difference operator Δ ........................................................................ 68
5.3 The backward difference operator ∇ ...................................................................... 69
5.4 The central difference operator δ ........................................................................... 69
5. 5 Finite Differences of Polynomials ......................................................................... 72
6 Interpolation................................................................................................................. 75
6.1 Linear interpolation................................................................................................ 75
6.2 Quadratic interpolation ........................................................................................... 77
6.3 Newton interpolation formulae ............................................................................. 79
6.3.1 Newton's forward difference formula ............................................................ 79
6.3.2 Newton's backward difference formula ......................................................... 79
6.4 Lagrange interpolation formula .............................................................................. 83
6.5 Divided Difference Interpolation........................................................................... 86

Prepared by Tibebe-selassie T/mariam ii


7 Numerical Differentiation and Integration................................................................... 88
7.1 Numerical Differentiation..................................................................................... 88
7.2 Numerical Integration ........................................................................................... 90
7.2.1 The trapezoidal rule ....................................................................................... 90
7.2.1 Simpson's Rule............................................................................................... 93
Mid-Semester Examination (2004/2005).................................................................. 96
Final Examination (2005) ......................................................................................... 98
Mid-Semester Examination (2005/2006).................................................................. 99
2005/2006 Final Examination................................................................................. 101
2006 Mid Examination (I) ...................................................................................... 104
2006 Final Examination (I)..................................................................................... 106
INDEX ........................................................................................................................ 108



Introduction
Unlike other terms denoting mathematical disciplines, such as calculus or linear
algebra, the exact extent of the discipline called numerical analysis is not yet clearly
defined.
By numerical analysis, we mean the theory of constructive methods in
mathematical analysis. By a constructive method we mean a procedure that permits us to
obtain the solution of a mathematical problem with arbitrary precision in a finite
number of steps that can be carried out rationally (the number of steps depends on the
desired accuracy). Numerical analysis is both a science and an art. As a science, it is
concerned with the processes by which mathematical problems can be solved by the
operations of arithmetic. As an art, numerical analysis is concerned with choosing the
procedure best suited to the solution of a particular problem. In general,
numerical analysis is the study of appropriate algorithms for solving problems of
mathematical analysis by means of arithmetic calculations.
Analytic methods have certain limitations in practical applications, so in most
cases an exact solution of an applied problem is not available. In such cases
numerical methods are important tools that provide practical means of calculating
the solutions of problems to a desired degree of accuracy. The use of high-speed digital
computers for solving problems in engineering design and scientific research
has increased the demand for numerical methods.
Problems that are impossible to solve by classical methods, or are too
formidable for solution by manual computation, can be resolved in a minimum of time
using digital computers. It is therefore essential to be familiar with the
numerical methods used when programming problems for the computer.
The computer, however, is only as useful as the numerical methods employed. If
the method is inherently inefficient, the computer solution is worthless, no matter how
efficiently the method is organized and programmed. Likewise, an accurate,
efficient numerical method will produce poor results if inefficiently organized or poorly
programmed. The proper utilization of the computer to solve scientific problems requires
the development of a program based on a numerical method suited to the problem.
Thus, in choosing an efficient numerical method we have to consider both the
accuracy of the approximate solution it gives and the ease with which it can be
implemented.
Numerical computation
The steps involved in a numerical computation are as follows:
i) Selection of method
ii) Preparation of algorithm
iii) Flow charting
iv) Programming
v) Execution of a program
The selected method is the mathematical procedure used to find the solution of a
given problem. To solve the same problem, more than one method may be
available. One should choose a method that suits the given problem; at the same time,
any assumptions and limitations of the method must also be studied.
After selection of a proper method, a complete set of computational steps to be followed
in sequence is prepared. This list of procedures is called an algorithm.



A diagrammatic representation that illustrates the sequence of operations to be
performed to arrive at the solution is called a flow chart. The operating instructions are
placed in boxes that are connected by arrows to indicate the order of execution. In
drawing flow charts, certain commonly used symbols have the following meanings
(the symbol shapes appear in the original figure):

- Start or stop of the program
- Input or output instructions
- Computational steps
- Decision-making and branching
- For loops
- Connector, joining two parts of a program
- A sub-program or function program

Based on the flow chart, a series of instructions is written in the "C++" programming
language.
Since a thorough understanding of errors is necessary for a proper appreciation
of the art of using numerical methods, we devote the first chapter of this material to the
theory of error analysis.



1 Basic Concepts in error estimation
1.1 Sources of error
The main sources of error in obtaining numerical solutions to mathematical problems are:
a) Modeling: a mathematical description of a physical problem usually involves
simplifications and omissions. (For more information see Numerical Methods for
Engineers, pages 11-20.)
b) Measuring instruments or computing aids: there may be errors in measuring or
estimating values;
c) The numerical method: most of the time numerical methods involve
approximations;
d) Representation of numbers: e.g. π cannot be represented exactly by a finite number
of digits;
e) Arithmetic error: frequently errors are introduced in carrying out operations such as
addition (+) and multiplication (×).
We can pass responsibility for (a) onto the applied mathematician, but the others are not
so easy to dismiss. Thus, if the errors in the data are known to lie within certain bounds,
we should be able to estimate the consequential errors in the results. Similarly, given the
characteristics of the computer, we should be able to account for the effects of (d) and (e).
As for (c), when a numerical method is devised it is customary to investigate its error
properties.

Classification of errors
We classify errors in general into three groups, depending on their source:
1. Errors which are already present in the statement of a problem before its solution are
called inherent errors. Such errors arise either due to the given data being
approximate or due to the limitations of mathematical tables, calculators or the digital
computer. Inherent errors can be minimized by taking better data or by using high
precision computing aids.
2. Errors due to arithmetic operations using normalized floating-point numbers. Such
errors are called rounding errors. Such errors are unavoidable in most of the
calculations due to the limitations of the computing aids. Rounding errors can,
however, be reduced:
(i) by changing the calculation procedure so as to avoid subtraction of nearly equal
numbers or division by a small number; or
(ii) by retaining at least one more significant figure at each step than that given in the
data and rounding off at the last step.
3. Errors due to finite representation of an inherently infinite process. For example the
use of a finite number of terms in the infinite series expansion of sin x, cos x, ex, etc.
Such errors are called truncation errors. Truncation error is a type of algorithm
error. Truncation error can be reduced by retaining more terms in the series or more
steps in the iteration; this, of course, involves extra work.

1.2 Number Representation
In order to carry out a numerical calculation involving real numbers like 1/3 and π in
terms of decimal representation, we are forced to approximate them by a representation
involving a finite number of significant figures. At this stage in a significant
representation of a number we have to notice the following properties.
i. All zeros between two non-zero digits are significant.
ii. If a number is having embedded decimal point ends with a non-zero or a sequence of
zeros, then all these zeros are significant digits.
iii. All zeros preceding a non-zero digit are non-significant.
For example
Number Number of significant digits
5.0450 5
0.0037 2
0.0001020 4

Similarly, when trailing zeros are used in large numbers, it is not clear how many,
if any, of the zeros are significant. For example, at face value the number 23,400 may
have three, four, or five significant digits, depending on whether the zeros are known
with confidence. Such uncertainty can be resolved by using scientific notation, where
2.34 × 10^4, 2.340 × 10^4, and 2.3400 × 10^4 signify that the number is known to three, four,
and five significant figures, respectively.
Notice that as indicated above, we separate the significant figures (the mantissa)
from the power of ten (the exponent); the form in which the exponent is chosen so that
the magnitude of the mantissa is less than 10 but not less than 1 is referred to as a
scientific notation.
Number representation in the computer
Since there is a fixed space of memory in a computer, a given number in a certain base
must be represented in a finite space in the memory of the machine. This means that not
all real numbers can be represented in the memory. The numbers that can be represented
in the memory of a computer are called machine numbers. Numbers outside these cannot
be represented.
There are two conventional ways of representation of machine numbers.
i. Fixed point representation
Suppose a number to be represented has n digits. In a fixed-point representation system, the
n digits are subdivided into n1 and n2, where n1 digits are reserved for the integer part and
n2 digits are reserved for the fractional part of the number. Here, whatever the value of
the number, n1 and n2 are fixed from the beginning, i.e. the decimal point is
fixed.
Example: If n = 10, n1 = 4, and n2 = 6, then 30.421 is represented by 0030421000 in a
register.
Note: In modern computers, the fixed-point representation is used for integers only.
ii. Floating point representation
Every real number a can be written as
a = p × N^q, where p is a real number, N is the chosen base of representation and q is
an integer. Such a representation of a is said to be normalized if N^-1 ≤ |p| < 1.



Example: If a = 36.72, then the normalized floating-point representation of a is given
by a = 0.3672 × 10^2, so that p = 0.3672, N = 10 and q = 2. We call p and q the mantissa
and the characteristic (exponent), respectively, of the number a. Thus, if the base is known
or fixed, any real number a can be represented by the ordered pair (p, q), and it is
sufficient to store this pair in the computer. Every number in a computer is represented as
follows.

Space reserved for the number a (n bits):

S (sign bit, + or -) | characteristic (exponent), r bits | mantissa, t bits in base N

Most of the time computers store numbers in base two to save memory space. So,
for instance, the number (-39.9)10 must first be changed to its binary form
(-100111.111001100…)2 = (-0.100111111001100…)2 × (2^6)10 so as to be stored in a computer.
A widely used storage scheme for the binary form is the IEEE Standard for Binary
Floating-Point Arithmetic. This standard was defined by the Institute of Electrical and
Electronics Engineers and adopted by the American National Standards Institute. The
single-precision format employs 32 bits, of which the first or most significant bit is
reserved for the sign bit S; the first bit for the number (-39.9)10 is therefore equal to 1.
The next 8 bits are used to store a bit pattern that represents the exponent r. The
binary value of r is not stored directly; rather, it is stored in biased or offset form as a
nonnegative binary value c. The relation between the actual exponent r, the stored
value c and the bias b is
c = r + b.
The advantage of biased exponents is that they are always nonnegative. It
is then simpler to compare their relative magnitudes without being concerned with their
signs; as a consequence, a magnitude comparator can be used to compare their relative
magnitudes during the alignment of the mantissas. Another advantage is that the smallest
possible biased exponent is all zeros. The floating-point representation of zero is
then a zero mantissa together with the smallest possible exponent.
The 8-bit value of c ranges from (0000 0000)2 to (1111 1111)2, that is, from (0)10 to
(255)10. The bias b has the value (0111 1111)2 = (127)10. Again using (-39.9)10, for which
r is equal to (6)10, we obtain a c value of (133)10, whose 8-bit form is (1000 0101)2.
The remaining 23 bits of the 32-bit format are used for the mantissa. The mantissa
for our example (-39.9)10 is an infinitely long binary sequence, (-0.100111111001100…)2,
which must be reduced to 23 bits for storage. The method of reduction dictated by the
IEEE standard is rounding.
The IEEE format for the decimal number (-39.9)10 is shown below.

S   c          mantissa
1   10000101   10011111100110011001101
Note: A number cannot be represented exactly if it contains more than t bits in its
mantissa. In that case it has to be rounded off. Moreover if, in the course of computation, a number


x is produced of the form ±q × N^r with r outside the computer's permissible range, then we
say that an overflow or an underflow has occurred, or that x is outside the range of the
computer. Generally, an overflow results in a fatal error (or exception), and the normal
execution of the program stops. An underflow, however, is usually treated automatically
by setting x to zero, without any interruption of the program but with a warning message
on most computers.

1.3 Rounding off errors


The simplest way of reducing the number of significant figures in the representation of a
number is merely to ignore the unwanted digits. This procedure, known as chopping, is
used by many modern computers. A better procedure is rounding, which involves adding
5 to the first unwanted digit, and then chopping. For example, π chopped to four decimal
places is 3.1415, but it is 3.1416 when rounded; the representation 3.1416 is correct to
five significant figures. The error involved in the reduction of the number of digits is
called round off error. Since π is 3.14159… we could remark that chopping has
introduced much more round-off error than rounding.

1.4 Absolute and relative errors


There are two common measures of error in a numerical calculation. If the exact value of
some quantity is denoted by x and an approximate calculated value is denoted by x̂, then
1. The absolute error is defined by
e_abs = |x - x̂|.
If x̂ is obtained by rounding x correct to n decimal places, then
e_abs = |x - x̂| < 0.5 × 10^-n,
whereas the absolute error due to chopping to n decimal places can be twice as large:
e_abs = |x - x̂| ≤ 10^-n.
2. The relative error is defined by
e_rel = |x - x̂| / |x| = e_abs / |x|, where x ≠ 0.
x̂ is said to approximate x correct to n significant digits if n is the largest positive
integer for which
e_rel = |x - x̂| / |x| < 0.5 × 10^-n.
The absolute value signs can be omitted if the sign of the error is important. The
relative error is a good measure of the approximate number of significant figures in a
calculated result. Of course, both formulas require that the exact value be known.
Example 4
i) Find the absolute and the relative errors in the following cases:
a) x = 3.141592 and x̂ = 3.14;
b) x = 1,000,000 and x̂ = 999,996;
c) x = 0.000012 and x̂ = 0.000009.
ii) Determine the number of significant digits for the approximations in (i).
Solution: i)
a) e_abs = |x - x̂| = |3.141592 - 3.14| = 0.001592,
   e_rel = e_abs / |x| = 0.001592 / 3.141592 = 0.000507.
b) e_abs = |x - x̂| = |1,000,000 - 999,996| = 4,
   e_rel = e_abs / |x| = 4 / 1,000,000 = 0.000004.
c) e_abs = |x - x̂| = |0.000012 - 0.000009| = 0.000003,
   e_rel = e_abs / |x| = 0.000003 / 0.000012 = 0.25.
ii)
a) Since e_rel = 0.000507 < 0.5 × 10^-2, x̂ approximates x to two significant digits.
b) Since e_rel = 0.000004 < 0.5 × 10^-5, x̂ approximates x to five significant digits.
c) Since e_rel = 0.25 < 0.5 × 10^0, x̂ approximates x to no significant digits.
Example 5 Let a = 0.263 × 10^4 and b = 0.446 × 10^1 be three-digit decimal normalized
floating-point numbers. If these numbers are added, the exact sum is
x = a + b = 0.263446 × 10^4.
However, the three-digit decimal normalized floating-point representation of this sum is
0.263 × 10^4. This, then, is the computed sum, which we denote by x̂. The absolute error
in the sum is
|x - x̂| = 4.46,
and the relative error is
4.46 / (0.263446 × 10^4) ≈ 0.17 × 10^-2.

The exact value of ab is 11,729.8; however, the computed product is 0.117 × 10^5. The
absolute error in the product is 29.8 and the relative error is approximately 0.25 × 10^-2.
Floating-point subtraction and division can be treated in a similar manner.
One of the challenges of numerical methods is to determine error estimates in the
absence of knowledge regarding the exact value. For example, certain numerical methods
use an iterative approach to compute answers. In such an approach, a present
approximation is made on the basis of a previous approximation. This process is
performed repeatedly, or iteratively. For such cases, the error is often estimated as the
difference between the previous and the current approximations. Thus, the relative error
is estimated by
e_rel = |current approximation - previous approximation| / |current approximation|.
For practical reasons, the relative error is usually more meaningful than the absolute
error. For example, if a1 = 1.333, b1 = 1.334, a2 = 0.001, and b2 = 0.002, then the
absolute error of bi as an approximation to ai is the same in both cases, namely 10^-3.
However, the relative errors are 3/4 × 10^-3 and 1, respectively. The relative error clearly
indicates that b1 is a good approximation to a1 but that b2 is a poor approximation to a2.

1.5 Propagation of Errors


Consider two numbers x = x̂ + e_x, y = ŷ + e_y and the propagated error e:
i) Under the operation addition,
x + y = (x̂ + e_x) + (ŷ + e_y),
e ≡ (x + y) - (x̂ + ŷ) = e_x + e_y,
hence |e| ≤ |e_x| + |e_y|,
i.e. Max(e) = |e_x| + |e_y|.
The magnitude of the propagated error is therefore no more than the sum of the initial
absolute errors; of course, it may be zero.
ii) Under the operation multiplication (division),
since xy = (x̂ + e_x)(ŷ + e_y), we have xy - x̂ŷ = x̂e_y + ŷe_x + e_x e_y,
so that
|xy - x̂ŷ| / |x̂ŷ| ≤ |e_x / x̂| + |e_y / ŷ| + |e_x e_y / (x̂ŷ)|,
and so
Max(e_rel) ≈ |e_x / x̂| + |e_y / ŷ|, assuming |e_x e_y / (x̂ŷ)| is negligible.
This shows that the relative error in the product xy is approximately the sum of the
relative errors in the approximations x̂ and ŷ.
The same result holds for division.
Often an initial error will be propagated in a sequence of calculations. A quality
that is desired of any numerical process is that a small error in the initial conditions
should produce only small changes in the final result. An algorithm with this feature is called
stable; otherwise, it is called unstable. Whenever possible we shall choose methods that are
stable.
Exercises
1. a. Chop to three significant figures,
b. Chop to three decimal places,
c. Round to three significant figures
d. Round to three decimal places
each of the following numbers:
2.46475; 43.4765; 0.000442; 0.8005; 8982334.
2. Find the normalized floating-point representation of each of the following
numbers.
a) 2312 b) 32.56 c) 0.01267 d) 82,431
3. Find the absolute and the relative error when a three-digit decimal normalized
floating-point number approximates each of the real numbers in problem 2.
4. A percentile error ep is defined as erel×100. Find ep in problem 2 given the
conditions of problem 3.
5. Represent each of the following as five-digit base 2 floating point numbers
a) 21 b)3/8 c) 9.872 d) –0.1
6. Do each of the following using four-digit decimal normalized floating point
arithmetic and calculate the absolute and relative errors in your answer
a) 10,420 + 0.0018 b) 10,424 – 10,416 c) (3626.6)⋅(22.656)
7. √19 = 4.359 and √π = 1.772 correct to 4 significant figures. Find the relative
and absolute errors in their sum and difference.



8. Calculate the value of (x^2 - y^2)/(x - y) with x = 0.4845 and y = 0.4800, using
normalized floating-point arithmetic. Compare with the value of (x + y). Determine the
relative error.
9. If a number p is correct to 4 significant digits, what will be the maximum relative
error?
10. Discuss the difference between error and mistake.
11. Does floating point arithmetic obey the usual laws of arithmetic? (reading
assignment)
12. Complete the following computation
∫ from 0 to 1/4 of e^(x^2) dx ≈ ∫ from 0 to 1/4 of (1 + x^2 + x^4/2! + x^6/3!) dx = x̂.
State what type of error is present in this situation. Compare your answer with the
true value x = 0.2553074606.
13. Find the absolute error and relative error. Also determine the number of
significant digits in the approximation.
a) x = 2.71828182, x̂ = 2.7182 b) x = 98,350, x̂ = 98,000
c) x = 0.000068, x̂ = 0.00006
14. Assume that a ≠ 0 and b^2 - 4ac > 0 and consider the equation ax^2 + bx + c = 0.
Then the roots can be computed with the quadratic formulas
i) x1 = (-b + √(b^2 - 4ac)) / (2a) and x2 = (-b - √(b^2 - 4ac)) / (2a).
Show that these roots can be calculated with the equivalent formulas
ii) x1 = -2c / (b + √(b^2 - 4ac)) and x2 = -2c / (b - √(b^2 - 4ac)).
Hint: Rationalize the numerators in (i). Remark: In the case when b ≈ √(b^2 - 4ac),
one must proceed with caution to avoid loss of precision due to catastrophic
cancellation. If b > 0, then x1 should be computed using (ii) and x2 should be
computed using (i).
15. Use the result of exercise 14 to construct an algorithm and a C++ program that will
accurately compute the roots of a quadratic equation in all situations, including the
troublesome ones when b ≈ √(b^2 - 4ac).
16. Discuss the propagation of error for the following:
a) The sum of three numbers:
x + y + z = (x̂ + e_x) + (ŷ + e_y) + (ẑ + e_z).
b) The quotient of two numbers:
x/y = (x̂ + e_x) / (ŷ + e_y).
c) The product of three numbers:
xyz = (x̂ + e_x)(ŷ + e_y)(ẑ + e_z).



2 Nonlinear equations
2.1 Introduction
In this chapter, we study certain methods of solving nonlinear equations in a single
variable, of the form f(x) = 0.
The study of analytical methods for finding the roots of algebraic (polynomial)
equations is limited, for example, to polynomials of low degree such as the quadratic,
cubic and fourth-degree polynomial equations, and to some particular types of equations of
the form
P(x) = ax^(2n) + bx^n + c = 0.
However, for polynomial equations of degree greater than four, exact methods of solution
hardly exist and, therefore, one should apply numerical methods to find the roots.
As we have already stated, analytical methods are limited to simple polynomial
equations. They are not applicable to transcendental equations such as
a e^x + b cos x + cx = 0.

2.2 Locating Roots


Let us suppose that our problem is to find some or all of the roots of the non-linear
equation f(x) = 0. Before we use a numerical method we should have some idea about
the number, nature and approximate location of the roots; this information can be obtained
from graphs or tables of values of the function f(x). Using graphs we can locate roots in
the following two ways:
a) The roots of the equation f(x) = 0 are the x-coordinates of the points of
intersection of the graph of f(x) with the x-axis. Thus, we may trace the function f(x) in the
interval [a, b] and determine graphically the root x*.
Fig 2.1: The graph of f(x) crosses the x-axis at the root x* between a and b.

b) Often it is possible to write the function f (x) as the difference of two simple known
functions. Suppose f ( x) = h( x) − g ( x) . Then, the roots of f ( x) = 0 that is
h( x) = g ( x) , are given by the intersection of the curves h(x) and g(x). For instance if
in the equation
sin x − x + 0.5 = 0
we put f ( x) = sin x − x + 0.5 , it is easy to separate f (x) into two parts, sketch two
curves on the set of axes, and see where they intersect. In this case we sketch
h(x) = sin x and g(x) = x − 0.5. Since |sin x| ≤ 1 we are only interested in the interval
−0.5 ≤ x ≤ 1.5 (outside which |x − 0.5| > 1).



Fig 2.2: the curves h(x) = sin x and g(x) = x − 0.5 intersect once near x = 1.5.

We deduce from the graph that the equation has one real root near x = 1.5. We then
tabulate f ( x) = sin x − x + 0.5 near x =1.5 as follows:

x        1.50      1.45      1.49
sin x    0.9975    0.9927    0.9967
f(x)    -0.0025    0.0427    0.0067

We now know that the root lies between 1.49 and 1.50, and we can use a numerical
method to obtain a more accurate answer.

2.3 Bisection Method


This method is based on the repeated application of the intermediate value theorem,
which states that if a function f(x) is continuous between a and b, and f(a) and f(b)
are of opposite signs, then there exists at least one real root of f(x) = 0 between a and b.
Let f be a continuous function on [a0, b0] with f(a0) < 0 and f(b0) > 0, and let
x* = (a0 + b0)/2. If we calculate f(x*), the function value at the point of bisection of the
interval (a0, b0), we have three possibilities:
a. f(x*) = 0, in which case x* is the root of f(x) = 0.
b. f(x*) < 0, in which case the root of f(x) = 0 lies between x* and b0.
c. f(x*) > 0, in which case the root of f(x) = 0 lies between a0 and x*.
Presuming there is just one root, if case (a) occurs the process is terminated. If either case
(b) or case (c) occurs, the process of bisection of the interval containing the root can be
repeated until the root is obtained to the desired accuracy.



Algorithm for bisection method
1. Choose limiting values a and b (with b > a) so that f(a)·f(b) < 0, and set the allowable
absolute error (tolerance) ε.
2. Compute the interval midpoint x* = (a + b)/2, and compute f(x*).
3. If f(b)·f(x*) = 0, go to step 6; otherwise go to step 4.
4. If f(b)·f(x*) < 0, go to step 5(i); otherwise go to step 5(ii).
5. i. If |b − x*| ≤ ε, go to step 6; otherwise reset a to x* and return to step 2.
   ii. If |a − x*| ≤ ε, go to step 6; otherwise reset b to x* and return to step 2.
6. x* is the approximated root.
Example 1 Solve sin x − x + 0.5 = 0 accurate to 4 decimal places by the bisection method.
Solution: Let f(x) = sin x − x + 0.5; in the last section we saw that f
has a root between 1.49 and 1.50. We tabulate our results, rounded to 4 decimal places, as
follows.
Interval      a_n         b_n       x* = (a_n + b_n)/2      f(x*)        Absolute Error
[a0 , b0 ] 1.49 1.50 1.495 0.002129 0.005
[a1 , b1 ] 1.495 1.50 1.4975 -0.000185 0.0025
[a 2 , b2 ] 1.495 1.4975 1.49625 0.000973 0.00125
[a3 , b3 ] 1.49625 1.4975 1.496875 0.000394 0.000625
[a 4 , b4 ] 1.496875 1.4975 1.497187 0.000105 0.000313
[a5 , b5 ] 1.497187 1.4975 1.497348 -0.00000045 0.000161
[a6 , b6 ] 1.497187 1.497348 1.497267 0.000003 0.000081
[a 7 , b7 ] 1.497267 1.497348 1.497307 -0.00000006 0.00004

Hence, since |1.497307 − 1.497267| ≈ 0.00004, the approximate root is 1.4973, rounded to
4 decimal places.
Convergence Analysis
Now let us investigate the accuracy with which the bisection method determines a root of
a function. Suppose that f is a continuous function that takes values of opposite sign at
the ends of an interval [a, b]. Since the new interval containing the root is exactly half
the length of the previous one, the interval width is reduced by a factor of ½ at each step.
At the end of the nth step the interval will therefore be of length (b − a)/2^n. If, after
repeating this process n times, the latest interval is as small as an error tolerance ε, then
(b − a)/2^n ≤ ε. From this relation it is also possible to determine the number of steps
required in the bisection method to reach the solution. To find this number of steps
n, take the logarithm (with any convenient base) of both sides of the
inequality (b − a)/2^n ≤ ε to obtain

n ≥ [log(b − a) − log ε] / log 2.
Observe that the error bound decreases at each step by a factor of ½ (i.e. e_{n+1}/e_n = ½);
thus we say that the convergence of the bisection method is linear, since e_{n+1} is a scalar
multiple of e_n.



Example 2 If, right at the beginning of Example 1, we had wanted to know the
number of iterations required to reach the solution within
the required accuracy, we could have employed the relation:
n ≥ [log(1.50 − 1.49) − log(0.00005)] / log 2 = 7.644
and observed that at least n = 8 iterations are necessary to reach the solution, which is
exactly the number of iterations required in the solution of the example.

Flow Chart of Bisection Method

[Fig 2.3: flow chart of the bisection method. Start; define f(x); read a, b and ε. If
f(a)f(b) > 0, the initial values are inappropriate: stop. Otherwise compute
x* = (a + b)/2 and test f(x*)f(b): if it equals 0, x* is the root; if it is negative,
stop when |b − x*| ≤ ε, otherwise set a = x* and recompute; if it is positive, stop when
|a − x*| ≤ ε, otherwise set b = x* and recompute.]

Following the algorithm and the flow chart, we can write a program for the bisection
method in C++ as below:

#include <iostream>
#include <cmath>
#include <iomanip>

using namespace std;

float f(float x)
{
    return sin(x) - x + 0.5;
}

int main()
{
    float a, b, c, e;
    int i = 0;
    cout << "Enter the initial interval (a,b) and the error e\n";
    cin >> a >> b >> e;
    if (f(a) * f(b) >= 0)
    {
        cout << "either the function has the same sign at a and b\n";
        cout << "or a or b is the root of f(x)=0 and\n";
        cout << " f(" << a << ")=" << f(a) << " and f(" << b << ")=" << f(b);
    }
    else
    {
        do
        {
            c = (a + b) / 2;            // midpoint of the current interval
            i = i + 1;
            if (f(a) * f(c) < 0)        // root lies in [a, c]
            {
                if (fabs(a - c) > e)
                    b = c;
                else
                    break;
            }
            else if (f(a) * f(c) > 0)   // root lies in [c, b]
            {
                if (fabs(b - c) > e)
                    a = c;
                else
                    break;
            }
            else                        // f(c) = 0: c is the root
                break;
            cout << i << setw(15) << a << setw(15) << b << setw(15) << c << endl;
        } while (true);
        cout << i << setw(15) << a << setw(15) << b << setw(15) << c << endl;
        cout << "the root of the equation is " << c << " in " << i << " steps";
    }
    return 0;
}

Effectiveness of the bisection method


The bisection method is suitable for automatic computation and is almost certain to give a
root. It fails only if the accumulated error in the calculation of f(x) at a bisection point
gives it a small negative value when actually it should have a small positive value (or
vice versa); the interval subsequently chosen would then be wrong. This can be
overcome by working to sufficient accuracy, and this almost-assured convergence is not
true of many other methods of finding a root.



One drawback of the bisection method is that it applies only to roots of f(x) about
which f(x) changes sign. In particular, double roots can be overlooked; one should be
careful to examine f(x) in any range where it is small, so that repeated roots about which
f(x) does not change sign are evaluated by other means. Such a close examination of course
also avoids another nearby root being overlooked.
Finally, note that bisection is rather slow; after n steps the interval containing the
root is of length (b − a)/2^n. However, provided values of f(x) can be generated readily,
the rather large number of steps which can be involved in the application of bisection is
of relatively little consequence on an automatic computer.

2.4 False-Position Method


The bisection method guarantees that the iterative process will converge. It is, however,
slow. Thus attempts have been made to speed up the bisection method while retaining its
guaranteed convergence. The method of linear interpolation, or regula falsi (false
position), is found to be an effective alternative to the bisection method for solving the
equation f(x) = 0 for a real root between a and b, given that f(x) is continuous and
f(a) and f(b) have opposite signs.
To implement the method, observe that the curve y = f(x) is not generally a straight line.
However, one may join the points
However, one may join the points
(a, f (a)) and (b, f (b))
by the straight line
y − f (a) x−a
=
f (b) − f (a) b − a
The straight line cuts the x-axis at (x̄, 0), where
(0 − f(a))/(f(b) − f(a)) = (x̄ − a)/(b − a),
so that
x̄ = a − f(a)(b − a)/(f(b) − f(a)) = (a f(b) − b f(a))/(f(b) − f(a)).
Now let us assume that f (a) is negative and f (b) is positive. As in the bisection method,
there are three possibilities:
i) f ( x ) = 0 , in which case x is the root;
ii) f ( x ) < 0 , in which case the root lies between x and b;
iii) f ( x ) > 0 , in which case the root lies between x and a.
If case (i) occurs, the process is terminated; if either case (ii) or case (iii) occurs, the
process can be repeated until the root is obtained to the desired accuracy. In Fig 2.4 the
successive points where the straight lines cut the x-axis are denoted by x1, x2, x3.



Algorithm for False-Position Method
1. Choose a and b such that a < b and f(b)⋅f(a)<0, set the allowable absolute error
(tolerance) ε and the maximum number of iteration M.
2. Set the counter i to zero.
3. Increase i by 1, and compute the intermediate point x̄ from either
x̄ = a − f(a)(b − a)/(f(b) − f(a))
or
x̄ = b − f(b)(a − b)/(f(a) − f(b)).
4. If i ≤ M, go to 5, otherwise go to 8.
5. If |f( x )| ≤ ε go to step 7 otherwise go to step 6.
6. If f(b)⋅f( x )<0 reset a to x , otherwise reset b to x , and turn to step 3.
7. x is the approximated root.
8. The iteration process does not converge within M iterations.

Fig 2.4: successive false-position points x1, x2, x3 approaching the root from one side of the interval.
Example 3 To see how fast the false-position method can be, let us reconsider the
problem in Example 1. Let f(x) = sin x − x + 0.5; since the root lies between 1.49
and 1.50, the first intermediate point x̄ is given by
x̄ = 1.49 − f(1.49)(1.50 − 1.49)/(f(1.50) − f(1.49)) = 1.4973,
which is the solution of the equation rounded to 4 decimal places. Observe that this was the
result found only after 8 steps of bisection in Example 1.
Example 4 Find the root of the equation e^{−x} = x using the false-position method, correct
to three decimal places.
Solution: Let f(x) = e^{−x} − x; then since f(0) = 1 and f(1) = −0.6321, the root lies
between 0 and 1. Putting a = 0 and b = 1, the first intermediate point x̄ can be found using
the relation



x̄ = a − f(a)(b − a)/(f(b) − f(a)),
which implies
x̄ = 0 − f(0)(1 − 0)/(f(1) − f(0)) = 0.6127.
Now f(x̄) = f(0.6127) = −0.0708, so reset b to x̄, i.e. b = 0.6127, find the next
intermediate point between a and b (0 and 0.6127), and so on until |f(x̄)| < 0.0005. We
now arrange our solution in the table below.

a         b          x̄         f(x̄)       |f(x̄)|
0 1 0.6127 -0.0708 0.0708
0 0.6127 0.5722 -0.0079 0.0079
0 0.5722 0.5677 -0.0009 0.0009
0 0.5677 0.5672 -0.0001 0.0001

From the table we deduce that the root, rounded to 3 decimal places, is 0.567, obtained in 4
iterations. Observe that every approximation was rounded to 4 decimal places and only the
final approximation was rounded to 3 decimal places; this is a deliberate act to minimize the
absolute error. Note that bisection would have required 13
iterations to reach the same accuracy! (Check.)
Like the bisection method, the method of false position has almost assured
convergence, and it may converge to a root faster. However, it may happen that most or
all of the calculated x̄ values are on the same side of the root, in which case convergence
may be slower than bisection; see Fig 2.5 and Example 5 below.

Fig 2.5: a function for which the false-position points all fall on the same side of the root, so convergence is slow.

Example 5 Use bisection and false position to locate the root of f ( x) = x10 − 1 between
x=0 and x=1.2 round two decimal places.
Solution: It can be shown that the bisection method requires 9 iterations while the
false-position method requires at least twenty iterations to reach the desired accuracy.
Thus false position took more than twice as many iterations as bisection. (Check.)



2.5 The Secant Method
The secant method, stated below, is a modification of the interpolation (false-position)
method that avoids its slow convergence, except that no attempt is made
to ensure that the root remains enclosed. Starting with approximations x0 and x1 to the root,
further approximations x2, x3, ... are computed from
x_{n+1} = x_n − f(x_n)(x_n − x_{n−1})/(f(x_n) − f(x_{n−1})).
We no longer have assured convergence, but the process is simpler, since it does not
require the sign of f(x_{n+1}) to be tested, and it often converges faster. However, the secant
method has the following two limitations:
i) The convergence is no longer guaranteed.
ii) The solution is no longer necessarily contained in the interval (a_n, b_n).
Application of this method is recommended only in a close neighborhood of the root.
A geometrical interpretation of the secant method is shown in Fig 2.6. Observe that in
this method a secant is drawn connecting (x_{n−1}, f(x_{n−1})) and (x_n, f(x_n)). The point where it cuts the
x-axis is x_{n+1}. Another secant is then drawn through (x_n, f(x_n)) and (x_{n+1}, f(x_{n+1})). This is repeated until
f(x) is sufficiently close to zero.

Fig 2.6: the secant through (x_{n−1}, f(x_{n−1})) and (x_n, f(x_n)) cuts the x-axis at x_{n+1}.
Algorithm for Secant Method
1. Choose x0 and x1, set the error (tolerance) ε and the maximum number of iterations M.
2. Set the counter i to 0.
3. If |f(x1)| < ε, x1 is the solution; otherwise go to Step 4.
4. Increase i by 1 and calculate an intermediate point x2 from
x2 = x1 − f(x1)(x1 − x0)/(f(x1) − f(x0)).
5. If i > M, the process does not converge within the required limit; otherwise go to Step 6.
6. Reset x0 to x1, reset x1 to x2, and return to Step 3.



Example 6 Find the root of the equation e^{−x} = x using the secant method, correct to three
decimal places.
Solution: Let f(x) = e^{−x} − x and take the initial points x0 = 0 and x1 = 1. Since
|f(x1)| = 0.6321, which is much larger than 0.0005, we proceed to find the approximation
point x2 using the relation
x2 = x1 − f(x1)(x1 − x0)/(f(x1) − f(x0)),
that is,
x2 = 1 − f(1)(1 − 0)/(f(1) − f(0)) = 0.6127.
Now reset x0 = 1 and x1 = 0.6127 and continue searching for the next approximation
point, since |f(0.6127)| = 0.0708 > 0.0005:
x2 = 0.6127 − f(0.6127)(0.6127 − 1)/(f(0.6127) − f(1)) = 0.5638.
Again reset x0 = 0.6127 and x1 = 0.5638 and continue searching for the next
approximation point, since |f(0.5638)| = 0.0052 > 0.0005:
x2 = 0.5638 − f(0.5638)(0.5638 − 0.6127)/(f(0.5638) − f(0.6127)) = 0.5672.
This time |f(0.5672)| ≈ 0.5923×10^{−5} < 0.0005, so the root rounded to three decimal places is
0.567, reached in 3 iterations. Comparing this result with the result of Example 4, we can
see that the secant method has a faster convergence rate than false position when it converges.
Caution: It is not necessary to choose the initial points a and b such that f(a)·f(b) < 0. For instance, in
Example 6 we could use the initial points 1 and 2 and reach the solution in three iterations.
(Check!)
Example 7 Find the approximate solution of f(x) = x^3 + x^2 − 3x − 3 = 0 in the interval
[1, 2], correct to 2 decimal places, by the bisection, interpolation and secant methods, and
compare the methods.
Solution:
Bisection method
x1 x2 x* f (x*)
1 2 1.5 -1.875
1.5 2 1.75 0.171875
1.5 1.75 1.625 -0.943359
1.625 1.75 1.6875 -0.409424
1.6875 1.75 1.71875 -0.124786
1.71875 1.75 1.734375 0.0203
1.71875 1.734375 1.726562 -0.051755
Interpolation method
x1 x2 x3 f ( x*)
1 2 1.571429 -1.364432
1.571429 2 1.705411 -0.247744
1.705411 2 1.727883 -0.039339
1.727883 2 1.731405 -0.00611



Secant method
x n −1 xn x n +1
1 2 1.5714
2 1.5714 1.705411
1.5714 1.705411 1.735136

As can be seen from the tables above, the bisection method takes 6 iterations, the
interpolation method takes 4 iterations and the secant method takes only 3 iterations to
reach the approximate root 1.73. Thus, comparing the methods, we see that the fastest is
the secant and the slowest is the bisection method.
Even in the general case, when all three methods converge to a solution, the secant method is
the fastest of the three methods discussed so far in this section.
Rate of Convergence of the Secant Method
With respect to the speed of convergence of the secant method, we start from the (n+1)th
iterate:
x_{n+1} = x_n − f(x_n)(x_n − x_{n−1})/(f(x_n) − f(x_{n−1})).   (**)
The convergence of the iterative procedure is judged by the rate at which the error
between the true root and the calculated root decreases. The order of convergence of an
iterative process is defined in terms of the errors ei and ei+1 in successive approximations.
An iterative algorithm is kth-order convergent if k is the largest number such that
lim_{i→∞} |e_{i+1}| / |e_i|^k ≤ M,

where M is a finite number. In other words, the error in any step is proportional to the kth
power of the error in the previous step.
We now find the speed of convergence of the secant method from the following
derivation. Let r be the root of the equation f(x) = 0 and let e_i = r − x_i, i.e. e_i is the error in
the ith iteration. Then
e_{n+1} = r − x_{n+1}
       = r − [x_{n−1} f(x_n) − x_n f(x_{n−1})]/[f(x_n) − f(x_{n−1})]
       = [(r − x_{n−1}) f(x_n) − (r − x_n) f(x_{n−1})]/[f(x_n) − f(x_{n−1})],
and using the Taylor series of f(x_n) = f(r − e_n) and f(x_{n−1}) = f(r − e_{n−1}), together
with the fact that f(r) = 0,
e_{n+1} = {e_{n−1}[f(r) − e_n f'(r) + (e_n²/2) f''(r) − ...] − e_n[f(r) − e_{n−1} f'(r) + (e_{n−1}²/2) f''(r) − ...]}
          / {[f(r) − e_n f'(r) + ...] − [f(r) − e_{n−1} f'(r) + ...]}
       = [(e_{n−1} − e_n) f(r) + ½(e_{n−1} e_n² − e_n e_{n−1}²) f''(r) − ...] / [(e_{n−1} − e_n) f'(r) − ...]
       ≈ e_{n−1} e_n [−f''(r)/(2 f'(r))] ~ e_{n−1} e_n.



We seek k such that e_n ~ e_{n−1}^k; then e_{n+1} ~ e_n^k ~ e_{n−1}^{k²} and
e_{n−1} e_n ~ e_{n−1}^{k+1}, so that we deduce k² ≈ k + 1. Hence, solving the quadratic
equation k² − k − 1 = 0, we have k = (1 + √5)/2 ≈ 1.618. The speed of convergence is
therefore faster than linear (k = 1), but slower than quadratic (k = 2).

2.6 Fixed-Point Iteration Method


If the equation f(x) = 0 whose roots are to be found can be expressed as x = g(x), then an
iterative method which is very easy to program for a computer may be formulated. For
example, the equation
f(x) = e^{−x} − 2x + 1 = 0
may be rewritten as
x = (1 + e^{−x})/2.
The iterative technique is to guess a starting value x0 and substitute it in g(x) to give a new
approximation x1 = g(x0). The new approximation is again substituted in g(x) to give a
further approximation x2 = g(x1), and so on, until a sufficiently accurate approximation to
the root is obtained. This repetitive process, based on x_{n+1} = g(x_n), is called simple
iteration; provided that |x_{n+1} − x_n| decreases as n increases, the process tends to
the root α, which satisfies α = g(α).
A graphic representation of the method is illustrated in fig 2.6.

Fig 2.6: fixed-point iteration: successive iterates x0, x1, x2 move between the curves y = g(x) and y = x toward their intersection at the root.

Algorithm for fixed-point iteration


1. Guess a starting value x0 and choose a convergence parameter ε.
2. Compute an improved value xapp from: xapp = g(x0).
3. If x app − x0 > ε , set x0 to xapp and return to step 2; otherwise, xapp is the
approximate root.

Example 8 Find the root of the equation 3xe x = 1 to an accuracy of 0.0001, using the
method of simple iteration with initial point x0 = 1.



Solution: We first set x = (1/3)e^{−x} = g(x).
Starting with x0 = 1, successive iterations produce
x1 = 0.12263,
x 2 = 0.29486,
x3 = 0.24821, x 4 = 0.26007, x5 = 0.25700, x 6 = 0.25779,
x7 = 0.25759, x8 = 0.25764.
Thus, eight iterations are necessary before x n +1 − x n < 0.0001 and the root is 0.25764.
Convergence
Whether or not an iteration procedure converges quickly, or indeed at all, depends on the
choice of the function g(x), as well as the starting value x0. We can examine the
convergence of the iteration process
x n +1 = g ( x n )
to
α = g (α )
with the help of the Taylor series
g (α ) = g ( x k ) + (α − x k ) g ' (t k ), k = 0,1,..., n,
where t k it a point between the root α and the approximation x k . We have
α − x1 = g (α ) − g ( x0 ) = (α − x0 ) g ' (t 0 )
α − x 2 = g (α ) − g ( x1 ) = (α − x1 ) g ' (t1 )
. .
. .
. .
α − x n +1 = g (α ) − g ( x n ) = (α − x n ) g ' (t n ).
Multiplying the n+1 rows together and canceling the common factors
α − x1, α − x2, ..., α − xn leaves
α − x_{n+1} = (α − x0) g'(t0) g'(t1) ... g'(tn).
Consequently,
α − x n +1 = α − x0 g ' (t 0 ) g ' (t1 ) ... g ' (t n ) ,
so that the absolute error α − x n +1 can be made as small as we please by sufficient
iteration if g ' (t i ) < 1 for all i in the neighborhood of the root.
For instance, in our last example, if we write 3x e^x = 1 in the form x = ln(1/(3x)), we get
|(ln(1/(3x)))'| = |−1/x| > 1 for all x ∈ (0, 1),
which shows that the iteration process would not be convergent.
Observe that if g ' (t i ) ≤ k < 1 for all i, then from the above argument it is clear that
α − x n ≤ k α − x n −1 , k<1 and thus the iteration method is linearly convergent.



Note: 1. The smaller the value of |g'(x)|, the more rapid will be the convergence.
2. This method of iteration is particularly useful for finding the real roots of an
equation given in the form of an infinite series.
3. The slow rate of convergence of the iteration method can be improved by using the
method derived below.
Aitken’s Δ² method Let x_n, x_{n+1}, x_{n+2} be three successive approximations to the desired
root α of the equation x = g(x). Then we know that
α − x_{n+1} = k(α − x_n),
α − x_{n+2} = k(α − x_{n+1}).
Dividing, we get
(α − x_{n+1})/(α − x_{n+2}) = (α − x_n)/(α − x_{n+1}).
The sequence y_n defined by
y_n = x_n − (x_{n+1} − x_n)²/(x_{n+2} − 2x_{n+1} + x_n) = x_n − (Δx_n)²/(Δ²x_n)
is a better approximation to α than x_n. We can introduce this approximation of Aitken
after two iterations of the successive-approximation method.
Note that we define Δx_n = x_{n+1} − x_n; thus Δ²x_n = Δ(Δx_n) = Δ(x_{n+1} − x_n) = Δx_{n+1} − Δx_n,
which implies Δ²x_n = x_{n+2} − 2x_{n+1} + x_n. The forward difference Δ will be discussed
thoroughly in section 5.2.
Example 9 Find a real root of the equation cos x = 3 x − 1 correct to three decimal places
using (i) iteration method (ii) Aitken’s Δ2 method.
Solution:
i) Let f ( x) = cos x − 3x + 1 = 0 , since f(0) = 2 (+ve) and f(π/2) = – 3π/2 + 1 (-ve)
The root lies between 0 and π/2.
Rewriting the given equation as
x = (cos x + 1)/3 = g(x),
we have g'(x) = −(sin x)/3,
and thus |g'(x)| = (1/3)|sin x| < 1 in (0, π/2).
Hence the iteration method can be applied and we start with x0 = 0. The successive
approximations are
x1 = g(x0) = (cos 0 + 1)/3 = 0.6667,       x2 = g(x1) = (cos 0.6667 + 1)/3 = 0.5953,
x3 = g(x2) = (cos 0.5953 + 1)/3 = 0.6093,  x4 = g(x3) = (cos 0.6093 + 1)/3 = 0.6064,
x5 = g(x4) = (cos 0.6064 + 1)/3 = 0.6072,  x6 = g(x5) = (cos 0.6072 + 1)/3 = 0.6071.
We can see that |x6 − x5| = |0.6071 − 0.6072| = 0.0001 < 0.0005; hence the root is 0.607
correct to three decimal places.



ii) We calculate x1, x2, x3 as above. To use Aitken's method, we form a difference table:

x              Δx         Δ²x
x1 = 0.6667
              −0.0714
x2 = 0.5953               0.0854
               0.0140
x3 = 0.6093

Hence x4 = x1 − (Δx1)²/(Δ²x1) = 0.6667 − (−0.0714)²/0.0854 = 0.607,
which corresponds to six iterations of the plain method. Thus the required root is 0.607.

2.7 The Newton-Raphson iterative method


The Newton-Raphson method is suitable for implementation on a computer. It is a
process for the determination of a real root of an equation f(x) = 0, given just one point
close to the desired root. It can be viewed as a limiting case of the secant method (section 2.5) or
as a special case of simple iteration (section 2.6).
1. Procedures
Let x0 denote the known approximate value of the root of f(x) = 0 and h the difference
between the true value α and the approximate value, i.e.,
α = x0 + h
The second-degree terminated Taylor expansion about x0 is
f(α) = f(x0 + h) = f(x0) + h f'(x0) + (h²/2!) f''(ξ),
where ξ = x0 + θh, 0 < θ < 1, lies between α and x0.
Ignoring the remainder term and writing f(α) = 0,
f(x0) + h f'(x0) ≈ 0,

whence
h ≈ −f(x0)/f'(x0),

and, consequently,

x1 = x0 − f(x0)/f'(x0)

should be a better estimate of the root than x0. Even better approximations may be
obtained by repetition (iteration) of the process, which then becomes

x_{n+1} = x_n − f(x_n)/f'(x_n).



Note that if f is a polynomial, you can use the recursive procedure to compute
f ( x n ) and f ' ( x n ) .

Fig 2.7: Newton-Raphson iteration: the tangent at (x_n, f(x_n)) cuts the x-axis at x_{n+1}.
The geometrical interpretation is that each iteration provides the point at which the
tangent at the original point cuts the x-axis (Fig 2.7). Thus the equation of the tangent at
(x0, f(x0)) is
y − f ( x 0 ) = f ' ( x0 )( x − x 0 )
so that (x1, 0) corresponds to
− f ( x0 ) = f ' ( x0 )( x1 − x0 )
whence
x1 = x0 − f(x0)/f'(x0).
Algorithm for Newton-Raphson method
1. Choose x0, the tolerance ε, and the maximum number of iterations M.
2. Set i = 0.
3. If |f(x0)| < ε, x0 is the estimated solution; otherwise go to step 4.
4. Compute the next approximation a from
a = x0 − f(x0)/f'(x0),
increase i by one, reset x0 to a and go to step 5.
5. If i ≤ M go to step 3, otherwise go to step 6.
6. Impossible to attain the required accuracy within M iterations.
Example 10 Find the root of the equation e − x = x using the Newton-Raphson method
correct to three decimal places given the initial guess x0 = 1.



Solution: Let f(x) = e^{−x} − x; then f'(x) = −e^{−x} − 1. Newton's iteration formula
gives
x_{n+1} = x_n − f(x_n)/f'(x_n) = x_n − (e^{−x_n} − x_n)/(−e^{−x_n} − 1) = (x_n + 1)e^{−x_n}/(e^{−x_n} + 1).
Putting n = 0, the first approximation x1 is given by
x1 = (x0 + 1)e^{−x0}/(e^{−x0} + 1) = 2e^{−1}/(e^{−1} + 1) = 0.5379.
Since f(0.5379) = 0.0461 > 0.0005, the required accuracy is not yet attained, so put n = 1; the second approximation is
x2 = (x1 + 1)e^{−x1}/(e^{−x1} + 1) = 1.5379 e^{−0.5379}/(e^{−0.5379} + 1) = 0.5670, and f(0.5670) = 0.0002 < 0.0005.
Hence the desired root is 0.567, rounded to three decimal places.
In the use of Newton’s method, consideration must be given to the proper choice
of a starting point. Usually one must have some insight into the shape of the graph of the
function. Sometimes a coarse graph is adequate, but in other cases a step-by-step
evaluation of the function at various points may be necessary to find a point near the root.
Often the bisection method or the false-position method is used initially to obtain a
suitable starting point, and Newton's method is used to improve the precision, as you may
see in the next example.
Example 11
We will use the Newton-Raphson method to find the positive root of the equation
sin x = x², correct to 3 decimal places.
Solution: It will be convenient to use the method of false position to obtain an initial
approximation. Tabulation yields

x f ( x ) = sin x − x 2
0 0
0.25 0.1849
0.5 0.2294
0.75 0.1191
1 –0.1585
With numbers displayed to 4 decimal places, we see that there is a root in the interval
0.75 < x < 1, located by false position at approximately
x0 = [0.75·f(1) − 1·f(0.75)]/[f(1) − f(0.75)]
   = [0.75×(−0.1585) − 0.1191]/(−0.1585 − 0.1191)
   = (−0.1189 − 0.1191)/(−0.2776)
   = 0.8573.
Next, we will use the Newton-Raphson method:
f(0.8573) = sin(0.8573) − (0.8573)²
          = 0.7561 − 0.7349 = 0.0211
and
f ' ( x ) = cos x − 2 x



yielding
f ' (0.8573) = 0.6545 − 1.7145 = −1.0600
Consequently, a better approximation is
x1 = 0.8573 + 0.0211/1.0600 = 0.8573 + 0.0200 = 0.8772.
Repeating this step, we obtain
f ( x1 ) = f (0.8772) = −0.0005
and
f ' ( x1 ) = f ' (0.8772) = −1.1151
so that
x2 = 0.8772 − 0.0005/1.1151 = 0.8772 − 0.0005 = 0.8767.
Since f(x2) = 0.0000, we conclude that the root is 0.877 to 3D.

3. Convergence
If we write
φ(x) = x − f(x)/f'(x),
the Newton-Raphson iteration expression
x_{n+1} = x_n − f(x_n)/f'(x_n)
may be rewritten
x n +1 = φ ( x n )
We have observed (section 2.6) that, in general, the iteration method converges when
|φ'(x)| < 1 near the root. In the case of Newton-Raphson, we have
φ'(x) = 1 − ([f'(x)]² − f(x)f''(x))/[f'(x)]² = f(x)f''(x)/[f'(x)]²,
so that the criterion for convergence is
|f(x)f''(x)| < [f'(x)]²,
i.e., convergence is not as assured as, say, for the bisection method.

4. Rate of convergence

The second-degree terminated Taylor expansion about x_n is
0 = f(α) = f(x_n + e_n) = f(x_n) + e_n f'(x_n) + (e_n²/2) f''(ξ_n),
where e_n = α − x_n is the error at the n-th iteration and ξ_n lies between x_n and α.

Since f(α) = 0, we find
f(x_n)/f'(x_n) = −e_n − (e_n²/2) f''(ξ_n)/f'(x_n).


But, by the Newton-Raphson formula,
x_{n+1} = x_n − f(x_n)/f'(x_n),
whence the error at the (n + 1)-th iteration is
e_{n+1} = α − x_{n+1} = e_n + f(x_n)/f'(x_n) = −(e_n²/2) f''(ξ_n)/f'(x_n) ≈ −[f''(α)/(2f'(α))] e_n²,
provided e_n is sufficiently small.

This result states that the error at the (n + 1)-th iteration is proportional to the square of
the error at the nth iteration; hence, if an answer is correct to one decimal place at one
iteration it should be accurate to two places at the next iteration, four at the next, eight at
the next, etc. This quadratic (second-order) convergence outstrips the rate of
convergence of the methods of bisection and false position, which, as we already know, is
linear.
In relatively little used computer programs, it may be wise to prefer the methods of
bisection or false position, since convergence is virtually assured. However, for hand
calculations or for computer routines in constant use, the Newton-Raphson method is
usually preferred.
5. The square root

One application of the Newton-Raphson method is the computation of square roots.
Computing a^{1/2} is equivalent to finding the positive root of x² = a, i.e., of

f(x) = x² − a = 0.

Since f'(x) = 2x, the Newton-Raphson iteration formula is

x_{n+1} = x_n − (x_n² − a)/(2x_n) = ½(x_n + a/x_n),

a formula known to the ancient Greeks. Thus, if a = 16 and x0 = 5, we find

x1 = (5 + 3.2)/2 = 4.1,  x2 = (4.1 + 3.9024)/2 = 4.0012,  x3 = (4.0012 + 3.9988)/2 = 4.0000.



EXERCISES

1. Determine a root of the following equations correct to 3 decimal places and the
number of iterations to reach the required accuracy using
a)bisection method b)false position method c) secant method d) fixed point
iteration e) Newton-Raphson method
i. x 3 − 4 x + 1 = 0
ii. x + cos x = 0
iii. 2 x − log x = 6
iv. x 6 − x 4 − x 3 − 1 = 0
v. x − e^{−x} = 0
2. Find where the graphs of y = 3x and y = e^x intersect, correct to four decimal
digits.
3. Demonstrate graphically that the equation 50π + sin x = 100 arctan x has infinitely
many roots.
4. If a = 0.1 and b = 1.0, how many steps of the bisection method are needed to
determine the root with an error of at most 0.5 × 10^{−8}?
5. Determine the two smallest roots of the equation f ( x) = x sin x + cos x = 0 correct
to three decimal places using:
a) bisection method b) false position method c) secant method d) fixed point
iteration e) Newton-Raphson method
6. Find the root of the equation 2 x = cos x + 3 correct to three decimal places using
(i) Iteration method (ii) Aitken’s Δ2 method.
7. Determine the real root of ln x 2 = 0.8 .
i) Graphically
ii) Using three iterations of the bisection method, with initial guesses of a=0.5
and b=2.
iii) Using three iterations of false-position method, with the same initial guesses
as in (ii).
8. Find the root or roots of ln[(1 + x) /(1 − x 2 )] = 0
9. Denote the successive intervals that arise in the bisection method by [a0 , b0 ] ,
[a1 , b1 ] , [a 2 , b2 ] , and so on.
a. Show that a 0 ≤ a1 ≤ a 2 ≤ ... and that b0 ≥ b1 ≥ b2 ≥ ...
b. Show that bn − a n = 2 − n (b0 − a 0 ).
c. Show that, for all n, a n bn − a n − bn −1 = a n −1bn − a n bn −1.
10. Verify that when Newton's method is used to compute √R (by solving the
equation x² = R), the sequence of iterates is defined by

x_{n+1} = (1/2)(x_n + R/x_n).

11. Show that if the sequence {x_n} is defined as in problem 10, then

x_{n+1}² − R = [(x_n² − R)/(2x_n)]².

12. Two of the four zeros of f ( x) = x 4 + 2 x 3 − 7 x 2 + 3 are positive. Find them by


Newton’s method, correct to two significant figures.
13. Using a calculator, observe the sluggishness with which Newton's method
converges in the case of f(x) = (x − 1)^m with m = 8 or 12. Reconcile this with the
theory. Use x0 = 1.1.
14. The reciprocal of a number R can be computed without division by the iterative
formula
        xn+1 = xn(2 − xn R)
Establish this relation by applying Newton’s method to some f(x). Beginning with
x0 = 0.2, compute the reciprocal of 4 correct to six decimal digits or more by this
rule. Tabulate the error at each step and observe the quadratic convergence.
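A quick numerical experiment confirms the quadratic convergence claimed in Problem 14; this Python sketch (an illustration of ours, with R = 4 and x0 = 0.2) tabulates the error at each step:

```python
# Division-free reciprocal iteration x_{n+1} = x_n (2 - x_n R), here for R = 4.
R, x = 4.0, 0.2
for n in range(6):
    err = abs(x - 1.0 / R)    # error (division is used only for the check)
    print(n, x, err)          # the error is roughly squared at each step
    x = x * (2.0 - x * R)
```

The error sequence is about 5·10⁻², 10⁻², 4·10⁻⁴, 6·10⁻⁷, …, the doubling of correct digits typical of quadratic convergence.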

15. The iteration formula xn+1 = xn − (cos xn)(sin xn) + R cos² xn, where R is a
positive constant, was obtained by applying Newton’s method to some function
f(x). What was f(x)?
16. Derive the Newton-Raphson iteration formula xn+1 = xn − (xn^k − a)/(k·xn^(k−1)) that can be
used to find the kth root of a.
17. Consider the following procedures:
        a. xn+1 = (1/3)(2xn − R/xn²)
        b. xn+1 = (1/2)xn + 1/xn
Do they converge for x0 ≠ 0? If so, to what values?
18. If we use the secant method on f ( x) = x 3 − 2 x + 2 starting with x0 = 0 and x1 = 1,
what is x2?
19. Write a simple program to compare the secant method with Newton’s method for
finding a root of each function.
a. x 3 − 3 x + 1 with starting point x0 = 2
b. x 3 − 2 sin x with starting point x0 = 1/2
Use the x1 value from Newton’s method as the second starting point for the secant
method. Print out each iteration for both methods
20. What is the appropriate formula for finding square roots using the secant method?
21. Draw a flow chart and write a pseudo code in C++ for the:
a. False-position method b. secant method c. Fixed point iteration
d. Newton’s method



3 Systems of Linear Equations
Introduction
Many physical systems can be modeled by a set of linear equations, which describe
relationships between system variables. In simple cases, there are two or three variables;
in complex systems (for example, in a linear model of the economy of a country) there
may be several hundred variables. Linear systems also arise in connection with many
problems of numerical analysis. Examples of these are the solution of partial differential
equations by finite difference methods, statistical regression analysis, and the solution of
eigenvalue problems.
Hence there arises a need for rapid and accurate methods for solving systems of
linear equations. The two commonly used classes of methods for solving systems of
equations are the direct methods and the indirect (or iterative) methods. A direct method
is based on the elimination of variables to transform the set of equations to a triangular
form; it is completed in a finite number of steps, yielding the exact solution (in the
absence of round-off errors), and thus the amount of computation involved can be
specified in advance. The method is independent of the accuracy desired. An indirect
method always begins with an approximate solution and obtains an improved solution
with each step of the iteration, but it would require an infinite number of steps to obtain
an exact solution in the absence of round-off errors. The accuracy of the solution, unlike
with the direct methods, depends on the number of iterations performed.
        Usually iterative methods are used for sparse 1 matrices whereas for dense 2
matrices we use direct methods.
Notation and definitions
We first consider an example in three variables:
x+ y−z=2
x + 2y + z = 6
2x − y + z = 1
a set of three linear equations in the three variables or unknowns x, y, z. During solution
of such a system, we determine a set of values for x, y and z which satisfies each of the
equations. In other words, if values (X, Y, Z) satisfy all equations simultaneously, then
they constitute a solution of the system.
Consider now the general system of n equations in n variables:
a11 x1 + a12 x 2 + ... + a1n x n = b1
a 21 x1 + a 22 x 2 + ... + a 2 n x n = b2
M M M M
a n1 x1 + a n 2 x 2 + ... + a nn x n = bn
Obviously, the dots indicate similar terms in the variables and the remaining (n - 3)
equations.

1
Sparse matrices have few non-zero elements. Such matrices arise in partial differential
equations.
2
Dense matrices have few zero elements. Such matrices occur in science and engineering problems.



In this notation, the variables are denoted by x1, x2, …, xn; they are sometimes referred to
as xi, i = 1, 2, …, n. The coefficients of the variables may be detached and written in a
coefficient matrix:
⎡ a11 a12 L a1n ⎤
⎢a a 22 L a 2 n ⎥
A= ⎢ 21 ⎥
⎢ M M O M ⎥
⎢ ⎥
⎣a n1 a n 2 L a nn ⎦
The notation aij will be used to denote the coefficient of xj in the i-th equation. Note that
aij occurs in the i-th row and j-th column of the matrix.
The numbers on the right-hand side of the equations are called constants; they may be
written in a column vector:
⎡ b1 ⎤
⎢b ⎥
b= ⎢ 2⎥
⎢M⎥
⎢ ⎥
⎣bn ⎦
The coefficient matrix may be combined with the constant vector to form the augmented
matrix:
⎡ a11 a12 L a1n b1 ⎤
⎢a a 22 L a 2 n b2 ⎥
⎢ 21 ⎥
⎢ M M O M M⎥
⎢ ⎥
⎣a n1 a n 2 L a nn bn ⎦
As a rule, one works in the elimination method directly with the augmented matrix.
The existence of solutions
For any particular system of n linear equations, there may be a single solution (X1, X2, . . .
Xn), or no solution, or infinitely many solutions. In the theory of linear algebra,
theorems are given and conditions stated which allow one to decide the
category into which a given system falls. We shall not treat the question of existence of
solutions in this book, but for the benefit of students familiar with matrices and
determinants, we state the theorem:
Theorem: A linear system of n equations in n variables with coefficient matrix A and
non-zero constants vector b has a unique solution, if and only if the determinant of A is
not zero.
If b = 0, the system has the trivial solution x = 0. It has no other solution unless the
determinant of A is zero, in which case it has an infinite number of solutions.
If the determinant of A is non-zero, there exists an n×n matrix, called the inverse of A
(denoted by A-1) such that the matrix product of A-1 and A is equal to the n × n-identity
or unit matrix I. The elements of the identity matrix are 1 on the main diagonal and 0
elsewhere. Its algebraic properties include Ix = x for any n×1 vector x, and IM = MI = M
for any n×n matrix M. For example, the 3×3 identity matrix is



⎡1 0 0⎤
I = ⎢0 1 0⎥
⎢ ⎥
⎢⎣0 0 1⎥⎦
Multiplication of the equation Ax = b from the left by the inverse matrix A-1 yields A-1Ax
= A-1b, whence the unique solution is x = A-1b (since A-1A = I and Ix = x). Thus, in
principle, a linear system with a unique solution may be solved by first evaluating A-1 and
then A-1b. This approach is discussed in more detail in section 3.1.5. The Gauss
elimination method, which we consider next, is a more general and efficient direct
procedure for solving systems of linear equations.

3.1 Direct Methods


3.1.1 Upper-triangular Linear Systems
We will now develop the back-substitution algorithm, which is useful for solving a
linear system of equations that has an upper-triangular coefficient matrix. This will be
incorporated in the algorithm for solving a general linear system in Section 3.2.
Definition 3.1 An n × n matrix A = (aij ) is called upper triangular provided that the
elements satisfy aij = 0 whenever i > j. The n × n matrix A = (aij ) is called lower
triangular provided that aij = 0 whenever i < j.
We will develop a method for constructing the solution to upper-triangular linear
system of equations and leave the investigation of lower-triangular systems to the reader.
If A is an upper-triangular matrix, then AX=B is said to be an upper-triangular system
of linear equations and has the form
a11 x1 + a12 x 2 + a13 x 3 + L + a1,n −1 x n −1 + a1,n x n = b1
a 22 x 2 + a 23 x 3 + L + a 2,n −1 x n −1 + a 2n x n = b2
a 33 x 3 + L + a 3,n −1 x n −1 + a 3n x n = b3
(1)
M M
+ a n −1,n −1 x n −1 + a n −1,n x n = bn −1
a nn x n = bn .
To solve this system of equations, where a kk ≠ 0 for k = 1,2,..., n, we start from the last
equation: since it involves only xn, solving it first gives:
b
xn = n .
a nn
Now xn is known and it can be used in the next-to-last equation:
bn −1 − a n −1,n x n
x n −1 = .
a n −1,n −1
Now xn and xn – 1 are used to find xn – 2
bn − 2 − a n − 2,n −1 x n −1 − a n − 2,n x n
x n −2 = .
a n − 2,n − 2
Once the values xn , xn – 1, …, xk+1 are known, the general step is



        xk = ( bk − Σ_{j=k+1}^{n} akj xj ) / akk    for k = n − 1, n − 2, …, 1.        (2)
The uniqueness of the solution is easy to see. The nth equation implies that bn/ann is the
only possible value of xn. Then finite induction is used to establish that xn-1, xn – 2, …, x1
are unique.
Example 1: Use back substitution to solve the linear system
4 x1 − x 2 + 2 x 3 + 3x 4 = 20
        − 2x2 + 7x3 − 4x4 = −7
6 x 3 + 5x 4 = 4
3x 4 = 6
Solution: Solving for x4 in the last equation yields x4 = 6/3 = 2.
Using x4 = 2 in the third equation, we obtain
        x3 = (4 − 5(2)) / 6 = −1.
Now x3= – 1 and x4=2 are used to find x2 in the second equation:
− 7 − 7( −1) + 4( 2)
x2 = = −4 .
−2
Finally, x1 is obtained using the first equation:
20 + 1( −4) − 2( −1) − 3( 2)
x1 = = 3.
4
The condition that a kk ≠ 0 is essential because equation (2) involves division by
a kk . If this requirement is not fulfilled, either no solution exists or infinitely many
solutions exist.
Example 2: Show that there is no solution to the linear system
4 x1 − x 2 + 2 x 3 + 3x 4 = 20
        0x2 + 7x3 − 4x4 = −7
(3)
6 x 3 + 5x 4 = 4
3x 4 = 6
Solution: Using the last equation in (3) we must have x 4 = 2, which is substituted into
the second and third equations to obtain
7 x 3 − 8 = −7
(4)
6 x 3 + 10 = 4
The first equation in (4) implies that x 3 = 1 / 7 , and the second equation implies that
x 3 = −1 . This contradiction leads to the conclusion that there is no solution to the linear
system (3).
The following flow chart will solve the upper-triangular system by the method of
back substitution, provided a kk ≠ 0 for k = 1, 2, …, n.



Start
    read aij for i = 1 to n, j = i to n + 1
    xn = an,n+1 / ann
    for k = n − 1 down to 1
        Sum = 0
        for j = k + 1 to n
            Sum = Sum + akj · xj
        xk = (ak,n+1 − Sum) / akk
    for i = 1 to n
        display xi
Stop

Flow chart for back substitution (pseudo-code form)
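The same algorithm can be sketched in Python; the function below is our own illustration of equation (2) and reproduces the result of Example 1:

```python
# Back substitution for an upper-triangular system Ux = b, as in equation (2).
def back_substitute(U, b):
    n = len(b)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):                      # k = n-1 down to 0
        s = sum(U[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (b[k] - s) / U[k][k]                     # requires U[k][k] != 0
    return x

# The upper-triangular system of Example 1:
U = [[4, -1, 2, 3], [0, -2, 7, -4], [0, 0, 6, 5], [0, 0, 0, 3]]
b = [20, -7, 4, 6]
print(back_substitute(U, b))   # -> [3.0, -4.0, -1.0, 2.0]
```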



Exercises
In problems 1 through 3, solve the upper-triangular system
1. 3x1 − 2 x 2 + x 3 − x 4 =8 2. 5 x1 − 3x 2 − 7 x 3 + x 4 = −14
4 x 2 − x 3 + 2 x 4 = −3 11x 2 + 9 x 3 + 5 x 4 = 22
2 x 3 + 3x 4 = 11 3x 3 − 13x 4 = −11
5 x 4 = 15 7x4 = 14
3. 4 x1 − x 2 + 2 x 3 + 2 x 4 − x 5 = 4
− 2 x 2 + 6x3 + 2 x4 + 7 x5 = 0
x3 − x4 − 2 x5 = 3
− 2 x 4 − x 5 = 10
3x 5 = 6
In problems 4 and 5, solve the lower-triangular system.
4. 2 x1 =6 5. 5 x1 =6
− x1 + 4 x 2 =5 x1 + 3 x 2 =5
3 x1 − 2 x 2 − x 3 =4 3 x1 + 4 x 2 + 2 x 3 =4
x1 − 2 x 2 + 6 x 3 + 3 x 4 = 2 − x1 + 3 x 2 − 6 x 3 − x 4 = 2
6. Show that back substitution requires n divisions, (n² – n)/2 multiplications, and
(n² – n)/2 additions or subtractions. Hint: You can use the formula
        Σ_{k=1}^{m} k = m(m + 1)/2
7. Write a forward-substitution algorithm for solving a system of equations with a lower
triangular coefficient matrix and draw a flow chart for the algorithm.

3.1.2 Gauss elimination method


In this section we develop a scheme for solving a general n × n system of equations
Ax = B. The aim is to construct an equivalent upper-triangular system Ux = Y that
can be solved by using back-substitution.
During transformation of a system to upper triangular form, one or more of the
following elementary operations are used at every step:

1. Multiplication of an equation by a constant;


2. Subtraction from one equation some multiple of another equation;
3. Interchange of two equations.

Mathematically speaking, it should be clear to the student that performing elementary
operations on a system of linear equations leads to equivalent systems with the same
solutions. This statement requires a proof, which may be found as a theorem in books on
linear algebra. It forms the basis of all elimination methods for solving systems of linear
equations.
Example: Find the parabola y = a + bx + cx 2 that passes through the three points (1,1),
(2,– 1), and (3, 1).



Solution: First we substitute each point into the equation of the parabola to obtain a
linear equation in a, b, and c. The result is the linear system
        a + b + c = 1        at (1, 1)
        a + 2b + 4c = −1     at (2, −1)
        a + 3b + 9c = 1      at (3, 1)
Step 1: The variable a is eliminated form the second and third equations by subtracting
the first equation from them.
a + b + c = 1
b + 3c = −2 ( R2 − R1 )
2b + 8c = 0 ( R3 − R1 )
Step 2: The variable b is eliminated from the third equation in the last system by
subtracting from it two times the second equation. We arrive at the equivalent upper-
triangular system:
        a + b + c = 1
            b + 3c = −2
                2c = 4      (R3 − 2R2)
The back substitution algorithm is now used to find the coefficients c=4/2=2,
b = – 2 – 3(2) = – 8, and a=1 – (– 8) – 2 = 7, and the equation of the parabola is
y = 7 − 8x + 2 x 2 .
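As a quick check, the computed parabola can be verified against the three data points:

```python
# Verify that the parabola y = 7 - 8x + 2x^2 found above passes through
# the three given points (1, 1), (2, -1), and (3, 1).
def p(x):
    return 7 - 8 * x + 2 * x ** 2

for x, y in [(1, 1), (2, -1), (3, 1)]:
    assert p(x) == y
print("parabola interpolates all three points")
```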
We can also solve the system Ax = B by performing elementary row operations
on the augmented matrix [A|B]. In this case the number a kk in the coefficient matrix A
that is used to eliminate a ik , where i = k + 1, k + 2, …, n, is called the kth pivotal
element, and the kth row is called the pivotal row.
General treatment of the elimination process
We will now describe the application of the elimination process to a general n× n linear
system, written in general notation, which is suitable for implementation on a computer
(Pseudo-code). Before considering the general system, the process will be described by
means of a system of three equations. We begin with the augmented matrix, and
display in the column (headed by m) the multipliers required for the transformations.

Step 1: Eliminate the coefficients a21 and a31, using row R1:

Step 2: Eliminate the coefficient a32 using row R2:



The matrix is now in a form which permits back-substitution. The full system of
equations at this stage, equivalent to the original system, is

Hence follows the solution by back-substitution:

We now display the process for the general n × n system, omitting the primes (') for
convenience. Recall that the original augmented matrix was:
⎡ a11 a12 L a1n a1,n +1 ⎤
⎢ ⎥
⎢a 21 a 22 L a 2 n a 2,n +1 ⎥
⎢ M M O M M ⎥
⎢ ⎥
⎣a n1 a n 2 L a nn a n ,n +1 ⎦
Step 1: If necessary, switch rows so that a11 ≠ 0 ; then eliminate the coefficients a21, a31,.
. . , an1 by calculating the multipliers
for i = 2 to n
m i1 = a i1 / a11 ;
a i1 = 0
for j = 2 to n + 1
a ij = a ij − m i1 ∗ a1 j ;

This leads to the modified augmented system


⎡a11 a12 a13 L a1n a1,n +1 ⎤
⎢ ⎥
⎢ 0 a 22 a 23 L a 2 n a 2,n +1 ⎥
⎢ 0 a 32 a 33 L a 3n a 3,n +1 ⎥
⎢ ⎥
⎢ M M M O M M ⎥
⎢ 0 a n2 a n3 L a nn a n ,n +1 ⎥⎦

Step 2: If necessary, switch rows so that a 22 ≠ 0 ; then eliminate the coefficients a32, a42,
… , an2 by calculating the multipliers and performing row operation applying the
algorithm
for i = 3 to n
m i 2 = a i 2 / a 22 ;
ai2 = 0
for j = 3 to n + 1
a ij = a ij − mi 2 ∗ a 2 j ;

whence



⎡a11 a12 a13 L a1n a1,n +1 ⎤
⎢ ⎥
⎢0 a 22 a 23 L a 2 n a 2,n +1 ⎥
⎢0 0 a 33 L a 3n a 3,n +1 ⎥
⎢ ⎥
⎢ M M M O M M ⎥
⎢0 0 a n 3 L a nn a n ,n +1 ⎥⎦

We continue to eliminate unknowns, going on to columns 3, 4, . . so that by the beginning
of the k-th stage we have the augmented matrix
⎡ a11  a12   …    …    …   a1n   a1,n+1 ⎤
⎢  0   a22   …    …    …   a2n   a2,n+1 ⎥
⎢  ⋮    ⋮    ⋱              ⋮      ⋮    ⎥
⎢  0    0    …   akk   …   akn   ak,n+1 ⎥
⎢  ⋮    ⋮         ⋮    ⋱    ⋮      ⋮    ⎥
⎣  0    0    …   ank   …   ann   an,n+1 ⎦
Step k: If necessary, switch rows so that a kk ≠ 0 ; then eliminate ak+1,k, ak+2,k, …, an,k by
calculating the multipliers and performing elementary row operations according to the algorithm
for i = k + 1 to n
m ik = a ik / a kk ;
a ik = 0
for j = k + 1 to n + 1
a ij = a ij − m ik ∗ a kj ;
At the end of the k-th stage, we obtain the augmented system
⎡ a11  a12   …     …        …    a1n     a1,n+1   ⎤
⎢  0   a22   …     …        …    a2n     a2,n+1   ⎥
⎢  ⋮    ⋮    ⋱                    ⋮        ⋮      ⎥
⎢  0    0    …    akk   ak,k+1 …  akn     ak,n+1  ⎥
⎢  ⋮    ⋮          0  ak+1,k+1 … ak+1,n  ak+1,n+1 ⎥
⎣  0    0    …     0    an,k+1 …  ann     an,n+1  ⎦

Continuing in this way, we obtain after n -1 stages the augmented matrix


⎡a11 a12 a13 L a1n a1,n +1 ⎤
⎢ ⎥
⎢ 0 a 22 a 23 L a 2 n a 2,n +1 ⎥
⎢0 0 a 33 L a 3n a 3,n +1 ⎥
⎢ ⎥
⎢ M M M O M M ⎥
⎢0 0 0 L a nn a n ,n +1 ⎥⎦

Note that the original coefficient matrix has been transformed into the upper triangular
form. So we now solve the last system by back substitution.
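The n − 1 elimination stages followed by back substitution can be sketched in Python as follows. This is a naive illustration of ours that assumes every pivot akk is nonzero (row interchanges are discussed next under pivoting):

```python
# Naive Gaussian elimination (no pivoting) followed by back substitution,
# following the k-stage algorithm above.
def gauss_solve(A, b):
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A|b]
    for k in range(n - 1):                         # elimination stage k
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]                  # multiplier m_ik; pivot assumed nonzero
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

# The 3x3 system from the introduction: x+y-z=2, x+2y+z=6, 2x-y+z=1
print(gauss_solve([[1, 1, -1], [1, 2, 1], [2, -1, 1]], [2, 6, 1]))  # -> [1.0, 2.0, 1.0]
```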
Pivoting to avoid akk = 0
If akk = 0, row k cannot be used to eliminate the elements in column k below the main
diagonal. It is necessary to find a row r, where ark ≠ 0 and r > k, and then interchange row k
and row r so that a nonzero pivot element is obtained. This process is called pivoting, and
the criterion for deciding which row to choose is called a pivoting strategy. The trivial
pivoting strategy is as follows. If akk ≠ 0, do not switch rows. If akk = 0, locate the first
row below row k in which ark ≠ 0 and switch rows k and r. This will result in a new
element akk ≠ 0, which is a nonzero pivot element.
Round-off errors and numbers of operations
Numerical methods for solving systems of linear equations involve large numbers of
arithmetic operations. For example, the Gauss elimination of section 3.1.2, according to
Atkinson (1993), involves (n³ + 3n² − n)/3 multiplications/divisions and (2n³ + 3n² − 5n)/6
additions/subtractions in the case of a system with n unknowns.
Since round-off errors are propagated at each step of an algorithm, the growth of round-
off errors can be such that, when n is large, a solution differs greatly from the true one.

3.1.3 Gaussian Elimination With Partial Pivoting


In Gauss elimination, the buildup of round-off errors may be reduced by rearranging the
equations so that the use of large multipliers in the elimination operations is avoided. The
corresponding procedure is known as partial pivoting (or pivotal condensation). The
general rule to follow involves: At each elimination stage, rearrange the rows of the
augmented matrix so that the new pivot element is larger in absolute value than (or equal
to) any element beneath it in its column. i.e
1. In the kth column, choose the rth row where
        |ark| = max{ |akk|, |ak+1,k|, …, |an−1,k|, |ank| },
2. and interchange the kth row with the rth row.
Now, each of the multipliers will be less than or equal to 1 in absolute value. The
following example illustrates how the use of the trivial pivoting strategy in Gaussian
elimination can lead to significant error in the solution of a linear system of equations.
Example The values x1 = x 2 = 1.000 are the solutions to the system of equations
1 . 133 x 1 + 5 . 281 x 2 = 6 . 414
24 . 14 x 1 − 1 . 210 x 2 = 22 . 93
Use four-digit arithmetic and Gaussian elimination with trivial pivoting to find a
computed approximate solution to the system.
Solution: The multiplier is m21 = 24.14 / 1.133 = 21.31; thus, subtracting the m21 multiple of
row 1 from row 2 using four-digit calculations, i.e.
        1.133x1 + 5.281x2 = 6.414
        24.14x1 − 1.210x2 = 22.93      (R2 − 21.31R1)
we obtain the computed upper-triangular system
        1.133x1 + 5.281x2 = 6.414
                − 113.7x2 = −113.8
Back substitution is used to compute x 2 = −113.8 /( −113.7) = 1.001 and x1 = 0.9956.
The error in the solution of this linear system is due to the magnitude of the
multiplier m21 = 21.31. In the next example the magnitude of m21 is
reduced by using partial pivoting.



Example Use four-digit arithmetic and Gaussian elimination with partial pivoting to
solve the linear system
24.14 x1 − 1.210 x 2 = 22.93
1.133x1 + 5.281x 2 = 6.414
Solution: This time m21 = 1.133 / 24.14 = 0.04693; subtracting the m21 multiple of
row 1 from row 2 using four-digit calculations, i.e.
        24.14x1 − 1.210x2 = 22.93
        1.133x1 + 5.281x2 = 6.414      (R2 − m21R1)
we obtain the computed upper-triangular system
        24.14x1 − 1.210x2 = 22.93
                  5.338x2 = 5.338
Back substitution is used to compute x 2 = 5.338 / 5.338 = 1.000, and x1 = 1.000.
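The two computations above can be reproduced by simulating four-significant-digit arithmetic; the helpers `fl` and `solve_2x2` below are our own illustration, not part of the text:

```python
def fl(x):
    """Round x to four significant digits (simulated four-digit arithmetic)."""
    return float(f"{x:.4g}")

def solve_2x2(a11, a12, b1, a21, a22, b2, pivot):
    # Partial pivoting: put the coefficient of largest magnitude on top.
    if pivot and abs(a21) > abs(a11):
        (a11, a12, b1), (a21, a22, b2) = (a21, a22, b2), (a11, a12, b1)
    m = fl(a21 / a11)                       # multiplier m21, rounded
    a22 = fl(a22 - fl(m * a12))             # eliminate x1 from row 2
    b2 = fl(b2 - fl(m * b1))
    x2 = fl(b2 / a22)                       # back substitution
    x1 = fl(fl(b1 - fl(a12 * x2)) / a11)
    return x1, x2

print(solve_2x2(1.133, 5.281, 6.414, 24.14, -1.210, 22.93, pivot=False))  # (0.9956, 1.001)
print(solve_2x2(1.133, 5.281, 6.414, 24.14, -1.210, 22.93, pivot=True))   # (1.0, 1.0)
```

The first call reproduces the inaccurate solution obtained with the trivial strategy; the second reproduces the exact answer obtained with partial pivoting.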
Ill-conditioning
Certain systems of linear equations are such that their solutions are very sensitive to small
changes (and therefore to errors) in their coefficients and constants. We give an example
below in which 1 % changes in two coefficients change the solution by a factor of 10 or
more. Such systems are said to be ill-conditioned. If a system is ill-conditioned, a
solution obtained by a numerical method may differ greatly from the exact solution, even
though great care is taken to keep round-off and other errors very small.
As an example, consider the system of equations:

which has the exact solution x =1, y = 2. Changing coefficients of the second equation by
1% and the constant of the first equation by 5% yields the system:

It is easily verified that the exact solution of this system is x = 11, y = -18.2. This solution
differs greatly from the solution of the first system. Both these systems are said to be ill-
conditioned.
If a system is ill-conditioned, then the usual procedure of checking a numerical solution
by calculation of the residuals may not be valid. In order to see why this is so, suppose
we have an approximation X to the true solution x. The vector of residuals r is then
given by r = b - AX = A(x - X). Thus e = x - X satisfies the linear system Ae = r. In
general, r will be a vector with small components. However, in an ill-conditioned
system, even if the components of r are small so that it is `close' to 0, the solution of the
linear system Ae = r could differ greatly from the solution of the system Ae = 0, namely
0. It then follows that X may be a poor approximation to x despite the residuals in r being
small.
Obtaining accurate solutions to ill-conditioned linear systems can be difficult, and many
tests have been proposed for determining whether or not a system is ill-conditioned.



Exercise
In Problems 1 through 4 solve the system Ax = B
1. 2 x1 + 4 x 2 − 6 x 3 = −4 2. 2 x1 − 2 x 2 + 5 x 3 = 6
x1 + 5 x 2 + 3x 3 = 10 2 x1 + 3 x 2 + x 3 = 13
x1 + 3 x 2 + 2 x 3 = 5 − x1 + 4 x 2 − 4 x 3 = 3
3. 2 x1 + 4 x 2 − 4 x 3 = 12 4. x1 + 2 x 2 − x 4 = 9
x1 + 5 x 2 − 5 x 3 − 3x 4 = 18 2 x1 + 3 x 2 − x 3 = 9
2 x1 + 3 x 2 + x 3 + 3 x 4 = 8 3x 2 + x 3 + 3x 4 = 26
x1 + 4 x 2 − 2 x 3 + 2 x 4 = 8 5 x1 + 5 x 2 + 2 x 3 − 4 x 4 = 32
5. Find the parabola y = a + bx + cx 2 that passes through (1, 4), (2,7), and (3, 14).
6. Find the parabola y = a + bx + cx 2 that passes through (1, 6), (2,5), and (3, 2).
7. Find the solution to the following linear system
x1 + 2 x 2 =7
2 x1 + 3 x 2 − x 3 =9
4 x 2 + 2 x 3 + 3x 4 = 10
2 x 3 − 4 x 4 = 12
8. Find the solution to the following linear system
x1 + x 2 =5
2 x1 − x 2 + 5 x 3 = −9
3x 2 − 4 x 3 + 2 x 4 = 19
2 x3 + 6x4 = 2
9. Solve the following linear systems using Gaussian elimination with partial pivoting.
(a) 2x1 − 3x2 + 100x3 = 1
    x1 + 10x2 − 0.001x3 = 0
    3x1 − 100x2 + 0.01x3 = 0
(b) x1 + 20x2 − x3 + 0.001x4 = 0
    2x1 − 5x2 + 30x3 − 0.1x4 = 1
    5x1 + x2 − 100x3 − 10x4 = 0
    2x1 − 100x2 − x3 + x4 = 0
10. The Hilbert matrix is a classical ill conditioned matrix and small changes in its
coefficients will produce a large change in the solution to the perturbed system.
(a) Find the exact solution of Ax=B (leave all numbers as fractions and do exact
arithmetic) using the Hilbert matrix of dimension 4 × 4:
        ⎡ 1    1/2  1/3  1/4 ⎤        ⎡1⎤
        ⎢ 1/2  1/3  1/4  1/5 ⎥        ⎢0⎥
    A = ⎢ 1/3  1/4  1/5  1/6 ⎥    B = ⎢0⎥
        ⎣ 1/4  1/5  1/6  1/7 ⎦        ⎣0⎦
(b) Now solve Ax = B using four-digit rounding arithmetic.



11. Construct a program for a Gaussian elimination method without partial pivoting.
12. Construct a program for a Gaussian elimination method with partial pivoting.
13. Many applications involve matrices with many zeros. Of practical importance are
tridiagonal systems(see problem 7 and 8) of the form
        d1x1 + c1x2                    = b1
        a1x1 + d2x2 + c2x3             = b2
               a2x2 + d3x3 + c3x4      = b3
                .       .       .        .
                .       .       .        .
        an−2xn−2 + dn−1xn−1 + cn−1xn   = bn−1
                   an−1xn−1 + dnxn     = bn
Construct a program that will solve a tridiagonal system. You may assume that row
interchanges are not needed and that row k can be used to eliminate x k in row k + 1.
14. Modify the program you have written in Problem 11 so that it will efficiently solve
M linear systems with the same coefficient matrix A but different column matrices
B. The M linear systems look like
Ax 1 = B 1 , Ax 2 = B 2 , ... Ax M = B M .
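For Problem 13, one possible sketch of the required program is the classical Thomas algorithm for tridiagonal systems (the function and argument names below are our own):

```python
# Thomas algorithm sketch: forward elimination uses row k to remove x_k from
# row k+1, then back substitution; no row interchanges are performed.
def solve_tridiagonal(a, d, c, b):
    # a: subdiagonal (length n-1), d: diagonal (n), c: superdiagonal (n-1), b: rhs (n)
    n = len(d)
    d, b = d[:], b[:]                      # work on copies
    for k in range(n - 1):
        m = a[k] / d[k]                    # multiplier; assumes d[k] != 0
        d[k + 1] -= m * c[k]
        b[k + 1] -= m * b[k]
    x = [0.0] * n
    x[n - 1] = b[n - 1] / d[n - 1]
    for k in range(n - 2, -1, -1):
        x[k] = (b[k] - c[k] * x[k + 1]) / d[k]
    return x

# Test system: 2x1 + x2 = 3, x1 + 2x2 + x3 = 4, x2 + 2x3 = 3.
print(solve_tridiagonal([1, 1], [2, 2, 2], [1, 1], [3, 4, 3]))  # -> [1.0, 1.0, 1.0]
```

Note that this sketch costs only O(n) operations, which is why tridiagonal systems are treated specially.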

3.1.4 Gauss-Jordan Method


The Gauss-Jordan method consists of transforming the linear system Ax = b into an
equivalent system Ix = b’, where I is the identity matrix of order n so that x = b’ is the
solution of the original linear system.

Example: Using the Gauss-Jordan method solve the system of equations

        x + y + z = 2
        2x + 3y + z = 3
        x − y − 2z = −6
Solution: We start with the augmented matrix and use the first row to create zeros in the
first column. (This corresponds to using the first equation to eliminate x from the second
and third equations.)
⎡1 1 1 2⎤ ≈ ⎡1 1 1 2⎤
⎢2 3 1 3 ⎥ R2 − 2 R1 , ⎢0 1 − 1 − 1⎥
⎢ ⎥ ⎢ ⎥
⎢⎣1 − 1 − 2 − 6⎥⎦ R3 − R1 ⎢⎣0 − 2 − 3 − 8⎥⎦
Create appropriate zeros in column 2:
   ≈         ⎡1 0  2   3 ⎤        ≈         ⎡1 0  2  3⎤
R1 − R2      ⎢0 1 −1  −1 ⎥     (−1/5)R3     ⎢0 1 −1 −1⎥
R3 + 2R2     ⎣0 0 −5 −10 ⎦                  ⎣0 0  1  2⎦
⎣ ⎦
Creating zeros in column 3:



⎡1 0 0 − 1⎤
⎢0 1 0 1 ⎥
⎢ ⎥
⎢⎣0 0 1 2 ⎥⎦
The last matrix corresponds to the system
x = −1
y = 1
z = 2
Consequently the solution is given by x = −1, y = 1, z = 2.
Description of the Method
Let us consider the following linear system defined by
⎡ a11  a12  ...  a1n ⎤ ⎡ x1 ⎤   ⎡ a1,n+1 ⎤
⎢ a21  a22  ...  a2n ⎥ ⎢ x2 ⎥   ⎢ a2,n+1 ⎥
⎢  ⋮    ⋮         ⋮  ⎥ ⎢ ⋮  ⎥ = ⎢   ⋮    ⎥
⎣ an1  an2  ...  ann ⎦ ⎣ xn ⎦   ⎣ an,n+1 ⎦
Step 1: Assume that a11 ≠ 0. The normalization operation replaces a11 by 1 in the
augmented matrix [A, b], and this is possible by pre-multiplying row 1 of [A, b] by 1 / a11 .
So, after normalization the first row of the matrix [A, b] becomes
        a1j(1) = a1j / a11    for j = 2, …, (n + 1)
Now we make the non-diagonal elements of the first column of A become zero. This is
possible by multiplying R1 by ai1 and subtracting it from the ith row, i ≥ 2. Then we get the
new coefficients defined by
        aij(1) = aij − ai1 a1j(1)    for i = 2, …, n;  j = 2, …, (n + 1)
Thus, the new system [A, b]1 is written as
⎡ 1  a12(1)  a13(1)  …  a1n(1)  a1,n+1(1) ⎤
⎢ 0  a22(1)  a23(1)  …  a2n(1)  a2,n+1(1) ⎥
⎢ ⋮    ⋮       ⋮     ⋱    ⋮        ⋮      ⎥
⎣ 0  an2(1)  an3(1)  …  ann(1)  an,n+1(1) ⎦
Step 2: Assume that a22(1) ≠ 0. The normalization operation changes the pivot element
a22(1) to 1. This results from pre-multiplying the second row of the augmented matrix by
1 / a22(1). Thus, the new coefficients of the second row will become
        a2j(2) = a2j(1) / a22(1)    for j = 3, 4, …, (n + 1)
The coefficient above the diagonal is made zero by pre-multiplying R2 by a12(1) and
subtracting the multiple from R1. In fact, the new coefficients of the first row are given by
        a1j(2) = a1j(1) − a12(1) a2j(2)    for j = 3, 4, …, (n + 1)
But to make the coefficients below the diagonal element of the second column of A zero,
we multiply the second row by ai2(1) and subtract, for i = 3, …, n.
Therefore, the general formula for the new coefficients can be written as



        aij(2) = aij(1) − ai2(1) a2j(2)    for j = 3, …, (n + 1); i = 1, …, n with i ≠ 2,
and the new system becomes
⎡ 1  0  a13(2)  …  a1n(2)  a1,n+1(2) ⎤
⎢ 0  1  a23(2)  …  a2n(2)  a2,n+1(2) ⎥
⎢ 0  0  a33(2)  …  a3n(2)  a3,n+1(2) ⎥
⎢ ⋮  ⋮    ⋮     ⋱    ⋮        ⋮      ⎥
⎣ 0  0  an3(2)  …  ann(2)  an,n+1(2) ⎦

Step k: Assume akk(k−1) ≠ 0. Then, pre-multiplying the kth row of [A, b](k−1) by 1 / akk(k−1), we
change akk(k−1) to 1, but the other coefficients of the kth row are changed. In fact, the new
coefficients will be
        akj(k) = akj(k−1) / akk(k−1)    for j = k, …, (n + 1)
The non-diagonal coefficients of the kth column are made zero by pre-multiplying the
kth row by aik(k−1), for i = 1, …, n and i ≠ k, and subtracting. This implies that the new
coefficients will be defined by
        aij(k) = aij(k−1) − aik(k−1) akj(k)    for j = (k + 1), …, (n + 1); i = 1, 2, …, n; i ≠ k
Now, we can write the Gauss-Jordan algorithm for a linear system.
Gauss-Jordan Algorithm for solving a Linear System [A, b]
Gauss-Jordan algorithm without pivoting
(i) Transforming [A, b] into [I, b’]. For k = 1, 2, …, n,
        akj(k) = akj(k−1) / akk(k−1)    for j = (k + 1), …, (n + 1)
        aij(k) = aij(k−1) − aik(k−1) akj(k)    for j = (k + 1), …, (n + 1); i = 1, …, n; (i ≠ k)
(ii) Solution of the system
        xi = ai,n+1(n)    for i = 1, 2, ..., n.
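The algorithm in steps (i)–(ii) can be sketched in Python as follows (our own illustration, assuming all pivots are nonzero so that no row interchanges are needed):

```python
# Gauss-Jordan elimination without pivoting: reduce [A|b] to [I|x].
def gauss_jordan(A, b):
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A|b]
    for k in range(n):
        pivot = M[k][k]                            # assumed nonzero
        M[k] = [v / pivot for v in M[k]]           # normalize row k
        for i in range(n):
            if i != k:                             # zero out column k in every other row
                m = M[i][k]
                M[i] = [M[i][j] - m * M[k][j] for j in range(n + 1)]
    return [M[i][n] for i in range(n)]             # last column holds the solution

# The worked example: x+y+z=2, 2x+3y+z=3, x-y-2z=-6.
print(gauss_jordan([[1, 1, 1], [2, 3, 1], [1, -1, -2]], [2, 3, -6]))  # -> [-1.0, 1.0, 2.0]
```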
The number of operations necessary to carry out stage k of the transformation is
(n − 1)(n − k + 1) additions, (n − 1)(n − k + 1) multiplications, and (n − k + 1) divisions.
Therefore, the total number of operations necessary to transform the original system
[A, b] into [I, b’] is
        No. of additions = Σ_{k=1}^{n} (n − 1)(n − k + 1) = n(n² − 1)/2.
        No. of multiplications = No. of additions = n(n² − 1)/2.
        No. of divisions = Σ_{k=1}^{n} (n − k + 1) = n(n + 1)/2.
Note that for n > 20, the number of operations will be of the order of n³/2.
Exercise. Find the number of operations necessary to solve a linear system of order n by
Gauss elimination and Gauss-Jordan methods, for n = 5 and n = 10?
Comparison of Gauss-Jordan and Gaussian Elimination
The method of Gaussian elimination is in general more efficient than Gauss-Jordan
elimination in that it involves fewer operations of addition and multiplication. It is during



the back substitution that Gaussian elimination picks up this advantage. Particularly in
larger systems of equations, many more operations are saved in Gaussian elimination
during back substitution. The reduction in the number of operations not only saves time
on a computer but also increases the accuracy of the final answer. With large systems, the
method of Gauss-Jordan elimination involves approximately 50% more arithmetic
operations than does Gaussian elimination.
Gauss-Jordan elimination, on the other hand, has the advantage of being more
straightforward for hand computations. It is superior for solving small systems.

3.1.5 Matrix Inversion Using Jordan Elimination


Let A be an n × n matrix.
1. Adjoin the identity n × n matrix In to A to form the matrix [A: In]
2. Compute the reduced echelon form of [A: In]. If the reduced echelon form is of
the type [In: B], then B is the inverse of A. If the reduced echelon form is not of
the type [In: B], in that the first n × n submatrix is not In, then A has no inverse.
The following examples illustrate the method.
Example: Determine the inverse of the matrix
⎡ 1 − 1 − 2⎤
A = ⎢ 2 − 3 − 5⎥
⎢ ⎥
⎢⎣ − 1 3 5 ⎥⎦
Solution: Applying the method of Gauss-Jordan elimination, we get

⎡ 1 − 1 − 2 1 0 0⎤ ≈ ⎡1 − 1 − 2 1 0 0⎤
⎢ ⎥
[ A : I 3 ] = ⎢ 2 − 3 − 5 0 1 0⎥ R 2 + ( −2) R1 ⎢0 − 1 − 1 − 2 1 0 ⎥
⎢ ⎥
⎢⎣− 1 3 5 0 0 1⎥⎦ R3 + R1 ⎢⎣0 2 3 1 0 1⎥⎦
⎡1 − 1 − 2 1 0 0⎤
≈ ⎢0 1 1 2 − 1 0⎥
( −1) R2 ⎢ ⎥
⎢⎣0 2 3 1 0 1⎥⎦
≈ ⎡1 0 − 1 3 − 1 0⎤
R1 + R2 ⎢0 1 1 2 − 1 0⎥
⎢ ⎥
R3 + ( −2) R2 ⎢⎣0 0 1 − 3 2 1⎥⎦
≈ ⎡1 0 0 0 1 1⎤
R1 + R3 ⎢0 1 0 5 − 3 − 1⎥ = [ I : A−1 ]
⎢ ⎥ 3
R2 + ( −1) R3 ⎢⎣0 0 1 − 3 2 1 ⎥⎦
Thus
              ⎡ 0   1   1 ⎤
        A−1 = ⎢ 5  −3  −1 ⎥
              ⎣−3   2   1 ⎦
Observe that we can solve the system of equations



x1 − x 2 − 2 x3 = 1
2 x1 − 3x 2 − 5 x3 = 3
− x1 + 3x 2 + 5 x3 = −2
using the inverse of the coefficient matrix, computed in the example above, and the
result x = A– 1 B, i.e.
⎛ x1 ⎞ ⎡ 0 1 1 ⎤⎡ 1 ⎤ ⎡ 1 ⎤
⎜ ⎟ ⎢ ⎥⎢ ⎥ ⎢ ⎥
⎜ x 2 ⎟ = ⎢ 5 − 3 − 1⎥ ⎢ 3 ⎥ = ⎢ − 2⎥
⎜ x ⎟ ⎢− 3 2 1 ⎥⎦ ⎢⎣ − 2⎥⎦ ⎢⎣ 1 ⎥⎦
⎝ 3⎠ ⎣
So the solution of the system is x1 = 1, x 2 = −2, and x 3 = 1.
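Both stages, computing A−1 by Gauss-Jordan reduction of [A : I] and then forming x = A−1b, can be sketched in Python (our own illustration, assuming nonzero pivots so no row interchanges are needed):

```python
# Invert an n x n matrix by Gauss-Jordan reduction of the augmented matrix [A | I].
def invert(A):
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        p = M[k][k]                                # pivot, assumed nonzero
        M[k] = [v / p for v in M[k]]               # normalize row k
        for i in range(n):
            if i != k:
                m = M[i][k]
                M[i] = [M[i][j] - m * M[k][j] for j in range(2 * n)]
    return [row[n:] for row in M]                  # right half is A^{-1}

A = [[1, -1, -2], [2, -3, -5], [-1, 3, 5]]        # matrix of the example above
Ainv = invert(A)                                   # [[0,1,1],[5,-3,-1],[-3,2,1]]
b = [1, 3, -2]
x = [sum(Ainv[i][j] * b[j] for j in range(3)) for i in range(3)]  # x = A^{-1} b
print(x)                                           # prints [1.0, -2.0, 1.0]
```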
Exercise
1. Solve (if possible) each of the following systems using the method of Gauss-Jordan
elimination.
a) x1 + 4 x 2 + 3 x 3 = 1 b) x1 + 4 x 2 + x 3 = 2
2 x1 + 8 x 2 + 11x 3 = 7 x1 + 2 x 2 − x 3 = 0
x1 + 6 x 2 + 7 x 3 = 3 2 x1 + 6 x 2 =3
c ) x1 + 2 x 2 + 3 x 3 = 8 d ) x1 + 2 x 2 + 8 x 3 = 7
3x1 + 7 x 2 + 9 x 3 = 26 2 x1 + 4 x 2 + 16 x 3 = 14
2 x1 + 6 x 3 = 11 x 2 + 3x 2 = 4

e) x1 + x 2 + x 3 − x 4 = −3              f) x1 − x 2 + 2 x 3 = 7
2 x1 + 3x 2 + x 3 − 5 x 4 = −9 2 x1 − 2 x 2 + 2 x 3 − 4 x 4 = 12
x1 + 3x 2 − x 3 − 6 x 4 = −7 − x 1 + x 2 − x 3 + 2 x 4 = −4
− x1 − x 2 − x 3 =1 − 3x1 + x 2 − 8 x 3 − 10 x 4 = −29
2. Solve the following systems of linear equations by applying the method of Gauss-Jordan elimination to a large augmented matrix that represents two systems with the same matrix of coefficients.
x_1 + x_2 + 5x_3 = b_1
x_1 + 2x_2 + 8x_3 = b_2
2x_1 + 4x_2 + 16x_3 = b_3
for \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \\ 10 \end{bmatrix} and \begin{bmatrix} 3 \\ 2 \\ 4 \end{bmatrix} in turn.
3. Solve the following systems of equations by determining the inverse of the matrix of
coefficients and then using matrix multiplication.
a) x1 + 2 x 2 − x 3 = 2 b) x1 − x 2 =1
x1 + x 2 + 2 x 3 = 0 x1 + x 2 + 2 x 3 = 2
x1 − x 2 − x 3 = 1 x1 + 2 x 2 + x 3 = 0
4. Draw a flow chart for Gauss-Jordan elimination method

3.1.6 Matrix Decomposition


A third method for the solution of general systems of linear algebraic equations is the LU
decomposition method. The objective of this method is to find a lower triangular factor



L and an upper triangular factor U such that the system of equations can be transformed
according to
A⋅x = b → (L⋅U)⋅x = A*⋅x = b*
The matrix A* in the above equation is the matrix A after row exchanges have been made
to allow the factors L and U to be computed accurately; the vector b* is the vector b after
an identical set of row exchanges.
A decomposition in which each diagonal element lii of L has a unit value is
known as the Doolittle method; one in which each diagonal element uii of U has a unit
value is known as the Crout method. Another method in which corresponding diagonal
elements lii and uii are equal to each other is known as the Cholesky method. Regardless
of which method is used to obtain the factors L and U of A*, the methods described in
sections 3.1.1 and 3.1.2 for triangular matrices are used to obtain x by solving
L⋅y = b*; U⋅x= y.
Here we consider the Doolittle and Crout method and leave the discussion of the
Cholesky method as reading assignment.
Doolittle Method
From the description of the Doolittle method we can infer that a given 4 × 4 matrix can be decomposed into the form
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ m_{21} & 1 & 0 & 0 \\ m_{31} & m_{32} & 1 & 0 \\ m_{41} & m_{42} & m_{43} & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} & u_{14} \\ 0 & u_{22} & u_{23} & u_{24} \\ 0 & 0 & u_{33} & u_{34} \\ 0 & 0 & 0 & u_{44} \end{bmatrix}

The condition that A is nonsingular implies that u kk ≠ 0 for all k. The notation for
the entries in L is mij , and the reason for the choice of mij instead of l ij , will be pointed
out in the next example.
Example Use Gaussian elimination to construct the triangular factorization of the matrix
A = \begin{bmatrix} 4 & 3 & -1 \\ -2 & -4 & 5 \\ 1 & 2 & 6 \end{bmatrix}

The matrix L will be constructed from an identity matrix placed at the left. For each row
operation used to construct the upper-triangular matrix, the multipliers mij will be put in
their proper places at the left. Start with
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ -2 & -4 & 5 \\ 1 & 2 & 6 \end{bmatrix}
Row 1 is used to eliminate the elements of A in column 1 below a11 . The multiples
m 21 = −0.5 and m 31 = 0.25 of row 1 are subtracted from rows 2 and 3, respectively.
These multipliers are put in the matrix at the left and the result is
A = \begin{bmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ 0 & -2.5 & 4.5 \\ 0 & 1.25 & 6.25 \end{bmatrix}



Row 2 is used to eliminate the elements of A in column 2 below a_{22}. The multiple m_{32} = -0.5 of the second row is subtracted from row 3, and the multiplier is entered in the matrix at the left, giving the desired triangular factorization of A:
A = \begin{bmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & -0.5 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ 0 & -2.5 & 4.5 \\ 0 & 0 & 8.5 \end{bmatrix}
Theorem (Direct Factorization A=LU. No Row Interchange). Suppose that Gaussian
elimination, without row interchanges, can be successfully performed to solve the general
linear system AX = B. Then the matrix A can be factored as the product of a lower-
triangular matrix L and an upper-triangular matrix U:
A = LU.
Furthermore, L can be constructed to have 1’s on its diagonal and U will have nonzero
diagonal elements. After finding L and U, the solution X is computed in two steps
1. Solve LY= B for Y using forward substitution.
2. Solve UX=Y for X using back substitution.
Proof: We will show that, when the Gaussian elimination process is followed and B is
stored in column N+1 of the augmented matrix, the result after the upper triangularization
step is the equivalent upper-triangular system UX = Y. The matrices L, U, B, and Y will
have the form
L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ m_{21} & 1 & 0 & \cdots & 0 \\ m_{31} & m_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ m_{n1} & m_{n2} & m_{n3} & \cdots & 1 \end{bmatrix} \quad U = \begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & a_{13}^{(0)} & \cdots & a_{1n}^{(0)} \\ 0 & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2n}^{(1)} \\ 0 & 0 & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n-1)} \end{bmatrix} \quad B = \begin{bmatrix} a_{1,n+1}^{(0)} \\ a_{2,n+1}^{(0)} \\ a_{3,n+1}^{(0)} \\ \vdots \\ a_{n,n+1}^{(0)} \end{bmatrix} \quad Y = \begin{bmatrix} a_{1,n+1}^{(0)} \\ a_{2,n+1}^{(1)} \\ a_{3,n+1}^{(2)} \\ \vdots \\ a_{n,n+1}^{(n-1)} \end{bmatrix}
Remark: To find just L and U, the (n + 1)st column is not needed.
First store the coefficients in the augmented matrix. The superscript on a_{ij}^{(0)} means that this is the first time that a number is stored in location (i, j).
\begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & a_{13}^{(0)} & \cdots & a_{1n}^{(0)} & a_{1,n+1}^{(0)} \\ a_{21}^{(0)} & a_{22}^{(0)} & a_{23}^{(0)} & \cdots & a_{2n}^{(0)} & a_{2,n+1}^{(0)} \\ a_{31}^{(0)} & a_{32}^{(0)} & a_{33}^{(0)} & \cdots & a_{3n}^{(0)} & a_{3,n+1}^{(0)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{n1}^{(0)} & a_{n2}^{(0)} & a_{n3}^{(0)} & \cdots & a_{nn}^{(0)} & a_{n,n+1}^{(0)} \end{bmatrix}
Step 1: Eliminate a i1 for i from 2 to n in rows 2 through n and store the multiplier mi1 ,
used to eliminate a i1 in row i, in the matrix at location (i, 1)
for i = 2 to n
    m_{i1} = a_{i1}^{(0)} / a_{11}^{(0)}
    a_{i1} = m_{i1}
    for j = 2 to n + 1
        a_{ij}^{(1)} = a_{ij}^{(0)} - m_{i1} a_{1j}^{(0)}
    end for
end for
So at the end of this elimination we have the augmented matrix



\begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & a_{13}^{(0)} & \cdots & a_{1n}^{(0)} & a_{1,n+1}^{(0)} \\ m_{21} & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2n}^{(1)} & a_{2,n+1}^{(1)} \\ m_{31} & a_{32}^{(1)} & a_{33}^{(1)} & \cdots & a_{3n}^{(1)} & a_{3,n+1}^{(1)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ m_{n1} & a_{n2}^{(1)} & a_{n3}^{(1)} & \cdots & a_{nn}^{(1)} & a_{n,n+1}^{(1)} \end{bmatrix}
Step 2: Eliminate a_{i2} for i from 3 to n in rows 3 through n and store the multiplier m_{i2}, used to eliminate a_{i2} in row i, in the matrix at location (i, 2):
for i = 3 to n
    m_{i2} = a_{i2}^{(1)} / a_{22}^{(1)}
    a_{i2} = m_{i2}
    for j = 3 to n + 1
        a_{ij}^{(2)} = a_{ij}^{(1)} - m_{i2} a_{2j}^{(1)}
    end for
end for
At the end of this elimination the augmented matrix has the form
\begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & a_{13}^{(0)} & \cdots & a_{1n}^{(0)} & a_{1,n+1}^{(0)} \\ m_{21} & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2n}^{(1)} & a_{2,n+1}^{(1)} \\ m_{31} & m_{32} & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} & a_{3,n+1}^{(2)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ m_{n1} & m_{n2} & a_{n3}^{(2)} & \cdots & a_{nn}^{(2)} & a_{n,n+1}^{(2)} \end{bmatrix}
Step k: This is the general step. Eliminate a_{ik} in rows k + 1 through n and store the multipliers at location (i, k):
for i = k+1 to n
    m_{ik} = a_{ik}^{(k-1)} / a_{kk}^{(k-1)}
    a_{ik} = m_{ik}
    for j = k+1 to n + 1
        a_{ij}^{(k)} = a_{ij}^{(k-1)} - m_{ik} a_{kj}^{(k-1)}
    end for
end for
The final matrix after n - 1 steps of elimination is of the form
\begin{bmatrix} a_{11}^{(0)} & a_{12}^{(0)} & a_{13}^{(0)} & \cdots & a_{1,n-1}^{(0)} & a_{1n}^{(0)} & a_{1,n+1}^{(0)} \\ m_{21} & a_{22}^{(1)} & a_{23}^{(1)} & \cdots & a_{2,n-1}^{(1)} & a_{2n}^{(1)} & a_{2,n+1}^{(1)} \\ m_{31} & m_{32} & a_{33}^{(2)} & \cdots & a_{3,n-1}^{(2)} & a_{3n}^{(2)} & a_{3,n+1}^{(2)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ m_{n1} & m_{n2} & m_{n3} & \cdots & m_{n,n-1} & a_{nn}^{(n-1)} & a_{n,n+1}^{(n-1)} \end{bmatrix}

The upper-triangular process is now complete. Notice that one array is used to store the elements of both L and U. The 1's of L are not stored, nor are the 0's of L and U that lie



above and below the diagonal, respectively. Only the essential coefficients needed to
reconstruct L and U are stored!
The last part of the proof is to verify the product LU = A and it is left as exercise.
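The multiplier bookkeeping in the proof can be condensed into a short routine. The Python sketch below (the function name is mine) performs the Doolittle factorization without row interchanges, so every pivot is assumed nonzero; it is checked against the 3 × 3 example above.

```python
def doolittle(A):
    """Doolittle factorization A = LU: L unit lower-triangular, U upper-triangular.
    Sketch only: no row interchanges, so every pivot U[k][k] must be nonzero."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]      # multiplier m_ik
            L[i][k] = m                # stored in L, below the unit diagonal
            for j in range(k, n):      # subtract m times row k from row i
                U[i][j] -= m * U[k][j]
    return L, U

A = [[4.0, 3.0, -1.0], [-2.0, -4.0, 5.0], [1.0, 2.0, 6.0]]
L, U = doolittle(A)
```

For this matrix the stored multipliers are m21 = -0.5, m31 = 0.25, and m32 = -0.5, exactly as in the example.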
Crout Method
Let Ax = B be a system of n equations in n variables, where A is a non-singular matrix that has an LU decomposition. An alternative approach involves a U matrix with 1's on the diagonal; this is called the Crout decomposition. As in the Doolittle method, the system can thus be written as
LUx = B
and the method involves writing this system as two subsystems, one lower triangular and the other upper triangular, of the form Ly = B and Ux = y.
In practice, we first solve Ly = B for y and then solve Ux = y to get the solution x. Let us
now consider the system of equations below
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = b_1
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 = b_2
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 = b_3
The above equations can be written as Ax = B, where
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad B = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}

Let A = LU, where
L = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}, \quad U = \begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}
so that
\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
Then l_{11} = a_{11}, \; l_{21} = a_{21}, \; l_{31} = a_{31},
l_{11} u_{12} = a_{12}, so u_{12} = a_{12} / a_{11}
l_{11} u_{13} = a_{13}, so u_{13} = a_{13} / a_{11}
l_{21} u_{12} + l_{22} = a_{22}, so l_{22} = a_{22} - a_{21} u_{12}
l_{31} u_{12} + l_{32} = a_{32}, so l_{32} = a_{32} - a_{31} u_{12}
l_{21} u_{13} + l_{22} u_{23} = a_{23}, so u_{23} = (a_{23} - a_{21} u_{13}) / l_{22}
l_{31} u_{13} + l_{32} u_{23} + l_{33} = a_{33}, so l_{33} = a_{33} - a_{31} u_{13} - l_{32} u_{23}
We may generalize the above relations by the following concise series of formulas:



for i = 1, 2, \dots, n: \quad l_{i1} = a_{i1}
for j = 2, 3, \dots, n: \quad u_{1j} = a_{1j} / l_{11}
for j = 2, 3, \dots, n-1:
    for i = j, j+1, \dots, n: \quad l_{ij} = a_{ij} - \sum_{k=1}^{j-1} l_{ik} u_{kj}
    for k = j+1, j+2, \dots, n: \quad u_{jk} = \Big( a_{jk} - \sum_{i=1}^{j-1} l_{ji} u_{ik} \Big) / l_{jj}
and
l_{nn} = a_{nn} - \sum_{k=1}^{n-1} l_{nk} u_{kn}
Now Ax = B becomes LUx = B.
Let Ux = y then Ly=B
And from Ly = B we get
\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
Thus
l11 . y1 = b1
l 21 y1 + l 22 . y 2 = b2
l31 y1 + l32 . y 2 + l 33 y 3 = b3
Solve for y1 , y 2 , y 3 by forward substitution.
We know that Ux = y
Thus
\begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}

Consequently
x1 + u12 x 2 + u13 x3 = y1
x 2 + u 23 x3 = y 2
x3 = y 3
By the backward substitution, we get the values of x1 , x2 , x3.



Example: - Solve the following system of equations using Doolittle decomposition and
Crout decomposition methods.
2 x1 + x2 + 3x3 = −1
4 x1 + x2 + 7 x3 = 5
− 6 x1 − 2 x2 − 12 x3 = −2
Solution: We can write the above equations in the matrix form Ax = B, i.e.
\begin{bmatrix} 2 & 1 & 3 \\ 4 & 1 & 7 \\ -6 & -2 & -12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 5 \\ -2 \end{bmatrix}
Using the Doolittle decomposition method it can easily be shown that matrix A has the
following LU decomposition
A = \begin{bmatrix} 2 & 1 & 3 \\ 4 & 1 & 7 \\ -6 & -2 & -12 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 3 \\ 0 & -1 & 1 \\ 0 & 0 & -2 \end{bmatrix}
So first we solve the system Ly = B, i.e.
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & -1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 5 \\ -2 \end{bmatrix}
using forward substitution and find that y1 = −1, y 2 = 7, and y 3 = 2
Lastly we solve the system Ux = y, i.e.
\begin{bmatrix} 2 & 1 & 3 \\ 0 & -1 & 1 \\ 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 7 \\ 2 \end{bmatrix}
by backward substitution, which leads us to the solution x1 = 5, x2 = -8, x3 = -1.
Similarly, in the Crout decomposition method, let A = LU; then we must have
\begin{bmatrix} 2 & 1 & 3 \\ 4 & 1 & 7 \\ -6 & -2 & -12 \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}
Now solving the above matrix equation we have
l11 = 2, l 21 = 4, l 31 = −6, u12 = 1 / 2, u13 = 3 / 2
l22 = −1, l32 = 1, u23 = −1, l33 = −2.
Then we have
L = \begin{bmatrix} 2 & 0 & 0 \\ 4 & -1 & 0 \\ -6 & 1 & -2 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & 1/2 & 3/2 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}
Writing Ly = B we have
\begin{bmatrix} 2 & 0 & 0 \\ 4 & -1 & 0 \\ -6 & 1 & -2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 5 \\ -2 \end{bmatrix}
Solving the above system by forward substitution, i.e.
2 y1 = −1 then y1 = −1 / 2
4 y1 − y 2 = 5 then y 2 = −7
− 6 y1 + y 2 − 2 y 3 = −2 then y 3 = −1
Now again writing Ux = y we have
\begin{bmatrix} 1 & 1/2 & 3/2 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1/2 \\ -7 \\ -1 \end{bmatrix}

Solving the above equation by substitution we have


x1 + 1 / 2 x2 + 3 / 2 x3 = −1 / 2
x2 − x3 = −7
x3 = −1
The solution to the given system is therefore x1 = 5, x 2 = −8, x3 = −1.
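The Crout formulas and the two substitution sweeps combine naturally into one routine. The following Python sketch (my own naming; no pivoting, so every l_jj is assumed nonzero) reproduces the example's solution.

```python
def crout_solve(A, b):
    """Solve Ax = b via Crout decomposition (unit diagonal on U).
    Sketch only: assumes no pivoting is needed, i.e. every l_jj is nonzero."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j, n):          # column j of L
            L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
        for k in range(j + 1, n):      # row j of U
            U[j][k] = (A[j][k] - sum(L[j][i] * U[i][k] for i in range(j))) / L[j][j]
    y = [0.0] * n                      # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n                      # back substitution: U x = y (unit diagonal)
    for i in range(n - 1, -1, -1):
        x[i] = y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))
    return x

# the system of the example above
x = crout_solve([[2.0, 1.0, 3.0], [4.0, 1.0, 7.0], [-6.0, -2.0, -12.0]],
                [-1.0, 5.0, -2.0])
```

Here x comes out as [5.0, -8.0, -1.0], matching the hand computation.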
The method of LU decomposition can be applied to any system of n equations in
n variables, Ax = B, that can be transformed into an upper triangular form U using row
operations that involve adding multiples of rows to rows. In general, if in transforming
say the coefficient matrix A row interchanges are required to arrive at the upper
triangular form, then matrix A does not have an LU decomposition and the method
cannot be used to solve the system Ax = B as it is.
The total number of arithmetic operations needed to solve a system of equations
using LU decomposition is exactly the same as that needed in Gaussian elimination.
However, if the linear system is to be solved many times, with the same coefficient matrix A but with different column matrices B, it is not necessary to decompose the matrix each time if the factors are saved. This is the reason the LU decomposition method is usually chosen over the elimination method.
Another reason LU decomposition is used, where applicable, to solve systems of equations on computers rather than Gaussian elimination is the usefulness of the LU decomposition of a matrix for many other types of computations: the inverse of a triangular matrix can be computed very efficiently, and the determinant of a triangular matrix is the product of its diagonal elements.
Exercise:
1. Solve LY = B, and verify that B = AX for (a) B = [−4 10 5]T and
(b) B = [20 49 32]T , where A = LU is
\begin{bmatrix} 2 & 4 & -6 \\ 1 & 5 & 3 \\ 1 & 3 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 1/2 & 1/3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 4 & -6 \\ 0 & 3 & 6 \\ 0 & 0 & 3 \end{bmatrix}



2. Solve LY = B, and verify that B = AX for (a) B = [7 2 10]T and
(b) B = [23 35 7]T , where A = LU is
\begin{bmatrix} 1 & 1 & 6 \\ -1 & 2 & 9 \\ 1 & -2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 6 \\ 0 & 3 & 15 \\ 0 & 0 & 12 \end{bmatrix}
3. Find the Doolittle and Crout decomposition A = LU for the matrices
a) \begin{bmatrix} -5 & 2 & -1 \\ 1 & 0 & 3 \\ 3 & 1 & 6 \end{bmatrix} \qquad b) \begin{bmatrix} 1 & 0 & 3 \\ 3 & 1 & 6 \\ -5 & 2 & -1 \end{bmatrix}
c) \begin{bmatrix} 4 & 2 & 1 \\ 2 & 5 & -2 \\ 1 & -2 & 7 \end{bmatrix} \qquad d) \begin{bmatrix} 1 & -2 & 7 \\ 4 & 2 & 1 \\ 2 & 5 & -2 \end{bmatrix}
e) \begin{bmatrix} 4 & 8 & 4 & 0 \\ 1 & 5 & 4 & -3 \\ 1 & 4 & 7 & 2 \\ 1 & 3 & 0 & -2 \end{bmatrix} \qquad f) \begin{bmatrix} 1 & 2 & 4 & 1 \\ 2 & 8 & 6 & 4 \\ 3 & 10 & 8 & 8 \\ 4 & 12 & 10 & 6 \end{bmatrix}
4. Consider the matrix
A = \begin{bmatrix} 2 & 2 & 1 \\ 1 & 1 & 1 \\ 3 & 2 & 1 \end{bmatrix}
a) Show that A has no LU decomposition
b) Interchange the rows of A so that this can be done.

5.Solve the follow system of equations using


i) Doolittle method ii) Crout method
x1 − 3x2 + 4 x3 = 12 5 x1 − x 2 − 2 x 3 = 142
a) − x1 + 5 x2 − 3x3 = −12 b) x1 − 3x 2 − x 3 = −30
4 x1 − 8 x2 + 23x3 = 58 2 x1 − x 2 − 3 x 3 = 5
6. Write a C++ program for Crout method



3.1.7 Tri Diagonal Matrix Method

Definition: Let A = (a_{ij}) be a square matrix of order n such that a_{ij} = 0 whenever |i - j| > 1; then A is called a tridiagonal matrix. Such matrices arise frequently in the numerical solution of differential equations.

⎛ α1 γ1 0 0 L 0 ⎞
⎜ ⎟
⎜β2 α2 γ 2 0 L 0 ⎟
⎜ 0 β3 α3 γ 3 L 0 ⎟
A=⎜ ⎟
⎜ M M M M O M ⎟
⎜ 0 0 0 L α n −1 γ n −1 ⎟
⎜ ⎟
⎜ 0 0 0 L β n α n ⎟⎠

Consider the problem of solving the linear system


Ax = B,
where A is a tridiagonal matrix. If we try to solve the system by the decomposition method, then because of the structure of A we expect L to have non-zero entries only on the diagonal and sub-diagonal, and U to have non-zero entries only on the diagonal and super-diagonal. In fact a little thought will tell you that the diagonal of either L or U could be chosen to be 1's; we will choose U to have this form (as in the Crout method). Thus

⎛ l1 0 0 ... 0 ⎞ ⎛ 1 μ1 0 ... 0 ⎞
⎜ ⎟ ⎜ ⎟
⎜ β 2 l 2 0 ... 0 ⎟ ⎜0 1 μ2 ... 0 ⎟
L = ⎜ 0 β 3 l 3 ... 0 ⎟ U = ⎜0 0 1 ... 0 ⎟
⎜ ⎟ ⎜ ⎟
⎜ ... ... ... ... ⎟ ⎜ ... ... ... μ n −1 ⎟
⎜ 0 0 0 β n l n ⎟⎠ ⎜0 0 ... 0 1 ⎟⎠
⎝ ⎝
This leads us to the following algorithm
l1 = α 1 , μ 1 = γ 1 / l 1
for i = 2 to n
l i = α i − β i μ i −1
if i < n
μi = γ i / li
end for
The solution is now easy consisting of forward and backward substitution.

y_1 = b_1 / l_1, \quad y_i = (b_i - \beta_i y_{i-1}) / l_i, \quad i = 2, 3, \dots, n
x_n = y_n, \quad x_i = y_i - \mu_i x_{i+1}, \quad i = n-1, n-2, \dots, 1 \qquad (2)

This algorithm is called the Thomas algorithm (esp. in engineering circles).
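A minimal Python sketch of the factorization step and the two substitution sweeps (it assumes n >= 2 and that every l_i is nonzero, which holds for strictly diagonally dominant tridiagonal matrices):

```python
def thomas(beta, alpha, gamma, b):
    """Thomas algorithm for a tridiagonal system; a sketch without pivoting.
    alpha: diagonal (length n); beta: sub-diagonal; gamma: super-diagonal."""
    n = len(alpha)
    l = [0.0] * n
    mu = [0.0] * (n - 1)
    l[0] = alpha[0]
    mu[0] = gamma[0] / l[0]
    for i in range(1, n):
        l[i] = alpha[i] - beta[i - 1] * mu[i - 1]
        if i < n - 1:
            mu[i] = gamma[i] / l[i]
    y = [0.0] * n                       # forward substitution: L y = b
    y[0] = b[0] / l[0]
    for i in range(1, n):
        y[i] = (b[i] - beta[i - 1] * y[i - 1]) / l[i]
    x = [0.0] * n                       # back substitution: U x = y
    x[-1] = y[-1]
    for i in range(n - 2, -1, -1):
        x[i] = y[i] - mu[i] * x[i + 1]
    return x

# a 4 x 4 system with diagonal 2, off-diagonals -1, right-hand side [1, 0, 0, 1]
x = thomas(beta=[-1.0, -1.0, -1.0], alpha=[2.0, 2.0, 2.0, 2.0],
           gamma=[-1.0, -1.0, -1.0], b=[1.0, 0.0, 0.0, 1.0])
```

The cost is O(n): one sweep to build l and mu, one forward sweep for y, and one backward sweep for x, which is what makes the method so attractive for large sparse systems.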



Example: Solve
2 x1 − x 2 = 1
− x1 + 2 x 2 − x3 =0
− x 2 + 2 x3 − x 4 = 0
− x3 + 2 x 4 = 1
Solution: The above equation has the equivalent matrix form Ax = b.
⎡ 2 − 1 0 0 ⎤ ⎡ x1 ⎤ ⎡1⎤
⎢ − 1 2 − 1 0 ⎥ ⎢ x ⎥ ⎢0 ⎥
⎢ ⎥⎢ 2 ⎥ = ⎢ ⎥
⎢ 0 − 1 2 − 1⎥ ⎢ x3 ⎥ ⎢0⎥
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎣0 0 − 1 2 ⎦ ⎣ x 4 ⎦ ⎣1 ⎦
Let A=LU
\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} = \begin{bmatrix} l_1 & 0 & 0 & 0 \\ \beta_2 & l_2 & 0 & 0 \\ 0 & \beta_3 & l_3 & 0 \\ 0 & 0 & \beta_4 & l_4 \end{bmatrix} \begin{bmatrix} 1 & \mu_1 & 0 & 0 \\ 0 & 1 & \mu_2 & 0 \\ 0 & 0 & 1 & \mu_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}
Now solving the above matrix equation we have
l1 = 2, μ1 = −1 / 2, l 2 = 3 / 2, μ 2 = −2 / 3, l3 = 4 / 3, μ 3 = −3 / 4, l 4 = 5 / 4
Then we have
L = \begin{bmatrix} 2 & 0 & 0 & 0 \\ -1 & 3/2 & 0 & 0 \\ 0 & -1 & 4/3 & 0 \\ 0 & 0 & -1 & 5/4 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & -1/2 & 0 & 0 \\ 0 & 1 & -2/3 & 0 \\ 0 & 0 & 1 & -3/4 \\ 0 & 0 & 0 & 1 \end{bmatrix}
Writing Ly = b we have
⎡2 0 0 0 ⎤ ⎡ y1 ⎤ ⎡1⎤
⎢− 1 3 / 2 0
⎢ 0 ⎥⎥ ⎢⎢ y 2 ⎥⎥ ⎢⎢0⎥⎥
=
⎢ 0 − 1 4 / 3 0 ⎥ ⎢ y 3 ⎥ ⎢0 ⎥
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎣0 0 − 1 5 / 4⎦ ⎣ y 4 ⎦ ⎣1⎦
Then solving the system we have
y1 = 1 / 2, y 2 = 1 / 3, y 3 = 1 / 4, y 4 = 1
and from Ux = y by back substitution we have
x 4 = 1, x3 = 1, x 2 = 1, x1 = 1.
Thus the solution of the given systems of equations is given by
x1 = 1, x 2 = 1, x3 = 1, x 4 = 1.



3.2 Indirect methods
We have discussed a number of elimination methods for solving systems of linear equations. We now introduce two so-called iterative methods for solving systems of n equations in n variables that have a unique solution.
Consider the system
Ax = b
Let x^{(0)} be the initial approximation vector and x_T the vector of the true solution. We
generate a sequence of vectors x ( 0) , x (1) ,..., x ( n ) which converge to the true solution xT . In
doing so, we must consider two things. That is, the convergence of the sequence and the
stop criteria since we have to stop the iteration after n steps.

3.2.1 Gauss Jacobi Iterative Method


Consider the system
Ax = b,
which is an n × n system of equations of the form
a11 x1 + a12 x 2 + ... + a1n x n = b1
a 21 x1 + a 22 x 2 + ... + a 2 n x n = b2
M M M M M
a n1 x1 + a n 2 x 2 + ... + a nn x n = bn
and let x^{(0)} = [x_1^{(0)}, x_2^{(0)}, \dots, x_n^{(0)}]^T be the vector of the initial approximation.

Step I: Obtain x_1^{(1)} from the first equation as a function of x_i^{(0)}, i = 2, 3, \dots, n, as follows:
x_1^{(1)} = \frac{b_1 - (a_{12} x_2^{(0)} + a_{13} x_3^{(0)} + \cdots + a_{1n} x_n^{(0)})}{a_{11}}
Similarly,
x_2^{(1)} = \frac{b_2 - (a_{21} x_1^{(0)} + a_{23} x_3^{(0)} + \cdots + a_{2n} x_n^{(0)})}{a_{22}} \quad \text{(from the second equation)}
x_k^{(1)} = \frac{b_k - (a_{k1} x_1^{(0)} + \cdots + a_{k,k-1} x_{k-1}^{(0)} + a_{k,k+1} x_{k+1}^{(0)} + \cdots + a_{kn} x_n^{(0)})}{a_{kk}} \quad \text{(from the kth equation)}
and
x_n^{(1)} = \frac{b_n - (a_{n1} x_1^{(0)} + a_{n2} x_2^{(0)} + \cdots + a_{n,n-1} x_{n-1}^{(0)})}{a_{nn}} \quad \text{(from the last equation)}
After similar steps we can see that at the (k+1)th step we obtain x_i^{(k+1)}, i = 1, 2, \dots, n, from the previous x_i^{(k)}, i = 1, 2, \dots, n, by the formula



x_i^{(k+1)} = \frac{b_i - \sum_{j=1, \, j \ne i}^{n} a_{ij} x_j^{(k)}}{a_{ii}}, \quad i = 1, 2, \dots, n; \; k = 0, 1, \dots
Termination of Iterations
Suppose we denote the residual vector by r^{(k)} = b - Ax^{(k)}, i.e. r_i^{(k)} = b_i - \sum_{j=1}^{n} a_{ij} x_j^{(k)} for all i = 1, 2, \dots, n. Then the standard criterion for the termination of the iteration is
\frac{\| r^{(k)} \|}{\| b \|} < \varepsilon
where ε is arbitrarily small.
Another standard termination condition, based on the relative improvement of x, is
\frac{\| x^{(k)} - x^{(k+1)} \|}{\| x^{(k+1)} \|} < \varepsilon
This condition is 'practically' equivalent to the previous condition for the termination of the iteration.
Sometimes, we also stop the iteration procedure when \| x^{(k)} - x^{(k+1)} \| < \varepsilon.
Convergence condition
The next theorem, which we present without proof, gives one of the conditions under which the Jacobi iterative method can be used.
Definition: A matrix A of dimension n × n is said to be diagonally dominant provided that
|a_{kk}| > \sum_{j=1, \, j \ne k}^{n} |a_{kj}| \quad \text{for } k = 1, 2, \dots, n.

Theorem 2: Suppose that A is a diagonally dominant matrix. Then the system Ax = B has a unique solution, and the Jacobi method will converge to this solution, no matter what the initial values are.
Example: Solve the following system equations by Jacobi method.
6x + 2 y − z = 4
x + 5y + z = 3
2 x + y + 4 z = 27
Solution: First let us investigate the convergence condition for the system. The coefficient matrix is
A = \begin{bmatrix} 6 & 2 & -1 \\ 1 & 5 & 1 \\ 2 & 1 & 4 \end{bmatrix}
Comparing the magnitude of the diagonal elements with the sums of the magnitudes of
the other elements in each row, we get the results show in table below. Observe that the
diagonal elements dominate the rows; thus the Jacobi method can be used.



Row   Absolute value of diagonal element   Sum of absolute values of other elements in the row
1     6                                    |2| + |-1| = 3
2     5                                    |1| + |1| = 2
3     4                                    |2| + |1| = 3

Now to solve the system rewrite the equations as follows, isolating x in the first equation,
y in the second equation, and z in the third equation.
4 − 2y + z
x=
6
3− x − z
y= (2)
5
27 − 2 x − y
z=
4
Now make an estimate of the solution, say x = 1, y = 1, z = 1. The accuracy of the estimate
affects only the speed with which we get a good approximation to the solution. Let us
label these values x ( 0) , y ( 0 ) , and z ( 0 ) . They are called the initial values of the iterative
process.
x ( 0) = 1, y ( 0) = 1, z ( 0) = 1
Substitute these values into the right-hand side system (2) to get the next set of values in
the iterative process.
x (1) = 0.5, y (1) = 0.2, z (1) = 6
These values of x,y, and z are now substituted into system (2) again to get
x ( 2) = 1.6, y ( 2) = −0.7, z ( 2 ) = 6.45
This process can be repeated to get x ( 3) , y (3) , z ( 3) , and so on. Repeating the iteration will
give a better approximation to the exact solution at each step. For this straightforward
system of equations, the solution can easily be seen to be
x = 2, y = −1, z = 6.
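The Jacobi update translates directly into code. A minimal Python sketch (assuming, per Theorem 2, that the coefficient matrix is diagonally dominant), applied to the system just solved:

```python
def jacobi(A, b, x0, tol=1e-8, kmax=200):
    """Gauss-Jacobi iteration; a sketch that stops when successive
    iterates differ by less than tol, or after kmax sweeps."""
    n = len(A)
    x = x0[:]
    for _ in range(kmax):
        # every component of the new iterate uses only the OLD values
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x

A = [[6.0, 2.0, -1.0], [1.0, 5.0, 1.0], [2.0, 1.0, 4.0]]
b = [4.0, 3.0, 27.0]
sol = jacobi(A, b, [1.0, 1.0, 1.0])
```

After enough sweeps sol approaches the exact solution x = 2, y = -1, z = 6.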

3.2.2 Gauss-Seidel Method


The Gauss-Seidel method is a refinement of the Jacobi method that usually (but not
always) gives more rapid convergence. The latest value of each variable is substituted at
each step in the iterative process. This method like the Jacobi method converges if the
coefficient matrix A is diagonally dominant.
So given an nxn system of equations the general iteration process can be written as
x1( k +1) = (b1 − a12 x 2( k ) − a13 x3( k ) − ... − a1n x n( k ) ) / a11
x 2( k +1) = (b2 − a 21 x1( k +1) − a 23 x3( k ) − ... − a 2 n x n( k ) ) / a 22
M M
x n( k +1) = (bn − a n1 x1( k +1) − a n 2 x 2( k +1) − ... − a n ,n −1 x n( k−1+1) ) / a nn
Here, as in the case of the Jacobi method, we assume that the pivots aii are non-zero.



Gauss-Seidel Algorithm

Gauss-Seidel Algorithm for the solution of Ax = b


Let A, b, x^{(0)}, ε_1, ε_2 and k_max be given. For k = 1, 2, \dots, k_max:
1. x_i^{(k+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)}}{a_{ii}}, \quad i = 1, 2, \dots, n
2. Termination condition: |x_i^{(k+1)} - x_i^{(k)}| < ε_1 or |x_i^{(k+1)} - x_i^{(k)}| / |x_i^{(k+1)}| < ε_2, for i = 1, 2, \dots, n.

Example: Let us consider the previous system of equations. As before, let us take our
initial guess to be
x ( 0) = 1, y ( 0) = 1, z ( 0) = 1
Substituting the latest values of each variable into (2) system gives
x^{(1)} = \frac{4 - 2y^{(0)} + z^{(0)}}{6} = 0.5
y^{(1)} = \frac{3 - x^{(1)} - z^{(0)}}{5} = 0.3
z^{(1)} = \frac{27 - 2x^{(1)} - y^{(1)}}{4} = 6.4250
Observe that we have used x (1) , the most up-to-date value of x, to get y (1) . We have used
x (1) and y (1) to get z (1) . Continuing, we get
x^{(2)} = \frac{4 - 2y^{(1)} + z^{(1)}}{6} = 1.6375
y^{(2)} = \frac{3 - x^{(2)} - z^{(1)}}{5} = -1.0125
z^{(2)} = \frac{27 - 2x^{(2)} - y^{(2)}}{4} = 6.184375
The next two tables below give the results obtained for this system of equations using
both methods. They illustrate the more rapid convergence of the Gauss-Seidel method to
the exact solution x = 2, y = −1, z = 6. And the last table gives the difference between the
solution x ( 6) , y ( 6 ) , z ( 6) obtained in the two methods after six iterations and the actual
solution x = 2, y = −1, z = 6. The Gauss-Seidel method converges much more rapidly
than the Jacobi method.



Jacobi Method
Index x y Z
Initial Guess 1 1 1
1 0.5 0.2 6
2 1.6 -0.7 6.45
3 1.975 -1.01 6.125
4 2.024167 -1.02 6.015
5 2.009167 -1.007833 5.992917
6 2.001431 -1.000417 5.997375

Gauss-Seidel Method
Index x y Z
Initial Guess 1 1 1
1 0.5 0.3 6.425
2 1.6375 -1.0125 6.184375
3 2.034896 -1.043854 5.993516
4 2.013537 -1.001411 5.993594
5 1.998597 -0.998597 5.999949
6 1.999524 -0.9998945 6.000212

The comparison table


x ( 6) − 2 y ( 6 ) − (−1) z (6) − 6
Jacobi Method 0.001431 0.000417 0.002625
Gauss-Seidel Method 0.000476 0.0001055 0.000212

Comparison of Gaussian Elimination and Gauss-Seidel


Limitations Gaussian elimination is a finite method and can be used to solve any system
of linear equations. The Gauss-Seidel method converges only for special systems of
equations; thus it can be used only for those systems.
Efficiency The efficiency of a method is a function of the number of arithmetic
operations (addition, subtraction, multiplication, and division) involved in each method.
For a system of n equation in n variables, where the solution is unique and the value of n
is large, Gaussian elimination involves 2n3/3 arithmetic operations to solve the problem,
while Gauss-Seidel requires approximately 2n2 arithmetic operations per iteration.
Therefore if the number of iterations is less than or equal to n/3, the iterative method
requires fewer arithmetic operations.
Accuracy In general, when Gauss-Seidel is applicable, it is more accurate than the
Gaussian elimination.
Storage Iterative methods are, in general, more economical in core-storage requirements
of a computer.



Exercise
In problems 1 to 6:
a) Start with x^{(0)} = 0 and use Jacobi iteration to find x^{(k)} for k = 1, 2, 3. Will Jacobi iteration converge to the solution?
b) Start with x^{(0)} = 0 and use Gauss-Seidel iteration to find x^{(k)} for k = 1, 2, 3. Will Gauss-Seidel iteration converge to the solution?
3. 5 x − y + z = 10
1. 4 x − y = 15 2. 2 x + 3 y = 1
2 x + 8 y − z = 11
x + 5y = 9 7x − 2 y = 1
− x + y + 4z = 3
4. 2 x + 8 y − z = 11 5. x − 5 y − z = −8 6. 4 x + y + 4 z = 13
5 x − y + z = 10 4 x + y − z = 13 x − 5 y − z = −8
− x + y + 4z = 3 2 x − y − 6 z = −2 2 x − y − 6 z = −2
7. Write a program for Jacobi and Gauss-Seidel methods and solve the problems 1-6
using a tolerance 10 −8 .
8. In Theorem 2 the condition that A be diagonally dominant is a sufficient but not a necessary condition. Use both of the programs that you developed in problem 7 and several different initial guesses on the following system of equations. Note that the Jacobi iteration appears to converge, while the Gauss-Seidel iteration diverges.
x +z=2
−x + y =0
x + 2 y − 3z = 0
9. Consider the following tridiagonal linear system, and assume that the coefficient
matrix is strictly diagonally dominant.
d_1 x_1 + c_1 x_2 = b_1
a_1 x_1 + d_2 x_2 + c_2 x_3 = b_2
a_2 x_2 + d_3 x_3 + c_3 x_4 = b_3
        \vdots
a_{n-2} x_{n-2} + d_{n-1} x_{n-1} + c_{n-1} x_n = b_{n-1}
a_{n-1} x_{n-1} + d_n x_n = b_n
i) Write an iterative algorithm that will solve this system. Your algorithm should
efficiently use the “sparseness” of the coefficient matrix.
ii) Construct a program based on your algorithm in (i) and solve the following
tridiagonal systems.
a) 4m_1 + m_2 = 3                    b) 4m_1 + m_2 = 1
   m_1 + 4m_2 + m_3 = 3                 m_1 + 4m_2 + m_3 = 2
        \ddots                               \ddots
   m_48 + 4m_49 + m_50 = 3              m_48 + 4m_49 + m_50 = 1
   m_49 + 4m_50 = 3                     m_49 + 4m_50 = 2



4 Solving Nonlinear Equations Using Newton’s Method
4.1 Introduction
In this chapter, we discuss the problem of solving a system of nonlinear equations in
several variables of the following form:
f1 ( x1 , x 2 ,..., x n ) = 0
f 2 ( x1 , x 2 ,..., x n ) = 0
(4.1)
M
f n ( x1 , x 2 ,..., x n ) = 0
where f i are nonlinear real valued functions of the real variables x1 , x 2 ,..., x n . We denote
the system (4.1) in matrix form as follows:
f(x) = \begin{bmatrix} f_1(x_1, x_2, \dots, x_n) \\ f_2(x_1, x_2, \dots, x_n) \\ \vdots \\ f_n(x_1, x_2, \dots, x_n) \end{bmatrix} = 0 \qquad (4.2)
where x = [x_1, x_2, \dots, x_n]^t, 0 = [0, 0, \dots, 0]^t is the null vector, and f is the nonlinear operator given by f = [f_1, f_2, \dots, f_n]^t.
The problem is to find a set of n real numbers, i.e. a vector x^* = [x_1^*, x_2^*, \dots, x_n^*]^t, simultaneously satisfying the n equations of the system (4.1).
Note that solving the system (4.1) is generally complex and rarely possible by any
of the elimination methods studied in chapter 3.
We can only hope for iterative methods for finding a solution of (4.1). Here we
present one such method called the Newton Raphson method.

4.2 Newton Method


Let us suppose x^* = [x_1^*, x_2^*, \dots, x_n^*]^t is a solution vector of the nonlinear system f_i(x) = 0, i = 1, 2, \dots, n.
If each function f_i is continuous and continuously differentiable, then by considering the Taylor series in a neighborhood of x* containing the approximation x^{(k)} obtained at the kth step, we get
f_i(x^*) = f_i\big(x^{(k)} + (x^* - x^{(k)})\big) = f_i(x^{(k)}) + \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}\bigg|_{x=x^{(k)}} (x_j^* - x_j^{(k)}) + \frac{1}{2!} \sum_{j=1}^{n} \sum_{r=1}^{n} (x_j^* - x_j^{(k)})(x_r^* - x_r^{(k)}) \frac{\partial^2 f_i}{\partial x_j \, \partial x_r}\bigg|_{x=x^{(k)}} + \cdots = 0 \qquad (4.3)

Since the approximation vector x^{(k)} is very close to x*, the terms (x_j^* - x_j^{(k)})^2 are small, and therefore all the higher-order terms, including the second-order terms, are negligible. So the system above becomes
\sum_{j=1}^{n} \frac{\partial f_i(x)}{\partial x_j}\bigg|_{x=x^{(k)}} (x_j^* - x_j^{(k)}) = -f_i(x^{(k)}), \quad i = 1, 2, \dots, n \qquad (4.4)



Define the Jacobian matrix J^{(k)} whose coefficients are the first-order partial derivatives evaluated at x^{(k)}, i.e.
J^{(k)} = \begin{bmatrix} \dfrac{\partial f_1(x)}{\partial x_1} & \dfrac{\partial f_1(x)}{\partial x_2} & \cdots & \dfrac{\partial f_1(x)}{\partial x_n} \\ \dfrac{\partial f_2(x)}{\partial x_1} & \dfrac{\partial f_2(x)}{\partial x_2} & \cdots & \dfrac{\partial f_2(x)}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_n(x)}{\partial x_1} & \dfrac{\partial f_n(x)}{\partial x_2} & \cdots & \dfrac{\partial f_n(x)}{\partial x_n} \end{bmatrix}_{x = x^{(k)}}
the error vector h (k ) by
h (j k ) = x *j − x (jk ) , j = 1,2,..., n.
and the vector F (k ) such that
Fi ( k ) = − f i (x ( k ) )
Then, the matrix relation (4.4) becomes
J^{(k)} h^{(k)} = -f^{(k)}, \text{ which implies } h^{(k)} = -(J^{(k)})^{-1} f^{(k)} \qquad (4.5)
while J^{(k)} is invertible.
In Eq.(4.5), all the quantities are known except the vector h (k ) . Equation (4.5) is a linear
system and hence the methods of solution of the linear systems studied previously are
applicable to determine h.
Now h (k ) is an approximation of the error committed in approximating x* by x(k).
We will obtain a better approximation x(k+1) of x* by
x(k+1) = x(k) + h(k)
or more generally
x ( k +1) = x ( k ) − (J k ) −1 f k .
we continue the process until x * −x ( k ) → 0.
In practice, x* is unknown and so given the tolerance ε and the maximum number
of iteration kmax we stop the iterations when one of the following conditions is true:
1. xi( k +1) − xi( k ) < ε
2. (x ( k +1)
i )
− xi( k ) / xi( k +1) < ε
3. f i (x (k +1) ) < ε
4. k > kmax
for i=1,2,3, …,n
Example: Using Newton's method approximate the solution of
3x^2 + 4y + \sin z = 0
e^x + x\sqrt{y} + \ln z = 0
x y^4 z = 0
with initial approximation x^{(0)} = y^{(0)} = z^{(0)} = 1, where z is in radians.
Solution: Let us construct the Jacobian as follows



\frac{\partial f_1}{\partial x} = 6x; \quad \frac{\partial f_1}{\partial y} = 4; \quad \frac{\partial f_1}{\partial z} = \cos z
\frac{\partial f_2}{\partial x} = e^x + \sqrt{y}; \quad \frac{\partial f_2}{\partial y} = \frac{x}{2\sqrt{y}}; \quad \frac{\partial f_2}{\partial z} = \frac{1}{z}
\frac{\partial f_3}{\partial x} = y^4 z; \quad \frac{\partial f_3}{\partial y} = 4xy^3 z; \quad \frac{\partial f_3}{\partial z} = xy^4
Now, constructing the Jacobian and using Eq. (4.5), the system becomes:
\begin{bmatrix} 6x & 4 & \cos z \\ e^x + \sqrt{y} & x/(2\sqrt{y}) & 1/z \\ y^4 z & 4xy^3 z & xy^4 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix} = \begin{bmatrix} -3x^2 - 4y - \sin z \\ -e^x - x\sqrt{y} - \ln z \\ -x y^4 z \end{bmatrix} \qquad (4.6)

Taking x^(0) = y^(0) = z^(0) = 1 as an initial approximation, Eq. (4.6) reduces to:

\[
\begin{bmatrix} 6 & 4 & 0.5403 \\ 3.7183 & 0.5 & 1 \\ 1 & 4 & 1 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix}
=
\begin{bmatrix} -7.8415 \\ -3.7183 \\ -1 \end{bmatrix}
\]
Solving this system results in h1 = −1.2674, h2 = −0.2076, h3 = 1.0979. Using these
results, the next approximation for the unknown variable can be calculated as follows.
x (1) = x ( 0 ) + h1 = 1 − 1.2674 = −0.2674
y (1) = y ( 0) + h2 = 1 − 0.2076 = 0.7924
z (1) = z ( 0 ) + h3 = 1 + 1.0979 = 2.0979
Now substituting the values of x(1),y(1),z(1) in Eq (4.6), we construct a new linear system
and solve it to obtain the new correction factor {h} and then calculate the new
approximations x(2),y(2), and z(2). This process is continued until the convergence
condition is fulfilled.
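A compact pure-Python sketch of one Newton step (function and variable names are ours, not from the text; the small Gaussian elimination stands in for any linear solver) reproduces the correction h and the new iterate for the worked example:

```python
import math

def F(v):                       # the three residual functions f1, f2, f3
    x, y, z = v
    return [3*x**2 + 4*y + math.sin(z),
            math.exp(x) + x*math.sqrt(y) + math.log(z),
            x * y**4 * z]

def J(v):                       # hand-coded Jacobian of F
    x, y, z = v
    return [[6*x,                        4.0,                 math.cos(z)],
            [math.exp(x) + math.sqrt(y), x/(2*math.sqrt(y)),  1.0/z],
            [y**4 * z,                   4*x*y**3*z,          x*y**4]]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def newton_step(v):
    h = solve(J(v), [-fi for fi in F(v)])      # solve J h = -f  (Eq. 4.5)
    return [vi + hi for vi, hi in zip(v, h)], h

v1, h = newton_step([1.0, 1.0, 1.0])
# h ≈ (-1.2674, -0.2076, 1.0979) and v1 ≈ (-0.2674, 0.7924, 2.0979), as in the text
```

Iterating `newton_step` until one of the stopping criteria above holds completes the method; note that √y and ln z restrict the iterates to y > 0 and z > 0.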
Exercise:
1. Consider the nonlinear system
f(x, y) = x² + y² − 25 = 0
g(x, y) = x² − y − 2 = 0
Using a software package that has 2D plotting capabilities, illustrate what is going on
in solving such a system by plotting f(x, y), g(x, y), and show their intersection with
the xy-plane. Determine approximate roots of these equations from the plot.
2. Using Newton's method approximate the solution of
a) x² + y² + z² = 1
   2x² + y² − 4z = 0
   3x² − 4y + z² = 0
   with initial values x^(0) = y^(0) = z^(0) = 0.5;
b) x + y + z = 3
   x² + y² + z² = 5
   eˣ + xy − xz = 1
   with initial values x^(0) = y^(0) = z^(0) = 0;
c) x + y + z = 0
   x² + y² + z² = 2
   x(y + z) = −1
   with initial values (3/4, 1/2, −1/2).



5 Finite differences
Introduction
Before the introduction of computers, calculations were performed either by slide rule or with the aid of desk calculators, and hence there was a need for methods that avoided excessive calculation. Numerical methods came as a help, and methods based on differences were well suited and extensively used. For example, the value of the
function at an untabulated point may be required, so that an interpolation procedure is
necessary. It is also possible to estimate the derivative or the definite integral of a
tabulated function, using some finite processes to approximate the corresponding
(infinitesimal) limiting procedures of calculus. In each case, it has been traditional to use
finite differences. Another application of finite differences, which is outside of the scope
of this book, is the numerical solution of partial differential equations.
Finite-Difference Table
Although most tables of mathematical functions use constant argument intervals,
some functions do change rapidly in value in particular regions of their argument, and
hence may best be tabulated using intervals varying according to the local behaviour of
the function. Tables with varying argument intervals are more difficult to work with; hence it is common to adopt uniform argument intervals wherever possible, and finite-difference methods are applicable to such tables.
It is extremely important that the interval between successive values is small
enough to display the variation of the tabulated function, because usually the value of the
function will be needed at some argument value between values specified. If the table is
constructed in this manner, we can obtain such intermediate values to reasonable
accuracy by assuming a polynomial representation (hopefully, of low degree) of the
function f.
Since Newton, finite differences have been used extensively. The construction of
a table of finite differences for a tabulated function is simple: One obtains first
differences by subtracting each value from the succeeding value in the table, second
differences by repeating this operation on the first differences, and so on for higher order
differences. For example for the exponential function f ( x ) = e x where 0.20 ≤ x ≤ 0.25
with a constant argument interval 0.01(0.20(0.01)0.25) to 6S we have the difference table

Differences
x f(x) 1st 2nd 3rd
0.20 1.22140
0.01228
0.21 1.23368 0.00012
0.01240 0.00000
0.22 1.24608 0.00012
0.01252 0.00001
0.23 1.25860 0.00013
0.01265 0.00000
0.24 1.27125 0.00013
0.01278
0.25 1.28403
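The construction rule is mechanical, so a short routine can reproduce the table; the sketch below (helper name is ours) regenerates the difference columns of the 6S table of eˣ:

```python
def difference_table(values):
    """Return [f, Δf, Δ²f, ...], each entry being the next column of differences."""
    table = [list(values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i+1] - prev[i] for i in range(len(prev) - 1)])
    return table

f = [1.22140, 1.23368, 1.24608, 1.25860, 1.27125, 1.28403]  # e^x, 0.20(0.01)0.25, 6S
tab = difference_table(f)
first  = [round(d, 5) for d in tab[1]]   # [0.01228, 0.01240, 0.01252, 0.01265, 0.01278]
second = [round(d, 5) for d in tab[2]]   # [0.00012, 0.00012, 0.00013, 0.00013]
```

The rounding to 5 decimals undoes the floating-point noise introduced by subtracting nearly equal 6S values.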



Influence of round-off errors
Consider the difference table given below for f ( x ) = e x : 0.2 (0.05) 0.45 to 6S
constructed as in the preceding Table. As before, differences of increasing order decrease
rapidly in magnitude, but the third differences are irregular. This is largely a consequence
of round-off errors.

Differences
x f(x) 1st 2nd 3rd
0.20 1.22140
0.06263
0.25 1.28403 0.00320
0.06583 0.00018
0.30 1.34986 0.00338
0.06921 0.00016
0.35 1.41907 0.00354
0.07275 0.00020
0.40 1.49182 0.00374
0.07649
0.45 1.56831

Although the round-off errors in f should be less than 1/2 in the last significant place,
they may accumulate; the greatest error that can be obtained corresponds to:

                         Differences
Tabular error    1st    2nd    3rd    4th    5th    6th

–1
–½ +2
+1 –4
+½ –2 +8
–1 +4 – 16
–½ +2 –8 +32
+1 –4 +16
+½ –2 +8
–1 +4
–½ +2
+1

A rough working criterion for the expected fluctuations (`noise level') due to round-off
error is shown in the table:



Let us assume that an error ε is made at some tabulated value (since we are interested only in the build-up of errors, for the analysis we take the values of f(x) to be zeros):

                               Differences
f(x)      1st       2nd       3rd       4th       5th       6th
 0
           0
 0                   0
           0                    ε
 0                   ε                   −4ε
           ε                  −3ε                  10ε
0+ε                 −2ε                   6ε                 −20ε
          −ε                   3ε                 −10ε
 0                   ε                   −4ε
           0                   −ε
 0                   0
           0
 0
Note that the maximum error occurs directly opposite the entry whose functional value is in error by ε.
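The build-up shown above can be verified directly: take ε = 1, difference a run of zeros containing one unit error, and watch the alternating binomial coefficients appear, with the maximum opposite the erroneous entry (function name is ours):

```python
def diffs(seq, order):
    """Apply the forward difference operator `order` times."""
    for _ in range(order):
        seq = [seq[i+1] - seq[i] for i in range(len(seq) - 1)]
    return seq

f = [0, 0, 0, 1, 0, 0, 0]                  # a single error ε = 1 at the centre
assert diffs(f, 2) == [0, 1, -2, 1, 0]     # binomial spread, alternating signs
assert diffs(f, 4) == [-4, 6, -4]
assert diffs(f, 6) == [-20]                # maximum error opposite the bad entry
```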
EXERCISES
1. Construct a difference table for the function f (x) = x3 for x = 0(1) 6.
2. Construct a difference table for each of the polynomial functions:
a) 2x − 1 for x = 0(1)3.
b) 3x2 + 2x - 4 for x = 0(1)4.
c) 2x3 + 3x - 3 for x = 0(1)5.
Study your resulting tables carefully; note what happens in the final few columns of each table. Suggest a general result for polynomials of degree n and compare your answer with the theorem in Section 5.5.
3. Construct a difference table for the function f (x) = ex, given to 7S for x = 0.1(0.05)
0.5:

Forward, backward, central difference notations


There are several different notations for the single set of finite differences, described in
the preceding Step. We introduce each of these three notations in terms of the so-called
shift operator, which we will define first.

5.1 The shift operator E


Let f_j = f(x_j), where x_j = x₀ + jh, j = 0, 1, 2, ..., n, be a set of values of the function f(x). The shift operator E is defined by
E f_j = f_{j+1} = f(x_j + h).



Consequently,
E² f_j = E(E f_j) = E f_{j+1} = f_{j+2} = f(x_j + 2h),
and so on, i.e.,
Eᵏ f_j = f_{j+k} = f(x_j + kh),
where k is any positive integer. Moreover, the last formula can be extended to negative integers, and indeed to all real values of j and k, so that, for example,
E⁻¹ f_j = f_{j−1},
and
E^(1/2) f_j = f_{j+1/2} = f(x_j + ½h) = f(x₀ + (j + ½)h).

Example 1:- Find the value of E 2 x 2 when the value of x may vary by a constant
increment of 2.
Solution: Let f(x) = x². Since the constant increment is h = 2, we obtain
E²x² = E²f(x) = f(x + 2h) by definition; hence
E²x² = (x + 4)² = x² + 8x + 16.
Example 2: Find the value of Eⁿeˣ when x may vary by a constant interval of h.
Solution: Similarly to Example 1, Eⁿeˣ = e^(x+nh).
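The shift operator can be mimicked by a higher-order function; a small sketch (names are ours) checks Example 1 numerically at one point:

```python
def E(f, h):
    """Shift operator: (E f)(x) = f(x + h)."""
    return lambda x: f(x + h)

square = lambda x: x * x
E2 = E(E(square, 2), 2)          # E² with constant increment h = 2
assert E2(3) == (3 + 4) ** 2     # E²x² = (x + 4)², checked at x = 3
```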

5.2 The forward difference operator Δ


If we define the forward difference operator Δ by
Δ ≡ E − 1,  i.e.  E ≡ Δ + 1,   (1)
then
Δf_j = (E − 1) f_j = E f_j − f_j = f_{j+1} − f_j = f(x_j + h) − f(x_j),
which is the first-order forward difference at xj. Similarly, we find that
Δ2 f j = Δ ( Δf j ) = Δf j +1 − Δf j = f j + 2 − 2 f j +1 + f j
is the second-order forward difference at xj, and so on. The forward difference of
order k is
Δk f j = Δk −1 ( Δf j ) = Δk −1 ( f j +1 − f j ) = Δk −1 f j +1 − Δk −1 f j ,
where k is any integer.
In (1), E and 1 are mere operators, not algebraic numbers: 1 denotes the operation on f(x) which makes no change in the value of the function, i.e. the identity operator.
Example 3: Find the value of Δ tan −1 x, the interval of difference being h.
Solution: Δ tan⁻¹x = tan⁻¹(x + h) − tan⁻¹x   (by definition of Δ)
= tan⁻¹((x + h − x)/(1 + (x + h)x))   (since tan⁻¹A − tan⁻¹B = tan⁻¹((A − B)/(1 + AB)))
= tan⁻¹(h/(1 + hx + x²)).
Example 4: Prove that
Δ log f(x) = log(1 + Δf(x)/f(x)).
Solution: Δ log f ( x ) = log f ( x + h ) − log f ( x )



= log(f(x + h)/f(x)) = log(E f(x)/f(x))
= log((1 + Δ)f(x)/f(x)) = log([f(x) + Δf(x)]/f(x))
= log(1 + Δf(x)/f(x)).
Example 5: Find the function whose first difference is eˣ.
Solution: By definition, Δeˣ = e^(x+h) − eˣ = eˣ(e^h − 1), where h is the common interval length. So we get
eˣ = (e^h − 1)⁻¹ Δeˣ, which in turn implies that Δ{eˣ/(e^h − 1)} = eˣ.
Hence the function whose first difference is eˣ is given by f(x) = eˣ/(e^h − 1).

5.3 The backward difference operator ∇


If we define the backward difference operator ∇ by
∇ ≡ 1 − E −1 ,
then
∇f j = (1 − E −1 ) f j = f j − E −1 f j = f j − f j −1 = f ( x j ) − f ( x j − h ),
which is the first-order backward difference at xj. Similarly,
∇ 2 f j = ∇(∇f j ) = ∇f j − ∇f j −1 = f j − 2 f j −1 + f j − 2 ,
is the second-order backward difference at xj, etc. The backward difference of order
k is
∇ k f j = ∇ k −1 (∇f j ) = ∇ k −1 ( f j − f j −1 ) = ∇ k −1 f j − ∇ k −1 f j −1 ,
where k is any integer. Note that ∇f j = Δf j −1 , and ∇ k f j = Δk f j − k .
Example 6: Find the value of ∇ 2 x 3 when the value of x may vary by a constant
increment of 1.
Solution: ∇ 2 x 3 = (1 − E −1 ) 2 x 3 = (1 − 2 E −1 + E −2 ) x 3
= x 3 − 2 E −1 x 3 + E −2 x 3 = x 3 − 2( x − 1) 3 + ( x − 2) 3
= x 3 − 2( x 3 − 3x 2 + 3x − 1) + ( x 3 − 6 x 2 + 12 x − 8)
= 6( x − 1).

5.4 The central difference operator δ


If we define the central difference operator δ by
δ ≡ E^(1/2) − E^(−1/2),
then
δf_j = (E^(1/2) − E^(−1/2)) f_j = E^(1/2) f_j − E^(−1/2) f_j = f_{j+1/2} − f_{j−1/2},



which is the first-order central difference at x_j. Similarly,
δ²f_j = δ(δf_j) = δ(f_{j+1/2} − f_{j−1/2}) = f_{j+1} − 2f_j + f_{j−1}
is the second-order central difference at x_j, etc. The central difference of order k is
δᵏf_j = δᵏ⁻¹(δf_j) = δᵏ⁻¹(f_{j+1/2} − f_{j−1/2}) = δᵏ⁻¹f_{j+1/2} − δᵏ⁻¹f_{j−1/2},

where k is any integer. Note that δf_{j+1/2} = Δf_j = ∇f_{j+1}.

Example 7: Find the value of δ²x³ when the value of x may vary by a constant increment of 1.
Solution: δ²x³ = (E^(1/2) − E^(−1/2))² x³ = (E − 2 + E⁻¹) x³
= Ex³ − 2x³ + E⁻¹x³
= (x + 1)³ − 2x³ + (x − 1)³
= 6x.
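The three operators can likewise be modelled as higher-order functions; the sketch below (names are ours) checks Example 6 and Example 7 numerically at x = 5:

```python
h = 1                                        # common interval

def delta(f):     # forward difference Δ
    return lambda x: f(x + h) - f(x)

def nabla(f):     # backward difference ∇
    return lambda x: f(x) - f(x - h)

def delta2_c(f):  # second-order central difference δ² = E − 2 + E⁻¹
    return lambda x: f(x + h) - 2*f(x) + f(x - h)

cube = lambda x: x**3
assert nabla(nabla(cube))(5) == 6 * (5 - 1)   # Example 6: ∇²x³ = 6(x − 1)
assert delta2_c(cube)(5) == 6 * 5             # Example 7: δ²x³ = 6x
assert delta(delta(delta(cube)))(0) == 6      # Δ³x³ = 3!·h³ = 6 (constant)
```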
Forward difference table
A forward difference table and an ordinary difference table are identical, except that in a forward difference table the forward difference operator is used to label the order of each difference column. See the table below.

Differences
x f(x)
Δ Δ2 Δ3
0.20 1.22140
0.01228
0.21 1.23368 0.00012
0.01240 0.00000
0.22 1.24608 0.00012
0.01252 0.00001
0.23 1.25860 0.00013
0.01265 0.00000
0.24 1.27125 0.00013
0.01278
0.25 1.28403

Once we construct a forward difference table we can read the results for the other difference operators from the same table, so there is no need to construct separate tables for central and backward differences.
Example 8: If we let x₀ = 0.20 in the last difference table, since h = 0.01 we can easily see from the table that Δf₁ = 0.01240 while ∇f₁ = 0.01228; moreover
Δ²f₂ = 0.00013, ∇²f₂ = 0.00012, and δ²f₂ = 0.00012. But we cannot read values like
δf₁, ∇f₀, Δf₅, ∇²f₁, Δ⁴f₄, etc., just from the difference table.
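This can be confirmed in code: one array of first and second differences serves all three notations, since ∇f_j = Δf_{j−1} and δ²f_j = Δ²f_{j−1} (names are ours):

```python
f = [1.22140, 1.23368, 1.24608, 1.25860, 1.27125, 1.28403]
d1 = [f[j+1] - f[j] for j in range(len(f) - 1)]      # first differences
d2 = [d1[j+1] - d1[j] for j in range(len(d1) - 1)]   # second differences

assert round(d1[1], 5) == 0.01240    # Δf1 = f2 − f1
assert round(d1[0], 5) == 0.01228    # ∇f1 = f1 − f0 = Δf0
assert round(d2[1], 5) == 0.00012    # δ²f2 = Δ²f1 = ∇²f3
```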



Although forward, central, and backward differences represent precisely the same
data:
1. Forward differences are useful near the start of a table, since they only involve
tabulated function values below xj ;
2. Central differences are useful away from the ends of a table, where there are
available tabulated function values above and below xj;
3. Backward differences are useful near the end of a table, since they only involve
tabulated function values above xj.
EXERCISES
1. Construct a table of differences for the polynomial
f ( x ) = 3x 3 − 2 x 2 + x + 5 ;
for x = 0(1)4. Use the table to obtain the values of :
a) Δf₁, Δ²f₁, Δ³f₁, Δ³f₀, Δ²f₂;
b) ∇f₁, ∇f₂, ∇²f₂, ∇²f₃, ∇³f₄;
c) δf_{1/2}, δ²f₁, δ³f_{3/2}, δ³f_{5/2}, δ²f₂.
2. For the difference table of f (x) = ex for x = 0.2(0.05)0.45 determine to six significant
digits the quantities (taking x0 = 0.2 ):
a) Δf₂, Δ²f₂, Δ³f₂, Δ⁴f₂;
b) δ²f₂, δ⁴f₃;
c) Δ²f₁, δf_{3/2}, ∇²f₃.
3. Prove the statements:
a) Ex_j = x_{j+1};
b) Δ³f_j = f_{j+3} − 3f_{j+2} + 3f_{j+1} − f_j;
c) ∇³f_j = f_j − 3f_{j−1} + 3f_{j−2} − f_{j−3};
d) δ³f_j = f_{j+3/2} − 3f_{j+1/2} + 3f_{j−1/2} − f_{j−3/2}.

4. Evaluate (Δ/E)² x³, where the interval of difference equals 1.
5. Prove that
eˣ = (Δ²/E) eˣ · (E eˣ / Δ²eˣ),
the interval of difference being h.
6. Show that
a) ( Δ − ∇) ≡ Δ∇ ;
b) (1 + Δ )(1 − ∇) ≡ 1 ;
c) ∇Δ ≡ δ 2 .
7. Show that all the difference operators commute with one another; for example, EΔ ≡ ΔE, δ∇ ≡ ∇δ, etc.
8. Simplify (Δ + ∇)² f(x), the common step length being h.



5.5 Finite Differences of Polynomials
Since polynomial approximations are used in many areas of Numerical Analysis, it is
important to investigate the phenomena of differencing polynomials.
Consider the finite differences of an n-th degree polynomial
f ( x ) = a n x n + a n −1 x n −1 + ... + a1 x + a 0 ,
tabulated for equidistant points at the tabular interval h.
Theorem: The n-th difference of a polynomial of degree n is a constant proportional to
hn, and higher order differences are zero.
Proof: For any positive integer k, the binomial expansion
(x_j + h)ᵏ = Σ_{i=0}^{k} (k!/(i!(k − i)!)) x_j^{k−i} hⁱ
yields
(x_j + h)ᵏ − x_jᵏ = k x_j^{k−1} h + polynomial of degree (k − 2) in x_j.
Omitting the subscript on x, we find
Δf(x) = f(x + h) − f(x) = aₙ[(x + h)ⁿ − xⁿ] + aₙ₋₁[(x + h)ⁿ⁻¹ − xⁿ⁻¹] + ⋯ + a₁[(x + h) − x]
= aₙ n xⁿ⁻¹ h + polynomial of degree (n − 2),
Δ²f(x) = aₙ n h[(x + h)ⁿ⁻¹ − xⁿ⁻¹] + ⋯
= aₙ n(n − 1) xⁿ⁻² h² + polynomial of degree (n − 3),
⋮
Δⁿf(x) = aₙ n! hⁿ = constant,
Δⁿ⁺¹f(x) = 0.
In passing, the student may recall that in Differential Calculus the increment
Δf ( x ) = f ( x + h ) − f ( x ) is related to the derivative of f (x) at the point x.
Conversely, if the nth finite differences of a tabulated function are constant, then f(x) is a polynomial of degree n.
This converse result is very helpful. If, in an experimental setup, we obtain a table of values relating two variables whose functional relation y = f(x) is not known to us, and the differences of some particular order turn out to be all constant, then we can infer that f is a polynomial whose degree corresponds to that order. This concept helps us to establish relationships for experimental results. We shall see in the next chapter how this can be done through numerical examples.
Examples
1) Determine Δ3 {(1 + x )(1 − 3x )(1 + 5 x )}, where the common interval length is 1.
Solution: Observe that (1 + x )(1 − 3x )(1 + 5 x ) = 1 + 3x − 13x 2 − 15 x 3 . From the theorem
above, we can see that for a polynomial of nth degree, the difference is constant being
equal to a n n! h n where a n is the coefficient of xn in the polynomial, h is the interval of
differencing.



Here aₙ = −15, h = 1, and the polynomial 1 + 3x − 13x² − 15x³ is of degree 3.
Hence when the operator Δ is operated on the polynomial three times, then from the
theorem we have
Δ3 {(1 + x )(1 − 3 x )(1 + 5 x )} = Δ3 (1 + 3x − 13 x 2 − 15 x 3 )
= ( −15)3! (1) 3 = −90.
2) Find Δ10 {(1 − ax )(1 − bx 2 )(1 − cx 3 )(1 − dx 4 )} where the common interval length is 1
Solution: Evidently the maximum power of x in the polynomial will be 10 and the
coefficient of x10 will be abcd.
Hence here a n = abcd , h = 1, and n = 10. Using the theorem again we get
Δ10 {(1 − ax )(1 − bx 2 )(1 − cx 3 )(1 − dx 4 )} = ( abcd )10! (1) 10
= (abcd )10!.
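The theorem, and Example 1 above, are easy to confirm numerically; the sketch below (names are ours) differences the cubic three times and checks that the result is the constant −90:

```python
def forward_diff(seq):
    return [seq[i+1] - seq[i] for i in range(len(seq) - 1)]

p = lambda x: (1 + x) * (1 - 3*x) * (1 + 5*x)   # degree 3, a3 = -15
d = [p(x) for x in range(8)]                     # tabulated with h = 1
for _ in range(3):
    d = forward_diff(d)
assert all(v == -90 for v in d)                  # Δ³ = a3·3!·h³ = -90
assert all(v == 0 for v in forward_diff(d))      # Δ⁴ = 0
```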
3) Construct for f(x) = x³ with x = 5.0(0.1)5.5 the difference table:

x f(x) Δ Δ2 Δ3 Δ4
5.0 125.000
7.651
5.1 132.651 0.306
7.957 0.006
5.2 140.608 0.312 0.000
8.269 0.006
5.3 148.877 0.318 0.000
8.587 0.006
5.4 157.464 0.324
8.911
5.5 166.375

Since in this case n = 3, an =1, h = 0.1, we find Δ3 f ( x ) = 1(3! )(0.1) 3 = 0.006.


Note that round-off error noise may occur; for example, consider the tabulation of
f ( x ) = x 3 for 5.0(0.1)5.5, rounded to two decimal places:



Whenever the higher differences of a table become small (allowing for round-off noise), the function represented may be approximated well by a polynomial.
4) By constructing a difference table, find the sixth term of the sequence 8, 12, 19, 29, 42, …

Solution: The difference table is as given below
x f(x) Δf(x) Δ2f(x)
1 8
4
2 12 3
7
3 19 3
10
4 29 3
13
5 42 3
16
6 58
The various steps for constructing the above table are:
i) Construct the difference table for the given values of the sequence.
ii) Note that the values of Δ²f(x) are all equal to 3.
iii) Write the 6th row and put 3 in the column of Δ²f(x).
iv) Add 3 to 13 in the column of Δf(x) and write 16 below 13.
v) Add 16 to 42 of f(x) to get 58, which is the required sixth term.
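The steps above generalize to any sequence whose differences of some order are constant; a minimal sketch (names are ours):

```python
def next_term(seq, order):
    """Extend a tabulated sequence one step, assuming the differences
    of the given order are constant."""
    table = [list(seq)]
    for _ in range(order):
        prev = table[-1]
        table.append([prev[i+1] - prev[i] for i in range(len(prev) - 1)])
    table[-1].append(table[-1][-1])          # repeat the constant difference
    for r in range(order - 1, -1, -1):       # walk back up the table
        table[r].append(table[r][-1] + table[r+1][-1])
    return table[0][-1]

assert next_term([8, 12, 19, 29, 42], 2) == 58
assert next_term([0, 1, 8, 27, 64], 3) == 125    # cubes: the next is 5³
```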
EXERCISES
1. Construct a difference table for the polynomial f(x) = x4 for x = 0(0.1)1 when
a) the values of f are exact;
b) the values of f have been rounded to 3D. Compare the fourth-difference round-off errors with the estimate ±6.
2. Find the degree of the polynomial which fits the data in the table:
a) b)
x f(x) x f(x) x f(x) x f(x)
0 3 3 24 0 0 3 15
1 2 4 59 1 3 4 24
2 7 5 118 2 8 5 35

3. Find the tenth term of the sequence 3, 14, 39, 84, 155, 258, …
4. Given f 0 = 3, f 1 = 12, f 2 = 81, f 3 = 200, f 4 = 100, f 5 = 8, find Δ5 f 0 without
constructing a difference table.
5. Show that the kth forward difference Δᵏf₁ in a difference table may be expressed as
Δᵏf₁ = C(k,0) f_{k+1} − C(k,1) f_k + C(k,2) f_{k−1} − ⋯ + (−1)ᵏ C(k,k) f₁,
where the coefficients C(k, j) = k!/(j!(k − j)!) are the binomial coefficients.



6 Interpolation
Interpolation is the art of reading between the lines in a table. It may be regarded as a
special case of the general process of curve fitting. More precisely, interpolation is the
process whereby untabulated values of a function, given only at certain values, are
estimated on the assumption that the function has sufficiently smooth behavior
between tabular points, so that it can be approximated by a polynomial of fairly low
degree.

Interpolation is not as important in Numerical Analysis as it once was, now that computers (and calculators with built-in functions) are available, and function values may often be obtained readily by an algorithm (probably from a standard subroutine). However,

1. interpolation is still important for functions that are available only in tabular
form (perhaps from the results of an experiment); and
2. interpolation serves to introduce the wider application of finite differences.

In Chapter 5 we have observed that, if the differences of order k are constant (within
round-off fluctuation), the tabulated function may be approximated by a polynomial of
degree k. Linear and quadratic interpolation correspond to the cases k = 1 and k = 2,
respectively.

6.1 Linear interpolation

When a tabulated function varies so slowly that first differences are


approximately constant, it may be approximated closely by a straight line
between adjacent tabular points. This is the basic idea of linear interpolation.
In Fig.6.1, the two function points (xj, fj) and (xj+1, fj+1) are connected by a straight
line. Any x between x_j and x_{j+1} may be defined by a value of q such that
x = x_j + qh,   0 ≤ q ≤ 1.
If f (x) varies only slowly in the interval, a value of the function at x is


approximately given by the ordinate to the straight line at x. Elementary geometrical considerations yield
(f(x) − f_j)/(x − x_j) ≈ (f_{j+1} − f_j)/h,
so that
f(x) ≈ f_j + q(f_{j+1} − f_j) = f_j + q Δf_j.



FIGURE 6.1 Linear interpolation.

In analytical terms, we have approximated f(x) by
P₁(x) = f_j + (x − x_j)(f_{j+1} − f_j)/h,
the linear function of x which satisfies
P₁(x_j) = f_j,   P₁(x_{j+1}) = f_{j+1}.
As an example, consider the following difference table, taken from a 4D table of


e⁻ˣ:

The first differences are almost constant locally, so that the table is suitable for
linear interpolation. For example,



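A minimal sketch of linear interpolation on a table of constant step h (the function name and the 4D sine data are ours, not the text's worked e⁻ˣ table, which has not survived reproduction here):

```python
import math

def linear_interp(x0, h, table, x):
    """f(x) ≈ f_j + q·Δf_j with q = (x − x_j)/h, for a table of constant step h."""
    j = int((x - x0) / h)                # tabular index at or below x
    q = (x - x0) / h - j
    return table[j] + q * (table[j+1] - table[j])

sin_table = [0.4794, 0.5646]             # sin x to 4D at x = 0.5, 0.6
est = linear_interp(0.5, 0.1, sin_table, 0.55)   # ≈ 0.5220; true sin 0.55 ≈ 0.5227
```

The slowly varying first difference (0.0852) makes the linear estimate accurate to about three decimals here.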

6.2 Quadratic interpolation

As previously indicated, linear interpolation is appropriate only for slowly


varying functions. The next simple process is quadratic interpolation, based on
a quadratic approximating polynomial; one might expect that such an
approximation would give better accuracy for functions with larger variations.

Given three adjacent points x_j, x_{j+1} = x_j + h and x_{j+2} = x_j + 2h, suppose that f(x) can be approximated by
P₂(x) = a + b(x − x_j) + c(x − x_j)(x − x_{j+1}),
where a, b, and c are chosen so that
P₂(x_j) = f_j,   P₂(x_{j+1}) = f_{j+1},   P₂(x_{j+2}) = f_{j+2}.
Thus,
a = f_j,   a + bh = f_{j+1},   a + 2bh + 2ch² = f_{j+2},
whence
b = Δf_j/h,   c = Δ²f_j/(2h²).
Setting q = (x − x_j)/h, we obtain the quadratic interpolation formula:
f(x) ≈ f_j + qΔf_j + ½q(q − 1)Δ²f_j.

We note immediately that this formula introduces a second term (involving Δ²f_j), not included in the linear interpolation formula.

As an example, we determine the second-order correction to the value of f


(0.934) obtained above using linear interpolation. The extra term is



so that the quadratic interpolation formula yields

(In this case, the extra term -0.0024/200 is negligible!)

Checkpoint

1. What process obtains an untabulated value of a function?


2. When is linear interpolation adequate?
3. When is quadratic interpolation needed and adequate?

EXERCISES

4. Obtain an estimate of sin(0.55) by linear interpolation of f (x) = sin x over


the interval [0.5, 0.6] using the data:

Compare your estimate with the value of sin(0.55) given by your


calculator.

5. The entries in a table of cos x are:

Obtain an estimate of cos(80° 35') by means of

1. Linear interpolation,
2. quadratic interpolation.
6. The entries in a table of tan x are:

Is it more appropriate to use linear or quadratic interpolation? Obtain


an estimate of tan(80° 35').



6.3 Newton interpolation formulae

The linear and quadratic interpolation formulae are based on first and second degree
polynomial approximations. Newton has derived general forward and backward
difference interpolation formulae, corresponding for tables with constant interval h.

6.3.1 Newton's forward difference formula

Consider the points x_j, x_j + h, x_j + 2h, . . ., and recall that
E^θ f_j = f(x_j + θh),
where θ is any real number. Formally, one has (since E ≡ 1 + Δ)
f(x_j + θh) = (1 + Δ)^θ f_j = f_j + θΔf_j + (θ(θ − 1)/2!)Δ²f_j + (θ(θ − 1)(θ − 2)/3!)Δ³f_j + ⋯,
which is Newton's forward difference formula. The linear and quadratic (forward) interpolation formulae correspond to first and second order truncation, respectively. If we truncate at n-th order, we obtain
f(x_j + θh) ≈ f_j + θΔf_j + ⋯ + (θ(θ − 1)⋯(θ − n + 1)/n!)Δⁿf_j,
which is an approximation based on the values fj, fj+1,. . . , fj+n. It will be exact if
(within round-off errors)
Δⁿ⁺¹f_j = Δⁿ⁺²f_j = ⋯ = 0,
which is the case if f is a polynomial of degree n.

6.3.2 Newton's backward difference formula

Formally, one has (since E⁻¹ ≡ 1 − ∇)
f(x_j + θh) = (1 − ∇)^{−θ} f_j = f_j + θ∇f_j + (θ(θ + 1)/2!)∇²f_j + (θ(θ + 1)(θ + 2)/3!)∇³f_j + ⋯,
which is Newton's backward difference formula. The linear and quadratic (backward) interpolation formulae correspond to truncation at first and second order, respectively. The approximation based on f_{j−n}, f_{j−n+1}, . . ., f_j is
f(x_j + θh) ≈ f_j + θ∇f_j + ⋯ + (θ(θ + 1)⋯(θ + n − 1)/n!)∇ⁿf_j.



Use of Newton's interpolation formulae

Newton's forward and backward difference formulae are well suited for use at
the beginning and end of a difference table, respectively. (Other formulae,
which use central differences, may be more convenient elsewhere.)

As an example, consider the difference table of f (x) = sin x for x = 0°( 10°)50°:

Since the fourth order differences are constant, we conclude that a quartic
approximation is appropriate. (The third-order differences are not quite
constant within expected round-offs, and we anticipate that a cubic
approximation is not quite good enough.) In order to determine sin 5° from the
table, we use Newton's forward difference formula (to fourth order); thus,
taking x_j = 0, we find θ = 5/10 = 0.5, and

Note that we have kept a guard digit (in parentheses) to minimize accumulated
round-off error.
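The sin 5° computation can be reproduced in code; the sketch below (names are ours) rebuilds the 4D table, extracts the top diagonal f₀, Δf₀, Δ²f₀, …, and evaluates the forward formula at θ = 0.5:

```python
import math

def newton_forward(diag, theta):
    """Newton's forward formula from the top diagonal [f0, Δf0, Δ²f0, ...]."""
    total, coeff = 0.0, 1.0
    for k, d in enumerate(diag):
        total += coeff * d
        coeff *= (theta - k) / (k + 1)   # next binomial coefficient θ(θ−1)…/(k+1)!
    return total

f = [round(math.sin(math.radians(d)), 4) for d in range(0, 60, 10)]  # 4D table
diag, col = [f[0]], f
while len(col) > 1:                       # extract f0, Δf0, Δ²f0, ...
    col = [col[i+1] - col[i] for i in range(len(col) - 1)]
    diag.append(col[0])

est = newton_forward(diag, 5 / 10)        # sin 5°: x_j = 0°, h = 10°, θ = 0.5
```

With the full fourth/fifth-order diagonal the estimate agrees with sin 5° ≈ 0.08716 to within the round-off noise of the 4D table.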

In order to determine sin 45° from the table, we use Newton's backward
difference formula (to fourth order); thus, taking x_j = 40, we find θ = (45 − 40)/10 = 0.5, and



Uniqueness of the interpolating polynomial

Given a set of values f(x0), f(x1), . . , f(xn) with xj = x0 + jh, we have two
interpolation formulae of order n available:

Clearly, Pn and Qn are both polynomials of degree n. It can be verified (Exercise


2) that Pn(xj) = Qn(xj) = f(xj) for j = 0,1, 2, . . . , n, which implies that Pn - Qn is a
polynomial of degree n which vanishes at (n + 1 ) points. In turn, this implies that
Pn − Qn ≡ 0, or Pn ≡ Qn. In fact, a polynomial of degree n through any given (n + 1)
(distinct but not necessarily equidistant) points is unique, and is called the
interpolating polynomial.

Analogy with Taylor series

If we define for an integer k
f_j^(k) = f^(k)(x_j) = (Dᵏf)(x_j),  where D ≡ d/dx,
the Taylor series about x_j becomes
f(x_j + θh) = Σ_{k=0}^{∞} ((θh)ᵏ/k!) f_j^(k) = e^{θhD} f_j.
Setting θ = 1, we have formally
E f_j = f(x_j + h) = e^{hD} f_j,  i.e.  E ≡ e^{hD}.


A comparison with Newton's interpolation formula

shows that the operator e^{hD} (applied to functions of a continuous variable) is analogous to the operator E (applied to functions of a discrete variable).

Checkpoint

1. What is the relationship between the forward and backward linear and
quadratic interpolation formulae (for a table of constant interval h) and
Newton's interpolation formulae?
2. When do you use Newton's forward difference formula?
3. When do you use Newton's backward difference formula?

EXERCISES

4. From a difference table of f (x) = ex to 5D for x = 0.10(0.05)0.40,


estimate:
1. e0.14 by means of Newton's forward difference formula;
2. e0.315 by means of Newton's backward difference formula.
5. Show that for j = 0, 1, 2, . . .,

6. Derive the equation of the interpolating polynomial for the data.



6.4 Lagrange interpolation formula

The linear and quadratic interpolation formulae correspond to first and second-degree
polynomial approximations, respectively. In section 6.3, we have discussed Newton's
forward and backward interpolation formulae and noted that higher order
interpolation corresponds to higher degree polynomial approximation. In this Step we
consider an interpolation formula attributed to Lagrange, which does not require
function values at equal intervals. Lagrange's interpolation formula has the
disadvantage that the degree of the approximating polynomial must be chosen at the
outset. Thus, Lagrange's formula is mainly of theoretical interest for us here; in passing,
we mention that there are some important applications of this formula beyond the scope
of this book - for example, the construction of basis functions to solve differential
equations using a spectral (discrete ordinate) method.

Procedure

Let the function f be tabulated at (n + 1), not necessarily equidistant, points x_j, j = 0, 1, . . ., n, and be approximated by the polynomial P_n(x) of degree at most n, such that
P_n(x_j) = f(x_j) = f_j.
Since for k = 0, 1, 2, . . ., n
L_k(x) = ∏_{i=0, i≠k}^{n} (x − x_i)/(x_k − x_i)
is a polynomial of degree n which satisfies
L_k(x_i) = 0 for i ≠ k,   L_k(x_k) = 1,
then
P_n(x) = Σ_{k=0}^{n} L_k(x) f_k
is a polynomial of degree (at most) n such that P_n(x_j) = f_j for j = 0, 1, . . ., n,



i.e., the (unique) interpolating polynomial. Note that for x = xj all terms in the
sum vanish except the j-th, which is fj; Lk(x) is called the k-th Lagrange
interpolation coefficient, and the identity
Σ_{k=0}^{n} L_k(x) ≡ 1
(established by setting f(x) = 1) may be used as a check. Note also that with n = 1
we recover the linear interpolation formula
P₁(x) = ((x − x₁)/(x₀ − x₁)) f₀ + ((x − x₀)/(x₁ − x₀)) f₁
in Section 6.1.

Example

We will use Lagrange's interpolation formula to find the interpolating


polynomial P3 through the points (0, 3), (1, 2), (2, 7), and (4, 59), and then find
the approximate value P3(3).

The Lagrange coefficients are:
L₀(x) = (x − 1)(x − 2)(x − 4)/((0 − 1)(0 − 2)(0 − 4)) = −(x − 1)(x − 2)(x − 4)/8,
L₁(x) = x(x − 2)(x − 4)/((1 − 0)(1 − 2)(1 − 4)) = x(x − 2)(x − 4)/3,
L₂(x) = x(x − 1)(x − 4)/((2 − 0)(2 − 1)(2 − 4)) = −x(x − 1)(x − 4)/4,
L₃(x) = x(x − 1)(x − 2)/((4 − 0)(4 − 1)(4 − 2)) = x(x − 1)(x − 2)/24.
(The student should verify that L₀(x) + L₁(x) + L₂(x) + L₃(x) = 1.) Hence, the required polynomial is
P₃(x) = 3L₀(x) + 2L₁(x) + 7L₂(x) + 59L₃(x) = x³ − 2x + 3.
Consequently, P₃(3) = 24. However, note that, if the explicit form of the interpolating polynomial were not required, one would proceed to



evaluate P3(x) for some value of x directly from the factored forms of Lk(x).
Thus, in order to evaluate P₃(3), one has
P₃(3) = 3(1/4) + 2(−1) + 7(3/2) + 59(1/4) = 24.
Notes of caution

In the case of the Newton interpolation formulae, considered in the preceding


Step, or the formulae to be discussed in the next Step, the degree of the required
approximating polynomial may be determined merely by computing terms until
they no longer appear to be significant. In the Lagrange procedure, the
polynomial degree must be chosen at the outset! Also, note that

1. a change of degree involves recomputation of all terms; and


2. for a polynomial of high degree the process involves a large number of
multiplications, whence it may be quite slow.

Lagrange interpolation should be used with considerable caution. For example,


let us employ it to obtain an estimate of ∛20 from the points (0, 0), (1, 1), (8, 2), (27, 3), and (64, 4) on y = ∛x. We find P₄(20) ≈ −1.31, which is not very close to the correct value 2.7144! A


better result (i.e., 2.6316) can be obtained by linear interpolation between (8, 2)
and (27, 3). The problem is that the Lagrange method yields no indication as to
how well ∛x is represented by a quartic. In practice, therefore, Lagrange
interpolation is used only rarely.
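The worked example (P₃ through (0, 3), (1, 2), (2, 7), (4, 59)) can be checked with a direct implementation of the formula (function name is ours):

```python
def lagrange_eval(points, x):
    """Evaluate the Lagrange interpolating polynomial through `points` at x."""
    total = 0.0
    for k, (xk, fk) in enumerate(points):
        Lk = 1.0                          # k-th Lagrange coefficient L_k(x)
        for i, (xi, _) in enumerate(points):
            if i != k:
                Lk *= (x - xi) / (xk - xi)
        total += Lk * fk
    return total

pts = [(0, 3), (1, 2), (2, 7), (4, 59)]
assert lagrange_eval(pts, 2) == 7.0               # reproduces a tabular point
assert abs(lagrange_eval(pts, 3) - 24) < 1e-9     # P3(3) = 24
```

Note that the degree is fixed by the number of points supplied, which is precisely the caution raised above.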

Checkpoint

1. When is the Lagrange interpolation formula used in practical computations?


2. What distinguishes the Lagrange formula from many other interpolation
formulae?
3. Why should the Lagrange formula be used in practice only with caution?

EXERCISE

Given that f (-2) = 46, f (-1 ) = 4, f ( 1 ) = 4, f (3) = 156, and f (4) = 484, use Lagrange's
interpolation formula to estimate the value of f(0).



6.5 Divided Difference Interpolation
We noted that the Lagrange interpolation formula is mainly of theoretical interest, for at
best it involves very considerable computation in practice, and it can be quite dangerous
to use. It is much more efficient to use divided differences to interpolate a tabular
function with unequally spaced arguments, and at the same time it is relatively safe since
the necessary degree of collocation polynomial can be decided. We define divided
differences as below.
Suppose the function f ( x) is tabulated at the (not necessarily equidistant) points
{x0 , x1 ,..., xn }. We define the divided differences between points thus:
first divided difference (say, between x₀ and x₁) by
f[x₀, x₁] = (f(x₁) − f(x₀))/(x₁ − x₀) = (f₁ − f₀)/(x₁ − x₀) = f[x₁, x₀];
second divided difference (say, between x₀, x₁ and x₂) by
f[x₀, x₁, x₂] = (f[x₁, x₂] − f[x₀, x₁])/(x₂ − x₀);
and so on to the nth divided difference (between x₀, x₁, . . ., xₙ)
f[x₀, x₁, . . ., xₙ] = (f[x₁, x₂, . . ., xₙ] − f[x₀, x₁, . . ., xₙ₋₁])/(xₙ − x₀).
Example
Construct a divided difference table from the following data:

x 0 1 3 6 10
f(x) 1 –6 4 169 921

The difference table is as follows:

 x     f(x)    1st     2nd     3rd     4th
 0       1
                −7
 1      −6               4
                 5               1
 3       4              10               0
                55               1
 6     169              19
               188
10     921

It is notable that the third divided differences are constant. Below, we shall
interpolate from the table by using Newton’s divided difference formula, and determine
the corresponding collocation cubic.



Newton’s Divided Difference Formula

From the definitions of divided difference we have


f(x) = f(x0) + (x − x0) f[x, x0]
f[x, x0] = f[x0, x1] + (x − x1) f[x, x0, x1]
f[x, x0, x1] = f[x0, x1, x2] + (x − x2) f[x, x0, x1, x2]
. . .
f[x, x0, ..., xn−1] = f[x0, x1, ..., xn] + (x − xn) f[x, x0, x1, ..., xn].

Multiplying the second equation by (x − x0), the third by (x − x0)(x − x1), etc., and
adding yields

f(x) = f(x0) + (x − x0) f[x0, x1] + (x − x0)(x − x1) f[x0, x1, x2] + ... +
       (x − x0)(x − x1)...(x − xn−1) f[x0, x1, ..., xn] + R,
where
R = (x − x0)(x − x1)...(x − xn) f[x, x0, x1, ..., xn].
Note that the remainder term R is zero at x0 , x1 ,..., x n and we may infer that the other
terms of the right hand side constitute the collocation polynomial or, equivalently, the
Lagrange polynomial. If the degree of collocation polynomial necessary is not known in
advance, it is customary to order the points x0 , x1 ,..., x n according to increasing distance
from x and add terms until R is small enough.
For instance, for the tabular function in the above example, we may find
f(2) and f(4) by Newton's divided difference formula as below.
Since the third divided differences are constant, we can fit a cubic through the five points.
By Newton's divided difference formula, using x0 = 0, x1 = 1, x2 = 3, x3 = 6, the cubic is
f(x) = f(0) + x f[0,1] + x(x − 1) f[0,1,3] + x(x − 1)(x − 3) f[0,1,3,6]
     = 1 − 7x + 4x(x − 1) + 1·x(x − 1)(x − 3),
so that
f(2) = 1 − 14 + 8 − 2 = −7.
Note that the corresponding collocation polynomial is obviously
1 − 7x + (4x² − 4x) + (x³ − 4x² + 3x) = x³ − 8x + 1.
To find f(4), let us identify x0 = 1, x1 = 3, x2 = 6, x3 = 10, so that
f(x) = −6 + 5(x − 1) + 10(x − 1)(x − 3) + (x − 1)(x − 3)(x − 6)
and
f(4) = −6 + 5 × 3 + 10 × 3 × 1 + 3 × 1 × (−2) = 33.
As expected, the collocation polynomial is the same cubic, i.e. x³ − 8x + 1.
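The procedure above can be sketched in Python (an illustration, not the text's own program): the top diagonal of the divided-difference table supplies the coefficients f[x0], f[x0,x1], ..., and the polynomial is then evaluated in nested (Horner-like) form.

```python
# Newton's divided-difference interpolation:
#   P(x) = f[x0] + (x-x0)(f[x0,x1] + (x-x1)(f[x0,x1,x2] + ...)).

def newton_coeffs(xs, fs):
    """Return the diagonal [f[x0], f[x0,x1], ..., f[x0..xn]], computed in place."""
    coeffs = list(fs)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - j])
    return coeffs

def newton_eval(xs, coeffs, x):
    """Evaluate the Newton form by nested multiplication."""
    result = coeffs[-1]
    for c, xk in zip(coeffs[-2::-1], xs[-2::-1]):
        result = c + (x - xk) * result
    return result

xs = [0, 1, 3, 6]                  # as in the text, four points fix the cubic
coeffs = newton_coeffs(xs, [1, -6, 4, 169])
print(coeffs)                      # [1, -7.0, 4.0, 1.0], as in the table
print(newton_eval(xs, coeffs, 2))  # -7.0, matching f(2) above
print(newton_eval(xs, coeffs, 4))  # 33.0, matching f(4) above
```

Because the cubic through the data is unique, using the first four points here reproduces both values found in the text with its two different orderings of the nodes.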
Exercise
1. Use Newton's divided difference formula to show that an interpolation for ∛20 from
the points (0,0), (1,1), (8,2), (27,3), (64,4) on f(x) = ∛x is quite invalid.
2. Given that f (−2) = 46, f (−1) = 4, f (3) = 156 and f (4) = 484, compute f(0) by
Newton’s divided difference formula.



7 Numerical Differentiation and Integration
7.1 Numerical Differentiation
Finite differences
In Analysis, we are usually able to obtain the derivative of a function by the methods of
elementary calculus. However, if a function is very complicated or known only from
values in a table, it may be necessary to resort to numerical differentiation.
Procedure
Formulae for numerical differentiation may easily be obtained by differentiating
interpolation polynomials. The essential idea is that the derivatives f', f", . . . of a
function are represented by the derivatives P'n, P"n, . . . of the interpolating polynomial
Pn. For example, differentiation of Newton's forward difference formula

f(x) = f(xj + θh) ≈ [1 + θΔ + (θ(θ − 1)/2!)Δ² + (θ(θ − 1)(θ − 2)/3!)Δ³ + ...] fj

with respect to x, since dθ/dx = 1/h, etc., yields formally

f'(x) ≈ (1/h)[Δ + ((2θ − 1)/2)Δ² + ((3θ² − 6θ + 2)/6)Δ³ + ...] fj,
f"(x) ≈ (1/h²)[Δ² + (θ − 1)Δ³ + ...] fj.

In particular, if we set θ = 0, we arrive at formulae for derivatives at the tabular points
{xj}:

f'(xj) ≈ (1/h)[Δ − (1/2)Δ² + (1/3)Δ³ − (1/4)Δ⁴ + ...] fj,
f"(xj) ≈ (1/h²)[Δ² − Δ³ + (11/12)Δ⁴ − ...] fj.

If we set θ = 1/2, we have a relatively accurate formula at the half-way points (without second
differences):

f'(xj + h/2) ≈ (1/h)[Δ − (1/24)Δ³ + ...] fj;

if we set θ = 1 in the formula for the second derivative, we find (without third
differences):

f"(xj+1) ≈ (1/h²)[Δ² − (1/12)Δ⁴ + ...] fj,

i.e., a formula for the second derivative at the next point.
Note that, if one retains only one term, one arrives at the well-known formulae:

f'(xj) ≈ (fj+1 − fj)/h,   f"(xj) ≈ (fj+2 − 2fj+1 + fj)/h².
Errors in numerical differentiation



It must be recognized that numerical differentiation is subject to considerable error; the
basic difficulty is that, while f(x) − Pn(x) may be small, the differences f'(x) − Pn'(x) and
f"(x) − Pn"(x), etc. may be very large. In geometrical terms, although two curves may be
close together, they may differ considerably in slope, variation in slope, etc. (Figure 14).

FIGURE 14 Interpolating f(x)
It should also be noted that all these formulae involve division of a combination of
differences (which are prone to loss of significance or cancellation errors, especially if h
is small) by a positive power of h. Consequently, if we want to keep round-off errors
down, we should use a large value of h. On the other hand, it can be shown (see Exercise
3 below) that the truncation error is approximately proportional to h^p, where p is a
positive integer, so that h must be sufficiently small for the truncation error to be
tolerable. We are in a cleft stick and must compromise with some optimum choice of h.
In brief, large errors may occur in numerical differentiation based on direct
polynomial approximation, so that an error check is always advisable. There are
alternative methods, based on polynomials, which use more sophisticated procedures
such as least-squares or mini-max, and other alternatives involving other basis
functions (for example, trigonometric functions). However, the best policy is probably
to use numerical differentiation only when it cannot be avoided!
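The "cleft stick" described above is easy to demonstrate. The Python sketch below (an illustration with assumed 5D tabular rounding, not from the text) applies the one-term forward-difference formula to f(x) = e^x at x = 1: the error first falls as h shrinks (truncation error) and then rises again for very small h (round-off error).

```python
# Forward-difference estimate (f(x+h) - f(x)) / h of f'(x) for f(x) = e^x,
# with function values rounded to 5 decimals to mimic tabular data.
import math

def fd_error(h, x=1.0, decimals=5):
    """Absolute error of the one-term forward-difference estimate of f'(x)."""
    f0 = round(math.exp(x), decimals)       # "tabulated" values
    f1 = round(math.exp(x + h), decimals)
    return abs((f1 - f0) / h - math.exp(x))

for h in [0.1, 0.01, 0.001, 0.0001]:
    print(f"h = {h:<7} error = {fd_error(h):.6f}")
# truncation error dominates for large h, round-off for small h,
# so the error reaches a minimum at some intermediate (optimum) h
```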
Example
We will estimate the values of f'(0.1) and f"(0.1) for f(x) = e^x, using tabulated values of
e^x rounded to 5D. If we use the above formulae with θ = 0 (ignoring fourth and higher
differences), we obtain estimates of both derivatives.
Since f'(0.1) = f"(0.1) = f(0.1) = 1.10517 to 5D, it is obvious that the second result is much less
accurate (due to round-off errors).
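A reconstruction of this example in Python (assuming, as an illustration, that the table holds e^x at x = 0.1(0.05)0.30 rounded to 5D):

```python
# Tabulate f(x) = e^x at x = 0.1(0.05)0.30, rounded to 5D, and apply the
# theta = 0 formulae (fourth and higher differences ignored):
#   f'(x0)  ~ (1/h)  (D - D^2/2 + D^3/3) f0,
#   f"(x0)  ~ (1/h^2)(D^2 - D^3) f0,       where D denotes the forward difference.
import math

h = 0.05
fs = [round(math.exp(0.1 + i * h), 5) for i in range(5)]

diffs = [fs]                                   # forward difference columns
for _ in range(3):
    col = diffs[-1]
    diffs.append([b - a for a, b in zip(col, col[1:])])
d1, d2, d3 = diffs[1][0], diffs[2][0], diffs[3][0]

fp = (d1 - d2 / 2 + d3 / 3) / h                # estimate of f'(0.1)
fpp = (d2 - d3) / h ** 2                       # estimate of f"(0.1)
print(fp, fpp)  # both should be near e^0.1 = 1.10517; fpp is noticeably worse
```

The second-derivative estimate divides small rounded differences by h², which amplifies the round-off in the data, consistent with the remark in the example.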
Checkpoint
How are formulae for the derivatives of a function obtained from interpolation
formulae?



Why is the accuracy of the usual numerical differentiation process not necessarily
increased if the argument interval is reduced?
When should numerical differentiation be used?
EXERCISES
1. Derive formulae involving backward differences for the first and second derivatives
of a function.
2. The function is tabulated for x = 1.00(0.05)1.30 to 5D:

a. Estimate the values of f'(1.00) and f"(1.00), using Newton's forward difference
formula.
b. Estimate f'(1.30) and f"(1.30), using Newton's backward difference formula.
3. Use Taylor series to find the truncation errors in the formulae:

7.2 Numerical Integration


7.2.1 The trapezoidal rule
It is often either difficult or impossible to evaluate by analytical methods definite
integrals of the form

I = ∫_a^b f(x) dx,

so that numerical integration or quadrature must be used.
It is well known that the definite integral may be interpreted as the area under the curve y
= f(x) for a ≤ x ≤ b, and may be evaluated by subdivision of the interval and summation
of the component areas. This additive property of the definite integral permits evaluation
in a piecewise sense. For any subinterval [xj, xj+1] of the interval [a, b], we
may approximate f(x) by the interpolating polynomial Pn(x). Then we obtain the
approximation

∫_{xj}^{xj+1} f(x) dx ≈ ∫_{xj}^{xj+1} Pn(x) dx,

which will be a good approximation, provided n is chosen so that the error |f(x) − Pn(x)|
in each tabular subinterval is sufficiently small. Note that



for n > 1 the error is often alternately positive and negative in successive subintervals and
considerable cancellation of error occurs; in contrast with numerical differentiation,
quadrature is inherently accurate! It is usually sufficient to use a rather low degree
polynomial approximation over any subinterval [xj, xj+1].
The trapezoidal rule
Perhaps the most straightforward quadrature is to subdivide the interval [a, b] into N
equal strips of width h by the points

xj = a + jh,   j = 0, 1, ..., N,

such that b = a + Nh. Then one can use the additive property

∫_a^b f(x) dx = Σ_{j=0}^{N−1} ∫_{xj}^{xj+1} f(x) dx,

and the linear approximations, involving fj = f(xj),

∫_{xj}^{xj+1} f(x) dx ≈ h(fj + fj+1)/2,

to obtain the trapezoidal rule

T(h) = h(f0/2 + f1 + f2 + ... + fN−1 + fN/2),

which is suitable for computer implementation.
Integration by the trapezoidal rule therefore involves computation of a finite sum of
values of the integrand f, whence it is very quick. Note that this procedure can be
interpreted geometrically (Figure 15) as the sum of the areas of N trapezoids of width h
and average height (fj + fj+1)/2.

FIGURE 15 The trapezoidal rule

Accuracy
The trapezoidal rule corresponds to a rather crude polynomial approximation (a straight
line) between successive points xj and xj+1 = xj + h, and hence can only be accurate for
sufficiently small h. An approximate (upper) bound on the error may be derived as
follows:



The Taylor expansion

fj+1 = f(xj + h) = fj + h f'j + (h²/2!) f"j + (h³/3!) f'''j + ...

yields the trapezoidal form:

h(fj + fj+1)/2 = h fj + (h²/2) f'j + (h³/4) f"j + ...,

while f(x) may be expanded in xj ≤ x ≤ xj+1 as

f(x) = fj + (x − xj) f'j + ((x − xj)²/2!) f"j + ...

to arrive at the exact form:

∫_{xj}^{xj+1} f(x) dx = h fj + (h²/2) f'j + (h³/6) f"j + ....

Comparison of these two forms shows that the truncation error per strip is

−(h³/12) f"j + ....

Ignoring higher-order terms, one arrives at an approximate bound on this error when
using the trapezoidal rule over N subintervals:

|E| ≤ N(h³/12) max |f"(x)| = ((b − a)h²/12) max_{a≤x≤b} |f"(x)|.
Whenever possible, we will choose h small enough to make this error negligible. In the
case of hand computations from tables, this may not be possible. On the other hand, in a
computer program in which f (x) may be generated anywhere in , the interval
may be resubdivided until sufficient accuracy is achieved. (The integral value for
successive subdivisions can be compared, and the subdivision process terminated when
there is adequate agreement between successive values.)
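The halving-until-agreement strategy just described can be sketched as follows (Python, illustrative only):

```python
# Composite trapezoidal rule, with the strip count doubled until two
# successive estimates agree to within a set tolerance.
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal strips of width (b - a)/n."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + j * h) for j in range(1, n))
    return h * s

def trapezoid_auto(f, a, b, tol=1e-6, n=1):
    """Resubdivide until successive trapezoidal estimates agree within tol."""
    old = trapezoid(f, a, b, n)
    while True:
        n *= 2
        new = trapezoid(f, a, b, n)
        if abs(new - old) < tol:
            return new
        old = new

print(trapezoid_auto(math.exp, 0.0, 1.0))   # close to e - 1 = 1.718281...
```

Since the error decreases like h², each halving of h reduces the error by roughly a factor of four, so agreement between successive values is a reasonable (if not rigorous) stopping test.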
Example
Obtain an estimate of the integral

using the trapezoidal rule and the data in the table on page 63. If we use T(h) to denote
the approximation with strip width h, we obtain a sequence of estimates as h is halved.
Comparing these with the true value of the integral, we may observe that the error
sequence 0.00081, 0.00020, 0.00005 decreases with h², as expected.
Checkpoint
Why is quadrature using a polynomial approximation for the integrand likely to be
satisfactory, even if the polynomial is of low degree?



What is the degree of the approximating polynomial corresponding to the trapezoidal
rule?
Why is the trapezoidal rule well suited for implementation on a computer?

EXERCISES
Estimate the value of the integral

using the trapezoidal rule and the data given in Exercise 2 of the preceding Step.
Use the trapezoidal rule with h = 1,0.5, and 0.25 to estimate the value of the integral

7.2.2 Simpson's Rule


If it is undesirable (for example, when using tables) to increase the subdivision of an
interval [a, b] in order to improve the accuracy of a quadrature, one alternative is to
use an approximating polynomial of higher degree. The integration formula based on a
quadratic (i.e., parabolic) approximation is called Simpson's Rule. It is adequate for
most purposes that one is likely to encounter in practice.
Simpson's Rule
Simpson's Rule gives, for the double strip [xj, xj+2],

∫_{xj}^{xj+2} f(x) dx ≈ (h/3)(fj + 4fj+1 + fj+2).

A parabolic arc is fitted to the curve y = f(x) at the three tabular points xj, xj+1, xj+2.
Hence, if N = (b − a)/h is even, one obtains the composite Simpson's Rule:

S(h) = (h/3)(f0 + 4f1 + 2f2 + 4f3 + ... + 2fN−2 + 4fN−1 + fN),

where h = (b − a)/N.
Integration by Simpson's Rule involves computation of a finite sum of given values of the
integrand f, as in the case of the trapezoidal rule. Simpson's Rule is also effective for
implementation on a computer; a single direct application in a hand calculation usually
gives sufficient accuracy.
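A sketch of the composite rule in Python (illustrative, not the text's own program):

```python
# Composite Simpson's rule with an even number n of strips of width h:
#   I ~ (h/3) [f0 + 4 f1 + 2 f2 + 4 f3 + ... + 2 f_{n-2} + 4 f_{n-1} + f_n].
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + j * h) for j in range(1, n, 2))   # odd-index points
    s += 2 * sum(f(a + j * h) for j in range(2, n, 2))   # interior even points
    return h * s / 3

print(simpson(math.exp, 0.0, 1.0, 20))   # close to e - 1, with error O(h^4)
```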
Accuracy
For a given integrand f, it is quite appropriate to program increased interval
subdivision on a computer, in order to achieve a required accuracy, while for hand
calculations an error bound may again be useful.



Let the function f(x) have in xj ≤ x ≤ xj+2 the Taylor expansion

f(x) = fj+1 + (x − xj+1) f'j+1 + ((x − xj+1)²/2!) f"j+1 + ...,

then

∫_{xj}^{xj+2} f(x) dx = 2h fj+1 + (h³/3) f"j+1 + (h⁵/60) f⁽⁴⁾j+1 + ....

One may reformulate the quadrature rule for [xj, xj+2] by replacing fj+2 = f(xj+1 +
h) and fj = f(xj+1 − h) by their Taylor series; thus

(h/3)(fj + 4fj+1 + fj+2) = 2h fj+1 + (h³/3) f"j+1 + (h⁵/36) f⁽⁴⁾j+1 + ....

A comparison of these two versions shows that the truncation error per double strip is

−(h⁵/90) f⁽⁴⁾j+1 + ....

Ignoring higher order terms, we conclude that the approximate bound on this error
while estimating

∫_a^b f(x) dx

by Simpson's Rule with N/2 subintervals of width 2h is

|E| ≤ (N/2)(h⁵/90) max |f⁽⁴⁾(x)| = ((b − a)h⁴/180) max_{a≤x≤b} |f⁽⁴⁾(x)|.
Note that the error bound is proportional to h⁴, compared with h² for the cruder
trapezoidal rule. Note also that Simpson's rule is exact for cubic functions!
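The exactness for cubics is easy to verify numerically; the sketch below (illustrative) applies a single Simpson step to the collocation cubic x³ − 8x + 1 met earlier:

```python
# A single Simpson step (N = 2, h = (b - a)/2) applied to the cubic
# x^3 - 8x + 1; the result agrees with the analytic integral exactly,
# since the truncation error involves only the fourth derivative.
def simpson_single(f, a, b):
    h = (b - a) / 2
    return h / 3 * (f(a) + 4 * f(a + h) + f(b))

f = lambda x: x ** 3 - 8 * x + 1
exact = 2 ** 4 / 4 - 4 * 2 ** 2 + 2           # [x^4/4 - 4x^2 + x] from 0 to 2
print(simpson_single(f, 0, 2), exact)         # both equal -10
```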
Example
We shall estimate the value of the integral

,
using Simpson's rule and the data in Exercise 2 of STEP 29. If we choose h = 0.15 or h =
0.05, there will be an even number of intervals. Denoting the approximation with strip
width h by S(h), we obtain the two estimates S(0.15) and S(0.05), respectively.
Since f⁽⁴⁾(x) = −(15/16)x^(−7/2), an approximate truncation error bound is



whence it is 0.000 000 8 for h = 0.15 and 0.000 000 01 for h = 0.05. Note that the
truncation error is negligible; within round-off error, the estimate is 0.32148(6).
Checkpoint
What is the degree of the approximating polynomial corresponding to Simpson's Rule?
What is the error bound for Simpson's rule?
Why is Simpson's Rule well suited for implementation on a computer?
EXERCISES
Estimate by numerical integration the value of the integral

,
to 4D.
Use Simpson's Rule with N = 2 to estimate the value of

.
Estimate to 5D the resulting error, given that the true value of the integral is 0.26247.



Mid-Semester Examination (2004/2005)

1. i) The normalized floating point representations of 10,872 and 0.0066 are respectively
___________________ and _______________ [1 point]
ii) a) The numbers 10,872 and 0.0066 chopped to two significant figures are given
by ____________________ and _________________
b) The numbers 10,872 and 0.0066 chopped to two decimal places are given
by ____________________ and _________________
c) The numbers 10,872 and 0.0066 rounded to two significant figures are given
by ____________________ and _________________
d) The numbers 10,872 and 0.0066 rounded to two decimal places are given
by ____________________ and _________________ [4 points]

iii) Using four-digit normalized floating point arithmetic give the sum [3 point]
a) 10,872 + 0.0066
b) Calculate the absolute and relative errors in your answer for iii(a) above.

iv) Evaluate 4.27 × 3.13 as accurately as possible, assuming that the values 4.27 and
3.13 are correct to three significant figures. [3 points]
2. i) Given the equation −0.4x² + 2.2x + 4.7 = 0 and

x -2 -1 0 7 8
− 0.4 x 2 + 2.2 x + 4.7 -1.3 2.1 4.7 0.5 -3.3

a. Locate the real roots of the equation [2 points]

b. Using the bisection method, determine the highest root rounded to two decimal
places. Compute the absolute error after each iteration. Arrange your work in a
table. [3 points]

c. Using the method of false position, determine the smallest root rounded to two
decimal places. Compute the absolute error after each iteration. Arrange your
work in a table. [3 points]

d. Using the method of simple iteration, determine the highest root rounded to four
decimal places. Use the result of (b) above as initial guess. [3 points]

ii) a) Show that Newton's method for the solution of

x³ − a = 0,  a > 0,

is given by

x_{n+1} = (2x_n³ + a)/(3x_n²).  [2 points]



b) Use this method to find ∛9 rounded to two decimal places, starting from x0 = 2.
[3 points]
3. a) Solve the following system of equations using the Gauss-Jordan elimination
method. [4 points]

2 x + 2 y − 4 z = 14
3x + y + z = 8
2 x − y + 2 z = −1

b) Solve the system of equations below using the method of LU decomposition.


x − 3 y + 4 z = 12
− x + 5 y − 3 z = −12 [5 points]
4 x − 8 y + 23 z = 58

c) Determine approximate solutions to the following system of linear equations


using the Gauss-Seidel iterative method. Give your answer round to two decimal
places. [4 points]
4x + y − z = 8
5 y + 2z = 6
x − y + 4 z = 10
x 0 = 1, y 0 = 2, z 0 = 3



Final Examination (2005)

1. The population of a city in a census taken once in ten years is given below.
x 1935 1945 1955 1965 1975 1985 1995
f(x) 35 42 58 84 120 165 220
a) Construct a forward difference table for the data in the table above.
[4 points]
b) What is the lowest degree of polynomial that matches the data? [2 points]
c) Determine Δ²f2, δ²f4, and ∇²f6. [3 points]
d) Estimate the population in the years 1940, 1980 and 1990. [2 points]

2. Use Lagrange interpolating polynomials of the first and second order to evaluate
f(2) on the basis of the following table. [5 points]

x 1 3 6
f(x) -6 4 196
3. Given the table of values

x -2 0 1 2 5
f(x) -15 1 -3 -7 41
a) Construct the corresponding divided difference table and decide the lowest
degree of the polynomial which matches the data exactly [5 points]
b) Find the value of f[1,2,5] [2 points]
c) Write down the corresponding interpolating polynomial.(Do not simplify!)
[3 points]
4. Given Newton's forward difference table for f(x) = sin x below:
x f(x) Δ Δ² Δ³ Δ⁴
0.2 0.1987
0.0487
0.25 0.2474 − 0.0006
0.0481 − 0.0001
0.3 0.2955 − 0.0007 −0.0001
0.0474 − 0.0002
0.35 0.3429 − 0.0009
0.0465
0.4 0.3894

a. Use Newton’s forward difference formula to estimate f ' (0.2) and f " (0.2)
ignoring the fourth difference. [5 points]
b. Use the trapezoidal rule with step length 0.05 to estimate ∫_{0.2}^{0.4} f(x) dx. Give the
truncation error bound involved in using this method. [5 points]


c. Use Simpson's rule with step length 0.05 to estimate ∫_{0.2}^{0.4} f(x) dx. Give the
truncation error bound involved in using this method. [5 points]



Mid-Semester Examination (2005/2006)
Part I: Choose the best answer and write your choice on the space provided.

____ 1. Which one of the following is not a case of introducing an error?


a. Errors may be introduced while measuring quantities
b. Errors may be introduced while we represent them in a machine
c. Errors may be introduced while we copy numbers
d. Errors may be introduced while we represent infinite series
e. None of the above
____ 2. The number 0.0009875 rounded off to three significant digits is
a. 0.001 b. 0.000987 c. 0.000988 d. 0.000 e. None of these
____ 3. When a number is rounded off to n decimal places, then magnitude of absolute-
error cannot exceed
a. 10-n b. 10-n+1 c. 0.5×10-n+1 d. 0.5×10-n e. None of these
____ 4. The number 98750.0205 chopped to three significant digits is
a. 987 b. 98750.021 c. 98750.020 d. 98700 e. None of these
____ 5. If a function f(x) is continuous on [a,b], and f(a) and f(b) are of opposite signs
then
a) There exists one c∈(a,b) such that f(c)=0
b) There exists one c∈(a,b) such that f(c)≠0
c) There exists at least one c∈(a,b) such that f(c)=0
d) There exists at least one c∈(a,b) such that f(c)≠0
e) None of these
____ 6. The equation x³ − 3x + 4 = 0 has only one real root. What is its first approximate
value as obtained by the method of false position in (−3, −2)?
a. −2.125 b. 2.125 c. −2.812 d. 2.812 e. None of these
____ 7. Newton-Raphson method is applicable when
a. f(x)≠0 in the neighborhood of the root
b. f ' ( x) ≠ 0 in the neighborhood of the root
c. f " ( x) ≠ 0 in the neighborhood of the root
d. a and c
e. None of these
____ 8. Which one of the following formulas is preferable in solving 2 tan x − x − 1 = 0 by
the iterative method?
a. g(x) = 2 tan x − 1
b. g(x) = 2/cos²x
c. g(x) = arctan((x + 1)/2)
d. g(x) = 1/(2(1 + ((x + 1)/2)²))
e. None of these
Part II Write the simplified form of your answer in the space provided.
1. If we round off the number 364250 to four significant figures, then the absolute and
relative errors that we may introduce because of rounding off are respectively
_____________________ and _______________________
2. Given: a=9.00±0.05,b=0.0356±0.0002,c=15300±100,d=62000±500. The
maximum value of the absolute error in a + b + c + d=________________



3. The value of 124.53 − 124.52, as accurate as possible assuming both values are
correct to five significant figures, is ______________________.
4. Using the Newton-Raphson method, the root of the equation x² − 5x + 2 = 0 nearer to
0, correct to three decimal places, is __________________
5. Describe the kind of systems of linear equations for which LU decomposition
method cannot be applicable.
_________________________________________________________________
_________________________________________________________________.

Part III Solve the following problems by showing the necessary steps neatly and
clearly.
1. a. Develop an iterative formula, using the Newton-Raphson method, to find the fourth
root of a positive number K (i.e. K^(1/4)). (2 points)
b. Draw a flow chart that can help in developing a computer program for finding
the fourth root of a positive number with a given tolerance say ε using the
result of (a). Your flow chart should display the number of iteration that is
required to achieve the result. If in case this process diverges devise a
termination condition as well. (Hint: Use x0, ε, and imax as input) (3 points)
2. a. Find the inverse of

       | 1  1  2 |
       | 1  2  4 |
       | 2  1  3 |

by using the Jordan method. (2 points)
b. Solve the system of equations below by using your result in (a):
x + y + 2z = 3
x + 2 y + 4 z = −2 (2 points)
2 x + y + 3z = 1
3. Solve the system of equations below by using the LU decomposition method
10 x + y + z = 12
2 x + 10 y + z = 13 (5 points)
2 x + 2 y + 10 z = 14



2005/2006 Final Examination
Part I: Choose the best answer and write your choice on the space provided.

_____ 1. Let A = (aij) be a square matrix such that aij = 0 where |i − j| ≥ 2; then matrix A is
a. Diagonally dominant b. Tri-diagonal c. Upper triangular
d. Lower triangular e. None of these

_____ 2. Which one of the following methods of solving system of linear equations is
different from the rest?
a. Gaussian elimination b. LU decomposition
c. Gauss-Jacobi d. Gauss-Jordan
e. None of the above
_____ 3. Which one of the following operators is used as a basis for defining the rest?
a. E b. ∇ c. δ d. Δ e. None of these
_____ 4. ∇Δfi =
a. δ²fi b. (Δ − ∇)fi c. Δ∇fi
d. All of the above e. None of these
_____ 5. Let f(x) = (1 − x)(1 − 2x)(1 − 3x)(1 − 4x), x0 = 0, and h = 0.5; then Δ⁴f0 equals
a. 18 b. 288 c. 1 d. cannot be determined
e. None of the above
_____ 6. Given x0, x1, ..., xn and f0, f1, ..., fn, which one of the following is true about
Lk(x) (the kth order Lagrange coefficient)?
a. Lk(xk) = 0
b. Lk(xi) = 0 for k ≠ i
c. Σ_{k=0}^{n} Lk(x) = 0 for x ∈ [x0, xn]
d. Lk(xk) = Σ_{i=0, i≠k}^{n} fi
e. None of these

_____ 7. Which one of the following interpolation formulas cannot be applied when we
have a constant interval length between consecutive arguments of a tabulated
function?
a. Newton’s forward interpolation formula
b. Newton’s backward interpolation formula
c. Lagrange’s interpolation formula
d. Divided difference interpolation formula
e. None of the above

____ 8. Which one of the following statements is not true?


a. Numerical differentiation is as accurate as analytic differentiation
b. We have to apply numerical differentiation when it cannot be avoided
c. f'(xj) = (1/h)[Δ − (1/2)Δ² + (1/3)Δ³ − ...] fj, where h is a constant step length
d. f'(xj) = (1/h)[∇ + (1/2)∇² + (1/3)∇³ + ...] fj, where h is a constant step length



e. None of the above

____ 9. The total truncation error in the trapezoidal rule with constant step length h applied
on ∫_a^b f(x) dx is approximately (assume f⁽ⁿ⁾(ξ) ≈ f⁽ⁿ⁾(x))
a. −(h³/12) f"(xj)
b. −((b − a)h⁴/12) max_{a≤x≤b} f"(x)
c. −((b − a)h²/12) max_{a≤x≤b} f"(x)
d. −((b − a)h⁴/180) max_{a≤x≤b} f⁽⁴⁾(x)
e. None of these
____ 10. Which one of the following methods is the most appropriate one for solving
systems of non-linear equations?
a. Gaussian elimination b. LU decomposition
c. Gauss-Jacobi d. Gauss-Seidel
e. None of these

Part II: Solve the following problems by showing the necessary steps clearly.
1. Given the system of equations
5 x + 2 y + z = 12
x + 4 y + 2 z = 15 and initial guess x = 2, y = 3, z = 0.
x + 2 y + 5 z = 20
solve questions a and b below.
a. Discuss why the Gauss-Seidel iteration process converges no matter what the
initial guess is. [2 points]
b. Solve the system using the Gauss Seidel iteration method in two iterations.
[4 points]

2. Given the table of data below, solve the questions below the table.

x -2 -1 0 1 2 3
f(x) 31 5 1 1 11 61

a. Construct a forward difference table for the data [4 points]


b. What is the lowest degree of polynomial that matches the data? Why? [2 points]
c. Determine δ³f1+1/2, δ²f3, and ∇²f5. [3 points]
d. Estimate f(-0.5) and f(2.5). [2 points]
e. Determine the interpolating polynomial using Newton's forward difference formula. [4 points]

3. Use a Lagrange interpolating polynomial of second order to approximate f(1) on the
basis of the following table. [4 points]

x 0 2 3
f ( x) 7 11 28



4. Given the table of values

x 0 1 2 4 5 6
f ( x) 1 14 15 5 6 19
f. Construct the corresponding divided difference table and decide the lowest degree
of the polynomial which matches the data exactly [4 points]
g. Find the value of f[1,2,4] [1 points]
h. Write down the corresponding interpolating polynomial.(Do not simplify!)
[3 points]
5. Given the Newton’s forward difference table for f ( x) = ln x below

x f(x) Δ Δ² Δ³ Δ⁴ Δ⁵ Δ⁶
4.0 1.3863
0.0488
4.2 1.4351 -0.0023
0.0465 0.0003
4.4 1.4816 -0.0020 -0.0003
0.0445 0.0000 0.0006
4.6 1.5261 -0.0020 0.0003 -0.0010
0.0425 0.0003 -0.0004
4.8 1.5686 -0.0017 -0.0001
0.0408 0.0002
5.0 1.6094 -0.0015
0.0393
5.2 1.6487

d. Use Newton’s forward difference formula to estimate f ' (4.0) and f " (4.0)
ignoring the fourth difference. [4 points]
e. Use the trapezoidal rule with step length 0.2 to estimate ∫_{4.0}^{5.2} f(x) dx. Give the
truncation error bound involved in using this method. [4 points]


f. Use Simpson's rule with step length 0.2 to estimate ∫_{4.0}^{5.2} f(x) dx. Give the
truncation error bound involved in using this method. [4 points]



2006 Mid Examination (I)
1. Let x = 0.000077218 and y = 7.1422. Then
i. [ 5 Points]
a) The normalized floating point representations of x and y are respectively
___________________ and _______________
b) x and y chopped to one significant figure are respectively given by
_________________ and _______________
c) x and y chopped to one decimal place are respectively given by
________________ and _____________
d) x and y round to one significant figure are respectively given by
_________________ and _______________
e) x and y round to one decimal place are respectively given by
_______________ and ______________

ii. Using five-digit normalized floating point arithmetic, give the sum [5 points]
a) x + y
b) Calculate the absolute and relative errors due to rounding in your answer for
ii(a) above.
c) Calculate the absolute and relative errors due to chopping in your answer for
ii(a) above.

2. a) Use three bisection steps on the equation x sin x + cos x = 0 in the interval [2, 3]
to give the approximate root. (Arrange your work in a table.) [3 points]
b) How many steps of the bisection method are needed to determine the root of the
equation in problem (2a) above with an error of at most 0.5×10⁻⁵? [2 points]

3. The reciprocal of a number a can be computed without division by the iterative


formula
x n +1 = x n (2 − x n a )
a. Establish this relation by applying Newton’s method. Beginning with x0=0.2,
compute the reciprocal of 4 correct to 6 decimal digits. Calculating the error at
each step, verify the quadratic convergence. [4 points]
b. Draw a flow chart for the iterative method for (3a) assuming that the tolerance
and the maximum of iteration are given respectively as (Tol and M). [3 points]
4. a) Solve the following special bi-diagonal linear system of equations efficiently.
[3 points]

x1 = 2
− x1 − x2 = −1
2 x2 x3 = 0
x3 x4 x5 = 3
− x5 2 x6 = 0
x6 x7 = 1
x7 = 2



b) Write the efficient algorithm that you have developed for solving the system in (4a),
generalizing the case to order n (odd). (Hint: use the matrix relation below for
your generalization, by the help of one-dimensional arrays (ai), (bi), (di).) [3 points]

    | d1                     | | x1 |   | b1 |
    | a1  d2                 | | x2 |   | b2 |
    |     a2  d3             | | x3 |   | b3 |
    |         a3  d4  a5     | | x4 | = | b4 |
    |             d5  a6     | | x5 |   | b5 |
    |                 d6  a7 | | x6 |   | b6 |
    |                     d7 | | x7 |   | b7 |

c. Draw a flow chart for the algorithm that you have developed in (4b).
[4 points]
5. Solve the system of equations below using the method of LU decomposition.
2 x − 5 y + z = 12
− x + 3 y − z = −8 [5 points]
3 x − 4 y + 2 z = 16



2006 Final Examination (I)
1. Given the quadratic equation x² − 3x − 1 = 0:
a) Show that one of its roots lies in the interval [3, 4]. [1 point]
b) The use of the iterative formula x_{n+1} = (x_n² − 1)/3 in the fixed-point iterative
method to find the root of the quadratic equation that lies in the interval
[3, 4] never leads us to convergence. Explain why. [3 points]
c) Choose your own fixed-point iterative formula and approximate this root
(in the interval [3, 4]) in three iterations, using x0 = 3.5. [3 points]

2. Given the system of non-linear equations


f1 ( x, y, z ) = x 2 + cos y + z 2 = 0
f 2 ( x, y, z ) = 2 x sin y + y 2 + 2 z = 0
f 3 ( x, y, z ) = 3x + e y + 2 xz = 0
a) Find the Jacobian matrix at ( x 0 , y 0 , z 0 ) = (0.5,0,0.5). [5 points]
b) Calculate the inverse of the Jacobian matrix in (a). [5 points]

3. Determine approximate solutions to the following system of linear equations
using two iterations of the Gauss-Seidel iterative method:
5 x − y + z = 20
2 x + 4 y = 30 with initial guesses x=y=z=0
x − y + 4 z = 10
Explain why the Gauss-Seidel iterative method converges whatever the initial guess
is. [7 points]

4. Write the simplified form of (∇ − Δ) 2 f n [2 points]

5. Using a third degree Lagrange polynomial approximation for the function that
passes through the points given in the table below, find P3(1.7).
xi F(xi)
1 0
1.2 0.182
1.6 0.47
1.9 0.642
(Hint: L1(1.7)=-0.101 ,L2(1.7)= 0.885, L3(1.7)= 0.213) [4 points]

6. It is suspected that the table below represents a polynomial.


x -2 -1 0 1 2 3
y 1 4 11 16 12 -4
f. Construct a forward difference table for the data [4 points]
g. What is the lowest degree of polynomial that matches the data? Why? [2 points]



h. Determine δf1+1/2, δ²f3, and ∇²f5. [3 points]
i. Determine the interpolating polynomial using the Newton’s forward difference
formula. [4 points]
7. Given the table of values

x 0 1 2 3 5
f ( x) 2 1 6 5 -183
a. Construct the corresponding divided difference [4 points]
b. Find the value of f[1,2,3,5] [1 points]
c. Write down the corresponding interpolating polynomial.(Do not simplify!)
[3 points]
8. Given the Newton forward difference table for f(x) = e^x below:

x f(x) Δ Δ² Δ³ Δ⁴
1 2.7183
0.2859
1.1 3.0042 0.0301
0.3160 0.0032
1.2 3.3201 0.0332 0.0003
0.3492 0.0035
1.3 3.6693 0.0367 0.0004
0.3859 0.0039
1.4 4.0552 0.0406 0.0004
0.4265 0.0043
1.5 4.4817 0.0449
0.4713
1.6 4.9530

g. Use Newton’s forward difference formula to estimate f ' (1) and f " (1)
ignoring the fourth difference. [4 points]
h. Use Simpson's rule with step length 0.1 to estimate ∫_{1}^{1.6} f(x) dx. Give the
truncation error bound involved in using this method. [5 points]
i. Draw a flow chart for Simpson’s rule. (Bonus) [10 points]



INDEX

A
Absolute error · 8
Aitken · 25
Algorithm · 3
Augmented matrix · 39, 41

B
Back-substitute · 40
Back-substitution · 40
Biased · 7

C
Cholesky method · 53
Chopping · 8

D
Definite integrals · 95
Direct method · 33
Doolittle method · 50

E
Error bound · 99

F
Fixed-point representation · 6
Flow chart · 3

G
Gauss-Jordan Algorithm · 47
graphically · 12

I
Ill-conditioned system · 43
Ill-conditioning · 43
Inherent errors · 5
Iterative method · 33

M
machine numbers · 6

N
Newton-Raphson method · 26
Normalized · 6
Numerical differentiation · 93
Numerical integration · 95

O
Order of convergence · 22
Overflow · 7

P
Partial pivoting · 42

Q
Quadrature · 95, 98

R
relative error · 8
round off error · 8
Rounding errors · 5
Round-off errors and numbers of operations · 42

S
Scientific notation · 6
Secant method · 20
Significant figures · 6
Simpson's Rule · 98
Systems of linear equations · 33

T
Trapezoidal rule · 96, 98
Truncation errors · 5

U
Underflow · 7
Upper triangular form · 41

V
vector of residuals · 43
