CHAPTER ONE
NUMBER SYSTEMS AND ERRORS
In everyday life we use numbers based on the decimal system. Thus the
number 257, for example, is expressible as
$$257 = 2\cdot 100 + 5\cdot 10 + 7\cdot 1 = 2\cdot 10^2 + 5\cdot 10^1 + 7\cdot 10^0$$
We call 10 the base of this system. Any integer is expressible as a
polynomial in the base 10 with integral coefficients between 0 and 9. We
use the notation
$$N = (a_n a_{n-1} \cdots a_0)_{10} = a_n 10^n + a_{n-1} 10^{n-1} + \cdots + a_0 10^0 \qquad (1.1)$$
to denote any positive integer in the base 10. There is no intrinsic reason to
use 10 as a base. Other civilizations have used other bases such as 12, 20,
or 60. Modern computers read pulses sent by electrical components. The
state of an electrical impulse is either on or off. It is therefore convenient to
represent numbers in computers in the binary system. Here the base is 2,
and the integer coefficients may take the values 0 or 1.
A positive integer N is then represented in the binary system as
$$N = (a_n a_{n-1} \cdots a_0)_2 = a_n 2^n + a_{n-1} 2^{n-1} + \cdots + a_0 2^0 \qquad (1.2)$$
where the coefficients $a_k$ are either 0 or 1. Note that N is again represented
as a polynomial, but now in the base 2. Many computers used in scientific
work operate internally in the binary system. Users of computers, however,
prefer to work in the more familiar decimal system. It is therefore neces-
sary to have some means of converting from decimal to binary when
information is submitted to the computer, and from binary to decimal for
output purposes.
Conversion of a binary number to decimal form may be accomplished
directly from the definition (1.2). As examples we have
$$(11)_2 = 1\cdot 2 + 1 = (3)_{10} \qquad (1101)_2 = 1\cdot 2^3 + 1\cdot 2^2 + 0\cdot 2^1 + 1\cdot 2^0 = (13)_{10}$$
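As a concrete illustration of this conversion (the Python code and the name binary_to_decimal below are choices made here for exposition, not part of the text), the sum in (1.2) can be accumulated by nested multiplication:

```python
def binary_to_decimal(bits: str) -> int:
    """Evaluate the polynomial (1.2): interpret a string of 0s and 1s as a base-2 integer."""
    value = 0
    for bit in bits:                   # most significant bit first
        value = 2 * value + int(bit)   # nested (Horner-type) form of a_n 2^n + ... + a_0
    return value

print(binary_to_decimal("1101"))       # prints 13
```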
Conversely, to convert from binary to octal, one partitions the binary digits
in groups of three (starting from the right) and then replaces each three-
group by its octal digit; thus
$$(10\,111\,011)_2 = (273)_8$$
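The grouping rule can be sketched as follows (again an illustrative Python fragment; the helper name is hypothetical):

```python
def binary_to_octal(bits: str) -> str:
    """Partition the bits in groups of three from the right; each group becomes one octal digit."""
    bits = bits.zfill((len(bits) + 2) // 3 * 3)                # pad on the left to a multiple of 3
    groups = (bits[i:i + 3] for i in range(0, len(bits), 3))
    return "".join(str(int(g, 2)) for g in groups)

print(binary_to_octal("10111011"))                             # prints 273
```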
Hence, using Algorithm 1.1 [with 2 replaced by $10 = (12)_8$, and with octal
arithmetic],
Therefore, finally,
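Algorithm 1.1 itself is not reproduced in this excerpt; assuming it is the familiar repeated-division scheme that the bracketed remark suggests (divide by the new base, collect the remainders as the digits), a sketch in Python is:

```python
def to_base(n: int, base: int) -> str:
    """Repeated division: the remainders, read in reverse order, are the digits of n in the new base.
    Digits are assumed to be less than 10 (bases 2, 8, 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(str(r))
    return "".join(reversed(digits))

print(to_base(187, 8))   # prints 273
print(to_base(187, 2))   # prints 10111011, i.e., the octal digits 2, 7, 3 expanded into three bits each (leading zero dropped)
```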
EXERCISES

1.2 THE REPRESENTATION OF FRACTIONS
is its fractional part. The fractional part can always be written as a decimal
fraction:
$$(.b_1 b_2 b_3 \cdots)_{10} = b_1 10^{-1} + b_2 10^{-2} + b_3 10^{-3} + \cdots \qquad (1.4)$$
is not.
If the integral part of x is given as a decimal integer by
while the fractional part is given by (1.4), it is customary to write the two
representations one after the other, separated by a point, the “decimal
point”:
where each $b_k$ is a nonnegative integer less than 2, i.e., either zero or one. If
the integral part of x is given by the binary integer
then we write
then
and now we are back to a fractional part of 0.2, so that the digits cycle. It
follows that
Then
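The cycling of the digits can be observed directly by carrying out the repeated multiplication by 2 with exact rational arithmetic; the fraction .1 used below (the fraction of Exercise 1.2-2) is only an illustration:

```python
from fractions import Fraction

def binary_fraction_digits(x: Fraction, n: int) -> str:
    """First n binary digits of a fraction 0 <= x < 1, obtained by repeated doubling."""
    digits = []
    for _ in range(n):
        x *= 2
        digit = int(x)           # the integer part is the next binary digit
        digits.append(str(digit))
        x -= digit               # continue with the fractional part
    return "." + "".join(digits)

print(binary_fraction_digits(Fraction(1, 10), 12))   # prints .000110011001 -- the block 0011 repeats
```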
We have stated this algorithm for a general base b rather than for the
specific binary base for two reasons. If this conversion to binary is
carried out with pencil and paper, it is usually faster to convert first to
octal, i.e., to use b = 8, and then to convert from octal to binary. Also, the
algorithm can be used to convert a binary (or octal) fraction to decimal, by
choosing b = 10 and using binary (or octal) arithmetic.
To give an example, if $x = (.101)_2$, then, with $b = 10 = (1010)_2$ and
binary arithmetic, we get from Algorithm 1.2 the digits 6, 2, 5 in turn, after
which the remaining fractional part is zero; hence
$$(.101)_2 = (.625)_{10}$$
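Assuming, as the example indicates, that Algorithm 1.2 is the repeated-multiplication scheme for fractions (multiply by the new base b and peel off the integer part as the next digit), the same computation can be written down for a general base; exact rational arithmetic stands in here for the binary hand arithmetic of the text:

```python
from fractions import Fraction

def fraction_to_base(x: Fraction, base: int, n: int) -> list:
    """First n digits of 0 <= x < 1 in the given base, by repeated multiplication."""
    digits = []
    for _ in range(n):
        x *= base
        digit = int(x)
        digits.append(digit)
        x -= digit
        if x == 0:               # terminating expansion
            break
    return digits

# (.101)_2 = 1/2 + 1/8 = 5/8; its decimal digits are 6, 2, 5 and the expansion terminates
print(fraction_to_base(Fraction(5, 8), 10, 10))   # prints [6, 2, 5]
```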
EXERCISES
1.2-1 Convert the following binary fractions to decimal fractions:
$(.1100011)_2$    $(.11111111)_2$
1.2-2 Find the first 5 digits of .1 written as an octal fraction, then compute from it the first 15
digits of .1 as a binary fraction.
1.2-3 Convert the following octal fractions to decimal:
$(.614)_8$    $(.776)_8$
Compare with your answer in Exercise 1.2-1.
1.2-4 Find a binary number which approximates to within $10^{-3}$.
1.2-5 If we want to convert a decimal integer N to binary using Algorithm 1.1, we have to use
binary arithmetic. Show how to carry out this conversion using Algorithm 1.2 and decimal
arithmetic. (Hint: Divide N by the appropriate power of 2, convert the result to binary, then
shift the “binary point” appropriately.)
1.2-6 If we want to convert a terminating binary fraction x to a decimal fraction using
Algorithm 1.2, we have to use binary arithmetic. Show how to carry out this conversion using
Algorithm 1.1 and decimal arithmetic.
1.3 FLOATING-POINT ARITHMETIC

and
See Exercise 1.3-3. The maximum possible value for this relative error is often called the
unit roundoff and is denoted by u.
When an arithmetic operation is applied to two floating-point num-
bers, the result usually fails to be a floating-point number of the same
length. If, for example, we deal with two-decimal-digit numbers and
then
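For instance (the operands below are chosen here only for illustration), the exact product of two two-decimal-digit numbers generally has four digits and must be rounded back to two; with symmetric rounding the relative error so introduced is at most $u = \frac{1}{2}\cdot 10^{-1}$:

```python
from math import floor, log10

def fl(x: float, t: int = 2) -> float:
    """Round x symmetrically to t significant decimal digits (a toy model of a t-digit machine)."""
    if x == 0.0:
        return 0.0
    e = floor(log10(abs(x)))              # decimal exponent of x
    scale = 10.0 ** (t - 1 - e)
    return round(x * scale) / scale

x, y = 0.32, 0.57                         # two-digit machine numbers
exact = x * y                             # 0.1824, which needs four digits
stored = fl(exact)                        # 0.18, the rounded two-digit result
print(exact, stored, abs(stored - exact) / abs(exact))   # relative error about 0.013 <= 0.05
```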
(1.12)
by the formula
Here, we have used Eqs. (1.11a) and (1.11b), and have not bothered to
This shows that the computed value for s satisfies the perturbed equation
(1.13)
Note that we can reduce all exponents by 1 in case $a_{r+1} = 1$, that is, in
case the last division need not be carried out.
1.3-1 The following numbers are given in a decimal computer with a four-digit normalized
mantissa:
Perform the following operations, and indicate the error in the result, assuming symmetric
rounding:
(1.15)
with s the largest integer such that For instance, x* = 3 agrees
with to one significant (decimal) digit, while
is correct to three significant digits (as an approximation to ). Suppose
now that we are to calculate the number
which can be evaluated quite accurately for small x; else, one could make
use of the Taylor expansion (see Sec. 1.7) for f(x),
(1.17)
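As a generic illustration of this use of a Taylor expansion (the function $1 - \cos x$ below is an assumption made here for illustration, not necessarily the f under discussion), a few terms of the series restore the digits that direct evaluation loses to cancellation for small x:

```python
from math import cos

def direct(x: float) -> float:
    return 1.0 - cos(x)                          # subtracts two nearly equal numbers when x is small

def series(x: float) -> float:
    # truncated Taylor expansion: 1 - cos x = x^2/2! - x^4/4! + x^6/6! - ...
    return x**2 / 2.0 - x**4 / 24.0 + x**6 / 720.0

x = 1.0e-4
print(direct(x))   # roughly 5.0e-09, but only the leading 8 or so digits are reliable
print(series(x))   # 4.9999999999583...e-09, accurate essentially to full machine precision
```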
Let us assume that $b^2 - 4ac > 0$, that $b > 0$, and that we wish to find the
root of smaller absolute value using (1.17); i.e.,
$$x = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \qquad (1.18)$$
equation
(1.19)
Using (1.18) and five-decimal-digit floating-point chopped arithmetic, we
calculate
while in fact,
is the correct root to the number of digits shown. Here too, the loss of
significant digits can be avoided by using an alternative formula for the
calculation of the absolutely smaller root, viz.,
$$x = \frac{-2c}{b + \sqrt{b^2 - 4ac}} \qquad (1.20)$$
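The two formulas can be compared directly in ordinary double-precision arithmetic; the coefficients below are chosen here only to make the cancellation visible and are not those of Eq. (1.19):

```python
from math import sqrt

a, b, c = 1.0, 1.0e8, 1.0                 # b*b >> 4*a*c, so -b + sqrt(b*b - 4*a*c) cancels badly
d = sqrt(b * b - 4.0 * a * c)

x_naive  = (-b + d) / (2.0 * a)           # Eq. (1.18): difference of nearly equal numbers
x_stable = -2.0 * c / (b + d)             # Eq. (1.20): no cancellation

print(x_naive)    # roughly -7.45e-09: off by some 25 percent
print(x_stable)   # about -1e-08: accurate essentially to full precision
```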
(1.21)
The larger the condition, the more ill-conditioned the function is said to
be. Here we have made use of the fact (see Sec. 1.7) that
then so that
and this number can be quite large for |x| near 1. Thus, for x near 1 or
- 1, this function is quite ill-conditioned. It very much magnifies relative
errors in the argument there.
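Taking the condition of f at x to be the relative magnification factor $|x f'(x)/f(x)|$ (the quantity the discussion above appears to use), it can be estimated numerically; the choice $f(x) = \arcsin x$ below is only an illustration of a function whose condition grows without bound as $|x|$ approaches 1:

```python
from math import asin

def condition(f, x: float, h: float = 1.0e-6) -> float:
    """Estimate the condition |x f'(x)/f(x)|, using a central difference for f'."""
    fprime = (f(x + h) - f(x - h)) / (2.0 * h)
    return abs(x * fprime / f(x))

for x in (0.5, 0.9, 0.99, 0.999):
    print(x, condition(asin, x))     # grows roughly like 1/sqrt(1 - x*x) as x approaches 1
```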
The related notion of instability describes the sensitivity of a numerical
process for the calculation of f(x) from x to the inevitable rounding errors
committed during its execution in finite precision arithmetic. The precise
effect of these errors on the accuracy of the computed value for f(x) is
hard to determine except by actually carrying out the computations for
particular finite precision arithmetics and comparing the computed answer
with the exact answer. But it is possible to estimate these effects roughly by
considering the rounding errors one at a time. This means we look at the
individual computational steps which make up the process. Suppose there
are n such steps. Denote by $x_i$ the output from the $i$th such step, and take
$x_0 = x$. Such an $x_i$ then serves as input to one or more of the later steps
and, in this way, influences the final answer $x_n = f(x)$. Denote by $f_i$ the
function which describes the dependence of the final answer on the
intermediate result $x_i$. In particular, $f_0$ is just $f$. Then the total process is
unstable to the extent that one or more of these functions $f_i$ is ill-condi-
tioned. More precisely, the process is unstable to the extent that one or
more of the $f_i$'s has a much larger condition than $f = f_0$ has. For it is the
condition of $f_i$ which gauges the relative effect of the inevitable rounding
error incurred at the $i$th step on the final answer.
To give a simple example, consider the function