CHAP 03e

Approximations and Round-Off Errors

Chapter 3
Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
• Numerical methods yield approximate results that are close to the exact analytical
solution.
• How confident are we in our approximate result? In other words,
“how much error is present in our calculation, and is it tolerable?”
Significant Figures
• Number of significant figures indicates precision. Significant digits of a number
are those that can be used with confidence, e.g., the number of certain digits plus
one estimated digit.
How many significant figures does 53,800 have? It depends on how the number is written:
$5.38 \times 10^4$ has 3
$5.3800 \times 10^4$ has 5
Zeros that are only used to locate the decimal point are not significant figures:
0.00001753 has 4
0.001753 has 4
Identifying Significant Digits
http://en.wikipedia.org/wiki/Significant_figures
• All non-zero digits are considered significant. For example, 91 has two significant figures, while 123.45 has five significant figures.
• Zeros appearing anywhere between two non-zero digits are significant. Ex: 101.1002 has seven significant figures.
• Leading zeros are not significant. Ex: 0.00052 has two significant figures.
• Trailing zeros in a number containing a decimal point are significant. Ex: 12.2300 has six significant figures: 1, 2, 2, 3, 0 and 0. The number 0.000122300 still has only six significant figures (the zeros before the 1 are not significant). In addition, 120.00 has five significant figures.
• The significance of trailing zeros in a number not containing a decimal point can be ambiguous. For example, it may not always be clear if a number like 1300 is accurate to the nearest unit. Various conventions exist to address this issue.
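The rules above can be sketched as a small Python helper. The name `count_sig_figs` is hypothetical, and counting the trailing zeros of a number without a decimal point as not significant is only one of the possible conventions mentioned above, chosen here as an assumption.

```python
def count_sig_figs(text: str) -> int:
    """Count significant figures of a decimal number given as a string.

    Assumption: trailing zeros in a number WITHOUT a decimal point are
    treated as not significant (the ambiguous case, e.g. 1300 -> 2).
    """
    s = text.strip().lstrip("+-").replace(",", "")
    mantissa = s.lower().split("e")[0]      # ignore any exponent part
    if "." in mantissa:
        # Leading zeros never count; trailing zeros after the point do.
        return len(mantissa.replace(".", "").lstrip("0"))
    # No decimal point: drop leading zeros and (by convention) trailing zeros.
    return len(mantissa.lstrip("0").rstrip("0"))


for number in ["91", "123.45", "101.1002", "0.00052", "12.2300", "120.00", "1300"]:
    print(number, count_sig_figs(number))
```

Passing the number as a string matters: once a value such as 12.2300 is converted to a float, its trailing zeros are no longer recoverable.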
Error Definitions
True error: $E_t = \text{true value} - \text{approximation}$ (may be positive or negative)
True percent relative error: $\varepsilon_t = \dfrac{\text{true value} - \text{approximation}}{\text{true value}} \times 100\%$
Approximate Error
• For numerical methods, the true value is known only when we deal with functions that can be solved analytically.
• In real-world applications, we usually do not know the answer a priori.
Approximate error = current approximation (iteration i) − previous approximation (iteration i−1)
Approximate percent relative error: $\varepsilon_a = \dfrac{\text{approximate error}}{\text{approximation}} \times 100\%$
Iterative approaches (e.g. Newton’s method)
Approximate percent relative error: $\varepsilon_a = \dfrac{\text{current approx.} - \text{previous approx.}}{\text{current approx.}} \times 100\%$
Computations are repeated until the stopping criterion is satisfied:
$|\varepsilon_a| < \varepsilon_s$, where $\varepsilon_s$ is a pre-specified percent tolerance based on your knowledge of the solution. (Use the absolute value.)
If $\varepsilon_s$ is chosen as
$\varepsilon_s = (0.5 \times 10^{2-n})\%$
then the result is correct to at least n significant figures (Scarborough 1966).
EXAMPLE 3.2: Maclaurin series expansion
$e^x = 1 + x + \dfrac{x^2}{2} + \dfrac{x^3}{3!} + \dots + \dfrac{x^n}{n!}$
Calculate $e^{0.5}$ (= 1.648721…) to 3 significant figures. During the calculation, compute the true and approximate percent relative errors at each step.
Error tolerance: $\varepsilon_s = (0.5 \times 10^{2-3})\% = 0.05\%$

MATLAB file in: C:\ERCAL\228\MATLAB\3\EXPTaylor.m
Terms                          Count  Result       True εt (%)  Approx. εa (%)
1                              1      1            39.3         -
1 + 0.5                        2      1.5          9.02         33.3
1 + 0.5 + 0.5^2/2              3      1.625        1.44         7.69
1 + 0.5 + 0.5^2/2 + 0.5^3/6    4      1.6458333    0.175        1.27
... + 0.5^4/24                 5      1.6484375    0.0172       0.158
... + 0.5^5/120                6      1.648697917  0.00142      0.0158
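The iteration in Example 3.2 can be sketched in Python as follows. This is not the referenced MATLAB file; the function name `maclaurin_exp` and the `max_terms` safety limit are assumptions made for illustration.

```python
import math


def maclaurin_exp(x, n_sig, max_terms=100):
    """Sum the Maclaurin series of e**x until |eps_a| < eps_s.

    Returns a list of rows (count, result, eps_t_percent, eps_a_percent);
    eps_a is None for the first row because there is no previous estimate.
    """
    eps_s = 0.5 * 10 ** (2 - n_sig)          # stopping tolerance in percent
    true_value = math.exp(x)
    total, term, count = 1.0, 1.0, 1
    rows = [(count, total, abs(true_value - total) / true_value * 100, None)]
    while count < max_terms:
        term *= x / count                     # next term x**count / count!
        previous, total = total, total + term
        count += 1
        eps_t = abs(true_value - total) / true_value * 100
        eps_a = abs((total - previous) / total) * 100
        rows.append((count, total, eps_t, eps_a))
        if eps_a < eps_s:
            break
    return rows


for count, result, eps_t, eps_a in maclaurin_exp(0.5, 3):
    print(count, result, eps_t, eps_a)
```

For x = 0.5 and three significant figures the loop stops after six terms, matching the table above.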
Round-off and Chopping Errors
• Numbers such as π, e, or √7 cannot be expressed by a fixed number of significant figures. Therefore, they cannot be represented exactly by a computer, which has a fixed word length.
π = 3.1415926535…
• The discrepancy introduced by this omission of significant figures is called round-off or chopping error.
• If π is to be stored on a base-10 system carrying 7 significant digits:
chopping: π = 3.141592, error $|E_t|$ = 0.00000065
round-off: π = 3.141593, error $|E_t|$ = 0.00000035
• Some machines use chopping because rounding has additional computational overhead.
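The difference between chopping and rounding to 7 significant digits can be reproduced with Python's `decimal` module. The helper name `to_sig_digits` is hypothetical, and the 12-digit value of π used as the "true" value is the one quoted in these slides.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def to_sig_digits(value: str, digits: int, rounding) -> Decimal:
    """Keep `digits` significant digits of a decimal string."""
    d = Decimal(value)
    # adjusted() is the exponent of the leading digit (0 for 3.14...).
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    return d.quantize(quantum, rounding=rounding)


pi_text = "3.14159265358"
chopped = to_sig_digits(pi_text, 7, ROUND_DOWN)      # digits simply dropped
rounded = to_sig_digits(pi_text, 7, ROUND_HALF_UP)   # last digit rounded

print(chopped, Decimal(pi_text) - chopped)
print(rounded, Decimal(pi_text) - rounded)
```

The signed differences show that chopping always errs in one direction, while rounding roughly halves the magnitude of the error.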
Number Representation
[Figure: how 86409 is represented in base 10 and how 173 is represented in base 2]
The representation of −173 on a 16-bit computer using the signed magnitude method
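A minimal Python sketch of the signed magnitude method: the first bit holds the sign (0 for positive, 1 for negative) and the remaining bits hold the magnitude in binary. The function name `signed_magnitude` is an assumption for illustration.

```python
def signed_magnitude(value: int, bits: int = 16) -> str:
    """Return the signed-magnitude bit pattern of an integer as a string."""
    magnitude = abs(value)
    if magnitude >= 2 ** (bits - 1):
        raise OverflowError(f"{value} does not fit in {bits} bits")
    sign_bit = "1" if value < 0 else "0"
    # Pad the binary magnitude with leading zeros to fill bits - 1 positions.
    return sign_bit + format(magnitude, f"0{bits - 1}b")


print(signed_magnitude(-173))   # 1000000010101101
print(signed_magnitude(173))    # 0000000010101101
```

With 16 bits, one bit is spent on the sign, so the largest magnitude that fits is 2^15 − 1 = 32767.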
Computer representation of a floating-point number:
$m \cdot b^e$
where m is the mantissa, b is the base of the number system used, and e is the exponent.
$156.78 \rightarrow 0.15678 \times 10^3$ (in a floating-point base-10 system)
$1/34 = 0.029411765$. Suppose only 4 decimal places are to be stored:
$0.0294 \times 10^0$
• Normalize: remove the leading zero.
• Multiply the mantissa by 10 and lower the exponent by 1:
$0.2941 \times 10^{-1}$
An additional significant figure is retained.
• Due to normalization, the absolute value of m is limited: $\dfrac{1}{b} \le m < 1$
for a base-10 system: 0.1 ≤ m < 1
for a base-2 system: 0.5 ≤ m < 1
• Floating-point representation allows both fractions and very large numbers to be expressed on the computer. However, floating-point numbers
▫ take up more room
▫ take longer to process than integer numbers.
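Normalization can be illustrated in Python. The helper `normalize_base10` is a hypothetical sketch that relies on `math.log10`, so it is only approximate in floating point; for base 2, the standard library's `math.frexp` already returns a mantissa in the range 0.5 ≤ |m| < 1.

```python
import math


def normalize_base10(x: float):
    """Return (m, e) with x = m * 10**e and 0.1 <= |m| < 1 (x must be nonzero)."""
    e = math.floor(math.log10(abs(x))) + 1
    return x / 10 ** e, e


m, e = normalize_base10(156.78)       # mantissa close to 0.15678, exponent 3
m2, e2 = normalize_base10(1 / 34)     # mantissa close to 0.29411765, exponent -1
mb, eb = math.frexp(0.15625)          # base 2: 0.15625 = 0.625 * 2**-2

print(m, e)
print(m2, e2)
print(mb, eb)
```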
Q: What is the smallest positive floating-point number that can be represented using a 7-bit word (3 bits reserved for the mantissa)?
(* Solve Example 3.4 page 61 *)
Your turn:
Problem Statement:
• What is the largest positive floating-point number that can be represented using a 7-bit word (3 bits reserved for the mantissa)?
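One way to explore these two questions is to enumerate every positive value of such a system. The slides do not spell out the word layout, so the sketch below assumes one: 1 bit for the sign of the number, 3 bits for a signed exponent (1 sign bit plus 2 magnitude bits, so exponents from −3 to +3), and 3 mantissa bits with a normalized mantissa whose leading bit is 1. The function name `positive_values` is hypothetical.

```python
def positive_values(exp_magnitude_bits: int = 2, mant_bits: int = 3):
    """List all positive normalized values of the assumed 7-bit system."""
    max_exp = 2 ** exp_magnitude_bits - 1            # 3 for two magnitude bits
    values = set()
    for exponent in range(-max_exp, max_exp + 1):
        # Normalized mantissas: bit patterns 100, 101, 110, 111 (0.5 to 0.875).
        for pattern in range(2 ** (mant_bits - 1), 2 ** mant_bits):
            mantissa = pattern / 2 ** mant_bits
            values.add(mantissa * 2.0 ** exponent)
    return sorted(values)


values = positive_values()
print(len(values), values[0], values[-1])
```

Under these assumptions the smallest positive value is 0.5 · 2^−3 and the largest is 0.875 · 2^3; a different bit layout would give different limits.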
Additional notes on floating-point numbers:
• Addition of two floating-point numbers (normalization is needed)
• Multiplication
• Overflow / Underflow: very small and very large numbers cannot be represented using a fixed-length mantissa/exponent representation, so overflow and underflow can occur while doing arithmetic with these numbers.
• Double-precision arithmetic is always recommended.
• The interval between numbers increases as the numbers grow in magnitude.
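These notes can be observed directly with Python floats, which are IEEE 754 double-precision numbers on common platforms. The sketch assumes Python 3.9 or later for `math.ulp`, which returns the gap between a float and the next larger representable value.

```python
import math
import sys

largest = sys.float_info.max
overflowed = largest * 10            # too large to represent: becomes inf
underflowed = 5e-324 / 2             # smaller than the tiniest positive double: becomes 0.0

spacing_near_one = math.ulp(1.0)     # gap between 1.0 and the next double
spacing_near_1e16 = math.ulp(1e16)   # the gap is far wider for large numbers

print(overflowed, underflowed)
print(spacing_near_one, spacing_near_1e16)
```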
Aspects of floating-point representation that have significance regarding computer round-off errors:
1. There is a limited range of quantities that may be represented.
- Leads to overflow error
2. There are only a finite number of quantities that can be represented within the range.
Example:
π = 3.14159265358 . . . is to be stored on a base-10 number system carrying seven significant figures.
Solution:
One method of approximation would be to merely omit, or “chop off,” the eighth and higher terms, as in π = 3.141592, which introduces an error of $E_t$ = 0.00000065.
3. The interval between numbers, Δx, increases as the numbers grow in magnitude.
• It is this characteristic that allows floating-point representation to preserve significant digits.
• However, it also means that quantizing errors will be proportional to the magnitude of the number being represented.
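For normalized floating-point numbers, this proportionality can be expressed as
• $\dfrac{|\Delta x|}{|x|} \le \varepsilon$ when chopping is employed,
• $\dfrac{|\Delta x|}{|x|} \le \dfrac{\varepsilon}{2}$ when rounding is employed,
where $\varepsilon$ is referred to as the machine epsilon, computed as
$\varepsilon = b^{1-t}$
where
b is the number base and
t is the number of significant digits in the mantissa.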
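As a quick check of the formula $\varepsilon = b^{1-t}$, the Python sketch below evaluates it for a 3-bit binary mantissa and for IEEE 754 double precision (b = 2, t = 53, an assumption about the platform's float type), and compares the latter with an epsilon found by repeated halving. The function names are hypothetical.

```python
import sys


def machine_epsilon(base: int, digits: int) -> float:
    """Machine epsilon b**(1 - t) for base b and t mantissa digits."""
    return float(base) ** (1 - digits)


def epsilon_by_halving() -> float:
    """Halve eps until adding half of it to 1.0 no longer changes 1.0."""
    eps = 1.0
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps


print(machine_epsilon(2, 3))                 # 3-bit mantissa system
print(machine_epsilon(2, 53))                # double precision
print(epsilon_by_halving(), sys.float_info.epsilon)
```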
Machine Epsilon
Problem Statement.
• Determine the machine epsilon and verify its effectiveness in characterizing the errors of the number system on the adjacent figure. Assume that chopping is used.
Solution
• The hypothetical floating-point system from Example 3.5 employed values of the base b = 2 and the number of mantissa bits t = 3. Therefore, the machine epsilon would be
$\varepsilon = 2^{1-3} = 0.25$
• An example of a maximum error would be a value falling just below the upper bound of the interval between the first value and the second value. Based on the previous problem:
First value = 0.125000
Interval to second value = 0.03125
Relative error < 0.03125 / 0.125000 = 0.25
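The bound can also be checked by brute force. The sketch below rebuilds the positive values of a small binary system under assumed parameters (3 mantissa bits, normalized mantissa, exponents from −3 to +3; the exponent range is an assumption, not something stated on the slide) and computes the ratio of each gap Δx to the value x at the lower end of that gap.

```python
def normalized_values(mant_bits: int = 3, exponents=range(-3, 4)):
    """Positive values m * 2**e with a normalized mant_bits-bit mantissa."""
    values = []
    for exponent in exponents:
        for pattern in range(2 ** (mant_bits - 1), 2 ** mant_bits):
            values.append(pattern / 2 ** mant_bits * 2.0 ** exponent)
    return sorted(values)


values = normalized_values()
epsilon = 2.0 ** (1 - 3)                      # b = 2, t = 3

# Relative gap between each value and its successor.
ratios = [(upper - lower) / lower for lower, upper in zip(values, values[1:])]
worst = max(ratios)

print(epsilon, worst)
print((0.15625 - 0.125) / 0.125)              # the interval quoted above
```

The largest relative gap equals the machine epsilon, so with chopping no stored value can be off by more than that fraction.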
Your turn:
Determine the machine epsilon and verify its effectiveness in characterizing the errors of the number system on the adjacent figure. Assume that chopping is used.
• Problem Statement 1: $0101011_2$
• Problem Statement 2: $0110001_2$
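Common Arithmetic Operations

ADDITION:
$0.1557 \cdot 10^1 + 0.4381 \cdot 10^{-1}$
• The exponents differ by 2 [1 − (−1) = 2], so the mantissa of the number with the smaller exponent is shifted two places to the right:
• $0.4381 \cdot 10^{-1} \rightarrow 0.004381 \cdot 10^1$
• Thus, $0.1557 \cdot 10^1 + 0.004381 \cdot 10^1 = 0.160081 \cdot 10^1$, which is chopped to $0.1600 \cdot 10^1$ on a 4-digit mantissa.
• Notice how the last two digits of the second number that were shifted to the right have essentially been lost from the computation.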
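The addition above can be imitated with Python's `decimal` module by using a context that keeps 4 significant digits and truncates instead of rounding. The context name `chop4` is an assumption, and this models only the 4-digit chopped mantissa, not the exponent limits of the hypothetical machine.

```python
from decimal import Context, Decimal, ROUND_DOWN

# 4 significant digits, extra digits are chopped (truncated).
chop4 = Context(prec=4, rounding=ROUND_DOWN)

a = Decimal("0.1557E1")
b = Decimal("0.4381E-1")

exact = a + b                 # default context keeps all digits: 1.60081
chopped = chop4.add(a, b)     # result limited to a 4-digit mantissa: 1.600

print(exact, chopped)
```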
• SUBTRACTION:
▫ Subtraction is performed like addition, except that the sign of the subtrahend is reversed.
▫ Example: subtract 26.86 from 36.41: $0.3641 \cdot 10^2 - 0.2686 \cdot 10^2 = 0.0955 \cdot 10^2$
▫ The result is not normalized, so shift the decimal point one place to the right to give $0.9550 \cdot 10^1 = 9.550$.
▫ The loss of significance during the subtraction of nearly equal numbers is among the greatest sources of round-off error in numerical methods.
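• MULTIPLICATION:
▫ The mantissas are multiplied and the exponents are added.
▫ Example: [Figure: worked multiplication example]
▫ If, as in this case, a leading zero is introduced, the result is normalized and then chopped.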

• DIVISION:
▫ Division is performed in a similar manner, but the
mantissas are divided and the exponents are
subtracted. Then the results are normalized and
chopped.
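Multiplication and division on a 4-digit chopped mantissa can be sketched with a truncating `decimal` context. The operands below are illustrative values chosen for this sketch, not necessarily the ones on the original slide, and the context name `chop4` is an assumption.

```python
from decimal import Context, Decimal, ROUND_DOWN

# 4 significant digits, extra digits are chopped (truncated).
chop4 = Context(prec=4, rounding=ROUND_DOWN)

x = Decimal("0.1363E3")      # illustrative operand: 136.3
y = Decimal("0.6423E-1")     # illustrative operand: 0.06423

# Mantissas multiply (0.1363 * 0.6423 = 0.08754549) and exponents add (3 + -1 = 2);
# normalizing removes the leading zero and chopping keeps 4 digits.
exact_product = x * y
product = chop4.multiply(x, y)

# Mantissas divide and exponents subtract, then normalize and chop.
quotient = chop4.divide(x, y)

print(exact_product, product, quotient)
```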
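Adding a Large and a Small Number.
• Suppose we add a small number, 0.0010, to a large number, 4000, using a hypothetical computer with a 4-digit mantissa and a 1-digit exponent. We modify the smaller number so that its exponent matches the larger:
$0.4000 \cdot 10^4 + 0.0000001 \cdot 10^4 = 0.4000001 \cdot 10^4$
which is chopped to $0.4000 \cdot 10^4$. The small number is lost entirely.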
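The same effect, and why the order of summation matters, can be sketched with a 4-digit truncating `decimal` context (the name `chop4` is an assumption). Adding 0.0010 to 4000 a thousand times leaves 4000 unchanged, whereas summing the small numbers first lets them accumulate into something large enough to register.

```python
from decimal import Context, Decimal, ROUND_DOWN

# 4 significant digits, extra digits are chopped (truncated).
chop4 = Context(prec=4, rounding=ROUND_DOWN)

large = Decimal("4000")
small = Decimal("0.0010")

# Large number first: every single addition is chopped back to 4000.
large_first = large
for _ in range(1000):
    large_first = chop4.add(large_first, small)

# Small numbers first: they add up to 1.000 before meeting the large number.
small_sum = Decimal(0)
for _ in range(1000):
    small_sum = chop4.add(small_sum, small)
small_first = chop4.add(large, small_sum)

print(large_first, small_first)
```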
SEATWORK: Normalize your answers!
1. $1.0678 \cdot 10^0 + 0.0986 \cdot 10^{-2}$
2. $0.5612 \cdot 10^2 + 0.5959 \cdot 10^{-2}$
3. Subtract 0.5612 and 1.5959
4. Subtract $1.0008 \cdot 10^{-2}$ and $0.0341 \cdot 10^1$
5. Multiply 1.5612 and 0.0219
6. Multiply $0.5612 \cdot 10^2$ and $0.5959 \cdot 10^{-2}$
7. Add $0.5612 \cdot 10^4 + 0.005959 \cdot 10^{-2}$
Subtractive Cancellation.
• This term refers to the round-off error induced when subtracting two nearly equal floating-point numbers.
• It is the effect of adding and subtracting large numbers (each with some small error) and then placing great significance on the differences.
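A one-line experiment in Python (double precision assumed) shows how much significance can be lost: 1 + 1e-15 must first be rounded to a representable double, and subtracting 1 then exposes that rounding error as a large relative error in the small result.

```python
x = 1e-15

# 1.0 + x is rounded to the nearest double; subtracting 1.0 leaves only
# the few bits of x that survived that rounding.
computed = (1.0 + x) - 1.0
relative_error = abs(computed - x) / x

print(computed, relative_error)
```

Each input is accurate to about 16 significant digits, yet the difference is off by roughly ten percent.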
One common instance where this can occur involves finding the roots of a quadratic equation or parabola with the quadratic formula:
$x_{1,2} = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
• Alternative formulation to minimize subtractive cancellation:
$x_{1,2} = \dfrac{-2c}{b \pm \sqrt{b^2 - 4ac}}$
Subtractive Cancellation
Problem Statement.
• Compute the values of the roots of a quadratic equation with a = 1, b = 3000.001, and c = 3. Check the computed values versus the true roots of $x_1 = -0.001$ and $x_2 = -3000$.
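The problem can be tried in simulated single precision using only the Python standard library. The helper `f32` is an assumption of this sketch: it rounds a double to the nearest single-precision value via `struct`, and every intermediate result is passed through it to mimic a single-precision machine. Both the standard quadratic formula and the alternative form $x = -2c / (b \pm \sqrt{b^2 - 4ac})$ are evaluated.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


a, b, c = f32(1.0), f32(3000.001), f32(3.0)

# Every intermediate result is rounded to single precision.
disc = f32(f32(b * b) - f32(f32(4.0 * a) * c))
root = f32(math.sqrt(disc))

# Standard formula: -b + root subtracts two nearly equal numbers.
x1_std = f32(f32(-b + root) / f32(2.0 * a))
x2_std = f32(f32(-b - root) / f32(2.0 * a))

# Alternative formula: b + root involves no cancellation, but b - root does.
x1_alt = f32(f32(-2.0 * c) / f32(b + root))
x2_alt = f32(f32(-2.0 * c) / f32(b - root))

print("standard:   ", x1_std, x2_std)
print("alternative:", x1_alt, x2_alt)
```

In this simulation the standard formula is off by about 2% for the small root but accurate for the large one, while the alternative formula behaves the other way around, so a robust solver would pick the cancellation-free form for each root.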
Evaluation of $e^x$ using Infinite Series
Problem Statement.
• The exponential function $y = e^x$ is given by the infinite series (Maclaurin series expansion)
$y = 1 + x + \dfrac{x^2}{2} + \dfrac{x^3}{3!} + \dots$
• Evaluate this function for x = 10 and x = −10, and be attentive to the problems of round-off error.
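A sketch of this experiment in simulated single precision, using only the Python standard library. The helper `f32` (rounding a double to single precision via `struct`), the function name `exp_series_f32`, and the fixed count of 60 terms are all assumptions of this sketch. For x = −10 the terms alternate in sign and grow into the thousands before shrinking, so their rounding errors are large compared with the tiny true result; computing $e^{10}$ and taking the reciprocal avoids that cancellation.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def exp_series_f32(x: float, n_terms: int = 60) -> float:
    """Sum the Maclaurin series of e**x, rounding each step to single precision."""
    x = f32(x)
    total = f32(1.0)
    term = f32(1.0)
    for n in range(1, n_terms):
        term = f32(f32(term * x) / n)      # x**n / n! built up incrementally
        total = f32(total + term)
    return total


true_value = math.exp(-10.0)
direct = exp_series_f32(-10.0)                    # alternating series
reciprocal = f32(1.0 / exp_series_f32(10.0))      # 1 / e**10, all terms positive

err_direct = abs(direct - true_value) / true_value
err_reciprocal = abs(reciprocal - true_value) / true_value

print(true_value, direct, reciprocal)
print(err_direct, err_reciprocal)
```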
