Intro Errors Floatingpoint Binary
Intro Errors Floatingpoint Binary
2 http://numericalmethods.eng.usf.edu
True Error
◼ Defined as the difference between the true
value in a calculation and the approximate value
found using a numerical method etc.
3 http://numericalmethods.eng.usf.edu
Example—True Error
The derivative, f (x) of a function f (x) can be
approximated by the equation,
f ( x + h) − f ( x)
f ' ( x)
h
If f ( x) = 7e and h = 0.3
0.5 x
4
Example (cont.)
Solution:
a) For x = 2 and h = 0.3
f (2 + 0.3) − f (2)
f ' ( 2)
0.3
f (2.3) − f (2)
=
0.3 e=
7e 0.5( 2.3) − 7e 0.5( 2 ) 2.718281828459045235360
=
0.3
22.107 − 19.028
= = 10.263
0.3
5
Example (cont.)
Solution:
b) The exact value of f ' (2) can be found by using
our knowledge of differential calculus.
f ( x) = 7e 0.5 x
f ' ( x ) = 7 0.5 e0.5 x
= 3.5e 0.5 x
So the true value of f ' ( 2) is
f ' (2) = 3.5e 0.5( 2)
= 9.5140
c) True error is calculated as
Et = True Value – Approximate Value
= 9.5140 − 10.263 = −0.722
6
Relative True Error
◼ Defined as the ratio between the true
error, and the true value.
True Error
Relative True Error ( t ) =
True Value
7
Example—Relative True Error
Following from the previous example for true error,
find the relative true error for f ( x) = 7e 0.5 x at f ' (2)
with h = 0.3
From the previous example,
Et = −0.722
Relative True Error is defined as
True Error
t =
True Value
− 0.722
= = −0.075888
9.5140
as a percentage,
t = −0.075888 100% = −7.5888%
8
Approximate Error
◼ What can be done if true values are not
known or are very difficult to obtain?
◼ Approximate error is defined as the
difference between the present
approximation and the previous
approximation.
Approximate Error ( E a ) = Present Approximation – Previous Approximation
9
Example—Approximate Error
For f ( x) = 7e 0.5 x at x = 2 find the following,
a) f (2) using h = 0.3
b) f (2) using h = 0.15
c) approximate error for the value of f (2) for part b)
Solution:
a) For x = 2 and h = 0.3
f ( x + h) − f ( x)
f ' ( x)
h
f (2 + 0.3) − f (2)
f ' ( 2)
0.3
10
Example (cont.)
Solution: (cont.)
f (2.3) − f (2)
=
0.3
7e 0.5( 2.3) − 7e 0.5( 2 )
=
0.3
22.107 − 19.028
= = 10.263
0.3
b) For x = 2 and h = 0.15
f (2 + 0.15) − f (2)
f ' (2)
0.15
f (2.15) − f (2)
=
0.15
11
Example (cont.)
Solution: (cont.)
7e 0.5( 2.15) − 7e 0.5( 2 )
=
0.15
20.50 − 19.028
= = 9.8800
0.15
12
Relative Approximate Error
◼ Defined as the ratio between the
approximate error and the present
approximation.
Approximate Error
Relative Approximate Error ( a) =
Present Approximation
13
Example—Relative Approximate Error
For f ( x ) = 7e 0.5 x
at x = 2 , find the relative approximate
error using values from h = 0.3 and h = 0.15
Solution:
From Example 3, the approximate value of f (2) = 10.263
using h = 0.3 and f (2) = 9.8800 using h = 0.15
Ea = Present Approximation – Previous Approximation
= 9.8800 − 10.263
= −0.38300
14
Example (cont.)
Solution: (cont.)
Approximate Error
a =
Present Approximation
− 0.38300
= = −0.038765
9.8800
as a percentage,
a = −0.038765 100% = −3.8765%
15
How is Absolute Relative Error used as a
stopping criterion?
If |a | s where s is a pre-specified tolerance, then
no further iterations are necessary and the process is
stopped.
16
Table of Values
For f ( x) = 7e at x = 2 with varying step size, h
0.5 x
h f (2) a m
0.3 10.263 N/A 0
17
Sources of Error
10/5/2024
Two sources of numerical error
1) Round off error
2) Truncation error
19
Round off Error
◼ Caused by representing a number
approximately
1
0.333333
3
2 1.4142...
20
Problems created by round off error
21
Problem with Patriot missile
◼ Clock cycle of 1/10 seconds was
represented in 24-bit fixed point
register created an error of 9.5 x
10-8 seconds.
◼ The battery was on for 100
consecutive hours, thus causing
an inaccuracy of
−8 s 3600s
= 9.5 10 100hr
0.1s 1hr
= 0.342s
22
Problem (cont.)
◼ The shift calculated in the ranging
system of the missile was 687 meters.
◼ The target was considered to be out of
range at a distance greater than 137
meters.
23
Truncation Error
24
Truncation error
◼ Error caused by truncating or
approximating a mathematical
procedure.
25
Example of Truncation Error
Taking only a few terms of a Maclaurin series to
approximate
x
e
2 3
x x
e x = 1 + x + + + ....................
2! 3!
If only 3 terms are used,
x 2
Truncation Error = e − 1 + x +
x
2!
26
Example 1 —Maclaurin series
1 .2
Calculate the value of e with an absolute
relative approximate error of less than 1%.
1.2 2 1.2 3
e 1.2
= 1 + 1.2 + + + ...................
2! 3!
n
e 1 .2
Ea a %
1 1 __ ___
2 2.2 1.2 54.545
3 2.92 0.72 24.658
4 3.208 0.288 8.9776
5 3.2944 0.0864 2.6226
6 3.3151 0.020736 0.62550
90
9
y = x2 2
60
x dx
30 3
0 x
0 3 6 9 12
29
Integration example (cont.)
Choosing a width of 3, we have
9
= (6 − 3) + ( x 2 ) (9 − 6)
2 2
x dx ( x )
x =3 x =6
3
= (3 2 )3 + (6 2 )3
= 27 + 108 = 135
Actual value is given by
9 9
x 3 93 − 33
3 x dx = 3 = 3 = 234
2
3
Truncation error is then
234 − 135 = 99
Can you find the truncation error with 4 rectangles?
30
Binary Representation
10/5/2024
How a Decimal Number is
Represented
−1 −2
257.76 = 2 10 + 5 10 + 7 10 + 7 10 + 6 10
2 1 0
32
Base 2
(1 23 + 0 2 2 + 1 21 + 1 20 )
(1011.0011) 2 = −1 −2 −3 −4
+ ( 0 2 + 0 2 + 1 2 + 1 2 ) 10
= 11.1875
33
Convert Base 10 Integer to
binary representation
Table 1 Converting a base-10 integer to binary representation.
Quotient Remainder
11/2 5 1 = a0
5/2 2 1 = a1
2/2 1 0 = a2
1/2 0 1 = a3
Hence
(11)10 = (a3 a 2 a1a0 ) 2
= (1011) 2
34
Start
Integer N to be
Input (N)10
converted to binary
format
i=0
Divide N by 2 to get
quotient Q & remainder R
i=i+1,N=Q
ai = R
No
Is Q = 0?
Yes
n=i
(N)10 = (an. . .a0)2
STOP
35
Fractional Decimal Number
to Binary
Table 2. Converting a base-10 fraction to binary representation.
Hence
(0.1875)10 = (a−1a− 2 a− 3a− 4 ) 2
= (0.0011) 2
36
Start
Fraction F to be
Input (F)10
converted to binary
format
i = −1
Multiply F by 2 to get
number before decimal,
S and after decimal, T
i = i − 1, F = T
ai = R
No
Is T =0?
Yes
n=i
(F)10 = (a-1. . .a-n)2
STOP
37
Decimal Number to Binary
(11.1875)10 = ( ?.? )2
Since
(11)10 = (1011) 2
and
(0.1875)10 = (0.0011) 2
we have
(11.1875)10 = (1011.0011) 2
38
All Fractional Decimal Numbers
Cannot be Represented Exactly
Table 3. Converting a base-10 fraction to approximate binary representation.
Number Number
Number after before
decimal Decimal
0 .3 2 0.6 0.6 0 = a−1
0 .6 2 1.2 0.2 1 = a− 2
0 .2 2 0.4 0.4 0 = a−3
0 .4 2 0.8 0.8 0 = a− 4
0 .8 2 1.6 0.6 1 = a−5
39
Another Way to Look at
Conversion
= 1 23 + 0 2 2 + 1 21 + 1 20
= (1011)2
40
(0.1875)10 = 2 −3 + 0.0625
= 2 −3 + 2 − 4
−1 −2 −3 −4
= 0 2 + 0 2 + 1 2 + 1 2
= (.0011)2
(11.1875)10 = (1011.0011)2
41
Floating Point Representation
10/5/2024
Floating Decimal Point : Scientific Form
−3
0.003678 is written as + 3.678 10
− 256.78 is written as − 2.5678 10 2
43
Example
The form is
sign mantissa 10 exponent
or
m 10e
Example: For
− 2.5678 102
= −1
m = 2.5678
e=2
44
Floating Point Format for Binary
Numbers
y = m2 e
45
Example
9 bit-hypothetical word
▪the first bit is used for the sign of the number,
▪the second bit for the sign of the exponent,
▪the next four bits for the mantissa, and
▪the next three bits for the exponent
0 0 1 0 1 1 1 0 1
mantissa exponent
Sign of the Sign of the
number exponent
46
Machine Epsilon
Defined as the measure of accuracy and found
by difference between 1 and the next number
that can be represented
47
Example
Ten bit word
▪Sign of number
▪Sign of exponent
▪Next four bits for exponent
▪Next four bits for mantissa
0 0 0 0 0 0 0 0 0 0 = (1)10
Next
number 0 0 0 0 0 0 0 0 0 1 = (1.0001)2 = (1.0625)10
48
Relative Error and Machine
Epsilon
The absolute relative true error in representing
a number will be less then the machine epsilon
Example
(0.02832)10 (1.1100)2 2−5
= (1.1100)2 2 −(0110 ) 2
0 1 0 1 1 0 1 1 0 0
Sign of the exponent mantissa
Sign of the
number
exponent
(1.1100)2 2−(0110 ) 2
= 0.0274375
0.02832 − 0.0274375
a =
0.02832
= 0.034472 2 − 4 = 0.0625
49
IEEE 754 Standards for Single
Precision Representation
IEEE-754 Floating Point
Standard
• Standardizes representation of
floating point numbers on
different computers in single and
double precision.
• Standardizes representation of
floating point operations on
different computers.
One Great Reference
What every computer scientist (and even if
you are not) should know about floating point
arithmetic!
http://www.validlab.com/goldberg/paper.pdf
IEEE-754 Format Single
Precision
s
.
Value = ( −1) (1 m )2 2 e ' −127
53
Example#1
1 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
= (− 1) (1.625) 2162−127
= (− 1) (1.625) 235 = −5.5834 1010
54
Example#2
Represent -5.5834x1010 as a single
precision floating point number.
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
− 5.5834 10 = (− 1) (1. ? ) 2 ?
10 1
55
IEEE-754 Format
(1.1........1)2 2
127
= 3.40 10 38
56
Propagation of Errors
10/5/2024
Propagation of Errors
In numerical methods, the calculations are not
made with exact numbers. How do these
inaccuracies propagate through the calculations?
58
Example 1:
Find the bounds for the propagation in adding two numbers. For example
if one is calculating X +Y where
X = 1.5 ± 0.05
Y = 3.4 ± 0.04
Solution
Maximum possible value of X = 1.55 and Y = 3.44
Hence
4.81 ≤ X + Y ≤4.99.
59
Propagation of Errors In Formulas
f f f f
f X 1 + X 2 + ....... + X n −1 + X n
X 1 X 2 X n −1 X n
60
Example 2:
The strain in an axial member of a square cross-
section is given by
F
= 2
h E
Given
F = 72 0.9 N
h = 4 0.1 mm
E = 70 1.5 GPa
61
Example 2:
Solution
72
= −3 2
(4 10 ) (70 10 )
9
= 64.286 10 −6
= 64.286
= F + h + E
F h E
62
Example 2:
1 2F F
= 2 =− 3 =− 2 2
F h E h hE E h E
Thus
1 2F F
E = 2 F + 3 h + 2 2 E
h E hE h E
1 2 72
= −3 2
0.9 + −3 3
0.0001
(4 10 ) (70 10 )
9
(4 10 ) (70 10 )
9
72
+ 1.5 10 9
Solution
Let
z = x− y
Then
z z
z = x + y
x y
= (1)x + (−1)y
= x + y
So the relative change is
z x + y
=
z x− y
64
Example 3:
For example if
x = 2 0.001
y = 2.003 0.001
z 0.001 + 0.001
=
z | 2 − 2.003 |
= 0.6667
= 66.67%
65