Intro Errors Floatingpoint Binary

Uploaded by

taheen aamir

Measuring Errors

Why measure errors?


1) To determine the accuracy of numerical results.
2) To develop stopping criteria for iterative algorithms.

http://numericalmethods.eng.usf.edu

True Error
◼ Defined as the difference between the true value in a calculation and the approximate value found using a numerical method.

True Error ($E_t$) = True Value − Approximate Value

Example—True Error
The derivative $f'(x)$ of a function $f(x)$ can be approximated by the equation
$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$
If $f(x) = 7e^{0.5x}$ and $h = 0.3$,
a) Find the approximate value of $f'(2)$
b) True value of $f'(2)$
c) True error for part (a)

Example (cont.)
Solution:
a) For $x = 2$ and $h = 0.3$,
$$f'(2) \approx \frac{f(2+0.3) - f(2)}{0.3} = \frac{f(2.3) - f(2)}{0.3} = \frac{7e^{0.5(2.3)} - 7e^{0.5(2)}}{0.3} = \frac{22.107 - 19.028}{0.3} = 10.263$$
where $e = 2.718281828459045235360$.

Example (cont.)
Solution:
b) The exact value of $f'(2)$ can be found by using our knowledge of differential calculus.
$$f(x) = 7e^{0.5x}$$
$$f'(x) = 7 \times 0.5 \times e^{0.5x} = 3.5e^{0.5x}$$
So the true value of $f'(2)$ is
$$f'(2) = 3.5e^{0.5(2)} = 9.5140$$
c) True error is calculated as
$E_t$ = True Value − Approximate Value
$$E_t = 9.5140 - 10.263 = -0.749$$

Relative True Error
◼ Defined as the ratio between the true error and the true value.
$$\text{Relative True Error } (\epsilon_t) = \frac{\text{True Error}}{\text{True Value}}$$

Example—Relative True Error
Following from the previous example for true error, find the relative true error for $f(x) = 7e^{0.5x}$ at $f'(2)$ with $h = 0.3$.
From the previous example, with a true value of 9.5140 and an approximate value of 10.263,
$$E_t = 9.5140 - 10.263 = -0.749$$
Relative true error is defined as
$$\epsilon_t = \frac{\text{True Error}}{\text{True Value}} = \frac{-0.749}{9.5140} = -0.078726$$
as a percentage,
$$\epsilon_t = -0.078726 \times 100\% = -7.8726\%$$

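The true error and relative true error for this example can be reproduced with a short script. This is a minimal sketch in Python; the function names `forward_difference`, `true_error` and `relative_true_error` are illustrative choices, not names from the slides. Because it carries full double precision instead of the rounded intermediate values shown on the slides, its results differ slightly in the last digits.

```python
import math

def f(x):
    """The example function f(x) = 7 e^(0.5 x)."""
    return 7.0 * math.exp(0.5 * x)

def forward_difference(func, x, h):
    """Approximate func'(x) as (func(x + h) - func(x)) / h."""
    return (func(x + h) - func(x)) / h

def true_error(true_value, approx_value):
    """True error: true value minus approximate value."""
    return true_value - approx_value

def relative_true_error(true_value, approx_value):
    """Relative true error: true error divided by the true value."""
    return true_error(true_value, approx_value) / true_value

approx = forward_difference(f, 2.0, 0.3)
exact = 3.5 * math.exp(0.5 * 2.0)  # from f'(x) = 3.5 e^(0.5 x)

print(f"approximate f'(2)   = {approx:.4f}")
print(f"true f'(2)          = {exact:.4f}")
print(f"true error          = {true_error(exact, approx):.4f}")
print(f"relative true error = {100 * relative_true_error(exact, approx):.3f}%")
```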
Approximate Error
◼ What can be done if true values are not known or are very difficult to obtain?
◼ Approximate error is defined as the difference between the present approximation and the previous approximation.
Approximate Error ($E_a$) = Present Approximation − Previous Approximation

Example—Approximate Error
For $f(x) = 7e^{0.5x}$ at $x = 2$, find the following:
a) $f'(2)$ using $h = 0.3$
b) $f'(2)$ using $h = 0.15$
c) approximate error for the value of $f'(2)$ for part b)
Solution:
a) For $x = 2$ and $h = 0.3$,
$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$
$$f'(2) \approx \frac{f(2+0.3) - f(2)}{0.3}$$

Example (cont.)
Solution: (cont.)
$$f'(2) \approx \frac{f(2.3) - f(2)}{0.3} = \frac{7e^{0.5(2.3)} - 7e^{0.5(2)}}{0.3} = \frac{22.107 - 19.028}{0.3} = 10.263$$
b) For $x = 2$ and $h = 0.15$,
$$f'(2) \approx \frac{f(2+0.15) - f(2)}{0.15} = \frac{f(2.15) - f(2)}{0.15}$$

Example (cont.)
Solution: (cont.)
$$f'(2) \approx \frac{7e^{0.5(2.15)} - 7e^{0.5(2)}}{0.15} = \frac{20.510 - 19.028}{0.15} = 9.8800$$

c) So the approximate error, $E_a$, is
$E_a$ = Present Approximation − Previous Approximation
$$E_a = 9.8800 - 10.263 = -0.38300$$

Relative Approximate Error
◼ Defined as the ratio between the approximate error and the present approximation.
$$\text{Relative Approximate Error } (\epsilon_a) = \frac{\text{Approximate Error}}{\text{Present Approximation}}$$

Example—Relative Approximate Error
For $f(x) = 7e^{0.5x}$ at $x = 2$, find the relative approximate error using values from $h = 0.3$ and $h = 0.15$.
Solution:
From the previous example, the approximate value of $f'(2) = 10.263$ using $h = 0.3$ and $f'(2) = 9.8800$ using $h = 0.15$.
$E_a$ = Present Approximation − Previous Approximation
$$E_a = 9.8800 - 10.263 = -0.38300$$

Example (cont.)
Solution: (cont.)
$$\epsilon_a = \frac{\text{Approximate Error}}{\text{Present Approximation}} = \frac{-0.38300}{9.8800} = -0.038765$$
as a percentage,
$$\epsilon_a = -0.038765 \times 100\% = -3.8765\%$$

Absolute relative approximate errors may also need to be calculated,
$$|\epsilon_a| = |-0.038765| = 0.038765 \text{ or } 3.8765\%$$

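The approximate error and relative approximate error need no knowledge of the true value, only two successive approximations. The sketch below assumes the same forward-difference formula and example function as the slides; the helper names are hypothetical. Full-precision results differ slightly from the rounded slide values (for instance about 9.8799 instead of 9.8800).

```python
import math

def f(x):
    """The example function f(x) = 7 e^(0.5 x)."""
    return 7.0 * math.exp(0.5 * x)

def forward_difference(func, x, h):
    """Approximate func'(x) as (func(x + h) - func(x)) / h."""
    return (func(x + h) - func(x)) / h

def approximate_error(present, previous):
    """Approximate error: present minus previous approximation."""
    return present - previous

def relative_approximate_error(present, previous):
    """Approximate error divided by the present approximation."""
    return approximate_error(present, previous) / present

previous = forward_difference(f, 2.0, 0.3)   # h = 0.3
present = forward_difference(f, 2.0, 0.15)   # h = 0.15

e_a = approximate_error(present, previous)
eps_a = relative_approximate_error(present, previous)
print(f"E_a     = {e_a:.5f}")
print(f"eps_a   = {100 * eps_a:.4f}%")
print(f"|eps_a| = {100 * abs(eps_a):.4f}%")
```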
How is Absolute Relative Error used as a stopping criterion?
If $|\epsilon_a| \le \epsilon_s$, where $\epsilon_s$ is a pre-specified tolerance, then no further iterations are necessary and the process is stopped.

If at least $m$ significant digits are required to be correct in the final answer, then
$$|\epsilon_a| \le 0.5 \times 10^{2-m}\,\%$$

Table of Values
For $f(x) = 7e^{0.5x}$ at $x = 2$ with varying step size, $h$

$h$       $f'(2)$    $|\epsilon_a|$    $m$
0.3       10.263     N/A               0
0.15      9.8800     3.877%            1
0.10      9.7558     1.273%            1
0.01      9.5378     2.285%            1
0.001     9.5164     0.2249%           2

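The $m$ column follows from solving $|\epsilon_a| \le 0.5 \times 10^{2-m}\,\%$ for the largest whole number $m$. A minimal sketch, assuming the error is supplied as a positive percentage; the function name `significant_digits` is an illustrative choice:

```python
import math

def significant_digits(abs_rel_error_percent):
    """Largest integer m >= 0 with |eps_a| <= 0.5 * 10**(2 - m) percent."""
    if abs_rel_error_percent <= 0:
        raise ValueError("error must be a positive percentage")
    # Rearranged criterion: m <= 2 - log10(|eps_a| / 0.5)
    m = math.floor(2 - math.log10(abs_rel_error_percent / 0.5))
    return max(m, 0)  # a negative result means no digit is guaranteed

# Absolute relative approximate errors (in percent) from the table
for eps in (3.877, 1.273, 2.285, 0.2249):
    print(f"|eps_a| = {eps}%  ->  m = {significant_digits(eps)}")
```

Used as a stopping criterion, an iteration would continue until `significant_digits` reaches the number of digits required.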
Sources of Error

10/5/2024
Two sources of numerical error
1) Round off error
2) Truncation error

Round off Error
◼ Caused by representing a number approximately
$$\frac{1}{3} \approx 0.333333$$
$$\sqrt{2} \approx 1.4142\ldots$$

Problems created by round off error

◼ 28 Americans were killed on February 25, 1991 by an Iraqi Scud missile in Dhahran, Saudi Arabia.
◼ The Patriot defense system failed to track and intercept the Scud. Why?

Problem with Patriot missile
◼ The clock cycle of 1/10 second, represented in a 24-bit fixed-point register, created an error of $9.5 \times 10^{-8}$ seconds per cycle.
◼ The battery was on for 100 consecutive hours, thus causing an inaccuracy of
$$9.5 \times 10^{-8}\ \frac{\text{s}}{0.1\ \text{s}} \times 100\ \text{hr} \times \frac{3600\ \text{s}}{1\ \text{hr}} = 0.342\ \text{s}$$

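The accumulated clock drift can be checked numerically. The sketch below is only an illustration: it assumes the register keeps 23 bits after the binary point and chops the remainder, a layout the slides do not state but which reproduces the quoted per-cycle error of about $9.5 \times 10^{-8}$ s. The names `FRACTION_BITS`, `error_per_tick` and `drift` are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: bits kept after the binary point

tenth = Fraction(1, 10)
# Chop 1/10 to FRACTION_BITS binary places (int() truncates toward zero)
stored = Fraction(int(tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = float(tenth - stored)   # seconds lost per 0.1 s tick

ticks = 100 * 3600 * 10                  # 0.1 s ticks in 100 hours
drift = error_per_tick * ticks           # accumulated error in seconds

print(f"error per tick = {error_per_tick:.4e} s")
print(f"drift after 100 hours = {drift:.4f} s")
```

The unrounded per-tick error gives a drift of about 0.343 s; the slide's 0.342 s comes from rounding the per-tick error to $9.5 \times 10^{-8}$ first.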
Problem (cont.)
◼ The shift calculated in the ranging system of the missile was 687 meters.
◼ The target was considered to be out of range at a distance greater than 137 meters.

Truncation Error

Truncation error
◼ Error caused by truncating or approximating a mathematical procedure.

Example of Truncation Error
Taking only a few terms of a Maclaurin series to approximate $e^x$:
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
If only 3 terms are used,
$$\text{Truncation Error} = e^x - \left(1 + x + \frac{x^2}{2!}\right)$$

Example 1 —Maclaurin series
Calculate the value of $e^{1.2}$ with an absolute relative approximate error of less than 1%.
$$e^{1.2} = 1 + 1.2 + \frac{1.2^2}{2!} + \frac{1.2^3}{3!} + \cdots$$

$n$    $e^{1.2}$    $E_a$       $|\epsilon_a|$ %
1      1            __          ___
2      2.2          1.2         54.545
3      2.92         0.72        24.658
4      3.208        0.288       8.9776
5      3.2944       0.0864      2.6226
6      3.3151       0.020736    0.62550

6 terms are required. How many are required to get at least 1 significant digit correct in your answer?

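The table can be generated by adding one series term at a time and stopping when the absolute relative approximate error falls below the tolerance. A minimal sketch; the function name `exp_maclaurin` and its return convention are assumptions made for illustration:

```python
def exp_maclaurin(x, tolerance_percent):
    """Sum Maclaurin terms of e^x until the absolute relative
    approximate error (in percent) drops below tolerance_percent.

    Returns (approximation, number_of_terms_used).
    """
    total = 1.0   # first term, x^0 / 0!
    term = 1.0
    n_terms = 1
    while True:
        term *= x / n_terms              # next term x^n / n!
        previous, total = total, total + term
        n_terms += 1
        abs_rel_error = abs((total - previous) / total) * 100
        if abs_rel_error < tolerance_percent:
            return total, n_terms

value, n_terms = exp_maclaurin(1.2, 1.0)
print(f"e^1.2 is approximately {value:.4f} using {n_terms} terms")
```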
Example 2 —Differentiation
Find $f'(3)$ for $f(x) = x^2$ using
$$f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
and $\Delta x = 0.2$.
$$f'(3) \approx \frac{f(3 + 0.2) - f(3)}{0.2} = \frac{f(3.2) - f(3)}{0.2} = \frac{3.2^2 - 3^2}{0.2} = \frac{10.24 - 9}{0.2} = \frac{1.24}{0.2} = 6.2$$

The actual value is
$$f'(x) = 2x, \quad f'(3) = 2 \times 3 = 6$$

Truncation error is then $6 - 6.2 = -0.2$.

Can you find the truncation error with $\Delta x = 0.1$?

Example 3 — Integration
Use two rectangles of equal width to approximate the area under the curve for $f(x) = x^2$ over the interval $[3, 9]$.

[Figure: plot of $y = x^2$ against $x$, illustrating $\int_3^9 x^2\,dx$]

Integration example (cont.)
Choosing a width of 3, we have
$$\int_3^9 x^2\,dx \approx \left.(x^2)\right|_{x=3}(6 - 3) + \left.(x^2)\right|_{x=6}(9 - 6) = (3^2)3 + (6^2)3 = 27 + 108 = 135$$
Actual value is given by
$$\int_3^9 x^2\,dx = \left[\frac{x^3}{3}\right]_3^9 = \frac{9^3 - 3^3}{3} = 234$$
Truncation error is then
$$234 - 135 = 99$$
Can you find the truncation error with 4 rectangles?

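The rectangle approximation above evaluates the function at the left end of each strip. A minimal sketch of that rule, with the hypothetical helper name `left_rectangle_rule`, makes it easy to try other numbers of rectangles:

```python
def left_rectangle_rule(func, a, b, n):
    """Approximate the integral of func over [a, b] using n rectangles
    of equal width, each with height taken at its left endpoint."""
    width = (b - a) / n
    return sum(func(a + i * width) for i in range(n)) * width

def square(x):
    return x * x

exact = (9**3 - 3**3) / 3   # antiderivative x^3 / 3 evaluated from 3 to 9

for n in (2, 4):
    approx = left_rectangle_rule(square, 3, 9, n)
    print(f"n = {n}: approximation = {approx}, truncation error = {exact - approx}")
```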
Binary Representation

How a Decimal Number is Represented

$$257.76 = 2 \times 10^{2} + 5 \times 10^{1} + 7 \times 10^{0} + 7 \times 10^{-1} + 6 \times 10^{-2}$$

Base 2

$$(1011.0011)_2 = \left(1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 + 0 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4}\right)_{10} = 11.1875$$

Convert Base 10 Integer to binary representation
Table 1 Converting a base-10 integer to binary representation.

          Quotient    Remainder
11/2      5           1 = $a_0$
5/2       2           1 = $a_1$
2/2       1           0 = $a_2$
1/2       0           1 = $a_3$

Hence
$$(11)_{10} = (a_3 a_2 a_1 a_0)_2 = (1011)_2$$

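Flowchart: converting a base-10 integer to binary representation
1) Start. Input $(N)_{10}$, the integer $N$ to be converted to binary format.
2) Set $i = 0$.
3) Divide $N$ by 2 to get quotient $Q$ and remainder $R$.
4) Set $a_i = R$.
5) Is $Q = 0$? If no, set $i = i + 1$, $N = Q$ and go back to step 3. If yes, set $n = i$.
6) $(N)_{10} = (a_n \ldots a_0)_2$. Stop.
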
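The repeated-division procedure translates directly into a loop. A minimal sketch; the function name `integer_to_binary` is an illustrative choice, and the result is returned as a string of digits:

```python
def integer_to_binary(n):
    """Convert a non-negative base-10 integer to its binary digits by
    repeated division by 2, collecting the remainders a_0, a_1, ..."""
    if n < 0:
        raise ValueError("n must be non-negative")
    remainders = []
    while True:
        n, remainder = divmod(n, 2)        # quotient Q and remainder R
        remainders.append(str(remainder))  # a_i = R
        if n == 0:                         # stop when the quotient is 0
            break
    # The remainders come out as a_0 first, so reverse them for display
    return "".join(reversed(remainders))

print(integer_to_binary(11))
```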
Fractional Decimal Number to Binary
Table 2. Converting a base-10 fraction to binary representation.

                Number    Number after decimal    Number before decimal
0.1875 × 2      0.375     0.375                   0 = $a_{-1}$
0.375 × 2       0.75      0.75                    0 = $a_{-2}$
0.75 × 2        1.5       0.5                     1 = $a_{-3}$
0.5 × 2         1.0       0.0                     1 = $a_{-4}$

Hence
$$(0.1875)_{10} = (0.a_{-1} a_{-2} a_{-3} a_{-4})_2 = (0.0011)_2$$

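Flowchart: converting a base-10 fraction to binary representation
1) Start. Input $(F)_{10}$, the fraction $F$ to be converted to binary format.
2) Set $i = -1$.
3) Multiply $F$ by 2 to get the number before the decimal, $S$, and the number after the decimal, $T$.
4) Set $a_i = S$.
5) Is $T = 0$? If no, set $i = i - 1$, $F = T$ and go back to step 3. If yes, set $n = -i$.
6) $(F)_{10} = (0.a_{-1} \ldots a_{-n})_2$. Stop.
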
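The repeated-multiplication procedure can be sketched as follows. The function name `fraction_to_binary` and the `max_bits` cap are assumptions added for illustration; the cap is needed because some fractions never reach a remainder of zero. `Fraction` is used so that the decimal input is held exactly.

```python
from fractions import Fraction

def fraction_to_binary(decimal_text, max_bits=16):
    """Convert a base-10 fraction in [0, 1), given as text such as
    "0.1875", to its binary digits after the point.

    Stops when the remaining fraction is zero or after max_bits digits.
    """
    f = Fraction(decimal_text)
    if not 0 <= f < 1:
        raise ValueError("input must satisfy 0 <= F < 1")
    bits = []
    while len(bits) < max_bits:
        f *= 2
        bit = int(f)           # number before the point (0 or 1)
        bits.append(str(bit))  # a_i = S
        f -= bit               # number after the point becomes the new F
        if f == 0:
            break
    return "".join(bits)

print("0." + fraction_to_binary("0.1875"))
```

Passing a small `max_bits` yields a chopped approximation for fractions whose binary expansion does not terminate.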
Decimal Number to Binary
$$(11.1875)_{10} = (\,?\,.\,?\,)_2$$
Since
$$(11)_{10} = (1011)_2$$
and
$$(0.1875)_{10} = (0.0011)_2$$
we have
$$(11.1875)_{10} = (1011.0011)_2$$

Not All Fractional Decimal Numbers Can Be Represented Exactly
Table 3. Converting a base-10 fraction to approximate binary representation.

             Number    Number after decimal    Number before decimal
0.3 × 2      0.6       0.6                     0 = $a_{-1}$
0.6 × 2      1.2       0.2                     1 = $a_{-2}$
0.2 × 2      0.4       0.4                     0 = $a_{-3}$
0.4 × 2      0.8       0.8                     0 = $a_{-4}$
0.8 × 2      1.6       0.6                     1 = $a_{-5}$

$$(0.3)_{10} \approx (0.a_{-1}a_{-2}a_{-3}a_{-4}a_{-5})_2 = (0.01001)_2 = 0.28125$$

Another Way to Look at Conversion

Convert $(11.1875)_{10}$ to base 2.

$$(11)_{10} = 2^3 + 3 = 2^3 + 2^1 + 1 = 2^3 + 2^1 + 2^0 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = (1011)_2$$

$$(0.1875)_{10} = 2^{-3} + 0.0625 = 2^{-3} + 2^{-4} = 0 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4} = (0.0011)_2$$

$$(11.1875)_{10} = (1011.0011)_2$$

Floating Point Representation

Floating Decimal Point : Scientific Form

256.78 is written as $+2.5678 \times 10^{2}$
0.003678 is written as $+3.678 \times 10^{-3}$
−256.78 is written as $-2.5678 \times 10^{2}$

Example

The form is
$$\text{sign} \times \text{mantissa} \times 10^{\text{exponent}}$$
or
$$\sigma \times m \times 10^{e}$$
Example: For $-2.5678 \times 10^{2}$,
$$\sigma = -1, \quad m = 2.5678, \quad e = 2$$

Floating Point Format for Binary Numbers

$$y = \sigma \times m \times 2^{e}$$

$\sigma$ = sign of number (0 for +ve, 1 for −ve)
$m$ = mantissa, $(1)_2 \le m < (10)_2$. The leading 1 is not stored, as it is always 1.
$e$ = integer exponent

Example
9-bit hypothetical word
▪ the first bit is used for the sign of the number,
▪ the second bit for the sign of the exponent,
▪ the next four bits for the mantissa, and
▪ the next three bits for the exponent

$$(54.75)_{10} = (110110.11)_2 = (1.1011011)_2 \times 2^{5} \approx (1.1011)_2 \times 2^{(101)_2}$$
We have the representation as

0                     0                       1 0 1 1     1 0 1
Sign of the number    Sign of the exponent    mantissa    exponent

Machine Epsilon
Defined as the difference between 1 and the next larger number that can be represented. It is a measure of the accuracy of the floating point representation.

Example
Ten-bit word
▪ Sign of number
▪ Sign of exponent
▪ Next four bits for exponent
▪ Next four bits for mantissa

0 0 0 0 0 0 0 0 0 0 = $(1)_{10}$
Next number: 0 0 0 0 0 0 0 0 0 1 = $(1.0001)_2 = (1.0625)_{10}$

$$\epsilon_{mach} = 1.0625 - 1 = 2^{-4}$$

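The same definition can be applied to the floating point numbers of a real machine by halving a candidate until adding half of it to 1 no longer changes the result. A minimal sketch, assuming Python floats are IEEE-754 double precision (true on common platforms, but an assumption here); the function name `machine_epsilon` is illustrative:

```python
import sys

def machine_epsilon():
    """Find the gap between 1.0 and the next larger float by halving."""
    eps = 1.0
    # Stop once adding half of eps to 1.0 no longer changes the result
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps

eps = machine_epsilon()
print(f"computed epsilon       = {eps}")
print(f"sys.float_info.epsilon = {sys.float_info.epsilon}")
```

For the ten-bit word above, the same reasoning with four mantissa bits gives $2^{-4}$.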
Relative Error and Machine Epsilon
The absolute relative true error in representing a number will be less than the machine epsilon.
Example
$$(0.02832)_{10} \approx (1.1100)_2 \times 2^{-6} = (1.1100)_2 \times 2^{-(0110)_2}$$

10-bit word (sign, sign of exponent, 4 for exponent, 4 for mantissa)

0                     1                       0 1 1 0     1 1 0 0
Sign of the number    Sign of the exponent    exponent    mantissa

$$(1.1100)_2 \times 2^{-(0110)_2} = 0.02734375$$
$$|\epsilon_t| = \left|\frac{0.02832 - 0.02734375}{0.02832}\right| = 0.034472 < 2^{-4} = 0.0625$$

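The chopped representation in this example can be reproduced by normalising the number to the form $(1.m)_2 \times 2^e$ and discarding mantissa bits beyond the fourth. This is a sketch under stated assumptions: the helper name `chop_to_mantissa_bits` is hypothetical, the input is positive, and the limited exponent range of the 10-bit word is ignored.

```python
import math

def chop_to_mantissa_bits(x, bits):
    """Return the value of positive x stored as (1.m)_2 * 2**e with
    only `bits` mantissa bits kept (extra bits are chopped)."""
    m, e = math.frexp(x)    # x = m * 2**e with 0.5 <= m < 1
    m, e = 2 * m, e - 1     # renormalise so that 1 <= m < 2
    chopped = math.floor(m * 2**bits) / 2**bits
    return chopped * 2.0**e

x = 0.02832
stored = chop_to_mantissa_bits(x, 4)
rel_error = abs(x - stored) / x

print(f"stored value            = {stored}")
print(f"absolute relative error = {rel_error:.6f}")
print(f"machine epsilon         = {2**-4}")
```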
IEEE 754 Standards for Single Precision Representation

IEEE-754 Floating Point Standard
• Standardizes representation of floating point numbers on different computers in single and double precision.
• Standardizes representation of floating point operations on different computers.

One Great Reference
What every computer scientist (and even if you are not) should know about floating point arithmetic!
http://www.validlab.com/goldberg/paper.pdf

IEEE-754 Format Single Precision

32 bits for single precision

0           0 0 0 0 0 0 0 0         0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sign (s)    Biased Exponent (e')    Mantissa (m)

$$\text{Value} = (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$$

Example#1
1           1 0 1 0 0 0 1 0         1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sign (s)    Biased Exponent (e')    Mantissa (m)

$$\text{Value} = (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$$
$$= (-1)^{1} \times (1.10100000)_2 \times 2^{(10100010)_2 - 127}$$
$$= (-1) \times (1.625) \times 2^{162 - 127}$$
$$= (-1) \times (1.625) \times 2^{35} = -5.5834 \times 10^{10}$$

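The decoding steps above can be checked in code. The sketch below applies the value formula by hand and then cross-checks it against Python's standard `struct` module, which interprets the same 32 bits as a single-precision number. The function name `decode_single` is an illustrative choice, and it handles only normalised numbers.

```python
import struct

def decode_single(bits):
    """Decode a 32-character bit string using
    value = (-1)^s * (1.m)_2 * 2^(e' - 127)  (normalised numbers only)."""
    s = int(bits[0], 2)                       # sign bit
    biased_exponent = int(bits[1:9], 2)       # 8-bit biased exponent e'
    mantissa = 1 + int(bits[9:], 2) / 2**23   # (1.m)_2 with 23 stored bits
    return (-1) ** s * mantissa * 2.0 ** (biased_exponent - 127)

bits = "1" + "10100010" + "101" + "0" * 20
value = decode_single(bits)

# Cross-check: let struct interpret the same 4 bytes as a big-endian float
(check,) = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))

print(f"decoded by formula: {value:.4e}")
print(f"decoded by struct:  {check:.4e}")
```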
Example#2
Represent $-5.5834 \times 10^{10}$ as a single precision floating point number.

?           ? ? ? ? ? ? ? ?         ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Sign (s)    Biased Exponent (e')    Mantissa (m)

$$-5.5834 \times 10^{10} = (-1)^{1} \times (1.\,?)_2 \times 2^{?}$$

IEEE-754 Format

The largest number by magnitude
$$(1.1\ldots1)_2 \times 2^{127} = 3.40 \times 10^{38}$$
The smallest (normalized) number by magnitude
$$(1.00\ldots0)_2 \times 2^{-126} = 1.18 \times 10^{-38}$$
Machine epsilon
$$\epsilon_{mach} = 2^{-23} = 1.19 \times 10^{-7}$$

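These three limits follow directly from the bit layout and can be evaluated numerically. A minimal sketch: the values are computed in double precision and cross-checked against the corresponding single-precision bit patterns using the standard `struct` module; the variable names are illustrative.

```python
import struct

# Largest: all 23 mantissa bits set, unbiased exponent 127
largest = (2 - 2.0**-23) * 2.0**127
# Smallest normalised: mantissa bits all zero, unbiased exponent -126
smallest_normal = 2.0**-126
# Gap between 1 and the next representable single-precision number
machine_eps = 2.0**-23

# Cross-check against the raw single-precision bit patterns
(max_from_bits,) = struct.unpack(">f", bytes.fromhex("7f7fffff"))
(min_from_bits,) = struct.unpack(">f", bytes.fromhex("00800000"))

print(f"largest         = {largest:.4e}")
print(f"smallest normal = {smallest_normal:.4e}")
print(f"machine epsilon = {machine_eps:.4e}")
```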
Propagation of Errors

Propagation of Errors
In numerical methods, the calculations are not made with exact numbers. How do these inaccuracies propagate through the calculations?

Example 1:
Find the bounds for the propagation of error in adding two numbers. For example, suppose one is calculating $X + Y$, where
$X = 1.5 \pm 0.05$
$Y = 3.4 \pm 0.04$
Solution
Maximum possible value of $X = 1.55$ and $Y = 3.44$
Maximum possible value of $X + Y = 1.55 + 3.44 = 4.99$
Minimum possible value of $X = 1.45$ and $Y = 3.36$
Minimum possible value of $X + Y = 1.45 + 3.36 = 4.81$
Hence
$$4.81 \le X + Y \le 4.99$$

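The bounds on a sum can be computed by adding the extreme values of each operand. A minimal sketch, with the hypothetical helper name `add_with_bounds`:

```python
def add_with_bounds(x, dx, y, dy):
    """Return (minimum, maximum) of X + Y for X = x ± dx and Y = y ± dy."""
    low = (x - dx) + (y - dy)
    high = (x + dx) + (y + dy)
    return low, high

low, high = add_with_bounds(1.5, 0.05, 3.4, 0.04)
print(f"{low:.2f} <= X + Y <= {high:.2f}")
```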
Propagation of Errors In Formulas

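If $f$ is a function of several variables $X_1, X_2, X_3, \ldots, X_{n-1}, X_n$, then the maximum possible value of the error in $f$ is
$$\Delta f \approx \left|\frac{\partial f}{\partial X_1}\Delta X_1\right| + \left|\frac{\partial f}{\partial X_2}\Delta X_2\right| + \cdots + \left|\frac{\partial f}{\partial X_{n-1}}\Delta X_{n-1}\right| + \left|\frac{\partial f}{\partial X_n}\Delta X_n\right|$$
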
Example 2:
The strain in an axial member of a square cross-section is given by
$$\epsilon = \frac{F}{h^2 E}$$
Given
$F = 72 \pm 0.9$ N
$h = 4 \pm 0.1$ mm
$E = 70 \pm 1.5$ GPa

Find the maximum possible error in the measured strain.

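Example 2:
Solution
$$\epsilon = \frac{72}{(4 \times 10^{-3})^2 (70 \times 10^{9})} = 64.286 \times 10^{-6} = 64.286\,\mu$$
$$\Delta\epsilon = \left|\frac{\partial\epsilon}{\partial F}\Delta F\right| + \left|\frac{\partial\epsilon}{\partial h}\Delta h\right| + \left|\frac{\partial\epsilon}{\partial E}\Delta E\right|$$
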
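Example 2:
$$\frac{\partial\epsilon}{\partial F} = \frac{1}{h^2 E}, \quad \frac{\partial\epsilon}{\partial h} = -\frac{2F}{h^3 E}, \quad \frac{\partial\epsilon}{\partial E} = -\frac{F}{h^2 E^2}$$
Thus
$$\Delta\epsilon = \left|\frac{1}{h^2 E}\Delta F\right| + \left|\frac{2F}{h^3 E}\Delta h\right| + \left|\frac{F}{h^2 E^2}\Delta E\right|$$
$$= \left|\frac{1}{(4 \times 10^{-3})^2 (70 \times 10^{9})} \times 0.9\right| + \left|\frac{2 \times 72}{(4 \times 10^{-3})^3 (70 \times 10^{9})} \times 0.0001\right| + \left|\frac{72}{(4 \times 10^{-3})^2 (70 \times 10^{9})^2} \times 1.5 \times 10^{9}\right|$$
$$= 5.3955\,\mu$$
Hence
$$\epsilon = (64.286\,\mu \pm 5.3955\,\mu)$$
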
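The propagated error can be evaluated by coding the three partial derivatives directly. This is a minimal sketch with hypothetical function names, using SI units throughout (N, m, Pa). It returns about $5.395 \times 10^{-6}$, which agrees with the slide value to within rounding.

```python
def strain(F, h, E):
    """Axial strain F / (h^2 E) for a square cross-section of side h."""
    return F / (h**2 * E)

def strain_error_bound(F, dF, h, dh, E, dE):
    """Maximum possible error in the strain from the propagation formula."""
    d_dF = 1 / (h**2 * E)        # partial derivative with respect to F
    d_dh = -2 * F / (h**3 * E)   # partial derivative with respect to h
    d_dE = -F / (h**2 * E**2)    # partial derivative with respect to E
    return abs(d_dF * dF) + abs(d_dh * dh) + abs(d_dE * dE)

F, dF = 72.0, 0.9       # N
h, dh = 4e-3, 0.1e-3    # m
E, dE = 70e9, 1.5e9     # Pa

eps = strain(F, h, E)
d_eps = strain_error_bound(F, dF, h, dh, E, dE)
print(f"strain = {eps * 1e6:.3f} micro, max error = {d_eps * 1e6:.3f} micro")
```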
Example 3:
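Subtraction of numbers that are nearly equal can create unwanted inaccuracies. Using the formula for error propagation, show that this is true.

Solution
Let
$$z = x - y$$
Then
$$|\Delta z| = \left|\frac{\partial z}{\partial x}\Delta x\right| + \left|\frac{\partial z}{\partial y}\Delta y\right| = |(1)\Delta x| + |(-1)\Delta y| = |\Delta x| + |\Delta y|$$
So the relative change is
$$\left|\frac{\Delta z}{z}\right| = \frac{|\Delta x| + |\Delta y|}{|x - y|}$$
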
Example 3:
For example, if
$x = 2 \pm 0.001$
$y = 2.003 \pm 0.001$
$$\left|\frac{\Delta z}{z}\right| = \frac{|0.001| + |0.001|}{|2 - 2.003|} = 0.6667 = 66.67\%$$

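The effect is easy to reproduce: the relative error of a difference grows as the two numbers approach each other. A minimal sketch, with the hypothetical helper name `relative_error_of_difference`:

```python
def relative_error_of_difference(x, dx, y, dy):
    """Relative error bound (|dx| + |dy|) / |x - y| for z = x - y."""
    return (abs(dx) + abs(dy)) / abs(x - y)

rel = relative_error_of_difference(2.0, 0.001, 2.003, 0.001)
print(f"relative error in z = {100 * rel:.2f}%")

# For comparison, the same uncertainties with well-separated operands
far = relative_error_of_difference(2.0, 0.001, 5.0, 0.001)
print(f"relative error with y = 5: {100 * far:.4f}%")
```

Each input is known to about 0.05%, yet the difference is uncertain by roughly two thirds of its own value.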
