Numerical Error RoundOff
26 March, 2023
$y = \sigma \times m \times 2^{e}$, where $m$ is the mantissa and $e$ is the exponent.
Machine epsilon ($\epsilon_m$) is defined as the distance (gap) between 1 and the
next larger floating point number.
For IEEE-754 single precision, $\epsilon_m = 2^{-23} \approx 1.192 \times 10^{-7}$, as shown by:
$1 = 1.00000000000000000000000_2$
$1 + \epsilon_m = 1.00000000000000000000001_2 = 2^{0} + 2^{-23}$
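The definition can be checked numerically. The following is a minimal sketch, assuming single-precision rounding can be emulated by round-tripping a Python float through `struct`; the helper names `to_f32` and `machine_epsilon_f32` are illustrative and not part of the lecture.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def machine_epsilon_f32():
    """Halve eps until 1 + eps/2 can no longer be told apart from 1."""
    eps = 1.0
    while to_f32(1.0 + eps / 2.0) > 1.0:
        eps /= 2.0
    return eps

eps32 = machine_epsilon_f32()
print(eps32)                 # gap between 1 and the next single-precision number
print(eps32 == 2.0 ** -23)   # compare with the value stated above
```

The loop stops at the last `eps` for which `1 + eps` is still a distinct single-precision number, which is the gap described in the definition.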
01000001000110011000000000000000
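11000001110110010100000000000000 = 1 | 10000011 | 10110010100000000000000 (sign | exponent | mantissa)
$s = 1$ (negative) $\implies (-1)^{1} = -1$
$E = 2^{7} + 2^{1} + 2^{0} = 128 + 2 + 1 = 131$
$e = E - 127 = 131 - 127 = 4$
$M = 2^{-1} + 2^{-3} + 2^{-4} + 2^{-7} + 2^{-9} = 0.697265625$
$y = -1 \times 2^{4} \times (2^{0} + 0.697265625) = -27.15625$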
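The decoding steps above can be sketched in code. This is an illustrative helper (the name `decode_f32` is an assumption) that handles normalized numbers only; it does not treat zeros, subnormals, infinities, or NaNs. The result is cross-checked against Python's own `struct` decoding.

```python
import struct

def decode_f32(bits):
    """Split a 32-character bit string into sign, biased exponent, and
    mantissa fraction, then rebuild the value (normalized numbers only)."""
    s = int(bits[0], 2)              # sign bit
    E = int(bits[1:9], 2)            # biased exponent (8 bits)
    M = int(bits[9:], 2) / 2 ** 23   # fractional part of the mantissa (23 bits)
    value = (-1) ** s * 2.0 ** (E - 127) * (1 + M)
    return s, E, M, value

pattern = "11000001110110010100000000000000"
s, E, M, value = decode_f32(pattern)
print(s, E, E - 127, M, value)

# Cross-check with the hardware interpretation of the same 32 bits.
reference = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]
print(reference == value)
```

Applied to the other pattern listed above, 01000001000110011000000000000000, the same function returns 9.59375.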
Dr. Amadou Toure Numerical Methods (Engr-280) Errors in Numerical Methods
26 March, 2023 11 / 15
Some fractions can't be represented exactly! (But the relative error satisfies $\epsilon_a < \epsilon_m$.)
Repeated doubling of the fractional part (the integer part of each product is the next binary digit):
0.02832 × 2 = 0.05664 → 0
0.05664 × 2 = 0.11328 → 0
0.11328 × 2 = 0.22656 → 0
0.22656 × 2 = 0.45312 → 0
0.45312 × 2 = 0.90624 → 0
0.90624 × 2 = 1.81248 → 1
0.81248 × 2 = 1.62496 → 1
0.62496 × 2 = 1.24992 → 1
0.24992 × 2 = 0.49984 → 0
0.49984 × 2 = 0.99968 → 0
0.99968 × 2 = 1.99936 → 1
⋯
$(0.02832)_{10} = (0.00000111001111111111101011000001\ldots)_2$
$= 1.11001111111111101011000001\ldots \times 2^{-6}$
$E = -6 + 127 = 121 = 01111001_2$
bit 31 (sign) | bits 30 to 23 (exponent) | bits 22 to 0 (mantissa)
0 | 01111001 | 11001111111111101011000
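The stored value is $(0.02832)_{10} \approx (0.02831999957561492919921875)_{10}$.
The error (stored value minus exact value) is $E_t = -4.2438507080078125 \times 10^{-10}$.
$\epsilon_a = \left| \frac{E_t}{0.02832} \right| = 1.498535 \times 10^{-8} < \epsilon_m$ ($2^{-23}$, or $1.192 \times 10^{-7}$)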
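The whole example can be reproduced with exact rational arithmetic. The sketch below assumes Python's `fractions.Fraction` for the exact decimal value and a `struct` round trip to obtain the stored single-precision value; `fraction_bits` is an illustrative helper name.

```python
from fractions import Fraction
import struct

def fraction_bits(x, n):
    """Return the first n binary digits of 0 <= x < 1 by repeated doubling."""
    digits = []
    for _ in range(n):
        x *= 2
        if x >= 1:
            digits.append("1")
            x -= 1
        else:
            digits.append("0")
    return "".join(digits)

exact = Fraction(2832, 100000)   # 0.02832 with no rounding
bits = fraction_bits(exact, 32)  # binary digits after the point

# Value actually stored in single precision, converted exactly to a Fraction.
stored = Fraction(struct.unpack(">f", struct.pack(">f", 0.02832))[0])
error = stored - exact
relative_error = abs(error / exact)

print(bits)
print(float(error), float(relative_error))
print(relative_error < Fraction(2) ** -23)   # compare with machine epsilon
```

Using `Fraction` keeps the doubling steps exact, so the digits are not themselves affected by round-off.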
Not Really Real
Floating point numbers approximate real numbers.
▶ The number of bits in the mantissa sets the granularity,
▶ the exponent sets the range of the approximation.
Example: a 3-bit exponent and a 3-bit mantissa represent the following values on
the real line:
Calculations whose results exceed the greatest value (15 in this case) give
an overflow.
A result between 0 and the lowest value (1/8 in this case) gives an
underflow.
The distance from 1 to the smallest value greater than 1 is $\epsilon_m$ (machine
epsilon).
▶ In Matlab: eps
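The toy system can be enumerated in a few lines. The slide does not state the exact encoding, so the sketch below assumes normalized values of the form 1.fff × 2^e with three fraction bits and exponents from -3 to 3; this assumption was chosen because it reproduces the greatest value 15 and the lowest value 1/8 quoted above. The function name `toy_floats` is hypothetical.

```python
def toy_floats(exp_min=-3, exp_max=3, mant_bits=3):
    """Enumerate the positive values (1 + f / 2**mant_bits) * 2**e.
    The exponent range and the hidden leading 1 are assumptions."""
    values = []
    for e in range(exp_min, exp_max + 1):
        for f in range(2 ** mant_bits):
            values.append((1 + f / 2 ** mant_bits) * 2.0 ** e)
    return sorted(values)

values = toy_floats()
lowest, greatest = values[0], values[-1]
eps_toy = min(v for v in values if v > 1.0) - 1.0   # gap just above 1

print(lowest, greatest, eps_toy)
# The spacing doubles each time the exponent increases by one.
print([v for v in values if 1.0 <= v <= 4.0])
```

Under these assumptions the gap just above 1 equals 2^-3, one unit in the last mantissa bit.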
single precision - The exponent is 8 bits and the mantissa 23 bits.
$\epsilon_m \approx 1.2 \times 10^{-7}$
Overflow at $\approx 3.4 \times 10^{38}$
double precision - The exponent is 11 bits and the mantissa 52 bits.
$\epsilon_m \approx 2.2 \times 10^{-16}$
Overflow at $\approx 1.8 \times 10^{308}$
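These limits can be inspected directly. A minimal sketch, assuming a Python interpreter whose `float` is an IEEE-754 double (true on common platforms); the single-precision values are computed from the bit widths rather than read from a library.

```python
import struct
import sys

# Double precision: Python floats expose their parameters in sys.float_info.
eps64 = sys.float_info.epsilon      # 2**-52
max64 = sys.float_info.max          # largest finite double

# Single precision: 23 stored mantissa bits, largest unbiased exponent 127.
eps32 = 2.0 ** -23
max32 = (2.0 - 2.0 ** -23) * 2.0 ** 127

# Check that max32 survives a round trip through a 4-byte float unchanged.
roundtrip = struct.unpack(">f", struct.pack(">f", max32))[0]

print(eps32, max32)
print(eps64, max64)
print(roundtrip == max32)
```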
Underflow is handled gracefully by denormalized (subnormal) values, which fill
the gap around zero.
If an underflow still occurs, the result is set to 0.
An overflow gives ∞ as the result.
Indeterminate results produce a NaN (Not a Number), e.g. 0/0 =
NaN.
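The special cases above can be observed in double precision. This is a small sketch in Python; note that Python itself raises `ZeroDivisionError` for `0.0 / 0.0`, so the NaN is produced here with ∞ − ∞ instead.

```python
import math
import sys

smallest_normal = sys.float_info.min     # 2**-1022
subnormal = smallest_normal / 4          # below the normal range, still nonzero

# Smallest positive subnormal: 2**-1022 * 2**-52 = 2**-1074.
smallest_subnormal = sys.float_info.min * sys.float_info.epsilon
underflow = smallest_subnormal / 2       # nothing smaller exists: becomes 0.0

overflow = sys.float_info.max * 2        # beyond the largest double: inf
not_a_number = overflow - overflow       # inf - inf has no meaningful value

print(subnormal, smallest_subnormal, underflow)
print(overflow, not_a_number)
```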
Arithmetic operations (+, −, ∗, /) get a result as if they were
calculated in full precision, and then rounded to fit into the floating
point representation.
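Rounding errors introduced in IEEE floating point arithmetic are modeled
like this:
$fl(x + y) = (x + y)(1 + \sigma), \quad |\sigma| \le \frac{\epsilon_m}{2}$
$fl(x - y) = (x - y)(1 + \sigma), \quad |\sigma| \le \frac{\epsilon_m}{2}$
$fl(x * y) = (x * y)(1 + \sigma), \quad |\sigma| \le \frac{\epsilon_m}{2}$
$fl(x / y) = (x / y)(1 + \sigma), \quad |\sigma| \le \frac{\epsilon_m}{2}$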
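The model can be tested empirically in double precision. The sketch below assumes round-to-nearest arithmetic and operands in [1, 10], so that no overflow or underflow occurs; it computes each exact result with `fractions.Fraction` and measures the relative rounding error of the floating point result. The helper name `relative_rounding_error` is illustrative.

```python
from fractions import Fraction
import operator
import random
import sys

def relative_rounding_error(op, x, y):
    """Return sigma such that fl(x op y) = (x op y) * (1 + sigma)."""
    exact = op(Fraction(x), Fraction(y))   # exact rational result
    computed = Fraction(op(x, y))          # rounded floating point result
    return (computed - exact) / exact

random.seed(0)                             # fixed seed for repeatability
bound = Fraction(sys.float_info.epsilon) / 2
worst = Fraction(0)

for _ in range(1000):
    x = random.uniform(1.0, 10.0)
    y = random.uniform(1.0, 10.0)
    for op in (operator.add, operator.sub, operator.mul, operator.truediv):
        if op is operator.sub and x == y:
            continue                       # exact zero: relative error undefined
        sigma = relative_rounding_error(op, x, y)
        worst = max(worst, abs(sigma))

print(float(worst), float(bound))
print(worst <= bound)
```

A random test like this only samples the bound; it does not prove it.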
Two uses:
(uncommon) Keep track of errors in calculations.
(important) Analyze algorithms with respect to the influence of floating
point round-off. If the round-off errors are moderate compared to the errors
predicted by a perturbation analysis of the problem, the algorithm is stable.