The Unum Number Format: Mathematical Foundations, Implementation and Comparison to IEEE 754 Floating-Point Numbers

Bachelor's Thesis (Bachelorarbeit)
Laslo Hunhold

Mathematisches Institut

First examiner (Erstgutachterin): Prof. Dr. Angela Kunoth
Second examiner (Zweitgutachter): Samuel Leweke

8 November 2016

arXiv:1701.00722v1 [cs.NA] 2 Jan 2017
Contents

1. Introduction
2. IEEE 754 Floating-Point Arithmetic
3. Interval Arithmetic
   3.1. Projectively Extended Real Numbers
      3.1.1. Finite and Infinite Limits
      3.1.2. Well-Definedness
   3.2. Open Intervals
   3.3. Flakes
4. Unum Arithmetic
   4.1. Lattice Selection
      4.1.1. Linear Lattice
      4.1.2. Exponential Lattice
      4.1.3. Decade Lattice
   4.2. Machine Implementation
      4.2.1. Unum Enumeration
      4.2.2. Operations on Sets of Real Numbers
      4.2.3. Unum Toolbox
   4.3. Revisiting Floating-Point-Problems
      4.3.1. The Silent Spike
      4.3.2. Devil's Sequence
      4.3.3. The Chaotic Bank Society
   4.4. Discussion
      4.4.1. Comparison to IEEE 754 Floating-Point Numbers
      4.4.2. Sticking and Creeping
      4.4.3. Lattice Switching
      4.4.4. Complexity
5. Summary and Outlook
A. Notation Directory
   A.1. Section 2: IEEE 754 Floating-Point Arithmetic
   A.2. Section 3: Interval Arithmetic
   A.3. Section 4: Unum Arithmetic
B. Code Listings
   B.1. IEEE 754 Floating-Point Problems
      B.1.1. spike.c
      B.1.2. devil.c
      B.1.3. bank.c
      B.1.4. Makefile
   B.2. Unum Toolbox
      B.2.1. gen.c
      B.2.2. table.h
      B.2.3. unum.c
      B.2.4. config.mk
      B.2.5. Makefile
   B.3. Unum Problems
      B.3.1. euler.c
      B.3.2. devil.c
      B.3.3. bank.c
      B.3.4. spike.c
      B.3.5. Makefile
   B.4. License
Bibliography
Eigenständigkeitserklärung (Declaration of Authorship)
1. Introduction
This thesis examines a modern concept for machine numbers based on interval arith-
metic called ‘Unums’ and compares it to IEEE 754 floating-point arithmetic, evaluating
possible uses of this format where floating-point numbers are inadequate. In the course
of this examination, this thesis builds theoretical foundations for IEEE 754 floating-
point numbers, interval arithmetic based on the projectively extended real numbers and
Unums.
Projectively Extended Real Numbers Besides the well-known and established concept
of extending the real numbers with signed infinities +∞ and −∞, called the affinely
extended real numbers, a different approach is to only use one unsigned symbol for
infinity, denoted as ∞ ˘ in this thesis. This extension is called the projectively extended
real numbers and we will prove that it is well-defined in terms of finite and infinite limits.
It is in our interest to examine how much we lose and what we gain with this reduction,
especially in regard to interval arithmetic.
This thesis will present a theory of interval arithmetic based on the projectively ex-
tended real numbers, picking up the idea of modelling degenerate intervals across the
infinity point as well, allowing division by zero and showing many other useful properties.
Goal of this Thesis The goal of this thesis is to evaluate the Unum number format in
a theoretical and practical context, identify its advantages and see how reasonable it is
to use Unums rather than the ubiquitous IEEE 754 floating-point format for certain tasks.
At the time of writing, all available implementations of the Unum arithmetic are
using floating-point arithmetic at runtime instead of solely relying on lookup tables as
Gustafson proposes. The provided toolbox developed in the course of this thesis limits
the use of floating-point arithmetic at runtime to the initialisation of input data. Thus it
is a special point of interest to evaluate the format the way it was proposed and not in an
artificial floating-point environment created by the currently available implementations.
2. IEEE 754 Floating-Point Arithmetic
Floating-point numbers have come a long way since Konrad Zuse's Z1 and Z3, which were
among the first machines to implement floating-point numbers, back then obviously using
a non-standardised format (see [Roj98, pp. 31, 40–48]). With more and more computers
seeing the light of day in the decades following the pioneering days, the demand for a
binary floating-point standard rose in the face of many different proprietary floating-
point formats.
The Institute of Electrical and Electronics Engineers (IEEE) took on the task and
formulated the ‘ANSI/IEEE 754-1985, Standard for Binary Floating-Point Arithmetic’
(see [IEE85]), published and adopted internationally in 1985 and revised in 2008 (see
[IEE08]) with a few extensions, including decimal floating-point numbers (see [IEE08,
Section 3.5]), which are not going to be presented here. This standardisation effort led
to a homogenisation of floating-point formats across computer manufacturers, and this
chapter will only deal with this standardised format and follow the concepts presented
in the IEEE 754-2008 standard. All results in this chapter are solely derived from this
standard.
2.1. Number Model

Every x ∈ R can be written with a sign s ∈ {0, 1}, an exponent e ∈ Z and digits d_i ∈ {0, …, b − 1} in a base b ∈ N_{>1} as

x = (−1)^s · b^e · \sum_{i=0}^{∞} d_i · b^{−i}.

There exist multiple parametrisations (s, e, d) for a single x. For instance, x = 6 in the base
b = 10 yields (0, 0, {6, 0, …}) and (0, 1, {0, 6, 0, …}) as two of many possible parametrisations.

Given the finite nature of the computer, the number of possible exponents e and digits
d_i is limited. Within these bounds we can model a machine number x̃ with exponent
bounds \underline{e}, \overline{e} ∈ Z, \underline{e} ≤ e ≤ \overline{e}, a fixed number of digits n_m ∈ N and base b = 2 as

x̃ = (−1)^s · 2^e · \sum_{i=0}^{n_m} d_i · 2^{−i}.
Given binary is the native base the computer works with, we will assume b = 2 in this
chapter. Despite being able to model finite floating-point numbers in the machine now,
we still have problems with the lack of uniqueness. The IEEE 754 standard solves this by
observing that the only difference between those multiple parametrisations for a given
machine number x̃ ≠ 0 is that

\min\{i ∈ \{0, …, n_m\} \mid d_i ≠ 0\}

is variable (see [IEE08, Section 3.4]). This means that we have a varying number of 0's
in the sequence {d_i}_{i∈{0,…,n_m}} until we reach the first 1. One way to work around this
redundancy is to use normal floating-point numbers, which force d_0 = 1 (see [IEE08,
Section 3.4]). The digit d_0 is not stored, as it always has the same value. This results in the
Definition 2.1 (set of normal floating-point numbers). Let n_m ∈ N and \underline{e}, \overline{e} ∈ Z. The
set of normal floating-point numbers is defined as

M_1(n_m, \underline{e}, \overline{e}) := \left\{ (−1)^s · 2^e · \left(1 + \sum_{i=1}^{n_m} d_i · 2^{−i}\right) \;\middle|\; s ∈ \{0,1\} ∧ \underline{e} ≤ e ≤ \overline{e} ∧ d ∈ \{0,1\}^{n_m} \right\}.
The subnormal floating-point numbers allow us to express 0 with d = 0 and fill the so
called ‘underflow gap’ between the smallest normal floating-point number and 0. With
d and s variable, we use boundary values of the exponent to fit subnormal, normal and
exception cases under one roof (see [IEE08, Section 3.4a-e]).
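
As a small illustration of this number model (a sketch, not part of the thesis code), the following C program extracts the sign, biased exponent and mantissa fields from a binary64 value, assuming the usual field widths of 1, 11 and 52 bit:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	double x = 6.0;
	uint64_t bits, s, e, m;

	memcpy(&bits, &x, sizeof(bits));
	s = bits >> 63;                        /* sign bit s                */
	e = (bits >> 52) & 0x7ff;              /* biased exponent (11 bit)  */
	m = bits & ((UINT64_C(1) << 52) - 1);  /* mantissa digits (52 bit)  */

	/* normal numbers: x = (-1)^s * 2^(e - 1023) * (1 + m * 2^-52) */
	printf("s = %llu, biased e = %llu, unbiased e = %lld, m = %llu\n",
	       (unsigned long long)s, (unsigned long long)e,
	       (long long)e - 1023, (unsigned long long)m);
	return 0;
}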
Definition 2.3 (set of floating-point numbers). Let n_m ∈ N, \underline{e}, \overline{e} ∈ Z and d ∈ {0, 1}^{n_m}.
The set of floating-point numbers is defined via

M(n_m, \underline{e} − 1, \overline{e} + 1) ∋ x̃(s, e, d) \begin{cases} ∈ M_0(n_m, \underline{e}) & e = \underline{e} − 1 \\ ∈ M_1(n_m, \underline{e}, \overline{e}) & \underline{e} ≤ e ≤ \overline{e} \\ = (−1)^s · ∞ & e = \overline{e} + 1 ∧ d = 0 \\ = \mathrm{NaN} & e = \overline{e} + 1 ∧ d ≠ 0. \end{cases}
Proof. Let d ∈ {0, 1}^{n_m} and \underline{e} ≤ e ≤ \overline{e}. It follows with the finite geometric series that

max(M_1(n_m, \underline{e}, \overline{e})) = \max\left\{ (−1)^s · 2^e · \left(1 + \sum_{i=1}^{n_m} d_i · 2^{−i}\right) \right\}
= (−1)^0 · 2^{\overline{e}} · \left(1 + \sum_{i=1}^{n_m} 2^{−i}\right)
= 2^{\overline{e}} · \sum_{i=0}^{n_m} 2^{−i}
= 2^{\overline{e}} · \sum_{i=0}^{n_m} \left(\tfrac{1}{2}\right)^i
= 2^{\overline{e}} · \frac{1 − \left(\tfrac{1}{2}\right)^{n_m+1}}{1 − \tfrac{1}{2}}
= 2^{\overline{e}} · \left(2 − 2^{−n_m}\right).
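
This maximum can be checked directly on a machine; the following sketch assumes binary64 with \overline{e} = 1023 and n_m = 52 and compares 2^{\overline{e}} · (2 − 2^{−n_m}) against DBL_MAX:

#include <float.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
	/* binary64: overline{e} = 1023, n_m = 52 */
	double m1max = ldexp(2.0 - ldexp(1.0, -52), 1023);

	printf("%d\n", m1max == DBL_MAX); /* prints 1 */
	return 0;
}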
Proposition 2.7 (number of NaN representations). Let n_m ∈ N and \underline{e}, \overline{e} ∈ Z. The
number of NaN representations is

|\mathrm{NaN}|(n_m) := \left|\left\{ x̃(s, e, d) ∈ M(n_m, \underline{e} − 1, \overline{e} + 1) \;\middle|\; x̃ = \mathrm{NaN} \right\}\right| = 2^{n_m+1} − 2.
Figure 2.1.: IEEE 754 floating-point memory layout: one sign bit, n_e exponent bits and n_m mantissa bits; see [IEE08, Figure 3.1].
Handling the exponent just as an unsigned integer would not allow the use of negative
exponents. To solve this, the so-called exponent bias was introduced in the IEEE 754
standard, which is the value 2^{n_e−1} − 1 subtracted from the unsigned value of the exponent
(see [IEE08, Section 3.4b]) and should not be confused with the two's complement, the
usual way to express signed integers in a machine. Looking at the exponent values, the
exponent bias results in

\underline{e}(n_e) := −2^{n_e−1} + 2

and

\overline{e}(n_e) := 2^{n_e−1} − 1.

With the exponent bias representation, we know how many exponent values can be
assumed. Because of that it is now possible to determine the 2^{n_e} − 2 different exponents
for M_1(n_m, \underline{e}(n_e), \overline{e}(n_e)). Given that d ∈ {0, 1}^{n_m} and s ∈ {0, 1} are arbitrary, it follows that

|M_1(n_m, \underline{e}(n_e), \overline{e}(n_e))| = 2 · 2^{n_m} · (2^{n_e} − 2) = 2^{1+n_e+n_m} − 2^{n_m+2}.
Proof. According to Definition 2.2 it follows with arbitrary d ∈ {0, 1}^{n_m} and s ∈ {0, 1}
that

|M_0(n_m, \underline{e}(n_e))| = 2 · 2^{n_m} = 2^{n_m+1}.

Proof. We define

|∞| := \left|\left\{ x̃(s, e, d) ∈ M(n_m, \underline{e} − 1, \overline{e} + 1) \;\middle|\; x̃ = ±∞ \right\}\right| = 2

and obtain

|M(n_m, \underline{e}(n_e) − 1, \overline{e}(n_e) + 1)| = |M_0(n_m, \underline{e}(n_e))| + |M_1(n_m, \underline{e}(n_e), \overline{e}(n_e))| + |∞| + |\mathrm{NaN}|
= 2^{n_m+1} + 2^{1+n_e+n_m} − 2^{n_m+2} + 2 + 2^{n_m+1} − 2
= 2^{1+n_e+n_m} + 2^{n_m+1} + 2^{n_m+1} − 2 · 2^{n_m+1}
= 2^{1+n_e+n_m}.
Excluding the extended precisions above 64 bit, the IEEE 754 standard defines three
storage sizes for floating-point numbers (see [IEE08, Section 3.6]), parametrised by nm
and ne , as can be seen in Table 2.1. Half precision floating-point numbers (binary16)
were introduced in IEEE 754-2008 and are just meant to be a storage format and not
used for arithmetic operations given the low dynamic range.
2.3. Rounding

Given that M(n_m, \underline{e} − 1, \overline{e} + 1) is a finite set, we need a way to map arbitrary real values
into it if we want floating-point numbers to be a useful model of the real numbers.
The IEEE 754 standard solves this with rounding, an operation mapping real values to
preferably close floating-point numbers based on a set of rules (see [IEE08, Section 4.3]).
Given the different requirements depending on the task at hand, the IEEE 754 standard
defines five rounding rules: two based on rounding to the nearest value (see [IEE08,
Section 4.3.1]) and three based on a directed approach (see [IEE08, Section 4.3.2]).
Table 2.1.: IEEE 754-2008 binary floating-point numbers up to 64 bit with their charac-
terizing properties.
2.3.1. Nearest

The most intuïtive approach is to just round to the nearest floating-point number. In
case of a tie, though, there has to be a rule in place to make a decision possible. The two
rules proposed by the IEEE 754 standard are tying to even (also known as banker's
rounding) and tying away from zero. Only the first mode is presented here, which is also
the default rounding mode (see [IEE08, Section 4.3.3]).
This part of the standard is often misunderstood, resulting in many publications presenting
not nearest with tie to even, but nearest with tie away from zero as the standard rounding
operation, which is incorrect but easy to overlook.
Let x ∈ R with

x = (−1)^s · 2^e · \sum_{i=0}^{∞} d_i · 2^{−i}.

The rounding function

rd_E : R → M(n_m, \underline{e} − 1, \overline{e} + 1)

is defined for

\underline{x} := (−1)^s · 2^e · \sum_{i=0}^{n_m} d_i · 2^{−i}

\overline{x} := (−1)^s · 2^e · \left[\sum_{i=0}^{n_m} d_i · 2^{−i} + 1 · 2^{−n_m}\right]

as

x ↦ \begin{cases} (−1)^s · ∞ & |x| ≥ \max(M_1) + 2^{\overline{e}} · 2^{−n_m−1} = 2^{\overline{e}} · \left(2 − 2^{−n_m−1}\right) \\ \underline{x} & |x − \underline{x}| < |x − \overline{x}| ∨ \left[|x − \underline{x}| = |x − \overline{x}| ∧ d_{n_m} = 0\right] \\ \overline{x} & |x − \underline{x}| > |x − \overline{x}| ∨ \left[|x − \underline{x}| = |x − \overline{x}| ∧ d_{n_m} = 1\right]. \end{cases}
What this means is that if the two nearest machine numbers \underline{x} and \overline{x} are equally close to
x, the last mantissa bit d_{n_m} of x decides whether x is rounded to \underline{x} or \overline{x}. For d_{n_m} = 0
we know that \underline{x} is even, and for d_{n_m} = 1 it follows from the definition that \overline{x} is even.
Tying to even may seem like an arbitrary and complicated approach to rounding, but
its stochastic properties make it very useful to avoid rounding effects biased in only one
direction. Since, as the number of rounding operations approaches infinity, even and odd
ties (if they appear at all) occur roughly equally often, the result is a balanced behaviour
of rounding up and rounding down in tie cases.
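
The following small C program (a sketch, not taken from the listings in Appendix B) makes the tie-to-even behaviour visible with rint(), which rounds according to the current rounding mode:

#include <fenv.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
	fesetround(FE_TONEAREST); /* the default mode: nearest, ties to even */

	/* 0.5, 1.5, 2.5 and 3.5 are exact ties between two integers;
	 * each is rounded to the even neighbour */
	printf("%g %g %g %g\n", rint(0.5), rint(1.5), rint(2.5), rint(3.5));
	/* output: 0 2 2 4 */
	return 0;
}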
2.3.2. Directed

Another way to round numbers is a directed rounding approach towards a given orientation.
The three modes have three distinct orientations: rounding toward zero, upward and
downward. The first mode is not presented here. The upward and downward rounding
functions

rd_↑ : R → M(n_m, \underline{e} − 1, \overline{e} + 1)
rd_↓ : R → M(n_m, \underline{e} − 1, \overline{e} + 1)

round to the closest machine number above and below their argument respectively.
The directed rounding modes are important for interval arithmetic, where it is important
not to round down the upper bound or round up the lower bound of an interval.
This way it is always guaranteed that for a, b ∈ R and a ≤ b

rd_↓(a) ≤ a ≤ b ≤ rd_↑(b)

is satisfied. The bounds may grow faster than by using a to-nearest rounding mode, but
it is guaranteed that the solution lies in between them.
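
The toolbox listings in Appendix B use exactly this technique via fesetround(); the following reduced sketch computes a guaranteed enclosure of a + b (the volatile qualifier only keeps the compiler from folding the two sums into one):

#include <fenv.h>
#include <stdio.h>

int
main(void)
{
	volatile double a = 0.1, b = 0.2; /* keep both sums at runtime */
	double low, upp;

	fesetround(FE_DOWNWARD);
	low = a + b;              /* lower bound, rounded towards -infinity */
	fesetround(FE_UPWARD);
	upp = a + b;              /* upper bound, rounded towards +infinity */
	fesetround(FE_TONEAREST);

	/* the exact sum is guaranteed to satisfy low <= a + b <= upp */
	printf("[%.17g, %.17g]\n", low, upp);
	return 0;
}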
2.4. Problems

As with any numerical system, we can find problems exhibiting its weaknesses. In this
context we examine three different kinds of problems. Using the results obtained here,
we will be able to evaluate if and how well the Unum arithmetic solves these problems.

The first problem concerns the spike function (2.2), f(x) := ln(|3 · (1 − x) + 1|), whose
singularity sits where the argument of the logarithm vanishes:

|3 · (1 − x) + 1| = 0
⇔ 3 · (1 − x) + 1 = 0
⇔ 3 − 3 · x + 1 = 0
⇔ x = 4/3.
More specifically,

\lim_{x↓4/3} (f(x)) = \lim_{x↑4/3} (f(x)) = −∞.
Implementing this problem using IEEE 754 floating-point numbers (see Listing B.1.1), we
might expect to receive a very small number or even negative infinity in a neighbourhood
of 4/3. However, this is not the case. Instead, as you can see in Figure 2.2, the program
claims that f(4/3) ≈ −36.044 is the minimum in the direct vicinity of 4/3, completely
hiding the fact that f is singular at 4/3.

[Figure 2.2: f(x) evaluated in double precision for x − 4/3 in the range ±2 · 10^{−15}; the plotted values stay between roughly −33 and −36 with an apparent minimum of ≈ −36.044 instead of dropping towards −∞.]

The reason why the floating-point implementation hides the singularity is not that the
logarithm implementation is faulty, but that the value passed to the logarithm is off
in the first place. It is easy to see that the singular point 4/3 cannot be exactly represented in
the machine. This effect is increased by rounding errors occurring during the evaluation
(see Listing B.1.1) of
(see Listing B.1.1) of
rdE rdE rdE (3) · rdE rdE (1) − rdE 4
+ rdE (1) ≈ 2.2204 · 10−16 .
3
In magnitude, this is relatively close to zero, but given

ln(2.2204 · 10^{−16}) ≈ −36.044,

we not only see the significance of the rounding error, but also the reason why the
floating-point implementation claims that −36.044 is the minimum of f in the direct vicinity
of 4/3.
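
The effect can be reproduced with a few lines of C (a condensed sketch of Listing B.1.1, which is only partially reproduced in the appendix):

#include <math.h>
#include <stdio.h>

int
main(void)
{
	double x = 4.0 / 3.0;                  /* rd_E(4/3), not 4/3 itself */
	double inner = 3.0 * (1.0 - x) + 1.0;  /* should be 0, is ~2.22e-16 */

	printf("%.17g\n", inner);
	printf("%.17g\n", log(fabs(inner)));   /* ~ -36.04 instead of -inf  */
	return 0;
}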
This result indicates that there are simple examples where floating-point numbers fail
for piecewise continuous functions with singularities. Not being able to spot singularities
for a given function might have drastic consequences, for example ‘hiding’ destructive
frequencies in resonance curves for the oscillation of bridge stay cables, which are, for
instance, derived in [PdCMBL96].
For the second problem we consider the recurrence (2.3),

u_n := 111 − \frac{1130}{u_{n−1}} + \frac{3000}{u_{n−1} · u_{n−2}}, \qquad u_0 := 2, \; u_1 := −4,

and determine the possible limits of this series, if they exist. For this purpose, we assume
convergence with u := u_n = u_{n−1} = u_{n−2} and obtain the characteristic polynomial
relation

u = 111 − \frac{1130}{u} + \frac{3000}{u^2}
⇔ u^3 = 111 · u^2 − 1130 · u + 3000
⇔ 0 = u^3 − 111 · u^2 + 1130 · u − 3000

with solutions 5, 6 and 100. As further described in [Kah06, §5] for a similar recurrence,
we obtain the general solution

u_n = \frac{α · 100^{n+1} + β · 6^{n+1} + γ · 5^{n+1}}{α · 100^{n} + β · 6^{n} + γ · 5^{n}}   (2.4)

with α, β, γ ∈ R under the condition |α| + |β| + |γ| ≠ 0.
For the starting values u_0 = 2 and u_1 = −4 the parameter α vanishes, and it follows that

\lim_{n→∞} (u_n) = 6.
If we take a look at the floating-point implementation (see Listing B.1.2) of this problem,
we can observe rather peculiar behaviour: Figure 2.3 shows that the IEEE 754-based
solver behaves completely opposite to what one might expect.

[Figure 2.3: iterates u_n of the floating-point solver for n ∈ {0, …, 25}; after initially approaching 6, the values rise and settle at 100.]

Using the closed form (2.4) we have shown that the recurrence (2.3) converges to 6.
However, even though the floating-point solver comes quite close to 6 up until n = 15,
it unexpectedly converges to 100 in subsequent iterations. The reason for that is found
within consecutive rounding errors of u_n, which skew the results so far that the parameter
α of the closed form (2.4) becomes non-zero.
The carefully chosen starting values u_0 = 2 and u_1 = −4 deliberately make α disappear
in (2.4), which shows how even small rounding errors can give completely wrong results
for such a pathological example.
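
A minimal sketch of such a floating-point solver (along the lines of Listing B.1.2) illustrates this drift:

#include <stdio.h>

int
main(void)
{
	double a = 2, b = -4, tmp; /* a = u_{n-2}, b = u_{n-1} */
	int i;

	for (i = 2; i <= 25; i++) {
		tmp = 111 - 1130 / b + 3000 / (a * b);
		a = b;
		b = tmp;
		printf("u_%d = %.6f\n", i, b);
	}
	/* the printed iterates approach 6 at first and then run off to 100 */
	return 0;
}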
The third problem is the recurrent series (2.5),

a_n := a_{n−1} · n − 1, \qquad a_0 := e − 1.

The name of this example can be derived by thinking of the series as an imaginary
offer by a bank: start with a deposit of e − 1 currency units and, in each year for 25
years, multiply it by the current running year number and subtract one currency unit
as banking charges.
For a theoretical answer, we first want to find a closed form of a_n. We observe the
pattern

a_0 = a_0 = 0! · (a_0)
a_1 = a_0 · 1 − 1 = 1! · \left(a_0 − \tfrac{1}{1!}\right)
a_2 = (a_0 · 1 − 1) · 2 − 1 = 2! · \left(a_0 − \tfrac{1}{1!} − \tfrac{1}{2!}\right)
a_3 = [(a_0 · 1 − 1) · 2 − 1] · 3 − 1 = 3! · \left(a_0 − \tfrac{1}{1!} − \tfrac{1}{2!} − \tfrac{1}{3!}\right).
Proposition 2.15 (closed form of a_n). The closed form of the recurrent series (2.5) is

a_n = n! · \left(a_0 − \sum_{k=1}^{n} \frac{1}{k!}\right).

Proof (by induction).

a) a_0 = a_0 = 0! · a_0.

b) Assume a_n = n! · \left(a_0 − \sum_{k=1}^{n} \frac{1}{k!}\right) holds true for an arbitrary but fixed n ∈ N.

c) Show n ↦ n + 1:

a_{n+1} = a_n · (n + 1) − 1
\overset{b)}{=} n! · \left(a_0 − \sum_{k=1}^{n} \frac{1}{k!}\right) · (n + 1) − 1
= (n + 1)! · \left(a_0 − \sum_{k=1}^{n} \frac{1}{k!} − \frac{1}{(n + 1)!}\right)
= (n + 1)! · \left(a_0 − \sum_{k=1}^{n+1} \frac{1}{k!}\right).
Using the closed form of a_n and the definition of Euler's number, we get for a
disturbed a_0 = (e − 1) + δ with δ ∈ R

a_n = n! · \left((e − 1) + δ − \sum_{k=1}^{n} \frac{1}{k!}\right)
= n! · \left(δ + e − 1 − \sum_{k=1}^{n} \frac{1}{k!}\right)
= n! · \left(δ + \sum_{k=0}^{+∞} \frac{1}{k!} − \sum_{k=0}^{n} \frac{1}{k!}\right)
= n! · \left(δ + \sum_{k=n+1}^{+∞} \frac{1}{k!}\right)
= n! · δ + \sum_{k=n+1}^{+∞} \frac{n!}{k!}.
It follows that

\lim_{n→+∞} (a_n) = \begin{cases} −∞ & δ < 0 \\ 0 & δ = 0 \\ +∞ & δ > 0. \end{cases}
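
A minimal sketch of the floating-point implementation (along the lines of Listing B.1.3) shows how the input rounding error is amplified:

#include <stdio.h>

int
main(void)
{
	double a = 1.718281828459045235; /* rd_E(e - 1): already off by some delta */
	int n;

	for (n = 1; n <= 25; n++)
		a = a * n - 1;

	/* instead of a value close to 0, a_25 is off by roughly 25! * delta */
	printf("a_25 = %g\n", a);
	return 0;
}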
3. Interval Arithmetic
The foundation for modern interval arithmetic was set by Ramon E. Moore in 1967
(see [Moo67]) as a means for automatic error analysis in algorithms. Since then, the
usage of interval arithmetic beyond stability analysis was limited to some applications
(see [MKŠ+ 06], [Moo79] and [MKC09]), which is also indicated by the fact that the
first IEEE standard for interval arithmetic, IEEE 1788-2015, was published in 2015 (see
[IEE15]). The standard is based on the ubiquitous affinely extended real numbers

\overline{R} := R ∪ {+∞} ∪ {−∞},

which this chapter will not make use of. Instead, the basis will be the projectively
extended real numbers

R^* := R ∪ {∞̆}.
The motivation for this chapter is to find out how much we lose when only having
one symbol for infinity, and more importantly, what we gain in this process, ultimately
proving well-definedness of R∗ . Based on the findings, it is in our interest to construct
an interval arithmetic on top of R∗ , which we can later use to formalise the Unum
arithmetic.
[Figure: the projectively extended real numbers visualised as a circle, with ∞̆ at the top and the points −x, x, −1, 1, −x^{−1}, x^{−1} marked symmetrically around it.]
Definition 3.1 (projectively extended real numbers). For a ∈ R and b ∈ R \ {0}, the
arithmetic in R^* := R ∪ {∞̆} is extended by

−(∞̆) := ∞̆   (3.1a)
a + ∞̆ = ∞̆ + a := ∞̆   (3.1b)
b · ∞̆ = ∞̆ · b := ∞̆   (3.1c)
a / ∞̆ := 0   (3.1d)
b / 0 := ∞̆.   (3.1e)
For more information on indeterminate forms on extensions of the real numbers see
[TF95].
To be able to show well-definedness of the extension of the arithmetic operations in
R^* in terms of infinite limits, we first have to introduce the concept of ∞̆-infinite limits
on R^*.
Besides finite limits, we also need a way to express when a function diverges. In this
regard, having only one infinity symbol induces some losses, as only the absolute values
of the functions can be evaluated. However, it still holds that if a function diverges in
standard-infinite limits it also diverges in ∞̆-infinite limits.

Definition 3.4 (∞̆-infinite limit). Let f : R → R. The ∞̆-infinite limit of f for x ∈ R
approaching a ∈ R is defined as

\lim_{x↓a} (f(x)) = ∞̆ :⇔ ∀ε > 0 : ∃δ > 0 : 0 < x − a < δ ⇒ |f(x)| > ε
\lim_{x↑a} (f(x)) = ∞̆ :⇔ ∀ε > 0 : ∃δ > 0 : 0 < a − x < δ ⇒ |f(x)| > ε
\lim_{x→a} (f(x)) = ∞̆ :⇔ \lim_{x↓a} (f(x)) = ∞̆ ∧ \lim_{x↑a} (f(x)) = ∞̆,

and for x approaching ∞̆ as

\lim_{x↓∞̆} (f(x)) = ∞̆ :⇔ ∀ε > 0 : ∃c ∈ R : ∀x ∈ R : x < c : |f(x)| > ε
\lim_{x↑∞̆} (f(x)) = ∞̆ :⇔ ∀ε > 0 : ∃c ∈ R : ∀x ∈ R : x > c : |f(x)| > ε
\lim_{x→∞̆} (f(x)) = ∞̆ :⇔ \lim_{x↓∞̆} (f(x)) = ∞̆ ∧ \lim_{x↑∞̆} (f(x)) = ∞̆.

In particular, divergence in the affinely extended sense implies divergence in the ∞̆-infinite sense:

\lim_{x↓a} (f(x)) = ∞̆ ⇐ \lim_{x↓a} (f(x)) = ±∞
\lim_{x↑a} (f(x)) = ∞̆ ⇐ \lim_{x↑a} (f(x)) = ±∞
\lim_{x↓∞̆} (f(x)) = ∞̆ ⇐ \lim_{x→−∞} (f(x)) = ±∞
\lim_{x↑∞̆} (f(x)) = ∞̆ ⇐ \lim_{x→+∞} (f(x)) = ±∞.
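
For instance, f(x) = 1/x has no limit at 0 in the affinely extended real numbers, since lim_{x↓0} (f(x)) = +∞ and lim_{x↑0} (f(x)) = −∞ differ; in the sense of Definition 3.4, however, both one-sided conditions are satisfied, as |f(x)| > ε whenever 0 < |x| < 1/ε, and thus lim_{x→0} (f(x)) = ∞̆.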
3.1.2. Well-Definedness

We can now use our definitions of ∞̆-finite and ∞̆-infinite limits to show that R^* with
the extensions given in Definition 3.1 is well-defined in terms of infinite limits.

Theorem 3.6 (well-definedness of R^*). R^* is well-defined in terms of infinite limits.

Proof. Let f_{∞̆}, f_a, f_b, f_0 : R → R, a, b ∈ R and b ≠ 0. Without loss of generality we
assume that ∞̆ is approached from below and specify

\lim_{x↑∞̆} (f_{∞̆}(x)) = ∞̆   (3.2a)
\lim_{x↑∞̆} (f_a(x)) = a.   (3.2b)

For the negation axiom (3.1a) it follows directly from Definition 3.4 that

\lim_{x↑∞̆} (f_{∞̆}(x)) = ∞̆ ⇔ \lim_{x↑∞̆} (−f_{∞̆}(x)) = ∞̆.

Following from precondition (3.2b), Definition 3.4 and ε̃ > 0 we know that

∃c_{2,a} ∈ R : ∀x > c_{2,a} : |f_a(x) − a| < ε̃.

It follows for x > c_{2,a} using the reverse triangle inequality that

ε̃ > |f_a(x) − a| ≥ ||f_a(x)| − |a|| ≥ |f_a(x)| − |a|
⇒ |f_a(x)| < ε̃ + |a|.   (3.4)

Following from precondition (3.2a), Definition 3.4 and 2 · ε̃ + |a| > 0 we also know
that

∃c_{2,∞̆} ∈ R : ∀x > c_{2,∞̆} : |f_{∞̆}(x)| > 2 · ε̃ + |a|.   (3.5)

Let x > c̃_2 := max{c_{2,a}, c_{2,∞̆}} to satisfy both (3.4) and (3.5). It follows using the
reverse triangle inequality that

|f_{∞̆}(x)| > 2 · ε̃ + |a| = ε̃ + (ε̃ + |a|) > ε̃ + |f_a(x)|
⇒ ε̃ < |f_{∞̆}(x)| − |f_a(x)| = |f_{∞̆}(x)| − |−f_a(x)| ≤ |f_{∞̆}(x) − (−f_a(x))|
⇒ |f_a(x) + f_{∞̆}(x)| = |f_{∞̆}(x) + f_a(x)| > ε̃,
\lim_{x↑∞̆} \left(\frac{f_b(x)}{f_0(x)}\right) = ∞̆.   (3.11)

Following from precondition (3.2a), Definition 3.4 and \frac{|b|}{2 · ε̃} > 0 we know

∃c_{5,0} ∈ R : ∀x > c_{5,0} : |f_0(x)| < \frac{|b|}{2 · ε̃}.   (3.12)

Let x > c̃_5 := max{c_{3,b}, c_{5,0}} to satisfy both (3.7) and (3.12). It follows that

|f_b(x)| > \frac{|b|}{2} = ε̃ · \frac{|b|}{2 · ε̃} > ε̃ · |f_0(x)|
⇒ |f_b(x)| > ε̃ · |f_0(x)|
⇔ \frac{|f_b(x)|}{|f_0(x)|} > ε̃
⇔ \left|\frac{f_b(x)}{f_0(x)}\right| > ε̃,
Definition 3.7 (disjoint union). Let A be a set and {A_i}_{i∈I} a family of sets over an
index set I with A_i ⊆ A. A is the disjoint union of {A_i}_{i∈I}, denoted by

A = \bigsqcup_{i∈I} A_i,

if and only if

∀i, j ∈ I : i ≠ j : A_i ∩ A_j = ∅   (3.13)

and

A = \bigcup_{i∈I} A_i.   (3.14)
Definition 3.9 (set of open R^*-intervals). The set of open R^*-intervals is defined as

I := \{(\underline{a}, \overline{a}) \mid \underline{a}, \overline{a} ∈ R^*\}.   (3.15)
The addition ⊕ : I × I → I is given for ((\underline{a}, \overline{a}), (\underline{b}, \overline{b})) by the first matching case:

(3.16a) \underline{a} = \overline{a}:  ∅ if \underline{a} ∈ R;  ∅ if \underline{b}, \overline{b} ∈ R ∧ \underline{b} ≥ \overline{b};  R else
(3.16b) \underline{b} = \overline{b}:  (\underline{b}, \overline{b}) ⊕ (\underline{a}, \overline{a})
(3.16c) \underline{a} = \underline{b} = ∞̆:  (∞̆, \overline{a} + \overline{b})
(3.16d) \overline{a} = \overline{b} = ∞̆:  (\underline{a} + \underline{b}, ∞̆)
(3.16e) \underline{a} = \overline{b} = ∞̆:  R
(3.16f) \overline{a} = \underline{b} = ∞̆:  (\underline{b}, \overline{b}) ⊕ (\underline{a}, \overline{a})
(3.16g) \underline{a} = ∞̆:  ∅ if \underline{b} > \overline{b};  (∞̆, \overline{a} + \overline{b}) else
(3.16h) \overline{a} = ∞̆:  ∅ if \underline{b} > \overline{b};  (\underline{a} + \underline{b}, ∞̆) else
(3.16i) \underline{b} = ∞̆:  (\underline{b}, \overline{b}) ⊕ (\underline{a}, \overline{a})
(3.16j) \overline{b} = ∞̆:  (\underline{b}, \overline{b}) ⊕ (\underline{a}, \overline{a})
(3.16k) else:  ∅ if \underline{a} > \overline{a} ∧ \underline{b} > \overline{b};  (\underline{a} + \underline{b}, \overline{a} + \overline{b}) else
and, using \underline{A} := \{\underline{a} · \underline{b}, \underline{a} · \overline{b}\}, \overline{A} := \{\overline{a} · \underline{b}, \overline{a} · \overline{b}\} and A := \underline{A} ∪ \overline{A} for \underline{a}, \overline{a}, \underline{b}, \overline{b} ∈ R,
the multiplication ⊗ : I × I → I is given for ((\underline{a}, \overline{a}), (\underline{b}, \overline{b})) by the first matching case:

(3.17a) \underline{a} = \overline{a}:  ∅ if \underline{a} ∈ R;  ∅ if \underline{b}, \overline{b} ∈ R ∧ \underline{b} ≥ \overline{b};  R else
(3.17b) \underline{b} = \overline{b}:  (\underline{b}, \overline{b}) ⊗ (\underline{a}, \overline{a})
(3.17c) \underline{a} = \underline{b} = ∞̆:  (\overline{a} · \overline{b}, ∞̆) if \overline{a} ≤ 0 ∧ \overline{b} ≤ 0;  R else
(3.17d) \overline{a} = \overline{b} = ∞̆:  (\underline{a} · \underline{b}, ∞̆) if \underline{a} ≥ 0 ∧ \underline{b} ≥ 0;  R else
(3.17e) \underline{a} = \overline{b} = ∞̆:  (∞̆, \overline{a} · \underline{b}) if \overline{a} ≤ 0 ∧ \underline{b} ≥ 0;  R else
(3.17f) \overline{a} = \underline{b} = ∞̆:  (\underline{b}, \overline{b}) ⊗ (\underline{a}, \overline{a})
(3.17g) \underline{a} = ∞̆:  R if \underline{b} > \overline{b};  (∞̆, max(A)) if \underline{b} ≥ 0;  (min(A), ∞̆) if \overline{b} ≤ 0;  R else
(3.17h) \overline{a} = ∞̆:  R if \underline{b} > \overline{b};  (min(A), ∞̆) if \underline{b} ≥ 0;  (∞̆, max(A)) if \overline{b} ≤ 0;  R else
(3.17i) \underline{b} = ∞̆:  (\underline{b}, \overline{b}) ⊗ (\underline{a}, \overline{a})
(3.17j) \overline{b} = ∞̆:  (\underline{b}, \overline{b}) ⊗ (\underline{a}, \overline{a})
(3.17k) \underline{a} > \overline{a} ∧ \underline{b} > \overline{b}:  ∅
(3.17l) \underline{a} > \overline{a}:  (max(A), min(A)) if sgn(\underline{b}) = sgn(\overline{b});  ∅ else
(3.17m) \underline{b} > \overline{b}:  (\underline{b}, \overline{b}) ⊗ (\underline{a}, \overline{a})
(3.17n) \underline{a} = \overline{a} ∨ \underline{b} = \overline{b}:  ∅
(3.17o) else:  (min(A), max(A))
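
To illustrate the finite and the unbounded cases of ⊕: (1, 2) ⊕ (3, 5) = (4, 7), in agreement with {x ∈ R | 1 < x < 2} + {x ∈ R | 3 < x < 5} = {x ∈ R | 4 < x < 7}, and (∞̆, 1) ⊕ (2, 3) = (∞̆, 4), in agreement with {x ∈ R | x < 1} + {x ∈ R | 2 < x < 3} = {x ∈ R | x < 4}.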
Remark 3.10 (role of empty set in definition). The use of the empty set in Definition 3.9
denotes cases where undefined behaviour occurs.
Theorem 3.11 (well-definedness of I). I is well-defined in terms of set theory.
Proof. One can see that the operations ⊕ and ⊗ satisfy closedness with regard to I.
Symmetry is also satisfied given the explicit transposed forms (3.16b), (3.16f), (3.16i)
and (3.16j) for ⊕ and (3.17b), (3.17f), (3.17i), (3.17j) and (3.17m) for ⊗.
Well-definedness in terms of set theory is based on the condition that for given A, B ∈ I
the two operations ⊕ and ⊗ must satisfy
A ⊕ B = {a + b | a ∈ A ∧ b ∈ B}
and
A ⊗ B = {a · b | a ∈ A ∧ b ∈ B}
respectively, except for cases where undefined behaviour occurs. It follows from the
conditions that if either A = ∅ or B = ∅ the resulting set is also empty (see (3.16a) and
(3.17a)).
Let a, b ∈ I and \underline{a}, \overline{a}, \underline{b}, \overline{b} ∈ R.
(3.16a) This case either corresponds to

∅ ⊕ b,

yielding the empty set, or to

R ⊕ b,

yielding R unless b is degenerate (given that it then contains ∞̆ and R^* ∉ I, the result is
undefined) or empty, yielding the empty set.

(3.16c) This case corresponds to

(∞̆, \overline{a}) ⊕ (∞̆, \overline{b})

and yields, using Definition 3.8,

\{x ∈ R \mid x < \overline{a}\} ⊕ \{x ∈ R \mid x < \overline{b}\} = \{x ∈ R \mid x < \overline{a} + \overline{b}\} = (∞̆, \overline{a} + \overline{b}).
\{x ∈ R \mid x > \underline{a}\} ⊕ \left((\underline{b}, ∞̆) ⊔ \{∞̆\} ⊔ (∞̆, \overline{b})\right) = R^*

\left((\underline{a}, ∞̆) ⊔ \{∞̆\} ⊔ (∞̆, \overline{a})\right) ⊕ \left((\underline{b}, ∞̆) ⊔ \{∞̆\} ⊔ (∞̆, \overline{b})\right)

\left((\underline{a}, ∞̆) ⊔ \{∞̆\} ⊔ (∞̆, \overline{a})\right) ⊕ (\underline{b}, \overline{b}) = (\underline{a} + \underline{b}, ∞̆) ⊔ \{∞̆\} ⊔ (∞̆, \overline{a} + \overline{b}) = (\underline{a} + \underline{b}, \overline{a} + \overline{b})

\{x ∈ R \mid \underline{a} < x < \overline{a}\} ⊕ \{x ∈ R \mid \underline{b} < x < \overline{b}\} = \{x ∈ R \mid \underline{a} + \underline{b} < x < \overline{a} + \overline{b}\} = (\underline{a} + \underline{b}, \overline{a} + \overline{b}).
The cases (3.17a), (3.17c), (3.17d), (3.17e), (3.17g), (3.17h), (3.17k), (3.17l), (3.17n) and
(3.17o) for ⊗ are shown analogously.
Given the complexity of open interval arithmetic alone, it becomes clear why open
intervals have been studied independently up to this point. We will now expand I with
singletons and introduce the concept of R∗ -Flakes.
3.3. Flakes
To model subsets of R∗ , one easily finds that open intervals alone are not sufficient
to model even simple sets. Using singletons to expand I can present new possibilities.
Before we introduce the central concept of this chapter, we first need to formalise the
definition of singletons in R∗ .
Definition 3.12 (set of singletons). Let S be a set. The set of S-singletons is defined
as
§(S) := {{x} : x ∈ S} .
Definition 3.13 (set of R^*-Flakes). The set of R^*-Flakes is defined as

F := I ⊔ §(R^*).

For a, b ∈ F we write

a ∈ I ↔ a = (\underline{a}, \overline{a}),   a ∈ §(R^*) ↔ a = {ã},
b ∈ I ↔ b = (\underline{b}, \overline{b}),   b ∈ §(R^*) ↔ b = {b̃}.

The Flake addition F × F → F maps (a, b) to

(3.18a) a, b ∈ I:  a ⊕ b
(3.18b) a, b ∈ §(R^*):  ∅ if ã = b̃ = ∞̆;  {ã + b̃} else
(3.18c) a ∈ §(R^*) ∧ b ∈ I:  if ã = ∞̆: ∅ if \underline{b} ≥ \overline{b}, {∞̆} else;  if ã ∈ R: ∅ if \underline{b} = \overline{b}, (ã + \underline{b}, ã + \overline{b}) else
(3.18d) a ∈ I ∧ b ∈ §(R^*):  the same operation applied to (b, a)
and the Flake multiplication F × F → F maps (a, b) to

(3.19a) a, b ∈ I:  a ⊗ b
(3.19b) a, b ∈ §(R^*):  ∅ if ã = ∞̆ ∧ b̃ ∈ {0, ∞̆};  ∅ if ã ∈ {0, ∞̆} ∧ b̃ = ∞̆;  {ã · b̃} else
(3.19c) a ∈ §(R^*) ∧ b ∈ I, with A := {ã · \underline{b}, ã · \overline{b}}:
   if ã = ∞̆:  if \underline{b} = ∞̆: {∞̆} if \overline{b} < 0, ∅ else;  if \overline{b} = ∞̆: {∞̆} if \underline{b} > 0, ∅ else;  ∅ if \underline{b} > \overline{b};  {∞̆} if sgn(\underline{b}) = sgn(\overline{b}), ∅ else
   if ã ∈ R:  if \underline{b} = ∞̆: (∞̆, ã · \overline{b}) if ã > 0, (ã · \overline{b}, ∞̆) if ã < 0, ∅ else;  if \overline{b} = ∞̆: (ã · \underline{b}, ∞̆) if ã > 0, (∞̆, ã · \underline{b}) if ã < 0, ∅ else;  else: (max(A), min(A)) if \underline{b} > \overline{b}, ∅ if \underline{b} = \overline{b}, (min(A), max(A)) else
(3.19d) a ∈ I ∧ b ∈ §(R^*):  the same operation applied to (b, a)
While this definition is definitely complex, we can see that going step by step and first
defining operations on open R∗ -intervals alone makes it easier to prove well-definedness
of those operations as a whole. It shall be noted here that R∗ -Flakes allow us to model
closed and open sets on R∗ easily.
Proof. One can see that the Flake addition and multiplication satisfy closedness with regard to F.
Symmetry is also satisfied given the explicit transposed forms (3.18d) for the addition and (3.19d)
for the multiplication, and the fact that we have shown in Theorem 3.11 that ⊕ and ⊗ are symmetric.
Well-definedness in terms of set theory is based on the condition that for given A, B ∈ F
the Flake sum of A and B must equal

\{a + b \mid a ∈ A ∧ b ∈ B\}

and the Flake product must equal

\{a · b \mid a ∈ A ∧ b ∈ B\}

respectively, except for cases where undefined behaviour occurs.
Let a, b ∈ F as in Definition 3.13.

(3.18a) We have shown in Theorem 3.11 that ⊕ is well-defined in terms of set theory.
The cases (3.19a), (3.19b) and (3.19c) for the multiplication are shown analogously.

What remains to be shown is that the inverse elements are well-defined. One can see
that the inverse elements are all closed under F and map ∅ to ∅. We now have to show
that the operation of an element in F with its respective inverse element results in a
set containing the respective neutral element of R^*, except where undefined behaviour
occurs.
For the addition with '−' and the multiplication with '/' we observe for singletons
Now that we have shown well-definedness of F, we can proceed with showing some
useful properties that allow easier generalisations on Flakes. One of them is the evaluation

f_F : F → F

of a strictly increasing function f : R → R on Flakes.

Proof. Let f : R → R be strictly increasing. We can see that f_F is closed in F and maps
∅ to ∅. For singletons, well-definedness follows immediately, as the result just corresponds to the
singleton of the single function evaluation of f. In this context, f(∞̆) = ∞̆, treating ∞̆
as an invariant object, is also consistent with the axioms of Definition 3.1, as
This also implies the well-definedness of the degenerate case, as for \underline{a}, \overline{a} ∈ R and \underline{a} > \overline{a}
it holds that
For a strictly decreasing function f : R → R, the evaluation

f_F : F → F

is defined as

a ↦ −((−f)_F(a)).

With these results we have shown in general that we can evaluate strictly monotonic
functions on R^*-Flakes, for instance exp or ln confined to R^+_{≠0}, which will be used later.
We require strictly monotonic functions, as a constant function f(x) = c ∈ R, which is
monotonic but not strictly monotonic, would yield
4. Unum Arithmetic
This chapter will construct the Unum arithmetic based on the results in Chapter 3 and
the publications [Gus16a] and [Gus16b] by Gustafson. We start off by examining the
lattice the format is built on and the set of Unums it generates.

Definition 4.1 (set of Unums). Let n ∈ N and let P := {p_1, …, p_n} ⊂ (1, ∞) with
p_1 < ⋯ < p_n be a lattice; further set p_0 := 1 and p_{n+1} := ∞̆. The set of Unums on the
lattice P is defined as

F ⊃ U(P) := \bigsqcup_{i=1}^{n} \left( \{p_i\} ⊔ /\{p_i\} ⊔ −\{p_i\} ⊔ −/\{p_i\} \right) ⊔
\bigsqcup_{i=0}^{n} \left( \{(p_i, p_{i+1})\} ⊔ \{/(p_i, p_{i+1})\} ⊔ \{−(p_i, p_{i+1})\} ⊔ \{−/(p_i, p_{i+1})\} \right) ⊔
\{1\} ⊔ \{−1\} ⊔ \{0\} ⊔ \{∞̆\}.
Remark 4.2. By Definition 4.1, U is closed under negation and reciprocation.

In regard to F, Remark 4.2 underlines the fact that this choice for U, generated by a
set of lattice points between 1 and ∞̆, is in fact a good one. We will now proceed to derive
some elemental properties of U and prepare to define operations on it.

Proposition 4.3 (cardinality of U). Let P be as in Definition 4.1. The number of Unums
is

|U| = 8 · (|P| + 1).

Proof. Each quadrant of R^* is filled with |P| lattice points and |P| + 1 intervals. Added
to this are the 4 fixed points 1, −1, 0, ∞̆. It follows from Definition 4.1 of U as a
disjoint union of finite sets that

|U| = 4 · |P| + 4 · (|P| + 1) + 4 = 8 · (|P| + 1).
Before we proceed with constructing operations on the set of Unums, we first have to
define the
Definition 4.4 (power set). Let S be a set. The power set of S is defined as
P(S) := {s ⊆ S}.
To use the results we have derived for F, we need to find a way to ‘blur’ R∗ -Flakes
into sets of Unums. For this purpose, we define the
Definition 4.5 (blur operator). Let P as in Definition 4.1. The blur operator
bl : F → P(U(P ))
is defined as
f 7→ {u ∈ U : f ⊆ u}.
We are now able to embed R∗ -Flakes
into subsets of U, which allows us to define
operations on U by identifying them with operations on F using the bl-operator.
Remark 4.6 (dependent sets and dependency problem). It is not within the scope of
this thesis to elaborate on the theory of dependent sets, and there are multiple ways to
approach it. To give a simple example, evaluating for A = (−1, 1) ∈ I
A−A
is expected to yield {0}, but using interval arithmetic, the expression just decays to
(−1, 1) − (−1, 1) = (−1, 1) + (−1, 1) = (−2, 2),
effectively doubling the width of the interval. This is known as the dependency problem.
It is in our interest to find an approach to limit this problem. As follows, we will
denote two dependent sets S1 and S2 with S1 ∼ S2 , and with regard to the example given
above, it holds that A ∼ A.
To approach the dependency problem, we only evaluate pairwise operations for de-
pendent sets. The underlying idea is that if a given value is present in the first set within
a Unum, the dependency guarantees it will also only be within this Unum in the second
set. We identify operations on F with operations on U by defining the
Definition 4.7 (dual Unum operation). Let ⋆ : F × F → F be an operation on F and P
as in Definition 4.1. The dual Unum operation

⟨⋆⟩ : P(U(P)) × P(U(P)) → P(U(P))

is defined as

(U, V) ↦ \bigcup_{u∈U} \bigcup_{v∈V} \begin{cases} ∅ & U ∼ V ∧ u ≠ v \\ R^* & u ⋆ v = ∅ \\ \mathrm{bl}(u ⋆ v) & \text{else.} \end{cases}
Remark 4.8 (NaN for Unum operations). As one can see in Definition 4.7, when an
R^*-Flake operation ⋆ yields the empty set, indicating either an empty operand or that
undefined behaviour was witnessed, the Unum arithmetic proposed by Gustafson in
[Gus16b, Table 2] mandates that the respective dual Unum operation yields R^*.
This is not the ideal behaviour, as we carefully defined the Flake addition and multiplication
to give the empty set if one operand is the empty set, −∅ = ∅ and /∅ = ∅. This behaviour
is useful: just like NaN for floating-point numbers, which, once it occurs, is carried through
the entire stream of floating-point calculations, the empty set plays this special role in the
Unum context.
In the interest of staying compatible with the Unum format proposed by Gustafson,
this weak spot in the proposal was implemented in the Unum toolbox anyway.
4.1. Lattice Selection
The problem with a linear Unum lattice is the lack of dynamic range. Just like with
floating-point numbers, we want a dense distribution of lattice points around 1 and
a lighter distribution the further we move away from 1. As we can deduce from this
observation, a desired quality of the Unum lattice could be, for instance, an exponential
distribution.
This is trivial for p = 1. For p > 1 and i ∈ {1, …, p − 1} we note that for m ∈ N it holds
that

(i + 1) \bmod m = 0 ⇒ \begin{cases} i \bmod m = m − 1 \\ ∃n ∈ N_0 : (i + 1) = n · m \end{cases}
⇒ \begin{cases} \left\lfloor \frac{i+1}{m} \right\rfloor = \lfloor n \rfloor = n \\ \left\lfloor \frac{i}{m} \right\rfloor = n − 1 \end{cases}
⇒ \left\lfloor \frac{i+1}{m} \right\rfloor = \left\lfloor \frac{i}{m} \right\rfloor + 1

and obtain

p_{i+1} − p_i = \left[1 + 10^{−(s−1)} · \left((i + 1) \bmod \left(10^s − 10^{s−1}\right)\right)\right] · 10^{\left\lfloor \frac{i+1}{10^s − 10^{s−1}} \right\rfloor} − \left[1 + 10^{−(s−1)} · \left(i \bmod \left(10^s − 10^{s−1}\right)\right)\right] · 10^{\left\lfloor \frac{i}{10^s − 10^{s−1}} \right\rfloor}
≥ \left[1 + 10^{−(s−1)} · 0\right] · 10^{\left\lfloor \frac{i}{10^s − 10^{s−1}} \right\rfloor + 1} − \left[1 + 10^{−(s−1)} · \left(10^s − 10^{s−1} − 1\right)\right] · 10^{\left\lfloor \frac{i}{10^s − 10^{s−1}} \right\rfloor}
= \left[10 − 1 − 10^{−(s−1)+s} + 10^{−(s−1)+s−1} + 10^{−(s−1)}\right] · 10^{\left\lfloor \frac{i}{10^s − 10^{s−1}} \right\rfloor}
= 10^{−(s−1)} · 10^{\left\lfloor \frac{i}{10^s − 10^{s−1}} \right\rfloor}
> 0.
Proposition 4.17 (maximum of the decade Unum lattice). Let p ∈ N_0 and s ∈ N. The
maximum of the decade Unum lattice is

\max\{P_D(p, s)\} = \left[1 + 10^{−(s−1)} · \left(p \bmod \left(10^s − 10^{s−1}\right)\right)\right] · 10^{\left\lfloor \frac{p}{10^s − 10^{s−1}} \right\rfloor}.

Proof. As shown in the proof of Proposition 4.16, ∀i > j : p_i > p_j, and thus

\max\{P_D(p, s)\} = p_p = \left[1 + 10^{−(s−1)} · \left(p \bmod \left(10^s − 10^{s−1}\right)\right)\right] · 10^{\left\lfloor \frac{p}{10^s − 10^{s−1}} \right\rfloor}.
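
As a sketch (assuming the decade lattice formula as it appears in the proof above; the toolbox's own generator is gendeclattice() in Listing B.2.1), the lattice for n_s = 1 significant digit and |P| = 31 points can be printed as follows:

#include <math.h>
#include <stdio.h>

int
main(void)
{
	int s = 1;   /* significant digits n_s                     */
	int p = 31;  /* lattice size |P| = 2^(8-3) - 1 for n_b = 8 */
	int m = (int)(pow(10, s) - pow(10, s - 1)); /* lattice points per decade */
	int i;

	for (i = 1; i <= p; i++) {
		double pi = (1 + pow(10, -(s - 1)) * (i % m)) *
		            pow(10, i / m); /* integer division = floor */
		printf("p_%d = %g\n", i, pi);
	}
	/* ends at p_31 = 5000, matching max(P_D) for n_b = 8 in Table 4.1 */
	return 0;
}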
[Figure: the values of the decade lattice points plotted against the lattice point index, rising from 1 up to roughly 9,000 over lattice points 0 to 35.]
4.2. Machine Implementation

4.2.1. Unum Enumeration

For a finite strictly ordered set S = {s_0 < s_1 < ⋯} and an index n, the ascension operator
asc yields the n-th element,

(S, n) ↦ s_n.
Using the ascension operator, we enumerate the elements in U(P) with P as in Definition 4.1,
taking note that U(P) ∩ P((0, 1)), U(P) ∩ P((1, ∞̆)), U(P) ∩ P((∞̆, −1)) and
U(P) ∩ P((−1, 0)) are finite strictly ordered sets. In other words, we define a mapping
from {0, …, |U(P)| − 1}, which is {0, …, 8 · (|P| + 1) − 1} according to Proposition 4.3,
into U(P), called the

Definition 4.19 (Unum enumeration). Let P be as in Definition 4.1. The Unum enumeration

u : {0, …, |U(P)| − 1} → U(P)

is defined as

n ↦ \begin{cases}
\{0\} & n = 0 · (|P| + 1) \\
\mathrm{asc}(U(P) ∩ P((0, 1)), n − 0 · (|P| + 1)) & 0 · (|P| + 1) < n < 2 · (|P| + 1) \\
\{1\} & n = 2 · (|P| + 1) \\
\mathrm{asc}(U(P) ∩ P((1, ∞̆)), n − 2 · (|P| + 1)) & 2 · (|P| + 1) < n < 4 · (|P| + 1) \\
\{∞̆\} & n = 4 · (|P| + 1) \\
\mathrm{asc}(U(P) ∩ P((∞̆, −1)), n − 4 · (|P| + 1)) & 4 · (|P| + 1) < n < 6 · (|P| + 1) \\
\{−1\} & n = 6 · (|P| + 1) \\
\mathrm{asc}(U(P) ∩ P((−1, 0)), n − 6 · (|P| + 1)) & 6 · (|P| + 1) < n < 8 · (|P| + 1).
\end{cases}
Remark 4.20 (enumeration of infinity). For arbitrary U(P) with P as in Definition 4.1
it follows that

u\left(\frac{|U(P)|}{2}\right) = \{∞̆\}.
To describe the enumeration intuïtively, we cut the R∗ -circle at 0 and trace all Unums
from 0 to 0 in a counter-clockwise direction. In the machine the Unum enumeration
mapping can be realised using unsigned integers. One can deduce that for a given
number of Unum bits nb ∈ N an unsigned nb -bit integer can represent 2nb values, namely
0 through 2nb − 1.
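
A small sketch (derived from Definition 4.19 and Proposition 4.21, not taken from the toolbox) prints where the four exact fixed points end up for n_b = 8:

#include <stdio.h>

int
main(void)
{
	unsigned int nb = 8;                    /* Unum bits n_b          */
	unsigned int numunums = 1u << nb;       /* |U| = 2^n_b            */
	unsigned int p = (1u << (nb - 3)) - 1;  /* |P| = 2^(n_b - 3) - 1  */

	/* the exact fixed points sit at multiples of |P| + 1 (Definition 4.19) */
	printf("{0}   at index %u\n", 0 * (p + 1));
	printf("{1}   at index %u\n", 2 * (p + 1));
	printf("{inf} at index %u\n", 4 * (p + 1)); /* = numunums / 2, Remark 4.20 */
	printf("{-1}  at index %u\n", 6 * (p + 1));
	printf("%u bit patterns in total\n", numunums);
	return 0;
}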
Even though in theory the size of |P | can be arbitrary, as it is the case for the provided
toolbox, one must respect the fundamental data-types in a machine, resulting in the
limitation nb ∈ {8, 16, 32, 64, . . .} in the interest of not wasting any bit patterns in the
process. It follows that we are interested in finding out the required lattice size for a
given nb .
Proposition 4.21 (lattice size depending on Unum bits). Let n_b ∈ N, n_b > 2 and P as
in Definition 4.1. Given n_b Unum bits it follows that

|P| = 2^{n_b−3} − 1.
Proof. With n_b Unum bits it follows that |U(P)| = 2^{n_b}. According to Proposition 4.3
we know that |U(P)| = 8 · (|P| + 1) and thus

2^{n_b} = 8 · (|P| + 1) = 2^3 · (|P| + 1) ⇔ |P| = 2^{n_b−3} − 1.

According to the results obtained in Section 4.1, we will only take decade lattices into
account. We are led to the

Definition 4.22 (set of machine Unums). Let n_b ∈ N, n_b > 2 and n_s ∈ N. The set of
machine Unums with n_b bits and n_s significant digits is defined as

U_M(n_b, n_s) := U(P_D(2^{n_b−3} − 1, n_s)).

Having found an expression for machine Unums, it is now possible to represent arbitrary
elements of P(U_M(n_b, n_s)) in the machine to model sets of real numbers.
Using the summation formula for the first 2^{n_b} natural numbers and the facts that each
entry takes up 2 · n_b bit and that we have two operations and, thus, two LUTs, the total
storage size is

\sum_{i=1}^{2^{n_b}} 2 · (2 · n_b) · i \;\mathrm{bit} = 4 · n_b · \frac{2^{n_b} · (2^{n_b} + 1)}{2} \;\mathrm{bit} = n_b · 2^{n_b+1} · \left(2^{n_b} + 1\right) \;\mathrm{bit}.
With the lookup tables constructed, operations on SORNs are analogous to dual Unum
operations (see Definition 4.7), with the only difference that the set union for the bit
strings is realised with a bitwise OR.
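
The formula can be evaluated for a few storage sizes with a couple of lines of C (a sketch; the numbers line up with the "Size of LUTs" row of Table 4.1 and the ≈ 50 MB mentioned in the discussion below for n_b = 12):

#include <stdio.h>

int
main(void)
{
	int nb;

	for (nb = 8; nb <= 16; nb += 4) {
		/* total LUT size: n_b * 2^(n_b + 1) * (2^n_b + 1) bit */
		double bits = (double)nb * 2.0 * (1ULL << nb) *
		              ((1ULL << nb) + 1.0);
		printf("n_b = %2d: %.3g bit = %.3g byte\n",
		       nb, bits, bits / 8);
	}
	/* n_b =  8: ~1.32e5 byte (~132 kB)
	 * n_b = 12: ~5.03e7 byte (~50 MB)
	 * n_b = 16: ~1.72e10 byte (~17 GB) */
	return 0;
}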
ucut() and uuni() for cutting and taking the union of two SORNs and uequ() and
usup() to check if two SORNs are equal and if one SORN is the superset of another.
The input and output functions play a special role in this toolbox. uint() is the only
function using floating point numbers to add a closed interval to a SORN and uout()
prints a SORN in a human-readable format to standard output.
When using the Unum toolbox, only the components unum.h and the static library
libunum.a are relevant and need to be present when compiling programs using the Unum
toolbox (see Section B.3). All functions are reëntrant and, thus, thread-safe.
4.3. Revisiting Floating-Point-Problems

4.3.1. The Silent Spike

We can express the spike function (2.2) within the Unum arithmetic, using a LUT-based
natural logarithm

LN : P(U(P)) → P(U(P))

defined as

U ↦ \begin{cases} ⟨\ln_F⟩(U) & U ∩ P((∞̆, 0]) = ∅ \\ ∅ & \text{else} \end{cases}

(see ulog() in Listing B.2.3) and an elementary Unum modulus function | · | (see uabs()
in Listing B.2.3), as
Figure 4.2.: Evaluation of the Unum spike function F (see (4.1)) on all Unums in [1/1.2, 1.9]
with (n_b, n_s) = (12, 2) (◦/• marks open/closed interval endpoints); see
Listing B.3.4.
4.3.2. Devil's Sequence

Running the Unum toolbox implementation (see Listing B.3.2) of this problem, we obtain

U_{25} = R^*.
This indicates the instability of the problem posed. Even though the information loss is
great, this result can at least be a warning to investigate the numerical behaviour of the
given sequence.
4.3.3. The Chaotic Bank Society

Again, running the Unum toolbox implementation (see Listing B.3.3), we obtain for
A_0 = bl({e − 1}) = (1.7, 1.8) the result

A_{25} = R^*.

This is consistent with the theoretical results we obtained, given that we can find an ε > 0
such that A_0 contains e − 1 + δ with δ ∈ (−ε, ε), as e − 1 ∉ P_D(2^{n_b−3} − 1, n_s).
We observe that, even though the results do not lie about the solution, the information
loss is great.
Concluding, introducing Unums as a number format allowing you to neglect stabil-
ity analysis has turned out to be a false promise. We can also not sustain the notion
that naïvely implementing algorithms in Unums abolishes the need for a break con-
dition. Besides complete information loss, sticking- and creeping-effects elaborated in
Subsection 4.4.2 additionally make it difficult to think of proper ways to do that.
4.4. Discussion
With the theoretical formulation of Unums and practical results, it is now time to discuss
the format taking into account the results obtained in the previous chapters.
4.4.1. Comparison to IEEE 754 Floating-Point Numbers

  n_b (bit)       |      8        |      16        |       32        |       64
  n_s             |      1        |       3        |        7        |       15
  |P_D|           | = 3.10·10^1   | ≈ 8.19·10^3    | ≈ 5.37·10^8     | ≈ 2.31·10^18
  |U_M|           | = 2.56·10^2   | ≈ 6.55·10^4    | ≈ 4.29·10^9     | ≈ 1.84·10^19
  max(P_D)        | = 5.00·10^3   | = 1.91·10^9    | ≈ 6.87·10^59    | ≈ 1.43·10^2562
  max(P_D)^{−1}   | = 2.00·10^{−4}| ≈ 5.24·10^{−10}| ≈ 1.45·10^{−60} | ≈ 6.99·10^{−2563}
  Size of LUTs    | ≈ 132 kB      | ≈ 17 GB        | ≈ 1.48·10^20 B  | ≈ 5.44·10^39 B

Table 4.1.: Machine Unum properties for n_b ∈ {8, 16, 32, 64} and n_s selected to match
IEEE 754 significant decimal digits (= ⌊log_10(2^{n_m+1})⌋) for each storage size.
Comparing Table 4.1 to Table 2.1, we note that for the same number of storage bits, the
dynamic range, i.e. the ratio of the largest and smallest representable numbers, of Unums is
orders of magnitude larger than that of IEEE 754 floating-point numbers. For example,
with a storage size of 16 bit, the dynamic range of Unums exceeds that of IEEE 754
floating-point numbers by roughly 6 orders of magnitude. The reason for this
significant difference is the fact that no bit patterns are wasted for NaN representations
in the Unum number format.
On the other hand, one can see that any value of n_b beyond roughly 12 bit (corresponding
to a LUT size of ≈ 50 MB) is not feasible given the huge size of the LUTs. It shows that
we can only really reason about machine Unum environments with n_b ∈ {3, …, 12}.
4.4.2. Sticking and Creeping

Example 4.25 (Euler's number). Determining Euler's number in the Unum arithmetic
can be done by defining a SORN series E_n of partial sums of \sum_k 1/k!, where

E_n := \left\langle\sum\right\rangle_{k=0}^{n} \Big/ \left(\left\langle\prod\right\rangle_{\ell=1}^{k} \mathrm{bl}(\{\ell\})\right),   (4.2)

i.e. the dual Unum operations of Definition 4.7 are iterated over the blurred integers.
Using the Unum toolbox (see Listing B.3.1), the partial sums of this problem are visualised
in Figure 4.3. The first 21 iterates are depicted and illustrate a pathological behaviour.
Starting from n = 3, the lower bound of the solution set is stuck at the value 2.6. One
can also observe that the upper bound is growing linearly on each iteration. It creeps
away from e and reduces the quality of the solution with each step.
The cause of these sticking- and creeping-effects is the fact that we add infinitesimally
small values to the SORN on each iteration. The lower bound gets stuck because the value
added is smaller than the length of the lowest interval, hitting a blind spot of the blur
function. The upper bound creeps away because even though we add an infinitesimally
small value, it expands to at least the next following Unum value.
Figure 4.3.: Evaluation of the Unum Euler partial sums (4.2) for iterations n ∈
{0, …, 20} with (n_b, n_s) = (12, 2) (◦/• marks open/closed interval endpoints);
see Listing B.3.1.
This problem makes it impossible to work with Unums to examine infinite series or
sequences and iterative problems in general. Even though Unums do not lie about the
solution, the quality of it is decreased on each iteration, as we could already see in
Subsections 4.3.2 and 4.3.3. There is also no chance of formulating a break condition
for the given algorithm because of this behaviour. We observe comparable problems for
finding break conditions for infinite series that do not converge quickly using floating-
point numbers, so we can generally think of it as an unsolved problem following from
the finite nature of the machine.
results.
This problem can be approached using a run-length encoding for SORNs comparable
to how LUTs were implemented (see Subsection 4.2.2), but this would make SORN
operations in general less efficient unless the operations take place directly on top of
Unum enumeration indices.
4.4.4. Complexity
Despite the efforts to simplify arithmetic operations and overhead by creating lookup
tables and working on bit strings in a simple manner, the cost of this simplification weighs
heavily. The contradiction lies in the fact that, to at least reduce the detrimental
effects of sticking and creeping, it is necessary to increase the number of Unum bits n_b.
However, this is only possible up to a certain point, before the LUTs become too large.
In this context, dealing with strictly monotonic functions like ln in the Unum context
requires LUTs for each of them as well (see Subsection 4.3.1).
It is questionable how useful the Unum arithmetic is within the tight bounds set by
these limiting factors. However, it should be taken into account that there are pos-
sible uses for Unums on very coarse grids, for instance inverse kinematics. Gustafson
also identifies the problem (see [Gus16b, Section 6]) and notes that this problem could
indicate that Unums are '[…] primarily practical for low-accuracy but high-validity
applications, thereby complementing float arithmetic instead of replacing it.' [Gus16b,
Section 6.2]
5. Summary and Outlook
In the course of this thesis we started off with the construction of a mathematical de-
scription of IEEE 754 floating-point numbers, compared the properties of different binary
storage formats and studied examples which uncover inherent weaknesses of this arith-
metic.
Following from these observations, we constructed the projectively extended real num-
bers based on a small set of axioms. After introducing a definition of finite and infinite
limits on the projectively extended real numbers, we showed their well-definedness in
terms of these limits. Based on this foundation, we developed the Flake arithmetic and
proved well-definedness in terms of set theory.
This effort led us to the mathematical foundation of Unums, which as proposed ap-
proaches the interval arithmetic dependency problem in a new way and is meant to
be easy to implement in the machine. We presented different types of Unum lattices,
evaluated the requirements for hardware implementations and studied the numerical be-
haviour in a Unum toolbox, which was developed in the course of this thesis. Using
these results, we were able to draw the conclusion that Unums may not be a number
format allowing naïve computations, but exhibited promising results in low-precision but
high-validity applications.
The author expected to find drawbacks of this nature for the Unum number format,
as any numerical system exhibits its strength only within certain conditions, making
it easy to find examples where it fails. In this context, it was observed that IEEE
754 floating-point numbers and Unums complement each other. Given the nature of
Unum arithmetic, it may be on the one hand difficult to do stability analysis due to the
complexity of the arithmetic rules, but on the other hand the guaranteed bounds of the
result do not cover up when an algorithm is not fit for this environment and indicate
the need to approach the problem using a different numerical approach.
At the time of writing, the revised Unum format had been given neither a mathematical
foundation nor a formalisation. The available toolboxes were only emulating Unums
using floating-point arithmetic, hiding numerous drawbacks with regard to the
complexity of lookup tables.
complexity of lookup tables. The results obtained in this thesis make it possible to reason
about Unums in the bounds that will also be present when it comes to implementing
Unums in hardware and not only in software.
In general it is questionable whether the approach of using lookup tables is really the best
way to go, despite the possible advantage of simplifying calculations, and whether it is
really worth it to throw the entire IEEE 754 floating-point infrastructure overboard and
maintain two mutually exclusive numerical systems.
The bl operator presented in this thesis corresponds to the rounding operation for
floating-point numbers to a certain extent. A topic for further research could be to
A. Notation Directory
A.1. Section 2: IEEE 754 Floating-Point Arithmetic
B. Code Listings
B.1. IEEE 754 Floating-Point Problems
B.1.1. spike.c
#include <float.h>
#include <math.h>
#include <stdio.h>
int
main(void)
{
double x[NUMPOINTS * 2 + 1][2];
int i;
putchar(’\n’);
return 0;
}
B.1.2. devil.c
#include <stdio.h>
int
main(void)
{
double a, b, tmp;
int i;
a = 2;
b = -4;
return 0;
}
B.1.3. bank.c
#include <float.h>
#include <math.h>
#include <stdio.h>
int
main(void)
{
double a;
int n;
a = 1.718281828459045235;
return 0;
}
B.1.4. Makefile
%: %.c
cc $^ -o $@ $(LDFLAGS)
clean:
rm -f $(PROBLEMS) $(LMPROBLEMS)
B.2. Unum Toolbox

B.2.1. gen.c

#include <fenv.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#undef LEN
struct _unum {
double val;
char *name;
};
struct _unumrange {
size_t low;
size_t upp;
};
struct _latticep {
char *name;
double val;
};
static void
printunums(struct _unum *unum, size_t numunums)
{
size_t i;
fputs("\nstruct␣_unum␣unums[]␣=␣{\n", stdout);
fputs("};\n", stdout);
}
size_t
blur(double val, struct _unum *unum, size_t numunums)
{
size_t i;
/* infinity is infinity */
if (isinf(val)) {
return numunums / 2;
}
/* in range */
if (isnan(unum[i].val) &&
val < unum[UCLAMP(i, +1)].val &&
val > unum[UCLAMP(i, -1)].val) {
return i;
}
}
void
add(size_t a, size_t b, struct _unumrange *res,
struct _unum *unum, size_t numunums)
{
double av, bv, aupp, alow, bupp, blow;
av = unum[a].val;
bv = unum[b].val;
if (isinf(av)) {
/* all extended real numbers */
res->low = 0;
res->upp = numunums - 1;
return;
} else {
fesetround(FE_DOWNWARD);
res->low = blur(av + blow, unum, numunums);
fesetround(FE_UPWARD);
res->upp = blur(av + bupp, unum, numunums);
}
} else if (!isnan(bv) && isnan(av)) {
/*
* a interval, b point
*/
add(b, a, res, unum, numunums);
return;
}
if (isnan(av) || isnan(bv)) {
/* we had an open interval in our calculation
* and need to check if res->upp or res->low
* are a point. If this is the case, we have
* to round it down to respect the openness
* of the real interval */
if (!isnan(unum[res->low].val)) {
res->low = UCLAMP(res->low, +1);
}
if (!isnan(unum[res->upp].val)) {
res->upp = UCLAMP(res->upp, -1);
}
}
}
void
mul(size_t a, size_t b, struct _unumrange *res,
struct _unum *unum, size_t numunums)
{
double av, bv, aupp, alow, bupp, blow;
av = unum[a].val;
bv = unum[b].val;
res->upp = numunums / 2 - 1;
} else {
/* all real numbers */
res->low = numunums / 2 + 1;
res->upp = numunums / 2 - 1;
return;
}
} else if (isinf(alow) && isinf(bupp)) {
if (aupp <= 0 && blow >= 0) {
/* (iffy, aupp * blow) */
res->low = numunums / 2 + 1;
fesetround(FE_UPWARD);
res->upp = blur(aupp * blow,
unum, numunums);
} else {
/* all real numbers */
res->low = numunums / 2 + 1;
res->upp = numunums / 2 - 1;
return;
}
} else if (isinf(aupp) && isinf(blow)) {
mul(b, a, res, unum, numunums);
return;
} else if (isinf(alow)) {
if (blow >= 0) {
/* (iffy, MAX(aupp * blow,
* aupp * bupp) */
res->low = numunums / 2 + 1;
fesetround(FE_UPWARD);
res->upp = blur(MAX(aupp * blow,
aupp * bupp),
unum, numunums);
} else if (bupp <= 0) {
/* (MIN(aupp * blow, aupp * bupp),
* iffy) */
fesetround(FE_DOWNWARD);
res->low = blur(MIN(aupp * blow,
aupp * bupp),
unum, numunums);
res->upp = numunums / 2 - 1;
} else {
/* all real numbers */
res->low = numunums / 2 + 1;
res->upp = numunums / 2 - 1;
return;
}
} else if (isinf(aupp)) {
if (blow >= 0) {
/* (MIN(alow * blow, aupp * bupp),
* iffy) */
fesetround(FE_DOWNWARD);
res->low = blur(MIN(alow * blow,
alow * bupp),
unum, numunums);
res->upp = numunums / 2 - 1;
} else if (bupp <= 0) {
/* (iffy, MAX(alow * blow,
* alow * bupp) */
res->low = numunums / 2 + 1;
fesetround(FE_UPWARD);
res->upp = blur(MAX(alow * blow,
alow * bupp),
unum, numunums);
} else {
/* all real numbers */
res->low = numunums / 2 + 1;
res->upp = numunums / 2 - 1;
return;
}
} else if (isinf(blow) || isinf(bupp)) {
mul(b, a, res, unum, numunums);
} else {
/* (MIN(C), MAX(C)) */
fesetround(FE_DOWNWARD);
res->low = blur(MIN(MIN(alow * blow,
alow * bupp),
MIN(aupp * blow,
aupp * bupp)),
unum, numunums);
fesetround(FE_UPWARD);
res->upp = blur(MAX(MAX(alow * blow,
alow * bupp),
MAX(aupp * blow,
aupp * bupp)),
unum, numunums);
}
} else if (!isnan(av) && !isnan(bv)) {
/*
* a point, b point
*/
if ((isinf(av) && (fabs(bv) <= DBL_EPSILON *
fabs(bv) || isinf(bv))) ||
(isinf(bv) && (fabs(av) <= DBL_EPSILON *
fabs(av) || isinf(av)))) {
/* all extended real numbers */
res->low = 0;
res->upp = numunums - 1;
return;
} else {
fesetround(FE_DOWNWARD);
res->low = blur(av * bv, unum, numunums);
fesetround(FE_UPWARD);
res->upp = blur(av * bv, unum, numunums);
}
} else if (!isnan(av) && isnan(bv)) {
/*
* a point, b interval
*/
bupp = unum[UCLAMP(b, +1)].val;
blow = unum[UCLAMP(b, -1)].val;
if (isinf(av)) {
if (isinf(blow)) {
if (bupp < 0) {
/* infinity */
res->low = numunums / 2;
res->upp = numunums / 2;
return;
} else {
/* all extended real
* numbers */
res->low = 0;
res->upp = numunums - 1;
return;
}
} else if (isinf(bupp)) {
if (blow > 0) {
/* infinity */
res->low = numunums / 2;
res->upp = numunums / 2;
return;
} else {
if (isnan(av) || isnan(bv)) {
/* we had an open interval in our calculation
* and need to check if res->upp or res->low
* are a point. If this is the case, we have
* to round it down to respect the openness
* of the real interval */
if (!isnan(unum[res->low].val)) {
static void
gentable(char *name, void (*f)(size_t, size_t, struct _unumrange *,
struct _unum *, size_t), struct _unum *unum, size_t numunums)
{
struct _unumrange res;
size_t s, z;
printf("\nstruct␣_unumrange␣%stable[]␣=␣{\n", name);
fputs("\n", stdout);
}
fputs("};\n", stdout);
}
void
ulog(size_t u, struct _unumrange *res, struct _unum *unum,
size_t numunums)
{
double uv, ulow, uupp;
uv = unum[u].val;
if (isnan(uv)) {
ulow = unum[UCLAMP(u, -1)].val;
uupp = unum[UCLAMP(u, +1)].val;
static void
genfunctable(char *name, void (*f)(size_t, struct _unumrange *,
struct _unum *, size_t), struct _unum *unum, size_t numunums)
{
struct _unumrange res;
size_t u;
printf("\nstruct␣_unumrange␣%stable[]␣=␣{\n", name);
fputs("};\n", stdout);
}
static void
genunums(struct _latticep *lattice, size_t latticesize,
struct _unum *unum, size_t numunums)
{
size_t off;
ssize_t i;
off = 0;
/* 0 */
unum[off].val = 0.0;
unum[off].name = "0";
off++;
unum[off].val = NAN;
unum[off].name = NULL;
off++;
/* (0,1) */
for (i = latticesize - 1; i >= 0; i--, off++) {
unum[off].val = 1 / lattice[i].val;
if (lattice[i].name[0] == ’/’) {
unum[off].name = lattice[i].name + 1;
} else {
/* add ’/’ prefix */
if (!(unum[off].name =
malloc(strlen(lattice[i].name) + 2))) {
fprintf(stderr, "out␣of␣memory\n");
exit(1);
}
strcpy(unum[off].name + 1, lattice[i].name);
unum[off].name[0] = ’/’;
}
off++;
unum[off].val = NAN;
unum[off].name = NULL;
}
/* 1 */
unum[off].val = 1.0;
unum[off].name = "1";
off++;
unum[off].val = NAN;
unum[off].name = NULL;
off++;
/* (1,INF) */
for (i = 0; i < latticesize; i++, off++) {
unum[off].val = lattice[i].val;
unum[off].name = lattice[i].name;
off++;
unum[off].val = NAN;
unum[off].name = NULL;
}
/* INF */
unum[off].val = INFINITY;
unum[off].name = "\u221E";
off++;
unum[off].val = NAN;
unum[off].name = NULL;
off++;
/* (INF,-1) */
for (i = latticesize - 1; i >= 0; i--, off++) {
unum[off].val = -lattice[i].val;
if (!(unum[off].name =
malloc(strlen(lattice[i].name) + 2))) {
fprintf(stderr, "out␣of␣memory\n");
exit(1);
}
strcpy(unum[off].name + 1, lattice[i].name);
unum[off].name[0] = '-';
off++;
unum[off].val = NAN;
unum[off].name = NULL;
}
/* -1 */
unum[off].val = -1.0;
unum[off].name = "-1";
off++;
unum[off].val = NAN;
unum[off].name = NULL;
off++;
/* (-1, 0) */
for (i = 0; i < latticesize; i++, off++) {
unum[off].val = -1 / lattice[i].val;
if (lattice[i].name[0] == '/') {
if (!(unum[off].name =
strdup(lattice[i].name))) {
fprintf(stderr, "out of memory\n");
exit(1);
}
unum[off].name[0] = '-';
} else {
/* add '-/' prefix */
if (!(unum[off].name =
malloc(strlen(lattice[i].name) + 3))) {
fprintf(stderr, "out of memory\n");
exit(1);
}
strcpy(unum[off].name + 2, lattice[i].name);
unum[off].name[0] = '-';
unum[off].name[1] = '/';
}
off++;
unum[off].val = NAN;
unum[off].name = NULL;
}
}
void
gendeclattice(struct _latticep **lattice, size_t *latticesize,
double maximum, int sigdigs)
{
size_t i, maxlen;
double c1, c2, curmax;
char *fmt = "%.*f";
/*
* Check prerequisites
*/
if (sigdigs == 0) {
fprintf(stderr, "invalid number of "
"significant digits\n");
exit(1);
}
if ((*latticesize == 0) == isinf(maximum)) {
fprintf(stderr, "gendeclattice: accepting "
"only one parameter besides number of "
"significant digits\n");
exit(1);
}
if (*latticesize == 0) {
/* calculate lattice size until maximum is
* contained */
for (curmax = 0; curmax < maximum; (*latticesize)++) {
curmax = (1 + c2 *
(*latticesize % (size_t)c1)) *
pow(10, floor(*latticesize / c1));
}
} else { /* isinf(maximum) */
/* calculate maximum */
maximum = (1 + c2 *
(*latticesize % (size_t)c1)) *
pow(10, floor(*latticesize / c1));
}
/*
* Generate lattice
*/
if (!(*lattice = malloc(sizeof(struct _latticep) *
*latticesize))) {
fprintf(stderr, "out␣of␣memory\n");
exit(1);
}
maxlen = snprintf(NULL, 0, fmt, sigdigs - 1, maximum) + 1;
for (i = 0; i < *latticesize; i++) {
(*lattice)[i].val = (1 + c2 *
((i + 1) % (size_t)c1)) *
pow(10, floor((i + 1) / c1));
if (!((*lattice)[i].name = malloc(maxlen))) {
fprintf(stderr, "out␣of␣memory\n");
exit(1);
}
snprintf((*lattice)[i].name, maxlen, fmt,
sigdigs - 1, (*lattice)[i].val);
}
}
int
main(void)
{
struct _unum *unum;
struct _latticep *lattice;
size_t latticebits, latticesize, numunums;
ssize_t i;
int bits;
/* Generate lattice */
latticesize = (1 << (UBITS - 3)) - 1;
gendeclattice(&lattice, &latticesize, INFINITY, DIGITS);
/*
* Print unum.h includes
*/
fprintf(stderr, "#include␣<math.h>\n#include␣<stddef.h>\n"
"#include␣<stdint.h>\n\n");
/*
* Determine number of effective bits used
*/
struct {
int bits;
char *type;
} types[] = {
{ 8, "uint8_t" },
{ 16, "uint16_t" },
{ 32, "uint32_t" },
{ 64, "uint64_t" },
};
/*
* Determine type needed to store the unum
*/
for (i = 0; i < LEN(types); i++) {
if (types[i].bits >= bits)
break;
}
if (i == LEN(types)) {
fprintf(stderr, "cannot␣fit␣bits␣into␣system"
"types\n");
return 1;
}
/*
* Print list of preliminary unum.h definitions
*/
fprintf(stderr, "typedef␣%s␣unum;\n#define␣ULEN␣%d\n"
"#define␣NUMUNUMS␣%zd\n", types[i].type, bits,
numunums);
fprintf(stderr, "#define␣UCLAMP(i,␣off)␣(((((off␣<␣0)␣&&␣(i)␣<"
"-off)␣?␣\\\n\tNUMUNUMS␣-␣((-off␣-␣(i))␣%%␣NUMUNUMS)␣:"
"\\\n\t((off␣>␣0)␣&&␣(i)␣+␣off␣>␣NUMUNUMS␣-␣1)␣?␣\\\n\t"
"((i)␣+␣off␣%%␣NUMUNUMS)␣%%␣NUMUNUMS␣:␣(i)␣+␣off))␣%%␣"
"NUMUNUMS)\n\n");
fprintf(stderr, "typedef␣struct␣{\n\tuint8_t␣data[%d];\n}␣SORN;\n",
(1 << bits) / 8);
fprintf(stderr, "\nvoid␣uadd(SORN␣*,␣SORN␣*);\n"
"void␣usub(SORN␣*,␣SORN␣*);\n"
"void␣umul(SORN␣*,␣SORN␣*);\n"
"void␣udiv(SORN␣*,␣SORN␣*);\n"
"void␣uneg(SORN␣*);\n"
"void␣uinv(SORN␣*);\n"
"void␣uabs(SORN␣*);\n\n"
"void␣ulog(SORN␣*);\n\n"
"void␣uemp(SORN␣*);\n"
"void␣uset(SORN␣*,␣SORN␣*);\n"
"void␣ucut(SORN␣*,␣SORN␣*);\n"
"void␣uuni(SORN␣*,␣SORN␣*);\n"
"int␣uequ(SORN␣*,␣SORN␣*);\n"
"int␣usup(SORN␣*,␣SORN␣*);\n\n"
"void␣uint(SORN␣*,␣double,␣double);\n"
"void␣uout(SORN␣*);\n");
/*
* Generate unums
*/
if (!(unum = malloc(sizeof(struct _unum) * numunums))) {
fprintf(stderr, "out␣of␣memory\n");
return 1;
}
genunums(lattice, latticesize, unum, numunums);
/*
* Print table.c includes
*/
printf("#include␣\"table.h\"\n");
/*
* Print list of unums
*/
printunums(unum, numunums);
/*
* Generate and print tables
*/
gentable("add", add, unum, numunums);
gentable("mul", mul, unum, numunums);
/*
* Generate function tables
*/
genfunctable("log", ulog, unum, numunums);
return 0;
}
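The problem programs in Section B.3 below exercise the interface that gen emits into unum.h. As a minimal illustration (a sketch that is not part of the toolbox itself, assuming libunum.a has been built as described in Section B.2.5), evaluating 3 * (4 - 2) with the generated SORN operations could look as follows; as in the B.3 listings, the first argument of each binary operation serves both as left operand and as destination.
#include <stdio.h>
#include "unum.h"
int
main(void)
{
	SORN x, tmp;
	uemp(&x);           /* clear the SORN */
	uint(&x, 4, 4);     /* x = [4, 4] */
	uemp(&tmp);
	uint(&tmp, 2, 2);
	usub(&x, &tmp);     /* x = 4 - 2 */
	uemp(&tmp);
	uint(&tmp, 3, 3);
	umul(&x, &tmp);     /* x = 3 * (4 - 2) */
	uout(&x);           /* print the resulting set of unums */
	putchar('\n');
	return 0;
}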
B.2.2. table.h
#include "unum.h"
struct _unumrange {
unum a;
unum b;
};
struct _unum {
double val;
char *name;
};
B.2.3. unum.c
#include <float.h>
#include <math.h>
#include <stdio.h>
#include "table.h"
#undef MAX
#define MAX(x,y) ((x) > (y) ? (x) : (y))
static size_t
blur(double val)
{
size_t i;
/* infinity is infinity */
if (isinf(val)) {
return NUMUNUMS / 2;
}
/* in range */
if (isnan(unums[i].val) &&
val < unums[UCLAMP(i, +1)].val &&
val > unums[UCLAMP(i, -1)].val) {
return i;
}
}
static void
_sornaddrange(SORN *s, unum lower, unum upper)
{
unum u;
size_t i, j;
int first;
static unum
_unumnegate(unum u)
{
return UCLAMP(NUMUNUMS, -u);
}
static unum
_unuminvert(unum u)
{
return _unumnegate(UCLAMP(u, +(NUMUNUMS / 2)));
}
static unum
_unumabs(unum u)
{
return (u > NUMUNUMS / 2) ? _unumnegate(u) : u;
}
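/*
 * Illustration of the index arithmetic above: genunums lays the unums
 * out on a circle with 0 at index 0, 1 at index NUMUNUMS / 4, infinity
 * at index NUMUNUMS / 2 and -1 at index 3 * NUMUNUMS / 4. _unumnegate
 * mirrors an index across the 0-infinity axis, i.e. u maps to
 * (NUMUNUMS - u) % NUMUNUMS. _unuminvert first rotates by half the
 * circle (exchanging the roles of 0 and infinity) and then mirrors,
 * which sends the unum of x to the unum of 1/x; in particular 0 and
 * infinity are swapped and 1 and -1 map to themselves. _unumabs leaves
 * the non-negative half (indices up to NUMUNUMS / 2) untouched and
 * negates the rest.
 */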
static void
_sornop(SORN *a, SORN *b, struct _unumrange table[],
unum (*mod)(unum))
{
unum u, v, low, upp;
size_t i, j, m, n;
static SORN res;
/*
* compare struct pointers to
* identify dependent arguments
* and in this case only do
* pairwise operations
*/
if (a == b && u != v)
continue;
2 + v].b;
}
_sornaddrange(&res, low, upp);
}
}
}
}
static void
_sornmod(SORN *s, unum (mod)(unum))
{
SORN res;
unum u;
size_t i, j, k, l;
k = u / (sizeof(*s->data) * 8);
l = u % (sizeof(*s->data) * 8);
res.data[k] |= (1 << (sizeof(*res.data) *
8 - 1 - l));
}
}
void
uadd(SORN *a, SORN *b)
{
_sornop(a, b, addtable, NULL);
}
void
usub(SORN *a, SORN *b)
{
_sornop(a, b, addtable, _unumnegate);
}
void
umul(SORN *a, SORN *b)
{
_sornop(a, b, multable, NULL);
}
void
udiv(SORN *a, SORN *b)
{
_sornop(a, b, multable, _unuminvert);
}
void
uneg(SORN *s)
{
_sornmod(s, _unumnegate);
}
void
uinv(SORN *s)
{
_sornmod(s, _unuminvert);
}
void
uabs(SORN *s)
{
_sornmod(s, _unumabs);
}
void
ulog(SORN *s)
{
unum u;
size_t i, j;
static SORN res;
void
uemp(SORN *s)
{
size_t i;
void
uset(SORN *a, SORN *b)
{
size_t i;
void
ucut(SORN *a, SORN *b)
{
size_t i;
void
uuni(SORN *a, SORN *b)
{
size_t i;
int
uequ(SORN *a, SORN *b)
{
size_t i;
if (a->data[i] != b->data[i]) {
return 0;
}
}
return 1;
}
int
usup(SORN *a, SORN *b)
{
ucut(a, b);
void
uint(SORN *s, double lower, double upper)
{
_sornaddrange(s, blur(lower), blur(upper));
}
void
uout(SORN *s)
{
unum loopstart, u;
size_t i, j;
int active, insorn, loop2run;
loop2run = 0;
for (active = 0, i = sizeof(s->data) / 2; i < sizeof(s->data);
i++) {
loop1start:
for (j = 0; j < sizeof(*s->data) * 8; j++) {
u = sizeof(*s->data) * 8 * i + j;
insorn = s->data[i] & (1 << (sizeof(*s->data) *
8 - 1 - j));
if (!active && insorn) {
/* print the opening of a closed
* subset */
active = 1;
if (unums[u].name) {
printf("[%s,", unums[u].name);
} else {
printf("(%s,", unums[UCLAMP(u,
-1)].name);
}
} else if (active && !insorn) {
/* print the closing of a closed
* subset */
active = 0;
if (unums[UCLAMP(u, -1)].name) {
printf("%s]␣", unums[UCLAMP(u,
-1)].name);
} else {
printf("%s)␣", unums[u].name);
}
}
}
if (loop2run) {
goto loop2end;
}
}
loop2run = 1;
for (i = 0; i < sizeof(s->data) / 2; i++) {
goto loop1start;
loop2end:
;
}
if (active) {
printf("\u221E)");
}
}
B.2.4. config.mk
UBITS = 12
DIGITS = 2
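A worked check of these defaults against gen.c above (not spelled out in the listings themselves): with UBITS = 12, main sets latticesize = (1 << (UBITS - 3)) - 1 = 2^9 - 1 = 511, and genunums then fills 8 * 511 + 8 = 4096 = 2^UBITS unums, namely 2048 exact unums (0, 1, -1, infinity and the 4 * 511 signed lattice points and their reciprocals) plus the 2048 open intervals between consecutive exact unums. With 12 effective bits the type loop in main selects uint16_t as the unum type, and each SORN bit vector occupies (1 << 12) / 8 = 512 bytes. DIGITS = 2 is handed to gendeclattice as the number of significant digits of the decade lattice.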
B.2.5. Makefile
include config.mk
all: libunum.a
unum.o: unum.c
cc -c unum.c -lm
table.o: table.c
cc -c table.c
table.c: gen
./gen 2> unum.h 1> table.c
%: %.c libunum.a
cc $^ -o $@
clean:
rm -f gen table.c unum.h table.o unum.o libunum.a
B.3. Unum Problems
B.3.1. euler.c
#include <stdio.h>
#include "unum.h"
void
factorial(SORN *s, int f)
{
SORN tmp;
int i;
uemp(s);
uint(s, 1, 1);
int
main(void)
{
SORN e, tmp;
int i;
uemp(&e);
uint(&e, 1, 1);
uout(&e);
putchar('\n');
return 0;
}
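The loop elided from main above presumably accumulates the partial sums of the exponential series, using factorial to build 1/k! as a SORN, so that the printed unum set encloses e = sum over k of 1/k!.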
B.3.2. devil.c
#include <stdio.h>
#include "unum.h"
int
main(void)
{
SORN a, b, c, tmp1, tmp2, tmp3;
int n;
uemp(&a);
uemp(&b);
uemp(&c);
uint(&a, 2, 2);
uint(&b, -4, -4);
uemp(&tmp1);
uint(&tmp1, 111, 111);
uemp(&tmp2);
uint(&tmp2, 1130, 1130);
udiv(&tmp2, &b);
usub(&tmp1, &tmp2);
uemp(&tmp2);
uint(&tmp2, 3000, 3000);
uset(&tmp3, &b);
umul(&tmp3, &a);
udiv(&tmp2, &tmp3);
uadd(&tmp1, &tmp2);
uset(&c, &tmp1);
printf("U_%d␣=␣", n);
uout(&c);
putchar(’\n’);
}
return 0;
}
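The loop body shown above evaluates one step of the devil's sequence from Section 4.3.2, u_{n+1} = 111 - 1130/u_n + 3000/(u_n * u_{n-1}) with u_0 = 2 and u_1 = -4, where b holds u_n and a holds u_{n-1}.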
B.3.3. bank.c
#include <math.h>
#include <stdio.h>
#include "unum.h"
int
main(void)
{
SORN a, tmp;
int y;
uemp(&a);
uint(&a, M_E - 1, M_E - 1);
uemp(&tmp);
uint(&tmp, 1, 1);
usub(&a, &tmp);
printf("year␣%2d:␣", y);
uout(&a);
putchar(’\n’);
}
return 0;
}
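Starting from a balance of e - 1, the elided loop body presumably implements the chaotic-bank recurrence from Section 4.3.3, a_y = y * a_{y-1} - 1, of which only the subtraction of 1 is visible in the listing above.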
B.3.4. spike.c
#include <stdio.h>
#include "unum.h"
int
main(void)
{
SORN res, tmp;
size_t i, j;
unum pole, u;
/* calculate F(res) */
uneg(&res);
uemp(&tmp);
uint(&tmp, 1, 1);
uadd(&res, &tmp);
uemp(&tmp);
uint(&tmp, 3, 3);
umul(&res, &tmp);
uemp(&tmp);
uint(&tmp, 1, 1);
uadd(&res, &tmp);
uabs(&res);
ulog(&res);
uout(&res);
putchar('\n');
return 0;
}
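The chain of operations under the comment "calculate F(res)" evaluates the silent-spike function from Section 4.3.1, F(x) = log|3(1 - x) + 1|, on the SORN res; the argument of the logarithm vanishes at x = 4/3, the spike that pointwise floating-point sampling misses.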
B.3.5. Makefile
all: $(PROBLEMS)
%: %.c
cc -o $@ $^ libunum.a
clean:
rm -rf $(PROBLEMS)
B.4. License
The following ISC license applies to all code listings in Chapter B.
Copyright © 2016, Laslo Hunhold
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Declaration of Authorship
I hereby confirm that I have written this thesis independently and have used no aids other than those stated.
Passages of this thesis that are taken from other works, either verbatim or in essence, have been marked as such with a reference to their source.
Laslo Hunhold