Floating Point Algorithmic Math Package User's Guide
By David Bishop ([email protected])
The floating point algorithmic math package is a high-level math package, designed to
mimic the functions found in a floating point unit (FPU). It provides several
“calculator” math functions and serves as a floating point counterpart to the
math_real package. Many of these functions can be computed in several ways; a few of
those approaches are implemented here.
The VHDL floating point algorithmic math packages can be downloaded at:
http://www.vhdl.org/fphdl/float_alg_pkg.zip
Dependencies:
“float_alg_pkg” depends on the VHDL-2008 packages “numeric_std”, “float_pkg”,
“fixed_pkg”, and “math_real”, as well as “fixed_alg_pkg”. The VHDL-1993
version of “float_alg_pkg” depends on the “IEEE_PROPOSED” library,
which can be downloaded at http://www.vhdl.org/fphdl/vhdl2008c.zip. It also
depends on “math_real” and “fixed_alg_pkg”.
Overview
Why two versions, you ask? First, as a check: to verify that the correct result is being
returned, basically to test the testbench. Second, these algorithms can sometimes take a
long time to run, so in batch simulations it may be faster to use the real math version.
The results do not match exactly, but they are very close.
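Because the two versions agree only approximately, a testbench comparison needs a tolerance rather than an exact match. The following is a minimal sketch of such a check, assuming the VHDL-2008 "float_pkg" and "math_real" from library IEEE (VHDL-1993 users would take "float_pkg" from IEEE_PROPOSED instead). The package name, the function name "is_close" and the default tolerance are hypothetical and are not part of "float_alg_pkg".

```vhdl
library ieee;
use ieee.math_real.all;
use ieee.float_pkg.all;

package float_check_sketch is
  -- Hypothetical helper, not part of float_alg_pkg.
  function is_close (
    actual   : float;             -- value produced by the algorithmic package
    expected : real;              -- reference value computed with math_real
    rel_tol  : real := 1.0e-5)    -- assumed tolerance; tune to the format used
    return boolean;
end package float_check_sketch;

package body float_check_sketch is
  function is_close (
    actual   : float;
    expected : real;
    rel_tol  : real := 1.0e-5)
    return boolean is
    variable limit : real;
  begin
    -- Scale the tolerance by the reference value, with a floor of 1.0 so that
    -- references at or near zero still get a usable (absolute) bound.
    limit := rel_tol * realmax(abs(expected), 1.0);
    return abs(to_real(actual) - expected) <= limit;
  end function is_close;
end package body float_check_sketch;
```

In a testbench, "actual" would be the value returned by one of the functions in this package and "expected" the matching math_real result, checked with an assert statement.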
This package uses series expansions to compute values. No pipelining is done, so you
need to rely on automatic pipelining, insert your own pipeline stages, or live with the
long delays in a multicycle clocking scheme.
These algorithms are not exhaustively debugged. Please e-mail me if you find a bug.
Use at your own risk...
Index:
Operators:
Functions:
Precision - This function rounds the input to a given number of binary bits. It is very
useful to compare results to see if they are close to the predicted values. I hope to move
this function into "float_pkg" in the next release of VHDL.
inputs:
arg : float
places : Natural number (starting at 1)
round_style : see floating point package documentation
result will be the same size as the input argument.
Example:
variable x, y : float32;
Floor - This function rounds the argument down to the nearest integer, similar to the floor
function in C. I hope to move this function into "float_pkg" in the next release of VHDL.
inputs:
arg: float
result will be the same size as the input argument.
Ceil - This function rounds the argument up to the nearest integer, similar to the ceil
function in C. I hope to move this function into "float_pkg" in the next release of VHDL.
inputs:
arg: float
result will be the same size as the input argument.
nr_divide - This function performs a Newton-Raphson divide (using a loop, with no real
division involved). It works by running the Newton-Raphson algorithm to find the
reciprocal of the "r" input, then multiplying that by the "l" input. Yes, it has accuracy
issues. Found only in the “float_alg_pkg”.
inputs:
l, r : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations : Number of times to go through the loop (see nr_reciprocal).
Result will be the same size floating point number as the input.
nr_reciprocal - This function does a Newton-Raphson reciprocal. The algorithm used is:
c1 = c0 * (2 - c0*arg)
looping until a given accuracy is achieved. The loop is shortened by choosing a good
seed, or "c0": the argument is shifted so that the seed already has the correct power of 2
for the result, and then the loop is run. For anything less than 10 bits, 3 loops will
work. 6 loops will give an accurate result for a 50 bit number. Thus there will be a
trade-off as to which division algorithm is best.
arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations : Number of times to go through the loop.
If the number of iterations is set to "0", the function picks a value based on the length
of the number fed into the algorithm. A value of 6 will guarantee accuracy, at the
expense of logic.
Result will be the same size floating point number as the input.
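The iteration above can be tried out in ordinary "real" arithmetic before committing to hardware. The following is a minimal sketch using only "math_real". It is not the "float_alg_pkg" source; the package and function names are hypothetical, and the seed (a power of two taken from log2 of the argument) is an assumed stand-in for the shift described above.

```vhdl
library ieee;
use ieee.math_real.all;

package nr_sketch is
  -- Hypothetical names; "real" model only, not the float_alg_pkg source.
  function nr_reciprocal_real (arg : real; iterations : positive := 6) return real;
  function nr_divide_real (l, r : real; iterations : positive := 6) return real;
end package nr_sketch;

package body nr_sketch is
  function nr_reciprocal_real (arg : real; iterations : positive := 6) return real is
    variable mag : real := abs(arg);
    variable e   : integer;
    variable c   : real;
  begin
    assert arg /= 0.0 report "nr_reciprocal_real: argument is zero" severity failure;
    -- Seed: |arg| lies in [2**e, 2**(e+1)), so 2**(-e-1) is within a factor
    -- of two of the true reciprocal (c*mag starts in [0.5, 1.0)).
    e := integer(floor(log2(mag)));
    c := 2.0 ** (-e - 1);
    -- Newton-Raphson step from the text: c1 = c0 * (2 - c0*arg).
    for i in 1 to iterations loop
      c := c * (2.0 - c * mag);
    end loop;
    return sign(arg) * c;
  end function nr_reciprocal_real;

  function nr_divide_real (l, r : real; iterations : positive := 6) return real is
  begin
    -- Divide = left operand times the reciprocal of the right operand.
    return l * nr_reciprocal_real(r, iterations);
  end function nr_divide_real;
end package body nr_sketch;
```

In this sketch the relative error squares on every pass. With a starting error of at most 0.5, three passes leave at most 2**(-8) and six passes leave at most 2**(-64), which is in line with the loop counts quoted above.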
Log - This function takes two arguments and returns the logarithm of the first argument,
using the second argument as the base. The algorithm used is:
log_base(arg) = ln(arg) / ln(base), so it isn't very efficient.
arg : float (must be a positive number)
base : float (must be a positive number)
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations : Number of times to go through the loop, passed to "ln" function
Result will be the same size floating point number as the input.
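For a reference value in a testbench, the same change-of-base identity can be written with "math_real". This is a minimal sketch; the package and function names are hypothetical.

```vhdl
library ieee;
use ieee.math_real.all;

package log_sketch is
  -- Hypothetical name; "real" reference model only.
  function log_base_real (arg, base : real) return real;
end package log_sketch;

package body log_sketch is
  function log_base_real (arg, base : real) return real is
  begin
    assert arg > 0.0 and base > 0.0
      report "log_base_real: arguments must be positive" severity failure;
    -- math_real's one-argument log() is the natural logarithm.
    return log(arg) / log(base);
  end function log_base_real;
end package body log_sketch;
```

The cost is visible here: two natural logs and one divide for every call.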
power_of - This function computes l^r. This is done via a log-based operation, which
involves some division. Loop iterations are calculated from the length of the output.
l : float
r : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
Result will be of the same size and type as the left or "l" input.
ln - Computes the natural log of the argument. The series used here is:
TERM = (x-1)/(x+1), ln(x) = 2*(TERM + (1/3)*TERM**3 + (1/5)*TERM**5 + ...)
There is one divide, at the beginning of the routine.
arg : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations : Number of times to go through the loop. You need about
1 iteration for every 4 bits of result. If "0" is passed as the
number of iterations, then iterations is arg'length/4.
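The series can be checked in plain "real" arithmetic. The following is a minimal sketch with hypothetical names. It applies the series directly to the argument, so it converges quickly only for arguments reasonably close to 1; any range reduction the package performs is not modelled here.

```vhdl
package ln_sketch is
  -- Hypothetical name; "real" model of the series only.
  function ln_series_real (arg : real; iterations : positive := 8) return real;
end package ln_sketch;

package body ln_sketch is
  function ln_series_real (arg : real; iterations : positive := 8) return real is
    variable term  : real;
    variable term2 : real;
    variable pwr   : real;
    variable sum   : real := 0.0;
  begin
    assert arg > 0.0 report "ln_series_real: argument must be positive" severity failure;
    term  := (arg - 1.0) / (arg + 1.0);  -- the one divide, done up front
    term2 := term * term;
    pwr   := term;
    for k in 0 to iterations - 1 loop
      -- Adds TERM**(2k+1) / (2k+1); the factor 1/(2k+1) could be a
      -- precomputed constant rather than a divide.
      sum := sum + pwr / real(2 * k + 1);
      pwr := pwr * term2;
    end loop;
    return 2.0 * sum;
  end function ln_series_real;
end package body ln_sketch;
```

For an argument of 2.0, TERM is 1/3, so each extra pass shrinks the remaining error by roughly a factor of 9.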
Trig Functions
result will be the same size as the "arg" input, and will saturate as
the number goes to infinity.
arctan - Performs an "arctan" function (the angle whose tangent is the argument). This one
is a bit complicated. After several trials with standard equations I came up with:
arctan (x) = PI/2 - arctan(1/x)
which only works for small angles, so I use a "half_angle" formula to compute arguments
larger than sqrt(2)/4.
arg : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: one iteration for every 6 bits of precision.
result will be the same size as the input, and returned in radians.
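One way to combine a reciprocal identity with a half-angle reduction is sketched below in "real" arithmetic. This is an illustration built on assumptions and not the package's implementation: the half-angle identity arctan(x) = 2*arctan(x / (1 + sqrt(1 + x*x))), the alternating Taylor series, the order of the reduction steps, and all names are choices made for this sketch. Only the sqrt(2)/4 threshold is taken from the text above.

```vhdl
library ieee;
use ieee.math_real.all;

package arctan_sketch is
  -- Hypothetical name; assumed reduction steps, see the lead-in.
  function arctan_sketch_real (arg : real; iterations : positive := 8) return real;
end package arctan_sketch;

package body arctan_sketch is
  function arctan_sketch_real (arg : real; iterations : positive := 8) return real is
    variable x      : real    := abs(arg);
    variable invert : boolean := false;
    variable halved : boolean := false;
    variable x2     : real;
    variable pwr    : real;
    variable sum    : real    := 0.0;
  begin
    -- Step 1: arctan(x) = PI/2 - arctan(1/x) brings x down to 1.0 or less.
    if x > 1.0 then
      x      := 1.0 / x;
      invert := true;
    end if;
    -- Step 2 (assumed identity): halve the angle for larger arguments.
    if x > sqrt(2.0) / 4.0 then
      x      := x / (1.0 + sqrt(1.0 + x * x));
      halved := true;
    end if;
    -- Step 3: Taylor series x - x**3/3 + x**5/5 - ...
    x2  := x * x;
    pwr := x;
    for k in 0 to iterations - 1 loop
      if k mod 2 = 0 then
        sum := sum + pwr / real(2 * k + 1);
      else
        sum := sum - pwr / real(2 * k + 1);
      end if;
      pwr := pwr * x2;
    end loop;
    -- Undo the reductions in reverse order, then restore the sign.
    if halved then sum := 2.0 * sum; end if;
    if invert then sum := MATH_PI_OVER_2 - sum; end if;
    if arg < 0.0 then sum := -sum; end if;
    return sum;
  end function arctan_sketch_real;
end package body arctan_sketch;
```

The result is in radians, like the package function, and can be compared against math_real's arctan in a testbench.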
sinh - Hyperbolic sine function. This one also took a few trials.
The most efficient function I could come up with was:
sinh(x) = (e^x - e^(-x))/2, which uses the very efficient "exp" function.
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: passed to the "exp" function.
result will be the same size as the input.
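The identity is easy to reproduce as a "real" reference model with math_real's exp. This is a minimal sketch with hypothetical names.

```vhdl
library ieee;
use ieee.math_real.all;

package sinh_sketch is
  -- Hypothetical name; "real" reference model only.
  function sinh_via_exp_real (arg : real) return real;
end package sinh_sketch;

package body sinh_sketch is
  function sinh_via_exp_real (arg : real) return real is
  begin
    -- sinh(x) = (e^x - e^(-x)) / 2
    return (exp(arg) - exp(-arg)) / 2.0;
  end function sinh_via_exp_real;
end package body sinh_sketch;
```

For arguments very close to zero the two exponentials are nearly equal, so the subtraction loses significant bits; that is a property of the formula and worth allowing for when choosing a comparison tolerance.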
Arcsinh – Inverse hyperbolic sine: finds the value whose hyperbolic sine is the argument.
Bounds on the input are -inf to +inf; mathematically the output spans the same range.
Inputs:
Arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: Number of times to go through the loop, passed to the “ln” function
result will be the same size as the input.
Arccosh – Inverse hyperbolic cosine: finds the value whose hyperbolic cosine is the
argument. Bounds on the input are 1.0 to +inf; mathematically the output spans 0 to +inf.
Inputs:
Arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: Number of times to go through the loop, passed to the “ln” function
result will be the same size as the input.
Arctanh – Inverse hyperbolic tangent: finds the value whose hyperbolic tangent is the
argument. Bounds on the input are -1.0 to +1.0; mathematically the output spans -inf to +inf.
Inputs:
Arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: Number of times to go through the loop, passed to the “ln” function
result will be the same size as the input.
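Since all three inverse hyperbolic functions pass their iteration count to "ln", each can be modelled as a single natural log. The sketch below uses the standard logarithmic identities in "real" arithmetic; whether the package uses exactly these forms is an assumption, and the names are hypothetical (math_real already provides arcsinh, arccosh and arctanh, which make convenient cross-checks).

```vhdl
library ieee;
use ieee.math_real.all;

package inv_hyp_sketch is
  -- Hypothetical names; "real" models of the assumed ln-based identities.
  function arcsinh_via_ln (x : real) return real;
  function arccosh_via_ln (x : real) return real;
  function arctanh_via_ln (x : real) return real;
end package inv_hyp_sketch;

package body inv_hyp_sketch is
  function arcsinh_via_ln (x : real) return real is
  begin
    -- arcsinh(x) = ln(x + sqrt(x^2 + 1)), valid for every real x
    return log(x + sqrt(x * x + 1.0));
  end function arcsinh_via_ln;

  function arccosh_via_ln (x : real) return real is
  begin
    assert x >= 1.0 report "arccosh_via_ln: argument below 1.0" severity failure;
    -- arccosh(x) = ln(x + sqrt(x^2 - 1))
    return log(x + sqrt(x * x - 1.0));
  end function arccosh_via_ln;

  function arctanh_via_ln (x : real) return real is
  begin
    assert abs(x) < 1.0 report "arctanh_via_ln: argument outside (-1, 1)" severity failure;
    -- arctanh(x) = ln((1 + x) / (1 - x)) / 2
    return 0.5 * log((1.0 + x) / (1.0 - x));
  end function arctanh_via_ln;
end package body inv_hyp_sketch;
```

Note that the arcsinh form subtracts two nearly equal numbers for large negative arguments, so a reference model may prefer to evaluate it on abs(x) and restore the sign afterwards.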