
Floating point algorithmic math package user’s guide
By David Bishop ([email protected])
The floating point algorithmic math package was designed to be a high-level math
package that mimics the functions of a floating point unit (FPU). It provides several
“calculator” math functions and was designed as a floating point counterpart of the
math_real package. Many of these functions can be computed in several ways, and
several of those methods are represented here.

The VHDL floating point algorithmic math packages can be downloaded at:
http://www.vhdl.org/fphdl/float_alg_pkg.zip

In the ZIP archive you will find the following files:


• “float_alg_pkg.vhdl” – Package definition.
• “float_alg_pkg-body_real.vhdl” – Package body, implemented using the “real”
type (unsynthesizable).
• “float_alg_pkg-body.vhdl” – Package body, implemented algorithmically.
• “test_float_alg.vhdl” – Testbench that tests the functionality of these packages.
• “compile.mti” – A compile script.
• “compile_93.mti” – Compile script for VHDL-93 (using VHDL-2002 rules for shared
variables).
These packages have been designed for use in VHDL-2008. However, a compatibility
version of the packages that works with VHDL-1993 is also provided. The VHDL-1993
versions of the packages have “_93” at the end of their file names. These packages
were designed to be compiled into the IEEE_PROPOSED library.

Dependencies:
• “float_alg_pkg” is dependent on the VHDL-2008 “numeric_std”, “float_pkg”,
“fixed_pkg”, and “math_real” packages, as well as on “fixed_alg_pkg”. The VHDL-1993
version of “float_alg_pkg” is dependent on the “IEEE_PROPOSED” library,
which can be downloaded at http://www.vhdl.org/fphdl/vhdl2008c.zip. It is also
dependent on “math_real” and “fixed_alg_pkg”.

Overview

The “float_alg_pkg” package defines no new types.

Why two versions, you ask? First, as a check: to verify that the correct result is being
returned (basically, to test the testbench). Second, these algorithms can sometimes take a
long time to run, so in batch simulations it may be faster to use the real math version.
The results do not match exactly, but they are very close.

This package uses series to compute values. No pipelining is done, so you need to use
automatic pipelining, insert your own pipeline stages, or live with the long delays in a
multicycle clocking scheme.

These algorithms have not been exhaustively debugged. Please e-mail me if you find a bug.
Use at your own risk...

Index:
Operators:

• “**” – overloaded for float ** integer and float ** float


Please see the “power_of” function for documentation. These functions are
implemented in the float_alg_pkg (real and synthesizable versions). There is also a
lookup table version in which the power (second term) is assumed to be a constant.

Functions:

Precision - This function rounds the input to a given number of binary bits. It is very
useful for comparing results to see whether they are close to the predicted values. I hope
to move this function into "float_pkg" in the next release of VHDL.
inputs:
arg : float
places : Natural number (starting at 1)
round_style : see floating point package documentation
result will be the same size as the input argument.

Example:
variable x, y : float32;

x := "00111110101010101010101010101011"; -- 1/3

y := precision (x, 10);
The result will be rounded to 10 significant binary places (counting the implied leading 1), or:
y := "00111110101010101100000000000000";
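To see what this rounding does numerically, the following Python model rounds a double to a given number of significant bits. It is an illustration, not the package's VHDL, and `precision_sketch` is a hypothetical name. It assumes `places` counts the implied leading 1, which is what the float32 example above implies, and it uses Python's ties-to-even `round()` in place of the default round_style.

```python
import math
import struct

def precision_sketch(arg: float, places: int) -> float:
    """Round arg to `places` significant bits, counting the implied leading 1."""
    if arg == 0.0 or not math.isfinite(arg):
        return arg
    m, e = math.frexp(arg)               # arg = m * 2**e, 0.5 <= |m| < 1
    kept = round(math.ldexp(m, places))  # integer holding the top `places` bits
    return math.ldexp(kept, e - places)  # ties are rounded to even by round()

# The 1/3 example above, viewed as single-precision bits.
y = precision_sketch(1.0 / 3.0, 10)
print(hex(struct.unpack(">I", struct.pack(">f", y))[0]))  # 0x3eaac000
```

The printed bit pattern matches the y value shown in the example.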

Floor - This function rounds the argument down to the nearest integer, similar to the floor
function in C. I hope to move this function into "float_pkg" in the next release of VHDL.
inputs:
arg: float
result will be the same size as the input argument.

Ceil - This function rounds the argument up to the nearest integer, similar to the ceil function
in C. I hope to move this function into "float_pkg" in the next release of VHDL.
inputs:
arg: float
result will be the same size as the input argument.

nr_divide - This function does a Newton-Raphson divide (using a loop; no real
division is involved). It works by running the Newton-Raphson algorithm on the
reciprocal of the right input, then multiplying that by the "left" input. Yes, it has accuracy
issues. Found only in the “float_alg_pkg”.
inputs:
l, r : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations : Number of times to go through the loop (see nr_reciprocal).
Result will be the same size floating point number as the input.

nr_reciprocal - This function does a Newton-Raphson reciprocal. The algorithm used is:
c1 = c0 * (2 - c0*arg)
looping until a given accuracy is achieved. The loop is shortened by creating the
correct seed, or "c0". This is done by shifting the argument so that the result has the
correct power of 2, and then running the loop. For anything less than 10 bits, 3 loops
will work; 6 loops will give an accurate result for a 50-bit number. Thus there is a
trade-off as to which division algorithm is best.
arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations : Number of times to go through the loop.
If the number of iterations is set to "0", the function chooses a value based on the length
of the number fed into the algorithm. A value of 6 will guarantee accuracy, at the
expense of logic.
Result will be the same size floating point number as the input.
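A Python model of the reciprocal loop, and of nr_divide built on top of it, can make the iteration-count trade-off concrete. It is a sketch under assumptions: the seed is simply the signed power of two 2**-e taken from the exponent e of arg (the package's exact seed logic is not described here), and the function names are hypothetical. With this seed the starting error 1 - c0*arg is at most 1/2 and is squared on every pass, so 3 passes leave at most 2**-8 and 6 passes leave less than one double-precision ulp.

```python
import math

def nr_reciprocal_sketch(arg: float, iterations: int = 6) -> float:
    """Approximate 1/arg with the loop c1 = c0 * (2 - c0*arg)."""
    if arg == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    _, e = math.frexp(arg)  # arg = m * 2**e with 0.5 <= |m| < 1
    # Seed: a signed power of two, so that c0*arg = |m| lies in [0.5, 1).
    c = math.copysign(math.ldexp(1.0, -e), arg)
    for _ in range(iterations):
        c = c * (2.0 - c * arg)  # the error 1 - c*arg is squared on each pass
    return c

def nr_divide_sketch(l: float, r: float, iterations: int = 6) -> float:
    """l / r computed as l * (1/r), with no division operator."""
    return l * nr_reciprocal_sketch(r, iterations)

print(nr_divide_sketch(1.0, 3.0))
```

Because only multiplies and subtracts appear inside the loop, each pass maps to a fixed block of logic, which is why the iteration count translates directly into hardware cost.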

sqrt - Performs a square root using Newton's iteration:


root := (1 + arg) / 2, then root := (root + (arg/root))/2 until done.
Yes, this function does involve a divide, making it fairly slow. If possible, I recommend
that you use the inverse_sqrt function, which is much more hardware-efficient. The
number of iterations necessary can be inferred correctly from the length of the argument.
arg : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
Result will be the same size floating point number as the input.
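The iteration above can be modeled in Python as follows (an illustration only; the name is hypothetical). Instead of a fixed iteration count, this sketch assumes "until done" means "until the estimate stops decreasing". By the AM-GM inequality, the seed (1 + arg)/2 is never below sqrt(arg), and neither is any later estimate, so in exact arithmetic the estimates approach the root from above.

```python
import math

def newton_sqrt_sketch(arg: float, max_iterations: int = 2000) -> float:
    """sqrt(arg): root := (1 + arg)/2, then root := (root + arg/root)/2."""
    if arg < 0.0:
        raise ValueError("newton_sqrt_sketch needs a non-negative argument")
    if arg == 0.0:
        return 0.0
    root = (1.0 + arg) / 2.0      # never below sqrt(arg) (AM-GM)
    for _ in range(max_iterations):
        new_root = (root + arg / root) / 2.0   # one divide per pass
        if new_root >= root:      # estimates stopped decreasing: done
            break
        root = new_root
    return root

print(newton_sqrt_sketch(2.0), math.sqrt(2.0))
```

For arguments far from 1, this seed roughly halves per pass before quadratic convergence sets in. That is why the generous iteration cap is there, and it is one more reason a seed based on the exponent pays off in hardware.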

cbrt - Performs a cube root using Newton's iteration:
root := (arg + 2) / 3, then root := ((arg/root**2) + 2*root)/3 until done.
Another fairly slow function. Every nonzero number has three cube roots, but only one of
them is real, and this function delivers only that real root: a negative input will result
in a negative result.
arg : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
Result will be the same size floating point number as the input.
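A similar Python model of the cube-root iteration follows (hypothetical name, illustration only). The seed (arg + 2)/3 is simply the first Newton step taken from root = 1. The sketch assumes negative inputs are handled by iterating on |arg| and restoring the sign. It stops once the estimate stops decreasing, because for a positive argument every estimate is at least cbrt(arg) by the AM-GM inequality.

```python
import math

def newton_cbrt_sketch(arg: float, max_iterations: int = 2000) -> float:
    """Real cube root: root := (a + 2)/3, then root := (a/root**2 + 2*root)/3."""
    if arg == 0.0:
        return 0.0
    a = abs(arg)              # assumption: iterate on |arg|, restore the sign
    root = (a + 2.0) / 3.0    # never below cbrt(a) (AM-GM on 1, 1, a)
    for _ in range(max_iterations):
        new_root = (a / (root * root) + 2.0 * root) / 3.0
        if new_root >= root:  # estimates stopped decreasing: done
            break
        root = new_root
    return math.copysign(root, arg)

print(newton_cbrt_sketch(27.0), newton_cbrt_sketch(-8.0))
```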

inverse_sqrt - This function returns 1/sqrt(x) using a Newton-Raphson iteration:
y1 = (y0*(3-y0*y0*arg))/2, where the seed "y0" is computed by figuring out the power of 2
of the result. No division is involved in this algorithm. The number of iterations can be
inferred from the number of bits in the result; however, you can override this value if
necessary. 4 iterations are usually good enough, depending on the seed. 6 will be good
enough for 50 bits of resolution.
arg : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
Result will be the same size floating point number as the input.
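The iteration can be modeled in Python as shown below (hypothetical name, illustration only). The seed here is an assumption: the power of two 2**-(e//2), taken from the exponent e of arg, which puts y0*sqrt(arg) between about 0.71 and 1.42. With a seed this crude, roughly 7 passes are needed for full double precision in the worst case; a better seed is what brings the count down toward the figures quoted above. Halving is written as a multiply by 0.5, so no division is used.

```python
import math

def inverse_sqrt_sketch(arg: float, iterations: int = 7) -> float:
    """Approximate 1/sqrt(arg) with y1 = y0*(3 - y0*y0*arg)/2 (no division)."""
    if arg <= 0.0:
        raise ValueError("inverse_sqrt_sketch needs a positive argument")
    _, e = math.frexp(arg)  # arg = m * 2**e, 0.5 <= m < 1
    # Seed: a power of two near 1/sqrt(arg); y0*sqrt(arg) lands in [0.70, 1.42).
    y = math.ldexp(1.0, -(e // 2))
    for _ in range(iterations):
        y = y * (3.0 - y * y * arg) * 0.5  # halving is a multiply, not a divide
    return y

print(inverse_sqrt_sketch(2.0), 1.0 / math.sqrt(2.0))
```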

exp - This function returns e^x, computed using the series:
e**x := 1 + x + (x**2 / 2!) + (x**3 / 3!) ...
The algorithm uses no division (1/3! is a constant); however, it needs one
"term" for every 2 bits of accuracy.
arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
Result will be the same size floating point number as the input.
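Here is a Python model of the series (an illustration only; `exp_sketch` and its `terms` parameter are hypothetical). The 1/n! factors are computed once up front, standing in for the constants the text mentions, so the loop itself only multiplies and adds. No range reduction is done, so a larger |x| needs more terms before they become negligible.

```python
import math

# 1/n! for n = 0..39; in hardware these would be precomputed constants.
INV_FACTORIAL = [1.0 / math.factorial(n) for n in range(40)]

def exp_sketch(x: float, terms: int = 20) -> float:
    """e**x = 1 + x + x**2/2! + x**3/3! + ..., using only multiplies and adds."""
    if not 1 <= terms <= len(INV_FACTORIAL):
        raise ValueError("terms must be between 1 and 40")
    total = 0.0
    power = 1.0  # holds x**n
    for n in range(terms):
        total += power * INV_FACTORIAL[n]
        power *= x
    return total

print(exp_sketch(1.0), math.e)
```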

Log - This function takes two arguments and returns the log of the first argument,
using the second argument as the base. The algorithm used is:
log_base(arg) = ln(arg) / ln(base), so it isn't very efficient.
arg : float (must be a positive number)
base : float (must be a positive number)
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations : Number of times to go through the loop, passed to "ln" function
Result will be the same size floating point number as the input.

power_of - This function performs l^r. This is done via a log-based operation, which does
involve some division. Loop iterations are calculated from the length of the output.
l : float
r : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
Result will be of the same size and type as the left or "l" input.

ln - Performs the natural log of the argument. The series performed here is:
TERM = ((x-1)/(x+1)), ln(x) = 2(TERM + (1/3)*TERM**3 + ...)
There is one divide, at the beginning of the routine.
arg : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations : Number of times to go through the loop. You need about
1 iteration for every 4 bits of result. If "0" is passed as the
number of iterations, then iterations is arg'length/4.
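The ln series, and the Log entry built from it, can be modeled in Python as follows (an illustration only; all names are hypothetical). The guide does not say how the package keeps TERM small, so this sketch assumes a common range reduction. It splits arg = m * 2**e, recentres m into [sqrt(0.5), sqrt(2)) so that |TERM| stays below about 0.17, and adds e*ln(2), with ln(2) computed once from the same series. The 1/(2k+1) factors are precomputed constants, leaving (x-1)/(x+1) as the only divide.

```python
import math

INV_ODD = [1.0 / (2 * k + 1) for k in range(64)]  # constants 1, 1/3, 1/5, ...
SQRT_HALF = math.sqrt(0.5)

def _ln_series(x: float, iterations: int) -> float:
    """ln(x) = 2*(t + t**3/3 + t**5/5 + ...), t = (x - 1)/(x + 1)."""
    t = (x - 1.0) / (x + 1.0)  # the one divide
    t2 = t * t
    power = t                  # holds t**(2k+1)
    total = 0.0
    for k in range(iterations):
        total += power * INV_ODD[k]
        power *= t2
    return 2.0 * total

LN2 = _ln_series(2.0, 40)      # ln(2) from the same series (t = 1/3)

def ln_sketch(arg: float, iterations: int = 0) -> float:
    if arg <= 0.0:
        raise ValueError("ln_sketch needs a positive argument")
    if iterations == 0:
        iterations = 64 // 4   # the guide's arg'length/4 rule, for a 64-bit double
    iterations = min(iterations, len(INV_ODD))
    m, e = math.frexp(arg)     # arg = m * 2**e, 0.5 <= m < 1
    if m < SQRT_HALF:          # recentre m into [sqrt(0.5), sqrt(2))
        m *= 2.0
        e -= 1
    return _ln_series(m, iterations) + e * LN2

def log_sketch(arg: float, base: float, iterations: int = 0) -> float:
    """Log of arg in the given base: ln(arg) / ln(base)."""
    return ln_sketch(arg, iterations) / ln_sketch(base, iterations)

print(ln_sketch(10.0), log_sketch(8.0, 2.0))
```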

Trig Functions

sin - Performs a "sin" function using this series:


sin(x) = x - x**3/3! + x**5/5! - x**7/7! ....
No divides are involved, and it is fairly efficient. The function does one iteration for every 4
bits of precision. The input is expected to be in radians. If the input is larger than 2*PI
or less than 0, a function is called which normalizes the input to the 0 to 2*PI range.
arg : float - input in Radians
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
result will be the same size as the "arg" input, and will be a number between -1 and 1.

cos - Performs a "cos" function using this series:


cos(x) = 1 - x**2/2! + x**4/4! - x**6/6! ....
No divides are involved, and it is fairly efficient. The function does one iteration for every 4
bits of precision. The input is expected to be in radians. If the input is larger than 2*PI
or less than 0, a function is called which normalizes the input to the 0 to 2*PI range.
arg : float - input in Radians
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
result will be the same size as the "arg" input, and will be a number between -1 and 1.
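Both series can be modeled in Python as shown below (an illustration only; names are hypothetical). Following the text, the input is first normalized to the 0 to 2*PI range. The default of 25 terms is an assumption chosen so that double precision is reached even near 2*PI. Near that end the largest intermediate terms reach roughly 85, so a few bits are lost to cancellation.

```python
import math

TWO_PI = 2.0 * math.pi
INV_FACTORIAL = [1.0 / math.factorial(n) for n in range(60)]  # 1/n! constants

def normalize_angle(x: float) -> float:
    """Map x into [0, 2*pi), like the normalizing step described above."""
    x = math.fmod(x, TWO_PI)
    return x + TWO_PI if x < 0.0 else x

def sin_sketch(x: float, terms: int = 25) -> float:
    """sin(x) = x - x**3/3! + x**5/5! - ..."""
    x = normalize_angle(x)
    x2 = x * x
    power, total = x, 0.0      # power holds x**(2k+1)
    for k in range(terms):
        term = power * INV_FACTORIAL[2 * k + 1]
        total += -term if k % 2 else term
        power *= x2
    return total

def cos_sketch(x: float, terms: int = 25) -> float:
    """cos(x) = 1 - x**2/2! + x**4/4! - ..."""
    x = normalize_angle(x)
    x2 = x * x
    power, total = 1.0, 0.0    # power holds x**(2k)
    for k in range(terms):
        term = power * INV_FACTORIAL[2 * k]
        total += -term if k % 2 else term
        power *= x2
    return total

print(sin_sketch(1.0), math.sin(1.0))
```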

tan - Performs a tangent function.
This function is currently performed by calculating sin(x)/cos(x). It works, but it is
probably not as efficient as I would like.
arg : float - input in Radians
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation

result will be the same size as the "arg" input, and will saturate as
the number goes to infinity.

arcsin - Performs an "arcsin" function (the angle whose sine is the argument).
This is done by calculating: arcsin(x) = arctan (x / sqrt (1 - x**2)), which is definitely not
the most efficient mechanism, but it works.
arg : float - between -1.0 and 1.0
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: passed to the arctan function.
result will be the same size as the input, and returned in radians.

arccos - Performs an "arccos" function (the angle whose cosine is the argument).
This is done by calculating: arccos(x) = PI/2 - arcsin(x), which is definitely not the most
efficient mechanism, but it works.
arg : float - between -1.0 and 1.0
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: passed to the arctan function.
result will be the same size as the input, and returned in radians.

arctan - Performs an "arctan" function (the angle whose tangent is the argument). This one is a bit
complicated. After several trials with standard equations I came up with:
arctan (x) = PI/2 - arctan(1/x)
which only works for small angles, so I use a "half_angle" formula to compute arguments
larger than sqrt(2)/4.
arg : float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: one iteration for every 6 bits of precision.
result will be the same size as the input, and returned in radians.
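The arctan description can be modeled in Python as shown below, together with the arcsin and arccos identities quoted earlier (an illustration only; names are hypothetical, and the exact package structure is not known). The reading assumed here is: use the reciprocal identity for |x| > 1, apply the half-angle identity arctan(x) = 2*arctan(x / (1 + sqrt(1 + x**2))) while the argument exceeds sqrt(2)/4, and then sum the series x - x**3/3 + x**5/5 - ... on the small remaining argument.

```python
import math

HALF_ANGLE_LIMIT = math.sqrt(2.0) / 4.0  # threshold quoted in the text

def arctan_sketch(x: float, terms: int = 20) -> float:
    if x < 0.0:
        return -arctan_sketch(-x, terms)  # arctan is odd
    if x > 1.0:
        # arctan(x) = pi/2 - arctan(1/x) for x > 0
        return math.pi / 2.0 - arctan_sketch(1.0 / x, terms)
    scale = 1.0
    while x > HALF_ANGLE_LIMIT:
        # half-angle identity: arctan(x) = 2*arctan(x / (1 + sqrt(1 + x*x)))
        x = x / (1.0 + math.sqrt(1.0 + x * x))
        scale *= 2.0
    x2 = x * x
    power, total = x, 0.0  # power holds x**(2k+1)
    for k in range(terms):
        term = power / (2 * k + 1)
        total += -term if k % 2 else term
        power *= x2
    return scale * total

def arcsin_sketch(x: float, terms: int = 20) -> float:
    if abs(x) > 1.0:
        raise ValueError("arcsin_sketch needs -1 <= x <= 1")
    if abs(x) == 1.0:
        return math.copysign(math.pi / 2.0, x)
    return arctan_sketch(x / math.sqrt(1.0 - x * x), terms)

def arccos_sketch(x: float, terms: int = 20) -> float:
    return math.pi / 2.0 - arcsin_sketch(x, terms)

print(arctan_sketch(1.0), math.pi / 4.0)
```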

sinh - Hyperbolic sin function. This one also took a few trials.
The most efficient function I could come up with was:
sinh(x) = (e^x - e^(-x))/2 which uses the very efficient "exp" function.
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: passed to the "exp" function.
result will be the same size as the input.

cosh - Hyperbolic cos function.


The most efficient function I could come up with was:
cosh(x) = (e^x + e^(-x))/2 which uses the very efficient "exp" function.
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: passed to the "exp" function.
result will be the same size as the input.

tanh - Hyperbolic tan function.


The most efficient function I could come up with was:
tanh(x) = (e^(2x) - 1) / (e^(2x) + 1), which uses the very efficient "exp" function, but this
time involves a divide.
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: passed to the "exp" function.
result will be the same size as the input.
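The three identities above can be checked with a short Python model (an illustration only; names are hypothetical, and `math.exp` stands in for the package's exp). Written this naively, sinh loses relative accuracy to cancellation for x near 0, and tanh overflows once e^(2x) exceeds the double range (Python's `math.exp` raises OverflowError there), so a real implementation would need to guard those cases.

```python
import math

def sinh_sketch(x: float) -> float:
    """sinh(x) = (e^x - e^(-x)) / 2"""
    return (math.exp(x) - math.exp(-x)) / 2.0

def cosh_sketch(x: float) -> float:
    """cosh(x) = (e^x + e^(-x)) / 2"""
    return (math.exp(x) + math.exp(-x)) / 2.0

def tanh_sketch(x: float) -> float:
    """tanh(x) = (e^(2x) - 1) / (e^(2x) + 1); note the divide."""
    e2x = math.exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)

print(sinh_sketch(1.0), cosh_sketch(1.0), tanh_sketch(1.0))
```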

Arcsinh – Find the angle whose hyperbolic sine is the argument. Bounds on the output are -PI to PI.
Bounds on the input are -inf to +inf.
Inputs:
Arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: Number of times to go through the loop, passed to the “ln” function
result will be the same size as the input.

Arccosh – Find the angle whose hyperbolic cosine is the argument. Bounds on the output are 0 to PI.
Bounds on the input are 1.0 to +inf.
Inputs:
Arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: Number of times to go through the loop, passed to the “ln” function
result will be the same size as the input.

Arctanh – Find the angle whose hyperbolic tangent is the argument. Bounds on the output are -PI to PI.
Bounds on the input are -1.0 to +1.0.
Inputs:
Arg: float
round_style : see floating point package documentation
guard : see floating point package documentation
check_error: see floating point package documentation
denormalize: see floating point package documentation
iterations: Number of times to go through the loop, passed to the “ln” function
result will be the same size as the input.
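The text says these three functions pass `iterations` to ln. Assuming they use the standard logarithmic identities arcsinh(x) = ln(x + sqrt(x**2 + 1)), arccosh(x) = ln(x + sqrt(x**2 - 1)), and arctanh(x) = ln((1 + x)/(1 - x))/2, a Python model looks like this (an illustration only; names are hypothetical, and `math.log` stands in for the package's ln).

```python
import math

def arcsinh_sketch(x: float) -> float:
    a = abs(x)
    # arcsinh is odd, so work on |x| to avoid cancellation for negative x
    return math.copysign(math.log(a + math.sqrt(a * a + 1.0)), x)

def arccosh_sketch(x: float) -> float:
    if x < 1.0:
        raise ValueError("arccosh_sketch needs x >= 1")
    return math.log(x + math.sqrt(x * x - 1.0))

def arctanh_sketch(x: float) -> float:
    if not -1.0 < x < 1.0:
        raise ValueError("arctanh_sketch needs -1 < x < 1")
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

print(arcsinh_sketch(2.0), arccosh_sketch(2.0), arctanh_sketch(0.5))
```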
