Determine encoding

.makeEncoding(var, type, ...)

Arguments

var: the variable for which to determine encoding.
type: the encoding type for which to determine encoding.
...: named list of options to determine encoding, see Details.

Value

list of the encoding values for sign, exponent and significand, and an additional provenance term.

Details

Floating-point values are encoded using three fields that map directly to bit sequences. Any numeric value can be written in scientific notation. For example, the decimal 923.52 becomes 9.2352 × 10². The same principle applies in binary: the value 101011.101₂ becomes 1.01011101 × 2⁵. This binary scientific notation directly yields the three encoding fields:

Sign: whether the value is positive or negative (here: positive → 0)
Exponent: the power of 2 (here: 5)
Significand: the fractional part after the leading 1 (here: 01011101)
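
The decomposition above can be reproduced with a few lines of R. This is a minimal sketch for illustration only: decompose() is a hypothetical helper, not a function of this package, and it returns the unbiased exponent and the first m significand bits of a finite, non-zero number.

```r
# Hypothetical helper (not part of the package): split a finite, non-zero
# number into sign, unbiased exponent and the first `m` significand bits.
decompose <- function(x, m = 8) {
  stopifnot(is.finite(x), x != 0, m >= 1)
  sign <- as.integer(x < 0)
  a <- abs(x)
  exponent <- floor(log2(a))          # power of 2 of the leading 1
  frac <- a / 2^exponent - 1          # fractional part after the leading 1
  bits <- integer(m)
  for (i in seq_len(m)) {
    frac <- frac * 2
    bits[i] <- as.integer(frac >= 1)  # next binary digit
    frac <- frac - bits[i]
  }
  list(sign = sign, exponent = exponent,
       significand = paste(bits, collapse = ""))
}

# 101011.101 in binary is 43.625 in decimal
decompose(43.625)
# sign 0, exponent 5, significand "01011101"
```

Bits beyond m are simply truncated here; a real encoder would also have to decide how to round them.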

For background on floating-point representation, see 'Floating Point' by Thomas Finley, or explore encodings interactively at https://float.exposed/.

The allocation of bits across these fields can be adjusted to suit different needs: more exponent bits provide a wider range (smaller minimums and larger maximums), while more significand bits provide finer precision. This package documents bit allocation using the notation [s.e.m], where s = sign bits (0 or 1), e = exponent bits, and m = significand bits.
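
To make the trade-off concrete, the sketch below computes the largest finite value, the smallest normal value and the relative spacing for a given [s.e.m] allocation. It assumes IEEE 754-style conventions (bias of 2^(e-1) - 1, highest exponent code reserved for Inf/NaN), which this package may or may not follow; field_limits() is a hypothetical name used only for this example.

```r
# Hypothetical helper, assuming IEEE 754-style conventions:
# bias = 2^(e - 1) - 1 and the top exponent code reserved for Inf/NaN.
field_limits <- function(e, m) {
  bias <- 2^(e - 1) - 1
  list(
    max_value  = (2 - 2^(-m)) * 2^bias,  # largest finite value
    min_normal = 2^(1 - bias),           # smallest positive normal value
    epsilon    = 2^(-m)                  # spacing of values just above 1
  )
}

field_limits(e = 5, m = 10)  # "half":     max_value 65504, epsilon 2^-10
field_limits(e = 8, m = 7)   # "bfloat16": far wider range, coarser epsilon 2^-7
```

Both formats occupy 16 bits; moving three bits from the significand to the exponent widens the range enormously while making the representable values coarser.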

For non-numeric data (boolean or categorical), the same notation applies with sign and exponent set to 0. A binary flag uses [0.0.1], while a categorical variable with 8 levels requires 3 bits, yielding [0.0.3].
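
The number of bits needed for a categorical variable follows from the base-2 logarithm of its number of levels. A small sketch, where bits_for_levels() is a hypothetical helper and not part of this package:

```r
# Hypothetical helper: bits needed to distinguish n_levels categories.
# At least one bit is always allocated.
bits_for_levels <- function(n_levels) {
  stopifnot(n_levels >= 1)
  max(1L, as.integer(ceiling(log2(n_levels))))
}

bits_for_levels(2)  # binary flag        -> 1, i.e. [0.0.1]
bits_for_levels(8)  # 8 levels           -> 3, i.e. [0.0.3]
bits_for_levels(5)  # 5 levels also need -> 3
```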

Possible options (...) of this function are:

format: switch that determines the configuration of the floating point encoding. Possible values are "half" [1.5.10], "bfloat16" [1.8.7], "tensor19" [1.8.10], "fp24" [1.7.16], "pxr24" [1.8.15], "single" [1.8.23] and "double" [1.11.52],
fields: list of custom values that control how many bits are allocated to sign, exponent and significand for encoding the numeric values,
range: the ratio between the smallest and largest possible value to be reliably represented (modifies the exponent),
decimals: the number of decimal digits that should be represented reliably (modifies the significand).
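
As an illustration of how these options relate to bit counts, the sketch below stores the documented format presets as a named list and derives their total width. The rule that maps decimals to significand bits is only one plausible choice (each decimal digit needs about log2(10), roughly 3.32, bits) and is an assumption, not necessarily what .makeEncoding() does internally; the names formats and bits_for_decimals() are hypothetical.

```r
# Bit allocations [s.e.m] of the documented format presets.
formats <- list(
  half     = c(sign = 1, exponent = 5,  significand = 10),
  bfloat16 = c(sign = 1, exponent = 8,  significand = 7),
  tensor19 = c(sign = 1, exponent = 8,  significand = 10),
  fp24     = c(sign = 1, exponent = 7,  significand = 16),
  pxr24    = c(sign = 1, exponent = 8,  significand = 15),
  single   = c(sign = 1, exponent = 8,  significand = 23),
  double   = c(sign = 1, exponent = 11, significand = 52)
)

# Total number of bits per preset
sapply(formats, sum)

# Assumed rule (not confirmed by the package): significand bits needed
# to resolve a given number of decimal digits.
bits_for_decimals <- function(decimals) {
  as.integer(ceiling(decimals * log2(10)))
}

bits_for_decimals(3)  # 10, which matches the significand width of "half"
```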

In a future version, it should also be possible to modify the bias to focus number coverage to where it's most useful for the data.