Determine encoding

.makeEncoding(var, type, ...)

Arguments

var

the variable for which to determine encoding.

type

the type of encoding to determine.

...

list(.)
named list of options to determine encoding, see Details.

Value

list of the encoding values for sign, exponent and significand, and an additional provenance term.

Details

Floating-point values are encoded using three fields that map directly to bit sequences. Any numeric value can be written in scientific notation. For example, the decimal 923.52 becomes 9.2352 × 10². The same principle applies in binary: the value 101011.101₂ becomes 1.01011101 × 2⁵. This binary scientific notation directly yields the three encoding fields:

  • Sign: whether the value is positive or negative (here: positive → 0)

  • Exponent: the power of 2 (here: 5)

  • Significand: the fractional part after the leading 1 (here: 01011101)
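The decomposition above can be reproduced with a short sketch. The helper `decompose()` is hypothetical and not part of this package; it assumes a finite, non-zero input and simply reads off the first `m` bits after the leading 1.

```r
# Hypothetical helper (not package API): split a finite, non-zero number
# into the sign, exponent and significand of its binary scientific notation.
decompose <- function(x, m = 8) {
  stopifnot(is.numeric(x), length(x) == 1, is.finite(x), x != 0)
  sign_bit <- as.integer(x < 0)          # 0 = positive, 1 = negative
  exponent <- floor(log2(abs(x)))        # power of 2 of the leading 1
  frac <- abs(x) / 2^exponent - 1        # part after the leading 1, in [0, 1)
  bits <- integer(m)
  for (i in seq_len(m)) {                # extract m fractional bits
    frac <- frac * 2
    bits[i] <- as.integer(frac >= 1)
    frac <- frac - bits[i]
  }
  list(sign = sign_bit,
       exponent = exponent,
       significand = paste(bits, collapse = ""))
}

# 101011.101 in binary is 43.625 in decimal
res <- decompose(43.625)
str(res)
```

For 43.625 this should give sign 0, exponent 5 and significand 01011101, matching the fields listed above.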

For background on floating-point representation, see 'Floating Point' by Thomas Finley, or explore encodings interactively at https://float.exposed/.

The allocation of bits across these fields can be adjusted to suit different needs: more exponent bits provide a wider range (smaller minimums and larger maximums), while more significand bits provide finer precision. This package documents bit allocation using the notation [s.e.m], where s = sign bits (0 or 1), e = exponent bits, and m = significand bits.
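To make the trade-off concrete, the following sketch computes the representable range and the relative precision for a given number of exponent and significand bits. It assumes IEEE 754 conventions (bias of 2^(e-1) - 1, the highest exponent code reserved for Inf and NaN), which may not hold for every format; `format_limits()` is a hypothetical helper, not a function of this package.

```r
# Hypothetical helper (not package API): limits of a binary floating-point
# layout with e exponent bits and m significand bits, assuming IEEE 754
# conventions for bias and reserved exponent codes.
format_limits <- function(e, m) {
  bias <- 2^(e - 1) - 1
  list(
    bias            = bias,
    largest         = (2 - 2^(-m)) * 2^bias,  # largest finite value
    smallest_normal = 2^(1 - bias),           # smallest positive normal value
    epsilon         = 2^(-m)                  # spacing of values just above 1
  )
}

half   <- format_limits(e = 5,  m = 10)   # [1.5.10]
double <- format_limits(e = 11, m = 52)   # [1.11.52]

half$largest      # more exponent bits would raise this
half$epsilon      # more significand bits would lower this
```

Under these assumptions the values for [1.11.52] coincide with `.Machine$double.xmax`, `.Machine$double.xmin` and `.Machine$double.eps` in R.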

For non-numeric data (boolean or categorical), the same notation applies with sign and exponent set to 0. A binary flag uses [0.0.1], while a categorical variable with 8 levels requires 3 bits, yielding [0.0.3].
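The number of bits for a categorical variable follows from the base-2 logarithm of its number of levels. A minimal sketch, with `level_notation()` as a hypothetical helper and the assumption that at least one bit is always allocated:

```r
# Hypothetical helper (not package API): [s.e.m] notation for a
# categorical variable with n_levels distinct levels.
level_notation <- function(n_levels) {
  stopifnot(is.numeric(n_levels), length(n_levels) == 1, n_levels >= 1)
  # assumption: a single-level variable still occupies one bit
  bits <- max(1L, as.integer(ceiling(log2(n_levels))))
  sprintf("[0.0.%d]", bits)
}

level_notation(2)   # binary flag
level_notation(8)   # 8 levels fit into 3 bits
level_notation(9)   # a 9th level requires a 4th bit
```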

Possible options (...) of this function are

  • format: switch that determines the configuration of the floating point encoding. Possible values are "half" [1.5.10], "bfloat16" [1.8.7], "tensor19" [1.8.10], "fp24" [1.7.16], "pxr24" [1.8.15], "single" [1.8.23] and "double" [1.11.52],

  • fields: list of custom values that control how many bits are allocated to sign, exponent and significand for encoding the numeric values,

  • range: the ratio between the smallest and largest values that must be reliably represented (modifies the exponent),

  • decimals: the number of decimal digits that should be represented reliably (modifies the significand).
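The sketch below tabulates the predefined formats listed above and illustrates one plausible way in which range and decimals could translate into exponent and significand bits. The rule used in `bits_for()` is an assumption made for illustration (about log2(10) bits per decimal digit, and enough exponent codes to span log2(range) powers of two); it is not necessarily how this function derives its result, and both names are hypothetical.

```r
# Bit allocations [s.e.m] of the predefined formats, as listed above.
formats <- list(
  half     = c(1, 5, 10),
  bfloat16 = c(1, 8, 7),
  tensor19 = c(1, 8, 10),
  fp24     = c(1, 7, 16),
  pxr24    = c(1, 8, 15),
  single   = c(1, 8, 23),
  double   = c(1, 11, 52)
)
total_bits <- vapply(formats, sum, numeric(1))

# Hypothetical helper (not package API): rough estimate of the bits
# needed for a given range ratio and number of decimal digits.
bits_for <- function(range, decimals) {
  stopifnot(range > 1, decimals >= 1)
  # assumption: each decimal digit costs log2(10) significand bits
  significand <- ceiling(decimals * log2(10))
  # assumption: one exponent code per power of two spanned by the range
  exponent <- ceiling(log2(ceiling(log2(range)) + 1))
  c(sign = 1, exponent = exponent, significand = significand)
}

total_bits
bits_for(range = 1e6, decimals = 3)
```

With a range of 1e6 and 3 decimals, this estimate lands on the same allocation as "half", [1.5.10].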

In a future version, it should also be possible to modify the bias so that number coverage is concentrated where it is most useful for the data.