This function maps values from a dataset to bit flags that can be encoded into a bitfield.
Usage:
bf_map(protocol, data, registry, ..., name = NULL, na.val = NULL)

Arguments:
protocol: character(1), the protocol based on which the flag should be determined, see Details.
data: the object to build bit flags for.
registry: registry(1), an already defined bitfield registry.
...: the protocol-specific arguments for building a bit flag, see Details.
name: character(1), an optional flag name.
na.val: a value, of the same encoding type as the flag, that replaces NA results; it must be given if the test for this flag results in NAs.

Value:
An (updated) object of class 'registry' with the additional flag defined here.
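The protocol-specific arguments in ... are written as unquoted column names (for example x = year in the Examples). The base-R sketch below shows one way such an interface can work: the ... expressions are captured unevaluated and then evaluated with the columns of data in scope. The names toy_protocols and toy_map are hypothetical illustrations, not the bitfield implementation of bf_map.

```r
# Hypothetical protocol table: each entry is a plain function of its own arguments
toy_protocols <- list(
  na    = function(x) is.na(x),
  range = function(x, min, max) x >= min & x <= max,
  grepl = function(x, pattern) grepl(pattern, x)
)

# Hypothetical dispatcher (NOT bitfield::bf_map): captures `...` unevaluated,
# evaluates it with the columns of `data` in scope, then calls the protocol
toy_map <- function(protocol, data, ...) {
  fun <- toy_protocols[[protocol]]
  if (is.null(fun)) stop("unknown protocol: ", protocol)
  # substitute() yields e.g. list(x = yield, min = 10.4, max = 11)
  args <- eval(substitute(list(...)), envir = data, enclos = parent.frame())
  do.call(fun, args)
}

tbl <- data.frame(year = c("2001", NA, "2003r"), yield = c(10.2, 10.6, 11.5))
toy_map("na", tbl, x = year)                          # FALSE  TRUE FALSE
toy_map("range", tbl, x = yield, min = 10.4, max = 11) # FALSE  TRUE FALSE
toy_map("grepl", tbl, x = year, pattern = ".*r")      # FALSE FALSE  TRUE
```

Because names that are not columns of data are looked up in the calling environment, constants such as min = 10.4 pass through unchanged, which is what lets one signature serve protocols with different argument lists.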
Details:
protocol can be the name of an internal protocol (see bf_pcl), a newly built local
protocol (see bf_protocol), or one that has been imported from the bitfield community
standards repository on GitHub (see bf_standards). Each protocol has specific arguments,
typically at least the name of the column containing the values to test (x). To keep
this function as general as possible, all of these arguments are passed via the ...
argument of bf_map. The internal protocols are:
na (x): test whether a variable contains NA values (boolean).
nan (x): test whether a variable contains NaN values (boolean).
inf (x): test whether a variable contains Inf values (boolean).
identical (x, y): element-wise test whether values are identical across two variables (boolean).
range (x, min, max): test whether the values are within a given range (boolean).
matches (x, set): test whether the values match a given set (boolean).
grepl (x, pattern): test whether the values match a given pattern (boolean).
category (x): test whether the values are part of a set of given categories (enumeration).
case (...): test whether values are part of given cases (enumeration).
nChar (x): count the number of characters of the values (unsigned integer).
nInt (x): count the number of integer digits of the values (unsigned integer).
nDec (x): count the number of decimal digits of the values (unsigned integer).
integer (x, ...): encode values as an integer bit sequence (signed integer). Accepts raw
  integer data directly, or numeric data with auto-scaling when range, fields, or decimals
  are provided. With range = c(min, max) and fields = list(significand = n), values are
  linearly mapped from [min, max] to [0, 2^n - 1] during encoding and back during
  decoding. The scaling parameters are stored in provenance for transparent round-trips.
numeric (x, ...): encode the numeric value as a floating-point bit sequence
  (floating-point); see .makeEncoding for details on the ... argument.
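The two numeric encodings at the end of this list can be illustrated in base R. The first half of the sketch applies the linear mapping described for integer, from [min, max] onto [0, 2^n - 1], using the density values from the Examples; rounding to the nearest integer is an assumption, and bitfield may round differently. The second half encodes and decodes a positive value as a small IEEE-754-style float with 4 exponent bits, 7 significand bits and a bias of 7. That matches the 0.4.7/7 encoding printed for the numeric flag in the Examples only if the string is read as sign.exponent.significand/bias; this reading, the helper names, and the omission of sign, zero and subnormal handling are assumptions of the sketch.

```r
# --- integer protocol: linear auto-scaling (sketch; nearest-integer rounding assumed)
scale_encode <- function(x, min, max, n) {
  # map [min, max] onto the integer codes 0 .. 2^n - 1
  as.integer(round((x - min) / (max - min) * (2^n - 1)))
}
scale_decode <- function(k, min, max, n) {
  # invert the mapping; resolution is one step of (max - min) / (2^n - 1)
  min + k / (2^n - 1) * (max - min)
}

density <- c(0.5, 1.2, 2.8, 0.0, 3.1)
codes   <- scale_encode(density, min = 0, max = 3.1, n = 5)
codes                                           # 5 12 28 0 31
scale_decode(codes, min = 0, max = 3.1, n = 5)  # recovers density (step = 0.1)

# --- numeric protocol: IEEE-754-style minifloat (positive, normalised values only)
float_encode <- function(x, e, m, bias) {
  E <- floor(log2(x))                    # unbiased exponent
  M <- round((x / 2^E - 1) * 2^m)        # fraction after the implicit leading 1
  if (M == 2^m) { M <- 0; E <- E + 1 }   # rounding carried into the exponent
  stored <- E + bias
  stopifnot(stored >= 1, stored < 2^e)   # stored exponent must fit in e bits
  # intToBits() is least-significant-bit first, so reverse to get MSB first
  c(rev(as.integer(intToBits(as.integer(stored)))[seq_len(e)]),
    rev(as.integer(intToBits(as.integer(M)))[seq_len(m)]))
}
float_decode <- function(bits, e, m, bias) {
  E <- sum(bits[seq_len(e)] * 2^((e - 1):0))       # stored exponent
  M <- sum(bits[e + seq_len(m)] * 2^((m - 1):0))   # significand fraction bits
  (1 + M / 2^m) * 2^(E - bias)
}

bits <- float_encode(11.17, e = 4, m = 7, bias = 7)
bits                                        # 1 0 1 0 0 1 1 0 0 1 1
float_decode(bits, e = 4, m = 7, bias = 7)  # 11.1875, the nearest representable value
```

With only 7 significand bits, neighbouring representable values near 11 are 1/16 apart, which is why the decoded value differs from the input in the second decimal.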
Console output in R often rounds or truncates decimal places, both for ordinary numeric
vectors and for classes such as tibble. Internally, R stores numeric values as
double-precision floating-point numbers (64 bits, of which 52 store the significand; with
the implicit leading bit this gives 53 bits of precision), providing approximately 16
significant decimal digits (log10(2^53) ≈ 15.95). If a bit flag appears inconsistent with
the displayed values, verify the full precision using sprintf("%.16f", values). Printing
more than about 16 significant digits shows additional figures, but these are artifacts of
binary-to-decimal conversion and carry no meaningful information.
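A short base-R illustration of the note above, with an arbitrary example value: the default console view is rounded to getOption("digits") significant digits, .Machine$double.digits reports the 53 bits of significand precision, and asking sprintf() for more digits than a double holds exposes conversion artifacts.

```r
x <- 10.123456789012345

getOption("digits")          # default number of significant digits printed (7)
format(x, digits = 7)        # "10.12346": what the console shows by default
sprintf("%.16f", x)          # the stored value with 16 decimal places

.Machine$double.digits             # 53: 52 stored significand bits + 1 implicit bit
log10(2^.Machine$double.digits)    # about 15.95 significant decimal digits
log10(2^52)                        # about 15.65: counting only the stored bits

# Beyond roughly 16 significant digits the extra figures are conversion artifacts
sprintf("%.20f", 0.1)
```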
Examples:
library(bitfield)
# first, set up the registry
reg <- bf_registry(name = "testBF", description = "test bitfield",
template = bf_tbl)
# then, put the test for NA values together
reg <- bf_map(protocol = "na", data = bf_tbl, registry = reg,
x = year)
# all the other protocols...
# boolean encoding
reg <- bf_map(protocol = "nan", data = bf_tbl, registry = reg,
x = y)
reg <- bf_map(protocol = "inf", data = bf_tbl, registry = reg,
x = y)
reg <- bf_map(protocol = "identical", data = bf_tbl, registry = reg,
x = x, y = y, na.val = FALSE)
reg <- bf_map(protocol = "range", data = bf_tbl, registry = reg,
x = yield, min = 10.4, max = 11)
reg <- bf_map(protocol = "matches", data = bf_tbl, registry = reg,
x = commodity, set = c("soybean", "honey"), na.val = FALSE)
reg <- bf_map(protocol = "grepl", data = bf_tbl, registry = reg,
x = year, pattern = ".*r", na.val = FALSE)
# enumeration encoding
reg <- bf_map(protocol = "category", data = bf_tbl, registry = reg,
x = commodity, na.val = 0)
reg <- bf_map(protocol = "case", data = bf_tbl, registry = reg, na.val = 4,
yield >= 11, yield < 11 & yield > 9, yield < 9 & commodity == "maize")
# integer encoding
reg <- bf_map(protocol = "nChar", data = bf_tbl, registry = reg,
x = commodity, na.val = 0)
reg <- bf_map(protocol = "nInt", data = bf_tbl, registry = reg,
x = yield)
reg <- bf_map(protocol = "nDec", data = bf_tbl, registry = reg,
x = yield)
reg <- bf_map(protocol = "integer", data = bf_tbl, registry = reg,
x = as.integer(year), na.val = 0L)
#> Warning: NAs introduced by coercion
#> Warning: NAs introduced by coercion
# integer encoding with auto-scaling (numeric data mapped to integer range)
dat <- data.frame(density = c(0.5, 1.2, 2.8, 0.0, 3.1))
reg2 <- bf_registry(name = "scaledBF", description = "auto-scaled",
template = dat)
reg2 <- bf_map(protocol = "integer", data = dat, registry = reg2,
x = density, range = c(0, 3.1),
fields = list(significand = 5), na.val = 0L)
# floating-point encoding
reg <- bf_map(protocol = "numeric", data = bf_tbl, registry = reg,
x = yield, decimals = 2)
# finally, take a look at the registry
reg
#> type data.frame
#> width 44
#> flags 14 -|-|-|-|-|-|-|--|---|----|---|---|-----------|-----------
#>
#> pos encoding name col
#> 1 0.0.1/0 na year
#> 2 0.0.1/0 nan y
#> 3 0.0.1/0 inf y
#> 4 0.0.1/0 identical x-y
#> 5 0.0.1/0 range yield
#> 6 0.0.1/0 matches commodity
#> 7 0.0.1/0 grepl year
#> 8 0.0.2/0 category commodity
#> 10 0.0.3/0 case yield-commodity
#> 13 0.0.4/0 nChar commodity
#> 17 0.0.3/0 nInt yield
#> 20 0.0.3/0 nDec yield
#> 23 0.0.11/0 integer year
#> 34 0.4.7/7 numeric yield
# alternatively, a raster
library(terra)
bf_rst <- rast(nrows = 3, ncols = 3, vals = bf_tbl$commodity, names = "commodity")
bf_rst$yield <- rast(nrows = 3, ncols = 3, vals = bf_tbl$yield)
reg <- bf_registry(name = "testBF", description = "raster bitfield",
template = bf_rst)
reg <- bf_map(protocol = "na", data = bf_rst, registry = reg,
x = commodity)
reg <- bf_map(protocol = "range", data = bf_rst, registry = reg,
x = yield, min = 5, max = 11)
reg <- bf_map(protocol = "category", data = bf_rst, registry = reg,
x = commodity, na.val = 0)
reg
#> type SpatRaster
#> width 4
#> flags 3 -|-|--
#>
#> pos encoding name col
#> 1 0.0.1/0 na commodity
#> 2 0.0.1/0 range yield
#> 3 0.0.2/0 category commodity
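The pos column in both registry printouts follows from the encoding column: each flag starts right after the previous one ends. The sketch below recomputes the positions and the total width of 44 for the data.frame example and the width of 4 for the raster example. It reads each encoding string as sign.exponent.significand/bias and takes the flag width as sign + exponent + significand bits; this reading is an assumption inferred from the printouts, and enc_width is a hypothetical helper, not a bitfield function.

```r
# Width of one flag from its encoding string, assuming "sign.exponent.significand/bias"
enc_width <- function(enc) {
  parts <- as.integer(strsplit(enc, "[./]")[[1]])  # sign, exponent, significand, bias
  sum(parts[1:3])                                  # the bias does not occupy bits
}

# Encodings copied from the data.frame registry printout above
enc <- c(na = "0.0.1/0", nan = "0.0.1/0", inf = "0.0.1/0", identical = "0.0.1/0",
         range = "0.0.1/0", matches = "0.0.1/0", grepl = "0.0.1/0",
         category = "0.0.2/0", case = "0.0.3/0", nChar = "0.0.4/0",
         nInt = "0.0.3/0", nDec = "0.0.3/0", integer = "0.0.11/0",
         numeric = "0.4.7/7")

widths <- vapply(enc, enc_width, integer(1))
pos    <- cumsum(c(1L, head(widths, -1L)))  # each flag starts after the previous one

data.frame(name = names(enc), pos = pos, width = widths, row.names = NULL)
sum(widths)                                 # 44, the registry width

# The raster registry: two boolean flags and a 2-bit category
raster_widths <- vapply(c("0.0.1/0", "0.0.1/0", "0.0.2/0"), enc_width, integer(1))
sum(raster_widths)                                    # 4
paste(strrep("-", raster_widths), collapse = "|")     # "-|-|--", as in the flags line
```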