Bit Performance
2022-11-13
Contents
A performance example
Boolean data types
  % memory consumption of filter
  % time extracting
  % time assigning
  % time subscripting with ‘which’
  % time assigning with ‘which’
  % time Boolean NOT
  % time Boolean AND
  % time Boolean OR
  % time Boolean EQUALITY
  % time Boolean XOR
  % time Boolean SUMMARY
Fast methods for integer set operations
  % time for sorting
  % time for unique
  % time for duplicated
  % time for anyDuplicated
  % time for sumDuplicated
  % time for match
  % time for in
  % time for notin
  % time for union
  % time for intersect
  % time for setdiff
  % time for symdiff
  % time for setequal
  % time for setearly
A performance example
Before we measure the performance of the package's main functionality, note that even something as simple as
‘(a:b)[-i]’ can be, and has been, accelerated in this package:
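library(bit)             # bit_rangediff(), merge_rangediff(), bit_sort()
library(microbenchmark)  # microbenchmark()
times <- 5               # replications, matching the performance settings below
a <- 1L
b <- 1e7L
i <- sample(a:b, 1e3)    # 1000 positions to drop from a:b
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time)
, bit = median(microbenchmark(bit_rangediff(c(a,b), i), times=times)$time)
  # the merge variant expects sorted exclusions, hence bit_sort(i)
, merge = median(microbenchmark(merge_rangediff(c(a,b), bit_sort(i)), times=times)$time)
)
knitr::kable(as.data.frame(as.list(x/x["R"]*100)), caption="% of time relative to R", digits=1)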
Table: % of time relative to R
R      bit    merge
100    19.4   21.5
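The idea behind a merge-style range difference can be sketched in base R: instead of materialising the whole range and then dropping positions, sort the exclusions once and emit the runs of integers between them. This is a minimal illustration, not the package's implementation; ‘range_diff_sketch()’ is a hypothetical helper, its ‘mapply()’ loop is written for clarity rather than speed, and it agrees with ‘(a:b)[-i]’ only when ‘a’ is 1, so that positions and values coincide.

```r
# Hypothetical helper (not part of the bit package): the integers in a:b
# that are not in i, built from the runs between the sorted exclusions.
range_diff_sketch <- function(a, b, i) {
  i <- sort(unique(as.integer(i)))
  i <- i[i >= a & i <= b]              # ignore exclusions outside the range
  starts <- c(as.integer(a), i + 1L)   # a run starts right after an exclusion
  ends   <- c(i - 1L, as.integer(b))   # and ends right before the next one
  runs <- mapply(function(s, e) if (s <= e) s:e else integer(0),
                 starts, ends, SIMPLIFY = FALSE)
  unlist(runs, use.names = FALSE)
}

range_diff_sketch(1L, 20L, c(5L, 3L, 9L, 3L))   # 1:20 without 3, 5 and 9
```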
The vignette is compiled with the following performance settings: 5 replications, domain size small = 1000
and big = 10^6, sample size small = 1000 and big = 10^6.
Boolean data types
Figure 1: % size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘rare’ scenario
(panels: size, [], [which], [which]<-TRUE, []<-logical, !, &, |, ==, !=, summary; x axis from 0 to 100)
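The size panels compare storage per element. A logical vector in R spends 32 bits on every element, while a packed bit vector needs only one. The following base R sketch makes that ratio visible with ‘packBits()’ and ‘rawToBits()’; it illustrates bit packing in general and does not reproduce the bit package's internal layout.

```r
# Storage of a logical vector versus the same values packed 8 per raw byte.
n <- 8e6                          # multiple of 8, required by packBits(type = "raw")
x <- logical(n)
x[sample.int(n, 1000)] <- TRUE    # a 'rare' pattern of TRUE values

size_logical <- as.numeric(object.size(x))
size_packed  <- as.numeric(object.size(packBits(x, type = "raw")))

cat("logical:", size_logical, "bytes\n")
cat("packed: ", size_packed, "bytes\n")
cat("ratio:  ", round(size_logical / size_packed, 1), "\n")   # close to 32

# Unpacking restores one logical per bit
x_back <- as.logical(rawToBits(packBits(x, type = "raw")))
```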
Figure 2: % size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘often’ scenario
(panels: size, [], [which], [which]<-TRUE, []<-logical, !, &, |, ==, !=, summary; x axis from 0 to 100)
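In the ‘rare’ and ‘often’ scenarios bitwhich (w) is the smaller type. A plausible reading, assumed here rather than taken from the package source, is that a position-based Boolean stores only the positions of the minority value, which is cheap when that value is uncommon. ‘as_positions()’ and ‘from_positions()’ are hypothetical helpers illustrating that idea; they are not the bitwhich API.

```r
# Hypothetical sketch: store the length, the minority value and its positions.
as_positions <- function(x) {
  if (sum(x) <= length(x) / 2) {
    list(n = length(x), value = TRUE,  pos = which(x))    # 'rare': few TRUE
  } else {
    list(n = length(x), value = FALSE, pos = which(!x))   # 'often': few FALSE
  }
}

# Rebuild the logical vector: fill with the majority value, then set the
# stored minority positions.
from_positions <- function(p) {
  x <- rep(!p$value, p$n)
  x[p$pos] <- p$value
  x
}

x <- rep(TRUE, 1e6)
x[c(10L, 500L, 99999L)] <- FALSE   # an 'often' pattern
p <- as_positions(x)
p$value           # FALSE: the stored positions mark the FALSE elements
length(p$pos)     # 3
```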
Figure 3: % size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘coin’ scenario
(panels: size, [], [which], [which]<-TRUE, []<-logical, !, &, |, ==, !=, summary; x axis from 0 to 150)
% memory consumption of filter
% time extracting
% time assigning
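Extracting or assigning a single element of a packed vector means locating the byte that holds it and the bit offset inside that byte. A minimal sketch on raw bytes, using the least-significant-bit-first order that ‘packBits()’ uses; ‘get_bit()’ and ‘set_bit()’ are hypothetical helpers that handle one scalar index at a time and are not functions of the bit package.

```r
# Hypothetical helpers: read and write bit k of a raw vector p
# (bits stored least significant first, as packBits() does).
get_bit <- function(p, k) {
  byte <- (k - 1L) %/% 8L + 1L
  off  <- (k - 1L) %% 8L
  bitwAnd(as.integer(p[byte]), bitwShiftL(1L, off)) != 0L
}

set_bit <- function(p, k, value) {
  byte <- (k - 1L) %/% 8L + 1L
  mask <- bitwShiftL(1L, (k - 1L) %% 8L)
  old  <- as.integer(p[byte])
  # OR sets the bit, AND with the inverted mask clears it
  p[byte] <- as.raw(if (value) bitwOr(old, mask) else bitwAnd(old, bitwNot(mask)))
  p
}

x <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,
       FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
p <- packBits(x, type = "raw")
get_bit(p, 3L)              # TRUE
p <- set_bit(p, 2L, TRUE)
p <- set_bit(p, 3L, FALSE)
```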
Table 7: % time of logical
% time Boolean OR
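Boolean operators on packed bits handle many elements per machine operation. In base R, ‘!’, ‘&’, ‘|’ and ‘xor()’ act bitwise on raw vectors, which gives a small, self-contained way to check that packed operations agree with the logical ones; equality is obtained as the complement of XOR. This is a byte-level sketch, not the bit package's own code.

```r
# Boolean operations on bits packed into raw bytes (8 elements per byte).
set.seed(1)
n <- 64L                                   # multiple of 8 for packBits()
a <- sample(c(TRUE, FALSE), n, replace = TRUE)
b <- sample(c(TRUE, FALSE), n, replace = TRUE)
pa <- packBits(a, type = "raw")
pb <- packBits(b, type = "raw")

unpack <- function(p) as.logical(rawToBits(p))   # one logical per bit

not_a <- unpack(!pa)
a_and <- unpack(pa & pb)
a_or  <- unpack(pa | pb)
a_xor <- unpack(xor(pa, pb))
a_eq  <- unpack(!xor(pa, pb))              # equality: complement of XOR
```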
            coin    often   rare    chunk
bitwhich    16.7    2.5     2.8     6.2
which       NA      NA      NA      NA
ri          NA      NA      NA      NA

            coin    often
logical     100.0   36.3
bit         49.9    14.4
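A summary such as the number of TRUE values can also work byte by byte: a 256-entry population-count table gives the number of set bits in each possible byte, so counting needs one lookup per 8 elements. This is one common technique, shown under that assumption; it is not presented as the package's method.

```r
# Population-count lookup table: number of set bits for each byte value 0..255
popcount_table <- vapply(0:255, function(v) sum(as.integer(intToBits(v))), integer(1))

# Count TRUE values in a packed raw vector with one table lookup per byte
count_true <- function(p) sum(popcount_table[as.integer(p) + 1L])

set.seed(2)
x <- sample(c(TRUE, FALSE), 8000L, replace = TRUE)
p <- packBits(x, type = "raw")
count_true(p)   # equals sum(x)
```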
Fast methods for integer set operations
Execution time of sort and sortunique for bit (b) relative to R (R), with ‘unsorted’ and ‘sorted’ input
(x axis: execution time, from 0 to 100)
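Sorting integers with a bit vector amounts to marking each value that occurs and reading the marks back in order, which also drops duplicates. The sketch below uses a logical vector as a stand-in for a packed bit vector; ‘bit_sort_unique_sketch()’ is a hypothetical name, not the package's implementation. Its cost grows with the value range, so the approach suits data whose range is not much larger than its length.

```r
# Hypothetical sketch: sort-unique via a presence vector over the value range.
bit_sort_unique_sketch <- function(x) {
  lo <- min(x)
  hi <- max(x)
  seen <- logical(hi - lo + 1L)   # one flag per possible value
  seen[x - lo + 1L] <- TRUE       # mark values that occur (duplicates collapse)
  which(seen) + lo - 1L           # flagged positions in order = sorted values
}

set.seed(3)
x <- sample(1000L, 2000L, replace = TRUE)
head(bit_sort_unique_sketch(x))
```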
Figure 5: Execution time for R, bit and merge relative to the most expensive R in ‘unsorted bigbig’ scenario
(panels: unique, duplicated, anyDuplicated, sumDuplicated, match, in, notin, union, intersect, setdiff, symdiff, setequal, setearly; bars: R, b = bit, m = merge; x axis from 0 to 100)
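The duplicate-related operations can reuse the same kind of presence vector: scanning from left to right, an element is a duplicate if its value has already been marked. In this sketch ‘dup_sketch()’ is hypothetical, the loop favours clarity over speed, and the ‘sumDuplicated’ analogue assumes that it counts the elements flagged by ‘duplicated()’.

```r
# Hypothetical sketch: duplicated() via a presence vector over the value range.
dup_sketch <- function(x) {
  lo <- min(x)
  seen <- logical(max(x) - lo + 1L)   # one flag per possible value
  dup <- logical(length(x))
  for (k in seq_along(x)) {
    j <- x[k] - lo + 1L
    if (seen[j]) {
      dup[k] <- TRUE                  # value seen before: a duplicate
    } else {
      seen[j] <- TRUE                 # first occurrence: mark it
    }
  }
  dup
}

x <- c(5L, 3L, 5L, 8L, 3L, 3L)
d <- dup_sketch(x)
d                              # FALSE FALSE TRUE FALSE TRUE TRUE
match(TRUE, d, nomatch = 0L)   # first duplicate, like anyDuplicated(x): 3
sum(d)                         # number of duplicated elements: 3
```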
Figure 6: Execution time for R, bit and merge in ‘sorted bigbig’ scenario
(panels: unique, duplicated, anyDuplicated, sumDuplicated, match, in, notin, union, intersect, setdiff, symdiff, setequal, setearly; bars: R, b = bit, m = merge; x axis from 0 to 100)
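For sorted inputs, merge-style set operations can walk both vectors once with two cursors, as in the merge step of merge sort. The sketch below assumes sorted vectors without duplicates and returns union, intersection and set difference in a single pass; a symmetric difference follows as the union minus the intersection. ‘merge_setops_sketch()’ is hypothetical, and it grows its results with ‘c()’ for readability, which is quadratic and not how a fast implementation would do it.

```r
# Hypothetical sketch: union, intersect and setdiff of two sorted,
# duplicate-free integer vectors in one two-cursor pass.
merge_setops_sketch <- function(x, y) {
  i <- 1L; j <- 1L
  uni <- integer(0); inter <- integer(0); xdiff <- integer(0)
  while (i <= length(x) && j <= length(y)) {
    if (x[i] < y[j]) {            # only in x
      uni <- c(uni, x[i]); xdiff <- c(xdiff, x[i]); i <- i + 1L
    } else if (x[i] > y[j]) {     # only in y
      uni <- c(uni, y[j]); j <- j + 1L
    } else {                      # in both
      uni <- c(uni, x[i]); inter <- c(inter, x[i]); i <- i + 1L; j <- j + 1L
    }
  }
  # whatever remains in one input cannot occur in the other
  rest_x <- x[seq_len(length(x) - i + 1L) + i - 1L]
  rest_y <- y[seq_len(length(y) - j + 1L) + j - 1L]
  list(union = c(uni, rest_x, rest_y), intersect = inter,
       setdiff = c(xdiff, rest_x))
}

x <- c(1L, 3L, 5L, 7L, 9L)
y <- c(2L, 3L, 4L, 9L, 10L)
r <- merge_setops_sketch(x, y)
r$intersect   # 3 9
```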
              small   big
sort          171.3   637.1
sortunique    100.9   52.9

              small   big
sort          25.1    72.4
sortunique    20.5    13.7

              small   big
bit           160.5   24.5
merge         34.6    9.2
sort          0.0     0.0

Table 17: unsorted data relative to R
              small   big
bit           170.7   17.2
% time for anyDuplicated
              small   big
bit           170.1   31.7
merge         29.9    10.2
sort          0.0     0.0

              small   big
bit           247.3   21.0
merge         274.1   69.4
sort          241.6   63.7

              small   big
bit           153.5   27.5
merge         35.0    11.0
sort          0.0     0.0

              small   big
bit           140.0   13.8
merge         218.4   59.5
sort          187.2   54.2
        smallsmall   smallbig   bigsmall   bigbig
sort    379.4        58.2       113.5      55.0
% time for in
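Membership tests such as ‘%in%’ and its negation can use a presence vector over the range of the lookup table, so each query becomes a range check plus one lookup. A minimal sketch assuming integer inputs without NA; ‘in_sketch()’ is a hypothetical helper, not a bit package function.

```r
# Hypothetical sketch: x %in% tab via a presence vector over range(tab).
in_sketch <- function(x, tab) {
  lo <- min(tab)
  hi <- max(tab)
  seen <- logical(hi - lo + 1L)
  seen[tab - lo + 1L] <- TRUE               # mark every value in tab
  inside <- x >= lo & x <= hi               # values outside the range never match
  res <- logical(length(x))
  res[inside] <- seen[x[inside] - lo + 1L]
  res
}

x <- c(0L, 2L, 5L, 7L, 12L)
tab <- c(7L, 2L, 9L, 2L)
in_sketch(x, tab)    # FALSE TRUE FALSE TRUE FALSE
!in_sketch(x, tab)   # the 'notin' counterpart
```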
Table 31: unsorted data relative to R
Table 37: unsorted data relative to R