Optimizing R VM: Allocation Removal and Path Length Reduction Via Interpreter-Level Specialization
prev node and next node that link all R objects for the
garbage collector.
The VECTOR data structure is used to represent vector
and matrix objects in R. The body of the VECTOR records
vector length information and the data stored in the vector.
Scalar values are represented as vectors of length one.
The SEXPREC data structure is used to represent all R
data types not represented by VECTOR, such as linked lists
and internal R VM data structures such as the local frame.
The body of SEXPREC contains three pointers to SEXPREC
objects: CAR, CDR, and TAG. For instance, a local frame
is represented as a linked list of entries where each entry
contains pointers to a local variable name, the object assigned
to the local variable, and the next entry in the frame.
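
The frame layout described above can be sketched in C as follows. This is a simplified illustration, not the R VM's actual definitions: the `Node` type, the `frame_bind` and `frame_lookup` helpers, and the use of C strings for variable names are all assumptions made for brevity, and the header fields are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for SEXPREC: the three body pointers
   only. The header (sxpinfo, attrib, GC links) is omitted. */
typedef struct Node {
    void        *car;  /* CAR: the object assigned to the variable */
    struct Node *cdr;  /* CDR: the next entry in the frame */
    const char  *tag;  /* TAG: the local variable name */
} Node;

/* Prepend a binding to the frame and return the new frame head. */
static Node *frame_bind(Node *frame, const char *name, void *value) {
    Node *entry = malloc(sizeof *entry);
    assert(entry != NULL);
    entry->car = value;
    entry->cdr = frame;
    entry->tag = name;
    return entry;
}

/* Walk the list; return the bound object, or NULL if the name is absent. */
static void *frame_lookup(const Node *frame, const char *name) {
    for (const Node *e = frame; e != NULL; e = e->cdr) {
        if (strcmp(e->tag, name) == 0) {
            return e->car;
        }
    }
    return NULL;
}
```

In this sketch a variable lookup is a linear walk over the frame, so its cost grows with the number of local variables.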
2.1.3 Memory Management
Memory Allocator The memory allocator pre-allocates
pages of SEXPREC. A request is satisfied just by getting one
free node from a page. The memory allocator also pre-allocates
some small VECTOR objects in different page sizes to
satisfy requests for small vectors. A large vector allocation
request is performed through the system malloc.
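
A minimal sketch of the page-based node allocation described above, under stated assumptions: the names `Cell`, `Page`, `Pool`, `pool_alloc`, and `pool_free` are hypothetical, the page size of four cells is chosen only to keep the example small, and the size classes for small vectors and the malloc path for large vectors are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { NODES_PER_PAGE = 4 };  /* assumed tiny page size, for illustration */

typedef struct Cell {
    struct Cell *next_free;   /* link used only while the cell is free */
    int          payload;     /* placeholder for the node body */
} Cell;

typedef struct Page {
    struct Page *next_page;
    Cell         cells[NODES_PER_PAGE];
} Page;

typedef struct Pool {
    Page  *pages;       /* every page allocated so far */
    Cell  *free_list;   /* free cells across all pages */
    size_t page_count;
} Pool;

/* Pre-allocate one page and thread all of its cells onto the free list. */
static void pool_add_page(Pool *pool) {
    Page *page = malloc(sizeof *page);
    assert(page != NULL);
    page->next_page = pool->pages;
    pool->pages = page;
    pool->page_count++;
    for (int i = 0; i < NODES_PER_PAGE; i++) {
        page->cells[i].next_free = pool->free_list;
        pool->free_list = &page->cells[i];
    }
}

/* Satisfy a request by popping one free cell; add a page if none is left. */
static Cell *pool_alloc(Pool *pool) {
    if (pool->free_list == NULL) {
        pool_add_page(pool);
    }
    Cell *cell = pool->free_list;
    pool->free_list = cell->next_free;
    return cell;
}

/* Return a cell to the free list so a later request can reuse it. */
static void pool_free(Pool *pool, Cell *cell) {
    cell->next_free = pool->free_list;
    pool->free_list = cell;
}
```

The point of the design is that the common case, a request for one node, costs a pointer pop rather than a call into the system allocator.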
Garbage Collector R VM performs automatic garbage collection
(GC) with a stop-the-world, multi-generation collector.
The mark phase traverses all the objects through the link
pointers of the object headers. Dead objects are then
compacted to free pages. Dead large vectors are freed and
returned to the system.

SEXPREC                          VECTOR_SEXPREC
  SEXPREC_HEADER                   SEXPREC_HEADER
    sxpinfo_struct sxpinfo           sxpinfo_struct sxpinfo
    SEXPREC* attrib                  SEXPREC* attrib
    SEXPREC* pre_node                SEXPREC* pre_node
    SEXPREC* next_node               SEXPREC* next_node
  SEXPREC* CAR                     R_len_t length
  SEXPREC* CDR                     R_len_t truelength
  SEXPREC* TAG                     Vector raw data
Figure 6. Internal representation of R Objects.
Copy-on-write Every named object in R is a value object
(i.e., immutable). If we assign one variable to another,
the semantics of R specify that the value of the first
variable is copied and the copy becomes the value of the
second. R implements copy-on-write to reduce
the number of copy operations. There is a named tag in the
object header, with three possible values: 0, 1, and 2. Values
0 and 1 mean that only one variable points to the object
(value 1 is used to handle a special intermediate state; see
footnote 2). By
default the named value is 0. When the variable is assigned
to another variable, which means more than one variable
points to the same underlying object, the object's named tag
is changed to 2.
When an object is to be modified, the named tag is consulted.
If the value is 2, the runtime first copies the object,
and then modifies the newly copied object. Because the
runtime cannot tell whether more than one variable still
points to the original object, named remains 2 in the original object.
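
The named-tag protocol above can be sketched as follows. This is an illustration under assumptions, not the R VM's code: the `Vec` type and the `vec_new`, `vec_assign`, and `vec_set` helpers are hypothetical, the intermediate value 1 is not modeled, and a fresh copy is given named 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vector object carrying only the fields the sketch needs. */
typedef struct Vec {
    int     named;   /* 0 or 1: one variable; 2: possibly shared */
    size_t  length;
    double *data;
} Vec;

static Vec *vec_new(size_t length) {
    Vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->named = 0;    /* default named value */
    v->length = length;
    v->data = calloc(length ? length : 1, sizeof *v->data);
    assert(v->data != NULL);
    return v;
}

/* Models "b <- a": no copy is made; the shared object is tagged 2. */
static Vec *vec_assign(Vec *src) {
    src->named = 2;
    return src;
}

/* Models "v[i] <- x": consult the tag and copy first if it is 2.
   Returns the object the modified variable should refer to afterwards. */
static Vec *vec_set(Vec *v, size_t i, double x) {
    assert(i < v->length);
    if (v->named == 2) {
        Vec *copy = vec_new(v->length);
        memcpy(copy->data, v->data, v->length * sizeof *v->data);
        v = copy;    /* the original object keeps named == 2 */
    }
    v->data[i] = x;
    return v;
}
```

Because the tag on the original never drops back below 2 in this sketch, a later write through the one remaining variable still pays for a copy, which is the conservative behavior the paragraph above describes.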
2.2 Performance of Type I Codes
Type I R programs suffer from many performance problems
that they share with other interpreted dynamic
scripting languages, including