Os Unit4 PDF
Storage Management
Lecture 21
November 11, 2004
The A Machine
In order to extend the semantics of the E machine with transition rules for
automatic storage management, we must enrich our model of expressions,
values, and program states. For the purposes of our discussion today, we
will use a version of MinML that includes integers, functions, and lists. As
we alluded to above, in order to provide a framework for automatic storage
management, the A machine will distinguish small values from large values,
as follows.
Small Values v ::= num(n) | nil
Large Values w ::= ⟨⟨η; e⟩⟩ | cons(v1, v2)
Closures and cons cells (i.e. large values) will not be stored directly in the
stack or environment; instead we will use locations to refer to them indi-
rectly. As in our formulation of references, locations (denoted syntactically
as l) will not appear in the concrete syntax.
Locations l
Expressions e ::= . . . | loc(l)
Small Values v ::= . . . | loc(l)
Frames f and stacks k are given as before, except that small values take
the place of values.
[1] The heap is similar in notion to the store as it appeared in our discussion of mutable
references; however, while the store may be updated by assignment, the heap is immutable
from the programmer’s perspective.
Recall that environment frames k ▸ η on the stack are popped when values
are returned past them, and that variables are looked up in the environments
on the stack from right to left (see also Assignment 4 and the code in
the sample solution). We will now return to the question, when can values
safely be removed from the heap? [2]
(let p = cons(3,cons(4,nil)) in
case p of nil => 2
| cons(n,k) = p in
[a] fn x => n
end
end [b]) 7 [c];
If we allocate p as described above, when is it safe to free it? At point [a]?
[b]? [c]? We would like to release the storage associated with a location
[2] Though if we recall our original question with respect to references, we should note
that the ideas described here can also be extended to encompass mutable storage.
      FL(H, k, η) = ∅
------------------------------- (?)
H ∪ H′ ; k > e ↦a H ; k > e
Recall that this rule was deficient in its inability to reclaim (unreachable)
cycles in H. For the time being, however, we will tackle a larger problem:
how can we separate H from H′?
Tracing Collection
At the most abstract level, the garbage collector has to traverse the stack
k and follow chains of location pointers in the heap in order to see which
locations may still be relevant to the evaluation of e in k. Note that an
expression e may contain free variables (which will be bound to small values
in the environments in k), but never free locations. This means we don’t have
to traverse e to see which heap cells may be “live” for the current
computation. This general technique is called tracing. We now describe a tracing
collector using our notation of judgments. In what follows we describe
more concrete realizations of this general idea that are closer to what actual
implementations do.
The state of the garbage collector has the form Hf ; k ; Ht where Hf
is the so-called from-space that we are traversing and Ht is the so-called
to-space where we move reachable locations found in Hf . Since locations
remain abstract, we simply move them from Hf to Ht . The judgment above
is invoked in the following way:
[3] “Conservative” is also, somewhat erroneously, used to describe garbage collection in
the presence of incomplete knowledge of the structure of the stack or heap (e.g. as in an
implementation of C).
H ; k ; · ↦*g Hf ; • ; H′
---------------------------
H ; k > e ↦a H′ ; k > e
That is, we start the garbage collector with the current heap H as the
from-space and an empty to-space. Then we trace k and H, moving lo-
cations to the to-space until the stack is empty and we can return to the
normal evaluation.
Note that this rule can apply whenever we are in the process of evalu-
ating an expression. In a more realistic scenario the garbage collector either
starts when we run out of space or acts concurrently on the heap.
Next we describe the rules for garbage collection, using single-step
transitions. We use the stack k as a “stack”, pushing onto it those portions of the
small values that we may still have to trace. Since a stack cannot have
values on it directly, only environments, we will use environments with
anonymous variables. Recall the invariants on expressions (only free variables, no
locations), environments (bind variables to small values) and heaps (bind
locations to large values).
Hf ; k ▹ cons(□, e2) ; Ht  ↦g  Hf ; k ; Ht
Hf ; k ▹ cons(v1, □) ; Ht  ↦g  Hf ; k ▸ (_=v1) ; Ht
Hf ; k ▹ case(□, e2, x.y.e3) ; Ht  ↦g  Hf ; k ; Ht
Hf ; k ▸ · ; Ht  ↦g  Hf ; k ; Ht
Hf ; k ▸ (η, x=nil) ; Ht  ↦g  Hf ; k ▸ η ; Ht
(Hf, l=cons(v1, v2)) ; k ▸ (η, x=l) ; Ht  ↦g
    Hf ; k ▸ (η, _=v1, _=v2) ; (Ht, l=cons(v1, v2))
Hf ; k ▸ (η, x=l) ; (Ht, l=w)  ↦g  Hf ; k ▸ η ; (Ht, l=w)
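The rules above can be read as a worklist algorithm. The following Python sketch is one possible reading, not a definitive implementation, and it makes several assumptions: heaps are dictionaries from locations to cons cells, small values are Python integers, None (standing for nil), or locations, and the stack is flattened into a list of small values that still have to be traced (the anonymous bindings). Closures are left out. The names Loc, Cons, and trace are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Loc:
    """A heap location l, used as the small value loc(l)."""
    name: str


# Small values: integers for num(n), None for nil, or a location.
Small = Union[int, None, Loc]


@dataclass(frozen=True)
class Cons:
    """A large value cons(v1, v2); both fields hold small values."""
    head: Small
    tail: Small


def trace(from_space: Dict[Loc, Cons], roots: List[Small]) -> Dict[Loc, Cons]:
    """Move every cell reachable from roots into a fresh to-space."""
    from_space = dict(from_space)      # work on a copy of the caller's heap
    to_space: Dict[Loc, Cons] = {}
    pending = list(roots)              # plays the role of the bindings on k
    while pending:
        v = pending.pop()
        if not isinstance(v, Loc):
            continue                   # nil or a number: nothing to trace
        if v in to_space:
            continue                   # already moved: just drop the binding
        cell = from_space.pop(v)       # remove l from from-space ...
        to_space[v] = cell             # ... and add it to to-space
        pending.extend([cell.head, cell.tail])   # trace its fields later
    return to_space


# Usage: l3 is unreachable from the root l1 and is therefore not moved.
l1, l2, l3 = Loc("l1"), Loc("l2"), Loc("l3")
heap = {l1: Cons(3, l2), l2: Cons(4, None), l3: Cons(9, None)}
live = trace(heap, [l1])
```

Whatever remains in the local copy of from-space when the worklist is empty corresponds to the garbage that the collector discards.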
Copying Collection
We now give a slightly lower level view of garbage collection where both
from-space and to-space are actually regions in memory whose cells are
addressed by integers. In this case, we actually divide the whole available
memory into two disjoint regions: one that the evaluator uses, and one that
is reserved for the time that we need to call the collector.
Heap cells are allocated from lower to higher addresses, using a spe-
cial next pointer to keep track of the next available address. The garbage
collector is invoked when we are attempting to use more than half of the
available space.
We then trace the stack and the cells in from-space, moving the cell con-
tents to to-space as we encounter them. Of course, references to memory in
the stack need to be updated to point to the new locations of the cells.
Moreover, we need to account for multiple pointers to the same
location. In order to preserve sharing, we replace the cell content by a
forwarding pointer that goes from from-space to to-space. When we encounter a
forwarding pointer while tracing the heap, we just update the pointer in
the stack to the destination of the forwarding pointer.
Once the whole stack has been traced, all reachable cells have been
moved to the beginning of the to-space. At this point we flip the roles
of the two semi-spaces and resume evaluation.
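As a minimal sketch of the copying step, assume that memory is a Python list indexed by integer addresses, that a cons cell is stored as a pair (head, tail), and that the stack is a list of small values acting as roots. The classes Ptr and Fwd and the function copy_collect are hypothetical names chosen for this illustration. The sketch uses recursion for brevity, so it consumes extra stack space while tracing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    """A small value that points at the heap cell stored at address addr."""
    addr: int


@dataclass(frozen=True)
class Fwd:
    """Forwarding pointer left in a from-space cell once it has been moved."""
    addr: int


def copy_collect(mem, roots, to_base):
    """Copy the cells reachable from roots into mem[to_base:], update roots
    in place, and return the new next pointer."""
    next_free = to_base

    def forward(v):
        nonlocal next_free
        if not isinstance(v, Ptr):
            return v                     # numbers and nil (None) are not traced
        cell = mem[v.addr]
        if isinstance(cell, Fwd):
            return Ptr(cell.addr)        # already moved: reuse its destination
        new_addr = next_free
        next_free += 1
        mem[v.addr] = Fwd(new_addr)      # install the forwarding pointer first
        head, tail = cell
        mem[new_addr] = (forward(head), forward(tail))
        return Ptr(new_addr)

    for i, v in enumerate(roots):
        roots[i] = forward(v)
    return next_free


# Usage: addresses 0..3 are from-space, 4..7 are to-space.
HALF = 4
mem = [None] * (2 * HALF)
mem[0] = (4, None)          # cons(4, nil)
mem[1] = (9, None)          # unreachable cell
mem[2] = (3, Ptr(0))        # cons(3, <cell at address 0>)
roots = [Ptr(2), Ptr(2)]    # two stack references to the same cell
next_free = copy_collect(mem, roots, to_base=HALF)
```

Both roots end up pointing at the same to-space cell, so sharing is preserved. After the call, the evaluator would continue allocating at next_free in what used to be the to-space.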
A pictorial example of copying collection can be found in Figure 1. The
contents of blank cells are irrelevant for the purposes of the garbage
collection algorithm. They will never be visited because tracing never reaches
them.
There are many refinements of copying collection. For example, in
order to avoid using additional stack space for tracing, we use a second
pointer in to-space so that we always know we still have to trace the region
between this second pointer and the next pointer. In essence, we use the
heap as a kind of special purpose stack.
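The second-pointer refinement can be sketched as follows, under the same assumptions as before: memory is a Python list, a cons cell is a pair (head, tail), and the roots are a list of small values. The names Ptr, Fwd, scan, and scan_collect are hypothetical. Cells are first copied verbatim; the region between scan and next_free holds cells whose fields may still point into from-space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    """A small value that points at the heap cell stored at address addr."""
    addr: int


@dataclass(frozen=True)
class Fwd:
    """Forwarding pointer left in a from-space cell once it has been moved."""
    addr: int


def scan_collect(mem, roots, to_base):
    """Copying collection without an auxiliary stack; returns the next pointer."""
    scan = next_free = to_base

    def forward(v):
        nonlocal next_free
        if not isinstance(v, Ptr):
            return v                     # numbers and nil (None) are not traced
        cell = mem[v.addr]
        if isinstance(cell, Fwd):
            return Ptr(cell.addr)        # already moved
        new_addr = next_free
        mem[new_addr] = cell             # copy verbatim; fields are fixed later
        mem[v.addr] = Fwd(new_addr)
        next_free += 1
        return Ptr(new_addr)

    for i, v in enumerate(roots):
        roots[i] = forward(v)
    while scan < next_free:              # cells in [scan, next_free) are untraced
        head, tail = mem[scan]
        mem[scan] = (forward(head), forward(tail))
        scan += 1
    return next_free


# Usage: addresses 0..3 are from-space, 4..7 are to-space.
mem = [None] * 8
mem[0] = (4, None)          # cons(4, nil)
mem[1] = (9, None)          # unreachable cell
mem[2] = (3, Ptr(0))        # cons(3, <cell at address 0>)
roots = [Ptr(2)]
next_free = scan_collect(mem, roots, to_base=4)
```

The loop terminates when scan catches up with next_free, at which point no copied cell refers to from-space any more.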
Other refinements include incremental collection, where we do not
completely stop the running program but interleave actions of the garbage
collector with actions of the running program, and generational collection, where
we collect smaller parts instead of the whole semi-space all at once.
Mark-and-Sweep Collection
Another important algorithm for garbage collection is mark-and-sweep,
even though it seems to have fallen into disfavor more recently.
A mark-and-sweep collector does not divide the heap into two
semi-spaces, but reserves an additional bit for each heap cell called a mark.
Initially all heap cells are unmarked, and the heap is arranged into a linked
list of cells called the free list. When we allocate an element from the heap
we take the first element from the free list and update the free list pointer
to its next element.
When the free list becomes empty, we have to invoke the garbage
collector. It traces the heap, starting from the stack, much in the same way as the
copying collector. However, rather than copying heap elements it marks
them as being reachable.
In a second phase the garbage collector sweeps through the whole mem-
ory (not just the reachable cells). During this sweep it adds any unmarked
cells to the free list and removes the mark from any marked cells.
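The two phases can be sketched in Python as follows. This is only an illustration under assumptions of our own: cells live in a list, a cons cell is a pair (head, tail), a free cell stores the address of the next free cell, and the roots are passed to the allocator explicitly. The names Ptr, Free, and MarkSweepHeap are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ptr:
    """A small value that points at the heap cell stored at address addr."""
    addr: int


@dataclass
class Free:
    """A cell on the free list; next is the address of the next free cell."""
    next: Optional[int]


class MarkSweepHeap:
    def __init__(self, size):
        # Initially every cell is unmarked and linked into the free list.
        self.cells = [Free(i + 1 if i + 1 < size else None) for i in range(size)]
        self.marked = [False] * size
        self.free_head = 0 if size > 0 else None

    def alloc(self, head, tail, roots):
        if self.free_head is None:
            # The fields of the new cell must be treated as roots as well.
            self.collect(list(roots) + [head, tail])
            if self.free_head is None:
                raise MemoryError("heap exhausted")
        addr = self.free_head
        self.free_head = self.cells[addr].next
        self.cells[addr] = (head, tail)
        return Ptr(addr)

    def collect(self, roots):
        # Mark phase: mark every cell reachable from the roots.
        pending = [v for v in roots if isinstance(v, Ptr)]
        while pending:
            p = pending.pop()
            if self.marked[p.addr]:
                continue
            self.marked[p.addr] = True
            pending.extend(f for f in self.cells[p.addr] if isinstance(f, Ptr))
        # Sweep phase: visit all of memory, not just the reachable cells.
        for addr in range(len(self.cells)):
            if self.marked[addr]:
                self.marked[addr] = False        # reset for the next collection
            elif not isinstance(self.cells[addr], Free):
                self.cells[addr] = Free(self.free_head)
                self.free_head = addr


# Usage: the fourth allocation finds the free list empty and triggers a collection.
heap = MarkSweepHeap(3)
a = heap.alloc(4, None, roots=[])
b = heap.alloc(3, a, roots=[])
heap.alloc(9, None, roots=[])       # result dropped, so this cell is garbage
c = heap.alloc(7, b, roots=[b])     # reuses the address of the garbage cell
```

Unlike the copying collector, no cell changes its address, so pointers held on the stack never need to be updated.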
A graphic example of mark-and-sweep collection can be found in Fig-
ure 2.
In Assignment 8 you have the opportunity to compare copying and
mark-and-sweep collection and assess their relative merits, so we will not
give a detailed analysis here. One advantage of copying collection that
your analysis will probably not be able to reveal is locality. When copy-
ing, we actually move the elements of the data structures closer together, at
the beginning of the to-space. This means better cache behavior, which can
have a dramatic impact on running times on modern machine architectures.
As a result, even for mark-and-sweep garbage collectors, some algorithms
for compacting memory have been developed to avoid the natural
fragmentation of the heap.
Reference Counting
In a reference counting garbage collector every cell has a counter associated
with it that tracks the number of references to it. When we allocate a cell,
this counter is initialized to 1. Operations of the (abstract) machine need
to maintain these counters. As soon as one of them becomes 0, the cell is
deallocated and the reference counts of the cells that it might point to are
decremented, leading perhaps to further garbage collection.
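A minimal Python sketch of this bookkeeping follows. It assumes that storing a reference in a field of a newly allocated cell counts as an additional reference, and that "deallocation" simply sets a flag. The names RcCell, incref, and decref are hypothetical.

```python
def incref(v):
    """Record one more reference to v, if v is a heap cell."""
    if isinstance(v, RcCell):
        v.count += 1


def decref(v):
    """Drop one reference to v; free cells whose count reaches 0, then
    decrement the counts of the cells they point to."""
    pending = [v]
    while pending:
        c = pending.pop()
        if not isinstance(c, RcCell):
            continue                    # numbers and nil (None) have no counter
        c.count -= 1
        if c.count == 0:
            c.freed = True              # stands in for returning the cell
            pending.extend((c.head, c.tail))


class RcCell:
    def __init__(self, head, tail):
        self.head, self.tail = head, tail
        self.count = 1                  # the reference handed to the caller
        self.freed = False
        incref(head)                    # the new cell now refers to its fields
        incref(tail)


# Usage
a = RcCell(4, None)     # count(a) = 1
b = RcCell(3, a)        # count(b) = 1, count(a) = 2
decref(a)               # drop the variable a: count(a) = 1, a is still in use
decref(b)               # count(b) = 0, so b is freed, and then a as well
```

The worklist in decref makes the cascade explicit: freeing b releases its reference to a, which in turn brings the count of a to 0.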
Reference counting is suspect for the heaps of functional languages
because of the overhead of maintaining the reference counts, and because it
does not work properly with circular structures, which prevent reference
counts from going to 0! However, there are many less general situations
where reference counts are appropriate, such as file descriptors in an
operating system, or channels for communication in a distributed environment.
In those situations, the overhead of maintaining reference counts is small,
while a tracing collector would be hard or impossible to implement because
we may not know or even have access to the internals of all processes that
may access a resource.
[Figure 1: Copying collection. Figure not reproduced.]
[Figure 2: Mark-and-sweep collection. Figure not reproduced.]