Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!news2.near.net!MathWorks.Com!europa.eng.gtefsd.com!howland.reston.ans.net!swrinde!elroy.jpl.nasa.gov!decwrl!adobe!macb022.mv.us.adobe.com!user
From: mhamburg@mv.us.adobe.com (Mark Hamburg)
Subject: Re: The expense of call/cc (was R4RS)
Message-ID: <mhamburg-061094091103@macb022.mv.us.adobe.com>
Followup-To: comp.lang.scheme
Sender: usenet@adobe.com (USENET NEWS)
Organization: Adobe Systems, Inc.
References: <19941003T142911Z.enag@gyda.ifi.uio.no> <hbakerCx3zM5.FKn@netcom.com> <19941004T194114Z.enag@hnoss.ifi.uio.no> <mhamburg-051094101457@macb022.mv.us.adobe.com> <36ut56$835@larry.rice.edu>
Date: Thu, 6 Oct 1994 17:11:02 GMT
Lines: 111

In article <36ut56$835@larry.rice.edu>, drichter@owlnet.rice.edu (David
Richter) wrote:

> In article <mhamburg-051094101457@macb022.mv.us.adobe.com>, mhamburg@mv.us.adobe.com (Mark Hamburg) writes:
> |> blume@beth.cs.princeton.edu (Matthias Blume) writes:
> |> 
> |> > Why does this argument keep popping up again and again?!  I mean the
> |> > one about call/cc.  There is still no clear evidence that call/cc
> |> > hinders efficient compilation.  In fact, SML/NJ shows that such an
> |> > implementation can be both simple and efficient.
> |> When I heard about Shao and Appel's papers on heap allocation being as
> |> cheap as stack allocation, I eagerly pursued them.  Unfortunately, what it
> |> really boils down to is that heap allocation is as cheap as stack
> |> allocation if we have to implement general call/cc.  This assertion is
> |> interesting but is not necessarily relevant when worrying about whether to
> |> include generalized call/cc in a LISP implementation.
> 
> Rather, what it boils down to is this: if the choice is a stack with
> traditional push/pop usage vs. allocating frames on a
> copying-garbage-collected heap, the latter is no worse and sometimes
> better.
> 
> Regardless of whether the language has call/cc.
> 
> The basic argument is that pushing a frame and allocating a frame have
> the same expense, but popping a frame always costs at least one
> instruction (decrement stack pointer) while garbage-collecting that
> frame (using a copying garbage collector) is (asymptotically) free.
> 
> Where call/cc comes in is as follows: if the frames are heap
> allocated, then they are easily made persistent (as opposed to the
> ephemeral frames of a traditional stack).  If frames are persistent,
> then call/cc is trivial to implement (just change the frame pointer
> and jump).
> 
> David

Repeating the instruction cost table from Appel and Shao's paper (costs
in instructions per frame):

                    Heap        Stack

Frame creation       3.1          1.0
Frame pointers       2.0          0.0
Copying, sharing     0.0          3.4
Cache read misses    0.1          0.0
Disposal (pop)       0.2          1.0

I have left out a possible charge for cache write misses, which would
cost the heap strategy 10 instructions while being free for the stack
implementation, because it afflicts only some architectures.

Now, some words about where these costs come from (summarizing Appel and
Shao):

Frame creation: (a) In addition to advancing a pointer (same cost for heap
and stack), the heap allocator needs to check for heap overflow.  Stack
overflow is generally detected using the memory protection hardware because
it is expected to be rare -- i.e. the program has probably gone into an
infinite regress.  Appel has retreated from his stance that memory
protection be used to detect heap overflow.  (b) The heap allocator needs
to write a descriptor word into the heap allocated frame for the garbage
collector.  The stack based approach generally relies on knowing how to
find return addresses (it can do so because the stack consists only of
activation records) and then mapping these return addresses to frame
descriptors.  (c) Generally, the instruction which advances the free
pointer in the heap is not also the one that sets the frame pointer.

Frame pointers: This is the problem of resetting the frame pointer when we
return from a function.  On the stack, this can be done as part of the
frame pop (charged elsewhere).  In the heap, we need to store the frame
pointer when we create a frame and fetch it when we return.

Copying, sharing: This reflects the cost inherent in building first-class
closures in a stack based world where closures cannot point to frames. The
figure they present is based on looking at the cost incurred in SML/NJ.  (I
would note here that SML/NJ compiles using continuation passing style and
hence is probably relatively closure intensive.)  Appel and Shao separately
analyze the cost of call/cc for stack (expensive) vs. heap (free).

Cache read misses: This is basically the charge for the heap running
through a larger region of memory than does the stack and hence suffering
from more misses in the primary cache.

Disposal: The costs of disposal on the heap are relatively minimal with a
generational scavenger and come out to less than the pop instruction for
the stack.  The use of a copying generational scavenger imposes a need to
avoid writing into heap allocated frames after they have been created, or
else suffer the write barrier penalty for stores into older generations.
This "cost" can be minimized -- assuming that it is a reasonably rare
event -- if you are running under an OS that allows user programs to
manipulate the paging/protection tables.  This restriction is not a
problem for ML but could be a problem for LISP -- though probably less so
in the average Scheme program.
Scheme program.  Copying collectors also complicate the interface to
foreign code since we can't readily update the pointers handed off to that
code.

In total: 5.4 instructions/frame for each strategy, but most of this cost
for the stack allocated frames is associated with closure construction.
Most closures do not need to be first-class and hence really could live
on the stack.  This is particularly true in a LISP system with objects,
since closures will no longer be used to implement objects.  Hence, I
suspect that the 3.4 instruction figure could actually be reduced through
a careful design that delayed closure construction until a reference to
the closure was potentially being stored somewhere that would let it live
beyond the dynamic extent imposed by a stack based discipline.  Liberal
use of general call/cc adds significantly to the cost of a stack based
discipline.  However, support for non-local exits and threading (i.e.,
multiple stacks) would probably mitigate much of the need for call/cc.

For more information, I encourage you to read Appel and Shao's paper.  It
is available via ftp from ftp.cs.princeton.edu.
