Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!das-news.harvard.edu!news2.near.net!MathWorks.Com!europa.eng.gtefsd.com!howland.reston.ans.net!spool.mu.edu!news.cs.indiana.edu!bruggema@cs.indiana.edu
From: "Carl Bruggeman" <bruggema@cs.indiana.edu>
Subject: Re: The expense of call/cc (was R4RS)
Message-ID: <1994Oct8.115151.17399@news.cs.indiana.edu>
Organization: Computer Science, Indiana University
References: <19941003T142911Z.enag@gyda.ifi.uio.no> <mhamburg-051094101457@macb022.mv.us.adobe.com> <36ut56$835@larry.rice.edu> <mhamburg-061094091103@macb022.mv.us.adobe.com>
Date: Sat, 8 Oct 1994 11:51:44 -0500
Lines: 56

In article <mhamburg-061094091103@macb022.mv.us.adobe.com>,
Mark Hamburg <mhamburg@mv.us.adobe.com> wrote:
>Repeating the instruction cost table from Appel and Shao's paper:
>
>                   Heap        Stack
>
>Frame creation      3.1          1.0
>Frame pointers      2.0          0.0
>Copying, sharing    0.0          3.4
>Cache read misses   0.1          0.0
>Disposal (pop)      0.2          1.0
>
>I have left out the possibility of a charge for cache write misses which
>costs the heap strategy 10 instructions and is free for the stack
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>implementation because it only afflicts some architectures.

The problem is that any cache system that reads the existing cache
line from memory when a write miss occurs _will_ incur this penalty.
Although the first-level cache system on many top-of-the-line (and
future) processors may avoid this problem, the vast majority of
workstations in use today will read the line while processing a write
miss.  It is even possible that many low-end systems in the future
will still behave this way because it is inherently simpler to
implement.  The bottom line is that implementations with heap-based
frames are sensitive to cache implementation issues, while
implementations with stack-based frames are not.  This implies that
implementations with heap-based frames are not portable
(performance-wise) to as wide a range of platforms as are
implementations with stack-based frames.
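
To make the arithmetic concrete, here is a rough sketch that totals the
table as quoted above, with and without the write-miss charge.  The
only numbers in it are the ones from the quoted table and the
10-instruction figure from the quoted text; the names (COSTS,
WRITE_MISS, total) are mine, purely for illustration.

```python
# Per-frame instruction costs, copied from the table quoted above.
COSTS = {
    "frame creation":    {"heap": 3.1, "stack": 1.0},
    "frame pointers":    {"heap": 2.0, "stack": 0.0},
    "copying, sharing":  {"heap": 0.0, "stack": 3.4},
    "cache read misses": {"heap": 0.1, "stack": 0.0},
    "disposal (pop)":    {"heap": 0.2, "stack": 1.0},
}

# Write-miss charge as stated in the quoted text:
# 10 instructions for the heap strategy, free for the stack.
WRITE_MISS = {"heap": 10.0, "stack": 0.0}


def total(strategy, write_miss_penalty=False):
    """Sum the per-frame cost for one strategy, optionally adding
    the write-miss charge."""
    cost = sum(row[strategy] for row in COSTS.values())
    if write_miss_penalty:
        cost += WRITE_MISS[strategy]
    return round(cost, 1)


for strategy in ("heap", "stack"):
    print(strategy, total(strategy), total(strategy, write_miss_penalty=True))
```

Without the write-miss charge the two columns come out equal at 5.4
instructions per frame.  On a machine that does pay the charge, the
heap total becomes 15.4 while the stack total stays at 5.4.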

Another problem is that counting instructions may not give a very
accurate cost analysis for super-scalar processors.  For example, the
combined frame creation and disposal cost for a stack is given as 2.0
(1.0 + 1.0), the same as the cost for manipulating frame pointers in
the heap case.  But the two instructions for the stack implementation
are simple "add a constant to a register" operations, which are easy
to schedule and issue with other instructions, while the two
instructions for frame pointer manipulation (for heap-based frames)
are memory operations, which are more difficult to schedule and thus
may take more cycles (though not more instructions).
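
A toy model of the point, for what it is worth.  The per-class cycle
weights below are HYPOTHETICAL values I made up for illustration, not
measurements from any processor or from the paper; only the two
instruction counts come from the quoted table.

```python
# HYPOTHETICAL effective cycles per instruction on a super-scalar
# machine.  Illustrative assumptions only, not measured values.
EFFECTIVE_CYCLES = {
    "reg_add": 0.5,    # assumed: often issues alongside another instruction
    "memory_op": 1.5,  # assumed: harder to schedule, may stall
}


def cycles(instruction_mix):
    """Weight each instruction count by its assumed cycle cost."""
    return sum(EFFECTIVE_CYCLES[kind] * count
               for kind, count in instruction_mix.items())


# Instruction counts from the quoted table.
stack_create_dispose = {"reg_add": 2.0}    # 1.0 creation + 1.0 disposal
heap_frame_pointers = {"memory_op": 2.0}   # 2.0 frame pointer manipulation

print(cycles(stack_create_dispose))
print(cycles(heap_frame_pointers))
```

Both mixes count as 2.0 instructions, yet under these assumed weights
the memory operations cost three times as many cycles.  The real
ratio depends entirely on the processor, which is the point: it has
to be measured.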

Appel's paper clearly demonstrates that we cannot simply dismiss the
idea of heap-based frames as "obviously" less efficient than
stack-based frames.  The issues must be examined and measured much
more carefully.  Although Appel presents an interesting set of numbers
showing that the respective costs (on the proper platforms) may be
quite similar, I think that it is premature for non-research-oriented
implementers to assume that heap-based frames have approximately the
same cost as stack-based frames.


Carl Bruggeman

bruggema@cs.indiana.edu


