Newsgroups: comp.lang.scheme
From: bakul@netcom.com (Bakul Shah)
Subject: Re: On-the-fly code generation
Message-ID: <bakulD6DK6n.G9s@netcom.com>
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
References: <bakulD69s9w.44D@netcom.com> <3lh3nm$l6t@bell.maths.tcd.ie>
Date: Sat, 1 Apr 1995 21:11:11 GMT
Sender: bakul@netcom16.netcom.com


>bakul@netcom.com (Bakul Shah) writes:
>>        It seems to me this is a fairly straight-forward
>> task that will provide a considerable performance
>> improvement.  

bosullvn@maths.tcd.ie (Bryan O'Sullivan) writes:
>Well, it depends on what you mean by "on-the-fly".  If you just mean
>some sort of scheme that treats your bytecode representation as a
>"portable" low-level representation a la ANDF, then yes, things are
>pretty straightforward.

You are right that a straightforward implementation will have to
use something low-level.  But I believe even this simple-minded
translation will improve performance considerably.  I recall
reading how replacing a switch statement with code to jump
directly to a particular case (using gcc's labels-as-values
extension) almost doubled the performance of a Smalltalk virtual
machine.  A dumb translator to native code removes even that jump
(on the other hand, the larger code may bust your cache).  You can
also keep more things in registers.  The bytecode can be designed
to make such translation efficient (e.g. by assuming a certain
number of registers).
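To make the dispatch point concrete, here is a minimal sketch of a toy stack machine written twice: once with a switch inside a loop, and once with gcc's labels-as-values extension (`&&label` and `goto *ptr`, which clang also accepts), where every handler ends in its own indirect jump to the next handler.  The opcodes and function names are made up for this example and are not taken from any real VM; how much the second form gains depends on the compiler and the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Switch dispatch: after each instruction control returns to the loop
   head and goes through the switch again. */
long run_switch(const long *code)
{
    long stack[64];
    size_t sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: assert(sp == 1); return stack[0];
        default:      assert(!"unknown opcode"); return 0;
        }
    }
}

/* Computed-goto dispatch (GCC/clang extension): each handler jumps
   straight to the handler of the next opcode via a label table.
   The table order must match the enum above. */
long run_threaded(const long *code)
{
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[64];
    size_t sp = 0;
    const long *pc = code;
#define DISPATCH() goto *labels[*pc++]
    DISPATCH();
do_push: stack[sp++] = *pc++;                DISPATCH();
do_add:  sp--; stack[sp - 1] += stack[sp];   DISPATCH();
do_mul:  sp--; stack[sp - 1] *= stack[sp];   DISPATCH();
do_halt: assert(sp == 1); return stack[0];
#undef DISPATCH
}
```

For the program {OP_PUSH, 6, OP_PUSH, 7, OP_MUL, OP_PUSH, 2, OP_ADD, OP_HALT} both functions return 44.  A translator to native code would go one step further and drop the remaining indirect jump by laying the handler bodies out in program order.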

>                         However, it doesn't conceptually take much to
>go from this approach to generating chunks of code on a demand-driven
>basis and use information you have discovered at runtime to further
>optimise your code (by moving runtime constants into your code,
>eliminating newly-discovered dead code, &sw.), but this is rather a lot
>harder to do well on modern architectures with their split multi-level
>caches and deep pipelines.

You may need OS support or some messy code to deal with
system-specific cache/pipeline issues, but I don't see how that
affects things like dead code elimination and other peephole
optimizations.
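As an illustration that such optimizations work purely on instruction sequences, with no knowledge of caches or pipelines, here is a sketch of a peephole pass over a hypothetical stack bytecode.  It folds constant arithmetic and drops a push whose value is immediately popped.  Once a value discovered at runtime is emitted as a constant push, the same folding applies, which is one simple form of moving runtime constants into the code.  The instruction set and names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set (illustration only). */
enum { OP_PUSH, OP_POP, OP_ADD, OP_MUL, OP_HALT };

struct insn { int op; long arg; };

/* Copy instructions to `out` and, after each copy, try to rewrite the
   tail of `out`.  Because rewrites happen on the output buffer, one
   fold can expose the next (PUSH 1; PUSH 2; ADD; PUSH 3; MUL becomes
   PUSH 9).  `out` needs room for `n` entries.  Returns the new count. */
size_t peephole(const struct insn *in, size_t n, struct insn *out)
{
    assert(in != NULL && out != NULL);
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        out[m++] = in[i];
        for (;;) {
            /* PUSH a; POP  =>  nothing (the pushed value is dead). */
            if (m >= 2 && out[m - 1].op == OP_POP && out[m - 2].op == OP_PUSH) {
                m -= 2;
                continue;
            }
            /* PUSH a; PUSH b; ADD or MUL  =>  PUSH (a op b). */
            if (m >= 3 && out[m - 3].op == OP_PUSH && out[m - 2].op == OP_PUSH
                && (out[m - 1].op == OP_ADD || out[m - 1].op == OP_MUL)) {
                long a = out[m - 3].arg, b = out[m - 2].arg;
                long v = (out[m - 1].op == OP_ADD) ? a + b : a * b;
                m -= 2;
                out[m - 1] = (struct insn){ OP_PUSH, v };
                continue;
            }
            break;  /* no pattern matched at the tail */
        }
    }
    return m;
}
```

Fed PUSH 1, PUSH 2, ADD, PUSH 3, MUL, PUSH 99, POP, HALT, the pass leaves just PUSH 9, HALT.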

There is a range of optimization techniques one can try, and
quite a few of them need to be applied *before* bytecode
generation.  Beyond that, perhaps one can use something like burg
to do instruction selection for a given bytecode sequence.
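burg generates tree parsers that find a minimum-cost cover of expression trees.  The sketch below is a much simpler stand-in: greedy "maximal munch" over a flat bytecode sequence, picking the longest matching pattern at each position.  The opcodes, the pattern table, and the pseudo-assembly templates are all invented for illustration, and operands are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator-style opcodes (illustration only). */
enum { OP_LOAD, OP_PUSH, OP_ADD, OP_STORE };

/* An opcode sequence and the made-up native template it maps to. */
struct pattern {
    int len;
    int ops[3];
    const char *native;  /* fictional pseudo-assembly, not a real ISA */
};

static const struct pattern patterns[] = {
    { 1, { OP_LOAD },                  "ld   r0, [fp+x]" },
    { 1, { OP_PUSH },                  "mov  r1, #k" },
    { 1, { OP_ADD },                   "add  r0, r0, r1" },
    { 1, { OP_STORE },                 "st   r0, [fp+x]" },
    { 2, { OP_PUSH, OP_ADD },          "add  r0, r0, #k" },
    { 3, { OP_LOAD, OP_PUSH, OP_ADD }, "ldadd r0, [fp+x], #k" },
};

/* Greedy maximal munch: at each position choose the longest pattern
   that matches, record its index in `out`, and skip past it.
   `out` needs room for `n` entries.  Returns the number chosen. */
size_t select_insns(const int *code, size_t n, size_t *out)
{
    const size_t npat = sizeof patterns / sizeof patterns[0];
    size_t pc = 0, m = 0;
    while (pc < n) {
        size_t best = npat;  /* sentinel: nothing matched yet */
        for (size_t p = 0; p < npat; p++) {
            const struct pattern *pat = &patterns[p];
            if (pc + (size_t)pat->len > n)
                continue;
            int ok = 1;
            for (int k = 0; k < pat->len; k++)
                if (code[pc + k] != pat->ops[k]) { ok = 0; break; }
            if (ok && (best == npat || pat->len > patterns[best].len))
                best = p;
        }
        assert(best != npat);  /* every opcode has a one-op pattern */
        out[m++] = best;
        pc += (size_t)patterns[best].len;
    }
    return m;
}
```

For LOAD, PUSH, ADD, STORE, PUSH, ADD the matcher picks the three-op load/add template, then the store, then the two-op add-immediate template.  Greedy matching is not guaranteed to find the cheapest cover; that is the gap burg-style dynamic programming closes.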

>And if it can be shown to be a win, then there is a rather huge
>investment of time and effort required to write the translator and get
>it running on even the half-dozen most popular processor/OS
>combinations.  If I were implementing a free Scheme, I'd be able to
>think of twenty more obviously useful ways to spend my time at the drop
>of a pin.

I hate to say it, but a free Scheme that runs fast on an x86
machine (under Linux, Windows, or MS-DOS) would at this point have
a *huge* ``market''.  Other implementations would follow if there
were enough merit in that first implementation and it was done
cleanly.

The situation today is that I have access to more than 15 free
implementations.  They are great for small projects or if you
have a fast machine with gobs of memory, but they do not scale
very well.  I think that is because they all do only the easy
and/or interesting part.  They either skip the `engineering'
grunge work or rely on C for it.

>As for "true" run-time code generation, that's just an order of
>magnitude more difficult on modern machines for the reasons cited
>above, and unless processor vendors have a rethink of their strategies,
>it's going to become more and more difficult to do a decent job of it.

Yup.  The reality is that language implementors have to work
extra hard or be content with their language remaining a niche one.

>For the time being, I'd say the closest approximation you might be able
>to get to runtime native code compilation is to hack your own solution
>using SCM, Hobbit, and SCM's dynamic loading capability.  The result
>would likely be bletcherous and slow, but I don't think you'll see
>anyone doing any serious work on either ANDF-style translation or RTCG
>in the Scheme implementation world any time soon.

I will look at scm and vscm again to see how hard this will be.

>Dave Keppel at UWash wrote a good paper on the issues in RTCG about
>three years ago; the situation has become more skewed against RTCG
>since then, but it's still an interesting read.  Check out "A case for
>runtime code generation", available from ftp.cs.washington.edu
>somewhere in the pub/TR directory, as I recall.

I am aware of his work (I have his paper around here somewhere....)
My present interest stems from reading the Java papers.  It feels
like such a translation will be a win, but I don't know for sure.

Bakul Shah
