Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!newsserver.jvnc.net!nntpserver.pppl.gov!princeton!news.princeton.edu!blume
From: blume@atomic.cs.princeton.edu (Matthias Blume)
Subject: Re: multiple-value return & optimising compilers
In-Reply-To: shivers@clark.lcs.mit.EDU's message of 31 Jan 1995 02:37:40 -0500
Message-ID: <BLUME.95Jan31114048@atomic.cs.princeton.edu>
Originator: news@hedgehog.Princeton.EDU
Sender: news@Princeton.EDU (USENET News System)
Nntp-Posting-Host: atomic.cs.princeton.edu
Organization: Princeton University
References: <dig-Scheme-7.26@mc.lcs.mit.edu> <9501310741.AA25479@clark.lcs.mit.edu>
Date: Tue, 31 Jan 1995 16:40:48 GMT
Lines: 368

In article <9501310741.AA25479@clark.lcs.mit.edu>
shivers@clark.lcs.mit.EDU (Olin Shivers) writes:

   Matthias Blume makes the claim that multiple-value return produces
   objects that aren't first-class, in the sense that the "container"
   holding the multiple values isn't accessible as other values are.

Multiple return values -- as the term implies -- are collections of
values.  There are other collections of values already in the
language, which are first-class (lists, vectors).

[Actually, those collections are really collections of locations, which
are mapped to values by the store.  This extra indirection is part of
the reason why things are more complicated to statically analyze
(eq?-ness, side-effects, etc.).  Perhaps we do need a new kind of
container: a true collection of values without identity and without
extra indirection through the store -- ML-style tuples.]

In the denotational semantics multiple values are described by members
of the E* domain.  As it so happens the E* domain is not part of the E
domain, and therefore elements of it are not first-class.  But they
*can* be viewed as single members of a certain domain (and in fact
they are).  Making them first-class corresponds to including the E*
domain into E, which can be done.  The whole discussion is about
whether we should or we shouldn't.  I say we should, and most others
on this thread say we shouldn't.

   Multiple return values are not a data-structure; this is like
   saying variables aren't first class or something.

Indeed, variables are not first-class in Scheme.  Variables (domain
Ide) are mapped by environments (domain U = Ide -> L) to locations
(domain L).  Neither Ide nor U is part of E, and even L is not
included individually but only as part of compounds like Ep = L x L x
T (the domain of pairs).  (The existence of a C-style address-of
operator would require including L in E.)

   Multiple return values are completely symmetric with multiple
   parameters to procedures.

True.  And I have come to dislike both of them as long as they are
only second-class.

   The latter allows you to pass multiple values to a procedure;

There are other ways to achieve the same.

   the former allows you to pass multiple values to an implicit continuation.

There are other ways to achieve the same.

   CALL-WITH-VALUES essentially
   allows you to specify the arity of a continuation.

Arity seems to be Scheme's half-baked attempt at adding some type
information.  We know the arity of a function (or of a continuation,
for that matter -- they are really the same thing), but because we
don't have a consistent type system to propagate this information
throughout the program, we can't take advantage of it unless we
severely restrict objects of this type: we make them second-class, so
they can't escape and do bad things we can't control.

However, work has been done in this area just recently (Andrew
Wright's soft typing).  Despite Scheme's deficiencies in the soundness
of its type system, we can infer the necessary information most of the
time.  The major problem is the open-endedness of the world (no closed
program units).  I'm constantly begging for the addition of a
reasonable module system to Scheme -- in fact, I've tried to do some
work in this area.  This would not only allow us to optimize away
redundant boxing operations for first-class multiple values, it would also
enable many other optimizations, or at least justify existing
techniques.  (Most good Scheme compilers add their own little ad-hoc
syntax to close over program units, which are compiled separately.)

Olin gives a good summary of these points with:

   Matthias is consistent when he says he doesn't like multiple
   parameters to procedures for the same reasons he doesn't like
   multiple return values. As he points out, this is the road taken by
   ML, which is completely consistent.  In ML, functions take and
   return exactly one value. This is elegant. It raises certain
   implementation issues.

So?

   But with ML, at least the static type system
   -- in particular the record types -- can help out the
   compiler.

Soft typing can do the same for Scheme most of the time.

   With Scheme, you have no such assist.

Yes, we have.  Ask Andrew Wright!

   ML also has a
   powerful pattern matching facility, elegantly integrated with the
   language's sum and product types, that makes it transparent to
   destructure compound values. Scheme doesn't even have standard
   product or sum types, much less statically typeable ones.

Syntax is indeed something to consider.  But we have a very powerful
macro system, so we wouldn't need to design the syntax into the core
language.  We also don't have true sum or product types, but we can
pretend to have them, run the static analyzer, and soften our
assumptions as necessary.  In most cases we will still get enough
information to drive the optimization.  (This is an unfounded claim
of mine; for details you should consult Andrew Wright's thesis or ask
him personally.)

   I like multiple return values in Scheme. They are consistent with
   its multiple procedure parameters, and useful for programming.

I don't like multiple arguments in Scheme.  They are inconsistent with
single return values, and they add nothing to the expressiveness of
the language.  In fact, they take away from its expressiveness,
because I can't write higher-order functions without knowing the
arities of my arguments, and without going through explicit boxing and
unboxing operations in the case that I don't know them.  My standard
example is the `compose' procedure, which everybody should be sick of
by now.  In ML I write

	fun compose f g = fn x => (f (g x))

which gives `compose' the type ('b -> 'c) -> ('a -> 'b) -> ('a -> 'c)

This makes no assumptions about the arity of `f', `g', or their
respective continuations other than that g's continuation must accept
the same type that `f' accepts.  But we can easily write

	fun twice x = (x, x)
	fun plus (x, y) = x + y
	fun id x = x

and freely apply `compose' as in:

	compose id id;
	compose twice plus;
	compose plus twice;
	compose id twice;
	compose twice id;
	compose twice twice;

To write a `compose' function in Scheme which is equally general I
have to stand on my head:

	(define (compose f g)
	  (lambda args          ; explicit boxing of arguments
	    (call-with-values
	      (lambda ()        ; needless syntactic clutter
		(apply g args)) ; unwrapping of argument box
	      f)))
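
Just to demonstrate that this definition is as general as the ML one
-- assuming the proposed semantics for `values' and `call-with-values'
-- the analogues of the ML examples above do go through:

	(define (twice x) (values x x))
	(define (plus x y) (+ x y))
	(define (id x) x)

	((compose plus twice) 3)		; => 6
	((compose list twice) 3)		; => (3 3)
	((compose id id) 3)			; => 3
	(call-with-values
	  (lambda () ((compose twice plus) 3 4))
	  list)					; => (7 7)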

I seriously doubt that any compiler which supports `values' would also
compile away the explicit boxing and unboxing in this example --
exactly *because* we have a special construct.  If we rely on
programmer optimization and provide language constructs to aid such,
then we can't reasonably expect the compiler to do the same kind of
optimization internally.  And if the compiler would do it anyway, then
we wouldn't need programmer optimization in the first place.

   Matthias also claims it is bogus to have a procedure VALUES that
   returns multiple values, because you can't write (+ 3 (values 4
   5)).

What I was complaining about was that I can't write

	(f (values 3 4))

where f is defined as

	(define (f x) x)

Normally a compiler could simply eliminate the call to f, because it
is the identity function.  With `values', however, this call is
illegal.  If `values' returned some kind of container object carrying
`3' and `4', then eliminating the call to `f' would be legal in all
cases.

Look at:

	(call-with-values
	  (lambda () (f (values 1 2)))
	  list)

There are three possible outcomes of this:

	1. error
	2. (1)
	3. (1 2)

The error-checking camp probably votes for 1., and the `drop extra
values implicitly' camp votes for 2.  Personally, I would prefer 3.,
because it corresponds to being able to eliminate any call to any
identity function all of the time.  But this also conceptually
corresponds to `values' returning a first-class container object,
because eliminating the call to identity is only an optimization,
while in the language definition we don't talk about these things.
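
To make outcome 3. concrete: here is a sketch of `values' returning an
ordinary first-class container, emulated with a tagged list (the names
`values*' and `call-with-values*' are made up, of course):

	(define (values* . vs) (cons 'values vs))

	(define (call-with-values* thunk receiver)
	  (apply receiver (cdr (thunk))))

	(define (f x) x)	; the identity, as above

	(call-with-values*
	  (lambda () (f (values* 1 2)))
	  list)			; => (1 2)

Since (values* 1 2) is one ordinary object, eliminating the call to
`f' is always legal, and outcome 3. falls out for free.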

So conceptual clarity and orthogonality at the language design level
do correspond to being able to perform important and powerful
optimizations.  Scheme is loosely based on the lambda calculus.  If
the foundation were stronger, then we could apply the usual beta- and
eta-reductions more freely during the optimization phases.
Beta-reduction (function inlining) and eta-reduction (elimination of
redundant closures) are very important steps in optimizing
higher-order languages.  If we add non-orthogonal things to the
language, which render these reductions invalid in general, then we
actually lose potential for good optimizations.  Unfortunately, in
Scheme we already have tons of things that must be watched out for
during the optimization phases.  Let's not add even more!

   Encapsulating the base functionality of m-v in a procedural form
   (with the VALUES and CALL-WITH-VALUES procedures) is the way you do
   things in Scheme. It is analogous to encapsulating the base
   functionality of continuations in a procedural form (with the
   CALL/CC procedure and the reified continuation procedures it
   produces).

No, this is not true.  call/cc is a way of getting our hands on
implicit continuations.  No-one would doubt that implicit
continuations are a reality (`behind the scenes') -- regardless of
whether we are able to reify them or not.  Multiple values of the
second-class kind only exist if we create them.  We don't have to
create them, because first-class multiple values would do the same job
just as well (and often better).  You can think of

	fun f (x, y, z) = g (x, h (y, z))

as taking three arguments, coming in an implicit container (which
doesn't have to exist physically if the compiler is sufficiently
clever).  But it is only a syntactically convenient notation for

	fun f f_args = let
	  val h_args = (#2 f_args, #3 f_args)
	  val h_result = h h_args
	  val g_args = (#1 f_args, h_result)
	in
	  g g_args
	end

where everything is done explicitly and all three arguments are
temporarily boxed in f_args.  I'm not sure, but I would be surprised
if there were any differences in the code produced for either of the
two versions in SML/NJ.

   I frequently see people on this list claim that it's a simple
   matter of global program analysis to handle all of the horrible
   inefficiencies introduced by various proposals -- such as Matthias'
   one parameter/one return value proposal.

It is not a proposal.  I'm opposing the horrible `multiple values'
proposal, which was somebody else's.  In the course of the argument I
have come to realize that in order to be consistent we need to scrap
multiple arguments as well.

   I have noted that people who have actually implemented aggressive
   native-code Scheme compilers -- such as Orbit, Gambit, or Chez
   Scheme -- are usually not the people who make these claims.

I have noted that people who have implemented aggressive native-code
ML compilers (like Andrew Appel) are exactly the ones who make these
claims.

   It is
   very difficult to do these compilers, and global analysis of
   higher-order languages is quite tricky and does not always pay off.

It is actually not that hard to do, and most of the time it pays off.
I admit that it is quite a bit trickier in Scheme than it is in ML --
but this must be blamed on various other deficiencies in Scheme's
design.  The `values' proposal tries to patch a performance hole
(where, frankly, I don't even see one).  If we go on with this kind of
patchwork, then we will end up with something like C, or -- God help
us -- C++.

   If you believe that a magic global analysis will make the
   implementation problems go away,

This question should be phrased: ``... will make the DESIGN problems
go away, ...''

   then I invite you to design and implement such an analysis.

I would like to accept your invitation.  But at the moment I don't
have time for that.  Also, this discussion has severely reduced my
enthusiasm for trying to produce a good Scheme implementation, because
I have realized that Scheme is not my favorite language anymore.
Once Scheme was an example of progress in language development.  Now
it seems like progress in Scheme has come to a full stop, and we are
about to switch to reverse gear.

   Not only will your results be worth serious academic acclaim,

I doubt that, because it won't be anything new.  Perhaps it would be
new to Scheme, but who cares?  [Only a year ago I would have loved to
start work on that, but wisely my advisor has stopped me.]

   they will significantly impact real-world programming.

Keep dreaming...

   I don't want to dampen anyone's enthusiasm. Sarcasm aside, I really
   do encourage anyone who wants to make advanced programming
   languages go fast -- go for it. It's a fun problem, if you are into
   that kind of thing (I am). But people have been working on this one
   for a while. If it was easy, it would have been done by now.

It is certainly not easy, but it has been done.  The solution also
included changes to language design itself, which is why we don't see
the solutions in Scheme.

   I want to use powerful languages to do real systems programming.

Give ML a try!

   I don't want to have to use C.

Who does...

   Efficiency matters.

Of course.  I'm not opposing efficiency.  I'm opposing questionable
performance hacks, which try to patch over deficiencies in overall
language design and which at the same time add even more of these
deficiencies.

   For a large set of programming tasks --

   graphics,

*never* in Scheme

   operating systems,

*never* in Scheme, but *perhaps* one day in ML or a successor thereof

   ...

   So arguments of the form, "Of course it makes the system slow, but
   it's minimal and elegant, so it's OK" don't work for me.

Neither do they work for me.  Conversely, arguments of the form "Of
course it is ugly, but right now I can't think of how to make the nice
thing run fast, so it's OK" don't work for me either.

   It's easy to appeal to hypothetical static analyses to justify
   introducing inefficient features into a language.

Please note that I was not the one who wanted to introduce anything
into the language in the first place.

It's easy to appeal to ad-hoc additions to a deficient language to
justify not having to think about how to remove the deficiencies.
Please, don't call this kind of analysis `hypothetical'!  It has been
done, if not in Scheme then in other languages.  If we find it can't
be done in Scheme, then this is a reason to think about why this is so
and to draw the conclusions.

For me the conclusions will never be to solve problems `by piling
feature on top of feature, but by removing the weaknesses and
restrictions that make additional features appear necessary.'  This
attitude expressed in the introduction to the Scheme report was the
prime reason for me to choose Scheme as my favorite language. The lack
thereof among the current Scheme crowd has taken this reason away.

--
-Matthias
