Newsgroups: alt.lang.design,comp.lang.c++,comp.lang.lisp
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!bloom-beacon.mit.edu!panix!zip.eecs.umich.edu!caen!night.primate.wisc.edu!sal.wisc.edu!waldorf.csc.calpoly.edu!kestrel.edu!mcdonald
From: mcdonald@kestrel.edu (Jim McDonald)
Subject: Re: Comparing productivity: LisP against C++ (was Re: Reference Counting)
Message-ID: <1994Dec23.020527.25551@kestrel.edu>
Sender: mcdonald@kestrel.edu (Jim McDonald)
Organization: Kestrel Institute, Palo Alto, CA
References: <19941203T221402Z.enag@naggum.no> <BUFF.94Dec15103904@pravda.world> <D0xAIp.3Dn@rheged.dircon.co.uk> <vrotneyD11MDv.Ks7@netcom.com> <vogtD12y8D.HLL@netcom.com> <3d5alh$6j7@celebrian.otago.ac.nz> <3d8j9r$cm9@celebrian.otago.ac.nz>
Date: Fri, 23 Dec 1994 02:05:27 GMT
Lines: 277
Xref: glinda.oz.cs.cmu.edu comp.lang.c++:104757 comp.lang.lisp:16180


[Apologies again for the length of this post, but it's hard to illustrate
 some ideas with anything less than a transcript.]

In article <3d8j9r$cm9@celebrian.otago.ac.nz>, nmein@bifrost.otago.ac.nz (Nick Mein) writes:
|> 
|> Several people (Michael Callahan, Erik Naggum, Mike McDonald, Larry Hunter)
|> seemed to miss the point of my challenge. Of course I could have
|> written the toy program in a few lines of C. I tried to come up with a problem
|> that C++ seemed well suited to, but that did not take hours or pages
|> of code to solve. A real Image class would have other operations
|> defined than load, invert and save; and would be part of a much
|> larger program (and ideally reusable, so that it could be used in many such
|> programs).
|> 
|> From: marcoxa@mosaic.nyu.edu (Marco Antoniotti)
|> 
|> > 20 Minutes. I do not touch type. I tried to write a "multidimensional"
|> > 'read-sequence' (a function with that name will be in the upcoming CL
|> > ANSI, the first ANSI OO Language standard), decided it was not worth
|> > the effort, went back to the original version, corrected a few errors
|> > (there are still a few around), checked CLtL2 a couple of times
|> > (actually more) and looked up some documentation strings on my
|> > faithful AKCL.
|> 
|> > Pretty good, isn't it? :)
|> 
|> Yes, I'm impressed. Although I did have to change the names of functions
|> load-image and save-image to get your program to run (on a Harlequin
|> LispWorks system). Portability problems in the Lisp world?
|> 
|> > Moreover, the CL program does more than the C++ program
|> 
|> Could you be more specific? I don't read Lisp well enough to be able to
|> detect the extra functionality. Actually, I think that you have supplied
|> somewhat less than I have. Your image class (which, admittedly, is the 
|> significant part of the example) is fine, but your test function does not
|> perform the command line parsing and handling of errors performed
|> by main.

Common Lisp implementations tend to provide a lot of such error handling
automatically--in fact much of that handling is specified by the ANSI
standard, although offhand I'm not sure where the exact boundaries are.

For example, WITH-OPEN-FILE handles a much wider range of errors than
your main program, in a more graceful manner:

  > (with-open-file (s "foo.baz") (format t "[Form read is ~S]" (read s)))
  >>Error: Cannot open file "foo.baz" - No such file or directory
  
  OPEN:
     Required arg 0 (PATHNAME): "foo.baz"
     Keyword arg 1 (DIRECTION): :INPUT
     Keyword arg 2 (ELEMENT-TYPE): STRING-CHAR
     Keyword arg 3 (IF-EXISTS): NIL
     Keyword arg 4 (IF-DOES-NOT-EXIST): :ERROR
     Keyword arg 5 (ATTRIBUTES): NIL
  :C  0: Use a new pathname
  :A  1: Abort to Lisp Top Level
  
  -> :c
  Use a new pathname
  Filename (default is "/usr/home/kestrel/mcdonald/foo.baz"): hack.lisp
  [Form read is (IN-PACKAGE "USER")]
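
A program that has to run unattended can handle the same condition in
code rather than in the debugger.  A minimal sketch, assuming a
hypothetical helper READ-FIRST-FORM (the name and the DEFAULT argument
are illustrative only, not part of any program posted in this thread):

```lisp
;; Hypothetical helper: return the first form in the file at PATH, or
;; DEFAULT when the file cannot be opened or contains no form.
(defun read-first-form (path &optional default)
  (handler-case
      (with-open-file (s path :direction :input)
        ;; EOF-ERROR-P is NIL, so an empty file yields DEFAULT.
        (read s nil default))
    ;; OPEN signals a FILE-ERROR when the file does not exist.
    (file-error () default)))
```

The stream is still closed by WITH-OPEN-FILE on every exit path, so
the handler adds policy without adding cleanup code.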

|> Also (maybe significantly) there does not appear to be the equivalent
|> of my "assert(data)" - an important check that some client of the class
|> will not get away with doing something stupid - like manipulating an image
|> that hasn't been loaded yet!

In Lisp, runtime type-checking catches many such errors, entering an
interactive debugger from which repairs can be attempted.

Using the quick hack I wrote:

  > (read-image ".cshrc")
  >>Error: End of file on stream #<Stream OSI-BUFFERED-STREAM "/usr/home/kestrel/mcdonald/.cshrc" EA05FE>
  
  READ-BYTE:
     Required arg 0 (STREAM): #<Stream OSI-BUFFERED-STREAM "/usr/home/kestrel/mcdonald/.cshrc" EA05FE>
     Optional arg 1 (EOF-ERROR-P): T
     Optional arg 2 (EOF-VALUE): NIL
  :C  0: Return the given eof-value: NIL
  :A  1: Abort to Lisp Top Level
  
  -> :c
  Return the given eof-value: NIL
  >>Error: The value NIL, given to LUCID::|SETF of AREF (rank=2)|,
               is the wrong type for storing into a (UNSIGNED-BYTE 8) array.
  
  LUCID-RUNTIME-SUPPORT:SET-2DIM-AREF-SUBR:
  :C  0: Supply a new value: 
  :A  1: Abort to Lisp Top Level
  
  -> :a
  Abort to Lisp Top Level
  Back to Lisp Top Level

In this case I knew that using NIL for the read-byte would lose, but I
proceeded anyway to show what happens when you try to process garbage.
Lisp is vastly less likely to allow you to process garbage silently
than a corresponding C/C++ program.   
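
Where an explicit check in the spirit of "assert(data)" is wanted,
CHECK-TYPE is the usual tool.  A rough sketch, assuming a hypothetical
CHECKED-INVERT-IMAGE that takes a 2-D array of octets (this is not the
quick hack used in the transcript above):

```lisp
;; Hypothetical function: invert every octet of a 2-D image in place.
(defun checked-invert-image (image)
  ;; Signals a correctable TYPE-ERROR unless IMAGE is a 2-D array
  ;; of (UNSIGNED-BYTE 8); NIL or an unloaded image never gets past here.
  (check-type image (array (unsigned-byte 8) (* *)))
  (dotimes (i (array-total-size image) image)
    (setf (row-major-aref image i)
          (logxor 255 (row-major-aref image i)))))
```

Unlike a C assert, the error is continuable: from the debugger one can
supply a replacement value for IMAGE and carry on.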

...

|> Oh, I see. Checking the number of command line arguments, checking
|> the return value of system calls, returning a meaningful value to
|> the system is a lot of junk.

Well, it is nicer to get all of that automatically without having
to clutter your code with it:

  > (read-image "/tmp/hack" 47)
  >>Error: READ-IMAGE called with 2 arguments, but only 1 argument is allowed
  
  READ-IMAGE:
  :C  0: Ignore extra arguments
  :A  1: Abort to Lisp Top Level
  
  -> 
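
The same check can be caught programmatically.  A small sketch,
assuming safe code (ANSI requires the error only there) and two
hypothetical functions, ONE-ARG and CALL-WITH-EXTRA-ARG, invented for
this illustration:

```lisp
;; Hypothetical one-argument function standing in for READ-IMAGE.
(defun one-arg (x) x)

;; Call it with one argument too many and report what happened.
;; FUNCALL on the symbol defers the argument-count check to run time.
(defun call-with-extra-arg ()
  (handler-case (funcall 'one-arg "/tmp/hack" 47)
    (program-error () :wrong-argument-count)))
```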

...

|> From: Erik Naggum <erik@naggum.no>
|> 
|> > the functions are also moderately fast.
|> 
|> > * (time (make-image "/tmp/test"))
|> > Compiling LAMBDA NIL: 
|> > Compiling Top-Level Form: 
|> 
|> > Evaluation took:
|> >   0.27 seconds of real time
|> >   0.26 seconds of user run time
|> >   0.01 seconds of system run time
|> >   0 page faults and
|> >   1344 bytes consed.
|> > NIL
|> > * (time (invert-image "/tmp/test" "/tmp/test.out"))
|> > Compiling LAMBDA NIL: 
|> > Compiling Top-Level Form: 
|> 
|> > Evaluation took:
|> >   0.56 seconds of real time
|> >   0.54 seconds of user run time
|> >   0.02 seconds of system run time
|> >   0 page faults and
|> >   3944 bytes consed.
|> > NIL
|> 
|> > CMU CL 17f on a 50 MHz SPARCstation 2.  /tmp is in memory.
|> 
|> Depends what you mean by fast, I suppose.
|> 
|> time invert test.img out.img
|> 0.0u 0.0s 0:00 100% 57+253k 0+12io 0pf+0w

Well, 6 minutes (or whatever it took Erik) + 0.27 seconds + 0.56 seconds
certainly seems like a win over 60 minutes + 0.0 seconds.
As I mentioned in my message, you underspecified your program.
Certainly (to me at least) you did not imply anything about
how fast it should run.

...

|> From: mcdonald@kestrel.edu (Jim McDonald)
|> 
|> > There are at least a couple of problems with this challenge.
|> 
|> > The main problem is that you have given an overly specified problem.
|> 
|> I was hoping that people would (if they took the challenge seriously at all)
|> look at the spirit of the challenge without me laying down the rules 
|> explicitly. Marco Antoniotti did so, and so did you with your second & third
|> versions. Thanks.

I tried, but to be honest, I really didn't know what you wanted--it wasn't
an idle complaint.

|> > The second problem is that the task you chose to measure is very low
|> > level and doesn't really use any interesting features of a language--
|> > it's almost nothing more than a few calls to the OS.
|> 
|> But my example does use an interesting feature of C++ - encapsulation. 

Ok, but the motivation does seem a little obscure for this task.
(I'm not saying encapsulation is bad, or even that it's inappropriate
for this task, just that you probably have some unspecified intentions
for this task that make it desirable.  Not knowing those intentions,
it's hard to say if your code was good, bad, or indifferent wrt them.)

|> > I'll do three versions:
|> > one very simple and close to your example, a second slightly
|> > more abstract, and a third closer to the problem I've given
|> > above.
|> 
|> > SECOND VERSION 
|> > I used parameterized types and sizes, 2D arrays, and added an
|> > explicit test routine.
|> 
|> This sounds a bit closer to my version. 

I could quibble, but fair enough.

|>                                         Adding a header to the image
|> file didn't seem to add anything interesting to the problem.

It was a step towards the third version where the data has a variable
format, given by the header.   Again, I ask you how long it would take
to revise your program to accept an array of arbitrary byte formats,
e.g. 128-bit bytes, with any number of dimensions, e.g. 256 * 256 * 4 * 2,
where the formats are given in the data file.  Such a capability seems
to be far more object-oriented than the minimal encapsulation you used.
I think your little *p hack becomes problematic.
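
To make the question concrete, here is a rough sketch of a reader for
such self-describing files.  The file layout is an assumption made up
for this example (one octet giving octets per element, one octet giving
the rank, a 2-octet big-endian size per dimension, then big-endian
data), as are the names READ-UINT, READ-ND-IMAGE and INVERT-ND-IMAGE;
it is not the third version mentioned above.

```lisp
;; Read a big-endian unsigned integer of N-OCTETS octets from STREAM.
(defun read-uint (stream n-octets)
  (let ((value 0))
    (dotimes (i n-octets value)
      (setf value (logior (ash value 8) (read-byte stream))))))

;; Read an image whose element width and dimensions come from the
;; (hypothetical) header.  Returns the array and its bits per element.
(defun read-nd-image (path)
  (with-open-file (s path :element-type '(unsigned-byte 8))
    (let* ((width (read-byte s))          ; octets per element
           (rank (read-byte s))           ; number of dimensions
           (dims (loop repeat rank collect (read-uint s 2)))
           (bits (* 8 width))
           (image (make-array dims :element-type `(unsigned-byte ,bits))))
      (dotimes (i (array-total-size image))
        (setf (row-major-aref image i) (read-uint s width)))
      (values image bits))))

;; Invert every element in place, whatever the rank or element width.
(defun invert-nd-image (image bits)
  (let ((mask (1- (ash 1 bits))))
    (dotimes (i (array-total-size image) image)
      (setf (row-major-aref image i)
            (logxor mask (row-major-aref image i))))))
```

ROW-MAJOR-AREF is what lets one loop serve every rank, and integers of
any width (128 bits included) need no special treatment.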

But to return to my point about specs, I probably have a completely 
different notion than you of where this thing would be headed, so I'm
not sure if any further comparison (or even this comparison!) is 
meaningful.

|> > I'll do three versions:
..
|> > Elapsed Real Time = 5.14 seconds
|> > Total Run Time    = 5.09 seconds
|> > User Run Time     = 5.06 seconds
|> > System Run Time   = 0.03 seconds
|> > Process Page Faults    =         12
|> > Dynamic Bytes Consed   =    131,088
|> > Ephemeral Bytes Consed =      5,168
|> The run times speak for themselves.

They most certainly do not.  I was asked to quickly write a program
that did a simple task, which I accomplished.  As an afterthought,
I timed it a couple of different ways, just to illustrate that it
wasn't impossibly slow.

While I was writing the code I did several things that I knew full
well were incredibly inefficient at runtime.  For example, I read
each byte with a separate system call, which is obviously ridiculous 
if performance is an issue.  (I did even more egregious things in
the third version.)
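
The obvious remedy is to read the whole buffer in one call.  A sketch,
assuming an implementation that already provides the READ-SEQUENCE
function from the ANSI draft, and a hypothetical helper READ-OCTETS
invented for this example:

```lisp
;; Hypothetical helper: read up to N octets from the file at PATH with
;; a single READ-SEQUENCE call instead of N calls to READ-BYTE.
;; Returns the buffer and the number of octets actually read.
(defun read-octets (path n)
  (with-open-file (s path :element-type '(unsigned-byte 8))
    (let* ((buffer (make-array n :element-type '(unsigned-byte 8)))
           (end (read-sequence buffer s)))
      (values buffer end))))
```

For the 256*256 image discussed here, N would be 65536.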

Another ten minutes of hacking on the version most analogous to your 
version gives this result on a SPARCstation 2:

  > (time (write-image "/tmp/x2" (invert-image (read-image "/tmp/x1"))))
  Elapsed Real Time = 0.22 seconds
  Total Run Time    = 0.14 seconds
  User Run Time     = 0.12 seconds
  System Run Time   = 0.02 seconds
  Process Page Faults    =         44
  Dynamic Bytes Consed   =    131,088
  Ephemeral Bytes Consed =      3,568
  #<Simple-Vector (UNSIGNED-BYTE 8) 65536 114A926>
  > 

Most of my time on that was wasted due to my not having a manual
handy, so I had to experiment a bit to find the right invocations
to make it run faster.  Invert-image by itself takes about 0.04
seconds for a 256*256 array, and with a quick hack that is about
as [un]safe as your *p loop, I get a version of invert-image
that runs in 0.01 seconds by processing 4 bytes at a time.
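
For readers who have not seen such invocations, a sketch of the general
idea: declare the array type and ask the compiler for speed.  FAST-INVERT
is a hypothetical name, this is the plain one-octet-at-a-time loop
rather than the 4-bytes-at-a-time hack, and no timings are claimed for it.

```lisp
;; Hypothetical function: invert a vector of octets in place.
;; With SAFETY 0 the declarations are trusted, not checked, so passing
;; anything but a simple octet vector is as unsafe as the *p loop.
(defun fast-invert (octets)
  (declare (type (simple-array (unsigned-byte 8) (*)) octets)
           (optimize (speed 3) (safety 0)))
  (dotimes (i (length octets) octets)
    (setf (aref octets i) (logxor 255 (aref octets i)))))
```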

With a manual I could reduce both this and the overall time 
further, but for me at least I've already gone far beyond the 
point of diminishing returns, since I'll probably never run this 
thing again.

If I really cared fantastically about speed I'd spend a few hours
to add a compiler-macro to produce hand-coded optimal assembly
code for the target machine whenever the compiler saw the pattern 
to be optimized.  That tends to produce code a few percent 
faster than what C compilers generate.
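
The hook itself is standard even though emitting assembly is not.  A
toy sketch of the mechanism only, assuming a hypothetical INVERT-OCTET
function: the compiler macro rewrites calls with a constant argument
and declines all others.  A real one would expand into
implementation-specific code instead.

```lisp
;; Hypothetical function to be optimized.
(defun invert-octet (b)
  (logxor b 255))

;; When the argument is a literal octet, replace the call with its
;; value at compile time; otherwise return FORM unchanged, which
;; tells the compiler to compile an ordinary call.
(define-compiler-macro invert-octet (&whole form b)
  (if (typep b '(unsigned-byte 8))
      (logxor b 255)
      form))
```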




