Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!newshost.marcam.com!zip.eecs.umich.edu!newsxfer.itd.umich.edu!gatech!howland.reston.ans.net!ix.netcom.com!netcom.com!bakul
From: bakul@netcom.com (Bakul Shah)
Subject: Re: Record parsing ala C/AWK?
Message-ID: <bakulD0DMx0.Go2@netcom.com>
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
References: <LEWIKK.94Dec5153744@grasshopper.aud.alcatel.com>
Date: Tue, 6 Dec 1994 06:48:36 GMT
Lines: 54

lewikk@grasshopper.aud.alcatel.com (Kevin K. Lewis) writes:

>I'm trying to parse a line of records that look something like this:
>"cow    a4 b2 c3 d1"
>The space varies between each record.

>I'm using the stdio lib in slib (from SCM) in the following way:
>(scanf "%s %c%d %c%d %c%d %c%d")
 ...
>Is there a better way to parse records in Scheme (ala AWK)?  Or is
>there a better way to do what I'm doing?

You can do this in SCM by using its regexp package.  You will
have to write some code though.  Something like

(define RE
    (regcomp
	"([a-zA-Z]+) +([a-z])([0-9]+) +([a-z])([0-9]+) +([a-z])([0-9]+) +([a-z])([0-9]+)"))

can describe the format.  Now you can call regexec for every line:

	(regexec RE line) => vector

The returned vector will have coordinates (start and end indices)
for the submatches, which you can easily transform into substrings
to give you something like:
    #("cow" "a" "4" "b" "2" "c" "3" "d" "1")
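
A minimal sketch of that transformation, assuming the coordinate
vector is laid out as start/end pairs #(s0 e0 s1 e1 ...) with pair 0
covering the whole match.  The name coords->substrings and the
treatment of a negative start as "group did not match" are my own
assumptions, so check the layout against your SCM before relying on it:

```scheme
;; Turn a coordinate vector into a vector of submatch strings.
;; ASSUMPTION: coords is #(s0 e0 s1 e1 ...); pair 0 is the whole match.
(define (coords->substrings line coords)
  (let loop ((i 2) (acc '()))           ; start at 2 to skip the whole match
    (if (>= i (vector-length coords))
        (list->vector (reverse acc))
        (let ((start (vector-ref coords i))
              (end   (vector-ref coords (+ i 1))))
          (loop (+ i 2)
                ;; ASSUMPTION: a negative start marks an unmatched group
                (cons (if (< start 0) #f (substring line start end))
                      acc))))))
```

You would then call it as (coords->substrings line (regexec RE line))
after checking that regexec did not return #f.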

You should simplify the RE so that only the pieces you are
interested in are extracted.

Just today I put together a small library of functions that can
do this and much more (maybe even awk-like functionality
in Scheme).  I will post an article on it soon, but briefly, you
can just do

    ((/md RE	; break a line up in fields
          (lambda (match submatch-vector) ...)) ; operate on them
     line-of-data)

This particular expression will even allow multiple records per
line and call the procedure (lambda (match submatch-vector) ...)
once per record.
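
To give a feel for the "once per record" behaviour, here is a rough
sketch.  It is not /md itself: for-each-match is a hypothetical name,
and it takes an assumed procedure (search str start) that returns a
coordinate vector #(s0 e0 ...) for the next match at or after start,
or #f when there is none.

```scheme
;; Hypothetical stand-in for the idea behind /md, not the real thing.
;; Returns a procedure that calls PROC once per match found in a string.
;; ASSUMPTION: (search str start) => #(s0 e0 s1 e1 ...) or #f.
(define (for-each-match search proc)
  (lambda (str)
    (let loop ((start 0))
      (let ((coords (and (< start (string-length str))
                         (search str start))))
        (if coords
            (let ((s (vector-ref coords 0))
                  (e (vector-ref coords 1)))
              (proc (substring str s e) coords)
              ;; always advance, so an empty match cannot loop forever
              (loop (if (> e start) e (+ start 1)))))))))
```

In SCM the search argument could probably be something like
(lambda (str start) (regsearch RE str start)), assuming regsearch
accepts a start index as I remember it.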

Currently I am working on making newlines and nuls just normal
chars in regular expressions (so that you can describe multi-line
records with REs), and a data structure to allow use of nestable
procedures like /md with strings, files or where data is split
between the two.  With these changes you should be able to
process an entire file with a Perl-like `one-liner':

    ((/md "[^\n]+\n" ((/md RE (lambda (x y) ...)))) (file->text "myfile"))

Bakul Shah
