Newsgroups: comp.lang.scheme
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!news.mathworks.com!newsfeed.internetmci.com!howland.reston.ans.net!ix.netcom.com!netcom.com!netcom12!dpb
From: dpb@netcom.com (Don Bennett)
Subject: Re: Regular expressions in scheme?
In-Reply-To: joe mcdonald's message of Tue, 23 Jan 1996 02:37:14 -0800
Message-ID: <DPB.96Jan29223957@netcom12.netcom.com>
Sender: dpb@netcom12.netcom.com
Organization: NETCOM On-line services
References: <3104BA5A.2719@smartlink.net>
Date: Tue, 30 Jan 1996 06:39:57 GMT
Lines: 111


I put together a modified regular expression interface
for SCM-4e1 that lets you do searching, matching, splitting, and
editing.

If you want a copy of the code, send mail.

Don Bennett
dpb@netcom.com


From the README file:


(regcomp <pattern> [<flags>])

  Compile a regular expression.
  Returns a compiled regular expression on success or, on failure, an
  integer error code suitable as an argument to regerror.

  <flags>     in regcomp is a string of option letters used to control
              the compilation of the regular expression. The letters may
              be any of:

        'n' - newlines are not matched by . or by hat lists ([^...]).
        'i' - ignores case.
  Only when compiled with _GNU_SOURCE:
        '0' - allows dot to match a null character.
        'f' - enables GNU fastmaps.

(regerror <errno>)

  Returns a string describing the integer <errno> returned when
  regcomp fails.

(regexec <re> <string>)

  Returns #f if <string> does not match <re>; otherwise returns a
  vector of integers grouped in doublets, one doublet for the matching
  expression and one for each sub-expression (delimited by parentheses
  in the pattern).  The first integer of each doublet is the index in
  <string> where that expression starts; the second is the index in
  <string> where it ends.
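To make the doublet layout concrete, here is a small sketch in portable
Scheme (no regex support required) that turns a match vector into the
corresponding substrings. The helper name match-vector->substrings is
hypothetical, and it assumes two things the README does not spell out:
end indices are exclusive (as with POSIX rm_eo), and a sub-expression
that did not take part in the match is reported with a start index of -1.

```scheme
;; Hypothetical helper: turn a regexec-style match vector into a list of
;; substrings, one per doublet, in vector order.
;; ASSUMPTIONS: end indices are exclusive, and an unmatched sub-expression
;; has a start index of -1 (it is returned as #f).
(define (match-vector->substrings str mv)
  (let loop ((i (- (vector-length mv) 2))   ; index of the last doublet
             (acc '()))
    (if (< i 0)
        acc
        (let ((start (vector-ref mv i))
              (end (vector-ref mv (+ i 1))))
          (loop (- i 2)
                (cons (if (< start 0) #f (substring str start end))
                      acc))))))
```

For example, with an assumed vector #(0 9 0 3 4 9) for "key=value"
matched against a pattern such as "(.*)=(.*)", the call
(match-vector->substrings "key=value" (vector 0 9 0 3 4 9)) returns
("key=value" "key" "value").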


(regmatch? <re> <string>)

  Returns #t if <re>, the compiled form of some <pattern> (that is,
  <re> = (regcomp <pattern>)), matches <string> as a POSIX extended
  regular expression.  Returns #f otherwise.


(regsearch  <re> <string> [<start> [<len>]])
(regsearchv <re> <string> [<start> [<len>]])
(regmatch   <re> <string> [<start> [<len>]])
(regmatchv  <re> <string> [<start> [<len>]])

  Regsearch searches for the pattern within <string>.
  Regmatch anchors the pattern at <start> and matches it against <string>.
  Regsearch returns the character position where the match of <re>
    starts, or #f if not found.
  Regmatch returns the number of characters matched, or #f if there is
    no match.
  Regsearchv and regmatchv return the match vector if <re> is found,
    #f otherwise.

  <re>        may be one of:
              a) a compiled regular expression returned by regcomp;
              b) a string representing a regular expression;
              c) a list of a pattern string and a set of option letters.

  <string>    The string to be operated upon.

  <start>     The character position at which to begin the search
              or match. If absent, the default is zero.

              *** Only when compiled with _GNU_SOURCE and using GNU libregex: ***
              When searching, if <start> is negative, its absolute
              value is used as the start location and the search
              runs in reverse.

  <len>       The search is allowed to examine only the first <len>
              characters of <string>. If absent, the entire string
              may be examined.
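The interplay of <start> and <len> can be sketched in portable Scheme. To
keep the sketch self-contained it substitutes a literal substring search
for the regular expression, so the hypothetical window-search below only
illustrates the windowing contract as read from the description above:
only characters 0 through <len>-1 are examined, scanning begins at
<start>, and the result is an absolute position in <string>, or #f.
Reverse searching with a negative <start> is not modeled.

```scheme
;; Hypothetical stand-in for regsearch's <start>/<len> contract, using a
;; literal NEEDLE instead of a regular expression.
;; Optional arguments mirror the README: START defaults to 0, LEN to the
;; whole string.  START is assumed non-negative (no reverse search).
;; Only characters 0 .. LEN-1 are examined; the result is an absolute
;; index into STR, or #f.
(define (window-search needle str . opts)
  (let* ((start (if (pair? opts) (car opts) 0))
         (len (if (and (pair? opts) (pair? (cdr opts)))
                  (min (cadr opts) (string-length str))
                  (string-length str)))
         (n (string-length needle)))
    (let loop ((pos start))
      (cond ((> (+ pos n) len) #f)            ; a match would leave the window
            ((string=? needle (substring str pos (+ pos n))) pos)
            (else (loop (+ pos 1)))))))
```

With "xxabxxab", searching for "ab" finds position 2 by default and
position 6 when starting at 3; with <start> 3 and <len> 7 it fails,
because the second "ab" would need character 7, which lies outside the
window.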


(string-split  <re> <string>)
(string-splitv <re> <string>)
	
  String-split splits a string into substrings that are
  separated by <re>, returning a vector of substrings.

  String-splitv returns a vector of string positions that 
  indicate where the substrings are located.
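The relationship between string-split and string-splitv can be sketched
the same way, with a single delimiter character standing in for <re>.
The helpers char-splitv and char-split are hypothetical, and the sketch
assumes that the position vector holds one start/end doublet per
substring (end exclusive) and that empty fields between adjacent
delimiters are kept; the README does not say how either case is handled.

```scheme
;; Hypothetical stand-ins for string-splitv and string-split that split on
;; a single delimiter character rather than a regular expression.
;; ASSUMED layout: one start/end doublet (end exclusive) per field, with
;; empty fields kept.
(define (char-splitv delim str)
  (let ((len (string-length str)))
    (let loop ((i 0) (field-start 0) (acc '()))   ; acc holds indices in reverse
      (cond ((= i len)                             ; close the last field
             (list->vector (reverse (cons len (cons field-start acc)))))
            ((char=? (string-ref str i) delim)     ; field ends at the delimiter
             (loop (+ i 1) (+ i 1) (cons i (cons field-start acc))))
            (else (loop (+ i 1) field-start acc))))))

;; Build the substrings from the positions returned by char-splitv.
(define (char-split delim str)
  (let ((pos (char-splitv delim str)))
    (let loop ((i (- (vector-length pos) 2)) (acc '()))
      (if (< i 0)
          (list->vector acc)
          (loop (- i 2)
                (cons (substring str (vector-ref pos i) (vector-ref pos (+ i 1)))
                      acc))))))
```

Splitting "a,b,,c" on #\, gives the positions #(0 1 2 3 4 4 5 6) and the
substrings #("a" "b" "" "c"); deriving the substrings from the positions
keeps the two views consistent by construction.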


(string-edit  <re> <edit-spec> <string> [<count>])

  Returns the edited string.

  <edit-spec> is a string used to replace occurrences of <re>.
              Backslash-quoted digits 1-9 (\1 through \9) may be used
              to insert the text matched by the corresponding
              parenthesized sub-expression of <re>, as in sed.

  <count>     The number of substitutions for string-edit to perform.
              If #t, all occurrences of <re> are replaced.
              The default is to perform one substitution.
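The edit-spec substitution works like the replacement part of a sed s
command. The sketch below expands an edit spec for a single match, given
a regexec-style match vector; it does not perform the search or honor
<count>. The helper expand-edit-spec is hypothetical and assumes that the
quoted digits are written sed-style as \1 through \9, that the first
doublet is the whole match (as with POSIX regexec), that end indices are
exclusive, and that a missing or unmatched sub-expression expands to the
empty string.

```scheme
;; Hypothetical helper: expand a sed-style edit spec for one match.
;; "\N" (N = 1..9) becomes the text of sub-expression N taken from MV, a
;; regexec-style match vector.  ASSUMED: doublet 0 is the whole match and
;; end indices are exclusive.  A group that is missing from MV or
;; unmatched (start -1) expands to "".  Every other character, including a
;; backslash not followed by 1-9, is copied unchanged.
(define (expand-edit-spec spec str mv)
  (define (group-text n)
    (let ((k (* 2 n)))                       ; doublet N starts at index 2N
      (if (or (>= (+ k 1) (vector-length mv))
              (< (vector-ref mv k) 0))
          ""
          (substring str (vector-ref mv k) (vector-ref mv (+ k 1))))))
  (define (group-digit? c)
    (and (char<=? #\1 c) (char<=? c #\9)))
  (let loop ((chars (string->list spec)) (out '()))
    (cond ((null? chars)
           (apply string-append (reverse out)))
          ((and (char=? (car chars) #\\)
                (pair? (cdr chars))
                (group-digit? (cadr chars)))
           (loop (cddr chars)
                 (cons (group-text (- (char->integer (cadr chars))
                                      (char->integer #\0)))
                       out)))
          (else
           (loop (cdr chars) (cons (string (car chars)) out))))))
```

For "foo=bar" matched against a pattern such as "(.*)=(.*)", the vector
would be #(0 7 0 3 4 7), and the spec <\2-\1> (written "<\\2-\\1>" as a
Scheme string literal) expands to <bar-foo>.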

		

