Perl and Regular Expressions
Lennart Herlaar
[email protected]
http://www.cs.uu.nl/people/lennart
room A104, telephone 030-2533921
March 9, 2006

Originally designed for processing (textual) data.
Designed as a combination of various Unix tools:
  shell commands and shell scripts
  awk, sed, tr, grep
Therefore it has a strange syntax, but it is easier to learn if you already know these programs.
There are many additional modules available.
There is an extensive user community on the Internet.
It is very portable (Unix/Windows).
PHP was largely derived from Perl.
Not just used for CGI programming, but also for maintenance and reporting.
Perl
Variables do not have a fixed type.
Lots of type juggling.
Implicit arguments: print $_ versus print.
Built-in: strings, lists, dictionaries (associative arrays, hashes).
Strong support for regular expressions and other string handling.
Some object-oriented features (but I skip those).
Modular setup (import modules).
Design is rather messy.
Programs are difficult to understand and maintain.
While programming: have a book or the manual pages handy.
Operators
File operators:
  -e $a is true if a file named $a exists
  -d $a is true if $a is a directory
  and many more
Comparison: the usual, but strings have different operators: eq, ne, lt, gt, le, ge, cmp.
No ===.
2 ** 16 for exponentiation.
2 x 16 for repetition.
++ also increments "file1" to "file2".
Simultaneous assignments: ($a, $b) = ($b, $a).
More flexible assignments:
    ($fst, $snd, @otherwords) = split(" ", $line);
    print ++($snore = "zz");   # aaa
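
The operators above can be tried out in a few lines. This is a minimal sketch; the variable names and the sample sentence are made up for illustration and are not from the slides.

```perl
use strict;
use warnings;

# Numeric and string comparison use different operators.
my $numeric_equal = (10 == 10.0)     ? 1 : 0;   # compared as numbers
my $string_equal  = ("10" eq "10.0") ? 1 : 0;   # compared as strings

my $power    = 2 ** 16;     # exponentiation
my $repeated = "2" x 16;    # the string "2" repeated sixteen times

# ++ on a string of letters followed by digits increments "magically".
my $name = "file1";
$name++;

# Simultaneous assignment swaps two variables without a temporary.
my ($x, $y) = (1, 2);
($x, $y) = ($y, $x);

# List assignment: the array on the left takes whatever is left over.
my ($fst, $snd, @otherwords) = split(" ", "the quick brown fox");

print "$numeric_equal $string_equal $power $name $x $y\n";
print "$fst | $snd | @otherwords\n";
```

Note that == and eq disagree on "10" versus "10.0": they are the same number but different strings.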
Control structures
The normal if requires a compound statement (braces).
unless is the inverse of if.
elsif, and not elseif as in PHP.
    kissme() if showup($me);    # no braces necessary
    killme() unless $ipayup;
    while (<*.java>) {
        chmod 0711, $_;         # set access rights for all java files
    }
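
A small sketch of these control structures together. The subroutine classify is a hypothetical helper invented for this example.

```perl
use strict;
use warnings;

# classify() is a made-up helper, used only to show if/elsif/else
# and a statement modifier in one place.
sub classify {
    my ($n) = @_;
    my $label;
    if ($n < 0) {
        $label = "negative";
    } elsif ($n == 0) {          # elsif, not elseif
        $label = "zero";
    } else {
        $label = "positive";
    }
    $label .= " even" unless $n % 2;   # statement modifier, no braces
    return $label;
}

my @labels = map { classify($_) } (-3, 0, 4);
print "$_\n" foreach @labels;
```

The modifier forms (... if ..., ... unless ..., ... foreach ...) take a single statement on the left; anything longer needs the braced form.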
Loopy arrays
    foreach $cds (@collection) {
        print "<LI>$cds<\/LI>\n";
    }
    @hex = (0 .. 9, 'a' .. 'f');          # range operator
    %knor = ('a', 1, 'b', 2, 'c', 10);    # squashed pairs key, value
    foreach $key (sort keys %knor) {
        print "$key has value $knor{$key}";
    }
    # Better hash notation
    %map = (red   => 'afghan',
            blue  => 'curacao',
            black => 'pearl');
Subroutines
No named parameters: take parameters out of the @_ array.
Variables are only local by explicit mention (my or local).
If there is no mention, identifiers have global package scope.
A signature (prototype) can be used to indicate the types of parameters (for checking).
    sub f ($$@) {
        $a = shift(@_);    # modifies global $a
        $b = shift;        # implicit parameter
        @rest = @_;
        # Or use ($a, $b, @rest) = @_;
        return (@rest, $a, $b);
    }
    @v = (3, 4, 5);
    $a = 2;
    @v = &f (1, $a, @v);
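
The difference between global and explicitly local variables can be sketched as follows. The names leaky, safe and $count are assumptions made for this illustration.

```perl
use strict;
use warnings;

our $count = 10;    # package-global variable

# Without my, the assignment overwrites the global $count.
sub leaky {
    $count = shift;
    return $count * 2;
}

# With my, $count is a new lexical variable local to the subroutine.
sub safe {
    my ($count) = @_;
    return $count * 2;
}

my $r1          = safe(3);
my $after_safe  = $count;    # global untouched
my $r2          = leaky(3);
my $after_leaky = $count;    # global overwritten

print "safe: $r1, global $after_safe; leaky: $r2, global $after_leaky\n";
```

Both subroutines return the same value; only leaky changes state outside itself.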
Signatures
Use signatures!
Optional parameters go to the right of the semicolon.
References (& in PHP) are prefixed by a backslash.
The first array can be changed; the second one, if available, cannot.
Unbackslashed arrays and hashes eat everything.
    sub complex ($\@;$@) {
        ......
    }
    sub dumbo (@$$) {
        @fst = @_;    # also includes 1 and 2
    }
    &complex (1, \@a, 2);
    &dumbo (@a, 1, 2);
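
A minimal sketch of a backslashed array in a signature. The subroutines append_to and count_args are hypothetical. Note that a call written with a leading & bypasses the signature, so this sketch calls the subroutines without it and lets Perl take the reference to the array itself.

```perl
use strict;
use warnings;

# \@ means: the first argument must be an array, passed by reference.
# Everything after the semicolon is optional.
sub append_to (\@;@) {
    my ($target, @extra) = @_;
    push @$target, @extra;      # changes the caller's array
    return scalar @$target;
}

# An unbackslashed @ flattens the array into the argument list.
sub count_args (@) {
    return scalar @_;
}

my @a    = (1, 2);
my $size = append_to(@a, 3, 4);    # @a itself is modified

my @b = (7, 8);
my $n = count_args(@b, 1, 2);      # the array elements and 1, 2 arrive together

print "size $size, a = @a, n = $n\n";
```

The signature only takes effect when the subroutine is declared before the call is compiled, which is one reason to put definitions at the top of the file.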
Programming advice
Perl can be used for CGI, but also for website maintenance.
Use use strict.
Add signature information to subroutines, and put function definitions at the top of your source files.
Turn on all warnings (-w option).
Avoid using implicit parameters unless you know what you are doing.
Avoid dependence on implicit casts.
Use or die ".....$!\n" whenever you do I/O.
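
The advice on use strict, warnings and or die can be sketched as below. The helper read_lines and the file path are assumptions made for this illustration; the path is assumed not to exist, so that the error branch is exercised.

```perl
use strict;      # undeclared variables become compile-time errors
use warnings;    # same effect as the -w option, scoped to this file

# read_lines() is a hypothetical helper that dies with the system
# error message ($!) when the file cannot be opened.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my @lines = <$fh>;
    close($fh) or die "Cannot close $path: $!\n";
    return @lines;
}

# The path is assumed not to exist; eval catches the resulting die.
my @lines = eval { read_lines("/no/such/dir/no-such-file.txt") };
my $error = $@;
print "Caught: $error" if $error;
```

Ending the die message with \n keeps Perl from appending the script name and line number to it.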
Regular expressions
Most languages do have them, but rarely as embedded into the language as in Perl.
I concentrate on regular expressions and on using them for
  matching (for validation)
  substitution (for modification)
Regular expressions are similar to the Regular Languages and Finite Automata of Grammaticas en Ontleden.
However, most regular expression languages can do more.
In fact, in some respects they go beyond the context free languages.
PHP offers Perl regular expressions, with some minor differences.
First examples
Pattern matching using /.../ or m/.../
    print "Not empty" if ($str =~ /\S/);
    if ($bandname =~ /[iI]ce/) {....}
    @words = $line =~ m/\S+/g;
Substituting for a pattern: s/../../..
    $s = "jack in the box";
    ($t = $s) =~ s/\s+/-/g;    # jack-in-the-box
    ($v = $s) =~ s/\s*/-/g;    # -j-a-c-k--i-n-...
    $s =~ s/\S+/X/g;           # X X X X
Possible flags:
  i  match case-insensitively
  g  match more than once in one line (global)
  s  newline is like any other character (. also matches \n)
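
The effect of the three flags can be seen side by side in a short sketch. The sample strings and variable names are made up for this example.

```perl
use strict;
use warnings;

# i: ignore case
my $band       = "Vanilla ICE";
my $with_i     = ($band =~ /ice/i) ? 1 : 0;
my $without_i  = ($band =~ /ice/)  ? 1 : 0;

# s: let . match a newline as well
my $text       = "one\ntwo";
my $with_s     = ($text =~ /one.two/s) ? 1 : 0;
my $without_s  = ($text =~ /one.two/)  ? 1 : 0;

# g: replace every occurrence instead of only the first one
(my $first_only = "a b c") =~ s/ /-/;
(my $all        = "a b c") =~ s/ /-/g;

print "i: $with_i/$without_i  s: $with_s/$without_s\n";
print "$first_only  $all\n";
```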
INP 2003/2004 - regular expressions
Characteristics
Matching is done from left to right, and each match is as long as possible.
Under the g flag, as many matches as possible are found, trying from left to right.
Resulting matches can be put in an array (or a single scalar).
In a boolean context, matching is true if a match was found.
Substitution actually changes the string you match on.
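
These characteristics can be checked on a small string. This is a sketch with a made-up sample string; note that the leftmost match wins even though a longer run of a's occurs further to the right.

```perl
use strict;
use warnings;

my $str = "aaa bbb aaaa";

# Leftmost match, as long as possible at that position.
my ($first) = $str =~ /(a+)/;

# Under g in list context: all matches, from left to right.
my @all = $str =~ /a+/g;

# Boolean context: true if a match was found.
my $found = ($str =~ /b+/) ? "yes" : "no";

# Substitution changes the string it is applied to, so work on a
# copy when the original is still needed.
(my $copy = $str) =~ s/a+/X/g;    # $str is unchanged here
$str =~ s/b+/Y/;                  # $str itself is changed here

print "$first | @all | $found | $copy | $str\n";
```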
Character classes
The usual special characters: \t, \n, \\ but also \s, \S, \w, \W.
Matching any single lower case letter: [a-z].
All digits: [0-9].
Alphanumeric: [0-9A-Za-z], or use [[:alnum:]].
Combine them: [01[:alpha:]#].
Digits also by \d, white space by \s, non-whitespace by \S.
Complementation: [[:^space:]] is equivalent to \S.
Every character but a, e or f: [^aef].
Rejecting a match: $name !~ /lennart/ is true for every $name that does not contain lennart.
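
A short sketch of the character classes above. The input strings and the name piet are made up for illustration. In Perl the named classes such as [:digit:] only work inside an enclosing bracket pair, hence the double brackets.

```perl
use strict;
use warnings;

my $input = "Room A104, tel 030-2533921";

my @digit_runs = $input =~ /\d+/g;              # runs of digits via \d
my @posix_runs = $input =~ /[[:digit:]]+/g;     # the same via a named class
my @letters    = $input =~ /[[:alpha:]]+/g;     # runs of letters

# Complemented classes
my $not_aef  = join "", "leaf"   =~ /[^aef]/g;         # drops a, e and f
my $nonspace = join "", "a b\tc" =~ /[[:^space:]]/g;   # drops white space

# Combining single characters and a named class in one set
my @combined = "x0 y1 #2" =~ /[01[:alpha:]#]/g;

# !~ is true when the pattern does not match
my $not_lennart = ("piet" !~ /lennart/) ? 1 : 0;

print "@digit_runs | @letters | $not_aef | $nonspace | @combined\n";
```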
Matching sequences
A character class matches only a single character. How can we match sequences of characters?
/max[iy]ma/ matches both maxima and maxyma.
/a*/ matches a sequence of zero or more a's.
/\S+/ matches a sequence of one or more non-whitespace characters.
Matching decimal numbers with [[:digit:]]+
Matching identifiers: [a-zA-Z$_][a-zA-Z0-9$_]*
Matching the word option or nothing: (option)?
Parentheses can be used to group patterns.
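
The sequence patterns above, written as anchored Perl patterns for validation. This is a sketch: the helper matches and the test strings are assumptions. Inside a Perl pattern the $ in the identifier class has to be escaped, because $_ would otherwise be interpolated as a variable.

```perl
use strict;
use warnings;

# ^ and $ anchor the pattern so that the whole string must match.
my $identifier = qr/^[a-zA-Z\$_][a-zA-Z0-9\$_]*$/;
my $decimal    = qr/^[[:digit:]]+$/;
my $optional   = qr/^(option)?$/;
my $maxima     = qr/^max[iy]ma$/;

# matches() is a hypothetical helper returning 1 or 0.
sub matches {
    my ($re, $s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $candidate ('$snore_1', '9lives') {
    print "$candidate: ", matches($identifier, $candidate), "\n";
}
```

Without the anchors the identifier pattern would also accept 9lives, because it would simply match the part starting at the l.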
Take care
Perl regular expressions are not regular:
    /([a-zA-Z]+)\s+\1/
matches a word followed by that same word.
On $t = "abc ac abcdef define"; it yields def.
Here we must use \1 in place of $1.
Not even context free.
As you can imagine, I skipped a few facilities.
Read the perl manual pages on regular expressions for the complete story.
Study regular expressions well: they pop up everywhere.
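
The backreference example can be run as follows. The second half, which removes doubled words, is an added illustration with a made-up sentence; it shows \1 inside the pattern next to $1 in the replacement.

```perl
use strict;
use warnings;

my $t = "abc ac abcdef define";

# \1 refers back to whatever the first group matched.
# The match is the def at the end of abcdef, followed by the def
# at the start of define.
my ($word) = $t =~ /([a-zA-Z]+)\s+\1/;

# Inside the pattern: \1. In the replacement part: $1.
# \b restricts the match to whole words.
(my $dedup = "this is is a test") =~ s/\b(\w+)\s+\1\b/$1/g;

print "$word\n$dedup\n";
```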
Regex in PHP
preg_grep - Return array entries that match the pattern.
preg_match_all - Perform a global regular expression match.
preg_match - Perform a regular expression match.
preg_replace - Perform a regular expression search and replace.
preg_split - Split string by a regular expression.
http://weblogtoolscollection.com/regex/regex.php could be useful.
    $fl_array = preg_grep("/^(\d+)?\.\d+$/", $array);
    $string = 'April 15, 2003';
    $pattern = '/(\w+) (\d+), (\d+)/i';
    $replacement = '${1}1,$3';
    echo preg_replace($pattern, $replacement, $string);