The Default Behavior For Matching Can Be Changed

The document discusses various regular expression modifiers in Perl. It describes modifiers like /i for case-insensitive matching, /x for whitespace and comments in patterns, and character set modifiers like /a. It provides details on some modifiers and examples of using modifiers like /x to make patterns more readable.

Uploaded by

Mahendar S
Copyright
© All Rights Reserved
The default behavior for matching can be changed, using various modifiers. Modifiers that
relate to the interpretation of the pattern are listed just below. Modifiers that alter the way a
pattern is used by Perl are detailed in "Regexp Quote-Like Operators" in perlop and "Gory
details of parsing quoted constructs" in perlop. Modifiers can be added dynamically;
see "Extended Patterns" below.

m

Treat the string being matched against as multiple lines. That is,
change "^" and "$" from matching the start of the string's first line and the end of its
last line to matching the start and end of each line within the string.

s

Treat the string as a single line. That is, change "." to match any character
whatsoever, even a newline, which normally it would not match.

Used together, as /ms, they let the "." match any character whatsoever, while still
allowing "^" and "$" to match, respectively, just after and just before newlines within
the string.
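
The effect of these two modifiers can be sketched with a short script. The variable names and the sample string below are illustrative assumptions, not part of the original text.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, "^" matches only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;

# With /m, "^" also matches just after each embedded newline.
my @multi = $text =~ /^(\w+)/mg;

# Without /s, "." refuses to cross the newline between the two lines.
my $dot_plain = $text =~ /first.*second/ ? 1 : 0;

# With /s, "." matches the newline as well.
my $dot_single = $text =~ /first.*second/s ? 1 : 0;

# With /ms, "^" still anchors after the embedded newline, while "."
# is free to swallow the trailing newline.
my ($rest) = $text =~ /^(second.*)/ms;

print "plain: @plain\n";
print "multi: @multi\n";
print "dot without /s: $dot_plain, with /s: $dot_single\n";
```

Here @plain should hold only "first", while @multi should hold both "first" and "second".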

i

Do case-insensitive pattern matching. For example, "A" will match "a" under /i.

If locale matching rules are in effect, the case map is taken from the current locale for
code points less than 255, and from Unicode rules for larger code points. However,
matches that would cross the Unicode rules/non-Unicode rules boundary (ords
255/256) will not succeed, unless the locale is a UTF-8 one. See perllocale.

There are a number of Unicode characters that match a sequence of multiple
characters under /i. For example, LATIN SMALL LIGATURE FI should match the
sequence fi. Perl is not currently able to do this when the multiple characters are in
the pattern and are split between groupings, or when one or more are quantified.
Thus:

"\N{LATIN SMALL LIGATURE FI}" =~ /fi/i; # Matches

"\N{LATIN SMALL LIGATURE FI}" =~ /[fi][fi]/i; # Doesn't match!

"\N{LATIN SMALL LIGATURE FI}" =~ /fi*/i; # Doesn't match!

# The below doesn't match, and it isn't clear what $1 and $2 would

# be even if it did!!

"\N{LATIN SMALL LIGATURE FI}" =~ /(f)(i)/i; # Doesn't match!

Perl doesn't match multiple characters in a bracketed character class unless the
character that maps to them is explicitly mentioned, and it doesn't match them at all if
the character class is inverted, which otherwise could be highly confusing.
See "Bracketed Character Classes" in perlrecharclass, and "Negation" in
perlrecharclass.

x and xx

Extend your pattern's legibility by permitting whitespace and comments. Details in "/x
and /xx".

p

Preserve the string matched such that ${^PREMATCH}, ${^MATCH},
and ${^POSTMATCH} are available for use after matching.

In Perl 5.20 and higher this is ignored. Due to a new copy-on-write
mechanism, ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} will be available after the
match regardless of the modifier.
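
As a minimal sketch of how these three variables divide up the target string, consider the following. The sample string and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $line = "key=value";
my ($pre, $match, $post);

# The modifier is required before Perl 5.20 and harmless afterwards.
if ($line =~ /=/p) {
    $pre   = ${^PREMATCH};    # text before the match
    $match = ${^MATCH};       # the matched text itself
    $post  = ${^POSTMATCH};   # text after the match
}

print "pre=[$pre] match=[$match] post=[$post]\n";
```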

a, d, l, and u

These modifiers, all new in 5.14, affect which character-set rules (Unicode, etc.) are
used, as described below in "Character set modifiers".

n

Prevent the grouping metacharacters () from capturing. This modifier, new in 5.22,
will stop $1, $2, etc. from being filled in.

"hello" =~ /(hi|hello)/; # $1 is "hello"

"hello" =~ /(hi|hello)/n; # $1 is undef

This is equivalent to putting ?: at the beginning of every capturing group:

"hello" =~ /(?:hi|hello)/; # $1 is undef

/n can be negated on a per-group basis. Alternatively, named captures may still be
used.

"hello" =~ /(?-n:(hi|hello))/n; # $1 is "hello"

"hello" =~ /(?<greet>hi|hello)/n; # $1 is "hello", $+{greet} is "hello"

Other Modifiers

There are a number of flags that can be found at the end of regular expression
constructs that are not generic regular expression flags, but apply to the operation
being performed, like matching or substitution (m// or s/// respectively).

Flags described further in "Using regular expressions in Perl" in perlretut are:

c - keep the current position during repeated matching

g - globally match the pattern repeatedly in the string

Substitution-specific modifiers described
in "s/PATTERN/REPLACEMENT/msixpodualngcer" in perlop are:

e - evaluate the right-hand side as an expression

ee - evaluate the right side as a string then eval the result

o - pretend to optimize your code, but actually introduce bugs

r - perform non-destructive substitution and return the new value
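
A few of the operation-level flags listed above can be sketched together as follows. The sample data and variable names are illustrative assumptions; /r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $orig = "3 apples and 4 pears";

# /r leaves $orig untouched and returns the modified copy.
my $copy = $orig =~ s/apples/plums/r;

# /e evaluates the replacement as Perl code; /g repeats it for every match.
(my $doubled = $orig) =~ s/(\d+)/$1 * 2/ge;

# /g in list context returns every match at once.
my @numbers = $orig =~ /(\d+)/g;

# /g in scalar context advances pos(); with /c a failed match keeps it.
my $pos_after_fail;
if ($orig =~ /\d+/g) {
    $orig =~ /zebra/gc;              # fails, but pos() is preserved
    $pos_after_fail = pos($orig);
}

print "$copy\n$doubled\n@numbers\n";
```

Without /c, the failed match would reset pos($orig) to undef, and the next scalar /g match would start again from the beginning of the string.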

Regular expression modifiers are usually written in documentation as e.g., "the /x
modifier", even though the delimiter in question might not really be a slash. The
modifiers /imnsxadlup may also be embedded within the regular expression itself using
the (?...) construct, see "Extended Patterns" below.
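
The difference between a trailing modifier and an embedded one can be sketched as below, using /i as the example. The test strings and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# (?i) turns on case-insensitivity for the rest of the enclosing group,
# here the rest of the pattern.
my $whole = "HELLO World" =~ /(?i)hello world/ ? 1 : 0;

# (?i:...) limits the modifier to the parenthesized part only.
my $scoped_ok  = "HELLO World" =~ /(?i:hello) World/ ? 1 : 0;
my $scoped_bad = "HELLO WORLD" =~ /(?i:hello) World/ ? 1 : 0;

# The delimiter need not be a slash; the modifier is still called "/i".
my $braces = "HELLO" =~ m{hello}i ? 1 : 0;

print "whole=$whole scoped_ok=$scoped_ok scoped_bad=$scoped_bad braces=$braces\n";
```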

Details on some modifiers

Some of the modifiers require more explanation than given in the "Overview" above.

/x and /xx

A single /x tells the regular expression parser to ignore most whitespace that is
neither backslashed nor within a bracketed character class. You can use this to break
up your regular expression into more readable parts. Also, the "#" character is treated
as a metacharacter introducing a comment that runs up to the pattern's closing
delimiter, or to the end of the current line if the pattern extends onto the next line.
Hence, this is very much like an ordinary Perl code comment. (You can include the
closing delimiter within the comment only if you precede it with a backslash, so be
careful!)

Use of /x means that if you want real whitespace or "#" characters in the pattern
(outside a bracketed character class, which is unaffected by /x), then you'll either
have to escape them (using backslashes or \Q...\E) or encode them using octal, hex,
or \N{} or \p{name=...} escapes. It is ineffective to try to continue a comment onto
the next line by escaping the \n with a backslash or \Q.

You can use "(?#text)" to create a comment that ends earlier than the end of the
current line, but text also can't contain the closing delimiter unless escaped with a
backslash.

A common pitfall is to forget that "#" characters begin a comment under /x and are not
matched literally. Just keep that in mind when trying to puzzle out why a
particular /x pattern isn't working as expected.
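
The pitfall described above, and two ways around it, can be sketched as follows. The input string and the variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $input = "issue #42";

# Unescaped: the space is ignored and "#" starts a comment that runs to
# the end of the line, so the group effectively contains only "issue".
my ($naive) = $input =~ /( issue #\d+
                         )/x;

# Literal space via a bracketed class, literal "#" via a backslash.
my ($fixed) = $input =~ /( issue [ ] \# \d+ )/x;

# (?#...) gives an inline comment that ends before the end of the line.
my ($inline) = $input =~ /( issue (?#the word) [ ] \# (?#then digits) \d+ )/x;

print "naive=[$naive] fixed=[$fixed] inline=[$inline]\n";
```

The first pattern still matches, which is what makes the mistake easy to miss: it silently matches less than intended.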

Starting in Perl v5.26, if the modifier has a second "x" within it, it does everything
that a single /x does, but additionally non-backslashed SPACE and TAB characters within
bracketed character classes are also generally ignored, and hence can be added to make
the classes more readable.

/ [d-e g-i 3-7]/xx

/[ ! @ " # $ % ^ & * () = ? <> ' ]/xx

may be easier to grasp than the squashed equivalents

/[d-eg-i3-7]/

/[!@"#$%^&*()=?<>']/
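
A small sketch of /xx in use, assuming Perl v5.26 or later is available; the hexadecimal class and the test strings are made up for illustration.

```perl
use strict;
use warnings;
use v5.26;    # /xx is only recognized from this version on

# The blanks inside the brackets are ignored under /xx.
my $spaced   = qr/ [ a-f 0-9 ]+ /xx;
my $squashed = qr/[a-f0-9]+/;

my ($m1) = "x=c0ffee!" =~ /($spaced)/;
my ($m2) = "x=c0ffee!" =~ /($squashed)/;

# A space that should really be matched has to be backslashed.
my $with_space = "a b" =~ /^ [ a-z \  ]+ $/xx ? 1 : 0;

print "m1=$m1 m2=$m2 with_space=$with_space\n";
```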

Taken together, these features go a long way towards making Perl's regular
expressions more readable. Here's an example:

# Delete (most) C comments.
$program =~ s {
    /\*     # Match the opening delimiter.
    .*?     # Match a minimal number of characters.
    \*/     # Match the closing delimiter.
} []gsx;

Note that anything inside a \Q...\E stays unaffected by /x. And note that /x doesn't
affect space interpretation within a single multi-character construct. For example,
(?:...) can't have a space between the "(", "?", and ":". Within any delimiters for
such a construct, allowed spaces are not affected by /x, and depend on the construct.
For example, all constructs using curly braces as delimiters, such as \x{...}, can
have blanks inside the braces, but only adjacent to them and not elsewhere, and no
whitespace characters other than blanks. The exception is Unicode properties, which
follow Unicode rules, for which see "Properties accessible through \p{} and \P{}" in
perluniprops.

The set of characters that are deemed whitespace are those that Unicode calls
"Pattern White Space", namely:

U+0009 CHARACTER TABULATION

U+000A LINE FEED

U+000B LINE TABULATION

U+000C FORM FEED

U+000D CARRIAGE RETURN

U+0020 SPACE

U+0085 NEXT LINE

U+200E LEFT-TO-RIGHT MARK

U+200F RIGHT-TO-LEFT MARK

U+2028 LINE SEPARATOR

U+2029 PARAGRAPH SEPARATOR

Character set modifiers

/d, /u, /a, and /l, available starting in 5.14, are called the character set
modifiers; they affect the character set rules used for the regular expression.
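
As a rough illustration of what "character set rules" can mean in practice, the sketch below assumes the behavior documented in perlre, where /u applies Unicode rules to \d and /a restricts it to the ASCII digits. The variable names are illustrative.

```perl
use strict;
use warnings;

my $arabic_three = "\x{663}";    # ARABIC-INDIC DIGIT THREE

# Under Unicode rules, any decimal digit character satisfies \d.
my $unicode_digit = $arabic_three =~ /^\d$/u ? 1 : 0;

# Under /a, \d is limited to the ASCII digits 0 through 9.
my $ascii_digit = $arabic_three =~ /^\d$/a ? 1 : 0;

# An ordinary ASCII digit matches either way.
my $plain_digit = "3" =~ /^\d$/a ? 1 : 0;

print "unicode=$unicode_digit ascii=$ascii_digit plain=$plain_digit\n";
```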

The /d, /u, and /l modifiers are not likely to be of much use to you, and so you need
not worry about them very much. They exist for Perl's internal use, so that complex
regular expression data structures can be automatically serialized and later exactly
reconstituted, including all their nuances. But, since Perl can't keep a secret, and
there may be rare instances where they are useful, they are documented here.

The /a modifier, on the other hand, may be useful. Its purpose is to allow code that is t
