REGEX Cheat Sheet
https://staff.washington.edu/weller/grep.html 1/4
1/9/24, 7:12 AM REGEX Cheat Sheet
*? +? and {n,}? are lazy: match as little as possible
<.+?> finds 2 matches in <b>bold</b>
ignore case, single-line, multi-line, free spacing
(?i)[a-z]*(?-i) ignore case ON / OFF
(?s).*(?-s) match multiple lines (causes . to match newline)
(?m)^.*;$(?-m) ^ & $ match lines not whole string
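A minimal sketch of the lazy quantifier and the i, s, and m modifiers, using Python's standard `re` module as an assumed engine. Python expects inline flags at the start of the pattern and does not accept a standalone OFF switch such as (?-i), so the flags here apply to the whole pattern. The sample strings are made up for illustration.

```python
import re

html = "<b>bold</b>"

# Lazy +? stops at the first ">" it can reach, so each tag is its own match.
lazy_tags = re.findall(r"<.+?>", html)
# Greedy + runs on to the last ">" in the string, giving one long match.
greedy_tags = re.findall(r"<.+>", html)

# (?i): ignore case, so [a-z] also accepts capitals.
ignore_case = re.fullmatch(r"(?i)[a-z]*", "RegEx")

# (?s): "." also matches newline, so .* can span both lines.
single_line = re.search(r"(?s)<b>.*</b>", "<b>one\ntwo</b>")

# (?m): ^ and $ match at every line, not only at the ends of the string.
statements = re.findall(r"(?m)^.*;$", "a = 1;\nb = 2\nc = 3;")

print(lazy_tags, greedy_tags, statements)
```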
comments: (?#comment)
(?#year)(19|20)\d\d embedded comment
(?x)(19|20)\d\d #year free spacing & EOL comment (see modifiers)
free-spacing mode, one pattern written across several lines:
(?x) #free-spacing mode, this EOL comment ignored
\d{3} #3 digits (new line but same pattern)
-\d{4} #literal hyphen, then 4 digits
(?-x) (?#free-spacing mode OFF)
/regex/ismx modify mode for entire string
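The comment styles can be tried out as follows. This is a sketch that assumes Python's `re` module, where (?x) has to open the pattern and there is no (?-x) OFF switch; the names `year` and `digits` are illustrative only.

```python
import re

# Embedded comment: the engine skips (?#...) entirely.
year = re.compile(r"(?#year)(19|20)\d\d")

# Free-spacing mode: whitespace in the pattern is ignored and
# "#" starts a comment that runs to the end of the line.
digits = re.compile(r"""(?x)
    \d{3}     # 3 digits (new line but same pattern)
    -\d{4}    # literal hyphen, then 4 digits
""")

year_ok = year.fullmatch("1999") is not None
digits_ok = digits.fullmatch("555-1234") is not None
# Pattern whitespace is not part of the match, so a space in the input fails.
space_rejected = digits.fullmatch("555 -1234") is None

print(year_ok, digits_ok, space_rejected)
```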
A few examples:
The lookahead prevents matches on PRE, PARAM, and PROGRESS tags: more characters are allowed in the opening tag only if the P is
followed by whitespace. Otherwise, ">" must follow "<p".
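The pattern this note describes is not reproduced above, so the following is only one possible pattern that behaves as described, written for Python's `re` module. `P_OPEN` and `is_p_open_tag` are hypothetical names, and the expression should not be read as the original one.

```python
import re

# Hypothetical pattern, not taken from the source: extra characters are
# allowed only when whitespace follows "<p"; otherwise ">" must come next.
P_OPEN = re.compile(r"(?i)<p(?:(?=\s)[^>]*)?>")

def is_p_open_tag(tag: str) -> bool:
    """Return True if the whole string is an opening P tag."""
    return P_OPEN.fullmatch(tag) is not None

for tag in ["<p>", '<P class="intro">', "<pre>", '<param name="a">', "<progress>"]:
    print(tag, is_p_open_tag(tag))
```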
LOOKAROUND notes
The lookahead seeks "e" only for the sake of matching "r".
Because the lookahead condition is ZERO-width, the expression is logically impossible.
It requires the 2nd character to be both "e" and "d".
For looking ahead, "e" must follow "r".
For matching, "d" must follow "r".
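The expressions these notes refer to are not shown above. Assuming they were r(?=e) and r(?=e)d (an inference from the notes, not something the source confirms), the zero-width behaviour can be checked in Python like this:

```python
import re

# Assumed pattern 1: "e" is checked but not consumed, so only "r" is matched.
match_r = re.search(r"r(?=e)", "red")
matched_text = match_r.group() if match_r else None
matched_end = match_r.end() if match_r else None

# Assumed pattern 2: the character after "r" would have to be "e"
# (for the lookahead) and "d" (for the match) at the same time.
impossible = re.compile(r"r(?=e)d")
candidates = ["red", "rd", "re", "reed"]
hits = [s for s in candidates if impossible.search(s)]

print(matched_text, matched_end, hits)
```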
fixed-width lookbehind
Most regex engines depend on knowing the width of lookbehind patterns. Ex: (?<=h1) or (?<=\w{4}) look behind for
"h1" or for 4 "word" characters.
This limits lookbehind patterns when matching HTML tags, since the width of tag names and their potential attributes can't be
known in advance.
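Python's standard `re` module is one engine with this restriction, so it makes a convenient illustration. In the sketch below the helper `compiles` and the sample strings are illustrative only.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if this engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width lookbehinds are accepted.
after_h1 = re.search(r"(?<=h1>)\w+", "<h1>Title")
after_word = re.findall(r"(?<=\w{4})!", "stop! go!")

# A lookbehind of unknown width is rejected by Python's re module.
variable_ok = compiles(r"(?<=\w+)!")

print(after_h1.group(), after_word, variable_ok)
```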
variable-width lookbehind
.NET and JGSoft support variable-width lookbehind patterns. Ex: (?<=\w+) look behind for 1 or more word characters.
The first few examples below rely on this ability.
match text bound by simple HTML tags (NB: < \w+ > does not match tags with attributes.)
(?<=<(\w+)>).*(?=</\1>)
Lookaround groups define the context for a match. Here, we're seeking .* i.e., 0 or more characters.
A positive lookbehind group (?<= . . . ) precedes. A positive lookahead group (?= . . . ) follows.
These set the boundaries of the match: advance along the string until an opening HTML tag precedes, then match characters until
its closing HTML tag follows.
The tags themselves are not matched, only the text between them.
To span multiple lines, use the (?s) modifier. (?s)(?<=<cite>).*(?=</cite>) Match <cite> tag contents, regardless of line
breaks.
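The <cite> pattern uses a fixed-width lookbehind, so it also runs on engines without variable-width support. A short sketch in Python's `re` module, with a made-up sample string:

```python
import re

text = "See <cite>line one\nline two</cite> for details."

# With (?s), "." matches newline, so the contents can span lines.
with_s = re.search(r"(?s)(?<=<cite>).*(?=</cite>)", text)
contents = with_s.group() if with_s else None

# Without (?s), .* stops at the line break and never reaches </cite>.
without_s = re.search(r"(?<=<cite>).*(?=</cite>)", text)

print(repr(contents), without_s)
```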
match text bound by HTML tags, including tags with attributes (not nested, though)
(?<=<(\w+?)\ ?.*?>).*(?=</\1>)
As in the example above, the first group (\w+?) captures the presumed tag name; then an optional space and other characters,
\ ?.*?, allow for attributes before the closing >.
class=".*?\bred\b.*?" this new part looks for class=" and red and " somewhere in the opening tag
\b ensures "red" is a single word
.*? allow for other characters on either side of "red" so pattern matches class="red" and class="blue red green" etc.
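The class fragment can be tested on its own against a few opening tags. This sketch assumes Python's `re` module and invented sample tags. Because .*? may also run past a closing quote, a later attribute containing "red" can satisfy the pattern too. The stricter [^"]*? form shown for comparison is a suggested variant, not part of the source.

```python
import re

RED_CLASS = re.compile(r'class=".*?\bred\b.*?"')
# Suggested stricter variant (an assumption): stay inside the quotes.
RED_CLASS_STRICT = re.compile(r'class="[^"]*?\bred\b[^"]*?"')

tags = [
    '<p class="red">',
    '<p class="blue red green">',
    '<p class="redder">',   # \b rejects "red" inside a longer word
    '<p id="red">',         # no class attribute at all
]
matching = [t for t in tags if RED_CLASS.search(t)]

# "red" appears only in the id attribute here.
tricky = '<p class="big" id="red">'
loose_hit = RED_CLASS.search(tricky) is not None
strict_hit = RED_CLASS_STRICT.search(tricky) is not None

print(matching, loose_hit, strict_hit)
```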
match complex opening & closing xhtml tags and all text between
(?i)<([a-z][a-z0-9]*)[^>]*>.*?</\1>
Here, the first group captures only the tag name. The tag's potential attributes are outside the group.
(NB: This markup <a onclick="valueOK('a>b')"> would end the match early. Doesn't matter here. Subsequent < pulls
match to closing tag. But if you attempted to match only the opening tag, it might be truncated in rare cases.)
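This pattern has no lookbehind, so it runs unchanged in Python's `re` module. A small sketch with a made-up fragment of markup:

```python
import re

ELEMENT = re.compile(r"(?i)<([a-z][a-z0-9]*)[^>]*>.*?</\1>")

html = '<div id="x">Hello <b>bold</b> world</div><p>next</p>'

first = ELEMENT.search(html)
whole_element = first.group(0) if first else None   # tags and text between
tag_name = first.group(1) if first else None        # group 1: tag name only

# findall returns group 1 for each successive match.
tag_names = ELEMENT.findall(html)

print(whole_element, tag_name, tag_names)
```

The inner <b> element is skipped because the lazy .*? keeps extending until the closing tag named by \1 follows.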
IF condition — phone number w/optional parentheses around area code (and optional space after closing parens)
(\()?\d{3}(?(1)\) ?|[- \.])\d{3}[- \.]\d{4}
(\()?\d{3} optional group ( )? matches "(" prior to 3-digit area code \d{3}; the group creates back reference #1
(?(1)\) ?|[- \.]) (1) refers to group 1, so if "(" exists, match ")" followed by optional space, else match one of these: hyphen, space, or period
\d{3}[- \.]\d{4} rest of phone number
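Python's `re` module supports the (?(1)yes|no) conditional, so the full pattern can be exercised directly. In this sketch `is_phone` is a hypothetical helper and the numbers are made up. It uses fullmatch so that a stray "(" cannot be skipped by starting the match one character later.

```python
import re

PHONE = re.compile(r"(\()?\d{3}(?(1)\) ?|[- \.])\d{3}[- \.]\d{4}")

def is_phone(s: str) -> bool:
    """Return True if the whole string is a phone number in an accepted form."""
    return PHONE.fullmatch(s) is not None

samples = [
    "(206) 555-1234",   # parentheses, optional space
    "(206)555-1234",    # parentheses, no space
    "206-555-1234",     # no parentheses, separator required
    "(206-555-1234",    # "(" opened but never closed
    "206) 555-1234",    # ")" without "("
]
for s in samples:
    print(s, is_phone(s))
```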
groups can be named (assume a file of lastname, firstname altered using "preg_replace()")
(?#find)(\b.+), (\b.*\b) (?#replace)\2 \1
(?#find)(?P<lname>\b.+), (?P<fname>\b.*\b) (?#replace) (?P=fname) (?P=lname)
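The source applies these with preg_replace(). As a rough equivalent, the same find patterns can be run through Python's re.sub, where named groups are referenced in the replacement as \g<name> rather than with the (?P=name) form shown above. The sample name is made up.

```python
import re

line = "Smith, John"

# Numbered groups: \1 is the last name, \2 the first name.
by_number = re.sub(r"(\b.+), (\b.*\b)", r"\2 \1", line)

# Named groups: same pattern, with names in place of numbers.
by_name = re.sub(
    r"(?P<lname>\b.+), (?P<fname>\b.*\b)",
    r"\g<fname> \g<lname>",
    line,
)

print(by_number, by_name)
```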