Regular Expressions Cheat Sheet
Basic definitions
- String s matches the regex pattern /p/ whenever s contains the pattern ‘p’.
- Example: abc matches /a/, /b/, /c/
- For simplicity, we will use the verb "matches" loosely, in the sense that
- a string can match a regex (e.g. ‘a’ matches /a/)
- a regex can match a string (e.g. /a/ matches ‘a’)
- A regex pattern consists of literal characters that match themselves, and metasyntax characters
- Literal characters can be concatenated in a regular expression. String s matches /ab/ if there is an a character directly followed by a b character.
- Example: abc matches /ab/, /bc/, /abc/
- Example: abc does not match /ac/, /cd/, /abcd/
- Alternative execution can be achieved with the metasyntax character |
- /a|b/ means: match either a or b
- Example: ‘ac’, ‘b’, ‘ab’ match /a|b/
- Example: ‘c’ does not match /a|b/
- Iteration is achieved using repeat modifiers. One repeat modifier is the * (asterisk) metasyntax character.
- Example: /a*/ matches any string containing any number of a characters
- Example: /a*/ matches any string, including '', because they all contain at least zero a characters
- Matching is greedy. A greedy match attempts to stay in iterations as long as possible.
- Example: s = 'baaa' matches /a*a/ in the following way:
- s[0]: 'b' is discarded
- s[1]: 'a' matches the pattern a*
- s[1] - s[2]: 'aa' matches the pattern a*
- s[1] - s[3]: 'aaa' matches the pattern a*
- as there are no more characters in s and there is a character yet to be matched in the regex, we backtrack one character
- s[1] - s[2]: 'aa' matches the pattern a*, and we finish investigating the a* pattern
- s[3]: 'a' matches the a pattern
- there is a complete match: s[1] - s[2] match the a* pattern, and s[3] matches the a pattern. The returned match is aaa, starting at index 1 of string s
- Backtracking is minimal. We attempt to backtrack one character at a time in the string, and attempt to interpret the rest of the regex
pattern on the remainder of the string.
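The definitions above can be checked directly in a JavaScript console. The following is a minimal sketch; the variable names are chosen only for illustration.

```javascript
// Containment: a string matches if the pattern occurs anywhere in it.
const containsAb = /ab/.test('abc');   // true
const containsAc = /ac/.test('abc');   // false: a is not directly followed by c

// Alternation: either branch may match.
const altMatch = /a|b/.test('ac');     // true
const altMiss = /a|b/.test('c');       // false

// /a*/ matches any string, because zero repetitions are allowed.
const starOnEmpty = /a*/.test('');     // true

// Greedy matching with backtracking: a* first takes 'aaa', then gives
// one character back so that the final literal a can match.
const greedy = /a*a/.exec('baaa');
console.log(greedy[0], greedy.index);  // aaa 1
```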
Constructing a regex
• literal form: /regex/
• constructor: new RegExp( 'regex' );
- escaping: /\d/ becomes new RegExp( '\\d' )
- argument list: new RegExp( pattern, modifiers );
- g: global match. We attempt to find all matches instead of just returning the first match. The internal state of the regular expression stores where the last match was located, and matching resumes where the previous match left off.
- m: multiline match. The ^ and $ characters match the beginning and the end of each line of the tested string. Lines are separated by \n or \r.
- y: sticky search. The match has to start exactly at the position stored in the lastIndex property of the regex.
Example:
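A minimal sketch of the two construction forms and of the g, m, and y modifiers. The sample strings and variable names are made up for illustration.

```javascript
// Literal form and constructor form build equivalent patterns.
// Inside a string, the backslash itself has to be escaped.
const literal = /\d/;
const constructed = new RegExp('\\d');
const sameSource = literal.source === constructed.source; // true

// g: the regex remembers where the last match ended in lastIndex.
const globalRegex = new RegExp('a', 'g');
const first = globalRegex.exec('aXa');     // match at index 0
const afterFirst = globalRegex.lastIndex;  // 1
const second = globalRegex.exec('aXa');    // match at index 2
const third = globalRegex.exec('aXa');     // null, and lastIndex resets to 0

// m: ^ and $ also match at line boundaries.
const withoutM = /^b$/.test('a\nb');       // false
const withM = /^b$/m.test('a\nb');         // true

// y: the match must start exactly at lastIndex.
const sticky = /a/y;
sticky.lastIndex = 1;
const stickyMiss = sticky.test('aXa');     // false: index 1 holds 'X'
```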
Regex API
- regex.exec( str ): returns information on the first match. exec allows iteration over all matches if the g modifier is set for the regex
- regex.test( str ): true if and only if the regex matches the string
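Iterating with exec could look like the sketch below. Each call resumes after the previous match until exec returns null. The input string and the found array are illustrative only.

```javascript
const regex = /a\d/g;
const input = 'a1 b2 a3';

// Collect every match together with its position.
const found = [];
let match;
while ((match = regex.exec(input)) !== null) {
  found.push({ text: match[0], index: match.index });
}
console.log(found); // a1 at index 0, a3 at index 6

// test only reports whether there is a match at all.
const hasMatch = /a\d/.test(input); // true
```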
String API
- str.match( regex ): for non-global regular expression arguments, it returns the same return value as regex.exec( str ). For global regular expressions, the return value is an array containing all the matches. Returns null if no match has been found.
- str.replace( regex, newString ): replaces the first full match with newString. If regex has a global modifier, str.replace( regex, newString ) replaces all matches with newString. Does not mutate the original string str.
- str.search( regex ): returns the index of the first match. Returns -1 when no match is found. Does not consider the global modifier.
- str.split( regex ): does not consider the global modifier. Returns an array of strings containing strings in-between matches.
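The four string methods side by side, as a small sketch on a made-up sample string:

```javascript
const str = 'one two three';

const firstOnly = str.match(/t\w+/);        // same as exec: 'two' at index 4
const allMatches = str.match(/t\w+/g);      // ['two', 'three']
const noMatch = str.match(/z/);             // null

const replacedFirst = str.replace(/t/, 'T');  // 'one Two three'
const replacedAll = str.replace(/t/g, 'T');   // 'one Two Three'
// str itself is unchanged: strings are immutable.

const position = str.search(/two/);         // 4
const missing = str.search(/z/);            // -1

const parts = str.split(/ /);               // ['one', 'two', 'three']
```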
Literal characters
A regex literal character matches itself. The expression /a/ matches any string that contains the a character. Example:
Literal characters are: all characters except metasyntax characters such as: ., *, ^, $, [, ], (, ), {, }, |, ?, +, \
When you need a metasyntax character, place a backslash in front of it. Examples: \., \\, \[.
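A short sketch of the difference between a metasyntax character and its escaped, literal form. The test strings are illustrative.

```javascript
// Unescaped, the dot is a metasyntax character that matches any character.
const unescapedDot = /a.c/.test('abc');   // true

// Escaped, it only matches a literal dot.
const escapedDot = /a\.c/.test('abc');    // false
const literalDot = /a\.c/.test('a.c');    // true

// Backslash and opening bracket need escaping as well.
const backslash = /\\/.test('C:\\temp');  // true: the string contains one backslash
const bracket = /\[/.test('arr[0]');      // true
```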
Whitespaces:
- behave as literal characters, exact match is required
- use escape sequences for more flexibility, such as:
- \n for a newline
- \t for a tab
- \b for word boundaries
Metasyntax characters
In the last example, notice the parentheses. As the | operator has the lowest precedence out of all operators,
parentheses made it possible to increase the precedence of a|b in (a|b)c.
Examples:
/^a+$/ matches any string consisting of one or more 'a' characters and nothing else
/^a?$/ matches '' or 'a'. The string may contain at most one 'a' character
/^a*$/ matches the empty string and everything matched by /^a+$/
/^a{3,5}$/ matches 'aaa', 'aaaa', and 'aaaaa'
/(ab){3}/ matches any string containing the substring 'ababab'
/^a+?$/ lazily matches any string consisting of one or more 'a' characters and nothing else
/^a??$/ lazily matches '' or 'a'. The string may contain at most one 'a' character
/^a*?$/ lazily matches the empty string and everything matched by /^a+$/
/^a{3,5}?$/ lazily matches 'aaa', 'aaaa', and 'aaaaa'
/(ab){3}?/ lazily matches any string containing the substring 'ababab'
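With the ^ and $ anchors in place, greedy and lazy modifiers accept the same strings, because the whole string has to be consumed either way. The difference shows up in the returned match once the anchors are dropped, as in this sketch (sample strings are illustrative):

```javascript
// Greedy modifiers take as many repetitions as possible.
const greedyPlus = 'aaa'.match(/a+/)[0];           // 'aaa'
const greedyRange = 'aaaaaa'.match(/a{3,5}/)[0];   // 'aaaaa'

// Lazy modifiers stop as soon as the rest of the pattern can match.
const lazyPlus = 'aaa'.match(/a+?/)[0];            // 'a'
const lazyRange = 'aaaaaa'.match(/a{3,5}?/)[0];    // 'aaa'

// Anchored, the lazy version is forced to extend to the end of the string.
const anchoredLazy = 'aaa'.match(/^a+?$/)[0];      // 'aaa'

const optionalMiss = /^a?$/.test('aa');            // false: more than one a
const repeatedGroup = /(ab){3}/.test('xabababx');  // true
```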
Capture groups
- ( and ) captures a substring inside a regex
- Capture groups are numbered by the order of their opening parentheses, starting with 1
- (?: and ) act as non-capturing parentheses; they are not included in the capture group numbering
Examples:
/a(b|c(d|(e))(f))$/
  ^   ^  ^   ^
  |   |  |   |
  1   2  3   4
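The numbering can be verified by running the nested pattern above against a few made-up inputs. In the array returned by exec, element 0 is the full match and element n is capture group n. Groups that did not take part in the match are undefined.

```javascript
const nested = /a(b|c(d|(e))(f))$/;

const viaE = nested.exec('acef');
// viaE[1] is 'cef', viaE[2] is 'e', viaE[3] is 'e', viaE[4] is 'f'

const viaD = nested.exec('acdf');
// viaD[2] is 'd', viaD[3] is undefined because (e) was not used

// Non-capturing parentheses are skipped in the numbering.
const nonCapturing = /(?:a|b)(c)/.exec('bc');
// nonCapturing[0] is 'bc', nonCapturing[1] is 'c'
```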
/(?<!a)b/.exec( 'Ab' )
["b", index: 1, input: "Ab"] // executed in latest Google Chrome
/\bregex\b/.exec( 'This regex expression tests word boundaries.' )
["regex", index: 5, input: "This regex expression tests word boundaries."]
/^regex$/.exec( 'This\nregex\nexpression\ntests\nanchors.' )
null
/^regex$/m.exec( 'This\nregex\nexpression\ntests\nanchors.' )
["regex", index: 5, input: "This\nregex\nexpression\ntests\nanchors."]
Possessive Alternation
Attempts to match each branch of the alternation until the first match is found. After matching a branch in the
alternation, we are not allowed to backtrack and try out any other branches in the alternation.
Possessive alternation does not exist in JavaScript. However, there are workarounds.
Assuming we don’t have any other capture groups in front of the expression, use (?=(a|b))\1 instead of the
generic PCRE pattern (?>a|b)
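A sketch of the workaround in action, using the branches a and ab so that the effect is visible. It relies on the engine not backtracking into a lookahead once the lookahead has succeeded. The test strings are illustrative.

```javascript
// Ordinary alternation: after 'a' fails to be followed by 'c',
// the engine backtracks and tries the ab branch.
const plain = /(?:a|ab)c/.test('abc');          // true

// Emulated possessive alternation: the lookahead commits to the first
// branch that matches ('a'), \1 consumes it, and the ab branch is never tried.
const possessive = /(?=(a|ab))\1c/.test('abc'); // false

// When the first branch leads to a full match, both forms agree.
const possessiveHit = /(?=(a|ab))\1c/.exec('ac');
// possessiveHit[0] is 'ac', possessiveHit[1] is 'a'
```

Note that the workaround occupies capture group 1, so any later groups in the pattern shift by one.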
Articles
Check out my articles on regular expressions on zsoltnagy.eu/category/regular-expressions [2]
[1] https://github.com/slevithan/xregexp
[2] http://zsoltnagy.eu/category/regular-expressions