Regular Expressions
WHAT IS A REGULAR EXPRESSION (REGEX)
SEQUENCE OF CHARACTERS THAT DEFINES A SEARCH PATTERN
Note: it is very important to use a backslash (\) to tell the computer that you are
using the metacharacter “\d” and not the letter “d”. The same applies to the rest of
the metacharacters.
“\” is a special character, like “+”, “.”, “[]”, “[^]” and “?”. We will see the rest of
them in the following section.
“\w” METACHARACTER
In this case, “w” refers to “word”: this metacharacter matches any word character
(a-z, A-Z, 0-9, and the underscore “_”).
So “/m[0-9a-zA-Z_]uch/” and “/m\wuch/” do the same.
“\s” METACHARACTER
“s” comes from “space”: the “\s” metacharacter matches any kind of whitespace character.
So, “/abadiel\s2014/” will match “abadiel 2014” whether the separator is a space, a
tab, or another single whitespace character.
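As a minimal sketch of the “\s” behaviour (Python is assumed, and the variable names are illustrative), the same pattern can be tried against different separators. Note that “\s” on its own stands for exactly one whitespace character.

```python
import re

pattern = re.compile(r"abadiel\s2014")

# One whitespace character of any kind satisfies \s.
space_sep = bool(pattern.search("abadiel 2014"))
tab_sep = bool(pattern.search("abadiel\t2014"))
newline_sep = bool(pattern.search("abadiel\n2014"))

# No separator, or two separators, does not fit a single \s.
no_sep = bool(pattern.search("abadiel2014"))
two_spaces = bool(pattern.search("abadiel  2014"))

print(space_sep, tab_sep, newline_sep, no_sep, two_spaces)
```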
SPECIAL CHARACTERS
“?” CHARACTER
If we put “?” after a character, that character becomes optional: it can appear
0 or 1 times.
So, if we have /hello?/, then “hell” and “hello” will match in full, but “helloo” won’t:
only its first five characters, “hello”, match.
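The following sketch, assuming Python, separates two questions for /hello?/: does the whole string fit the pattern (re.fullmatch), and which part of a longer string is matched (re.search)?

```python
import re

pattern = re.compile(r"hello?")

# Whole-string check: the final "o" may appear 0 or 1 times.
whole = {s: bool(pattern.fullmatch(s)) for s in ["hell", "hello", "helloo"]}

# Searching inside "helloo" finds only the part that fits the pattern.
matched_part = pattern.search("helloo").group()

print(whole)
print(matched_part)
```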
“.” CHARACTER
The “.” character matches any single character except the newline character.
So, if we have /.+/, then “gjnnj51//-*+595_##/” will match.
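A small sketch, assuming Python, shows both halves of that statement: /.+/ covers the whole sample string, but stops at a newline. The two-line text is a made-up example.

```python
import re

pattern = re.compile(r".+")

# Every character of the sample is "any character", so the whole string fits.
sample = "gjnnj51//-*+595_##/"
covers_sample = pattern.fullmatch(sample) is not None

# "." does not match "\n", so the match ends at the end of the first line.
first_line = pattern.search("first line\nsecond line").group()

print(covers_sample, repr(first_line))
```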
KNOWN SPECIAL CHARACTERS
“+” CHARACTER
We explained it in the last section: it matches the preceding character 1 or more
times.
“\” CHARACTER
It’s also called the “escape character”, and it enables us to use metacharacters. It
also helps with matching characters like “/” and “.” that, as we saw, have different
meanings and can’t be used literally.
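Both characters can be tried in a short sketch. Python is assumed, and the patterns “go+gle” and “3\.14” are hypothetical examples that do not come from the slides. In Python, patterns are plain strings, so “/” needs no escaping there; the “.” case shows the same idea.

```python
import re

# "+" requires the preceding character at least once.
plus = re.compile(r"go+gle")
plus_results = {s: bool(plus.fullmatch(s)) for s in ["gogle", "gooogle", "ggle"]}

# An escaped "." matches only a literal dot; an unescaped "." matches any character.
literal_dot = re.compile(r"3\.14")
any_char = re.compile(r"3.14")
dot_results = {
    "escaped vs 3.14": bool(literal_dot.fullmatch("3.14")),
    "escaped vs 3x14": bool(literal_dot.fullmatch("3x14")),
    "unescaped vs 3x14": bool(any_char.fullmatch("3x14")),
}

print(plus_results)
print(dot_results)
```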
“[]” CHARACTER
We used it to create sets of possibilities for one character, like [a-zA-Z] for
example.
“^” CHARACTER
We used it to exclude options for a character in a set, like [^123].
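As a brief illustration of sets and excluded sets, assuming Python and single-character test inputs chosen for the example:

```python
import re

letter = re.compile(r"[a-zA-Z]")   # exactly one ASCII letter
not_123 = re.compile(r"[^123]")    # exactly one character that is not 1, 2 or 3

letter_results = {s: bool(letter.fullmatch(s)) for s in ["q", "Q", "7"]}
excluded_results = {s: bool(not_123.fullmatch(s)) for s in ["4", "x", "2"]}

print(letter_results)
print(excluded_results)
```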
“*” CHARACTER
Similar to “+”, but it gives us the possibility of 0 repetitions for the preceding
character.
So, /a*/ matches “”, “a” and “a…aaaa…aaa”.
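The difference between “*” and “+” shows up on the empty string, as in this sketch (Python assumed):

```python
import re

star = re.compile(r"a*")
plus = re.compile(r"a+")

inputs = ["", "a", "aaaaaaa", "b"]
star_results = {s: bool(star.fullmatch(s)) for s in inputs}
plus_results = {s: bool(plus.fullmatch(s)) for s in inputs}

# Only the empty string separates the two quantifiers.
print(star_results)
print(plus_results)
```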
STARTING AND ENDING PATTERNS
Sometimes we want our regular expression to apply only to the beginning of the text.
Until now, our regexes could match at any position inside the string.
For example: If we have /\w{4}/, then:
- “w_ter” matches
- “waternfkfkjngt” matches too
- “dfngijwaterfbjgfn” matches too
- “fvnkjdfnvjdfjnwater” matches too
But this can also be dangerous: in the second example, only the first part of the
string matches the pattern, yet the computer still accepts the string as a whole.
This becomes a problem when we need the entire string to match.
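The sketch below, assuming Python, makes the problem visible for /\w{4}/: a search succeeds on every sample, but the matched part is only four characters long, and none of the samples fits the pattern as a whole.

```python
import re

pattern = re.compile(r"\w{4}")
samples = ["w_ter", "waternfkfkjngt", "dfngijwaterfbjgfn", "fvnkjdfnvjdfjnwater"]

# What an unanchored search actually matches: the first four word characters.
found = {s: pattern.search(s).group() for s in samples}

# Whether the entire string fits the pattern.
whole = {s: bool(pattern.fullmatch(s)) for s in samples}

print(found)
print(whole)
```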
STARTING PATTERN
That is why it is necessary to establish starting and ending patterns, so we can mark
where the string must start and end.
To establish a starting pattern we use a caret (^) at the beginning of the expression.
Then, the regular expression /^abcd/ will have these effects:
- “abcdfmvfdnivj”, result: match
- “dicifabcdnjf”, result: not match
- “cdbhjdbfdsabcd”, result: not match
Our regular expression now focuses on the beginning of the string.
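The three results above can be reproduced with a short sketch, assuming Python:

```python
import re

starting = re.compile(r"^abcd")
samples = ["abcdfmvfdnivj", "dicifabcdnjf", "cdbhjdbfdsabcd"]

# With "^", "abcd" only counts when it sits at the start of the string.
results = {s: bool(starting.search(s)) for s in samples}

print(results)
```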
ENDING PATTERN
On the other hand, if we want our regular expression to focus on the end of the
string, we need an ending pattern. To establish it we use the dollar sign ($).
Then, /abcd$/ will have these effects:
- “abcdfmvfdnivj”, result: not match
- “dicifabcdnjf”, result: not match
- “cdbhjdbfdsabcd”, result: match
We can also use both patterns at the same time, which forces the entire string to be
matched by our regular expression.
Then, /^abcd$/ will have these effects:
- “abcdfmvfdnivj”, result: not match
- “dicifabcdnjf”, result: not match
- “cdbhjdbfdsabcd”, result: not match
- “abcd”, result: match
Now our regular expression focuses on the whole string.
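A sketch of the ending pattern and of both patterns combined, assuming Python. One Python-specific detail: “$” also accepts a position just before a trailing newline, which does not affect these samples.

```python
import re

ending = re.compile(r"abcd$")
both = re.compile(r"^abcd$")

samples = ["abcdfmvfdnivj", "dicifabcdnjf", "cdbhjdbfdsabcd", "abcd"]

# "$" alone: "abcd" must sit at the end of the string.
ending_results = {s: bool(ending.search(s)) for s in samples}

# "^" and "$" together: the string must be exactly "abcd".
both_results = {s: bool(both.search(s)) for s in samples}

print(ending_results)
print(both_results)
```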
ALTERNATE CHARACTERS
We learnt how to establish possibilities for a single character’s value. Now we will
see how to establish possibilities for a substring or for the entire string.
Imagine that we are asking for either a phone number or a name. We can put those 2
possibilities in one regular expression by using a pipe (|), which means “or”.
It would be like this: /[a-zA-Z]{1,20}|[0-9]{9}/
The pipe separates the two possibilities. In this case, the possibilities apply to the
complete string.
If we want to limit the options to one part of the string, parentheses are useful.
Example: /^(orange|apple)-juice$/ matches both “orange-juice” and “apple-juice”.
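Both uses of the pipe can be sketched as follows. Python is assumed, and the sample inputs (“Maria”, “612345678”, “grape-juice”) are made up for illustration. re.fullmatch is used for the first pattern because it has no starting or ending pattern of its own.

```python
import re

# Alternatives for the complete string: a name of 1-20 letters or a 9-digit number.
name_or_phone = re.compile(r"[a-zA-Z]{1,20}|[0-9]{9}")
whole_inputs = ["Maria", "612345678", "61234", "Maria612345678"]
whole_results = {s: bool(name_or_phone.fullmatch(s)) for s in whole_inputs}

# Alternatives for one part of the string, delimited by parentheses.
juice = re.compile(r"^(orange|apple)-juice$")
juice_inputs = ["orange-juice", "apple-juice", "grape-juice"]
juice_results = {s: bool(juice.search(s)) for s in juice_inputs}

# The parentheses also capture which alternative was used.
chosen = juice.search("apple-juice").group(1)

print(whole_results)
print(juice_results)
print(chosen)
```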