Regular Expressions

This document provides an introduction to regular expressions (regexes). It explains that regexes are sequences of characters that define search patterns to match strings. Some key points covered include:
- Regexes are written here between "/" delimiters and are used in programming languages to filter data.
- Common regex patterns include matching specific words, ranges of characters, numbers of repetitions, and starting/ending patterns.
- Metacharacters like "\d" have special meanings, and flags like "i" and "g" modify matching behavior.
- Braces {}, brackets [], and symbols like ?, *, +, | have specific regex functions around repetition, character sets, and alternation.

Uploaded by

edgar leiva

REGULAR EXPRESSIONS

WHAT IS A REGULAR
EXPRESSION (REGEX)

SEQUENCE OF
CHARACTERS THAT
DEFINES A SEARCH
PATTERN

IT PROVIDES A CONCISE AND POWERFUL WAY TO DESCRIBE THOSE PATTERNS

USED IN VARIOUS PROGRAMMING LANGUAGES
IMPORTANCE
As we learn SQL commands, we sometimes need to filter data using regular expressions. This happens with e-mail addresses, passwords, aliases, etc.
BASICS
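- In this document, every regular expression starts and ends with “/”. This is the literal notation of languages such as JavaScript and Perl; other languages pass the pattern as an ordinary string.
Example: /example/
Note: This regular expression matches every string that contains the exact word “example”.
Note: When a string satisfies the regex rule, we call that a “match”.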
REGEX FOR STRINGS THAT
CONTAIN A SPECIFIC WORD
If we need a regular expression that matches strings containing a specific word, we can use: /<our word>/
So, /watermelon/ will have this effect:
- “odfkvokkmwater-melongbbjofg”, result: no match
- “watermelondifbijingfi”, result: match
- “sfnvjdfslb”, result: no match
- “xzfdsewatermelongfi”, result: match
- “vfvxwatermelon”, result: match
- “vfvxwatermelonfdndjfdfwatermelon”, result: match
Note: this regex only matches the first appearance of the word “watermelon” and then stops.
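The results above can be checked with a short JavaScript sketch, assuming an engine that accepts the /.../ literal notation. The variable names are illustrative only and are not part of the original slides.

```javascript
// Pattern from the slide: the literal word "watermelon" anywhere in the string.
const pattern = /watermelon/;

const samples = [
  "odfkvokkmwater-melongbbjofg",
  "watermelondifbijingfi",
  "sfnvjdfslb",
  "xzfdsewatermelongfi",
  "vfvxwatermelon",
  "vfvxwatermelonfdndjfdfwatermelon",
];

// test() reports whether the pattern occurs at least once in the string.
const results = samples.map((s) => pattern.test(s));
console.log(results); // [false, true, false, true, true, true]
```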
REGEX FOR STRINGS THAT
CONTAIN A SPECIFIC WORD
We can also use a regex flag (an optional parameter that modifies how a regex searches) named “global” (g) to match all the appearances of the word “watermelon”.
Note: Regex flags are always typed after the final “/”.
So, /watermelon/g will have this effect:
- “odfkvokkmwater-melongbbjofg”, result: no match
- “watermelondifbijingfi”, result: 1 match
- “sfnvjdfslb”, result: no match
- “xzfdsewatermelongfi”, result: 1 match
- “vfvxwatermelon”, result: 1 match
- “vfvxwatermelonfdndjfdfwatermelon”, result: 2 matches
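As a minimal sketch of the global flag in JavaScript, the helper below counts occurrences with String.prototype.match. The name countMatches is hypothetical and only used for illustration.

```javascript
// With the g flag, match() returns an array of every occurrence,
// or null when there is none (hence the "|| []" fallback).
const countMatches = (text) => (text.match(/watermelon/g) || []).length;

console.log(countMatches("sfnvjdfslb"));                       // 0
console.log(countMatches("vfvxwatermelon"));                   // 1
console.log(countMatches("vfvxwatermelonfdndjfdfwatermelon")); // 2
```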
REGEX FOR STRINGS THAT
CONTAIN A SPECIFIC WORD
We can also use a regex flag named “insensitive” (i) to match the word “watermelon” even when some or all of its letters are written in capitals.
So, /watermelon/gi will have this effect:
- “odfkvokkmwater-melongbbjofg”, result: no match
- “wAtErmelOndifbijingfi”, result: 1 match
- “sfnvjdfslb”, result: no match
- “xzfdseWATERMELONgfi”, result: 1 match
- “vfvxwatermElon”, result: 1 match
- “vfvxWateRmelonfdndjfdfwatErmelon”, result: 2 matches
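The effect of combining both flags can be sketched in JavaScript as follows. The helper names are hypothetical; the comparison with the case-sensitive version is added only to show what the i flag changes.

```javascript
// g: find every occurrence; i: ignore the difference between upper and lower case.
const countAnyCase = (text) => (text.match(/watermelon/gi) || []).length;
// Same search without the i flag, for comparison.
const countExactCase = (text) => (text.match(/watermelon/g) || []).length;

console.log(countAnyCase("xzfdseWATERMELONgfi"));              // 1
console.log(countExactCase("xzfdseWATERMELONgfi"));            // 0
console.log(countAnyCase("vfvxWateRmelonfdndjfdfwatErmelon")); // 2
```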
REGEX FOR STRINGS THAT
CONTAIN A VARIETY OF
OPTIONS FOR A CHARACTER
If one character of a string may be any of several options, for example “g”, “4” or “5”, we can use square brackets [ ] and put the possibilities inside them.
So, the regular expression /a[g45]bc/ has the following effects:
- “a4bc”, result: match
- “a45bc”, result: no match
- “a5bd”, result: no match
- “agbcdfvlkdfmlf”, result: match
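A quick way to confirm these four results is the JavaScript sketch below. The names setPattern and setResults are illustrative assumptions.

```javascript
// Exactly one character from the set {g, 4, 5} must sit between "a" and "bc".
const setPattern = /a[g45]bc/;

const setResults = ["a4bc", "a45bc", "a5bd", "agbcdfvlkdfmlf"].map(
  (s) => [s, setPattern.test(s)]
);
console.log(setResults);
// [["a4bc", true], ["a45bc", false], ["a5bd", false], ["agbcdfvlkdfmlf", true]]
```

“a45bc” fails because the set stands for a single character: after “4” the pattern expects “b” but finds “5”.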
REGEX FOR STRINGS THAT
CONTAIN A VARIETY OF
OPTIONS FOR A CHARACTER
Just as we can include a set of options for a character, we can also exclude one. In other words, we can list the characters that must not be used. For that, we put a caret (^) right after the opening bracket, followed by the characters we don’t want to match.
Note: This is called an exclude set.
So, the regular expression /a[^123]h[12]/ has the following effects:
- “a4h1”, result: match
- “a1h3”, result: no match
- “a2h0”, result: no match
- “a2h2”, result: no match (the second character, “2”, is in the exclude set)
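The exclude set can be tried out with a small JavaScript sketch. The input “a9h2” is an extra example that is not in the slides, and the variable names are hypothetical.

```javascript
// Second character: anything except 1, 2 or 3. Fourth character: 1 or 2.
const excludePattern = /a[^123]h[12]/;

console.log(excludePattern.test("a4h1")); // true
console.log(excludePattern.test("a9h2")); // true  ("9" is not excluded, "2" is allowed last)
console.log(excludePattern.test("a1h3")); // false ("1" is excluded, "3" is not in [12])
console.log(excludePattern.test("a2h0")); // false ("2" is excluded, "0" is not in [12])
```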
RANGES
Imagine we want the first character to be any letter of the alphabet. Our regular expression would then look like this:
/[abcdefghijklmnopqrstuvwxyz]hello/
This regular expression takes a lot of space and is prone to human error. To make it much shorter we can use ranges, which are written with a dash (-).
The equivalent of the previous expression is: /[a-z]hello/
“[a-z]” means the character can take any value from “a” to “z”.
Note: Ranges can be used in exclude sets too. Example: /[^a-d]hello/. As a consequence, “ahello”, “bhello”, “chello” and “dhello” won’t match.
Earlier we used the insensitive flag (i) to include capital letters, but that flag affects the whole regular expression. What if we want this effect on only one character? To solve this problem we don’t need the flag; we only need ranges.
Example: /[a-bA-B]oat/ matches these words: aoat, boat, Aoat, Boat.
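The three range patterns above can be exercised with the following JavaScript sketch. The test strings “xhello”, “Xhello”, “ehello” and “coat” are extra examples chosen for illustration.

```javascript
// Any lowercase letter followed by "hello".
console.log(/[a-z]hello/.test("xhello"));  // true
console.log(/[a-z]hello/.test("Xhello"));  // false (capital X is outside a-z)

// Any character except a, b, c or d, followed by "hello".
console.log(/[^a-d]hello/.test("ahello")); // false
console.log(/[^a-d]hello/.test("ehello")); // true

// Case-insensitivity limited to the first character only.
const boats = ["aoat", "boat", "Aoat", "Boat", "coat", "bOat"].filter((w) =>
  /[a-bA-B]oat/.test(w)
);
console.log(boats); // ["aoat", "boat", "Aoat", "Boat"]
```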
REPETITIONS
REPEATING CHARACTERS
Now, imagine that we want a phone number. Here in Peru, phone numbers have 9 digits, each between 0 and 9, so our regular expression could look like this:
/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/
But this is too long. We can shorten it by using curly brackets (also called braces, {}).
Inside the curly brackets we put the number of repetitions for the preceding character.
So the equivalent would be: /[0-9]{9}/
The number of repetitions can vary too. If we want a word of 4 to 5 letters, we can use a comma inside the braces to express the range of repetitions:
/[a-zA-Z]{4,5}/
If we don’t want to set an upper limit for the repetitions, we can write /[a-zA-Z]{4,}/. With this, the minimum number of repetitions is 4 and there is no maximum.
We can express a minimum of 1 character with /[a-zA-Z]{1,}/, but there is a shorter equivalent that uses the plus symbol (+):
/[a-zA-Z]+/
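To see how much text each quantifier consumes, here is a hedged JavaScript sketch. The helper firstMatch and the sample strings are assumptions made for illustration.

```javascript
// Returns the text of the first match, or null when nothing matches.
const firstMatch = (re, text) => {
  const m = text.match(re);
  return m ? m[0] : null;
};

console.log(/[0-9]{9}/.test("987654321"));           // true  (nine digits)
console.log(/[0-9]{9}/.test("12345678"));            // false (only eight digits)
console.log(firstMatch(/[a-zA-Z]{4,5}/, "abcdefg")); // "abcde"   (stops at 5 letters)
console.log(firstMatch(/[a-zA-Z]{4,}/, "abcdefg"));  // "abcdefg" (no upper limit)
console.log(firstMatch(/[a-zA-Z]{4,}/, "abc"));      // null      (fewer than 4 letters)
console.log(firstMatch(/[a-zA-Z]+/, "ab12"));        // "ab"      (same as {1,})
```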
METACHARACTERS
WHAT IS A METACHARACTER?

A character that has a special meaning during pattern processing.
“\d” METACHARACTER
This metacharacter matches digits (“d” for digit). In most regex engines it is the same as [0-9], although some Unicode-aware engines also accept digits from other scripts.
In other words, /m[0-9]uch/ and /m\duch/ normally do exactly the same thing.

Note: The backslash (\) is essential: it tells the computer that you mean the metacharacter “\d” and not the letter “d”. The same applies to the rest of the metacharacters.
“\” is itself a special character, like “+”, “.”, “[]”, “[^]” and “?”. We will see them in the following sections.
“\w” METACHARACTER
In this case, “w” refers to “word”, and this metacharacter matches any word character (a-z, A-Z, 0-9 and the underscore “_”).
So /m[0-9a-zA-Z_]uch/ and /m\wuch/ do the same.
“\s” METACHARACTER
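“s” comes from “space”: the “\s” metacharacter matches any single whitespace character (space, tab, line break, etc.).
So, /abadiel\s2014/ will match “abadiel 2014” whether the two parts are separated by a space, a tab or another whitespace character.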
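The three metacharacters can be compared side by side in a short JavaScript sketch. The input strings other than “abadiel 2014” are hypothetical examples.

```javascript
// \d: one digit.
console.log(/m\duch/.test("m7uch")); // true
console.log(/m\duch/.test("mduch")); // false (the letter "d" is not a digit)

// \w: one letter, digit or underscore.
console.log(/m\wuch/.test("m_uch")); // true
console.log(/m\wuch/.test("m-uch")); // false (a hyphen is not a word character)

// \s: one whitespace character, such as a space or a tab.
console.log(/abadiel\s2014/.test("abadiel 2014"));  // true
console.log(/abadiel\s2014/.test("abadiel\t2014")); // true
console.log(/abadiel\s2014/.test("abadiel2014"));   // false (no whitespace at all)
```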
SPECIAL CHARACTERS
“?” CHARACTER
If we put “?” after a character, that character becomes optional: it can appear 0 or 1 times.
So, if we have /hello?/, then “hell” and “hello” will match completely. In “helloo” only the first five characters (“hello”) are matched; the extra “o” is not part of the match.
“.” CHARACTER
A “.” matches any single character except the newline character.
So, if we have /.+/, then “gjnnj51//-*+595_##/” will match.
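A minimal JavaScript sketch of both characters follows. The helper matched and the inputs “hel” and the two-line string are assumptions added for illustration.

```javascript
// Returns the text consumed by the first match, or null when nothing matches.
const matched = (re, text) => {
  const m = text.match(re);
  return m ? m[0] : null;
};

// "?" makes the final "o" optional.
console.log(matched(/hello?/, "hell"));  // "hell"
console.log(matched(/hello?/, "hello")); // "hello"
console.log(matched(/hello?/, "hel"));   // null (the second "l" is required)

// "." matches any character except a newline, so ".+" stops at the line break.
console.log(matched(/.+/, "gjnnj51//-*+595_##/")); // "gjnnj51//-*+595_##/"
console.log(matched(/.+/, "a\nb"));                // "a"
```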
KNOWN SPECIAL
CHARACTERS
“+” character
We explained it in the section on repetitions: it matches the preceding character 1 or more times.
“\” character
It’s also called the “escape character”, and it enables us to use metacharacters. It also helps with matching characters like “/” and “.” which, as we saw, have special meanings and can’t be used literally without it.
KNOWN SPECIAL
CHARACTERS
“[]” character
We used it to create sets of possibilities for one character, like [a-zA-Z] for example.
“^” character
We used it to exclude options for a character in a set, like [^123].
“*” CHARACTER
Similar to “+”, but it also allows 0 repetitions of the preceding character.
So, /a*/ matches “”, “a” and any longer run of the letter “a”, such as “aaaaaaa”.
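The difference between zero and more repetitions can be sketched in JavaScript as follows. The pattern /ba*d/ and its inputs are hypothetical examples, not taken from the slides.

```javascript
// "a*" allows zero or more "a" characters between "b" and "d".
console.log(/ba*d/.test("bd"));    // true  (zero repetitions)
console.log(/ba*d/.test("bad"));   // true  (one repetition)
console.log(/ba*d/.test("baaad")); // true  (three repetitions)
console.log(/ba*d/.test("bed"));   // false ("e" is not "a")

// On its own, /a*/ can match the empty string, so it succeeds on any input.
console.log("aaa".match(/a*/)[0]); // "aaa"
console.log("".match(/a*/)[0]);    // ""
console.log(/a*/.test("xyz"));     // true (an empty match at the start)
```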
STARTING AND
ENDING PATTERNS
STARTING AND ENDING
PATTERN
Sometimes we want our regular expression to apply only at the beginning of the text. Until now, our regexes could match at any position inside the string.
For example: If we have /\w{4}/, then:
- “w_ter” matches
- “waternfkfkjngt” matches too
- “dfngijwaterfbjgfn” matches too
- “fvnkjdfnvjdfjnwater” matches too
But this can be dangerous: in the second example only the first four characters (“wate”) are matched, yet the whole string is reported as a match. This becomes a problem when we need the entire string to fit the pattern.
STARTING PATTERN
That’s why it is necessary to establish starting and ending patterns, so we can mark where the string must start and end.
To establish a starting pattern we use a caret (^) at the beginning of the expression.
Then, the regular expression /^abcd/ will have these effects:
- “abcdfmvfdnivj”, result: match
- “dicifabcdnjf”, result: no match
- “cdbhjdbfdsabcd”, result: no match
Our regular expression now focuses on the beginning of the string.
ENDING PATTERN
On the other hand, if we want our regular expression to focus on the end of the string, we need an ending pattern. To establish it we use the dollar sign ($) at the end of the expression.
Then, /abcd$/ will have these effects:
- “abcdfmvfdnivj”, result: no match
- “dicifabcdnjf”, result: no match
- “cdbhjdbfdsabcd”, result: match
We can also use both patterns at the same time, so that the whole string has to be matched by our regular expression.
Then, /^abcd$/ will have these effects:
- “abcdfmvfdnivj”, result: no match
- “dicifabcdnjf”, result: no match
- “cdbhjdbfdsabcd”, result: no match
- “abcd”, result: match
Now our regular expression covers the whole string.
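The three anchored patterns can be compared on the same inputs with the JavaScript sketch below. The object name anchored and the final /^\w{4}$/ check are illustrative additions.

```javascript
const inputs = ["abcdfmvfdnivj", "dicifabcdnjf", "cdbhjdbfdsabcd", "abcd"];

// One row of results per pattern, in the same order as the inputs.
const anchored = {
  start: inputs.map((s) => /^abcd/.test(s)),  // must begin with "abcd"
  end: inputs.map((s) => /abcd$/.test(s)),    // must finish with "abcd"
  whole: inputs.map((s) => /^abcd$/.test(s)), // must be exactly "abcd"
};
console.log(anchored.start); // [true, false, false, true]
console.log(anchored.end);   // [false, false, true, true]
console.log(anchored.whole); // [false, false, false, true]

// Anchoring fixes the earlier \w{4} problem: the whole string must be 4 word characters.
console.log(/\w{4}/.test("waternfkfkjngt"));   // true  (matches only "wate")
console.log(/^\w{4}$/.test("waternfkfkjngt")); // false (string is longer than 4)
```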
ALTERNATE
CHARACTERS
We learnt how to establish possibilities for a single character’s value. Now we will see how to establish possibilities for a substring or for the entire string.
Imagine that we are asking for either a phone number or a name. We can put those 2 possibilities in one regular expression by using a pipe (|), which means “or”.
It would look like this: /[a-zA-Z]{1,20}|[0-9]{9}/
The pipe separates both possibilities. In this case, each possibility covers the complete expression.
If we want to limit the options to one part of the string, parentheses are useful.
Example: /^(orange|apple)-juice$/ matches both “orange-juice” and “apple-juice”.
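Both alternation patterns can be tried in a short JavaScript sketch. The anchored variant nameOrPhone is an assumption that combines the pipe with the starting and ending patterns from the previous section; it is not in the original slides, and the sample inputs are hypothetical.

```javascript
// Parentheses limit the alternatives to one part of the string.
const juice = /^(orange|apple)-juice$/;
console.log(juice.test("orange-juice")); // true
console.log(juice.test("apple-juice"));  // true
console.log(juice.test("grape-juice"));  // false

// Without parentheses, the pipe splits the whole expression in two.
const loose = /[a-zA-Z]{1,20}|[0-9]{9}/;
console.log("Edgar".match(loose)[0]);     // "Edgar"
console.log("987654321".match(loose)[0]); // "987654321"

// Hypothetical stricter variant: the entire string must be a name or a 9-digit number.
const nameOrPhone = /^([a-zA-Z]{1,20}|[0-9]{9})$/;
console.log(nameOrPhone.test("987654321")); // true
console.log(nameOrPhone.test("12345"));     // false
console.log(nameOrPhone.test("Edgar99"));   // false
```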
