Regex Summary

The document provides an overview of regex functions and metacharacters, detailing their usage with examples. It includes practical regex questions and solutions, demonstrating how to search, find, replace, and split strings using regex patterns. Key concepts covered include matching digits, word boundaries, and handling whitespace characters.

Uploaded by

midogamal0122

regex

Created by OMVR GAYAR

Created time @January 20, 2025 10:25 AM

Tags Product

In the name of Allah, and prayers and peace be upon the Messenger of Allah, peace and blessings be upon him.

At the beginning, let’s summarize some of the important regex functions and metacharacters.

Functions:
search(): Scans through the string and returns a match object for the first match of the regex (or None if nothing matches).

findall(): Finds all the non-overlapping matches of the regex in the string and returns them as a list.
sub(): Replaces every occurrence of a regex pattern with a replacement string (or only the first few, if the count argument is given).
split(): Splits the string into a list at each occurrence of the regex pattern.
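To see the four functions side by side, here is a small sketch. The sample string and the variable names are made up for illustration; only the re calls themselves come from the summary above.

```python
import re

# Hypothetical sample text, chosen so the pattern matches twice.
text = "Order 12 shipped, order 345 pending"

# search(): match object for the first match only (None if nothing matches)
first = re.search(r'\d+', text)
print(first.group())          # 12

# findall(): every non-overlapping match, as a list of strings
all_numbers = re.findall(r'\d+', text)
print(all_numbers)            # ['12', '345']

# sub(): replaces every match by default; count limits how many
masked_all = re.sub(r'\d+', '#', text)
masked_first = re.sub(r'\d+', '#', text, count=1)
print(masked_all)             # Order # shipped, order # pending
print(masked_first)           # Order # shipped, order 345 pending

# split(): cuts the string wherever the pattern matches
parts = re.split(r',\s*', text)
print(parts)                  # ['Order 12 shipped', 'order 345 pending']
```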

Metacharacters:
\d : Matches any digit (equivalent to [0-9] )

\s : Matches any whitespace character (including spaces, tabs, and line breaks)

\w : Matches any word character (letters, digits, and underscores; equivalent to [a-zA-Z0-9_] )

\b : Matches a word boundary (the position between a word character and a non-word character)

\D : Matches any non-digit character (the opposite of \d )

\S : Matches any non-whitespace character (the opposite of \s )

\W : Matches any non-word character (the opposite of \w )
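A quick way to get a feel for these metacharacters is to run each one over the same string. The sample strings below are invented for this sketch.

```python
import re

# Hypothetical sample containing letters, digits, an underscore and punctuation.
sample = "Room 42, floor_3!"

digits = re.findall(r'\d', sample)          # single digits
non_digit_runs = re.findall(r'\D+', sample) # runs of anything that is not a digit
words = re.findall(r'\w+', sample)          # runs of letters, digits, underscores
non_word = re.findall(r'\W', sample)        # everything \w does not match
spaces = re.findall(r'\s', sample)          # whitespace only
non_space_runs = re.findall(r'\S+', sample) # runs of non-whitespace

print(digits)          # ['4', '2', '3']
print(non_digit_runs)  # ['Room ', ', floor_', '!']
print(words)           # ['Room', '42', 'floor_3']
print(non_word)        # [' ', ',', ' ', '!']
print(spaces)          # [' ', ' ']
print(non_space_runs)  # ['Room', '42,', 'floor_3!']

# \b matches a position, not a character: mark every boundary with '|'
marked = re.sub(r'\b', '|', "hi there")
print(marked)          # |hi| |there|
```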

A Good cheatsheet for regex

Now let’s have a look at the regex questions that came in the final exam 2023/2024, to get a better grasp on the different functions and metacharacters, and how to deal with them in different situations.

Q1.

import re
text = "The quick brown fox jumps over the lazy dog"
pattern = r'\b\w{5}\b'
matches = re.findall(pattern, text)
print(matches)
# The Output: ['quick', 'brown', 'jumps']

Here the pattern puts a word boundary \b on both sides of \w{5} , which matches exactly 5 word characters. So findall() returns a list of the words that are exactly 5 characters long.
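To see why the boundaries matter, compare the same pattern with and without \b. The sentence here is a made-up example with some longer words in it.

```python
import re

# Hypothetical text: 'wonderful' and 'answer' are longer than 5 characters.
text = "A wonderful quick answer"

# With boundaries: only whole words of exactly 5 characters
with_boundaries = re.findall(r'\b\w{5}\b', text)

# Without boundaries: any 5 word characters in a row, even inside longer words
without_boundaries = re.findall(r'\w{5}', text)

print(with_boundaries)     # ['quick']
print(without_boundaries)  # ['wonde', 'quick', 'answe']
```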

Q2.

import re
text = "My phone number is 100"
pattern = r'\d+'
replacement = "XXX"
new_text = re.sub(pattern, replacement, text)
print(new_text)
# The Output: My phone number is XXX

In this one the pattern \d+ matches only digits, so sub() finds the 100 and replaces it with XXX.

OK, that’s totally fine, but what about this little + sign after the \d ?
The plus sign means one or more digits, so in this example the pattern takes the whole 100 as a single match and replaces it with XXX. If we removed the plus sign, the pattern would match only one digit at a time, so it would treat the 100 as 3 separate matches and the output would be My phone number is XXXXXXXXX (XXX repeated 3 times).
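The difference the plus sign makes can be checked directly. This sketch reuses the text from the question; the variable names are just for illustration.

```python
import re

text = "My phone number is 100"

# \d+ : the three digits form one match, so one replacement
with_plus = re.sub(r'\d+', "XXX", text)

# \d : each digit is its own match, so three replacements
without_plus = re.sub(r'\d', "XXX", text)

print(with_plus)     # My phone number is XXX
print(without_plus)  # My phone number is XXXXXXXXX
```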

Q3.

import re
text = "apple, orange, banana, grape"
pattern = r',\s*'
result = re.split(pattern, text)
print(result)
# The Output: ['apple', 'orange', 'banana', 'grape']

OK, what does the \s match?

It matches whitespace characters [spaces, tabs, newlines].

The * here means zero or more of them.

So the whole pattern ,\s* matches a comma followed by any number of whitespace characters, including zero.

split() then cuts the string at each comma and drops any spaces that follow it.
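As a rough comparison, here is what happens when the \s* part is left out. The input string is a hypothetical variation with uneven spacing after the commas.

```python
import re

# Hypothetical input: one space, no space, and three spaces after the commas.
text = "apple, orange,banana,   grape"

# Splitting on the comma alone keeps the leading spaces in the pieces
comma_only = re.split(r',', text)

# Splitting on comma plus optional whitespace removes them
comma_and_spaces = re.split(r',\s*', text)

print(comma_only)        # ['apple', ' orange', 'banana', '   grape']
print(comma_and_spaces)  # ['apple', 'orange', 'banana', 'grape']
```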

Q4.

import re
text = "abc123def456"
pattern = r'\d+'
matches = re.findall(pattern, text)
print(matches)
# The Output: ['123', '456']

OK, the digits pattern again, no problem.

The pattern matches one or more consecutive digits, and findall() returns a list of every such run found in the text, so the output is ['123', '456'].
Again, just to make sure you got it: if he removes the plus sign, the pattern matches only one digit at a time, so the output in that case is ['1', '2', '3', '4', '5', '6'].

Last but not least, make sure you understand the difference between findall() and search() . In the question he used findall() to return all the matches of the regex in the text. But what does search() return? Only the first match of the regex, so in this case the result would be just 123 (extracted with group() ).
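The three cases described above can be run one after another. The finditer() call at the end is an extra that the question does not use; it is shown only as one possible way to also get the position of each match.

```python
import re

text = "abc123def456"

runs = re.findall(r'\d+', text)          # every run of digits
singles = re.findall(r'\d', text)        # every digit on its own
first = re.search(r'\d+', text).group()  # only the first run, as a string

print(runs)     # ['123', '456']
print(singles)  # ['1', '2', '3', '4', '5', '6']
print(first)    # 123

# finditer() yields match objects, so the start index is available too
positions = [(m.group(), m.start()) for m in re.finditer(r'\d+', text)]
print(positions)  # [('123', 3), ('456', 9)]
```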

Q5.

import re
text = "Please contact us at info@example.com or support@example.com"
pattern = r'\b\w+@\w+\.\w+\b'
matches = re.search(pattern, text)
print(matches.group())
# The Output: info@example.com

In this one he puts the word boundary \b first, which ensures that the pattern starts at the beginning of a complete word, or in other words a continuous word.

The \w+ matches one or more word characters [letters, digits, underscores], which is the first part of the email before the @ sign: info or support .

The @ sign is the literal @ sign in the email address.

Again the \w+ , which matches the domain name part of the email: example .

The \. matches a literal dot; it needs to be escaped with a backslash because the dot is a metacharacter in regex.

One more \w+ , which matches the TLD (Top Level Domain) part of the email address, which is the com .
Lastly it ends with \b again, the boundary that ensures the match ends at the end of a complete word, as we said before.

He used the search() function, so only the first match is returned, which in this case is info@example.com , and the support email is not returned.

Another detail to make sure you get: search() returns a match object, not a list like findall() , so we call group() in matches.group() to extract the matched string.
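One thing worth trying yourself: search() returns None when nothing matches, and calling group() on None raises an error. The sketch below reuses the pattern from the question with the addresses named in the explanation; the "no email" string is made up.

```python
import re

pattern = r'\b\w+@\w+\.\w+\b'
text = "Please contact us at info@example.com or support@example.com"

# search(): a match object for the first address only
first = re.search(pattern, text)
print(first.group())   # info@example.com

# findall(): both addresses, as plain strings
all_emails = re.findall(pattern, text)
print(all_emails)      # ['info@example.com', 'support@example.com']

# No match: search() gives None, so check before calling group()
missing = re.search(pattern, "no email address here")
result = missing.group() if missing else "not found"
print(result)          # not found
```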

Now let’s take a look at the questions Dr Hend sent to practice more on regex.

Q2. Extract all dates in the format dd-mm-yyyy from a given string.

import re
text = "Today's date is 20-01-2025"
pattern = r'\d{2}-\d{2}-\d{4}'
match = re.findall(pattern, text)
print(match)
# The Output: ['20-01-2025']

This one should be easy for us now.

It just matches 2 digits for the day and 2 digits for the month with \d{2} , 4 digits for the year with \d{4} , and a literal dash between them. Note that it only checks the shape of the date, not whether the day and month are real.
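Since the pattern only checks the shape, something like 99-99-9999 also matches. If real dates are needed, one possible follow-up (not part of the question) is to filter the matches with datetime.strptime. The sample text and the helper name is_real_date are assumptions made for this sketch.

```python
import re
from datetime import datetime

# Hypothetical text with two real dates and one impossible one.
text = "Start: 20-01-2025, end: 05-02-2025, code: 99-99-9999"

candidates = re.findall(r'\d{2}-\d{2}-\d{4}', text)
print(candidates)  # ['20-01-2025', '05-02-2025', '99-99-9999']

def is_real_date(value):
    """Return True if value parses as a dd-mm-yyyy calendar date."""
    try:
        datetime.strptime(value, "%d-%m-%Y")
        return True
    except ValueError:
        return False

real_dates = [d for d in candidates if is_real_date(d)]
print(real_dates)  # ['20-01-2025', '05-02-2025']
```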

Q3. Extract all words that start with a vowel from a given string.

import re
text = "an apple and an orange or a banana and a mango"
pattern = r'\b[aeiouAEIOU]\w*\b'
matches = re.findall(pattern, text)
print(matches)
# The Output: ['an', 'apple', 'and', 'an', 'orange', 'or', 'a', 'and', 'a']

In this one he asks us to keep only the words that start with a vowel. Before we start, we should remember the English vowels, which are ['a', 'e', 'i', 'o', 'u'].

At first we put our boundary \b . We know that we are going to search across different words with spaces between them, so each word has its own boundaries.

Next we match a vowel, lower or upper case, at the beginning of the word with [aeiouAEIOU] , and after it we add \w* , which matches word characters [letters, digits, underscores]. The purpose of the * is to say that there can be zero or more characters here.

So whether the word consists of one character, two characters, or any number of characters, it’ll match the pattern.
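Instead of listing both cases in the character class, the re.IGNORECASE flag can do the same job. This is an alternative to the answer above, not what the question used, and the sample sentence is made up.

```python
import re

# Hypothetical text mixing upper- and lower-case vowels at word starts.
text = "An apple, Under an Oak; banana"

# The approach from the answer: list both cases explicitly
explicit = re.findall(r'\b[aeiouAEIOU]\w*\b', text)

# Alternative: lower-case vowels only, with case ignored by the flag
flagged = re.findall(r'\b[aeiou]\w*\b', text, flags=re.IGNORECASE)

print(explicit)  # ['An', 'apple', 'Under', 'an', 'Oak']
print(flagged)   # ['An', 'apple', 'Under', 'an', 'Oak']
```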

Q4. Write a regular expression to remove all non-alphanumeric characters from a string, except spaces.

import re
text = "Hi I'm a s/tu*de_nt at FC-I S&C$U"
pattern = r'[^a-zA-Z0-9\s]'
replacement = ''
result = re.sub(pattern, replacement, text)
print(result)
# The Output: Hi Im a student at FCI SCU

Here we used something new, the ^ .

When the caret ( ^ ) is placed first inside square brackets, the pattern matches any character not in the set.
Inside the square brackets we list a-z lower case characters, A-Z upper case characters, 0-9 digits, and \s whitespace characters [spaces, tabs, newlines], which is exactly what we want to keep. With the ^ in front, the set matches everything else, which is the non-alphanumeric characters. (Because of \s , tabs and newlines are kept as well as spaces.)
Using the sub() function, we replaced these non-alphanumeric characters with an empty string.
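Two details are easy to mix up here, so a short sketch may help: the caret means something different outside the brackets, and \w is not the same as a-zA-Z0-9 because it also includes the underscore. The shortened text and variable names are for illustration only.

```python
import re

text = "Hi I'm a s/tu*de_nt"

# ^ inside brackets: negates the set (the answer's pattern)
cleaned = re.sub(r'[^a-zA-Z0-9\s]', '', text)
print(cleaned)          # Hi Im a student

# Using \w instead of explicit ranges keeps the underscore
kept_underscore = re.sub(r'[^\w\s]', '', text)
print(kept_underscore)  # Hi Im a stude_nt

# ^ outside brackets: anchors the match to the start of the string
first_word = re.findall(r'^\w+', text)
print(first_word)       # ['Hi']
```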

Q5. Extract the domain name from an email address.

import re
text = "info@example.com"
pattern = r'\b\w+\.\w+\b'
matches = re.search(pattern, text)
print(matches.group())
# The Output: example.com

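It is very similar to, and even easier than, Q5 in the final exam: \w+ matches the domain name, \. the literal dot, and the last \w+ the TLD. The part before the @ is skipped because it is not followed by a dot.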
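The pattern above relies on the part before the @ having no dot in it. If the address might look like john.doe@mail.example.com (a made-up address for this sketch), one possible alternative is to anchor on the @ and use a capturing group. This is a sketch under that assumption, not the expected exam answer.

```python
import re

# Hypothetical address with dots on both sides of the @.
email = "john.doe@mail.example.com"

# The answer's pattern grabs the first word.word it finds
naive = re.search(r'\b\w+\.\w+\b', email).group()
print(naive)   # john.doe

# Alternative: capture everything after the @ made of word chars, dots, hyphens
match = re.search(r'@([\w.-]+)', email)
domain = match.group(1) if match else None
print(domain)  # mail.example.com
```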

Q6. Write a regular expression to split a string by commas, semicolons, and spaces.

import re
text = "apple,orange banana,grape;mango"
pattern = r'[\s,;]'
result = re.split(pattern, text)
print(result)
# The Output: ['apple', 'orange', 'banana', 'grape', 'mango']

This is an easy one: we’re just splitting the string with the split() function. The character class [\s,;] matches any single one of \s for whitespace characters [spaces, tabs, newlines], ; for semicolons, and , for commas.
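Because the class matches one delimiter at a time, two delimiters in a row (like a comma followed by a space) leave an empty string in the result. Adding + after the class is one way to treat each run of delimiters as a single split point. The input here is a hypothetical variation of the question's text.

```python
import re

# Hypothetical input with delimiters next to each other.
text = "apple, orange;  banana"

# One delimiter per split: empty strings appear between adjacent delimiters
single = re.split(r'[\s,;]', text)

# One or more delimiters per split: no empty strings in the middle
grouped = re.split(r'[\s,;]+', text)

print(single)   # ['apple', '', 'orange', '', '', 'banana']
print(grouped)  # ['apple', 'orange', 'banana']
```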

Q7. Write a regular expression to replace multiple whitespace characters with a single space.

import re
text = "Tomorrow   is  the    final     exam!"
pattern = r'\s+'
replacement = ' '
result = re.sub(pattern, replacement, text)
print(result)
# The Output: Tomorrow is the final exam!

In this one, we used \s+ to match one or more consecutive whitespace characters and replaced each run with a single space.
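Note that \s+ also covers tabs and newlines, and that whitespace at the very start or end of the string is reduced to one space rather than removed. A small sketch with a made-up messy string shows this, together with strip() as one way to drop the leftover edges.

```python
import re

# Hypothetical text with spaces, a tab and a newline, plus padding at both ends.
text = "  Tomorrow \t is\nthe   final exam!  "

# Every run of whitespace becomes one space, including the runs at the edges
collapsed = re.sub(r'\s+', ' ', text)
print(repr(collapsed))  # ' Tomorrow is the final exam! '

# strip() removes the single spaces left at the start and end
cleaned = collapsed.strip()
print(cleaned)          # Tomorrow is the final exam!
```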
