Python RegEx

Last Updated : 10 Jul, 2025

Regular Expression (RegEx) is a powerful tool used to search, match, validate, extract or modify text based on specific patterns. In Python, the built-in re module provides support for using RegEx. It allows you to define patterns using special characters like \d for digits, ^ for the beginning of a string and many more.

Python

import re

txt = 'GeeksforGeeks: A computer science portal for geeks'
match = re.search(r'portal', txt)

if match:
    print(match.group())
    print("Start:", match.start(), "End:", match.end())
else:
    print("No match")

Output

portal
Start: 34 End: 40

Explanation: re.search(r'portal', txt) finds the first occurrence of "portal" in the string. match.group() returns "portal", start() is 34 and end() is 40 (spaces included in count).

Why use RegEx?

Regular expressions are widely used in various fields involving text manipulation and analysis. Below are some common use cases:

Use Case	Description
Data Mining	Quickly extract emails, phone numbers, URLs, etc. from large text blocks.
Validation	Validate user inputs like email addresses, passwords, dates, etc.
Text Processing	Replace or reformat strings to match required formats (e.g., date reformatting).
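
As a rough illustration of the three use cases in the table, the sketch below uses deliberately simplified patterns. The sample text, the email pattern and the helper name is_date are assumptions made for this example; they are not production-grade validators.

```python
import re

text = 'Contact alice@example.com or bob@example.org before 26-08-2020.'

# Data mining: pull out anything that looks like an email address.
# The pattern is intentionally simplified and not RFC-complete.
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)
print(emails)  # ['alice@example.com', 'bob@example.org']


# Validation: fullmatch() succeeds only if the whole string fits the pattern.
def is_date(value):
    return re.fullmatch(r'\d{2}-\d{2}-\d{4}', value) is not None


print(is_date('26-08-2020'), is_date('2020-08-26'))  # True False

# Text processing: reorder dd-mm-yyyy into yyyy-mm-dd.
reformatted = re.sub(r'(\d{2})-(\d{2})-(\d{4})', r'\3-\2-\1', text)
print(reformatted)
```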

MetaCharacters In RegEx

Metacharacters are characters that have a special meaning in a pattern. They are the building blocks of regular expressions and are used throughout the functions of the re module. Below is the list of metacharacters.

MetaCharacters	Description
\	Drops the special meaning of the character that follows it
[]	Represents a character class
^	Matches the beginning of the string
$	Matches the end of the string
.	Matches any character except a newline
|	Means OR (matches any one of the patterns separated by it)
?	Matches zero or one occurrence
*	Matches any number of occurrences (including zero)
+	Matches one or more occurrences
{}	Indicates the number of occurrences of the preceding RegEx to match
()	Encloses a group of RegEx
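
A minimal sketch of a few of these metacharacters in action; the sample strings and variable names are made up for illustration.

```python
import re

# | tries the alternatives from left to right at each position.
alternatives = re.findall(r'cat|dog', 'cat, dog, catalog')
print(alternatives)  # ['cat', 'dog', 'cat']

# A backslash drops the special meaning of +, so this matches a literal "1+1".
escaped = re.search(r'1\+1', 'Is 1+1 equal to 2?')
print(escaped.group(), escaped.start(), escaped.end())  # 1+1 3 6

# Without the backslash, + means "one or more 1s followed by a 1".
unescaped = re.search(r'1+1', 'Is 1+1 equal to 2?')
print(unescaped)  # None

# () groups "ab" and {2} repeats the whole group twice.
repeated = re.search(r'(ab){2}', 'xababx')
print(repeated.group())  # abab
```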

group(), start() and end() methods are commonly used to access matched substrings and their positions.

Special Sequences

Special sequences in Python RegEx begin with a backslash (\) and are used to match specific character types or positions in a string. They simplify complex patterns and enhance readability.

Special Sequence	Description	Example Pattern	Example Matches
\A	Matches only at the start of the string	\Afor	for geeks, for the world
\b	Matches at a word boundary. \b(string) checks for the beginning of a word and (string)\b checks for the end of a word.	\bge	geeks, get
\B	The opposite of \b: matches only where there is no word boundary	\Bge	together, forge
\d	Matches any decimal digit; equivalent to the class [0-9] for ASCII input	\d	123, gee1
\D	Matches any non-digit character; equivalent to the class [^0-9] for ASCII input	\D	geeks, geek1
\s	Matches any whitespace character	\s	gee ks, a bc a
\S	Matches any non-whitespace character	\S	a bd, abcd
\w	Matches any word character (letter, digit or underscore); equivalent to the class [a-zA-Z0-9_] for ASCII input	\w	123, geeKs4
\W	Matches any character that is not a word character	\W	>$, gee<>
\Z	Matches only at the end of the string	ab\Z	abcdab, abababab
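
The positional sequences (\A, \b, \B and \Z) are the easiest to confuse, so here is a small sketch that contrasts them. The sample strings are assumptions chosen for this example.

```python
import re

text = 'geeks get together'

word_start = re.findall(r'\bge', text)   # "ge" at the start of a word
inside_word = re.findall(r'\Bge', text)  # "ge" that does not start a word
print(word_start, inside_word)           # ['ge', 'ge'] ['ge']

# \A and \Z anchor to the very start and the very end of the string.
at_start = re.search(r'\Afor', 'for geeks')
not_at_start = re.search(r'\Afor', 'geeks for geeks')
at_end = re.search(r'ab\Z', 'abcdab')
print(at_start.span(), not_at_start, at_end.span())  # (0, 3) None (4, 6)
```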

Basic RegEx Patterns

Let's understand some of the basic regular expressions. They are as follows:

1. Character Classes

Character classes allow matching any one character from a specified set. They are enclosed in square brackets [].

Python

import re

txt = 'GeeksforGeeks: A computer science portal for geeks'
print(re.findall(r'[Gg]eeks', txt))

Output

['Geeks', 'Geeks', 'geeks']

2. Ranges

In RegEx, a range allows matching characters or digits within a span using - inside []. For example, [0-9] matches digits, [A-Z] matches uppercase letters.

Python

import re
print('Range',re.search(r'[a-zA-Z]', 'x'))

Output

Range <re.Match object; span=(0, 1), match='x'>
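
To see ranges do a little more work, the sketch below pulls uppercase letters and digits out of a made-up order string and then combines both ranges in one pattern. The string and variable names are hypothetical.

```python
import re

code = 'Order AB-1234 shipped'

uppercase = re.findall(r'[A-Z]', code)
digits = re.findall(r'[0-9]', code)
print(uppercase)  # ['O', 'A', 'B']
print(digits)     # ['1', '2', '3', '4']

# Two uppercase letters, a hyphen, then four digits.
order_id = re.search(r'[A-Z][A-Z]-[0-9][0-9][0-9][0-9]', code)
print(order_id.group())  # AB-1234
```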

3. Negation

Negation in a character class is specified by placing a ^ at the beginning of the brackets, meaning match anything except those characters.

Syntax:

[^a-z]

Example:

Python

import re

print(re.search(r'[^a-z]', 'c'))
print(re.search(r'G[^e]', 'Geeks'))

Output

None
None
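
Both calls above return None, so it may help to see negation succeed as well. The following sketch uses sample strings invented for this purpose.

```python
import re

# The first character that is not a lowercase letter.
first_other = re.search(r'[^a-z]', 'abc1')
print(first_other.group())  # 1

# Everything that is neither a vowel nor whitespace.
consonants = re.findall(r'[^aeiou\s]', 'geeks for geeks')
print(consonants)  # ['g', 'k', 's', 'f', 'r', 'g', 'k', 's']

# "G" followed by any character other than "e".
g_not_e = re.search(r'G[^e]', 'Go Geeks')
print(g_not_e.group())  # Go

# ^ negates only when it comes first; elsewhere it is a literal caret.
literal_caret = re.findall(r'[a^]', 'a^b')
print(literal_caret)  # ['a', '^']
```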

3. Shortcuts

Shortcuts are shorthand representations for common character classes. Let's discuss some of the shortcuts provided by the regular expression engine.

\w - matches a word character
\d - matches digit character
\s - matches whitespace character (space, tab, newline, etc.)
\b - matches a word boundary (a zero-length position, not a character)

Python

import re

print('Geeks:', re.search(r'\bGeeks\b', 'Geeks'))
print('GeeksforGeeks:', re.search(r'\bGeeks\b', 'GeeksforGeeks'))

Output

Geeks: <re.Match object; span=(0, 5), match='Geeks'>
GeeksforGeeks: None
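
The example above covers only \b. As a small sketch of the other shortcuts, the code below applies \d, \w and \s to a made-up line of text; re.split is used here purely for illustration.

```python
import re

line = 'Room 42\tis on floor 7'  # contains a tab between "42" and "is"

numbers = re.findall(r'\d+', line)
words = re.findall(r'\w+', line)
fields = re.split(r'\s+', line)  # split on any run of whitespace

print(numbers)  # ['42', '7']
print(words)    # ['Room', '42', 'is', 'on', 'floor', '7']
print(fields)   # ['Room', '42', 'is', 'on', 'floor', '7']
```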

4. Beginning and End of String

The ^ character anchors a match to the beginning of a string and the $ character anchors it to the end of a string.

Python

import re


# Beginning of String
match = re.search(r'^Geek', 'Campus Geek of the month')
print('Beg. of String:', match)

match = re.search(r'^Geek', 'Geek of the month')
print('Beg. of String:', match)

# End of String
match = re.search(r'Geeks$', 'Computer science portal-GeeksforGeeks')
print('End of String:', match)

Output

Beg. of String: None
Beg. of String: <re.Match object; span=(0, 4), match='Geek'>
End of String: <re.Match object; span=(32, 37), match='Geeks'>
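
Using both anchors together is a common way to check that an entire string fits a pattern. The helper name is_five_digits below is hypothetical and the code is only a sketch of the idea.

```python
import re


def is_five_digits(value):
    # ^ and $ force the five digits to span the whole string.
    return re.search(r'^\d{5}$', value) is not None


# Without anchors, the pattern happily matches inside a longer string.
unanchored = re.search(r'\d{5}', 'id-1234567')
print(unanchored.group())            # 12345

print(is_five_digits('12345'))       # True
print(is_five_digits('id-1234567'))  # False
print(is_five_digits('1234'))        # False
```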

5. Any Character

Outside a bracketed character class, the . character matches any single character except a newline.

Python

import re

print('Any Character', re.search(r'p.th.n', 'python 3'))

Output

Any Character <re.Match object; span=(0, 6), match='python'>
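
The sketch below probes the edges of the dot: it skips newlines unless the re.DOTALL flag is passed, and it becomes a literal dot when escaped or placed inside a character class. The sample strings are assumptions for this example.

```python
import re

no_newline = re.search(r'a.c', 'a\nc')             # . does not match \n
with_dotall = re.search(r'a.c', 'a\nc', re.DOTALL)  # now it does
print(no_newline)                 # None
print(with_dotall is not None)    # True

# An escaped dot matches only a literal ".".
print(re.search(r'a\.c', 'abc'))          # None
print(re.search(r'a\.c', 'a.c').group())  # a.c

# Inside [] the dot is also literal.
literal_dots = re.findall(r'[.]', 'a.b.c')
print(literal_dots)  # ['.', '.']
```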

6. Optional Characters

The ? character marks the preceding character or character class as optional: it may occur once or not at all. Let's consider the example of a word with an alternative spelling - color or colour.

Python

import re

print('Color',re.search(r'colou?r', 'color')) 
print('Colour',re.search(r'colou?r', 'colour'))

Output

Color <re.Match object; span=(0, 5), match='color'>
Colour <re.Match object; span=(0, 6), match='colour'>

7. Repetition

Repetition enables you to repeat the same character or character class a fixed number of times. Consider a date that consists of a day, a month and a year. Let's use a regular expression to identify a date in dd-mm-yyyy format.

Python

import re


print('Date{dd-mm-yyyy}:', re.search(r'[\d]{2}-[\d]{2}-[\d]{4}',
                                     '18-08-2020'))

Output

Date{dd-mm-yyyy}: <re.Match object; span=(0, 10), match='18-08-2020'>

Here, the regular expression engine checks for two consecutive digits. Upon finding them, it moves on to the hyphen. It then checks for the next two consecutive digits followed by a hyphen, and finally for four consecutive digits.

Let's discuss three other regular expressions under repetition.

7.1 Repetition ranges

A repetition range is useful when more than one length is acceptable. Consider a scenario where both three-digit and four-digit numbers are accepted. Let's have a look at the regular expression.

Python

import re


print('Three Digit:', re.search(r'[\d]{3,4}', '189'))
print('Four Digit:', re.search(r'[\d]{3,4}', '2145'))

Output

Three Digit: <re.Match object; span=(0, 3), match='189'>
Four Digit: <re.Match object; span=(0, 4), match='2145'>
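
One detail worth noting is that a range is greedy: it takes as many repetitions as it can, and it will also match inside a longer run of digits. The sketch below shows this with made-up inputs and uses \b to restrict matches to whole numbers.

```python
import re

# Greedy: four digits are taken even though three would be enough.
greedy = re.search(r'\d{3,4}', '12345')
print(greedy.group())  # 1234

# Fewer than three digits never match.
print(re.search(r'\d{3,4}', '12'))  # None

text = '12 345 6789 12345'
anywhere = re.findall(r'\d{3,4}', text)
whole_numbers = re.findall(r'\b\d{3,4}\b', text)
print(anywhere)       # ['345', '6789', '1234']
print(whole_numbers)  # ['345', '6789']
```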

7.2 Open-Ended Ranges

There are scenarios where there is no upper limit on how often a character may repeat. In such scenarios, you can leave the upper limit open by omitting it, as in {1,}. A common example is matching the numbers in a street address. Let's have a look.

Python

import re


print(re.search(r'[\d]{1,}','5th Floor, A-118,\
Sector-136, Noida, Uttar Pradesh - 201305'))

Output

<re.Match object; span=(0, 1), match='5'>

7.3 Shorthand

The shorthand characters + and * specify one or more occurrences ({1,}) and zero or more occurrences ({0,}) respectively.

Python

import re

print(re.search(r'[\d]+', '5th Floor, A-118,\
Sector-136, Noida, Uttar Pradesh - 201305'))

Output

<re.Match object; span=(0, 1), match='5'>
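
The difference between + and * shows up when there is nothing to match. In the sketch below, which reuses the address from the example above, * succeeds with an empty match while + fails.

```python
import re

address = '5th Floor, A-118, Sector-136, Noida, Uttar Pradesh - 201305'

# + collects every run of one or more digits.
all_numbers = re.findall(r'\d+', address)
print(all_numbers)  # ['5', '118', '136', '201305']

# * accepts zero digits, so it matches the empty string at position 0.
star = re.search(r'\d*', 'Noida')
print(star.span(), repr(star.group()))  # (0, 0) ''

# + needs at least one digit, so there is no match at all.
plus = re.search(r'\d+', 'Noida')
print(plus)  # None
```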

8. Grouping

Grouping is the process of separating an expression into groups by using parentheses, and it allows you to fetch each individual matching group.

Python

import re


grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp)

Output

<re.Match object; span=(0, 10), match='26-08-2020'>

Let's see some of its functionality.

8.1 Return the entire match

Calling the group() method on the match object with no arguments returns the entire match.

Python

import re


grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})','26-08-2020')
print(grp.group())

Output

26-08-2020

8.2 Return a tuple of matched groups

You can use the groups() method to return a tuple that holds the individual matched groups.

Python

import re


grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})','26-08-2020')
print(grp.groups())

Output

('26', '08', '2020')

8.3 Retrieve a single group

Passing a group's index to the group() method retrieves just that group. Indices start at 1, and group(0) is the entire match.

Python

import re


grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})','26-08-2020')
print(grp.group(3))

Output

2020

8.4 Name your groups

The re module allows you to name your groups with the (?P<name>...) syntax and retrieve them by name.

Python

import re


match = re.search(r'(?P<dd>[\d]{2})-(?P<mm>[\d]{2})-(?P<yyyy>[\d]{4})',
                  '26-08-2020')
print(match.group('mm'))

Output

08

8.5 Individual match as a dictionary

We have seen how a match object returns the individual groups as a tuple. With named groups, the groupdict() method returns them as a dictionary in which the name of each group acts as the key.

Python

import re


match = re.search(r'(?P<dd>[\d]{2})-(?P<mm>[\d]{2})-(?P<yyyy>[\d]{4})',
                  '26-08-2020')
print(match.groupdict())

Output

{'dd': '26', 'mm': '08', 'yyyy': '2020'}
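
One possible use of groupdict() is to turn the captured strings into a proper date object. The helper parse_date below is a hypothetical name and the code is a sketch only; datetime.date raises ValueError for impossible dates such as 99-99-2020, which this example does not handle.

```python
import re
from datetime import date


def parse_date(text):
    """Return a date for the first dd-mm-yyyy found in text, or None."""
    match = re.search(r'(?P<dd>\d{2})-(?P<mm>\d{2})-(?P<yyyy>\d{4})', text)
    if match is None:
        return None
    # Convert every named group from str to int in one step.
    parts = {name: int(value) for name, value in match.groupdict().items()}
    return date(parts['yyyy'], parts['mm'], parts['dd'])


print(parse_date('26-08-2020'))    # 2020-08-26
print(parse_date('no date here'))  # None
```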

9. Lookahead

A negated character class must still consume a character, so it cannot match when there is no character left to check: n[^e] fails on 'Python' because nothing follows the final n. A negative lookahead, written (?!...), overcomes this. It accepts or rejects a match based on the presence or absence of the content that follows, without consuming any characters.

Python

import re


print('negation:', re.search(r'n[^e]', 'Python'))
print('lookahead:', re.search(r'n(?!e)', 'Python'))

Output

negation: None
lookahead: <re.Match object; span=(5, 6), match='n'>

A lookahead can also require that the match is followed by a particular character. This is called a positive lookahead, and can be achieved by simply replacing the ! character with the = character.

Python

import re

print('positive lookahead', re.search(r'n(?=e)', 'jasmine'))

Output

positive lookahead <re.Match object; span=(5, 6), match='n'>
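
Because a lookahead only checks what follows and does not consume it, the checked text is left out of the match. The sketch below illustrates this on a made-up CSS-like string.

```python
import re

css = 'width: 10px; margin: 2em; height: 30px'

# Numbers that are followed by "px"; the unit itself is not captured.
px_values = re.findall(r'\d+(?=px)', css)
print(px_values)  # ['10', '30']

# Without the lookahead, the unit becomes part of each match.
with_unit = re.findall(r'\d+px', css)
print(with_unit)  # ['10px', '30px']
```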

10. Substitution

The re.sub method replaces every match of a regular expression in a string and returns the resulting string. It is useful when you want to strip characters such as /, -, ., etc. from a value before storing it in a database. It takes three required arguments:

the regular expression
the replacement string
the source string being searched

Let's have a look at the code below, which removes the - characters from a credit card number.

Python

import re

print(re.sub(r'([\d]{4})-([\d]{4})-([\d]{4})-([\d]{4})',r'\1\2\3\4',
             '1111-2222-3333-4444'))

Output

1111222233334444
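
A few variations on the same idea, offered as a sketch: a simpler pattern that drops every non-digit, the optional count argument, and a function used as the replacement. The helper name mask_digits is hypothetical.

```python
import re

card = '1111-2222-3333-4444'

# Remove every non-digit character in one pass.
digits_only = re.sub(r'\D', '', card)
print(digits_only)  # 1111222233334444

# count limits how many replacements are made.
first_only = re.sub(r'-', ' ', card, count=1)
print(first_only)  # 1111 2222-3333-4444


def mask_digits(match):
    # Called once per match; the return value becomes the replacement.
    return '*' * len(match.group())


# Mask each block of four digits that is followed by a hyphen.
masked = re.sub(r'\d{4}(?=-)', mask_digits, card)
print(masked)  # ****-****-****-4444
```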

Compiled Regular Expression

In Python, the re.compile() function allows you to compile a regular expression pattern into a RegEx object. This compiled object can then be reused for multiple operations like search, match, sub, etc.

Python

import re

regex = re.compile(r'([\d]{2})-([\d]{2})-([\d]{4})')

# search method
print('compiled reg expr', regex.search('26-08-2020'))

# sub method
print(regex.sub(r'\1.\2.\3', '26-08-2020'))

Output

compiled reg expr <re.Match object; span=(0, 10), match='26-08-2020'>
26.08.2020

Explanation:

re.compile(...) creates a reusable regular expression pattern to match dates in the DD-MM-YYYY format.
regex.search() finds the date match and regex.sub(r'\1.\2.\3', ...) replaces '26-08-2020' with '26.08.2020'.
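
Compiling pays off mainly when one pattern is applied to many strings. The sketch below, with invented sample lines and the assumed name DATE_RE, reuses a single compiled pattern for both searching and substitution.

```python
import re

DATE_RE = re.compile(r'(\d{2})-(\d{2})-(\d{4})')

lines = [
    'Joined: 26-08-2020',
    'No date here',
    'Left: 01-02-2021, returned 15-03-2021',
]

# Collect every date from every line with the same compiled object.
found = [m.group() for line in lines for m in DATE_RE.finditer(line)]
print(found)  # ['26-08-2020', '01-02-2021', '15-03-2021']

# The same object also rewrites each line.
dotted = [DATE_RE.sub(r'\1.\2.\3', line) for line in lines]
print(dotted[2])  # Left: 01.02.2021, returned 15.03.2021
```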

sonugeorge
