
Use Named Groups in Python Regular Expressions
Usually, if we need to find a particular word, such as "data", in a text or a string, we search for it directly.
We can also use a sequence of characters that forms a pattern to search for words in a string. For example, the characters "[0-9]" match a single digit and "[0-9]+" matches a run of one or more consecutive digits. Such sequences of characters are known as regular expressions.
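The difference between the two patterns can be checked with a short sketch. The sample text below is made up for illustration, and re.findall() is used only because it lists every match at once.

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "Order 66 shipped 3 items in 2014"

# "[0-9]" matches exactly one digit, so every digit is reported on its own.
single_digits = re.findall(r"[0-9]", text)

# "[0-9]+" matches a run of one or more consecutive digits.
digit_runs = re.findall(r"[0-9]+", text)

print(single_digits)
print(digit_runs)
```

The first list holds seven one-character strings, while the second holds the three numbers as they appear in the text.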
Groups in Python Regular Expressions
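Like many other programming languages, Python supports regular expressions; it provides a module named re to handle them. The functions re.match() and re.search() of this module are used to search for a desired pattern in a string: re.match() checks for a match only at the beginning of the string, while re.search() scans the whole string and returns the first match.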
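As a minimal sketch of how these two functions behave, the snippet below applies the same pattern with both. The sample string is an assumption chosen so that the digits do not appear at the start.

```python
import re

# Hypothetical sample string; the digits are deliberately not at the start.
text = "Established on 2014-06-12"
pattern = r"\d{4}"

# re.match() only tries the pattern at the very beginning of the string.
at_start = re.match(pattern, text)

# re.search() scans the string and returns the first match found anywhere.
anywhere = re.search(pattern, text)

print(at_start)            # None, because the string starts with "Established"
print(anywhere.group(0))   # 2014
```

Both functions return None when nothing matches, so it is safer to test the result before calling group() on it.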
If a pattern contains multiple sub-patterns, we can retrieve specific parts of a match using groups. Groups are defined using parentheses "()". For example, the pattern "(\d{4})-(\d{2})-(\d{2})" matches a date in the YYYY-MM-DD format.
Here -
- group(0): Represents the entire match.
- group(1): Represents \d{4}, which matches the year.
- group(2): Represents the first \d{2}, which matches the month.
- group(3): Represents the second \d{2}, which matches the day.
Example
Following is the Python example, which demonstrates how to match a particular pattern in a string using groups -
import re

input_string = "Tutorials Point began as a single HTML tutorial and has since expanded to offer a wide range of online courses and tutorials. It was established on 2014-06-12."
regexp = r"(\d{4})-(\d{2})-(\d{2})"
match = re.search(regexp, input_string)
if match:
    print("Found date in the given text:", match.group(0))
    print("Year:", match.group(1))
    print("Month:", match.group(2))
    print("Day:", match.group(3))
Following is the output of the above program -
Found date in the given text: 2014-06-12
Year: 2014
Month: 06
Day: 12
Named Groups in Regular Expressions
As we discussed earlier, we can refer to the groups of a pattern by number, as in group(1), group(2), etc.
To avoid remembering group numbers, named groups are used. They allow us to assign names to each group, enhancing code readability, especially with complex patterns or multiple groups. The following is the regular expression syntax for a named capturing group -
(?P<name>pattern)
Example
This example shows how to extract specific strings using named groups in a regular expression. It uses a regular expression with two named groups: name and age. The matched values can be accessed using their group names -
import re

data = "Name: vikram, Age: 30"
pattern = r"Name: (?P<name>\w+), Age: (?P<age>\d+)"
result = re.search(pattern, data)
if result:
    print(result.group("name"))
    print(result.group("age"))
Following is the output of the above code -
vikram
30
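The same idea can be applied to the date pattern from the earlier section. The following is a sketch, assuming the group names year, month and day (these names are our choice, not part of the original pattern). It also shows that a named group can still be read by its number.

```python
import re

text = "It was established on 2014-06-12."

# The group names year, month and day are chosen for this illustration.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

match = re.search(pattern, text)
if match:
    # Access by name reads better than group(1), group(2), group(3).
    print(match.group("year"), match.group("month"), match.group("day"))
    # Named groups keep their positional number as well.
    print(match.group(1) == match.group("year"))
```

Because the numbers remain valid, code that already uses group(1) keeps working after names are added to a pattern.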
Accessing Named Groups
Instead of accessing each group one by one with group("name"), we can use the groupdict() method, which returns all the named groups as a dictionary. This is very useful when you have multiple values to extract.
Each key in the dictionary is the group name, and each value is the matched part from the text.
import re

text = "Name: vikram, Age: 30"
pattern = r"Name: (?P<name>\w+), Age: (?P<age>\d+)"
match = re.search(pattern, text)
# re.search() returns None when nothing matches, so check before using it
if match:
    print(match.groupdict())
Output
Following is the output of the above code -
{'name': 'vikram', 'age': '30'}
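Note that every value in this dictionary is a string, including the age. One possible way to wrap the lookup is sketched below; the helper name parse_person is hypothetical and not part of the re module, and converting the age to an integer is an added assumption about how the data would be used.

```python
import re

# Compiled once so the helper can reuse it; same pattern as in the example above.
PERSON_PATTERN = re.compile(r"Name: (?P<name>\w+), Age: (?P<age>\d+)")

def parse_person(text):
    """Hypothetical helper: return the named groups as a dict, or None if nothing matches."""
    match = PERSON_PATTERN.search(text)
    if match is None:
        return None
    record = match.groupdict()
    # Captured values are always strings, so convert the age explicitly.
    record["age"] = int(record["age"])
    return record

print(parse_person("Name: vikram, Age: 30"))
print(parse_person("no person here"))
```

The first call prints a dictionary whose age is the integer 30, and the second prints None because the text does not match.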
Using Loop to Extract Multiple Matches
If the text contains multiple occurrences of the pattern, we can use the re.finditer() function in a loop to extract the named groups from every match.
Example
In this example, for each match found, it prints the name and age using the named groups. This is helpful when working with structured or repeated data in text.
import re

text = "Name: suresh, Age: 28; Name: praveen, Age: 34"
pattern = r"Name: (?P<name>\w+), Age: (?P<age>\d+)"
for match in re.finditer(pattern, text):
    print(match.group("name"), match.group("age"))
Output
Following is the output of the above code -
suresh 28
praveen 34
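The loop can also be combined with groupdict() to collect every match as a dictionary. This is only a sketch of one way to do it, reusing the text and pattern from the example above; the variable name people is our own.

```python
import re

text = "Name: suresh, Age: 28; Name: praveen, Age: 34"
pattern = r"Name: (?P<name>\w+), Age: (?P<age>\d+)"

# Build one dictionary per match; keys are the group names.
people = [match.groupdict() for match in re.finditer(pattern, text)]

print(people)
```

The result is a list with one dictionary per person, which is convenient when the extracted data has to be passed on to other code.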