
Character Classes in Python Regular Expressions
In this chapter, we will look at character classes and how they work, with simple examples and programs that show how to use them. Character classes, also called character sets, are one of the main components of regular expressions.
A character class defines a set of characters, any one of which may match at a given position in a string. It can list specific characters, a range of characters, or both.
Character Classes or Character Sets
A "character class", also known as a "character set", tells the regex engine to match exactly one character out of several. Simply place the characters you want to match inside square brackets. To match an a or an e, write [ae]. You could use this in gr[ae]y to match either gray or grey, which is helpful when you do not know whether the text you are searching is written in American or British English.
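As a minimal sketch of the gr[ae]y idea, the snippet below searches a made-up sentence (the sample text is an assumption, not part of the original tutorial) and shows that the class consumes exactly one character.

```python
import re

# Hypothetical sample text containing both spellings.
text = "The sky was gray in Boston and grey in London."

# [ae] matches exactly one character: either 'a' or 'e'.
spellings = re.findall(r"gr[ae]y", text)
print(spellings)  # ['gray', 'grey']

# The class stands for a single character, so "graey" does not match.
no_match = re.search(r"gr[ae]y", "graey")
print(no_match)  # None
```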
A hyphen can be used within a character class to represent a range of characters. [0-9] matches a single digit between 0 and 9. You can also use multiple ranges: [0-9a-fA-F] matches a single hexadecimal digit in either case. Ranges and individual characters can be combined: [0-9a-fxA-FX] matches a single hexadecimal digit or the letter x in either case. The order of the characters and ranges inside the brackets makes no difference.
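The following sketch tries these ranges on two made-up strings (the sample strings and variable names are assumptions chosen for illustration). It also checks that reordering the ranges inside the brackets gives the same result.

```python
import re

# Hypothetical sample: 'G' and 'z' are not hexadecimal digits.
sample = "7f G2 zB"

hex_digits = re.findall(r"[0-9a-fA-F]", sample)
print(hex_digits)  # ['7', 'f', '2', 'B']

# Same class with the ranges in a different order.
reordered = re.findall(r"[A-Fa-f0-9]", sample)
print(reordered == hex_digits)  # True

# Ranges combined with the individual characters 'x' and 'X'.
with_x = re.findall(r"[0-9a-fxA-FX]", "0x1F")
print(with_x)  # ['0', 'x', '1', 'F']
```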
Character classes are among the most commonly used features of regular expressions. You can find a word even when it is misspelled, with patterns like sep[ae]r[ae]te or li[cs]en[cs]e. An identifier in many programming languages can be written as [A-Za-z_][A-Za-z_0-9]*, that is, a letter or underscore followed by any number of letters, digits, or underscores. A C-style hexadecimal number can be written as 0[xX][A-Fa-f0-9]+.
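A short sketch of these three patterns follows. It assumes the identifier pattern is [A-Za-z_][A-Za-z_0-9]* (a letter or underscore first, then letters, digits, or underscores); the word list and sample strings are invented for illustration.

```python
import re

misspelling = re.compile(r"sep[ae]r[ae]te")
identifier = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
c_hex = re.compile(r"0[xX][A-Fa-f0-9]+")

# Both the correct spelling and a common misspelling are accepted.
words = ["separate", "seperate", "saparate"]
found = [w for w in words if misspelling.fullmatch(w)]
print(found)  # ['separate', 'seperate']

# An identifier may not start with a digit.
names = ["_count1", "max_value", "1count"]
valid = [n for n in names if identifier.fullmatch(n)]
print(valid)  # ['_count1', 'max_value']

# Only the tokens starting with 0x or 0X are hexadecimal literals.
hex_numbers = c_hex.findall("mask = 0xFF & 0x1f; n = 0z9")
print(hex_numbers)  # ['0xFF', '0x1f']
```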
Predefined Character Sets
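Here are some of the predefined character sets for your reference -
\d matches any digit. With the re.ASCII flag it is equivalent to [0-9]; by default, for Python 3 string patterns, it also matches other Unicode decimal digits.
\D matches any non-digit character.
\w matches any word character, that is, a letter, a digit, or an underscore. With the re.ASCII flag it is equivalent to [a-zA-Z0-9_]; by default it also matches Unicode letters and digits.
\W matches any character that is not a word character.
\s matches any whitespace character (space, tab, newline, carriage return, form feed, and vertical tab).
\S matches any non-whitespace character.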
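The shorthand classes above can be tried out with a small sketch like the one below. The sample string is made up, and the last two lines illustrate the Unicode behaviour of \d in Python 3 string patterns using the Arabic-Indic digit three (U+0663) as an arbitrary non-ASCII digit.

```python
import re

# Hypothetical sample with a space, a comma, a tab and an underscore.
text = "Room 42,\tfloor_3!"

digits = re.findall(r"\d", text)
print(digits)      # ['4', '2', '3']

words = re.findall(r"\w+", text)
print(words)       # ['Room', '42', 'floor_3']

non_word = re.findall(r"\W", text)
print(non_word)    # [' ', ',', '\t', '!']

whitespace = re.findall(r"\s", text)
print(whitespace)  # [' ', '\t']

chunks = re.findall(r"\S+", text)
print(chunks)      # ['Room', '42,', 'floor_3!']

# \d matches Unicode decimal digits unless re.ASCII is given.
unicode_digit = re.findall(r"\d", "\u0663")
ascii_only = re.findall(r"\d", "\u0663", re.ASCII)
print(len(unicode_digit), len(ascii_only))  # 1 0
```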
Now let us look at some examples of how to use character sets in Python regular expressions -
Match Specific Characters
In this example, we have created a regex pattern that matches cat, bat, or rat. The character class [cbr] allows exactly one of the characters 'c', 'b', or 'r' before 'at'.
Here we are using the re.findall() method, which finds all occurrences of a pattern in a string. In the sample text only cat matches, because the 's' in sat and the 'm' in mat are not in the class.

import re

text = "The cat sat on the mat."
# [cbr] matches one character: 'c', 'b', or 'r'
pattern = r'[cbr]at'
matches = re.findall(pattern, text)
print(matches)

Output
This will create the following outcome -
['cat']
Matching Vowels
The program below searches for all vowels in the given string and returns them as a list. Character classes are case sensitive, so the class lists both lowercase and uppercase vowels. Here we have again used the re.findall() function.

import re

text = "Hello World! Are you there?"
# List both cases so that the 'A' in "Are" is found too
pattern = r'[aeiouAEIOU]'
matches = re.findall(pattern, text)
print(matches)

Output
This will generate the following result -
['e', 'o', 'o', 'A', 'e', 'o', 'u', 'e', 'e']
Find Numeric Characters
In this program, we will extract all digits from the given string. Here we have used the \d pattern, which matches each digit in the string individually, so 10 is returned as two separate digits.

import re

# Extract digits from a string
text = "There are 4 apples and 10 oranges."
pattern = r'\d'
matches = re.findall(pattern, text)
print(matches)

Output
This will produce the following result -
['4', '1', '0']
Match Username
The program below checks whether a username contains only letters, numbers, or underscores. The character set [a-zA-Z0-9_] lists all allowed characters, the + requires at least one of them, and the ^ and $ anchors make the pattern cover the whole string.

import re

username = "user_123"
pattern = r"^[a-zA-Z0-9_]+$"
match = re.match(pattern, username)
if match:
    print("Valid username")
else:
    print("Invalid username")

Output
This will lead to the following outcome -
Valid username
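One detail worth knowing about the anchored pattern: in Python, $ also matches just before a newline at the very end of the string, so a username with a trailing newline would still pass. A possible variation, sketched below, uses re.fullmatch() with the same character set; the helper name is_valid_username is hypothetical and not part of the original example.

```python
import re

def is_valid_username(username):
    """Return True if username has only letters, digits, or underscores.

    Hypothetical helper: re.fullmatch() requires the whole string to match,
    so no ^ or $ anchors are needed.
    """
    return re.fullmatch(r"[a-zA-Z0-9_]+", username) is not None

print(is_valid_username("user_123"))    # True
print(is_valid_username("user 123"))    # False (space is not allowed)
print(is_valid_username("user_123\n"))  # False (trailing newline rejected)

# For comparison, the anchored pattern accepts the trailing newline.
anchored = re.match(r"^[a-zA-Z0-9_]+$", "user_123\n")
print(anchored is not None)  # True
```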