
Regexpresion

The document provides an overview of Regular Expressions (RegEx) in Python, detailing various methods such as finditer(), findall(), match(), and search() for pattern matching in strings. It explains the use of meta characters and special sequences, along with examples of how to implement them in code. Additionally, it covers sets and their functionalities in RegEx, demonstrating how to match specific character sets within strings.

Uploaded by

Hector Nunez


9/8/22, 4:14 PM RegEx

RegEx (Regular Expression)

Shubham Verma

LinkedIn: https://www.linkedin.com/in/shubham-verma-3968a5119

In [1]: import re

In [44]: test_string = '123abc456789abc123ABC'

In [45]: # finditer() returns an iterator of Match objects

pattern = re.compile(r'abc')
matches = pattern.finditer(test_string)

for match in matches:
    print(match)

<re.Match object; span=(3, 6), match='abc'>

<re.Match object; span=(12, 15), match='abc'>

In [46]: # The same can be done without compiling the pattern first:

matches = re.finditer(r"abc", test_string)

for match in matches:
    print(match)

<re.Match object; span=(3, 6), match='abc'>

<re.Match object; span=(12, 15), match='abc'>

In [47]: # findall() returns a list of all matching strings

matches = re.findall(r"abc", test_string)

for match in matches:
    print(match)

abc
abc

In [48]: # match() matches only at the beginning of the string; otherwise it returns None

match = re.match(r"abc", test_string)

print(match)

None

In [49]: match = re.match(r"123", test_string)

print(match)

<re.Match object; span=(0, 3), match='123'>

In [50]: # search() returns the first match found at any position in the string

match = re.search(r"abc", test_string)

print(match)

<re.Match object; span=(3, 6), match='abc'>
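Since match() and search() return None when nothing matches, calling a method such as group() on the result without a check would raise an AttributeError. A minimal sketch of the usual guard, reusing test_string from above; the helper name first_match_text is hypothetical and only for illustration:

```python
import re

test_string = '123abc456789abc123ABC'

def first_match_text(pattern, text):
    """Return the first matched substring, or None if there is no match."""
    m = re.search(pattern, text)
    # search() returns None on failure, so guard before calling group()
    return m.group() if m is not None else None

found = first_match_text(r"abc", test_string)      # 'abc' occurs at index 3
missing = first_match_text(r"xyz", test_string)    # 'xyz' never occurs
anchored = re.match(r"abc", test_string)           # 'abc' is not at index 0

print(found, missing, anchored)
```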

Methods on a Match object

1. group(): Return the string matched by the RE
2. start(): Return the starting position of the match
3. end(): Return the ending position of the match
4. span(): Return a tuple containing the (start, end) positions of the match
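As a rough illustration of how these four methods relate to each other, the sketch below checks that span() is just (start(), end()) and that slicing the original string with those positions gives back group(). It assumes the same test_string as above.

```python
import re

test_string = '123abc456789abc123ABC'

m = re.search(r"abc", test_string)
start, end = m.span()

# span() is the pair (start(), end()); end is exclusive, like a slice bound
same_span = (m.start(), m.end()) == m.span()

# Slicing the original string with the span reproduces group()
sliced = test_string[start:end]

print(m.group(), sliced, same_span)
```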

In [51]: # Continuing with finditer(): span(), start() and end() of each match

pattern = re.compile(r'abc')
matches = pattern.finditer(test_string)

for match in matches:
    print(match.span(), match.start(), match.end())

(3, 6) 3 6
(12, 15) 12 15

localhost:8888/nbconvert/html/RegEx.ipynb?download=false 1/11

In [52]: # group() returns the matched text of each match

pattern = re.compile(r'abc')
matches = pattern.finditer(test_string)

for match in matches:
    print(match.group())

abc
abc

All meta characters . ^ $ * + ? { } [ ] \ | ( )

Note: Meta characters need to be escaped (with \) if we actually want to search for the literal character.

1. . Any character (except the newline character)
2. ^ Starts with "^hello"
3. $ Ends with "world$"
4. * Zero or more occurrences
5. + One or more occurrences
6. ? Zero or one occurrence
7. {} Exactly the specified number of occurrences "al{2}"
8. [] A set of characters "[a-m]"
9. \ Signals a special sequence (can also be used to escape special characters) "\d"
10. | Either or "falls|stays"
11. ( ) Capture and group
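A short sketch of the meta characters listed above, using the example patterns from the list where possible. The sample strings are made up for illustration and are not part of the original notebook.

```python
import re

# Anchors: ^ matches at the start of the string, $ at the end
starts = re.search(r"^hello", "hello world") is not None
ends = re.search(r"world$", "hello world") is not None

# Quantifiers apply to the item just before them (here the letter after 'a')
sample = "a ab abb"
zero_or_more = re.findall(r"ab*", sample)   # ['a', 'ab', 'abb']
one_or_more = re.findall(r"ab+", sample)    # ['ab', 'abb']
zero_or_one = re.findall(r"ab?", sample)    # ['a', 'ab', 'ab']
exactly_two = re.findall(r"al{2}", "the ball falls")  # ['all', 'all']

# Set, alternation and capturing groups
in_set = re.findall(r"[a-m]", "regex")      # ['e', 'g', 'e']
either = re.findall(r"falls|stays", "rain falls, sun stays")
low, high = re.search(r"(\d+)-(\d+)", "pages 10-20").groups()

print(starts, ends, zero_or_more, one_or_more, zero_or_one)
print(exactly_two, in_set, either, low, high)
```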

More Metacharacters / Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

1. \d : Matches any decimal digit; for ASCII text this is equivalent to the class [0-9] (in Python 3, str patterns also match other Unicode digits unless the re.ASCII flag is set).
2. \D : Matches any non-digit character; for ASCII text this is equivalent to the class [^0-9].
3. \s : Matches any whitespace character (space, tab, newline and so on).
4. \S : Matches any non-whitespace character.
5. \w : Matches any alphanumeric (word) character; for ASCII text this is equivalent to the class [a-zA-Z0-9_].
6. \W : Matches any non-alphanumeric character; for ASCII text this is equivalent to the class [^a-zA-Z0-9_].
7. \b : Returns a match where the specified characters are at the beginning or at the end of a word r"\bain" r"ain\b"
8. \B : Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word r"\Bain" r"ain\B"
9. \A : Returns a match if the specified characters are at the beginning of the string r"\AThe"
10. \Z : Returns a match if the specified characters are at the end of the string r"Spain\Z"
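The special sequences above can be tried out with a few one-liners. This is only a sketch: the sample sentence "The rain in Spain" is an assumption chosen to fit the example patterns in the list, and the last two lines show that \d also matches non-ASCII digits in Python 3 unless re.ASCII is passed.

```python
import re

text = "The rain in Spain"

digits = re.findall(r"\d", "abc123")       # ['1', '2', '3']
non_digits = re.findall(r"\D", "abc123")   # ['a', 'b', 'c']
words = re.findall(r"\w+", text)           # ['The', 'rain', 'in', 'Spain']
gaps = re.findall(r"\s", text)             # three single spaces

# \b is a word boundary, \B is "not a word boundary"
at_word_start = re.findall(r"\bain", text)   # []: no word starts with 'ain'
at_word_end = re.findall(r"ain\b", text)     # ['ain', 'ain']
not_at_start = re.findall(r"\Bain", text)    # ['ain', 'ain']
not_at_end = re.findall(r"ain\B", text)      # []: 'ain' always ends a word here

# \A and \Z anchor to the very start and very end of the string
begins = re.search(r"\AThe", text) is not None
finishes = re.search(r"Spain\Z", text) is not None

# "\u0663" is the Arabic-Indic digit three, a non-ASCII decimal digit
unicode_digit = re.findall(r"\d", "\u0663")
ascii_only = re.findall(r"\d", "\u0663", flags=re.ASCII)

print(digits, non_digits, words, len(gaps))
print(at_word_start, at_word_end, not_at_start, not_at_end, begins, finishes)
```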

In [53]: # "." matches every character of the string (there is no newline in it)

pattern = re.compile(r".")
matches = pattern.finditer(test_string)

for match in matches:
    print(match.group(), end = " ")

1 2 3 a b c 4 5 6 7 8 9 a b c 1 2 3 A B C

In [54]: # an escaped "\." matches only a literal dot

test_string1 = '123abc456789abc123ABC.'

pattern = re.compile(r"\.")
matches = pattern.finditer(test_string1)

for match in matches:
    print(match)

<re.Match object; span=(21, 22), match='.'>
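When the text to search for comes from a variable, escaping every meta character by hand is error-prone. The standard library function re.escape() does it automatically. A small sketch, reusing test_string1 from the cell above; the price string is a made-up example:

```python
import re

test_string1 = '123abc456789abc123ABC.'

literal = "."
escaped = re.escape(literal)   # a backslash followed by the dot

# Unescaped, "." matches every character; escaped, only the literal dot
unescaped_hits = re.findall(literal, test_string1)
escaped_hits = re.findall(escaped, test_string1)

# re.escape() also handles strings with several meta characters at once
price = "cost: $5.00 (approx)"
hit = re.search(re.escape("$5.00"), price)

print(len(unescaped_hits), escaped_hits, hit.group())
```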

In [55]: # "^" anchors the pattern to the start of the string

test_string = '123abc456789abc123ABC'

pattern = re.compile(r"^123")
matches = pattern.finditer(test_string)

for match in matches:
    print(match)

<re.Match object; span=(0, 3), match='123'>

In [56]: pattern = re.compile(r"^abc")

matches = pattern.finditer(test_string)

for match in matches:
    print(match)

# no match: the string does not start with 'abc'

In [57]: pattern = re.compile(r"abc$")

matches = pattern.finditer(test_string)

for match in matches:
    print(match)

# no match: the string ends with 'ABC', and matching is case-sensitive

In [58]: pattern = re.compile(r"ABC$")

matches = pattern.finditer(test_string)

for match in matches:
    print(match)