Python Regular Expressions (re Module)
What is a Regular Expression?
❖ A regular expression (regex) is a pattern used to search, match, or extract specific
parts of a string.
❖ In Python, we use the re module to work with regular expressions.
Why Use Regex?
❖ To find a specific word in a sentence
❖ To check if an email/phone number is valid
❖ To extract data from files or webpages
❖ To filter, search, and count text
How to Use Regex in Python?
import re
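Use re.compile() together with finditer(), or call re.finditer() directly, for pattern matching
Function             Use
re.compile(pattern)  Compiles a regex pattern into a reusable pattern object
finditer()           Returns an iterator that yields one match object per match
group()              Returns the matched text
start()              Returns the start index of the match
end()                Returns the end index of the match (one past the last matched character)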
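A minimal sketch of how these pieces fit together, assuming a made-up sample string (`text` is hypothetical and not part of the notes): the pattern is compiled once, and each match object yielded by finditer() reports its text and its span.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = 'cat bat cat'

pattern = re.compile('cat')   # compile the pattern once, reuse it afterwards

# Collect (matched text, start index, end index) for every match.
matches = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]

for value, start, end in matches:
    print(f'{value} found from index {start} to {end}')
```

Note that end() is exclusive, so text[start:end] gives back the matched text.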
Examples (Basic Character Search)
Example 1: Search character 's'
msg = input('Enter a message: ')
ptn = re.compile('s')
result = ptn.finditer(msg)
for i in result:
    print(f's is present at: {i.start()} index')
Output:
s is present at: 2 index
s is present at: 9 index
Example 2: Search capital 'B'
msg = input('Enter a message: ')
ptn = re.compile('B')
result = ptn.finditer(msg)
for i in result:
    print(f'B is present at: {i.start()} index')
Output:
B is present at: 0 index
B is present at: 1 index
Example 3: Print match value using .group()
msg = input('Enter a message: ')
ptn = re.compile('B')
result = ptn.finditer(msg)
for i in result:
    print(f'{i.group()} is present at: {i.start()} index')
Output:
B is present at: 0 index
B is present at: 1 index
Examples using fixed messages (no user input)
Example 4:
msg = 'BBSR IS NOT BAD'
ptn = re.compile('BB')
result = ptn.finditer(msg)
for i in result:
    print(f'{i.group()} is present at index {i.start()}')
Output:
BB is present at index 0
Example 5:
msg = 'BBSR IS NOT BAD'
ptn = re.compile('BB')
result = ptn.finditer(msg)
for i in result:
    print(f'{i.group()} is started with: {i.start()} and ended with: {i.end()}')
Output:
BB is started with: 0 and ended with: 2
Example 6:
msg = 'BBSR IS BBSR'
ptn = re.compile('BB')
result = ptn.finditer(msg)
for i in result:
    print(f'{i.group()} is started with: {i.start()} and ended with: {i.end()}')
Output:
BB is started with: 0 and ended with: 2
BB is started with: 8 and ended with: 10
Example 7: Show all match objects
msg = 'BBSR IS BBSR AND BBSR'
ptn = re.compile('BB')
result = ptn.finditer(msg)
for i in result:
    print(i)
Output:
<re.Match object; span=(0, 2), match='BB'>
<re.Match object; span=(8, 10), match='BB'>
<re.Match object; span=(17, 19), match='BB'>
Example 8: Direct pattern in finditer()
msg = 'BBSR IS BBSR AND BBSR'
result = re.finditer('BB', msg)
for i in result:
    print(f'{i.group()} is started with: {i.start()} and ended with: {i.end()}')
Output:
BB is started with: 0 and ended with: 2
BB is started with: 8 and ended with: 10
BB is started with: 17 and ended with: 19
Character Classes and Custom Matching
Example 9: Match specific characters [SKP]
msg = 'Surendra Kumar Panda 1024 @#
[email protected]'
result = re.finditer('[SKP]', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output:
S at index 0
K at index 9
P at index 15
Example 10: Match [KPS] (order doesn’t matter)
result = re.finditer('[KPS]', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output: Same as Example 9
Example 11: Match uppercase A to Z
result = re.finditer('[A-Z]', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Example 12: Match lowercase, both cases, and digits
re.finditer('[a-z]', msg)
re.finditer('[A-Za-z]', msg)
re.finditer('[0-9]', msg)
Example 13: Match @ symbol, 0 or 1, or everything except 'S'
re.finditer('[@]', msg)
re.finditer('[@01]', msg)
re.finditer('[^S]', msg)
Example 14 : Negate uppercase, alphabets, alphanumeric
re.finditer('[^A-Z]', msg)
re.finditer('[^A-Za-z]', msg)
re.finditer('[^A-Za-z0-9]', msg)
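Examples 11 to 14 list patterns without showing what they return. A minimal sketch, assuming a short made-up string (`sample` is hypothetical and is not the msg used above), makes the effect of each character class visible:

```python
import re

# Hypothetical sample string, chosen only for illustration.
sample = 'Ab1 cD2@'

patterns = ['[A-Z]', '[a-z]', '[0-9]', '[^A-Za-z0-9]']

# Map each character class to the list of characters it matches.
results = {p: re.findall(p, sample) for p in patterns}

for p, found in results.items():
    print(f'{p:15} -> {found}')
```

The negated class [^A-Za-z0-9] picks up the space and the @, since neither is a letter or a digit.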
Counting Matches with Regex
Example 15: Count spaces in a string
msg = 'Surendra Kumar Panda 1024 @#
[email protected]'
result = re.finditer('[ ]', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
5
Example 16: Count digits in a string
result = re.finditer('[0-9]', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
7
Example 17: Count dots (.) in a string
result = re.finditer('[.]', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
1
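The counting loop in Examples 15 to 17 can be wrapped in a small helper. This is only a sketch: `count_matches` and `sample` are hypothetical names introduced here, not part of the re module.

```python
import re

def count_matches(pattern, text):
    """Return the number of non-overlapping matches of pattern in text."""
    return sum(1 for _ in re.finditer(pattern, text))

# Hypothetical sample string, chosen only for illustration.
sample = 'a.b c.d 42'

space_count = count_matches('[ ]', sample)
digit_count = count_matches('[0-9]', sample)
dot_count = len(re.findall('[.]', sample))   # findall() builds the full list first

print(space_count, digit_count, dot_count)
```

Summing over finditer() avoids building a list, while len(re.findall(...)) is shorter to write; both give the same count for patterns without groups.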
Using Special Character Classes
Example 18: Count spaces using \s
result = re.finditer(r'\s', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
5
Example 19: Match all non-space characters \S
result = re.finditer(r'\S', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output: (prints all characters that are not space, with their index)
Example 20: Match all non-alphanumeric \W
result = re.finditer(r'\W', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output: (prints spaces, @, #, . etc. with their index)
Example 21: Match all non-digits \D
result = re.finditer(r'\D', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output: (prints all non-digit characters with their index)
Example 22: Match all digits \d
result = re.finditer(r'\d', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output:
1 at index 21
0 at index 22
2 at index 23
4 at index 24
Example 23: Count all alphabets (both lowercase and uppercase)
result = re.finditer('[A-Za-z]', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
30
Example 24: Count digits using \d
msg = 'Surendra Kumar Panda 1024 @#
[email protected] 7539000111 Python 101'
result = re.finditer(r'\d', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
17
Example 25: Count dots using an escaped dot \.
result = re.finditer(r'\.', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
1
Example 26: Count whitespace using \s
result = re.finditer(r'\s', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
9
Example 27: Match all non-whitespace using \S
result = re.finditer(r'\S', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output: Prints all characters except spaces
Example 28: Match all non-alphanumeric using \W
result = re.finditer(r'\W', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output:
  at index 8
  at index 14
  at index 20
  at index 25
@ at index 26
# at index 27
. at index 42
at index 49
at index 56
Example 29: Match all non-digit using \D
result = re.finditer(r'\D', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output: Prints all characters except digits
Example 30: Match all digit using \d
result = re.finditer(r'\d', msg)
for i in result:
    print(f'{i.group()} at index {i.start()}')
Output:
1 at index 21
0 at index 22
2 at index 23
4 at index 24
7 at index 36
5 at index 37
3 at index 38
9 at index 39
0 at index 40
0 at index 41
0 at index 42
1 at index 43
1 at index 44
1 at index 45
1 at index 50
0 at index 51
1 at index 52
Total: 17 digits
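A short sketch of the predefined classes side by side, assuming a made-up string (`sample` is hypothetical). It also shows two details that are easy to miss: \w includes the underscore, and writing patterns as raw strings (r'...') keeps the backslash from being read as a Python string escape.

```python
import re

# Hypothetical sample string, chosen only for illustration.
sample = 'id_7 ok!'

digits = re.findall(r'\d', sample)       # digit characters
word_chars = re.findall(r'\w', sample)   # letters, digits and underscore
non_word = re.findall(r'\W', sample)     # everything that \w does not match
spaces = re.findall(r'\s', sample)       # whitespace characters

# A raw string keeps the backslash literally: r'\d' is the two characters \ and d.
raw_is_two_chars = (r'\d' == '\\d')

print(digits, word_chars, non_word, spaces, raw_is_two_chars)
```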
Example 31: Match beginning of the string using \A
result = re.finditer(r'\ASurendra', msg)
for i in result:
    print(f'{i.group()} starts at {i.start()}')
Output:
Surendra starts at 0
Example 32: Match if string ends with 101 using \Z
result = re.finditer(r'101\Z', msg)
for i in result:
    print(f'{i.group()} ends at {i.end()}')
Output:
101 ends at 60
Example 33: Match 'Sure' at beginning using \A
msg = 'Surendra Kumar Panda SURENDRA surendra@#$
[email protected]7539000111 Python 101'
result = re.finditer(r'\ASure', msg)
for i in result:
    print(f'{i.group()} is at index {i.start()} to {i.end()}')
Output:
Sure is at index 0 to 4
Example 34: Match a longer prefix using \A
result = re.finditer(r'\ASuren', msg)
for i in result:
    print(f'{i.group()} is at index {i.start()} to {i.end()}')
Output:
Suren is at index 0 to 5
Example 35: Try matching another word at beginning
result = re.finditer(r'\AKumar', msg)
for i in result:
    print(f'{i.group()} is at index {i.start()} to {i.end()}')
Output:
(No output, because 'Kumar' is not at beginning)
Example 36: Match '101' at end using \Z
result = re.finditer(r'101\Z', msg)
for i in result:
    print(f'{i.group()} is at index {i.start()} to {i.end()}')
Output:
101 is at index 76 to 79
Example 37: Try matching another word at start using \A
result = re.finditer(r'\AKumar', msg)
for i in result:
    print(f'{i.group()} is at index {i.start()} to {i.end()}')
Output:
(No output)
Example 38: Match a wrong start pattern
result = re.finditer(r'\AKumar', msg)
for i in result:
    print(f'{i.group()} is at index {i.start()} to {i.end()}')
Output:
(No output)
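\A and \Z are often confused with ^ and $. A minimal sketch of the difference, assuming a made-up two-line string (`text` is hypothetical): with the re.MULTILINE flag, ^ and $ also match at line boundaries, while \A and \Z still match only at the very start and very end of the whole string.

```python
import re

# Hypothetical two-line string, chosen only for illustration.
text = 'first line\nsecond line'

# \A matches only at the start of the whole string, even with MULTILINE.
a_hits = re.findall(r'\A\w+', text, re.MULTILINE)
# ^ with MULTILINE matches at the start of every line.
caret_hits = re.findall(r'^\w+', text, re.MULTILINE)

# \Z matches only at the end of the whole string, even with MULTILINE.
z_hits = re.findall(r'\w+\Z', text, re.MULTILINE)
# $ with MULTILINE matches at the end of every line.
dollar_hits = re.findall(r'\w+$', text, re.MULTILINE)

print(a_hits, caret_hits, z_hits, dollar_hits)
```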
Example 39: Match all occurrences of 'a'
msg = 'Surendra Kumar Panda SURENDRA surendra@#$
[email protected]7539000111 Python 101'
result = re.finditer('a', msg)
for i in result:
    print(f"{i.group()} at index {i.start()} to {i.end()}")
Output (matches in the visible part of msg; any 'a' inside the e-mail address would follow):
a at index 7 to 8
a at index 12 to 13
a at index 16 to 17
a at index 19 to 20
a at index 37 to 38
Example 40: Count all 'a' in a string
msg = 'ababaabaa'
result = re.finditer('a', msg)
count = 0
for i in result:
    count += 1
print(count)
Output:
6
Example 41: Use quantifier '+' to match one or more 'a'
msg = 'ababaabaa'
result = re.finditer('a+', msg)
count = 0
for i in result:
    print(f"{i.group()} at index {i.start()} to {i.end()}")
    count += 1
print(count)
Output:
a at index 0 to 1
a at index 2 to 3
aa at index 4 to 6
aa at index 7 to 9
4
Example 42: Match one or more 'a' in long string
msg = 'ababaab' + 'a' * 7 + 'ba'   # run of seven a's written out explicitly
result = re.finditer('a+', msg)
for i in result:
    print(f"{i.group()} at index {i.start()} to {i.end()}")
Output:
a at index 0 to 1
a at index 2 to 3
aa at index 4 to 6
aaaaaaa at index 7 to 14
a at index 15 to 16
Example 43: Use * quantifier to match zero or more 'a'
msg = 'aaabbbaaabbbcca'
result = re.finditer('a*', msg)
for i in result:
    print(f'{i.start()} {i.group()}')
Output:
0 aaa
3
4
5
6 aaa
9
10
11
12
13
14 a
15
Example 44: Match zero or more 'a' in empty string
msg = ''
result = re.finditer('a*', msg)
count = 0
for i in result:
    print(f'{i.start()} {i.group()}')
    count += 1
print(count)
Output:
0
1
Example 45: Match zero or one 'a' using '?'
msg = 'abaaabaa'
result = re.finditer('a?', msg)
count = 0
for i in result:
    print(f'{i.start()} {i.group()}')
    count += 1
print(count)
Output:
0 a
1
2 a
3 a
4 a
5
6 a
7 a
8
9
Example 46: Match zero or one 'a' in 'akasha'
msg = 'akasha'
result = re.finditer('a?', msg)
count = 0
for i in result:
    print(f'{i.start()} {i.group()}')
    count += 1
print(count)
Output:
0 a
1
2 a
3
4
5 a
6
7
Example 47: Match exactly three 'a's
msg = 'abaabaaabaaab'
result = re.finditer('a{3}', msg)
count = 0
for i in result:
    print(f'{i.start()} {i.group()}')
    count += 1
print(count)
Output:
5 aaa
9 aaa
2
Example 48: Match two to five 'a's
msg = 'abaabaaabaaab'
result = re.finditer('a{2,5}', msg)
count = 0
for i in result:
    print(f'{i.start()} {i.group()}')
    count += 1
print(count)
Output:
2 aa
5 aaa
9 aaa
3
Example 49: Match between 2 to 5 'a's in a long string
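# Runs of 1, 2, 3, 4, 5 and 10 a's, written piece by piece so the lengths are easy to check
msg = 'ab' + 'aab' + 'aaab' + 'aaaab' + 'aaaaab' + 'a' * 10
result = re.finditer('a{2,5}', msg)
count = 0
for i in result:
    print(f'{i.start()} {i.group()}')
    count += 1
print(count)
Output:
2 aa
5 aaa
9 aaaa
14 aaaaa
20 aaaaa
25 aaaaa
6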
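The quantifiers from Examples 41 to 49 can be compared on one short input. This is a minimal sketch assuming a made-up string (`text` is hypothetical) and Python 3.7 or later, where a zero-length match may directly follow a non-empty one.

```python
import re

# Hypothetical sample string: one 'b', three 'a's, one 'b'.
text = 'baaab'

quantifiers = ['a+', 'a*', 'a?', 'a{2}', 'a{2,3}']

# For each quantified pattern, collect the text of every match in order.
results = {q: [m.group() for m in re.finditer(q, text)] for q in quantifiers}

for q, found in results.items():
    print(f'{q:7} -> {found}')
```

The empty strings reported for a* and a? are zero-length matches at positions where no 'a' is present, including the position just past the last character.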
Example 50: Match Email ID
msg = 'My email is
[email protected] and alternate is
[email protected]'
result = re.finditer(r'[\w.-]+@[\w.-]+\.\w+', msg)
for i in result:
    print(f"Email found: {i.group()} at index {i.start()} to {i.end()}")
Output:
Email found:
[email protected] at index 13 to 41
Email found:
[email protected] at index 61 to 80
Example 51: Match Odisha Vehicle Number Plate
Format: OR02AB1234
msg = 'My car number is OR02AB1234 and bike number is OD05CD4321.'
result = re.finditer(r'\b(O[RD])[0-9]{2}[A-Z]{2}[0-9]{4}\b', msg)
for i in result:
    print(f"Vehicle number found: {i.group()} at index {i.start()} to {i.end()}")
Output:
Vehicle number found: OR02AB1234 at index 17 to 27
Vehicle number found: OD05CD4321 at index 47 to 57
Example 52: Match 10-digit Indian Mobile Numbers
msg = 'Call me at 9876543210 or 7539000111 for more info.'
result = re.finditer(r'\b[6-9][0-9]{9}\b', msg)
for i in result:
    print(f"Mobile number: {i.group()} at index {i.start()} to {i.end()}")
Output:
Mobile number: 9876543210 at index 11 to 21
Mobile number: 7539000111 at index 25 to 35
Example 53: Match PAN Card Number
Format: ABCDE1234F
msg = "My PAN is ABCDE1234F and Priyanka's PAN is FGHIJ5678Z."
result = re.finditer(r'\b[A-Z]{5}[0-9]{4}[A-Z]\b', msg)
for i in result:
    print(f"PAN found: {i.group()} at index {i.start()} to {i.end()}")
Output:
PAN found: ABCDE1234F at index 10 to 20
PAN found: FGHIJ5678Z at index 43 to 53
Example 54: Match Aadhaar Number
Format: 12-digit numeric (can have spaces)
msg = 'Aadhaar: 1234 5678 9012 and old ID was 234567890123'
result = re.finditer(r'\b(?:\d{4} \d{4} \d{4}|\d{12})\b', msg)
for i in result:
    print(f"Aadhaar found: {i.group()} at index {i.start()} to {i.end()}")
Output:
Aadhaar found: 1234 5678 9012 at index 9 to 23
Aadhaar found: 234567890123 at index 39 to 51
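Examples 52 to 54 search for these formats inside a longer sentence. To validate a single value instead, the same patterns can be used with fullmatch(), which requires the whole string to match. The sketch below is an assumption-laden illustration: `VALIDATORS` and `is_valid` are hypothetical names, and the check covers only the format, not whether the number was actually issued.

```python
import re

# Format-only patterns taken from Examples 52 to 54 (no \b needed with fullmatch).
VALIDATORS = {
    'mobile': re.compile(r'[6-9][0-9]{9}'),
    'pan': re.compile(r'[A-Z]{5}[0-9]{4}[A-Z]'),
    'aadhaar': re.compile(r'\d{4} \d{4} \d{4}|\d{12}'),
}

def is_valid(kind, value):
    """Return True if value matches the format registered under kind."""
    return VALIDATORS[kind].fullmatch(value) is not None

print(is_valid('mobile', '9876543210'))
print(is_valid('pan', 'ABCDE1234F'))
print(is_valid('aadhaar', '1234 5678 9012'))
```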
import re
# count=0
# pattern=re.compile('a')
# x=pattern.finditer('surendrakumarpandaabbcc')
# for i in x:
# count=count+1
# print('The number of occurrences', count)
# count=0
# pattern=re.compile('aa')
# x=pattern.finditer('surendrakumarpandaabbcc')
# for i in x:
# count=count+1
# print('The number of occurrences', count)
# count=0
# pattern=re.compile('abc')
# x=pattern.finditer('surendrakumarpandaabbcc')
# for i in x:
# count=count+1
# print('The number of occurrences', count)
# print start index
# count=0
# pattern=re.compile('a')
# x=pattern.finditer('surendrakumarpandaabbcc')
# for i in x:
# print(i.start())
#print end index
# count=0
# pattern=re.compile('a')
# x=pattern.finditer('surendrakumarpandaabbcc')
# for i in x:
# print(i.end())
# count=0
# pattern=re.compile('panda')
# x=pattern.finditer('surendrakumarpandaabbcc')
# for i in x:
# print(i.end())
# group() --> returns the matched text
# count=0
# pattern=re.compile('panda')
# x=pattern.finditer('surendrakumarpandaabbcc')
# for i in x:
# print(i.group())
# start () end () group()
# count=0
# pattern=re.compile('a')
# x=pattern.finditer('abcaabbcc')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# count=0
# pattern=re.compile('ss')
# x=pattern.finditer('aabbccddeeaabbccssddeesshhggee')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# same program written another way:
# we can pass the pattern directly as an argument to re.finditer()
# count=0
# x=re.finditer('a','aabbccddeeaabbccssddeesshhggee')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# character classes
# [abc]
# x=re.finditer('[abc]','aabbccabcabcaab')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# [^abc]
# x=re.finditer('[^abc]','aabbccabaa@ghb')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# # [a-z]
# x=re.finditer('[a-z]','HelloPython66')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# [A-Z]
# x=re.finditer('[A-Z]','HelloPython66')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
#[A-Za-z]
# x=re.finditer('[A-Za-z]','13Hello234hi')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
#[0-9]
# x=re.finditer('[0-9]','13Hello234hi')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# [4-8]
# x=re.finditer('[4-8]','1756Hello234hi')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# # alphanumeric
# #[A-Za-z0-9]
# x=re.finditer('[A-Za-z0-9]','13Hello234hi')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# alphanumeric
#[A-Za-z0-9]
# x=re.finditer('[A-Za-z0-9]','@@789')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# i dont want alphanumeric
# means i want special symbols
# x=re.finditer('[^A-Za-z0-9]','@@789abc')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# other predefined character classes
# \s (lowercase s) --> any whitespace character
# x=re.finditer(r'\s','@surendra kumar panda#')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# \S (uppercase S) --> any character except whitespace
# x=re.finditer(r'\S','@surenrda kumar panda#')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# \d --> any digit 0-9
# x=re.finditer(r'\d','1526surendra90')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# \D --> any character except a digit
# x=re.finditer(r'\D','125RAHUL45@@')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# \w --> any word character: [A-Za-z0-9_] for ASCII text (note the underscore)
# x=re.finditer(r'\w','@surendra kumar 90 panda#')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# \W --> any non-word character (the opposite of \w)
# x=re.finditer(r'\W','@surendra kumar 90 panda#')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# . --> any character (letters, digits, special symbols) except a newline
# x=re.finditer('.','@surendra kumar 90 panda#')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# Quantifiers --> number of occurrences
# a  (exactly one a)
# a+ (at least one a, possibly more)
# x=re.finditer('a+','abcaabbccaaabbca')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# a* (any number of a's, including zero)
# x=re.finditer('a*','abcaabbccaa')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# a? (at most one a, i.e. either 0 or 1)
# x=re.finditer('a?','abcaabbccaa')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# a{m} --> Exactly m number of a's
# x=re.finditer('a{2}','abcaabbccaa')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# x=re.finditer('a{3}','abcaabbccaaa')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# a{m,n} --> at least m and at most n a's
# x=re.finditer('a{2,3}','abcaabbccaaa')
# for i in x:
# print(i.start())
# print(i.end())
# print(i.group())
# other important functions in the re module
# match() --> matches only at the beginning of the string
# s=input('Enter pattern')
# x=re.match(s, 'surendrapanda')
# if x is not None:
#     print('Success')
# else:
#     print('Fail')
# s=input('Enter pattern')
# x=re.match(s, 'surendrapanda')
# if x is not None:
#     print('Success')
#     print(x.start())
#     print(x.end())
#     print(x.group())
# else:
#     print('Fail')
# fullmatch() --> the whole string must match the pattern
# s=input('Enter pattern')
# x=re.fullmatch(s, 'surendrapanda')
# if x is not None:
#     print('Success')
#     print(x.start())
#     print(x.end())
#     print(x.group())
# else:
#     print('Fail')
# search() --> finds the first match anywhere in the string
# s=input('Enter pattern')
# x=re.search(s, 'surendrapanda')
# if x is not None:
#     print('Success')
#     print(x.start())
#     print(x.end())
#     print(x.group())
# else:
#     print('Fail')
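The difference between match(), fullmatch() and search() is easiest to see by running all three on the same target. A minimal sketch using the string from the notes; the helper name `try_all` is hypothetical.

```python
import re

target = 'surendrapanda'

def try_all(pattern):
    """Report which of match(), fullmatch() and search() succeed for pattern."""
    return {
        'match': re.match(pattern, target) is not None,          # start of string only
        'fullmatch': re.fullmatch(pattern, target) is not None,  # entire string
        'search': re.search(pattern, target) is not None,        # anywhere in the string
    }

for p in ['surendra', 'panda', 'surendrapanda']:
    print(p, try_all(p))
```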
# findall()
# returns a list containing every match
# s=input('Enter pattern')
# x=re.findall(s, 'abc347@dv78678')
# for i in x:
#     print(i)
# sub() --> substitution / replacement
# s=input('Enter pattern')
# replace=input('Enter replacement string')
# x=re.sub(s,replace ,'a1b2c3d4')
# print(x)
# subn() --> returns a tuple: (new string, number of replacements)
# s=input('Enter pattern')
# replace=input('Enter replacement string')
# x=re.subn(s,replace ,'a1b2c3d4')
# print(x) # e.g. ('a*b*c*d*', 4) for pattern \d and replacement *
# s=input('Enter pattern')
# replace=input('Enter replacement string')
# x=re.subn(s,replace ,'a1b2c3d4')
# print('Replaced string', x[0])
# print('Number of replacements', x[1])
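A runnable sketch of sub() and subn() without user input, assuming the pattern \d and the replacement * as a fixed illustration (the notes read both from input()):

```python
import re

text = 'a1b2c3d4'

# Replace every digit with '*'.
replaced = re.sub(r'\d', '*', text)

# The optional count argument limits how many replacements are made.
first_two = re.sub(r'\d', '*', text, count=2)

# subn() returns the new string together with the number of replacements.
new_text, n = re.subn(r'\d', '*', text)

print(replaced, first_two, new_text, n)
```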
# split() --> the examples below use str.split(); re.split() splits on a regex pattern instead
# name='surendra kumar panda'
# a=name.split()
# print(a)
# for i in a:
#     print(i)
# name='hello.hi.priyanka.sanu'
# a=name.split('.')
# for i in a:
# print(i)
# name='1000/10/50/60/80'
# a=name.split('/')
# for i in a:
# print(i)
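The split examples above call str.split(), which accepts only a fixed separator. A minimal sketch of re.split(), which splits on a pattern; the string `record` is made up for illustration.

```python
import re

# Hypothetical string mixing three different separators.
record = 'hello.hi/priyanka sanu'

# Split on one or more dots, slashes or whitespace characters.
parts = re.split(r'[./\s]+', record)

# maxsplit limits the number of splits; the rest of the string stays intact.
first_split = re.split(r'[./\s]+', record, maxsplit=1)

print(parts)
print(first_split)
```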
# text='Hello Hi Priyanka'   # avoid naming a variable str, which shadows the built-in
# data=re.search("^Hello", text)
# if data is not None:
#     print('Started with Hello')
# else:
#     print('Not started with Hello')
# check whether a mobile number starts with 9, 8, 7 or 6
# ^ means "starts with"
# mobile=input('Enter mobile number')
# data=re.search('^[9876]', mobile)
# if data is not None:
#     print('Started with 9 or 8 or 7 or 6')
# else:
#     print('Not started with 9 or 8 or 7 or 6')
# check whether the message ends with 'good night' (case-insensitive)
# msg=input('Enter a msg ')
# data=re.search('good night$',msg,re.IGNORECASE)
# if data is not None:
#     print('Ends with good night')
# else:
#     print('Does not end with good night')
# My own variable rules
# var=input('Enter variable names')
# x=re.fullmatch("[a-z_][A-Za-z0-9_]*", var)
# if x!=None:
# print('Valid variable')
# else:
# print('Invalid variable')
# var=input('Enter variable names')
# x=re.fullmatch("[a-z_][A-Za-z0-9_]", var)
# if x!=None:
# print('Valid variable')
# else:
# print('Invalid variable')
# var=input('Enter variable names')
# x=re.fullmatch("[a-z_][A-Za-z0-9_]+", var)
# if x!=None:
# print('Valid variable')
# else:
# print('Invalid variable')
# 10 digits mobile
# exact 10 digits
# start with 6-9
# var=input('Enter Mobile Numbers ')
# x=re.fullmatch("[6-9][0-9]{9}", var)
# if x!=None:
# print('Valid Mobile Number')
# else:
# print('Invalid Mobile Number')
# 10 or 11 or 12 digits
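# copy every mobile number found in abc.txt into mobilenumber.txt
# with open("abc.txt", 'r') as f1, open("mobilenumber.txt", 'w') as f2:
#     for line in f1:
#         numbers = re.findall(r'[6-9]\d{9}', line)
#         for number in numbers:
#             f2.write(number + "\n")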
# Regular Expression Complete
# Web Scraping
import urllib.request
# data=urllib.request.urlopen("https://codedais.com/")
# text=data.read().decode("utf-8")   # decode bytes to a string before matching
# t=re.findall(r"<body.*?>.*?</body>", text, re.IGNORECASE | re.DOTALL)
# print(t)
# extract PIN codes from the redbus.in contact page
import urllib.request
import re
# Step 1: Fetch the page content
data = urllib.request.urlopen("https://www.redbus.in/info/contactus")
text = data.read().decode("utf-8") # decode bytes to string
# Step 2: Extract body content (optional but kept)
body_match = re.search(r"<body.*?>.*?</body>", text, re.IGNORECASE | re.DOTALL)
if body_match:
    body_text = body_match.group()
else:
    body_text = text # fallback to full text if <body> tag not found
# Step 3: Use Indian PIN code pattern (valid: 6 digits, not starting with 0)
pin_pattern = r"\b[1-9][0-9]{5}\b" # \b = word boundary to avoid partial matches
# Step 4: Find all valid PIN codes
result = re.finditer(pin_pattern, body_text)
# Step 5: Print matched PINs with positions
for i in result:
    print(f"{i.group()} found at position {i.start()}–{i.end()}")
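The script above depends on a live web page, so its output changes whenever the page does. The same two steps (isolate the body, then find six-digit PIN codes) can be tried offline. This is a sketch under stated assumptions: the `html` string and the PIN codes in it are invented for illustration and are not taken from any real page.

```python
import re

# Hypothetical page content; no network access is needed for this sketch.
html = (
    '<html><BODY class="main">'
    '<p>Head office PIN: 560001, branch PIN: 110001.</p>'
    '<p>Phone: 08012345678</p>'
    '</BODY></html>'
)

# Step 1: capture only what lies between the body tags.
body_match = re.search(r'<body.*?>(.*?)</body>', html, re.IGNORECASE | re.DOTALL)
body_text = body_match.group(1) if body_match else html

# Step 2: six digits, not starting with 0, bounded on both sides.
pins = re.findall(r'\b[1-9][0-9]{5}\b', body_text)

print(pins)
```

The \b word boundaries keep the pattern from picking six digits out of the longer phone number.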