PP - Chapter - 4

Regular expressions (regex) are patterns used to match character combinations in strings. Python's re module provides regex capabilities through functions like re.match(), re.search(), re.findall(), and re.split(). It uses special meta characters like ., ^, $, *, ?, etc. to match patterns in a string. This module allows validating formats like emails and passwords as well as parsing data using regex patterns.

Uploaded by

akash chandankar

4. Python Regular Expression

4.1. Powerful pattern matching and searching

4.2. Power of pattern searching using regex in python

4.3. Real time parsing of data using regex

4.4. Password, email, URL validation using regular Expression

4.5. Pattern finding programs using regular expression

DR. MILIND GODASE, SIOM, PUNE – 41 1

Regular Expressions in Python
The term Regular Expression is popularly shortened as regex. A regex is a sequence of
characters that defines a search pattern, used mainly for performing find and replace
operations in search engines and text processors.
Python offers regex capabilities through the “re module” bundled as a part of the standard
library.

Raw strings
Most functions in Python’s re module take the pattern as a string, and it is conventional to
write that pattern as a raw string. A string literal prefixed with 'r' or 'R' is a raw string.
Example: Raw String
>>> rawstr = r'Hello! How are you?'
>>> print(rawstr)
Hello! How are you?
The difference between a normal string and a raw string is that escape sequences (such as
\n, \t etc.) in a normal string literal are translated into the characters they stand for,
while those in a raw string are left exactly as written.
Example: String vs Raw String
str1 = "Hello!\nHow are you?"
print("normal string:", str1)
str2 = r"Hello!\nHow are you?"
print("raw string:",str2)
Output
normal string: Hello!
How are you?
raw string: Hello!\nHow are you?
In the above example, \n inside str1 (normal string) is translated into a newline, so the rest
of the text is printed on the next line. In str2, a raw string, it is printed literally as \n.
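
Raw strings matter most when a pattern contains a backslash. The short sketch below is an added illustration, not part of the original notes; the sample text and variable names are assumptions. It shows that "\b" in a normal string is the backspace character, while r"\b" reaches the re module as the word-boundary sequence.

```python
import re

text = "cat concat cat"

# In a normal string literal, "\b" is the single backspace character.
normal_pattern = "\bcat\b"
# In a raw string the backslash and the letter b are kept as two characters,
# which the re module interprets as a word boundary.
raw_pattern = r"\bcat\b"

# A normal "\n" is one character; a raw r"\n" is two characters.
print(len("\n"), len(r"\n"))

normal_hits = re.findall(normal_pattern, text)  # looks for backspace characters
raw_hits = re.findall(raw_pattern, text)        # looks for the whole word "cat"
print(normal_hits)
print(raw_hits)
```

Only the raw-string pattern finds the two standalone occurrences of "cat"; the normal-string pattern finds nothing because the text contains no backspace characters.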

Meta characters
Some characters carry a special meaning when they appear as part of a pattern-matching string.
They are similar to the wildcard characters * and ? used in Windows (DOS) or Linux shell commands.
Python’s re module uses the following characters as meta characters:

. ^ $ * + ? { } [ ] \ | ( )
Metacharacters are characters with a special meaning:

Character   Description                                      Example
[]          A set of characters                              "[a-m]"
\           Signals a special sequence (can also be used     "\d"
            to escape special characters)
.           Any character (except newline character)         "he..o"
^           Starts with                                      "^hello"
$           Ends with                                        "world$"
*           Zero or more occurrences                         "aix*"
+           One or more occurrences                          "aix+"
{}          Exactly the specified number of occurrences      "al{2}"
|           Either or                                        "falls|stays"
()          Capture and group

When a set of characters is placed inside square brackets [], the pattern matches any single
character from that set. Individual characters or a range of characters can be listed inside
the square brackets. For example:
Pattern   Description
[abc]     matches any one of the characters a, b, or c
[a-c]     uses a range to express the same set of characters
[a-z]     matches any one lowercase letter
[0-9]     matches any one digit
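
The four sets listed above can be tried out with re.findall(). This is a minimal sketch added for illustration; the sample text and the variable names are assumptions, not taken from the notes.

```python
import re

text = "cab 2024 Zebra"

# Each set matches exactly one character at a time, so findall()
# returns a list of single characters.
abc_listed = re.findall(r"[abc]", text)   # a, b or c listed individually
abc_range = re.findall(r"[a-c]", text)    # the same set written as a range
lowercase = re.findall(r"[a-z]", text)    # any lowercase letter
digits = re.findall(r"[0-9]", text)       # any digit

print(abc_listed)
print(abc_range)
print(lowercase)
print(digits)
```

The first two lists are identical, which confirms that [abc] and [a-c] describe the same set. The capital Z is not matched by [a-z].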

The following special sequences and symbols carry specific meanings.

Pattern   Description
\d        Matches any decimal digit; for ASCII text this is equivalent to the class [0-9].
\D        Matches any non-digit character.
\s        Matches any whitespace character.
\S        Matches any non-whitespace character.
\w        Matches any alphanumeric character or the underscore.
\W        Matches any character that is not alphanumeric or the underscore.
.         Matches any single character except the newline '\n'.
?         Matches 0 or 1 occurrence of the pattern to its left.
+         Matches 1 or more occurrences of the pattern to its left.
*         Matches 0 or more occurrences of the pattern to its left.
\b        Matches the boundary between a word and a non-word character. \B is the opposite of \b.
[..]      Matches any single character listed inside the square brackets.
\         Escapes a character that has a special meaning, e.g. \. to match a period or \+ to
          match a plus sign.
{n,m}     Matches at least n and at most m occurrences of the preceding pattern.
a|b       Matches either a or b.
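
A few of the sequences and quantifiers from the table can be seen at work in the sketch below. It is an added illustration; the sample sentence and all variable names are assumptions.

```python
import re

text = "Order 66 shipped on 2024-05-17, ref A_1."

digit_runs = re.findall(r"\d+", text)             # one or more digits
word_runs = re.findall(r"\w+", text)              # letters, digits and underscore
four_digits = re.findall(r"\b\d{4}\b", text)      # exactly four digits as a whole word
two_to_four = re.findall(r"\b\d{2,4}\b", text)    # between two and four digits
either = re.findall(r"shipped|ref", text)         # one alternative or the other
optional_u = re.findall(r"colou?r", "color colour colr")  # the u is optional

print(digit_runs)
print(word_runs)
print(four_digits)
print(two_to_four)
print(either)
print(optional_u)
```

Note that "A_1" is returned as a single word by \w+ because the underscore counts as a word character, and for the same reason the lone 1 has no word boundary in front of it.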

re.match() function
This function in the re module checks whether the specified pattern is present at the
beginning of the given string.
re.match(pattern, string)
The function returns None if the pattern is not found at the beginning of the string, and a
match object if it is.
Example: re.match()
from re import match

mystr = "Welcome to TutorialsTeacher"

obj1 = match("We", mystr)
print(obj1)
obj2 = match("teacher", mystr)
print(obj2)
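Output
<re.Match object; span=(0, 2), match='We'>
None
The second call returns None because "teacher" does not occur at the beginning of the string.
The match object has start() and end() methods that return the positions where the match
begins and ends.
Example:
>>> print("start:", obj1.start(), "end:", obj1.end())
Output
start: 0 end: 2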
The following example uses a range of characters to check whether a string starts with 'W'
followed by a lowercase letter.
Example: match()

from re import match

strings = ["Welcome to TutorialsTeacher", "weather forecast", "Winston Churchill",
           "W.G.Grace", "Wonders of India", "Water park"]

for string in strings:
    obj = match("W[a-z]", string)
    print(obj)
Output
<re.Match object; span=(0, 2), match='We'>
None
<re.Match object; span=(0, 2), match='Wi'>
None
<re.Match object; span=(0, 2), match='Wo'>
<re.Match object; span=(0, 2), match='Wa'>

re.search() function
The re.search() function searches for the specified pattern anywhere in the given string and
stops at the first occurrence.
Example: re.search()

from re import search

string = "Try to earn while you learn"

obj = search("earn", string)

print(obj)
print(obj.start(), obj.end(), obj.group())
Output
<re.Match object; span=(7, 11), match='earn'>
7 11 earn
This function also returns a match object with start() and end() methods. Its group() method
returns the text that the pattern matched.
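
The difference between re.match() and re.search() is easiest to see when both are applied to the same string. The following is a small added sketch that reuses the sentence from the example above; the variable names are assumptions.

```python
import re

text = "Try to earn while you learn"

# match() only looks at the beginning of the string.
at_start = re.match("earn", text)
# search() scans the whole string and stops at the first hit.
anywhere = re.search("earn", text)
# Anchoring the pattern with ^ makes search() behave like match() here.
anchored = re.search("^earn", text)

print(at_start)
print(anywhere.span())
print(anchored)
```

Because both functions can return None, the result should be checked before calling start(), end() or group() on it.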

re.findall() Function
In contrast to the search() function, findall() continues to search for the pattern until the
target string is exhausted. The function returns a list of all non-overlapping occurrences.
Example: re.findall()

from re import findall

string = "Try to earn while you learn"

obj = findall("earn", string)

print(obj)
Output
['earn', 'earn']

This function can be used to get the list of words in a sentence. We shall use the \w* pattern
for the purpose. Because \w* also matches zero characters, the list contains empty strings
between the words. We also check which of the words do not have any vowels in them.
Example: re.findall()
from re import findall, search

obj = findall(r"\w*", "Fly in the sky.")
print(obj)

for word in obj:
    # skip the empty strings matched between words
    if word != '' and search(r"[aeiou]", word) is None:
        print(word)
Output
['Fly', '', 'in', '', 'the', '', 'sky', '', '']
Fly
sky
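
The empty strings in the list come from the * quantifier, which is allowed to match nothing. A possible variation, sketched here as an added illustration with assumed variable names, is to use \w+ so that only runs of at least one word character are returned.

```python
import re

sentence = "Fly in the sky."

# \w+ needs at least one word character, so no empty strings are produced.
words = re.findall(r"\w+", sentence)

# Keep the words in which no lowercase vowel can be found.
no_vowels = [w for w in words if re.search(r"[aeiou]", w) is None]

print(words)
print(no_vowels)
```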

re.finditer() function
The re.finditer() function returns an iterator that yields a match object for every match in
the target string. For each match, the start and end positions can be obtained with the
span() method.
Example: re.finditer()

from re import finditer

string = "Try to earn while you learn"

it = finditer("earn", string)
for match in it:
    print(match.span())
Output
(7, 11)
(23, 27)
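
Since finditer() yields full match objects, the matched text and its position can be collected together. The sketch below is an added illustration; the pattern \w*earn and the variable names are assumptions chosen to pick up the whole word that ends in "earn".

```python
import re

text = "Try to earn while you learn"

# Each item is a match object, so group(), start() and end() are all available.
positions = [(m.group(), m.start(), m.end())
             for m in re.finditer(r"\w*earn", text)]

for word, start, end in positions:
    print(word, start, end)
```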

re.split() function
The re.split() function works like the split() method of the str object in Python, except that
the separator is a regex pattern: the string is split at every match of the pattern. In the
example below the pattern is a single space. In the earlier findall() example for getting all
words, the list also contained empty strings between the words. The split() function in the
re module avoids that.
Example: re.split()
from re import split

string = "Flat is better than nested. Sparse is better than dense."

words = split(r' ', string)
print(words)
Output
['Flat', 'is', 'better', 'than', 'nested.', 'Sparse', 'is', 'better', 'than', 'dense.']
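
Because the separator is a pattern, re.split() can handle runs of whitespace or several separator characters at once. The sketch below is an added illustration; the input (with a deliberate double space) and the variable names are assumptions.

```python
import re

string = "Flat is better than nested.  Sparse is better than dense."

# A single-space pattern leaves an empty string where two spaces meet.
by_space = re.split(r" ", string)

# \s+ treats any run of whitespace as one separator.
by_whitespace = re.split(r"\s+", string)

# A set of separators: whitespace or a period. The trailing period leaves
# one empty string at the end, which is filtered out.
words_only = [w for w in re.split(r"[\s.]+", string) if w]

print(by_space)
print(by_whitespace)
print(words_only)
```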

re.compile() Function
The re.compile() function returns a pattern object which can be used repeatedly in different
regex functions. In the following example, the character set [aeiou] is compiled into a pattern
object, and its match() method is applied to each word to check whether the word starts with
a lowercase vowel.

Example: re.compile()

import re

pattern = re.compile(r'[aeiou]')
string = "Flat is better than nested. Sparse is better than dense."
words = re.split(r' ', string)
for word in words:
    print(word, pattern.match(word))
Output
Flat None
is <re.Match object; span=(0, 1), match='i'>

better None
than None
nested. None
Sparse None
is <re.Match object; span=(0, 1), match='i'>
better None
than None
dense. None
The same pattern object can be reused to search for a vowel anywhere in each word, as shown below.
Example: search()
for word in words:
    print(word, pattern.search(word))
Output
Flat <re.Match object; span=(2, 3), match='a'>
is <re.Match object; span=(0, 1), match='i'>
better <re.Match object; span=(1, 2), match='e'>
than <re.Match object; span=(2, 3), match='a'>
nested. <re.Match object; span=(1, 2), match='e'>
Sparse <re.Match object; span=(2, 3), match='a'>
is <re.Match object; span=(0, 1), match='i'>
better <re.Match object; span=(1, 2), match='e'>
than <re.Match object; span=(2, 3), match='a'>
dense. <re.Match object; span=(1, 2), match='e'>
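
A compiled pattern can also carry flags. The sketch below is an added illustration with an assumed word list and assumed variable names; it compiles the same vowel set with re.IGNORECASE so that capital vowels are recognised too.

```python
import re

# re.IGNORECASE makes [aeiou] match A, E, I, O and U as well.
vowel = re.compile(r"[aeiou]", re.IGNORECASE)

words = ["Flat", "is", "Easy", "apple", "Sky"]

# match() tests only the first character of each word.
starts_with_vowel = [w for w in words if vowel.match(w)]
# search() tests the whole word.
contains_vowel = [w for w in words if vowel.search(w)]

print(starts_with_vowel)
print(contains_vowel)
```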

The sub() Function

The sub() function replaces the matches with the text of your choice:
Example
Replace every white-space character with the number 9:
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)
Output
The9rain9in9Spain
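
Three common variations of re.sub() are sketched below as an added illustration; the sample strings and variable names are assumptions. They show collapsing runs of whitespace, limiting the number of replacements with the count argument, and reusing captured groups in the replacement text.

```python
import re

# Collapse every run of whitespace (spaces, tabs) into a single space.
collapsed = re.sub(r"\s+", " ", "The   rain \t in  Spain")

# Replace only the first match by passing count=1.
first_only = re.sub(r"\s", "9", "The rain in Spain", count=1)

# \1 and \2 in the replacement refer to the first and second group.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "rain Spain")

print(collapsed)
print(first_only)
print(swapped)
```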

4.4. Password, email, URL validation using re

Password Validation using re
First we create a regular expression that encodes the conditions a valid password must satisfy.
Then we test the given password against it using the search function of re. In the example
below, the complexity requirement is at least one lowercase letter, one capital letter, one
digit and one special character from the set @$!%*#?&. The length of the password must be
between 8 and 18 characters.
import re

pswd = input("Enter a Valid password : ")

reg = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!#%*?&]{8,18}$"

# compiling regex
match_re = re.compile(reg)

# searching regex
res = match_re.search(pswd)

# validating conditions
if res:
    print("Valid Password")
else:
    print("Invalid Password")
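
To see which rule rejects which password, the same pattern can be wrapped in a function and run against a few samples without keyboard input. This is an added sketch; the function name is_valid_password and the sample passwords are assumptions made for illustration.

```python
import re

# Same pattern as in the example above, written as a raw string.
PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!#%*?&]{8,18}$"
)

def is_valid_password(pswd):
    """Return True if pswd satisfies every condition in PASSWORD_RE."""
    return PASSWORD_RE.search(pswd) is not None

samples = [
    "Passw0rd!",     # meets every condition
    "password1!",    # no capital letter
    "Short1!",       # only 7 characters
    "NoSpecial123",  # no special character
    "Has Space1!",   # a space is not in the allowed set
]
results = {s: is_valid_password(s) for s in samples}

for s, ok in results.items():
    print(s, "->", "Valid Password" if ok else "Invalid Password")
```

Each (?=...) group is a lookahead that checks one condition without consuming characters, and the final set with {8,18} enforces both the allowed characters and the length.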

URL Validation using re
The following Python code passes a compiled regular expression and the input string to the
search() method and checks whether the input URL matches the pattern. If the pattern matches,
the URL is valid; otherwise the URL is invalid.
import re

def check_url(ip_url):
    # Regular expression for URL
    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

    # input() returns an empty string, not None, when nothing is typed
    if not ip_url:
        print("Input string is empty !!!")
        return

    if regex.search(ip_url):
        print("Input URL is valid !!!")
    else:
        print("Input URL is invalid !!!")

ch = 'y'
while ch == 'y':
    ip_url = input("Enter the URL string: ")
    check_url(ip_url)
    ch = input("Do you want to continue? (y or n): ")

Output:
Enter the URL string: http://localhost:8080
Input URL is valid !!!
Do you want to continue? (y or n): y
Enter the URL string: http://172.16.16.16
Input URL is valid !!!
Do you want to continue? (y or n): n
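
For checking several URLs without typing them in, the same pattern can be compiled once and wrapped in a function that returns True or False. This is an added sketch; the function name is_valid_url and the sample URLs other than the two shown in the output above are assumptions.

```python
import re

# Same pattern as in check_url(), compiled once at module level.
URL_RE = re.compile(
    r'^(?:http|ftp)s?://'                        # scheme
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'                                # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'       # ...or ip
    r'(?::\d+)?'                                 # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)

def is_valid_url(url):
    """Return True if url is non-empty and matches URL_RE."""
    return bool(url) and URL_RE.search(url) is not None

samples = [
    "http://localhost:8080",
    "http://172.16.16.16",
    "https://example.com/path?q=1",
    "example.com",   # no scheme
    "",              # empty input
]
results = {u: is_valid_url(u) for u in samples}

for u, ok in results.items():
    print(repr(u), "->", "valid" if ok else "invalid")
```

Returning a boolean instead of printing keeps the check reusable; the caller decides what message to show.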

Email Validation using re
Given a string, write a Python program to check if the string is a valid email address or not.
An email is a string (a subset of ASCII characters) separated into two parts by @ symbol, a
“personal_info” and a domain, that is personal_info@domain.
Valid Domain Name: A domain name consists of a minimum of two and a maximum of 63 characters.
All letters from a to z, all digits from 0 to 9 and the hyphen (-) are allowed.
A domain name must not have a hyphen (-) in both the third and fourth positions at the same
time.
Examples:
Input: [email protected]
Output: Valid Email
Input: [email protected]
Output: Valid Email
Input: ankitrai326.com
Output: Invalid Email
In this program, we use the search() method of the re module, described below.
re.search(): This method returns either None (if the pattern doesn’t match) or a re.Match
object that contains information about the matching part of the string. It stops after the
first match, so it is better suited to testing a string against a regular expression than to
extracting data.

# Python program to validate an Email

# import re module
import re

# Make a regular expression for validating an Email.
regex = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"

# Define a function for validating an Email.
def check(email):
    if re.search(regex, email):
        print("Valid Email")
    else:
        print("Invalid Email")

if __name__ == '__main__':
    # Enter the email
    email = "[email protected]"
    check(email)
    email = "[email protected]"
    check(email)
    email = "ankitrai326.com"
    check(email)

Output:
Valid Email
Valid Email
Invalid Email
List of Valid Email Addresses
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
email@[123.123.123.123]
"email"@example.com
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
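
The pattern used in the program is a simple one and does not accept every form in the list above. The sketch below is an added illustration; the function name is_valid_email and the example.com and example.org addresses are assumptions. It shows that the quoted local part and the bracketed IP address from the list are rejected by this pattern.

```python
import re

# Same pattern as in the program above, compiled once.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

def is_valid_email(email):
    """Return True if email matches the simple pattern EMAIL_RE."""
    return EMAIL_RE.search(email) is not None

samples = [
    "user@example.com",                  # hypothetical address
    "first.last+tag@mail.example.org",   # hypothetical address
    "ankitrai326.com",                   # no @ symbol
    "email@[123.123.123.123]",           # bracketed IP address
    '"email"@example.com',               # quoted local part
]
results = {e: is_valid_email(e) for e in samples}

for e, ok in results.items():
    print(e, "->", "Valid Email" if ok else "Invalid Email")
```

If addresses of the last two kinds must be accepted, the pattern would need to be extended accordingly.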

4.5. Pattern finding programs using regular expression
