Regular Expressions: Python For Everybody

Uploaded by Anusha Vedanabhatla

Regular Expressions

Chapter 11

Python for Everybody

www.py4e.com
Regular Expressions
In computing, a regular expression, also referred to as
“regex” or “regexp”, provides a concise and flexible
means for matching strings of text, such as particular
characters, words, or patterns of characters. A regular
expression is written in a formal language that can be
interpreted by a regular expression processor.

http://en.wikipedia.org/wiki/Regular_expression

Regular Expressions
Really clever “wild card” expressions for matching and parsing strings
Really smart “Find” or “Search”

http://en.wikipedia.org/wiki/Regular_expression

Understanding Regular Expressions
• Very powerful and quite cryptic
• Fun once you understand them
• Regular expressions are a language unto themselves
• A language of “marker characters” - programming with characters
• It is kind of an “old school” language - compact

http://xkcd.com/208/

Regular Expression Quick Guide
^        Matches the beginning of a line
$        Matches the end of the line
.        Matches any character (except a newline)
\s       Matches whitespace
\S       Matches any non-whitespace character
*        Repeats the preceding character zero or more times
+        Repeats the preceding character one or more times
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
{m,n}    Repeats the preceding character at least m and at most n times
[a-z]    Matches any lowercase letter
[A-Z]    Matches any uppercase letter
[0-9]    Matches any digit
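A quick way to get a feel for the guide is to try each entry on a tiny string. The sketch below does that with re.findall(); all of the sample strings are made up for illustration. One Python detail worth knowing: without the re.MULTILINE flag, ^ and $ match at the start and end of the whole string rather than of each line, which makes no difference when, as in this chapter, you test one line at a time.

```python
import re

# A two-line sample string, made up for illustration
text = 'From: alice 2008\nSubject: xyz'

starts = re.findall('^From:', text)             # ^ : start of the string
ends = re.findall('xyz$', text)                 # $ : end of the string
dots = re.findall('a.b', 'a\nb axb')            # . : any character except newline
words = re.findall(r'\S+', 'a bc  def')         # \S+ : runs of non-whitespace
vowels = re.findall('[aeiou]', 'regex')         # [aeiou] : one vowel per match
others = re.findall('[^XYZ]', 'XaYb')           # [^XYZ] : one char not in the set
counts = re.findall('[0-9]{2,3}', '7 42 2008')  # {2,3} : 2 to 3 digits per match

for name, result in [('^', starts), ('$', ends), ('.', dots), (r'\S+', words),
                     ('[aeiou]', vowels), ('[^XYZ]', others), ('{2,3}', counts)]:
    print(name, result)
```

Note how {2,3} splits '2008' into '200' and a leftover '8' that is too short to match on its own.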
The Regular Expression Module
• Before you can use regular expressions in your program, you must import the library using “import re”
• You can use re.search() to see if a string matches a regular expression, similar to using the find() method for strings
• You can use re.findall() to extract portions of a string that match your regular expression, similar to a combination of find() and slicing: var[5:10]
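Here is a minimal side-by-side sketch of those two analogies, using a made-up sample line. The slice positions 15:18 were worked out by hand for this particular string, which is exactly the bookkeeping that re.findall() saves you.

```python
import re

line = 'Subject: regex 101'   # made-up sample line

# find() returns an index (or -1 if the text is absent)
pos = line.find('regex')

# re.search() returns a match object (or None if the pattern is absent)
m = re.search('regex', line)

# Slicing needs positions you already know
sliced = line[15:18]

# re.findall() locates and extracts in one step
digits = re.findall('[0-9]+', line)

print(pos, m.start(), sliced, digits)   # 9 9 101 ['101']
```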
Using re.search() Like find()

Using find():

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.find('From:') >= 0:
        print(line)

Using re.search():

import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('From:', line):
        print(line)

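To try both loops without the mbox-short.txt file, the sketch below swaps the file for a short list of made-up lines (an assumption for illustration) and collects matches instead of printing them. Both versions keep the same lines, including the X-Original-From: header, because neither find() nor an unanchored re.search() cares where in the line the text appears.

```python
import re

# Made-up lines standing in for mbox-short.txt
lines = [
    'From: alice@example.com\n',
    'Subject: hello\n',
    'X-Original-From: bob@example.com\n',
]

# Version 1: str.find() returns -1 when the text is absent
with_find = []
for line in lines:
    line = line.rstrip()
    if line.find('From:') >= 0:
        with_find.append(line)

# Version 2: re.search() returns None when the pattern is absent
with_search = []
for line in lines:
    line = line.rstrip()
    if re.search('From:', line):
        with_search.append(line)

print(with_find)
print(with_search)
```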
Using re.search() Like startswith()

Using startswith():

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)

Using re.search():

import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line):
        print(line)

We fine-tune what is matched by adding special characters to the string
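Anchoring makes the difference. In the sketch below (made-up lines again), ^From: and startswith() both reject the X-Original-From: header, while the unanchored pattern keeps it. re.match(), which only tries to match at the start of the string, is shown as another standard-library way to get the same effect.

```python
import re

# Made-up sample lines
lines = ['From: alice@example.com', 'X-Original-From: bob@example.com']

with_startswith = [ln for ln in lines if ln.startswith('From:')]
with_caret = [ln for ln in lines if re.search('^From:', ln)]
with_match = [ln for ln in lines if re.match('From:', ln)]   # match() only tries position 0
without_caret = [ln for ln in lines if re.search('From:', ln)]

print(with_startswith)  # ['From: alice@example.com']
print(with_caret)       # ['From: alice@example.com']
print(with_match)       # ['From: alice@example.com']
print(without_caret)    # both lines
```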

Wild-Card Characters
• The dot character matches any character
• If you add the asterisk character, the preceding character can repeat “any number of times”

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain

^X.*:
• ^ : Match the start of the line
• . : Match any character
• * : Many times

Fine-Tuning Your Match
Depending on how “clean” your data is and the purpose of your application, you may want to narrow your match down a bit

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-Plane is behind schedule: two weeks
X-: Very short

^X.*:
• ^ : Match the start of the line
• . : Match any character
• * : Many times

This pattern matches all four lines, including the last two, which we may not want.

Fine-Tuning Your Match
Depending on how “clean” your data is and the purpose of your application, you may want to narrow your match down a bit

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-: Very short
X-Plane is behind schedule: two weeks

^X-\S+:
• ^ : Match the start of the line
• \S : Match any non-whitespace character
• + : One or more times

Now only the first two lines match.

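The effect of tightening the pattern can be checked directly. This sketch runs both patterns from the two Fine-Tuning slides over the same four sample lines and keeps the lines each one matches.

```python
import re

lines = [
    'X-Sieve: CMU Sieve 2.3',
    'X-DSPAM-Result: Innocent',
    'X-Plane is behind schedule: two weeks',
    'X-: Very short',
]

# ^X.*: allows anything (even spaces, or nothing at all) between X and a colon
loose = [ln for ln in lines if re.search('^X.*:', ln)]

# ^X-\S+: demands "X-", then at least one non-whitespace character, then a colon
tight = [ln for ln in lines if re.search(r'^X-\S+:', ln)]

print(len(loose), loose)
print(len(tight), tight)
```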
Matching and Extracting Data
• re.search() tells us whether the string matches the regular expression: it returns a match object, which counts as True, or None, which counts as False
• If we actually want the matching strings to be extracted, we use re.findall()

>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']

[0-9]+ : One or more digits

Matching and Extracting Data
When we use re.findall(), it returns a list of zero or more sub-strings that
match the regular expression

>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']
>>> y = re.findall('[AEIOU]+',x)
>>> print(y)
[]
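re.findall() and re.search() answer different questions. The sketch below contrasts them on the slide's own string: findall() collects every match in a list (empty when nothing matches), while search() stops at the first match and returns None when there is none. The final sum is only an illustration that the extracted pieces are strings and need converting before arithmetic.

```python
import re

x = 'My 2 favorite numbers are 19 and 42'

numbers = re.findall('[0-9]+', x)    # every match, as a list of strings
first = re.search('[0-9]+', x)       # only the first match, as a match object
missing = re.search('[AEIOU]+', x)   # None: there are no uppercase vowels in x

print(numbers)          # ['2', '19', '42']
print(first.group())    # 2
print(missing)          # None

# findall() gives strings, so convert before doing arithmetic
total = sum(int(n) for n in numbers)
print(total)            # 63
```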
Warning: Greedy Matching
The repeat characters (* and +) are greedy: they push outward to match the largest possible string

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']

Why not 'From:'?

^F.+:
• The first character in the match is an F
• .+ matches one or more characters
• The last character in the match is a :

Non-Greedy Matching
Not all regular expression repeat codes are greedy! If you add a ? character, the + and * chill out a bit and match the smallest possible string

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']

^F.+?:
• The first character in the match is an F
• .+? matches one or more characters, but is not greedy
• The last character in the match is a :
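A small sketch to compare the greedy and non-greedy forms side by side. It reuses the slide's string and adds a made-up HTML-like string, where the difference is easier to see; *? is the non-greedy form of *, just as +? is for +.

```python
import re

x = 'From: Using the : character'

greedy = re.findall('^F.+:', x)      # .+ runs to the last colon it can reach
lazy = re.findall('^F.+?:', x)       # .+? stops at the first colon that works
lazy_star = re.findall('^F.*?:', x)  # *? is the non-greedy form of *

# The same contrast on a made-up HTML-like string
html = '<b>bold</b>'
tags_greedy = re.findall('<.+>', html)   # one long match
tags_lazy = re.findall('<.+?>', html)    # each tag separately

print(greedy, lazy, lazy_star)   # ['From: Using the :'] ['From:'] ['From:']
print(tags_greedy, tags_lazy)    # ['<b>bold</b>'] ['<b>', '</b>']
```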
Fine-Tuning String Extraction
You can refine the match for re.findall() and separately determine which portion of the match is to be extracted by using parentheses

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

>>> import re
>>> x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> y = re.findall(r'\S+@\S+',x)
>>> print(y)
['stephen.marquard@uct.ac.za']

\S+@\S+
• \S+ : At least one non-whitespace character

Fine-Tuning String Extraction
Parentheses are not part of the match, but they tell where to start and stop extracting

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

>>> y = re.findall(r'\S+@\S+',x)
>>> print(y)
['stephen.marquard@uct.ac.za']

^From (\S+@\S+)

>>> y = re.findall(r'^From (\S+@\S+)',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
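When a pattern has more than one set of parentheses, re.findall() returns a tuple per match instead of a string. The sketch below, using the slide's sample line, shows this, plus re.search() with numbered group() calls as an alternative. Patterns are written as raw strings (r'...') so that the backslash in \S reaches the regex engine untouched; newer Python versions warn about \S inside an ordinary string literal.

```python
import re

x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

# One group: findall() returns a list of strings
emails = re.findall(r'^From (\S+@\S+)', x)

# Two groups: findall() returns a list of tuples, one item per group
parts = re.findall(r'^From (\S+)@(\S+)', x)

# search() exposes each group by number on the match object
m = re.search(r'^From (\S+)@(\S+)', x)
user, host = m.group(1), m.group(2)

print(emails)       # ['stephen.marquard@uct.ac.za']
print(parts)        # [('stephen.marquard', 'uct.ac.za')]
print(user, host)   # stephen.marquard uct.ac.za
```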
String Parsing Examples…

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
(the @ is at position 21 and the next space is at position 31)

Extracting a host name - using find and string slicing

>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1 : sppos]
>>> print(host)
uct.ac.za
The Double Split Pattern
Sometimes we split a line one way, and then grab one of the pieces of the line and split that piece again

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

words = line.split()
email = words[1]              # 'stephen.marquard@uct.ac.za'
pieces = email.split('@')     # ['stephen.marquard', 'uct.ac.za']
print(pieces[1])              # 'uct.ac.za'
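The two non-regex approaches can be wrapped up as small functions to confirm they agree. The function names host_by_find and host_by_split are hypothetical, and both assume a well-formed From line like the one on the slide; neither handles a missing @ gracefully.

```python
# Hypothetical helper: the find-and-slice approach
def host_by_find(data):
    # Assumes the line has an '@' followed later by a space
    atpos = data.find('@')
    sppos = data.find(' ', atpos)
    return data[atpos + 1:sppos]

# Hypothetical helper: the double split approach
def host_by_split(line):
    # Assumes the address is the second whitespace-separated word
    words = line.split()
    email = words[1]
    pieces = email.split('@')
    return pieces[1]

line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(host_by_find(line))    # uct.ac.za
print(host_by_split(line))   # uct.ac.za
```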
The Regex Version
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

import re
lin = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
y = re.findall('@([^ ]*)',lin)
print(y)

['uct.ac.za']

'@([^ ]*)'
• @ : Look through the string until you find an at sign
• [^ ] : Match a non-blank character
• * : Match many of them
• ( ) : Extract the non-blank characters

Even Cooler Regex Version
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

import re
lin = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
y = re.findall('^From .*@([^ ]*)',lin)
print(y)

['uct.ac.za']

'^From .*@([^ ]*)'
• ^From  : Starting at the beginning of the line, look for the string 'From '
• .*@ : Skip a bunch of characters, looking for an at sign
• ( : Start extracting
• [^ ] : Match a non-blank character
• * : Match many of them
• ) : Stop extracting
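The ^From anchor earns its keep on real mail data, where @ signs show up in many headers. The sketch below runs both regex versions over the slide's line and one made-up Author: line. It also shows the one place where [^ ]* and [^ ]+ differ: an @ followed immediately by a space.

```python
import re

lines = [
    'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008',
    'Author: stephen.marquard@uct.ac.za',   # made-up line that is not a "From " line
]

# Unanchored: finds a host after any @ in any line
anywhere = [re.findall('@([^ ]*)', ln) for ln in lines]

# Anchored: only lines that start with "From " yield a host
from_only = [re.findall('^From .*@([^ ]*)', ln) for ln in lines]

print(anywhere)    # [['uct.ac.za'], ['uct.ac.za']]
print(from_only)   # [['uct.ac.za'], []]

# [^ ]* may match nothing; [^ ]+ needs at least one non-blank character
star_result = re.findall('@([^ ]*)', 'x@ y')
plus_result = re.findall('@([^ ]+)', 'x@ y')
print(star_result)   # ['']
print(plus_result)   # []
```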
Spam Confidence

import re
hand = open('mbox-short.txt')
numlist = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('^X-DSPAM-Confidence: ([0-9.]+)', line)
    if len(stuff) != 1:
        continue
    num = float(stuff[0])
    numlist.append(num)
print('Maximum:', max(numlist))

python ds.py
Maximum: 0.9907

X-DSPAM-Confidence: 0.8475
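To run the spam-confidence loop without mbox-short.txt, the sketch below feeds it a short list of made-up header lines (only 0.8475 comes from the slide; the other values are invented, so the result is not the 0.9907 from the real file). It also guards the final max() call, which raises ValueError if no line matched.

```python
import re

# Made-up header lines standing in for mbox-short.txt;
# only 0.8475 comes from the slide, the other values are invented
lines = [
    'X-DSPAM-Confidence: 0.8475',
    'Subject: made-up subject line',
    'X-DSPAM-Confidence: 0.6178',
    'X-DSPAM-Probability: 0.0000',
]

numlist = []
for line in lines:
    line = line.rstrip()
    stuff = re.findall('^X-DSPAM-Confidence: ([0-9.]+)', line)
    if len(stuff) != 1:
        continue          # skip lines that are not confidence headers
    numlist.append(float(stuff[0]))

# max() raises ValueError on an empty list, so guard against no matches
if numlist:
    print('Maximum:', max(numlist))   # Maximum: 0.8475
```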
Escape Character
If you want a special regular expression character to behave like a normal character, prefix it with a backslash '\'

>>> import re
>>> x = 'We just received $10.00 for cookies.'
>>> y = re.findall(r'\$[0-9.]+',x)
>>> print(y)
['$10.00']

\$[0-9.]+
• \$ : A real dollar sign
• [0-9.] : A digit or period (inside brackets, . is not a wildcard)
• + : One or more times
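When the literal text is known in advance, the standard library's re.escape() can insert the backslashes for you. The sketch below compares it with the slide's hand-escaped pattern and shows what goes wrong when the backslash is left off.

```python
import re

x = 'We just received $10.00 for cookies.'

# re.escape() adds backslashes before characters that are special in a regex
pattern = re.escape('$10.00')
literal = re.findall(pattern, x)

# Hand-escaped version from the slide, written as a raw string
by_hand = re.findall(r'\$[0-9.]+', x)

# Without the backslash, $ means "end of string", so nothing can follow it
unescaped = re.findall('$[0-9.]+', x)

print(pattern)     # \$10\.00
print(literal)     # ['$10.00']
print(by_hand)     # ['$10.00']
print(unescaped)   # []
```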
Summary
• Regular expressions are a cryptic but powerful language for matching strings and extracting elements from those strings
• Regular expressions have special characters that indicate intent
Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance (www.dr-chuck.com) of the University of Michigan School of
Information and open.umich.edu and made available under a
Creative Commons Attribution 4.0 License. Please maintain this
last slide in all copies of the document to comply with the
attribution requirements of the license. If you make a change,
feel free to add your name and organization to the list of
contributors on this page as you republish the materials.

Initial Development: Charles Severance, University of Michigan School of Information

… Insert new Contributors and Translations here
