
Supplement: Regular Expressions

For Introduction to Programming Using Python

By Y. Daniel Liang

0 Introduction

Often you need to write code to validate user input, for example
to check whether the input is a number, a string with all
lowercase letters, or a social security number. How do you write
this type of code? A simple and effective way to accomplish this
task is to use a regular expression.

A regular expression (abbreviated regex) is a string that
describes a pattern for matching a set of strings. Regular
expressions are a powerful tool for string manipulation. You can
use them for matching, replacing, and splitting strings.

1 Getting Started

To use regex, import the re module. You can use the split
function in the module to split a string. For example,
re.split(" ", "ab bc cd")

splits "ab bc cd" into a list ['ab', 'bc', 'cd'].

At first glance, the re.split function is very similar to the
split method of string objects. For example, you can use the
following method call to split "ab bc cd":
"ab bc cd".split()

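However, the re.split function is more powerful: you can specify
a regex pattern to split a string. For example,
re.split(r"\d", "ab1bc4cd")

splits "ab1bc4cd" into a list ['ab', 'bc', 'cd']. \d in the
preceding statement is a regular expression that represents any
single digit. (The r prefix makes the pattern a raw string, so
Python passes the backslash to the regex engine unchanged.) Here
is another example,
re.split(r"\d+", "ab13bc44cd443gg")

splits "ab13bc44cd443gg" into a list ['ab', 'bc', 'cd', 'gg'].
Here, the regular expression \d+ means one or more digits. Note
that \d* (zero or more digits) would not work here: since Python
3.7, re.split also splits at the empty matches that \d* produces.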
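The split calls above can be tried directly. The following minimal sketch assumes Python 3.7 or later and writes patterns as raw strings (r"...") so that backslashes reach the regex engine unchanged. The last line shows why a pattern that can match the empty string, such as \d*, makes a poor separator: re.split also splits at those empty matches.

```python
import re

# Splitting on a literal space behaves like str.split(" ").
print(re.split(" ", "ab bc cd"))            # ['ab', 'bc', 'cd']
print("ab bc cd".split())                   # ['ab', 'bc', 'cd']

# \d matches any single digit, so each digit acts as a separator.
print(re.split(r"\d", "ab1bc4cd"))          # ['ab', 'bc', 'cd']

# \d+ treats a whole run of digits as one separator.
print(re.split(r"\d+", "ab13bc44cd443gg"))  # ['ab', 'bc', 'cd', 'gg']

# \d* can also match the empty string, and since Python 3.7
# re.split splits on empty matches too, so the result differs.
print(re.split(r"\d*", "ab13bc44cd443gg") == ["ab", "bc", "cd", "gg"])  # False
```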

2 Regular Expression Syntax

A regular expression consists of literal characters and special
symbols. Table 1 lists some frequently used syntax for regular
expressions.

Table 1: Frequently Used Regular Expressions

Regular Expression  Meaning                             Example
x                   a character literal                 "good" matches "good"
.                   any single character                "good" matches "goo."
(ab|cd)             ab or cd                            "good" matches "a|g"
[abc]               a, b, or c                          "good" matches "[ag]"
[^abc]              any character except a, b, or c     "good" matches "[^ac]"
[a-z]               a through z                         "good" matches "[a-i]oo[a-d]"
[^a-z]              any character except a through z    "good" matches "goo[^i-x]"
\d                  a digit ([0-9] for ASCII text)      "good3" matches "good\d"
\D                  a non-digit                         "good" matches "\D\Dod"
\w                  a word character                    "good3" matches "goo\w\w"
\W                  a non-word character                "$good" matches "\Wgood"
\s                  a whitespace character              "good 2" matches "good\s2"
\S                  a non-whitespace character          "good" matches "\Sood"
p*                  zero or more occurrences of p       "good" matches "a*"
                                                        "bbb" matches "a*"
p+                  one or more occurrences of p        "good" matches "o+"
                                                        "bbb" matches "b+"
p?                  zero or one occurrence of p         "good" matches "good?"
                                                        "bbb" matches "b?"
p{n}                exactly n occurrences of p          "aaa" matches "a{3}"
                                                        "god" does not match "go{2}d"
p{n,}               at least n occurrences of p         "good" matches "go{2,}d"
                                                        "good" does not match "g{2,}"
p{n,m}              between n and m occurrences of p    "aa" matches "a{1,9}"
                    (inclusive)                         "b" does not match "b{2,9}"

In the Example column, "s" matches "p" means that the pattern p
matches some part of the string s, as re.search does.
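Most rows of Table 1 can be checked mechanically. This sketch reads "matches" the way re.search does (the pattern may occur anywhere in the string). The row list is transcribed by hand from the table, and its "does not match" rows use strings that genuinely fail under re.search.

```python
import re

# (string, pattern, expected) rows transcribed from Table 1.
rows = [
    ("good", "good", True),
    ("good", "goo.", True),
    ("good", "a|g", True),
    ("good", "[ag]", True),
    ("good", "[^ac]", True),
    ("good", "[a-i]oo[a-d]", True),
    ("good", "goo[^i-x]", True),
    ("good3", r"good\d", True),
    ("good", r"\D\Dod", True),
    ("good3", r"goo\w\w", True),
    ("$good", r"\Wgood", True),
    ("good 2", r"good\s2", True),
    ("good", r"\Sood", True),
    ("good", "a*", True),       # a* matches the empty string
    ("good", "o+", True),
    ("good", "good?", True),
    ("aaa", "a{3}", True),
    ("god", "go{2}d", False),
    ("good", "go{2,}d", True),
    ("good", "g{2,}", False),
    ("aa", "a{1,9}", True),
    ("b", "b{2,9}", False),
]

for text, pattern, expected in rows:
    found = re.search(pattern, text) is not None
    status = "ok" if found == expected else "MISMATCH"
    print(f"{text!r:10} {pattern!r:16} {found!s:6} {status}")
```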

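NOTE
Recall that a whitespace (or a whitespace character)
is any character that does not display itself but
does take up space. The characters ' ', '\t', '\n',
'\r', '\f', and '\v' are whitespace characters. So,
for ASCII text, \s is the same as [ \t\n\r\f\v], and
\S is the same as [^ \t\n\r\f\v]. (For str patterns,
Python's \s also matches other Unicode whitespace,
such as the non-breaking space, unless the re.ASCII
flag is used.)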

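NOTE
A word character is any letter, digit, or the
underscore character. So, for ASCII text, \w is the
same as [a-zA-Z0-9_], and \W is the same as
[^a-zA-Z0-9_]. (For str patterns, Python's \w also
matches non-ASCII letters and digits, such as é,
unless the re.ASCII flag is used.)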
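The two notes above describe the ASCII view of \s and \w. The sketch below, assuming Python 3 str patterns and re.fullmatch (Python 3.4+), shows that by default both also accept some non-ASCII characters, and that the re.ASCII flag restricts them to the ASCII sets.

```python
import re

# Each of the six ASCII whitespace characters matches \s in ASCII mode.
print(all(re.fullmatch(r"\s", ch, re.ASCII) for ch in " \t\n\r\f\v"))  # True

# In the default (Unicode) mode, \s also matches the
# non-breaking space U+00A0; re.ASCII excludes it.
print(re.fullmatch(r"\s", "\u00a0") is not None)            # True
print(re.fullmatch(r"\s", "\u00a0", re.ASCII) is not None)  # False

# \w covers letters, digits, and the underscore.
print(re.fullmatch(r"\w+", "good_3") is not None)           # True

# In Unicode mode \w also accepts non-ASCII letters such as é;
# re.ASCII restricts it to [a-zA-Z0-9_].
print(re.fullmatch(r"\w+", "caf\u00e9") is not None)            # True
print(re.fullmatch(r"\w+", "caf\u00e9", re.ASCII) is not None)  # False
```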

NOTE
The last six entries *, +, ?, {n}, {n,}, and {n,m}
in Table 1 are called quantifiers. A quantifier
specifies how many times the pattern before it may
repeat. For example, A* matches zero or more A's, A+
matches one or more A's, A? matches zero or one A,
A{3} matches exactly AAA, A{3,} matches at least
three A's, and A{3,6} matches between 3 and 6 A's. *
is the same as {0,}, + is the same as {1,}, and ? is
the same as {0,1}.

CAUTION
Do not use spaces in the repeat quantifiers. For
example, A{3,6} cannot be written as A{3, 6} with a
space after the comma.

NOTE
You may use parentheses to group patterns. For
example, (ab){3} matches ababab, but ab{3} matches
abbb.
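A small sketch of the quantifier equivalences and of grouping. The helper matches() is a hypothetical name introduced here; it uses re.fullmatch (Python 3.4+) so that the whole string must fit the pattern.

```python
import re

def matches(pattern, text):
    """Return True if the entire text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# * is {0,}, + is {1,}, and ? is {0,1}: each pair agrees on these inputs.
for text in ["", "A", "AA", "AAAA"]:
    print(repr(text),
          matches("A*", text) == matches("A{0,}", text),
          matches("A+", text) == matches("A{1,}", text),
          matches("A?", text) == matches("A{0,1}", text))

print(matches("A{3,6}", "AAAA"))     # True
print(matches("A{3,6}", "AA"))       # False

# Parentheses group a pattern so the quantifier applies to all of it.
print(matches("(ab){3}", "ababab"))  # True
print(matches("ab{3}", "abbb"))      # True
print(matches("ab{3}", "ababab"))    # False
```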

Let us use several examples to demonstrate how to construct
regular expressions. In these examples, a string matches a
pattern only if the entire string matches it.

Example 1: The pattern for social security numbers is xxx-xx-xxxx,
where x is a digit. A regular expression for social security
numbers can be described as
\d{3}-\d{2}-\d{4}

For example,

"111-22-3333" matches "\d{3}-\d{2}-\d{4}"

but

"11-22-3333" does not match "\d{3}-\d{2}-\d{4}"

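In the examples of this section a string matches a pattern only when the whole string fits it, which corresponds to re.fullmatch (Python 3.4+). A minimal sketch for Example 1; is_ssn is a hypothetical helper name:

```python
import re

SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"

def is_ssn(text):
    # fullmatch requires the entire string to fit the pattern.
    return re.fullmatch(SSN_PATTERN, text) is not None

print(is_ssn("111-22-3333"))   # True
print(is_ssn("11-22-3333"))    # False
print(is_ssn("111-22-33334"))  # False: an extra trailing digit
```

For str patterns, \d also accepts other Unicode decimal digits (for example Arabic-Indic digits); passing re.ASCII as the third argument to re.fullmatch restricts it to 0-9.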
Example 2: An even number ends with one of the digits 0, 2, 4, 6,
or 8. The pattern for even numbers can be described as

\d*[02468]

For example,

"122" matches "\d*[02468]"

but
"123" does not match "\d*[02468]"

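A sketch of Example 2 under the same whole-string reading; is_even_number is a hypothetical helper. The last line shows why the whole-string reading matters: re.search finds an even-looking substring inside "123".

```python
import re

def is_even_number(text):
    # \d* consumes any leading digits; [02468] must be the last character.
    return re.fullmatch(r"\d*[02468]", text) is not None

for s in ["122", "123", "0", "7"]:
    print(s, is_even_number(s))

# re.search accepts "123" because its substring "12" fits the pattern.
print(re.search(r"\d*[02468]", "123").group())  # 12
```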
Example 3: The pattern for telephone numbers is (xxx) xxx-xxxx,
where x is a digit and the first digit cannot be zero. A regular
expression for telephone numbers can be described as
\([1-9]\d{2}\) \d{3}-\d{4}

Note that the parentheses symbols ( and ) are special characters
in a regular expression for grouping patterns. To represent a
literal ( or ) in a regular expression, you have to use \( and
\).

For example,

"(912) 921-2728" matches "\([1-9]\d{2}\) \d{3}-\d{4}"

but

"921-2728" does not match "\([1-9]\d{2}\) \d{3}-\d{4}"

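A sketch of Example 3 using re.fullmatch. The escaped \( and \) stand for literal parentheses, and [1-9] rejects an area code starting with 0; PHONE_PATTERN is just a name chosen for this illustration.

```python
import re

# \( and \) match literal parentheses; [1-9] keeps the
# first digit of the area code from being zero.
PHONE_PATTERN = r"\([1-9]\d{2}\) \d{3}-\d{4}"

for s in ["(912) 921-2728", "921-2728", "(012) 921-2728"]:
    print(s, re.fullmatch(PHONE_PATTERN, s) is not None)
```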
Example 4: Suppose the last name consists of at most 25 letters
and the first letter is in uppercase. The pattern for a last
name can be described as

[A-Z][a-zA-Z]{1,24}

This pattern also requires at least two letters, because {1,24}
calls for at least one letter after the first. Note that you
cannot have arbitrary whitespace in a regular expression. For
example, [A-Z][a-zA-Z]{1, 24} would be wrong.

For example,

"Smith" matches "[A-Z][a-zA-Z]{1,24}"

but

"Jones123" does not match "[A-Z][a-zA-Z]{1,24}"

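A sketch of Example 4, again using re.fullmatch; LAST_NAME is an illustrative name. The last line contrasts re.match, which only anchors at the start of the string and therefore accepts the prefix "Jones".

```python
import re

LAST_NAME = r"[A-Z][a-zA-Z]{1,24}"

# "X" fails because the pattern needs at least two letters.
for s in ["Smith", "Jones123", "smith", "Li", "X"]:
    print(s, re.fullmatch(LAST_NAME, s) is not None)

# re.match only anchors at the start, so it accepts the "Jones" prefix.
print(re.match(LAST_NAME, "Jones123").group())  # Jones
```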
Example 5: Python identifiers are defined in §2.4, “Identifiers.”

• An identifier is a sequence of characters that consists of
  letters, digits, and underscores (_).
• An identifier must start with a letter or an underscore. It
  cannot start with a digit.

The pattern for identifiers can be described as

[a-zA-Z_]\w*
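A sketch that compares the identifier pattern [a-zA-Z_]\w* (letters, digits, and underscores only) with Python's built-in checks, str.isidentifier() and keyword.iskeyword(). looks_like_identifier is a hypothetical helper, and passing re.ASCII is an assumption made to keep \w consistent with the ASCII-only first character class. Note that the pattern alone does not reject keywords such as for.

```python
import keyword
import re

IDENTIFIER = r"[a-zA-Z_]\w*"

def looks_like_identifier(text):
    # re.ASCII limits \w to [a-zA-Z0-9_], matching the first character class.
    return re.fullmatch(IDENTIFIER, text, re.ASCII) is not None

for s in ["count", "_total2", "2nd", "max-value", "for"]:
    print(s, looks_like_identifier(s), s.isidentifier(), keyword.iskeyword(s))
```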

Example 6: What strings are matched by the regular expression
"Welcome to (XHTML|HTML)"? The answer is "Welcome to XHTML" or
"Welcome to HTML".

Example 7: What strings are matched by the regular expression
".*"? The answer is any string that contains no newline
character, because . does not match a newline unless the
re.DOTALL flag is used.

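A sketch checking Examples 6 and 7 with re.fullmatch; it also shows that . does not match a newline unless re.DOTALL is passed.

```python
import re

pattern = "Welcome to (XHTML|HTML)"
for s in ["Welcome to XHTML", "Welcome to HTML", "Welcome to XML"]:
    print(s, re.fullmatch(pattern, s) is not None)

# . does not match a newline unless re.DOTALL is given.
print(re.fullmatch(".*", "any text") is not None)               # True
print(re.fullmatch(".*", "two\nlines") is not None)             # False
print(re.fullmatch(".*", "two\nlines", re.DOTALL) is not None)  # True
```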
3 The match and search Functions

You can use the re.match and re.search functions to match a
string with a pattern. re.match(r, s) returns a match object if
the regex r matches at the start of string s; re.search(r, s)
returns a match object if the regex r matches anywhere in string
s. Both functions return None if there is no match. Listings 1
and 2 give examples of using these functions.
Listing 1 MatchDemo.py

import re

regex = r"\d{3}-\d{2}-\d{4}"
ssn = input("Enter SSN: ")
match1 = re.match(regex, ssn)

if match1 is not None:
    print(ssn, "is a valid SSN")
    print("start position of the matched text is " +
          str(match1.start()))
    print("start and end position of the matched text is " +
          str(match1.span()))
else:
    print(ssn, "is not a valid SSN")

Sample Output

Enter SSN: 4343

4343 is not a valid SSN

Sample Output
Enter SSN: 434-32-3243
434-32-3243 is a valid SSN
start position of the matched text is 0
start and end position of the matched text is (0, 11)

Invoking re.match returns a match object if the string matches
the regex pattern at the start of the string. Otherwise, it
returns None. The program checks whether there is a match. If
so, it invokes the match object's start() method to return the
start position of the matched text in the string and the span()
method to return the start and end positions of the matched text
in a tuple.
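Because re.match only anchors at the start of the string, Listing 1 would also report "434-32-3243xyz" as a valid SSN. A sketch, assuming Python 3.4+ for re.fullmatch, comparing the three functions on a few inputs:

```python
import re

regex = r"\d{3}-\d{2}-\d{4}"

for text in ["434-32-3243", "434-32-3243xyz", "SSN: 434-32-3243"]:
    at_start = re.match(regex, text) is not None     # anchored at index 0
    whole = re.fullmatch(regex, text) is not None    # entire string
    anywhere = re.search(regex, text) is not None    # any position
    print(f"{text!r:20} match={at_start} fullmatch={whole} search={anywhere}")

# search reports where the SSN was found.
print(re.search(regex, "SSN: 434-32-3243").span())  # (5, 16)
```

For strict input validation, re.fullmatch is usually the safer choice.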

Listing 2 SearchDemo.py

import re

regex = r"\d{3}-\d{2}-\d{4}"
text = input("Enter a text: ")
match1 = re.search(regex, text)

if match1 is not None:
    print(text, "contains a SSN")
    print("start position of the matched text is " +
          str(match1.start()))
    print("start and end position of the matched text is " +
          str(match1.span()))
else:
    print(text, "does not contain a SSN")

Sample Output
Enter a text: The ssn for Smith is 343-34-3490
The ssn for Smith is 343-34-3490 contains a SSN
start position of the matched text is 21
start and end position of the matched text is (21, 32)

Sample Output
Enter a text: Smith's ssn is 343.34.3434
Smith's ssn is 343.34.3434 does not contain a SSN

Invoking re.search returns a match object if the string matches
the regex pattern anywhere in the string. Otherwise, it returns
None. The program checks whether there is a match. If so, it
invokes the match object's start() method to return the start
position of the matched text in the string and the span() method
to return the start and end positions of the matched text in a
tuple.
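re.search reports only the first match. The sketch below shows the match object's group() method, which returns the matched text, and re.findall, which returns all non-overlapping matches as a list of strings; the sample text is made up for illustration.

```python
import re

regex = r"\d{3}-\d{2}-\d{4}"
text = "Smith: 343-34-3490, Jones: 123-45-6789"

# search stops at the first match; the match object describes it.
m = re.search(regex, text)
print(m.group(), m.start(), m.end(), m.span())  # 343-34-3490 7 18 (7, 18)

# findall collects every non-overlapping match as a string.
print(re.findall(regex, text))  # ['343-34-3490', '123-45-6789']
```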

4 Flags

Many functions in the re module accept an optional flags
parameter that specifies additional constraints. For example, in
the following statement
match1 = re.search("a{3}", "AaaBe", re.IGNORECASE)

the string "AaaBe" matches the pattern a{3} case-insensitively.
But in the following statement
match1 = re.search("a{3}", "AaaBe")

the string "AaaBe" does not match the pattern a{3}.
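A sketch of flags in use. Flags can be combined with the | operator and passed to re.compile as well. One caution: for re.split the third positional parameter is maxsplit, not flags, so passing flags by keyword is safer.

```python
import re

m = re.search("a{3}", "AaaBe", re.IGNORECASE)
print(m.group(), m.span())                 # Aaa (0, 3)
print(re.search("a{3}", "AaaBe"))          # None

# Flags combine with |; MULTILINE lets ^ and $ match at each line.
pattern = re.compile(r"^good$", re.IGNORECASE | re.MULTILINE)
print(pattern.findall("Good\nbad\nGOOD"))  # ['Good', 'GOOD']

# re.split's third positional parameter is maxsplit, so pass flags by keyword.
print(re.split("a", "xAyaz", flags=re.IGNORECASE))  # ['x', 'y', 'z']
```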
