PYTHON REGULAR EXPRESSIONS
John Zhang
Tuesday, December 11, 2012
Regular Expressions
• Regular expressions are a powerful string
manipulation tool
• All modern languages have similar library
packages for regular expressions
• Use regular expressions to:
– Search a string (search and match)
– Replace parts of a string (sub)
– Break strings into smaller pieces (split)
Regular Expression Python Syntax
• Literal match:
Example: the regular expression “test” only
matches the string ‘test’
• [x] matches any one of a list of characters
Example: “[abc]” matches ‘a’, ‘b’, or ‘c’
• [^x] matches any one character that is not
included in x
Example: “[^abc]” matches any single character except
‘a’, ‘b’, or ‘c’
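As a rough illustration of the two character-class forms above, the sketch below runs both against a made-up sample word ("cabbage" is only an example input, not from the slides):

```python
import re

# [abc] matches exactly one character taken from the set a, b, c.
in_set = re.findall(r"[abc]", "cabbage")

# [^abc] matches exactly one character that is NOT a, b, or c.
not_in_set = re.findall(r"[^abc]", "cabbage")

print(in_set)      # ['c', 'a', 'b', 'b', 'a']
print(not_in_set)  # ['g', 'e']
```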
Regular Expressions Syntax
• “.” matches any single character (except a newline, by default)
• Parentheses () can be used for grouping
Example: “(abc)+” matches ‘abc’, ‘abcabc’,
‘abcabcabc’, etc.
• x|y matches x or y
Example: “this|that” matches ‘this’ and ‘that’,
but not ‘thisthat’.
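The three constructs on this slide can be checked with a short sketch. It assumes Python 3.4 or later, because it uses re.fullmatch to require that the whole string matches; the sample strings are invented for illustration.

```python
import re

# "." stands for any single character (a newline is excluded by default).
dot_hits = re.findall(r"c.t", "cat cot cut ct")

# (abc)+ repeats the whole group, not just the last character.
group_ok = re.fullmatch(r"(abc)+", "abcabc") is not None
group_bad = re.fullmatch(r"(abc)+", "abcab") is not None

# this|that accepts either alternative, but not both glued together.
alternation = [
    re.fullmatch(r"this|that", s) is not None
    for s in ("this", "that", "thisthat")
]

print(dot_hits)             # ['cat', 'cot', 'cut']
print(group_ok, group_bad)  # True False
print(alternation)          # [True, True, False]
```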
Regular Expression Syntax
• x* matches zero or more x’s
“a*” matches ‘’, ‘a’, ‘aa’, etc.
• x+ matches one or more x’s
“a+” matches ‘a’, ‘aa’, ‘aaa’, etc.
• x? matches zero or one x’s
“a?” matches ‘’ or ‘a’.
• x{m,n} matches i x’s, where m ≤ i ≤ n
“a{2,3}” matches ‘aa’ or ‘aaa’
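A small sketch of the four quantifiers, again assuming Python 3.4+ for re.fullmatch. The helper name fully_matches is hypothetical and exists only to keep the checks readable.

```python
import re

def fully_matches(pattern, text):
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Each list holds one True/False result per test string.
star = [fully_matches(r"a*", s) for s in ("", "a", "aaa")]
plus = [fully_matches(r"a+", s) for s in ("", "a", "aa")]
optional = [fully_matches(r"a?", s) for s in ("", "a", "aa")]
bounded = [fully_matches(r"a{2,3}", s) for s in ("a", "aa", "aaa", "aaaa")]

print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(optional)  # [True, True, False]
print(bounded)   # [False, True, True, False]
```

Note that both bounds of {m,n} are inclusive, which is why ‘aa’ and ‘aaa’ both pass.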
Regular Expression Syntax
• “\d” matches any digit; “\D” matches any non-digit
• “\s” matches any whitespace character; “\S”
matches any non-whitespace character
• “\w” matches any alphanumeric character or underscore; “\W”
matches any other character
• “^” matches the beginning of the string; “$”
matches the end of the string
• “\b” matches a word boundary; “\B” matches a
position that is not a word boundary
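The escapes above all start with a backslash, so it is usually safest to write such patterns as raw strings (r"..."), which stop Python from interpreting the backslash itself. A minimal sketch with invented sample text:

```python
import re

text = "Order 66 shipped to bay 7"

digits = re.findall(r"\d+", text)  # runs of digits
words = re.findall(r"\w+", text)   # runs of word characters

# \b keeps 'bay' from matching inside 'bayou' or 'embay'.
whole_word = re.findall(r"\bbay\b", "bay bayou embay")

# ^ and $ anchor the pattern to the start and end of the string.
starts_with_order = re.search(r"^Order", text) is not None
ends_with_digit = re.search(r"\d$", text) is not None

print(digits)      # ['66', '7']
print(words)       # ['Order', '66', 'shipped', 'to', 'bay', '7']
print(whole_word)  # ['bay']
```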
Search and Match
• The two basic functions are re.search and re.match
– Search looks for a pattern anywhere in a string
– Match looks for a match starting at the beginning of the string
• Both return None if the pattern is not found (logical false)
and a “match object” if it is
import re
pat = "a*b"
# search scans the whole string for the first place the pattern matches
matchObj = re.search(pat, "fooaaabcde")
if matchObj:
    print("matched successfully: %s" % matchObj.group(0))
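To make the difference between the two functions concrete, the following sketch runs both on the same pattern and string used above (the variable names are illustrative):

```python
import re

pat = "a*b"
text = "fooaaabcde"

# search scans forward until the pattern matches somewhere.
found_anywhere = re.search(pat, text)

# match only tries position 0; here the string starts with 'f', so it fails.
found_at_start = re.match(pat, text)

print(found_anywhere.group())  # aaab
print(found_at_start)          # None
```

Because a failed match returns None, testing the result with a plain if statement before calling group() avoids an AttributeError.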
Q: What’s a match object?
• A: an instance of the match class with the details of the match
result
>>> pat = "a*b"
>>> r1 = re.search(pat,"fooaaabcde")
>>> r1.group() # group returns string matched
'aaab'
>>> r1.start() # index of the match start
3
>>> r1.end() # index of the match end
7
>>> r1.span() # tuple of (start, end)
(3, 7)
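One way to read start() and end(): they behave like slice indices, so end() is one past the last matched character. A brief sketch using the same pattern and string:

```python
import re

text = "fooaaabcde"
m = re.search(r"a*b", text)

start, end = m.span()

# Slicing the original string with the span reproduces the matched text.
matched_slice = text[start:end]

print(start, end)     # 3 7
print(matched_slice)  # aaab
```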
What got matched?
• Here’s a pattern to match simple email addresses
\w+@(\w+\.)+(com|org|net|edu)
>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1, "qzhang@pku.cn.edu")
>>> r1.group()
'qzhang@pku.cn.edu'

• We might want to extract the pattern parts, like the
email name and host
What got matched?
• We can put parentheses around groups we want to be
able to reference
>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2, "qzhang@pku.cn.edu")
>>> r2.group(1)
'qzhang'
>>> r2.group(2)
'pku.cn.edu'
>>> r2.groups()
('qzhang', 'pku.cn.edu', 'cn.', 'edu')

• Note that the groups are numbered by the position of their
opening parenthesis, left to right (a preorder traversal of
the nesting structure)
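The numbering rule can be seen on a smaller, made-up pattern with nested groups. The second half of the sketch shows why a repeated group reports only one value: when a group sits under a quantifier, the match object keeps the text from its last repetition.

```python
import re

# Opening parentheses, left to right: 1 = ((a)(b)), 2 = (a), 3 = (b), 4 = (c)
nested = re.match(r"((a)(b))(c)", "abc")
print(nested.groups())  # ('ab', 'a', 'b', 'c')

# A group under + is overwritten on each repetition; only the last one is kept.
repeated = re.match(r"(\w+\.)+", "pku.cn.")
print(repeated.group(0))  # pku.cn.
print(repeated.group(1))  # cn.
```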
What got matched?
• We can ‘label’ the groups as well…
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3, "qzhang@pku.cn.edu")
>>> r3.group('name')
'qzhang'
>>> r3.group('host')
'pku.cn.edu'

• And reference the matching parts by the labels
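Named groups can also be collected in one call with the match object's groupdict() method, and they remain reachable by number. A minimal sketch using the labelled email pattern:

```python
import re

pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
m = re.match(pat3, "qzhang@pku.cn.edu")

# groupdict() returns only the named groups, keyed by label.
parts = m.groupdict()
print(parts)  # {'name': 'qzhang', 'host': 'pku.cn.edu'}

# A named group still has a number: 'name' is group 1, 'host' is group 2.
same_by_number = (m.group(1) == m.group("name")) and (m.group(2) == m.group("host"))
print(same_by_number)  # True
```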
More re functions
• re.split() is like str.split() but can use patterns
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']

• re.sub substitutes one string for a pattern
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'

• re.findall() finds all matches
>>> re.findall(r"\d+", "12 dogs,11 cats, 1 egg")
['12', '11', '1']
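A few follow-up uses of the same three functions, sketched on the slide's own sample strings. The filtering and the count argument are illustrative choices, not something the slides prescribe.

```python
import re

sentence = "This... is a test, short and sweet, of split()."

# The text ends with non-word characters, so split yields a trailing ''.
# Filtering on truthiness drops it.
pieces = [w for w in re.split(r"\W+", sentence) if w]

# count limits how many replacements re.sub performs (0, the default, means all).
first_only = re.sub(r"blue|white|red", "black", "blue socks and red shoes", count=1)

# findall returns strings; convert them when numbers are needed.
numbers = [int(n) for n in re.findall(r"\d+", "12 dogs,11 cats, 1 egg")]

print(pieces)      # ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split']
print(first_only)  # black socks and red shoes
print(numbers)     # [12, 11, 1]
```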
Compiling regular expressions
• If you plan to use a pattern more than once,
compile it to a pattern object with re.compile()
• Python produces a special data structure that
speeds up matching
>>> cpat3 = re.compile(pat3)
>>> cpat3
<_sre.SRE_Pattern object at 0x2d9c0>
>>> r3 = cpat3.search("qzhang@pku.cn.edu")
>>> r3
<_sre.SRE_Match object at 0x895a0>
>>> r3.group()
'qzhang@pku.cn.edu'
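A typical use of a compiled pattern is to build it once and apply it to many strings. The sketch below assumes a hypothetical list of candidate strings; EMAIL and candidates are illustrative names.

```python
import re

# Compile once, outside the loop.
EMAIL = re.compile(r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))")

candidates = ["qzhang@pku.cn.edu", "not an email", "alice@example.com"]

# Apply the same pattern object to every candidate; skip the failures (None).
hosts = [m.group("host") for m in map(EMAIL.match, candidates) if m]

print(hosts)  # ['pku.cn.edu', 'example.com']
```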
Pattern object methods
• There are methods defined for a pattern object that
parallel the regular expression functions, e.g.,
– match
– search
– split
– findall
– sub
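Each method takes the same arguments as the module-level function of the same name, minus the pattern. A short sketch comparing the two forms on invented sample text:

```python
import re

text = "red, green; blue 42"
separators = re.compile(r"\W+")
digits = re.compile(r"\d+")

# Method on the pattern object vs. module-level function with the pattern string.
same_split = separators.split(text) == re.split(r"\W+", text)
same_findall = digits.findall(text) == re.findall(r"\d+", text)
same_sub = digits.sub("#", text) == re.sub(r"\d+", "#", text)

print(separators.split(text))  # ['red', 'green', 'blue', '42']
print(digits.sub("#", text))   # red, green; blue #
```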

Python advanced 2. regular expression in python
