Lecture 04

This document summarizes a lecture on regular expressions (regex) in Python. It introduces common regex patterns like optional expressions, Kleene stars and pluses, wildcards, and anchors. It also demonstrates how to define patterns, search text strings, and use functions like search, match, findall, and sub in Python's re module. Finally, it lists some online resources for learning more about regex in Python.

Uploaded by

ali khan

Lecture 04

Regular Expressions (regex)


Natural Language Processing
COSC-3121
Ms. Humaira Anwer

Lecture 03 Regular Expressions {regex} 1


Agenda
• Regular Expressions
• Optional Expressions
• Kleene+
• Kleene*
• Wildcard expression
• Anchors
• Alphanumeric characters
• Whitespaces
• Python implementation
• Online resources
• Summary



Regular Expressions-Optional Expression
• How can we talk about optional elements, like the optional s in woodchuck and woodchucks?
• For this we use the question mark: /?/ means “the preceding character or nothing”.
• It also means “zero or one instances of the previous character”.
• For example, /woodchucks?/ matches both woodchuck and woodchucks.
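As a minimal sketch, the optional s can be tried out with Python's re module, which the lecture uses later on. The word list and variable names are illustrative assumptions, not part of the slides.

```python
import re

# "s?" makes the final s optional: zero or one occurrence.
pattern = r"woodchucks?"

words = ["woodchuck", "woodchucks", "woodchuckss", "woodchip"]

# fullmatch succeeds only if the whole word fits the pattern.
matched = [w for w in words if re.fullmatch(pattern, w)]
print(matched)  # ['woodchuck', 'woodchucks']
```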



Regular Expressions: * + .
Pattern    Matches                                       Examples
/oo*h/     o, then 0 or more of the previous             oh, ooh, oooh, oooooh
           character (o), then h
/[ab]*/    0 or more a’s and b’s                         abababab, aaaaabbbbb, bbbbb, a
/o+h/      1 or more of the previous character           oh, ooh, ooooooh
           (at least one o), then h
/beg.n/    Wildcard expression: . can match any          begin, begun, began, beg3n, beg’n
           single character (except a newline)
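The behaviour of *, + and . can be checked with a short sketch in Python. The candidate strings and the names cases and results are assumptions chosen for illustration; re.fullmatch is used so that each string must fit the pattern as a whole.

```python
import re

# Each pattern is paired with candidate strings to test.
cases = {
    r"oo*h": ["oh", "ooh", "h"],            # one o, then zero or more o's, then h
    r"[ab]*": ["abab", "", "abc"],          # zero or more a's and b's (empty string included)
    r"o+h": ["oh", "oooh", "h"],            # at least one o, then h
    r"beg.n": ["begin", "beg3n", "begn"],   # exactly one arbitrary character between g and n
}

# Keep only the candidates that the pattern matches in full.
results = {
    pattern: [s for s in strings if re.fullmatch(pattern, s)]
    for pattern, strings in cases.items()
}

for pattern, hits in results.items():
    print(pattern, hits)
```

Note that /[ab]*/ also accepts the empty string, because "zero or more" includes zero occurrences.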



Regular Expressions-Anchors
Pattern          Matches                                   Examples
/^The/           The string The at the start of a line     The KFUEIT is the best
/^[A-Z]/         An uppercase letter at the start of       Pakistan is my beloved Country
                 a line
/$/              The end of a line                         The end? The end!
/the shop\.$/    the shop. at the end of a line            She went to the shop.
                 (\. is a literal period, not the
                 wildcard)
/\bthe\b/        The word the between word boundaries      She went to the shop to buy the
                 (the the inside them does not match)      birthday card for them.
/\b99\b/         99 between word boundaries (the 99 in     These 299 bottles are for a
                 $99 matches, the 99 in 299 does not)      total of $99 only.
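A short Python sketch of the anchors above, reusing the example sentences from the table. The variable names are illustrative assumptions.

```python
import re

sentence = "She went to the shop to buy the birthday card for them."

# \b marks a word boundary, so the "the" inside "them" is not counted.
whole_words = re.findall(r"\bthe\b", sentence)
print(whole_words)  # ['the', 'the']

# Without boundaries, the "the" in "them" is found as well.
substrings = re.findall(r"the", sentence)
print(substrings)  # ['the', 'the', 'the']

# ^ anchors at the start of the line, $ at the end; \. is a literal period.
starts_with_the = bool(re.search(r"^The", "The KFUEIT is the best"))
ends_with_shop = bool(re.search(r"the shop\.$", "She went to the shop."))
print(starts_with_the, ends_with_shop)  # True True

# \b99\b finds the 99 in $99 but not the 99 inside 299.
price = "These 299 bottles are for a total of $99 only."
m = re.search(r"\b99\b", price)
print(price[m.start() - 1])  # $
```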



Regular Expressions-Alphanumeric Characters and Whitespaces

Pattern                       Matches                              Examples
/[a-zA-Z0-9_]/ or /\w/        Any alphanumeric or underscore       KFUEIT
/[^\w]/ or /\W/               A non-alphanumeric                   !!!!!
/[ \t\n\r\f\v]/ or /\s/       Whitespace (space, tab, newline,
                              etc.)
/[^\s]/ or /\S/               Non-whitespace
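The four shorthand classes can be compared on one small string. This is a minimal sketch with an assumed sample string. In Python's re module, \w covers letters, digits and the underscore, but not the space.

```python
import re

sample = "a_1 !\t\n"

word_chars = re.findall(r"\w", sample)   # letters, digits, underscore
non_word = re.findall(r"\W", sample)     # everything \w does not match
spaces = re.findall(r"\s", sample)       # space, tab, newline, ...
non_spaces = re.findall(r"\S", sample)   # everything \s does not match

print(word_chars)   # ['a', '_', '1']
print(non_word)     # [' ', '!', '\t', '\n']
print(spaces)       # [' ', '\t', '\n']
print(non_spaces)   # ['a', '_', '1', '!']
```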



Regular Expressions-Anchors

Pattern       Matches                              Examples
/Python\Z/    Matches if Python is at the very     I like Python                 Match
              end of the string                    I like Python Programming.    No match
                                                   Python is fun.                No match
/^a...s$/     Matches any five-character string    abs                           No match
              that starts with a and ends          alias                         Match
              with s                               abyss                         Match
                                                   Alias                         No match
                                                   An abacus                     No match
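The match results for these two patterns can be reproduced with a short sketch. The dictionary names are assumptions for illustration. The first test string is written without a trailing period, since /Python\Z/ requires Python to be the last characters of the string.

```python
import re

# ^a...s$ : starts with a, three arbitrary characters, ends with s.
five_letter = {
    s: bool(re.search(r"^a...s$", s))
    for s in ["abs", "alias", "abyss", "Alias", "An abacus"]
}
print(five_letter)

# Python\Z : "Python" must sit at the very end of the string.
at_end = {
    s: bool(re.search(r"Python\Z", s))
    for s in ["I like Python", "I like Python Programming.", "Python is fun."]
}
print(at_end)
```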



Python Implementation-Importing required modules
• Python’s re module is used to work with regex.
• It is a built-in library of Python for handling regex.
• The re module must be imported before it can be used.
• The following code snippet imports the re module into a Python program.

import re



Python Implementation-Defining patterns
• A Regular Expression (regex) is a sequence of characters that defines a search pattern.
• E.g.
[pP]ython
• This defines a regex pattern.
• The pattern is: the word python starting with either a small p or a capital P.



Python Implementation-Defining patterns
• Now that the re module is imported, we can start by defining the string pattern to search for.
• The following code snippet can be used for this purpose.

import re
pattern = '[pP]ython'

• The pattern is written as per the rules of regex and is enclosed in single quotes.
• The variable pattern stores the search pattern.
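One practical detail, sketched here as an aside: regex patterns often contain backslashes, so Python code commonly writes them as raw strings (r'...'). The snippet also uses re.compile, which the slides do not cover; it and the variable name compiled are included only as an illustration.

```python
import re

# Raw string: any backslashes reach the regex engine unchanged.
pattern = r"[pP]ython"

# Optional: compile the pattern once and reuse the compiled object.
compiled = re.compile(pattern)

print(compiled.pattern)                          # [pP]ython
print(bool(compiled.search("I love python.")))   # True
print(bool(compiled.search("I love Java.")))     # False
```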



Python Implementation-Text String to Search
• We have now defined the pattern that we want to search for.
• A text corpus must be given in which the search is to be performed.
• An example text string can be created with the following code snippet.

import re
pattern = '[pP]ython'
text_string = "I love Python."



Python Implementation-Search function
• If there is a match anywhere in the string, the search function returns a Match object for the first occurrence.
• The Match object includes the location (index span) of the match in the provided string.
• If there is no match, the search function returns None.
import re
text_string = "I love Python."
x = re.search("[pP]ython", text_string)
print(x)
Output
<_sre.SRE_Match object; span=(7, 13), match='Python'>
(Python 3.7 and later print this as <re.Match object; span=(7, 13), match='Python'>.)

Process finished with exit code 0
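Beyond printing the Match object, its parts can be read individually. The sketch below uses the standard Match methods group(), start(), end() and span(); the second search string and the name missing are assumptions added to show the no-match case.

```python
import re

text_string = "I love Python."
x = re.search(r"[pP]ython", text_string)

if x is not None:               # search returns None when nothing matches
    print(x.group())            # Python
    print(x.start(), x.end())   # 7 13
    print(x.span())             # (7, 13)

# A string without the pattern yields None instead of a Match object.
missing = re.search(r"[pP]ython", "I love Java.")
print(missing)  # None
```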



Python Implementation-match function
• The re.match() function of re in Python checks for a match only at the start of the string.
• If a match is found at the start of the string, it returns the Match object.
• But if the pattern occurs only at some other place in the string, re.match() returns None.

Example 01
import re
text_string = "I love Python."
x = re.match("[pP]ython", text_string)
print(x)
Output
None

Process finished with exit code 0

Example 02
import re
text_string = "Python is my favourite."
x = re.match("[pP]ython", text_string)
print(x)
Output
<_sre.SRE_Match object; span=(0, 6), match='Python'>

Process finished with exit code 0
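To make the difference between the two functions concrete, the following sketch runs re.match and re.search on the same string. The variable names at_start, anywhere and prefix are illustrative assumptions.

```python
import re

text_string = "I love Python."

at_start = re.match(r"[pP]ython", text_string)    # tries position 0 only
anywhere = re.search(r"[pP]ython", text_string)   # scans the whole string

print(at_start)          # None
print(anywhere.group())  # Python

# re.match anchors only at the start; text may follow the match.
prefix = re.match(r"[pP]ython", "Python is my favourite.")
print(prefix.span())     # (0, 6)
```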



Python Implementation-findall function
• The findall() function returns a list containing all matches.
• The list contains the matches in the order they are found.
• If no matches are found, an empty list is returned.
Example
import re
text_string = "I love Python."
x = re.findall('[A-Za-z .]', text_string)
print(x)
Output
['I', ' ', 'l', 'o', 'v', 'e', ' ', 'P', 'y', 't', 'h', 'o', 'n', '.']

Process finished with exit code 0
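The character class in the example matches one character at a time, which is why the output is a list of single characters. As a hedged variation, the sketch below uses assumed sample text to collect whole words and repeated occurrences, and shows the empty-list case.

```python
import re

text_string = "I love Python. python is fun, and Python is popular."

# All occurrences of the pattern, in the order they appear.
names = re.findall(r"[pP]ython", text_string)
print(names)  # ['Python', 'python', 'Python']

# Adding + groups consecutive letters into whole words.
words = re.findall(r"[A-Za-z]+", "I love Python.")
print(words)  # ['I', 'love', 'Python']

# No match at all gives an empty list, not None.
none_found = re.findall(r"[jJ]ava", text_string)
print(none_found)  # []
```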


Python Implementation-Sub Function
• The sub() function replaces each match with the text of your choice.
• The following example code snippet replaces every whitespace character with the digit 9.
Example
import re
text_string = "I love Python."
x = re.sub(r'\s', '9', text_string)
print(x)
Output
I9love9Python.

Process finished with exit code 0
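Two common variations on the example are sketched below: matching runs of whitespace with \s+ and limiting the number of replacements with the count parameter of re.sub. The input strings and variable names are assumptions for illustration.

```python
import re

text_string = "I  love\tPython."   # two spaces, then a tab

# Every single whitespace character is replaced.
each_char = re.sub(r"\s", "9", text_string)
print(each_char)   # I99love9Python.

# \s+ treats each run of whitespace as one match.
collapsed = re.sub(r"\s+", " ", text_string)
print(collapsed)   # I love Python.

# count limits how many matches are replaced.
first_only = re.sub(r"\s", "9", "I love Python.", count=1)
print(first_only)  # I9love Python.
```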


Online Resources
• https://www.programiz.com/python-programming/regex
• https://www.w3schools.com/python/python_regex.asp
• https://docs.python.org/3/library/re.html
• https://docs.python.org/3/howto/regex.html

Summing Up
• Regular expressions are good at representing subsets of natural language.
• But they may be difficult for humans to understand for any real (large) subset of a language.
• They can be hard to scale up, e.g. when there are many choices at any point (e.g. surnames).
• But they are quick, powerful and easy to use for small problems.
