Supplement: Python Regular Expressions
0 Introduction
Often you need to write code to validate user input, for example,
to check whether the input is a number, a string of all
lowercase letters, or a social security number. How do you write
this type of code? A simple and effective way to accomplish this
task is to use regular expressions.
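As a minimal sketch of the kind of validation described above, the helpers below use re.fullmatch, which succeeds only if the entire string matches the pattern. The function names is_number and is_lowercase_word are hypothetical, chosen for illustration; they are not part of the re module.

```python
import re

def is_number(s):
    # Hypothetical helper: True if s is one or more digits and nothing else
    return re.fullmatch(r"\d+", s) is not None

def is_lowercase_word(s):
    # Hypothetical helper: True if every character of s is a lowercase ASCII letter
    return re.fullmatch(r"[a-z]+", s) is not None

print(is_number("2024"))           # True
print(is_number("20a4"))           # False
print(is_lowercase_word("hello"))  # True
print(is_lowercase_word("Hello"))  # False
```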
1 Getting Started
To use regular expressions, import the re module. You can use the
module's split function to split a string. For example,
re.split(" ", "ab bc cd") returns ['ab', 'bc', 'cd'].
Table 1: Frequently Used Regular Expressions
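NOTE
Recall that a whitespace (or a whitespace character)
is any character that does not display itself but
does take up space. The characters ' ', '\t', '\n',
'\r', '\f', and '\v' are whitespace characters. So \s
is the same as [ \t\n\r\f\v], and \S is the same as
[^ \t\n\r\f\v]. (Strictly, this holds when the
re.ASCII flag is used; by default, \s in Python also
matches other Unicode whitespace characters.)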
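The sketch below checks the whitespace equivalence with re.findall. It passes the re.ASCII flag so that \s means exactly [ \t\n\r\f\v]; the last two lines show that without the flag \s also matches a Unicode no-break space (U+00A0).

```python
import re

text = "a b\tc\nd\re\ff\vg"
# With re.ASCII, \s matches exactly the six characters in [ \t\n\r\f\v]
print(re.findall(r"\s", text, re.ASCII))  # [' ', '\t', '\n', '\r', '\x0c', '\x0b']
# \S matches every other character
print(re.findall(r"\S", text, re.ASCII))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Without re.ASCII, Unicode whitespace such as a no-break space also counts
print(re.fullmatch(r"\s", "\u00a0") is not None)            # True
print(re.fullmatch(r"\s", "\u00a0", re.ASCII) is not None)  # False
```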
NOTE
A word character is any letter, digit, or the
underscore character. So \w is the same as
[a-zA-Z0-9_], and \W is the same as [^a-zA-Z0-9_].
(As with \s, this holds under the re.ASCII flag; by
default, \w also matches Unicode letters and digits
such as é.)
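A short sketch of \w and \W with re.findall. The comparison line confirms that, under re.ASCII, \w+ finds the same runs as [a-zA-Z0-9_]+; the last two lines show the default Unicode behavior on the word "café".

```python
import re

sample = "hello_world, 42 times!"
print(re.findall(r"\w+", sample))  # ['hello_world', '42', 'times']
print(re.findall(r"\W", sample))   # [',', ' ', ' ', '!']
# Under re.ASCII, \w+ and [a-zA-Z0-9_]+ find the same runs
print(re.findall(r"\w+", sample, re.ASCII) == re.findall(r"[a-zA-Z0-9_]+", sample))  # True

# By default \w also matches non-ASCII letters
print(re.findall(r"\w+", "café"))            # ['café']
print(re.findall(r"\w+", "café", re.ASCII))  # ['caf']
```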
NOTE
The last six entries *, +, ?, {n}, {n,}, and {n,m}
in Table 1 are quantifiers: they specify how many
times the pattern before the quantifier may repeat.
For example, A* matches zero or more A’s, A+ matches
one or more A’s, A? matches zero or one A, A{3}
matches exactly AAA, A{3,} matches at least three
A’s, and A{3,6} matches between 3 and 6 A’s. * is
the same as {0,}, + is the same as {1,}, and ? is the
same as {0,1}.
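The quantifier examples above can be checked with re.fullmatch, which requires the whole string to match. The helper name matches is a hypothetical convenience for this sketch.

```python
import re

def matches(pattern, s):
    # Hypothetical helper: True if the entire string s matches pattern
    return re.fullmatch(pattern, s) is not None

print(matches("A*", ""))             # True: zero A's is allowed
print(matches("A+", ""))             # False: at least one A is required
print(matches("A?", "AA"))           # False: at most one A
print(matches("A{3}", "AAA"))        # True
print(matches("A{3,}", "AAAAA"))     # True
print(matches("A{3,6}", "AAAAAAA"))  # False: seven A's is too many

# * behaves like {0,}
for s in ["", "A", "AAAA"]:
    print(matches("A*", s) == matches("A{0,}", s))  # True each time
```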
CAUTION
Do not use spaces in the repeat quantifiers. For
example, A{3,6} cannot be written as A{3, 6} with a
space after the comma.
NOTE
You may use parentheses to group patterns. For
example, (ab){3} matches ababab, but ab{3} matches
abbb.
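A quick check of the grouping note: the quantifier applies to the whole parenthesized group in (ab){3}, but only to the single character b in ab{3}.

```python
import re

print(re.fullmatch("(ab){3}", "ababab") is not None)  # True
print(re.fullmatch("ab{3}", "abbb") is not None)      # True
print(re.fullmatch("ab{3}", "ababab") is not None)    # False
# A repeated group remembers only its last repetition
print(re.fullmatch("(ab){3}", "ababab").group(1))     # ab
```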
The pattern \d*[02468] matches an even number: zero or more
digits followed by one even digit. For example, "122" matches
"\d*[02468]", but "123" does not match "\d*[02468]".
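Note that the parentheses ( and ) are special characters in a
regular expression: they group patterns. To match a literal ( or
), escape it as \( or \). In an ordinary Python string literal
the backslash itself must be doubled, so you write "\\(" and
"\\)"; in a raw string you can write r"\(" and r"\)".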
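As an illustration of escaping parentheses, the sketch below matches a telephone number in the format (xxx) xxx-xxxx. That format is an assumption chosen for this example. The last line confirms that the raw-string pattern and the doubled-backslash string are the same text.

```python
import re

# Assumed format (xxx) xxx-xxxx; \( and \) match literal parentheses
phone = r"\(\d{3}\) \d{3}-\d{4}"
print(re.fullmatch(phone, "(912) 921-2728") is not None)  # True
print(re.fullmatch(phone, "912 921-2728") is not None)    # False: no parentheses

# The same pattern written as an ordinary string needs doubled backslashes
print(phone == "\\(\\d{3}\\) \\d{3}-\\d{4}")              # True
```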
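The pattern [A-Z][a-zA-Z]{1,24} matches a name of 2 to 25 letters
whose first letter is uppercase. For example, "Smith" matches
"[A-Z][a-zA-Z]{1,24}", but "smith" does not.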
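A sketch that exercises the bounds of [A-Z][a-zA-Z]{1,24}: one uppercase letter followed by 1 to 24 letters, so 2 to 25 letters in total.

```python
import re

name = "[A-Z][a-zA-Z]{1,24}"
print(re.fullmatch(name, "Smith") is not None)          # True
print(re.fullmatch(name, "smith") is not None)          # False: first letter must be uppercase
print(re.fullmatch(name, "S") is not None)              # False: at least two letters
print(re.fullmatch(name, "S" + "a" * 24) is not None)   # True: 25 letters
print(re.fullmatch(name, "S" + "a" * 25) is not None)   # False: 26 letters
```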
Example 6: What strings are matched by the regular expression
"Welcome to (XHTML|HTML)"? The answer is "Welcome to XHTML" or
"Welcome to HTML".
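A short sketch of the alternation in Example 6: the group (XHTML|HTML) matches either alternative, and group(1) reports which one matched.

```python
import re

pattern = "Welcome to (XHTML|HTML)"
for s in ["Welcome to XHTML", "Welcome to HTML", "Welcome to XML"]:
    m = re.fullmatch(pattern, s)
    # Print the matched alternative, or None if the string does not match
    print(s, "->", m.group(1) if m else None)
```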
import re

# Pattern for a social security number: ddd-dd-dddd
regex = r"\d{3}-\d{2}-\d{4}"
ssn = input("Enter SSN: ")
# re.match tries to match the pattern at the beginning of the string
match1 = re.match(regex, ssn)
if match1 is not None:
    print(ssn, "is a valid SSN")
    print("start position of the matched text is " +
          str(match1.start()))
    print("start and end position of the matched text is " +
          str(match1.span()))
else:
    print(ssn, "is not a valid SSN")
Sample Output
Enter SSN: 434-32-3243
434-32-3243 is a valid SSN
start position of the matched text is 0
start and end position of the matched text is (0, 11)
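Because re.match only requires the pattern to match at the start of the input, the program above would also accept input with extra trailing characters. A minimal sketch of the difference, assuming stricter validation is wanted, uses re.fullmatch, which requires the whole input to match:

```python
import re

regex = r"\d{3}-\d{2}-\d{4}"
# re.match accepts trailing text after a valid prefix
print(re.match(regex, "434-32-32431") is not None)      # True
# re.fullmatch rejects it
print(re.fullmatch(regex, "434-32-32431") is not None)  # False
print(re.fullmatch(regex, "434-32-3243").span())        # (0, 11)
```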
Listing 2 SearchDemo.py
import re

# Pattern for a social security number: ddd-dd-dddd
regex = r"\d{3}-\d{2}-\d{4}"
text = input("Enter a text: ")
# re.search scans the whole string for the first place the pattern matches
match1 = re.search(regex, text)
if match1 is not None:
    print(text, "contains a SSN")
    print("start position of the matched text is " +
          str(match1.start()))
    print("start and end position of the matched text is " +
          str(match1.span()))
else:
    print(text, "does not contain a SSN")
Sample Output
Enter a text: The ssn for Smith is 343-34-3490
The ssn for Smith is 343-34-3490 contains a SSN
start position of the matched text is 21
start and end position of the matched text is (21, 32)
Sample Output
Enter a text: Smith's ssn is 343.34.3434
Smith's ssn is 343.34.3434 does not contain a SSN
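The sketch below contrasts the two listings on the same text: re.match returns None because the text does not begin with an SSN, while re.search finds it at position 21. It also uses re.findall, which returns every non-overlapping match as a list of strings.

```python
import re

regex = r"\d{3}-\d{2}-\d{4}"
text = "The ssn for Smith is 343-34-3490"
print(re.match(regex, text))          # None: the text does not start with an SSN
print(re.search(regex, text).span())  # (21, 32)
# findall collects all non-overlapping matches
print(re.findall(regex, "343-34-3490 and 111-22-3333"))  # ['343-34-3490', '111-22-3333']
```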
4 Flags