Advanced Python 3.8
hands-on exercises
WEBUCATOR
Copyright © 2020 by Webucator. All rights reserved.
No part of this manual may be reproduced or used in any manner without written permission of the copyright owner.
Version: PYT238.1.1.1
Class Files
https://www.webucator.com/class-files/index.cfm?versionId=4765.
Errata
Table of Contents
LESSON 1. Advanced Python Concepts
Lambda Functions
Advanced List Comprehensions
Exercise 1: Rolling Five Dice
Collections Module
Exercise 2: Creating a defaultdict
Counters
Exercise 3: Creating a Counter
Sorting
Creating a Dictionary from Two Sequences
Exercise 6: Green Glass Door
LESSON 3. Working with Data
Virtual Environment
Relational Databases
Passing Parameters
SQLite
Exercise 7: Querying a SQLite Database
SQLite Database in Memory
Exercise 8: Inserting File Data into a Database
CSV
Exercise 9: Finding Data in a CSV File
JSON
Exercise 13: Comparing Times to Execute
LESSON 5. Classes and Objects
Attributes
Behaviors
Classes vs. Objects
Attributes and Methods
Exercise 15: Adding a roll() Method to Die
Private Attributes
Properties
Inheritance
Static Methods
Understanding Decorators
Lambda functions.
Advanced list comprehensions.
Sorting sequences.
Unpacking sequences in function calls.
Packing the basket was not quite such pleasant work as unpacking the basket. It never is.
– The Wind in the Willows, Kenneth Grahame
Introduction
In this lesson, you will learn about some Python functionality and techniques that are commonly used
Lambda Functions
Lambda functions are anonymous functions that are generally used to complete a small task, after which they are no longer needed. The syntax for creating a lambda function is:
lambda arguments: expression
Lambda functions are almost always used within other functions, but for demonstration purposes, we
could assign a lambda function to a variable, like this:
f = lambda n: n**2
f(5) # Returns 25
f(2) # Returns 4
Try it at the Python terminal:
>>> f = lambda n: n**2
>>> f(5)
25
>>> f(2)
4
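Because lambda functions are usually passed to other functions rather than assigned to variables, a small sketch of that pattern may help. The helper apply_to_all() is hypothetical and exists only for this illustration:

```python
# Hypothetical helper: applies a one-argument function to every item.
def apply_to_all(func, items):
    return [func(item) for item in items]

# The lambda is created inline and discarded after the call.
squares = apply_to_all(lambda n: n**2, [1, 2, 3, 4])
print(squares)  # [1, 4, 9, 16]
```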
Advanced List Comprehensions
Before we get into advanced list comprehensions, let’s do a quick review. The basic syntax for a list comprehension is:
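As a general reminder, and not necessarily in the manual's own wording, a list comprehension has the shape [expression for item in iterable if condition], where the if clause is optional. A minimal sketch with made-up data:

```python
nums = [1, 2, 3, 4, 5, 6]

# expression: n**2, iterable: nums, condition: n is even
even_squares = [n**2 for n in nums if n % 2 == 0]
print(even_squares)  # [4, 16, 36]
```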
Here is an example from earlier, in which we create a list by filtering another list:
Demo 1: advanced-python-concepts/Demos/sublist_from_list.py
1. def main():
2.     words = ['Woodstock', 'Gary', 'Tucker', 'Gopher', 'Spike', 'Ed',
3.              'Faline', 'Willy', 'Rex', 'Rhino', 'Roo', 'Littlefoot',
4.              'Bagheera', 'Remy', 'Pongo', 'Kaa', 'Rudolph', 'Banzai',
6.     three_letter_words = [w for w in words if len(w) == 3]
7.     print(three_letter_words)
8.
9. main()
Code Explanation
This will return:
And here is a new example, in which we map all the elements in one list to another using a function:
Demo 2: advanced-python-concepts/Demos/list_comp_mapping.py
1. def get_inits(name):
4.     # Join inits list on "." and append "." to end
6.
7. def main():
10.
11.     # Create list by mapping person elements to get_inits()
14.
15. main()
Code Explanation
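Only fragments of the demo survive, so the following is a sketch of what such a mapping could look like rather than the demo itself. The body of get_inits() and the people list are assumptions built around the two surviving comments:

```python
def get_inits(name):
    # Assumed step: take the first letter of each part of the name
    inits = [part[0] for part in name.split()]
    # Join inits list on "." and append "." to end
    return '.'.join(inits) + '.'

# Illustrative data, not taken from the demo file
people = ['George Washington', 'John Adams', 'John Quincy Adams']

# Create list by mapping person elements to get_inits()
inits = [get_inits(person) for person in people]
print(inits)  # ['G.W.', 'J.A.', 'J.Q.A.']
```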
Assume that you need to create a list of tuples showing the possible permutations of rolling two six-sided dice. When dealing with permutations, order matters, so (1, 2) and (2, 1) are not the same. First, let’s look at how we would do this without a list comprehension, using a nested for loop:
Demo 3: advanced-python-concepts/Demos/dice_rolls.py
def main():
    dice_rolls = []
    for a in range(1, 7):
        for b in range(1, 7):
            roll = (a, b)
            dice_rolls.append(roll)

    print(dice_rolls)

main()
Code Explanation
List comprehensions can include multiple for loops with each subsequent loop nested within the previous loop. This provides an easy way to create something similar to a two-dimensional array or a matrix:
Demo 4: advanced-python-concepts/Demos/dice_rolls_list_comp.py
def main():
    dice_rolls = [
        (a, b)
        for a in range(1, 7)
        for b in range(1, 7)
    ]

    print(dice_rolls)

main()
Code Explanation
This code will create the same list of tuples containing all the possible permutations of two dice rolls.
Notice that the list of permutations contains what game players would consider duplicates. For example, (1, 2) and (2, 1) are considered the same in dice. We can remove these pseudo-duplicates by starting the second for loop with the current value of a in the first for loop. Let’s do this first without a list comprehension:
Demo 5: advanced-python-concepts/Demos/dice_combos.py
def main():
    dice_rolls = []
    for a in range(1, 7):
        for b in range(a, 7):
            roll = (a, b)
            dice_rolls.append(roll)

    print(dice_rolls)

main()
Code Explanation
The first time through the outer loop, the inner loop from 1 to 7 (not including 7), the second time
The dice_rolls list will now contain the different possible rolls (from a dice rolling point of view):
[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
 (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
 (3, 3), (3, 4), (3, 5), (3, 6),
 (4, 4), (4, 5), (4, 6),
 (5, 5), (5, 6),
 (6, 6)]
Where we previously showed permutations, in which order matters, we are now showing combinations, in which order does not matter. The following two tuples represent different permutations, but the
Now, let’s see how we can do the same thing with a list comprehension:
9.
10. main()
Code Explanation
This code will create the same list of tuples containing all the possible combinations of two dice rolls.
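A minimal sketch of the combinations version as a list comprehension, assuming it mirrors the nested-loop code in Demo 5 (the inner loop starts at the current value of a):

```python
dice_rolls = [
    (a, b)
    for a in range(1, 7)
    for b in range(a, 7)  # start at a so (2, 1) is skipped once (1, 2) exists
]

print(len(dice_rolls))  # 21
print(dice_rolls[:3])   # [(1, 1), (1, 2), (1, 3)]
```

Starting the second range at a is the only difference from the permutations version, which produces 36 tuples.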
Exercise 1: Rolling Five Dice
10 to 15 minutes
There is no limit to the number of for loops in a list comprehension, so we can use this same technique to get the possibilities for more than two dice.
sions.py.
2. Write two separate list comprehensions:
A. The first should create five-item tuples for all unique permutations from rolling five identical six-sided dice. Remember, when looking for permutations, order matters.
B. The second should create five-item tuples for all unique combinations from rolling five identical six-sided dice. Remember, when looking for combinations, order doesn’t matter.
Solution: advanced-python-concepts/Solutions/list_comprehensions.py
1. # Get unique permutations:
2. dice_rolls_p = [(a, b, c, d, e)
3.                 for a in range(1, 7)
4.                 for b in range(1, 7)
5.                 for c in range(1, 7)
6.                 for d in range(1, 7)
7.                 for e in range(1, 7)]
8.
10.
11. # Get unique combinations:
15.                 for c in range(b, 7)
18.
Code Explanation
Collections Module
The collections module includes specialized containers (objects that hold data) that provide more specific functionality than Python’s built-in containers (list, tuple, dict, and set). Some of the more useful containers are named tuples (created with the namedtuple() function), defaultdict, and Counter.
Named Tuples
Imagine you are creating a game in which you need to set and get the position of a target. You could
do this with a regular tuple like this:
A named tuple allows you to reference target_pos.x, which is more meaningful and helpful. Here is a simplified signature for creating namedtuple objects:
namedtuple(typename, field_names)
1. typename – The value passed in for typename will be the name of a new tuple subclass. It is standard for the name of the new subclass to begin with a capital letter. We have not yet covered classes and subclasses. For now, it is enough to know that the new tuple subclass created by namedtuple() will inherit all the properties of a tuple, and also make it possible to refer to its elements by name.
2. field_names – The value for field_names can either be a whitespace-delimited string (e.g., 'x y'), a comma-delimited string (e.g., 'x, y'), or a sequence of strings (e.g., ['x', 'y']).
Demo 7: advanced-python-concepts/Demos/namedtuple.py
1. from collections import namedtuple
2.
3. Point = namedtuple('Point', 'x, y')
4.
5. # Set target position:
7.
8. # Get x value of target position
9. print(target_pos.x)
Code Explanation
As the preceding code shows, the namedtuple() function allows you to give a name to the elements
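A short sketch of how a named tuple behaves, assuming made-up coordinates (100, 200) for the target position; those values are not from the demo file:

```python
from collections import namedtuple

Point = namedtuple('Point', 'x, y')

# Set target position (illustrative coordinates)
target_pos = Point(100, 200)

print(target_pos.x)   # 100, access by name
print(target_pos[0])  # 100, access by index still works
x, y = target_pos     # unpacks like a regular tuple
```

Because Point is a tuple subclass, target_pos is still immutable; assigning to target_pos.x raises an AttributeError.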
Default Dictionaries (defaultdict)
With regular dictionaries, trying to modify a key that doesn’t exist will cause an exception. For example,
the following code will result in a KeyError:
foo = {}
foo['bar'] += 1
A defaultdict is like a regular dictionary except that, when you look up a key that doesn’t exist, it creates the key and assigns it the value returned by a function specified when creating it.
To illustrate how a defaultdict can be useful, let’s see how we would create a regular dictionary that shows the number of different ways each number (2 through 12) can be rolled when rolling two dice, like this:
{
    2: 1,
    3: 2,
    4: 3,
    5: 4,
    6: 5,
    7: 6,
    8: 5,
    9: 4,
    10: 3,
    11: 2,
    12: 1
}
There are five ways to roll a 6: (1, 5), (2, 4), (3, 3), (4, 2), and (5, 1).
dice_rolls = [
    (a, b)
    for a in range(1, 7)
    for b in range(1, 7)
]
roll_counts = {}
for roll in dice_rolls:
    if sum(roll) in roll_counts:
        roll_counts[sum(roll)] += 1
    else:
        roll_counts[sum(roll)] = 1
This method works fine and gives us the following roll_counts dictionary that we looked at earlier:
{
    2: 1,
    3: 2,
    4: 3,
    5: 4,
    6: 5,
    7: 6,
    8: 5,
    9: 4,
    10: 3,
    11: 2,
    12: 1
}
An alternative to using conditionals to make sure the key exists is to just go ahead and try to increment the value of each potential key we find and then, if we get a KeyError, assign 1 for that key, like this:
roll_counts = {}
for roll in dice_rolls:
    try:
        roll_counts[sum(roll)] += 1
    except KeyError:
        roll_counts[sum(roll)] = 1
But with a defaultdict, we don’t need the if-else block or the try-except block. The code looks like this:
from collections import defaultdict
roll_counts = defaultdict(int)
for roll in dice_rolls:
    roll_counts[sum(roll)] += 1
The result is a defaultdict object that can be treated just like a normal dictionary:
defaultdict(<class 'int'>, {
    2: 1,
    3: 2,
    4: 3,
    5: 4,
    6: 5,
    7: 6,
    8: 5,
    9: 4,
    10: 3,
    11: 2,
    12: 1
})
Demo 9: advanced-python-concepts/Demos/dict_try_except.py
5. roll_counts = {}
8.         roll_counts[sum(roll)] += 1
9.     except KeyError:
10.         roll_counts[sum(roll)] = 1
1. from collections import defaultdict
9.     roll_counts[sum(roll)] += 1
roll_counts = defaultdict(int)
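Putting the pieces together, a self-contained sketch of the defaultdict approach might look like this. It reuses the dice_rolls comprehension shown earlier and is not a copy of the demo file:

```python
from collections import defaultdict

dice_rolls = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# int() returns 0, so every new key starts at 0 before being incremented
roll_counts = defaultdict(int)
for roll in dice_rolls:
    roll_counts[sum(roll)] += 1

print(roll_counts[7])  # 6
print(roll_counts[2])  # 1
```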
Remember, when you try to look up a key that doesn’t exist in a defaultdict, it creates the key and assigns it the value returned by a function you specified when creating it. In this case, that function is int().
When passing the function to defaultdict(), you do not include parentheses, because you are not calling the function at the time you pass it to defaultdict(). Rather, you are specifying that you want to use this function to give you default values for new keys. By passing int, we are stating that we want new keys to have a default value of whatever int() returns when no argument is passed to it. That value is 0:
>>> int()
0
You can create default dictionaries with any number of functions, both built-in and user-defined:
a = defaultdict(list) # Default key value will be []
def foo():
    return 'bar'
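As a sketch of how different default factories behave, assuming illustrative key names and a lambda factory that are not in the original listing:

```python
from collections import defaultdict

a = defaultdict(list)        # Default key value will be []
a['fruits'].append('apple')  # 'fruits' is created as [] and then appended to

def foo():
    return 'bar'

b = defaultdict(foo)         # Default key value will be 'bar'
print(b['anything'])         # bar

c = defaultdict(lambda: 'n/a')  # A lambda works as a factory too
print(c['missing'])             # n/a
```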
defaultdict(<class 'list'>,
    {
        'OF': ['Earle Combs', 'Cedric Durst', 'Bob Meusel',
        'Wilcy Moore', 'Herb Pennock', 'George Pipgras',
    })
yankees_1927 = [
    {'position': 'P', 'name': 'Walter Beall'},
    {'position': 'C', 'name': 'Benny Bengough'},
    {'position': 'C', 'name': 'Pat Collins'},
    {'position': 'OF', 'name': 'Earle Combs'},
    {'position': '3B', 'name': 'Joe Dugan'},
    {'position': 'OF', 'name': 'Cedric Durst'},
    {'position': '3B', 'name': 'Mike Gazella'},
    {'position': '1B', 'name': 'Lou Gehrig'},
    {'position': 'C', 'name': 'Johnny Grabowski'},
    {'position': 'P', 'name': 'Bob Shawkey'},
]
2. Write code so that the script creates the defaultdict above from the given list.
Solution: advanced-python-concepts/Solutions/defaultdict.py
1. from collections import defaultdict
2.
3. yankees_1927 = [
4.     {'position': 'P', 'name': 'Walter Beall'},
5.     {'position': 'C', 'name': 'Benny Bengough'},
6.     {'position': 'C', 'name': 'Pat Collins'},
7.     {'position': 'OF', 'name': 'Earle Combs'},
8.     {'position': '3B', 'name': 'Joe Dugan'},
15.     {'position': 'SS', 'name': 'Mark Koenig'},
18.     {'position': 'P', 'name': 'Wilcy Moore'},
23.     {'position': 'P', 'name': 'Dutch Ruether'},
29. ]
30.
34. # Loop through list of yankees appending player names to their position keys
36.     positions[player['position']].append(player['name'])
37.
38. print(positions['P'])
Code Explanation
    'Urban Shocker',
    'Myles Thomas'
]
Add the following line of code to the for loop to watch as the players get added:
print(player['position'], positions[player['position']])
P ['Walter Beall']
C ['Benny Bengough']
C ['Benny Bengough', 'Pat Collins']
OF ['Earle Combs']
3B ['Joe Dugan']
1B ['Lou Gehrig']
…
Counters
Consider again the defaultdict object we created to get the number of different ways each number could be rolled when rolling two dice. This type of task is very common. You might have a collection of plants and want to get a count of the number of each species or the number of plants by color. The objects that hold these counts are called counters, and the collections module includes a special Counter() class for creating them.
Although there are different ways of creating counters, they are most often created with an iterable, like this:
from collections import Counter
c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])
This will create the following counter:
Counter({
    'blue': 3,
    'green': 2,
    'red': 1,
    'yellow': 1
})
To create a counter from the dice_rolls list we used earlier, we need to first create a list of sums
roll_sums will contain the following list:
[
    2, 3, 4, 5, 6, 7,
    3, 4, 5, 6, 7, 8,
    4, 5, 6, 7, 8, 9,
    5, 6, 7, 8, 9, 10,
    6, 7, 8, 9, 10, 11,
    7, 8, 9, 10, 11, 12
]
c = Counter(roll_sums)
That creates a counter that is very similar to the defaultdict we saw earlier:
    11: 2,
    2: 1,
    12: 1
})
The code in the following file creates and outputs the colors and dice rolls counters:
4.
5. dice_rolls = [(a,b)
6.               for a in range(1,7)
7.               for b in range(1,7)]
8.
Colors Counter:
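A self-contained sketch of building the dice counter. The way roll_sums is built here (a list comprehension over dice_rolls) is an assumption, since that line of the original is not shown:

```python
from collections import Counter

dice_rolls = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# Assumed construction of roll_sums: one sum per roll
roll_sums = [sum(roll) for roll in dice_rolls]

c = Counter(roll_sums)
print(c[7])   # 6
print(c[12])  # 1
```

Unlike a plain dict, a Counter returns 0 for a key it has never seen (for example c[13]) instead of raising a KeyError.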
Updating Counters
Counter is a subclass of dict. We will learn more about subclasses later, but for now all you need to
understand is that a subclass generally has access to all of its superclass’s methods and data. So, Counter
supports all the standard dict instance methods. The update() method behaves differently though.
In standard dict objects, update() replaces key values with those of the passed-in dictionary:
grades.update({'Math':97, 'Gym':93})
{
    'English': 97,
    'Math': 97, # 97 replaces 93
    'Art': 74,
    'Music': 86,
    'Gym': 93
}
In Counter objects, update() adds the values of a passed-in iterable or another Counter object to its own values:
>>> c # Before update:
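The difference between the two update() behaviors can be sketched as follows. The starting grades dictionary is inferred from the result shown above, and the items passed to the counter's update() and subtract() are made up for illustration:

```python
from collections import Counter

# dict.update() replaces existing values and adds new keys
grades = {'English': 97, 'Math': 93, 'Art': 74, 'Music': 86}
grades.update({'Math': 97, 'Gym': 93})
print(grades['Math'])  # 97

# Counter.update() adds to the existing counts
c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])
c.update(['red', 'yellow', 'purple'])
print(c['red'])  # 2

# Counter.subtract() subtracts counts and can go to zero or below
c.subtract(['red', 'red', 'yellow', 'yellow', 'yellow'])
print(c['red'], c['yellow'])  # 0 -1
```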
Counters also have a corresponding subtract() method. It works just like update() but subtracts
Counter({'blue': 3, 'green': 2, 'red': 0, 'yellow': -1, 'purple': -1})
Notice that the values for the 'yellow' and 'purple' keys are negative, which is a little odd. We will learn
Counters include a most_common([n]) method that returns the n most common elements and their counts, sorted from most to least common. If n is not passed in, all elements are returned.
>>> c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])
>>> c.most_common()
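A small sketch of most_common() with and without n, using the same colors list:

```python
from collections import Counter

c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])

all_colors = c.most_common()   # every element, most common first
top_two = c.most_common(2)     # only the two most common

print(all_colors)  # [('blue', 3), ('green', 2), ('red', 1), ('yellow', 1)]
print(top_two)     # [('blue', 3), ('green', 2)]
```

Elements with equal counts keep the order in which they were first encountered, which is why 'red' precedes 'yellow' here.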
Exercise 3: Creating a Counter
10 to 15 minutes
In this exercise, you will create a counter that holds the most common words used and the number of times they show up in the U.S. Declaration of Independence.
A. Reads the Declaration_of_Independence.txt file in the same folder.
B. Creates a list of all the words that have at least six characters.
Use split() to split the text into words. This will split the text on
Use upper() to convert the words to uppercase.
D. Outputs the most common ten words and their counts. The result should look like this:
[
    ('PEOPLE', 13),
    ('STATES', 7),
    ('SHOULD', 5),
    ('INDEPENDENT', 5),
    ('AGAINST', 5),
    ('GOVERNMENT,', 4),
    ('ASSENT', 4),
    ('OTHERS', 4),
    ('POLITICAL', 3),
    ('POWERS', 3)
]
Solution: advanced-python-concepts/Solutions/counter.py
from collections import Counter

with open('Declaration_of_Independence.txt') as f:
    doi = f.read()

word_list = [word for word in doi.upper().split() if len(word) > 5]

c = Counter(word_list)
print(c.most_common(10))
map(function, iterable, …)
The built-in map() function is used to sequentially pass all the values of an iterable (or multiple iterables) to a function and return an iterator containing the returned values. It can be used as an alternative to list comprehensions. First, consider the following code sample that does not use map():
Demo 12: advanced-python-concepts/Demos/without_map.py
2.     return x * y
3.
4. def main():
7.
8.     multiples = []
9.     for i in range(len(nums1)):
11.         multiples.append(multiple)
12.
13.     for multiple in multiples:
14.         print(multiple)
15.
16. main()
24
25
24
21
16
9
The following code sample does the same thing using map():
2.     return x * y
3.
4. def main():
9.
11.         print(multiple)
12.
13. main()
Code Explanation
We could also include the map() function right in the for loop:
for multiple in map(multiply, nums1, nums2):
    print(multiple)
Note that you can accomplish the same thing with a list comprehension:
multiples = [multiply(nums1[i], nums2[i]) for i in range(len(nums1))]
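A runnable sketch of the map() approach next to the list comprehension. The definitions of nums1 and nums2 are assumptions chosen to be consistent with the output fragments shown in this lesson:

```python
def multiply(x, y):
    return x * y

# Assumed inputs: 0..9 paired with 10..1
nums1 = range(0, 10)
nums2 = range(10, 0, -1)

# map() returns an iterator, so wrap it in list() to see all values at once
multiples = list(map(multiply, nums1, nums2))
print(multiples)  # [0, 9, 16, 21, 24, 25, 24, 21, 16, 9]

# The equivalent list comprehension
multiples_lc = [multiply(nums1[i], nums2[i]) for i in range(len(nums1))]
```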
One possible advantage of using map() in combination with multiple sequences is that it will not error if the sequences are different lengths. It will stop mapping when it reaches the end of the shortest sequence. In some cases, this might also be a disadvantage as it might hide a bug in the code. Also, this
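The stop-at-the-shortest behavior can be seen in a short sketch with two made-up lists of different lengths:

```python
def multiply(x, y):
    return x * y

short_nums = [1, 2, 3]
long_nums = [10, 20, 30, 40, 50]

# map() silently stops after the third pair
results = list(map(multiply, short_nums, long_nums))
print(results)  # [10, 40, 90]

# Indexing by the longer list's length would fail instead
try:
    [multiply(short_nums[i], long_nums[i]) for i in range(len(long_nums))]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```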
filter(function, iterable)
The built-in filter() function is used to sequentially pass all the values of a single iterable to a function and return an iterator containing the values for which the function returns a true value. As with map(), filter() can be used as an alternative to list comprehensions. First, consider the following code sample that does not use filter():
Demo 14: advanced-python-concepts/Demos/without_filter.py
1. def is_odd(num):
2.     return num % 2
3.
4. def main():
7.     odd_nums = []
10.             odd_nums.append(num)
11.
13.         print(num)
14.
15. main()
This code passes a range of numbers one by one to the is_odd() function to create an iterable (a list) of odd numbers. It then loops through the list printing each result. It will output:
1
3
5
7
9
The following code sample does the same thing using filter():
1. def is_odd(num):
2.     return num % 2
3.
4. def main():
6.
10.         print(num)
11.
12. main()
Code Explanation
As with map(), we can include the filter() function right in the for loop:
    print(num)
Again, you can accomplish the same thing with a list comprehension:
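A sketch of filter() beside the list comprehension alternative. The input range(10) is an assumption consistent with the output 1, 3, 5, 7, 9 shown above:

```python
def is_odd(num):
    # Returns 1 (truthy) for odd numbers and 0 (falsy) for even numbers
    return num % 2

nums = range(10)  # assumed input

# filter() returns an iterator of the values for which is_odd() is truthy
odd_nums = list(filter(is_odd, nums))
print(odd_nums)  # [1, 3, 5, 7, 9]

# The equivalent list comprehension
odd_nums_lc = [num for num in nums if is_odd(num)]
```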
Using Lambda Functions with map() and filter()
The map() and filter() functions are both often used with lambda functions, like this:
...
0
9
16
21
24
25
24
21
16
9
...     print(num)
...
1
3
5
7
9
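The session that produced the output above is only partly preserved. A sketch that yields the same numbers, assuming the inputs range(0, 10), range(10, 0, -1), and range(10):

```python
# map() with a two-argument lambda and two iterables
products = list(map(lambda x, y: x * y, range(0, 10), range(10, 0, -1)))
print(products)  # [0, 9, 16, 21, 24, 25, 24, 21, 16, 9]

# filter() with a lambda that is truthy for odd numbers
odds = list(filter(lambda n: n % 2, range(10)))
print(odds)  # [1, 3, 5, 7, 9]
```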
Some programmers, including Guido van Rossum, the creator of Python, dislike lambda, filter() and map(). These programmers feel that list comprehensions can generally be used instead. However, other programmers love these functions and Guido eventually gave up the
1. https://mail.python.org/pipermail/python-dev/2006-February/060415.html
>>> v1 = 'A'
>>> v2 = 'A'
>>> v1 is v2
True
>>> list1 is list2
False
Immutable objects cannot be modified in place. Every time you “change” a string, you are actually creating a new string:
2613698710320
>>> id(name)
2613698710512
Notice the ids are different. It is impossible to modify an immutable object in place.
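The same point can be checked without comparing raw id values, which differ from run to run. In this sketch the string values are made up, and a second variable keeps a reference to the original object so the two can be compared directly:

```python
name = 'Nat'
original = name         # second reference to the same string object

name = name + 'haniel'  # builds a new string; the old one is untouched

print(original)          # Nat
print(name)              # Nathaniel
print(name is original)  # False: name now points at a different object
```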
Lists, on the other hand, are mutable and can be modified in place. For example:
>>> v1 = [1, 2]
>>> v2 = v1
>>> id(v1) == id(v2)
True # Both variables point to the same list
>>> v1, v2
([1, 2], [1, 2])
>>> v2 += [3]
>>> v1, v2
([1, 2, 3], [1, 2, 3])
>>> id(v1) == id(v2)
True # Both variables still point to the same list
Notice that with lists, v1 changes when we change v2. Both are pointing at the same list object, which is mutable. So, when we modify the v2 list, we see the change in v1, because it points to the same object.
Be careful though. If you use the assignment operator, you will overwrite the old list and create a new list object:
>>> v1 = [1, 2]
>>> v2 = v1
>>> v1, v2
([1, 2], [1, 2])
>>> v1 = v1 + [3]
>>> v1, v2
([1, 2, 3], [1, 2])
Assigning v1 explicitly with the assignment operator, rather than appending a value via v1.append(3),
Sorting
Python lists have a sort() method that sorts the list in place:
The sort() method can take two keyword arguments: key and reverse.
reverse
colors.sort(reverse=True)
['red', 'orange', 'green', 'blue']
key
The key argument takes a function to be called on each list item and performs the sort based on the result. For example, the following code will sort by word length:
colors.sort(key=len)
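A sketch of the three calls in sequence, assuming the colors list that appears in the sorting exercise later in this lesson:

```python
colors = ['red', 'blue', 'green', 'orange']

colors.sort()              # sorts in place and returns None
after_plain = list(colors)

colors.sort(reverse=True)  # descending order
after_reverse = list(colors)

colors.sort(key=len)       # shortest word first
after_key = list(colors)

print(after_plain)    # ['blue', 'green', 'orange', 'red']
print(after_reverse)  # ['red', 'orange', 'green', 'blue']
print(after_key)      # ['red', 'blue', 'green', 'orange']
```

The list() copies are taken only so each intermediate state can be printed at the end; sort() itself always modifies colors.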
def get_lastname(name):
    return name.split()[-1]
[
    'John Adams',
    'George Washington'
]
Note that John Quincy Adams shows up after John Adams in the result only because he shows up after him in the initial list. Our code as it stands does not take into account middle or first names.
If you don’t want to create a new named function just to perform the sort, you can use a lambda function. For example, the following code would do the same thing as the code above without the need
The key and reverse arguments can be combined. For example, the following code will sort by word
colors.sort(key=len, reverse=True)
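A sketch of sorting by last name with a named function and with a lambda, followed by a combined key and reverse sort. The people list is illustrative; only some of its names appear in the surviving text:

```python
def get_lastname(name):
    return name.split()[-1]

# Illustrative list of names
people = ['George Washington', 'John Adams',
          'Thomas Jefferson', 'John Quincy Adams']

by_function = list(people)
by_function.sort(key=get_lastname)

by_lambda = list(people)
by_lambda.sort(key=lambda name: name.split()[-1])

print(by_function)
# ['John Adams', 'John Quincy Adams', 'Thomas Jefferson', 'George Washington']

colors = ['red', 'blue', 'green', 'orange']
colors.sort(key=len, reverse=True)  # longest word first
print(colors)  # ['orange', 'green', 'blue', 'red']
```

Python's sort is stable, which is why the two Adams entries stay in their original relative order.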
1. It does not modify the iterable in place. Rather, it returns a new sorted list.
2. It can take any iterable, not just a list (but it always returns a list).
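Both points apply to the built-in sorted() function, and can be seen in a short sketch that uses a tuple and a string as made-up inputs:

```python
colors = ('red', 'blue', 'green', 'orange')  # a tuple, which has no sort() method

new_colors = sorted(colors)
print(new_colors)  # ['blue', 'green', 'orange', 'red']
print(colors)      # ('red', 'blue', 'green', 'orange'), unchanged

letters = sorted('python')  # any iterable works; the result is a list
print(letters)     # ['h', 'n', 'o', 'p', 't', 'y']

# key and reverse work the same way as with list.sort()
longest_first = sorted(colors, key=len, reverse=True)
print(longest_first)  # ['orange', 'green', 'blue', 'red']
```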
Exercise 4: Converting list.sort() to sorted(iterable)
15 to 25 minutes
In this exercise, you will convert all the examples of sort() we saw earlier to use sorted() instead.
2. The code in the first example has already been converted to use sorted().
3. Convert all other code examples in the script.
3. # colors.sort()
4. new_colors = sorted(colors) # This one has been done for you
5. print(new_colors)
6.
9. print(colors)
10.
13. print(colors)
14.
22. print(people)
23.
26. print(people)
27.
28. # Combining key and reverse
Solution: advanced-python-concepts/Solutions/sorting.py
1. # Simple sort() method
2. colors = ['red', 'blue', 'green', 'orange']
3. # colors.sort()
4. new_colors = sorted(colors) # This one has been done for you
5. print(new_colors)
6.
7. # The reverse argument:
8. # colors.sort(reverse=True)
9. # print(colors)
12.
15. # print(colors)
18.
22.
23. people = ['George Washington', 'John Adams',
26. # print(people)
29.
34. # print(people)
36. print(new_people)
37.
40. # print(colors)
42. print(new_colors)
    ('Franklin', 'Roosevelt'),
    ('Joseph', 'Stalin'),
    ('Adolph', 'Hitler'),
    ('Benito', 'Mussolini'),
    ('Hideki', 'Tojo')
]
ww2_leaders.sort()
The ww2_leaders list will be sorted by first name and then by last name. It will now contain:
[
    ('Adolph', 'Hitler'),
    ('Benito', 'Mussolini'),
    ('Charles', 'de Gaulle'),
    ('Franklin', 'Roosevelt'),
    ('Hideki', 'Tojo'),
    ('Joseph', 'Stalin'),
    ('Teddy', 'Roosevelt'),
    ('Winston', 'Churchill')
]
The ww2_leaders list will now be sorted by last name and then by first name. It will now contain:
[
    ('Winston', 'Churchill'),
    ('Adolph', 'Hitler'),
    ('Benito', 'Mussolini'),
    ('Franklin', 'Roosevelt'),
    ('Teddy', 'Roosevelt'),
    ('Joseph', 'Stalin'),
    ('Hideki', 'Tojo'),
    ('Charles', 'de Gaulle')
]
It may seem strange that “de Gaulle” comes after “Tojo,” but that is correct. Lowercase letters come after uppercase letters in sorting. To change the result, you can use the lower() string method:
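One way to write these two sorts is with a lambda that returns a tuple of (last name, first name). This is a sketch: the exact key functions used in the original file are not shown, and the initial order of the list below is partly assumed.

```python
ww2_leaders = [
    ('Winston', 'Churchill'), ('Charles', 'de Gaulle'),
    ('Teddy', 'Roosevelt'), ('Franklin', 'Roosevelt'),
    ('Joseph', 'Stalin'), ('Adolph', 'Hitler'),
    ('Benito', 'Mussolini'), ('Hideki', 'Tojo'),
]

# Sort by last name, then first name (case-sensitive)
ww2_leaders.sort(key=lambda leader: (leader[1], leader[0]))
case_sensitive = list(ww2_leaders)
print(case_sensitive[-1])  # ('Charles', 'de Gaulle')

# Same sort, but compare last names in lowercase
ww2_leaders.sort(key=lambda leader: (leader[1].lower(), leader[0]))
print(ww2_leaders[1])  # ('Charles', 'de Gaulle')
```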
ww2_leaders will now contain:
[
    ('Winston', 'Churchill'),
    ('Charles', 'de Gaulle'),
    ('Adolph', 'Hitler'),
    ('Benito', 'Mussolini'),
    ('Franklin', 'Roosevelt'),
    ('Teddy', 'Roosevelt'),
    ('Joseph', 'Stalin'),
    ('Hideki', 'Tojo')
]
quences.py.
)
ww2_leaders.append(
    {'fname':'Benito', 'lname':'Mussolini', 'dob':date(1882,1,30)}
)
ww2_leaders.append(
ww2_leaders.append(
ww2_leaders.append(
)
This data can be sorted using a lambda function similar to how we sorted lists of tuples:
tionaries.py.
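A sketch of sorting a list of dictionaries with a lambda. The three records below are supplied for illustration and are not copied from the original file:

```python
from datetime import date

# Illustrative records using the same keys as the original data
leaders = [
    {'fname': 'Joseph', 'lname': 'Stalin', 'dob': date(1878, 12, 18)},
    {'fname': 'Adolph', 'lname': 'Hitler', 'dob': date(1889, 4, 20)},
    {'fname': 'Winston', 'lname': 'Churchill', 'dob': date(1874, 11, 30)},
]

# Sort by date of birth
leaders.sort(key=lambda leader: leader['dob'])
first_born = leaders[0]['fname']
print('First born:', first_born)  # First born: Winston

# Sort by last name
leaders.sort(key=lambda leader: leader['lname'])
first_alpha = leaders[0]['lname']
print('First alphabetically:', first_alpha)  # First alphabetically: Churchill
```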
itemgetter()
While the method shown above works fine, the operator module provides an itemgetter() function that performs this same task a bit faster. It works like this:
Demo 16: advanced-python-concepts/Demos/sorting_with_itemgetter.py
1. from datetime import date
2. from operator import itemgetter
3.
4. def main():
-------Lines 5 through 27 Omitted-------
28.     ww2_leaders.sort(key=itemgetter('dob'))
29.     print('First born:', ww2_leaders[0]['fname'])
30.
32.     print('First in Encyclopedia:', ww2_leaders[0]['fname'])
33.
34. main()
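A self-contained sketch of itemgetter() as a sort key, using illustrative records. Passing two keys to itemgetter() makes it return a tuple, which gives a sort by last name and then first name; whether the demo's missing line 31 does exactly this is an assumption:

```python
from datetime import date
from operator import itemgetter

# Illustrative records, not the demo's omitted data
leaders = [
    {'fname': 'Joseph', 'lname': 'Stalin', 'dob': date(1878, 12, 18)},
    {'fname': 'Adolph', 'lname': 'Hitler', 'dob': date(1889, 4, 20)},
    {'fname': 'Winston', 'lname': 'Churchill', 'dob': date(1874, 11, 30)},
]

# itemgetter('dob') builds a callable equivalent to: lambda d: d['dob']
leaders.sort(key=itemgetter('dob'))
first_born = leaders[0]['fname']

# With several keys, the callable returns a tuple: (d['lname'], d['fname'])
leaders.sort(key=itemgetter('lname', 'fname'))
first_listed = leaders[0]['fname']

print(first_born, first_listed)  # Winston Winston
```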
Follow these steps to make a dictionary from two lists using the first list for keys and the second list
for values:
ro
d
1. Use the built-in zip() function to make a list of two-element tuples from the two lists:
uc
IO
tio
N
D
C
bu
>>> course_grades
{'English': 96, 'Math': 99, 'Art': 88, 'Music': 94}
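The steps can be sketched as follows. The list names courses and grades are assumptions, chosen only so that the result reproduces the dictionary shown above:

```python
courses = ['English', 'Math', 'Art', 'Music']
grades = [96, 99, 88, 94]

# zip() pairs items by position; wrapping it in list() makes the pairs visible.
pairs = list(zip(courses, grades))
print(pairs)

# dict() accepts any iterable of two-element tuples, so zip() can be passed directly.
course_grades = dict(zip(courses, grades))
print(course_grades)
```

Note that in Python 3, zip() returns an iterator rather than a list, which is why the sketch calls list() before printing the pairs.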
This works with any type of sequence. For example, you could create a dictionary mapping letters to
numbers like this:
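One possible version, assuming the goal is to map each lowercase letter to its position in the alphabet (the variable name letter_positions is hypothetical):

```python
import string

# A string and a range are both sequences, so they can be zipped together.
letter_positions = dict(zip(string.ascii_lowercase, range(1, 27)))

print(letter_positions['a'], letter_positions['z'])
```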
import math

def distance_from_origin(a, b):
    return math.sqrt(a**2 + b**2)
The function expects two arguments, a and b, which are the x, y coordinates of a point. It uses the
Pythagorean theorem to determine the distance the point is from the origin.
c = distance_from_origin(3, 4)
But it would be nice to be able to call the function like this too:
point = (3, 4)
c = distance_from_origin(point)
However, that will cause an error because the function expects two arguments and we’re only passing in one.
One solution would be to pass the individual elements of our point:
point = (3, 4)
c = distance_from_origin(point[0], point[1])
But Python provides an even easier solution. We can use an asterisk in the function call to unpack the sequence:
point = (3, 4)
c = distance_from_origin(*point)
When you pass a sequence preceded by an asterisk into a function, the sequence gets unpacked, meaning
that the function receives the individual elements rather than the sequence itself.
The preceding code can also be found in advanced-python-concepts/Demos/unpacking_function_arguments.py.
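The whole sequence of calls can be tried in one place. This is a sketch: the function body is an assumption based on the description of the Pythagorean theorem, and the call_failed flag is a hypothetical name used only to record the error:

```python
import math

def distance_from_origin(a, b):
    # Assumed body: hypotenuse of a right triangle with legs a and b.
    return math.sqrt(a**2 + b**2)

point = (3, 4)

# Passing the tuple itself supplies only one argument, so the call fails.
try:
    distance_from_origin(point)
    call_failed = False
except TypeError:
    call_failed = True

# The asterisk unpacks the tuple into two positional arguments.
c = distance_from_origin(*point)
print(call_failed, c)
```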
import datetime

def str_to_date(str_date):
    # Write function
    pass

str_date = input('Input date as YYYY-MM-DD: ')
date = str_to_date(str_date)
print(date)
1. Open advanced-python-concepts/Exercises/converting_date_string_to_date
2. The imported datetime module includes a date class that creates a date object from three passed-in arguments: year, month, and day. For example:
datetime.date(1776, 7, 4)
A. Splits the passed-in string into a list of date parts. Each part should be an integer.
B. Returns a date object created by passing the unpacked list of date parts to datetime.date().
Solution: advanced-python-concepts/Solutions/converting_date_string_to_datetime.py
import datetime

def str_to_date(str_date):
    date_parts = [int(i) for i in str_date.split("-")]
    return datetime.date(*date_parts)

str_date = input('Input date as YYYY-MM-DD: ')
date = str_to_date(str_date)
print(date)
You have worked with different Python modules (e.g., random and math) and packages (e.g., collections). In general, it’s not all that important to know whether a library you want to use is a module or a package, but there is a difference, and when you’re creating your own, it’s important to understand it.
Modules
A module is a single file. It can be made up of any number of functions and classes. You can import
the whole module using:
import module_name
Or you can import specific functions or classes from the module using:
from module_name import function_name
For example, if you want to use the random() function from the random module, you can do so by importing the whole module or by importing just the random() function:
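A short sketch of both forms. The variable names a and b are illustrative only:

```python
import random

# With the whole module imported, the function needs the module prefix.
a = random.random()

# Importing just the function rebinds the name "random" to the function itself,
# so it is called without a prefix.
from random import random
b = random()

print(a, b)
```

Because the second import reuses the name random, the module is no longer reachable under that name afterward. In real code you would normally pick one form or the other.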
As shown above, when you import the whole module, you must prefix the module’s functions with the module name.
Every .py file is a module. When you build a module with the intention of making it available to other modules for importing, it is common to include a _test() function that runs tests when the module is run directly. For example, if you run random.py, which is in the Lib directory of your Python home, you will see output like this:
0.003 sec, avg 0.500716, stddev 0.285239, min 0.000495333, max 0.99917
0.004 sec, avg 0.0061499, stddev 0.971102, min -2.86188, max 3.02266
2000 times lognormvariate
0.004 sec, avg 1.64752, stddev 2.12612, min 0.0310675, max 28.5174
…
To find your Python home, run the following code:
import sys
import os

python_home = os.path.dirname(sys.executable)
print(python_home)
Open random.py in an editor and you will see it ends with this code:
if __name__ == '__main__':
    _test()
The __name__ variable of any module that is imported holds that module’s name. For example, if you import random and then print random.__name__, it will output “random”. However, if you open random.py, add a line that reads print(__name__), and run it, it will print “__main__”. So, the if condition in the code above just checks to see if the file has been imported. If it hasn’t (i.e., if it’s running directly), then it will call the _test() function.
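The same pattern applied to a small module of your own might look like the following sketch. The double() function and its tests are hypothetical and stand in for whatever the module really provides:

```python
import random

# An imported module reports its own name.
imported_name = random.__name__
print(imported_name)

def double(n):
    """Return n multiplied by two."""
    return n * 2

def _test():
    # Minimal self-check, meant to run only when the file is executed directly.
    assert double(2) == 4
    assert double(-3) == -6
    print('All tests passed.')

if __name__ == '__main__':
    _test()
```

When another file imports this module, __name__ holds the module's name instead of “__main__”, so _test() is skipped.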
If you do not want to write tests, you could include code like this:
if __name__ == '__main__':
Packages
A package is a group of files (and possibly subfolders) stored in a directory that includes a file named __init__.py. The __init__.py file does not need to contain any code. Some libraries’ __init__.py files are empty.
However, you can include code in the __init__.py file that will initialize the package. You can also (but do not have to) set a global __all__ variable, which should contain a list of the names (typically submodule names) to be imported when a file imports your package using from package_name import *. If you do not set the __all__ variable, that form of import binds only the public names already defined in the package’s namespace (those that do not begin with an underscore); it does not automatically import the package’s submodules, which may be just fine.
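To see the effect of __all__, the following sketch builds a throwaway package in a temporary directory and star-imports it. Every name in it (demo_pkg, alpha, beta, VALUE) is hypothetical:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a small package on disk: demo_pkg/__init__.py, alpha.py, beta.py.
root = Path(tempfile.mkdtemp())
pkg = root / 'demo_pkg'
pkg.mkdir()
(pkg / '__init__.py').write_text("__all__ = ['alpha']\n")
(pkg / 'alpha.py').write_text("VALUE = 'a'\n")
(pkg / 'beta.py').write_text("VALUE = 'b'\n")

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run the star-import in its own namespace so the result can be inspected.
namespace = {}
exec('from demo_pkg import *', namespace)

imported = sorted(name for name in namespace if not name.startswith('__'))
print(imported)
```

Only alpha is listed in __all__, so only alpha is imported. The beta submodule stays unloaded until something imports it explicitly.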
The Python interpreter must locate the imported modules. When import is used within a script, the interpreter searches for the imported module in the following places sequentially:
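The search locations in effect on your own machine can be inspected at run time through sys.path. A minimal sketch:

```python
import sys

# sys.path is an ordinary list of directory strings, searched in order.
for position, directory in enumerate(sys.path, start=1):
    print(position, directory)
```

The output differs between operating systems and installations, so no particular listing should be expected.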
As you see, the steps involved in creating modules and packages for import are relatively straightforward.
However, designing useful and easy-to-use modules and packages takes a lot of planning and thought.
Conclusion
In this lesson, you have learned several advanced techniques with sequences. You have also learned to do mapping and filtering, and to use lambda functions. Finally, you have learned how modules and packages work.
2. sys.path contains a list of strings specifying the search path for modules. The list is os-dependent. To see your list, import sys,
and then output sys.path.
Python’s re module.
Tom’s whole class were of a pattern--restless, noisy, and troublesome. When they came to recite their lessons, not one of them knew his verses perfectly, but had to be prompted all along.
Introduction
Regular expressions are used to do pattern matching in many programming languages, including Java, PHP, JavaScript, C, C++, and Perl. We will provide a brief introduction to regular expressions and then we’ll show you how to work with them in Python.
We will use the online regular expression testing tool at https://pythex.org to demonstrate and test our regular expressions. To see how it works, open the page in your browser:
3. Notice the Match result. The parts of the string that match your pattern will be highlighted.
Usually, you will want to have the MULTILINE option selected so that each line will be tested individually.
In the Your test string field, you can test multiple strings:
These examples just find occurrences of a substring (e.g., “rose”) in a string (e.g., “A rose is a rose is a rose.”). But the power of regular expressions is in pattern matching. The best way to get a feel for them is to try them out. So, let’s do that.
Here we’ll show the different symbols used in regular expressions. You should use https://pythex.org to test the patterns we show.
A caret ( ^ ) at the beginning of a regular expression indicates that the string being searched must start with this pattern.
The pattern ^dog can be found in “dogfish”, but not in “bulldog” or “boondoggle”.
A dollar sign ( $ ) at the end of a regular expression indicates that the string being searched must end
with this pattern.
The pattern dog$ can be found in “bulldog”, but not in “dogfish” or “boondoggle”.
Backslash-b ( \b ) denotes a word boundary. It matches a location at the beginning or end of a word.
A word is a sequence of letters, digits, and underscores. A word boundary is the position between a word character and any other character (or the start or end of the string).
The pattern dog\b matches the first but not the second occurrence of “dog” in the phrase “The bulldog bit the dogfish.”
In the phrase “The dogfish bit the bulldog.”, it only matches the second occurrence of “dog”.
Backslash-B ( \B ) is the opposite of backslash-b ( \b ). It matches a location that is not a word boundary.
The pattern dog\B matches the second but not the first occurrence of “dog” in the phrase “The bulldog bit the dogfish.”
But in the phrase “The dogfish bit the bulldog.”, it only matches the first occurrence of “dog”.
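These claims can be checked in Python as well as in the online tester. The sketch below uses the re module, which is covered later in this lesson. The helper found() is a hypothetical convenience function:

```python
import re

def found(pattern, text):
    """Return True if pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None

# ^ and $ anchor the pattern to the start and end of the string.
starts = [found(r'^dog', w) for w in ('dogfish', 'bulldog', 'boondoggle')]
ends = [found(r'dog$', w) for w in ('bulldog', 'dogfish', 'boondoggle')]

# \b and \B: record the index at which each match starts.
phrase = 'The bulldog bit the dogfish.'
boundary = [m.start() for m in re.finditer(r'dog\b', phrase)]
no_boundary = [m.start() for m in re.finditer(r'dog\B', phrase)]

print(starts, ends, boundary, no_boundary)
```

In the phrase, “dog” starts at index 8 inside “bulldog” and at index 20 inside “dogfish”, so dog\b reports only 8 and dog\B reports only 20.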
Number of Occurrences ( ? + * {} )
The following symbols affect the number of occurrences of the preceding character: ?, +, *, and {}.
A question mark ( ? ) indicates that the preceding character should appear zero or one times in the pattern.
The pattern go?ad can be found in “goad” and “gad”, but not in “gooad”. Only zero or one “o” is allowed before the “a”.
A plus sign ( + ) indicates that the preceding character should appear one or more times in the pattern.
The pattern go+ad can be found in “goad”, “gooad” and “goooad”, but not in “gad”.
An asterisk ( * ) indicates that the preceding character should appear zero or more times in the pattern.
The pattern go*ad can be found in “gad”, “goad”, “gooad” and “goooad”.
Curly braces with one parameter ( {n} ) indicate that the preceding character should appear exactly n
times in the pattern.
The pattern fo{3}d can be found in “foood” , but not in “food” or “fooood”.
Curly braces with two parameters ( {n1,n2} ) indicate that the preceding character should appear
between n1 and n2 times in the pattern.
The pattern fo{2,4}d can be found in “food”, “foood” and “fooood”, but not in “fod” or
“foooood”.
Curly braces with one parameter and an empty second parameter ( {n,} ) indicate that the preceding character should appear at least n times in the pattern.
The pattern fo{2,}d can be found in “food” and “foooood”, but not in “fod”.
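The quantifier examples above can be verified with a few lines of Python. This is a sketch in which matching() is a hypothetical helper that returns the words a pattern is found in:

```python
import re

def matching(pattern, words):
    """Return the words in which pattern can be found."""
    return [w for w in words if re.search(pattern, w)]

words = ['gad', 'goad', 'gooad', 'goooad']
zero_or_one = matching(r'go?ad', words)
one_or_more = matching(r'go+ad', words)
zero_or_more = matching(r'go*ad', words)

foods = ['fod', 'food', 'foood', 'fooood', 'foooood']
exactly_three = matching(r'fo{3}d', foods)
two_to_four = matching(r'fo{2,4}d', foods)
two_or_more = matching(r'fo{2,}d', foods)

print(zero_or_one, one_or_more, zero_or_more)
print(exactly_three, two_to_four, two_or_more)
```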
Common Characters ( . \d \D \w \W \s \S )
A period ( . ) represents any character except a newline.
The pattern fo.d can be found in “food”, “foad”, “fo9d”, and “fo d”.
Backslash-d ( \d ) represents any digit. It is the equivalent of [0-9] (to be discussed soon).
The pattern fo\dd can be found in “fo1d”, “fo4d” and “fo0d”, but not in “food” or “fodd”.
Backslash-D ( \D ) represents any character except a digit. It is the equivalent of [^0-9] (to be discussed soon).
The pattern fo\Dd can be found in “food” and “fold”, but not in “fo4d”.
Backslash-w ( \w ) represents any word character (letters, digits, and the underscore (_) ).
The pattern fo\wd can be found in “food”, “fo_d” and “fo4d”, but not in “fo*d”.
Backslash-W ( \W ) represents any character except a word character.
The pattern fo\Wd can be found in “fo*d”, “fo@d” and “fo.d”, but not in “food”.
Backslash-s ( \s ) represents any whitespace character (e.g., a space, tab, or newline).
The pattern fo\sd can be found in “fo d”, but not in “food”.
Backslash-S ( \S ) represents any character except a whitespace character.
The pattern fo\Sd can be found in “fo*d”, “food” and “fo4d”, but not in “fo d”.
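A side-by-side check of all seven shorthand characters against the same candidate strings. This is a sketch: matching() is a hypothetical helper and the candidate list is chosen for illustration:

```python
import re

def matching(pattern, candidates):
    """Return the candidates in which pattern can be found."""
    return [c for c in candidates if re.search(pattern, c)]

candidates = ['food', 'fo4d', 'fo_d', 'fo*d', 'fo d']

any_char = matching(r'fo.d', candidates)    # any character except a newline
digit = matching(r'fo\dd', candidates)      # a digit
non_digit = matching(r'fo\Dd', candidates)  # anything but a digit
word = matching(r'fo\wd', candidates)       # letter, digit, or underscore
non_word = matching(r'fo\Wd', candidates)   # anything but a word character
space = matching(r'fo\sd', candidates)      # whitespace
non_space = matching(r'fo\Sd', candidates)  # anything but whitespace

for label, result in [('.', any_char), (r'\d', digit), (r'\D', non_digit),
                      (r'\w', word), (r'\W', non_word),
                      (r'\s', space), (r'\S', non_space)]:
    print(label, result)
```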
Character Classes ( [] )
Square brackets ( [] ) are used to create a character class (or character set), which specifies a set of characters to match.
The pattern f[aeiou]d can be found in “fad” and “fed”, but not in “food”, “fyd” or “fd”.
The pattern f[aeiou]{2}d can be found in “faed” and “feod”, but not in “fod”, “fold” or “fd”.
The pattern [A-Za-z]+ can be found twice in “Webucator, Inc.”, but not in “13066”.
[A-Z] matches any uppercase letter.
The pattern [1-9]+ can be found twice in “13066”, but not in “Webucator, Inc.”
Negation ( ^ )
When used as the first character within a character class, the caret ( ^ ) is used for negation. It matches any characters not in the set.
The pattern f[^aeiou]d can be found in “fqd” and “f4d”, but not in “fad” or “fed”.
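The character-class and negation examples can be reproduced with re.findall() and re.search(). The variable names in this sketch are illustrative only:

```python
import re

# Ranges inside a class: each run of matching characters is one match.
letters = re.findall(r'[A-Za-z]+', 'Webucator, Inc.')
nonzero_digits = re.findall(r'[1-9]+', '13066')

# A plain class versus a negated class.
vowel = [w for w in ('fad', 'fed', 'food', 'fyd', 'fd')
         if re.search(r'f[aeiou]d', w)]
not_vowel = [w for w in ('fqd', 'f4d', 'fad', 'fed')
             if re.search(r'f[^aeiou]d', w)]

print(letters, nonzero_digits, vowel, not_vowel)
```

The zero in “13066” is outside the range 1-9, which is why the digits are found as two separate matches.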
Groups ( () )
Parentheses ( () ) are used to capture subpatterns and store them as groups, which can be retrieved later.
The pattern f(oo)?d can be found in “food” and “fd”, but not in “fod”.
Alternatives ( | )
The pipe ( | ) separates alternative patterns; a match on either side of the pipe counts as a match.
The pattern ^web|or$ can be found in “website”, “educator”, and twice in “webucator”, but
not in “cobweb” or “orphan”.
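A quick check of the alternation example. This is a sketch: the dictionary hits is a hypothetical name for collecting the results:

```python
import re

# The pipe has low precedence: this reads as (^web) OR (or$).
pattern = re.compile(r'^web|or$')

words = ('website', 'educator', 'webucator', 'cobweb', 'orphan')
hits = {w: pattern.findall(w) for w in words}

for word, found in hits.items():
    print(word, found)
```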
Escape Character ( \ )
The backslash ( \ ) is used to escape special characters.
The pattern fo\.d can be found in “fo.d”, but not in “food” or “fo4d”.
Backreferences
Backreferences are special wildcards that refer back to a group within a pattern. They can be used to make sure that two subpatterns match. The first group in a pattern is referenced as \1, the second group is referenced as \2, and so on.
For example, the pattern ([bmpr])o\1 matches “bobcat”, “thermometer”, “popped”, and “prorate”.
A more practical example has to do with matching the delimiter in social security numbers. Examine the following regular expression:
^\d{3}([\- ]?)\d{2}([\- ]?)\d{4}$
Within the caret (^) and dollar sign ($), which are used to specify the beginning and end of the pattern, there are three sequences of digits, optionally separated by a hyphen or a space. This pattern will be matched in all of the following strings:
123-45-6789
123 45 6789
123456789
123-45 6789
123 45-6789
123-456789
The last three strings are not ideal, but they do match the pattern. Backreferences can be used to make
sure that the second delimiter matches the first delimiter. The regular expression would look like this:
^\d{3}([\- ]?)\d{2}\1\d{4}$
The \1 refers back to the first subpattern. Only the first three strings listed above match this regular
expression.
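Both patterns can be run against the six sample strings to confirm the difference. In this sketch the names loose and strict are hypothetical labels for the two expressions:

```python
import re

loose = re.compile(r'^\d{3}([\- ]?)\d{2}([\- ]?)\d{4}$')
strict = re.compile(r'^\d{3}([\- ]?)\d{2}\1\d{4}$')

candidates = [
    '123-45-6789',
    '123 45 6789',
    '123456789',
    '123-45 6789',
    '123 45-6789',
    '123-456789',
]

loose_ok = [s for s in candidates if loose.search(s)]
strict_ok = [s for s in candidates if strict.search(s)]

print(len(loose_ok), strict_ok)
```

In “123456789” the first group captures an empty string, so the backreference also matches an empty string and the number is still accepted.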
Python’s Handling of Regular Expressions
In Python, you use the re module to access the regular expression engine. Here is a very simple illustration. Imagine you’re looking for the pattern “r[aeiou]se” in the string “A rose is a rose is a rose.”
import re

p = re.compile('r[aeiou]se')
result = p.search('A rose is a rose is a rose.')
print(result)
This will print the following, showing that the result is a match object and that it found the match “rose” starting at index 2:
<re.Match object; span=(2, 6), match='rose'>
Compiling a regular expression pattern into an object is a good idea if you’re going to reuse the expression throughout the program, but if you’re just using it once or twice, you can use the module-level search() function:
>>> result = re.search('r[aeiou]se', 'A rose is a rose is a rose.')
>>> result
<re.Match object; span=(2, 6), match='rose'>
Raw String Notation
Python uses the backslash character ( \ ) to escape special characters. For example \n is a newline
character. A call to print('a\nb\nc') will print the letters a, b, and c each on its own line:
>>> print('a\nb\nc')
a
b
c
If you actually want to print a backslash followed by an “n”, you need to escape the backslash with another backslash, like this: print('a\\nb\\nc'). That will print the literal string “a\nb\nc”:
>>> print('a\\nb\\nc')
a\nb\nc
Python provides another way of doing this. Instead of escaping all the backslashes, you can use raw string notation by placing the letter “r” before the beginning of the string, like this: print(r'a\nb\nc'):
>>> print(r'a\nb\nc')
a\nb\nc
While this may not come in very handy in most areas of programming, it is very helpful when writing regular expression patterns. That is because the regular expression syntax also uses the backslash for special characters. If you don’t use raw string notation, you may find your patterns filled with backslashes.
The takeaway: Always use raw string notation for your patterns.
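A small sketch of why this matters for patterns, using the \b word boundary as the example. The sample text is illustrative:

```python
import re

# These two spellings produce the same five-character... plus-escapes pattern text.
escaped = '\\bdog\\b'
raw = r'\bdog\b'
same = escaped == raw

# Without the r prefix (or doubled backslashes), '\b' is a single backspace
# character, so the pattern looks for backspaces around "dog".
wrong = '\bdog\b'

text = 'the dog barked'
raw_found = re.search(raw, text) is not None
wrong_found = re.search(wrong, text) is not None

print(same, len('\b'), raw_found, wrong_found)
```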
Regular Expression Object Methods
1. p.search(string) – Finds the first substring that matches the pattern. Returns a Match
object or None.
>>> p = re.compile(r'\W')
>>> p.search('andré@example.com')
<re.Match object; span=(5, 6), match='@'>
2. p.match(string) – Like search(), but the match must be found at the beginning of the string. Returns a Match object or None.
>>> p = re.compile(r'\W')
>>> p.match('andré@example.com') # Returns None
>>> p.match('@example.com')
<re.Match object; span=(0, 1), match='@'>
This matches the first character if it is a non-word character. The first example returns None because the string begins with “a”, which is a word character.
3. p.fullmatch(string) – Like search(), but the whole string must match. Returns a Match object or None.
>>> p = re.compile(r'[\w\.]+@example\.com')
>>> p.fullmatch('andré@example.com')
This matches a string made up of word characters and periods followed by “@example.com”.
4. p.findall(string) – Finds all non-overlapping matches of the pattern. Returns a list of strings.
>>> p = re.compile(r'\W')
>>> p.findall('andré@example.com')
['@', '.']
5. p.split(string, maxsplit=0) – Splits the string on pattern matches. If maxsplit is
nonzero, limits splits to maxsplit. Returns a list of strings.
>>> p = re.compile(r'\W')
>>> p.split('andré@example.com')
['andré', 'example', 'com']
6. p.sub(repl, string, count=0) – Replaces all matches in string with repl. If count is nonzero, limits replacements to count. More details on sub() under Using sub() with a Function.
All the methods that search a string for a pattern (search(), match(), fullmatch(), and findall()) accept optional pos and endpos arguments that indicate the positions in the string at which to start and end the search.
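A brief sketch of limiting the search range on a compiled pattern. The variable names are illustrative:

```python
import re

p = re.compile(r'\W')
address = 'andré@example.com'

# Default: search the whole string. The first non-word character is '@'.
first = p.search(address)

# Start searching at index 6, just past the '@'.
later = p.search(address, 6)

# Search only indexes 0 through 5 (the end position is exclusive).
bounded = p.findall(address, 0, 6)

print(first.start(), later.start(), later.group(0), bounded)
```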
Groups
As discussed earlier, parentheses in regular expression patterns are used to capture groups. You can access these groups individually using a match object’s group() method or all at once using its groups() method.
match.group(1) returns the first group found.
And so on...
You can also get multiple groups at the same time returned as a tuple of strings by passing in more than one argument (e.g., match.group(1, 2)).
When nested parentheses are used in the pattern, the outer group is returned before the inner group.
3. Groups can also be named through a Python extension to regular expressions. For more information, see https://docs.python.org/3/howto/regex.html#non-capturing-and-named-groups.
>>> import re
>>> p = re.compile(r'(\w+)@(\w+\.(\w+))')
>>> match = p.match('andre@example.com')
>>> email = match.group(0)
>>> handle = match.group(1)
>>> domain = match.group(2)
>>> domain_type = match.group(3)
>>> print(email, handle, domain, domain_type, sep='\n')
andre@example.com
andre
example.com
com
Notice that “example.com” is group 2 and “com”, which is nested within “example.com” is group 3.
And you can use the groups() method to get them all at once:
>>> print(match.groups())
('andre', 'example.com', 'com')
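Two related variations, shown as a sketch: passing several indexes to group() returns a tuple, and the (?P<name>...) syntax gives groups names. The group names handle and domain are chosen for illustration:

```python
import re

p = re.compile(r'(\w+)@(\w+\.(\w+))')
match = p.match('andre@example.com')

# Several group numbers at once come back as a tuple.
handle, domain_type = match.group(1, 3)

# Named groups can be read by name or collected into a dictionary.
named = re.compile(r'(?P<handle>\w+)@(?P<domain>\w+\.\w+)')
named_match = named.match('andre@example.com')
by_name = named_match.group('domain')
as_dict = named_match.groupdict()

print(handle, domain_type, by_name, as_dict)
```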
Flags
The compile() function takes an optional second argument: flags. The flags are constants that can be combined using the bitwise OR operator ( | ):
re.compile(pattern, re.FLAG1|re.FLAG2)
1. re.IGNORECASE (re.I) – Makes the pattern case insensitive.
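The effect of flags, alone and combined, can be seen in this sketch. It uses re.IGNORECASE together with re.MULTILINE, which makes ^ and $ match at the start and end of each line. The sample text is illustrative:

```python
import re

text = 'Rose\nrose\nROSE'

# No flags: ^ and $ refer to the whole string, which is not just "rose".
no_flags = re.findall(r'^rose$', text)

# MULTILINE only: each line is tested, but the match is still case sensitive.
multiline = re.findall(r'^rose$', text, re.MULTILINE)

# Both flags combined with the | operator.
both = re.findall(r'^rose$', text, re.IGNORECASE | re.MULTILINE)

print(no_flags, multiline, both)
```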
The sub() method can either replace each match with a string or with the return value of a specified function. The function receives the match as an argument and must return a string that will replace the matched pattern. Here is an example:
Demo 18: regular-expressions/Demos/clean_cusses.py
1. import re
2. import random
3.
4. def clean_cuss(match):
5.     # Get the whole match
6.     cuss = match.group(0)
7.     # Generate a random list of characters the length of cuss
8.     chars = [random.choice('!@#$%^&*') for letter in cuss]
15.     s = """Shucks! What a cruddy day I\'ve had.
18.     result = p.sub(clean_cuss, s)
19.     print(result)
20.
21. main()
Code Explanation
Reading regular expressions is tricky. You have to think like a computer and parse it part by part:
The full pattern is \b[a-z]*(stupid|stinky|darn|shucks|crud|slob)[a-z]*\b. Its parts, in order:
Word boundary: \b
Zero or more letters: [a-z]*
Any one of the words delimited by the pipes (|): (stupid|stinky|darn|shucks|crud|slob)
Zero or more letters: [a-z]*
Word boundary: \b
Notice that we compile the pattern using the re.IGNORECASE and re.MULTILINE flags:
p = re.compile(pattern, re.IGNORECASE|re.MULTILINE)
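A compact, self-contained variant of the same idea is sketched below. It is not the demo file: the mask() function, the shortened word list, and the sample sentence are all assumptions, and the replacement is deterministic so the result is predictable:

```python
import re

def mask(match):
    # Keep the first letter and replace the rest with asterisks.
    word = match.group(0)
    return word[0] + '*' * (len(word) - 1)

p = re.compile(r'\b(darn|shucks)\b', re.IGNORECASE)
text = 'Shucks! That darn cat.'

with_string = p.sub('[bleep]', text)   # replace with a fixed string
with_function = p.sub(mask, text)      # replace with a function's return value
limited = p.sub(mask, text, count=1)   # stop after the first replacement

print(with_string)
print(with_function)
print(limited)
```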
Run the file at the terminal to see that it replaces all those matches with a random string of characters:
PS …\regular-expressions\Demos> python clean_cusses.py
In an earlier lesson (see page 26), we split the text of the U.S. Declaration of Independence on spaces to create a counter showing which words were used the most often. The resulting list looked like this:
[('PEOPLE', 13), ('STATES', 7), ('SHOULD', 5), ('INDEPENDENT', 5),
('AGAINST', 5), ('GOVERNMENT,', 4), ('ASSENT', 4),
('OTHERS', 4), ('POLITICAL', 3), ('POWERS', 3)]
In the following demo, we use a regular expression to split on any character that is not a capital letter:
Demo 19: regular-expressions/Demos/counter_re.py
1. import re
2. from collections import Counter
3.
4. with open('Declaration_of_Independence.txt') as f:
5.     doi = f.read().upper()
6.
9. c = Counter(word_list)
10. print(c.most_common(10))
Code Explanation
Because we use upper() to convert the whole text to uppercase, we can split on [^A-Z]. If we didn’t know that there were only uppercase letters, we would have used [^A-Za-z] instead.
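The splitting step can be tried without the text file. This sketch uses a short made-up sentence in place of the Declaration, and it splits on runs of non-letters ([^A-Z]+) and drops empty strings, which is one reasonable way to build the word list but not necessarily the demo's:

```python
import re
from collections import Counter

text = 'We hold these truths. We, the People; the PEOPLE!'.upper()

# Split on every run of characters that are not capital letters.
words = [w for w in re.split(r'[^A-Z]+', text) if w]

c = Counter(words)
print(c.most_common(3))
```

Because punctuation is no longer attached to the words, “PEOPLE;” and “PEOPLE!” are counted together as “PEOPLE”.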
Exercise 6: Green Glass Door
20 to 30 minutes
In this exercise, you will modify a function so that it uses a regular expression. But first, a little riddle:
The following items can pass through the green glass door:
1. puddles
2. mommies
3. aardvarks
4. balloons
The following items cannot pass through the green glass door:
1. ponds
2. moms
3. anteaters
4. kites
Knowing that, which of the following can pass through the green glass door?
1. bananas
2. apples
3. pears
4. grapes
5. cherries
Did you figure it out? The two that can pass are apples and cherries. Any word with a double letter can pass through the green glass door.
Exercise Code: regular-expressions/Exercises/green_glass_door.py
def green_glass_door(word):
    prev_letter = ''
    for letter in word:
        letter = letter.upper()
        if letter == prev_letter:
            return True
        prev_letter = letter
    return False

fruits = ['banana', 'apple', 'pear', 'grape', 'cherry',
          'persimmons', 'orange', 'passion fruit']

for fruit in fruits:
    if green_glass_door(fruit):
        print(f'YES! {fruit} can pass through the green glass door.')
    else:
        print(f'NO! {fruit} cannot pass through the green glass door.')
Study the code, paying particular attention to the green_glass_door() function. Your job is to rewrite that function to use a regular expression. Don’t forget to import re.
Solution: regular-expressions/Solutions/green_glass_door.py
import re

def green_glass_door(word):
    pattern = re.compile(r'(.)\1')
    return pattern.search(word)

fruits = ['banana', 'apple', 'pear', 'grape', 'cherry',
          'persimmons', 'orange', 'passion fruit']

for fruit in fruits:
    if green_glass_door(fruit):
        print(f'YES! {fruit} can pass through the green glass door.')
    else:
        print(f'NO! {fruit} cannot pass through the green glass door.')
Code Explanation
The first part of the pattern matches any character. It uses parentheses to create a group:
pattern = re.compile(r'(.)\1')
The second part of the pattern uses a backreference to match the first group: that is, the character matched by (.):
pattern = re.compile(r'(.)\1')
The function then returns the result of searching the string for that pattern:
return pattern.search(word)
That will either return a Match object, which evaluates to True in a Boolean context, or it will return None, which evaluates to False.
Conclusion
In this lesson, you have learned how to work with regular expressions in Python. To learn more about regular expressions, see Python’s Regular Expression HOWTO (https://docs.python.org/3/howto/regex.html).